barracuda 1.2 → 1.3

data/README.md CHANGED
@@ -33,7 +33,7 @@ Or:
      cd barracuda
      rake install

- USING
+ USAGE
  -----

  The basic workflow behind the OpenCL architecture is:
@@ -45,12 +45,14 @@ The basic workflow behind the OpenCL architecture is:
  In Barracuda, this looks basically like:

  1. Create a `Barracuda::Program`
- 2. Create a `Barracuda::Buffer` or `Barracuda::OutputBuffer`
+ 2. Create a `Barracuda::Buffer` for input and output
  2. Call the kernel method on the program with buffers as arguments
  3. Read output buffers

- As you can see, there are only 3 basic classes: `Program`, `Buffer` (for input
- data), and `OutputBuffer` (for output data).
+ As you can see, there are only 2 basic classes: `Program` and `Buffer`. The
+ program is where you compile your OpenCL code, and the Buffer class is a
+ subclass of Array that contains your data to pass in and out of the kernel
+ method.

  EXAMPLE
  -------
@@ -63,7 +65,7 @@ Consider the following example to sum a bunch of integers:
        }
      eof

-     output = OutputBuffer.new(:int, 1)
+     output = Buffer.new(1)
      program.sum((1..65536).to_a, output)

      puts "The sum is: " + output.data[0].to_s
@@ -72,9 +74,8 @@ The above example will compute the sum of integers 1 to 65536 using (at most)
  65536 parallel processes and return the result in the 1-dimensional output
  buffer (which stores integers and is of length 1). The kernel method `sum`
  is called by calling the `#sum` method on the program object, and the
- arguments are passed in sequentially as the output buffer, followed by the
- input data (the integers) followed by the total size of the input (since C
- does not have the concept of array size).
+ arguments are passed in sequentially as the input data (the integers)
+ followed by the output buffer to store the data.

  We can also specify the work group size (the number of iterations we need
  to run). Barracuda automatically selects the size of the largest buffer as
@@ -83,6 +84,49 @@ manually specify the work group size, call the kernel with an options hash:

      program.my_kernel_method(..., :times => 512)

+ OUTPUT BUFFERS
+ --------------
+
+ The Buffer class is a superset of both data to be sent and read from the OpenCL
+ kernel method being called. In general, if the Buffer contains nil elements,
+ it is marked as an "output buffer" and the data is read back from OpenCL after
+ the kernel method executes. These nil buffers are not written to OpenCL initially,
+ so they are only meant for output data. On the other hand, if the buffer contains
+ regular data, it is by default considered as input data only, and the data
+ is not read back after the kernel method completes.
+
+ In some cases you may want to have a buffer that is both input and output and
+ should be read from after the kernel method finishes. To do this, you mark the
+ buffer as an `outvar` as so:
+
+     program = Program.new <<-'eof'
+       __kernel addN(__global int *data, int N) {
+         int i = get_global_id(0);
+         data[i] = data[i] + N;
+       }
+     eof
+
+     data = [1, 2, 3]
+     program.addN(data.outvar, 10)
+
+     # prints: [11, 12, 13]
+     p data
+
+ RETURN VALUE
+ ------------
+
+ Generally you need to pass in your output buffer as the buffer to write the
+ data back to. The idiom `void method(input, output)` is common to write data to
+ output buffers in languages such as C but is a rather clunky API for Ruby.
+ Instead, Barracuda returns the output buffers as the result of the kernel method
+ call. If there is only one output buffer, that buffer is returned as a single
+ result (rather than an array of buffers).
+
+ The example above could be simply rewritten as:
+
+     # prints: [11, 12, 13]
+     p program.addN(data.outvar, 10)
+
  CONVERTING TYPES
  ----------------

@@ -105,6 +149,8 @@ For example, to pass in a short, do:
  This can also be applied to an Array of shorts:

      program.my_kernel([1, 2, 3].to_type(:short))
+
+ The default type for an array (and buffers) is :int

  CLASS DETAILS
  -------------
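The output-buffer rule introduced in the README changes above can be modeled in plain Ruby. This is only a sketch: `SketchBuffer` and `written_to_device?` are hypothetical names, nothing here talks to OpenCL, and the nil test looks at the first element only, which is what the `buffer_initialize` and `buffer_write` changes in this diff appear to do.

```ruby
# Hypothetical pure-Ruby model of the README's output-buffer rule.
# SketchBuffer is an illustrative name, not part of Barracuda.
class SketchBuffer < Array
  def initialize(*args)
    super
    # A buffer whose first element is nil is treated as output-only.
    @outvar = !empty? && first.nil?
  end

  # Mark the buffer so it would be read back after the kernel call.
  def outvar
    @outvar = true
    self
  end

  def outvar?
    @outvar
  end

  # Nil-filled buffers hold no data yet, so nothing would be uploaded.
  def written_to_device?
    !(!empty? && first.nil?)
  end
end

output = SketchBuffer.new(4)                 # [nil, nil, nil, nil]
input  = SketchBuffer.new([1, 2, 3])         # regular data, input only
both   = SketchBuffer.new([1, 2, 3]).outvar  # input that is also read back

[output, input, both].each do |buf|
  puts "outvar=#{buf.outvar?} written=#{buf.written_to_device?}"
end
```

Under these assumptions, a size-only buffer is read back but never uploaded, a data buffer is uploaded but not read back, and calling `outvar` on a data buffer gives both.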
@@ -123,34 +169,21 @@ Represents an OpenCL program
  - if the last arg is a Hash, it should be an options hash with keys:
  - :times => FIXNUM (the number of iterations to run)

- **Barracuda::Buffer**:
+ **Barracuda::Buffer** (extends *Array*):

- Stores data to be sent to an OpenCL kernel method
+ Data storage to transfer to/from an OpenCL kernel method

- Buffer.new(*buffer_data) => creates a new input buffer
-
- Buffer#data => accessor for the buffer data
-
- Buffer#size_changed => call this if the buffer.data was modified and the size changed
- - calls Buffer#write
+ Buffer.new(buffer_array) => creates a new input buffer
+ Buffer.new(size) => creates a new output buffer of size `size`

- Buffer#write => call this if the buffer.data was modified (size not changed)
- - flushes the buffer.data cache to the OpenCL internal memory buffer
-
- Buffer#read => reads the cached data back into buffer.data
- - refreshes the buffer.data cache according to the internal memory buffer
-
- **Barracuda::OutputBuffer**:
+ Buffer#mark_dirty => call this if the data was modified between calls

- Holds a buffer for data written from the kernel method.
+ Buffer#dirty? => returns whether the buffer is marked as dirty

- OutputBuffer.new(type, size) => creates a new output buffer
- - type can be :float or :int
+ Buffer#outvar => mark the buffer to be read as output
+
+ Buffer#outvar? => returns whether buffer is marked to be read

- OutputBufferBuffer#data => accessor for the buffer data
-
- OutputBuffer#size => returns the buffer size
-
  GLOSSARY
  --------

data/Rakefile CHANGED
@@ -1,9 +1,15 @@
  require 'rubygems'
  require 'rake/gempackagetask'
+ require 'rake/testtask'

  WINDOWS = (PLATFORM =~ /win32|cygwin/ ? true : false) rescue false
  SUDO = WINDOWS ? '' : 'sudo'

+ task :default => :test
+ task :test => :build
+
+ Rake::TestTask.new
+
  load 'barracuda.gemspec'
  Rake::GemPackageTask.new(SPEC) do |pkg|
    pkg.gem_spec = SPEC
@@ -16,3 +22,8 @@ task :install => :package do
    sh "#{SUDO} gem install pkg/#{SPEC.name}-#{SPEC.version}.gem --local"
    sh "rm -rf pkg/#{SPEC.name}-#{SPEC.version}" unless ENV['KEEP_FILES']
  end
+
+ desc 'Build Barracuda'
+ task :build do
+   sh "cd ext && make"
+ end
@@ -34,7 +34,7 @@ eof
  num_vecs = 1000000
  arr = []
  num_vecs.times { arr.push(rand, rand, rand, 0.0) }
- output = OutputBuffer.new(:float, arr.size)
+ output = Buffer.new(arr.size).to_type(:float)

  Benchmark.bmbm do |x|
    x.report("cpu") { norm_all(arr) }
@@ -12,9 +12,9 @@ prog = Program.new <<-'eof'
    }
  eof

- arr = (1..3333333).to_a
+ arr = (1..333333).to_a
  input = Buffer.new(arr)
- output = OutputBuffer.new(:float, arr.size)
+ output = Buffer.new(arr.size).to_type(:float)

  Benchmark.bmbm do |x|
    x.report("regular") { arr.map {|x| (x.to_f + 0.5) / 3.8 + 2.0 } }
@@ -4,7 +4,6 @@

  static VALUE rb_mBarracuda;
  static VALUE rb_cBuffer;
- static VALUE rb_cOutputBuffer;
  static VALUE rb_cProgram;
  static VALUE rb_eProgramSyntaxError;
  static VALUE rb_eOpenCLError;
@@ -13,8 +12,10 @@ static VALUE rb_hTypes;

  static ID id_times;
  static ID id_to_sym;
- static ID id_data_type;
+ static ID id_new;
  static ID id_object;
+ static ID id_data_type;
+ static ID id_buffer_data;

  static ID id_type_bool;
  static ID id_type_char;
@@ -35,25 +36,25 @@ static ID id_type_uintptr_t;
  /*static ID id_type_void;*/

  static VALUE program_compile(VALUE self, VALUE source);
- static VALUE buffer_data_set(VALUE self, VALUE new_value);

  static cl_device_id device_id = NULL;
  static cl_context context = NULL;
  static size_t max_work_group_size = 65535;
  static int err;

- #define VERSION_STRING "1.2"
+ #define VERSION_STRING "1.3"

  struct program {
      cl_program program;
  };

  struct buffer {
-     VALUE arr;
+     VALUE dirty;
+     VALUE outvar;
      ID type;
-     size_t num_items;
      size_t member_size;
-     void *cachebuf;
+     size_t num_items;
+     int8_t *cachebuf;
      cl_mem data;
  };

@@ -114,11 +115,12 @@ array_data_type_get(VALUE self)
      if (RTEST(value)) return value;

      if (RARRAY_LEN(self) > 0) {
+         if (NIL_P(RARRAY_PTR(self)[0])) return ID2SYM(id_type_int);
          VALUE value = rb_funcall(RARRAY_PTR(self)[0], id_data_type, 0);
          if (RTEST(value)) return value;
      }
-
-     rb_raise(rb_eRuntimeError, "unknown buffer data in array %s",
+
+     rb_raise(rb_eTypeError, "unknown buffer data %s",
          RSTRING_PTR(rb_inspect(self)));
  }

@@ -128,7 +130,7 @@ array_data_type_get(VALUE self)

  #define GET_BUFFER() \
      struct buffer *buffer; \
-     Data_Get_Struct(self, struct buffer, buffer);
+     Data_Get_Struct(rb_ivar_get(self, id_buffer_data), struct buffer, buffer);

  #define TYPE_SET(type, size) \
      id_type_##type = rb_intern(#type); \
@@ -159,11 +161,11 @@ types_hash_init()
      TYPE_SET(ulong, cl_ulong);
      TYPE_SET(float, cl_float);
      TYPE_SET(half, cl_half);
-     TYPE_SET(double, cl_double);
-     TYPE_SET(size_t, size_t);
-     TYPE_SET(ptrdiff_t, ptrdiff_t);
-     TYPE_SET(intptr_t, intptr_t);
-     TYPE_SET(uintptr_t, uintptr_t);
+     TYPE_SET(double, cl_float);
+     TYPE_SET(size_t, cl_uint);
+     TYPE_SET(ptrdiff_t, cl_uint);
+     TYPE_SET(intptr_t, cl_uint);
+     TYPE_SET(uintptr_t, cl_uint);
      OBJ_FREEZE(rb_hTypes);
  }

@@ -195,11 +197,10 @@ type_to_native(VALUE value, ID data_type, void *native_value)
      TYPE_TO_NATIVE(uint, cl_uint, NUM2UINT);
      TYPE_TO_NATIVE(long, cl_long, NUM2LONG);
      TYPE_TO_NATIVE(ulong, cl_ulong, NUM2ULONG);
-     TYPE_TO_NATIVE(double, cl_double, NUM2DBL);
-     TYPE_TO_NATIVE(size_t, size_t, NUM2UINT);
-     TYPE_TO_NATIVE(ptrdiff_t, ptrdiff_t, NUM2UINT);
-     TYPE_TO_NATIVE(intptr_t, intptr_t, NUM2UINT);
-     TYPE_TO_NATIVE(uintptr_t, uintptr_t, NUM2UINT);
+     TYPE_TO_NATIVE(size_t, cl_uint, NUM2UINT);
+     TYPE_TO_NATIVE(ptrdiff_t, cl_uint, NUM2UINT);
+     TYPE_TO_NATIVE(intptr_t, cl_uint, NUM2UINT);
+     TYPE_TO_NATIVE(uintptr_t, cl_uint, NUM2UINT);
  }

  static VALUE
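The type-table hunks above appear to register `double` with the storage size of `cl_float`. Assuming `cl_float` is a 4-byte IEEE 754 single (as OpenCL defines it), a Ruby Float stored that way would be narrowed. The sketch below shows that narrowing with `Array#pack`, which stands in for the C conversion and is not Barracuda code.

```ruby
# Round-trip a Ruby Float through 4-byte and 8-byte IEEE 754 storage.
# Array#pack is used here as a stand-in for the extension's C conversion.
value = 0.1

as_single = [value].pack('e').unpack1('e')  # little-endian single precision
as_double = [value].pack('E').unpack1('E')  # little-endian double precision

puts [value].pack('e').bytesize  # 4
puts [value].pack('E').bytesize  # 8
puts as_double == value          # true
puts as_single == value          # false
```

If that reading of the hunk is right, values passed as `:double` would come back with single precision.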
@@ -216,11 +217,11 @@ type_to_ruby(void *native_value, ID data_type)
      TYPE_TO_RUBY(ulong, cl_ulong, ULONG2NUM);
      TYPE_TO_RUBY(float, cl_float, rb_float_new);
      TYPE_TO_RUBY(half, cl_half, rb_float_new);
-     TYPE_TO_RUBY(double, cl_double, DBL2NUM);
-     TYPE_TO_RUBY(size_t, size_t, UINT2NUM);
-     TYPE_TO_RUBY(ptrdiff_t, ptrdiff_t, UINT2NUM);
-     TYPE_TO_RUBY(intptr_t, intptr_t, UINT2NUM);
-     TYPE_TO_RUBY(uintptr_t, uintptr_t, UINT2NUM);
+     TYPE_TO_RUBY(double, cl_float, rb_float_new);
+     TYPE_TO_RUBY(size_t, cl_uint, UINT2NUM);
+     TYPE_TO_RUBY(ptrdiff_t, cl_uint, UINT2NUM);
+     TYPE_TO_RUBY(intptr_t, cl_uint, UINT2NUM);
+     TYPE_TO_RUBY(uintptr_t, cl_uint, UINT2NUM);
      return Qnil;
  }

@@ -261,172 +262,172 @@ fixnum_to_type(VALUE self, VALUE type)
  static VALUE
  type_new(VALUE klass, VALUE type)
  {
-     return rb_funcall(rb_cType, rb_intern("new"), 1, type);
+     return rb_funcall(rb_cType, id_new, 1, type);
  }

  static void
- free_buffer(struct buffer *buffer)
+ free_buffer_data(struct buffer *buffer)
  {
      clReleaseMemObject(buffer->data);
-     rb_gc_mark(buffer->arr);
      ruby_xfree(buffer->cachebuf);
-     ruby_xfree(buffer);
  }

  static VALUE
- buffer_s_allocate(VALUE klass)
+ buffer_outvar(VALUE self)
  {
-     struct buffer *buffer;
-     buffer = ALLOC(struct buffer);
-     MEMZERO(buffer, struct buffer, 1);
-     buffer->arr = Qnil;
-     return Data_Wrap_Struct(klass, 0, free_buffer, buffer);
+     GET_BUFFER();
+     buffer->outvar = Qtrue;
+     return self;
  }

- static void
- buffer_update_cache_info(struct buffer *buffer)
+ static VALUE
+ buffer_is_outvar(VALUE self)
  {
-     buffer->num_items = RARRAY_LEN(buffer->arr);
-     buffer->type = SYM2ID(rb_funcall(buffer->arr, id_data_type, 0));
-     buffer->member_size = FIX2INT(rb_hash_aref(rb_hTypes, ID2SYM(buffer->type)));
+     GET_BUFFER();
+     return buffer->outvar;
  }

  static VALUE
- buffer_write(VALUE self)
+ buffer_dirty(VALUE self)
  {
-     unsigned int i, index;
-     unsigned long data_ptr[16]; // data buffer
-
      GET_BUFFER();
-
-     buffer_update_cache_info(buffer);
-
-     if (buffer->cachebuf) {
-         xfree(buffer->cachebuf);
-     }
-     buffer->cachebuf = malloc(buffer->num_items * buffer->member_size);
-
-     for (i = 0, index = 0; i < RARRAY_LEN(buffer->arr); i++, index += buffer->member_size) {
-         VALUE item = RARRAY_PTR(buffer->arr)[i];
-
-         type_to_native(item, buffer->type, (void *)data_ptr);
-         memcpy(((int8_t*)buffer->cachebuf) + index, (void *)data_ptr, buffer->member_size);
-     }
-
-     return self;
+     if (buffer->dirty == Qtrue) return Qtrue;
+     if (buffer->data == NULL) return Qtrue;
+     if (buffer->cachebuf == NULL) return Qtrue;
+     if (RARRAY_LEN(self) != buffer->num_items) return Qtrue;
+     if (SYM2ID(rb_funcall(self, id_data_type, 0)) != buffer->type) return Qtrue;
+     return Qfalse;
  }

  static VALUE
- buffer_read(VALUE self)
+ buffer_mark_dirty(VALUE self)
  {
-     unsigned int i, index;
-
      GET_BUFFER();
-
-     rb_gc_mark(buffer->arr);
-     buffer->arr = rb_ary_new2(buffer->num_items);
-
-     for (i = 0, index = 0; i < buffer->num_items; i++, index += buffer->member_size) {
-         VALUE value = type_to_ruby(((int8_t*)buffer->cachebuf) + index, buffer->type);
-         rb_ary_push(buffer->arr, value);
-     }
-
-     return self;
+     return (buffer->dirty = Qtrue);
  }

- static VALUE
- buffer_size_changed(VALUE self)
+ static void
+ buffer_size_changed(struct buffer *buffer)
  {
-     GET_BUFFER();
-
-     if (buffer->data) {
-         clReleaseMemObject(buffer->data);
-     }
-     buffer_update_cache_info(buffer);
+     clReleaseMemObject(buffer->data);
      buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
-         buffer->num_items * buffer->member_size, NULL, NULL);
-
-     buffer_write(self);
-
-     return self;
+         buffer->num_items * buffer->member_size, NULL, NULL);
+     ruby_xfree(buffer->cachebuf);
+     buffer->cachebuf = ruby_xmalloc(buffer->num_items * buffer->member_size);
  }

  static VALUE
- buffer_data(VALUE self)
+ buffer_update_cache(VALUE self)
  {
      GET_BUFFER();
-     return buffer->arr;
+
+     if (buffer_dirty(self) == Qtrue) {
+         size_t old_num_items = buffer->num_items;
+         buffer->num_items = RARRAY_LEN(self);
+         buffer->type = SYM2ID(rb_funcall(self, id_data_type, 0));
+         buffer->member_size = FIX2INT(rb_hash_aref(rb_hTypes, ID2SYM(buffer->type)));
+         if (buffer->num_items != old_num_items) buffer_size_changed(buffer);
+         buffer->dirty = Qfalse;
+         return Qtrue;
+     }
+
+     return Qnil;
  }

- static VALUE
- buffer_data_set(VALUE self, VALUE new_value)
+ static void
+ print_buffer(struct buffer *buffer)
  {
-     GET_BUFFER();
-
-     if (RTEST(buffer->arr)) {
-         rb_gc_mark(buffer->arr);
+     int i;
+     for (i = 0; i < buffer->num_items * buffer->member_size; i++) {
+         int c = (int)buffer->cachebuf[i];
+         if (i > 0 && i % 8 == 0) printf("\n");
+         printf("%2.2x ", c);
      }
-     buffer->arr = new_value;
-     buffer_size_changed(self);
-     return buffer->arr;
+     printf("\n");
+     fflush(stdout);
  }

  static VALUE
- buffer_initialize(int argc, VALUE *argv, VALUE self)
+ buffer_write(VALUE self, cl_command_queue queue)
  {
-     if (argc == 0) {
-         rb_raise(rb_eArgError, "no buffer data given");
-     }
+     unsigned int i, index;
+     unsigned long data_ptr[16]; // data buffer

-     if (TYPE(argv[0]) == T_ARRAY) {
-         buffer_data_set(self, argv[0]);
+     GET_BUFFER();
+
+     if (NIL_P(RARRAY_PTR(self)[0])) return Qnil;
+
+     for (i = 0, index = 0; i < buffer->num_items; i++, index += buffer->member_size) {
+         VALUE item = RARRAY_PTR(self)[i];
+         type_to_native(item, buffer->type, data_ptr);
+         memcpy(buffer->cachebuf + index, data_ptr, buffer->member_size);
      }
-     else {
-         buffer_data_set(self, rb_ary_new4(argc, argv));
+
+     if (queue != NULL) {
+         clEnqueueWriteBuffer(queue, buffer->data, CL_TRUE, 0,
+             buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
      }

      return self;
  }

  static VALUE
- obuffer_initialize(VALUE self, VALUE type, VALUE size)
+ buffer_read(VALUE self, cl_command_queue queue)
  {
-     VALUE type_sym, member_size;
+     unsigned int i, index;
+
      GET_BUFFER();

-     type_sym = rb_funcall(type, id_to_sym, 0);
-     member_size = rb_hash_aref(rb_hTypes, type_sym);
-     if (NIL_P(member_size)) {
-         rb_raise(rb_eArgError, "type can only be one of %s",
-             RSTRING_PTR(rb_inspect(rb_funcall(rb_hTypes, rb_intern("keys"), 0))));
-     }
-     if (TYPE(size) != T_FIXNUM) {
-         rb_raise(rb_eArgError, "expecting buffer size as argument 2");
+     if (buffer->outvar != Qtrue) return Qnil;
+
+     if (queue != NULL) {
+         clEnqueueReadBuffer(queue, buffer->data, CL_TRUE, 0,
+             buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
      }

-     buffer->type = SYM2ID(type_sym);
-     buffer->member_size = FIX2INT(member_size);
-     buffer->num_items = FIX2UINT(size);
-     buffer->cachebuf = malloc(buffer->num_items * buffer->member_size);
-     buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
-         buffer->member_size * buffer->num_items, NULL, NULL);
+     for (i = 0, index = 0; i < buffer->num_items; i++, index += buffer->member_size) {
+         VALUE value = type_to_ruby(buffer->cachebuf + index, buffer->type);
+         rb_ary_store(self, i, value);
+     }

      return self;
  }

  static VALUE
- obuffer_clear(VALUE self)
+ array_to_outvar(VALUE self)
  {
-     GET_BUFFER();
-     memset(buffer->cachebuf, 0, buffer->member_size * buffer->num_items);
-     return self;
+     VALUE buf = rb_funcall(rb_cBuffer, id_new, 0);
+     rb_ary_replace(buf, self);
+     buffer_outvar(buf);
+     buffer_mark_dirty(buf);
+     return buf;
  }

  static VALUE
- obuffer_size(VALUE self)
+ buffer_initialize(int argc, VALUE *argv, VALUE self)
  {
-     GET_BUFFER();
-     return INT2FIX(buffer->num_items);
+     VALUE buf_value;
+     struct buffer *buffer;
+
+     rb_call_super(argc, argv);
+
+     if (argc == 1 && TYPE(argv[0]) == T_ARRAY) {
+         VALUE value = rb_ivar_get(argv[0], id_data_type);
+         if (RTEST(value)) rb_ivar_set(self, id_data_type, value);
+     }
+
+     buffer = ALLOC(struct buffer);
+     MEMZERO(buffer, struct buffer, 1);
+     buffer->outvar = Qfalse;
+     buffer->dirty = Qtrue;
+     buf_value = Data_Wrap_Struct(rb_cObject, 0, free_buffer_data, buffer);
+     rb_ivar_set(self, id_buffer_data, buf_value);
+
+     if (RARRAY_LEN(self) > 0 && NIL_P(RARRAY_PTR(self)[0])) { /* outvar */
+         buffer->outvar = Qtrue;
+     }
+
+     return self;
  }

  static void
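The bookkeeping added in `buffer_dirty` and `buffer_update_cache` can be followed more easily as a small Ruby model. This is a sketch under simplifying assumptions: `CacheState`, `dirty?` and `update_cache` are hypothetical names, the two NULL checks are folded into one `allocated` flag, and allocation is treated as always succeeding.

```ruby
# Hypothetical model of the cache bookkeeping; names are illustrative.
CacheState = Struct.new(:dirty, :allocated, :num_items, :type)

# Follows the order of checks in buffer_dirty.
def dirty?(state, items, type)
  return true if state.dirty                      # explicitly marked
  return true unless state.allocated              # no device/cache memory yet
  return true if items.length != state.num_items  # length changed
  return true if type != state.type               # element type changed
  false
end

# Follows buffer_update_cache: refresh metadata, report whether it ran.
def update_cache(state, items, type)
  return false unless dirty?(state, items, type)
  state.num_items = items.length
  state.type      = type
  state.allocated = true   # stands in for buffer_size_changed
  state.dirty     = false
  true
end

state = CacheState.new(true, false, 0, nil)
data  = [1, 2, 3]

first  = update_cache(state, data, :int)  # new buffer: refreshed
second = update_cache(state, data, :int)  # nothing changed: skipped
data << 4
third  = update_cache(state, data, :int)  # length changed: refreshed
data[0] = 99
fourth = update_cache(state, data, :int)  # same length and type: skipped

p [first, second, third, fourth]  # [true, false, true, false]
```

The last call shows that an in-place element change trips none of these checks, which is the case `Buffer#mark_dirty` is documented for.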
@@ -500,6 +501,7 @@ program_method_missing(int argc, VALUE *argv, VALUE self)
      size_t global[3] = {1, 1, 1}, local;
      cl_kernel kernel;
      cl_command_queue commands;
+     VALUE result;
      GET_PROGRAM();

      StringValue(argv[0]);
@@ -531,30 +533,20 @@ program_method_missing(int argc, VALUE *argv, VALUE self)
              break;
          }

-         if (TYPE(item) == T_ARRAY) {
+         if (CLASS_OF(item) == rb_cArray) {
              /* create buffer from arg */
-             VALUE buf = buffer_s_allocate(rb_cBuffer);
-             item = buffer_initialize(1, &item, buf);
+             argv[i] = item = rb_funcall(rb_cBuffer, id_new, 1, item);
          }

-         if (CLASS_OF(item) == rb_cOutputBuffer) {
-             struct buffer *buffer;
-             Data_Get_Struct(item, struct buffer, buffer);
-             err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
-             if (buffer->num_items > global[0]) {
-                 global[0] = buffer->num_items;
-             }
-         }
-         else if (CLASS_OF(item) == rb_cBuffer) {
+         if (CLASS_OF(item) == rb_cBuffer) {
              struct buffer *buffer;
-             Data_Get_Struct(item, struct buffer, buffer);
-
-             buffer_write(item);
-             clEnqueueWriteBuffer(commands, buffer->data, CL_TRUE, 0,
-                 buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
+             Data_Get_Struct(rb_ivar_get(item, id_buffer_data), struct buffer, buffer);
+
+             buffer_update_cache(item);
+             buffer_write(item, commands);
              err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
-             if (buffer->num_items > global[0]) {
-                 global[0] = buffer->num_items;
+             if (RARRAY_LEN(item) > global[0]) {
+                 global[0] = RARRAY_LEN(item);
              }
          }
          else {
@@ -600,23 +592,28 @@ program_method_missing(int argc, VALUE *argv, VALUE self)

      clFinish(commands);

+     result = rb_ary_new();
+
      for (i = 1; i < argc; i++) {
          VALUE item = argv[i];
-         if (CLASS_OF(item) == rb_cOutputBuffer) {
-             struct buffer *buffer;
-             Data_Get_Struct(item, struct buffer, buffer);
-             err = clEnqueueReadBuffer(commands, buffer->data, CL_TRUE, 0,
-                 buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
-             if (err != CL_SUCCESS) {
-                 CLEAN();
-                 rb_raise(rb_eOpenCLError, "failed to read output buffer");
+         if (CLASS_OF(item) == rb_cBuffer) {
+             if (RTEST(buffer_read(item, commands))) {
+                 rb_ary_push(result, item);
              }
-             buffer_read(item);
          }
      }

      CLEAN();
-     return Qnil;
+
+     if (RARRAY_LEN(result) == 0) {
+         return Qnil;
+     }
+     else if (RARRAY_LEN(result) == 1) {
+         return RARRAY_PTR(result)[0];
+     }
+     else {
+         return result;
+     }
  }

  static void
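The new tail of `program_method_missing` collapses the list of buffers that were read back into the kernel call's return value. A minimal Ruby rendering of that rule follows; `collapse_results` is a hypothetical helper name and plain arrays stand in for `Buffer` objects.

```ruby
# Hypothetical helper following the return rule at the end of
# program_method_missing: nil, a single buffer, or an array of buffers.
def collapse_results(buffers_read)
  case buffers_read.length
  when 0 then nil                 # no output buffers
  when 1 then buffers_read.first  # one output buffer, returned directly
  else buffers_read               # several output buffers, returned together
  end
end

p collapse_results([])              # nil
p collapse_results([[11, 12, 13]])  # [11, 12, 13]
p collapse_results([[1], [2]])      # [[1], [2]]
```

One consequence callers may want to keep in mind is that the return type depends on how many arguments were marked as output.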
@@ -645,9 +642,10 @@ void
  Init_barracuda()
  {
      id_times = rb_intern("times");
+     id_new = rb_intern("new");
      id_to_sym = rb_intern("to_sym");
      id_data_type = rb_intern("data_type");
-     id_object = rb_intern("object");
+     id_buffer_data = rb_intern("buffer_data");

      rb_hTypes = rb_hash_new();
      rb_define_method(rb_mKernel, "Type", type_new, 1);
@@ -666,28 +664,19 @@ Init_barracuda()
      rb_define_method(rb_cProgram, "compile", program_compile, 1);
      rb_define_method(rb_cProgram, "method_missing", program_method_missing, -1);

-     rb_cBuffer = rb_define_class_under(rb_mBarracuda, "Buffer", rb_cObject);
-     rb_define_alloc_func(rb_cBuffer, buffer_s_allocate);
+     rb_cBuffer = rb_define_class_under(rb_mBarracuda, "Buffer", rb_cArray);
      rb_define_method(rb_cBuffer, "initialize", buffer_initialize, -1);
-     rb_define_method(rb_cBuffer, "size_changed", buffer_size_changed, 0);
-     rb_define_method(rb_cBuffer, "read", buffer_read, 0);
-     rb_define_method(rb_cBuffer, "write", buffer_write, 0);
-     rb_define_method(rb_cBuffer, "data", buffer_data, 0);
-     rb_define_method(rb_cBuffer, "data=", buffer_data_set, 1);
-
-     rb_cOutputBuffer = rb_define_class_under(rb_mBarracuda, "OutputBuffer", rb_cBuffer);
-     rb_define_method(rb_cOutputBuffer, "initialize", obuffer_initialize, 2);
-     rb_define_method(rb_cOutputBuffer, "size", obuffer_size, 0);
-     rb_define_method(rb_cOutputBuffer, "clear", obuffer_clear, 0);
-     rb_undef_method(rb_cOutputBuffer, "write");
-     rb_undef_method(rb_cOutputBuffer, "size_changed");
-     rb_undef_method(rb_cOutputBuffer, "data=");
+     rb_define_method(rb_cBuffer, "outvar", buffer_outvar, 0);
+     rb_define_method(rb_cBuffer, "outvar?", buffer_is_outvar, 0);
+     rb_define_method(rb_cBuffer, "mark_dirty", buffer_mark_dirty, 0);
+     rb_define_method(rb_cBuffer, "dirty?", buffer_dirty, 0);

      rb_cType = rb_define_class_under(rb_mBarracuda, "Type", rb_cObject);
      rb_define_method(rb_cType, "initialize", type_initialize, 1);
      rb_define_method(rb_cType, "method_missing", type_method_missing, 1);
      rb_define_method(rb_cType, "object", type_object, 0);

+     rb_define_method(rb_cArray, "outvar", array_to_outvar, 0);
      rb_define_method(rb_cObject, "to_type", object_to_type, 1);
      rb_define_method(rb_cFixnum, "to_type", fixnum_to_type, 1);
      rb_define_method(rb_cObject, "data_type", object_data_type_get, 0);