barracuda 1.2 → 1.3
- data/README.md +63 -30
- data/Rakefile +11 -0
- data/benchmarks/normalize.rb +1 -1
- data/benchmarks/to_float.rb +2 -2
- data/ext/barracuda.c +162 -173
- data/test/test_buffer.rb +38 -0
- data/test/test_program.rb +206 -0
- data/test/test_types.rb +68 -0
- metadata +8 -5
- data/benchmarks/sort.rb +0 -29
- data/test/test_barracuda.rb +0 -291
data/README.md
CHANGED
@@ -33,7 +33,7 @@ Or:
|
|
33
33
|
cd barracuda
|
34
34
|
rake install
|
35
35
|
|
36
|
-
|
36
|
+
USAGE
|
37
37
|
-----
|
38
38
|
|
39
39
|
The basic workflow behind the OpenCL architecture is:
|
@@ -45,12 +45,14 @@ The basic workflow behind the OpenCL architecture is:
|
|
45
45
|
In Barracuda, this looks basically like:
|
46
46
|
|
47
47
|
1. Create a `Barracuda::Program`
|
48
|
-
2. Create a `Barracuda::Buffer`
|
48
|
+
2. Create a `Barracuda::Buffer` for input and output
|
49
49
|
2. Call the kernel method on the program with buffers as arguments
|
50
50
|
3. Read output buffers
|
51
51
|
|
52
|
-
As you can see, there are only
|
53
|
-
|
52
|
+
As you can see, there are only 2 basic classes: `Program` and `Buffer`. The
|
53
|
+
program is where you compile your OpenCL code, and the Buffer class is a
|
54
|
+
subclass of Array that contains your data to pass in and out of the kernel
|
55
|
+
method.
|
54
56
|
|
55
57
|
EXAMPLE
|
56
58
|
-------
|
@@ -63,7 +65,7 @@ Consider the following example to sum a bunch of integers:
|
|
63
65
|
}
|
64
66
|
eof
|
65
67
|
|
66
|
-
output =
|
68
|
+
output = Buffer.new(1)
|
67
69
|
program.sum((1..65536).to_a, output)
|
68
70
|
|
69
71
|
puts "The sum is: " + output.data[0].to_s
|
@@ -72,9 +74,8 @@ The above example will compute the sum of integers 1 to 65536 using (at most)
|
|
72
74
|
65536 parallel processes and return the result in the 1-dimensional output
|
73
75
|
buffer (which stores integers and is of length 1). The kernel method `sum`
|
74
76
|
is called by calling the `#sum` method on the program object, and the
|
75
|
-
arguments are passed in sequentially as the
|
76
|
-
|
77
|
-
does not have the concept of array size).
|
77
|
+
arguments are passed in sequentially as the input data (the integers)
|
78
|
+
followed by the output buffer to store the data.
|
78
79
|
|
79
80
|
We can also specify the work group size (the number of iterations we need
|
80
81
|
to run). Barracuda automatically selects the size of the largest buffer as
|
@@ -83,6 +84,49 @@ manually specify the work group size, call the kernel with an options hash:
|
|
83
84
|
|
84
85
|
program.my_kernel_method(..., :times => 512)
|
85
86
|
|
87
|
+
OUTPUT BUFFERS
|
88
|
+
--------------
|
89
|
+
|
90
|
+
The Buffer class is a superset of both data to be sent and read from the OpenCL
|
91
|
+
kernel method being called. In general, if the Buffer contains nil elements,
|
92
|
+
it is marked as an "output buffer" and the data is read back from OpenCL after
|
93
|
+
the kernel method executes. These nil buffers are not written to OpenCL initially,
|
94
|
+
so they are only meant for output data. On the other hand, if the buffer contains
|
95
|
+
regular data, it is by default considered as input data only, and the data
|
96
|
+
is not read back after the kernel method completes.
|
97
|
+
|
98
|
+
In some cases you may want to have a buffer that is both input and output and
|
99
|
+
should be read from after the kernel method finishes. To do this, you mark the
|
100
|
+
buffer as an `outvar` as so:
|
101
|
+
|
102
|
+
program = Program.new <<-'eof'
|
103
|
+
__kernel addN(__global int *data, int N) {
|
104
|
+
int i = get_global_id(0);
|
105
|
+
data[i] = data[i] + N;
|
106
|
+
}
|
107
|
+
eof
|
108
|
+
|
109
|
+
data = [1, 2, 3]
|
110
|
+
program.addN(data.outvar, 10)
|
111
|
+
|
112
|
+
# prints: [11, 12, 13]
|
113
|
+
p data
|
114
|
+
|
115
|
+
RETURN VALUE
|
116
|
+
------------
|
117
|
+
|
118
|
+
Generally you need to pass in your output buffer as the buffer to write the
|
119
|
+
data back to. The idiom `void method(input, output)` is common to write data to
|
120
|
+
output buffers in languages such as C but is a rather clunky API for Ruby.
|
121
|
+
Instead, Barracuda returns the output buffers as the result of the kernel method
|
122
|
+
call. If there is only one output buffer, that buffer is returned as a single
|
123
|
+
result (rather than an array of buffers).
|
124
|
+
|
125
|
+
The example above could be simply rewritten as:
|
126
|
+
|
127
|
+
# prints: [11, 12, 13]
|
128
|
+
p program.addN(data.outvar, 10)
|
129
|
+
|
86
130
|
CONVERTING TYPES
|
87
131
|
----------------
|
88
132
|
|
@@ -105,6 +149,8 @@ For example, to pass in a short, do:
|
|
105
149
|
This can also be applied to an Array of shorts:
|
106
150
|
|
107
151
|
program.my_kernel([1, 2, 3].to_type(:short))
|
152
|
+
|
153
|
+
The default type for an array (and buffers) is :int
|
108
154
|
|
109
155
|
CLASS DETAILS
|
110
156
|
-------------
|
@@ -123,34 +169,21 @@ Represents an OpenCL program
|
|
123
169
|
- if the last arg is a Hash, it should be an options hash with keys:
|
124
170
|
- :times => FIXNUM (the number of iterations to run)
|
125
171
|
|
126
|
-
**Barracuda::Buffer
|
172
|
+
**Barracuda::Buffer** (extends *Array*):
|
127
173
|
|
128
|
-
|
174
|
+
Data storage to transfer to/from an OpenCL kernel method
|
129
175
|
|
130
|
-
Buffer.new(
|
131
|
-
|
132
|
-
Buffer#data => accessor for the buffer data
|
133
|
-
|
134
|
-
Buffer#size_changed => call this if the buffer.data was modified and the size changed
|
135
|
-
- calls Buffer#write
|
176
|
+
Buffer.new(buffer_array) => creates a new input buffer
|
177
|
+
Buffer.new(size) => creates a new output buffer of size `size`
|
136
178
|
|
137
|
-
Buffer#
|
138
|
-
- flushes the buffer.data cache to the OpenCL internal memory buffer
|
139
|
-
|
140
|
-
Buffer#read => reads the cached data back into buffer.data
|
141
|
-
- refreshes the buffer.data cache according to the internal memory buffer
|
142
|
-
|
143
|
-
**Barracuda::OutputBuffer**:
|
179
|
+
Buffer#mark_dirty => call this if the data was modified between calls
|
144
180
|
|
145
|
-
|
181
|
+
Buffer#dirty? => returns whether the buffer is marked as dirty
|
146
182
|
|
147
|
-
|
148
|
-
|
183
|
+
Buffer#outvar => mark the buffer to be read as output
|
184
|
+
|
185
|
+
Buffer#outvar? => returns whether buffer is marked to be read
|
149
186
|
|
150
|
-
OutputBufferBuffer#data => accessor for the buffer data
|
151
|
-
|
152
|
-
OutputBuffer#size => returns the buffer size
|
153
|
-
|
154
187
|
GLOSSARY
|
155
188
|
--------
|
156
189
|
|
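Review note: the README hunks above describe when a buffer is treated as output. The rule can be sketched in plain Ruby without OpenCL. `SketchBuffer` is a hypothetical stand-in, not the gem's class. It checks only the first element for nil, which is an assumption taken from the `buffer_initialize` hunk in `ext/barracuda.c`.

```ruby
# Plain-Ruby model of the input/output buffer rule from the README hunk.
# SketchBuffer is hypothetical and never talks to OpenCL.
class SketchBuffer < Array
  def initialize(*args)
    super
    # A non-empty buffer whose first element is nil starts as an output buffer.
    @outvar = !empty? && first.nil?
  end

  # Mark the buffer so it would be read back after the kernel runs.
  def outvar
    @outvar = true
    self
  end

  # Whether the buffer is marked to be read back.
  def outvar?
    @outvar
  end
end

input  = SketchBuffer.new([1, 2, 3])         # regular data: input only
output = SketchBuffer.new(3)                 # [nil, nil, nil]: output buffer
both   = SketchBuffer.new([1, 2, 3]).outvar  # input that is also read back
```

In this model `Buffer.new(size)` and `Buffer.new(array)` both reduce to `Array.new`, which is consistent with the 1.3 change of making `Buffer` extend `Array`.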
data/Rakefile
CHANGED
@@ -1,9 +1,15 @@
|
|
1
1
|
require 'rubygems'
|
2
2
|
require 'rake/gempackagetask'
|
3
|
+
require 'rake/testtask'
|
3
4
|
|
4
5
|
WINDOWS = (PLATFORM =~ /win32|cygwin/ ? true : false) rescue false
|
5
6
|
SUDO = WINDOWS ? '' : 'sudo'
|
6
7
|
|
8
|
+
task :default => :test
|
9
|
+
task :test => :build
|
10
|
+
|
11
|
+
Rake::TestTask.new
|
12
|
+
|
7
13
|
load 'barracuda.gemspec'
|
8
14
|
Rake::GemPackageTask.new(SPEC) do |pkg|
|
9
15
|
pkg.gem_spec = SPEC
|
@@ -16,3 +22,8 @@ task :install => :package do
|
|
16
22
|
sh "#{SUDO} gem install pkg/#{SPEC.name}-#{SPEC.version}.gem --local"
|
17
23
|
sh "rm -rf pkg/#{SPEC.name}-#{SPEC.version}" unless ENV['KEEP_FILES']
|
18
24
|
end
|
25
|
+
|
26
|
+
desc 'Build Barracuda'
|
27
|
+
task :build do
|
28
|
+
sh "cd ext && make"
|
29
|
+
end
|
data/benchmarks/normalize.rb
CHANGED
data/benchmarks/to_float.rb
CHANGED
@@ -12,9 +12,9 @@ prog = Program.new <<-'eof'
|
|
12
12
|
}
|
13
13
|
eof
|
14
14
|
|
15
|
-
arr = (1..
|
15
|
+
arr = (1..333333).to_a
|
16
16
|
input = Buffer.new(arr)
|
17
|
-
output =
|
17
|
+
output = Buffer.new(arr.size).to_type(:float)
|
18
18
|
|
19
19
|
Benchmark.bmbm do |x|
|
20
20
|
x.report("regular") { arr.map {|x| (x.to_f + 0.5) / 3.8 + 2.0 } }
|
data/ext/barracuda.c
CHANGED
@@ -4,7 +4,6 @@
|
|
4
4
|
|
5
5
|
static VALUE rb_mBarracuda;
|
6
6
|
static VALUE rb_cBuffer;
|
7
|
-
static VALUE rb_cOutputBuffer;
|
8
7
|
static VALUE rb_cProgram;
|
9
8
|
static VALUE rb_eProgramSyntaxError;
|
10
9
|
static VALUE rb_eOpenCLError;
|
@@ -13,8 +12,10 @@ static VALUE rb_hTypes;
|
|
13
12
|
|
14
13
|
static ID id_times;
|
15
14
|
static ID id_to_sym;
|
16
|
-
static ID
|
15
|
+
static ID id_new;
|
17
16
|
static ID id_object;
|
17
|
+
static ID id_data_type;
|
18
|
+
static ID id_buffer_data;
|
18
19
|
|
19
20
|
static ID id_type_bool;
|
20
21
|
static ID id_type_char;
|
@@ -35,25 +36,25 @@ static ID id_type_uintptr_t;
|
|
35
36
|
/*static ID id_type_void;*/
|
36
37
|
|
37
38
|
static VALUE program_compile(VALUE self, VALUE source);
|
38
|
-
static VALUE buffer_data_set(VALUE self, VALUE new_value);
|
39
39
|
|
40
40
|
static cl_device_id device_id = NULL;
|
41
41
|
static cl_context context = NULL;
|
42
42
|
static size_t max_work_group_size = 65535;
|
43
43
|
static int err;
|
44
44
|
|
45
|
-
#define VERSION_STRING "1.
|
45
|
+
#define VERSION_STRING "1.3"
|
46
46
|
|
47
47
|
struct program {
|
48
48
|
cl_program program;
|
49
49
|
};
|
50
50
|
|
51
51
|
struct buffer {
|
52
|
-
VALUE
|
52
|
+
VALUE dirty;
|
53
|
+
VALUE outvar;
|
53
54
|
ID type;
|
54
|
-
size_t num_items;
|
55
55
|
size_t member_size;
|
56
|
-
|
56
|
+
size_t num_items;
|
57
|
+
int8_t *cachebuf;
|
57
58
|
cl_mem data;
|
58
59
|
};
|
59
60
|
|
@@ -114,11 +115,12 @@ array_data_type_get(VALUE self)
|
|
114
115
|
if (RTEST(value)) return value;
|
115
116
|
|
116
117
|
if (RARRAY_LEN(self) > 0) {
|
118
|
+
if (NIL_P(RARRAY_PTR(self)[0])) return ID2SYM(id_type_int);
|
117
119
|
VALUE value = rb_funcall(RARRAY_PTR(self)[0], id_data_type, 0);
|
118
120
|
if (RTEST(value)) return value;
|
119
121
|
}
|
120
|
-
|
121
|
-
rb_raise(
|
122
|
+
|
123
|
+
rb_raise(rb_eTypeError, "unknown buffer data %s",
|
122
124
|
RSTRING_PTR(rb_inspect(self)));
|
123
125
|
}
|
124
126
|
|
@@ -128,7 +130,7 @@ array_data_type_get(VALUE self)
|
|
128
130
|
|
129
131
|
#define GET_BUFFER() \
|
130
132
|
struct buffer *buffer; \
|
131
|
-
Data_Get_Struct(self, struct buffer, buffer);
|
133
|
+
Data_Get_Struct(rb_ivar_get(self, id_buffer_data), struct buffer, buffer);
|
132
134
|
|
133
135
|
#define TYPE_SET(type, size) \
|
134
136
|
id_type_##type = rb_intern(#type); \
|
@@ -159,11 +161,11 @@ types_hash_init()
|
|
159
161
|
TYPE_SET(ulong, cl_ulong);
|
160
162
|
TYPE_SET(float, cl_float);
|
161
163
|
TYPE_SET(half, cl_half);
|
162
|
-
TYPE_SET(double,
|
163
|
-
TYPE_SET(size_t,
|
164
|
-
TYPE_SET(ptrdiff_t,
|
165
|
-
TYPE_SET(intptr_t,
|
166
|
-
TYPE_SET(uintptr_t,
|
164
|
+
TYPE_SET(double, cl_float);
|
165
|
+
TYPE_SET(size_t, cl_uint);
|
166
|
+
TYPE_SET(ptrdiff_t, cl_uint);
|
167
|
+
TYPE_SET(intptr_t, cl_uint);
|
168
|
+
TYPE_SET(uintptr_t, cl_uint);
|
167
169
|
OBJ_FREEZE(rb_hTypes);
|
168
170
|
}
|
169
171
|
|
@@ -195,11 +197,10 @@ type_to_native(VALUE value, ID data_type, void *native_value)
|
|
195
197
|
TYPE_TO_NATIVE(uint, cl_uint, NUM2UINT);
|
196
198
|
TYPE_TO_NATIVE(long, cl_long, NUM2LONG);
|
197
199
|
TYPE_TO_NATIVE(ulong, cl_ulong, NUM2ULONG);
|
198
|
-
TYPE_TO_NATIVE(
|
199
|
-
TYPE_TO_NATIVE(
|
200
|
-
TYPE_TO_NATIVE(
|
201
|
-
TYPE_TO_NATIVE(
|
202
|
-
TYPE_TO_NATIVE(uintptr_t, uintptr_t, NUM2UINT);
|
200
|
+
TYPE_TO_NATIVE(size_t, cl_uint, NUM2UINT);
|
201
|
+
TYPE_TO_NATIVE(ptrdiff_t, cl_uint, NUM2UINT);
|
202
|
+
TYPE_TO_NATIVE(intptr_t, cl_uint, NUM2UINT);
|
203
|
+
TYPE_TO_NATIVE(uintptr_t, cl_uint, NUM2UINT);
|
203
204
|
}
|
204
205
|
|
205
206
|
static VALUE
|
@@ -216,11 +217,11 @@ type_to_ruby(void *native_value, ID data_type)
|
|
216
217
|
TYPE_TO_RUBY(ulong, cl_ulong, ULONG2NUM);
|
217
218
|
TYPE_TO_RUBY(float, cl_float, rb_float_new);
|
218
219
|
TYPE_TO_RUBY(half, cl_half, rb_float_new);
|
219
|
-
TYPE_TO_RUBY(double,
|
220
|
-
TYPE_TO_RUBY(size_t,
|
221
|
-
TYPE_TO_RUBY(ptrdiff_t,
|
222
|
-
TYPE_TO_RUBY(intptr_t,
|
223
|
-
TYPE_TO_RUBY(uintptr_t,
|
220
|
+
TYPE_TO_RUBY(double, cl_float, rb_float_new);
|
221
|
+
TYPE_TO_RUBY(size_t, cl_uint, UINT2NUM);
|
222
|
+
TYPE_TO_RUBY(ptrdiff_t, cl_uint, UINT2NUM);
|
223
|
+
TYPE_TO_RUBY(intptr_t, cl_uint, UINT2NUM);
|
224
|
+
TYPE_TO_RUBY(uintptr_t, cl_uint, UINT2NUM);
|
224
225
|
return Qnil;
|
225
226
|
}
|
226
227
|
|
@@ -261,172 +262,172 @@ fixnum_to_type(VALUE self, VALUE type)
|
|
261
262
|
static VALUE
|
262
263
|
type_new(VALUE klass, VALUE type)
|
263
264
|
{
|
264
|
-
return rb_funcall(rb_cType,
|
265
|
+
return rb_funcall(rb_cType, id_new, 1, type);
|
265
266
|
}
|
266
267
|
|
267
268
|
static void
|
268
|
-
|
269
|
+
free_buffer_data(struct buffer *buffer)
|
269
270
|
{
|
270
271
|
clReleaseMemObject(buffer->data);
|
271
|
-
rb_gc_mark(buffer->arr);
|
272
272
|
ruby_xfree(buffer->cachebuf);
|
273
|
-
ruby_xfree(buffer);
|
274
273
|
}
|
275
274
|
|
276
275
|
static VALUE
|
277
|
-
|
276
|
+
buffer_outvar(VALUE self)
|
278
277
|
{
|
279
|
-
|
280
|
-
buffer =
|
281
|
-
|
282
|
-
buffer->arr = Qnil;
|
283
|
-
return Data_Wrap_Struct(klass, 0, free_buffer, buffer);
|
278
|
+
GET_BUFFER();
|
279
|
+
buffer->outvar = Qtrue;
|
280
|
+
return self;
|
284
281
|
}
|
285
282
|
|
286
|
-
static
|
287
|
-
|
283
|
+
static VALUE
|
284
|
+
buffer_is_outvar(VALUE self)
|
288
285
|
{
|
289
|
-
|
290
|
-
|
291
|
-
buffer->member_size = FIX2INT(rb_hash_aref(rb_hTypes, ID2SYM(buffer->type)));
|
286
|
+
GET_BUFFER();
|
287
|
+
return buffer->outvar;
|
292
288
|
}
|
293
289
|
|
294
290
|
static VALUE
|
295
|
-
|
291
|
+
buffer_dirty(VALUE self)
|
296
292
|
{
|
297
|
-
unsigned int i, index;
|
298
|
-
unsigned long data_ptr[16]; // data buffer
|
299
|
-
|
300
293
|
GET_BUFFER();
|
301
|
-
|
302
|
-
|
303
|
-
|
304
|
-
if (buffer->
|
305
|
-
|
306
|
-
|
307
|
-
buffer->cachebuf = malloc(buffer->num_items * buffer->member_size);
|
308
|
-
|
309
|
-
for (i = 0, index = 0; i < RARRAY_LEN(buffer->arr); i++, index += buffer->member_size) {
|
310
|
-
VALUE item = RARRAY_PTR(buffer->arr)[i];
|
311
|
-
|
312
|
-
type_to_native(item, buffer->type, (void *)data_ptr);
|
313
|
-
memcpy(((int8_t*)buffer->cachebuf) + index, (void *)data_ptr, buffer->member_size);
|
314
|
-
}
|
315
|
-
|
316
|
-
return self;
|
294
|
+
if (buffer->dirty == Qtrue) return Qtrue;
|
295
|
+
if (buffer->data == NULL) return Qtrue;
|
296
|
+
if (buffer->cachebuf == NULL) return Qtrue;
|
297
|
+
if (RARRAY_LEN(self) != buffer->num_items) return Qtrue;
|
298
|
+
if (SYM2ID(rb_funcall(self, id_data_type, 0)) != buffer->type) return Qtrue;
|
299
|
+
return Qfalse;
|
317
300
|
}
|
318
301
|
|
319
302
|
static VALUE
|
320
|
-
|
303
|
+
buffer_mark_dirty(VALUE self)
|
321
304
|
{
|
322
|
-
unsigned int i, index;
|
323
|
-
|
324
305
|
GET_BUFFER();
|
325
|
-
|
326
|
-
rb_gc_mark(buffer->arr);
|
327
|
-
buffer->arr = rb_ary_new2(buffer->num_items);
|
328
|
-
|
329
|
-
for (i = 0, index = 0; i < buffer->num_items; i++, index += buffer->member_size) {
|
330
|
-
VALUE value = type_to_ruby(((int8_t*)buffer->cachebuf) + index, buffer->type);
|
331
|
-
rb_ary_push(buffer->arr, value);
|
332
|
-
}
|
333
|
-
|
334
|
-
return self;
|
306
|
+
return (buffer->dirty = Qtrue);
|
335
307
|
}
|
336
308
|
|
337
|
-
static
|
338
|
-
buffer_size_changed(
|
309
|
+
static void
|
310
|
+
buffer_size_changed(struct buffer *buffer)
|
339
311
|
{
|
340
|
-
|
341
|
-
|
342
|
-
if (buffer->data) {
|
343
|
-
clReleaseMemObject(buffer->data);
|
344
|
-
}
|
345
|
-
buffer_update_cache_info(buffer);
|
312
|
+
clReleaseMemObject(buffer->data);
|
346
313
|
buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
|
347
|
-
|
348
|
-
|
349
|
-
|
350
|
-
|
351
|
-
return self;
|
314
|
+
buffer->num_items * buffer->member_size, NULL, NULL);
|
315
|
+
ruby_xfree(buffer->cachebuf);
|
316
|
+
buffer->cachebuf = ruby_xmalloc(buffer->num_items * buffer->member_size);
|
352
317
|
}
|
353
318
|
|
354
319
|
static VALUE
|
355
|
-
|
320
|
+
buffer_update_cache(VALUE self)
|
356
321
|
{
|
357
322
|
GET_BUFFER();
|
358
|
-
|
323
|
+
|
324
|
+
if (buffer_dirty(self) == Qtrue) {
|
325
|
+
size_t old_num_items = buffer->num_items;
|
326
|
+
buffer->num_items = RARRAY_LEN(self);
|
327
|
+
buffer->type = SYM2ID(rb_funcall(self, id_data_type, 0));
|
328
|
+
buffer->member_size = FIX2INT(rb_hash_aref(rb_hTypes, ID2SYM(buffer->type)));
|
329
|
+
if (buffer->num_items != old_num_items) buffer_size_changed(buffer);
|
330
|
+
buffer->dirty = Qfalse;
|
331
|
+
return Qtrue;
|
332
|
+
}
|
333
|
+
|
334
|
+
return Qnil;
|
359
335
|
}
|
360
336
|
|
361
|
-
static
|
362
|
-
|
337
|
+
static void
|
338
|
+
print_buffer(struct buffer *buffer)
|
363
339
|
{
|
364
|
-
|
365
|
-
|
366
|
-
|
367
|
-
|
340
|
+
int i;
|
341
|
+
for (i = 0; i < buffer->num_items * buffer->member_size; i++) {
|
342
|
+
int c = (int)buffer->cachebuf[i];
|
343
|
+
if (i > 0 && i % 8 == 0) printf("\n");
|
344
|
+
printf("%2.2x ", c);
|
368
345
|
}
|
369
|
-
|
370
|
-
|
371
|
-
return buffer->arr;
|
346
|
+
printf("\n");
|
347
|
+
fflush(stdout);
|
372
348
|
}
|
373
349
|
|
374
350
|
static VALUE
|
375
|
-
|
351
|
+
buffer_write(VALUE self, cl_command_queue queue)
|
376
352
|
{
|
377
|
-
|
378
|
-
|
379
|
-
}
|
353
|
+
unsigned int i, index;
|
354
|
+
unsigned long data_ptr[16]; // data buffer
|
380
355
|
|
381
|
-
|
382
|
-
|
356
|
+
GET_BUFFER();
|
357
|
+
|
358
|
+
if (NIL_P(RARRAY_PTR(self)[0])) return Qnil;
|
359
|
+
|
360
|
+
for (i = 0, index = 0; i < buffer->num_items; i++, index += buffer->member_size) {
|
361
|
+
VALUE item = RARRAY_PTR(self)[i];
|
362
|
+
type_to_native(item, buffer->type, data_ptr);
|
363
|
+
memcpy(buffer->cachebuf + index, data_ptr, buffer->member_size);
|
383
364
|
}
|
384
|
-
|
385
|
-
|
365
|
+
|
366
|
+
if (queue != NULL) {
|
367
|
+
clEnqueueWriteBuffer(queue, buffer->data, CL_TRUE, 0,
|
368
|
+
buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
|
386
369
|
}
|
387
370
|
|
388
371
|
return self;
|
389
372
|
}
|
390
373
|
|
391
374
|
static VALUE
|
392
|
-
|
375
|
+
buffer_read(VALUE self, cl_command_queue queue)
|
393
376
|
{
|
394
|
-
|
377
|
+
unsigned int i, index;
|
378
|
+
|
395
379
|
GET_BUFFER();
|
396
380
|
|
397
|
-
|
398
|
-
|
399
|
-
if (
|
400
|
-
|
401
|
-
|
402
|
-
}
|
403
|
-
if (TYPE(size) != T_FIXNUM) {
|
404
|
-
rb_raise(rb_eArgError, "expecting buffer size as argument 2");
|
381
|
+
if (buffer->outvar != Qtrue) return Qnil;
|
382
|
+
|
383
|
+
if (queue != NULL) {
|
384
|
+
clEnqueueReadBuffer(queue, buffer->data, CL_TRUE, 0,
|
385
|
+
buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
|
405
386
|
}
|
406
387
|
|
407
|
-
buffer->
|
408
|
-
|
409
|
-
|
410
|
-
|
411
|
-
buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
|
412
|
-
buffer->member_size * buffer->num_items, NULL, NULL);
|
388
|
+
for (i = 0, index = 0; i < buffer->num_items; i++, index += buffer->member_size) {
|
389
|
+
VALUE value = type_to_ruby(buffer->cachebuf + index, buffer->type);
|
390
|
+
rb_ary_store(self, i, value);
|
391
|
+
}
|
413
392
|
|
414
393
|
return self;
|
415
394
|
}
|
416
395
|
|
417
396
|
static VALUE
|
418
|
-
|
397
|
+
array_to_outvar(VALUE self)
|
419
398
|
{
|
420
|
-
|
421
|
-
|
422
|
-
|
399
|
+
VALUE buf = rb_funcall(rb_cBuffer, id_new, 0);
|
400
|
+
rb_ary_replace(buf, self);
|
401
|
+
buffer_outvar(buf);
|
402
|
+
buffer_mark_dirty(buf);
|
403
|
+
return buf;
|
423
404
|
}
|
424
405
|
|
425
406
|
static VALUE
|
426
|
-
|
407
|
+
buffer_initialize(int argc, VALUE *argv, VALUE self)
|
427
408
|
{
|
428
|
-
|
429
|
-
|
409
|
+
VALUE buf_value;
|
410
|
+
struct buffer *buffer;
|
411
|
+
|
412
|
+
rb_call_super(argc, argv);
|
413
|
+
|
414
|
+
if (argc == 1 && TYPE(argv[0]) == T_ARRAY) {
|
415
|
+
VALUE value = rb_ivar_get(argv[0], id_data_type);
|
416
|
+
if (RTEST(value)) rb_ivar_set(self, id_data_type, value);
|
417
|
+
}
|
418
|
+
|
419
|
+
buffer = ALLOC(struct buffer);
|
420
|
+
MEMZERO(buffer, struct buffer, 1);
|
421
|
+
buffer->outvar = Qfalse;
|
422
|
+
buffer->dirty = Qtrue;
|
423
|
+
buf_value = Data_Wrap_Struct(rb_cObject, 0, free_buffer_data, buffer);
|
424
|
+
rb_ivar_set(self, id_buffer_data, buf_value);
|
425
|
+
|
426
|
+
if (RARRAY_LEN(self) > 0 && NIL_P(RARRAY_PTR(self)[0])) { /* outvar */
|
427
|
+
buffer->outvar = Qtrue;
|
428
|
+
}
|
429
|
+
|
430
|
+
return self;
|
430
431
|
}
|
431
432
|
|
432
433
|
static void
|
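Review note: the dirty check and cache refresh added in the hunk above (`buffer_dirty`, `buffer_update_cache`) can be modelled roughly as follows. This is a simplified sketch, not the extension's code: `DirtySketch` and its `reallocations` counter are hypothetical, the data-type comparison is left out, and a size counter stands in for the `cl_mem` object and `cachebuf`.

```ruby
# Hypothetical plain-Ruby model of the dirty check; no OpenCL involved.
class DirtySketch < Array
  attr_reader :reallocations

  def initialize(*args)
    super
    @dirty = true        # new buffers start dirty, as in buffer_initialize
    @cached_size = nil   # stands in for num_items and the cache buffer
    @reallocations = 0   # counts simulated buffer_size_changed calls
  end

  # Force the next update_cache call to refresh the cached metadata.
  def mark_dirty
    @dirty = true
  end

  # Dirty when flagged, never cached, or the element count has changed.
  def dirty?
    @dirty || @cached_size.nil? || size != @cached_size
  end

  # Returns true when the cache was refreshed, nil when it was current.
  def update_cache
    return nil unless dirty?
    if size != @cached_size
      @cached_size = size
      @reallocations += 1
    end
    @dirty = false
    true
  end
end

buf = DirtySketch.new([1, 2, 3])
first  = buf.update_cache  # refreshes: the buffer starts dirty
second = buf.update_cache  # nothing changed since the last refresh
buf << 4
third  = buf.update_cache  # size changed, so the cache is rebuilt
```

The point of the model is that a size change is detected without any explicit call, while `mark_dirty` remains available to force a refresh.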
@@ -500,6 +501,7 @@ program_method_missing(int argc, VALUE *argv, VALUE self)
|
|
500
501
|
size_t global[3] = {1, 1, 1}, local;
|
501
502
|
cl_kernel kernel;
|
502
503
|
cl_command_queue commands;
|
504
|
+
VALUE result;
|
503
505
|
GET_PROGRAM();
|
504
506
|
|
505
507
|
StringValue(argv[0]);
|
@@ -531,30 +533,20 @@ program_method_missing(int argc, VALUE *argv, VALUE self)
|
|
531
533
|
break;
|
532
534
|
}
|
533
535
|
|
534
|
-
if (
|
536
|
+
if (CLASS_OF(item) == rb_cArray) {
|
535
537
|
/* create buffer from arg */
|
536
|
-
|
537
|
-
item = buffer_initialize(1, &item, buf);
|
538
|
+
argv[i] = item = rb_funcall(rb_cBuffer, id_new, 1, item);
|
538
539
|
}
|
539
540
|
|
540
|
-
if (CLASS_OF(item) ==
|
541
|
-
struct buffer *buffer;
|
542
|
-
Data_Get_Struct(item, struct buffer, buffer);
|
543
|
-
err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
|
544
|
-
if (buffer->num_items > global[0]) {
|
545
|
-
global[0] = buffer->num_items;
|
546
|
-
}
|
547
|
-
}
|
548
|
-
else if (CLASS_OF(item) == rb_cBuffer) {
|
541
|
+
if (CLASS_OF(item) == rb_cBuffer) {
|
549
542
|
struct buffer *buffer;
|
550
|
-
Data_Get_Struct(item, struct buffer, buffer);
|
551
|
-
|
552
|
-
|
553
|
-
|
554
|
-
buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
|
543
|
+
Data_Get_Struct(rb_ivar_get(item, id_buffer_data), struct buffer, buffer);
|
544
|
+
|
545
|
+
buffer_update_cache(item);
|
546
|
+
buffer_write(item, commands);
|
555
547
|
err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
|
556
|
-
if (
|
557
|
-
global[0] =
|
548
|
+
if (RARRAY_LEN(item) > global[0]) {
|
549
|
+
global[0] = RARRAY_LEN(item);
|
558
550
|
}
|
559
551
|
}
|
560
552
|
else {
|
@@ -600,23 +592,28 @@ program_method_missing(int argc, VALUE *argv, VALUE self)
|
|
600
592
|
|
601
593
|
clFinish(commands);
|
602
594
|
|
595
|
+
result = rb_ary_new();
|
596
|
+
|
603
597
|
for (i = 1; i < argc; i++) {
|
604
598
|
VALUE item = argv[i];
|
605
|
-
if (CLASS_OF(item) ==
|
606
|
-
|
607
|
-
|
608
|
-
err = clEnqueueReadBuffer(commands, buffer->data, CL_TRUE, 0,
|
609
|
-
buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
|
610
|
-
if (err != CL_SUCCESS) {
|
611
|
-
CLEAN();
|
612
|
-
rb_raise(rb_eOpenCLError, "failed to read output buffer");
|
599
|
+
if (CLASS_OF(item) == rb_cBuffer) {
|
600
|
+
if (RTEST(buffer_read(item, commands))) {
|
601
|
+
rb_ary_push(result, item);
|
613
602
|
}
|
614
|
-
buffer_read(item);
|
615
603
|
}
|
616
604
|
}
|
617
605
|
|
618
606
|
CLEAN();
|
619
|
-
|
607
|
+
|
608
|
+
if (RARRAY_LEN(result) == 0) {
|
609
|
+
return Qnil;
|
610
|
+
}
|
611
|
+
else if (RARRAY_LEN(result) == 1) {
|
612
|
+
return RARRAY_PTR(result)[0];
|
613
|
+
}
|
614
|
+
else {
|
615
|
+
return result;
|
616
|
+
}
|
620
617
|
}
|
621
618
|
|
622
619
|
static void
|
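Review note: the new tail of `program_method_missing` in the hunk above collects the buffers that were read back and chooses the return value from their count. A minimal Ruby sketch of that rule, using a hypothetical helper name `kernel_result`:

```ruby
# Hypothetical helper mirroring the return-value rule in the hunk above:
# no output buffers gives nil, one gives that buffer, several give an array.
def kernel_result(outputs)
  case outputs.size
  when 0 then nil
  when 1 then outputs.first
  else outputs
  end
end

none   = kernel_result([])
single = kernel_result([[11, 12, 13]])
many   = kernel_result([[1, 2], [3, 4]])
```

This matches the README's RETURN VALUE section, where a call with a single output buffer returns that buffer directly rather than a one-element array.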
@@ -645,9 +642,10 @@ void
|
|
645
642
|
Init_barracuda()
|
646
643
|
{
|
647
644
|
id_times = rb_intern("times");
|
645
|
+
id_new = rb_intern("new");
|
648
646
|
id_to_sym = rb_intern("to_sym");
|
649
647
|
id_data_type = rb_intern("data_type");
|
650
|
-
|
648
|
+
id_buffer_data = rb_intern("buffer_data");
|
651
649
|
|
652
650
|
rb_hTypes = rb_hash_new();
|
653
651
|
rb_define_method(rb_mKernel, "Type", type_new, 1);
|
@@ -666,28 +664,19 @@ Init_barracuda()
|
|
666
664
|
rb_define_method(rb_cProgram, "compile", program_compile, 1);
|
667
665
|
rb_define_method(rb_cProgram, "method_missing", program_method_missing, -1);
|
668
666
|
|
669
|
-
rb_cBuffer = rb_define_class_under(rb_mBarracuda, "Buffer",
|
670
|
-
rb_define_alloc_func(rb_cBuffer, buffer_s_allocate);
|
667
|
+
rb_cBuffer = rb_define_class_under(rb_mBarracuda, "Buffer", rb_cArray);
|
671
668
|
rb_define_method(rb_cBuffer, "initialize", buffer_initialize, -1);
|
672
|
-
rb_define_method(rb_cBuffer, "
|
673
|
-
rb_define_method(rb_cBuffer, "
|
674
|
-
rb_define_method(rb_cBuffer, "
|
675
|
-
rb_define_method(rb_cBuffer, "
|
676
|
-
rb_define_method(rb_cBuffer, "data=", buffer_data_set, 1);
|
677
|
-
|
678
|
-
rb_cOutputBuffer = rb_define_class_under(rb_mBarracuda, "OutputBuffer", rb_cBuffer);
|
679
|
-
rb_define_method(rb_cOutputBuffer, "initialize", obuffer_initialize, 2);
|
680
|
-
rb_define_method(rb_cOutputBuffer, "size", obuffer_size, 0);
|
681
|
-
rb_define_method(rb_cOutputBuffer, "clear", obuffer_clear, 0);
|
682
|
-
rb_undef_method(rb_cOutputBuffer, "write");
|
683
|
-
rb_undef_method(rb_cOutputBuffer, "size_changed");
|
684
|
-
rb_undef_method(rb_cOutputBuffer, "data=");
|
669
|
+
rb_define_method(rb_cBuffer, "outvar", buffer_outvar, 0);
|
670
|
+
rb_define_method(rb_cBuffer, "outvar?", buffer_is_outvar, 0);
|
671
|
+
rb_define_method(rb_cBuffer, "mark_dirty", buffer_mark_dirty, 0);
|
672
|
+
rb_define_method(rb_cBuffer, "dirty?", buffer_dirty, 0);
|
685
673
|
|
686
674
|
rb_cType = rb_define_class_under(rb_mBarracuda, "Type", rb_cObject);
|
687
675
|
rb_define_method(rb_cType, "initialize", type_initialize, 1);
|
688
676
|
rb_define_method(rb_cType, "method_missing", type_method_missing, 1);
|
689
677
|
rb_define_method(rb_cType, "object", type_object, 0);
|
690
678
|
|
679
|
+
rb_define_method(rb_cArray, "outvar", array_to_outvar, 0);
|
691
680
|
rb_define_method(rb_cObject, "to_type", object_to_type, 1);
|
692
681
|
rb_define_method(rb_cFixnum, "to_type", fixnum_to_type, 1);
|
693
682
|
rb_define_method(rb_cObject, "data_type", object_data_type_get, 0);
|