barracuda 1.2 → 1.3
- data/README.md +63 -30
- data/Rakefile +11 -0
- data/benchmarks/normalize.rb +1 -1
- data/benchmarks/to_float.rb +2 -2
- data/ext/barracuda.c +162 -173
- data/test/test_buffer.rb +38 -0
- data/test/test_program.rb +206 -0
- data/test/test_types.rb +68 -0
- metadata +8 -5
- data/benchmarks/sort.rb +0 -29
- data/test/test_barracuda.rb +0 -291
data/README.md
CHANGED
@@ -33,7 +33,7 @@ Or:
 cd barracuda
 rake install
 
-
+USAGE
 -----
 
 The basic workflow behind the OpenCL architecture is:
@@ -45,12 +45,14 @@ The basic workflow behind the OpenCL architecture is:
 In Barracuda, this looks basically like:
 
 1. Create a `Barracuda::Program`
-2. Create a `Barracuda::Buffer`
+2. Create a `Barracuda::Buffer` for input and output
 2. Call the kernel method on the program with buffers as arguments
 3. Read output buffers
 
-As you can see, there are only
-
+As you can see, there are only 2 basic classes: `Program` and `Buffer`. The
+program is where you compile your OpenCL code, and the Buffer class is a
+subclass of Array that contains your data to pass in and out of the kernel
+method.
 
 EXAMPLE
 -------
@@ -63,7 +65,7 @@ Consider the following example to sum a bunch of integers:
 }
 eof
 
-output =
+output = Buffer.new(1)
 program.sum((1..65536).to_a, output)
 
 puts "The sum is: " + output.data[0].to_s
@@ -72,9 +74,8 @@ The above example will compute the sum of integers 1 to 65536 using (at most)
 65536 parallel processes and return the result in the 1-dimensional output
 buffer (which stores integers and is of length 1). The kernel method `sum`
 is called by calling the `#sum` method on the program object, and the
-arguments are passed in sequentially as the
-
-does not have the concept of array size).
+arguments are passed in sequentially as the input data (the integers)
+followed by the output buffer to store the data.
 
 We can also specify the work group size (the number of iterations we need
 to run). Barracuda automatically selects the size of the largest buffer as
@@ -83,6 +84,49 @@ manually specify the work group size, call the kernel with an options hash:
 
 program.my_kernel_method(..., :times => 512)
 
+OUTPUT BUFFERS
+--------------
+
+The Buffer class is a superset of both data to be sent and read from the OpenCL
+kernel method being called. In general, if the Buffer contains nil elements,
+it is marked as an "output buffer" and the data is read back from OpenCL after
+the kernel method executes. These nil buffers are not written to OpenCL initially,
+so they are only meant for output data. On the other hand, if the buffer contains
+regular data, it is by default considered as input data only, and the data
+is not read back after the kernel method completes.
+
+In some cases you may want to have a buffer that is both input and output and
+should be read from after the kernel method finishes. To do this, you mark the
+buffer as an `outvar` as so:
+
+program = Program.new <<-'eof'
+__kernel addN(__global int *data, int N) {
+int i = get_global_id(0);
+data[i] = data[i] + N;
+}
+eof
+
+data = [1, 2, 3]
+program.addN(data.outvar, 10)
+
+# prints: [11, 12, 13]
+p data
+
+RETURN VALUE
+------------
+
+Generally you need to pass in your output buffer as the buffer to write the
+data back to. The idiom `void method(input, output)` is common to write data to
+output buffers in languages such as C but is a rather clunky API for Ruby.
+Instead, Barracuda returns the output buffers as the result of the kernel method
+call. If there is only one output buffer, that buffer is returned as a single
+result (rather than an array of buffers).
+
+The example above could be simply rewritten as:
+
+# prints: [11, 12, 13]
+p program.addN(data.outvar, 10)
+
 CONVERTING TYPES
 ----------------
 
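The output-buffer rule added to the README in this hunk can be modelled without OpenCL. The following is a minimal pure-Ruby sketch, assuming that only the first element is inspected for `nil` (which is what the `buffer_initialize` change in `ext/barracuda.c` appears to do). `SketchBuffer` is a hypothetical stand-in used for illustration and is not `Barracuda::Buffer`.

```ruby
# Pure-Ruby model of the README's output-buffer rule; no OpenCL involved.
# SketchBuffer is a hypothetical stand-in, not Barracuda::Buffer itself.
class SketchBuffer < Array
  def initialize(*args)
    super
    # Assumption: a buffer whose first element is nil starts as an output buffer.
    @outvar = !empty? && first.nil?
  end

  # Mark a buffer that holds real data so that it is also read back.
  def outvar
    @outvar = true
    self
  end

  def outvar?
    @outvar
  end
end

input  = SketchBuffer.new([1, 2, 3])         # regular data: input only
output = SketchBuffer.new(3)                 # [nil, nil, nil]: output buffer
both   = SketchBuffer.new([1, 2, 3]).outvar  # input that is also read back

p input.outvar?   # false
p output.outvar?  # true
p both.outvar?    # true
```

One detail worth checking against the C changes: `Array#outvar` there seems to build a new `Buffer` from the array, so the kernel result may live in the returned buffer rather than in the original array.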
@@ -105,6 +149,8 @@ For example, to pass in a short, do:
 This can also be applied to an Array of shorts:
 
 program.my_kernel([1, 2, 3].to_type(:short))
+
+The default type for an array (and buffers) is :int
 
 CLASS DETAILS
 -------------
@@ -123,34 +169,21 @@ Represents an OpenCL program
 - if the last arg is a Hash, it should be an options hash with keys:
 - :times => FIXNUM (the number of iterations to run)
 
-**Barracuda::Buffer
+**Barracuda::Buffer** (extends *Array*):
 
-
+Data storage to transfer to/from an OpenCL kernel method
 
-Buffer.new(
-
-Buffer#data => accessor for the buffer data
-
-Buffer#size_changed => call this if the buffer.data was modified and the size changed
-- calls Buffer#write
+Buffer.new(buffer_array) => creates a new input buffer
+Buffer.new(size) => creates a new output buffer of size `size`
 
-Buffer#
-- flushes the buffer.data cache to the OpenCL internal memory buffer
-
-Buffer#read => reads the cached data back into buffer.data
-- refreshes the buffer.data cache according to the internal memory buffer
-
-**Barracuda::OutputBuffer**:
+Buffer#mark_dirty => call this if the data was modified between calls
 
-
+Buffer#dirty? => returns whether the buffer is marked as dirty
 
-
-
+Buffer#outvar => mark the buffer to be read as output
+
+Buffer#outvar? => returns whether buffer is marked to be read
 
-OutputBufferBuffer#data => accessor for the buffer data
-
-OutputBuffer#size => returns the buffer size
-
 GLOSSARY
 --------
 
data/Rakefile
CHANGED
@@ -1,9 +1,15 @@
 require 'rubygems'
 require 'rake/gempackagetask'
+require 'rake/testtask'
 
 WINDOWS = (PLATFORM =~ /win32|cygwin/ ? true : false) rescue false
 SUDO = WINDOWS ? '' : 'sudo'
 
+task :default => :test
+task :test => :build
+
+Rake::TestTask.new
+
 load 'barracuda.gemspec'
 Rake::GemPackageTask.new(SPEC) do |pkg|
 pkg.gem_spec = SPEC
@@ -16,3 +22,8 @@ task :install => :package do
 sh "#{SUDO} gem install pkg/#{SPEC.name}-#{SPEC.version}.gem --local"
 sh "rm -rf pkg/#{SPEC.name}-#{SPEC.version}" unless ENV['KEEP_FILES']
 end
+
+desc 'Build Barracuda'
+task :build do
+sh "cd ext && make"
+end
data/benchmarks/normalize.rb
CHANGED
data/benchmarks/to_float.rb
CHANGED
@@ -12,9 +12,9 @@ prog = Program.new <<-'eof'
 }
 eof
 
-arr = (1..
+arr = (1..333333).to_a
 input = Buffer.new(arr)
-output =
+output = Buffer.new(arr.size).to_type(:float)
 
 Benchmark.bmbm do |x|
 x.report("regular") { arr.map {|x| (x.to_f + 0.5) / 3.8 + 2.0 } }
data/ext/barracuda.c
CHANGED
@@ -4,7 +4,6 @@
 
 static VALUE rb_mBarracuda;
 static VALUE rb_cBuffer;
-static VALUE rb_cOutputBuffer;
 static VALUE rb_cProgram;
 static VALUE rb_eProgramSyntaxError;
 static VALUE rb_eOpenCLError;
@@ -13,8 +12,10 @@ static VALUE rb_hTypes;
 
 static ID id_times;
 static ID id_to_sym;
-static ID
+static ID id_new;
 static ID id_object;
+static ID id_data_type;
+static ID id_buffer_data;
 
 static ID id_type_bool;
 static ID id_type_char;
@@ -35,25 +36,25 @@ static ID id_type_uintptr_t;
 /*static ID id_type_void;*/
 
 static VALUE program_compile(VALUE self, VALUE source);
-static VALUE buffer_data_set(VALUE self, VALUE new_value);
 
 static cl_device_id device_id = NULL;
 static cl_context context = NULL;
 static size_t max_work_group_size = 65535;
 static int err;
 
-#define VERSION_STRING "1.
+#define VERSION_STRING "1.3"
 
 struct program {
 cl_program program;
 };
 
 struct buffer {
-VALUE
+VALUE dirty;
+VALUE outvar;
 ID type;
-size_t num_items;
 size_t member_size;
-
+size_t num_items;
+int8_t *cachebuf;
 cl_mem data;
 };
 
@@ -114,11 +115,12 @@ array_data_type_get(VALUE self)
 if (RTEST(value)) return value;
 
 if (RARRAY_LEN(self) > 0) {
+if (NIL_P(RARRAY_PTR(self)[0])) return ID2SYM(id_type_int);
 VALUE value = rb_funcall(RARRAY_PTR(self)[0], id_data_type, 0);
 if (RTEST(value)) return value;
 }
-
-rb_raise(
+
+rb_raise(rb_eTypeError, "unknown buffer data %s",
 RSTRING_PTR(rb_inspect(self)));
 }
 
@@ -128,7 +130,7 @@ array_data_type_get(VALUE self)
 
 #define GET_BUFFER() \
 struct buffer *buffer; \
-Data_Get_Struct(self, struct buffer, buffer);
+Data_Get_Struct(rb_ivar_get(self, id_buffer_data), struct buffer, buffer);
 
 #define TYPE_SET(type, size) \
 id_type_##type = rb_intern(#type); \
@@ -159,11 +161,11 @@ types_hash_init()
 TYPE_SET(ulong, cl_ulong);
 TYPE_SET(float, cl_float);
 TYPE_SET(half, cl_half);
-TYPE_SET(double,
-TYPE_SET(size_t,
-TYPE_SET(ptrdiff_t,
-TYPE_SET(intptr_t,
-TYPE_SET(uintptr_t,
+TYPE_SET(double, cl_float);
+TYPE_SET(size_t, cl_uint);
+TYPE_SET(ptrdiff_t, cl_uint);
+TYPE_SET(intptr_t, cl_uint);
+TYPE_SET(uintptr_t, cl_uint);
 OBJ_FREEZE(rb_hTypes);
 }
 
@@ -195,11 +197,10 @@ type_to_native(VALUE value, ID data_type, void *native_value)
 TYPE_TO_NATIVE(uint, cl_uint, NUM2UINT);
 TYPE_TO_NATIVE(long, cl_long, NUM2LONG);
 TYPE_TO_NATIVE(ulong, cl_ulong, NUM2ULONG);
-TYPE_TO_NATIVE(
-TYPE_TO_NATIVE(
-TYPE_TO_NATIVE(
-TYPE_TO_NATIVE(
-TYPE_TO_NATIVE(uintptr_t, uintptr_t, NUM2UINT);
+TYPE_TO_NATIVE(size_t, cl_uint, NUM2UINT);
+TYPE_TO_NATIVE(ptrdiff_t, cl_uint, NUM2UINT);
+TYPE_TO_NATIVE(intptr_t, cl_uint, NUM2UINT);
+TYPE_TO_NATIVE(uintptr_t, cl_uint, NUM2UINT);
 }
 
 static VALUE
@@ -216,11 +217,11 @@ type_to_ruby(void *native_value, ID data_type)
 TYPE_TO_RUBY(ulong, cl_ulong, ULONG2NUM);
 TYPE_TO_RUBY(float, cl_float, rb_float_new);
 TYPE_TO_RUBY(half, cl_half, rb_float_new);
-TYPE_TO_RUBY(double,
-TYPE_TO_RUBY(size_t,
-TYPE_TO_RUBY(ptrdiff_t,
-TYPE_TO_RUBY(intptr_t,
-TYPE_TO_RUBY(uintptr_t,
+TYPE_TO_RUBY(double, cl_float, rb_float_new);
+TYPE_TO_RUBY(size_t, cl_uint, UINT2NUM);
+TYPE_TO_RUBY(ptrdiff_t, cl_uint, UINT2NUM);
+TYPE_TO_RUBY(intptr_t, cl_uint, UINT2NUM);
+TYPE_TO_RUBY(uintptr_t, cl_uint, UINT2NUM);
 return Qnil;
 }
 
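In 1.3 the `double` type is mapped to `cl_float` in both `TYPE_SET` and `TYPE_TO_RUBY`. The macro bodies are not fully visible in these hunks, so the following is only a sketch under the assumption that a `:double` element ends up stored in a 32-bit float slot. It uses Ruby's `Array#pack('f')` as a stand-in for that conversion; `through_float32` is a hypothetical helper, not Barracuda code.

```ruby
# Round-trip a Ruby Float (a 64-bit double) through a 32-bit float slot.
# Assumption: this mirrors what storing a :double as cl_float would do.
# Array#pack('f') stands in for the C conversion; it is not Barracuda code.
def through_float32(value)
  [value].pack('f').unpack1('f')
end

original  = 0.1
roundtrip = through_float32(original)

puts(roundtrip == original)               # false: precision was lost
puts((roundtrip - original).abs < 1e-7)   # true: still close to the original
puts(through_float32(0.5) == 0.5)         # true: 0.5 is exact in both widths
```

If that assumption holds, values read back from a `:double` buffer would carry single precision only, which may matter when comparing kernel output against results computed in Ruby.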
@@ -261,172 +262,172 @@ fixnum_to_type(VALUE self, VALUE type)
 static VALUE
 type_new(VALUE klass, VALUE type)
 {
-return rb_funcall(rb_cType,
+return rb_funcall(rb_cType, id_new, 1, type);
 }
 
 static void
-
+free_buffer_data(struct buffer *buffer)
 {
 clReleaseMemObject(buffer->data);
-rb_gc_mark(buffer->arr);
 ruby_xfree(buffer->cachebuf);
-ruby_xfree(buffer);
 }
 
 static VALUE
-
+buffer_outvar(VALUE self)
 {
-
-buffer =
-
-buffer->arr = Qnil;
-return Data_Wrap_Struct(klass, 0, free_buffer, buffer);
+GET_BUFFER();
+buffer->outvar = Qtrue;
+return self;
 }
 
-static
-
+static VALUE
+buffer_is_outvar(VALUE self)
 {
-
-
-buffer->member_size = FIX2INT(rb_hash_aref(rb_hTypes, ID2SYM(buffer->type)));
+GET_BUFFER();
+return buffer->outvar;
 }
 
 static VALUE
-
+buffer_dirty(VALUE self)
 {
-unsigned int i, index;
-unsigned long data_ptr[16]; // data buffer
-
 GET_BUFFER();
-
-
-
-if (buffer->
-
-
-buffer->cachebuf = malloc(buffer->num_items * buffer->member_size);
-
-for (i = 0, index = 0; i < RARRAY_LEN(buffer->arr); i++, index += buffer->member_size) {
-VALUE item = RARRAY_PTR(buffer->arr)[i];
-
-type_to_native(item, buffer->type, (void *)data_ptr);
-memcpy(((int8_t*)buffer->cachebuf) + index, (void *)data_ptr, buffer->member_size);
-}
-
-return self;
+if (buffer->dirty == Qtrue) return Qtrue;
+if (buffer->data == NULL) return Qtrue;
+if (buffer->cachebuf == NULL) return Qtrue;
+if (RARRAY_LEN(self) != buffer->num_items) return Qtrue;
+if (SYM2ID(rb_funcall(self, id_data_type, 0)) != buffer->type) return Qtrue;
+return Qfalse;
 }
 
 static VALUE
-
+buffer_mark_dirty(VALUE self)
 {
-unsigned int i, index;
-
 GET_BUFFER();
-
-rb_gc_mark(buffer->arr);
-buffer->arr = rb_ary_new2(buffer->num_items);
-
-for (i = 0, index = 0; i < buffer->num_items; i++, index += buffer->member_size) {
-VALUE value = type_to_ruby(((int8_t*)buffer->cachebuf) + index, buffer->type);
-rb_ary_push(buffer->arr, value);
-}
-
-return self;
+return (buffer->dirty = Qtrue);
 }
 
-static
-buffer_size_changed(
+static void
+buffer_size_changed(struct buffer *buffer)
 {
-
-
-if (buffer->data) {
-clReleaseMemObject(buffer->data);
-}
-buffer_update_cache_info(buffer);
+clReleaseMemObject(buffer->data);
 buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
-
-
-
-
-return self;
+buffer->num_items * buffer->member_size, NULL, NULL);
+ruby_xfree(buffer->cachebuf);
+buffer->cachebuf = ruby_xmalloc(buffer->num_items * buffer->member_size);
 }
 
 static VALUE
-
+buffer_update_cache(VALUE self)
 {
 GET_BUFFER();
-
+
+if (buffer_dirty(self) == Qtrue) {
+size_t old_num_items = buffer->num_items;
+buffer->num_items = RARRAY_LEN(self);
+buffer->type = SYM2ID(rb_funcall(self, id_data_type, 0));
+buffer->member_size = FIX2INT(rb_hash_aref(rb_hTypes, ID2SYM(buffer->type)));
+if (buffer->num_items != old_num_items) buffer_size_changed(buffer);
+buffer->dirty = Qfalse;
+return Qtrue;
+}
+
+return Qnil;
 }
 
-static
-
+static void
+print_buffer(struct buffer *buffer)
 {
-
-
-
-
+int i;
+for (i = 0; i < buffer->num_items * buffer->member_size; i++) {
+int c = (int)buffer->cachebuf[i];
+if (i > 0 && i % 8 == 0) printf("\n");
+printf("%2.2x ", c);
 }
-
-
-return buffer->arr;
+printf("\n");
+fflush(stdout);
 }
 
 static VALUE
-
+buffer_write(VALUE self, cl_command_queue queue)
 {
-
-
-}
+unsigned int i, index;
+unsigned long data_ptr[16]; // data buffer
 
-
-
+GET_BUFFER();
+
+if (NIL_P(RARRAY_PTR(self)[0])) return Qnil;
+
+for (i = 0, index = 0; i < buffer->num_items; i++, index += buffer->member_size) {
+VALUE item = RARRAY_PTR(self)[i];
+type_to_native(item, buffer->type, data_ptr);
+memcpy(buffer->cachebuf + index, data_ptr, buffer->member_size);
 }
-
-
+
+if (queue != NULL) {
+clEnqueueWriteBuffer(queue, buffer->data, CL_TRUE, 0,
+buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
 }
 
 return self;
 }
 
 static VALUE
-
+buffer_read(VALUE self, cl_command_queue queue)
 {
-
+unsigned int i, index;
+
 GET_BUFFER();
 
-
-
-if (
-
-
-}
-if (TYPE(size) != T_FIXNUM) {
-rb_raise(rb_eArgError, "expecting buffer size as argument 2");
+if (buffer->outvar != Qtrue) return Qnil;
+
+if (queue != NULL) {
+clEnqueueReadBuffer(queue, buffer->data, CL_TRUE, 0,
+buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
 }
 
-buffer->
-
-
-
-buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
-buffer->member_size * buffer->num_items, NULL, NULL);
+for (i = 0, index = 0; i < buffer->num_items; i++, index += buffer->member_size) {
+VALUE value = type_to_ruby(buffer->cachebuf + index, buffer->type);
+rb_ary_store(self, i, value);
+}
 
 return self;
 }
 
 static VALUE
-
+array_to_outvar(VALUE self)
 {
-
-
-
+VALUE buf = rb_funcall(rb_cBuffer, id_new, 0);
+rb_ary_replace(buf, self);
+buffer_outvar(buf);
+buffer_mark_dirty(buf);
+return buf;
 }
 
 static VALUE
-
+buffer_initialize(int argc, VALUE *argv, VALUE self)
 {
-
-
+VALUE buf_value;
+struct buffer *buffer;
+
+rb_call_super(argc, argv);
+
+if (argc == 1 && TYPE(argv[0]) == T_ARRAY) {
+VALUE value = rb_ivar_get(argv[0], id_data_type);
+if (RTEST(value)) rb_ivar_set(self, id_data_type, value);
+}
+
+buffer = ALLOC(struct buffer);
+MEMZERO(buffer, struct buffer, 1);
+buffer->outvar = Qfalse;
+buffer->dirty = Qtrue;
+buf_value = Data_Wrap_Struct(rb_cObject, 0, free_buffer_data, buffer);
+rb_ivar_set(self, id_buffer_data, buf_value);
+
+if (RARRAY_LEN(self) > 0 && NIL_P(RARRAY_PTR(self)[0])) { /* outvar */
+buffer->outvar = Qtrue;
+}
+
+return self;
 }
 
 static void
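The new `buffer_dirty` function decides whether the cached native copy of a buffer has to be rebuilt before a kernel call. Its checks can be restated as a small pure-Ruby model. `SketchCache`, its fields, and the `dirty?` helper are hypothetical names chosen for this illustration; the `allocated` field merges the two separate `NULL` checks on `data` and `cachebuf` into one flag.

```ruby
# Pure-Ruby model of the checks in buffer_dirty(); SketchCache and its
# fields are hypothetical stand-ins for the C `struct buffer`.
SketchCache = Struct.new(:dirty, :allocated, :num_items, :type)

# Returns true when the cached native copy would need to be rebuilt.
def dirty?(array, type, cache)
  return true if cache.dirty                      # explicitly marked dirty
  return true unless cache.allocated              # no device buffer or cache yet
  return true if array.length != cache.num_items  # element count changed
  return true if type != cache.type               # element type changed
  false
end

cache = SketchCache.new(false, true, 3, :int)
p dirty?([1, 2, 3], :int, cache)     # false
p dirty?([1, 2, 3, 4], :int, cache)  # true
p dirty?([1, 2, 3], :float, cache)   # true
```

Note that none of these checks looks at element values, which is presumably why the README asks callers to use `Buffer#mark_dirty` when data is modified between calls.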
@@ -500,6 +501,7 @@ program_method_missing(int argc, VALUE *argv, VALUE self)
 size_t global[3] = {1, 1, 1}, local;
 cl_kernel kernel;
 cl_command_queue commands;
+VALUE result;
 GET_PROGRAM();
 
 StringValue(argv[0]);
@@ -531,30 +533,20 @@ program_method_missing(int argc, VALUE *argv, VALUE self)
 break;
 }
 
-if (
+if (CLASS_OF(item) == rb_cArray) {
 /* create buffer from arg */
-
-item = buffer_initialize(1, &item, buf);
+argv[i] = item = rb_funcall(rb_cBuffer, id_new, 1, item);
 }
 
-if (CLASS_OF(item) ==
-struct buffer *buffer;
-Data_Get_Struct(item, struct buffer, buffer);
-err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
-if (buffer->num_items > global[0]) {
-global[0] = buffer->num_items;
-}
-}
-else if (CLASS_OF(item) == rb_cBuffer) {
+if (CLASS_OF(item) == rb_cBuffer) {
 struct buffer *buffer;
-Data_Get_Struct(item, struct buffer, buffer);
-
-
-
-buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
+Data_Get_Struct(rb_ivar_get(item, id_buffer_data), struct buffer, buffer);
+
+buffer_update_cache(item);
+buffer_write(item, commands);
 err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
-if (
-global[0] =
+if (RARRAY_LEN(item) > global[0]) {
+global[0] = RARRAY_LEN(item);
 }
 }
 else {
@@ -600,23 +592,28 @@ program_method_missing(int argc, VALUE *argv, VALUE self)
 
 clFinish(commands);
 
+result = rb_ary_new();
+
 for (i = 1; i < argc; i++) {
 VALUE item = argv[i];
-if (CLASS_OF(item) ==
-
-
-err = clEnqueueReadBuffer(commands, buffer->data, CL_TRUE, 0,
-buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
-if (err != CL_SUCCESS) {
-CLEAN();
-rb_raise(rb_eOpenCLError, "failed to read output buffer");
+if (CLASS_OF(item) == rb_cBuffer) {
+if (RTEST(buffer_read(item, commands))) {
+rb_ary_push(result, item);
 }
-buffer_read(item);
 }
 }
 
 CLEAN();
-
+
+if (RARRAY_LEN(result) == 0) {
+return Qnil;
+}
+else if (RARRAY_LEN(result) == 1) {
+return RARRAY_PTR(result)[0];
+}
+else {
+return result;
+}
 }
 
 static void
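The tail of `program_method_missing` now collects every buffer that was read back and collapses the result depending on how many there are. A minimal Ruby restatement of that three-way rule, with `collapse_outputs` as a hypothetical helper name that does not exist in Barracuda:

```ruby
# Model of the return-value rule at the end of program_method_missing.
# collapse_outputs is a hypothetical helper name, not part of Barracuda.
def collapse_outputs(outputs)
  case outputs.length
  when 0 then nil            # no buffer was read back
  when 1 then outputs.first  # a single output is returned bare
  else outputs               # several outputs come back as an array
  end
end

p collapse_outputs([])              # nil
p collapse_outputs([[11, 12, 13]])  # [11, 12, 13]
p collapse_outputs([[1], [2]])      # [[1], [2]]
```

The `nil` case is not mentioned in the README text for this release, so callers that pass only input buffers should probably not rely on the kernel call's return value.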
@@ -645,9 +642,10 @@ void
 Init_barracuda()
 {
 id_times = rb_intern("times");
+id_new = rb_intern("new");
 id_to_sym = rb_intern("to_sym");
 id_data_type = rb_intern("data_type");
-
+id_buffer_data = rb_intern("buffer_data");
 
 rb_hTypes = rb_hash_new();
 rb_define_method(rb_mKernel, "Type", type_new, 1);
@@ -666,28 +664,19 @@ Init_barracuda()
 rb_define_method(rb_cProgram, "compile", program_compile, 1);
 rb_define_method(rb_cProgram, "method_missing", program_method_missing, -1);
 
-rb_cBuffer = rb_define_class_under(rb_mBarracuda, "Buffer",
-rb_define_alloc_func(rb_cBuffer, buffer_s_allocate);
+rb_cBuffer = rb_define_class_under(rb_mBarracuda, "Buffer", rb_cArray);
 rb_define_method(rb_cBuffer, "initialize", buffer_initialize, -1);
-rb_define_method(rb_cBuffer, "
-rb_define_method(rb_cBuffer, "
-rb_define_method(rb_cBuffer, "
-rb_define_method(rb_cBuffer, "
-rb_define_method(rb_cBuffer, "data=", buffer_data_set, 1);
-
-rb_cOutputBuffer = rb_define_class_under(rb_mBarracuda, "OutputBuffer", rb_cBuffer);
-rb_define_method(rb_cOutputBuffer, "initialize", obuffer_initialize, 2);
-rb_define_method(rb_cOutputBuffer, "size", obuffer_size, 0);
-rb_define_method(rb_cOutputBuffer, "clear", obuffer_clear, 0);
-rb_undef_method(rb_cOutputBuffer, "write");
-rb_undef_method(rb_cOutputBuffer, "size_changed");
-rb_undef_method(rb_cOutputBuffer, "data=");
+rb_define_method(rb_cBuffer, "outvar", buffer_outvar, 0);
+rb_define_method(rb_cBuffer, "outvar?", buffer_is_outvar, 0);
+rb_define_method(rb_cBuffer, "mark_dirty", buffer_mark_dirty, 0);
+rb_define_method(rb_cBuffer, "dirty?", buffer_dirty, 0);
 
 rb_cType = rb_define_class_under(rb_mBarracuda, "Type", rb_cObject);
 rb_define_method(rb_cType, "initialize", type_initialize, 1);
 rb_define_method(rb_cType, "method_missing", type_method_missing, 1);
 rb_define_method(rb_cType, "object", type_object, 0);
 
+rb_define_method(rb_cArray, "outvar", array_to_outvar, 0);
 rb_define_method(rb_cObject, "to_type", object_to_type, 1);
 rb_define_method(rb_cFixnum, "to_type", fixnum_to_type, 1);
 rb_define_method(rb_cObject, "data_type", object_data_type_get, 0);