barracuda 1.0

data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2009 Loren Segal
2
+
3
+ Permission is hereby granted, free of charge, to any person
4
+ obtaining a copy of this software and associated documentation
5
+ files (the "Software"), to deal in the Software without
6
+ restriction, including without limitation the rights to use,
7
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ copies of the Software, and to permit persons to whom the
9
+ Software is furnished to do so, subject to the following
10
+ conditions:
11
+
12
+ The above copyright notice and this permission notice shall be
13
+ included in all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
17
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
19
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
20
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
21
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22
+ OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,163 @@
1
+ Barracuda
2
+ =========
3
+
4
+ Written by Loren Segal in 2009.
5
+
6
+ SYNOPSIS
7
+ --------
8
+
9
+ Barracuda is a Ruby wrapper library for the [OpenCL][1] architecture. OpenCL is a
10
+ framework for multi-processor computing, most notably allowing a programmer
11
+ to run parallel programs on a GPU, taking advantage of the many cores
12
+ available.
13
+
14
+ Barracuda aims to abstract both CUDA and OpenCL; for now, however, only OpenCL
15
+ on OSX 10.6 is supported. Patches to extend this support would be joyously
16
+ accepted!
17
+
18
+ Also note that Barracuda currently supports only two data types: ints and
19
+ floats. This should also be expanded.
20
+
21
+ INSTALLING
22
+ ----------
23
+
24
+ As mentioned above, this library currently only supports OSX 10.6 (or an earlier
25
+ version with the OpenCL framework, if that's even possible). If you manage to
26
+ mess with the source and get it working on [insert system here], please submit
27
+ your patches.
28
+
29
+ Okay, assuming you have a compatible machine:
30
+
31
+ sudo gem install barracuda
32
+
33
+ Or:
34
+
35
+ git clone git://github.com/lsegal/barracuda
36
+ cd barracuda
37
+ rake install
38
+
39
+ USING
40
+ -----
41
+
42
+ The basic workflow behind the OpenCL architecture is:
43
+
44
+ 1. Create a program (and kernel) to be run on the GPU's many cores.
45
+ 2. Create input/output buffers to pass data from Ruby to the GPU and back.
46
+ 3. Read the output buffer(s) to get your computed data.
47
+
48
+ In Barracuda, this looks basically like:
49
+
50
+ 1. Create a `Barracuda::Program`
51
+ 2. Create a `Barracuda::Buffer` or `Barracuda::OutputBuffer`
52
+ 3. Call the kernel method on the program with buffers as arguments
53
+ 4. Read output buffers
54
+
55
+ As you can see, there are only 3 basic classes: `Program`, `Buffer` (for input
56
+ data), and `OutputBuffer` (for output data).
57
+
58
+ EXAMPLE
59
+ -------
60
+
61
+ Consider the following example to sum a bunch of integers:
62
+
63
+ program = Program.new <<-'eof'
64
+ __kernel void sum(__global int *out, __global int *in, int total) {
65
+ int id = get_global_id(0);
66
+ if (id < total) atom_add(&out[0], in[id]);
67
+ }
68
+ eof
69
+
70
+ arr = (1..65536).to_a
71
+ input = Buffer.new(arr)
72
+ output = OutputBuffer.new(:int, 1)
73
+ program.sum(output, input, arr.size)
74
+
75
+ puts "The sum is: " + output.data[0].to_s
76
+
77
+ The above example will compute the sum of integers 1 to 65536 using (at most)
78
+ 65536 parallel processes and return the result in the 1-dimensional output
79
+ buffer (which stores integers and is of length 1). The kernel method `sum`
80
+ is called by calling the `#sum` method on the program object, and the
81
+ arguments are passed in sequentially as the output buffer, followed by the
82
+ input data (the integers) followed by the total size of the input (since C
83
+ does not have the concept of array size).
84
+
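A quick CPU-side check of the arithmetic in the example above may be useful. The sketch below does not use Barracuda or OpenCL at all; it only computes the expected sum in plain Ruby and compares it with the range of a signed 32-bit `int`, which is the element type of the output buffer. The helper name `wrap_int32` is hypothetical, and the two's-complement wraparound it models is an assumption about how the device would treat an overflowing add, not something the README states.

```ruby
# CPU reference for the README example; no Barracuda or OpenCL involved.
arr = (1..65536).to_a
expected = arr.inject(0) { |acc, el| acc + el } # n(n+1)/2 with n = 65536

int32_max = 2**31 - 1

# Hypothetical helper: model two's-complement wraparound to a signed
# 32-bit value (an assumption about the device's overflow behaviour).
def wrap_int32(value)
  ((value + 2**31) % 2**32) - 2**31
end

puts "exact sum:           #{expected}"
puts "fits in signed int?  #{expected <= int32_max}"
puts "value after wrap:    #{wrap_int32(expected)}"

# A slightly smaller range stays inside the signed 32-bit range.
safe = (1..65535).inject(0) { |acc, el| acc + el }
puts "1..65535 sums to #{safe}, fits: #{safe <= int32_max}"
```

The exact sum of 1 to 65536 is slightly larger than the largest signed 32-bit value, so a result read from an `:int` output buffer may not match the CPU value for this particular range. Summing 1 to 65535 avoids the issue.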
85
+ We can also specify the work group size (the number of iterations we need
86
+ to run). Barracuda automatically selects the size of the largest buffer as
87
+ the work group size, but in some cases this may be too small or too large. To
88
+ manually specify the work group size, call the kernel with an options hash:
89
+
90
+ program.my_kernel_method(..., :worker_size => 512)
91
+
92
+ Note that the work group size must be a power of 2. Barracuda will increase
93
+ the work group size to the next power of 2 if it needs to. This means your
94
+ OpenCL program might run more iterations of your kernel method than you
95
+ request. Because we can't rely on the work group size, we pass in the total
96
+ data size to ensure we do not exceed the bounds of our data.
97
+
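The rounding described above can be sketched in plain Ruby. This mirrors the loop in `ext/barracuda.c` (start at 4, double until the requested size is covered, never go below the local size); the method names `rounded_global_size` and `active_ids` are hypothetical, and the default `local = 1` is an assumption made so the sketch runs without a device.

```ruby
# Mirrors the rounding loop in ext/barracuda.c: start at 4, double until the
# requested size is covered, and never go below the device's local size.
def rounded_global_size(requested, local = 1)
  size = 4
  size *= 2 while size < requested
  size < local ? local : size
end

# Simulate which work-item ids pass the `id < total` guard in a kernel.
def active_ids(global, total)
  (0...global).select { |id| id < total }
end

global = rounded_global_size(517)
puts "requested 517, launched #{global}"
puts "work items doing real work: #{active_ids(global, 517).size}"
puts "work items skipped by the guard: #{global - 517}"
```

With 517 elements the kernel is launched for 1024 work items, which is why every kernel in the README and the tests takes a `total` argument and checks `id < total` before touching the buffers.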
98
+ CLASS DETAILS
99
+ -------------
100
+
101
+ **Barracuda::Program**:
102
+
103
+ Represents an OpenCL program
104
+
105
+ Program.new(PROGRAM_SOURCE) => creates a new program
106
+
107
+ Program#compile(SOURCE) => recompiles a program
108
+
109
+ Program#KERNEL_METHOD(*args) => runs KERNEL_METHOD in the compiled program
110
+ - args should be the arguments defined in the kernel method.
111
+ - supported argument types are Float and Fixnum objects, plus Buffer and OutputBuffer instances.
112
+ - if the last arg is a Hash, it should be an options hash with keys:
113
+ - :worker_size => FIXNUM (the number of iterations to run)
114
+
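The handling of a trailing options hash can be pictured with a short pure-Ruby sketch. It is modelled on the checks in `program_method_missing` in `ext/barracuda.c` and does not call Barracuda; `split_kernel_args` is a hypothetical helper name, and `Integer` stands in for `Fixnum`, which newer Ruby versions no longer define.

```ruby
# Hypothetical helper: separate kernel arguments from a trailing options hash.
# Returns [kernel_args, worker_size]; worker_size is nil when no hash is given.
def split_kernel_args(*args)
  return [args, nil] unless args.last.is_a?(Hash)

  opts = args.pop
  size = opts[:worker_size]
  unless size.is_a?(Integer)
    raise ArgumentError,
          "opts hash must be {:worker_size => INT_VALUE}, got #{opts.inspect}"
  end
  [args, size]
end

kernel_args, worker_size = split_kernel_args(1, 2.5, :worker_size => 512)
puts "kernel args: #{kernel_args.inspect}, worker size: #{worker_size}"
```

A hash without a valid `:worker_size` raises `ArgumentError`, which matches what `test_program_invalid_worker_size` expects from the real extension.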
115
+ **Barracuda::Buffer**:
116
+
117
+ Stores data to be sent to an OpenCL kernel method
118
+
119
+ Buffer.new(*buffer_data) => creates a new input buffer
120
+
121
+ Buffer#data => accessor for the buffer data
122
+
123
+ Buffer#size_changed => call this if the buffer.data was modified and the size changed
124
+ - calls Buffer#write
125
+
126
+ Buffer#write => call this if the buffer.data was modified (size not changed)
127
+ - flushes the buffer.data cache to the OpenCL internal memory buffer
128
+
129
+ Buffer#read => reads the cached data back into buffer.data
130
+ - refreshes the buffer.data cache according to the internal memory buffer
131
+
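The relationship between `Buffer#data`, `Buffer#write` and `Buffer#read` can be modelled with a toy class. `ToyBuffer` is a hypothetical stand-in, not Barracuda's class: it replaces the native cache with a second Ruby array, on the assumption that `write` copies the Ruby array into the cache and `read` rebuilds the Ruby array from it, as the C source appears to do.

```ruby
# Toy stand-in for Barracuda::Buffer; the "cache" here is just another array.
class ToyBuffer
  attr_reader :data

  def initialize(*items)
    @data = items.flatten
    write
  end

  # Copy the Ruby array into the (simulated) native cache.
  def write
    @cache = @data.dup
    self
  end

  # Rebuild the Ruby array from the (simulated) native cache.
  def read
    @data = @cache.dup
    self
  end
end

b = ToyBuffer.new(4, 2, 3)
b.data[0] = 1        # modify the Ruby side only
b.read               # discard the change by re-reading the cache
puts b.data.inspect

b.data[0] = 7
b.write              # flush the change into the cache
b.read
puts b.data.inspect
```

Changes made to `data` are lost on `read` unless `write` was called first, which is the behaviour `test_buffer_read` and `test_buffer_write` check against the real extension.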
132
+ **Barracuda::OutputBuffer**:
133
+
134
+ Holds a buffer for data written from the kernel method.
135
+
136
+ OutputBuffer.new(type, size) => creates a new output buffer
137
+ - type can be :float or :int
138
+
139
+ OutputBuffer#data => accessor for the buffer data
140
+
141
+ OutputBuffer#size => returns the buffer size
142
+
143
+ GLOSSARY
144
+ --------
145
+
146
+ * **Program**: an OpenCL program is generally created from a variant of C that
147
+ has extra domain specific keywords. A program has at least one "kernel"
148
+ method, but can have many regular methods.
149
+
150
+ * **Kernel**: a special "entry" method in the program that is exposed to the
151
+ programmer to be called via the OpenCL framework. A kernel method is
152
+ represented by the `__kernel` keyword before the method body.
153
+
154
+ * **Buffer**: memory storage which is accessible to (and generally shared with)
155
+ the program. Buffers are usually marked with the `__global` keyword in an
156
+ OpenCL program.
157
+
158
+ COPYRIGHT & LICENSING
159
+ ---------------------
160
+
161
+ Copyright 2009 Loren Segal, licensed under the MIT License
162
+
163
+ [1]: http://en.wikipedia.ca/wiki/OpenCL "OpenCL"
data/Rakefile ADDED
@@ -0,0 +1,18 @@
1
+ require 'rubygems'
2
+ require 'rake/gempackagetask'
3
+
4
+ WINDOWS = (RUBY_PLATFORM =~ /win32|cygwin/ ? true : false) rescue false
5
+ SUDO = WINDOWS ? '' : 'sudo'
6
+
7
+ load 'barracuda.gemspec'
8
+ Rake::GemPackageTask.new(SPEC) do |pkg|
9
+ pkg.gem_spec = SPEC
10
+ pkg.need_zip = true
11
+ pkg.need_tar = true
12
+ end
13
+
14
+ desc "Install the gem locally"
15
+ task :install => :package do
16
+ sh "#{SUDO} gem install pkg/#{SPEC.name}-#{SPEC.version}.gem --local"
17
+ sh "rm -rf pkg/#{SPEC.name}-#{SPEC.version}" unless ENV['KEEP_FILES']
18
+ end
data/benchmarks/to_float.rb ADDED
@@ -0,0 +1,24 @@
1
+ $:.unshift(File.dirname(__FILE__) + '/../ext')
2
+
3
+ require 'barracuda'
4
+ require 'benchmark'
5
+
6
+ include Barracuda
7
+
8
+ prog = Program.new <<-'eof'
9
+ __kernel void sum(__global float *out, __global int *in, int total) {
10
+ int i = get_global_id(0);
11
+ if (i < total) out[i] = ((float)in[i] + 0.5) / 3.8 + 2.0;
12
+ }
13
+ eof
14
+
15
+ arr = (1..3333333).to_a
16
+ input = Buffer.new(arr)
17
+ output = OutputBuffer.new(:float, arr.size)
18
+
19
+ TIMES = 1
20
+ Benchmark.bmbm do |x|
21
+ x.report("cpu") { TIMES.times { arr.map {|n| (n.to_f + 0.5) / 3.8 + 2.0 } } }
22
+ x.report("gpu") { TIMES.times { prog.sum(output, input, arr.size); output.clear } }
23
+ end
24
+
data/ext/barracuda.c ADDED
@@ -0,0 +1,481 @@
1
+ #include <ruby.h>
2
+ #include <OpenCL/OpenCL.h>
3
+
4
+ static VALUE rb_mBarracuda;
5
+ static VALUE rb_cBuffer;
6
+ static VALUE rb_cOutputBuffer;
7
+ static VALUE rb_cProgram;
8
+ static VALUE rb_eProgramSyntaxError;
9
+ static VALUE rb_eOpenCLError;
10
+
11
+ static ID ba_worker_size;
12
+
13
+ static VALUE program_compile(VALUE self, VALUE source);
14
+ static VALUE buffer_data_set(VALUE self, VALUE new_value);
15
+
16
+ static cl_device_id device_id = NULL;
17
+ static cl_context context = NULL;
18
+ static int err;
19
+
20
+ #define BUFFER_TYPE_FLOAT 0x0001
21
+ #define BUFFER_TYPE_INT 0x0002
22
+ #define BUFFER_TYPE_CHAR 0x0003
23
+
24
+ struct program {
25
+ cl_program program;
26
+ };
27
+
28
+ struct kernel {
29
+ cl_kernel kernel;
30
+ };
31
+
32
+ struct buffer {
33
+ VALUE arr;
34
+ unsigned int type;
35
+ size_t num_items;
36
+ size_t member_size;
37
+ void *cachebuf;
38
+ cl_mem data;
39
+ };
40
+
41
+ #define GET_PROGRAM() \
42
+ struct program *program; \
43
+ Data_Get_Struct(self, struct program, program);
44
+
45
+ #define GET_BUFFER() \
46
+ struct buffer *buffer; \
47
+ Data_Get_Struct(self, struct buffer, buffer);
48
+
49
+ static void
50
+ init_opencl()
51
+ {
52
+ if (device_id == NULL) {
53
+ err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
54
+ if (err != CL_SUCCESS) {
55
+ rb_raise(rb_eOpenCLError, "failed to create a device group");
56
+ }
57
+ }
58
+
59
+ if (context == NULL) {
60
+ context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
61
+ if (!context) {
62
+ rb_raise(rb_eOpenCLError, "failed to create a program context");
63
+ }
64
+ }
65
+ }
66
+
67
+ static void
68
+ free_buffer(struct buffer *buffer)
69
+ {
70
+ fflush(stdout);
71
+ clReleaseMemObject(buffer->data);
72
+ rb_gc_mark(buffer->arr);
73
+ ruby_xfree(buffer->cachebuf);
74
+ ruby_xfree(buffer);
75
+ }
76
+
77
+ static VALUE
78
+ buffer_s_allocate(VALUE klass)
79
+ {
80
+ struct buffer *buffer;
81
+ buffer = ALLOC(struct buffer);
82
+ MEMZERO(buffer, struct buffer, 1);
83
+ buffer->arr = Qnil;
84
+ return Data_Wrap_Struct(klass, 0, free_buffer, buffer);
85
+ }
86
+
87
+ static void
88
+ buffer_update_cache_info(struct buffer *buffer)
89
+ {
90
+ buffer->num_items = RARRAY_LEN(buffer->arr);
91
+
92
+ switch (TYPE(RARRAY_PTR(buffer->arr)[0])) {
93
+ case T_FIXNUM:
94
+ buffer->type = BUFFER_TYPE_INT;
95
+ buffer->member_size = sizeof(int);
96
+ break;
97
+ case T_FLOAT:
98
+ buffer->type = BUFFER_TYPE_FLOAT;
99
+ buffer->member_size = sizeof(float);
100
+ break;
101
+ default:
102
+ rb_raise(rb_eRuntimeError, "invalid buffer data %s",
103
+ RSTRING_PTR(rb_inspect(buffer->arr)));
104
+ }
105
+ }
106
+
107
+ static VALUE
108
+ buffer_write(VALUE self)
109
+ {
110
+ unsigned int i;
111
+
112
+ GET_BUFFER();
113
+
114
+ buffer_update_cache_info(buffer);
115
+
116
+ if (buffer->cachebuf) {
117
+ xfree(buffer->cachebuf);
118
+ }
119
+ buffer->cachebuf = xmalloc(buffer->num_items * buffer->member_size);
120
+
121
+ for (i = 0; i < RARRAY_LEN(buffer->arr); i++) {
122
+ VALUE item = RARRAY_PTR(buffer->arr)[i];
123
+ switch (buffer->type) {
124
+ case BUFFER_TYPE_INT: {
125
+ int value = FIX2INT(item);
126
+ ((int *)buffer->cachebuf)[i] = value;
127
+ break;
128
+ }
129
+ case BUFFER_TYPE_FLOAT: {
130
+ float value = RFLOAT_VALUE(item);
131
+ ((float *)buffer->cachebuf)[i] = value;
132
+ break;
133
+ }
134
+ default:
135
+ ((uint32_t *)buffer->cachebuf)[i] = 0;
136
+ }
137
+ }
138
+
139
+ return self;
140
+ }
141
+
142
+ static VALUE
143
+ buffer_read(VALUE self)
144
+ {
145
+ unsigned int i;
146
+
147
+ GET_BUFFER();
148
+
149
+ rb_gc_mark(buffer->arr);
150
+ buffer->arr = rb_ary_new2(buffer->num_items);
151
+
152
+ for (i = 0; i < buffer->num_items; i++) {
153
+ switch (buffer->type) {
154
+ case BUFFER_TYPE_INT:
155
+ rb_ary_push(buffer->arr, INT2NUM(((int *)buffer->cachebuf)[i]));
156
+ break;
157
+ case BUFFER_TYPE_FLOAT:
158
+ rb_ary_push(buffer->arr, rb_float_new(((float *)buffer->cachebuf)[i]));
159
+ break;
160
+ default:
161
+ rb_ary_push(buffer->arr, Qnil);
162
+ }
163
+ }
164
+
165
+ return self;
166
+ }
167
+
168
+ static VALUE
169
+ buffer_size_changed(VALUE self)
170
+ {
171
+ GET_BUFFER();
172
+
173
+ if (buffer->data) {
174
+ clReleaseMemObject(buffer->data);
175
+ }
176
+ buffer_update_cache_info(buffer);
177
+ buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
178
+ buffer->num_items * buffer->member_size, NULL, NULL);
179
+
180
+ buffer_write(self);
181
+
182
+ return self;
183
+ }
184
+
185
+ static VALUE
186
+ buffer_data(VALUE self)
187
+ {
188
+ GET_BUFFER();
189
+ return buffer->arr;
190
+ }
191
+
192
+ static VALUE
193
+ buffer_data_set(VALUE self, VALUE new_value)
194
+ {
195
+ GET_BUFFER();
196
+
197
+ if (RTEST(buffer->arr)) {
198
+ rb_gc_mark(buffer->arr);
199
+ }
200
+ buffer->arr = new_value;
201
+ buffer_size_changed(self);
202
+ return buffer->arr;
203
+ }
204
+
205
+ static VALUE
206
+ buffer_initialize(int argc, VALUE *argv, VALUE self)
207
+ {
208
+ GET_BUFFER();
209
+
210
+ if (argc == 0) {
211
+ rb_raise(rb_eArgError, "no buffer data given");
212
+ }
213
+
214
+ if (TYPE(argv[0]) == T_ARRAY) {
215
+ buffer_data_set(self, argv[0]);
216
+ }
217
+ else {
218
+ buffer_data_set(self, rb_ary_new4(argc, argv));
219
+ }
220
+
221
+ return self;
222
+ }
223
+
224
+ static VALUE
225
+ obuffer_initialize(VALUE self, VALUE type, VALUE size)
226
+ {
227
+ GET_BUFFER();
228
+
229
+ StringValue(type);
230
+ if (strcmp(RSTRING_PTR(type), "float") == 0) {
231
+ buffer->type = BUFFER_TYPE_FLOAT;
232
+ buffer->member_size = sizeof(float);
233
+ }
234
+ else if (strcmp(RSTRING_PTR(type), "int") == 0) {
235
+ buffer->type = BUFFER_TYPE_INT;
236
+ buffer->member_size = sizeof(int);
237
+ }
238
+ else {
239
+ rb_raise(rb_eArgError, "type can only be :float or :int");
240
+ }
241
+
242
+ if (TYPE(size) != T_FIXNUM) {
243
+ rb_raise(rb_eArgError, "expecting buffer size as argument 2");
244
+ }
245
+
246
+ buffer->num_items = FIX2UINT(size);
247
+ buffer->cachebuf = xmalloc(buffer->num_items * buffer->member_size);
248
+ buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
249
+ buffer->member_size * buffer->num_items, NULL, NULL);
250
+
251
+ return self;
252
+ }
253
+
254
+ static VALUE
255
+ obuffer_clear(VALUE self)
256
+ {
257
+ GET_BUFFER();
258
+ memset(buffer->cachebuf, 0, buffer->member_size * buffer->num_items);
259
+ return self;
260
+ }
261
+
262
+ static VALUE
263
+ obuffer_size(VALUE self)
264
+ {
265
+ GET_BUFFER();
266
+ return INT2FIX(buffer->num_items);
267
+ }
268
+
269
+ static void
270
+ free_program(struct program *program)
271
+ {
272
+ clReleaseProgram(program->program);
273
+ xfree(program);
274
+ }
275
+
276
+ static VALUE
277
+ program_s_allocate(VALUE klass)
278
+ {
279
+ struct program *program;
280
+ program = ALLOC(struct program);
281
+ MEMZERO(program, struct program, 1);
282
+ return Data_Wrap_Struct(klass, 0, free_program, program);
283
+ }
284
+
285
+ static VALUE
286
+ program_initialize(int argc, VALUE *argv, VALUE self)
287
+ {
288
+ VALUE source;
289
+
290
+ rb_scan_args(argc, argv, "01", &source);
291
+ if (source != Qnil) {
292
+ program_compile(self, source);
293
+ }
294
+
295
+ return self;
296
+ }
297
+
298
+ static VALUE
299
+ program_compile(VALUE self, VALUE source)
300
+ {
301
+ const char *c_source;
302
+ GET_PROGRAM();
303
+ StringValue(source);
304
+
305
+ if (program->program) {
306
+ clReleaseProgram(program->program);
307
+ program->program = 0;
308
+ }
309
+
310
+ c_source = StringValueCStr(source);
311
+ program->program = clCreateProgramWithSource(context, 1, &c_source, NULL, &err);
312
+ if (!program->program) {
313
+ program->program = 0;
314
+ rb_raise(rb_eOpenCLError, "failed to create compute program");
315
+ }
316
+
317
+ err = clBuildProgram(program->program, 0, NULL, NULL, NULL, NULL);
318
+ if (err != CL_SUCCESS) {
319
+ size_t len;
320
+ char buffer[2048];
321
+
322
+ clGetProgramBuildInfo(program->program, device_id, CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
323
+ clReleaseProgram(program->program);
324
+ program->program = 0;
325
+ rb_raise(rb_eProgramSyntaxError, "%s", buffer);
326
+ }
327
+
328
+ return Qtrue;
329
+ }
330
+
331
+ #define CLEAN() program_clean(kernel, commands);
332
+ #define ERROR(msg) if (err != CL_SUCCESS) { CLEAN(); rb_raise(rb_eOpenCLError, msg); }
333
+
334
+ static void
335
+ program_clean(cl_kernel kernel, cl_command_queue commands)
336
+ {
337
+ clReleaseKernel(kernel);
338
+ clReleaseCommandQueue(commands);
339
+ }
340
+
341
+ static VALUE
342
+ program_method_missing(int argc, VALUE *argv, VALUE self)
343
+ {
344
+ int i;
345
+ size_t local = 0, global = 0;
346
+ cl_kernel kernel;
347
+ cl_command_queue commands;
348
+ GET_PROGRAM();
349
+
350
+ StringValue(argv[0]);
351
+ kernel = clCreateKernel(program->program, RSTRING_PTR(argv[0]), &err);
352
+ if (!kernel || err != CL_SUCCESS) {
353
+ rb_raise(rb_eNoMethodError, "no kernel method '%s'", RSTRING_PTR(argv[0]));
354
+ }
355
+
356
+ commands = clCreateCommandQueue(context, device_id, 0, &err);
357
+ if (!commands) {
358
+ rb_raise(rb_eOpenCLError, "could not execute kernel method '%s'", RSTRING_PTR(argv[0]));
359
+ }
360
+
361
+ for (i = 1; i < argc; i++) {
362
+ err = 0;
363
+ if (i == argc - 1 && TYPE(argv[i]) == T_HASH) {
364
+ VALUE worker_size = rb_hash_aref(argv[i], ID2SYM(ba_worker_size));
365
+ if (RTEST(worker_size) && TYPE(worker_size) == T_FIXNUM) {
366
+ global = FIX2UINT(worker_size);
367
+ }
368
+ else {
369
+ CLEAN();
370
+ rb_raise(rb_eArgError, "opts hash must be {:worker_size => INT_VALUE}, got %s",
371
+ RSTRING_PTR(rb_inspect(argv[i])));
372
+ }
373
+ break;
374
+ }
375
+
376
+ switch(TYPE(argv[i])) {
377
+ case T_FIXNUM: {
378
+ int value = FIX2INT(argv[i]);
379
+ err = clSetKernelArg(kernel, i - 1, sizeof(int), &value);
380
+ break;
381
+ }
382
+ case T_FLOAT: {
383
+ float value = RFLOAT_VALUE(argv[i]);
384
+ err = clSetKernelArg(kernel, i - 1, sizeof(float), &value);
385
+ break;
386
+ }
387
+ case T_ARRAY: {
388
+ /* TODO */
389
+ /* fall-through */
390
+ }
391
+ default:
392
+ if (CLASS_OF(argv[i]) == rb_cOutputBuffer) {
393
+ struct buffer *buffer;
394
+ Data_Get_Struct(argv[i], struct buffer, buffer);
395
+ err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
396
+ if (buffer->num_items > global) {
397
+ global = buffer->num_items;
398
+ }
399
+ }
400
+ else if (CLASS_OF(argv[i]) == rb_cBuffer) {
401
+ struct buffer *buffer;
402
+ Data_Get_Struct(argv[i], struct buffer, buffer);
403
+
404
+ buffer_write(argv[i]);
405
+ clEnqueueWriteBuffer(commands, buffer->data, CL_TRUE, 0,
406
+ buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
407
+ err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
408
+ }
409
+ break;
410
+ }
411
+ if (err != CL_SUCCESS) {
412
+ CLEAN();
413
+ rb_raise(rb_eArgError, "invalid kernel method parameter: %s", RSTRING_PTR(rb_inspect(argv[i])));
414
+ }
415
+ }
416
+
417
+ err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &local, NULL);
418
+ ERROR("failed to retrieve kernel work group info");
419
+
420
+ { /* global work size must be power of 2, greater than 3 and not smaller than local */
421
+ size_t size = 4;
422
+ while (size < global) size *= 2;
423
+ global = size;
424
+ if (global < local) global = local;
425
+ }
426
+
427
+ err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
428
+ if (err) { CLEAN(); rb_raise(rb_eOpenCLError, "failed to execute kernel method"); }
429
+
430
+ clFinish(commands);
431
+
432
+ for (i = 1; i < argc; i++) {
433
+ if (CLASS_OF(argv[i]) == rb_cOutputBuffer) {
434
+ struct buffer *buffer;
435
+ Data_Get_Struct(argv[i], struct buffer, buffer);
436
+ err = clEnqueueReadBuffer(commands, buffer->data, CL_TRUE, 0,
437
+ buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
438
+ ERROR("failed to read output buffer");
439
+ buffer_read(argv[i]);
440
+ }
441
+ }
442
+
443
+ CLEAN();
444
+ return Qnil;
445
+ }
446
+
447
+ void
448
+ Init_barracuda()
449
+ {
450
+ ba_worker_size = rb_intern("worker_size");
451
+
452
+ rb_mBarracuda = rb_define_module("Barracuda");
453
+
454
+ rb_eProgramSyntaxError = rb_define_class_under(rb_mBarracuda, "SyntaxError", rb_eSyntaxError);
455
+ rb_eOpenCLError = rb_define_class_under(rb_mBarracuda, "OpenCLError", rb_eStandardError);
456
+
457
+ rb_cProgram = rb_define_class_under(rb_mBarracuda, "Program", rb_cObject);
458
+ rb_define_alloc_func(rb_cProgram, program_s_allocate);
459
+ rb_define_method(rb_cProgram, "initialize", program_initialize, -1);
460
+ rb_define_method(rb_cProgram, "compile", program_compile, 1);
461
+ rb_define_method(rb_cProgram, "method_missing", program_method_missing, -1);
462
+
463
+ rb_cBuffer = rb_define_class_under(rb_mBarracuda, "Buffer", rb_cObject);
464
+ rb_define_alloc_func(rb_cBuffer, buffer_s_allocate);
465
+ rb_define_method(rb_cBuffer, "initialize", buffer_initialize, -1);
466
+ rb_define_method(rb_cBuffer, "size_changed", buffer_size_changed, 0);
467
+ rb_define_method(rb_cBuffer, "read", buffer_read, 0);
468
+ rb_define_method(rb_cBuffer, "write", buffer_write, 0);
469
+ rb_define_method(rb_cBuffer, "data", buffer_data, 0);
470
+ rb_define_method(rb_cBuffer, "data=", buffer_data_set, 1);
471
+
472
+ rb_cOutputBuffer = rb_define_class_under(rb_mBarracuda, "OutputBuffer", rb_cBuffer);
473
+ rb_define_method(rb_cOutputBuffer, "initialize", obuffer_initialize, 2);
474
+ rb_define_method(rb_cOutputBuffer, "size", obuffer_size, 0);
475
+ rb_define_method(rb_cOutputBuffer, "clear", obuffer_clear, 0);
476
+ rb_undef_method(rb_cOutputBuffer, "write");
477
+ rb_undef_method(rb_cOutputBuffer, "size_changed");
478
+ rb_undef_method(rb_cOutputBuffer, "data=");
479
+
480
+ init_opencl();
481
+ }
data/ext/extconf.rb ADDED
@@ -0,0 +1,4 @@
1
+ require 'mkmf'
2
+ $CPPFLAGS += " -DRUBY_19" if RUBY_VERSION =~ /^1\.9/
3
+ $LDFLAGS += " -framework OpenCL" if RUBY_PLATFORM =~ /darwin/
4
+ create_makefile('barracuda')
data/test/test_barracuda.rb ADDED
@@ -0,0 +1,174 @@
1
+ $:.unshift(File.dirname(__FILE__) + '/../ext/')
2
+
3
+ require "test/unit"
4
+ require "barracuda"
5
+
6
+ include Barracuda
7
+
8
+ class TestBuffer < Test::Unit::TestCase
9
+ def test_buffer_create_no_data
10
+ assert_raise(ArgumentError) { Buffer.new }
11
+ end
12
+
13
+ def test_buffer_create_invalid_data
14
+ assert_raise(RuntimeError) { Buffer.new("xyz") }
15
+ end
16
+
17
+ def test_buffer_create_with_array
18
+ b = Buffer.new([1, 2, 3, 4, 5])
19
+ assert_equal [1, 2, 3, 4, 5], b.data
20
+ end
21
+
22
+ def test_buffer_create_with_splat
23
+ b = Buffer.new(1.0, 2.0, 3.0)
24
+ assert_equal [1.0, 2.0, 3.0], b.data
25
+ end
26
+
27
+ def test_buffer_set_data
28
+ b = Buffer.new(1)
29
+ b.data = [1, 2, 3]
30
+ assert_equal 3, b.data.size
31
+ end
32
+
33
+ def test_buffer_read
34
+ b = Buffer.new(4, 2, 3)
35
+ b.data[0] = 1
36
+ b.read
37
+ assert_equal [4,2,3], b.data
38
+ end
39
+
40
+ def test_buffer_write
41
+ b = Buffer.new(1, 2, 3)
42
+ b.data[0] = 4
43
+ b.write
44
+ b.read
45
+ assert_equal [4,2,3], b.data
46
+ end
47
+
48
+ def test_buffer_size_changed
49
+ b = Buffer.new(1, 2, 3)
50
+ b.data << 4
51
+ b.size_changed
52
+ b.read
53
+ assert_equal [1,2,3,4], b.data
54
+ end
55
+ end
56
+
57
+ class TestOutputBuffer < Test::Unit::TestCase
58
+ def test_create_int_output_buffer
59
+ b = OutputBuffer.new(:int, 5)
60
+ assert_equal 5, b.size
61
+ end
62
+
63
+ def test_create_float_output_buffer
64
+ b = OutputBuffer.new(:float, 5)
65
+ assert_equal 5, b.size
66
+ end
67
+
68
+ def test_create_output_buffer_with_invalid_type
69
+ assert_raise(ArgumentError) { OutputBuffer.new(:char, 5) }
70
+ end
71
+
72
+ def test_create_output_buffer_with_invalid_size
73
+ assert_raise(ArgumentError) { OutputBuffer.new(:int, 'x') }
74
+ end
75
+ end
76
+
77
+ class TestProgram < Test::Unit::TestCase
78
+ def test_program_create_invalid_code
79
+ assert_raise(Barracuda::SyntaxError) { Program.new "fib { SYNTAXERROR }" }
80
+ end
81
+
82
+ def test_program_create
83
+ assert_nothing_raised { Program.new "__kernel fib(int x) { return 0; }"}
84
+ end
85
+
86
+ def test_program_compile
87
+ p = Program.new
88
+ assert_nothing_raised { p.compile "__kernel fib(int x) { }" }
89
+ end
90
+
91
+ def test_kernel_run
92
+ p = Program.new("__kernel x_y_z(int x) { }")
93
+ assert_nothing_raised { p.x_y_z }
94
+ end
95
+
96
+ def test_kernel_missing
97
+ p = Program.new("__kernel x_y_z(int x) { }")
98
+ assert_raise(NoMethodError) { p.not_x_y_z }
99
+ end
100
+
101
+ def test_program_int_input_buffer
102
+ p = Program.new <<-'eof'
103
+ __kernel run(__global int* out, __global int* in, int total) {
104
+ int id = get_global_id(0);
105
+ if (id < total) out[id] = in[id] + 1;
106
+ }
107
+ eof
108
+
109
+ arr = (1..256).to_a
110
+ _in = Buffer.new(arr)
111
+ out = OutputBuffer.new(:int, arr.size)
112
+ p.run(out, _in, arr.size)
113
+ assert_equal arr.map {|x| x + 1 }, out.data
114
+ end
115
+
116
+ def test_program_float_buffer
117
+ p = Program.new <<-'eof'
118
+ __kernel run(__global float* out, __global int* in, int total) {
119
+ int id = get_global_id(0);
120
+ if (id < total) out[id] = (float)in[id] + 0.5;
121
+ }
122
+ eof
123
+
124
+ arr = (1..256).to_a
125
+ _in = Buffer.new(arr)
126
+ out = OutputBuffer.new(:float, arr.size)
127
+ p.run(out, _in, arr.size)
128
+ assert_equal arr.map {|x| x.to_f + 0.5 }, out.data
129
+ end
130
+
131
+ def test_program_set_worker_size
132
+ p = Program.new <<-'eof'
133
+ __kernel sum(__global int* out, __global int* in, int total) {
134
+ int id = get_global_id(0);
135
+ if (id < total) atom_add(&out[0], in[id]);
136
+ }
137
+ eof
138
+
139
+ arr = (1..517).to_a
140
+ sum = arr.inject(0) {|acc, el| acc + el }
141
+ _in = Buffer.new(arr)
142
+ out = OutputBuffer.new(:int, 1)
143
+ p.sum(out, _in, arr.size, :worker_size => arr.size)
144
+ assert_equal sum, out.data[0]
145
+ end
146
+
147
+ def test_program_largest_buffer_is_input
148
+ p = Program.new <<-'eof'
149
+ __kernel sum(__global int* out, __global int* in, int total) {
150
+ int id = get_global_id(0);
151
+ if (id < total) atom_add(&out[0], in[id]);
152
+ }
153
+ eof
154
+
155
+ arr = (1..517).to_a
156
+ sum = arr.inject(0) {|acc, el| acc + el }
157
+ _in = Buffer.new(arr)
158
+ out = OutputBuffer.new(:int, 1)
159
+ p.sum(out, _in, arr.size)
160
+ assert_equal sum, out.data[0]
161
+ end
162
+
163
+ def test_program_invalid_worker_size
164
+ p = Program.new("__kernel sum(int x) { }")
165
+ assert_raise(ArgumentError) { p.sum(:worker_size => "hello") }
166
+ assert_raise(ArgumentError) { p.sum(:worker => 1) }
167
+ end
168
+
169
+ def test_program_invalid_args
170
+ p = Program.new("__kernel sum(int x, __global int *y) { }")
171
+ assert_raise(ArgumentError) { p.sum(1, 2) }
172
+ assert_raise(ArgumentError) { p.sum(1, OutputBuffer.new(:int, 1), 3) }
173
+ end
174
+ end
metadata ADDED
@@ -0,0 +1,61 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: barracuda
3
+ version: !ruby/object:Gem::Version
4
+ version: "1.0"
5
+ platform: ruby
6
+ authors:
7
+ - Loren Segal
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2009-08-30 00:00:00 -04:00
13
+ default_executable:
14
+ dependencies: []
15
+
16
+ description:
17
+ email: lsegal@soen.ca
18
+ executables: []
19
+
20
+ extensions:
21
+ - ext/extconf.rb
22
+ extra_rdoc_files: []
23
+
24
+ files:
25
+ - ext/barracuda.c
26
+ - ext/extconf.rb
27
+ - benchmarks/to_float.rb
28
+ - test/test_barracuda.rb
29
+ - LICENSE
30
+ - README.md
31
+ - Rakefile
32
+ has_rdoc: true
33
+ homepage: http://github.com/lsegal/barracuda
34
+ licenses: []
35
+
36
+ post_install_message:
37
+ rdoc_options: []
38
+
39
+ require_paths:
40
+ - ext
41
+ required_ruby_version: !ruby/object:Gem::Requirement
42
+ requirements:
43
+ - - ">="
44
+ - !ruby/object:Gem::Version
45
+ version: "0"
46
+ version:
47
+ required_rubygems_version: !ruby/object:Gem::Requirement
48
+ requirements:
49
+ - - ">="
50
+ - !ruby/object:Gem::Version
51
+ version: "0"
52
+ version:
53
+ requirements: []
54
+
55
+ rubyforge_project: barracuda
56
+ rubygems_version: 1.3.4
57
+ signing_key:
58
+ specification_version: 3
59
+ summary: Barracuda is a wrapper library for OpenCL/CUDA GPGPU programming
60
+ test_files:
61
+ - test/test_barracuda.rb