barracuda 1.0

data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2009 Loren Segal
2
+
3
+ Permission is hereby granted, free of charge, to any person
4
+ obtaining a copy of this software and associated documentation
5
+ files (the "Software"), to deal in the Software without
6
+ restriction, including without limitation the rights to use,
7
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ copies of the Software, and to permit persons to whom the
9
+ Software is furnished to do so, subject to the following
10
+ conditions:
11
+
12
+ The above copyright notice and this permission notice shall be
13
+ included in all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
17
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
19
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
20
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
21
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22
+ OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,163 @@
1
+ Barracuda
2
+ =========
3
+
4
+ Written by Loren Segal in 2009.
5
+
6
+ SYNOPSIS
7
+ --------
8
+
9
+ Barracuda is a Ruby wrapper library for the [OpenCL][1] architecture. OpenCL is a
10
+ framework for multi-processor computing, most notably allowing a programmer
11
+ to run parallel programs on a GPU, taking advantage of the many cores
12
+ available.
13
+
14
+ Barracuda aims to abstract both CUDA and OpenCL; for now, however, only OpenCL
15
+ on OSX 10.6 is supported. Patches to extend this support would be joyously
16
+ accepted!
17
+
18
+ Also note that Barracuda currently supports only two data types: ints and
19
+ floats. This should also be expanded.
20
+
21
+ INSTALLING
22
+ ----------
23
+
24
+ As mentioned above, this library currently only supports OSX 10.6 (or an earlier
25
+ version with the OpenCL framework, if that's even possible). If you manage to
26
+ mess with the source and get it working on [insert system here], please submit
27
+ your patches.
28
+
29
+ Okay, assuming you have a compatible machine:
30
+
31
+ sudo gem install barracuda
32
+
33
+ Or:
34
+
35
+ git clone git://github.com/lsegal/barracuda
36
+ cd barracuda
37
+ rake install
38
+
39
+ USING
40
+ -----
41
+
42
+ The basic workflow behind the OpenCL architecture is:
43
+
44
+ 1. Create a program (and kernel) to be run on the GPU's many cores.
45
+ 2. Create input/output buffers to pass data from Ruby to the GPU and back.
46
+ 3. Read the output buffer(s) to get your computed data.
47
+
48
+ In Barracuda, this looks basically like:
49
+
50
+ 1. Create a `Barracuda::Program`
51
+ 2. Create a `Barracuda::Buffer` or `Barracuda::OutputBuffer`
52
+ 3. Call the kernel method on the program with buffers as arguments
53
+ 4. Read output buffers
54
+
55
+ As you can see, there are only 3 basic classes: `Program`, `Buffer` (for input
56
+ data), and `OutputBuffer` (for output data).
57
+
58
+ EXAMPLE
59
+ -------
60
+
61
+ Consider the following example to sum a bunch of integers:
62
+
63
+ program = Program.new <<-'eof'
64
+ __kernel void sum(__global int *out, __global int *in, int total) {
65
+ int id = get_global_id(0);
66
+ if (id < total) atom_add(&out[0], in[id]);
67
+ }
68
+ eof
69
+
70
+ arr = (1..65536).to_a
71
+ input = Buffer.new(arr)
72
+ output = OutputBuffer.new(:int, 1)
73
+ program.sum(output, input, arr.size)
74
+
75
+ puts "The sum is: " + output.data[0].to_s
76
+
77
+ The above example will compute the sum of integers 1 to 65536 using (at most)
78
+ 65536 parallel processes and return the result in the 1-dimensional output
79
+ buffer (which stores integers and is of length 1). The kernel method `sum`
80
+ is called by calling the `#sum` method on the program object, and the
81
+ arguments are passed in sequentially as the output buffer, followed by the
82
+ input data (the integers) followed by the total size of the input (since C
83
+ does not have the concept of array size).
84
+
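The kernel in the example above can be modelled on the CPU to make the bounds guard concrete. The sketch below is plain Ruby and does not use Barracuda; `simulate_sum_kernel` and `wrap_int32` are hypothetical helper names introduced here. It also shows that the exact sum of 1..65536 is larger than a signed 32-bit integer can hold, so, assuming the device `int` is 32 bits wide (as OpenCL C defines it), the value read back from the example may be wrapped rather than exact.

```ruby
# CPU-side model of the README's `sum` kernel (an illustration, not Barracuda code).
# Each simulated work item adds input[id] to out[0] only when id is within bounds,
# which is why the kernel takes `total` as an explicit argument.
def simulate_sum_kernel(input, total, global_size)
  out = [0]
  global_size.times do |id|
    out[0] += input[id] if id < total
  end
  out
end

# Wraps an Integer to signed 32-bit two's complement, as a 32-bit `int` would.
def wrap_int32(n)
  ((n + 2**31) % 2**32) - 2**31
end

arr = (1..65536).to_a
exact = simulate_sum_kernel(arr, arr.size, 65536)[0]
puts "exact sum:     #{exact}"
puts "as 32-bit int: #{wrap_int32(exact)}"
```

A smaller range, such as the `(1..517)` used in the package's tests, stays well inside the 32-bit range.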
85
+ We can also specify the work group size (the number of iterations we need
86
+ to run). Barracuda automatically selects the size of the largest buffer as
87
+ the work group size, but in some cases this may be too small or too large. To
88
+ manually specify the work group size, call the kernel with an options hash:
89
+
90
+ program.my_kernel_method(..., :worker_size => 512)
91
+
92
+ Note that the work group size must be a power of 2. Barracuda will increase
93
+ the work group size to the next power of 2 if it needs to. This means your
94
+ OpenCL program might run more iterations of your kernel method than you
95
+ request. Because we can't rely on the work group size, we pass in the total
96
+ data size to ensure we do not exceed the bounds of our data.
97
+
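The rounding rule described above can be sketched in plain Ruby. This mirrors the sizing block in `program_method_missing` in the C extension shipped in this package; `global_work_size` is a hypothetical name, and the `local` value of 256 is an assumed device limit chosen only for illustration.

```ruby
# Start at 4, double until the size covers the request, and never go below
# the local work group size reported by the device.
def global_work_size(requested, local)
  size = 4
  size *= 2 while size < requested
  size < local ? local : size
end

puts global_work_size(517, 256)  # rounded up to the next power of two
puts global_work_size(512, 256)  # already a power of two, left unchanged
puts global_work_size(1, 256)    # raised to the local size
```

Because the rounded size can exceed the number of data items, a kernel that indexes a buffer by `get_global_id(0)` needs the `id < total` guard shown in the example.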
98
+ CLASS DETAILS
99
+ -------------
100
+
101
+ **Barracuda::Program**:
102
+
103
+ Represents an OpenCL program
104
+
105
+ Program.new(PROGRAM_SOURCE) => creates a new program
106
+
107
+ Program#compile(SOURCE) => recompiles a program
108
+
109
+ Program#KERNEL_METHOD(*args) => runs KERNEL_METHOD in the compiled program
110
+ - args should be the arguments defined in the kernel method.
111
+ - supported argument types are Float and Fixnum objects only.
112
+ - if the last arg is a Hash, it should be an options hash with keys:
113
+ - :worker_size => FIXNUM (the number of iterations to run)
114
+
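How a trailing options hash is separated from the kernel arguments can be illustrated with a small pure-Ruby model. `split_kernel_args` is a hypothetical helper, not part of Barracuda; it only approximates the checks made by the C extension in this package, and it tests for `Integer` where the extension tests for a Fixnum.

```ruby
# Returns [kernel_args, worker_size]. A Hash in the last position is treated
# as the options hash and must carry an integer :worker_size.
def split_kernel_args(args)
  return [args, nil] unless args.last.is_a?(Hash)

  opts = args.last
  size = opts[:worker_size]
  unless size.is_a?(Integer)
    raise ArgumentError,
          "opts hash must be {:worker_size => INT_VALUE}, got #{opts.inspect}"
  end
  [args[0...-1], size]
end

kernel_args, worker_size = split_kernel_args([1, 2.0, { :worker_size => 512 }])
puts kernel_args.inspect
puts worker_size
```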
115
+ **Barracuda::Buffer**:
116
+
117
+ Stores data to be sent to an OpenCL kernel method
118
+
119
+ Buffer.new(*buffer_data) => creates a new input buffer
120
+
121
+ Buffer#data => accessor for the buffer data
122
+
123
+ Buffer#size_changed => call this if the buffer.data was modified and the size changed
124
+ - calls Buffer#write
125
+
126
+ Buffer#write => call this if the buffer.data was modified (size not changed)
127
+ - flushes the buffer.data cache to the OpenCL internal memory buffer
128
+
129
+ Buffer#read => reads the cached data back into buffer.data
130
+ - refreshes the buffer.data cache according to the internal memory buffer
131
+
132
+ **Barracuda::OutputBuffer**:
133
+
134
+ Holds a buffer for data written from the kernel method.
135
+
136
+ OutputBuffer.new(type, size) => creates a new output buffer
137
+ - type can be :float or :int
138
+
139
+ OutputBuffer#data => accessor for the buffer data
140
+
141
+ OutputBuffer#size => returns the buffer size
142
+
143
+ GLOSSARY
144
+ --------
145
+
146
+ * **Program**: an OpenCL program is generally created from a variant of C that
147
+ has extra domain specific keywords. A program has at least one "kernel"
148
+ method, but can have many regular methods.
149
+
150
+ * **Kernel**: a special "entry" method in the program that is exposed to the
151
+ programmer to be called via the OpenCL framework. A kernel method is
152
+ marked with the `__kernel` keyword before the method declaration.
153
+
154
+ * **Buffer**: memory storage which is accessible to (and generally shared with)
155
+ the program. Buffers are usually marked with the `__global` keyword in an
156
+ OpenCL program.
157
+
158
+ COPYRIGHT & LICENSING
159
+ ---------------------
160
+
161
+ Copyright 2009 Loren Segal, licensed under the MIT License
162
+
163
+ [1]: http://en.wikipedia.ca/wiki/OpenCL "OpenCL"
@@ -0,0 +1,18 @@
1
+ require 'rubygems'
2
+ require 'rake/gempackagetask'
3
+
4
+ WINDOWS = (RUBY_PLATFORM =~ /win32|cygwin/ ? true : false) rescue false
5
+ SUDO = WINDOWS ? '' : 'sudo'
6
+
7
+ load 'barracuda.gemspec'
8
+ Rake::GemPackageTask.new(SPEC) do |pkg|
9
+ pkg.gem_spec = SPEC
10
+ pkg.need_zip = true
11
+ pkg.need_tar = true
12
+ end
13
+
14
+ desc "Install the gem locally"
15
+ task :install => :package do
16
+ sh "#{SUDO} gem install pkg/#{SPEC.name}-#{SPEC.version}.gem --local"
17
+ sh "rm -rf pkg/#{SPEC.name}-#{SPEC.version}" unless ENV['KEEP_FILES']
18
+ end
@@ -0,0 +1,24 @@
1
+ $:.unshift(File.dirname(__FILE__) + '/../ext')
2
+
3
+ require 'barracuda'
4
+ require 'benchmark'
5
+
6
+ include Barracuda
7
+
8
+ prog = Program.new <<-'eof'
9
+ __kernel sum(__global float *out, __global int *in, int total) {
10
+ int i = get_global_id(0);
11
+ if (i < total) out[i] = ((float)in[i] + 0.5) / 3.8 + 2.0;
12
+ }
13
+ eof
14
+
15
+ arr = (1..3333333).to_a
16
+ input = Buffer.new(arr)
17
+ output = OutputBuffer.new(:float, arr.size)
18
+
19
+ TIMES = 1
20
+ Benchmark.bmbm do |x|
21
+ x.report("cpu") { TIMES.times { arr.map {|n| (n.to_f + 0.5) / 3.8 + 2.0 } } }
22
+ x.report("gpu") { TIMES.times { prog.sum(output, input, arr.size); output.clear } }
23
+ end
24
+
@@ -0,0 +1,481 @@
1
+ #include <ruby.h>
2
+ #include <OpenCL/OpenCL.h>
3
+
4
+ static VALUE rb_mBarracuda;
5
+ static VALUE rb_cBuffer;
6
+ static VALUE rb_cOutputBuffer;
7
+ static VALUE rb_cProgram;
8
+ static VALUE rb_eProgramSyntaxError;
9
+ static VALUE rb_eOpenCLError;
10
+
11
+ static ID ba_worker_size;
12
+
13
+ static VALUE program_compile(VALUE self, VALUE source);
14
+ static VALUE buffer_data_set(VALUE self, VALUE new_value);
15
+
16
+ static cl_device_id device_id = NULL;
17
+ static cl_context context = NULL;
18
+ static int err;
19
+
20
+ #define BUFFER_TYPE_FLOAT 0x0001
21
+ #define BUFFER_TYPE_INT 0x0002
22
+ #define BUFFER_TYPE_CHAR 0x0003
23
+
24
+ struct program {
25
+ cl_program program;
26
+ };
27
+
28
+ struct kernel {
29
+ cl_kernel kernel;
30
+ };
31
+
32
+ struct buffer {
33
+ VALUE arr;
34
+ unsigned int type;
35
+ size_t num_items;
36
+ size_t member_size;
37
+ void *cachebuf;
38
+ cl_mem data;
39
+ };
40
+
41
+ #define GET_PROGRAM() \
42
+ struct program *program; \
43
+ Data_Get_Struct(self, struct program, program);
44
+
45
+ #define GET_BUFFER() \
46
+ struct buffer *buffer; \
47
+ Data_Get_Struct(self, struct buffer, buffer);
48
+
49
+ static void
50
+ init_opencl()
51
+ {
52
+ if (device_id == NULL) {
53
+ err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
54
+ if (err != CL_SUCCESS) {
55
+ rb_raise(rb_eOpenCLError, "failed to create a device group");
56
+ }
57
+ }
58
+
59
+ if (context == NULL) {
60
+ context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
61
+ if (!context) {
62
+ rb_raise(rb_eOpenCLError, "failed to create a program context");
63
+ }
64
+ }
65
+ }
66
+
67
+ static void
68
+ free_buffer(struct buffer *buffer)
69
+ {
70
+ fflush(stdout);
71
+ clReleaseMemObject(buffer->data);
72
+ rb_gc_mark(buffer->arr);
73
+ ruby_xfree(buffer->cachebuf);
74
+ ruby_xfree(buffer);
75
+ }
76
+
77
+ static VALUE
78
+ buffer_s_allocate(VALUE klass)
79
+ {
80
+ struct buffer *buffer;
81
+ buffer = ALLOC(struct buffer);
82
+ MEMZERO(buffer, struct buffer, 1);
83
+ buffer->arr = Qnil;
84
+ return Data_Wrap_Struct(klass, 0, free_buffer, buffer);
85
+ }
86
+
87
+ static void
88
+ buffer_update_cache_info(struct buffer *buffer)
89
+ {
90
+ buffer->num_items = RARRAY_LEN(buffer->arr);
91
+
92
+ switch (TYPE(RARRAY_PTR(buffer->arr)[0])) {
93
+ case T_FIXNUM:
94
+ buffer->type = BUFFER_TYPE_INT;
95
+ buffer->member_size = sizeof(int);
96
+ break;
97
+ case T_FLOAT:
98
+ buffer->type = BUFFER_TYPE_FLOAT;
99
+ buffer->member_size = sizeof(float);
100
+ break;
101
+ default:
102
+ rb_raise(rb_eRuntimeError, "invalid buffer data %s",
103
+ RSTRING_PTR(rb_inspect(buffer->arr)));
104
+ }
105
+ }
106
+
107
+ static VALUE
108
+ buffer_write(VALUE self)
109
+ {
110
+ unsigned int i;
111
+
112
+ GET_BUFFER();
113
+
114
+ buffer_update_cache_info(buffer);
115
+
116
+ if (buffer->cachebuf) {
117
+ xfree(buffer->cachebuf);
118
+ }
119
+ buffer->cachebuf = xmalloc(buffer->num_items * buffer->member_size);
120
+
121
+ for (i = 0; i < RARRAY_LEN(buffer->arr); i++) {
122
+ VALUE item = RARRAY_PTR(buffer->arr)[i];
123
+ switch (buffer->type) {
124
+ case BUFFER_TYPE_INT: {
125
+ int value = FIX2INT(item);
126
+ ((int *)buffer->cachebuf)[i] = value;
127
+ break;
128
+ }
129
+ case BUFFER_TYPE_FLOAT: {
130
+ float value = RFLOAT_VALUE(item);
131
+ ((float *)buffer->cachebuf)[i] = value;
132
+ break;
133
+ }
134
+ default:
135
+ ((uint32_t *)buffer->cachebuf)[i] = 0;
136
+ }
137
+ }
138
+
139
+ return self;
140
+ }
141
+
142
+ static VALUE
143
+ buffer_read(VALUE self)
144
+ {
145
+ unsigned int i;
146
+
147
+ GET_BUFFER();
148
+
149
+ rb_gc_mark(buffer->arr);
150
+ buffer->arr = rb_ary_new2(buffer->num_items);
151
+
152
+ for (i = 0; i < buffer->num_items; i++) {
153
+ switch (buffer->type) {
154
+ case BUFFER_TYPE_INT:
155
+ rb_ary_push(buffer->arr, INT2FIX(((int *)buffer->cachebuf)[i]));
156
+ break;
157
+ case BUFFER_TYPE_FLOAT:
158
+ rb_ary_push(buffer->arr, rb_float_new(((float *)buffer->cachebuf)[i]));
159
+ break;
160
+ default:
161
+ rb_ary_push(buffer->arr, Qnil);
162
+ }
163
+ }
164
+
165
+ return self;
166
+ }
167
+
168
+ static VALUE
169
+ buffer_size_changed(VALUE self)
170
+ {
171
+ GET_BUFFER();
172
+
173
+ if (buffer->data) {
174
+ clReleaseMemObject(buffer->data);
175
+ }
176
+ buffer_update_cache_info(buffer);
177
+ buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
178
+ buffer->num_items * buffer->member_size, NULL, NULL);
179
+
180
+ buffer_write(self);
181
+
182
+ return self;
183
+ }
184
+
185
+ static VALUE
186
+ buffer_data(VALUE self)
187
+ {
188
+ GET_BUFFER();
189
+ return buffer->arr;
190
+ }
191
+
192
+ static VALUE
193
+ buffer_data_set(VALUE self, VALUE new_value)
194
+ {
195
+ GET_BUFFER();
196
+
197
+ if (RTEST(buffer->arr)) {
198
+ rb_gc_mark(buffer->arr);
199
+ }
200
+ buffer->arr = new_value;
201
+ buffer_size_changed(self);
202
+ return buffer->arr;
203
+ }
204
+
205
+ static VALUE
206
+ buffer_initialize(int argc, VALUE *argv, VALUE self)
207
+ {
208
+ GET_BUFFER();
209
+
210
+ if (argc == 0) {
211
+ rb_raise(rb_eArgError, "no buffer data given");
212
+ }
213
+
214
+ if (TYPE(argv[0]) == T_ARRAY) {
215
+ buffer_data_set(self, argv[0]);
216
+ }
217
+ else {
218
+ buffer_data_set(self, rb_ary_new4(argc, argv));
219
+ }
220
+
221
+ return self;
222
+ }
223
+
224
+ static VALUE
225
+ obuffer_initialize(VALUE self, VALUE type, VALUE size)
226
+ {
227
+ GET_BUFFER();
228
+
229
+ StringValue(type);
230
+ if (strcmp(RSTRING_PTR(type), "float") == 0) {
231
+ buffer->type = BUFFER_TYPE_FLOAT;
232
+ buffer->member_size = sizeof(float);
233
+ }
234
+ else if (strcmp(RSTRING_PTR(type), "int") == 0) {
235
+ buffer->type = BUFFER_TYPE_INT;
236
+ buffer->member_size = sizeof(int);
237
+ }
238
+ else {
239
+ rb_raise(rb_eArgError, "type can only be :float or :int");
240
+ }
241
+
242
+ if (TYPE(size) != T_FIXNUM) {
243
+ rb_raise(rb_eArgError, "expecting buffer size as argument 2");
244
+ }
245
+
246
+ buffer->num_items = FIX2UINT(size);
247
+ buffer->cachebuf = xmalloc(buffer->num_items * buffer->member_size);
248
+ buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
249
+ buffer->member_size * buffer->num_items, NULL, NULL);
250
+
251
+ return self;
252
+ }
253
+
254
+ static VALUE
255
+ obuffer_clear(VALUE self)
256
+ {
257
+ GET_BUFFER();
258
+ memset(buffer->cachebuf, 0, buffer->member_size * buffer->num_items);
259
+ return self;
260
+ }
261
+
262
+ static VALUE
263
+ obuffer_size(VALUE self)
264
+ {
265
+ GET_BUFFER();
266
+ return INT2FIX(buffer->num_items);
267
+ }
268
+
269
+ static void
270
+ free_program(struct program *program)
271
+ {
272
+ clReleaseProgram(program->program);
273
+ xfree(program);
274
+ }
275
+
276
+ static VALUE
277
+ program_s_allocate(VALUE klass)
278
+ {
279
+ struct program *program;
280
+ program = ALLOC(struct program);
281
+ MEMZERO(program, struct program, 1);
282
+ return Data_Wrap_Struct(klass, 0, free_program, program);
283
+ }
284
+
285
+ static VALUE
286
+ program_initialize(int argc, VALUE *argv, VALUE self)
287
+ {
288
+ VALUE source;
289
+
290
+ rb_scan_args(argc, argv, "01", &source);
291
+ if (source != Qnil) {
292
+ program_compile(self, source);
293
+ }
294
+
295
+ return self;
296
+ }
297
+
298
+ static VALUE
299
+ program_compile(VALUE self, VALUE source)
300
+ {
301
+ const char *c_source;
302
+ GET_PROGRAM();
303
+ StringValue(source);
304
+
305
+ if (program->program) {
306
+ clReleaseProgram(program->program);
307
+ program->program = 0;
308
+ }
309
+
310
+ c_source = StringValueCStr(source);
311
+ program->program = clCreateProgramWithSource(context, 1, &c_source, NULL, &err);
312
+ if (!program->program) {
313
+ program->program = 0;
314
+ rb_raise(rb_eOpenCLError, "failed to create compute program");
315
+ }
316
+
317
+ err = clBuildProgram(program->program, 0, NULL, NULL, NULL, NULL);
318
+ if (err != CL_SUCCESS) {
319
+ size_t len;
320
+ char buffer[2048];
321
+
322
+ clGetProgramBuildInfo(program->program, device_id, CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
323
+ clReleaseProgram(program->program);
324
+ program->program = 0;
325
+ rb_raise(rb_eProgramSyntaxError, "%s", buffer);
326
+ }
327
+
328
+ return Qtrue;
329
+ }
330
+
331
+ #define CLEAN() program_clean(kernel, commands);
332
+ #define ERROR(msg) if (err != CL_SUCCESS) { CLEAN(); rb_raise(rb_eOpenCLError, msg); }
333
+
334
+ static void
335
+ program_clean(cl_kernel kernel, cl_command_queue commands)
336
+ {
337
+ clReleaseKernel(kernel);
338
+ clReleaseCommandQueue(commands);
339
+ }
340
+
341
+ static VALUE
342
+ program_method_missing(int argc, VALUE *argv, VALUE self)
343
+ {
344
+ int i;
345
+ size_t local = 0, global = 0;
346
+ cl_kernel kernel;
347
+ cl_command_queue commands;
348
+ GET_PROGRAM();
349
+
350
+ StringValue(argv[0]);
351
+ kernel = clCreateKernel(program->program, RSTRING_PTR(argv[0]), &err);
352
+ if (!kernel || err != CL_SUCCESS) {
353
+ rb_raise(rb_eNoMethodError, "no kernel method '%s'", RSTRING_PTR(argv[0]));
354
+ }
355
+
356
+ commands = clCreateCommandQueue(context, device_id, 0, &err);
357
+ if (!commands) {
358
+ rb_raise(rb_eOpenCLError, "could not execute kernel method '%s'", RSTRING_PTR(argv[0]));
359
+ }
360
+
361
+ for (i = 1; i < argc; i++) {
362
+ err = 0;
363
+ if (i == argc - 1 && TYPE(argv[i]) == T_HASH) {
364
+ VALUE worker_size = rb_hash_aref(argv[i], ID2SYM(ba_worker_size));
365
+ if (RTEST(worker_size) && TYPE(worker_size) == T_FIXNUM) {
366
+ global = FIX2UINT(worker_size);
367
+ }
368
+ else {
369
+ CLEAN();
370
+ rb_raise(rb_eArgError, "opts hash must be {:worker_size => INT_VALUE}, got %s",
371
+ RSTRING_PTR(rb_inspect(argv[i])));
372
+ }
373
+ break;
374
+ }
375
+
376
+ switch(TYPE(argv[i])) {
377
+ case T_FIXNUM: {
378
+ int value = FIX2INT(argv[i]);
379
+ err = clSetKernelArg(kernel, i - 1, sizeof(int), &value);
380
+ break;
381
+ }
382
+ case T_FLOAT: {
383
+ float value = RFLOAT_VALUE(argv[i]);
384
+ err = clSetKernelArg(kernel, i - 1, sizeof(float), &value);
385
+ break;
386
+ }
387
+ case T_ARRAY: {
388
+ /* TODO */
389
+ /* fall-through */
390
+ }
391
+ default:
392
+ if (CLASS_OF(argv[i]) == rb_cOutputBuffer) {
393
+ struct buffer *buffer;
394
+ Data_Get_Struct(argv[i], struct buffer, buffer);
395
+ err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
396
+ if (buffer->num_items > global) {
397
+ global = buffer->num_items;
398
+ }
399
+ }
400
+ else if (CLASS_OF(argv[i]) == rb_cBuffer) {
401
+ struct buffer *buffer;
402
+ Data_Get_Struct(argv[i], struct buffer, buffer);
403
+
404
+ buffer_write(argv[i]);
405
+ clEnqueueWriteBuffer(commands, buffer->data, CL_TRUE, 0,
406
+ buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
407
+ err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
408
+ }
409
+ break;
410
+ }
411
+ if (err != CL_SUCCESS) {
412
+ CLEAN();
413
+ rb_raise(rb_eArgError, "invalid kernel method parameter: %s", RSTRING_PTR(rb_inspect(argv[i])));
414
+ }
415
+ }
416
+
417
+ err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &local, NULL);
418
+ ERROR("failed to retrieve kernel work group info");
419
+
420
+ { /* global work size must be power of 2, greater than 3 and not smaller than local */
421
+ size_t size = 4;
422
+ while (size < global) size *= 2;
423
+ global = size;
424
+ if (global < local) global = local;
425
+ }
426
+
427
+ err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
428
+ if (err) { CLEAN(); rb_raise(rb_eOpenCLError, "failed to execute kernel method"); }
429
+
430
+ clFinish(commands);
431
+
432
+ for (i = 1; i < argc; i++) {
433
+ if (CLASS_OF(argv[i]) == rb_cOutputBuffer) {
434
+ struct buffer *buffer;
435
+ Data_Get_Struct(argv[i], struct buffer, buffer);
436
+ err = clEnqueueReadBuffer(commands, buffer->data, CL_TRUE, 0,
437
+ buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
438
+ ERROR("failed to read output buffer");
439
+ buffer_read(argv[i]);
440
+ }
441
+ }
442
+
443
+ CLEAN();
444
+ return Qnil;
445
+ }
446
+
447
+ void
448
+ Init_barracuda()
449
+ {
450
+ ba_worker_size = rb_intern("worker_size");
451
+
452
+ rb_mBarracuda = rb_define_module("Barracuda");
453
+
454
+ rb_eProgramSyntaxError = rb_define_class_under(rb_mBarracuda, "SyntaxError", rb_eSyntaxError);
455
+ rb_eOpenCLError = rb_define_class_under(rb_mBarracuda, "OpenCLError", rb_eStandardError);
456
+
457
+ rb_cProgram = rb_define_class_under(rb_mBarracuda, "Program", rb_cObject);
458
+ rb_define_alloc_func(rb_cProgram, program_s_allocate);
459
+ rb_define_method(rb_cProgram, "initialize", program_initialize, -1);
460
+ rb_define_method(rb_cProgram, "compile", program_compile, 1);
461
+ rb_define_method(rb_cProgram, "method_missing", program_method_missing, -1);
462
+
463
+ rb_cBuffer = rb_define_class_under(rb_mBarracuda, "Buffer", rb_cObject);
464
+ rb_define_alloc_func(rb_cBuffer, buffer_s_allocate);
465
+ rb_define_method(rb_cBuffer, "initialize", buffer_initialize, -1);
466
+ rb_define_method(rb_cBuffer, "size_changed", buffer_size_changed, 0);
467
+ rb_define_method(rb_cBuffer, "read", buffer_read, 0);
468
+ rb_define_method(rb_cBuffer, "write", buffer_write, 0);
469
+ rb_define_method(rb_cBuffer, "data", buffer_data, 0);
470
+ rb_define_method(rb_cBuffer, "data=", buffer_data_set, 1);
471
+
472
+ rb_cOutputBuffer = rb_define_class_under(rb_mBarracuda, "OutputBuffer", rb_cBuffer);
473
+ rb_define_method(rb_cOutputBuffer, "initialize", obuffer_initialize, 2);
474
+ rb_define_method(rb_cOutputBuffer, "size", obuffer_size, 0);
475
+ rb_define_method(rb_cOutputBuffer, "clear", obuffer_clear, 0);
476
+ rb_undef_method(rb_cOutputBuffer, "write");
477
+ rb_undef_method(rb_cOutputBuffer, "size_changed");
478
+ rb_undef_method(rb_cOutputBuffer, "data=");
479
+
480
+ init_opencl();
481
+ }
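The buffer routines in the C extension above infer the element type from the first array item and copy the values into a raw cache of 4-byte elements. A rough Ruby analogue, using `Array#pack` and `String#unpack`, is sketched below. `pack_format` and `to_cache` are hypothetical names, and the sketch assumes a 4-byte native `int` and `float`; it is a model of the idea, not the extension's actual code path.

```ruby
# Choose a pack directive from the first element, as the extension chooses
# a buffer type from the first array item.
def pack_format(data)
  case data.first
  when Integer then 'l*' # 32-bit signed integers, native byte order
  when Float   then 'f*' # single-precision floats, native byte order
  else raise "invalid buffer data #{data.inspect}"
  end
end

# Pack the whole array into one contiguous byte string (the "cache").
def to_cache(data)
  data.pack(pack_format(data))
end

ints = [1, 2, 3, 4, 5]
cache = to_cache(ints)
puts cache.bytesize             # 5 elements * 4 bytes each
puts cache.unpack('l*').inspect # round-trips to the original integers
```

Floats are narrowed to single precision on the way in, so values that are not exactly representable as a 32-bit float will not round-trip bit for bit.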
@@ -0,0 +1,4 @@
1
+ require 'mkmf'
2
+ $CPPFLAGS += " -DRUBY_19" if RUBY_VERSION =~ /\A1\.9/
3
+ $LDFLAGS += " -framework OpenCL" if RUBY_PLATFORM =~ /darwin/
4
+ create_makefile('barracuda')
@@ -0,0 +1,174 @@
1
+ $:.unshift(File.dirname(__FILE__) + '/../ext/')
2
+
3
+ require "test/unit"
4
+ require "barracuda"
5
+
6
+ include Barracuda
7
+
8
+ class TestBuffer < Test::Unit::TestCase
9
+ def test_buffer_create_no_data
10
+ assert_raise(ArgumentError) { Buffer.new }
11
+ end
12
+
13
+ def test_buffer_create_invalid_data
14
+ assert_raise(RuntimeError) { Buffer.new("xyz") }
15
+ end
16
+
17
+ def test_buffer_create_with_array
18
+ b = Buffer.new([1, 2, 3, 4, 5])
19
+ assert_equal [1, 2, 3, 4, 5], b.data
20
+ end
21
+
22
+ def test_buffer_create_with_splat
23
+ b = Buffer.new(1.0, 2.0, 3.0)
24
+ assert_equal [1.0, 2.0, 3.0], b.data
25
+ end
26
+
27
+ def test_buffer_set_data
28
+ b = Buffer.new(1)
29
+ b.data = [1, 2, 3]
30
+ assert_equal 3, b.data.size
31
+ end
32
+
33
+ def test_buffer_read
34
+ b = Buffer.new(4, 2, 3)
35
+ b.data[0] = 1
36
+ b.read
37
+ assert_equal [4,2,3], b.data
38
+ end
39
+
40
+ def test_buffer_write
41
+ b = Buffer.new(1, 2, 3)
42
+ b.data[0] = 4
43
+ b.write
44
+ b.read
45
+ assert_equal [4,2,3], b.data
46
+ end
47
+
48
+ def test_buffer_size_changed
49
+ b = Buffer.new(1, 2, 3)
50
+ b.data << 4
51
+ b.size_changed
52
+ b.read
53
+ assert_equal [1,2,3,4], b.data
54
+ end
55
+ end
56
+
57
+ class TestOutputBuffer < Test::Unit::TestCase
58
+ def test_create_int_output_buffer
59
+ b = OutputBuffer.new(:int, 5)
60
+ assert_equal 5, b.size
61
+ end
62
+
63
+ def test_create_float_output_buffer
64
+ b = OutputBuffer.new(:float, 5)
65
+ assert_equal 5, b.size
66
+ end
67
+
68
+ def test_create_output_buffer_with_invalid_type
69
+ assert_raise(ArgumentError) { OutputBuffer.new(:char, 5) }
70
+ end
71
+
72
+ def test_create_output_buffer_with_invalid_size
73
+ assert_raise(ArgumentError) { OutputBuffer.new(:int, 'x') }
74
+ end
75
+ end
76
+
77
+ class TestProgram < Test::Unit::TestCase
78
+ def test_program_create_invalid_code
79
+ assert_raise(Barracuda::SyntaxError) { Program.new "fib { SYNTAXERROR }" }
80
+ end
81
+
82
+ def test_program_create
83
+ assert_nothing_raised { Program.new "__kernel fib(int x) { return 0; }"}
84
+ end
85
+
86
+ def test_program_compile
87
+ p = Program.new
88
+ assert_nothing_raised { p.compile "__kernel fib(int x) { }" }
89
+ end
90
+
91
+ def test_kernel_run
92
+ p = Program.new("__kernel x_y_z(int x) { }")
93
+ assert_nothing_raised { p.x_y_z }
94
+ end
95
+
96
+ def test_kernel_missing
97
+ p = Program.new("__kernel x_y_z(int x) { }")
98
+ assert_raise(NoMethodError) { p.not_x_y_z }
99
+ end
100
+
101
+ def test_program_int_input_buffer
102
+ p = Program.new <<-'eof'
103
+ __kernel run(__global int* out, __global int* in, int total) {
104
+ int id = get_global_id(0);
105
+ if (id < total) out[id] = in[id] + 1;
106
+ }
107
+ eof
108
+
109
+ arr = (1..256).to_a
110
+ _in = Buffer.new(arr)
111
+ out = OutputBuffer.new(:int, arr.size)
112
+ p.run(out, _in, arr.size)
113
+ assert_equal arr.map {|x| x + 1 }, out.data
114
+ end
115
+
116
+ def test_program_float_buffer
117
+ p = Program.new <<-'eof'
118
+ __kernel run(__global float* out, __global int* in, int total) {
119
+ int id = get_global_id(0);
120
+ if (id < total) out[id] = (float)in[id] + 0.5;
121
+ }
122
+ eof
123
+
124
+ arr = (1..256).to_a
125
+ _in = Buffer.new(arr)
126
+ out = OutputBuffer.new(:float, arr.size)
127
+ p.run(out, _in, arr.size)
128
+ assert_equal arr.map {|x| x.to_f + 0.5 }, out.data
129
+ end
130
+
131
+ def test_program_set_worker_size
132
+ p = Program.new <<-'eof'
133
+ __kernel sum(__global int* out, __global int* in, int total) {
134
+ int id = get_global_id(0);
135
+ if (id < total) atom_add(&out[0], in[id]);
136
+ }
137
+ eof
138
+
139
+ arr = (1..517).to_a
140
+ sum = arr.inject(0) {|acc, el| acc + el }
141
+ _in = Buffer.new(arr)
142
+ out = OutputBuffer.new(:int, 1)
143
+ p.sum(out, _in, arr.size, :worker_size => arr.size)
144
+ assert_equal sum, out.data[0]
145
+ end
146
+
147
+ def test_program_largest_buffer_is_input
148
+ p = Program.new <<-'eof'
149
+ __kernel sum(__global int* out, __global int* in, int total) {
150
+ int id = get_global_id(0);
151
+ if (id < total) atom_add(&out[0], in[id]);
152
+ }
153
+ eof
154
+
155
+ arr = (1..517).to_a
156
+ sum = arr.inject(0) {|acc, el| acc + el }
157
+ _in = Buffer.new(arr)
158
+ out = OutputBuffer.new(:int, 1)
159
+ p.sum(out, _in, arr.size)
160
+ assert_equal sum, out.data[0]
161
+ end
162
+
163
+ def test_program_invalid_worker_size
164
+ p = Program.new("__kernel sum(int x) { }")
165
+ assert_raise(ArgumentError) { p.sum(:worker_size => "hello") }
166
+ assert_raise(ArgumentError) { p.sum(:worker => 1) }
167
+ end
168
+
169
+ def test_program_invalid_args
170
+ p = Program.new("__kernel sum(int x, __global int *y) { }")
171
+ assert_raise(ArgumentError) { p.sum(1, 2) }
172
+ assert_raise(ArgumentError) { p.sum(1, OutputBuffer.new(:int, 1), 3) }
173
+ end
174
+ end
metadata ADDED
@@ -0,0 +1,61 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: barracuda
3
+ version: !ruby/object:Gem::Version
4
+ version: "1.0"
5
+ platform: ruby
6
+ authors:
7
+ - Loren Segal
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2009-08-30 00:00:00 -04:00
13
+ default_executable:
14
+ dependencies: []
15
+
16
+ description:
17
+ email: lsegal@soen.ca
18
+ executables: []
19
+
20
+ extensions:
21
+ - ext/extconf.rb
22
+ extra_rdoc_files: []
23
+
24
+ files:
25
+ - ext/barracuda.c
26
+ - ext/extconf.rb
27
+ - benchmarks/to_float.rb
28
+ - test/test_barracuda.rb
29
+ - LICENSE
30
+ - README.md
31
+ - Rakefile
32
+ has_rdoc: true
33
+ homepage: http://github.com/lsegal/barracuda
34
+ licenses: []
35
+
36
+ post_install_message:
37
+ rdoc_options: []
38
+
39
+ require_paths:
40
+ - ext
41
+ required_ruby_version: !ruby/object:Gem::Requirement
42
+ requirements:
43
+ - - ">="
44
+ - !ruby/object:Gem::Version
45
+ version: "0"
46
+ version:
47
+ required_rubygems_version: !ruby/object:Gem::Requirement
48
+ requirements:
49
+ - - ">="
50
+ - !ruby/object:Gem::Version
51
+ version: "0"
52
+ version:
53
+ requirements: []
54
+
55
+ rubyforge_project: barracuda
56
+ rubygems_version: 1.3.4
57
+ signing_key:
58
+ specification_version: 3
59
+ summary: Barracuda is a wrapper library for OpenCL/CUDA GPGPU programming
60
+ test_files:
61
+ - test/test_barracuda.rb