barracuda 1.0
- data/LICENSE +22 -0
- data/README.md +163 -0
- data/Rakefile +18 -0
- data/benchmarks/to_float.rb +24 -0
- data/ext/barracuda.c +481 -0
- data/ext/extconf.rb +4 -0
- data/test/test_barracuda.rb +174 -0
- metadata +61 -0
data/LICENSE
ADDED
@@ -0,0 +1,22 @@
Copyright (c) 2009 Loren Segal

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,163 @@
Barracuda
=========

Written by Loren Segal in 2009.

SYNOPSIS
--------

Barracuda is a Ruby wrapper library for the [OpenCL][1] architecture. OpenCL is a
framework for multi-processor computing, most notably allowing a programmer
to run parallel programs on a GPU, taking advantage of the many cores
available.

Barracuda aims to abstract both CUDA and OpenCL; however, for now only OpenCL
on OS X 10.6 is supported. Patches to extend this support would be joyously
accepted!

Also note that Barracuda currently supports only two data types, namely ints
and floats. This should also be expanded.

INSTALLING
----------

As mentioned above, this library currently supports only OS X 10.6 (or an
earlier version with the OpenCL framework, if that's even possible). If you
manage to get it working on [insert system here] by tweaking the source,
please submit your patches.

Okay, assuming you have a compatible machine:

    sudo gem install barracuda

Or:

    git clone git://github.com/lsegal/barracuda
    cd barracuda
    rake install

USING
-----

The basic workflow behind the OpenCL architecture is:

1. Create a program (and kernel) to be run on the GPU's many cores.
2. Create input/output buffers to pass data from Ruby to the GPU and back.
3. Run the kernel with those buffers as its arguments.
4. Read the output buffer(s) to get your computed data.

In Barracuda, this looks basically like:

1. Create a `Barracuda::Program`
2. Create a `Barracuda::Buffer` or `Barracuda::OutputBuffer`
3. Call the kernel method on the program with buffers as arguments
4. Read output buffers

As you can see, there are only 3 basic classes: `Program`, `Buffer` (for input
data), and `OutputBuffer` (for output data).
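
The same workflow can be illustrated without a GPU. The plain-Ruby sketch below emulates it sequentially: a kernel is modeled as a block that receives its global id, and a hypothetical `emulate_kernel` helper (not part of Barracuda) runs it once per id. A real OpenCL device runs these work items in parallel and in no guaranteed order, so this only models which ids run, not how.

```ruby
# Hypothetical helper (not part of Barracuda): runs the kernel block once per
# global id, the way an OpenCL device conceptually runs one work item per id.
# Real devices execute work items in parallel and in unspecified order.
def emulate_kernel(global_size)
  global_size.times { |global_id| yield global_id }
end

# Step 2: the "buffers" are plain Ruby arrays here.
input  = [10, 20, 30]
output = Array.new(input.size, 0)

# Step 3: run the kernel. The global size (4) is deliberately larger than the
# data, so the kernel guards with the total size, as the kernels below do.
total = input.size
emulate_kernel(4) do |id|
  output[id] = input[id] + 1 if id < total
end

# Step 4: read the output buffer.
p output # prints [11, 21, 31]
```

The `id < total` guard is what keeps the extra work item from writing past the end of the data.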

EXAMPLE
-------

Consider the following example, which sums a range of integers:

    require 'barracuda'
    include Barracuda

    program = Program.new <<-'eof'
      #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
      __kernel void sum(__global int *out, __global int *in, int total) {
        int id = get_global_id(0);
        if (id < total) atom_add(&out[0], in[id]);
      }
    eof

    arr = (1..65535).to_a
    input = Buffer.new(arr)
    output = OutputBuffer.new(:int, 1)
    program.sum(output, input, arr.size)

    puts "The sum is: " + output.data[0].to_s

The above example computes the sum of the integers 1 to 65535 using up to
65536 parallel work items and returns the result in the 1-dimensional output
buffer (which stores integers and has length 1). The range stops at 65535
because the sum of 1 to 65536 (2,147,516,416) would overflow the kernel's
32-bit `int` accumulator. The kernel method `sum` is invoked by calling the
`#sum` method on the program object, and the arguments are passed in the same
order as the kernel parameters: the output buffer, followed by the input data
(the integers), followed by the total size of the input (since C does not have
the concept of array size).
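
As a CPU cross-check for the example above, the expected result and the overflow bound can be worked out in plain Ruby. The sum of 1..n is n(n+1)/2, and it must not exceed 2**31 - 1, the largest value of a 32-bit signed `int` (OpenCL C defines `int` as 32 bits). The helper name `triangular` is just for illustration.

```ruby
# CPU reference for the EXAMPLE kernel: the sum of 1..n is n * (n + 1) / 2.
INT32_MAX = 2**31 - 1

def triangular(n)
  n * (n + 1) / 2
end

# Find the largest n whose sum still fits in a 32-bit signed int.
largest_n = 0
largest_n += 1 while triangular(largest_n + 1) <= INT32_MAX

p largest_n                      # prints 65535
p triangular(65535)              # prints 2147450880
p triangular(65536) > INT32_MAX  # prints true
```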

We can also specify the worker size, i.e. the total number of work items
(kernel invocations) to run; OpenCL calls this the global work size.
Barracuda automatically uses the size of the largest buffer as the worker
size, but in some cases this may be too small or too large. To set it
manually, call the kernel with an options hash:

    program.my_kernel_method(..., :worker_size => 512)

Note that Barracuda always runs a power-of-2 number of work items: it rounds
the worker size up to the next power of 2 (at least 4), and raises it further
to the kernel's work group size reported by OpenCL if that is larger. This
means your OpenCL program might run more iterations of your kernel method than
you request. Because we can't rely on the worker size, we pass in the total
data size so that the kernel does not read or write past the end of our data.
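
As a rough illustration, the rounding described above can be modeled in plain Ruby. `effective_worker_size` is a hypothetical name, and `local` stands for the work group size the OpenCL runtime reports for the kernel (the extension queries `CL_KERNEL_WORK_GROUP_SIZE`); the `local` values below are made up for the example.

```ruby
# Hypothetical helper mirroring the rounding logic in ext/barracuda.c:
# start at 4, double until the requested size is covered, then make sure
# the result is not smaller than the kernel's work group size (local).
def effective_worker_size(requested, local)
  size = 4
  size *= 2 while size < requested
  size < local ? local : size
end

effective_worker_size(517, 256)   # => 1024
effective_worker_size(65535, 512) # => 65536
effective_worker_size(3, 64)      # => 64
```

The 517-element case shows why kernels need the `total` guard: 507 of the 1024 work items have no data to process.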

CLASS DETAILS
-------------

**Barracuda::Program**:

Represents an OpenCL program

    Program.new(PROGRAM_SOURCE = nil) => creates a new program
      - compiles PROGRAM_SOURCE if it is given

    Program#compile(SOURCE) => recompiles a program
      - raises Barracuda::SyntaxError (with the OpenCL build log) on failure

    Program#KERNEL_METHOD(*args) => runs KERNEL_METHOD in the compiled program
      - args should be the arguments defined in the kernel method.
      - supported argument types are Float, Fixnum, Buffer and OutputBuffer
        objects only.
      - if the last arg is a Hash, it should be an options hash with keys:
        - :worker_size => FIXNUM (the number of work items to run)
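
The argument rules above can be sketched in plain Ruby. `split_kernel_args` is a hypothetical helper written only to illustrate the behavior; the extension itself performs these checks in C inside `Program#method_missing`, and it tests for `Fixnum` where this sketch tests for `Integer`.

```ruby
# Hypothetical sketch (not Barracuda's code): separates a trailing options
# hash from the kernel arguments and validates :worker_size.
def split_kernel_args(args)
  return [args, nil] unless args.last.is_a?(Hash)

  opts = args.last
  worker_size = opts[:worker_size]
  unless worker_size.is_a?(Integer)
    raise ArgumentError,
          "opts hash must be {:worker_size => INT_VALUE}, got #{opts.inspect}"
  end
  [args[0...-1], worker_size]
end

kernel_args, worker_size = split_kernel_args([1, 2.5, { :worker_size => 512 }])
# kernel_args == [1, 2.5], worker_size == 512
```

As in the extension, an options hash with a missing or non-integer `:worker_size` raises `ArgumentError`.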

**Barracuda::Buffer**:

Stores data to be sent to an OpenCL kernel method

    Buffer.new(*buffer_data) => creates a new input buffer
      - buffer_data can also be given as a single Array

    Buffer#data => accessor for the buffer data

    Buffer#size_changed => call this if buffer.data was modified and its size changed
      - calls Buffer#write

    Buffer#write => call this if buffer.data was modified (size unchanged)
      - copies buffer.data into the internal cache that is sent to OpenCL memory

    Buffer#read => reads the cached data back into buffer.data
      - rebuilds buffer.data from the internal cache
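
Because the internal cache holds C `int`s and `float`s, values are narrowed on their way to the device. A rough plain-Ruby model of that conversion uses `Array#pack`, assuming a 32-bit `int` (an assumption here, though OpenCL C defines `int` that way): integers that do not fit wrap around, and Ruby's double-precision Floats come back as the nearest single-precision value. The helper names are hypothetical.

```ruby
# Rough model (an assumption, not Barracuda's code) of the conversion done
# when buffer data is copied into the native int/float cache.
# 'l' packs signed 32-bit integers, 'f' packs single-precision floats.
def through_int_cache(values)
  values.pack('l*').unpack('l*')
end

def through_float_cache(values)
  values.pack('f*').unpack('f*')
end

p through_int_cache([1, 2, 2**31])  # prints [1, 2, -2147483648]
p through_float_cache([1.5, 0.1])   # prints [1.5, 0.10000000149011612]
```

This is also why a `:float` OutputBuffer returns values such as `0.10000000149011612` rather than `0.1`.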

**Barracuda::OutputBuffer**:

Holds a buffer for data written from the kernel method.

    OutputBuffer.new(type, size) => creates a new output buffer
      - type can be :float or :int

    OutputBuffer#data => accessor for the buffer data

    OutputBuffer#size => returns the buffer size

    OutputBuffer#clear => zeroes the buffer's internal data cache

GLOSSARY
--------

* **Program**: an OpenCL program is generally created from a variant of C that
  has extra domain-specific keywords. A program has at least one "kernel"
  method, but can have many regular methods.

* **Kernel**: a special "entry" method in the program that is exposed to the
  programmer to be called via the OpenCL framework. A kernel method is
  marked by the `__kernel` qualifier in front of its declaration.

* **Buffer**: memory storage that is accessible to (and generally shared with)
  the program. Buffers are usually marked with the `__global` keyword in an
  OpenCL program.

COPYRIGHT & LICENSING
---------------------

Copyright 2009 Loren Segal, licensed under the MIT License

[1]: http://en.wikipedia.org/wiki/OpenCL "OpenCL"
data/Rakefile
ADDED
@@ -0,0 +1,18 @@
require 'rubygems'
require 'rake/gempackagetask'

WINDOWS = (RUBY_PLATFORM =~ /win32|cygwin/ ? true : false) rescue false
SUDO = WINDOWS ? '' : 'sudo'

load 'barracuda.gemspec'
Rake::GemPackageTask.new(SPEC) do |pkg|
  pkg.gem_spec = SPEC
  pkg.need_zip = true
  pkg.need_tar = true
end

desc "Install the gem locally"
task :install => :package do
  sh "#{SUDO} gem install pkg/#{SPEC.name}-#{SPEC.version}.gem --local"
  sh "rm -rf pkg/#{SPEC.name}-#{SPEC.version}" unless ENV['KEEP_FILES']
end
data/benchmarks/to_float.rb
ADDED
@@ -0,0 +1,24 @@
$:.unshift(File.dirname(__FILE__) + '/../ext')

require 'barracuda'
require 'benchmark'

include Barracuda

prog = Program.new <<-'eof'
  __kernel void sum(__global float *out, __global int *in, int total) {
    int i = get_global_id(0);
    if (i < total) out[i] = ((float)in[i] + 0.5f) / 3.8f + 2.0f;
  }
eof

arr = (1..3333333).to_a
input = Buffer.new(arr)
output = OutputBuffer.new(:float, arr.size)

TIMES = 1
Benchmark.bmbm do |x|
  x.report("cpu") { TIMES.times { arr.map {|n| (n.to_f + 0.5) / 3.8 + 2.0 } } }
  x.report("gpu") { TIMES.times { prog.sum(output, input, arr.size); output.clear } }
end
data/ext/barracuda.c
ADDED
@@ -0,0 +1,481 @@
#include <ruby.h>
#include <stdint.h>
#include <string.h>
#include <OpenCL/OpenCL.h>

static VALUE rb_mBarracuda;
static VALUE rb_cBuffer;
static VALUE rb_cOutputBuffer;
static VALUE rb_cProgram;
static VALUE rb_eProgramSyntaxError;
static VALUE rb_eOpenCLError;

static ID ba_worker_size;

static VALUE program_compile(VALUE self, VALUE source);
static VALUE buffer_data_set(VALUE self, VALUE new_value);

static cl_device_id device_id = NULL;
static cl_context context = NULL;
static cl_int err;

#define BUFFER_TYPE_FLOAT 0x0001
#define BUFFER_TYPE_INT   0x0002
#define BUFFER_TYPE_CHAR  0x0003

struct program {
  cl_program program;
};

struct kernel {
  cl_kernel kernel;
};

struct buffer {
  VALUE arr;          /* Ruby array exposed as Buffer#data */
  unsigned int type;  /* one of the BUFFER_TYPE_* constants */
  size_t num_items;
  size_t member_size;
  void *cachebuf;     /* native copy of the data exchanged with OpenCL */
  cl_mem data;        /* OpenCL memory object */
};

#define GET_PROGRAM() \
  struct program *program; \
  Data_Get_Struct(self, struct program, program);

#define GET_BUFFER() \
  struct buffer *buffer; \
  Data_Get_Struct(self, struct buffer, buffer);

static void
init_opencl()
{
  if (device_id == NULL) {
    err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
    if (err != CL_SUCCESS) {
      rb_raise(rb_eOpenCLError, "failed to create a device group");
    }
  }

  if (context == NULL) {
    context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
    if (!context) {
      rb_raise(rb_eOpenCLError, "failed to create a program context");
    }
  }
}

/* Marks the wrapped Ruby array so the GC does not collect it while the
 * buffer object is still alive. */
static void
mark_buffer(struct buffer *buffer)
{
  rb_gc_mark(buffer->arr);
}

static void
free_buffer(struct buffer *buffer)
{
  if (buffer->data) {
    clReleaseMemObject(buffer->data);
  }
  if (buffer->cachebuf) {
    xfree(buffer->cachebuf);
  }
  xfree(buffer);
}

static VALUE
buffer_s_allocate(VALUE klass)
{
  struct buffer *buffer;
  buffer = ALLOC(struct buffer);
  MEMZERO(buffer, struct buffer, 1);
  buffer->arr = Qnil;
  return Data_Wrap_Struct(klass, mark_buffer, free_buffer, buffer);
}

static void
buffer_update_cache_info(struct buffer *buffer)
{
  Check_Type(buffer->arr, T_ARRAY);
  if (RARRAY_LEN(buffer->arr) == 0) {
    rb_raise(rb_eArgError, "buffer data cannot be empty");
  }
  buffer->num_items = RARRAY_LEN(buffer->arr);

  switch (TYPE(RARRAY_PTR(buffer->arr)[0])) {
    case T_FIXNUM:
      buffer->type = BUFFER_TYPE_INT;
      buffer->member_size = sizeof(int);
      break;
    case T_FLOAT:
      buffer->type = BUFFER_TYPE_FLOAT;
      buffer->member_size = sizeof(float);
      break;
    default:
      rb_raise(rb_eRuntimeError, "invalid buffer data %s",
        RSTRING_PTR(rb_inspect(buffer->arr)));
  }
}

static VALUE
buffer_write(VALUE self)
{
  long i;

  GET_BUFFER();

  buffer_update_cache_info(buffer);

  if (buffer->cachebuf) {
    xfree(buffer->cachebuf);
    buffer->cachebuf = NULL;
  }
  buffer->cachebuf = xmalloc(buffer->num_items * buffer->member_size);

  for (i = 0; i < (long)buffer->num_items && i < RARRAY_LEN(buffer->arr); i++) {
    VALUE item = RARRAY_PTR(buffer->arr)[i];
    switch (buffer->type) {
      case BUFFER_TYPE_INT:
        /* NUM2INT raises on non-numeric or out-of-range items instead of
         * misinterpreting them the way FIX2INT would */
        ((int *)buffer->cachebuf)[i] = NUM2INT(item);
        break;
      case BUFFER_TYPE_FLOAT:
        ((float *)buffer->cachebuf)[i] = (float)NUM2DBL(item);
        break;
      default:
        ((uint32_t *)buffer->cachebuf)[i] = 0;
    }
  }

  return self;
}

static VALUE
buffer_read(VALUE self)
{
  size_t i;

  GET_BUFFER();

  buffer->arr = rb_ary_new2(buffer->num_items);

  for (i = 0; i < buffer->num_items; i++) {
    switch (buffer->type) {
      case BUFFER_TYPE_INT:
        rb_ary_push(buffer->arr, INT2NUM(((int *)buffer->cachebuf)[i]));
        break;
      case BUFFER_TYPE_FLOAT:
        rb_ary_push(buffer->arr, rb_float_new(((float *)buffer->cachebuf)[i]));
        break;
      default:
        rb_ary_push(buffer->arr, Qnil);
    }
  }

  return self;
}

static VALUE
buffer_size_changed(VALUE self)
{
  GET_BUFFER();

  /* validate the new data before releasing the old OpenCL buffer */
  buffer_update_cache_info(buffer);

  if (buffer->data) {
    clReleaseMemObject(buffer->data);
    buffer->data = NULL;
  }
  buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
    buffer->num_items * buffer->member_size, NULL, &err);
  if (!buffer->data) {
    rb_raise(rb_eOpenCLError, "failed to allocate buffer memory");
  }

  buffer_write(self);

  return self;
}

static VALUE
buffer_data(VALUE self)
{
  GET_BUFFER();
  return buffer->arr;
}

static VALUE
buffer_data_set(VALUE self, VALUE new_value)
{
  GET_BUFFER();

  buffer->arr = new_value;
  buffer_size_changed(self);
  return buffer->arr;
}

static VALUE
buffer_initialize(int argc, VALUE *argv, VALUE self)
{
  GET_BUFFER();

  if (argc == 0) {
    rb_raise(rb_eArgError, "no buffer data given");
  }

  if (TYPE(argv[0]) == T_ARRAY) {
    buffer_data_set(self, argv[0]);
  }
  else {
    buffer_data_set(self, rb_ary_new4(argc, argv));
  }

  return self;
}

static VALUE
obuffer_initialize(VALUE self, VALUE type, VALUE size)
{
  const char *type_name;
  GET_BUFFER();

  /* type is normally a Symbol (:float or :int). StringValue() only accepts
   * objects that respond to #to_str, so convert with #to_s instead. */
  type = rb_obj_as_string(type);
  type_name = StringValueCStr(type);
  if (strcmp(type_name, "float") == 0) {
    buffer->type = BUFFER_TYPE_FLOAT;
    buffer->member_size = sizeof(float);
  }
  else if (strcmp(type_name, "int") == 0) {
    buffer->type = BUFFER_TYPE_INT;
    buffer->member_size = sizeof(int);
  }
  else {
    rb_raise(rb_eArgError, "type can only be :float or :int");
  }

  if (TYPE(size) != T_FIXNUM || FIX2LONG(size) <= 0) {
    rb_raise(rb_eArgError, "expecting a positive buffer size as argument 2");
  }

  buffer->num_items = (size_t)FIX2LONG(size);
  /* zero-filled, so the first kernel run starts from a cleared buffer */
  buffer->cachebuf = xcalloc(buffer->num_items, buffer->member_size);
  buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
    buffer->member_size * buffer->num_items, NULL, &err);
  if (!buffer->data) {
    rb_raise(rb_eOpenCLError, "failed to allocate output buffer memory");
  }

  return self;
}

static VALUE
obuffer_clear(VALUE self)
{
  GET_BUFFER();
  memset(buffer->cachebuf, 0, buffer->member_size * buffer->num_items);
  return self;
}

static VALUE
obuffer_size(VALUE self)
{
  GET_BUFFER();
  return INT2FIX(buffer->num_items);
}

static void
free_program(struct program *program)
{
  if (program->program) {
    clReleaseProgram(program->program);
  }
  xfree(program);
}

static VALUE
program_s_allocate(VALUE klass)
{
  struct program *program;
  program = ALLOC(struct program);
  MEMZERO(program, struct program, 1);
  return Data_Wrap_Struct(klass, 0, free_program, program);
}

static VALUE
program_initialize(int argc, VALUE *argv, VALUE self)
{
  VALUE source;

  rb_scan_args(argc, argv, "01", &source);
  if (source != Qnil) {
    program_compile(self, source);
  }

  return self;
}

static VALUE
program_compile(VALUE self, VALUE source)
{
  const char *c_source;
  GET_PROGRAM();

  if (program->program) {
    clReleaseProgram(program->program);
    program->program = 0;
  }

  c_source = StringValueCStr(source);
  program->program = clCreateProgramWithSource(context, 1, &c_source, NULL, &err);
  if (!program->program) {
    rb_raise(rb_eOpenCLError, "failed to create compute program");
  }

  err = clBuildProgram(program->program, 0, NULL, NULL, NULL, NULL);
  if (err != CL_SUCCESS) {
    /* zero-initialized: if the log does not fit, the query fails and the
     * message is left empty rather than filled with garbage */
    char buffer[2048] = "";

    clGetProgramBuildInfo(program->program, device_id, CL_PROGRAM_BUILD_LOG,
      sizeof(buffer) - 1, buffer, NULL);
    clReleaseProgram(program->program);
    program->program = 0;
    rb_raise(rb_eProgramSyntaxError, "%s", buffer);
  }

  return Qtrue;
}

#define CLEAN() program_clean(kernel, commands);
#define ERROR(msg) if (err != CL_SUCCESS) { CLEAN(); rb_raise(rb_eOpenCLError, msg); }

static void
program_clean(cl_kernel kernel, cl_command_queue commands)
{
  clReleaseKernel(kernel);
  clReleaseCommandQueue(commands);
}

static VALUE
program_method_missing(int argc, VALUE *argv, VALUE self)
{
  int i;
  size_t local = 0, global = 0;
  cl_kernel kernel;
  cl_command_queue commands;
  VALUE name;
  GET_PROGRAM();

  /* argv[0] is the method name as a Symbol; StringValue() rejects Symbols */
  name = rb_obj_as_string(argv[0]);
  kernel = clCreateKernel(program->program, StringValueCStr(name), &err);
  if (!kernel || err != CL_SUCCESS) {
    rb_raise(rb_eNoMethodError, "no kernel method '%s'", RSTRING_PTR(name));
  }

  commands = clCreateCommandQueue(context, device_id, 0, &err);
  if (!commands) {
    clReleaseKernel(kernel);
    rb_raise(rb_eOpenCLError, "could not execute kernel method '%s'", RSTRING_PTR(name));
  }

  for (i = 1; i < argc; i++) {
    err = CL_SUCCESS;
    if (i == argc - 1 && TYPE(argv[i]) == T_HASH) {
      VALUE worker_size = rb_hash_aref(argv[i], ID2SYM(ba_worker_size));
      if (RTEST(worker_size) && TYPE(worker_size) == T_FIXNUM) {
        global = FIX2UINT(worker_size);
      }
      else {
        CLEAN();
        rb_raise(rb_eArgError, "opts hash must be {:worker_size => INT_VALUE}, got %s",
          RSTRING_PTR(rb_inspect(argv[i])));
      }
      break;
    }

    switch (TYPE(argv[i])) {
      case T_FIXNUM: {
        int value = FIX2INT(argv[i]);
        err = clSetKernelArg(kernel, i - 1, sizeof(int), &value);
        break;
      }
      case T_FLOAT: {
        float value = RFLOAT_VALUE(argv[i]);
        err = clSetKernelArg(kernel, i - 1, sizeof(float), &value);
        break;
      }
      case T_ARRAY: {
        /* TODO */
        /* fall-through */
      }
      default:
        if (RTEST(rb_obj_is_kind_of(argv[i], rb_cOutputBuffer))) {
          struct buffer *buffer;
          Data_Get_Struct(argv[i], struct buffer, buffer);

          /* upload the (possibly cleared) host cache so the kernel does not
           * start from uninitialized device memory */
          err = clEnqueueWriteBuffer(commands, buffer->data, CL_TRUE, 0,
            buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
          if (err == CL_SUCCESS) {
            err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
          }
          if (buffer->num_items > global) {
            global = buffer->num_items;
          }
        }
        else if (RTEST(rb_obj_is_kind_of(argv[i], rb_cBuffer))) {
          struct buffer *buffer;
          Data_Get_Struct(argv[i], struct buffer, buffer);

          buffer_write(argv[i]);
          err = clEnqueueWriteBuffer(commands, buffer->data, CL_TRUE, 0,
            buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
          if (err == CL_SUCCESS) {
            err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
          }
          /* input buffers also count towards the default worker size */
          if (buffer->num_items > global) {
            global = buffer->num_items;
          }
        }
        else {
          /* unsupported argument type */
          err = CL_INVALID_ARG_VALUE;
        }
        break;
    }
    if (err != CL_SUCCESS) {
      CLEAN();
      rb_raise(rb_eArgError, "invalid kernel method parameter: %s", RSTRING_PTR(rb_inspect(argv[i])));
    }
  }

  err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &local, NULL);
  ERROR("failed to retrieve kernel work group info");

  { /* global work size must be a power of 2, at least 4 and not smaller than local */
    size_t size = 4;
    while (size < global) size *= 2;
    global = size;
    if (global < local) global = local;
  }

  err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
  ERROR("failed to execute kernel method");

  clFinish(commands);

  for (i = 1; i < argc; i++) {
    if (RTEST(rb_obj_is_kind_of(argv[i], rb_cOutputBuffer))) {
      struct buffer *buffer;
      Data_Get_Struct(argv[i], struct buffer, buffer);
      err = clEnqueueReadBuffer(commands, buffer->data, CL_TRUE, 0,
        buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
      ERROR("failed to read output buffer");
      buffer_read(argv[i]);
    }
  }

  CLEAN();
  return Qnil;
}

void
Init_barracuda()
{
  ba_worker_size = rb_intern("worker_size");

  rb_mBarracuda = rb_define_module("Barracuda");

  rb_eProgramSyntaxError = rb_define_class_under(rb_mBarracuda, "SyntaxError", rb_eSyntaxError);
  rb_eOpenCLError = rb_define_class_under(rb_mBarracuda, "OpenCLError", rb_eStandardError);

  rb_cProgram = rb_define_class_under(rb_mBarracuda, "Program", rb_cObject);
  rb_define_alloc_func(rb_cProgram, program_s_allocate);
  rb_define_method(rb_cProgram, "initialize", program_initialize, -1);
  rb_define_method(rb_cProgram, "compile", program_compile, 1);
  rb_define_method(rb_cProgram, "method_missing", program_method_missing, -1);

  rb_cBuffer = rb_define_class_under(rb_mBarracuda, "Buffer", rb_cObject);
  rb_define_alloc_func(rb_cBuffer, buffer_s_allocate);
  rb_define_method(rb_cBuffer, "initialize", buffer_initialize, -1);
  rb_define_method(rb_cBuffer, "size_changed", buffer_size_changed, 0);
  rb_define_method(rb_cBuffer, "read", buffer_read, 0);
  rb_define_method(rb_cBuffer, "write", buffer_write, 0);
  rb_define_method(rb_cBuffer, "data", buffer_data, 0);
  rb_define_method(rb_cBuffer, "data=", buffer_data_set, 1);

  rb_cOutputBuffer = rb_define_class_under(rb_mBarracuda, "OutputBuffer", rb_cBuffer);
  rb_define_method(rb_cOutputBuffer, "initialize", obuffer_initialize, 2);
  rb_define_method(rb_cOutputBuffer, "size", obuffer_size, 0);
  rb_define_method(rb_cOutputBuffer, "clear", obuffer_clear, 0);
  rb_undef_method(rb_cOutputBuffer, "write");
  rb_undef_method(rb_cOutputBuffer, "size_changed");
  rb_undef_method(rb_cOutputBuffer, "data=");

  init_opencl();
}
data/test/test_barracuda.rb
ADDED
@@ -0,0 +1,174 @@
$:.unshift(File.dirname(__FILE__) + '/../ext/')

require "test/unit"
require "barracuda"

include Barracuda

class TestBuffer < Test::Unit::TestCase
  def test_buffer_create_no_data
    assert_raise(ArgumentError) { Buffer.new }
  end

  def test_buffer_create_invalid_data
    assert_raise(RuntimeError) { Buffer.new("xyz") }
  end

  def test_buffer_create_with_array
    b = Buffer.new([1, 2, 3, 4, 5])
    assert_equal [1, 2, 3, 4, 5], b.data
  end

  def test_buffer_create_with_splat
    b = Buffer.new(1.0, 2.0, 3.0)
    assert_equal [1.0, 2.0, 3.0], b.data
  end

  def test_buffer_set_data
    b = Buffer.new(1)
    b.data = [1, 2, 3]
    assert_equal 3, b.data.size
  end

  def test_buffer_read
    b = Buffer.new(4, 2, 3)
    b.data[0] = 1
    b.read
    assert_equal [4,2,3], b.data
  end

  def test_buffer_write
    b = Buffer.new(1, 2, 3)
    b.data[0] = 4
    b.write
    b.read
    assert_equal [4,2,3], b.data
  end

  def test_buffer_size_changed
    b = Buffer.new(1, 2, 3)
    b.data << 4
    b.size_changed
    b.read
    assert_equal [1,2,3,4], b.data
  end
end

class TestOutputBuffer < Test::Unit::TestCase
  def test_create_int_output_buffer
    b = OutputBuffer.new(:int, 5)
    assert_equal 5, b.size
  end

  def test_create_float_output_buffer
    b = OutputBuffer.new(:float, 5)
    assert_equal 5, b.size
  end

  def test_create_output_buffer_with_invalid_type
    assert_raise(ArgumentError) { OutputBuffer.new(:char, 5) }
  end

  def test_create_output_buffer_with_invalid_size
    assert_raise(ArgumentError) { OutputBuffer.new(:int, 'x') }
  end
end

class TestProgram < Test::Unit::TestCase
  def test_program_create_invalid_code
    assert_raise(Barracuda::SyntaxError) { Program.new "fib { SYNTAXERROR }" }
  end

  def test_program_create
    assert_nothing_raised { Program.new "__kernel void fib(int x) { }" }
  end

  def test_program_compile
    p = Program.new
    assert_nothing_raised { p.compile "__kernel void fib(int x) { }" }
  end

  def test_kernel_run
    p = Program.new("__kernel void x_y_z(int x) { }")
    assert_nothing_raised { p.x_y_z(1) }
  end

  def test_kernel_missing
    p = Program.new("__kernel void x_y_z(int x) { }")
    assert_raise(NoMethodError) { p.not_x_y_z }
  end

  def test_program_int_input_buffer
    p = Program.new <<-'eof'
      __kernel void run(__global int* out, __global int* in, int total) {
        int id = get_global_id(0);
        if (id < total) out[id] = in[id] + 1;
      }
    eof

    arr = (1..256).to_a
    _in = Buffer.new(arr)
    out = OutputBuffer.new(:int, arr.size)
    p.run(out, _in, arr.size)
    assert_equal arr.map {|x| x + 1 }, out.data
  end

  def test_program_float_buffer
    p = Program.new <<-'eof'
      __kernel void run(__global float* out, __global int* in, int total) {
        int id = get_global_id(0);
        if (id < total) out[id] = (float)in[id] + 0.5f;
      }
    eof

    arr = (1..256).to_a
    _in = Buffer.new(arr)
    out = OutputBuffer.new(:float, arr.size)
    p.run(out, _in, arr.size)
    assert_equal arr.map {|x| x.to_f + 0.5 }, out.data
  end

  def test_program_set_worker_size
    p = Program.new <<-'eof'
      #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
      __kernel void sum(__global int* out, __global int* in, int total) {
        int id = get_global_id(0);
        if (id < total) atom_add(&out[0], in[id]);
      }
    eof

    arr = (1..517).to_a
    sum = arr.inject(0) {|acc, el| acc + el }
    _in = Buffer.new(arr)
    out = OutputBuffer.new(:int, 1)
    p.sum(out, _in, arr.size, :worker_size => arr.size)
    assert_equal sum, out.data[0]
  end

  def test_program_largest_buffer_is_input
    p = Program.new <<-'eof'
      #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
      __kernel void sum(__global int* out, __global int* in, int total) {
        int id = get_global_id(0);
        if (id < total) atom_add(&out[0], in[id]);
      }
    eof

    arr = (1..517).to_a
    sum = arr.inject(0) {|acc, el| acc + el }
    _in = Buffer.new(arr)
    out = OutputBuffer.new(:int, 1)
    p.sum(out, _in, arr.size)
    assert_equal sum, out.data[0]
  end

  def test_program_invalid_worker_size
    p = Program.new("__kernel void sum(int x) { }")
    assert_raise(ArgumentError) { p.sum(:worker_size => "hello") }
    assert_raise(ArgumentError) { p.sum(:worker => 1) }
  end

  def test_program_invalid_args
    p = Program.new("__kernel void sum(int x, __global int *y) { }")
    assert_raise(ArgumentError) { p.sum(1, 2) }
    assert_raise(ArgumentError) { p.sum(1, OutputBuffer.new(:int, 1), 3) }
  end
end
metadata
ADDED
@@ -0,0 +1,61 @@
--- !ruby/object:Gem::Specification
name: barracuda
version: !ruby/object:Gem::Version
  version: "1.0"
platform: ruby
authors:
- Loren Segal
autorequire:
bindir: bin
cert_chain: []

date: 2009-08-30 00:00:00 -04:00
default_executable:
dependencies: []

description:
email: lsegal@soen.ca
executables: []

extensions:
- ext/extconf.rb
extra_rdoc_files: []

files:
- ext/barracuda.c
- ext/extconf.rb
- benchmarks/to_float.rb
- test/test_barracuda.rb
- LICENSE
- README.md
- Rakefile
has_rdoc: true
homepage: http://github.com/lsegal/barracuda
licenses: []

post_install_message:
rdoc_options: []

require_paths:
- ext
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: "0"
  version:
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: "0"
  version:
requirements: []

rubyforge_project: barracuda
rubygems_version: 1.3.4
signing_key:
specification_version: 3
summary: Barracuda is a wrapper library for OpenCL/CUDA GPGPU programming
test_files:
- test/test_barracuda.rb