barracuda 1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/LICENSE +22 -0
- data/README.md +163 -0
- data/Rakefile +18 -0
- data/benchmarks/to_float.rb +24 -0
- data/ext/barracuda.c +481 -0
- data/ext/extconf.rb +4 -0
- data/test/test_barracuda.rb +174 -0
- metadata +61 -0
data/LICENSE
ADDED
@@ -0,0 +1,22 @@
+Copyright (c) 2009 Loren Segal
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the "Software"), to deal in the Software without
+restriction, including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,163 @@
+Barracuda
+=========
+
+Written by Loren Segal in 2009.
+
+SYNOPSIS
+--------
+
+Barracuda is a Ruby wrapper library for the [OpenCL][1] architecture. OpenCL is a
+framework for multi-processor computing, most notably allowing a programmer
+to run parallel programs on a GPU, taking advantage of the many cores
+available.
+
+Barracuda aims to abstract both CUDA and OpenCL; however, for now only OpenCL
+on OSX 10.6 is supported. Patches to extend this support would be joyously
+accepted!
+
+Also note that Barracuda currently supports only two data types, ints and
+floats. This should also be expanded.
+
+INSTALLING
+----------
+
+As mentioned above, this library currently only supports OSX 10.6 (or an earlier
+version with the OpenCL framework, if that's even possible). If you manage to
+mess with the source and get it working on [insert system here], please submit
+your patches.
+
+Okay, assuming you have a compatible machine:
+
+    sudo gem install barracuda
+
+Or:
+
+    git clone git://github.com/lsegal/barracuda
+    cd barracuda
+    rake install
+
+USING
+-----
+
+The basic workflow behind the OpenCL architecture is:
+
+1. Create a program (and kernel) to be run on the GPU's many cores.
+2. Create input/output buffers to pass data from Ruby to the GPU and back.
+3. Read the output buffer(s) to get your computed data.
+
+In Barracuda, this looks basically like:
+
+1. Create a `Barracuda::Program`
+2. Create a `Barracuda::Buffer` or `Barracuda::OutputBuffer`
+3. Call the kernel method on the program with buffers as arguments
+4. Read output buffers
+
+As you can see, there are only 3 basic classes: `Program`, `Buffer` (for input
+data), and `OutputBuffer` (for output data).
+
+EXAMPLE
+-------
+
+Consider the following example to sum a bunch of integers:
+
+    program = Program.new <<-'eof'
+      __kernel void sum(__global int *out, __global int *in, int total) {
+        int id = get_global_id(0);
+        if (id < total) atom_add(&out[0], in[id]);
+      }
+    eof
+
+    arr = (1..65535).to_a
+    input = Buffer.new(arr)
+    output = OutputBuffer.new(:int, 1)
+    program.sum(output, input, arr.size)
+
+    puts "The sum is: " + output.data[0].to_s
+
+The above example computes the sum of the integers 1 to 65535 in parallel
+(65535 is the largest bound whose sum, 2147450880, still fits in a 32-bit `int`)
+and returns the result in the 1-dimensional output buffer (which stores
+integers and has length 1). The kernel method `sum` is invoked by calling the
+`#sum` method on the program object, and the arguments are passed in order:
+the output buffer, then the input data (the integers), then the total size of
+the input (since C arrays do not carry their own length).
+
+We can also specify the global work size (the total number of work items, i.e.
+kernel invocations, to run). Barracuda automatically selects the size of the
+largest buffer as the global work size, but in some cases this may be too small
+or too large. To set it manually, call the kernel with an options hash:
+
+    program.my_kernel_method(..., :worker_size => 512)
+
+Note that Barracuda always runs a power-of-2 global work size: it rounds the
+requested size up to the next power of 2 if needed, which means your OpenCL
+program might run more iterations of your kernel method than you request.
+Because we cannot rely on the global work size matching the data, we pass in
+the total data size so the kernel does not read or write past its buffers.
+
+CLASS DETAILS
+-------------
+
+**Barracuda::Program**:
+
+Represents an OpenCL program
+
+    Program.new(PROGRAM_SOURCE) => creates a new program
+
+    Program#compile(SOURCE) => recompiles a program
+
+    Program#KERNEL_METHOD(*args) => runs KERNEL_METHOD in the compiled program
+      - args should be the arguments defined in the kernel method.
+      - supported argument types are Float, Fixnum, Buffer and OutputBuffer objects.
+      - if the last arg is a Hash, it should be an options hash with keys:
+          - :worker_size => FIXNUM (the global work size, i.e. iterations to run)
+
+**Barracuda::Buffer**:
+
+Stores data to be sent to an OpenCL kernel method
+
+    Buffer.new(*buffer_data) => creates a new input buffer
+
+    Buffer#data => accessor for the buffer data
+
+    Buffer#size_changed => call this if buffer.data was modified and its size changed
+      - calls Buffer#write
+
+    Buffer#write => call this if buffer.data was modified (size not changed)
+      - flushes the buffer.data cache to the OpenCL internal memory buffer
+
+    Buffer#read => reads the cached data back into buffer.data
+      - refreshes the buffer.data cache according to the internal memory buffer
+
+**Barracuda::OutputBuffer**:
+
+Holds a buffer for data written from the kernel method.
+
+    OutputBuffer.new(type, size) => creates a new output buffer
+      - type can be :float or :int
+
+    OutputBuffer#data => accessor for the buffer data
+
+    OutputBuffer#size => returns the buffer size
+
+GLOSSARY
+--------
+
+* **Program**: an OpenCL program is generally created from a variant of C that
+has extra domain-specific keywords. A program has at least one "kernel"
+method, but can have many regular methods.
+
+* **Kernel**: a special "entry" method in the program that is exposed to the
+programmer to be called via the OpenCL framework. A kernel method is
+marked by the `__kernel` keyword before the method signature.
+
+* **Buffer**: memory storage which is accessible to (and generally shared with)
+the program. Buffers are usually marked with the `__global` keyword in an
+OpenCL program.
+
+COPYRIGHT & LICENSING
+---------------------
+
+Copyright 2009 Loren Segal, licensed under the MIT License
+
+[1]: http://en.wikipedia.ca/wiki/OpenCL "OpenCL"
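The `Program#KERNEL_METHOD(*args)` convention in the README relies on Ruby's `method_missing`: the called method's name becomes the kernel name, and a trailing Hash is treated as the options hash. The pure-Ruby sketch below shows only that argument handling, with no OpenCL involved. `KernelCallSketch` and its list of kernel names are hypothetical, and the checks only approximate what `program_method_missing` in `ext/barracuda.c` does.

```ruby
# Hypothetical stand-in for Barracuda::Program: it records kernel calls
# instead of running them on a GPU.
class KernelCallSketch
  def initialize(kernels)
    @kernels = kernels.map(&:to_s) # names of __kernel functions assumed to exist
  end

  def method_missing(name, *args)
    kernel = name.to_s
    # Unknown kernels raise NoMethodError, as the C extension does.
    raise NoMethodError, "no kernel method '#{kernel}'" unless @kernels.include?(kernel)

    worker_size = nil
    if args.last.is_a?(Hash)
      opts = args.pop
      worker_size = opts[:worker_size]
      unless worker_size.is_a?(Integer)
        raise ArgumentError, "opts hash must be {:worker_size => INT_VALUE}, got #{opts.inspect}"
      end
    end

    # The remaining arguments map to kernel argument indices 0, 1, 2, ...
    { kernel: kernel, args: args, worker_size: worker_size }
  end

  def respond_to_missing?(name, include_private = false)
    @kernels.include?(name.to_s) || super
  end
end
```

For example, `KernelCallSketch.new([:sum]).sum(:out, :in, 3, :worker_size => 512)` returns the kernel name, the three positional arguments, and `512` as the requested worker size.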
data/Rakefile
ADDED
@@ -0,0 +1,18 @@
+require 'rubygems'
+require 'rake/gempackagetask'
+
+WINDOWS = (PLATFORM =~ /win32|cygwin/ ? true : false) rescue false
+SUDO = WINDOWS ? '' : 'sudo'
+
+load 'barracuda.gemspec'
+Rake::GemPackageTask.new(SPEC) do |pkg|
+  pkg.gem_spec = SPEC
+  pkg.need_zip = true
+  pkg.need_tar = true
+end
+
+desc "Install the gem locally"
+task :install => :package do
+  sh "#{SUDO} gem install pkg/#{SPEC.name}-#{SPEC.version}.gem --local"
+  sh "rm -rf pkg/#{SPEC.name}-#{SPEC.version}" unless ENV['KEEP_FILES']
+end
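The Rakefile detects Windows through the old Ruby 1.8 `PLATFORM` constant. On Rubies without that constant the lookup raises `NameError`, the `rescue false` modifier swallows it, and `WINDOWS` silently becomes `false`. A sketch of the same check based on `RbConfig::CONFIG['host_os']` might look like this. `windows_host?` and `sudo_prefix` are hypothetical helper names, and the host-name pattern is an assumption.

```ruby
require 'rbconfig'

# Hypothetical helper: detect Windows-like hosts from RbConfig instead of the
# legacy PLATFORM constant used in the Rakefile.
def windows_host?(host_os = RbConfig::CONFIG['host_os'])
  !!(host_os =~ /mswin|mingw|cygwin/)
end

# Prefix for the install command: empty on Windows, 'sudo' elsewhere,
# matching the Rakefile's SUDO constant.
def sudo_prefix(host_os = RbConfig::CONFIG['host_os'])
  windows_host?(host_os) ? '' : 'sudo'
end
```

Defining these as methods with a defaulted `host_os` argument keeps the check testable for other platforms without changing global state.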
data/benchmarks/to_float.rb
ADDED
@@ -0,0 +1,24 @@
+$:.unshift(File.dirname(__FILE__) + '/../ext')
+
+require 'barracuda'
+require 'benchmark'
+
+include Barracuda
+
+prog = Program.new <<-'eof'
+  __kernel sum(__global float *out, __global int *in, int total) {
+    int i = get_global_id(0);
+    if (i < total) out[i] = ((float)in[i] + 0.5) / 3.8 + 2.0;
+  }
+eof
+
+arr = (1..3333333).to_a
+input = Buffer.new(arr)
+output = OutputBuffer.new(:float, arr.size)
+
+TIMES = 1
+Benchmark.bmbm do |x|
+  x.report("cpu") { TIMES.times { arr.map {|x| (x.to_f + 0.5) / 3.8 + 2.0 } } }
+  x.report("gpu") { TIMES.times { prog.sum(output, input, arr.size); output.clear } }
+end
+
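The benchmark's `cpu` report computes `(x.to_f + 0.5) / 3.8 + 2.0` in Ruby's double precision, while the kernel writes into a `:float` output buffer, so the two results should not be expected to match bit for bit. A sketch of how the GPU output could be checked against a CPU reference, assuming the buffer holds 32-bit IEEE floats: round each CPU value to single precision with `Array#pack('f')` and compare with a relative tolerance. `to_single`, `reference` and `close_enough?` are hypothetical helpers, and the `1e-6` tolerance is an assumption.

```ruby
# Round a Ruby Float (double) to the nearest 32-bit float, as stored in a
# :float OutputBuffer, by packing it as a native single-precision value.
def to_single(x)
  [x].pack('f').unpack1('f')
end

# CPU reference for the benchmark kernel: out[i] = (in[i] + 0.5) / 3.8 + 2.0
def reference(values)
  values.map { |x| to_single((x.to_f + 0.5) / 3.8 + 2.0) }
end

# Element-wise comparison with a relative tolerance (assumed 1e-6).
def close_enough?(expected, actual, rel_tol = 1e-6)
  return false unless expected.size == actual.size
  expected.zip(actual).all? do |e, a|
    (e - a).abs <= rel_tol * [e.abs, a.abs, 1.0].max
  end
end
```

After `prog.sum(output, input, arr.size)`, a check such as `close_enough?(reference(arr), output.data)` would then tolerate single-precision rounding while still catching wrong results.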
data/ext/barracuda.c
ADDED
@@ -0,0 +1,481 @@
+#include <ruby.h>
+#include <OpenCL/OpenCL.h>
+
+static VALUE rb_mBarracuda;
+static VALUE rb_cBuffer;
+static VALUE rb_cOutputBuffer;
+static VALUE rb_cProgram;
+static VALUE rb_eProgramSyntaxError;
+static VALUE rb_eOpenCLError;
+
+static ID ba_worker_size;
+
+static VALUE program_compile(VALUE self, VALUE source);
+static VALUE buffer_data_set(VALUE self, VALUE new_value);
+
+static cl_device_id device_id = NULL;
+static cl_context context = NULL;
+static int err;
+
+#define BUFFER_TYPE_FLOAT 0x0001
+#define BUFFER_TYPE_INT 0x0002
+#define BUFFER_TYPE_CHAR 0x0003
+
+struct program {
+    cl_program program;
+};
+
+struct kernel {
+    cl_kernel kernel;
+};
+
+struct buffer {
+    VALUE arr;
+    unsigned int type;
+    size_t num_items;
+    size_t member_size;
+    void *cachebuf;
+    cl_mem data;
+};
+
+#define GET_PROGRAM() \
+    struct program *program; \
+    Data_Get_Struct(self, struct program, program);
+
+#define GET_BUFFER() \
+    struct buffer *buffer; \
+    Data_Get_Struct(self, struct buffer, buffer);
+
+static void
+init_opencl()
+{
+    if (device_id == NULL) {
+        err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
+        if (err != CL_SUCCESS) {
+            rb_raise(rb_eOpenCLError, "failed to create a device group");
+        }
+    }
+
+    if (context == NULL) {
+        context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
+        if (!context) {
+            rb_raise(rb_eOpenCLError, "failed to create a program context");
+        }
+    }
+}
+
+static void
+free_buffer(struct buffer *buffer)
+{
+    fflush(stdout);
+    clReleaseMemObject(buffer->data);
+    rb_gc_mark(buffer->arr);
+    ruby_xfree(buffer->cachebuf);
+    ruby_xfree(buffer);
+}
+
+static VALUE
+buffer_s_allocate(VALUE klass)
+{
+    struct buffer *buffer;
+    buffer = ALLOC(struct buffer);
+    MEMZERO(buffer, struct buffer, 1);
+    buffer->arr = Qnil;
+    return Data_Wrap_Struct(klass, 0, free_buffer, buffer);
+}
+
+static void
+buffer_update_cache_info(struct buffer *buffer)
+{
+    buffer->num_items = RARRAY_LEN(buffer->arr);
+
+    switch (TYPE(RARRAY_PTR(buffer->arr)[0])) {
+        case T_FIXNUM:
+            buffer->type = BUFFER_TYPE_INT;
+            buffer->member_size = sizeof(int);
+            break;
+        case T_FLOAT:
+            buffer->type = BUFFER_TYPE_FLOAT;
+            buffer->member_size = sizeof(float);
+            break;
+        default:
+            rb_raise(rb_eRuntimeError, "invalid buffer data %s",
+                RSTRING_PTR(rb_inspect(buffer->arr)));
+    }
+}
+
+static VALUE
+buffer_write(VALUE self)
+{
+    unsigned int i;
+
+    GET_BUFFER();
+
+    buffer_update_cache_info(buffer);
+
+    if (buffer->cachebuf) {
+        xfree(buffer->cachebuf);
+    }
+    buffer->cachebuf = malloc(buffer->num_items * buffer->member_size);
+
+    for (i = 0; i < RARRAY_LEN(buffer->arr); i++) {
+        VALUE item = RARRAY_PTR(buffer->arr)[i];
+        switch (buffer->type) {
+            case BUFFER_TYPE_INT: {
+                int value = FIX2INT(item);
+                ((int *)buffer->cachebuf)[i] = value;
+                break;
+            }
+            case BUFFER_TYPE_FLOAT: {
+                float value = RFLOAT_VALUE(item);
+                ((float *)buffer->cachebuf)[i] = value;
+                break;
+            }
+            default:
+                ((uint32_t *)buffer->cachebuf)[i] = 0;
+        }
+    }
+
+    return self;
+}
+
+static VALUE
+buffer_read(VALUE self)
+{
+    unsigned int i;
+
+    GET_BUFFER();
+
+    rb_gc_mark(buffer->arr);
+    buffer->arr = rb_ary_new2(buffer->num_items);
+
+    for (i = 0; i < buffer->num_items; i++) {
+        switch (buffer->type) {
+            case BUFFER_TYPE_INT:
+                rb_ary_push(buffer->arr, INT2FIX(((int *)buffer->cachebuf)[i]));
+                break;
+            case BUFFER_TYPE_FLOAT:
+                rb_ary_push(buffer->arr, rb_float_new(((float *)buffer->cachebuf)[i]));
+                break;
+            default:
+                rb_ary_push(buffer->arr, Qnil);
+        }
+    }
+
+    return self;
+}
+
+static VALUE
+buffer_size_changed(VALUE self)
+{
+    GET_BUFFER();
+
+    if (buffer->data) {
+        clReleaseMemObject(buffer->data);
+    }
+    buffer_update_cache_info(buffer);
+    buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
+        buffer->num_items * buffer->member_size, NULL, NULL);
+
+    buffer_write(self);
+
+    return self;
+}
+
+static VALUE
+buffer_data(VALUE self)
+{
+    GET_BUFFER();
+    return buffer->arr;
+}
+
+static VALUE
+buffer_data_set(VALUE self, VALUE new_value)
+{
+    GET_BUFFER();
+
+    if (RTEST(buffer->arr)) {
+        rb_gc_mark(buffer->arr);
+    }
+    buffer->arr = new_value;
+    buffer_size_changed(self);
+    return buffer->arr;
+}
+
+static VALUE
+buffer_initialize(int argc, VALUE *argv, VALUE self)
+{
+    GET_BUFFER();
+
+    if (argc == 0) {
+        rb_raise(rb_eArgError, "no buffer data given");
+    }
+
+    if (TYPE(argv[0]) == T_ARRAY) {
+        buffer_data_set(self, argv[0]);
+    }
+    else {
+        buffer_data_set(self, rb_ary_new4(argc, argv));
+    }
+
+    return self;
+}
+
+static VALUE
+obuffer_initialize(VALUE self, VALUE type, VALUE size)
+{
+    GET_BUFFER();
+
+    StringValue(type);
+    if (strcmp(RSTRING_PTR(type), "float") == 0) {
+        buffer->type = BUFFER_TYPE_FLOAT;
+        buffer->member_size = sizeof(float);
+    }
+    else if (strcmp(RSTRING_PTR(type), "int") == 0) {
+        buffer->type = BUFFER_TYPE_INT;
+        buffer->member_size = sizeof(int);
+    }
+    else {
+        rb_raise(rb_eArgError, "type can only be :float or :int");
+    }
+
+    if (TYPE(size) != T_FIXNUM) {
+        rb_raise(rb_eArgError, "expecting buffer size as argument 2");
+    }
+
+    buffer->num_items = FIX2UINT(size);
+    buffer->cachebuf = malloc(buffer->num_items * buffer->member_size);
+    buffer->data = clCreateBuffer(context, CL_MEM_READ_WRITE,
+        buffer->member_size * buffer->num_items, NULL, NULL);
+
+    return self;
+}
+
+static VALUE
+obuffer_clear(VALUE self)
+{
+    GET_BUFFER();
+    memset(buffer->cachebuf, 0, buffer->member_size * buffer->num_items);
+    return self;
+}
+
+static VALUE
+obuffer_size(VALUE self)
+{
+    GET_BUFFER();
+    return INT2FIX(buffer->num_items);
+}
+
+static void
+free_program(struct program *program)
+{
+    clReleaseProgram(program->program);
+    xfree(program);
+}
+
+static VALUE
+program_s_allocate(VALUE klass)
+{
+    struct program *program;
+    program = ALLOC(struct program);
+    MEMZERO(program, struct program, 1);
+    return Data_Wrap_Struct(klass, 0, free_program, program);
+}
+
+static VALUE
+program_initialize(int argc, VALUE *argv, VALUE self)
+{
+    VALUE source;
+
+    rb_scan_args(argc, argv, "01", &source);
+    if (source != Qnil) {
+        program_compile(self, source);
+    }
+
+    return self;
+}
+
+static VALUE
+program_compile(VALUE self, VALUE source)
+{
+    const char *c_source;
+    GET_PROGRAM();
+    StringValue(source);
+
+    if (program->program) {
+        clReleaseProgram(program->program);
+        program->program = 0;
+    }
+
+    c_source = StringValueCStr(source);
+    program->program = clCreateProgramWithSource(context, 1, &c_source, NULL, &err);
+    if (!program->program) {
+        program->program = 0;
+        rb_raise(rb_eOpenCLError, "failed to create compute program");
+    }
+
+    err = clBuildProgram(program->program, 0, NULL, NULL, NULL, NULL);
+    if (err != CL_SUCCESS) {
+        size_t len;
+        char buffer[2048];
+
+        clGetProgramBuildInfo(program->program, device_id, CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
+        clReleaseProgram(program->program);
+        program->program = 0;
+        rb_raise(rb_eProgramSyntaxError, "%s", buffer);
+    }
+
+    return Qtrue;
+}
+
+#define CLEAN() program_clean(kernel, commands);
+#define ERROR(msg) if (err != CL_SUCCESS) { CLEAN(); rb_raise(rb_eOpenCLError, msg); }
+
+static void
+program_clean(cl_kernel kernel, cl_command_queue commands)
+{
+    clReleaseKernel(kernel);
+    clReleaseCommandQueue(commands);
+}
+
+static VALUE
+program_method_missing(int argc, VALUE *argv, VALUE self)
+{
+    int i;
+    size_t local = 0, global = 0;
+    cl_kernel kernel;
+    cl_command_queue commands;
+    GET_PROGRAM();
+
+    StringValue(argv[0]);
+    kernel = clCreateKernel(program->program, RSTRING_PTR(argv[0]), &err);
+    if (!kernel || err != CL_SUCCESS) {
+        rb_raise(rb_eNoMethodError, "no kernel method '%s'", RSTRING_PTR(argv[0]));
+    }
+
+    commands = clCreateCommandQueue(context, device_id, 0, &err);
+    if (!commands) {
+        rb_raise(rb_eOpenCLError, "could not execute kernel method '%s'", RSTRING_PTR(argv[0]));
+    }
+
+    for (i = 1; i < argc; i++) {
+        err = 0;
+        if (i == argc - 1 && TYPE(argv[i]) == T_HASH) {
+            VALUE worker_size = rb_hash_aref(argv[i], ID2SYM(ba_worker_size));
+            if (RTEST(worker_size) && TYPE(worker_size) == T_FIXNUM) {
+                global = FIX2UINT(worker_size);
+            }
+            else {
+                CLEAN();
+                rb_raise(rb_eArgError, "opts hash must be {:worker_size => INT_VALUE}, got %s",
+                    RSTRING_PTR(rb_inspect(argv[i])));
+            }
+            break;
+        }
+
+        switch(TYPE(argv[i])) {
+            case T_FIXNUM: {
+                int value = FIX2INT(argv[i]);
+                err = clSetKernelArg(kernel, i - 1, sizeof(int), &value);
+                break;
+            }
+            case T_FLOAT: {
+                float value = RFLOAT_VALUE(argv[i]);
+                err = clSetKernelArg(kernel, i - 1, sizeof(float), &value);
+                break;
+            }
+            case T_ARRAY: {
+                /* TODO */
+                /* fall-through */
+            }
+            default:
+                if (CLASS_OF(argv[i]) == rb_cOutputBuffer) {
+                    struct buffer *buffer;
+                    Data_Get_Struct(argv[i], struct buffer, buffer);
+                    err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
+                    if (buffer->num_items > global) {
+                        global = buffer->num_items;
+                    }
+                }
+                else if (CLASS_OF(argv[i]) == rb_cBuffer) {
+                    struct buffer *buffer;
+                    Data_Get_Struct(argv[i], struct buffer, buffer);
+
+                    buffer_write(argv[i]);
+                    clEnqueueWriteBuffer(commands, buffer->data, CL_TRUE, 0,
+                        buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
+                    err = clSetKernelArg(kernel, i - 1, sizeof(cl_mem), &buffer->data);
+                }
+                break;
+        }
+        if (err != CL_SUCCESS) {
+            CLEAN();
+            rb_raise(rb_eArgError, "invalid kernel method parameter: %s", RSTRING_PTR(rb_inspect(argv[i])));
+        }
+    }
+
+    err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &local, NULL);
+    ERROR("failed to retrieve kernel work group info");
+
+    { /* global work size must be power of 2, greater than 3 and not smaller than local */
+        size_t size = 4;
+        while (size < global) size *= 2;
+        global = size;
+        if (global < local) global = local;
+    }
+
+    clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
+    if (err) { CLEAN(); rb_raise(rb_eOpenCLError, "failed to execute kernel method"); }
+
+    clFinish(commands);
+
+    for (i = 1; i < argc; i++) {
+        if (CLASS_OF(argv[i]) == rb_cOutputBuffer) {
+            struct buffer *buffer;
+            Data_Get_Struct(argv[i], struct buffer, buffer);
+            err = clEnqueueReadBuffer(commands, buffer->data, CL_TRUE, 0,
+                buffer->num_items * buffer->member_size, buffer->cachebuf, 0, NULL, NULL);
+            ERROR("failed to read output buffer");
+            buffer_read(argv[i]);
+        }
+    }
+
+    CLEAN();
+    return Qnil;
+}
+
+void
+Init_barracuda()
+{
+    ba_worker_size = rb_intern("worker_size");
+
+    rb_mBarracuda = rb_define_module("Barracuda");
+
+    rb_eProgramSyntaxError = rb_define_class_under(rb_mBarracuda, "SyntaxError", rb_eSyntaxError);
+    rb_eOpenCLError = rb_define_class_under(rb_mBarracuda, "OpenCLError", rb_eStandardError);
+
+    rb_cProgram = rb_define_class_under(rb_mBarracuda, "Program", rb_cObject);
+    rb_define_alloc_func(rb_cProgram, program_s_allocate);
+    rb_define_method(rb_cProgram, "initialize", program_initialize, -1);
+    rb_define_method(rb_cProgram, "compile", program_compile, 1);
+    rb_define_method(rb_cProgram, "method_missing", program_method_missing, -1);
+
+    rb_cBuffer = rb_define_class_under(rb_mBarracuda, "Buffer", rb_cObject);
+    rb_define_alloc_func(rb_cBuffer, buffer_s_allocate);
+    rb_define_method(rb_cBuffer, "initialize", buffer_initialize, -1);
+    rb_define_method(rb_cBuffer, "size_changed", buffer_size_changed, 0);
+    rb_define_method(rb_cBuffer, "read", buffer_read, 0);
+    rb_define_method(rb_cBuffer, "write", buffer_write, 0);
+    rb_define_method(rb_cBuffer, "data", buffer_data, 0);
+    rb_define_method(rb_cBuffer, "data=", buffer_data_set, 1);
+
+    rb_cOutputBuffer = rb_define_class_under(rb_mBarracuda, "OutputBuffer", rb_cBuffer);
+    rb_define_method(rb_cOutputBuffer, "initialize", obuffer_initialize, 2);
+    rb_define_method(rb_cOutputBuffer, "size", obuffer_size, 0);
+    rb_define_method(rb_cOutputBuffer, "clear", obuffer_clear, 0);
+    rb_undef_method(rb_cOutputBuffer, "write");
+    rb_undef_method(rb_cOutputBuffer, "size_changed");
+    rb_undef_method(rb_cOutputBuffer, "data=");
+
+    init_opencl();
+}
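The block near the end of `program_method_missing` rounds the global work size up to a power of 2 that is at least 4. If the result is still smaller than the kernel's work-group size (`local`, as reported by `clGetKernelWorkGroupInfo`), it uses `local` instead. The same rule is transcribed into Ruby below so the arithmetic can be checked without a GPU. `global_work_size` is a hypothetical name.

```ruby
# Mirrors the C block:
#   size_t size = 4;
#   while (size < global) size *= 2;
#   global = size;
#   if (global < local) global = local;
def global_work_size(requested, local)
  size = 4
  size *= 2 while size < requested
  size < local ? local : size
end
```

As the C source reads, only `OutputBuffer` arguments raise `global` when no `:worker_size` is passed. A call like the README's sum example, whose output buffer has length 1, therefore starts from 1 and ends up at `global_work_size(1, local)`, which is `max(4, local)` work items. Passing `:worker_size => arr.size` avoids depending on that default.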
data/test/test_barracuda.rb
ADDED
@@ -0,0 +1,174 @@
+$:.unshift(File.dirname(__FILE__) + '/../ext/')
+
+require "test/unit"
+require "barracuda"
+
+include Barracuda
+
+class TestBuffer < Test::Unit::TestCase
+  def test_buffer_create_no_data
+    assert_raise(ArgumentError) { Buffer.new }
+  end
+
+  def test_buffer_create_invalid_data
+    assert_raise(RuntimeError) { Buffer.new("xyz") }
+  end
+
+  def test_buffer_create_with_array
+    b = Buffer.new([1, 2, 3, 4, 5])
+    assert_equal [1, 2, 3, 4, 5], b.data
+  end
+
+  def test_buffer_create_with_splat
+    b = Buffer.new(1.0, 2.0, 3.0)
+    assert_equal [1.0, 2.0, 3.0], b.data
+  end
+
+  def test_buffer_set_data
+    b = Buffer.new(1)
+    b.data = [1, 2, 3]
+    assert_equal 3, b.data.size
+  end
+
+  def test_buffer_read
+    b = Buffer.new(4, 2, 3)
+    b.data[0] = 1
+    b.read
+    assert_equal [4,2,3], b.data
+  end
+
+  def test_buffer_write
+    b = Buffer.new(1, 2, 3)
+    b.data[0] = 4
+    b.write
+    b.read
+    assert_equal [4,2,3], b.data
+  end
+
+  def test_buffer_size_changed
+    b = Buffer.new(1, 2, 3)
+    b.data << 4
+    b.size_changed
+    b.read
+    assert_equal [1,2,3,4], b.data
+  end
+end
+
+class TestOutputBuffer < Test::Unit::TestCase
+  def test_create_int_output_buffer
+    b = OutputBuffer.new(:int, 5)
+    assert_equal 5, b.size
+  end
+
+  def test_create_int_output_buffer
+    b = OutputBuffer.new(:float, 5)
+    assert_equal 5, b.size
+  end
+
+  def test_create_output_buffer_with_invalid_type
+    assert_raise(ArgumentError) { OutputBuffer.new(:char, 5) }
+  end
+
+  def test_create_output_buffer_with_invalid_size
+    assert_raise(ArgumentError) { OutputBuffer.new(:int, 'x') }
+  end
+end
+
+class TestProgram < Test::Unit::TestCase
+  def test_program_create_invalid_code
+    assert_raise(Barracuda::SyntaxError) { Program.new "fib { SYNTAXERROR }" }
+  end
+
+  def test_program_create
+    assert_nothing_raised { Program.new "__kernel fib(int x) { return 0; }"}
+  end
+
+  def test_program_compile
+    p = Program.new
+    assert_nothing_raised { p.compile "__kernel fib(int x) { }" }
+  end
+
+  def test_kernel_run
+    p = Program.new("__kernel x_y_z(int x) { }")
+    assert_nothing_raised { p.x_y_z }
+  end
+
+  def test_kernel_missing
+    p = Program.new("__kernel x_y_z(int x) { }")
+    assert_raise(NoMethodError) { p.not_x_y_z }
+  end
+
+  def test_program_int_input_buffer
+    p = Program.new <<-'eof'
+      __kernel run(__global int* out, __global int* in, int total) {
+        int id = get_global_id(0);
+        if (id < total) out[id] = in[id] + 1;
+      }
+    eof
+
+    arr = (1..256).to_a
+    _in = Buffer.new(arr)
+    out = OutputBuffer.new(:int, arr.size)
+    p.run(out, _in, arr.size)
+    assert_equal arr.map {|x| x + 1 }, out.data
+  end
+
+  def test_program_float_buffer
+    p = Program.new <<-'eof'
+      __kernel run(__global float* out, __global int* in, int total) {
+        int id = get_global_id(0);
+        if (id < total) out[id] = (float)in[id] + 0.5;
+      }
+    eof
+
+    arr = (1..256).to_a
+    _in = Buffer.new(arr)
+    out = OutputBuffer.new(:float, arr.size)
+    p.run(out, _in, arr.size)
+    assert_equal arr.map {|x| x.to_f + 0.5 }, out.data
+  end
+
+  def test_program_set_worker_size
+    p = Program.new <<-'eof'
+      __kernel sum(__global int* out, __global int* in, int total) {
+        int id = get_global_id(0);
+        if (id < total) atom_add(&out[0], in[id]);
+      }
+    eof
+
+    arr = (1..517).to_a
+    sum = arr.inject(0) {|acc, el| acc + el }
+    _in = Buffer.new(arr)
+    out = OutputBuffer.new(:int, 1)
+    p.sum(out, _in, arr.size, :worker_size => arr.size)
+    assert_equal sum, out.data[0]
+  end
+
+  def test_program_largest_buffer_is_input
+    p = Program.new <<-'eof'
+      __kernel sum(__global int* out, __global int* in, int total) {
+        int id = get_global_id(0);
+        if (id < total) atom_add(&out[0], in[id]);
+      }
+    eof
+
+    arr = (1..517).to_a
+    sum = arr.inject(0) {|acc, el| acc + el }
+    _in = Buffer.new(arr)
+    out = OutputBuffer.new(:int, 1)
+    p.sum(out, _in, arr.size)
+    assert_equal sum, out.data[0]
+  end
+
+  def test_program_invalid_worker_size
+    p = Program.new("__kernel sum(int x) { }")
+    assert_raise(ArgumentError) { p.sum(:worker_size => "hello") }
+    assert_raise(ArgumentError) { p.sum(:worker => 1) }
+  end
+
+  def test_program_invalid_args
+    p = Program.new("__kernel sum(int x, __global int *y) { }")
+    assert_raise(ArgumentError) { p.sum(1, 2) }
+    assert_raise(ArgumentError) { p.sum(1, OutputBuffer.new(:int, 1), 3) }
+  end
+end
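`test_buffer_read` and `test_buffer_write` exercise the cache that `buffer_write` and `buffer_read` maintain in the C extension. The Ruby array is copied into a C array of `int` or `float`, with the element type chosen from the first element, and `read` rebuilds the Ruby array from that C array. The C struct is opaque from Ruby, so the sketch below models the same round trip in pure Ruby with `Array#pack`/`String#unpack`, using the native `int` (`'i'`) and single-precision `float` (`'f'`) directives. `CacheSketch` is a hypothetical class, not part of Barracuda.

```ruby
# Hypothetical model of Barracuda::Buffer's split between the Ruby-side
# data array and the packed C-side cache.
class CacheSketch
  attr_reader :data

  def initialize(*values)
    raise ArgumentError, "no buffer data given" if values.empty?
    values = values.first if values.first.is_a?(Array)
    @data = values
    write
  end

  # Like buffer_write: choose the element type from the first element,
  # then copy the Ruby array into the packed cache.
  def write
    @directive =
      case @data.first
      when Integer then 'i' # native C int
      when Float   then 'f' # C float (single precision)
      else raise RuntimeError, "invalid buffer data #{@data.inspect}"
      end
    @cache = @data.pack("#{@directive}*")
    self
  end

  # Like buffer_read: rebuild the Ruby array from the packed cache,
  # discarding any changes made to data since the last write.
  def read
    @data = @cache.unpack("#{@directive}*")
    self
  end
end
```

This mirrors the tests' expectations: mutating `data` without calling `write` is undone by `read`, while `write` followed by `read` keeps the change.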
metadata
ADDED
@@ -0,0 +1,61 @@
+--- !ruby/object:Gem::Specification
+name: barracuda
+version: !ruby/object:Gem::Version
+  version: "1.0"
+platform: ruby
+authors:
+- Loren Segal
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2009-08-30 00:00:00 -04:00
+default_executable:
+dependencies: []
+
+description:
+email: lsegal@soen.ca
+executables: []
+
+extensions:
+- ext/extconf.rb
+extra_rdoc_files: []
+
+files:
+- ext/barracuda.c
+- ext/extconf.rb
+- benchmarks/to_float.rb
+- test/test_barracuda.rb
+- LICENSE
+- README.md
+- Rakefile
+has_rdoc: true
+homepage: http://github.com/lsegal/barracuda
+licenses: []
+
+post_install_message:
+rdoc_options: []
+
+require_paths:
+- ext
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+requirements: []
+
+rubyforge_project: barracuda
+rubygems_version: 1.3.4
+signing_key:
+specification_version: 3
+summary: Barracuda is a wrapper library for OpenCL/CUDA GPGPU programming
+test_files:
+- test/test_barracuda.rb