cumo 0.2.4 → 0.2.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 981428686ca946222ba4e575d35d55cc1139ae9a057afe11bb19166d189ddd4d
- data.tar.gz: 88129353d1da98170baa4362b46032ffa49b236e7559f1c5e60a6bfa32192c7c
+ metadata.gz: 1b28beaea182d622d304bcb3153e56aa3280993ec079aea44c00b915d1e92b77
+ data.tar.gz: 26fc0e1942a444e5f9cb4641b3e36f9985593a10de26188f0a4142e72314d82a
  SHA512:
- metadata.gz: 7f2cfcf951e787490e427ad6688e8003b7a0d9132e5cf1a030b78727561ddbf603df367e71169f91e926b67660b40f0b1c66f26ab30c744e8afb680d9568bfa7
- data.tar.gz: 1c47cb15aa0676369c37cc8a77771e7b498b56fbcb7f8b6321db4c3a9c1a24e3cb8a1fdb37e9b72331441458458b943a36fe34bc2b01706998c9fa50d06413be
+ metadata.gz: a678cb7965fbbc9febf6b5f2f557f8be34f28c051fc0437a87506d3a067a34778a73b75dbeb56da14fd538062a8454355efd06bb686056db5b4df7cab9c04e86
+ data.tar.gz: 30ce98cae4e84ee7e9e73eae3ad76bcaca1e636462301d1afe1aa50e1f50633ed1b16756b90aaeba1a3e0870179d7e2dbee41696b175f0a454efee93e5f89591
data/CHANGELOG.md CHANGED
@@ -1,3 +1,9 @@
+ # 0.2.5 (2019-03-04)
+
+ Enhancements:
+
+ * Support arithmetic sequence, which is available in ruby >= 2.6.0 (thanks to naitoh)
+
  # 0.2.4 (2018-11-21)

  Changes:
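The arithmetic-sequence enhancement above refers to Ruby 2.6's `Enumerator::ArithmeticSequence`, produced by `Range#step` and `Range#%`. A minimal illustration of the object itself, in plain Ruby with no GPU required (the Cumo indexing line is only a comment, since running it needs CUDA):

```ruby
# Ruby >= 2.6: Range#% (and Range#step) return an Enumerator::ArithmeticSequence.
seq = (0..9) % 3
puts seq.class   # Enumerator::ArithmeticSequence
puts seq.step    # 3
p seq.to_a       # [0, 3, 6, 9]

# With cumo 0.2.5 and a CUDA GPU, such a sequence can be used as an index,
# e.g. (hypothetical session, not run here):
#   Cumo::DFloat.new(10).seq[(0..9) % 3]
```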
data/README.md CHANGED
@@ -1,7 +1,6 @@
  # Cumo

- Cumo (pronounced like "koomo") is CUDA-aware numerical library whose interface is highly compatible with [Ruby Numo](https://github.com/ruby-numo).
- This library provides the benefit of speedup using GPU by replacing Numo with only a small piece of codes.
+ Cumo (pronounced "koomo") is a CUDA-aware, GPU-optimized numerical library that offers a significant performance boost over [Ruby Numo](https://github.com/ruby-numo), while (mostly) maintaining drop-in compatibility.

  <img src="https://raw.githubusercontent.com/sonots/cumo-logo/master/logo_transparent.png" alt="cumo logo" title="cumo logo" width="50%">

@@ -13,7 +12,7 @@ This library provides the benefit of speedup using GPU by replacing Numo with on

  ## Preparation

- Install CUDA and setup environment variables as follows:
+ Install CUDA and set your environment variables as follows:

  ```bash
  export CUDA_PATH="/usr/local/cuda"
@@ -25,7 +24,7 @@ export LIBRARY_PATH="$CUDA_PATH/lib64:$CUDA_PATH/lib:$LIBRARY_PATH"

  ## Installation

- Add a following line to your Gemfile:
+ Add the following line to your Gemfile:

  ```ruby
  gem 'cumo'
@@ -63,15 +62,15 @@ An example:
  => 15
  ```

- ### How to switch from Numo to Cumo
+ ### Switching from Numo to Cumo

- Basically, following command should make it work with Cumo.
+ The following find-and-replace should just work:

  ```
  find . -type f | xargs sed -i -e 's/Numo/Cumo/g' -e 's/numo/cumo/g'
  ```

- If you want to switch Numo and Cumo dynamically, following snippets should work:
+ If you want to dynamically switch between Numo and Cumo, something like the following will work:

  ```ruby
  if gpu
@@ -87,17 +86,17 @@ a = xm::DFloat.new(3,5).seq

  ### Incompatibility With Numo

- Following methods behave incompatibly with Numo as default for performance.
+ The following methods behave incompatibly with Numo by default for performance reasons:

  * `extract`
  * `[]`
  * `count_true`
  * `count_false`

- Numo returns a Ruby numeric object for 0-dimensional NArray, but Cumo returns the 0-dimensional NArray instead of a Ruby numeric object.
- This is to avoid synchnoziation between CPU and GPU for performance.
+ Numo returns a Ruby numeric object for a 0-dimensional NArray, while Cumo returns the 0-dimensional NArray itself.
+ Cumo differs in this way to avoid synchronization and minimize CPU-GPU data transfer.

- You may set `CUMO_COMPATIBLE_MODE=ON` environment variable to force Cumo NArray behave compatibly with Numo NArray.
+ Set the `CUMO_COMPATIBLE_MODE` environment variable to `ON` to force Numo NArray compatibility (at the cost of performance).

  You may enable or disable `compatible_mode` as:

@@ -109,7 +108,7 @@ Cumo.disable_compatible_mode # disable
  Cumo.compatible_mode_enabled? #=> false
  ```

- You can also use following methods which behaves as Numo NArray's methods. Behaviors of these methods do not depend on `compatible_mode`.
+ You can also use the following methods, which behave like Numo NArray's methods. The behavior of these methods does not depend on `compatible_mode`.

  * `extract_cpu`
  * `aref_cpu(*idx)`
@@ -118,7 +117,7 @@ You can also use following methods which behaves as Numo NArray's methods. Behav

  ### Select a GPU device ID

- Set `CUDA_VISIBLE_DEVICES=id` environment variable, or
+ Set the `CUDA_VISIBLE_DEVICES=id` environment variable, or

  ```
  require 'cumo'
@@ -129,7 +128,7 @@ where `id` is an integer.

  ### Disable GPU Memory Pool

- GPU memory pool is enabled as default. To disable, set `CUMO_MEMORY_POOL=OFF` environment variable , or
+ GPU memory pool is enabled by default. To disable it, set `CUMO_MEMORY_POOL=OFF`, or:

  ```
  require 'cumo'
@@ -138,11 +137,11 @@ Cumo::CUDA::MemoryPool.disable

  ## Documentation

- See https://github.com/ruby-numo/numo-narray#documentation and replace Numo to Cumo.
+ See https://github.com/ruby-numo/numo-narray#documentation, replacing Numo with Cumo.

  ## Contributions

- This project is still under development. See [issues](https://github.com/sonots/cumo/issues) for future works.
+ This project is under active development. See [issues](https://github.com/sonots/cumo/issues) for future work.

  ## Development

@@ -170,12 +169,12 @@ Generate docs:
  bundle exec rake docs
  ```

- ## Advanced Tips on Development
+ ## Advanced Development Tips

  ### ccache

  [ccache](https://ccache.samba.org/) is useful to speed up compilation.
- Install ccache and setup as:
+ Install ccache and configure it with:


  ```bash
@@ -187,7 +186,7 @@ ln -sf "$HOME/opt/ccache/bin/ccache" "$HOME/opt/ccache/bin/nvcc"

  ### Build in parallel

- Use `MAKEFLAGS` environment variable to specify `make` command options. You can build in parallel as:
+ Set the `MAKEFLAGS` environment variable to specify `make` command options. You can build in parallel as:

  ```
  bundle exec env MAKEFLAGS=-j8 rake compile
@@ -199,11 +198,11 @@ bundle exec env MAKEFLAGS=-j8 rake compile
  bundle exec env CUMO_NVCC_GENERATE_CODE=arch=compute_60,code=sm_60 rake compile
  ```

- This is useful even on development because it makes possible to skip JIT compilation of PTX to cubin occurring on runtime.
+ This is useful even in development because it makes it possible to skip the JIT compilation of PTX to cubin that otherwise happens at runtime.

  ### Run tests with gdb

- Compile with debug option:
+ Compile with debugging enabled:

  ```
  bundle exec DEBUG=1 rake compile
@@ -242,7 +241,7 @@ bundle exec DTYPE=dfloat ruby test/narray_test.rb
  bundle exec CUDA_LAUNCH_BLOCKING=1
  ```

- ### Show GPU synchnoziation warnings
+ ### Show GPU synchronization warnings

  Cumo shows warnings when CPU-GPU synchronization occurs if:

@@ -250,8 +249,8 @@ Cumo shows warnings when CPU-GPU synchronization occurs if:
  export CUMO_SHOW_WARNING=ON
  ```

- As default, it shows warnings occurred at the same place only once.
- You may want to show warnings everytime rather than once as:
+ By default, a warning raised at the same place is shown only once.
+ To show every occurrence instead, set:

  ```
  export CUMO_SHOW_WARNING=ON
@@ -3,6 +3,7 @@ require 'benchmark'

  NUM = (ARGV.first || 100).to_i

+ # warm up
  a = Cumo::Float32.new(10).seq(1)
  b = Cumo::Float32.new(10).seq(10,10)
  c = a + b
@@ -29,7 +29,15 @@ cumo_cuda_runtime_malloc(size_t size)
  } catch (const cumo::internal::CUDARuntimeError& e) {
  cumo_cuda_runtime_check_status(e.status());
  } catch (const cumo::internal::OutOfMemoryError& e) {
- rb_raise(cumo_cuda_eOutOfMemoryError, "%s", e.what());
+ // retry after GC
+ rb_funcall(rb_define_module("GC"), rb_intern("start"), 0);
+ try {
+ return reinterpret_cast<char*>(pool.Malloc(size));
+ } catch (const cumo::internal::CUDARuntimeError& e) {
+ cumo_cuda_runtime_check_status(e.status());
+ } catch (const cumo::internal::OutOfMemoryError& e) {
+ rb_raise(cumo_cuda_eOutOfMemoryError, "%s", e.what());
+ }
  }
  } else {
  void *ptr = 0;
@@ -139,6 +139,8 @@ intptr_t SingleDeviceMemoryPool::Malloc(size_t size, cudaStream_t stream_ptr) {
  if (e.status() != cudaErrorMemoryAllocation) {
  throw;
  }
+ // Retry after freeing all free blocks.
+ // NOTE: Another retry after GC is done at cumo_cuda_runtime_malloc.
  FreeAllBlocks();
  try {
  mem = std::make_shared<Memory>(size);
@@ -146,21 +148,8 @@ intptr_t SingleDeviceMemoryPool::Malloc(size_t size, cudaStream_t stream_ptr) {
  if (e.status() != cudaErrorMemoryAllocation) {
  throw;
  }
- #ifdef NO_RUBY // cpp test does not bind with libruby
  size_t total = size + GetTotalBytes();
  throw OutOfMemoryError(size, total);
- #else
- rb_funcall(rb_define_module("GC"), rb_intern("start"), 0);
- try {
- mem = std::make_shared<Memory>(size);
- } catch (const CUDARuntimeError& e) {
- if (e.status() != cudaErrorMemoryAllocation) {
- throw;
- }
- size_t total = size + GetTotalBytes();
- throw OutOfMemoryError(size, total);
- }
- #endif
  }
  }
  chunk = std::make_shared<Chunk>(mem, 0, size, stream_ptr);
@@ -54,11 +54,11 @@ bool cumo_show_warning_enabled_p()
  return cumo_show_warning_enabled;
  }

- static bool cumo_warning_once_enabled;
+ static bool cumo_show_warning_once_enabled;

- bool cumo_warning_once_enabled_p()
+ bool cumo_show_warning_once_enabled_p()
  {
- return cumo_warning_once_enabled;
+ return cumo_show_warning_once_enabled;
  }

  /*
@@ -130,7 +130,7 @@ Init_cumo()

  // default is true
  env = getenv("CUMO_SHOW_WARNING_ONCE");
- cumo_warning_once_enabled = env == NULL || (strcmp(env, "OFF") != 0 && strcmp(env, "0") != 0 && strcmp(env, "NO") != 0);
+ cumo_show_warning_once_enabled = env == NULL || (strcmp(env, "OFF") != 0 && strcmp(env, "0") != 0 && strcmp(env, "NO") != 0);

  Init_cumo_narray();

@@ -53,6 +53,6 @@ run-ctest : cuda/memory_pool_impl_test.exe
  ./$<

  cuda/memory_pool_impl_test.exe: cuda/memory_pool_impl_test.cpp cuda/memory_pool_impl.cpp cuda/memory_pool_impl.hpp
- nvcc -DNO_RUBY -std=c++14 <%= ENV['DEBUG'] ? '-g -O0 --compiler-options -Wall' : '' %> -L. -L$(libdir) -I. $(INCFLAGS) -o $@ $< cuda/memory_pool_impl.cpp
+ nvcc -std=c++14 <%= ENV['DEBUG'] ? '-g -O0 --compiler-options -Wall' : '' %> -L. -L$(libdir) -I. $(INCFLAGS) -o $@ $< cuda/memory_pool_impl.cpp

  CLEANOBJS = *.o */*.o */*/*.o *.bak narray/types/*.c narray/types/*_kernel.cu *.exe */*.exe
@@ -68,6 +68,7 @@ narray/step
  narray/index
  narray/index_kernel
  narray/ndloop
+ narray/ndloop_kernel
  narray/data
  narray/data_kernel
  narray/types/bit
@@ -158,6 +159,7 @@ unless have_type("u_int64_t", stdint)
  have_type("uint64_t", stdint)
  end
  have_func("exp10")
+ have_func("rb_arithmetic_sequence_extract")

  have_var("rb_cComplex")
  have_func("rb_thread_call_without_gvl")
@@ -10,17 +10,17 @@ extern "C" {
  #endif
  #endif

- #define CUMO_VERSION "0.2.4"
- #define CUMO_VERSION_CODE 24
+ #define CUMO_VERSION "0.2.5"
+ #define CUMO_VERSION_CODE 25

  bool cumo_compatible_mode_enabled_p();
  bool cumo_show_warning_enabled_p();
- bool cumo_warning_once_enabled_p();
+ bool cumo_show_warning_once_enabled_p();

  #define CUMO_SHOW_WARNING_ONCE( c_str ) \
  { \
  if (cumo_show_warning_enabled_p()) { \
- if (cumo_warning_once_enabled_p()) { \
+ if (cumo_show_warning_once_enabled_p()) { \
  static bool show_warning = true; \
  if (show_warning) { \
  fprintf(stderr, (c_str)); \
@@ -30,6 +30,11 @@ typedef struct {
  ssize_t step[CUMO_NA_MAX_DIMENSION]; // or strides
  } cumo_na_iarray_t;

+ typedef struct {
+ char* ptr;
+ cumo_stridx_t stridx[CUMO_NA_MAX_DIMENSION];
+ } cumo_na_iarray_stridx_t;
+
  typedef struct {
  cumo_na_iarray_t in;
  cumo_na_iarray_t out;
@@ -216,6 +221,51 @@ cumo_na_iarray_at_dim1(cumo_na_iarray_t* iarray, cumo_na_indexer_t* indexer) {
  return iarray->ptr + iarray->step[0] * indexer->raw_index;
  }

+ __host__ __device__
+ static inline char*
+ cumo_na_iarray_stridx_at_dim(cumo_na_iarray_stridx_t* iarray, cumo_na_indexer_t* indexer) {
+ char* ptr = iarray->ptr;
+ for (int idim = 0; idim < indexer->ndim; ++idim) {
+ if (CUMO_SDX_IS_INDEX(iarray->stridx[idim])) {
+ ptr += CUMO_SDX_GET_INDEX(iarray->stridx[idim])[indexer->index[idim]];
+ } else {
+ ptr += CUMO_SDX_GET_STRIDE(iarray->stridx[idim]) * indexer->index[idim];
+ }
+ }
+ return ptr;
+ }
+
+ // Let compiler optimize
+ #define CUMO_NA_IARRAY_STRIDX_AT(NDIM) \
+ __host__ __device__ \
+ static inline char* \
+ cumo_na_iarray_stridx_at_dim##NDIM(cumo_na_iarray_stridx_t* iarray, cumo_na_indexer_t* indexer) { \
+ char* ptr = iarray->ptr; \
+ for (int idim = 0; idim < NDIM; ++idim) { \
+ if (CUMO_SDX_IS_INDEX(iarray->stridx[idim])) { \
+ ptr += CUMO_SDX_GET_INDEX(iarray->stridx[idim])[indexer->index[idim]]; \
+ } else { \
+ ptr += CUMO_SDX_GET_STRIDE(iarray->stridx[idim]) * indexer->index[idim]; \
+ } \
+ } \
+ return ptr; \
+ }
+
+ CUMO_NA_IARRAY_STRIDX_AT(4)
+ CUMO_NA_IARRAY_STRIDX_AT(3)
+ CUMO_NA_IARRAY_STRIDX_AT(2)
+ CUMO_NA_IARRAY_STRIDX_AT(0)
+
+ __host__ __device__
+ static inline char*
+ cumo_na_iarray_stridx_at_dim1(cumo_na_iarray_stridx_t* iarray, cumo_na_indexer_t* indexer) {
+ if (CUMO_SDX_IS_INDEX(iarray->stridx[0])) {
+ return iarray->ptr + CUMO_SDX_GET_INDEX(iarray->stridx[0])[indexer->raw_index];
+ } else {
+ return iarray->ptr + CUMO_SDX_GET_STRIDE(iarray->stridx[0]) * indexer->raw_index;
+ }
+ }
+
  #endif // #ifdef __CUDACC__

  #endif // CUMO_INDEXER_H
@@ -69,6 +69,7 @@ bool cumo_na_test_reduce(VALUE reduce, int dim);

  void cumo_na_step_array_index(VALUE self, size_t ary_size, size_t *plen, ssize_t *pbeg, ssize_t *pstep);
  void cumo_na_step_sequence(VALUE self, size_t *plen, double *pbeg, double *pstep);
+ void cumo_na_parse_enumerator_step(VALUE enum_obj, VALUE *pstep);

  // used in aref, aset
  int cumo_na_get_result_dimension(VALUE self, int argc, VALUE *argv, ssize_t stride, size_t *pos_idx);
@@ -196,10 +196,12 @@ extern VALUE cumo_cUInt32;
  extern VALUE cumo_cUInt16;
  extern VALUE cumo_cUInt8;
  extern VALUE cumo_cRObject;
- extern VALUE cumo_na_cStep;
  #ifndef HAVE_RB_CCOMPLEX
  extern VALUE rb_cComplex;
  #endif
+ #ifdef HAVE_RB_ARITHMETIC_SEQUENCE_EXTRACT
+ extern VALUE rb_cArithSeq;
+ #endif

  extern VALUE cumo_sym_reduce;
  extern VALUE cumo_sym_option;
@@ -265,6 +267,23 @@ typedef struct {
  unsigned int element_stride;
  } cumo_narray_type_info_t;

+ // from ruby/enumerator.c
+ typedef struct {
+ VALUE obj;
+ ID meth;
+ VALUE args;
+ // only the fields above are used in this source
+ VALUE fib;
+ VALUE dst;
+ VALUE lookahead;
+ VALUE feedvalue;
+ VALUE stop_exc;
+ VALUE size;
+ // the fields below are incompatible across ruby versions
+ //VALUE procs; // ruby 2.4
+ //rb_enumerator_size_func *size_fn; // ruby 2.1-2.4
+ //VALUE (*size_fn)(ANYARGS); // ruby 2.0
+ } cumo_enumerator_t;

  static inline cumo_narray_t *
  cumo_na_get_narray_t(VALUE obj)
@@ -165,6 +165,16 @@ typedef unsigned int CUMO_BIT_DIGIT;
  #define CUMO_BALL (~(CUMO_BIT_DIGIT)0)
  #define CUMO_SLB(n) (((n)==CUMO_NB)?~(CUMO_BIT_DIGIT)0:(~(~(CUMO_BIT_DIGIT)0<<(n))))

+ typedef union {
+ ssize_t stride;
+ size_t *index;
+ } cumo_stridx_t;
+
+ #define CUMO_SDX_IS_STRIDE(x) ((x).stride&0x1)
+ #define CUMO_SDX_IS_INDEX(x) (!CUMO_SDX_IS_STRIDE(x))
+ #define CUMO_SDX_GET_STRIDE(x) ((x).stride>>1)
+ #define CUMO_SDX_GET_INDEX(x) ((x).index)
+
  #include "cumo/indexer.h"
  #include "cumo/intern_kernel.h"

@@ -2,7 +2,7 @@
  #define CUMO_NDLOOP_H

  typedef struct {
- ssize_t pos; // - required for each dimension.
+ ssize_t pos; // only iter[0].pos is used in cumo as an offset.
  ssize_t step;
  size_t *idx;
  } cumo_na_loop_iter_t;