grx-tensor 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: b5bd8763a36b392a5e91c5d888bbf863b5a879e19db921536b1e04fe6edf961f
+   data.tar.gz: b4678711feb1f9e51cd1aea09e7c43d3fa604d9de262c18a2ae7c341e6501333
+ SHA512:
+   metadata.gz: 81a81fc4818c2377514f63c43a0957d0fde58faa7e511ff60299d5641104dd821f8e58401d489e7eb4d79c48644c3da8c5a7c783f20f644b832138323ff36593
+   data.tar.gz: ed8a532d8550bca46384dc67c83a75b0ebb2d0bfbc7b3c9cd62bcfd576d58c531bd838647564a92f3dbf782b1c5c59fd3fbbf68d69db8a1c4870b21dc42af924
data/CHANGELOG.md ADDED
@@ -0,0 +1,54 @@
+ # Changelog
+
+ All notable changes to GRX-Tensor are documented here.
+ Format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
+ Versioning follows [Semantic Versioning](https://semver.org/).
+
+ ---
+
+ ## [Unreleased]
+
+ ### Planned
+ - OpenMP parallelization for element-wise operations
+ - BLAS (`cblas_dgemm`) for production-grade matrix multiplication
+ - Broadcasting — automatic shape expansion
+ - `float32` support (8 values per AVX2 vector)
+ - Move autograd graph to C to eliminate Ruby GC overhead
+ - `Conv2d`, `LSTM`, `MultiheadAttention` layers
+ - CUDA extension (`grx-tensor-cuda`)
+
+ ---
+
+ ## [0.1.0] - 2026-05-11
+
+ ### Added
+
+ **C kernel (`ext/grx/grx_core.c`)**
+ - Memory management: `grx_alloc` / `grx_free` — 32-byte aligned allocation via `posix_memalign` (Linux/macOS) and `_aligned_malloc` (Windows), required for AVX2 `_mm256_load_pd`
+ - Element-wise arithmetic: `grx_add`, `grx_sub`, `grx_mul`, `grx_div`, `grx_scale`, `grx_add_scalar`, `grx_negate` — AVX2+FMA with 2× loop unrolling
+ - Math ops: `grx_abs`, `grx_sqrt`, `grx_square`, `grx_log`, `grx_exp`, `grx_pow`, `grx_clip`
+ - Reductions: `grx_sum`, `grx_mean`, `grx_max`, `grx_min`
+ - Linear algebra: `grx_dot` (FMA with dual accumulators for ILP), `grx_matmul` (cache-friendly tiling, TILE=8)
+ - Activations: `grx_relu`, `grx_leaky_relu`, `grx_tanh_act`, `grx_sigmoid`, `grx_softmax`
+ - Optimizers: `grx_sgd_step` (FMA in-place), `grx_adam_step` (full Adam inner loop in C with FMA)
+ - Weight initialization: `grx_init_xavier_uniform`, `grx_init_he_normal` (Box-Muller)
+ - SIMD dispatch: AVX2+FMA → AVX2 → SSE2 → scalar, selected at compile time via `-march=native`
+
+ **Ruby layer**
+ - `GRX::Storage` — native memory buffer backed by `Fiddle::Pointer`; Ruby `Array` fallback when C is unavailable
+ - `GRX::Tensor` — shape, strides, offset (zero-copy `reshape` and `transpose`); all numeric ops delegate to C
+ - Autograd — topological BFS graph traversal; `backward_fn` closures for `+`, `-`, `*`, `/`, `square`, `sqrt`, `log`, `exp`, `relu`, `leaky_relu`, `tanh`, `sigmoid`, `matmul`
+ - `GRX::CAPI` — Fiddle bridge; detects platform and loads `libgrx_core.so` / `.dylib` / `.dll`
+ - `GRX::NN::Linear`, `Sequential`, `ReLU`, `LeakyReLU`, `Tanh`, `Sigmoid`, `Softmax`, `Dropout`, `BatchNorm1d`
+ - `GRX::Optim::SGD` (momentum, weight decay), `GRX::Optim::Adam` (bias correction, weight decay)
+ - `GRX::Loss::MSELoss`, `MAELoss`, `BCELoss`, `CrossEntropyLoss`, `HuberLoss`
+ - Factory helpers: `GRX.tensor`, `GRX.zeros`, `GRX.ones`, `GRX.rand`, `GRX.randn`, `Tensor.xavier_uniform`, `Tensor.he_normal`
+
+ **Build system**
+ - `ext/unix/Makefile` — compiles directly to `lib/grx/libgrx_core.so` (Linux) or `.dylib` (macOS); no intermediate file
+ - `ext/windows/Makefile.mingw` — compiles directly to `lib/grx/grx_core.dll` via MinGW-w64
+ - `ext/grx/extconf.rb` — rake-compiler config for `gem install` auto-compilation
+ - `.gitignore` — compiled binaries excluded from version control
+
+ **Tests**
+ - 43 tests, 10,121 assertions across `test_tensor.rb` and `test_nn.rb`
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Your Name
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,471 @@
+ # GRX-Tensor
+
+ **Ruby speaks. C computes.**
+
+ A tensor framework for Ruby with automatic differentiation, a C+SIMD compute core, and neural network primitives — all behind a clean, expressive Ruby API.
+
+ [![Ruby](https://img.shields.io/badge/ruby-%3E%3D%203.0-CC342D?logo=ruby)](https://www.ruby-lang.org)
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue)](LICENSE.txt)
+ [![Platform](https://img.shields.io/badge/platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey)]()
+
+ ---
+
+ ## What is GRX?
+
+ GRX is a tensor computation library for Ruby. The numeric core is written in C and compiled with **AVX2 + FMA** SIMD instructions — processing 4 doubles per 256-bit vector instruction, with fused multiply-add. Ruby handles the high-level API: shape validation, computation graph construction, and orchestration. C handles everything else.
+
+ ### Key features
+
+ | Feature | Details |
+ |---|---|
+ | **C+SIMD kernel** | AVX2+FMA, SSE2 fallback, scalar fallback — auto-detected at compile time |
+ | **Autograd** | Automatic differentiation via a topological computation graph |
+ | **Optimizers** | SGD (momentum, weight decay) and Adam (inner loop in C with FMA) |
+ | **NN layers** | Linear, Sequential, Dropout, BatchNorm1d |
+ | **Activations** | ReLU, Leaky ReLU, Tanh, Sigmoid, Softmax |
+ | **Loss functions** | MSE, MAE, BCE, CrossEntropy, Huber |
+ | **Weight init** | Xavier uniform, He normal (Box-Muller in C) |
+ | **Cross-platform** | `.so` on Linux, `.dylib` on macOS, `.dll` on Windows |
+ | **Pure Ruby fallback** | Works without compilation — slower but always correct |
+
+ ---
+
+ ## Installation
+
+ ```bash
+ gem install grx-tensor
+ ```
+
+ ```ruby
+ # Gemfile
+ gem "grx-tensor"
+ ```
+
+ The C extension compiles automatically on `gem install`. No extra steps needed.
+
+ ---
+
+ ## Quick start
+
+ ```ruby
+ require "grx"
+
+ a = GRX.tensor([1.0, 2.0, 3.0], [3], requires_grad: true)
+ b = GRX.tensor([4.0, 5.0, 6.0], [3], requires_grad: true)
+
+ c = a + b  # [5.0, 7.0, 9.0] — computed in C with AVX2
+ c.backward # propagates gradients through the graph
+
+ a.grad.to_a # [1.0, 1.0, 1.0]
+ b.grad.to_a # [1.0, 1.0, 1.0]
+ ```
+
+ ---
+
+ ## Tensors
+
+ ```ruby
+ # From array + shape
+ t = GRX.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3])
+ t.shape # [2, 3]
+ t.numel # 6
+ t.to_a  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
+ t.item  # only for single-element tensors → Float
+
+ # Factories
+ GRX.zeros([3])   # [0.0, 0.0, 0.0]
+ GRX.ones([2, 2]) # [1.0, 1.0, 1.0, 1.0]
+ GRX.rand([4])    # uniform [0, 1)
+ GRX.randn([4])   # normal N(0, 1)
+
+ GRX::Tensor.zeros_like(t) # same shape, all zeros
+ GRX::Tensor.ones_like(t)  # same shape, all ones
+ ```
+
+ ---
+
+ ## Arithmetic
+
+ All operations run in C. Scalar operands are supported on both sides.
+
+ ```ruby
+ a = GRX.tensor([1.0, 2.0, 3.0, 4.0], [4])
+ b = GRX.tensor([4.0, 3.0, 2.0, 1.0], [4])
+
+ (a + b).to_a # [5.0, 5.0, 5.0, 5.0]
+ (a - b).to_a # [-3.0, -1.0, 1.0, 3.0]
+ (a * b).to_a # [4.0, 6.0, 6.0, 4.0]
+ (a / b).to_a # [0.25, 0.666, 1.5, 4.0]
+ (-a).to_a    # [-1.0, -2.0, -3.0, -4.0]
+
+ # Tensor OP scalar
+ (a + 10.0).to_a # [11.0, 12.0, 13.0, 14.0]
+ (a * 3.0).to_a  # [3.0, 6.0, 9.0, 12.0]
+ (a / 2.0).to_a  # [0.5, 1.0, 1.5, 2.0]
+ (a - 1.0).to_a  # [0.0, 1.0, 2.0, 3.0]
+
+ # Scalar OP tensor: the scalar can also appear on the left
+ (10.0 + a).to_a # [11.0, 12.0, 13.0, 14.0]
+ ```
+
+ ---
+
+ ## Math operations
+
+ ```ruby
+ x = GRX.tensor([1.0, 4.0, 9.0, 16.0], [4])
+
+ x.sqrt.to_a   # [1.0, 2.0, 3.0, 4.0]
+ x.square.to_a # [1.0, 16.0, 81.0, 256.0]
+ x.abs.to_a    # absolute value element-wise
+ x.log.to_a    # natural logarithm
+ x.exp.to_a    # e^x
+ x.pow(3).to_a # [1.0, 64.0, 729.0, 4096.0]
+ x.clip(2.0, 10.0).to_a # [2.0, 4.0, 9.0, 10.0]
+
+ # Reductions → Float
+ x.sum  # 30.0
+ x.mean # 7.5
+ x.max  # 16.0
+ x.min  # 1.0
+ ```
+
+ ---
+
+ ## Linear algebra
+
+ ```ruby
+ u = GRX.tensor([1.0, 2.0, 3.0], [3])
+ v = GRX.tensor([4.0, 5.0, 6.0], [3])
+
+ u.dot(v) # 32.0 → 1×4 + 2×5 + 3×6
+
+ # Matrix multiplication — tiled for cache efficiency
+ a = GRX.tensor([1.0, 2.0, 3.0, 4.0], [2, 2])
+ b = GRX.tensor([5.0, 6.0, 7.0, 8.0], [2, 2])
+ a.matmul(b).to_a # [19.0, 22.0, 43.0, 50.0]
+
+ # Non-square: [2×3] × [3×2] → [2×2]
+ a3 = GRX.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3])
+ b3 = GRX.tensor([7.0, 8.0, 9.0, 10.0, 11.0, 12.0], [3, 2])
+ a3.matmul(b3).to_a # [58.0, 64.0, 139.0, 154.0]
+ ```
+
+ ---
+
+ ## Zero-copy geometry
+
+ `reshape` and `transpose` return views over the same memory — no data is copied.
+
+ ```ruby
+ m = GRX.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3])
+
+ m.get(1, 2)       # 6.0
+ m.reshape([3, 2]) # new view, same data
+ m.flatten         # shape [6], same data
+ m.transpose       # shape [3, 2], same data
+
+ # Transpose is a true view
+ sq = GRX.tensor([1.0, 2.0, 3.0, 4.0], [2, 2])
+ tr = sq.transpose
+ tr.get(0, 1) # 3.0 (was sq[1, 0])
+ tr.get(1, 0) # 2.0 (was sq[0, 1])
+ tr.to_a      # [1.0, 3.0, 2.0, 4.0]
+ ```
+
+ ---
+
+ ## Activations
+
+ ```ruby
+ x = GRX.tensor([-3.0, -1.0, 0.0, 1.0, 3.0], [5])
+
+ x.relu.to_a            # [0.0, 0.0, 0.0, 1.0, 3.0]
+ x.leaky_relu(0.1).to_a # [-0.3, -0.1, 0.0, 1.0, 3.0]
+ x.sigmoid.to_a         # [0.047, 0.268, 0.5, 0.731, 0.952]
+ x.tanh.to_a            # [-0.995, -0.761, 0.0, 0.761, 0.995]
+
+ GRX.tensor([1.0, 2.0, 3.0, 4.0], [4]).softmax.to_a
+ # [0.032, 0.087, 0.236, 0.643] — always sums to 1.0
+ ```
+
+ ---
+
+ ## Autograd
+
+ Every operation builds a computation graph automatically. Call `.backward` to propagate gradients back through the graph.
+
+ ```ruby
+ # --- Simple gradient ---
+ a = GRX.tensor([2.0, 3.0], [2], requires_grad: true)
+ b = GRX.tensor([4.0, 5.0], [2], requires_grad: true)
+
+ c = a + b
+ c.backward
+
+ a.grad.to_a # [1.0, 1.0] — d(a+b)/da = 1
+ b.grad.to_a # [1.0, 1.0] — d(a+b)/db = 1
+
+ # --- Chained operations ---
+ x = GRX.tensor([1.0, 2.0], [2], requires_grad: true)
+ y = GRX.tensor([3.0, 4.0], [2], requires_grad: true)
+
+ z = (x + y) * y # z = xy + y²
+ z.backward
+
+ x.grad.to_a # [3.0, 4.0] — dz/dx = y
+ y.grad.to_a # [7.0, 10.0] — dz/dy = x + 2y
+
+ # Reset gradients before the next step
+ x.zero_grad!
+ y.zero_grad!
+ ```
+
+ **Operations with autograd support:**
+ `+` `-` `*` `/` `negate` `scale` `square` `sqrt` `log` `exp` `pow`
+ `relu` `leaky_relu` `tanh` `sigmoid` `matmul` `transpose`
+
+ ---
+
+ ## Neural networks
+
+ ```ruby
+ # Build a network with Sequential
+ net = GRX::NN::Sequential.new(
+   GRX::NN::Linear.new(4, 64),
+   GRX::NN::ReLU.new,
+   GRX::NN::Linear.new(64, 32),
+   GRX::NN::Tanh.new,
+   GRX::NN::Linear.new(32, 1),
+   GRX::NN::Sigmoid.new
+ )
+
+ puts net
+ # Sequential(
+ #   (0): Linear(4 → 64, bias: true)
+ #   (1): ReLU()
+ #   (2): Linear(64 → 32, bias: true)
+ #   (3): Tanh()
+ #   (4): Linear(32 → 1, bias: true)
+ #   (5): Sigmoid()
+ # )
+
+ # Forward pass — batch of 8 samples, 4 features each
+ x = GRX.randn([8, 4])
+ pred = net.call(x) # shape [8, 1]
+
+ # Access all trainable parameters
+ params = net.parameters # Array of Tensors with requires_grad: true
+ params.size # 6 (3 weights + 3 biases)
+ ```
+
+ ---
+
+ ## Training loop
+
+ ```ruby
+ require "grx"
+
+ # --- Dataset: learn y = 2x + 1 ---
+ train_x = GRX.tensor((1..8).map(&:to_f), [8, 1])
+ train_y = GRX.tensor((1..8).map { |x| 2.0 * x + 1.0 }, [8, 1])
+
+ # --- Network ---
+ net = GRX::NN::Sequential.new(
+   GRX::NN::Linear.new(1, 8),
+   GRX::NN::Tanh.new,
+   GRX::NN::Linear.new(8, 1)
+ )
+
+ opt     = GRX::Optim::Adam.new(net.parameters, lr: 0.05)
+ loss_fn = GRX::Loss::MSELoss.new
+
+ 300.times do |epoch|
+   opt.zero_grad
+
+   pred     = net.call(train_x)
+   loss_val = loss_fn.call(pred, train_y) # plain Float, used for reporting
+
+   # The loss is reported as a plain Float, so the MSE gradient
+   # dL/dpred = 2 * (pred - target) / N is computed by hand and
+   # injected into pred before backpropagating.
+   grad = pred.to_a.zip(train_y.to_a).map { |p, t| 2.0 * (p - t) / pred.numel }
+   pred.agregar_gradiente(GRX.tensor(grad, pred.shape))
+   pred.backward
+
+   opt.step
+
+   puts "epoch #{epoch + 1} loss: #{loss_val.round(6)}" if (epoch + 1) % 100 == 0
+ end
+ # epoch 100 loss: 0.312...
+ # epoch 200 loss: 0.041...
+ # epoch 300 loss: 0.005...
+ ```
+
+ ---
+
+ ## Layers
+
+ | Class | Description |
+ |---|---|
+ | `GRX::NN::Linear` | Dense layer — `y = x @ Wᵀ + b`, Xavier uniform init |
+ | `GRX::NN::Sequential` | Ordered chain of layers |
+ | `GRX::NN::ReLU` | Rectified Linear Unit |
+ | `GRX::NN::LeakyReLU` | Leaky ReLU with configurable alpha (default `0.01`) |
+ | `GRX::NN::Tanh` | Hyperbolic tangent |
+ | `GRX::NN::Sigmoid` | Logistic sigmoid |
+ | `GRX::NN::Softmax` | Normalized exponential |
+ | `GRX::NN::Dropout` | Inverted dropout — `train!` / `eval!` modes |
+ | `GRX::NN::BatchNorm1d` | Batch normalization with running statistics |
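+
+ A minimal sketch of how these layers compose, using only calls shown elsewhere in this README (the layer sizes here are arbitrary):
+
+ ```ruby
+ require "grx"
+
+ # Keep a handle on the Dropout layer so its mode can be switched.
+ drop = GRX::NN::Dropout.new(0.5)
+
+ net = GRX::NN::Sequential.new(
+   GRX::NN::Linear.new(16, 32),
+   GRX::NN::ReLU.new,
+   drop,
+   GRX::NN::Linear.new(32, 1)
+ )
+
+ x = GRX.randn([4, 16])  # batch of 4 samples, 16 features
+
+ drop.train!             # dropout active during training
+ train_out = net.call(x)
+
+ drop.eval!              # dropout disabled for inference
+ eval_out = net.call(x)  # shape [4, 1]
+ ```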
+
+ ---
+
+ ## Loss functions
+
+ | Class | Formula | Use case |
+ |---|---|---|
+ | `GRX::Loss::MSELoss` | `mean((pred − target)²)` | Regression |
+ | `GRX::Loss::MAELoss` | `mean(\|pred − target\|)` | Robust regression |
+ | `GRX::Loss::BCELoss` | `-mean(t·log(p) + (1−t)·log(1−p))` | Binary classification |
+ | `GRX::Loss::CrossEntropyLoss` | Softmax + NLL | Multi-class classification |
+ | `GRX::Loss::HuberLoss` | Smooth L1 (configurable delta) | Regression with outliers |
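+
+ As a rough illustration, assuming (as in the training-loop example above) that each loss object responds to `call(pred, target)` and returns a Float:
+
+ ```ruby
+ pred   = GRX.tensor([2.5, 0.0, 2.0], [3])
+ target = GRX.tensor([3.0, -0.5, 2.0], [3])
+
+ # Element-wise errors are [-0.5, 0.5, 0.0]
+ GRX::Loss::MSELoss.new.call(pred, target) # mean of squared errors ≈ 0.1667
+ GRX::Loss::MAELoss.new.call(pred, target) # mean of absolute errors ≈ 0.3333
+ ```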
+
+ ---
+
+ ## Optimizers
+
+ ```ruby
+ # SGD with momentum and weight decay
+ opt = GRX::Optim::SGD.new(net.parameters,
+   lr: 0.01,
+   momentum: 0.9,
+   weight_decay: 1e-4
+ )
+
+ # Adam — the standard choice for deep networks
+ opt = GRX::Optim::Adam.new(net.parameters,
+   lr: 0.001,
+   beta1: 0.9,
+   beta2: 0.999,
+   epsilon: 1e-8,
+   weight_decay: 0.0
+ )
+
+ # Training step
+ opt.zero_grad # clear gradients
+ # ... forward + backward ...
+ opt.step      # update parameters
+ ```
+
+ ---
+
+ ## Weight initialization
+
+ ```ruby
+ # Xavier uniform — recommended for tanh / sigmoid layers
+ GRX::Tensor.xavier_uniform([64, 32], requires_grad: true)
+
+ # He normal — recommended for ReLU layers
+ GRX::Tensor.he_normal([64, 32], requires_grad: true)
+
+ # Manual
+ GRX::Tensor.zeros([64], requires_grad: true)
+ GRX::Tensor.ones([64], requires_grad: true)
+ ```
+
+ ---
+
+ ## Dropout & BatchNorm
+
+ ```ruby
+ # Dropout — different behavior in train vs eval
+ drop = GRX::NN::Dropout.new(0.5)
+ drop.train! # activates dropout
+ drop.eval!  # passes input through unchanged
+
+ # BatchNorm1d — normalizes across the batch dimension
+ bn = GRX::NN::BatchNorm1d.new(16)
+ bn.train!
+ bn.eval!
+ ```
+
+ ---
+
+ ## Architecture
+
+ ```
+ grx-tensor/
+ ├── ext/
+ │   ├── grx/
+ │   │   ├── grx_core.c        # C kernel
+ │   │   │                     #   AVX2+FMA element-wise ops (unroll ×2)
+ │   │   │                     #   Cache-tiled matmul (TILE=8, 64-byte cache lines)
+ │   │   │                     #   Adam optimizer inner loop with FMA
+ │   │   │                     #   Xavier uniform + He normal (Box-Muller in C)
+ │   │   │                     #   32-byte aligned memory (posix_memalign / _aligned_malloc)
+ │   │   ├── grx_core.h        # Public C API with GRX_API export macro
+ │   │   └── extconf.rb        # mkmf config — auto-detects AVX2, SSE2, scalar
+ │   ├── unix/
+ │   │   └── Makefile          # Manual build → lib/grx/libgrx_core.so / .dylib
+ │   └── windows/
+ │       └── Makefile.mingw    # Manual build → lib/grx/grx_core.dll
+ │
+ ├── lib/
+ │   ├── grx.rb                # require "grx" ← entry point
+ │   └── grx/
+ │       ├── c_api.rb          # Fiddle bridge — finds and loads the binary
+ │       │                     #   Searches: lib/grx/, lib/, ext/grx/ (all install methods)
+ │       ├── storage.rb        # Native memory buffer (Fiddle::Pointer, 32-byte aligned)
+ │       ├── tensor.rb         # Tensor: zero-copy views + autograd node
+ │       ├── nn.rb             # NN layers
+ │       ├── optim.rb          # Optimizers
+ │       ├── loss.rb           # Loss functions
+ │       └── errors.rb         # ShapeError, DimensionError, StorageError
+ │
+ └── test/
+     ├── test_full.rb          # 104-test integration suite
+     ├── test_tensor.rb
+     ├── test_nn.rb
+     └── benchmark.rb
+ ```
+
+ ---
+
+ ## How the binary is found
+
+ `c_api.rb` searches for the compiled binary in this order:
+
+ | Priority | Path | When |
+ |---|---|---|
+ | 1 | `lib/grx/libgrx_core.so` | `make -C ext/unix` (manual) |
+ | 2 | `lib/grx_core.so` | `gem install` via rake-compiler |
+ | 3 | `lib/grx_core.bundle` | `gem install` on macOS |
+ | 4 | `ext/grx/libgrx_core.so` | local development |
+
+ If none is found, GRX falls back to pure Ruby automatically — no crash, no configuration needed.
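+
+ In spirit, the lookup is a few lines of Fiddle. The sketch below is illustrative only, not the actual `c_api.rb`; the candidate paths and their order mirror the table above, resolved relative to `lib/`:
+
+ ```ruby
+ require "fiddle"
+
+ CANDIDATES = [
+   "grx/libgrx_core.so",          # 1. manual make
+   "grx_core.so",                 # 2. rake-compiler
+   "grx_core.bundle",             # 3. macOS gem install
+   "../ext/grx/libgrx_core.so"    # 4. local development
+ ].freeze
+
+ path = CANDIDATES
+   .map { |p| File.expand_path(p, __dir__) }
+   .find { |p| File.exist?(p) }
+
+ handle = path ? Fiddle.dlopen(path) : nil
+ # handle.nil? → the pure-Ruby fallback takes over
+ ```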
+
+ ---
+
+ ## Benchmark
+
+ Measured on Ruby 3.3, Linux x86_64, AVX2+FMA active.
+
+ | Operation | Time (n = 1M elements) | Throughput |
+ |---|---|---|
+ | `add` | ~4ms / iter | ~250M doubles/s |
+ | `dot` | ~2ms / iter | ~500M doubles/s |
+ | `relu` | ~4ms / iter | ~250M doubles/s |
+ | `matmul` 256×256 | ~6ms | — |
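+
+ Exact numbers vary with CPU and compiler flags. A minimal harness in this vein, using only the public API above and Ruby's stdlib `benchmark`, gives comparable per-iteration timings:
+
+ ```ruby
+ require "benchmark"
+ require "grx"
+
+ n = 1_000_000
+ a = GRX.rand([n])
+ b = GRX.rand([n])
+
+ Benchmark.bm(6) do |bm|
+   bm.report("add")  { 100.times { a + b } }
+   bm.report("dot")  { 100.times { a.dot(b) } }
+   bm.report("relu") { 100.times { a.relu } }
+ end
+ # Divide the reported real time by 100 for the per-iteration cost.
+ ```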
+
+ ---
+
+ ## Roadmap
+
+ - [ ] OpenMP — parallelize element-wise ops across all CPU cores
+ - [ ] BLAS (`cblas_dgemm`) — production-grade matmul
+ - [ ] Broadcasting — automatic shape expansion
+ - [ ] `float32` support — 8 values per AVX2 vector
+ - [ ] Move autograd graph to C — eliminate Ruby GC overhead for large networks
+ - [ ] `Conv2d`, `LSTM`, `MultiheadAttention`
+ - [ ] CUDA extension (`grx-tensor-cuda`)
+
+ ---
+
+ ## License
+
+ MIT — see [LICENSE.txt](LICENSE.txt)
data/ext/grx/extconf.rb ADDED
@@ -0,0 +1,31 @@
+ # extconf.rb
+ # =====================================================================
+ # Configuration script for the native extension.
+ # rake-compiler runs it to generate the correct Makefile
+ # for the user's platform.
+ #
+ # Usage:
+ #   bundle exec rake compile     → compiles for the current platform
+ #   bundle exec rake native gem  → packages pre-compiled binaries
+ # =====================================================================
+
+ require "mkmf"
+
+ extension_name = "grx_core"
+
+ $CFLAGS << " -O3 -ffast-math"
+ $CFLAGS << " -fvisibility=hidden" unless RUBY_PLATFORM =~ /mingw|mswin/
+
+ if try_compile("int main(){return 0;}", "-mavx2 -mfma")
+   $CFLAGS << " -mavx2 -mfma"
+   puts "GRX: AVX2 + FMA enabled"
+ elsif try_compile("int main(){return 0;}", "-msse4.2")
+   $CFLAGS << " -msse4.2"
+   puts "GRX: SSE4.2 enabled"
+ else
+   puts "GRX: no SIMD — using scalar implementation"
+ end
+
+ have_library("m") unless RUBY_PLATFORM =~ /mingw|mswin/
+
+ create_makefile(extension_name)