grx-tensor 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/CHANGELOG.md +54 -0
- data/LICENSE.txt +21 -0
- data/README.md +471 -0
- data/ext/grx/extconf.rb +31 -0
- data/ext/grx/grx_core.c +534 -0
- data/ext/grx/grx_core.h +85 -0
- data/ext/unix/Makefile +66 -0
- data/ext/windows/Makefile.mingw +50 -0
- data/grx-tensor.gemspec +88 -0
- data/lib/grx/c_api.rb +96 -0
- data/lib/grx/errors.rb +8 -0
- data/lib/grx/loss.rb +81 -0
- data/lib/grx/nn.rb +262 -0
- data/lib/grx/optim.rb +121 -0
- data/lib/grx/storage.rb +85 -0
- data/lib/grx/tensor.rb +623 -0
- data/lib/grx/version.rb +5 -0
- data/lib/grx.rb +49 -0
- metadata +159 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: b5bd8763a36b392a5e91c5d888bbf863b5a879e19db921536b1e04fe6edf961f
  data.tar.gz: b4678711feb1f9e51cd1aea09e7c43d3fa604d9de262c18a2ae7c341e6501333
SHA512:
  metadata.gz: 81a81fc4818c2377514f63c43a0957d0fde58faa7e511ff60299d5641104dd821f8e58401d489e7eb4d79c48644c3da8c5a7c783f20f644b832138323ff36593
  data.tar.gz: ed8a532d8550bca46384dc67c83a75b0ebb2d0bfbc7b3c9cd62bcfd576d58c531bd838647564a92f3dbf782b1c5c59fd3fbbf68d69db8a1c4870b21dc42af924
data/CHANGELOG.md
ADDED
@@ -0,0 +1,54 @@
# Changelog

All notable changes to GRX-Tensor are documented here.
Format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
Versioning follows [Semantic Versioning](https://semver.org/).

---

## [Unreleased]

### Planned
- OpenMP parallelization for element-wise operations
- BLAS (`cblas_dgemm`) for production-grade matrix multiplication
- Broadcasting — automatic shape expansion
- `float32` support (8 values/cycle with AVX2)
- Move autograd graph to C to eliminate Ruby GC overhead
- `Conv2d`, `LSTM`, `MultiheadAttention` layers
- CUDA extension (`grx-tensor-cuda`)

---

## [0.1.0] - 2026-05-11

### Added

**C kernel (`ext/grx/grx_core.c`)**
- Memory management: `grx_alloc` / `grx_free` — 32-byte aligned allocation via `posix_memalign` (Linux/macOS) and `_aligned_malloc` (Windows), required for AVX2 `_mm256_load_pd`
- Element-wise arithmetic: `grx_add`, `grx_sub`, `grx_mul`, `grx_div`, `grx_scale`, `grx_add_scalar`, `grx_negate` — AVX2+FMA with 2× loop unrolling
- Math ops: `grx_abs`, `grx_sqrt`, `grx_square`, `grx_log`, `grx_exp`, `grx_pow`, `grx_clip`
- Reductions: `grx_sum`, `grx_mean`, `grx_max`, `grx_min`
- Linear algebra: `grx_dot` (FMA with dual accumulators for ILP), `grx_matmul` (cache-friendly tiling, TILE=8)
- Activations: `grx_relu`, `grx_leaky_relu`, `grx_tanh_act`, `grx_sigmoid`, `grx_softmax`
- Optimizers: `grx_sgd_step` (FMA in-place), `grx_adam_step` (full Adam inner loop in C with FMA)
- Weight initialization: `grx_init_xavier_uniform`, `grx_init_he_normal` (Box-Muller)
- SIMD dispatch: AVX2+FMA → AVX2 → SSE2 → scalar, selected at compile time via `-march=native`

**Ruby layer**
- `GRX::Storage` — native memory buffer backed by `Fiddle::Pointer`; Ruby `Array` fallback when C is unavailable
- `GRX::Tensor` — shape, strides, offset (zero-copy `reshape` and `transpose`); all numeric ops delegate to C
- Autograd — topological BFS graph traversal; `backward_fn` closures for `+`, `-`, `*`, `/`, `square`, `sqrt`, `log`, `exp`, `relu`, `leaky_relu`, `tanh`, `sigmoid`, `matmul`
- `GRX::CAPI` — Fiddle bridge; detects platform and loads `libgrx_core.so` / `.dylib` / `.dll`
- `GRX::NN::Linear`, `Sequential`, `ReLU`, `LeakyReLU`, `Tanh`, `Sigmoid`, `Softmax`, `Dropout`, `BatchNorm1d`
- `GRX::Optim::SGD` (momentum, weight decay), `GRX::Optim::Adam` (bias correction, weight decay)
- `GRX::Loss::MSELoss`, `MAELoss`, `BCELoss`, `CrossEntropyLoss`, `HuberLoss`
- Factory helpers: `GRX.tensor`, `GRX.zeros`, `GRX.ones`, `GRX.rand`, `GRX.randn`, `Tensor.xavier_uniform`, `Tensor.he_normal`

**Build system**
- `ext/unix/Makefile` — compiles directly to `lib/grx/libgrx_core.so` (Linux) or `.dylib` (macOS); no intermediate file
- `ext/windows/Makefile.mingw` — compiles directly to `lib/grx/grx_core.dll` via MinGW-w64
- `ext/grx/extconf.rb` — rake-compiler config for `gem install` auto-compilation
- `.gitignore` — compiled binaries excluded from version control

**Tests**
- 43 tests, 10,121 assertions across `test_tensor.rb` and `test_nn.rb`
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Tu Nombre

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,471 @@
# GRX-Tensor

**Ruby speaks. C computes.**

A tensor framework for Ruby with automatic differentiation, a C+SIMD compute core, and neural network primitives — all behind a clean, expressive Ruby API.

[](https://www.ruby-lang.org)
[](LICENSE.txt)

---

## What is GRX?

GRX is a tensor computation library for Ruby. The numeric core is written in C and compiled with **AVX2 + FMA** SIMD instructions — processing 4 doubles per CPU cycle with fused multiply-add. Ruby handles the high-level API: shape validation, computation graph construction, and orchestration. C handles everything else.

### Key features

| Feature | Details |
|---|---|
| **C+SIMD kernel** | AVX2+FMA, SSE2 fallback, scalar fallback — auto-detected at compile time |
| **Autograd** | Automatic differentiation via a topological computation graph |
| **Optimizers** | SGD (momentum, weight decay) and Adam (inner loop in C with FMA) |
| **NN layers** | Linear, Sequential, Dropout, BatchNorm1d |
| **Activations** | ReLU, Leaky ReLU, Tanh, Sigmoid, Softmax |
| **Loss functions** | MSE, MAE, BCE, CrossEntropy, Huber |
| **Weight init** | Xavier uniform, He normal (Box-Muller in C) |
| **Cross-platform** | `.so` on Linux, `.dylib` on macOS, `.dll` on Windows |
| **Pure Ruby fallback** | Works without compilation — slower but always correct |

---

## Installation

```bash
gem install grx-tensor
```

```ruby
# Gemfile
gem "grx-tensor"
```

The C extension compiles automatically on `gem install`. No extra steps needed.

---

## Quick start

```ruby
require "grx"

a = GRX.tensor([1.0, 2.0, 3.0], [3], requires_grad: true)
b = GRX.tensor([4.0, 5.0, 6.0], [3], requires_grad: true)

c = a + b    # [5.0, 7.0, 9.0] — computed in C with AVX2
c.backward   # propagates gradients through the graph

a.grad.to_a  # [1.0, 1.0, 1.0]
b.grad.to_a  # [1.0, 1.0, 1.0]
```

---

## Tensors

```ruby
# From array + shape
t = GRX.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3])
t.shape  # [2, 3]
t.numel  # 6
t.to_a   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
t.item   # only for single-element tensors → Float

# Factories
GRX.zeros([3])    # [0.0, 0.0, 0.0]
GRX.ones([2, 2])  # [1.0, 1.0, 1.0, 1.0]
GRX.rand([4])     # uniform [0, 1)
GRX.randn([4])    # normal N(0, 1)

GRX::Tensor.zeros_like(t)  # same shape, all zeros
GRX::Tensor.ones_like(t)   # same shape, all ones
```

---

## Arithmetic

All operations run in C. Scalar operands are supported on both sides.

```ruby
a = GRX.tensor([1.0, 2.0, 3.0, 4.0], [4])
b = GRX.tensor([4.0, 3.0, 2.0, 1.0], [4])

(a + b).to_a  # [5.0, 5.0, 5.0, 5.0]
(a - b).to_a  # [-3.0, -1.0, 1.0, 3.0]
(a * b).to_a  # [4.0, 6.0, 6.0, 4.0]
(a / b).to_a  # [0.25, 0.666, 1.5, 4.0]
(-a).to_a     # [-1.0, -2.0, -3.0, -4.0]

# Tensor OP scalar
(a + 10.0).to_a  # [11.0, 12.0, 13.0, 14.0]
(a * 3.0).to_a   # [3.0, 6.0, 9.0, 12.0]
(a / 2.0).to_a   # [0.5, 1.0, 1.5, 2.0]
(a - 1.0).to_a   # [0.0, 1.0, 2.0, 3.0]
```
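
The "both sides" note means the scalar can also come first. A brief sketch, assuming the coercion is symmetric as stated above:

```ruby
# Scalar OP tensor (left-hand scalar; assumes coercion mirrors the cases above)
(10.0 + a).to_a  # [11.0, 12.0, 13.0, 14.0]
(2.0 * a).to_a   # [2.0, 4.0, 6.0, 8.0]
```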

---

## Math operations

```ruby
x = GRX.tensor([1.0, 4.0, 9.0, 16.0], [4])

x.sqrt.to_a             # [1.0, 2.0, 3.0, 4.0]
x.square.to_a           # [1.0, 16.0, 81.0, 256.0]
x.abs.to_a              # absolute value element-wise
x.log.to_a              # natural logarithm
x.exp.to_a              # e^x
x.pow(3).to_a           # [1.0, 64.0, 729.0, 4096.0]
x.clip(2.0, 10.0).to_a  # [2.0, 4.0, 9.0, 10.0]

# Reductions → Float
x.sum   # 30.0
x.mean  # 7.5
x.max   # 16.0
x.min   # 1.0
```

---

## Linear algebra

```ruby
u = GRX.tensor([1.0, 2.0, 3.0], [3])
v = GRX.tensor([4.0, 5.0, 6.0], [3])

u.dot(v)  # 32.0 → 1×4 + 2×5 + 3×6

# Matrix multiplication — tiled for cache efficiency
a = GRX.tensor([1.0, 2.0, 3.0, 4.0], [2, 2])
b = GRX.tensor([5.0, 6.0, 7.0, 8.0], [2, 2])
a.matmul(b).to_a  # [19.0, 22.0, 43.0, 50.0]

# Non-square: [2×3] × [3×2] → [2×2]
a3 = GRX.tensor([1.0,2.0,3.0, 4.0,5.0,6.0], [2, 3])
b3 = GRX.tensor([7.0,8.0, 9.0,10.0, 11.0,12.0], [3, 2])
a3.matmul(b3).to_a  # [58.0, 64.0, 139.0, 154.0]
```

---

## Zero-copy geometry

`reshape` and `transpose` return views over the same memory — no data is copied.

```ruby
m = GRX.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3])

m.get(1, 2)        # 6.0
m.reshape([3, 2])  # new view, same data
m.flatten          # shape [6], same data
m.transpose        # shape [3, 2], same data

# Transpose is a true view
sq = GRX.tensor([1.0, 2.0, 3.0, 4.0], [2, 2])
tr = sq.transpose
tr.get(0, 1)  # 3.0 (was sq[1, 0])
tr.get(1, 0)  # 2.0 (was sq[0, 1])
tr.to_a       # [1.0, 3.0, 2.0, 4.0]
```

---

## Activations

```ruby
x = GRX.tensor([-3.0, -1.0, 0.0, 1.0, 3.0], [5])

x.relu.to_a             # [0.0, 0.0, 0.0, 1.0, 3.0]
x.leaky_relu(0.1).to_a  # [-0.3, -0.1, 0.0, 1.0, 3.0]
x.sigmoid.to_a          # [0.047, 0.268, 0.5, 0.731, 0.952]
x.tanh.to_a             # [-0.995, -0.761, 0.0, 0.761, 0.995]

GRX.tensor([1.0, 2.0, 3.0, 4.0], [4]).softmax.to_a
# [0.032, 0.087, 0.236, 0.643] — always sums to 1.0
```

---

## Autograd

Every operation builds a computation graph automatically. Call `.backward` to propagate gradients back through the graph.

```ruby
# --- Simple gradient ---
a = GRX.tensor([2.0, 3.0], [2], requires_grad: true)
b = GRX.tensor([4.0, 5.0], [2], requires_grad: true)

c = a + b
c.backward

a.grad.to_a  # [1.0, 1.0] — d(a+b)/da = 1
b.grad.to_a  # [1.0, 1.0] — d(a+b)/db = 1

# --- Chained operations ---
x = GRX.tensor([1.0, 2.0], [2], requires_grad: true)
y = GRX.tensor([3.0, 4.0], [2], requires_grad: true)

z = (x + y) * y  # z = xy + y²
z.backward

x.grad.to_a  # [3.0, 4.0] — dz/dx = y
y.grad.to_a  # [7.0, 10.0] — dz/dy = x + 2y

# Reset gradients before next step
x.zero_grad!
y.zero_grad!
```

**Operations with autograd support:**
`+` `-` `*` `/` `negate` `scale` `square` `sqrt` `log` `exp` `pow`
`relu` `leaky_relu` `tanh` `sigmoid` `matmul` `transpose`
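
As a quick sanity check on one of the ops listed above (a sketch; `square` is element-wise, so the chain rule gives d(x²)/dx = 2x):

```ruby
x = GRX.tensor([1.0, 2.0, 3.0], [3], requires_grad: true)
y = x.square
y.backward
x.grad.to_a  # [2.0, 4.0, 6.0]
```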

---

## Neural networks

```ruby
# Build a network with Sequential
net = GRX::NN::Sequential.new(
  GRX::NN::Linear.new(4, 64),
  GRX::NN::ReLU.new,
  GRX::NN::Linear.new(64, 32),
  GRX::NN::Tanh.new,
  GRX::NN::Linear.new(32, 1),
  GRX::NN::Sigmoid.new
)

puts net
# Sequential(
#   (0): Linear(4 → 64, bias: true)
#   (1): ReLU()
#   (2): Linear(64 → 32, bias: true)
#   (3): Tanh()
#   (4): Linear(32 → 1, bias: true)
#   (5): Sigmoid()
# )

# Forward pass — batch of 8 samples, 4 features each
x = GRX.randn([8, 4])
pred = net.call(x)  # shape [8, 1]

# Access all trainable parameters
params = net.parameters  # Array of Tensors with requires_grad: true
params.size              # 6 (3 weights + 3 biases)
```

---

## Training loop

```ruby
require "grx"

# --- Dataset: learn y = 2x + 1 ---
train_x = GRX.tensor((1..8).map(&:to_f), [8, 1])
train_y = GRX.tensor((1..8).map { |x| 2.0 * x + 1.0 }, [8, 1])

# --- Network ---
net = GRX::NN::Sequential.new(
  GRX::NN::Linear.new(1, 8),
  GRX::NN::Tanh.new,
  GRX::NN::Linear.new(8, 1)
)

opt = GRX::Optim::Adam.new(net.parameters, lr: 0.05)
loss_fn = GRX::Loss::MSELoss.new

300.times do |epoch|
  opt.zero_grad

  pred = net.call(train_x)
  loss_val = loss_fn.call(pred, train_y)

  # Compute and inject gradients
  grad = pred.to_a.zip(train_y.to_a).map { |p, t| 2.0 * (p - t) / pred.numel }
  pred.agregar_gradiente(GRX.tensor(grad, pred.shape))
  pred.backward

  opt.step

  puts "epoch #{epoch + 1} loss: #{loss_val.round(6)}" if (epoch + 1) % 100 == 0
end
# epoch 100 loss: 0.312...
# epoch 200 loss: 0.041...
# epoch 300 loss: 0.005...
```

---

## Layers

| Class | Description |
|---|---|
| `GRX::NN::Linear` | Dense layer — `y = x @ Wᵀ + b`, Xavier uniform init |
| `GRX::NN::Sequential` | Ordered chain of layers |
| `GRX::NN::ReLU` | Rectified Linear Unit |
| `GRX::NN::LeakyReLU` | Leaky ReLU with configurable alpha (default `0.01`) |
| `GRX::NN::Tanh` | Hyperbolic tangent |
| `GRX::NN::Sigmoid` | Logistic sigmoid |
| `GRX::NN::Softmax` | Normalized exponential |
| `GRX::NN::Dropout` | Inverted dropout — `train!` / `eval!` modes |
| `GRX::NN::BatchNorm1d` | Batch normalization with running statistics |
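
Layers can also be used on their own. A minimal sketch, assuming each layer exposes the same `call` interface that `Sequential` drives internally:

```ruby
layer = GRX::NN::Linear.new(3, 2)  # Xavier-initialized weight + bias
x     = GRX.randn([4, 3])          # batch of 4 samples, 3 features
h     = layer.call(x)              # shape [4, 2]
GRX::NN::ReLU.new.call(h)          # element-wise max(0, x)
```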

---

## Loss functions

| Class | Formula | Use case |
|---|---|---|
| `GRX::Loss::MSELoss` | `mean((pred − target)²)` | Regression |
| `GRX::Loss::MAELoss` | `mean(\|pred − target\|)` | Robust regression |
| `GRX::Loss::BCELoss` | `-mean(t·log(p) + (1−t)·log(1−p))` | Binary classification |
| `GRX::Loss::CrossEntropyLoss` | Softmax + NLL | Multi-class classification |
| `GRX::Loss::HuberLoss` | Smooth L1 (configurable delta) | Regression with outliers |
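
A minimal usage sketch (judging from the training loop above, a loss object's `call` returns a plain Float):

```ruby
loss_fn = GRX::Loss::MSELoss.new
pred    = GRX.tensor([2.5, 0.0, 2.0], [3])
target  = GRX.tensor([3.0, -0.5, 2.0], [3])
loss_fn.call(pred, target)  # mean((pred - target)²) ≈ 0.1667
```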

---

## Optimizers

```ruby
# SGD with momentum and weight decay
opt = GRX::Optim::SGD.new(net.parameters,
  lr: 0.01,
  momentum: 0.9,
  weight_decay: 1e-4
)

# Adam — the standard choice for deep networks
opt = GRX::Optim::Adam.new(net.parameters,
  lr: 0.001,
  beta1: 0.9,
  beta2: 0.999,
  epsilon: 1e-8,
  weight_decay: 0.0
)

# Training step
opt.zero_grad  # clear gradients
# ... forward + backward ...
opt.step       # update parameters
```

---

## Weight initialization

```ruby
# Xavier uniform — recommended for tanh / sigmoid layers
GRX::Tensor.xavier_uniform([64, 32], requires_grad: true)

# He normal — recommended for ReLU layers
GRX::Tensor.he_normal([64, 32], requires_grad: true)

# Manual
GRX::Tensor.zeros([64], requires_grad: true)
GRX::Tensor.ones([64], requires_grad: true)
```

---

## Dropout & BatchNorm

```ruby
# Dropout — different behavior in train vs eval
drop = GRX::NN::Dropout.new(0.5)
drop.train!  # activates dropout
drop.eval!   # passes input through unchanged

# BatchNorm1d — normalizes across the batch dimension
bn = GRX::NN::BatchNorm1d.new(16)
bn.train!
bn.eval!
```
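
Pushing a batch through both layers; a sketch that assumes the same `call` interface used elsewhere in this README:

```ruby
x = GRX.randn([8, 16])  # batch of 8, 16 features

drop.train!
drop.call(x)  # roughly half the values zeroed, survivors scaled by 1/(1 - p)
drop.eval!
drop.call(x)  # identity: input returned unchanged

bn.train!
bn.call(x)    # normalized per feature using batch statistics
```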

---

## Architecture

```
grx-tensor/
├── ext/
│   ├── grx/
│   │   ├── grx_core.c        # C kernel
│   │   │                     #   AVX2+FMA element-wise ops (unroll ×2)
│   │   │                     #   Cache-tiled matmul (TILE=8, 64-byte cache lines)
│   │   │                     #   Adam optimizer inner loop with FMA
│   │   │                     #   Xavier uniform + He normal (Box-Muller in C)
│   │   │                     #   32-byte aligned memory (posix_memalign / _aligned_malloc)
│   │   ├── grx_core.h        # Public C API with GRX_API export macro
│   │   └── extconf.rb        # mkmf config — auto-detects AVX2, SSE2, scalar
│   ├── unix/
│   │   └── Makefile          # Manual build → lib/grx/libgrx_core.so / .dylib
│   └── windows/
│       └── Makefile.mingw    # Manual build → lib/grx/grx_core.dll
│
├── lib/
│   ├── grx.rb                # require "grx" ← entry point
│   └── grx/
│       ├── c_api.rb          # Fiddle bridge — finds and loads the binary
│       │                     #   Searches: lib/grx/, lib/, ext/grx/ (all install methods)
│       ├── storage.rb        # Native memory buffer (Fiddle::Pointer, 32-byte aligned)
│       ├── tensor.rb         # Tensor: zero-copy views + autograd node
│       ├── nn.rb             # NN layers
│       ├── optim.rb          # Optimizers
│       ├── loss.rb           # Loss functions
│       └── errors.rb         # ShapeError, DimensionError, StorageError
│
└── test/
    ├── test_full.rb          # 104-test integration suite
    ├── test_tensor.rb
    ├── test_nn.rb
    └── benchmark.rb
```

---

## How the binary is found

`c_api.rb` searches for the compiled binary in this order:

| Priority | Path | When |
|---|---|---|
| 1 | `lib/grx/libgrx_core.so` | `make -C ext/unix` (manual) |
| 2 | `lib/grx_core.so` | `gem install` via rake-compiler |
| 3 | `lib/grx_core.bundle` | `gem install` on macOS |
| 4 | `ext/grx/libgrx_core.so` | local development |

If none is found, GRX falls back to pure Ruby automatically — no crash, no configuration needed.
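
Roughly what that lookup amounts to (illustrative only; the real logic lives in `lib/grx/c_api.rb` and the variable names here are made up):

```ruby
candidates = [
  "lib/grx/libgrx_core.so",  # 1: manual make -C ext/unix
  "lib/grx_core.so",         # 2: gem install via rake-compiler
  "lib/grx_core.bundle",     # 3: gem install on macOS
  "ext/grx/libgrx_core.so"   # 4: local development
]

if (path = candidates.find { |p| File.exist?(p) })
  require "fiddle"
  Fiddle.dlopen(path)  # C kernel loaded, fast path active
else
  # pure-Ruby fallback: same results, no native speed-up
end
```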

---

## Benchmark

Measured on Ruby 3.3, Linux x86_64, AVX2+FMA active.

| Operation | n = 1M elements | Throughput |
|---|---|---|
| `add` | ~4ms / iter | ~250M doubles/s |
| `dot` | ~2ms / iter | ~500M doubles/s |
| `relu` | ~4ms / iter | ~250M doubles/s |
| `matmul` 256×256 | ~6ms | — |
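
To get comparable numbers on your own machine, something along these lines works (a sketch; the gem's own `test/benchmark.rb` may measure things differently):

```ruby
require "benchmark"
require "grx"

n = 1_000_000
a = GRX.rand([n])
b = GRX.rand([n])

secs = Benchmark.realtime { 10.times { a + b } } / 10.0
puts format("add: %.2f ms/iter, ~%.0fM doubles/s", secs * 1000, n / secs / 1e6)
```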

---

## Roadmap

- [ ] OpenMP — parallelize element-wise ops across all CPU cores
- [ ] BLAS (`cblas_dgemm`) — production-grade matmul
- [ ] Broadcasting — automatic shape expansion
- [ ] `float32` support — 8 values/cycle with AVX2
- [ ] Move autograd graph to C — eliminate Ruby GC overhead for large networks
- [ ] `Conv2d`, `LSTM`, `MultiheadAttention`
- [ ] CUDA extension (`grx-tensor-cuda`)

---

## License

MIT — see [LICENSE.txt](LICENSE.txt)
data/ext/grx/extconf.rb
ADDED
@@ -0,0 +1,31 @@
# extconf.rb
# =====================================================================
# Configuration script for the native extension.
# rake-compiler runs it to generate the correct Makefile for the
# user's platform.
#
# Usage:
#   bundle exec rake compile     → compiles for the current platform
#   bundle exec rake native gem  → packages pre-compiled binaries
# =====================================================================

require "mkmf"

extension_name = "grx_core"

$CFLAGS << " -O3 -ffast-math"
$CFLAGS << " -fvisibility=hidden" unless RUBY_PLATFORM =~ /mingw|mswin/

if try_compile("int main(){return 0;}", "-mavx2 -mfma")
  $CFLAGS << " -mavx2 -mfma"
  puts "GRX: AVX2 + FMA enabled"
elsif try_compile("int main(){return 0;}", "-msse4.2")
  $CFLAGS << " -msse4.2"
  puts "GRX: SSE4.2 enabled"
else
  puts "GRX: No SIMD — using scalar implementation"
end

have_library("m") unless RUBY_PLATFORM =~ /mingw|mswin/

create_makefile(extension_name)