@dniskav/neuron 0.2.7 → 0.3.1
- package/README.md +540 -192
- package/dist/index.d.mts +587 -1
- package/dist/index.d.ts +587 -1
- package/dist/index.js +3778 -2
- package/dist/index.mjs +3734 -2
- package/package.json +2 -2
package/README.md
CHANGED
|
@@ -3,7 +3,7 @@
|
|
|
3
3
|
|
|
4
4
|
A minimal, dependency-free neural network library built from scratch in TypeScript. Designed for learning and experimentation — every line of math is readable.
|
|
5
5
|
|
|
6
|
-
Each class is a building block for the next: from a single neuron to a full Transformer with causal attention.
|
|
6
|
+
Each class is a building block for the next: from a single neuron to a full Transformer with causal attention. Includes classical ML, unsupervised learning, generative models, embeddings, and autograd — all in pure TypeScript, zero dependencies.
|
|
7
7
|
|
|
8
8
|
```mermaid
|
|
9
9
|
graph TD
|
|
@@ -20,13 +20,43 @@ graph TD
|
|
|
20
20
|
K["NetworkTransformer\nembeddings → blocks → per-token logits"]
|
|
21
21
|
L["NetworkTransformerRL\ncontinuous projection → causal attention → Q-values"]
|
|
22
22
|
|
|
23
|
+
subgraph Classical ML
|
|
24
|
+
P["Perceptron\nstep function · Rosenblatt rule"]
|
|
25
|
+
LR["LinearRegression\nnormal equation · gradient descent"]
|
|
26
|
+
LOG["LogisticRegression\nsigmoid · BCE · SoftmaxRegression"]
|
|
27
|
+
NB["GaussianNaiveBayes\nlog-probabilities · Gaussian P(x|c)"]
|
|
28
|
+
DT["DecisionTree\nCART · Gini · MSE split"]
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
subgraph Unsupervised
|
|
32
|
+
KM["KMeans\nK-Means++ · inertia · elbow"]
|
|
33
|
+
PCA["PCA\npower iteration · projection · reconstruction"]
|
|
34
|
+
SOM["SOM\nKohonen · BMU · Gaussian neighborhood"]
|
|
35
|
+
HN["HopfieldNetwork\nHebbian · energy · associative memory"]
|
|
36
|
+
AE["Autoencoder\nencoder · bottleneck · decoder"]
|
|
37
|
+
end
|
|
38
|
+
|
|
39
|
+
subgraph Generative
|
|
40
|
+
GAN["GAN\ngenerator · discriminator · min-max"]
|
|
41
|
+
VAE["VAE\nreparametrization trick · ELBO · KL"]
|
|
42
|
+
end
|
|
43
|
+
|
|
44
|
+
subgraph Autograd
|
|
45
|
+
TAP["Value / Tape\nreverse-mode · computational graph · backward"]
|
|
46
|
+
end
|
|
47
|
+
|
|
23
48
|
A --> B --> C --> D --> E
|
|
24
49
|
E --> F --> G
|
|
25
50
|
E --> H --> I --> J --> K --> L
|
|
51
|
+
E --> AE
|
|
52
|
+
E --> GAN
|
|
53
|
+
E --> VAE
|
|
26
54
|
```
|
|
27
55
|
|
|
28
56
|
## What's inside
|
|
29
57
|
|
|
58
|
+
### Neural network building blocks
|
|
59
|
+
|
|
30
60
|
| Export | Description |
|
|
31
61
|
|--------|-------------|
|
|
32
62
|
| `Neuron` | Single-input neuron. The simplest possible unit: one weight, one bias. |
|
|
@@ -36,20 +66,126 @@ graph TD
|
|
|
36
66
|
| `NetworkN` | Deep network of arbitrary depth. Define your architecture as `[inputs, ...hidden, outputs]`. |
|
|
37
67
|
| `LSTMLayer` | Recurrent layer with persistent hidden and cell state. Learns sequences via BPTT. |
|
|
38
68
|
| `NetworkLSTM` | Wraps an `LSTMLayer` + dense layers. Maintains memory across steps within an episode. |
|
|
69
|
+
| `GRULayer` | Gated Recurrent Unit — lighter alternative to LSTM, two gates instead of three. |
|
|
39
70
|
| `NetworkTransformer` | Full token-classification Transformer: embeddings → N blocks → per-token logits. |
|
|
40
|
-
| `NetworkTransformerRL` | Transformer for RL agents: continuous input projection → causal attention → Q-values.
|
|
71
|
+
| `NetworkTransformerRL` | Transformer for RL agents: continuous input projection → causal attention → Q-values. |
|
|
41
72
|
| `TransformerBlock` | One Transformer block: multi-head attention + FFN + LayerNorm × 2 with residuals. |
|
|
42
73
|
| `MultiHeadAttention` | N parallel attention heads concatenated and projected to `d_model`. |
|
|
43
74
|
| `AttentionHead` | Single scaled dot-product self-attention head (Q / K / V projections + backprop). |
|
|
75
|
+
|
|
76
|
+
### Layers & components
|
|
77
|
+
|
|
78
|
+
| Export | Description |
|
|
79
|
+
|--------|-------------|
|
|
80
|
+
| `Conv1D` | 1D convolution over sequences. Multi-channel, configurable stride and padding. |
|
|
81
|
+
| `Conv2D` | 2D convolution for images. Kernels `[filters][kH][kW][C]`, full forward + backward. |
|
|
82
|
+
| `MaxPool2D` | Max pooling 2D. Stores position mask for exact gradient routing in backprop. |
|
|
83
|
+
| `Flatten` | Converts `[H][W][C]` tensors to flat vectors. Bridges Conv layers to dense layers. |
|
|
84
|
+
| `RNN` | Vanilla RNN with BPTT. Explicitly shows where and why gradients vanish. |
|
|
85
|
+
| `Seq2Seq` | Encoder + Decoder LSTMs with context vector transfer. Teacher forcing in training. |
|
|
86
|
+
| `CausalConv1D` | Causal dilated 1D convolution. One building block of a TCN. |
|
|
87
|
+
| `TCN` | Temporal Convolutional Network. Stacks causal dilated convolutions for sequences without recurrence. |
|
|
44
88
|
| `LayerNorm` | Layer normalization with learnable γ / β per feature. |
|
|
45
|
-
| `
|
|
46
|
-
| `
|
|
89
|
+
| `BatchNorm` | Batch normalization with running mean/variance for inference. |
|
|
90
|
+
| `Dropout` | Inverted dropout for regularization. Active only during training. |
|
|
91
|
+
| `WeightMatrix` | 2D weight matrix with per-scalar Adam optimizers and optional gradient clipping. |
|
|
92
|
+
| `BiasVector` | 1D bias vector with per-scalar Adam optimizers. |
|
|
47
93
|
| `EmbeddingMatrix` | Lookup-table embedding matrix with SGD updates. |
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
|
52
|
-
|
|
94
|
+
|
|
95
|
+
### Classical ML
|
|
96
|
+
|
|
97
|
+
| Export | Description |
|
|
98
|
+
|--------|-------------|
|
|
99
|
+
| `Perceptron` | The historical Rosenblatt perceptron (1957). Step function, linear rule. Shows why XOR is impossible. |
|
|
100
|
+
| `LinearRegression` | Closed-form normal equation `(XᵀX)⁻¹Xᵀy` + gradient descent mode. Pure array arithmetic. |
|
|
101
|
+
| `LogisticRegression` | Sigmoid + binary cross-entropy, no hidden layers. The boundary between classical ML and neural nets. |
|
|
102
|
+
| `SoftmaxRegression` | Multinomial logistic regression. Log-sum-exp trick for numerical stability. |
|
|
103
|
+
| `GaussianNaiveBayes` | `P(c\|x) ∝ P(c)·∏P(xᵢ\|c)` in log-space. Zero gradient descent — pure Bayes. |
|
|
104
|
+
| `DecisionTree` | CART with Gini impurity (classification) or variance (regression). Fully recursive. |
|
|
105
|
+
|
|
106
|
+
### Unsupervised learning
|
|
107
|
+
|
|
108
|
+
| Export | Description |
|
|
109
|
+
|--------|-------------|
|
|
110
|
+
| `KMeans` | K-Means++ initialization + Lloyd's algorithm. `inertia()` for the elbow method. |
|
|
111
|
+
| `PCA` | Principal Component Analysis via power iteration + Hotelling deflation. Projects, reconstructs, explains variance. |
|
|
112
|
+
| `SOM` | Self-Organizing Map (Kohonen). BMU search, Gaussian neighborhood, topology preservation. |
|
|
113
|
+
| `HopfieldNetwork` | Associative memory. Hebbian storage, energy function, async recall. Capacity ~0.138·N. |
|
|
114
|
+
| `Autoencoder` | Encoder + bottleneck + decoder using two `NetworkN` instances. Learns compressed representations. |
|
|
115
|
+
|
|
116
|
+
### Generative models
|
|
117
|
+
|
|
118
|
+
| Export | Description |
|
|
119
|
+
|--------|-------------|
|
|
120
|
+
| `GAN` | Generator vs Discriminator min-max game. Documents Nash equilibrium and mode collapse. |
|
|
121
|
+
| `VAE` | Variational Autoencoder. Reparametrization trick, ELBO = reconstruction + KL divergence. |
|
|
122
|
+
|
|
123
|
+
### Automatic differentiation
|
|
124
|
+
|
|
125
|
+
| Export | Description |
|
|
126
|
+
|--------|-------------|
|
|
127
|
+
| `Value` | Scalar autograd node. Builds a computational graph and propagates gradients with `.backward()`. Inspired by micrograd. |
|
|
128
|
+
|
|
129
|
+
### Embeddings
|
|
130
|
+
|
|
131
|
+
| Export | Description |
|
|
132
|
+
|--------|-------------|
|
|
133
|
+
| `Word2Vec` | Learns word embeddings via Skip-gram or CBOW. Full-softmax, cosine similarity, analogies (`king - man + woman ≈ queen`). |
|
|
134
|
+
| `TSNE` | t-SNE dimensionality reduction. Binary-search perplexity, Student-t kernel, KL gradient, early exaggeration. |
|
|
135
|
+
| `PositionalEncoding` | Sinusoidal positional encoding (Vaswani et al.). Static — no parameters, generalizes to unseen lengths. |
|
|
136
|
+
| `LearnedPositionalEncoding` | Trainable positional encoding. Xavier-initialized, learnable up to a fixed `maxSeqLen`. |
|
|
137
|
+
| `ContrastiveLearning` | SimCLR-style self-supervised learning. NT-Xent loss, encoder + projection head, temperature τ. |
|
|
138
|
+
| `Augmenter` | Data augmentation helpers for contrastive pairs: Gaussian noise, feature dropout, `makePair()`. |
|
|
139
|
+
|
|
140
|
+
### Activations & math
|
|
141
|
+
|
|
142
|
+
| Export | Description |
|
|
143
|
+
|--------|-------------|
|
|
144
|
+
| `sigmoid` `relu` `tanh` `linear` `leakyRelu` `elu` | Built-in activation functions with `fn` and `dfn` (derivative from output). |
|
|
145
|
+
| `makeLeakyRelu(α)` `makeElu(α)` | Parametric variants. |
|
|
146
|
+
| `matMul` `transpose` `softmax` `softmaxBackward` | Matrix math utilities. |
|
|
147
|
+
|
|
148
|
+
### Optimizers
|
|
149
|
+
|
|
150
|
+
| Export | Description |
|
|
151
|
+
|--------|-------------|
|
|
152
|
+
| `SGD` | Vanilla stochastic gradient descent. Stateless. |
|
|
153
|
+
| `Momentum` | Accumulates velocity in the gradient direction. |
|
|
154
|
+
| `Adam` | Adaptive moment estimation. Per-parameter first and second moments with bias correction. |
|
|
155
|
+
| `ClipOptimizer` | Wraps any optimizer with gradient clipping. |
|
|
156
|
+
| `ClippedOptimizerFactory` | Factory wrapper that clips all created optimizers. |
|
|
157
|
+
| `defaultOptimizer` | Default factory (`() => new SGD()`). Shared fallback across all classes. |
|
|
158
|
+
|
|
159
|
+
### Loss functions
|
|
160
|
+
|
|
161
|
+
| Export | Description |
|
|
162
|
+
|--------|-------------|
|
|
163
|
+
| `mse` `crossEntropy` | Scalar loss functions for evaluation and logging. |
|
|
164
|
+
| `mseDelta` `crossEntropyDelta` `crossEntropyDeltaRaw` | Output-layer delta functions for `trainWithDeltas`. |
|
|
165
|
+
|
|
166
|
+
### Metrics & evaluation
|
|
167
|
+
|
|
168
|
+
| Export | Description |
|
|
169
|
+
|--------|-------------|
|
|
170
|
+
| `confusionMatrix` | Returns `number[][]` confusion matrix. |
|
|
171
|
+
| `accuracy` `precision` `recall` `f1Score` | Standard classification metrics. |
|
|
172
|
+
| `rocCurve` `auc` | ROC curve points and area under the curve (trapezoidal rule). |
|
|
173
|
+
| `mae` `rmse` `r2Score` | Regression metrics. |
|
|
174
|
+
| `perplexity` | `exp(mean cross-entropy)` — natural metric for language models. |
|
|
175
|
+
| `printConfusionMatrix` `classificationReport` | Console-formatted output tables. |
|
|
176
|
+
|
|
177
|
+
### Training utilities
|
|
178
|
+
|
|
179
|
+
| Export | Description |
|
|
180
|
+
|--------|-------------|
|
|
181
|
+
| `Trainer` | Training loop with epochs, batches, metrics, and callbacks. |
|
|
182
|
+
| `DataLoader` | Dataset wrapper with shuffling and validation split. |
|
|
183
|
+
| `LRScheduler` | Learning rate schedules (step, exponential, cosine). |
|
|
184
|
+
| `EarlyStopping` | Stops training when a metric stalls. Configurable patience, mode, and best-weight restore. |
|
|
185
|
+
| `LossPlotter` | Renders a loss curve as ASCII art in the terminal. |
|
|
186
|
+
| `WeightInspector` | Per-layer weight statistics (mean, std, dead weights). Detects dead ReLUs. |
|
|
187
|
+
| `DataAugmentation` | Noise, jitter, normalization, z-score, shuffle, train/val/test split. |
|
|
188
|
+
| `ModelSaver` | Universal serialization via flat `getWeights()` / `setWeights()`. |
|
|
53
189
|
|
|
54
190
|
## Install
|
|
55
191
|
|
|
@@ -66,303 +202,515 @@ import { Neuron } from "@dniskav/neuron";
|
|
|
66
202
|
|
|
67
203
|
const neuron = new Neuron();
|
|
68
204
|
|
|
69
|
-
// Train: output 1 if input >= 18, else 0
|
|
70
205
|
for (let epoch = 0; epoch < 1000; epoch++) {
|
|
71
206
|
neuron.train(20, 1, 0.1); // adult
|
|
72
207
|
neuron.train(15, 0, 0.1); // minor
|
|
73
208
|
}
|
|
74
209
|
|
|
75
|
-
console.log(neuron.predict(17)); // ~0.1
|
|
76
|
-
console.log(neuron.predict(25)); // ~0.9
|
|
210
|
+
console.log(neuron.predict(17)); // ~0.1
|
|
211
|
+
console.log(neuron.predict(25)); // ~0.9
|
|
212
|
+
```
|
|
213
|
+
|
|
214
|
+
### NetworkN — deep network with custom architecture
|
|
215
|
+
|
|
216
|
+
```ts
|
|
217
|
+
import { NetworkN, relu, sigmoid, Adam } from "@dniskav/neuron";
|
|
218
|
+
|
|
219
|
+
const net = new NetworkN([3, 64, 32, 1], {
|
|
220
|
+
activations: [relu, relu, sigmoid],
|
|
221
|
+
optimizer: () => new Adam(),
|
|
222
|
+
});
|
|
223
|
+
|
|
224
|
+
net.train([0.5, 0.3, 0.8], [1], 0.001);
|
|
225
|
+
const [out] = net.predict([0.5, 0.3, 0.8]);
|
|
77
226
|
```
|
|
78
227
|
|
|
79
|
-
###
|
|
228
|
+
### Historical Perceptron — step function, no hidden layers
|
|
80
229
|
|
|
81
230
|
```ts
|
|
82
|
-
import {
|
|
231
|
+
import { Perceptron } from "@dniskav/neuron";
|
|
83
232
|
|
|
84
|
-
const
|
|
233
|
+
const p = new Perceptron(2);
|
|
85
234
|
|
|
86
|
-
//
|
|
87
|
-
|
|
88
|
-
|
|
235
|
+
// Learns AND gate (linearly separable)
|
|
236
|
+
const data = [[0,0,0],[0,1,0],[1,0,0],[1,1,1]];
|
|
237
|
+
for (let e = 0; e < 100; e++)
|
|
238
|
+
for (const [a, b, t] of data) p.train([a, b], t, 0.1);
|
|
89
239
|
|
|
90
|
-
console.log(
|
|
240
|
+
console.log(p.predict([1, 1])); // 1
|
|
241
|
+
console.log(p.predict([0, 1])); // 0
|
|
242
|
+
// XOR cannot be learned — not linearly separable
|
|
91
243
|
```
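
For readers who want to see the Rosenblatt rule itself, here is a minimal standalone sketch. It is not the package's `Perceptron` source: `stepPredict`, `trainPerceptron`, zero-initialized weights, and a learning rate of 1 (chosen so the arithmetic stays exact) are all assumptions made for illustration.

```ts
// Standalone sketch of the Rosenblatt rule (hypothetical helper names).
type Sample = { x: number[]; t: number };

function stepPredict(w: number[], b: number, x: number[]): number {
  const sum = x.reduce((acc, xi, i) => acc + xi * w[i], b);
  return sum >= 0 ? 1 : 0; // step activation
}

function trainPerceptron(
  data: Sample[],
  lr: number,
  epochs: number
): { w: number[]; b: number } {
  const w = new Array<number>(data[0].x.length).fill(0); // assumption: zero init
  let b = 0;
  for (let e = 0; e < epochs; e++) {
    for (const { x, t } of data) {
      const err = t - stepPredict(w, b, x); // -1, 0 or +1
      for (let i = 0; i < w.length; i++) w[i] += lr * err * x[i];
      b += lr * err;
    }
  }
  return { w, b };
}

// AND gate: linearly separable, so the rule converges.
const andGate: Sample[] = [
  { x: [0, 0], t: 0 },
  { x: [0, 1], t: 0 },
  { x: [1, 0], t: 0 },
  { x: [1, 1], t: 1 },
];
const andModel = trainPerceptron(andGate, 1, 100);
```

Weights only change when a sample is misclassified, which is why the rule stops moving once every AND case is correct and never settles on XOR.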
|
|
92
244
|
|
|
93
|
-
###
|
|
245
|
+
### Linear Regression — normal equation
|
|
94
246
|
|
|
95
247
|
```ts
|
|
96
|
-
import {
|
|
248
|
+
import { LinearRegression } from "@dniskav/neuron";
|
|
97
249
|
|
|
98
|
-
|
|
99
|
-
const net = new Network(2, 8, 1);
|
|
250
|
+
const model = new LinearRegression();
|
|
100
251
|
|
|
101
|
-
//
|
|
102
|
-
|
|
252
|
+
// Exact closed-form solution in one call
|
|
253
|
+
model.fitNormal(
|
|
254
|
+
[[1], [2], [3], [4]], // X
|
|
255
|
+
[2, 4, 6, 8] // y = 2x
|
|
256
|
+
);
|
|
103
257
|
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
}
|
|
108
|
-
}
|
|
258
|
+
console.log(model.predict([5])); // ~10
|
|
259
|
+
console.log(model.getCoefficients()); // { weights: [2], bias: ~0 }
|
|
260
|
+
```
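
With a single feature plus a bias, the normal equation `(XᵀX)⁻¹Xᵀy` reduces to a covariance-over-variance ratio. The sketch below shows that special case only; `fitLine` is a hypothetical helper, not the library's `fitNormal`, which presumably handles several features.

```ts
// One-feature least squares: the normal equation in closed form.
function fitLine(xs: number[], ys: number[]): { weight: number; bias: number } {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0; // sum of (x - meanX)(y - meanY)
  let sxx = 0; // sum of (x - meanX)^2
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const weight = sxy / sxx;
  return { weight, bias: meanY - weight * meanX };
}

// Same data as the example above: y = 2x.
const fittedLine = fitLine([1, 2, 3, 4], [2, 4, 6, 8]);
```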
|
|
109
261
|
|
|
110
|
-
|
|
111
|
-
|
|
262
|
+
### Logistic Regression — sigmoid + BCE
|
|
263
|
+
|
|
264
|
+
```ts
|
|
265
|
+
import { LogisticRegression } from "@dniskav/neuron";
|
|
266
|
+
|
|
267
|
+
const clf = new LogisticRegression(2);
|
|
268
|
+
const lossHistory = clf.train(
|
|
269
|
+
[[0,0],[1,1],[1,0],[0,1]],
|
|
270
|
+
[0, 1, 1, 0],
|
|
271
|
+
0.1, 500
|
|
272
|
+
);
|
|
273
|
+
|
|
274
|
+
console.log(clf.classify([0.9, 0.9])); // 1
|
|
275
|
+
console.log(clf.classify([0.1, 0.1])); // 0
|
|
112
276
|
```
|
|
113
277
|
|
|
114
|
-
###
|
|
278
|
+
### Gaussian Naive Bayes — zero gradient descent
|
|
115
279
|
|
|
116
280
|
```ts
|
|
117
|
-
import {
|
|
281
|
+
import { GaussianNaiveBayes } from "@dniskav/neuron";
|
|
282
|
+
|
|
283
|
+
const nb = new GaussianNaiveBayes();
|
|
284
|
+
nb.fit(
|
|
285
|
+
[[1.2, 0.5], [1.4, 0.7], [5.0, 4.5], [5.2, 4.8]],
|
|
286
|
+
[0, 0, 1, 1]
|
|
287
|
+
);
|
|
118
288
|
|
|
119
|
-
|
|
120
|
-
|
|
289
|
+
console.log(nb.predict([1.3, 0.6])); // 0
|
|
290
|
+
console.log(nb.predict([5.1, 4.6])); // 1
|
|
291
|
+
```
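
The log-space scoring behind Gaussian Naive Bayes fits in a few lines. This is a sketch under assumptions, not the package's code: `fitGaussianNB`, `predictGaussianNB`, the `ClassStats` shape, and the `1e-9` variance floor are all illustrative choices.

```ts
interface ClassStats {
  label: number;
  prior: number;
  mean: number[];
  variance: number[];
}

// "Training" is only per-class counting, means and variances.
function fitGaussianNB(X: number[][], y: number[]): ClassStats[] {
  return Array.from(new Set(y)).map((label) => {
    const rows = X.filter((_, i) => y[i] === label);
    const d = rows[0].length;
    const mean = Array.from({ length: d }, (_, j) =>
      rows.reduce((s, r) => s + r[j], 0) / rows.length
    );
    const variance = Array.from({ length: d }, (_, j) =>
      rows.reduce((s, r) => s + (r[j] - mean[j]) ** 2, 0) / rows.length + 1e-9 // floor avoids division by zero
    );
    return { label, prior: rows.length / X.length, mean, variance };
  });
}

// Score each class with log P(c) + sum of log Gaussian densities; pick the best.
function predictGaussianNB(stats: ClassStats[], x: number[]): number {
  let best = stats[0].label;
  let bestScore = -Infinity;
  for (const s of stats) {
    let score = Math.log(s.prior);
    for (let j = 0; j < x.length; j++) {
      score +=
        -0.5 * Math.log(2 * Math.PI * s.variance[j]) -
        (x[j] - s.mean[j]) ** 2 / (2 * s.variance[j]);
    }
    if (score > bestScore) {
      bestScore = score;
      best = s.label;
    }
  }
  return best;
}
```

Summing logarithms instead of multiplying densities keeps the score from underflowing when there are many features.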
|
|
292
|
+
|
|
293
|
+
### Decision Tree — Gini split
|
|
121
294
|
|
|
122
|
-
|
|
123
|
-
|
|
295
|
+
```ts
|
|
296
|
+
import { DecisionTree } from "@dniskav/neuron";
|
|
124
297
|
|
|
125
|
-
|
|
126
|
-
|
|
298
|
+
const tree = new DecisionTree({ maxDepth: 4, task: 'classification' });
|
|
299
|
+
tree.fit(X_train, y_train);
|
|
300
|
+
const predictions = tree.predictBatch(X_test);
|
|
127
301
|
```
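
The Gini criterion that CART minimizes at each split can be sketched as follows. `giniImpurity` and `splitGini` are hypothetical helpers for illustration and are not taken from the package.

```ts
// Gini impurity: 1 - sum over classes of p_k^2.
function giniImpurity(labels: number[]): number {
  if (labels.length === 0) return 0;
  const counts = new Map<number, number>();
  for (const l of labels) counts.set(l, (counts.get(l) ?? 0) + 1);
  let sumSq = 0;
  counts.forEach((c) => {
    sumSq += (c / labels.length) ** 2;
  });
  return 1 - sumSq;
}

// Quality of a candidate split: size-weighted impurity of the two children.
function splitGini(left: number[], right: number[]): number {
  const n = left.length + right.length;
  return (
    (left.length / n) * giniImpurity(left) +
    (right.length / n) * giniImpurity(right)
  );
}
```

A tree builder would evaluate `splitGini` for every candidate threshold and keep the lowest value; a result of 0 means both children are pure.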
|
|
128
302
|
|
|
129
|
-
###
|
|
303
|
+
### K-Means — unsupervised clustering
|
|
304
|
+
|
|
305
|
+
```ts
|
|
306
|
+
import { KMeans } from "@dniskav/neuron";
|
|
307
|
+
|
|
308
|
+
const km = new KMeans(3); // 3 clusters
|
|
309
|
+
km.fit(points);
|
|
130
310
|
|
|
131
|
-
|
|
311
|
+
const cluster = km.predict([1.2, 0.5]); // index 0, 1 or 2
|
|
312
|
+
console.log(km.inertia(points)); // lower = better fit
|
|
313
|
+
```
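
Inertia is the sum of squared distances from each point to its nearest centroid. The sketch below takes the centroids as an explicit argument, which is an assumption made to keep it standalone; the library's `inertia(points)` uses the fitted model's own centroids.

```ts
function squaredDistance(a: number[], b: number[]): number {
  return a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0);
}

// Assignment step of Lloyd's algorithm: index of the closest centroid.
function nearestCentroid(p: number[], centroids: number[][]): number {
  let best = 0;
  for (let k = 1; k < centroids.length; k++) {
    if (squaredDistance(p, centroids[k]) < squaredDistance(p, centroids[best])) {
      best = k;
    }
  }
  return best;
}

// Within-cluster sum of squares.
function clusterInertia(points: number[][], centroids: number[][]): number {
  return points.reduce(
    (s, p) => s + squaredDistance(p, centroids[nearestCentroid(p, centroids)]),
    0
  );
}
```

For the elbow method, this value would be computed for several values of k and plotted; the "elbow" is where adding clusters stops reducing it sharply.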
|
|
314
|
+
|
|
315
|
+
### PCA — dimensionality reduction
|
|
132
316
|
|
|
133
317
|
```ts
|
|
134
|
-
import {
|
|
318
|
+
import { PCA } from "@dniskav/neuron";
|
|
135
319
|
|
|
136
|
-
const
|
|
137
|
-
|
|
138
|
-
|
|
320
|
+
const pca = new PCA(2); // keep top 2 components
|
|
321
|
+
pca.fit(X); // 100 samples × 10 features
|
|
322
|
+
|
|
323
|
+
const Z = pca.transform(X); // 100 × 2
|
|
324
|
+
const X2 = pca.inverseTransform(Z); // reconstructed 100 × 10
|
|
325
|
+
|
|
326
|
+
console.log(pca.explainedVarianceRatio()); // [0.72, 0.15, ...]
|
|
139
327
|
```
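
Power iteration with deflation, the technique named in the export table, can be sketched on a small symmetric matrix standing in for a covariance matrix. The function names, the fixed iteration count, and the deterministic start vector are assumptions; the package may differ in all three.

```ts
function matVec(A: number[][], v: number[]): number[] {
  return A.map((row) => row.reduce((s, a, j) => s + a * v[j], 0));
}

function vectorNorm(v: number[]): number {
  return Math.sqrt(v.reduce((s, x) => s + x * x, 0));
}

// Repeatedly apply A and renormalize; converges to the dominant eigenvector.
function powerIteration(
  A: number[][],
  iters = 200
): { vector: number[]; value: number } {
  let v = A.map(() => 1 / Math.sqrt(A.length)); // assumption: uniform start
  for (let i = 0; i < iters; i++) {
    const w = matVec(A, v);
    const n = vectorNorm(w);
    v = w.map((x) => x / n);
  }
  const Av = matVec(A, v);
  const value = v.reduce((s, x, j) => s + x * Av[j], 0); // Rayleigh quotient
  return { vector: v, value };
}

// Hotelling deflation: subtract the found component so the next one dominates.
function deflate(A: number[][], vector: number[], value: number): number[][] {
  return A.map((row, i) => row.map((a, j) => a - value * vector[i] * vector[j]));
}
```

Calling `powerIteration`, then `deflate`, then `powerIteration` again yields the components in order of decreasing variance.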
|
|
140
328
|
|
|
141
|
-
|
|
329
|
+
### Self-Organizing Map
|
|
330
|
+
|
|
331
|
+
```ts
|
|
332
|
+
import { SOM } from "@dniskav/neuron";
|
|
333
|
+
|
|
334
|
+
const som = new SOM(10, 10, 3); // 10×10 grid, 3-dimensional inputs (RGB)
|
|
335
|
+
som.train(colors, 500);
|
|
142
336
|
|
|
143
|
-
|
|
337
|
+
const [row, col] = som.getBMU([255, 0, 0]); // find best matching unit for red
|
|
338
|
+
console.log(som.quantizationError(colors));
|
|
339
|
+
```
|
|
144
340
|
|
|
145
|
-
|
|
341
|
+
### Hopfield Network — associative memory
|
|
146
342
|
|
|
147
343
|
```ts
|
|
148
|
-
import {
|
|
344
|
+
import { HopfieldNetwork } from "@dniskav/neuron";
|
|
149
345
|
|
|
150
|
-
const net = new
|
|
151
|
-
activations: [relu, sigmoid],
|
|
152
|
-
optimizer: () => new Adam(), // default: beta1=0.9, beta2=0.999
|
|
153
|
-
});
|
|
346
|
+
const net = new HopfieldNetwork(64); // 64 binary neurons
|
|
154
347
|
|
|
155
|
-
//
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
|
|
348
|
+
// Store two 64-bit patterns
|
|
349
|
+
net.store(HopfieldNetwork.binarize(pattern1)); // converts 0/1 → -1/+1
|
|
350
|
+
net.store(HopfieldNetwork.binarize(pattern2));
|
|
351
|
+
|
|
352
|
+
// Recall from noisy input
|
|
353
|
+
const recovered = net.recall(HopfieldNetwork.binarize(noisyPattern1));
|
|
354
|
+
console.log(net.energy(recovered)); // local minimum = stored memory
|
|
160
355
|
```
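
Hebbian storage, the energy function, and asynchronous recall can be sketched in a few standalone functions. The names, the unnormalized weight sum, and the fixed number of sweeps are assumptions for illustration, not the package's implementation.

```ts
// Hebbian rule: W[i][j] accumulates p[i] * p[j] for every stored pattern (no self-connections).
function hebbianWeights(patterns: number[][]): number[][] {
  const n = patterns[0].length;
  const W = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  for (const p of patterns) {
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) {
        if (i !== j) W[i][j] += p[i] * p[j];
      }
    }
  }
  return W;
}

// E(s) = -1/2 * sum over i, j of W[i][j] * s[i] * s[j]
function hopfieldEnergy(W: number[][], s: number[]): number {
  let e = 0;
  for (let i = 0; i < s.length; i++) {
    for (let j = 0; j < s.length; j++) e -= 0.5 * W[i][j] * s[i] * s[j];
  }
  return e;
}

// Asynchronous recall: update one neuron at a time from the current state.
function hopfieldRecall(W: number[][], input: number[], sweeps = 5): number[] {
  const s = input.slice();
  for (let t = 0; t < sweeps; t++) {
    for (let i = 0; i < s.length; i++) {
      const field = W[i].reduce((acc, wij, j) => acc + wij * s[j], 0);
      s[i] = field >= 0 ? 1 : -1;
    }
  }
  return s;
}
```

Each asynchronous update can only keep the energy equal or lower it, which is why recall settles into a stored pattern when the input is close enough to one.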
|
|
161
356
|
|
|
162
|
-
|
|
357
|
+
### Autoencoder — learn compressed representations
|
|
163
358
|
|
|
164
359
|
```ts
|
|
165
|
-
import {
|
|
360
|
+
import { Autoencoder } from "@dniskav/neuron";
|
|
166
361
|
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
|
|
362
|
+
// 784 → [128, 64] → 16 (latent) → [64, 128] → 784
|
|
363
|
+
const ae = new Autoencoder(784, [128, 64], 16, [64, 128]);
|
|
364
|
+
|
|
365
|
+
for (let e = 0; e < 1000; e++)
|
|
366
|
+
for (const x of images)
|
|
367
|
+
ae.train(x, 0.001);
|
|
368
|
+
|
|
369
|
+
const latent = ae.encode(image); // compressed: 16 values
|
|
370
|
+
const reconstructed = ae.reconstruct(image); // decoded back: 784 values
|
|
171
371
|
```
|
|
172
372
|
|
|
173
|
-
###
|
|
373
|
+
### GAN — generative adversarial training
|
|
174
374
|
|
|
175
375
|
```ts
|
|
176
|
-
import {
|
|
376
|
+
import { GAN } from "@dniskav/neuron";
|
|
377
|
+
|
|
378
|
+
const gan = new GAN(
|
|
379
|
+
16, // latentDim
|
|
380
|
+
[32, 64], // generator hidden layers
|
|
381
|
+
8, // outputDim (size of generated samples)
|
|
382
|
+
[64, 32], // discriminator hidden layers
|
|
383
|
+
);
|
|
384
|
+
|
|
385
|
+
for (let step = 0; step < 10000; step++) {
|
|
386
|
+
const { dLoss, gLoss } = gan.trainStep(realBatch, 0.0002);
|
|
387
|
+
if (step % 500 === 0) console.log(`D: ${dLoss.toFixed(3)} G: ${gLoss.toFixed(3)}`);
|
|
388
|
+
}
|
|
177
389
|
|
|
178
|
-
const
|
|
179
|
-
console.log(mse(predicted, [1, 0]));
|
|
180
|
-
console.log(crossEntropy(predicted, [1, 0]));
|
|
390
|
+
const fake = gan.generate(); // new synthetic sample
|
|
181
391
|
```
|
|
182
392
|
|
|
183
|
-
###
|
|
393
|
+
### VAE — variational autoencoder
|
|
394
|
+
|
|
395
|
+
```ts
|
|
396
|
+
import { VAE } from "@dniskav/neuron";
|
|
397
|
+
|
|
398
|
+
const vae = new VAE(784, [256, 128], 32, [128, 256]);
|
|
184
399
|
|
|
185
|
-
|
|
400
|
+
for (const x of dataset) {
|
|
401
|
+
const { totalLoss, reconLoss, klLoss } = vae.train(x, 0.001);
|
|
402
|
+
}
|
|
403
|
+
|
|
404
|
+
// Sample from latent space
|
|
405
|
+
const generated = vae.generate(); // random sample
|
|
406
|
+
const { mu, logVar } = vae.encode(image); // encode → distribution params
|
|
407
|
+
const z = vae.reparametrize(mu, logVar); // sample z ~ N(μ, σ²)
|
|
408
|
+
```
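
The two VAE-specific pieces, the reparametrization trick and the KL term of the ELBO, are small enough to sketch on their own. The helpers below are hypothetical and standalone; the Box-Muller noise source and the injectable `noise` parameter are assumptions added so the sampling can be made deterministic.

```ts
// Standard normal sample via the Box-Muller transform.
function gaussianNoise(): number {
  const u = 1 - Math.random(); // in (0, 1], keeps log(u) finite
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// z = mu + sigma * eps, with sigma = exp(logVar / 2) and eps ~ N(0, 1).
// Sampling eps outside the network keeps z differentiable in mu and logVar.
function reparametrizeSketch(
  mu: number[],
  logVar: number[],
  noise: () => number = gaussianNoise
): number[] {
  return mu.map((m, i) => m + Math.exp(0.5 * logVar[i]) * noise());
}

// KL( N(mu, sigma^2) || N(0, 1) ) = -1/2 * sum(1 + logVar - mu^2 - exp(logVar))
function klDivergence(mu: number[], logVar: number[]): number {
  return (
    -0.5 *
    mu.reduce((s, m, i) => s + 1 + logVar[i] - m * m - Math.exp(logVar[i]), 0)
  );
}
```

The KL term is zero exactly when the encoder outputs a standard normal, and grows as the encoded distribution drifts away from it.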
|
|
409
|
+
|
|
410
|
+
### Word2Vec — learn word embeddings
|
|
186
411
|
|
|
187
412
|
```ts
|
|
188
|
-
import {
|
|
413
|
+
import { Word2Vec } from "@dniskav/neuron";
|
|
414
|
+
|
|
415
|
+
const w2v = new Word2Vec(64, { model: 'skipgram', windowSize: 2 });
|
|
189
416
|
|
|
190
|
-
const
|
|
191
|
-
|
|
417
|
+
const corpus = [
|
|
418
|
+
["the", "king", "rules", "the", "kingdom"],
|
|
419
|
+
["the", "queen", "rules", "the", "land"],
|
|
420
|
+
["man", "and", "woman", "are", "human"],
|
|
421
|
+
];
|
|
192
422
|
|
|
193
|
-
|
|
194
|
-
|
|
195
|
-
|
|
423
|
+
w2v.buildVocab(corpus);
|
|
424
|
+
w2v.train(corpus, 0.05, 200);
|
|
425
|
+
|
|
426
|
+
console.log(w2v.similarity("king", "queen")); // high
|
|
427
|
+
console.log(w2v.mostSimilar("king", 3));
|
|
428
|
+
// [{ word: 'queen', score: 0.91 }, ...]
|
|
429
|
+
|
|
430
|
+
// Vector arithmetic: king - man + woman ≈ queen
|
|
431
|
+
console.log(w2v.analogy("king", "man", "woman", 1));
|
|
432
|
+
// [{ word: 'queen', score: 0.87 }]
|
|
196
433
|
```
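
Similarity and analogy queries on embeddings come down to cosine similarity and vector arithmetic. The sketch below uses hand-made two-dimensional toy vectors (not trained embeddings) and hypothetical helper names.

```ts
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// a - b + c, e.g. king - man + woman
function analogyVector(a: number[], b: number[], c: number[]): number[] {
  return a.map((ai, i) => ai - b[i] + c[i]);
}

// Toy embeddings with axes [royalty, maleness]; values are made up for illustration.
const toyEmbeddings: Record<string, number[]> = {
  king: [1, 1],
  queen: [1, 0],
  man: [0, 1],
  woman: [0, 0],
};
const queenLike = analogyVector(
  toyEmbeddings.king,
  toyEmbeddings.man,
  toyEmbeddings.woman
);
```

An analogy query would then rank every vocabulary word by its cosine similarity to `queenLike` and return the top matches.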
|
|
197
434
|
|
|
198
|
-
###
|
|
435
|
+
### t-SNE — visualize embeddings in 2D
|
|
436
|
+
|
|
437
|
+
```ts
|
|
438
|
+
import { TSNE } from "@dniskav/neuron";
|
|
439
|
+
|
|
440
|
+
// Reduce 128-dim embeddings → 2D for plotting
|
|
441
|
+
const tsne = new TSNE({ perplexity: 30, nIter: 1000, seed: 42 });
|
|
442
|
+
const points2D = tsne.fitTransform(embeddings128D); // [n][2]
|
|
443
|
+
|
|
444
|
+
console.log(tsne.kl()); // KL divergence — lower is better
|
|
445
|
+
// Plot points2D with any charting library
|
|
446
|
+
```
|
|
199
447
|
|
|
200
|
-
|
|
448
|
+
### PositionalEncoding — order without parameters
|
|
201
449
|
|
|
202
450
|
```ts
|
|
203
|
-
import {
|
|
451
|
+
import { PositionalEncoding, LearnedPositionalEncoding } from "@dniskav/neuron";
|
|
204
452
|
|
|
205
|
-
//
|
|
206
|
-
const
|
|
453
|
+
// Sinusoidal — deterministic, no training needed
|
|
454
|
+
const pe = PositionalEncoding.encodeSequence(512, 128); // [512][128]
|
|
455
|
+
const withPos = PositionalEncoding.apply(tokenEmbeddings); // add PE to embeddings
|
|
207
456
|
|
|
208
|
-
//
|
|
209
|
-
|
|
457
|
+
// Learned — trainable, fixed maxSeqLen
|
|
458
|
+
const lpe = new LearnedPositionalEncoding(512, 128);
|
|
459
|
+
const withLearnedPos = lpe.apply(tokenEmbeddings);
|
|
460
|
+
lpe.update(gradients, 0.001); // update during backprop
|
|
461
|
+
```
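
The sinusoidal scheme from Vaswani et al. uses `sin` on even dimensions and `cos` on odd ones, with wavelengths growing geometrically. The sketch below implements that formula directly; `sinusoidalEncoding` and `addPositions` are hypothetical names, not the library's static methods.

```ts
// PE(pos, 2i)   = sin(pos / 10000^(2i / dModel))
// PE(pos, 2i+1) = cos(pos / 10000^(2i / dModel))
function sinusoidalEncoding(seqLen: number, dModel: number): number[][] {
  return Array.from({ length: seqLen }, (_, pos) =>
    Array.from({ length: dModel }, (_unused, k) => {
      const i = Math.floor(k / 2);
      const angle = pos / Math.pow(10000, (2 * i) / dModel);
      return k % 2 === 0 ? Math.sin(angle) : Math.cos(angle);
    })
  );
}

// Element-wise sum of token embeddings and their position vectors.
function addPositions(embeddings: number[][]): number[][] {
  const pe = sinusoidalEncoding(embeddings.length, embeddings[0].length);
  return embeddings.map((row, pos) => row.map((x, k) => x + pe[pos][k]));
}
```

Because the table is a pure function of position and dimension, it can be computed for any sequence length, including lengths never seen in training.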
|
|
210
462
|
|
|
211
|
-
|
|
212
|
-
net.resetState(); // clear memory at episode start
|
|
463
|
+
### ContrastiveLearning — representations without labels
|
|
213
464
|
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
net.predict([1]); // same input every step
|
|
217
|
-
targets.push([step >= 3 ? 1 : 0]);
|
|
218
|
-
}
|
|
465
|
+
```ts
|
|
466
|
+
import { ContrastiveLearning, Augmenter } from "@dniskav/neuron";
|
|
219
467
|
|
|
220
|
-
|
|
221
|
-
}
|
|
468
|
+
// Encoder: 128 → [256, 128] → 64 latent, projection head: 64 → 32
|
|
469
|
+
const cl = new ContrastiveLearning(128, [256, 128], 64, { temperature: 0.5 });
|
|
470
|
+
|
|
471
|
+
// Create positive pairs from unlabeled data (two augmented views per sample)
|
|
472
|
+
const pairs = unlabeledData.map(x => Augmenter.makePair(x));
|
|
222
473
|
|
|
223
|
-
|
|
224
|
-
|
|
225
|
-
|
|
226
|
-
const [out] = net.predict([1]);
|
|
227
|
-
console.log(`step ${step}: ${out.toFixed(2)} (expected: ${step >= 3 ? 1 : 0})`);
|
|
474
|
+
for (let step = 0; step < 1000; step++) {
|
|
475
|
+
const loss = cl.trainStep(pairs, 0.001);
|
|
476
|
+
if (step % 100 === 0) console.log(`step ${step}: ${loss.toFixed(4)}`);
|
|
228
477
|
}
|
|
229
|
-
|
|
230
|
-
//
|
|
231
|
-
|
|
232
|
-
// step 3: 0.81 (expected: 1)
|
|
233
|
-
// step 4: 0.89 (expected: 1)
|
|
234
|
-
// step 5: 0.93 (expected: 1)
|
|
478
|
+
|
|
479
|
+
// Use encoder for downstream tasks (classification, clustering, etc.)
|
|
480
|
+
const representation = cl.encode(newSample); // 64-dim vector
|
|
235
481
|
```
|
|
236
482
|
|
|
237
|
-
|
|
483
|
+
### Value / Tape — automatic differentiation
|
|
238
484
|
|
|
239
|
-
|
|
485
|
+
```ts
|
|
486
|
+
import { Value } from "@dniskav/neuron";
|
|
240
487
|
|
|
241
|
-
|
|
488
|
+
// Build a computation graph
|
|
489
|
+
const x = new Value(2.0);
|
|
490
|
+
const w = new Value(-3.0);
|
|
491
|
+
const b = new Value(6.7);
|
|
492
|
+
const n = x.mul(w).add(b); // n = x*w + b
|
|
493
|
+
const o = n.tanh(); // o = tanh(n)
|
|
494
|
+
|
|
495
|
+
// Backward pass — fills .grad for every node
|
|
496
|
+
o.backward();
|
|
242
497
|
|
|
498
|
+
console.log(x.grad); // ∂o/∂x
|
|
499
|
+
console.log(w.grad); // ∂o/∂w
|
|
500
|
+
console.log(b.grad); // ∂o/∂b
|
|
243
501
|
```
|
|
244
|
-
|
|
245
|
-
|
|
502
|
+
|
|
503
|
+
### Conv2D + MaxPool2D + Flatten — CNN pipeline
|
|
504
|
+
|
|
505
|
+
```ts
|
|
506
|
+
import { Conv2D, MaxPool2D, Flatten, NetworkN } from "@dniskav/neuron";
|
|
507
|
+
|
|
508
|
+
const conv = new Conv2D(28, 28, 1, 3, 8); // 28×28×1 → 26×26×8
|
|
509
|
+
const pool = new MaxPool2D(2); // 26×26×8 → 13×13×8
|
|
510
|
+
const flatten = new Flatten();
|
|
511
|
+
const dense = new NetworkN([13*13*8, 64, 10]);
|
|
512
|
+
|
|
513
|
+
// Forward
|
|
514
|
+
const featureMaps = conv.forward(image); // [H][W][C]
|
|
515
|
+
const pooled = pool.forward(featureMaps);
|
|
516
|
+
const flat = flatten.forward(pooled); // 1352 values
|
|
517
|
+
const logits = dense.predict(flat);
|
|
246
518
|
```
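
The shape comments in the pipeline above follow from the usual output-size formula for convolution and pooling. The helper below is a hypothetical sketch of that arithmetic; stride 1 and zero padding for the convolution, and stride 2 for the 2×2 pool, are assumptions consistent with those comments.

```ts
// out = floor((in + 2 * padding - kernel) / stride) + 1
function convOutputSize(
  input: number,
  kernel: number,
  stride = 1,
  padding = 0
): number {
  return Math.floor((input + 2 * padding - kernel) / stride) + 1;
}

const afterConv = convOutputSize(28, 3); // 3×3 kernel, stride 1, no padding
const afterPool = convOutputSize(afterConv, 2, 2); // 2×2 pool, stride 2
const flatLength = afterPool * afterPool * 8; // 8 filters
```

`flatLength` is the input size the first dense layer needs, which is why the example builds `NetworkN` with `13*13*8` inputs.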
|
|
247
519
|
|
|
248
|
-
|
|
520
|
+
### RNN — vanilla recurrent network
|
|
249
521
|
|
|
250
|
-
|
|
522
|
+
```ts
|
|
523
|
+
import { RNN } from "@dniskav/neuron";
|
|
251
524
|
|
|
252
|
-
|
|
525
|
+
// 1 input → 16 hidden → 1 output, over a sequence
|
|
526
|
+
const rnn = new RNN(1, 16, 1);
|
|
253
527
|
|
|
254
|
-
|
|
528
|
+
const sequence = [[0.1], [0.3], [0.7], [0.9]]; // 4 timesteps
|
|
529
|
+
const { outputs, hiddens } = rnn.forward(sequence);
|
|
255
530
|
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
|
|
531
|
+
// BPTT backward — returns MSE loss
|
|
532
|
+
const targets = [[0.2], [0.5], [0.8], [1.0]];
|
|
533
|
+
const loss = rnn.backward(sequence, targets, 0.01);
|
|
259
534
|
```
|
|
260
535
|
|
|
261
|
-
|
|
536
|
+
### TCN — Temporal Convolutional Network
|
|
262
537
|
|
|
263
|
-
|
|
538
|
+
```ts
|
|
539
|
+
import { TCN } from "@dniskav/neuron";
|
|
264
540
|
|
|
265
|
-
|
|
541
|
+
// 3 input channels → 32 channels × 4 levels → 1 output
|
|
542
|
+
// Receptive field = (3-1)·(2⁴-1)+1 = 31 timesteps
|
|
543
|
+
const tcn = new TCN(3, 32, 3, 4, 1);
|
|
544
|
+
|
|
545
|
+
const sequence = Array.from({ length: 50 }, () => [Math.random(), Math.random(), Math.random()]);
|
|
546
|
+
const outputs = tcn.forward(sequence); // [50][1]
|
|
547
|
+
```
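
The receptive-field formula and the causal, dilated convolution it describes can be checked with a short sketch. It assumes one convolution per level with dilation `2^level` and a single channel; the helper names are hypothetical and the package's `TCN` may stack layers differently.

```ts
// 1 + (k - 1) * (1 + 2 + 4 + ... + 2^(levels - 1)) = (k - 1) * (2^levels - 1) + 1
function tcnReceptiveField(kernelSize: number, levels: number): number {
  let field = 1;
  for (let level = 0; level < levels; level++) {
    field += (kernelSize - 1) * 2 ** level;
  }
  return field;
}

// Single-channel causal convolution: output[t] depends only on x[t], x[t - d], x[t - 2d], ...
function causalConv1D(x: number[], kernel: number[], dilation: number): number[] {
  return x.map((_, t) =>
    kernel.reduce((sum, w, k) => {
      const idx = t - k * dilation; // never looks at future samples
      return idx >= 0 ? sum + w * x[idx] : sum; // implicit left zero-padding
    }, 0)
  );
}
```

For kernel size 3 and 4 levels, `tcnReceptiveField(3, 4)` evaluates to 31, so each output can see the current timestep plus the 30 before it.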
|
|
548
|
+
|
|
549
|
+
### NetworkLSTM — recurrent memory
|
|
266
550
|
|
|
267
551
|
```ts
|
|
268
|
-
import {
|
|
269
|
-
|
|
270
|
-
|
|
271
|
-
|
|
272
|
-
|
|
273
|
-
|
|
274
|
-
|
|
275
|
-
|
|
276
|
-
|
|
277
|
-
|
|
278
|
-
});
|
|
552
|
+
import { NetworkLSTM } from "@dniskav/neuron";
|
|
553
|
+
|
|
554
|
+
const net = new NetworkLSTM(1, 8, [4, 1]);
|
|
555
|
+
|
|
556
|
+
for (let epoch = 0; epoch < 300; epoch++) {
|
|
557
|
+
net.resetState();
|
|
558
|
+
for (let step = 0; step < 6; step++) net.predict([1]);
|
|
559
|
+
net.train([[0],[0],[0],[1],[1],[1]], 0.05);
|
|
560
|
+
}
|
|
561
|
+
```
|
|
279
562
|
|
|
280
|
-
|
|
281
|
-
const puzzle = [5,3,0, 0,7,0, 0,0,0, ...];
|
|
282
|
-
const targets = [...]; // 81*9 one-hot values
|
|
283
|
-
const mask = puzzle.map(v => v === 0); // only train on empty cells
|
|
563
|
+
### Metrics — evaluate your model
|
|
284
564
|
|
|
285
|
-
|
|
286
|
-
|
|
287
|
-
const logits = net.predict(puzzle); // 729 logits (81 × 9)
|
|
565
|
+
```ts
|
|
566
|
+
import { accuracy, f1Score, confusionMatrix, printConfusionMatrix, auc, classificationReport } from "@dniskav/neuron";
|
|
288
567
|
|
|
289
|
-
|
|
290
|
-
const
|
|
291
|
-
|
|
568
|
+
const yTrue = [0, 1, 1, 0, 1];
|
|
569
|
+
const yPred = [0, 1, 0, 0, 1];
|
|
570
|
+
|
|
571
|
+
console.log(accuracy(yTrue, yPred)); // 0.8
|
|
572
|
+
console.log(f1Score(yTrue, yPred)); // 0.8
|
|
573
|
+
|
|
574
|
+
const cm = confusionMatrix(yTrue, yPred);
|
|
575
|
+
printConfusionMatrix(cm, ['neg', 'pos']);
|
|
576
|
+
|
|
577
|
+
// AUC-ROC
|
|
578
|
+
const scores = [0.1, 0.9, 0.4, 0.2, 0.8];
|
|
579
|
+
console.log(auc(yTrue, scores));   // 1.0 (every positive outscores every negative)
|
|
580
|
+
|
|
581
|
+
classificationReport(yTrue, yPred, ['neg', 'pos']);
|
|
292
582
|
```
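
These metrics are simple enough to verify by hand. The sketch below recomputes accuracy and F1 from counts, and estimates AUC by pairwise ranking (the fraction of positive/negative pairs ordered correctly) rather than the trapezoidal rule the library is described as using; the two agree when there are no tied scores. All function names are hypothetical.

```ts
function binaryAccuracy(yTrue: number[], yPred: number[]): number {
  const correct = yTrue.filter((t, i) => t === yPred[i]).length;
  return correct / yTrue.length;
}

// F1 = 2·TP / (2·TP + FP + FN)
function binaryF1(yTrue: number[], yPred: number[]): number {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yPred[i] === 1 && yTrue[i] === 1) tp++;
    else if (yPred[i] === 1 && yTrue[i] === 0) fp++;
    else if (yPred[i] === 0 && yTrue[i] === 1) fn++;
  }
  return (2 * tp) / (2 * tp + fp + fn);
}

// AUC as the probability that a random positive outranks a random negative.
function pairwiseAuc(yTrue: number[], scores: number[]): number {
  let wins = 0;
  let pairs = 0;
  for (let i = 0; i < yTrue.length; i++) {
    for (let j = 0; j < yTrue.length; j++) {
      if (yTrue[i] === 1 && yTrue[j] === 0) {
        pairs++;
        if (scores[i] > scores[j]) wins += 1;
        else if (scores[i] === scores[j]) wins += 0.5; // ties count half
      }
    }
  }
  return wins / pairs;
}
```

On the example data, one positive is missed (FN = 1) and nothing is falsely flagged, so accuracy and F1 both come out at 0.8; the scores separate the two classes completely, so the pairwise AUC is 1.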
|
|
293
583
|
|
|
294
|
-
|
|
295
|
-
|
|
584
|
+
### EarlyStopping
|
|
585
|
+
|
|
586
|
+
```ts
|
|
587
|
+
import { EarlyStopping } from "@dniskav/neuron";
|
|
588
|
+
|
|
589
|
+
const stopper = new EarlyStopping({ patience: 10, minDelta: 1e-4, mode: 'min' });
|
|
296
590
|
|
|
297
|
-
|
|
591
|
+
for (let epoch = 0; epoch < 1000; epoch++) {
|
|
592
|
+
const valLoss = trainEpoch();
|
|
593
|
+
if (stopper.update(valLoss, epoch)) {
|
|
594
|
+
console.log(`Stopped at epoch ${epoch}`);
|
|
595
|
+
break;
|
|
596
|
+
}
|
|
597
|
+
}
|
|
598
|
+
```
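
The patience logic behind early stopping can be sketched in a small class. This covers only the `'min'` mode and omits best-weight restore; `PatienceStopper` is a hypothetical name and the package's `EarlyStopping` may track more state.

```ts
class PatienceStopper {
  private best = Infinity;
  private wait = 0;
  private readonly patience: number;
  private readonly minDelta: number;

  constructor(patience: number, minDelta = 0) {
    this.patience = patience;
    this.minDelta = minDelta;
  }

  /** Returns true when the loss has not improved for `patience` consecutive updates. */
  update(loss: number): boolean {
    if (loss < this.best - this.minDelta) {
      this.best = loss; // real improvement: remember it and reset the counter
      this.wait = 0;
      return false;
    }
    this.wait += 1;
    return this.wait >= this.patience;
  }
}
```

`minDelta` guards against counting noise-level fluctuations as progress: a loss must beat the best value by at least that margin to reset the counter.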
|
|
298
599
|
|
|
299
|
-
|
|
600
|
+
### LossPlotter — ASCII loss curve
|
|
300
601
|
|
|
301
602
|
```ts
|
|
302
|
-
import {
|
|
303
|
-
|
|
304
|
-
// Agent sees the last 8 steps, each step is a 7-value sensor vector → 4 actions
|
|
305
|
-
const net = new NetworkTransformerRL(8, 7, {
|
|
306
|
-
d_model: 32,
|
|
307
|
-
nHeads: 2,
|
|
308
|
-
d_ff: 64,
|
|
309
|
-
nBlocks: 2,
|
|
310
|
-
nActions: 4,
|
|
311
|
-
});
|
|
603
|
+
import { LossPlotter } from "@dniskav/neuron";
|
|
312
604
|
|
|
313
|
-
|
|
314
|
-
const sequence = getLastNStates(); // number[][] — shape: [8, 7]
|
|
315
|
-
const qValues = net.predict(sequence); // number[4]
|
|
605
|
+
const plotter = new LossPlotter({ width: 60, height: 12, title: 'Training Loss' });
|
|
316
606
|
|
|
317
|
-
|
|
318
|
-
const
|
|
319
|
-
|
|
320
|
-
|
|
321
|
-
|
|
607
|
+
for (let e = 0; e < 500; e++) {
|
|
608
|
+
const loss = trainStep();
|
|
609
|
+
plotter.add(loss, e);
|
|
610
|
+
}
|
|
611
|
+
|
|
612
|
+
plotter.print();
|
|
613
|
+
// Training Loss
|
|
614
|
+
// ┌────────────────────────────────────────────────────────────┐
|
|
615
|
+
// │ 2.31 ·
|
|
616
|
+
// │ · ·
|
|
617
|
+
// │ · · ·
|
|
618
|
+
// │ · · · · · · ·
|
|
619
|
+
// │ 0.02 · · · · · · · · · · · · · · ·
|
|
620
|
+
// └────────────────────────────────────────────────────────────┘
|
|
621
|
+
// 0 250 499
|
|
622
|
+
```
|
|
623
|
+
|
|
624
|
+
### DataAugmentation
|
|
625
|
+
|
|
626
|
+
```ts
|
|
627
|
+
import { DataAugmentation } from "@dniskav/neuron";
|
|
322
628
|
|
|
323
|
-
|
|
629
|
+
// Split dataset
|
|
630
|
+
const { trainX, trainY, valX, valY } = DataAugmentation.split(X, y, 0.8, 0.1);
|
|
631
|
+
|
|
632
|
+
// Normalize (fit on train, apply to all)
|
|
633
|
+
const { normalized: normTrain, min, max } = DataAugmentation.normalize(trainX);
|
|
634
|
+
const normVal = valX.map(x => DataAugmentation.normalizePoint(x, min, max));
|
|
635
|
+
|
|
636
|
+
// Augment training set (×3 copies with Gaussian noise)
|
|
637
|
+
const { X: augX, y: augY } = DataAugmentation.augmentBatch(normTrain, trainY, 3, 0.02);
|
|
324
638
|
```
|
|
325
639
|
|
|
326
|
-
|
|
640
|
+
### WeightInspector — diagnose your network
|
|
327
641
|
|
|
328
642
|
```ts
|
|
329
|
-
|
|
330
|
-
|
|
331
|
-
|
|
643
|
+
import { NetworkN, WeightInspector, relu } from "@dniskav/neuron";
|
|
644
|
+
|
|
645
|
+
const net = new NetworkN([784, 256, 128, 10], { activations: [relu, relu, relu] });
|
|
646
|
+
// ... train ...
|
|
647
|
+
|
|
648
|
+
WeightInspector.print(net);
|
|
649
|
+
// Layer 0: mean=0.001 std=0.056 min=-0.21 max=0.19 dead=0 params=200960
|
|
650
|
+
// Layer 1: mean=0.000 std=0.079 min=-0.31 max=0.28 dead=3 params=32896
|
|
651
|
+
// Layer 2: mean=-0.001 std=0.091 min=-0.28 max=0.32 dead=0 params=1290
|
|
332
652
|
```
|
|
333
653
|
|
|
654
|
+
## How it works
|
|
655
|
+
|
|
656
|
+
Each neural-network class applies an **activation function** to the weighted sum of its inputs and uses **gradient descent** to update weights:
|
|
657
|
+
|
|
658
|
+
```
|
|
659
|
+
weight += lr × delta × input
|
|
660
|
+
bias += lr × delta
|
|
661
|
+
```
|
|
662
|
+
|
|
663
|
+
`NetworkN` implements full **backpropagation** across all layers, propagating deltas from the output back to the first layer using the chain rule. `NeuronN` uses **Xavier initialization** — weights start in `[-√(1/n), +√(1/n)]`.
|
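To make the update rule and the initialization range concrete, here is a minimal single-neuron sketch. It is not the library's `NeuronN`: `TinyNeuron` and `initWeight` are illustrative names, `n` is assumed to be the number of inputs, and `delta` is assumed to be `(target - output) × activation'(z)` for a sigmoid with squared error.

```ts
// Illustrative single neuron (hypothetical names, not the library API).

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// Uniform draw in [-sqrt(1/n), +sqrt(1/n)], assuming n = number of inputs.
function initWeight(n: number): number {
  const limit = Math.sqrt(1 / n);
  return (Math.random() * 2 - 1) * limit;
}

class TinyNeuron {
  weights: number[];
  bias = 0;

  constructor(nInputs: number) {
    this.weights = Array.from({ length: nInputs }, () => initWeight(nInputs));
  }

  // Activation applied to the weighted sum of inputs.
  predict(inputs: number[]): number {
    let z = this.bias;
    inputs.forEach((x, i) => {
      z += x * this.weights[i];
    });
    return sigmoid(z);
  }

  // One gradient-descent step on squared error for a single sample.
  train(inputs: number[], target: number, lr: number): void {
    const out = this.predict(inputs);
    // delta = error × sigmoid'(z), where sigmoid'(z) = out × (1 − out)
    const delta = (target - out) * out * (1 - out);
    for (let i = 0; i < this.weights.length; i++) {
      this.weights[i] += lr * delta * inputs[i]; // weight += lr × delta × input
    }
    this.bias += lr * delta; // bias += lr × delta
  }
}

// Learn logical OR, which is linearly separable.
const orData: Array<[number[], number]> = [
  [[0, 0], 0],
  [[0, 1], 1],
  [[1, 0], 1],
  [[1, 1], 1],
];
const neuron = new TinyNeuron(2);
for (let epoch = 0; epoch < 5000; epoch++) {
  for (const [inputs, target] of orData) neuron.train(inputs, target, 0.5);
}
```

In a multi-layer network the same two update lines apply per neuron; backpropagation only changes how each layer's `delta` is computed from the deltas of the layer after it.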
|
664
|
+
|
|
665
|
+
When an **optimizer** is used (e.g., Adam), the raw gradient is passed to the optimizer instead of being applied directly. Each weight maintains its own optimizer state.
|
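As a rough illustration of "each weight maintains its own optimizer state", the sketch below implements the standard Adam update for one scalar weight. `AdamState` and its `step` method are hypothetical names chosen for this example; the library's optimizer interface and default hyperparameters may differ.

```ts
// Per-weight Adam state (illustrative sketch, not the library's optimizer class).
class AdamState {
  lr: number;
  beta1 = 0.9; // decay rate of the first-moment estimate
  beta2 = 0.999; // decay rate of the second-moment estimate
  eps = 1e-8; // avoids division by zero
  m = 0; // running mean of gradients
  v = 0; // running mean of squared gradients
  t = 0; // step counter, used for bias correction

  constructor(lr = 0.001) {
    this.lr = lr;
  }

  // Receives the raw gradient dL/dw and returns the amount to subtract from w.
  step(grad: number): number {
    this.t += 1;
    this.m = this.beta1 * this.m + (1 - this.beta1) * grad;
    this.v = this.beta2 * this.v + (1 - this.beta2) * grad * grad;
    const mHat = this.m / (1 - Math.pow(this.beta1, this.t));
    const vHat = this.v / (1 - Math.pow(this.beta2, this.t));
    return (this.lr * mHat) / (Math.sqrt(vHat) + this.eps);
  }
}

// Minimise f(weight) = (weight - 3)^2 with one state object for this one weight.
let weight = 0;
const weightState = new AdamState(0.1);
for (let i = 0; i < 500; i++) {
  const grad = 2 * (weight - 3); // raw gradient, handed to the optimizer unchanged
  weight -= weightState.step(grad);
}
```

A network with many weights would hold one such state object (or one slot in a state array) per weight and per bias, so that each parameter gets its own adaptive step size.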
|
666
|
+
|
|
667
|
+
The `Value` class implements **reverse-mode automatic differentiation**: every operation records its inputs and a backward function. Calling `.backward()` on the output node performs a topological sort and propagates `∂L/∂w` through the entire graph.
|
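The mechanism described above can be sketched in a few dozen lines. `Scalar` below is an illustrative stand-in written for this example, in the spirit of micrograd; the library's `Value` class may expose different method names and more operations.

```ts
// Minimal scalar reverse-mode autodiff (illustrative sketch, not the library's Value class).
class Scalar {
  data: number;
  grad = 0;
  parents: Scalar[];
  backwardFn: () => void = () => {};

  constructor(data: number, parents: Scalar[] = []) {
    this.data = data;
    this.parents = parents;
  }

  // Each operation records its inputs and a function that pushes gradients back to them.
  add(other: Scalar): Scalar {
    const out = new Scalar(this.data + other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += out.grad;
      other.grad += out.grad;
    };
    return out;
  }

  mul(other: Scalar): Scalar {
    const out = new Scalar(this.data * other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }

  tanh(): Scalar {
    const t = Math.tanh(this.data);
    const out = new Scalar(t, [this]);
    out.backwardFn = () => {
      this.grad += (1 - t * t) * out.grad;
    };
    return out;
  }

  backward(): void {
    // Topological sort: every node is pushed after all of its parents.
    const order: Scalar[] = [];
    const seen = new Set<Scalar>();
    const visit = (node: Scalar): void => {
      if (seen.has(node)) return;
      seen.add(node);
      for (const parent of node.parents) visit(parent);
      order.push(node);
    };
    visit(this);

    this.grad = 1; // dL/dL
    // Walk from the output back to the leaves, applying the chain rule at each node.
    for (let i = order.length - 1; i >= 0; i--) order[i].backwardFn();
  }
}

// L = tanh(wv * xv + bv)
const xv = new Scalar(2);
const wv = new Scalar(-0.5);
const bv = new Scalar(1);
const lossNode = wv.mul(xv).add(bv).tanh();
lossNode.backward();
// wv.grad, xv.grad and bv.grad now hold dL/dw, dL/dx and dL/db.
```

Gradients are accumulated with `+=` so that a node used in several places receives the sum of the contributions from every path to the output.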
|
668
|
+
|
|
669
|
+
## Build
|
|
670
|
+
|
|
671
|
+
```bash
|
|
672
|
+
npm run build # outputs CJS + ESM + type declarations to dist/
|
|
673
|
+
npm run dev # watch mode
|
|
674
|
+
npm test # run test suite
|
|
675
|
+
```
|
|
676
|
+
|
|
677
|
+
## For AI agents
|
|
678
|
+
|
|
679
|
+
If you are an AI agent or LLM working with this codebase, read [AGENTS.md](AGENTS.md) first. It contains the full class hierarchy, design constraints, and what this library does not do.
|
|
680
|
+
|
|
334
681
|
## Changelog
|
|
335
682
|
|
|
683
|
+
### v0.3.1
|
|
684
|
+
- **New — Embeddings:** `Word2Vec` (Skip-gram + CBOW, full-softmax, cosine similarity, analogies), `TSNE` (binary-search perplexity, Student-t kernel, KL gradient, early exaggeration, seeded PRNG), `PositionalEncoding` (sinusoidal, Vaswani et al.), `LearnedPositionalEncoding` (trainable), `ContrastiveLearning` (NT-Xent, SimCLR encoder + projection head), `Augmenter` (noise, feature dropout, `makePair`)
|
|
685
|
+
|
|
686
|
+
### v0.3.0
|
|
687
|
+
- **New — Classical ML:** `Perceptron`, `LinearRegression` (normal equation + GD), `LogisticRegression`, `SoftmaxRegression`, `GaussianNaiveBayes`, `DecisionTree` (CART, Gini/MSE)
|
|
688
|
+
- **New — Unsupervised:** `KMeans` (K-Means++ init), `PCA` (power iteration + Hotelling deflation), `SOM` (Kohonen map), `HopfieldNetwork` (Hebbian storage + energy), `Autoencoder`
|
|
689
|
+
- **New — Deep Learning:** `Conv2D` (full forward/backward), `MaxPool2D` (position mask for exact backprop), `Flatten`, `RNN` (BPTT, documents vanishing gradient), `Seq2Seq` (encoder-decoder LSTM), `CausalConv1D`, `TCN` (dilated temporal convolutions)
|
|
690
|
+
- **New — Generative:** `GAN` (min-max game, Box-Muller sampling), `VAE` (reparametrization trick, loss = MSE + KL, i.e. the negative ELBO)
|
|
691
|
+
- **New — Autograd:** `Value` / `Tape` — scalar reverse-mode AD with topological backprop (micrograd-style)
|
|
692
|
+
- **New — Metrics:** `confusionMatrix`, `accuracy`, `precision`, `recall`, `f1Score`, `rocCurve`, `auc`, `mae`, `rmse`, `r2Score`, `perplexity`, `printConfusionMatrix`, `classificationReport`
|
|
693
|
+
- **New — Utilities:** `EarlyStopping` (patience + best-weight restore), `LossPlotter` (ASCII terminal curve), `WeightInspector` (per-layer stats, dead ReLU detection), `DataAugmentation` (noise, normalize, z-score, shuffle, split)
|
|
694
|
+
|
|
336
695
|
### v0.2.7
|
|
337
|
-
- **Docs:** Added architecture diagram to README
|
|
696
|
+
- **Docs:** Added architecture diagram to README
|
|
338
697
|
|
|
339
698
|
### v0.2.6
|
|
340
699
|
- **Fix:** `Network.predict` now returns `number[]` (consistent with all other network classes)
|
|
341
|
-
- **Fix:** `Network.train` now uses the configured optimizer and `activation.dfn()`
|
|
342
|
-
- **Fix:** `LayerNorm.backwardOne`
|
|
343
|
-
- **Fix:** LSTM and GRU gate initialization corrected
|
|
344
|
-
- **New:** `BiasVector` — 1D counterpart to `WeightMatrix`
|
|
345
|
-
- **New:** `defaultOptimizer`
|
|
346
|
-
- **Refactor:** `NetworkN
|
|
347
|
-
- **Refactor:** `Transformer` backward methods now throw descriptive errors instead of crashing with a cryptic `TypeError` when called before `predict()`
|
|
348
|
-
- **Refactor:** `NetworkTransformer.setWeights()` and `NetworkTransformerRL.setWeightsFlat()` use each component's own `setWeights()` instead of direct `.W` mutation
|
|
700
|
+
- **Fix:** `Network.train` now uses the configured optimizer and `activation.dfn()`
|
|
701
|
+
- **Fix:** `LayerNorm.backwardOne` correctly uses pre-update γ
|
|
702
|
+
- **Fix:** LSTM and GRU gate initialization corrected to Xavier fan-in+out
|
|
703
|
+
- **New:** `BiasVector` — 1D counterpart to `WeightMatrix`
|
|
704
|
+
- **New:** `defaultOptimizer` — shared default factory
|
|
705
|
+
- **Refactor:** `NetworkN` extracts `_forwardAll()` and `_backpropLayers()`
|
|
349
706
|
|
|
350
707
|
### v0.2.5
|
|
351
|
-
- Unified optimizer factories for `LSTMLayer`, `GRULayer`, `Conv1D`
|
|
352
|
-
- `NetworkN`: residual connections
|
|
353
|
-
- `Conv1D`: multi-channel input
|
|
354
|
-
- `
|
|
355
|
-
- `
|
|
356
|
-
- `
|
|
357
|
-
- `ModelSaver`: universal serialization via flat `getWeights()`/`setWeights()` for all classes
|
|
358
|
-
- Gradient check test suite (`tests/GradientCheck.test.ts`)
|
|
359
|
-
|
|
360
|
-
## Possible improvements
|
|
361
|
-
|
|
362
|
-
1. **Support for batches** in training to improve efficiency and gradient stability.
|
|
363
|
-
2. **Global gradient norm clipping** — `WeightMatrix.update` supports per-element clipping; a utility to clip across all matrices by total norm would be more principled.
|
|
364
|
-
3. **Learning rate warmup** — standard practice for Transformers; ramp LR from 0 to target over the first N steps.
|
|
365
|
-
4. **Pre-norm architecture** — LayerNorm before the residual add (instead of after) is more stable for deep stacks.
|
|
708
|
+
- Unified optimizer factories for `LSTMLayer`, `GRULayer`, `Conv1D`
|
|
709
|
+
- `NetworkN`: residual connections and dropout
|
|
710
|
+
- `Conv1D`: multi-channel input
|
|
711
|
+
- `Trainer`: weight decay, early stopping, classification metrics
|
|
712
|
+
- `DataLoader`: validation split
|
|
713
|
+
- `ModelSaver`: universal serialization
|
|
366
714
|
|
|
367
715
|
## License
|
|
368
716
|
|