@dniskav/neuron 0.2.6 → 0.3.0

package/README.md CHANGED
@@ -3,8 +3,60 @@
3
3
 
4
4
  A minimal, dependency-free neural network library built from scratch in TypeScript. Designed for learning and experimentation — every line of math is readable.
5
5
 
6
+ Each class is a building block for the next: from a single neuron to a full Transformer with causal attention. v0.3.0 adds classical ML, unsupervised learning, generative models, autograd, and training utilities — all in pure TypeScript, zero dependencies.
7
+
8
+ ```mermaid
9
+ graph TD
10
+ A["Neuron\n1 input · 1 weight · 1 bias"]
11
+ B["NeuronN\nN inputs · Xavier init · configurable activation"]
12
+ C["Layer\ngroup of NeuronN sharing the same inputs"]
13
+ D["Network\nhidden + output · backprop"]
14
+ E["NetworkN\narbitrary depth · define as [inputs, ...hidden, outputs]"]
15
+ F["LSTMLayer\nrecurrent · hidden + cell state · BPTT"]
16
+ G["NetworkLSTM\nLSTM + dense layers · sequence memory"]
17
+ H["AttentionHead\nQ · K · V · scaled dot-product"]
18
+ I["MultiHeadAttention\nN heads in parallel"]
19
+ J["TransformerBlock\nattention + FFN + LayerNorm × 2 + residuals"]
20
+ K["NetworkTransformer\nembeddings → blocks → per-token logits"]
21
+ L["NetworkTransformerRL\ncontinuous projection → causal attention → Q-values"]
22
+
23
+ subgraph Classical ML
24
+ P["Perceptron\nstep function · Rosenblatt rule"]
25
+ LR["LinearRegression\nnormal equation · gradient descent"]
26
+ LOG["LogisticRegression\nsigmoid · BCE · SoftmaxRegression"]
27
+ NB["GaussianNaiveBayes\nlog-probabilities · Gaussian P(x|c)"]
28
+ DT["DecisionTree\nCART · Gini · MSE split"]
29
+ end
30
+
31
+ subgraph Unsupervised
32
+ KM["KMeans\nK-Means++ · inertia · elbow"]
33
+ PCA["PCA\npower iteration · projection · reconstruction"]
34
+ SOM["SOM\nKohonen · BMU · Gaussian neighborhood"]
35
+ HN["HopfieldNetwork\nHebbian · energy · associative memory"]
36
+ AE["Autoencoder\nencoder · bottleneck · decoder"]
37
+ end
38
+
39
+ subgraph Generative
40
+ GAN["GAN\ngenerator · discriminator · min-max"]
41
+ VAE["VAE\nreparametrization trick · ELBO · KL"]
42
+ end
43
+
44
+ subgraph Autograd
45
+ TAP["Value / Tape\nreverse-mode · computational graph · backward"]
46
+ end
47
+
48
+ A --> B --> C --> D --> E
49
+ E --> F --> G
50
+ E --> H --> I --> J --> K --> L
51
+ E --> AE
52
+ E --> GAN
53
+ E --> VAE
54
+ ```
55
+
6
56
  ## What's inside
7
57
 
58
+ ### Neural network building blocks
59
+
8
60
  | Export | Description |
9
61
  |--------|-------------|
10
62
  | `Neuron` | Single-input neuron. The simplest possible unit: one weight, one bias. |
@@ -14,20 +66,115 @@ A minimal, dependency-free neural network library built from scratch in TypeScri
14
66
  | `NetworkN` | Deep network of arbitrary depth. Define your architecture as `[inputs, ...hidden, outputs]`. |
15
67
  | `LSTMLayer` | Recurrent layer with persistent hidden and cell state. Learns sequences via BPTT. |
16
68
  | `NetworkLSTM` | Wraps an `LSTMLayer` + dense layers. Maintains memory across steps within an episode. |
69
+ | `GRULayer` | Gated Recurrent Unit — lighter alternative to LSTM, two gates instead of three. |
17
70
  | `NetworkTransformer` | Full token-classification Transformer: embeddings → N blocks → per-token logits. |
18
- | `NetworkTransformerRL` | Transformer for RL agents: continuous input projection → causal attention → Q-values. Remembers the last N steps. |
71
+ | `NetworkTransformerRL` | Transformer for RL agents: continuous input projection → causal attention → Q-values. |
19
72
  | `TransformerBlock` | One Transformer block: multi-head attention + FFN + LayerNorm × 2 with residuals. |
20
73
  | `MultiHeadAttention` | N parallel attention heads concatenated and projected to `d_model`. |
21
74
  | `AttentionHead` | Single scaled dot-product self-attention head (Q / K / V projections + backprop). |
75
+
76
+ ### Layers & components
77
+
78
+ | Export | Description |
79
+ |--------|-------------|
80
+ | `Conv1D` | 1D convolution over sequences. Multi-channel, configurable stride and padding. |
81
+ | `Conv2D` | 2D convolution for images. Kernels `[filters][kH][kW][C]`, full forward + backward. |
82
+ | `MaxPool2D` | Max pooling 2D. Stores position mask for exact gradient routing in backprop. |
83
+ | `Flatten` | Converts `[H][W][C]` tensors to flat vectors. Bridges Conv layers to dense layers. |
84
+ | `RNN` | Vanilla RNN with BPTT. Explicitly shows where and why gradients vanish. |
85
+ | `Seq2Seq` | Encoder + Decoder LSTMs with context vector transfer. Teacher forcing in training. |
86
+ | `CausalConv1D` | Causal dilated 1D convolution. One building block of a TCN. |
87
+ | `TCN` | Temporal Convolutional Network. Stacks causal dilated convolutions for sequences without recurrence. |
22
88
  | `LayerNorm` | Layer normalization with learnable γ / β per feature. |
23
- | `WeightMatrix` | 2D weight matrix with per-scalar Adam optimizers. Optional per-element gradient clipping via `update(dW, lr, clipValue)`. |
24
- | `BiasVector` | 1D bias vector with per-scalar Adam optimizers. Companion to `WeightMatrix` for bias terms. |
89
+ | `BatchNorm` | Batch normalization with running mean/variance for inference. |
90
+ | `Dropout` | Inverted dropout for regularization. Active only during training. |
91
+ | `WeightMatrix` | 2D weight matrix with per-scalar Adam optimizers and optional gradient clipping. |
92
+ | `BiasVector` | 1D bias vector with per-scalar Adam optimizers. |
25
93
  | `EmbeddingMatrix` | Lookup-table embedding matrix with SGD updates. |
26
- | `sigmoid` `relu` `tanh` `linear` | Built-in activation functions. |
27
- | `SGD` `Momentum` `Adam` `ClipOptimizer` | Optimizers. Each instance tracks its own state per weight. `ClipOptimizer` wraps any optimizer with gradient clipping. |
28
- | `defaultOptimizer` | Default `OptimizerFactory` (`() => new SGD()`). Shared across `NeuronN`, `Layer`, `NetworkN`, `NetworkLSTM`. |
29
- | `mse` `crossEntropy` | Loss functions for evaluation and logging. |
30
- | `mseDelta` `crossEntropyDelta` | Output-layer delta functions for use with `trainWithDeltas`. |
94
+
95
+ ### Classical ML
96
+
97
+ | Export | Description |
98
+ |--------|-------------|
99
+ | `Perceptron` | The historical Rosenblatt perceptron (1957). Step function, linear rule. Shows why XOR is impossible. |
100
+ | `LinearRegression` | Closed-form normal equation `(XᵀX)⁻¹Xᵀy` + gradient descent mode. Pure array arithmetic. |
101
+ | `LogisticRegression` | Sigmoid + binary cross-entropy, no hidden layers. The boundary between classical ML and neural nets. |
102
+ | `SoftmaxRegression` | Multinomial logistic regression. Log-sum-exp trick for numerical stability. |
103
+ | `GaussianNaiveBayes` | `P(c|x) ∝ P(c)·∏P(xᵢ|c)` in log-space. Zero gradient descent — pure Bayes. |
104
+ | `DecisionTree` | CART with Gini impurity (classification) or variance (regression). Fully recursive. |
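
The log-sum-exp trick mentioned for `SoftmaxRegression` can be sketched as follows. This is a minimal illustration, not the library's implementation; `stableSoftmax` and `logSumExp` are hypothetical names. Subtracting the largest logit before exponentiating keeps `Math.exp` from overflowing.

```typescript
// Hypothetical helpers illustrating the log-sum-exp trick.
function logSumExp(logits: number[]): number {
  const maxLogit = Math.max(...logits);
  const shifted = logits.reduce((sum, z) => sum + Math.exp(z - maxLogit), 0);
  return maxLogit + Math.log(shifted);
}

function stableSoftmax(logits: number[]): number[] {
  const lse = logSumExp(logits);
  // exp(z - logSumExp) equals exp(z) / sum(exp), without the overflow.
  return logits.map(z => Math.exp(z - lse));
}

const bigLogits = [1000, 1000];
const naive = Math.exp(1000) / (Math.exp(1000) + Math.exp(1000)); // Infinity / Infinity
const probs = stableSoftmax(bigLogits);
console.log(naive, probs);
```

The naive ratio overflows to `NaN` for large logits, while the shifted version still returns a valid distribution.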
105
+
106
+ ### Unsupervised learning
107
+
108
+ | Export | Description |
109
+ |--------|-------------|
110
+ | `KMeans` | K-Means++ initialization + Lloyd's algorithm. `inertia()` for the elbow method. |
111
+ | `PCA` | Principal Component Analysis via power iteration + Hotelling deflation. Projects, reconstructs, explains variance. |
112
+ | `SOM` | Self-Organizing Map (Kohonen). BMU search, Gaussian neighborhood, topology preservation. |
113
+ | `HopfieldNetwork` | Associative memory. Hebbian storage, energy function, async recall. Capacity ~0.138·N. |
114
+ | `Autoencoder` | Encoder + bottleneck + decoder using two `NetworkN` instances. Learns compressed representations. |
115
+
116
+ ### Generative models
117
+
118
+ | Export | Description |
119
+ |--------|-------------|
120
+ | `GAN` | Generator vs Discriminator min-max game. Documents Nash equilibrium and mode collapse. |
121
+ | `VAE` | Variational Autoencoder. Reparametrization trick, ELBO = reconstruction + KL divergence. |
122
+
123
+ ### Automatic differentiation
124
+
125
+ | Export | Description |
126
+ |--------|-------------|
127
+ | `Value` | Scalar autograd node. Builds a computational graph and propagates gradients with `.backward()`. Inspired by micrograd. |
128
+
129
+ ### Activations & math
130
+
131
+ | Export | Description |
132
+ |--------|-------------|
133
+ | `sigmoid` `relu` `tanh` `linear` `leakyRelu` `elu` | Built-in activation functions with `fn` and `dfn` (derivative from output). |
134
+ | `makeLeakyRelu(α)` `makeElu(α)` | Parametric variants. |
135
+ | `matMul` `transpose` `softmax` `softmaxBackward` | Matrix math utilities. |
136
+
137
+ ### Optimizers
138
+
139
+ | Export | Description |
140
+ |--------|-------------|
141
+ | `SGD` | Vanilla stochastic gradient descent. Stateless. |
142
+ | `Momentum` | Accumulates velocity in the gradient direction. |
143
+ | `Adam` | Adaptive moment estimation. Per-parameter first and second moments with bias correction. |
144
+ | `ClipOptimizer` | Wraps any optimizer with gradient clipping. |
145
+ | `ClippedOptimizerFactory` | Factory wrapper that clips all created optimizers. |
146
+ | `defaultOptimizer` | Default factory (`() => new SGD()`). Shared fallback across all classes. |
147
+
148
+ ### Loss functions
149
+
150
+ | Export | Description |
151
+ |--------|-------------|
152
+ | `mse` `crossEntropy` | Scalar loss functions for evaluation and logging. |
153
+ | `mseDelta` `crossEntropyDelta` `crossEntropyDeltaRaw` | Output-layer delta functions for `trainWithDeltas`. |
154
+
155
+ ### Metrics & evaluation
156
+
157
+ | Export | Description |
158
+ |--------|-------------|
159
+ | `confusionMatrix` | Returns `number[][]` confusion matrix. |
160
+ | `accuracy` `precision` `recall` `f1Score` | Standard classification metrics. |
161
+ | `rocCurve` `auc` | ROC curve points and area under the curve (trapezoidal rule). |
162
+ | `mae` `rmse` `r2Score` | Regression metrics. |
163
+ | `perplexity` | `exp(mean cross-entropy)` — natural metric for language models. |
164
+ | `printConfusionMatrix` `classificationReport` | Console-formatted output tables. |
165
+
166
+ ### Training utilities
167
+
168
+ | Export | Description |
169
+ |--------|-------------|
170
+ | `Trainer` | Training loop with epochs, batches, metrics, and callbacks. |
171
+ | `DataLoader` | Dataset wrapper with shuffling and validation split. |
172
+ | `LRScheduler` | Learning rate schedules (step, exponential, cosine). |
173
+ | `EarlyStopping` | Stops training when a metric stalls. Configurable patience, mode, and best-weight restore. |
174
+ | `LossPlotter` | Renders a loss curve as ASCII art in the terminal. |
175
+ | `WeightInspector` | Per-layer weight statistics (mean, std, dead weights). Detects dead ReLUs. |
176
+ | `DataAugmentation` | Noise, jitter, normalization, z-score, shuffle, train/val/test split. |
177
+ | `ModelSaver` | Universal serialization via flat `getWeights()` / `setWeights()`. |
31
178
 
32
179
  ## Install
33
180
 
@@ -44,300 +191,439 @@ import { Neuron } from "@dniskav/neuron";
44
191
 
45
192
  const neuron = new Neuron();
46
193
 
47
- // Train: output 1 if input >= 18, else 0
48
194
  for (let epoch = 0; epoch < 1000; epoch++) {
49
195
  neuron.train(20, 1, 0.1); // adult
50
196
  neuron.train(15, 0, 0.1); // minor
51
197
  }
52
198
 
53
- console.log(neuron.predict(17)); // ~0.1 (minor)
54
- console.log(neuron.predict(25)); // ~0.9 (adult)
199
+ console.log(neuron.predict(17)); // ~0.1
200
+ console.log(neuron.predict(25)); // ~0.9
55
201
  ```
56
202
 
57
- ### N-input neuron: multi-feature classification
203
+ ### NetworkN: deep network with custom architecture
58
204
 
59
205
  ```ts
60
- import { NeuronN } from "@dniskav/neuron";
61
-
62
- const neuron = new NeuronN(3); // 3 inputs: R, G, B
206
+ import { NetworkN, relu, sigmoid, Adam } from "@dniskav/neuron";
63
207
 
64
- // Teach it to detect bright colors (luminance > 0.65)
65
- neuron.train([1, 1, 1], 1, 0.05); // white → bright
66
- neuron.train([0, 0, 0], 0, 0.05); // black → dark
208
+ const net = new NetworkN([3, 64, 32, 1], {
209
+ activations: [relu, relu, sigmoid],
210
+ optimizer: () => new Adam(),
211
+ });
67
212
 
68
- console.log(neuron.predict([0.9, 0.9, 0.9])); // close to 1
213
+ net.train([0.5, 0.3, 0.8], [1], 0.001);
214
+ const [out] = net.predict([0.5, 0.3, 0.8]);
69
215
  ```
70
216
 
71
- ### Network: non-linear classification
217
+ ### Historical Perceptron: step function, no hidden layers
72
218
 
73
219
  ```ts
74
- import { Network } from "@dniskav/neuron";
75
-
76
- // 2 inputs → 8 hidden neurons → 1 output
77
- const net = new Network(2, 8, 1);
220
+ import { Perceptron } from "@dniskav/neuron";
78
221
 
79
- // Train on XOR (not linearly separable — needs hidden layer)
80
- const data = [[0,0,0], [0,1,1], [1,0,1], [1,1,0]];
222
+ const p = new Perceptron(2);
81
223
 
82
- for (let epoch = 0; epoch < 5000; epoch++) {
83
- for (const [x, y, t] of data) {
84
- net.train([x, y], t, 0.3);
85
- }
86
- }
224
+ // Learns AND gate (linearly separable)
225
+ const data = [[0,0,0],[0,1,0],[1,0,0],[1,1,1]];
226
+ for (let e = 0; e < 100; e++)
227
+ for (const [a, b, t] of data) p.train([a, b], t, 0.1);
87
228
 
88
- console.log(net.predict([0, 1])[0]); // ~0.97
89
- console.log(net.predict([1, 1])[0]); // ~0.03
229
+ console.log(p.predict([1, 1])); // 1
230
+ console.log(p.predict([0, 1])); // 0
231
+ // XOR cannot be learned — not linearly separable
90
232
  ```
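The claim that XOR cannot be learned can be checked with a small self-contained sketch of the Rosenblatt rule. This is not the library's `Perceptron` class; `trainPerceptron` and `countErrors` are hypothetical helpers, and a learning rate of 1 with zero-initialized weights is an assumption chosen to keep the arithmetic exact.

```typescript
// Hypothetical sketch of the Rosenblatt rule; not the library's Perceptron.
type Sample = [number, number, number]; // [input a, input b, target]

function trainPerceptron(data: Sample[], epochs: number, lr: number) {
  let w1 = 0, w2 = 0, bias = 0;
  const step = (z: number): number => (z > 0 ? 1 : 0);
  for (let e = 0; e < epochs; e++) {
    for (const [a, b, t] of data) {
      const err = t - step(w1 * a + w2 * b + bias); // -1, 0, or +1
      w1 += lr * err * a;
      w2 += lr * err * b;
      bias += lr * err;
    }
  }
  return (a: number, b: number): number => step(w1 * a + w2 * b + bias);
}

// Train on a dataset, then count how many of its samples are misclassified.
function countErrors(data: Sample[]): number {
  const predict = trainPerceptron(data, 100, 1);
  return data.filter(([a, b, t]) => predict(a, b) !== t).length;
}

const andGate: Sample[] = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]];
const xorGate: Sample[] = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]];

const andErrors = countErrors(andGate); // linearly separable, so training converges
const xorErrors = countErrors(xorGate); // no separating line exists
console.log({ andErrors, xorErrors });
```

No single line separates the XOR classes, so at least one of the four points stays misclassified however long training runs.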
91
233
 
92
- ### NetworkN: deep network with custom architecture
234
+ ### Linear Regression: normal equation
93
235
 
94
236
  ```ts
95
- import { NetworkN } from "@dniskav/neuron";
237
+ import { LinearRegression } from "@dniskav/neuron";
96
238
 
97
- // 3 inputs → 24 hidden → 16 hidden → 2 outputs
98
- const net = new NetworkN([3, 24, 16, 2]);
239
+ const model = new LinearRegression();
99
240
 
100
- // Train with multiple targets
101
- net.train([0.5, 0.3, 0.8], [1, 0], 0.05);
241
+ // Exact closed-form solution in one call
242
+ model.fitNormal(
243
+ [[1], [2], [3], [4]], // X
244
+ [2, 4, 6, 8] // y = 2x
245
+ );
102
246
 
103
- // Predict returns an array — one value per output neuron
104
- const [out1, out2] = net.predict([0.5, 0.3, 0.8]);
247
+ console.log(model.predict([5])); // ~10
248
+ console.log(model.getCoefficients()); // { weights: [2], bias: ~0 }
105
249
  ```
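For a single feature, the normal equation `(XᵀX)⁻¹Xᵀy` reduces to a 2×2 linear system that can be solved by hand. The sketch below works through that case on the same data; `fitLine` is a hypothetical helper, not the library's `fitNormal`.

```typescript
// Hypothetical helper: closed-form least squares for y = weight * x + bias.
function fitLine(xs: number[], ys: number[]): { weight: number; bias: number } {
  const n = xs.length;
  let sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    sx += xs[i];
    sy += ys[i];
    sxx += xs[i] * xs[i];
    sxy += xs[i] * ys[i];
  }
  // With a bias column, XᵀX = [[sxx, sx], [sx, n]] and Xᵀy = [sxy, sy].
  const det = sxx * n - sx * sx;
  if (det === 0) throw new Error("XᵀX is singular (all x values are equal)");
  return {
    weight: (n * sxy - sx * sy) / det,
    bias: (sxx * sy - sx * sxy) / det,
  };
}

const lineFit = fitLine([1, 2, 3, 4], [2, 4, 6, 8]);
console.log(lineFit); // weight 2, bias 0
```

With more features, the same system becomes a d×d matrix inversion, which is where a general solver would take over.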
106
250
 
107
- ### Activations: ReLU, tanh, and more
251
+ ### Logistic Regression: sigmoid + BCE
252
+
253
+ ```ts
254
+ import { LogisticRegression } from "@dniskav/neuron";
255
+
256
+ const clf = new LogisticRegression(2);
257
+ const lossHistory = clf.train(
258
+ [[0,0],[1,1],[1,0],[0,1]],
259
+ [0, 1, 1, 0],
260
+ 0.1, 500
261
+ );
262
+
263
+ console.log(clf.classify([0.9, 0.9])); // 1
264
+ console.log(clf.classify([0.1, 0.1])); // 0
265
+ ```
108
266
 
109
- Pass an activation per layer. The last layer typically uses `sigmoid` for binary output or `linear` for regression.
267
+ ### Gaussian Naive Bayes: zero gradient descent
110
268
 
111
269
  ```ts
112
- import { NetworkN, relu, sigmoid } from "@dniskav/neuron";
270
+ import { GaussianNaiveBayes } from "@dniskav/neuron";
113
271
 
114
- const net = new NetworkN([3, 64, 32, 1], {
115
- activations: [relu, relu, sigmoid],
116
- });
272
+ const nb = new GaussianNaiveBayes();
273
+ nb.fit(
274
+ [[1.2, 0.5], [1.4, 0.7], [5.0, 4.5], [5.2, 4.8]],
275
+ [0, 0, 1, 1]
276
+ );
277
+
278
+ console.log(nb.predict([1.3, 0.6])); // 0
279
+ console.log(nb.predict([5.1, 4.6])); // 1
117
280
  ```
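To make the log-space computation concrete, here is a minimal sketch of Gaussian Naive Bayes on the same four points. It is an illustration under stated assumptions (population variance plus a small floor of `1e-9`), not the library's code; `fitGaussianNB` and `predictGaussianNB` are hypothetical names.

```typescript
// Hypothetical sketch; not the library's GaussianNaiveBayes class.
interface ClassStats { label: number; prior: number; mean: number[]; variance: number[] }

function fitGaussianNB(X: number[][], y: number[]): ClassStats[] {
  const labels = y.filter((label, i) => y.indexOf(label) === i); // unique labels
  return labels.map(label => {
    const rows = X.filter((_, i) => y[i] === label);
    const mean: number[] = [];
    const variance: number[] = [];
    for (let d = 0; d < rows[0].length; d++) {
      const m = rows.reduce((s, r) => s + r[d], 0) / rows.length;
      const v = rows.reduce((s, r) => s + (r[d] - m) ** 2, 0) / rows.length;
      mean.push(m);
      variance.push(v + 1e-9); // small floor avoids division by zero
    }
    return { label, prior: rows.length / X.length, mean, variance };
  });
}

function predictGaussianNB(stats: ClassStats[], x: number[]): number {
  let bestLabel = stats[0].label;
  let bestScore = -Infinity;
  for (const c of stats) {
    // log P(c) + sum over features of log N(x_d | mean_d, variance_d)
    let score = Math.log(c.prior);
    for (let d = 0; d < x.length; d++) {
      score += -0.5 * Math.log(2 * Math.PI * c.variance[d])
        - (x[d] - c.mean[d]) ** 2 / (2 * c.variance[d]);
    }
    if (score > bestScore) { bestScore = score; bestLabel = c.label; }
  }
  return bestLabel;
}

const nbStats = fitGaussianNB(
  [[1.2, 0.5], [1.4, 0.7], [5.0, 4.5], [5.2, 4.8]],
  [0, 0, 1, 1]
);
console.log(predictGaussianNB(nbStats, [1.3, 0.6]), predictGaussianNB(nbStats, [5.1, 4.6]));
```

Summing log-probabilities instead of multiplying raw probabilities avoids underflow when there are many features.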
118
281
 
119
- Available: `sigmoid`, `relu`, `tanh`, `linear`.
282
+ ### Decision Tree: Gini split
120
283
 
121
- ### Optimizers — Adam, Momentum, SGD
284
+ ```ts
285
+ import { DecisionTree } from "@dniskav/neuron";
286
+
287
+ const tree = new DecisionTree({ maxDepth: 4, task: 'classification' });
288
+ tree.fit(X_train, y_train);
289
+ const predictions = tree.predictBatch(X_test);
290
+ ```
122
291
 
123
- Pass an optimizer factory. Each weight gets its own instance with independent state.
292
+ ### K-Means: unsupervised clustering
124
293
 
125
294
  ```ts
126
- import { NetworkN, relu, sigmoid, Adam } from "@dniskav/neuron";
295
+ import { KMeans } from "@dniskav/neuron";
127
296
 
128
- const net = new NetworkN([2, 64, 1], {
129
- activations: [relu, sigmoid],
130
- optimizer: () => new Adam(), // default: beta1=0.9, beta2=0.999
131
- });
297
+ const km = new KMeans(3); // 3 clusters
298
+ km.fit(points);
132
299
 
133
- // Momentum example
134
- import { Momentum } from "@dniskav/neuron";
135
- const net2 = new NetworkN([2, 32, 1], {
136
- optimizer: () => new Momentum(0.9),
137
- });
300
+ const cluster = km.predict([1.2, 0.5]); // index 0, 1 or 2
301
+ console.log(km.inertia(points)); // lower = better fit
138
302
  ```
139
303
 
140
- Optimizers also work in `NetworkLSTM` (applied to the dense layers):
304
+ ### PCA: dimensionality reduction
141
305
 
142
306
  ```ts
143
- import { NetworkLSTM, relu, Adam } from "@dniskav/neuron";
307
+ import { PCA } from "@dniskav/neuron";
144
308
 
145
- const net = new NetworkLSTM(1, 8, [4, 1], {
146
- denseActivation: relu,
147
- optimizer: () => new Adam(0.001),
148
- });
309
+ const pca = new PCA(2); // keep top 2 components
310
+ pca.fit(X); // 100 samples × 10 features
311
+
312
+ const Z = pca.transform(X); // 100 × 2
313
+ const X2 = pca.inverseTransform(Z); // reconstructed 100 × 10
314
+
315
+ console.log(pca.explainedVarianceRatio()); // [0.72, 0.15, ...]
149
316
  ```
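The power-iteration idea behind `PCA` can be sketched in a few lines: multiply a vector by the covariance matrix repeatedly and renormalize until it settles on the dominant eigenvector. The helpers `covarianceMatrix` and `topComponent` below are hypothetical, and the fixed start vector and iteration count are assumptions made for reproducibility.

```typescript
// Hypothetical sketch of power iteration; not the library's PCA class.
function covarianceMatrix(X: number[][]): number[][] {
  const n = X.length;
  const d = X[0].length;
  const mean = Array.from({ length: d }, (_, j) => X.reduce((s, r) => s + r[j], 0) / n);
  return Array.from({ length: d }, (_, a) =>
    Array.from({ length: d }, (_, b) =>
      X.reduce((s, r) => s + (r[a] - mean[a]) * (r[b] - mean[b]), 0) / (n - 1)));
}

function topComponent(C: number[][], iterations = 100): number[] {
  // Deterministic start vector; it must not be orthogonal to the top eigenvector.
  let v = C.map(() => 1 / Math.sqrt(C.length));
  for (let it = 0; it < iterations; it++) {
    const w = C.map(row => row.reduce((s, c, j) => s + c * v[j], 0));
    const norm = Math.hypot(...w);
    if (norm === 0) break;
    v = w.map(x => x / norm);
  }
  return v;
}

// Points on the line y = 2x: the first component should point along (1, 2).
const linePoints = [[-2, -4], [-1, -2], [0, 0], [1, 2], [2, 4]];
const pc1 = topComponent(covarianceMatrix(linePoints));
console.log(pc1); // approximately [0.447, 0.894]
```

Hotelling deflation would then subtract `λ·v·vᵀ` from the covariance matrix and repeat the iteration to find the next component.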
150
317
 
151
- ### Loss utilities
318
+ ### Self-Organizing Map
152
319
 
153
320
  ```ts
154
- import { mse, crossEntropy } from "@dniskav/neuron";
321
+ import { SOM } from "@dniskav/neuron";
155
322
 
156
- const predicted = net.predict([0.5, 0.3]);
157
- console.log(mse(predicted, [1, 0]));
158
- console.log(crossEntropy(predicted, [1, 0]));
159
- ```
323
+ const som = new SOM(10, 10, 3); // 10×10 grid, 3-dimensional inputs (RGB)
324
+ som.train(colors, 500);
160
325
 
161
- ### trainWithDeltas: custom loss / physics-based gradients
326
+ const [row, col] = som.getBMU([255, 0, 0]); // find best matching unit for red
327
+ console.log(som.quantizationError(colors));
328
+ ```
162
329
 
163
- `NetworkN` also exposes `trainWithDeltas` for when you compute your own output-layer deltas (e.g., from a physics simulation or a custom loss function):
330
+ ### Hopfield Network: associative memory
164
331
 
165
332
  ```ts
166
- import { NetworkN, mseDelta } from "@dniskav/neuron";
333
+ import { HopfieldNetwork } from "@dniskav/neuron";
167
334
 
168
- const net = new NetworkN([3, 16, 2]);
169
- const pred = net.predict(inputs);
335
+ const net = new HopfieldNetwork(64); // 64 binary neurons
170
336
 
171
- // Compute deltas manually using a helper, or from any external signal
172
- const deltas = pred.map((p, i) => mseDelta(p, targets[i]));
173
- net.trainWithDeltas(inputs, deltas, 0.01);
174
- ```
337
+ // Store two 64-bit patterns
338
+ net.store(HopfieldNetwork.binarize(pattern1)); // converts 0/1 → -1/+1
339
+ net.store(HopfieldNetwork.binarize(pattern2));
175
340
 
176
- ### NetworkLSTM: recurrent network with memory
341
+ // Recall from noisy input
342
+ const recovered = net.recall(HopfieldNetwork.binarize(noisyPattern1));
343
+ console.log(net.energy(recovered)); // local minimum = stored memory
344
+ ```
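A small sketch shows Hebbian storage, asynchronous recall, and the energy function working together. It uses 8 neurons and a single stored pattern; the helper names are hypothetical, and the unnormalized weights `W[i][j] = Σ pᵢ·pⱼ` with a zero diagonal are an assumption about the convention.

```typescript
// Hypothetical sketch; not the library's HopfieldNetwork class.
function hebbianWeights(patterns: number[][]): number[][] {
  const n = patterns[0].length;
  const W = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  for (const p of patterns)
    for (let i = 0; i < n; i++)
      for (let j = 0; j < n; j++)
        if (i !== j) W[i][j] += p[i] * p[j]; // no self-connections
  return W;
}

function hopfieldEnergy(W: number[][], s: number[]): number {
  let e = 0;
  for (let i = 0; i < s.length; i++)
    for (let j = 0; j < s.length; j++) e -= 0.5 * W[i][j] * s[i] * s[j];
  return e;
}

function recallPattern(W: number[][], start: number[], sweeps = 5): number[] {
  const s = start.slice();
  for (let sweep = 0; sweep < sweeps; sweep++)
    for (let i = 0; i < s.length; i++) {
      // Asynchronous update: each neuron sees the latest state of the others.
      const field = W[i].reduce((sum, w, j) => sum + w * s[j], 0);
      if (field !== 0) s[i] = field > 0 ? 1 : -1;
    }
  return s;
}

const storedPattern = [1, -1, 1, 1, -1, -1, 1, -1];
const hopfieldW = hebbianWeights([storedPattern]);
const noisyPattern = storedPattern.slice();
noisyPattern[2] = -noisyPattern[2]; // flip one bit
const recoveredPattern = recallPattern(hopfieldW, noisyPattern);

console.log(hopfieldEnergy(hopfieldW, noisyPattern));     // -14
console.log(hopfieldEnergy(hopfieldW, recoveredPattern)); // -28
```

Recall lowers the energy from the noisy state to the stored pattern, which sits at a local minimum.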
177
345
 
178
- `NetworkLSTM` adds within-episode memory: the network can remember what happened in previous steps of the same sequence.
346
+ ### Autoencoder: learn compressed representations
179
347
 
180
348
  ```ts
181
- import { NetworkLSTM } from "@dniskav/neuron";
349
+ import { Autoencoder } from "@dniskav/neuron";
182
350
 
183
- // 1 input → LSTM(8 hidden) → Dense(4) → 1 output
184
- const net = new NetworkLSTM(1, 8, [4, 1]);
351
+ // 784 → [128, 64] → 16 (latent) → [64, 128] → 784
352
+ const ae = new Autoencoder(784, [128, 64], 16, [64, 128]);
185
353
 
186
- // Task: predict 1 if we're past step 3 in the episode, else 0
187
- // A feedforward net can't do this — it has no memory of step count.
354
+ for (let e = 0; e < 1000; e++)
355
+ for (const x of images)
356
+ ae.train(x, 0.001);
188
357
 
189
- for (let epoch = 0; epoch < 300; epoch++) {
190
- net.resetState(); // clear memory at episode start
358
+ const latent = ae.encode(image); // compressed: 16 values
359
+ const reconstructed = ae.reconstruct(image); // decoded back: 784 values
360
+ ```
191
361
 
192
- const targets: number[][] = [];
193
- for (let step = 0; step < 6; step++) {
194
- net.predict([1]); // same input every step
195
- targets.push([step >= 3 ? 1 : 0]);
196
- }
362
+ ### GAN: generative adversarial training
197
363
 
198
- net.train(targets, 0.05); // BPTT across the full episode
364
+ ```ts
365
+ import { GAN } from "@dniskav/neuron";
366
+
367
+ const gan = new GAN(
368
+ 16, // latentDim
369
+ [32, 64], // generator hidden layers
370
+ 8, // outputDim (size of generated samples)
371
+ [64, 32], // discriminator hidden layers
372
+ );
373
+
374
+ for (let step = 0; step < 10000; step++) {
375
+ const { dLoss, gLoss } = gan.trainStep(realBatch, 0.0002);
376
+ if (step % 500 === 0) console.log(`D: ${dLoss.toFixed(3)} G: ${gLoss.toFixed(3)}`);
199
377
  }
200
378
 
201
- // Run a fresh episode and check predictions
202
- net.resetState();
203
- for (let step = 0; step < 6; step++) {
204
- const [out] = net.predict([1]);
205
- console.log(`step ${step}: ${out.toFixed(2)} (expected: ${step >= 3 ? 1 : 0})`);
379
+ const fake = gan.generate(); // new synthetic sample
380
+ ```
381
+
382
+ ### VAE: variational autoencoder
383
+
384
+ ```ts
385
+ import { VAE } from "@dniskav/neuron";
386
+
387
+ const vae = new VAE(784, [256, 128], 32, [128, 256]);
388
+
389
+ for (const x of dataset) {
390
+ const { totalLoss, reconLoss, klLoss } = vae.train(x, 0.001);
206
391
  }
207
- // step 0: 0.07 (expected: 0)
208
- // step 1: 0.11 (expected: 0)
209
- // step 2: 0.18 (expected: 0)
210
- // step 3: 0.81 (expected: 1)
211
- // step 4: 0.89 (expected: 1)
212
- // step 5: 0.93 (expected: 1)
392
+
393
+ // Sample from latent space
394
+ const generated = vae.generate(); // random sample
395
+ const { mu, logVar } = vae.encode(image); // encode → distribution params
396
+ const z = vae.reparametrize(mu, logVar); // sample z ~ N(μ, σ²)
213
397
  ```
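The two formulas at the heart of a VAE are short enough to write out. The sketch below shows the reparametrization step `z = μ + σ·ε` and the closed-form KL divergence against a standard normal. The function names are hypothetical and take `ε` as an explicit argument so the result is reproducible; the library's own `reparametrize` may differ.

```typescript
// Hypothetical helpers; not the library's VAE internals.
function sampleStandardNormal(n: number): number[] {
  // Box-Muller transform; 1 - Math.random() keeps the log argument above zero.
  return Array.from({ length: n }, () =>
    Math.sqrt(-2 * Math.log(1 - Math.random())) * Math.cos(2 * Math.PI * Math.random()));
}

function reparametrizeWith(mu: number[], logVar: number[], eps: number[]): number[] {
  // sigma = exp(0.5 * logVar), so z = mu + sigma * eps stays differentiable in mu and logVar.
  return mu.map((m, i) => m + Math.exp(0.5 * logVar[i]) * eps[i]);
}

function klToStandardNormal(mu: number[], logVar: number[]): number {
  // KL(N(mu, sigma²) || N(0, 1)) = -0.5 * sum(1 + logVar - mu² - exp(logVar))
  return -0.5 * mu.reduce((s, m, i) => s + 1 + logVar[i] - m * m - Math.exp(logVar[i]), 0);
}

const zFixed = reparametrizeWith([1], [0], [0.5]);                 // [1.5]
const zRandom = reparametrizeWith([0, 0], [0, 0], sampleStandardNormal(2));
const klZero = klToStandardNormal([0, 0], [0, 0]);                 // 0
const klShifted = klToStandardNormal([1, 0], [0, 0]);              // 0.5
console.log(zFixed, zRandom, klZero, klShifted);
```

Moving the randomness into `ε` is what lets gradients flow through `μ` and `logVar` during training.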
214
398
 
215
- The network learns to count steps using its hidden state; no external counter is needed.
399
+ ### Value / Tape: automatic differentiation
216
400
 
217
- ## How it works
401
+ ```ts
402
+ import { Value } from "@dniskav/neuron";
218
403
 
219
- Each class applies an **activation function** to the weighted sum of inputs and uses **gradient descent** to update weights:
404
+ // Build a computation graph
405
+ const x = new Value(2.0);
406
+ const w = new Value(-3.0);
407
+ const b = new Value(6.7);
408
+ const n = x.mul(w).add(b); // n = x*w + b
409
+ const o = n.tanh(); // o = tanh(n)
220
410
 
411
+ // Backward pass — fills .grad for every node
412
+ o.backward();
413
+
414
+ console.log(x.grad); // ∂o/∂x
415
+ console.log(w.grad); // ∂o/∂w
416
+ console.log(b.grad); // ∂o/∂b
221
417
  ```
222
- weight += lr × delta × input
223
- bias += lr × delta
418
+
419
+ ### Conv2D + MaxPool2D + Flatten — CNN pipeline
420
+
421
+ ```ts
422
+ import { Conv2D, MaxPool2D, Flatten, NetworkN } from "@dniskav/neuron";
423
+
424
+ const conv = new Conv2D(28, 28, 1, 3, 8); // 28×28×1 → 26×26×8
425
+ const pool = new MaxPool2D(2); // 26×26×8 → 13×13×8
426
+ const flatten = new Flatten();
427
+ const dense = new NetworkN([13*13*8, 64, 10]);
428
+
429
+ // Forward
430
+ const featureMaps = conv.forward(image); // [H][W][C]
431
+ const pooled = pool.forward(featureMaps);
432
+ const flat = flatten.forward(pooled); // 1352 values
433
+ const logits = dense.predict(flat);
224
434
  ```
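The shape comments in the pipeline above follow from the usual output-size formula. The sketch below reproduces them, assuming stride 1 and no padding for the convolution and a 2×2 window with stride 2 for pooling; `convOutputSize` is a hypothetical helper.

```typescript
// Hypothetical helper: spatial output size of a convolution or pooling window.
function convOutputSize(size: number, kernel: number, stride = 1, padding = 0): number {
  return Math.floor((size + 2 * padding - kernel) / stride) + 1;
}

const convSide = convOutputSize(28, 3);          // 28 - 3 + 1 = 26
const poolSide = convOutputSize(convSide, 2, 2); // (26 - 2) / 2 + 1 = 13
const flatLength = poolSide * poolSide * 8;      // 13 × 13 × 8 filters = 1352
console.log(convSide, poolSide, flatLength);
```

`flatLength` is the input size the first dense layer must accept.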
225
435
 
226
- `NetworkN` implements full **backpropagation** across all layers, propagating deltas from the output back to the first layer using the chain rule. The derivative of the chosen activation is applied at each layer.
436
+ ### RNN: vanilla recurrent network
227
437
 
228
- `NeuronN` uses simplified **Xavier initialization** — weights start in `[-√(1/n), +√(1/n)]` — so gradients flow well from the start of training.
438
+ ```ts
439
+ import { RNN } from "@dniskav/neuron";
229
440
 
230
- When an **optimizer** is used (e.g., Adam), the raw gradient is passed to the optimizer instead of being applied directly. Each weight maintains its own optimizer state (velocity, moments).
441
+ // 1 input → 16 hidden → 1 output, over a sequence
442
+ const rnn = new RNN(1, 16, 1);
231
443
 
232
- ## Build
444
+ const sequence = [[0.1], [0.3], [0.7], [0.9]]; // 4 timesteps
445
+ const { outputs, hiddens } = rnn.forward(sequence);
233
446
 
234
- ```bash
235
- npm run build # outputs CJS + ESM + type declarations to dist/
236
- npm run dev # watch mode
447
+ // BPTT backward — returns MSE loss
448
+ const targets = [[0.2], [0.5], [0.8], [1.0]];
449
+ const loss = rnn.backward(sequence, targets, 0.01);
237
450
  ```
238
451
 
239
- ## For AI agents
452
+ ### TCN: Temporal Convolutional Network
240
453
 
241
- If you are an AI agent or LLM working with this codebase, read [AGENTS.md](AGENTS.md) first. It contains the full class hierarchy, design constraints, and what this library does not do.
454
+ ```ts
455
+ import { TCN } from "@dniskav/neuron";
456
+
457
+ // 3 input channels → 32 channels × 4 levels → 1 output
458
+ // Receptive field = (3-1)·(2⁴-1)+1 = 31 timesteps
459
+ const tcn = new TCN(3, 32, 3, 4, 1);
460
+
461
+ const sequence = Array.from({ length: 50 }, () => [Math.random(), Math.random(), Math.random()]);
462
+ const outputs = tcn.forward(sequence); // [50][1]
463
+ ```
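The receptive-field formula in the comment can be evaluated level by level. This sketch assumes one causal convolution per level with the dilation doubling each time (1, 2, 4, ...); the library's `TCN` may stack its layers differently. `tcnReceptiveField` is a hypothetical helper.

```typescript
// Hypothetical helper: receptive field of stacked causal dilated convolutions.
function tcnReceptiveField(kernelSize: number, levels: number): number {
  let field = 1; // a single timestep before any convolution
  for (let level = 0; level < levels; level++) {
    const dilation = 2 ** level;
    field += (kernelSize - 1) * dilation; // each level widens the view
  }
  return field;
}

const fieldK3L4 = tcnReceptiveField(3, 4); // 1 + 2·(1 + 2 + 4 + 8)
console.log(fieldK3L4);
```

Because the dilations sum to `2^levels - 1`, the loop agrees with the closed form `(k-1)·(2^levels - 1) + 1`.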
242
464
 
243
- ### NetworkTransformer: self-attention over sequences
465
+ ### NetworkLSTM: recurrent memory
244
466
 
245
467
  ```ts
246
- import { NetworkTransformer } from "@dniskav/neuron";
247
-
248
- // Sudoku solver: 81 cells (tokens), values 0–9, predict digit 1–9 per cell
249
- const net = new NetworkTransformer(81, {
250
- vocabSize: 10, // digits 0–9
251
- d_model: 64, // embedding / hidden dimension
252
- nHeads: 4, // attention heads (d_k = d_model / nHeads = 16)
253
- d_ff: 128, // FFN hidden size
254
- nBlocks: 4, // number of transformer blocks
255
- nClasses: 9, // output classes per token (digits 1–9)
256
- });
468
+ import { NetworkLSTM } from "@dniskav/neuron";
469
+
470
+ const net = new NetworkLSTM(1, 8, [4, 1]);
471
+
472
+ for (let epoch = 0; epoch < 300; epoch++) {
473
+ net.resetState();
474
+ for (let step = 0; step < 6; step++) net.predict([1]);
475
+ net.train([[0],[0],[0],[1],[1],[1]], 0.05);
476
+ }
477
+ ```
478
+
479
+ ### Metrics — evaluate your model
480
+
481
+ ```ts
482
+ import { accuracy, f1Score, confusionMatrix, printConfusionMatrix, auc, classificationReport } from "@dniskav/neuron";
483
+
484
+ const yTrue = [0, 1, 1, 0, 1];
485
+ const yPred = [0, 1, 0, 0, 1];
486
+
487
+ console.log(accuracy(yTrue, yPred)); // 0.8
488
+ console.log(f1Score(yTrue, yPred)); // 0.8
489
+
490
+ const cm = confusionMatrix(yTrue, yPred);
491
+ printConfusionMatrix(cm, ['neg', 'pos']);
492
+
493
+ // AUC-ROC
494
+ const scores = [0.1, 0.9, 0.4, 0.2, 0.8];
495
+ console.log(auc(yTrue, scores)); // 1.0 (every positive outranks every negative)
257
496
 
258
- // tokens: 81 cell values (0 = empty)
259
- const puzzle = [5,3,0, 0,7,0, 0,0,0, ...];
260
- const targets = [...]; // 81*9 one-hot values
261
- const mask = puzzle.map(v => v === 0); // only train on empty cells
497
+ classificationReport(yTrue, yPred, ['neg', 'pos']);
498
+ ```
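For readers who want to see what these metrics compute, here is a from-scratch sketch for the binary case. The function names are hypothetical and the implementations are not the library's: in particular, AUC is computed here by pairwise ranking (the fraction of positive/negative pairs ordered correctly), which matches the area under the ROC curve. The scores are illustrative values chosen so that one pair is misordered.

```typescript
// Hypothetical from-scratch metrics for labels in {0, 1}.
function accuracyOf(yTrue: number[], yPred: number[]): number {
  return yTrue.filter((t, i) => t === yPred[i]).length / yTrue.length;
}

function f1Of(yTrue: number[], yPred: number[]): number {
  let tp = 0, fp = 0, fn = 0;
  yTrue.forEach((t, i) => {
    if (yPred[i] === 1 && t === 1) tp++;
    else if (yPred[i] === 1 && t === 0) fp++;
    else if (yPred[i] === 0 && t === 1) fn++;
  });
  const denom = 2 * tp + fp + fn; // F1 = 2·TP / (2·TP + FP + FN)
  return denom === 0 ? 0 : (2 * tp) / denom;
}

function aucOf(yTrue: number[], scores: number[]): number {
  let wins = 0, pairs = 0;
  yTrue.forEach((ti, i) => {
    if (ti !== 1) return;
    yTrue.forEach((tj, j) => {
      if (tj !== 0) return;
      pairs++;
      if (scores[i] > scores[j]) wins += 1;           // positive ranked above negative
      else if (scores[i] === scores[j]) wins += 0.5;  // ties count half
    });
  });
  return pairs === 0 ? NaN : wins / pairs;
}

const labels = [0, 1, 1, 0, 1];
const hardPreds = [0, 1, 0, 0, 1];
const softScores = [0.1, 0.9, 0.4, 0.5, 0.8]; // illustrative: one negative outranks one positive

console.log(accuracyOf(labels, hardPreds)); // 0.8
console.log(f1Of(labels, hardPreds));       // 0.8
console.log(aucOf(labels, softScores));     // 5 of 6 pairs ordered correctly
```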
499
+
500
+ ### EarlyStopping
501
+
502
+ ```ts
503
+ import { EarlyStopping } from "@dniskav/neuron";
262
504
 
263
- const loss = net.train(puzzle, targets, 0.001, mask);
264
- // loss is cross-entropy (not MSE) — decreases from ~2.2 toward 0 as training progresses
265
- const logits = net.predict(puzzle); // 729 logits (81 × 9)
505
+ const stopper = new EarlyStopping({ patience: 10, minDelta: 1e-4, mode: 'min' });
266
506
 
267
- // Attention weights from all blocks for visualization
268
- const weights = net.getAttentionWeights();
269
- // weights[blockIdx][headIdx] → seqLen × seqLen matrix
507
+ for (let epoch = 0; epoch < 1000; epoch++) {
508
+ const valLoss = trainEpoch();
509
+ if (stopper.update(valLoss, epoch)) {
510
+ console.log(`Stopped at epoch ${epoch}`);
511
+ break;
512
+ }
513
+ }
270
514
  ```
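The patience logic behind early stopping fits in a few lines. The class below is a hypothetical sketch for `mode: 'min'` only, without best-weight restore; it is not the library's `EarlyStopping`, and the loss values are made up for illustration.

```typescript
// Hypothetical sketch of patience-based stopping (minimization only).
class PatienceStopper {
  private best = Infinity;
  private wait = 0;
  private patience: number;
  private minDelta: number;

  constructor(patience: number, minDelta = 0) {
    this.patience = patience;
    this.minDelta = minDelta;
  }

  // Returns true once the loss has failed to improve for `patience` calls in a row.
  update(loss: number): boolean {
    if (loss < this.best - this.minDelta) {
      this.best = loss;
      this.wait = 0;
    } else {
      this.wait++;
    }
    return this.wait >= this.patience;
  }
}

const valLosses = [1.0, 0.9, 0.95, 0.92, 0.91, 0.5];
const patienceStopper = new PatienceStopper(3, 1e-4);
const stoppedAtEpoch = valLosses.findIndex(loss => patienceStopper.update(loss));
console.log(stoppedAtEpoch); // 4
```

The best loss (0.9) is reached at epoch 1, and the three epochs after it bring no improvement, so training stops before the final value is ever seen.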
271
515
 
272
- Each head in each block learns a different type of relationship (row, column,
273
- 3×3 box). The network figures this out by itself through training.
516
+ ### LossPlotter: ASCII loss curve
274
517
 
275
- ### NetworkTransformerRL — Transformer for reinforcement learning
518
+ ```ts
519
+ import { LossPlotter } from "@dniskav/neuron";
276
520
 
277
- `NetworkTransformerRL` uses causal self-attention over a sliding window of past states to output Q-values. Unlike `NetworkLSTM`, the agent attends to specific past moments rather than compressing them into a single hidden vector.
521
+ const plotter = new LossPlotter({ width: 60, height: 12, title: 'Training Loss' });
522
+
523
+ for (let e = 0; e < 500; e++) {
524
+ const loss = trainStep();
525
+ plotter.add(loss, e);
526
+ }
527
+
528
+ plotter.print();
529
+ // Training Loss
530
+ // ┌────────────────────────────────────────────────────────────┐
531
+ // │ 2.31 ·
532
+ // │ · ·
533
+ // │ · · ·
534
+ // │ · · · · · · ·
535
+ // │ 0.02 · · · · · · · · · · · · · · ·
536
+ // └────────────────────────────────────────────────────────────┘
537
+ // 0 250 499
538
+ ```
539
+
540
+ ### DataAugmentation
278
541
 
279
542
  ```ts
280
- import { NetworkTransformerRL } from "@dniskav/neuron";
281
-
282
- // Agent sees the last 8 steps, each step is a 7-value sensor vector → 4 actions
283
- const net = new NetworkTransformerRL(8, 7, {
284
- d_model: 32,
285
- nHeads: 2,
286
- d_ff: 64,
287
- nBlocks: 2,
288
- nActions: 4,
289
- });
543
+ import { DataAugmentation } from "@dniskav/neuron";
290
544
 
291
- // Each step: feed the last N states as a sequence
292
- const sequence = getLastNStates(); // number[][] — shape: [8, 7]
293
- const qValues = net.predict(sequence); // number[4]
545
+ // Split dataset
546
+ const { trainX, trainY, valX, valY } = DataAugmentation.split(X, y, 0.8, 0.1);
294
547
 
295
- // Q-learning update: train toward Bellman target
296
- const action = argmax(qValues);
297
- const reward = env.step(action);
298
- const targets = qValues.slice();
299
- targets[action] = reward + 0.99 * Math.max(...net.predict(nextSequence));
548
+ // Normalize (fit on train, apply to all)
549
+ const { normalized: normTrain, min, max } = DataAugmentation.normalize(trainX);
550
+ const normVal = valX.map(x => DataAugmentation.normalizePoint(x, min, max));
300
551
 
301
- const loss = net.train(sequence, targets, 0.001);
552
+ // Augment training set (×3 copies with Gaussian noise)
553
+ const { X: augX, y: augY } = DataAugmentation.augmentBatch(normTrain, trainY, 3, 0.02);
302
554
  ```
303
555
 
304
- The last step in the sequence gets 2× pooling weight, so the most recent state contributes more to the decision.
556
+ ### WeightInspector: diagnose your network
305
557
 
306
558
  ```ts
307
- // Inspect what the agent is attending to
308
- const attnWeights = net.getAttentionWeights();
309
- // attnWeights[blockIdx][headIdx] → seqLen × seqLen matrix
559
+ import { NetworkN, WeightInspector, relu } from "@dniskav/neuron";
560
+
561
+ const net = new NetworkN([784, 256, 128, 10], { activations: [relu, relu, relu] });
562
+ // ... train ...
563
+
564
+ WeightInspector.print(net);
565
+ // Layer 0: mean=0.001 std=0.056 min=-0.21 max=0.19 dead=0 params=200960
566
+ // Layer 1: mean=0.000 std=0.079 min=-0.31 max=0.28 dead=3 params=32896
567
+ // Layer 2: mean=-0.001 std=0.091 min=-0.28 max=0.32 dead=0 params=1290
310
568
  ```
311
569
 
570
+ ## How it works
571
+
572
+ Each class applies an **activation function** to the weighted sum of inputs and uses **gradient descent** to update weights:
573
+
574
+ ```
575
+ weight += lr × delta × input
576
+ bias += lr × delta
577
+ ```
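The update rule above can be sketched for a single sigmoid neuron. This is a minimal illustration, not the library's code: `TinyNeuron` and `trainStep` are hypothetical names, and it assumes `delta` is the error `(target - output)` multiplied by the activation derivative, which is what makes `weight += lr × delta × input` a gradient descent step on squared error.

```typescript
// Hypothetical helper, not part of @dniskav/neuron.
const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

interface TinyNeuron {
  weights: number[];
  bias: number;
}

// One delta-rule step; returns the squared-error loss measured before the update.
function trainStep(n: TinyNeuron, inputs: number[], target: number, lr: number): number {
  // Weighted sum of inputs plus bias, then activation.
  const z = inputs.reduce((sum, x, i) => sum + x * n.weights[i], n.bias);
  const out = sigmoid(z);

  // Assumption: delta = error × sigmoid'(z), where sigmoid'(z) = out × (1 − out).
  const delta = (target - out) * out * (1 - out);

  // weight += lr × delta × input, bias += lr × delta
  for (let i = 0; i < n.weights.length; i++) {
    n.weights[i] += lr * delta * inputs[i];
  }
  n.bias += lr * delta;

  return 0.5 * (target - out) ** 2;
}
```

Note that a weight whose input is `0` does not move, since the update is proportional to the input.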
578
+
579
+ `NetworkN` implements full **backpropagation** across all layers, propagating deltas from the output back to the first layer using the chain rule. `NeuronN` uses **Xavier initialization** — weights start in `[-√(1/n), +√(1/n)]`.
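As a rough sketch of the initialization range quoted above: the function below draws `n` weights from `[-√(1/n), +√(1/n)]`. The name `xavierUniform` is hypothetical, and a uniform draw is an assumption, since the text only states the range.

```typescript
// Hypothetical helper, not part of @dniskav/neuron.
// Draws fanIn weights in [-sqrt(1/fanIn), +sqrt(1/fanIn)].
// Assumption: the draw is uniform over that interval.
function xavierUniform(fanIn: number, rand: () => number = Math.random): number[] {
  const limit = Math.sqrt(1 / fanIn);
  // rand() is in [0, 1), so (rand() * 2 - 1) is in [-1, 1).
  return Array.from({ length: fanIn }, () => (rand() * 2 - 1) * limit);
}
```

The limit shrinks as the number of inputs grows, which keeps the weighted sum at a similar scale regardless of layer width.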
580
+
581
+ When an **optimizer** is used (e.g., Adam), the raw gradient is passed to the optimizer instead of being applied directly. Each weight maintains its own optimizer state.
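To illustrate what "each weight maintains its own optimizer state" could look like, here is a minimal per-scalar Adam sketch. `ScalarAdam` is a hypothetical class, not the library's optimizer API, and the hyperparameter defaults are the commonly used ones rather than values taken from this package. It follows the usual convention that `gradient` is `∂L/∂w` and the returned step is subtracted from the weight.

```typescript
// Hypothetical per-weight Adam state, not part of @dniskav/neuron.
class ScalarAdam {
  m = 0; // first-moment (mean) estimate
  v = 0; // second-moment (uncentered variance) estimate
  t = 0; // step counter, used for bias correction
  lr: number;
  beta1: number;
  beta2: number;
  eps: number;

  constructor(lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) {
    this.lr = lr;
    this.beta1 = beta1;
    this.beta2 = beta2;
    this.eps = eps;
  }

  // Takes the raw gradient dL/dw and returns the amount to subtract from the weight.
  step(gradient: number): number {
    this.t += 1;
    this.m = this.beta1 * this.m + (1 - this.beta1) * gradient;
    this.v = this.beta2 * this.v + (1 - this.beta2) * gradient * gradient;
    const mHat = this.m / (1 - Math.pow(this.beta1, this.t));
    const vHat = this.v / (1 - Math.pow(this.beta2, this.t));
    return (this.lr * mHat) / (Math.sqrt(vHat) + this.eps);
  }
}

// Usage: one optimizer instance per weight, minimizing L(w) = w².
let w = 1;
const opt = new ScalarAdam(0.01);
for (let i = 0; i < 50; i++) {
  const grad = 2 * w; // dL/dw
  w -= opt.step(grad);
}
```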
582
+
583
+ The `Value` class implements **reverse-mode automatic differentiation**: every operation records its inputs and a backward function. Calling `.backward()` on the output node performs a topological sort and propagates `∂L/∂w` through the entire graph.
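The mechanism described above can be sketched in a few lines. `TinyValue` below is a hypothetical, stripped-down stand-in supporting only addition and multiplication; it is not the library's `Value` class, whose actual method names and supported operations may differ.

```typescript
// Hypothetical scalar autograd node, not the library's Value class.
class TinyValue {
  data: number;
  grad = 0;
  parents: TinyValue[];
  backwardFn: () => void = () => {};

  constructor(data: number, parents: TinyValue[] = []) {
    this.data = data;
    this.parents = parents;
  }

  add(other: TinyValue): TinyValue {
    const out = new TinyValue(this.data + other.data, [this, other]);
    // d(a + b)/da = 1 and d(a + b)/db = 1
    out.backwardFn = () => {
      this.grad += out.grad;
      other.grad += out.grad;
    };
    return out;
  }

  mul(other: TinyValue): TinyValue {
    const out = new TinyValue(this.data * other.data, [this, other]);
    // d(a * b)/da = b and d(a * b)/db = a
    out.backwardFn = () => {
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }

  backward(): void {
    // Topological sort: every node is placed after all of its inputs.
    const order: TinyValue[] = [];
    const seen = new Set<TinyValue>();
    const visit = (v: TinyValue): void => {
      if (seen.has(v)) return;
      seen.add(v);
      v.parents.forEach(visit);
      order.push(v);
    };
    visit(this);

    // Seed the output gradient, then apply the chain rule from output to inputs.
    this.grad = 1;
    for (const v of order.reverse()) v.backwardFn();
  }
}

// Usage: c = a * b + a, so dc/da = b + 1 and dc/db = a.
const a = new TinyValue(2);
const b = new TinyValue(3);
const c = a.mul(b).add(a);
c.backward();
// c.data === 8, a.grad === 4, b.grad === 2
```

Gradients are accumulated with `+=` because a node such as `a` can feed into more than one operation, and each use contributes its own term.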
584
+
585
+ ## Build
586
+
587
+ ```bash
588
+ npm run build # outputs CJS + ESM + type declarations to dist/
589
+ npm run dev # watch mode
590
+ npm test # run test suite
591
+ ```
592
+
593
+ ## For AI agents
594
+
595
+ If you are an AI agent or LLM working with this codebase, read [AGENTS.md](AGENTS.md) first. It contains the full class hierarchy, design constraints, and what this library does not do.
596
+
312
597
  ## Changelog
313
598
 
599
+ ### v0.3.0
600
+ - **New — Classical ML:** `Perceptron`, `LinearRegression` (normal equation + GD), `LogisticRegression`, `SoftmaxRegression`, `GaussianNaiveBayes`, `DecisionTree` (CART, Gini/MSE)
601
+ - **New — Unsupervised:** `KMeans` (K-Means++ init), `PCA` (power iteration + Hotelling deflation), `SOM` (Kohonen map), `HopfieldNetwork` (Hebbian storage + energy), `Autoencoder`
602
+ - **New — Deep Learning:** `Conv2D` (full forward/backward), `MaxPool2D` (position mask for exact backprop), `Flatten`, `RNN` (BPTT, documents vanishing gradient), `Seq2Seq` (encoder-decoder LSTM), `CausalConv1D`, `TCN` (dilated temporal convolutions)
603
+ - **New — Generative:** `GAN` (min-max game, Box-Muller sampling), `VAE` (reparametrization trick, ELBO = MSE + KL)
604
+ - **New — Autograd:** `Value` / `Tape` — scalar reverse-mode AD with topological backprop (micrograd-style)
605
+ - **New — Metrics:** `confusionMatrix`, `accuracy`, `precision`, `recall`, `f1Score`, `rocCurve`, `auc`, `mae`, `rmse`, `r2Score`, `perplexity`, `printConfusionMatrix`, `classificationReport`
606
+ - **New — Utilities:** `EarlyStopping` (patience + best-weight restore), `LossPlotter` (ASCII terminal curve), `WeightInspector` (per-layer stats, dead ReLU detection), `DataAugmentation` (noise, normalize, z-score, shuffle, split)
607
+
608
+ ### v0.2.7
609
+ - **Docs:** Added architecture diagram to README
610
+
314
611
  ### v0.2.6
315
612
  - **Fix:** `Network.predict` now returns `number[]` (consistent with all other network classes)
316
- - **Fix:** `Network.train` now uses the configured optimizer and `activation.dfn()` instead of hardcoded SGD and sigmoid derivative
317
- - **Fix:** `LayerNorm.backwardOne` now correctly uses pre-update γ when computing the input gradient
318
- - **Fix:** LSTM and GRU gate initialization corrected from He (`√(2/n)`) to Xavier fan-in+out (`√(2/(fanIn+fanOut))`), matching the sigmoid/tanh activations used in those gates
319
- - **New:** `BiasVector` — 1D counterpart to `WeightMatrix` with per-scalar Adam optimizers; replaces repeated `number[] + Adam[]` pairs in `TransformerBlock`, `NetworkTransformer`, and `NetworkTransformerRL`
320
- - **New:** `defaultOptimizer` exported from `optimizers.ts` as the single source of truth for the default `() => new SGD()` factory
321
- - **Refactor:** `NetworkN.train` and `trainWithDeltas` share extracted `_forwardAll()` and `_backpropLayers()` internals — eliminates ~50 lines of duplication
322
- - **Refactor:** `Transformer` backward methods now throw descriptive errors instead of crashing with a cryptic `TypeError` when called before `predict()`
323
- - **Refactor:** `NetworkTransformer.setWeights()` and `NetworkTransformerRL.setWeightsFlat()` use each component's own `setWeights()` instead of direct `.W` mutation
613
+ - **Fix:** `Network.train` now uses the configured optimizer and `activation.dfn()`
614
+ - **Fix:** `LayerNorm.backwardOne` correctly uses pre-update γ
615
+ - **Fix:** LSTM and GRU gate initialization corrected to Xavier fan-in+out
616
+ - **New:** `BiasVector` — 1D counterpart to `WeightMatrix`
617
+ - **New:** `defaultOptimizer` — shared default factory
618
+ - **Refactor:** `NetworkN` extracts `_forwardAll()` and `_backpropLayers()`
324
619
 
325
620
  ### v0.2.5
326
- - Unified optimizer factories for `LSTMLayer`, `GRULayer`, `Conv1D` (per-scalar Adam/Momentum/SGD)
327
- - `NetworkN`: residual connections (`residual` option) and dropout (`dropoutRate`)
328
- - `Conv1D`: multi-channel input (`inputChannels`)
329
- - `NetworkTransformerRL`: configurable pooling (`avg` / `max` / `last` / `weighted`)
330
- - `Trainer`: weight decay, early stopping, classification metrics, gradient clipping support
331
- - `DataLoader`: validation split (`validationSplit` + `getValidationData()`)
332
- - `ModelSaver`: universal serialization via flat `getWeights()`/`setWeights()` for all classes
333
- - Gradient check test suite (`tests/GradientCheck.test.ts`)
334
-
335
- ## Possible improvements
336
-
337
- 1. **Support for batches** in training to improve efficiency and gradient stability.
338
- 2. **Global gradient norm clipping** — `WeightMatrix.update` supports per-element clipping; a utility to clip across all matrices by total norm would be more principled.
339
- 3. **Learning rate warmup** — standard practice for Transformers; ramp LR from 0 to target over the first N steps.
340
- 4. **Pre-norm architecture** — LayerNorm before the residual add (instead of after) is more stable for deep stacks.
621
+ - Unified optimizer factories for `LSTMLayer`, `GRULayer`, `Conv1D`
622
+ - `NetworkN`: residual connections and dropout
623
+ - `Conv1D`: multi-channel input
624
+ - `Trainer`: weight decay, early stopping, classification metrics
625
+ - `DataLoader`: validation split
626
+ - `ModelSaver`: universal serialization
341
627
 
342
628
  ## License
343
629