@dniskav/neuron 0.2.7 → 0.3.0

package/README.md CHANGED
@@ -3,7 +3,7 @@
3
3
 
4
4
  A minimal, dependency-free neural network library built from scratch in TypeScript. Designed for learning and experimentation — every line of math is readable.
5
5
 
6
- Each class is a building block for the next: from a single neuron to a full Transformer with causal attention.
6
+ Each class is a building block for the next: from a single neuron to a full Transformer with causal attention. v0.3.0 adds classical ML, unsupervised learning, generative models, autograd, and training utilities — all in pure TypeScript, zero dependencies.
7
7
 
8
8
  ```mermaid
9
9
  graph TD
@@ -20,13 +20,43 @@ graph TD
20
20
  K["NetworkTransformer\nembeddings → blocks → per-token logits"]
21
21
  L["NetworkTransformerRL\ncontinuous projection → causal attention → Q-values"]
22
22
 
23
+ subgraph Classical ML
24
+ P["Perceptron\nstep function · Rosenblatt rule"]
25
+ LR["LinearRegression\nnormal equation · gradient descent"]
26
+ LOG["LogisticRegression\nsigmoid · BCE · SoftmaxRegression"]
27
+ NB["GaussianNaiveBayes\nlog-probabilities · Gaussian P(x|c)"]
28
+ DT["DecisionTree\nCART · Gini · MSE split"]
29
+ end
30
+
31
+ subgraph Unsupervised
32
+ KM["KMeans\nK-Means++ · inertia · elbow"]
33
+ PCA["PCA\npower iteration · projection · reconstruction"]
34
+ SOM["SOM\nKohonen · BMU · Gaussian neighborhood"]
35
+ HN["HopfieldNetwork\nHebbian · energy · associative memory"]
36
+ AE["Autoencoder\nencoder · bottleneck · decoder"]
37
+ end
38
+
39
+ subgraph Generative
40
+ GAN["GAN\ngenerator · discriminator · min-max"]
41
+ VAE["VAE\nreparametrization trick · ELBO · KL"]
42
+ end
43
+
44
+ subgraph Autograd
45
+ TAP["Value / Tape\nreverse-mode · computational graph · backward"]
46
+ end
47
+
23
48
  A --> B --> C --> D --> E
24
49
  E --> F --> G
25
50
  E --> H --> I --> J --> K --> L
51
+ E --> AE
52
+ E --> GAN
53
+ E --> VAE
26
54
  ```
27
55
 
28
56
  ## What's inside
29
57
 
58
+ ### Neural network building blocks
59
+
30
60
  | Export | Description |
31
61
  |--------|-------------|
32
62
  | `Neuron` | Single-input neuron. The simplest possible unit: one weight, one bias. |
@@ -36,20 +66,115 @@ graph TD
36
66
  | `NetworkN` | Deep network of arbitrary depth. Define your architecture as `[inputs, ...hidden, outputs]`. |
37
67
  | `LSTMLayer` | Recurrent layer with persistent hidden and cell state. Learns sequences via BPTT. |
38
68
  | `NetworkLSTM` | Wraps an `LSTMLayer` + dense layers. Maintains memory across steps within an episode. |
69
+ | `GRULayer` | Gated Recurrent Unit — lighter alternative to LSTM, two gates instead of three. |
39
70
  | `NetworkTransformer` | Full token-classification Transformer: embeddings → N blocks → per-token logits. |
40
- | `NetworkTransformerRL` | Transformer for RL agents: continuous input projection → causal attention → Q-values. Remembers the last N steps. |
71
+ | `NetworkTransformerRL` | Transformer for RL agents: continuous input projection → causal attention → Q-values. |
41
72
  | `TransformerBlock` | One Transformer block: multi-head attention + FFN + LayerNorm × 2 with residuals. |
42
73
  | `MultiHeadAttention` | N parallel attention heads concatenated and projected to `d_model`. |
43
74
  | `AttentionHead` | Single scaled dot-product self-attention head (Q / K / V projections + backprop). |
75
+
76
+ ### Layers & components
77
+
78
+ | Export | Description |
79
+ |--------|-------------|
80
+ | `Conv1D` | 1D convolution over sequences. Multi-channel, configurable stride and padding. |
81
+ | `Conv2D` | 2D convolution for images. Kernels `[filters][kH][kW][C]`, full forward + backward. |
82
+ | `MaxPool2D` | Max pooling 2D. Stores position mask for exact gradient routing in backprop. |
83
+ | `Flatten` | Converts `[H][W][C]` tensors to flat vectors. Bridges Conv layers to dense layers. |
84
+ | `RNN` | Vanilla RNN with BPTT. Explicitly shows where and why gradients vanish. |
85
+ | `Seq2Seq` | Encoder + Decoder LSTMs with context vector transfer. Teacher forcing in training. |
86
+ | `CausalConv1D` | Causal dilated 1D convolution. One building block of a TCN. |
87
+ | `TCN` | Temporal Convolutional Network. Stacks causal dilated convolutions for sequences without recurrence. |
44
88
  | `LayerNorm` | Layer normalization with learnable γ / β per feature. |
45
- | `WeightMatrix` | 2D weight matrix with per-scalar Adam optimizers. Optional per-element gradient clipping via `update(dW, lr, clipValue)`. |
46
- | `BiasVector` | 1D bias vector with per-scalar Adam optimizers. Companion to `WeightMatrix` for bias terms. |
89
+ | `BatchNorm` | Batch normalization with running mean/variance for inference. |
90
+ | `Dropout` | Inverted dropout for regularization. Active only during training. |
91
+ | `WeightMatrix` | 2D weight matrix with per-scalar Adam optimizers and optional gradient clipping. |
92
+ | `BiasVector` | 1D bias vector with per-scalar Adam optimizers. |
47
93
  | `EmbeddingMatrix` | Lookup-table embedding matrix with SGD updates. |
48
- | `sigmoid` `relu` `tanh` `linear` | Built-in activation functions. |
49
- | `SGD` `Momentum` `Adam` `ClipOptimizer` | Optimizers. Each instance tracks its own state per weight. `ClipOptimizer` wraps any optimizer with gradient clipping. |
50
- | `defaultOptimizer` | Default `OptimizerFactory` (`() => new SGD()`). Shared across `NeuronN`, `Layer`, `NetworkN`, `NetworkLSTM`. |
51
- | `mse` `crossEntropy` | Loss functions for evaluation and logging. |
52
- | `mseDelta` `crossEntropyDelta` | Output-layer delta functions for use with `trainWithDeltas`. |
94
+
95
+ ### Classical ML
96
+
97
+ | Export | Description |
98
+ |--------|-------------|
99
+ | `Perceptron` | The historical Rosenblatt perceptron (1957). Step function, linear rule. Shows why XOR is impossible. |
100
+ | `LinearRegression` | Closed-form normal equation `(XᵀX)⁻¹Xᵀy` + gradient descent mode. Pure array arithmetic. |
101
+ | `LogisticRegression` | Sigmoid + binary cross-entropy, no hidden layers. The boundary between classical ML and neural nets. |
102
+ | `SoftmaxRegression` | Multinomial logistic regression. Log-sum-exp trick for numerical stability. |
103
+ | `GaussianNaiveBayes` | `P(c|x) ∝ P(c)·∏P(xᵢ|c)` in log-space. Zero gradient descent — pure Bayes. |
104
+ | `DecisionTree` | CART with Gini impurity (classification) or variance (regression). Fully recursive. |
105
+
106
+ ### Unsupervised learning
107
+
108
+ | Export | Description |
109
+ |--------|-------------|
110
+ | `KMeans` | K-Means++ initialization + Lloyd's algorithm. `inertia()` for the elbow method. |
111
+ | `PCA` | Principal Component Analysis via power iteration + Hotelling deflation. Projects, reconstructs, explains variance. |
112
+ | `SOM` | Self-Organizing Map (Kohonen). BMU search, Gaussian neighborhood, topology preservation. |
113
+ | `HopfieldNetwork` | Associative memory. Hebbian storage, energy function, async recall. Capacity ~0.138·N. |
114
+ | `Autoencoder` | Encoder + bottleneck + decoder using two `NetworkN` instances. Learns compressed representations. |
115
+
116
+ ### Generative models
117
+
118
+ | Export | Description |
119
+ |--------|-------------|
120
+ | `GAN` | Generator vs Discriminator min-max game. Documents Nash equilibrium and mode collapse. |
121
+ | `VAE` | Variational Autoencoder. Reparametrization trick, ELBO = reconstruction + KL divergence. |
122
+
123
+ ### Automatic differentiation
124
+
125
+ | Export | Description |
126
+ |--------|-------------|
127
+ | `Value` | Scalar autograd node. Builds a computational graph and propagates gradients with `.backward()`. Inspired by micrograd. |
128
+
129
+ ### Activations & math
130
+
131
+ | Export | Description |
132
+ |--------|-------------|
133
+ | `sigmoid` `relu` `tanh` `linear` `leakyRelu` `elu` | Built-in activation functions with `fn` and `dfn` (derivative from output). |
134
+ | `makeLeakyRelu(α)` `makeElu(α)` | Parametric variants. |
135
+ | `matMul` `transpose` `softmax` `softmaxBackward` | Matrix math utilities. |
136
+
137
+ ### Optimizers
138
+
139
+ | Export | Description |
140
+ |--------|-------------|
141
+ | `SGD` | Vanilla stochastic gradient descent. Stateless. |
142
+ | `Momentum` | Accumulates velocity in the gradient direction. |
143
+ | `Adam` | Adaptive moment estimation. Per-parameter first and second moments with bias correction. |
144
+ | `ClipOptimizer` | Wraps any optimizer with gradient clipping. |
145
+ | `ClippedOptimizerFactory` | Factory wrapper that clips all created optimizers. |
146
+ | `defaultOptimizer` | Default factory (`() => new SGD()`). Shared fallback across all classes. |
147
+
148
+ ### Loss functions
149
+
150
+ | Export | Description |
151
+ |--------|-------------|
152
+ | `mse` `crossEntropy` | Scalar loss functions for evaluation and logging. |
153
+ | `mseDelta` `crossEntropyDelta` `crossEntropyDeltaRaw` | Output-layer delta functions for `trainWithDeltas`. |
154
+
155
+ ### Metrics & evaluation
156
+
157
+ | Export | Description |
158
+ |--------|-------------|
159
+ | `confusionMatrix` | Returns `number[][]` confusion matrix. |
160
+ | `accuracy` `precision` `recall` `f1Score` | Standard classification metrics. |
161
+ | `rocCurve` `auc` | ROC curve points and area under the curve (trapezoidal rule). |
162
+ | `mae` `rmse` `r2Score` | Regression metrics. |
163
+ | `perplexity` | `exp(mean cross-entropy)` — natural metric for language models. |
164
+ | `printConfusionMatrix` `classificationReport` | Console-formatted output tables. |
165
+
166
+ ### Training utilities
167
+
168
+ | Export | Description |
169
+ |--------|-------------|
170
+ | `Trainer` | Training loop with epochs, batches, metrics, and callbacks. |
171
+ | `DataLoader` | Dataset wrapper with shuffling and validation split. |
172
+ | `LRScheduler` | Learning rate schedules (step, exponential, cosine). |
173
+ | `EarlyStopping` | Stops training when a metric stalls. Configurable patience, mode, and best-weight restore. |
174
+ | `LossPlotter` | Renders a loss curve as ASCII art in the terminal. |
175
+ | `WeightInspector` | Per-layer weight statistics (mean, std, dead weights). Detects dead ReLUs. |
176
+ | `DataAugmentation` | Noise, jitter, normalization, z-score, shuffle, train/val/test split. |
177
+ | `ModelSaver` | Universal serialization via flat `getWeights()` / `setWeights()`. |
53
178
 
54
179
  ## Install
55
180
 
@@ -66,303 +191,439 @@ import { Neuron } from "@dniskav/neuron";
66
191
 
67
192
  const neuron = new Neuron();
68
193
 
69
- // Train: output 1 if input >= 18, else 0
70
194
  for (let epoch = 0; epoch < 1000; epoch++) {
71
195
  neuron.train(20, 1, 0.1); // adult
72
196
  neuron.train(15, 0, 0.1); // minor
73
197
  }
74
198
 
75
- console.log(neuron.predict(17)); // ~0.1 (minor)
76
- console.log(neuron.predict(25)); // ~0.9 (adult)
199
+ console.log(neuron.predict(17)); // ~0.1
200
+ console.log(neuron.predict(25)); // ~0.9
77
201
  ```
78
202
 
79
- ### N-input neuron: multi-feature classification
203
+ ### NetworkN: deep network with custom architecture
80
204
 
81
205
  ```ts
82
- import { NeuronN } from "@dniskav/neuron";
83
-
84
- const neuron = new NeuronN(3); // 3 inputs: R, G, B
206
+ import { NetworkN, relu, sigmoid, Adam } from "@dniskav/neuron";
85
207
 
86
- // Teach it to detect bright colors (luminance > 0.65)
87
- neuron.train([1, 1, 1], 1, 0.05); // white → bright
88
- neuron.train([0, 0, 0], 0, 0.05); // black → dark
208
+ const net = new NetworkN([3, 64, 32, 1], {
209
+ activations: [relu, relu, sigmoid],
210
+ optimizer: () => new Adam(),
211
+ });
89
212
 
90
- console.log(neuron.predict([0.9, 0.9, 0.9])); // close to 1
213
+ net.train([0.5, 0.3, 0.8], [1], 0.001);
214
+ const [out] = net.predict([0.5, 0.3, 0.8]);
91
215
  ```
92
216
 
93
- ### Network: non-linear classification
217
+ ### Historical Perceptron: step function, no hidden layers
94
218
 
95
219
  ```ts
96
- import { Network } from "@dniskav/neuron";
97
-
98
- // 2 inputs → 8 hidden neurons → 1 output
99
- const net = new Network(2, 8, 1);
220
+ import { Perceptron } from "@dniskav/neuron";
100
221
 
101
- // Train on XOR (not linearly separable — needs hidden layer)
102
- const data = [[0,0,0], [0,1,1], [1,0,1], [1,1,0]];
222
+ const p = new Perceptron(2);
103
223
 
104
- for (let epoch = 0; epoch < 5000; epoch++) {
105
- for (const [x, y, t] of data) {
106
- net.train([x, y], t, 0.3);
107
- }
108
- }
224
+ // Learns AND gate (linearly separable)
225
+ const data = [[0,0,0],[0,1,0],[1,0,0],[1,1,1]];
226
+ for (let e = 0; e < 100; e++)
227
+ for (const [a, b, t] of data) p.train([a, b], t, 0.1);
109
228
 
110
- console.log(net.predict([0, 1])[0]); // ~0.97
111
- console.log(net.predict([1, 1])[0]); // ~0.03
229
+ console.log(p.predict([1, 1])); // 1
230
+ console.log(p.predict([0, 1])); // 0
231
+ // XOR cannot be learned — not linearly separable
112
232
  ```
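The Rosenblatt rule behind this example can be sketched without the library. The block below is a minimal standalone illustration, not the package's `Perceptron` implementation; `trainPerceptron`, `Sample`, and `step` are hypothetical names, and the zero initialization is an assumption.

```ts
// Hypothetical standalone sketch of the perceptron learning rule.
type Sample = { x: number[]; t: number };

// Step activation: fires only when the weighted sum is strictly positive.
const step = (z: number): number => (z > 0 ? 1 : 0);

function trainPerceptron(data: Sample[], lr: number, epochs: number) {
  const w: number[] = new Array(data[0].x.length).fill(0); // assumption: zero init
  let b = 0;
  for (let e = 0; e < epochs; e++) {
    for (const { x, t } of data) {
      const z = x.reduce((s, xi, i) => s + xi * w[i], b);
      const err = t - step(z); // -1, 0 or +1
      for (let i = 0; i < w.length; i++) w[i] += lr * err * x[i];
      b += lr * err;
    }
  }
  const predict = (x: number[]): number =>
    step(x.reduce((s, xi, i) => s + xi * w[i], b));
  return { w, b, predict };
}

const andGate: Sample[] = [
  { x: [0, 0], t: 0 },
  { x: [0, 1], t: 0 },
  { x: [1, 0], t: 0 },
  { x: [1, 1], t: 1 },
];
const perceptron = trainPerceptron(andGate, 0.1, 100);
console.log(perceptron.predict([1, 1]), perceptron.predict([0, 1]));
```

Because the update only fires on misclassified samples, training stops changing the weights once a separating line is found. For XOR no such line exists, so the weights keep oscillating.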
113
233
 
114
- ### NetworkN: deep network with custom architecture
234
+ ### Linear Regression: normal equation
115
235
 
116
236
  ```ts
117
- import { NetworkN } from "@dniskav/neuron";
237
+ import { LinearRegression } from "@dniskav/neuron";
118
238
 
119
- // 3 inputs → 24 hidden → 16 hidden → 2 outputs
120
- const net = new NetworkN([3, 24, 16, 2]);
239
+ const model = new LinearRegression();
121
240
 
122
- // Train with multiple targets
123
- net.train([0.5, 0.3, 0.8], [1, 0], 0.05);
241
+ // Exact closed-form solution in one call
242
+ model.fitNormal(
243
+ [[1], [2], [3], [4]], // X
244
+ [2, 4, 6, 8] // y = 2x
245
+ );
124
246
 
125
- // Predict returns an array — one value per output neuron
126
- const [out1, out2] = net.predict([0.5, 0.3, 0.8]);
247
+ console.log(model.predict([5])); // ~10
248
+ console.log(model.getCoefficients()); // { weights: [2], bias: ~0 }
127
249
  ```
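For readers curious what a closed-form fit involves, here is a rough sketch under stated assumptions. It solves the system `(XᵀX)·θ = Xᵀy` by Gaussian elimination instead of forming the inverse explicitly; both give the same θ when `XᵀX` is invertible. `solve` and `fitNormalEquation` are hypothetical helpers, not the library's `fitNormal`.

```ts
// Solve A·x = b by Gaussian elimination with partial pivoting (A is square).
function solve(A: number[][], b: number[]): number[] {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]); // augmented matrix [A | b]
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++)
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    [M[col], M[pivot]] = [M[pivot], M[col]];
    for (let r = col + 1; r < n; r++) {
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  const x: number[] = new Array(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let s = M[r][n];
    for (let c = r + 1; c < n; c++) s -= M[r][c] * x[c];
    x[r] = s / M[r][r]; // back substitution
  }
  return x;
}

// Fit y ≈ X·w + bias; a trailing column of ones carries the bias term.
function fitNormalEquation(X: number[][], y: number[]) {
  const Xb = X.map((row) => [...row, 1]);
  const d = Xb[0].length;
  const XtX = Array.from({ length: d }, (_, i) =>
    Array.from({ length: d }, (_, j) =>
      Xb.reduce((s, row) => s + row[i] * row[j], 0)));
  const Xty = Array.from({ length: d }, (_, i) =>
    Xb.reduce((s, row, k) => s + row[i] * y[k], 0));
  const theta = solve(XtX, Xty);
  return { weights: theta.slice(0, d - 1), bias: theta[d - 1] };
}

const fit = fitNormalEquation([[1], [2], [3], [4]], [2, 4, 6, 8]); // y = 2x
console.log(fit.weights, fit.bias);
```

Solving the system directly is usually more stable numerically than inverting `XᵀX`. The sketch still fails when features are perfectly collinear, because the pivot then becomes zero.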
128
250
 
129
- ### Activations: ReLU, tanh, and more
251
+ ### Logistic Regression: sigmoid + BCE
252
+
253
+ ```ts
254
+ import { LogisticRegression } from "@dniskav/neuron";
255
+
256
+ const clf = new LogisticRegression(2);
257
+ const lossHistory = clf.train(
258
+ [[0,0],[1,1],[1,0],[0,1]],
259
+ [0, 1, 1, 0],
260
+ 0.1, 500
261
+ );
262
+
263
+ console.log(clf.classify([0.9, 0.9])); // 1
264
+ console.log(clf.classify([0.1, 0.1])); // 0
265
+ ```
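The multiclass counterpart, `SoftmaxRegression`, is described above as using the log-sum-exp trick. A minimal sketch of that trick follows; `logSumExp` and `stableSoftmax` are illustrative names and are not taken from the package.

```ts
// log(Σ exp(z_i)) computed without overflow by factoring out the maximum.
function logSumExp(z: number[]): number {
  const m = Math.max(...z);
  return m + Math.log(z.reduce((s, zi) => s + Math.exp(zi - m), 0));
}

// Softmax expressed through logSumExp: p_i = exp(z_i - logSumExp(z)).
function stableSoftmax(z: number[]): number[] {
  const lse = logSumExp(z);
  return z.map((zi) => Math.exp(zi - lse));
}

// The naive formula overflows for large logits: Infinity / Infinity is NaN.
const naive = Math.exp(1000) / (Math.exp(1000) + Math.exp(999));
const probs = stableSoftmax([1000, 999]);
console.log(naive, probs);
```

Subtracting the maximum leaves the result mathematically unchanged, because the factor `exp(-m)` cancels between numerator and denominator.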
130
266
 
131
- Pass an activation per layer. The last layer typically uses `sigmoid` for binary output or `linear` for regression.
267
+ ### Gaussian Naive Bayes: zero gradient descent
132
268
 
133
269
  ```ts
134
- import { NetworkN, relu, sigmoid } from "@dniskav/neuron";
270
+ import { GaussianNaiveBayes } from "@dniskav/neuron";
135
271
 
136
- const net = new NetworkN([3, 64, 32, 1], {
137
- activations: [relu, relu, sigmoid],
138
- });
272
+ const nb = new GaussianNaiveBayes();
273
+ nb.fit(
274
+ [[1.2, 0.5], [1.4, 0.7], [5.0, 4.5], [5.2, 4.8]],
275
+ [0, 0, 1, 1]
276
+ );
277
+
278
+ console.log(nb.predict([1.3, 0.6])); // 0
279
+ console.log(nb.predict([5.1, 4.6])); // 1
139
280
  ```
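The log-space scoring can be sketched in a few lines. This is an assumption-laden illustration rather than the library's code: `fitGaussianNB`, `predictGaussianNB`, and the variance floor `eps` are hypothetical.

```ts
type ClassStats = { label: number; logPrior: number; mean: number[]; variance: number[] };

// "Training" is just counting: per-class prior, mean, and variance of each feature.
function fitGaussianNB(X: number[][], y: number[], eps = 1e-9): ClassStats[] {
  const labels = y.filter((v, i) => y.indexOf(v) === i); // unique labels
  return labels.map((label) => {
    const rows = X.filter((_, i) => y[i] === label);
    const d = rows[0].length;
    const mean = Array.from({ length: d }, (_, j) =>
      rows.reduce((s, r) => s + r[j], 0) / rows.length);
    const variance = Array.from({ length: d }, (_, j) =>
      rows.reduce((s, r) => s + (r[j] - mean[j]) ** 2, 0) / rows.length + eps);
    return { label, logPrior: Math.log(rows.length / X.length), mean, variance };
  });
}

// Score each class with log P(c) + Σ log N(x_j | mean_j, variance_j); pick the best.
function predictGaussianNB(stats: ClassStats[], x: number[]): number {
  let best = stats[0].label;
  let bestScore = -Infinity;
  for (const { label, logPrior, mean, variance } of stats) {
    const score = x.reduce(
      (s, xj, j) =>
        s - 0.5 * Math.log(2 * Math.PI * variance[j]) - (xj - mean[j]) ** 2 / (2 * variance[j]),
      logPrior);
    if (score > bestScore) { bestScore = score; best = label; }
  }
  return best;
}

const nbStats = fitGaussianNB(
  [[1.2, 0.5], [1.4, 0.7], [5.0, 4.5], [5.2, 4.8]],
  [0, 0, 1, 1]);
console.log(predictGaussianNB(nbStats, [1.3, 0.6]), predictGaussianNB(nbStats, [5.1, 4.6]));
```

Summing logs instead of multiplying probabilities avoids underflow when there are many features. The small `eps` keeps a zero-variance feature from causing a division by zero.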
140
281
 
141
- Available: `sigmoid`, `relu`, `tanh`, `linear`.
282
+ ### Decision Tree: Gini split
142
283
 
143
- ### Optimizers — Adam, Momentum, SGD
284
+ ```ts
285
+ import { DecisionTree } from "@dniskav/neuron";
286
+
287
+ const tree = new DecisionTree({ maxDepth: 4, task: 'classification' });
288
+ tree.fit(X_train, y_train);
289
+ const predictions = tree.predictBatch(X_test);
290
+ ```
144
291
 
145
- Pass an optimizer factory. Each weight gets its own instance with independent state.
292
+ ### K-Means: unsupervised clustering
146
293
 
147
294
  ```ts
148
- import { NetworkN, relu, sigmoid, Adam } from "@dniskav/neuron";
295
+ import { KMeans } from "@dniskav/neuron";
149
296
 
150
- const net = new NetworkN([2, 64, 1], {
151
- activations: [relu, sigmoid],
152
- optimizer: () => new Adam(), // default: beta1=0.9, beta2=0.999
153
- });
297
+ const km = new KMeans(3); // 3 clusters
298
+ km.fit(points);
154
299
 
155
- // Momentum example
156
- import { Momentum } from "@dniskav/neuron";
157
- const net2 = new NetworkN([2, 32, 1], {
158
- optimizer: () => new Momentum(0.9),
159
- });
300
+ const cluster = km.predict([1.2, 0.5]); // index 0, 1 or 2
301
+ console.log(km.inertia(points)); // lower = better fit
160
302
  ```
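To make the Lloyd iteration and the inertia measure concrete, here is a small standalone sketch. It assumes a naive initialization (the first two points) instead of K-Means++, and all helper names (`sqDist`, `nearest`, `lloyd`, `inertiaOf`) are hypothetical.

```ts
const sqDist = (a: number[], b: number[]): number =>
  a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0);

// Index of the closest centroid to point p.
const nearest = (p: number[], centroids: number[][]): number =>
  centroids.reduce(
    (best, c, k) => (sqDist(p, c) < sqDist(p, centroids[best]) ? k : best), 0);

// Lloyd's algorithm: alternate between assigning points and recomputing means.
function lloyd(points: number[][], init: number[][], iters = 20): number[][] {
  let centroids = init.map((c) => [...c]);
  for (let it = 0; it < iters; it++) {
    const labels = points.map((p) => nearest(p, centroids));
    centroids = centroids.map((c, k) => {
      const members = points.filter((_, i) => labels[i] === k);
      if (members.length === 0) return c; // an empty cluster keeps its centroid
      return c.map((_, j) => members.reduce((s, m) => s + m[j], 0) / members.length);
    });
  }
  return centroids;
}

// Inertia: sum of squared distances from each point to its nearest centroid.
const inertiaOf = (points: number[][], centroids: number[][]): number =>
  points.reduce((s, p) => s + sqDist(p, centroids[nearest(p, centroids)]), 0);

const pts = [[0, 0], [0, 1], [10, 10], [10, 11]];
const centers = lloyd(pts, [pts[0], pts[1]]); // naive init, an assumption
console.log(centers, inertiaOf(pts, centers));
```

For the elbow method, one would run this for several values of k and look for the point where inertia stops dropping sharply.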
161
303
 
162
- Optimizers also work in `NetworkLSTM` (applied to the dense layers):
304
+ ### PCA: dimensionality reduction
163
305
 
164
306
  ```ts
165
- import { NetworkLSTM, relu, Adam } from "@dniskav/neuron";
307
+ import { PCA } from "@dniskav/neuron";
166
308
 
167
- const net = new NetworkLSTM(1, 8, [4, 1], {
168
- denseActivation: relu,
169
- optimizer: () => new Adam(0.001),
170
- });
309
+ const pca = new PCA(2); // keep top 2 components
310
+ pca.fit(X); // 100 samples × 10 features
311
+
312
+ const Z = pca.transform(X); // 100 × 2
313
+ const X2 = pca.inverseTransform(Z); // reconstructed 100 × 10
314
+
315
+ console.log(pca.explainedVarianceRatio()); // [0.72, 0.15, ...]
171
316
  ```
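Power iteration, the technique named in the `PCA` description, can be illustrated independently of the library. The sketch below finds only the first component; `covariance` and `topComponent` are hypothetical helpers, and the starting vector is an arbitrary assumption.

```ts
// Covariance matrix of mean-centered data (rows are samples).
function covariance(X: number[][]): number[][] {
  const n = X.length;
  const d = X[0].length;
  const mean = Array.from({ length: d }, (_, j) => X.reduce((s, r) => s + r[j], 0) / n);
  return Array.from({ length: d }, (_, i) =>
    Array.from({ length: d }, (_, j) =>
      X.reduce((s, r) => s + (r[i] - mean[i]) * (r[j] - mean[j]), 0) / (n - 1)));
}

const matVec = (C: number[][], v: number[]): number[] =>
  C.map((row) => row.reduce((s, c, j) => s + c * v[j], 0));

// Power iteration: apply C and renormalize until v aligns with the top eigenvector.
function topComponent(C: number[][], iters = 200) {
  let v = C.map((_, i) => (i === 0 ? 1 : 0.5)); // arbitrary non-zero start
  for (let it = 0; it < iters; it++) {
    const Cv = matVec(C, v);
    const norm = Math.sqrt(Cv.reduce((s, x) => s + x * x, 0));
    v = Cv.map((x) => x / norm);
  }
  // Rayleigh quotient vᵀCv gives the eigenvalue (variance along v).
  const eigenvalue = matVec(C, v).reduce((s, x, i) => s + x * v[i], 0);
  return { vector: v, eigenvalue };
}

const line = [[1, 1], [2, 2], [3, 3], [4, 4]]; // all variance lies along (1, 1)
const pc1 = topComponent(covariance(line));
console.log(pc1.vector, pc1.eigenvalue);
```

Hotelling deflation would then subtract `eigenvalue · v·vᵀ` from the covariance matrix and repeat the iteration to obtain the next component.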
172
317
 
173
- ### Loss utilities
318
+ ### Self-Organizing Map
174
319
 
175
320
  ```ts
176
- import { mse, crossEntropy } from "@dniskav/neuron";
321
+ import { SOM } from "@dniskav/neuron";
177
322
 
178
- const predicted = net.predict([0.5, 0.3]);
179
- console.log(mse(predicted, [1, 0]));
180
- console.log(crossEntropy(predicted, [1, 0]));
181
- ```
323
+ const som = new SOM(10, 10, 3); // 10×10 grid, 3-dimensional inputs (RGB)
324
+ som.train(colors, 500);
182
325
 
183
- ### trainWithDeltas: custom loss / physics-based gradients
326
+ const [row, col] = som.getBMU([255, 0, 0]); // find best matching unit for red
327
+ console.log(som.quantizationError(colors));
328
+ ```
184
329
 
185
- `NetworkN` also exposes `trainWithDeltas` for when you compute your own output-layer deltas (e.g., from a physics simulation or a custom loss function):
330
+ ### Hopfield Network: associative memory
186
331
 
187
332
  ```ts
188
- import { NetworkN, mseDelta } from "@dniskav/neuron";
333
+ import { HopfieldNetwork } from "@dniskav/neuron";
189
334
 
190
- const net = new NetworkN([3, 16, 2]);
191
- const pred = net.predict(inputs);
335
+ const net = new HopfieldNetwork(64); // 64 binary neurons
192
336
 
193
- // Compute deltas manually using a helper, or from any external signal
194
- const deltas = pred.map((p, i) => mseDelta(p, targets[i]));
195
- net.trainWithDeltas(inputs, deltas, 0.01);
196
- ```
337
+ // Store two 64-bit patterns
338
+ net.store(HopfieldNetwork.binarize(pattern1)); // converts 0/1 → -1/+1
339
+ net.store(HopfieldNetwork.binarize(pattern2));
197
340
 
198
- ### NetworkLSTM: recurrent network with memory
341
+ // Recall from noisy input
342
+ const recovered = net.recall(HopfieldNetwork.binarize(noisyPattern1));
343
+ console.log(net.energy(recovered)); // local minimum = stored memory
344
+ ```
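What `store`, `recall`, and `energy` might do internally can be sketched as follows. This is a guess at the mechanism based on the standard Hopfield formulation, not the package's source; `hebbianWeights`, `energyOf`, and `recallPattern` are hypothetical names.

```ts
// Hebbian storage: W += p·pᵀ for each pattern, with a zero diagonal.
function hebbianWeights(patterns: number[][]): number[][] {
  const n = patterns[0].length;
  const W = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  for (const p of patterns)
    for (let i = 0; i < n; i++)
      for (let j = 0; j < n; j++)
        if (i !== j) W[i][j] += p[i] * p[j];
  return W;
}

// Energy E(s) = -½ · Σ W[i][j]·s[i]·s[j]; stored patterns sit in local minima.
const energyOf = (W: number[][], s: number[]): number =>
  -0.5 * W.reduce(
    (acc, row, i) => acc + row.reduce((a, wij, j) => a + wij * s[i] * s[j], 0), 0);

// Asynchronous recall: update one unit at a time until a full sweep changes nothing.
function recallPattern(W: number[][], input: number[], maxSweeps = 10): number[] {
  const s = [...input];
  for (let sweep = 0; sweep < maxSweeps; sweep++) {
    let changed = false;
    for (let i = 0; i < s.length; i++) {
      const h = W[i].reduce((a, wij, j) => a + wij * s[j], 0);
      const next = h >= 0 ? 1 : -1;
      if (next !== s[i]) { s[i] = next; changed = true; }
    }
    if (!changed) break;
  }
  return s;
}

const stored = [1, -1, 1, -1, 1, 1, -1, -1];
const W = hebbianWeights([stored]);
const noisy = [...stored];
noisy[2] = -noisy[2]; // flip one bit
const recalled = recallPattern(W, noisy);
console.log(recalled, energyOf(W, noisy), energyOf(W, recalled));
```

Each asynchronous update can only lower the energy or leave it unchanged, which is why recall settles into a stored pattern when the input starts close enough to one.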
199
345
 
200
- `NetworkLSTM` adds within-episode memory: the network can remember what happened in previous steps of the same sequence.
346
+ ### Autoencoder: learn compressed representations
201
347
 
202
348
  ```ts
203
- import { NetworkLSTM } from "@dniskav/neuron";
349
+ import { Autoencoder } from "@dniskav/neuron";
204
350
 
205
- // 1 input → LSTM(8 hidden) → Dense(4) → 1 output
206
- const net = new NetworkLSTM(1, 8, [4, 1]);
351
+ // 784 → [128, 64] → 16 (latent) → [64, 128] → 784
352
+ const ae = new Autoencoder(784, [128, 64], 16, [64, 128]);
207
353
 
208
- // Task: predict 1 if we're past step 3 in the episode, else 0
209
- // A feedforward net can't do this — it has no memory of step count.
354
+ for (let e = 0; e < 1000; e++)
355
+ for (const x of images)
356
+ ae.train(x, 0.001);
210
357
 
211
- for (let epoch = 0; epoch < 300; epoch++) {
212
- net.resetState(); // clear memory at episode start
358
+ const latent = ae.encode(image); // compressed: 16 values
359
+ const reconstructed = ae.reconstruct(image); // decoded back: 784 values
360
+ ```
213
361
 
214
- const targets: number[][] = [];
215
- for (let step = 0; step < 6; step++) {
216
- net.predict([1]); // same input every step
217
- targets.push([step >= 3 ? 1 : 0]);
218
- }
362
+ ### GAN: generative adversarial training
219
363
 
220
- net.train(targets, 0.05); // BPTT across the full episode
364
+ ```ts
365
+ import { GAN } from "@dniskav/neuron";
366
+
367
+ const gan = new GAN(
368
+ 16, // latentDim
369
+ [32, 64], // generator hidden layers
370
+ 8, // outputDim (size of generated samples)
371
+ [64, 32], // discriminator hidden layers
372
+ );
373
+
374
+ for (let step = 0; step < 10000; step++) {
375
+ const { dLoss, gLoss } = gan.trainStep(realBatch, 0.0002);
376
+ if (step % 500 === 0) console.log(`D: ${dLoss.toFixed(3)} G: ${gLoss.toFixed(3)}`);
221
377
  }
222
378
 
223
- // Run a fresh episode and check predictions
224
- net.resetState();
225
- for (let step = 0; step < 6; step++) {
226
- const [out] = net.predict([1]);
227
- console.log(`step ${step}: ${out.toFixed(2)} (expected: ${step >= 3 ? 1 : 0})`);
379
+ const fake = gan.generate(); // new synthetic sample
380
+ ```
381
+
382
+ ### VAE: variational autoencoder
383
+
384
+ ```ts
385
+ import { VAE } from "@dniskav/neuron";
386
+
387
+ const vae = new VAE(784, [256, 128], 32, [128, 256]);
388
+
389
+ for (const x of dataset) {
390
+ const { totalLoss, reconLoss, klLoss } = vae.train(x, 0.001);
228
391
  }
229
- // step 0: 0.07 (expected: 0)
230
- // step 1: 0.11 (expected: 0)
231
- // step 2: 0.18 (expected: 0)
232
- // step 3: 0.81 (expected: 1)
233
- // step 4: 0.89 (expected: 1)
234
- // step 5: 0.93 (expected: 1)
392
+
393
+ // Sample from latent space
394
+ const generated = vae.generate(); // random sample
395
+ const { mu, logVar } = vae.encode(image); // encode → distribution params
396
+ const z = vae.reparametrize(mu, logVar); // sample z ~ N(μ, σ²)
235
397
  ```
236
398
 
237
- The network learns to count steps using its hidden state; no external counter is needed.
399
+ ### Value / Tape: automatic differentiation
238
400
 
239
- ## How it works
401
+ ```ts
402
+ import { Value } from "@dniskav/neuron";
240
403
 
241
- Each class applies an **activation function** to the weighted sum of inputs and uses **gradient descent** to update weights:
404
+ // Build a computation graph
405
+ const x = new Value(2.0);
406
+ const w = new Value(-3.0);
407
+ const b = new Value(6.7);
408
+ const n = x.mul(w).add(b); // n = x*w + b
409
+ const o = n.tanh(); // o = tanh(n)
242
410
 
411
+ // Backward pass — fills .grad for every node
412
+ o.backward();
413
+
414
+ console.log(x.grad); // ∂o/∂x
415
+ console.log(w.grad); // ∂o/∂w
416
+ console.log(b.grad); // ∂o/∂b
243
417
  ```
244
- weight += lr × delta × input
245
- bias += lr × delta
418
+
419
+ ### Conv2D + MaxPool2D + Flatten — CNN pipeline
420
+
421
+ ```ts
422
+ import { Conv2D, MaxPool2D, Flatten, NetworkN, relu, sigmoid } from "@dniskav/neuron";
423
+
424
+ const conv = new Conv2D(28, 28, 1, 3, 8); // 28×28×1 → 26×26×8
425
+ const pool = new MaxPool2D(2); // 26×26×8 → 13×13×8
426
+ const flatten = new Flatten();
427
+ const dense = new NetworkN([13*13*8, 64, 10]);
428
+
429
+ // Forward
430
+ const featureMaps = conv.forward(image); // [H][W][C]
431
+ const pooled = pool.forward(featureMaps);
432
+ const flat = flatten.forward(pooled); // 1352 values
433
+ const logits = dense.predict(flat);
246
434
  ```
247
435
 
248
- `NetworkN` implements full **backpropagation** across all layers, propagating deltas from the output back to the first layer using the chain rule. The derivative of the chosen activation is applied at each layer.
436
+ ### RNN: vanilla recurrent network
249
437
 
250
- `NeuronN` uses simplified **Xavier initialization** — weights start in `[-√(1/n), +√(1/n)]` — so gradients flow well from the start of training.
438
+ ```ts
439
+ import { RNN } from "@dniskav/neuron";
251
440
 
252
- When an **optimizer** is used (e.g., Adam), the raw gradient is passed to the optimizer instead of being applied directly. Each weight maintains its own optimizer state (velocity, moments).
441
+ // 1 input → 16 hidden → 1 output, over a sequence
442
+ const rnn = new RNN(1, 16, 1);
253
443
 
254
- ## Build
444
+ const sequence = [[0.1], [0.3], [0.7], [0.9]]; // 4 timesteps
445
+ const { outputs, hiddens } = rnn.forward(sequence);
255
446
 
256
- ```bash
257
- npm run build # outputs CJS + ESM + type declarations to dist/
258
- npm run dev # watch mode
447
+ // BPTT backward — returns MSE loss
448
+ const targets = [[0.2], [0.5], [0.8], [1.0]];
449
+ const loss = rnn.backward(sequence, targets, 0.01);
259
450
  ```
260
451
 
261
- ## For AI agents
452
+ ### TCN: Temporal Convolutional Network
262
453
 
263
- If you are an AI agent or LLM working with this codebase, read [AGENTS.md](AGENTS.md) first. It contains the full class hierarchy, design constraints, and what this library does not do.
454
+ ```ts
455
+ import { TCN } from "@dniskav/neuron";
456
+
457
+ // 3 input channels → 32 channels × 4 levels → 1 output
458
+ // Receptive field = (3-1)·(2⁴-1)+1 = 31 timesteps
459
+ const tcn = new TCN(3, 32, 3, 4, 1);
460
+
461
+ const sequence = Array.from({ length: 50 }, () => [Math.random(), Math.random(), Math.random()]);
462
+ const outputs = tcn.forward(sequence); // [50][1]
463
+ ```
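The receptive-field formula in the comment above can be checked with a short calculation. The sketch assumes one causal convolution per level with dilations 1, 2, 4, and so on; an implementation that stacks two convolutions per level would reach further back. `receptiveField` is a hypothetical helper.

```ts
// Assumption: one causal convolution per level, dilation doubling each level.
function receptiveField(kernelSize: number, levels: number): number {
  let field = 1; // the current timestep itself
  for (let level = 0; level < levels; level++) {
    const dilation = 2 ** level;
    field += (kernelSize - 1) * dilation; // each layer reaches (k-1)·dilation steps further back
  }
  return field;
}

const rf = receptiveField(3, 4); // 1 + 2·(1 + 2 + 4 + 8)
console.log(rf);
```

The sum of dilations is `2^levels - 1`, which gives the closed form `(k - 1)·(2^levels - 1) + 1` for the same assumptions.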
264
464
 
265
- ### NetworkTransformer: self-attention over sequences
465
+ ### NetworkLSTM: recurrent memory
266
466
 
267
467
  ```ts
268
- import { NetworkTransformer } from "@dniskav/neuron";
269
-
270
- // Sudoku solver: 81 cells (tokens), values 0–9, predict digit 1–9 per cell
271
- const net = new NetworkTransformer(81, {
272
- vocabSize: 10, // digits 0–9
273
- d_model: 64, // embedding / hidden dimension
274
- nHeads: 4, // attention heads (d_k = d_model / nHeads = 16)
275
- d_ff: 128, // FFN hidden size
276
- nBlocks: 4, // number of transformer blocks
277
- nClasses: 9, // output classes per token (digits 1–9)
278
- });
468
+ import { NetworkLSTM } from "@dniskav/neuron";
469
+
470
+ const net = new NetworkLSTM(1, 8, [4, 1]);
471
+
472
+ for (let epoch = 0; epoch < 300; epoch++) {
473
+ net.resetState();
474
+ for (let step = 0; step < 6; step++) net.predict([1]);
475
+ net.train([[0],[0],[0],[1],[1],[1]], 0.05);
476
+ }
477
+ ```
478
+
479
+ ### Metrics — evaluate your model
480
+
481
+ ```ts
482
+ import { accuracy, f1Score, confusionMatrix, printConfusionMatrix, auc, classificationReport } from "@dniskav/neuron";
483
+
484
+ const yTrue = [0, 1, 1, 0, 1];
485
+ const yPred = [0, 1, 0, 0, 1];
486
+
487
+ console.log(accuracy(yTrue, yPred)); // 0.8
488
+ console.log(f1Score(yTrue, yPred)); // 0.8
279
489
 
280
- // tokens: 81 cell values (0 = empty)
281
- const puzzle = [5,3,0, 0,7,0, 0,0,0, ...];
282
- const targets = [...]; // 81*9 one-hot values
283
- const mask = puzzle.map(v => v === 0); // only train on empty cells
490
+ const cm = confusionMatrix(yTrue, yPred);
491
+ printConfusionMatrix(cm, ['neg', 'pos']);
284
492
 
285
- const loss = net.train(puzzle, targets, 0.001, mask);
286
- // loss is cross-entropy (not MSE) — decreases from ~2.2 toward 0 as training progresses
287
- const logits = net.predict(puzzle); // 729 logits (81 × 9)
493
+ // AUC-ROC
494
+ const scores = [0.1, 0.9, 0.4, 0.2, 0.8];
495
+ console.log(auc(yTrue, scores)); // 1 (every positive scores above every negative)
288
496
 
289
- // Attention weights from all blocks for visualization
290
- const weights = net.getAttentionWeights();
291
- // weights[blockIdx][headIdx] → seqLen × seqLen matrix
497
+ classificationReport(yTrue, yPred, ['neg', 'pos']);
292
498
  ```
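The definitions behind these metrics are short enough to sketch directly. The functions below are standalone illustrations with hypothetical names (`accuracyOf`, `f1Of`, `aucOf`), not the package exports. The AUC is computed by pair counting, which is equivalent to the area under the ROC curve obtained with the trapezoidal rule.

```ts
function accuracyOf(yTrue: number[], yPred: number[]): number {
  return yTrue.filter((t, i) => t === yPred[i]).length / yTrue.length;
}

// F1 = 2·TP / (2·TP + FP + FN), the harmonic mean of precision and recall.
function f1Of(yTrue: number[], yPred: number[], positive = 1): number {
  let tp = 0, fp = 0, fn = 0;
  yTrue.forEach((t, i) => {
    if (yPred[i] === positive && t === positive) tp++;
    else if (yPred[i] === positive) fp++;
    else if (t === positive) fn++;
  });
  return (2 * tp) / (2 * tp + fp + fn);
}

// AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half.
function aucOf(yTrue: number[], scores: number[]): number {
  let correct = 0, pairs = 0;
  for (let i = 0; i < yTrue.length; i++)
    for (let j = 0; j < yTrue.length; j++)
      if (yTrue[i] === 1 && yTrue[j] === 0) {
        pairs++;
        if (scores[i] > scores[j]) correct += 1;
        else if (scores[i] === scores[j]) correct += 0.5;
      }
  return correct / pairs;
}

const labelsTrue = [0, 1, 1, 0, 1];
const labelsPred = [0, 1, 0, 0, 1];
const rankScores = [0.1, 0.9, 0.15, 0.2, 0.8]; // one positive (0.15) ranks below a negative (0.2)
console.log(accuracyOf(labelsTrue, labelsPred), f1Of(labelsTrue, labelsPred), aucOf(labelsTrue, rankScores));
```

With these inputs, five of the six positive/negative pairs are ordered correctly, so the AUC is 5/6.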
293
499
 
294
- Each head in each block learns a different type of relationship (row, column,
295
- 3×3 box). The network figures this out by itself through training.
500
+ ### EarlyStopping
296
501
 
297
- ### NetworkTransformerRL — Transformer for reinforcement learning
502
+ ```ts
503
+ import { EarlyStopping } from "@dniskav/neuron";
298
504
 
299
- `NetworkTransformerRL` uses causal self-attention over a sliding window of past states to output Q-values. Unlike `NetworkLSTM`, the agent attends to specific past moments rather than compressing them into a single hidden vector.
505
+ const stopper = new EarlyStopping({ patience: 10, minDelta: 1e-4, mode: 'min' });
506
+
507
+ for (let epoch = 0; epoch < 1000; epoch++) {
508
+ const valLoss = trainEpoch();
509
+ if (stopper.update(valLoss, epoch)) {
510
+ console.log(`Stopped at epoch ${epoch}`);
511
+ break;
512
+ }
513
+ }
514
+ ```
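The patience logic can be sketched in isolation. This is a simplified stand-in that handles only the `'min'` mode and omits best-weight restore; `PatienceStopper` is a hypothetical class, not the library's `EarlyStopping`.

```ts
class PatienceStopper {
  private best = Infinity;
  private wait = 0;
  private patience: number;
  private minDelta: number;

  constructor(patience: number, minDelta = 0) {
    this.patience = patience;
    this.minDelta = minDelta;
  }

  // Returns true once the loss has failed to improve by minDelta
  // for `patience` consecutive updates.
  update(loss: number): boolean {
    if (loss < this.best - this.minDelta) {
      this.best = loss;
      this.wait = 0;
      return false;
    }
    this.wait++;
    return this.wait >= this.patience;
  }
}

const losses = [1.0, 0.9, 0.95, 0.96, 0.97];
const patienceStopper = new PatienceStopper(2);
let stoppedAt = -1;
for (let epoch = 0; epoch < losses.length; epoch++) {
  if (patienceStopper.update(losses[epoch])) {
    stoppedAt = epoch;
    break;
  }
}
console.log(stoppedAt);
```

Here the best loss is reached at epoch 1, and the two following epochs fail to beat it, so the loop stops at epoch 3.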
515
+
516
+ ### LossPlotter — ASCII loss curve
300
517
 
301
518
  ```ts
302
- import { NetworkTransformerRL } from "@dniskav/neuron";
303
-
304
- // Agent sees the last 8 steps, each step is a 7-value sensor vector → 4 actions
305
- const net = new NetworkTransformerRL(8, 7, {
306
- d_model: 32,
307
- nHeads: 2,
308
- d_ff: 64,
309
- nBlocks: 2,
310
- nActions: 4,
311
- });
519
+ import { LossPlotter } from "@dniskav/neuron";
520
+
521
+ const plotter = new LossPlotter({ width: 60, height: 12, title: 'Training Loss' });
522
+
523
+ for (let e = 0; e < 500; e++) {
524
+ const loss = trainStep();
525
+ plotter.add(loss, e);
526
+ }
527
+
528
+ plotter.print();
529
+ // Training Loss
530
+ // ┌────────────────────────────────────────────────────────────┐
531
+ // │ 2.31 ·
532
+ // │ · ·
533
+ // │ · · ·
534
+ // │ · · · · · · ·
535
+ // │ 0.02 · · · · · · · · · · · · · · ·
536
+ // └────────────────────────────────────────────────────────────┘
537
+ // 0 250 499
538
+ ```
539
+
540
+ ### DataAugmentation
312
541
 
313
- // Each step: feed the last N states as a sequence
314
- const sequence = getLastNStates(); // number[][] — shape: [8, 7]
315
- const qValues = net.predict(sequence); // number[4]
542
+ ```ts
543
+ import { DataAugmentation } from "@dniskav/neuron";
316
544
 
317
- // Q-learning update: train toward Bellman target
318
- const action = argmax(qValues);
319
- const reward = env.step(action);
320
- const targets = qValues.slice();
321
- targets[action] = reward + 0.99 * Math.max(...net.predict(nextSequence));
545
+ // Split dataset
546
+ const { trainX, trainY, valX, valY } = DataAugmentation.split(X, y, 0.8, 0.1);
322
547
 
323
- const loss = net.train(sequence, targets, 0.001);
548
+ // Normalize (fit on train, apply to all)
549
+ const { normalized: normTrain, min, max } = DataAugmentation.normalize(trainX);
550
+ const normVal = valX.map(x => DataAugmentation.normalizePoint(x, min, max));
551
+
552
+ // Augment training set (×3 copies with Gaussian noise)
553
+ const { X: augX, y: augY } = DataAugmentation.augmentBatch(normTrain, trainY, 3, 0.02);
324
554
  ```
325
555
 
326
- The last step in the sequence gets 2× pooling weight, so the most recent state contributes more to the decision.
556
+ ### WeightInspector: diagnose your network
327
557
 
328
558
  ```ts
329
- // Inspect what the agent is attending to
330
- const attnWeights = net.getAttentionWeights();
331
- // attnWeights[blockIdx][headIdx] → seqLen × seqLen matrix
559
+ import { NetworkN, WeightInspector, relu } from "@dniskav/neuron";
560
+
561
+ const net = new NetworkN([784, 256, 128, 10], { activations: [relu, relu, relu] });
562
+ // ... train ...
563
+
564
+ WeightInspector.print(net);
565
+ // Layer 0: mean=0.001 std=0.056 min=-0.21 max=0.19 dead=0 params=200960
566
+ // Layer 1: mean=0.000 std=0.079 min=-0.31 max=0.28 dead=3 params=32896
567
+ // Layer 2: mean=-0.001 std=0.091 min=-0.28 max=0.32 dead=0 params=1290
568
+ ```
569
+
570
+ ## How it works
571
+
572
+ Each class applies an **activation function** to the weighted sum of inputs and uses **gradient descent** to update weights:
573
+
574
+ ```
575
+ weight += lr × delta × input
576
+ bias += lr × delta
577
+ ```
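As a worked instance of the update rule above, here is a minimal sigmoid neuron trained on the OR gate. It assumes `delta = (target - output) × sigmoid'(output)` and zero-initialized weights; `sigmoidFn` and `trainNeuron` are hypothetical names, not library exports.

```ts
const sigmoidFn = (z: number): number => 1 / (1 + Math.exp(-z));

function trainNeuron(data: { x: number[]; t: number }[], lr: number, epochs: number) {
  const w: number[] = new Array(data[0].x.length).fill(0); // assumption: zero init
  let b = 0;
  const predict = (x: number[]): number =>
    sigmoidFn(x.reduce((s, xi, i) => s + xi * w[i], b));
  for (let e = 0; e < epochs; e++) {
    for (const { x, t } of data) {
      const out = predict(x);
      // delta = error × derivative of sigmoid, computed from the output
      const delta = (t - out) * out * (1 - out);
      for (let i = 0; i < w.length; i++) w[i] += lr * delta * x[i]; // weight += lr × delta × input
      b += lr * delta; // bias += lr × delta
    }
  }
  return predict;
}

const orGate = [
  { x: [0, 0], t: 0 },
  { x: [0, 1], t: 1 },
  { x: [1, 0], t: 1 },
  { x: [1, 1], t: 1 },
];
const neuronPredict = trainNeuron(orGate, 0.5, 5000);
console.log(neuronPredict([0, 0]).toFixed(2), neuronPredict([1, 1]).toFixed(2));
```

The sign convention matters: because `delta` is defined from `target - output`, the rule adds to the weights instead of subtracting a gradient.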
578
+
579
+ `NetworkN` implements full **backpropagation** across all layers, propagating deltas from the output back to the first layer using the chain rule. `NeuronN` uses **Xavier initialization** — weights start in `[-√(1/n), +√(1/n)]`.
580
+
581
+ When an **optimizer** is used (e.g., Adam), the raw gradient is passed to the optimizer instead of being applied directly. Each weight maintains its own optimizer state.
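
To make "each weight maintains its own optimizer state" concrete, here is a minimal per-scalar Adam sketch. The class name `AdamSketch` and its `step` method are assumptions for illustration and may differ from the library's optimizer interface; the gradient follows this README's sign convention (`delta × input`, added to the weight).

```ts
// Hypothetical per-scalar Adam; one instance is held per weight.
class AdamSketch {
  private m = 0; // running mean of gradients (first moment)
  private v = 0; // running mean of squared gradients (second moment)
  private t = 0; // step counter, used for bias correction
  private lr: number;
  private beta1: number;
  private beta2: number;
  private eps: number;

  constructor(lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) {
    this.lr = lr;
    this.beta1 = beta1;
    this.beta2 = beta2;
    this.eps = eps;
  }

  // Takes the raw gradient and returns the amount to ADD to the weight.
  step(gradient: number): number {
    this.t += 1;
    this.m = this.beta1 * this.m + (1 - this.beta1) * gradient;
    this.v = this.beta2 * this.v + (1 - this.beta2) * gradient * gradient;
    const mHat = this.m / (1 - Math.pow(this.beta1, this.t));
    const vHat = this.v / (1 - Math.pow(this.beta2, this.t));
    return (this.lr * mHat) / (Math.sqrt(vHat) + this.eps);
  }
}

// One optimizer per weight, so each keeps independent moment estimates.
const sketchWeights = [0.1, -0.2, 0.3];
const sketchOptimizers = sketchWeights.map(() => new AdamSketch(0.01));
const sketchGradients = [0.5, -1.0, 2.0];
sketchGradients.forEach((g, i) => {
  sketchWeights[i] += sketchOptimizers[i].step(g);
});
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each weight moves by roughly `lr` regardless of gradient magnitude.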
582
+
583
+ The `Value` class implements **reverse-mode automatic differentiation**: every operation records its inputs and a backward function. Calling `.backward()` on the output node performs a topological sort and propagates `∂L/∂w` through the entire graph.
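
The mechanism described above can be sketched in a few lines. This is a micrograd-style illustration under assumed names (`ValueSketch`, `add`, `mul`, `tanh`); the library's actual `Value` / `Tape` API may expose different methods.

```ts
// Hypothetical scalar autograd node; not the library's Value class.
class ValueSketch {
  data: number;
  grad = 0;
  private parents: ValueSketch[];
  private backwardFn: () => void = () => {};

  constructor(data: number, parents: ValueSketch[] = []) {
    this.data = data;
    this.parents = parents;
  }

  add(other: ValueSketch): ValueSketch {
    const out = new ValueSketch(this.data + other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += out.grad; // d(a+b)/da = 1
      other.grad += out.grad; // d(a+b)/db = 1
    };
    return out;
  }

  mul(other: ValueSketch): ValueSketch {
    const out = new ValueSketch(this.data * other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += other.data * out.grad; // d(a*b)/da = b
      other.grad += this.data * out.grad; // d(a*b)/db = a
    };
    return out;
  }

  tanh(): ValueSketch {
    const t = Math.tanh(this.data);
    const out = new ValueSketch(t, [this]);
    out.backwardFn = () => {
      this.grad += (1 - t * t) * out.grad; // d tanh(x)/dx = 1 - tanh²(x)
    };
    return out;
  }

  // Topologically sort the graph, then apply the chain rule from output to inputs.
  backward(): void {
    const order: ValueSketch[] = [];
    const visited = new Set<ValueSketch>();
    const visit = (node: ValueSketch): void => {
      if (visited.has(node)) return;
      visited.add(node);
      for (const p of node.parents) visit(p);
      order.push(node);
    };
    visit(this);
    this.grad = 1; // dL/dL
    for (let i = order.length - 1; i >= 0; i--) order[i].backwardFn();
  }
}

const w = new ValueSketch(2);
const x = new ValueSketch(-3);
const b = new ValueSketch(10);
const loss = w.mul(x).add(b); // 2 * -3 + 10 = 4
loss.backward();
console.log(w.grad, x.grad, b.grad); // -3 2 1
```

Gradients accumulate with `+=` so that a node used more than once in the graph receives the sum of all its contributions.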
584
+
585
+ ## Build
586
+
587
+ ```bash
588
+ npm run build # outputs CJS + ESM + type declarations to dist/
589
+ npm run dev # watch mode
590
+ npm test # run test suite
332
591
  ```
333
592
 
593
+ ## For AI agents
594
+
595
+ If you are an AI agent or LLM working with this codebase, read [AGENTS.md](AGENTS.md) first. It contains the full class hierarchy, design constraints, and what this library does not do.
596
+
334
597
  ## Changelog
335
598
 
599
+ ### v0.3.0
600
+ - **New — Classical ML:** `Perceptron`, `LinearRegression` (normal equation + GD), `LogisticRegression`, `SoftmaxRegression`, `GaussianNaiveBayes`, `DecisionTree` (CART, Gini/MSE)
601
+ - **New — Unsupervised:** `KMeans` (K-Means++ init), `PCA` (power iteration + Hotelling deflation), `SOM` (Kohonen map), `HopfieldNetwork` (Hebbian storage + energy), `Autoencoder`
602
+ - **New — Deep Learning:** `Conv2D` (full forward/backward), `MaxPool2D` (position mask for exact backprop), `Flatten`, `RNN` (BPTT, documents vanishing gradient), `Seq2Seq` (encoder-decoder LSTM), `CausalConv1D`, `TCN` (dilated temporal convolutions)
603
+ - **New — Generative:** `GAN` (min-max game, Box-Muller sampling), `VAE` (reparametrization trick, ELBO = MSE + KL)
604
+ - **New — Autograd:** `Value` / `Tape` — scalar reverse-mode AD with topological backprop (micrograd-style)
605
+ - **New — Metrics:** `confusionMatrix`, `accuracy`, `precision`, `recall`, `f1Score`, `rocCurve`, `auc`, `mae`, `rmse`, `r2Score`, `perplexity`, `printConfusionMatrix`, `classificationReport`
606
+ - **New — Utilities:** `EarlyStopping` (patience + best-weight restore), `LossPlotter` (ASCII terminal curve), `WeightInspector` (per-layer stats, dead ReLU detection), `DataAugmentation` (noise, normalize, z-score, shuffle, split)
607
+
336
608
  ### v0.2.7
337
- - **Docs:** Added architecture diagram to README — visual progression from `Neuron` to `NetworkTransformerRL`
609
+ - **Docs:** Added architecture diagram to README
338
610
 
339
611
  ### v0.2.6
340
612
  - **Fix:** `Network.predict` now returns `number[]` (consistent with all other network classes)
341
- - **Fix:** `Network.train` now uses the configured optimizer and `activation.dfn()` instead of hardcoded SGD and sigmoid derivative
342
- - **Fix:** `LayerNorm.backwardOne` now correctly uses pre-update γ when computing the input gradient
343
- - **Fix:** LSTM and GRU gate initialization corrected from He (`√(2/n)`) to Xavier fan-in+out (`√(2/(fanIn+fanOut))`), matching the sigmoid/tanh activations used in those gates
344
- - **New:** `BiasVector` — 1D counterpart to `WeightMatrix` with per-scalar Adam optimizers; replaces repeated `number[] + Adam[]` pairs in `TransformerBlock`, `NetworkTransformer`, and `NetworkTransformerRL`
345
- - **New:** `defaultOptimizer` exported from `optimizers.ts`, the single source of truth for the default `() => new SGD()` factory
346
- - **Refactor:** `NetworkN.train` and `trainWithDeltas` share extracted `_forwardAll()` and `_backpropLayers()` internals — eliminates ~50 lines of duplication
347
- - **Refactor:** `Transformer` backward methods now throw descriptive errors instead of crashing with a cryptic `TypeError` when called before `predict()`
348
- - **Refactor:** `NetworkTransformer.setWeights()` and `NetworkTransformerRL.setWeightsFlat()` use each component's own `setWeights()` instead of direct `.W` mutation
613
+ - **Fix:** `Network.train` now uses the configured optimizer and `activation.dfn()`
614
+ - **Fix:** `LayerNorm.backwardOne` correctly uses pre-update γ
615
+ - **Fix:** LSTM and GRU gate initialization corrected to Xavier fan-in+out
616
+ - **New:** `BiasVector` — 1D counterpart to `WeightMatrix`
617
+ - **New:** `defaultOptimizer` — shared default factory
618
+ - **Refactor:** `NetworkN` extracts `_forwardAll()` and `_backpropLayers()`
349
619
 
350
620
  ### v0.2.5
351
- - Unified optimizer factories for `LSTMLayer`, `GRULayer`, `Conv1D` (per-scalar Adam/Momentum/SGD)
352
- - `NetworkN`: residual connections (`residual` option) and dropout (`dropoutRate`)
353
- - `Conv1D`: multi-channel input (`inputChannels`)
354
- - `NetworkTransformerRL`: configurable pooling (`avg` / `max` / `last` / `weighted`)
355
- - `Trainer`: weight decay, early stopping, classification metrics, gradient clipping support
356
- - `DataLoader`: validation split (`validationSplit` + `getValidationData()`)
357
- - `ModelSaver`: universal serialization via flat `getWeights()`/`setWeights()` for all classes
358
- - Gradient check test suite (`tests/GradientCheck.test.ts`)
359
-
360
- ## Possible improvements
361
-
362
- 1. **Support for batches** in training to improve efficiency and gradient stability.
363
- 2. **Global gradient norm clipping** — `WeightMatrix.update` supports per-element clipping; a utility to clip across all matrices by total norm would be more principled.
364
- 3. **Learning rate warmup** — standard practice for Transformers; ramp LR from 0 to target over the first N steps.
365
- 4. **Pre-norm architecture** — LayerNorm before the residual add (instead of after) is more stable for deep stacks.
621
+ - Unified optimizer factories for `LSTMLayer`, `GRULayer`, `Conv1D`
622
+ - `NetworkN`: residual connections and dropout
623
+ - `Conv1D`: multi-channel input
624
+ - `Trainer`: weight decay, early stopping, classification metrics
625
+ - `DataLoader`: validation split
626
+ - `ModelSaver`: universal serialization
366
627
 
367
628
  ## License
368
629