@dniskav/neuron 0.2.7 → 0.3.1

package/README.md CHANGED
@@ -3,7 +3,7 @@
3
3
 
4
4
  A minimal, dependency-free neural network library built from scratch in TypeScript. Designed for learning and experimentation — every line of math is readable.
5
5
 
6
- Each class is a building block for the next: from a single neuron to a full Transformer with causal attention.
6
+ Each class is a building block for the next: from a single neuron to a full Transformer with causal attention. Includes classical ML, unsupervised learning, generative models, embeddings, and autograd — all in pure TypeScript, zero dependencies.
7
7
 
8
8
  ```mermaid
9
9
  graph TD
@@ -20,13 +20,43 @@ graph TD
20
20
  K["NetworkTransformer\nembeddings → blocks → per-token logits"]
21
21
  L["NetworkTransformerRL\ncontinuous projection → causal attention → Q-values"]
22
22
 
23
+ subgraph Classical ML
24
+ P["Perceptron\nstep function · Rosenblatt rule"]
25
+ LR["LinearRegression\nnormal equation · gradient descent"]
26
+ LOG["LogisticRegression\nsigmoid · BCE · SoftmaxRegression"]
27
+ NB["GaussianNaiveBayes\nlog-probabilities · Gaussian P(x|c)"]
28
+ DT["DecisionTree\nCART · Gini · MSE split"]
29
+ end
30
+
31
+ subgraph Unsupervised
32
+ KM["KMeans\nK-Means++ · inertia · elbow"]
33
+ PCA["PCA\npower iteration · projection · reconstruction"]
34
+ SOM["SOM\nKohonen · BMU · Gaussian neighborhood"]
35
+ HN["HopfieldNetwork\nHebbian · energy · associative memory"]
36
+ AE["Autoencoder\nencoder · bottleneck · decoder"]
37
+ end
38
+
39
+ subgraph Generative
40
+ GAN["GAN\ngenerator · discriminator · min-max"]
41
+ VAE["VAE\nreparametrization trick · ELBO · KL"]
42
+ end
43
+
44
+ subgraph Autograd
45
+ TAP["Value / Tape\nreverse-mode · computational graph · backward"]
46
+ end
47
+
23
48
  A --> B --> C --> D --> E
24
49
  E --> F --> G
25
50
  E --> H --> I --> J --> K --> L
51
+ E --> AE
52
+ E --> GAN
53
+ E --> VAE
26
54
  ```
27
55
 
28
56
  ## What's inside
29
57
 
58
+ ### Neural network building blocks
59
+
30
60
  | Export | Description |
31
61
  |--------|-------------|
32
62
  | `Neuron` | Single-input neuron. The simplest possible unit: one weight, one bias. |
@@ -36,20 +66,126 @@ graph TD
36
66
  | `NetworkN` | Deep network of arbitrary depth. Define your architecture as `[inputs, ...hidden, outputs]`. |
37
67
  | `LSTMLayer` | Recurrent layer with persistent hidden and cell state. Learns sequences via BPTT. |
38
68
  | `NetworkLSTM` | Wraps an `LSTMLayer` + dense layers. Maintains memory across steps within an episode. |
69
+ | `GRULayer` | Gated Recurrent Unit — lighter alternative to LSTM, two gates instead of three. |
39
70
  | `NetworkTransformer` | Full token-classification Transformer: embeddings → N blocks → per-token logits. |
40
- | `NetworkTransformerRL` | Transformer for RL agents: continuous input projection → causal attention → Q-values. Remembers the last N steps. |
71
+ | `NetworkTransformerRL` | Transformer for RL agents: continuous input projection → causal attention → Q-values. |
41
72
  | `TransformerBlock` | One Transformer block: multi-head attention + FFN + LayerNorm × 2 with residuals. |
42
73
  | `MultiHeadAttention` | N parallel attention heads concatenated and projected to `d_model`. |
43
74
  | `AttentionHead` | Single scaled dot-product self-attention head (Q / K / V projections + backprop). |
75
+
76
+ ### Layers & components
77
+
78
+ | Export | Description |
79
+ |--------|-------------|
80
+ | `Conv1D` | 1D convolution over sequences. Multi-channel, configurable stride and padding. |
81
+ | `Conv2D` | 2D convolution for images. Kernels `[filters][kH][kW][C]`, full forward + backward. |
82
+ | `MaxPool2D` | Max pooling 2D. Stores position mask for exact gradient routing in backprop. |
83
+ | `Flatten` | Converts `[H][W][C]` tensors to flat vectors. Bridges Conv layers to dense layers. |
84
+ | `RNN` | Vanilla RNN with BPTT. Explicitly shows where and why gradients vanish. |
85
+ | `Seq2Seq` | Encoder + Decoder LSTMs with context vector transfer. Teacher forcing in training. |
86
+ | `CausalConv1D` | Causal dilated 1D convolution. One building block of a TCN. |
87
+ | `TCN` | Temporal Convolutional Network. Stacks causal dilated convolutions for sequences without recurrence. |
44
88
  | `LayerNorm` | Layer normalization with learnable γ / β per feature. |
45
- | `WeightMatrix` | 2D weight matrix with per-scalar Adam optimizers. Optional per-element gradient clipping via `update(dW, lr, clipValue)`. |
46
- | `BiasVector` | 1D bias vector with per-scalar Adam optimizers. Companion to `WeightMatrix` for bias terms. |
89
+ | `BatchNorm` | Batch normalization with running mean/variance for inference. |
90
+ | `Dropout` | Inverted dropout for regularization. Active only during training. |
91
+ | `WeightMatrix` | 2D weight matrix with per-scalar Adam optimizers and optional gradient clipping. |
92
+ | `BiasVector` | 1D bias vector with per-scalar Adam optimizers. |
47
93
  | `EmbeddingMatrix` | Lookup-table embedding matrix with SGD updates. |
48
- | `sigmoid` `relu` `tanh` `linear` | Built-in activation functions. |
49
- | `SGD` `Momentum` `Adam` `ClipOptimizer` | Optimizers. Each instance tracks its own state per weight. `ClipOptimizer` wraps any optimizer with gradient clipping. |
50
- | `defaultOptimizer` | Default `OptimizerFactory` (`() => new SGD()`). Shared across `NeuronN`, `Layer`, `NetworkN`, `NetworkLSTM`. |
51
- | `mse` `crossEntropy` | Loss functions for evaluation and logging. |
52
- | `mseDelta` `crossEntropyDelta` | Output-layer delta functions for use with `trainWithDeltas`. |
94
+
95
+ ### Classical ML
96
+
97
+ | Export | Description |
98
+ |--------|-------------|
99
+ | `Perceptron` | The historical Rosenblatt perceptron (1957). Step function, linear rule. Shows why XOR is impossible. |
100
+ | `LinearRegression` | Closed-form normal equation `(XᵀX)⁻¹Xᵀy` + gradient descent mode. Pure array arithmetic. |
101
+ | `LogisticRegression` | Sigmoid + binary cross-entropy, no hidden layers. The boundary between classical ML and neural nets. |
102
+ | `SoftmaxRegression` | Multinomial logistic regression. Log-sum-exp trick for numerical stability. |
103
+ | `GaussianNaiveBayes` | `P(c|x) ∝ P(c)·∏P(xᵢ|c)` in log-space. Zero gradient descent — pure Bayes. |
104
+ | `DecisionTree` | CART with Gini impurity (classification) or variance (regression). Fully recursive. |
105
+
106
+ ### Unsupervised learning
107
+
108
+ | Export | Description |
109
+ |--------|-------------|
110
+ | `KMeans` | K-Means++ initialization + Lloyd's algorithm. `inertia()` for the elbow method. |
111
+ | `PCA` | Principal Component Analysis via power iteration + Hotelling deflation. Projects, reconstructs, explains variance. |
112
+ | `SOM` | Self-Organizing Map (Kohonen). BMU search, Gaussian neighborhood, topology preservation. |
113
+ | `HopfieldNetwork` | Associative memory. Hebbian storage, energy function, async recall. Capacity ~0.138·N. |
114
+ | `Autoencoder` | Encoder + bottleneck + decoder using two `NetworkN` instances. Learns compressed representations. |
115
+
116
+ ### Generative models
117
+
118
+ | Export | Description |
119
+ |--------|-------------|
120
+ | `GAN` | Generator vs Discriminator min-max game. Documents Nash equilibrium and mode collapse. |
121
+ | `VAE` | Variational Autoencoder. Reparametrization trick, ELBO = reconstruction + KL divergence. |
122
+
123
+ ### Automatic differentiation
124
+
125
+ | Export | Description |
126
+ |--------|-------------|
127
+ | `Value` | Scalar autograd node. Builds a computational graph and propagates gradients with `.backward()`. Inspired by micrograd. |
128
+
129
+ ### Embeddings
130
+
131
+ | Export | Description |
132
+ |--------|-------------|
133
+ | `Word2Vec` | Learns word embeddings via Skip-gram or CBOW. Full-softmax, cosine similarity, analogies (`king - man + woman ≈ queen`). |
134
+ | `TSNE` | t-SNE dimensionality reduction. Binary-search perplexity, Student-t kernel, KL gradient, early exaggeration. |
135
+ | `PositionalEncoding` | Sinusoidal positional encoding (Vaswani et al.). Static — no parameters, generalizes to unseen lengths. |
136
+ | `LearnedPositionalEncoding` | Trainable positional encoding. Xavier-initialized, learnable up to a fixed `maxSeqLen`. |
137
+ | `ContrastiveLearning` | SimCLR-style self-supervised learning. NT-Xent loss, encoder + projection head, temperature τ. |
138
+ | `Augmenter` | Data augmentation helpers for contrastive pairs: Gaussian noise, feature dropout, `makePair()`. |
139
+
140
+ ### Activations & math
141
+
142
+ | Export | Description |
143
+ |--------|-------------|
144
+ | `sigmoid` `relu` `tanh` `linear` `leakyRelu` `elu` | Built-in activation functions with `fn` and `dfn` (derivative from output). |
145
+ | `makeLeakyRelu(α)` `makeElu(α)` | Parametric variants. |
146
+ | `matMul` `transpose` `softmax` `softmaxBackward` | Matrix math utilities. |
147
+
148
+ ### Optimizers
149
+
150
+ | Export | Description |
151
+ |--------|-------------|
152
+ | `SGD` | Vanilla stochastic gradient descent. Stateless. |
153
+ | `Momentum` | Accumulates velocity in the gradient direction. |
154
+ | `Adam` | Adaptive moment estimation. Per-parameter first and second moments with bias correction. |
155
+ | `ClipOptimizer` | Wraps any optimizer with gradient clipping. |
156
+ | `ClippedOptimizerFactory` | Factory wrapper that clips all created optimizers. |
157
+ | `defaultOptimizer` | Default factory (`() => new SGD()`). Shared fallback across all classes. |
158
+
159
+ ### Loss functions
160
+
161
+ | Export | Description |
162
+ |--------|-------------|
163
+ | `mse` `crossEntropy` | Scalar loss functions for evaluation and logging. |
164
+ | `mseDelta` `crossEntropyDelta` `crossEntropyDeltaRaw` | Output-layer delta functions for `trainWithDeltas`. |
165
+
166
+ ### Metrics & evaluation
167
+
168
+ | Export | Description |
169
+ |--------|-------------|
170
+ | `confusionMatrix` | Returns `number[][]` confusion matrix. |
171
+ | `accuracy` `precision` `recall` `f1Score` | Standard classification metrics. |
172
+ | `rocCurve` `auc` | ROC curve points and area under the curve (trapezoidal rule). |
173
+ | `mae` `rmse` `r2Score` | Regression metrics. |
174
+ | `perplexity` | `exp(mean cross-entropy)` — natural metric for language models. |
175
+ | `printConfusionMatrix` `classificationReport` | Console-formatted output tables. |
176
+
177
+ ### Training utilities
178
+
179
+ | Export | Description |
180
+ |--------|-------------|
181
+ | `Trainer` | Training loop with epochs, batches, metrics, and callbacks. |
182
+ | `DataLoader` | Dataset wrapper with shuffling and validation split. |
183
+ | `LRScheduler` | Learning rate schedules (step, exponential, cosine). |
184
+ | `EarlyStopping` | Stops training when a metric stalls. Configurable patience, mode, and best-weight restore. |
185
+ | `LossPlotter` | Renders a loss curve as ASCII art in the terminal. |
186
+ | `WeightInspector` | Per-layer weight statistics (mean, std, dead weights). Detects dead ReLUs. |
187
+ | `DataAugmentation` | Noise, jitter, normalization, z-score, shuffle, train/val/test split. |
188
+ | `ModelSaver` | Universal serialization via flat `getWeights()` / `setWeights()`. |
53
189
 
54
190
  ## Install
55
191
 
@@ -66,303 +202,515 @@ import { Neuron } from "@dniskav/neuron";
66
202
 
67
203
  const neuron = new Neuron();
68
204
 
69
- // Train: output 1 if input >= 18, else 0
70
205
  for (let epoch = 0; epoch < 1000; epoch++) {
71
206
  neuron.train(20, 1, 0.1); // adult
72
207
  neuron.train(15, 0, 0.1); // minor
73
208
  }
74
209
 
75
- console.log(neuron.predict(17)); // ~0.1 (minor)
76
- console.log(neuron.predict(25)); // ~0.9 (adult)
210
+ console.log(neuron.predict(17)); // ~0.1
211
+ console.log(neuron.predict(25)); // ~0.9
212
+ ```
213
+
214
+ ### NetworkN — deep network with custom architecture
215
+
216
+ ```ts
217
+ import { NetworkN, relu, sigmoid, Adam } from "@dniskav/neuron";
218
+
219
+ const net = new NetworkN([3, 64, 32, 1], {
220
+ activations: [relu, relu, sigmoid],
221
+ optimizer: () => new Adam(),
222
+ });
223
+
224
+ net.train([0.5, 0.3, 0.8], [1], 0.001);
225
+ const [out] = net.predict([0.5, 0.3, 0.8]);
77
226
  ```
78
227
 
79
- ### N-input neuron — multi-feature classification
228
+ ### Historical Perceptron — step function, no hidden layers
80
229
 
81
230
  ```ts
82
- import { NeuronN } from "@dniskav/neuron";
231
+ import { Perceptron } from "@dniskav/neuron";
83
232
 
84
- const neuron = new NeuronN(3); // 3 inputs: R, G, B
233
+ const p = new Perceptron(2);
85
234
 
86
- // Teach it to detect bright colors (luminance > 0.65)
87
- neuron.train([1, 1, 1], 1, 0.05); // white → bright
88
- neuron.train([0, 0, 0], 0, 0.05); // black → dark
235
+ // Learns AND gate (linearly separable)
236
+ const data = [[0,0,0],[0,1,0],[1,0,0],[1,1,1]];
237
+ for (let e = 0; e < 100; e++)
238
+ for (const [a, b, t] of data) p.train([a, b], t, 0.1);
89
239
 
90
- console.log(neuron.predict([0.9, 0.9, 0.9])); // close to 1
240
+ console.log(p.predict([1, 1])); // 1
241
+ console.log(p.predict([0, 1])); // 0
242
+ // XOR cannot be learned — not linearly separable
91
243
  ```
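
The Rosenblatt rule named in this section can be sketched in a few lines. This is a standalone illustration under stated assumptions, not the package's `Perceptron` class: the names `trainPerceptron` and `perceptronPredict` are hypothetical, and a learning rate of 1 is used so the arithmetic stays in exact integers.

```ts
// Standalone sketch of the Rosenblatt perceptron rule (hypothetical names, not the library API).
type GateSample = { inputs: number[]; target: number };
type PerceptronModel = { weights: number[]; bias: number };

function perceptronPredict(model: PerceptronModel, inputs: number[]): number {
  const sum = inputs.reduce((acc, x, i) => acc + x * model.weights[i], model.bias);
  return sum >= 0 ? 1 : 0; // step activation
}

function trainPerceptron(samples: GateSample[], epochs: number, lr: number): PerceptronModel {
  const model: PerceptronModel = {
    weights: new Array<number>(samples[0].inputs.length).fill(0),
    bias: 0,
  };
  for (let e = 0; e < epochs; e++) {
    for (const { inputs, target } of samples) {
      const error = target - perceptronPredict(model, inputs); // -1, 0, or +1
      // Weights move only when the prediction is wrong.
      for (let i = 0; i < inputs.length; i++) model.weights[i] += lr * error * inputs[i];
      model.bias += lr * error;
    }
  }
  return model;
}

const andGate: GateSample[] = [
  { inputs: [0, 0], target: 0 },
  { inputs: [0, 1], target: 0 },
  { inputs: [1, 0], target: 0 },
  { inputs: [1, 1], target: 1 },
];
const andModel = trainPerceptron(andGate, 20, 1);
console.log(andGate.map(s => perceptronPredict(andModel, s.inputs)));
```

Because the update only fires on mistakes, a linearly separable problem such as AND stops changing once every sample is classified correctly. XOR has no separating line, so the same loop would keep cycling.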
92
244
 
93
- ### Network — non-linear classification
245
+ ### Linear Regression — normal equation
94
246
 
95
247
  ```ts
96
- import { Network } from "@dniskav/neuron";
248
+ import { LinearRegression } from "@dniskav/neuron";
97
249
 
98
- // 2 inputs → 8 hidden neurons → 1 output
99
- const net = new Network(2, 8, 1);
250
+ const model = new LinearRegression();
100
251
 
101
- // Train on XOR (not linearly separable — needs hidden layer)
102
- const data = [[0,0,0], [0,1,1], [1,0,1], [1,1,0]];
252
+ // Exact closed-form solution in one call
253
+ model.fitNormal(
254
+ [[1], [2], [3], [4]], // X
255
+ [2, 4, 6, 8] // y = 2x
256
+ );
103
257
 
104
- for (let epoch = 0; epoch < 5000; epoch++) {
105
- for (const [x, y, t] of data) {
106
- net.train([x, y], t, 0.3);
107
- }
108
- }
258
+ console.log(model.predict([5])); // ~10
259
+ console.log(model.getCoefficients()); // { weights: [2], bias: ~0 }
260
+ ```
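
For readers who want to see what a closed-form fit involves, here is a minimal sketch of the normal equation `(XᵀX)⁻¹Xᵀy`. It is an independent illustration, not the package's `fitNormal`: the helper names are hypothetical, a bias column of ones is prepended by assumption, and the system is solved by Gauss-Jordan elimination rather than an explicit inverse.

```ts
// Standalone sketch of a normal-equation fit (hypothetical helpers, not the library API).
function solveLinearSystem(A: number[][], b: number[]): number[] {
  const n = b.length;
  const M = A.map((row, i) => row.concat(b[i])); // augmented matrix [A | b]
  for (let col = 0; col < n; col++) {
    // Partial pivoting: pick the row with the largest entry in this column.
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    const tmp = M[col];
    M[col] = M[pivot];
    M[pivot] = tmp;
    // Eliminate this column from every other row.
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const factor = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= factor * M[col][c];
    }
  }
  return M.map((row, i) => row[n] / row[i]);
}

function fitNormalEquation(X: number[][], y: number[]): { weights: number[]; bias: number } {
  const Xb = X.map(row => [1].concat(row)); // prepend the bias column
  const d = Xb[0].length;
  const XtX = Array.from({ length: d }, (_, i) =>
    Array.from({ length: d }, (_, j) => Xb.reduce((s, row) => s + row[i] * row[j], 0)));
  const Xty = Array.from({ length: d }, (_, i) =>
    Xb.reduce((s, row, k) => s + row[i] * y[k], 0));
  const theta = solveLinearSystem(XtX, Xty);
  return { bias: theta[0], weights: theta.slice(1) };
}

const lineFit = fitNormalEquation([[1], [2], [3], [4]], [2, 4, 6, 8]);
console.log(lineFit); // weights close to [2], bias close to 0
```

The sketch assumes `XᵀX` is invertible; with collinear features the pivot becomes zero and a regularized or gradient-descent fit is the usual fallback.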
109
261
 
110
- console.log(net.predict([0, 1])[0]); // ~0.97
111
- console.log(net.predict([1, 1])[0]); // ~0.03
262
+ ### Logistic Regression — sigmoid + BCE
263
+
264
+ ```ts
265
+ import { LogisticRegression } from "@dniskav/neuron";
266
+
267
+ const clf = new LogisticRegression(2);
268
+ const lossHistory = clf.train(
269
+ [[0,0],[1,1],[1,0],[0,1]],
270
+ [0, 1, 1, 0],
271
+ 0.1, 500
272
+ );
273
+
274
+ console.log(clf.classify([0.9, 0.9])); // 1
275
+ console.log(clf.classify([0.1, 0.1])); // 0
112
276
  ```
113
277
 
114
- ### NetworkN — deep network with custom architecture
278
+ ### Gaussian Naive Bayes — zero gradient descent
115
279
 
116
280
  ```ts
117
- import { NetworkN } from "@dniskav/neuron";
281
+ import { GaussianNaiveBayes } from "@dniskav/neuron";
282
+
283
+ const nb = new GaussianNaiveBayes();
284
+ nb.fit(
285
+ [[1.2, 0.5], [1.4, 0.7], [5.0, 4.5], [5.2, 4.8]],
286
+ [0, 0, 1, 1]
287
+ );
118
288
 
119
- // 3 inputs → 24 hidden → 16 hidden → 2 outputs
120
- const net = new NetworkN([3, 24, 16, 2]);
289
+ console.log(nb.predict([1.3, 0.6])); // 0
290
+ console.log(nb.predict([5.1, 4.6])); // 1
291
+ ```
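
The "log-probabilities" idea behind Gaussian Naive Bayes can be illustrated as follows. This is a sketch under assumptions, not the package's implementation: the function names are hypothetical, variances are the per-class population variances, and a small constant (1e-9) is added to avoid division by zero.

```ts
// Standalone sketch of Gaussian Naive Bayes in log-space (hypothetical names).
interface GnbClassStats { prior: number; mean: number[]; variance: number[] }

function fitGaussianNB(X: number[][], y: number[]): Map<number, GnbClassStats> {
  const perClass = new Map<number, GnbClassStats>();
  Array.from(new Set(y)).forEach(label => {
    const rows = X.filter((_, i) => y[i] === label);
    const d = rows[0].length;
    const mean = Array.from({ length: d }, (_, j) =>
      rows.reduce((s, r) => s + r[j], 0) / rows.length);
    const variance = mean.map((m, j) =>
      rows.reduce((s, r) => s + (r[j] - m) ** 2, 0) / rows.length + 1e-9);
    perClass.set(label, { prior: rows.length / X.length, mean, variance });
  });
  return perClass;
}

function predictGaussianNB(perClass: Map<number, GnbClassStats>, x: number[]): number {
  let bestLabel = NaN;
  let bestLogP = -Infinity;
  perClass.forEach((s, label) => {
    // log P(c) + sum over features of log N(x_j | mean_j, variance_j)
    let logP = Math.log(s.prior);
    for (let j = 0; j < x.length; j++) {
      logP += -0.5 * Math.log(2 * Math.PI * s.variance[j])
        - (x[j] - s.mean[j]) ** 2 / (2 * s.variance[j]);
    }
    if (logP > bestLogP) { bestLogP = logP; bestLabel = label; }
  });
  return bestLabel;
}

const gnbStats = fitGaussianNB(
  [[1.2, 0.5], [1.4, 0.7], [5.0, 4.5], [5.2, 4.8]],
  [0, 0, 1, 1],
);
console.log(predictGaussianNB(gnbStats, [1.3, 0.6]), predictGaussianNB(gnbStats, [5.1, 4.6]));
```

Summing logs instead of multiplying densities keeps the score finite even when individual likelihoods underflow.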
292
+
293
+ ### Decision Tree — Gini split
121
294
 
122
- // Train with multiple targets
123
- net.train([0.5, 0.3, 0.8], [1, 0], 0.05);
295
+ ```ts
296
+ import { DecisionTree } from "@dniskav/neuron";
124
297
 
125
- // Predict returns an array: one value per output neuron
126
- const [out1, out2] = net.predict([0.5, 0.3, 0.8]);
298
+ const tree = new DecisionTree({ maxDepth: 4, task: 'classification' });
299
+ tree.fit(X_train, y_train);
300
+ const predictions = tree.predictBatch(X_test);
127
301
  ```
128
302
 
129
- ### Activations — ReLU, tanh, and more
303
+ ### K-Means — unsupervised clustering
304
+
305
+ ```ts
306
+ import { KMeans } from "@dniskav/neuron";
307
+
308
+ const km = new KMeans(3); // 3 clusters
309
+ km.fit(points);
130
310
 
131
- Pass an activation per layer. The last layer typically uses `sigmoid` for binary output or `linear` for regression.
311
+ const cluster = km.predict([1.2, 0.5]); // index 0, 1 or 2
312
+ console.log(km.inertia(points)); // lower = better fit
313
+ ```
314
+
315
+ ### PCA — dimensionality reduction
132
316
 
133
317
  ```ts
134
- import { NetworkN, relu, sigmoid } from "@dniskav/neuron";
318
+ import { PCA } from "@dniskav/neuron";
135
319
 
136
- const net = new NetworkN([3, 64, 32, 1], {
137
- activations: [relu, relu, sigmoid],
138
- });
320
+ const pca = new PCA(2); // keep top 2 components
321
+ pca.fit(X); // 100 samples × 10 features
322
+
323
+ const Z = pca.transform(X); // 100 × 2
324
+ const X2 = pca.inverseTransform(Z); // reconstructed 100 × 10
325
+
326
+ console.log(pca.explainedVarianceRatio()); // [0.72, 0.15, ...]
139
327
  ```
140
328
 
141
- Available: `sigmoid`, `relu`, `tanh`, `linear`.
329
+ ### Self-Organizing Map
330
+
331
+ ```ts
332
+ import { SOM } from "@dniskav/neuron";
333
+
334
+ const som = new SOM(10, 10, 3); // 10×10 grid, 3-dimensional inputs (RGB)
335
+ som.train(colors, 500);
142
336
 
143
- ### Optimizers — Adam, Momentum, SGD
337
+ const [row, col] = som.getBMU([255, 0, 0]); // find best matching unit for red
338
+ console.log(som.quantizationError(colors));
339
+ ```
144
340
 
145
- Pass an optimizer factory. Each weight gets its own instance with independent state.
341
+ ### Hopfield Network — associative memory
146
342
 
147
343
  ```ts
148
- import { NetworkN, relu, sigmoid, Adam } from "@dniskav/neuron";
344
+ import { HopfieldNetwork } from "@dniskav/neuron";
149
345
 
150
- const net = new NetworkN([2, 64, 1], {
151
- activations: [relu, sigmoid],
152
- optimizer: () => new Adam(), // default: beta1=0.9, beta2=0.999
153
- });
346
+ const net = new HopfieldNetwork(64); // 64 binary neurons
154
347
 
155
- // Momentum example
156
- import { Momentum } from "@dniskav/neuron";
157
- const net2 = new NetworkN([2, 32, 1], {
158
- optimizer: () => new Momentum(0.9),
159
- });
348
+ // Store two 64-bit patterns
349
+ net.store(HopfieldNetwork.binarize(pattern1)); // converts 0/1 → -1/+1
350
+ net.store(HopfieldNetwork.binarize(pattern2));
351
+
352
+ // Recall from noisy input
353
+ const recovered = net.recall(HopfieldNetwork.binarize(noisyPattern1));
354
+ console.log(net.energy(recovered)); // local minimum = stored memory
160
355
  ```
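
Hebbian storage, the energy function, and asynchronous recall fit in a short sketch. The code below is an independent illustration with hypothetical function names, not the package's `HopfieldNetwork`. It assumes patterns are already in -1/+1 form, weights are scaled by 1/N with a zero diagonal, and units are updated one at a time in index order.

```ts
// Standalone Hopfield sketch (hypothetical names, patterns in -1/+1 form).
function hebbianWeights(patterns: number[][]): number[][] {
  const n = patterns[0].length;
  const W = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  for (const p of patterns) {
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) {
        if (i !== j) W[i][j] += (p[i] * p[j]) / n; // no self-connections
      }
    }
  }
  return W;
}

function hopfieldEnergy(W: number[][], state: number[]): number {
  let total = 0;
  for (let i = 0; i < state.length; i++) {
    for (let j = 0; j < state.length; j++) total += W[i][j] * state[i] * state[j];
  }
  return -0.5 * total;
}

function hopfieldRecall(W: number[][], start: number[], maxSweeps = 10): number[] {
  const state = start.slice();
  for (let sweep = 0; sweep < maxSweeps; sweep++) {
    let changed = false;
    for (let i = 0; i < state.length; i++) {
      const field = W[i].reduce((acc, w, j) => acc + w * state[j], 0);
      const next = field >= 0 ? 1 : -1;
      if (next !== state[i]) { state[i] = next; changed = true; }
    }
    if (!changed) break; // reached a fixed point
  }
  return state;
}

const storedPattern = [1, -1, 1, -1, 1, 1, -1, -1];
const hopfieldW = hebbianWeights([storedPattern]);
const noisyPattern = storedPattern.slice();
noisyPattern[0] = -noisyPattern[0]; // flip one bit
const recoveredPattern = hopfieldRecall(hopfieldW, noisyPattern);
console.log(recoveredPattern, hopfieldEnergy(hopfieldW, recoveredPattern));
```

Each asynchronous update can only lower or keep the energy, which is why recall settles into a stored pattern when the input is close enough to it.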
161
356
 
162
- Optimizers also work in `NetworkLSTM` (applied to the dense layers):
357
+ ### Autoencoder — learn compressed representations
163
358
 
164
359
  ```ts
165
- import { NetworkLSTM, relu, Adam } from "@dniskav/neuron";
360
+ import { Autoencoder } from "@dniskav/neuron";
166
361
 
167
- const net = new NetworkLSTM(1, 8, [4, 1], {
168
- denseActivation: relu,
169
- optimizer: () => new Adam(0.001),
170
- });
362
+ // 784 → [128, 64] → 16 (latent) → [64, 128] → 784
363
+ const ae = new Autoencoder(784, [128, 64], 16, [64, 128]);
364
+
365
+ for (let e = 0; e < 1000; e++)
366
+ for (const x of images)
367
+ ae.train(x, 0.001);
368
+
369
+ const latent = ae.encode(image); // compressed: 16 values
370
+ const reconstructed = ae.reconstruct(image); // decoded back: 784 values
171
371
  ```
172
372
 
173
- ### Loss utilities
373
+ ### GAN — generative adversarial training
174
374
 
175
375
  ```ts
176
- import { mse, crossEntropy } from "@dniskav/neuron";
376
+ import { GAN } from "@dniskav/neuron";
377
+
378
+ const gan = new GAN(
379
+ 16, // latentDim
380
+ [32, 64], // generator hidden layers
381
+ 8, // outputDim (size of generated samples)
382
+ [64, 32], // discriminator hidden layers
383
+ );
384
+
385
+ for (let step = 0; step < 10000; step++) {
386
+ const { dLoss, gLoss } = gan.trainStep(realBatch, 0.0002);
387
+ if (step % 500 === 0) console.log(`D: ${dLoss.toFixed(3)} G: ${gLoss.toFixed(3)}`);
388
+ }
177
389
 
178
- const predicted = net.predict([0.5, 0.3]);
179
- console.log(mse(predicted, [1, 0]));
180
- console.log(crossEntropy(predicted, [1, 0]));
390
+ const fake = gan.generate(); // new synthetic sample
181
391
  ```
182
392
 
183
- ### trainWithDeltas — custom loss / physics-based gradients
393
+ ### VAE — variational autoencoder
394
+
395
+ ```ts
396
+ import { VAE } from "@dniskav/neuron";
397
+
398
+ const vae = new VAE(784, [256, 128], 32, [128, 256]);
184
399
 
185
- `NetworkN` also exposes `trainWithDeltas` for when you compute your own output-layer deltas (e.g., from a physics simulation or a custom loss function):
400
+ for (const x of dataset) {
401
+ const { totalLoss, reconLoss, klLoss } = vae.train(x, 0.001);
402
+ }
403
+
404
+ // Sample from latent space
405
+ const generated = vae.generate(); // random sample
406
+ const { mu, logVar } = vae.encode(image); // encode → distribution params
407
+ const z = vae.reparametrize(mu, logVar); // sample z ~ N(μ, σ²)
408
+ ```
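
The reparametrization trick and the KL term can be written out directly. The sketch below is an assumption-labeled illustration, not the package's `VAE` internals: the function names are hypothetical, the prior is taken to be a standard normal, and the noise vector can be passed in explicitly so the result is reproducible.

```ts
// Standalone sketch of the reparametrization trick and the KL term (hypothetical names).
function gaussianNoise(): number {
  // Box-Muller transform; 1 - Math.random() keeps the argument of log above zero.
  const u = 1 - Math.random();
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// z = mu + sigma * eps, with sigma = exp(0.5 * logVar).
function reparametrizeSketch(
  mu: number[],
  logVar: number[],
  eps: number[] = mu.map(() => gaussianNoise()),
): number[] {
  return mu.map((m, i) => m + Math.exp(0.5 * logVar[i]) * eps[i]);
}

// KL( N(mu, sigma^2) || N(0, 1) ) summed over latent dimensions.
function klToStandardNormal(mu: number[], logVar: number[]): number {
  return -0.5 * mu.reduce(
    (sum, m, i) => sum + 1 + logVar[i] - m * m - Math.exp(logVar[i]),
    0,
  );
}

const zFixed = reparametrizeSketch([1, 2], [0, 0], [0.5, -1]);
console.log(zFixed, klToStandardNormal([1, 0], [0, 0]));
```

Sampling `eps` outside the network is what keeps the path from `mu` and `logVar` to `z` differentiable.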
409
+
410
+ ### Word2Vec — learns word embeddings
186
411
 
187
412
  ```ts
188
- import { NetworkN, mseDelta } from "@dniskav/neuron";
413
+ import { Word2Vec } from "@dniskav/neuron";
414
+
415
+ const w2v = new Word2Vec(64, { model: 'skipgram', windowSize: 2 });
189
416
 
190
- const net = new NetworkN([3, 16, 2]);
191
- const pred = net.predict(inputs);
417
+ const corpus = [
418
+ ["the", "king", "rules", "the", "kingdom"],
419
+ ["the", "queen", "rules", "the", "land"],
420
+ ["man", "and", "woman", "are", "human"],
421
+ ];
192
422
 
193
- // Compute deltas manually using a helper, or from any external signal
194
- const deltas = pred.map((p, i) => mseDelta(p, targets[i]));
195
- net.trainWithDeltas(inputs, deltas, 0.01);
423
+ w2v.buildVocab(corpus);
424
+ w2v.train(corpus, 0.05, 200);
425
+
426
+ console.log(w2v.similarity("king", "queen")); // high
427
+ console.log(w2v.mostSimilar("king", 3));
428
+ // [{ word: 'queen', score: 0.91 }, ...]
429
+
430
+ // Vector arithmetic: king - man + woman ≈ queen
431
+ console.log(w2v.analogy("king", "man", "woman", 1));
432
+ // [{ word: 'queen', score: 0.87 }]
196
433
  ```
197
434
 
198
- ### NetworkLSTM — recurrent network with memory
435
+ ### t-SNE — visualize embeddings in 2D
436
+
437
+ ```ts
438
+ import { TSNE } from "@dniskav/neuron";
439
+
440
+ // Reduce 128-dim embeddings → 2D for plotting
441
+ const tsne = new TSNE({ perplexity: 30, nIter: 1000, seed: 42 });
442
+ const points2D = tsne.fitTransform(embeddings128D); // [n][2]
443
+
444
+ console.log(tsne.kl()); // KL divergence — lower is better
445
+ // Plot points2D with any charting library
446
+ ```
199
447
 
200
- `NetworkLSTM` adds within-episode memory: the network can remember what happened in previous steps of the same sequence.
448
+ ### PositionalEncoding — order without parameters
201
449
 
202
450
  ```ts
203
- import { NetworkLSTM } from "@dniskav/neuron";
451
+ import { PositionalEncoding, LearnedPositionalEncoding } from "@dniskav/neuron";
204
452
 
205
- // 1 input → LSTM(8 hidden) → Dense(4) → 1 output
206
- const net = new NetworkLSTM(1, 8, [4, 1]);
453
+ // Sinusoidal: deterministic, no training needed
454
+ const pe = PositionalEncoding.encodeSequence(512, 128); // [512][128]
455
+ const withPos = PositionalEncoding.apply(tokenEmbeddings); // add PE to embeddings
207
456
 
208
- // Task: predict 1 if we're past step 3 in the episode, else 0
209
- // A feedforward net can't do this — it has no memory of step count.
457
+ // Learned: trainable, fixed maxSeqLen
458
+ const lpe = new LearnedPositionalEncoding(512, 128);
459
+ const withLearnedPos = lpe.apply(tokenEmbeddings);
460
+ lpe.update(gradients, 0.001); // update during backprop
461
+ ```
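
A sinusoidal table of the kind described by Vaswani et al. can be generated without any state. This sketch is independent of the package's `PositionalEncoding`; the function name is hypothetical, and it assumes the common layout in which even columns hold sines and odd columns hold cosines.

```ts
// Standalone sketch of sinusoidal positional encoding (hypothetical name).
function sinusoidalEncoding(seqLen: number, dModel: number): number[][] {
  return Array.from({ length: seqLen }, (_, pos) =>
    Array.from({ length: dModel }, (_, k) => {
      const pairIndex = Math.floor(k / 2); // columns 2i and 2i+1 share a frequency
      const angle = pos / Math.pow(10000, (2 * pairIndex) / dModel);
      return k % 2 === 0 ? Math.sin(angle) : Math.cos(angle);
    }));
}

const peTable = sinusoidalEncoding(4, 6);
console.log(peTable[0]); // position 0: sines are 0, cosines are 1
```

Since the table depends only on `pos` and the column index, it can be extended to sequence lengths never seen in training.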
210
462
 
211
- for (let epoch = 0; epoch < 300; epoch++) {
212
- net.resetState(); // clear memory at episode start
463
+ ### ContrastiveLearning — representations without labels
213
464
 
214
- const targets: number[][] = [];
215
- for (let step = 0; step < 6; step++) {
216
- net.predict([1]); // same input every step
217
- targets.push([step >= 3 ? 1 : 0]);
218
- }
465
+ ```ts
466
+ import { ContrastiveLearning, Augmenter } from "@dniskav/neuron";
219
467
 
220
- net.train(targets, 0.05); // BPTT across the full episode
221
- }
468
+ // Encoder: 128 → [256, 128] → 64 latent, projection head: 64 → 32
469
+ const cl = new ContrastiveLearning(128, [256, 128], 64, { temperature: 0.5 });
470
+
471
+ // Create positive pairs from unlabeled data (two augmented views per sample)
472
+ const pairs = unlabeledData.map(x => Augmenter.makePair(x));
222
473
 
223
- // Run a fresh episode and check predictions
224
- net.resetState();
225
- for (let step = 0; step < 6; step++) {
226
- const [out] = net.predict([1]);
227
- console.log(`step ${step}: ${out.toFixed(2)} (expected: ${step >= 3 ? 1 : 0})`);
474
+ for (let step = 0; step < 1000; step++) {
475
+ const loss = cl.trainStep(pairs, 0.001);
476
+ if (step % 100 === 0) console.log(`step ${step}: ${loss.toFixed(4)}`);
228
477
  }
229
- // step 0: 0.07 (expected: 0)
230
- // step 1: 0.11 (expected: 0)
231
- // step 2: 0.18 (expected: 0)
232
- // step 3: 0.81 (expected: 1)
233
- // step 4: 0.89 (expected: 1)
234
- // step 5: 0.93 (expected: 1)
478
+
479
+ // Use encoder for downstream tasks (classification, clustering, etc.)
480
+ const representation = cl.encode(newSample); // 64-dim vector
235
481
  ```
236
482
 
237
- The network learns to count steps using its hidden state; no external counter is needed.
483
+ ### Value / Tape — automatic differentiation
238
484
 
239
- ## How it works
485
+ ```ts
486
+ import { Value } from "@dniskav/neuron";
240
487
 
241
- Each class applies an **activation function** to the weighted sum of inputs and uses **gradient descent** to update weights:
488
+ // Build a computation graph
489
+ const x = new Value(2.0);
490
+ const w = new Value(-3.0);
491
+ const b = new Value(6.7);
492
+ const n = x.mul(w).add(b); // n = x*w + b
493
+ const o = n.tanh(); // o = tanh(n)
494
+
495
+ // Backward pass — fills .grad for every node
496
+ o.backward();
242
497
 
498
+ console.log(x.grad); // ∂o/∂x
499
+ console.log(w.grad); // ∂o/∂w
500
+ console.log(b.grad); // ∂o/∂b
243
501
  ```
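
To show what a scalar reverse-mode node does under the hood, here is a micrograd-style sketch. It is not the package's `Value` class: `ScalarNode` is a hypothetical name, only `add`, `mul`, and `tanh` are supported, and gradients are propagated in reverse topological order.

```ts
// Standalone sketch of scalar reverse-mode autodiff (hypothetical class, not the library's Value).
class ScalarNode {
  grad = 0;
  private backwardFn: () => void = () => {};

  constructor(public data: number, private parents: ScalarNode[] = []) {}

  add(other: ScalarNode): ScalarNode {
    const out = new ScalarNode(this.data + other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += out.grad;
      other.grad += out.grad;
    };
    return out;
  }

  mul(other: ScalarNode): ScalarNode {
    const out = new ScalarNode(this.data * other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }

  tanh(): ScalarNode {
    const t = Math.tanh(this.data);
    const out = new ScalarNode(t, [this]);
    out.backwardFn = () => {
      this.grad += (1 - t * t) * out.grad; // d tanh(n) / dn = 1 - tanh(n)^2
    };
    return out;
  }

  backward(): void {
    // Topological order guarantees a node's grad is complete before it is propagated.
    const order: ScalarNode[] = [];
    const visited = new Set<ScalarNode>();
    const visit = (node: ScalarNode): void => {
      if (visited.has(node)) return;
      visited.add(node);
      node.parents.forEach(p => visit(p));
      order.push(node);
    };
    visit(this);
    this.grad = 1;
    for (let i = order.length - 1; i >= 0; i--) order[i].backwardFn();
  }
}

const inputX = new ScalarNode(2.0);
const weightW = new ScalarNode(-3.0);
const biasB = new ScalarNode(6.7);
const outputO = inputX.mul(weightW).add(biasB).tanh();
outputO.backward();
console.log(inputX.grad, weightW.grad, biasB.grad);
```

With these inputs the pre-activation is about 0.7, so each gradient is the local factor `1 - tanh(0.7)²` multiplied by `w`, `x`, or 1 respectively.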
244
- weight += lr × delta × input
245
- bias += lr × delta
502
+
503
+ ### Conv2D + MaxPool2D + Flatten — CNN pipeline
504
+
505
+ ```ts
506
+ import { Conv2D, MaxPool2D, Flatten, NetworkN, relu, sigmoid } from "@dniskav/neuron";
507
+
508
+ const conv = new Conv2D(28, 28, 1, 3, 8); // 28×28×1 → 26×26×8
509
+ const pool = new MaxPool2D(2); // 26×26×8 → 13×13×8
510
+ const flatten = new Flatten();
511
+ const dense = new NetworkN([13*13*8, 64, 10]);
512
+
513
+ // Forward
514
+ const featureMaps = conv.forward(image); // [H][W][C]
515
+ const pooled = pool.forward(featureMaps);
516
+ const flat = flatten.forward(pooled); // 1352 values
517
+ const logits = dense.predict(flat);
246
518
  ```
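
The shape comments in the CNN pipeline follow from the standard output-size formula `floor((in + 2·padding − kernel) / stride) + 1`. The helper below is a hypothetical sketch for checking those numbers. It assumes no padding and stride 1 for the convolution, and a 2×2 pool with stride 2.

```ts
// Standalone shape check for a conv + pool pipeline (hypothetical helper).
function convOutputSize(input: number, kernel: number, stride = 1, padding = 0): number {
  return Math.floor((input + 2 * padding - kernel) / stride) + 1;
}

const convSide = convOutputSize(28, 3);           // 3x3 kernel, stride 1, no padding
const poolSide = convOutputSize(convSide, 2, 2);  // 2x2 max pool, stride 2
const flatLength = poolSide * poolSide * 8;       // 8 filters
console.log(convSide, poolSide, flatLength);      // 26 13 1352
```

Running the arithmetic before wiring layers together is a cheap way to catch a mismatch between the flattened length and the first dense layer.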
247
519
 
248
- `NetworkN` implements full **backpropagation** across all layers, propagating deltas from the output back to the first layer using the chain rule. The derivative of the chosen activation is applied at each layer.
520
+ ### RNN — vanilla recurrent network
249
521
 
250
- `NeuronN` uses simplified **Xavier initialization** — weights start in `[-√(1/n), +√(1/n)]` — so gradients flow well from the start of training.
522
+ ```ts
523
+ import { RNN } from "@dniskav/neuron";
251
524
 
252
- When an **optimizer** is used (e.g., Adam), the raw gradient is passed to the optimizer instead of being applied directly. Each weight maintains its own optimizer state (velocity, moments).
525
+ // 1 input → 16 hidden → 1 output, over a sequence
526
+ const rnn = new RNN(1, 16, 1);
253
527
 
254
- ## Build
528
+ const sequence = [[0.1], [0.3], [0.7], [0.9]]; // 4 timesteps
529
+ const { outputs, hiddens } = rnn.forward(sequence);
255
530
 
256
- ```bash
257
- npm run build # outputs CJS + ESM + type declarations to dist/
258
- npm run dev # watch mode
531
+ // BPTT backward — returns MSE loss
532
+ const targets = [[0.2], [0.5], [0.8], [1.0]];
533
+ const loss = rnn.backward(sequence, targets, 0.01);
259
534
  ```
260
535
 
261
- ## For AI agents
536
+ ### TCN — Temporal Convolutional Network
262
537
 
263
- If you are an AI agent or LLM working with this codebase, read [AGENTS.md](AGENTS.md) first. It contains the full class hierarchy, design constraints, and what this library does not do.
538
+ ```ts
539
+ import { TCN } from "@dniskav/neuron";
264
540
 
265
- ### NetworkTransformer — self-attention over sequences
541
+ // 3 input channels → 32 channels × 4 levels → 1 output
542
+ // Receptive field = (3-1)·(2⁴-1)+1 = 31 timesteps
543
+ const tcn = new TCN(3, 32, 3, 4, 1);
544
+
545
+ const sequence = Array.from({ length: 50 }, () => [Math.random(), Math.random(), Math.random()]);
546
+ const outputs = tcn.forward(sequence); // [50][1]
547
+ ```
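
The receptive-field formula `(k−1)·(2^L−1)+1` can be checked by accumulating the reach of each level. This sketch assumes one causal convolution per level with dilation doubling from 1; the function name is hypothetical, and implementations that stack two convolutions per residual block would see a larger field.

```ts
// Standalone receptive-field check for stacked dilated causal convolutions (hypothetical helper).
function tcnReceptiveField(kernelSize: number, levels: number): number {
  let field = 1; // a single output step sees at least itself
  for (let level = 0; level < levels; level++) {
    const dilation = Math.pow(2, level); // 1, 2, 4, 8, ...
    field += (kernelSize - 1) * dilation;
  }
  return field;
}

console.log(tcnReceptiveField(3, 4)); // 1 + 2·(1 + 2 + 4 + 8) = 31
```

Doubling the dilation at each level is what lets the field grow exponentially with depth while the parameter count grows only linearly.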
548
+
549
+ ### NetworkLSTM — recurrent memory
266
550
 
267
551
  ```ts
268
- import { NetworkTransformer } from "@dniskav/neuron";
269
-
270
- // Sudoku solver: 81 cells (tokens), values 0–9, predict digit 1–9 per cell
271
- const net = new NetworkTransformer(81, {
272
- vocabSize: 10, // digits 0–9
273
- d_model: 64, // embedding / hidden dimension
274
- nHeads: 4, // attention heads (d_k = d_model / nHeads = 16)
275
- d_ff: 128, // FFN hidden size
276
- nBlocks: 4, // number of transformer blocks
277
- nClasses: 9, // output classes per token (digits 1–9)
278
- });
552
+ import { NetworkLSTM } from "@dniskav/neuron";
553
+
554
+ const net = new NetworkLSTM(1, 8, [4, 1]);
555
+
556
+ for (let epoch = 0; epoch < 300; epoch++) {
557
+ net.resetState();
558
+ for (let step = 0; step < 6; step++) net.predict([1]);
559
+ net.train([[0],[0],[0],[1],[1],[1]], 0.05);
560
+ }
561
+ ```
279
562
 
280
- // tokens: 81 cell values (0 = empty)
281
- const puzzle = [5,3,0, 0,7,0, 0,0,0, ...];
282
- const targets = [...]; // 81*9 one-hot values
283
- const mask = puzzle.map(v => v === 0); // only train on empty cells
563
+ ### Metrics — evaluate your model
284
564
 
285
- const loss = net.train(puzzle, targets, 0.001, mask);
286
- // loss is cross-entropy (not MSE); it decreases from ~2.2 toward 0 as training progresses
287
- const logits = net.predict(puzzle); // 729 logits (81 × 9)
565
+ ```ts
566
+ import { accuracy, f1Score, confusionMatrix, printConfusionMatrix, auc, classificationReport } from "@dniskav/neuron";
288
567
 
289
- // Attention weights from all blocks for visualization
290
- const weights = net.getAttentionWeights();
291
- // weights[blockIdx][headIdx] → seqLen × seqLen matrix
568
+ const yTrue = [0, 1, 1, 0, 1];
569
+ const yPred = [0, 1, 0, 0, 1];
570
+
571
+ console.log(accuracy(yTrue, yPred)); // 0.8
572
+ console.log(f1Score(yTrue, yPred)); // 0.8
573
+
574
+ const cm = confusionMatrix(yTrue, yPred);
575
+ printConfusionMatrix(cm, ['neg', 'pos']);
576
+
577
+ // AUC-ROC
578
+ const scores = [0.1, 0.9, 0.4, 0.2, 0.8];
579
+ console.log(auc(yTrue, scores)); // 1 (every positive outscores every negative)
580
+
581
+ classificationReport(yTrue, yPred, ['neg', 'pos']);
292
582
  ```
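
The metric values for this toy data can be verified by hand. The sketch below is independent of the package's `f1Score` and `auc`: the function names are hypothetical, labels are assumed to be 0/1 with at least one of each, and AUC is computed as the fraction of positive/negative pairs ranked correctly, which equals the trapezoidal area under the ROC curve.

```ts
// Standalone metric sketch (hypothetical names, binary 0/1 labels assumed).
function f1FromLabels(yTrue: number[], yPred: number[]): number {
  let truePos = 0, falsePos = 0, falseNeg = 0;
  yTrue.forEach((t, i) => {
    if (yPred[i] === 1 && t === 1) truePos++;
    else if (yPred[i] === 1 && t === 0) falsePos++;
    else if (yPred[i] === 0 && t === 1) falseNeg++;
  });
  const precision = truePos / (truePos + falsePos);
  const recall = truePos / (truePos + falseNeg);
  return (2 * precision * recall) / (precision + recall);
}

function aucFromScores(yTrue: number[], scores: number[]): number {
  let correctlyRanked = 0;
  let pairCount = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yTrue[i] !== 1) continue;
    for (let j = 0; j < yTrue.length; j++) {
      if (yTrue[j] !== 0) continue;
      pairCount++;
      if (scores[i] > scores[j]) correctlyRanked += 1;
      else if (scores[i] === scores[j]) correctlyRanked += 0.5; // ties count half
    }
  }
  return correctlyRanked / pairCount;
}

const labelsTrue = [0, 1, 1, 0, 1];
const labelsPred = [0, 1, 0, 0, 1];
const labelScores = [0.1, 0.9, 0.4, 0.2, 0.8];
console.log(f1FromLabels(labelsTrue, labelsPred), aucFromScores(labelsTrue, labelScores));
```

Here precision is 1 and recall is 2/3, giving an F1 of 0.8. The lowest positive score (0.4) is still above the highest negative score (0.2), so every pair is ranked correctly and the AUC is 1.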
293
583
 
294
- Each head in each block learns a different type of relationship (row, column,
295
- 3×3 box). The network figures this out by itself through training.
584
+ ### EarlyStopping
585
+
586
+ ```ts
587
+ import { EarlyStopping } from "@dniskav/neuron";
588
+
589
+ const stopper = new EarlyStopping({ patience: 10, minDelta: 1e-4, mode: 'min' });
296
590
 
297
- ### NetworkTransformerRL — Transformer for reinforcement learning
591
+ for (let epoch = 0; epoch < 1000; epoch++) {
592
+ const valLoss = trainEpoch();
593
+ if (stopper.update(valLoss, epoch)) {
594
+ console.log(`Stopped at epoch ${epoch}`);
595
+ break;
596
+ }
597
+ }
598
+ ```
298
599
 
299
- `NetworkTransformerRL` uses causal self-attention over a sliding window of past states to output Q-values. Unlike `NetworkLSTM`, the agent attends to specific past moments rather than compressing them into a single hidden vector.
600
+ ### LossPlotter: ASCII loss curve
300
601
 
301
602
  ```ts
302
- import { NetworkTransformerRL } from "@dniskav/neuron";
303
-
304
- // Agent sees the last 8 steps, each step is a 7-value sensor vector → 4 actions
305
- const net = new NetworkTransformerRL(8, 7, {
306
- d_model: 32,
307
- nHeads: 2,
308
- d_ff: 64,
309
- nBlocks: 2,
310
- nActions: 4,
311
- });
603
+ import { LossPlotter } from "@dniskav/neuron";
312
604
 
313
- // Each step: feed the last N states as a sequence
314
- const sequence = getLastNStates(); // number[][] — shape: [8, 7]
315
- const qValues = net.predict(sequence); // number[4]
605
+ const plotter = new LossPlotter({ width: 60, height: 12, title: 'Training Loss' });
316
606
 
317
- // Q-learning update: train toward Bellman target
318
- const action = argmax(qValues);
319
- const reward = env.step(action);
320
- const targets = qValues.slice();
321
- targets[action] = reward + 0.99 * Math.max(...net.predict(nextSequence));
607
+ for (let e = 0; e < 500; e++) {
608
+ const loss = trainStep();
609
+ plotter.add(loss, e);
610
+ }
611
+
612
+ plotter.print();
613
+ // Training Loss
614
+ // ┌────────────────────────────────────────────────────────────┐
615
+ // │ 2.31 ·
616
+ // │ · ·
617
+ // │ · · ·
618
+ // │ · · · · · · ·
619
+ // │ 0.02 · · · · · · · · · · · · · · ·
620
+ // └────────────────────────────────────────────────────────────┘
621
+ // 0 250 499
622
+ ```
623
+
624
+ ### DataAugmentation
625
+
626
+ ```ts
627
+ import { DataAugmentation } from "@dniskav/neuron";
322
628
 
323
- const loss = net.train(sequence, targets, 0.001);
629
+ // Split dataset
630
+ const { trainX, trainY, valX, valY } = DataAugmentation.split(X, y, 0.8, 0.1);
631
+
632
+ // Normalize (fit on train, apply to all)
633
+ const { normalized: normTrain, min, max } = DataAugmentation.normalize(trainX);
634
+ const normVal = valX.map(x => DataAugmentation.normalizePoint(x, min, max));
635
+
636
+ // Augment training set (×3 copies with Gaussian noise)
637
+ const { X: augX, y: augY } = DataAugmentation.augmentBatch(normTrain, trainY, 3, 0.02);
324
638
  ```
325
639
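The "fit on train, apply to all" step is worth spelling out, because computing the min and max on the full dataset would leak validation statistics into training. Below is a minimal sketch of per-feature min-max scaling under that rule. The helpers `fitMinMax` and `applyMinMax` are hypothetical names for this illustration and are not the library's `normalize` / `normalizePoint`.

```ts
// Hypothetical helpers; names and return shapes are assumptions for this sketch.
function fitMinMax(rows: number[][]): { min: number[]; max: number[] } {
  const dims = rows[0].length;
  const min = new Array<number>(dims).fill(Infinity);
  const max = new Array<number>(dims).fill(-Infinity);
  for (const row of rows) {
    for (let d = 0; d < dims; d++) {
      if (row[d] < min[d]) min[d] = row[d];
      if (row[d] > max[d]) max[d] = row[d];
    }
  }
  return { min, max };
}

function applyMinMax(row: number[], min: number[], max: number[]): number[] {
  return row.map((v, d) => {
    const range = max[d] - min[d];
    return range === 0 ? 0 : (v - min[d]) / range; // constant features map to 0
  });
}

const train = [[0, 10], [5, 20], [10, 30]];
const val = [[2.5, 40]];

// Statistics come from the training rows only.
const { min, max } = fitMinMax(train);
const normTrain = train.map(r => applyMinMax(r, min, max));
const normVal = val.map(r => applyMinMax(r, min, max));

console.log(normTrain); // [ [ 0, 0 ], [ 0.5, 0.5 ], [ 1, 1 ] ]
console.log(normVal);   // [ [ 0.25, 1.5 ] ]
```

The validation value 1.5 falls outside `[0, 1]` because 40 exceeds the training maximum of 30. That is expected when the scaler is fit on the training split alone.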
 
326
- The last step in the sequence gets 2× pooling weight, so the most recent state contributes more to the decision.
640
+ ### WeightInspector: diagnose your network
327
641
 
328
642
  ```ts
329
- // Inspect what the agent is attending to
330
- const attnWeights = net.getAttentionWeights();
331
- // attnWeights[blockIdx][headIdx] → seqLen × seqLen matrix
643
+ import { NetworkN, WeightInspector, relu } from "@dniskav/neuron";
644
+
645
+ const net = new NetworkN([784, 256, 128, 10], { activations: [relu, relu, relu] });
646
+ // ... train ...
647
+
648
+ WeightInspector.print(net);
649
+ // Layer 0: mean=0.001 std=0.056 min=-0.21 max=0.19 dead=0 params=200960
650
+ // Layer 1: mean=0.000 std=0.079 min=-0.31 max=0.28 dead=3 params=32896
651
+ // Layer 2: mean=-0.001 std=0.091 min=-0.28 max=0.32 dead=0 params=1290
332
652
  ```
333
653
 
654
+ ## How it works
655
+
656
+ Each neuron-based class applies an **activation function** to the weighted sum of its inputs and uses **gradient descent** to update weights:
657
+
658
+ ```
659
+ weight += lr × delta × input
660
+ bias += lr × delta
661
+ ```
662
+
663
+ `NetworkN` implements full **backpropagation** across all layers, propagating deltas from the output back to the first layer using the chain rule. `NeuronN` uses **Xavier initialization** — weights start in `[-√(1/n), +√(1/n)]`.
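The update rule and the `±√(1/n)` initialization range can be seen working together on a single sigmoid neuron. The sketch below is illustrative only and is not the library's `Neuron` or `NeuronN` source. It assumes a squared-error loss, a bias starting at 0, and a small seeded generator (`rand`) added purely so the run is repeatable.

```ts
// Illustrative sketch only; not the library's Neuron/NeuronN source.
const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// Small seeded LCG so the run is repeatable (an assumption for this sketch).
let seed = 42;
const rand = (): number => {
  seed = (seed * 1664525 + 1013904223) % 4294967296;
  return seed / 4294967296;
};

const nInputs = 2;
const limit = Math.sqrt(1 / nInputs); // weights start in [-limit, +limit]
const weights = Array.from({ length: nInputs }, () => (rand() * 2 - 1) * limit);
let bias = 0;

const predict = (x: number[]): number =>
  sigmoid(x.reduce((sum, xi, i) => sum + xi * weights[i], bias));

// Logical OR: linearly separable, so one neuron is enough.
const data: Array<[number[], number]> = [
  [[0, 0], 0], [[0, 1], 1], [[1, 0], 1], [[1, 1], 1],
];
const lr = 0.5;

for (let epoch = 0; epoch < 5000; epoch++) {
  for (const [x, target] of data) {
    const out = predict(x);
    // delta = error × sigmoid'(z), where sigmoid'(z) = out × (1 - out)
    const delta = (target - out) * out * (1 - out);
    for (let i = 0; i < nInputs; i++) weights[i] += lr * delta * x[i];
    bias += lr * delta;
  }
}

for (const [x, target] of data) {
  console.log(x, "->", predict(x).toFixed(3), "target", target);
}
```

After training, each prediction rounds to its target. In a multi-layer network the same `delta` is what backpropagation computes for every layer, using the chain rule instead of the direct output error.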
664
+
665
+ When an **optimizer** is used (e.g., Adam), the raw gradient is passed to the optimizer instead of being applied directly. Each weight maintains its own optimizer state.
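As a rough illustration of what "each weight maintains its own optimizer state" means, here is a scalar Adam sketch with the standard default hyperparameters. The class name `ScalarAdam` and its `step` signature are assumptions for this example and are not taken from the library's optimizer API. The sign convention is also an assumption: `step` takes the gradient of a loss and returns the amount to subtract from the weight.

```ts
// Hypothetical per-weight Adam; not the library's optimizer interface.
class ScalarAdam {
  private m = 0; // running mean of gradients
  private v = 0; // running mean of squared gradients
  private t = 0; // step counter for bias correction
  private readonly beta1 = 0.9;
  private readonly beta2 = 0.999;
  private readonly eps = 1e-8;

  // Takes a raw loss gradient and returns the amount to subtract from the weight.
  step(gradient: number, lr: number): number {
    this.t++;
    this.m = this.beta1 * this.m + (1 - this.beta1) * gradient;
    this.v = this.beta2 * this.v + (1 - this.beta2) * gradient * gradient;
    const mHat = this.m / (1 - Math.pow(this.beta1, this.t));
    const vHat = this.v / (1 - Math.pow(this.beta2, this.t));
    return (lr * mHat) / (Math.sqrt(vHat) + this.eps);
  }
}

// Two independent weights, each minimizing (w - 3)^2 with its own state.
const targetValue = 3;
const params = [0, 10];
const optimizers = params.map(() => new ScalarAdam());

for (let step = 0; step < 500; step++) {
  for (let i = 0; i < params.length; i++) {
    const gradient = 2 * (params[i] - targetValue); // d/dw of (w - 3)^2
    params[i] -= optimizers[i].step(gradient, 0.1);
  }
}

console.log(params.map(p => p.toFixed(3)));
```

Both weights end up close to 3 even though their initial gradients differ in size and sign. Adam rescales each step by that weight's own gradient history, so the first step has magnitude close to `lr` regardless of how large the raw gradient is.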
666
+
667
+ The `Value` class implements **reverse-mode automatic differentiation**: every operation records its inputs and a backward function. Calling `.backward()` on the output node performs a topological sort and propagates `∂L/∂w` through the entire graph.
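The record-then-replay idea can be sketched in the style of micrograd. The class below is a hypothetical miniature (named `Val` to avoid implying it matches the library's `Value` / `Tape` API) that supports only addition and multiplication. Each operation stores its inputs and a closure that pushes the output gradient back to them.

```ts
// Minimal micrograd-style sketch; the library's Value / Tape API may differ.
class Val {
  data: number;
  grad = 0;
  private parents: Val[];
  private backwardFn: () => void = () => {};

  constructor(data: number, parents: Val[] = []) {
    this.data = data;
    this.parents = parents;
  }

  add(other: Val): Val {
    const out = new Val(this.data + other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += out.grad;  // d(a+b)/da = 1
      other.grad += out.grad; // d(a+b)/db = 1
    };
    return out;
  }

  mul(other: Val): Val {
    const out = new Val(this.data * other.data, [this, other]);
    out.backwardFn = () => {
      this.grad += other.data * out.grad; // d(a·b)/da = b
      other.grad += this.data * out.grad; // d(a·b)/db = a
    };
    return out;
  }

  backward(): void {
    // Topological order: every node appears after all of its inputs.
    const order: Val[] = [];
    const seen = new Set<Val>();
    const visit = (node: Val): void => {
      if (seen.has(node)) return;
      seen.add(node);
      for (const p of node.parents) visit(p);
      order.push(node);
    };
    visit(this);

    this.grad = 1; // dL/dL
    for (let i = order.length - 1; i >= 0; i--) order[i].backwardFn();
  }
}

// L = (w·x + b)^2 with w = 2, x = 3, b = 1, so y = 7 and L = 49.
const w = new Val(2);
const x = new Val(3);
const b = new Val(1);
const y = w.mul(x).add(b);
const L = y.mul(y);
L.backward();

console.log(L.data); // 49
console.log(w.grad); // 42  (2·y·x)
console.log(b.grad); // 14  (2·y)
console.log(x.grad); // 28  (2·y·w)
```

Gradients are accumulated with `+=` because a node can feed several downstream operations. Here `y` is used twice in `y.mul(y)`, and both contributions must be summed to get `2·y`.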
668
+
669
+ ## Build
670
+
671
+ ```bash
672
+ npm run build # outputs CJS + ESM + type declarations to dist/
673
+ npm run dev # watch mode
674
+ npm test # run test suite
675
+ ```
676
+
677
+ ## For AI agents
678
+
679
+ If you are an AI agent or LLM working with this codebase, read [AGENTS.md](AGENTS.md) first. It contains the full class hierarchy, design constraints, and what this library does not do.
680
+
334
681
  ## Changelog
335
682
 
683
+ ### v0.3.1
684
+ - **New — Embeddings:** `Word2Vec` (Skip-gram + CBOW, full-softmax, cosine similarity, analogies), `TSNE` (binary-search perplexity, Student-t kernel, KL gradient, early exaggeration, seeded PRNG), `PositionalEncoding` (sinusoidal, Vaswani et al.), `LearnedPositionalEncoding` (trainable), `ContrastiveLearning` (NT-Xent, SimCLR encoder + projection head), `Augmenter` (noise, feature dropout, `makePair`)
685
+
686
+ ### v0.3.0
687
+ - **New — Classical ML:** `Perceptron`, `LinearRegression` (normal equation + GD), `LogisticRegression`, `SoftmaxRegression`, `GaussianNaiveBayes`, `DecisionTree` (CART, Gini/MSE)
688
+ - **New — Unsupervised:** `KMeans` (K-Means++ init), `PCA` (power iteration + Hotelling deflation), `SOM` (Kohonen map), `HopfieldNetwork` (Hebbian storage + energy), `Autoencoder`
689
+ - **New — Deep Learning:** `Conv2D` (full forward/backward), `MaxPool2D` (position mask for exact backprop), `Flatten`, `RNN` (BPTT, documents vanishing gradient), `Seq2Seq` (encoder-decoder LSTM), `CausalConv1D`, `TCN` (dilated temporal convolutions)
690
+ - **New — Generative:** `GAN` (min-max game, Box-Muller sampling), `VAE` (reparametrization trick, ELBO = MSE + KL)
691
+ - **New — Autograd:** `Value` / `Tape` — scalar reverse-mode AD with topological backprop (micrograd-style)
692
+ - **New — Metrics:** `confusionMatrix`, `accuracy`, `precision`, `recall`, `f1Score`, `rocCurve`, `auc`, `mae`, `rmse`, `r2Score`, `perplexity`, `printConfusionMatrix`, `classificationReport`
693
+ - **New — Utilities:** `EarlyStopping` (patience + best-weight restore), `LossPlotter` (ASCII terminal curve), `WeightInspector` (per-layer stats, dead ReLU detection), `DataAugmentation` (noise, normalize, z-score, shuffle, split)
694
+
336
695
  ### v0.2.7
337
- - **Docs:** Added architecture diagram to README — visual progression from `Neuron` to `NetworkTransformerRL`
696
+ - **Docs:** Added architecture diagram to README
338
697
 
339
698
  ### v0.2.6
340
699
  - **Fix:** `Network.predict` now returns `number[]` (consistent with all other network classes)
341
- - **Fix:** `Network.train` now uses the configured optimizer and `activation.dfn()` instead of hardcoded SGD and sigmoid derivative
342
- - **Fix:** `LayerNorm.backwardOne` now correctly uses pre-update γ when computing the input gradient
343
- - **Fix:** LSTM and GRU gate initialization corrected from He (`√(2/n)`) to Xavier fan-in+out (`√(2/(fanIn+fanOut))`), matching the sigmoid/tanh activations used in those gates
344
- - **New:** `BiasVector` — 1D counterpart to `WeightMatrix` with per-scalar Adam optimizers; replaces repeated `number[] + Adam[]` pairs in `TransformerBlock`, `NetworkTransformer`, and `NetworkTransformerRL`
345
- - **New:** `defaultOptimizer` exported from `optimizers.ts`: single source of truth for the default `() => new SGD()` factory
346
- - **Refactor:** `NetworkN.train` and `trainWithDeltas` share extracted `_forwardAll()` and `_backpropLayers()` internals — eliminates ~50 lines of duplication
347
- - **Refactor:** `Transformer` backward methods now throw descriptive errors instead of crashing with a cryptic `TypeError` when called before `predict()`
348
- - **Refactor:** `NetworkTransformer.setWeights()` and `NetworkTransformerRL.setWeightsFlat()` use each component's own `setWeights()` instead of direct `.W` mutation
700
+ - **Fix:** `Network.train` now uses the configured optimizer and `activation.dfn()`
701
+ - **Fix:** `LayerNorm.backwardOne` correctly uses pre-update γ
702
+ - **Fix:** LSTM and GRU gate initialization corrected to Xavier fan-in+out
703
+ - **New:** `BiasVector` — 1D counterpart to `WeightMatrix`
704
+ - **New:** `defaultOptimizer` — shared default factory
705
+ - **Refactor:** `NetworkN` extracts `_forwardAll()` and `_backpropLayers()`
349
706
 
350
707
  ### v0.2.5
351
- - Unified optimizer factories for `LSTMLayer`, `GRULayer`, `Conv1D` (per-scalar Adam/Momentum/SGD)
352
- - `NetworkN`: residual connections (`residual` option) and dropout (`dropoutRate`)
353
- - `Conv1D`: multi-channel input (`inputChannels`)
354
- - `NetworkTransformerRL`: configurable pooling (`avg` / `max` / `last` / `weighted`)
355
- - `Trainer`: weight decay, early stopping, classification metrics, gradient clipping support
356
- - `DataLoader`: validation split (`validationSplit` + `getValidationData()`)
357
- - `ModelSaver`: universal serialization via flat `getWeights()`/`setWeights()` for all classes
358
- - Gradient check test suite (`tests/GradientCheck.test.ts`)
359
-
360
- ## Possible improvements
361
-
362
- 1. **Support for batches** in training to improve efficiency and gradient stability.
363
- 2. **Global gradient norm clipping** — `WeightMatrix.update` supports per-element clipping; a utility to clip across all matrices by total norm would be more principled.
364
- 3. **Learning rate warmup** — standard practice for Transformers; ramp LR from 0 to target over the first N steps.
365
- 4. **Pre-norm architecture** — LayerNorm before the residual add (instead of after) is more stable for deep stacks.
708
+ - Unified optimizer factories for `LSTMLayer`, `GRULayer`, `Conv1D`
709
+ - `NetworkN`: residual connections and dropout
710
+ - `Conv1D`: multi-channel input
711
+ - `Trainer`: weight decay, early stopping, classification metrics
712
+ - `DataLoader`: validation split
713
+ - `ModelSaver`: universal serialization
366
714
 
367
715
  ## License
368
716