tensorgrad 0.0.1

Files changed (64)
  1. package/LICENSE +21 -0
  2. package/README.md +121 -0
  3. package/SPEC.md +293 -0
  4. package/dist/adam.d.ts +31 -0
  5. package/dist/adam.d.ts.map +1 -0
  6. package/dist/adam.js +66 -0
  7. package/dist/adam.js.map +1 -0
  8. package/dist/buffers.d.ts +56 -0
  9. package/dist/buffers.d.ts.map +1 -0
  10. package/dist/buffers.js +114 -0
  11. package/dist/buffers.js.map +1 -0
  12. package/dist/codegen.d.ts +23 -0
  13. package/dist/codegen.d.ts.map +1 -0
  14. package/dist/codegen.js +709 -0
  15. package/dist/codegen.js.map +1 -0
  16. package/dist/compile.d.ts +53 -0
  17. package/dist/compile.d.ts.map +1 -0
  18. package/dist/compile.js +76 -0
  19. package/dist/compile.js.map +1 -0
  20. package/dist/grad.d.ts +8 -0
  21. package/dist/grad.d.ts.map +1 -0
  22. package/dist/grad.js +404 -0
  23. package/dist/grad.js.map +1 -0
  24. package/dist/index.d.ts +12 -0
  25. package/dist/index.d.ts.map +1 -0
  26. package/dist/index.js +37 -0
  27. package/dist/index.js.map +1 -0
  28. package/dist/ir.d.ts +204 -0
  29. package/dist/ir.d.ts.map +1 -0
  30. package/dist/ir.js +60 -0
  31. package/dist/ir.js.map +1 -0
  32. package/dist/module.d.ts +21 -0
  33. package/dist/module.d.ts.map +1 -0
  34. package/dist/module.js +113 -0
  35. package/dist/module.js.map +1 -0
  36. package/dist/ops.d.ts +35 -0
  37. package/dist/ops.d.ts.map +1 -0
  38. package/dist/ops.js +270 -0
  39. package/dist/ops.js.map +1 -0
  40. package/dist/runtime.d.ts +26 -0
  41. package/dist/runtime.d.ts.map +1 -0
  42. package/dist/runtime.js +190 -0
  43. package/dist/runtime.js.map +1 -0
  44. package/dist/shape.d.ts +24 -0
  45. package/dist/shape.d.ts.map +1 -0
  46. package/dist/shape.js +259 -0
  47. package/dist/shape.js.map +1 -0
  48. package/dist/trace.d.ts +8 -0
  49. package/dist/trace.d.ts.map +1 -0
  50. package/dist/trace.js +93 -0
  51. package/dist/trace.js.map +1 -0
  52. package/package.json +62 -0
  53. package/src/adam.ts +95 -0
  54. package/src/buffers.ts +173 -0
  55. package/src/codegen.ts +758 -0
  56. package/src/compile.ts +120 -0
  57. package/src/grad.ts +459 -0
  58. package/src/index.ts +40 -0
  59. package/src/ir.ts +197 -0
  60. package/src/module.ts +126 -0
  61. package/src/ops.ts +311 -0
  62. package/src/runtime.ts +232 -0
  63. package/src/shape.ts +263 -0
  64. package/src/trace.ts +101 -0
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Ben Albahari
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,121 @@
1
+ # tensorgrad
2
+
3
+ A tiny TypeScript-native tensor library with autograd that compiles directly
4
+ to WebGPU. Designed for training small models in the browser — without
5
+ hand-writing WGSL kernels and without dragging in a 5 MB ML framework.
6
+
7
+ ```sh
8
+ npm i tensorgrad
9
+ ```
10
+
11
+ Roughly **3000 lines of zero-dependency TypeScript**, ~10 KB gzipped after
12
+ build. Targets WebGPU only. Static shapes only. Reverse-mode autograd and an
13
+ Adam optimizer; the whole training pipeline runs as compiled WGSL.
14
+
15
+ ## Quick example
16
+
17
+ A 3-layer MLP fitting `y = sin(x)`:
18
+
19
+ ```ts
20
+ import {
+   Module, compileModule,
+   add, mul, sub, sumLast, reshape, matmul, relu,
+   type Tensor,
+ } from 'tensorgrad'
+
+ class Linear extends Module {
+   W: Tensor; b: Tensor
+   constructor(public inDim: number, public outDim: number) {
+     super()
+     this.W = this.param([inDim, outDim])
+     this.b = this.param([outDim])
+   }
+ }
+
+ class MLP extends Module {
+   l1 = new Linear(1, 64)
+   l2 = new Linear(64, 64)
+   l3 = new Linear(64, 1)
+ }
+
+ const linear = (p: Linear, x: Tensor) => add(matmul(x, p.W), p.b)
+
+ function forward(m: MLP, x: Tensor): Tensor {
+   return linear(m.l3, relu(linear(m.l2, relu(linear(m.l1, x)))))
+ }
+
+ const B = 256
+
+ function loss(m: MLP, x: Tensor, y: Tensor): Tensor {
+   const diff = sub(forward(m, x), y)
+   return mul(sumLast(reshape(mul(diff, diff), [B])), 1 / B)
+ }
+
+ const model = new MLP()
+ const compiled = await compileModule(model, loss, {
+   adam: { lr: 0.005 },
+   inputs: [
+     { name: 'x', shape: [B, 1], dtype: 'f32' },
+     { name: 'y', shape: [B, 1], dtype: 'f32' },
+   ],
+ })
+
+ // Initialize `initialParams` however you like (random init, etc.), then upload + train.
+ compiled.uploadParams(initialParams)
+ for (let step = 0; step < 1000; step++) {
+   const { x, y } = generateBatch() // your own data source
+   const lossVal = await compiled.step({ x, y })
+   if (step % 100 === 0) console.log('step', step, 'loss', lossVal)
+ }
69
+ ```
70
+
71
+ That's the whole user-facing surface for this model: `Module` for parameter
72
+ storage, plain functions for the forward pass, `compileModule` to JIT-compile
73
+ to WGSL with autograd + Adam wired in. No decorators, no `tf.GradientTape`,
74
+ no `register_pytree_node`.
75
+
76
+ For a more involved example — a 3-layer transformer trained from scratch on
77
+ 2-digit addition — see the [`samples/`](./samples) workspace
78
+ (`pnpm --filter samples dev`).
79
+
80
+ ## What this library is for
81
+
82
+ Small browser-side ML where you want to *train* the model, not just run
83
+ inference on a pretrained model. Educational artifacts, interactive
84
+ demos, on-device personalization, "transformer from scratch in your browser"
85
+ blog posts. Roughly the niche where the model is small enough to fit
86
+ comfortably in a browser tab but where you still want autograd and a real
87
+ optimizer.
88
+
89
+ If you want to ship inference on a pretrained model, use
90
+ [ONNX Runtime Web](https://github.com/microsoft/onnxruntime) or
91
+ [transformers.js](https://github.com/xenova/transformers.js).
92
+ If you need full JAX (vmap / pmap / dynamic shapes / multi-backend), use
93
+ [jax-js](https://github.com/jax-js/jax).
94
+
95
+ ## Scope (deliberately small)
96
+
97
+ The library only does what it does because of what it doesn't do.
98
+ [`SPEC.md`](./SPEC.md) has the full design notes; the load-bearing
99
+ "out of scope" decisions are:
100
+
101
+ - **WebGPU only** — no Wasm or WebGL fallback.
102
+ - **Static shapes only** — every shape is fixed at compile time. This is
103
+ what lets us bake constants into the WGSL instead of carrying shape
104
+ uniforms.
105
+ - **`grad` is the only transformation** — no `vmap`, `pmap`, `jvp`,
106
+ `custom_vjp`. Batch your data explicitly.
107
+ - **`f32` only** — no dtype promotion, no mixed precision.
108
+ - **Closed op set** — about 25 ops, listed in `SPEC.md`. Compositions of
109
+ those handle most needs (GELU, RMS norm, etc. are a few lines on top; see the sketch below).
110
+ - **Adam lives in the IR** — bias correction included; no CPU↔GPU
111
+ round-trip per step.
112
+
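+ For a flavor of those compositions, here's a sketch (not shipped library
+ code; it assumes `div`, `exp`, `rsqrt`, and a `meanLast` reduction are
+ exported alongside the ops used in the example above):
+
+ ```ts
+ import { add, mul, div, exp, rsqrt, meanLast, reshape, type Tensor } from 'tensorgrad'
+
+ // Sigmoid-approximated GELU: x * sigmoid(1.702x) = x / (1 + e^(-1.702x)).
+ // The scalar operands use the JS-number overloads of add/mul.
+ const gelu = (x: Tensor) => div(x, add(exp(mul(x, -1.702)), 1))
+
+ // RMS norm for a [B, d] activation. Shapes are static, so B is a plain
+ // number; the [B, 1] reshape makes the reduced axis broadcastable again.
+ const rmsNorm = (x: Tensor, B: number, eps = 1e-5) =>
+   mul(x, reshape(rsqrt(add(meanLast(mul(x, x)), eps)), [B, 1]))
+ ```
+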
113
+ ## Status
114
+
115
+ Alpha. Two real working models (a transformer training to <0.1 loss on
116
+ addition, an MLP fitting `sin`). The API may change before 1.0. Issues
117
+ welcome.
118
+
119
+ ## License
120
+
121
+ MIT
package/SPEC.md ADDED
@@ -0,0 +1,293 @@
1
+ # Tensorgrad — Architecture
2
+
3
+ This document covers the design decisions, IR, and internals of tensorgrad.
4
+ For installation and user-facing API, see `README.md`. For pre-1.0
5
+ implementation status and what to pick up next, see `HANDOFF.md`.
6
+
7
+ ## Scope and non-goals (load-bearing)
8
+
9
+ The library only does what it does because of what it doesn't do. Each
10
+ "out of scope" decision is the *precondition* that lets the rest stay small.
11
+
12
+ **In scope:**
13
+ - Static-shape models. Every shape is fixed at compile time.
14
+ - WebGPU only.
15
+ - f32 only.
16
+ - `grad` (reverse-mode autograd) as the only transformation.
17
+ - A closed set of ~25 ops covering transformers + MLPs.
18
+ - Adam optimizer in-IR.
19
+
20
+ **Out of scope (deliberately):**
21
+ - Wasm or WebGL fallback.
22
+ - Dynamic shapes, shape polymorphism.
23
+ - `vmap`, `pmap`, `jvp`, `custom_vjp`, higher-order gradients.
24
+ - Dtype promotion, mixed precision.
25
+ - General PyTree machinery (we use `Module` + property paths instead).
26
+ - Inference of pre-trained models (use ONNX Runtime Web or transformers.js).
27
+ - ONNX import / safetensors / model loaders.
28
+ - Distributed training, gradient accumulation across devices.
29
+
30
+ The non-goals are load-bearing. Adding any of them back without rethinking
31
+ the IR would push complexity into every layer.
32
+
33
+ ## Architecture overview
34
+
35
+ ```
36
+ ┌────────────────────────────────────────────────────────────┐
37
+ │ User code │
38
+ │ class Model extends Module { /* params */ } │
39
+ │ function forward(m: Model, x: Tensor): Tensor { /* ... */ }│
40
+ └────────────────────────────────────────────────────────────┘
41
+
42
+ ▼ trace()
43
+ ┌────────────────────────────────────────────────────────────┐
44
+ │ Forward IR build (src/trace.ts, src/ops.ts, src/shape.ts) │
45
+ │ Each op call appends a node to the Graph and returns a │
46
+ │ fresh Tensor handle. Shapes inferred + validated per op. │
47
+ └────────────────────────────────────────────────────────────┘
48
+
49
+ ▼ appendGrad()
50
+ ┌────────────────────────────────────────────────────────────┐
51
+ │ Reverse-mode autograd (src/grad.ts) │
52
+ │ Topological walk; each forward op contributes its │
53
+ │ transpose rule, building the backward graph in-place. │
54
+ └────────────────────────────────────────────────────────────┘
55
+
56
+ ▼ appendAdam() (optional)
57
+ ┌────────────────────────────────────────────────────────────┐
58
+ │ Optimizer (src/adam.ts) │
59
+ │ Per-param: m_state, v_state state_inputs + fused │
60
+ │ adam_update_{m,v,p} ops. Writebacks declared. │
61
+ └────────────────────────────────────────────────────────────┘
62
+
63
+ ▼ planBuffers()
64
+ ┌────────────────────────────────────────────────────────────┐
65
+ │ Buffer plan (src/buffers.ts) │
66
+ │ One GPU buffer per IR tensor, categorized: │
67
+ │ param / param_grad / state / tensor_input / intermediate. │
68
+ │ Writebacks resolved to (source_buf → dest_buf) pairs. │
69
+ └────────────────────────────────────────────────────────────┘
70
+
71
+ ▼ emitKernels()
72
+ ┌────────────────────────────────────────────────────────────┐
73
+ │ WGSL codegen (src/codegen.ts) │
74
+ │ Per op kind: a kernel template with shapes baked in. │
75
+ │ Returns dispatch-ready KernelSpec[]. │
76
+ └────────────────────────────────────────────────────────────┘
77
+
78
+ ▼ createRuntime()
79
+ ┌────────────────────────────────────────────────────────────┐
80
+ │ Runtime (src/runtime.ts) │
81
+ │ GPUDevice setup, pipeline cache, bind groups, │
82
+ │ step(): upload inputs → dispatch all kernels → writebacks │
83
+ │ → loss readback. Compile errors surface via │
84
+ │ pushErrorScope+getCompilationInfo. │
85
+ └────────────────────────────────────────────────────────────┘
86
+ ```
87
+
88
+ ## Key design decisions
89
+
90
+ **D1. Runtime tracing, not build-time.** The forward function is traced once
91
+ on first compile; the IR is built from those op calls. Build-time tracing via
92
+ a TypeScript transformer plugin would be cleaner but adds a build-step
93
+ requirement. v2 candidate.
94
+
95
+ **D2. Tensors are opaque handles, not Proxies.** Each op returns a fresh
96
+ `Tensor` (just `{ id, shape, dtype, source, site }`). Proxy-based tracing
97
+ gives nicer error UX but couples the IR to runtime introspection.
98
+
99
+ **D3. No reference counting.** Every IR tensor gets its own GPU buffer,
100
+ allocated once and never freed. Our scope (one model, fixed shapes,
101
+ training in a loop) means there's nothing to gain from refcount discipline.
102
+ Memory cost is bounded; buffer pooling is a v2 optimization, not v1
103
+ correctness.
104
+
105
+ **D4. Closed op set.** The IR knows about exactly the ops it supports.
106
+ Adding a new op means adding (a) shape rule, (b) WGSL kernel template,
107
+ (c) autograd transpose rule. This is intentional — a closed op set makes
108
+ each piece tractable to write and verify by hand.
109
+
110
+ **D5. Shapes checked at trace time, not at type level.** Type-level shape
111
+ encoding in TypeScript is possible but hits recursion limits and adds
112
+ significant generic complexity to user code. Runtime shape errors at trace
113
+ time, with call-site capture, are good enough.
114
+
115
+ **D6. Adam state in the IR.** Optimizer state (m, v) lives in dedicated
116
+ `state_input` buffers that persist across `step()` calls; the per-step `lrt`
117
+ scalar (Adam's bias-corrected effective LR) arrives as a fresh `tensor_input`
118
+ each step. End-of-step writebacks copy the new m, v, and param values into
+ their persistent homes; the update math never round-trips through the CPU.
119
+
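+ The per-step scalar is cheap to compute host-side. A minimal sketch of the
+ formula (it matches the doc comment on `AdamResult.lrtInputName` in
+ `adam.d.ts`; the helper name is illustrative):
+
+ ```ts
+ function lrtForStep(step: number, cfg: { lr: number; b1: number; b2: number }): number {
+   const t = step + 1 // bias-correction terms use a 1-indexed step count
+   return cfg.lr * Math.sqrt(1 - cfg.b2 ** t) / (1 - cfg.b1 ** t)
+ }
+ ```
+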
120
+ **D7. Module separates from forward.** Mutable parameter storage lives in
121
+ `Module` subclasses. Forward functions are pure, take the materialized
122
+ model as the first argument, and call ordinary op functions. State and
123
+ computation never mix — JAX's lesson, applied to TypeScript with
124
+ class-based ergonomics.
125
+
126
+ **D8. JS-number scalar overloads.** `add(x, 1e-5)` and `add(x, y)` both
127
+ work. The scalar variants dispatch to fused IR ops internally.
128
+
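+ The overload is the usual TypeScript union-dispatch pattern. A sketch, where
+ `addScalar` and `addTensor` are illustrative internal names rather than the
+ real ones:
+
+ ```ts
+ function add(a: Tensor, b: Tensor | number): Tensor {
+   // A JS number routes to the fused `add_scalar` IR op; a Tensor goes
+   // through the broadcasting element-wise `add` op.
+   return typeof b === 'number' ? addScalar(a, b) : addTensor(a, b)
+ }
+ ```
+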
129
+ ## IR
130
+
131
+ ```ts
132
+ interface Tensor {
133
+   readonly id: number
133
+   readonly shape: Shape
134
+   readonly dtype: Dtype
135
+   readonly source: number | null // op index, or null for leaves
136
+   readonly site: CallSite | null // user's stack at op-call time
138
+ }
139
+
140
+ type OpNode =
141
+   | { kind: 'param_input'; ... } | { kind: 'tensor_input'; ... }
141
+   | { kind: 'state_input'; ... } | { kind: 'arange'; ... }
142
+   | { kind: 'const_scalar'; ... }
143
+   | { kind: 'add' | 'sub' | 'mul' | 'div'; ... }
144
+   | { kind: 'add_scalar' | 'mul_scalar'; ... }
145
+   | { kind: 'sqrt' | 'rsqrt' | 'log' | 'exp' | 'relu'; ... }
146
+   | { kind: 'less' | 'greater'; ... } | { kind: 'where'; ... }
147
+   | { kind: 'mean_last' | 'sum_last'; ... }
148
+   | { kind: 'reshape' | 'transpose' | 'slice_last_range'; ... }
149
+   | { kind: 'broadcast_to' | 'sum_to_shape'; ... }
150
+   | { kind: 'matmul' | 'matmul_batched'; ... }
151
+   | { kind: 'one_hot'; ... }
152
+   | { kind: 'softmax_causal_last' | 'log_softmax_last' | 'where_causal'; ... }
153
+   | { kind: 'relu_grad'; ... }
154
+   | { kind: 'adam_update_m' | 'adam_update_v' | 'adam_update_p'; ... }
156
+
157
+ interface Graph {
158
+ readonly ops: OpNode[]
159
+ readonly tensors: Tensor[]
160
+ readonly outputs: number[] // tensor ids — typically just the loss
161
+ }
162
+ ```
163
+
164
+ The op kinds are intentionally fine-grained (`mean_last`, not
165
+ `mean(axis)`) because each kind maps to a hand-written WGSL kernel. Adding
166
+ generality later is fine; pretending to be more general than we are isn't.
167
+
168
+ ## Op set (current)
169
+
170
+ **Leaves:** `param_input`, `tensor_input`, `state_input`, `arange`, `const_scalar`
171
+
172
+ **Element-wise binops** (NumPy broadcasting): `add`, `sub`, `mul`, `div`,
173
+ plus fused `add_scalar`, `mul_scalar`
174
+
175
+ **Element-wise unary:** `sqrt`, `rsqrt`, `log`, `exp`, `relu`
176
+
177
+ **Comparisons + select:** `less`, `greater`, `where`
178
+
179
+ **Reductions over last axis:** `mean_last`, `sum_last`
180
+
181
+ **Shape:** `reshape`, `transpose`, `slice_last_range`, `broadcast_to`, `sum_to_shape`
182
+
183
+ **Linear algebra:** `matmul` (2D rhs), `matmul_batched` (both batched)
184
+
185
+ **Indexing / casting:** `one_hot`
186
+
187
+ **ML primitives** (fused for clean autograd): `softmax_causal_last`,
188
+ `log_softmax_last`, `where_causal`
189
+
190
+ **Autograd-internal:** `relu_grad`
191
+
192
+ **Adam-internal:** `adam_update_m`, `adam_update_v`, `adam_update_p`
193
+
194
+ ## Module abstraction
195
+
196
+ The `Module` base class enables Domeleon-style auto-discovery of nested
197
+ modules and parameters via property reflection:
198
+
199
+ ```ts
200
+ class Linear extends Module {
201
+ W: Tensor; b: Tensor
202
+ constructor(public inDim: number, public outDim: number) {
203
+ super()
204
+ this.W = this.param([inDim, outDim]) // returns ParamSentinel cast to Tensor
205
+ this.b = this.param([outDim])
206
+ }
207
+ }
208
+ ```
209
+
210
+ `this.param(shape)` returns a `ParamSentinel` typed as `Tensor`. At compile
211
+ time, `materializeParams(root)` walks enumerable properties of the model
212
+ tree (recursing into nested `Module` instances and arrays of modules),
213
+ replaces every sentinel with a real `paramInput` tensor whose name is the
214
+ property path (`layers.0.attn.q.W`), and returns a flat `Record<path,
215
+ Tensor>` for autograd to use.
216
+
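+ Condensed, the walk looks something like this (a sketch: `isParamSentinel`
+ is an assumed type guard, the `paramInput` signature is illustrative, and
+ the real pass also swaps each sentinel for the materialized tensor in place):
+
+ ```ts
+ function collect(node: unknown, path: string, out: Record<string, Tensor>): void {
+   if (isParamSentinel(node)) {
+     // Name the param_input by its property path, e.g. 'layers.0.attn.q.W'.
+     out[path] = paramInput(path, node.shape, 'f32')
+   } else if (Array.isArray(node)) {
+     node.forEach((child, i) => collect(child, `${path}.${i}`, out))
+   } else if (node instanceof Module) {
+     for (const [key, value] of Object.entries(node))
+       collect(value, path ? `${path}.${key}` : key, out)
+   }
+ }
+ ```
+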
217
+ This is the JAX/Equinox separation: parameter storage is mutable
218
+ (state-bearing components), forward computation is pure (functions over
219
+ materialized parameters and inputs).
220
+
221
+ ## WGSL codegen
222
+
223
+ Each op kind has a kernel template in `codegen.ts`. Shapes are **baked into
224
+ the WGSL as compile-time constants** rather than passed as uniforms — this
225
+ gives the WGSL compiler full freedom to specialize and means each shape
226
+ combination produces a distinct shader. Fine for our static-shape model.
227
+
228
+ **Dispatch:** WebGPU caps each dimension at 65535 workgroups. The runtime
229
+ dispatches as `(min(N, 65535), ceil(N/65535), 1)`; kernels compute their
230
+ global index as `gid.x + gid.y * (65535 * workgroup_size)`. Workgroup size
231
+ is 256 — large enough that our biggest kernel (~8M threads in
232
+ `matmul_bwd_dW`) fits in 1D.
233
+
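+ In TypeScript terms the dispatch computation is tiny (a sketch of the rule
+ above, not the runtime's literal code):
+
+ ```ts
+ const WORKGROUP_SIZE = 256
+ const MAX_PER_DIM = 65535
+
+ function dispatchDims(totalThreads: number): [number, number, number] {
+   const groups = Math.ceil(totalThreads / WORKGROUP_SIZE)
+   // x is capped at 65535 workgroups; y absorbs the overflow. Kernels then
+   // rebuild the flat index as gid.x + gid.y * (65535 * WORKGROUP_SIZE) and
+   // bounds-check it against the true element count.
+   return [Math.min(groups, MAX_PER_DIM), Math.ceil(groups / MAX_PER_DIM), 1]
+ }
+ ```
+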
234
+ **Error reporting:** `runtime.ts` wraps each pipeline creation in
235
+ `pushErrorScope('validation')` and pulls `getCompilationInfo()` on
236
+ failure, so shader bugs surface with file/line/message rather than the
237
+ useless "previous error" you get when a broken pipeline is dispatched.
238
+
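+ The pattern, roughly (a sketch built on standard WebGPU APIs, not
+ `runtime.ts` verbatim):
+
+ ```ts
+ async function createPipelineChecked(device: GPUDevice, code: string, entryPoint: string) {
+   device.pushErrorScope('validation')
+   const module = device.createShaderModule({ code })
+   const pipeline = device.createComputePipeline({ layout: 'auto', compute: { module, entryPoint } })
+   const err = await device.popErrorScope()
+   if (err) {
+     // Pull per-message line/column info out of the shader module itself.
+     const info = await module.getCompilationInfo()
+     const detail = info.messages.map(m => `${m.lineNum}:${m.linePos} ${m.message}`).join('\n')
+     throw new Error(`WGSL compilation failed:\n${detail}`)
+   }
+   return pipeline
+ }
+ ```
+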
239
+ ## Buffer plan
240
+
241
+ `planBuffers(graph, paramGrads, writebacks)` walks every tensor and
242
+ categorizes it:
243
+
244
+ | Kind | Purpose | Lifetime |
245
+ |---|---|---|
246
+ | `param` | trainable parameter | persistent |
247
+ | `param_grad` | gradient w.r.t. a param | one step |
248
+ | `state` | optimizer state (Adam m, v) | persistent |
249
+ | `tensor_input` | data input (tokens, targets) | one step |
250
+ | `intermediate` | any other op output | one step |
251
+ | `output` | exposed graph output (loss) | one step |
252
+
253
+ State buffers are zero-initialized at runtime creation. Writebacks (declared
254
+ by `appendAdam`) describe end-of-step `copyBufferToBuffer` operations from
255
+ freshly computed values into their persistent homes.
256
+
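+ Resolving a name-based `WritebackDecl` into a buffer-id `Writeback` pair is
+ a couple of map lookups (a sketch against the types in `buffers.d.ts`):
+
+ ```ts
+ import type { BufferPlan, Writeback, WritebackDecl } from './buffers.js'
+
+ function resolveWriteback(decl: WritebackDecl, plan: BufferPlan): Writeback {
+   const source = plan.tensorToBuffer.get(decl.source.id)!
+   const dest = decl.destKind === 'param'
+     ? plan.paramsByName.get(decl.destName)!
+     : plan.statesByName.get(decl.destName)!
+   // Buffer ids are 1:1 with tensor ids today (see the tensorToBuffer note).
+   const bytes = plan.buffers.find(b => b.id === source)!.byteSize
+   return { source, dest, bytes }
+ }
+ ```
+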
257
+ ## Autograd
258
+
259
+ `appendGrad(graph)` walks the forward ops in reverse and emits backward ops
260
+ into the same graph. Each op's transpose rule is hand-written in
261
+ `grad.ts`. The cotangents map (`tensorId → Tensor`) accumulates
262
+ contributions from multiple consumers via `add`.
263
+
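+ The accumulation step is small enough to show whole (a sketch; `add` here is
+ the library's element-wise op, so the sum itself becomes a graph node):
+
+ ```ts
+ function accumulate(cotangents: Map<number, Tensor>, id: number, ct: Tensor): void {
+   // A tensor with several consumers gets one contribution per consumer.
+   const prev = cotangents.get(id)
+   cotangents.set(id, prev ? add(prev, ct) : ct)
+ }
+ ```
+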
264
+ Two notable workarounds:
265
+
266
+ - **Embedding lookup is implemented as `oneHot @ table`** rather than
267
+ `gather`. Gather has no transpose rule (it'd need scatter-with-atomic-add
268
+ or similar), but `oneHot @ table` decomposes into ops that *do* have
269
+ rules, so autograd works through it for free (see the sketch after this list).
270
+ - **`slice_last_range` has no backward yet.** Forward works (used for
271
+ last-axis slicing); backward is unimplemented because it'd need a
272
+ scatter-style "place into zeros" op. Workaround: use multiple separate
273
+ matmuls (e.g. `W_q`/`W_k`/`W_v`) instead of a fused `W_qkv`.
274
+
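+ The embedding decomposition from the first bullet, spelled out (a sketch;
+ the `oneHot` export name and argument order are assumptions):
+
+ ```ts
+ // ids: [B, T] token indices; table: [vocab, dModel] embedding matrix.
+ // oneHot(ids, vocab) is [B, T, vocab]; matmul with a 2D rhs gives
+ // [B, T, dModel]. matmul's existing transpose rule turns d(table) into
+ // exactly the scatter-add a gather op would otherwise need.
+ const embed = (ids: Tensor, table: Tensor, vocab: number): Tensor =>
+   matmul(oneHot(ids, vocab), table)
+ ```
+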
275
+ ## Verification approach
276
+
277
+ Two layers:
278
+
279
+ 1. **Smoke test** (`pnpm test` → `test/smoke.ts`) — runs in Node without
280
+ GPU. Builds the IR, attaches grad, plans buffers, emits all WGSL, and
281
+ verifies structure (kernel count, binding count, shape errors). Catches
282
+ codegen regressions without needing a browser.
283
+ 2. **Live samples** (`pnpm --filter samples dev`) — Vite dev server with
284
+ a `/__log` endpoint that streams browser logs to stdout, used during
285
+ development to bypass copy-paste-from-console debugging.
286
+
287
+ ## What this spec is not
288
+
289
+ A contract. The API will change before 1.0. The load-bearing decisions
290
+ are in **Scope and non-goals** and **Key design decisions** above —
291
+ those are the points where the design deliberately diverges from JAX or
292
+ PyTorch, and where reverting any of them effectively re-creates the
293
+ failure mode that motivated this library.
package/dist/adam.d.ts ADDED
@@ -0,0 +1,31 @@
1
+ import type { Tensor } from './ir.js';
2
+ import type { Graph } from './ir.js';
3
+ import type { WritebackDecl } from './buffers.js';
4
+ export interface AdamConfig {
5
+ lr: number;
6
+ b1?: number;
7
+ b2?: number;
8
+ eps?: number;
9
+ }
10
+ export interface AdamResult {
11
+ /** Writebacks the buffer planner should wire into the runtime. */
12
+ writebacks: WritebackDecl[];
13
+ /** Name of the per-step scalar tensor_input. The runtime fills this each call
14
+ * with `lr * sqrt(1-b2^t)/(1-b1^t)` (Adam's bias-corrected effective LR). */
15
+ lrtInputName: string;
16
+ /** Hyperparameters as captured (so the runtime can compute lrt). */
17
+ config: Required<AdamConfig>;
18
+ }
19
+ /**
20
+ * Append Adam update ops to `graph`. Must be called inside an active trace
21
+ * context (or after a trace, since traceInto re-enters the graph).
22
+ *
23
+ * @param graph the graph (already containing forward + backward)
24
+ * @param paramGrads param name -> gradient tensor (output of `appendGrad`)
25
+ * @param paramTensors param name -> the param's leaf Tensor (the param_input).
26
+ * Needed because the param_input lives in the graph but we
27
+ * don't have a direct map by name in `Graph` — caller passes it.
28
+ * @param config Adam hyperparameters
29
+ */
30
+ export declare function appendAdam(graph: Graph, paramGrads: Record<string, Tensor>, paramTensors: Record<string, Tensor>, config: AdamConfig): AdamResult;
31
+ //# sourceMappingURL=adam.d.ts.map
package/dist/adam.d.ts.map ADDED
@@ -0,0 +1 @@
1
+ {"version":3,"file":"adam.d.ts","sourceRoot":"","sources":["../src/adam.ts"],"names":[],"mappings":"AAoBA,OAAO,KAAK,EAAE,MAAM,EAAE,MAAM,SAAS,CAAA;AACrC,OAAO,KAAK,EAAE,KAAK,EAAE,MAAM,SAAS,CAAA;AACpC,OAAO,KAAK,EAAE,aAAa,EAAE,MAAM,cAAc,CAAA;AAIjD,MAAM,WAAW,UAAU;IACzB,EAAE,EAAE,MAAM,CAAA;IACV,EAAE,CAAC,EAAE,MAAM,CAAA;IACX,EAAE,CAAC,EAAE,MAAM,CAAA;IACX,GAAG,CAAC,EAAE,MAAM,CAAA;CACb;AAED,MAAM,WAAW,UAAU;IACzB,kEAAkE;IAClE,UAAU,EAAE,aAAa,EAAE,CAAA;IAC3B;iFAC6E;IAC7E,YAAY,EAAE,MAAM,CAAA;IACpB,oEAAoE;IACpE,MAAM,EAAE,QAAQ,CAAC,UAAU,CAAC,CAAA;CAC7B;AAED;;;;;;;;;;GAUG;AACH,wBAAgB,UAAU,CACxB,KAAK,EAAE,KAAK,EACZ,UAAU,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,EAClC,YAAY,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,EACpC,MAAM,EAAE,UAAU,GACjB,UAAU,CAmCZ"}
package/dist/adam.js ADDED
@@ -0,0 +1,66 @@
1
+ // Adam optimizer, in-graph.
2
+ //
3
+ // `appendAdam` extends a graph that already has a forward pass + autograd-emitted
4
+ // backward (i.e., has paramGrads from `appendGrad`) with the Adam update math.
5
+ //
6
+ // Per parameter P with gradient g:
7
+ // m_new = b1 * m + (1 - b1) * g
8
+ // v_new = b2 * v + (1 - b2) * g²
9
+ // p_new = p - lr * m_new / (sqrt(v_new) + eps)
10
+ //
11
+ // This is "Adam without bias correction" — the `1 / (1 - β^t)` factors are
12
+ // dropped because computing them in-graph requires per-step uniforms or
13
+ // awkward exp/log tricks. In practice the omission only affects the first
14
+ // ~100 steps; convergence is unaffected.
15
+ //
16
+ // Returns writeback declarations the buffer planner uses to wire up the
17
+ // "after step, copy the new value into the persistent home" path. m and v
18
+ // are state_inputs (zero-initialized, persistent across steps); the param
19
+ // updates are aliased back to the param buffers.
20
+ import { traceInto, stateInput, tensorInput } from './trace.js';
21
+ import { adamUpdateM, adamUpdateV, adamUpdateP } from './ops.js';
22
+ /**
23
+ * Append Adam update ops to `graph`. Must be called inside an active trace
24
+ * context (or after a trace, since traceInto re-enters the graph).
25
+ *
26
+ * @param graph the graph (already containing forward + backward)
27
+ * @param paramGrads param name -> gradient tensor (output of `appendGrad`)
28
+ * @param paramTensors param name -> the param's leaf Tensor (the param_input).
29
+ * Needed because the param_input lives in the graph but we
30
+ * don't have a direct map by name in `Graph` — caller passes it.
31
+ * @param config Adam hyperparameters
32
+ */
33
+ export function appendAdam(graph, paramGrads, paramTensors, config) {
34
+ const fullConfig = {
35
+ lr: config.lr,
36
+ b1: config.b1 ?? 0.9,
37
+ b2: config.b2 ?? 0.999,
38
+ eps: config.eps ?? 1e-8,
39
+ };
40
+ const writebacks = [];
41
+ const lrtInputName = '_adam_lrt';
42
+ return traceInto(graph, () => {
43
+ // One scalar lrt input shared by every adam_update_p call. Runtime supplies
44
+ // it per step as `lr * sqrt(1-b2^t) / (1-b1^t)`.
45
+ const lrt = tensorInput(lrtInputName, [], 'f32');
46
+ for (const name of Object.keys(paramGrads)) {
47
+ const p = paramTensors[name];
48
+ const g = paramGrads[name];
49
+ if (!p)
50
+ throw new Error(`appendAdam: missing param tensor for '${name}'`);
51
+ if (!g)
52
+ throw new Error(`appendAdam: missing gradient for '${name}'`);
53
+ const mState = stateInput(`adam_m_${name}`, p.shape, 'f32', 0);
54
+ const vState = stateInput(`adam_v_${name}`, p.shape, 'f32', 0);
55
+ // Three fused kernels per parameter — one for each of m_new / v_new / p_new.
56
+ const newM = adamUpdateM(mState, g, fullConfig.b1);
57
+ const newV = adamUpdateV(vState, g, fullConfig.b2);
58
+ const newP = adamUpdateP(p, newM, newV, lrt, fullConfig.eps);
59
+ writebacks.push({ source: newM, destName: `adam_m_${name}`, destKind: 'state' });
60
+ writebacks.push({ source: newV, destName: `adam_v_${name}`, destKind: 'state' });
61
+ writebacks.push({ source: newP, destName: name, destKind: 'param' });
62
+ }
63
+ return { writebacks, lrtInputName, config: fullConfig };
64
+ });
65
+ }
66
+ //# sourceMappingURL=adam.js.map
package/dist/adam.js.map ADDED
@@ -0,0 +1 @@
1
+ {"version":3,"file":"adam.js","sourceRoot":"","sources":["../src/adam.ts"],"names":[],"mappings":"AAAA,4BAA4B;AAC5B,EAAE;AACF,kFAAkF;AAClF,+EAA+E;AAC/E,EAAE;AACF,mCAAmC;AACnC,kCAAkC;AAClC,mCAAmC;AACnC,iDAAiD;AACjD,EAAE;AACF,2EAA2E;AAC3E,wEAAwE;AACxE,0EAA0E;AAC1E,yCAAyC;AACzC,EAAE;AACF,wEAAwE;AACxE,0EAA0E;AAC1E,0EAA0E;AAC1E,iDAAiD;AAKjD,OAAO,EAAE,SAAS,EAAE,UAAU,EAAE,WAAW,EAAE,MAAM,YAAY,CAAA;AAC/D,OAAO,EAAE,WAAW,EAAE,WAAW,EAAE,WAAW,EAAE,MAAM,UAAU,CAAA;AAmBhE;;;;;;;;;;GAUG;AACH,MAAM,UAAU,UAAU,CACxB,KAAY,EACZ,UAAkC,EAClC,YAAoC,EACpC,MAAkB;IAElB,MAAM,UAAU,GAAyB;QACvC,EAAE,EAAE,MAAM,CAAC,EAAE;QACb,EAAE,EAAE,MAAM,CAAC,EAAE,IAAI,GAAG;QACpB,EAAE,EAAE,MAAM,CAAC,EAAE,IAAI,KAAK;QACtB,GAAG,EAAE,MAAM,CAAC,GAAG,IAAI,IAAI;KACxB,CAAA;IACD,MAAM,UAAU,GAAoB,EAAE,CAAA;IACtC,MAAM,YAAY,GAAG,WAAW,CAAA;IAEhC,OAAO,SAAS,CAAC,KAAK,EAAE,GAAG,EAAE;QAC3B,4EAA4E;QAC5E,iDAAiD;QACjD,MAAM,GAAG,GAAG,WAAW,CAAC,YAAY,EAAE,EAAE,EAAE,KAAK,CAAC,CAAA;QAEhD,KAAK,MAAM,IAAI,IAAI,MAAM,CAAC,IAAI,CAAC,UAAU,CAAC,EAAE,CAAC;YAC3C,MAAM,CAAC,GAAG,YAAY,CAAC,IAAI,CAAC,CAAA;YAC5B,MAAM,CAAC,GAAG,UAAU,CAAC,IAAI,CAAC,CAAA;YAC1B,IAAI,CAAC,CAAC;gBAAE,MAAM,IAAI,KAAK,CAAC,yCAAyC,IAAI,GAAG,CAAC,CAAA;YACzE,IAAI,CAAC,CAAC;gBAAE,MAAM,IAAI,KAAK,CAAC,qCAAqC,IAAI,GAAG,CAAC,CAAA;YAErE,MAAM,MAAM,GAAG,UAAU,CAAC,UAAU,IAAI,EAAE,EAAE,CAAC,CAAC,KAAK,EAAE,KAAK,EAAE,CAAC,CAAC,CAAA;YAC9D,MAAM,MAAM,GAAG,UAAU,CAAC,UAAU,IAAI,EAAE,EAAE,CAAC,CAAC,KAAK,EAAE,KAAK,EAAE,CAAC,CAAC,CAAA;YAE9D,6EAA6E;YAC7E,MAAM,IAAI,GAAG,WAAW,CAAC,MAAM,EAAE,CAAC,EAAE,UAAU,CAAC,EAAE,CAAC,CAAA;YAClD,MAAM,IAAI,GAAG,WAAW,CAAC,MAAM,EAAE,CAAC,EAAE,UAAU,CAAC,EAAE,CAAC,CAAA;YAClD,MAAM,IAAI,GAAG,WAAW,CAAC,CAAC,EAAE,IAAI,EAAE,IAAI,EAAE,GAAG,EAAE,UAAU,CAAC,GAAG,CAAC,CAAA;YAE5D,UAAU,CAAC,IAAI,CAAC,EAAE,MAAM,EAAE,IAAI,EAAE,QAAQ,EAAE,UAAU,IAAI,EAAE,EAAE,QAAQ,EAAE,OAAO,EAAE,CAAC,CAAA;YAChF,UAAU,CAAC,IAAI,CAAC,EAAE,MAAM,EAAE,IAAI,EAAE,QAAQ,EAAE,UAAU,IAAI,EAAE,EAAE,QAAQ,EAAE,OAAO,EAAE,CAAC,CAAA;YAChF,UAAU,CAAC,IAAI,CAAC,EAAE,MAAM,EAAE,IAAI,EAAE,QAAQ,EAAE,IAAI,EAAc,QAAQ,EAAE,OAAO,EAAE,CAAC,CAAA;QAClF,CAAC;QACD,OAAO,EAAE,UAAU,EAAE,YAAY,EAAE,MAAM,EAAE,UAAU,EAAE,CAAA;IACzD,CAAC,CAAC,CAAA;AACJ,CAAC"}
package/dist/buffers.d.ts ADDED
@@ -0,0 +1,56 @@
1
+ import type { Graph, Tensor, Dtype, Shape } from './ir.js';
2
+ export interface BufferSpec {
3
+ /** Matches tensor.id. */
4
+ id: number;
5
+ byteSize: number;
6
+ dtype: Dtype;
7
+ shape: Shape;
8
+ kind: 'param' | 'param_grad' | 'tensor_input' | 'state' | 'intermediate' | 'output';
9
+ /** External name for param/param_grad/tensor_input/state bindings. null otherwise. */
10
+ name: string | null;
11
+ /** For state buffers: the value to fill on initial allocation. 0 by default. */
12
+ initValue?: number;
13
+ }
14
+ /**
15
+ * After step(), copy `source`'s buffer into `dest`'s buffer.
16
+ * Used to write back updated optimizer state and updated parameters into
17
+ * their persistent home buffers.
18
+ */
19
+ export interface Writeback {
20
+ source: number;
21
+ dest: number;
22
+ bytes: number;
23
+ }
24
+ export interface BufferPlan {
25
+ buffers: BufferSpec[];
26
+ /** Tensor id -> buffer id (currently 1:1 but kept opaque for future pooling). */
27
+ tensorToBuffer: Map<number, number>;
28
+ /** Easy lookup tables for the runtime. */
29
+ paramsByName: Map<string, number>;
30
+ inputsByName: Map<string, number>;
31
+ paramGradsByName: Map<string, number>;
32
+ statesByName: Map<string, number>;
33
+ outputBufferIds: number[];
34
+ /** End-of-step writebacks (Adam updates for params, m, v, etc.) */
35
+ writebacks: Writeback[];
36
+ }
37
+ /**
38
+ * Caller-supplied writeback declarations: "after each step, copy this Tensor's
39
+ * buffer into the persistent home of this param/state."
40
+ */
41
+ export interface WritebackDecl {
42
+ /** The Tensor (output of some op) holding the new value to write back. */
43
+ source: Tensor;
44
+ /** Either a param name (writes to that param's home buffer) or a state name. */
45
+ destName: string;
46
+ destKind: 'param' | 'state';
47
+ }
48
+ /**
49
+ * Build a BufferPlan from a graph + the param-grad map produced by appendGrad.
50
+ * @param graph the full graph (forward + backward + any optimizer ops)
51
+ * @param paramGrads map from param name -> the Tensor that holds its gradient
52
+ * @param writebackDecls list of end-of-step writebacks (e.g. from appendAdam).
53
+ * Empty when there's no optimizer in the graph.
54
+ */
55
+ export declare function planBuffers(graph: Graph, paramGrads: Record<string, Tensor>, writebackDecls?: WritebackDecl[]): BufferPlan;
56
+ //# sourceMappingURL=buffers.d.ts.map
package/dist/buffers.d.ts.map ADDED
@@ -0,0 +1 @@
1
+ {"version":3,"file":"buffers.d.ts","sourceRoot":"","sources":["../src/buffers.ts"],"names":[],"mappings":"AAcA,OAAO,KAAK,EAAE,KAAK,EAAE,MAAM,EAAE,KAAK,EAAE,KAAK,EAAU,MAAM,SAAS,CAAA;AAElE,MAAM,WAAW,UAAU;IACzB,yBAAyB;IACzB,EAAE,EAAE,MAAM,CAAA;IACV,QAAQ,EAAE,MAAM,CAAA;IAChB,KAAK,EAAE,KAAK,CAAA;IACZ,KAAK,EAAE,KAAK,CAAA;IACZ,IAAI,EAAE,OAAO,GAAG,YAAY,GAAG,cAAc,GAAG,OAAO,GAAG,cAAc,GAAG,QAAQ,CAAA;IACnF,sFAAsF;IACtF,IAAI,EAAE,MAAM,GAAG,IAAI,CAAA;IACnB,gFAAgF;IAChF,SAAS,CAAC,EAAE,MAAM,CAAA;CACnB;AAED;;;;GAIG;AACH,MAAM,WAAW,SAAS;IACxB,MAAM,EAAE,MAAM,CAAA;IACd,IAAI,EAAE,MAAM,CAAA;IACZ,KAAK,EAAE,MAAM,CAAA;CACd;AAED,MAAM,WAAW,UAAU;IACzB,OAAO,EAAE,UAAU,EAAE,CAAA;IACrB,iFAAiF;IACjF,cAAc,EAAE,GAAG,CAAC,MAAM,EAAE,MAAM,CAAC,CAAA;IACnC,0CAA0C;IAC1C,YAAY,EAAE,GAAG,CAAC,MAAM,EAAE,MAAM,CAAC,CAAA;IACjC,YAAY,EAAE,GAAG,CAAC,MAAM,EAAE,MAAM,CAAC,CAAA;IACjC,gBAAgB,EAAE,GAAG,CAAC,MAAM,EAAE,MAAM,CAAC,CAAA;IACrC,YAAY,EAAE,GAAG,CAAC,MAAM,EAAE,MAAM,CAAC,CAAA;IACjC,eAAe,EAAE,MAAM,EAAE,CAAA;IACzB,mEAAmE;IACnE,UAAU,EAAE,SAAS,EAAE,CAAA;CACxB;AAUD;;;GAGG;AACH,MAAM,WAAW,aAAa;IAC5B,0EAA0E;IAC1E,MAAM,EAAE,MAAM,CAAA;IACd,gFAAgF;IAChF,QAAQ,EAAE,MAAM,CAAA;IAChB,QAAQ,EAAE,OAAO,GAAG,OAAO,CAAA;CAC5B;AAED;;;;;;GAMG;AACH,wBAAgB,WAAW,CACzB,KAAK,EAAE,KAAK,EACZ,UAAU,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,EAClC,cAAc,GAAE,aAAa,EAAO,GACnC,UAAU,CAuFZ"}