npm - @jax-js/jax - Versions diffs - 0.0.5 → 0.1.1 - Mend

@jax-js/jax 0.0.5 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10) hide show

package/README.md +267 -92
package/dist/{backend-yEU0L_ig.cjs → backend-BbrKEB18.cjs} +378 -183
package/dist/{backend-CdcTZEOF.js → backend-CoVtc9dx.js} +366 -177
package/dist/index.cjs +385 -74
package/dist/index.d.cts +115 -23
package/dist/index.d.ts +115 -23
package/dist/index.js +378 -74
package/dist/{webgpu-CM-xNYzW.js → webgpu-B3UVme6n.js} +188 -153
package/dist/{webgpu-CNOpiO5T.cjs → webgpu-DGYNVHma.cjs} +188 -153
package/package.json +25 -15

package/README.md CHANGED Viewed

@@ -1,10 +1,14 @@
-# jax-js: JAX in pure JavaScript
+<h1 align="center">jax-js: JAX in pure JavaScript</h1>
-[Website](https://www.ekzhang.com/jax-js/) | [API Reference](https://www.ekzhang.com/jax-js/docs/)
+<p align="center"><strong>
+  <a href="https://jax-js.com">Website</a> |
+  <a href="https://jax-js.com/docs/">API Reference</a> |
+  <a href="./FEATURES.md">Compatibility Table</a>
+</strong></p>
-**jax-js** is a machine learning framework for the browser. It aims to bring JAX-style,
-high-performance CPU and GPU kernels to JavaScript, so you can run numerical applications on the
-web.
+**jax-js** is a machine learning framework for the browser. It aims to bring
+[JAX](https://jax.dev)-style, high-performance CPU and GPU kernels to JavaScript, so you can run
+numerical applications on the web.
 ```bash
 npm i @jax-js/jax
@@ -19,57 +23,269 @@ GPU ML framework, since it runs anywhere a browser can run.
 ## Quickstart
-You can use `jax-js` as an array API, just like NumPy.
 ```js
 import { numpy as np } from "@jax-js/jax";
-// Array operations, compatible with NumPy.
+// Array operations, compatible with JAX/NumPy.
 const x = np.array([1, 2, 3]);
 const y = x.mul(4); // [4, 8, 12]
 ```
-It also lets you take derivatives with `grad` like in JAX (as well as `vmap`, `jit`).
+### Web usage (CDN)
-```js
-import { grad, numpy as np } from "@jax-js/jax";
+In vanilla JavaScript (without a bundler), just import from a module script tag. This is the easiest
+way to get started on a blank HTML page.
-// Calculate derivatives with reverse-mode AD.
-const norm = (a) => a.ref.mul(a).sum();
+```html
+<script type="module">
+  import { numpy as np } from "https://esm.sh/@jax-js/jax";
+</script>
+```
-const x = np.array([1, 2, 3]);
-const xnorm = norm(x.ref); // 1^2 + 2^2 + 3^2 = 14
-const xgrad = grad(norm)(x); // [2, 4, 6]
+## Tutorial
+Programming in `jax-js` looks [very similar to JAX](https://docs.jax.dev/en/latest/jax-101.html),
+just in JavaScript.
+### Arrays
+Create an array with `np.array()`:
+```ts
+import { numpy as np } from "@jax-js/jax";
+const ar = np.array([1, 2, 3]);
 ```
-The default backend runs on CPU, but on [supported browsers](https://caniuse.com/webgpu) including
-Chrome and iOS Safari, you can switch to GPU for better performance.
+By default, this is a float32 array, but you can also specify a dtype explicitly:
-```js
-import { defaultDevice, init, numpy as np } from "@jax-js/jax";
+```ts
+const ar = np.array([1, 2, 3], { dtype: np.float32 });
+```
+For more efficient construction, you can create an array from a JS `TypedArray` buffer:
+```ts
+const buf = new Float32Array([10, 20, 30, 100, 200, 300]);
+const ar = np.array(buf).reshape([2, 3]);
+```
-// Initialize the GPU backend.
-await init("webgpu");
+Once you're done with it, you can unwrap a `jax.Array` back into JavaScript. This will also apply
+any pending operations or lazy updates:
-// Change the default backend to GPU.
-defaultDevice("webgpu");
+```ts
+// 1) Returns a possibly nested JavaScript array.
+ar.js();
+await ar.jsAsync(); // Faster, non-blocking
-const x = np.ones([4096, 4096]);
-const y = np.dot(x.ref, x); // JIT-compiled into a matrix multiplication kernel
+// 2) Returns a flat TypedArray data buffer.
+ar.dataSync();
+await ar.data(); // Fastest, non-blocking
 ```
-Most common JAX APIs are supported. See the [compatibility table](./FEATURES.md) for a full
-breakdown of what features are available.
+Arrays can have mathematical operations applied to them. For example:
-### Web usage (CDN)
+```ts
+import { numpy as np, scipySpecial as special } from "@jax-js/jax";
-If you want to use `jax-js` in vanilla JavaScript (without a bundler), just import from a module
-script tag. This is the easiest way to get started on a blank HTML page.
+const x = np.arange(100).astype(np.float32); // array of integers [0..99]
-```html
-<script type="module">
-  import { numpy as np } from "https://esm.sh/@jax-js/jax";
-</script>
+const y1 = x.ref.add(x.ref); // x + x
+const y2 = np.sin(x.ref); // sin(x)
+const y3 = np.tanh(x.ref).mul(5); // 5 * tanh(x)
+const y4 = special.erfc(x.ref); // erfc(x)
+```
+Notice that in the above code, we used `x.ref`. This is because of the memory model, jax-js uses
+reference-counted _ownership_ to track when the memory of an Array can be freed. More on this below.
+### Reference counting
+Big Arrays take up a lot of memory. Python ML libraries override the `__del__()` method to free
+memory, but JavaScript has no such API for running object destructors
+([cf.](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/FinalizationRegistry)).
+This means that you have to track references manually. jax-js tries to make this as ergonomic as
+possible, so you don't accidentally leak memory in a loop.
+Every `jax.Array` has a reference count. This satisfies the following rules:
+- Whenever you create an Array, its reference count starts at `1`.
+- When an Array's reference count reaches `0`, it is freed and can no longer be used.
+- Given an Array `a`:
+  - Accessing `a.ref` returns `a` and changes its reference count by `+1`.
+  - Passing `a` into any function as argument changes its reference count by `-1`.
+  - Calling `a.dispose()` also changes its reference count by `-1`.
+What this means is that all functions in jax-js must _take ownership_ of their arguments as
+references. Whenever you would like to pass an Array as argument, you can pass it directly to
+dispose of it, or use `.ref` if you'd like to use it again later.
+**You must follow these rules on your own functions as well!** All combinators like `jvp`, `grad`,
+`jit` assume that you are following these conventions on how arguments are passed, and they will
+respect them as well.
+```ts
+// Bad: Uses `x` twice, decrementing its reference count twice.
+function foo_bad(x: np.Array, y: np.Array) {
+  return x.add(x.mul(y));
+}
+// Good: The first usage of `x` is `x.ref`, adding +1 to refcount.
+function foo_good(x: np.Array, y: np.Array) {
+  return x.ref.add(x.mul(y));
+}
+```
+Here's another example:
+```ts
+// Bad: Doesn't consume `x` in the `if`-branch.
+function bar_bad(x: np.Array, skip: boolean) {
+  if (skip) return np.zeros(x.shape);
+  return x;
+}
+// Good: Consumes `x` the one time in each branch.
+function bar_good(x: np.Array, skip: boolean) {
+  if (skip) {
+    const ret = np.zeros(x.shape);
+    x.dispose();
+    return ret;
+  }
+  return x;
+}
+```
+You can assume that every function in jax-js takes ownership properly, except with a couple of very
+rare exceptions that are documented.
+### grad(), vmap() and jit()
+JAX's signature composable transformations are also supported in jax-js. Here is a simple example of
+using `grad` and `vmap` to compute the derivaive of a function:
+```ts
+import { numpy as np, grad, vmap } from "@jax-js/jax";
+const x = np.linspace(-10, 10, 1000);
+const y1 = vmap(grad(np.sin))(x.ref); // d/dx sin(x) = cos(x)
+const y2 = np.cos(x);
+np.allclose(y1, y2); // => true
+```
+The `jit` function is especially useful when doing long sequences of primitives on GPU, since it
+fuses operations together into a single kernel dispatch. This
+[improves memory bandwidth usage](https://substack.com/home/post/p-163548742) on hardware
+accelerators, which is the bottleneck on GPU rather than raw FLOPs. For instance:
+```ts
+export const hypot = jit(function hypot(x1: np.Array, x2: np.Array) {
+  return np.sqrt(np.square(x1).add(np.square(x2)));
+});
+```
+Without JIT, the `hypot()` function would require four kernel dispatches: two multiplies, one add,
+and one sqrt. JIT fuses these together into a single kernel that does it all at once.
+All functional transformations can take typed `JsTree` of inputs and outputs. These are similar to
+[JAX's pytrees](https://docs.jax.dev/en/latest/pytrees.html), and it's basically just a structure of
+nested JavaScript objects and arrays. For instance:
+```ts
+import { grad, numpy as np } from "@jax-js/jax";
+type Params = {
+  foo: np.Array;
+  bar: np.Array[];
+};
+function getSums(p: Params) {
+  const fooSum = p.foo.sum();
+  const barSum = p.bar.map((x) => x.sum()).reduce(np.add);
+  return fooSum.add(barSum);
+}
+grad(getSums)({
+  foo: np.array([1, 2, 3]),
+  bar: [np.array([10]), np.array([11, 12])],
+});
+// => { foo: [1, 1, 1], bar: [[1], [1, 1]] }
+```
+Note that you need to use `type` alias syntax rather than `interface` to define fine-grained
+`JsTree` types.
+### Devices
+Similar to JAX, jax-js has a concept of "devices" which are a backend that stores Arrays in memory
+and determines how to execute compiled operations on them.
+There are currently 3 devices in jax-js:
+- `cpu`: Slow, mostly for debugging purposes.
+- `wasm`: [WebAssembly](https://webassembly.org/), currently single-threaded and blocking.
+- `webgpu`: [WebGPU](https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API), available on
+  [supported browsers](https://caniuse.com/webgpu) (Chrome, Firefox, Safari, iOS).
+The default device is `wasm`, but you can change this at startup time:
+```ts
+import { defaultDevice, init } from "@jax-js/jax";
+const devices = await init(); // Starts all available backends.
+if (devices.includes("webgpu")) {
+  defaultDevice("webgpu");
+} else {
+  console.warn("WebGPU is not supported, falling back to Wasm.");
+}
+```
+You can also place individual arrays on specific devices:
+```ts
+import { devicePut, numpy as np } from "@jax-js/jax";
+const ar = np.array([1, 2, 3]); // Starts with device="wasm"
+await devicePut(ar, "webgpu"); // Now device="webgpu"
+```
+### Helper libraries
+There are other libraries in the `@jax-js` namespace that can work with jax-js, or be used in a
+self-contained way in other projects.
+**`@jax-js/optax`** provides implementations of optimizers like Adam and SGD.
+```ts
+import { adam } from "@jax-js/optax";
+let params = np.array([1.0, 2.0, 3.0]);
+const solver = adam(1e-3);
+let optState = solver.init(params.ref);
+let updates: np.Array;
+const f = (x: np.Array) => squaredError(x, np.ones([3])).sum();
+for (let i = 0; i < 100; i++) {
+  const paramsGrad = grad(f)(params.ref);
+  [updates, optState] = solver.update(paramsGrad, optState);
+  params = applyUpdates(params, updates);
+}
+```
+**`@jax-js/loaders`** can load tensors from various formats like Safetensors, includes a fast and
+compliant implementation of BPE, and caches HTTP requests for large assets like model weights in
+OPFS.
+```ts
+import { tokenizers } from "@jax-js/loaders";
+const enc = await tokenizers.getBpe("clip");
+const tokens = enc.encode("Hello, world!"); // => [ 49406, 3306, 267, 1002, ... ]
 ```
 ### Performance
@@ -77,22 +293,28 @@ script tag. This is the easiest way to get started on a blank HTML page.
 We haven't spent a ton of time optimizing yet, but performance is generally pretty good. `jit` is
 very helpful for fusing operations together, and it's a feature only available on the web in jax-js.
 The default kernel-tuning heuristics get about 3000 GFLOP/s for matrix multiplication on an M4 Pro
-chip ([try it](https://www.ekzhang.com/jax-js/bench/matmul)).
+chip ([try it](https://jax-js.com/bench/matmul)).
 For that example, it's around the same GFLOP/s as
 [TensorFlow.js](https://github.com/tensorflow/tfjs) and
 [ONNX Runtime Web](https://www.npmjs.com/package/onnxruntime-web), which both use handwritten
 libraries of custom kernels (versus jax-js, which generates kernels with an ML compiler).
+### API Reference
+That's all for this short tutorial. Please see the generated
+[API reference](https://jax-js.com/docs) for detailed documentation.
 ## Examples
 If you make something cool with jax-js, don't be a stranger! We can feature it here.
-- [In-browser REPL](https://www.ekzhang.com/jax-js/repl)
-- [Interactive MNIST training](https://www.ekzhang.com/jax-js/mnist)
-- [Matmul benchmark](https://www.ekzhang.com/jax-js/bench/matmul)
-- [Conv2d benchmark](https://www.ekzhang.com/jax-js/bench/conv2d)
-- [Mandelbrot set](https://www.ekzhang.com/jax-js/mandelbrot)
+- [Training neural networks on MNIST](https://jax-js.com/mnist)
+- [CLIP embeddings for books in-browser](https://jax-js.com/mobileclip)
+- [In-browser REPL](https://jax-js.com/repl)
+- [Matmul benchmark](https://jax-js.com/bench/matmul)
+- [Conv2d benchmark](https://jax-js.com/bench/conv2d)
+- [Mandelbrot set](https://jax-js.com/mandelbrot)
 ## Development
@@ -124,59 +346,12 @@ pnpm -C website dev
 ## Future work / help wanted
-Contributions are welcomed in the following areas:
+Contributions are welcomed! Especially in:
 - Adding support for more JAX functions and operations, see [compatibility table](./FEATURES.md).
 - Improving performance of the WebGPU and Wasm runtimes, generating better kernels, and using SIMD
-  and multithreading.
-- Helping the JIT compiler to fuse operations in more cases.
+  and multithreading. (Even single-threaded Wasm could be ~20x faster.)
+- Helping the JIT compiler to fuse operations in more cases, like `tanh` branches and adding
+  epilogue to reductions.
 - Adding WebGL runtime for older browsers that don't support WebGPU.
 - Making a fast transformer inference engine, comparing against onnxruntime-web.
-- Ergonomics and API improvements.
-## Next on Eric's mind
-- Finish CLIP inference demo and associated features (depthwise convolution, vmap of gather, etc.)
-- Performance
-  - Improve perf of MobileCLIP neural network
-    - Add fused epilogue to JIT
-    - Fix fusion of activation functions with branches like tanh
-    - Reduce kernel overhead of constants / inline expressions
-  - How many threads to create per workgroup, depends on hardware
-## Milestones
-- [x] It works!
-- [x] Demos: Browser REPL / editor
-- [x] First custom kernel
-- [x] Custom WebGPU backend, removing tfjs dependency
-  - [x] Low-level operations
-  - [x] Create `class Array {}` wrappers
-  - [x] Reduction operations
-- [ ] Kernel tuning (see `tuner.ts`)
-  - [x] "Upcast" optimizations (compute a tile per thread, e.g., matmul)
-  - [x] "Unroll" optimizations (multiple loop iters per thread, e.g., matmul)
-  - [ ] "Group" optimizations (multiple threads per value, e.g., matvec)
-  - [ ] Blocks respect local dimensions
-- [x] Other dtypes like int32 and bool
-- [x] `jit()` support via Jaxprs and kernel fusion
-- [x] We figure out the `dispose()` / refcount / linear types stuff
-  - [x] `dispose()` for saved "const" tracers in Jaxprs
-  - [x] Garbage collection for JIT programs
-  - [x] Debug grad-grad-jit test producing a UseAfterFreeError
-- [ ] Demos: Navier-Stokes, neural networks, statistics
-- [x] Features for neural networks
-  - [x] Convolution
-  - [x] Random and initializers
-  - [x] Optimizers (optax package?)
-- [x] Wasm backend (needs malloc)
-  - [x] Better memory allocation that frees buffers
-  - [ ] SIMD support for Wasm backend
-  - [ ] Async / multithreading Wasm support
-- [ ] Full support of weak types and committed devices
-  - [x] High-level ops have automatic type promotion
-  - [x] Weak types - [ref](https://docs.jax.dev/en/latest/type_promotion.html#weak-types)
-  - [ ] Committed devices -
-        [ref](https://docs.jax.dev/en/latest/sharded-computation.html#sharded-data-placement)
-  - [ ] Device switching with `device_put()` between webgpu/cpu/wasm
-- [x] numpy/jax API compatibility table