@jax-js/jax 0.1.12 → 0.1.14

package/README.md CHANGED
@@ -76,6 +76,7 @@ Demos on the jax-js website:
  - [Object detection: D-FINE (ONNX)](https://jax-js.com/d-fine)
  - [Object detection: DETR ResNet-50 (ONNX)](https://jax-js.com/detr-resnet-50)
  - [Fluid simulation (Navier-Stokes)](https://jax-js.com/fluid-sim)
+ - [Neural cellular automata](https://jax-js.com/nca-growing)
  - [In-browser REPL](https://jax-js.com/repl)
  - [Matmul benchmark](https://jax-js.com/bench/matmul)
  - [Matvec benchmark](https://jax-js.com/bench/matvec)
@@ -365,11 +366,14 @@ To see per-kernel traces in browser development tools, call `jax.profiler.startT
 
  The WebGPU runtime includes an ML compiler with tile-aware optimizations, tuned for individual
  browsers. Also, this library uniquely has the `jit()` feature that fuses operations together and
- records an execution graph. jax-js achieves **over 7000 GFLOP/s** for matrix multiplication on an
- Apple M4 Max chip ([try it](https://jax-js.com/bench/matmul)).
+ records an execution graph.
 
- For that example, it's significantly faster than both
- [TensorFlow.js](https://github.com/tensorflow/tfjs) and
+ - _On WebGPU:_ jax-js achieves **over 7000 GFLOP/s** for matrix multiplication on an Apple M4 Max
+ chip ([try it](https://jax-js.com/bench/matmul)).
+ - _On WebAssembly (CPU):_ jax-js is the fastest multithreaded in-browser matrix multiplication, over
+ twice as fast as XNNPACK, and matches **OpenBLAS performance** on Apple Silicon.
+
+ This is significantly faster than both [TensorFlow.js](https://github.com/tensorflow/tfjs) and
  [ONNX Runtime Web](https://www.npmjs.com/package/onnxruntime-web), which both use handwritten
  libraries of custom kernels.
 
@@ -423,8 +427,8 @@ pnpm check # Run TypeScript type checking
  Contributions are welcomed! Some fruitful areas to look into:
 
  - Adding support for more JAX functions and operations, see [compatibility table](./FEATURES.md).
- - Improving performance of the WebGPU and Wasm runtimes, generating better kernels, and using SIMD
- and multithreading. (Even single-threaded Wasm could be ~20x faster.)
- - Making a fast transformer inference engine, comparing against onnxruntime-web.
+ - Improving performance of the WebGPU and Wasm runtimes and generating better kernels, especially
+ for convolutions.
+ - Making a fast, general transformer inference engine or model library.
 
  You may join our [Discord server](https://discord.gg/BW6YsCd4Tf) and chat with the community.
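
For readers unfamiliar with the GFLOP/s figure quoted in the README change above, the sketch below shows how such a number is conventionally derived: an (m × k) by (k × n) matrix multiplication is counted as 2·m·k·n floating-point operations, divided by wall-clock time. This is not the jax-js benchmark and uses no jax-js API; `matmulFlops`, `gflopsPerSecond`, and `naiveMatmul` are hypothetical helper names, and the naive loop is only a stand-in kernel to have something to time.

```typescript
// Conventional FLOP count for an (m x k) @ (k x n) matmul:
// one multiply and one add per inner-product step.
function matmulFlops(m: number, k: number, n: number): number {
  return 2 * m * k * n;
}

// Convert a FLOP count and an elapsed time in seconds to GFLOP/s.
function gflopsPerSecond(flops: number, seconds: number): number {
  return flops / seconds / 1e9;
}

// Naive row-major reference matmul (illustrative stand-in, not an optimized kernel).
function naiveMatmul(
  a: Float32Array,
  b: Float32Array,
  m: number,
  k: number,
  n: number,
): Float32Array {
  const out = new Float32Array(m * n);
  for (let i = 0; i < m; i++) {
    for (let p = 0; p < k; p++) {
      const aip = a[i * k + p];
      for (let j = 0; j < n; j++) {
        out[i * n + j] += aip * b[p * n + j];
      }
    }
  }
  return out;
}

// Time one 256 x 256 matmul and report throughput.
const n = 256;
const a = new Float32Array(n * n).fill(1);
const b = new Float32Array(n * n).fill(1);
const start = performance.now();
const c = naiveMatmul(a, b, n, n, n);
const seconds = (performance.now() - start) / 1000;
console.log(`${gflopsPerSecond(matmulFlops(n, n, n), seconds).toFixed(2)} GFLOP/s`);
```

Under this convention a 4096 × 4096 matmul is about 137 GFLOP, so a rate of 7000 GFLOP/s would correspond to roughly 20 ms per multiplication. A single timed run like the one above is noisy; a real benchmark would warm up and average over many iterations.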