@jax-js/jax 0.1.13 → 0.1.14

package/README.md CHANGED
@@ -366,11 +366,14 @@ To see per-kernel traces in browser development tools, call `jax.profiler.startT
 
  The WebGPU runtime includes an ML compiler with tile-aware optimizations, tuned for individual
  browsers. Also, this library uniquely has the `jit()` feature that fuses operations together and
- records an execution graph. jax-js achieves **over 7000 GFLOP/s** for matrix multiplication on an
- Apple M4 Max chip ([try it](https://jax-js.com/bench/matmul)).
+ records an execution graph.
 
- For that example, it's significantly faster than both
- [TensorFlow.js](https://github.com/tensorflow/tfjs) and
+ - _On WebGPU:_ jax-js achieves **over 7000 GFLOP/s** for matrix multiplication on an Apple M4 Max
+ chip ([try it](https://jax-js.com/bench/matmul)).
+ - _On WebAssembly (CPU):_ jax-js is the fastest multithreaded in-browser matrix multiplication, over
+ twice as fast as XNNPACK, and matches **OpenBLAS performance** on Apple Silicon.
+
+ This is significantly faster than both [TensorFlow.js](https://github.com/tensorflow/tfjs) and
  [ONNX Runtime Web](https://www.npmjs.com/package/onnxruntime-web), which both use handwritten
  libraries of custom kernels.
 
@@ -424,8 +427,8 @@ pnpm check # Run TypeScript type checking
  Contributions are welcomed! Some fruitful areas to look into:
 
  - Adding support for more JAX functions and operations, see [compatibility table](./FEATURES.md).
- - Improving performance of the WebGPU and Wasm runtimes, generating better kernels, and using SIMD
- and multithreading. (Even single-threaded Wasm could be ~20x faster.)
- - Making a fast transformer inference engine, comparing against onnxruntime-web.
+ - Improving performance of the WebGPU and Wasm runtimes and generating better kernels, especially
+ for convolutions.
+ - Making a fast, general transformer inference engine or model library.
 
  You may join our [Discord server](https://discord.gg/BW6YsCd4Tf) and chat with the community.
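
For readers unfamiliar with the GFLOP/s figure quoted in the README hunk above, here is a minimal sketch of how such a number is conventionally derived for matrix multiplication. It does not use the jax-js API; `matmulGflops` and `naiveMatmul` are hypothetical helpers written only for illustration, and the `2 * m * k * n` operation count is the usual convention, assumed here rather than confirmed as the exact formula the linked benchmark uses.

```typescript
// Hypothetical helper (not part of jax-js): converts a timed (m x k) * (k x n)
// matmul into GFLOP/s using the conventional operation count.
function matmulGflops(m: number, k: number, n: number, seconds: number): number {
  // Each of the m*n outputs needs k multiplies and k adds, so 2*m*k*n FLOPs.
  const flops = 2 * m * k * n;
  return flops / seconds / 1e9;
}

// Naive reference matmul on row-major Float32Array data, used only so there is
// something to time. Real runtimes use tiled and fused kernels instead.
function naiveMatmul(
  a: Float32Array,
  b: Float32Array,
  m: number,
  k: number,
  n: number,
): Float32Array {
  const out = new Float32Array(m * n);
  for (let i = 0; i < m; i++) {
    for (let p = 0; p < k; p++) {
      const aip = a[i * k + p];
      for (let j = 0; j < n; j++) {
        out[i * n + j] += aip * b[p * n + j];
      }
    }
  }
  return out;
}

const size = 128;
const a = new Float32Array(size * size).fill(1);
const b = new Float32Array(size * size).fill(2);

const start = performance.now();
const c = naiveMatmul(a, b, size, size, size);
const elapsedSeconds = (performance.now() - start) / 1000;

console.log(`naive matmul: ${matmulGflops(size, size, size, elapsedSeconds).toFixed(2)} GFLOP/s`);
```

Under this convention, a 4096 x 4096 x 4096 product is about 137.4 GFLOP of work, so a sustained rate of 7000 GFLOP/s corresponds to roughly 20 ms per multiplication.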