@simulatte/webgpu 0.2.3 → 0.3.0

Files changed (44)
  1. package/CHANGELOG.md +47 -4
  2. package/README.md +273 -235
  3. package/api-contract.md +163 -0
  4. package/assets/fawn-icon-main-256.png +0 -0
  5. package/assets/package-layers.svg +63 -0
  6. package/assets/package-surface-cube-snapshot.svg +7 -7
  7. package/{COMPAT_SCOPE.md → compat-scope.md} +1 -1
  8. package/examples/direct-webgpu/compute-dispatch.js +66 -0
  9. package/examples/direct-webgpu/explicit-bind-group.js +85 -0
  10. package/examples/direct-webgpu/request-device.js +10 -0
  11. package/examples/doe-api/buffers-readback.js +9 -0
  12. package/examples/doe-api/compile-and-dispatch.js +30 -0
  13. package/examples/doe-api/compute-dispatch.js +25 -0
  14. package/examples/doe-routines/compute-once-like-input.js +36 -0
  15. package/examples/doe-routines/compute-once-matmul.js +53 -0
  16. package/examples/doe-routines/compute-once-multiple-inputs.js +27 -0
  17. package/examples/doe-routines/compute-once.js +23 -0
  18. package/headless-webgpu-comparison.md +2 -2
  19. package/{LAYERING_PLAN.md → layering-plan.md} +10 -8
  20. package/native/doe_napi.c +102 -12
  21. package/package.json +26 -9
  22. package/prebuilds/darwin-arm64/doe_napi.node +0 -0
  23. package/prebuilds/darwin-arm64/libwebgpu_doe.dylib +0 -0
  24. package/prebuilds/darwin-arm64/metadata.json +6 -6
  25. package/prebuilds/linux-x64/doe_napi.node +0 -0
  26. package/prebuilds/linux-x64/libwebgpu_doe.so +0 -0
  27. package/prebuilds/linux-x64/metadata.json +5 -5
  28. package/scripts/generate-readme-assets.js +81 -8
  29. package/scripts/prebuild.js +23 -19
  30. package/src/auto_bind_group_layout.js +32 -0
  31. package/src/bun-ffi.js +93 -12
  32. package/src/bun.js +23 -2
  33. package/src/compute.d.ts +162 -0
  34. package/src/compute.js +915 -0
  35. package/src/doe.d.ts +184 -0
  36. package/src/doe.js +641 -0
  37. package/src/full.d.ts +119 -0
  38. package/src/full.js +35 -0
  39. package/src/index.js +1013 -38
  40. package/src/node-runtime.js +2 -2
  41. package/src/node.js +2 -2
  42. package/{SUPPORT_CONTRACTS.md → support-contracts.md} +27 -41
  43. package/{ZIG_SOURCE_INVENTORY.md → zig-source-inventory.md} +2 -2
  44. package/API_CONTRACT.md +0 -182
package/CHANGELOG.md CHANGED
@@ -7,19 +7,62 @@ retrofitted from package version history and package-surface commits so the npm
 package has a conventional release history alongside the broader Fawn status
 and process documents.
 
+## [0.3.0] - 2026-03-11
+
+### Changed
+
+- Breaking: redesigned the shared `doe` surface around `await
+  doe.requestDevice()`, grouped `gpu.buffers.*`, and grouped
+  `gpu.compute.*` instead of the earlier flat bound-helper methods.
+- Added `gpu.buffers.like(...)` for buffer-allocation boilerplate reduction and
+  `gpu.compute.once(...)` for the first `Doe routines` workflow.
+- Doe helper token values now use camelCase (`storageRead`,
+  `storageReadWrite`) and Doe workgroups now accept `[x, y]` in addition to
+  `number` and `[x, y, z]`.
+- `gpu.compute.once(...)` now rejects raw numeric WebGPU usage flags; use Doe
+  usage tokens there or drop to `gpu.buffers.*` for explicit raw-flag control.
+- Kept the same `doe` shape on `@simulatte/webgpu` and
+  `@simulatte/webgpu/compute`; the package split remains the underlying raw
+  device surface (`full` vs compute-only facade), not separate helper dialects.
+- Updated the package README, API contract, and JSDoc guide to standardize the
+  `Direct WebGPU`, `Doe API`, and `Doe routines` model and the boundary
+  between the headless package lane and `nursery/fawn-browser`.
+
+## [0.2.4] - 2026-03-11
+
+### Changed
+
+- `doe.runCompute()` now infers binding access from Doe helper-created buffer
+  usage and fails fast when a bare binding lacks Doe usage metadata or uses a
+  non-bindable/ambiguous usage shape.
+- Simplified the compute-surface README example to use inferred binding access
+  (`bindings: [input, output]`) and the device-bound `doe.bind(await
+  requestDevice())` flow directly.
+- Clarified the install contract for non-prebuilt platforms: the `node-gyp`
+  fallback only builds the native addon and does not bundle `libwebgpu_doe`
+  plus the required Dawn sidecar.
+- Aligned the published package docs and API contract with the current
+  `@simulatte/webgpu`, `@simulatte/webgpu/compute`, and `@simulatte/webgpu/full`
+  export surface.
+
 ## [0.2.3] - 2026-03-10
 
 ### Added
 
 - macOS arm64 (Metal) prebuilds shipped alongside existing Linux x64 (Vulkan).
-- Monte Carlo pi estimation example in the README, replacing the trivial
-  buffer-readback snippet with a real GPU compute demonstration.
 - "Verify your install" section with `npm run smoke` and `npm test` guidance.
+- Added explicit package export surfaces for `@simulatte/webgpu` (default
+  full) and `@simulatte/webgpu/compute`, plus the first `doe` ergonomic
+  namespace for buffer/readback/compute helpers.
+- Added `doe.bind(device)` so the ergonomic helper surface supports device-bound
+  workflows in addition to static helper calls.
 
 ### Changed
 
-- Restructured package README for consumers: examples, quickstart, and
-  verification first; building from source and Fawn developer context at the end.
+- Restructured the package README around the default full surface,
+  `@simulatte/webgpu/compute`, and the `doe` helper surface.
+- `doe.runCompute()` now infers binding access from Doe helper-created buffer
+  usage and fails fast for bare bindings that do not carry Doe usage metadata.
 - Fixed broken README image links to use bundled asset paths instead of dead
   raw GitHub URLs.
 - Root Fawn README now directs package users to the package README.
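The 0.2.4 and 0.3.0 entries above describe the helper's binding-access rule: access is inferred from Doe usage tokens, raw numeric WebGPU usage flags are rejected, and bare bindings without Doe usage metadata fail fast. A minimal plain-JavaScript sketch of that rule, assuming the changelog's camelCase token names; `inferBindingAccess` and the token table are illustrative, not the package's internal implementation:

```javascript
// Sketch of the binding-access inference described in the changelog.
// Doe usage tokens are camelCase as of 0.3.0; raw numeric WebGPU usage
// flags are rejected rather than guessed at, and an explicit
// { buffer, access } entry always wins (raw WebGPU buffers).
const TOKEN_ACCESS = {
  uniform: "uniform",
  storageRead: "read",
  storageReadWrite: "read_write",
};

function inferBindingAccess(binding) {
  if (binding && typeof binding === "object" && binding.access) {
    return binding.access; // explicit access wins
  }
  const usage = binding && binding.usage;
  if (typeof usage === "number") {
    // e.g. GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC
    throw new TypeError(
      "raw numeric usage flags are ambiguous; pass { buffer, access }",
    );
  }
  const access = TOKEN_ACCESS[usage];
  if (access === undefined) {
    throw new TypeError(`binding lacks Doe usage metadata: ${String(usage)}`);
  }
  return access;
}
```

The same fail-fast shape explains why `gpu.compute.once(...)` can afford to reject raw flags entirely: at that layer the helper owns allocation, so every binding it sees should carry a token.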
package/README.md CHANGED
@@ -1,319 +1,357 @@
 # @simulatte/webgpu
 
-Headless WebGPU for Node.js and Bun, powered by Doe, Fawn's Zig WebGPU
-runtime.
+<table>
+  <tr>
+    <td valign="middle">
+      <strong>Run real WebGPU workloads in Node.js and Bun with Doe, the WebGPU runtime from Fawn.</strong>
+    </td>
+    <td valign="middle">
+      <img src="assets/fawn-icon-main-256.png" alt="Fawn logo" width="88" />
+    </td>
+  </tr>
+</table>
 
-<p align="center">
-  <img src="assets/fawn-icon-main-256.png" alt="Fawn logo" width="196" />
-</p>
-
-Use this package for headless compute, CI, benchmarking, and offscreen GPU
-execution. It is built for explicit runtime behavior, deterministic
-traceability, and artifact-backed performance work. It is not a DOM/canvas
-package and it should not be read as a promise of full browser-surface parity.
-
-## Quick examples
-
-### Inspect the provider
+`@simulatte/webgpu` is Fawn's headless WebGPU package for Node.js and Bun: use
+the raw WebGPU API through `requestDevice()` and `device.*`, or move up to the
+Doe API + routines when you want the same runtime with less setup. Browser
+DOM/canvas ownership lives in the separate `nursery/fawn-browser` lane.
 
-```js
-import { providerInfo } from "@simulatte/webgpu";
-
-console.log(providerInfo());
-```
+Terminology in this README is deliberate:
 
-### Request a device
+- `Doe runtime` means the Zig/native WebGPU runtime underneath the package
+- `Doe API` means the explicit JS convenience surface under `doe`, `gpu.buffers.*`,
+  `gpu.compute.run(...)`, and `gpu.compute.compile(...)`
+- `Doe routines` means the narrower, more opinionated JS flows such as
+  `gpu.compute.once(...)`
 
-```js
-import { requestDevice } from "@simulatte/webgpu";
+## Start here
 
-const device = await requestDevice();
-console.log(device.limits.maxBufferSize);
-```
+### Same workload, two layers
 
-### Estimate pi on the GPU
+The same simple compute pass, shown first at the raw WebGPU layer and then at
+the explicit Doe API layer.
 
-65,536 threads each test 1,024 points inside the unit square. Each thread
-hashes its index to produce sample coordinates, counts how many land inside
-the unit circle, and writes its count to a results array. The CPU sums the
-counts and computes pi ≈ 4 × hits / total.
+#### 1. Direct WebGPU
 
 ```js
 import { globals, requestDevice } from "@simulatte/webgpu";
 
-const { GPUBufferUsage, GPUMapMode, GPUShaderStage } = globals;
 const device = await requestDevice();
+const input = new Float32Array([1, 2, 3, 4]);
+const bytes = input.byteLength;
 
-const THREADS = 65536;
-const WORKGROUP_SIZE = 256;
-const SAMPLES_PER_THREAD = 1024;
-
-if (THREADS % WORKGROUP_SIZE !== 0) {
-  throw new Error("THREADS must be a multiple of WORKGROUP_SIZE");
-}
-
-const shader = device.createShaderModule({
-  code: `
-    @group(0) @binding(0) var<storage, read_write> counts: array<u32>;
-
-    fn hash(n: u32) -> u32 {
-      var x = n;
-      x ^= x >> 16u;
-      x *= 0x45d9f3bu;
-      x ^= x >> 16u;
-      x *= 0x45d9f3bu;
-      x ^= x >> 16u;
-      return x;
-    }
-
-    @compute @workgroup_size(${WORKGROUP_SIZE})
-    fn main(@builtin(global_invocation_id) gid: vec3u) {
-      var count = 0u;
-      for (var i = 0u; i < ${SAMPLES_PER_THREAD}u; i += 1u) {
-        let idx = gid.x * ${SAMPLES_PER_THREAD}u + i;
-        let x = f32(hash(idx * 2u)) / 4294967295.0;
-        let y = f32(hash(idx * 2u + 1u)) / 4294967295.0;
-        if x * x + y * y <= 1.0 {
-          count += 1u;
-        }
-      }
-      counts[gid.x] = count;
-    }
-  `,
+const src = device.createBuffer({
+  size: bytes,
+  usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_DST,
 });
+device.queue.writeBuffer(src, 0, input);
 
-const bindGroupLayout = device.createBindGroupLayout({
-  entries: [{
-    binding: 0,
-    visibility: GPUShaderStage.COMPUTE,
-    buffer: { type: "storage" },
-  }],
+const dst = device.createBuffer({
+  size: bytes,
+  usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_SRC,
 });
 
-const pipeline = device.createComputePipeline({
-  layout: device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
-  compute: { module: shader, entryPoint: "main" },
+const readback = device.createBuffer({
+  size: bytes,
+  usage: globals.GPUBufferUsage.COPY_DST | globals.GPUBufferUsage.MAP_READ,
 });
 
-const countsBuffer = device.createBuffer({
-  size: THREADS * 4,
-  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
-});
-const readback = device.createBuffer({
-  size: THREADS * 4,
-  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
+const pipeline = device.createComputePipeline({
+  layout: "auto",
+  compute: {
+    module: device.createShaderModule({
+      code: `
+        @group(0) @binding(0) var<storage, read> src: array<f32>;
+        @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+        @compute @workgroup_size(4)
+        fn main(@builtin(global_invocation_id) gid: vec3u) {
+          let i = gid.x;
+          dst[i] = src[i] * 2.0;
+        }
+      `,
+    }),
+    entryPoint: "main",
+  },
 });
 
 const bindGroup = device.createBindGroup({
-  layout: bindGroupLayout,
-  entries: [{ binding: 0, resource: { buffer: countsBuffer } }],
+  layout: pipeline.getBindGroupLayout(0),
+  entries: [
+    { binding: 0, resource: { buffer: src } },
+    { binding: 1, resource: { buffer: dst } },
+  ],
 });
 
 const encoder = device.createCommandEncoder();
 const pass = encoder.beginComputePass();
 pass.setPipeline(pipeline);
 pass.setBindGroup(0, bindGroup);
-pass.dispatchWorkgroups(THREADS / WORKGROUP_SIZE);
+pass.dispatchWorkgroups(1);
 pass.end();
-encoder.copyBufferToBuffer(countsBuffer, 0, readback, 0, THREADS * 4);
+encoder.copyBufferToBuffer(dst, 0, readback, 0, bytes);
+
 device.queue.submit([encoder.finish()]);
+await device.queue.onSubmittedWorkDone();
 
-await readback.mapAsync(GPUMapMode.READ);
-const counts = new Uint32Array(readback.getMappedRange());
-const hits = counts.reduce((a, b) => a + b, 0);
+await readback.mapAsync(globals.GPUMapMode.READ);
+const result = new Float32Array(readback.getMappedRange().slice(0));
 readback.unmap();
 
-const total = THREADS * SAMPLES_PER_THREAD;
-const pi = 4 * hits / total;
-console.log(`${total.toLocaleString()} samples → pi ≈ ${pi.toFixed(6)}`);
+console.log(result); // Float32Array(4) [ 2, 4, 6, 8 ]
 ```
 
-Expected output will vary slightly, but it should look like:
+#### 2. Doe API
 
-```
-67,108,864 samples → pi ≈ 3.14...
+Explicit Doe buffers and dispatch when you want less boilerplate but still want
+to manage the resources yourself.
+
+```js
+import { doe } from "@simulatte/webgpu/compute";
+
+const gpu = await doe.requestDevice();
+const src = gpu.buffers.fromData(Float32Array.of(1, 2, 3, 4));
+const dst = gpu.buffers.like(src, {
+  usage: "storageReadWrite",
+});
+
+await gpu.compute.run({
+  code: `
+    @group(0) @binding(0) var<storage, read> src: array<f32>;
+    @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+    @compute @workgroup_size(4)
+    fn main(@builtin(global_invocation_id) gid: vec3u) {
+      let i = gid.x;
+      dst[i] = src[i] * 2.0;
+    }
+  `,
+  // Access is inferred from the Doe buffer usage above.
+  bindings: [src, dst],
+  workgroups: 1,
+});
+
+console.log(await gpu.buffers.read(dst, Float32Array)); // Float32Array(4) [ 2, 4, 6, 8 ]
 ```
 
-Increase `SAMPLES_PER_THREAD` for more precision.
+The package identity is simple:
 
-## What this package is
+- `requestDevice()` gives you real headless WebGPU
+- `doe` gives you the same runtime with less boilerplate and explicit resource control
+- `compute.once(...)` is the more opinionated routines layer when you do not want to manage buffers and readback yourself
 
-`@simulatte/webgpu` is the canonical package surface for Doe. Node uses an
-N-API addon and Bun currently routes through the same addon-backed runtime
-entry to load `libwebgpu_doe`. Current package builds still ship a Dawn sidecar
-where proc resolution requires it. The experimental raw Bun FFI path remains in
-`src/bun-ffi.js`, but it is not the default package entry.
+#### 3. Doe routines: one-shot tensor matmul
 
-Doe is a Zig-first WebGPU runtime with explicit allocator control, startup-time
-profile and quirk binding, a native WGSL pipeline (`lexer -> parser ->
-semantic analysis -> IR -> backend emitters`), and explicit
-Vulkan/Metal/D3D12 execution paths in one system. Optional
-`-Dlean-verified=true` builds use Lean 4 where proved invariants can be
-hoisted out of runtime branches instead of being re-checked on every command;
-package consumers should not assume that path by default.
+This is where the routines layer starts to separate itself: you pass typed
+arrays and an output spec, and the package handles upload, output allocation,
+dispatch, and readback while the shader and tensor shapes stay explicit.
 
-Doe also keeps adapter and driver quirks explicit. Profile selection happens at
-startup, quirk data is schema-backed, and the runtime binds the selected
-profile instead of relying on hidden per-command fallback logic.
+```js
+import { doe } from "@simulatte/webgpu/compute";
 
-## Current scope
+const gpu = await doe.requestDevice();
+const [M, K, N] = [256, 512, 256];
 
-- Node is the primary supported package surface (N-API bridge).
-- Bun has API parity with Node through the package's addon-backed runtime entry
-  (61/61 contract tests passing). Bun benchmark cube maturity remains
-  prototype until the comparable macOS cells stabilize across repeated
-  governed runs.
-- Package-surface comparisons should be read through the benchmark cube outputs
-  under `bench/out/cube/`, not as a replacement for strict backend reports.
+const lhs = Float32Array.from({ length: M * K }, (_, i) => (i % 17) / 17);
+const rhs = Float32Array.from({ length: K * N }, (_, i) => (i % 13) / 13);
+const dims = new Uint32Array([M, K, N, 0]);
+
+const result = await gpu.compute.once({
+  code: `
+    struct Dims {
+      m: u32,
+      k: u32,
+      n: u32,
+      _pad: u32,
+    };
+
+    @group(0) @binding(0) var<uniform> dims: Dims;
+    @group(0) @binding(1) var<storage, read> lhs: array<f32>;
+    @group(0) @binding(2) var<storage, read> rhs: array<f32>;
+    @group(0) @binding(3) var<storage, read_write> out: array<f32>;
+
+    @compute @workgroup_size(8, 8)
+    fn main(@builtin(global_invocation_id) gid: vec3u) {
+      let row = gid.y;
+      let col = gid.x;
+      if (row >= dims.m || col >= dims.n) {
+        return;
+      }
+
+      var acc = 0.0;
+      for (var i = 0u; i < dims.k; i = i + 1u) {
+        acc += lhs[row * dims.k + i] * rhs[i * dims.n + col];
+      }
+      out[row * dims.n + col] = acc;
+    }
+  `,
+  inputs: [
+    { data: dims, usage: "uniform", access: "uniform" },
+    lhs,
+    rhs,
+  ],
+  output: {
+    type: Float32Array,
+    size: M * N * Float32Array.BYTES_PER_ELEMENT,
+  },
+  workgroups: [Math.ceil(N / 8), Math.ceil(M / 8)],
+});
+
+console.log(result.subarray(0, 8)); // Float32Array(8) [ ... ]
+```
+
+### Benchmarked package surface
+
+The package is not just a wrapper API. It is the headless package surface of
+the Doe runtime, Fawn's Zig-first WebGPU implementation, and it is exercised as
+a measured package surface with explicit package lanes.
 
 <p align="center">
   <img src="assets/package-surface-cube-snapshot.svg" alt="Static package-surface benchmark cube snapshot" width="920" />
 </p>
 
-Package-surface benchmark evidence lives under `bench/out/cube/latest/`. Read
-those rows as package-surface positioning data, not as substitutes for strict
-backend-native claim lanes.
+`@simulatte/webgpu` is the headless package surface of the broader
+[Fawn](https://github.com/clocksmith/fawn) project. The same repository also
+carries the Doe runtime itself, benchmarking and verification tooling, and the
+separate `nursery/fawn-browser` Chromium/browser integration lane.
 
-## Quick start
+## Install
 
 ```bash
 npm install @simulatte/webgpu
 ```
 
+The install ships platform-specific prebuilds for macOS arm64 (Metal) and
+Linux x64 (Vulkan). If no prebuild matches your platform, the installer falls
+back to building the native addon with `node-gyp` only; it does not build or
+bundle `libwebgpu_doe` and the required Dawn sidecar for you. On unsupported
+platforms, use a local Fawn workspace build for those runtime libraries.
+
+## Choose a surface
+
+| Import                      | Surface               | Includes                                                         |
+| --------------------------- | --------------------- | ---------------------------------------------------------------- |
+| `@simulatte/webgpu`         | Default full surface  | Buffers, compute, textures, samplers, render, Doe API + routines |
+| `@simulatte/webgpu/compute` | Compute-first surface | Buffers, compute, copy/upload/readback, Doe API + routines       |
+| `@simulatte/webgpu/full`    | Explicit full surface | Same contract as the default package surface                     |
+
+Use `@simulatte/webgpu/compute` when you want the constrained package contract
+for AI workloads and other buffer/dispatch-heavy headless execution. The
+compute surface intentionally omits render and sampler methods from the JS
+facade.
+
+## Package basics
+
+### Inspect the provider
+
 ```js
-import { providerInfo, requestDevice } from "@simulatte/webgpu";
+import { providerInfo } from "@simulatte/webgpu";
 
 console.log(providerInfo());
-
-const device = await requestDevice();
-console.log(device.limits.maxBufferSize);
 ```
 
-The install ships platform-specific prebuilds for macOS arm64 (Metal) and
-Linux x64 (Vulkan). The commands are the same on both platforms; the correct
-backend is selected automatically. The only external prerequisite is GPU
-drivers on the host. If no prebuild matches your platform, install falls back
-to building from source via node-gyp.
-
-## Verify your install
+### Request a full device
 
-After installing, run the smoke test to confirm native library loading and a
-GPU round-trip:
+```js
+import { requestDevice } from "@simulatte/webgpu";
 
-```bash
-npm run smoke
+const device = await requestDevice();
+console.log(device.limits.maxBufferSize);
 ```
 
-To run the full contract test suite (adapter, device, buffers, compute
-dispatch with readback, textures, samplers):
+### Request a compute-only device
 
-```bash
-npm test         # Node
-npm run test:bun # Bun
+```js
+import { requestDevice } from "@simulatte/webgpu/compute";
+
+const device = await requestDevice();
+console.log(typeof device.createComputePipeline); // "function"
+console.log(typeof device.createRenderPipeline); // "undefined"
 ```
 
-If `npm run smoke` fails, check that GPU drivers are installed and that your
-platform is supported (macOS arm64 or Linux x64).
+## Doe layers
 
-## Building from source
+The package exposes three layers over the same runtime:
 
-Use this when working from the Fawn repo checkout or rebuilding the addon
-against a local Doe runtime build.
+<p align="center">
+  <img src="assets/package-layers.svg" alt="Layered package graph showing direct WebGPU, Doe API, and Doe routines over the same package surfaces." width="920" />
+</p>
 
-```bash
-# From the Fawn workspace root:
-cd zig && zig build dropin   # build libwebgpu_doe + Dawn sidecar
-
-cd nursery/webgpu
-npm run build:addon          # compile doe_napi.node from source
-npm run smoke                # verify native loading + GPU round-trip
-npm test                     # Node contract tests
-npm run test:bun             # Bun contract tests
-```
+- `Direct WebGPU`
+  raw `requestDevice()` plus direct `device.*`
+- `Doe API`
+  explicit Doe surface for lower-boilerplate buffer and compute flows
+- `Doe routines`
+  more opinionated Doe flows where the JS surface carries more of the operation
+
+Examples for each style ship in:
+
+- `examples/direct-webgpu/`
+- `examples/doe-api/`
+- `examples/doe-routines/`
+
+`doe` is the package's shared JS convenience surface over the Doe runtime. It is available
+from both `@simulatte/webgpu` and `@simulatte/webgpu/compute`.
+
+- `await doe.requestDevice()` gets a bound helper object in one step; use
+  `doe.bind(device)` when you already have a device.
+- `gpu.buffers.*`, `gpu.compute.run(...)`, and `gpu.compute.compile(...)` are
+  the main `Doe API` surface.
+- `gpu.compute.once(...)` is currently the first `Doe routines` path.
+
+The Doe API and Doe routines surface is the same on both package surfaces.
+The difference is the raw device beneath it:
+
+- `@simulatte/webgpu/compute` returns a compute-only facade
+- `@simulatte/webgpu` keeps the full headless device surface
+
+Binding access is inferred from Doe helper-created buffer usage when possible.
+For raw WebGPU buffers or non-bindable/ambiguous usage, pass
+`{ buffer, access }` explicitly.
+
+## Runtime notes
+
+`@simulatte/webgpu` is the canonical package surface for the Doe runtime. Node uses the
+addon-backed path. Bun uses a platform-dependent bridge today: Linux routes
+through the package FFI surface, while macOS currently uses the full
+addon-backed path for correctness parity. Current builds still ship a Dawn
+sidecar where proc resolution requires it.
+
+The Doe runtime is Fawn's Zig-first WebGPU implementation with explicit profile
+and quirk binding, a native WGSL pipeline (`lexer -> parser -> semantic
+analysis -> IR -> backend emitters`), and explicit Vulkan/Metal/D3D12
+execution paths in one system.
+Optional `-Dlean-verified=true` builds use Lean 4 as build-time proof support,
+not as a runtime interpreter. When a condition is proved ahead of time, the Doe
+runtime can remove that branch instead of re-checking it on every command;
+package consumers should not assume that path by default.
 
-Current macOS arm64 validation for `0.2.3` was rerun on March 10, 2026 with:
+## Verify your install
 
 ```bash
-cd zig && zig build dropin
-
-cd nursery/webgpu
-npm run build:addon
 npm run smoke
 npm test
 npm run test:bun
-npm run prebuild -- --skip-addon-build
-npm pack --dry-run
 ```
 
-That path is green on the Apple Metal host. `npm run test:bun` also passed on
-this host (`61 passed, 0 failed`) once Bun was added to `PATH`.
+`npm run smoke` checks native library loading and a GPU round-trip. `npm test`
+covers the Node package contract and a packed-tarball export/import check.
 
-For Fawn development setup, build toolchain requirements, and benchmark
-harness usage, see the [Fawn project README](../../README.md).
+## Caveats
 
-## Packaging prebuilds (CI / release)
+- This is a headless package, not a browser DOM/canvas package.
+- `@simulatte/webgpu/compute` is intentionally narrower than the default full
+  surface.
+- Bun currently uses a platform-dependent bridge layer under the same package
+  contract: FFI on Linux, full/addon-backed on macOS. Package-surface contract
+  tests are green, and package benchmark rows are positioning data rather than
+  the source of truth for strict backend-native Doe-vs-Dawn claims.
 
-```bash
-npm run prebuild   # assembles prebuilds/<platform>-<arch>/
-```
+## Further reading
 
-Supported prebuild targets: macOS arm64 (Metal), Linux x64 (Vulkan),
-Windows x64 (D3D12). Host GPU drivers are the only external prerequisite.
-Install uses prebuilds when available, falls back to node-gyp from source.
-Tracked `prebuilds/<platform>-<arch>/` directories are the source of truth for
-reproducible package publishes. If a prebuild exists only on one local machine
-and is not committed, `npm pack` output will differ by environment.
-Generated `.tgz` package archives are release outputs and should not be
-committed to the repo.
-Prebuild `metadata.json` now records `doeBuild.leanVerifiedBuild` and
-`proofArtifactSha256`, and `providerInfo()` surfaces the same values when
-metadata is present.
-
-Package publication still depends on the governed Linux Vulkan release lane in
-[`process.md`](../../process.md). A green macOS package rerun is necessary, but
-not sufficient, for a release publish.
-
-## Current caveats
-
-- This package is for headless benchmarking and CI workflows, not full browser
-  parity.
-- Node provider comparisons are host-local package/runtime evidence measured
-  with package-level timers. They are useful surface-positioning data, not
-  backend claim substantiation or a broad "the package is faster" claim.
-- `@simulatte/webgpu` does not yet have a single broad cross-surface speed
-  claim. Current performance evidence is split across Node package-surface
-  runs, prototype Bun package-surface runs, and workload-specific strict
-  backend reports.
-- Linux Node Doe-native path is now wired end-to-end (Linux guard removed).
-  No `DOE_WEBGPU_LIB` env var needed when prebuilds or workspace artifacts
-  are present.
-- Fresh macOS package evidence from March 10, 2026 is reflected in
-  `bench/out/cube/latest/` (generated `2026-03-10T20:31:02.431911Z`):
-  Bun `uploads`, `compute_e2e`, and `full_comparable` are `claimable`;
-  Node `uploads`, `compute_e2e`, and `full_comparable` are also `claimable`.
-- Separate Apple Metal extended-comparable backend evidence from March 10, 2026
-  (`bench/out/apple-metal/extended-comparable/20260310T121546Z/`) is
-  `31/31` comparable and `31/31` claimable. Read that lane as stricter
-  backend evidence, not as a replacement for the package-surface cube rows.
-- Bun has API parity with Node (61/61 contract tests). The package-default Bun
-  entry currently routes through the addon-backed runtime, while
-  `src/bun-ffi.js` remains experimental. Bun benchmark lane is at
-  `bench/bun/compare.js`; benchmark interpretations should note which runtime
-  entry was exercised. Latest fresh macOS run
-  (`bench/out/bun-doe-vs-webgpu/doe-vs-bun-webgpu-2026-03-10T195022524Z.json`)
-  executes all `12` current workloads and has `9` comparable rows, all `9`
-  claimable. `compute_e2e_{256,4096,65536}` and
-  `copy_buffer_to_buffer_4kb` are claimable in the full macOS package lane.
-  The remaining three rows are intentional directional-only workloads
-  (`submit_empty`, `pipeline_create`, `compute_dispatch_simple`).
-- Latest fresh macOS Node package run
-  (`bench/out/node-doe-vs-dawn-claim-full/doe-vs-dawn-node-2026-03-10T202406545Z.json`)
-  has `12` total rows, `9` comparable rows, and all `9` comparable rows are
-  claimable. `compute_e2e_{256,4096,65536}`, `copy_buffer_to_buffer_4kb`,
-  and the current upload set are claimable in the full package lane. The
-  remaining three rows are intentional directional-only workloads
-  (`submit_empty`, `pipeline_create`, `compute_dispatch_simple`).
-- Self-contained install ships prebuilt `doe_napi.node` + `libwebgpu_doe` +
-  Dawn sidecar per platform. See **Verify your install** above.
-- API details live in `API_CONTRACT.md`.
-- Compatibility scope is documented in `COMPAT_SCOPE.md`.
+- [API contract](./api-contract.md)
+- [Support contracts](./support-contracts.md)
+- [Compatibility scope](./compat-scope.md)
+- [Layering plan](./layering-plan.md)
+- [Headless WebGPU comparison](./headless-webgpu-comparison.md)
+- [Zig source inventory](./zig-source-inventory.md)
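The 0.3.0 changelog entry notes that Doe workgroup counts now accept a `number`, `[x, y]`, or `[x, y, z]`, and the new README's matmul example passes `[Math.ceil(N / 8), Math.ceil(M / 8)]`. A small sketch of what that flexibility implies, assuming every accepted shape widens to an `[x, y, z]` dispatch triple with missing axes defaulting to 1; `normalizeWorkgroups` is a hypothetical name for illustration, not the package's API:

```javascript
// Sketch: widen the workgroup shapes the changelog lists (number,
// [x, y], or [x, y, z]) into a full [x, y, z] dispatch triple, the
// form WebGPU's dispatchWorkgroups(x, y, z) ultimately consumes.
function normalizeWorkgroups(workgroups) {
  if (typeof workgroups === "number") {
    workgroups = [workgroups];
  }
  if (
    !Array.isArray(workgroups) ||
    workgroups.length < 1 ||
    workgroups.length > 3
  ) {
    throw new TypeError("expected a number, [x, y], or [x, y, z]");
  }
  const [x, y = 1, z = 1] = workgroups; // missing axes default to 1
  for (const n of [x, y, z]) {
    if (!Number.isInteger(n) || n < 1) {
      throw new RangeError("workgroup counts must be positive integers");
    }
  }
  return [x, y, z];
}
```

Under this reading, the matmul example's `[Math.ceil(N / 8), Math.ceil(M / 8)]` is shorthand for a `[32, 32, 1]` dispatch grid at `N = M = 256` with an 8×8 workgroup size.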