@simulatte/webgpu 0.2.3 → 0.3.0

Files changed (44)
  1. package/CHANGELOG.md +47 -4
  2. package/README.md +273 -235
  3. package/api-contract.md +163 -0
  4. package/assets/fawn-icon-main-256.png +0 -0
  5. package/assets/package-layers.svg +63 -0
  6. package/assets/package-surface-cube-snapshot.svg +7 -7
  7. package/{COMPAT_SCOPE.md → compat-scope.md} +1 -1
  8. package/examples/direct-webgpu/compute-dispatch.js +66 -0
  9. package/examples/direct-webgpu/explicit-bind-group.js +85 -0
  10. package/examples/direct-webgpu/request-device.js +10 -0
  11. package/examples/doe-api/buffers-readback.js +9 -0
  12. package/examples/doe-api/compile-and-dispatch.js +30 -0
  13. package/examples/doe-api/compute-dispatch.js +25 -0
  14. package/examples/doe-routines/compute-once-like-input.js +36 -0
  15. package/examples/doe-routines/compute-once-matmul.js +53 -0
  16. package/examples/doe-routines/compute-once-multiple-inputs.js +27 -0
  17. package/examples/doe-routines/compute-once.js +23 -0
  18. package/headless-webgpu-comparison.md +2 -2
  19. package/{LAYERING_PLAN.md → layering-plan.md} +10 -8
  20. package/native/doe_napi.c +102 -12
  21. package/package.json +26 -9
  22. package/prebuilds/darwin-arm64/doe_napi.node +0 -0
  23. package/prebuilds/darwin-arm64/libwebgpu_doe.dylib +0 -0
  24. package/prebuilds/darwin-arm64/metadata.json +6 -6
  25. package/prebuilds/linux-x64/doe_napi.node +0 -0
  26. package/prebuilds/linux-x64/libwebgpu_doe.so +0 -0
  27. package/prebuilds/linux-x64/metadata.json +5 -5
  28. package/scripts/generate-readme-assets.js +81 -8
  29. package/scripts/prebuild.js +23 -19
  30. package/src/auto_bind_group_layout.js +32 -0
  31. package/src/bun-ffi.js +93 -12
  32. package/src/bun.js +23 -2
  33. package/src/compute.d.ts +162 -0
  34. package/src/compute.js +915 -0
  35. package/src/doe.d.ts +184 -0
  36. package/src/doe.js +641 -0
  37. package/src/full.d.ts +119 -0
  38. package/src/full.js +35 -0
  39. package/src/index.js +1013 -38
  40. package/src/node-runtime.js +2 -2
  41. package/src/node.js +2 -2
  42. package/{SUPPORT_CONTRACTS.md → support-contracts.md} +27 -41
  43. package/{ZIG_SOURCE_INVENTORY.md → zig-source-inventory.md} +2 -2
  44. package/API_CONTRACT.md +0 -182
package/CHANGELOG.md CHANGED
@@ -7,19 +7,62 @@ retrofitted from package version history and package-surface commits so the npm
 package has a conventional release history alongside the broader Fawn status
 and process documents.
 
+## [0.3.0] - 2026-03-11
+
+### Changed
+
+- Breaking: redesigned the shared `doe` surface around `await
+  doe.requestDevice()`, grouped `gpu.buffers.*`, and grouped
+  `gpu.compute.*` instead of the earlier flat bound-helper methods.
+- Added `gpu.buffers.like(...)` for buffer-allocation boilerplate reduction and
+  `gpu.compute.once(...)` for the first `Doe routines` workflow.
+- Doe helper token values now use camelCase (`storageRead`,
+  `storageReadWrite`) and Doe workgroups now accept `[x, y]` in addition to
+  `number` and `[x, y, z]`.
+- `gpu.compute.once(...)` now rejects raw numeric WebGPU usage flags; use Doe
+  usage tokens there or drop to `gpu.buffers.*` for explicit raw-flag control.
+- Kept the same `doe` shape on `@simulatte/webgpu` and
+  `@simulatte/webgpu/compute`; the package split remains the underlying raw
+  device surface (`full` vs compute-only facade), not separate helper dialects.
+- Updated the package README, API contract, and JSDoc guide to standardize the
+  `Direct WebGPU`, `Doe API`, and `Doe routines` model and the boundary
+  between the headless package lane and `nursery/fawn-browser`.
+
+## [0.2.4] - 2026-03-11
+
+### Changed
+
+- `doe.runCompute()` now infers binding access from Doe helper-created buffer
+  usage and fails fast when a bare binding lacks Doe usage metadata or uses a
+  non-bindable/ambiguous usage shape.
+- Simplified the compute-surface README example to use inferred binding access
+  (`bindings: [input, output]`) and the device-bound `doe.bind(await
+  requestDevice())` flow directly.
+- Clarified the install contract for non-prebuilt platforms: the `node-gyp`
+  fallback only builds the native addon and does not bundle `libwebgpu_doe`
+  plus the required Dawn sidecar.
+- Aligned the published package docs and API contract with the current
+  `@simulatte/webgpu`, `@simulatte/webgpu/compute`, and `@simulatte/webgpu/full`
+  export surface.
+
 ## [0.2.3] - 2026-03-10
 
 ### Added
 
 - macOS arm64 (Metal) prebuilds shipped alongside existing Linux x64 (Vulkan).
-- Monte Carlo pi estimation example in the README, replacing the trivial
-  buffer-readback snippet with a real GPU compute demonstration.
 - "Verify your install" section with `npm run smoke` and `npm test` guidance.
+- Added explicit package export surfaces for `@simulatte/webgpu` (default
+  full) and `@simulatte/webgpu/compute`, plus the first `doe` ergonomic
+  namespace for buffer/readback/compute helpers.
+- Added `doe.bind(device)` so the ergonomic helper surface supports device-bound
+  workflows in addition to static helper calls.
 
 ### Changed
 
-- Restructured package README for consumers: examples, quickstart, and
-  verification first; building from source and Fawn developer context at the end.
+- Restructured the package README around the default full surface,
+  `@simulatte/webgpu/compute`, and the `doe` helper surface.
+- `doe.runCompute()` now infers binding access from Doe helper-created buffer
+  usage and fails fast for bare bindings that do not carry Doe usage metadata.
 - Fixed broken README image links to use bundled asset paths instead of dead
   raw GitHub URLs.
 - Root Fawn README now directs package users to the package README.
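The 0.2.4 and 0.3.0 entries above describe the helper's binding-access rule: access is inferred from Doe usage tokens, raw numeric WebGPU usage flags are rejected, and bare bindings without Doe usage metadata fail fast. A minimal plain-JavaScript sketch of that rule, assuming the changelog's camelCase token names; `inferBindingAccess` and the token table are illustrative, not the package's internal implementation:

```javascript
// Sketch of the binding-access inference described in the changelog.
// Doe usage tokens are camelCase as of 0.3.0; raw numeric WebGPU usage
// flags are rejected rather than guessed at, and an explicit
// { buffer, access } entry always wins (raw WebGPU buffers).
const TOKEN_ACCESS = {
  uniform: "uniform",
  storageRead: "read",
  storageReadWrite: "read_write",
};

function inferBindingAccess(binding) {
  if (binding && typeof binding === "object" && binding.access) {
    return binding.access; // explicit access wins
  }
  const usage = binding && binding.usage;
  if (typeof usage === "number") {
    // e.g. GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC
    throw new TypeError(
      "raw numeric usage flags are ambiguous; pass { buffer, access }",
    );
  }
  const access = TOKEN_ACCESS[usage];
  if (access === undefined) {
    throw new TypeError(`binding lacks Doe usage metadata: ${String(usage)}`);
  }
  return access;
}
```

The same fail-fast shape explains why `gpu.compute.once(...)` can afford to reject raw flags entirely: at that layer the helper owns allocation, so every binding it sees should carry a token.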
package/README.md CHANGED
@@ -1,319 +1,357 @@
 # @simulatte/webgpu
 
-Headless WebGPU for Node.js and Bun, powered by Doe, Fawn's Zig WebGPU
-runtime.
+<table>
+  <tr>
+    <td valign="middle">
+      <strong>Run real WebGPU workloads in Node.js and Bun with Doe, the WebGPU runtime from Fawn.</strong>
+    </td>
+    <td valign="middle">
+      <img src="assets/fawn-icon-main-256.png" alt="Fawn logo" width="88" />
+    </td>
+  </tr>
+</table>
 
-<p align="center">
-  <img src="assets/fawn-icon-main-256.png" alt="Fawn logo" width="196" />
-</p>
-
-Use this package for headless compute, CI, benchmarking, and offscreen GPU
-execution. It is built for explicit runtime behavior, deterministic
-traceability, and artifact-backed performance work. It is not a DOM/canvas
-package and it should not be read as a promise of full browser-surface parity.
-
-## Quick examples
-
-### Inspect the provider
+`@simulatte/webgpu` is Fawn's headless WebGPU package for Node.js and Bun: use
+the raw WebGPU API through `requestDevice()` and `device.*`, or move up to the
+Doe API + routines when you want the same runtime with less setup. Browser
+DOM/canvas ownership lives in the separate `nursery/fawn-browser` lane.
 
-```js
-import { providerInfo } from "@simulatte/webgpu";
-
-console.log(providerInfo());
-```
+Terminology in this README is deliberate:
 
-### Request a device
+- `Doe runtime` means the Zig/native WebGPU runtime underneath the package
+- `Doe API` means the explicit JS convenience surface under `doe`, `gpu.buffers.*`,
+  `gpu.compute.run(...)`, and `gpu.compute.compile(...)`
+- `Doe routines` means the narrower, more opinionated JS flows such as
+  `gpu.compute.once(...)`
 
-```js
-import { requestDevice } from "@simulatte/webgpu";
+## Start here
 
-const device = await requestDevice();
-console.log(device.limits.maxBufferSize);
-```
+### Same workload, two layers
 
-### Estimate pi on the GPU
+The same simple compute pass, shown first at the raw WebGPU layer and then at
+the explicit Doe API layer.
 
-65,536 threads each test 1,024 points inside the unit square. Each thread
-hashes its index to produce sample coordinates, counts how many land inside
-the unit circle, and writes its count to a results array. The CPU sums the
-counts and computes pi ≈ 4 × hits / total.
+#### 1. Direct WebGPU
 
 ```js
 import { globals, requestDevice } from "@simulatte/webgpu";
 
-const { GPUBufferUsage, GPUMapMode, GPUShaderStage } = globals;
 const device = await requestDevice();
+const input = new Float32Array([1, 2, 3, 4]);
+const bytes = input.byteLength;
 
-const THREADS = 65536;
-const WORKGROUP_SIZE = 256;
-const SAMPLES_PER_THREAD = 1024;
-
-if (THREADS % WORKGROUP_SIZE !== 0) {
-  throw new Error("THREADS must be a multiple of WORKGROUP_SIZE");
-}
-
-const shader = device.createShaderModule({
-  code: `
-    @group(0) @binding(0) var<storage, read_write> counts: array<u32>;
-
-    fn hash(n: u32) -> u32 {
-      var x = n;
-      x ^= x >> 16u;
-      x *= 0x45d9f3bu;
-      x ^= x >> 16u;
-      x *= 0x45d9f3bu;
-      x ^= x >> 16u;
-      return x;
-    }
-
-    @compute @workgroup_size(${WORKGROUP_SIZE})
-    fn main(@builtin(global_invocation_id) gid: vec3u) {
-      var count = 0u;
-      for (var i = 0u; i < ${SAMPLES_PER_THREAD}u; i += 1u) {
-        let idx = gid.x * ${SAMPLES_PER_THREAD}u + i;
-        let x = f32(hash(idx * 2u)) / 4294967295.0;
-        let y = f32(hash(idx * 2u + 1u)) / 4294967295.0;
-        if x * x + y * y <= 1.0 {
-          count += 1u;
-        }
-      }
-      counts[gid.x] = count;
-    }
-  `,
+const src = device.createBuffer({
+  size: bytes,
+  usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_DST,
 });
+device.queue.writeBuffer(src, 0, input);
 
-const bindGroupLayout = device.createBindGroupLayout({
-  entries: [{
-    binding: 0,
-    visibility: GPUShaderStage.COMPUTE,
-    buffer: { type: "storage" },
-  }],
+const dst = device.createBuffer({
+  size: bytes,
+  usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_SRC,
 });
 
-const pipeline = device.createComputePipeline({
-  layout: device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
-  compute: { module: shader, entryPoint: "main" },
+const readback = device.createBuffer({
+  size: bytes,
+  usage: globals.GPUBufferUsage.COPY_DST | globals.GPUBufferUsage.MAP_READ,
 });
 
-const countsBuffer = device.createBuffer({
-  size: THREADS * 4,
-  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
-});
-const readback = device.createBuffer({
-  size: THREADS * 4,
-  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
+const pipeline = device.createComputePipeline({
+  layout: "auto",
+  compute: {
+    module: device.createShaderModule({
+      code: `
+        @group(0) @binding(0) var<storage, read> src: array<f32>;
+        @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+        @compute @workgroup_size(4)
+        fn main(@builtin(global_invocation_id) gid: vec3u) {
+          let i = gid.x;
+          dst[i] = src[i] * 2.0;
+        }
+      `,
+    }),
+    entryPoint: "main",
+  },
 });
 
 const bindGroup = device.createBindGroup({
-  layout: bindGroupLayout,
-  entries: [{ binding: 0, resource: { buffer: countsBuffer } }],
+  layout: pipeline.getBindGroupLayout(0),
+  entries: [
+    { binding: 0, resource: { buffer: src } },
+    { binding: 1, resource: { buffer: dst } },
+  ],
 });
 
 const encoder = device.createCommandEncoder();
 const pass = encoder.beginComputePass();
 pass.setPipeline(pipeline);
 pass.setBindGroup(0, bindGroup);
-pass.dispatchWorkgroups(THREADS / WORKGROUP_SIZE);
+pass.dispatchWorkgroups(1);
 pass.end();
-encoder.copyBufferToBuffer(countsBuffer, 0, readback, 0, THREADS * 4);
+encoder.copyBufferToBuffer(dst, 0, readback, 0, bytes);
+
 device.queue.submit([encoder.finish()]);
+await device.queue.onSubmittedWorkDone();
 
-await readback.mapAsync(GPUMapMode.READ);
-const counts = new Uint32Array(readback.getMappedRange());
-const hits = counts.reduce((a, b) => a + b, 0);
+await readback.mapAsync(globals.GPUMapMode.READ);
+const result = new Float32Array(readback.getMappedRange().slice(0));
 readback.unmap();
 
-const total = THREADS * SAMPLES_PER_THREAD;
-const pi = 4 * hits / total;
-console.log(`${total.toLocaleString()} samples → pi ≈ ${pi.toFixed(6)}`);
+console.log(result); // Float32Array(4) [ 2, 4, 6, 8 ]
 ```
 
-Expected output will vary slightly, but it should look like:
+#### 2. Doe API
 
-```
-67,108,864 samples → pi ≈ 3.14...
+Explicit Doe buffers and dispatch when you want less boilerplate but still want
+to manage the resources yourself.
+
+```js
+import { doe } from "@simulatte/webgpu/compute";
+
+const gpu = await doe.requestDevice();
+const src = gpu.buffers.fromData(Float32Array.of(1, 2, 3, 4));
+const dst = gpu.buffers.like(src, {
+  usage: "storageReadWrite",
+});
+
+await gpu.compute.run({
+  code: `
+    @group(0) @binding(0) var<storage, read> src: array<f32>;
+    @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+    @compute @workgroup_size(4)
+    fn main(@builtin(global_invocation_id) gid: vec3u) {
+      let i = gid.x;
+      dst[i] = src[i] * 2.0;
+    }
+  `,
+  // Access is inferred from the Doe buffer usage above.
+  bindings: [src, dst],
+  workgroups: 1,
+});
+
+console.log(await gpu.buffers.read(dst, Float32Array)); // Float32Array(4) [ 2, 4, 6, 8 ]
 ```
 
-Increase `SAMPLES_PER_THREAD` for more precision.
+The package identity is simple:
 
-## What this package is
+- `requestDevice()` gives you real headless WebGPU
+- `doe` gives you the same runtime with less boilerplate and explicit resource control
+- `compute.once(...)` is the more opinionated routines layer when you do not want to manage buffers and readback yourself
 
-`@simulatte/webgpu` is the canonical package surface for Doe. Node uses an
-N-API addon and Bun currently routes through the same addon-backed runtime
-entry to load `libwebgpu_doe`. Current package builds still ship a Dawn sidecar
-where proc resolution requires it. The experimental raw Bun FFI path remains in
-`src/bun-ffi.js`, but it is not the default package entry.
+#### 3. Doe routines: one-shot tensor matmul
 
-Doe is a Zig-first WebGPU runtime with explicit allocator control, startup-time
-profile and quirk binding, a native WGSL pipeline (`lexer -> parser ->
-semantic analysis -> IR -> backend emitters`), and explicit
-Vulkan/Metal/D3D12 execution paths in one system. Optional
-`-Dlean-verified=true` builds use Lean 4 where proved invariants can be
-hoisted out of runtime branches instead of being re-checked on every command;
-package consumers should not assume that path by default.
+This is where the routines layer starts to separate itself: you pass typed
+arrays and an output spec, and the package handles upload, output allocation,
+dispatch, and readback while the shader and tensor shapes stay explicit.
 
-Doe also keeps adapter and driver quirks explicit. Profile selection happens at
-startup, quirk data is schema-backed, and the runtime binds the selected
-profile instead of relying on hidden per-command fallback logic.
+```js
+import { doe } from "@simulatte/webgpu/compute";
 
-## Current scope
+const gpu = await doe.requestDevice();
+const [M, K, N] = [256, 512, 256];
 
-- Node is the primary supported package surface (N-API bridge).
-- Bun has API parity with Node through the package's addon-backed runtime entry
-  (61/61 contract tests passing). Bun benchmark cube maturity remains
-  prototype until the comparable macOS cells stabilize across repeated
-  governed runs.
-- Package-surface comparisons should be read through the benchmark cube outputs
-  under `bench/out/cube/`, not as a replacement for strict backend reports.
+const lhs = Float32Array.from({ length: M * K }, (_, i) => (i % 17) / 17);
+const rhs = Float32Array.from({ length: K * N }, (_, i) => (i % 13) / 13);
+const dims = new Uint32Array([M, K, N, 0]);
+
+const result = await gpu.compute.once({
+  code: `
+    struct Dims {
+      m: u32,
+      k: u32,
+      n: u32,
+      _pad: u32,
+    };
+
+    @group(0) @binding(0) var<uniform> dims: Dims;
+    @group(0) @binding(1) var<storage, read> lhs: array<f32>;
+    @group(0) @binding(2) var<storage, read> rhs: array<f32>;
+    @group(0) @binding(3) var<storage, read_write> out: array<f32>;
+
+    @compute @workgroup_size(8, 8)
+    fn main(@builtin(global_invocation_id) gid: vec3u) {
+      let row = gid.y;
+      let col = gid.x;
+      if (row >= dims.m || col >= dims.n) {
+        return;
+      }
+
+      var acc = 0.0;
+      for (var i = 0u; i < dims.k; i = i + 1u) {
+        acc += lhs[row * dims.k + i] * rhs[i * dims.n + col];
+      }
+      out[row * dims.n + col] = acc;
+    }
+  `,
+  inputs: [
+    { data: dims, usage: "uniform", access: "uniform" },
+    lhs,
+    rhs,
+  ],
+  output: {
+    type: Float32Array,
+    size: M * N * Float32Array.BYTES_PER_ELEMENT,
+  },
+  workgroups: [Math.ceil(N / 8), Math.ceil(M / 8)],
+});
+
+console.log(result.subarray(0, 8)); // Float32Array(8) [ ... ]
+```
+
+### Benchmarked package surface
+
+The package is not just a wrapper API. It is the headless package surface of
+the Doe runtime, Fawn's Zig-first WebGPU implementation, and it is exercised as
+a measured package surface with explicit package lanes.
 
 <p align="center">
   <img src="assets/package-surface-cube-snapshot.svg" alt="Static package-surface benchmark cube snapshot" width="920" />
 </p>
 
-Package-surface benchmark evidence lives under `bench/out/cube/latest/`. Read
-those rows as package-surface positioning data, not as substitutes for strict
-backend-native claim lanes.
+`@simulatte/webgpu` is the headless package surface of the broader
+[Fawn](https://github.com/clocksmith/fawn) project. The same repository also
+carries the Doe runtime itself, benchmarking and verification tooling, and the
+separate `nursery/fawn-browser` Chromium/browser integration lane.
 
-## Quick start
+## Install
 
 ```bash
 npm install @simulatte/webgpu
 ```
 
+The install ships platform-specific prebuilds for macOS arm64 (Metal) and
+Linux x64 (Vulkan). If no prebuild matches your platform, the installer falls
+back to building the native addon with `node-gyp` only; it does not build or
+bundle `libwebgpu_doe` and the required Dawn sidecar for you. On unsupported
+platforms, use a local Fawn workspace build for those runtime libraries.
+
+## Choose a surface
+
+| Import                      | Surface               | Includes                                                         |
+| --------------------------- | --------------------- | ---------------------------------------------------------------- |
+| `@simulatte/webgpu`         | Default full surface  | Buffers, compute, textures, samplers, render, Doe API + routines |
+| `@simulatte/webgpu/compute` | Compute-first surface | Buffers, compute, copy/upload/readback, Doe API + routines       |
+| `@simulatte/webgpu/full`    | Explicit full surface | Same contract as the default package surface                     |
+
+Use `@simulatte/webgpu/compute` when you want the constrained package contract
+for AI workloads and other buffer/dispatch-heavy headless execution. The
+compute surface intentionally omits render and sampler methods from the JS
+facade.
+
+## Package basics
+
+### Inspect the provider
+
 ```js
-import { providerInfo, requestDevice } from "@simulatte/webgpu";
+import { providerInfo } from "@simulatte/webgpu";
 
 console.log(providerInfo());
-
-const device = await requestDevice();
-console.log(device.limits.maxBufferSize);
 ```
 
-The install ships platform-specific prebuilds for macOS arm64 (Metal) and
-Linux x64 (Vulkan). The commands are the same on both platforms; the correct
-backend is selected automatically. The only external prerequisite is GPU
-drivers on the host. If no prebuild matches your platform, install falls back
-to building from source via node-gyp.
-
-## Verify your install
+### Request a full device
 
-After installing, run the smoke test to confirm native library loading and a
-GPU round-trip:
+```js
+import { requestDevice } from "@simulatte/webgpu";
 
-```bash
-npm run smoke
+const device = await requestDevice();
+console.log(device.limits.maxBufferSize);
 ```
 
-To run the full contract test suite (adapter, device, buffers, compute
-dispatch with readback, textures, samplers):
+### Request a compute-only device
 
-```bash
-npm test         # Node
-npm run test:bun # Bun
+```js
+import { requestDevice } from "@simulatte/webgpu/compute";
+
+const device = await requestDevice();
+console.log(typeof device.createComputePipeline); // "function"
+console.log(typeof device.createRenderPipeline); // "undefined"
 ```
 
-If `npm run smoke` fails, check that GPU drivers are installed and that your
-platform is supported (macOS arm64 or Linux x64).
+## Doe layers
 
-## Building from source
+The package exposes three layers over the same runtime:
 
-Use this when working from the Fawn repo checkout or rebuilding the addon
-against a local Doe runtime build.
+<p align="center">
+  <img src="assets/package-layers.svg" alt="Layered package graph showing direct WebGPU, Doe API, and Doe routines over the same package surfaces." width="920" />
+</p>
 
-```bash
-# From the Fawn workspace root:
-cd zig && zig build dropin   # build libwebgpu_doe + Dawn sidecar
-
-cd nursery/webgpu
-npm run build:addon          # compile doe_napi.node from source
-npm run smoke                # verify native loading + GPU round-trip
-npm test                     # Node contract tests
-npm run test:bun             # Bun contract tests
-```
+- `Direct WebGPU`
+  raw `requestDevice()` plus direct `device.*`
+- `Doe API`
+  explicit Doe surface for lower-boilerplate buffer and compute flows
+- `Doe routines`
+  more opinionated Doe flows where the JS surface carries more of the operation
+
+Examples for each style ship in:
+
+- `examples/direct-webgpu/`
+- `examples/doe-api/`
+- `examples/doe-routines/`
+
+`doe` is the package's shared JS convenience surface over the Doe runtime. It is available
+from both `@simulatte/webgpu` and `@simulatte/webgpu/compute`.
+
+- `await doe.requestDevice()` gets a bound helper object in one step; use
+  `doe.bind(device)` when you already have a device.
+- `gpu.buffers.*`, `gpu.compute.run(...)`, and `gpu.compute.compile(...)` are
+  the main `Doe API` surface.
+- `gpu.compute.once(...)` is currently the first `Doe routines` path.
+
+The Doe API and Doe routines surface is the same on both package surfaces.
+The difference is the raw device beneath it:
+
+- `@simulatte/webgpu/compute` returns a compute-only facade
+- `@simulatte/webgpu` keeps the full headless device surface
+
+Binding access is inferred from Doe helper-created buffer usage when possible.
+For raw WebGPU buffers or non-bindable/ambiguous usage, pass
+`{ buffer, access }` explicitly.
+
+## Runtime notes
+
+`@simulatte/webgpu` is the canonical package surface for the Doe runtime. Node uses the
+addon-backed path. Bun uses a platform-dependent bridge today: Linux routes
+through the package FFI surface, while macOS currently uses the full
+addon-backed path for correctness parity. Current builds still ship a Dawn
+sidecar where proc resolution requires it.
+
+The Doe runtime is Fawn's Zig-first WebGPU implementation with explicit profile
+and quirk binding, a native WGSL pipeline (`lexer -> parser -> semantic
+analysis -> IR -> backend emitters`), and explicit Vulkan/Metal/D3D12
+execution paths in one system.
+Optional `-Dlean-verified=true` builds use Lean 4 as build-time proof support,
+not as a runtime interpreter. When a condition is proved ahead of time, the Doe
+runtime can remove that branch instead of re-checking it on every command;
+package consumers should not assume that path by default.
 
-Current macOS arm64 validation for `0.2.3` was rerun on March 10, 2026 with:
+## Verify your install
 
 ```bash
-cd zig && zig build dropin
-
-cd nursery/webgpu
-npm run build:addon
 npm run smoke
 npm test
 npm run test:bun
-npm run prebuild -- --skip-addon-build
-npm pack --dry-run
 ```
 
-That path is green on the Apple Metal host. `npm run test:bun` also passed on
-this host (`61 passed, 0 failed`) once Bun was added to `PATH`.
+`npm run smoke` checks native library loading and a GPU round-trip. `npm test`
+covers the Node package contract and a packed-tarball export/import check.
 
-For Fawn development setup, build toolchain requirements, and benchmark
-harness usage, see the [Fawn project README](../../README.md).
+## Caveats
 
-## Packaging prebuilds (CI / release)
+- This is a headless package, not a browser DOM/canvas package.
+- `@simulatte/webgpu/compute` is intentionally narrower than the default full
+  surface.
+- Bun currently uses a platform-dependent bridge layer under the same package
+  contract: FFI on Linux, full/addon-backed on macOS. Package-surface contract
+  tests are green, and package benchmark rows are positioning data rather than
+  the source of truth for strict backend-native Doe-vs-Dawn claims.
 
-```bash
-npm run prebuild   # assembles prebuilds/<platform>-<arch>/
-```
+## Further reading
 
-Supported prebuild targets: macOS arm64 (Metal), Linux x64 (Vulkan),
-Windows x64 (D3D12). Host GPU drivers are the only external prerequisite.
-Install uses prebuilds when available, falls back to node-gyp from source.
-Tracked `prebuilds/<platform>-<arch>/` directories are the source of truth for
-reproducible package publishes. If a prebuild exists only on one local machine
-and is not committed, `npm pack` output will differ by environment.
-Generated `.tgz` package archives are release outputs and should not be
-committed to the repo.
-Prebuild `metadata.json` now records `doeBuild.leanVerifiedBuild` and
-`proofArtifactSha256`, and `providerInfo()` surfaces the same values when
-metadata is present.
-
-Package publication still depends on the governed Linux Vulkan release lane in
-[`process.md`](../../process.md). A green macOS package rerun is necessary, but
-not sufficient, for a release publish.
-
-## Current caveats
-
-- This package is for headless benchmarking and CI workflows, not full browser
-  parity.
-- Node provider comparisons are host-local package/runtime evidence measured
-  with package-level timers. They are useful surface-positioning data, not
-  backend claim substantiation or a broad "the package is faster" claim.
-- `@simulatte/webgpu` does not yet have a single broad cross-surface speed
-  claim. Current performance evidence is split across Node package-surface
-  runs, prototype Bun package-surface runs, and workload-specific strict
-  backend reports.
-- Linux Node Doe-native path is now wired end-to-end (Linux guard removed).
-  No `DOE_WEBGPU_LIB` env var needed when prebuilds or workspace artifacts
-  are present.
-- Fresh macOS package evidence from March 10, 2026 is reflected in
-  `bench/out/cube/latest/` (generated `2026-03-10T20:31:02.431911Z`):
-  Bun `uploads`, `compute_e2e`, and `full_comparable` are `claimable`;
-  Node `uploads`, `compute_e2e`, and `full_comparable` are also `claimable`.
-- Separate Apple Metal extended-comparable backend evidence from March 10, 2026
-  (`bench/out/apple-metal/extended-comparable/20260310T121546Z/`) is
-  `31/31` comparable and `31/31` claimable. Read that lane as stricter
-  backend evidence, not as a replacement for the package-surface cube rows.
-- Bun has API parity with Node (61/61 contract tests). The package-default Bun
-  entry currently routes through the addon-backed runtime, while
-  `src/bun-ffi.js` remains experimental. Bun benchmark lane is at
-  `bench/bun/compare.js`; benchmark interpretations should note which runtime
-  entry was exercised. Latest fresh macOS run
-  (`bench/out/bun-doe-vs-webgpu/doe-vs-bun-webgpu-2026-03-10T195022524Z.json`)
-  executes all `12` current workloads and has `9` comparable rows, all `9`
-  claimable. `compute_e2e_{256,4096,65536}` and
-  `copy_buffer_to_buffer_4kb` are claimable in the full macOS package lane.
-  The remaining three rows are intentional directional-only workloads
-  (`submit_empty`, `pipeline_create`, `compute_dispatch_simple`).
-- Latest fresh macOS Node package run
-  (`bench/out/node-doe-vs-dawn-claim-full/doe-vs-dawn-node-2026-03-10T202406545Z.json`)
-  has `12` total rows, `9` comparable rows, and all `9` comparable rows are
-  claimable. `compute_e2e_{256,4096,65536}`, `copy_buffer_to_buffer_4kb`,
-  and the current upload set are claimable in the full package lane. The
-  remaining three rows are intentional directional-only workloads
-  (`submit_empty`, `pipeline_create`, `compute_dispatch_simple`).
-- Self-contained install ships prebuilt `doe_napi.node` + `libwebgpu_doe` +
-  Dawn sidecar per platform. See **Verify your install** above.
-- API details live in `API_CONTRACT.md`.
-- Compatibility scope is documented in `COMPAT_SCOPE.md`.
+- [API contract](./api-contract.md)
+- [Support contracts](./support-contracts.md)
+- [Compatibility scope](./compat-scope.md)
+- [Layering plan](./layering-plan.md)
+- [Headless WebGPU comparison](./headless-webgpu-comparison.md)
+- [Zig source inventory](./zig-source-inventory.md)
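The 0.3.0 changelog entry notes that Doe workgroup counts now accept a `number`, `[x, y]`, or `[x, y, z]`, and the new README's matmul example passes `[Math.ceil(N / 8), Math.ceil(M / 8)]`. A small sketch of what that flexibility implies, assuming every accepted shape widens to an `[x, y, z]` dispatch triple with missing axes defaulting to 1; `normalizeWorkgroups` is a hypothetical name for illustration, not the package's API:

```javascript
// Sketch: widen the workgroup shapes the changelog lists (number,
// [x, y], or [x, y, z]) into a full [x, y, z] dispatch triple, the
// form WebGPU's dispatchWorkgroups(x, y, z) ultimately consumes.
function normalizeWorkgroups(workgroups) {
  if (typeof workgroups === "number") {
    workgroups = [workgroups];
  }
  if (
    !Array.isArray(workgroups) ||
    workgroups.length < 1 ||
    workgroups.length > 3
  ) {
    throw new TypeError("expected a number, [x, y], or [x, y, z]");
  }
  const [x, y = 1, z = 1] = workgroups; // missing axes default to 1
  for (const n of [x, y, z]) {
    if (!Number.isInteger(n) || n < 1) {
      throw new RangeError("workgroup counts must be positive integers");
    }
  }
  return [x, y, z];
}
```

Under this reading, the matmul example's `[Math.ceil(N / 8), Math.ceil(M / 8)]` is shorthand for a `[32, 32, 1]` dispatch grid at `N = M = 256` with an 8×8 workgroup size.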