@simulatte/webgpu 0.2.4 → 0.3.0

Files changed (36)
  1. package/CHANGELOG.md +21 -0
  2. package/README.md +263 -71
  3. package/api-contract.md +70 -139
  4. package/assets/package-layers.svg +63 -0
  5. package/examples/direct-webgpu/compute-dispatch.js +66 -0
  6. package/examples/direct-webgpu/explicit-bind-group.js +85 -0
  7. package/examples/direct-webgpu/request-device.js +10 -0
  8. package/examples/doe-api/buffers-readback.js +9 -0
  9. package/examples/doe-api/compile-and-dispatch.js +30 -0
  10. package/examples/doe-api/compute-dispatch.js +25 -0
  11. package/examples/doe-routines/compute-once-like-input.js +36 -0
  12. package/examples/doe-routines/compute-once-matmul.js +53 -0
  13. package/examples/doe-routines/compute-once-multiple-inputs.js +27 -0
  14. package/examples/doe-routines/compute-once.js +23 -0
  15. package/headless-webgpu-comparison.md +2 -2
  16. package/layering-plan.md +1 -1
  17. package/native/doe_napi.c +102 -12
  18. package/package.json +2 -1
  19. package/prebuilds/darwin-arm64/doe_napi.node +0 -0
  20. package/prebuilds/darwin-arm64/libwebgpu_doe.dylib +0 -0
  21. package/prebuilds/darwin-arm64/metadata.json +6 -6
  22. package/prebuilds/linux-x64/doe_napi.node +0 -0
  23. package/prebuilds/linux-x64/libwebgpu_doe.so +0 -0
  24. package/prebuilds/linux-x64/metadata.json +5 -5
  25. package/scripts/generate-readme-assets.js +79 -6
  26. package/scripts/prebuild.js +23 -19
  27. package/src/auto_bind_group_layout.js +32 -0
  28. package/src/bun-ffi.js +93 -12
  29. package/src/bun.js +23 -2
  30. package/src/compute.d.ts +2 -1
  31. package/src/compute.js +671 -33
  32. package/src/doe.d.ts +127 -27
  33. package/src/doe.js +480 -114
  34. package/src/full.d.ts +8 -1
  35. package/src/full.js +28 -3
  36. package/src/index.js +1013 -38
package/CHANGELOG.md CHANGED
@@ -7,6 +7,27 @@ retrofitted from package version history and package-surface commits so the npm
  package has a conventional release history alongside the broader Fawn status
  and process documents.
 
+ ## [0.3.0] - 2026-03-11
+
+ ### Changed
+
+ - Breaking: redesigned the shared `doe` surface around `await
+   doe.requestDevice()`, grouped `gpu.buffers.*`, and grouped `gpu.compute.*`
+   instead of the earlier flat bound-helper methods.
+ - Added `gpu.buffers.like(...)` to cut buffer-allocation boilerplate and
+   `gpu.compute.once(...)` for the first `Doe routines` workflow.
+ - Doe helper token values now use camelCase (`storageRead`,
+   `storageReadWrite`), and Doe workgroups now accept `[x, y]` in addition to
+   `number` and `[x, y, z]`.
+ - `gpu.compute.once(...)` now rejects raw numeric WebGPU usage flags; use Doe
+   usage tokens there, or drop to `gpu.buffers.*` for explicit raw-flag control.
+ - Kept the same `doe` shape on `@simulatte/webgpu` and
+   `@simulatte/webgpu/compute`; the package split remains the underlying raw
+   device surface (`full` vs compute-only facade), not separate helper dialects.
+ - Updated the package README, API contract, and JSDoc guide to standardize the
+   `Direct WebGPU`, `Doe API`, and `Doe routines` model and the boundary
+   between the headless package lane and `nursery/fawn-browser`.
+
  ## [0.2.4] - 2026-03-11
 
  ### Changed
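The 0.3.0 entry above notes that Doe helper usage tokens moved from the 0.2.x kebab-case spellings (the old README used `"storage-readwrite"`) to camelCase (`storageRead`, `storageReadWrite`). A minimal migration sketch, not part of the package; the helper name and the exact set of legacy spellings are assumptions based only on the tokens this diff shows:

```javascript
// Hypothetical migration shim -- NOT shipped by @simulatte/webgpu.
// Maps the pre-0.3.0 kebab-case Doe usage tokens seen in the 0.2.x README
// to the camelCase tokens described in the 0.3.0 changelog entry.
const LEGACY_USAGE_TOKENS = {
  "storage-read": "storageRead",
  "storage-readwrite": "storageReadWrite",
};

function toCamelUsage(token) {
  // Non-string values (e.g. raw flag numbers) pass through untouched.
  if (typeof token !== "string") return token;
  return LEGACY_USAGE_TOKENS[token] ?? token;
}

console.log(toCamelUsage("storage-readwrite")); // "storageReadWrite"
console.log(toCamelUsage("storageRead")); // "storageRead" (already current)
```

Note that a generic kebab-to-camel regex would not work here: `"storage-readwrite"` is one hyphenated word in the old spelling, so only an explicit map can recover `storageReadWrite`.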
package/README.md CHANGED
@@ -1,13 +1,219 @@
  # @simulatte/webgpu
 
- Headless WebGPU for Node.js and Bun, powered by Doe.
+ <table>
+   <tr>
+     <td valign="middle">
+       <strong>Run real WebGPU workloads in Node.js and Bun with Doe, the WebGPU runtime from Fawn.</strong>
+     </td>
+     <td valign="middle">
+       <img src="assets/fawn-icon-main-256.png" alt="Fawn logo" width="88" />
+     </td>
+   </tr>
+ </table>
+
+ `@simulatte/webgpu` is Fawn's headless WebGPU package for Node.js and Bun: use
+ the raw WebGPU API through `requestDevice()` and `device.*`, or move up to the
+ Doe API + routines when you want the same runtime with less setup. Browser
+ DOM/canvas ownership lives in the separate `nursery/fawn-browser` lane.
+
+ Terminology in this README is deliberate:
+
+ - `Doe runtime` means the Zig/native WebGPU runtime underneath the package
+ - `Doe API` means the explicit JS convenience surface under `doe`,
+   `gpu.buffers.*`, `gpu.compute.run(...)`, and `gpu.compute.compile(...)`
+ - `Doe routines` means the narrower, more opinionated JS flows such as
+   `gpu.compute.once(...)`
+
+ ## Start here
+
+ ### Same workload, two layers
+
+ The same simple compute pass, shown first at the raw WebGPU layer and then at
+ the explicit Doe API layer.
+
+ #### 1. Direct WebGPU
+
+ ```js
+ import { globals, requestDevice } from "@simulatte/webgpu";
+
+ const device = await requestDevice();
+ const input = new Float32Array([1, 2, 3, 4]);
+ const bytes = input.byteLength;
+
+ const src = device.createBuffer({
+   size: bytes,
+   usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_DST,
+ });
+ device.queue.writeBuffer(src, 0, input);
+
+ const dst = device.createBuffer({
+   size: bytes,
+   usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_SRC,
+ });
+
+ const readback = device.createBuffer({
+   size: bytes,
+   usage: globals.GPUBufferUsage.COPY_DST | globals.GPUBufferUsage.MAP_READ,
+ });
+
+ const pipeline = device.createComputePipeline({
+   layout: "auto",
+   compute: {
+     module: device.createShaderModule({
+       code: `
+         @group(0) @binding(0) var<storage, read> src: array<f32>;
+         @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+         @compute @workgroup_size(4)
+         fn main(@builtin(global_invocation_id) gid: vec3u) {
+           let i = gid.x;
+           dst[i] = src[i] * 2.0;
+         }
+       `,
+     }),
+     entryPoint: "main",
+   },
+ });
+
+ const bindGroup = device.createBindGroup({
+   layout: pipeline.getBindGroupLayout(0),
+   entries: [
+     { binding: 0, resource: { buffer: src } },
+     { binding: 1, resource: { buffer: dst } },
+   ],
+ });
+
+ const encoder = device.createCommandEncoder();
+ const pass = encoder.beginComputePass();
+ pass.setPipeline(pipeline);
+ pass.setBindGroup(0, bindGroup);
+ pass.dispatchWorkgroups(1);
+ pass.end();
+ encoder.copyBufferToBuffer(dst, 0, readback, 0, bytes);
+
+ device.queue.submit([encoder.finish()]);
+ await device.queue.onSubmittedWorkDone();
+
+ await readback.mapAsync(globals.GPUMapMode.READ);
+ const result = new Float32Array(readback.getMappedRange().slice(0));
+ readback.unmap();
+
+ console.log(result); // Float32Array(4) [ 2, 4, 6, 8 ]
+ ```
+
+ #### 2. Doe API
+
+ Explicit Doe buffers and dispatch when you want less boilerplate but still
+ want to manage the resources yourself.
+
+ ```js
+ import { doe } from "@simulatte/webgpu/compute";
+
+ const gpu = await doe.requestDevice();
+ const src = gpu.buffers.fromData(Float32Array.of(1, 2, 3, 4));
+ const dst = gpu.buffers.like(src, {
+   usage: "storageReadWrite",
+ });
+
+ await gpu.compute.run({
+   code: `
+     @group(0) @binding(0) var<storage, read> src: array<f32>;
+     @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+     @compute @workgroup_size(4)
+     fn main(@builtin(global_invocation_id) gid: vec3u) {
+       let i = gid.x;
+       dst[i] = src[i] * 2.0;
+     }
+   `,
+   // Access is inferred from the Doe buffer usage above.
+   bindings: [src, dst],
+   workgroups: 1,
+ });
+
+ console.log(await gpu.buffers.read(dst, Float32Array)); // Float32Array(4) [ 2, 4, 6, 8 ]
+ ```
+
+ The package identity is simple:
+
+ - `requestDevice()` gives you real headless WebGPU
+ - `doe` gives you the same runtime with less boilerplate and explicit
+   resource control
+ - `compute.once(...)` is the more opinionated routines layer when you do not
+   want to manage buffers and readback yourself
+
+ #### 3. Doe routines: one-shot tensor matmul
+
+ This is where the routines layer starts to separate itself: you pass typed
+ arrays and an output spec, and the package handles upload, output allocation,
+ dispatch, and readback while the shader and tensor shapes stay explicit.
+
+ ```js
+ import { doe } from "@simulatte/webgpu/compute";
+
+ const gpu = await doe.requestDevice();
+ const [M, K, N] = [256, 512, 256];
+
+ const lhs = Float32Array.from({ length: M * K }, (_, i) => (i % 17) / 17);
+ const rhs = Float32Array.from({ length: K * N }, (_, i) => (i % 13) / 13);
+ const dims = new Uint32Array([M, K, N, 0]);
+
+ const result = await gpu.compute.once({
+   code: `
+     struct Dims {
+       m: u32,
+       k: u32,
+       n: u32,
+       _pad: u32,
+     };
+
+     @group(0) @binding(0) var<uniform> dims: Dims;
+     @group(0) @binding(1) var<storage, read> lhs: array<f32>;
+     @group(0) @binding(2) var<storage, read> rhs: array<f32>;
+     @group(0) @binding(3) var<storage, read_write> out: array<f32>;
+
+     @compute @workgroup_size(8, 8)
+     fn main(@builtin(global_invocation_id) gid: vec3u) {
+       let row = gid.y;
+       let col = gid.x;
+       if (row >= dims.m || col >= dims.n) {
+         return;
+       }
+
+       var acc = 0.0;
+       for (var i = 0u; i < dims.k; i = i + 1u) {
+         acc += lhs[row * dims.k + i] * rhs[i * dims.n + col];
+       }
+       out[row * dims.n + col] = acc;
+     }
+   `,
+   inputs: [
+     { data: dims, usage: "uniform", access: "uniform" },
+     lhs,
+     rhs,
+   ],
+   output: {
+     type: Float32Array,
+     size: M * N * Float32Array.BYTES_PER_ELEMENT,
+   },
+   workgroups: [Math.ceil(N / 8), Math.ceil(M / 8)],
+ });
+
+ console.log(result.subarray(0, 8)); // Float32Array(8) [ ... ]
+ ```
+
+ ### Benchmarked package surface
+
+ The package is not just a wrapper API. It is the headless package surface of
+ the Doe runtime, Fawn's Zig-first WebGPU implementation, and it is exercised
+ as a measured package surface with explicit package lanes.
 
  <p align="center">
-   <img src="assets/fawn-icon-main-256.png" alt="Fawn logo" width="196" />
+   <img src="assets/package-surface-cube-snapshot.svg" alt="Static package-surface benchmark cube snapshot" width="920" />
  </p>
 
- Use this package for compute, CI, benchmarking, and offscreen GPU execution.
- It is not a DOM/canvas package and it does not target browser-surface parity.
+ `@simulatte/webgpu` is the headless package surface of the broader
+ [Fawn](https://github.com/clocksmith/fawn) project. The same repository also
+ carries the Doe runtime itself, benchmarking and verification tooling, and the
+ separate `nursery/fawn-browser` Chromium/browser integration lane.
 
  ## Install
 
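The matmul routine above dispatches `[Math.ceil(N / 8), Math.ceil(M / 8)]` workgroups to cover an `M x N` output with a `@workgroup_size(8, 8)` shader. That ceiling-division step is package-independent and worth seeing on its own; a minimal sketch (the helper name is invented for illustration):

```javascript
// Hypothetical helper -- not part of the package API.
// Number of workgroups needed so that count * workgroupSize >= problemSize.
function workgroupCount(problemSize, workgroupSize) {
  return Math.ceil(problemSize / workgroupSize);
}

// Matches the README's matmul dispatch: x covers columns (N), y covers rows (M).
const [M, N] = [256, 256];
const dispatch = [workgroupCount(N, 8), workgroupCount(M, 8)];
console.log(dispatch); // [ 32, 32 ]
```

The over-dispatch for sizes that are not multiples of the workgroup size is exactly why the shader's `if (row >= dims.m || col >= dims.n) { return; }` guard exists.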
@@ -23,18 +229,18 @@ platforms, use a local Fawn workspace build for those runtime libraries.
 
  ## Choose a surface
 
- | Import | Surface | Includes |
- | --- | --- | --- |
- | `@simulatte/webgpu` | Default full surface | Buffers, compute, textures, samplers, render, Doe helpers |
- | `@simulatte/webgpu/compute` | Compute-first surface | Buffers, compute, copy/upload/readback, Doe helpers |
- | `@simulatte/webgpu/full` | Explicit full surface | Same contract as the default package surface |
+ | Import                      | Surface               | Includes                                                         |
+ | --------------------------- | --------------------- | ---------------------------------------------------------------- |
+ | `@simulatte/webgpu`         | Default full surface  | Buffers, compute, textures, samplers, render, Doe API + routines |
+ | `@simulatte/webgpu/compute` | Compute-first surface | Buffers, compute, copy/upload/readback, Doe API + routines       |
+ | `@simulatte/webgpu/full`    | Explicit full surface | Same contract as the default package surface                     |
 
  Use `@simulatte/webgpu/compute` when you want the constrained package contract
  for AI workloads and other buffer/dispatch-heavy headless execution. The
  compute surface intentionally omits render and sampler methods from the JS
  facade.
 
- ## Quick examples
+ ## Package basics
 
  ### Inspect the provider
 
@@ -60,77 +266,65 @@ import { requestDevice } from "@simulatte/webgpu/compute";
 
  const device = await requestDevice();
  console.log(typeof device.createComputePipeline); // "function"
- console.log(typeof device.createRenderPipeline);  // "undefined"
+ console.log(typeof device.createRenderPipeline); // "undefined"
  ```
 
- ### Run a small compute job with `doe`
+ ## Doe layers
 
- ```js
- import { doe, requestDevice } from "@simulatte/webgpu/compute";
+ The package exposes three layers over the same runtime:
 
- const gpu = doe.bind(await requestDevice());
+ <p align="center">
+   <img src="assets/package-layers.svg" alt="Layered package graph showing direct WebGPU, Doe API, and Doe routines over the same package surfaces." width="920" />
+ </p>
 
- const input = gpu.createBufferFromData(new Float32Array([1, 2, 3, 4]));
+ - `Direct WebGPU`
+   raw `requestDevice()` plus direct `device.*`
+ - `Doe API`
+   explicit Doe surface for lower-boilerplate buffer and compute flows
+ - `Doe routines`
+   more opinionated Doe flows where the JS surface carries more of the operation
 
- const output = gpu.createBuffer({
-   size: input.size,
-   usage: "storage-readwrite",
- });
+ Examples for each style ship in:
 
- await gpu.runCompute({
-   code: `
-     @group(0) @binding(0) var<storage, read> src: array<f32>;
-     @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+ - `examples/direct-webgpu/`
+ - `examples/doe-api/`
+ - `examples/doe-routines/`
 
-     @compute @workgroup_size(4)
-     fn main(@builtin(global_invocation_id) gid: vec3u) {
-       let i = gid.x;
-       dst[i] = src[i] * 2.0;
-     }
-   `,
-   bindings: [input, output],
-   workgroups: 1,
- });
+ `doe` is the package's shared JS convenience surface over the Doe runtime. It
+ is available from both `@simulatte/webgpu` and `@simulatte/webgpu/compute`.
 
- const result = await gpu.readBuffer(output, Float32Array);
- console.log(Array.from(result)); // [2, 4, 6, 8]
- ```
+ - `await doe.requestDevice()` gets a bound helper object in one step; use
+   `doe.bind(device)` when you already have a device.
+ - `gpu.buffers.*`, `gpu.compute.run(...)`, and `gpu.compute.compile(...)` are
+   the main `Doe API` surface.
+ - `gpu.compute.once(...)` is currently the first `Doe routines` path.
+
+ The Doe API and Doe routines surface is the same on both package surfaces.
+ The difference is the raw device beneath it:
+
+ - `@simulatte/webgpu/compute` returns a compute-only facade
+ - `@simulatte/webgpu` keeps the full headless device surface
 
- `doe` is available from both `@simulatte/webgpu` and
- `@simulatte/webgpu/compute`. It provides a small ergonomic layer for common
- headless tasks: `doe.bind(device)` for device-bound workflows, plus static
- buffer creation, readback, one-shot compute dispatch, and
- reusable compiled compute kernels.
  Binding access is inferred from Doe helper-created buffer usage when possible.
  For raw WebGPU buffers or non-bindable/ambiguous usage, pass
  `{ buffer, access }` explicitly.
 
- ## What this package is
-
- `@simulatte/webgpu` is the canonical package surface for Doe. Node uses an
- N-API addon and Bun currently routes through the same addon-backed runtime
- entry to load `libwebgpu_doe`. Current builds still ship a Dawn sidecar where
- proc resolution requires it.
+ ## Runtime notes
 
- Doe is a Zig-first WebGPU runtime with explicit profile and quirk binding, a
- native WGSL pipeline (`lexer -> parser -> semantic analysis -> IR -> backend
- emitters`), and explicit Vulkan/Metal/D3D12 execution paths in one system.
- Optional `-Dlean-verified=true` builds use Lean 4 where proved invariants can
- be hoisted out of runtime branches instead of being re-checked on every
- command; package consumers should not assume that path by default.
+ `@simulatte/webgpu` is the canonical package surface for the Doe runtime.
+ Node uses the addon-backed path. Bun uses a platform-dependent bridge today:
+ Linux routes through the package FFI surface, while macOS currently uses the
+ full addon-backed path for correctness parity. Current builds still ship a
+ Dawn sidecar where proc resolution requires it.
 
- ## Current scope
-
- - `@simulatte/webgpu` is the default full headless package surface.
- - `@simulatte/webgpu/compute` is the compute-first subset for AI workloads.
- - Node is the primary supported package surface.
- - Bun currently shares the addon-backed runtime entry with Node.
- - Package-surface comparisons should be read through the published repository
-   benchmark artifacts, not as a replacement for strict backend reports.
-
- <p align="center">
-   <img src="assets/package-surface-cube-snapshot.svg" alt="Static package-surface benchmark cube snapshot" width="920" />
- </p>
+ The Doe runtime is Fawn's Zig-first WebGPU implementation with explicit
+ profile and quirk binding, a native WGSL pipeline (`lexer -> parser ->
+ semantic analysis -> IR -> backend emitters`), and explicit Vulkan/Metal/D3D12
+ execution paths in one system.
+ Optional `-Dlean-verified=true` builds use Lean 4 as build-time proof support,
+ not as a runtime interpreter. When a condition is proved ahead of time, the
+ Doe runtime can remove that branch instead of re-checking it on every command;
+ package consumers should not assume that path by default.
 
  ## Verify your install
 
@@ -148,12 +342,10 @@ covers the Node package contract and a packed-tarball export/import check.
  - This is a headless package, not a browser DOM/canvas package.
  - `@simulatte/webgpu/compute` is intentionally narrower than the default full
    surface.
- - Bun currently shares the addon-backed runtime entry with Node. Package-surface
-   contract tests are green, and current comparable macOS package cells are
-   claimable. Any FFI-specific claims remain scoped to the experimental Bun FFI
-   path until separately revalidated.
- - Package-surface benchmark rows are positioning data; backend-native claim
-   lanes remain the source of truth for strict Doe-vs-Dawn claims.
+ - Bun currently uses a platform-dependent bridge layer under the same package
+   contract: FFI on Linux, full/addon-backed on macOS. Package-surface contract
+   tests are green, and package benchmark rows are positioning data rather than
+   the source of truth for strict backend-native Doe-vs-Dawn claims.
 
  ## Further reading
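The known-limitations hunk above describes Bun's platform-dependent bridge: FFI on Linux, the full addon-backed path on macOS. A minimal sketch of that selection, assuming a pure platform check; the function name and the `"ffi"`/`"addon"` labels are invented here for illustration, and the real logic lives in the package's `src/bun.js`:

```javascript
// Illustrative only -- NOT the package's actual implementation.
// Picks the Bun bridge described in the README: FFI on Linux,
// addon-backed everywhere else (including macOS, for correctness parity).
function bunBridgeFor(platform) {
  return platform === "linux" ? "ffi" : "addon";
}

console.log(bunBridgeFor("linux")); // "ffi"
console.log(bunBridgeFor("darwin")); // "addon"
```

In a real runtime the input would come from `process.platform`; taking it as a parameter keeps the sketch testable on any host.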