@simulatte/webgpu 0.2.3 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +47 -4
- package/README.md +273 -235
- package/api-contract.md +163 -0
- package/assets/fawn-icon-main-256.png +0 -0
- package/assets/package-layers.svg +63 -0
- package/assets/package-surface-cube-snapshot.svg +7 -7
- package/{COMPAT_SCOPE.md → compat-scope.md} +1 -1
- package/examples/direct-webgpu/compute-dispatch.js +66 -0
- package/examples/direct-webgpu/explicit-bind-group.js +85 -0
- package/examples/direct-webgpu/request-device.js +10 -0
- package/examples/doe-api/buffers-readback.js +9 -0
- package/examples/doe-api/compile-and-dispatch.js +30 -0
- package/examples/doe-api/compute-dispatch.js +25 -0
- package/examples/doe-routines/compute-once-like-input.js +36 -0
- package/examples/doe-routines/compute-once-matmul.js +53 -0
- package/examples/doe-routines/compute-once-multiple-inputs.js +27 -0
- package/examples/doe-routines/compute-once.js +23 -0
- package/headless-webgpu-comparison.md +2 -2
- package/{LAYERING_PLAN.md → layering-plan.md} +10 -8
- package/native/doe_napi.c +102 -12
- package/package.json +26 -9
- package/prebuilds/darwin-arm64/doe_napi.node +0 -0
- package/prebuilds/darwin-arm64/libwebgpu_doe.dylib +0 -0
- package/prebuilds/darwin-arm64/metadata.json +6 -6
- package/prebuilds/linux-x64/doe_napi.node +0 -0
- package/prebuilds/linux-x64/libwebgpu_doe.so +0 -0
- package/prebuilds/linux-x64/metadata.json +5 -5
- package/scripts/generate-readme-assets.js +81 -8
- package/scripts/prebuild.js +23 -19
- package/src/auto_bind_group_layout.js +32 -0
- package/src/bun-ffi.js +93 -12
- package/src/bun.js +23 -2
- package/src/compute.d.ts +162 -0
- package/src/compute.js +915 -0
- package/src/doe.d.ts +184 -0
- package/src/doe.js +641 -0
- package/src/full.d.ts +119 -0
- package/src/full.js +35 -0
- package/src/index.js +1013 -38
- package/src/node-runtime.js +2 -2
- package/src/node.js +2 -2
- package/{SUPPORT_CONTRACTS.md → support-contracts.md} +27 -41
- package/{ZIG_SOURCE_INVENTORY.md → zig-source-inventory.md} +2 -2
- package/API_CONTRACT.md +0 -182
package/CHANGELOG.md
CHANGED

…retrofitted from package version history and package-surface commits so the npm
package has a conventional release history alongside the broader Fawn status
and process documents.

## [0.3.0] - 2026-03-11

### Changed

- Breaking: redesigned the shared `doe` surface around `await
  doe.requestDevice()`, grouped `gpu.buffers.*`, and grouped
  `gpu.compute.*` instead of the earlier flat bound-helper methods.
- Added `gpu.buffers.like(...)` for buffer-allocation boilerplate reduction and
  `gpu.compute.once(...)` for the first `Doe routines` workflow.
- Doe helper token values now use camelCase (`storageRead`,
  `storageReadWrite`), and Doe workgroups now accept `[x, y]` in addition to
  `number` and `[x, y, z]`.
- `gpu.compute.once(...)` now rejects raw numeric WebGPU usage flags; use Doe
  usage tokens there, or drop to `gpu.buffers.*` for explicit raw-flag control.
- Kept the same `doe` shape on `@simulatte/webgpu` and
  `@simulatte/webgpu/compute`; the package split remains the underlying raw
  device surface (`full` vs compute-only facade), not separate helper dialects.
- Updated the package README, API contract, and JSDoc guide to standardize the
  `Direct WebGPU`, `Doe API`, and `Doe routines` model and the boundary
  between the headless package lane and `nursery/fawn-browser`.

## [0.2.4] - 2026-03-11

### Changed

- `doe.runCompute()` now infers binding access from Doe helper-created buffer
  usage and fails fast when a bare binding lacks Doe usage metadata or uses a
  non-bindable/ambiguous usage shape.
- Simplified the compute-surface README example to use inferred binding access
  (`bindings: [input, output]`) and the device-bound `doe.bind(await
  requestDevice())` flow directly.
- Clarified the install contract for non-prebuilt platforms: the `node-gyp`
  fallback only builds the native addon and does not bundle `libwebgpu_doe`
  plus the required Dawn sidecar.
- Aligned the published package docs and API contract with the current
  `@simulatte/webgpu`, `@simulatte/webgpu/compute`, and `@simulatte/webgpu/full`
  export surface.

## [0.2.3] - 2026-03-10

### Added

- macOS arm64 (Metal) prebuilds shipped alongside existing Linux x64 (Vulkan).
- "Verify your install" section with `npm run smoke` and `npm test` guidance.
- Added explicit package export surfaces for `@simulatte/webgpu` (default
  full) and `@simulatte/webgpu/compute`, plus the first `doe` ergonomic
  namespace for buffer/readback/compute helpers.
- Added `doe.bind(device)` so the ergonomic helper surface supports device-bound
  workflows in addition to static helper calls.

### Changed

- Restructured the package README around the default full surface,
  `@simulatte/webgpu/compute`, and the `doe` helper surface.
- `doe.runCompute()` now infers binding access from Doe helper-created buffer
  usage and fails fast for bare bindings that do not carry Doe usage metadata.
- Fixed broken README image links to use bundled asset paths instead of dead
  raw GitHub URLs.
- Root Fawn README now directs package users to the package README.
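The 0.3.0 workgroup-shape change above can be pictured with a small normalizer. This is a hypothetical sketch of the accepted shapes, not the package's internal code; `normalizeWorkgroups` is an illustrative name.

```javascript
// Hypothetical sketch: the workgroup shapes 0.3.0 accepts (a bare number,
// [x, y], or [x, y, z]), all normalized to a full [x, y, z] triple.
// Illustrative only; not the package's actual implementation.
function normalizeWorkgroups(workgroups) {
  if (typeof workgroups === "number") {
    return [workgroups, 1, 1];
  }
  if (Array.isArray(workgroups) && workgroups.length === 2) {
    return [workgroups[0], workgroups[1], 1];
  }
  if (Array.isArray(workgroups) && workgroups.length === 3) {
    return [...workgroups];
  }
  throw new TypeError("workgroups must be a number, [x, y], or [x, y, z]");
}
```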
package/README.md
CHANGED

# @simulatte/webgpu

<table>
  <tr>
    <td valign="middle">
      <strong>Run real WebGPU workloads in Node.js and Bun with Doe, the WebGPU runtime from Fawn.</strong>
    </td>
    <td valign="middle">
      <img src="assets/fawn-icon-main-256.png" alt="Fawn logo" width="88" />
    </td>
  </tr>
</table>

`@simulatte/webgpu` is Fawn's headless WebGPU package for Node.js and Bun: use
the raw WebGPU API through `requestDevice()` and `device.*`, or move up to the
Doe API + routines when you want the same runtime with less setup. Browser
DOM/canvas ownership lives in the separate `nursery/fawn-browser` lane.

Terminology in this README is deliberate:

- `Doe runtime` means the Zig/native WebGPU runtime underneath the package
- `Doe API` means the explicit JS convenience surface under `doe`,
  `gpu.buffers.*`, `gpu.compute.run(...)`, and `gpu.compute.compile(...)`
- `Doe routines` means the narrower, more opinionated JS flows such as
  `gpu.compute.once(...)`

## Start here

### Same workload, two layers

The same simple compute pass, shown first at the raw WebGPU layer and then at
the explicit Doe API layer.

#### 1. Direct WebGPU

```js
import { globals, requestDevice } from "@simulatte/webgpu";

const device = await requestDevice();
const input = new Float32Array([1, 2, 3, 4]);
const bytes = input.byteLength;

const src = device.createBuffer({
  size: bytes,
  usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(src, 0, input);

const dst = device.createBuffer({
  size: bytes,
  usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_SRC,
});

const readback = device.createBuffer({
  size: bytes,
  usage: globals.GPUBufferUsage.COPY_DST | globals.GPUBufferUsage.MAP_READ,
});

const pipeline = device.createComputePipeline({
  layout: "auto",
  compute: {
    module: device.createShaderModule({
      code: `
        @group(0) @binding(0) var<storage, read> src: array<f32>;
        @group(0) @binding(1) var<storage, read_write> dst: array<f32>;

        @compute @workgroup_size(4)
        fn main(@builtin(global_invocation_id) gid: vec3u) {
          let i = gid.x;
          dst[i] = src[i] * 2.0;
        }
      `,
    }),
    entryPoint: "main",
  },
});

const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: { buffer: src } },
    { binding: 1, resource: { buffer: dst } },
  ],
});

const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(1);
pass.end();
encoder.copyBufferToBuffer(dst, 0, readback, 0, bytes);

device.queue.submit([encoder.finish()]);
await device.queue.onSubmittedWorkDone();

await readback.mapAsync(globals.GPUMapMode.READ);
const result = new Float32Array(readback.getMappedRange().slice(0));
readback.unmap();

console.log(result); // Float32Array(4) [ 2, 4, 6, 8 ]
```

#### 2. Doe API

Explicit Doe buffers and dispatch when you want less boilerplate but still want
to manage the resources yourself.

```js
import { doe } from "@simulatte/webgpu/compute";

const gpu = await doe.requestDevice();
const src = gpu.buffers.fromData(Float32Array.of(1, 2, 3, 4));
const dst = gpu.buffers.like(src, {
  usage: "storageReadWrite",
});

await gpu.compute.run({
  code: `
    @group(0) @binding(0) var<storage, read> src: array<f32>;
    @group(0) @binding(1) var<storage, read_write> dst: array<f32>;

    @compute @workgroup_size(4)
    fn main(@builtin(global_invocation_id) gid: vec3u) {
      let i = gid.x;
      dst[i] = src[i] * 2.0;
    }
  `,
  // Access is inferred from the Doe buffer usage above.
  bindings: [src, dst],
  workgroups: 1,
});

console.log(await gpu.buffers.read(dst, Float32Array)); // Float32Array(4) [ 2, 4, 6, 8 ]
```
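The `gpu.buffers.like(src, { usage })` call above reuses the source buffer's allocation shape while overriding its usage token. A plain-object sketch of that "like" semantics follows; the `size`/`usage` field names are assumptions for illustration, since real Doe buffers are opaque handles.

```javascript
// Illustrative sketch of "like" semantics: copy the source descriptor and
// override only what the caller specifies. Field names are assumptions,
// not the package's actual buffer representation.
function likeDescriptor(src, overrides = {}) {
  return {
    size: src.size,   // same allocation size as the source buffer
    usage: src.usage, // same usage token unless overridden
    ...overrides,
  };
}

const src = { size: 16, usage: "storageRead" };
const dst = likeDescriptor(src, { usage: "storageReadWrite" });
// dst keeps src's size but carries the overridden usage token.
```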
The package identity is simple:

- `requestDevice()` gives you real headless WebGPU
- `doe` gives you the same runtime with less boilerplate and explicit resource
  control
- `compute.once(...)` is the more opinionated routines layer when you do not
  want to manage buffers and readback yourself

#### 3. Doe routines: one-shot tensor matmul

This is where the routines layer starts to separate itself: you pass typed
arrays and an output spec, and the package handles upload, output allocation,
dispatch, and readback while the shader and tensor shapes stay explicit.

```js
import { doe } from "@simulatte/webgpu/compute";

const gpu = await doe.requestDevice();
const [M, K, N] = [256, 512, 256];

const lhs = Float32Array.from({ length: M * K }, (_, i) => (i % 17) / 17);
const rhs = Float32Array.from({ length: K * N }, (_, i) => (i % 13) / 13);
const dims = new Uint32Array([M, K, N, 0]);

const result = await gpu.compute.once({
  code: `
    struct Dims {
      m: u32,
      k: u32,
      n: u32,
      _pad: u32,
    };

    @group(0) @binding(0) var<uniform> dims: Dims;
    @group(0) @binding(1) var<storage, read> lhs: array<f32>;
    @group(0) @binding(2) var<storage, read> rhs: array<f32>;
    @group(0) @binding(3) var<storage, read_write> out: array<f32>;

    @compute @workgroup_size(8, 8)
    fn main(@builtin(global_invocation_id) gid: vec3u) {
      let row = gid.y;
      let col = gid.x;
      if (row >= dims.m || col >= dims.n) {
        return;
      }

      var acc = 0.0;
      for (var i = 0u; i < dims.k; i = i + 1u) {
        acc += lhs[row * dims.k + i] * rhs[i * dims.n + col];
      }
      out[row * dims.n + col] = acc;
    }
  `,
  inputs: [
    { data: dims, usage: "uniform", access: "uniform" },
    lhs,
    rhs,
  ],
  output: {
    type: Float32Array,
    size: M * N * Float32Array.BYTES_PER_ELEMENT,
  },
  workgroups: [Math.ceil(N / 8), Math.ceil(M / 8)],
});

console.log(result.subarray(0, 8)); // Float32Array(8) [ ... ]
```
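A CPU reference for the same row-major `M×K · K×N` product is a cheap way to spot-check a few entries of a GPU result. This sketch assumes the same row-major layouts and index math as the WGSL shader above; `matmulCPU` is an illustrative helper, not part of the package.

```javascript
// Row-major CPU reference for C = A (m x k) * B (k x n), matching the
// index math used in the WGSL matmul shader: lhs[row*k + i] * rhs[i*n + col].
function matmulCPU(a, b, m, k, n) {
  const out = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      let acc = 0;
      for (let i = 0; i < k; i++) {
        acc += a[row * k + i] * b[i * n + col];
      }
      out[row * n + col] = acc;
    }
  }
  return out;
}

// Tiny sanity case: [[1, 2], [3, 4]] x [[5, 6], [7, 8]] -> [[19, 22], [43, 50]].
const c = matmulCPU(
  Float32Array.of(1, 2, 3, 4),
  Float32Array.of(5, 6, 7, 8),
  2, 2, 2,
);
```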
### Benchmarked package surface

The package is not just a wrapper API. It is the headless package surface of
the Doe runtime, Fawn's Zig-first WebGPU implementation, and it is exercised as
a measured package surface with explicit package lanes.

<p align="center">
  <img src="assets/package-surface-cube-snapshot.svg" alt="Static package-surface benchmark cube snapshot" width="920" />
</p>

`@simulatte/webgpu` is the headless package surface of the broader
[Fawn](https://github.com/clocksmith/fawn) project. The same repository also
carries the Doe runtime itself, benchmarking and verification tooling, and the
separate `nursery/fawn-browser` Chromium/browser integration lane.

## Install

```bash
npm install @simulatte/webgpu
```

The install ships platform-specific prebuilds for macOS arm64 (Metal) and
Linux x64 (Vulkan). If no prebuild matches your platform, the installer falls
back to building the native addon with `node-gyp` only; it does not build or
bundle `libwebgpu_doe` and the required Dawn sidecar for you. On unsupported
platforms, use a local Fawn workspace build for those runtime libraries.

## Choose a surface

| Import                      | Surface               | Includes                                                         |
| --------------------------- | --------------------- | ---------------------------------------------------------------- |
| `@simulatte/webgpu`         | Default full surface  | Buffers, compute, textures, samplers, render, Doe API + routines |
| `@simulatte/webgpu/compute` | Compute-first surface | Buffers, compute, copy/upload/readback, Doe API + routines       |
| `@simulatte/webgpu/full`    | Explicit full surface | Same contract as the default package surface                     |

Use `@simulatte/webgpu/compute` when you want the constrained package contract
for AI workloads and other buffer/dispatch-heavy headless execution. The
compute surface intentionally omits render and sampler methods from the JS
facade.

## Package basics

### Inspect the provider

```js
import { providerInfo } from "@simulatte/webgpu";

console.log(providerInfo());
```

### Request a full device

```js
import { requestDevice } from "@simulatte/webgpu";

const device = await requestDevice();
console.log(device.limits.maxBufferSize);
```

### Request a compute-only device

```js
import { requestDevice } from "@simulatte/webgpu/compute";

const device = await requestDevice();
console.log(typeof device.createComputePipeline); // "function"
console.log(typeof device.createRenderPipeline); // "undefined"
```
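One way to picture the compute-only facade is an allowlist over a full device. The sketch below is an assumption about the shape of the restriction, not the package's actual facade code; the method allowlist is illustrative.

```javascript
// Illustrative allowlist facade: expose only compute-relevant methods and
// leave render/sampler entry points undefined, mirroring the typeof checks above.
const COMPUTE_METHODS = [
  "createBuffer",
  "createShaderModule",
  "createComputePipeline",
  "createBindGroup",
  "createCommandEncoder",
];

function computeFacade(device) {
  const facade = {};
  for (const name of COMPUTE_METHODS) {
    if (typeof device[name] === "function") {
      facade[name] = device[name].bind(device);
    }
  }
  return facade;
}

// With a stand-in "full device", render methods do not survive the facade.
const fullDevice = { createComputePipeline() {}, createRenderPipeline() {} };
const facade = computeFacade(fullDevice);
```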
## Doe layers

The package exposes three layers over the same runtime:

<p align="center">
  <img src="assets/package-layers.svg" alt="Layered package graph showing direct WebGPU, Doe API, and Doe routines over the same package surfaces." width="920" />
</p>

- `Direct WebGPU`: raw `requestDevice()` plus direct `device.*`
- `Doe API`: explicit Doe surface for lower-boilerplate buffer and compute flows
- `Doe routines`: more opinionated Doe flows where the JS surface carries more
  of the operation

Examples for each style ship in:

- `examples/direct-webgpu/`
- `examples/doe-api/`
- `examples/doe-routines/`

`doe` is the package's shared JS convenience surface over the Doe runtime. It
is available from both `@simulatte/webgpu` and `@simulatte/webgpu/compute`.

- `await doe.requestDevice()` gets a bound helper object in one step; use
  `doe.bind(device)` when you already have a device.
- `gpu.buffers.*`, `gpu.compute.run(...)`, and `gpu.compute.compile(...)` are
  the main `Doe API` surface.
- `gpu.compute.once(...)` is currently the first `Doe routines` path.

The Doe API and Doe routines surface is the same on both package surfaces.
The difference is the raw device beneath it:

- `@simulatte/webgpu/compute` returns a compute-only facade
- `@simulatte/webgpu` keeps the full headless device surface

Binding access is inferred from Doe helper-created buffer usage when possible.
For raw WebGPU buffers or non-bindable/ambiguous usage, pass
`{ buffer, access }` explicitly.
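The inference rule described above can be sketched as a usage-token-to-access table. The token names follow the camelCase tokens used in this README, but the table and `inferAccess` helper are a hypothetical illustration of the documented behavior, not the package's implementation.

```javascript
// Hypothetical inference table: Doe usage tokens -> binding access.
// Non-bindable or ambiguous usage yields no inference, so the caller must
// pass { buffer, access } explicitly, as described above.
const ACCESS_BY_USAGE = {
  storageRead: "read",
  storageReadWrite: "read_write",
  uniform: "uniform",
};

function inferAccess(binding) {
  if (binding && binding.access) return binding.access; // explicit access wins
  const access = ACCESS_BY_USAGE[binding && binding.usage];
  if (!access) {
    throw new Error(`cannot infer access for usage: ${String(binding && binding.usage)}`);
  }
  return access;
}
```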
## Runtime notes

`@simulatte/webgpu` is the canonical package surface for the Doe runtime. Node
uses the addon-backed path. Bun uses a platform-dependent bridge today: Linux
routes through the package FFI surface, while macOS currently uses the full
addon-backed path for correctness parity. Current builds still ship a Dawn
sidecar where proc resolution requires it.

The Doe runtime is Fawn's Zig-first WebGPU implementation with explicit profile
and quirk binding, a native WGSL pipeline (`lexer -> parser -> semantic
analysis -> IR -> backend emitters`), and explicit Vulkan/Metal/D3D12
execution paths in one system.
Optional `-Dlean-verified=true` builds use Lean 4 as build-time proof support,
not as a runtime interpreter. When a condition is proved ahead of time, the Doe
runtime can remove that branch instead of re-checking it on every command;
package consumers should not assume that path by default.

## Verify your install

```bash
npm run smoke
npm test
npm run test:bun
```

`npm run smoke` checks native library loading and a GPU round-trip. `npm test`
covers the Node package contract and a packed-tarball export/import check.

## Caveats

- This is a headless package, not a browser DOM/canvas package.
- `@simulatte/webgpu/compute` is intentionally narrower than the default full
  surface.
- Bun currently uses a platform-dependent bridge layer under the same package
  contract: FFI on Linux, full/addon-backed on macOS. Package-surface contract
  tests are green, and package benchmark rows are positioning data rather than
  the source of truth for strict backend-native Doe-vs-Dawn claims.

## Further reading

- [API contract](./api-contract.md)
- [Support contracts](./support-contracts.md)
- [Compatibility scope](./compat-scope.md)
- [Layering plan](./layering-plan.md)
- [Headless WebGPU comparison](./headless-webgpu-comparison.md)
- [Zig source inventory](./zig-source-inventory.md)