@simulatte/webgpu 0.2.3 → 0.3.0

Files changed (44)
  1. package/CHANGELOG.md +47 -4
  2. package/README.md +273 -235
  3. package/api-contract.md +163 -0
  4. package/assets/fawn-icon-main-256.png +0 -0
  5. package/assets/package-layers.svg +63 -0
  6. package/assets/package-surface-cube-snapshot.svg +7 -7
  7. package/{COMPAT_SCOPE.md → compat-scope.md} +1 -1
  8. package/examples/direct-webgpu/compute-dispatch.js +66 -0
  9. package/examples/direct-webgpu/explicit-bind-group.js +85 -0
  10. package/examples/direct-webgpu/request-device.js +10 -0
  11. package/examples/doe-api/buffers-readback.js +9 -0
  12. package/examples/doe-api/compile-and-dispatch.js +30 -0
  13. package/examples/doe-api/compute-dispatch.js +25 -0
  14. package/examples/doe-routines/compute-once-like-input.js +36 -0
  15. package/examples/doe-routines/compute-once-matmul.js +53 -0
  16. package/examples/doe-routines/compute-once-multiple-inputs.js +27 -0
  17. package/examples/doe-routines/compute-once.js +23 -0
  18. package/headless-webgpu-comparison.md +2 -2
  19. package/{LAYERING_PLAN.md → layering-plan.md} +10 -8
  20. package/native/doe_napi.c +102 -12
  21. package/package.json +26 -9
  22. package/prebuilds/darwin-arm64/doe_napi.node +0 -0
  23. package/prebuilds/darwin-arm64/libwebgpu_doe.dylib +0 -0
  24. package/prebuilds/darwin-arm64/metadata.json +6 -6
  25. package/prebuilds/linux-x64/doe_napi.node +0 -0
  26. package/prebuilds/linux-x64/libwebgpu_doe.so +0 -0
  27. package/prebuilds/linux-x64/metadata.json +5 -5
  28. package/scripts/generate-readme-assets.js +81 -8
  29. package/scripts/prebuild.js +23 -19
  30. package/src/auto_bind_group_layout.js +32 -0
  31. package/src/bun-ffi.js +93 -12
  32. package/src/bun.js +23 -2
  33. package/src/compute.d.ts +162 -0
  34. package/src/compute.js +915 -0
  35. package/src/doe.d.ts +184 -0
  36. package/src/doe.js +641 -0
  37. package/src/full.d.ts +119 -0
  38. package/src/full.js +35 -0
  39. package/src/index.js +1013 -38
  40. package/src/node-runtime.js +2 -2
  41. package/src/node.js +2 -2
  42. package/{SUPPORT_CONTRACTS.md → support-contracts.md} +27 -41
  43. package/{ZIG_SOURCE_INVENTORY.md → zig-source-inventory.md} +2 -2
  44. package/API_CONTRACT.md +0 -182
package/api-contract.md
@@ -0,0 +1,163 @@
+ # @simulatte/webgpu API Contract
+
+ Contract version: `v1`
+
+ Scope: current headless WebGPU package contract for Node.js and Bun, with a
+ default `full` surface, an explicit `compute` subpath, and the Doe API / Doe
+ routines surface used by benchmarking, CI, and artifact-backed comparison
+ workflows.
+
+ Terminology in this contract is explicit:
+
+ - `Doe runtime`
+   the Zig/native WebGPU runtime underneath the package
+ - `Doe API`
+   the explicit JS convenience surface under `doe.bind(...)`, `gpu.buffers.*`,
+   `gpu.compute.run(...)`, and `gpu.compute.compile(...)`
+ - `Doe routines`
+   the narrower, more opinionated JS flows layered on that same runtime;
+   currently `gpu.compute.once(...)`
+
+ For the current `compute` vs `full` support split, see
+ [`./support-contracts.md`](./support-contracts.md).
+
+ Exact type and method shapes live in:
+
+ - [`./src/full.d.ts`](./src/full.d.ts)
+ - [`./src/compute.d.ts`](./src/compute.d.ts)
+ - [`./src/doe.d.ts`](./src/doe.d.ts)
+
+ This contract covers package-surface GPU access, provider metadata, and helper
+ entrypoints. It does not promise DOM/canvas ownership or browser-process
+ parity.
+
+ ## API styles
+
+ The current package surface is organized around three API styles:
+
+ - `Direct WebGPU`
+   raw `requestAdapter(...)`, `requestDevice(...)`, and direct `device.*` usage
+ - `Doe API`
+   the package's explicit JS convenience surface under `doe.bind(...)`,
+   `gpu.buffers.*`, `gpu.compute.run(...)`, and `gpu.compute.compile(...)`
+ - `Doe routines`
+   the package's more opinionated precomposed flows; currently
+   `gpu.compute.once(...)`
+
+ ## Export surfaces
+
+ ### `@simulatte/webgpu`
+
+ Default package surface.
+
+ Contract:
+
+ - headless `full` surface
+ - includes compute plus render/sampler/surface APIs already exposed by the Doe runtime package surface
+ - also exports the shared `doe` namespace for the Doe API and Doe routines surface
+
+ ### `@simulatte/webgpu/compute`
+
+ Compute-first package surface.
+
+ Contract:
+
+ - sized for AI workloads and other buffer/dispatch-heavy headless execution
+ - excludes render/sampler/surface methods from the public JS facade
+ - also exports the same `doe` namespace for the Doe API and Doe routines surface
+
+ ## Shared runtime API
+
+ Modules:
+
+ - `@simulatte/webgpu`
+ - `@simulatte/webgpu/compute`
+
+ ### Top-level package API
+
+ The exact signatures are defined in the `.d.ts` files above. At the contract
+ level:
+
+ - `create(...)` loads the Doe-native addon/runtime and returns a package-local
+   `GPU` object.
+ - `globals` exposes provider globals suitable for `Object.assign(...)` or
+   bootstrap wiring.
+ - `setupGlobals(...)` installs globals and `navigator.gpu` when missing.
+ - `requestAdapter(...)` and `requestDevice(...)` are the `Direct WebGPU` entry
+   points.
+
+ On `@simulatte/webgpu/compute`, the returned device is intentionally
+ compute-only:
+
+ - buffer / bind group / compute pipeline / command encoder / queue methods are available
+ - render / sampler / surface methods are intentionally absent from the facade
+
+ ### `providerInfo()`
+
+ Behavior:
+
+ - reports package-surface library provenance when prebuild metadata or Zig build
+   metadata is available
+ - does not guess: if metadata is unavailable, `leanVerifiedBuild` is `null`
+ - reports whether the Doe-native path is loaded and where build metadata came from
+
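The "does not guess" rule above can be sketched in plain JS. This is an illustrative model of the contract, not the package's internals; `resolveProviderInfo` and its parameter names are hypothetical.

```javascript
// Hypothetical sketch of the providerInfo() provenance rule: report real
// metadata when it exists, otherwise return null rather than inferring.
function resolveProviderInfo(prebuildMetadata, zigBuildMetadata) {
  const source = prebuildMetadata ?? zigBuildMetadata ?? null;
  if (source === null) {
    // No metadata available: stay null instead of guessing a value.
    return { leanVerifiedBuild: null, metadataSource: null };
  }
  return {
    leanVerifiedBuild: source.leanVerifiedBuild ?? null,
    metadataSource: prebuildMetadata ? "prebuild" : "zig-build",
  };
}

console.log(JSON.stringify(resolveProviderInfo(null, null)));
console.log(JSON.stringify(resolveProviderInfo({ leanVerifiedBuild: true }, null)));
```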
+ ### `doe`
+
+ Behavior:
+
+ - provides the `Doe API` and `Doe routines` surface for common headless
+   compute tasks
+ - the exported `doe` namespace is the JS convenience surface, distinct from
+   the underlying Doe runtime
+ - `requestDevice(options?)` resolves the package-local `requestDevice(...)` and returns
+   the bound helper object directly
+ - supports both static helper calls and `doe.bind(device)` for device-bound workflows
+ - helper methods are grouped under `buffers.*` and `compute.*`
+ - `buffers.*`, `compute.run(...)`, and `compute.compile(...)` are the main
+   `Doe API` surface
+ - `compute.once(...)` is the first `Doe routines` path and stays intentionally
+   narrow: typed-array/headless one-call execution, not a replacement for
+   explicit reusable resource ownership
+ - infers `compute.run(...).bindings` access from Doe helper-created buffer usage when that
+   usage maps to one bindable access mode (`uniform`, `storageRead`, `storageReadWrite`)
+ - `compute.once(...)` accepts Doe usage tokens only; raw numeric WebGPU usage flags stay on
+   the more explicit `Doe API` surface
+ - fails fast for bare bindings that do not carry Doe helper usage metadata or whose
+   usage is non-bindable/ambiguous; callers must pass `{ buffer, access }` explicitly
+ - additive only; it does not replace the raw WebGPU-facing package API
+
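The binding-access inference rule described above (one bindable access mode wins; anything ambiguous or non-bindable fails fast) can be sketched in plain JS. This is a minimal illustrative model of the contract text, not the package's source; `inferAccess` is a hypothetical name.

```javascript
// Only these Doe usage tokens map to a bindable access mode, per the contract.
const BINDABLE = new Set(["uniform", "storageRead", "storageReadWrite"]);

// Hypothetical sketch: infer an access mode from helper-created usage tokens
// only when exactly one bindable mode applies; otherwise fail fast instead of
// guessing (callers then pass { buffer, access } explicitly).
function inferAccess(usageTokens) {
  const bindable = [...new Set(usageTokens)].filter((t) => BINDABLE.has(t));
  if (bindable.length === 1) {
    return bindable[0];
  }
  throw new Error(`ambiguous or non-bindable usage: ${usageTokens.join(", ")}`);
}

console.log(inferAccess(["storageRead", "readback"])); // "storageRead"
```

A buffer created with `usage: ["storageRead", "readback"]` infers cleanly because only one token is bindable; a buffer carrying both `uniform` and `storageRead` would throw under this rule.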
+ ### `createDoeRuntime(options?)`
+
+ Behavior:
+
+ - returns the local Doe runtime/CLI wrapper used for command-stream execution
+   and benchmark orchestration from Node/Bun environments
+ - preserves explicit file-path ownership for the binary/library location rather
+   than hiding it behind package-only assumptions
+
+ ### `runDawnVsDoeCompare(options)`
+
+ Behavior:
+
+ - wraps `bench/compare_dawn_vs_doe.py`
+ - requires either `configPath` or `--config` in `extraArgs`
+
+ ## CLI contract
+
+ ### `fawn-webgpu-bench`
+
+ Purpose:
+
+ - execute Doe command-stream benchmark runs and emit trace artifacts.
+
+ ### `fawn-webgpu-compare`
+
+ Purpose:
+
+ - one-command Dawn-vs-Doe compare wrapper from Node tooling.
+
+ ## Non-goals in v1
+
+ 1. Full browser-parity WebGPU JS object model emulation.
+ 2. Browser presentation parity.
+ 3. npm `webgpu` drop-in compatibility guarantee.
Binary file (package/assets/fawn-icon-main-256.png)
package/assets/package-layers.svg
@@ -0,0 +1,63 @@
+ <!-- Generated by scripts/generate-readme-assets.js. Do not edit by hand. -->
+ <svg xmlns="http://www.w3.org/2000/svg" width="1200" height="470" viewBox="0 0 1200 470" role="img" aria-labelledby="layers-title layers-desc">
+   <title id="layers-title">@simulatte/webgpu layered package graph</title>
+   <desc id="layers-desc">Layered package graph showing direct WebGPU, Doe API, and Doe routines over the same package surfaces.</desc>
+   <defs>
+     <linearGradient id="layers-bg" x1="0%" y1="0%" x2="100%" y2="100%">
+       <stop offset="0%" stop-color="#050816"/>
+       <stop offset="100%" stop-color="#140c1f"/>
+     </linearGradient>
+     <radialGradient id="layers-glow-top" cx="25%" cy="18%" r="55%">
+       <stop offset="0%" stop-color="#ef444430"/>
+       <stop offset="55%" stop-color="#7c3aed18"/>
+       <stop offset="100%" stop-color="#00000000"/>
+     </radialGradient>
+     <radialGradient id="layers-glow-bottom" cx="78%" cy="84%" r="52%">
+       <stop offset="0%" stop-color="#f59e0b26"/>
+       <stop offset="60%" stop-color="#f9731618"/>
+       <stop offset="100%" stop-color="#00000000"/>
+     </radialGradient>
+     <linearGradient id="layers-root" x1="0%" y1="0%" x2="100%" y2="100%">
+       <stop offset="0%" stop-color="#7c3aed"/>
+       <stop offset="100%" stop-color="#ef4444"/>
+     </linearGradient>
+     <linearGradient id="layers-direct" x1="0%" y1="0%" x2="100%" y2="100%">
+       <stop offset="0%" stop-color="#ef4444"/>
+       <stop offset="100%" stop-color="#f97316"/>
+     </linearGradient>
+     <linearGradient id="layers-api" x1="0%" y1="0%" x2="100%" y2="100%">
+       <stop offset="0%" stop-color="#f97316"/>
+       <stop offset="100%" stop-color="#f59e0b"/>
+     </linearGradient>
+     <linearGradient id="layers-routines" x1="0%" y1="0%" x2="100%" y2="100%">
+       <stop offset="0%" stop-color="#f59e0b"/>
+       <stop offset="100%" stop-color="#eab308"/>
+     </linearGradient>
+     <filter id="shadow" x="-20%" y="-20%" width="140%" height="140%">
+       <feDropShadow dx="0" dy="10" stdDeviation="14" flood-color="#000000" flood-opacity="0.32"/>
+     </filter>
+     <style>
+       .title { font: 700 34px "Segoe UI", "Helvetica Neue", Arial, sans-serif; fill: #ffffff; paint-order: stroke fill; stroke: #000000; stroke-width: 2px; stroke-linejoin: round; }
+       .subtitle { font: 500 18px "Segoe UI", "Helvetica Neue", Arial, sans-serif; fill: #cbd5e1; paint-order: stroke fill; stroke: #000000; stroke-width: 2px; stroke-linejoin: round; }
+       .nodeTitle { font: 700 22px "Segoe UI", "Helvetica Neue", Arial, sans-serif; fill: #ffffff; paint-order: stroke fill; stroke: #000000; stroke-width: 2px; stroke-linejoin: round; }
+       .box { stroke-width: 2.5; filter: url(#shadow); }
+     </style>
+   </defs>
+   <rect width="1200" height="470" fill="url(#layers-bg)"/>
+   <rect width="1200" height="470" fill="url(#layers-glow-top)"/>
+   <rect width="1200" height="470" fill="url(#layers-glow-bottom)"/>
+   <text x="64" y="62" class="title">Same package, four layers</text>
+   <text x="64" y="94" class="subtitle">The package surface stays the same while the API gets progressively higher-level.</text>
+
+   <rect x="170" y="122" width="860" height="64" rx="20" fill="url(#layers-root)" stroke="#c4b5fd" class="box"/>
+   <text x="600" y="162" text-anchor="middle" class="nodeTitle">@simulatte/webgpu / @simulatte/webgpu/compute</text>
+
+   <rect x="220" y="222" width="760" height="52" rx="18" fill="url(#layers-direct)" stroke="#fca5a5" class="box"/>
+   <text x="600" y="255" text-anchor="middle" class="nodeTitle">Direct WebGPU</text>
+
+   <rect x="280" y="310" width="640" height="52" rx="18" fill="url(#layers-api)" stroke="#fdba74" class="box"/>
+   <text x="600" y="343" text-anchor="middle" class="nodeTitle">Doe API</text>
+
+   <rect x="360" y="398" width="480" height="52" rx="18" fill="url(#layers-routines)" stroke="#fde68a" class="box"/>
+   <text x="600" y="431" text-anchor="middle" class="nodeTitle">Doe routines</text>
+ </svg>
package/assets/package-surface-cube-snapshot.svg
@@ -60,22 +60,22 @@

  <rect x="640" y="176" width="488" height="318" rx="24" class="panel toneRight"/>
  <text x="668" y="216" class="cardTitle">Bun package lane</text>
- <text x="668" y="244" class="cardMeta">Prototype support | linux_x64</text>
- <text x="668" y="266" class="cardMeta">latest populated cell 2026-03-06T21:55:26.482Z</text>
+ <text x="668" y="244" class="cardMeta">Validated support | mac_apple_silicon</text>
+ <text x="668" y="266" class="cardMeta">latest populated cell 2026-03-10T19:50:22.523Z</text>

  <rect x="658" y="300" width="452" height="82" rx="16" class="metric toneRight"/>
  <text x="682" y="331" class="metricTitle">Compute E2E</text>
  <rect x="954" y="315" width="132" height="28" rx="14" fill="#16a34a" stroke="#86efac" stroke-width="1.5"/>
  <text x="1020" y="334" text-anchor="middle" class="pillText">CLAIMABLE</text>
  <text x="682" y="357" class="metricBody">3 rows | claimable</text>
- <text x="682" y="377" class="metricBody">median p50 delta +77.2%</text>
+ <text x="682" y="377" class="metricBody">median p50 delta +53.1%</text>

  <rect x="658" y="396" width="452" height="82" rx="16" class="metric toneRight"/>
  <text x="682" y="427" class="metricTitle">Uploads</text>
- <rect x="954" y="411" width="132" height="28" rx="14" fill="#d97706" stroke="#fbbf24" stroke-width="1.5"/>
- <text x="1020" y="430" text-anchor="middle" class="pillText">COMPARABLE</text>
- <text x="682" y="453" class="metricBody">5 rows | comparable</text>
- <text x="682" y="473" class="metricBody">median p50 delta +8.6%</text>
+ <rect x="954" y="411" width="132" height="28" rx="14" fill="#16a34a" stroke="#86efac" stroke-width="1.5"/>
+ <text x="1020" y="430" text-anchor="middle" class="pillText">CLAIMABLE</text>
+ <text x="682" y="453" class="metricBody">5 rows | claimable</text>
+ <text x="682" y="473" class="metricBody">median p50 delta +67.8%</text>
  <text x="72" y="590" class="foot">Generated by nursery/webgpu/scripts/generate-readme-assets.js.</text>
  <text x="72" y="612" class="foot">Static claim and comparability card from the package-surface cube. It is not a substitute for strict backend reports.</text>
  </svg>
package/compat-scope.md
@@ -43,4 +43,4 @@ Layering note:

  - this file describes the current package surface and its present non-goals
  - proposed future `core` vs `full` support contracts are defined separately in
-   `SUPPORT_CONTRACTS.md`
+   [`./support-contracts.md`](./support-contracts.md)
package/examples/direct-webgpu/compute-dispatch.js
@@ -0,0 +1,66 @@
+ import { globals, requestDevice } from "@simulatte/webgpu";
+
+ const device = await requestDevice();
+
+ const input = new Float32Array([1, 2, 3, 4]);
+ const inputBuffer = device.createBuffer({
+   size: input.byteLength,
+   usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_DST,
+ });
+ device.queue.writeBuffer(inputBuffer, 0, input);
+
+ const outputBuffer = device.createBuffer({
+   size: input.byteLength,
+   usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_SRC,
+ });
+
+ const readbackBuffer = device.createBuffer({
+   size: input.byteLength,
+   usage: globals.GPUBufferUsage.COPY_DST | globals.GPUBufferUsage.MAP_READ,
+ });
+
+ const shader = device.createShaderModule({
+   code: `
+     @group(0) @binding(0) var<storage, read> src: array<f32>;
+     @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+     @compute @workgroup_size(4)
+     fn main(@builtin(global_invocation_id) gid: vec3u) {
+       let i = gid.x;
+       dst[i] = src[i] * 2.0;
+     }
+   `,
+ });
+
+ const pipeline = device.createComputePipeline({
+   layout: "auto",
+   compute: {
+     module: shader,
+     entryPoint: "main",
+   },
+ });
+
+ const bindGroup = device.createBindGroup({
+   layout: pipeline.getBindGroupLayout(0),
+   entries: [
+     { binding: 0, resource: { buffer: inputBuffer } },
+     { binding: 1, resource: { buffer: outputBuffer } },
+   ],
+ });
+
+ const encoder = device.createCommandEncoder();
+ const pass = encoder.beginComputePass();
+ pass.setPipeline(pipeline);
+ pass.setBindGroup(0, bindGroup);
+ pass.dispatchWorkgroups(1);
+ pass.end();
+ encoder.copyBufferToBuffer(outputBuffer, 0, readbackBuffer, 0, input.byteLength);
+
+ device.queue.submit([encoder.finish()]);
+ await device.queue.onSubmittedWorkDone();
+
+ await readbackBuffer.mapAsync(globals.GPUMapMode.READ);
+ const result = new Float32Array(readbackBuffer.getMappedRange().slice(0));
+ readbackBuffer.unmap();
+
+ console.log(JSON.stringify(Array.from(result)));
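The example above can hardcode `dispatchWorkgroups(1)` because its 4 elements exactly fill one `@workgroup_size(4)` workgroup. For arbitrary lengths the dispatch count is the ceiling division of element count by workgroup size; `workgroupCount` below is an illustrative helper, not part of the package.

```javascript
// Hypothetical helper: how many workgroups to dispatch so every element is
// covered when the shader runs `workgroupSize` invocations per workgroup.
function workgroupCount(elementCount, workgroupSize) {
  return Math.ceil(elementCount / workgroupSize);
}

console.log(workgroupCount(4, 4));    // 1, as in the example above
console.log(workgroupCount(1000, 64)); // 16
```

When the element count is not a multiple of the workgroup size, the rounded-up dispatch overshoots, so the WGSL entry point then needs a bounds check (`if (i >= n) { return; }`) like the matmul example further down uses.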
package/examples/direct-webgpu/explicit-bind-group.js
@@ -0,0 +1,85 @@
+ import { globals, requestDevice } from "@simulatte/webgpu";
+
+ const device = await requestDevice();
+
+ const input = new Float32Array([1, 2, 3, 4]);
+ const inputBuffer = device.createBuffer({
+   size: input.byteLength,
+   usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_DST,
+ });
+ device.queue.writeBuffer(inputBuffer, 0, input);
+
+ const outputBuffer = device.createBuffer({
+   size: input.byteLength,
+   usage: globals.GPUBufferUsage.STORAGE | globals.GPUBufferUsage.COPY_SRC,
+ });
+
+ const readbackBuffer = device.createBuffer({
+   size: input.byteLength,
+   usage: globals.GPUBufferUsage.COPY_DST | globals.GPUBufferUsage.MAP_READ,
+ });
+
+ const shader = device.createShaderModule({
+   code: `
+     @group(0) @binding(0) var<storage, read> src: array<f32>;
+     @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+     @compute @workgroup_size(4)
+     fn main(@builtin(global_invocation_id) gid: vec3u) {
+       let i = gid.x;
+       dst[i] = src[i] * 4.0;
+     }
+   `,
+ });
+
+ const bindGroupLayout = device.createBindGroupLayout({
+   entries: [
+     {
+       binding: 0,
+       visibility: globals.GPUShaderStage.COMPUTE,
+       buffer: { type: "read-only-storage" },
+     },
+     {
+       binding: 1,
+       visibility: globals.GPUShaderStage.COMPUTE,
+       buffer: { type: "storage" },
+     },
+   ],
+ });
+
+ const pipelineLayout = device.createPipelineLayout({
+   bindGroupLayouts: [bindGroupLayout],
+ });
+
+ const pipeline = device.createComputePipeline({
+   layout: pipelineLayout,
+   compute: {
+     module: shader,
+     entryPoint: "main",
+   },
+ });
+
+ const bindGroup = device.createBindGroup({
+   layout: bindGroupLayout,
+   entries: [
+     { binding: 0, resource: { buffer: inputBuffer } },
+     { binding: 1, resource: { buffer: outputBuffer } },
+   ],
+ });
+
+ const encoder = device.createCommandEncoder();
+ const pass = encoder.beginComputePass();
+ pass.setPipeline(pipeline);
+ pass.setBindGroup(0, bindGroup);
+ pass.dispatchWorkgroups(1);
+ pass.end();
+ encoder.copyBufferToBuffer(outputBuffer, 0, readbackBuffer, 0, input.byteLength);
+
+ device.queue.submit([encoder.finish()]);
+ await device.queue.onSubmittedWorkDone();
+
+ await readbackBuffer.mapAsync(globals.GPUMapMode.READ);
+ const result = new Float32Array(readbackBuffer.getMappedRange().slice(0));
+ readbackBuffer.unmap();
+
+ console.log(JSON.stringify(Array.from(result)));
package/examples/direct-webgpu/request-device.js
@@ -0,0 +1,10 @@
+ import { requestDevice } from "@simulatte/webgpu";
+
+ const device = await requestDevice();
+
+ console.log(JSON.stringify({
+   createBuffer: typeof device.createBuffer === "function",
+   createComputePipeline: typeof device.createComputePipeline === "function",
+   createRenderPipeline: typeof device.createRenderPipeline === "function",
+   writeBuffer: typeof device.queue?.writeBuffer === "function",
+ }));
package/examples/doe-api/buffers-readback.js
@@ -0,0 +1,9 @@
+ import { doe } from "@simulatte/webgpu/compute";
+
+ const gpu = await doe.requestDevice();
+ const src = gpu.buffers.fromData(new Float32Array([1, 2, 3, 4]), {
+   usage: ["storageRead", "readback"],
+ });
+
+ const result = await gpu.buffers.read(src, Float32Array);
+ console.log(JSON.stringify(Array.from(result)));
package/examples/doe-api/compile-and-dispatch.js
@@ -0,0 +1,30 @@
+ import { doe } from "@simulatte/webgpu/compute";
+
+ const gpu = await doe.requestDevice();
+ const src = gpu.buffers.fromData(new Float32Array([1, 2, 3, 4]));
+ const dst = gpu.buffers.like(src, {
+   usage: "storageReadWrite",
+ });
+
+ const kernel = gpu.compute.compile({
+   code: `
+     @group(0) @binding(0) var<storage, read> src: array<f32>;
+     @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+     @compute @workgroup_size(4)
+     fn main(@builtin(global_invocation_id) gid: vec3u) {
+       let i = gid.x;
+       dst[i] = src[i] * 5.0;
+     }
+   `,
+   bindings: [src, dst],
+   workgroups: 1,
+ });
+
+ await kernel.dispatch({
+   bindings: [src, dst],
+   workgroups: 1,
+ });
+
+ const result = await gpu.buffers.read(dst, Float32Array);
+ console.log(JSON.stringify(Array.from(result)));
package/examples/doe-api/compute-dispatch.js
@@ -0,0 +1,25 @@
+ import { doe } from "@simulatte/webgpu/compute";
+
+ const gpu = await doe.requestDevice();
+ const src = gpu.buffers.fromData(new Float32Array([1, 2, 3, 4]));
+ const dst = gpu.buffers.like(src, {
+   usage: "storageReadWrite",
+ });
+
+ await gpu.compute.run({
+   code: `
+     @group(0) @binding(0) var<storage, read> src: array<f32>;
+     @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+     @compute @workgroup_size(4)
+     fn main(@builtin(global_invocation_id) gid: vec3u) {
+       let i = gid.x;
+       dst[i] = src[i] * 2.0;
+     }
+   `,
+   bindings: [src, dst],
+   workgroups: 1,
+ });
+
+ const result = await gpu.buffers.read(dst, Float32Array);
+ console.log(JSON.stringify(Array.from(result)));
package/examples/doe-routines/compute-once-like-input.js
@@ -0,0 +1,36 @@
+ import { doe } from "@simulatte/webgpu/compute";
+
+ const gpu = await doe.requestDevice();
+
+ const result = await gpu.compute.once({
+   code: `
+     struct Scale {
+       value: f32,
+     };
+
+     @group(0) @binding(0) var<uniform> scale: Scale;
+     @group(0) @binding(1) var<storage, read> src: array<f32>;
+     @group(0) @binding(2) var<storage, read_write> dst: array<f32>;
+
+     @compute @workgroup_size(4)
+     fn main(@builtin(global_invocation_id) gid: vec3u) {
+       let i = gid.x;
+       dst[i] = src[i] * scale.value;
+     }
+   `,
+   inputs: [
+     {
+       data: new Float32Array([2]),
+       usage: "uniform",
+       access: "uniform",
+     },
+     new Float32Array([1, 2, 3, 4]),
+   ],
+   output: {
+     type: Float32Array,
+     likeInput: 1,
+   },
+   workgroups: [1, 1],
+ });
+
+ console.log(JSON.stringify(Array.from(result)));
package/examples/doe-routines/compute-once-matmul.js
@@ -0,0 +1,53 @@
+ import { doe } from "@simulatte/webgpu/compute";
+
+ const gpu = await doe.requestDevice();
+ const M = 256;
+ const K = 512;
+ const N = 256;
+
+ const lhs = Float32Array.from({ length: M * K }, (_, i) => (i % 17) / 17);
+ const rhs = Float32Array.from({ length: K * N }, (_, i) => (i % 13) / 13);
+ const dims = new Uint32Array([M, K, N, 0]);
+
+ const result = await gpu.compute.once({
+   code: `
+     struct Dims {
+       m: u32,
+       k: u32,
+       n: u32,
+       _pad: u32,
+     };
+
+     @group(0) @binding(0) var<uniform> dims: Dims;
+     @group(0) @binding(1) var<storage, read> lhs: array<f32>;
+     @group(0) @binding(2) var<storage, read> rhs: array<f32>;
+     @group(0) @binding(3) var<storage, read_write> out: array<f32>;
+
+     @compute @workgroup_size(8, 8)
+     fn main(@builtin(global_invocation_id) gid: vec3u) {
+       let row = gid.y;
+       let col = gid.x;
+       if (row >= dims.m || col >= dims.n) {
+         return;
+       }
+
+       var acc = 0.0;
+       for (var i = 0u; i < dims.k; i = i + 1u) {
+         acc += lhs[row * dims.k + i] * rhs[i * dims.n + col];
+       }
+       out[row * dims.n + col] = acc;
+     }
+   `,
+   inputs: [
+     { data: dims, usage: "uniform", access: "uniform" },
+     lhs,
+     rhs,
+   ],
+   output: {
+     type: Float32Array,
+     size: M * N * Float32Array.BYTES_PER_ELEMENT,
+   },
+   workgroups: [Math.ceil(N / 8), Math.ceil(M / 8)],
+ });
+
+ console.log(result.subarray(0, 8));
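The WGSL indexing above (`out[row * n + col]` as the dot product of an `lhs` row and an `rhs` column) can be spot-checked against a plain-JS reference on small shapes. `cpuMatmul` is illustrative only, not part of the package API.

```javascript
// Plain-JS reference matmul mirroring the WGSL loop: row-major lhs (m x k),
// row-major rhs (k x n), row-major out (m x n).
function cpuMatmul(lhs, rhs, m, k, n) {
  const out = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      let acc = 0;
      for (let i = 0; i < k; i++) {
        acc += lhs[row * k + i] * rhs[i * n + col];
      }
      out[row * n + col] = acc;
    }
  }
  return out;
}

// Multiplying by the 2x2 identity returns the other matrix unchanged.
const eye = new Float32Array([1, 0, 0, 1]);
const mat = new Float32Array([5, 6, 7, 8]);
console.log(Array.from(cpuMatmul(eye, mat, 2, 2, 2))); // [5, 6, 7, 8]
```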
package/examples/doe-routines/compute-once-multiple-inputs.js
@@ -0,0 +1,27 @@
+ import { doe } from "@simulatte/webgpu/compute";
+
+ const gpu = await doe.requestDevice();
+
+ const result = await gpu.compute.once({
+   code: `
+     @group(0) @binding(0) var<storage, read> lhs: array<f32>;
+     @group(0) @binding(1) var<storage, read> rhs: array<f32>;
+     @group(0) @binding(2) var<storage, read_write> dst: array<f32>;
+
+     @compute @workgroup_size(4)
+     fn main(@builtin(global_invocation_id) gid: vec3u) {
+       let i = gid.x;
+       dst[i] = lhs[i] + rhs[i];
+     }
+   `,
+   inputs: [
+     new Float32Array([1, 2, 3, 4]),
+     new Float32Array([10, 20, 30, 40]),
+   ],
+   output: {
+     type: Float32Array,
+   },
+   workgroups: 1,
+ });
+
+ console.log(JSON.stringify(Array.from(result)));
package/examples/doe-routines/compute-once.js
@@ -0,0 +1,23 @@
+ import { doe } from "@simulatte/webgpu/compute";
+
+ const gpu = await doe.requestDevice();
+
+ const result = await gpu.compute.once({
+   code: `
+     @group(0) @binding(0) var<storage, read> src: array<f32>;
+     @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
+
+     @compute @workgroup_size(4)
+     fn main(@builtin(global_invocation_id) gid: vec3u) {
+       let i = gid.x;
+       dst[i] = src[i] * 3.0;
+     }
+   `,
+   inputs: [new Float32Array([1, 2, 3, 4])],
+   output: {
+     type: Float32Array,
+   },
+   workgroups: 1,
+ });
+
+ console.log(JSON.stringify(Array.from(result)));
package/headless-webgpu-comparison.md
@@ -7,7 +7,7 @@ This document outlines qualitative differences and target use-cases for headless
  | **Underlying Engine** | `libwebgpu_doe` (Zig + Lean pipeline) | Google Dawn (C++) | Google Dawn (C++) |
  | **Primary Focus** | Deterministic Compute, ML/AI, Verifiability | Browser Parity, Graphics | Browser Parity, Graphics |
  | **Binary Footprint** | Smaller targeted runtime expected | Varies by build/distribution | Varies by build/distribution |
- | **JS Binding Layer** | Node-API (N-API); experimental Bun FFI implementation also exists | Node-API (N-API) | Bun FFI (Fast Foreign Function) |
+ | **JS Binding Layer** | Node addon-backed path; Bun uses FFI on Linux and full/addon-backed path on macOS today | Node-API (N-API) | Bun FFI (Fast Foreign Function) |
  | **Security Model** | Explicit schema/gate discipline in Fawn pipeline | Runtime heuristics + Dawn validation | Runtime heuristics + Dawn validation |
  | **Resource Allocation** | Arena-backed, predictable memory | General WebGPU async allocations | General WebGPU async allocations |
  | **WebGPU Spec Compliance** | Compute-prioritized subset target | Broad Chromium-aligned coverage | Broad Chromium-aligned coverage |
@@ -17,7 +17,7 @@ This document outlines qualitative differences and target use-cases for headless
  ## Architectural Takeaways for Fawn

  1. Determinism and fail-fast contracts are the intended Doe value proposition for benchmarking workflows.
- 2. The package currently defaults Bun to the addon-backed runtime for correctness parity. The separate Bun FFI path may reduce wrapper overhead later, but end-to-end results must be measured per workload.
+ 2. The package currently uses a platform-split Bun path: Linux uses the direct FFI route, while macOS uses the addon-backed full path for correctness parity. FFI may reduce wrapper overhead, but end-to-end results must be measured per workload.
  3. Distribution size and startup claims must be backed by measured artifacts before release claims.

  ## Ecosystem reference: official/community competitors and stats