@simulatte/webgpu 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. package/CHANGELOG.md +27 -12
  2. package/LICENSE +191 -0
  3. package/README.md +55 -41
  4. package/api-contract.md +67 -49
  5. package/architecture.md +317 -0
  6. package/assets/package-layers.svg +3 -3
  7. package/docs/doe-api-reference.html +1842 -0
  8. package/doe-api-design.md +237 -0
  9. package/examples/doe-api/README.md +19 -0
  10. package/examples/doe-api/buffers-readback.js +3 -2
  11. package/examples/{doe-routines/compute-once-like-input.js → doe-api/compute-one-shot-like-input.js} +1 -1
  12. package/examples/{doe-routines/compute-once-matmul.js → doe-api/compute-one-shot-matmul.js} +2 -2
  13. package/examples/{doe-routines/compute-once-multiple-inputs.js → doe-api/compute-one-shot-multiple-inputs.js} +1 -1
  14. package/examples/{doe-routines/compute-once.js → doe-api/compute-one-shot.js} +1 -1
  15. package/examples/doe-api/{compile-and-dispatch.js → kernel-create-and-dispatch.js} +4 -6
  16. package/examples/doe-api/{compute-dispatch.js → kernel-run.js} +4 -6
  17. package/headless-webgpu-comparison.md +3 -3
  18. package/jsdoc-style-guide.md +435 -0
  19. package/native/doe_napi.c +1481 -84
  20. package/package.json +18 -6
  21. package/prebuilds/darwin-arm64/doe_napi.node +0 -0
  22. package/prebuilds/darwin-arm64/libwebgpu_doe.dylib +0 -0
  23. package/prebuilds/darwin-arm64/metadata.json +5 -5
  24. package/prebuilds/linux-x64/metadata.json +1 -1
  25. package/scripts/generate-doe-api-docs.js +1607 -0
  26. package/scripts/generate-readme-assets.js +3 -3
  27. package/src/build_metadata.js +7 -4
  28. package/src/bun-ffi.js +1229 -474
  29. package/src/bun.js +5 -1
  30. package/src/compute.d.ts +16 -7
  31. package/src/compute.js +84 -53
  32. package/src/full.d.ts +16 -7
  33. package/src/full.js +12 -10
  34. package/src/index.js +679 -1324
  35. package/src/runtime_cli.js +17 -17
  36. package/src/shared/capabilities.js +144 -0
  37. package/src/shared/compiler-errors.js +78 -0
  38. package/src/shared/encoder-surface.js +295 -0
  39. package/src/shared/full-surface.js +514 -0
  40. package/src/shared/public-surface.js +82 -0
  41. package/src/shared/resource-lifecycle.js +120 -0
  42. package/src/shared/validation.js +495 -0
  43. package/src/webgpu_constants.js +30 -0
  44. package/support-contracts.md +2 -2
  45. package/compat-scope.md +0 -46
  46. package/layering-plan.md +0 -259
  47. package/src/auto_bind_group_layout.js +0 -32
  48. package/src/doe.d.ts +0 -184
  49. package/src/doe.js +0 -641
  50. package/zig-source-inventory.md +0 -468
@@ -0,0 +1,237 @@
+ # Doe API design
+
+ Status: `active`
+
+ Scope:
+
+ - Doe helper naming and layering above direct WebGPU
+ - public JSDoc structure for package-surface helper APIs
+ - future expansion direction for routine families beyond the current helper surface
+
+ Use this together with:
+
+ - [api-contract.md](./api-contract.md) for current implemented method signatures
+ - [architecture.md](./architecture.md) for the full runtime layer stack
+ - [jsdoc-style-guide.md](./jsdoc-style-guide.md) for public API documentation rules
+
+ ## Why this exists
+
+ This document captures the naming cleanup that moved the Doe helper surface to
+ a more coherent hierarchy:
+
+ - `gpu.buffer.*` for resource helpers
+ - `gpu.kernel.*` for explicit compute primitives
+ - `gpu.compute.*` for higher-level routines
+
+ The remaining job is to keep future additions aligned with that model instead
+ of drifting back into mixed abstraction buckets.
+
+ The design goals are:
+
+ 1. keep direct WebGPU separate
+ 2. give Doe one explicit primitive layer
+ 3. give Doe one routine layer
+ 4. avoid domain namespaces until they are clearly justified
+
+ ## Design principles
+
+ ### 1. Bind once, then stay on the bound object
+
+ `doe` itself should only do two things:
+
+ - `await doe.requestDevice()`
+ - `doe.bind(device)`
+
+ All helper methods should live on the returned `gpu` object.
+
+ This keeps the public model simple:
+
+ - `doe` binds
+ - `gpu` does work
+
+ ### 2. Namespace by unit of thought, not by implementation accident
+
+ Namespaces should represent stable user concepts:
+
+ - `buffer`
+   resource ownership and readback
+ - `kernel`
+   explicit compiled/dispatchable compute units
+ - `compute`
+   higher-level workflows that allocate, dispatch, and read back for the caller
+
+ Avoid namespaces that only reflect where code currently happens to live.
+
+ ### 3. Do not mix primitives and routines in one namespace
+
+ The original naming problem was that `compute` did two jobs:
+
+ - explicit kernel operations
+ - opinionated one-shot workflows
+
+ Those jobs are now separate, and they must stay separate.
+
+ ### 4. Delay domain packs until the domain is real
+
+ Names like `linalg` or `math` should not exist just because one operation such
+ as matmul exists.
+
+ Introduce a domain namespace only when it has a real family of routines with a
+ shared mental model and clear boundaries.
+
+ Until then, keep those workflows under `gpu.compute.*`.
+
84
+ ## Current implemented surface
85
+
86
+ For exact method signatures and behavior, see
87
+ [`api-contract.md`](./api-contract.md) (section `doe`).
88
+
89
+ The three namespaces are:
90
+
91
+ - `gpu.buffer.*` — resource helpers (create, read)
92
+ - `gpu.kernel.*` — explicit compute primitives (create, run, dispatch)
93
+ - `gpu.compute(...)` — one-shot workflow helper
94
+
95
+ ## Proposed routine family (not yet implemented)
96
+
97
+ - `gpu.compute.map(options) -> Promise<TypedArray>`
98
+ - `gpu.compute.zip(options) -> Promise<TypedArray>`
99
+ - `gpu.compute.reduce(options) -> Promise<number | TypedArray>`
100
+ - `gpu.compute.scan(options) -> Promise<TypedArray>`
101
+ - `gpu.compute.matmul(options) -> Promise<TypedArray>`
102
+
103
+ Domain namespaces like `gpu.linalg` or `gpu.math` should not exist until the
104
+ domain has a real family of routines. Keep workflow routines under
105
+ `gpu.compute.*` until then.
106
+
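The proposed routines above are not implemented yet, so their exact option shapes are open. A CPU reference sketch of the intended semantics, with hypothetical option names (`input`, `fn`, `init`) and only the `options -> Promise<TypedArray>` shape taken from the proposal, might look like:

```javascript
// CPU sketch of the proposed (NOT yet implemented) gpu.compute.map and
// gpu.compute.reduce semantics. Option names here are assumptions.

// map: apply an elementwise function, returning a new typed array of the
// same length and constructor as the input.
async function mapReference({ input, fn }) {
  const out = new input.constructor(input.length);
  for (let i = 0; i < input.length; i += 1) out[i] = fn(input[i], i);
  return out;
}

// reduce: fold the input down to a single number.
async function reduceReference({ input, fn, init = 0 }) {
  let acc = init;
  for (const value of input) acc = fn(acc, value);
  return acc;
}

const doubled = await mapReference({
  input: new Float32Array([1, 2, 3]),
  fn: (x) => x * 2,
});
const sum = await reduceReference({ input: doubled, fn: (a, b) => a + b });
console.log(Array.from(doubled), sum); // [ 2, 4, 6 ] 12
```

A GPU implementation would dispatch WGSL instead of looping, but the option surface and return types should match a reference like this.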
+ ## Naming rules
+
+ ### 1. Prefer singular namespaces for domains
+
+ Use:
+
+ - `gpu.buffer.*`
+ - `gpu.kernel.*`
+
+ Do not use plural buckets like `gpu.buffers.*` unless the package already has a
+ hard compatibility constraint.
+
+ ### 2. Keep verbs concrete
+
+ Use:
+
+ - `create`
+ - `read`
+ - `run`
+ - `dispatch`
+ - `map`
+ - `zip`
+ - `reduce`
+
+ Avoid generic names like:
+
+ - `execute`
+ - `process`
+ - `handle`
+ - `apply`
+
+ unless the contract is genuinely broader than the specific verbs above.
+
+ ### 3. Keep one namespace, one abstraction level
+
+ Examples:
+
+ - `gpu.buffer.*`
+   resource helpers
+ - `gpu.kernel.*`
+   explicit compute primitives
+ - `gpu.compute.*`
+   routines
+
+ Do not mix:
+
+ - reusable kernel compilation
+ - buffer lifecycle
+ - high-level task workflows
+
+ inside one namespace.
+
+ ## Migration history (completed)
+
+ The following renames were applied to reach the current implemented surface:
+
+ - `gpu.buffers.create(...)` → `gpu.buffer.create(...)`
+ - `gpu.buffers.fromData(...)` → merged into `gpu.buffer.create({ data: ... })`
+ - `gpu.buffers.like(...)` → removed (use `gpu.buffer.create({ size: src.size, ... })`)
+ - `gpu.buffers.read(...)` → `gpu.buffer.read({ buffer, type, ... })`
+ - `gpu.compute.run(...)` → `gpu.kernel.run(...)`
+ - `gpu.compute.compile(...)` → `gpu.kernel.create(...)`
+ - `kernel.dispatch(...)` → unchanged
+ - `gpu.compute(...)` → unchanged
+
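The rename table can be read as a mechanical mapping from the removed `gpu.buffers.*` calls onto the current `gpu.buffer.*` surface. The shim below is not part of the package; it is a hypothetical illustration of that mapping, exercised against a tiny in-memory stand-in so it runs without a GPU device:

```javascript
// Hypothetical back-compat shim expressing the migration table above.
// Not shipped code: it only shows how each legacy call re-expresses
// itself on the new surface.
function makeLegacyBuffersShim(gpu) {
  return {
    create: (options) => gpu.buffer.create(options),
    // fromData(data, options) merged into create({ data, ...options })
    fromData: (data, options = {}) => gpu.buffer.create({ data, ...options }),
    // like(src, options) removed; re-expressed via create({ size: src.size })
    like: (src, options = {}) =>
      gpu.buffer.create({ size: src.size, ...options }),
    // read(buffer, Type) became read({ buffer, type })
    read: (buffer, type) => gpu.buffer.read({ buffer, type }),
  };
}

// Minimal stand-in for `gpu.buffer` so the shim can run device-free.
const fakeGpu = {
  buffer: {
    create: (options) => ({
      size: options.data ? options.data.byteLength : options.size,
    }),
    read: ({ type }) => Promise.resolve(new type(0)),
  },
};

const buffers = makeLegacyBuffersShim(fakeGpu);
const src = buffers.fromData(new Float32Array([1, 2, 3, 4]));
const dst = buffers.like(src, { usage: "storageReadWrite" });
console.log(src.size, dst.size); // 16 16
```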
+ ## JSDoc contract for the future helper surface
+
+ Public JSDoc should document the API the user actually sees, not the private
+ helper graph underneath it.
+
+ For the Doe helper surface, the preferred structure is:
+
+ ```js
+ /**
+  * Create a reusable compute kernel from WGSL and binding metadata.
+  *
+  * Surface: Doe API (`gpu.kernel.*`).
+  * Input: WGSL source, entry point, and representative bindings.
+  * Returns: A reusable kernel object with `.dispatch(...)`.
+  *
+  * This example shows the API in its basic form.
+  *
+  * ```js
+  * const kernel = gpu.kernel.create({
+  *   code,
+  *   bindings: [src, dst],
+  * });
+  * ```
+  *
+  * - Reuse this when dispatching the same WGSL shape repeatedly.
+  * - Drop to direct WebGPU if you need manual pipeline-layout ownership.
+  */
+ ```
+
+ Required fields for future Doe helper docs:
+
+ 1. one-sentence summary
+ 2. `Surface:` line
+ 3. `Input:` line
+ 4. `Returns:` line
+ 5. one small example
+ 6. flat bullets for defaults, failure modes, and escalation path
+
+ This is stricter than the current narrative style on purpose. The Doe helper
+ surface benefits from explicit API contracts more than from prose-heavy
+ commentary.
+
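The six required fields are mechanical enough to lint. The checker below is a sketch, not shipped tooling; the function name and heuristics (simple regexes over the raw comment text) are assumptions:

```javascript
// Sketch of a lint check for the six required Doe JSDoc fields above.
// Returns the list of missing fields (empty means the block passes).
function checkDoeDoc(comment) {
  const missing = [];
  // 1. one-sentence summary: at least one "* <text>" line
  if (!/^\s*\*\s+\S/m.test(comment)) missing.push("summary");
  // 2-4. Surface / Input / Returns lines
  for (const field of ["Surface:", "Input:", "Returns:"]) {
    if (!comment.includes(field)) missing.push(field);
  }
  // 5. one small example (fenced js block inside the comment)
  if (!comment.includes("```js")) missing.push("example");
  // 6. flat bullets ("* - ..." lines)
  if (!/^\s*\*\s+-\s+/m.test(comment)) missing.push("flat bullets");
  return missing;
}

const sample = [
  "/**",
  " * Create a reusable compute kernel from WGSL and binding metadata.",
  " *",
  " * Surface: Doe API (`gpu.kernel.*`).",
  " * Input: WGSL source, entry point, and representative bindings.",
  " * Returns: A reusable kernel object with `.dispatch(...)`.",
  " *",
  " * ```js",
  " * const kernel = gpu.kernel.create({ code, bindings: [src, dst] });",
  " * ```",
  " *",
  " * - Reuse this when dispatching the same WGSL shape repeatedly.",
  " */",
].join("\n");
console.log(checkDoeDoc(sample)); // []
```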
+ ## Decision rule for future additions
+
+ When adding a new helper:
+
+ 1. If it is about resource ownership, put it under `gpu.buffer.*`.
+ 2. If it is about explicit WGSL/pipeline reuse and dispatch, put it under
+    `gpu.kernel.*`.
+ 3. If it is a workflow that owns temporary allocations and returns typed
+    results, put it under `gpu.compute.*`.
+ 4. If it requires model semantics, tensor semantics, KV cache handling,
+    attention, routing, or pipeline planning, it does not belong in Doe at all;
+    it belongs in a higher-level consumer such as Doppler.
+
+ ## Non-goals
+
+ This design does not propose:
+
+ - moving model runtime or pipeline semantics into Doe
+ - replacing direct WebGPU
+ - creating a broad domain-pack taxonomy today
+ - documenting the proposed naming as if it were already the implemented package contract
+
+ The live contract remains in [api-contract.md](./api-contract.md) until the
+ implementation catches up.
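The decision rule reduces to a small routing function. This is a hypothetical sketch (the predicate names are invented); note it checks the model-semantics exclusion first, since a helper with model semantics is out of scope even if it also owns resources:

```javascript
// Sketch of the four-way decision rule for placing a new Doe helper.
// Predicate names are assumptions; the four outcomes come from the rule.
function placeHelper({
  ownsResources,
  explicitDispatch,
  ownedWorkflow,
  modelSemantics,
} = {}) {
  // Rule 4 acts as a hard exclusion, so test it first.
  if (modelSemantics) return "out of scope (Doppler or another consumer)";
  if (ownsResources) return "gpu.buffer.*";
  if (explicitDispatch) return "gpu.kernel.*";
  if (ownedWorkflow) return "gpu.compute.*";
  return "unclassified: revisit the helper's contract";
}

console.log(placeHelper({ ownsResources: true }));    // gpu.buffer.*
console.log(placeHelper({ explicitDispatch: true })); // gpu.kernel.*
console.log(placeHelper({ modelSemantics: true }));   // out of scope (Doppler or another consumer)
```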
@@ -0,0 +1,19 @@
+ # Doe API examples
+
+ These examples are ordered from the most explicit Doe helper path to the most
+ opinionated one-shot helper.
+
+ - `buffers-readback.js`
+   create a Doe-managed buffer and read it back
+ - `kernel-run.js`
+   run a one-off compute kernel with explicit buffer ownership
+ - `kernel-create-and-dispatch.js`
+   compile a reusable `DoeKernel` and dispatch it
+ - `compute-one-shot.js`
+   use `gpu.compute(...)` with one typed-array input and inferred output size
+ - `compute-one-shot-like-input.js`
+   use `gpu.compute(...)` with `likeInput` sizing and a uniform input
+ - `compute-one-shot-multiple-inputs.js`
+   use `gpu.compute(...)` with multiple typed-array inputs
+ - `compute-one-shot-matmul.js`
+   run a larger one-shot `gpu.compute(...)` example with explicit tensor shapes
@@ -1,9 +1,10 @@
  import { doe } from "@simulatte/webgpu/compute";
 
  const gpu = await doe.requestDevice();
- const src = gpu.buffers.fromData(new Float32Array([1, 2, 3, 4]), {
+ const src = gpu.buffer.create({
+   data: new Float32Array([1, 2, 3, 4]),
    usage: ["storageRead", "readback"],
  });
 
- const result = await gpu.buffers.read(src, Float32Array);
+ const result = await gpu.buffer.read({ buffer: src, type: Float32Array });
  console.log(JSON.stringify(Array.from(result)));
@@ -2,7 +2,7 @@ import { doe } from "@simulatte/webgpu/compute";
 
  const gpu = await doe.requestDevice();
 
- const result = await gpu.compute.once({
+ const result = await gpu.compute({
    code: `
      struct Scale {
        value: f32,
@@ -9,7 +9,7 @@ const lhs = Float32Array.from({ length: M * K }, (_, i) => (i % 17) / 17);
  const rhs = Float32Array.from({ length: K * N }, (_, i) => (i % 13) / 13);
  const dims = new Uint32Array([M, K, N, 0]);
 
- const result = await gpu.compute.once({
+ const result = await gpu.compute({
    code: `
      struct Dims {
        m: u32,
@@ -50,4 +50,4 @@ const result = await gpu.compute.once({
    workgroups: [Math.ceil(N / 8), Math.ceil(M / 8)],
  });
 
- console.log(result.subarray(0, 8));
+ console.log(JSON.stringify(Array.from(result.subarray(0, 8), (value) => Number(value.toFixed(4)))));
@@ -2,7 +2,7 @@ import { doe } from "@simulatte/webgpu/compute";
 
  const gpu = await doe.requestDevice();
 
- const result = await gpu.compute.once({
+ const result = await gpu.compute({
    code: `
      @group(0) @binding(0) var<storage, read> lhs: array<f32>;
      @group(0) @binding(1) var<storage, read> rhs: array<f32>;
@@ -2,7 +2,7 @@ import { doe } from "@simulatte/webgpu/compute";
 
  const gpu = await doe.requestDevice();
 
- const result = await gpu.compute.once({
+ const result = await gpu.compute({
    code: `
      @group(0) @binding(0) var<storage, read> src: array<f32>;
      @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
@@ -1,12 +1,10 @@
  import { doe } from "@simulatte/webgpu/compute";
 
  const gpu = await doe.requestDevice();
- const src = gpu.buffers.fromData(new Float32Array([1, 2, 3, 4]));
- const dst = gpu.buffers.like(src, {
-   usage: "storageReadWrite",
- });
+ const src = gpu.buffer.create({ data: new Float32Array([1, 2, 3, 4]) });
+ const dst = gpu.buffer.create({ size: src.size, usage: "storageReadWrite" });
 
- const kernel = gpu.compute.compile({
+ const kernel = gpu.kernel.create({
    code: `
      @group(0) @binding(0) var<storage, read> src: array<f32>;
      @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
@@ -26,5 +24,5 @@ await kernel.dispatch({
    workgroups: 1,
  });
 
- const result = await gpu.buffers.read(dst, Float32Array);
+ const result = await gpu.buffer.read({ buffer: dst, type: Float32Array });
  console.log(JSON.stringify(Array.from(result)));
@@ -1,12 +1,10 @@
  import { doe } from "@simulatte/webgpu/compute";
 
  const gpu = await doe.requestDevice();
- const src = gpu.buffers.fromData(new Float32Array([1, 2, 3, 4]));
- const dst = gpu.buffers.like(src, {
-   usage: "storageReadWrite",
- });
+ const src = gpu.buffer.create({ data: new Float32Array([1, 2, 3, 4]) });
+ const dst = gpu.buffer.create({ size: src.size, usage: "storageReadWrite" });
 
- await gpu.compute.run({
+ await gpu.kernel.run({
    code: `
      @group(0) @binding(0) var<storage, read> src: array<f32>;
      @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
@@ -21,5 +19,5 @@ await gpu.compute.run({
    workgroups: 1,
  });
 
- const result = await gpu.buffers.read(dst, Float32Array);
+ const result = await gpu.buffer.read({ buffer: dst, type: Float32Array });
  console.log(JSON.stringify(Array.from(result)));
@@ -38,6 +38,6 @@ Notes:
 
  ## Scaffolding the Fawn NPM Package
 
- - Doe is exposed through a native C ABI and also ships an experimental Bun FFI implementation, but the package-default Bun entry currently uses the addon-backed runtime for stability.
- - Node N-API support now exists in the canonical `@simulatte/webgpu` package.
- - Browser API parity is not claimed by this draft package; the current focus is headless benchmarking workflows.
+ - Doe is exposed through a native C ABI with parallel N-API (Node) and FFI (Bun) transports. Bun uses a platform-dependent bridge: FFI on Linux, addon-backed on macOS for correctness parity.
+ - The canonical package is `@simulatte/webgpu`, with full Node N-API and Bun support.
+ - Browser API parity is not claimed; the current focus is headless compute and benchmarking workflows. Browser ownership lives in `nursery/fawn-browser`.
+ - Browser API parity is not claimed; the current focus is headless compute and benchmarking workflows. Browser ownership lives in `nursery/fawn-browser`.