@simulatte/webgpu 0.3.1 → 0.3.2
- package/CHANGELOG.md +27 -12
- package/LICENSE +191 -0
- package/README.md +55 -41
- package/api-contract.md +67 -49
- package/architecture.md +317 -0
- package/assets/package-layers.svg +3 -3
- package/docs/doe-api-reference.html +1842 -0
- package/doe-api-design.md +237 -0
- package/examples/doe-api/README.md +19 -0
- package/examples/doe-api/buffers-readback.js +3 -2
- package/examples/{doe-routines/compute-once-like-input.js → doe-api/compute-one-shot-like-input.js} +1 -1
- package/examples/{doe-routines/compute-once-matmul.js → doe-api/compute-one-shot-matmul.js} +2 -2
- package/examples/{doe-routines/compute-once-multiple-inputs.js → doe-api/compute-one-shot-multiple-inputs.js} +1 -1
- package/examples/{doe-routines/compute-once.js → doe-api/compute-one-shot.js} +1 -1
- package/examples/doe-api/{compile-and-dispatch.js → kernel-create-and-dispatch.js} +4 -6
- package/examples/doe-api/{compute-dispatch.js → kernel-run.js} +4 -6
- package/headless-webgpu-comparison.md +3 -3
- package/jsdoc-style-guide.md +435 -0
- package/native/doe_napi.c +1481 -84
- package/package.json +18 -6
- package/prebuilds/darwin-arm64/doe_napi.node +0 -0
- package/prebuilds/darwin-arm64/libwebgpu_doe.dylib +0 -0
- package/prebuilds/darwin-arm64/metadata.json +5 -5
- package/prebuilds/linux-x64/metadata.json +1 -1
- package/scripts/generate-doe-api-docs.js +1607 -0
- package/scripts/generate-readme-assets.js +3 -3
- package/src/build_metadata.js +7 -4
- package/src/bun-ffi.js +1229 -474
- package/src/bun.js +5 -1
- package/src/compute.d.ts +16 -7
- package/src/compute.js +84 -53
- package/src/full.d.ts +16 -7
- package/src/full.js +12 -10
- package/src/index.js +679 -1324
- package/src/runtime_cli.js +17 -17
- package/src/shared/capabilities.js +144 -0
- package/src/shared/compiler-errors.js +78 -0
- package/src/shared/encoder-surface.js +295 -0
- package/src/shared/full-surface.js +514 -0
- package/src/shared/public-surface.js +82 -0
- package/src/shared/resource-lifecycle.js +120 -0
- package/src/shared/validation.js +495 -0
- package/src/webgpu_constants.js +30 -0
- package/support-contracts.md +2 -2
- package/compat-scope.md +0 -46
- package/layering-plan.md +0 -259
- package/src/auto_bind_group_layout.js +0 -32
- package/src/doe.d.ts +0 -184
- package/src/doe.js +0 -641
- package/zig-source-inventory.md +0 -468
@@ -0,0 +1,237 @@
+# Doe API design
+
+Status: `active`
+
+Scope:
+
+- Doe helper naming and layering above direct WebGPU
+- public JSDoc structure for package-surface helper APIs
+- future expansion direction for routine families beyond the current helper surface
+
+Use this together with:
+
+- [api-contract.md](./api-contract.md) for current implemented method signatures
+- [architecture.md](./architecture.md) for the full runtime layer stack
+- [jsdoc-style-guide.md](./jsdoc-style-guide.md) for public API documentation rules
+
+## Why this exists
+
+This document captures the naming cleanup that moved the Doe helper surface to
+a more coherent hierarchy:
+
+- `gpu.buffer.*` for resource helpers
+- `gpu.kernel.*` for explicit compute primitives
+- `gpu.compute.*` for higher-level routines
+
+The remaining job is to keep future additions aligned with that model instead
+of drifting back into mixed abstraction buckets.
+
+The design goal is:
+
+1. keep direct WebGPU separate
+2. give Doe one explicit primitive layer
+3. give Doe one routine layer
+4. avoid domain namespaces until they are clearly justified
+
+## Design principles
+
+### 1. Bind once, then stay on the bound object
+
+`doe` itself should only do two things:
+
+- `await doe.requestDevice()`
+- `doe.bind(device)`
+
+All helper methods should live on the returned `gpu` object.
+
+This keeps the public model simple:
+
+- `doe` binds
+- `gpu` does work
+
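The bind-once model above can be sketched with a stub. Everything below except the `doe.bind(device)` entry point and the namespaced `gpu.buffer.*` shape is hypothetical filler for illustration, not the package implementation:

```javascript
// Illustrative sketch of the bind-once model: `doe` only produces the
// bound `gpu` object; every helper lives on `gpu`, never on `doe`.
// Stub internals below are hypothetical, not the real native-backed code.
const doe = {
  bind(device) {
    return {
      buffer: {
        // create({ data }) or create({ size }): resource helper shape
        create: (options) => ({
          device,
          size: options.data ? options.data.byteLength : options.size,
        }),
      },
    };
  },
};

const gpu = doe.bind({ label: "stub-device" });
const src = gpu.buffer.create({ data: new Float32Array([1, 2, 3, 4]) });
console.log(src.size); // 16
```

The point of the shape is that user code only ever holds `gpu`; `doe` never reappears after the bind call.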
+### 2. Namespace by unit of thought, not by implementation accident
+
+Namespaces should represent stable user concepts:
+
+- `buffer`
+  resource ownership and readback
+- `kernel`
+  explicit compiled/dispatchable compute units
+- `compute`
+  higher-level workflows that allocate, dispatch, and read back for the caller
+
+Avoid namespaces that only reflect where code currently happens to live.
+
+### 3. Do not mix primitives and routines in one namespace
+
+The main naming problem this cleanup resolved was that `compute` did two jobs:
+
+- explicit kernel operations
+- opinionated one-shot workflows
+
+Those should be separate.
+
+### 4. Delay domain packs until the domain is real
+
+Names like `linalg` or `math` should not exist just because one operation such
+as matmul exists.
+
+Introduce a domain namespace only when it has a real family of routines with a
+shared mental model and clear boundaries.
+
+Until then, keep those workflows under `gpu.compute.*`.
+
+## Current implemented surface
+
+For exact method signatures and behavior, see
+[`api-contract.md`](./api-contract.md) (section `doe`).
+
+The three namespaces are:
+
+- `gpu.buffer.*` — resource helpers (create, read)
+- `gpu.kernel.*` — explicit compute primitives (create, run, dispatch)
+- `gpu.compute(...)` — one-shot workflow helper
+
+## Proposed routine family (not yet implemented)
+
+- `gpu.compute.map(options) -> Promise<TypedArray>`
+- `gpu.compute.zip(options) -> Promise<TypedArray>`
+- `gpu.compute.reduce(options) -> Promise<number | TypedArray>`
+- `gpu.compute.scan(options) -> Promise<TypedArray>`
+- `gpu.compute.matmul(options) -> Promise<TypedArray>`
+
+Domain namespaces like `gpu.linalg` or `gpu.math` should not exist until the
+domain has a real family of routines. Keep workflow routines under
+`gpu.compute.*` until then.
+
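One way to pin down what the proposed (unimplemented) routines would mean is a CPU reference for each; the helper names `mapRef`, `zipRef`, and `reduceRef` here are illustrative only and not part of any surface:

```javascript
// CPU reference semantics for the proposed routine family, mirroring the
// typed-array-in, typed-array-out contracts sketched above.
const mapRef = (input, fn) => Float32Array.from(input, fn);
const zipRef = (a, b, fn) => Float32Array.from(a, (x, i) => fn(x, b[i]));
const reduceRef = (input, fn, init) => input.reduce(fn, init);

const doubled = mapRef(new Float32Array([1, 2, 3]), (x) => x * 2);
const sums = zipRef(
  new Float32Array([1, 2]),
  new Float32Array([10, 20]),
  (x, y) => x + y,
);
const total = reduceRef(new Float32Array([1, 2, 3, 4]), (acc, x) => acc + x, 0);

console.log(Array.from(doubled), Array.from(sums), total); // [2, 4, 6] [11, 22] 10
```

A GPU implementation would differ in mechanics (workgroup tiling, staged reductions) but should agree with these reference results element for element.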
+## Naming rules
+
+### 1. Prefer singular namespaces for domains
+
+Use:
+
+- `gpu.buffer.*`
+- `gpu.kernel.*`
+
+Do not use plural buckets like `gpu.buffers.*` unless the package already has a
+hard compatibility constraint.
+
+### 2. Keep verbs concrete
+
+Use:
+
+- `create`
+- `read`
+- `run`
+- `dispatch`
+- `map`
+- `zip`
+- `reduce`
+
+Avoid generic names like:
+
+- `execute`
+- `process`
+- `handle`
+- `apply`
+
+unless the contract is genuinely broader than the specific verbs above.
+
+### 3. Keep one namespace, one abstraction level
+
+Examples:
+
+- `gpu.buffer.*`
+  resource helpers
+- `gpu.kernel.*`
+  explicit compute primitives
+- `gpu.compute.*`
+  routines
+
+Do not mix:
+
+- reusable kernel compilation
+- buffer lifecycle
+- high-level task workflows
+
+inside one namespace.
+
+## Migration history (completed)
+
+The following renames were applied to reach the current implemented surface:
+
+- `gpu.buffers.create(...)` → `gpu.buffer.create(...)`
+- `gpu.buffers.fromData(...)` → merged into `gpu.buffer.create({ data: ... })`
+- `gpu.buffers.like(...)` → removed (use `gpu.buffer.create({ size: src.size, ... })`)
+- `gpu.buffers.read(...)` → `gpu.buffer.read({ buffer, type, ... })`
+- `gpu.compute.run(...)` → `gpu.kernel.run(...)`
+- `gpu.compute.compile(...)` → `gpu.kernel.create(...)`
+- `kernel.dispatch(...)` → stayed `kernel.dispatch(...)`
+- `gpu.compute(...)` → stayed `gpu.compute(...)`
+
+## JSDoc contract for the future helper surface
+
+Public JSDoc should document the API the user actually sees, not the private
+helper graph underneath it.
+
+For the Doe helper surface, the preferred structure is:
+
+```js
+/**
+ * Create a reusable compute kernel from WGSL and binding metadata.
+ *
+ * Surface: Doe API (`gpu.kernel.*`).
+ * Input: WGSL source, entry point, and representative bindings.
+ * Returns: A reusable kernel object with `.dispatch(...)`.
+ *
+ * This example shows the API in its basic form.
+ *
+ * ```js
+ * const kernel = gpu.kernel.create({
+ *   code,
+ *   bindings: [src, dst],
+ * });
+ * ```
+ *
+ * - Reuse this when dispatching the same WGSL shape repeatedly.
+ * - Drop to direct WebGPU if you need manual pipeline-layout ownership.
+ */
+```
+
+Required fields for future Doe helper docs:
+
+1. one-sentence summary
+2. `Surface:` line
+3. `Input:` line
+4. `Returns:` line
+5. one small example
+6. flat bullets for defaults, failure modes, and escalation path
+
+This is stricter than the current narrative style on purpose. The Doe helper
+surface benefits from explicit API contracts more than from prose-heavy
+commentary.
+
+## Decision rule for future additions
+
+When adding a new helper:
+
+1. If it is about resource ownership, put it under `gpu.buffer.*`.
+2. If it is about explicit WGSL/pipeline reuse and dispatch, put it under
+   `gpu.kernel.*`.
+3. If it is a workflow that owns temporary allocations and returns typed
+   results, put it under `gpu.compute.*`.
+4. If it requires model semantics, tensor semantics, KV cache handling,
+   attention, routing, or pipeline planning, it does not belong in Doe at all;
+   it belongs in a higher-level consumer such as Doppler.
+
+## Non-goals
+
+This design does not propose:
+
+- moving model runtime or pipeline semantics into Doe
+- replacing direct WebGPU
+- creating a broad domain-pack taxonomy today
+- documenting the proposed naming as if it were already the implemented package contract
+
+The live contract remains in [api-contract.md](./api-contract.md) until the
+implementation catches up.
@@ -0,0 +1,19 @@
+# Doe API examples
+
+These examples are ordered from the most explicit Doe helper path to the most
+opinionated one-shot helper.
+
+- `buffers-readback.js`
+  create a Doe-managed buffer and read it back
+- `kernel-run.js`
+  run a one-off compute kernel with explicit buffer ownership
+- `kernel-create-and-dispatch.js`
+  compile a reusable `DoeKernel` and dispatch it
+- `compute-one-shot.js`
+  use `gpu.compute(...)` with one typed-array input and inferred output size
+- `compute-one-shot-like-input.js`
+  use `gpu.compute(...)` with `likeInput` sizing and a uniform input
+- `compute-one-shot-multiple-inputs.js`
+  use `gpu.compute(...)` with multiple typed-array inputs
+- `compute-one-shot-matmul.js`
+  run a larger one-shot `gpu.compute(...)` example with explicit tensor shapes
@@ -1,9 +1,10 @@
 import { doe } from "@simulatte/webgpu/compute";
 
 const gpu = await doe.requestDevice();
-const src = gpu.
+const src = gpu.buffer.create({
+  data: new Float32Array([1, 2, 3, 4]),
   usage: ["storageRead", "readback"],
 });
 
-const result = await gpu.
+const result = await gpu.buffer.read({ buffer: src, type: Float32Array });
 console.log(JSON.stringify(Array.from(result)));
@@ -9,7 +9,7 @@ const lhs = Float32Array.from({ length: M * K }, (_, i) => (i % 17) / 17);
 const rhs = Float32Array.from({ length: K * N }, (_, i) => (i % 13) / 13);
 const dims = new Uint32Array([M, K, N, 0]);
 
-const result = await gpu.compute
+const result = await gpu.compute({
   code: `
     struct Dims {
       m: u32,
@@ -50,4 +50,4 @@ const result = await gpu.compute.once({
   workgroups: [Math.ceil(N / 8), Math.ceil(M / 8)],
 });
 
-console.log(result.subarray(0, 8));
+console.log(JSON.stringify(Array.from(result.subarray(0, 8), (value) => Number(value.toFixed(4)))));
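When checking output from a one-shot matmul like the example above, a plain CPU oracle with the same row-major layout is a useful reference. This sketch assumes nothing beyond standard JavaScript; the function name is illustrative:

```javascript
// CPU oracle for a row-major matmul: lhs is M x K, rhs is K x N,
// result is M x N, matching the Dims ordering in the example shader.
function matmulReference(lhs, rhs, M, K, N) {
  const out = new Float32Array(M * N);
  for (let m = 0; m < M; m++) {
    for (let n = 0; n < N; n++) {
      let acc = 0;
      for (let k = 0; k < K; k++) acc += lhs[m * K + k] * rhs[k * N + n];
      out[m * N + n] = acc;
    }
  }
  return out;
}

// Tiny shapes keep the oracle easy to verify by hand: identity x rhs = rhs.
const ref = matmulReference(
  new Float32Array([1, 0, 0, 1]), // 2x2 identity
  new Float32Array([5, 6, 7, 8]),
  2, 2, 2,
);
console.log(Array.from(ref)); // [5, 6, 7, 8]
```

Comparing GPU results against such an oracle (within float tolerance) is how the rounded `toFixed(4)` output above can be validated.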
@@ -2,7 +2,7 @@ import { doe } from "@simulatte/webgpu/compute";
 
 const gpu = await doe.requestDevice();
 
-const result = await gpu.compute
+const result = await gpu.compute({
   code: `
     @group(0) @binding(0) var<storage, read> lhs: array<f32>;
     @group(0) @binding(1) var<storage, read> rhs: array<f32>;
@@ -2,7 +2,7 @@ import { doe } from "@simulatte/webgpu/compute";
 
 const gpu = await doe.requestDevice();
 
-const result = await gpu.compute
+const result = await gpu.compute({
   code: `
     @group(0) @binding(0) var<storage, read> src: array<f32>;
     @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
@@ -1,12 +1,10 @@
 import { doe } from "@simulatte/webgpu/compute";
 
 const gpu = await doe.requestDevice();
-const src = gpu.
-const dst = gpu.
-  usage: "storageReadWrite",
-});
+const src = gpu.buffer.create({ data: new Float32Array([1, 2, 3, 4]) });
+const dst = gpu.buffer.create({ size: src.size, usage: "storageReadWrite" });
 
-const kernel = gpu.
+const kernel = gpu.kernel.create({
   code: `
     @group(0) @binding(0) var<storage, read> src: array<f32>;
     @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
@@ -26,5 +24,5 @@ await kernel.dispatch({
   workgroups: 1,
 });
 
-const result = await gpu.
+const result = await gpu.buffer.read({ buffer: dst, type: Float32Array });
 console.log(JSON.stringify(Array.from(result)));
@@ -1,12 +1,10 @@
 import { doe } from "@simulatte/webgpu/compute";
 
 const gpu = await doe.requestDevice();
-const src = gpu.
-const dst = gpu.
-  usage: "storageReadWrite",
-});
+const src = gpu.buffer.create({ data: new Float32Array([1, 2, 3, 4]) });
+const dst = gpu.buffer.create({ size: src.size, usage: "storageReadWrite" });
 
-await gpu.
+await gpu.kernel.run({
   code: `
     @group(0) @binding(0) var<storage, read> src: array<f32>;
     @group(0) @binding(1) var<storage, read_write> dst: array<f32>;
@@ -21,5 +19,5 @@ await gpu.compute.run({
   workgroups: 1,
 });
 
-const result = await gpu.
+const result = await gpu.buffer.read({ buffer: dst, type: Float32Array });
 console.log(JSON.stringify(Array.from(result)));
@@ -38,6 +38,6 @@ Notes:
 
 ## Scaffolding the Fawn NPM Package
 
-- Doe is exposed through a native C ABI
--
-- Browser API parity is not claimed
+- Doe is exposed through a native C ABI with parallel N-API (Node) and FFI (Bun) transports. Bun uses a platform-dependent bridge: FFI on Linux, addon-backed on macOS for correctness parity.
+- The canonical package is `@simulatte/webgpu` with full Node N-API and Bun support.
+- Browser API parity is not claimed; the current focus is headless compute and benchmarking workflows. Browser ownership lives in `nursery/fawn-browser`.
|