@simulatte/webgpu-doe 0.1.1 → 0.1.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +93 -106
- package/native/doe_napi.c +2 -2
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,108 +1,8 @@
 # @simulatte/webgpu-doe
 
-Headless WebGPU for Node.js
-[Doe](https://github.com/clocksmith/fawn) runtime.
+Headless WebGPU for Node.js on the Doe runtime.
 
-
-
-A native Metal WebGPU implementation for Node.js — no Dawn, no IPC, no
-11 MB sidecar. Doe compiles WGSL to MSL at runtime via an AST-based
-shader compiler and dispatches directly to Metal via a Zig + ObjC bridge.
-
-This package ships:
-
-- **`libdoe_webgpu`** — Doe native runtime (~2 MB, Zig + Metal)
-- **`doe_napi.node`** — N-API addon bridging `libdoe_webgpu` to JavaScript
-- **`src/index.js`** — JS wrapper providing WebGPU-shaped classes and constants
-
-## Architecture
-
-```
-JavaScript (DoeGPUDevice, DoeGPUBuffer, ...)
-        |
-N-API addon (doe_napi.c)
-        |
-libdoe_webgpu.dylib   ← Doe native Metal backend, ~2 MB
-        |
-Metal.framework       ← GPU execution (Apple Silicon)
-```
-
-No Dawn dependency. All GPU calls go directly from Zig to Metal.
-
-## Performance claims (Metal, Apple Silicon)
-
-Apples-to-apples vs Dawn (Chromium's WebGPU), matched workloads and timing:
-
-- **Compute e2e** — 1.5x faster (0.23ms vs 0.35ms, 4096 threads)
-- **Buffer upload** — faster across 1 KB to 4 GB (8 sizes claimable)
-- **Atomics** — workgroup atomic and non-atomic both claimable
-- **Matrix-vector multiply** — 3 variants claimable (naive, swizzle, workgroup-shared)
-- **Concurrent execution** — claimable
-- **Zero-init workgroup memory** — claimable
-- **Draw throughput** — 200k draws claimable
-- **Binary size** — ~2 MB vs Dawn's ~11 MB
-
-19 of 30 workloads are claimable. The remaining 11 are bottlenecked by
-per-command Metal command buffer creation overhead (~350us vs Dawn's ~30us).
-See `fawn/bench/` for methodology and raw data.
-
-## API surface
-
-Compute:
-
-- `create()` / `setupGlobals()` / `requestAdapter()` / `requestDevice()`
-- `device.createBuffer()` / `device.createShaderModule()` (WGSL)
-- `device.createComputePipeline()` / `device.createBindGroupLayout()`
-- `device.createBindGroup()` / `device.createPipelineLayout()`
-- `device.createCommandEncoder()` / `encoder.beginComputePass()`
-- `pass.setPipeline()` / `pass.setBindGroup()` / `pass.dispatchWorkgroups()`
-- `pass.dispatchWorkgroupsIndirect()`
-- `pipeline.getBindGroupLayout()`
-- `device.createComputePipelineAsync()`
-- `encoder.copyBufferToBuffer()` / `queue.submit()` / `queue.writeBuffer()`
-- `buffer.mapAsync()` / `buffer.getMappedRange()` / `buffer.unmap()`
-- `queue.onSubmittedWorkDone()`
-
-Render:
-
-- `device.createTexture()` / `texture.createView()` / `device.createSampler()`
-- `device.createRenderPipeline()` / `encoder.beginRenderPass()`
-- `renderPass.setPipeline()` / `renderPass.draw()` / `renderPass.end()`
-
-Device capabilities:
-
-- `device.limits` / `adapter.limits` — full Metal device limits
-- `device.features` / `adapter.features` — reports `shader-f16`
-
-Not yet supported: canvas/surface presentation, vertex/index buffer binding
-in render passes, full render pipeline descriptor parsing.
-
-## Backend readiness
-
-| Backend | Compute | Render | WGSL compiler | Status |
-|---------|---------|--------|---------------|--------|
-| **Metal** (macOS) | Production | Basic (no vertex/index) | WGSL -> MSL (AST-based) | Ready |
-| **Vulkan** (Linux) | WIP | Not started | WGSL -> SPIR-V needed | Experimental |
-| **D3D12** (Windows) | WIP | Not started | WGSL -> HLSL/DXIL needed | Experimental |
-
-**Metal** is the primary backend. All Doppler compute workloads run on Metal today:
-bind groups 0-3, buffer map/unmap, indirect dispatch, shader-f16, subgroups,
-override constants, workgroup shared memory, multiple entry points.
-
-**Vulkan** and **D3D12** have real native runtime paths (not stubs) with instance
-creation, compute dispatch, and buffer upload — but lack shader translation,
-bind group management, buffer map/unmap, textures, and render pipelines.
-
-See [`fawn/status.md`](../../status.md) for the full backend implementation matrix.
-
-## Platform support
-
-| Platform | Architecture | Status |
-|----------|-------------|--------|
-| macOS | arm64 | Prebuilt, tested |
-| macOS | x64 | Not yet built |
-| Linux | x64 | Not yet built (Vulkan backend experimental) |
-| Windows | x64 | Not yet built (D3D12 backend experimental) |
+**[Fawn](https://github.com/clocksmith/fawn/tree/main/nursery/webgpu-doe)** · **[npm](https://www.npmjs.com/package/@simulatte/webgpu-doe)** · **[simulatte.world](https://simulatte.world)**
 
 ## Install
 
@@ -110,10 +10,11 @@ See [`fawn/status.md`](../../status.md) for the full backend implementation matr
 npm install @simulatte/webgpu-doe
 ```
 
-The
-
+The published package currently targets `darwin-arm64`. The N-API addon builds
+from C source during install via `node-gyp`, so a local C toolchain is
+required (`xcode-select --install` on macOS).
 
-##
+## Quick start
 
 ```js
 import { create, globals } from '@simulatte/webgpu-doe';
@@ -144,7 +45,93 @@ const shader = device.createShaderModule({
 // ... create pipeline, bind group, encode, dispatch, readback
 ```
 
-
+The package loads `libdoe_webgpu` and exposes a WebGPU-shaped API for
+headless compute and basic render work. See [more examples](#more-examples)
+below for `navigator.gpu` setup and provider inspection.
+
+## Why Doe
+
+- Native path: JavaScript calls into an N-API addon, which loads
+  `libdoe_webgpu` and submits work through Doe's Metal backend.
+- Runtime ownership: WGSL is compiled to MSL inside Doe's AST-based compiler
+  instead of going through Dawn.
+- Small package payload: the shared library is about 2 MB on `darwin-arm64`.
+- WebGPU-shaped surface: `requestAdapter`, `requestDevice`, buffer mapping,
+  bind groups, compute pipelines, command encoders, and basic render passes are
+  exposed directly from the package.
+
+## Status
+
+Current package target:
+- macOS arm64: prebuilt library and tested package path
+
+Backend readiness:
+
+| Backend | Compute | Render | WGSL compiler | Status |
+|---------|---------|--------|---------------|--------|
+| **Metal** (macOS) | Production | Basic (no vertex/index) | WGSL -> MSL (AST-based) | Ready for package use |
+| **Vulkan** (Linux) | WIP | Not started | WGSL -> SPIR-V needed | Experimental |
+| **D3D12** (Windows) | WIP | Not started | WGSL -> HLSL/DXIL needed | Experimental |
+
+Metal currently covers the package's intended use: bind groups 0-3, buffer
+map/unmap, indirect dispatch, `shader-f16`, subgroups, override constants,
+workgroup shared memory, multiple entry points, textures, samplers, and basic
+render-pass execution.
+
+Vulkan and D3D12 already have native runtime paths for instance creation,
+compute dispatch, and buffer upload, but they still need shader translation,
+bind group management, buffer map/unmap, textures, and render pipelines.
+
+Performance snapshot from the Fawn Dawn-vs-Doe harness on Apple Silicon with
+strict comparability checks:
+
+- Compute e2e: 1.5x faster (0.23ms vs 0.35ms, 4096 threads)
+- Buffer upload: faster across 1 KB to 4 GB (8 sizes claimable)
+- Atomics: workgroup atomic and non-atomic both claimable
+- Matrix-vector multiply: 3 variants claimable
+- Concurrent execution: claimable
+- Zero-init workgroup memory: claimable
+- Draw throughput: 200k draws claimable
+- Binary size: about 2 MB vs Dawn's about 11 MB
+
+19 of 30 workloads are currently claimable. The remaining 11 are limited by
+per-command Metal command buffer creation overhead (~350us vs Dawn's ~30us).
+See `fawn/bench/` and [`status.md`](../../status.md) for methodology and the
+broader backend matrix.
+
+## API surface
+
+Compute:
+
+- `create()` / `setupGlobals()` / `requestAdapter()` / `requestDevice()`
+- `device.createBuffer()` / `device.createShaderModule()` (WGSL)
+- `device.createComputePipeline()` / `device.createComputePipelineAsync()`
+- `device.createBindGroupLayout()` / `device.createBindGroup()`
+- `device.createPipelineLayout()` / `pipeline.getBindGroupLayout()`
+- `device.createCommandEncoder()` / `encoder.beginComputePass()`
+- `pass.setPipeline()` / `pass.setBindGroup()` / `pass.dispatchWorkgroups()`
+- `pass.dispatchWorkgroupsIndirect()`
+- `encoder.copyBufferToBuffer()` / `queue.submit()` / `queue.writeBuffer()`
+- `buffer.mapAsync()` / `buffer.getMappedRange()` / `buffer.unmap()`
+- `queue.onSubmittedWorkDone()`
+
+Render:
+
+- `device.createTexture()` / `texture.createView()` / `device.createSampler()`
+- `device.createRenderPipeline()` / `encoder.beginRenderPass()`
+- `renderPass.setPipeline()` / `renderPass.draw()` / `renderPass.end()`
+
+Device capabilities:
+
+- `device.limits` / `adapter.limits`
+- `device.features` / `adapter.features` with `shader-f16`
+
+Current gaps:
+- Canvas and surface presentation
+- Vertex and index buffer binding in render passes
+- Full render pipeline descriptor parsing
+
+## More examples
 
 ```js
 import { setupGlobals } from '@simulatte/webgpu-doe';
package/native/doe_napi.c
CHANGED
@@ -813,7 +813,7 @@ static napi_value doe_create_shader_module(napi_env env, napi_callback_info info
 
   WGPUShaderModule mod = pfn_wgpuDeviceCreateShaderModule(device, &desc);
   free(code);
-  if (!mod) NAPI_THROW(env, "createShaderModule failed");
+  if (!mod) NAPI_THROW(env, "createShaderModule failed (WGSL translation or compilation error — check stderr for details)");
   return wrap_ptr(env, mod);
 }
 
@@ -856,7 +856,7 @@ static napi_value doe_create_compute_pipeline(napi_env env, napi_callback_info i
 
   WGPUComputePipeline pipeline = pfn_wgpuDeviceCreateComputePipeline(device, &desc);
   free(ep);
-  if (!pipeline) NAPI_THROW(env, "createComputePipeline failed");
+  if (!pipeline) NAPI_THROW(env, "createComputePipeline failed (shader module invalid or entry point not found — check stderr for details)");
   return wrap_ptr(env, pipeline);
 }
 
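The `doe_napi.c` change enriches the strings passed to `NAPI_THROW`, and those messages reach JavaScript as ordinary thrown errors. The stub below is a hypothetical stand-in for the addon's failure path (the message text mirrors the diff), showing how catch-side code can recognize a shader compilation failure.

```javascript
// Hypothetical stub mimicking the addon's new error path; not the real
// native binding. The thrown message matches the string added in the diff.
const device = {
  createShaderModule({ code }) {
    if (code.includes('not valid WGSL')) {
      throw new Error(
        'createShaderModule failed (WGSL translation or compilation error — check stderr for details)'
      );
    }
    return { ok: true };
  },
};

let message = null;
try {
  device.createShaderModule({ code: '// not valid WGSL' });
} catch (err) {
  message = err.message;
}
```

With the 0.1.1 message, the caller only learned that creation failed; the 0.1.2 text points directly at shader translation and at stderr for diagnostics.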