@simulatte/webgpu-doe 0.1.1 → 0.1.2

package/README.md CHANGED
@@ -1,108 +1,8 @@
  # @simulatte/webgpu-doe
 
- Headless WebGPU for Node.js, powered by the
- [Doe](https://github.com/clocksmith/fawn) runtime.
+ Headless WebGPU for Node.js on the Doe runtime.
 
- ## What this is
-
- A native Metal WebGPU implementation for Node.js — no Dawn, no IPC, no
- 11 MB sidecar. Doe compiles WGSL to MSL at runtime via an AST-based
- shader compiler and dispatches directly to Metal via a Zig + ObjC bridge.
-
- This package ships:
-
- - **`libdoe_webgpu`** — Doe native runtime (~2 MB, Zig + Metal)
- - **`doe_napi.node`** — N-API addon bridging `libdoe_webgpu` to JavaScript
- - **`src/index.js`** — JS wrapper providing WebGPU-shaped classes and constants
-
- ## Architecture
-
- ```
- JavaScript (DoeGPUDevice, DoeGPUBuffer, ...)
-         |
- N-API addon (doe_napi.c)
-         |
- libdoe_webgpu.dylib   ← Doe native Metal backend, ~2 MB
-         |
- Metal.framework       ← GPU execution (Apple Silicon)
- ```
-
- No Dawn dependency. All GPU calls go directly from Zig to Metal.
-
- ## Performance claims (Metal, Apple Silicon)
-
- Apples-to-apples vs Dawn (Chromium's WebGPU), matched workloads and timing:
-
- - **Compute e2e** — 1.5x faster (0.23ms vs 0.35ms, 4096 threads)
- - **Buffer upload** — faster across 1 KB to 4 GB (8 sizes claimable)
- - **Atomics** — workgroup atomic and non-atomic both claimable
- - **Matrix-vector multiply** — 3 variants claimable (naive, swizzle, workgroup-shared)
- - **Concurrent execution** — claimable
- - **Zero-init workgroup memory** — claimable
- - **Draw throughput** — 200k draws claimable
- - **Binary size** — ~2 MB vs Dawn's ~11 MB
-
- 19 of 30 workloads are claimable. The remaining 11 are bottlenecked by
- per-command Metal command buffer creation overhead (~350us vs Dawn's ~30us).
- See `fawn/bench/` for methodology and raw data.
-
- ## API surface
-
- Compute:
-
- - `create()` / `setupGlobals()` / `requestAdapter()` / `requestDevice()`
- - `device.createBuffer()` / `device.createShaderModule()` (WGSL)
- - `device.createComputePipeline()` / `device.createBindGroupLayout()`
- - `device.createBindGroup()` / `device.createPipelineLayout()`
- - `device.createCommandEncoder()` / `encoder.beginComputePass()`
- - `pass.setPipeline()` / `pass.setBindGroup()` / `pass.dispatchWorkgroups()`
- - `pass.dispatchWorkgroupsIndirect()`
- - `pipeline.getBindGroupLayout()`
- - `device.createComputePipelineAsync()`
- - `encoder.copyBufferToBuffer()` / `queue.submit()` / `queue.writeBuffer()`
- - `buffer.mapAsync()` / `buffer.getMappedRange()` / `buffer.unmap()`
- - `queue.onSubmittedWorkDone()`
-
- Render:
-
- - `device.createTexture()` / `texture.createView()` / `device.createSampler()`
- - `device.createRenderPipeline()` / `encoder.beginRenderPass()`
- - `renderPass.setPipeline()` / `renderPass.draw()` / `renderPass.end()`
-
- Device capabilities:
-
- - `device.limits` / `adapter.limits` — full Metal device limits
- - `device.features` / `adapter.features` — reports `shader-f16`
-
- Not yet supported: canvas/surface presentation, vertex/index buffer binding
- in render passes, full render pipeline descriptor parsing.
-
- ## Backend readiness
-
- | Backend | Compute | Render | WGSL compiler | Status |
- |---------|---------|--------|---------------|--------|
- | **Metal** (macOS) | Production | Basic (no vertex/index) | WGSL -> MSL (AST-based) | Ready |
- | **Vulkan** (Linux) | WIP | Not started | WGSL -> SPIR-V needed | Experimental |
- | **D3D12** (Windows) | WIP | Not started | WGSL -> HLSL/DXIL needed | Experimental |
-
- **Metal** is the primary backend. All Doppler compute workloads run on Metal today:
- bind groups 0-3, buffer map/unmap, indirect dispatch, shader-f16, subgroups,
- override constants, workgroup shared memory, multiple entry points.
-
- **Vulkan** and **D3D12** have real native runtime paths (not stubs) with instance
- creation, compute dispatch, and buffer upload — but lack shader translation,
- bind group management, buffer map/unmap, textures, and render pipelines.
-
- See [`fawn/status.md`](../../status.md) for the full backend implementation matrix.
-
- ## Platform support
-
- | Platform | Architecture | Status |
- |----------|-------------|--------|
- | macOS | arm64 | Prebuilt, tested |
- | macOS | x64 | Not yet built |
- | Linux | x64 | Not yet built (Vulkan backend experimental) |
- | Windows | x64 | Not yet built (D3D12 backend experimental) |
+ **[Fawn](https://github.com/clocksmith/fawn/tree/main/nursery/webgpu-doe)** · **[npm](https://www.npmjs.com/package/@simulatte/webgpu-doe)** · **[simulatte.world](https://simulatte.world)**
 
  ## Install
 
@@ -110,10 +10,11 @@ See [`fawn/status.md`](../../status.md) for the full backend implementation matr
  npm install @simulatte/webgpu-doe
  ```
 
- The N-API addon compiles from C source on install via node-gyp. This requires
- a C compiler (`xcode-select --install` on macOS).
+ The published package currently targets `darwin-arm64`. The N-API addon builds
+ from C source during install via `node-gyp`, so a local C toolchain is
+ required (`xcode-select --install` on macOS).
 
- ## Usage
+ ## Quick start
 
  ```js
  import { create, globals } from '@simulatte/webgpu-doe';
@@ -144,7 +45,93 @@ const shader = device.createShaderModule({
  // ... create pipeline, bind group, encode, dispatch, readback
  ```
 
- ### Setup globals (navigator.gpu)
+ The package loads `libdoe_webgpu` and exposes a WebGPU-shaped API for
+ headless compute and basic render work. See [more examples](#more-examples)
+ below for `navigator.gpu` setup and provider inspection.
+
+ ## Why Doe
+
+ - Native path: JavaScript calls into an N-API addon, which loads
+   `libdoe_webgpu` and submits work through Doe's Metal backend.
+ - Runtime ownership: WGSL is compiled to MSL inside Doe's AST-based compiler
+   instead of going through Dawn.
+ - Small package payload: the shared library is about 2 MB on `darwin-arm64`.
+ - WebGPU-shaped surface: `requestAdapter`, `requestDevice`, buffer mapping,
+   bind groups, compute pipelines, command encoders, and basic render passes are
+   exposed directly from the package.
+
+ ## Status
+
+ Current package target:
+ - macOS arm64: prebuilt library and tested package path
+
+ Backend readiness:
+
+ | Backend | Compute | Render | WGSL compiler | Status |
+ |---------|---------|--------|---------------|--------|
+ | **Metal** (macOS) | Production | Basic (no vertex/index) | WGSL -> MSL (AST-based) | Ready for package use |
+ | **Vulkan** (Linux) | WIP | Not started | WGSL -> SPIR-V needed | Experimental |
+ | **D3D12** (Windows) | WIP | Not started | WGSL -> HLSL/DXIL needed | Experimental |
+
+ Metal currently covers the package's intended use: bind groups 0-3, buffer
+ map/unmap, indirect dispatch, `shader-f16`, subgroups, override constants,
+ workgroup shared memory, multiple entry points, textures, samplers, and basic
+ render-pass execution.
+
+ Vulkan and D3D12 already have native runtime paths for instance creation,
+ compute dispatch, and buffer upload, but they still need shader translation,
+ bind group management, buffer map/unmap, textures, and render pipelines.
+
+ Performance snapshot from the Fawn Dawn-vs-Doe harness on Apple Silicon with
+ strict comparability checks:
+
+ - Compute e2e: 1.5x faster (0.23ms vs 0.35ms, 4096 threads)
+ - Buffer upload: faster across 1 KB to 4 GB (8 sizes claimable)
+ - Atomics: workgroup atomic and non-atomic both claimable
+ - Matrix-vector multiply: 3 variants claimable
+ - Concurrent execution: claimable
+ - Zero-init workgroup memory: claimable
+ - Draw throughput: 200k draws claimable
+ - Binary size: about 2 MB vs Dawn's about 11 MB
+
+ 19 of 30 workloads are currently claimable. The remaining 11 are limited by
+ per-command Metal command buffer creation overhead (~350us vs Dawn's ~30us).
+ See `fawn/bench/` and [`status.md`](../../status.md) for methodology and the
+ broader backend matrix.
+
+ ## API surface
+
+ Compute:
+
+ - `create()` / `setupGlobals()` / `requestAdapter()` / `requestDevice()`
+ - `device.createBuffer()` / `device.createShaderModule()` (WGSL)
+ - `device.createComputePipeline()` / `device.createComputePipelineAsync()`
+ - `device.createBindGroupLayout()` / `device.createBindGroup()`
+ - `device.createPipelineLayout()` / `pipeline.getBindGroupLayout()`
+ - `device.createCommandEncoder()` / `encoder.beginComputePass()`
+ - `pass.setPipeline()` / `pass.setBindGroup()` / `pass.dispatchWorkgroups()`
+ - `pass.dispatchWorkgroupsIndirect()`
+ - `encoder.copyBufferToBuffer()` / `queue.submit()` / `queue.writeBuffer()`
+ - `buffer.mapAsync()` / `buffer.getMappedRange()` / `buffer.unmap()`
+ - `queue.onSubmittedWorkDone()`
+
+ Render:
+
+ - `device.createTexture()` / `texture.createView()` / `device.createSampler()`
+ - `device.createRenderPipeline()` / `encoder.beginRenderPass()`
+ - `renderPass.setPipeline()` / `renderPass.draw()` / `renderPass.end()`
+
+ Device capabilities:
+
+ - `device.limits` / `adapter.limits`
+ - `device.features` / `adapter.features` with `shader-f16`
+
+ Current gaps:
+ - Canvas and surface presentation
+ - Vertex and index buffer binding in render passes
+ - Full render pipeline descriptor parsing
+
+ ## More examples
 
  ```js
  import { setupGlobals } from '@simulatte/webgpu-doe';
package/native/doe_napi.c CHANGED
@@ -813,7 +813,7 @@ static napi_value doe_create_shader_module(napi_env env, napi_callback_info info
 
  WGPUShaderModule mod = pfn_wgpuDeviceCreateShaderModule(device, &desc);
  free(code);
- if (!mod) NAPI_THROW(env, "createShaderModule failed");
+ if (!mod) NAPI_THROW(env, "createShaderModule failed (WGSL translation or compilation error — check stderr for details)");
  return wrap_ptr(env, mod);
  }
 
@@ -856,7 +856,7 @@ static napi_value doe_create_compute_pipeline(napi_env env, napi_callback_info i
 
  WGPUComputePipeline pipeline = pfn_wgpuDeviceCreateComputePipeline(device, &desc);
  free(ep);
- if (!pipeline) NAPI_THROW(env, "createComputePipeline failed");
+ if (!pipeline) NAPI_THROW(env, "createComputePipeline failed (shader module invalid or entry point not found — check stderr for details)");
  return wrap_ptr(env, pipeline);
  }
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@simulatte/webgpu-doe",
- "version": "0.1.1",
+ "version": "0.1.2",
  "description": "Doe WebGPU runtime for Node.js — native Zig GPU backends (Metal, Vulkan) via N-API",
  "type": "module",
  "main": "./src/index.js",