@simulatte/doppler 0.1.0 → 0.1.1
- package/README.md +21 -16
- package/package.json +3 -5
package/README.md
CHANGED

@@ -1,6 +1,6 @@
 # @simulatte/doppler
 
-
+Inference and training on raw WebGPU. Pure JS + WGSL.
 
 **[Live Demo](https://d4da.com)** · **[npm](https://www.npmjs.com/package/@simulatte/doppler)** · **[simulatte.world](https://simulatte.world)**
 
@@ -22,19 +22,25 @@ for await (const token of model.generate('Hello, world')) {
 }
 ```
 
-
+Tokens stream from a native `AsyncGenerator`. See [more examples](#more-examples) below or the full [API contract](docs/doppler-api-contract.md).
 
-##
+## Why Doppler
 
-
-
-
-
-
-
-
-
-
+**JS → WGSL → WebGPU.** One hop to the GPU. No ONNX runtime, no WASM blob, no bridge layer.
+
+**`for await` streaming.** Not callbacks. Not a `TextStreamer` class. A loop.
+
+**LoRA hot-swap.** Swap adapters at runtime without reloading the base model.
+
+**Independent model instances.** Run multiple models concurrently. Each owns its pipeline, buffers, and KV cache.
+
+## Under the Hood
+
+- Sharded weight loading via OPFS. Gigabytes into VRAM without blocking the main thread.
+- Quantized inference: Q4K, Q8, F16. Real models on consumer GPUs.
+- Kernel hot-swap between prefill and decode paths.
+- Config-driven runtime. Presets, kernel path selection, and sampling are policy, not code.
+- Reproducible benchmarks with deterministic knobs and auditable kernel traces.
 
 ## Browser Support
 
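The `for await` loop in the hunk context above is ordinary JavaScript async iteration. A minimal runnable sketch of that consumption pattern, using a stub whose `generate` mimics the same shape — the stub model and its fixed token list are illustrative assumptions, not Doppler's implementation:

```javascript
// Stub model whose generate() is an async generator, mirroring the shape of
// the README's `for await (const token of model.generate(...))` loop.
// The token list is made up for illustration; a real model yields tokens
// from successive GPU decode steps.
const model = {
  async *generate(prompt) {
    for (const token of [prompt, ', ', 'world', '!']) {
      yield token; // one token per decode step
    }
  },
};

// Consume the stream exactly as the README shows: a loop, no callbacks.
async function collect(prompt) {
  let text = '';
  for await (const token of model.generate(prompt)) {
    text += token; // tokens arrive incrementally
  }
  return text;
}

collect('Hello').then((text) => console.log(text)); // logs "Hello, world!"
```

Because `generate` is a native `AsyncGenerator`, cancellation also falls out of the language: `break`-ing out of the loop calls the generator's `return()`, so a conforming implementation can release per-request state there.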
@@ -46,12 +52,11 @@ That's it. Streaming is the default. See [more examples](#more-examples) below o
 
 ## Evidence
 
-Lower is better, comparing per-phase latency by workload.
-
 
 
-Snapshot
-- [g3-p064-d064-t0-k1.
+Snapshot artifacts:
+- [g3-1b-p064-d064-t0-k1.compare.json](benchmarks/vendors/fixtures/g3-1b-p064-d064-t0-k1.compare.json)
+- [lfm2-5-1-2b-p064-d064-t0-k1.compare.json](benchmarks/vendors/fixtures/lfm2-5-1-2b-p064-d064-t0-k1.compare.json)
 
 ## More Examples
 
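The "Quantized inference: Q4K, Q8, F16" bullet added in the README above names block-quantized weight formats. As background on the underlying idea only — this is generic symmetric int8 (Q8-style) quantization in plain JS, not Doppler's WGSL kernels, which do this per block on the GPU:

```javascript
// Generic sketch of symmetric int8 quantization: store one float scale per
// block plus int8 codes, cutting weight memory roughly 4x vs float32.
function quantizeQ8(weights) {
  // One scale per block so the int8 range covers the block's largest value.
  const amax = Math.max(...weights.map(Math.abs));
  const scale = amax / 127 || 1; // guard against an all-zero block
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { scale, q };
}

function dequantizeQ8({ scale, q }) {
  return Float32Array.from(q, (v) => v * scale);
}

const block = [0.02, -0.5, 0.31, 1.0];
const restored = dequantizeQ8(quantizeQ8(block));
// Per-weight round-trip error is bounded by scale / 2 (≈ 0.004 here).
console.log(restored.length); // logs 4
```

Q4K-style formats push the same trade further (4-bit codes, hierarchical scales), which is what makes "real models on consumer GPUs" fit in VRAM.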
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@simulatte/doppler",
-  "version": "0.1.0",
+  "version": "0.1.1",
   "description": "Browser-native WebGPU inference engine for local intent and inference loops",
   "main": "src/index.js",
   "types": "src/index.d.ts",
@@ -134,11 +134,9 @@
     "tools/convert-safetensors-node.js"
   ],
   "devDependencies": {
+    "@huggingface/transformers": "^3.8.1",
     "jest": "^30.2.0",
+    "onnxruntime-web": "^1.24.1",
     "playwright": "^1.58.2"
-  },
-  "dependencies": {
-    "@huggingface/transformers": "^3.8.1",
-    "onnxruntime-web": "^1.24.1"
   }
 }