web-llm-runner 0.1.17 → 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +16 -0
- package/lib/index.js +45 -540
- package/lib/index.js.map +1 -1
- package/lib/onnx_engine.d.ts.map +1 -1
- package/lib/wrapper/WebLLMWrapper.d.ts.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -84,3 +84,19 @@ Safely removes a downloaded model completely from the browser's local cache to i
 ```javascript
 await llm.delete_model("TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC");
 ```
+
+---
+
+## 📱 Hybrid Inference & Device Sensitivity
+
+`web-llm-runner` features an intelligent hybrid engine that automatically selects the best inference backend for your hardware:
+
+- **WebGPU (Primary)**: Uses high-performance hardware acceleration on supported desktop and mobile browsers.
+- **ONNX/WASM (Fallback)**: Automatically switches to a CPU-based fallback for older devices or browsers without WebGPU support (like many current mobile browsers).
+
+> [!NOTE]
+> **Streaming Support**: While WebGPU streaming is fully stable, ONNX-based streaming is currently in active development. Performance and responsiveness may vary on low-resource mobile devices.
+
+## ⚖ License
+
+MIT
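The WebGPU-versus-fallback selection the new README section describes can be sketched with standard feature detection on `navigator.gpu` (the WebGPU entry point). This is an illustrative sketch only: `chooseBackend` is a hypothetical helper, not part of the `web-llm-runner` API, and the real package's selection logic may differ.

```javascript
// Hypothetical sketch of hybrid backend selection, mirroring the
// WebGPU-primary / ONNX-WASM-fallback strategy the README describes.
// `navigator.gpu` is the standard WebGPU entry point; when it is absent
// (older devices, many current mobile browsers), fall back to CPU/WASM.
function chooseBackend(globalScope) {
  if (globalScope.navigator && globalScope.navigator.gpu) {
    return "webgpu"; // hardware-accelerated primary path
  }
  return "onnx-wasm"; // CPU-based fallback path
}

// In a browser you would call chooseBackend(globalThis);
// here both cases are simulated with plain objects:
console.log(chooseBackend({ navigator: { gpu: {} } })); // "webgpu"
console.log(chooseBackend({ navigator: {} })); // "onnx-wasm"
```

Keeping the decision in one pure function like this makes both paths easy to unit-test without a real GPU.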