@tryhamster/gerbil 1.0.0-rc.8 → 1.0.0
- package/LICENSE +1 -1
- package/README.md +247 -84
- package/dist/architectures-C1I5V3Dt.mjs +6070 -0
- package/dist/architectures-C1I5V3Dt.mjs.map +1 -0
- package/dist/browser/index.d.ts +264 -588
- package/dist/browser/index.d.ts.map +1 -1
- package/dist/browser/index.js +585 -2334
- package/dist/browser/index.js.map +1 -1
- package/dist/cli.mjs +625 -1098
- package/dist/cli.mjs.map +1 -1
- package/dist/defaults-9komdrbY.mjs +24 -0
- package/dist/defaults-9komdrbY.mjs.map +1 -0
- package/dist/frameworks/express.d.mts +1 -3
- package/dist/frameworks/express.d.mts.map +1 -1
- package/dist/frameworks/express.mjs +7 -7
- package/dist/frameworks/express.mjs.map +1 -1
- package/dist/frameworks/fastify.d.mts +1 -1
- package/dist/frameworks/fastify.d.mts.map +1 -1
- package/dist/frameworks/fastify.mjs +3 -3
- package/dist/frameworks/fastify.mjs.map +1 -1
- package/dist/frameworks/hono.d.mts +1 -1
- package/dist/frameworks/hono.d.mts.map +1 -1
- package/dist/frameworks/hono.mjs +4 -4
- package/dist/frameworks/hono.mjs.map +1 -1
- package/dist/frameworks/next.d.mts +3 -2
- package/dist/frameworks/next.d.mts.map +1 -1
- package/dist/frameworks/next.mjs +4 -4
- package/dist/frameworks/next.mjs.map +1 -1
- package/dist/frameworks/react.d.mts +1 -1
- package/dist/frameworks/trpc.d.mts +1 -1
- package/dist/frameworks/trpc.d.mts.map +1 -1
- package/dist/frameworks/trpc.mjs +4 -4
- package/dist/frameworks/trpc.mjs.map +1 -1
- package/dist/gerbil-BHrJJIa4.mjs +1656 -0
- package/dist/gerbil-BHrJJIa4.mjs.map +1 -0
- package/dist/gerbil-BT9fCydo.d.mts +488 -0
- package/dist/gerbil-BT9fCydo.d.mts.map +1 -0
- package/dist/gerbil-DomNfIr1.mjs +4 -0
- package/dist/gpu/hooks.d.mts +520 -0
- package/dist/gpu/hooks.d.mts.map +1 -0
- package/dist/gpu/hooks.mjs +1188 -0
- package/dist/gpu/hooks.mjs.map +1 -0
- package/dist/gpu/index.d.mts +2 -0
- package/dist/gpu/index.mjs +6 -0
- package/dist/gpu-33qCAtHW.mjs +3615 -0
- package/dist/gpu-33qCAtHW.mjs.map +1 -0
- package/dist/index-Dgmb2kE3.d.mts +245 -0
- package/dist/index-Dgmb2kE3.d.mts.map +1 -0
- package/dist/index-jEAL2s-A.d.mts +2022 -0
- package/dist/index-jEAL2s-A.d.mts.map +1 -0
- package/dist/index.d.mts +22 -487
- package/dist/index.d.mts.map +1 -1
- package/dist/index.mjs +13 -8
- package/dist/index.mjs.map +1 -1
- package/dist/indexeddb-store-BWIMtxxH.mjs +103 -0
- package/dist/indexeddb-store-BWIMtxxH.mjs.map +1 -0
- package/dist/indexeddb-store-ClH12Xnl.mjs +4 -0
- package/dist/integrations/ai-sdk.d.mts +75 -6
- package/dist/integrations/ai-sdk.d.mts.map +1 -1
- package/dist/integrations/ai-sdk.mjs +131 -15
- package/dist/integrations/ai-sdk.mjs.map +1 -1
- package/dist/integrations/langchain.d.mts +1 -1
- package/dist/integrations/langchain.d.mts.map +1 -1
- package/dist/integrations/langchain.mjs +5 -5
- package/dist/integrations/langchain.mjs.map +1 -1
- package/dist/integrations/llamaindex.d.mts +1 -1
- package/dist/integrations/llamaindex.d.mts.map +1 -1
- package/dist/integrations/llamaindex.mjs +5 -5
- package/dist/integrations/llamaindex.mjs.map +1 -1
- package/dist/integrations/mcp-client.mjs +3 -3
- package/dist/integrations/mcp-client.mjs.map +1 -1
- package/dist/integrations/mcp.d.mts +3 -2
- package/dist/integrations/mcp.d.mts.map +1 -1
- package/dist/integrations/mcp.mjs +5 -5
- package/dist/{mcp-BvbriaBy.mjs → mcp-1DaMsaBc.mjs} +4 -4
- package/dist/mcp-1DaMsaBc.mjs.map +1 -0
- package/dist/memory/index.d.mts +3 -0
- package/dist/memory/index.mjs +6 -0
- package/dist/memory-D1P7Tmda.mjs +4 -0
- package/dist/memory-DVN0MnIG.mjs +132 -0
- package/dist/memory-DVN0MnIG.mjs.map +1 -0
- package/dist/memory-Dj0J1v88.mjs +294 -0
- package/dist/memory-Dj0J1v88.mjs.map +1 -0
- package/dist/moonshine-stt-BLyVoRpB.mjs +4 -0
- package/dist/moonshine-stt-v_P_Ci_m.mjs +11936 -0
- package/dist/moonshine-stt-v_P_Ci_m.mjs.map +1 -0
- package/dist/{one-liner-s-lD8rCC.mjs → one-liner-DnQn7HJK.mjs} +14 -16
- package/dist/one-liner-DnQn7HJK.mjs.map +1 -0
- package/dist/repl-jV5gcJFA.mjs +9 -0
- package/dist/skills/index.d.mts +270 -320
- package/dist/skills/index.d.mts.map +1 -1
- package/dist/skills/index.mjs +5 -5
- package/dist/{skills-CD3Orlex.mjs → skills-DX8D59UH.mjs} +187 -32
- package/dist/skills-DX8D59UH.mjs.map +1 -0
- package/dist/{tools-Bi1P7Xoy.mjs → tools-DQ1mPUw5.mjs} +34 -22
- package/dist/tools-DQ1mPUw5.mjs.map +1 -0
- package/dist/{types-CiTc7ez3.d.mts → types-D6FiR_oh.d.mts} +106 -12
- package/dist/types-D6FiR_oh.d.mts.map +1 -0
- package/dist/types-DQBe2lFo.d.mts +165 -0
- package/dist/types-DQBe2lFo.d.mts.map +1 -0
- package/dist/{utils-CZBZ8dgR.mjs → utils-DKO55ZmZ.mjs} +1 -1
- package/dist/{utils-CZBZ8dgR.mjs.map → utils-DKO55ZmZ.mjs.map} +1 -1
- package/dist/vector-B0panuy6.mjs +95 -0
- package/dist/vector-B0panuy6.mjs.map +1 -0
- package/docs/PROJECT-STATE.md +321 -0
- package/docs/adding-a-model-family.md +280 -0
- package/docs/ai-sdk.md +70 -61
- package/docs/architecture/overview.md +17 -7
- package/docs/browser.md +203 -8
- package/docs/embeddings.md +156 -0
- package/docs/gerbil-site-native-migration.md +217 -0
- package/docs/gpu-engine/architectures.md +398 -0
- package/docs/gpu-engine/ir.md +372 -0
- package/docs/gpu-engine/kernels.md +718 -0
- package/docs/gpu-engine/paper.html +1759 -0
- package/docs/gpu-engine/paper.md +2109 -0
- package/docs/gpu-engine/safetensors.md +312 -0
- package/docs/gpu-engine/tokenizer.md +302 -0
- package/docs/memory-rag.md +91 -0
- package/docs/metal-safari-intel.md +190 -0
- package/docs/mobile-failure-diagnosis.md +124 -0
- package/docs/mobile.md +99 -0
- package/docs/observability.md +230 -0
- package/docs/onnx-removal-plan.md +339 -0
- package/docs/research/autoresearch-portable.md +904 -0
- package/docs/research/dispatch-reduction-hivemind.md +84 -0
- package/docs/research/ios-safari-model-caching.md +117 -0
- package/docs/research/mobile-webgpu-speed-fusion.md +135 -0
- package/docs/research/native-stt-model-selection.md +49 -0
- package/docs/research/native-tts-model-selection.md +90 -0
- package/docs/research/native-vs-chromium-decision.md +152 -0
- package/docs/research/nemotron-mamba2-inference.md +910 -0
- package/docs/research/qwen35-multimodal.md +293 -0
- package/docs/research/qwen36-gemma4-targets.md +337 -0
- package/docs/research/sota-embedding-models.md +179 -0
- package/docs/research/sota-mobile-models-2026.md +263 -0
- package/docs/research/sota-modality-models.md +202 -0
- package/docs/research/tps-baselines.md +71 -0
- package/docs/research/webgpu-m4-reference.md +104 -0
- package/docs/site-update-plan.md +155 -0
- package/docs/structured-output.md +123 -0
- package/docs/stt.md +63 -446
- package/docs/tts.md +77 -499
- package/docs/vision.md +100 -338
- package/package.json +22 -7
- package/dist/chrome-backend-CORwaIyC.mjs +0 -1212
- package/dist/chrome-backend-CORwaIyC.mjs.map +0 -1
- package/dist/chrome-backend-DIKYoWj-.mjs +0 -3
- package/dist/gerbil-CJ3ifloF.mjs +0 -4
- package/dist/gerbil-Dw4Qj77e.mjs +0 -1631
- package/dist/gerbil-Dw4Qj77e.mjs.map +0 -1
- package/dist/gerbil-qOTe1nl2.d.mts +0 -431
- package/dist/gerbil-qOTe1nl2.d.mts.map +0 -1
- package/dist/kokoro-BNTb6egA.mjs +0 -20210
- package/dist/kokoro-BNTb6egA.mjs.map +0 -1
- package/dist/kokoro-DFRQ1OeM.js +0 -20212
- package/dist/kokoro-DFRQ1OeM.js.map +0 -1
- package/dist/mcp-BvbriaBy.mjs.map +0 -1
- package/dist/one-liner-s-lD8rCC.mjs.map +0 -1
- package/dist/repl-DveXw36T.mjs +0 -9
- package/dist/skills-CD3Orlex.mjs.map +0 -1
- package/dist/stt-CpLYbGFd.mjs +0 -433
- package/dist/stt-CpLYbGFd.mjs.map +0 -1
- package/dist/stt-DRPLEEHB.mjs +0 -3
- package/dist/stt-Te8Qz-Ay.js +0 -433
- package/dist/stt-Te8Qz-Ay.js.map +0 -1
- package/dist/tools-Bi1P7Xoy.mjs.map +0 -1
- package/dist/transformers.web-DokyH3rP.js +0 -3
- package/dist/transformers.web-M6mCnEYJ.js +0 -30382
- package/dist/transformers.web-M6mCnEYJ.js.map +0 -1
- package/dist/tts-C0xx3CtE.js +0 -724
- package/dist/tts-C0xx3CtE.js.map +0 -1
- package/dist/tts-DXgsKGCe.mjs +0 -3
- package/dist/tts-DeGANMNV.mjs +0 -730
- package/dist/tts-DeGANMNV.mjs.map +0 -1
- package/dist/types-CiTc7ez3.d.mts.map +0 -1
- /package/dist/{auto-update-S9s5-g0C.mjs → auto-update-BVaLXcDE.mjs} +0 -0
- /package/dist/{chunk-CkXuGtQK.mjs → chunk-B9cbKln6.mjs} +0 -0
- /package/dist/{microphone-DaMZFRuR.mjs → microphone-Bqmoz9_K.mjs} +0 -0
|
@@ -0,0 +1,339 @@
|
|
|
1
|
+
# ONNX / transformers.js Removal Plan — Going Pure Native
|
|
2
|
+
|
|
3
|
+
> Status: **PLANNING DOC**. No code is deleted by this document. It is the
|
|
4
|
+
> authoritative inventory + phased plan to hand to a removal workflow.
|
|
5
|
+
>
|
|
6
|
+
> Branch: `feat/webgpu-engine-mobile`
|
|
7
|
+
> Goal: remove the legacy ONNX Runtime / transformers.js / puppeteer-Chrome
|
|
8
|
+
> inference stack so Gerbil runs **pure native** on the WebGPU engine
|
|
9
|
+
> (`src/gpu/`), across text (Qwen3.5 / LFM2.5 / Gemma4), vision (Qwen3.5 ViT,
|
|
10
|
+
> Gemma4 vision), embeddings (EmbeddingGemma / Qwen3-Embedding), STT (Moonshine)
|
|
11
|
+
> and TTS (Kani).
|
|
12
|
+
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
## 0. TL;DR — Headline Findings
|
|
16
|
+
|
|
17
|
+
- **~10,800 lines of inference-stack code become deletable** once the native
|
|
18
|
+
paths are wired in and the browser/REPL consumers are migrated. The biggest
|
|
19
|
+
single chunks: `chrome-backend.ts` (1,836), `worker.ts` (1,056),
|
|
20
|
+
`tts.ts` (1,031), `worker-entry.ts` (599), `stt.ts` (605),
|
|
21
|
+
`worker-code.generated.ts` (1.35 MB generated bundle).
|
|
22
|
+
- **3 npm deps drop out:** `puppeteer-core`, `@huggingface/transformers`,
|
|
23
|
+
`onnxruntime-web`. Likely also `kokoro-js` (ONNX-backed TTS) and possibly
|
|
24
|
+
`@huggingface/hub` (only if nothing else needs HF tree/download APIs).
|
|
25
|
+
- **The "all modalities are native" premise is only true inside `src/gpu/`.**
|
|
26
|
+
`WebGPUEngine` (`src/gpu/index.ts`) exposes native `generate`, `encodeImage`,
|
|
27
|
+
`describeImage`, `embed`, `speak` and `transcribe` (Moonshine). **But the
|
|
28
|
+
public surfaces — the core Node `Gerbil` class and the browser hooks — do NOT
|
|
29
|
+
yet route STT / TTS / embeddings to the native engine.** They still call ONNX.
|
|
30
|
+
**This wiring is the prerequisite for removal**, not an afterthought.
|
|
31
|
+
|
|
32
|
+
### The three decisions the user must make
|
|
33
|
+
|
|
34
|
+
1. **CPU/WASM fallback (transformers.js wasm) — keep or kill?** Pure-native
|
|
35
|
+
means *WebGPU required*. Dropping the wasm path removes the only way to run
|
|
36
|
+
on machines/browsers without WebGPU (older Safari, locked-down enterprise,
|
|
37
|
+
CI without a GPU). Recommendation: **kill it** for pure-native, but this is a
|
|
38
|
+
reach tradeoff the user owns. (Detail in §3a.)
|
|
39
|
+
2. **chrome-backend.ts (puppeteer headless Chrome) — remove now?** It was the
|
|
40
|
+
Node-side WebGPU path before node-dawn/native. It is already legacy. **Yes,
|
|
41
|
+
safe to remove** — but it has surprisingly wide tentacles in the CLI/REPL
|
|
42
|
+
(cache listing, process management commands). (Detail in §3b.)
|
|
43
|
+
3. **Browser ONNX worker + hooks — remove once site/REPL stop using them?**
|
|
44
|
+
The chat worker (`worker.ts` + `worker-entry.ts` + the 1.35 MB
|
|
45
|
+
`worker-code.generated.ts`) and `use-embedding`/`use-speech` ONNX paths can
|
|
46
|
+
go once `use-native-engine` covers chat and native embed/speech hooks exist.
|
|
47
|
+
(Detail in §3c.)
|
|
48
|
+
|
|
49
|
+
---
|
|
50
|
+
|
|
51
|
+
## 1. Full Inventory — File by File
|
|
52
|
+
|
|
53
|
+
Legend: **[DELETE]** = remove entirely · **[GUT]** = strip ONNX path, keep file ·
|
|
54
|
+
**[KEEP]** = native, no change · **[REWIRE]** = repoint to native then clean.
|
|
55
|
+
|
|
56
|
+
### 1a. Core (`src/core/`)
|
|
57
|
+
|
|
58
|
+
| File | Lines | ONNX/legacy usage | Disposition |
|
|
59
|
+
|---|---:|---|---|
|
|
60
|
+
| `chrome-backend.ts` | 1,836 | `puppeteer-core` headless-Chrome WebGPU backend. `ChromeGPUBackend`, `getChromeCachedModels`, global browser/process management. | **[DELETE]** (after CLI/REPL detached) |
|
|
61
|
+
| `stt.ts` | 605 | `WhisperSTT`: `pipeline("automatic-speech-recognition", "onnx-community/whisper-*")` via `@huggingface/transformers`, ORT wasm CDN. | **[DELETE]** → replace with native Moonshine wrapper |
|
|
62
|
+
| `tts.ts` | 1,031 | `KokoroTTS` (`kokoro-js`, ONNX) + `SupertonicTTS` (`pipeline("text-to-speech")`, ONNX). | **[DELETE]** → replace with native Kani wrapper |
|
|
63
|
+
| `model-compat.ts` | 79 | `patchHFCacheConfig` — patches HF cache so transformers.js loads new model_types (qwen3_5→qwen3). Pure ONNX-loader workaround. | **[DELETE]** (native loader reads safetensors directly, no patch needed) |
|
|
64
|
+
| `gerbil.ts` | 2,686 | The hub. `import { AutoModelForCausalLM, AutoModelForImageTextToText, pipeline, ... } from "@huggingface/transformers"`. `effectiveBackend` switch with `onnx` / `webgpu-native` / chrome branches. `this.model` (ONNX), `this.visionModel` (ONNX), `this.chromeBackend`, `this.embedder` (ONNX pipeline). Native path = `this.webgpuEngine`. `embed()`/`embedBatch()` still ONNX (`pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2")`). STT/TTS delegate to ONNX `WhisperSTT`/`createTTS`. | **[GUT]** — collapse to native-only; remove all ONNX branches, chrome branch, `pipeline` import, model-compat call |
|
|
65
|
+
|
|
66
|
+
Key ONNX surfaces inside `gerbil.ts`:
|
|
67
|
+
- Imports (lines ~6-21): `AutoModelForCausalLM`, `AutoModelForImageTextToText`,
|
|
68
|
+
`pipeline as rawPipeline`, etc.
|
|
69
|
+
- `effectiveBackend` switch (~386-393): three-way `onnx` / `webgpu-native` /
|
|
70
|
+
auto. After removal this becomes always-native.
|
|
71
|
+
- Branches: `isBrowser && webgpu && onnx` (~440), `!isBrowser && webgpu && onnx`
|
|
72
|
+
→ chrome backend (~467-486), CPU/WASM pipeline fallback (~488+), error
|
|
73
|
+
fallback to pipeline (~525-530).
|
|
74
|
+
- `loadVisionModel` chrome branch (~600-620).
|
|
75
|
+
- `generate`/`generateStream` dispatch: `if (this.webgpuEngine)` (native) vs
|
|
76
|
+
`else if (this.chromeBackend)` vs `else this.model.generate` (ONNX) — lines
|
|
77
|
+
~1235-1300, ~1424-1510.
|
|
78
|
+
- `embed()` (~1824) / `embedBatch()` (~1849): pure ONNX, **no native branch yet**.
|
|
79
|
+
- STT/TTS methods (~2058-2310): delegate to ONNX `tts`/`stt` objects.
|
|
80
|
+
- Chrome status/memory passthroughs (~1055-1101), getModelRepo ONNX maps
|
|
81
|
+
(~1026-1041).
|
|
82
|
+
- Cleanup (~2388-2461): disposes webgpuEngine, chromeBackend, tts, stt.
|
|
83
|
+
|
|
84
|
+
### 1b. Browser (`src/browser/`)
|
|
85
|
+
|
|
86
|
+
| File | Lines | ONNX/legacy usage | Disposition |
|
|
87
|
+
|---|---:|---|---|
|
|
88
|
+
| `worker.ts` | 1,056 | Chat web-worker host. Builds inline worker that imports `@huggingface/transformers` from CDN, ORT wasm paths, `AutoTokenizer`/`AutoModelForCausalLM`/`AutoModelForImageTextToText`. iOS main-thread inference path. | **[DELETE]** (after chat → `use-native-engine`) |
|
|
89
|
+
| `worker-entry.ts` | 599 | The real worker source compiled to the IIFE. Imports transformers v4, ORT wasm CDN, vision+text ONNX. | **[DELETE]** |
|
|
90
|
+
| `worker-code.generated.ts` | ~1.35 MB | Generated IIFE bundle of `worker-entry.ts` (transformers.js fully inlined). Produced by `scripts/build-worker.mjs`. | **[DELETE]** + delete generator + build step |
|
|
91
|
+
| `use-embedding.ts` | 485 | Worker that imports transformers.js from CDN, `pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2")`, ORT wasm, iOS single-thread. | **[GUT/REWIRE]** to native embed (EmbeddingGemma via `WebGPUEngine.embed`) |
|
|
92
|
+
| `use-speech.ts` | 689 | TTS hook: Kokoro (`kokoro-js`) + Supertonic (`pipeline("text-to-speech")`). All ONNX. | **[REWIRE]** to native Kani |
|
|
93
|
+
| `use-voice-input.ts` | 1,227 | STT+chat+TTS combo hook. Uses `createGerbilWorker` (ONNX chat worker) + ONNX STT/TTS models (`whisper-tiny.en`, `kokoro-82m`). | **[REWIRE]** to native engine + native STT/TTS |
|
|
94
|
+
| `backend-selector.ts` | 245 | `selectBackend` returning `webgpu`/`wasm`/`cpu`, fallback chains, iOS wasm fallbacks. The wasm/cpu half is ONNX-fallback machinery. | **[GUT]** — drop wasm/cpu fallback (decision §3a) |
|
|
95
|
+
| `use-native-engine.ts` | 217 | None. Already native: "no ONNX Runtime, no transformers.js." This is the template the others migrate to. | **[KEEP]** |
|
|
96
|
+
| `index.ts` | — | Re-exports worker, use-embedding, use-speech, use-voice-input, backend-selector. Notes core Gerbil excluded due to chrome-backend/puppeteer. | **[REWIRE]** exports after deletions |
|
|
97
|
+
|
|
98
|
+
Other browser files referencing legacy (mostly comments/guards): `preload.ts`,
|
|
99
|
+
`device-guards.ts`, `download.ts`, `webgpu.ts` — touch only as needed (mostly
|
|
100
|
+
ONNX wasm path hints / iOS guards). Low priority.
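
As a rough cross-check of the inventory so far, the line counts in the §1a and §1b tables can be totalled per disposition. The sketch below uses only the counts stated in those two tables; `InventoryRow` and `linesByDisposition` are illustrative names, not part of the codebase. `use-embedding.ts` (marked **[GUT/REWIRE]**) is counted as REWIRE here, and the generated worker bundle is left out because the table gives its size in MB rather than lines.

```typescript
// Illustrative only: rows copied from the §1a / §1b tables above.
type Disposition = "DELETE" | "GUT" | "REWIRE" | "KEEP";

interface InventoryRow {
  file: string;
  lines: number;
  disposition: Disposition;
}

const inventory: InventoryRow[] = [
  { file: "core/chrome-backend.ts", lines: 1836, disposition: "DELETE" },
  { file: "core/stt.ts", lines: 605, disposition: "DELETE" },
  { file: "core/tts.ts", lines: 1031, disposition: "DELETE" },
  { file: "core/model-compat.ts", lines: 79, disposition: "DELETE" },
  { file: "core/gerbil.ts", lines: 2686, disposition: "GUT" },
  { file: "browser/worker.ts", lines: 1056, disposition: "DELETE" },
  { file: "browser/worker-entry.ts", lines: 599, disposition: "DELETE" },
  { file: "browser/use-embedding.ts", lines: 485, disposition: "REWIRE" },
  { file: "browser/use-speech.ts", lines: 689, disposition: "REWIRE" },
  { file: "browser/use-voice-input.ts", lines: 1227, disposition: "REWIRE" },
  { file: "browser/backend-selector.ts", lines: 245, disposition: "GUT" },
  { file: "browser/use-native-engine.ts", lines: 217, disposition: "KEEP" },
];

// Sum the stated line counts for each disposition.
function linesByDisposition(rows: InventoryRow[]): Record<Disposition, number> {
  const totals: Record<Disposition, number> = { DELETE: 0, GUT: 0, REWIRE: 0, KEEP: 0 };
  for (const row of rows) {
    totals[row.disposition] += row.lines;
  }
  return totals;
}

const totals = linesByDisposition(inventory);
console.log(totals);
```

Whole-file deletions with a stated line count come to 5,206 lines. The rest of the ~10,800 estimate would have to come from the gutted and rewired files and from the CLI/REPL, in proportions the tables do not state.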
|
|
101
|
+
|
|
102
|
+
### 1c. CLI / REPL (`src/cli/`) — see §4 for full audit
|
|
103
|
+
|
|
104
|
+
| File | Lines | usage | Disposition |
|
|
105
|
+
|---|---:|---|---|
|
|
106
|
+
| `index.ts` | 1,289 | `--backend native\|onnx\|auto` flag (maps to webgpu-native/onnx). `transcribe`, `tts`, `voice` commands (ONNX STT/TTS). `chrome` admin command (`ChromeGPUBackend.getGlobalBrowserStatus`, `killAllBackends`, kill page, ps/pkill of `gerbil/chrome-cache`). transformers cache dir scanning. | **[REWIRE]** drop onnx/auto backend options → native-only; delete `chrome` command; repoint STT/TTS |
|
|
107
|
+
| `repl/utils.ts` | 738 | `getChromeCachedModels`, `refreshCachedModelSizes`, transformers cache dir, `.onnx` file detection, HF `/tree/main/onnx` fetch picking q4f16/q4 ONNX files, `onnx-community/*-ONNX` model registry. | **[GUT]** remove chrome + ONNX file logic; native models load safetensors |
|
|
108
|
+
| `repl/App.tsx` | 1,340 | `backend` prop ("auto"/"webgpu-native"/"onnx"), device switch webgpu/cpu, ttsModel kokoro / sttModel whisper defaults, transformers.js disposal comment. | **[REWIRE]** native-only |
|
|
109
|
+
| `repl/index.tsx` | — | Chrome cleanup delay comment, backend prop. | **[GUT]** |
|
|
110
|
+
| `repl/views/ModelView.tsx`, `ChatView.tsx`, `FrameworksView.tsx` | — | reference chrome-backend / onnx model ids. | **[GUT]** |
|
|
111
|
+
|
|
112
|
+
### 1d. Skills / Integrations
|
|
113
|
+
|
|
114
|
+
| File | usage | Disposition |
|
|
115
|
+
|---|---|---|
|
|
116
|
+
| `skills/builtin/transcribe.ts` (77) | Enum of whisper ONNX model ids; calls `g.transcribe`. | **[REWIRE]** to native STT model ids (Moonshine) |
|
|
117
|
+
| `integrations/langchain.ts` | `GerbilEmbeddings` → `g.embed`/`g.embedBatch`. No direct ONNX, rides on core `embed()`. | **[KEEP]** (auto-native once core `embed()` is native) |
|
|
118
|
+
| `integrations/ai-sdk.ts` | Embedding settings → `g.embed`. | **[KEEP]** |
|
|
119
|
+
| `src/index.ts` | Exports `ChromeGPUBackend`, `getChromeCachedModels`, `WhisperSTT`, `WHISPER_MODELS`, `KokoroTTS`, `TTS_MODELS`, STT/TTS types. | **[REWIRE]** remove chrome + ONNX-class exports; export native equivalents |
|
|
120
|
+
|
|
121
|
+
### 1e. Native engine (`src/gpu/`) — the target, all KEEP
|
|
122
|
+
|
|
123
|
+
`index.ts` (`WebGPUEngine`: `generate`, `encodeImage`, `describeImage`, `embed`,
|
|
124
|
+
`speak`, `transcribe`), `moonshine-stt.ts` / `moonshine-executor.ts` (native
|
|
125
|
+
STT), `kani-tts.ts` (native TTS), `vision-executor.ts` / `vision-preprocess.ts`,
|
|
126
|
+
`model-loader.ts` (safetensors), `architectures/*` (qwen2, qwen3_5, qwen3_5_vision,
|
|
127
|
+
lfm2, gemma4, gemma4_vision, gemma3_encoder, moonshine, kani_tts, omnivoice).
|
|
128
|
+
**No changes — this is what everything migrates onto.**
|
|
129
|
+
|
|
130
|
+
### 1f. Build / deps
|
|
131
|
+
|
|
132
|
+
- `scripts/build-worker.mjs` — bundles `worker-entry.ts` with transformers
|
|
133
|
+
inlined → `worker-code.generated.ts`. **[DELETE]** with the worker.
|
|
134
|
+
- `package.json` scripts `build` and `pretypecheck` call `build-worker.mjs`.
|
|
135
|
+
**[EDIT]** remove that step.
|
|
136
|
+
- `tsdown.config.ts` — `@huggingface/transformers` in `nodeExternals`; browser
|
|
137
|
+
build "BUNDLES transformers.js for clean DX". **[EDIT]** remove transformers
|
|
138
|
+
external + the bundling comment/path.
|
|
139
|
+
- `package.json` deps to drop: `puppeteer-core`, `@huggingface/transformers`,
|
|
140
|
+
`onnxruntime-web`, `kokoro-js`; re-evaluate `@huggingface/hub`.
|
|
141
|
+
|
|
142
|
+
---
|
|
143
|
+
|
|
144
|
+
## 2. What Each ONNX Usage Did → Native Replacement
|
|
145
|
+
|
|
146
|
+
| Modality | Old (ONNX) | New (native) | Wiring status |
|
|
147
|
+
|---|---|---|---|
|
|
148
|
+
| **Text gen** | transformers.js `AutoModelForCausalLM` (browser CDN + Node), chrome-backend (Node GPU) | `WebGPUEngine.generate` (qwen2/qwen3_5/lfm2/gemma4) | ✅ Wired in core `gerbil.ts` (`webgpu-native` branch) and browser `use-native-engine.ts`. The ONNX text path is now **DEAD** for GPU; it survives only as the CPU/WASM and chrome fallback. |
|
|
149
|
+
| **Vision (VLM)** | `AutoModelForImageTextToText` (browser + chrome) | `WebGPUEngine.encodeImage` / `describeImage` (qwen3_5_vision, gemma4_vision) | ✅ Native exists. Core vision branch still has chrome fallback (~600-620) and browser worker has ONNX vision. **Partially dead.** |
|
|
150
|
+
| **Embeddings** | `pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2")` (core + `use-embedding` worker) | `WebGPUEngine.embed` (EmbeddingGemma / Qwen3-Embedding) | ❌ **NOT wired** into core `embed()` or `use-embedding.ts`. Still 100% ONNX. **This is live, must be migrated, not just deleted.** |
|
|
151
|
+
| **STT** | `WhisperSTT` (`pipeline("automatic-speech-recognition")`, onnx-community/whisper-*) | `MoonshineSTT` / `WebGPUEngine.transcribe` | ❌ **NOT wired** into core `transcribe()` (delegates to ONNX `WhisperSTT`). Native Moonshine sits unused in `src/gpu/`. **Live, must migrate.** |
|
|
152
|
+
| **TTS** | Kokoro (`kokoro-js`) + Supertonic (`pipeline("text-to-speech")`) | `KaniTTS` / `WebGPUEngine.speak` | ❌ **NOT wired** into core `speak()` (delegates to ONNX `createTTS`). Native Kani unused outside `src/gpu/`. **Live, must migrate.** |
|
|
153
|
+
|
|
154
|
+
**Dead vs deliberate-keep summary:**
|
|
155
|
+
- **DEAD now (GPU mode):** ONNX text & vision generation paths in core + browser
|
|
156
|
+
worker. Safe to delete once consumers stop importing the worker.
|
|
157
|
+
- **LIVE (must migrate first):** core `embed/embedBatch`, core STT (`WhisperSTT`),
|
|
158
|
+
core TTS (`createTTS`/Kokoro), and their browser hooks. These are the real
|
|
159
|
+
work — the "pure native" claim is aspirational for these three modalities at
|
|
160
|
+
the public-API layer.
|
|
161
|
+
- **Deliberate keep:** all of `src/gpu/`; `use-native-engine.ts`; langchain/
|
|
162
|
+
ai-sdk integrations (ride on native `embed`).
|
|
163
|
+
|
|
164
|
+
---
|
|
165
|
+
|
|
166
|
+
## 3. Decision Points
|
|
167
|
+
|
|
168
|
+
### 3a. CPU/WASM fallback (transformers.js wasm) — keep or remove?
|
|
169
|
+
|
|
170
|
+
- **What it is:** `backend-selector.ts` wasm/cpu chains, the `pipeline()` CPU/WASM
|
|
171
|
+
branches in `gerbil.ts` (~488, ~525 fallback), iOS single-thread ORT wasm in
|
|
172
|
+
`use-embedding.ts` / worker. The *only* execution path when WebGPU is absent.
|
|
173
|
+
- **Pure-native cost:** removing it = **WebGPU becomes a hard requirement.**
|
|
174
|
+
No fallback for: older Safari without WebGPU, headless CI without a GPU
|
|
175
|
+
adapter, locked-down enterprise browsers, very old Android.
|
|
176
|
+
- **Recommendation:** the user wants pure native → **remove**, but ship a clear
|
|
177
|
+
capability error ("WebGPU required") rather than a silent failure. Flagged as
|
|
178
|
+
a **reach tradeoff** — quantify lost reach before committing. If any reach
|
|
179
|
+
matters, the cheap compromise is to keep `onnxruntime-web` for embeddings only
|
|
180
|
+
(smallest, most reuse-heavy) and drop everything else; but that keeps one ONNX
|
|
181
|
+
dep alive, so it half-defeats the goal.
|
|
182
|
+
- **Decision owner: user.**
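
A minimal sketch of the "WebGPU required" capability error recommended above, assuming the check runs before any model is loaded. `navigator.gpu` and `requestAdapter()` are the standard WebGPU entry points; `WebGPURequiredError`, `checkWebGPU`, `requireAdapter` and the two `*Like` interfaces are hypothetical names introduced for illustration and do not exist in Gerbil.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Hypothetical error type: one recognizable failure instead of a silent one.
class WebGPURequiredError extends Error {
  constructor(reason: string) {
    super("WebGPU required: " + reason);
    this.name = "WebGPURequiredError";
  }
}

// Synchronous check: is the WebGPU entry point exposed at all?
function checkWebGPU(nav: NavigatorLike | undefined): GpuLike {
  if (!nav || !nav.gpu) {
    throw new WebGPURequiredError("navigator.gpu is not available in this environment");
  }
  return nav.gpu;
}

// Asynchronous check: the API can exist while no adapter is usable
// (for example a headless CI machine without a GPU adapter).
async function requireAdapter(nav: NavigatorLike | undefined): Promise<unknown> {
  const gpu = checkWebGPU(nav);
  const adapter = await gpu.requestAdapter();
  if (adapter === null || adapter === undefined) {
    throw new WebGPURequiredError("no GPU adapter was returned");
  }
  return adapter;
}
```

Splitting the check in two keeps the cheap presence test usable in synchronous code paths (CLI flag validation, hook initialization), while the adapter request covers the "API present but no device" case listed under pure-native cost.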
|
|
183
|
+
|
|
184
|
+
### 3b. chrome-backend.ts (puppeteer headless Chrome) — safe to remove?
|
|
185
|
+
|
|
186
|
+
- **What it was:** Node-side WebGPU via headless Chrome (puppeteer-core) before
|
|
187
|
+
node-dawn/native landed. Already deprecated; the native engine replaced it for
|
|
188
|
+
Node GPU inference.
|
|
189
|
+
- **Hidden tentacles:** beyond inference it powers CLI/REPL features —
|
|
190
|
+
`getChromeCachedModels` (cache listing in `repl/utils.ts`, `index.ts`), the
|
|
191
|
+
`gerbil chrome` admin command (status/kill/ps/pkill of `gerbil/chrome-cache`),
|
|
192
|
+
and exports in `src/index.ts`. These must be detached first or builds break.
|
|
193
|
+
- **Recommendation:** **YES, remove.** Lowest-risk *inference* removal, but do
|
|
194
|
+
it as a coordinated phase (detach CLI/REPL/index consumers → delete file →
|
|
195
|
+
drop `puppeteer-core`). 1,836 lines + a heavy dep gone.
|
|
196
|
+
- **Decision: effectively yes; confirm no external consumer imports
|
|
197
|
+
`ChromeGPUBackend` from the package.**
|
|
198
|
+
|
|
199
|
+
### 3c. Browser ONNX worker + hooks — remove once site/REPL stop using them?
|
|
200
|
+
|
|
201
|
+
- **What it is:** `worker.ts` + `worker-entry.ts` + 1.35 MB generated bundle +
|
|
202
|
+
ONNX paths in `use-embedding`/`use-speech`/`use-voice-input`.
|
|
203
|
+
- **Blocker:** chat must move to `use-native-engine`; embeddings/STT/TTS hooks
|
|
204
|
+
need native rewrites. The site migration docs
|
|
205
|
+
(`docs/gerbil-site-native-migration.md`, `docs/site-update-plan.md`) track consumer migration.
|
|
206
|
+
- **Recommendation:** remove **after** native hooks exist and the site/REPL are
|
|
207
|
+
repointed. Biggest bundle-size win (drops the 1.35 MB generated worker bundle with transformers.js inlined).
|
|
208
|
+
- **Decision: yes, sequenced last among browser work.**
|
|
209
|
+
|
|
210
|
+
---
|
|
211
|
+
|
|
212
|
+
## 4. REPL / CLI Audit
|
|
213
|
+
|
|
214
|
+
The REPL/CLI is **partly native already** (it accepts `--backend native` →
|
|
215
|
+
`webgpu-native` and the `webgpuEngine` path handles chat), but is riddled with
|
|
216
|
+
ONNX/chrome assumptions:
|
|
217
|
+
|
|
218
|
+
- **`cli/index.ts`**: every command exposes `--backend native|onnx|auto`
|
|
219
|
+
(default `auto`). `auto` + ONNX must go for native-only. The `transcribe`,
|
|
220
|
+
`tts`, and `voice` commands use ONNX STT/TTS (`whisper-tiny.en`, `kokoro-82m`)
|
|
221
|
+
— repoint to native Moonshine/Kani. The entire `chrome` admin subcommand
|
|
222
|
+
(status/kill/ps) is chrome-backend-only — **delete**. Cache listing scans the
|
|
223
|
+
transformers cache dir — switch to native cache dir.
|
|
224
|
+
- **`repl/utils.ts`**: model registry is ONNX repos (`onnx-community/*-ONNX`,
|
|
225
|
+
`Phi-3-...-onnx`), cache scanning keys on `.onnx` files and the HF
|
|
226
|
+
`/tree/main/onnx` API, plus `getChromeCachedModels`. Rewrite registry to
|
|
227
|
+
native safetensors repos (`Qwen/Qwen3.5-0.8B`, etc. — the `nativeRepoMap` in
|
|
228
|
+
`gerbil.ts` already lists the mappings), scan for `.safetensors`, drop chrome.
|
|
229
|
+
- **`repl/App.tsx` / `index.tsx` / views**: `backend` prop, cpu/webgpu device
|
|
230
|
+
switch, kokoro/whisper defaults, chrome cleanup delay. Collapse to native;
|
|
231
|
+
device switch becomes meaningful only if wasm fallback is kept (§3a).
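
The cache-scanning change for `repl/utils.ts` could look roughly like the sketch below: classify a cached model directory by the weight files it contains, treating `.safetensors` as native and `.onnx` as legacy. `classifyCachedModel` and `CacheKind` are hypothetical names, and the example file names are made up for illustration; the real helper would read the native cache dir rather than take a list.

```typescript
type CacheKind = "native" | "legacy-onnx" | "unknown";

// Decide what kind of cached model a directory holds from its file names.
// Safetensors wins if both kinds are present, since native is the target.
function classifyCachedModel(fileNames: string[]): CacheKind {
  const hasSafetensors = fileNames.some((f) => /\.safetensors$/i.test(f));
  const hasOnnx = fileNames.some((f) => /\.onnx$/i.test(f));
  if (hasSafetensors) return "native";
  if (hasOnnx) return "legacy-onnx";
  return "unknown";
}

console.log(classifyCachedModel(["config.json", "model.safetensors"])); // "native"
console.log(classifyCachedModel(["onnx/model_q4f16.onnx"])); // "legacy-onnx"
console.log(classifyCachedModel(["README.md"])); // "unknown"
```

Keeping a `legacy-onnx` result, rather than ignoring `.onnx` files outright, would let the REPL tell users that an old cache entry can be cleaned up.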
|
|
232
|
+
|
|
233
|
+
**Plan:** native-only REPL — remove the `onnx`/`auto` backend options (or make
|
|
234
|
+
`auto` == native), delete the `chrome` command, repoint STT/TTS/embeddings model
|
|
235
|
+
ids to native, rewrite cache/registry helpers to safetensors. If §3a keeps wasm,
|
|
236
|
+
retain the cpu/webgpu switch; otherwise drop it.
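
One way to read the "make `auto` == native" option in the plan above, sketched under the assumption that the CLI keeps accepting the old flag values for a while. The flag values and the `webgpu-native` mapping come from this document; `resolveBackend` and the error wording are hypothetical.

```typescript
type CliBackend = "native" | "onnx" | "auto";
type EngineBackend = "webgpu-native";

// Map the legacy --backend values onto the single remaining engine backend.
function resolveBackend(flag: CliBackend = "auto"): EngineBackend {
  switch (flag) {
    case "native":
    case "auto": // `auto` becomes an alias for native
      return "webgpu-native";
    case "onnx":
      // Fail loudly instead of silently running a different backend.
      throw new Error('The "onnx" backend has been removed; use --backend native (WebGPU required).');
  }
}

console.log(resolveBackend("native")); // "webgpu-native"
console.log(resolveBackend()); // "webgpu-native"
```

Rejecting `onnx` explicitly, instead of dropping the value from the flag parser, gives existing scripts a clear message about what changed.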
|
|
237
|
+
|
|
238
|
+
---
|
|
239
|
+
|
|
240
|
+
## 5. Phased Removal Plan
|
|
241
|
+
|
|
242
|
+
Ordering principle: **wire native replacements first, detach consumers, then
|
|
243
|
+
delete; deps drop only after their last importer is gone.** Each phase ends with
|
|
244
|
+
the no-regression suite (§6) green.
|
|
245
|
+
|
|
246
|
+
### Phase 0 — Safety net (no deletion)
|
|
247
|
+
- Snapshot current behavior: run `pnpm test` (vitest) + the native e2e scripts
|
|
248
|
+
(`scripts/engine/test-gerbil-e2e.mjs`, `test-vision-e2e.mjs`,
|
|
249
|
+
`test-embedding-gemma.mjs`, `test-moonshine-transcribe.mjs`,
|
|
250
|
+
`test-kani-speak.mjs`). Record baseline outputs/tokens.
|
|
251
|
+
- **Risk: none.** Verifies: native paths actually work before we lean on them.
|
|
252
|
+
|
|
253
|
+
### Phase 1 — Wire native embeddings / STT / TTS into the public API (ADD, don't delete)
|
|
254
|
+
- Core `gerbil.ts`: make `embed`/`embedBatch` route to `WebGPUEngine.embed`;
|
|
255
|
+
make `transcribe`/`speak` route to native Moonshine/Kani (new thin wrappers
|
|
256
|
+
replacing `WhisperSTT`/`createTTS`).
|
|
257
|
+
- Add native browser hooks (native embed, native speech) modeled on
|
|
258
|
+
`use-native-engine.ts`.
|
|
259
|
+
- **Risk: MEDIUM** (behavioral parity — output dims, voices, languages).
|
|
260
|
+
Verify: embedding cosine-sim sanity, transcribe WER on a fixture, speak audio
|
|
261
|
+
length/sample-rate. **This phase makes the "pure native" claim true.**
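
Two of the Phase 1 parity checks, cosine-similarity sanity and WER on a fixture, reduce to small pure functions. The sketch below is one possible implementation, not the project's test code; `cosineSimilarity`, `tokenize` and `wordErrorRate` are illustrative names, and WER is computed as word-level edit distance divided by the reference length.

```typescript
// Cosine similarity between two embedding vectors of equal dimension.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch: " + a.length + " vs " + b.length);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: define similarity as 0
  return dot / Math.sqrt(normA * normB);
}

// Lowercase and split on whitespace; punctuation handling is left out on purpose.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// Word error rate: word-level Levenshtein distance / number of reference words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      curr.push(Math.min(substitution, prev[j] + 1, curr[j - 1] + 1));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

console.log(wordErrorRate("the quick brown fox", "the quick red fox")); // 0.25
```

The dimension check matters for this migration: vectors from the old and new embedding models are not expected to be comparable with each other, so the sanity test should compare native outputs against native outputs (for example, a sentence against its paraphrase versus an unrelated sentence).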
|
|
262
|
+
|
|
263
|
+
### Phase 2 — Remove chrome-backend (Node puppeteer GPU)
|
|
264
|
+
- Detach: `cli/index.ts` (`chrome` command, cache listing), `repl/utils.ts`
|
|
265
|
+
(`getChromeCachedModels`/`refreshCachedModelSizes`), `repl` views, `src/index.ts`
|
|
266
|
+
exports, `gerbil.ts` chrome branches + passthroughs.
|
|
267
|
+
- Delete `src/core/chrome-backend.ts`. Drop `puppeteer-core` from package.json.
|
|
268
|
+
- **Risk: LOW–MEDIUM** (wide but mechanical; CLI admin command removed is a
|
|
269
|
+
user-visible change). Verify: build, CLI smoke (`gerbil "hi"` native), REPL
|
|
270
|
+
loads.
|
|
271
|
+
|
|
272
|
+
### Phase 3 — Migrate browser chat to native, delete ONNX worker
|
|
273
|
+
- Repoint site/REPL chat consumers to `use-native-engine`.
|
|
274
|
+
- Delete `worker.ts`, `worker-entry.ts`, `worker-code.generated.ts`,
|
|
275
|
+
`scripts/build-worker.mjs`; remove the build/pretypecheck worker step;
|
|
276
|
+
fix `browser/index.ts` exports.
|
|
277
|
+
- **Risk: MEDIUM** (iOS/Safari chat path was the worker's reason to exist;
|
|
278
|
+
confirm `use-native-engine` covers WKWebView per
|
|
279
|
+
`docs/mobile-failure-diagnosis.md`). Verify: browser e2e chat, iPad smoke.
|
|
280
|
+
|
|
281
|
+
### Phase 4 — Remove ONNX embeddings / STT / TTS source
|
|
282
|
+
- Delete `src/core/stt.ts`, `src/core/tts.ts`, `src/core/model-compat.ts`.
|
|
283
|
+
- Gut/rewire `use-embedding.ts`, `use-speech.ts`, `use-voice-input.ts` to native.
|
|
284
|
+
- Remove `gerbil.ts` ONNX imports (`AutoModel*`, `pipeline`), the `embedder`
|
|
285
|
+
ONNX field, model-compat call, getModelRepo ONNX maps.
|
|
286
|
+
- Update `src/index.ts` exports (drop `WhisperSTT`/`WHISPER_MODELS`/`KokoroTTS`/
|
|
287
|
+
`TTS_MODELS` or replace with native).
|
|
288
|
+
- Drop `kokoro-js` from package.json.
|
|
289
|
+
- **Risk: MEDIUM.** Verify: STT/TTS/embedding suites; integration tests
|
|
290
|
+
(langchain/ai-sdk embeddings).
|
|
291
|
+
|
|
292
|
+
### Phase 5 — Decide & remove CPU/WASM fallback + final dep purge (gated on §3a)
|
|
293
|
+
- If removing fallback: delete wasm/cpu branches in `gerbil.ts` +
|
|
294
|
+
`backend-selector.ts` wasm chains; replace with "WebGPU required" error.
|
|
295
|
+
- Remove `@huggingface/transformers` + `onnxruntime-web` from package.json and
|
|
296
|
+
`tsdown.config.ts` externals/bundling. Re-evaluate `@huggingface/hub`.
|
|
297
|
+
- Native-only REPL cleanup (backend flags, device switch).
|
|
298
|
+
- **Risk: HIGH** (reach loss is permanent; this is the one that needs the user's
|
|
299
|
+
explicit sign-off). Verify: full suite on a WebGPU-capable machine; confirm
|
|
300
|
+
graceful error on a no-WebGPU environment.
|
|
301
|
+
|
|
302
|
+
### Net result
|
|
303
|
+
~10,800 LOC removed; deps removed: `puppeteer-core`, `@huggingface/transformers`,
|
|
304
|
+
`onnxruntime-web`, `kokoro-js` (+ maybe `@huggingface/hub`); the 1.35 MB worker
|
|
305
|
+
bundle and its build step gone; one inference path (native) to maintain.
|
|
306
|
+
|
|
307
|
+
---
|
|
308
|
+
|
|
309
|
+
## 6. Subagent Execution Flow (for the removal workflow)
|
|
310
|
+
|
|
311
|
+
Run as a **sequence of agents, one per phase** (phases are dependency-ordered;
|
|
312
|
+
do NOT parallelize across phases — later phases assume earlier deletions). Within
|
|
313
|
+
a phase, independent detach edits can be parallel sub-agents.
|
|
314
|
+
|
|
315
|
+
Each agent's contract:
|
|
316
|
+
1. **Read** this doc's relevant phase + the target files.
|
|
317
|
+
2. **Make the phase's edits** (wire or delete per the disposition table).
|
|
318
|
+
3. **Verify no regression:** `pnpm build` (TypeScript must pass — catches dangling
|
|
319
|
+
imports), `pnpm test` (vitest), and the phase-relevant native e2e scripts:
|
|
320
|
+
- Text: `scripts/engine/test-gerbil-e2e.mjs`
|
|
321
|
+
- Vision: `scripts/engine/test-vision-e2e.mjs`
|
|
322
|
+
- Embeddings: `scripts/engine/test-embedding-gemma.mjs`
|
|
323
|
+
- STT: `scripts/engine/test-moonshine-transcribe.mjs`
|
|
324
|
+
- TTS: `scripts/engine/test-kani-speak.mjs`
|
|
325
|
+
4. **Grep-gate:** after a delete phase, `grep -rn "<removed symbol>" src/` must be
|
|
326
|
+
empty (e.g. `ChromeGPUBackend`, `@huggingface/transformers`, `kokoro-js`).
|
|
327
|
+
5. **Report** diff summary + verification results to the orchestrator; STOP on any
|
|
328
|
+
red so a human can adjudicate (especially Phase 5 reach loss).
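
The grep-gate in step 4 can also be expressed as a small check that an agent runs over file contents. The sketch below works on an in-memory map of path to source text so that it stays self-contained; a real gate would read the files under `src/`. `findLeftovers` and `Leftover` are hypothetical names, and the sample file contents are invented for the example.

```typescript
interface Leftover {
  file: string;
  line: number; // 1-based, like `grep -n`
  symbol: string;
}

// Report every line that still mentions a symbol the phase was meant to remove.
function findLeftovers(files: Record<string, string>, removedSymbols: string[]): Leftover[] {
  const hits: Leftover[] = [];
  for (const file of Object.keys(files)) {
    const lines = files[file].split("\n");
    lines.forEach((text, index) => {
      for (const symbol of removedSymbols) {
        if (text.indexOf(symbol) !== -1) {
          hits.push({ file, line: index + 1, symbol });
        }
      }
    });
  }
  return hits;
}

// Invented sample input: one stale export, one clean file.
const sampleFiles: Record<string, string> = {
  "src/index.ts": 'export { Gerbil } from "./core/gerbil";\nexport { ChromeGPUBackend } from "./core/chrome-backend";',
  "src/gpu/index.ts": "export class WebGPUEngine {}",
};

const leftovers = findLeftovers(sampleFiles, ["ChromeGPUBackend", "kokoro-js"]);
console.log(leftovers); // one hit: src/index.ts, line 2, ChromeGPUBackend
```

An empty result is the pass condition; any hit should be reported to the orchestrator together with the file and line.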
|
|
329
|
+
|
|
330
|
+
Suggested agent lineup:
|
|
331
|
+
- **Agent A (Phase 0):** baseline capture, no edits.
|
|
332
|
+
- **Agent B (Phase 1):** native wiring for embed/STT/TTS — the gating change.
|
|
333
|
+
- **Agent C (Phase 2):** chrome-backend removal + `puppeteer-core` drop.
|
|
334
|
+
- **Agent D (Phase 3):** browser native chat migration + worker deletion.
|
|
335
|
+
- **Agent E (Phase 4):** ONNX STT/TTS/embed source removal + `kokoro-js` drop.
|
|
336
|
+
- **Agent F (Phase 5):** GATED — fallback decision, final dep purge, REPL cleanup.
|
|
337
|
+
|
|
338
|
+
Orchestrator gates: B must pass before C–E delete the ONNX sources they replace;
|
|
339
|
+
F runs only after explicit user sign-off on §3a.
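
The gating rules above (strictly sequential phases, stop on red, Phase 5 only after sign-off) can be sketched as a single decision function. This is an illustration of the rules as stated in this section, not an existing orchestrator; `PlanState`, `Next` and `nextStep` are hypothetical names.

```typescript
type Phase = 0 | 1 | 2 | 3 | 4 | 5;
type Status = "pending" | "green" | "red";

interface PlanState {
  status: Record<Phase, Status>;
  userSignedOffFallbackRemoval: boolean; // the §3a decision
}

type Next =
  | { kind: "run"; phase: Phase }
  | { kind: "stop"; reason: string }
  | { kind: "done" };

// Walk the phases in order; the first non-green phase decides what happens next.
function nextStep(state: PlanState): Next {
  const order: Phase[] = [0, 1, 2, 3, 4, 5];
  for (const phase of order) {
    const status = state.status[phase];
    if (status === "red") {
      return { kind: "stop", reason: "phase " + phase + " failed verification" };
    }
    if (status === "pending") {
      if (phase === 5 && !state.userSignedOffFallbackRemoval) {
        return { kind: "stop", reason: "phase 5 needs explicit user sign-off on §3a" };
      }
      return { kind: "run", phase };
    }
  }
  return { kind: "done" };
}

const afterBaseline: PlanState = {
  status: { 0: "green", 1: "pending", 2: "pending", 3: "pending", 4: "pending", 5: "pending" },
  userSignedOffFallbackRemoval: false,
};
console.log(nextStep(afterBaseline)); // { kind: "run", phase: 1 }
```

Because phases run strictly in order, the "B must pass before C–E" gate falls out of the loop itself: no later phase is returned while Phase 1 is pending or red.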
|