@tryhamster/gerbil 1.0.0-rc.9 → 1.0.1

Files changed (179)
  1. package/LICENSE +1 -1
  2. package/README.md +318 -104
  3. package/dist/architectures-C1I5V3Dt.mjs +6070 -0
  4. package/dist/architectures-C1I5V3Dt.mjs.map +1 -0
  5. package/dist/browser/index.d.ts +276 -590
  6. package/dist/browser/index.d.ts.map +1 -1
  7. package/dist/browser/index.js +592 -2334
  8. package/dist/browser/index.js.map +1 -1
  9. package/dist/cli.mjs +625 -1098
  10. package/dist/cli.mjs.map +1 -1
  11. package/dist/defaults-9komdrbY.mjs +24 -0
  12. package/dist/defaults-9komdrbY.mjs.map +1 -0
  13. package/dist/frameworks/express.d.mts +1 -3
  14. package/dist/frameworks/express.d.mts.map +1 -1
  15. package/dist/frameworks/express.mjs +7 -7
  16. package/dist/frameworks/express.mjs.map +1 -1
  17. package/dist/frameworks/fastify.d.mts +1 -1
  18. package/dist/frameworks/fastify.d.mts.map +1 -1
  19. package/dist/frameworks/fastify.mjs +3 -3
  20. package/dist/frameworks/fastify.mjs.map +1 -1
  21. package/dist/frameworks/hono.d.mts +1 -1
  22. package/dist/frameworks/hono.d.mts.map +1 -1
  23. package/dist/frameworks/hono.mjs +4 -4
  24. package/dist/frameworks/hono.mjs.map +1 -1
  25. package/dist/frameworks/next.d.mts +3 -2
  26. package/dist/frameworks/next.d.mts.map +1 -1
  27. package/dist/frameworks/next.mjs +4 -4
  28. package/dist/frameworks/next.mjs.map +1 -1
  29. package/dist/frameworks/react.d.mts +1 -1
  30. package/dist/frameworks/trpc.d.mts +1 -1
  31. package/dist/frameworks/trpc.d.mts.map +1 -1
  32. package/dist/frameworks/trpc.mjs +4 -4
  33. package/dist/frameworks/trpc.mjs.map +1 -1
  34. package/dist/gerbil-BetB5xb0.d.mts +488 -0
  35. package/dist/gerbil-BetB5xb0.d.mts.map +1 -0
  36. package/dist/gerbil-CTZUa8EZ.mjs +4 -0
  37. package/dist/gerbil-DNniplr4.mjs +1656 -0
  38. package/dist/gerbil-DNniplr4.mjs.map +1 -0
  39. package/dist/gpu/hooks.d.mts +640 -0
  40. package/dist/gpu/hooks.d.mts.map +1 -0
  41. package/dist/gpu/hooks.mjs +1369 -0
  42. package/dist/gpu/hooks.mjs.map +1 -0
  43. package/dist/gpu/index.d.mts +2 -0
  44. package/dist/gpu/index.mjs +6 -0
  45. package/dist/gpu-DFuglcEx.mjs +3790 -0
  46. package/dist/gpu-DFuglcEx.mjs.map +1 -0
  47. package/dist/index-Dgmb2kE3.d.mts +245 -0
  48. package/dist/index-Dgmb2kE3.d.mts.map +1 -0
  49. package/dist/index-DukkJRMj.d.mts +2114 -0
  50. package/dist/index-DukkJRMj.d.mts.map +1 -0
  51. package/dist/index.d.mts +22 -487
  52. package/dist/index.d.mts.map +1 -1
  53. package/dist/index.mjs +13 -8
  54. package/dist/index.mjs.map +1 -1
  55. package/dist/indexeddb-store-BWIMtxxH.mjs +103 -0
  56. package/dist/indexeddb-store-BWIMtxxH.mjs.map +1 -0
  57. package/dist/indexeddb-store-ClH12Xnl.mjs +4 -0
  58. package/dist/integrations/ai-sdk.d.mts +75 -6
  59. package/dist/integrations/ai-sdk.d.mts.map +1 -1
  60. package/dist/integrations/ai-sdk.mjs +131 -15
  61. package/dist/integrations/ai-sdk.mjs.map +1 -1
  62. package/dist/integrations/langchain.d.mts +1 -1
  63. package/dist/integrations/langchain.d.mts.map +1 -1
  64. package/dist/integrations/langchain.mjs +5 -5
  65. package/dist/integrations/langchain.mjs.map +1 -1
  66. package/dist/integrations/llamaindex.d.mts +1 -1
  67. package/dist/integrations/llamaindex.d.mts.map +1 -1
  68. package/dist/integrations/llamaindex.mjs +5 -5
  69. package/dist/integrations/llamaindex.mjs.map +1 -1
  70. package/dist/integrations/mcp-client.mjs +3 -3
  71. package/dist/integrations/mcp-client.mjs.map +1 -1
  72. package/dist/integrations/mcp.d.mts +3 -2
  73. package/dist/integrations/mcp.d.mts.map +1 -1
  74. package/dist/integrations/mcp.mjs +5 -5
  75. package/dist/{mcp-BvbriaBy.mjs → mcp-D2vvH1Xc.mjs} +4 -4
  76. package/dist/mcp-D2vvH1Xc.mjs.map +1 -0
  77. package/dist/memory/index.d.mts +3 -0
  78. package/dist/memory/index.mjs +6 -0
  79. package/dist/memory-D1P7Tmda.mjs +4 -0
  80. package/dist/memory-DVN0MnIG.mjs +132 -0
  81. package/dist/memory-DVN0MnIG.mjs.map +1 -0
  82. package/dist/memory-Dj0J1v88.mjs +294 -0
  83. package/dist/memory-Dj0J1v88.mjs.map +1 -0
  84. package/dist/moonshine-stt-17dpP1kr.mjs +4 -0
  85. package/dist/moonshine-stt-4ojLtMq7.mjs +11962 -0
  86. package/dist/moonshine-stt-4ojLtMq7.mjs.map +1 -0
  87. package/dist/{one-liner-s-lD8rCC.mjs → one-liner-JhdIPxzF.mjs} +14 -16
  88. package/dist/one-liner-JhdIPxzF.mjs.map +1 -0
  89. package/dist/repl-BDRkwPGX.mjs +9 -0
  90. package/dist/skills/index.d.mts +270 -320
  91. package/dist/skills/index.d.mts.map +1 -1
  92. package/dist/skills/index.mjs +5 -5
  93. package/dist/{skills-CD3Orlex.mjs → skills-CU694Dc8.mjs} +187 -32
  94. package/dist/skills-CU694Dc8.mjs.map +1 -0
  95. package/dist/{tools-Bi1P7Xoy.mjs → tools-DQ1mPUw5.mjs} +34 -22
  96. package/dist/tools-DQ1mPUw5.mjs.map +1 -0
  97. package/dist/types-DQBe2lFo.d.mts +165 -0
  98. package/dist/types-DQBe2lFo.d.mts.map +1 -0
  99. package/dist/{types-CiTc7ez3.d.mts → types-LlyYILII.d.mts} +112 -14
  100. package/dist/types-LlyYILII.d.mts.map +1 -0
  101. package/dist/{utils-CZBZ8dgR.mjs → utils-DKO55ZmZ.mjs} +1 -1
  102. package/dist/{utils-CZBZ8dgR.mjs.map → utils-DKO55ZmZ.mjs.map} +1 -1
  103. package/dist/vector-B0panuy6.mjs +95 -0
  104. package/dist/vector-B0panuy6.mjs.map +1 -0
  105. package/docs/PROJECT-STATE.md +321 -0
  106. package/docs/adding-a-model-family.md +280 -0
  107. package/docs/ai-sdk.md +70 -61
  108. package/docs/architecture/overview.md +17 -7
  109. package/docs/browser.md +203 -8
  110. package/docs/embeddings.md +156 -0
  111. package/docs/gerbil-site-native-migration.md +217 -0
  112. package/docs/gpu-engine/architectures.md +398 -0
  113. package/docs/gpu-engine/ir.md +372 -0
  114. package/docs/gpu-engine/kernels.md +718 -0
  115. package/docs/gpu-engine/paper.html +1759 -0
  116. package/docs/gpu-engine/paper.md +2109 -0
  117. package/docs/gpu-engine/safetensors.md +312 -0
  118. package/docs/gpu-engine/tokenizer.md +302 -0
  119. package/docs/memory-rag.md +91 -0
  120. package/docs/metal-safari-intel.md +190 -0
  121. package/docs/mobile-failure-diagnosis.md +124 -0
  122. package/docs/mobile.md +99 -0
  123. package/docs/observability.md +230 -0
  124. package/docs/onnx-removal-plan.md +339 -0
  125. package/docs/research/autoresearch-portable.md +904 -0
  126. package/docs/research/dispatch-reduction-hivemind.md +84 -0
  127. package/docs/research/ios-safari-model-caching.md +117 -0
  128. package/docs/research/mobile-webgpu-speed-fusion.md +135 -0
  129. package/docs/research/native-stt-model-selection.md +49 -0
  130. package/docs/research/native-tts-model-selection.md +90 -0
  131. package/docs/research/native-vs-chromium-decision.md +152 -0
  132. package/docs/research/nemotron-mamba2-inference.md +910 -0
  133. package/docs/research/qwen35-multimodal.md +293 -0
  134. package/docs/research/qwen36-gemma4-targets.md +337 -0
  135. package/docs/research/sota-embedding-models.md +179 -0
  136. package/docs/research/sota-mobile-models-2026.md +263 -0
  137. package/docs/research/sota-modality-models.md +202 -0
  138. package/docs/research/tps-baselines.md +71 -0
  139. package/docs/research/webgpu-m4-reference.md +104 -0
  140. package/docs/site-update-plan.md +155 -0
  141. package/docs/structured-output.md +123 -0
  142. package/docs/stt.md +63 -446
  143. package/docs/tts.md +77 -499
  144. package/docs/vision.md +100 -338
  145. package/package.json +22 -7
  146. package/dist/chrome-backend-CORwaIyC.mjs +0 -1212
  147. package/dist/chrome-backend-CORwaIyC.mjs.map +0 -1
  148. package/dist/chrome-backend-DIKYoWj-.mjs +0 -3
  149. package/dist/gerbil-CJ3ifloF.mjs +0 -4
  150. package/dist/gerbil-Dw4Qj77e.mjs +0 -1631
  151. package/dist/gerbil-Dw4Qj77e.mjs.map +0 -1
  152. package/dist/gerbil-qOTe1nl2.d.mts +0 -431
  153. package/dist/gerbil-qOTe1nl2.d.mts.map +0 -1
  154. package/dist/kokoro-BNTb6egA.mjs +0 -20210
  155. package/dist/kokoro-BNTb6egA.mjs.map +0 -1
  156. package/dist/kokoro-CMOGDSgT.js +0 -20212
  157. package/dist/kokoro-CMOGDSgT.js.map +0 -1
  158. package/dist/mcp-BvbriaBy.mjs.map +0 -1
  159. package/dist/one-liner-s-lD8rCC.mjs.map +0 -1
  160. package/dist/repl-DveXw36T.mjs +0 -9
  161. package/dist/skills-CD3Orlex.mjs.map +0 -1
  162. package/dist/stt-Bu-E23Sc.js +0 -433
  163. package/dist/stt-Bu-E23Sc.js.map +0 -1
  164. package/dist/stt-CpLYbGFd.mjs +0 -433
  165. package/dist/stt-CpLYbGFd.mjs.map +0 -1
  166. package/dist/stt-DRPLEEHB.mjs +0 -3
  167. package/dist/tools-Bi1P7Xoy.mjs.map +0 -1
  168. package/dist/transformers.web-DiD1gTwk.js +0 -44695
  169. package/dist/transformers.web-DiD1gTwk.js.map +0 -1
  170. package/dist/transformers.web-u34VxRFM.js +0 -3
  171. package/dist/tts-CqroPaSK.js +0 -724
  172. package/dist/tts-CqroPaSK.js.map +0 -1
  173. package/dist/tts-DXgsKGCe.mjs +0 -3
  174. package/dist/tts-DeGANMNV.mjs +0 -730
  175. package/dist/tts-DeGANMNV.mjs.map +0 -1
  176. package/dist/types-CiTc7ez3.d.mts.map +0 -1
  177. /package/dist/{auto-update-S9s5-g0C.mjs → auto-update-BVaLXcDE.mjs} +0 -0
  178. /package/dist/{chunk-CkXuGtQK.mjs → chunk-B9cbKln6.mjs} +0 -0
  179. /package/dist/{microphone-DaMZFRuR.mjs → microphone-Bqmoz9_K.mjs} +0 -0
@@ -0,0 +1,339 @@
1
+ # ONNX / transformers.js Removal Plan — Going Pure Native
2
+
3
+ > Status: **PLANNING DOC**. No code is deleted by this document. It is the
4
+ > authoritative inventory + phased plan to hand to a removal workflow.
5
+ >
6
+ > Branch: `feat/webgpu-engine-mobile`
7
+ > Goal: remove the legacy ONNX Runtime / transformers.js / puppeteer-Chrome
8
+ > inference stack so Gerbil runs **pure native** on the WebGPU engine
9
+ > (`src/gpu/`), across text (Qwen3.5 / LFM2.5 / Gemma4), vision (Qwen3.5 ViT,
10
+ > Gemma4 vision), embeddings (EmbeddingGemma / Qwen3-Embedding), STT (Moonshine)
11
+ > and TTS (Kani).
12
+
13
+ ---
14
+
15
+ ## 0. TL;DR — Headline Findings
16
+
17
+ - **~10,800 lines of inference-stack code become deletable** once the native
18
+ paths are wired in and the browser/REPL consumers are migrated. The biggest
19
+ single chunks: `chrome-backend.ts` (1,836), `worker.ts` (1,056),
20
+ `tts.ts` (1,031), `stt.ts` (605), `worker-entry.ts` (599), plus the generated
21
+ `worker-code.generated.ts` bundle (1.35 MB).
22
+ - **3 npm deps drop out:** `puppeteer-core`, `@huggingface/transformers`,
23
+ `onnxruntime-web`. Likely also `kokoro-js` (ONNX-backed TTS) and possibly
24
+ `@huggingface/hub` (only if nothing else needs HF tree/download APIs).
25
+ - **The "all modalities are native" premise is only true inside `src/gpu/`.**
26
+ `WebGPUEngine` (`src/gpu/index.ts`) exposes native `generate`, `encodeImage`,
27
+ `describeImage`, `embed`, `speak` and `transcribe` (Moonshine). **But the
28
+ public surfaces — the core Node `Gerbil` class and the browser hooks — do NOT
29
+ yet route STT / TTS / embeddings to the native engine.** They still call ONNX.
30
+ **This wiring is the prerequisite for removal**, not an afterthought.
31
+
32
+ ### The 3 real decisions the user must make
33
+
34
+ 1. **CPU/WASM fallback (transformers.js wasm) — keep or kill?** Pure-native
35
+ means *WebGPU required*. Dropping the wasm path removes the only way to run
36
+ on machines/browsers without WebGPU (older Safari, locked-down enterprise,
37
+ CI without a GPU). Recommendation: **kill it** for pure-native, but this is a
38
+ reach tradeoff the user owns. (Detail in §3a.)
39
+ 2. **chrome-backend.ts (puppeteer headless Chrome) — remove now?** It was the
40
+ Node-side WebGPU path before node-dawn/native. It is already legacy. **Yes,
41
+ safe to remove** — but it has surprisingly wide tentacles in the CLI/REPL
42
+ (cache listing, process management commands). (Detail in §3b.)
43
+ 3. **Browser ONNX worker + hooks — remove once site/REPL stop using them?**
44
+ The chat worker (`worker.ts` + `worker-entry.ts` + the 1.35 MB
45
+ `worker-code.generated.ts`) and `use-embedding`/`use-speech` ONNX paths can
46
+ go once `use-native-engine` covers chat and native embed/speech hooks exist.
47
+ (Detail in §3c.)
48
+
49
+ ---
50
+
51
+ ## 1. Full Inventory — File by File
52
+
53
+ Legend: **[DELETE]** = remove entirely · **[GUT]** = strip ONNX path, keep file ·
54
+ **[KEEP]** = native, no change · **[REWIRE]** = repoint to native then clean · **[EDIT]** = adjust build/config in place (§1f).
55
+
56
+ ### 1a. Core (`src/core/`)
57
+
58
+ | File | Lines | ONNX/legacy usage | Disposition |
59
+ |---|---:|---|---|
60
+ | `chrome-backend.ts` | 1,836 | `puppeteer-core` headless-Chrome WebGPU backend. `ChromeGPUBackend`, `getChromeCachedModels`, global browser/process management. | **[DELETE]** (after CLI/REPL detached) |
61
+ | `stt.ts` | 605 | `WhisperSTT`: `pipeline("automatic-speech-recognition", "onnx-community/whisper-*")` via `@huggingface/transformers`, ORT wasm CDN. | **[DELETE]** → replace with native Moonshine wrapper |
62
+ | `tts.ts` | 1,031 | `KokoroTTS` (`kokoro-js`, ONNX) + `SupertonicTTS` (`pipeline("text-to-speech")`, ONNX). | **[DELETE]** → replace with native Kani wrapper |
63
+ | `model-compat.ts` | 79 | `patchHFCacheConfig` — patches HF cache so transformers.js loads new model_types (qwen3_5→qwen3). Pure ONNX-loader workaround. | **[DELETE]** (native loader reads safetensors directly, no patch needed) |
64
+ | `gerbil.ts` | 2,686 | The hub. `import { AutoModelForCausalLM, AutoModelForImageTextToText, pipeline, ... } from "@huggingface/transformers"`. `effectiveBackend` switch with `onnx` / `webgpu-native` / chrome branches. `this.model` (ONNX), `this.visionModel` (ONNX), `this.chromeBackend`, `this.embedder` (ONNX pipeline). Native path = `this.webgpuEngine`. `embed()`/`embedBatch()` still ONNX (`pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2")`). STT/TTS delegate to ONNX `WhisperSTT`/`createTTS`. | **[GUT]** — collapse to native-only; remove all ONNX branches, chrome branch, `pipeline` import, model-compat call |
65
+
66
+ Key ONNX surfaces inside `gerbil.ts`:
67
+ - Imports (lines ~6-21): `AutoModelForCausalLM`, `AutoModelForImageTextToText`,
68
+ `pipeline as rawPipeline`, etc.
69
+ - `effectiveBackend` switch (~386-393): three-way `onnx` / `webgpu-native` /
70
+ auto. After removal this becomes always-native.
71
+ - Branches: `isBrowser && webgpu && onnx` (~440), `!isBrowser && webgpu && onnx`
72
+ → chrome backend (~467-486), CPU/WASM pipeline fallback (~488+), error
73
+ fallback to pipeline (~525-530).
74
+ - `loadVisionModel` chrome branch (~600-620).
75
+ - `generate`/`generateStream` dispatch: `if (this.webgpuEngine)` (native) vs
76
+ `else if (this.chromeBackend)` vs `else this.model.generate` (ONNX) — lines
77
+ ~1235-1300, ~1424-1510.
78
+ - `embed()` (~1824) / `embedBatch()` (~1849): pure ONNX, **no native branch yet**.
79
+ - STT/TTS methods (~2058-2310): delegate to ONNX `tts`/`stt` objects.
80
+ - Chrome status/memory passthroughs (~1055-1101), getModelRepo ONNX maps
81
+ (~1026-1041).
82
+ - Cleanup (~2388-2461): disposes webgpuEngine, chromeBackend, tts, stt.
83
+
84
+ ### 1b. Browser (`src/browser/`)
85
+
86
+ | File | Lines | ONNX/legacy usage | Disposition |
87
+ |---|---:|---|---|
88
+ | `worker.ts` | 1,056 | Chat web-worker host. Builds inline worker that imports `@huggingface/transformers` from CDN, ORT wasm paths, `AutoTokenizer`/`AutoModelForCausalLM`/`AutoModelForImageTextToText`. iOS main-thread inference path. | **[DELETE]** (after chat → `use-native-engine`) |
89
+ | `worker-entry.ts` | 599 | The real worker source compiled to the IIFE. Imports transformers v4, ORT wasm CDN, vision+text ONNX. | **[DELETE]** |
90
+ | `worker-code.generated.ts` | ~1.35 MB | Generated IIFE bundle of `worker-entry.ts` (transformers.js fully inlined). Produced by `scripts/build-worker.mjs`. | **[DELETE]** + delete generator + build step |
91
+ | `use-embedding.ts` | 485 | Worker that imports transformers.js from CDN, `pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2")`, ORT wasm, iOS single-thread. | **[GUT/REWIRE]** to native embed (EmbeddingGemma via `WebGPUEngine.embed`) |
92
+ | `use-speech.ts` | 689 | TTS hook: Kokoro (`kokoro-js`) + Supertonic (`pipeline("text-to-speech")`). All ONNX. | **[REWIRE]** to native Kani |
93
+ | `use-voice-input.ts` | 1,227 | STT+chat+TTS combo hook. Uses `createGerbilWorker` (ONNX chat worker) + ONNX STT/TTS models (`whisper-tiny.en`, `kokoro-82m`). | **[REWIRE]** to native engine + native STT/TTS |
94
+ | `backend-selector.ts` | 245 | `selectBackend` returning `webgpu`/`wasm`/`cpu`, fallback chains, iOS wasm fallbacks. The wasm/cpu half is ONNX-fallback machinery. | **[GUT]** — drop wasm/cpu fallback (decision §3a) |
95
+ | `use-native-engine.ts` | 217 | None. Already native: "no ONNX Runtime, no transformers.js." This is the template the others migrate to. | **[KEEP]** |
96
+ | `index.ts` | — | Re-exports worker, use-embedding, use-speech, use-voice-input, backend-selector. Notes core Gerbil excluded due to chrome-backend/puppeteer. | **[REWIRE]** exports after deletions |
97
+
98
+ Other browser files referencing legacy (mostly comments/guards): `preload.ts`,
99
+ `device-guards.ts`, `download.ts`, `webgpu.ts` — touch only as needed (mostly
100
+ ONNX wasm path hints / iOS guards). Low priority.
101
+
102
+ ### 1c. CLI / REPL (`src/cli/`) — see §4 for full audit
103
+
104
+ | File | Lines | ONNX/legacy usage | Disposition |
105
+ |---|---:|---|---|
106
+ | `index.ts` | 1,289 | `--backend native\|onnx\|auto` flag (maps to webgpu-native/onnx). `transcribe`, `tts`, `voice` commands (ONNX STT/TTS). `chrome` admin command (`ChromeGPUBackend.getGlobalBrowserStatus`, `killAllBackends`, kill page, ps/pkill of `gerbil/chrome-cache`). transformers cache dir scanning. | **[REWIRE]** drop onnx/auto backend options → native-only; delete `chrome` command; repoint STT/TTS |
107
+ | `repl/utils.ts` | 738 | `getChromeCachedModels`, `refreshCachedModelSizes`, transformers cache dir, `.onnx` file detection, HF `/tree/main/onnx` fetch picking q4f16/q4 ONNX files, `onnx-community/*-ONNX` model registry. | **[GUT]** remove chrome + ONNX file logic; native models load safetensors |
108
+ | `repl/App.tsx` | 1,340 | `backend` prop ("auto"/"webgpu-native"/"onnx"), device switch webgpu/cpu, ttsModel kokoro / sttModel whisper defaults, transformers.js disposal comment. | **[REWIRE]** native-only |
109
+ | `repl/index.tsx` | — | Chrome cleanup delay comment, backend prop. | **[GUT]** |
110
+ | `repl/views/ModelView.tsx`, `ChatView.tsx`, `FrameworksView.tsx` | — | reference chrome-backend / onnx model ids. | **[GUT]** |
111
+
112
+ ### 1d. Skills / Integrations
113
+
114
+ | File | ONNX/legacy usage | Disposition |
115
+ |---|---|---|
116
+ | `skills/builtin/transcribe.ts` (77) | Enum of whisper ONNX model ids; calls `g.transcribe`. | **[REWIRE]** to native STT model ids (Moonshine) |
117
+ | `integrations/langchain.ts` | `GerbilEmbeddings` → `g.embed`/`g.embedBatch`. No direct ONNX, rides on core `embed()`. | **[KEEP]** (auto-native once core `embed()` is native) |
118
+ | `integrations/ai-sdk.ts` | Embedding settings → `g.embed`. | **[KEEP]** |
119
+ | `src/index.ts` | Exports `ChromeGPUBackend`, `getChromeCachedModels`, `WhisperSTT`, `WHISPER_MODELS`, `KokoroTTS`, `TTS_MODELS`, STT/TTS types. | **[REWIRE]** remove chrome + ONNX-class exports; export native equivalents |
120
+
121
+ ### 1e. Native engine (`src/gpu/`) — the target, all KEEP
122
+
123
+ `index.ts` (`WebGPUEngine`: `generate`, `encodeImage`, `describeImage`, `embed`,
124
+ `speak`, `transcribe`), `moonshine-stt.ts` / `moonshine-executor.ts` (native
125
+ STT), `kani-tts.ts` (native TTS), `vision-executor.ts` / `vision-preprocess.ts`,
126
+ `model-loader.ts` (safetensors), `architectures/*` (qwen2, qwen3_5, qwen3_5_vision,
127
+ lfm2, gemma4, gemma4_vision, gemma3_encoder, moonshine, kani_tts, omnivoice).
128
+ **No changes — this is what everything migrates onto.**
129
+
130
+ ### 1f. Build / deps
131
+
132
+ - `scripts/build-worker.mjs` — bundles `worker-entry.ts` with transformers
133
+ inlined → `worker-code.generated.ts`. **[DELETE]** with the worker.
134
+ - `package.json` scripts `build` and `pretypecheck` call `build-worker.mjs`.
135
+ **[EDIT]** remove that step.
136
+ - `tsdown.config.ts` — `@huggingface/transformers` in `nodeExternals`; browser
137
+ build "BUNDLES transformers.js for clean DX". **[EDIT]** remove transformers
138
+ external + the bundling comment/path.
139
+ - `package.json` deps to drop: `puppeteer-core`, `@huggingface/transformers`,
140
+ `onnxruntime-web`, `kokoro-js`; re-evaluate `@huggingface/hub`.
141
+
142
+ ---
143
+
144
+ ## 2. What Each ONNX Usage Did → Native Replacement
145
+
146
+ | Modality | Old (ONNX) | New (native) | Wiring status |
147
+ |---|---|---|---|
148
+ | **Text gen** | transformers.js `AutoModelForCausalLM` (browser CDN + Node), chrome-backend (Node GPU) | `WebGPUEngine.generate` (qwen2/qwen3_5/lfm2/gemma4) | ✅ Wired in core `gerbil.ts` (`webgpu-native` branch) and browser `use-native-engine.ts`. The ONNX text path is now **DEAD** for GPU inference; it survives only as the CPU/WASM and chrome fallback. |
149
+ | **Vision (VLM)** | `AutoModelForImageTextToText` (browser + chrome) | `WebGPUEngine.encodeImage` / `describeImage` (qwen3_5_vision, gemma4_vision) | ✅ Native exists. Core vision branch still has chrome fallback (~600-620) and browser worker has ONNX vision. **Partially dead.** |
150
+ | **Embeddings** | `pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2")` (core + `use-embedding` worker) | `WebGPUEngine.embed` (EmbeddingGemma / Qwen3-Embedding) | ❌ **NOT wired** into core `embed()` or `use-embedding.ts`. Still 100% ONNX. **This is live, must be migrated, not just deleted.** |
151
+ | **STT** | `WhisperSTT` (`pipeline("automatic-speech-recognition")`, onnx-community/whisper-*) | `MoonshineSTT` / `WebGPUEngine.transcribe` | ❌ **NOT wired** into core `transcribe()` (delegates to ONNX `WhisperSTT`). Native Moonshine sits unused in `src/gpu/`. **Live, must migrate.** |
152
+ | **TTS** | Kokoro (`kokoro-js`) + Supertonic (`pipeline("text-to-speech")`) | `KaniTTS` / `WebGPUEngine.speak` | ❌ **NOT wired** into core `speak()` (delegates to ONNX `createTTS`). Native Kani unused outside `src/gpu/`. **Live, must migrate.** |
153
+
154
+ **Dead vs deliberate-keep summary:**
155
+ - **DEAD now (GPU mode):** ONNX text & vision generation paths in core + browser
156
+ worker. Safe to delete once consumers stop importing the worker.
157
+ - **LIVE (must migrate first):** core `embed/embedBatch`, core STT (`WhisperSTT`),
158
+ core TTS (`createTTS`/Kokoro), and their browser hooks. These are the real
159
+ work — the "pure native" claim is aspirational for these three modalities at
160
+ the public-API layer.
161
+ - **Deliberate keep:** all of `src/gpu/`; `use-native-engine.ts`; the langchain and
162
+ ai-sdk integrations (they ride on core `embed`, which becomes native in Phase 1).
163
+
164
+ ---
165
+
166
+ ## 3. Decision Points
167
+
168
+ ### 3a. CPU/WASM fallback (transformers.js wasm) — keep or remove?
169
+
170
+ - **What it is:** `backend-selector.ts` wasm/cpu chains, the `pipeline()` CPU/WASM
171
+ branches in `gerbil.ts` (~488, ~525 fallback), iOS single-thread ORT wasm in
172
+ `use-embedding.ts` / worker. The *only* execution path when WebGPU is absent.
173
+ - **Pure-native cost:** removing it = **WebGPU becomes a hard requirement.**
174
+ No fallback for: older Safari without WebGPU, headless CI without a GPU
175
+ adapter, locked-down enterprise browsers, very old Android.
176
+ - **Recommendation:** the user wants pure native → **remove**, but ship a clear
177
+ capability error ("WebGPU required") rather than a silent failure. Flagged as
178
+ a **reach tradeoff** — quantify lost reach before committing. If any reach
179
+ matters, the cheap compromise is to keep `onnxruntime-web` for embeddings only
180
+ (smallest, most reuse-heavy) and drop everything else; but that keeps one ONNX
181
+ dep alive, so it half-defeats the goal.
182
+ - **Decision owner: user.**
183
+
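The "clear capability error" recommended in §3a could look roughly like the sketch below. It is only an illustration: `WebGPURequiredError`, `GpuHost`, and `assertWebGPUAvailable` are hypothetical names, not identifiers from the Gerbil codebase, and the error wording is an assumption.

```typescript
// Hypothetical names throughout; nothing here is taken from the Gerbil source.
class WebGPURequiredError extends Error {
  constructor(detail: string) {
    super(`WebGPU required: ${detail}`);
    this.name = "WebGPURequiredError";
  }
}

// Minimal shape of the environment being inspected. In a browser this would
// be `navigator`, whose `gpu` property exists only where WebGPU is exposed.
interface GpuHost {
  gpu?: unknown;
}

// Fail loudly and early instead of letting a later engine call fail silently.
function assertWebGPUAvailable(host: GpuHost | undefined): void {
  if (!host || host.gpu === undefined || host.gpu === null) {
    throw new WebGPURequiredError(
      "no `gpu` entry point in this environment and no CPU/WASM fallback is available",
    );
  }
}
```

A present `gpu` entry point does not guarantee a usable adapter (`requestAdapter()` can resolve to `null`), so a real check would likely treat that case as the same error after awaiting the adapter request.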
184
+ ### 3b. chrome-backend.ts (puppeteer headless Chrome) — safe to remove?
185
+
186
+ - **What it was:** Node-side WebGPU via headless Chrome (puppeteer-core) before
187
+ node-dawn/native landed. Already deprecated; the native engine replaced it for
188
+ Node GPU inference.
189
+ - **Hidden tentacles:** beyond inference it powers CLI/REPL features —
190
+ `getChromeCachedModels` (cache listing in `repl/utils.ts`, `index.ts`), the
191
+ `gerbil chrome` admin command (status/kill/ps/pkill of `gerbil/chrome-cache`),
192
+ and exports in `src/index.ts`. These must be detached first or builds break.
193
+ - **Recommendation:** **YES, remove.** Lowest-risk *inference* removal, but do
194
+ it as a coordinated phase (detach CLI/REPL/index consumers → delete file →
195
+ drop `puppeteer-core`). 1,836 lines + a heavy dep gone.
196
+ - **Decision: effectively yes; confirm no external consumer imports
197
+ `ChromeGPUBackend` from the package.**
198
+
199
+ ### 3c. Browser ONNX worker + hooks — remove once site/REPL stop using them?
200
+
201
+ - **What it is:** `worker.ts` + `worker-entry.ts` + 1.35 MB generated bundle +
202
+ ONNX paths in `use-embedding`/`use-speech`/`use-voice-input`.
203
+ - **Blocker:** chat must move to `use-native-engine`; embeddings/STT/TTS hooks
204
+ need native rewrites. The site migration docs (`docs/gerbil-site-native-migration.md`,
204
+ `docs/site-update-plan.md`) track consumer migration.
206
+ - **Recommendation:** remove **after** native hooks exist and the site/REPL are
207
+ repointed. Biggest bundle-size win (drops the multi-MB inlined transformers).
208
+ - **Decision: yes, sequenced last among browser work.**
209
+
210
+ ---
211
+
212
+ ## 4. REPL / CLI Audit
213
+
214
+ The REPL/CLI is **partly native already** (it accepts `--backend native` →
215
+ `webgpu-native` and the `webgpuEngine` path handles chat), but is riddled with
216
+ ONNX/chrome assumptions:
217
+
218
+ - **`cli/index.ts`**: every command exposes `--backend native|onnx|auto`
219
+ (default `auto`). `auto` + ONNX must go for native-only. The `transcribe`,
220
+ `tts`, and `voice` commands use ONNX STT/TTS (`whisper-tiny.en`, `kokoro-82m`)
221
+ — repoint to native Moonshine/Kani. The entire `chrome` admin subcommand
222
+ (status/kill/ps) is chrome-backend-only — **delete**. Cache listing scans the
223
+ transformers cache dir — switch to native cache dir.
224
+ - **`repl/utils.ts`**: model registry is ONNX repos (`onnx-community/*-ONNX`,
225
+ `Phi-3-...-onnx`), cache scanning keys on `.onnx` files and the HF
226
+ `/tree/main/onnx` API, plus `getChromeCachedModels`. Rewrite registry to
227
+ native safetensors repos (`Qwen/Qwen3.5-0.8B`, etc. — the `nativeRepoMap` in
228
+ `gerbil.ts` already lists the mappings), scan for `.safetensors`, drop chrome.
229
+ - **`repl/App.tsx` / `index.tsx` / views**: `backend` prop, cpu/webgpu device
230
+ switch, kokoro/whisper defaults, chrome cleanup delay. Collapse to native;
231
+ device switch becomes meaningful only if wasm fallback is kept (§3a).
232
+
233
+ **Plan:** native-only REPL — remove the `onnx`/`auto` backend options (or make
234
+ `auto` == native), delete the `chrome` command, repoint STT/TTS/embeddings model
235
+ ids to native, rewrite cache/registry helpers to safetensors. If §3a keeps wasm,
236
+ retain the cpu/webgpu switch; otherwise drop it.
237
+
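One way to read the "remove `onnx`, or make `auto` == native" plan is as a small flag-normalization step. The sketch below assumes that reading; `resolveBackend` and the error messages are hypothetical, and only the `native` → `webgpu-native` mapping comes from the plan itself.

```typescript
// Hypothetical helper; not the actual CLI code.
type ResolvedBackend = "webgpu-native";

function resolveBackend(flag: string): ResolvedBackend {
  switch (flag) {
    case "native":
    case "auto": // the plan's option of treating `auto` as native
      return "webgpu-native";
    case "onnx":
      // Assumed wording: tell the user the option is gone rather than ignoring it.
      throw new Error("The onnx backend has been removed; use --backend native.");
    default:
      throw new Error(`Unknown backend: ${flag}`);
  }
}
```

Rejecting `onnx` explicitly, rather than silently mapping it to native, keeps the user-visible change discoverable; whether that is the desired behavior is a product decision the plan leaves open.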
238
+ ---
239
+
240
+ ## 5. Phased Removal Plan
241
+
242
+ Ordering principle: **wire native replacements first, detach consumers, then
243
+ delete; deps drop only after their last importer is gone.** Each phase ends with
244
+ the no-regression suite (§6) green.
245
+
246
+ ### Phase 0 — Safety net (no deletion)
247
+ - Snapshot current behavior: run `pnpm test` (vitest) + the native e2e scripts
248
+ (`scripts/engine/test-gerbil-e2e.mjs`, `test-vision-e2e.mjs`,
249
+ `test-embedding-gemma.mjs`, `test-moonshine-transcribe.mjs`,
250
+ `test-kani-speak.mjs`). Record baseline outputs/tokens.
251
+ - **Risk: none.** Verifies: native paths actually work before we lean on them.
252
+
253
+ ### Phase 1 — Wire native embeddings / STT / TTS into the public API (ADD, don't delete)
254
+ - Core `gerbil.ts`: make `embed`/`embedBatch` route to `WebGPUEngine.embed`;
255
+ make `transcribe`/`speak` route to native Moonshine/Kani (new thin wrappers
256
+ replacing `WhisperSTT`/`createTTS`).
257
+ - Add native browser hooks (native embed, native speech) modeled on
258
+ `use-native-engine.ts`.
259
+ - **Risk: MEDIUM** (behavioral parity — output dims, voices, languages).
260
+ Verify: embedding cosine-sim sanity, transcribe WER on a fixture, speak audio
261
+ length/sample-rate. **This phase makes the "pure native" claim true.**
262
+
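The "embedding cosine-sim sanity" check named in Phase 1 reduces to a standard cosine similarity between two vectors. A minimal sketch follows; the function name is illustrative, and any pass/fail threshold is left to whoever writes the actual parity test.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    // Also catches the "output dims" parity risk called out for this phase.
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    // An all-zero embedding most likely indicates a broken pipeline.
    throw new Error("Cannot compare a zero-length vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the ONNX and native paths use different embedding models, their vectors are not directly comparable to each other; a sanity check would more plausibly compare similar and dissimilar sentence pairs within the native path.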
263
+ ### Phase 2 — Remove chrome-backend (Node puppeteer GPU)
264
+ - Detach: `cli/index.ts` (`chrome` command, cache listing), `repl/utils.ts`
265
+ (`getChromeCachedModels`/`refreshCachedModelSizes`), `repl` views, `src/index.ts`
266
+ exports, `gerbil.ts` chrome branches + passthroughs.
267
+ - Delete `src/core/chrome-backend.ts`. Drop `puppeteer-core` from package.json.
268
+ - **Risk: LOW–MEDIUM** (wide but mechanical; CLI admin command removed is a
269
+ user-visible change). Verify: build, CLI smoke (`gerbil "hi"` native), REPL
270
+ loads.
271
+
272
+ ### Phase 3 — Migrate browser chat to native, delete ONNX worker
273
+ - Repoint site/REPL chat consumers to `use-native-engine`.
274
+ - Delete `worker.ts`, `worker-entry.ts`, `worker-code.generated.ts`,
275
+ `scripts/build-worker.mjs`; remove the build/pretypecheck worker step;
276
+ fix `browser/index.ts` exports.
277
+ - **Risk: MEDIUM** (iOS/Safari chat path was the worker's reason to exist;
278
+ confirm `use-native-engine` covers WKWebView per
279
+ `docs/mobile-failure-diagnosis.md`). Verify: browser e2e chat, iPad smoke.
280
+
281
+ ### Phase 4 — Remove ONNX embeddings / STT / TTS source
282
+ - Delete `src/core/stt.ts`, `src/core/tts.ts`, `src/core/model-compat.ts`.
283
+ - Gut/rewire `use-embedding.ts`, `use-speech.ts`, `use-voice-input.ts` to native.
284
+ - Remove `gerbil.ts` ONNX imports (`AutoModel*`, `pipeline`), the `embedder`
285
+ ONNX field, model-compat call, getModelRepo ONNX maps.
286
+ - Update `src/index.ts` exports (drop `WhisperSTT`/`WHISPER_MODELS`/`KokoroTTS`/
287
+ `TTS_MODELS` or replace with native).
288
+ - Drop `kokoro-js` from package.json.
289
+ - **Risk: MEDIUM.** Verify: STT/TTS/embedding suites; integration tests
290
+ (langchain/ai-sdk embeddings).
291
+
292
+ ### Phase 5 — Decide & remove CPU/WASM fallback + final dep purge (gated on §3a)
293
+ - If removing fallback: delete wasm/cpu branches in `gerbil.ts` +
294
+ `backend-selector.ts` wasm chains; replace with "WebGPU required" error.
295
+ - Remove `@huggingface/transformers` + `onnxruntime-web` from package.json and
296
+ `tsdown.config.ts` externals/bundling. Re-evaluate `@huggingface/hub`.
297
+ - Native-only REPL cleanup (backend flags, device switch).
298
+ - **Risk: HIGH** (reach loss is permanent; this is the one that needs the user's
299
+ explicit sign-off). Verify: full suite on a WebGPU-capable machine; confirm
300
+ graceful error on a no-WebGPU environment.
301
+
302
+ ### Net result
303
+ ~10,800 LOC removed; deps removed: `puppeteer-core`, `@huggingface/transformers`,
304
+ `onnxruntime-web`, `kokoro-js` (+ maybe `@huggingface/hub`); the 1.35 MB worker
305
+ bundle and its build step gone; one inference path (native) to maintain.
306
+
307
+ ---
308
+
309
+ ## 6. Subagent Execution Flow (for the removal workflow)
310
+
311
+ Run as a **sequence of agents, one per phase** (phases are dependency-ordered;
312
+ do NOT parallelize across phases — later phases assume earlier deletions). Within
313
+ a phase, independent detach edits can be parallel sub-agents.
314
+
315
+ Each agent's contract:
316
+ 1. **Read** this doc's relevant phase + the target files.
317
+ 2. **Make the phase's edits** (wire or delete per the disposition table).
318
+ 3. **Verify no regression:** `pnpm build` (TypeScript must pass — catches dangling
319
+ imports), `pnpm test` (vitest), and the phase-relevant native e2e scripts:
320
+ - Text: `scripts/engine/test-gerbil-e2e.mjs`
321
+ - Vision: `scripts/engine/test-vision-e2e.mjs`
322
+ - Embeddings: `scripts/engine/test-embedding-gemma.mjs`
323
+ - STT: `scripts/engine/test-moonshine-transcribe.mjs`
324
+ - TTS: `scripts/engine/test-kani-speak.mjs`
325
+ 4. **Grep-gate:** after a delete phase, `grep -rn "<removed symbol>" src/` must be
326
+ empty (e.g. `ChromeGPUBackend`, `@huggingface/transformers`, `kokoro-js`).
327
+ 5. **Report** diff summary + verification results to the orchestrator; STOP on any
328
+ red so a human can adjudicate (especially Phase 5 reach loss).
329
+
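The grep-gate in step 4 can also be expressed as a small pure function, which is easier to assert on than shell output. This is a sketch under assumptions: `findDanglingReferences` and its input shape (a map of file path to file content) are hypothetical, and reading `src/` from disk is left out.

```typescript
interface DanglingReference {
  file: string;
  line: number; // 1-based, like `grep -n`
  symbol: string;
  text: string;
}

// Returns every line that still mentions a removed symbol.
// An empty result is the "gate passed" condition.
function findDanglingReferences(
  sources: Record<string, string>,
  removedSymbols: string[],
): DanglingReference[] {
  const hits: DanglingReference[] = [];
  for (const file of Object.keys(sources)) {
    const lines = sources[file].split("\n");
    lines.forEach((text, index) => {
      for (const symbol of removedSymbols) {
        if (text.indexOf(symbol) !== -1) {
          hits.push({ file, line: index + 1, symbol, text: text.trim() });
        }
      }
    });
  }
  return hits;
}
```

Like plain `grep`, this matches substrings, so comments that mention a removed symbol also count as hits; that is arguably the desired behavior for a removal gate.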
330
+ Suggested agent lineup:
331
+ - **Agent A (Phase 0):** baseline capture, no edits.
332
+ - **Agent B (Phase 1):** native wiring for embed/STT/TTS — the gating change.
333
+ - **Agent C (Phase 2):** chrome-backend removal + `puppeteer-core` drop.
334
+ - **Agent D (Phase 3):** browser native chat migration + worker deletion.
335
+ - **Agent E (Phase 4):** ONNX STT/TTS/embed source removal + `kokoro-js` drop.
336
+ - **Agent F (Phase 5):** GATED — fallback decision, final dep purge, REPL cleanup.
337
+
338
+ Orchestrator gates: B must pass before C–E delete the ONNX sources they replace;
339
+ F runs only after explicit user sign-off on §3a.
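
The sequencing rules above (strict phase order, stop on any red, Phase 5 gated on sign-off) can be captured in a few lines. The sketch below is one hypothetical way an orchestrator might decide what to run next; `PhaseResult`, `NextStep`, and `nextPhase` are illustrative names, and it assumes results are recorded in phase order.

```typescript
interface PhaseResult {
  phase: number; // 0..5, matching "Phase 0" through "Phase 5"
  green: boolean; // true when build, tests, and e2e scripts all passed
}

type NextStep = number | "stop" | "await-sign-off" | "done";

const LAST_PHASE = 5;

function nextPhase(completed: PhaseResult[], userSignedOff: boolean): NextStep {
  // Any red result halts the workflow so a human can adjudicate.
  for (const result of completed) {
    if (!result.green) return "stop";
  }
  // Phases are never parallelized, so the next phase is simply the count so far.
  const next = completed.length;
  if (next > LAST_PHASE) return "done";
  // Phase 5 removes the CPU/WASM fallback and needs explicit sign-off on §3a.
  if (next === LAST_PHASE && !userSignedOff) return "await-sign-off";
  return next;
}
```

For example, with Phases 0 to 4 green and no sign-off recorded, this returns `"await-sign-off"` rather than starting Phase 5.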