@tryhamster/gerbil 1.0.0-rc.8 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (179)
  1. package/LICENSE +1 -1
  2. package/README.md +247 -84
  3. package/dist/architectures-C1I5V3Dt.mjs +6070 -0
  4. package/dist/architectures-C1I5V3Dt.mjs.map +1 -0
  5. package/dist/browser/index.d.ts +264 -588
  6. package/dist/browser/index.d.ts.map +1 -1
  7. package/dist/browser/index.js +585 -2334
  8. package/dist/browser/index.js.map +1 -1
  9. package/dist/cli.mjs +625 -1098
  10. package/dist/cli.mjs.map +1 -1
  11. package/dist/defaults-9komdrbY.mjs +24 -0
  12. package/dist/defaults-9komdrbY.mjs.map +1 -0
  13. package/dist/frameworks/express.d.mts +1 -3
  14. package/dist/frameworks/express.d.mts.map +1 -1
  15. package/dist/frameworks/express.mjs +7 -7
  16. package/dist/frameworks/express.mjs.map +1 -1
  17. package/dist/frameworks/fastify.d.mts +1 -1
  18. package/dist/frameworks/fastify.d.mts.map +1 -1
  19. package/dist/frameworks/fastify.mjs +3 -3
  20. package/dist/frameworks/fastify.mjs.map +1 -1
  21. package/dist/frameworks/hono.d.mts +1 -1
  22. package/dist/frameworks/hono.d.mts.map +1 -1
  23. package/dist/frameworks/hono.mjs +4 -4
  24. package/dist/frameworks/hono.mjs.map +1 -1
  25. package/dist/frameworks/next.d.mts +3 -2
  26. package/dist/frameworks/next.d.mts.map +1 -1
  27. package/dist/frameworks/next.mjs +4 -4
  28. package/dist/frameworks/next.mjs.map +1 -1
  29. package/dist/frameworks/react.d.mts +1 -1
  30. package/dist/frameworks/trpc.d.mts +1 -1
  31. package/dist/frameworks/trpc.d.mts.map +1 -1
  32. package/dist/frameworks/trpc.mjs +4 -4
  33. package/dist/frameworks/trpc.mjs.map +1 -1
  34. package/dist/gerbil-BHrJJIa4.mjs +1656 -0
  35. package/dist/gerbil-BHrJJIa4.mjs.map +1 -0
  36. package/dist/gerbil-BT9fCydo.d.mts +488 -0
  37. package/dist/gerbil-BT9fCydo.d.mts.map +1 -0
  38. package/dist/gerbil-DomNfIr1.mjs +4 -0
  39. package/dist/gpu/hooks.d.mts +520 -0
  40. package/dist/gpu/hooks.d.mts.map +1 -0
  41. package/dist/gpu/hooks.mjs +1188 -0
  42. package/dist/gpu/hooks.mjs.map +1 -0
  43. package/dist/gpu/index.d.mts +2 -0
  44. package/dist/gpu/index.mjs +6 -0
  45. package/dist/gpu-33qCAtHW.mjs +3615 -0
  46. package/dist/gpu-33qCAtHW.mjs.map +1 -0
  47. package/dist/index-Dgmb2kE3.d.mts +245 -0
  48. package/dist/index-Dgmb2kE3.d.mts.map +1 -0
  49. package/dist/index-jEAL2s-A.d.mts +2022 -0
  50. package/dist/index-jEAL2s-A.d.mts.map +1 -0
  51. package/dist/index.d.mts +22 -487
  52. package/dist/index.d.mts.map +1 -1
  53. package/dist/index.mjs +13 -8
  54. package/dist/index.mjs.map +1 -1
  55. package/dist/indexeddb-store-BWIMtxxH.mjs +103 -0
  56. package/dist/indexeddb-store-BWIMtxxH.mjs.map +1 -0
  57. package/dist/indexeddb-store-ClH12Xnl.mjs +4 -0
  58. package/dist/integrations/ai-sdk.d.mts +75 -6
  59. package/dist/integrations/ai-sdk.d.mts.map +1 -1
  60. package/dist/integrations/ai-sdk.mjs +131 -15
  61. package/dist/integrations/ai-sdk.mjs.map +1 -1
  62. package/dist/integrations/langchain.d.mts +1 -1
  63. package/dist/integrations/langchain.d.mts.map +1 -1
  64. package/dist/integrations/langchain.mjs +5 -5
  65. package/dist/integrations/langchain.mjs.map +1 -1
  66. package/dist/integrations/llamaindex.d.mts +1 -1
  67. package/dist/integrations/llamaindex.d.mts.map +1 -1
  68. package/dist/integrations/llamaindex.mjs +5 -5
  69. package/dist/integrations/llamaindex.mjs.map +1 -1
  70. package/dist/integrations/mcp-client.mjs +3 -3
  71. package/dist/integrations/mcp-client.mjs.map +1 -1
  72. package/dist/integrations/mcp.d.mts +3 -2
  73. package/dist/integrations/mcp.d.mts.map +1 -1
  74. package/dist/integrations/mcp.mjs +5 -5
  75. package/dist/{mcp-BvbriaBy.mjs → mcp-1DaMsaBc.mjs} +4 -4
  76. package/dist/mcp-1DaMsaBc.mjs.map +1 -0
  77. package/dist/memory/index.d.mts +3 -0
  78. package/dist/memory/index.mjs +6 -0
  79. package/dist/memory-D1P7Tmda.mjs +4 -0
  80. package/dist/memory-DVN0MnIG.mjs +132 -0
  81. package/dist/memory-DVN0MnIG.mjs.map +1 -0
  82. package/dist/memory-Dj0J1v88.mjs +294 -0
  83. package/dist/memory-Dj0J1v88.mjs.map +1 -0
  84. package/dist/moonshine-stt-BLyVoRpB.mjs +4 -0
  85. package/dist/moonshine-stt-v_P_Ci_m.mjs +11936 -0
  86. package/dist/moonshine-stt-v_P_Ci_m.mjs.map +1 -0
  87. package/dist/{one-liner-s-lD8rCC.mjs → one-liner-DnQn7HJK.mjs} +14 -16
  88. package/dist/one-liner-DnQn7HJK.mjs.map +1 -0
  89. package/dist/repl-jV5gcJFA.mjs +9 -0
  90. package/dist/skills/index.d.mts +270 -320
  91. package/dist/skills/index.d.mts.map +1 -1
  92. package/dist/skills/index.mjs +5 -5
  93. package/dist/{skills-CD3Orlex.mjs → skills-DX8D59UH.mjs} +187 -32
  94. package/dist/skills-DX8D59UH.mjs.map +1 -0
  95. package/dist/{tools-Bi1P7Xoy.mjs → tools-DQ1mPUw5.mjs} +34 -22
  96. package/dist/tools-DQ1mPUw5.mjs.map +1 -0
  97. package/dist/{types-CiTc7ez3.d.mts → types-D6FiR_oh.d.mts} +106 -12
  98. package/dist/types-D6FiR_oh.d.mts.map +1 -0
  99. package/dist/types-DQBe2lFo.d.mts +165 -0
  100. package/dist/types-DQBe2lFo.d.mts.map +1 -0
  101. package/dist/{utils-CZBZ8dgR.mjs → utils-DKO55ZmZ.mjs} +1 -1
  102. package/dist/{utils-CZBZ8dgR.mjs.map → utils-DKO55ZmZ.mjs.map} +1 -1
  103. package/dist/vector-B0panuy6.mjs +95 -0
  104. package/dist/vector-B0panuy6.mjs.map +1 -0
  105. package/docs/PROJECT-STATE.md +321 -0
  106. package/docs/adding-a-model-family.md +280 -0
  107. package/docs/ai-sdk.md +70 -61
  108. package/docs/architecture/overview.md +17 -7
  109. package/docs/browser.md +203 -8
  110. package/docs/embeddings.md +156 -0
  111. package/docs/gerbil-site-native-migration.md +217 -0
  112. package/docs/gpu-engine/architectures.md +398 -0
  113. package/docs/gpu-engine/ir.md +372 -0
  114. package/docs/gpu-engine/kernels.md +718 -0
  115. package/docs/gpu-engine/paper.html +1759 -0
  116. package/docs/gpu-engine/paper.md +2109 -0
  117. package/docs/gpu-engine/safetensors.md +312 -0
  118. package/docs/gpu-engine/tokenizer.md +302 -0
  119. package/docs/memory-rag.md +91 -0
  120. package/docs/metal-safari-intel.md +190 -0
  121. package/docs/mobile-failure-diagnosis.md +124 -0
  122. package/docs/mobile.md +99 -0
  123. package/docs/observability.md +230 -0
  124. package/docs/onnx-removal-plan.md +339 -0
  125. package/docs/research/autoresearch-portable.md +904 -0
  126. package/docs/research/dispatch-reduction-hivemind.md +84 -0
  127. package/docs/research/ios-safari-model-caching.md +117 -0
  128. package/docs/research/mobile-webgpu-speed-fusion.md +135 -0
  129. package/docs/research/native-stt-model-selection.md +49 -0
  130. package/docs/research/native-tts-model-selection.md +90 -0
  131. package/docs/research/native-vs-chromium-decision.md +152 -0
  132. package/docs/research/nemotron-mamba2-inference.md +910 -0
  133. package/docs/research/qwen35-multimodal.md +293 -0
  134. package/docs/research/qwen36-gemma4-targets.md +337 -0
  135. package/docs/research/sota-embedding-models.md +179 -0
  136. package/docs/research/sota-mobile-models-2026.md +263 -0
  137. package/docs/research/sota-modality-models.md +202 -0
  138. package/docs/research/tps-baselines.md +71 -0
  139. package/docs/research/webgpu-m4-reference.md +104 -0
  140. package/docs/site-update-plan.md +155 -0
  141. package/docs/structured-output.md +123 -0
  142. package/docs/stt.md +63 -446
  143. package/docs/tts.md +77 -499
  144. package/docs/vision.md +100 -338
  145. package/package.json +22 -7
  146. package/dist/chrome-backend-CORwaIyC.mjs +0 -1212
  147. package/dist/chrome-backend-CORwaIyC.mjs.map +0 -1
  148. package/dist/chrome-backend-DIKYoWj-.mjs +0 -3
  149. package/dist/gerbil-CJ3ifloF.mjs +0 -4
  150. package/dist/gerbil-Dw4Qj77e.mjs +0 -1631
  151. package/dist/gerbil-Dw4Qj77e.mjs.map +0 -1
  152. package/dist/gerbil-qOTe1nl2.d.mts +0 -431
  153. package/dist/gerbil-qOTe1nl2.d.mts.map +0 -1
  154. package/dist/kokoro-BNTb6egA.mjs +0 -20210
  155. package/dist/kokoro-BNTb6egA.mjs.map +0 -1
  156. package/dist/kokoro-DFRQ1OeM.js +0 -20212
  157. package/dist/kokoro-DFRQ1OeM.js.map +0 -1
  158. package/dist/mcp-BvbriaBy.mjs.map +0 -1
  159. package/dist/one-liner-s-lD8rCC.mjs.map +0 -1
  160. package/dist/repl-DveXw36T.mjs +0 -9
  161. package/dist/skills-CD3Orlex.mjs.map +0 -1
  162. package/dist/stt-CpLYbGFd.mjs +0 -433
  163. package/dist/stt-CpLYbGFd.mjs.map +0 -1
  164. package/dist/stt-DRPLEEHB.mjs +0 -3
  165. package/dist/stt-Te8Qz-Ay.js +0 -433
  166. package/dist/stt-Te8Qz-Ay.js.map +0 -1
  167. package/dist/tools-Bi1P7Xoy.mjs.map +0 -1
  168. package/dist/transformers.web-DokyH3rP.js +0 -3
  169. package/dist/transformers.web-M6mCnEYJ.js +0 -30382
  170. package/dist/transformers.web-M6mCnEYJ.js.map +0 -1
  171. package/dist/tts-C0xx3CtE.js +0 -724
  172. package/dist/tts-C0xx3CtE.js.map +0 -1
  173. package/dist/tts-DXgsKGCe.mjs +0 -3
  174. package/dist/tts-DeGANMNV.mjs +0 -730
  175. package/dist/tts-DeGANMNV.mjs.map +0 -1
  176. package/dist/types-CiTc7ez3.d.mts.map +0 -1
  177. package/dist/{auto-update-S9s5-g0C.mjs → auto-update-BVaLXcDE.mjs} +0 -0
  178. package/dist/{chunk-CkXuGtQK.mjs → chunk-B9cbKln6.mjs} +0 -0
  179. package/dist/{microphone-DaMZFRuR.mjs → microphone-Bqmoz9_K.mjs} +0 -0
package/docs/browser.md CHANGED
@@ -1,6 +1,93 @@
1
1
  # Browser Usage
2
2
 
3
- Run LLMs, TTS, and STT directly in the browser with WebGPU acceleration. No server required.
3
+ Run models directly in the browser with WebGPU. No server required.
4
+
5
+ Browser inference runs on the **native engine** — the React hooks at
6
+ `@tryhamster/gerbil/gpu/hooks` (`useEngine` / `useChat` / `useText` / `useVision` /
7
+ `useEmbedding` / `useTTS` / `useSTT` / `useVoiceChat` / `useMemory`),
8
+ backed by the from-scratch WGSL `WebGPUEngine`. Pure compute shaders, no ONNX, no
9
+ transformers.js. This is the supported path for text, vision, embeddings, and speech, and the
10
+ lane the Gerbil site itself runs on.
11
+
12
+ > The old inline transformers.js/ONNX worker hooks (`useChat`, `useSpeech`, `useVoiceInput`,
13
+ > `useEmbedding`, `createGerbilWorker`, `preload*`) have been **removed** from
14
+ > `@tryhamster/gerbil/browser`. `@tryhamster/gerbil/browser` now exports only device/WebGPU
15
+ > utilities (`isModelSafeForDevice`, `detectMemoryCrash`, `downloadModelChunked`,
16
+ > `checkWebGPUCapabilities`, `getBrowserDiagnostics`, …). The "Legacy Worker Lane" sections
17
+ > below are retained for historical reference and no longer reflect the shipped API.
18
+
19
+ > **Pre-1.0.** APIs may still shift before 1.0.
20
+
21
+ ## Native Engine (recommended)
22
+
23
+ ```tsx
24
+ import { useEngine } from "@tryhamster/gerbil/gpu/hooks";
25
+
26
+ function Chat() {
27
+ const { complete, completion, isLoading, isGenerating, tps } = useEngine({
28
+ model: "mlx-community/Qwen3.5-0.8B-4bit",
29
+ autoLoad: true, // dtype defaults to "auto": int4 on mobile, native on desktop
30
+ });
31
+
32
+ if (isLoading) return <div>Loading model…</div>;
33
+
34
+ return (
35
+ <div>
36
+ <button onClick={() => complete("Write a haiku about coding")} disabled={isGenerating}>
37
+ Generate
38
+ </button>
39
+ <p>{completion}</p>
40
+ {isGenerating && <span>{tps?.toFixed(1)} tok/s</span>}
41
+ </div>
42
+ );
43
+ }
44
+ ```
45
+
46
+ `useEngine` owns the engine lifecycle — load, unload, hot-swap on config change, and
47
+ **reference-counted sharing** so multiple components requesting the same
48
+ `model|dtype|vision|embedding|maxSeqLen` share ONE engine (weights uploaded to the GPU once).
49
+
50
+ ```typescript
51
+ const {
52
+ complete, // (prompt, opts?) => Promise<string> — streams into `completion`
53
+ describeImage, // (image, prompt?, opts?) => Promise<string> — needs enableVision
54
+ embed, // (text, { taskType }?) => Promise<Float32Array> — needs embedding
55
+ similarity, // (a, b) => Promise<number> — needs embedding
56
+ completion, // string — current text, streams token by token
57
+ isLoading, loadingProgress, isGenerating, isReady, tps, error, errorKind,
58
+ load, stop, dispose,
59
+ } = useEngine({
60
+ model: "mlx-community/Qwen3.5-0.8B-4bit", // HF repo id
61
+ dtype: "auto", // "auto" (default) | "f32" | "q4"
62
+ maxSeqLen, // default: 2048 mobile / 4096 desktop
63
+ autoLoad: false, // load on mount
64
+ enableVision: false, // build the ViT so describeImage() works
65
+ embedding: false, // load as an embedding model
66
+ onReady, onError, // onError(err, kind) — kind: "no-webgpu" | "oom" | …
67
+ });
68
+ ```
69
+
70
+ Use `enableVision: true` for image→text (see [Vision](./vision.md)) and `embedding: true`
71
+ for embeddings (see [Embeddings](./embeddings.md)).
72
+
73
+ ### Browser support (native engine)
74
+
75
+ - **Chrome / Edge 113+**
76
+ - **Safari 26+ (iOS / iPadOS 26+)**
77
+ - **Firefox 141+**
78
+
79
+ On devices without WebGPU the hook reports an `errorKind` of `"no-webgpu"` rather than
80
+ silently degrading.
81
+
82
+ ---
83
+
84
+ ## Legacy Worker Lane (removed — historical reference only)
85
+
86
+ > **Removed.** Everything below this point documents the old inline transformers.js/ONNX
87
+ > worker hooks, which have been **deleted** from the package (no `useChat`/`useSpeech`/
88
+ > `useVoiceInput`/`useEmbedding` worker exports, no `createGerbilWorker`, no `preload*`
89
+ > functions, and no ONNX/transformers.js dependency). It is kept here only for historical
90
+ > reference. Use the native hooks from `@tryhamster/gerbil/gpu/hooks` instead.
4
91
 
5
92
  ## Quick Start (React)
6
93
 
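The browser.md hunk above says `useEngine` uses reference counting so that components requesting the same `model|dtype|vision|embedding|maxSeqLen` share one engine, with weights uploaded to the GPU only once. The docs do not show how this is implemented. Below is a minimal sketch of one way to do it, assuming an engine object that exposes `destroy()` (as `WebGPUEngine` does in the embeddings.md example). `EngineConfig`, `acquireEngine`, and `releaseEngine` are hypothetical names, not Gerbil exports.

```typescript
// Hypothetical sketch: reference-counted engine sharing keyed on
// model|dtype|vision|embedding|maxSeqLen. Not Gerbil's implementation.

interface EngineConfig {
  model: string;
  dtype: "auto" | "f32" | "q4";
  enableVision: boolean;
  embedding: boolean;
  maxSeqLen: number;
}

// Anything with a destroy() method; WebGPUEngine is assumed to fit this shape.
interface EngineLike {
  destroy(): void;
}

interface Entry {
  engine: Promise<EngineLike>;
  refs: number;
}

const cache = new Map<string, Entry>();

function keyOf(c: EngineConfig): string {
  return [c.model, c.dtype, c.enableVision, c.embedding, c.maxSeqLen].join("|");
}

// The first caller for a key triggers the (expensive) load. Later callers
// with an identical key share the same promise, so the load runs once.
function acquireEngine(
  config: EngineConfig,
  create: (c: EngineConfig) => Promise<EngineLike>,
): Promise<EngineLike> {
  const key = keyOf(config);
  let entry = cache.get(key);
  if (!entry) {
    const fresh: Entry = { engine: create(config), refs: 0 };
    // Evict a failed load so a later acquire can retry instead of
    // receiving the cached rejection forever.
    fresh.engine.catch(() => {
      if (cache.get(key) === fresh) cache.delete(key);
    });
    cache.set(key, fresh);
    entry = fresh;
  }
  entry.refs += 1;
  return entry.engine;
}

// Drops one reference. The last release evicts the entry and frees the engine.
async function releaseEngine(config: EngineConfig): Promise<void> {
  const key = keyOf(config);
  const entry = cache.get(key);
  if (!entry) return;
  entry.refs -= 1;
  if (entry.refs > 0) return;
  cache.delete(key);
  (await entry.engine).destroy();
}
```

In a React hook, `acquireEngine` would typically run inside an effect and `releaseEngine` in that effect's cleanup, so unmounting the last consumer frees the GPU memory.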
@@ -34,6 +121,40 @@ function Chat() {
34
121
 
35
122
  That's it! The hook handles model loading, streaming, and state management.
36
123
 
124
+ ## Model Preloading
125
+
126
+ Download models during app initialization so they're ready when users need them:
127
+
128
+ ```typescript
129
+ import {
130
+ preloadChatModel,
131
+ preloadEmbeddingModel,
132
+ preloadTTSModel,
133
+ preloadSTTModel
134
+ } from "@tryhamster/gerbil/browser";
135
+
136
+ // During app initialization
137
+ async function initApp() {
138
+ // Preload LLM
139
+ await preloadChatModel("qwen3-0.6b", {
140
+ onProgress: (p) => {
141
+ if (p.status === "downloading") {
142
+ console.log(`Downloading ${p.file}: ${p.progress}%`);
143
+ }
144
+ },
145
+ });
146
+
147
+ // Preload other models as needed
148
+ await preloadEmbeddingModel("Xenova/all-MiniLM-L6-v2");
149
+ await preloadTTSModel("kokoro-82m");
150
+ await preloadSTTModel("whisper-tiny.en");
151
+ }
152
+
153
+ initApp();
154
+ ```
155
+
156
+ After preloading, hooks like `useChat` will load instantly from IndexedDB cache.
157
+
37
158
  ## React Hooks
38
159
 
39
160
  ### `useChat`
@@ -220,6 +341,9 @@ const {
220
341
 
221
342
  #### Vision (Image Analysis)
222
343
 
344
+ > For native image→text, use `useEngine({ enableVision: true }).describeImage(...)` —
345
+ > see [Vision docs](./vision.md). The legacy-lane example below uses the retired ONNX worker.
346
+
223
347
  Use `useCompletion` with a vision model to analyze images:
224
348
 
225
349
  ```tsx
@@ -227,7 +351,7 @@ import { useCompletion } from "@tryhamster/gerbil/browser";
227
351
 
228
352
  function ImageAnalyzer() {
229
353
  const { complete, completion, isLoading, isGenerating } = useCompletion({
230
- model: "ministral-3b", // Vision model
354
+ model: "ministral-3b", // Vision model (legacy lane)
231
355
  maxTokens: 2048,
232
356
  });
233
357
  const [imageUrl, setImageUrl] = useState<string | null>(null);
@@ -456,6 +580,75 @@ for await (const chunk of gerbil.speakStream("Long text...")) {
456
580
  }
457
581
  ```
458
582
 
583
+ ## Embeddings Hook
584
+
585
+ ### `useEmbedding`
586
+
587
+ Generate embeddings for semantic search and similarity:
588
+
589
+ ```tsx
590
+ import { useEmbedding } from "@tryhamster/gerbil/browser";
591
+
592
+ function SemanticSearch() {
593
+ const { embed, similarity, search, isLoading, isReady, load } = useEmbedding({
594
+ model: "Xenova/all-MiniLM-L6-v2", // Default
595
+ autoLoad: false,
596
+ });
597
+
598
+ if (isLoading) return <div>Loading embedding model...</div>;
599
+
600
+ const handleSearch = async () => {
601
+ const results = await search("capital of France", [
602
+ "Paris is beautiful",
603
+ "London is in England",
604
+ "Dogs are pets",
605
+ ], 2); // topK = 2
606
+
607
+ console.log(results);
608
+ // [{ text: "Paris is beautiful", score: 0.89, index: 0 }, ...]
609
+ };
610
+
611
+ const handleSimilarity = async () => {
612
+ const score = await similarity("Hello world", "Hi there");
613
+ console.log(score); // 0.85
614
+ };
615
+
616
+ return (
617
+ <div>
618
+ <button onClick={handleSearch}>Search</button>
619
+ <button onClick={handleSimilarity}>Compare</button>
620
+ </div>
621
+ );
622
+ }
623
+ ```
624
+
625
+ ### Options
626
+
627
+ ```typescript
628
+ const {
629
+ // Actions
630
+ embed, // (text: string) => Promise<number[]>
631
+ embedBatch, // (texts: string[]) => Promise<{ vector, text }[]>
632
+ similarity, // (a: string, b: string) => Promise<number>
633
+ search, // (query: string, corpus: string[], topK?) => Promise<SearchResult[]>
634
+ findNearest, // (embedding: number[], candidates: string[], topK?) => Promise<SearchResult[]>
635
+ cosineSimilarity, // (a: number[], b: number[]) => number (sync)
636
+ load, // () => void - manually load model
637
+
638
+ // State
639
+ isLoading, // boolean - model loading
640
+ isReady, // boolean - model ready
641
+ loadingProgress, // { status, message?, progress? }
642
+ error, // string | null
643
+ } = useEmbedding({
644
+ model: "Xenova/all-MiniLM-L6-v2", // Embedding model
645
+ normalize: true, // Normalize vectors (default: true)
646
+ autoLoad: false, // Load on mount (default: false)
647
+ onReady: () => {},
648
+ onError: (err) => {},
649
+ });
650
+ ```
651
+
459
652
  ## Low-Level API
460
653
 
461
654
  For full control, use `createGerbilWorker` directly:
@@ -533,22 +726,24 @@ const info = await getWebGPUInfo();
533
726
  // { supported: true, adapter: "Apple", device: "Apple M4 Max" }
534
727
  ```
535
728
 
536
- ## Models
729
+ ## Models (legacy worker lane)
537
730
 
538
731
  | Model | Size | Best For |
539
732
  |-------|------|----------|
540
733
  | `qwen3-0.6b` | ~400MB | General use, thinking mode |
541
734
  | `smollm2-360m` | ~250MB | Faster, smaller |
542
735
  | `smollm2-135m` | ~100MB | Fastest, basic tasks |
543
- | `ministral-3b` | ~2.5GB | **Vision** — image analysis |
544
736
 
545
- Models are cached in IndexedDB after first download.
737
+ > For vision, embeddings, and speech use the native engine (`useEngine`) — see the
738
+ > [Native Engine](#native-engine-recommended) section above.
739
+
740
+ Legacy-lane models are cached in IndexedDB after first download.
546
741
 
547
- ## Browser Support
742
+ ## Browser Support (legacy worker lane)
548
743
 
549
744
  - **Chrome/Edge 113+** — Full WebGPU support
550
- - **Safari 18+** — WebGPU support (may have quirks)
551
- - **Firefox** — WebGPU behind flag, not recommended
745
+ - **Safari 26+ (iOS/iPadOS 26+)** — WebGPU support
746
+ - **Firefox 141+** — WebGPU support
552
747
 
553
748
  ## Troubleshooting
554
749
 
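The native-engine section says the hook reports an `errorKind` of `"no-webgpu"` instead of silently degrading, and the support lists above name the browser versions that ship WebGPU. A minimal sketch of that kind of check follows. It uses only the standard WebGPU entry point `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter is available. `checkWebGPU` and `NavigatorLike` are hypothetical; Gerbil's actual detection code is not shown in these docs.

```typescript
// Hypothetical sketch of a "no-webgpu" check using the standard WebGPU API.

type GpuCheck = "ok" | "no-webgpu";

// Minimal structural type so the function type-checks and can be tested
// outside a browser; in a browser you would pass the global `navigator`.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

async function checkWebGPU(nav: NavigatorLike): Promise<GpuCheck> {
  // API not exposed at all (older browser, or WebGPU disabled).
  if (!nav.gpu) return "no-webgpu";
  try {
    // API present, but requestAdapter() resolves to null when no usable GPU exists.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "ok" : "no-webgpu";
  } catch {
    return "no-webgpu";
  }
}
```

Depending on which DOM typings your TypeScript setup includes, passing `navigator` may require a cast, since `navigator.gpu` is not declared in every version of the standard library types.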
package/docs/embeddings.md ADDED
@@ -0,0 +1,156 @@
1
+ # Embeddings
2
+
3
+ Gerbil generates text embeddings natively on the WebGPU engine using **EmbeddingGemma-300M**
4
+ — a bidirectional Gemma3 encoder with mean pooling and a 2-layer Dense head, producing
5
+ 768-dim, L2-normalized vectors. Runs on-device (including iPad Safari), no ONNX, no API keys.
6
+
7
+ > **Pre-1.0.** `engine.embed()` is the native path. The old ONNX/transformers.js embedding
8
+ > lane (MiniLM/BGE/GTE) has been removed. The `Gerbil`-class helpers (`embed`, `similarity`,
9
+ > `search`, `findNearest`) still work but now run native EmbeddingGemma under the hood (see
10
+ > [below](#gerbil-class-embeddings-native-wrapper)).
11
+
12
+ ## Quick Start
13
+
14
+ ### Node
15
+
16
+ ```typescript
17
+ import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
18
+
19
+ const engine = await WebGPUEngine.create({
20
+ repo: "mlx-community/embeddinggemma-300m-4bit",
21
+ embedding: true,
22
+ });
23
+
24
+ // EmbeddingGemma is asymmetric — queries and documents use different prefixes.
25
+ const query = await engine.embed("capital of France", { taskType: "query" });
26
+ const doc = await engine.embed("Paris is the capital of France.", { taskType: "document" });
27
+
28
+ // Vectors are unit-norm, so cosine similarity is just a dot product.
29
+ const dot = query.reduce((s, v, i) => s + v * doc[i], 0);
30
+ console.log(dot); // ~0.7+
31
+
32
+ engine.destroy();
33
+ ```
34
+
35
+ `embed()` returns a `Float32Array` of length 768 (EmbeddingGemma) with unit L2 norm.
36
+
37
+ ### React (Browser)
38
+
39
+ ```tsx
40
+ import { useEngine } from "@tryhamster/gerbil/gpu/hooks";
41
+
42
+ function SemanticSearch() {
43
+ const { embed, similarity, isLoading } = useEngine({
44
+ model: "mlx-community/embeddinggemma-300m-4bit",
45
+ embedding: true,
46
+ autoLoad: true,
47
+ });
48
+
49
+ if (isLoading) return <div>Loading embedding model…</div>;
50
+
51
+ const compare = async () => {
52
+ // similarity() embeds a as a query and b as a document, returns cosine.
53
+ const score = await similarity("Hello world", "Hi there");
54
+ console.log(score);
55
+ };
56
+
57
+ return <button onClick={compare}>Compare</button>;
58
+ }
59
+ ```
60
+
61
+ The hook exposes `embed(text, { taskType })` (defaults to `"query"`) and
62
+ `similarity(a, b)`.
63
+
64
+ ## Asymmetric tasks
65
+
66
+ EmbeddingGemma uses different task prefixes for queries vs documents. Pass `taskType`, or a
67
+ raw `taskPrompt` for non-retrieval tasks (clustering / classification / STS):
68
+
69
+ ```typescript
70
+ await engine.embed(text, { taskType: "query" }); // "task: search result | query: "
71
+ await engine.embed(text, { taskType: "document" }); // "title: none | text: "
72
+ await engine.embed(text, { taskPrompt: "task: clustering | query: " });
73
+ ```
74
+
75
+ ## API
76
+
77
+ ```typescript
78
+ interface EmbedOptions {
79
+ /** EmbeddingGemma: "query" (default) or "document". */
80
+ taskType?: "query" | "document";
81
+ /** EmbeddingGemma: raw task prefix, overrides taskType. */
82
+ taskPrompt?: string;
83
+ /** Qwen3-Embedding: instruction prefix for query embeddings. */
84
+ instruction?: string;
85
+ /** Max tokens to encode (longer inputs are truncated). */
86
+ maxTokens?: number;
87
+ }
88
+
89
+ // async embed(text: string, options?: EmbedOptions): Promise<Float32Array>
90
+ ```
91
+
92
+ `embed()` requires an engine loaded with `{ embedding: true }`. The pooling strategy is
93
+ chosen by architecture: EmbeddingGemma mean-pools over all tokens; Qwen3-Embedding uses
94
+ last-token (EOS-position) pooling.
95
+
96
+ ## RAG
97
+
98
+ EmbeddingGemma pairs with `@tryhamster/gerbil/memory` for token-budgeted retrieval, or you
99
+ can build a simple pipeline by hand:
100
+
101
+ ```typescript
102
+ const engine = await WebGPUEngine.create({
103
+ repo: "mlx-community/embeddinggemma-300m-4bit",
104
+ embedding: true,
105
+ });
106
+
107
+ // Index documents.
108
+ const docs = await loadDocuments();
109
+ const index = [];
110
+ for (const text of docs) {
111
+ index.push({ text, vector: await engine.embed(text, { taskType: "document" }) });
112
+ }
113
+
114
+ // Retrieve.
115
+ const q = await engine.embed(question, { taskType: "query" });
116
+ const ranked = index
117
+ .map((d) => ({ text: d.text, score: d.vector.reduce((s, v, i) => s + v * q[i], 0) }))
118
+ .sort((a, b) => b.score - a.score)
119
+ .slice(0, 3);
120
+ ```
121
+
122
+ ## Other native embedders
123
+
124
+ - **Qwen3-Embedding-0.6B** (`Qwen/Qwen3-Embedding-0.6B`) — also supported natively
125
+ (`{ embedding: true }`); uses last-token pooling and an optional `instruction` prefix.
126
+ Larger (BF16 OOMs iPad); EmbeddingGemma is the recommended default.
127
+
128
+ ## Models
129
+
130
+ | Model | Repo | Dim | Notes |
131
+ |-------|------|-----|-------|
132
+ | **EmbeddingGemma-300M** | `mlx-community/embeddinggemma-300m-4bit` | 768 | Default; asymmetric; runs on iPad |
133
+ | Qwen3-Embedding-0.6B | `Qwen/Qwen3-Embedding-0.6B` | 1024 | Last-token pooling; desktop |
134
+
135
+ ---
136
+
137
+ ## `Gerbil`-class embeddings (native wrapper)
138
+
139
+ > The ONNX/transformers.js embedding lane (MiniLM/BGE/GTE) has been removed. The `Gerbil`-class
140
+ > helpers below still work but now run native EmbeddingGemma under the hood (768-dim,
141
+ > WebGPU-required). The old browser `useEmbedding` worker hook is gone — use `useEmbedding`
142
+ > from `@tryhamster/gerbil/gpu/hooks`.
143
+
144
+ ```typescript
145
+ import { Gerbil } from "@tryhamster/gerbil";
146
+
147
+ const g = new Gerbil();
148
+ const { vector } = await g.embed("Hello world"); // number[768] (EmbeddingGemma)
149
+ const { score } = await g.similarity("Hello world", "Hi there");
150
+ const results = await g.search("capital of France", ["Paris…", "London…"]);
151
+ ```
152
+
153
+ ## See Also
154
+
155
+ - [Browser Hooks](./browser.md) — React hooks
156
+ - [Vision](./vision.md), [TTS](./tts.md), [STT](./stt.md)
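embeddings.md describes three steps around the encoder: a task prefix chosen by `taskType` (or overridden by `taskPrompt`), pooling (mean over all tokens for EmbeddingGemma, last token for Qwen3-Embedding), and L2 normalization, which is why cosine similarity reduces to a plain dot product. The sketch below illustrates those steps on plain arrays. It assumes the prefix is simply prepended to the input text, does not call Gerbil, and omits EmbeddingGemma's 2-layer Dense head; all function names are hypothetical.

```typescript
// Illustrative sketch of the post-encoder steps described in embeddings.md.
// `hidden` stands in for per-token hidden states (tokens x dims).

type TaskType = "query" | "document";

// Prefix strings copied from the embeddings.md "Asymmetric tasks" example.
const TASK_PREFIX: Record<TaskType, string> = {
  query: "task: search result | query: ",
  document: "title: none | text: ",
};

// Assumption: a raw taskPrompt overrides taskType and is prepended as-is.
function withPrefix(text: string, taskType: TaskType = "query", taskPrompt?: string): string {
  return (taskPrompt ?? TASK_PREFIX[taskType]) + text;
}

// EmbeddingGemma-style pooling: average every token's hidden state.
function meanPool(hidden: number[][]): Float32Array {
  const out = new Float32Array(hidden[0].length);
  for (const row of hidden) {
    for (let i = 0; i < out.length; i++) out[i] += row[i];
  }
  return out.map((v) => v / hidden.length);
}

// Qwen3-Embedding-style pooling: keep only the last (EOS-position) token.
function lastTokenPool(hidden: number[][]): Float32Array {
  return Float32Array.from(hidden[hidden.length - 1]);
}

// Scale to unit length; a zero vector is returned unchanged.
function l2Normalize(v: Float32Array): Float32Array {
  let sumSq = 0;
  for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
  const norm = Math.sqrt(sumSq);
  return norm === 0 ? v : v.map((x) => x / norm);
}

// For unit-norm vectors, cosine similarity equals the dot product.
function dot(a: Float32Array, b: Float32Array): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

interface IndexedDoc {
  text: string;
  vector: Float32Array;
}

// Rank documents by dot product with the query and keep the best k.
function topK(query: Float32Array, index: IndexedDoc[], k: number) {
  return index
    .map((d) => ({ text: d.text, score: dot(query, d.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Since `engine.embed()` already returns unit-norm vectors, only `dot` and `topK` would be needed on top of it, mirroring the hand-built RAG example in the diff.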
package/docs/gerbil-site-native-migration.md ADDED
@@ -0,0 +1,217 @@
1
+ # gerbil-site → Native-Engine Migration Assessment
2
+
3
+ **As of 2026-06-14. Assessment only — no code in `gerbil-site` was changed.**
4
+ The marketing/docs site lives in a **separate repo** at `/Users/shenron/code/gerbil-site`.
5
+ This document maps how that site does in-browser inference today onto the native
6
+ WebGPU engine's browser API (`@tryhamster/gerbil` `src/browser/*` + `src/gpu/index.ts`),
7
+ says which features can switch **today** vs which must **wait**, states the device-coverage
8
+ consequence of dropping the ONNX fallback **plainly**, and gives a phased plan with the
9
+ smallest first step.
10
+
11
+ It pairs with the engine paper §29 (cross-device multi-modal parity) and §23 (the
12
+ native-only architecture decision). The owner decision is native-only with **no kept
13
+ transformers.js/ONNX fallback lane**; this doc is the concrete site-side consequence.
14
+
15
+ ---
16
+
17
+ ## 0. The one fact that reframes everything
18
+
19
+ The site does **not** call `@huggingface/transformers` or `onnxruntime-web` directly
20
+ anywhere in runtime code. Every grep hit for `pipeline(`, `AutoModel`, `AutoTokenizer`,
21
+ `KokoroTTS`, `feature-extraction`, etc. is **docs prose / code samples** under
22
+ `app/docs/**`. All real inference goes through **gerbil**, by two paths:
23
+
24
+ - **Path A — gerbil browser hooks** (`@tryhamster/gerbil/browser`): `useCompletion`,
25
+ `useSpeech`, `useVoiceInput`. These wrap a **transformers.js + onnxruntime-web Web
26
+ Worker** that lives *inside the gerbil tgz*. transformers.js@3.8.1 and the pinned
27
+ `onnxruntime-web@1.21.0-dev…` are declared deps but consumed only transitively here.
28
+ - **Path B — gerbil native engine** (`@tryhamster/gerbil/gpu`): the `WebGPUEngine`
29
+ (pure WGSL, no worker, no ONNX), dynamically imported in `hooks/useNativeEngine.ts`.
30
+
31
+ So the migration is **not** "rip ONNX out of the site." It is: **move each gerbil
32
+ browser hook from its transformers.js/ONNX worker backend to the native engine** — and
33
+ the site *already has a working native path* for chat. The work is (a) extend the native
34
+ hook coverage to embeddings + vision, (b) wire audio when native audio lands, and (c)
35
+ decide when to flip the default backend.
36
+
37
+ > The site depends on gerbil as a **local tgz at rc.26**
38
+ > (`@tryhamster/gerbil: file:/Users/shenron/Code/gerbil/tryhamster-gerbil-1.0.0-rc.26.tgz`).
39
+ > Any new native browser hooks must be published in a new rc and the tgz re-pinned before
40
+ > the site can consume them.
41
+
42
+ ---
43
+
44
+ ## (a) What the site uses for inference today, and where
45
+
46
+ Framework: **Next.js 14.2.0** (App Router, React 18), `next.config.js` aliases the gerbil
47
+ *browser* bundle to an empty module server-side and loads ORT WASM/MJS from CDN, with
48
+ COOP/COEP headers for SharedArrayBuffer threading.
49
+
50
+ | Call site (file) | Modality | Path | Library actually used | Model id(s) |
51
+ |---|---|---|---|---|
52
+ | `components/PlaygroundFull.tsx:309` `useCompletion` | Chat/completion | A (hooks) | transformers.js worker | `qwen3-0.6b` default + `smollm2-*`, `LFM2-*-ONNX`, `qwen3*` |
53
+ | `components/PlaygroundFull.tsx:318` `useCompletion({model:"ministral-3b"})` | Vision (image→text) | A (hooks) | transformers.js worker (`AutoModelForImageTextToText`) | `ministral-3b` |
54
+ | `components/PlaygroundFull.tsx:366` `useSpeech` | TTS | A (hooks) | `kokoro-js` / transformers.js | `kokoro-82m` (def), `supertonic-66m` |
55
+ | `components/PlaygroundFull.tsx:383` `useVoiceInput` | STT | A (hooks) | transformers.js (Whisper) | `whisper-tiny.en` … `whisper-large-v3-turbo` |
56
+ | `components/PlaygroundFull.tsx` Embed tab (~1700) `similarity()` | Embeddings | — | **REMOVED / dead** (`// useEmbedding removed in this gerbil version`) | n/a |
57
+ | `components/AISDKPlayground.tsx:159` `useCompletion` | Chat | A (hooks) | transformers.js worker | `qwen3-0.6b` |
58
+ | `components/AISDKPlayground.tsx:174` `useCompletion({model:"ministral-3b"})` | Vision | A (hooks) | transformers.js worker | `ministral-3b` |
59
+ | `components/AISDKPlayground.tsx:187` `useSpeech` | TTS | A (hooks) | kokoro-js | `kokoro-82m` |
60
+ | `hooks/useNativeEngine.ts:204` `import("@tryhamster/gerbil/gpu")` → `WebGPUEngine.create/generate` | Chat **only** | **B (native)** | **native WGSL** | `mlx-community/Qwen3.5-0.8B-4bit` (def), `Qwen/Qwen3.5-0.8B`, `Qwen/Qwen3-0.6B`, GPTQ variants |
61
+
62
+ Wiring/render sites: `components/Playground.tsx` chooses `PlaygroundNative` (native-only
63
+ chat; other tabs disabled) vs `PlaygroundFull` (all modalities, hooks) off
64
+ `localStorage["gerbil-backend"]`. Both `<Playground />` and `<AISDKPlayground />` render on
65
+ `app/page.tsx` and `app/playground/page.tsx`, all `dynamic(..., { ssr:false })`.
66
+
67
+ Two states worth flagging now:
68
+ - **Embeddings are already broken** in rc.26: `useEmbedding` was removed; both playgrounds
69
+ null out `similarity`/`embed` but keep live Embed UI that would throw if clicked. So the
70
+ embeddings migration is also a *bug fix*.
71
+ - The native path (`hooks/useNativeEngine.ts`) is the site's **own** hook calling
72
+ `WebGPUEngine` directly — it does **not** use a gerbil-published `useNativeEngine`
73
+ (there is no `@tryhamster/gerbil/gpu/hooks` export subpath; see §(b) caveat).
74
+
75
+ ---
76
+
77
+ ## (b) Which native browser hook/API replaces each ONNX/transformers.js call site
78
+
79
+ The native engine surface (`@tryhamster/gerbil/gpu`, `src/gpu/index.ts`) is the class
80
+ `WebGPUEngine`, constructed via `static create(options)` and exposing
81
+ `generate()`, `embed()`, `describeImage()`, `encodeImage()`. The browser hooks
82
+ (`@tryhamster/gerbil/browser`, `src/browser/*`) currently target the transformers.js
83
+ worker; native hooks either exist privately (`src/browser/use-native-engine.ts`,
84
+ intentionally **not** re-exported because the GPU engine drags in `@huggingface/hub`
85
+ Node-only `node:fs` paths) or must be added.

| Site call site | Today (ONNX/tfjs) | Native replacement | Native symbol(s) | Status |
|---|---|---|---|---|
| `useCompletion` (chat) | tfjs worker `generate` | `WebGPUEngine.create({repo, dtype, maxSeqLen, onProgress})` → `engine.generate(prompt, {maxTokens, sampling, systemPrompt, stopSequences, onToken})` | `WebGPUEngine.create`, `generate` (`src/gpu/index.ts`) | ✅ **today** — already wired in `hooks/useNativeEngine.ts` |
| Embed tab (currently broken) | (was `useEmbedding`) | load with `{ embedding: true }`, then `engine.embed(text, {taskType:"query"\|"document"})` → unit-L2 `Float32Array` (dim 768 for EmbeddingGemma) | `embed`, guard `isEmbedding` | ✅ **today** — model `mlx-community/embeddinggemma-300m-4bit` (173 MB, runs on iPad, paper §25) |
| Vision (`ministral-3b`) | tfjs `AutoModelForImageTextToText` | load with `{ enableVision: true }` (Qwen3.5), then `engine.describeImage({pixels,width,height}, prompt, opts)` → `GenerateResult` | `describeImage`, guard `hasVision`; lower-level `encodeImage(patches, gridTHW)` | ✅ **today** — but **model changes** to a Qwen3.5 ViT checkpoint (the native ViT is Qwen3.5's own tower, not Ministral); paper §22 / §10 |
| `useSpeech` (TTS) | kokoro-js / tfjs | **none yet** — OmniVoice native TTS in progress | — (`src/browser/use-speech.ts` stays tfjs) | ❌ **wait** (audio) |
| `useVoiceInput` (STT) | tfjs Whisper | **none yet** — Moonshine native STT not started | — (`src/browser/use-voice-input.ts` stays tfjs) | ❌ **wait** (audio) |

Native `WebGPUEngine` option/method reference (from `src/gpu/index.ts` + `model-loader.ts`,
exact signatures):
- `create(options)`: `options extends LoadModelOptions` (`repo` required HF id/URL,
  `onProgress(loaded,total,message)`, `dtype?: "f32"|"q4"`, `revision?`, `hfToken?`) plus
  `maxSeqLen?` (capped 4096), `kvMode?`, `enableVision?` (downloads ~192 MB ViT, Qwen3.5
  only), `embedding?: boolean` (last-token pool + L2; on the Gemma encoder path it builds
  the encoder graph instead). Flags: `get isEmbedding`, `get hasVision`.
- `generate(prompt|ChatMessage[], {maxTokens?, stopSequences?, sampling?, systemPrompt?, onToken?})`
  → `{text, tokensGenerated, tokensPerSecond, totalTime, finishReason, thinking?}`.
- `embed(text, {instruction?, taskType?:"query"|"document", taskPrompt?, maxTokens?})` →
  unit-L2 `Float32Array`. Throws if not loaded with `{embedding:true}`.
- `describeImage(image, prompt?, options?)` where `image` is `{pixels,width,height}` **or**
  `{patches, gridTHW}` → `GenerateResult`. Throws if not loaded with `{enableVision:true}`.
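
The three native modalities differ mainly in which `create` flags they set. A minimal sketch of a single options builder follows; `NativeCreateOptions` is a hypothetical local mirror of a subset of the documented options (not an import from gerbil), `buildCreateOptions` is a hypothetical helper, and clamping `maxSeqLen` on the host side is an assumption layered on the documented 4096 cap.

```typescript
// Hypothetical local mirror of a subset of the documented WebGPUEngine.create options.
// The real type lives in gerbil (LoadModelOptions + engine flags); it is not imported here.
interface NativeCreateOptions {
  repo: string; // HF id or URL (required)
  dtype?: "f32" | "q4";
  maxSeqLen?: number; // documented as capped at 4096
  embedding?: boolean; // embedding models such as EmbeddingGemma
  enableVision?: boolean; // Qwen3.5 only; downloads the ~192 MB ViT
  onProgress?: (loaded: number, total: number, message: string) => void;
}

type NativeModality = "chat" | "embedding" | "vision";

// Assumption: clamping host-side keeps requests within the engine's documented cap.
const MAX_SEQ_LEN_CAP = 4096;

// Hypothetical helper: one place that decides which flags each modality needs.
function buildCreateOptions(
  modality: NativeModality,
  repo: string,
  maxSeqLen?: number,
): NativeCreateOptions {
  const options: NativeCreateOptions = { repo };
  if (maxSeqLen !== undefined) {
    options.maxSeqLen = Math.min(maxSeqLen, MAX_SEQ_LEN_CAP);
  }
  if (modality === "embedding") options.embedding = true;
  if (modality === "vision") options.enableVision = true;
  return options;
}
```

The result would then be handed to `WebGPUEngine.create(...)`, e.g. `buildCreateOptions("embedding", "mlx-community/embeddinggemma-300m-4bit")`. The vision repo is left to the caller because the Qwen3.5 ViT checkpoint id is not pinned in this plan.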

**Caveat (publishing gap to fix first):** the native engine is exposed at
`@tryhamster/gerbil/gpu`, but there is **no published React hook** wrapping it. The site
solved this itself by writing `hooks/useNativeEngine.ts`. To migrate the other modalities
cleanly, gerbil should publish proper native hooks (e.g. `useNativeChat`, `useNativeEmbedding`,
`useNativeVision`) under a real subpath (today `src/browser/use-native-engine.ts` exists but
is excluded from the `browser` barrel, and `@tryhamster/gerbil/gpu/hooks` is referenced in a
comment but **not** declared in `package.json` `exports`). Until then, the site keeps wrapping
`WebGPUEngine` directly, modality by modality, as it already does for chat.
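
Until published hooks exist, hand-wrapping `WebGPUEngine` mostly means loading each model once and sharing it across components. A minimal, framework-free sketch of that pattern: `createEngineCache` is a hypothetical helper, and the factory is injected (in the site it would wrap `WebGPUEngine.create`) so the sketch does not import gerbil.

```typescript
// Hypothetical helper: memoizes one in-flight or loaded engine per options key.
// In the site, `factory` would be something like (o) => WebGPUEngine.create(o).
type EngineFactory<E, O> = (options: O) => Promise<E>;

function createEngineCache<E, O>(
  factory: EngineFactory<E, O>,
  // JSON.stringify drops functions such as onProgress, so they do not affect the key.
  // The key depends on property order, so options should come from one builder.
  keyOf: (options: O) => string = (options) => JSON.stringify(options),
) {
  const loads = new Map<string, Promise<E>>();
  return {
    get(options: O): Promise<E> {
      const key = keyOf(options);
      const existing = loads.get(key);
      if (existing) return existing; // share one download/compile across callers
      const created = factory(options);
      loads.set(key, created);
      // Forget failed loads (no adapter, network error) so a later call can retry.
      created.catch(() => {
        if (loads.get(key) === created) loads.delete(key);
      });
      return created;
    },
  };
}
```

A React hook would call `cache.get(...)` inside an effect and keep the resolved engine in state; that part is omitted so the sketch runs outside a browser.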

---

## (c) Modality coverage map — switch TODAY vs WAIT

| Site feature | Native today? | Native model | Notes |
|---|---|---|---|
| **Chat / completion** | ✅ **yes — already live** | Qwen3.5-0.8B (4bit), LFM2.5-350M | `useNativeEngine` already ships; LFM2.5 is the faster/smaller alt (paper §30) |
| **Embeddings** | ✅ **yes** | EmbeddingGemma-300M (173 MB) | Runs on iPad (paper §25). Also fixes the currently-broken Embed tab |
| **Vision (image→text)** | ✅ **yes** | Qwen3.5 ViT (`describeImage`) | Bit-exact vs HF, word-identical greedy output (paper §22). **Model swaps off `ministral-3b`** |
| **TTS** | ❌ **wait** | OmniVoice (in progress) | Keep `useSpeech` on kokoro-js/tfjs until native audio validates |
| **STT** | ❌ **wait** | Moonshine (not started) | Keep `useVoiceInput` on tfjs Whisper |

Net: **chat, embeddings, and vision can all move to native today**; **audio must wait**.
That maps exactly to the engine's "multi-modal parity minus audio" status (paper §29).
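
The coverage map can be encoded as a small routing table, so every playground asks one function which backend a feature should use. This is a sketch under two assumptions: the feature names and `backendFor` are hypothetical, and the tfjs path is retained as the no-WebGPU fallback (which §(d) leaves as an owner decision).

```typescript
type SiteFeature = "chat" | "embeddings" | "vision" | "tts" | "stt";
type InferenceBackend = "native" | "tfjs";

// Mirrors the coverage table: audio waits for OmniVoice (TTS) and Moonshine (STT).
const NATIVE_READY: Record<SiteFeature, boolean> = {
  chat: true,
  embeddings: true,
  vision: true,
  tts: false,
  stt: false,
};

// Hypothetical router. The native engine is WebGPU-only, so without an adapter
// every feature uses the transformers.js path (assuming that path is kept).
function backendFor(feature: SiteFeature, hasWebGPU: boolean): InferenceBackend {
  return NATIVE_READY[feature] && hasWebGPU ? "native" : "tfjs";
}
```

Flipping TTS or STT to native later then becomes a one-line change to `NATIVE_READY`.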

---

## (d) Device/browser coverage of WebGPU-only — and the no-fallback consequence, stated plainly

The native engine is **WebGPU-only**. There is **no WASM/CPU fallback** in the native path
(`src/browser/backend-selector.ts`'s WASM tiers belong to the *transformers.js* path, not
the native engine; native `WebGPUEngine.create` simply requires a WebGPU adapter).

**Devices that gain native (faster, no mobile crashes):**
- **iPad / iPhone Safari (iPadOS/iOS 26.5+, WebKit)** — the headline win; previously crashed,
  now runs text + vision + embeddings (paper §17–§29). On older WebKit, the grouped-submit
  `?group=N` dial is the compatibility lever (paper §18.3).
- **Desktop Chrome / Edge 113+**, **desktop Safari 18+**, **Firefox 141+**.
- **Android Chrome 113+**, Samsung Internet 25+ (per paper Appendix B).

**The plain consequence of dropping the ONNX fallback:** **any device or browser without
WebGPU loses in-browser inference entirely.** There is no graceful degradation to WASM/CPU
in the native path — the engine **throws a clear error rather than degrading** (PROJECT-STATE
§3, "No-WebGPU / old devices: not targeted"). Concretely, the users who lose support are:
older iOS/iPadOS (pre-26 WebKit where WebGPU is absent or buggy), older desktop browsers,
locked-down enterprise browsers with WebGPU disabled, and low-end Android without a WebGPU
adapter. Today those users fall back to the slow-but-working transformers.js WASM path; a
hard native-only cutover **removes that safety net**. This is a deliberate owner decision
(paper §23: a permanent fallback "assumes defeat to begin with") — but the site must own the
UX of it: feature-detect WebGPU up front (`isWebGPUSupported` is already imported in the
playgrounds) and show an explicit "this demo needs WebGPU" state instead of a silent failure.
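
A minimal sketch of that up-front check, assuming the standard WebGPU entry point `navigator.gpu.requestAdapter()`, which resolves to `null` when no adapter is available. `hasWebGPUAdapter` and `initialDemoState` are hypothetical names; the playgrounds' existing `isWebGPUSupported` may already cover this, and its exact behavior is not restated here. The navigator is passed in as a structural type so the sketch compiles without `@webgpu/types`.

```typescript
// Minimal structural types; the real ones come from the WebGPU DOM typings.
interface GPULike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GPULike;
}

// The API being present is not enough: WebGPUEngine.create needs an actual adapter.
async function hasWebGPUAdapter(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false; // old browsers, or WebGPU disabled by policy
  try {
    return (await nav.gpu.requestAdapter()) !== null; // null means no usable adapter
  } catch {
    return false; // treat a failing adapter request like missing WebGPU
  }
}

type DemoState = "native" | "needs-webgpu";

// Decide the initial render state before any model download starts.
async function initialDemoState(nav: NavigatorLike): Promise<DemoState> {
  return (await hasWebGPUAdapter(nav)) ? "native" : "needs-webgpu";
}
```

In the browser this would be called as `initialDemoState(navigator as unknown as NavigatorLike)`; a `needs-webgpu` result renders either the retained tfjs `PlaygroundFull` or the explicit "this demo needs WebGPU" state.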

---

## (e) Phased migration plan (smallest first step first)

**Phase 0 — publish native hooks + re-pin the tgz (prerequisite, gerbil-side).**
In gerbil, expose browser-safe native React hooks (`useNativeChat`/`useNativeEmbedding`/
`useNativeVision`) under a declared `exports` subpath, fixing the `@huggingface/hub`
`node:fs` leak that currently keeps `use-native-engine.ts` out of the barrel. Cut a new rc,
rebuild the tgz, re-pin `@tryhamster/gerbil` in the site. (If this slips, the site can keep
hand-wrapping `WebGPUEngine` as it does for chat — but publishing is the clean path.)

**Phase 1 — SMALLEST FIRST STEP: make native chat the default behind WebGPU detection.**
The native chat path **already exists and works** (`hooks/useNativeEngine.ts` →
`WebGPUEngine.generate`). The minimal change is in `components/Playground.tsx`: when
`isWebGPUSupported()` is true, default `localStorage["gerbil-backend"]` to native
(`PlaygroundNative`) instead of requiring a manual toggle; keep `PlaygroundFull` (tfjs) as the
explicit opt-out and the no-WebGPU path. Zero new gerbil API needed, fully reversible, and it
flips the highest-traffic modality to native first.
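
The Phase 1 default fits in one pure function. This sketch assumes the stored values are `"native"` and `"tfjs"` (the exact strings the site writes to `localStorage["gerbil-backend"]` are not given here) and uses a hypothetical `defaultBackend` name; the `webgpu` argument would come from the existing `isWebGPUSupported()` check.

```typescript
type PlaygroundBackend = "native" | "tfjs"; // PlaygroundNative vs PlaygroundFull
interface StorageLike {
  getItem(key: string): string | null; // satisfied by window.localStorage
}

const BACKEND_KEY = "gerbil-backend";

function defaultBackend(storage: StorageLike, webgpu: boolean): PlaygroundBackend {
  // No WebGPU: the native engine cannot load, so PlaygroundFull is the only option.
  if (!webgpu) return "tfjs";
  // An explicit opt-out saved by the toggle wins; otherwise native is the default.
  return storage.getItem(BACKEND_KEY) === "tfjs" ? "tfjs" : "native";
}
```

Because nothing is written on a first visit, reverting Phase 1 only means restoring the old default inside this one function.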

**Phase 2 — fix + migrate embeddings to native (also un-breaks the dead tab).**
Replace the removed `useEmbedding` in `PlaygroundFull`/`AISDKPlayground` with a native
embedding hook loading `mlx-community/embeddinggemma-300m-4bit` + `engine.embed(text,
{taskType})`. This both ships native embeddings and repairs the currently-throwing Embed tab.
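
Because `engine.embed` returns unit-L2 vectors, the `similarity` that the playgrounds currently null out can be a plain dot product: for unit vectors, cosine similarity equals the dot product. A minimal sketch; `similarity` and `rankBySimilarity` are hypothetical helpers, not gerbil exports.

```typescript
// Cosine similarity of two unit-L2 vectors is just their dot product.
function similarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

// Indices of `docs`, most similar to `query` first.
function rankBySimilarity(query: Float32Array, docs: Float32Array[]): number[] {
  const scores = docs.map((doc) => similarity(query, doc));
  return docs.map((_, i) => i).sort((i, j) => scores[j] - scores[i]);
}
```

In the Embed tab, the query would be embedded with `engine.embed(q, { taskType: "query" })` and the candidates with `{ taskType: "document" }` before being ranked.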

**Phase 3 — migrate vision to native, swapping the model.**
Replace the `ministral-3b` `useCompletion` vision instances with a native vision hook:
`WebGPUEngine.create({repo: <Qwen3.5 ViT checkpoint>, enableVision:true})` +
`describeImage({pixels,width,height})`. The host only has to decode images into
`{pixels,width,height}`; `describeImage` does the pixel→patch preprocessing internally. It
also needs a copy/UI change, because the model id and capabilities differ from Ministral.
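
A minimal sketch of that host-side step, assuming `describeImage` accepts RGBA bytes in the layout the browser's `ImageData` uses (row-major, 4 bytes per pixel). The plan documents only the `{pixels,width,height}` shape, so the element type and channel layout are assumptions, and `toNativeImage` is a hypothetical helper. In the browser, the `ImageData` would come from drawing the uploaded image onto a canvas and calling `getImageData(0, 0, width, height)`.

```typescript
// Structural view of the browser's ImageData (RGBA, row-major).
interface ImageDataLike {
  data: Uint8ClampedArray;
  width: number;
  height: number;
}

// Assumed shape for describeImage's decoded-image input (element type is an assumption).
interface NativeImageInput {
  pixels: Uint8ClampedArray;
  width: number;
  height: number;
}

function toNativeImage(image: ImageDataLike): NativeImageInput {
  const expected = image.width * image.height * 4; // 4 channels: R, G, B, A
  if (image.data.length !== expected) {
    throw new Error(`expected ${expected} RGBA bytes, got ${image.data.length}`);
  }
  // No resizing or patching here: the engine preprocesses internally.
  return { pixels: image.data, width: image.width, height: image.height };
}
```

The result would then be passed as the first argument of `engine.describeImage(...)`, with the prompt as the second.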

**Phase 4 — keep audio on tfjs; flip when native audio lands.**
Leave `useSpeech` (TTS) and `useVoiceInput` (STT) on the transformers.js path. Swap TTS to
OmniVoice when it validates, then STT to Moonshine. This is the only phase gated on engine work
not yet done.

**Phase 5 — retire the tfjs path for non-audio (optional, end-state).**
Once Phases 1–3 are stable and audio is native (Phase 4), the transformers.js/ONNX worker can
be removed everywhere except, if the owner chooses to keep it, the explicit no-WebGPU fallback.
This matches paper §23 (`chrome-backend.ts` is slated for deletion engine-side). Whether to keep
*any* tfjs fallback at all is the owner call in §(d).

---

## (f) The single biggest risk

**Losing the WebGPU-less audience with no graceful fallback, on a demo that is the product's
shop window.** The site is gerbil's marketing front door: a visitor on an older iPhone, a
locked-down work laptop, or any browser without WebGPU currently still gets a working (if slow)
WASM demo. A native-only cutover turns that into a hard "unsupported" wall. The mitigation is
non-negotiable and cheap: **feature-detect WebGPU at the top of every playground**
(`isWebGPUSupported` is already imported there), default non-WebGPU visitors to either the
retained tfjs `PlaygroundFull` or an explicit, friendly "needs WebGPU" state, and **never** flip
the default to native without that guard in place. Secondary risks, in order: the
**tgz/publishing coupling** (no native hooks are published yet, so the site is hand-wrapping
`WebGPUEngine`, a versioning and maintenance liability until Phase 0 lands), the **vision model
swap** (Ministral → Qwen3.5 ViT changes behavior and copy, not just an import), and **iPad
re-download cost** (no durable cache without a PWA, paper §24; a UX issue, not a correctness one).