@tryhamster/gerbil 1.0.0-rc.8 → 1.0.0

Files changed (179)
  1. package/LICENSE +1 -1
  2. package/README.md +247 -84
  3. package/dist/architectures-C1I5V3Dt.mjs +6070 -0
  4. package/dist/architectures-C1I5V3Dt.mjs.map +1 -0
  5. package/dist/browser/index.d.ts +264 -588
  6. package/dist/browser/index.d.ts.map +1 -1
  7. package/dist/browser/index.js +585 -2334
  8. package/dist/browser/index.js.map +1 -1
  9. package/dist/cli.mjs +625 -1098
  10. package/dist/cli.mjs.map +1 -1
  11. package/dist/defaults-9komdrbY.mjs +24 -0
  12. package/dist/defaults-9komdrbY.mjs.map +1 -0
  13. package/dist/frameworks/express.d.mts +1 -3
  14. package/dist/frameworks/express.d.mts.map +1 -1
  15. package/dist/frameworks/express.mjs +7 -7
  16. package/dist/frameworks/express.mjs.map +1 -1
  17. package/dist/frameworks/fastify.d.mts +1 -1
  18. package/dist/frameworks/fastify.d.mts.map +1 -1
  19. package/dist/frameworks/fastify.mjs +3 -3
  20. package/dist/frameworks/fastify.mjs.map +1 -1
  21. package/dist/frameworks/hono.d.mts +1 -1
  22. package/dist/frameworks/hono.d.mts.map +1 -1
  23. package/dist/frameworks/hono.mjs +4 -4
  24. package/dist/frameworks/hono.mjs.map +1 -1
  25. package/dist/frameworks/next.d.mts +3 -2
  26. package/dist/frameworks/next.d.mts.map +1 -1
  27. package/dist/frameworks/next.mjs +4 -4
  28. package/dist/frameworks/next.mjs.map +1 -1
  29. package/dist/frameworks/react.d.mts +1 -1
  30. package/dist/frameworks/trpc.d.mts +1 -1
  31. package/dist/frameworks/trpc.d.mts.map +1 -1
  32. package/dist/frameworks/trpc.mjs +4 -4
  33. package/dist/frameworks/trpc.mjs.map +1 -1
  34. package/dist/gerbil-BHrJJIa4.mjs +1656 -0
  35. package/dist/gerbil-BHrJJIa4.mjs.map +1 -0
  36. package/dist/gerbil-BT9fCydo.d.mts +488 -0
  37. package/dist/gerbil-BT9fCydo.d.mts.map +1 -0
  38. package/dist/gerbil-DomNfIr1.mjs +4 -0
  39. package/dist/gpu/hooks.d.mts +520 -0
  40. package/dist/gpu/hooks.d.mts.map +1 -0
  41. package/dist/gpu/hooks.mjs +1188 -0
  42. package/dist/gpu/hooks.mjs.map +1 -0
  43. package/dist/gpu/index.d.mts +2 -0
  44. package/dist/gpu/index.mjs +6 -0
  45. package/dist/gpu-33qCAtHW.mjs +3615 -0
  46. package/dist/gpu-33qCAtHW.mjs.map +1 -0
  47. package/dist/index-Dgmb2kE3.d.mts +245 -0
  48. package/dist/index-Dgmb2kE3.d.mts.map +1 -0
  49. package/dist/index-jEAL2s-A.d.mts +2022 -0
  50. package/dist/index-jEAL2s-A.d.mts.map +1 -0
  51. package/dist/index.d.mts +22 -487
  52. package/dist/index.d.mts.map +1 -1
  53. package/dist/index.mjs +13 -8
  54. package/dist/index.mjs.map +1 -1
  55. package/dist/indexeddb-store-BWIMtxxH.mjs +103 -0
  56. package/dist/indexeddb-store-BWIMtxxH.mjs.map +1 -0
  57. package/dist/indexeddb-store-ClH12Xnl.mjs +4 -0
  58. package/dist/integrations/ai-sdk.d.mts +75 -6
  59. package/dist/integrations/ai-sdk.d.mts.map +1 -1
  60. package/dist/integrations/ai-sdk.mjs +131 -15
  61. package/dist/integrations/ai-sdk.mjs.map +1 -1
  62. package/dist/integrations/langchain.d.mts +1 -1
  63. package/dist/integrations/langchain.d.mts.map +1 -1
  64. package/dist/integrations/langchain.mjs +5 -5
  65. package/dist/integrations/langchain.mjs.map +1 -1
  66. package/dist/integrations/llamaindex.d.mts +1 -1
  67. package/dist/integrations/llamaindex.d.mts.map +1 -1
  68. package/dist/integrations/llamaindex.mjs +5 -5
  69. package/dist/integrations/llamaindex.mjs.map +1 -1
  70. package/dist/integrations/mcp-client.mjs +3 -3
  71. package/dist/integrations/mcp-client.mjs.map +1 -1
  72. package/dist/integrations/mcp.d.mts +3 -2
  73. package/dist/integrations/mcp.d.mts.map +1 -1
  74. package/dist/integrations/mcp.mjs +5 -5
  75. package/dist/{mcp-BvbriaBy.mjs → mcp-1DaMsaBc.mjs} +4 -4
  76. package/dist/mcp-1DaMsaBc.mjs.map +1 -0
  77. package/dist/memory/index.d.mts +3 -0
  78. package/dist/memory/index.mjs +6 -0
  79. package/dist/memory-D1P7Tmda.mjs +4 -0
  80. package/dist/memory-DVN0MnIG.mjs +132 -0
  81. package/dist/memory-DVN0MnIG.mjs.map +1 -0
  82. package/dist/memory-Dj0J1v88.mjs +294 -0
  83. package/dist/memory-Dj0J1v88.mjs.map +1 -0
  84. package/dist/moonshine-stt-BLyVoRpB.mjs +4 -0
  85. package/dist/moonshine-stt-v_P_Ci_m.mjs +11936 -0
  86. package/dist/moonshine-stt-v_P_Ci_m.mjs.map +1 -0
  87. package/dist/{one-liner-s-lD8rCC.mjs → one-liner-DnQn7HJK.mjs} +14 -16
  88. package/dist/one-liner-DnQn7HJK.mjs.map +1 -0
  89. package/dist/repl-jV5gcJFA.mjs +9 -0
  90. package/dist/skills/index.d.mts +270 -320
  91. package/dist/skills/index.d.mts.map +1 -1
  92. package/dist/skills/index.mjs +5 -5
  93. package/dist/{skills-CD3Orlex.mjs → skills-DX8D59UH.mjs} +187 -32
  94. package/dist/skills-DX8D59UH.mjs.map +1 -0
  95. package/dist/{tools-Bi1P7Xoy.mjs → tools-DQ1mPUw5.mjs} +34 -22
  96. package/dist/tools-DQ1mPUw5.mjs.map +1 -0
  97. package/dist/{types-CiTc7ez3.d.mts → types-D6FiR_oh.d.mts} +106 -12
  98. package/dist/types-D6FiR_oh.d.mts.map +1 -0
  99. package/dist/types-DQBe2lFo.d.mts +165 -0
  100. package/dist/types-DQBe2lFo.d.mts.map +1 -0
  101. package/dist/{utils-CZBZ8dgR.mjs → utils-DKO55ZmZ.mjs} +1 -1
  102. package/dist/{utils-CZBZ8dgR.mjs.map → utils-DKO55ZmZ.mjs.map} +1 -1
  103. package/dist/vector-B0panuy6.mjs +95 -0
  104. package/dist/vector-B0panuy6.mjs.map +1 -0
  105. package/docs/PROJECT-STATE.md +321 -0
  106. package/docs/adding-a-model-family.md +280 -0
  107. package/docs/ai-sdk.md +70 -61
  108. package/docs/architecture/overview.md +17 -7
  109. package/docs/browser.md +203 -8
  110. package/docs/embeddings.md +156 -0
  111. package/docs/gerbil-site-native-migration.md +217 -0
  112. package/docs/gpu-engine/architectures.md +398 -0
  113. package/docs/gpu-engine/ir.md +372 -0
  114. package/docs/gpu-engine/kernels.md +718 -0
  115. package/docs/gpu-engine/paper.html +1759 -0
  116. package/docs/gpu-engine/paper.md +2109 -0
  117. package/docs/gpu-engine/safetensors.md +312 -0
  118. package/docs/gpu-engine/tokenizer.md +302 -0
  119. package/docs/memory-rag.md +91 -0
  120. package/docs/metal-safari-intel.md +190 -0
  121. package/docs/mobile-failure-diagnosis.md +124 -0
  122. package/docs/mobile.md +99 -0
  123. package/docs/observability.md +230 -0
  124. package/docs/onnx-removal-plan.md +339 -0
  125. package/docs/research/autoresearch-portable.md +904 -0
  126. package/docs/research/dispatch-reduction-hivemind.md +84 -0
  127. package/docs/research/ios-safari-model-caching.md +117 -0
  128. package/docs/research/mobile-webgpu-speed-fusion.md +135 -0
  129. package/docs/research/native-stt-model-selection.md +49 -0
  130. package/docs/research/native-tts-model-selection.md +90 -0
  131. package/docs/research/native-vs-chromium-decision.md +152 -0
  132. package/docs/research/nemotron-mamba2-inference.md +910 -0
  133. package/docs/research/qwen35-multimodal.md +293 -0
  134. package/docs/research/qwen36-gemma4-targets.md +337 -0
  135. package/docs/research/sota-embedding-models.md +179 -0
  136. package/docs/research/sota-mobile-models-2026.md +263 -0
  137. package/docs/research/sota-modality-models.md +202 -0
  138. package/docs/research/tps-baselines.md +71 -0
  139. package/docs/research/webgpu-m4-reference.md +104 -0
  140. package/docs/site-update-plan.md +155 -0
  141. package/docs/structured-output.md +123 -0
  142. package/docs/stt.md +63 -446
  143. package/docs/tts.md +77 -499
  144. package/docs/vision.md +100 -338
  145. package/package.json +22 -7
  146. package/dist/chrome-backend-CORwaIyC.mjs +0 -1212
  147. package/dist/chrome-backend-CORwaIyC.mjs.map +0 -1
  148. package/dist/chrome-backend-DIKYoWj-.mjs +0 -3
  149. package/dist/gerbil-CJ3ifloF.mjs +0 -4
  150. package/dist/gerbil-Dw4Qj77e.mjs +0 -1631
  151. package/dist/gerbil-Dw4Qj77e.mjs.map +0 -1
  152. package/dist/gerbil-qOTe1nl2.d.mts +0 -431
  153. package/dist/gerbil-qOTe1nl2.d.mts.map +0 -1
  154. package/dist/kokoro-BNTb6egA.mjs +0 -20210
  155. package/dist/kokoro-BNTb6egA.mjs.map +0 -1
  156. package/dist/kokoro-DFRQ1OeM.js +0 -20212
  157. package/dist/kokoro-DFRQ1OeM.js.map +0 -1
  158. package/dist/mcp-BvbriaBy.mjs.map +0 -1
  159. package/dist/one-liner-s-lD8rCC.mjs.map +0 -1
  160. package/dist/repl-DveXw36T.mjs +0 -9
  161. package/dist/skills-CD3Orlex.mjs.map +0 -1
  162. package/dist/stt-CpLYbGFd.mjs +0 -433
  163. package/dist/stt-CpLYbGFd.mjs.map +0 -1
  164. package/dist/stt-DRPLEEHB.mjs +0 -3
  165. package/dist/stt-Te8Qz-Ay.js +0 -433
  166. package/dist/stt-Te8Qz-Ay.js.map +0 -1
  167. package/dist/tools-Bi1P7Xoy.mjs.map +0 -1
  168. package/dist/transformers.web-DokyH3rP.js +0 -3
  169. package/dist/transformers.web-M6mCnEYJ.js +0 -30382
  170. package/dist/transformers.web-M6mCnEYJ.js.map +0 -1
  171. package/dist/tts-C0xx3CtE.js +0 -724
  172. package/dist/tts-C0xx3CtE.js.map +0 -1
  173. package/dist/tts-DXgsKGCe.mjs +0 -3
  174. package/dist/tts-DeGANMNV.mjs +0 -730
  175. package/dist/tts-DeGANMNV.mjs.map +0 -1
  176. package/dist/types-CiTc7ez3.d.mts.map +0 -1
  177. /package/dist/{auto-update-S9s5-g0C.mjs → auto-update-BVaLXcDE.mjs} +0 -0
  178. /package/dist/{chunk-CkXuGtQK.mjs → chunk-B9cbKln6.mjs} +0 -0
  179. /package/dist/{microphone-DaMZFRuR.mjs → microphone-Bqmoz9_K.mjs} +0 -0
package/LICENSE CHANGED
@@ -1,6 +1,6 @@
1
1
  MIT License
2
2
 
3
- Copyright (c) 2025 Wheel Go Fast.
3
+ Copyright (c) 2025-2026 Wheel Go Fast, Inc.
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
package/README.md CHANGED
@@ -5,14 +5,16 @@
5
5
  <h1 align="center">Gerbil</h1>
6
6
 
7
7
  <p align="center">
8
- <strong>Local AI inference for Node.js. LLM, TTS, STT. GPU-accelerated. Zero config.</strong>
8
+ <strong>A from-scratch WebGPU/WGSL inference engine. Text, vision, embeddings, speech — all native, on-device, in the browser and Node.</strong>
9
9
  </p>
10
10
 
11
11
  <p align="center">
12
12
  <a href="#install">Install</a> •
13
- <a href="#quick-start">Quick Start</a> •
14
- <a href="#text-to-speech">TTS</a> •
15
- <a href="#speech-to-text">STT</a> •
13
+ <a href="#native-webgpu-engine">Engine</a> •
14
+ <a href="#react-quickstart">React</a> •
15
+ <a href="#embeddings">Embeddings</a> •
16
+ <a href="#vision">Vision</a> •
17
+ <a href="#speech">Speech</a> •
16
18
  <a href="./docs/ai-sdk.md">AI SDK</a> •
17
19
  <a href="./docs/cli.md">CLI</a>
18
20
  </p>
@@ -35,20 +37,28 @@
35
37
  ---
36
38
 
37
39
  ```typescript
38
- import gerbil from "@tryhamster/gerbil";
40
+ import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
39
41
 
40
- const text = await gerbil("Explain recursion in one sentence");
42
+ const engine = await WebGPUEngine.create({ repo: "mlx-community/Qwen3.5-0.8B-4bit" });
43
+ const { text } = await engine.generate("Explain recursion in one sentence");
41
44
  ```
42
45
 
46
+ > **Pre-1.0.** Gerbil is a release-candidate (`1.0.0-rc.26`, changeset pre-release). The
47
+ > native engine surface below is the path going forward; APIs may still shift before 1.0.
48
+
43
49
  ## Why Gerbil?
44
50
 
45
- - **Zero Config** — `npx @tryhamster/gerbil "your prompt"` just works
46
- - **Local & Private** — No API keys, no data leaves your machine
47
- - **GPU Accelerated** — WebGPU with CPU fallback
48
- - **Complete Audio** Text-to-Speech (Kokoro) & Speech-to-Text (Whisper)
49
- - **Framework Ready** — AI SDK v5, Next.js, Express, LangChain
50
- - **Skills System** Built-in + custom skills with Zod validation
51
- - **Tool Calling** — Agentic capabilities with Qwen3 models
51
+ - **One native engine** — a from-scratch WebGPU/WGSL engine, pure compute shaders, nothing
52
+ extra to ship.
53
+ - **Multimodal, all native** — text, vision (image→text), embeddings, and speech run on the
54
+ same engine, loading safetensors directly from the HuggingFace Hub.
55
+ - **Browser & Node** — Chrome 113+, Safari 26+ (iOS 26+), Firefox 141+, and Node via Dawn
56
+ (`webgpu` npm), anywhere there's a real GPU.
57
+ - **Local & private** — no API keys, nothing leaves the device.
58
+ - **React-first** — `useEngine` owns load / unload / hot-swap and shares one engine
59
+ across components (reference-counted), with `dtype: "auto"` picking int4 on mobile.
60
+ - **Framework ready** — Vercel AI SDK v5, Next.js, Express, LangChain adapters.
61
+ - **Skills & tools** — built-in + custom skills with Zod validation; agentic tool calling.
52
62
 
53
63
  ## Install
54
64
 
@@ -65,82 +75,184 @@ npm install @tryhamster/gerbil
65
75
 
66
76
  After global install, use `gerbil` directly instead of `npx @tryhamster/gerbil`.
67
77
 
68
- ## Quick Start
78
+ ## Native WebGPU Engine
79
+
80
+ Gerbil's product is a from-scratch WebGPU inference engine — pure WGSL compute shaders.
81
+ It loads safetensors directly from the HuggingFace Hub (selective tensor download — skip
82
+ vision towers you don't need) and runs the same code in the browser and in Node (via Dawn).
69
83
 
70
84
  ```typescript
71
- import { Gerbil } from "@tryhamster/gerbil";
85
+ import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
72
86
 
73
- const g = new Gerbil();
74
- await g.loadModel("qwen3-0.6b");
87
+ // dtype "auto" picks int4 on mobile, the repo's native precision on desktop.
88
+ const engine = await WebGPUEngine.create({
89
+ repo: "mlx-community/Qwen3.5-0.8B-4bit",
90
+ dtype: "auto",
91
+ });
75
92
 
76
93
  // Generate
77
- const result = await g.generate("Write a haiku");
78
- console.log(result.text);
94
+ const { text, tokensPerSecond } = await engine.generate("Write a haiku about gerbils");
95
+ console.log(text, `(${tokensPerSecond.toFixed(1)} tok/s)`);
79
96
 
80
97
  // Stream
81
- for await (const chunk of g.stream("Tell me a story")) {
82
- process.stdout.write(chunk);
98
+ for await (const token of engine.stream("Tell me a story")) {
99
+ process.stdout.write(token);
83
100
  }
84
101
 
85
- // Thinking mode (Qwen3)
86
- const math = await g.generate("What is 127 × 43?", { thinking: true });
87
- console.log(math.thinking); // Shows reasoning
88
- console.log(math.text); // "5461"
102
+ engine.destroy();
103
+ ```
89
104
 
90
- // Structured JSON
91
- const data = await g.json("Extract: John, 32, NYC", {
92
- schema: z.object({ name: z.string(), age: z.number(), city: z.string() }),
93
- });
105
+ `WebGPUEngine.create({ repo, dtype, enableVision, embedding, maxSeqLen })` returns an
106
+ engine with `generate`, `stream`, `describeImage`, `embed`, and `speak`. See the
107
+ [native engine docs](#supported-models) below for the model lineup.
108
+
109
+ ## React Quickstart
110
+
111
+ `useEngine` (from `@tryhamster/gerbil/gpu/hooks`) owns the full engine lifecycle —
112
+ load, unload, hot-swap on config change, and reference-counted sharing so multiple
113
+ components never upload the same weights to the GPU twice.
114
+
115
+ ```tsx
116
+ import { useEngine } from "@tryhamster/gerbil/gpu/hooks";
117
+
118
+ function Chat() {
119
+ const { complete, completion, isLoading, isGenerating, tps } = useEngine({
120
+ model: "mlx-community/Qwen3.5-0.8B-4bit",
121
+ autoLoad: true, // dtype defaults to "auto": int4 on mobile, native on desktop
122
+ });
123
+
124
+ if (isLoading) return <div>Loading model…</div>;
125
+ return (
126
+ <div>
127
+ <button onClick={() => complete("What is 2+2?")} disabled={isGenerating}>
128
+ Generate
129
+ </button>
130
+ <p>{completion}</p>
131
+ {isGenerating && <span>{tps?.toFixed(1)} tok/s</span>}
132
+ </div>
133
+ );
134
+ }
94
135
  ```
95
136
 
96
- ## Text-to-Speech
137
+ The same hook exposes `describeImage` (vision), `embed`/`similarity` (embeddings), `stop`,
138
+ and `dispose`. Pass `enableVision: true` or `embedding: true` to load those modalities.
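The reference-counted sharing described for `useEngine` can be pictured with a small pool. This is only a sketch of the general technique, not the hook's actual implementation: `EngineLike`, `SharedEnginePool`, `acquire`, and `release` are hypothetical names, and the loader is passed in so the example stays self-contained.

```typescript
// Hypothetical names throughout; this is not code from @tryhamster/gerbil.
interface EngineLike {
  destroy(): void;
}

interface PoolEntry<E extends EngineLike> {
  engine: Promise<E>;
  refs: number;
}

class SharedEnginePool<E extends EngineLike> {
  private readonly entries = new Map<string, PoolEntry<E>>();
  private readonly load: (key: string) => Promise<E>;

  constructor(load: (key: string) => Promise<E>) {
    this.load = load;
  }

  /** Returns the shared engine for `key`, starting a load only on first use. */
  acquire(key: string): Promise<E> {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { engine: this.load(key), refs: 0 };
      this.entries.set(key, entry);
    }
    entry.refs += 1;
    return entry.engine;
  }

  /** Drops one reference; the last release removes and destroys the engine. */
  async release(key: string): Promise<void> {
    const entry = this.entries.get(key);
    if (!entry) return;
    entry.refs -= 1;
    if (entry.refs === 0) {
      this.entries.delete(key);
      (await entry.engine).destroy();
    }
  }
}
```

Two components that acquire the same key receive the same engine instance, so the weights are loaded once; the engine is destroyed only after both have released it.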
139
+
140
+ ## Structured Output
97
141
 
98
- Generate natural speech locally using Kokoro TTS (28 voices):
142
+ `generateObject` makes the model return a JSON object: it generates, extracts the JSON,
143
+ validates it, and retries with a corrective nudge until it's valid (or `maxRetries` is hit).
144
+ Validate with a predicate `(o) => boolean` or a minimal `{ required: [...] }` schema; omit
145
+ `schema` to accept any valid JSON.
99
146
 
100
147
  ```typescript
101
- const result = await g.speak("Hello, I'm Gerbil!", { voice: "af_heart" });
102
- // result.audio = Float32Array, result.sampleRate = 24000
148
+ import { generateObject } from "@tryhamster/gerbil";
103
149
 
104
- // Stream long text
105
- for await (const chunk of g.speakStream("Long paragraph...")) {
106
- // Play each chunk as it's generated
107
- }
150
+ const { object, attempts } = await generateObject<{ name: string; age: number }>(
151
+ 'Extract {name, age} from: "I am Sarah, 28"',
152
+ { schema: { required: ["name", "age"] } },
153
+ );
154
+ // object === { name: "Sarah", age: 28 }
155
+ ```
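The generate, extract, validate, retry loop that the README text attributes to `generateObject` can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `generateObjectSketch`, `extractJson`, and `isValid` are hypothetical names, the text generator is injected as a plain function, the default of 3 retries is invented, and "first `{` to last `}`" is just one simple extraction strategy.

```typescript
// Validator shapes follow the README text: a predicate or a { required: [...] } object.
type Schema = ((o: unknown) => boolean) | { required: string[] };

/** Parses the span from the first "{" to the last "}"; returns null on failure. */
function extractJson(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null;
  }
}

/** Accepts any JSON object when no schema is given. */
function isValid(obj: unknown, schema?: Schema): boolean {
  if (obj === null || typeof obj !== "object") return false;
  if (!schema) return true;
  if (typeof schema === "function") return schema(obj);
  return schema.required.every((key) => key in obj);
}

async function generateObjectSketch<T>(
  generate: (prompt: string) => Promise<string>, // assumed: any text generator
  prompt: string,
  options: { schema?: Schema; maxRetries?: number } = {},
): Promise<{ object: T; attempts: number }> {
  const maxRetries = options.maxRetries ?? 3; // assumed default
  let currentPrompt = prompt;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    const obj = extractJson(await generate(currentPrompt));
    if (isValid(obj, options.schema)) {
      return { object: obj as T, attempts: attempt };
    }
    // Corrective nudge: restate the task and ask for JSON only.
    currentPrompt = `${prompt}\n\nYour previous reply was not valid. Respond with only a JSON object.`;
  }
  throw new Error(`No valid JSON object after ${maxRetries + 1} attempts`);
}
```

Here `maxRetries` counts retries after the first attempt, which is one possible reading of the README's wording; the real function may count differently.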
156
+
157
+ It's available on the engine, the `Gerbil` class, and the one-liner API:
158
+
159
+ ```typescript
160
+ import { Gerbil, WebGPUEngine } from "@tryhamster/gerbil";
161
+
162
+ const g = new Gerbil();
163
+ await g.loadModel("qwen3.5-0.8b");
164
+ const { object } = await g.generateObject("List 3 primes as {primes: number[]}", {
165
+ schema: (o) => Array.isArray((o as any).primes),
166
+ });
167
+
168
+ // Or directly on the engine:
169
+ const engine = await WebGPUEngine.create({ repo: "mlx-community/Qwen3.5-0.8B-4bit" });
170
+ await engine.generateObject("…", { schema: { required: ["title"] } });
171
+ ```
172
+
173
+ In React, use `useObject` (from `@tryhamster/gerbil/gpu/hooks`):
174
+
175
+ ```tsx
176
+ import { useObject } from "@tryhamster/gerbil/gpu/hooks";
177
+
178
+ const { generate, object, isGenerating } = useObject<{ city: string }>();
179
+ await generate("Extract the city from: I live in Paris", {
180
+ schema: { required: ["city"] },
181
+ });
108
182
  ```
109
183
 
184
+ From the CLI:
185
+
110
186
  ```bash
111
- # CLI
112
- gerbil speak "Hello world" --voice bf_emma
187
+ gerbil object "Extract {name, age}: I am Sarah, 28" --schema person.json
188
+ # person.json: { "required": ["name", "age"] }
113
189
  ```
114
190
 
115
- 📖 **[Full TTS Documentation →](./docs/tts.md)**
191
+ ## Embeddings
116
192
 
117
- ## Speech-to-Text
193
+ Native text embeddings via **EmbeddingGemma-300M** (mean-pooled Gemma3 encoder + Dense
194
+ head, 768-dim, L2-normalized). EmbeddingGemma is asymmetric — pass `taskType` so queries
195
+ and documents get the right prefix.
118
196
 
119
- Transcribe audio locally using Whisper (7 models, 80+ languages):
197
+ ```typescript
198
+ import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
199
+
200
+ const engine = await WebGPUEngine.create({
201
+ repo: "mlx-community/embeddinggemma-300m-4bit",
202
+ embedding: true,
203
+ });
204
+
205
+ const query = await engine.embed("capital of France", { taskType: "query" });
206
+ const doc = await engine.embed("Paris is the capital of France", { taskType: "document" });
207
+
208
+ // Vectors are unit-norm, so cosine similarity is a dot product.
209
+ const sim = query.reduce((s, v, i) => s + v * doc[i], 0);
210
+ ```
211
+
212
+ 📖 **[Full Embeddings Documentation →](./docs/embeddings.md)**
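Because the README states that the vectors are L2-normalized, ranking documents against a query reduces to sorting by dot product. The helper below is a self-contained sketch of that arithmetic; `l2Normalize`, `dot`, and `rankBySimilarity` are hypothetical names, and the vectors are assumed to be `Float32Array`s of equal length.

```typescript
/** Scales a vector to unit length and returns a copy. */
function l2Normalize(v: Float32Array): Float32Array {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1; // guard zero vectors
  return v.map((x) => x / norm);
}

/** Dot product; equals cosine similarity when both inputs are unit-norm. */
function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  return a.reduce((s, x, i) => s + x * (b[i] ?? 0), 0);
}

/** Returns document indices ordered from most to least similar to the query. */
function rankBySimilarity(
  query: Float32Array,
  docs: Float32Array[],
): { index: number; score: number }[] {
  return docs
    .map((d, index) => ({ index, score: dot(query, d) }))
    .sort((x, y) => y.score - x.score);
}
```

If the vectors were not already unit-norm, `l2Normalize` would need to be applied to both sides first; otherwise the dot product is no longer a cosine.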
213
+
214
+ ## Vision
215
+
216
+ Image-in → text-out via the native vision towers (Qwen3.5 ViT and Gemma 4 ViT). Load with
217
+ `enableVision: true`, then call `describeImage`.
120
218
 
121
219
  ```typescript
122
- import { readFileSync } from "fs";
220
+ const engine = await WebGPUEngine.create({
221
+ repo: "Qwen/Qwen3.5-0.8B",
222
+ enableVision: true,
223
+ });
224
+
225
+ // In Node, decode the image to RGB pixels (HWC, 0..255) yourself; in the browser the
226
+ // React hook's describeImage() takes a URL / data-URL directly.
227
+ const { text } = await engine.describeImage(
228
+ { pixels, width, height },
229
+ "What's in this image?",
230
+ );
231
+ ```
123
232
 
124
- const audio = new Uint8Array(readFileSync("recording.wav"));
125
- const result = await g.transcribe(audio);
126
- console.log(result.text);
233
+ 📖 **[Full Vision Documentation →](./docs/vision.md)**
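The README comment asks Node callers to supply RGB pixels in HWC order with values 0..255. Many decoders (and the browser's canvas `ImageData`) return RGBA instead, so a conversion step is usually needed. The sketch below drops the alpha channel; `rgbaToRgb` is a hypothetical helper, and the use of a `Uint8Array` for `pixels` is an assumption, since the README does not state the exact typed-array type.

```typescript
/** Drops the alpha channel from row-major RGBA bytes, keeping HWC order. */
function rgbaToRgb(
  rgba: Uint8Array | Uint8ClampedArray,
  width: number,
  height: number,
): { pixels: Uint8Array; width: number; height: number } {
  if (rgba.length !== width * height * 4) {
    throw new Error("RGBA buffer size does not match width * height * 4");
  }
  const pixels = new Uint8Array(width * height * 3);
  for (let src = 0, dst = 0; src < rgba.length; src += 4, dst += 3) {
    // Copy R, G, B and skip A.
    pixels.set(rgba.subarray(src, src + 3), dst);
  }
  return { pixels, width, height };
}
```

The returned object has the `{ pixels, width, height }` shape used in the README's `describeImage` call.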
127
234
 
128
- // With timestamps
129
- const result = await g.transcribe(audio, { timestamps: true });
130
- for (const seg of result.segments) {
131
- console.log(`[${seg.start}s] ${seg.text}`);
132
- }
235
+ ## Speech
133
236
 
134
- // Record from microphone
135
- const result = await g.listen(5000); // 5 seconds
237
+ **Text-to-speech**: native **Kani-TTS-2** (LFM2-350M codec-LM + NVIDIA NeMo NanoCodec).
238
+ `engine.speak()` returns 22.05 kHz mono PCM.
239
+
240
+ ```typescript
241
+ const engine = await WebGPUEngine.create({ repo: "nineninesix/kani-tts-2-en" });
242
+ const { pcm, sampleRate } = await engine.speak("Hello, I'm Gerbil!"); // sampleRate === 22050
136
243
  ```
137
244
 
138
- ```bash
139
- # CLI
140
- gerbil transcribe audio.wav --timestamps
245
+ **Speech-to-text** — native **Moonshine** (raw-waveform encoder/decoder, no FFT/log-mel)
246
+ via the dedicated `MoonshineSTT` class.
247
+
248
+ ```typescript
249
+ import { MoonshineSTT } from "@tryhamster/gerbil/gpu";
250
+
251
+ const stt = await MoonshineSTT.create({ repo: "UsefulSensors/moonshine-base" });
252
+ const { text } = await stt.transcribe(pcm16kMono); // Float32Array @ 16 kHz
141
253
  ```
142
254
 
143
- 📖 **[Full STT Documentation →](./docs/stt.md)**
255
+ 📖 **[Full TTS Documentation →](./docs/tts.md)** · **[Full STT Documentation →](./docs/stt.md)**
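To play or save the mono PCM that `engine.speak()` is documented to return, one common approach is to wrap it in a 16-bit WAV container. The sketch below assumes the samples arrive as a `Float32Array` in the range [-1, 1]; the README does not specify the sample format, so treat that as an assumption. `encodeWav` is a hypothetical helper, not part of the package.

```typescript
/** Wraps mono float PCM (assumed range [-1, 1]) in a 16-bit PCM WAV file. */
function encodeWav(pcm: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = pcm.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // format 1 = integer PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  pcm.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    view.setInt16(44 + i * bytesPerSample, Math.round(clamped * 32767), true);
  });
  return new Uint8Array(buffer);
}
```

With the values from the README example, the call would look like `encodeWav(pcm, sampleRate)` with `sampleRate` equal to 22050.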
144
256
 
145
257
  ## Skills
146
258
 
@@ -231,25 +343,30 @@ gerbil update # Update to latest version
231
343
 
232
344
  ## Browser Usage
233
345
 
234
- Run LLMs directly in the browser with WebGPU — no server required:
346
+ Run LLMs directly in the browser with WebGPU — no server required. The React hooks
347
+ live at `@tryhamster/gerbil/gpu/hooks` and run pure WebGPU compute:
235
348
 
236
349
  ```tsx
237
- import { useChat } from "@tryhamster/gerbil/browser";
350
+ import { useChat } from "@tryhamster/gerbil/gpu/hooks";
238
351
 
239
352
  function Chat() {
240
- const { messages, input, setInput, handleSubmit, isLoading } = useChat();
353
+ const { messages, send, isLoading, isGenerating } = useChat();
241
354
 
242
355
  if (isLoading) return <div>Loading model...</div>;
243
356
 
244
357
  return (
245
- <form onSubmit={handleSubmit}>
246
- {messages.map(m => <div key={m.id}>{m.role}: {m.content}</div>)}
247
- <input value={input} onChange={e => setInput(e.target.value)} />
248
- </form>
358
+ <div>
359
+ {messages.map((m, i) => <div key={i}>{m.role}: {m.content}</div>)}
360
+ <button onClick={() => send("Hello!")} disabled={isGenerating}>Send</button>
361
+ </div>
249
362
  );
250
363
  }
251
364
  ```
252
365
 
366
+ `@tryhamster/gerbil/browser` still exports the device/WebGPU utilities
367
+ (`isModelSafeForDevice`, `detectMemoryCrash`, `downloadModelChunked`,
368
+ `checkWebGPUCapabilities`, `getBrowserDiagnostics`, …).
369
+
253
370
  📖 **[Full Browser Documentation →](./docs/browser.md)**
254
371
 
255
372
  ## Integrations
@@ -263,40 +380,82 @@ function Chat() {
263
380
  | **LangChain** | `@tryhamster/gerbil/langchain` | [📖 Frameworks](./docs/frameworks.md) |
264
381
  | **MCP Server** | `npx @tryhamster/gerbil serve --mcp` | [📖 MCP](./docs/mcp.md) |
265
382
 
266
- **Audio capabilities:** TTS and STT are built into the core `Gerbil` class, `@tryhamster/gerbil/browser` hooks, and `@tryhamster/gerbil/ai` provider.
383
+ **Native engine:** `import { WebGPUEngine } from "@tryhamster/gerbil/gpu"` (or `useEngine` from `@tryhamster/gerbil/gpu/hooks` for React) is the primary surface for text, vision, embeddings, and speech.
384
+
385
+ ## Supported Models
267
386
 
268
- ## Models
387
+ The native engine runs these modalities today. All load straight from the HuggingFace Hub
388
+ via `WebGPUEngine.create({ repo })`.
269
389
 
270
- ### Language Models
390
+ ### Text
271
391
 
272
- | Model | Size | Best For |
273
- |-------|------|----------|
274
- | `qwen3-0.6b` | ~400MB | General use, reasoning (thinking mode) |
275
- | `qwen2.5-coder-0.5b` | ~400MB | Code generation |
276
- | `smollm2-135m` | ~100MB | Quick completions |
392
+ | Model | Repo | Notes |
393
+ |-------|------|-------|
394
+ | **Qwen3.5-0.8B** | `mlx-community/Qwen3.5-0.8B-4bit` | Default text model; vision-capable (`Qwen/Qwen3.5-0.8B` for the ViT) |
395
+ | **Qwen3.5-2B** | `Qwen/Qwen3.5-2B` | Higher quality; 262k context; multimodal (vision-capable) |
396
+ | **LFM2.5-350M** | `LiquidAI/LFM2.5-350M` | Hybrid conv/attention, very fast, ~199 MB q4 |
397
+ | **Gemma 4 E2B** | `mlx-community/gemma-4-e2b-it-4bit` | PLE CPU-streamed; vision-capable |
277
398
 
278
- Use any HuggingFace model: `npx @tryhamster/gerbil -m hf:org/model "prompt"`
399
+ ### Vision (image → text, `describeImage`)
279
400
 
280
- ### Audio Models
401
+ | Tower | From | Notes |
402
+ |-------|------|-------|
403
+ | **Qwen3.5 ViT** | `Qwen/Qwen3.5-0.8B` (`enableVision: true`) | Bit-exact vs HF |
404
+ | **Gemma 4 ViT** | `mlx-community/gemma-4-e2b-it-4bit` (`enableVision: true`) | Native projector |
281
405
 
282
- | Model | Type | Size | Notes |
406
+ ### Embeddings (`embed`)
407
+
408
+ | Model | Repo | Notes |
409
+ |-------|------|-------|
410
+ | **EmbeddingGemma-300M** | `mlx-community/embeddinggemma-300m-4bit` | 768-dim, asymmetric (`taskType`), runs on iPad |
411
+
412
+ ### Speech
413
+
414
+ | Model | Type | Repo | Notes |
283
415
  |-------|------|------|-------|
284
- | `kokoro-82m` | TTS | ~330MB | 28 voices, English |
285
- | `whisper-tiny.en` | STT | 39MB | English, fastest |
286
- | `whisper-base.en` | STT | 74MB | English, balanced |
287
- | `whisper-small` | STT | 244MB | 80+ languages |
416
+ | **Kani-TTS-2** | TTS | `nineninesix/kani-tts-2-en` | `engine.speak()` → 22.05 kHz PCM |
417
+ | **Moonshine** | STT | `UsefulSensors/moonshine-base` | `MoonshineSTT.transcribe()`, raw-waveform |
418
+
419
+ ### Quantization & dtype
420
+
421
+ `dtype: "auto"` (the React-hook default) picks int4 on mobile and the repo's native
422
+ precision on desktop. For Qwen3.5-0.8B on Dawn/Node:
423
+
424
+ | Format | Download | tok/s | Notes |
425
+ |---|---|---|---|
426
+ | MLX 4-bit (affine) | 404 MB | fastest | Smallest. Recommended. |
427
+ | GPTQ (AutoRound) | 734 MB | fast | Pre-quantized linears, F16 embed |
428
+ | F32 (on-the-fly Q4) | 1666 MB | slowest | No pre-quantization needed |
429
+
430
+ > Throughput moves run-to-run and across the optimization loop; treat these as relative,
431
+ > not promises.
432
+
433
+ ### WGSL Kernels
434
+
435
+ MatMul, MatMulInt4, EmbeddingInt4, RMSNorm, RoPE, GQA Attention (flash-style, causal +
436
+ bidirectional), SwiGLU/GeGLU, CrossAttention, CausalConv1d, M-RoPE, EmbedSplice, FSQ +
437
+ HiFi-GAN (NanoCodec decoder), and more.
438
+
439
+ > **High-level `Gerbil` class.** `import { Gerbil } from "@tryhamster/gerbil"` (plus the
440
+ > one-liner and `@tryhamster/gerbil/skills`) is a supported convenience wrapper over the
441
+ > native `WebGPUEngine` — ideal for quick scripts, the CLI, and the AI SDK. Reach for
442
+ > `WebGPUEngine` / `useEngine` directly when you want lower-level control over loading,
443
+ > vision, embeddings, and speech.
288
444
 
289
445
  ## Documentation
290
446
 
291
447
  | Guide | Description |
292
448
  |-------|-------------|
293
- | [📖 Text-to-Speech](./docs/tts.md) | Kokoro TTS, 28 voices, streaming audio |
294
- | [📖 Speech-to-Text](./docs/stt.md) | Whisper STT, transcription, voice input |
449
+ | [📖 Structured Output](./docs/structured-output.md) | `generateObject` / `useObject` validated JSON with retries |
450
+ | [📖 Embeddings](./docs/embeddings.md) | EmbeddingGemma semantic search, similarity, RAG |
451
+ | [📖 Vision](./docs/vision.md) | Image → text with Qwen3.5 ViT & Gemma 4 ViT |
452
+ | [📖 Text-to-Speech](./docs/tts.md) | Native Kani-TTS-2 (`engine.speak()`) |
453
+ | [📖 Speech-to-Text](./docs/stt.md) | Native Moonshine (`MoonshineSTT`) |
295
454
  | [📖 Browser](./docs/browser.md) | WebGPU inference, React hooks |
296
455
  | [📖 Skills](./docs/skills.md) | Built-in skills, custom skill development |
297
456
  | [📖 Tools](./docs/tools.md) | Tool calling, agentic workflows |
298
457
  | [📖 REPL](./docs/repl.md) | Interactive terminal dashboard |
299
- | [📖 AI SDK](./docs/ai-sdk.md) | Vercel AI SDK v5 (LLM, TTS, STT) |
458
+ | [📖 AI SDK](./docs/ai-sdk.md) | Vercel AI SDK v5 (LLM, TTS, STT, Embeddings) |
300
459
  | [📖 Frameworks](./docs/frameworks.md) | Next.js, Express, React, LangChain |
301
460
  | [📖 CLI](./docs/cli.md) | All CLI commands and options |
302
461
  | [📖 MCP Server](./docs/mcp.md) | MCP server for Claude Desktop & Cursor |
@@ -304,8 +463,12 @@ Use any HuggingFace model: `npx @tryhamster/gerbil -m hf:org/model "prompt"`
304
463
 
305
464
  ## Requirements
306
465
 
307
- - Node.js 18+
308
- - For GPU: WebGPU-compatible environment
466
+ The native engine needs a real GPU and a WebGPU runtime:
467
+
468
+ - **Browser** — Chrome/Edge 113+, Safari 26+ (iOS/iPadOS 26+), or Firefox 141+
469
+ - **Node** — Node.js 18+ with the `webgpu` package (Dawn) installed
470
+
471
+ On devices without WebGPU the engine throws a clear error rather than silently degrading.
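Applications that want to show their own fallback UI before the engine throws can probe for WebGPU first. The check below relies only on the standard `navigator.gpu.requestAdapter()` call, which resolves to `null` when no suitable adapter exists. `requireWebGPU` and `NavigatorLike` are hypothetical names, and this is a sketch of a pre-flight check, not how Gerbil itself detects support.

```typescript
// Structural type so the sketch type-checks without WebGPU type definitions installed.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

/** Resolves with an adapter, or throws a descriptive error when WebGPU is unusable. */
async function requireWebGPU(nav: NavigatorLike): Promise<unknown> {
  if (!nav.gpu) {
    throw new Error("WebGPU is not available: navigator.gpu is undefined");
  }
  const adapter = await nav.gpu.requestAdapter();
  if (adapter === null || adapter === undefined) {
    throw new Error("WebGPU is exposed, but no GPU adapter was found");
  }
  return adapter;
}
```

In a browser the argument would be the global `navigator` (cast to `NavigatorLike` if the installed DOM typings lack `gpu`); in Node the equivalent object comes from whichever WebGPU binding is installed.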
309
472
 
310
473
  ## License
311
474