@tryhamster/gerbil 1.0.0-rc.9 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (179)
  1. package/LICENSE +1 -1
  2. package/README.md +318 -104
  3. package/dist/architectures-C1I5V3Dt.mjs +6070 -0
  4. package/dist/architectures-C1I5V3Dt.mjs.map +1 -0
  5. package/dist/browser/index.d.ts +276 -590
  6. package/dist/browser/index.d.ts.map +1 -1
  7. package/dist/browser/index.js +592 -2334
  8. package/dist/browser/index.js.map +1 -1
  9. package/dist/cli.mjs +625 -1098
  10. package/dist/cli.mjs.map +1 -1
  11. package/dist/defaults-9komdrbY.mjs +24 -0
  12. package/dist/defaults-9komdrbY.mjs.map +1 -0
  13. package/dist/frameworks/express.d.mts +1 -3
  14. package/dist/frameworks/express.d.mts.map +1 -1
  15. package/dist/frameworks/express.mjs +7 -7
  16. package/dist/frameworks/express.mjs.map +1 -1
  17. package/dist/frameworks/fastify.d.mts +1 -1
  18. package/dist/frameworks/fastify.d.mts.map +1 -1
  19. package/dist/frameworks/fastify.mjs +3 -3
  20. package/dist/frameworks/fastify.mjs.map +1 -1
  21. package/dist/frameworks/hono.d.mts +1 -1
  22. package/dist/frameworks/hono.d.mts.map +1 -1
  23. package/dist/frameworks/hono.mjs +4 -4
  24. package/dist/frameworks/hono.mjs.map +1 -1
  25. package/dist/frameworks/next.d.mts +3 -2
  26. package/dist/frameworks/next.d.mts.map +1 -1
  27. package/dist/frameworks/next.mjs +4 -4
  28. package/dist/frameworks/next.mjs.map +1 -1
  29. package/dist/frameworks/react.d.mts +1 -1
  30. package/dist/frameworks/trpc.d.mts +1 -1
  31. package/dist/frameworks/trpc.d.mts.map +1 -1
  32. package/dist/frameworks/trpc.mjs +4 -4
  33. package/dist/frameworks/trpc.mjs.map +1 -1
  34. package/dist/gerbil-BetB5xb0.d.mts +488 -0
  35. package/dist/gerbil-BetB5xb0.d.mts.map +1 -0
  36. package/dist/gerbil-CTZUa8EZ.mjs +4 -0
  37. package/dist/gerbil-DNniplr4.mjs +1656 -0
  38. package/dist/gerbil-DNniplr4.mjs.map +1 -0
  39. package/dist/gpu/hooks.d.mts +640 -0
  40. package/dist/gpu/hooks.d.mts.map +1 -0
  41. package/dist/gpu/hooks.mjs +1369 -0
  42. package/dist/gpu/hooks.mjs.map +1 -0
  43. package/dist/gpu/index.d.mts +2 -0
  44. package/dist/gpu/index.mjs +6 -0
  45. package/dist/gpu-DFuglcEx.mjs +3790 -0
  46. package/dist/gpu-DFuglcEx.mjs.map +1 -0
  47. package/dist/index-Dgmb2kE3.d.mts +245 -0
  48. package/dist/index-Dgmb2kE3.d.mts.map +1 -0
  49. package/dist/index-DukkJRMj.d.mts +2114 -0
  50. package/dist/index-DukkJRMj.d.mts.map +1 -0
  51. package/dist/index.d.mts +22 -487
  52. package/dist/index.d.mts.map +1 -1
  53. package/dist/index.mjs +13 -8
  54. package/dist/index.mjs.map +1 -1
  55. package/dist/indexeddb-store-BWIMtxxH.mjs +103 -0
  56. package/dist/indexeddb-store-BWIMtxxH.mjs.map +1 -0
  57. package/dist/indexeddb-store-ClH12Xnl.mjs +4 -0
  58. package/dist/integrations/ai-sdk.d.mts +75 -6
  59. package/dist/integrations/ai-sdk.d.mts.map +1 -1
  60. package/dist/integrations/ai-sdk.mjs +131 -15
  61. package/dist/integrations/ai-sdk.mjs.map +1 -1
  62. package/dist/integrations/langchain.d.mts +1 -1
  63. package/dist/integrations/langchain.d.mts.map +1 -1
  64. package/dist/integrations/langchain.mjs +5 -5
  65. package/dist/integrations/langchain.mjs.map +1 -1
  66. package/dist/integrations/llamaindex.d.mts +1 -1
  67. package/dist/integrations/llamaindex.d.mts.map +1 -1
  68. package/dist/integrations/llamaindex.mjs +5 -5
  69. package/dist/integrations/llamaindex.mjs.map +1 -1
  70. package/dist/integrations/mcp-client.mjs +3 -3
  71. package/dist/integrations/mcp-client.mjs.map +1 -1
  72. package/dist/integrations/mcp.d.mts +3 -2
  73. package/dist/integrations/mcp.d.mts.map +1 -1
  74. package/dist/integrations/mcp.mjs +5 -5
  75. package/dist/{mcp-BvbriaBy.mjs → mcp-D2vvH1Xc.mjs} +4 -4
  76. package/dist/mcp-D2vvH1Xc.mjs.map +1 -0
  77. package/dist/memory/index.d.mts +3 -0
  78. package/dist/memory/index.mjs +6 -0
  79. package/dist/memory-D1P7Tmda.mjs +4 -0
  80. package/dist/memory-DVN0MnIG.mjs +132 -0
  81. package/dist/memory-DVN0MnIG.mjs.map +1 -0
  82. package/dist/memory-Dj0J1v88.mjs +294 -0
  83. package/dist/memory-Dj0J1v88.mjs.map +1 -0
  84. package/dist/moonshine-stt-17dpP1kr.mjs +4 -0
  85. package/dist/moonshine-stt-4ojLtMq7.mjs +11962 -0
  86. package/dist/moonshine-stt-4ojLtMq7.mjs.map +1 -0
  87. package/dist/{one-liner-s-lD8rCC.mjs → one-liner-JhdIPxzF.mjs} +14 -16
  88. package/dist/one-liner-JhdIPxzF.mjs.map +1 -0
  89. package/dist/repl-BDRkwPGX.mjs +9 -0
  90. package/dist/skills/index.d.mts +270 -320
  91. package/dist/skills/index.d.mts.map +1 -1
  92. package/dist/skills/index.mjs +5 -5
  93. package/dist/{skills-CD3Orlex.mjs → skills-CU694Dc8.mjs} +187 -32
  94. package/dist/skills-CU694Dc8.mjs.map +1 -0
  95. package/dist/{tools-Bi1P7Xoy.mjs → tools-DQ1mPUw5.mjs} +34 -22
  96. package/dist/tools-DQ1mPUw5.mjs.map +1 -0
  97. package/dist/types-DQBe2lFo.d.mts +165 -0
  98. package/dist/types-DQBe2lFo.d.mts.map +1 -0
  99. package/dist/{types-CiTc7ez3.d.mts → types-LlyYILII.d.mts} +112 -14
  100. package/dist/types-LlyYILII.d.mts.map +1 -0
  101. package/dist/{utils-CZBZ8dgR.mjs → utils-DKO55ZmZ.mjs} +1 -1
  102. package/dist/{utils-CZBZ8dgR.mjs.map → utils-DKO55ZmZ.mjs.map} +1 -1
  103. package/dist/vector-B0panuy6.mjs +95 -0
  104. package/dist/vector-B0panuy6.mjs.map +1 -0
  105. package/docs/PROJECT-STATE.md +321 -0
  106. package/docs/adding-a-model-family.md +280 -0
  107. package/docs/ai-sdk.md +70 -61
  108. package/docs/architecture/overview.md +17 -7
  109. package/docs/browser.md +203 -8
  110. package/docs/embeddings.md +156 -0
  111. package/docs/gerbil-site-native-migration.md +217 -0
  112. package/docs/gpu-engine/architectures.md +398 -0
  113. package/docs/gpu-engine/ir.md +372 -0
  114. package/docs/gpu-engine/kernels.md +718 -0
  115. package/docs/gpu-engine/paper.html +1759 -0
  116. package/docs/gpu-engine/paper.md +2109 -0
  117. package/docs/gpu-engine/safetensors.md +312 -0
  118. package/docs/gpu-engine/tokenizer.md +302 -0
  119. package/docs/memory-rag.md +91 -0
  120. package/docs/metal-safari-intel.md +190 -0
  121. package/docs/mobile-failure-diagnosis.md +124 -0
  122. package/docs/mobile.md +99 -0
  123. package/docs/observability.md +230 -0
  124. package/docs/onnx-removal-plan.md +339 -0
  125. package/docs/research/autoresearch-portable.md +904 -0
  126. package/docs/research/dispatch-reduction-hivemind.md +84 -0
  127. package/docs/research/ios-safari-model-caching.md +117 -0
  128. package/docs/research/mobile-webgpu-speed-fusion.md +135 -0
  129. package/docs/research/native-stt-model-selection.md +49 -0
  130. package/docs/research/native-tts-model-selection.md +90 -0
  131. package/docs/research/native-vs-chromium-decision.md +152 -0
  132. package/docs/research/nemotron-mamba2-inference.md +910 -0
  133. package/docs/research/qwen35-multimodal.md +293 -0
  134. package/docs/research/qwen36-gemma4-targets.md +337 -0
  135. package/docs/research/sota-embedding-models.md +179 -0
  136. package/docs/research/sota-mobile-models-2026.md +263 -0
  137. package/docs/research/sota-modality-models.md +202 -0
  138. package/docs/research/tps-baselines.md +71 -0
  139. package/docs/research/webgpu-m4-reference.md +104 -0
  140. package/docs/site-update-plan.md +155 -0
  141. package/docs/structured-output.md +123 -0
  142. package/docs/stt.md +63 -446
  143. package/docs/tts.md +77 -499
  144. package/docs/vision.md +100 -338
  145. package/package.json +22 -7
  146. package/dist/chrome-backend-CORwaIyC.mjs +0 -1212
  147. package/dist/chrome-backend-CORwaIyC.mjs.map +0 -1
  148. package/dist/chrome-backend-DIKYoWj-.mjs +0 -3
  149. package/dist/gerbil-CJ3ifloF.mjs +0 -4
  150. package/dist/gerbil-Dw4Qj77e.mjs +0 -1631
  151. package/dist/gerbil-Dw4Qj77e.mjs.map +0 -1
  152. package/dist/gerbil-qOTe1nl2.d.mts +0 -431
  153. package/dist/gerbil-qOTe1nl2.d.mts.map +0 -1
  154. package/dist/kokoro-BNTb6egA.mjs +0 -20210
  155. package/dist/kokoro-BNTb6egA.mjs.map +0 -1
  156. package/dist/kokoro-CMOGDSgT.js +0 -20212
  157. package/dist/kokoro-CMOGDSgT.js.map +0 -1
  158. package/dist/mcp-BvbriaBy.mjs.map +0 -1
  159. package/dist/one-liner-s-lD8rCC.mjs.map +0 -1
  160. package/dist/repl-DveXw36T.mjs +0 -9
  161. package/dist/skills-CD3Orlex.mjs.map +0 -1
  162. package/dist/stt-Bu-E23Sc.js +0 -433
  163. package/dist/stt-Bu-E23Sc.js.map +0 -1
  164. package/dist/stt-CpLYbGFd.mjs +0 -433
  165. package/dist/stt-CpLYbGFd.mjs.map +0 -1
  166. package/dist/stt-DRPLEEHB.mjs +0 -3
  167. package/dist/tools-Bi1P7Xoy.mjs.map +0 -1
  168. package/dist/transformers.web-DiD1gTwk.js +0 -44695
  169. package/dist/transformers.web-DiD1gTwk.js.map +0 -1
  170. package/dist/transformers.web-u34VxRFM.js +0 -3
  171. package/dist/tts-CqroPaSK.js +0 -724
  172. package/dist/tts-CqroPaSK.js.map +0 -1
  173. package/dist/tts-DXgsKGCe.mjs +0 -3
  174. package/dist/tts-DeGANMNV.mjs +0 -730
  175. package/dist/tts-DeGANMNV.mjs.map +0 -1
  176. package/dist/types-CiTc7ez3.d.mts.map +0 -1
  177. /package/dist/{auto-update-S9s5-g0C.mjs → auto-update-BVaLXcDE.mjs} +0 -0
  178. /package/dist/{chunk-CkXuGtQK.mjs → chunk-B9cbKln6.mjs} +0 -0
  179. /package/dist/{microphone-DaMZFRuR.mjs → microphone-Bqmoz9_K.mjs} +0 -0
package/LICENSE CHANGED
@@ -1,6 +1,6 @@
  MIT License
 
- Copyright (c) 2025 Wheel Go Fast.
+ Copyright (c) 2025-2026 Wheel Go Fast, Inc.
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
package/README.md CHANGED
@@ -5,16 +5,19 @@
  <h1 align="center">Gerbil</h1>
 
  <p align="center">
- <strong>Local AI inference for Node.js. LLM, TTS, STT. GPU-accelerated. Zero config.</strong>
+ <strong>A from-scratch WebGPU/WGSL inference engine. Text, vision, embeddings, speech — all native, on-device, in the browser and Node.</strong>
  </p>
 
  <p align="center">
+ <a href="https://gerbilsdk.com"><strong>gerbilsdk.com</strong></a> •
  <a href="#install">Install</a> •
- <a href="#quick-start">Quick Start</a> •
- <a href="#text-to-speech">TTS</a> •
- <a href="#speech-to-text">STT</a> •
- <a href="./docs/ai-sdk.md">AI SDK</a> •
- <a href="./docs/cli.md">CLI</a>
+ <a href="#native-webgpu-engine">Engine</a> •
+ <a href="#react-quickstart">React</a> •
+ <a href="#embeddings">Embeddings</a> •
+ <a href="#vision">Vision</a> •
+ <a href="#speech">Speech</a>
+ <a href="https://gerbilsdk.com/docs/frameworks/ai-sdk">AI SDK</a> •
+ <a href="https://gerbilsdk.com/docs/cli">CLI</a>
  </p>
 
  <p align="center">
@@ -35,20 +38,29 @@
  ---
 
  ```typescript
- import gerbil from "@tryhamster/gerbil";
+ import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
 
- const text = await gerbil("Explain recursion in one sentence");
+ const engine = await WebGPUEngine.create({ repo: "mlx-community/Qwen3.5-0.8B-4bit" });
+ const { text } = await engine.generate("Explain recursion in one sentence");
  ```
 
+ > **📚 Full docs → [gerbilsdk.com/docs](https://gerbilsdk.com/docs)** · live playground & demos at **[gerbilsdk.com](https://gerbilsdk.com)**
+
  ## Why Gerbil?
 
- - **Zero Config** — `npx @tryhamster/gerbil "your prompt"` just works
- - **Local & Private** — No API keys, no data leaves your machine
- - **GPU Accelerated** — WebGPU with CPU fallback
- - **Complete Audio** Text-to-Speech (Kokoro) & Speech-to-Text (Whisper)
- - **Framework Ready** — AI SDK v5, Next.js, Express, LangChain
- - **Skills System** Built-in + custom skills with Zod validation
- - **Tool Calling** — Agentic capabilities with Qwen3 models
+ - **One native engine** — a from-scratch WebGPU/WGSL engine, pure compute shaders, nothing
+ extra to ship.
+ - **~90 KB gzipped** — the _entire_ multimodal engine. No heavyweight ML runtime; model
+ weights stream from the HuggingFace Hub at run time.
+ - **Multimodal, all native** — text, vision (image→text), embeddings, and speech run on the
+ same engine, loading safetensors directly from the HuggingFace Hub.
+ - **Browser & Node** — Chrome 113+, Safari 26+ (iOS 26+), Firefox 141+, and Node via Dawn
+ (`webgpu` npm), anywhere there's a real GPU.
+ - **Local & private** — no API keys, nothing leaves the device.
+ - **React-first** — `useEngine` owns load / unload / hot-swap and shares one engine
+ across components (reference-counted), with `dtype: "auto"` picking int4 on mobile.
+ - **Framework ready** — Vercel AI SDK v5, Next.js, Express, LangChain adapters.
+ - **Skills & tools** — built-in + custom skills with Zod validation; agentic tool calling.
 
  ## Install
 
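The "React-first" bullet in the hunk above says `useEngine` shares one engine across components and reference-counts it. A minimal sketch of that general pattern follows, assuming only an async factory and a `destroy()` method (the README does show `engine.destroy()`). `SharedEngineCache`, `acquire`, `release`, and `refCount` are hypothetical names for illustration, not Gerbil's internals.

```typescript
// Hypothetical sketch of reference-counted sharing; not Gerbil's implementation.
interface Destroyable {
  destroy(): void;
}

class SharedEngineCache<T extends Destroyable> {
  // key -> the (possibly still loading) engine plus its number of live holders
  private entries = new Map<string, { engine: Promise<T>; refs: number }>();

  constructor(private create: (key: string) => Promise<T>) {}

  // Only the first caller for a key triggers creation (and the weight upload).
  acquire(key: string): Promise<T> {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { engine: this.create(key), refs: 0 };
      this.entries.set(key, entry);
    }
    entry.refs += 1;
    return entry.engine;
  }

  // Drops one reference; the last holder destroys the engine to free GPU memory.
  async release(key: string): Promise<void> {
    const entry = this.entries.get(key);
    if (!entry) return;
    entry.refs -= 1;
    if (entry.refs === 0) {
      this.entries.delete(key); // a later acquire() starts a fresh engine
      (await entry.engine).destroy();
    }
  }

  refCount(key: string): number {
    return this.entries.get(key)?.refs ?? 0;
  }
}
```

Caching the promise rather than the resolved engine means two components mounting at the same time still share one load. A production version would also evict the entry when `create` rejects.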
@@ -65,82 +77,187 @@ npm install @tryhamster/gerbil
 
  After global install, use `gerbil` directly instead of `npx @tryhamster/gerbil`.
 
- ## Quick Start
+ ## Native WebGPU Engine
+
+ Gerbil's product is a from-scratch WebGPU inference engine — pure WGSL compute shaders.
+ It loads safetensors directly from the HuggingFace Hub (selective tensor download — skip
+ vision towers you don't need) and runs the same code in the browser and in Node (via Dawn).
 
  ```typescript
- import { Gerbil } from "@tryhamster/gerbil";
+ import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
 
- const g = new Gerbil();
- await g.loadModel("qwen3-0.6b");
+ // dtype "auto" picks int4 on mobile, the repo's native precision on desktop.
+ const engine = await WebGPUEngine.create({
+ repo: "mlx-community/Qwen3.5-0.8B-4bit",
+ dtype: "auto",
+ });
 
  // Generate
- const result = await g.generate("Write a haiku");
- console.log(result.text);
+ const { text, tokensPerSecond } = await engine.generate("Write a haiku about gerbils");
+ console.log(text, `(${tokensPerSecond.toFixed(1)} tok/s)`);
 
  // Stream
- for await (const chunk of g.stream("Tell me a story")) {
- process.stdout.write(chunk);
+ for await (const token of engine.stream("Tell me a story")) {
+ process.stdout.write(token);
  }
 
- // Thinking mode (Qwen3)
- const math = await g.generate("What is 127 × 43?", { thinking: true });
- console.log(math.thinking); // Shows reasoning
- console.log(math.text); // "5461"
+ engine.destroy();
+ ```
 
- // Structured JSON
- const data = await g.json("Extract: John, 32, NYC", {
- schema: z.object({ name: z.string(), age: z.number(), city: z.string() }),
- });
+ `WebGPUEngine.create({ repo, dtype, enableVision, embedding, maxSeqLen })` returns an
+ engine with `generate`, `stream`, `describeImage`, `embed`, and `speak`. See the
+ [native engine docs](#supported-models) below for the model lineup.
+
+ ## React Quickstart
+
+ `useEngine` (from `@tryhamster/gerbil/gpu/hooks`) owns the full engine lifecycle —
+ load, unload, hot-swap on config change, and reference-counted sharing so multiple
+ components never upload the same weights to the GPU twice.
+
+ ```tsx
+ import { useEngine } from "@tryhamster/gerbil/gpu/hooks";
+
+ function Chat() {
+ const { complete, completion, isLoading, isGenerating, tps } = useEngine({
+ model: "mlx-community/Qwen3.5-0.8B-4bit",
+ autoLoad: true, // dtype defaults to "auto": int4 on mobile, native on desktop
+ });
+
+ if (isLoading) return <div>Loading model…</div>;
+ return (
+ <div>
+ <button onClick={() => complete("What is 2+2?")} disabled={isGenerating}>
+ Generate
+ </button>
+ <p>{completion}</p>
+ {isGenerating && <span>{tps?.toFixed(1)} tok/s</span>}
+ </div>
+ );
+ }
  ```
 
- ## Text-to-Speech
+ The same hook exposes `describeImage` (vision), `embed`/`similarity` (embeddings), `stop`,
+ and `dispose`. Pass `enableVision: true` or `embedding: true` to load those modalities.
 
- Generate natural speech locally using Kokoro TTS (28 voices):
+ ## Structured Output
+
+ `generateObject` makes the model return a JSON object: it generates, extracts the JSON,
+ validates it, and retries with a corrective nudge until it's valid (or `maxRetries` is hit).
+ Validate with a predicate `(o) => boolean` or a minimal `{ required: [...] }` schema; omit
+ `schema` to accept any valid JSON.
 
  ```typescript
- const result = await g.speak("Hello, I'm Gerbil!", { voice: "af_heart" });
- // result.audio = Float32Array, result.sampleRate = 24000
+ import { generateObject } from "@tryhamster/gerbil";
 
- // Stream long text
- for await (const chunk of g.speakStream("Long paragraph...")) {
- // Play each chunk as it's generated
- }
+ const { object, attempts } = await generateObject<{ name: string; age: number }>(
+ 'Extract {name, age} from: "I am Sarah, 28"',
+ { schema: { required: ["name", "age"] } },
+ );
+ // object === { name: "Sarah", age: 28 }
  ```
 
+ It's available on the engine, the `Gerbil` class, and the one-liner API:
+
+ ```typescript
+ import { Gerbil, WebGPUEngine } from "@tryhamster/gerbil";
+
+ const g = new Gerbil();
+ await g.loadModel("qwen3.5-0.8b");
+ const { object } = await g.generateObject("List 3 primes as {primes: number[]}", {
+ schema: (o) => Array.isArray((o as any).primes),
+ });
+
+ // Or directly on the engine:
+ const engine = await WebGPUEngine.create({ repo: "mlx-community/Qwen3.5-0.8B-4bit" });
+ await engine.generateObject("…", { schema: { required: ["title"] } });
+ ```
+
+ In React, use `useObject` (from `@tryhamster/gerbil/gpu/hooks`):
+
+ ```tsx
+ import { useObject } from "@tryhamster/gerbil/gpu/hooks";
+
+ const { generate, object, isGenerating } = useObject<{ city: string }>();
+ await generate("Extract the city from: I live in Paris", {
+ schema: { required: ["city"] },
+ });
+ ```
+
+ From the CLI:
+
  ```bash
- # CLI
- gerbil speak "Hello world" --voice bf_emma
+ gerbil object "Extract {name, age}: I am Sarah, 28" --schema person.json
+ # person.json: { "required": ["name", "age"] }
+ ```
+
+ ## Embeddings
+
+ Native text embeddings via **EmbeddingGemma-300M** (mean-pooled Gemma3 encoder + Dense
+ head, 768-dim, L2-normalized). EmbeddingGemma is asymmetric — pass `taskType` so queries
+ and documents get the right prefix.
+
+ ```typescript
+ import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
+
+ const engine = await WebGPUEngine.create({
+ repo: "mlx-community/embeddinggemma-300m-4bit",
+ embedding: true,
+ });
+
+ const query = await engine.embed("capital of France", { taskType: "query" });
+ const doc = await engine.embed("Paris is the capital of France", { taskType: "document" });
+
+ // Vectors are unit-norm, so cosine similarity is a dot product.
+ const sim = query.reduce((s, v, i) => s + v * doc[i], 0);
  ```
 
- 📖 **[Full TTS Documentation →](./docs/tts.md)**
+ 📖 **[Full Embeddings Documentation →](https://gerbilsdk.com/docs)**
 
- ## Speech-to-Text
+ ## Vision
 
- Transcribe audio locally using Whisper (7 models, 80+ languages):
+ Image-in text-out via the native vision towers (Qwen3.5 ViT and Gemma 4 ViT). Load with
+ `enableVision: true`, then call `describeImage`.
 
  ```typescript
- import { readFileSync } from "fs";
+ const engine = await WebGPUEngine.create({
+ repo: "Qwen/Qwen3.5-0.8B",
+ enableVision: true,
+ });
+
+ // In Node, decode the image to RGB pixels (HWC, 0..255) yourself; in the browser the
+ // React hook's describeImage() takes a URL / data-URL directly.
+ const { text } = await engine.describeImage(
+ { pixels, width, height },
+ "What's in this image?",
+ );
+ ```
 
- const audio = new Uint8Array(readFileSync("recording.wav"));
- const result = await g.transcribe(audio);
- console.log(result.text);
+ 📖 **[Full Vision Documentation →](https://gerbilsdk.com/docs/vision)**
 
- // With timestamps
- const result = await g.transcribe(audio, { timestamps: true });
- for (const seg of result.segments) {
- console.log(`[${seg.start}s] ${seg.text}`);
- }
+ ## Speech
+
+ **Text-to-speech** — native **Kani-TTS-2** (LFM2-350M codec-LM + NVIDIA NeMo NanoCodec).
+ `engine.speak()` returns 22.05 kHz mono PCM.
 
- // Record from microphone
- const result = await g.listen(5000); // 5 seconds
+ ```typescript
+ const engine = await WebGPUEngine.create({ repo: "nineninesix/kani-tts-2-en" });
+ const { pcm, sampleRate } = await engine.speak("Hello, I'm Gerbil!"); // sampleRate === 22050
  ```
 
- ```bash
- # CLI
- gerbil transcribe audio.wav --timestamps
+ **Speech-to-text** — native **Moonshine** (raw-waveform encoder/decoder, no FFT/log-mel)
+ via the dedicated `MoonshineSTT` class.
+
+ ```typescript
+ import { MoonshineSTT } from "@tryhamster/gerbil/gpu";
+
+ const stt = await MoonshineSTT.create({ repo: "UsefulSensors/moonshine-base" });
+ const { text, noSpeech } = await stt.transcribe(pcm16kMono); // noSpeech flags silence
  ```
 
- 📖 **[Full STT Documentation →](./docs/stt.md)**
+ `transcribe` returns `noSpeech` (RMS VAD + min-duration + marker denylist) so you can skip
+ silent/empty clips; `useSTT` surfaces it too, with an `onNoSpeech` callback.
+
+ 📖 **[Full TTS Documentation →](https://gerbilsdk.com/docs/tts)** · **[Full STT Documentation →](https://gerbilsdk.com/docs/stt)**
 
  ## Skills
 
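The Structured Output section in the hunk above describes `generateObject` as a loop: generate, extract the JSON, validate it against a predicate or a `{ required: [...] }` schema, and retry with a corrective nudge until it is valid or `maxRetries` is reached. A minimal sketch of that loop follows, assuming a caller-supplied `generate(prompt)` text function and assuming `maxRetries` counts retries after the first attempt. `generateObjectSketch`, `extractJson`, and the nudge wording are hypothetical; this is not the library's code.

```typescript
// Hypothetical sketch of a generate → extract → validate → retry loop.
type Schema = ((o: unknown) => boolean) | { required: string[] };

// Pulls the outermost {...} span out of model output and tries to parse it.
function extractJson(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return undefined;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return undefined;
  }
}

function isValid(obj: unknown, schema?: Schema): boolean {
  if (schema === undefined) return true; // no schema: any parsed JSON is accepted
  if (typeof schema === "function") return schema(obj);
  if (typeof obj !== "object" || obj === null) return false;
  const record = obj as Record<string, unknown>;
  return schema.required.every((key) => key in record);
}

async function generateObjectSketch<T>(
  generate: (prompt: string) => Promise<string>,
  prompt: string,
  opts: { schema?: Schema; maxRetries?: number } = {},
): Promise<{ object: T; attempts: number }> {
  const maxRetries = opts.maxRetries ?? 3;
  let current = prompt;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    const obj = extractJson(await generate(current));
    if (obj !== undefined && isValid(obj, opts.schema)) {
      return { object: obj as T, attempts: attempt };
    }
    // Corrective nudge appended to the original prompt for the next attempt.
    current = `${prompt}\n\nRespond with only a valid JSON object.`;
  }
  throw new Error(`no valid JSON after ${maxRetries + 1} attempts`);
}
```

The `{ required }` check only tests key presence, matching the README's description of it as a minimal schema. Type checking of values is left to a predicate.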
@@ -174,7 +291,7 @@ await loadSkills("./skills"); // loads *.skill.ts
  const skill = useSkill("my-skill");
  ```
 
- 📖 **[Full Skills Documentation →](./docs/skills.md)**
+ 📖 **[Full Skills Documentation →](https://gerbilsdk.com/docs/skills)**
 
  ## Tools & Agents
 
@@ -191,6 +308,27 @@ const weatherTool = defineTool({
  });
  ```
 
+ **Agentic loop, on-device.** `engine.generateWithTools` (and the `useAgent` React hook)
+ run the whole loop — generate → call a tool → feed the result back → repeat — and return a
+ step trace for UIs:
+
+ ```tsx
+ import { useAgent } from "@tryhamster/gerbil/gpu/hooks";
+
+ const { run, steps, answer, isRunning } = useAgent({
+ model: "mlx-community/Qwen3.5-0.8B-4bit",
+ tools: [
+ {
+ name: "get_weather",
+ description: "Get the weather for a city",
+ parameters: { city: "string" },
+ execute: ({ city }) => `Weather in ${city}: 72°F, sunny`,
+ },
+ ],
+ });
+ await run("What's the weather in Paris?"); // steps[]: tool_call → tool_result → answer
+ ```
+
  **Built-in tools:**
  - `gerbil_docs` — Search Gerbil documentation
  - `run_skill` — Execute any Gerbil skill
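The Tools & Agents hunk above describes `generateWithTools` / `useAgent` as a loop (generate, call a tool, feed the result back, repeat) that returns a step trace of `tool_call → tool_result → answer`. A minimal sketch of that control flow follows, assuming a hypothetical `model` callback that returns either a tool call or a final answer. The `ModelTurn`/`Step` shapes, `runAgent`, and the `maxSteps` guard are illustrative, not Gerbil's API.

```typescript
// Hypothetical sketch of an agentic tool loop with a step trace.
interface Tool {
  name: string;
  execute: (args: Record<string, string>) => string | Promise<string>;
}

type ModelTurn =
  | { kind: "tool_call"; name: string; args: Record<string, string> }
  | { kind: "answer"; text: string };

type Step =
  | { type: "tool_call"; name: string; args: Record<string, string> }
  | { type: "tool_result"; name: string; result: string }
  | { type: "answer"; text: string };

async function runAgent(
  model: (transcript: string[]) => Promise<ModelTurn>,
  tools: Tool[],
  prompt: string,
  maxSteps = 5, // guard against a model that never stops calling tools
): Promise<{ answer: string; steps: Step[] }> {
  const transcript = [`user: ${prompt}`];
  const steps: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const turn = await model(transcript);
    if (turn.kind === "answer") {
      steps.push({ type: "answer", text: turn.text });
      return { answer: turn.text, steps };
    }
    steps.push({ type: "tool_call", name: turn.name, args: turn.args });
    const tool = tools.find((t) => t.name === turn.name);
    const result = tool ? await tool.execute(turn.args) : `unknown tool: ${turn.name}`;
    steps.push({ type: "tool_result", name: turn.name, result });
    // Feed the tool output back so the next generation can use it.
    transcript.push(`tool(${turn.name}): ${result}`);
  }
  throw new Error(`no answer within ${maxSteps} steps`);
}
```

Returning the tool error as text (rather than throwing) lets the model recover on the next turn, which is a common choice for small on-device models that occasionally misname tools.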
@@ -204,7 +342,28 @@ npx @tryhamster/gerbil repl
  # Gerbil will call the docs tool and synthesize an answer
  ```
 
- 📖 **[Full Tools Documentation →](./docs/tools.md)**
+ 📖 **[Full Tools Documentation →](https://gerbilsdk.com/docs/tools)**
+
+ ## Autocomplete & Rewrite
+
+ Inline autocomplete — `engine.autocomplete(prefix)` and the debounced `useAutocomplete`
+ hook return a brief single-line continuation (low-latency defaults + cleanup):
+
+ ```tsx
+ import { useAutocomplete } from "@tryhamster/gerbil/gpu/hooks";
+
+ const { suggestion, onInput, accept, dismiss } = useAutocomplete({
+ model: "mlx-community/Qwen3.5-0.8B-4bit",
+ });
+ // <input onChange={(e) => onInput(e.target.value)} /> — render `suggestion` as ghost text;
+ // Tab → accept(), Esc → dismiss()
+ ```
+
+ Tone rewrite — `engine.rewrite(text, { tone })` (and `useEngine().rewrite`) re-generates
+ text in a target tone (`"professional"`, `"friendly"`, `"concise"`, `"playful"`,
+ `"pirate"`) or with free-form `instructions`.
+
+ 📖 **[Full Autocomplete Documentation →](https://gerbilsdk.com/docs/autocomplete)**
 
  ## CLI
 
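The Autocomplete hunk above calls `useAutocomplete` "debounced" and says it returns a brief single-line continuation. A minimal sketch of debouncing with a stale-result guard follows, assuming a hypothetical async `complete(prefix)` function; `createAutocomplete`, `onSuggestion`, and the 150 ms default delay are illustrative choices, not the hook's actual implementation or defaults.

```typescript
// Hypothetical sketch of debounced autocomplete with stale-result guarding.
function createAutocomplete(
  complete: (prefix: string) => Promise<string>,
  onSuggestion: (suggestion: string) => void,
  delayMs = 150,
) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let latest = 0; // id of the newest request; results from older ids are dropped

  return {
    onInput(prefix: string): void {
      if (timer !== undefined) clearTimeout(timer); // restart the quiet period
      const id = ++latest;
      timer = setTimeout(async () => {
        const raw = await complete(prefix);
        if (id !== latest) return; // the user kept typing; this result is stale
        // Keep only the first line, to render as inline ghost text.
        onSuggestion(raw.split("\n")[0]);
      }, delayMs);
    },
    dismiss(): void {
      if (timer !== undefined) clearTimeout(timer);
      latest++; // invalidate any request that is still in flight
      onSuggestion("");
    },
  };
}
```

Debouncing limits how often generation starts, and the id check handles the separate problem of a slow completion finishing after newer input has arrived.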
@@ -227,85 +386,140 @@ gerbil update # Update to latest version
 
  > **Updates**: Gerbil checks for updates but never installs without permission. Press `u` in REPL or run `gerbil update`.
 
- 📖 **[Full CLI Documentation →](./docs/cli.md)**
+ 📖 **[Full CLI Documentation →](https://gerbilsdk.com/docs/cli)**
 
  ## Browser Usage
 
- Run LLMs directly in the browser with WebGPU — no server required:
+ Run LLMs directly in the browser with WebGPU — no server required. The React hooks
+ live at `@tryhamster/gerbil/gpu/hooks` and run pure WebGPU compute:
 
  ```tsx
- import { useChat } from "@tryhamster/gerbil/browser";
+ import { useChat } from "@tryhamster/gerbil/gpu/hooks";
 
  function Chat() {
- const { messages, input, setInput, handleSubmit, isLoading } = useChat();
+ const { messages, send, isLoading, isGenerating } = useChat();
 
  if (isLoading) return <div>Loading model...</div>;
 
  return (
- <form onSubmit={handleSubmit}>
- {messages.map(m => <div key={m.id}>{m.role}: {m.content}</div>)}
- <input value={input} onChange={e => setInput(e.target.value)} />
- </form>
+ <div>
+ {messages.map((m, i) => <div key={i}>{m.role}: {m.content}</div>)}
+ <button onClick={() => send("Hello!")} disabled={isGenerating}>Send</button>
+ </div>
  );
  }
  ```
 
- 📖 **[Full Browser Documentation →](./docs/browser.md)**
+ `@tryhamster/gerbil/browser` exports the device & storage utilities
+ (`isModelSafeForDevice`, `detectMemoryCrash`, `downloadModelChunked`,
+ `getRecommendedModels`, `requestPersistentStorage`, …).
+
+ 📖 **[Full Browser Documentation →](https://gerbilsdk.com/docs/browser)**
 
  ## Integrations
 
  | Integration | Import | Docs |
  |-------------|--------|------|
- | **Browser** | `@tryhamster/gerbil/browser` | [📖 Browser](./docs/browser.md) |
- | **AI SDK v5** | `@tryhamster/gerbil/ai` | [📖 AI SDK](./docs/ai-sdk.md) |
- | **Next.js** | `@tryhamster/gerbil/next` | [📖 Frameworks](./docs/frameworks.md) |
- | **Express** | `@tryhamster/gerbil/express` | [📖 Frameworks](./docs/frameworks.md) |
- | **LangChain** | `@tryhamster/gerbil/langchain` | [📖 Frameworks](./docs/frameworks.md) |
- | **MCP Server** | `npx @tryhamster/gerbil serve --mcp` | [📖 MCP](./docs/mcp.md) |
+ | **Browser** | `@tryhamster/gerbil/browser` | [📖 Browser](https://gerbilsdk.com/docs/browser) |
+ | **AI SDK v5** | `@tryhamster/gerbil/ai` | [📖 AI SDK](https://gerbilsdk.com/docs/frameworks/ai-sdk) |
+ | **Next.js** | `@tryhamster/gerbil/next` | [📖 Next.js](https://gerbilsdk.com/docs/frameworks/nextjs) |
+ | **Express** | `@tryhamster/gerbil/express` | [📖 Express](https://gerbilsdk.com/docs/frameworks/express) |
+ | **LangChain** | `@tryhamster/gerbil/langchain` | [📖 LangChain](https://gerbilsdk.com/docs/frameworks/langchain) |
+ | **MCP Server** | `npx @tryhamster/gerbil serve --mcp` | [📖 MCP](https://gerbilsdk.com/docs/mcp) |
+
+ **Native engine:** `import { WebGPUEngine } from "@tryhamster/gerbil/gpu"` (or `useEngine` from `@tryhamster/gerbil/gpu/hooks` for React) is the primary surface for text, vision, embeddings, and speech.
+
+ ## Supported Models
+
+ The native engine runs these modalities today. All load straight from the HuggingFace Hub
+ via `WebGPUEngine.create({ repo })`.
+
+ ### Text
 
- **Audio capabilities:** TTS and STT are built into the core `Gerbil` class, `@tryhamster/gerbil/browser` hooks, and `@tryhamster/gerbil/ai` provider.
+ | Model | Repo | Notes |
+ |-------|------|-------|
+ | **Qwen3.5-0.8B** | `mlx-community/Qwen3.5-0.8B-4bit` | Default text model; vision-capable (`Qwen/Qwen3.5-0.8B` for the ViT) |
+ | **Qwen3.5-2B** | `Qwen/Qwen3.5-2B` | Higher quality; 262k context; multimodal (vision-capable) |
+ | **LFM2.5-350M** | `LiquidAI/LFM2.5-350M` | Hybrid conv/attention, very fast, ~199 MB q4 |
+ | **Gemma 4 E2B** | `mlx-community/gemma-4-e2b-it-4bit` | PLE CPU-streamed; vision-capable |
 
- ## Models
+ ### Vision (image → text, `describeImage`)
 
- ### Language Models
+ | Tower | From | Notes |
+ |-------|------|-------|
+ | **Qwen3.5 ViT** | `Qwen/Qwen3.5-0.8B` (`enableVision: true`) | Bit-exact vs HF |
+ | **Gemma 4 ViT** | `mlx-community/gemma-4-e2b-it-4bit` (`enableVision: true`) | Native projector |
 
- | Model | Size | Best For |
- |-------|------|----------|
- | `qwen3-0.6b` | ~400MB | General use, reasoning (thinking mode) |
- | `qwen2.5-coder-0.5b` | ~400MB | Code generation |
- | `smollm2-135m` | ~100MB | Quick completions |
+ ### Embeddings (`embed`)
 
- Use any HuggingFace model: `npx @tryhamster/gerbil -m hf:org/model "prompt"`
+ | Model | Repo | Notes |
+ |-------|------|-------|
+ | **EmbeddingGemma-300M** | `mlx-community/embeddinggemma-300m-4bit` | 768-dim, asymmetric (`taskType`), runs on iPad |
 
- ### Audio Models
+ ### Speech
 
- | Model | Type | Size | Notes |
+ | Model | Type | Repo | Notes |
  |-------|------|------|-------|
- | `kokoro-82m` | TTS | ~330MB | 28 voices, English |
- | `whisper-tiny.en` | STT | 39MB | English, fastest |
- | `whisper-base.en` | STT | 74MB | English, balanced |
- | `whisper-small` | STT | 244MB | 80+ languages |
+ | **Kani-TTS-2** | TTS | `nineninesix/kani-tts-2-en` | `engine.speak()` 22.05 kHz PCM |
+ | **Moonshine** | STT | `UsefulSensors/moonshine-base` | `MoonshineSTT.transcribe()`, raw-waveform |
+
+ ### Quantization & dtype
+
+ `dtype: "auto"` (the React-hook default) picks int4 on mobile and the repo's native
+ precision on desktop. For Qwen3.5-0.8B on Dawn/Node:
+
+ | Format | Download | tok/s | Notes |
+ |---|---|---|---|
+ | MLX 4-bit (affine) | 404 MB | fastest | Smallest. Recommended. |
+ | GPTQ (AutoRound) | 734 MB | fast | Pre-quantized linears, F16 embed |
+ | F32 (on-the-fly Q4) | 1666 MB | slowest | No pre-quantization needed |
+
+ > Throughput moves run-to-run and across the optimization loop; treat these as relative,
+ > not promises.
+
+ ### WGSL Kernels
+
+ MatMul, MatMulInt4, EmbeddingInt4, RMSNorm, RoPE, GQA Attention (flash-style, causal +
+ bidirectional), SwiGLU/GeGLU, CrossAttention, CausalConv1d, M-RoPE, EmbedSplice, FSQ +
+ HiFi-GAN (NanoCodec decoder), and more.
+
+ > **High-level `Gerbil` class.** `import { Gerbil } from "@tryhamster/gerbil"` (plus the
+ > one-liner and `@tryhamster/gerbil/skills`) is a supported convenience wrapper over the
+ > native `WebGPUEngine` — ideal for quick scripts, the CLI, and the AI SDK. Reach for
+ > `WebGPUEngine` / `useEngine` directly when you want lower-level control over loading,
+ > vision, embeddings, and speech.
 
  ## Documentation
 
+ Full documentation, guides, and a live playground live at **[gerbilsdk.com/docs](https://gerbilsdk.com/docs)**.
+
  | Guide | Description |
  |-------|-------------|
- | [📖 Text-to-Speech](./docs/tts.md) | Kokoro TTS, 28 voices, streaming audio |
- | [📖 Speech-to-Text](./docs/stt.md) | Whisper STT, transcription, voice input |
- | [📖 Browser](./docs/browser.md) | WebGPU inference, React hooks |
- | [📖 Skills](./docs/skills.md) | Built-in skills, custom skill development |
- | [📖 Tools](./docs/tools.md) | Tool calling, agentic workflows |
- | [📖 REPL](./docs/repl.md) | Interactive terminal dashboard |
- | [📖 AI SDK](./docs/ai-sdk.md) | Vercel AI SDK v5 (LLM, TTS, STT) |
- | [📖 Frameworks](./docs/frameworks.md) | Next.js, Express, React, LangChain |
- | [📖 CLI](./docs/cli.md) | All CLI commands and options |
- | [📖 MCP Server](./docs/mcp.md) | MCP server for Claude Desktop & Cursor |
- | [📖 MCP Client](./docs/mcp-client.md) | Connect to external MCP servers |
+ | [📖 Getting Started](https://gerbilsdk.com/docs/getting-started) | Install, load a model, core concepts |
+ | [📖 Structured Output](https://gerbilsdk.com/docs) | `generateObject` / `useObject` validated JSON with retries |
+ | [📖 Embeddings](https://gerbilsdk.com/docs) | EmbeddingGemma semantic search, similarity, RAG |
+ | [📖 Vision](https://gerbilsdk.com/docs/vision) | Image text with Qwen3.5 ViT & Gemma 4 ViT |
+ | [📖 Text-to-Speech](https://gerbilsdk.com/docs/tts) | Native Kani-TTS-2 (`engine.speak()`) |
+ | [📖 Speech-to-Text](https://gerbilsdk.com/docs/stt) | Native Moonshine (`MoonshineSTT`) |
+ | [📖 Browser](https://gerbilsdk.com/docs/browser) | WebGPU inference, React hooks |
+ | [📖 Hooks](https://gerbilsdk.com/docs/hooks) | `useEngine` / `useObject` / `useTTS` / `useSTT` |
+ | [📖 Skills](https://gerbilsdk.com/docs/skills) | Built-in skills, custom skill development |
+ | [📖 Tools](https://gerbilsdk.com/docs/tools) | Tool calling, agentic workflows |
+ | [📖 REPL](https://gerbilsdk.com/docs/repl) | Interactive terminal dashboard |
+ | [📖 AI SDK](https://gerbilsdk.com/docs/frameworks/ai-sdk) | Vercel AI SDK v5 (LLM, TTS, STT, Embeddings) |
+ | [📖 Frameworks](https://gerbilsdk.com/docs/frameworks) | Next.js, Express, React, LangChain |
+ | [📖 CLI](https://gerbilsdk.com/docs/cli) | All CLI commands and options |
+ | [📖 Mobile](https://gerbilsdk.com/docs/mobile) | iOS / iPadOS guidance & memory guards |
+ | [📖 MCP](https://gerbilsdk.com/docs/mcp) | MCP server for Claude Desktop & Cursor |
 
  ## Requirements
 
- - Node.js 18+
- - For GPU: WebGPU-compatible environment
+ The native engine needs a real GPU and a WebGPU runtime:
+
+ - **Browser** — Chrome/Edge 113+, Safari 26+ (iOS/iPadOS 26+), or Firefox 141+
+ - **Node** — Node.js 18+ with the `webgpu` package (Dawn) installed
+
+ On devices without WebGPU the engine throws a clear error rather than silently degrading.
 
  ## License
525