@tryhamster/gerbil 1.0.0-rc.9 → 1.0.0
- package/LICENSE +1 -1
- package/README.md +247 -84
- package/dist/architectures-C1I5V3Dt.mjs +6070 -0
- package/dist/architectures-C1I5V3Dt.mjs.map +1 -0
- package/dist/browser/index.d.ts +264 -588
- package/dist/browser/index.d.ts.map +1 -1
- package/dist/browser/index.js +585 -2334
- package/dist/browser/index.js.map +1 -1
- package/dist/cli.mjs +625 -1098
- package/dist/cli.mjs.map +1 -1
- package/dist/defaults-9komdrbY.mjs +24 -0
- package/dist/defaults-9komdrbY.mjs.map +1 -0
- package/dist/frameworks/express.d.mts +1 -3
- package/dist/frameworks/express.d.mts.map +1 -1
- package/dist/frameworks/express.mjs +7 -7
- package/dist/frameworks/express.mjs.map +1 -1
- package/dist/frameworks/fastify.d.mts +1 -1
- package/dist/frameworks/fastify.d.mts.map +1 -1
- package/dist/frameworks/fastify.mjs +3 -3
- package/dist/frameworks/fastify.mjs.map +1 -1
- package/dist/frameworks/hono.d.mts +1 -1
- package/dist/frameworks/hono.d.mts.map +1 -1
- package/dist/frameworks/hono.mjs +4 -4
- package/dist/frameworks/hono.mjs.map +1 -1
- package/dist/frameworks/next.d.mts +3 -2
- package/dist/frameworks/next.d.mts.map +1 -1
- package/dist/frameworks/next.mjs +4 -4
- package/dist/frameworks/next.mjs.map +1 -1
- package/dist/frameworks/react.d.mts +1 -1
- package/dist/frameworks/trpc.d.mts +1 -1
- package/dist/frameworks/trpc.d.mts.map +1 -1
- package/dist/frameworks/trpc.mjs +4 -4
- package/dist/frameworks/trpc.mjs.map +1 -1
- package/dist/gerbil-BHrJJIa4.mjs +1656 -0
- package/dist/gerbil-BHrJJIa4.mjs.map +1 -0
- package/dist/gerbil-BT9fCydo.d.mts +488 -0
- package/dist/gerbil-BT9fCydo.d.mts.map +1 -0
- package/dist/gerbil-DomNfIr1.mjs +4 -0
- package/dist/gpu/hooks.d.mts +520 -0
- package/dist/gpu/hooks.d.mts.map +1 -0
- package/dist/gpu/hooks.mjs +1188 -0
- package/dist/gpu/hooks.mjs.map +1 -0
- package/dist/gpu/index.d.mts +2 -0
- package/dist/gpu/index.mjs +6 -0
- package/dist/gpu-33qCAtHW.mjs +3615 -0
- package/dist/gpu-33qCAtHW.mjs.map +1 -0
- package/dist/index-Dgmb2kE3.d.mts +245 -0
- package/dist/index-Dgmb2kE3.d.mts.map +1 -0
- package/dist/index-jEAL2s-A.d.mts +2022 -0
- package/dist/index-jEAL2s-A.d.mts.map +1 -0
- package/dist/index.d.mts +22 -487
- package/dist/index.d.mts.map +1 -1
- package/dist/index.mjs +13 -8
- package/dist/index.mjs.map +1 -1
- package/dist/indexeddb-store-BWIMtxxH.mjs +103 -0
- package/dist/indexeddb-store-BWIMtxxH.mjs.map +1 -0
- package/dist/indexeddb-store-ClH12Xnl.mjs +4 -0
- package/dist/integrations/ai-sdk.d.mts +75 -6
- package/dist/integrations/ai-sdk.d.mts.map +1 -1
- package/dist/integrations/ai-sdk.mjs +131 -15
- package/dist/integrations/ai-sdk.mjs.map +1 -1
- package/dist/integrations/langchain.d.mts +1 -1
- package/dist/integrations/langchain.d.mts.map +1 -1
- package/dist/integrations/langchain.mjs +5 -5
- package/dist/integrations/langchain.mjs.map +1 -1
- package/dist/integrations/llamaindex.d.mts +1 -1
- package/dist/integrations/llamaindex.d.mts.map +1 -1
- package/dist/integrations/llamaindex.mjs +5 -5
- package/dist/integrations/llamaindex.mjs.map +1 -1
- package/dist/integrations/mcp-client.mjs +3 -3
- package/dist/integrations/mcp-client.mjs.map +1 -1
- package/dist/integrations/mcp.d.mts +3 -2
- package/dist/integrations/mcp.d.mts.map +1 -1
- package/dist/integrations/mcp.mjs +5 -5
- package/dist/{mcp-BvbriaBy.mjs → mcp-1DaMsaBc.mjs} +4 -4
- package/dist/mcp-1DaMsaBc.mjs.map +1 -0
- package/dist/memory/index.d.mts +3 -0
- package/dist/memory/index.mjs +6 -0
- package/dist/memory-D1P7Tmda.mjs +4 -0
- package/dist/memory-DVN0MnIG.mjs +132 -0
- package/dist/memory-DVN0MnIG.mjs.map +1 -0
- package/dist/memory-Dj0J1v88.mjs +294 -0
- package/dist/memory-Dj0J1v88.mjs.map +1 -0
- package/dist/moonshine-stt-BLyVoRpB.mjs +4 -0
- package/dist/moonshine-stt-v_P_Ci_m.mjs +11936 -0
- package/dist/moonshine-stt-v_P_Ci_m.mjs.map +1 -0
- package/dist/{one-liner-s-lD8rCC.mjs → one-liner-DnQn7HJK.mjs} +14 -16
- package/dist/one-liner-DnQn7HJK.mjs.map +1 -0
- package/dist/repl-jV5gcJFA.mjs +9 -0
- package/dist/skills/index.d.mts +270 -320
- package/dist/skills/index.d.mts.map +1 -1
- package/dist/skills/index.mjs +5 -5
- package/dist/{skills-CD3Orlex.mjs → skills-DX8D59UH.mjs} +187 -32
- package/dist/skills-DX8D59UH.mjs.map +1 -0
- package/dist/{tools-Bi1P7Xoy.mjs → tools-DQ1mPUw5.mjs} +34 -22
- package/dist/tools-DQ1mPUw5.mjs.map +1 -0
- package/dist/{types-CiTc7ez3.d.mts → types-D6FiR_oh.d.mts} +106 -12
- package/dist/types-D6FiR_oh.d.mts.map +1 -0
- package/dist/types-DQBe2lFo.d.mts +165 -0
- package/dist/types-DQBe2lFo.d.mts.map +1 -0
- package/dist/{utils-CZBZ8dgR.mjs → utils-DKO55ZmZ.mjs} +1 -1
- package/dist/{utils-CZBZ8dgR.mjs.map → utils-DKO55ZmZ.mjs.map} +1 -1
- package/dist/vector-B0panuy6.mjs +95 -0
- package/dist/vector-B0panuy6.mjs.map +1 -0
- package/docs/PROJECT-STATE.md +321 -0
- package/docs/adding-a-model-family.md +280 -0
- package/docs/ai-sdk.md +70 -61
- package/docs/architecture/overview.md +17 -7
- package/docs/browser.md +203 -8
- package/docs/embeddings.md +156 -0
- package/docs/gerbil-site-native-migration.md +217 -0
- package/docs/gpu-engine/architectures.md +398 -0
- package/docs/gpu-engine/ir.md +372 -0
- package/docs/gpu-engine/kernels.md +718 -0
- package/docs/gpu-engine/paper.html +1759 -0
- package/docs/gpu-engine/paper.md +2109 -0
- package/docs/gpu-engine/safetensors.md +312 -0
- package/docs/gpu-engine/tokenizer.md +302 -0
- package/docs/memory-rag.md +91 -0
- package/docs/metal-safari-intel.md +190 -0
- package/docs/mobile-failure-diagnosis.md +124 -0
- package/docs/mobile.md +99 -0
- package/docs/observability.md +230 -0
- package/docs/onnx-removal-plan.md +339 -0
- package/docs/research/autoresearch-portable.md +904 -0
- package/docs/research/dispatch-reduction-hivemind.md +84 -0
- package/docs/research/ios-safari-model-caching.md +117 -0
- package/docs/research/mobile-webgpu-speed-fusion.md +135 -0
- package/docs/research/native-stt-model-selection.md +49 -0
- package/docs/research/native-tts-model-selection.md +90 -0
- package/docs/research/native-vs-chromium-decision.md +152 -0
- package/docs/research/nemotron-mamba2-inference.md +910 -0
- package/docs/research/qwen35-multimodal.md +293 -0
- package/docs/research/qwen36-gemma4-targets.md +337 -0
- package/docs/research/sota-embedding-models.md +179 -0
- package/docs/research/sota-mobile-models-2026.md +263 -0
- package/docs/research/sota-modality-models.md +202 -0
- package/docs/research/tps-baselines.md +71 -0
- package/docs/research/webgpu-m4-reference.md +104 -0
- package/docs/site-update-plan.md +155 -0
- package/docs/structured-output.md +123 -0
- package/docs/stt.md +63 -446
- package/docs/tts.md +77 -499
- package/docs/vision.md +100 -338
- package/package.json +22 -7
- package/dist/chrome-backend-CORwaIyC.mjs +0 -1212
- package/dist/chrome-backend-CORwaIyC.mjs.map +0 -1
- package/dist/chrome-backend-DIKYoWj-.mjs +0 -3
- package/dist/gerbil-CJ3ifloF.mjs +0 -4
- package/dist/gerbil-Dw4Qj77e.mjs +0 -1631
- package/dist/gerbil-Dw4Qj77e.mjs.map +0 -1
- package/dist/gerbil-qOTe1nl2.d.mts +0 -431
- package/dist/gerbil-qOTe1nl2.d.mts.map +0 -1
- package/dist/kokoro-BNTb6egA.mjs +0 -20210
- package/dist/kokoro-BNTb6egA.mjs.map +0 -1
- package/dist/kokoro-CMOGDSgT.js +0 -20212
- package/dist/kokoro-CMOGDSgT.js.map +0 -1
- package/dist/mcp-BvbriaBy.mjs.map +0 -1
- package/dist/one-liner-s-lD8rCC.mjs.map +0 -1
- package/dist/repl-DveXw36T.mjs +0 -9
- package/dist/skills-CD3Orlex.mjs.map +0 -1
- package/dist/stt-Bu-E23Sc.js +0 -433
- package/dist/stt-Bu-E23Sc.js.map +0 -1
- package/dist/stt-CpLYbGFd.mjs +0 -433
- package/dist/stt-CpLYbGFd.mjs.map +0 -1
- package/dist/stt-DRPLEEHB.mjs +0 -3
- package/dist/tools-Bi1P7Xoy.mjs.map +0 -1
- package/dist/transformers.web-DiD1gTwk.js +0 -44695
- package/dist/transformers.web-DiD1gTwk.js.map +0 -1
- package/dist/transformers.web-u34VxRFM.js +0 -3
- package/dist/tts-CqroPaSK.js +0 -724
- package/dist/tts-CqroPaSK.js.map +0 -1
- package/dist/tts-DXgsKGCe.mjs +0 -3
- package/dist/tts-DeGANMNV.mjs +0 -730
- package/dist/tts-DeGANMNV.mjs.map +0 -1
- package/dist/types-CiTc7ez3.d.mts.map +0 -1
- /package/dist/{auto-update-S9s5-g0C.mjs → auto-update-BVaLXcDE.mjs} +0 -0
- /package/dist/{chunk-CkXuGtQK.mjs → chunk-B9cbKln6.mjs} +0 -0
- /package/dist/{microphone-DaMZFRuR.mjs → microphone-Bqmoz9_K.mjs} +0 -0
package/LICENSE
CHANGED
package/README.md
CHANGED
|
@@ -5,14 +5,16 @@
|
|
|
5
5
|
<h1 align="center">Gerbil</h1>
|
|
6
6
|
|
|
7
7
|
<p align="center">
|
|
8
|
-
<strong>
|
|
8
|
+
<strong>A from-scratch WebGPU/WGSL inference engine. Text, vision, embeddings, speech — all native, on-device, in the browser and Node.</strong>
|
|
9
9
|
</p>
|
|
10
10
|
|
|
11
11
|
<p align="center">
|
|
12
12
|
<a href="#install">Install</a> •
|
|
13
|
-
<a href="#
|
|
14
|
-
<a href="#
|
|
15
|
-
<a href="#
|
|
13
|
+
<a href="#native-webgpu-engine">Engine</a> •
|
|
14
|
+
<a href="#react-quickstart">React</a> •
|
|
15
|
+
<a href="#embeddings">Embeddings</a> •
|
|
16
|
+
<a href="#vision">Vision</a> •
|
|
17
|
+
<a href="#speech">Speech</a> •
|
|
16
18
|
<a href="./docs/ai-sdk.md">AI SDK</a> •
|
|
17
19
|
<a href="./docs/cli.md">CLI</a>
|
|
18
20
|
</p>
|
|
@@ -35,20 +37,28 @@
|
|
|
35
37
|
---
|
|
36
38
|
|
|
37
39
|
```typescript
|
|
38
|
-
import
|
|
40
|
+
import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
|
|
39
41
|
|
|
40
|
-
const
|
|
42
|
+
const engine = await WebGPUEngine.create({ repo: "mlx-community/Qwen3.5-0.8B-4bit" });
|
|
43
|
+
const { text } = await engine.generate("Explain recursion in one sentence");
|
|
41
44
|
```
|
|
42
45
|
|
|
46
|
+
> **Pre-1.0.** Gerbil is a release-candidate (`1.0.0-rc.26`, changeset pre-release). The
|
|
47
|
+
> native engine surface below is the path going forward; APIs may still shift before 1.0.
|
|
48
|
+
|
|
43
49
|
## Why Gerbil?
|
|
44
50
|
|
|
45
|
-
- **
|
|
46
|
-
|
|
47
|
-
- **
|
|
48
|
-
|
|
49
|
-
- **
|
|
50
|
-
|
|
51
|
-
- **
|
|
51
|
+
- **One native engine** — a from-scratch WebGPU/WGSL engine, pure compute shaders, nothing
|
|
52
|
+
extra to ship.
|
|
53
|
+
- **Multimodal, all native** — text, vision (image→text), embeddings, and speech run on the
|
|
54
|
+
same engine, loading safetensors directly from the HuggingFace Hub.
|
|
55
|
+
- **Browser & Node** — Chrome 113+, Safari 26+ (iOS 26+), Firefox 141+, and Node via Dawn
|
|
56
|
+
(`webgpu` npm), anywhere there's a real GPU.
|
|
57
|
+
- **Local & private** — no API keys, nothing leaves the device.
|
|
58
|
+
- **React-first** — `useEngine` owns load / unload / hot-swap and shares one engine
|
|
59
|
+
across components (reference-counted), with `dtype: "auto"` picking int4 on mobile.
|
|
60
|
+
- **Framework ready** — Vercel AI SDK v5, Next.js, Express, LangChain adapters.
|
|
61
|
+
- **Skills & tools** — built-in + custom skills with Zod validation; agentic tool calling.
|
|
52
62
|
|
|
53
63
|
## Install
|
|
54
64
|
|
|
@@ -65,82 +75,184 @@ npm install @tryhamster/gerbil
|
|
|
65
75
|
|
|
66
76
|
After global install, use `gerbil` directly instead of `npx @tryhamster/gerbil`.
|
|
67
77
|
|
|
68
|
-
##
|
|
78
|
+
## Native WebGPU Engine
|
|
79
|
+
|
|
80
|
+
Gerbil's product is a from-scratch WebGPU inference engine — pure WGSL compute shaders.
|
|
81
|
+
It loads safetensors directly from the HuggingFace Hub (selective tensor download — skip
|
|
82
|
+
vision towers you don't need) and runs the same code in the browser and in Node (via Dawn).
|
|
69
83
|
|
|
70
84
|
```typescript
|
|
71
|
-
import {
|
|
85
|
+
import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
|
|
72
86
|
|
|
73
|
-
|
|
74
|
-
await
|
|
87
|
+
// dtype "auto" picks int4 on mobile, the repo's native precision on desktop.
|
|
88
|
+
const engine = await WebGPUEngine.create({
|
|
89
|
+
repo: "mlx-community/Qwen3.5-0.8B-4bit",
|
|
90
|
+
dtype: "auto",
|
|
91
|
+
});
|
|
75
92
|
|
|
76
93
|
// Generate
|
|
77
|
-
const
|
|
78
|
-
console.log(
|
|
94
|
+
const { text, tokensPerSecond } = await engine.generate("Write a haiku about gerbils");
|
|
95
|
+
console.log(text, `(${tokensPerSecond.toFixed(1)} tok/s)`);
|
|
79
96
|
|
|
80
97
|
// Stream
|
|
81
|
-
for await (const
|
|
82
|
-
process.stdout.write(
|
|
98
|
+
for await (const token of engine.stream("Tell me a story")) {
|
|
99
|
+
process.stdout.write(token);
|
|
83
100
|
}
|
|
84
101
|
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
console.log(math.thinking); // Shows reasoning
|
|
88
|
-
console.log(math.text); // "5461"
|
|
102
|
+
engine.destroy();
|
|
103
|
+
```
|
|
89
104
|
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
105
|
+
`WebGPUEngine.create({ repo, dtype, enableVision, embedding, maxSeqLen })` returns an
|
|
106
|
+
engine with `generate`, `stream`, `describeImage`, `embed`, and `speak`. See the
|
|
107
|
+
[native engine docs](#supported-models) below for the model lineup.
|
|
108
|
+
|
|
109
|
+
## React Quickstart
|
|
110
|
+
|
|
111
|
+
`useEngine` (from `@tryhamster/gerbil/gpu/hooks`) owns the full engine lifecycle —
|
|
112
|
+
load, unload, hot-swap on config change, and reference-counted sharing so multiple
|
|
113
|
+
components never upload the same weights to the GPU twice.
|
|
114
|
+
|
|
115
|
+
```tsx
|
|
116
|
+
import { useEngine } from "@tryhamster/gerbil/gpu/hooks";
|
|
117
|
+
|
|
118
|
+
function Chat() {
|
|
119
|
+
const { complete, completion, isLoading, isGenerating, tps } = useEngine({
|
|
120
|
+
model: "mlx-community/Qwen3.5-0.8B-4bit",
|
|
121
|
+
autoLoad: true, // dtype defaults to "auto": int4 on mobile, native on desktop
|
|
122
|
+
});
|
|
123
|
+
|
|
124
|
+
if (isLoading) return <div>Loading model…</div>;
|
|
125
|
+
return (
|
|
126
|
+
<div>
|
|
127
|
+
<button onClick={() => complete("What is 2+2?")} disabled={isGenerating}>
|
|
128
|
+
Generate
|
|
129
|
+
</button>
|
|
130
|
+
<p>{completion}</p>
|
|
131
|
+
{isGenerating && <span>{tps?.toFixed(1)} tok/s</span>}
|
|
132
|
+
</div>
|
|
133
|
+
);
|
|
134
|
+
}
|
|
94
135
|
```
|
|
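The reference-counted sharing described for `useEngine` can be pictured as a small pool keyed by model id. This is a minimal sketch under stated assumptions, not Gerbil's actual implementation: `SharedResourcePool`, `acquire`, `release`, and `refCount` are hypothetical names, and the `load` / `dispose` callbacks stand in for engine creation and teardown.

```typescript
// Hypothetical reference-counted pool: one resource per key, shared by every holder.
type PoolEntry<T> = { promise: Promise<T>; refs: number };

class SharedResourcePool<T> {
  private readonly entries = new Map<string, PoolEntry<T>>();
  private readonly load: (key: string) => Promise<T>;
  private readonly dispose: (resource: T) => void;

  constructor(load: (key: string) => Promise<T>, dispose: (resource: T) => void) {
    this.load = load;
    this.dispose = dispose;
  }

  // Returns the shared load promise; only the first holder triggers a load.
  acquire(key: string): Promise<T> {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { promise: this.load(key), refs: 0 };
      this.entries.set(key, entry);
    }
    entry.refs += 1;
    return entry.promise;
  }

  // Drops one reference; the last release removes the entry and disposes the resource.
  async release(key: string): Promise<void> {
    const entry = this.entries.get(key);
    if (!entry) return;
    entry.refs -= 1;
    if (entry.refs === 0) {
      this.entries.delete(key);
      this.dispose(await entry.promise);
    }
  }

  // Number of current holders for a key (0 when nothing is loaded).
  refCount(key: string): number {
    return this.entries.get(key)?.refs ?? 0;
  }
}
```

In a React setting, a hook would typically call `acquire` in an effect and `release` in that effect's cleanup, so two components asking for the same model share one upload.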
95
136
|
|
|
96
|
-
|
|
137
|
+
The same hook exposes `describeImage` (vision), `embed`/`similarity` (embeddings), `stop`,
|
|
138
|
+
and `dispose`. Pass `enableVision: true` or `embedding: true` to load those modalities.
|
|
139
|
+
|
|
140
|
+
## Structured Output
|
|
97
141
|
|
|
98
|
-
|
|
142
|
+
`generateObject` makes the model return a JSON object: it generates, extracts the JSON,
|
|
143
|
+
validates it, and retries with a corrective nudge until it's valid (or `maxRetries` is hit).
|
|
144
|
+
Validate with a predicate `(o) => boolean` or a minimal `{ required: [...] }` schema; omit
|
|
145
|
+
`schema` to accept any valid JSON.
|
|
99
146
|
|
|
100
147
|
```typescript
|
|
101
|
-
|
|
102
|
-
// result.audio = Float32Array, result.sampleRate = 24000
|
|
148
|
+
import { generateObject } from "@tryhamster/gerbil";
|
|
103
149
|
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
150
|
+
const { object, attempts } = await generateObject<{ name: string; age: number }>(
|
|
151
|
+
'Extract {name, age} from: "I am Sarah, 28"',
|
|
152
|
+
{ schema: { required: ["name", "age"] } },
|
|
153
|
+
);
|
|
154
|
+
// object === { name: "Sarah", age: 28 }
|
|
155
|
+
```
|
|
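The generate, extract, validate, retry loop described for `generateObject` can be sketched independently of the engine. Everything here is an illustration under assumptions: `generateObjectSketch`, `extractJson`, and `isValid` are hypothetical helpers, the `generate` callback stands in for any text generator, and the default of 3 retries and the wording of the corrective nudge are guesses rather than Gerbil's behavior.

```typescript
// Hypothetical stand-in for a text generator; any (prompt) => Promise<string> works.
type Generate = (prompt: string) => Promise<string>;
type Schema = { required: string[] } | ((o: unknown) => boolean);

// Parse the outermost {...} span found in free-form model output.
function extractJson(text: string): unknown | undefined {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return undefined;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return undefined; // not valid JSON
  }
}

// Accept any parsed JSON when no schema is given; otherwise apply the
// predicate or check that every required key is present.
function isValid(value: unknown, schema?: Schema): boolean {
  if (value === undefined) return false;
  if (!schema) return true;
  if (typeof schema === "function") return schema(value);
  return (
    typeof value === "object" &&
    value !== null &&
    schema.required.every((key) => key in (value as object))
  );
}

async function generateObjectSketch<T>(
  generate: Generate,
  prompt: string,
  options: { schema?: Schema; maxRetries?: number } = {},
): Promise<{ object: T; attempts: number }> {
  const maxRetries = options.maxRetries ?? 3; // assumed default
  let currentPrompt = prompt;
  for (let attempts = 1; attempts <= maxRetries + 1; attempts++) {
    const candidate = extractJson(await generate(currentPrompt));
    if (isValid(candidate, options.schema)) {
      return { object: candidate as T, attempts };
    }
    // Corrective nudge: restate the task and ask for JSON only.
    currentPrompt = `${prompt}\n\nRespond with a single valid JSON object only.`;
  }
  throw new Error(`No valid JSON object after ${maxRetries + 1} attempts`);
}
```

Keeping extraction and validation as separate pure functions makes the retry loop easy to test with a stubbed generator.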
156
|
+
|
|
157
|
+
It's available on the engine, the `Gerbil` class, and the one-liner API:
|
|
158
|
+
|
|
159
|
+
```typescript
|
|
160
|
+
import { Gerbil, WebGPUEngine } from "@tryhamster/gerbil";
|
|
161
|
+
|
|
162
|
+
const g = new Gerbil();
|
|
163
|
+
await g.loadModel("qwen3.5-0.8b");
|
|
164
|
+
const { object } = await g.generateObject("List 3 primes as {primes: number[]}", {
|
|
165
|
+
schema: (o) => Array.isArray((o as any).primes),
|
|
166
|
+
});
|
|
167
|
+
|
|
168
|
+
// Or directly on the engine:
|
|
169
|
+
const engine = await WebGPUEngine.create({ repo: "mlx-community/Qwen3.5-0.8B-4bit" });
|
|
170
|
+
await engine.generateObject("…", { schema: { required: ["title"] } });
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
In React, use `useObject` (from `@tryhamster/gerbil/gpu/hooks`):
|
|
174
|
+
|
|
175
|
+
```tsx
|
|
176
|
+
import { useObject } from "@tryhamster/gerbil/gpu/hooks";
|
|
177
|
+
|
|
178
|
+
const { generate, object, isGenerating } = useObject<{ city: string }>();
|
|
179
|
+
await generate("Extract the city from: I live in Paris", {
|
|
180
|
+
schema: { required: ["city"] },
|
|
181
|
+
});
|
|
108
182
|
```
|
|
109
183
|
|
|
184
|
+
From the CLI:
|
|
185
|
+
|
|
110
186
|
```bash
|
|
111
|
-
|
|
112
|
-
|
|
187
|
+
gerbil object "Extract {name, age}: I am Sarah, 28" --schema person.json
|
|
188
|
+
# person.json: { "required": ["name", "age"] }
|
|
113
189
|
```
|
|
114
190
|
|
|
115
|
-
|
|
191
|
+
## Embeddings
|
|
116
192
|
|
|
117
|
-
|
|
193
|
+
Native text embeddings via **EmbeddingGemma-300M** (mean-pooled Gemma3 encoder + Dense
|
|
194
|
+
head, 768-dim, L2-normalized). EmbeddingGemma is asymmetric — pass `taskType` so queries
|
|
195
|
+
and documents get the right prefix.
|
|
118
196
|
|
|
119
|
-
|
|
197
|
+
```typescript
|
|
198
|
+
import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
|
|
199
|
+
|
|
200
|
+
const engine = await WebGPUEngine.create({
|
|
201
|
+
repo: "mlx-community/embeddinggemma-300m-4bit",
|
|
202
|
+
embedding: true,
|
|
203
|
+
});
|
|
204
|
+
|
|
205
|
+
const query = await engine.embed("capital of France", { taskType: "query" });
|
|
206
|
+
const doc = await engine.embed("Paris is the capital of France", { taskType: "document" });
|
|
207
|
+
|
|
208
|
+
// Vectors are unit-norm, so cosine similarity is a dot product.
|
|
209
|
+
const sim = query.reduce((s, v, i) => s + v * doc[i], 0);
|
|
210
|
+
```
|
|
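Because the vectors are unit-norm, ranking documents against a query reduces to sorting by dot product. The helper names below (`dot`, `rankBySimilarity`) are hypothetical and operate on plain numeric arrays, so the sketch runs without the engine; it assumes the embeddings are already L2-normalized, as the README states.

```typescript
// Dot product of two equal-length vectors; equals cosine similarity for unit-norm inputs.
function dot(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Score every document against the query and sort by descending similarity.
function rankBySimilarity(
  query: ArrayLike<number>,
  docs: ArrayLike<number>[],
): Array<{ index: number; score: number }> {
  return docs
    .map((doc, index) => ({ index, score: dot(query, doc) }))
    .sort((x, y) => y.score - x.score);
}
```

With real embeddings, `query` would come from a `taskType: "query"` call and each entry of `docs` from a `taskType: "document"` call.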
211
|
+
|
|
212
|
+
📖 **[Full Embeddings Documentation →](./docs/embeddings.md)**
|
|
213
|
+
|
|
214
|
+
## Vision
|
|
215
|
+
|
|
216
|
+
Image-in → text-out via the native vision towers (Qwen3.5 ViT and Gemma 4 ViT). Load with
|
|
217
|
+
`enableVision: true`, then call `describeImage`.
|
|
120
218
|
|
|
121
219
|
```typescript
|
|
122
|
-
|
|
220
|
+
const engine = await WebGPUEngine.create({
|
|
221
|
+
repo: "Qwen/Qwen3.5-0.8B",
|
|
222
|
+
enableVision: true,
|
|
223
|
+
});
|
|
224
|
+
|
|
225
|
+
// In Node, decode the image to RGB pixels (HWC, 0..255) yourself; in the browser the
|
|
226
|
+
// React hook's describeImage() takes a URL / data-URL directly.
|
|
227
|
+
const { text } = await engine.describeImage(
|
|
228
|
+
{ pixels, width, height },
|
|
229
|
+
"What's in this image?",
|
|
230
|
+
);
|
|
231
|
+
```
|
|
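Many image decoders (and the browser's canvas `ImageData`) yield RGBA bytes, while the comment above asks for RGB pixels in HWC order with values 0..255. A minimal sketch of the channel drop follows; `rgbaToRgb` is a hypothetical helper, and the use of `Uint8Array` for the output is an assumption, since the README does not state the element type `describeImage` expects.

```typescript
// Drop the alpha channel from row-major RGBA bytes, producing RGB in HWC order.
function rgbaToRgb(
  rgba: Uint8Array | Uint8ClampedArray,
  width: number,
  height: number,
): Uint8Array {
  const pixelCount = width * height;
  if (rgba.length !== pixelCount * 4) {
    throw new Error("rgba length does not match width * height * 4");
  }
  const rgb = new Uint8Array(pixelCount * 3);
  for (let p = 0; p < pixelCount; p++) {
    rgb[p * 3] = rgba[p * 4]; // R
    rgb[p * 3 + 1] = rgba[p * 4 + 1]; // G
    rgb[p * 3 + 2] = rgba[p * 4 + 2]; // B (alpha at p * 4 + 3 is discarded)
  }
  return rgb;
}
```

The result could then be passed as `pixels` alongside the same `width` and `height`, assuming the byte layout matches what the engine expects.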
123
232
|
|
|
124
|
-
|
|
125
|
-
const result = await g.transcribe(audio);
|
|
126
|
-
console.log(result.text);
|
|
233
|
+
📖 **[Full Vision Documentation →](./docs/vision.md)**
|
|
127
234
|
|
|
128
|
-
|
|
129
|
-
const result = await g.transcribe(audio, { timestamps: true });
|
|
130
|
-
for (const seg of result.segments) {
|
|
131
|
-
console.log(`[${seg.start}s] ${seg.text}`);
|
|
132
|
-
}
|
|
235
|
+
## Speech
|
|
133
236
|
|
|
134
|
-
|
|
135
|
-
|
|
237
|
+
**Text-to-speech** — native **Kani-TTS-2** (LFM2-350M codec-LM + NVIDIA NeMo NanoCodec).
|
|
238
|
+
`engine.speak()` returns 22.05 kHz mono PCM.
|
|
239
|
+
|
|
240
|
+
```typescript
|
|
241
|
+
const engine = await WebGPUEngine.create({ repo: "nineninesix/kani-tts-2-en" });
|
|
242
|
+
const { pcm, sampleRate } = await engine.speak("Hello, I'm Gerbil!"); // sampleRate === 22050
|
|
136
243
|
```
|
|
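To save or play the PCM returned by `speak`, one option is to wrap it in a 16-bit mono WAV container. The sketch below assumes `pcm` is a `Float32Array` of samples in the range -1..1, which the README does not state; `encodeWav` is a hypothetical helper, not part of Gerbil.

```typescript
// Encode mono float samples (-1..1) as a 16-bit PCM WAV file.
function encodeWav(pcm: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = pcm.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // remaining file size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // format 1 = integer PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < pcm.length; i++) {
    const s = Math.max(-1, Math.min(1, pcm[i])); // clamp before scaling
    const scaled = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return new Uint8Array(buffer);
}
```

In Node the returned bytes can be written with `fs.writeFileSync("out.wav", bytes)`; in the browser they can be wrapped in a `Blob` of type `audio/wav`.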
137
244
|
|
|
138
|
-
|
|
139
|
-
|
|
140
|
-
|
|
245
|
+
**Speech-to-text** — native **Moonshine** (raw-waveform encoder/decoder, no FFT/log-mel)
|
|
246
|
+
via the dedicated `MoonshineSTT` class.
|
|
247
|
+
|
|
248
|
+
```typescript
|
|
249
|
+
import { MoonshineSTT } from "@tryhamster/gerbil/gpu";
|
|
250
|
+
|
|
251
|
+
const stt = await MoonshineSTT.create({ repo: "UsefulSensors/moonshine-base" });
|
|
252
|
+
const { text } = await stt.transcribe(pcm16kMono); // Float32Array @ 16 kHz
|
|
141
253
|
```
|
|
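`transcribe` is shown taking a `Float32Array` at 16 kHz, while microphones and audio files often deliver 44.1 or 48 kHz. A minimal linear-interpolation resampler is sketched below; `resampleLinear` is a hypothetical helper, and it applies no anti-aliasing low-pass filter, so a proper resampler is preferable when quality matters.

```typescript
// Resample mono audio by linear interpolation between neighbouring input samples.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1); // clamp at the last sample
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

For example, `resampleLinear(samples48k, 48000, 16000)` would produce a buffer one third the length, suitable as the `pcm16kMono` argument under the assumptions above.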
142
254
|
|
|
143
|
-
📖 **[Full STT Documentation →](./docs/stt.md)**
|
|
255
|
+
📖 **[Full TTS Documentation →](./docs/tts.md)** · **[Full STT Documentation →](./docs/stt.md)**
|
|
144
256
|
|
|
145
257
|
## Skills
|
|
146
258
|
|
|
@@ -231,25 +343,30 @@ gerbil update # Update to latest version
|
|
|
231
343
|
|
|
232
344
|
## Browser Usage
|
|
233
345
|
|
|
234
|
-
Run LLMs directly in the browser with WebGPU — no server required
|
|
346
|
+
Run LLMs directly in the browser with WebGPU — no server required. The React hooks
|
|
347
|
+
live at `@tryhamster/gerbil/gpu/hooks` and run pure WebGPU compute:
|
|
235
348
|
|
|
236
349
|
```tsx
|
|
237
|
-
import { useChat } from "@tryhamster/gerbil/
|
|
350
|
+
import { useChat } from "@tryhamster/gerbil/gpu/hooks";
|
|
238
351
|
|
|
239
352
|
function Chat() {
|
|
240
|
-
const { messages,
|
|
353
|
+
const { messages, send, isLoading, isGenerating } = useChat();
|
|
241
354
|
|
|
242
355
|
if (isLoading) return <div>Loading model...</div>;
|
|
243
356
|
|
|
244
357
|
return (
|
|
245
|
-
<
|
|
246
|
-
{messages.map(m => <div key={
|
|
247
|
-
<
|
|
248
|
-
</
|
|
358
|
+
<div>
|
|
359
|
+
{messages.map((m, i) => <div key={i}>{m.role}: {m.content}</div>)}
|
|
360
|
+
<button onClick={() => send("Hello!")} disabled={isGenerating}>Send</button>
|
|
361
|
+
</div>
|
|
249
362
|
);
|
|
250
363
|
}
|
|
251
364
|
```
|
|
252
365
|
|
|
366
|
+
`@tryhamster/gerbil/browser` still exports the device/WebGPU utilities
|
|
367
|
+
(`isModelSafeForDevice`, `detectMemoryCrash`, `downloadModelChunked`,
|
|
368
|
+
`checkWebGPUCapabilities`, `getBrowserDiagnostics`, …).
|
|
369
|
+
|
|
253
370
|
📖 **[Full Browser Documentation →](./docs/browser.md)**
|
|
254
371
|
|
|
255
372
|
## Integrations
|
|
@@ -263,40 +380,82 @@ function Chat() {
|
|
|
263
380
|
| **LangChain** | `@tryhamster/gerbil/langchain` | [📖 Frameworks](./docs/frameworks.md) |
|
|
264
381
|
| **MCP Server** | `npx @tryhamster/gerbil serve --mcp` | [📖 MCP](./docs/mcp.md) |
|
|
265
382
|
|
|
266
|
-
**
|
|
383
|
+
**Native engine:** `import { WebGPUEngine } from "@tryhamster/gerbil/gpu"` (or `useEngine` from `@tryhamster/gerbil/gpu/hooks` for React) is the primary surface for text, vision, embeddings, and speech.
|
|
384
|
+
|
|
385
|
+
## Supported Models
|
|
267
386
|
|
|
268
|
-
|
|
387
|
+
The native engine runs these modalities today. All load straight from the HuggingFace Hub
|
|
388
|
+
via `WebGPUEngine.create({ repo })`.
|
|
269
389
|
|
|
270
|
-
###
|
|
390
|
+
### Text
|
|
271
391
|
|
|
272
|
-
| Model |
|
|
273
|
-
|
|
274
|
-
| `
|
|
275
|
-
| `
|
|
276
|
-
| `
|
|
392
|
+
| Model | Repo | Notes |
|
|
393
|
+
|-------|------|-------|
|
|
394
|
+
| **Qwen3.5-0.8B** | `mlx-community/Qwen3.5-0.8B-4bit` | Default text model; vision-capable (`Qwen/Qwen3.5-0.8B` for the ViT) |
|
|
395
|
+
| **Qwen3.5-2B** | `Qwen/Qwen3.5-2B` | Higher quality; 262k context; multimodal (vision-capable) |
|
|
396
|
+
| **LFM2.5-350M** | `LiquidAI/LFM2.5-350M` | Hybrid conv/attention, very fast, ~199 MB q4 |
|
|
397
|
+
| **Gemma 4 E2B** | `mlx-community/gemma-4-e2b-it-4bit` | PLE CPU-streamed; vision-capable |
|
|
277
398
|
|
|
278
|
-
|
|
399
|
+
### Vision (image → text, `describeImage`)
|
|
279
400
|
|
|
280
|
-
|
|
401
|
+
| Tower | From | Notes |
|
|
402
|
+
|-------|------|-------|
|
|
403
|
+
| **Qwen3.5 ViT** | `Qwen/Qwen3.5-0.8B` (`enableVision: true`) | Bit-exact vs HF |
|
|
404
|
+
| **Gemma 4 ViT** | `mlx-community/gemma-4-e2b-it-4bit` (`enableVision: true`) | Native projector |
|
|
281
405
|
|
|
282
|
-
|
|
406
|
+
### Embeddings (`embed`)
|
|
407
|
+
|
|
408
|
+
| Model | Repo | Notes |
|
|
409
|
+
|-------|------|-------|
|
|
410
|
+
| **EmbeddingGemma-300M** | `mlx-community/embeddinggemma-300m-4bit` | 768-dim, asymmetric (`taskType`), runs on iPad |
|
|
411
|
+
|
|
412
|
+
### Speech
|
|
413
|
+
|
|
414
|
+
| Model | Type | Repo | Notes |
|
|
283
415
|
|-------|------|------|-------|
|
|
284
|
-
|
|
|
285
|
-
|
|
|
286
|
-
|
|
287
|
-
|
|
416
|
+
| **Kani-TTS-2** | TTS | `nineninesix/kani-tts-2-en` | `engine.speak()` → 22.05 kHz PCM |
|
|
417
|
+
| **Moonshine** | STT | `UsefulSensors/moonshine-base` | `MoonshineSTT.transcribe()`, raw-waveform |
|
|
418
|
+
|
|
419
|
+
### Quantization & dtype
|
|
420
|
+
|
|
421
|
+
`dtype: "auto"` (the React-hook default) picks int4 on mobile and the repo's native
|
|
422
|
+
precision on desktop. For Qwen3.5-0.8B on Dawn/Node:
|
|
423
|
+
|
|
424
|
+
| Format | Download | tok/s | Notes |
|
|
425
|
+
|---|---|---|---|
|
|
426
|
+
| MLX 4-bit (affine) | 404 MB | fastest | Smallest. Recommended. |
|
|
427
|
+
| GPTQ (AutoRound) | 734 MB | fast | Pre-quantized linears, F16 embed |
|
|
428
|
+
| F32 (on-the-fly Q4) | 1666 MB | slowest | No pre-quantization needed |
|
|
429
|
+
|
|
430
|
+
> Throughput moves run-to-run and across the optimization loop; treat these as relative,
|
|
431
|
+
> not promises.
|
|
432
|
+
|
|
433
|
+
### WGSL Kernels
|
|
434
|
+
|
|
435
|
+
MatMul, MatMulInt4, EmbeddingInt4, RMSNorm, RoPE, GQA Attention (flash-style, causal +
|
|
436
|
+
bidirectional), SwiGLU/GeGLU, CrossAttention, CausalConv1d, M-RoPE, EmbedSplice, FSQ +
|
|
437
|
+
HiFi-GAN (NanoCodec decoder), and more.
|
|
438
|
+
|
|
439
|
+
> **High-level `Gerbil` class.** `import { Gerbil } from "@tryhamster/gerbil"` (plus the
|
|
440
|
+
> one-liner and `@tryhamster/gerbil/skills`) is a supported convenience wrapper over the
|
|
441
|
+
> native `WebGPUEngine` — ideal for quick scripts, the CLI, and the AI SDK. Reach for
|
|
442
|
+
> `WebGPUEngine` / `useEngine` directly when you want lower-level control over loading,
|
|
443
|
+
> vision, embeddings, and speech.
|
|
288
444
|
|
|
289
445
|
## Documentation
|
|
290
446
|
|
|
291
447
|
| Guide | Description |
|
|
292
448
|
|-------|-------------|
|
|
293
|
-
| [📖
|
|
294
|
-
| [📖
|
|
449
|
+
| [📖 Structured Output](./docs/structured-output.md) | `generateObject` / `useObject` — validated JSON with retries |
|
|
450
|
+
| [📖 Embeddings](./docs/embeddings.md) | EmbeddingGemma semantic search, similarity, RAG |
|
|
451
|
+
| [📖 Vision](./docs/vision.md) | Image → text with Qwen3.5 ViT & Gemma 4 ViT |
|
|
452
|
+
| [📖 Text-to-Speech](./docs/tts.md) | Native Kani-TTS-2 (`engine.speak()`) |
|
|
453
|
+
| [📖 Speech-to-Text](./docs/stt.md) | Native Moonshine (`MoonshineSTT`) |
|
|
295
454
|
| [📖 Browser](./docs/browser.md) | WebGPU inference, React hooks |
|
|
296
455
|
| [📖 Skills](./docs/skills.md) | Built-in skills, custom skill development |
|
|
297
456
|
| [📖 Tools](./docs/tools.md) | Tool calling, agentic workflows |
|
|
298
457
|
| [📖 REPL](./docs/repl.md) | Interactive terminal dashboard |
|
|
299
|
-
| [📖 AI SDK](./docs/ai-sdk.md) | Vercel AI SDK v5 (LLM, TTS, STT) |
|
|
458
|
+
| [📖 AI SDK](./docs/ai-sdk.md) | Vercel AI SDK v5 (LLM, TTS, STT, Embeddings) |
|
|
300
459
|
| [📖 Frameworks](./docs/frameworks.md) | Next.js, Express, React, LangChain |
|
|
301
460
|
| [📖 CLI](./docs/cli.md) | All CLI commands and options |
|
|
302
461
|
| [📖 MCP Server](./docs/mcp.md) | MCP server for Claude Desktop & Cursor |
|
|
@@ -304,8 +463,12 @@ Use any HuggingFace model: `npx @tryhamster/gerbil -m hf:org/model "prompt"`
|
|
|
304
463
|
|
|
305
464
|
## Requirements
|
|
306
465
|
|
|
307
|
-
|
|
308
|
-
|
|
466
|
+
The native engine needs a real GPU and a WebGPU runtime:
|
|
467
|
+
|
|
468
|
+
- **Browser** — Chrome/Edge 113+, Safari 26+ (iOS/iPadOS 26+), or Firefox 141+
|
|
469
|
+
- **Node** — Node.js 18+ with the `webgpu` package (Dawn) installed
|
|
470
|
+
|
|
471
|
+
On devices without WebGPU the engine throws a clear error rather than silently degrading.
|
|
309
472
|
|
|
310
473
|
## License
|
|
311
474
|
|