@tryhamster/gerbil 1.0.0-rc.9 → 1.0.1
- package/LICENSE +1 -1
- package/README.md +318 -104
- package/dist/architectures-C1I5V3Dt.mjs +6070 -0
- package/dist/architectures-C1I5V3Dt.mjs.map +1 -0
- package/dist/browser/index.d.ts +276 -590
- package/dist/browser/index.d.ts.map +1 -1
- package/dist/browser/index.js +592 -2334
- package/dist/browser/index.js.map +1 -1
- package/dist/cli.mjs +625 -1098
- package/dist/cli.mjs.map +1 -1
- package/dist/defaults-9komdrbY.mjs +24 -0
- package/dist/defaults-9komdrbY.mjs.map +1 -0
- package/dist/frameworks/express.d.mts +1 -3
- package/dist/frameworks/express.d.mts.map +1 -1
- package/dist/frameworks/express.mjs +7 -7
- package/dist/frameworks/express.mjs.map +1 -1
- package/dist/frameworks/fastify.d.mts +1 -1
- package/dist/frameworks/fastify.d.mts.map +1 -1
- package/dist/frameworks/fastify.mjs +3 -3
- package/dist/frameworks/fastify.mjs.map +1 -1
- package/dist/frameworks/hono.d.mts +1 -1
- package/dist/frameworks/hono.d.mts.map +1 -1
- package/dist/frameworks/hono.mjs +4 -4
- package/dist/frameworks/hono.mjs.map +1 -1
- package/dist/frameworks/next.d.mts +3 -2
- package/dist/frameworks/next.d.mts.map +1 -1
- package/dist/frameworks/next.mjs +4 -4
- package/dist/frameworks/next.mjs.map +1 -1
- package/dist/frameworks/react.d.mts +1 -1
- package/dist/frameworks/trpc.d.mts +1 -1
- package/dist/frameworks/trpc.d.mts.map +1 -1
- package/dist/frameworks/trpc.mjs +4 -4
- package/dist/frameworks/trpc.mjs.map +1 -1
- package/dist/gerbil-BetB5xb0.d.mts +488 -0
- package/dist/gerbil-BetB5xb0.d.mts.map +1 -0
- package/dist/gerbil-CTZUa8EZ.mjs +4 -0
- package/dist/gerbil-DNniplr4.mjs +1656 -0
- package/dist/gerbil-DNniplr4.mjs.map +1 -0
- package/dist/gpu/hooks.d.mts +640 -0
- package/dist/gpu/hooks.d.mts.map +1 -0
- package/dist/gpu/hooks.mjs +1369 -0
- package/dist/gpu/hooks.mjs.map +1 -0
- package/dist/gpu/index.d.mts +2 -0
- package/dist/gpu/index.mjs +6 -0
- package/dist/gpu-DFuglcEx.mjs +3790 -0
- package/dist/gpu-DFuglcEx.mjs.map +1 -0
- package/dist/index-Dgmb2kE3.d.mts +245 -0
- package/dist/index-Dgmb2kE3.d.mts.map +1 -0
- package/dist/index-DukkJRMj.d.mts +2114 -0
- package/dist/index-DukkJRMj.d.mts.map +1 -0
- package/dist/index.d.mts +22 -487
- package/dist/index.d.mts.map +1 -1
- package/dist/index.mjs +13 -8
- package/dist/index.mjs.map +1 -1
- package/dist/indexeddb-store-BWIMtxxH.mjs +103 -0
- package/dist/indexeddb-store-BWIMtxxH.mjs.map +1 -0
- package/dist/indexeddb-store-ClH12Xnl.mjs +4 -0
- package/dist/integrations/ai-sdk.d.mts +75 -6
- package/dist/integrations/ai-sdk.d.mts.map +1 -1
- package/dist/integrations/ai-sdk.mjs +131 -15
- package/dist/integrations/ai-sdk.mjs.map +1 -1
- package/dist/integrations/langchain.d.mts +1 -1
- package/dist/integrations/langchain.d.mts.map +1 -1
- package/dist/integrations/langchain.mjs +5 -5
- package/dist/integrations/langchain.mjs.map +1 -1
- package/dist/integrations/llamaindex.d.mts +1 -1
- package/dist/integrations/llamaindex.d.mts.map +1 -1
- package/dist/integrations/llamaindex.mjs +5 -5
- package/dist/integrations/llamaindex.mjs.map +1 -1
- package/dist/integrations/mcp-client.mjs +3 -3
- package/dist/integrations/mcp-client.mjs.map +1 -1
- package/dist/integrations/mcp.d.mts +3 -2
- package/dist/integrations/mcp.d.mts.map +1 -1
- package/dist/integrations/mcp.mjs +5 -5
- package/dist/{mcp-BvbriaBy.mjs → mcp-D2vvH1Xc.mjs} +4 -4
- package/dist/mcp-D2vvH1Xc.mjs.map +1 -0
- package/dist/memory/index.d.mts +3 -0
- package/dist/memory/index.mjs +6 -0
- package/dist/memory-D1P7Tmda.mjs +4 -0
- package/dist/memory-DVN0MnIG.mjs +132 -0
- package/dist/memory-DVN0MnIG.mjs.map +1 -0
- package/dist/memory-Dj0J1v88.mjs +294 -0
- package/dist/memory-Dj0J1v88.mjs.map +1 -0
- package/dist/moonshine-stt-17dpP1kr.mjs +4 -0
- package/dist/moonshine-stt-4ojLtMq7.mjs +11962 -0
- package/dist/moonshine-stt-4ojLtMq7.mjs.map +1 -0
- package/dist/{one-liner-s-lD8rCC.mjs → one-liner-JhdIPxzF.mjs} +14 -16
- package/dist/one-liner-JhdIPxzF.mjs.map +1 -0
- package/dist/repl-BDRkwPGX.mjs +9 -0
- package/dist/skills/index.d.mts +270 -320
- package/dist/skills/index.d.mts.map +1 -1
- package/dist/skills/index.mjs +5 -5
- package/dist/{skills-CD3Orlex.mjs → skills-CU694Dc8.mjs} +187 -32
- package/dist/skills-CU694Dc8.mjs.map +1 -0
- package/dist/{tools-Bi1P7Xoy.mjs → tools-DQ1mPUw5.mjs} +34 -22
- package/dist/tools-DQ1mPUw5.mjs.map +1 -0
- package/dist/types-DQBe2lFo.d.mts +165 -0
- package/dist/types-DQBe2lFo.d.mts.map +1 -0
- package/dist/{types-CiTc7ez3.d.mts → types-LlyYILII.d.mts} +112 -14
- package/dist/types-LlyYILII.d.mts.map +1 -0
- package/dist/{utils-CZBZ8dgR.mjs → utils-DKO55ZmZ.mjs} +1 -1
- package/dist/{utils-CZBZ8dgR.mjs.map → utils-DKO55ZmZ.mjs.map} +1 -1
- package/dist/vector-B0panuy6.mjs +95 -0
- package/dist/vector-B0panuy6.mjs.map +1 -0
- package/docs/PROJECT-STATE.md +321 -0
- package/docs/adding-a-model-family.md +280 -0
- package/docs/ai-sdk.md +70 -61
- package/docs/architecture/overview.md +17 -7
- package/docs/browser.md +203 -8
- package/docs/embeddings.md +156 -0
- package/docs/gerbil-site-native-migration.md +217 -0
- package/docs/gpu-engine/architectures.md +398 -0
- package/docs/gpu-engine/ir.md +372 -0
- package/docs/gpu-engine/kernels.md +718 -0
- package/docs/gpu-engine/paper.html +1759 -0
- package/docs/gpu-engine/paper.md +2109 -0
- package/docs/gpu-engine/safetensors.md +312 -0
- package/docs/gpu-engine/tokenizer.md +302 -0
- package/docs/memory-rag.md +91 -0
- package/docs/metal-safari-intel.md +190 -0
- package/docs/mobile-failure-diagnosis.md +124 -0
- package/docs/mobile.md +99 -0
- package/docs/observability.md +230 -0
- package/docs/onnx-removal-plan.md +339 -0
- package/docs/research/autoresearch-portable.md +904 -0
- package/docs/research/dispatch-reduction-hivemind.md +84 -0
- package/docs/research/ios-safari-model-caching.md +117 -0
- package/docs/research/mobile-webgpu-speed-fusion.md +135 -0
- package/docs/research/native-stt-model-selection.md +49 -0
- package/docs/research/native-tts-model-selection.md +90 -0
- package/docs/research/native-vs-chromium-decision.md +152 -0
- package/docs/research/nemotron-mamba2-inference.md +910 -0
- package/docs/research/qwen35-multimodal.md +293 -0
- package/docs/research/qwen36-gemma4-targets.md +337 -0
- package/docs/research/sota-embedding-models.md +179 -0
- package/docs/research/sota-mobile-models-2026.md +263 -0
- package/docs/research/sota-modality-models.md +202 -0
- package/docs/research/tps-baselines.md +71 -0
- package/docs/research/webgpu-m4-reference.md +104 -0
- package/docs/site-update-plan.md +155 -0
- package/docs/structured-output.md +123 -0
- package/docs/stt.md +63 -446
- package/docs/tts.md +77 -499
- package/docs/vision.md +100 -338
- package/package.json +22 -7
- package/dist/chrome-backend-CORwaIyC.mjs +0 -1212
- package/dist/chrome-backend-CORwaIyC.mjs.map +0 -1
- package/dist/chrome-backend-DIKYoWj-.mjs +0 -3
- package/dist/gerbil-CJ3ifloF.mjs +0 -4
- package/dist/gerbil-Dw4Qj77e.mjs +0 -1631
- package/dist/gerbil-Dw4Qj77e.mjs.map +0 -1
- package/dist/gerbil-qOTe1nl2.d.mts +0 -431
- package/dist/gerbil-qOTe1nl2.d.mts.map +0 -1
- package/dist/kokoro-BNTb6egA.mjs +0 -20210
- package/dist/kokoro-BNTb6egA.mjs.map +0 -1
- package/dist/kokoro-CMOGDSgT.js +0 -20212
- package/dist/kokoro-CMOGDSgT.js.map +0 -1
- package/dist/mcp-BvbriaBy.mjs.map +0 -1
- package/dist/one-liner-s-lD8rCC.mjs.map +0 -1
- package/dist/repl-DveXw36T.mjs +0 -9
- package/dist/skills-CD3Orlex.mjs.map +0 -1
- package/dist/stt-Bu-E23Sc.js +0 -433
- package/dist/stt-Bu-E23Sc.js.map +0 -1
- package/dist/stt-CpLYbGFd.mjs +0 -433
- package/dist/stt-CpLYbGFd.mjs.map +0 -1
- package/dist/stt-DRPLEEHB.mjs +0 -3
- package/dist/tools-Bi1P7Xoy.mjs.map +0 -1
- package/dist/transformers.web-DiD1gTwk.js +0 -44695
- package/dist/transformers.web-DiD1gTwk.js.map +0 -1
- package/dist/transformers.web-u34VxRFM.js +0 -3
- package/dist/tts-CqroPaSK.js +0 -724
- package/dist/tts-CqroPaSK.js.map +0 -1
- package/dist/tts-DXgsKGCe.mjs +0 -3
- package/dist/tts-DeGANMNV.mjs +0 -730
- package/dist/tts-DeGANMNV.mjs.map +0 -1
- package/dist/types-CiTc7ez3.d.mts.map +0 -1
- package/dist/{auto-update-S9s5-g0C.mjs → auto-update-BVaLXcDE.mjs} +0 -0
- package/dist/{chunk-CkXuGtQK.mjs → chunk-B9cbKln6.mjs} +0 -0
- package/dist/{microphone-DaMZFRuR.mjs → microphone-Bqmoz9_K.mjs} +0 -0
package/LICENSE
CHANGED
package/README.md
CHANGED
@@ -5,16 +5,19 @@
 <h1 align="center">Gerbil</h1>
 
 <p align="center">
-<strong>
+<strong>A from-scratch WebGPU/WGSL inference engine. Text, vision, embeddings, speech — all native, on-device, in the browser and Node.</strong>
 </p>
 
 <p align="center">
+<a href="https://gerbilsdk.com"><strong>gerbilsdk.com</strong></a> •
 <a href="#install">Install</a> •
-<a href="#
-<a href="#
-<a href="#
-<a href="
-<a href="
+<a href="#native-webgpu-engine">Engine</a> •
+<a href="#react-quickstart">React</a> •
+<a href="#embeddings">Embeddings</a> •
+<a href="#vision">Vision</a> •
+<a href="#speech">Speech</a> •
+<a href="https://gerbilsdk.com/docs/frameworks/ai-sdk">AI SDK</a> •
+<a href="https://gerbilsdk.com/docs/cli">CLI</a>
 </p>
 
 <p align="center">
@@ -35,20 +38,29 @@
 ---
 
 ```typescript
-import
+import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
 
-const
+const engine = await WebGPUEngine.create({ repo: "mlx-community/Qwen3.5-0.8B-4bit" });
+const { text } = await engine.generate("Explain recursion in one sentence");
 ```
 
+> **📚 Full docs → [gerbilsdk.com/docs](https://gerbilsdk.com/docs)** · live playground & demos at **[gerbilsdk.com](https://gerbilsdk.com)**
+
 ## Why Gerbil?
 
-- **
-
--- 
-
-- **
-
-- **
+- **One native engine** — a from-scratch WebGPU/WGSL engine, pure compute shaders, nothing
+  extra to ship.
+- **~90 KB gzipped** — the _entire_ multimodal engine. No heavyweight ML runtime; model
+  weights stream from the HuggingFace Hub at run time.
+- **Multimodal, all native** — text, vision (image→text), embeddings, and speech run on the
+  same engine, loading safetensors directly from the HuggingFace Hub.
+- **Browser & Node** — Chrome 113+, Safari 26+ (iOS 26+), Firefox 141+, and Node via Dawn
+  (`webgpu` npm), anywhere there's a real GPU.
+- **Local & private** — no API keys, nothing leaves the device.
+- **React-first** — `useEngine` owns load / unload / hot-swap and shares one engine
+  across components (reference-counted), with `dtype: "auto"` picking int4 on mobile.
+- **Framework ready** — Vercel AI SDK v5, Next.js, Express, LangChain adapters.
+- **Skills & tools** — built-in + custom skills with Zod validation; agentic tool calling.
 
 ## Install
 
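The "React-first" bullet in the hunk above says `useEngine` shares one engine across components by reference counting. The diff does not show how that is implemented, so the following is only a minimal sketch of the general technique. `RefCountedCache`, `Destroyable`, `acquire`, and `release` are hypothetical names chosen for illustration and are not part of the Gerbil API.

```typescript
// Hypothetical sketch of reference-counted sharing; not Gerbil's actual implementation.
interface Destroyable {
  destroy(): void;
}

class RefCountedCache<T extends Destroyable> {
  private entries = new Map<string, { value: T; refs: number }>();

  // Return the shared instance for `key`, creating it only on first use.
  acquire(key: string, create: () => T): T {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { value: create(), refs: 0 };
      this.entries.set(key, entry);
    }
    entry.refs += 1;
    return entry.value;
  }

  // Drop one reference; destroy the instance when the last holder releases it.
  release(key: string): void {
    const entry = this.entries.get(key);
    if (!entry) return;
    entry.refs -= 1;
    if (entry.refs <= 0) {
      entry.value.destroy();
      this.entries.delete(key);
    }
  }

  // Number of live holders for `key` (0 when nothing is cached).
  refCount(key: string): number {
    return this.entries.get(key)?.refs ?? 0;
  }
}
```

Under this scheme two components that ask for the same model key receive the same object, and the expensive resource is released only after both have let go of it.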
@@ -65,82 +77,187 @@ npm install @tryhamster/gerbil
 
 After global install, use `gerbil` directly instead of `npx @tryhamster/gerbil`.
 
-##
+## Native WebGPU Engine
+
+Gerbil's product is a from-scratch WebGPU inference engine — pure WGSL compute shaders.
+It loads safetensors directly from the HuggingFace Hub (selective tensor download — skip
+vision towers you don't need) and runs the same code in the browser and in Node (via Dawn).
 
 ```typescript
-import {
+import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
 
-
-await
+// dtype "auto" picks int4 on mobile, the repo's native precision on desktop.
+const engine = await WebGPUEngine.create({
+  repo: "mlx-community/Qwen3.5-0.8B-4bit",
+  dtype: "auto",
+});
 
 // Generate
-const
-console.log(
+const { text, tokensPerSecond } = await engine.generate("Write a haiku about gerbils");
+console.log(text, `(${tokensPerSecond.toFixed(1)} tok/s)`);
 
 // Stream
-for await (const
-  process.stdout.write(
+for await (const token of engine.stream("Tell me a story")) {
+  process.stdout.write(token);
 }
 
-
-
-console.log(math.thinking); // Shows reasoning
-console.log(math.text); // "5461"
+engine.destroy();
+```
 
-
-
-
-
+`WebGPUEngine.create({ repo, dtype, enableVision, embedding, maxSeqLen })` returns an
+engine with `generate`, `stream`, `describeImage`, `embed`, and `speak`. See the
+[native engine docs](#supported-models) below for the model lineup.
+
+## React Quickstart
+
+`useEngine` (from `@tryhamster/gerbil/gpu/hooks`) owns the full engine lifecycle —
+load, unload, hot-swap on config change, and reference-counted sharing so multiple
+components never upload the same weights to the GPU twice.
+
+```tsx
+import { useEngine } from "@tryhamster/gerbil/gpu/hooks";
+
+function Chat() {
+  const { complete, completion, isLoading, isGenerating, tps } = useEngine({
+    model: "mlx-community/Qwen3.5-0.8B-4bit",
+    autoLoad: true, // dtype defaults to "auto": int4 on mobile, native on desktop
+  });
+
+  if (isLoading) return <div>Loading model…</div>;
+  return (
+    <div>
+      <button onClick={() => complete("What is 2+2?")} disabled={isGenerating}>
+        Generate
+      </button>
+      <p>{completion}</p>
+      {isGenerating && <span>{tps?.toFixed(1)} tok/s</span>}
+    </div>
+  );
+}
 ```
 
-
+The same hook exposes `describeImage` (vision), `embed`/`similarity` (embeddings), `stop`,
+and `dispose`. Pass `enableVision: true` or `embedding: true` to load those modalities.
 
-
+## Structured Output
+
+`generateObject` makes the model return a JSON object: it generates, extracts the JSON,
+validates it, and retries with a corrective nudge until it's valid (or `maxRetries` is hit).
+Validate with a predicate `(o) => boolean` or a minimal `{ required: [...] }` schema; omit
+`schema` to accept any valid JSON.
 
 ```typescript
-
-// result.audio = Float32Array, result.sampleRate = 24000
+import { generateObject } from "@tryhamster/gerbil";
 
-
-
-
-
+const { object, attempts } = await generateObject<{ name: string; age: number }>(
+  'Extract {name, age} from: "I am Sarah, 28"',
+  { schema: { required: ["name", "age"] } },
+);
+// object === { name: "Sarah", age: 28 }
 ```
 
+It's available on the engine, the `Gerbil` class, and the one-liner API:
+
+```typescript
+import { Gerbil, WebGPUEngine } from "@tryhamster/gerbil";
+
+const g = new Gerbil();
+await g.loadModel("qwen3.5-0.8b");
+const { object } = await g.generateObject("List 3 primes as {primes: number[]}", {
+  schema: (o) => Array.isArray((o as any).primes),
+});
+
+// Or directly on the engine:
+const engine = await WebGPUEngine.create({ repo: "mlx-community/Qwen3.5-0.8B-4bit" });
+await engine.generateObject("…", { schema: { required: ["title"] } });
+```
+
+In React, use `useObject` (from `@tryhamster/gerbil/gpu/hooks`):
+
+```tsx
+import { useObject } from "@tryhamster/gerbil/gpu/hooks";
+
+const { generate, object, isGenerating } = useObject<{ city: string }>();
+await generate("Extract the city from: I live in Paris", {
+  schema: { required: ["city"] },
+});
+```
+
+From the CLI:
+
 ```bash
-
-
+gerbil object "Extract {name, age}: I am Sarah, 28" --schema person.json
+# person.json: { "required": ["name", "age"] }
+```
+
+## Embeddings
+
+Native text embeddings via **EmbeddingGemma-300M** (mean-pooled Gemma3 encoder + Dense
+head, 768-dim, L2-normalized). EmbeddingGemma is asymmetric — pass `taskType` so queries
+and documents get the right prefix.
+
+```typescript
+import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
+
+const engine = await WebGPUEngine.create({
+  repo: "mlx-community/embeddinggemma-300m-4bit",
+  embedding: true,
+});
+
+const query = await engine.embed("capital of France", { taskType: "query" });
+const doc = await engine.embed("Paris is the capital of France", { taskType: "document" });
+
+// Vectors are unit-norm, so cosine similarity is a dot product.
+const sim = query.reduce((s, v, i) => s + v * doc[i], 0);
 ```
 
-📖 **[Full
+📖 **[Full Embeddings Documentation →](https://gerbilsdk.com/docs)**
 
-##
+## Vision
 
-
+Image-in → text-out via the native vision towers (Qwen3.5 ViT and Gemma 4 ViT). Load with
+`enableVision: true`, then call `describeImage`.
 
 ```typescript
-
+const engine = await WebGPUEngine.create({
+  repo: "Qwen/Qwen3.5-0.8B",
+  enableVision: true,
+});
+
+// In Node, decode the image to RGB pixels (HWC, 0..255) yourself; in the browser the
+// React hook's describeImage() takes a URL / data-URL directly.
+const { text } = await engine.describeImage(
+  { pixels, width, height },
+  "What's in this image?",
+);
+```
 
-
-const result = await g.transcribe(audio);
-console.log(result.text);
+📖 **[Full Vision Documentation →](https://gerbilsdk.com/docs/vision)**
 
-
-
-
-
-}
+## Speech
+
+**Text-to-speech** — native **Kani-TTS-2** (LFM2-350M codec-LM + NVIDIA NeMo NanoCodec).
+`engine.speak()` returns 22.05 kHz mono PCM.
 
-
-const
+```typescript
+const engine = await WebGPUEngine.create({ repo: "nineninesix/kani-tts-2-en" });
+const { pcm, sampleRate } = await engine.speak("Hello, I'm Gerbil!"); // sampleRate === 22050
 ```
 
-
-
-
+**Speech-to-text** — native **Moonshine** (raw-waveform encoder/decoder, no FFT/log-mel)
+via the dedicated `MoonshineSTT` class.
+
+```typescript
+import { MoonshineSTT } from "@tryhamster/gerbil/gpu";
+
+const stt = await MoonshineSTT.create({ repo: "UsefulSensors/moonshine-base" });
+const { text, noSpeech } = await stt.transcribe(pcm16kMono); // noSpeech flags silence
 ```
 
-
+`transcribe` returns `noSpeech` (RMS VAD + min-duration + marker denylist) so you can skip
+silent/empty clips; `useSTT` surfaces it too, with an `onNoSpeech` callback.
+
+📖 **[Full TTS Documentation →](https://gerbilsdk.com/docs/tts)** · **[Full STT Documentation →](https://gerbilsdk.com/docs/stt)**
 
 ## Skills
 
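The Structured Output section added in the hunk above describes `generateObject` as a loop: generate, extract the JSON, validate it, and retry with a corrective nudge until the result is valid or `maxRetries` is reached. The sketch below illustrates that control flow only. It is synchronous and takes a stand-in `generate` callback so it can run without a model; `generateObjectSketch`, `extractJson`, and the wording of the nudge are assumptions made for illustration, not the package's code.

```typescript
type Validator = (value: unknown) => boolean;

// Pull the outermost {...} span out of free-form model output and parse it.
function extractJson(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return undefined;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return undefined; // not valid JSON; the caller will retry
  }
}

// Hypothetical retry loop; `generate` stands in for a real model call.
function generateObjectSketch(
  generate: (prompt: string) => string,
  prompt: string,
  validate: Validator = () => true,
  maxRetries = 2,
): { object: unknown; attempts: number } {
  let current = prompt;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    const parsed = extractJson(generate(current));
    if (parsed !== undefined && validate(parsed)) {
      return { object: parsed, attempts: attempt };
    }
    // Corrective nudge (assumed wording): restate the task and ask for JSON only.
    current = `${prompt}\nReturn only a valid JSON object.`;
  }
  throw new Error(`No valid object after ${maxRetries + 1} attempts`);
}
```

Returning the attempt count alongside the object mirrors the `{ object, attempts }` shape shown in the README example, which makes it easy to log how often a small model needs the nudge.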
@@ -174,7 +291,7 @@ await loadSkills("./skills"); // loads *.skill.ts
 const skill = useSkill("my-skill");
 ```
 
-📖 **[Full Skills Documentation →](
+📖 **[Full Skills Documentation →](https://gerbilsdk.com/docs/skills)**
 
 ## Tools & Agents
 
@@ -191,6 +308,27 @@ const weatherTool = defineTool({
 });
 ```
 
+**Agentic loop, on-device.** `engine.generateWithTools` (and the `useAgent` React hook)
+run the whole loop — generate → call a tool → feed the result back → repeat — and return a
+step trace for UIs:
+
+```tsx
+import { useAgent } from "@tryhamster/gerbil/gpu/hooks";
+
+const { run, steps, answer, isRunning } = useAgent({
+  model: "mlx-community/Qwen3.5-0.8B-4bit",
+  tools: [
+    {
+      name: "get_weather",
+      description: "Get the weather for a city",
+      parameters: { city: "string" },
+      execute: ({ city }) => `Weather in ${city}: 72°F, sunny`,
+    },
+  ],
+});
+await run("What's the weather in Paris?"); // steps[]: tool_call → tool_result → answer
+```
+
 **Built-in tools:**
 - `gerbil_docs` — Search Gerbil documentation
 - `run_skill` — Execute any Gerbil skill
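The hunk above describes the agentic loop as generate, call a tool, feed the result back, repeat, with a step trace returned for UIs. A minimal sketch of that loop follows, assuming a model that returns either a tool call or a final answer on each turn. `runAgentLoop`, `ModelTurn`, `Tool`, and `Step` are hypothetical names; the real `generateWithTools` and `useAgent` signatures are not shown in this diff and may differ.

```typescript
// What the (stand-in) model can return on each turn.
type ModelTurn =
  | { kind: "tool_call"; name: string; args: Record<string, string> }
  | { kind: "answer"; text: string };

interface Tool {
  name: string;
  execute(args: Record<string, string>): string;
}

// One entry in the trace a UI could render.
type Step =
  | { type: "tool_call"; name: string; args: Record<string, string> }
  | { type: "tool_result"; name: string; result: string }
  | { type: "answer"; text: string };

// Hypothetical loop: ask the model, run any requested tool, record each step.
function runAgentLoop(
  model: (history: Step[], prompt: string) => ModelTurn,
  tools: Tool[],
  prompt: string,
  maxSteps = 5,
): Step[] {
  const steps: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const turn = model(steps, prompt);
    if (turn.kind === "answer") {
      steps.push({ type: "answer", text: turn.text });
      return steps;
    }
    steps.push({ type: "tool_call", name: turn.name, args: turn.args });
    const tool = tools.find((t) => t.name === turn.name);
    const result = tool ? tool.execute(turn.args) : `Unknown tool: ${turn.name}`;
    // The result is appended to the history the model sees on its next turn.
    steps.push({ type: "tool_result", name: turn.name, result });
  }
  return steps; // step budget exhausted without a final answer
}
```

The `maxSteps` cap is a design choice in this sketch: a small on-device model can loop on tool calls, so bounding the number of turns keeps the UI responsive.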
@@ -204,7 +342,28 @@ npx @tryhamster/gerbil repl
 # Gerbil will call the docs tool and synthesize an answer
 ```
 
-📖 **[Full Tools Documentation →](
+📖 **[Full Tools Documentation →](https://gerbilsdk.com/docs/tools)**
+
+## Autocomplete & Rewrite
+
+Inline autocomplete — `engine.autocomplete(prefix)` and the debounced `useAutocomplete`
+hook return a brief single-line continuation (low-latency defaults + cleanup):
+
+```tsx
+import { useAutocomplete } from "@tryhamster/gerbil/gpu/hooks";
+
+const { suggestion, onInput, accept, dismiss } = useAutocomplete({
+  model: "mlx-community/Qwen3.5-0.8B-4bit",
+});
+// <input onChange={(e) => onInput(e.target.value)} /> — render `suggestion` as ghost text;
+// Tab → accept(), Esc → dismiss()
+```
+
+Tone rewrite — `engine.rewrite(text, { tone })` (and `useEngine().rewrite`) re-generates
+text in a target tone (`"professional"`, `"friendly"`, `"concise"`, `"playful"`,
+`"pirate"`) or with free-form `instructions`.
+
+📖 **[Full Autocomplete Documentation →](https://gerbilsdk.com/docs/autocomplete)**
 
 ## CLI
 
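The Autocomplete section in the hunk above says suggestions are a brief single-line continuation with "cleanup", without spelling out what the cleanup does. One plausible reading is sketched here: drop an echoed prefix, keep the first line, and trim trailing whitespace. `cleanSuggestion` is a hypothetical helper written for illustration, and Gerbil's own post-processing may differ.

```typescript
// Hypothetical post-processing for an inline suggestion; not Gerbil's actual cleanup.
function cleanSuggestion(prefix: string, raw: string): string {
  // Some models echo the prompt; drop the echoed prefix if present.
  const withoutEcho = raw.startsWith(prefix) ? raw.slice(prefix.length) : raw;
  // Keep only the first line so the ghost text stays on one line.
  const firstLine = withoutEcho.split(/\r?\n/)[0] ?? "";
  // Trim trailing whitespace but keep a leading space, which separates words.
  return firstLine.replace(/\s+$/, "");
}
```

Keeping the leading space matters when the suggestion is rendered directly after the caret, because the continuation of "The cat" is " sat", not "sat".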
@@ -227,85 +386,140 @@ gerbil update # Update to latest version
 
 > **Updates**: Gerbil checks for updates but never installs without permission. Press `u` in REPL or run `gerbil update`.
 
-📖 **[Full CLI Documentation →](
+📖 **[Full CLI Documentation →](https://gerbilsdk.com/docs/cli)**
 
 ## Browser Usage
 
-Run LLMs directly in the browser with WebGPU — no server required
+Run LLMs directly in the browser with WebGPU — no server required. The React hooks
+live at `@tryhamster/gerbil/gpu/hooks` and run pure WebGPU compute:
 
 ```tsx
-import { useChat } from "@tryhamster/gerbil/
+import { useChat } from "@tryhamster/gerbil/gpu/hooks";
 
 function Chat() {
-  const { messages,
+  const { messages, send, isLoading, isGenerating } = useChat();
 
   if (isLoading) return <div>Loading model...</div>;
 
   return (
-    <
-      {messages.map(m => <div key={
-      <
-    </
+    <div>
+      {messages.map((m, i) => <div key={i}>{m.role}: {m.content}</div>)}
+      <button onClick={() => send("Hello!")} disabled={isGenerating}>Send</button>
+    </div>
   );
 }
 ```
 
-
+`@tryhamster/gerbil/browser` exports the device & storage utilities
+(`isModelSafeForDevice`, `detectMemoryCrash`, `downloadModelChunked`,
+`getRecommendedModels`, `requestPersistentStorage`, …).
+
+📖 **[Full Browser Documentation →](https://gerbilsdk.com/docs/browser)**
 
 ## Integrations
 
 | Integration | Import | Docs |
 |-------------|--------|------|
-| **Browser** | `@tryhamster/gerbil/browser` | [📖 Browser](
-| **AI SDK v5** | `@tryhamster/gerbil/ai` | [📖 AI SDK](
-| **Next.js** | `@tryhamster/gerbil/next` | [📖
-| **Express** | `@tryhamster/gerbil/express` | [📖
-| **LangChain** | `@tryhamster/gerbil/langchain` | [📖
-| **MCP Server** | `npx @tryhamster/gerbil serve --mcp` | [📖 MCP](
+| **Browser** | `@tryhamster/gerbil/browser` | [📖 Browser](https://gerbilsdk.com/docs/browser) |
+| **AI SDK v5** | `@tryhamster/gerbil/ai` | [📖 AI SDK](https://gerbilsdk.com/docs/frameworks/ai-sdk) |
+| **Next.js** | `@tryhamster/gerbil/next` | [📖 Next.js](https://gerbilsdk.com/docs/frameworks/nextjs) |
+| **Express** | `@tryhamster/gerbil/express` | [📖 Express](https://gerbilsdk.com/docs/frameworks/express) |
+| **LangChain** | `@tryhamster/gerbil/langchain` | [📖 LangChain](https://gerbilsdk.com/docs/frameworks/langchain) |
+| **MCP Server** | `npx @tryhamster/gerbil serve --mcp` | [📖 MCP](https://gerbilsdk.com/docs/mcp) |
+
+**Native engine:** `import { WebGPUEngine } from "@tryhamster/gerbil/gpu"` (or `useEngine` from `@tryhamster/gerbil/gpu/hooks` for React) is the primary surface for text, vision, embeddings, and speech.
+
+## Supported Models
+
+The native engine runs these modalities today. All load straight from the HuggingFace Hub
+via `WebGPUEngine.create({ repo })`.
+
+### Text
 
-
+| Model | Repo | Notes |
+|-------|------|-------|
+| **Qwen3.5-0.8B** | `mlx-community/Qwen3.5-0.8B-4bit` | Default text model; vision-capable (`Qwen/Qwen3.5-0.8B` for the ViT) |
+| **Qwen3.5-2B** | `Qwen/Qwen3.5-2B` | Higher quality; 262k context; multimodal (vision-capable) |
+| **LFM2.5-350M** | `LiquidAI/LFM2.5-350M` | Hybrid conv/attention, very fast, ~199 MB q4 |
+| **Gemma 4 E2B** | `mlx-community/gemma-4-e2b-it-4bit` | PLE CPU-streamed; vision-capable |
 
-
+### Vision (image → text, `describeImage`)
 
-
+| Tower | From | Notes |
+|-------|------|-------|
+| **Qwen3.5 ViT** | `Qwen/Qwen3.5-0.8B` (`enableVision: true`) | Bit-exact vs HF |
+| **Gemma 4 ViT** | `mlx-community/gemma-4-e2b-it-4bit` (`enableVision: true`) | Native projector |
 
-
-|-------|------|----------|
-| `qwen3-0.6b` | ~400MB | General use, reasoning (thinking mode) |
-| `qwen2.5-coder-0.5b` | ~400MB | Code generation |
-| `smollm2-135m` | ~100MB | Quick completions |
+### Embeddings (`embed`)
 
-
+| Model | Repo | Notes |
+|-------|------|-------|
+| **EmbeddingGemma-300M** | `mlx-community/embeddinggemma-300m-4bit` | 768-dim, asymmetric (`taskType`), runs on iPad |
 
-###
+### Speech
 
-| Model | Type |
+| Model | Type | Repo | Notes |
 |-------|------|------|-------|
-|
-|
-
-
+| **Kani-TTS-2** | TTS | `nineninesix/kani-tts-2-en` | `engine.speak()` → 22.05 kHz PCM |
+| **Moonshine** | STT | `UsefulSensors/moonshine-base` | `MoonshineSTT.transcribe()`, raw-waveform |
+
+### Quantization & dtype
+
+`dtype: "auto"` (the React-hook default) picks int4 on mobile and the repo's native
+precision on desktop. For Qwen3.5-0.8B on Dawn/Node:
+
+| Format | Download | tok/s | Notes |
+|---|---|---|---|
+| MLX 4-bit (affine) | 404 MB | fastest | Smallest. Recommended. |
+| GPTQ (AutoRound) | 734 MB | fast | Pre-quantized linears, F16 embed |
+| F32 (on-the-fly Q4) | 1666 MB | slowest | No pre-quantization needed |
+
+> Throughput moves run-to-run and across the optimization loop; treat these as relative,
+> not promises.
+
+### WGSL Kernels
+
+MatMul, MatMulInt4, EmbeddingInt4, RMSNorm, RoPE, GQA Attention (flash-style, causal +
+bidirectional), SwiGLU/GeGLU, CrossAttention, CausalConv1d, M-RoPE, EmbedSplice, FSQ +
+HiFi-GAN (NanoCodec decoder), and more.
+
+> **High-level `Gerbil` class.** `import { Gerbil } from "@tryhamster/gerbil"` (plus the
+> one-liner and `@tryhamster/gerbil/skills`) is a supported convenience wrapper over the
+> native `WebGPUEngine` — ideal for quick scripts, the CLI, and the AI SDK. Reach for
+> `WebGPUEngine` / `useEngine` directly when you want lower-level control over loading,
+> vision, embeddings, and speech.
 
 ## Documentation
 
+Full documentation, guides, and a live playground live at **[gerbilsdk.com/docs](https://gerbilsdk.com/docs)**.
+
 | Guide | Description |
 |-------|-------------|
-| [📖
-| [📖
-| [📖
-| [📖
-| [📖
-| [📖
-| [📖
-| [📖
-| [📖
-| [📖
-| [📖
+| [📖 Getting Started](https://gerbilsdk.com/docs/getting-started) | Install, load a model, core concepts |
+| [📖 Structured Output](https://gerbilsdk.com/docs) | `generateObject` / `useObject` — validated JSON with retries |
+| [📖 Embeddings](https://gerbilsdk.com/docs) | EmbeddingGemma semantic search, similarity, RAG |
+| [📖 Vision](https://gerbilsdk.com/docs/vision) | Image → text with Qwen3.5 ViT & Gemma 4 ViT |
+| [📖 Text-to-Speech](https://gerbilsdk.com/docs/tts) | Native Kani-TTS-2 (`engine.speak()`) |
+| [📖 Speech-to-Text](https://gerbilsdk.com/docs/stt) | Native Moonshine (`MoonshineSTT`) |
+| [📖 Browser](https://gerbilsdk.com/docs/browser) | WebGPU inference, React hooks |
+| [📖 Hooks](https://gerbilsdk.com/docs/hooks) | `useEngine` / `useObject` / `useTTS` / `useSTT` |
+| [📖 Skills](https://gerbilsdk.com/docs/skills) | Built-in skills, custom skill development |
+| [📖 Tools](https://gerbilsdk.com/docs/tools) | Tool calling, agentic workflows |
+| [📖 REPL](https://gerbilsdk.com/docs/repl) | Interactive terminal dashboard |
+| [📖 AI SDK](https://gerbilsdk.com/docs/frameworks/ai-sdk) | Vercel AI SDK v5 (LLM, TTS, STT, Embeddings) |
+| [📖 Frameworks](https://gerbilsdk.com/docs/frameworks) | Next.js, Express, React, LangChain |
+| [📖 CLI](https://gerbilsdk.com/docs/cli) | All CLI commands and options |
+| [📖 Mobile](https://gerbilsdk.com/docs/mobile) | iOS / iPadOS guidance & memory guards |
+| [📖 MCP](https://gerbilsdk.com/docs/mcp) | MCP server for Claude Desktop & Cursor |
 
 ## Requirements
 
-
-
+The native engine needs a real GPU and a WebGPU runtime:
+
+- **Browser** — Chrome/Edge 113+, Safari 26+ (iOS/iPadOS 26+), or Firefox 141+
+- **Node** — Node.js 18+ with the `webgpu` package (Dawn) installed
+
+On devices without WebGPU the engine throws a clear error rather than silently degrading.
 
 ## License
 
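The "Quantization & dtype" table in the hunk above lists an "MLX 4-bit (affine)" format. As background, this is a sketch of generic affine 4-bit quantization for one group of weights: each value is stored as a code from 0 to 15 and reconstructed as `code * scale + bias`. The group handling, the field names `scale`, `bias`, and `q`, and the unpacked one-code-per-byte storage are assumptions made for illustration; they are not read from Gerbil's kernels or from the MLX file layout.

```typescript
interface QuantGroup {
  scale: number;  // step between adjacent 4-bit levels
  bias: number;   // value represented by code 0 (the group minimum)
  q: Uint8Array;  // one code per weight, each in 0..15 (left unpacked for clarity)
}

// Quantize one group of weights to 4-bit codes with a shared scale and bias.
function quantizeGroup(values: number[]): QuantGroup {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const scale = (max - min) / 15; // 4 bits give 16 levels: 0..15
  const q = new Uint8Array(values.length);
  values.forEach((v, i) => {
    // A constant group has scale 0; every value then maps to code 0.
    q[i] = scale === 0 ? 0 : Math.min(15, Math.max(0, Math.round((v - min) / scale)));
  });
  return { scale, bias: min, q };
}

// Reconstruct approximate weights: code * scale + bias.
function dequantizeGroup(group: QuantGroup): number[] {
  return Array.from(group.q, (code) => code * group.scale + group.bias);
}
```

With round-to-nearest, the reconstruction error per weight is at most half a step (`scale / 2`), which is why smaller groups, each with its own scale and bias, trade a little extra storage for better accuracy.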