npm - @tryhamster/gerbil - Versions diffs - 1.0.0-rc.8 → 1.0.0 - Mend

@tryhamster/gerbil 1.0.0-rc.8 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (179)

package/LICENSE +1 -1
package/README.md +247 -84
package/dist/architectures-C1I5V3Dt.mjs +6070 -0
package/dist/architectures-C1I5V3Dt.mjs.map +1 -0
package/dist/browser/index.d.ts +264 -588
package/dist/browser/index.d.ts.map +1 -1
package/dist/browser/index.js +585 -2334
package/dist/browser/index.js.map +1 -1
package/dist/cli.mjs +625 -1098
package/dist/cli.mjs.map +1 -1
package/dist/defaults-9komdrbY.mjs +24 -0
package/dist/defaults-9komdrbY.mjs.map +1 -0
package/dist/frameworks/express.d.mts +1 -3
package/dist/frameworks/express.d.mts.map +1 -1
package/dist/frameworks/express.mjs +7 -7
package/dist/frameworks/express.mjs.map +1 -1
package/dist/frameworks/fastify.d.mts +1 -1
package/dist/frameworks/fastify.d.mts.map +1 -1
package/dist/frameworks/fastify.mjs +3 -3
package/dist/frameworks/fastify.mjs.map +1 -1
package/dist/frameworks/hono.d.mts +1 -1
package/dist/frameworks/hono.d.mts.map +1 -1
package/dist/frameworks/hono.mjs +4 -4
package/dist/frameworks/hono.mjs.map +1 -1
package/dist/frameworks/next.d.mts +3 -2
package/dist/frameworks/next.d.mts.map +1 -1
package/dist/frameworks/next.mjs +4 -4
package/dist/frameworks/next.mjs.map +1 -1
package/dist/frameworks/react.d.mts +1 -1
package/dist/frameworks/trpc.d.mts +1 -1
package/dist/frameworks/trpc.d.mts.map +1 -1
package/dist/frameworks/trpc.mjs +4 -4
package/dist/frameworks/trpc.mjs.map +1 -1
package/dist/gerbil-BHrJJIa4.mjs +1656 -0
package/dist/gerbil-BHrJJIa4.mjs.map +1 -0
package/dist/gerbil-BT9fCydo.d.mts +488 -0
package/dist/gerbil-BT9fCydo.d.mts.map +1 -0
package/dist/gerbil-DomNfIr1.mjs +4 -0
package/dist/gpu/hooks.d.mts +520 -0
package/dist/gpu/hooks.d.mts.map +1 -0
package/dist/gpu/hooks.mjs +1188 -0
package/dist/gpu/hooks.mjs.map +1 -0
package/dist/gpu/index.d.mts +2 -0
package/dist/gpu/index.mjs +6 -0
package/dist/gpu-33qCAtHW.mjs +3615 -0
package/dist/gpu-33qCAtHW.mjs.map +1 -0
package/dist/index-Dgmb2kE3.d.mts +245 -0
package/dist/index-Dgmb2kE3.d.mts.map +1 -0
package/dist/index-jEAL2s-A.d.mts +2022 -0
package/dist/index-jEAL2s-A.d.mts.map +1 -0
package/dist/index.d.mts +22 -487
package/dist/index.d.mts.map +1 -1
package/dist/index.mjs +13 -8
package/dist/index.mjs.map +1 -1
package/dist/indexeddb-store-BWIMtxxH.mjs +103 -0
package/dist/indexeddb-store-BWIMtxxH.mjs.map +1 -0
package/dist/indexeddb-store-ClH12Xnl.mjs +4 -0
package/dist/integrations/ai-sdk.d.mts +75 -6
package/dist/integrations/ai-sdk.d.mts.map +1 -1
package/dist/integrations/ai-sdk.mjs +131 -15
package/dist/integrations/ai-sdk.mjs.map +1 -1
package/dist/integrations/langchain.d.mts +1 -1
package/dist/integrations/langchain.d.mts.map +1 -1
package/dist/integrations/langchain.mjs +5 -5
package/dist/integrations/langchain.mjs.map +1 -1
package/dist/integrations/llamaindex.d.mts +1 -1
package/dist/integrations/llamaindex.d.mts.map +1 -1
package/dist/integrations/llamaindex.mjs +5 -5
package/dist/integrations/llamaindex.mjs.map +1 -1
package/dist/integrations/mcp-client.mjs +3 -3
package/dist/integrations/mcp-client.mjs.map +1 -1
package/dist/integrations/mcp.d.mts +3 -2
package/dist/integrations/mcp.d.mts.map +1 -1
package/dist/integrations/mcp.mjs +5 -5
package/dist/{mcp-BvbriaBy.mjs → mcp-1DaMsaBc.mjs} +4 -4
package/dist/mcp-1DaMsaBc.mjs.map +1 -0
package/dist/memory/index.d.mts +3 -0
package/dist/memory/index.mjs +6 -0
package/dist/memory-D1P7Tmda.mjs +4 -0
package/dist/memory-DVN0MnIG.mjs +132 -0
package/dist/memory-DVN0MnIG.mjs.map +1 -0
package/dist/memory-Dj0J1v88.mjs +294 -0
package/dist/memory-Dj0J1v88.mjs.map +1 -0
package/dist/moonshine-stt-BLyVoRpB.mjs +4 -0
package/dist/moonshine-stt-v_P_Ci_m.mjs +11936 -0
package/dist/moonshine-stt-v_P_Ci_m.mjs.map +1 -0
package/dist/{one-liner-s-lD8rCC.mjs → one-liner-DnQn7HJK.mjs} +14 -16
package/dist/one-liner-DnQn7HJK.mjs.map +1 -0
package/dist/repl-jV5gcJFA.mjs +9 -0
package/dist/skills/index.d.mts +270 -320
package/dist/skills/index.d.mts.map +1 -1
package/dist/skills/index.mjs +5 -5
package/dist/{skills-CD3Orlex.mjs → skills-DX8D59UH.mjs} +187 -32
package/dist/skills-DX8D59UH.mjs.map +1 -0
package/dist/{tools-Bi1P7Xoy.mjs → tools-DQ1mPUw5.mjs} +34 -22
package/dist/tools-DQ1mPUw5.mjs.map +1 -0
package/dist/{types-CiTc7ez3.d.mts → types-D6FiR_oh.d.mts} +106 -12
package/dist/types-D6FiR_oh.d.mts.map +1 -0
package/dist/types-DQBe2lFo.d.mts +165 -0
package/dist/types-DQBe2lFo.d.mts.map +1 -0
package/dist/{utils-CZBZ8dgR.mjs → utils-DKO55ZmZ.mjs} +1 -1
package/dist/{utils-CZBZ8dgR.mjs.map → utils-DKO55ZmZ.mjs.map} +1 -1
package/dist/vector-B0panuy6.mjs +95 -0
package/dist/vector-B0panuy6.mjs.map +1 -0
package/docs/PROJECT-STATE.md +321 -0
package/docs/adding-a-model-family.md +280 -0
package/docs/ai-sdk.md +70 -61
package/docs/architecture/overview.md +17 -7
package/docs/browser.md +203 -8
package/docs/embeddings.md +156 -0
package/docs/gerbil-site-native-migration.md +217 -0
package/docs/gpu-engine/architectures.md +398 -0
package/docs/gpu-engine/ir.md +372 -0
package/docs/gpu-engine/kernels.md +718 -0
package/docs/gpu-engine/paper.html +1759 -0
package/docs/gpu-engine/paper.md +2109 -0
package/docs/gpu-engine/safetensors.md +312 -0
package/docs/gpu-engine/tokenizer.md +302 -0
package/docs/memory-rag.md +91 -0
package/docs/metal-safari-intel.md +190 -0
package/docs/mobile-failure-diagnosis.md +124 -0
package/docs/mobile.md +99 -0
package/docs/observability.md +230 -0
package/docs/onnx-removal-plan.md +339 -0
package/docs/research/autoresearch-portable.md +904 -0
package/docs/research/dispatch-reduction-hivemind.md +84 -0
package/docs/research/ios-safari-model-caching.md +117 -0
package/docs/research/mobile-webgpu-speed-fusion.md +135 -0
package/docs/research/native-stt-model-selection.md +49 -0
package/docs/research/native-tts-model-selection.md +90 -0
package/docs/research/native-vs-chromium-decision.md +152 -0
package/docs/research/nemotron-mamba2-inference.md +910 -0
package/docs/research/qwen35-multimodal.md +293 -0
package/docs/research/qwen36-gemma4-targets.md +337 -0
package/docs/research/sota-embedding-models.md +179 -0
package/docs/research/sota-mobile-models-2026.md +263 -0
package/docs/research/sota-modality-models.md +202 -0
package/docs/research/tps-baselines.md +71 -0
package/docs/research/webgpu-m4-reference.md +104 -0
package/docs/site-update-plan.md +155 -0
package/docs/structured-output.md +123 -0
package/docs/stt.md +63 -446
package/docs/tts.md +77 -499
package/docs/vision.md +100 -338
package/package.json +22 -7
package/dist/chrome-backend-CORwaIyC.mjs +0 -1212
package/dist/chrome-backend-CORwaIyC.mjs.map +0 -1
package/dist/chrome-backend-DIKYoWj-.mjs +0 -3
package/dist/gerbil-CJ3ifloF.mjs +0 -4
package/dist/gerbil-Dw4Qj77e.mjs +0 -1631
package/dist/gerbil-Dw4Qj77e.mjs.map +0 -1
package/dist/gerbil-qOTe1nl2.d.mts +0 -431
package/dist/gerbil-qOTe1nl2.d.mts.map +0 -1
package/dist/kokoro-BNTb6egA.mjs +0 -20210
package/dist/kokoro-BNTb6egA.mjs.map +0 -1
package/dist/kokoro-DFRQ1OeM.js +0 -20212
package/dist/kokoro-DFRQ1OeM.js.map +0 -1
package/dist/mcp-BvbriaBy.mjs.map +0 -1
package/dist/one-liner-s-lD8rCC.mjs.map +0 -1
package/dist/repl-DveXw36T.mjs +0 -9
package/dist/skills-CD3Orlex.mjs.map +0 -1
package/dist/stt-CpLYbGFd.mjs +0 -433
package/dist/stt-CpLYbGFd.mjs.map +0 -1
package/dist/stt-DRPLEEHB.mjs +0 -3
package/dist/stt-Te8Qz-Ay.js +0 -433
package/dist/stt-Te8Qz-Ay.js.map +0 -1
package/dist/tools-Bi1P7Xoy.mjs.map +0 -1
package/dist/transformers.web-DokyH3rP.js +0 -3
package/dist/transformers.web-M6mCnEYJ.js +0 -30382
package/dist/transformers.web-M6mCnEYJ.js.map +0 -1
package/dist/tts-C0xx3CtE.js +0 -724
package/dist/tts-C0xx3CtE.js.map +0 -1
package/dist/tts-DXgsKGCe.mjs +0 -3
package/dist/tts-DeGANMNV.mjs +0 -730
package/dist/tts-DeGANMNV.mjs.map +0 -1
package/dist/types-CiTc7ez3.d.mts.map +0 -1
/package/dist/{auto-update-S9s5-g0C.mjs → auto-update-BVaLXcDE.mjs} +0 -0
/package/dist/{chunk-CkXuGtQK.mjs → chunk-B9cbKln6.mjs} +0 -0
/package/dist/{microphone-DaMZFRuR.mjs → microphone-Bqmoz9_K.mjs} +0 -0

package/LICENSE CHANGED

@@ -1,6 +1,6 @@
 MIT License
-Copyright (c) 2025 Wheel Go Fast.
+Copyright (c) 2025-2026 Wheel Go Fast, Inc.
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

package/README.md CHANGED

@@ -5,14 +5,16 @@
 <h1 align="center">Gerbil</h1>
 <p align="center">
-  <strong>Local AI inference for Node.js. LLM, TTS, STT. GPU-accelerated. Zero config.</strong>
+  <strong>A from-scratch WebGPU/WGSL inference engine. Text, vision, embeddings, speech — all native, on-device, in the browser and Node.</strong>
 </p>
 <p align="center">
   <a href="#install">Install</a> •
-  <a href="#quick-start">Quick Start</a> •
-  <a href="#text-to-speech">TTS</a> •
-  <a href="#speech-to-text">STT</a> •
+  <a href="#native-webgpu-engine">Engine</a> •
+  <a href="#react-quickstart">React</a> •
+  <a href="#embeddings">Embeddings</a> •
+  <a href="#vision">Vision</a> •
+  <a href="#speech">Speech</a> •
   <a href="./docs/ai-sdk.md">AI SDK</a> •
   <a href="./docs/cli.md">CLI</a>
 </p>
@@ -35,20 +37,28 @@
 ---
 ```typescript
-import gerbil from "@tryhamster/gerbil";
+import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
-const text = await gerbil("Explain recursion in one sentence");
+const engine = await WebGPUEngine.create({ repo: "mlx-community/Qwen3.5-0.8B-4bit" });
+const { text } = await engine.generate("Explain recursion in one sentence");
 ```
+> **Pre-1.0.** Gerbil is a release-candidate (`1.0.0-rc.26`, changeset pre-release). The
+> native engine surface below is the path going forward; APIs may still shift before 1.0.
 ## Why Gerbil?
-- **Zero Config** — `npx @tryhamster/gerbil "your prompt"` just works
-- **Local & Private** — No API keys, no data leaves your machine
-- **GPU Accelerated** — WebGPU with CPU fallback
-- **Complete Audio** — Text-to-Speech (Kokoro) & Speech-to-Text (Whisper)
-- **Framework Ready** — AI SDK v5, Next.js, Express, LangChain
-- **Skills System** — Built-in + custom skills with Zod validation
-- **Tool Calling** — Agentic capabilities with Qwen3 models
+- **One native engine** — a from-scratch WebGPU/WGSL engine, pure compute shaders, nothing
+  extra to ship.
+- **Multimodal, all native** — text, vision (image→text), embeddings, and speech run on the
+  same engine, loading safetensors directly from the HuggingFace Hub.
+- **Browser & Node** — Chrome 113+, Safari 26+ (iOS 26+), Firefox 141+, and Node via Dawn
+  (`webgpu` npm), anywhere there's a real GPU.
+- **Local & private** — no API keys, nothing leaves the device.
+- **React-first** — `useEngine` owns load / unload / hot-swap and shares one engine
+  across components (reference-counted), with `dtype: "auto"` picking int4 on mobile.
+- **Framework ready** — Vercel AI SDK v5, Next.js, Express, LangChain adapters.
+- **Skills & tools** — built-in + custom skills with Zod validation; agentic tool calling.
 ## Install
@@ -65,82 +75,184 @@ npm install @tryhamster/gerbil
 After global install, use `gerbil` directly instead of `npx @tryhamster/gerbil`.
-## Quick Start
+## Native WebGPU Engine
+Gerbil's product is a from-scratch WebGPU inference engine — pure WGSL compute shaders.
+It loads safetensors directly from the HuggingFace Hub (selective tensor download — skip
+vision towers you don't need) and runs the same code in the browser and in Node (via Dawn).
 ```typescript
-import { Gerbil } from "@tryhamster/gerbil";
+import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
-const g = new Gerbil();
-await g.loadModel("qwen3-0.6b");
+// dtype "auto" picks int4 on mobile, the repo's native precision on desktop.
+const engine = await WebGPUEngine.create({
+  repo: "mlx-community/Qwen3.5-0.8B-4bit",
+  dtype: "auto",
+});
 // Generate
-const result = await g.generate("Write a haiku");
-console.log(result.text);
+const { text, tokensPerSecond } = await engine.generate("Write a haiku about gerbils");
+console.log(text, `(${tokensPerSecond.toFixed(1)} tok/s)`);
 // Stream
-for await (const chunk of g.stream("Tell me a story")) {
-  process.stdout.write(chunk);
+for await (const token of engine.stream("Tell me a story")) {
+  process.stdout.write(token);
 }
-// Thinking mode (Qwen3)
-const math = await g.generate("What is 127 × 43?", { thinking: true });
-console.log(math.thinking); // Shows reasoning
-console.log(math.text);     // "5461"
+engine.destroy();
+```
-// Structured JSON
-const data = await g.json("Extract: John, 32, NYC", {
-  schema: z.object({ name: z.string(), age: z.number(), city: z.string() }),
-});
+`WebGPUEngine.create({ repo, dtype, enableVision, embedding, maxSeqLen })` returns an
+engine with `generate`, `stream`, `describeImage`, `embed`, and `speak`. See the
+[native engine docs](#supported-models) below for the model lineup.
+## React Quickstart
+`useEngine` (from `@tryhamster/gerbil/gpu/hooks`) owns the full engine lifecycle —
+load, unload, hot-swap on config change, and reference-counted sharing so multiple
+components never upload the same weights to the GPU twice.
+```tsx
+import { useEngine } from "@tryhamster/gerbil/gpu/hooks";
+function Chat() {
+  const { complete, completion, isLoading, isGenerating, tps } = useEngine({
+    model: "mlx-community/Qwen3.5-0.8B-4bit",
+    autoLoad: true, // dtype defaults to "auto": int4 on mobile, native on desktop
+  });
+  if (isLoading) return <div>Loading model…</div>;
+  return (
+    <div>
+      <button onClick={() => complete("What is 2+2?")} disabled={isGenerating}>
+        Generate
+      </button>
+      <p>{completion}</p>
+      {isGenerating && <span>{tps?.toFixed(1)} tok/s</span>}
+    </div>
+  );
+}
 ```
-## Text-to-Speech
+The same hook exposes `describeImage` (vision), `embed`/`similarity` (embeddings), `stop`,
+and `dispose`. Pass `enableVision: true` or `embedding: true` to load those modalities.
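The reference-counted sharing attributed to `useEngine` can be pictured as a small cache keyed by model config. The first `acquire` triggers the load, later callers reuse the same instance, and the last `release` tears it down. This is a minimal sketch of the general technique, not Gerbil's implementation. `RefCountedCache`, `acquire`, `release`, `refCount`, and the `load`/`dispose` callbacks are hypothetical names.

```typescript
// Hypothetical sketch of reference-counted sharing; not Gerbil's actual code.
class RefCountedCache<T> {
  private entries = new Map<string, { value: Promise<T>; refs: number }>();
  private readonly load: (key: string) => Promise<T>;
  private readonly dispose: (value: T) => void;

  constructor(load: (key: string) => Promise<T>, dispose: (value: T) => void) {
    this.load = load;
    this.dispose = dispose;
  }

  // The first caller for a key starts the load; later callers share the same promise.
  acquire(key: string): Promise<T> {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { value: this.load(key), refs: 0 };
      this.entries.set(key, entry);
    }
    entry.refs += 1;
    return entry.value;
  }

  // The last release forgets the key and disposes the shared value.
  async release(key: string): Promise<void> {
    const entry = this.entries.get(key);
    if (!entry) return;
    entry.refs -= 1;
    if (entry.refs === 0) {
      this.entries.delete(key);
      this.dispose(await entry.value);
    }
  }

  refCount(key: string): number {
    return this.entries.get(key)?.refs ?? 0;
  }
}
```

Sharing the load *promise* rather than the resolved engine means two components mounting in the same tick still trigger only one weight upload.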
+## Structured Output
-Generate natural speech locally using Kokoro TTS (28 voices):
+`generateObject` makes the model return a JSON object: it generates, extracts the JSON,
+validates it, and retries with a corrective nudge until it's valid (or `maxRetries` is hit).
+Validate with a predicate `(o) => boolean` or a minimal `{ required: [...] }` schema; omit
+`schema` to accept any valid JSON.
 ```typescript
-const result = await g.speak("Hello, I'm Gerbil!", { voice: "af_heart" });
-// result.audio = Float32Array, result.sampleRate = 24000
+import { generateObject } from "@tryhamster/gerbil";
-// Stream long text
-for await (const chunk of g.speakStream("Long paragraph...")) {
-  // Play each chunk as it's generated
-}
+const { object, attempts } = await generateObject<{ name: string; age: number }>(
+  'Extract {name, age} from: "I am Sarah, 28"',
+  { schema: { required: ["name", "age"] } },
+);
+// object === { name: "Sarah", age: 28 }
+```
+It's available on the engine, the `Gerbil` class, and the one-liner API:
+```typescript
+import { Gerbil, WebGPUEngine } from "@tryhamster/gerbil";
+const g = new Gerbil();
+await g.loadModel("qwen3.5-0.8b");
+const { object } = await g.generateObject("List 3 primes as {primes: number[]}", {
+  schema: (o) => Array.isArray((o as any).primes),
+});
+// Or directly on the engine:
+const engine = await WebGPUEngine.create({ repo: "mlx-community/Qwen3.5-0.8B-4bit" });
+await engine.generateObject("…", { schema: { required: ["title"] } });
+```
+In React, use `useObject` (from `@tryhamster/gerbil/gpu/hooks`):
+```tsx
+import { useObject } from "@tryhamster/gerbil/gpu/hooks";
+const { generate, object, isGenerating } = useObject<{ city: string }>();
+await generate("Extract the city from: I live in Paris", {
+  schema: { required: ["city"] },
+});
 ```
+From the CLI:
 ```bash
-# CLI
-gerbil speak "Hello world" --voice bf_emma
+gerbil object "Extract {name, age}: I am Sarah, 28" --schema person.json
+# person.json: { "required": ["name", "age"] }
 ```
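The generate, extract, validate, retry loop described for `generateObject` can be sketched without any model. This is a hedged approximation, not Gerbil's code. The hypothetical names are `extractJson`, `validate`, and `generateObjectSketch`. Several details are assumptions: the brace-matching extraction strategy, the wording of the corrective nudge, and reading `maxRetries` as "retries after the first attempt" with a default of 2. The model is any async `prompt → text` function.

```typescript
// Hypothetical sketch of a generateObject-style loop; not Gerbil's implementation.
type Schema = ((o: unknown) => boolean) | { required: string[] };

// Pull the first balanced {...} block out of free-form model output.
function extractJson(text: string): unknown {
  const start = text.indexOf("{");
  if (start < 0) return undefined;
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) {
      try {
        return JSON.parse(text.slice(start, i + 1));
      } catch {
        return undefined;
      }
    }
  }
  return undefined;
}

function validate(obj: unknown, schema?: Schema): boolean {
  if (schema === undefined) return true; // no schema: any valid JSON passes
  if (typeof schema === "function") return schema(obj);
  if (typeof obj !== "object" || obj === null) return false;
  const record: object = obj;
  return schema.required.every((key) => key in record);
}

async function generateObjectSketch<T>(
  model: (prompt: string) => Promise<string>,
  prompt: string,
  opts: { schema?: Schema; maxRetries?: number } = {},
): Promise<{ object: T; attempts: number }> {
  const maxRetries = opts.maxRetries ?? 2; // assumed default
  let current = prompt;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    const obj = extractJson(await model(current));
    if (obj !== undefined && validate(obj, opts.schema)) {
      return { object: obj as T, attempts: attempt };
    }
    // Corrective nudge appended to the original prompt for the next attempt.
    current = `${prompt}\n\nYour previous reply was not a valid JSON object matching the schema. Reply with only the JSON object.`;
  }
  throw new Error(`No valid JSON after ${maxRetries + 1} attempts`);
}
```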
-📖 **[Full TTS Documentation →](./docs/tts.md)**
+## Embeddings
-## Speech-to-Text
+Native text embeddings via **EmbeddingGemma-300M** (mean-pooled Gemma3 encoder + Dense
+head, 768-dim, L2-normalized). EmbeddingGemma is asymmetric — pass `taskType` so queries
+and documents get the right prefix.
-Transcribe audio locally using Whisper (7 models, 80+ languages):
+```typescript
+import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
+const engine = await WebGPUEngine.create({
+  repo: "mlx-community/embeddinggemma-300m-4bit",
+  embedding: true,
+});
+const query = await engine.embed("capital of France", { taskType: "query" });
+const doc = await engine.embed("Paris is the capital of France", { taskType: "document" });
+// Vectors are unit-norm, so cosine similarity is a dot product.
+const sim = query.reduce((s, v, i) => s + v * doc[i], 0);
+```
+📖 **[Full Embeddings Documentation →](./docs/embeddings.md)**
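Because the returned vectors are unit-norm, ranking a small document set for semantic search reduces to dot products. This is a minimal sketch that assumes `embed` returns an array-like of numbers, which matches how the README indexes it. `dot`, `cosine`, and `rank` are hypothetical helpers, not Gerbil exports.

```typescript
// Hypothetical helpers; vectors are any array-like of numbers (number[] or Float32Array).
type Vec = ArrayLike<number>;

function dot(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// General cosine similarity; equals dot(a, b) when both vectors are unit-norm.
function cosine(a: Vec, b: Vec): number {
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Rank documents (embedded with taskType "document") against a query
// (embedded with taskType "query"), highest similarity first.
function rank(query: Vec, docs: { id: string; vec: Vec }[]): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: dot(query, d.vec) }))
    .sort((x, y) => y.score - x.score);
}
```

`cosine` is only needed for vectors that are not already normalized. For EmbeddingGemma output as described, `dot` alone gives the same ordering at lower cost.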
+## Vision
+Image-in → text-out via the native vision towers (Qwen3.5 ViT and Gemma 4 ViT). Load with
+`enableVision: true`, then call `describeImage`.
 ```typescript
-import { readFileSync } from "fs";
+const engine = await WebGPUEngine.create({
+  repo: "Qwen/Qwen3.5-0.8B",
+  enableVision: true,
+});
+// In Node, decode the image to RGB pixels (HWC, 0..255) yourself; in the browser the
+// React hook's describeImage() takes a URL / data-URL directly.
+const { text } = await engine.describeImage(
+  { pixels, width, height },
+  "What's in this image?",
+);
+```
-const audio = new Uint8Array(readFileSync("recording.wav"));
-const result = await g.transcribe(audio);
-console.log(result.text);
+📖 **[Full Vision Documentation →](./docs/vision.md)**
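Browser canvases (`ImageData.data`) and many Node decoders produce row-major RGBA bytes, while `describeImage` is described as taking RGB pixels in HWC order with values 0..255. This is a minimal conversion sketch. `rgbaToRgb` is a hypothetical helper, and returning a `Uint8Array` is an assumption because the README does not state the element type of `pixels`. Alpha is simply discarded, so transparent regions keep whatever color values they store.

```typescript
// Hypothetical helper: drop the alpha channel from row-major RGBA bytes,
// producing HWC RGB (3 bytes per pixel, rows top to bottom).
function rgbaToRgb(rgba: ArrayLike<number>, width: number, height: number): Uint8Array {
  const n = width * height;
  if (rgba.length !== n * 4) {
    throw new Error(`expected ${n * 4} RGBA bytes, got ${rgba.length}`);
  }
  const rgb = new Uint8Array(n * 3);
  for (let p = 0; p < n; p++) {
    rgb[p * 3] = rgba[p * 4]; // R
    rgb[p * 3 + 1] = rgba[p * 4 + 1]; // G
    rgb[p * 3 + 2] = rgba[p * 4 + 2]; // B
  }
  return rgb;
}

// Browser usage (sketch): const { data, width, height } = ctx.getImageData(0, 0, w, h);
// const pixels = rgbaToRgb(data, width, height);
```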
-// With timestamps
-const result = await g.transcribe(audio, { timestamps: true });
-for (const seg of result.segments) {
-  console.log(`[${seg.start}s] ${seg.text}`);
-}
+## Speech
-// Record from microphone
-const result = await g.listen(5000); // 5 seconds
+**Text-to-speech** — native **Kani-TTS-2** (LFM2-350M codec-LM + NVIDIA NeMo NanoCodec).
+`engine.speak()` returns 22.05 kHz mono PCM.
+```typescript
+const engine = await WebGPUEngine.create({ repo: "nineninesix/kani-tts-2-en" });
+const { pcm, sampleRate } = await engine.speak("Hello, I'm Gerbil!"); // sampleRate === 22050
 ```
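`engine.speak()` is documented as returning `{ pcm, sampleRate }` at 22.05 kHz mono. Saving that output, or playing it through an `<audio>` element, needs a container format. This is a minimal sketch. It assumes `pcm` is a `Float32Array` in [-1, 1], which the README does not state. `encodeWav` is a hypothetical helper that writes a standard 44-byte-header, 16-bit PCM WAV.

```typescript
// Hypothetical helper: wrap mono Float32 PCM in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(pcm: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = pcm.length * bytesPerSample;
  const buf = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buf);
  const writeAscii = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format 1 = integer PCM
  view.setUint16(22, 1, true); // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < pcm.length; i++) {
    const s = Math.max(-1, Math.min(1, pcm[i])); // clip out-of-range samples
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buf);
}
```

In a browser, `new Blob([encodeWav(pcm, sampleRate)], { type: "audio/wav" })` can then be passed to `URL.createObjectURL` for playback.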
-```bash
-# CLI
-gerbil transcribe audio.wav --timestamps
+**Speech-to-text** — native **Moonshine** (raw-waveform encoder/decoder, no FFT/log-mel)
+via the dedicated `MoonshineSTT` class.
+```typescript
+import { MoonshineSTT } from "@tryhamster/gerbil/gpu";
+const stt = await MoonshineSTT.create({ repo: "UsefulSensors/moonshine-base" });
+const { text } = await stt.transcribe(pcm16kMono); // Float32Array @ 16 kHz
 ```
-📖 **[Full STT Documentation →](./docs/stt.md)**
+📖 **[Full TTS Documentation →](./docs/tts.md)** · **[Full STT Documentation →](./docs/stt.md)**
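`MoonshineSTT.transcribe()` is shown taking a mono `Float32Array` at 16 kHz. Microphone or file audio usually arrives at 44.1 or 48 kHz and is often stereo. This is a minimal conversion sketch, assuming interleaved Float32 input. `toMono16k` is a hypothetical helper. Linear interpolation without a low-pass filter is a deliberate simplification; a production resampler would filter first to avoid aliasing.

```typescript
// Hypothetical helper: average interleaved channels to mono, then resample
// to 16 kHz by linear interpolation (no anti-aliasing filter).
function toMono16k(interleaved: Float32Array, channels: number, sampleRate: number): Float32Array {
  const frames = Math.floor(interleaved.length / channels);
  const mono = new Float32Array(frames);
  for (let f = 0; f < frames; f++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) sum += interleaved[f * channels + c];
    mono[f] = sum / channels;
  }
  const targetRate = 16000;
  if (sampleRate === targetRate) return mono;
  const outLength = Math.floor((frames * targetRate) / sampleRate);
  const out = new Float32Array(outLength);
  const step = sampleRate / targetRate; // input samples per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, frames - 1);
    const t = pos - i0;
    out[i] = mono[i0] * (1 - t) + mono[i1] * t;
  }
  return out;
}
```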
 ## Skills
@@ -231,25 +343,30 @@ gerbil update                                 # Update to latest version
 ## Browser Usage
-Run LLMs directly in the browser with WebGPU — no server required:
+Run LLMs directly in the browser with WebGPU — no server required. The React hooks
+live at `@tryhamster/gerbil/gpu/hooks` and run pure WebGPU compute:
 ```tsx
-import { useChat } from "@tryhamster/gerbil/browser";
+import { useChat } from "@tryhamster/gerbil/gpu/hooks";
 function Chat() {
-  const { messages, input, setInput, handleSubmit, isLoading } = useChat();
+  const { messages, send, isLoading, isGenerating } = useChat();
   if (isLoading) return <div>Loading model...</div>;
   return (
-    <form onSubmit={handleSubmit}>
-      {messages.map(m => <div key={m.id}>{m.role}: {m.content}</div>)}
-      <input value={input} onChange={e => setInput(e.target.value)} />
-    </form>
+    <div>
+      {messages.map((m, i) => <div key={i}>{m.role}: {m.content}</div>)}
+      <button onClick={() => send("Hello!")} disabled={isGenerating}>Send</button>
+    </div>
   );
 }
 ```
+`@tryhamster/gerbil/browser` still exports the device/WebGPU utilities
+(`isModelSafeForDevice`, `detectMemoryCrash`, `downloadModelChunked`,
+`checkWebGPUCapabilities`, `getBrowserDiagnostics`, …).
 📖 **[Full Browser Documentation →](./docs/browser.md)**
 ## Integrations
@@ -263,40 +380,82 @@ function Chat() {
 | **LangChain** | `@tryhamster/gerbil/langchain` | [📖 Frameworks](./docs/frameworks.md) |
 | **MCP Server** | `npx @tryhamster/gerbil serve --mcp` | [📖 MCP](./docs/mcp.md) |
-**Audio capabilities:** TTS and STT are built into the core `Gerbil` class, `@tryhamster/gerbil/browser` hooks, and `@tryhamster/gerbil/ai` provider.
+**Native engine:** `import { WebGPUEngine } from "@tryhamster/gerbil/gpu"` (or `useEngine` from `@tryhamster/gerbil/gpu/hooks` for React) is the primary surface for text, vision, embeddings, and speech.
+## Supported Models
-## Models
+The native engine runs these modalities today. All load straight from the HuggingFace Hub
+via `WebGPUEngine.create({ repo })`.
-### Language Models
+### Text
-| Model | Size | Best For |
-|-------|------|----------|
-| `qwen3-0.6b` | ~400MB | General use, reasoning (thinking mode) |
-| `qwen2.5-coder-0.5b` | ~400MB | Code generation |
-| `smollm2-135m` | ~100MB | Quick completions |
+| Model | Repo | Notes |
+|-------|------|-------|
+| **Qwen3.5-0.8B** | `mlx-community/Qwen3.5-0.8B-4bit` | Default text model; vision-capable (`Qwen/Qwen3.5-0.8B` for the ViT) |
+| **Qwen3.5-2B** | `Qwen/Qwen3.5-2B` | Higher quality; 262k context; multimodal (vision-capable) |
+| **LFM2.5-350M** | `LiquidAI/LFM2.5-350M` | Hybrid conv/attention, very fast, ~199 MB q4 |
+| **Gemma 4 E2B** | `mlx-community/gemma-4-e2b-it-4bit` | PLE CPU-streamed; vision-capable |
-Use any HuggingFace model: `npx @tryhamster/gerbil -m hf:org/model "prompt"`
+### Vision (image → text, `describeImage`)
-### Audio Models
+| Tower | From | Notes |
+|-------|------|-------|
+| **Qwen3.5 ViT** | `Qwen/Qwen3.5-0.8B` (`enableVision: true`) | Bit-exact vs HF |
+| **Gemma 4 ViT** | `mlx-community/gemma-4-e2b-it-4bit` (`enableVision: true`) | Native projector |
-| Model | Type | Size | Notes |
+### Embeddings (`embed`)
+| Model | Repo | Notes |
+|-------|------|-------|
+| **EmbeddingGemma-300M** | `mlx-community/embeddinggemma-300m-4bit` | 768-dim, asymmetric (`taskType`), runs on iPad |
+### Speech
+| Model | Type | Repo | Notes |
 |-------|------|------|-------|
-| `kokoro-82m` | TTS | ~330MB | 28 voices, English |
-| `whisper-tiny.en` | STT | 39MB | English, fastest |
-| `whisper-base.en` | STT | 74MB | English, balanced |
-| `whisper-small` | STT | 244MB | 80+ languages |
+| **Kani-TTS-2** | TTS | `nineninesix/kani-tts-2-en` | `engine.speak()` → 22.05 kHz PCM |
+| **Moonshine** | STT | `UsefulSensors/moonshine-base` | `MoonshineSTT.transcribe()`, raw-waveform |
+### Quantization & dtype
+`dtype: "auto"` (the React-hook default) picks int4 on mobile and the repo's native
+precision on desktop. For Qwen3.5-0.8B on Dawn/Node:
+| Format | Download | tok/s | Notes |
+|---|---|---|---|
+| MLX 4-bit (affine) | 404 MB | fastest | Smallest. Recommended. |
+| GPTQ (AutoRound) | 734 MB | fast | Pre-quantized linears, F16 embed |
+| F32 (on-the-fly Q4) | 1666 MB | slowest | No pre-quantization needed |
+> Throughput moves run-to-run and across the optimization loop; treat these as relative,
+> not promises.
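The "MLX 4-bit (affine)" row refers to group-wise affine quantization: each weight is stored as a 4-bit integer `q` and reconstructed as `scale * q + bias`, with one `(scale, bias)` pair per group. This is a CPU reference sketch under assumed MLX-style layout: eight nibbles per `uint32` (lowest nibble first) and a default group size of 64. `dequantInt4Affine` is a hypothetical name, not a Gerbil kernel. If the scale and bias are each stored in 16 bits, a group of 64 costs 4 + 32/64 = 4.5 bits per weight.

```typescript
// Hypothetical CPU reference for group-wise affine int4 dequantization.
// Assumed layout: 8 four-bit values per uint32, lowest nibble first,
// and one (scale, bias) pair per group of `groupSize` consecutive weights.
function dequantInt4Affine(
  packed: Uint32Array,
  scales: Float32Array,
  biases: Float32Array,
  groupSize = 64,
): Float32Array {
  const n = packed.length * 8;
  if (n % groupSize !== 0 || scales.length !== n / groupSize || biases.length !== scales.length) {
    throw new Error("shape mismatch");
  }
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const q = (packed[i >>> 3] >>> ((i & 7) * 4)) & 0xf; // i-th nibble
    const g = Math.floor(i / groupSize);
    out[i] = scales[g] * q + biases[g];
  }
  return out;
}
```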
+### WGSL Kernels
+MatMul, MatMulInt4, EmbeddingInt4, RMSNorm, RoPE, GQA Attention (flash-style, causal +
+bidirectional), SwiGLU/GeGLU, CrossAttention, CausalConv1d, M-RoPE, EmbedSplice, FSQ +
+HiFi-GAN (NanoCodec decoder), and more.
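GPU kernels like RMSNorm are usually validated against a simple CPU reference. This sketch shows what such a reference computes for one row: `y_i = x_i / sqrt(mean(x²) + eps) * w_i`. It is a hypothetical helper, not Gerbil's WGSL kernel. The `unitOffset` flag covers the `(1 + w)` scaling used by Hugging Face's Gemma RMSNorm; whether Gerbil exposes such an option is an assumption.

```typescript
// Hypothetical CPU reference for RMSNorm over a single row.
function rmsNorm(
  x: Float32Array,
  weight: Float32Array,
  eps = 1e-6,
  unitOffset = false, // Gemma-style checkpoints scale by (1 + w)
): Float32Array {
  if (x.length !== weight.length) throw new Error("length mismatch");
  let sumSq = 0;
  for (let i = 0; i < x.length; i++) sumSq += x[i] * x[i];
  const inv = 1 / Math.sqrt(sumSq / x.length + eps);
  const out = new Float32Array(x.length);
  for (let i = 0; i < x.length; i++) {
    const w = unitOffset ? 1 + weight[i] : weight[i];
    out[i] = x[i] * inv * w;
  }
  return out;
}
```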
+> **High-level `Gerbil` class.** `import { Gerbil } from "@tryhamster/gerbil"` (plus the
+> one-liner and `@tryhamster/gerbil/skills`) is a supported convenience wrapper over the
+> native `WebGPUEngine` — ideal for quick scripts, the CLI, and the AI SDK. Reach for
+> `WebGPUEngine` / `useEngine` directly when you want lower-level control over loading,
+> vision, embeddings, and speech.
 ## Documentation
 | Guide | Description |
 |-------|-------------|
-| [📖 Text-to-Speech](./docs/tts.md) | Kokoro TTS, 28 voices, streaming audio |
-| [📖 Speech-to-Text](./docs/stt.md) | Whisper STT, transcription, voice input |
+| [📖 Structured Output](./docs/structured-output.md) | `generateObject` / `useObject` — validated JSON with retries |
+| [📖 Embeddings](./docs/embeddings.md) | EmbeddingGemma semantic search, similarity, RAG |
+| [📖 Vision](./docs/vision.md) | Image → text with Qwen3.5 ViT & Gemma 4 ViT |
+| [📖 Text-to-Speech](./docs/tts.md) | Native Kani-TTS-2 (`engine.speak()`) |
+| [📖 Speech-to-Text](./docs/stt.md) | Native Moonshine (`MoonshineSTT`) |
 | [📖 Browser](./docs/browser.md) | WebGPU inference, React hooks |
 | [📖 Skills](./docs/skills.md) | Built-in skills, custom skill development |
 | [📖 Tools](./docs/tools.md) | Tool calling, agentic workflows |
 | [📖 REPL](./docs/repl.md) | Interactive terminal dashboard |
-| [📖 AI SDK](./docs/ai-sdk.md) | Vercel AI SDK v5 (LLM, TTS, STT) |
+| [📖 AI SDK](./docs/ai-sdk.md) | Vercel AI SDK v5 (LLM, TTS, STT, Embeddings) |
 | [📖 Frameworks](./docs/frameworks.md) | Next.js, Express, React, LangChain |
 | [📖 CLI](./docs/cli.md) | All CLI commands and options |
 | [📖 MCP Server](./docs/mcp.md) | MCP server for Claude Desktop & Cursor |
@@ -304,8 +463,12 @@ Use any HuggingFace model: `npx @tryhamster/gerbil -m hf:org/model "prompt"`
 ## Requirements
-- Node.js 18+
-- For GPU: WebGPU-compatible environment
+The native engine needs a real GPU and a WebGPU runtime:
+- **Browser** — Chrome/Edge 113+, Safari 26+ (iOS/iPadOS 26+), or Firefox 141+
+- **Node** — Node.js 18+ with the `webgpu` package (Dawn) installed
+On devices without WebGPU the engine throws a clear error rather than silently degrading.
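Because the engine throws instead of degrading, an app may want to check WebGPU up front and show its own message. This sketch uses only the standard WebGPU entry point `requestAdapter()`. `checkWebGPU` and the structural `GpuLike` type are hypothetical and are used so the code compiles without `@webgpu/types`. Gerbil's own `checkWebGPUCapabilities` may check more than this.

```typescript
// Minimal structural type so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(options?: { powerPreference?: "low-power" | "high-performance" }): Promise<object | null>;
}

type WebGPUCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical pre-flight check: fail fast with a readable reason
// instead of letting model loading fail later.
async function checkWebGPU(gpu: GpuLike | undefined): Promise<WebGPUCheck> {
  if (!gpu) return { ok: false, reason: "WebGPU is not exposed (navigator.gpu is undefined)" };
  try {
    const adapter = await gpu.requestAdapter({ powerPreference: "high-performance" });
    if (!adapter) return { ok: false, reason: "No suitable GPU adapter was found" };
    return { ok: true };
  } catch (err) {
    return { ok: false, reason: `requestAdapter failed: ${String(err)}` };
  }
}

// Browser usage (sketch): await checkWebGPU((navigator as unknown as { gpu?: GpuLike }).gpu);
```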
 ## License