localm-web 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +51 -0
- package/LICENSE +21 -0
- package/README.md +225 -0
- package/dist/index.d.ts +373 -0
- package/dist/index.js +336 -0
- package/dist/index.js.map +1 -0
- package/package.json +88 -0
package/CHANGELOG.md
ADDED
@@ -0,0 +1,51 @@
# Changelog

All notable changes to **localm-web** will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.1.0] - 2026-05-10

### Added

- Initial project scaffolding: TypeScript strict configuration, ESM-only Vite library build,
  Vitest test runner, ESLint + Prettier.
- Public type primitives: `Message`, `Role`, `FinishReason`, `GenerationOptions`,
  `ModelLoadProgress`, `ProgressCallback`, `TokenChunk`, `ModelPreset`.
- Error hierarchy rooted at `LocalmWebError` (`WebGPUUnavailableError`,
  `ModelLoadError`, `ModelNotLoadedError`, `UnknownModelError`,
  `GenerationAbortedError`, `QuotaExceededError`, `BackendNotAvailableError`).
- Runtime-agnostic `Engine` interface and concrete `WebLLMEngine` backed by `@mlc-ai/web-llm`.
- `Chat` task with `send()`, `stream()`, `setSystemPrompt()`, `resetHistory()`, `getHistory()`,
  `unload()`. Streaming honors `AbortSignal`.
- Curated model registry: `phi-3.5-mini-int4`, `llama-3.2-1b-int4`, `qwen2.5-1.5b-int4`.
- Streaming helpers: `collectStream`, `tap`.
- Vite example app under `examples/vite-chat/` demonstrating model loading,
  streaming output and abort.
- Unit tests for presets, exceptions, results, streaming and Chat (with a fake engine).
- Release pipeline: `Makefile` + `scripts/release.sh` (release branch + tag + PR flow,
  PT-BR PR template), `RELEASES.md` autogenerated from git tags.
- GitHub Actions workflows: `ci.yml` (Node 18/20/22 matrix) and `release-npm.yml`
  (publish on `v*.*.*` tag with `npm publish --provenance`).
- Security policy documented in `CLAUDE.md` and `README.md`: runtime vs dev-deps split,
  zero-CVE bar for `peerDependencies`, `npm audit` on every release, provenance signing.

### Changed

- Bumped `vite` from `^5.4.0` to `^7.3.3` and `vitest` from `^2.1.0` to `^3.2.4`
  to clear advisories GHSA-67mh-4wv8-2f99 (esbuild dev-server CORS) and
  GHSA-4w7w-66w2-5vf9 (vite path traversal in optimized deps `.map`).
- Added `overrides.esbuild ^0.25.0` in `package.json` as defense-in-depth against
  transitive esbuild downgrades.
- ESLint flat config: expanded `ignores` to cover config files and `examples/**`.
- `tsconfig.test.json`: explicit `exclude` so test files are picked up by the
  TypeScript-aware ESLint parser.
- `package.json` `lint` script: dropped `--ext .ts,.tsx` (no-op in flat config).

### Notes

- First public release. `Chat` task is the only fully implemented task — `Completion`,
  `Embeddings`, `Reranker`, structured output and ORT-Web fallback are scheduled for v0.2–v0.5.

package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Mauricio Benjamin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

package/README.md
ADDED
@@ -0,0 +1,225 @@
# localm-web

> ⚠️ **Status: pre-alpha.** Public API is being designed and is expected to change. Code in this repo is intentionally minimal until v0.1.

Browser-only TypeScript SDK for running Language Models (LLMs and SLMs) **locally in the user's browser**, with a developer experience modeled directly on [`ort-vision-sdk-web`](https://github.com/mauriciobenjamin700/ort-vision-sdk).

```typescript
import { Chat } from "localm-web";

const chat = await Chat.create("phi-3.5-mini-int4");

for await (const token of chat.stream("Explain ONNX in one sentence.")) {
  process.stdout.write(token.text);
}
```

That's it. No server, no API key, no roundtrip — the model runs **on the user's GPU** via WebGPU.

---

## Why does this exist?

The Python ecosystem for local Language Models is saturated: `llama-cpp-python`, Ollama, vLLM, `transformers`, text-generation-inference, and dozens more. Picking up another Python wrapper adds nothing.

The **browser side is different**. The closest equivalents are:

| Project | What it is | Why it's not enough |
| ----------------------------------------------------------------- | ------------------------------ | ----------------------------------------------------------------- |
| [WebLLM (MLC)](https://github.com/mlc-ai/web-llm) | Best-in-class WebGPU runtime | Engine-centric, low-level API, no opinionated tasks |
| [transformers.js](https://github.com/huggingface/transformers.js) | HF pipeline API in the browser | Slower (no WebGPU-first compilation in many paths), broad surface |
| `onnxruntime-genai-web` | Microsoft's web LM build | Preview, unstable, no high-level tasks |

There is no opinionated, task-oriented, strict-typed, **Ultralytics-style SDK that just works in a Vite app**. `localm-web` fills that gap.

The mental model is straightforward: if [`ort-vision-sdk-web`](https://github.com/mauriciobenjamin700/ort-vision-sdk) is what `Detector` / `Classifier` / `Segmenter` look like for vision, then `localm-web` is what `Chat` / `Completion` / `Embeddings` / `Reranker` look like for language.

## Design principles

1. **Browser-only.** No Node target, no server runtime. If your code runs on a backend, this SDK is the wrong tool — use `transformers`, vLLM, Ollama, or any of the dozens of mature Python options.
2. **Maximum performance.** WebGPU-first via WebLLM (MLC). Web Worker execution by default so the UI thread stays free. WASM-SIMD fallback for non-WebGPU browsers from v0.5.
3. **Ultralytics-style DX.** `await Class.create(model)` then `predict()` / `send()` / `embed()` / `score()`. Mirrors `ort-vision-sdk-web` so a developer using both feels continuity.
4. **ESM only.** No CJS, no UMD, no IIFE. The browser is ESM-native, modern bundlers expect ESM, and shipping multiple formats just bloats the package.
5. **Vite-first.** The build is optimized for Vite 5+ consumers. Other bundlers will still work, but Vite is the supported smooth path.
6. **Not tied to Vercel.** No `vercel.json`, no Next-specific helpers, no Edge runtime exports. Examples deploy to any static host (Cloudflare Pages, Netlify, GitHub Pages, S3, self-hosted).
7. **Wrap, don't fork.** WebLLM stays a peer dependency. We add the API layer, the task abstractions, and the missing pieces (embeddings, reranker, structured output, fallback runtime).

## Scope

### In scope

- Browser-only execution (WebGPU primary, WASM-SIMD fallback from v0.5).
- High-level tasks: `Chat`, `Completion`, `Embeddings`, `Reranker`.
- Streaming token output via async generators with `AbortSignal` support.
- Tokenization, chat templates, sampling, KV cache (delegated to the underlying runtime).
- Model caching (Cache API + OPFS) with resume on interrupted downloads.
- Curated registry of supported SLMs: Phi-3.5-mini, Llama-3.2-1B/3B, Qwen2.5-0.5B/1.5B/3B, Gemma-2-2B, SmolLM2.
- Structured output: JSON Schema → constrained decoding.
- Web Worker execution out of the box.

### Out of scope

- Server-side execution (Node, Bun, Deno).
- Training, fine-tuning, LoRA loading.
- Multi-modal models at v1.0 (a future composite SDK may combine `ort-vision-sdk-web` + `localm-web`).
- A llama.cpp / GGUF backend — community-maintained options exist; that's not our differentiation.
- A pre-built chat UI. This is an SDK, not a chatbot kit.
- Bundling model weights into the package — models are downloaded at runtime.
- Non-ESM module formats.

## Architecture

```
localm-web/
├── src/
│   ├── core/        # backend abstraction + WebLLM / ORT-Web engines
│   ├── tasks/       # Chat, Completion, Embeddings, Reranker
│   ├── io/          # tokenizer + chat-template loaders
│   ├── sampling/    # greedy, top-k, top-p, temperature
│   ├── cache/       # KV cache + model file cache (Cache API / OPFS)
│   ├── streaming/   # async iterator + AbortSignal plumbing
│   ├── structured/  # JSON Schema → grammar / logit-mask
│   ├── presets/     # curated model registry
│   ├── worker/      # Web Worker entrypoint for inference
│   ├── results.ts   # typed result classes
│   ├── types.ts     # primitive types (Message, ChatRequest, etc.)
│   └── index.ts     # public API
├── test/
├── examples/
├── docs/
└── ...
```

A full layer-by-layer breakdown lives in [CLAUDE.md](./CLAUDE.md).

## Tech stack

- **Language:** TypeScript 5.4+, strict mode, ES2022 target.
- **Module format:** ESM only.
- **Build:** Vite 5+ in library mode, `tsc` for declarations.
- **Primary runtime:** [WebLLM (MLC)](https://github.com/mlc-ai/web-llm), Apache 2.0, WebGPU-first.
- **Fallback runtime (v0.5+):** [`onnxruntime-web`](https://github.com/microsoft/onnxruntime) + [`@huggingface/transformers`](https://github.com/huggingface/transformers.js).
- **Tokenizer:** `@huggingface/transformers` tokenizer module.
- **Chat templates:** `@huggingface/jinja`.
- **Storage:** Cache API + OPFS (Origin Private File System).
- **Concurrency:** Web Worker via `Comlink` (or native `MessagePort`).
- **Tests:** Vitest + Playwright (real browser for WebGPU).
- **Lint/format:** ESLint + Prettier.

## Public API (target shape)

```typescript
import { Chat, Completion, Embeddings, Reranker } from "localm-web";

// Chat — multi-turn conversation with chat template applied
const chat = await Chat.create("phi-3.5-mini-int4");
const reply = await chat.send("Explain ONNX in one sentence.");
console.log(reply.text);

// Streaming
const controller = new AbortController();
for await (const token of chat.stream("Explain ONNX.", { signal: controller.signal })) {
  process.stdout.write(token.text);
}

// Completion — raw text-in text-out (no chat template)
const comp = await Completion.create("qwen2.5-0.5b-int4");
const out = await comp.predict("Once upon a time", { maxTokens: 100 });

// Embeddings
const emb = await Embeddings.create("bge-small-en-v1.5");
const vectors = await emb.embed(["hello world", "another sentence"]);

// Reranker
const rerank = await Reranker.create("bge-reranker-base");
const scores = await rerank.score("query", ["doc1", "doc2", "doc3"]);

// Structured output (JSON Schema → constrained decoding)
const json = await chat.send("Extract user info from: ...", {
  jsonSchema: { type: "object", properties: { name: { type: "string" } } },
});
```

The shape mirrors `ort-vision-sdk-web`: `await Class.create(model)` then `predict()` / `send()` / `embed()` / `score()`.
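
Of that target surface, only the `Chat` path ships in v0.1, along with the exported streaming helpers `collectStream` and `tap`. The snippet below is an illustrative sketch (not taken from the package docs; the `#output` element id is made up) of how those helpers compose with `Chat.stream`:

```typescript
import { Chat, collectStream, tap } from "localm-web";

// Sketch: mirror each streamed token into the page while also collecting
// the full reply as a single string at the end.
const chat = await Chat.create("llama-3.2-1b-int4");
const output = document.querySelector("#output")!;

const mirrored = tap(chat.stream("Summarize WebGPU in two sentences."), (chunk) => {
  output.textContent += chunk.text;
});

const fullReply = await collectStream(mirrored);
console.log(fullReply);
```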

## Versioning roadmap

| Version | Scope |
| -------- | -------------------------------------------------------------------------------------------- |
| **v0.1** | `Chat` via WebLLM. Phi-3.5-mini, Llama-3.2-1B, Qwen2.5-1.5B. Streaming with `AbortSignal`. |
| **v0.2** | `Completion` task. Model caching (Cache API + OPFS). Web Worker by default. Progress events. |
| **v0.3** | `Embeddings` and `Reranker` tasks. BGE family via transformers.js. |
| **v0.4** | Structured output (JSON Schema → grammar / logit masking). |
| **v0.5** | ORT-Web fallback for browsers without WebGPU. Auto-detection and graceful degradation. |
| **v0.6** | Function calling helper (tool use with schema-validated arguments). |
| **v1.0** | Documentation site, runnable demos, stable API contract. |

## Browser support

- **WebGPU:** Chrome 113+, Edge 113+, recent Firefox Nightly with `dom.webgpu.enabled`, Safari 18+ on macOS Sonoma+ / iOS 18+.
- **Without WebGPU:** from v0.5, a WASM-SIMD fallback path will run smaller models acceptably. Below v0.5, a clear runtime error is raised when WebGPU is missing.
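
Since v0.1 raises rather than falls back, a consumer that wants to degrade gracefully has to handle the missing-WebGPU case itself. A minimal sketch (the fallback message and DOM handling are illustrative, not part of the SDK):

```typescript
import { Chat, WebGPUUnavailableError } from "localm-web";

// Sketch: degrade to a plain message instead of crashing when the browser
// has no WebGPU adapter (v0.1 throws; the WASM-SIMD fallback only arrives in v0.5).
try {
  const chat = await Chat.create("qwen2.5-1.5b-int4");
  const reply = await chat.send("Say hi in five words.");
  console.log(reply.text);
} catch (err) {
  if (err instanceof WebGPUUnavailableError) {
    document.body.textContent = "This demo needs a WebGPU-capable browser (Chrome/Edge 113+).";
  } else {
    throw err;
  }
}
```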

## Installation

> Not yet published. Once v0.1 ships:

```bash
npm install localm-web @mlc-ai/web-llm
```

`@mlc-ai/web-llm` is a peer dependency — the consumer pins the version, which keeps the SDK lightweight and avoids version conflicts.

## Vite usage

The package is designed to drop into a Vite app with no extra config. The Web Worker is bundled via Vite's native worker support; just import the SDK and use it.

A complete example will live under `examples/vite-chat/` once v0.1 lands.
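
Until that example lands, the rough shape of a consumer is a few lines in a Vite entry module. This is a hedged sketch, not code from `examples/vite-chat/`; the element ids and prompt text are made up:

```typescript
// src/main.ts in a plain Vite app. Assumes an <output id="reply"> and a
// <button id="ask"> in index.html; both ids are illustrative.
import { Chat } from "localm-web";

const reply = document.querySelector<HTMLOutputElement>("#reply")!;
const button = document.querySelector<HTMLButtonElement>("#ask")!;

const chat = await Chat.create("phi-3.5-mini-int4", {
  onProgress: (p) => (reply.textContent = `Loading model… ${Math.round(p.progress * 100)}%`),
});

button.addEventListener("click", async () => {
  reply.textContent = "";
  for await (const token of chat.stream("Explain ONNX in one sentence.")) {
    reply.textContent += token.text;
  }
});
```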

## Why not server-side?

Three reasons:

1. **Mature alternatives exist.** Python and TS already have excellent server-side LM tooling (Ollama, vLLM, llama-cpp-python, transformers, llama.cpp Node bindings). Adding another wrapper is noise.
2. **The browser is the underserved surface.** Running models on the user's device removes the server cost, keeps data local, and unlocks offline use cases — but the DX is currently rough.
3. **Different concerns.** Server inference cares about throughput, batching, multi-tenant scheduling. Browser inference cares about cold-start time, model caching, UI thread isolation, WebGPU compatibility. Conflating them produces a bad SDK on both sides.

## Security

`localm-web` is a browser SDK — its dependencies execute in your users' browsers. Two layers, treated differently:

| Layer | What it is | Vuln policy |
| ------------------- | --------------------------------------- | --------------------------------------------------------------------------------------- |
| **Runtime** (peers) | `@mlc-ai/web-llm`, future runtime peers | **Zero known CVEs.** Releases are blocked if `npm audit --omit=dev` reports anything. |
| **Dev tooling** | Vite, Vitest, ESLint, esbuild, etc. | Fixed promptly via dependency bumps or `overrides`. Never reaches the published bundle. |

### Reporting vulnerabilities

If you find a vulnerability in `localm-web` itself (not in a transitive dep), open a private security advisory at <https://github.com/mauriciobenjamin700/localm-web/security/advisories/new>. Please don't open public issues for unpatched runtime vulns.

### What we do on every release

- `npm ci` (locked install — no drift between dev machine and CI).
- `npm audit` reviewed manually; nothing handwaved.
- ESM-only build, no eval / `Function()` / dynamic remote code.
- Signed publish via `npm publish --provenance` (provenance attestation visible on the npm package page).

### What you should do as a consumer

- Pin the SDK version (`localm-web@x.y.z`, not `^x.y.z`) until you've validated a release.
- Self-host model weights or use Subresource Integrity (SRI) when the runtime fetches them — model URLs are external.
- Models are cached locally (Cache API + OPFS) — surface this in your app's privacy policy.
- Run inference inside a Web Worker (the SDK does this by default from v0.2). Don't bypass it to "save a thread" — it isolates faulty model code from your UI.

The full maintainer policy lives in [CLAUDE.md → Security & vulnerabilities](./CLAUDE.md#security--vulnerabilities).

## Contributing

Pre-alpha. Issues and design discussion welcome. PRs deferred until the v0.1 surface stabilizes.

## License

MIT — see [LICENSE](./LICENSE).

## Related projects

- [`ort-vision-sdk`](https://github.com/mauriciobenjamin700/ort-vision-sdk) — sibling SDK for computer vision (classification, detection, segmentation). Same DX patterns, same author.

package/dist/index.d.ts
ADDED
@@ -0,0 +1,373 @@
/**
 * localm-web — browser-only TypeScript SDK for running LLMs and SLMs locally.
 *
 * Public API surface for v0.1.
 *
 * @packageDocumentation
 */

/** Thrown when no usable backend is available on the current platform. */
export declare class BackendNotAvailableError extends LocalmWebError {
}

/**
 * Multi-turn chat task.
 *
 * Maintains an in-memory conversation history and applies the chat template
 * configured for the loaded model. Use {@link Chat.create} to construct an
 * instance — the constructor is private.
 *
 * @example
 * ```ts
 * const chat = await Chat.create("phi-3.5-mini-int4");
 * const reply = await chat.send("Explain ONNX in one sentence.");
 * console.log(reply.text);
 * ```
 *
 * @example Streaming
 * ```ts
 * const controller = new AbortController();
 * for await (const token of chat.stream("Explain ONNX.", { signal: controller.signal })) {
 *   process.stdout.write(token.text);
 * }
 * ```
 */
export declare class Chat extends LMTask {
    private readonly history;
    private systemPrompt;
    private constructor();
    /**
     * Create and load a `Chat` task for the given model.
     *
     * @param modelId - Friendly model id from the registry (e.g. `"phi-3.5-mini-int4"`).
     * @param options - Optional creation options (progress callback, engine override).
     */
    static create(modelId: string, options?: LMTaskCreateOptions): Promise<Chat>;
    /** Set or replace the system prompt prepended to every conversation. */
    setSystemPrompt(prompt: string): void;
    /** Clear the system prompt. */
    clearSystemPrompt(): void;
    /** Reset the conversation history. The system prompt is preserved. */
    resetHistory(): void;
    /** A read-only snapshot of the conversation history. */
    getHistory(): readonly Message[];
    /**
     * Send a user message and await the full assistant reply.
     *
     * The user message and the assistant reply are appended to the history.
     *
     * @param message - The user-facing message text.
     * @param options - Generation options.
     * @returns A {@link ChatReply} with the assistant's reply.
     */
    send(message: string, options?: GenerationOptions): Promise<ChatReply>;
    /**
     * Stream the assistant reply token-by-token as an async iterable.
     *
     * The full reply is appended to the history when the stream completes
     * normally. If the stream is aborted, neither message is appended.
     *
     * @param message - The user-facing message text.
     * @param options - Generation options including an optional `signal`.
     */
    stream(message: string, options?: GenerationOptions): AsyncIterable<TokenChunk>;
    private buildMessages;
}

/**
 * Result returned by `Chat.send()`.
 *
 * Holds the assistant's textual reply, the structured assistant message
 * (already appended to the chat history), and metadata about the generation.
 */
export declare class ChatReply {
    /** The assistant's reply text. */
    readonly text: string;
    /** The structured assistant message (already appended to chat history). */
    readonly message: Message;
    /** Number of tokens generated. 0 when the engine does not report it. */
    readonly tokensGenerated: number;
    /** Why the generation loop stopped. */
    readonly finishReason: FinishReason;
    constructor(
    /** The assistant's reply text. */
    text: string,
    /** The structured assistant message (already appended to chat history). */
    message: Message,
    /** Number of tokens generated. 0 when the engine does not report it. */
    tokensGenerated: number,
    /** Why the generation loop stopped. */
    finishReason: FinishReason);
}

/**
 * Drain an async iterable of token chunks into a single string.
 *
 * Useful in tests, for non-streaming consumers, and as a one-line way to
 * reconstruct the final text from a `Chat.stream(...)` call.
 *
 * @param stream - The token-chunk async iterable to consume.
 * @returns The concatenation of every chunk's `text` field.
 */
export declare function collectStream(stream: AsyncIterable<TokenChunk>): Promise<string>;

/**
 * Runtime-agnostic inference contract.
 *
 * Tasks (`Chat`, future `Completion`, etc.) depend on this interface, not on
 * a concrete backend. This lets the SDK swap WebLLM for the ORT-Web fallback
 * (planned for v0.5) without touching task code.
 */
export declare interface Engine {
    /**
     * Load a model into the engine.
     *
     * @param modelId - Backend-specific model identifier.
     * @param onProgress - Optional callback for load progress updates.
     * @throws ModelLoadError on failure to fetch or initialize the model.
     * @throws WebGPUUnavailableError when the engine requires WebGPU but it is missing.
     */
    load(modelId: string, onProgress?: ProgressCallback): Promise<void>;
    /**
     * Generate a single non-streaming response.
     *
     * @param messages - Conversation history including the latest user turn.
     * @param options - Generation options.
     * @returns The full generated text.
     * @throws ModelNotLoadedError if called before {@link Engine.load}.
     * @throws GenerationAbortedError if `options.signal` is triggered.
     */
    generate(messages: Message[], options?: GenerationOptions): Promise<string>;
    /**
     * Generate a streaming response as an async iterable of token chunks.
     *
     * @param messages - Conversation history including the latest user turn.
     * @param options - Generation options.
     * @returns Async iterable yielding token chunks. The final chunk has `done: true`.
     * @throws ModelNotLoadedError if called before {@link Engine.load}.
     * @throws GenerationAbortedError if `options.signal` is triggered.
     */
    stream(messages: Message[], options?: GenerationOptions): AsyncIterable<TokenChunk>;
    /** Release any resources held by the engine. Safe to call when not loaded. */
    unload(): Promise<void>;
    /** Whether a model is currently loaded and ready for inference. */
    isLoaded(): boolean;
}

/** Reason a generation loop stopped. */
export declare type FinishReason = "stop" | "length" | "abort";

/** Thrown when generation is aborted via an `AbortSignal`. */
export declare class GenerationAbortedError extends LocalmWebError {
}

/** Options that control a single generation call. */
export declare interface GenerationOptions {
    /** Maximum number of tokens to generate. Engine-default if omitted. */
    maxTokens?: number;
    /** Sampling temperature. 0 = deterministic, higher = more random. */
    temperature?: number;
    /** Top-K sampling cutoff. */
    topK?: number;
    /** Top-P (nucleus) sampling cutoff. */
    topP?: number;
    /** Cancellation signal. When triggered, the engine stops generation. */
    signal?: AbortSignal;
    /**
     * JSON Schema for structured output. The engine constrains decoding to
     * produce a string parseable as JSON matching the schema. Planned for v0.4.
     */
    jsonSchema?: object;
}

/** Return the list of supported friendly model ids. */
export declare function listSupportedModels(): string[];

/**
 * Base class shared by all language-model tasks (`Chat` for v0.1; `Completion`,
 * `Embeddings` and `Reranker` planned for later versions).
 *
 * The base owns:
 * - resolving a friendly model id to a {@link ModelPreset};
 * - selecting and loading an {@link Engine} (defaulting to WebLLM);
 * - exposing `unload()` for cleanup.
 *
 * Subclasses add task-specific public methods (`send`, `stream`, etc.).
 */
export declare abstract class LMTask {
    /** Engine used for inference. */
    protected readonly engine: Engine;
    /** Resolved metadata for the loaded model. */
    readonly preset: ModelPreset;
    protected constructor(
    /** Engine used for inference. */
    engine: Engine,
    /** Resolved metadata for the loaded model. */
    preset: ModelPreset);
    /**
     * Load a model into a backend and return the wired-up engine + preset.
     *
     * Subclasses call this from their static `create()` factories.
     *
     * @param modelId - Friendly model id from the registry.
     * @param options - Task creation options.
     */
    protected static createEngine(modelId: string, options?: LMTaskCreateOptions): Promise<ResolvedEngine>;
    /** Release engine resources. Safe to call multiple times. */
    unload(): Promise<void>;
    /** Whether the underlying engine has a loaded model. */
    isLoaded(): boolean;
}

/** Common options accepted by every task's `create()` factory. */
export declare interface LMTaskCreateOptions {
    /** Optional callback for model load progress updates. */
    onProgress?: ProgressCallback;
    /**
     * Override the engine used for inference. Intended for testing.
     * Production callers should let the SDK pick a backend automatically.
     */
    engine?: Engine;
}

/**
 * Error hierarchy for localm-web.
 *
 * All errors thrown by the SDK extend `LocalmWebError` so consumers can
 * distinguish SDK errors from unrelated runtime errors with a single
 * `instanceof` check.
 */
/** Base class for every error raised by localm-web. */
export declare class LocalmWebError extends Error {
    readonly cause?: unknown | undefined;
    /**
     * @param message - Human-readable description of the error.
     * @param cause - Underlying error, if any.
     */
    constructor(message: string, cause?: unknown | undefined);
}

/** A single message in a chat conversation. */
export declare interface Message {
    /** The role of the speaker. */
    role: Role;
    /** Text content of the message. */
    content: string;
    /** Optional name (used by some chat templates and tool calls). */
    name?: string;
}

/**
 * Curated registry of supported models for v0.1.
 *
 * Each entry maps a friendly id (e.g. `"phi-3.5-mini-int4"`) to the underlying
 * runtime identifier and metadata. Friendly ids are stable; backend ids may
 * change as upstream MLC packages evolve.
 *
 * Only models that have been validated to load in browsers with WebGPU and
 * that fit the SLM target (≤ 4B parameters at INT4) are included.
 */
export declare const MODEL_PRESETS: Readonly<Record<string, ModelPreset>>;

/** Thrown when a model fails to load (network, parsing, runtime init). */
export declare class ModelLoadError extends LocalmWebError {
}

/** Progress event emitted while a model is loading. */
export declare interface ModelLoadProgress {
    /** Fraction of total work completed, in [0, 1]. */
    progress: number;
    /** Human-readable status message from the underlying runtime. */
    text: string;
    /** Bytes loaded so far. 0 when unavailable. */
    loaded: number;
    /** Total bytes to load. 0 when unavailable. */
    total: number;
}

/** Thrown when an inference call is made before a model has loaded. */
export declare class ModelNotLoadedError extends LocalmWebError {
}

/** Curated metadata for a supported model. */
export declare interface ModelPreset {
    /** Friendly identifier exposed to users (e.g. "phi-3.5-mini-int4"). */
    id: string;
    /** Model family (e.g. "Phi-3.5", "Llama-3.2"). */
    family: string;
    /** Parameter count as a human string (e.g. "1B", "3.8B"). */
    parameters: string;
    /** Quantization scheme (e.g. "q4f16_1"). */
    quantization: string;
    /** Identifier expected by the WebLLM runtime. */
    webllmId: string;
    /** Optional ONNX URL used by the future ORT-Web fallback (v0.5+). */
    ortUrl?: string;
    /** Maximum context window in tokens. */
    contextWindow: number;
    /** Short human description. */
    description: string;
}

/** Callback signature for model load progress. */
export declare type ProgressCallback = (progress: ModelLoadProgress) => void;

/** Thrown when the browser denies storage quota for the model cache. */
export declare class QuotaExceededError extends LocalmWebError {
}

/** Internal payload returned by {@link LMTask.createEngine}. */
declare interface ResolvedEngine {
    engine: Engine;
    preset: ModelPreset;
}

/**
 * Resolve a friendly model id to its full preset metadata.
 *
 * @param modelId - Friendly id (e.g. `"phi-3.5-mini-int4"`).
 * @returns The matching preset.
 * @throws UnknownModelError if no preset matches.
 */
export declare function resolveModelPreset(modelId: string): ModelPreset;

/**
 * Public type primitives for localm-web.
 */
/** Conversation roles supported by chat templates. */
export declare type Role = "system" | "user" | "assistant" | "tool";

/**
 * Wrap an async iterable so that each `TokenChunk` is also passed to a
 * caller-supplied side-effect callback before being yielded downstream.
 *
 * This is intentionally a passthrough — it does not buffer.
 *
 * @param stream - The upstream token-chunk async iterable.
 * @param onChunk - Side-effect invoked for every chunk.
 * @returns A new async iterable yielding the same chunks.
 */
export declare function tap(stream: AsyncIterable<TokenChunk>, onChunk: (chunk: TokenChunk) => void): AsyncIterable<TokenChunk>;

/** A single token (or short span) produced by the streaming generator. */
export declare interface TokenChunk {
    /** Text fragment produced in this step. */
    text: string;
    /** Sequential index of this chunk in the stream, starting at 0. */
    index: number;
    /** True for the final chunk; the final chunk has empty text. */
    done: boolean;
}

/** Thrown when a model id is not present in the curated registry. */
export declare class UnknownModelError extends LocalmWebError {
}

/** Current package version. Updated at release time. */
export declare const VERSION: string;

/** Thrown when WebGPU is required but not available in the host browser. */
export declare class WebGPUUnavailableError extends LocalmWebError {
}

export { }

package/dist/index.js
ADDED
@@ -0,0 +1,336 @@
class LocalmWebError extends Error {
  /**
   * @param message - Human-readable description of the error.
   * @param cause - Underlying error, if any.
   */
  constructor(message, cause) {
    super(message);
    this.cause = cause;
    this.name = new.target.name;
  }
}
class WebGPUUnavailableError extends LocalmWebError {
}
class ModelLoadError extends LocalmWebError {
}
class ModelNotLoadedError extends LocalmWebError {
}
class UnknownModelError extends LocalmWebError {
}
class GenerationAbortedError extends LocalmWebError {
}
class QuotaExceededError extends LocalmWebError {
}
class BackendNotAvailableError extends LocalmWebError {
}
let webllmModulePromise = null;
async function loadWebLLM() {
  if (!webllmModulePromise) {
    webllmModulePromise = import("@mlc-ai/web-llm");
  }
  return webllmModulePromise;
}
function isWebGPUAvailable() {
  return typeof navigator !== "undefined" && "gpu" in navigator;
}
function buildSamplingParams(options) {
  const params = {};
  if (options.maxTokens !== void 0) params.max_tokens = options.maxTokens;
  if (options.temperature !== void 0) params.temperature = options.temperature;
  if (options.topP !== void 0) params.top_p = options.topP;
  return params;
}
function toChatMessages(messages) {
  return messages.map((m) => {
    switch (m.role) {
      case "system":
        return { role: "system", content: m.content };
      case "user":
        return { role: "user", content: m.content };
      case "assistant":
        return { role: "assistant", content: m.content };
      case "tool":
        return { role: "tool", content: m.content, tool_call_id: m.name ?? "" };
    }
  });
}
class WebLLMEngine {
  engine = null;
  isLoaded() {
    return this.engine !== null;
  }
  async load(modelId, onProgress) {
    if (!isWebGPUAvailable()) {
      throw new WebGPUUnavailableError(
        "WebGPU is not available in this browser. The ORT-Web fallback is planned for v0.5."
      );
    }
    const webllm = await loadWebLLM();
    try {
      this.engine = await webllm.CreateMLCEngine(modelId, {
        initProgressCallback: (report) => {
          onProgress?.({
            progress: report.progress,
            text: report.text,
            loaded: 0,
            total: 0
          });
        }
      });
    } catch (err) {
      throw new ModelLoadError(`Failed to load model "${modelId}".`, err);
    }
  }
  async generate(messages, options = {}) {
    const engine = this.requireEngine();
    if (options.signal?.aborted) {
      throw new GenerationAbortedError("Generation aborted before start.");
    }
    const completion = await engine.chat.completions.create({
      ...buildSamplingParams(options),
      messages: toChatMessages(messages),
      stream: false
    });
    return completion.choices[0]?.message?.content ?? "";
  }
  async *stream(messages, options = {}) {
    const engine = this.requireEngine();
    if (options.signal?.aborted) {
      throw new GenerationAbortedError("Generation aborted before start.");
    }
    const completion = await engine.chat.completions.create({
      ...buildSamplingParams(options),
      messages: toChatMessages(messages),
      stream: true
    });
    let index = 0;
    let finished = false;
    try {
      for await (const chunk of completion) {
        if (options.signal?.aborted) {
          throw new GenerationAbortedError("Generation aborted by signal.");
        }
        const choice = chunk.choices[0];
        const delta = choice?.delta?.content ?? "";
        if (delta) {
          yield { text: delta, index, done: false };
          index += 1;
        }
        if (choice?.finish_reason) {
          finished = true;
          yield { text: "", index, done: true };
          index += 1;
        }
      }
      if (!finished) {
        yield { text: "", index, done: true };
      }
    } catch (err) {
      if (err instanceof GenerationAbortedError) throw err;
      throw new ModelLoadError("Streaming generation failed.", err);
    }
  }
  async unload() {
    if (this.engine) {
      await this.engine.unload();
      this.engine = null;
    }
  }
  requireEngine() {
    if (!this.engine) {
      throw new ModelNotLoadedError("Engine not loaded. Call load() before generation.");
    }
    return this.engine;
  }
}
const MODEL_PRESETS = Object.freeze({
  "phi-3.5-mini-int4": {
    id: "phi-3.5-mini-int4",
    family: "Phi-3.5",
    parameters: "3.8B",
    quantization: "q4f16_1",
    webllmId: "Phi-3.5-mini-instruct-q4f16_1-MLC",
    contextWindow: 4096,
    description: "Microsoft Phi-3.5 mini, INT4 quantized for browser inference."
  },
  "llama-3.2-1b-int4": {
    id: "llama-3.2-1b-int4",
    family: "Llama-3.2",
    parameters: "1B",
    quantization: "q4f16_1",
    webllmId: "Llama-3.2-1B-Instruct-q4f16_1-MLC",
    contextWindow: 4096,
    description: "Meta Llama 3.2 1B Instruct, INT4 quantized."
  },
  "qwen2.5-1.5b-int4": {
    id: "qwen2.5-1.5b-int4",
    family: "Qwen2.5",
    parameters: "1.5B",
    quantization: "q4f16_1",
    webllmId: "Qwen2.5-1.5B-Instruct-q4f16_1-MLC",
    contextWindow: 4096,
    description: "Alibaba Qwen 2.5 1.5B Instruct, INT4 quantized."
  }
});
function resolveModelPreset(modelId) {
  const preset = MODEL_PRESETS[modelId];
  if (!preset) {
    const available = Object.keys(MODEL_PRESETS).join(", ");
    throw new UnknownModelError(`Unknown model "${modelId}". Available models: ${available}.`);
  }
  return preset;
}
function listSupportedModels() {
  return Object.keys(MODEL_PRESETS);
}
class LMTask {
  constructor(engine, preset) {
    this.engine = engine;
    this.preset = preset;
  }
  /**
   * Load a model into a backend and return the wired-up engine + preset.
   *
   * Subclasses call this from their static `create()` factories.
   *
   * @param modelId - Friendly model id from the registry.
   * @param options - Task creation options.
   */
  static async createEngine(modelId, options = {}) {
    const preset = resolveModelPreset(modelId);
    const engine = options.engine ?? new WebLLMEngine();
    if (!engine.isLoaded()) {
      await engine.load(preset.webllmId, options.onProgress);
    }
    return { engine, preset };
  }
  /** Release engine resources. Safe to call multiple times. */
  async unload() {
    await this.engine.unload();
  }
  /** Whether the underlying engine has a loaded model. */
  isLoaded() {
    return this.engine.isLoaded();
  }
}
class ChatReply {
  constructor(text, message, tokensGenerated, finishReason) {
    this.text = text;
    this.message = message;
    this.tokensGenerated = tokensGenerated;
    this.finishReason = finishReason;
  }
}
class Chat extends LMTask {
  history = [];
  systemPrompt = null;
  constructor(engine, preset) {
    super(engine, preset);
  }
  /**
   * Create and load a `Chat` task for the given model.
   *
   * @param modelId - Friendly model id from the registry (e.g. `"phi-3.5-mini-int4"`).
   * @param options - Optional creation options (progress callback, engine override).
   */
  static async create(modelId, options = {}) {
    const { engine, preset } = await LMTask.createEngine(modelId, options);
    return new Chat(engine, preset);
  }
  /** Set or replace the system prompt prepended to every conversation. */
  setSystemPrompt(prompt) {
    this.systemPrompt = prompt;
  }
  /** Clear the system prompt. */
  clearSystemPrompt() {
    this.systemPrompt = null;
  }
  /** Reset the conversation history. The system prompt is preserved. */
  resetHistory() {
    this.history.length = 0;
  }
  /** A read-only snapshot of the conversation history. */
  getHistory() {
    return this.history.slice();
  }
  /**
   * Send a user message and await the full assistant reply.
   *
   * The user message and the assistant reply are appended to the history.
   *
   * @param message - The user-facing message text.
   * @param options - Generation options.
   * @returns A {@link ChatReply} with the assistant's reply.
   */
  async send(message, options = {}) {
    const messages = this.buildMessages(message);
    const text = await this.engine.generate(messages, options);
    const userMsg = { role: "user", content: message };
    const assistantMsg = { role: "assistant", content: text };
    this.history.push(userMsg, assistantMsg);
    return new ChatReply(text, assistantMsg, 0, "stop");
  }
  /**
   * Stream the assistant reply token-by-token as an async iterable.
   *
   * The full reply is appended to the history when the stream completes
   * normally. If the stream is aborted, neither message is appended.
   *
   * @param message - The user-facing message text.
   * @param options - Generation options including an optional `signal`.
   */
  async *stream(message, options = {}) {
    const messages = this.buildMessages(message);
    const userMsg = { role: "user", content: message };
    let acc = "";
    for await (const chunk of this.engine.stream(messages, options)) {
      acc += chunk.text;
      yield chunk;
    }
    const assistantMsg = { role: "assistant", content: acc };
    this.history.push(userMsg, assistantMsg);
  }
  buildMessages(userMessage) {
    const messages = [];
    if (this.systemPrompt) {
      messages.push({ role: "system", content: this.systemPrompt });
    }
    messages.push(...this.history);
    messages.push({ role: "user", content: userMessage });
    return messages;
  }
}
async function collectStream(stream) {
  let acc = "";
  for await (const chunk of stream) {
    acc += chunk.text;
  }
  return acc;
}
async function* tap(stream, onChunk) {
  for await (const chunk of stream) {
    onChunk(chunk);
    yield chunk;
  }
}
const VERSION = "0.1.0";
export {
  BackendNotAvailableError,
  Chat,
  ChatReply,
  GenerationAbortedError,
  LMTask,
  LocalmWebError,
  MODEL_PRESETS,
  ModelLoadError,
  ModelNotLoadedError,
  QuotaExceededError,
  UnknownModelError,
  VERSION,
  WebGPUUnavailableError,
  collectStream,
  listSupportedModels,
  resolveModelPreset,
  tap
};
//# sourceMappingURL=index.js.map

package/dist/index.js.map
ADDED
@@ -0,0 +1 @@
{"version":3,"file":"index.js","sources":["../src/core/exceptions.ts","../src/core/webllm-engine.ts","../src/presets/models.ts","../src/tasks/lm-task.ts","../src/results.ts","../src/tasks/chat.ts","../src/streaming/token-stream.ts","../src/index.ts"],"sourcesContent":["/**\n * Error hierarchy for localm-web.\n *\n * All errors thrown by the SDK extend `LocalmWebError` so consumers can\n * distinguish SDK errors from unrelated runtime errors with a single\n * `instanceof` check.\n */\n\n/** Base class for every error raised by localm-web. */\nexport class LocalmWebError extends Error {\n /**\n * @param message - Human-readable description of the error.\n * @param cause - Underlying error, if any.\n */\n constructor(\n message: string,\n public readonly cause?: unknown\n ) {\n super(message);\n this.name = new.target.name;\n }\n}\n\n/** Thrown when WebGPU is required but not available in the host browser. */\nexport class WebGPUUnavailableError extends LocalmWebError {}\n\n/** Thrown when a model fails to load (network, parsing, runtime init). */\nexport class ModelLoadError extends LocalmWebError {}\n\n/** Thrown when an inference call is made before a model has loaded. */\nexport class ModelNotLoadedError extends LocalmWebError {}\n\n/** Thrown when a model id is not present in the curated registry. */\nexport class UnknownModelError extends LocalmWebError {}\n\n/** Thrown when generation is aborted via an `AbortSignal`. */\nexport class GenerationAbortedError extends LocalmWebError {}\n\n/** Thrown when the browser denies storage quota for the model cache. */\nexport class QuotaExceededError extends LocalmWebError {}\n\n/** Thrown when no usable backend is available on the current platform. */\nexport class BackendNotAvailableError extends LocalmWebError {}\n","import type { Engine } from \"./engine\";\nimport type { GenerationOptions, Message, ProgressCallback, TokenChunk } from \"../types\";\nimport {\n GenerationAbortedError,\n ModelLoadError,\n ModelNotLoadedError,\n WebGPUUnavailableError,\n} from \"./exceptions\";\n\ntype WebLLMModule = typeof import(\"@mlc-ai/web-llm\");\ntype MLCEngine = import(\"@mlc-ai/web-llm\").MLCEngineInterface;\ntype ChatCompletionMessageParam = import(\"@mlc-ai/web-llm\").ChatCompletionMessageParam;\n\nlet webllmModulePromise: Promise<WebLLMModule> | null = null;\n\nasync function loadWebLLM(): Promise<WebLLMModule> {\n if (!webllmModulePromise) {\n webllmModulePromise = import(\"@mlc-ai/web-llm\");\n }\n return webllmModulePromise;\n}\n\nfunction isWebGPUAvailable(): boolean {\n return typeof navigator !== \"undefined\" && \"gpu\" in navigator;\n}\n\ninterface SamplingParams {\n max_tokens?: number;\n temperature?: number;\n top_p?: number;\n}\n\nfunction buildSamplingParams(options: GenerationOptions): SamplingParams {\n const params: SamplingParams = {};\n if (options.maxTokens !== undefined) params.max_tokens = options.maxTokens;\n if (options.temperature !== undefined) params.temperature = options.temperature;\n if (options.topP !== undefined) params.top_p = options.topP;\n return params;\n}\n\nfunction toChatMessages(messages: Message[]): ChatCompletionMessageParam[] {\n return messages.map((m): ChatCompletionMessageParam => {\n switch (m.role) {\n case \"system\":\n return { role: \"system\", content: m.content };\n case \"user\":\n return { role: \"user\", content: m.content };\n case \"assistant\":\n return { role: \"assistant\", content: m.content };\n case \"tool\":\n return { role: \"tool\", content: m.content, tool_call_id: m.name ?? 
\"\" };\n }\n });\n}\n\n/**\n * Inference engine backed by [WebLLM (MLC)](https://github.com/mlc-ai/web-llm).\n *\n * Requires WebGPU. The fallback path planned for v0.5 will route to ORT-Web\n * when WebGPU is missing.\n */\nexport class WebLLMEngine implements Engine {\n private engine: MLCEngine | null = null;\n\n isLoaded(): boolean {\n return this.engine !== null;\n }\n\n async load(modelId: string, onProgress?: ProgressCallback): Promise<void> {\n if (!isWebGPUAvailable()) {\n throw new WebGPUUnavailableError(\n \"WebGPU is not available in this browser. The ORT-Web fallback is planned for v0.5.\"\n );\n }\n const webllm = await loadWebLLM();\n try {\n this.engine = await webllm.CreateMLCEngine(modelId, {\n initProgressCallback: (report): void => {\n onProgress?.({\n progress: report.progress,\n text: report.text,\n loaded: 0,\n total: 0,\n });\n },\n });\n } catch (err) {\n throw new ModelLoadError(`Failed to load model \"${modelId}\".`, err);\n }\n }\n\n async generate(messages: Message[], options: GenerationOptions = {}): Promise<string> {\n const engine = this.requireEngine();\n if (options.signal?.aborted) {\n throw new GenerationAbortedError(\"Generation aborted before start.\");\n }\n const completion = await engine.chat.completions.create({\n ...buildSamplingParams(options),\n messages: toChatMessages(messages),\n stream: false,\n });\n return completion.choices[0]?.message?.content ?? \"\";\n }\n\n async *stream(messages: Message[], options: GenerationOptions = {}): AsyncIterable<TokenChunk> {\n const engine = this.requireEngine();\n if (options.signal?.aborted) {\n throw new GenerationAbortedError(\"Generation aborted before start.\");\n }\n const completion = await engine.chat.completions.create({\n ...buildSamplingParams(options),\n messages: toChatMessages(messages),\n stream: true,\n });\n let index: number = 0;\n let finished: boolean = false;\n try {\n for await (const chunk of completion) {\n if (options.signal?.aborted) {\n throw new GenerationAbortedError(\"Generation aborted by signal.\");\n }\n const choice = chunk.choices[0];\n const delta = choice?.delta?.content ?? \"\";\n if (delta) {\n yield { text: delta, index, done: false };\n index += 1;\n }\n if (choice?.finish_reason) {\n finished = true;\n yield { text: \"\", index, done: true };\n index += 1;\n }\n }\n if (!finished) {\n yield { text: \"\", index, done: true };\n }\n } catch (err) {\n if (err instanceof GenerationAbortedError) throw err;\n throw new ModelLoadError(\"Streaming generation failed.\", err);\n }\n }\n\n async unload(): Promise<void> {\n if (this.engine) {\n await this.engine.unload();\n this.engine = null;\n }\n }\n\n private requireEngine(): MLCEngine {\n if (!this.engine) {\n throw new ModelNotLoadedError(\"Engine not loaded. Call load() before generation.\");\n }\n return this.engine;\n }\n}\n","import type { ModelPreset } from \"../types\";\nimport { UnknownModelError } from \"../core/exceptions\";\n\n/**\n * Curated registry of supported models for v0.1.\n *\n * Each entry maps a friendly id (e.g. `\"phi-3.5-mini-int4\"`) to the underlying\n * runtime identifier and metadata. 
Friendly ids are stable; backend ids may\n * change as upstream MLC packages evolve.\n *\n * Only models that have been validated to load in browsers with WebGPU and\n * that fit the SLM target (≤ 4B parameters at INT4) are included.\n */\nexport const MODEL_PRESETS: Readonly<Record<string, ModelPreset>> = Object.freeze({\n \"phi-3.5-mini-int4\": {\n id: \"phi-3.5-mini-int4\",\n family: \"Phi-3.5\",\n parameters: \"3.8B\",\n quantization: \"q4f16_1\",\n webllmId: \"Phi-3.5-mini-instruct-q4f16_1-MLC\",\n contextWindow: 4096,\n description: \"Microsoft Phi-3.5 mini, INT4 quantized for browser inference.\",\n },\n \"llama-3.2-1b-int4\": {\n id: \"llama-3.2-1b-int4\",\n family: \"Llama-3.2\",\n parameters: \"1B\",\n quantization: \"q4f16_1\",\n webllmId: \"Llama-3.2-1B-Instruct-q4f16_1-MLC\",\n contextWindow: 4096,\n description: \"Meta Llama 3.2 1B Instruct, INT4 quantized.\",\n },\n \"qwen2.5-1.5b-int4\": {\n id: \"qwen2.5-1.5b-int4\",\n family: \"Qwen2.5\",\n parameters: \"1.5B\",\n quantization: \"q4f16_1\",\n webllmId: \"Qwen2.5-1.5B-Instruct-q4f16_1-MLC\",\n contextWindow: 4096,\n description: \"Alibaba Qwen 2.5 1.5B Instruct, INT4 quantized.\",\n },\n});\n\n/**\n * Resolve a friendly model id to its full preset metadata.\n *\n * @param modelId - Friendly id (e.g. `\"phi-3.5-mini-int4\"`).\n * @returns The matching preset.\n * @throws UnknownModelError if no preset matches.\n */\nexport function resolveModelPreset(modelId: string): ModelPreset {\n const preset = MODEL_PRESETS[modelId];\n if (!preset) {\n const available = Object.keys(MODEL_PRESETS).join(\", \");\n throw new UnknownModelError(`Unknown model \"${modelId}\". Available models: ${available}.`);\n }\n return preset;\n}\n\n/** Return the list of supported friendly model ids. */\nexport function listSupportedModels(): string[] {\n return Object.keys(MODEL_PRESETS);\n}\n","import type { Engine } from \"../core/engine\";\nimport { WebLLMEngine } from \"../core/webllm-engine\";\nimport { resolveModelPreset } from \"../presets/models\";\nimport type { ModelPreset, ProgressCallback } from \"../types\";\n\n/** Common options accepted by every task's `create()` factory. */\nexport interface LMTaskCreateOptions {\n /** Optional callback for model load progress updates. */\n onProgress?: ProgressCallback;\n /**\n * Override the engine used for inference. Intended for testing.\n * Production callers should let the SDK pick a backend automatically.\n */\n engine?: Engine;\n}\n\n/** Internal payload returned by {@link LMTask.createEngine}. */\nexport interface ResolvedEngine {\n engine: Engine;\n preset: ModelPreset;\n}\n\n/**\n * Base class shared by all language-model tasks (`Chat` for v0.1; `Completion`,\n * `Embeddings` and `Reranker` planned for later versions).\n *\n * The base owns:\n * - resolving a friendly model id to a {@link ModelPreset};\n * - selecting and loading an {@link Engine} (defaulting to WebLLM);\n * - exposing `unload()` for cleanup.\n *\n * Subclasses add task-specific public methods (`send`, `stream`, etc.).\n */\nexport abstract class LMTask {\n protected constructor(\n /** Engine used for inference. */\n protected readonly engine: Engine,\n /** Resolved metadata for the loaded model. 
*/\n public readonly preset: ModelPreset\n ) {}\n\n /**\n * Load a model into a backend and return the wired-up engine + preset.\n *\n * Subclasses call this from their static `create()` factories.\n *\n * @param modelId - Friendly model id from the registry.\n * @param options - Task creation options.\n */\n protected static async createEngine(\n modelId: string,\n options: LMTaskCreateOptions = {}\n ): Promise<ResolvedEngine> {\n const preset = resolveModelPreset(modelId);\n const engine = options.engine ?? new WebLLMEngine();\n if (!engine.isLoaded()) {\n await engine.load(preset.webllmId, options.onProgress);\n }\n return { engine, preset };\n }\n\n /** Release engine resources. Safe to call multiple times. */\n async unload(): Promise<void> {\n await this.engine.unload();\n }\n\n /** Whether the underlying engine has a loaded model. */\n isLoaded(): boolean {\n return this.engine.isLoaded();\n }\n}\n","import type { FinishReason, Message } from \"./types\";\n\n/**\n * Result returned by `Chat.send()`.\n *\n * Holds the assistant's textual reply, the structured assistant message\n * (already appended to the chat history), and metadata about the generation.\n */\nexport class ChatReply {\n constructor(\n /** The assistant's reply text. */\n public readonly text: string,\n /** The structured assistant message (already appended to chat history). */\n public readonly message: Message,\n /** Number of tokens generated. 0 when the engine does not report it. */\n public readonly tokensGenerated: number,\n /** Why the generation loop stopped. */\n public readonly finishReason: FinishReason\n ) {}\n}\n","import { LMTask, type LMTaskCreateOptions } from \"./lm-task\";\nimport type { Engine } from \"../core/engine\";\nimport { ChatReply } from \"../results\";\nimport type { GenerationOptions, Message, ModelPreset, TokenChunk } from \"../types\";\n\n/**\n * Multi-turn chat task.\n *\n * Maintains an in-memory conversation history and applies the chat template\n * configured for the loaded model. Use {@link Chat.create} to construct an\n * instance — the constructor is private.\n *\n * @example\n * ```ts\n * const chat = await Chat.create(\"phi-3.5-mini-int4\");\n * const reply = await chat.send(\"Explain ONNX in one sentence.\");\n * console.log(reply.text);\n * ```\n *\n * @example Streaming\n * ```ts\n * const controller = new AbortController();\n * for await (const token of chat.stream(\"Explain ONNX.\", { signal: controller.signal })) {\n * process.stdout.write(token.text);\n * }\n * ```\n */\nexport class Chat extends LMTask {\n private readonly history: Message[] = [];\n private systemPrompt: string | null = null;\n\n private constructor(engine: Engine, preset: ModelPreset) {\n super(engine, preset);\n }\n\n /**\n * Create and load a `Chat` task for the given model.\n *\n * @param modelId - Friendly model id from the registry (e.g. `\"phi-3.5-mini-int4\"`).\n * @param options - Optional creation options (progress callback, engine override).\n */\n static async create(modelId: string, options: LMTaskCreateOptions = {}): Promise<Chat> {\n const { engine, preset } = await LMTask.createEngine(modelId, options);\n return new Chat(engine, preset);\n }\n\n /** Set or replace the system prompt prepended to every conversation. */\n setSystemPrompt(prompt: string): void {\n this.systemPrompt = prompt;\n }\n\n /** Clear the system prompt. */\n clearSystemPrompt(): void {\n this.systemPrompt = null;\n }\n\n /** Reset the conversation history. The system prompt is preserved. 
*/\n resetHistory(): void {\n this.history.length = 0;\n }\n\n /** A read-only snapshot of the conversation history. */\n getHistory(): readonly Message[] {\n return this.history.slice();\n }\n\n /**\n * Send a user message and await the full assistant reply.\n *\n * The user message and the assistant reply are appended to the history.\n *\n * @param message - The user-facing message text.\n * @param options - Generation options.\n * @returns A {@link ChatReply} with the assistant's reply.\n */\n async send(message: string, options: GenerationOptions = {}): Promise<ChatReply> {\n const messages = this.buildMessages(message);\n const text = await this.engine.generate(messages, options);\n const userMsg: Message = { role: \"user\", content: message };\n const assistantMsg: Message = { role: \"assistant\", content: text };\n this.history.push(userMsg, assistantMsg);\n return new ChatReply(text, assistantMsg, 0, \"stop\");\n }\n\n /**\n * Stream the assistant reply token-by-token as an async iterable.\n *\n * The full reply is appended to the history when the stream completes\n * normally. If the stream is aborted, neither message is appended.\n *\n * @param message - The user-facing message text.\n * @param options - Generation options including an optional `signal`.\n */\n async *stream(message: string, options: GenerationOptions = {}): AsyncIterable<TokenChunk> {\n const messages = this.buildMessages(message);\n const userMsg: Message = { role: \"user\", content: message };\n let acc: string = \"\";\n for await (const chunk of this.engine.stream(messages, options)) {\n acc += chunk.text;\n yield chunk;\n }\n const assistantMsg: Message = { role: \"assistant\", content: acc };\n this.history.push(userMsg, assistantMsg);\n }\n\n private buildMessages(userMessage: string): Message[] {\n const messages: Message[] = [];\n if (this.systemPrompt) {\n messages.push({ role: \"system\", content: this.systemPrompt });\n }\n messages.push(...this.history);\n messages.push({ role: \"user\", content: userMessage });\n return messages;\n }\n}\n","import type { TokenChunk } from \"../types\";\n\n/**\n * Drain an async iterable of token chunks into a single string.\n *\n * Useful in tests, for non-streaming consumers, and as a one-line way to\n * reconstruct the final text from a `Chat.stream(...)` call.\n *\n * @param stream - The token-chunk async iterable to consume.\n * @returns The concatenation of every chunk's `text` field.\n */\nexport async function collectStream(stream: AsyncIterable<TokenChunk>): Promise<string> {\n let acc: string = \"\";\n for await (const chunk of stream) {\n acc += chunk.text;\n }\n return acc;\n}\n\n/**\n * Wrap an async iterable so that each `TokenChunk` is also passed to a\n * caller-supplied side-effect callback before being yielded downstream.\n *\n * This is intentionally a passthrough — it does not buffer.\n *\n * @param stream - The upstream token-chunk async iterable.\n * @param onChunk - Side-effect invoked for every chunk.\n * @returns A new async iterable yielding the same chunks.\n */\nexport async function* tap(\n stream: AsyncIterable<TokenChunk>,\n onChunk: (chunk: TokenChunk) => void\n): AsyncIterable<TokenChunk> {\n for await (const chunk of stream) {\n onChunk(chunk);\n yield chunk;\n }\n}\n","/**\n * localm-web — browser-only TypeScript SDK for running LLMs and SLMs locally.\n *\n * Public API surface for v0.1.\n *\n * @packageDocumentation\n */\n\nexport { Chat } from \"./tasks/chat\";\nexport { LMTask } from \"./tasks/lm-task\";\nexport type { 
LMTaskCreateOptions } from \"./tasks/lm-task\";\n\nexport { ChatReply } from \"./results\";\n\nexport { MODEL_PRESETS, resolveModelPreset, listSupportedModels } from \"./presets/models\";\n\nexport {\n LocalmWebError,\n WebGPUUnavailableError,\n ModelLoadError,\n ModelNotLoadedError,\n UnknownModelError,\n GenerationAbortedError,\n QuotaExceededError,\n BackendNotAvailableError,\n} from \"./core/exceptions\";\n\nexport type { Engine } from \"./core/engine\";\n\nexport { collectStream, tap } from \"./streaming/token-stream\";\n\nexport type {\n Role,\n FinishReason,\n Message,\n GenerationOptions,\n ModelLoadProgress,\n ProgressCallback,\n TokenChunk,\n ModelPreset,\n} from \"./types\";\n\n/** Current package version. Updated at release time. */\nexport const VERSION: string = \"0.1.0\";\n"],"names":[],"mappings":"AASO,MAAM,uBAAuB,MAAM;AAAA;AAAA;AAAA;AAAA;AAAA,EAKxC,YACE,SACgB,OAChB;AACA,UAAM,OAAO;AAFG,SAAA,QAAA;AAGhB,SAAK,OAAO,WAAW;AAAA,EACzB;AACF;AAGO,MAAM,+BAA+B,eAAe;AAAC;AAGrD,MAAM,uBAAuB,eAAe;AAAC;AAG7C,MAAM,4BAA4B,eAAe;AAAC;AAGlD,MAAM,0BAA0B,eAAe;AAAC;AAGhD,MAAM,+BAA+B,eAAe;AAAC;AAGrD,MAAM,2BAA2B,eAAe;AAAC;AAGjD,MAAM,iCAAiC,eAAe;AAAC;AC7B9D,IAAI,sBAAoD;AAExD,eAAe,aAAoC;AACjD,MAAI,CAAC,qBAAqB;AACxB,0BAAsB,OAAO,iBAAiB;AAAA,EAChD;AACA,SAAO;AACT;AAEA,SAAS,oBAA6B;AACpC,SAAO,OAAO,cAAc,eAAe,SAAS;AACtD;AAQA,SAAS,oBAAoB,SAA4C;AACvE,QAAM,SAAyB,CAAA;AAC/B,MAAI,QAAQ,cAAc,OAAW,QAAO,aAAa,QAAQ;AACjE,MAAI,QAAQ,gBAAgB,OAAW,QAAO,cAAc,QAAQ;AACpE,MAAI,QAAQ,SAAS,OAAW,QAAO,QAAQ,QAAQ;AACvD,SAAO;AACT;AAEA,SAAS,eAAe,UAAmD;AACzE,SAAO,SAAS,IAAI,CAAC,MAAkC;AACrD,YAAQ,EAAE,MAAA;AAAA,MACR,KAAK;AACH,eAAO,EAAE,MAAM,UAAU,SAAS,EAAE,QAAA;AAAA,MACtC,KAAK;AACH,eAAO,EAAE,MAAM,QAAQ,SAAS,EAAE,QAAA;AAAA,MACpC,KAAK;AACH,eAAO,EAAE,MAAM,aAAa,SAAS,EAAE,QAAA;AAAA,MACzC,KAAK;AACH,eAAO,EAAE,MAAM,QAAQ,SAAS,EAAE,SAAS,cAAc,EAAE,QAAQ,GAAA;AAAA,IAAG;AAAA,EAE5E,CAAC;AACH;AAQO,MAAM,aAA+B;AAAA,EAClC,SAA2B;AAAA,EAEnC,WAAoB;AAClB,WAAO,KAAK,WAAW;AAAA,EACzB;AAAA,EAEA,MAAM,KAAK,SAAiB,YAA8C;AACxE,QAAI,CAAC,qBAAqB;AACxB,YAAM,IAAI;AAAA,QACR;AAAA,MAAA;AAAA,IAEJ;AACA,UAAM,SAAS,MAAM,WAAA;AACrB,QAAI;AACF,WAAK,SAAS,MAAM,OAAO,gBAAgB,SAAS;AAAA,QAClD,sBAAsB,CAAC,WAAiB;AACtC,uBAAa;AAAA,YACX,UAAU,OAAO;AAAA,YACjB,MAAM,OAAO;AAAA,YACb,QAAQ;AAAA,YACR,OAAO;AAAA,UAAA,CACR;AAAA,QACH;AAAA,MAAA,CACD;AAAA,IACH,SAAS,KAAK;AACZ,YAAM,IAAI,eAAe,yBAAyB,OAAO,MAAM,GAAG;AAAA,IACpE;AAAA,EACF;AAAA,EAEA,MAAM,SAAS,UAAqB,UAA6B,IAAqB;AACpF,UAAM,SAAS,KAAK,cAAA;AACpB,QAAI,QAAQ,QAAQ,SAAS;AAC3B,YAAM,IAAI,uBAAuB,kCAAkC;AAAA,IACrE;AACA,UAAM,aAAa,MAAM,OAAO,KAAK,YAAY,OAAO;AAAA,MACtD,GAAG,oBAAoB,OAAO;AAAA,MAC9B,UAAU,eAAe,QAAQ;AAAA,MACjC,QAAQ;AAAA,IAAA,CACT;AACD,WAAO,WAAW,QAAQ,CAAC,GAAG,SAAS,WAAW;AAAA,EACpD;AAAA,EAEA,OAAO,OAAO,UAAqB,UAA6B,IAA+B;AAC7F,UAAM,SAAS,KAAK,cAAA;AACpB,QAAI,QAAQ,QAAQ,SAAS;AAC3B,YAAM,IAAI,uBAAuB,kCAAkC;AAAA,IACrE;AACA,UAAM,aAAa,MAAM,OAAO,KAAK,YAAY,OAAO;AAAA,MACtD,GAAG,oBAAoB,OAAO;AAAA,MAC9B,UAAU,eAAe,QAAQ;AAAA,MACjC,QAAQ;AAAA,IAAA,CACT;AACD,QAAI,QAAgB;AACpB,QAAI,WAAoB;AACxB,QAAI;AACF,uBAAiB,SAAS,YAAY;AACpC,YAAI,QAAQ,QAAQ,SAAS;AAC3B,gBAAM,IAAI,uBAAuB,+BAA+B;AAAA,QAClE;AACA,cAAM,SAAS,MAAM,QAAQ,CAAC;AAC9B,cAAM,QAAQ,QAAQ,OAAO,WAAW;AACxC,YAAI,OAAO;AACT,gBAAM,EAAE,MAAM,OAAO,OAAO,MAAM,MAAA;AAClC,mBAAS;AAAA,QACX;AACA,YAAI,QAAQ,eAAe;AACzB,qBAAW;AACX,gBAAM,EAAE,MAAM,IAAI,OAAO,MAAM,KAAA;AAC/B,mBAAS;AAAA,QACX;AAAA,MACF;AACA,UAAI,CAAC,UAAU;AACb,cAAM,EAAE,MAAM,IAAI,OAAO,MAAM,KAAA;AAAA,MACjC;AAAA,IACF,SAAS,KAAK;AACZ,UAAI,eAAe,uBAAwB,OAAM;AACjD,YAAM,IAAI,eAAe,gCAAgC,GAAG;AAAA,IAC9D;AAAA,EACF;AAAA,EAEA,MAAM,SAAwB;AAC5B,QAAI,KAAK,QAAQ;AACf,YAAM,KAAK,OAAO,OAAA;AAClB,WAAK,SAAS;AAAA,IAChB;A
AAA,EACF;AAAA,EAEQ,gBAA2B;AACjC,QAAI,CAAC,KAAK,QAAQ;AAChB,YAAM,IAAI,oBAAoB,mDAAmD;AAAA,IACnF;AACA,WAAO,KAAK;AAAA,EACd;AACF;AC9IO,MAAM,gBAAuD,OAAO,OAAO;AAAA,EAChF,qBAAqB;AAAA,IACnB,IAAI;AAAA,IACJ,QAAQ;AAAA,IACR,YAAY;AAAA,IACZ,cAAc;AAAA,IACd,UAAU;AAAA,IACV,eAAe;AAAA,IACf,aAAa;AAAA,EAAA;AAAA,EAEf,qBAAqB;AAAA,IACnB,IAAI;AAAA,IACJ,QAAQ;AAAA,IACR,YAAY;AAAA,IACZ,cAAc;AAAA,IACd,UAAU;AAAA,IACV,eAAe;AAAA,IACf,aAAa;AAAA,EAAA;AAAA,EAEf,qBAAqB;AAAA,IACnB,IAAI;AAAA,IACJ,QAAQ;AAAA,IACR,YAAY;AAAA,IACZ,cAAc;AAAA,IACd,UAAU;AAAA,IACV,eAAe;AAAA,IACf,aAAa;AAAA,EAAA;AAEjB,CAAC;AASM,SAAS,mBAAmB,SAA8B;AAC/D,QAAM,SAAS,cAAc,OAAO;AACpC,MAAI,CAAC,QAAQ;AACX,UAAM,YAAY,OAAO,KAAK,aAAa,EAAE,KAAK,IAAI;AACtD,UAAM,IAAI,kBAAkB,kBAAkB,OAAO,wBAAwB,SAAS,GAAG;AAAA,EAC3F;AACA,SAAO;AACT;AAGO,SAAS,sBAAgC;AAC9C,SAAO,OAAO,KAAK,aAAa;AAClC;AC7BO,MAAe,OAAO;AAAA,EACjB,YAEW,QAEH,QAChB;AAHmB,SAAA,SAAA;AAEH,SAAA,SAAA;AAAA,EACf;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA,EAUH,aAAuB,aACrB,SACA,UAA+B,IACN;AACzB,UAAM,SAAS,mBAAmB,OAAO;AACzC,UAAM,SAAS,QAAQ,UAAU,IAAI,aAAA;AACrC,QAAI,CAAC,OAAO,YAAY;AACtB,YAAM,OAAO,KAAK,OAAO,UAAU,QAAQ,UAAU;AAAA,IACvD;AACA,WAAO,EAAE,QAAQ,OAAA;AAAA,EACnB;AAAA;AAAA,EAGA,MAAM,SAAwB;AAC5B,UAAM,KAAK,OAAO,OAAA;AAAA,EACpB;AAAA;AAAA,EAGA,WAAoB;AAClB,WAAO,KAAK,OAAO,SAAA;AAAA,EACrB;AACF;AC9DO,MAAM,UAAU;AAAA,EACrB,YAEkB,MAEA,SAEA,iBAEA,cAChB;AAPgB,SAAA,OAAA;AAEA,SAAA,UAAA;AAEA,SAAA,kBAAA;AAEA,SAAA,eAAA;AAAA,EACf;AACL;ACQO,MAAM,aAAa,OAAO;AAAA,EACd,UAAqB,CAAA;AAAA,EAC9B,eAA8B;AAAA,EAE9B,YAAY,QAAgB,QAAqB;AACvD,UAAM,QAAQ,MAAM;AAAA,EACtB;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA,EAQA,aAAa,OAAO,SAAiB,UAA+B,IAAmB;AACrF,UAAM,EAAE,QAAQ,OAAA,IAAW,MAAM,OAAO,aAAa,SAAS,OAAO;AACrE,WAAO,IAAI,KAAK,QAAQ,MAAM;AAAA,EAChC;AAAA;AAAA,EAGA,gBAAgB,QAAsB;AACpC,SAAK,eAAe;AAAA,EACtB;AAAA;AAAA,EAGA,oBAA0B;AACxB,SAAK,eAAe;AAAA,EACtB;AAAA;AAAA,EAGA,eAAqB;AACnB,SAAK,QAAQ,SAAS;AAAA,EACxB;AAAA;AAAA,EAGA,aAAiC;AAC/B,WAAO,KAAK,QAAQ,MAAA;AAAA,EACtB;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA,EAWA,MAAM,KAAK,SAAiB,UAA6B,IAAwB;AAC/E,UAAM,WAAW,KAAK,cAAc,OAAO;AAC3C,UAAM,OAAO,MAAM,KAAK,OAAO,SAAS,UAAU,OAAO;AACzD,UAAM,UAAmB,EAAE,MAAM,QAAQ,SAAS,QAAA;AAClD,UAAM,eAAwB,EAAE,MAAM,aAAa,SAAS,KAAA;AAC5D,SAAK,QAAQ,KAAK,SAAS,YAAY;AACvC,WAAO,IAAI,UAAU,MAAM,cAAc,GAAG,MAAM;AAAA,EACpD;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA,EAWA,OAAO,OAAO,SAAiB,UAA6B,IAA+B;AACzF,UAAM,WAAW,KAAK,cAAc,OAAO;AAC3C,UAAM,UAAmB,EAAE,MAAM,QAAQ,SAAS,QAAA;AAClD,QAAI,MAAc;AAClB,qBAAiB,SAAS,KAAK,OAAO,OAAO,UAAU,OAAO,GAAG;AAC/D,aAAO,MAAM;AACb,YAAM;AAAA,IACR;AACA,UAAM,eAAwB,EAAE,MAAM,aAAa,SAAS,IAAA;AAC5D,SAAK,QAAQ,KAAK,SAAS,YAAY;AAAA,EACzC;AAAA,EAEQ,cAAc,aAAgC;AACpD,UAAM,WAAsB,CAAA;AAC5B,QAAI,KAAK,cAAc;AACrB,eAAS,KAAK,EAAE,MAAM,UAAU,SAAS,KAAK,cAAc;AAAA,IAC9D;AACA,aAAS,KAAK,GAAG,KAAK,OAAO;AAC7B,aAAS,KAAK,EAAE,MAAM,QAAQ,SAAS,aAAa;AACpD,WAAO;AAAA,EACT;AACF;ACvGA,eAAsB,cAAc,QAAoD;AACtF,MAAI,MAAc;AAClB,mBAAiB,SAAS,QAAQ;AAChC,WAAO,MAAM;AAAA,EACf;AACA,SAAO;AACT;AAYA,gBAAuB,IACrB,QACA,SAC2B;AAC3B,mBAAiB,SAAS,QAAQ;AAChC,YAAQ,KAAK;AACb,UAAM;AAAA,EACR;AACF;ACMO,MAAM,UAAkB;"}
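For orientation, a minimal consumer-side sketch of the public API embedded in the sources above (`Chat`, `collectStream`, `tap`); the model id comes from the curated preset registry and a WebGPU-capable browser is assumed — this is illustrative usage, not part of the published files:

```ts
import { Chat, collectStream, tap } from "localm-web";

// Load a curated preset; onProgress receives ModelLoadProgress updates.
const chat = await Chat.create("llama-3.2-1b-int4", {
  onProgress: (p) => console.log(p.text),
});

// Await the full reply (the user message and reply are appended to history).
const reply = await chat.send("Explain ONNX in one sentence.");
console.log(reply.text);

// Stream token chunks, mirroring each chunk to the console via tap(),
// then rebuild the full text with collectStream(). Aborting via the
// AbortSignal raises GenerationAbortedError inside the stream.
const controller = new AbortController();
const streamed = await collectStream(
  tap(chat.stream("And WebGPU?", { signal: controller.signal }), (chunk) =>
    console.log(chunk.text)
  )
);
console.log(streamed);

await chat.unload();
```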
package/package.json
ADDED
@@ -0,0 +1,88 @@
+{
+  "name": "localm-web",
+  "version": "0.1.0",
+  "description": "Browser-only TypeScript SDK for running LLMs and SLMs locally with WebGPU. Ultralytics-style DX, Vite-first.",
+  "type": "module",
+  "main": "./dist/index.js",
+  "module": "./dist/index.js",
+  "types": "./dist/index.d.ts",
+  "exports": {
+    ".": {
+      "types": "./dist/index.d.ts",
+      "import": "./dist/index.js"
+    }
+  },
+  "files": [
+    "dist",
+    "README.md",
+    "LICENSE",
+    "CHANGELOG.md"
+  ],
+  "sideEffects": false,
+  "scripts": {
+    "build": "vite build",
+    "dev": "vite build --watch",
+    "test": "vitest run",
+    "test:watch": "vitest",
+    "lint": "eslint .",
+    "format": "prettier --write \"**/*.{ts,tsx,js,json,md}\"",
+    "format:check": "prettier --check \"**/*.{ts,tsx,js,json,md}\"",
+    "typecheck": "tsc --noEmit",
+    "clean": "rm -rf dist"
+  },
+  "peerDependencies": {
+    "@mlc-ai/web-llm": "^0.2.79"
+  },
+  "devDependencies": {
+    "@mlc-ai/web-llm": "^0.2.79",
+    "@types/node": "^22.10.0",
+    "@typescript-eslint/eslint-plugin": "^8.18.0",
+    "@typescript-eslint/parser": "^8.18.0",
+    "eslint": "^9.17.0",
+    "prettier": "^3.4.0",
+    "typescript": "^5.7.0",
+    "vite": "^7.3.3",
+    "vite-plugin-dts": "^4.4.0",
+    "vitest": "^3.2.4"
+  },
+  "engines": {
+    "node": ">=18.0.0"
+  },
+  "overrides": {
+    "esbuild": "^0.25.0"
+  },
+  "keywords": [
+    "llm",
+    "slm",
+    "language-model",
+    "webgpu",
+    "browser",
+    "onnx",
+    "webllm",
+    "mlc",
+    "phi",
+    "llama",
+    "qwen",
+    "gemma",
+    "local-llm",
+    "on-device",
+    "vite",
+    "typescript",
+    "chat",
+    "embeddings",
+    "reranker"
+  ],
+  "license": "MIT",
+  "author": {
+    "name": "Mauricio Benjamin",
+    "email": "mauricio.benjamin@reloverelations.com"
+  },
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/mauriciobenjamin700/localm-web.git"
+  },
+  "homepage": "https://github.com/mauriciobenjamin700/localm-web#readme",
+  "bugs": {
+    "url": "https://github.com/mauriciobenjamin700/localm-web/issues"
+  }
+}
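Because `@mlc-ai/web-llm` is declared only as a `peerDependencies` entry, consuming apps install it alongside `localm-web`. A hedged sketch of the error-handling surface exposed by the exceptions exported from `dist/index.js` (illustrative only; not part of the published files):

```ts
import {
  Chat,
  WebGPUUnavailableError,
  UnknownModelError,
  GenerationAbortedError,
} from "localm-web";

try {
  const chat = await Chat.create("phi-3.5-mini-int4");
  const reply = await chat.send("Hello!");
  console.log(reply.text);
} catch (err) {
  if (err instanceof WebGPUUnavailableError) {
    // No WebGPU in this browser; the ORT-Web fallback is planned for v0.5.
    console.error(err.message);
  } else if (err instanceof UnknownModelError) {
    // Model id is not in the curated preset registry.
    console.error(err.message);
  } else if (err instanceof GenerationAbortedError) {
    // The caller's AbortSignal fired before or during generation.
    console.error(err.message);
  } else {
    throw err;
  }
}
```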