localm-web 0.1.0 → 0.3.0

package/CHANGELOG.md CHANGED
@@ -7,6 +7,160 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ### Changed
+
+ - **`LMTaskCreateOptions.inWorker` default flipped from `false` to `true`.**
+ `Chat.create()` and `Completion.create()` now spawn a Web Worker by
+ default, isolating tokenization and WebGPU dispatches from the UI
+ thread. Pass `inWorker: false` explicitly to revert to main-thread
+ inference (useful in environments without `Worker` support or when
+ debugging the runtime); the opt-out call shape is unchanged, only the
+ default differs (see the sketch below). This is a pre-1.0 SDK:
+ consumers upgrading from v0.2 will silently move inference off the
+ main thread, which is desirable for almost every app.
+
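A minimal sketch of the flipped default and the opt-out path, assuming `Chat.create(modelId, options?)` accepts `LMTaskCreateOptions` as its second argument; the model id is hypothetical:

```ts
import { Chat } from "localm-web";

// v0.3 default: tokenization and WebGPU dispatches run in a Web Worker.
const chat = await Chat.create("qwen2.5-0.5b"); // hypothetical registry id

// Opt back into main-thread inference, e.g. when `Worker` is unavailable
// or while debugging the runtime. Only the default changed, not the shape.
const mainThreadChat = await Chat.create("qwen2.5-0.5b", { inWorker: false });
```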
+ ### Added
+
+ - `Embeddings` task in `src/tasks/embeddings.ts` — sentence embeddings
+ via `@huggingface/transformers`. `Embeddings.create(modelId, options?)`
+ returns an instance; `embed(texts: string[], options?)` returns
+ `number[][]`; `embedSingle(text)` returns `number[]`. Empty input
+ yields `[]` (per project convention — no NotFoundError on empty).
+ Default pooling `"mean"`, default `normalize: true`.
+ - `EMBEDDING_PRESETS` registry with `bge-small-en-v1.5` (384-dim) and
+ `bge-base-en-v1.5` (768-dim). `resolveEmbeddingPreset(id)` and
+ `listSupportedEmbeddingModels()` helpers.
+ - Public types: `EmbeddingPreset`, `EmbeddingsCreateOptions`,
+ `EmbedOptions`, `EmbedPipeline` (DI hook for tests).
+ - `Reranker` task in `src/tasks/reranker.ts` — cross-encoder reranking
+ via `@huggingface/transformers`. `Reranker.create(modelId, options?)`
+ returns an instance; `score(query, docs, options?)` returns
+ `number[]` (raw logits, or sigmoid-mapped to `[0, 1]` when
+ `sigmoid: true`); `rank(query, docs, options?)` returns
+ `RankedDocument[]` sorted descending by score with the original
+ index preserved. Empty `docs` yields `[]`. A retrieve-then-rerank
+ sketch combining `Embeddings` and `Reranker` follows this list.
+ - `RERANKER_PRESETS` registry with `bge-reranker-base`.
+ `resolveRerankerPreset(id)` and `listSupportedRerankerModels()`
+ helpers. Public type `RerankerPreset`.
+ - Public types: `RerankerCreateOptions`, `RerankOptions`,
+ `RerankPipeline`, `RankedDocument`.
+ - `peerDependenciesMeta` marks `@huggingface/transformers` as
+ optional — Chat / Completion users do not need to install it.
+ - 10 unit tests in `test/embeddings.test.ts` covering registry
+ resolution, batch + single embedding, empty input short-circuit,
+ pooling / normalize defaults and overrides, unload delegation, and
+ graceful unload when the pipeline omits `unload()`.
+ - 10 unit tests in `test/reranker.test.ts` covering registry
+ resolution, score order preservation, empty input short-circuit,
+ sigmoid normalization, default raw-logit output, descending sort
+ in `rank()`, unload delegation, and graceful unload without
+ `unload()`.
+ - `docs/getting-started.md` v0.3 update — new sections covering
+ Embeddings, Reranker, the retrieve-then-rerank pattern, the
+ embedding / reranker registries, and the new Web-Worker-by-default
+ behavior. Existing sections (model registry, downloads, cache,
+ troubleshooting) carry over unchanged.
+
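A sketch of the two new tasks chained in the retrieve-then-rerank pattern the guide describes, using preset ids from the registries above. The corpus, query, and cosine-similarity helper are illustrative, not part of the SDK, and the dot product relies on the default `normalize: true`:

```ts
import { Embeddings, Reranker } from "localm-web";

const query = "Where are model weights cached?";
const docs = [
  "WebGPU runs compute shaders in the browser.",
  "The Cache API stores downloaded model weights across sessions.",
  "Sourdough needs a long, cold fermentation.",
];

// 1. Retrieve: embed the corpus and the query (bge-small-en-v1.5, 384-dim).
const embedder = await Embeddings.create("bge-small-en-v1.5");
const docVectors = await embedder.embed(docs);         // number[][]
const queryVector = await embedder.embedSingle(query); // number[]

// Vectors are normalized by default, so a dot product is cosine similarity.
const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
const candidates = docs
  .map((text, i) => ({ text, similarity: dot(queryVector, docVectors[i]) }))
  .sort((a, b) => b.similarity - a.similarity)
  .slice(0, 2)
  .map((c) => c.text);

// 2. Rerank: cross-encode the query against the shortlisted candidates.
const reranker = await Reranker.create("bge-reranker-base");
const ranked = await reranker.rank(query, candidates); // RankedDocument[], best first
console.log(ranked[0]); // highest-scoring document, original index preserved
```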
+ ## [0.2.0] - 2026-05-10
+
+ ### Added
+
+ - `docs/getting-started.md` — end-to-end guide covering prerequisites,
+ install, first chat snippet, the curated model registry with download /
+ RAM estimates, how a model downloads and where it caches, running the
+ example Vite app, cold-start expectations, inspecting / clearing the
+ Cache API, offline behavior, and troubleshooting.
+ - README links to the new guide from the **Installation** and
+ **Vite usage** sections; the example app blurb now points at the
+ runnable folder instead of hedging with "once v0.1 lands".
+ - `Completion` task for raw text continuation (no chat template, no history).
+ Exposes `predict()` returning a `CompletionResult` and `stream()` yielding
+ a `TokenChunk` async iterable. Mirrors the `Chat` task DX; a
+ load-and-stream sketch follows this list.
+ - `CompletionResult` class in `src/results.ts` — holds the generated text,
+ the original prompt, tokens generated, and finish reason.
+ - `Engine.complete()` and `Engine.streamCompletion()` methods on the
+ runtime-agnostic engine contract. `WebLLMEngine` implements both via
+ `engine.completions.create()` (raw text mode, bypasses the chat template).
+ - `ModelLoadPhase` discriminated string type
+ (`"downloading" | "compiling" | "loading" | "ready"`) on `ModelLoadProgress`.
+ Lets consumers drive UI state machines (spinner → progress bar → ready
+ badge) without parsing the runtime's free-form status text.
+ - `WebLLMEngine.load()` classifies each progress report via
+ `classifyLoadPhase()` and emits a final `phase: "ready"` event exactly
+ once when the load resolves successfully.
+ - `WorkerEngine` — `Engine` implementation that proxies all calls to a Web
+ Worker via a typed RPC protocol. Lets consumers run inference off the UI
+ thread.
+ - `createInferenceWorker()` helper that spawns a module-type Worker pointing
+ at the SDK's bundled worker entry. Exposed for advanced lifecycle
+ scenarios (pooling, custom termination); most consumers never call it
+ directly.
+ - `LMTaskCreateOptions.inWorker` flag (default `false` in v0.2). When
+ `true`, the task instantiates a worker-backed engine instead of running
+ inference on the main thread. The default flips to `true` in v0.3 once the
+ Cache API / OPFS integration validates worker-thread storage access.
+ - `src/worker/protocol.ts` — discriminated-union message contract between
+ main thread and worker (`load`, `generate`, `stream`, `complete`,
+ `stream-completion`, `abort`, `unload`, `isLoaded` requests; `loaded`,
+ `progress`, `generated`, `token`, `stream-end`, `error`, `unloaded`,
+ `is-loaded` responses). Numeric op ids isolate concurrent operations.
+ - `WorkerLike` interface exported for tests and custom integrations that
+ need to inject a transport (mocks, Comlink wrappers, MessagePort
+ bridges).
+ - 11 new unit tests in `test/worker-engine.test.ts` exercising load with
+ progress, generate round-trip, abort propagation, signal stripping,
+ streaming queue, error mapping, unload short-circuit, terminate, and
+ concurrent-load rejection.
+ - `ModelCache` class in `src/cache/model-cache.ts` — inspect and manage
+ cached model weights from a consuming app (usage sketch at the end of
+ this entry):
+   - `has(modelId)` / `delete(modelId)` wrap WebLLM's `hasModelInCache` /
+ `deleteModelInCache`, validating the friendly id against the
+ registry first.
+   - `list()` iterates `MODEL_PRESETS` and returns the cached subset as
+ `CachedModelEntry[]` with friendly id, backend id, family, and params.
+ Empty list when nothing is cached (per the project's
+ `*NotFoundError`-free convention).
+   - `clear()` deletes every registry model in parallel — useful for
+ logout / reset flows.
+   - `estimateUsage()` wraps `navigator.storage.estimate()` and returns
+ `{ usage, quota }`. Falls back to zeros when the API is missing.
+   - `ModelCache.assertKnown(modelId)` static guard that throws
+ `UnknownModelError` for ids outside the registry.
+ - Public types: `CachedModelEntry`, `CacheUsage`, `ModelCacheOptions`
+ re-exported from `src/index.ts`.
+ - Dependency-injectable backend (`hasModel`, `deleteModel`, `estimate`
+ hooks) so unit tests can mock the runtime and browser APIs without
+ touching the real Cache API or `@mlc-ai/web-llm`.
+ - 15 unit tests in `test/model-cache.test.ts` covering `has` / `delete`
+ / `list` / `clear` / `estimateUsage` and `assertKnown`, including
+ navigator fallbacks via `vi.stubGlobal`.
+
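A sketch combining the `ModelLoadPhase`-driven load UI with the `Completion` task. The `onProgress` option name and the root re-export of `ModelLoadPhase` are assumptions, the model id is hypothetical, and the rest follows the surface described above:

```ts
import { Completion, type ModelLoadPhase } from "localm-web";

// Drive a UI state machine from the typed load phases instead of parsing
// the runtime's free-form status text.
const completion = await Completion.create("qwen2.5-0.5b", {
  onProgress: ({ phase }: { phase: ModelLoadPhase }) => {
    // "downloading" -> progress bar, "compiling" / "loading" -> spinner,
    // "ready" -> ready badge (emitted exactly once on success).
    console.log("load phase:", phase);
  },
});

// Raw text continuation: no chat template, no history.
const result = await completion.predict("The capital of France is");
console.log(result); // CompletionResult: generated text, prompt, tokens, finish reason

// Streaming variant. TokenChunk's exact fields are not spelled out in the
// changelog, so each chunk is logged as-is here.
for await (const chunk of completion.stream("Once upon a time")) {
  console.log(chunk);
}
```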
+ ### Changed
+
+ - `ProgressCallback` payload shape gained a required `phase` field. This is
+ technically a breaking change, but the SDK is pre-1.0 and the type is
+ emitted only by the engine — consumers were already supposed to treat
+ the payload as opaque.
+ - `vite.config.ts` adds `worker.format = "es"` and externalizes ORT-Web /
+ HF deps from the worker bundle. `@mlc-ai/web-llm` is intentionally
+ bundled into the worker chunk because workers cannot resolve bare
+ specifiers at runtime — this trades a larger lazy-loaded chunk
+ (~6.5 MB pre-gzip, only fetched when `inWorker: true`) for a clean DX
+ (no consumer-side worker config). The main `dist/index.js` stays at
+ ~16 kB and `@mlc-ai/web-llm` remains a peer dependency there. (An
+ illustrative config fragment follows this list.)
+ - `engines.node` bumped from `>=18.0.0` to `>=20.19.0`. Vite 7's worker
+ bundler depends on `crypto.hash()`, which landed in Node 20.12; Node 18
+ also reached end-of-life on 2025-04-30 per the Node release schedule.
+ - CI matrix dropped Node 18, kept Node 20 and 22.
+
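For orientation, a rough sketch of the `vite.config.ts` fragment this describes, using Vite's standard `worker` options; the externalized package ids here are illustrative rather than the package's exact list:

```ts
// vite.config.ts — illustrative fragment only
import { defineConfig } from "vite";

export default defineConfig({
  worker: {
    // Emit the inference worker as an ES-module chunk.
    format: "es",
    rollupOptions: {
      // Keep ORT-Web / Hugging Face deps out of the worker bundle.
      // @mlc-ai/web-llm is deliberately NOT listed: it is bundled into the
      // worker chunk because workers cannot resolve bare specifiers at runtime.
      external: ["onnxruntime-web", "@huggingface/transformers"],
    },
  },
});
```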
+ ### Notes
+
+ - `ModelCache` is **inspection + management only**. Actual weight
+ download still flows through WebLLM's internal Cache-API path.
+ OPFS-as-primary-storage and resume-on-interrupted-download (also on
+ the v0.2 roadmap) require intercepting the WebLLM downloader and
+ are deferred to v0.3 to avoid forking upstream.
+
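A short usage sketch of the `ModelCache` surface described in this entry. Construction is shown as a bare `new ModelCache()` for brevity; the actual `ModelCacheOptions` / DI hooks are omitted, and the model id is hypothetical:

```ts
import { ModelCache } from "localm-web";

const cache = new ModelCache(); // default backend: WebLLM cache helpers + navigator.storage

// Static guard: throws UnknownModelError for ids outside the registry.
ModelCache.assertKnown("qwen2.5-0.5b");

// Inspect and evict a single model.
if (await cache.has("qwen2.5-0.5b")) {
  await cache.delete("qwen2.5-0.5b");
}

// Enumerate the registry models that are actually cached.
const entries = await cache.list(); // CachedModelEntry[]; [] when nothing is cached

// Rough storage footprint; falls back to zeros when the API is missing.
const { usage, quota } = await cache.estimateUsage();
console.log(entries.length, usage, quota);

// Logout / reset flow: delete every registry model in parallel.
await cache.clear();
```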
  ## [0.1.0] - 2026-05-10
 
  ### Added
package/README.md CHANGED
@@ -161,19 +161,19 @@ The shape mirrors `ort-vision-sdk-web`: `await Class.create(model)` then `predic
 
  ## Installation
 
- > Not yet published. Once v0.1 ships:
-
  ```bash
  npm install localm-web @mlc-ai/web-llm
  ```
 
  `@mlc-ai/web-llm` is a peer dependency — the consumer pins the version, which keeps the SDK lightweight and avoids version conflicts.
 
+ For a step-by-step walkthrough covering install, model selection, downloading weights, running the example app, and troubleshooting, see **[docs/getting-started.md](./docs/getting-started.md)**.
+
  ## Vite usage
 
  The package is designed to drop into a Vite app with no extra config. The Web Worker is bundled via Vite's native worker support; just import the SDK and use it.
 
- A complete example will live under `examples/vite-chat/` once v0.1 lands.
+ A runnable example lives under [`examples/vite-chat/`](./examples/vite-chat/): `cd` into it, run `npm install`, then `npm run dev`, open the browser, pick a model, and send a prompt. The full guide in [`docs/getting-started.md`](./docs/getting-started.md#run-the-example-app) walks through it.
 
  ## Why not server-side?