localm-web 0.1.0 → 0.3.0

package/CHANGELOG.md CHANGED
@@ -7,6 +7,160 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ### Changed
+
+ - **`LMTaskCreateOptions.inWorker` default flipped from `false` to `true`.**
+ `Chat.create()` and `Completion.create()` now spawn a Web Worker by
+ default, isolating tokenization and WebGPU dispatches from the UI
+ thread. Pass `inWorker: false` explicitly to revert to main-thread
+ inference (useful in environments without `Worker` support or when
+ debugging the runtime); the opt-out call shape is unchanged, only the
+ default differs (see the sketch below). This is a pre-1.0 SDK:
+ consumers upgrading from v0.2 will silently move inference off the
+ main thread, which is desirable for almost every app.
+
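A minimal sketch of the flipped default and the opt-out path, assuming `Chat.create(modelId, options?)` accepts `LMTaskCreateOptions` as its second argument; the model id is hypothetical:

```ts
import { Chat } from "localm-web";

// v0.3 default: tokenization and WebGPU dispatches run in a Web Worker.
const chat = await Chat.create("qwen2.5-0.5b"); // hypothetical registry id

// Opt back into main-thread inference, e.g. when `Worker` is unavailable
// or while debugging the runtime. Only the default changed, not the shape.
const mainThreadChat = await Chat.create("qwen2.5-0.5b", { inWorker: false });
```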
+ ### Added
+
+ - `Embeddings` task in `src/tasks/embeddings.ts` — sentence embeddings
+ via `@huggingface/transformers`. `Embeddings.create(modelId, options?)`
+ returns an instance; `embed(texts: string[], options?)` returns
+ `number[][]`; `embedSingle(text)` returns `number[]`. Empty input
+ yields `[]` (per project convention — no NotFoundError on empty).
+ Default pooling `"mean"`, default `normalize: true`.
+ - `EMBEDDING_PRESETS` registry with `bge-small-en-v1.5` (384-dim) and
+ `bge-base-en-v1.5` (768-dim). `resolveEmbeddingPreset(id)` and
+ `listSupportedEmbeddingModels()` helpers.
+ - Public types: `EmbeddingPreset`, `EmbeddingsCreateOptions`,
+ `EmbedOptions`, `EmbedPipeline` (DI hook for tests).
+ - `Reranker` task in `src/tasks/reranker.ts` — cross-encoder reranking
+ via `@huggingface/transformers`. `Reranker.create(modelId, options?)`
+ returns an instance; `score(query, docs, options?)` returns
+ `number[]` (raw logits, or sigmoid-mapped to `[0, 1]` when
+ `sigmoid: true`); `rank(query, docs, options?)` returns
+ `RankedDocument[]` sorted descending by score with the original
+ index preserved. Empty `docs` yields `[]`. A retrieve-then-rerank
+ sketch combining `Embeddings` and `Reranker` follows this list.
+ - `RERANKER_PRESETS` registry with `bge-reranker-base`.
+ `resolveRerankerPreset(id)` and `listSupportedRerankerModels()`
+ helpers. Public type `RerankerPreset`.
+ - Public types: `RerankerCreateOptions`, `RerankOptions`,
+ `RerankPipeline`, `RankedDocument`.
+ - `peerDependenciesMeta` marks `@huggingface/transformers` as
+ optional — Chat / Completion users do not need to install it.
+ - 10 unit tests in `test/embeddings.test.ts` covering registry
+ resolution, batch + single embedding, empty input short-circuit,
+ pooling / normalize defaults and overrides, unload delegation, and
+ graceful unload when the pipeline omits `unload()`.
+ - 10 unit tests in `test/reranker.test.ts` covering registry
+ resolution, score order preservation, empty input short-circuit,
+ sigmoid normalization, default raw-logit output, descending sort
+ in `rank()`, unload delegation, and graceful unload without
+ `unload()`.
+ - `docs/getting-started.md` v0.3 update — new sections covering
+ Embeddings, Reranker, the retrieve-then-rerank pattern, the
+ embedding / reranker registries, and the new Web-Worker-by-default
+ behavior. Existing sections (model registry, downloads, cache,
+ troubleshooting) carry over unchanged.
+
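A sketch of the two new tasks chained in the retrieve-then-rerank pattern the guide describes, using preset ids from the registries above. The corpus, query, and cosine-similarity helper are illustrative, not part of the SDK, and the dot product relies on the default `normalize: true`:

```ts
import { Embeddings, Reranker } from "localm-web";

const query = "Where are model weights cached?";
const docs = [
  "WebGPU runs compute shaders in the browser.",
  "The Cache API stores downloaded model weights across sessions.",
  "Sourdough needs a long, cold fermentation.",
];

// 1. Retrieve: embed the corpus and the query (bge-small-en-v1.5, 384-dim).
const embedder = await Embeddings.create("bge-small-en-v1.5");
const docVectors = await embedder.embed(docs);         // number[][]
const queryVector = await embedder.embedSingle(query); // number[]

// Vectors are normalized by default, so a dot product is cosine similarity.
const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
const candidates = docs
  .map((text, i) => ({ text, similarity: dot(queryVector, docVectors[i]) }))
  .sort((a, b) => b.similarity - a.similarity)
  .slice(0, 2)
  .map((c) => c.text);

// 2. Rerank: cross-encode the query against the shortlisted candidates.
const reranker = await Reranker.create("bge-reranker-base");
const ranked = await reranker.rank(query, candidates); // RankedDocument[], best first
console.log(ranked[0]); // highest-scoring document, original index preserved
```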
+ ## [0.2.0] - 2026-05-10
+
+ ### Added
+
+ - `docs/getting-started.md` — end-to-end guide covering prerequisites,
+ install, first chat snippet, the curated model registry with download /
+ RAM estimates, how a model downloads and where it caches, running the
+ example Vite app, cold-start expectations, inspecting / clearing the
+ Cache API, offline behavior, and troubleshooting.
+ - README links to the new guide from the **Installation** and
+ **Vite usage** sections; the example app blurb now points at the
+ runnable folder instead of hedging with "once v0.1 lands".
+ - `Completion` task for raw text continuation (no chat template, no history).
+ Exposes `predict()` returning a `CompletionResult` and `stream()` yielding
+ a `TokenChunk` async iterable. Mirrors the `Chat` task DX; a
+ load-and-stream sketch follows this list.
+ - `CompletionResult` class in `src/results.ts` — holds the generated text,
+ the original prompt, tokens generated, and finish reason.
+ - `Engine.complete()` and `Engine.streamCompletion()` methods on the
+ runtime-agnostic engine contract. `WebLLMEngine` implements both via
+ `engine.completions.create()` (raw text mode, bypasses the chat template).
+ - `ModelLoadPhase` discriminated string type
+ (`"downloading" | "compiling" | "loading" | "ready"`) on `ModelLoadProgress`.
+ Lets consumers drive UI state machines (spinner → progress bar → ready
+ badge) without parsing the runtime's free-form status text.
+ - `WebLLMEngine.load()` classifies each progress report via
+ `classifyLoadPhase()` and emits a final `phase: "ready"` event exactly
+ once when the load resolves successfully.
+ - `WorkerEngine` — `Engine` implementation that proxies all calls to a Web
+ Worker via a typed RPC protocol. Lets consumers run inference off the UI
+ thread.
+ - `createInferenceWorker()` helper that spawns a module-type Worker pointing
+ at the SDK's bundled worker entry. Exposed for advanced lifecycle
+ scenarios (pooling, custom termination); most consumers never call it
+ directly.
+ - `LMTaskCreateOptions.inWorker` flag (default `false` in v0.2). When
+ `true`, the task instantiates a worker-backed engine instead of running
+ inference on the main thread. The default flips to `true` in v0.3 once the
+ Cache API / OPFS integration validates worker-thread storage access.
+ - `src/worker/protocol.ts` — discriminated-union message contract between
+ main thread and worker (`load`, `generate`, `stream`, `complete`,
+ `stream-completion`, `abort`, `unload`, `isLoaded` requests; `loaded`,
+ `progress`, `generated`, `token`, `stream-end`, `error`, `unloaded`,
+ `is-loaded` responses). Numeric op ids isolate concurrent operations.
+ - `WorkerLike` interface exported for tests and custom integrations that
+ need to inject a transport (mocks, Comlink wrappers, MessagePort
+ bridges).
+ - 11 new unit tests in `test/worker-engine.test.ts` exercising load with
+ progress, generate round-trip, abort propagation, signal stripping,
+ streaming queue, error mapping, unload short-circuit, terminate, and
+ concurrent-load rejection.
+ - `ModelCache` class in `src/cache/model-cache.ts` — inspect and manage
+ cached model weights from a consuming app (usage sketch at the end of
+ this entry):
+   - `has(modelId)` / `delete(modelId)` wrap WebLLM's `hasModelInCache` /
+ `deleteModelInCache`, validating the friendly id against the
+ registry first.
+   - `list()` iterates `MODEL_PRESETS` and returns the cached subset as
+ `CachedModelEntry[]` with friendly id, backend id, family, and params.
+ Empty list when nothing is cached (per the project's
+ `*NotFoundError`-free convention).
+   - `clear()` deletes every registry model in parallel — useful for
+ logout / reset flows.
+   - `estimateUsage()` wraps `navigator.storage.estimate()` and returns
+ `{ usage, quota }`. Falls back to zeros when the API is missing.
+   - `ModelCache.assertKnown(modelId)` static guard that throws
+ `UnknownModelError` for ids outside the registry.
+ - Public types: `CachedModelEntry`, `CacheUsage`, `ModelCacheOptions`
+ re-exported from `src/index.ts`.
+ - Dependency-injectable backend (`hasModel`, `deleteModel`, `estimate`
+ hooks) so unit tests can mock the runtime and browser APIs without
+ touching the real Cache API or `@mlc-ai/web-llm`.
+ - 15 unit tests in `test/model-cache.test.ts` covering `has` / `delete`
+ / `list` / `clear` / `estimateUsage` and `assertKnown`, including
+ navigator fallbacks via `vi.stubGlobal`.
+
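A sketch combining the `ModelLoadPhase`-driven load UI with the `Completion` task. The `onProgress` option name and the root re-export of `ModelLoadPhase` are assumptions, the model id is hypothetical, and the rest follows the surface described above:

```ts
import { Completion, type ModelLoadPhase } from "localm-web";

// Drive a UI state machine from the typed load phases instead of parsing
// the runtime's free-form status text.
const completion = await Completion.create("qwen2.5-0.5b", {
  onProgress: ({ phase }: { phase: ModelLoadPhase }) => {
    // "downloading" -> progress bar, "compiling" / "loading" -> spinner,
    // "ready" -> ready badge (emitted exactly once on success).
    console.log("load phase:", phase);
  },
});

// Raw text continuation: no chat template, no history.
const result = await completion.predict("The capital of France is");
console.log(result); // CompletionResult: generated text, prompt, tokens, finish reason

// Streaming variant. TokenChunk's exact fields are not spelled out in the
// changelog, so each chunk is logged as-is here.
for await (const chunk of completion.stream("Once upon a time")) {
  console.log(chunk);
}
```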
+ ### Changed
+
+ - `ProgressCallback` payload shape gained a required `phase` field. This is
+ technically a breaking change, but the SDK is pre-1.0 and the type is
+ emitted only by the engine — consumers were already supposed to treat
+ the payload as opaque.
+ - `vite.config.ts` adds `worker.format = "es"` and externalizes ORT-Web /
+ HF deps from the worker bundle. `@mlc-ai/web-llm` is intentionally
+ bundled into the worker chunk because workers cannot resolve bare
+ specifiers at runtime — this trades a larger lazy-loaded chunk
+ (~6.5 MB pre-gzip, only fetched when `inWorker: true`) for a clean DX
+ (no consumer-side worker config). The main `dist/index.js` stays at
+ ~16 kB and `@mlc-ai/web-llm` remains a peer dependency there. (An
+ illustrative config fragment follows this list.)
+ - `engines.node` bumped from `>=18.0.0` to `>=20.19.0`. Vite 7's worker
+ bundler depends on `crypto.hash()`, which landed in Node 20.12; Node 18
+ also reached end-of-life on 2025-04-30 per the Node release schedule.
+ - CI matrix dropped Node 18, kept Node 20 and 22.
+
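For orientation, a rough sketch of the `vite.config.ts` fragment this describes, using Vite's standard `worker` options; the externalized package ids here are illustrative rather than the package's exact list:

```ts
// vite.config.ts — illustrative fragment only
import { defineConfig } from "vite";

export default defineConfig({
  worker: {
    // Emit the inference worker as an ES-module chunk.
    format: "es",
    rollupOptions: {
      // Keep ORT-Web / Hugging Face deps out of the worker bundle.
      // @mlc-ai/web-llm is deliberately NOT listed: it is bundled into the
      // worker chunk because workers cannot resolve bare specifiers at runtime.
      external: ["onnxruntime-web", "@huggingface/transformers"],
    },
  },
});
```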
+ ### Notes
+
+ - `ModelCache` is **inspection + management only**. Actual weight
+ download still flows through WebLLM's internal Cache-API path.
+ OPFS-as-primary-storage and resume-on-interrupted-download (also on
+ the v0.2 roadmap) require intercepting the WebLLM downloader and
+ are deferred to v0.3 to avoid forking upstream.
+
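A short usage sketch of the `ModelCache` surface described in this entry. Construction is shown as a bare `new ModelCache()` for brevity; the actual `ModelCacheOptions` / DI hooks are omitted, and the model id is hypothetical:

```ts
import { ModelCache } from "localm-web";

const cache = new ModelCache(); // default backend: WebLLM cache helpers + navigator.storage

// Static guard: throws UnknownModelError for ids outside the registry.
ModelCache.assertKnown("qwen2.5-0.5b");

// Inspect and evict a single model.
if (await cache.has("qwen2.5-0.5b")) {
  await cache.delete("qwen2.5-0.5b");
}

// Enumerate the registry models that are actually cached.
const entries = await cache.list(); // CachedModelEntry[]; [] when nothing is cached

// Rough storage footprint; falls back to zeros when the API is missing.
const { usage, quota } = await cache.estimateUsage();
console.log(entries.length, usage, quota);

// Logout / reset flow: delete every registry model in parallel.
await cache.clear();
```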
  ## [0.1.0] - 2026-05-10
 
  ### Added
package/README.md CHANGED
@@ -161,19 +161,19 @@ The shape mirrors `ort-vision-sdk-web`: `await Class.create(model)` then `predic
 
  ## Installation
 
- > Not yet published. Once v0.1 ships:
-
  ```bash
  npm install localm-web @mlc-ai/web-llm
  ```
 
  `@mlc-ai/web-llm` is a peer dependency — the consumer pins the version, which keeps the SDK lightweight and avoids version conflicts.
 
+ For a step-by-step walkthrough covering install, model selection, downloading weights, running the example app, and troubleshooting, see **[docs/getting-started.md](./docs/getting-started.md)**.
+
  ## Vite usage
 
  The package is designed to drop into a Vite app with no extra config. The Web Worker is bundled via Vite's native worker support; just import the SDK and use it.
 
- A complete example will live under `examples/vite-chat/` once v0.1 lands.
+ A runnable example lives under [`examples/vite-chat/`](./examples/vite-chat/): `cd` into it, run `npm install`, then `npm run dev`, open the browser, pick a model, and send a prompt. The full guide in [`docs/getting-started.md`](./docs/getting-started.md#run-the-example-app) walks through it.
 
  ## Why not server-side?