
opencode-lmstudio-warm 0.1.0


Files changed (8)

package/CHANGELOG.md +47 -0
package/LICENSE +21 -0
package/README.md +301 -0
package/examples/README.md +36 -0
package/examples/lmstudio-warm.json +16 -0
package/examples/opencode.json +23 -0
package/package.json +57 -0
package/src/index.ts +559 -0

package/CHANGELOG.md ADDED

@@ -0,0 +1,47 @@
+# Changelog
+All notable changes to this project are documented here.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+While the version is `0.x`, a MINOR bump may include breaking changes (per the
+SemVer 0.x rule); such changes are called out explicitly below.
+## [Unreleased]
+## [0.1.0] - 2026-07-03
+Initial public release.
+### Added
+- `lmstudio-warm` opencode plugin (`src/index.ts`): a deterministic pre-warm
+  gate on the awaited `chat.params` hook that guarantees the target LM Studio
+  model is addressable before every request, healing cold JIT loads,
+  "no model loaded" errors, and mid-session idle-TTL evictions.
+- Background eager warm of `model` + `small_model` at instance start.
+- Cross-process `mkdir` mutex with dead-holder liveness detection so parallel
+  opencode workers never race `lms load` (no `:2` duplicate instances).
+- Configurable via `~/.config/opencode/lmstudio-warm.json` or plugin options:
+  `providers`, `ttlSeconds`, `parallel`, `contextLength`, `perModel`,
+  `verifyCacheMs`, `retryCooldownMs`, `failMode`, `reconcileDuplicates`,
+  `eager`, and more.
+- Three install paths: npm package, single-file copy, and project-local.
+- A live E2E fixture under `test/e2e/` — a 9-check harness (cold load /
+  eviction heal / thundering herd / orphaned-duplicate reconcile).
+- Vitest unit tests (`test/`) for the exported pure logic: config merge,
+  model-ref parsing, load-arg building, addressability, pid liveness, and
+  fail-mode decisions.
+- Reference configs under `examples/`.
+### Fixed
+- Cross-process lock leak in the fire-and-forget eager-warm path: a one-shot
+  `opencode run` exiting mid-load could leave the mkdir lock held by a dead pid,
+  stalling the next worker up to ~18.5 min. `acquireLock` now breaks a contended
+  lock immediately when its holder pid is dead (or the pid file is absent past a
+  grace window), the release is synchronous, and a `process.on("exit")` handler
+  is a last-resort cleanup. Verified 9/9 against a live LM Studio fleet.
+[Unreleased]: https://github.com/diegomarino/opencode-lmstudio-warm/compare/v0.1.0...HEAD
+[0.1.0]: https://github.com/diegomarino/opencode-lmstudio-warm/releases/tag/v0.1.0
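
The "Fixed" entry above describes breaking a contended `mkdir` lock when its holder pid is dead. The following is a minimal sketch of that idea, not the package's `acquireLock`: the names `tryAcquire` and `release` and the `pid` file name inside the lock directory are hypothetical, and the grace window for a missing pid file is deliberately left out (a missing pid file is treated as "still contended").

```typescript
import * as fs from "node:fs"
import * as path from "node:path"

// Probe a pid without signalling it: ESRCH means dead, EPERM means alive but not ours.
function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0)
    return true
  } catch (err) {
    return (err as NodeJS.ErrnoException).code === "EPERM"
  }
}

// Hypothetical helper: one acquisition attempt. Returns true when this process holds the lock.
function tryAcquire(lockDir: string): boolean {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      fs.mkdirSync(lockDir) // atomic: throws EEXIST if another process holds the lock
      fs.writeFileSync(path.join(lockDir, "pid"), String(process.pid))
      return true
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err
    }
    // Contended: read the holder pid and break the lock only if that pid is provably dead.
    let holder: number | null = null
    try {
      const n = Number.parseInt(fs.readFileSync(path.join(lockDir, "pid"), "utf8").trim(), 10)
      holder = Number.isFinite(n) ? n : null
    } catch {
      holder = null // pid file missing: this sketch stays conservative and does not break the lock
    }
    if (holder === null || pidAlive(holder)) return false
    fs.rmSync(lockDir, { recursive: true, force: true }) // stale lock: remove, then retry mkdir once
  }
  return false
}

// Synchronous release, so it can also be called from a process "exit" handler.
function release(lockDir: string): void {
  fs.rmSync(lockDir, { recursive: true, force: true })
}
```

A caller would loop on `tryAcquire` with a sleep and an overall timeout. The sketch does not close every race between two processes that break the same stale lock at the same moment; it only illustrates the liveness check.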

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Diego Marino
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,301 @@
+# opencode-lmstudio-warm
+Deterministic model pre-warm for **opencode + LM Studio**.
+![Quick start: install the plugin, LM Studio starts cold, the first opencode run warms the model before the request leaves, and lms ps shows both models resident with no TTL](https://github.com/user-attachments/assets/f5522cb6-7967-4f47-a8c5-ca617a8d736a)
+<sup>Scripted demo (`tools/quickstart/generate-cast.py`) — every output line captured verbatim from a real run; the cold-load wait is shortened, and its spinner visualizes the plugin's background `lms load` (opencode itself waits silently).</sup>
+A dependency-free opencode plugin that **guarantees your LM Studio model is
+loaded and addressable _before_ any request leaves opencode**.
+If you point
+opencode at LM Studio, it fixes three failures you have probably already met:
+- **First request hangs** — the model is cold and JIT-loads while your request waits.
+- **`"no model loaded"` errors** — JIT is off and nothing loads the model for you.
+- **Mid-session breakage** — LM Studio's idle TTL evicted the model between two messages.
+Per request, the plugin checks that the model is actually loaded and, when it
+isn't, performs exactly one `lms load` (even across parallel sessions) before
+letting the request through.
+Verified against opencode **v1.17.10** and the local LM Studio + `lms` CLI on
+macOS/Apple Silicon (see [`test/e2e/verify.sh`](./test/e2e/verify.sh), 9/9 passing).
+## Quick start
+**1. Install and register the plugin** — one command; opencode resolves it from
+npm and adds it to your config's `plugin` array:
+```bash
+opencode plugin -g opencode-lmstudio-warm    # global (~/.config/opencode) — every session on the machine
+# or, for a single project's opencode.json:
+opencode plugin opencode-lmstudio-warm
+```
+**2. Point opencode at LM Studio** (skip if you already have an `lmstudio`
+provider). In `~/.config/opencode/opencode.json`:
+```jsonc
+{
+  "plugin": ["opencode-lmstudio-warm"],
+  "provider": {
+    "lmstudio": {
+      "npm": "@ai-sdk/openai-compatible",
+      "options": {
+        "baseURL": "http://127.0.0.1:1234/v1",
+        "apiKey": "{env:LM_API_TOKEN}",
+        "headerTimeout": 600000,
+        "chunkTimeout": 120000
+      }
+    }
+  }
+}
+```
+Then set your `model` / `small_model` to your LM Studio model keys. See
+[`examples/opencode.json`](./examples/opencode.json) for a fuller starting point.
+**3. Adjust LM Studio once** (App Settings → Developer): disable
+**JIT model auto-unload TTL** and **unload previous JIT model on load**; keep
+JIT itself on as a fallback. ([Why these matter →](#how-it-works))
+That's it — from your next opencode session, the model is warm before the
+first token is requested.
+## Install options
+All three paths load the same plugin — pick the one that fits:
+| Path | Best for |
+|------|----------|
+| [npm](#npm-recommended) (recommended) | Most users and fleets — version-pinned, one-line updates |
+| [Single-file copy](#single-file-copy-offline-fleet-wide) | Offline machines |
+| [Project-local](#project-local) | Hacking on the plugin itself |
+### npm (recommended)
+The Quick start command above is all you need. Notes:
+- You don't run `npm install` / `bun add` yourself, and there's no `npx` step —
+  opencode imports the module and auto-installs any plugin named in your config
+  at startup, so hand-adding `"opencode-lmstudio-warm"` to the `plugin` array
+  works too.
+- Use `-f` to force a version bump.
+**Scriptable setup** — for fleets or automation, this `jq` one-shot registers
+the plugin *and* scaffolds the provider with recommended timeouts. It is
+idempotent and non-destructive: keeps your existing plugins, provider, and
+models, and never overwrites options you've set.
+```bash
+CFG=~/.config/opencode/opencode.json   # or ./opencode.json for a single project
+[ -f "$CFG" ] || echo '{}' > "$CFG"
+jq '
+  .plugin = ((.plugin // []) - ["opencode-lmstudio-warm"] + ["opencode-lmstudio-warm"])
+  | .provider.lmstudio.npm                  //= "@ai-sdk/openai-compatible"
+  | .provider.lmstudio.options.baseURL      //= "http://127.0.0.1:1234/v1"
+  | .provider.lmstudio.options.apiKey       //= "{env:LM_API_TOKEN}"
+  | .provider.lmstudio.options.headerTimeout //= 600000
+  | .provider.lmstudio.options.chunkTimeout  //= 120000
+' "$CFG" > "$CFG.tmp" && mv "$CFG.tmp" "$CFG"
+```
+### Single-file copy (offline, fleet-wide)
+```bash
+mkdir -p ~/.config/opencode/plugin
+cp src/index.ts ~/.config/opencode/plugin/lmstudio-warm.ts
+```
+Auto-discovered by every opencode session on the machine. (opencode's docs spell
+this directory `plugins`; verified as `plugin/` — singular — on v1.17.10.)
+### Project-local
+Scope the plugin to a single project by copying `src/index.ts` into that
+project's `.opencode/plugin/lmstudio-warm.ts` — opencode auto-discovers it there
+for that project only. (This repo's own E2E fixture uses exactly this mechanism;
+see `test/e2e/`.)
+Whichever path you pick, also apply the LM Studio GUI settings from
+[Quick start](#quick-start) step 3 on every machine. The provider timeouts
+(`headerTimeout` / `chunkTimeout`) are defense-in-depth and are already set by
+the JSON/`jq` above.
+## Configuration
+The plugin works with zero configuration. Optional tuning lives in
+`~/.config/opencode/lmstudio-warm.json` (or inline as
+`"plugin": [["opencode-lmstudio-warm", {...}]]`): `providers`,
+`ttlSeconds`, `parallel` (size ≈ concurrent fleet width; overflow queues
+server-side), `contextLength`,
+`perModel: { "<key>": { parallel, ttlSeconds, contextLength } }`,
+`verifyCacheMs`, `retryCooldownMs`, `failMode` (`hybrid` default: confirmed
+failures fail the request with a clear error; ambiguous lock contention
+proceeds fail-open), `reconcileDuplicates`, `eager`, `logFile`.
+Log: `~/.cache/opencode/lmstudio-warm.log`.
+See `examples/lmstudio-warm.json` for a fleet-tuned starting point
+(`cp examples/lmstudio-warm.json ~/.config/opencode/lmstudio-warm.json`).
+`perModel` keys are LM Studio model keys — the exact string opencode sends as
+the API `model` field. Sizing `parallel`: set it to the expected number of
+concurrent workers hitting that model; each slot costs extra KV-cache memory,
+and overflow requests queue server-side (latency, not failure), so
+undersizing is safe and oversizing wastes VRAM. Titles/summaries on the small
+model tolerate queueing; the main model is where fleet width matters.
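
The source header in this package states the merge order as defaults < `lmstudio-warm.json` < plugin options. As a small illustration of that precedence, here is a sketch using a reduced, hypothetical option set (`MiniOptions`, `MINI_DEFAULTS`, and `mergeOptions` are illustrative names; the package's own function is `resolveOptions`):

```typescript
// Hypothetical three-field subset of the plugin's options, for illustration only.
type MiniOptions = { ttlSeconds: number; parallel: number; failMode: "open" | "closed" | "hybrid" }

const MINI_DEFAULTS: MiniOptions = { ttlSeconds: 0, parallel: 0, failMode: "hybrid" }

// Later spreads win: defaults are overridden by file options, which are overridden by plugin options.
function mergeOptions(fileOpts: Partial<MiniOptions>, pluginOpts?: Partial<MiniOptions> | null): MiniOptions {
  return { ...MINI_DEFAULTS, ...fileOpts, ...(pluginOpts ?? {}) }
}

const merged = mergeOptions({ parallel: 4, failMode: "open" }, { parallel: 6 })
console.log(merged) // { ttlSeconds: 0, parallel: 6, failMode: "open" }
```

Here `parallel` comes from the inline plugin options, `failMode` from the file, and `ttlSeconds` from the defaults.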
+## Verify
+A live, self-contained E2E fixture lives in [`test/e2e/`](./test/e2e/) — set two
+real LM Studio model keys and run it:
+```bash
+MAIN="your/main-model" SMALL="your-small-model" bun run e2e
+# requires jq, lms, opencode + a running LM Studio; export LM_API_TOKEN for full E2E
+```
+Covers: (a) cold spawn loads before the first request; (b) mid-session
+eviction healed on resume (`opencode run -c`); (c) 3 parallel cold spawns →
+exactly one `lms load`, no `:2` duplicates; (d) orphaned `:2`-only state is
+reconciled back to an addressable instance. See
+[`test/e2e/README.md`](./test/e2e/README.md) for setup and the placeholders to edit.
+> ⚠️ It mutates live LM Studio state (unloads/loads models, spawns parallel
+> sessions) — run it on a dev machine, not a busy fleet.
+## Uninstall / rollback
+For the npm install path, remove `"opencode-lmstudio-warm"` from the `plugin`
+array in `opencode.json`. For the file-copy paths:
+```bash
+rm ~/.config/opencode/plugin/lmstudio-warm.ts   # removes the gate everywhere
+rm -f ~/.config/opencode/lmstudio-warm.json     # optional tuning file
+rm -rf ~/.cache/opencode/lmstudio-warm.lock     # only if a stale lock lingers
+```
+Models loaded by the plugin have no TTL, so after uninstalling they stay
+resident until `lms unload <key>` or an LM Studio restart. The `opencode.json`
+timeout options and the LM Studio GUI settings are independent of the plugin
+and can stay.
+## How it works
+### The three layers
+1. **Plugin (primary, deterministic)** — `src/index.ts`.
+   Per request: verified-cache (30 s) → `lms ps --json` addressability check →
+   cross-process `mkdir` lock → double-checked re-check → orphan-duplicate
+   reconciliation → `lms load <key> -y` (no `--ttl` ⇒ resident indefinitely,
+   `ttlMs: null` verified) → post-load verification. Plus a background eager
+   warm of `model` + `small_model` at instance start (`config` hook).
+2. **LM Studio server settings (independent)** — in the GUI (App Settings →
+   Developer): disable **JIT model auto-unload TTL** (`jitModelTTL`, the 1 h
+   eviction that stalls long sessions) and **unload previous JIT model on load**
+   (`unloadPreviousJITModelOnLoad` — otherwise a JIT load of one model can
+   evict the other). Keys live in `~/.lmstudio/settings.json` under
+   `developer.*` (edit only while the app is closed). Keep JIT **on** as a
+   fallback; keep server autostart on.
+3. **opencode timeouts (defense-in-depth)** — v1.17.10 honors undocumented
+   provider options `timeout`, `headerTimeout`, `chunkTimeout`
+   (`provider.ts:resolveSDK`). Default is NO timeout at all (infinite hang
+   possible). `opencode.json` here sets `headerTimeout: 600000` (tolerates
+   queueing behind busy parallel slots) and `chunkTimeout: 120000` (converts a
+   wedged stream into a visible, bounded error).
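
The first part of the Layer 1 pipeline (verified cache, addressability check, load, post-load verification) can be sketched as follows. This is a simplified illustration, not the plugin's code: `makeGate`, `Deps`, and the injected functions are hypothetical stand-ins for the `lms ps --json` check and `lms load <key> -y`, and the cross-process lock and duplicate reconciliation are omitted.

```typescript
type Deps = {
  isAddressable: (key: string) => Promise<boolean> // stands in for an `lms ps --json` check
  load: (key: string) => Promise<void> // stands in for `lms load <key> -y`
  now: () => number // injectable clock, in milliseconds
}

function makeGate(deps: Deps, verifyCacheMs = 30_000) {
  const verifiedAt = new Map<string, number>() // key -> time of last positive check
  const inflight = new Map<string, Promise<void>>() // key -> warm already running in this process

  async function warm(key: string): Promise<void> {
    if (!(await deps.isAddressable(key))) {
      await deps.load(key)
      // Post-load verification: trust the load only if the key is now addressable.
      if (!(await deps.isAddressable(key))) throw new Error(`load of ${key} did not make it addressable`)
    }
    verifiedAt.set(key, deps.now())
  }

  return function ensure(key: string): Promise<void> {
    const last = verifiedAt.get(key)
    if (last !== undefined && deps.now() - last < verifyCacheMs) return Promise.resolve()
    let pending = inflight.get(key)
    if (!pending) {
      // Concurrent callers in the same process share one warm instead of each loading.
      pending = warm(key).finally(() => inflight.delete(key))
      inflight.set(key, pending)
    }
    return pending
  }
}
```

Within the cache window `ensure` returns without touching `lms` at all, which is where the steady-state cost of roughly one check per model per 30 s per process comes from.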
+### Why a plugin is the right layer (design decision)
+Investigated against the v1.17.10 source (tag clone), not docs:
+- The `chat.params` hook is **awaited** (`yield* plugin.trigger("chat.params", ...)`
+  in `session/llm/request.ts`) before every request is built and sent, and it
+  fires for **every** stream — including `small: true` title/summary requests.
+  One hook deterministically gates BOTH pinned models, per request, which is
+  what heals mid-session eviction (an orchestrator pre-warm only helps at
+  spawn time).
+- Plugins run in-process under Bun and can spawn `lms` (a blocking, exit-code
+  deterministic load barrier).
+- The `event` hook is dispatched fire-and-forget (`void hook.event?.(...)`) —
+  it can NOT gate. The v2 `ctx.aisdk.sdk` custom-fetch API is **types-only** in
+  v1.17.10 (nothing in core imports it) — that path from the prior verdict is
+  refuted for this release.
+- A plugin dropped in `~/.config/opencode/plugin/` is auto-discovered by every
+  worker on the machine — one file distributes fleet-wide and also covers
+  manually launched sessions.
+Tradeoff vs. an orchestrator pre-warm node: the plugin costs one
+`lms ps --json` (~150 ms) per model per 30 s per process at steady state; the
+orchestrator node is simpler but only covers spawn time and only sessions it
+spawns. Keep the orchestrator node, if you add one, as belt-and-suspenders —
+it is not required.
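
The difference between an awaited hook and a fire-and-forget one, which is the core of the argument above, can be shown with a generic sketch. Nothing here is opencode's dispatcher; `sendAwaited`, `sendFireAndForget`, and `slowWarm` are hypothetical names used only to show the ordering each dispatch style produces.

```typescript
type Hook = () => Promise<void>

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// A hook that takes a while, like a cold model load.
function slowWarm(log: string[]): Hook {
  return async () => {
    await sleep(20)
    log.push("model warm")
  }
}

// Awaited dispatch: the request cannot be sent until the hook has finished, so the hook can gate.
async function sendAwaited(hook: Hook, log: string[]): Promise<void> {
  await hook()
  log.push("request sent")
}

// Fire-and-forget dispatch: the hook is started but not waited for, so it cannot gate.
async function sendFireAndForget(hook: Hook, log: string[]): Promise<void> {
  void hook()
  log.push("request sent")
}
```

With `sendAwaited` the log reads "model warm" then "request sent"; with `sendFireAndForget` the request is logged first and the warm completes later.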
+## Known limitations / failure modes
+- **30 s verified-cache window**: an external unload (GUI, crash) within 30 s
+  of a positive check can slip one request through; it errors visibly and the
+  next request heals. There is no error hook in v1.17.10 to invalidate the
+  cache on failure.
+- **`lms ps` cannot signal "loading"** (measured: a loading instance shows
+  `status: "idle"` at ~200 ms into a 12.5 s load). A waiter can pass the gate
+  mid-load; LM Studio queues its request until weights are ready (verified) —
+  a short wait, not a failure.
+- **External JIT loads race**: a non-gated client can still trigger JIT
+  duplicates/evictions. Mitigated by Layer 2 settings; gate all fleet clients.
+- **`unloadPreviousJITModelOnLoad` scope for explicit loads is assumed exempt**
+  (evidence: explicit loads carry `ttlMs: null` vs JIT's TTL, so bookkeeping
+  differs). Confirm by JIT-loading a third model via API while both pinned
+  models are resident, then `lms ps`. Disabling the setting (Layer 2) makes
+  this moot.
+- **LM Studio app fully closed**: `lms server start` + `open -ga "LM Studio"`
+  fallback is implemented but untested here (the app was running). Confirm:
+  quit LM Studio → run one worker → check the log.
+- **Memory guardrails**: if LM Studio's guardrail refuses a load, the plugin
+  fails that request with a clear error and cools down 60 s (no load storm) —
+  it cannot free VRAM for you.
+- **API auth**: the plugin itself never needs `LM_API_TOKEN` (lms + probe are
+  auth-independent); workers still need it for generation when auth is on.
+## Running under an orchestrator (e.g. ao-lite)
+No orchestrator changes are required — workers inherit the plugin from
+`~/.config/opencode/plugin/` and warm themselves. Two optional touches:
+export `LM_API_TOKEN` in the worker environment (the plugin itself never needs
+it), and if you want spawn-time belt-and-suspenders, a pre-warm node only needs:
+`lms ps --json` guard → `lms load <key> -y` — the same logic, but remember it
+cannot heal mid-session evictions; the plugin does.
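
A spawn-time pre-warm node along those lines might look like the sketch below. It assumes that `lms` is on `PATH` and that `lms ps --json` prints a JSON array of instances carrying an `identifier` field (the shape suggested by the plugin's `LmsInstance` type); `prewarm`, `Run`, and `runLms` are hypothetical names.

```typescript
import { execFile } from "node:child_process"
import { promisify } from "node:util"

// Runs `lms` with the given arguments and resolves with its stdout. Injectable for testing.
type Run = (args: string[]) => Promise<string>

const execFileP = promisify(execFile)
const runLms: Run = async (args) => (await execFileP("lms", args)).stdout

// ps guard, then load: skip the load when an instance already answers to the bare key.
async function prewarm(key: string, run: Run = runLms): Promise<"already-loaded" | "loaded"> {
  const parsed: unknown = JSON.parse(await run(["ps", "--json"]))
  const instances = Array.isArray(parsed) ? (parsed as { identifier?: string }[]) : []
  if (instances.some((i) => i.identifier === key)) return "already-loaded"
  await run(["load", key, "-y"]) // rejects if lms exits non-zero
  return "loaded"
}
```

Unlike the plugin, this sketch takes no cross-process lock and does not reconcile duplicates, so several nodes pre-warming the same key at once could still race `lms load`.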
+## Development
+The plugin is a single file with **no runtime dependencies** (its only import,
+`@opencode-ai/plugin`, is `import type` and erased at build time). The root
+`package.json` pulls that type package and `@types/node` as devDependencies so
+you can type-check locally:
+```bash
+bun install
+bun run typecheck        # tsc --strict, 0 errors
+bun run test             # vitest unit tests for the pure logic (test/)
+bun run check            # typecheck + tests + shellcheck
+bun run e2e              # live E2E fixture (needs LM Studio; see test/e2e/)
+```
+The pure, per-process-stateless logic (config merge, model-ref parsing, load-arg
+building, addressability, pid liveness, fail-mode decisions) is exported from
+`src/index.ts` and unit-tested under `test/`; the live system behavior is covered
+by the E2E fixture under [`test/e2e/`](./test/e2e/).
+Releases follow [SemVer](https://semver.org) and are cut by CI on `v*` tags
+(see [`CHANGELOG.md`](./CHANGELOG.md)).
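
To illustrate the kind of pure logic the unit tests exercise, the snippet below reproduces two of the exported helpers from `src/index.ts` in this diff (`parseModelRef` and `shouldFailRequest`) so it runs standalone. The `FailMode` alias and the `reason` strings are illustrative, not taken from the package.

```typescript
type FailMode = "open" | "closed" | "hybrid"
type WarmResult = { ok: boolean; confirmed: boolean; reason: string }

// Split "provider/key" on the FIRST slash so keys containing slashes stay intact.
function parseModelRef(ref: unknown): { providerID: string; key: string } | null {
  if (typeof ref !== "string" || !ref.includes("/")) return null
  const slash = ref.indexOf("/")
  return { providerID: ref.slice(0, slash), key: ref.slice(slash + 1) }
}

// closed: fail on any not-ok result; hybrid: fail only confirmed failures; open: never fail.
function shouldFailRequest(failMode: FailMode, result: WarmResult): boolean {
  if (result.ok) return false
  return failMode === "closed" || (failMode === "hybrid" && result.confirmed)
}

const lockTimeout: WarmResult = { ok: false, confirmed: false, reason: "lock wait timed out" }
const loadFailed: WarmResult = { ok: false, confirmed: true, reason: "lms load failed" }

console.log(parseModelRef("lmstudio/qwen/qwen3")) // { providerID: "lmstudio", key: "qwen/qwen3" }
console.log(parseModelRef("no-slash")) // null
console.log(shouldFailRequest("hybrid", lockTimeout)) // false (ambiguous, proceeds fail-open)
console.log(shouldFailRequest("hybrid", loadFailed)) // true (confirmed failure)
console.log(shouldFailRequest("closed", lockTimeout)) // true
console.log(shouldFailRequest("open", loadFailed)) // false
```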
+## Disclaimer
+Community plugin. Not affiliated with, endorsed by, or an official product of the
+OpenCode or LM Studio teams. "opencode" and "LM Studio" are used only to indicate
+compatibility.
+## License
+[MIT](./LICENSE) © Diego Marino

package/examples/README.md ADDED

@@ -0,0 +1,36 @@
+# Examples
+Reference configs for `opencode-lmstudio-warm`. Neither is enabled by copying
+the repo — take the pieces you need into your own files and replace the
+`your-*-model-key` placeholders with your real LM Studio model keys (the exact
+strings opencode sends as the API `model` field).
+## `opencode.json` — wiring the plugin
+A minimal consumer config: the `plugin` array entry plus the `lmstudio` provider
+block (`baseURL`, `apiKey`, and the recommended `headerTimeout` / `chunkTimeout`).
+Merge it into your own `opencode.json` — the repo
+[README's Install options section](../README.md#install-options) has an idempotent `jq` one-liner
+that does this non-destructively, or run `opencode plugin opencode-lmstudio-warm`
+to register the plugin and add only the provider block by hand.
+Set `model` / `small_model` to your own model keys before use.
+## `lmstudio-warm.json` — tuning the plugin
+A fleet-tuned starting point for the plugin's own options file. Copy it to
+`~/.config/opencode/lmstudio-warm.json` (or pass the same object as plugin
+options in `opencode.json`). Highlights:
+- **`failMode: "hybrid"`** — confirmed failures (server down, load failed,
+  unreconcilable duplicates) fail the request with a clear error; ambiguous lock
+  contention proceeds fail-open so a plausibly-in-flight load can still serve it.
+- **`perModel.<key>.parallel`** — size to the number of concurrent workers
+  hitting that model. Each slot costs extra KV-cache memory; overflow requests
+  queue server-side (latency, not failure), so undersizing is safe. The small
+  model tolerates queueing; the main model is where fleet width matters.
+- **timeouts** (`loadTimeoutMs`, `serverStartTimeoutMs`, `lockWaitTimeoutMs`,
+  `verifyCacheMs`, `retryCooldownMs`) are tuned for a multi-worker fleet.
+See the [README's Configuration section](../README.md#configuration) for the full
+option list and defaults.

package/examples/lmstudio-warm.json ADDED

@@ -0,0 +1,16 @@
+{
+  "providers": ["lmstudio"],
+  "failMode": "hybrid",
+  "eager": true,
+  "reconcileDuplicates": true,
+  "launchAppFallback": true,
+  "verifyCacheMs": 30000,
+  "retryCooldownMs": 60000,
+  "loadTimeoutMs": 900000,
+  "serverStartTimeoutMs": 90000,
+  "lockWaitTimeoutMs": 1200000,
+  "perModel": {
+    "your-main-model-key": { "parallel": 6 },
+    "your-small-model-key": { "parallel": 4 }
+  }
+}
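
To show what the `perModel` entries in this example turn into, the sketch below reproduces the `loadArgs` helper from `src/index.ts` in this diff against a reduced options type (`LoadOptions` is an illustrative subset of `WarmOptions`) and prints the resulting `lms` arguments.

```typescript
type PerModel = { ttlSeconds?: number; parallel?: number; contextLength?: number }
// Illustrative subset of WarmOptions: only the fields loadArgs reads.
type LoadOptions = { ttlSeconds: number; parallel: number; contextLength: number; perModel: Record<string, PerModel> }

// Per-model overrides win over the top-level values; a value of 0 omits the flag.
function loadArgs(opts: LoadOptions, key: string): string[] {
  const per = opts.perModel[key] ?? {}
  const ttl = per.ttlSeconds ?? opts.ttlSeconds
  const parallel = per.parallel ?? opts.parallel
  const ctx = per.contextLength ?? opts.contextLength
  const args = ["load", key, "-y"]
  if (ttl > 0) args.push("--ttl", String(ttl))
  if (parallel > 0) args.push("--parallel", String(parallel))
  if (ctx > 0) args.push("--context-length", String(ctx))
  return args
}

// Top-level defaults (all 0) plus the perModel block from the example file above.
const exampleOpts: LoadOptions = {
  ttlSeconds: 0,
  parallel: 0,
  contextLength: 0,
  perModel: {
    "your-main-model-key": { parallel: 6 },
    "your-small-model-key": { parallel: 4 },
  },
}

console.log(loadArgs(exampleOpts, "your-main-model-key").join(" ")) // load your-main-model-key -y --parallel 6
console.log(loadArgs(exampleOpts, "your-small-model-key").join(" ")) // load your-small-model-key -y --parallel 4
console.log(loadArgs(exampleOpts, "some-other-key").join(" ")) // load some-other-key -y
```

Because `ttlSeconds` stays at 0, no `--ttl` flag is passed and the model stays resident until it is unloaded.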

package/examples/opencode.json ADDED

@@ -0,0 +1,23 @@
+{
+  "$schema": "https://opencode.ai/config.json",
+  "//": "Consumer example: wire the plugin via npm. opencode auto-installs the package named in `plugin` and loads it before every request. Copy the `plugin` line and the `lmstudio` provider block into your own opencode.json, then set your model/small_model and LM_API_TOKEN.",
+  "plugin": ["opencode-lmstudio-warm"],
+  "model": "lmstudio/your-main-model-key",
+  "small_model": "lmstudio/your-small-model-key",
+  "provider": {
+    "lmstudio": {
+      "npm": "@ai-sdk/openai-compatible",
+      "name": "LM Studio (local)",
+      "options": {
+        "baseURL": "http://127.0.0.1:1234/v1",
+        "apiKey": "{env:LM_API_TOKEN}",
+        "headerTimeout": 600000,
+        "chunkTimeout": 120000
+      },
+      "models": {
+        "your-main-model-key": { "name": "Your main local model" },
+        "your-small-model-key": { "name": "Your small/title local model" }
+      }
+    }
+  }
+}

package/package.json ADDED

@@ -0,0 +1,57 @@
+{
+  "name": "opencode-lmstudio-warm",
+  "version": "0.1.0",
+  "description": "Deterministic LM Studio model pre-warm gate for opencode — loads and keeps the target model resident before every request, healing cold starts and mid-session TTL evictions.",
+  "type": "module",
+  "main": "./src/index.ts",
+  "exports": {
+    ".": {
+      "types": "./src/index.ts",
+      "import": "./src/index.ts"
+    }
+  },
+  "files": [
+    "src",
+    "examples",
+    "README.md",
+    "LICENSE",
+    "CHANGELOG.md"
+  ],
+  "keywords": [
+    "opencode",
+    "opencode-plugin",
+    "lmstudio",
+    "lm-studio",
+    "llm",
+    "local-llm",
+    "model-loading",
+    "prewarm",
+    "warm",
+    "bun"
+  ],
+  "license": "MIT",
+  "author": "Diego Marino <diego@petalo.xyz>",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/diegomarino/opencode-lmstudio-warm.git"
+  },
+  "homepage": "https://github.com/diegomarino/opencode-lmstudio-warm#readme",
+  "bugs": {
+    "url": "https://github.com/diegomarino/opencode-lmstudio-warm/issues"
+  },
+  "engines": {
+    "bun": ">=1.0.0"
+  },
+  "scripts": {
+    "typecheck": "tsc --noEmit --strict --skipLibCheck --moduleResolution bundler --module esnext --target esnext --types node src/index.ts",
+    "test": "vitest run",
+    "test:watch": "vitest",
+    "e2e": "test/e2e/verify.sh",
+    "check": "bun run typecheck && bun run test && shellcheck --severity=warning test/e2e/verify.sh"
+  },
+  "devDependencies": {
+    "@opencode-ai/plugin": "1.17.10",
+    "@types/node": "^26.1.0",
+    "vitest": "^4.1.9"
+  }
+}

package/src/index.ts ADDED

@@ -0,0 +1,559 @@
+/**
+ * lmstudio-warm — deterministic LM Studio model pre-warm gate for opencode.
+ *
+ * Guarantees the target model is addressable in LM Studio BEFORE any LLM
+ * request leaves opencode, healing cold starts and mid-session TTL evictions
+ * for every model the session uses (main model AND small_model, which shares
+ * the same chat.params hook path).
+ *
+ * Verified against opencode v1.17.10 source:
+ *  - `chat.params` is awaited before each request and fires for every stream,
+ *    including small-model title/summary generation.
+ *  - `model.api.id` is the exact string sent as the API `model` field.
+ *  - Plugins run in-process (Bun) and may spawn child processes.
+ *
+ * Verified live against LM Studio (lms CLI):
+ *  - `lms load <key> -y` blocks until ready, exits 0 only on success.
+ *  - `lms load` is NOT idempotent: loading a resident key creates a duplicate
+ *    instance suffixed `:2` — hence the ps-guard + cross-process lock below.
+ *  - Omitting `--ttl` loads with ttlMs=null (resident until unloaded), and
+ *    such instances are bookkept separately from JIT loads (which carry the
+ *    server's JIT TTL).
+ *  - `lms ps --json` lists a loading instance as status "idle" ~immediately
+ *    (measured: listed at ~200ms into a 12.5s load). There is NO ps-visible
+ *    "loading" state. This is benign for the gate: identifier presence means
+ *    the instance is addressable and LM Studio QUEUES requests against it
+ *    until weights are ready (verified live) — so a waiter passing the gate
+ *    mid-load waits briefly server-side instead of erroring.
+ *  - `lms ps` works even while the HTTP server is off, so the HTTP server is
+ *    ensured independently (probe /models, else `lms server start`; any HTTP
+ *    response — including 401 when API auth is enabled — means "listening").
+ *
+ * Config (all optional), merged in this order:
+ *   defaults < ~/.config/opencode/lmstudio-warm.json < plugin options tuple
+ *
+ * The pure, per-process-stateless helpers are hoisted to module scope and
+ * exported (see the "Pure helpers" block) so they can be unit-tested directly;
+ * the plugin closure composes them with the live state and child processes.
+ */
+import type { Plugin } from "@opencode-ai/plugin"
+import { execFile } from "node:child_process"
+import * as fs from "node:fs"
+import * as fsp from "node:fs/promises"
+import * as os from "node:os"
+import * as path from "node:path"
+export type PerModel = { ttlSeconds?: number; parallel?: number; contextLength?: number }
+export type WarmOptions = {
+  /** Provider IDs to gate. Requests on other providers are ignored. */
+  providers: string[]
+  /** Absolute path to the lms CLI. */
+  lmsPath: string
+  /** Fallback base URL if the provider config doesn't carry one. */
+  baseURL: string
+  /** --ttl passed to lms load. 0 = omit (resident until unloaded). */
+  ttlSeconds: number
+  /** --parallel passed to lms load. 0 = omit (LM Studio default, currently 4).
+   *  Size to expected concurrent fleet width per model; requests beyond the
+   *  slot count queue server-side (latency, not failure). */
+  parallel: number
+  /** --context-length passed to lms load. 0 = omit (model default). */
+  contextLength: number
+  /** Per-model-key overrides of ttlSeconds/parallel/contextLength. */
+  perModel: Record<string, PerModel>
+  /** How long a positive residency verdict is trusted before re-checking. */
+  verifyCacheMs: number
+  /** After a CONFIRMED load failure, don't retry the same key for this long
+   *  (prevents a load storm when e.g. a memory guardrail keeps refusing). */
+  retryCooldownMs: number
+  /** Hard cap on a single lms load (cold load of a big model can take minutes). */
+  loadTimeoutMs: number
+  /** Hard cap on bringing the HTTP server up. */
+  serverStartTimeoutMs: number
+  /** Max time a process waits for another process's in-flight load. */
+  lockWaitTimeoutMs: number
+  /**
+   * What to do when the warm gate cannot ensure residency:
+   *  - "hybrid" (default): CONFIRMED failures (server won't start, lms load
+   *    failed, unreconcilable duplicates) fail the request with a clear error;
+   *    ambiguous outcomes (lock contention timeout) proceed fail-open so a
+   *    plausibly-in-flight load elsewhere can serve the request via queueing.
+   *  - "open": never fail the request; log and proceed (JIT fallback).
+   *  - "closed": any warm failure fails the request.
+   */
+  failMode: "open" | "closed" | "hybrid"
+  /** If only suffixed duplicate instances (key:2 …) exist and none is busy,
+   *  unload them and load fresh so the bare key becomes addressable again. */
+  reconcileDuplicates: boolean
+  /** If the server can't be started (LM Studio app closed), try `open -ga "LM Studio"` once. */
+  launchAppFallback: boolean
+  /** Warm cfg.model + cfg.small_model in the background at instance start. */
+  eager: boolean
+  logFile: string
+  lockDir: string
+}
+const HOME = os.homedir()
+const DEFAULTS: WarmOptions = {
+  providers: ["lmstudio"],
+  lmsPath: fs.existsSync(path.join(HOME, ".lmstudio/bin/lms")) ? path.join(HOME, ".lmstudio/bin/lms") : "lms",
+  baseURL: "http://127.0.0.1:1234/v1",
+  ttlSeconds: 0,
+  parallel: 0,
+  contextLength: 0,
+  perModel: {},
+  verifyCacheMs: 30_000,
+  retryCooldownMs: 60_000,
+  loadTimeoutMs: 900_000,
+  serverStartTimeoutMs: 90_000,
+  lockWaitTimeoutMs: 1_200_000,
+  failMode: "hybrid",
+  reconcileDuplicates: true,
+  launchAppFallback: true,
+  eager: true,
+  logFile: path.join(HOME, ".cache/opencode/lmstudio-warm.log"),
+  lockDir: path.join(HOME, ".cache/opencode/lmstudio-warm.lock"),
+}
+function loadFileOptions(): Partial<WarmOptions> {
+  try {
+    const p = path.join(HOME, ".config/opencode/lmstudio-warm.json")
+    if (!fs.existsSync(p)) return {}
+    return JSON.parse(fs.readFileSync(p, "utf8"))
+  } catch {
+    return {}
+  }
+}
+export type LmsInstance = {
+  modelKey?: string
+  identifier?: string
+  status?: string
+  ttlMs?: number | null
+  parallel?: number
+  queued?: number
+}
+/** Warm outcome. `confirmed` marks a definitive failure (vs. ambiguity). */
+export type WarmResult = { ok: boolean; confirmed: boolean; reason: string }
+const OK: WarmResult = { ok: true, confirmed: false, reason: "" }
+// ─── Pure helpers (module scope, exported for unit tests) ───────────────────
+// No per-process state — the plugin closure below composes these with the live
+// caches, child processes, and lock directory.
+/** Merge config in precedence order: DEFAULTS < file options < plugin options.
+ *  Also maps the legacy `failClosed` boolean onto `failMode` when the newer
+ *  key isn't set. */
+export function resolveOptions(
+  fileOpts: Partial<WarmOptions>,
+  pluginOpts?: (Partial<WarmOptions> & { failClosed?: boolean }) | null,
+): WarmOptions {
+  const raw = { ...fileOpts, ...(pluginOpts ?? {}) }
+  // Legacy boolean from earlier revisions of this plugin.
+  if (raw.failClosed !== undefined && raw.failMode === undefined) raw.failMode = raw.failClosed ? "closed" : "open"
+  return { ...DEFAULTS, ...raw }
+}
+/** opencode addresses models by the UNSUFFIXED key; LM Studio routes the API
+ *  `model` field by instance identifier. "Addressable" means an instance whose
+ *  identifier equals the key exists. NOTE (verified live): a still-loading
+ *  instance already appears with status "idle" and LM Studio queues requests
+ *  against it until ready — there is no ps-visible "loading" state, and none is
+ *  needed for correctness. */
+export function addressable(instances: LmsInstance[], key: string): boolean {
+  return instances.some((i) => i.identifier === key)
+}
+/** Split an opencode model ref ("provider/key…") on the FIRST slash, so a key
+ *  that itself contains slashes (e.g. "qwen/qwen3") is preserved intact. */
+export function parseModelRef(ref: unknown): { providerID: string; key: string } | null {
+  if (typeof ref !== "string" || !ref.includes("/")) return null
+  const slash = ref.indexOf("/")
+  return { providerID: ref.slice(0, slash), key: ref.slice(slash + 1) }
+}
+/** Build the `lms load` argv for a key, applying per-model overrides over the
+ *  top-level options. A value of 0 omits the corresponding flag. */
+export function loadArgs(opts: WarmOptions, key: string): string[] {
+  const per = opts.perModel[key] ?? {}
+  const ttl = per.ttlSeconds ?? opts.ttlSeconds
+  const parallel = per.parallel ?? opts.parallel
+  const ctx = per.contextLength ?? opts.contextLength
+  const args = ["load", key, "-y"]
+  if (ttl > 0) args.push("--ttl", String(ttl))
+  if (parallel > 0) args.push("--parallel", String(parallel))
+  if (ctx > 0) args.push("--context-length", String(ctx))
+  return args
+}
+/** Is a process alive? `kill(pid, 0)` sends no signal, just probes: ESRCH ⇒
+ *  no such process (dead); EPERM ⇒ exists but owned by another user (alive).
+ *  Host-local only, which is fine — the lock dir is host-local too. */
+export function pidAlive(pid: number): boolean {
+  try {
+    process.kill(pid, 0)
+    return true
+  } catch (err: any) {
+    return err?.code === "EPERM"
+  }
+}
+/** Parse a lock pid-file's contents to a pid, or null if absent/blank/garbage. */
+export function parseLockPid(content: string | null): number | null {
+  if (content == null) return null
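+  const n = Number.parseInt(content.trim(), 10)
+  // Reject NaN and non-positive values: kill(0, 0) and kill(-n, 0) probe a
+  // process group rather than a single holder, so they must not count as a pid.
+  return Number.isInteger(n) && n > 0 ? n : null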
+}
+/** Given a warm outcome and the configured failMode, should opencode's request
+ *  be failed? `closed` fails on any not-ok; `hybrid` fails only CONFIRMED
+ *  failures; `open` never fails. An ok result never fails. */
+export function shouldFailRequest(failMode: WarmOptions["failMode"], result: WarmResult): boolean {
+  if (result.ok) return false
+  return failMode === "closed" || (failMode === "hybrid" && result.confirmed)
+}
+export const LMStudioWarm: Plugin = async (_input, pluginOptions) => {
+  const opts = resolveOptions(
+    loadFileOptions(),
+    pluginOptions as (Partial<WarmOptions> & { failClosed?: boolean }) | null,
+  )
+  // ---- state (per opencode process) ----
+  const verifiedAt = new Map<string, number>() // model key -> last confirmed-addressable timestamp
+  const failedAt = new Map<string, { at: number; reason: string }>() // negative cache
+  const inflight = new Map<string, Promise<WarmResult>>()
+  let serverVerifiedAt = 0
+  // True only while THIS process holds the mkdir lock. Used by the exit handler
+  // to release a lock that a fire-and-forget eager warm may still be holding
+  // when the process tears down (otherwise the async finally never runs).
+  let holdingLock = false
+  try {
+    fs.mkdirSync(path.dirname(opts.logFile), { recursive: true })
+  } catch {}
+  function log(msg: string) {
+    try {
+      fs.appendFileSync(opts.logFile, `${new Date().toISOString()} [pid ${process.pid}] ${msg}\n`)
+    } catch {}
+  }
+  // Last-resort synchronous lock release. A one-shot `opencode run` can exit
+  // while a background eager warm still holds the lock; process.on("exit") runs
+  // sync only, so rmSync is the tool. Guard by the pid file so we never delete a
+  // lock another process legitimately re-acquired in the meantime (TOCTOU), and
+  // never throw from the handler. SIGKILL is uncatchable — the dead-holder
+  // liveness check in acquireLock is the backstop for that.
+  process.once("exit", () => {
+    if (!holdingLock) return
+    try {
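+      // Default to "ours": a missing or unreadable pid file means we mkdir'd
+      // but hadn't written the pid yet.
+      let ours = true
+      try {
+        const pidStr = fs.readFileSync(path.join(opts.lockDir, "pid"), "utf8").trim()
+        ours = pidStr === "" || pidStr === String(process.pid) // blank pid file ⇒ our write had not completed
+      } catch {
+        // pid file absent: keep ours = true
+      }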
+      if (ours) fs.rmSync(opts.lockDir, { recursive: true, force: true })
+    } catch {}
+  })
+  function run(
+    cmd: string,
+    args: string[],
+    timeoutMs: number,
+  ): Promise<{ ok: boolean; timedOut: boolean; stdout: string; stderr: string }> {
+    return new Promise((resolve) => {
+      execFile(cmd, args, { timeout: timeoutMs, maxBuffer: 16 * 1024 * 1024, env: process.env }, (err, stdout, stderr) =>
+        resolve({
+          ok: !err,
+          timedOut: Boolean(err && (err as any).killed),
+          stdout: String(stdout),
+          stderr: String(stderr),
+        }),
+      )
+    })
+  }
+  const lms = (args: string[], timeoutMs: number) => run(opts.lmsPath, args, timeoutMs)
+  async function psInstances(): Promise<LmsInstance[] | null> {
+    const res = await lms(["ps", "--json"], 15_000)
+    if (!res.ok) {
+      log(`lms ps failed: ${res.stderr.trim().slice(0, 300)}`)
+      return null
+    }
+    try {
+      const parsed = JSON.parse(res.stdout)
+      return Array.isArray(parsed) ? parsed : null
+    } catch {
+      log(`lms ps returned non-JSON: ${res.stdout.slice(0, 200)}`)
+      return null
+    }
+  }
+  // "Alive" means the server is listening — any HTTP response counts, including
+  // 401/403 (LM Studio with API auth enabled rejects unauthenticated probes).
+  // Only network-level failures (refused/timeout) mean the server is down.
+  async function httpAlive(baseURL: string): Promise<boolean> {
+    try {
+      await fetch(`${baseURL.replace(/\/+$/, "")}/models`, { signal: AbortSignal.timeout(2_500) })
+      return true
+    } catch {
+      return false
+    }
+  }
+  async function pollAlive(baseURL: string, timeoutMs: number): Promise<boolean> {
+    const deadline = Date.now() + timeoutMs
+    while (Date.now() < deadline) {
+      if (await httpAlive(baseURL)) return true
+      await new Promise((r) => setTimeout(r, 1_000))
+    }
+    return false
+  }
+  let serverInflight: Promise<boolean> | null = null
+  function ensureServer(baseURL: string): Promise<boolean> {
+    if (Date.now() - serverVerifiedAt < opts.verifyCacheMs) return Promise.resolve(true)
+    if (serverInflight) return serverInflight
+    serverInflight = ensureServerImpl(baseURL).finally(() => {
+      serverInflight = null
+    })
+    return serverInflight
+  }
+  async function ensureServerImpl(baseURL: string): Promise<boolean> {
+    if (await httpAlive(baseURL)) {
+      serverVerifiedAt = Date.now()
+      return true
+    }
+    log(`HTTP server not reachable at ${baseURL} — running lms server start`)
+    const started = await lms(["server", "start"], 30_000)
+    if (!started.ok) log(`lms server start failed: ${started.stderr.trim().slice(0, 300)}`)
+    if (await pollAlive(baseURL, opts.serverStartTimeoutMs)) {
+      serverVerifiedAt = Date.now()
+      log(`HTTP server is up at ${baseURL}`)
+      return true
+    }
+    if (opts.launchAppFallback && process.platform === "darwin") {
+      // Server still down: the LM Studio app itself may be closed.
+      log(`server still down — trying: open -ga "LM Studio"`)
+      await run("/usr/bin/open", ["-ga", "LM Studio"], 15_000)
+      await new Promise((r) => setTimeout(r, 3_000))
+      await lms(["server", "start"], 30_000)
+      if (await pollAlive(baseURL, opts.serverStartTimeoutMs)) {
+        serverVerifiedAt = Date.now()
+        log(`HTTP server is up at ${baseURL} (after app launch)`)
+        return true
+      }
+    }
+    log(`HTTP server did not come up within budget`)
+    return false
+  }
+  function lockHolderPid(): number | null {
+    try {
+      return parseLockPid(fs.readFileSync(path.join(opts.lockDir, "pid"), "utf8"))
+    } catch {
+      return null
+    }
+  }
+  // Cross-process mutex via atomic mkdir: parallel opencode workers must not
+  // race lms load (it is not idempotent). A lock may be broken when (1) it is
+  // older than staleMs (no live holder can run that long — every command under
+  // the lock is killed at a hard timeout), (2) its recorded holder pid is dead
+  // (crash/abrupt exit before the finally released it — the observed eager-warm
+  // leak), or (3) the pid file is missing AND the dir has outlived a short grace
+  // (a holder that crashed between mkdir and writeFile). A fresh, pid-less lock
+  // is left alone: that is a live holder still mid-acquisition.
+  async function acquireLock(): Promise<(() => void) | null> {
+    const deadline = Date.now() + opts.lockWaitTimeoutMs
+    const staleMs = opts.loadTimeoutMs + opts.serverStartTimeoutMs + 120_000
+    const pidGraceMs = 5_000
+    for (;;) {
+      try {
+        await fsp.mkdir(opts.lockDir, { recursive: false })
+        holdingLock = true
+        try {
+          await fsp.writeFile(path.join(opts.lockDir, "pid"), String(process.pid))
+        } catch {}
+        // Synchronous release: rmSync + flag clear run with no await between them,
+        // so a second in-process waiter (parallel eager warm) cannot observe a
+        // removed dir with holdingLock still true, and the dir is gone even if
+        // the process is mid-teardown when release fires.
+        return () => {
+          try {
+            fs.rmSync(opts.lockDir, { recursive: true, force: true })
+          } catch {}
+          holdingLock = false
+        }
+      } catch (err: any) {
+        if (err?.code !== "EEXIST") throw err
+        try {
+          const st = await fsp.stat(opts.lockDir)
+          const age = Date.now() - st.mtimeMs
+          const holder = lockHolderPid()
+          let reason = ""
+          if (age > staleMs) reason = `stale (age ${Math.round(age / 1000)}s)`
+          else if (holder !== null && holder !== process.pid && !pidAlive(holder)) reason = `dead holder pid ${holder}`
+          else if (holder === null && age > pidGraceMs) reason = `abandoned (no pid, age ${Math.round(age / 1000)}s)`
+          if (reason) {
+            log(`breaking lock: ${reason}`)
+            await fsp.rm(opts.lockDir, { recursive: true, force: true })
+            continue
+          }
+        } catch {} // lock vanished between mkdir and stat — retry
+        if (Date.now() > deadline) return null // contention timeout — ambiguous, caller decides
+        await new Promise((r) => setTimeout(r, 500))
+      }
+    }
+  }
+  async function doWarm(key: string, baseURL: string): Promise<WarmResult> {
+    if (!(await ensureServer(baseURL))) {
+      return { ok: false, confirmed: true, reason: `LM Studio HTTP server is not reachable at ${baseURL}` }
+    }
+    // Fast path: no lock needed if already addressable.
+    let instances = await psInstances()
+    if (instances && addressable(instances, key)) {
+      verifiedAt.set(key, Date.now())
+      return OK
+    }
+    const release = await acquireLock()
+    if (!release) {
+      // Someone else has been loading for a long time (big model, slow disk).
+      // Their instance may already be addressable and queueing — ambiguous.
+      log(`lock contention timeout waiting to warm ${key}: outcome ambiguous, deferring to failMode`)
+      return { ok: false, confirmed: false, reason: "lock contention timeout" }
+    }
+    try {
+      // Double-checked: another process may have loaded it while we waited.
+      instances = await psInstances()
+      if (instances && addressable(instances, key)) {
+        verifiedAt.set(key, Date.now())
+        return OK
+      }
+      // Orphaned duplicates: instances of this model exist (key:2 …) but none
+      // is addressable by the bare key (e.g. the original was unloaded and a
+      // stray duplicate survived). Loading again would only stack key:3 —
+      // reconcile by unloading idle duplicates first, then load fresh.
+      const dups = (instances ?? []).filter((i) => i.modelKey === key)
+      if (dups.length > 0) {
+        const busy = dups.some((i) => i.status === "generating" || (i.queued ?? 0) > 0)
+        if (!opts.reconcileDuplicates || busy) {
+          const ids = dups.map((i) => i.identifier).join(", ")
+          log(`WARNING: only non-addressable instances of ${key} exist (${ids}); busy=${busy} — cannot warm`)
+          return { ok: false, confirmed: true, reason: `only suffixed duplicates of ${key} are resident (${ids})` }
+        }
+        for (const d of dups) {
+          if (!d.identifier) continue
+          log(`reconciling: unloading duplicate instance ${d.identifier}`)
+          const un = await lms(["unload", d.identifier], 60_000)
+          if (!un.ok) log(`unload ${d.identifier} failed: ${un.stderr.trim().slice(0, 200)}`)
+        }
+      }
+      const args = loadArgs(opts, key)
+      log(`loading ${key} (${args.join(" ")}) ...`)
+      const t0 = Date.now()
+      const res = await lms(args, opts.loadTimeoutMs)
+      if (!res.ok) {
+        const kind = res.timedOut ? "timeout" : "error"
+        const detail = (res.stderr || res.stdout).trim().slice(0, 500)
+        log(`lms load ${key} FAILED (${kind}) after ${Date.now() - t0}ms: ${detail}`)
+        return { ok: false, confirmed: true, reason: `lms load failed (${kind}): ${detail.slice(0, 200)}` }
+      }
+      instances = await psInstances()
+      if (instances && addressable(instances, key)) {
+        verifiedAt.set(key, Date.now())
+        log(`loaded ${key} in ${Math.round((Date.now() - t0) / 1000)}s`)
+        return OK
+      }
+      log(`lms load ${key} exited 0 but ps does not show identifier === key`)
+      return { ok: false, confirmed: true, reason: `loaded but not addressable as "${key}"` }
+    } finally {
+      release()
+    }
+  }
+  function warm(key: string, baseURL: string): Promise<WarmResult> {
+    if (Date.now() - (verifiedAt.get(key) ?? 0) < opts.verifyCacheMs) return Promise.resolve(OK)
+    const failed = failedAt.get(key)
+    if (failed && Date.now() - failed.at < opts.retryCooldownMs) {
+      return Promise.resolve({ ok: false, confirmed: true, reason: `${failed.reason} (cooldown)` })
+    }
+    const existing = inflight.get(key)
+    if (existing) return existing
+    const p = doWarm(key, baseURL)
+      .catch((err): WarmResult => {
+        log(`warm(${key}) error: ${err instanceof Error ? err.message : String(err)}`)
+        return { ok: false, confirmed: false, reason: "internal error (see log)" }
+      })
+      .then((r) => {
+        if (r.ok) failedAt.delete(key)
+        else if (r.confirmed) failedAt.set(key, { at: Date.now(), reason: r.reason })
+        return r
+      })
+      .finally(() => inflight.delete(key))
+    inflight.set(key, p)
+    return p
+  }
+  log(
+    `plugin loaded (providers=${opts.providers.join(",")} ttl=${opts.ttlSeconds || "none"} parallel=${opts.parallel || "default"} failMode=${opts.failMode})`,
+  )
+  return {
+    // Fires once at instance start with the resolved config. Background eager
+    // warm of both pinned models — NOT awaited, so startup isn't delayed; the
+    // chat.params gate below remains the deterministic barrier.
+    config: async (cfg: any) => {
+      if (!opts.eager) return
+      for (const ref of [cfg?.model, cfg?.small_model]) {
+        const parsed = parseModelRef(ref)
+        if (!parsed || !opts.providers.includes(parsed.providerID)) continue
+        const configured = cfg?.provider?.[parsed.providerID]?.options?.baseURL
+        const baseURL = typeof configured === "string" && configured.startsWith("http") ? configured : opts.baseURL
+        log(`eager warm queued for ${parsed.key}`)
+        void warm(parsed.key, baseURL)
+      }
+    },
+    // Awaited by opencode before EVERY LLM request (main and small model alike):
+    // the deterministic pre-warm gate. Heals cold starts and TTL evictions.
+    "chat.params": async (input: any) => {
+      let result: WarmResult = OK
+      let key: string | undefined
+      try {
+        const providerID: string | undefined = input?.provider?.info?.id ?? input?.model?.providerID
+        if (!providerID || !opts.providers.includes(providerID)) return
+        // model.api.id is the exact string opencode sends as the API `model`
+        // field (== LM Studio model key for config-defined models).
+        key = input?.model?.api?.id ?? input?.model?.id
+        if (!key) return
+        const configured = input?.provider?.options?.baseURL
+        const baseURL = typeof configured === "string" && configured.startsWith("http") ? configured : opts.baseURL
+        result = await warm(key, baseURL)
+      } catch (err) {
+        log(`chat.params hook error: ${err instanceof Error ? err.message : String(err)}`)
+        result = { ok: false, confirmed: false, reason: "hook error (see log)" }
+      }
+      if (result.ok) return
+      if (shouldFailRequest(opts.failMode, result)) {
+        throw new Error(`lmstudio-warm: cannot ensure model "${key}" is loaded — ${result.reason}. See ${opts.logFile}`)
+      }
+      log(`warm(${key}) not ensured (${result.reason}) — proceeding fail-open`)
+    },
+  }
+}
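The pure helpers in the diff above (`parseModelRef`, `loadArgs`, `shouldFailRequest`) can be exercised without LM Studio running. The following is a minimal standalone sketch, not part of the package: the helper bodies are copied from the diff, while the `Overrides`, `LoadOptions`, `FailMode`, and `WarmResult` types are trimmed stand-ins assumed for illustration, and the provider ID `lmstudio`, the model key, and the numeric values are hypothetical.

```typescript
// Assumed, trimmed-down types: only the fields these helpers read.
type Overrides = { ttlSeconds?: number; parallel?: number; contextLength?: number }
interface LoadOptions {
  ttlSeconds: number
  parallel: number
  contextLength: number
  perModel: Record<string, Overrides>
}
type FailMode = "open" | "closed" | "hybrid"
type WarmResult = { ok: true } | { ok: false; confirmed: boolean; reason: string }

// Split on the FIRST slash so keys that contain slashes stay intact.
function parseModelRef(ref: unknown): { providerID: string; key: string } | null {
  if (typeof ref !== "string" || !ref.includes("/")) return null
  const slash = ref.indexOf("/")
  return { providerID: ref.slice(0, slash), key: ref.slice(slash + 1) }
}

// Per-model overrides win over top-level options; a value of 0 omits the flag.
function loadArgs(opts: LoadOptions, key: string): string[] {
  const per = opts.perModel[key] ?? {}
  const ttl = per.ttlSeconds ?? opts.ttlSeconds
  const parallel = per.parallel ?? opts.parallel
  const ctx = per.contextLength ?? opts.contextLength
  const args = ["load", key, "-y"]
  if (ttl > 0) args.push("--ttl", String(ttl))
  if (parallel > 0) args.push("--parallel", String(parallel))
  if (ctx > 0) args.push("--context-length", String(ctx))
  return args
}

// closed: fail on any not-ok; hybrid: fail only confirmed failures; open: never fail.
function shouldFailRequest(failMode: FailMode, result: WarmResult): boolean {
  if (result.ok) return false
  return failMode === "closed" || (failMode === "hybrid" && result.confirmed)
}

// Hypothetical configuration: a one-hour TTL, with a context-length override for one model.
const opts: LoadOptions = {
  ttlSeconds: 3600,
  parallel: 0,
  contextLength: 0,
  perModel: { "qwen/qwen3": { contextLength: 32768 } },
}

const ref = parseModelRef("lmstudio/qwen/qwen3")
// ref: { providerID: "lmstudio", key: "qwen/qwen3" }
const argv = ref ? loadArgs(opts, ref.key) : []
// argv: ["load", "qwen/qwen3", "-y", "--ttl", "3600", "--context-length", "32768"]

// The two not-ok outcomes doWarm can return, evaluated under each fail mode.
const ambiguous: WarmResult = { ok: false, confirmed: false, reason: "lock contention timeout" }
const confirmed: WarmResult = { ok: false, confirmed: true, reason: "lms load failed" }
const matrix = (["open", "hybrid", "closed"] as FailMode[]).map((mode) => ({
  mode,
  failsOnAmbiguous: shouldFailRequest(mode, ambiguous),
  failsOnConfirmed: shouldFailRequest(mode, confirmed),
}))
console.log(ref, argv, matrix)
```

In this sketch only `closed` fails a request on the ambiguous lock-contention outcome, while `hybrid` lets it through and fails only the confirmed load failure, which matches the doc comment on `shouldFailRequest`.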