npm - @mutmutco/opencode-mmi - Versions diffs - 2.56.0 → 2.57.0 - Mend

@mutmutco/opencode-mmi 2.56.0 → 2.57.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

package/dist/index.d.ts +3 -0
package/dist/index.js +121 -1
package/package.json +2 -2
package/skills/_shared/doctrine.md +13 -11
package/skills/bootstrap/SKILL.md +5 -5
package/skills/build/SKILL.md +25 -20
package/skills/build/references/loops.md +1 -1
package/skills/build/references/worked-example.md +2 -2
package/skills/build/templates/campaign-northstar.md +2 -0
package/skills/coop/SKILL.md +91 -16
package/skills/grind/SKILL.md +22 -15
package/skills/grind/references/auto.md +1 -2
package/skills/grind/references/routing.md +3 -4
package/skills/grind/references/verify.md +5 -3
package/skills/grind/templates/saga-snapshot.md +2 -0
package/skills/grind/templates/synthesize-panel.md +1 -1
package/skills/handoff/SKILL.md +4 -2
package/skills/overlord/SKILL.md +12 -12
package/skills/overlord/references/controller-orphan-guard.md +12 -15
package/skills/overlord/references/failure-pressure-scenarios.md +13 -13
package/skills/overlord/references/fugu-api-engine.md +62 -0
package/skills/overlord/references/servant-liveness.md +9 -9
package/skills/overlord/references/shell-adapters.md +2 -1
package/skills/overlord/references/state-schema.md +10 -17
package/skills/overlord/references/terminal-leash.md +25 -22
package/skills/release/SKILL.md +5 -9
package/skills/stage/SKILL.md +1 -1
package/skills/overlord/references/codex-fugu-preflight.md +0 -25
package/skills/overlord/references/opencode-fugu-engine.md +0 -104

package/skills/grind/SKILL.md CHANGED

@@ -17,6 +17,8 @@ Two kinds of work, one loop:
 **Shared doctrine:** Read `skills/_shared/doctrine.md` at session start and on resume. Fusion, parallelism, panel economics, flat fan-out, classifier-denied spawns, worktree hygiene, saga resume, enforcement matrix — single source; do not duplicate here.
+**Continuity access:** Saga and North Star are Jervaise-only. If the current login is not `jervaise`, skip all `saga`, `northstar`, `plan`, and `/handoff` steps; keep durable state in the board issue, PR body/comments, verification logs, and final report.
 Flags:
 - `--explore` — brainstorm and judge approaches before building (use for open-ended,
   "find a better/faster way" work). Without it, run the convergent loop.
@@ -86,10 +88,11 @@ Flags:
   - Perform the fix, **verify empirically** (run the command whose output the acceptance pins — no diff
     to panel), close the issue, and **file any durable-prevention enhancement** as a separate item.
   - Terminal-done layer 1 is the live command output; layer 3 is the close, not a merge.
-- **Resumable.** After every phase: (1) silent one-line `mmi-cli saga note "<audit>"`; (2) `mmi-cli saga snapshot set --kind grind …` (or `--json-file`) — see **## Saga keep (resume snapshot)**. On resume, `mmi-cli saga snapshot show --kind grind` first; never reconstruct from collapsed chat history.
+- **Resumable.** For `jervaise`, after every phase: (1) silent one-line `mmi-cli saga note "<audit>"`; (2) `mmi-cli saga snapshot set --kind grind …` (or `--json-file`) — see **## Saga keep (resume snapshot)**. For everyone else, write the same phase evidence to the issue/PR record. On resume, corroborate against git, GitHub, and the board; never reconstruct from collapsed chat history.
 - **Announce routing.** At phase transitions (after Phase 0a′ / model selection, before Gate 1 under
-  interactive, and at equivalent points under `--auto`), **announce explicitly** in user-visible text
-  **and** mirror the same facts in a silent `mmi-cli saga note`:
+  interactive, and at equivalent points under `--auto`), **announce explicitly** in user-visible text.
+  For `jervaise`, also mirror the same facts in a silent `mmi-cli saga note`; for every other login,
+  mirror them in the run output/report only (saga is Jervaise-only):
   - **Ultra mode** when explicit `--ultra` YOLO is active (distinct from auto-ultra verify uplift).
   - **Explore mode** when `--explore` or auto-framed explore is selected.
   - **Routing tier** (Budget / Balanced / Paranoid / Ultra verify routing).
@@ -105,7 +108,7 @@ Flags:
 ## Saga keep (resume snapshot)
-Grind-specific snapshot wiring — shared resume rules in `skills/_shared/doctrine.md`. Schema: `templates/saga-snapshot.md`. Use `--kind grind` for show/set. CLI maps snapshot fields → HEAD primitives (`next`, `anchor`, checklist) — no parallel store.
+**Jervaise-only.** Grind-specific snapshot wiring — shared resume rules in `skills/_shared/doctrine.md`. Schema: `templates/saga-snapshot.md`. Use `--kind grind` for show/set. CLI maps snapshot fields → HEAD primitives (`next`, `anchor`, checklist) — no parallel store. For every other login, the issue/PR record + git history is the resume surface — do not run saga/North Star commands.
 ## Gates — how every gate (and decision point) is presented
 Hosts collapse mid-turn text: anything you printed between tool calls may be invisible when
@@ -134,8 +137,9 @@ the user reaches the gate. So:
 ## Phase 0a′ — Classify & route
 Runs immediately after the grindability check, **before** Phase 0a model questions. (`--auto`:
-apply silently for classification; still **announce** routing per **Hard rules — Announce routing**;
-`saga note "grind class=X routing=Y ultra=Z reason=…"`.)
+apply silently for classification; still **announce** routing per **Hard rules — Announce routing**.
+For `jervaise`, `saga note "grind class=X routing=Y ultra=Z reason=…"`; for every other login,
+announce in the run output/report only — saga is Jervaise-only.)
 Read the issue **type label** (`bug` / `feature` / `task`), **priority**, **title/body**,
 **labels** (e.g. `security`), known **files touched**, and flags (`--explore`, `--ultra`, `--auto`).
@@ -204,7 +208,7 @@ See `templates/synthesize-panel.md`.
 **Paranoid / Ultra hard-lens double-pass:** run `security` and `correctness` twice — different
 temperature or two different models — before Phase 2b.
-Log each verify round: `mmi-cli saga note "verify round N: routing=X ultra=Y"`.
+Log each verify round: for `jervaise`, `mmi-cli saga note "verify round N: routing=X ultra=Y"`; for every other login, log it in the run output/report (saga is Jervaise-only).
 ## Phase 0b — Frame  [GATE 1]
 (`--auto`: no gate — auto-decide explore-vs-convergent, in `--explore` auto-pick the judge's
@@ -220,8 +224,7 @@ planning shape — **parallel planners + verifier-tier judge** — before Gate 1
 2. Each planner returns: approach summary, risks, proposed success criteria, estimated complexity.
 3. **Judge agent** (verifier-tier model, **≠** any planner/builder) scores against the goal rubric;
    picks winner or synthesizes hybrid — synthesize-and-reconcile, not vote/debate.
-4. Winner + criteria written to issue body; North Star push; proceed to Gate 1 (interactive) or
-   Phase 1 (`--auto`). `mmi-cli saga note "multi-plan N=<n> winner=<summary>"`.
+4. Winner + criteria written to issue body; for `jervaise`, North Star push + `mmi-cli saga note "multi-plan N=<n> winner=<summary>"`; for every other login, record the winner + criteria in the issue/PR record (saga + North Star are Jervaise-only). Proceed to Gate 1 (interactive) or Phase 1 (`--auto`).
 **Skip multi-agent planning** for narrow bugs, low priority, or user "quick"/"small" — single
 planner + judge (or direct criteria framing) is enough.
@@ -249,7 +252,7 @@ planner outputs when both paths are active.
    **Vague or ambiguous body?** Read the full issue (body + comments), state the deliverable,
    and write criteria the user can confirm at Gate 1. Umbrella scope → child issues (`--parent`),
    one shippable unit per grind.
-   Write them into the issue body; push the criteria to North Star (`mmi-cli northstar push <slug>` —
+   Write them into the issue body. For `jervaise`, also push the criteria to North Star (`mmi-cli northstar push <slug>` —
    the default push queues a background sync and prints "queued": that is success, not a failure;
    `mmi-cli northstar status` checks it, `mmi-cli northstar sync` confirms durably).
 3. Present per **## Gates** (class, routing, ultra, criteria). **Wait for the user's go.**
@@ -260,8 +263,7 @@ planner outputs when both paths are active.
 2. Otherwise brainstorm **2-3 candidate approaches** (2 if the goal is clear, 3 if wide) on the
    builder model — or use **multi-agent planning** when auto-gated.
 3. Score with the **verifier-model judge** (or fusion panel judge); pick or synthesize a direction.
-4. Define a success target — numeric metric if stated, else judged rubric. File/claim, North Star
-   push. Present per **## Gates**. **Wait for the user's go.**
+4. Define a success target — numeric metric if stated, else judged rubric. File/claim; for `jervaise`, also push North Star. Present per **## Gates**. **Wait for the user's go.**
 ## Generative fusion path (auto-gated)
@@ -275,7 +277,7 @@ verifier-tier **judge** fuses into one markdown deliverable. For **research-clas
    any planner) synthesizes — not vote/debate.
 2. Judge output is a **fused markdown deliverable**: chosen approach, tradeoffs, criteria
    refinement, spike findings.
-3. Land in issue body + `mmi-cli northstar push <slug>`.
+3. Land in issue body; for `jervaise`, also `mmi-cli northstar push <slug>` (North Star is Jervaise-only).
 4. **Code-shipping explore:** fused doc feeds Phase 1; Phase 2 verify stays diff-pinned.
 5. **Research-only** (no code PR): fusion output is the primary artifact; stop after criteria met
    + Gate 2 (`--auto` report).
@@ -335,6 +337,8 @@ build that adds new modules/tests is invisible to lenses and can draw a false `c
 Stage first (`git -C <worktree> add -A && git -C <worktree> diff --cached -- ':!cli/dist' > tmp/grind-verify-<round>.patch`)
 or `git -C <worktree> add -N <new files>` before the diff (#2057).
+**Spawn lenses tool-restricted (#2137).** Under `isolation=worktree`, spawn each lens with a subagent type that has **no shell/git/repo-filesystem access** — never the default `general-purpose` (or any `*`-tool) agent. A tool-capable lens silently bypasses prompt-only pinning: a `general-purpose` requirements lens has ignored the embedded patch, run `git diff` against the worktree's PARENT checkout (a different branch), and returned a false `cannot-verify`. Pass the patch file as the lens's ONLY input and state it has zero repo/git/filesystem access; if it cannot judge from the patch, it returns `cannot-verify`. Where the host exposes no zero-shell agent type, keep the no-access framing in the prompt AND discard + re-run (patch-only) any lens whose transcript shows a repo/git read — prompt-only pinning is not structurally honored by tool-capable subagents.
 **Lens-prompt clauses → `references/verify.md`.** Every lens prompt MUST carry: the **verbatim-includes-test-files** rule, the **abstention** rule (`cannot-verify`, never a false "absent/missing" blocker), the **diff-shape** clause (a referenced-but-undefined symbol is pre-existing — never flag it), and the **worktree-isolation** clause (patch-only, deny repo FS, stale-checkout warning). The exact wording lives in `references/verify.md` — load it before spawning lenses.
 Under **Paranoid** or **Ultra**, run **hard lenses twice** before Phase 2b.
@@ -382,7 +386,9 @@ The synthesizer returns a **`PanelReport`** — structured reconciliation of len
 A verify round **fails if `PanelReport.blockers` is non-empty**. If synthesis errors or returns
 invalid JSON, **degrade gracefully**: union raw lens `blockers` (manual dedupe by file+line+title),
-`saga note` the degradation, and continue Phase 3 — synthesis is an uplift, not a hard dependency.
+then for `jervaise` `saga note` the degradation and continue Phase 3; for every other login, note the
+degradation in the run output/report and continue (saga is Jervaise-only). Synthesis is an uplift,
+not a hard dependency.
 Optional CLI path: `mmi-cli verify panel` plans lens jobs; pipe lens JSON to `mmi-cli verify synthesize`
 for deterministic blocker dedupe before the host synthesizer enriches consensus/contradictions.
 Real verifier lanes only. Empty or controller-authored all-pass stubs are invalid evidence and do
@@ -453,7 +459,8 @@ See shared doctrine § Self-learning + retro. Grind-specific examples: gate mess
 ## End-of-grind summary
 At grind completion (PR opened + interactive stop, or `--auto` terminate/merge report), emit a **very
-brief** summary block — user-visible and mirrored in `mmi-cli saga note`:
+brief** summary block in user-visible text. For `jervaise`, also mirror it in `mmi-cli saga note`
+(saga is Jervaise-only); for every other login, the user-visible summary is the record:
 - **Tier + modes:** chosen tier (`light`/`standard`/`deep`/`ultra`); `--explore`, auto-ultra (verify uplift), `--auto` if applied.
 - **Models used:** builder / verifier / third / synthesizer / judge (host slot names).
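
The continuity rule this diff adds to `grind/SKILL.md` (saga notes for `jervaise`, the issue/PR record or run output for every other login) can be sketched as a small routing function. This is only an illustration: `planNote`, `NotePlan`, and the exact-match login comparison are assumptions, not code from the package; only the `mmi-cli saga note "<audit>"` command form comes from the skill text.

```typescript
// Hypothetical helper; the package states this rule in prose only.
type ContinuitySurface = "saga" | "issue-pr-record";

interface NotePlan {
  surface: ContinuitySurface;
  // Present only when the saga is the target surface.
  command?: string;
}

// Assumption: the login is compared exactly against "jervaise".
function planNote(login: string, audit: string): NotePlan {
  if (login === "jervaise") {
    // JSON.stringify wraps the audit text in double quotes and escapes inner quotes.
    return { surface: "saga", command: `mmi-cli saga note ${JSON.stringify(audit)}` };
  }
  // Every other login: keep the evidence in the issue/PR record and run no saga command.
  return { surface: "issue-pr-record" };
}
```

Keeping the decision in one place mirrors the intent of the shared-doctrine line in the diff: the per-phase instructions then only need to say what to record, not where.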

package/skills/grind/references/auto.md CHANGED

@@ -73,8 +73,7 @@ report. Never pretend cross-vendor ultra ran when it did not.
 Auto-decide explore-vs-convergent from the ask (open-ended → explore; `--explore` forces it on).
 Run **multi-agent planning + judge** when auto-gated (always under explicit `--ultra`). In explore,
 brainstorm + judge or **generative fusion** when auto-gated — **auto-pick the winning approach** —
-no wait. File/claim the item(s), write the criteria, push to North Star, then go straight to Phase 1.
-`saga note` multi-plan winner, tool policy, and fusion path when active.
+no wait. File/claim the item(s), write the criteria, then for `jervaise` push to North Star + `saga note` multi-plan winner, tool policy, and fusion path when active; for every other login, record the criteria + winner in the issue/PR record (saga + North Star are Jervaise-only). Then go straight to Phase 1.
 ## Phase 4 — PR → CI-merge loop (replaces Gate 2)

package/skills/grind/references/routing.md CHANGED

@@ -5,7 +5,7 @@ Loaded on demand from `SKILL.md` § Phase 0a′. The class table + the effort-ti
 ## Auto-ultra detection
 **auto-ultra = true** (verify-panel uplift **only** — not whole-loop YOLO) when **any** of
-(log first match in saga):
+(for `jervaise`, log first match in saga; for every other login, in the run output/report):
 1. ~~User passed **`--ultra`**.~~ **Explicit `--ultra` is separate** — see **Flags** and
    **§ Explicit `--ultra` vs auto-ultra**; it is **not** auto-ultra.
@@ -13,7 +13,7 @@ Loaded on demand from `SKILL.md` § Phase 0a′. The class table + the effort-ti
 3. **`--explore`** + stated numeric SLA/metric with high blast radius.
 4. **Architectural / cross-cutting** scope (multi-module, public API break, migration).
 5. **Escalation:** after **2 failed verify rounds** on default routing, escalate to auto-ultra for round 3+
-   (once per grind; saga note). Interactive: announce; `--auto`: silent.
+   (once per grind; for `jervaise`, `saga note` the escalation; for every other login, note it in the run output/report — saga is Jervaise-only). Interactive: announce; `--auto`: silent.
 **auto-ultra = false** (stay 2-model): docs/prose-only diffs; priority `low`; narrow bug with clear repro;
 user says "quick" / "small" in the prompt (explicit user instruction wins).
@@ -70,7 +70,6 @@ When the user passes **explicit `--ultra`**, select **higher reasoning effort**
 exposes it (e.g. elevated thinking/reasoning tier on supported models). Apply to builder, verifier,
 third, synthesizer, and judge roles when the host offers per-model reasoning knobs.
-- **Announce** when elevated reasoning is selected — model + level — in the routing announcement block
-  and `mmi-cli saga note "reasoning=<model>:<level> …"`.
+- **Announce** when elevated reasoning is selected — model + level — in the routing announcement block; for `jervaise`, also `mmi-cli saga note "reasoning=<model>:<level> …"` (saga is Jervaise-only).
 - **Fallback:** when the host has no reasoning-effort knob, use the strongest available model tier and
   note the gap in the announcement — do not block the grind.

package/skills/grind/references/verify.md CHANGED

@@ -24,10 +24,12 @@ diff-shape clause:
 This prevents diff-absent symbols from becoming ship-stoppers at the lens; Phase 2b's absence-claims
 drop rule remains the backstop.
-**Worktree isolation (#1621, #1895).** Phase 2: `isolation=worktree` — other checkouts may be stale.
+**Worktree isolation (#1621, #1895, #2137).** Phase 2: `isolation=worktree` — other checkouts may be stale.
 **Orchestrator MUST:** patch-only input; deny repo FS tools on lenses; stale-checkout clause in every lens prompt; re-run Phase 2 if a transcript shows disk reads — never triage disk-sourced blockers.
 Abstention + diff-shape rules above still apply; `cannot-verify` beats false absence.
+**Deny repo FS tools structurally, not just by prompt (#2137).** Spawn lenses with a subagent type that has no Bash/git/repo-filesystem access; never spawn a lens as a `general-purpose` / `*`-tool agent under `isolation=worktree`. A tool-capable lens has ignored the embedded patch and run `git diff` on the worktree's parent checkout (on another branch), returning a false `cannot-verify` describing a diff that is not under review. When the host cannot restrict tools, the prompt must forbid all repo/git/file access and the orchestrator discards any blocker whose evidence came from a disk/git read, re-running patch-only — prompt-only pinning is not reliably honored by tool-capable subagents.
 ## Tool-enabled lenses (default expectation on applicable lenses)
 When an objective signal exists, hard lenses must anchor to it — failing test, typecheck error,
@@ -58,7 +60,7 @@ while Phase 2b synthesizer stays **tool-free** (lens JSON + diff stat only).
 **Hygiene:** configurable allow/deny domain lists (org/repo-level) — exclude benchmark-leak
 domains (e.g. Stack Overflow, issue mirrors) from verify search. Default deny list in
-`cli/src/grind-policy.ts`. **Per-lens budget:** hard cap (3 queries/lens/round); `saga note` on exceed.
+`cli/src/grind-policy.ts`. **Per-lens budget:** hard cap (3 queries/lens/round); on exceed, for `jervaise` `saga note` it, for every other login note it in the run output/report (saga is Jervaise-only).
 **Diff-pinning preserved:** tools supplement — never replace — the pinned patch. Under
 `isolation=worktree`, repo FS tools stay forbidden even under ultra; cite builder test output from
@@ -73,7 +75,7 @@ full re-panel) before Phase 3 triage — at most once per round.
 **`contradictions`:** if the disagreement is **criteria/spec ambiguity** (lenses read the spec
 differently), stop and escalate to the human per **## Gates** — do not guess. If one lens is
-clearly wrong against consensus + diff, note it in the saga and triage real blockers only.
+clearly wrong against consensus + diff, note it (for `jervaise` in the saga; for every other login in the run output/report — saga is Jervaise-only) and triage real blockers only.
 **Absence-claims (#1621, #1895, #2057).** Drop "missing/absent/unimplemented/not fixed" blockers
 contradicted by the pinned patch or green builder tests. Drop blockers when lens logs show repo

package/skills/grind/templates/saga-snapshot.md CHANGED

@@ -1,5 +1,7 @@
 # Grind saga keep — resume snapshot
+**Jervaise-only artifact.** `mmi-cli saga snapshot` is a Jervaise-only continuity tool. For every other login, the issue/PR record + git history is the resume surface — do not use this template or `mmi-cli saga`.
 Enforced via **`mmi-cli saga snapshot`** (maps to saga HEAD — no parallel store). Checklist uses namespaced prefixes: `gs-open:`, `gs-resolved:`, `gs-ceiling:`.
 ## Read-first on resume

package/skills/grind/templates/synthesize-panel.md CHANGED

@@ -100,5 +100,5 @@ The grind loop uses this report as follows:
 If the synthesizer returns invalid JSON or errors:
 1. Fall back to **raw lens blockers** — union all lens `blockers`, manual dedupe by file+line+title.
-2. `mmi-cli saga note "phase 2b: synthesize degraded, raw lens triage"`.
+2. For `jervaise`, `mmi-cli saga note "phase 2b: synthesize degraded, raw lens triage"`; for every other login, note the degradation in the run output/report (saga is Jervaise-only).
 3. Continue Phase 3 — synthesis is an uplift, not a hard dependency.
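
The degraded path in the hunk above (union all lens `blockers`, dedupe by file+line+title) might look roughly like the sketch below. The `Blocker` and `LensReport` shapes and the function name are assumptions made for illustration; the actual lens JSON schema is not shown in this diff.

```typescript
// Assumed shapes; the real lens output may carry more or different fields.
interface Blocker {
  file: string;
  line: number;
  title: string;
}

interface LensReport {
  blockers: Blocker[];
}

// Union raw lens blockers, keeping the first occurrence of each file+line+title triple.
function unionLensBlockers(reports: LensReport[]): Blocker[] {
  const seen = new Set<string>();
  const merged: Blocker[] = [];
  for (const report of reports) {
    for (const blocker of report.blockers) {
      // A JSON-encoded tuple avoids key collisions from separator characters in paths or titles.
      const key = JSON.stringify([blocker.file, blocker.line, blocker.title]);
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(blocker);
      }
    }
  }
  return merged;
}
```

Under this reading, a verify round on the degraded path still fails when the merged list is non-empty, matching the `PanelReport.blockers` rule in `grind/SKILL.md`.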

package/skills/handoff/SKILL.md CHANGED

@@ -1,11 +1,13 @@
 ---
 name: handoff
-description: Record or claim an explicit session handoff in the saga + North Star system that SessionStart resumes — open a handoff to leave work for a future session, accept a pending one, or cancel your own. Use when the user says "handoff" or "/handoff", asks to hand off work, leave a checkpoint for the next session, or claim a prior handoff, or invokes /handoff.
+description: Jervaise-only. Record or claim an explicit session handoff in the saga + North Star system that SessionStart resumes — open a handoff to leave work for a future session, accept a pending one, or cancel your own. Use when Jervaise says "handoff" or "/handoff", asks to hand off work, leave a checkpoint for the next session, or claim a prior handoff, or invokes /handoff.
 ---
 # /handoff — explicit saga handoff lifecycle
-Use when the user says `handoff` or `/handoff`, or asks to leave work for a future session or claim a prior handoff. This skill records an explicit handoff in the same saga + North Star system that SessionStart resumes.
+This is a Jervaise-only continuity skill. If any other developer asks for a handoff, do not open or claim one; use the board, issue, or PR as the handoff record instead.
+Use when Jervaise says `handoff` or `/handoff`, or asks to leave work for a future session or claim a prior handoff. This skill records an explicit handoff in the same saga + North Star system that SessionStart resumes.
 ## Start

package/skills/overlord/SKILL.md CHANGED

@@ -11,7 +11,7 @@ Default pool: 3 servants total: one `fugu-ultra` and two normal `fugu` servants.
 Allowed range: `--3` through `--6`. Exactly one servant is Ultra in every run.
-Supported engines: `codex-fugu` through the PTY leash and OpenCode/Fugu through session-backed `opencode run --session` routing. OpenCode is preferred when available because it exposes parseable JSON events, session ids, and completion state.
+Active engine: native OpenAI-compatible Fugu API calls against the Sakana endpoint. It uses `SAKANA_API_KEY` or `MMI_OVERLORD_LLM_API_KEY`, records model/request/conversation state in the run registry, and does not depend on Codex-Fugu or OpenCode session routing.
 ## Start Contract
@@ -23,7 +23,7 @@ Supported engines: `codex-fugu` through the PTY leash and OpenCode/Fugu through
 6. Own worktrees, stage/dev servers, Playwright, browsers, PRs, merges, and cleanup.
 7. Keep servants leased until `/overlord stop`, `mmi-cli overlord stop`, or explicit controlled shutdown.
-CLI startup persists a gitignored run registry at `tmp/overlord/runs.json`, starts a durable controller, and lets the controller spawn servant PTYs or OpenCode sessions. `mmi-cli overlord send <target> <message>` queues redirects into that registry so the controller can route them to live servants. On OpenCode, redirects use `opencode run --session <session-id> -m sakana/fugu --format json <message>` and advance a real message lifecycle (`queuedAt`, `startedAt`, `completedAt`, `failedAt`) from JSON events. The launch profile uses `-a never` plus explicit sandbox settings where the engine supports them so routine servant tool calls do not bounce approval prompts back to the human.
+CLI startup persists a gitignored run registry at `tmp/overlord/runs.json` and starts API-backed Fugu conversations. `mmi-cli overlord send <target> <message>` appends to each servant's stored conversation, calls `/chat/completions`, captures the assistant text, and advances a real message lifecycle (`queued`, `started`, `completed`, `partial`, `failed`) from the API result.
 ## Reference Loading
@@ -33,24 +33,24 @@ Read only what the task requires:
 - `references/servant-normal.md`: prompt for normal Fugu servants.
 - `references/servant-ultra.md`: prompt for the single Ultra servant.
 - `references/loop-contract.md`: evidence, edit, verify, retry, escalate, and stop rules.
-- `references/terminal-leash.md`: servant startup, submit probing, approval profiles, and stop safety.
+- `references/terminal-leash.md`: servant startup, API conversation routing, shell-surface boundaries, and stop safety.
 - `references/servant-liveness.md`: liveness lease and awaiting-human behavior.
-- `references/controller-orphan-guard.md`: abrupt close, stale heartbeat, adoption, exact stop, and uncertainty.
-- `references/codex-fugu-preflight.md`: setup, update, model, API key, and Windows/Git Bash path checks.
-- `references/opencode-fugu-engine.md`: OpenCode preflight, JSON event parsing, session-backed mailbox, ledger, and liveness model.
+- `references/controller-orphan-guard.md`: abrupt close, stale sessions, adoption, exact stop, and uncertainty.
+- `references/fugu-api-engine.md`: native API preflight, conversation persistence, ledger, request timeouts, and liveness model.
 - `references/shell-adapters.md`: PowerShell, cmd, Git Bash, macOS zsh/bash, Linux bash/sh, and unknown-shell rules.
 - `references/state-schema.md`: durable run-state fields.
 - `references/failure-pressure-scenarios.md`: tests and lessons from the first Overlord design run.
 ## Hard Rules
-- **Fugu only — never a sub-agent fallback.** Overlord servants are Fugu (`fugu-ultra` + `fugu`) driven through the Overlord controller. Never satisfy an Overlord run with platform sub-agents, `multi_agent_v1`, generic workers, or any non-Fugu agent pool — not as a primary path and not as a fallback when the Fugu controller is missing or inactive. If the Fugu controller cannot start, is inactive, or cannot prove readable/writable handles, stop and report `blocked: fugu-controller-unavailable` with diagnosis; do not simulate Overlord with other agents.
-- Do not spawn servants on an unreadable or undrivable surface.
-- **Probe the engine before launch.** Run the Codex/Fugu (or OpenCode) preflight + `--help`/status probe before any servant launch; never launch into an unprobed surface.
-- Codex/Fugu PTY servants launch with explicit `-a never` (no-approval) and an appropriate `-s` sandbox profile — read-only for consultation, workspace-write only in an owned worktree. OpenCode servants use `opencode run --format json --session` and rely on the OpenCode session mailbox instead of PTY submit probing.
+- **Fugu only — never a substitute pool.** Overlord servants are Fugu (`fugu-ultra` + `fugu`) driven through the native Fugu API. Never satisfy an Overlord run with platform sub-agents, `multi_agent_v1`, generic workers, Codex-Fugu, OpenCode sessions, or any non-Fugu agent pool. If the Fugu API cannot prove the required endpoint, key, and models, stop and report the setup failure; do not simulate Overlord with other agents.
+- Do not spawn servants on an unreadable or undrivable engine.
+- **Probe the native Fugu API before launch.** Verify the API key, base URL, `/models`, `fugu`, and `fugu-ultra` before any servant launch.
+- API servants keep a stored system/user/assistant conversation per servant and use bounded `/chat/completions` calls for startup and redirects.
+- When the Overlord itself uses shell tools, it must use the native current shell for the host: `pwsh` on Windows, `zsh` on macOS, and `bash` on Linux/Unix. Windows `powershell.exe` is not an acceptable Overlord default.
 - **Any routine approval prompt during startup, planning, or assigned work is a launch-profile failure** — record and recover it, never hand-wave it away or train the human to approve routine commands. Consultation servants get read-only plus the disk-read permission ordinary host config reads need, or the Overlord performs those reads itself.
-- Probe submit behavior before sending real prompts.
-- **Delivery is not execution.** `mmi-cli overlord send` records a `queued`/`started` lifecycle, not completion. A redirect counts as delivered only when the servant journal shows the assignment left the composer and produced a new useful signal. If text remains at the `›` composer prompt after a bounded interval, mark the servant `delivery-stuck-composer` and the message `failed` — never report it as ready or delivered.
+- Probe the API path before sending real prompts.
+- **Delivery is not execution.** `mmi-cli overlord send` is complete only when the API returns captured servant text or a bounded failure. If the request times out, errors, or returns no assistant text, mark the servant `blocked` and the message `failed` — never report it as ready or delivered.
 - **No handoff after delivery = stalled, not ready.** When a servant stays `ready` but produces no non-TUI output after a bounded handoff-expected interval, mark it `stalled-after-delivery`; do not keep reporting it as ready.
 - Never rely on stale ACKs as liveness proof.
 - Never broad-kill by process name, title, shell name, or model command.
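
The new "Delivery is not execution" rule maps an API result to a message state and a servant state. A minimal sketch of that mapping follows, assuming a hypothetical `SendResult` union and treating whitespace-only replies as "no assistant text"; both are assumptions, and the `partial` lifecycle state is left out because the diff does not say when it applies.

```typescript
// Hypothetical result shape for one bounded /chat/completions call.
type SendResult =
  | { kind: "ok"; assistantText: string }
  | { kind: "timeout" }
  | { kind: "error"; detail: string };

interface SendOutcome {
  message: "completed" | "failed";
  servant: "ready" | "blocked";
}

function classifySend(result: SendResult): SendOutcome {
  // Only captured assistant text counts as completion.
  // Assumption: a whitespace-only reply is treated as no assistant text.
  if (result.kind === "ok" && result.assistantText.trim().length > 0) {
    return { message: "completed", servant: "ready" };
  }
  // Timeout, error, or empty reply: never report the message as delivered.
  return { message: "failed", servant: "blocked" };
}
```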

package/skills/overlord/references/controller-orphan-guard.md CHANGED

@@ -1,33 +1,30 @@
-# Controller And Orphan Guard
+# Run Registry And Orphan Guard
-The controller, not conversational memory, owns servants.
+The run registry, not conversational memory, owns servants.
-Controller responsibilities:
+Registry responsibilities:
-- Spawn servant PTYs.
-- Hold readable/writable handles.
+- Record Fugu API servant conversations.
 - Persist run state under gitignored `tmp/overlord`.
-- Write heartbeat.
-- Tee bounded journals.
+- Tee bounded ledger and event journals.
 - Expose status, stop, adopt, and recover.
 On every `/overlord`, `status`, `stop`, resume, or human message:
 - Rehydrate run state.
-- Check controller heartbeat.
-- Check servant handles.
+- Check model metadata, conversation history, request ids, and last useful signal.
 - Classify orphan state before doing more work.
 Orphan classifications:
-- `controller-alive-overlord-detached`
-- `controller-dead-servants-dead`
-- `controller-dead-servants-owned-alive`
-- `controller-dead-servants-uncertain`
+- `overlord-conversations-live`
+- `conversations-blocked`
+- `resources-owned-alive`
+- `resources-uncertain`
 Actions:
-- Adopt only with matching run token and recoverable handles.
+- Adopt only with a matching run token and recoverable conversation state.
 - Exact-stop only proven run-owned resources.
 - Leave uncertain resources alone and report them.
-- Never broad-clean by process name or title.
+- Never broad-clean by process name, title, shell name, or provider name.

package/skills/overlord/references/failure-pressure-scenarios.md CHANGED

@@ -2,17 +2,17 @@
 Test these before accepting `/overlord`:
-- Windows PowerShell startup uses PowerShell syntax and native paths.
-- Windows Git Bash does not write `/c/Users/...` into native Codex config.
-- macOS zsh and Linux bash use POSIX syntax.
-- Unknown shell fails before servant launch.
-- Codex update leaves Fugu receipt stale; preflight detects and guides repair.
-- Missing `codex-fugu`, API key, or Ultra model stops startup with setup steps.
-- `TERM=dumb` warning is translated, not shown as scary raw noise.
-- Prompt typed into composer but not submitted is detected.
-- Routine read-only reconnaissance triggers approval; Overlord marks launch-profile failure.
-- Previously ACKed servants become unreachable; stale ACK is rejected.
-- Awaiting-human preserves servant leases.
-- Controller heartbeat goes stale; orphan classification runs first.
-- `/overlord stop` leaves user-owned terminals, OpenCode, Codex, Fugu, shells, and Windows Terminal untouched.
+- Windows host work uses modern PowerShell (`pwsh`) syntax and native paths.
+- macOS zsh and Linux bash use POSIX syntax for host work.
+- Missing `SAKANA_API_KEY` and `MMI_OVERLORD_LLM_API_KEY` stops startup with setup steps.
+- A bad Fugu API base URL stops startup with setup steps.
+- `GET /models` returning an error stops startup with setup steps.
+- Missing `fugu` or `fugu-ultra` stops startup with setup steps.
+- Startup `/chat/completions` timeout marks the servant `blocked`.
+- Startup API success with no assistant text marks the servant `blocked`.
+- `send all` records one completed or failed result per targeted servant.
+- A redirect timeout marks the message `failed` and does not claim delivery.
+- Previously ACKed servants with failed follow-up requests become `blocked`; stale ACK is rejected.
+- Awaiting-human preserves servant conversation state.
+- `/overlord stop` leaves user-owned terminals, shells, and unrelated provider processes untouched.
 - Ambiguous leftovers are reported as `left-uncertain`.
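
Several of the added scenarios share one rule: a timeout, an API error, or a response with no assistant text marks the servant `blocked` and the message `failed`. A minimal sketch of that rule as a pure function follows. The type and function names (`RequestOutcome`, `classifyOutcome`, `servantBlocked`) are hypothetical and are not taken from the package.

```typescript
// Sketch only: names are hypothetical; the blocked/failed rule is the one
// described in the scenarios above.
type RequestOutcome =
  | { kind: "response"; assistantText: string | null }
  | { kind: "timeout" }
  | { kind: "error"; message: string };

interface Classification {
  servantBlocked: boolean;
  messageState: "completed" | "failed";
  failureReason?: string;
}

function blocked(reason: string): Classification {
  return { servantBlocked: true, messageState: "failed", failureReason: reason };
}

function classifyOutcome(outcome: RequestOutcome): Classification {
  if (outcome.kind === "timeout") return blocked("request timed out");
  if (outcome.kind === "error") return blocked(outcome.message);
  // A successful HTTP response still fails when no assistant text came back.
  if (outcome.assistantText === null || outcome.assistantText.trim() === "") {
    return blocked("no assistant text");
  }
  return { servantBlocked: false, messageState: "completed" };
}
```

Keeping the classification free of I/O makes each scenario in the list checkable without a live API call.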

package/skills/overlord/references/fugu-api-engine.md ADDED

@@ -0,0 +1,62 @@
+# Native Fugu API Engine
+Overlord servants run through the Sakana Fugu API using the OpenAI-compatible chat-completions surface.
+Defaults:
+- Base URL: `https://api.sakana.ai/v1`
+- Normal model: `fugu`
+- Ultra model: `fugu-ultra`
+- API key: `SAKANA_API_KEY`, or `MMI_OVERLORD_LLM_API_KEY`
+- Request timeout: `MMI_OVERLORD_LLM_TIMEOUT_MS`, clamped between 5 seconds and 10 minutes, defaulting to 90 seconds
+Optional overrides:
+- `MMI_OVERLORD_LLM_BASE_URL`
+- `MMI_OVERLORD_LLM_MODEL`
+- `MMI_OVERLORD_LLM_ULTRA_MODEL`
+## Preflight
+Before launching servants, probe `GET /models` with the configured key and base URL.
+Startup is blocked unless the probe proves both configured models are available.
+Do not print the API key.
+## Conversation State
+Each servant owns one stored conversation:
+- one system message with the servant identity and Overlord constraints
+- the startup assignment as a user message
+- every captured assistant response
+- future redirects appended as user messages
+The run registry records the model, request id when available, and conversation length.
+## Redirects
+`mmi-cli overlord send <target> <message>` calls `/chat/completions` for the target servant or each servant in `all`.
+The command returns only after the API response is captured or a bounded failure is recorded.
+If a request times out, errors, or returns no assistant text:
+- mark the servant `blocked`
+- mark the message `failed`
+- write the failure to the ledger
+## Ledger
+Append one ledger event for startup and one per redirect response.
+Include only operational metadata:
+- servant slot id
+- model
+- request id
+- response text
+- error text
+Never include API keys or provider secrets.
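
The defaults, overrides, and preflight rule in this reference can be sketched as configuration resolution plus a model check. This is an illustration under stated assumptions, not the code shipped in `package/dist/index.js`: the helper names are hypothetical, the precedence of `SAKANA_API_KEY` over `MMI_OVERLORD_LLM_API_KEY` is assumed, and so is the fallback to the default when the timeout value is not numeric.

```typescript
// Sketch only: helper and type names are hypothetical. The env var names,
// defaults, and timeout bounds are the ones listed in the reference above.
type Env = Record<string, string | undefined>;

interface EngineConfig {
  baseUrl: string;
  model: string;
  ultraModel: string;
  apiKey: string | undefined;
  timeoutMs: number;
}

const MIN_TIMEOUT_MS = 5000; // 5 seconds
const MAX_TIMEOUT_MS = 600000; // 10 minutes
const DEFAULT_TIMEOUT_MS = 90000; // 90 seconds

// Clamp a raw MMI_OVERLORD_LLM_TIMEOUT_MS value. Unset or non-numeric input
// falls back to the default (the non-numeric fallback is an assumption).
function resolveTimeoutMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_TIMEOUT_MS;
  const parsed = Number(raw);
  if (!isFinite(parsed)) return DEFAULT_TIMEOUT_MS;
  return Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, parsed));
}

function resolveConfig(env: Env): EngineConfig {
  return {
    baseUrl: env["MMI_OVERLORD_LLM_BASE_URL"] ?? "https://api.sakana.ai/v1",
    model: env["MMI_OVERLORD_LLM_MODEL"] ?? "fugu",
    ultraModel: env["MMI_OVERLORD_LLM_ULTRA_MODEL"] ?? "fugu-ultra",
    // Assumption: SAKANA_API_KEY wins when both keys are set.
    apiKey: env["SAKANA_API_KEY"] ?? env["MMI_OVERLORD_LLM_API_KEY"],
    timeoutMs: resolveTimeoutMs(env["MMI_OVERLORD_LLM_TIMEOUT_MS"]),
  };
}

// Given the model ids returned by GET /models, list the configured models
// the probe did not prove available. Startup proceeds only when this is empty.
function missingModels(availableIds: string[], config: EngineConfig): string[] {
  return [config.model, config.ultraModel].filter(
    (id) => availableIds.indexOf(id) === -1
  );
}
```

The functions take the environment and the model list as arguments, so the key is never read or printed inside them.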

package/skills/overlord/references/servant-liveness.md CHANGED

@@ -4,9 +4,9 @@ An ACK creates a lease, not permanent proof.
 Readiness requires:
-- Current readable handle.
-- Current writable handle.
-- Proven submit mode.
+- Current Fugu API conversation state.
+- Configured model for the servant role.
+- Stored system/user/assistant messages.
 - Matching run id and run token.
 - Recent useful signal or bounded liveness response.
@@ -15,13 +15,13 @@ Stale ACK-only readiness is forbidden.
 Awaiting-human:
 - Servants remain leased.
-- Controller heartbeat stays active.
+- Run registry remains current.
 - Status rehydrates state and checks liveness.
-- If background liveness is unsupported, mark `suspended-awaiting-human` and require a rehydrate pass before work resumes.
+- If background liveness is unsupported, preserve conversation state and require a rehydrate pass before work resumes.
-Lost servant:
+Blocked servant:
-- Mark the slot lost/unresponsive.
-- Preserve bounded journal.
-- Attempt recovery only when handles can be proven.
+- Mark the slot `blocked`.
+- Preserve bounded journal and conversation state.
+- Attempt recovery only when the API key, model, and conversation can be proven.
 - Otherwise spawn a replacement in the same role slot with a compact handoff.
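
The revised readiness list can be read as a set of independent checks, each of which must pass. A minimal sketch follows, assuming a hypothetical `readinessGaps` helper, a hypothetical `maxSignalAgeMs` window, and epoch-millisecond timestamps. None of these come from the package. `lastAckAt` is deliberately never consulted, which mirrors the rule that ACK-only readiness is forbidden.

```typescript
// Sketch only: shapes and names are hypothetical illustrations of the
// readiness requirements listed above.
type Role = "system" | "user" | "assistant";

interface RunIdentity {
  runId: string;
  runToken: string;
}

interface ServantSnapshot {
  llmModel?: string;
  llmMessages: { role: Role; content: string }[];
  lastAckAt?: number; // epoch ms; never sufficient on its own
  lastUsefulSignalAt?: number; // epoch ms
}

interface ReadinessInput {
  servant: ServantSnapshot;
  recorded: RunIdentity; // identity stored in the run registry
  claimed: RunIdentity; // identity presented by the caller
  expectedModel: string; // configured model for the servant role
  nowMs: number;
  maxSignalAgeMs: number; // assumed freshness window
  livenessOk: boolean; // result of a bounded liveness request
}

// Returns the unmet requirements; an empty array means ready.
function readinessGaps(input: ReadinessInput): string[] {
  const gaps: string[] = [];
  const { servant, recorded, claimed } = input;
  const hasRole = (role: Role) => servant.llmMessages.some((m) => m.role === role);

  if (!hasRole("system") || !hasRole("user") || !hasRole("assistant")) {
    gaps.push("conversation state incomplete");
  }
  if (servant.llmModel !== input.expectedModel) gaps.push("model mismatch");
  if (recorded.runId !== claimed.runId || recorded.runToken !== claimed.runToken) {
    gaps.push("run identity mismatch");
  }
  const signalFresh =
    servant.lastUsefulSignalAt !== undefined &&
    input.nowMs - servant.lastUsefulSignalAt <= input.maxSignalAgeMs;
  if (!signalFresh && !input.livenessOk) gaps.push("no recent useful signal");
  return gaps;
}
```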

package/skills/overlord/references/shell-adapters.md CHANGED

@@ -10,7 +10,8 @@ Detect:
 Rules:
-- PowerShell/pwsh: use PowerShell syntax and native Windows paths.
+- Windows `pwsh`: use modern PowerShell syntax and native Windows paths.
+- Legacy Windows PowerShell (`powershell.exe`) is not an acceptable host shell for Overlord; use `pwsh`.
 - cmd: use cmd syntax and native Windows paths.
 - Windows Git Bash: distinguish shell paths from native Windows consumer-process paths.
 - macOS zsh/bash: use POSIX syntax and macOS paths.

package/skills/overlord/references/state-schema.md CHANGED

@@ -6,22 +6,15 @@ Minimum fields:
 - `runId`
 - `runToken`
-- `repo`
 - `worktree`
-- `branch`
-- `human`
-- `surface`
-- `hostPlatform`
-- `shellAdapter`
+- `engine`
+- `provider`
 - `state`
 - `createdAt`
 - `updatedAt`
-- `controllerPid`
-- `controllerFingerprint`
-- `lastControllerHeartbeatAt`
 - `statePath`
 - `journalDir`
-- `todoSnapshot`
+- `ledgerPath`
 - `servants[]`
 - `messages[]`
 - `ownedResources[]`
@@ -33,17 +26,17 @@ Servant fields:
 - `role`
 - `model`
 - `profile`
-- `state` (includes `stalled-after-delivery` for elapsed handoff windows, and `delivery-stuck-composer` when a redirect is pasted but unsubmitted)
-- `pid`
-- `runToken`
-- `fingerprint`
+- `state` (includes `blocked` when an API request fails or returns no assistant text)
 - `composerSubmitMode`
-- `opencodeSessionId`
+- `llmModel`
+- `llmRequestId`
+- `llmMessages`
 - `lastAckAt`
 - `lastLivenessCheckAt`
 - `lastUsefulSignalAt`
 - `journalPath`
 - `eventJournalPath`
+- `scopeToken` (optional non-secret attribution token bound to run id, servant slot, profile, and assignment scope)
 - `assignment`
 - `handoff`
@@ -53,14 +46,14 @@ Message fields:
 - `target`
 - `text`
 - `createdAt`
-- `state` (`queued` | `started` | `completed` | `failed`)
+- `state` (`queued` | `started` | `completed` | `partial` | `failed`)
 - `queuedAt`
 - `startedAt`
 - `completedAt`
 - `failedAt`
 - `responseText`
 - `failureReason`
-- `deliveredAt` (legacy PTY-only; superseded by the lifecycle fields)
+- `servantResults[]`
 Owned resource fields:
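
The message hunk above adds a `partial` state and a `servantResults[]` field but does not define how they relate. One plausible reading, sketched below, is that a message sent to several servants rolls up to `partial` when some results completed and others failed. That reading, the `ServantResult` shape, and the choice to report `started` while no results exist are all assumptions.

```typescript
// Sketch only: the state names come from the schema diff; the ServantResult
// shape and the roll-up rule are assumptions, not documented behavior.
type MessageState = "queued" | "started" | "completed" | "partial" | "failed";

interface ServantResult {
  slotId: string; // hypothetical field name
  state: "completed" | "failed";
  responseText?: string;
  failureReason?: string;
}

function rollUpMessageState(results: ServantResult[]): MessageState {
  // Assumption: with no recorded results the message is still in flight.
  if (results.length === 0) return "started";
  const failedCount = results.filter((r) => r.state === "failed").length;
  if (failedCount === 0) return "completed";
  if (failedCount === results.length) return "failed";
  return "partial";
}
```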

package/skills/overlord/references/terminal-leash.md CHANGED

@@ -1,44 +1,47 @@
-# Terminal Leash
+# Session Leash
-The Overlord must own every servant terminal through a durable controller, PTY leash, and registry.
+The Overlord must own every servant conversation through a durable registry, model metadata, request ids when available, and a bounded message lifecycle.
+The native Fugu API is the supported Overlord engine.
 Startup phases shown to humans:
-- Loading controller and PTYs.
-- Checking Fugu setup.
+- Loading the run registry.
+- Checking the Fugu API key and base URL.
+- Checking `/models` for `fugu` and `fugu-ultra`.
 - Starting one Ultra and normal Fugus.
+- Recording model, request, and conversation state.
 - Loading servant instructions.
-- Waiting for ACKs.
+- Waiting for API-backed ACKs.
 - Ready.
-Do not show raw `TERM=dumb`, ANSI redraws, title-setting failures, or TUI noise unless startup fails or debug output is requested.
-Approval profiles:
+Do not show raw provider response bodies, stack traces, or retry noise unless startup fails or debug output is requested.
-- Consultation: `codex-fugu --no-alt-screen -a never -s read-only -c 'sandbox_permissions=["disk-full-read-access"]'`
-- Implementation: `codex-fugu --no-alt-screen -a never -s workspace-write -c 'sandbox_permissions=["disk-full-read-access"]' -C <owned-worktree>`
-- Full-trust repair: only with explicit human approval and narrow blast radius.
+Servant access model:
-`-a never` is required for servant launches. If routine consultation or bounded implementation asks the human for command approval, the profile is wrong; stop launch, report setup guidance, and do not hand-wave the prompt away.
+- Servants are API conversations, not shell sessions.
+- Servants do not receive direct tools, stage/dev servers, browsers, Playwright, PR rights, or release rights.
+- The Overlord performs tool use, checks, edits, and PR operations in the host session after judging servant advice.
+- Full-trust repair is never delegated to servants.
 Before launch:
-- Verify local help/status exposes approval, sandbox, config override, no-alt-screen, and cwd flags.
-- Accept either an API-key environment variable or local Codex auth evidence; guide setup when neither exists.
-- Verify the Fugu model catalog exposes `fugu-ultra`.
+- Verify `SAKANA_API_KEY` or `MMI_OVERLORD_LLM_API_KEY` is available.
+- Verify the configured base URL, defaulting to `https://api.sakana.ai/v1`.
+- Verify `GET /models` exposes `fugu` and `fugu-ultra`, or the configured model overrides.
 - Fail closed when semantics are missing or unknown.
-Submit probe:
+Conversation probe:
-- Prefer initial-prompt launch through the PTY leash.
-- Require the servant to emit `ACK <name> ready`.
-- Record `composerSubmitMode` and `lastAckAt`.
-- Fail startup if no mode is proven.
+- Prefer initial-prompt launch through `/chat/completions`.
+- Require the servant to emit useful assistant text for its startup assignment.
+- Record `llmModel`, `llmRequestId` when available, `llmMessages`, `lastEventAt`, `lastAckAt`, and `composerSubmitMode=surface-api`.
+- Fail startup if no assistant text is captured.
-Redirects after startup use `mmi-cli overlord send <target> <message>`; the controller drains the durable mailbox into live servant PTYs. Do not bypass the mailbox with ad-hoc keystrokes unless diagnosing the leash itself.
+Redirects after startup use `mmi-cli overlord send <target> <message>`; the CLI appends to the servant conversation and records completion or bounded failure. Do not bypass the mailbox with ad-hoc provider calls unless diagnosing the leash itself.
 Stop safety:
 - Stop only recorded resources with matching run id, run token, and fingerprint.
-- Refuse generic `WindowsTerminal`, `pwsh`, `powershell`, `opencode`, `codex`, and `codex-fugu` names without exact ownership.
+- Refuse generic process, shell, terminal, or provider names without exact ownership.
 - Refuse window-title-only ownership.
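
The stop-safety rules reduce to one decision per resource: stop it only when run id, run token, and fingerprint all match, and otherwise leave it alone and report it. A minimal sketch follows. The `RecordedResource` shape, the `decideStop` name, and the notion of an observed fingerprint compared against the recorded one are assumptions. The `left-uncertain` label is the one used in the failure-pressure scenarios.

```typescript
// Sketch only: shapes and names are hypothetical. Process names and window
// titles are carried on the record but never used as proof of ownership.
interface RecordedResource {
  runId?: string;
  runToken?: string;
  fingerprint?: string;
  processName?: string;
  windowTitle?: string;
}

interface RunOwnership {
  runId: string;
  runToken: string;
}

type StopDecision = "exact-stop" | "left-uncertain";

function decideStop(
  resource: RecordedResource,
  run: RunOwnership,
  observedFingerprint: string | undefined
): StopDecision {
  const proven =
    resource.runId === run.runId &&
    resource.runToken === run.runToken &&
    resource.fingerprint !== undefined &&
    resource.fingerprint === observedFingerprint;
  // Anything short of full proof is reported, not stopped.
  return proven ? "exact-stop" : "left-uncertain";
}
```

A resource identified only by a generic name such as `pwsh` or by its window title carries no matching token or fingerprint, so it falls through to `left-uncertain`.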