@mutmutco/opencode-mmi 2.56.0 → 2.58.0

@@ -17,6 +17,8 @@ Two kinds of work, one loop:
 
  **Shared doctrine:** Read `skills/_shared/doctrine.md` at session start and on resume. Fusion, parallelism, panel economics, flat fan-out, classifier-denied spawns, worktree hygiene, saga resume, enforcement matrix — single source; do not duplicate here.
 
+ **Continuity access:** Saga and North Star are Jervaise-only. If the current login is not `jervaise`, skip all `saga`, `northstar`, `plan`, and `/handoff` steps; keep durable state in the board issue, PR body/comments, verification logs, and final report.
+
  Flags:
  - `--explore` — brainstorm and judge approaches before building (use for open-ended,
  "find a better/faster way" work). Without it, run the convergent loop.
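The **Continuity access** rule above reduces to one login check before any continuity step. A minimal TypeScript sketch, assuming the host already knows the current GitHub login as a string; `continuitySink`, `mayRunStep`, and `JERVAISE_ONLY_STEPS` are hypothetical names, not `mmi-cli` APIs:

```typescript
// Hypothetical helper: decides where durable continuity state goes.
// Assumption: `login` is the current GitHub login, already resolved by the host.
type ContinuitySink = "saga" | "issue-pr-record";

const CONTINUITY_OWNER = "jervaise";

// Steps that must be skipped for every login other than `jervaise`.
const JERVAISE_ONLY_STEPS: readonly string[] = ["saga", "northstar", "plan", "/handoff"];

// GitHub logins are case-insensitive, so compare in lower case.
function continuitySink(login: string): ContinuitySink {
  return login.trim().toLowerCase() === CONTINUITY_OWNER ? "saga" : "issue-pr-record";
}

// Non-continuity steps always run; saga/North Star/plan/handoff only for the owner.
function mayRunStep(login: string, step: string): boolean {
  if (!JERVAISE_ONLY_STEPS.includes(step)) return true;
  return continuitySink(login) === "saga";
}
```

For every other login the "sink" is the board issue, PR body/comments, verification logs, and final report, exactly as the rule lists.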
@@ -86,10 +88,11 @@ Flags:
  - Perform the fix, **verify empirically** (run the command whose output the acceptance pins — no diff
  to panel), close the issue, and **file any durable-prevention enhancement** as a separate item.
  - Terminal-done layer 1 is the live command output; layer 3 is the close, not a merge.
- - **Resumable.** After every phase: (1) silent one-line `mmi-cli saga note "<audit>"`; (2) `mmi-cli saga snapshot set --kind grind …` (or `--json-file`) — see **## Saga keep (resume snapshot)**. On resume, `mmi-cli saga snapshot show --kind grind` first; never reconstruct from collapsed chat history.
+ - **Resumable.** For `jervaise`, after every phase: (1) silent one-line `mmi-cli saga note "<audit>"`; (2) `mmi-cli saga snapshot set --kind grind …` (or `--json-file`) — see **## Saga keep (resume snapshot)**. For everyone else, write the same phase evidence to the issue/PR record. On resume, corroborate against git, GitHub, and the board; never reconstruct from collapsed chat history.
  - **Announce routing.** At phase transitions (after Phase 0a′ / model selection, before Gate 1 under
- interactive, and at equivalent points under `--auto`), **announce explicitly** in user-visible text
- **and** mirror the same facts in a silent `mmi-cli saga note`:
+ interactive, and at equivalent points under `--auto`), **announce explicitly** in user-visible text.
+ For `jervaise`, also mirror the same facts in a silent `mmi-cli saga note`; for every other login,
+ mirror them in the run output/report only (saga is Jervaise-only):
  - **Ultra mode** when explicit `--ultra` YOLO is active (distinct from auto-ultra verify uplift).
  - **Explore mode** when `--explore` or auto-framed explore is selected.
  - **Routing tier** (Budget / Balanced / Paranoid / Ultra verify routing).
@@ -105,7 +108,7 @@ Flags:
 
  ## Saga keep (resume snapshot)
 
- Grind-specific snapshot wiring — shared resume rules in `skills/_shared/doctrine.md`. Schema: `templates/saga-snapshot.md`. Use `--kind grind` for show/set. CLI maps snapshot fields → HEAD primitives (`next`, `anchor`, checklist) — no parallel store.
+ **Jervaise-only.** Grind-specific snapshot wiring — shared resume rules in `skills/_shared/doctrine.md`. Schema: `templates/saga-snapshot.md`. Use `--kind grind` for show/set. CLI maps snapshot fields → HEAD primitives (`next`, `anchor`, checklist) — no parallel store. For every other login, the issue/PR record + git history is the resume surface — do not run saga/North Star commands.
 
  ## Gates — how every gate (and decision point) is presented
  Hosts collapse mid-turn text: anything you printed between tool calls may be invisible when
@@ -134,8 +137,9 @@ the user reaches the gate. So:
  ## Phase 0a′ — Classify & route
 
  Runs immediately after the grindability check, **before** Phase 0a model questions. (`--auto`:
- apply silently for classification; still **announce** routing per **Hard rules — Announce routing**;
- `saga note "grind class=X routing=Y ultra=Z reason=…"`.)
+ apply silently for classification; still **announce** routing per **Hard rules — Announce routing**.
+ For `jervaise`, `saga note "grind class=X routing=Y ultra=Z reason=…"`; for every other login,
+ announce in the run output/report only — saga is Jervaise-only.)
 
  Read the issue **type label** (`bug` / `feature` / `task`), **priority**, **title/body**,
  **labels** (e.g. `security`), known **files touched**, and flags (`--explore`, `--ultra`, `--auto`).
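The Phase 0a′ routing note and the per-round verify log are fixed-format one-liners. The sketch below formats both and returns a saga-note copy only for `jervaise`; the `RoutingDecision` shape and the function names are illustrative assumptions, and the class values come from a table that is not part of this diff:

```typescript
// Illustrative shape of the routing facts announced at phase transitions.
interface RoutingDecision {
  issueClass: string; // from the class table (not shown in this diff)
  routing: "Budget" | "Balanced" | "Paranoid" | "Ultra";
  ultra: boolean;     // ultra state as announced
  reason: string;
}

// Formats: grind class=X routing=Y ultra=Z reason=…
function routingNote(d: RoutingDecision): string {
  return `grind class=${d.issueClass} routing=${d.routing} ultra=${d.ultra} reason=${d.reason}`;
}

// Formats: verify round N: routing=X ultra=Y
function verifyRoundNote(round: number, routing: RoutingDecision["routing"], ultra: boolean): string {
  return `verify round ${round}: routing=${routing} ultra=${ultra}`;
}

// Every login gets the user-visible line; only jervaise also gets text
// to pass to `mmi-cli saga note` (saga is Jervaise-only).
function record(login: string, line: string, report: string[]): string | undefined {
  report.push(line);
  return login === "jervaise" ? line : undefined;
}
```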
@@ -204,7 +208,7 @@ See `templates/synthesize-panel.md`.
  **Paranoid / Ultra hard-lens double-pass:** run `security` and `correctness` twice — different
  temperature or two different models — before Phase 2b.
 
- Log each verify round: `mmi-cli saga note "verify round N: routing=X ultra=Y"`.
+ Log each verify round: for `jervaise`, `mmi-cli saga note "verify round N: routing=X ultra=Y"`; for every other login, log it in the run output/report (saga is Jervaise-only).
 
  ## Phase 0b — Frame [GATE 1]
  (`--auto`: no gate — auto-decide explore-vs-convergent, in `--explore` auto-pick the judge's
@@ -220,8 +224,7 @@ planning shape — **parallel planners + verifier-tier judge** — before Gate 1
  2. Each planner returns: approach summary, risks, proposed success criteria, estimated complexity.
  3. **Judge agent** (verifier-tier model, **≠** any planner/builder) scores against the goal rubric;
  picks winner or synthesizes hybrid — synthesize-and-reconcile, not vote/debate.
- 4. Winner + criteria written to issue body; North Star push; proceed to Gate 1 (interactive) or
- Phase 1 (`--auto`). `mmi-cli saga note "multi-plan N=<n> winner=<summary>"`.
+ 4. Winner + criteria written to issue body; for `jervaise`, North Star push + `mmi-cli saga note "multi-plan N=<n> winner=<summary>"`; for every other login, record the winner + criteria in the issue/PR record (saga + North Star are Jervaise-only). Proceed to Gate 1 (interactive) or Phase 1 (`--auto`).
 
  **Skip multi-agent planning** for narrow bugs, low priority, or user "quick"/"small" — single
  planner + judge (or direct criteria framing) is enough.
@@ -249,7 +252,7 @@ planner outputs when both paths are active.
  **Vague or ambiguous body?** Read the full issue (body + comments), state the deliverable,
  and write criteria the user can confirm at Gate 1. Umbrella scope → child issues (`--parent`),
  one shippable unit per grind.
- Write them into the issue body; push the criteria to North Star (`mmi-cli northstar push <slug>` —
+ Write them into the issue body. For `jervaise`, also push the criteria to North Star (`mmi-cli northstar push <slug>` —
  the default push queues a background sync and prints "queued": that is success, not a failure;
  `mmi-cli northstar status` checks it, `mmi-cli northstar sync` confirms durably).
  3. Present per **## Gates** (class, routing, ultra, criteria). **Wait for the user's go.**
@@ -260,8 +263,7 @@ planner outputs when both paths are active.
  2. Otherwise brainstorm **2-3 candidate approaches** (2 if the goal is clear, 3 if wide) on the
  builder model — or use **multi-agent planning** when auto-gated.
  3. Score with the **verifier-model judge** (or fusion panel judge); pick or synthesize a direction.
- 4. Define a success target — numeric metric if stated, else judged rubric. File/claim, North Star
- push. Present per **## Gates**. **Wait for the user's go.**
+ 4. Define a success target — numeric metric if stated, else judged rubric. File/claim; for `jervaise`, also push North Star. Present per **## Gates**. **Wait for the user's go.**
 
  ## Generative fusion path (auto-gated)
 
@@ -275,7 +277,7 @@ verifier-tier **judge** fuses into one markdown deliverable. For **research-clas
  any planner) synthesizes — not vote/debate.
  2. Judge output is a **fused markdown deliverable**: chosen approach, tradeoffs, criteria
  refinement, spike findings.
- 3. Land in issue body + `mmi-cli northstar push <slug>`.
+ 3. Land in issue body; for `jervaise`, also `mmi-cli northstar push <slug>` (North Star is Jervaise-only).
  4. **Code-shipping explore:** fused doc feeds Phase 1; Phase 2 verify stays diff-pinned.
  5. **Research-only** (no code PR): fusion output is the primary artifact; stop after criteria met
  + Gate 2 (`--auto` report).
@@ -335,6 +337,8 @@ build that adds new modules/tests is invisible to lenses and can draw a false `c
  Stage first (`git -C <worktree> add -A && git -C <worktree> diff --cached -- ':!cli/dist' > tmp/grind-verify-<round>.patch`)
  or `git -C <worktree> add -N <new files>` before the diff (#2057).
 
+ **Spawn lenses tool-restricted (#2137).** Under `isolation=worktree`, spawn each lens with a subagent type that has **no shell/git/repo-filesystem access** — never the default `general-purpose` (or any `*`-tool) agent. A tool-capable lens silently bypasses prompt-only pinning: a `general-purpose` requirements lens has ignored the embedded patch, run `git diff` against the worktree's PARENT checkout (a different branch), and returned a false `cannot-verify`. Pass the patch file as the lens's ONLY input and state it has zero repo/git/filesystem access; if it cannot judge from the patch, it returns `cannot-verify`. Where the host exposes no zero-shell agent type, keep the no-access framing in the prompt AND discard + re-run (patch-only) any lens whose transcript shows a repo/git read — prompt-only pinning is not structurally honored by tool-capable subagents.
+
  **Lens-prompt clauses → `references/verify.md`.** Every lens prompt MUST carry: the **verbatim-includes-test-files** rule, the **abstention** rule (`cannot-verify`, never a false "absent/missing" blocker), the **diff-shape** clause (a referenced-but-undefined symbol is pre-existing — never flag it), and the **worktree-isolation** clause (patch-only, deny repo FS, stale-checkout warning). The exact wording lives in `references/verify.md` — load it before spawning lenses.
 
  Under **Paranoid** or **Ultra**, run **hard lenses twice** before Phase 2b.
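Where the host exposes no zero-shell agent type, the discard-and-re-run fallback from #2137 can be made mechanical. A sketch, assuming each lens run exposes its tool-call transcript; the `ToolCall` and `LensRun` shapes and the tool names in `REPO_ACCESS_TOOLS` are hypothetical and host-specific:

```typescript
// Hypothetical transcript record: one entry per tool call a lens made.
interface ToolCall {
  tool: string;   // host tool name, e.g. "Bash" or "Read" (assumed names)
  input?: string; // raw command or path, if any
}

interface LensRun {
  lens: string;    // e.g. "security", "correctness", "requirements"
  verdict: string; // e.g. "pass", "blockers", "cannot-verify"
  transcript: ToolCall[];
}

// Assumption: tools that can read the repo, git, or the filesystem on this host.
const REPO_ACCESS_TOOLS = new Set(["bash", "shell", "read", "grep", "glob", "ls"]);

// True when the lens touched disk or git instead of judging the pinned patch only.
function touchedRepo(run: LensRun): boolean {
  return run.transcript.some(
    (call) => REPO_ACCESS_TOOLS.has(call.tool.toLowerCase()) || /\bgit\b/.test(call.input ?? ""),
  );
}

// Keep patch-only lenses; discard the rest and queue them for a patch-only re-run.
function screenLenses(runs: LensRun[]): { kept: LensRun[]; rerun: string[] } {
  const kept: LensRun[] = [];
  const rerun: string[] = [];
  for (const run of runs) {
    if (touchedRepo(run)) rerun.push(run.lens);
    else kept.push(run);
  }
  return { kept, rerun };
}
```

Discarding the whole lens result, not just its blockers, matches the rule that disk-sourced findings are never triaged.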
@@ -382,7 +386,9 @@ The synthesizer returns a **`PanelReport`** — structured reconciliation of len
 
  A verify round **fails if `PanelReport.blockers` is non-empty**. If synthesis errors or returns
  invalid JSON, **degrade gracefully**: union raw lens `blockers` (manual dedupe by file+line+title),
- `saga note` the degradation, and continue Phase 3 synthesis is an uplift, not a hard dependency.
+ then for `jervaise` `saga note` the degradation and continue Phase 3; for every other login, note the
+ degradation in the run output/report and continue (saga is Jervaise-only). Synthesis is an uplift,
+ not a hard dependency.
  Optional CLI path: `mmi-cli verify panel` plans lens jobs; pipe lens JSON to `mmi-cli verify synthesize`
  for deterministic blocker dedupe before the host synthesizer enriches consensus/contradictions.
  Real verifier lanes only. Empty or controller-authored all-pass stubs are invalid evidence and do
@@ -453,7 +459,8 @@ See shared doctrine § Self-learning + retro. Grind-specific examples: gate mess
  ## End-of-grind summary
 
  At grind completion (PR opened + interactive stop, or `--auto` terminate/merge report), emit a **very
- brief** summary block user-visible and mirrored in `mmi-cli saga note`:
+ brief** summary block in user-visible text. For `jervaise`, also mirror it in `mmi-cli saga note`
+ (saga is Jervaise-only); for every other login, the user-visible summary is the record:
 
  - **Tier + modes:** chosen tier (`light`/`standard`/`deep`/`ultra`); `--explore`, auto-ultra (verify uplift), `--auto` if applied.
  - **Models used:** builder / verifier / third / synthesizer / judge (host slot names).
@@ -73,8 +73,7 @@ report. Never pretend cross-vendor ultra ran when it did not.
  Auto-decide explore-vs-convergent from the ask (open-ended → explore; `--explore` forces it on).
  Run **multi-agent planning + judge** when auto-gated (always under explicit `--ultra`). In explore,
  brainstorm + judge or **generative fusion** when auto-gated — **auto-pick the winning approach** —
- no wait. File/claim the item(s), write the criteria, push to North Star, then go straight to Phase 1.
- `saga note` multi-plan winner, tool policy, and fusion path when active.
+ no wait. File/claim the item(s), write the criteria, then for `jervaise` push to North Star + `saga note` multi-plan winner, tool policy, and fusion path when active; for every other login, record the criteria + winner in the issue/PR record (saga + North Star are Jervaise-only). Then go straight to Phase 1.
 
  ## Phase 4 — PR → CI-merge loop (replaces Gate 2)
 
@@ -5,7 +5,7 @@ Loaded on demand from `SKILL.md` § Phase 0a′. The class table + the effort-ti
  ## Auto-ultra detection
 
  **auto-ultra = true** (verify-panel uplift **only** — not whole-loop YOLO) when **any** of
- (log first match in saga):
+ (for `jervaise`, log first match in saga; for every other login, in the run output/report):
 
  1. ~~User passed **`--ultra`**.~~ **Explicit `--ultra` is separate** — see **Flags** and
  **§ Explicit `--ultra` vs auto-ultra**; it is **not** auto-ultra.
@@ -13,7 +13,7 @@ Loaded on demand from `SKILL.md` § Phase 0a′. The class table + the effort-ti
  3. **`--explore`** + stated numeric SLA/metric with high blast radius.
  4. **Architectural / cross-cutting** scope (multi-module, public API break, migration).
  5. **Escalation:** after **2 failed verify rounds** on default routing, escalate to auto-ultra for round 3+
- (once per grind; saga note). Interactive: announce; `--auto`: silent.
+ (once per grind; for `jervaise`, `saga note` the escalation; for every other login, note it in the run output/report — saga is Jervaise-only). Interactive: announce; `--auto`: silent.
 
  **auto-ultra = false** (stay 2-model): docs/prose-only diffs; priority `low`; narrow bug with clear repro;
  user says "quick" / "small" in the prompt (explicit user instruction wins).
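The auto-ultra list can be read as a small decision function that returns the first matching reason to log. This sketch assumes boolean signals already extracted from the issue and flags (field names are hypothetical), omits trigger 2 because it falls outside this hunk, and assumes every stay-2-model case overrides the triggers, which the text states explicitly only for the user's "quick"/"small" instruction:

```typescript
// Hypothetical Phase 0a′ signals; field names are illustrative, not mmi-cli fields.
interface UltraSignals {
  explore: boolean;            // --explore active
  numericSla: boolean;         // stated numeric SLA/metric
  highBlastRadius: boolean;
  crossCutting: boolean;       // multi-module, public API break, or migration
  failedVerifyRounds: number;  // failed rounds on default routing
  docsOnly: boolean;           // docs/prose-only diff
  priority: string;            // e.g. "low"
  narrowBugWithRepro: boolean;
  userSaidQuickOrSmall: boolean;
}

interface UltraDecision {
  autoUltra: boolean; // verify-panel uplift only, never whole-loop YOLO
  reason: string;     // first match, to be logged
}

function detectAutoUltra(s: UltraSignals): UltraDecision {
  // Explicit user instruction wins.
  if (s.userSaidQuickOrSmall) return { autoUltra: false, reason: "user-quick-small" };
  // Assumption: the remaining stay-2-model cases also override the triggers.
  if (s.docsOnly) return { autoUltra: false, reason: "docs-only" };
  if (s.priority === "low") return { autoUltra: false, reason: "priority-low" };
  if (s.narrowBugWithRepro) return { autoUltra: false, reason: "narrow-bug" };

  // Triggers in listed order (trigger 2 is outside this hunk and omitted).
  if (s.explore && s.numericSla && s.highBlastRadius) {
    return { autoUltra: true, reason: "explore-numeric-sla" };
  }
  if (s.crossCutting) return { autoUltra: true, reason: "cross-cutting" };
  // Escalation: after 2 failed verify rounds, round 3+ runs with auto-ultra.
  if (s.failedVerifyRounds >= 2) return { autoUltra: true, reason: "escalation" };
  return { autoUltra: false, reason: "no-trigger" };
}
```

The "once per grind" limit on escalation is run state, so it is left to the caller here.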
@@ -70,7 +70,6 @@ When the user passes **explicit `--ultra`**, select **higher reasoning effort**
  exposes it (e.g. elevated thinking/reasoning tier on supported models). Apply to builder, verifier,
  third, synthesizer, and judge roles when the host offers per-model reasoning knobs.
 
- - **Announce** when elevated reasoning is selected — model + level — in the routing announcement block
- and `mmi-cli saga note "reasoning=<model>:<level> …"`.
+ - **Announce** when elevated reasoning is selected — model + level — in the routing announcement block; for `jervaise`, also `mmi-cli saga note "reasoning=<model>:<level> …"` (saga is Jervaise-only).
  - **Fallback:** when the host has no reasoning-effort knob, use the strongest available model tier and
  note the gap in the announcement — do not block the grind.
@@ -24,10 +24,12 @@ diff-shape clause:
  This prevents diff-absent symbols from becoming ship-stoppers at the lens; Phase 2b's absence-claims
  drop rule remains the backstop.
 
- **Worktree isolation (#1621, #1895).** Phase 2: `isolation=worktree` — other checkouts may be stale.
+ **Worktree isolation (#1621, #1895, #2137).** Phase 2: `isolation=worktree` — other checkouts may be stale.
  **Orchestrator MUST:** patch-only input; deny repo FS tools on lenses; stale-checkout clause in every lens prompt; re-run Phase 2 if a transcript shows disk reads — never triage disk-sourced blockers.
  Abstention + diff-shape rules above still apply; `cannot-verify` beats false absence.
 
+ **Deny repo FS tools structurally, not just by prompt (#2137).** Spawn lenses with a subagent type that has no Bash/git/repo-filesystem access; never spawn a lens as a `general-purpose` / `*`-tool agent under `isolation=worktree`. A tool-capable lens has ignored the embedded patch and run `git diff` on the worktree's parent checkout (on another branch), returning a false `cannot-verify` describing a diff that is not under review. When the host cannot restrict tools, the prompt must forbid all repo/git/file access and the orchestrator discards any blocker whose evidence came from a disk/git read, re-running patch-only — prompt-only pinning is not reliably honored by tool-capable subagents.
+
  ## Tool-enabled lenses (default expectation on applicable lenses)
 
  When an objective signal exists, hard lenses must anchor to it — failing test, typecheck error,
@@ -58,7 +60,7 @@ while Phase 2b synthesizer stays **tool-free** (lens JSON + diff stat only).
 
  **Hygiene:** configurable allow/deny domain lists (org/repo-level) — exclude benchmark-leak
  domains (e.g. Stack Overflow, issue mirrors) from verify search. Default deny list in
- `cli/src/grind-policy.ts`. **Per-lens budget:** hard cap (3 queries/lens/round); `saga note` on exceed.
+ `cli/src/grind-policy.ts`. **Per-lens budget:** hard cap (3 queries/lens/round); on exceed, for `jervaise` `saga note` it, for every other login note it in the run output/report (saga is Jervaise-only).
 
  **Diff-pinning preserved:** tools supplement — never replace — the pinned patch. Under
  `isolation=worktree`, repo FS tools stay forbidden even under ultra; cite builder test output from
@@ -73,7 +75,7 @@ full re-panel) before Phase 3 triage — at most once per round.
 
  **`contradictions`:** if the disagreement is **criteria/spec ambiguity** (lenses read the spec
  differently), stop and escalate to the human per **## Gates** — do not guess. If one lens is
- clearly wrong against consensus + diff, note it in the saga and triage real blockers only.
+ clearly wrong against consensus + diff, note it (for `jervaise` in the saga; for every other login in the run output/report — saga is Jervaise-only) and triage real blockers only.
 
  **Absence-claims (#1621, #1895, #2057).** Drop "missing/absent/unimplemented/not fixed" blockers
  contradicted by the pinned patch or green builder tests. Drop blockers when lens logs show repo
@@ -1,5 +1,7 @@
  # Grind saga keep — resume snapshot
 
+ **Jervaise-only artifact.** `mmi-cli saga snapshot` is a Jervaise-only continuity tool. For every other login, the issue/PR record + git history is the resume surface — do not use this template or `mmi-cli saga`.
+
  Enforced via **`mmi-cli saga snapshot`** (maps to saga HEAD — no parallel store). Checklist uses namespaced prefixes: `gs-open:`, `gs-resolved:`, `gs-ceiling:`.
 
  ## Read-first on resume
@@ -100,5 +100,5 @@ The grind loop uses this report as follows:
  If the synthesizer returns invalid JSON or errors:
 
  1. Fall back to **raw lens blockers** — union all lens `blockers`, manual dedupe by file+line+title.
- 2. `mmi-cli saga note "phase 2b: synthesize degraded, raw lens triage"`.
+ 2. For `jervaise`, `mmi-cli saga note "phase 2b: synthesize degraded, raw lens triage"`; for every other login, note the degradation in the run output/report (saga is Jervaise-only).
  3. Continue Phase 3 — synthesis is an uplift, not a hard dependency.
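The degraded path (union raw lens `blockers`, dedupe by file+line+title, fail the round if anything survives) can be sketched as follows. The `Blocker` shape is an assumption, since the full `PanelReport` schema is not shown in this diff, and trimming plus case-folding the title before comparison is an illustrative choice:

```typescript
// Hypothetical lens blocker shape (only the dedupe key fields matter here).
interface Blocker {
  file: string;
  line: number;
  title: string;
  lens?: string;
}

// Union every lens's blockers and keep the first occurrence of each file+line+title.
function unionBlockers(lensBlockers: Blocker[][]): Blocker[] {
  const seen = new Map<string, Blocker>();
  for (const blockers of lensBlockers) {
    for (const b of blockers) {
      // Assumption: titles are compared after trimming and lower-casing.
      const key = `${b.file}\u0000${b.line}\u0000${b.title.trim().toLowerCase()}`;
      if (!seen.has(key)) seen.set(key, b);
    }
  }
  return Array.from(seen.values());
}

// A verify round fails whenever any blocker survives dedupe.
function roundFails(blockers: Blocker[]): boolean {
  return blockers.length > 0;
}
```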
@@ -1,11 +1,13 @@
  ---
  name: handoff
- description: Record or claim an explicit session handoff in the saga + North Star system that SessionStart resumes — open a handoff to leave work for a future session, accept a pending one, or cancel your own. Use when the user says "handoff" or "/handoff", asks to hand off work, leave a checkpoint for the next session, or claim a prior handoff, or invokes /handoff.
+ description: Jervaise-only. Record or claim an explicit session handoff in the saga + North Star system that SessionStart resumes — open a handoff to leave work for a future session, accept a pending one, or cancel your own. Use when Jervaise says "handoff" or "/handoff", asks to hand off work, leave a checkpoint for the next session, or claim a prior handoff, or invokes /handoff.
  ---
 
  # /handoff — explicit saga handoff lifecycle
 
- Use when the user says `handoff` or `/handoff`, or asks to leave work for a future session or claim a prior handoff. This skill records an explicit handoff in the same saga + North Star system that SessionStart resumes.
+ This is a Jervaise-only continuity skill. If any other developer asks for a handoff, do not open or claim one; use the board, issue, or PR as the handoff record instead.
+
+ Use when Jervaise says `handoff` or `/handoff`, or asks to leave work for a future session or claim a prior handoff. This skill records an explicit handoff in the same saga + North Star system that SessionStart resumes.
 
  ## Start
 
@@ -11,7 +11,7 @@ Default pool: 3 servants total: one `fugu-ultra` and two normal `fugu` servants.
 
  Allowed range: `--3` through `--6`. Exactly one servant is Ultra in every run.
 
- Supported engines: `codex-fugu` through the PTY leash and OpenCode/Fugu through session-backed `opencode run --session` routing. OpenCode is preferred when available because it exposes parseable JSON events, session ids, and completion state.
+ Active engine: native OpenAI-compatible Fugu API calls against the Sakana endpoint. It uses `SAKANA_API_KEY` or `MMI_OVERLORD_LLM_API_KEY`, records model/request/conversation state in the run registry, and does not depend on Codex-Fugu or OpenCode session routing.
 
  ## Start Contract
 
@@ -23,7 +23,7 @@ Supported engines: `codex-fugu` through the PTY leash and OpenCode/Fugu through
  6. Own worktrees, stage/dev servers, Playwright, browsers, PRs, merges, and cleanup.
  7. Keep servants leased until `/overlord stop`, `mmi-cli overlord stop`, or explicit controlled shutdown.
 
- CLI startup persists a gitignored run registry at `tmp/overlord/runs.json`, starts a durable controller, and lets the controller spawn servant PTYs or OpenCode sessions. `mmi-cli overlord send <target> <message>` queues redirects into that registry so the controller can route them to live servants. On OpenCode, redirects use `opencode run --session <session-id> -m sakana/fugu --format json <message>` and advance a real message lifecycle (`queuedAt`, `startedAt`, `completedAt`, `failedAt`) from JSON events. The launch profile uses `-a never` plus explicit sandbox settings where the engine supports them so routine servant tool calls do not bounce approval prompts back to the human.
+ CLI startup persists a gitignored run registry at `tmp/overlord/runs.json` and starts API-backed Fugu conversations. `mmi-cli overlord send <target> <message>` appends to each servant's stored conversation, calls `/chat/completions`, captures the assistant text, and advances a real message lifecycle (`queued`, `started`, `completed`, `partial`, `failed`) from the API result.
 
  ## Reference Loading
 
 
@@ -33,24 +33,24 @@ Read only what the task requires:
33
33
  - `references/servant-normal.md`: prompt for normal Fugu servants.
34
34
  - `references/servant-ultra.md`: prompt for the single Ultra servant.
35
35
  - `references/loop-contract.md`: evidence, edit, verify, retry, escalate, and stop rules.
36
- - `references/terminal-leash.md`: servant startup, submit probing, approval profiles, and stop safety.
36
+ - `references/terminal-leash.md`: servant startup, API conversation routing, shell-surface boundaries, and stop safety.
37
37
  - `references/servant-liveness.md`: liveness lease and awaiting-human behavior.
38
- - `references/controller-orphan-guard.md`: abrupt close, stale heartbeat, adoption, exact stop, and uncertainty.
39
- - `references/codex-fugu-preflight.md`: setup, update, model, API key, and Windows/Git Bash path checks.
40
- - `references/opencode-fugu-engine.md`: OpenCode preflight, JSON event parsing, session-backed mailbox, ledger, and liveness model.
38
+ - `references/controller-orphan-guard.md`: abrupt close, stale sessions, adoption, exact stop, and uncertainty.
39
+ - `references/fugu-api-engine.md`: native API preflight, conversation persistence, ledger, request timeouts, and liveness model.
41
40
  - `references/shell-adapters.md`: PowerShell, cmd, Git Bash, macOS zsh/bash, Linux bash/sh, and unknown-shell rules.
42
41
  - `references/state-schema.md`: durable run-state fields.
43
42
  - `references/failure-pressure-scenarios.md`: tests and lessons from the first Overlord design run.
44
43
 
45
44
  ## Hard Rules
46
45
 
47
- - **Fugu only — never a sub-agent fallback.** Overlord servants are Fugu (`fugu-ultra` + `fugu`) driven through the Overlord controller. Never satisfy an Overlord run with platform sub-agents, `multi_agent_v1`, generic workers, or any non-Fugu agent pool not as a primary path and not as a fallback when the Fugu controller is missing or inactive. If the Fugu controller cannot start, is inactive, or cannot prove readable/writable handles, stop and report `blocked: fugu-controller-unavailable` with diagnosis; do not simulate Overlord with other agents.
48
- - Do not spawn servants on an unreadable or undrivable surface.
49
- - **Probe the engine before launch.** Run the Codex/Fugu (or OpenCode) preflight + `--help`/status probe before any servant launch; never launch into an unprobed surface.
50
- - Codex/Fugu PTY servants launch with explicit `-a never` (no-approval) and an appropriate `-s` sandbox profile — read-only for consultation, workspace-write only in an owned worktree. OpenCode servants use `opencode run --format json --session` and rely on the OpenCode session mailbox instead of PTY submit probing.
46
+ - **Fugu only — never a substitute pool.** Overlord servants are Fugu (`fugu-ultra` + `fugu`) driven through the native Fugu API. Never satisfy an Overlord run with platform sub-agents, `multi_agent_v1`, generic workers, Codex-Fugu, OpenCode sessions, or any non-Fugu agent pool. If the Fugu API cannot prove the required endpoint, key, and models, stop and report the setup failure; do not simulate Overlord with other agents.
47
+ - Do not spawn servants on an unreadable or undrivable engine.
48
+ - **Probe the native Fugu API before launch.** Verify the API key, base URL, `/models`, `fugu`, and `fugu-ultra` before any servant launch.
49
+ - API servants keep a stored system/user/assistant conversation per servant and use bounded `/chat/completions` calls for startup and redirects.
50
+ - When the Overlord itself uses shell tools, it must use the native current shell for the host: `pwsh` on Windows, `zsh` on macOS, and `bash` on Linux/Unix. Windows `powershell.exe` is not an acceptable Overlord default.
51
51
  - **Any routine approval prompt during startup, planning, or assigned work is a launch-profile failure** — record and recover it, never hand-wave it away or train the human to approve routine commands. Consultation servants get read-only plus the disk-read permission ordinary host config reads need, or the Overlord performs those reads itself.
52
- - Probe submit behavior before sending real prompts.
53
- - **Delivery is not execution.** `mmi-cli overlord send` records a `queued`/`started` lifecycle, not completion. A redirect counts as delivered only when the servant journal shows the assignment left the composer and produced a new useful signal. If text remains at the `›` composer prompt after a bounded interval, mark the servant `delivery-stuck-composer` and the message `failed` — never report it as ready or delivered.
52
+ - Probe the API path before sending real prompts.
53
+ - **Delivery is not execution.** `mmi-cli overlord send` is complete only when the API returns captured servant text or a bounded failure. If the request times out, errors, or returns no assistant text, mark the servant `blocked` and the message `failed` — never report it as ready or delivered.
54
54
  - **No handoff after delivery = stalled, not ready.** When a servant stays `ready` but produces no non-TUI output after a bounded handoff-expected interval, mark it `stalled-after-delivery`; do not keep reporting it as ready.
55
55
  - Never rely on stale ACKs as liveness proof.
56
56
  - Never broad-kill by process name, title, shell name, or model command.
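The new **Delivery is not execution** rule treats a redirect as complete only when the API returns assistant text or a bounded failure. A sketch of that contract, assuming an injected `Complete` transport that performs one `/chat/completions` call (a hypothetical seam so the logic can be exercised offline); the timeout is raced explicitly so a transport that ignores the abort signal still cannot hang the send:

```typescript
type Role = "system" | "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

// Lifecycle states named in the Overlord skill.
type MessageState = "queued" | "started" | "completed" | "partial" | "failed";

// Hypothetical transport: one /chat/completions call resolving to the assistant text, or null.
type Complete = (messages: ChatMessage[], signal: AbortSignal) => Promise<string | null>;

interface SendResult {
  state: MessageState;
  servant: "ready" | "blocked";
  text?: string;
  error?: string;
}

async function sendRedirect(
  conversation: ChatMessage[],
  redirect: string,
  complete: Complete,
  timeoutMs: number,
): Promise<SendResult> {
  // Redirects are appended to the servant's stored conversation as user messages.
  conversation.push({ role: "user", content: redirect });
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error("timeout"));
    }, timeoutMs);
  });
  try {
    const text = await Promise.race([complete(conversation, controller.signal), timeout]);
    if (text === null || text.trim() === "") {
      // API success without assistant text is not delivery.
      return { state: "failed", servant: "blocked", error: "no assistant text" };
    }
    conversation.push({ role: "assistant", content: text });
    return { state: "completed", servant: "ready", text };
  } catch (err) {
    return { state: "failed", servant: "blocked", error: controller.signal.aborted ? "timeout" : String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```

Only a captured assistant reply is appended to the stored conversation; timeouts, errors, and empty replies all end as `failed` with the servant `blocked`.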
@@ -1,33 +1,30 @@
- # Controller And Orphan Guard
+ # Run Registry And Orphan Guard
 
- The controller, not conversational memory, owns servants.
+ The run registry, not conversational memory, owns servants.
 
- Controller responsibilities:
+ Registry responsibilities:
 
- - Spawn servant PTYs.
- - Hold readable/writable handles.
+ - Record Fugu API servant conversations.
  - Persist run state under gitignored `tmp/overlord`.
- - Write heartbeat.
- - Tee bounded journals.
+ - Tee bounded ledger and event journals.
  - Expose status, stop, adopt, and recover.
 
  On every `/overlord`, `status`, `stop`, resume, or human message:
 
  - Rehydrate run state.
- - Check controller heartbeat.
- - Check servant handles.
+ - Check model metadata, conversation history, request ids, and last useful signal.
  - Classify orphan state before doing more work.
 
  Orphan classifications:
 
- - `controller-alive-overlord-detached`
- - `controller-dead-servants-dead`
- - `controller-dead-servants-owned-alive`
- - `controller-dead-servants-uncertain`
+ - `overlord-conversations-live`
+ - `conversations-blocked`
+ - `resources-owned-alive`
+ - `resources-uncertain`
 
  Actions:
 
- - Adopt only with matching run token and recoverable handles.
+ - Adopt only with a matching run token and recoverable conversation state.
  - Exact-stop only proven run-owned resources.
  - Leave uncertain resources alone and report them.
- - Never broad-clean by process name or title.
+ - Never broad-clean by process name, title, shell name, or provider name.
@@ -2,17 +2,17 @@
2
2
 
3
3
  Test these before accepting `/overlord`:
4
4
 
5
- - Windows PowerShell startup uses PowerShell syntax and native paths.
6
- - Windows Git Bash does not write `/c/Users/...` into native Codex config.
7
- - macOS zsh and Linux bash use POSIX syntax.
8
- - Unknown shell fails before servant launch.
9
- - Codex update leaves Fugu receipt stale; preflight detects and guides repair.
10
- - Missing `codex-fugu`, API key, or Ultra model stops startup with setup steps.
11
- - `TERM=dumb` warning is translated, not shown as scary raw noise.
12
- - Prompt typed into composer but not submitted is detected.
13
- - Routine read-only reconnaissance triggers approval; Overlord marks launch-profile failure.
14
- - Previously ACKed servants become unreachable; stale ACK is rejected.
15
- - Awaiting-human preserves servant leases.
16
- - Controller heartbeat goes stale; orphan classification runs first.
17
- - `/overlord stop` leaves user-owned terminals, OpenCode, Codex, Fugu, shells, and Windows Terminal untouched.
5
+ - Windows host work uses modern PowerShell (`pwsh`) syntax and native paths.
6
+ - macOS zsh and Linux bash use POSIX syntax for host work.
7
+ - Missing `SAKANA_API_KEY` and `MMI_OVERLORD_LLM_API_KEY` stops startup with setup steps.
8
+ - A bad Fugu API base URL stops startup with setup steps.
9
+ - `GET /models` returning an error stops startup with setup steps.
10
+ - Missing `fugu` or `fugu-ultra` stops startup with setup steps.
11
+ - Startup `/chat/completions` timeout marks the servant `blocked`.
12
+ - Startup API success with no assistant text marks the servant `blocked`.
13
+ - `send all` records one completed or failed result per targeted servant.
14
+ - A redirect timeout marks the message `failed` and does not claim delivery.
15
+ - Previously ACKed servants with failed follow-up requests become `blocked`; stale ACK is rejected.
16
+ - Awaiting-human preserves servant conversation state.
17
+ - `/overlord stop` leaves user-owned terminals, shells, and unrelated provider processes untouched.
18
18
  - Ambiguous leftovers are reported as `left-uncertain`.
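The startup-blocking cases in this checklist (missing key, bad base URL, failing `GET /models`, missing models) can be exercised against one pure preflight function. This sketch is illustrative only. `PreflightInput` and `preflightProblems` are hypothetical names. Treating a "bad" base URL as one that does not parse as an HTTP(S) URL is an assumption; a real check would also have to surface DNS or TLS failures from the `GET /models` probe. The API key is never echoed.

```typescript
// Hypothetical inputs gathered before servant launch.
interface PreflightInput {
  apiKey?: string;             // from SAKANA_API_KEY or MMI_OVERLORD_LLM_API_KEY
  baseUrl: string;
  modelsStatus: number | null; // HTTP status of GET /models; null if no response arrived
  modelIds: string[];          // model ids listed by GET /models
  normalModel: string;         // default "fugu"
  ultraModel: string;          // default "fugu-ultra"
}

// Returns setup steps; an empty list means startup may proceed. The key is never included.
function preflightProblems(p: PreflightInput): string[] {
  if (!p.apiKey) {
    return ["Set SAKANA_API_KEY or MMI_OVERLORD_LLM_API_KEY."];
  }
  let urlOk = true;
  try {
    const u = new URL(p.baseUrl);
    urlOk = u.protocol === "https:" || u.protocol === "http:";
  } catch {
    urlOk = false;
  }
  if (!urlOk) {
    return [`Fix the Fugu API base URL (got "${p.baseUrl}").`];
  }
  if (p.modelsStatus === null || p.modelsStatus < 200 || p.modelsStatus >= 300) {
    return [`GET /models failed (${p.modelsStatus === null ? "no response" : "HTTP " + p.modelsStatus}).`];
  }
  // Both configured models must be listed, otherwise startup stays blocked.
  const problems: string[] = [];
  for (const m of [p.normalModel, p.ultraModel]) {
    if (p.modelIds.indexOf(m) === -1) problems.push(`Model not available: ${m}`);
  }
  return problems;
}
```

Returning early at each stage keeps the guidance focused on the first thing the human must fix.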
@@ -0,0 +1,62 @@
1
+ # Native Fugu API Engine
2
+
3
+ Overlord servants run through the Sakana Fugu API using the OpenAI-compatible chat-completions surface.
4
+
5
+ Defaults:
6
+
7
+ - Base URL: `https://api.sakana.ai/v1`
8
+ - Normal model: `fugu`
9
+ - Ultra model: `fugu-ultra`
10
+ - API key: `SAKANA_API_KEY`, or `MMI_OVERLORD_LLM_API_KEY`
11
+ - Request timeout: `MMI_OVERLORD_LLM_TIMEOUT_MS`, clamped between 5 seconds and 10 minutes, defaulting to 90 seconds
12
+
13
+ Optional overrides:
14
+
15
+ - `MMI_OVERLORD_LLM_BASE_URL`
16
+ - `MMI_OVERLORD_LLM_MODEL`
17
+ - `MMI_OVERLORD_LLM_ULTRA_MODEL`
18
+
19
+ ## Preflight
20
+
21
+ Before launching servants, probe `GET /models` with the configured key and base URL.
22
+
23
+ Startup is blocked unless the probe proves both configured models are available.
24
+
25
+ Do not print the API key.
26
+
27
+ ## Conversation State
28
+
29
+ Each servant owns one stored conversation:
30
+
31
+ - one system message with the servant identity and Overlord constraints
32
+ - the startup assignment as a user message
33
+ - every captured assistant response
34
+ - future redirects appended as user messages
35
+
36
+ The run registry records the model, request id when available, and conversation length.
37
+
38
+ ## Redirects
39
+
40
+ `mmi-cli overlord send <target> <message>` calls `/chat/completions` for the target servant or each servant in `all`.
41
+
42
+ The command returns only after the API response is captured or a bounded failure is recorded.
43
+
44
+ If a request times out, errors, or returns no assistant text:
45
+
46
+ - mark the servant `blocked`
47
+ - mark the message `failed`
48
+ - write the failure to the ledger
49
+
50
+ ## Ledger
51
+
52
+ Append one ledger event for startup and one per redirect response.
53
+
54
+ Include only operational metadata:
55
+
56
+ - servant slot id
57
+ - model
58
+ - request id
59
+ - response text
60
+ - error text
61
+
62
+ Never include API keys or provider secrets.
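The defaults, overrides, and timeout clamp listed in this file can be resolved in one step from the environment. A sketch under stated assumptions: `resolveFuguConfig` is a hypothetical helper; an empty override is assumed to fall back to the default; `SAKANA_API_KEY` is assumed to win when both key variables are set; and a missing, non-numeric, or non-positive timeout is assumed to mean the 90-second default.

```typescript
const MIN_TIMEOUT_MS = 5 * 1000;       // 5 seconds
const MAX_TIMEOUT_MS = 10 * 60 * 1000; // 10 minutes
const DEFAULT_TIMEOUT_MS = 90 * 1000;  // 90 seconds

interface FuguConfig {
  baseUrl: string;
  model: string;
  ultraModel: string;
  apiKey?: string;
  timeoutMs: number;
}

type Env = Record<string, string | undefined>;

function resolveFuguConfig(env: Env): FuguConfig {
  const raw = Number(env.MMI_OVERLORD_LLM_TIMEOUT_MS);
  // Clamp a usable value into [5 s, 10 min]; otherwise use the default.
  const timeoutMs =
    Number.isFinite(raw) && raw > 0
      ? Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, raw))
      : DEFAULT_TIMEOUT_MS;
  return {
    baseUrl: env.MMI_OVERLORD_LLM_BASE_URL || "https://api.sakana.ai/v1",
    model: env.MMI_OVERLORD_LLM_MODEL || "fugu",
    ultraModel: env.MMI_OVERLORD_LLM_ULTRA_MODEL || "fugu-ultra",
    apiKey: env.SAKANA_API_KEY || env.MMI_OVERLORD_LLM_API_KEY,
    timeoutMs,
  };
}
```

In Node this would typically be called as `resolveFuguConfig(process.env)`; keeping the environment as a parameter makes the clamp easy to test without touching real variables.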
@@ -4,9 +4,9 @@ An ACK creates a lease, not permanent proof.
4
4
 
5
5
  Readiness requires:
6
6
 
7
- - Current readable handle.
8
- - Current writable handle.
9
- - Proven submit mode.
7
+ - Current Fugu API conversation state.
8
+ - Configured model for the servant role.
9
+ - Stored system/user/assistant messages.
10
10
  - Matching run id and run token.
11
11
  - Recent useful signal or bounded liveness response.
12
12
 
@@ -15,13 +15,13 @@ Stale ACK-only readiness is forbidden.
15
15
  Awaiting-human:
16
16
 
17
17
  - Servants remain leased.
18
- - Controller heartbeat stays active.
18
+ - Run registry remains current.
19
19
  - Status rehydrates state and checks liveness.
20
- - If background liveness is unsupported, mark `suspended-awaiting-human` and require a rehydrate pass before work resumes.
20
+ - If background liveness is unsupported, preserve conversation state and require a rehydrate pass before work resumes.
21
21
 
22
- Lost servant:
22
+ Blocked servant:
23
23
 
24
- - Mark the slot lost/unresponsive.
25
- - Preserve bounded journal.
26
- - Attempt recovery only when handles can be proven.
24
+ - Mark the slot `blocked`.
25
+ - Preserve bounded journal and conversation state.
26
+ - Attempt recovery only when the API key, model, and conversation can be proven.
27
27
  - Otherwise spawn a replacement in the same role slot with a compact handoff.
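The readiness list above reads as a conjunction: stored conversation, configured model, matching run identity, and a recent signal, with an ACK timestamp never enough on its own. The sketch below illustrates that under assumptions: `ServantSnapshot`, `isServantReady`, and the 10-minute `SIGNAL_WINDOW_MS` are hypothetical (the rules give no window), and `lastLivenessCheckAt` is assumed to be written only after a successful bounded liveness response.

```typescript
type MsgRole = "system" | "user" | "assistant";

interface StoredMessage {
  role: MsgRole;
  content: string;
}

// Hypothetical subset of servant registry fields needed for a readiness decision.
interface ServantSnapshot {
  runId: string;
  runToken: string;
  llmModel?: string;
  llmMessages: StoredMessage[];
  lastAckAt?: number;           // recorded, but never sufficient on its own
  lastUsefulSignalAt?: number;  // epoch ms
  lastLivenessCheckAt?: number; // epoch ms; assumed written only after a successful check
}

// Hypothetical freshness window: the readiness rules do not give a number.
const SIGNAL_WINDOW_MS = 10 * 60 * 1000;

function isServantReady(
  s: ServantSnapshot,
  run: { runId: string; runToken: string },
  expectedModel: string,
  now: number,
): boolean {
  const has = (role: MsgRole) => s.llmMessages.some((m) => m.role === role);
  const conversationStored = has("system") && has("user") && has("assistant");
  const modelMatches = s.llmModel === expectedModel;
  const runMatches = s.runId === run.runId && s.runToken === run.runToken;
  // A stale ACK alone is rejected: readiness needs a recent signal or liveness response.
  const lastSignal = Math.max(s.lastUsefulSignalAt ?? 0, s.lastLivenessCheckAt ?? 0);
  const recent = lastSignal > 0 && now - lastSignal <= SIGNAL_WINDOW_MS;
  return conversationStored && modelMatches && runMatches && recent;
}
```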
@@ -10,7 +10,8 @@ Detect:
10
10
 
11
11
  Rules:
12
12
 
13
- - PowerShell/pwsh: use PowerShell syntax and native Windows paths.
13
+ - Windows `pwsh`: use modern PowerShell syntax and native Windows paths.
14
+ - Legacy Windows PowerShell (`powershell.exe`) is not an acceptable host shell for Overlord; use `pwsh`.
14
15
  - cmd: use cmd syntax and native Windows paths.
15
16
  - Windows Git Bash: distinguish shell paths from native Windows consumer-process paths.
16
17
  - macOS zsh/bash: use POSIX syntax and macOS paths.
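The shell rules can be approximated by a small classifier plus a path translator for the Git Bash case. This is a hypothetical sketch, not how mmi-cli detects shells: `shellFamily` and `msysToWindowsPath` are invented names, detection from the executable basename is an assumption, and only drive-rooted MSYS paths such as `/c/Users/...` are translated (mounts like `/tmp` are left alone). Unknown shells fail closed.

```typescript
type ShellFamily = "pwsh" | "cmd" | "posix";

// Hypothetical: choose a syntax family from the detected host shell executable.
function shellFamily(shellExe: string): ShellFamily {
  // Accept bare names or full paths with either separator.
  const exe = shellExe.split(/[\\/]/).pop()!.toLowerCase().replace(/\.exe$/, "");
  if (exe === "powershell") {
    throw new Error("Legacy Windows PowerShell is not supported for Overlord; use pwsh.");
  }
  if (exe === "pwsh") return "pwsh";
  if (exe === "cmd") return "cmd";
  if (exe === "bash" || exe === "zsh") return "posix";
  // Fail closed on anything unrecognized.
  throw new Error(`Unknown host shell: ${shellExe}`);
}

// Git Bash displays drive paths as /c/Users/...; native Windows processes expect C:\Users\...
function msysToWindowsPath(p: string): string {
  const m = /^\/([a-zA-Z])(\/.*)?$/.exec(p);
  if (!m) return p; // not a drive-rooted MSYS path; leave it unchanged
  const rest = (m[2] ?? "").replace(/\//g, "\\");
  return `${m[1].toUpperCase()}:${rest || "\\"}`;
}
```

Translation matters only at the boundary where a Git Bash session hands a path to a native Windows consumer; inside the shell the `/c/...` form stays correct.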
@@ -6,22 +6,15 @@ Minimum fields:
6
6
 
7
7
  - `runId`
8
8
  - `runToken`
9
- - `repo`
10
9
  - `worktree`
11
- - `branch`
12
- - `human`
13
- - `surface`
14
- - `hostPlatform`
15
- - `shellAdapter`
10
+ - `engine`
11
+ - `provider`
16
12
  - `state`
17
13
  - `createdAt`
18
14
  - `updatedAt`
19
- - `controllerPid`
20
- - `controllerFingerprint`
21
- - `lastControllerHeartbeatAt`
22
15
  - `statePath`
23
16
  - `journalDir`
24
- - `todoSnapshot`
17
+ - `ledgerPath`
25
18
  - `servants[]`
26
19
  - `messages[]`
27
20
  - `ownedResources[]`
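A registry reader can reject documents that lack any of these minimum fields before acting on them. The sketch covers only the fields visible in this hunk, and the field types (strings for timestamps and paths, untyped arrays for the lists) are assumptions. `RunRegistry`, `missingRegistryFields`, and `isRunRegistry` are hypothetical names.

```typescript
// Hypothetical TypeScript view of the minimum run registry fields shown above.
interface RunRegistry {
  runId: string;
  runToken: string;
  worktree: string;
  engine: string;
  provider: string;
  state: string;
  createdAt: string;
  updatedAt: string;
  statePath: string;
  journalDir: string;
  ledgerPath: string;
  servants: unknown[];
  messages: unknown[];
  ownedResources: unknown[];
}

const REQUIRED_STRING_FIELDS = [
  "runId", "runToken", "worktree", "engine", "provider", "state",
  "createdAt", "updatedAt", "statePath", "journalDir", "ledgerPath",
] as const;
const REQUIRED_ARRAY_FIELDS = ["servants", "messages", "ownedResources"] as const;

// Lists every required field that is absent, empty, or of the wrong kind.
function missingRegistryFields(doc: Record<string, unknown>): string[] {
  const missing: string[] = [];
  for (const f of REQUIRED_STRING_FIELDS) {
    if (typeof doc[f] !== "string" || doc[f] === "") missing.push(f);
  }
  for (const f of REQUIRED_ARRAY_FIELDS) {
    if (!Array.isArray(doc[f])) missing.push(f);
  }
  return missing;
}

function isRunRegistry(doc: unknown): doc is RunRegistry {
  return (
    typeof doc === "object" &&
    doc !== null &&
    missingRegistryFields(doc as Record<string, unknown>).length === 0
  );
}
```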
@@ -33,17 +26,17 @@ Servant fields:
33
26
  - `role`
34
27
  - `model`
35
28
  - `profile`
36
- - `state` (includes `stalled-after-delivery` for elapsed handoff windows, and `delivery-stuck-composer` when a redirect is pasted but unsubmitted)
37
- - `pid`
38
- - `runToken`
39
- - `fingerprint`
29
+ - `state` (includes `blocked` when an API request fails or returns no assistant text)
40
30
  - `composerSubmitMode`
41
- - `opencodeSessionId`
31
+ - `llmModel`
32
+ - `llmRequestId`
33
+ - `llmMessages`
42
34
  - `lastAckAt`
43
35
  - `lastLivenessCheckAt`
44
36
  - `lastUsefulSignalAt`
45
37
  - `journalPath`
46
38
  - `eventJournalPath`
39
+ - `scopeToken` (optional non-secret attribution token bound to run id, servant slot, profile, and assignment scope)
47
40
  - `assignment`
48
41
  - `handoff`
49
42
 
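The servant fields, together with the redirect and ledger rules of the native Fugu API engine, suggest one pure update per redirect: append the user message, then either record the assistant reply or block the servant. This sketch assumes the HTTP call has already happened and its bounded result is passed in. `ServantRecord`, `CompletionOutcome`, `LedgerEvent`, and `applyRedirect` are hypothetical. Keeping the unanswered user message in the conversation after a failure is also an assumption.

```typescript
type ChatRole = "system" | "user" | "assistant";

interface ChatMsg {
  role: ChatRole;
  content: string;
}

// Hypothetical subset of the servant fields listed above.
interface ServantRecord {
  slotId: string;
  state: string; // e.g. "ready"; becomes "blocked" on a failed or empty response
  llmModel: string;
  llmRequestId?: string;
  llmMessages: ChatMsg[];
}

// Hypothetical outcome of one bounded /chat/completions call (timeouts arrive as ok: false).
type CompletionOutcome =
  | { ok: true; text: string; requestId?: string }
  | { ok: false; error: string };

// Operational metadata only; there is no field for keys or provider secrets.
interface LedgerEvent {
  slotId: string;
  model: string;
  requestId?: string;
  responseText?: string;
  errorText?: string;
}

interface RedirectApplied {
  servant: ServantRecord;
  messageState: "completed" | "failed";
  ledger: LedgerEvent;
}

function applyRedirect(s: ServantRecord, userText: string, outcome: CompletionOutcome): RedirectApplied {
  // The redirect is appended as a user message in both branches (an assumption).
  const withUser: ChatMsg[] = [...s.llmMessages, { role: "user", content: userText }];
  if (outcome.ok && outcome.text.trim() !== "") {
    const reply: ChatMsg = { role: "assistant", content: outcome.text };
    return {
      servant: { ...s, llmRequestId: outcome.requestId ?? s.llmRequestId, llmMessages: [...withUser, reply] },
      messageState: "completed",
      ledger: { slotId: s.slotId, model: s.llmModel, requestId: outcome.requestId, responseText: outcome.text },
    };
  }
  // Error, timeout, or empty assistant text: block the servant and fail the message.
  const errorText = outcome.ok ? "no assistant text" : outcome.error;
  return {
    servant: { ...s, state: "blocked", llmMessages: withUser },
    messageState: "failed",
    ledger: { slotId: s.slotId, model: s.llmModel, errorText },
  };
}
```

Because the function returns a new record instead of mutating the old one, a crash between the API call and the registry write leaves the previous state intact.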
@@ -53,14 +46,14 @@ Message fields:
53
46
  - `target`
54
47
  - `text`
55
48
  - `createdAt`
56
- - `state` (`queued` | `started` | `completed` | `failed`)
49
+ - `state` (`queued` | `started` | `completed` | `partial` | `failed`)
57
50
  - `queuedAt`
58
51
  - `startedAt`
59
52
  - `completedAt`
60
53
  - `failedAt`
61
54
  - `responseText`
62
55
  - `failureReason`
63
- - `deliveredAt` (legacy PTY-only; superseded by the lifecycle fields)
56
+ - `servantResults[]`
64
57
 
65
58
  Owned resource fields:
66
59
 
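With `servantResults[]` and the new `partial` state, a message's overall state can be derived from per-servant outcomes. The hunk does not define `partial`; this sketch assumes it means some targeted servants completed and some failed. It only distinguishes in-flight (`started`) from settled states, since `queued` versus `started` depends on timestamps rather than results. `ServantResult` and `deriveMessageState` are hypothetical.

```typescript
type MessageState = "queued" | "started" | "completed" | "partial" | "failed";

// Hypothetical shape of one entry in servantResults[].
interface ServantResult {
  target: string;
  state: "completed" | "failed";
  failureReason?: string;
}

// Assumes `partial` means some targets completed and some failed.
function deriveMessageState(targets: string[], results: ServantResult[]): MessageState {
  if (targets.length === 0) throw new Error("a message needs at least one target");
  const perTarget = targets.map((t) => results.filter((r) => r.target === t));
  // `send all` should record exactly one result per targeted servant.
  if (perTarget.some((rs) => rs.length > 1)) {
    throw new Error("more than one result recorded for a servant");
  }
  // Still waiting on at least one servant.
  if (perTarget.some((rs) => rs.length === 0)) return "started";
  const completed = perTarget.filter((rs) => rs[0].state === "completed").length;
  if (completed === targets.length) return "completed";
  if (completed === 0) return "failed";
  return "partial";
}
```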
@@ -1,44 +1,47 @@
1
- # Terminal Leash
1
+ # Session Leash
2
2
 
3
- The Overlord must own every servant terminal through a durable controller, PTY leash, and registry.
3
+ The Overlord must own every servant conversation through a durable registry, model metadata, request ids when available, and a bounded message lifecycle.
4
+
5
+ The native Fugu API is the supported Overlord engine.
4
6
 
5
7
  Startup phases shown to humans:
6
8
 
7
- - Loading controller and PTYs.
8
- - Checking Fugu setup.
9
+ - Loading the run registry.
10
+ - Checking the Fugu API key and base URL.
11
+ - Checking `/models` for `fugu` and `fugu-ultra`.
9
12
  - Starting one Ultra and normal Fugus.
13
+ - Recording model, request, and conversation state.
10
14
  - Loading servant instructions.
11
- - Waiting for ACKs.
15
+ - Waiting for API-backed ACKs.
12
16
  - Ready.
13
17
 
14
- Do not show raw `TERM=dumb`, ANSI redraws, title-setting failures, or TUI noise unless startup fails or debug output is requested.
15
-
16
- Approval profiles:
18
+ Do not show raw provider response bodies, stack traces, or retry noise unless startup fails or debug output is requested.
17
19
 
18
- - Consultation: `codex-fugu --no-alt-screen -a never -s read-only -c 'sandbox_permissions=["disk-full-read-access"]'`
19
- - Implementation: `codex-fugu --no-alt-screen -a never -s workspace-write -c 'sandbox_permissions=["disk-full-read-access"]' -C <owned-worktree>`
20
- - Full-trust repair: only with explicit human approval and narrow blast radius.
20
+ Servant access model:
21
21
 
22
- `-a never` is required for servant launches. If routine consultation or bounded implementation asks the human for command approval, the profile is wrong; stop launch, report setup guidance, and do not hand-wave the prompt away.
22
+ - Servants are API conversations, not shell sessions.
23
+ - Servants do not receive direct tools, stage/dev servers, browsers, Playwright, PR rights, or release rights.
24
+ - The Overlord performs tool use, checks, edits, and PR operations in the host session after judging servant advice.
25
+ - Full-trust repair is never delegated to servants.
23
26
 
24
27
  Before launch:
25
28
 
26
- - Verify local help/status exposes approval, sandbox, config override, no-alt-screen, and cwd flags.
27
- - Accept either an API-key environment variable or local Codex auth evidence; guide setup when neither exists.
28
- - Verify the Fugu model catalog exposes `fugu-ultra`.
29
+ - Verify `SAKANA_API_KEY` or `MMI_OVERLORD_LLM_API_KEY` is available.
30
+ - Verify the configured base URL, defaulting to `https://api.sakana.ai/v1`.
31
+ - Verify `GET /models` exposes `fugu` and `fugu-ultra`, or the configured model overrides.
29
32
  - Fail closed when semantics are missing or unknown.
30
33
 
31
- Submit probe:
34
+ Conversation probe:
32
35
 
33
- - Prefer initial-prompt launch through the PTY leash.
34
- - Require the servant to emit `ACK <name> ready`.
35
- - Record `composerSubmitMode` and `lastAckAt`.
36
- - Fail startup if no mode is proven.
36
+ - Prefer initial-prompt launch through `/chat/completions`.
37
+ - Require the servant to emit useful assistant text for its startup assignment.
38
+ - Record `llmModel`, `llmRequestId` when available, `llmMessages`, `lastEventAt`, `lastAckAt`, and `composerSubmitMode=surface-api`.
39
+ - Fail startup if no assistant text is captured.
37
40
 
38
- Redirects after startup use `mmi-cli overlord send <target> <message>`; the controller drains the durable mailbox into live servant PTYs. Do not bypass the mailbox with ad-hoc keystrokes unless diagnosing the leash itself.
41
+ Redirects after startup use `mmi-cli overlord send <target> <message>`; the CLI appends to the servant conversation and records completion or bounded failure. Do not bypass the mailbox with ad-hoc provider calls unless diagnosing the leash itself.
39
42
 
40
43
  Stop safety:
41
44
 
42
45
  - Stop only recorded resources with matching run id, run token, and fingerprint.
43
- - Refuse generic `WindowsTerminal`, `pwsh`, `powershell`, `opencode`, `codex`, and `codex-fugu` names without exact ownership.
46
+ - Refuse generic process, shell, terminal, or provider names without exact ownership.
44
47
  - Refuse window-title-only ownership.
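The stop-safety rules amount to a filter in which only an exact run id, run token, and recorded fingerprint match qualifies, while names and window titles are never consulted. The following is a minimal sketch under that reading. `RecordedResource`, `ObservedResource`, and `planStop` are hypothetical, and `observed` is assumed to contain only candidates that claim some association with the run, so unrelated user processes never reach this function at all.

```typescript
// Hypothetical: ownership entries written to the registry when the run created a resource.
interface RecordedResource {
  runId: string;
  runToken: string;
  fingerprint: string;
}

// Hypothetical: a candidate found at stop time. Names and titles are carried for
// reporting only and are deliberately never used to decide ownership.
interface ObservedResource {
  processName: string;
  windowTitle?: string;
  runId?: string;
  runToken?: string;
  fingerprint?: string;
}

interface StopPlan {
  stopList: ObservedResource[];
  leftUncertain: ObservedResource[];
}

function planStop(
  observed: ObservedResource[],
  recorded: RecordedResource[],
  run: { runId: string; runToken: string },
): StopPlan {
  const stopList: ObservedResource[] = [];
  const leftUncertain: ObservedResource[] = [];
  for (const o of observed) {
    // Exact ownership: run id, run token, and a fingerprint recorded for this run.
    const exact =
      o.runId === run.runId &&
      o.runToken === run.runToken &&
      o.fingerprint !== undefined &&
      recorded.some(
        (r) => r.runId === run.runId && r.runToken === run.runToken && r.fingerprint === o.fingerprint,
      );
    (exact ? stopList : leftUncertain).push(o);
  }
  return { stopList, leftUncertain };
}
```

Anything outside `stopList` is reported as `left-uncertain` rather than killed, which matches the rule that ambiguous leftovers are surfaced to the human.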