@mutmutco/opencode-mmi 2.56.0 → 2.57.0
- package/dist/index.d.ts +3 -0
- package/dist/index.js +121 -1
- package/package.json +2 -2
- package/skills/_shared/doctrine.md +13 -11
- package/skills/bootstrap/SKILL.md +5 -5
- package/skills/build/SKILL.md +25 -20
- package/skills/build/references/loops.md +1 -1
- package/skills/build/references/worked-example.md +2 -2
- package/skills/build/templates/campaign-northstar.md +2 -0
- package/skills/coop/SKILL.md +91 -16
- package/skills/grind/SKILL.md +22 -15
- package/skills/grind/references/auto.md +1 -2
- package/skills/grind/references/routing.md +3 -4
- package/skills/grind/references/verify.md +5 -3
- package/skills/grind/templates/saga-snapshot.md +2 -0
- package/skills/grind/templates/synthesize-panel.md +1 -1
- package/skills/handoff/SKILL.md +4 -2
- package/skills/overlord/SKILL.md +12 -12
- package/skills/overlord/references/controller-orphan-guard.md +12 -15
- package/skills/overlord/references/failure-pressure-scenarios.md +13 -13
- package/skills/overlord/references/fugu-api-engine.md +62 -0
- package/skills/overlord/references/servant-liveness.md +9 -9
- package/skills/overlord/references/shell-adapters.md +2 -1
- package/skills/overlord/references/state-schema.md +10 -17
- package/skills/overlord/references/terminal-leash.md +25 -22
- package/skills/release/SKILL.md +5 -9
- package/skills/stage/SKILL.md +1 -1
- package/skills/overlord/references/codex-fugu-preflight.md +0 -25
- package/skills/overlord/references/opencode-fugu-engine.md +0 -104
package/skills/grind/SKILL.md
CHANGED
|
@@ -17,6 +17,8 @@ Two kinds of work, one loop:
|
|
|
17
17
|
|
|
18
18
|
**Shared doctrine:** Read `skills/_shared/doctrine.md` at session start and on resume. Fusion, parallelism, panel economics, flat fan-out, classifier-denied spawns, worktree hygiene, saga resume, enforcement matrix — single source; do not duplicate here.
|
|
19
19
|
|
|
20
|
+
**Continuity access:** Saga and North Star are Jervaise-only. If the current login is not `jervaise`, skip all `saga`, `northstar`, `plan`, and `/handoff` steps; keep durable state in the board issue, PR body/comments, verification logs, and final report.
|
|
21
|
+
|
|
20
22
|
Flags:
|
|
21
23
|
- `--explore` — brainstorm and judge approaches before building (use for open-ended,
|
|
22
24
|
"find a better/faster way" work). Without it, run the convergent loop.
|
|
@@ -86,10 +88,11 @@ Flags:
|
|
|
86
88
|
- Perform the fix, **verify empirically** (run the command whose output the acceptance pins — no diff
|
|
87
89
|
to panel), close the issue, and **file any durable-prevention enhancement** as a separate item.
|
|
88
90
|
- Terminal-done layer 1 is the live command output; layer 3 is the close, not a merge.
|
|
89
|
-
- **Resumable.**
|
|
91
|
+
- **Resumable.** For `jervaise`, after every phase: (1) silent one-line `mmi-cli saga note "<audit>"`; (2) `mmi-cli saga snapshot set --kind grind …` (or `--json-file`) — see **## Saga keep (resume snapshot)**. For everyone else, write the same phase evidence to the issue/PR record. On resume, corroborate against git, GitHub, and the board; never reconstruct from collapsed chat history.
|
|
90
92
|
- **Announce routing.** At phase transitions (after Phase 0a′ / model selection, before Gate 1 under
|
|
91
|
-
interactive, and at equivalent points under `--auto`), **announce explicitly** in user-visible text
|
|
92
|
-
|
|
93
|
+
interactive, and at equivalent points under `--auto`), **announce explicitly** in user-visible text.
|
|
94
|
+
For `jervaise`, also mirror the same facts in a silent `mmi-cli saga note`; for every other login,
|
|
95
|
+
mirror them in the run output/report only (saga is Jervaise-only):
|
|
93
96
|
- **Ultra mode** when explicit `--ultra` YOLO is active (distinct from auto-ultra verify uplift).
|
|
94
97
|
- **Explore mode** when `--explore` or auto-framed explore is selected.
|
|
95
98
|
- **Routing tier** (Budget / Balanced / Paranoid / Ultra verify routing).
|
|
@@ -105,7 +108,7 @@ Flags:
|
|
|
105
108
|
|
|
106
109
|
## Saga keep (resume snapshot)
|
|
107
110
|
|
|
108
|
-
Grind-specific snapshot wiring — shared resume rules in `skills/_shared/doctrine.md`. Schema: `templates/saga-snapshot.md`. Use `--kind grind` for show/set. CLI maps snapshot fields → HEAD primitives (`next`, `anchor`, checklist) — no parallel store.
|
|
111
|
+
**Jervaise-only.** Grind-specific snapshot wiring — shared resume rules in `skills/_shared/doctrine.md`. Schema: `templates/saga-snapshot.md`. Use `--kind grind` for show/set. CLI maps snapshot fields → HEAD primitives (`next`, `anchor`, checklist) — no parallel store. For every other login, the issue/PR record + git history is the resume surface — do not run saga/North Star commands.
|
|
109
112
|
|
|
110
113
|
## Gates — how every gate (and decision point) is presented
|
|
111
114
|
Hosts collapse mid-turn text: anything you printed between tool calls may be invisible when
|
|
@@ -134,8 +137,9 @@ the user reaches the gate. So:
|
|
|
134
137
|
## Phase 0a′ — Classify & route
|
|
135
138
|
|
|
136
139
|
Runs immediately after the grindability check, **before** Phase 0a model questions. (`--auto`:
|
|
137
|
-
apply silently for classification; still **announce** routing per **Hard rules — Announce routing
|
|
138
|
-
`saga note "grind class=X routing=Y ultra=Z reason=…"
|
|
140
|
+
apply silently for classification; still **announce** routing per **Hard rules — Announce routing**.
|
|
141
|
+
For `jervaise`, `saga note "grind class=X routing=Y ultra=Z reason=…"`; for every other login,
|
|
142
|
+
announce in the run output/report only — saga is Jervaise-only.)
|
|
139
143
|
|
|
140
144
|
Read the issue **type label** (`bug` / `feature` / `task`), **priority**, **title/body**,
|
|
141
145
|
**labels** (e.g. `security`), known **files touched**, and flags (`--explore`, `--ultra`, `--auto`).
|
|
@@ -204,7 +208,7 @@ See `templates/synthesize-panel.md`.
|
|
|
204
208
|
**Paranoid / Ultra hard-lens double-pass:** run `security` and `correctness` twice — different
|
|
205
209
|
temperature or two different models — before Phase 2b.
|
|
206
210
|
|
|
207
|
-
Log each verify round: `mmi-cli saga note "verify round N: routing=X ultra=Y"
|
|
211
|
+
Log each verify round: for `jervaise`, `mmi-cli saga note "verify round N: routing=X ultra=Y"`; for every other login, log it in the run output/report (saga is Jervaise-only).
|
|
208
212
|
|
|
209
213
|
## Phase 0b — Frame [GATE 1]
|
|
210
214
|
(`--auto`: no gate — auto-decide explore-vs-convergent, in `--explore` auto-pick the judge's
|
|
@@ -220,8 +224,7 @@ planning shape — **parallel planners + verifier-tier judge** — before Gate 1
|
|
|
220
224
|
2. Each planner returns: approach summary, risks, proposed success criteria, estimated complexity.
|
|
221
225
|
3. **Judge agent** (verifier-tier model, **≠** any planner/builder) scores against the goal rubric;
|
|
222
226
|
picks winner or synthesizes hybrid — synthesize-and-reconcile, not vote/debate.
|
|
223
|
-
4. Winner + criteria written to issue body; North Star push
|
|
224
|
-
Phase 1 (`--auto`). `mmi-cli saga note "multi-plan N=<n> winner=<summary>"`.
|
|
227
|
+
4. Winner + criteria written to issue body; for `jervaise`, North Star push + `mmi-cli saga note "multi-plan N=<n> winner=<summary>"`; for every other login, record the winner + criteria in the issue/PR record (saga + North Star are Jervaise-only). Proceed to Gate 1 (interactive) or Phase 1 (`--auto`).
|
|
225
228
|
|
|
226
229
|
**Skip multi-agent planning** for narrow bugs, low priority, or user "quick"/"small" — single
|
|
227
230
|
planner + judge (or direct criteria framing) is enough.
|
|
@@ -249,7 +252,7 @@ planner outputs when both paths are active.
|
|
|
249
252
|
**Vague or ambiguous body?** Read the full issue (body + comments), state the deliverable,
|
|
250
253
|
and write criteria the user can confirm at Gate 1. Umbrella scope → child issues (`--parent`),
|
|
251
254
|
one shippable unit per grind.
|
|
252
|
-
Write them into the issue body
|
|
255
|
+
Write them into the issue body. For `jervaise`, also push the criteria to North Star (`mmi-cli northstar push <slug>` —
|
|
253
256
|
the default push queues a background sync and prints "queued": that is success, not a failure;
|
|
254
257
|
`mmi-cli northstar status` checks it, `mmi-cli northstar sync` confirms durably).
|
|
255
258
|
3. Present per **## Gates** (class, routing, ultra, criteria). **Wait for the user's go.**
|
|
@@ -260,8 +263,7 @@ planner outputs when both paths are active.
|
|
|
260
263
|
2. Otherwise brainstorm **2-3 candidate approaches** (2 if the goal is clear, 3 if wide) on the
|
|
261
264
|
builder model — or use **multi-agent planning** when auto-gated.
|
|
262
265
|
3. Score with the **verifier-model judge** (or fusion panel judge); pick or synthesize a direction.
|
|
263
|
-
4. Define a success target — numeric metric if stated, else judged rubric. File/claim
|
|
264
|
-
push. Present per **## Gates**. **Wait for the user's go.**
|
|
266
|
+
4. Define a success target — numeric metric if stated, else judged rubric. File/claim; for `jervaise`, also push North Star. Present per **## Gates**. **Wait for the user's go.**
|
|
265
267
|
|
|
266
268
|
## Generative fusion path (auto-gated)
|
|
267
269
|
|
|
@@ -275,7 +277,7 @@ verifier-tier **judge** fuses into one markdown deliverable. For **research-clas
|
|
|
275
277
|
any planner) synthesizes — not vote/debate.
|
|
276
278
|
2. Judge output is a **fused markdown deliverable**: chosen approach, tradeoffs, criteria
|
|
277
279
|
refinement, spike findings.
|
|
278
|
-
3. Land in issue body
|
|
280
|
+
3. Land in issue body; for `jervaise`, also `mmi-cli northstar push <slug>` (North Star is Jervaise-only).
|
|
279
281
|
4. **Code-shipping explore:** fused doc feeds Phase 1; Phase 2 verify stays diff-pinned.
|
|
280
282
|
5. **Research-only** (no code PR): fusion output is the primary artifact; stop after criteria met
|
|
281
283
|
+ Gate 2 (`--auto` report).
|
|
@@ -335,6 +337,8 @@ build that adds new modules/tests is invisible to lenses and can draw a false `c
|
|
|
335
337
|
Stage first (`git -C <worktree> add -A && git -C <worktree> diff --cached -- ':!cli/dist' > tmp/grind-verify-<round>.patch`)
|
|
336
338
|
or `git -C <worktree> add -N <new files>` before the diff (#2057).
|
|
337
339
|
|
|
340
|
+
**Spawn lenses tool-restricted (#2137).** Under `isolation=worktree`, spawn each lens with a subagent type that has **no shell/git/repo-filesystem access** — never the default `general-purpose` (or any `*`-tool) agent. A tool-capable lens silently bypasses prompt-only pinning: a `general-purpose` requirements lens has ignored the embedded patch, run `git diff` against the worktree's PARENT checkout (a different branch), and returned a false `cannot-verify`. Pass the patch file as the lens's ONLY input and state it has zero repo/git/filesystem access; if it cannot judge from the patch, it returns `cannot-verify`. Where the host exposes no zero-shell agent type, keep the no-access framing in the prompt AND discard + re-run (patch-only) any lens whose transcript shows a repo/git read — prompt-only pinning is not structurally honored by tool-capable subagents.
|
|
341
|
+
|
|
338
342
|
**Lens-prompt clauses → `references/verify.md`.** Every lens prompt MUST carry: the **verbatim-includes-test-files** rule, the **abstention** rule (`cannot-verify`, never a false "absent/missing" blocker), the **diff-shape** clause (a referenced-but-undefined symbol is pre-existing — never flag it), and the **worktree-isolation** clause (patch-only, deny repo FS, stale-checkout warning). The exact wording lives in `references/verify.md` — load it before spawning lenses.
|
|
339
343
|
|
|
340
344
|
Under **Paranoid** or **Ultra**, run **hard lenses twice** before Phase 2b.
|
|
@@ -382,7 +386,9 @@ The synthesizer returns a **`PanelReport`** — structured reconciliation of len
|
|
|
382
386
|
|
|
383
387
|
A verify round **fails if `PanelReport.blockers` is non-empty**. If synthesis errors or returns
|
|
384
388
|
invalid JSON, **degrade gracefully**: union raw lens `blockers` (manual dedupe by file+line+title),
|
|
385
|
-
`saga note` the degradation
|
|
389
|
+
then for `jervaise` `saga note` the degradation and continue Phase 3; for every other login, note the
|
|
390
|
+
degradation in the run output/report and continue (saga is Jervaise-only). Synthesis is an uplift,
|
|
391
|
+
not a hard dependency.
|
|
386
392
|
Optional CLI path: `mmi-cli verify panel` plans lens jobs; pipe lens JSON to `mmi-cli verify synthesize`
|
|
387
393
|
for deterministic blocker dedupe before the host synthesizer enriches consensus/contradictions.
|
|
388
394
|
Real verifier lanes only. Empty or controller-authored all-pass stubs are invalid evidence and do
|
|
@@ -453,7 +459,8 @@ See shared doctrine § Self-learning + retro. Grind-specific examples: gate mess
|
|
|
453
459
|
## End-of-grind summary
|
|
454
460
|
|
|
455
461
|
At grind completion (PR opened + interactive stop, or `--auto` terminate/merge report), emit a **very
|
|
456
|
-
brief** summary block
|
|
462
|
+
brief** summary block in user-visible text. For `jervaise`, also mirror it in `mmi-cli saga note`
|
|
463
|
+
(saga is Jervaise-only); for every other login, the user-visible summary is the record:
|
|
457
464
|
|
|
458
465
|
- **Tier + modes:** chosen tier (`light`/`standard`/`deep`/`ultra`); `--explore`, auto-ultra (verify uplift), `--auto` if applied.
|
|
459
466
|
- **Models used:** builder / verifier / third / synthesizer / judge (host slot names).
|
|
package/skills/grind/references/auto.md
CHANGED
|
@@ -73,8 +73,7 @@ report. Never pretend cross-vendor ultra ran when it did not.
|
|
|
73
73
|
Auto-decide explore-vs-convergent from the ask (open-ended → explore; `--explore` forces it on).
|
|
74
74
|
Run **multi-agent planning + judge** when auto-gated (always under explicit `--ultra`). In explore,
|
|
75
75
|
brainstorm + judge or **generative fusion** when auto-gated — **auto-pick the winning approach** —
|
|
76
|
-
no wait. File/claim the item(s), write the criteria, push to North Star,
|
|
77
|
-
`saga note` multi-plan winner, tool policy, and fusion path when active.
|
|
76
|
+
no wait. File/claim the item(s), write the criteria, then for `jervaise` push to North Star + `saga note` multi-plan winner, tool policy, and fusion path when active; for every other login, record the criteria + winner in the issue/PR record (saga + North Star are Jervaise-only). Then go straight to Phase 1.
|
|
78
77
|
|
|
79
78
|
## Phase 4 — PR → CI-merge loop (replaces Gate 2)
|
|
80
79
|
|
|
package/skills/grind/references/routing.md
CHANGED
|
@@ -5,7 +5,7 @@ Loaded on demand from `SKILL.md` § Phase 0a′. The class table + the effort-ti
|
|
|
5
5
|
## Auto-ultra detection
|
|
6
6
|
|
|
7
7
|
**auto-ultra = true** (verify-panel uplift **only** — not whole-loop YOLO) when **any** of
|
|
8
|
-
(log first match in saga):
|
|
8
|
+
(for `jervaise`, log first match in saga; for every other login, in the run output/report):
|
|
9
9
|
|
|
10
10
|
1. ~~User passed **`--ultra`**.~~ **Explicit `--ultra` is separate** — see **Flags** and
|
|
11
11
|
**§ Explicit `--ultra` vs auto-ultra**; it is **not** auto-ultra.
|
|
@@ -13,7 +13,7 @@ Loaded on demand from `SKILL.md` § Phase 0a′. The class table + the effort-ti
|
|
|
13
13
|
3. **`--explore`** + stated numeric SLA/metric with high blast radius.
|
|
14
14
|
4. **Architectural / cross-cutting** scope (multi-module, public API break, migration).
|
|
15
15
|
5. **Escalation:** after **2 failed verify rounds** on default routing, escalate to auto-ultra for round 3+
|
|
16
|
-
(once per grind; saga note). Interactive: announce; `--auto`: silent.
|
|
16
|
+
(once per grind; for `jervaise`, `saga note` the escalation; for every other login, note it in the run output/report — saga is Jervaise-only). Interactive: announce; `--auto`: silent.
|
|
17
17
|
|
|
18
18
|
**auto-ultra = false** (stay 2-model): docs/prose-only diffs; priority `low`; narrow bug with clear repro;
|
|
19
19
|
user says "quick" / "small" in the prompt (explicit user instruction wins).
|
|
@@ -70,7 +70,6 @@ When the user passes **explicit `--ultra`**, select **higher reasoning effort**
|
|
|
70
70
|
exposes it (e.g. elevated thinking/reasoning tier on supported models). Apply to builder, verifier,
|
|
71
71
|
third, synthesizer, and judge roles when the host offers per-model reasoning knobs.
|
|
72
72
|
|
|
73
|
-
- **Announce** when elevated reasoning is selected — model + level — in the routing announcement block
|
|
74
|
-
and `mmi-cli saga note "reasoning=<model>:<level> …"`.
|
|
73
|
+
- **Announce** when elevated reasoning is selected — model + level — in the routing announcement block; for `jervaise`, also `mmi-cli saga note "reasoning=<model>:<level> …"` (saga is Jervaise-only).
|
|
75
74
|
- **Fallback:** when the host has no reasoning-effort knob, use the strongest available model tier and
|
|
76
75
|
note the gap in the announcement — do not block the grind.
|
|
package/skills/grind/references/verify.md
CHANGED
|
@@ -24,10 +24,12 @@ diff-shape clause:
|
|
|
24
24
|
This prevents diff-absent symbols from becoming ship-stoppers at the lens; Phase 2b's absence-claims
|
|
25
25
|
drop rule remains the backstop.
|
|
26
26
|
|
|
27
|
-
**Worktree isolation (#1621, #1895).** Phase 2: `isolation=worktree` — other checkouts may be stale.
|
|
27
|
+
**Worktree isolation (#1621, #1895, #2137).** Phase 2: `isolation=worktree` — other checkouts may be stale.
|
|
28
28
|
**Orchestrator MUST:** patch-only input; deny repo FS tools on lenses; stale-checkout clause in every lens prompt; re-run Phase 2 if a transcript shows disk reads — never triage disk-sourced blockers.
|
|
29
29
|
Abstention + diff-shape rules above still apply; `cannot-verify` beats false absence.
|
|
30
30
|
|
|
31
|
+
**Deny repo FS tools structurally, not just by prompt (#2137).** Spawn lenses with a subagent type that has no Bash/git/repo-filesystem access; never spawn a lens as a `general-purpose` / `*`-tool agent under `isolation=worktree`. A tool-capable lens has ignored the embedded patch and run `git diff` on the worktree's parent checkout (on another branch), returning a false `cannot-verify` describing a diff that is not under review. When the host cannot restrict tools, the prompt must forbid all repo/git/file access and the orchestrator discards any blocker whose evidence came from a disk/git read, re-running patch-only — prompt-only pinning is not reliably honored by tool-capable subagents.
|
|
32
|
+
|
|
31
33
|
## Tool-enabled lenses (default expectation on applicable lenses)
|
|
32
34
|
|
|
33
35
|
When an objective signal exists, hard lenses must anchor to it — failing test, typecheck error,
|
|
@@ -58,7 +60,7 @@ while Phase 2b synthesizer stays **tool-free** (lens JSON + diff stat only).
|
|
|
58
60
|
|
|
59
61
|
**Hygiene:** configurable allow/deny domain lists (org/repo-level) — exclude benchmark-leak
|
|
60
62
|
domains (e.g. Stack Overflow, issue mirrors) from verify search. Default deny list in
|
|
61
|
-
`cli/src/grind-policy.ts`. **Per-lens budget:** hard cap (3 queries/lens/round); `saga note`
|
|
63
|
+
`cli/src/grind-policy.ts`. **Per-lens budget:** hard cap (3 queries/lens/round); on exceed, for `jervaise` `saga note` it, for every other login note it in the run output/report (saga is Jervaise-only).
|
|
62
64
|
|
|
63
65
|
**Diff-pinning preserved:** tools supplement — never replace — the pinned patch. Under
|
|
64
66
|
`isolation=worktree`, repo FS tools stay forbidden even under ultra; cite builder test output from
|
|
@@ -73,7 +75,7 @@ full re-panel) before Phase 3 triage — at most once per round.
|
|
|
73
75
|
|
|
74
76
|
**`contradictions`:** if the disagreement is **criteria/spec ambiguity** (lenses read the spec
|
|
75
77
|
differently), stop and escalate to the human per **## Gates** — do not guess. If one lens is
|
|
76
|
-
clearly wrong against consensus + diff, note it in the saga and triage real blockers only.
|
|
78
|
+
clearly wrong against consensus + diff, note it (for `jervaise` in the saga; for every other login in the run output/report — saga is Jervaise-only) and triage real blockers only.
|
|
77
79
|
|
|
78
80
|
**Absence-claims (#1621, #1895, #2057).** Drop "missing/absent/unimplemented/not fixed" blockers
|
|
79
81
|
contradicted by the pinned patch or green builder tests. Drop blockers when lens logs show repo
|
|
package/skills/grind/templates/saga-snapshot.md
CHANGED
|
@@ -1,5 +1,7 @@
|
|
|
1
1
|
# Grind saga keep — resume snapshot
|
|
2
2
|
|
|
3
|
+
**Jervaise-only artifact.** `mmi-cli saga snapshot` is a Jervaise-only continuity tool. For every other login, the issue/PR record + git history is the resume surface — do not use this template or `mmi-cli saga`.
|
|
4
|
+
|
|
3
5
|
Enforced via **`mmi-cli saga snapshot`** (maps to saga HEAD — no parallel store). Checklist uses namespaced prefixes: `gs-open:`, `gs-resolved:`, `gs-ceiling:`.
|
|
4
6
|
|
|
5
7
|
## Read-first on resume
|
|
package/skills/grind/templates/synthesize-panel.md
CHANGED
|
@@ -100,5 +100,5 @@ The grind loop uses this report as follows:
|
|
|
100
100
|
If the synthesizer returns invalid JSON or errors:
|
|
101
101
|
|
|
102
102
|
1. Fall back to **raw lens blockers** — union all lens `blockers`, manual dedupe by file+line+title.
|
|
103
|
-
2. `mmi-cli saga note "phase 2b: synthesize degraded, raw lens triage"
|
|
103
|
+
2. For `jervaise`, `mmi-cli saga note "phase 2b: synthesize degraded, raw lens triage"`; for every other login, note the degradation in the run output/report (saga is Jervaise-only).
|
|
104
104
|
3. Continue Phase 3 — synthesis is an uplift, not a hard dependency.
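The raw-lens fallback above (union all lens `blockers`, dedupe by file+line+title) can be sketched as follows. This is a minimal illustration, not the package's implementation: the `Blocker` and `LensReport` shapes and the function names are hypothetical, and the source describes the dedupe as manual, so this only shows one mechanical reading of the same rule.

```typescript
// Hypothetical shapes for illustration only; the real lens JSON schema is not shown in this diff.
interface Blocker {
  file: string;
  line: number;
  title: string;
}

interface LensReport {
  blockers: Blocker[];
}

// Union the blockers of every lens, keeping the first occurrence of each
// (file, line, title) triple and preserving lens order.
function unionRawBlockers(reports: LensReport[]): Blocker[] {
  const seen: { [key: string]: boolean } = {};
  const merged: Blocker[] = [];
  for (const report of reports) {
    for (const blocker of report.blockers) {
      // JSON.stringify over a tuple avoids delimiter collisions between fields.
      const key = JSON.stringify([blocker.file, blocker.line, blocker.title]);
      if (!seen[key]) {
        seen[key] = true;
        merged.push(blocker);
      }
    }
  }
  return merged;
}

// A verify round fails when any blocker survives the union.
function roundFails(reports: LensReport[]): boolean {
  return unionRawBlockers(reports).length > 0;
}
```

Keying on the full triple means two lenses that report the same defect under different titles stay as separate entries, which errs toward keeping blockers rather than dropping them.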
|
package/skills/handoff/SKILL.md
CHANGED
|
@@ -1,11 +1,13 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: handoff
|
|
3
|
-
description: Record or claim an explicit session handoff in the saga + North Star system that SessionStart resumes — open a handoff to leave work for a future session, accept a pending one, or cancel your own. Use when
|
|
3
|
+
description: Jervaise-only. Record or claim an explicit session handoff in the saga + North Star system that SessionStart resumes — open a handoff to leave work for a future session, accept a pending one, or cancel your own. Use when Jervaise says "handoff" or "/handoff", asks to hand off work, leave a checkpoint for the next session, or claim a prior handoff, or invokes /handoff.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
# /handoff — explicit saga handoff lifecycle
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
This is a Jervaise-only continuity skill. If any other developer asks for a handoff, do not open or claim one; use the board, issue, or PR as the handoff record instead.
|
|
9
|
+
|
|
10
|
+
Use when Jervaise says `handoff` or `/handoff`, or asks to leave work for a future session or claim a prior handoff. This skill records an explicit handoff in the same saga + North Star system that SessionStart resumes.
|
|
9
11
|
|
|
10
12
|
## Start
|
|
11
13
|
|
package/skills/overlord/SKILL.md
CHANGED
|
@@ -11,7 +11,7 @@ Default pool: 3 servants total: one `fugu-ultra` and two normal `fugu` servants.
|
|
|
11
11
|
|
|
12
12
|
Allowed range: `--3` through `--6`. Exactly one servant is Ultra in every run.
|
|
13
13
|
|
|
14
|
-
|
|
14
|
+
Active engine: native OpenAI-compatible Fugu API calls against the Sakana endpoint. It uses `SAKANA_API_KEY` or `MMI_OVERLORD_LLM_API_KEY`, records model/request/conversation state in the run registry, and does not depend on Codex-Fugu or OpenCode session routing.
|
|
15
15
|
|
|
16
16
|
## Start Contract
|
|
17
17
|
|
|
@@ -23,7 +23,7 @@ Supported engines: `codex-fugu` through the PTY leash and OpenCode/Fugu through
|
|
|
23
23
|
6. Own worktrees, stage/dev servers, Playwright, browsers, PRs, merges, and cleanup.
|
|
24
24
|
7. Keep servants leased until `/overlord stop`, `mmi-cli overlord stop`, or explicit controlled shutdown.
|
|
25
25
|
|
|
26
|
-
CLI startup persists a gitignored run registry at `tmp/overlord/runs.json
|
|
26
|
+
CLI startup persists a gitignored run registry at `tmp/overlord/runs.json` and starts API-backed Fugu conversations. `mmi-cli overlord send <target> <message>` appends to each servant's stored conversation, calls `/chat/completions`, captures the assistant text, and advances a real message lifecycle (`queued`, `started`, `completed`, `partial`, `failed`) from the API result.
|
|
27
27
|
|
|
28
28
|
## Reference Loading
|
|
29
29
|
|
|
@@ -33,24 +33,24 @@ Read only what the task requires:
|
|
|
33
33
|
- `references/servant-normal.md`: prompt for normal Fugu servants.
|
|
34
34
|
- `references/servant-ultra.md`: prompt for the single Ultra servant.
|
|
35
35
|
- `references/loop-contract.md`: evidence, edit, verify, retry, escalate, and stop rules.
|
|
36
|
-
- `references/terminal-leash.md`: servant startup,
|
|
36
|
+
- `references/terminal-leash.md`: servant startup, API conversation routing, shell-surface boundaries, and stop safety.
|
|
37
37
|
- `references/servant-liveness.md`: liveness lease and awaiting-human behavior.
|
|
38
|
-
- `references/controller-orphan-guard.md`: abrupt close, stale
|
|
39
|
-
- `references/
|
|
40
|
-
- `references/opencode-fugu-engine.md`: OpenCode preflight, JSON event parsing, session-backed mailbox, ledger, and liveness model.
|
|
38
|
+
- `references/controller-orphan-guard.md`: abrupt close, stale sessions, adoption, exact stop, and uncertainty.
|
|
39
|
+
- `references/fugu-api-engine.md`: native API preflight, conversation persistence, ledger, request timeouts, and liveness model.
|
|
41
40
|
- `references/shell-adapters.md`: PowerShell, cmd, Git Bash, macOS zsh/bash, Linux bash/sh, and unknown-shell rules.
|
|
42
41
|
- `references/state-schema.md`: durable run-state fields.
|
|
43
42
|
- `references/failure-pressure-scenarios.md`: tests and lessons from the first Overlord design run.
|
|
44
43
|
|
|
45
44
|
## Hard Rules
|
|
46
45
|
|
|
47
|
-
- **Fugu only — never a
|
|
48
|
-
- Do not spawn servants on an unreadable or undrivable
|
|
49
|
-
- **Probe the
|
|
50
|
-
-
|
|
46
|
+
- **Fugu only — never a substitute pool.** Overlord servants are Fugu (`fugu-ultra` + `fugu`) driven through the native Fugu API. Never satisfy an Overlord run with platform sub-agents, `multi_agent_v1`, generic workers, Codex-Fugu, OpenCode sessions, or any non-Fugu agent pool. If the Fugu API cannot prove the required endpoint, key, and models, stop and report the setup failure; do not simulate Overlord with other agents.
|
|
47
|
+
- Do not spawn servants on an unreadable or undrivable engine.
|
|
48
|
+
- **Probe the native Fugu API before launch.** Verify the API key, base URL, `/models`, `fugu`, and `fugu-ultra` before any servant launch.
|
|
49
|
+
- API servants keep a stored system/user/assistant conversation per servant and use bounded `/chat/completions` calls for startup and redirects.
|
|
50
|
+
- When the Overlord itself uses shell tools, it must use the native current shell for the host: `pwsh` on Windows, `zsh` on macOS, and `bash` on Linux/Unix. Windows `powershell.exe` is not an acceptable Overlord default.
|
|
51
51
|
- **Any routine approval prompt during startup, planning, or assigned work is a launch-profile failure** — record and recover it, never hand-wave it away or train the human to approve routine commands. Consultation servants get read-only plus the disk-read permission ordinary host config reads need, or the Overlord performs those reads itself.
|
|
52
|
-
- Probe
|
|
53
|
-
- **Delivery is not execution.** `mmi-cli overlord send`
|
|
52
|
+
- Probe the API path before sending real prompts.
|
|
53
|
+
- **Delivery is not execution.** `mmi-cli overlord send` is complete only when the API returns captured servant text or a bounded failure. If the request times out, errors, or returns no assistant text, mark the servant `blocked` and the message `failed` — never report it as ready or delivered.
|
|
54
54
|
- **No handoff after delivery = stalled, not ready.** When a servant stays `ready` but produces no non-TUI output after a bounded handoff-expected interval, mark it `stalled-after-delivery`; do not keep reporting it as ready.
|
|
55
55
|
- Never rely on stale ACKs as liveness proof.
|
|
56
56
|
- Never broad-kill by process name, title, shell name, or model command.
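The native-shell rule in the hard rules above (`pwsh` on Windows, `zsh` on macOS, `bash` on Linux/Unix) could be expressed as a small lookup. This is a sketch under assumptions: the function name is hypothetical, and the platform strings are Node's `process.platform` values, which the source does not mention.

```typescript
// Map a Node-style platform string to the shell the Overlord should use for host work.
// "win32" and "darwin" are Node's process.platform values; everything else is treated as Linux/Unix.
function nativeShellFor(platform: string): string {
  if (platform === "win32") {
    return "pwsh"; // modern PowerShell; powershell.exe is not an acceptable default
  }
  if (platform === "darwin") {
    return "zsh";
  }
  return "bash";
}
```

In a Node host this would be called as `nativeShellFor(process.platform)`.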
|
|
package/skills/overlord/references/controller-orphan-guard.md
CHANGED
|
@@ -1,33 +1,30 @@
|
|
|
1
|
-
#
|
|
1
|
+
# Run Registry And Orphan Guard
|
|
2
2
|
|
|
3
|
-
The
|
|
3
|
+
The run registry, not conversational memory, owns servants.
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
Registry responsibilities:
|
|
6
6
|
|
|
7
|
-
-
|
|
8
|
-
- Hold readable/writable handles.
|
|
7
|
+
- Record Fugu API servant conversations.
|
|
9
8
|
- Persist run state under gitignored `tmp/overlord`.
|
|
10
|
-
-
|
|
11
|
-
- Tee bounded journals.
|
|
9
|
+
- Tee bounded ledger and event journals.
|
|
12
10
|
- Expose status, stop, adopt, and recover.
|
|
13
11
|
|
|
14
12
|
On every `/overlord`, `status`, `stop`, resume, or human message:
|
|
15
13
|
|
|
16
14
|
- Rehydrate run state.
|
|
17
|
-
- Check
|
|
18
|
-
- Check servant handles.
|
|
15
|
+
- Check model metadata, conversation history, request ids, and last useful signal.
|
|
19
16
|
- Classify orphan state before doing more work.
|
|
20
17
|
|
|
21
18
|
Orphan classifications:
|
|
22
19
|
|
|
23
|
-
- `
|
|
24
|
-
- `
|
|
25
|
-
- `
|
|
26
|
-
- `
|
|
20
|
+
- `overlord-conversations-live`
|
|
21
|
+
- `conversations-blocked`
|
|
22
|
+
- `resources-owned-alive`
|
|
23
|
+
- `resources-uncertain`
|
|
27
24
|
|
|
28
25
|
Actions:
|
|
29
26
|
|
|
30
|
-
- Adopt only with matching run token and recoverable
|
|
27
|
+
- Adopt only with a matching run token and recoverable conversation state.
|
|
31
28
|
- Exact-stop only proven run-owned resources.
|
|
32
29
|
- Leave uncertain resources alone and report them.
|
|
33
|
-
- Never broad-clean by process name or
|
|
30
|
+
- Never broad-clean by process name, title, shell name, or provider name.
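The action rules above reduce to a conservative decision: adopt or stop only on positive proof, otherwise leave the resource alone and report it. The sketch below is illustrative only; the evidence fields, the intent parameter, and the action names are hypothetical and do not come from the package.

```typescript
type OrphanAction = "adopt" | "exact-stop" | "leave-and-report";

// Hypothetical evidence gathered while rehydrating run state.
interface ResourceEvidence {
  runTokenMatches: boolean;         // resource carries this run's token
  conversationRecoverable: boolean; // stored conversation state can be reloaded
  provenRunOwned: boolean;          // ownership is proven, not inferred from a name or title
}

// Decide what to do with a leftover resource. Anything short of proof falls
// through to "leave-and-report"; there is deliberately no name-based cleanup branch.
function decideOrphanAction(
  evidence: ResourceEvidence,
  intent: "resume" | "stop"
): OrphanAction {
  if (intent === "resume" && evidence.runTokenMatches && evidence.conversationRecoverable) {
    return "adopt";
  }
  if (intent === "stop" && evidence.provenRunOwned) {
    return "exact-stop";
  }
  return "leave-and-report";
}
```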
|
|
package/skills/overlord/references/failure-pressure-scenarios.md
CHANGED
|
@@ -2,17 +2,17 @@
|
|
|
2
2
|
|
|
3
3
|
Test these before accepting `/overlord`:
|
|
4
4
|
|
|
5
|
-
- Windows
|
|
6
|
-
-
|
|
7
|
-
-
|
|
8
|
-
-
|
|
9
|
-
-
|
|
10
|
-
- Missing `
|
|
11
|
-
- `
|
|
12
|
-
-
|
|
13
|
-
-
|
|
14
|
-
-
|
|
15
|
-
-
|
|
16
|
-
-
|
|
17
|
-
- `/overlord stop` leaves user-owned terminals,
|
|
5
|
+
- Windows host work uses modern PowerShell (`pwsh`) syntax and native paths.
|
|
6
|
+
- macOS zsh and Linux bash use POSIX syntax for host work.
|
|
7
|
+
- Missing `SAKANA_API_KEY` and `MMI_OVERLORD_LLM_API_KEY` stops startup with setup steps.
|
|
8
|
+
- A bad Fugu API base URL stops startup with setup steps.
|
|
9
|
+
- `GET /models` returning an error stops startup with setup steps.
|
|
10
|
+
- Missing `fugu` or `fugu-ultra` stops startup with setup steps.
|
|
11
|
+
- Startup `/chat/completions` timeout marks the servant `blocked`.
|
|
12
|
+
- Startup API success with no assistant text marks the servant `blocked`.
|
|
13
|
+
- `send all` records one completed or failed result per targeted servant.
|
|
14
|
+
- A redirect timeout marks the message `failed` and does not claim delivery.
|
|
15
|
+
- Previously ACKed servants with failed follow-up requests become `blocked`; stale ACK is rejected.
|
|
16
|
+
- Awaiting-human preserves servant conversation state.
|
|
17
|
+
- `/overlord stop` leaves user-owned terminals, shells, and unrelated provider processes untouched.
|
|
18
18
|
- Ambiguous leftovers are reported as `left-uncertain`.
|
|
package/skills/overlord/references/fugu-api-engine.md
ADDED
|
@@ -0,0 +1,62 @@
|
|
|
1
|
+
# Native Fugu API Engine
|
|
2
|
+
|
|
3
|
+
Overlord servants run through the Sakana Fugu API using the OpenAI-compatible chat-completions surface.
|
|
4
|
+
|
|
5
|
+
Defaults:
|
|
6
|
+
|
|
7
|
+
- Base URL: `https://api.sakana.ai/v1`
|
|
8
|
+
- Normal model: `fugu`
|
|
9
|
+
- Ultra model: `fugu-ultra`
|
|
10
|
+
- API key: `SAKANA_API_KEY`, or `MMI_OVERLORD_LLM_API_KEY`
|
|
11
|
+
- Request timeout: `MMI_OVERLORD_LLM_TIMEOUT_MS`, clamped between 5 seconds and 10 minutes, defaulting to 90 seconds
|
|
12
|
+
|
|
13
|
+
Optional overrides:
|
|
14
|
+
|
|
15
|
+
- `MMI_OVERLORD_LLM_BASE_URL`
|
|
16
|
+
- `MMI_OVERLORD_LLM_MODEL`
|
|
17
|
+
- `MMI_OVERLORD_LLM_ULTRA_MODEL`
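The defaults and overrides listed above could be resolved roughly as follows. This is a minimal sketch, not the package's implementation: `FuguConfig`, `resolveTimeoutMs`, and `resolveFuguConfig` are hypothetical names, the precedence of `SAKANA_API_KEY` over `MMI_OVERLORD_LLM_API_KEY` is an assumption, and so is the fallback to the default when the timeout value is unset or non-numeric.

```typescript
// Hypothetical sketch: resolves Fugu engine settings from an environment map.
type EnvMap = Record<string, string | undefined>;

interface FuguConfig {
  baseUrl: string;
  model: string;
  ultraModel: string;
  apiKey: string | undefined;
  timeoutMs: number;
}

const MIN_TIMEOUT_MS = 5 * 1000; // 5 seconds
const MAX_TIMEOUT_MS = 10 * 60 * 1000; // 10 minutes
const DEFAULT_TIMEOUT_MS = 90 * 1000; // 90 seconds

function resolveTimeoutMs(raw: string | undefined): number {
  // Assumption: unset, empty, or non-numeric values fall back to the default.
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_TIMEOUT_MS;
  }
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) {
    return DEFAULT_TIMEOUT_MS;
  }
  // Clamp into the documented 5 s .. 10 min window.
  return Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, parsed));
}

function resolveFuguConfig(env: EnvMap): FuguConfig {
  return {
    baseUrl: env["MMI_OVERLORD_LLM_BASE_URL"] || "https://api.sakana.ai/v1",
    model: env["MMI_OVERLORD_LLM_MODEL"] || "fugu",
    ultraModel: env["MMI_OVERLORD_LLM_ULTRA_MODEL"] || "fugu-ultra",
    // Assumption: SAKANA_API_KEY wins when both keys are set.
    apiKey: env["SAKANA_API_KEY"] || env["MMI_OVERLORD_LLM_API_KEY"] || undefined,
    timeoutMs: resolveTimeoutMs(env["MMI_OVERLORD_LLM_TIMEOUT_MS"]),
  };
}
```

Passing the environment in as a plain map, rather than reading it globally, keeps the clamp and fallback rules easy to exercise in isolation.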
+
+## Preflight
+
+Before launching servants, probe `GET /models` with the configured key and base URL.
+
+Startup is blocked unless the probe proves both configured models are available.
+
+Do not print the API key.
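The preflight rule above can be sketched as a pure check over an already-fetched `/models` response. This assumes the response follows the OpenAI-compatible list shape (`{ data: [{ id }] }`); `ModelsListing`, `PreflightResult`, `checkRequiredModels`, and `describePreflightFailure` are hypothetical names, and the failure wording is illustrative only.

```typescript
// Hypothetical sketch: decides whether startup may proceed from a /models listing.
interface ModelsListing {
  data: Array<{ id: string }>;
}

interface PreflightResult {
  ok: boolean;
  missing: string[];
}

function checkRequiredModels(listing: ModelsListing, required: string[]): PreflightResult {
  const available = new Set(listing.data.map((entry) => entry.id));
  const missing = required.filter((id) => !available.has(id));
  return { ok: missing.length === 0, missing };
}

// Builds a failure message from model names only, so the API key never
// appears in anything shown to the operator.
function describePreflightFailure(result: PreflightResult): string {
  return "Startup blocked: missing model(s): " + result.missing.join(", ");
}
```

Keeping the key out of the function signatures entirely is one simple way to honor the no-print rule: code that never receives the secret cannot leak it.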
+
+## Conversation State
+
+Each servant owns one stored conversation:
+
+- one system message with the servant identity and Overlord constraints
+- the startup assignment as a user message
+- every captured assistant response
+- future redirects appended as user messages
+
+The run registry records the model, request id when available, and conversation length.
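The stored conversation described above maps naturally onto the chat-completions message list. The following is a sketch under assumptions: `ChatMessage`, `startConversation`, `appendAssistant`, `appendRedirect`, and `toRegistryEntry` are hypothetical helpers, and the registry entry carries only the three facts the text names.

```typescript
// Hypothetical sketch: one servant's stored conversation as an append-only list.
type ChatRole = "system" | "user" | "assistant";

interface ChatMessage {
  role: ChatRole;
  content: string;
}

interface RegistryEntry {
  model: string;
  requestId?: string; // recorded only when the provider returns one
  conversationLength: number;
}

function startConversation(identityAndConstraints: string, assignment: string): ChatMessage[] {
  return [
    { role: "system", content: identityAndConstraints },
    { role: "user", content: assignment },
  ];
}

function appendAssistant(conversation: ChatMessage[], text: string): ChatMessage[] {
  return conversation.concat([{ role: "assistant", content: text }]);
}

function appendRedirect(conversation: ChatMessage[], text: string): ChatMessage[] {
  // Redirects are appended as user messages.
  return conversation.concat([{ role: "user", content: text }]);
}

function toRegistryEntry(model: string, conversation: ChatMessage[], requestId?: string): RegistryEntry {
  const entry: RegistryEntry = { model, conversationLength: conversation.length };
  if (requestId !== undefined) {
    entry.requestId = requestId;
  }
  return entry;
}
```

Each helper returns a new array instead of mutating its input, which keeps earlier snapshots of the conversation valid if a later request fails.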
+
+## Redirects
+
+`mmi-cli overlord send <target> <message>` calls `/chat/completions` for the target servant or each servant in `all`.
+
+The command returns only after the API response is captured or a bounded failure is recorded.
+
+If a request times out, errors, or returns no assistant text:
+
+- mark the servant `blocked`
+- mark the message `failed`
+- write the failure to the ledger
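The failure rules above amount to a small decision table. The sketch below is one way to express it, not the CLI's actual code: `RedirectOutcome`, `RedirectSettlement`, and `settleRedirect` are hypothetical names, the failure-reason strings are illustrative, and writing the ledger event is left to the caller.

```typescript
// Hypothetical sketch: maps one redirect outcome to servant and message state.
type RedirectOutcome =
  | { kind: "response"; text: string }
  | { kind: "timeout" }
  | { kind: "error"; detail: string };

interface RedirectSettlement {
  blockServant: boolean;
  messageState: "completed" | "failed";
  responseText?: string;
  failureReason?: string;
}

function settleRedirect(outcome: RedirectOutcome): RedirectSettlement {
  // Only a response that carries assistant text counts as delivered.
  if (outcome.kind === "response" && outcome.text.trim().length > 0) {
    return { blockServant: false, messageState: "completed", responseText: outcome.text };
  }
  const failureReason =
    outcome.kind === "timeout"
      ? "request timed out"
      : outcome.kind === "error"
        ? outcome.detail
        : "no assistant text returned";
  return { blockServant: true, messageState: "failed", failureReason };
}
```

Treating an empty response the same as a timeout or error means a message is never reported as delivered unless assistant text was actually captured.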
+
+## Ledger
+
+Append one ledger event for startup and one per redirect response.
+
+Include only operational metadata:
+
+- servant slot id
+- model
+- request id
+- response text
+- error text
+
+Never include API keys or provider secrets.
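A ledger event restricted to the fields above might be serialized as sketched here. The JSON-line encoding, the `LedgerEvent` field names, and the `scrubSecrets` and `buildLedgerLine` helpers are all assumptions for illustration; the source specifies only which metadata may appear and that secrets must not.

```typescript
// Hypothetical sketch: builds one ledger line with secrets scrubbed from free text.
interface LedgerEvent {
  servantSlotId: string;
  model: string;
  requestId?: string;
  responseText?: string;
  errorText?: string;
}

function scrubSecrets(text: string, secrets: string[]): string {
  let scrubbed = text;
  for (const secret of secrets) {
    // Skip empty strings so an unset key cannot corrupt the text.
    if (secret.length > 0) {
      scrubbed = scrubbed.split(secret).join("[redacted]");
    }
  }
  return scrubbed;
}

function buildLedgerLine(event: LedgerEvent, secrets: string[]): string {
  const safe: LedgerEvent = { servantSlotId: event.servantSlotId, model: event.model };
  if (event.requestId !== undefined) {
    safe.requestId = event.requestId;
  }
  // Provider responses and errors are free text and may echo a key back.
  if (event.responseText !== undefined) {
    safe.responseText = scrubSecrets(event.responseText, secrets);
  }
  if (event.errorText !== undefined) {
    safe.errorText = scrubSecrets(event.errorText, secrets);
  }
  return JSON.stringify(safe);
}
```

Copying fields one by one, rather than spreading the input object, ensures that any extra property a caller attaches never reaches the ledger.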
@@ -4,9 +4,9 @@ An ACK creates a lease, not permanent proof.
 
 Readiness requires:
 
-- Current
--
--
+- Current Fugu API conversation state.
+- Configured model for the servant role.
+- Stored system/user/assistant messages.
 - Matching run id and run token.
 - Recent useful signal or bounded liveness response.
 
@@ -15,13 +15,13 @@ Stale ACK-only readiness is forbidden.
 Awaiting-human:
 
 - Servants remain leased.
--
+- Run registry remains current.
 - Status rehydrates state and checks liveness.
-- If background liveness is unsupported,
+- If background liveness is unsupported, preserve conversation state and require a rehydrate pass before work resumes.
 
-
+Blocked servant:
 
-- Mark the slot
-- Preserve bounded journal.
-- Attempt recovery only when
+- Mark the slot `blocked`.
+- Preserve bounded journal and conversation state.
+- Attempt recovery only when the API key, model, and conversation can be proven.
 - Otherwise spawn a replacement in the same role slot with a compact handoff.
@@ -10,7 +10,8 @@ Detect:
 
 Rules:
 
--
+- Windows `pwsh`: use modern PowerShell syntax and native Windows paths.
+- Legacy Windows PowerShell (`powershell.exe`) is not an acceptable host shell for Overlord; use `pwsh`.
 - cmd: use cmd syntax and native Windows paths.
 - Windows Git Bash: distinguish shell paths from native Windows consumer-process paths.
 - macOS zsh/bash: use POSIX syntax and macOS paths.
@@ -6,22 +6,15 @@ Minimum fields:
 
 - `runId`
 - `runToken`
-- `repo`
 - `worktree`
-- `
-- `
-- `surface`
-- `hostPlatform`
-- `shellAdapter`
+- `engine`
+- `provider`
 - `state`
 - `createdAt`
 - `updatedAt`
-- `controllerPid`
-- `controllerFingerprint`
-- `lastControllerHeartbeatAt`
 - `statePath`
 - `journalDir`
-- `
+- `ledgerPath`
 - `servants[]`
 - `messages[]`
 - `ownedResources[]`
@@ -33,17 +26,17 @@ Servant fields:
 - `role`
 - `model`
 - `profile`
-- `state` (includes `
-- `pid`
-- `runToken`
-- `fingerprint`
+- `state` (includes `blocked` when an API request fails or returns no assistant text)
 - `composerSubmitMode`
-- `
+- `llmModel`
+- `llmRequestId`
+- `llmMessages`
 - `lastAckAt`
 - `lastLivenessCheckAt`
 - `lastUsefulSignalAt`
 - `journalPath`
 - `eventJournalPath`
+- `scopeToken` (optional non-secret attribution token bound to run id, servant slot, profile, and assignment scope)
 - `assignment`
 - `handoff`
 
@@ -53,14 +46,14 @@ Message fields:
 - `target`
 - `text`
 - `createdAt`
-- `state` (`queued` | `started` | `completed` | `failed`)
+- `state` (`queued` | `started` | `completed` | `partial` | `failed`)
 - `queuedAt`
 - `startedAt`
 - `completedAt`
 - `failedAt`
 - `responseText`
 - `failureReason`
-- `
+- `servantResults[]`
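The new `partial` state and `servantResults[]` field suggest that a fan-out message is summarized from per-servant outcomes. The schema does not define that rule, so the sketch below is an assumption throughout: the `ServantResult` field names, the `aggregateMessageState` helper, the reading of `partial` as "some completed, some failed", and the treatment of an empty result list as `failed` are all hypothetical.

```typescript
// Hypothetical sketch: derives a terminal message state from per-servant results.
interface ServantResult {
  servantSlotId: string;
  state: "completed" | "failed";
  responseText?: string;
  failureReason?: string;
}

type TerminalMessageState = "completed" | "partial" | "failed";

function aggregateMessageState(results: ServantResult[]): TerminalMessageState {
  const completedCount = results.filter((result) => result.state === "completed").length;
  // Assumption: a message with no recorded results did not deliver anything.
  if (completedCount === 0) {
    return "failed";
  }
  if (completedCount === results.length) {
    return "completed";
  }
  // Assumption: a mix of completed and failed servants is "partial".
  return "partial";
}
```

Under this reading, `send all` to three servants where one request times out would yield `partial`, with the individual failure still visible in its own result entry.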
 
 Owned resource fields:
 
@@ -1,44 +1,47 @@
-#
+# Session Leash
 
-The Overlord must own every servant
+The Overlord must own every servant conversation through a durable registry, model metadata, request ids when available, and a bounded message lifecycle.
+
+The native Fugu API is the supported Overlord engine.
 
 Startup phases shown to humans:
 
-- Loading
-- Checking Fugu
+- Loading the run registry.
+- Checking the Fugu API key and base URL.
+- Checking `/models` for `fugu` and `fugu-ultra`.
 - Starting one Ultra and normal Fugus.
+- Recording model, request, and conversation state.
 - Loading servant instructions.
-- Waiting for ACKs.
+- Waiting for API-backed ACKs.
 - Ready.
 
-Do not show raw
-
-Approval profiles:
+Do not show raw provider response bodies, stack traces, or retry noise unless startup fails or debug output is requested.
 
-
-- Implementation: `codex-fugu --no-alt-screen -a never -s workspace-write -c 'sandbox_permissions=["disk-full-read-access"]' -C <owned-worktree>`
-- Full-trust repair: only with explicit human approval and narrow blast radius.
+Servant access model:
 
-
+- Servants are API conversations, not shell sessions.
+- Servants do not receive direct tools, stage/dev servers, browsers, Playwright, PR rights, or release rights.
+- The Overlord performs tool use, checks, edits, and PR operations in the host session after judging servant advice.
+- Full-trust repair is never delegated to servants.
 
 Before launch:
 
-- Verify
--
-- Verify
+- Verify `SAKANA_API_KEY` or `MMI_OVERLORD_LLM_API_KEY` is available.
+- Verify the configured base URL, defaulting to `https://api.sakana.ai/v1`.
+- Verify `GET /models` exposes `fugu` and `fugu-ultra`, or the configured model overrides.
 - Fail closed when semantics are missing or unknown.
 
-
+Conversation probe:
 
-- Prefer initial-prompt launch through
-- Require the servant to emit
-- Record `
-- Fail startup if no
+- Prefer initial-prompt launch through `/chat/completions`.
+- Require the servant to emit useful assistant text for its startup assignment.
+- Record `llmModel`, `llmRequestId` when available, `llmMessages`, `lastEventAt`, `lastAckAt`, and `composerSubmitMode=surface-api`.
+- Fail startup if no assistant text is captured.
 
-Redirects after startup use `mmi-cli overlord send <target> <message>`; the
+Redirects after startup use `mmi-cli overlord send <target> <message>`; the CLI appends to the servant conversation and records completion or bounded failure. Do not bypass the mailbox with ad-hoc provider calls unless diagnosing the leash itself.
 
 Stop safety:
 
 - Stop only recorded resources with matching run id, run token, and fingerprint.
-- Refuse generic
+- Refuse generic process, shell, terminal, or provider names without exact ownership.
 - Refuse window-title-only ownership.
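The stop-safety rules above can be read as an exact-match test against the recorded resources. The sketch below is illustrative only: `OwnedResourceRecord`, `StopCandidate`, and `mayStop` are hypothetical names, and it assumes each recorded resource carries its own run id, run token, and fingerprint.

```typescript
// Hypothetical sketch: a candidate may be stopped only on an exact ownership match.
interface OwnedResourceRecord {
  runId: string;
  runToken: string;
  fingerprint: string;
}

interface StopCandidate {
  runId?: string;
  runToken?: string;
  fingerprint?: string;
  processName?: string; // deliberately never consulted
  windowTitle?: string; // deliberately never consulted
}

function mayStop(candidate: StopCandidate, recorded: OwnedResourceRecord[]): boolean {
  return recorded.some(
    (entry) =>
      entry.runId === candidate.runId &&
      entry.runToken === candidate.runToken &&
      entry.fingerprint === candidate.fingerprint
  );
}
```

Because process names and window titles are ignored by construction, a candidate that matches only on those can never be selected for stopping.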
|