@maestrofrontier/frontier 1.4.0

Files changed (43)
  1. package/AGENTS.md +214 -0
  2. package/CLAUDE.md +29 -0
  3. package/LICENSE +21 -0
  4. package/README.md +521 -0
  5. package/bin/maestro.cjs +75 -0
  6. package/commands/compress.md +36 -0
  7. package/commands/context-bar.md +30 -0
  8. package/commands/frontier.md +124 -0
  9. package/commands/settings.md +101 -0
  10. package/commands/terse.md +23 -0
  11. package/commands/update.md +59 -0
  12. package/docs/orchestration.md +168 -0
  13. package/frontier/cli.cjs +248 -0
  14. package/frontier/config.cjs +441 -0
  15. package/frontier/dispatch.cjs +255 -0
  16. package/frontier/judge.cjs +92 -0
  17. package/frontier/run.cjs +148 -0
  18. package/frontier/schema.cjs +112 -0
  19. package/frontier/semaphore.cjs +49 -0
  20. package/frontier/synthesize.cjs +79 -0
  21. package/hooks/frontier-autorun.cjs +124 -0
  22. package/hooks/hooks.json +103 -0
  23. package/hooks/maestro-doctrine-guard.cjs +81 -0
  24. package/hooks/maestro-gate-reminder.cjs +58 -0
  25. package/hooks/maestro-gate-telemetry.cjs +77 -0
  26. package/hooks/maestro-loop-guard.cjs +76 -0
  27. package/hooks/maestro-phase-scope.cjs +118 -0
  28. package/hooks/maestro-statusline-sync.cjs +152 -0
  29. package/hooks/maestro-subagent-guard.cjs +148 -0
  30. package/hooks/maestro-terse-mode.cjs +189 -0
  31. package/hooks/maestro-toolbudget-advisory.cjs +127 -0
  32. package/integrations/README.md +87 -0
  33. package/integrations/cline/skills/frontier/SKILL.md +75 -0
  34. package/integrations/codex/prompts/frontier.md +66 -0
  35. package/integrations/codex/prompts/update.md +36 -0
  36. package/integrations/cursor/commands/frontier.md +63 -0
  37. package/integrations/cursor/commands/update.md +34 -0
  38. package/integrations/gemini/commands/frontier.toml +76 -0
  39. package/integrations/windsurf/workflows/frontier.md +70 -0
  40. package/package.json +52 -0
  41. package/scripts/install.cjs +490 -0
  42. package/settings/cli.cjs +140 -0
  43. package/settings/config.cjs +309 -0
package/commands/context-bar.md
@@ -0,0 +1,30 @@
1
+ ---
2
+ description: Toggle the Maestro context progress bar in the Claude Code status line
3
+ argument-hint: [on|off]
4
+ allowed-tools: Bash, Read
5
+ ---
6
+
7
+ Toggle the Maestro context bar shown in the Claude Code status line.
8
+
9
+ The bar is rendered by `context-bar.ps1` (Windows) or `context-bar.sh`
10
+ (macOS / Linux) in the Claude Code statusline directory, normally
11
+ `~/.claude/statusline/`. It self-gates on a flag file named
12
+ `.context-bar-disabled` in that same directory:
13
+
14
+ - flag file ABSENT -> bar enabled (default)
15
+ - flag file PRESENT -> bar disabled (status line shows the folder name only)
16
+
17
+ Requested state: `$ARGUMENTS`
18
+
19
+ Steps:
20
+
21
+ 1. Find the statusline directory: it is the directory containing the script
22
+ path in the `statusLine.command` field of `~/.claude/settings.json`.
23
+ Default to `~/.claude/statusline/` if it cannot be determined.
24
+ 2. Resolve the action against the flag file `.context-bar-disabled` in that
25
+ directory:
26
+ - `on` / `enable` -> delete the flag file (no-op if already absent).
27
+ - `off` / `disable` -> create an empty flag file.
28
+ - no argument -> toggle: delete the flag if present, else create it.
29
+ 3. Confirm the resulting state in one line. The change takes effect on the
30
+ next status line refresh.
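The flag-file steps above can be sketched in Node. This is an illustration only, not code from the package: the function name `toggleContextBar` and its return value are assumptions, and the caller is assumed to have already resolved the statusline directory (step 1).

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Flag file name taken from the command text above.
const FLAG = ".context-bar-disabled";

// Hypothetical helper: applies on/off/toggle and returns the resulting state.
function toggleContextBar(dir, arg) {
  const flag = path.join(dir, FLAG);
  const disabled = fs.existsSync(flag);
  const want = (arg || "").trim().toLowerCase();

  let enable;
  if (want === "on" || want === "enable") enable = true;
  else if (want === "off" || want === "disable") enable = false;
  else if (want === "") enable = disabled; // no argument: flip the current state
  else throw new Error(`unknown argument: ${arg}`);

  if (enable) fs.rmSync(flag, { force: true }); // no-op if already absent
  else fs.writeFileSync(flag, "");              // empty flag file disables the bar
  return enable ? "on" : "off";
}
```

Deleting with `force: true` keeps the `on` path idempotent, which matches the "no-op if already absent" wording in step 2.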
package/commands/frontier.md
@@ -0,0 +1,124 @@
1
+ ---
2
+ description: "Maestro Frontier local multi-CLI engine: arm a mode (off/single/fusion) so it auto-runs every prompt, pick a model/preset, or run a one-off prompt"
3
+ argument-hint: "<off | single <model> | fusion <preset> | status | run <prompt> | adopt>"
4
+ allowed-tools: Bash, Read
5
+ ---
6
+
7
+ Drive the Maestro Frontier engine: a zero-dependency local multi-CLI
8
+ fusion engine (parallel panel of local CLIs -> a judge model's
9
+ analysis -> grounded synthesis). Default mode is `off`: the engine is opt-in and
10
+ never runs until you switch it on. Arming it (`single` or `fusion`)
11
+ makes it **auto-run on every prompt** — a `UserPromptSubmit` hook
12
+ (`hooks/frontier-autorun.cjs`) routes each prompt through the engine and
13
+ the live session relays the synthesized answer. `off` disables auto-run.
14
+ Autorun blocks the turn until the engine returns; the hook carries a
15
+ 300s timeout (`hooks/hooks.json`), and a run that exceeds it is skipped
16
+ so the turn proceeds normally. Any engine error degrades the same way.
17
+
18
+ Requested action: `$ARGUMENTS`
19
+
20
+ Map the argument to one engine CLI call and run it. The engine is
21
+ self-contained; do not edit its state file yourself.
22
+
23
+ 1. Mode switch (persists across sessions; default `off`):
24
+
25
+ ```bash
26
+ node "${CLAUDE_PLUGIN_ROOT}/bin/maestro.cjs" frontier mode off
27
+ node "${CLAUDE_PLUGIN_ROOT}/bin/maestro.cjs" frontier mode single --model <model>
28
+ node "${CLAUDE_PLUGIN_ROOT}/bin/maestro.cjs" frontier mode fusion --preset <preset>
29
+ node "${CLAUDE_PLUGIN_ROOT}/bin/maestro.cjs" frontier mode fusion --preset custom --models <a,b,c>
30
+ node "${CLAUDE_PLUGIN_ROOT}/bin/maestro.cjs" frontier mode fusion --preset <preset> --judge <model> --synth <model>
31
+ ```
32
+
33
+ Models: `opus` (Claude Opus 4.8), `gpt-5.5` (Codex), `gemini`
34
+ (Gemini 3.1 Pro). Presets: `opus-duo`, `opus-gpt`, `gpt-duo`,
35
+ `frontier-trio`, `custom`. The judge + synthesizer default to Opus,
36
+ but `gpt-duo` runs them on GPT-5.5 (a Codex-only fusion that needs no
37
+ `claude`), and `--judge`/`--synth` override the model for any preset,
38
+ so you can mix freely (e.g. `--judge opus --synth gpt-5.5`).
39
+
40
+ 2. Show current mode/preset:
41
+
42
+ ```bash
43
+ node "${CLAUDE_PLUGIN_ROOT}/bin/maestro.cjs" frontier status
44
+ ```
45
+
46
+ 3. Run a prompt through the current mode (prompt as the argument, or
47
+ piped on stdin). This is a manual one-off; when the engine is armed
48
+ (`single`/`fusion`) the autorun hook already runs every prompt for
49
+ you, so `run` is mainly for scripting or an explicit re-run:
50
+
51
+ ```bash
52
+ node "${CLAUDE_PLUGIN_ROOT}/bin/maestro.cjs" frontier run "<prompt>"
53
+ ```
54
+
55
+ - `off`: prints a notice and exits without spawning anything;
56
+ normal Maestro behavior is unchanged.
57
+ - `single`: dispatches the one selected CLI and prints its answer.
58
+ - `fusion`: runs the panel in parallel, then the judge and
59
+ synthesizer models; prints the final answer (a one-line run meta of
60
+ preset, models, analysis present, and failed models goes to stderr).
61
+
62
+ 4. Adopt a previously-armed **global** mode into this workspace
63
+ (per-workspace isolation means a workspace never inherits the old
64
+ global state automatically — this copies it in once, on demand):
65
+
66
+ ```bash
67
+ node "${CLAUDE_PLUGIN_ROOT}/bin/maestro.cjs" frontier adopt
68
+ node "${CLAUDE_PLUGIN_ROOT}/bin/maestro.cjs" frontier adopt --force
69
+ ```
70
+
71
+ Reads the legacy global `frontier-state.json` read-only and writes it
72
+ into this workspace's `frontier-state.cc-<hash>.json`. It refuses to
73
+ overwrite an existing workspace state unless `--force`, never touches
74
+ the legacy file, and only targets a Claude Code `cc-*` scope. If
75
+ there is no legacy state (`missing-legacy`) it does nothing — arm the
76
+ workspace with `mode` instead.
77
+
78
+ 5. Report the result, matched to the action:
79
+ - `run`: report the engine's stdout verbatim.
80
+ - `adopt`: confirm the adopted mode/preset, or relay the
81
+ `ERROR [<reason>]` (e.g. `exists` — suggest `--force`; `missing-legacy`
82
+ — nothing to adopt). Arming now auto-runs on every prompt.
83
+ - arming a mode (`single`/`fusion`): confirm the mode, then tell the
84
+ user plainly that the engine now **auto-runs on every prompt** —
85
+ they just chat normally and the synthesized answer is relayed.
86
+ Do NOT tell the user to call `run` manually; arming already routes
87
+ every prompt through the engine (`run` is only a scripted one-off).
88
+ - `off`: confirm auto-run is disabled and normal Maestro resumes.
89
+ - `status`: report the current mode/preset.
90
+
91
+ On an error the engine prints `ERROR [<failure_reason>]: <detail>`
92
+ to stderr and exits non-zero; relay the failure_reason.
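A caller that relays the failure reason needs to pull it out of the stderr line. The sketch below assumes the `ERROR [<failure_reason>]: <detail>` shape quoted above and nothing more; `parseEngineError` is a hypothetical name, and the detail text used in the usage line is invented for illustration.

```javascript
// Matches one line of the form: ERROR [<failure_reason>]: <detail>
const ERROR_LINE = /^ERROR \[([^\]]+)\]: (.*)$/m;

// Hypothetical helper: returns { reason, detail } or null when no error line is found.
function parseEngineError(stderr) {
  const match = ERROR_LINE.exec(stderr);
  return match ? { reason: match[1], detail: match[2] } : null;
}
```

For example, `parseEngineError("ERROR [exists]: state already present\n").reason` yields `"exists"`, which is the value the command is told to relay.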
93
+
94
+ Notes:
95
+
96
+ - **Arming in Claude Code is per-workspace.** State is stored in a
97
+ workspace-scoped file (`frontier-state.cc-<hash>.json`, keyed to the
98
+ git project root). Arming in one workspace does not affect any other.
99
+ After upgrading Maestro, each Claude Code workspace starts `off` and
100
+ must be re-armed — no automatic migration into workspace scopes.
101
+ `adopt` (step 4) is the explicit opt-in to copy a prior global mode in.
102
+ - Real `single`/`fusion` runs spawn local CLIs and cost tokens; each
103
+ cold `claude -p` panel/judge/synth call is non-trivial. Use small
104
+ prompts. `off` is free.
105
+ - Headless web access varies per CLI (Codex confirmed; Claude and
106
+ Gemini gated off in this build). The engine sets a per-adapter
107
+ `webTools` flag accordingly; see the risk burndown.
108
+ - `gemini` works well as a panel member, but on Windows it is a poor
109
+ `--judge`/`--synth` choice: it takes its prompt as an argument and the
110
+ judge/synth prompts contain newlines, which the win32 arg-safety guard
111
+ refuses, so the stage then degrades (judge omitted, synth falls back).
112
+ Use `opus` or `gpt-5.5` for judge/synth on Windows. (No such limit on
113
+ macOS/Linux, where args are passed directly.)
114
+
115
+ ## Binary overrides
116
+
117
+ The engine is zero-dependency CommonJS under `frontier/`. Each CLI is
118
+ resolved from your `PATH`: `claude` (Opus 4.8), `codex` (GPT-5.5), and
119
+ `gemini` (Gemini 3.1 Pro). When a binary is not on `PATH`, or you want a
120
+ specific build, point at it with an environment variable:
121
+
122
+ - `MAESTRO_CLAUDE_BIN` sets the `claude` binary path.
123
+ - `MAESTRO_CODEX_BIN` sets the `codex` binary path.
124
+ - `MAESTRO_GEMINI_BIN` sets the `gemini` binary path.
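The override rule described above (environment variable first, otherwise the bare name resolved from `PATH`) might look like the following. It is a sketch under that assumption, not the engine's actual resolver; `resolveBinary` and `BIN_ENV` are hypothetical names, while the three variable names come from the list above.

```javascript
// Environment variable per CLI, as listed in the command text.
const BIN_ENV = {
  claude: "MAESTRO_CLAUDE_BIN",
  codex: "MAESTRO_CODEX_BIN",
  gemini: "MAESTRO_GEMINI_BIN",
};

// Hypothetical helper: returns the override path if set, else the bare
// command name, which a process spawner would then look up on PATH.
function resolveBinary(name, env = process.env) {
  const key = BIN_ENV[name];
  if (!key) throw new Error(`unknown CLI: ${name}`);
  return env[key] || name;
}
```

Passing `env` as a parameter keeps the function testable without mutating `process.env`.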
package/commands/settings.md
@@ -0,0 +1,101 @@
1
+ ---
2
+ description: View and change Maestro's toggles (terse, frontier, context-bar) — direct args or a keyboard picker
3
+ argument-hint: "[status | list | help | set <key> <value> | terse|frontier|context-bar <value>]"
4
+ allowed-tools: Bash, AskUserQuestion
5
+ ---
6
+
7
+ See and change Maestro's toggles: terse output, the frontier fusion engine,
8
+ and the context-bar status line. `compress` is an action, not a toggle, so it
9
+ is not shown here.
10
+
11
+ Two ways to use it. With **no arguments** it opens a keyboard picker. With
12
+ **arguments** it runs the change directly and finishes — no questionnaire.
13
+
14
+ One writer owns all state: `settings/cli.cjs`, which reads and writes the
15
+ three existing stores. This command never edits those files directly; it
16
+ always goes through the CLI so the existing readers stay in sync. Throughout,
17
+ `SCLI` = `node "${CLAUDE_PLUGIN_ROOT}/settings/cli.cjs"`.
18
+
19
+ Requested action: `$ARGUMENTS`
20
+
21
+ ## Route on `$ARGUMENTS`
22
+
23
+ Trim it and split into tokens. The first token selects the path; **anything
24
+ non-empty skips the picker entirely.**
25
+
26
+ - **empty** → run the **Interactive picker** section below.
27
+ - **`status`** (optionally `--json`) → run `SCLI status [--json]`, print it, stop.
28
+ - **`list`** (optionally `--json`) → run `SCLI list [--json]`, print it, stop.
29
+ - **`help`**, `-h`, `--help`, or anything unrecognized → run `SCLI help`,
30
+ print it (usage grammar + the full value matrix), stop. For an
31
+ unrecognized token, say so first, then show help.
32
+ - **`set <key> <value> [flags]`** → pass straight through:
33
+ `SCLI set <key> <value> [--judge M] [--synth M] [--models a,b,c]`.
34
+ - **shorthand** — first token is a key (`terse`, `frontier`, `context-bar`,
35
+ or `bar`): treat the rest as the value and run `SCLI set <key> <value>`.
36
+
37
+ ### Normalizing the value (both `set` and shorthand)
38
+
39
+ The CLI takes one frontier value with a colon (`single:opus`,
40
+ `fusion:opus-gpt`). Accept the friendlier space form and normalize:
41
+
42
+ - `frontier off` → `SCLI set frontier off`
43
+ - `frontier single opus` → `SCLI set frontier single:opus`
44
+ - `frontier fusion opus-gpt` → `SCLI set frontier fusion:opus-gpt`
45
+ - `frontier fusion custom --models opus,gpt-5.5,gemini`
46
+ → `SCLI set frontier fusion:custom --models opus,gpt-5.5,gemini`
47
+ - `frontier fusion opus-gpt --judge opus --synth gpt-5.5` → pass the flags through
48
+ - already-coloned (`frontier fusion:opus-gpt`) → pass as-is
49
+ - `terse ultra`, `context-bar off` → `SCLI set terse ultra`, `SCLI set context-bar off`
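The normalization rules above can be expressed as a small token rewrite. This is a minimal sketch, assuming the tokens have already been split and any leading `set` removed; `normalizeSetArgs` is a hypothetical name and the returned array is the argument list that would follow `SCLI`.

```javascript
// Hypothetical helper: turns ["frontier", "single", "opus"] into
// ["set", "frontier", "single:opus"]; other keys pass through unchanged.
function normalizeSetArgs(tokens) {
  const [key, ...rest] = tokens;
  if (key !== "frontier") return ["set", key, ...rest];

  const [mode, next, ...tail] = rest;
  // Join only the space form; an already-coloned value or a flag is left alone.
  const needsJoin =
    (mode === "single" || mode === "fusion") &&
    next !== undefined &&
    !next.startsWith("--");

  return needsJoin
    ? ["set", key, `${mode}:${next}`, ...tail]
    : ["set", key, ...rest];
}
```

Flags such as `--models`, `--judge` and `--synth` end up in `tail` and are passed through untouched, as the list above requires.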
50
+
51
+ After any write, report any `WARNING:` line the CLI prints (for example an
52
+ active `MAESTRO_TERSE_LEVEL` override or an unconfirmed status-line script),
53
+ then run `SCLI status` so the result is visible. Confirm in one line.
54
+
55
+ If a value is invalid the CLI exits non-zero with a message — relay it and
56
+ show `SCLI help` so the user sees the valid values.
57
+
58
+ ## Interactive picker (no arguments)
59
+
60
+ 1. Read state and the catalog: `SCLI status --json` and `SCLI list --json`.
61
+ Build every option below from `list` — never hardcode a model or preset.
62
+
63
+ 2. First `AskUserQuestion` call, three questions, each pre-set to the current
64
+ value (only write a toggle whose answer differs from `status`):
65
+ - terse: `list.terse.values` (`off`, `lite`, `full`, `ultra`).
66
+ - context-bar: `list.contextBar.values` (`on`, `off`).
67
+ - frontier mode: `off`, `single`, `fusion` (`list.frontier.modes`).
68
+
69
+ 3. Frontier follow-ups (only if mode is not `off`):
70
+ - **single** → one question, `model` = `list.frontier.models` (3 options, so they fit in one question);
71
+ result `single:<id>`.
72
+ - **fusion** → pick from `list.frontier.presets` (named presets + `custom`);
73
+ there are more than 4, so **page them 3 at a time** with a fourth
74
+ `More presets…` option that re-asks with the next three until all,
75
+ including `custom`, have been offered. Never drop a preset or require
76
+ typing its name.
77
+ - **custom** → one `multiSelect` question, `models` = `list.frontier.models`,
78
+ 2–8 selected; result `fusion:custom --models <ids,joined>`.
79
+ - After any fusion preset, offer **override judge/synth?** (`No` uses
80
+ `list.frontier.presetStages[preset]` if present else
81
+ `list.frontier.defaults`; `Yes` → two questions, `judge` and `synth`,
82
+ each `list.frontier.stageModels`). On Windows prefer `opus`/`gpt-5.5`;
83
+ `gemini` degrades as judge/synth (see `commands/frontier.md`).
84
+
85
+ 4. Write each change with `SCLI set ...` (as in the routing section), report
86
+ warnings, then show `SCLI status`.
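The preset paging in step 3 (three presets per question, plus a fourth `More presets…` option while any remain) can be sketched as below. `pagePresets` and `MORE` are hypothetical names; the real option list would come from `list.frontier.presets` rather than a literal array.

```javascript
// Label for the paging option, taken from the command text.
const MORE = "More presets…";

// Hypothetical helper: splits presets into pages of `pageSize`, appending
// the MORE option to every page except the last so nothing is dropped.
function pagePresets(presets, pageSize = 3) {
  const pages = [];
  for (let i = 0; i < presets.length; i += pageSize) {
    const page = presets.slice(i, i + pageSize);
    if (i + pageSize < presets.length) page.push(MORE);
    pages.push(page);
  }
  return pages;
}
```

With the five presets named in `commands/frontier.md`, this gives one page of three plus `More presets…`, then a final page holding `frontier-trio` and `custom`.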
87
+
88
+ ## Codex and any other CLI
89
+
90
+ No keyboard picker (Codex cannot host one). Use the CLI directly — it is the
91
+ writer this command calls:
92
+
93
+ ```bash
94
+ node settings/cli.cjs help # usage grammar + every value
95
+ node settings/cli.cjs status # current values
96
+ node settings/cli.cjs set frontier fusion:frontier-trio
97
+ node settings/cli.cjs set frontier fusion:custom --models opus,gpt-5.5,gemini
98
+ node settings/cli.cjs set frontier fusion:opus-gpt --judge opus --synth gpt-5.5
99
+ ```
100
+
101
+ See [`docs/settings.md`](../docs/settings.md) for the full reference.
package/commands/terse.md
@@ -0,0 +1,23 @@
1
+ ---
2
+ description: Switch Maestro terse output mode (lite|full|ultra|off)
3
+ argument-hint: "[lite|full|ultra|off]"
4
+ ---
5
+
6
+ Switch the Maestro terse output level. Requested level: `$ARGUMENTS`
7
+
8
+ The maestro-terse-mode hook already updated the flag file when this
9
+ command was submitted (bare invocation activates the configured
10
+ default, or `full`). Do not edit the flag yourself.
11
+
12
+ Steps:
13
+
14
+ 1. Read `${CLAUDE_PLUGIN_ROOT}/skills/terse/SKILL.md` and apply the
15
+ ruleset for the requested level from this response onward
16
+ (`off`: drop terse style entirely).
17
+ 2. Confirm in one line: new level, how to switch off
18
+ (`/maestro:terse off`, "stop terse", "normal mode"), and that the
19
+ statusline badge shows `[TERSE:<LEVEL>]` on the next session refresh.
20
+
21
+ Boundaries (always): code/commits/PRs written normal; Auto-Clarity
22
+ escape for security warnings, irreversible-action confirmations, and
23
+ multi-step sequences.
package/commands/update.md
@@ -0,0 +1,59 @@
1
+ ---
2
+ description: Update the Maestro plugin to the latest marketplace code and instruct the user to reload or restart
3
+ argument-hint: ""
4
+ allowed-tools: Bash, Read
5
+ ---
6
+
7
+ Pull the latest Maestro plugin code from the marketplace and guide the user
8
+ through applying it to the running session.
9
+
10
+ Maestro pins no plugin version — the marketplace clone always tracks the newest
11
+ committed code, so a marketplace update followed by a session reload resolves
12
+ the latest without any manual version bump.
13
+
14
+ ## Steps
15
+
16
+ 1. **Pull the latest plugin code** via Bash:
17
+
18
+ ```bash
19
+ claude plugin marketplace update maestro
20
+ ```
21
+
22
+ This is the non-interactive CLI form of the marketplace update. Capture
23
+ stdout/stderr. If the command exits non-zero, report the error verbatim and
24
+ stop. If `claude` is not on PATH, tell the user to run this command in their
25
+ terminal manually and skip to step 3.
26
+
27
+ 2. **Report what changed** — after a successful update, show the latest commits:
28
+
29
+ ```bash
30
+ git -C "${CLAUDE_PLUGIN_ROOT}" log --oneline -5 2>/dev/null || true
31
+ ```
32
+
33
+ If the plugin root is not a git checkout (e.g. a marketplace zip install),
34
+ note the version is whatever the marketplace published and skip this step.
35
+
36
+ 3. **Reload the running session** — the plugin binary is pinned at session
37
+ start, so the new code is NOT live yet. Tell the user:
38
+
39
+ > Run `/reload-plugins` in this session to apply the update without
40
+ > restarting. If that command warns or is unavailable, restart Claude Code —
41
+ > the updated plugin loads automatically on next launch.
42
+
43
+ Do not attempt to run `/reload-plugins` via Bash; it is an in-session
44
+ slash command that must be entered by the user in the Claude Code UI.
45
+
46
+ 4. **Post-restart re-sync** — on restart the `SessionStart` hook fires and
47
+ re-syncs any wired copies (e.g. the statusline context-bar script).
48
+ This happens automatically; no manual step is needed.
49
+
50
+ 5. **Confirm** — report: update pulled, session reload required via
51
+ `/reload-plugins` or restart.
52
+
53
+ Notes:
54
+
55
+ - `/reload-plugins` applies the new code in-session; a full restart is
56
+ always a safe fallback.
57
+ - Do NOT tell the user to run `/plugin update maestro` — that command does not
58
+ exist in Claude Code.
59
+ - Do not edit any files.
package/docs/orchestration.md
@@ -0,0 +1,168 @@
1
+ # Maestro Multi-Agent Orchestration: Full Protocol (S2-S6)
2
+
3
+ Loaded on demand: read this file when the Decision Gate
4
+ ([AGENTS.md](../AGENTS.md) S1) returns a multi-agent verdict. The
5
+ kernel's compact protocol is a subset of this document and suffices
6
+ when this file is unavailable. Relocated verbatim from the always-on
7
+ doctrine in v1.2; content here extends the kernel, never overrides
8
+ it.
9
+
10
+ ---
11
+
12
+ ## Gate constraints (S1 detail)
13
+
14
+ - Max 4 specialists per group.
15
+ - >60% shared files or <=3 files in one chain: single-agent.
16
+ - Overlapping ownership erases parallelism; high-centrality: bias
17
+ single.
18
+ - Specialists must differ in role or context, not split identical
19
+ work; homogeneous splits underperform one agent with the same
20
+ budget. Split-design rule for the Planner, not a gate downgrade.
21
+ - Parallelizability first: specialization pays only when subtasks are
22
+ structurally independent. Coupled subtasks: single-agent wins at
23
+ equal token budget; gains that ignore total compute don't count.
24
+ - Adversarial review is the best-evidenced multi-agent win. Review
25
+ and debate panels: 3 specialists (odd, no ties); 4 stays the cap
26
+ for parallel workstreams.
27
+ - How to split (and whether a split is too homogeneous) is the
28
+ Planner's call (S2), made after the spawn, never the gate's.
29
+
30
+ ---
31
+
32
+ ## 2. Planner [MULTI-AGENT]
33
+
34
+ First sub-agent, created by calling the Task/Agent tool, never
35
+ simulated inline by the orchestrator. No specialist work before
36
+ Planner returns.
37
+
38
+ Produces: subtasks with boundaries, dependency map, parallel groups
39
+ (max 4), per-task file scope + objective + acceptance criteria, flags
40
+ for single-agent subtasks and high-risk items, cross-talk pairs,
41
+ token-cost assessment (flag >60% overlap), task-class match.
42
+
43
+ Fewer broader > many narrow. Flag ambiguity, don't assume.
44
+
45
+ Reading: recommends single-agent -> switch. Ambiguities -> surface.
46
+
47
+ Task classes: Feature (spec/implement/test/integrate),
48
+ Bug (reproduce/root-cause/fix/regress),
49
+ Refactor (scope/refactor/test/verify),
50
+ Audit (discover/analyze/consolidate), Docs+code (change/update/check).
51
+
52
+ ---
53
+
54
+ ## 3. Specialists [MULTI-AGENT]
55
+
56
+ Manifest fields: ROLE, TASK, FILES (read/modify), UPSTREAM,
57
+ ORIENTATION, ASSUMPTIONS, OUTPUT, ACCEPT, TOOLS (scoped), RULES (S7
58
+ injected). ROLE = procedural workflow (step sequence + acceptance
59
+ criteria), never a bare job title; identity labels alone don't
60
+ change behavior.
61
+
62
+ No conversation history, other tasks, full plan, or unrelated
63
+ context. Isolation is the advantage. Out of scope: report and stop.
64
+
65
+ ---
66
+
67
+ ## 4. Cross-Talk [MULTI-AGENT]
68
+
69
+ After each group: check if A modified B's files, changed B's
70
+ interfaces, invalidated B's assumptions, or produced B's inputs.
71
+
72
+ Route minimum context from A to B. If B completed, spawn correction
73
+ agent. Orchestrator: spawn, sequence, detect, route, deliver. Never
74
+ plan, code, review, or do specialist work.
75
+
76
+ ---
77
+
78
+ ## 5. Staff Engineer [MULTI-AGENT]
79
+
80
+ Final sub-agent. Reviews integrated output.
81
+
82
+ Packet: changed files + diffs, objective, decisions, risks,
83
+ questions. Expand for: core architecture, security, central
84
+ abstractions.
85
+
86
+ Check: requirements met, specialist contradictions, cross-breakage
87
+ (interfaces/imports/types/state), architectural drift, verification
88
+ (S7.3), dead code/orphaned imports/incomplete renames,
89
+ surgical-scope violations (S7.4).
90
+
91
+ Returns PASS or FAIL (issues + owner + fix). Max 2 cycles, then
92
+ deliver with issues listed.
93
+
94
+ High-risk or contested verdicts: adversarial panel of 3 (odd, no
95
+ ties), each prompted to refute, not confirm.
96
+
97
+ ---
98
+
99
+ ## 6. Orchestrator Discipline [MULTI-AGENT]
100
+
101
+ - Route minimum viable info (signature, not 200-line diff)
102
+ - Checkpoint before spawns/handoffs/resumes: objective, files,
103
+ requirements, decisions, risks, next action
104
+ - Structured artifacts > transcript carryover
105
+ - Stable scaffolds for cache reuse; no per-specialist rephrasing
106
+ - Track agent status; report blocks immediately
107
+ - Resume from latest artifact, not full history
108
+ - Specialist fails: report, ask user. No silent retry >1
109
+ - Deliver what asked. No gold-plating. Hooks > prompt reminders
110
+
111
+ ---
112
+
113
+ ## 9. Model Routing: full table
114
+
115
+ Pick the cheapest model that handles the task. Orchestrator decides
116
+ at spawn time; Planner (S2) assigns per subtask.
117
+
118
+ | Tier | When | Examples |
119
+ |------|------|----------|
120
+ | Haiku | No edits, single source, low reasoning | Status lookup, chat, format, classify, extract |
121
+ | **Sonnet** | 1-3 file edits, known scope. **Default** | Bug fix, refactor, test, review, docs |
122
+ | Opus | 4+ files, novel design, high reversal cost | Architecture, security review, complex debug |
123
+ | Frontier (Fable-class) | Orchestrator tier: long-horizon autonomous work, 1M-context audits, frontier reasoning | Orchestration, system design, deep multi-file debug, adversarial synthesis |
124
+
125
+ When unsure: Sonnet.
126
+
127
+ ### Output caps
128
+
129
+ Agent prompts MUST specify max response length. Oversized results
130
+ bloat parent context and trigger compaction.
131
+
132
+ | Agent tier | Cap | Exception |
133
+ |------------|-----|-----------|
134
+ | Haiku | 100 words | - |
135
+ | Sonnet | 500 words | Code output (uncapped) |
136
+ | Opus | Uncapped | - |
137
+ | Frontier | Uncapped | - |
138
+ | Explore | 200 words | Always, regardless of model |
139
+
140
+ Explore agents: "report in under 200 words" in every prompt.
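The output-cap table reduces to a small lookup. The sketch below is one possible reading of it, not part of Maestro: `outputCap` is a hypothetical name, `null` stands for "uncapped", and the `explore` and `codeOutput` options are assumed inputs that model the Explore row and the Sonnet code exception.

```javascript
// Word caps per agent tier from the table above; null means uncapped.
const CAPS = { haiku: 100, sonnet: 500, opus: null, frontier: null };

// Hypothetical helper: returns the max response length in words, or null.
function outputCap(tier, { explore = false, codeOutput = false } = {}) {
  if (explore) return 200; // Explore agents: always 200, regardless of model
  const key = tier.toLowerCase();
  if (!Object.hasOwn(CAPS, key)) throw new Error(`unknown tier: ${tier}`);
  if (key === "sonnet" && codeOutput) return null; // code output is uncapped
  return CAPS[key];
}
```

An orchestrator could use the result to append a line such as "report in under 200 words" to a prompt whenever the cap is not `null`.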
141
+
142
+ ### Tool-call budgets
143
+
144
+ Action tokens are the third cost lever, beside output caps (above)
145
+ and S8 input compression. Every subagent prompt carries a tool-call
146
+ budget (manifest field `toolBudget`); idea adapted from
147
+ claude-token-efficient (MIT).
148
+
149
+ | Task type | Budget |
150
+ |-----------|--------|
151
+ | Routine subtask, known scope | ~20 calls |
152
+ | Read-only research / Explore | ~10 calls |
153
+ | Multi-file implementation | scale with file count; state it explicitly |
154
+
155
+ Discipline inside the budget: read-first-write-once (read each
156
+ needed file once, then edit, no re-read loops); one diagnostic read
157
+ per failure, then the S7.3 two-attempt rule applies (stop, re-read
158
+ from scratch, change approach). Budget exhausted: report progress
159
+ and the named gap; never burn calls polling.
160
+ Research agents returning raw dumps waste more tokens than they save.
161
+
162
+ ---
163
+
164
+ ## Self-evaluation (relocated S7.6)
165
+
166
+ - Two perspectives: perfectionist critique + pragmatist accept
167
+ - Bug autopsy: root cause vs symptom, prevention
168
+ - After 2 failures: stop, re-read from scratch, different approach