@maestrofrontier/frontier 1.4.4 → 1.5.0

Files changed (53)
  1. package/.agents/plugins/marketplace.json +21 -0
  2. package/.codex-plugin/plugin.json +29 -0
  3. package/.cursorrules +197 -194
  4. package/AGENTS.md +214 -214
  5. package/CLAUDE.md +29 -29
  6. package/README.md +368 -278
  7. package/bin/maestro.cjs +75 -75
  8. package/commands/compress.md +36 -36
  9. package/commands/frontier.md +124 -124
  10. package/commands/terse.md +23 -23
  11. package/docs/codex.md +167 -98
  12. package/docs/orchestration.md +168 -168
  13. package/frontier/cli.cjs +279 -248
  14. package/frontier/config.cjs +468 -441
  15. package/frontier/dispatch.cjs +267 -255
  16. package/frontier/judge.cjs +92 -92
  17. package/frontier/run.cjs +201 -148
  18. package/frontier/schema.cjs +112 -112
  19. package/frontier/semaphore.cjs +49 -49
  20. package/frontier/synthesize.cjs +79 -79
  21. package/hooks/frontier-autorun.cjs +127 -124
  22. package/hooks/hooks.json +103 -103
  23. package/hooks/maestro-doctrine-guard.cjs +81 -81
  24. package/hooks/maestro-gate-reminder.cjs +22 -7
  25. package/hooks/maestro-gate-telemetry.cjs +79 -77
  26. package/hooks/maestro-phase-scope.cjs +118 -118
  27. package/hooks/maestro-statusline-sync.cjs +152 -152
  28. package/hooks/maestro-subagent-guard.cjs +148 -148
  29. package/hooks/maestro-terse-mode.cjs +189 -189
  30. package/hooks/maestro-toolbudget-advisory.cjs +127 -127
  31. package/integrations/README.md +111 -94
  32. package/integrations/cline/skills/frontier/SKILL.md +75 -75
  33. package/integrations/codex/prompts/frontier.md +70 -66
  34. package/integrations/codex/prompts/update.md +39 -36
  35. package/integrations/codex/skills/maestro-frontier/SKILL.md +122 -0
  36. package/integrations/codex/skills/{settings → maestro-settings}/SKILL.md +55 -46
  37. package/integrations/codex/skills/{terse → maestro-terse}/SKILL.md +58 -49
  38. package/integrations/codex/skills/maestro-update/SKILL.md +31 -0
  39. package/integrations/cursor/commands/frontier.md +63 -63
  40. package/integrations/cursor/commands/update.md +34 -34
  41. package/integrations/gemini/commands/frontier.toml +76 -76
  42. package/integrations/windsurf/workflows/frontier.md +70 -70
  43. package/package.json +58 -55
  44. package/scripts/install.cjs +1014 -605
  45. package/settings/cli.cjs +140 -140
  46. package/settings/config.cjs +309 -309
  47. package/skills/maestro-frontier/SKILL.md +122 -0
  48. package/skills/maestro-settings/SKILL.md +55 -0
  49. package/skills/maestro-terse/SKILL.md +58 -0
  50. package/skills/maestro-update/SKILL.md +31 -0
  51. package/skills/terse/SKILL.md +74 -0
  52. package/integrations/codex/skills/frontier/SKILL.md +0 -91
  53. package/integrations/codex/skills/update/SKILL.md +0 -29
package/docs/codex.md CHANGED
@@ -1,98 +1,167 @@
1
- # Maestro on Codex
2
-
3
- Codex reads `AGENTS.md` natively, no adapter file needed. This page
4
- maps Maestro's concepts onto Codex specifics. All behavior below was
5
- verified against the official Codex docs
6
- ([AGENTS.md guide](https://developers.openai.com/codex/guides/agents-md),
7
- [Automations](https://developers.openai.com/codex/app/automations.md),
8
- [Subagents](https://developers.openai.com/codex/subagents.md))
9
- on 2026-06-12.
10
-
11
- ## AGENTS.md semantics
12
-
13
- Codex discovers instruction files in this order:
14
-
15
- 1. **Global:** `~/.codex/AGENTS.override.md` if present, else
16
- `~/.codex/AGENTS.md`.
17
- 2. **Project:** walking from the Git root down to the current working
18
- directory, checking each level for `AGENTS.override.md`, then
19
- `AGENTS.md`.
20
-
21
- Files are concatenated root-down with blank lines between them; files
22
- closer to your current directory appear later in the combined prompt
23
- and therefore override earlier guidance. Codex skips empty files,
24
- discovers once per run, and stops adding files once the combined set
25
- hits `project_doc_max_bytes` (32 KiB by default).
26
-
27
- Practical consequences for Maestro:
28
-
29
- - **Placement:** put Maestro's `AGENTS.md` at the repository root. If
30
- you already have a project `AGENTS.md`, append Maestro's content to
31
- it (Codex concatenates by directory level, not by file).
32
- - **Budget:** Maestro's always-on kernel is ~8 KB, a quarter of the
33
- default 32 KiB cap, leaving room for your project instructions
34
- (the full S2-S6 protocol lives in `docs/orchestration.md`, read on
35
- demand). If you layer nested `AGENTS.md` files, watch the cap:
36
- Codex silently stops adding files beyond it.
37
- - **Global install:** putting Maestro in `~/.codex/AGENTS.md` applies
38
- the doctrine to every project; per-repo files then layer on top and
39
- win where they conflict.
40
-
41
- ## Multi-agent routing (S2-S6 mapping)
42
-
43
- Codex supports subagent workflows in the CLI and app, but current Codex
44
- docs specify that subagents spawn only when the user explicitly asks for
45
- them. Practical mapping for Maestro:
46
-
47
- - If the user did **not** explicitly ask for subagents, parallel
48
- agents, or delegation, emit the counted S1 verdict and continue
49
- single-agent even when the portable gate would otherwise route to
50
- S2-S6.
51
- - If the user explicitly asked for subagents/parallel work and the S1
52
- gate returns multi-agent, map Maestro's Planner, Specialists, and
53
- Staff Engineer to Codex subagents. Keep specialist prompts scoped and
54
- cap parallel groups at 4 as usual.
55
- - Claude Code agent teams do not transfer to Codex. Codex subagents are
56
- the only Codex-native mapping for Maestro specialists.
57
-
58
- ## Long-horizon operation (S10 mapping)
59
-
60
- Claude Code maps S10 to `/loop`, `/schedule`, and `ScheduleWakeup`.
61
- The Codex analog is **Automations** (Codex app, automations pane):
62
- recurring prompts that run in the background on minute-based, daily,
63
- weekly, or cron schedules.
64
-
65
- | Maestro S10 concept | Codex mechanism |
66
- |---|---|
67
- | Self-paced session loop | **Thread automations**, heartbeat-style recurring wake-up calls attached to the current thread, preserving conversation context |
68
- | Durable scheduled routine | **Standalone/project automations**, independent runs; findings land in the Triage inbox (auto-archived when there is nothing to report) |
69
- | Checkpoint artifact | Same convention: one `_<task>.md` in the repo root (gitignore `_*`), read first on every run, holding phase status, findings with sources, decisions with rationale |
70
- | Scripted/CI iteration | `codex exec "<prompt>"` non-interactive runs |
71
-
72
- S10 rules apply unchanged: hard caps on iterations, completion criteria
73
- declared up front, externalized state (the thread is not durable
74
- memory), and an explicit final report instead of a zombie loop. For
75
- project-scoped automations note the Codex requirement that the local
76
- app is running and the project is on disk.
77
-
78
- ## What does not transfer
79
-
80
- Codex has no user-hook system equivalent to Claude Code's, so Maestro's
81
- structural enforcement pack (subagent guard, loop guard, phase-scope
82
- guard, gate telemetry) does not run on Codex. The prose doctrine in
83
- `AGENTS.md` is the enforcement surface; S7.3's verification gate relies
84
- on the model honoring it rather than on a hook.
85
-
86
- The Maestro context bar also does not apply: Codex CLI ships a native
87
- context-usage indicator (`/statusline` picker, or `context` in
88
- `[tui].status_line` in `~/.codex/config.toml`).
89
-
90
- ## Skills and the Frontier ON indicator
91
-
92
- `maestro install --target codex` installs the `frontier`, `terse`, `settings`,
93
- and `update` Maestro commands as Codex skills (no-clobber) to
94
- `.agents/skills/<name>/SKILL.md` (per-repo) or `~/.agents/skills/<name>/SKILL.md`
95
- (global). When `maestro frontier status --scope codex` reports mode != off, the
96
- `frontier` skill instructs Codex to lead its reply with
97
- `Maestro Frontier ON (<label>)` `single · <model>` or `fusion · <preset>`. When
98
- mode is off, no indicator line appears.
1
+ # Maestro on Codex
2
+
3
+ Codex reads `AGENTS.md` natively, no adapter file needed. This page
4
+ maps Maestro's concepts onto Codex specifics. All behavior below was
5
+ verified against the official Codex docs
6
+ ([AGENTS.md guide](https://developers.openai.com/codex/guides/agents-md),
7
+ [config reference](https://developers.openai.com/codex/config-reference#configtoml),
8
+ [plugin-bundled hooks](https://developers.openai.com/codex/hooks#plugin-bundled-hooks),
9
+ [Automations](https://developers.openai.com/codex/app/automations.md),
10
+ [Subagents](https://developers.openai.com/codex/subagents.md))
11
+ on 2026-06-12.
12
+
13
+ ## AGENTS.md semantics
14
+
15
+ Codex discovers instruction files in this order:
16
+
17
+ 1. **Global:** `~/.codex/AGENTS.override.md` if present, else
18
+ `~/.codex/AGENTS.md`.
19
+ 2. **Project:** walking from the Git root down to the current working
20
+ directory, checking each level for `AGENTS.override.md`, then
21
+ `AGENTS.md`.
22
+
23
+ Files are concatenated root-down with blank lines between them; files
24
+ closer to your current directory appear later in the combined prompt
25
+ and therefore override earlier guidance. Codex skips empty files,
26
+ discovers once per run, and stops adding files once the combined set
27
+ hits `project_doc_max_bytes` (32 KiB by default).
28
+
29
+ Practical consequences for Maestro:
30
+
31
+ - **Placement:** put Maestro's `AGENTS.md` at the repository root. If
32
+ you already have a project `AGENTS.md`, append Maestro's content to
33
+ it (Codex concatenates by directory level, not by file).
34
+ - **Budget:** Maestro's always-on kernel is ~8 KB, a quarter of the
35
+ default 32 KiB cap, leaving room for your project instructions
36
+ (the full S2-S6 protocol lives in `docs/orchestration.md`, read on
37
+ demand). If you layer nested `AGENTS.md` files, watch the cap:
38
+ Codex silently stops adding files beyond it.
39
+ - **Global install:** putting Maestro in `~/.codex/AGENTS.md` applies
40
+ the doctrine to every project; per-repo files then layer on top and
41
+ win where they conflict.
42
+
43
+ ## Config, hooks, and trust
44
+
45
+ Codex user config lives at `~/.codex/config.toml`. Project overrides can
46
+ live in `.codex/config.toml`, but Codex loads project-local config,
47
+ hooks, and rules only for trusted projects. Untrusted projects skip
48
+ those local surfaces.
49
+
50
+ Codex also supports plugin-bundled lifecycle hooks. Enabled plugins can
51
+ ship hooks alongside user, project, and managed hooks; the default plugin
52
+ hook file is `hooks/hooks.json`, and manifests can reference `./` paths
53
+ or inline hook definitions. Treat plugin hooks as executable code:
54
+ review and trust the plugin before enabling them. Codex sets
55
+ `PLUGIN_ROOT` and `PLUGIN_DATA` for plugin hooks, and also sets
56
+ `CLAUDE_PLUGIN_ROOT` and `CLAUDE_PLUGIN_DATA` for compatibility.
57
+
58
+ For Codex CLI/Desktop, install Maestro as a native Codex plugin:
59
+
60
+ ```text
61
+ codex plugin marketplace add mbanderas/maestro
62
+ codex plugin add maestro@maestro
63
+ ```
64
+
65
+ The repo is a Codex marketplace because it ships
66
+ `.agents/plugins/marketplace.json`; the plugin itself is described by
67
+ `.codex-plugin/plugin.json`. That manifest points at the plugin-bundled Codex
68
+ skills (`./skills/`); the hook bundle lives at Codex's default plugin hook path
69
+ `./hooks/hooks.json`, so Codex can install the plugin without `npx`. Restart
70
+ Codex or start a new thread after changing plugin installation/trust state,
71
+ then review and trust the bundled hooks before expecting autorun.
72
+
73
+ `maestro install --target codex` remains as a portable/manual fallback when
74
+ you specifically want to copy files into a project instead of installing the
75
+ Codex plugin.
76
+
77
+ ## Multi-agent routing (S2-S6 mapping)
78
+
79
+ Codex supports subagent workflows in the CLI and app, but current Codex
80
+ docs specify that subagents spawn only when the user explicitly asks for
81
+ them. Practical mapping for Maestro:
82
+
83
+ - If the user did **not** explicitly ask for subagents, parallel
84
+ agents, or delegation, emit the counted S1 verdict and continue
85
+ single-agent even when the portable gate would otherwise route to
86
+ S2-S6.
87
+ - If the user explicitly asked for subagents/parallel work and the S1
88
+ gate returns multi-agent, map Maestro's Planner, Specialists, and
89
+ Staff Engineer to Codex subagents. Keep specialist prompts scoped and
90
+ cap parallel groups at 4 as usual.
91
+ - Claude Code agent teams do not transfer to Codex. Codex subagents are
92
+ the only Codex-native mapping for Maestro specialists.
93
+
94
+ ## Long-horizon operation (S10 mapping)
95
+
96
+ Claude Code maps S10 to `/loop`, `/schedule`, and `ScheduleWakeup`.
97
+ The Codex analog is **Automations** (Codex app, automations pane):
98
+ recurring prompts that run in the background on minute-based, daily,
99
+ weekly, or cron schedules.
100
+
101
+ | Maestro S10 concept | Codex mechanism |
102
+ |---|---|
103
+ | Self-paced session loop | **Thread automations**, heartbeat-style recurring wake-up calls attached to the current thread, preserving conversation context |
104
+ | Durable scheduled routine | **Standalone/project automations**, independent runs; findings land in the Triage inbox (auto-archived when there is nothing to report) |
105
+ | Checkpoint artifact | Same convention: one `_<task>.md` in the repo root (gitignore `_*`), read first on every run, holding phase status, findings with sources, decisions with rationale |
106
+ | Scripted/CI iteration | `codex exec "<prompt>"` non-interactive runs |
107
+
108
+ S10 rules apply unchanged: hard caps on iterations, completion criteria
109
+ declared up front, externalized state (the thread is not durable
110
+ memory), and an explicit final report instead of a zombie loop. For
111
+ project-scoped automations note the Codex requirement that the local
112
+ app is running and the project is on disk.
113
+
114
+ ## Frontier autorun and scope
115
+
116
+ Frontier is off until you arm it. For Codex, the normal workflow is:
117
+
118
+ ```text
119
+ maestro frontier mode fusion --preset chatgpt-duo --scope codex-project
120
+ maestro frontier mode fusion --preset frontier-trio --judge chatgpt --synth chatgpt --scope codex-project
121
+ maestro frontier mode off --scope codex-project
122
+ ```
123
+
124
+ Once the Maestro Codex plugin hook is installed, enabled, and trusted,
125
+ normal Codex prompts route through Frontier until you turn it off.
126
+ `maestro frontier run "<prompt>" ...` remains available for advanced/debug
127
+ one-offs, but it is not the everyday Codex flow.
128
+
129
+ Project/workspace scope is the recommended default for repo installs:
130
+ it keeps one repository's armed state from leaking into another. In a
131
+ Codex plugin context Maestro resolves this automatically to a
132
+ `codex-<8hex>` workspace scope. From a shell in the repo, pass
133
+ `--scope codex-project` (or `codex-workspace`) to resolve the same project
134
+ scope. Global/user scope is optional: choose an explicit name such as
135
+ `--scope codex-global` only when you deliberately want the same state across
136
+ projects.
137
+
138
+ Use the same active scope for all lifecycle commands:
139
+
140
+ ```text
141
+ maestro frontier status --scope codex-project
142
+ maestro frontier mode off --scope codex-project
143
+ maestro frontier mode fusion --preset chatgpt-duo --scope codex-project
144
+ maestro frontier mode fusion --preset frontier-trio --judge chatgpt --synth chatgpt --scope codex-project
145
+ ```
146
+
147
+ ## What differs from Claude Code
148
+
149
+ Claude Code-specific UI such as the Maestro context bar does not apply:
150
+ Codex CLI ships a native
151
+ context-usage indicator (`/statusline` picker, or `context` in
152
+ `[tui].status_line` in `~/.codex/config.toml`).
153
+
154
+ ## Skills and the Frontier ON indicator
155
+
156
+ Codex skills can live in personal `$HOME/.agents/skills`, repo
157
+ `.agents/skills`, or installed plugins. The normal Codex path is the Maestro
158
+ plugin, which bundles `maestro-frontier`, `maestro-settings`, `maestro-terse`,
159
+ and `maestro-update` from `./skills/`. The portable
160
+ `maestro install --target codex` fallback still installs those same skills to
161
+ `.agents/skills/<name>/SKILL.md` for project installs or
162
+ `~/.agents/skills/<name>/SKILL.md` for global/user installs.
163
+
164
+ When `maestro frontier status --scope codex-project` reports mode != off,
165
+ the `maestro-frontier` skill instructs Codex to lead its reply with
166
+ `Maestro Frontier ON (<label>)` — `single · <model>` or `fusion · <preset>`. When
167
+ mode is off, no indicator line appears.
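
Reviewer sketch for the "AGENTS.md semantics" section of `docs/codex.md` above: the discovery order it describes can be modeled in a few lines of Python. This is an illustration only, not Codex's implementation. The names `pick_instruction_file`, `discover`, and `combine` are hypothetical. Three behaviors are assumptions the doc does not spell out: an empty `AGENTS.override.md` falls through to `AGENTS.md`, the global file counts toward the byte cap, and the cap is enforced by refusing any file that would push the total past the limit.

```python
from pathlib import Path

# 32 KiB is the default project_doc_max_bytes quoted in the doc.
DEFAULT_MAX_BYTES = 32 * 1024

# Per directory, the override file is checked before the plain file.
CANDIDATES = ("AGENTS.override.md", "AGENTS.md")


def pick_instruction_file(directory: Path):
    """Return the first non-empty candidate in `directory`, or None.

    Assumption: an empty override falls through to AGENTS.md.
    """
    for name in CANDIDATES:
        path = directory / name
        if path.is_file() and path.stat().st_size > 0:
            return path
    return None


def discover(codex_home: Path, git_root: Path, cwd: Path,
             max_bytes: int = DEFAULT_MAX_BYTES):
    """List instruction files in concatenation order (global, then root-down)."""
    directories = [codex_home, git_root]
    current = git_root
    for part in cwd.relative_to(git_root).parts:
        current = current / part
        directories.append(current)

    chosen, total = [], 0
    for directory in directories:
        path = pick_instruction_file(directory)
        if path is None:
            continue
        size = path.stat().st_size
        # Assumption: stop before the file that would exceed the cap.
        if total + size > max_bytes:
            break
        chosen.append(path)
        total += size
    return chosen


def combine(paths):
    """Join the files with blank lines, later files closest to cwd."""
    return "\n\n".join(p.read_text(encoding="utf-8") for p in paths)
```

Under these assumptions, `discover(Path.home() / ".codex", repo_root, Path.cwd())` would return the files in the order they are concatenated, so the last entry is the one closest to the working directory and the one whose guidance wins on conflict. Here `repo_root` stands for the Git root of the project.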
package/docs/orchestration.md CHANGED
@@ -1,168 +1,168 @@
1
- # Maestro Multi-Agent Orchestration: Full Protocol (S2-S6)
2
-
3
- Loaded on demand: read this file when the Decision Gate
4
- ([AGENTS.md](../AGENTS.md) S1) returns a multi-agent verdict. The
5
- kernel's compact protocol is a subset of this document and suffices
6
- when this file is unavailable. Relocated verbatim from the always-on
7
- doctrine in v1.2, content here extends the kernel, never overrides
8
- it.
9
-
10
- ---
11
-
12
- ## Gate constraints (S1 detail)
13
-
14
- - Max 4 specialists per group.
15
- - >60% shared files or <=3 files in one chain: single-agent.
16
- - Overlapping ownership erases parallelism; high-centrality: bias
17
- single.
18
- - Specialists must differ in role or context, not split identical
19
- work, homogeneous splits underperform one agent with the same
20
- budget. Split-design rule for the Planner, not a gate downgrade.
21
- - Parallelizability first: specialization pays only when subtasks are
22
- structurally independent. Coupled subtasks: single-agent wins at
23
- equal token budget, gains that ignore total compute don't count.
24
- - Adversarial review is the best-evidenced multi-agent win. Review
25
- and debate panels: 3 specialists (odd, no ties); 4 stays the cap
26
- for parallel workstreams.
27
- - How to split (and whether a split is too homogeneous) is the
28
- Planner's call (S2), made after the spawn, never the gate's.
29
-
30
- ---
31
-
32
- ## 2. Planner [MULTI-AGENT]
33
-
34
- First sub-agent, created by calling the Task/Agent tool, never
35
- simulated inline by the orchestrator. No specialist work before
36
- Planner returns.
37
-
38
- Produces: subtasks with boundaries, dependency map, parallel groups
39
- (max 4), per-task file scope + objective + acceptance criteria, flags
40
- for single-agent subtasks and high-risk items, cross-talk pairs,
41
- token-cost assessment (flag >60% overlap), task-class match.
42
-
43
- Fewer broader > many narrow. Flag ambiguity, don't assume.
44
-
45
- Reading: recommends single-agent -> switch. Ambiguities -> surface.
46
-
47
- Task classes: Feature (spec/implement/test/integrate),
48
- Bug (reproduce/root-cause/fix/regress),
49
- Refactor (scope/refactor/test/verify),
50
- Audit (discover/analyze/consolidate), Docs+code (change/update/check).
51
-
52
- ---
53
-
54
- ## 3. Specialists [MULTI-AGENT]
55
-
56
- Manifest fields: ROLE, TASK, FILES (read/modify), UPSTREAM,
57
- ORIENTATION, ASSUMPTIONS, OUTPUT, ACCEPT, TOOLS (scoped), RULES (S7
58
- injected). ROLE = procedural workflow (step sequence + acceptance
59
- criteria), never a bare job title, identity labels alone don't
60
- change behavior.
61
-
62
- No conversation history, other tasks, full plan, or unrelated
63
- context. Isolation is the advantage. Out of scope: report and stop.
64
-
65
- ---
66
-
67
- ## 4. Cross-Talk [MULTI-AGENT]
68
-
69
- After each group: check if A modified B's files, changed B's
70
- interfaces, invalidated B's assumptions, or produced B's inputs.
71
-
72
- Route minimum context from A to B. If B completed, spawn correction
73
- agent. Orchestrator: spawn, sequence, detect, route, deliver. Never
74
- plan, code, review, or do specialist work.
75
-
76
- ---
77
-
78
- ## 5. Staff Engineer [MULTI-AGENT]
79
-
80
- Final sub-agent. Reviews integrated output.
81
-
82
- Packet: changed files + diffs, objective, decisions, risks,
83
- questions. Expand for: core architecture, security, central
84
- abstractions.
85
-
86
- Check: requirements met, specialist contradictions, cross-breakage
87
- (interfaces/imports/types/state), architectural drift, verification
88
- (S7.3), dead code/orphaned imports/incomplete renames,
89
- surgical-scope violations (S7.4).
90
-
91
- Returns PASS or FAIL (issues + owner + fix). Max 2 cycles, then
92
- deliver with issues listed.
93
-
94
- High-risk or contested verdicts: adversarial panel of 3 (odd, no
95
- ties), each prompted to refute, not confirm.
96
-
97
- ---
98
-
99
- ## 6. Orchestrator Discipline [MULTI-AGENT]
100
-
101
- - Route minimum viable info (signature, not 200-line diff)
102
- - Checkpoint before spawns/handoffs/resumes: objective, files,
103
- requirements, decisions, risks, next action
104
- - Structured artifacts > transcript carryover
105
- - Stable scaffolds for cache reuse; no per-specialist rephrasing
106
- - Track agent status; report blocks immediately
107
- - Resume from latest artifact, not full history
108
- - Specialist fails: report, ask user. No silent retry >1
109
- - Deliver what asked. No gold-plating. Hooks > prompt reminders
110
-
111
- ---
112
-
113
- ## 9. Model Routing: full table
114
-
115
- Pick the cheapest model that handles the task. Orchestrator decides
116
- at spawn time; Planner (S2) assigns per subtask.
117
-
118
- | Tier | When | Examples |
119
- |------|------|----------|
120
- | Haiku | No edits, single source, low reasoning | Status lookup, chat, format, classify, extract |
121
- | **Sonnet** | 1-3 file edits, known scope. **Default** | Bug fix, refactor, test, review, docs |
122
- | Opus | 4+ files, novel design, high reversal cost | Architecture, security review, complex debug |
123
- | Frontier (Fable-class) | Orchestrator tier: long-horizon autonomous work, 1M-context audits, frontier reasoning | Orchestration, system design, deep multi-file debug, adversarial synthesis |
124
-
125
- When unsure: Sonnet.
126
-
127
- ### Output caps
128
-
129
- Agent prompts MUST specify max response length. Oversized results
130
- bloat parent context and trigger compaction.
131
-
132
- | Agent tier | Cap | Exception |
133
- |------------|-----|-----------|
134
- | Haiku | 100 words | - |
135
- | Sonnet | 500 words | Code output (uncapped) |
136
- | Opus | Uncapped | - |
137
- | Frontier | Uncapped | - |
138
- | Explore | 200 words | Always, regardless of model |
139
-
140
- Explore agents: "report in under 200 words" in every prompt.
141
-
142
- ### Tool-call budgets
143
-
144
- Action tokens are the third cost lever, beside output caps (above)
145
- and S8 input compression. Every subagent prompt carries a tool-call
146
- budget (manifest field `toolBudget`); idea adapted from
147
- claude-token-efficient (MIT).
148
-
149
- | Task type | Budget |
150
- |-----------|--------|
151
- | Routine subtask, known scope | ~20 calls |
152
- | Read-only research / Explore | ~10 calls |
153
- | Multi-file implementation | scale with file count; state it explicitly |
154
-
155
- Discipline inside the budget: read-first-write-once (read each
156
- needed file once, then edit, no re-read loops); one diagnostic read
157
- per failure, then the S7.3 two-attempt rule applies (stop, re-read
158
- from scratch, change approach). Budget exhausted: report progress
159
- and the named gap, never burn calls polling.
160
- Research agents returning raw dumps waste more tokens than they save.
161
-
162
- ---
163
-
164
- ## Self-evaluation (relocated S7.6)
165
-
166
- - Two perspectives: perfectionist critique + pragmatist accept
167
- - Bug autopsy: root cause vs symptom, prevention
168
- - After 2 failures: stop, re-read from scratch, different approach
1
+ # Maestro Multi-Agent Orchestration: Full Protocol (S2-S6)
2
+
3
+ Loaded on demand: read this file when the Decision Gate
4
+ ([AGENTS.md](../AGENTS.md) S1) returns a multi-agent verdict. The
5
+ kernel's compact protocol is a subset of this document and suffices
6
+ when this file is unavailable. Relocated verbatim from the always-on
7
+ doctrine in v1.2, content here extends the kernel, never overrides
8
+ it.
9
+
10
+ ---
11
+
12
+ ## Gate constraints (S1 detail)
13
+
14
+ - Max 4 specialists per group.
15
+ - >60% shared files or <=3 files in one chain: single-agent.
16
+ - Overlapping ownership erases parallelism; high-centrality: bias
17
+ single.
18
+ - Specialists must differ in role or context, not split identical
19
+ work, homogeneous splits underperform one agent with the same
20
+ budget. Split-design rule for the Planner, not a gate downgrade.
21
+ - Parallelizability first: specialization pays only when subtasks are
22
+ structurally independent. Coupled subtasks: single-agent wins at
23
+ equal token budget, gains that ignore total compute don't count.
24
+ - Adversarial review is the best-evidenced multi-agent win. Review
25
+ and debate panels: 3 specialists (odd, no ties); 4 stays the cap
26
+ for parallel workstreams.
27
+ - How to split (and whether a split is too homogeneous) is the
28
+ Planner's call (S2), made after the spawn, never the gate's.
29
+
30
+ ---
31
+
32
+ ## 2. Planner [MULTI-AGENT]
33
+
34
+ First sub-agent, created by calling the Task/Agent tool, never
35
+ simulated inline by the orchestrator. No specialist work before
36
+ Planner returns.
37
+
38
+ Produces: subtasks with boundaries, dependency map, parallel groups
39
+ (max 4), per-task file scope + objective + acceptance criteria, flags
40
+ for single-agent subtasks and high-risk items, cross-talk pairs,
41
+ token-cost assessment (flag >60% overlap), task-class match.
42
+
43
+ Fewer broader > many narrow. Flag ambiguity, don't assume.
44
+
45
+ Reading: recommends single-agent -> switch. Ambiguities -> surface.
46
+
47
+ Task classes: Feature (spec/implement/test/integrate),
48
+ Bug (reproduce/root-cause/fix/regress),
49
+ Refactor (scope/refactor/test/verify),
50
+ Audit (discover/analyze/consolidate), Docs+code (change/update/check).
51
+
52
+ ---
53
+
54
+ ## 3. Specialists [MULTI-AGENT]
55
+
56
+ Manifest fields: ROLE, TASK, FILES (read/modify), UPSTREAM,
57
+ ORIENTATION, ASSUMPTIONS, OUTPUT, ACCEPT, TOOLS (scoped), RULES (S7
58
+ injected). ROLE = procedural workflow (step sequence + acceptance
59
+ criteria), never a bare job title, identity labels alone don't
60
+ change behavior.
61
+
62
+ No conversation history, other tasks, full plan, or unrelated
63
+ context. Isolation is the advantage. Out of scope: report and stop.
64
+
65
+ ---
66
+
67
+ ## 4. Cross-Talk [MULTI-AGENT]
68
+
69
+ After each group: check if A modified B's files, changed B's
70
+ interfaces, invalidated B's assumptions, or produced B's inputs.
71
+
72
+ Route minimum context from A to B. If B completed, spawn correction
73
+ agent. Orchestrator: spawn, sequence, detect, route, deliver. Never
74
+ plan, code, review, or do specialist work.
75
+
76
+ ---
77
+
78
+ ## 5. Staff Engineer [MULTI-AGENT]
79
+
80
+ Final sub-agent. Reviews integrated output.
81
+
82
+ Packet: changed files + diffs, objective, decisions, risks,
83
+ questions. Expand for: core architecture, security, central
84
+ abstractions.
85
+
86
+ Check: requirements met, specialist contradictions, cross-breakage
87
+ (interfaces/imports/types/state), architectural drift, verification
88
+ (S7.3), dead code/orphaned imports/incomplete renames,
89
+ surgical-scope violations (S7.4).
90
+
91
+ Returns PASS or FAIL (issues + owner + fix). Max 2 cycles, then
92
+ deliver with issues listed.
93
+
94
+ High-risk or contested verdicts: adversarial panel of 3 (odd, no
95
+ ties), each prompted to refute, not confirm.
96
+
97
+ ---
98
+
99
+ ## 6. Orchestrator Discipline [MULTI-AGENT]
100
+
101
+ - Route minimum viable info (signature, not 200-line diff)
102
+ - Checkpoint before spawns/handoffs/resumes: objective, files,
103
+ requirements, decisions, risks, next action
104
+ - Structured artifacts > transcript carryover
105
+ - Stable scaffolds for cache reuse; no per-specialist rephrasing
106
+ - Track agent status; report blocks immediately
107
+ - Resume from latest artifact, not full history
108
+ - Specialist fails: report, ask user. No silent retry >1
109
+ - Deliver what asked. No gold-plating. Hooks > prompt reminders
110
+
111
+ ---
112
+
113
+ ## 9. Model Routing: full table
114
+
115
+ Pick the cheapest model that handles the task. Orchestrator decides
116
+ at spawn time; Planner (S2) assigns per subtask.
117
+
118
+ | Tier | When | Examples |
119
+ |------|------|----------|
120
+ | Haiku | No edits, single source, low reasoning | Status lookup, chat, format, classify, extract |
121
+ | **Sonnet** | 1-3 file edits, known scope. **Default** | Bug fix, refactor, test, review, docs |
122
+ | Opus | 4+ files, novel design, high reversal cost | Architecture, security review, complex debug |
123
+ | Frontier (Fable-class) | Orchestrator tier: long-horizon autonomous work, 1M-context audits, frontier reasoning | Orchestration, system design, deep multi-file debug, adversarial synthesis |
124
+
125
+ When unsure: Sonnet.
126
+
127
+ ### Output caps
128
+
129
+ Agent prompts MUST specify max response length. Oversized results
130
+ bloat parent context and trigger compaction.
131
+
132
+ | Agent tier | Cap | Exception |
133
+ |------------|-----|-----------|
134
+ | Haiku | 100 words | - |
135
+ | Sonnet | 500 words | Code output (uncapped) |
136
+ | Opus | Uncapped | - |
137
+ | Frontier | Uncapped | - |
138
+ | Explore | 200 words | Always, regardless of model |
139
+
140
+ Explore agents: "report in under 200 words" in every prompt.
141
+
142
+ ### Tool-call budgets
143
+
144
+ Action tokens are the third cost lever, beside output caps (above)
145
+ and S8 input compression. Every subagent prompt carries a tool-call
146
+ budget (manifest field `toolBudget`); idea adapted from
147
+ claude-token-efficient (MIT).
148
+
149
+ | Task type | Budget |
150
+ |-----------|--------|
151
+ | Routine subtask, known scope | ~20 calls |
152
+ | Read-only research / Explore | ~10 calls |
153
+ | Multi-file implementation | scale with file count; state it explicitly |
154
+
155
+ Discipline inside the budget: read-first-write-once (read each
156
+ needed file once, then edit, no re-read loops); one diagnostic read
157
+ per failure, then the S7.3 two-attempt rule applies (stop, re-read
158
+ from scratch, change approach). Budget exhausted: report progress
159
+ and the named gap, never burn calls polling.
160
+ Research agents returning raw dumps waste more tokens than they save.
161
+
162
+ ---
163
+
164
+ ## Self-evaluation (relocated S7.6)
165
+
166
+ - Two perspectives: perfectionist critique + pragmatist accept
167
+ - Bug autopsy: root cause vs symptom, prevention
168
+ - After 2 failures: stop, re-read from scratch, different approach
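
Reviewer sketch for the gate constraints, model routing table, and output caps in the protocol above: the thresholds can be written as small lookup functions. This is a hedged illustration, not code from the package. The names `route_model`, `output_cap`, and `gate_prefers_single_agent` are hypothetical, and the sketch assumes that any one Opus trigger (4+ files, novel design, high reversal cost) is enough on its own, which the table does not state. The Frontier tier is left out of `route_model` because the table reserves it for the orchestrator.

```python
# Word caps per agent tier, copied from the "Output caps" table; None = uncapped.
OUTPUT_CAP_WORDS = {"haiku": 100, "sonnet": 500, "opus": None, "frontier": None}

# Explore agents are capped regardless of the model they run on.
EXPLORE_CAP_WORDS = 200


def route_model(files_edited, single_source=False, low_reasoning=False,
                novel_design=False, high_reversal_cost=False):
    """Pick a tier for a subtask, falling back to Sonnet when unsure."""
    # Assumption: any single Opus trigger is sufficient.
    if files_edited >= 4 or novel_design or high_reversal_cost:
        return "opus"
    # Haiku needs all three conditions from the table.
    if files_edited == 0 and single_source and low_reasoning:
        return "haiku"
    return "sonnet"


def output_cap(tier, explore=False):
    """Return the max response length in words, or None when uncapped."""
    if explore:
        return EXPLORE_CAP_WORDS
    return OUTPUT_CAP_WORDS[tier]


def gate_prefers_single_agent(shared_file_ratio, files_in_chain):
    """Apply the rule: >60% shared files or <=3 files in one chain."""
    return shared_file_ratio > 0.60 or files_in_chain <= 3
```

For example, a two-file bug fix routes to `sonnet` with a 500-word cap, while a read-only Explore agent keeps the 200-word cap even when it runs on an uncapped tier. The Sonnet exception for code output is not modeled here.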