@maestrofrontier/frontier 1.4.4 → 1.5.0
- package/.agents/plugins/marketplace.json +21 -0
- package/.codex-plugin/plugin.json +29 -0
- package/.cursorrules +197 -194
- package/AGENTS.md +214 -214
- package/CLAUDE.md +29 -29
- package/README.md +368 -278
- package/bin/maestro.cjs +75 -75
- package/commands/compress.md +36 -36
- package/commands/frontier.md +124 -124
- package/commands/terse.md +23 -23
- package/docs/codex.md +167 -98
- package/docs/orchestration.md +168 -168
- package/frontier/cli.cjs +279 -248
- package/frontier/config.cjs +468 -441
- package/frontier/dispatch.cjs +267 -255
- package/frontier/judge.cjs +92 -92
- package/frontier/run.cjs +201 -148
- package/frontier/schema.cjs +112 -112
- package/frontier/semaphore.cjs +49 -49
- package/frontier/synthesize.cjs +79 -79
- package/hooks/frontier-autorun.cjs +127 -124
- package/hooks/hooks.json +103 -103
- package/hooks/maestro-doctrine-guard.cjs +81 -81
- package/hooks/maestro-gate-reminder.cjs +22 -7
- package/hooks/maestro-gate-telemetry.cjs +79 -77
- package/hooks/maestro-phase-scope.cjs +118 -118
- package/hooks/maestro-statusline-sync.cjs +152 -152
- package/hooks/maestro-subagent-guard.cjs +148 -148
- package/hooks/maestro-terse-mode.cjs +189 -189
- package/hooks/maestro-toolbudget-advisory.cjs +127 -127
- package/integrations/README.md +111 -94
- package/integrations/cline/skills/frontier/SKILL.md +75 -75
- package/integrations/codex/prompts/frontier.md +70 -66
- package/integrations/codex/prompts/update.md +39 -36
- package/integrations/codex/skills/maestro-frontier/SKILL.md +122 -0
- package/integrations/codex/skills/{settings → maestro-settings}/SKILL.md +55 -46
- package/integrations/codex/skills/{terse → maestro-terse}/SKILL.md +58 -49
- package/integrations/codex/skills/maestro-update/SKILL.md +31 -0
- package/integrations/cursor/commands/frontier.md +63 -63
- package/integrations/cursor/commands/update.md +34 -34
- package/integrations/gemini/commands/frontier.toml +76 -76
- package/integrations/windsurf/workflows/frontier.md +70 -70
- package/package.json +58 -55
- package/scripts/install.cjs +1014 -605
- package/settings/cli.cjs +140 -140
- package/settings/config.cjs +309 -309
- package/skills/maestro-frontier/SKILL.md +122 -0
- package/skills/maestro-settings/SKILL.md +55 -0
- package/skills/maestro-terse/SKILL.md +58 -0
- package/skills/maestro-update/SKILL.md +31 -0
- package/skills/terse/SKILL.md +74 -0
- package/integrations/codex/skills/frontier/SKILL.md +0 -91
- package/integrations/codex/skills/update/SKILL.md +0 -29
package/docs/codex.md
CHANGED
@@ -1,98 +1,167 @@
-# Maestro on Codex
-
-Codex reads `AGENTS.md` natively, no adapter file needed. This page
-maps Maestro's concepts onto Codex specifics. All behavior below was
-verified against the official Codex docs
-([AGENTS.md guide](https://developers.openai.com/codex/guides/agents-md),
-[
-[
+# Maestro on Codex
+
+Codex reads `AGENTS.md` natively, no adapter file needed. This page
+maps Maestro's concepts onto Codex specifics. All behavior below was
+verified against the official Codex docs
+([AGENTS.md guide](https://developers.openai.com/codex/guides/agents-md),
+[config reference](https://developers.openai.com/codex/config-reference#configtoml),
+[plugin-bundled hooks](https://developers.openai.com/codex/hooks#plugin-bundled-hooks),
+[Automations](https://developers.openai.com/codex/app/automations.md),
+[Subagents](https://developers.openai.com/codex/subagents.md))
+on 2026-06-12.
+
+## AGENTS.md semantics
+
+Codex discovers instruction files in this order:
+
+1. **Global:** `~/.codex/AGENTS.override.md` if present, else
+`~/.codex/AGENTS.md`.
+2. **Project:** walking from the Git root down to the current working
+directory, checking each level for `AGENTS.override.md`, then
+`AGENTS.md`.
+
+Files are concatenated root-down with blank lines between them; files
+closer to your current directory appear later in the combined prompt
+and therefore override earlier guidance. Codex skips empty files,
+discovers once per run, and stops adding files once the combined set
+hits `project_doc_max_bytes` (32 KiB by default).
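The discovery order described above can be sketched in a few lines of Python. This is an illustration of the documented rules, not Codex's implementation: the function names are hypothetical, and three behaviors are assumptions made here (an empty override falls through to `AGENTS.md` at the same level, the global file counts toward the cap, and discovery stops at the first file that would exceed the cap).

```python
from pathlib import Path
from typing import List, Optional

DEFAULT_MAX_BYTES = 32 * 1024  # default project_doc_max_bytes quoted in the text


def pick_instruction_file(directory: Path) -> Optional[Path]:
    """Return the override file if usable, else AGENTS.md, else None."""
    for name in ("AGENTS.override.md", "AGENTS.md"):
        candidate = directory / name
        # Empty files are skipped (assumption: an empty override falls
        # through to AGENTS.md at the same level).
        if candidate.is_file() and candidate.stat().st_size > 0:
            return candidate
    return None


def discover(codex_home: Path, git_root: Path, cwd: Path,
             max_bytes: int = DEFAULT_MAX_BYTES) -> List[Path]:
    """List instruction files in concatenation order: global, then root-down."""
    levels = [codex_home, git_root]
    current = git_root
    for part in cwd.relative_to(git_root).parts:
        current = current / part
        levels.append(current)

    chosen: List[Path] = []
    total = 0
    for level in levels:
        found = pick_instruction_file(level)
        if found is None:
            continue
        size = found.stat().st_size
        if total + size > max_bytes:
            break  # simplification: stop at the first file that would exceed the cap
        chosen.append(found)
        total += size
    return chosen
```

Files later in the returned list sit later in the combined prompt, which is why guidance closer to the working directory wins.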
+
+Practical consequences for Maestro:
+
+- **Placement:** put Maestro's `AGENTS.md` at the repository root. If
+you already have a project `AGENTS.md`, append Maestro's content to
+it (Codex concatenates by directory level, not by file).
+- **Budget:** Maestro's always-on kernel is ~8 KB, a quarter of the
+default 32 KiB cap, leaving room for your project instructions
+(the full S2-S6 protocol lives in `docs/orchestration.md`, read on
+demand). If you layer nested `AGENTS.md` files, watch the cap:
+Codex silently stops adding files beyond it.
+- **Global install:** putting Maestro in `~/.codex/AGENTS.md` applies
+the doctrine to every project; per-repo files then layer on top and
+win where they conflict.
+
+## Config, hooks, and trust
+
+Codex user config lives at `~/.codex/config.toml`. Project overrides can
+live in `.codex/config.toml`, but Codex loads project-local config,
+hooks, and rules only for trusted projects. Untrusted projects skip
+those local surfaces.
+
+Codex also supports plugin-bundled lifecycle hooks. Enabled plugins can
+ship hooks alongside user, project, and managed hooks; the default plugin
+hook file is `hooks/hooks.json`, and manifests can reference `./` paths
+or inline hook definitions. Treat plugin hooks as executable code:
+review and trust the plugin before enabling them. Codex sets
+`PLUGIN_ROOT` and `PLUGIN_DATA` for plugin hooks, and also sets
+`CLAUDE_PLUGIN_ROOT` and `CLAUDE_PLUGIN_DATA` for compatibility.
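A hook script that must run under either variable naming could resolve its root roughly as follows. This is a minimal sketch: the helper names are hypothetical, and preferring `PLUGIN_ROOT` over `CLAUDE_PLUGIN_ROOT` is an assumption, since the text only says both are set.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_plugin_root(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the plugin root from the environment, or None outside a plugin hook."""
    # Assumption: prefer the Codex-native name, then the compatibility name.
    for name in ("PLUGIN_ROOT", "CLAUDE_PLUGIN_ROOT"):
        value = env.get(name)
        if value:
            return Path(value)
    return None


def default_hooks_file(plugin_root: Path) -> Path:
    """Path of the default plugin hook file named in the text."""
    return plugin_root / "hooks" / "hooks.json"
```

Passing the environment as a parameter keeps the helper testable without mutating `os.environ`.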
+
+For Codex CLI/Desktop, install Maestro as a native Codex plugin:
+
+```text
+codex plugin marketplace add mbanderas/maestro
+codex plugin add maestro@maestro
+```
+
+The repo is a Codex marketplace because it ships
+`.agents/plugins/marketplace.json`; the plugin itself is described by
+`.codex-plugin/plugin.json`. That manifest points at the plugin-bundled Codex
+skills (`./skills/`); the hook bundle lives at Codex's default plugin hook path
+`./hooks/hooks.json`, so Codex can install the plugin without `npx`. Restart
+Codex or start a new thread after changing plugin installation/trust state,
+then review and trust the bundled hooks before expecting autorun.
+
+`maestro install --target codex` remains as a portable/manual fallback when
+you specifically want to copy files into a project instead of installing the
+Codex plugin.
+
+## Multi-agent routing (S2-S6 mapping)
+
+Codex supports subagent workflows in the CLI and app, but current Codex
+docs specify that subagents spawn only when the user explicitly asks for
+them. Practical mapping for Maestro:
+
+- If the user did **not** explicitly ask for subagents, parallel
+agents, or delegation, emit the counted S1 verdict and continue
+single-agent even when the portable gate would otherwise route to
+S2-S6.
+- If the user explicitly asked for subagents/parallel work and the S1
+gate returns multi-agent, map Maestro's Planner, Specialists, and
+Staff Engineer to Codex subagents. Keep specialist prompts scoped and
+cap parallel groups at 4 as usual.
+- Claude Code agent teams do not transfer to Codex. Codex subagents are
+the only Codex-native mapping for Maestro specialists.
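The routing rule in the list above reduces to a two-input decision plus the group cap. The sketch below is illustrative only; the function names and the string labels (`"multi-agent"`, `"codex-subagents"`) are hypothetical and not taken from Maestro's code.

```python
from typing import List, Sequence

MAX_PARALLEL = 4  # "cap parallel groups at 4" from the text


def codex_route(gate_verdict: str, user_requested_subagents: bool) -> str:
    """Map the S1 gate verdict onto Codex, honoring the explicit-request rule."""
    if gate_verdict == "multi-agent" and user_requested_subagents:
        return "codex-subagents"
    # Without an explicit user request, stay single-agent even on a
    # multi-agent verdict.
    return "single-agent"


def parallel_groups(specialists: Sequence[str],
                    cap: int = MAX_PARALLEL) -> List[List[str]]:
    """Split specialists into consecutive groups of at most `cap`."""
    return [list(specialists[i:i + cap]) for i in range(0, len(specialists), cap)]
```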
+
+## Long-horizon operation (S10 mapping)
+
+Claude Code maps S10 to `/loop`, `/schedule`, and `ScheduleWakeup`.
+The Codex analog is **Automations** (Codex app, automations pane):
+recurring prompts that run in the background on minute-based, daily,
+weekly, or cron schedules.
+
+| Maestro S10 concept | Codex mechanism |
+|---|---|
+| Self-paced session loop | **Thread automations**, heartbeat-style recurring wake-up calls attached to the current thread, preserving conversation context |
+| Durable scheduled routine | **Standalone/project automations**, independent runs; findings land in the Triage inbox (auto-archived when there is nothing to report) |
+| Checkpoint artifact | Same convention: one `_<task>.md` in the repo root (gitignore `_*`), read first on every run, holding phase status, findings with sources, decisions with rationale |
+| Scripted/CI iteration | `codex exec "<prompt>"` non-interactive runs |
+
+S10 rules apply unchanged: hard caps on iterations, completion criteria
+declared up front, externalized state (the thread is not durable
+memory), and an explicit final report instead of a zombie loop. For
+project-scoped automations note the Codex requirement that the local
+app is running and the project is on disk.
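The S10 rules can be pictured as a loop that always reads and rewrites the checkpoint file and never runs past a declared cap. This is a sketch under stated assumptions: `run_capped_loop` and the `step` callback are hypothetical names, and the checkpoint is treated as opaque text rather than the structured `_<task>.md` layout.

```python
from pathlib import Path
from typing import Callable, Tuple


def run_capped_loop(checkpoint: Path,
                    step: Callable[[str], Tuple[str, bool]],
                    max_iterations: int) -> str:
    """Run `step` until it reports completion or the hard cap is reached.

    `step` takes the checkpoint text and returns (new_text, done).
    State lives in the checkpoint file, not in the caller's memory.
    """
    state = checkpoint.read_text() if checkpoint.exists() else ""
    for iteration in range(1, max_iterations + 1):
        state, done = step(state)
        checkpoint.write_text(state)  # externalize state after every iteration
        if done:
            return f"completed after {iteration} iteration(s)"
    # Explicit final report instead of looping forever.
    return f"stopped at hard cap of {max_iterations} iteration(s)"
```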
+
+## Frontier autorun and scope
+
+Frontier is off until you arm it. For Codex, the normal workflow is:
+
+```text
+maestro frontier mode fusion --preset chatgpt-duo --scope codex-project
+maestro frontier mode fusion --preset frontier-trio --judge chatgpt --synth chatgpt --scope codex-project
+maestro frontier mode off --scope codex-project
+```
+
+Once the Maestro Codex plugin hook is installed, enabled, and trusted,
+normal Codex prompts route through Frontier until you turn it off.
+`maestro frontier run "<prompt>" ...` remains available for advanced/debug
+one-offs, but it is not the everyday Codex flow.
+
+Project/workspace scope is the recommended default for repo installs:
+it keeps one repository's armed state from leaking into another. In a
+Codex plugin context Maestro resolves this automatically to a
+`codex-<8hex>` workspace scope. From a shell in the repo, pass
+`--scope codex-project` (or `codex-workspace`) to resolve the same project
+scope. Global/user scope is optional: choose an explicit name such as
+`--scope codex-global` only when you deliberately want the same state across
+projects.
+
+Use the same active scope for all lifecycle commands:
+
+```text
+maestro frontier status --scope codex-project
+maestro frontier mode off --scope codex-project
+maestro frontier mode fusion --preset chatgpt-duo --scope codex-project
+maestro frontier mode fusion --preset frontier-trio --judge chatgpt --synth chatgpt --scope codex-project
+```
+
+## What differs from Claude Code
+
+Claude Code-specific UI such as the Maestro context bar does not apply:
+Codex CLI ships a native
+context-usage indicator (`/statusline` picker, or `context` in
+`[tui].status_line` in `~/.codex/config.toml`).
+
+## Skills and the Frontier ON indicator
+
+Codex skills can live in personal `$HOME/.agents/skills`, repo
+`.agents/skills`, or installed plugins. The normal Codex path is the Maestro
+plugin, which bundles `maestro-frontier`, `maestro-settings`, `maestro-terse`,
+and `maestro-update` from `./skills/`. The portable
+`maestro install --target codex` fallback still installs those same skills to
+`.agents/skills/<name>/SKILL.md` for project installs or
+`~/.agents/skills/<name>/SKILL.md` for global/user installs.
+
+When `maestro frontier status --scope codex-project` reports mode != off,
+the `maestro-frontier` skill instructs Codex to lead its reply with
+`Maestro Frontier ON (<label>)` — `single · <model>` or `fusion · <preset>`. When
+mode is off, no indicator line appears.
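The indicator format is simple enough to pin down as a function. A minimal sketch, assuming the mode strings are `off`, `single`, and `fusion` (the first and last appear in the commands above; `single` is inferred from the label) and using a hypothetical helper name:

```python
from typing import Optional


def frontier_indicator(mode: str, name: str = "") -> Optional[str]:
    """Build the leading indicator line, or None when Frontier is off.

    `name` is the model for single mode or the preset for fusion mode.
    """
    if mode == "off":
        return None
    return f"Maestro Frontier ON ({mode} · {name})"
```

For example, `frontier_indicator("fusion", "chatgpt-duo")` yields `Maestro Frontier ON (fusion · chatgpt-duo)`.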
package/docs/orchestration.md
CHANGED
@@ -1,168 +1,168 @@
-# Maestro Multi-Agent Orchestration: Full Protocol (S2-S6)
-
-Loaded on demand: read this file when the Decision Gate
-([AGENTS.md](../AGENTS.md) S1) returns a multi-agent verdict. The
-kernel's compact protocol is a subset of this document and suffices
-when this file is unavailable. Relocated verbatim from the always-on
-doctrine in v1.2, content here extends the kernel, never overrides
-it.
-
----
-
-## Gate constraints (S1 detail)
-
-- Max 4 specialists per group.
-- >60% shared files or <=3 files in one chain: single-agent.
-- Overlapping ownership erases parallelism; high-centrality: bias
-single.
-- Specialists must differ in role or context, not split identical
-work, homogeneous splits underperform one agent with the same
-budget. Split-design rule for the Planner, not a gate downgrade.
-- Parallelizability first: specialization pays only when subtasks are
-structurally independent. Coupled subtasks: single-agent wins at
-equal token budget, gains that ignore total compute don't count.
-- Adversarial review is the best-evidenced multi-agent win. Review
-and debate panels: 3 specialists (odd, no ties); 4 stays the cap
-for parallel workstreams.
-- How to split (and whether a split is too homogeneous) is the
-Planner's call (S2), made after the spawn, never the gate's.
-
----
-
-## 2. Planner [MULTI-AGENT]
-
-First sub-agent, created by calling the Task/Agent tool, never
-simulated inline by the orchestrator. No specialist work before
-Planner returns.
-
-Produces: subtasks with boundaries, dependency map, parallel groups
-(max 4), per-task file scope + objective + acceptance criteria, flags
-for single-agent subtasks and high-risk items, cross-talk pairs,
-token-cost assessment (flag >60% overlap), task-class match.
-
-Fewer broader > many narrow. Flag ambiguity, don't assume.
-
-Reading: recommends single-agent -> switch. Ambiguities -> surface.
-
-Task classes: Feature (spec/implement/test/integrate),
-Bug (reproduce/root-cause/fix/regress),
-Refactor (scope/refactor/test/verify),
-Audit (discover/analyze/consolidate), Docs+code (change/update/check).
-
----
-
-## 3. Specialists [MULTI-AGENT]
-
-Manifest fields: ROLE, TASK, FILES (read/modify), UPSTREAM,
-ORIENTATION, ASSUMPTIONS, OUTPUT, ACCEPT, TOOLS (scoped), RULES (S7
-injected). ROLE = procedural workflow (step sequence + acceptance
-criteria), never a bare job title, identity labels alone don't
-change behavior.
-
-No conversation history, other tasks, full plan, or unrelated
-context. Isolation is the advantage. Out of scope: report and stop.
-
----
-
-## 4. Cross-Talk [MULTI-AGENT]
-
-After each group: check if A modified B's files, changed B's
-interfaces, invalidated B's assumptions, or produced B's inputs.
-
-Route minimum context from A to B. If B completed, spawn correction
-agent. Orchestrator: spawn, sequence, detect, route, deliver. Never
-plan, code, review, or do specialist work.
-
----
-
-## 5. Staff Engineer [MULTI-AGENT]
-
-Final sub-agent. Reviews integrated output.
-
-Packet: changed files + diffs, objective, decisions, risks,
-questions. Expand for: core architecture, security, central
-abstractions.
-
-Check: requirements met, specialist contradictions, cross-breakage
-(interfaces/imports/types/state), architectural drift, verification
-(S7.3), dead code/orphaned imports/incomplete renames,
-surgical-scope violations (S7.4).
-
-Returns PASS or FAIL (issues + owner + fix). Max 2 cycles, then
-deliver with issues listed.
-
-High-risk or contested verdicts: adversarial panel of 3 (odd, no
-ties), each prompted to refute, not confirm.
-
----
-
-## 6. Orchestrator Discipline [MULTI-AGENT]
-
-- Route minimum viable info (signature, not 200-line diff)
-- Checkpoint before spawns/handoffs/resumes: objective, files,
-requirements, decisions, risks, next action
-- Structured artifacts > transcript carryover
-- Stable scaffolds for cache reuse; no per-specialist rephrasing
-- Track agent status; report blocks immediately
-- Resume from latest artifact, not full history
-- Specialist fails: report, ask user. No silent retry >1
-- Deliver what asked. No gold-plating. Hooks > prompt reminders
-
----
-
-## 9. Model Routing: full table
-
-Pick the cheapest model that handles the task. Orchestrator decides
-at spawn time; Planner (S2) assigns per subtask.
-
-| Tier | When | Examples |
-|------|------|----------|
-| Haiku | No edits, single source, low reasoning | Status lookup, chat, format, classify, extract |
-| **Sonnet** | 1-3 file edits, known scope. **Default** | Bug fix, refactor, test, review, docs |
-| Opus | 4+ files, novel design, high reversal cost | Architecture, security review, complex debug |
-| Frontier (Fable-class) | Orchestrator tier: long-horizon autonomous work, 1M-context audits, frontier reasoning | Orchestration, system design, deep multi-file debug, adversarial synthesis |
-
-When unsure: Sonnet.
-
-### Output caps
-
-Agent prompts MUST specify max response length. Oversized results
-bloat parent context and trigger compaction.
-
-| Agent tier | Cap | Exception |
-|------------|-----|-----------|
-| Haiku | 100 words | - |
-| Sonnet | 500 words | Code output (uncapped) |
-| Opus | Uncapped | - |
-| Frontier | Uncapped | - |
-| Explore | 200 words | Always, regardless of model |
-
-Explore agents: "report in under 200 words" in every prompt.
-
-### Tool-call budgets
-
-Action tokens are the third cost lever, beside output caps (above)
-and S8 input compression. Every subagent prompt carries a tool-call
-budget (manifest field `toolBudget`); idea adapted from
-claude-token-efficient (MIT).
-
-| Task type | Budget |
-|-----------|--------|
-| Routine subtask, known scope | ~20 calls |
-| Read-only research / Explore | ~10 calls |
-| Multi-file implementation | scale with file count; state it explicitly |
-
-Discipline inside the budget: read-first-write-once (read each
-needed file once, then edit, no re-read loops); one diagnostic read
-per failure, then the S7.3 two-attempt rule applies (stop, re-read
-from scratch, change approach). Budget exhausted: report progress
-and the named gap, never burn calls polling.
-Research agents returning raw dumps waste more tokens than they save.
-
----
-
-## Self-evaluation (relocated S7.6)
-
-- Two perspectives: perfectionist critique + pragmatist accept
-- Bug autopsy: root cause vs symptom, prevention
-- After 2 failures: stop, re-read from scratch, different approach
+# Maestro Multi-Agent Orchestration: Full Protocol (S2-S6)
+
+Loaded on demand: read this file when the Decision Gate
+([AGENTS.md](../AGENTS.md) S1) returns a multi-agent verdict. The
+kernel's compact protocol is a subset of this document and suffices
+when this file is unavailable. Relocated verbatim from the always-on
+doctrine in v1.2, content here extends the kernel, never overrides
+it.
+
+---
+
+## Gate constraints (S1 detail)
+
+- Max 4 specialists per group.
+- >60% shared files or <=3 files in one chain: single-agent.
+- Overlapping ownership erases parallelism; high-centrality: bias
+single.
+- Specialists must differ in role or context, not split identical
+work, homogeneous splits underperform one agent with the same
+budget. Split-design rule for the Planner, not a gate downgrade.
+- Parallelizability first: specialization pays only when subtasks are
+structurally independent. Coupled subtasks: single-agent wins at
+equal token budget, gains that ignore total compute don't count.
+- Adversarial review is the best-evidenced multi-agent win. Review
+and debate panels: 3 specialists (odd, no ties); 4 stays the cap
+for parallel workstreams.
+- How to split (and whether a split is too homogeneous) is the
+Planner's call (S2), made after the spawn, never the gate's.
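The two numeric constraints in the list above (the 60% overlap rule and the group sizes) can be written as small predicates. This is a sketch only: the function names, the return labels, and the way overlap is measured as a single ratio are assumptions, and the softer constraints such as centrality and role diversity are not modeled.

```python
def gate_bias(shared_file_ratio: float, files_in_chain: int) -> str:
    """Apply the '>60% shared files or <=3 files in one chain' rule."""
    if shared_file_ratio > 0.60 or files_in_chain <= 3:
        return "single-agent"
    return "multi-agent candidate"


def group_size_ok(kind: str, size: int) -> bool:
    """Check group size: panels use exactly 3, parallel workstreams at most 4."""
    if kind == "panel":  # review and debate panels: odd, no ties
        return size == 3
    return 1 <= size <= 4
```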
+
+---
+
+## 2. Planner [MULTI-AGENT]
+
+First sub-agent, created by calling the Task/Agent tool, never
+simulated inline by the orchestrator. No specialist work before
+Planner returns.
+
+Produces: subtasks with boundaries, dependency map, parallel groups
+(max 4), per-task file scope + objective + acceptance criteria, flags
+for single-agent subtasks and high-risk items, cross-talk pairs,
+token-cost assessment (flag >60% overlap), task-class match.
+
+Fewer broader > many narrow. Flag ambiguity, don't assume.
+
+Reading: recommends single-agent -> switch. Ambiguities -> surface.
+
+Task classes: Feature (spec/implement/test/integrate),
+Bug (reproduce/root-cause/fix/regress),
+Refactor (scope/refactor/test/verify),
+Audit (discover/analyze/consolidate), Docs+code (change/update/check).
+
+---
+
+## 3. Specialists [MULTI-AGENT]
+
+Manifest fields: ROLE, TASK, FILES (read/modify), UPSTREAM,
+ORIENTATION, ASSUMPTIONS, OUTPUT, ACCEPT, TOOLS (scoped), RULES (S7
+injected). ROLE = procedural workflow (step sequence + acceptance
+criteria), never a bare job title, identity labels alone don't
+change behavior.
+
+No conversation history, other tasks, full plan, or unrelated
+context. Isolation is the advantage. Out of scope: report and stop.
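One way to picture the manifest is as a plain record that renders into a prompt header. The field names below mirror the list in the text, but the Python types, the defaults, and the `render` layout are assumptions made for illustration; Maestro's own representation is not shown in this diff.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecialistManifest:
    role: str                      # procedural workflow, not a bare job title
    task: str
    files_read: List[str]
    files_modify: List[str]
    upstream: str = ""
    orientation: str = ""
    assumptions: List[str] = field(default_factory=list)
    output: str = ""
    accept: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)
    rules: str = ""

    def render(self) -> str:
        """Render the manifest as labeled lines, one field per line."""
        lines = [
            f"ROLE: {self.role}",
            f"TASK: {self.task}",
            f"FILES (read): {', '.join(self.files_read)}",
            f"FILES (modify): {', '.join(self.files_modify)}",
            f"UPSTREAM: {self.upstream}",
            f"ORIENTATION: {self.orientation}",
            f"ASSUMPTIONS: {'; '.join(self.assumptions)}",
            f"OUTPUT: {self.output}",
            f"ACCEPT: {'; '.join(self.accept)}",
            f"TOOLS: {', '.join(self.tools)}",
            f"RULES: {self.rules}",
        ]
        return "\n".join(lines)
```

Nothing outside these fields reaches the specialist, which matches the isolation rule: no conversation history, no full plan.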
+
+---
+
+## 4. Cross-Talk [MULTI-AGENT]
+
+After each group: check if A modified B's files, changed B's
+interfaces, invalidated B's assumptions, or produced B's inputs.
+
+Route minimum context from A to B. If B completed, spawn correction
+agent. Orchestrator: spawn, sequence, detect, route, deliver. Never
+plan, code, review, or do specialist work.
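The first of the four cross-talk checks (A modified B's files) is mechanical and can be sketched as a set intersection. The function name and input shapes are hypothetical; the other three checks (interfaces, assumptions, inputs) need semantic review and are not covered here.

```python
from typing import Dict, List, Set, Tuple


def cross_talk_pairs(modified: Dict[str, Set[str]],
                     scopes: Dict[str, Set[str]]) -> List[Tuple[str, str, Set[str]]]:
    """Return (A, B, files) wherever specialist A changed files in B's scope."""
    hits: List[Tuple[str, str, Set[str]]] = []
    for a, changed in sorted(modified.items()):
        for b, owned in sorted(scopes.items()):
            if a == b:
                continue
            overlap = changed & owned
            if overlap:
                hits.append((a, b, overlap))
    return hits
```

Each hit names only the overlapping files, in keeping with routing the minimum context from A to B.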
+
+---
+
+## 5. Staff Engineer [MULTI-AGENT]
+
+Final sub-agent. Reviews integrated output.
+
+Packet: changed files + diffs, objective, decisions, risks,
+questions. Expand for: core architecture, security, central
+abstractions.
+
+Check: requirements met, specialist contradictions, cross-breakage
+(interfaces/imports/types/state), architectural drift, verification
+(S7.3), dead code/orphaned imports/incomplete renames,
+surgical-scope violations (S7.4).
+
+Returns PASS or FAIL (issues + owner + fix). Max 2 cycles, then
+deliver with issues listed.
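The two-cycle limit can be sketched as a bounded loop. The reading of "cycle" used here is an assumption: each cycle is one review, with a fix pass between cycles, so a second FAIL ends in delivery with the issues attached. The names `review_cycles`, `review`, and `fix` are hypothetical.

```python
from typing import Callable, List, Tuple


def review_cycles(review: Callable[[], Tuple[str, List[str]]],
                  fix: Callable[[List[str]], None],
                  max_cycles: int = 2) -> Tuple[str, List[str]]:
    """Run up to `max_cycles` reviews; deliver with issues if none pass."""
    issues: List[str] = []
    for cycle in range(max_cycles):
        verdict, issues = review()
        if verdict == "PASS":
            return "PASS", []
        if cycle < max_cycles - 1:
            fix(issues)  # route issues back to their owners before the next review
    return "DELIVER_WITH_ISSUES", issues
```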
+
+High-risk or contested verdicts: adversarial panel of 3 (odd, no
+ties), each prompted to refute, not confirm.
+
+---
+
+## 6. Orchestrator Discipline [MULTI-AGENT]
+
+- Route minimum viable info (signature, not 200-line diff)
+- Checkpoint before spawns/handoffs/resumes: objective, files,
+requirements, decisions, risks, next action
+- Structured artifacts > transcript carryover
+- Stable scaffolds for cache reuse; no per-specialist rephrasing
+- Track agent status; report blocks immediately
+- Resume from latest artifact, not full history
+- Specialist fails: report, ask user. No silent retry >1
+- Deliver what asked. No gold-plating. Hooks > prompt reminders
+
+---
+
+## 9. Model Routing: full table
+
+Pick the cheapest model that handles the task. Orchestrator decides
+at spawn time; Planner (S2) assigns per subtask.
+
+| Tier | When | Examples |
+|------|------|----------|
+| Haiku | No edits, single source, low reasoning | Status lookup, chat, format, classify, extract |
+| **Sonnet** | 1-3 file edits, known scope. **Default** | Bug fix, refactor, test, review, docs |
+| Opus | 4+ files, novel design, high reversal cost | Architecture, security review, complex debug |
+| Frontier (Fable-class) | Orchestrator tier: long-horizon autonomous work, 1M-context audits, frontier reasoning | Orchestration, system design, deep multi-file debug, adversarial synthesis |
+
+When unsure: Sonnet.
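Read as a decision procedure, the routing table might look like the sketch below. The parameter names and the order of the checks are assumptions; the table itself gives criteria, not a precedence order.

```python
def pick_tier(files_to_edit: int, *, single_source: bool = False,
              low_reasoning: bool = False, novel_design: bool = False,
              high_reversal_cost: bool = False,
              orchestrator: bool = False) -> str:
    """Pick the cheapest tier whose criteria match; default to Sonnet."""
    if orchestrator:
        return "Frontier"
    if files_to_edit >= 4 or novel_design or high_reversal_cost:
        return "Opus"
    if files_to_edit == 0 and single_source and low_reasoning:
        return "Haiku"
    return "Sonnet"  # 1-3 file edits, and the fallback when unsure
```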
+
+### Output caps
+
+Agent prompts MUST specify max response length. Oversized results
+bloat parent context and trigger compaction.
+
+| Agent tier | Cap | Exception |
+|------------|-----|-----------|
+| Haiku | 100 words | - |
+| Sonnet | 500 words | Code output (uncapped) |
+| Opus | Uncapped | - |
+| Frontier | Uncapped | - |
+| Explore | 200 words | Always, regardless of model |
+
+Explore agents: "report in under 200 words" in every prompt.
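The cap table translates directly into a lookup. In this sketch the helper name and the wording of the non-Explore instruction are hypothetical; only the Explore phrase is quoted from the text.

```python
from typing import Dict, Optional

# Word caps per tier; None means uncapped.
OUTPUT_CAPS: Dict[str, Optional[int]] = {
    "Haiku": 100,
    "Sonnet": 500,
    "Opus": None,
    "Frontier": None,
}


def length_instruction(tier: str, explore: bool = False,
                       code_output: bool = False) -> Optional[str]:
    """Return the length line to append to an agent prompt, or None if uncapped."""
    if explore:
        # Explore agents are capped regardless of the underlying model.
        return "report in under 200 words"
    if tier == "Sonnet" and code_output:
        return None  # code output is the stated exception
    cap = OUTPUT_CAPS[tier]
    if cap is None:
        return None
    return f"respond in at most {cap} words"
```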
+
+### Tool-call budgets
+
+Action tokens are the third cost lever, beside output caps (above)
+and S8 input compression. Every subagent prompt carries a tool-call
+budget (manifest field `toolBudget`); idea adapted from
+claude-token-efficient (MIT).
+
+| Task type | Budget |
+|-----------|--------|
+| Routine subtask, known scope | ~20 calls |
+| Read-only research / Explore | ~10 calls |
+| Multi-file implementation | scale with file count; state it explicitly |
+
+Discipline inside the budget: read-first-write-once (read each
+needed file once, then edit, no re-read loops); one diagnostic read
+per failure, then the S7.3 two-attempt rule applies (stop, re-read
+from scratch, change approach). Budget exhausted: report progress
+and the named gap, never burn calls polling.
+Research agents returning raw dumps waste more tokens than they save.
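A tool-call budget is essentially a counter that refuses further calls once spent. The class below is a hypothetical sketch, not the package's `toolBudget` handling; the approximate table values are treated as exact limits for simplicity.

```python
DEFAULT_BUDGETS = {"routine": 20, "research": 10}  # approximate values from the table


class ToolBudget:
    """Count tool calls against a fixed limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def spend(self) -> bool:
        """Record one tool call; return False once the budget is exhausted."""
        if self.used >= self.limit:
            return False
        self.used += 1
        return True

    @property
    def remaining(self) -> int:
        return self.limit - self.used
```

When `spend()` returns `False`, the agent should report progress and the named gap instead of continuing to poll.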
+
+---
+
+## Self-evaluation (relocated S7.6)
+
+- Two perspectives: perfectionist critique + pragmatist accept
+- Bug autopsy: root cause vs symptom, prevention
+- After 2 failures: stop, re-read from scratch, different approach