@maestrofrontier/frontier 1.4.5 → 1.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.agents/plugins/marketplace.json +21 -21
- package/.codex-plugin/plugin.json +29 -29
- package/.cursorrules +197 -194
- package/AGENTS.md +3 -3
- package/README.md +368 -368
- package/bin/maestro.cjs +75 -75
- package/commands/compress.md +36 -36
- package/commands/frontier.md +124 -124
- package/commands/terse.md +23 -23
- package/docs/codex.md +167 -167
- package/docs/orchestration.md +168 -168
- package/frontier/cli.cjs +279 -252
- package/frontier/config.cjs +468 -468
- package/frontier/dispatch.cjs +267 -255
- package/frontier/judge.cjs +92 -92
- package/frontier/run.cjs +201 -180
- package/frontier/schema.cjs +112 -112
- package/frontier/semaphore.cjs +49 -49
- package/frontier/synthesize.cjs +79 -79
- package/hooks/frontier-autorun.cjs +127 -120
- package/hooks/hooks.json +103 -103
- package/hooks/maestro-doctrine-guard.cjs +81 -81
- package/hooks/maestro-gate-reminder.cjs +22 -7
- package/hooks/maestro-gate-telemetry.cjs +79 -77
- package/hooks/maestro-phase-scope.cjs +118 -118
- package/hooks/maestro-statusline-sync.cjs +152 -152
- package/hooks/maestro-subagent-guard.cjs +148 -148
- package/hooks/maestro-terse-mode.cjs +189 -189
- package/hooks/maestro-toolbudget-advisory.cjs +127 -127
- package/integrations/README.md +111 -111
- package/integrations/cline/skills/frontier/SKILL.md +75 -75
- package/integrations/codex/prompts/frontier.md +70 -70
- package/integrations/codex/prompts/update.md +39 -39
- package/integrations/codex/skills/maestro-frontier/SKILL.md +122 -122
- package/integrations/codex/skills/maestro-settings/SKILL.md +55 -55
- package/integrations/codex/skills/maestro-terse/SKILL.md +58 -58
- package/integrations/codex/skills/maestro-update/SKILL.md +31 -31
- package/integrations/cursor/commands/frontier.md +63 -63
- package/integrations/cursor/commands/update.md +34 -34
- package/integrations/gemini/commands/frontier.toml +76 -76
- package/integrations/windsurf/workflows/frontier.md +70 -70
- package/package.json +58 -58
- package/scripts/install.cjs +1014 -1014
- package/settings/cli.cjs +140 -140
- package/settings/config.cjs +309 -309
- package/skills/maestro-frontier/SKILL.md +122 -122
- package/skills/maestro-settings/SKILL.md +55 -55
- package/skills/maestro-terse/SKILL.md +58 -58
- package/skills/maestro-update/SKILL.md +31 -31
- package/skills/terse/SKILL.md +74 -74
package/docs/orchestration.md
CHANGED
@@ -1,168 +1,168 @@
# Maestro Multi-Agent Orchestration: Full Protocol (S2-S6)

Loaded on demand: read this file when the Decision Gate
([AGENTS.md](../AGENTS.md) S1) returns a multi-agent verdict. The
kernel's compact protocol is a subset of this document and suffices
when this file is unavailable. Relocated verbatim from the always-on
doctrine in v1.2; content here extends the kernel and never overrides
it.

---

## Gate constraints (S1 detail)

- Max 4 specialists per group.
- More than 60% shared files, or at most 3 files in one chain:
single-agent.
- Overlapping ownership erases parallelism; for high-centrality work,
bias toward single-agent.
- Specialists must differ in role or context, not split identical
work; homogeneous splits underperform one agent with the same
budget. This is a split-design rule for the Planner, not a gate
downgrade.
- Parallelizability first: specialization pays only when subtasks are
structurally independent. For coupled subtasks, single-agent wins at
equal token budget; gains that ignore total compute don't count.
- Adversarial review is the best-evidenced multi-agent win. Review
and debate panels use 3 specialists (odd count, so no ties); 4 stays
the cap for parallel workstreams.
- How to split (and whether a split is too homogeneous) is the
Planner's call (S2), made after the spawn, never the gate's.
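The file-overlap and chain-length thresholds above can be sketched as a small pure function. This is a minimal illustration, not the package's gate: `Subtask`, `shared_fraction`, and `gate_verdict` are hypothetical names. The sketch assumes that "shared files" means the fraction of distinct files claimed by more than one subtask, and that the caller already knows how many files a single dependency chain spans.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

SHARED_FILE_LIMIT = 0.60  # more than 60% shared files -> single-agent
SHORT_CHAIN_FILES = 3     # at most 3 files in one chain -> single-agent


@dataclass(frozen=True)
class Subtask:
    """Hypothetical subtask: a name plus the files it would touch."""
    name: str
    files: frozenset


def shared_fraction(subtasks: list) -> float:
    """Fraction of distinct files that more than one subtask touches."""
    counts = Counter(f for t in subtasks for f in t.files)
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n > 1)
    return shared / len(counts)


def gate_verdict(subtasks: list, chain_files: Optional[int]) -> str:
    """Apply the overlap and chain-length rules to a proposed split.

    chain_files is the file count when the work forms one dependency
    chain, or None when it does not (an assumption about the caller).
    """
    if len(subtasks) < 2:
        return "single-agent"
    if chain_files is not None and chain_files <= SHORT_CHAIN_FILES:
        return "single-agent"
    if shared_fraction(subtasks) > SHARED_FILE_LIMIT:
        return "single-agent"
    return "multi-agent"
```

Keeping the thresholds as named constants means they can be tuned in one place if the doctrine changes.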

---

## 2. Planner [MULTI-AGENT]

First sub-agent, created by calling the Task/Agent tool, never
simulated inline by the orchestrator. No specialist work before the
Planner returns.

Produces:

- subtasks with boundaries
- dependency map
- parallel groups (max 4)
- per-task file scope + objective + acceptance criteria
- flags for single-agent subtasks and high-risk items
- cross-talk pairs
- token-cost assessment (flag >60% overlap)
- task-class match

Prefer fewer, broader subtasks over many narrow ones. Flag ambiguity,
don't assume.

Reading the plan: if it recommends single-agent, switch; if it flags
ambiguities, surface them.

Task classes:

- Feature (spec/implement/test/integrate)
- Bug (reproduce/root-cause/fix/regress)
- Refactor (scope/refactor/test/verify)
- Audit (discover/analyze/consolidate)
- Docs+code (change/update/check)
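One way to turn the Planner's dependency map into parallel groups is to take topological layers and split any layer wider than four, per the per-group cap. This sketch uses Python's standard `graphlib` (3.9+). The map shape (`subtask -> set of prerequisite subtasks`) and the `parallel_groups` helper are assumptions for illustration, not the Planner's actual output format.

```python
from graphlib import TopologicalSorter

MAX_PER_GROUP = 4  # gate constraint: max 4 specialists per group


def parallel_groups(deps: dict, max_per_group: int = MAX_PER_GROUP) -> list:
    """Group subtasks into sequential waves of mutually independent work.

    deps maps each subtask to the subtasks it depends on. Every subtask
    in a wave has all of its dependencies in earlier waves. prepare()
    raises graphlib.CycleError if the map contains a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        # Independent subtasks beyond the cap spill into another group.
        for i in range(0, len(ready), max_per_group):
            groups.append(ready[i:i + max_per_group])
        sorter.done(*ready)
    return groups
```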

---

## 3. Specialists [MULTI-AGENT]

Manifest fields: ROLE, TASK, FILES (read/modify), UPSTREAM,
ORIENTATION, ASSUMPTIONS, OUTPUT, ACCEPT, TOOLS (scoped), RULES (S7
injected). ROLE is a procedural workflow (step sequence + acceptance
criteria), never a bare job title; identity labels alone don't change
behavior.

Specialists receive no conversation history, other tasks, full plan,
or unrelated context. Isolation is the advantage. Out of scope: report
and stop.
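The manifest can be modeled as a typed record that renders the specialist's entire prompt. This makes the isolation rule concrete: nothing outside the manifest reaches the specialist. It is a sketch only. `Manifest`, its snake_case field names, and the "at least two steps" check for ROLE are assumptions, not the package's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Hypothetical specialist manifest mirroring the fields listed above."""
    role: list            # ordered workflow steps, not a job title
    task: str
    files_read: list
    files_modify: list
    accept: list          # acceptance criteria
    upstream: str = ""
    orientation: str = ""
    assumptions: list = field(default_factory=list)
    output: str = ""
    tools: list = field(default_factory=list)  # scoped tool list
    rules: str = ""       # S7 rules, injected by the orchestrator

    def validate(self) -> None:
        # Heuristic: a one-item ROLE is treated as a bare label.
        if len(self.role) < 2:
            raise ValueError("ROLE must be a step sequence, not a title")
        if not self.accept:
            raise ValueError("ACCEPT criteria are required")

    def render(self) -> str:
        """Build the specialist prompt from the manifest alone.

        Nothing else (history, other tasks, the full plan) is included.
        """
        self.validate()
        steps = "\n".join(f"  {i}. {s}" for i, s in enumerate(self.role, 1))
        return "\n".join([
            "ROLE:\n" + steps,
            f"TASK: {self.task}",
            "FILES (read): " + (", ".join(self.files_read) or "-"),
            "FILES (modify): " + (", ".join(self.files_modify) or "-"),
            "UPSTREAM: " + (self.upstream or "-"),
            "ORIENTATION: " + (self.orientation or "-"),
            "ASSUMPTIONS: " + ("; ".join(self.assumptions) or "-"),
            "OUTPUT: " + (self.output or "-"),
            "ACCEPT: " + "; ".join(self.accept),
            "TOOLS: " + (", ".join(self.tools) or "-"),
            "RULES: " + (self.rules or "-"),
        ])
```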

---

## 4. Cross-Talk [MULTI-AGENT]

After each group, check every specialist pair (A, B): did A modify
B's files, change B's interfaces, invalidate B's assumptions, or
produce B's inputs?

Route minimum context from A to B. If B has already completed, spawn a
correction agent. Orchestrator: spawn, sequence, detect, route,
deliver. Never plan, code, review, or do specialist work.
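The mechanical half of the cross-talk check (file and interface overlap) can be sketched as a pairwise scan over post-group reports. `Report` and `cross_talk` are hypothetical. The sketch assumes each specialist reports the files it modified and the interfaces it changed or relies on. Invalidated assumptions and produced inputs are left to judgment.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Report:
    """Hypothetical post-group record for one specialist."""
    name: str
    scope: set            # files it was allowed to read/modify
    modified: set         # files it actually modified
    changed_ifaces: set = field(default_factory=set)
    uses_ifaces: set = field(default_factory=set)
    completed: bool = True


def cross_talk(reports: list) -> list:
    """Return (a, b, reasons, action) for each ordered pair where A affects B.

    Only file and interface overlap are checked mechanically; invalidated
    assumptions and produced inputs still need a judgment call.
    """
    findings = []
    for a, b in permutations(reports, 2):
        reasons = []
        touched = a.modified & b.scope
        if touched:
            reasons.append("files: " + ", ".join(sorted(touched)))
        broken = a.changed_ifaces & b.uses_ifaces
        if broken:
            reasons.append("interfaces: " + ", ".join(sorted(broken)))
        if reasons:
            # A finished B worked from stale inputs, so it needs correcting.
            action = "spawn correction agent" if b.completed else "route context"
            findings.append((a.name, b.name, reasons, action))
    return findings
```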

---

## 5. Staff Engineer [MULTI-AGENT]

Final sub-agent. Reviews integrated output.

Packet: changed files + diffs, objective, decisions, risks,
questions. Expand the packet when changes touch core architecture,
security, or central abstractions.

Check: requirements met, specialist contradictions, cross-breakage
(interfaces/imports/types/state), architectural drift, verification
(S7.3), dead code/orphaned imports/incomplete renames,
surgical-scope violations (S7.4).

Returns PASS or FAIL (issues + owner + fix). Max 2 cycles, then
deliver with issues listed.

High-risk or contested verdicts: adversarial panel of 3 (odd, no
ties), each prompted to refute, not confirm.
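Below is a sketch of the PASS/FAIL loop. It assumes one cycle means a FAIL verdict, a round of fixes routed to the named owners, and a fresh review; the text does not pin this down. `Verdict` and `review_loop` are hypothetical. `review` and `route_fixes` stand in for spawning the Staff Engineer and the owning specialists.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_CYCLES = 2


@dataclass
class Verdict:
    """Staff Engineer result: PASS, or FAIL with (issue, owner, fix) triples."""
    passed: bool
    issues: list = field(default_factory=list)


def review_loop(review: Callable[[], Verdict],
                route_fixes: Callable[[list], None],
                max_cycles: int = MAX_CYCLES):
    """Review, route fixes to owners, re-review; stop after max_cycles fixes.

    Returns ("PASS", []) or ("DELIVER_WITH_ISSUES", outstanding_issues).
    """
    verdict = review()
    cycles = 0
    while not verdict.passed and cycles < max_cycles:
        route_fixes(verdict.issues)  # each issue goes to its named owner
        cycles += 1
        verdict = review()
    if verdict.passed:
        return "PASS", []
    return "DELIVER_WITH_ISSUES", verdict.issues
```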

---

## 6. Orchestrator Discipline [MULTI-AGENT]

- Route minimum viable info (a signature, not a 200-line diff)
- Checkpoint before spawns/handoffs/resumes: objective, files,
requirements, decisions, risks, next action
- Structured artifacts > transcript carryover
- Stable scaffolds for cache reuse; no per-specialist rephrasing
- Track agent status; report blockers immediately
- Resume from latest artifact, not full history
- Specialist fails: report and ask the user. At most one silent retry
- Deliver what was asked. No gold-plating. Hooks > prompt reminders
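The checkpoint fields and the retry rule above fit naturally into a structured artifact plus a thin wrapper. This is a minimal sketch. It assumes checkpoints are stored as JSON and that a spawn is any zero-argument callable that raises on failure. `Checkpoint`, `SpecialistFailed`, and `run_with_one_retry` are hypothetical names, not the package's API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class Checkpoint:
    """Hypothetical artifact written before spawns, handoffs, and resumes."""
    objective: str
    files: list
    requirements: list
    decisions: list = field(default_factory=list)
    risks: list = field(default_factory=list)
    next_action: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Checkpoint":
        # Resume from the latest artifact, not from the transcript.
        return cls(**json.loads(text))


class SpecialistFailed(Exception):
    """Raised so the orchestrator reports the failure and asks the user."""


def run_with_one_retry(spawn: Callable[[], T]) -> T:
    """Run a specialist; retry silently at most once, then escalate."""
    try:
        return spawn()
    except Exception:  # broad on purpose in this sketch: any failure counts
        try:
            return spawn()
        except Exception as exc:
            raise SpecialistFailed(f"failed after one retry: {exc}") from exc
```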

---

## 9. Model Routing: full table

Pick the cheapest model that handles the task. Orchestrator decides
at spawn time; Planner (S2) assigns per subtask.

| Tier | When | Examples |
|------|------|----------|
| Haiku | No edits, single source, low reasoning | Status lookup, chat, format, classify, extract |
| **Sonnet** | 1-3 file edits, known scope. **Default** | Bug fix, refactor, test, review, docs |
| Opus | 4+ files, novel design, high reversal cost | Architecture, security review, complex debug |
| Frontier (Fable-class) | Orchestrator tier: long-horizon autonomous work, 1M-context audits, frontier reasoning | Orchestration, system design, deep multi-file debug, adversarial synthesis |

When unsure: Sonnet.
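The routing table can be read as an escalation check: rule out the pricier tiers first, use Haiku only when every cheap-tier condition holds, and otherwise default to Sonnet. The sketch below follows that reading. The `TaskProfile` fields are hypothetical estimates the orchestrator or Planner would make, and the tier names are plain labels rather than real model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical per-subtask estimates used for routing."""
    files_edited: int = 0
    novel_design: bool = False
    high_reversal_cost: bool = False
    orchestrating: bool = False   # long-horizon coordination or synthesis
    single_source: bool = False
    low_reasoning: bool = False


def route(t: TaskProfile) -> str:
    """Return the cheapest tier from the table that fits the task."""
    if t.orchestrating:
        return "Frontier"
    if t.files_edited >= 4 or t.novel_design or t.high_reversal_cost:
        return "Opus"
    if t.files_edited == 0 and t.single_source and t.low_reasoning:
        return "Haiku"
    return "Sonnet"  # the default, and the answer when unsure
```

Checking the escalation conditions before the Haiku conditions keeps a cheap-looking but high-reversal-cost task from being under-routed.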

### Output caps

Agent prompts MUST specify max response length. Oversized results
bloat parent context and trigger compaction.

| Agent tier | Cap | Exception |
|------------|-----|-----------|
| Haiku | 100 words | - |
| Sonnet | 500 words | Code output (uncapped) |
| Opus | Uncapped | - |
| Frontier | Uncapped | - |
| Explore | 200 words | Always, regardless of model |

Explore agents: include "report in under 200 words" in every prompt.
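Here is a sketch of how an orchestrator might attach the caps above to prompts and flag oversized replies. `CAPS`, `cap_instruction`, and `over_cap` are hypothetical. Word counts use a simple whitespace split. The Sonnet code exception appears only in the instruction text, because `over_cap` counts every word, code included.

```python
from typing import Optional

CAPS: dict = {"haiku": 100, "sonnet": 500, "opus": None, "frontier": None}
EXPLORE_CAP = 200  # applies to Explore agents regardless of model


def cap_instruction(tier: str, explore: bool = False) -> str:
    """Return the length line to append to an agent prompt ('' if uncapped)."""
    if explore:
        return f"Report in under {EXPLORE_CAP} words."
    cap: Optional[int] = CAPS[tier.lower()]
    if cap is None:
        return ""
    if tier.lower() == "sonnet":
        return f"Keep prose under {cap} words; code output is not capped."
    return f"Report in under {cap} words."


def over_cap(reply: str, tier: str, explore: bool = False) -> bool:
    """Flag a reply whose whitespace-split word count exceeds its cap."""
    cap = EXPLORE_CAP if explore else CAPS[tier.lower()]
    return cap is not None and len(reply.split()) > cap
```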

### Tool-call budgets

Action tokens are the third cost lever, beside output caps (above)
and S8 input compression. Every subagent prompt carries a tool-call
budget (manifest field `toolBudget`); idea adapted from
claude-token-efficient (MIT).

| Task type | Budget |
|-----------|--------|
| Routine subtask, known scope | ~20 calls |
| Read-only research / Explore | ~10 calls |
| Multi-file implementation | scale with file count; state it explicitly |

Discipline inside the budget: read-first-write-once (read each
needed file once, then edit, no re-read loops); one diagnostic read
per failure, then the S7.3 two-attempt rule applies (stop, re-read
from scratch, change approach). Budget exhausted: report progress
and the named gap; never burn calls polling.
Research agents returning raw dumps waste more tokens than they save.
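The following is a minimal tracker for the budget table and the read-first-write-once rule. It assumes each tool call is reported with a tool name and an optional path, and that the caller explicitly marks the one post-failure diagnostic read. `ToolBudget`, `budget_for`, and `BudgetExhausted` are hypothetical names. Multi-file budgets have no default here, because the table says to state them explicitly.

```python
from typing import Optional

BUDGETS = {"routine": 20, "explore": 10}  # approximate, per the table above


class BudgetExhausted(Exception):
    """Stop and report progress plus the named gap instead of polling."""


def budget_for(task_type: str, explicit: Optional[int] = None) -> int:
    """Look up a default budget; multi-file work must state its own."""
    if explicit is not None:
        return explicit
    if task_type == "multi-file":
        raise ValueError("multi-file budgets must be stated explicitly")
    return BUDGETS[task_type]


class ToolBudget:
    """Counts tool calls against a limit and blocks re-read loops."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0
        self.reads = set()

    def spend(self, tool: str, path: Optional[str] = None,
              diagnostic: bool = False) -> None:
        if self.used >= self.limit:
            raise BudgetExhausted(f"used {self.used}/{self.limit} calls")
        # Read-first-write-once: a second read of the same file is refused
        # unless the caller marks it as the diagnostic read after a failure.
        if tool == "read" and path in self.reads and not diagnostic:
            raise ValueError(f"re-read of {path}")
        self.used += 1
        if tool == "read" and path is not None:
            self.reads.add(path)

    @property
    def remaining(self) -> int:
        return self.limit - self.used
```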

---

## Self-evaluation (relocated S7.6)

- Two perspectives: perfectionist critique + pragmatist accept
- Bug autopsy: root cause vs symptom, prevention
- After 2 failures: stop, re-read from scratch, different approach