@maestrofrontier/frontier 1.4.4 → 1.4.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.agents/plugins/marketplace.json +21 -0
- package/.codex-plugin/plugin.json +29 -0
- package/AGENTS.md +214 -214
- package/CLAUDE.md +29 -29
- package/README.md +112 -22
- package/docs/codex.md +81 -12
- package/frontier/cli.cjs +10 -6
- package/frontier/config.cjs +41 -14
- package/frontier/run.cjs +33 -1
- package/hooks/frontier-autorun.cjs +2 -6
- package/hooks/hooks.json +1 -1
- package/integrations/README.md +51 -34
- package/integrations/codex/prompts/frontier.md +22 -18
- package/integrations/codex/prompts/update.md +3 -0
- package/integrations/codex/skills/maestro-frontier/SKILL.md +122 -0
- package/integrations/codex/skills/{settings → maestro-settings}/SKILL.md +15 -6
- package/integrations/codex/skills/{terse → maestro-terse}/SKILL.md +15 -6
- package/integrations/codex/skills/maestro-update/SKILL.md +31 -0
- package/package.json +4 -1
- package/scripts/install.cjs +424 -15
- package/settings/cli.cjs +1 -1
- package/skills/maestro-frontier/SKILL.md +122 -0
- package/skills/maestro-settings/SKILL.md +55 -0
- package/skills/maestro-terse/SKILL.md +58 -0
- package/skills/maestro-update/SKILL.md +31 -0
- package/skills/terse/SKILL.md +74 -0
- package/integrations/codex/skills/frontier/SKILL.md +0 -91
- package/integrations/codex/skills/update/SKILL.md +0 -29
package/.agents/plugins/marketplace.json
ADDED

```diff
@@ -0,0 +1,21 @@
+{
+  "name": "maestro",
+  "interface": {
+    "displayName": "Maestro"
+  },
+  "plugins": [
+    {
+      "name": "maestro",
+      "source": {
+        "source": "url",
+        "url": "https://github.com/mbanderas/maestro.git",
+        "ref": "main"
+      },
+      "policy": {
+        "installation": "AVAILABLE",
+        "authentication": "ON_INSTALL"
+      },
+      "category": "Productivity"
+    }
+  ]
+}
```
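The marketplace manifest registers a single plugin, `maestro`, fetched from a git URL pinned to `ref: main`. A minimal Python sketch that reads those entries back, assuming only the keys visible in this hunk (Codex's own loader is not part of this diff, and `plugin_sources` is a hypothetical helper):

```python
import json

# Manifest text copied from the marketplace.json hunk above.
MARKETPLACE = """
{
  "name": "maestro",
  "interface": {"displayName": "Maestro"},
  "plugins": [
    {
      "name": "maestro",
      "source": {
        "source": "url",
        "url": "https://github.com/mbanderas/maestro.git",
        "ref": "main"
      },
      "policy": {"installation": "AVAILABLE", "authentication": "ON_INSTALL"},
      "category": "Productivity"
    }
  ]
}
"""


def plugin_sources(manifest_text):
    """Return (name, url, ref) for every plugin whose source type is "url".

    Hypothetical helper: it only mirrors the keys visible in this diff.
    """
    manifest = json.loads(manifest_text)
    sources = []
    for plugin in manifest.get("plugins", []):
        src = plugin.get("source", {})
        if src.get("source") == "url":
            sources.append((plugin["name"], src["url"], src.get("ref")))
    return sources


for name, url, ref in plugin_sources(MARKETPLACE):
    print(f"{name}: {url} @ {ref}")
```

Run as-is, it prints one line for the `maestro` plugin with its git URL and the `main` ref.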
package/.codex-plugin/plugin.json
ADDED

```diff
@@ -0,0 +1,29 @@
+{
+  "name": "maestro",
+  "version": "1.4.5",
+  "description": "Maestro Frontier orchestration, Codex skills, and lifecycle hooks.",
+  "author": {
+    "name": "Maestro",
+    "url": "https://github.com/mbanderas/maestro"
+  },
+  "homepage": "https://github.com/mbanderas/maestro#readme",
+  "repository": "https://github.com/mbanderas/maestro",
+  "license": "MIT",
+  "keywords": ["frontier", "multi-agent", "codex", "hooks", "skills"],
+  "skills": "./skills/",
+  "interface": {
+    "displayName": "Maestro",
+    "shortDescription": "Frontier orchestration, Codex skills, and lifecycle hooks",
+    "longDescription": "Install Maestro in Codex as a plugin: bundled skills, trusted hooks, and the local Frontier fusion engine.",
+    "developerName": "Maestro",
+    "category": "Productivity",
+    "capabilities": ["Interactive", "Write"],
+    "websiteURL": "https://github.com/mbanderas/maestro",
+    "brandColor": "#5B82D6",
+    "defaultPrompt": [
+      "Use Maestro Frontier with ChatGPT duo.",
+      "Show Maestro Frontier status.",
+      "Turn Maestro Frontier off."
+    ]
+  }
+}
```
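`plugin.json` points `skills` at `./skills/`, the directory that gains `maestro-frontier`, `maestro-settings`, `maestro-terse`, `maestro-update`, and `terse` in this release. The sketch below enumerates skills under that path. It assumes, from the file list rather than from Codex documentation, that a skill is any subdirectory holding a `SKILL.md` and that the path resolves relative to the plugin root; `list_skills` is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def list_skills(plugin_root) -> list:
    """Names of skill directories (those containing SKILL.md) under the manifest's "skills" path.

    Assumes the manifest sits at <root>/.codex-plugin/plugin.json and that its
    "skills" value is relative to <root>, as "./skills/" appears to be here.
    """
    root = Path(plugin_root)
    manifest = json.loads((root / ".codex-plugin" / "plugin.json").read_text())
    skills_dir = root / manifest["skills"]  # pathlib collapses "./" and the trailing slash
    return sorted(p.parent.name for p in skills_dir.glob("*/SKILL.md"))


# Demo against a throwaway tree shaped like this release's skills/ directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".codex-plugin").mkdir()
    (root / ".codex-plugin" / "plugin.json").write_text(json.dumps({"skills": "./skills/"}))
    for name in ["maestro-frontier", "maestro-settings", "maestro-terse", "maestro-update", "terse"]:
        (root / "skills" / name).mkdir(parents=True)
        (root / "skills" / name / "SKILL.md").write_text("placeholder\n")
    (root / "skills" / "not-a-skill").mkdir()  # no SKILL.md, so it is ignored
    found = list_skills(root)

print(found)
```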
package/AGENTS.md
CHANGED
```diff
@@ -1,214 +1,214 @@
-# AGENTS.md -- Maestro Orchestration Kernel
-
-Discipline layer for AI coding agents. This file is the always-on
-kernel; the full multi-agent protocol lives in
-[docs/orchestration.md](docs/orchestration.md) and loads on demand.
-Section numbers S0-S10 are stable identifiers.
-
----
-
-## 0. Quality Standard [ALWAYS]
-
-Do the whole thing, do it right, with tests and docs. Search before
-building; test before shipping. Bar: genuinely done. Applies within
-requested scope.
-
----
-
-## 1. Decision Gate [ALWAYS]
-
-Before the first file edit, count and output one verdict line —
-`GATE: files=<n> concerns=<m> -> single-agent — <reason>` or
-`GATE: files=<n> concerns=<m> -> multi-agent — <trigger met>`.
-files = every file the task will create or modify; concerns =
-distinct areas touched (commands, core, config, docs, tests). No
-edits before the verdict.
-
-Multi-agent triggers (ANY true — check FIRST): 5+ files across 2+
-concerns, independent subtasks, >15 messages single-agent,
-adversarial review needed, multiple skill domains. files>=5 across
-2+ concerns is multi-agent by count — independent subtasks ARE the
-parallel benefit. A met trigger downgrades ONLY on: >60% file
-overlap between subtasks, or <=3 files total in one dependency
-chain. Nothing else.
-
-A multi-agent verdict is executed, not noted: immediately spawn the
-Planner as a real subagent via the Task/Agent tool — before any
-specialist work or file edit. Read
-[docs/orchestration.md](docs/orchestration.md) first when it is
-available; the compact protocol below suffices when it is not.
-
-Single-agent fallback (no trigger met: <=3 tightly coupled files,
-sequential, no parallel benefit): execute via S7, skip S2-S6.
-Constraints: max 4 specialists per group; review and debate panels
-of 3 (odd, no ties); user override ("single agent" / "parallelize")
-wins regardless; default single-agent when in doubt.
-Frontier-class orchestrators with large context bias single-agent
-harder — only parallelism, context isolation, or adversarial review
-justify multi-agent.
-
----
-
-## 2-6. Multi-Agent Protocol [MULTI-AGENT]
-
-Compact protocol — enough to act on a multi-agent verdict on any
-runtime. Full version: [docs/orchestration.md](docs/orchestration.md).
-
-- Planner first, as a real subagent, never simulated inline: subtasks
-with boundaries, file scopes, dependency map, parallel groups
-(max 4), acceptance criteria. Planner recommends single-agent:
-switch.
-- Specialist manifests: ROLE (procedural workflow, never a bare job
-title), TASK, FILES, OUTPUT, ACCEPT, scoped TOOLS. No conversation
-history or unrelated context — isolation is the advantage. Out of
-scope: report and stop.
-- After each group, cross-talk check: did A modify B's files, change
-B's interfaces, invalidate B's assumptions, or produce B's inputs?
-Route the minimum context.
-- Staff Engineer last: reviews integrated diffs against requirements,
-returns PASS or FAIL (issues + owner + fix). Max 2 cycles, then
-deliver with issues listed.
-- The orchestrator spawns, sequences, routes, and delivers. It never
-plans, codes, or reviews specialist work itself.
-
----
-
-## 7. Universal Rules [ALWAYS]
-
-Both modes. In multi-agent, inject into every specialist.
-
-### 7.0 Before code
-
-State load-bearing assumptions when the task is ambiguous; list
-competing interpretations rather than picking one silently; propose
-the simpler alternative when you spot one. Confusion: stop, name
-what is unclear, ask. No sycophancy — push back when warranted.
-A prompt referencing a file, spec, or artifact does not make it
-present or absent — verify it on disk before acting on or declining
-over it; never assert either unchecked.
-
-### 7.1 Phase scope
-
-Max 5 files per phase; complete and verify before the next.
-Planning produces plans, not code — flag problems, don't improvise.
-
-### 7.2 Context integrity
-
-This doctrine is loaded at session start: when it is already in your
-context, never Read AGENTS.md or CLAUDE.md from disk. A subagent
-without it in context reads AGENTS.md once. Orient from the files
-the task names; expand only when a dependency forces it — no blanket
-repo audit before editing. Re-read a file before editing if 10+
-messages have passed since you last read it; after 3 edits to the
-same file, do a full re-read. Files >500 LOC: read in chunks;
-truncated results: narrow scope and retry.
-
-### 7.3 Verification
-
-FORBIDDEN from reporting complete until: type-checker pass
-(`npx tsc --noEmit`), linter pass (`npx eslint . --quiet`), tests
-pass if configured, ALL errors fixed. No checker: state explicitly.
-Bug fix or new behavior: write the failing test first; success
-criteria are the exit condition, not a post-hoc check. After 2
-failed attempts: stop, re-read from scratch, change approach.
-
-Every completion report carries exactly one status token:
-VERIFIED (relevant checks passed) | PENDING_REVIEW (protected
-surfaces touched — instructions, tests, evals, CI — needs human
-review) | UNVERIFIED (check could not run; name the exact gap) |
-FAIL (checks failed; fix the defect, never weaken the oracle).
-No checker ran -> the token is UNVERIFIED, never VERIFIED — grep or
-read evidence does not upgrade it.
-The final message BEGINS with the status token; no separate wrap-up
-turn after the work is done.
-
-### 7.4 Edit safety
-
-Surgical scope: every changed line traces to the request. Match
-existing style even if you'd write it differently. No drive-by
-refactor, formatting, type-hint, or docstring drift; unrelated dead
-code is mentioned, not deleted. Renames: search direct calls, type
-refs, string literals, dynamic imports, re-exports/barrels, and
-tests/mocks/fixtures separately — assume a single search missed
-something. One source of truth. Never delete unverified. Never push
-unless told.
-
-### 7.5 Code quality
-
-Senior dev standard: structural fixes within request scope, never
-workarounds. Simple and correct > elaborate. Output >2x the simplest
-solution that meets requirements: rewrite.
-
-### 7.7 Communication
-
-Study the code the user points to (working code > English spec);
-verify a referenced artifact on disk before relaying it is missing,
-and re-ground a subprocess or engine's "can't see X" against live
-context before passing it on.
-"yes" / "do it" / "go": execute immediately, no recap. Terse output;
-structured artifacts over transcript prose.
-
----
-
-## 8. Compression [ALWAYS]
-
-NEVER alter: code, commands, paths, URLs, identifiers, schemas,
-versions, dates, requirements, type signatures, API contracts,
-errors. Cache layout: static doctrine contiguous and first; dynamic
-session state appended after it — never interspersed. Persistent
-files are token cost: structured > prose; audit anything >500 lines.
-
----
-
-## 9. Model Routing [ALWAYS]
-
-Pick the cheapest model that handles the task; when unsure, Sonnet.
-Haiku: no edits, single source, low reasoning. Sonnet (default):
-1-3 file edits, known scope. Opus: 4+ files, novel design, high
-reversal cost. Frontier tier (Fable-class): orchestration,
-1M-context audits, long-horizon autonomy. Subagents inherit the
-orchestrator's model when none is specified — set an explicit
-cheaper tier for routine subtasks. Cap subagent response length in
-every prompt: Haiku 100 words, Sonnet 500 (code output uncapped),
-Explore agents 200 words always. Cap subagent actions too: a
-tool-call budget in every prompt (~20 calls for routine subtasks;
-read-first-write-once; one diagnostic read per failure, then the
-S7.3 two-attempt rule). Manifest field: `toolBudget`. Full routing
-table: [docs/orchestration.md](docs/orchestration.md).
-
----
-
-## 10. Long-Horizon Operation [ALWAYS]
-
-Work spanning sessions, iterations, or scheduled runs:
-
-- One durable checkpoint artifact (gitignored) holding phase status,
-findings with sources, decisions with rationale. Read it FIRST on
-every resume; never redo completed phases. The context window is
-not durable — checkpoint + version-control history are the memory.
-- Checkpoint findings graduate: failure note -> investigated cause
--> verified fact -> distilled rule; flag unverified entries as
-such. Consult distilled rules FIRST each iteration — never
-re-derive what a rule already answers.
-- Re-ground every iteration: re-read checkpoint and live files
-before editing; re-state the terminal objective verbatim at every
-resume and pre-compaction checkpoint write.
-- Dual termination declared at checkpoint creation: success
-condition (checkable criteria) AND max-iteration/time cap.
-Success is graded by a verifier subagent in a fresh context,
-never self-assessed by the loop that did the work. The end
-condition set at start wins over anything encountered mid-run.
-On completion: final report, then stop — no zombie loops.
-- Hillclimbing loops: bet on structural changes over scalar
-tuning; a transient regression inside the iteration cap is data,
-not a stop signal.
-- Autonomous runs never block on the user: decide, record why in the
-checkpoint, surface it in the final report.
-- Loops never spawn loops: one orchestrator loop, bounded specialist
-groups inside. Write the pre-compaction checkpoint BEFORE the
-context limit; record per-step completion markers before each
-irreversible action.
-- Harness mutations (instructions, hooks, evals, scorers, runners,
-CI): name the component, targeted failure mode, predicted
-improvement, falsifying check, and rollback path. Report
-PENDING_REVIEW — never count a harness change as green evidence.
+# AGENTS.md -- Maestro Orchestration Kernel
+
+Discipline layer for AI coding agents. This file is the always-on
+kernel; the full multi-agent protocol lives in
+[docs/orchestration.md](docs/orchestration.md) and loads on demand.
+Section numbers S0-S10 are stable identifiers.
+
+---
+
+## 0. Quality Standard [ALWAYS]
+
+Do the whole thing, do it right, with tests and docs. Search before
+building; test before shipping. Bar: genuinely done. Applies within
+requested scope.
+
+---
+
+## 1. Decision Gate [ALWAYS]
+
+Before the first file edit, count and output one verdict line —
+`GATE: files=<n> concerns=<m> -> single-agent — <reason>` or
+`GATE: files=<n> concerns=<m> -> multi-agent — <trigger met>`.
+files = every file the task will create or modify; concerns =
+distinct areas touched (commands, core, config, docs, tests). No
+edits before the verdict.
+
+Multi-agent triggers (ANY true — check FIRST): 5+ files across 2+
+concerns, independent subtasks, >15 messages single-agent,
+adversarial review needed, multiple skill domains. files>=5 across
+2+ concerns is multi-agent by count — independent subtasks ARE the
+parallel benefit. A met trigger downgrades ONLY on: >60% file
+overlap between subtasks, or <=3 files total in one dependency
+chain. Nothing else.
+
+A multi-agent verdict is executed, not noted: immediately spawn the
+Planner as a real subagent via the Task/Agent tool — before any
+specialist work or file edit. Read
+[docs/orchestration.md](docs/orchestration.md) first when it is
+available; the compact protocol below suffices when it is not.
+
+Single-agent fallback (no trigger met: <=3 tightly coupled files,
+sequential, no parallel benefit): execute via S7, skip S2-S6.
+Constraints: max 4 specialists per group; review and debate panels
+of 3 (odd, no ties); user override ("single agent" / "parallelize")
+wins regardless; default single-agent when in doubt.
+Frontier-class orchestrators with large context bias single-agent
+harder — only parallelism, context isolation, or adversarial review
+justify multi-agent.
+
+---
+
+## 2-6. Multi-Agent Protocol [MULTI-AGENT]
+
+Compact protocol — enough to act on a multi-agent verdict on any
+runtime. Full version: [docs/orchestration.md](docs/orchestration.md).
+
+- Planner first, as a real subagent, never simulated inline: subtasks
+with boundaries, file scopes, dependency map, parallel groups
+(max 4), acceptance criteria. Planner recommends single-agent:
+switch.
+- Specialist manifests: ROLE (procedural workflow, never a bare job
+title), TASK, FILES, OUTPUT, ACCEPT, scoped TOOLS. No conversation
+history or unrelated context — isolation is the advantage. Out of
+scope: report and stop.
+- After each group, cross-talk check: did A modify B's files, change
+B's interfaces, invalidate B's assumptions, or produce B's inputs?
+Route the minimum context.
+- Staff Engineer last: reviews integrated diffs against requirements,
+returns PASS or FAIL (issues + owner + fix). Max 2 cycles, then
+deliver with issues listed.
+- The orchestrator spawns, sequences, routes, and delivers. It never
+plans, codes, or reviews specialist work itself.
+
+---
+
+## 7. Universal Rules [ALWAYS]
+
+Both modes. In multi-agent, inject into every specialist.
+
+### 7.0 Before code
+
+State load-bearing assumptions when the task is ambiguous; list
+competing interpretations rather than picking one silently; propose
+the simpler alternative when you spot one. Confusion: stop, name
+what is unclear, ask. No sycophancy — push back when warranted.
+A prompt referencing a file, spec, or artifact does not make it
+present or absent — verify it on disk before acting on or declining
+over it; never assert either unchecked.
+
+### 7.1 Phase scope
+
+Max 5 files per phase; complete and verify before the next.
+Planning produces plans, not code — flag problems, don't improvise.
+
+### 7.2 Context integrity
+
+This doctrine is loaded at session start: when it is already in your
+context, never Read AGENTS.md or CLAUDE.md from disk. A subagent
+without it in context reads AGENTS.md once. Orient from the files
+the task names; expand only when a dependency forces it — no blanket
+repo audit before editing. Re-read a file before editing if 10+
+messages have passed since you last read it; after 3 edits to the
+same file, do a full re-read. Files >500 LOC: read in chunks;
+truncated results: narrow scope and retry.
+
+### 7.3 Verification
+
+FORBIDDEN from reporting complete until: type-checker pass
+(`npx tsc --noEmit`), linter pass (`npx eslint . --quiet`), tests
+pass if configured, ALL errors fixed. No checker: state explicitly.
+Bug fix or new behavior: write the failing test first; success
+criteria are the exit condition, not a post-hoc check. After 2
+failed attempts: stop, re-read from scratch, change approach.
+
+Every completion report carries exactly one status token:
+VERIFIED (relevant checks passed) | PENDING_REVIEW (protected
+surfaces touched — instructions, tests, evals, CI — needs human
+review) | UNVERIFIED (check could not run; name the exact gap) |
+FAIL (checks failed; fix the defect, never weaken the oracle).
+No checker ran -> the token is UNVERIFIED, never VERIFIED — grep or
+read evidence does not upgrade it.
+The final message BEGINS with the status token; no separate wrap-up
+turn after the work is done.
+
+### 7.4 Edit safety
+
+Surgical scope: every changed line traces to the request. Match
+existing style even if you'd write it differently. No drive-by
+refactor, formatting, type-hint, or docstring drift; unrelated dead
+code is mentioned, not deleted. Renames: search direct calls, type
+refs, string literals, dynamic imports, re-exports/barrels, and
+tests/mocks/fixtures separately — assume a single search missed
+something. One source of truth. Never delete unverified. Never push
+unless told.
+
+### 7.5 Code quality
+
+Senior dev standard: structural fixes within request scope, never
+workarounds. Simple and correct > elaborate. Output >2x the simplest
+solution that meets requirements: rewrite.
+
+### 7.7 Communication
+
+Study the code the user points to (working code > English spec);
+verify a referenced artifact on disk before relaying it is missing,
+and re-ground a subprocess or engine's "can't see X" against live
+context before passing it on.
+"yes" / "do it" / "go": execute immediately, no recap. Terse output;
+structured artifacts over transcript prose.
+
+---
+
+## 8. Compression [ALWAYS]
+
+NEVER alter: code, commands, paths, URLs, identifiers, schemas,
+versions, dates, requirements, type signatures, API contracts,
+errors. Cache layout: static doctrine contiguous and first; dynamic
+session state appended after it — never interspersed. Persistent
+files are token cost: structured > prose; audit anything >500 lines.
+
+---
+
+## 9. Model Routing [ALWAYS]
+
+Pick the cheapest model that handles the task; when unsure, Sonnet.
+Haiku: no edits, single source, low reasoning. Sonnet (default):
+1-3 file edits, known scope. Opus: 4+ files, novel design, high
+reversal cost. Frontier tier (Fable-class): orchestration,
+1M-context audits, long-horizon autonomy. Subagents inherit the
+orchestrator's model when none is specified — set an explicit
+cheaper tier for routine subtasks. Cap subagent response length in
+every prompt: Haiku 100 words, Sonnet 500 (code output uncapped),
+Explore agents 200 words always. Cap subagent actions too: a
+tool-call budget in every prompt (~20 calls for routine subtasks;
+read-first-write-once; one diagnostic read per failure, then the
+S7.3 two-attempt rule). Manifest field: `toolBudget`. Full routing
+table: [docs/orchestration.md](docs/orchestration.md).
+
+---
+
+## 10. Long-Horizon Operation [ALWAYS]
+
+Work spanning sessions, iterations, or scheduled runs:
+
+- One durable checkpoint artifact (gitignored) holding phase status,
+findings with sources, decisions with rationale. Read it FIRST on
+every resume; never redo completed phases. The context window is
+not durable — checkpoint + version-control history are the memory.
+- Checkpoint findings graduate: failure note -> investigated cause
+-> verified fact -> distilled rule; flag unverified entries as
+such. Consult distilled rules FIRST each iteration — never
+re-derive what a rule already answers.
+- Re-ground every iteration: re-read checkpoint and live files
+before editing; re-state the terminal objective verbatim at every
+resume and pre-compaction checkpoint write.
+- Dual termination declared at checkpoint creation: success
+condition (checkable criteria) AND max-iteration/time cap.
+Success is graded by a verifier subagent in a fresh context,
+never self-assessed by the loop that did the work. The end
+condition set at start wins over anything encountered mid-run.
+On completion: final report, then stop — no zombie loops.
+- Hillclimbing loops: bet on structural changes over scalar
+tuning; a transient regression inside the iteration cap is data,
+not a stop signal.
+- Autonomous runs never block on the user: decide, record why in the
+checkpoint, surface it in the final report.
+- Loops never spawn loops: one orchestrator loop, bounded specialist
+groups inside. Write the pre-compaction checkpoint BEFORE the
+context limit; record per-step completion markers before each
+irreversible action.
+- Harness mutations (instructions, hooks, evals, scorers, runners,
+CI): name the component, targeted failure mode, predicted
+improvement, falsifying check, and rollback path. Report
+PENDING_REVIEW — never count a harness change as green evidence.
```
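S1's gate reduces to a small decision procedure: count, check the five triggers, then apply only the two permitted downgrades, with a user override taking precedence over everything. The Python sketch below encodes that reading. The `GateInput` fields and the reason strings are hypothetical, and the text's "default single-agent when in doubt" is a judgment call that is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

EM_DASH = "\u2014"  # S1's verdict format separates verdict and reason with an em dash


@dataclass
class GateInput:
    files: int                        # every file the task will create or modify
    concerns: int                     # distinct areas: commands, core, config, docs, tests
    independent_subtasks: bool = False
    expected_messages: int = 0        # estimated message count if done single-agent
    adversarial_review: bool = False
    multiple_skill_domains: bool = False
    max_file_overlap: float = 0.0     # largest share of files two subtasks have in common
    single_dependency_chain: bool = False
    user_override: Optional[str] = None  # "single agent" or "parallelize"


def gate(t: GateInput) -> str:
    """Return one S1 verdict line for the inputs (hypothetical helper)."""
    head = f"GATE: files={t.files} concerns={t.concerns} ->"
    # A user override wins regardless of the counts.
    if t.user_override == "single agent":
        return f"{head} single-agent {EM_DASH} user override"
    if t.user_override == "parallelize":
        return f"{head} multi-agent {EM_DASH} user override"

    # Multi-agent triggers: any one is enough.
    triggers = []
    if t.files >= 5 and t.concerns >= 2:
        triggers.append("5+ files across 2+ concerns")
    if t.independent_subtasks:
        triggers.append("independent subtasks")
    if t.expected_messages > 15:
        triggers.append(">15 messages single-agent")
    if t.adversarial_review:
        triggers.append("adversarial review needed")
    if t.multiple_skill_domains:
        triggers.append("multiple skill domains")
    if not triggers:
        return f"{head} single-agent {EM_DASH} no trigger met"

    # The only two conditions that downgrade a met trigger.
    if t.max_file_overlap > 0.6:
        return f"{head} single-agent {EM_DASH} >60% file overlap between subtasks"
    if t.files <= 3 and t.single_dependency_chain:
        return f"{head} single-agent {EM_DASH} <=3 files in one dependency chain"
    return f"{head} multi-agent {EM_DASH} {triggers[0]}"


print(gate(GateInput(files=6, concerns=3)))
```

Checking the override first mirrors "wins regardless"; checking downgrades only after a trigger fires mirrors "Nothing else".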
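S7.3 requires exactly one status token per completion report, with the final message starting with it. A sketch of one way to choose the token follows. The precedence FAIL > PENDING_REVIEW > UNVERIFIED > VERIFIED is an assumption (S7.3 names the tokens but does not rank them), and the `token: body` message shape is likewise illustrative.

```python
from typing import Optional

TOKENS = ("VERIFIED", "PENDING_REVIEW", "UNVERIFIED", "FAIL")


def status_token(checks_passed: Optional[bool], protected_surfaces_touched: bool = False) -> str:
    """Choose one S7.3 status token.

    checks_passed: True or False if a checker ran, None if none could run.
    Precedence FAIL > PENDING_REVIEW > UNVERIFIED > VERIFIED is assumed.
    """
    if checks_passed is False:
        return "FAIL"            # fix the defect; never weaken the oracle
    if protected_surfaces_touched:
        return "PENDING_REVIEW"  # instructions, tests, evals, CI need human review
    if checks_passed is None:
        return "UNVERIFIED"      # grep or read evidence never upgrades this
    return "VERIFIED"


def final_message(token: str, body: str) -> str:
    """The final message begins with the token (the separator is illustrative)."""
    if token not in TOKENS:
        raise ValueError(f"unknown status token: {token!r}")
    return f"{token}: {body}"


print(final_message(status_token(None), "tsc not configured in this repo"))
```

This prints `UNVERIFIED: tsc not configured in this repo`, since no checker ran.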
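S9's routing rules can be read as an ordered check from the most demanding tier down, falling back to Sonnet. This sketch assumes that ordering and uses hypothetical `model` and `wordCap` keys; only `toolBudget` is a manifest field named in S9, and the 20-call figure is the routine-subtask default quoted there.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Response caps (words) from S9; None means S9 states no cap for that tier.
# Sonnet's 500 applies to prose; S9 leaves code output uncapped.
WORD_CAPS: Dict[str, Optional[int]] = {"haiku": 100, "sonnet": 500, "opus": None, "frontier": None}
EXPLORE_WORD_CAP = 200      # Explore agents: 200 words always
ROUTINE_TOOL_BUDGET = 20    # "~20 calls for routine subtasks"


@dataclass
class Subtask:
    files_to_edit: int
    low_reasoning_single_source: bool = False
    novel_design: bool = False
    high_reversal_cost: bool = False
    frontier_work: bool = False   # orchestration, 1M-context audits, long-horizon autonomy
    explore: bool = False         # the subagent is an Explore agent


def route(t: Subtask) -> dict:
    """Pick the cheapest tier S9 allows and fill the per-prompt caps (hypothetical helper)."""
    if t.frontier_work:
        model = "frontier"
    elif t.files_to_edit >= 4 or t.novel_design or t.high_reversal_cost:
        model = "opus"
    elif t.files_to_edit == 0 and t.low_reasoning_single_source:
        model = "haiku"
    else:
        model = "sonnet"  # default: 1-3 file edits, known scope, or unsure
    word_cap = EXPLORE_WORD_CAP if t.explore else WORD_CAPS[model]
    return {"model": model, "wordCap": word_cap, "toolBudget": ROUTINE_TOOL_BUDGET}


print(route(Subtask(files_to_edit=2)))
```

Setting the tier explicitly in every manifest matters because, per S9, subagents otherwise inherit the orchestrator's (more expensive) model.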
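S10's loop discipline (read the checkpoint first, never redo finished steps, stop on either the success condition or the cap fixed at creation) fits in a short loop. In this sketch `verify` stands in for the fresh-context verifier subagent, and the JSON checkpoint file and its keys are hypothetical; CLAUDE.md's own convention is a `_<task>.md` file in the repo root.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_loop(step: Callable[[int], None], verify: Callable[[], bool],
             max_iterations: int, checkpoint: Path) -> dict:
    """Bounded loop with a durable checkpoint and two exits (hypothetical helper)."""
    state = {"objective_met": False, "completed": [], "max_iterations": max_iterations}
    if checkpoint.exists():
        # Read the checkpoint FIRST; the cap recorded at creation wins over the argument.
        state = json.loads(checkpoint.read_text())
    if state["objective_met"]:
        return state  # already done: report and stop, no zombie loop
    for i in range(state["max_iterations"]):
        if i in state["completed"]:
            continue  # never redo completed phases
        step(i)
        state["completed"].append(i)
        checkpoint.write_text(json.dumps(state))  # marker lands before the next step runs
        if verify():
            state["objective_met"] = True
            break
    checkpoint.write_text(json.dumps(state))
    return state


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "_task.json"
    progress = []
    demo_state = run_loop(progress.append, lambda: len(progress) >= 3, max_iterations=10, checkpoint=ckpt)
    resumed = run_loop(progress.append, lambda: True, max_iterations=99, checkpoint=ckpt)

print(demo_state["completed"], progress)
```

The first call stops after three steps when `verify` succeeds; the second finds `objective_met` in the checkpoint and returns without running `step`, keeping the original cap of 10 rather than the 99 passed in.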
package/CLAUDE.md
CHANGED
```diff
@@ -1,29 +1,29 @@
-# CLAUDE.md — Maestro for Claude Code
-
-<!-- Thin runtime adapter. Portable doctrine lives in AGENTS.md.
-Already have your own CLAUDE.md? Add the single @AGENTS.md line
-to it — that is the whole install. -->
-
-@AGENTS.md
-
----
-
-## Claude Code Runtime Rules
-
-- Subagents in single session by default; scope tools per subagent.
-Agent teams only for long-running parallel workstreams with shared
-state.
-- Worktree isolation: 2+ specialists in one group modifying files —
-pass `isolation: "worktree"` per Agent call. Skip for read-only
-specialists, a single writer, or disjoint <=3-file scopes.
-- Hooks > prompt reminders for structural checks; the shipped pack
-(gate-reminder, subagent-guard, loop-guard, phase-scope,
-gate-telemetry, doctrine-guard) lives in `hooks/`.
-- Read tool: 2,000 lines/call; results >50,000 chars truncate
-silently; a "PARTIAL view" notice means re-issue with offset/limit.
-- S10 mapping: `/loop <interval>` or `/schedule` (durable cloud);
-no interval = self-paced ScheduleWakeup (polling <=270s, otherwise
-1200s+; never poll harness-tracked work — completion notifies).
-Event waits: persistent Monitor primary, wakeup as fallback
-heartbeat. Checkpoint: `_<task>.md` in repo root (gitignore `_*`).
-Workflow tool only on explicit user opt-in; S9 caps apply.
+# CLAUDE.md — Maestro for Claude Code
+
+<!-- Thin runtime adapter. Portable doctrine lives in AGENTS.md.
+Already have your own CLAUDE.md? Add the single @AGENTS.md line
+to it — that is the whole install. -->
+
+@AGENTS.md
+
+---
+
+## Claude Code Runtime Rules
+
+- Subagents in single session by default; scope tools per subagent.
+Agent teams only for long-running parallel workstreams with shared
+state.
+- Worktree isolation: 2+ specialists in one group modifying files —
+pass `isolation: "worktree"` per Agent call. Skip for read-only
+specialists, a single writer, or disjoint <=3-file scopes.
+- Hooks > prompt reminders for structural checks; the shipped pack
+(gate-reminder, subagent-guard, loop-guard, phase-scope,
+gate-telemetry, doctrine-guard) lives in `hooks/`.
+- Read tool: 2,000 lines/call; results >50,000 chars truncate
+silently; a "PARTIAL view" notice means re-issue with offset/limit.
+- S10 mapping: `/loop <interval>` or `/schedule` (durable cloud);
+no interval = self-paced ScheduleWakeup (polling <=270s, otherwise
+1200s+; never poll harness-tracked work — completion notifies).
+Event waits: persistent Monitor primary, wakeup as fallback
+heartbeat. Checkpoint: `_<task>.md` in repo root (gitignore `_*`).
+Workflow tool only on explicit user opt-in; S9 caps apply.
```
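CLAUDE.md's worktree rule is a three-way exception list. The sketch below reads "disjoint <=3-file scopes" as: every writer touches at most three files and no two writers share a file. That interpretation, the `Specialist` type, and `agent_call_options` are assumptions; only the `isolation: "worktree"` option comes from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Specialist:
    name: str
    writes: FrozenSet[str] = frozenset()  # files this specialist will modify; empty = read-only


def needs_worktree(group: List[Specialist]) -> bool:
    """True when the rule calls for isolation: "worktree" in this group."""
    writers = [s for s in group if s.writes]
    if len(writers) < 2:
        return False  # read-only specialists or a single writer: skip
    small = all(len(s.writes) <= 3 for s in writers)
    seen: set = set()
    disjoint = True
    for s in writers:
        if seen & s.writes:  # this writer shares a file with an earlier one
            disjoint = False
            break
        seen |= s.writes
    return not (small and disjoint)  # disjoint <=3-file scopes: skip


def agent_call_options(group: List[Specialist]) -> dict:
    """Extra options to pass on each Agent call in the group (hypothetical helper)."""
    return {"isolation": "worktree"} if needs_worktree(group) else {}


api = Specialist("api", frozenset({"a.ts", "b.ts"}))
core = Specialist("core", frozenset({"a.ts"}))
print(agent_call_options([api, core]))
```

Here `api` and `core` both write `a.ts`, so the group gets `{'isolation': 'worktree'}`.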
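The self-paced ScheduleWakeup guidance leaves a gap between 270 s and 1200 s. A small helper that snaps a desired delay out of that gap could look like this; it is hypothetical and does not call ScheduleWakeup itself, and rounding mid-range requests up to 1200 s (rather than down to 270 s) is a design choice, not something CLAUDE.md specifies.

```python
from typing import Optional

POLL_MAX_SECONDS = 270      # polling waits stay at or below this
LONG_MIN_SECONDS = 1200     # any longer wait is at least this


def wakeup_delay(desired_seconds: int, harness_tracked: bool = False) -> Optional[int]:
    """Snap a desired self-paced delay to the allowed bands (hypothetical helper)."""
    if harness_tracked:
        return None  # never poll harness-tracked work; completion notifies
    if desired_seconds <= 0:
        raise ValueError("delay must be positive")
    if desired_seconds <= POLL_MAX_SECONDS:
        return desired_seconds
    return max(desired_seconds, LONG_MIN_SECONDS)


for wanted in (60, 600, 3600):
    print(wanted, "->", wakeup_delay(wanted))
```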