@event4u/agent-config 1.31.0 → 1.33.0

Files changed (37)
  1. package/.agent-src/commands/research/deep.md +149 -0
  2. package/.agent-src/commands/research/report.md +134 -0
  3. package/.agent-src/commands/research.md +43 -13
  4. package/.agent-src/skills/feature-planning/SKILL.md +43 -7
  5. package/.agent-src/skills/judge-test-coverage/SKILL.md +4 -0
  6. package/.agent-src/skills/pest-testing/SKILL.md +13 -6
  7. package/.agent-src/skills/quality-tools/SKILL.md +4 -0
  8. package/.agent-src/skills/refine-prompt/SKILL.md +10 -0
  9. package/.agent-src/skills/refine-ticket/SKILL.md +12 -0
  10. package/.agent-src/skills/subagent-orchestration/SKILL.md +77 -12
  11. package/.agent-src/skills/subagent-orchestration/prompts/README.md +29 -0
  12. package/.agent-src/skills/subagent-orchestration/prompts/do-and-judge-two-stage.md +121 -0
  13. package/.agent-src/skills/subagent-orchestration/prompts/do-and-judge.md +60 -0
  14. package/.agent-src/skills/subagent-orchestration/prompts/do-competitively.md +65 -0
  15. package/.agent-src/skills/subagent-orchestration/prompts/do-in-parallel.md +62 -0
  16. package/.agent-src/skills/subagent-orchestration/prompts/do-in-steps.md +62 -0
  17. package/.agent-src/skills/subagent-orchestration/prompts/do-in-worktrees.md +70 -0
  18. package/.agent-src/skills/subagent-orchestration/prompts/judge-with-debate.md +63 -0
  19. package/.agent-src/skills/subagent-orchestration/schemas/subagent-status.json +63 -0
  20. package/.agent-src/skills/test-driven-development/SKILL.md +25 -13
  21. package/.agent-src/skills/testing-anti-patterns/SKILL.md +7 -0
  22. package/.agent-src/skills/testing-anti-patterns/process-anti-patterns.md +67 -0
  23. package/.claude-plugin/marketplace.json +3 -1
  24. package/CHANGELOG.md +51 -0
  25. package/README.md +3 -3
  26. package/docs/architecture.md +2 -2
  27. package/docs/catalog.md +11 -4
  28. package/docs/contracts/command-clusters.md +1 -1
  29. package/docs/contracts/file-ownership-matrix.json +395 -0
  30. package/docs/getting-started.md +1 -1
  31. package/docs/guidelines/agent-infra/5w2h-analysis.md +260 -0
  32. package/docs/guidelines/agent-infra/critical-thinking.md +156 -0
  33. package/docs/guidelines/agent-infra/first-principles.md +192 -0
  34. package/docs/guidelines/agent-infra/six-hats.md +353 -0
  35. package/docs/guidelines/agent-infra/systems-thinking.md +220 -0
  36. package/package.json +1 -1
  37. package/scripts/check_bite_sized_granularity.py +99 -0
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: subagent-orchestration
3
- description: "Use when orchestrating implementer/judge subagents — six modes (do-and-judge, do-in-steps, do-in-parallel, do-competitively, judge-with-debate, do-in-worktrees) — models from .agent-settings.yml."
3
+ description: "Use when orchestrating implementer/judge subagents — seven modes (do-and-judge ±two-stage, do-in-steps/parallel/worktrees, do-competitively, judge-with-debate) — models from .agent-settings.yml."
4
4
  source: package
5
5
  ---
6
6
 
@@ -44,7 +44,7 @@ judge is a fresh pair of eyes. If `.agent-settings.yml` resolves to
44
44
  identical implementer and judge models, surface the mismatch before
45
45
  running — do not silently continue.
46
46
 
47
- ## The six modes
47
+ ## The seven modes
48
48
 
49
49
  Each mode has a decision row: when to use, when not, and the expected
50
50
  model pairing. Defaults come from
@@ -60,7 +60,34 @@ back to the user.
60
60
  |---|---|---|
61
61
  | Single-change task with non-trivial risk | Tiny fix, or spike/exploration | implementer = session; judge = one tier up |
62
62
 
63
- ### 2. do-in-steps
63
+ ### 2. do-and-judge-two-stage
64
+
65
+ Implementer produces a diff; **two judges run sequentially** — first a
66
+ spec-compliance reviewer (does the diff satisfy the stated spec /
67
+ acceptance criteria?), then a code-quality reviewer (is the diff
68
+ well-written for the codebase it lands in?). The orchestrator only proceeds
69
+ to stage two if stage one returns `DONE` or `DONE_WITH_CONCERNS`. A
70
+ stage-one `BLOCKED` short-circuits the loop: there is no point
71
+ quality-reviewing a diff that does not satisfy the spec.
72
+
73
+ | When to use | When not | Model pairing |
74
+ |---|---|---|
75
+ | Spec is contested or AC are detailed; diff size makes one judge prone to missing one axis (correctness vs craft) | Spec is one sentence, or the diff is one line (collapse to mode 1) | implementer = session; spec-judge = one tier up; quality-judge = same tier as spec-judge, fresh context |
76
+
77
+ **Why two stages, not one judge with both rubrics:** combining the
78
+ rubrics in one prompt reliably regresses one of them — the judge "spends
79
+ attention" on whichever rubric appears last. Splitting the prompts
80
+ forces each judge to commit fully to its rubric.
81
+
82
+ **Stage-routing rule:**
83
+ - Stage-1 returns `DONE` → run stage-2.
84
+ - Stage-1 returns `DONE_WITH_CONCERNS` → run stage-2; concerns carry
85
+ forward to the final envelope.
86
+ - Stage-1 returns `NEEDS_CONTEXT` → pause; stage-2 does not run.
87
+ - Stage-1 returns `BLOCKED` → final verdict is `BLOCKED`; stage-2
88
+ does not run (saves cost).
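The stage-routing rule above is mechanical enough to sketch in a few lines of Python. This is an illustrative sketch, not code from the package: the function name `should_run_stage_two` and the two set names are hypothetical, and only the four status strings come from the skill text.

```python
# Hypothetical helper (not part of the package): decides whether stage 2
# (the code-quality judge) runs, given the status returned by stage 1
# (the spec-compliance judge).
STAGE_TWO_ALLOWED = {"DONE", "DONE_WITH_CONCERNS"}
STAGE_TWO_SKIPPED = {"NEEDS_CONTEXT", "BLOCKED"}


def should_run_stage_two(stage_one_status: str) -> bool:
    """Return True when the orchestrator should dispatch the stage-2 judge."""
    if stage_one_status in STAGE_TWO_ALLOWED:
        return True
    if stage_one_status in STAGE_TWO_SKIPPED:
        return False
    # The taxonomy is closed, so anything else is a malformed return.
    raise ValueError(f"unknown status: {stage_one_status!r}")
```

Raising on an unknown status, rather than defaulting to either branch, keeps the routing from silently interpreting free-form returns.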
89
+
90
+ ### 3. do-in-steps
64
91
 
65
92
  Plan is split into N steps; judge runs **between** steps. A step that
66
93
  fails judgment is revised before the next step starts. Used for
@@ -70,7 +97,7 @@ multi-file changes where a mid-plan mistake would cascade.
70
97
  |---|---|---|
71
98
  | Multi-step plan with ordered dependencies | Single-step change, or when steps are independent (use `do-in-parallel`) | implementer = session; judge = one tier up |
72
99
 
73
- ### 3. do-in-parallel
100
+ ### 4. do-in-parallel
74
101
 
75
102
  Independent slices run concurrently. No judge per slice — judge runs
76
103
  once on the aggregated result. Parallelism capped by
@@ -80,7 +107,7 @@ once on the aggregated result. Parallelism capped by
80
107
  |---|---|---|
81
108
  | Independent slices (different files, non-overlapping) | Any slice touches shared state | implementer = session; judge = one tier up, run once |
82
109
 
83
- ### 4. do-competitively
110
+ ### 5. do-competitively
84
111
 
85
112
  Multiple implementers produce candidate diffs for the **same** slice.
86
113
  Judge picks the winner and rejects the losers. Expensive — use only
@@ -90,7 +117,7 @@ when the solution space is genuinely broad.
90
117
  |---|---|---|
91
118
  | Broad solution space (algorithm choice, API shape) | Well-defined problem with one good answer | implementers = same tier (≥2 instances); judge = one tier up |
92
119
 
93
- ### 5. judge-with-debate
120
+ ### 6. judge-with-debate
94
121
 
95
122
  Two judges each produce a verdict; a meta-judge reconciles
96
123
  disagreements. Used for high-stakes changes (security, data
@@ -100,7 +127,7 @@ migration, public API) where a single judge is too easy to fool.
100
127
  |---|---|---|
101
128
  | Security, data integrity, public API change | Routine internal refactor | judges = same tier (2x); meta-judge = one tier up |
102
129
 
103
- ### 6. do-in-worktrees
130
+ ### 7. do-in-worktrees
104
131
 
105
132
  Cross-wing or cross-skill chain executed across isolated git
106
133
  worktrees — each handoff in the chain runs in its own worktree, so
@@ -130,7 +157,44 @@ end produces a single integration PR.
130
157
  **Anti-pattern:** do not use for fast iteration loops where each
131
158
  step is under ~30 minutes. The branch-creation, context-switch, and
132
159
  worktree-cleanup cost dominates. Stick with mode 1 (do-and-judge)
133
- or mode 2 (do-in-steps) for those.
160
+ or mode 3 (do-in-steps) for those.
161
+
162
+ ## Status taxonomy — every subagent return uses one envelope
163
+
164
+ Every implementer or judge return must conform to
165
+ [`schemas/subagent-status.json`](schemas/subagent-status.json). Four
166
+ statuses, no free-form alternatives:
167
+
168
+ | Status | Meaning | Required keys (beyond `status`, `summary`) |
169
+ |---|---|---|
170
+ | `DONE` | Work shipped, all gates green. | `evidence[]` |
171
+ | `DONE_WITH_CONCERNS` | Work shipped but caller must act on concerns. | `evidence[]`, `concerns[]` |
172
+ | `NEEDS_CONTEXT` | Paused; caller can unblock by answering. | `blocking_question` |
173
+ | `BLOCKED` | No path forward exists. | `blocking_reason` |
174
+
175
+ **Why a fixed taxonomy:** orchestrators (`/do-and-judge`, `/do-in-steps`)
176
+ route on status. Free-form "kind of done" returns force the orchestrator
177
+ to interpret prose, which silently regresses the two-revision ceiling and
178
+ the judge-rejected-do-not-apply rule. The schema makes routing mechanical.
179
+
180
+ **Tests:** `tests/test_subagent_status_schema.py` exercises all four
181
+ statuses plus rejection cases (missing required keys, unknown status,
182
+ extra fields, conditional-key violations).
183
+
184
+ **Distinguishing `NEEDS_CONTEXT` from `BLOCKED`:** `NEEDS_CONTEXT` means
185
+ *"you, the caller, can fix this by telling me X"*. `BLOCKED` means
186
+ *"no input from you unblocks this — escalate or rescope"*. If a subagent
187
+ is unsure, it picks `BLOCKED` and the caller can downgrade.
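To make the status table concrete, here is a minimal hand-rolled check of the required keys per status. It is a sketch under stated assumptions: the real contract is `schemas/subagent-status.json`, whose contents are not shown in this diff, and the name `envelope_problems` is hypothetical. The sketch covers only the required-key rules from the table and does not reject extra fields.

```python
# Required keys per status, transcribed from the table above.
# Assumption: this mirrors, but does not replace, schemas/subagent-status.json.
COMMON_KEYS = {"status", "summary"}
REQUIRED_BY_STATUS = {
    "DONE": {"evidence"},
    "DONE_WITH_CONCERNS": {"evidence", "concerns"},
    "NEEDS_CONTEXT": {"blocking_question"},
    "BLOCKED": {"blocking_reason"},
}


def envelope_problems(envelope: dict) -> list[str]:
    """Return a list of problems; an empty list means the envelope passes."""
    status = envelope.get("status")
    if status not in REQUIRED_BY_STATUS:
        return [f"unknown status: {status!r}"]
    required = COMMON_KEYS | REQUIRED_BY_STATUS[status]
    missing = sorted(required - set(envelope))
    return [f"missing key: {key}" for key in missing]
```

For example, a `BLOCKED` envelope without `blocking_reason` yields one problem, while a complete `DONE` envelope yields none.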
188
+
189
+ ## Dispatch prompts — externalized
190
+
191
+ Each mode's literal dispatch template lives under
192
+ [`prompts/{mode}.md`](prompts/README.md). The orchestrator loads the
193
+ matching prompt at dispatch time and substitutes `{{placeholders}}`.
194
+ Edits to a prompt do not bloat this skill against the 400-line sunset
195
+ trigger; `tests/test_subagent_prompt_loading.py` confirms each of the
196
+ seven modes resolves to a loadable prompt that cites all four taxonomy
197
+ statuses.
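The placeholder substitution described above could look roughly like the following. This is a sketch, not the orchestrator's actual loader: `render_prompt` and `PLACEHOLDER` are hypothetical names, and the choice to fail on unfilled placeholders is an assumption rather than documented behavior.

```python
import re

# Matches {{name}} placeholders as they appear in the prompt templates.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each {{name}} with values[name]; fail loudly on gaps."""
    missing = {name for name in PLACEHOLDER.findall(template) if name not in values}
    if missing:
        raise KeyError(f"unfilled placeholders: {sorted(missing)}")
    # A function replacement avoids backslash-escape handling in the values.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

Failing on a missing value means a half-rendered prompt containing a literal `{{diff}}` never reaches a subagent.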
134
198
 
135
199
  ## Procedure
136
200
 
@@ -158,9 +222,10 @@ same context, **stop** and report. Do not improvise.
158
222
 
159
223
  ### 3. Pick the mode
160
224
 
161
- Match task shape to one of the five modes. When two modes could fit,
162
- prefer the cheaper one (`do-and-judge` < `do-in-steps` < `do-in-parallel`
163
- < `do-competitively` < `judge-with-debate`).
225
+ Match task shape to one of the seven modes. When two modes could fit,
226
+ prefer the cheaper one (`do-and-judge` < `do-and-judge-two-stage` <
227
+ `do-in-steps` < `do-in-parallel` < `do-competitively` <
228
+ `judge-with-debate` < `do-in-worktrees`).
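The "prefer the cheaper one" rule amounts to picking the earliest entry in a fixed ordering. A minimal sketch, assuming the orchestrator has already narrowed the choice to a list of candidate modes (the names `MODES_BY_COST` and `cheapest_mode` are hypothetical):

```python
# Cheapest-first ordering, copied from the step above.
MODES_BY_COST = [
    "do-and-judge",
    "do-and-judge-two-stage",
    "do-in-steps",
    "do-in-parallel",
    "do-competitively",
    "judge-with-debate",
    "do-in-worktrees",
]


def cheapest_mode(candidates: list[str]) -> str:
    """Return the candidate that appears earliest in MODES_BY_COST."""
    if not candidates:
        raise ValueError("no candidate modes given")
    unknown = [mode for mode in candidates if mode not in MODES_BY_COST]
    if unknown:
        raise ValueError(f"unknown modes: {unknown}")
    return min(candidates, key=MODES_BY_COST.index)
```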
164
229
 
165
230
  ### 4. Dispatch
166
231
 
@@ -195,7 +260,7 @@ the judge verdict.
195
260
 
196
261
  ## Output format
197
262
 
198
- 1. **Mode chosen** — one of the five, with the one-line reason
263
+ 1. **Mode chosen** — one of the seven, with the one-line reason
199
264
  2. **Model pairing** — implementer model / judge model (resolved)
200
265
  3. **Verdict** — applied / revised / handed back
201
266
  4. **Evidence** — diff summary, test output, or judge transcript
@@ -0,0 +1,29 @@
1
+ # Subagent dispatch prompts
2
+
3
+ One file per mode in [`SKILL.md`](../SKILL.md) § *The seven modes*. Each
4
+ prompt is the **literal template** the orchestrator hands to the
5
+ subagent on dispatch — externalized so prompt edits do not bloat the
6
+ skill above the 400-line sunset trigger.
7
+
8
+ | Mode | File |
9
+ |---|---|
10
+ | do-and-judge | [`do-and-judge.md`](do-and-judge.md) |
11
+ | do-and-judge-two-stage | [`do-and-judge-two-stage.md`](do-and-judge-two-stage.md) |
12
+ | do-in-steps | [`do-in-steps.md`](do-in-steps.md) |
13
+ | do-in-parallel | [`do-in-parallel.md`](do-in-parallel.md) |
14
+ | do-competitively | [`do-competitively.md`](do-competitively.md) |
15
+ | judge-with-debate | [`judge-with-debate.md`](judge-with-debate.md) |
16
+ | do-in-worktrees | [`do-in-worktrees.md`](do-in-worktrees.md) |
17
+
18
+ ## Contract
19
+
20
+ Every prompt cites the status taxonomy in
21
+ [`../schemas/subagent-status.json`](../schemas/subagent-status.json) and
22
+ ends with the **return-envelope** instruction so the subagent's reply
23
+ validates against `tests/test_subagent_status_schema.py`.
24
+
25
+ ## Loading
26
+
27
+ `tests/test_subagent_prompt_loading.py` asserts that every mode named
28
+ in `SKILL.md` § *The seven modes* has a loadable prompt file under this
29
+ directory and that each prompt mentions all four status enum values.
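The loading check described above can be approximated as follows. This is an illustrative sketch, not the contents of `tests/test_subagent_prompt_loading.py` (which this diff does not show); `prompt_problems` is a hypothetical name, and the `{mode}.md` file layout is taken from the table in this README.

```python
import re
from pathlib import Path

STATUSES = ("DONE", "DONE_WITH_CONCERNS", "NEEDS_CONTEXT", "BLOCKED")


def prompt_problems(prompt_dir: Path, modes: list[str]) -> list[str]:
    """List missing prompt files and prompts that omit a status value."""
    problems = []
    for mode in modes:
        path = prompt_dir / f"{mode}.md"
        if not path.is_file():
            problems.append(f"{mode}: prompt file missing")
            continue
        text = path.read_text(encoding="utf-8")
        for status in STATUSES:
            # Word boundaries stop DONE from matching inside DONE_WITH_CONCERNS.
            if not re.search(rf"\b{re.escape(status)}\b", text):
                problems.append(f"{mode}: does not mention {status}")
    return problems
```

The word-boundary match matters because a plain substring test would count `DONE_WITH_CONCERNS` as a mention of `DONE`.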
@@ -0,0 +1,121 @@
1
+ # Prompt — do-and-judge-two-stage
2
+
3
+ Mode reference: [`../SKILL.md`](../SKILL.md) § *2. do-and-judge-two-stage*.
4
+
5
+ ## Implementer prompt
6
+
7
+ ```
8
+ You are the implementer in a do-and-judge-two-stage loop. Two judges
9
+ will review your diff in sequence: first SPEC COMPLIANCE, then CODE
10
+ QUALITY. Spec failure short-circuits the loop: quality is not reviewed if
11
+ spec is wrong.
12
+
13
+ TASK: {{task_description}}
14
+ ACCEPTANCE CRITERIA: {{acceptance_criteria}}
15
+ CONTEXT FILES: {{file_paths}}
16
+
17
+ CONSTRAINTS:
18
+ - Hit every AC literally; do not "interpret" them away.
19
+ - Do not silently expand scope; AC are the contract.
20
+ - Write tests that map 1:1 to the AC so the spec-judge can verify.
21
+
22
+ ON COMPLETION, return ONE envelope per schemas/subagent-status.json:
23
+ - DONE — every AC satisfied, tests pass; evidence[]
24
+ maps each AC to the test that exercises it.
25
+ - DONE_WITH_CONCERNS — every AC satisfied but a trade-off needs
26
+ flagging in concerns[].
27
+ - NEEDS_CONTEXT — an AC is ambiguous; blocking_question must
28
+ name the AC and the interpretation gap.
29
+ - BLOCKED — an AC cannot be satisfied as stated;
30
+ blocking_reason explains why.
31
+ ```
32
+
33
+ ## Stage-1 prompt — SPEC COMPLIANCE judge
34
+
35
+ ```
36
+ You are the SPEC COMPLIANCE judge. Stage 1 of two. Your ONLY job is:
37
+ "does the diff satisfy every acceptance criterion as stated?" Do NOT
38
+ review style, naming, or craft — that is stage 2's job.
39
+
40
+ ACCEPTANCE CRITERIA: {{acceptance_criteria}}
41
+ DIFF: {{diff}}
42
+ TEST OUTPUT: {{test_output}}
43
+ IMPLEMENTER ENVELOPE: {{envelope}}
44
+
45
+ PER-AC SCAN — for each AC, return:
46
+ - SATISFIED — cite the diff hunk + test that proves it.
47
+ - PARTIAL — cite what is missing and why it falls short.
48
+ - MISSING — AC has no corresponding implementation.
49
+
50
+ VERDICT (one envelope, schemas/subagent-status.json):
51
+ - DONE — every AC SATISFIED; evidence[] is the per-AC
52
+ scan above.
53
+ - DONE_WITH_CONCERNS — every AC SATISFIED but a stretch
54
+ interpretation needs flagging (rare at this
55
+ stage).
56
+ - NEEDS_CONTEXT — an AC is ambiguous AND the implementer's
57
+ interpretation is plausible; orchestrator
58
+ must clarify.
59
+ - BLOCKED — one or more AC PARTIAL or MISSING. Stage 2
60
+ will NOT run; implementer revises first.
61
+
62
+ NEVER comment on naming, structure, or style. Stay in your lane —
63
+ that is the value of the two-stage split.
64
+ ```
65
+
66
+ ## Stage-2 prompt — CODE QUALITY judge (only if stage 1 passes)
67
+
68
+ ```
69
+ You are the CODE QUALITY judge. Stage 2 of two. Stage 1 already
70
+ confirmed the diff satisfies the spec. Your ONLY job is craft: is
71
+ the diff well-written for THIS codebase?
72
+
73
+ DIFF: {{diff}}
74
+ NEIGHBORING FILES: {{neighboring_files}}
75
+ PROJECT CONVENTIONS: {{conventions_summary}}
76
+ STAGE-1 CONCERNS (carry-forward): {{stage_1_concerns}}
77
+
78
+ QUALITY DIMENSIONS — cite each in evidence[]:
79
+ 1. Naming consistency with neighbors.
80
+ 2. Structure / responsibility boundary.
81
+ 3. Error handling matches project style.
82
+ 4. Test shape matches project conventions (Pest / pytest / etc.).
83
+ 5. Diff size — could the same intent ship smaller?
84
+
85
+ VERDICT (one envelope, schemas/subagent-status.json):
86
+ - DONE — quality is on par with the codebase;
87
+ evidence[] cites the five dimensions.
88
+ - DONE_WITH_CONCERNS — apply the diff, but concerns[] lists the
89
+ craft issues caller must address (carry
90
+ forward stage-1 concerns too).
91
+ - NEEDS_CONTEXT — convention is unclear; orchestrator must
92
+ name the canonical pattern.
93
+ - BLOCKED — diff is correct per stage 1 but quality is
94
+ unacceptable; implementer must revise.
95
+
96
+ NEVER re-litigate the spec. Stage 1 already settled correctness —
97
+ your job is craft.
98
+ ```
99
+
100
+ ## Stage routing — orchestrator logic
101
+
102
+ Stage-1 status determines whether stage 2 runs:
103
+
104
+ | Stage-1 status | Run stage 2? | Final envelope |
105
+ |---|---|---|
106
+ | `DONE` | Yes | Stage-2 envelope |
107
+ | `DONE_WITH_CONCERNS` | Yes | Stage-2 envelope; merge concerns[] from both |
108
+ | `NEEDS_CONTEXT` | No | Stage-1 envelope; pause |
109
+ | `BLOCKED` | No | Stage-1 envelope; implementer revises |
110
+
111
+ The orchestrator never collapses both stages into one prompt — that
112
+ defeats the purpose of the split (see SKILL.md § "Why two stages, not
113
+ one judge with both rubrics").
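The "Final envelope" column of the routing table can be sketched as a small merge function. Everything here beyond the table itself is an assumption: `final_envelope` is a hypothetical name, and promoting a stage-2 `DONE` to `DONE_WITH_CONCERNS` when stage 1 carried concerns is one plausible reading of "merge concerns[] from both", not confirmed behavior.

```python
from typing import Optional


def final_envelope(stage_one: dict, stage_two: Optional[dict]) -> dict:
    """Combine the two judge envelopes per the routing table above."""
    if stage_one["status"] in ("NEEDS_CONTEXT", "BLOCKED"):
        return stage_one  # stage 2 never ran
    if stage_two is None:
        raise ValueError("stage 1 passed, so a stage-2 envelope is required")
    merged = dict(stage_two)  # copy so the caller's envelope is untouched
    carried = list(stage_one.get("concerns", []))
    if carried and stage_two["status"] in ("DONE", "DONE_WITH_CONCERNS"):
        own = [c for c in stage_two.get("concerns", []) if c not in carried]
        merged["concerns"] = carried + own
        # Assumption: an envelope that carries concerns reports them in its status.
        merged["status"] = "DONE_WITH_CONCERNS"
    return merged
```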
114
+
115
+ ## Cost-discipline rule
116
+
117
+ Two-stage = up to **3 subagent calls** per cycle (implementer + two
118
+ judges) versus 2 for plain `do-and-judge`. Use only when AC are
119
+ detailed enough that a single judge would predictably miss one of
120
+ correctness or craft. For one-line fixes or single-AC tasks, mode 1
121
+ (`do-and-judge`) is the right answer.
@@ -0,0 +1,60 @@
1
+ # Prompt — do-and-judge
2
+
3
+ Mode reference: [`../SKILL.md`](../SKILL.md) § *1. do-and-judge*.
4
+
5
+ ## Implementer prompt
6
+
7
+ ```
8
+ You are the implementer in a do-and-judge loop. Hard ceiling: two
9
+ revision cycles before hand-back to the user.
10
+
11
+ TASK: {{task_description}}
12
+
13
+ CONTEXT FILES: {{file_paths}}
14
+
15
+ CONSTRAINTS:
16
+ - Do not modify files outside the cited paths without surfacing why.
17
+ - Do not skip tests; if the task does not include a test, write one.
18
+ - Prefer the smallest diff that satisfies the task.
19
+
20
+ ON COMPLETION, return ONE envelope conforming to
21
+ schemas/subagent-status.json. Pick exactly one status:
22
+ - DONE — work shipped, all gates green; include evidence[].
23
+ - DONE_WITH_CONCERNS — shipped but caller must read concerns[];
24
+ include evidence[] AND concerns[].
25
+ - NEEDS_CONTEXT — paused; the orchestrator can unblock by
26
+ answering blocking_question.
27
+ - BLOCKED — no path forward; include blocking_reason.
28
+
29
+ NEVER invent a fifth status. Free-form "kind of done" prose is rejected
30
+ by the schema validator.
31
+ ```
32
+
33
+ ## Judge prompt
34
+
35
+ ```
36
+ You are the judge reviewing the implementer's diff. The implementer
37
+ returned the envelope below. Validate against the task and constraints.
38
+
39
+ TASK: {{task_description}}
40
+ DIFF: {{diff}}
41
+ IMPLEMENTER ENVELOPE: {{envelope}}
42
+
43
+ VERDICT (return ONE envelope per schemas/subagent-status.json):
44
+ - DONE — apply this diff; cite evidence in evidence[].
45
+ - DONE_WITH_CONCERNS — apply but caller must address concerns[].
46
+ - NEEDS_CONTEXT — orchestrator must clarify blocking_question
47
+ before re-dispatching the implementer.
48
+ - BLOCKED — diff is wrong; explain in blocking_reason.
49
+ Do NOT silently rewrite — that is the
50
+ implementer's job on the revision pass.
51
+
52
+ NEVER return DONE for a diff while concerns of yours remain
53
+ unaddressed. Use DONE_WITH_CONCERNS for that case.
54
+ ```
55
+
56
+ ## Revision-loop rule
57
+
58
+ After two revision cycles, the orchestrator stops and hands back to the
59
+ user with the most recent envelope. The judge does not become the
60
+ implementer.
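The revision-loop rule can be sketched as a bounded loop. This sketch reads "two revision cycles" as one initial attempt plus at most two revisions, which is an interpretation. The `implement` and `judge` callables and the outcome strings (`applied`, `paused`, `handed_back`) are hypothetical stand-ins for real subagent dispatch.

```python
from typing import Callable, Optional

MAX_REVISIONS = 2  # ceiling from the rule above


def do_and_judge(
    implement: Callable[[Optional[dict]], str],
    judge: Callable[[str], dict],
) -> tuple[dict, str]:
    """Run implement -> judge, revising at most MAX_REVISIONS times."""
    feedback: Optional[dict] = None
    envelope: dict = {}
    for _ in range(1 + MAX_REVISIONS):  # first pass plus two revisions
        diff = implement(feedback)
        envelope = judge(diff)
        status = envelope["status"]
        if status in ("DONE", "DONE_WITH_CONCERNS"):
            return envelope, "applied"
        if status == "NEEDS_CONTEXT":
            return envelope, "paused"
        if status != "BLOCKED":
            raise ValueError(f"unknown status: {status!r}")
        feedback = envelope  # BLOCKED: the implementer revises, not the judge
    return envelope, "handed_back"
```

Note that the judge's envelope is only ever passed back to the implementer as feedback; the judge never produces a diff itself.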
@@ -0,0 +1,65 @@
1
+ # Prompt — do-competitively
2
+
3
+ Mode reference: [`../SKILL.md`](../SKILL.md) § *5. do-competitively*.
4
+
5
+ ## Implementer prompt (per candidate)
6
+
7
+ ```
8
+ You are CANDIDATE {{candidate_id}} of {{n_candidates}} competing on the
9
+ SAME slice. Other implementers are solving the identical problem in
10
+ parallel; the judge will pick exactly one winner.
11
+
12
+ TASK: {{task_description}}
13
+ CONTEXT FILES: {{file_paths}}
14
+
15
+ CONSTRAINTS:
16
+ - Do NOT optimize for "what the judge wants to see" — solve the task.
17
+ - Do NOT copy from other candidates; you do not have access to them.
18
+ - Make a real choice: name the algorithm, the API shape, the trade-off.
19
+ Generic safe answers lose to specific decisive ones.
20
+
21
+ ON COMPLETION, return ONE envelope per schemas/subagent-status.json:
22
+ - DONE — your candidate is complete and tests pass;
23
+ evidence[] cites the test output.
24
+ - DONE_WITH_CONCERNS — complete but flag the trade-off you made so
25
+ the judge can score it.
26
+ - NEEDS_CONTEXT — task ambiguity blocks all candidates; if so,
27
+ all candidates should converge on the same
28
+ blocking_question.
29
+ - BLOCKED — task is malformed; explain in blocking_reason.
30
+ ```
31
+
32
+ ## Judge prompt (winner selection)
33
+
34
+ ```
35
+ You are the judge picking ONE winner from {{n_candidates}} competing
36
+ diffs for the SAME slice. Losers are rejected, not merged.
37
+
38
+ CANDIDATE ENVELOPES: {{envelopes_array}}
39
+ CANDIDATE DIFFS: {{diffs_array}}
40
+ TASK: {{task_description}}
41
+
42
+ SCORING DIMENSIONS (cite each in evidence[]):
43
+ 1. Correctness — does it pass tests AND solve the task?
44
+ 2. Trade-off clarity — is the choice named and defended?
45
+ 3. Maintenance cost — what does the codebase look like in 6 months?
46
+ 4. Diff size — smaller wins ties.
47
+
48
+ VERDICT (one envelope, schemas/subagent-status.json):
49
+ - DONE — winner picked; evidence[] cites the four
50
+ scoring dimensions and names the winner.
51
+ - DONE_WITH_CONCERNS — winner picked but the chosen trade-off has
52
+ carry-over costs (concerns[]).
53
+ - NEEDS_CONTEXT — all candidates need the same clarification.
54
+ - BLOCKED — no candidate is acceptable; rerun with new
55
+ implementers or change the task.
56
+
57
+ NEVER pick a winner because it was the cheapest model. NEVER merge
58
+ two candidates — that is do-in-parallel, not do-competitively.
59
+ ```
60
+
61
+ ## Cost-discipline rule
62
+
63
+ `do-competitively` is N+1 subagent calls per slice. The orchestrator
64
+ confirms budget with the user before dispatch. The losing diffs are
65
+ discarded — that cost is the price of the trade-off survey.
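The call counts quoted in the cost-discipline rules (2 for `do-and-judge`, up to 3 for the two-stage variant, N+1 here) can be put side by side in a small helper, which may be useful when confirming budget with the user. The function name `subagent_calls` is hypothetical; the figures are the ones stated in the prompt files.

```python
def subagent_calls(mode: str, n_candidates: int = 2) -> int:
    """Upper bound on subagent calls for one cycle (or one slice)."""
    if mode == "do-and-judge":
        return 2  # implementer + judge
    if mode == "do-and-judge-two-stage":
        return 3  # implementer + spec judge + quality judge
    if mode == "do-competitively":
        if n_candidates < 2:
            raise ValueError("do-competitively needs at least two candidates")
        return n_candidates + 1  # N candidates + one judge
    raise ValueError(f"no call count stated for mode: {mode!r}")
```

The other modes are left out on purpose, because their prompt files do not state a per-cycle figure.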
@@ -0,0 +1,62 @@
1
+ # Prompt — do-in-parallel
2
+
3
+ Mode reference: [`../SKILL.md`](../SKILL.md) § *4. do-in-parallel*.
4
+
5
+ ## Implementer prompt (per slice)
6
+
7
+ ```
8
+ You are the implementer for SLICE {{slice_id}} in a parallel-dispatch
9
+ run. {{n_slices}} slices run concurrently. Slices are guaranteed
10
+ independent — different files, no shared state.
11
+
12
+ SLICE: {{slice_description}}
13
+ CONTEXT FILES (this slice only): {{file_paths}}
14
+ SHARED-STATE BAN: {{shared_paths_to_avoid}}
15
+
16
+ CONSTRAINTS:
17
+ - Do NOT touch any file outside the cited paths. The orchestrator
18
+ verified independence — violating it causes a merge race.
19
+ - Do NOT communicate with other slices. They are doing their own work.
20
+ - Write tests scoped to your slice; do not assert on cross-slice
21
+ behavior.
22
+
23
+ ON COMPLETION, return ONE envelope per schemas/subagent-status.json:
24
+ - DONE — slice shipped clean; evidence[] required.
25
+ - DONE_WITH_CONCERNS — shipped but mark concerns[] for the
26
+ aggregating judge to surface.
27
+ - NEEDS_CONTEXT — paused; orchestrator must answer
28
+ blocking_question. Other slices keep running.
29
+ - BLOCKED — slice cannot complete in isolation; explain
30
+ in blocking_reason. Other slices keep running;
31
+ aggregating judge handles partial outcome.
32
+ ```
33
+
34
+ ## Judge prompt (run once on aggregate)
35
+
36
+ ```
37
+ You are the judge running ONCE over the merged output of N parallel
38
+ slices. Per-slice judges were skipped to keep cost linear.
39
+
40
+ SLICE ENVELOPES: {{envelopes_array}}
41
+ AGGREGATED DIFF: {{merged_diff}}
42
+ TEST OUTPUT (full suite): {{test_output}}
43
+
44
+ VERDICT (one envelope, schemas/subagent-status.json):
45
+ - DONE — every slice DONE or DONE_WITH_CONCERNS that
46
+ you accept; evidence[] cites the merge being
47
+ test-green.
48
+ - DONE_WITH_CONCERNS — accept the aggregate, but consolidated
49
+ concerns[] from all slices need caller action.
50
+ - NEEDS_CONTEXT — one or more slices need clarification before
51
+ the aggregate can land; cite which.
52
+ - BLOCKED — aggregate is broken; cite the slice(s) that
53
+ must be re-run.
54
+
55
+ INDEPENDENCE-VIOLATION CHECK: scan for files touched by more than one
56
+ slice. If found, return BLOCKED — the dispatch was unsafe.
57
+ ```
58
+
59
+ ## Failure-isolation rule
60
+
61
+ A slice returning BLOCKED does not abort the other slices. The
62
+ aggregating judge decides whether the partial result lands.
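The judge's independence-violation check reduces to finding files claimed by more than one slice. A minimal sketch, assuming the orchestrator can supply the list of touched files per slice (the function name and input shape are hypothetical):

```python
from collections import defaultdict


def overlapping_files(files_by_slice: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each file touched by more than one slice to the slices touching it."""
    owners: dict[str, set[str]] = defaultdict(set)
    for slice_id, paths in files_by_slice.items():
        for path in paths:
            owners[path].add(slice_id)
    return {path: sorted(ids) for path, ids in owners.items() if len(ids) > 1}
```

A non-empty result corresponds to the `BLOCKED` case in the judge prompt: the dispatch was unsafe, whatever the individual slice envelopes say.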
@@ -0,0 +1,62 @@
1
+ # Prompt — do-in-steps
2
+
3
+ Mode reference: [`../SKILL.md`](../SKILL.md) § *3. do-in-steps*.
4
+
5
+ ## Implementer prompt (per step)
6
+
7
+ ```
8
+ You are the implementer for STEP {{step_number}} of {{total_steps}} in a
9
+ sequential plan. Earlier steps that PASSED judgment are committed; their
10
+ diffs are read-only context.
11
+
12
+ PLAN: {{plan_summary}}
13
+ THIS STEP: {{step_description}}
14
+ PRIOR STEP DIFFS (read-only): {{prior_diffs}}
15
+ CONTEXT FILES: {{file_paths}}
16
+
17
+ CONSTRAINTS:
18
+ - Do NOT modify code from prior steps; their tests must still pass.
19
+ - Do NOT preempt later steps; one step at a time.
20
+ - Write the test for THIS step before the production code.
21
+
22
+ ON COMPLETION, return ONE envelope per schemas/subagent-status.json:
23
+ - DONE — step complete, gate green; cite evidence[].
24
+ - DONE_WITH_CONCERNS — step complete but flag carry-over concerns
25
+ for later steps.
26
+ - NEEDS_CONTEXT — paused; blocking_question must be answered
27
+ before this step can complete.
28
+ - BLOCKED — step cannot complete on the current plan;
29
+ blocking_reason explains why. The orchestrator
30
+ may revise the plan and re-dispatch.
31
+ ```
32
+
33
+ ## Judge prompt (between steps)
34
+
35
+ ```
36
+ You are the judge reviewing STEP {{step_number}} before STEP
37
+ {{step_number_plus_one}} starts. A failing step here cascades into the
38
+ next, so verdicts are stricter than a one-shot do-and-judge.
39
+
40
+ STEP DIFF: {{diff}}
41
+ STEP TESTS: {{test_output}}
42
+ PRIOR STEPS: {{prior_step_summaries}}
43
+ NEXT STEP DESCRIPTION: {{next_step_description}}
44
+
45
+ VERDICT — return ONE envelope per schemas/subagent-status.json:
46
+ - DONE — proceed to next step; evidence[] required.
47
+ - DONE_WITH_CONCERNS — proceed, but next step's prompt MUST surface
48
+ the concerns[] so the implementer compensates.
49
+ - NEEDS_CONTEXT — pause; orchestrator answers blocking_question
50
+ before next step.
51
+ - BLOCKED — do not start next step; this step is wrong.
52
+
53
+ DOWNSTREAM IMPACT CHECK: name one way this diff could break the next
54
+ step. If you cannot, return DONE. If you can but the implementer
55
+ already mitigated, DONE. Otherwise DONE_WITH_CONCERNS.
56
+ ```
57
+
58
+ ## Cascade rule
59
+
60
+ A step that returns BLOCKED stops the chain. The orchestrator does not
61
+ "jump ahead" or re-order — it surfaces the BLOCKED envelope to the user
62
+ and waits.
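The cascade rule, together with the carry-forward of concerns from the judge prompt, can be sketched as a sequential loop that halts on the first non-passing verdict. Each callable in `steps` is a hypothetical stand-in for "dispatch the implementer, then the judge" and returns the judge's envelope; the function name and return shape are assumptions.

```python
from typing import Callable, Optional


def run_steps(
    steps: list[Callable[[list[str]], dict]],
) -> tuple[int, Optional[dict]]:
    """Run steps in order; return (steps_completed, stopping_envelope)."""
    carried_concerns: list[str] = []
    for index, step in enumerate(steps):
        # Each step sees the concerns raised by earlier steps.
        envelope = step(list(carried_concerns))
        status = envelope["status"]
        if status in ("BLOCKED", "NEEDS_CONTEXT"):
            return index, envelope  # chain stops; no jumping ahead
        if status == "DONE_WITH_CONCERNS":
            carried_concerns.extend(envelope.get("concerns", []))
    return len(steps), None
```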
@@ -0,0 +1,70 @@
1
+ # Prompt — do-in-worktrees
2
+
3
+ Mode reference: [`../SKILL.md`](../SKILL.md) § *7. do-in-worktrees*.
4
+ Worktree creation/destruction lives in [`../../using-git-worktrees/SKILL.md`](../../using-git-worktrees/SKILL.md).
5
+
6
+ ## Implementer prompt (per worktree step)
7
+
8
+ ```
9
+ You are the implementer for STEP {{step_id}} in a cross-wing chain.
10
+ You are running INSIDE a fresh git worktree at {{worktree_path}} on
11
+ branch {{branch_name}}. Prior step's open files / branch state cannot
12
+ leak into this worktree — that is the whole point.
13
+
14
+ STEP TYPED INPUT (from prior step's ## Output): {{typed_input}}
15
+ STEP DESCRIPTION: {{step_description}}
16
+ EXPECTED ## Output (next step's ## Input): {{expected_output_shape}}
17
+
18
+ CONSTRAINTS:
19
+ - Stay inside the worktree path. Do NOT cd to the parent repo.
20
+ - Do NOT touch branches other than {{branch_name}}.
21
+ - Produce the expected ## Output shape literally — the next worktree's
22
+ implementer consumes it as ## Input.
23
+ - Run the chain-end test for THIS step before signaling completion.
24
+
25
+ ON COMPLETION, return ONE envelope per schemas/subagent-status.json:
26
+ - DONE — step output produced and validated; evidence[]
27
+ cites the typed-output file path.
28
+ - DONE_WITH_CONCERNS — output produced but flag carry-over for next
29
+ worktree; concerns[] surfaces in next step's
30
+ dispatch.
31
+ - NEEDS_CONTEXT — paused; chain pauses until orchestrator
32
+ answers blocking_question. Other worktrees
33
+ are NOT running concurrently in this mode.
34
+ - BLOCKED — step cannot complete; chain halts. The
35
+ orchestrator decides whether to drop the
36
+ worktree or rescope.
37
+ ```
38
+
39
+ ## Chain-end judge prompt (run once after final worktree)
40
+
41
+ ```
42
+ You are the chain-end judge. The chain produced N typed outputs, one
43
+ per worktree. Validate the final integration PR against the chain's
44
+ goal.
45
+
46
+ CHAIN STEPS: {{step_summaries_array}}
47
+ TYPED OUTPUTS: {{outputs_array}}
48
+ INTEGRATION PR DIFF: {{integration_diff}}
49
+
50
+ VERDICT (one envelope, schemas/subagent-status.json):
51
+ - DONE — chain landed cleanly; evidence[] cites each
52
+ step's typed output and the integration test
53
+ run.
54
+ - DONE_WITH_CONCERNS — chain landed but consolidated concerns[]
55
+ across steps need follow-up.
56
+ - NEEDS_CONTEXT — integration is unclear; cite which step(s)
57
+ need clarification.
58
+ - BLOCKED — integration is broken; cite the worktree(s)
59
+ that must be redone. Do NOT silently rewrite.
60
+
61
+ WORKTREE-LEAK CHECK: scan the integration diff for branch names or
62
+ files belonging to a different worktree's step. If found, BLOCKED —
63
+ isolation was violated.
64
+ ```
65
+
66
+ ## Sequential-not-parallel rule
67
+
68
+ `do-in-worktrees` runs steps sequentially across isolated worktrees.
69
+ Parallel concurrent worktrees are `do-in-parallel` with explicit
70
+ isolation, not this mode.