@event4u/agent-config 1.29.0 → 1.32.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.agent-src/commands/agents/audit.md +101 -197
- package/.agent-src/commands/{copilot-agents → agents}/init.md +18 -10
- package/.agent-src/commands/agents/optimize.md +181 -0
- package/.agent-src/commands/agents.md +19 -12
- package/.agent-src/commands/optimize/agents-dir.md +111 -0
- package/.agent-src/commands/optimize.md +10 -8
- package/.agent-src/contexts/communication/rules-auto/guidelines-mechanics.md +6 -0
- package/.agent-src/contexts/communication/rules-auto/slash-command-routing-policy-mechanics.md +2 -3
- package/.agent-src/contexts/contracts/agents-md-anatomy.md +132 -0
- package/.agent-src/skills/agents-md-thin-root/SKILL.md +8 -1
- package/.agent-src/skills/command-writing/SKILL.md +49 -0
- package/.agent-src/skills/copilot-agents-optimization/SKILL.md +3 -3
- package/.agent-src/skills/error-handling-patterns/SKILL.md +2 -2
- package/.agent-src/skills/feature-planning/SKILL.md +43 -7
- package/.agent-src/skills/judge-test-coverage/SKILL.md +4 -0
- package/.agent-src/skills/pest-testing/SKILL.md +13 -6
- package/.agent-src/skills/quality-tools/SKILL.md +4 -0
- package/.agent-src/skills/refine-prompt/SKILL.md +10 -0
- package/.agent-src/skills/refine-ticket/SKILL.md +12 -0
- package/.agent-src/skills/{repomix → repomix-packer}/SKILL.md +8 -8
- package/.agent-src/skills/roadmap-writing/SKILL.md +9 -0
- package/.agent-src/skills/rule-writing/SKILL.md +21 -0
- package/.agent-src/skills/skill-writing/SKILL.md +19 -0
- package/.agent-src/skills/subagent-orchestration/SKILL.md +77 -12
- package/.agent-src/skills/subagent-orchestration/prompts/README.md +29 -0
- package/.agent-src/skills/subagent-orchestration/prompts/do-and-judge-two-stage.md +121 -0
- package/.agent-src/skills/subagent-orchestration/prompts/do-and-judge.md +60 -0
- package/.agent-src/skills/subagent-orchestration/prompts/do-competitively.md +65 -0
- package/.agent-src/skills/subagent-orchestration/prompts/do-in-parallel.md +62 -0
- package/.agent-src/skills/subagent-orchestration/prompts/do-in-steps.md +62 -0
- package/.agent-src/skills/subagent-orchestration/prompts/do-in-worktrees.md +70 -0
- package/.agent-src/skills/subagent-orchestration/prompts/judge-with-debate.md +63 -0
- package/.agent-src/skills/subagent-orchestration/schemas/subagent-status.json +63 -0
- package/.agent-src/skills/test-driven-development/SKILL.md +25 -13
- package/.agent-src/skills/testing-anti-patterns/SKILL.md +14 -0
- package/.agent-src/skills/testing-anti-patterns/process-anti-patterns.md +67 -0
- package/.agent-src/templates/AGENTS.md +9 -10
- package/.claude-plugin/marketplace.json +5 -8
- package/AGENTS.md +1 -2
- package/CHANGELOG.md +110 -0
- package/CONTRIBUTING.md +90 -0
- package/README.md +3 -3
- package/docs/architecture.md +2 -2
- package/docs/catalog.md +12 -14
- package/docs/contracts/command-clusters.md +20 -3
- package/docs/contracts/file-ownership-matrix.json +546 -56
- package/docs/getting-started.md +1 -1
- package/docs/guidelines/code-clarity.md +95 -0
- package/docs/guidelines/php/general.md +8 -0
- package/docs/guidelines/php/php-coding-patterns.md +1 -0
- package/docs/skills-catalog.md +27 -3
- package/llms.txt +26 -2
- package/package.json +1 -1
- package/scripts/chat_history.py +166 -36
- package/scripts/check_bite_sized_granularity.py +99 -0
- package/scripts/check_command_count_messaging.py +12 -3
- package/scripts/check_portability.py +1 -0
- package/scripts/lint_agents_md.py +33 -0
- package/scripts/release.py +77 -2
- package/scripts/skill_linter.py +10 -3
- package/.agent-src/commands/agents/cleanup.md +0 -194
- package/.agent-src/commands/agents/prepare.md +0 -141
- package/.agent-src/commands/copilot-agents/optimize.md +0 -255
- package/.agent-src/commands/copilot-agents.md +0 -44
- package/.agent-src/commands/optimize/agents.md +0 -144
```diff
@@ -147,6 +147,10 @@ as a follow-up for the implementer — the judge does not execute tools.
 model-pairing rules (`subagents.judge_model` one tier above implementer).
 - [`test-driven-development`](../test-driven-development/SKILL.md) —
 the write-the-test-first workflow that prevents most findings this judge makes.
+- [`testing-anti-patterns`](../testing-anti-patterns/SKILL.md) and its
+sibling [`process-anti-patterns.md`](../testing-anti-patterns/process-anti-patterns.md) —
+prevention layer this judge backs up; rationalization-table row numbers
+are valid review citations.
 - Sibling judges: [`judge-bug-hunter`](../judge-bug-hunter/SKILL.md),
 [`judge-security-auditor`](../judge-security-auditor/SKILL.md),
 [`judge-code-quality`](../judge-code-quality/SKILL.md) — dispatched
```
```diff
@@ -22,6 +22,13 @@ Use this skill for all Laravel testing tasks, especially when working with:
 
 This skill extends `php-coder`, `laravel`, and `eloquent`.
 
+For prevention layers that fire **before** writing a test — TDD
+discipline, mock-isolation gates, and the 12 process rationalizations
+("I'll add the test after", "patch first, test later") — see
+[`test-driven-development`](../test-driven-development/SKILL.md),
+[`testing-anti-patterns`](../testing-anti-patterns/SKILL.md), and
+[`process-anti-patterns.md`](../testing-anti-patterns/process-anti-patterns.md).
+
 ## Procedure: Write Pest tests
 
 1. **Read the base skills first** — apply `php-coder`, `laravel`, and `eloquent` where relevant.
```
```diff
@@ -53,9 +60,9 @@ For bug fixes and new features, prefer test-driven development:
 
 ### Why test-first matters
 
-Tests written **after**
+Tests written **after** implementation pass immediately. Passing immediately proves nothing:
 - The test might test the wrong thing.
-- The test might test
+- The test might test implementation, not behavior.
 - You never saw it catch the bug — so you don't know if it would.
 
 ### Bug fix TDD
```
```diff
@@ -120,7 +127,7 @@ The test proves the fix works AND prevents regression.
 - For JSON APIs, assert:
 - exact relevant fields
 - error structure when applicable
--
+- database state after the request
 - Do not only assert `200` — verify meaningful behavior.
 
 ## Validation tests
```
```diff
@@ -258,7 +265,7 @@ When reviewing or auditing existing tests, check for these anti-patterns:
 
 - Do not test private methods directly.
 - Do not over-mock Laravel internals.
-- Do not assert
+- Do not assert implementation details when behavior assertions are enough.
 - Do not write brittle tests tied to formatting or irrelevant response noise.
 - Do not create giant tests that cover many behaviors at once.
 - Do not skip authorization or validation coverage for important endpoints.
```
```diff
@@ -285,7 +292,7 @@ When generating Pest tests:
 - Don't use `readonly` or `final` on Pest test helper classes — it breaks mocking.
 - Don't add `use` statements for global classes (`Exception`, `DateTimeImmutable`) in Pest files — they're auto-imported.
 - The model forgets `$this->travel(5)->seconds()` for time-dependent tests — never rely on `now()` differing between lines.
-- Parallel tests share the
+- Parallel tests share the database — don't assume column values are null unless you explicitly set them.
 
 ## Do NOT
 
```
```diff
@@ -297,7 +304,7 @@ When generating Pest tests:
 When generating new tests, focus on:
 - **Business logic**: calculations, status transitions, validation rules, data transformations
 - **Edge cases**: null, empty string, zero, negative numbers, boundary values, max length
-- **Error paths**: invalid input, missing
+- **Error paths**: invalid input, missing dependencies, exception handling
 - **Different code branches**: if/else, early returns, fallback behavior
 
 What NOT to test:
```
```diff
@@ -34,6 +34,10 @@ If both PHP and JS/TS files changed → run **both** pipelines.
 - `verify-before-complete` rule — timing: run quality tools ONCE at the end, not after each edit
 - `php-coding` rule → PHPStan section — inline ignores, PHPDoc rules
 - `verify-before-complete` rule — must run quality checks before claiming work is done
+- [`testing-anti-patterns`](../testing-anti-patterns/SKILL.md) and
+[`process-anti-patterns.md`](../testing-anti-patterns/process-anti-patterns.md) —
+test-side rationalizations these tools cannot catch (e.g. "CI is red,
+patch first, test later").
 
 ---
 
```
```diff
@@ -126,6 +126,16 @@ The rubric (5 dimensions × 0–2, sum / 10) and band thresholds
 (`high ≥ 0.8`, `medium 0.5–0.79`, `low < 0.5`) are owned by
 `confidence.py`. Do not re-derive them in prose.
 
+### 6. Self-review (3-scan checklist)
+
+Before emitting the envelope, run these three scans. Each is a fast pass; failure blocks emission.
+
+1. **Spec coverage** — every concrete signal from step 2 (constraints) and step 3 (assumptions) is reflected somewhere in the AC list. Walk the constraint list top-to-bottom; each must anchor at least one AC bullet or appear in the *Assumptions* block.
+2. **Placeholder / TODO scan** — the rendered envelope contains no `<placeholder>`, `TODO`, `FIXME`, `tbd`, `???`, `XXX` strings. The literal angle-bracket placeholders in the template (`<one sentence …>`, `<bullet>`) must be replaced with concrete text before emission.
+3. **Type / shape consistency** — every named file, module, route, or command in the AC matches the project's existing conventions. If the prompt names `auth.service.ts` but the codebase uses `AuthService.php`, surface the mismatch in *Assumptions* rather than adopting the prompt's spelling.
+
+Source: adapted from `obra/superpowers` `writing-plans/SKILL.md` § Self-Review (v5.1.0).
+
 ## Band-action mapping
 
 The `refine` dispatcher step in `directives/backend/refine.py` reads
```
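The *Placeholder / TODO scan* above is mechanical enough to script. A minimal Python sketch, assuming the rendered envelope is available as one string; `find_leftovers`, `may_emit`, and the angle-bracket heuristic (a `<` followed by a lowercase letter, so generics like `List<String>` pass) are illustrative assumptions, not part of `refine-prompt`.

```python
import re

# Leftover markers from the Self-review step. TODO, FIXME and XXX are matched
# literally; "tbd" is matched as a whole word in any case.
LEFTOVER_PATTERNS = [
    re.compile(r"TODO"),
    re.compile(r"FIXME"),
    re.compile(r"XXX"),
    re.compile(r"\?\?\?"),
    re.compile(r"\btbd\b", re.IGNORECASE),
    # Template placeholders such as <bullet> or <one sentence ...>:
    # "<" followed by a lowercase letter, so List<String> is not flagged.
    re.compile(r"<[a-z][^<>\n]*>"),
]


def find_leftovers(rendered: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) for each leftover in the output."""
    hits = []
    for lineno, line in enumerate(rendered.splitlines(), start=1):
        for pattern in LEFTOVER_PATTERNS:
            hits.extend((lineno, m.group(0)) for m in pattern.finditer(line))
    return hits


def may_emit(rendered: str) -> bool:
    """Emission stays blocked while any leftover remains."""
    return not find_leftovers(rendered)


draft = "Goal: <one sentence …>\nAC:\n- TODO decide format\n- Returns List<String>"
print(find_leftovers(draft))  # → [(1, '<one sentence …>'), (3, 'TODO')]
```

The heuristic also flags legitimate lowercase angle-bracket text, such as an HTML tag quoted in an AC, so a hit means "look again", not "wrong".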
```diff
@@ -250,6 +250,18 @@ open questions surfaced>
 The "Refined ticket" section is wrapped in a **copyable Markdown box**
 so the user can grab it verbatim.
 
+## Self-review (3-scan checklist)
+
+Run these three scans on the rendered output before the close-prompt. Each is a fast pass; failure blocks emission and forces a fix.
+
+1. **Spec coverage** — every AC bullet and constraint from the original ticket (and every parent-AC line surfaced via `fold_parent_context`) is reflected in the rewritten ticket, the Top-5 risks, or the *Open questions* section. Nothing from the input vanishes silently.
+2. **Placeholder / TODO scan** — no `<placeholder>`, `TODO`, `FIXME`, `tbd`, `???`, `XXX` strings remain. The angle-bracket placeholders in the template (`<rewritten title>`, `<risk>`, `<one paragraph>`) must be replaced with concrete prose before the close-prompt fires.
+3. **Type / shape consistency** — every module, file, route, or domain term cited in the rewritten ticket and Top-5 risks matches `repo_context.context_docs` and `recent_branches` vocabulary. Invented terms are flagged in *Open questions* or replaced with the project's actual term.
+
+Self-review is mechanical (gaps, leftovers, naming drift); persona voices and orchestration outputs handle reasoning critique. Both run; neither replaces the other.
+
+Source: adapted from `obra/superpowers` `writing-plans/SKILL.md` § Self-Review (v5.1.0).
+
 ## Close-prompt (mandatory final step)
 
 **Probe write access first (Phase F6).** Before rendering, do a
```
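Spec coverage is a judgment call, but a crude lexical pre-check can surface input items that vanished outright before anyone reads the rewrite. A sketch assuming the original AC and constraint lines and the three output sections are plain strings; the section keys, the stopword list, and the 0.6 word-overlap threshold are hypothetical tuning choices. A paraphrased item can still be flagged, so the result is a list to eyeball, not a verdict.

```python
import re

WORD = re.compile(r"[a-z0-9]+")
# Illustrative stopword list: words too common to signal coverage.
STOPWORDS = {"a", "an", "and", "be", "for", "in", "is", "must", "of", "on", "or", "should", "the", "to", "with"}


def content_words(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens, minus stopwords."""
    return {w for w in WORD.findall(text.lower()) if w not in STOPWORDS}


def uncovered_items(items: list[str], sections: dict[str, str], threshold: float = 0.6) -> list[str]:
    """Return input AC/constraint lines that no output section reflects.

    An item counts as reflected when at least `threshold` of its content
    words appear in a single section (a crude lexical proxy).
    """
    section_words = [content_words(body) for body in sections.values()]
    missing = []
    for item in items:
        words = content_words(item)
        if not words:
            continue  # nothing checkable, e.g. a bare "-" line
        if not any(len(words & sw) / len(words) >= threshold for sw in section_words):
            missing.append(item)
    return missing


source = [
    "Export must include archived orders",
    "CSV header row uses snake_case",
    "Rate limit: 10 exports per hour",
]
output = {
    "rewritten_ticket": "Export includes archived orders. The CSV header row uses snake_case names.",
    "top_5_risks": "Large exports may time out.",
    "open_questions": "",
}
print(uncovered_items(source, output))  # → ['Rate limit: 10 exports per hour']
```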
```diff
@@ -1,12 +1,12 @@
 ---
-name: repomix
+name: repomix-packer
 description: "Use when packaging a codebase to a single AI-friendly file for LLM analysis — local or remote, XML/Markdown/JSON, token counting, gitignore filtering, peer-side `repomix` CLI."
 source: package
 ---
 
 > **Pinned upstream:** `repomix` CLI (npm: `repomix`, brew: `repomix`). Re-verify per minor bump. Repomix is an **optional dependency** — this skill never installs it silently.
 
-# repomix
+# repomix-packer
 
 Wraps the upstream [`yamadashy/repomix`](https://github.com/yamadashy/repomix) CLI for codebase-snapshot workflows: pack a local or remote repo into a single XML / Markdown / JSON file with token counts and secret detection, then feed it to an LLM for review, audit, or migration scoping.
 
```
```diff
@@ -25,7 +25,7 @@ Do NOT use when:
 
 ## Procedure: Snapshot a repo for LLM review
 
-###
+### 1. Inspect: verify `repomix` is installed (peer-side)
 
 ```bash
 repomix --version
```
````diff
@@ -41,7 +41,7 @@ npm install -g repomix
 brew install repomix
 ```
 
-###
+### 2. Decide local vs remote
 
 ```bash
 # Local: pack the current directory.
````
````diff
@@ -54,7 +54,7 @@ npx repomix --remote owner/repo
 npx repomix --remote https://github.com/owner/repo/commit/<sha>
 ```
 
-###
+### 3. Filter the snapshot to the smallest useful slice
 
 ```bash
 # Include patterns
````
````diff
@@ -67,7 +67,7 @@ repomix -i "tests/**,*.test.js"
 repomix --remove-comments
 ```
 
-###
+### 4. Pick the output format and destination
 
 ```bash
 repomix --style markdown -o snapshot.md # human-readable
````
````diff
@@ -76,7 +76,7 @@ repomix --style json -o snapshot.json # programmatic post-processing
 repomix --copy # also copy to clipboard
 ```
 
-###
+### 5. Verify token budget and secrets
 
 Repomix prints per-file and total token counts and runs Secretlint on the output. Check the totals against the target LLM context window:
 
````
```diff
@@ -88,7 +88,7 @@ Repomix prints per-file and total token counts and runs Secretlint on the output
 
 If Secretlint flags anything, STOP — sanitize the input or add the offending paths to `.repomixignore` before re-packing. Never use `--no-security-check` on an unfamiliar codebase.
 
-###
+### 6. Hand the snapshot to the consumer skill
 
 Most workflows that call this skill pass the snapshot to:
 
```
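Checking repomix's totals against the target window is plain arithmetic. A minimal sketch, assuming you read the total token count from repomix's summary yourself (the code does not parse repomix output) and want to keep part of the window free for instructions and the model's answer; the 25% reserve and the function names are illustrative assumptions, not a repomix default or a number from this skill.

```python
def token_budget(context_window: int, reserve_fraction: float = 0.25) -> int:
    """Tokens the snapshot may use after reserving headroom.

    reserve_fraction is an assumed share of the window kept free for the
    instructions and the model's answer.
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return int(context_window * (1 - reserve_fraction))


def token_overshoot(total_tokens: int, context_window: int, reserve_fraction: float = 0.25) -> int:
    """How many tokens to trim before the snapshot fits (0 if it already fits)."""
    return max(0, total_tokens - token_budget(context_window, reserve_fraction))


# Example: snapshots of 100k and 90k tokens against a 128k-token window.
print(token_budget(128_000))              # → 96000
print(token_overshoot(100_000, 128_000))  # → 4000
print(token_overshoot(90_000, 128_000))   # → 0
```

A nonzero overshoot is the cue to narrow the include and ignore patterns from the filtering step rather than truncating the packed file.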
```diff
@@ -131,6 +131,11 @@ to every roadmap you author.
 template rule 13 + [`scope-control`](../../rules/scope-control.md#git-operations--permission-gated).
 * Plan automatic branch switches mid-roadmap (template rule 14).
 * Ship a phase without checkboxes (`roadmap-progress-sync` Iron Law #2).
+* Write merge, push, or commit steps into the roadmap. Roadmaps plan
+**work**; merge / push / commit are delivery decisions owned by the
+user (`commit-policy` Iron Law). A roadmap is "implementation-complete"
+once its checkboxes are ticked and verification has been run — merge
+timing is tracked outside the roadmap.
 * Use ALL-CAPS Iron-Law fenced blocks — those belong in
 [`kernel-membership`](../../../docs/contracts/kernel-membership.md)-listed
 rules, not roadmaps.
```
```diff
@@ -149,6 +154,10 @@ to every roadmap you author.
 - **Author-during-execution branch switches** — the agent should not
 propose a new branch mid-roadmap; that decision is fenced to
 authoring time.
+- **Merge / commit steps in roadmap body** — checkboxes like
+"merge PR #X" or "commit phase Y" couple roadmap closure to git
+operations the user has not authorized. Roadmap completion is
+decoupled from delivery; ship-the-PR is its own decision.
 
 ## Examples
 
```
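Because this anti-pattern is visible in the roadmap text itself, a keyword pass over task-list items can catch it before review. A minimal sketch; the three verbs come from the roadmap-writing bullets above, while the function name and regexes are illustrative assumptions rather than an existing linter check. A bare word match is crude (a checkbox like "handle push notifications" would also be flagged), so treat findings as review prompts.

```python
import re

# A Markdown task-list item: "-", "*" or "+", then "[ ]" or "[x]".
CHECKBOX = re.compile(r"^\s*[-*+]\s+\[[ xX]\]\s+(?P<text>.+)$")
# Delivery verbs that belong to the user, not the roadmap.
DELIVERY_VERB = re.compile(r"\b(merge|push|commit)\b", re.IGNORECASE)


def delivery_checkboxes(roadmap: str) -> list[tuple[int, str]]:
    """Return (line_number, item_text) for checkboxes that schedule git delivery."""
    findings = []
    for lineno, line in enumerate(roadmap.splitlines(), start=1):
        match = CHECKBOX.match(line)
        if match and DELIVERY_VERB.search(match.group("text")):
            findings.append((lineno, match.group("text")))
    return findings


roadmap = """## Phase 2
- [x] Add export endpoint
- [ ] Write feature tests
- [ ] Merge PR #X
* [ ] commit phase Y
- Note: push notifications are out of scope
"""
print(delivery_checkboxes(roadmap))  # → [(4, 'Merge PR #X'), (5, 'commit phase Y')]
```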
```diff
@@ -128,6 +128,27 @@ the PR or split by responsibility.
 * Run the full CI pipeline locally (see `Taskfile.yml` in this repo for
 the script list) — must exit 0 except for tolerated warnings.
 
+### 6. Governance baseline (when introducing a new linter check)
+
+**Advisory, reviewer-checked — no CI gate.** When the same PR adds a
+new check to `scripts/skill_linter.py` (or strengthens an existing one)
+such that previously-clean rules now warn, the PR body MUST record the
+pre-existing violations on `main` in a Markdown table:
+
+```markdown
+### Pre-existing baseline (informational)
+
+| Code | Count on main | Bucket |
+|---|---:|---|
+| {new_code} | N | (a) genuine fix · (b) accept · (c) check too aggressive |
+```
+
+Forward-only: the new check applies to **the rule under review** and to
+**future** edits. The baseline table is informational so reviewers can
+distinguish genuine debt from acceptable carry-overs without diffing the
+full lint output. See `agents/analysis/lint-warning-triage.md` for the
+3-bucket reference.
+
 ## Frontmatter shape
 
 ```yaml
```
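Tallying pre-existing violations by hand is error-prone; a small helper can render the governance-baseline table from the codes a new check reports against `main`. A sketch under stated assumptions: it takes the codes as a list you extract from `scripts/skill_linter.py` output yourself (it does not parse that output), the `W101`/`W207` codes in the usage are made up, and the bucket stays a reviewer decision passed in as a mapping.

```python
from collections import Counter
from typing import Optional

BUCKET_LABELS = {
    "a": "(a) genuine fix",
    "b": "(b) accept",
    "c": "(c) check too aggressive",
}


def baseline_table(codes_on_main: list[str], triage: Optional[dict[str, str]] = None) -> str:
    """Render the informational baseline table for a new or stricter check.

    codes_on_main: one entry per pre-existing violation on `main`.
    triage: optional reviewer decision per code, mapping code -> "a", "b" or "c".
    """
    triage = triage or {}
    lines = [
        "### Pre-existing baseline (informational)",
        "",
        "| Code | Count on main | Bucket |",
        "|---|---:|---|",
    ]
    # most_common() sorts by count, highest first; ties keep first-seen order.
    for code, count in Counter(codes_on_main).most_common():
        bucket = BUCKET_LABELS.get(triage.get(code, ""), "untriaged")
        lines.append(f"| {code} | {count} | {bucket} |")
    return "\n".join(lines)


print(baseline_table(["W101", "W207", "W101", "W101"], {"W101": "c"}))
```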
```diff
@@ -59,6 +59,25 @@ Ask: **"Does the model need this to do its job correctly?"**
 * If the model knows it but does it wrong in THIS project → **Rule or Guideline**
 * If the model needs a multi-step workflow to get it right → **Skill**
 
+### Skills and commands share the `.claude/skills/` namespace
+
+Skills (`.agent-src.uncompressed/skills/{name}/SKILL.md`) AND commands
+(`.agent-src.uncompressed/commands/{name}.md`) both project into
+`.claude/skills/` (`scripts/compress.py` → `generate_claude_skills` +
+`generate_claude_commands`). Claude treats the directory as native
+skills.
+
+* Same-name collision: skill wins, command is skipped
+(`generate_claude_commands` honors this). Don't reuse a command's
+slug for a skill unless the command should retire.
+* Both compete on `description` for routing. A weak skill description
+is shadowed by a stronger same-domain command — and vice versa.
+Trigger phrasing must be precise (§ 1b below).
+* Workflow has both "user types `/foo`" path AND "model picks this up
+from intent" path → author the skill first, let the command delegate
+via `skills:` frontmatter. Two artifacts with the same trigger
+surface fight each other in the router.
+
 ### When "Nothing" is the right answer
 
 Do NOT create a skill or rule for:
```
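The same-name collision rule reduces to a precedence merge over one namespace. A sketch of that precedence only, assuming skills and commands are identified by slug; it is not the code in `scripts/compress.py`, and the function name and example slugs are illustrative.

```python
def project_namespace(skills: set[str], commands: set[str]) -> tuple[dict[str, str], list[str]]:
    """Merge skill and command slugs into one namespace; skills win collisions.

    Returns (projected, skipped): projected maps slug -> "skill" or "command",
    skipped lists command slugs shadowed by a same-name skill.
    """
    projected = {slug: "skill" for slug in skills}
    skipped = []
    for slug in sorted(commands):
        if slug in projected:
            skipped.append(slug)  # same slug as a skill: the command is not projected
        else:
            projected[slug] = "command"
    return projected, skipped


projected, skipped = project_namespace({"refine-ticket", "repomix-packer"}, {"refine-ticket", "optimize"})
print(skipped)                # → ['refine-ticket']
print(projected["optimize"])  # → command
```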
```diff
@@ -1,6 +1,6 @@
 ---
 name: subagent-orchestration
-description: "Use when orchestrating implementer/judge subagents —
+description: "Use when orchestrating implementer/judge subagents — seven modes (do-and-judge ±two-stage, do-in-steps/parallel/worktrees, do-competitively, judge-with-debate) — models from .agent-settings.yml."
 source: package
 ---
 
```
```diff
@@ -44,7 +44,7 @@ judge is a fresh pair of eyes. If `.agent-settings.yml` resolves to
 identical implementer and judge models, surface the mismatch before
 running — do not silently continue.
 
-## The
+## The seven modes
 
 Each mode has a decision row: when to use, when not, and the expected
 model pairing. Defaults come from
```
```diff
@@ -60,7 +60,34 @@ back to the user.
 |---|---|---|
 | Single-change task with non-trivial risk | Tiny fix, or spike/exploration | implementer = session; judge = one tier up |
 
-### 2. do-
+### 2. do-and-judge-two-stage
+
+Implementer produces a diff; **two judges run sequentially** — first a
+spec-compliance reviewer (does the diff satisfy the stated spec /
+acceptance criteria?), then a code-quality reviewer (is the diff well-
+written for the codebase it lands in?). The orchestrator only proceeds
+to stage two if stage one returns `DONE` or `DONE_WITH_CONCERNS`. A
+stage-one `BLOCKED` shortcuts the loop — there is no point quality-
+reviewing a diff that does not satisfy the spec.
+
+| When to use | When not | Model pairing |
+|---|---|---|
+| Spec is contested or AC are detailed; diff size makes one judge prone to missing one axis (correctness vs craft) | Spec is one sentence, or the diff is one line (collapse to mode 1) | implementer = session; spec-judge = one tier up; quality-judge = same tier as spec-judge, fresh context |
+
+**Why two stages, not one judge with both rubrics:** combining the
+rubrics in one prompt reliably regresses one of them — the judge "spends
+attention" on whichever rubric appears last. Splitting the prompts
+forces each judge to commit fully to its rubric.
+
+**Stage-routing rule:**
+- Stage-1 returns `DONE` → run stage-2.
+- Stage-1 returns `DONE_WITH_CONCERNS` → run stage-2; concerns carry
+forward to the final envelope.
+- Stage-1 returns `NEEDS_CONTEXT` → pause; stage-2 does not run.
+- Stage-1 returns `BLOCKED` → final verdict is `BLOCKED`; stage-2
+does not run (saves cost).
+
+### 3. do-in-steps
 
 Plan is split into N steps; judge runs **between** steps. A step that
 fails judgment is revised before the next step starts. Used for
```
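The stage-routing rule for `do-and-judge-two-stage` is a four-way lookup on the stage-1 status. A minimal Python sketch, assuming envelopes arrive as parsed JSON dicts and that `concerns` is a list of strings (the real schema may shape concern items differently); the action labels such as `run_stage_two` are hypothetical names, not identifiers used by the skill.

```python
# Stage-1 status -> next orchestrator action (action labels are hypothetical).
STAGE_ONE_ROUTES = {
    "DONE": "run_stage_two",
    "DONE_WITH_CONCERNS": "run_stage_two",  # concerns carry forward
    "NEEDS_CONTEXT": "pause_for_caller",  # stage 2 does not run
    "BLOCKED": "final_blocked",  # stage 2 does not run (saves cost)
}


def route_after_stage_one(envelope: dict) -> str:
    """Map the spec-compliance judge's envelope to the next step."""
    status = envelope.get("status")
    if status not in STAGE_ONE_ROUTES:
        raise ValueError(f"not a taxonomy status: {status!r}")
    return STAGE_ONE_ROUTES[status]


def carried_concerns(envelope: dict) -> list[str]:
    """Concerns forwarded to stage 2 and to the final envelope."""
    if envelope.get("status") == "DONE_WITH_CONCERNS":
        return list(envelope.get("concerns", []))
    return []


stage_one = {
    "status": "DONE_WITH_CONCERNS",
    "summary": "AC 1-4 satisfied",
    "evidence": ["AC-1 -> test_export_csv"],
    "concerns": ["AC 3 read loosely"],
}
print(route_after_stage_one(stage_one))  # → run_stage_two
print(carried_concerns(stage_one))       # → ['AC 3 read loosely']
```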
```diff
@@ -70,7 +97,7 @@ multi-file changes where a mid-plan mistake would cascade.
 |---|---|---|
 | Multi-step plan with ordered dependencies | Single-step change, or when steps are independent (use `do-in-parallel`) | implementer = session; judge = one tier up |
 
-###
+### 4. do-in-parallel
 
 Independent slices run concurrently. No judge per slice — judge runs
 once on the aggregated result. Parallelism capped by
```
```diff
@@ -80,7 +107,7 @@ once on the aggregated result. Parallelism capped by
 |---|---|---|
 | Independent slices (different files, non-overlapping) | Any slice touches shared state | implementer = session; judge = one tier up, run once |
 
-###
+### 5. do-competitively
 
 Multiple implementers produce candidate diffs for the **same** slice.
 Judge picks the winner and rejects the losers. Expensive — use only
```
```diff
@@ -90,7 +117,7 @@ when the solution space is genuinely broad.
 |---|---|---|
 | Broad solution space (algorithm choice, API shape) | Well-defined problem with one good answer | implementers = same tier (≥2 instances); judge = one tier up |
 
-###
+### 6. judge-with-debate
 
 Two judges each produce a verdict; a meta-judge reconciles
 disagreements. Used for high-stakes changes (security, data
```
```diff
@@ -100,7 +127,7 @@ migration, public API) where a single judge is too easy to fool.
 |---|---|---|
 | Security, data integrity, public API change | Routine internal refactor | judges = same tier (2x); meta-judge = one tier up |
 
-###
+### 7. do-in-worktrees
 
 Cross-wing or cross-skill chain executed across isolated git
 worktrees — each handoff in the chain runs in its own worktree, so
```
```diff
@@ -130,7 +157,44 @@ end produces a single integration PR.
 **Anti-pattern:** do not use for fast iteration loops where each
 step is under ~30 minutes. The branch-creation, context-switch, and
 worktree-cleanup cost dominates. Stick with mode 1 (do-and-judge)
-or mode
+or mode 3 (do-in-steps) for those.
+
+## Status taxonomy — every subagent return uses one envelope
+
+Every implementer or judge return must conform to
+[`schemas/subagent-status.json`](schemas/subagent-status.json). Four
+statuses, no free-form alternatives:
+
+| Status | Meaning | Required keys (beyond `status`, `summary`) |
+|---|---|---|
+| `DONE` | Work shipped, all gates green. | `evidence[]` |
+| `DONE_WITH_CONCERNS` | Work shipped but caller must act on concerns. | `evidence[]`, `concerns[]` |
+| `NEEDS_CONTEXT` | Paused; caller can unblock by answering. | `blocking_question` |
+| `BLOCKED` | No path forward exists. | `blocking_reason` |
+
+**Why a fixed taxonomy:** orchestrators (`/do-and-judge`, `/do-in-steps`)
+route on status. Free-form "kind of done" returns force the orchestrator
+to interpret prose, which silently regresses the two-revision ceiling and
+the judge-rejected-do-not-apply rule. The schema makes routing mechanical.
+
+**Tests:** `tests/test_subagent_status_schema.py` exercises all four
+statuses plus rejection cases (missing required keys, unknown status,
+extra fields, conditional-key violations).
+
+**Distinguishing `NEEDS_CONTEXT` from `BLOCKED`:** `NEEDS_CONTEXT` means
+*"you, the caller, can fix this by telling me X"*. `BLOCKED` means
+*"no input from you unblocks this — escalate or rescope"*. If a subagent
+is unsure, it picks `BLOCKED` and the caller can downgrade.
+
+## Dispatch prompts — externalized
+
+Each mode's literal dispatch template lives under
+[`prompts/{mode}.md`](prompts/README.md). The orchestrator loads the
+matching prompt at dispatch time and substitutes `{{placeholders}}`.
+Edits to a prompt do not bloat this skill against the 400-line sunset
+trigger; `tests/test_subagent_prompt_loading.py` confirms each of the
+seven modes resolves to a loadable prompt that cites all four taxonomy
+statuses.
 
 ## Procedure
 
```
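Since orchestrators route on status, they can reject malformed returns before routing. A hand-rolled sketch of the status taxonomy's key rules, assuming envelopes arrive as parsed JSON dicts; treating every key the status does not require as an error is an assumption about how strict `schemas/subagent-status.json` is, and `envelope_errors` is a hypothetical name. Validating against the schema file itself is the more faithful option.

```python
# Keys each status requires beyond the base keys, per the taxonomy table.
REQUIRED_BY_STATUS = {
    "DONE": {"evidence"},
    "DONE_WITH_CONCERNS": {"evidence", "concerns"},
    "NEEDS_CONTEXT": {"blocking_question"},
    "BLOCKED": {"blocking_reason"},
}
BASE_KEYS = {"status", "summary"}
LIST_KEYS = {"evidence", "concerns"}  # written as evidence[] / concerns[] in the table


def envelope_errors(envelope: dict) -> list[str]:
    """Return problems found; an empty list means the envelope can be routed."""
    status = envelope.get("status")
    if status not in REQUIRED_BY_STATUS:
        return [f"unknown status: {status!r}"]
    present = set(envelope)
    expected = BASE_KEYS | REQUIRED_BY_STATUS[status]
    errors = [f"missing key for {status}: {key}" for key in sorted(expected - present)]
    # Assumption: any key the status does not require counts as an extra field.
    errors += [f"unexpected key for {status}: {key}" for key in sorted(present - expected)]
    errors += [
        f"{key} must be a list"
        for key in sorted(LIST_KEYS & present)
        if not isinstance(envelope[key], list)
    ]
    return errors


print(envelope_errors({"status": "NEEDS_CONTEXT", "summary": "AC 2 is ambiguous"}))
# → ['missing key for NEEDS_CONTEXT: blocking_question']
```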
```diff
@@ -158,9 +222,10 @@ same context, **stop** and report. Do not improvise.
 
 ### 3. Pick the mode
 
-Match task shape to one of the
-prefer the cheaper one (`do-and-judge` < `do-
-< `do-
+Match task shape to one of the seven modes. When two modes could fit,
+prefer the cheaper one (`do-and-judge` < `do-and-judge-two-stage` <
+`do-in-steps` < `do-in-parallel` < `do-competitively` <
+`judge-with-debate` < `do-in-worktrees`).
 
 ### 4. Dispatch
 
```
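The "prefer the cheaper one" tie-break in *Pick the mode* is a walk down a fixed cost order. A minimal sketch, assuming the orchestrator has already judged which modes fit the task shape (that judgment is not encoded here); `pick_mode` is a hypothetical name.

```python
# Cost order from "Pick the mode", cheapest first.
MODE_COST_ORDER = [
    "do-and-judge",
    "do-and-judge-two-stage",
    "do-in-steps",
    "do-in-parallel",
    "do-competitively",
    "judge-with-debate",
    "do-in-worktrees",
]


def pick_mode(candidates: set[str]) -> str:
    """Return the cheapest mode among those that fit the task shape."""
    unknown = candidates - set(MODE_COST_ORDER)
    if unknown:
        raise ValueError(f"not one of the seven modes: {sorted(unknown)}")
    for mode in MODE_COST_ORDER:
        if mode in candidates:
            return mode
    raise ValueError("no candidate modes given")


print(pick_mode({"judge-with-debate", "do-in-steps"}))  # → do-in-steps
```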
```diff
@@ -195,7 +260,7 @@ the judge verdict.
 
 ## Output format
 
-1. **Mode chosen** — one of the
+1. **Mode chosen** — one of the seven, with the one-line reason
 2. **Model pairing** — implementer model / judge model (resolved)
 3. **Verdict** — applied / revised / handed back
 4. **Evidence** — diff summary, test output, or judge transcript
```
```diff
@@ -0,0 +1,29 @@
+# Subagent dispatch prompts
+
+One file per mode in [`SKILL.md`](../SKILL.md) § *The seven modes*. Each
+prompt is the **literal template** the orchestrator hands to the
+subagent on dispatch — externalized so prompt edits do not bloat the
+skill above the 400-line sunset trigger.
+
+| Mode | File |
+|---|---|
+| do-and-judge | [`do-and-judge.md`](do-and-judge.md) |
+| do-and-judge-two-stage | [`do-and-judge-two-stage.md`](do-and-judge-two-stage.md) |
+| do-in-steps | [`do-in-steps.md`](do-in-steps.md) |
+| do-in-parallel | [`do-in-parallel.md`](do-in-parallel.md) |
+| do-competitively | [`do-competitively.md`](do-competitively.md) |
+| judge-with-debate | [`judge-with-debate.md`](judge-with-debate.md) |
+| do-in-worktrees | [`do-in-worktrees.md`](do-in-worktrees.md) |
+
+## Contract
+
+Every prompt cites the status taxonomy in
+[`../schemas/subagent-status.json`](../schemas/subagent-status.json) and
+ends with the **return-envelope** instruction so the subagent's reply
+validates against `tests/test_subagent_status_schema.py`.
+
+## Loading
+
+`tests/test_subagent_prompt_loading.py` asserts that every mode named
+in `SKILL.md` § *The seven modes* has a loadable prompt file under this
+directory and that each prompt mentions all four status enum values.
```
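The prompts README's loading check (every prompt mentions all four statuses) and the `{{placeholders}}` substitution described in `SKILL.md` are both plain string handling. A minimal sketch, assuming the template text has already been read out of the mode's prompt file (extracting it from the surrounding Markdown is not shown) and that placeholders use a `{{snake_case}}` form; the function names are hypothetical and this is not the content of `tests/test_subagent_prompt_loading.py`.

```python
import re

STATUSES = ("DONE", "DONE_WITH_CONCERNS", "NEEDS_CONTEXT", "BLOCKED")
PLACEHOLDER = re.compile(r"\{\{\s*([a-z0-9_]+)\s*\}\}")


def missing_statuses(template: str) -> list[str]:
    """Statuses the prompt never names. Word boundaries keep DONE from
    matching inside DONE_WITH_CONCERNS."""
    return [s for s in STATUSES if not re.search(rf"\b{s}\b", template)]


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Fill {{placeholders}}; refuse to dispatch a half-filled prompt."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


template = (
    "TASK: {{task_description}}\n"
    "Return one envelope: DONE, DONE_WITH_CONCERNS, NEEDS_CONTEXT or BLOCKED."
)
print(missing_statuses(template))  # → []
print(render_prompt(template, {"task_description": "Add CSV export"}).splitlines()[0])
# → TASK: Add CSV export
```

Failing on an unfilled placeholder is a design choice for the sketch: a prompt that still contains `{{diff}}` gives the judge nothing to review, so it is safer to stop before dispatch.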
@@ -0,0 +1,121 @@
|
|
|
1
|
+
# Prompt — do-and-judge-two-stage
|
|
2
|
+
|
|
3
|
+
Mode reference: [`../SKILL.md`](../SKILL.md) § *2. do-and-judge-two-stage*.
|
|
4
|
+
|
|
5
|
+
## Implementer prompt
|
|
6
|
+
|
|
7
|
+
```
|
|
8
|
+
You are the implementer in a do-and-judge-two-stage loop. Two judges
|
|
9
|
+
will review your diff in sequence: first SPEC COMPLIANCE, then CODE
|
|
10
|
+
QUALITY. Spec failure shortcuts the loop — quality is not reviewed if
|
|
11
|
+
spec is wrong.
|
|
12
|
+
|
|
13
|
+
TASK: {{task_description}}
|
|
14
|
+
ACCEPTANCE CRITERIA: {{acceptance_criteria}}
|
|
15
|
+
CONTEXT FILES: {{file_paths}}
|
|
16
|
+
|
|
17
|
+
CONSTRAINTS:
|
|
18
|
+
- Hit every AC literally; do not "interpret" them away.
|
|
19
|
+
- Do not silently expand scope; AC are the contract.
|
|
20
|
+
- Write tests that map 1:1 to the AC so the spec-judge can verify.
|
|
21
|
+
|
|
22
|
+
ON COMPLETION, return ONE envelope per schemas/subagent-status.json:
|
|
23
|
+
- DONE — every AC satisfied, tests pass; evidence[]
|
|
24
|
+
maps each AC to the test that exercises it.
|
|
25
|
+
- DONE_WITH_CONCERNS — every AC satisfied but a trade-off needs
|
|
26
|
+
flagging in concerns[].
|
|
27
|
+
- NEEDS_CONTEXT — an AC is ambiguous; blocking_question must
|
|
28
|
+
name the AC and the interpretation gap.
|
|
29
|
+
- BLOCKED — an AC cannot be satisfied as stated;
|
|
30
|
+
blocking_reason explains why.
|
|
31
|
+
```

## Stage-1 prompt — SPEC COMPLIANCE judge

```
You are the SPEC COMPLIANCE judge. Stage 1 of two. Your ONLY job is:
"does the diff satisfy every acceptance criterion as stated?" Do NOT
review style, naming, or craft — that is stage 2's job.

ACCEPTANCE CRITERIA: {{acceptance_criteria}}
DIFF: {{diff}}
TEST OUTPUT: {{test_output}}
IMPLEMENTER ENVELOPE: {{envelope}}

PER-AC SCAN — for each AC, return:
- SATISFIED — cite the diff hunk + test that proves it.
- PARTIAL — cite what is missing and why it falls short.
- MISSING — AC has no corresponding implementation.

VERDICT (one envelope, schemas/subagent-status.json):
- DONE — every AC SATISFIED; evidence[] is the per-AC
  scan above.
- DONE_WITH_CONCERNS — every AC SATISFIED but a stretch
  interpretation needs flagging (rare at this stage).
- NEEDS_CONTEXT — an AC is ambiguous AND the implementer's
  interpretation is plausible; orchestrator must clarify.
- BLOCKED — one or more AC PARTIAL or MISSING. Stage 2
  will NOT run; implementer revises first.

NEVER comment on naming, structure, or style. Stay in your lane —
that is the value of the two-stage split.
```
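
Folding the per-AC scan into one verdict is mostly mechanical. In this Python sketch, `stage1_verdict` is a hypothetical name, and the `ambiguous` and `stretched` sets stand in for the judgment calls the prompt leaves to the judge. The precedence (any PARTIAL or MISSING first, then ambiguity, then stretch) is an assumption: the prompt does not say which status wins when an ambiguous AC is also unsatisfied.

```python
from collections.abc import Mapping, Set

PASSING = "SATISFIED"
FAILING = {"PARTIAL", "MISSING"}


def stage1_verdict(
    scan: Mapping[str, str],
    ambiguous: Set[str] = frozenset(),
    stretched: Set[str] = frozenset(),
) -> str:
    """Fold a per-AC scan (AC id -> SATISFIED | PARTIAL | MISSING) into one status.

    `ambiguous` and `stretched` hold AC ids the judge flagged (hypothetical inputs).
    """
    unknown = set(scan.values()) - FAILING - {PASSING}
    if unknown:
        raise ValueError(f"unexpected per-AC labels: {sorted(unknown)}")
    if any(label in FAILING for label in scan.values()):
        return "BLOCKED"  # stage 2 will not run; the implementer revises
    if ambiguous:
        return "NEEDS_CONTEXT"  # plausible reading, but the orchestrator must clarify
    if stretched:
        return "DONE_WITH_CONCERNS"
    return "DONE"
```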

## Stage-2 prompt — CODE QUALITY judge (only if stage 1 passes)

```
You are the CODE QUALITY judge. Stage 2 of two. Stage 1 already
confirmed the diff satisfies the spec. Your ONLY job is craft: is
the diff well-written for THIS codebase?

DIFF: {{diff}}
NEIGHBORING FILES: {{neighboring_files}}
PROJECT CONVENTIONS: {{conventions_summary}}
STAGE-1 CONCERNS (carry-forward): {{stage_1_concerns}}

QUALITY DIMENSIONS — cite each in evidence[]:
1. Naming consistency with neighbors.
2. Structure / responsibility boundary.
3. Error handling matches project style.
4. Test shape matches project conventions (Pest / pytest / etc.).
5. Diff size — could the same intent ship smaller?

VERDICT (one envelope, schemas/subagent-status.json):
- DONE — quality is on par with the codebase;
  evidence[] cites the five dimensions.
- DONE_WITH_CONCERNS — apply the diff, but concerns[] lists the
  craft issues the caller must address (carry
  forward stage-1 concerns too).
- NEEDS_CONTEXT — convention is unclear; the orchestrator must
  name the canonical pattern.
- BLOCKED — diff is correct per stage 1 but quality is
  unacceptable; implementer must revise.

NEVER re-litigate the spec. Stage 1 already settled correctness —
your job is craft.
```

## Stage routing — orchestrator logic

Stage-1 status determines whether stage 2 runs:

| Stage-1 status | Run stage 2? | Final envelope |
|---|---|---|
| `DONE` | Yes | Stage-2 envelope |
| `DONE_WITH_CONCERNS` | Yes | Stage-2 envelope; merge concerns[] from both stages |
| `NEEDS_CONTEXT` | No | Stage-1 envelope; pause for clarification |
| `BLOCKED` | No | Stage-1 envelope; implementer revises |

The orchestrator never collapses both stages into one prompt — that
defeats the purpose of the split (see SKILL.md § "Why two stages, not
one judge with both rubrics").
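
Read as code, the routing table is a small dispatch. In this Python sketch `route` and the `run_stage2` callback are hypothetical; the merge keeps stage-1 concerns first and drops exact duplicates, which is one reading of "merge concerns[] from both". The table does not say whether a stage-2 `DONE` should be upgraded when stage-1 concerns carry over, so the sketch leaves stage 2's status as returned.

```python
from typing import Any, Callable

Envelope = dict[str, Any]


def route(stage1: Envelope, run_stage2: Callable[[list[str]], Envelope]) -> Envelope:
    """Return the final envelope of one two-stage cycle."""
    status = stage1["status"]
    if status in ("NEEDS_CONTEXT", "BLOCKED"):
        # Stage 2 does not run: pause, or send the implementer back to revise.
        return stage1
    if status not in ("DONE", "DONE_WITH_CONCERNS"):
        raise ValueError(f"unknown stage-1 status: {status!r}")
    carried = list(stage1.get("concerns", []))
    # Hypothetical callback: fills {{stage_1_concerns}} and dispatches the quality judge.
    stage2 = run_stage2(carried)
    if status == "DONE_WITH_CONCERNS":
        # Merge both concerns[] lists: stage 1 first, exact duplicates dropped.
        extra = [c for c in stage2.get("concerns", []) if c not in carried]
        stage2 = {**stage2, "concerns": carried + extra}
    return stage2
```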

## Cost-discipline rule

Two-stage = up to **3 subagent calls** per cycle (implementer + two
judges) versus 2 for plain `do-and-judge`. Use it only when the AC are
detailed enough that a single judge would predictably miss either
correctness or craft. For one-line fixes or single-AC tasks, mode 1
(`do-and-judge`) is the right answer.
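
The "up to" follows from the routing table: stage 2 only runs after a stage-1 `DONE` or `DONE_WITH_CONCERNS`, so a cycle rejected at stage 1 costs two calls, the same as plain `do-and-judge`. A hypothetical Python helper that makes the arithmetic explicit, returning the worst case when the stage-1 status is not yet known:

```python
from typing import Optional


def subagent_calls(mode: str, stage1_status: Optional[str] = None) -> int:
    """Count subagent calls in one cycle of the given mode (illustrative only)."""
    if mode == "do-and-judge":
        return 2  # implementer + judge
    if mode == "do-and-judge-two-stage":
        if stage1_status is None:
            return 3  # status unknown: assume the worst case
        # Implementer + spec judge always; quality judge only after a passing stage 1.
        passed = stage1_status in ("DONE", "DONE_WITH_CONCERNS")
        return 3 if passed else 2
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, one cycle rejected at stage 1 followed by one that passes costs 2 + 3 = 5 calls, against 4 for two plain `do-and-judge` cycles.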
@@ -0,0 +1,60 @@
# Prompt — do-and-judge

Mode reference: [`../SKILL.md`](../SKILL.md) § *1. do-and-judge*.

## Implementer prompt

```
You are the implementer in a do-and-judge loop. Hard ceiling: two
revision cycles before handing back to the user.

TASK: {{task_description}}

CONTEXT FILES: {{file_paths}}

CONSTRAINTS:
- Do not modify files outside the cited paths without surfacing why.
- Do not skip tests; if the task does not include a test, write one.
- Prefer the smallest diff that satisfies the task.

ON COMPLETION, return ONE envelope conforming to
schemas/subagent-status.json. Pick exactly one status:
- DONE — work shipped, all gates green; include evidence[].
- DONE_WITH_CONCERNS — shipped, but the caller must read concerns[];
  include evidence[] AND concerns[].
- NEEDS_CONTEXT — paused; the orchestrator can unblock by
  answering blocking_question.
- BLOCKED — no path forward; include blocking_reason.

NEVER invent a fifth status. Free-form "kind of done" prose is rejected
by the schema validator.
```
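
The prompts use `{{name}}` placeholders, but these files do not specify how the orchestrator fills them. One cautious option, sketched in Python with a hypothetical `render_prompt` helper, is to refuse to render while any placeholder lacks a value, so a subagent never receives a literal `{{diff}}`.

```python
import re

# Matches {{name}}, tolerating inner spaces such as {{ name }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Fill {{name}} placeholders; raise KeyError if any value is missing."""
    names = {m.group(1) for m in PLACEHOLDER.finditer(template)}
    missing = sorted(names - set(values))
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    # A callable replacement inserts each value literally (no escape processing).
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

Passing a function to `re.sub` inserts each value literally: backslashes or `{{...}}` sequences inside a task description are neither reinterpreted nor expanded a second time.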

## Judge prompt

```
You are the judge reviewing the implementer's diff. The implementer
returned the envelope below. Validate it against the task and constraints.

TASK: {{task_description}}
DIFF: {{diff}}
IMPLEMENTER ENVELOPE: {{envelope}}

VERDICT (return ONE envelope per schemas/subagent-status.json):
- DONE — apply this diff; cite evidence in evidence[].
- DONE_WITH_CONCERNS — apply, but the caller must address concerns[].
- NEEDS_CONTEXT — orchestrator must clarify blocking_question
  before re-dispatching the implementer.
- BLOCKED — diff is wrong; explain in blocking_reason.
  Do NOT silently rewrite — that is the
  implementer's job on the revision pass.

NEVER return a bare DONE for a diff you would have written differently.
If you have concerns that are not addressed, use DONE_WITH_CONCERNS.
```

## Revision-loop rule

After two revision cycles, the orchestrator stops and hands back to the
user with the most recent envelope. The judge does not become the
implementer.
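
The ceiling can be sketched as a bounded loop. This Python version assumes "two revision cycles" means one initial pass plus at most two revisions (three implementer calls in total); `do_and_judge`, `implement`, and `judge` are hypothetical names for the orchestrator and its dispatch callbacks. When the judge returns `BLOCKED`, its `blocking_reason` goes back to the implementer as feedback, so the judge never rewrites the diff itself.

```python
from typing import Any, Callable

Envelope = dict[str, Any]


def do_and_judge(
    implement: Callable[[list[str]], Envelope],
    judge: Callable[[Envelope], Envelope],
    max_revisions: int = 2,
) -> Envelope:
    """Alternate implementer and judge until the judge accepts or the ceiling is hit.

    `implement` receives the judge's last blocking_reason as feedback
    (an empty list on the first pass).
    """
    feedback: list[str] = []
    verdict: Envelope = {}
    for _ in range(1 + max_revisions):  # initial pass + revision cycles
        work = implement(feedback)
        if work["status"] in ("NEEDS_CONTEXT", "BLOCKED"):
            return work  # implementer is stuck: surface its envelope as-is
        verdict = judge(work)
        if verdict["status"] != "BLOCKED":
            # DONE / DONE_WITH_CONCERNS: apply; NEEDS_CONTEXT: pause to clarify.
            return verdict
        # Judge said BLOCKED: it explains, the implementer revises.
        feedback = [verdict["blocking_reason"]]
    return verdict  # ceiling reached: hand the latest envelope back to the user
```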