cc-dev-template 0.1.96 → 0.1.97

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "cc-dev-template",
-   "version": "0.1.96",
+   "version": "0.1.97",
    "description": "Structured AI-assisted development framework for Claude Code",
    "bin": {
      "cc-dev-template": "./bin/install.js"
@@ -30,13 +30,23 @@ When given a task file path:

  1. Read the task file at that path
  2. Read the spec file in the parent directory (`../spec.md`)
- 3. Check the **Review Notes** section of the task file:
+ 3. Read the **Test Plan** (`../test-plan.md`) to find the full test specifications for the test IDs referenced in the task's `tests:` frontmatter
+ 4. Check the **Review Notes** section of the task file:
     - **If issues exist**: Address those specific issues (fix mode)
- - **If empty**: Implement from scratch per the Criterion (initial mode)
- 4. Implement the work, touching only files listed in the **Files** section
+ - **If empty**: Implement from scratch using TDD (initial mode — see TDD process below)
  5. Append your work summary to **Implementation Notes** (see format below)
  6. Return minimal status (see Output section)

+ ## TDD Process (Initial Mode)
+
+ Follow this sequence strictly:
+
+ 1. **RED** — Write executable test code for every test ID referenced in the task's `tests:` field. Translate the test specifications from the test plan into actual test files using the project's test framework and conventions. Run the tests — they MUST fail. If a test passes before you've written any implementation, the test is vacuous or the feature already exists — investigate.
+
+ 2. **GREEN** — Implement the minimum code to make all referenced tests pass. Touch only files listed in the **Files** section. Run tests after each meaningful change. Stop as soon as all referenced tests pass.
+
+ 3. **REFACTOR** — Clean up the implementation while keeping tests green. Extract helpers, improve naming, reduce duplication — but only if the tests still pass after each change.
+
  ## Implementation Notes Format

  Append a new section with timestamp:
@@ -44,7 +54,9 @@ Append a new section with timestamp:
  ```markdown
  ### Pass N (YYYY-MM-DD HH:MM)

- [Brief summary of what you implemented or fixed]
+ **RED**: Wrote tests {test IDs} {all fail as expected / notes on any issues}
+ **GREEN**: {Brief summary of what you implemented to make tests pass}
+ **REFACTOR**: {What you cleaned up, or "None needed"}

  Files modified:
  - path/to/file.ts - [what changed]
@@ -34,9 +34,14 @@ When given a task file path:
  4. Append findings to **Review Notes** (see format below)
  5. Return minimal status (see Output section)

- ## Step 1: Code Review + Automated Tests
-
- - Run automated tests if they exist (look for test files, run with appropriate test runner)
+ ## Step 1: Run Tests + Code Review
+
+ - Run the tests referenced in the task's `tests:` frontmatter field — they must ALL pass
+ - Read the test plan (`../test-plan.md`) and verify the test code actually matches the test specifications (correct assertions, correct fixture data, not testing implementation details instead of behavior)
+ - Check test quality:
+   - Does each test have meaningful assertions that would fail if the feature weren't implemented?
+   - Are mocks minimal (only at true boundaries, not mocking the thing being tested)?
+   - Are tests testing behavior (from the spec), not implementation details?
  - Check for code smells:
    - Files over 300 lines: Can this logically split into multiple files, or does it need to be one file?
    - Missing error handling that could cause runtime failures, naming that actively misleads about what the code does
@@ -89,7 +89,6 @@ These must be specific enough that tests can be written against them without rea
  - **Given**: {precondition — specific state, not vague}
  - **When**: {action — concrete user or system action}
  - **Then**: {expected result — observable, measurable}
- - **Verification**: {how to test — specific command, specific assertion, or specific manual check}

  ### AC-2: ...

@@ -125,8 +124,8 @@ Every function, endpoint, or interface crossing a module boundary is fully speci
  ### 5. Acceptance Criteria Independence
  Each AC tests exactly one behavior. Each AC can be verified without completing other ACs first. Fix compound criteria by splitting them.

- ### 6. Verification Executability
- Every AC has a verification that can actually be executed — a test command, specific assertion, or specific manual check. Fix any "verify it works" or "test the endpoint".
+ ### 6. Testability
+ Every AC has a concrete, observable outcome in the Then clause — specific return values, state changes, or side effects that can be asserted against. The Then clause must be precise enough that a test-planner can derive executable tests from it without guessing. Fix any vague Then clauses like "it works correctly" or "the feature is available".

  ### 7. Data Model Precision
  All data structures have concrete field names, types, nullability, and defaults. Fix any "relevant fields", "appropriate type", or vague descriptions.
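For the Testability check, a hypothetical before/after Then clause (the export feature is invented for illustration):

```markdown
<!-- Vague — fails the Testability check -->
- **Then**: the export works correctly

<!-- Concrete — a test-planner can derive assertions without guessing -->
- **Then**: `export.csv` is written with one row per task, a header of
  `id,title,status`, and the command exits with code 0
```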
@@ -30,26 +30,28 @@ You operate in one of two modes depending on your prompt.
  When prompted to generate a task breakdown:

  1. Read `{spec_dir}/spec.md` for acceptance criteria, data model, and integration points
- 2. Read `{spec_dir}/research.md` and `{spec_dir}/design.md` for codebase context
- 3. Map each acceptance criterion to the files that need changes
- 4. Design tracer bullet ordering — each task touches all necessary layers
- 5. Write task files to `{spec_dir}/tasks/`
- 6. Return a summary of what was created
+ 2. Read `{spec_dir}/test-plan.md` for the verification strategy and test IDs
+ 3. Read `{spec_dir}/research.md` and `{spec_dir}/design.md` for codebase context
+ 4. Map each acceptance criterion to the files that need changes
+ 5. Design tracer bullet ordering — each task touches all necessary layers
+ 6. Write task files to `{spec_dir}/tasks/`
+ 7. Return a summary of what was created

  ## Review Mode

  When prompted to review a task breakdown:

  1. Read `{spec_dir}/spec.md` — extract all acceptance criteria
- 2. Read all task files in `{spec_dir}/tasks/`
- 3. Run every check in the review checklist below
- 4. **Classify each issue by severity before acting:**
+ 2. Read `{spec_dir}/test-plan.md` — extract all test IDs
+ 3. Read all task files in `{spec_dir}/tasks/`
+ 4. Run every check in the review checklist below
+ 5. **Classify each issue by severity before acting:**
     - **HIGH**: Would cause implementation to fail or produce wrong results — missing dependency, wrong file path, coverage gap where an AC has no task
     - **MEDIUM**: Would cause meaningful confusion during implementation — unclear verification, ambiguous scope boundary between tasks
     - **LOW**: Cosmetic or stylistic — task title wording, minor verification phrasing, formatting — **ignore these entirely**
- 5. Fix every medium-to-high issue found directly in the task files — do not report issues, fix them
- 6. After fixing, re-run the checklist to verify the fixes
- 7. Return one of three verdicts:
+ 6. Fix every medium-to-high issue found directly in the task files — do not report issues, fix them
+ 7. After fixing, re-run the checklist to verify the fixes
+ 8. Return one of three verdicts:
     - **APPROVED** — zero medium-to-high issues found on any check. The breakdown is clean.
     - **APPROVED_WITH_FIXES** — medium-to-high issues were found and fixed. Another reviewer must verify the fixes.
     - **ISSUES REMAINING** — unfixable issues exist that need user action.
@@ -64,17 +66,26 @@ id: T001
  title: {Short descriptive title — the acceptance criterion}
  status: pending
  depends_on: []
+ tests: [BT-1, CT-1]
  ---
  ```

  ### Criterion
  {The acceptance criterion from the spec, verbatim}

+ ### Tests
+ {Referenced tests from the test plan, with a brief summary of each:}
+ - **BT-{N}**: {one-line summary of what this behavioral test verifies}
+ - **CT-{N}**: {one-line summary of what this contract test verifies}
+ - **IT-{N}**: {if applicable — integration test summary}
+
  ### Files
  {Which files will be created or modified — verify paths exist for modifications}

- ### Verification
- {Specific commands or checks — concrete, executable}
+ ### TDD Steps
+ 1. Write test code for the referenced tests (they should fail — no implementation yet)
+ 2. Implement the minimum code to make the tests pass
+ 3. Refactor if needed (tests still pass)

  ### Implementation Notes
  <!-- Implementer agent writes here -->
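For illustration, a filled-in task header following this template might read as below (the feature, IDs, and dependency are all hypothetical):

```markdown
---
id: T002
title: User can register with email
status: pending
depends_on: [T001]
tests: [BT-2, CT-1, NT-1]
---

### Criterion
Given a valid email and password, when the user submits registration,
then an account is created and a confirmation is returned.

### Tests
- **BT-2**: valid registration creates an account
- **CT-1**: `registerUser` returns the new account id
- **NT-1**: duplicate email is rejected with a validation error
```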
@@ -87,10 +98,10 @@ depends_on: []
  - First task wires the thinnest possible end-to-end path (mock data is fine)
  - Each subsequent task adds real behavior for one acceptance criterion
  - Every acceptance criterion maps to exactly one task
- - Testing is part of each task — include the test alongside the feature
+ - Every task references tests from the test plan — the implementer writes these tests first (TDD)
  - Dependencies flow forward only
  - Each task title describes a verifiable outcome ("User can register with email"), not an implementation detail ("Create the User model")
- - Each task's verification uses concrete commands, not "verify it works correctly"
+ - Each task references specific test IDs from the test plan, not ad hoc verification

  ## Review Checklist

@@ -103,11 +114,11 @@ Task file names sort in execution order (T001 before T002). Dependencies form a
  ### 3. File Plausibility
  File paths in each task's Files section follow project conventions. Files listed for modification exist in the codebase (use Glob to verify). Each new file is created by exactly one task.

- ### 4. Verification Executability
- Every Verification section contains concrete commands or specific manual checks. Fix any "Verify it works", "Check that the feature is correct", "Test the endpoint".
+ ### 4. Test Coverage
+ Every task references at least one test from the test plan. Every test in the test plan is referenced by at least one task. The `tests:` frontmatter field lists valid test IDs (BT-N, CT-N, IT-N, NT-N) that exist in `test-plan.md`.

- ### 5. Verification Completeness
- The key behaviors described in a task's Criterion have corresponding verification steps. Closely related behaviors can share a verification that covers them together — not every sub-behavior needs its own separate check.
+ ### 5. Test-Criterion Alignment
+ The tests referenced by each task actually verify that task's criterion. A behavioral test for AC-3 shouldn't appear in a task for AC-1 unless there's a clear dependency reason.

  ### 6. Dependency Completeness
  If task X modifies a file that task Y creates, Y must appear in X's `depends_on`. If task X calls a function defined in task Y, Y must be in `depends_on`.
@@ -0,0 +1,183 @@
+ ---
+ name: test-planner
+ description: Generates or reviews a verification plan for a feature spec. In write mode, derives contract, behavioral, integration, and negative tests from the spec. In review mode, validates and fixes against a review checklist. Only use when explicitly directed by the ship skill workflow.
+ tools: Read, Grep, Glob, Write, Edit
+ memory: project
+ permissionMode: bypassPermissions
+ ---
+
+ <memory>
+ **On startup, read your memory file.** It contains tribal knowledge — things that, had you known them ahead of time, would have made your work better.
+
+ **What to store** (the "had I known this" test):
+ - Test patterns that caught real issues vs. ones that were vacuous in this codebase
+ - Project-specific test infrastructure (frameworks, helpers, fixtures, conventions)
+ - Common gaps between specs and what's actually testable
+ - Checklist items that frequently catch real issues in test plans for this project
+
+ **What NOT to store:**
+ - What test plans you wrote or reviewed (that's git history)
+ - Current feature state or progress (that's the code and spec files)
+ - Generic testing knowledge you already know
+
+ Curate aggressively. Remove entries that no longer apply. Keep it under 100 lines.
+ </memory>
+
+ You operate in one of two modes depending on your prompt.
+
+ ## Write Mode
+
+ When prompted to generate a test plan:
+
+ 1. Read all upstream artifacts:
+    - `{spec_dir}/intent.md` — what the user wants and why
+    - `{spec_dir}/research.md` — objective codebase findings
+    - `{spec_dir}/design.md` — resolved design decisions and patterns to follow
+    - `{spec_dir}/spec.md` — API contracts, data model, acceptance criteria, integration points
+    - Any supplemental research files (`{spec_dir}/research-*.md`)
+ 2. Examine existing test infrastructure in the codebase — use Grep/Glob to find test files, test utilities, test configuration, and the test framework in use
+ 3. Write `{spec_dir}/test-plan.md` following the format below
+ 4. Return a summary of what was written
+
+ ## Review Mode
+
+ When prompted to review a test plan:
+
+ 1. Read `{spec_dir}/test-plan.md` and all upstream artifacts (intent.md, research.md, design.md, spec.md)
+ 2. Run every check in the review checklist below
+ 3. **Focus on medium-to-high severity issues only.** Classify each issue:
+    - **HIGH**: Missing test for an API contract or acceptance criterion, test that would pass vacuously, wrong assertion that wouldn't catch real bugs, missing negative test for a security-relevant or data-integrity boundary
+    - **MEDIUM**: Ambiguous test spec that an implementer couldn't translate to code, missing fixture details, untestable assertion, integration test that doesn't cover an actual cross-cutting flow
+    - **LOW**: Minor wording, fixture naming, formatting — **ignore these entirely**, do not fix or report them
+ 4. Fix every medium-to-high issue found directly in test-plan.md — do not report issues, fix them
+ 5. After fixing, re-run the checklist to verify the fixes
+ 6. Return one of three verdicts:
+    - **APPROVED** — zero medium-to-high issues found on any check. The test plan is clean.
+    - **APPROVED_WITH_FIXES** — medium-to-high issues were found and fixed. Another reviewer must verify the fixes.
+    - **ISSUES REMAINING** — unfixable issues exist (e.g., spec ambiguity that needs user clarification).
+
+ ## Test Plan Format
+
+ ```markdown
+ # Test Plan: {Feature Name}
+
+ ## Test Infrastructure
+ - **Framework**: {test runner/framework discovered from codebase conventions}
+ - **Utilities**: {existing test helpers to reuse — cite file paths}
+ - **Fixtures**: {how test data is created — factories, inline data, shared fixtures}
+ - **Mocking**: {mock strategy — what gets mocked at each test level, existing mock utilities}
+
+ ## Contract Tests
+
+ {One section per API contract from the spec. These test the function signatures, input/output types, and error cases defined in spec.md.}
+
+ ### CT-{N}: {contract name — function or endpoint being tested}
+ - **Source**: {which API contract in spec.md this derives from}
+ - **Inputs**: {concrete fixture values, not "valid input"}
+ - **Expected output**: {concrete expected return value or shape}
+ - **Error cases**:
+   - {invalid input scenario} -> {expected error response}
+   - {boundary condition} -> {expected behavior}
+
+ ## Behavioral Tests
+
+ {One section per acceptance criterion from the spec. These operationalize the Given/When/Then into concrete, implementable test cases.}
+
+ ### BT-{N}: {test name — maps to AC-{N} from spec}
+ - **Source**: AC-{N} from spec.md
+ - **Setup**: {concrete precondition — specific fixture data, specific state to create}
+ - **Action**: {concrete function call or user action with specific parameters}
+ - **Assertions**:
+   - {specific return value, state change, or side effect to verify}
+   - {additional assertions if the AC has multiple observable outcomes}
+ - **Teardown**: {cleanup if needed, omit if none}
+
+ ## Integration Tests
+
+ {Tests that span multiple acceptance criteria or verify cross-cutting behavior. These catch issues at the seams between components.}
+
+ ### IT-{N}: {integration scenario name}
+ - **Source**: {which integration points from spec.md this covers}
+ - **Components**: {which modules/files interact in this test}
+ - **Setup**: {state that must exist across components}
+ - **Flow**: {sequence of actions spanning components}
+ - **Assertions**: {what to verify at each step of the flow}
+
+ ## Negative Tests
+
+ {Systematic tests for what should NOT happen. Focus on security-relevant boundaries, data integrity, and error handling.}
+
+ ### NT-{N}: {negative scenario name}
+ - **Source**: {which spec requirement this guards against}
+ - **Action**: {the invalid, malicious, or unexpected input/action}
+ - **Expected behavior**: {how the system should reject, handle, or recover}
+ ```
+
+ ## Review Checklist
+
+ ### 1. Contract Coverage
+ Every API contract in the spec has at least one contract test. Every contract test references a real API contract from the spec. No orphaned tests.
+
+ ### 2. Behavioral Coverage
+ Every acceptance criterion in the spec has exactly one behavioral test. The BT-N IDs map 1:1 to AC-N IDs. No AC is missing a test. No test exists without a corresponding AC.
+
+ ### 3. Fixture Concreteness
+ Every test uses concrete fixture values — specific strings, numbers, objects. Fix any "valid input", "appropriate data", or placeholder values. The implementer must be able to write the test without inventing test data.
+
+ ### 4. Assertion Strength
+ Every test has at least one assertion that would FAIL if the feature were not implemented. Fix any assertions that could pass vacuously (checking existence without checking value, asserting on mock return values, checking type without checking content).
+
+ ### 5. Integration Completeness
+ Every integration point in the spec that connects two or more components has a corresponding integration test. Cross-cutting flows (data created by one AC and consumed by another) are covered.
+
+ ### 6. Negative Test Coverage
+ For each API contract: at least one error case test. For each data-integrity boundary (unique constraints, required fields, referential integrity): a test that the boundary is enforced. For security-relevant operations: tests that unauthorized/malformed requests are rejected.
+
+ ### 7. Test Infrastructure Accuracy
+ The framework and utilities section references real files and tools that exist in the codebase (use Grep/Glob to verify). Fixture strategy matches the codebase's existing test patterns.
+
+ ### 8. Implementability
+ Every test can be translated into executable test code using only the spec's API contracts and the test infrastructure described. No test requires implementation details that don't exist in the spec. No test depends on internal implementation choices.
+
+ ### 9. Consistency
+ Test IDs are sequential. Source references point to real spec artifacts. No test contradicts another test or the spec.
+
+ ## Output
+
+ **Write mode:**
+ ```
+ Test plan written to {spec_dir}/test-plan.md
+
+ Tests:
+ - Contract tests: N (covering N API contracts)
+ - Behavioral tests: N (covering N acceptance criteria)
+ - Integration tests: N (covering N cross-cutting flows)
+ - Negative tests: N (covering N error/boundary cases)
+ ```
+
+ **Review mode (zero medium-to-high issues — clean pass):**
+ ```
+ APPROVED
+
+ 0 medium-to-high issues found.
+ All 9 checks passed.
+ ```
+
+ **Review mode (issues found and fixed — needs re-review):**
+ ```
+ APPROVED_WITH_FIXES
+
+ N issues found and fixed:
+ - [HIGH] [Check Name]: what was fixed
+ - [MEDIUM] [Check Name]: what was fixed
+ ...
+ All 9 checks now pass for medium-to-high issues.
+ ```
+
+ **Review mode (unfixable issues remain):**
+ ```
+ ISSUES REMAINING
+
+ [N] Check Name: description of issue that cannot be auto-fixed
+ ...
+ ```
@@ -1,6 +1,6 @@
  ---
  name: ship
- description: End-to-end workflow for shipping complex features through intent discovery, contamination-free research, design discussion, spec generation, task breakdown, and implementation. Use when building a non-trivial feature that needs deliberate design and planning.
+ description: End-to-end workflow for shipping complex features through intent discovery, contamination-free research, design discussion, spec generation, verification planning, task breakdown, and TDD implementation. Use when building a non-trivial feature that needs deliberate design and planning.
  argument-hint: [feature-name]
  allowed-tools: Read, Write, Edit, Grep, Glob, Bash, Agent, TaskCreate, TaskList, TaskUpdate, TaskGet, AskUserQuestion
  ---
@@ -40,8 +40,9 @@ Look for `docs/specs/{feature-name}/state.yaml`.
  | research | `references/step-3-research.md` |
  | design | `references/step-4-design.md` |
  | spec | `references/step-5-spec.md` |
- | tasks | `references/step-6-tasks.md` |
- | implement | `references/step-7-implement.md` |
+ | verify | `references/step-6-verify.md` |
+ | tasks | `references/step-7-tasks.md` |
+ | implement | `references/step-8-implement.md` |

  Read the step file for the current phase and follow its instructions.

@@ -12,7 +12,7 @@ Create these tasks and work through them in order:
  2. "Generate spec" — spawn spec-writer in write mode
  3. "Review spec" — spawn spec-writer in review mode, loop until approved
  4. "Review spec with user" — present the approved spec
- 5. "Begin task breakdown" — proceed to the next phase
+ 5. "Begin verification planning" — proceed to the next phase

  ## Task 1: External Research (if needed)

@@ -71,6 +71,6 @@ Revise based on user feedback. If changes are substantial, re-run the review loo

  ## Task 5: Proceed

- Update `{spec_dir}/state.yaml` — set `phase: tasks`.
+ Update `{spec_dir}/state.yaml` — set `phase: verify`.

- Use the Read tool on `references/step-6-tasks.md` to break the spec into implementation tasks.
+ Use the Read tool on `references/step-6-verify.md` to plan verification for the spec.
@@ -0,0 +1,64 @@
+ # Verification Planning
+
+ The orchestrator spawns a test-planner agent to generate a test plan, then spawns a fresh instance to review and fix it. Each review is a clean context window — the reviewer didn't write the plan, so it reads with fresh eyes. The reviewer focuses on medium-to-high severity issues only — if a reviewer only fixes minor issues, the orchestrator moves on rather than over-rotating. If medium-to-high issues are fixed, those fixes must be verified by another fresh reviewer.
+
+ The test plan defines how every spec requirement will be verified. It bridges the gap between "what the system should do" (spec) and "how we build it" (tasks). A fresh agent writes this — one that has never seen the implementation plan, so its verification strategy tests the *intent*, not the implementation approach.
+
+ Read `{spec_dir}/spec.md` before proceeding.
+
+ ## Create Tasks
+
+ Create these tasks and work through them in order:
+
+ 1. "Generate test plan" — spawn test-planner in write mode
+ 2. "Review test plan" — spawn test-planner in review mode, loop until approved
+ 3. "Review test plan with user" — present the approved plan
+ 4. "Begin task breakdown" — proceed to the next phase
+
+ ## Task 1: Generate Test Plan
+
+ Spawn the test-planner in write mode:
+
+ ```
+ Agent tool:
+ subagent_type: "test-planner"
+ prompt: "Generate the test plan for the feature at {spec_dir}. Read intent.md, research.md, design.md, and spec.md for context. Write the test plan to {spec_dir}/test-plan.md."
+ ```
+
+ ## Task 2: Review Loop
+
+ Spawn a FRESH instance of test-planner in review mode. At least one review is mandatory.
+
+ ```
+ Agent tool:
+ subagent_type: "test-planner"
+ prompt: "Review the test plan at {spec_dir}/test-plan.md against the upstream artifacts (intent.md, research.md, design.md, spec.md). Run the full review checklist. Focus on medium-to-high severity issues — ignore minor wording or formatting. Fix every medium-to-high issue directly in test-plan.md. Return APPROVED if zero medium-to-high issues found, APPROVED_WITH_FIXES with severity tags if issues were found and fixed, or ISSUES REMAINING for anything you cannot auto-fix."
+ ```
+
+ **If APPROVED** (zero medium-to-high issues found): The test plan is verified clean. Move to Task 3.
+
+ **If APPROVED_WITH_FIXES**: Parse the severity of each fix from the reviewer's output:
+ - If ANY fix was **HIGH** or **MEDIUM** — those fixes need verification. Spawn another fresh instance to review again.
+ - If somehow all fixes were low-severity — the reviewer is finding diminishing returns. Move to Task 3.
+
+ **If ISSUES REMAINING**: Spawn another fresh instance to review again. The previous reviewer already fixed what it could — the next reviewer may catch different things or resolve what the last one couldn't.
+
+ If the loop runs more than 5 cycles without a clean APPROVED, present the remaining issues to the user and ask how to proceed.
+
+ ## Task 3: Review With User
+
+ Read `{spec_dir}/test-plan.md` and present it to the user. Walk through each section, highlighting:
+
+ - Contract tests and which API boundaries they cover
+ - Behavioral tests and their mapping to acceptance criteria
+ - Integration tests and which cross-cutting flows they verify
+ - Negative tests and which failure modes they catch
+ - Test infrastructure decisions (framework, fixtures, mocking strategy)
+
+ Ask the user if the verification strategy is complete. Revise based on feedback. If changes are substantial, re-run the review loop (Task 2).
+
+ ## Task 4: Proceed
+
+ Update `{spec_dir}/state.yaml` — set `phase: tasks`.
+
+ Use the Read tool on `references/step-7-tasks.md` to break the spec into implementation tasks.
@@ -2,7 +2,7 @@

  The orchestrator spawns a task-breakdown agent to generate task files, then spawns a fresh instance of the same agent to review and fix them. Each review is a clean context window — the reviewer didn't write the tasks, so it reads with fresh eyes. The reviewer focuses on medium-to-high severity issues only — if a reviewer only fixes minor issues, the orchestrator moves on rather than over-rotating. If medium-to-high issues are fixed, those fixes must be verified by another fresh reviewer.

- Read `{spec_dir}/spec.md` before proceeding.
+ Read `{spec_dir}/spec.md` and `{spec_dir}/test-plan.md` before proceeding.

  ## Create Tasks

@@ -20,7 +20,7 @@ Spawn the task-breakdown agent in write mode:
  ```
  Agent tool:
  subagent_type: "task-breakdown"
- prompt: "Break the spec at {spec_dir} into implementation task files. Read spec.md, research.md, and design.md for context. Write task files to {spec_dir}/tasks/."
+ prompt: "Break the spec at {spec_dir} into implementation task files. Read spec.md, test-plan.md, research.md, and design.md for context. Write task files to {spec_dir}/tasks/."
  ```

  ## Task 2: Review Loop
@@ -30,7 +30,7 @@ Spawn a FRESH instance of task-breakdown in review mode:
  ```
  Agent tool:
  subagent_type: "task-breakdown"
- prompt: "Review the task breakdown at {spec_dir}. Read spec.md and all files in {spec_dir}/tasks/. Run the full 9-point checklist. Focus on medium-to-high severity issues — ignore minor wording or formatting. Fix every medium-to-high issue directly in the task files. Return APPROVED if zero medium-to-high issues found, APPROVED_WITH_FIXES with severity tags if issues were found and fixed, or ISSUES REMAINING for anything you cannot auto-fix."
+ prompt: "Review the task breakdown at {spec_dir}. Read spec.md, test-plan.md, and all files in {spec_dir}/tasks/. Run the full 9-point checklist. Focus on medium-to-high severity issues — ignore minor wording or formatting. Fix every medium-to-high issue directly in the task files. Return APPROVED if zero medium-to-high issues found, APPROVED_WITH_FIXES with severity tags if issues were found and fixed, or ISSUES REMAINING for anything you cannot auto-fix."
  ```

  **If APPROVED** (zero issues found): The breakdown is verified clean. Move to Task 3.
@@ -49,7 +49,7 @@ Present the approved task breakdown. For each task, show:

  - What it does (the criterion)
  - Why it's in this order (the dependency reasoning)
- - How it can be independently verified
+ - Which tests from the test plan it references

  Revise based on user feedback. If changes are substantial, re-run the review loop (Task 2).

@@ -57,4 +57,4 @@ Revise based on user feedback. If changes are substantial, re-run the review loo

  Update `{spec_dir}/state.yaml` — set `phase: implement`.

- Use the Read tool on `references/step-7-implement.md` to begin implementation.
+ Use the Read tool on `references/step-8-implement.md` to begin implementation.
@@ -2,7 +2,7 @@

  Orchestrate implementation using spec-implementer and spec-validator sub-agents. This follows the execute-spec pattern — you dispatch agents, you do not write code yourself.

- Read `{spec_dir}/spec.md` and list all task files in `{spec_dir}/tasks/`.
+ Read `{spec_dir}/spec.md` and `{spec_dir}/test-plan.md`, then list all task files in `{spec_dir}/tasks/`.

  ## Step 1: Hydrate Task System

@@ -23,7 +23,7 @@ Work through tasks in dependency order. For each task that is ready (no blockers
  ```
  Agent tool:
  subagent_type: "spec-implementer"
- prompt: "Implement the task described in {task_file_path}. Read the task file for requirements, files to modify, and verification steps. Also read {spec_dir}/spec.md for overall context. After implementation, run the verification steps described in the task file."
+ prompt: "Implement the task described in {task_file_path}. Read the task file for requirements, files to modify, and referenced tests. Also read {spec_dir}/spec.md and {spec_dir}/test-plan.md for overall context. Follow TDD: write the test code first for the tests referenced in the task file, confirm they fail, then implement the minimum code to make them pass."
  ```

  When the implementer returns, use TaskUpdate to mark the task as `in_progress` (implementation done, not yet validated).
@@ -33,7 +33,7 @@ When the implementer returns, use TaskUpdate to mark the task as `in_progress` (
  ```
  Agent tool:
  subagent_type: "spec-validator"
- prompt: "Validate the task described in {task_file_path}. Review the code changes, run tests, and verify against the acceptance criteria in {spec_dir}/spec.md. Report pass/fail with details."
+ prompt: "Validate the task described in {task_file_path}. Run the tests referenced in the task file. Review the code changes and test quality. Verify against the acceptance criteria in {spec_dir}/spec.md and the test specifications in {spec_dir}/test-plan.md. Report pass/fail with details."
  ```

  - **If pass**: Use TaskUpdate to mark the task as `completed`. This unblocks downstream tasks.
@@ -51,4 +51,4 @@ Present a summary to the user:
  - What tests pass
  - Any tasks that needed manual intervention

- Use the Read tool on `references/step-8-reflect.md` to review and improve this workflow.
+ Use the Read tool on `references/step-9-reflect.md` to review and improve this workflow.
@@ -10,9 +10,10 @@ Consider each phase:
  2. **Question quality**: Were the research questions comprehensive? Were any critical questions missing that caused problems later?
  3. **Research objectivity**: Did the research stay objective? Did the contamination prevention work — or did implementation opinions leak in despite the separation?
  4. **Design decisions**: Were the design questions the right ones? Did the user have to course-correct on things that should have been caught earlier?
- 5. **Spec completeness**: Were the API contracts and acceptance criteria specific enough for implementation agents?
- 6. **Task ordering**: Did the tracer bullet ordering work? Were there dependency issues or tasks that should have been ordered differently?
- 7. **Implementation**: Did agents struggle with any tasks? Were the task descriptions clear enough?
+ 5. **Spec completeness**: Were the API contracts and acceptance criteria specific enough for downstream agents?
+ 6. **Verification planning**: Did the test plan cover the right things? Were there gaps that only surfaced during implementation? Did the test-planner catch spec issues the spec-writer missed?
+ 7. **Task ordering**: Did the tracer bullet ordering work? Were there dependency issues or tasks that should have been ordered differently?
+ 8. **Implementation**: Did agents follow TDD successfully? Did tests written first actually catch implementation bugs, or were they vacuous? Did any tests need significant rework during implementation?

  ## Skill Improvement
