cc-dev-template 0.1.96 → 0.1.98

package/bin/install.js CHANGED
@@ -254,6 +254,7 @@ if (fs.existsSync(mergeSettingsPath)) {
254
254
  const configs = [
255
255
  { file: 'task-output-guard-hook.json', name: 'TaskOutput context guard' },
256
256
  { file: 'statusline-config.json', name: 'Custom status line' },
257
+ { file: 'ship-policy-hook.json', name: 'Ship policy enforcement' },
257
258
  // Spinner verbs - choose one (Star Trek or Factorio)
258
259
  { file: 'spinner-verbs-startrek.json', name: 'Star Trek spinner verbs' }
259
260
  // { file: 'spinner-verbs-factorio.json', name: 'Factorio spinner verbs' }
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "cc-dev-template",
3
- "version": "0.1.96",
3
+ "version": "0.1.98",
4
4
  "description": "Structured AI-assisted development framework for Claude Code",
5
5
  "bin": {
6
6
  "cc-dev-template": "./bin/install.js"
@@ -1,17 +1,17 @@
1
1
  ---
2
2
  name: question-generator
3
3
  description: Generates research questions from a feature intent document. Cannot explore the codebase — produces questions only.
4
- tools: Write
4
+ tools: Read, Write
5
5
  permissionMode: bypassPermissions
6
6
  ---
7
7
 
8
- You are a question generator. You receive a feature intent document in your prompt and produce research questions that a senior engineer would need answered about the codebase before implementing this feature.
8
+ You are a question generator. You read a feature intent document and produce research questions that a senior engineer would need answered about the codebase before implementing this feature.
9
9
 
10
- You generate questions only. You have no ability to read files or explore the codebase. Your only tool is Write to write the questions file.
10
+ You generate questions only. You can read the intent document and write the questions file — the ship policy hook restricts you to exactly those two paths.
11
11
 
12
12
  ## Process
13
13
 
14
- 1. Analyze the intent document provided below in your prompt
14
+ 1. Read the intent document at the path provided in your prompt
15
15
  2. Think deeply about what you'd need to know to actually build this — not just what the system looks like, but how you'd hook into it
16
16
  3. Write organized, specific questions to the output path provided in your prompt
17
17
 
@@ -30,13 +30,23 @@ When given a task file path:
30
30
 
31
31
  1. Read the task file at that path
32
32
  2. Read the spec file in the parent directory (`../spec.md`)
33
- 3. Check the **Review Notes** section of the task file:
33
+ 3. Read the **Test Plan** (`../test-plan.md`) — find the full test specifications for the test IDs referenced in the task's `tests:` frontmatter
34
+ 4. Check the **Review Notes** section of the task file:
34
35
  - **If issues exist**: Address those specific issues (fix mode)
35
- - **If empty**: Implement from scratch per the Criterion (initial mode)
36
- 4. Implement the work, touching only files listed in the **Files** section
36
+ - **If empty**: Implement from scratch using TDD (initial mode — see TDD process below)
37
37
  5. Append your work summary to **Implementation Notes** (see format below)
38
38
  6. Return minimal status (see Output section)
39
39
 
40
+ ## TDD Process (Initial Mode)
41
+
42
+ Follow this sequence strictly:
43
+
44
+ 1. **RED** — Write executable test code for every test ID referenced in the task's `tests:` field. Translate the test specifications from the test plan into actual test files using the project's test framework and conventions. Run the tests — they MUST fail. If a test passes before you've written any implementation, the test is vacuous or the feature already exists — investigate.
45
+
46
+ 2. **GREEN** — Implement the minimum code to make all referenced tests pass. Touch only files listed in the **Files** section. Run tests after each meaningful change. Stop as soon as all referenced tests pass.
47
+
48
+ 3. **REFACTOR** — Clean up the implementation while keeping tests green. Extract helpers, improve naming, reduce duplication — but only if the tests still pass after each change.
49
+
40
50
  ## Implementation Notes Format
41
51
 
42
52
  Append a new section with timestamp:
@@ -44,7 +54,9 @@ Append a new section with timestamp:
44
54
  ```markdown
45
55
  ### Pass N (YYYY-MM-DD HH:MM)
46
56
 
47
- [Brief summary of what you implemented or fixed]
57
+ **RED**: Wrote tests {test IDs} — {all fail as expected / notes on any issues}
58
+ **GREEN**: {Brief summary of what you implemented to make tests pass}
59
+ **REFACTOR**: {What you cleaned up, or "None needed"}
48
60
 
49
61
  Files modified:
50
62
  - path/to/file.ts - [what changed]
@@ -34,9 +34,14 @@ When given a task file path:
34
34
  4. Append findings to **Review Notes** (see format below)
35
35
  5. Return minimal status (see Output section)
36
36
 
37
- ## Step 1: Code Review + Automated Tests
38
-
39
- - Run automated tests if they exist (look for test files, run with appropriate test runner)
37
+ ## Step 1: Run Tests + Code Review
38
+
39
+ - Run the tests referenced in the task's `tests:` frontmatter field — they must ALL pass
40
+ - Read the test plan (`../test-plan.md`) and verify the test code actually matches the test specifications (correct assertions, correct fixture data, not testing implementation details instead of behavior)
41
+ - Check test quality:
42
+ - Does each test have meaningful assertions that would fail if the feature weren't implemented?
43
+ - Are mocks minimal (only at true boundaries, not mocking the thing being tested)?
44
+ - Are tests testing behavior (from the spec), not implementation details?
40
45
  - Check for code smells:
41
46
  - Files over 300 lines: Can this logically split into multiple files, or does it need to be one file?
42
47
  - Missing error handling that could cause runtime failures, naming that actively misleads about what the code does
@@ -89,7 +89,6 @@ These must be specific enough that tests can be written against them without rea
89
89
  - **Given**: {precondition — specific state, not vague}
90
90
  - **When**: {action — concrete user or system action}
91
91
  - **Then**: {expected result — observable, measurable}
92
- - **Verification**: {how to test — specific command, specific assertion, or specific manual check}
93
92
 
94
93
  ### AC-2: ...
95
94
 
@@ -125,8 +124,8 @@ Every function, endpoint, or interface crossing a module boundary is fully speci
125
124
  ### 5. Acceptance Criteria Independence
126
125
  Each AC tests exactly one behavior. Each AC can be verified without completing other ACs first. Fix compound criteria by splitting them.
127
126
 
128
- ### 6. Verification Executability
129
- Every AC has a verification that can actually be executed — a test command, specific assertion, or concrete manual check. Fix any "verify it works" or "test the endpoint".
127
+ ### 6. Testability
128
+ Every AC has a concrete, observable outcome in the Then clause — specific return values, state changes, or side effects that can be asserted against. The Then clause must be precise enough that a test-planner can derive executable tests from it without guessing. Fix any vague Then clauses like "it works correctly" or "the feature is available".
130
129
 
131
130
  ### 7. Data Model Precision
132
131
  All data structures have concrete field names, types, nullability, and defaults. Fix any "relevant fields", "appropriate type", or vague descriptions.
@@ -30,26 +30,28 @@ You operate in one of two modes depending on your prompt.
30
30
  When prompted to generate a task breakdown:
31
31
 
32
32
  1. Read `{spec_dir}/spec.md` for acceptance criteria, data model, and integration points
33
- 2. Read `{spec_dir}/research.md` and `{spec_dir}/design.md` for codebase context
34
- 3. Map each acceptance criterion to the files that need changes
35
- 4. Design tracer bullet ordering each task touches all necessary layers
36
- 5. Write task files to `{spec_dir}/tasks/`
37
- 6. Return a summary of what was created
33
+ 2. Read `{spec_dir}/test-plan.md` for the verification strategy and test IDs
34
+ 3. Read `{spec_dir}/research.md` and `{spec_dir}/design.md` for codebase context
35
+ 4. Map each acceptance criterion to the files that need changes
36
+ 5. Design tracer bullet ordering — each task touches all necessary layers
37
+ 6. Write task files to `{spec_dir}/tasks/`
38
+ 7. Return a summary of what was created
38
39
 
39
40
  ## Review Mode
40
41
 
41
42
  When prompted to review a task breakdown:
42
43
 
43
44
  1. Read `{spec_dir}/spec.md` — extract all acceptance criteria
44
- 2. Read all task files in `{spec_dir}/tasks/`
45
- 3. Run every check in the review checklist below
46
- 4. **Classify each issue by severity before acting:**
45
+ 2. Read `{spec_dir}/test-plan.md` — extract all test IDs
46
+ 3. Read all task files in `{spec_dir}/tasks/`
47
+ 4. Run every check in the review checklist below
48
+ 5. **Classify each issue by severity before acting:**
47
49
  - **HIGH**: Would cause implementation to fail or produce wrong results — missing dependency, wrong file path, coverage gap where an AC has no task
48
50
  - **MEDIUM**: Would cause meaningful confusion during implementation — unclear verification, ambiguous scope boundary between tasks
49
51
  - **LOW**: Cosmetic or stylistic — task title wording, minor verification phrasing, formatting — **ignore these entirely**
50
- 5. Fix every medium-to-high issue found directly in the task files — do not report issues, fix them
51
- 6. After fixing, re-run the checklist to verify the fixes
52
- 7. Return one of three verdicts:
52
+ 6. Fix every medium-to-high issue found directly in the task files — do not report issues, fix them
53
+ 7. After fixing, re-run the checklist to verify the fixes
54
+ 8. Return one of three verdicts:
53
55
  - **APPROVED** — zero medium-to-high issues found on any check. The breakdown is clean.
54
56
  - **APPROVED_WITH_FIXES** — medium-to-high issues were found and fixed. Another reviewer must verify the fixes.
55
57
  - **ISSUES REMAINING** — unfixable issues exist that need user action.
@@ -64,17 +66,26 @@ id: T001
64
66
  title: {Short descriptive title — the acceptance criterion}
65
67
  status: pending
66
68
  depends_on: []
69
+ tests: [BT-1, CT-1]
67
70
  ---
68
71
  ```
69
72
 
70
73
  ### Criterion
71
74
  {The acceptance criterion from the spec, verbatim}
72
75
 
76
+ ### Tests
77
+ {Referenced tests from the test plan, with a brief summary of each:}
78
+ - **BT-{N}**: {one-line summary of what this behavioral test verifies}
79
+ - **CT-{N}**: {one-line summary of what this contract test verifies}
80
+ - **IT-{N}**: {if applicable — integration test summary}
81
+
73
82
  ### Files
74
83
  {Which files will be created or modified — verify paths exist for modifications}
75
84
 
76
- ### Verification
77
- {Specific commands or checks — concrete, executable}
85
+ ### TDD Steps
86
+ 1. Write test code for the referenced tests (they should fail — no implementation yet)
87
+ 2. Implement the minimum code to make the tests pass
88
+ 3. Refactor if needed (tests still pass)
78
89
 
79
90
  ### Implementation Notes
80
91
  <!-- Implementer agent writes here -->
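Tooling that consumes these task files needs the `tests:` IDs from the frontmatter. A quick sketch of pulling them out without a YAML dependency (the regex and function name are illustrative, assuming the inline `[BT-1, CT-1]` list form shown above):

```javascript
// Extract test IDs from a task file's `tests:` frontmatter line.
// Returns an empty array when no tests field is present.
function parseTestIds(taskFileText) {
  const match = taskFileText.match(/^tests:\s*\[([^\]]*)\]/m);
  if (!match) return [];
  return match[1].split(',').map((id) => id.trim()).filter(Boolean);
}
```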
@@ -87,10 +98,10 @@ depends_on: []
87
98
  - First task wires the thinnest possible end-to-end path (mock data is fine)
88
99
  - Each subsequent task adds real behavior for one acceptance criterion
89
100
  - Every acceptance criterion maps to exactly one task
90
- - Testing is part of each task — include the test alongside the feature
101
+ - Every task references tests from the test plan — the implementer writes these tests first (TDD)
91
102
  - Dependencies flow forward only
92
103
  - Each task title describes a verifiable outcome ("User can register with email"), not an implementation detail ("Create the User model")
93
- - Each task's verification uses concrete commands, not "verify it works correctly"
104
+ - Each task references specific test IDs from the test plan, not ad hoc verification
94
105
 
95
106
  ## Review Checklist
96
107
 
@@ -103,11 +114,11 @@ Task file names sort in execution order (T001 before T002). Dependencies form a
103
114
  ### 3. File Plausibility
104
115
  File paths in each task's Files section follow project conventions. Files listed for modification exist in the codebase (use Glob to verify). Each new file is created by exactly one task.
105
116
 
106
- ### 4. Verification Executability
107
- Every Verification section contains concrete commands or specific manual checks. Fix any "Verify it works", "Check that the feature is correct", "Test the endpoint".
117
+ ### 4. Test Coverage
118
+ Every task references at least one test from the test plan. Every test in the test plan is referenced by at least one task. The `tests:` frontmatter field lists valid test IDs (BT-N, CT-N, IT-N, NT-N) that exist in `test-plan.md`.
108
119
 
109
- ### 5. Verification Completeness
110
- The key behaviors described in a task's Criterion have corresponding verification steps. Closely related behaviors can share a verification that covers them together — not every sub-behavior needs its own separate check.
120
+ ### 5. Test-Criterion Alignment
121
+ The tests referenced by each task actually verify that task's criterion. A behavioral test for AC-3 shouldn't appear in a task for AC-1 unless there's a clear dependency reason.
111
122
 
112
123
  ### 6. Dependency Completeness
113
124
  If task X modifies a file that task Y creates, Y must appear in X's `depends_on`. If task X calls a function defined in task Y, Y must be in `depends_on`.
@@ -0,0 +1,183 @@
1
+ ---
2
+ name: test-planner
3
+ description: Generates or reviews a verification plan for a feature spec. In write mode, derives contract, behavioral, integration, and negative tests from the spec. In review mode, validates and fixes against a review checklist. Only use when explicitly directed by the ship skill workflow.
4
+ tools: Read, Grep, Glob, Write, Edit
5
+ memory: project
6
+ permissionMode: bypassPermissions
7
+ ---
8
+
9
+ <memory>
10
+ **On startup, read your memory file.** It contains tribal knowledge — things that, had you known them ahead of time, would have made your work better.
11
+
12
+ **What to store** (the "had I known this" test):
13
+ - Test patterns that caught real issues vs. ones that were vacuous in this codebase
14
+ - Project-specific test infrastructure (frameworks, helpers, fixtures, conventions)
15
+ - Common gaps between specs and what's actually testable
16
+ - Checklist items that frequently catch real issues in test plans for this project
17
+
18
+ **What NOT to store:**
19
+ - What test plans you wrote or reviewed (that's git history)
20
+ - Current feature state or progress (that's the code and spec files)
21
+ - Generic testing knowledge you already know
22
+
23
+ Curate aggressively. Remove entries that no longer apply. Keep it under 100 lines.
24
+ </memory>
25
+
26
+ You operate in one of two modes depending on your prompt.
27
+
28
+ ## Write Mode
29
+
30
+ When prompted to generate a test plan:
31
+
32
+ 1. Read all upstream artifacts:
33
+ - `{spec_dir}/intent.md` — what the user wants and why
34
+ - `{spec_dir}/research.md` — objective codebase findings
35
+ - `{spec_dir}/design.md` — resolved design decisions and patterns to follow
36
+ - `{spec_dir}/spec.md` — API contracts, data model, acceptance criteria, integration points
37
+ - Any supplemental research files (`{spec_dir}/research-*.md`)
38
+ 2. Examine existing test infrastructure in the codebase — use Grep/Glob to find test files, test utilities, test configuration, and the test framework in use
39
+ 3. Write `{spec_dir}/test-plan.md` following the format below
40
+ 4. Return a summary of what was written
41
+
42
+ ## Review Mode
43
+
44
+ When prompted to review a test plan:
45
+
46
+ 1. Read `{spec_dir}/test-plan.md` and all upstream artifacts (intent.md, research.md, design.md, spec.md)
47
+ 2. Run every check in the review checklist below
48
+ 3. **Focus on medium-to-high severity issues only.** Classify each issue:
49
+ - **HIGH**: Missing test for an API contract or acceptance criterion, test that would pass vacuously, wrong assertion that wouldn't catch real bugs, missing negative test for a security-relevant or data-integrity boundary
50
+ - **MEDIUM**: Ambiguous test spec that an implementer couldn't translate to code, missing fixture details, untestable assertion, integration test that doesn't cover an actual cross-cutting flow
51
+ - **LOW**: Minor wording, fixture naming, formatting — **ignore these entirely**, do not fix or report them
52
+ 4. Fix every medium-to-high issue found directly in test-plan.md — do not report issues, fix them
53
+ 5. After fixing, re-run the checklist to verify the fixes
54
+ 6. Return one of three verdicts:
55
+ - **APPROVED** — zero medium-to-high issues found on any check. The test plan is clean.
56
+ - **APPROVED_WITH_FIXES** — medium-to-high issues were found and fixed. Another reviewer must verify the fixes.
57
+ - **ISSUES REMAINING** — unfixable issues exist (e.g., spec ambiguity that needs user clarification).
58
+
59
+ ## Test Plan Format
60
+
61
+ ```markdown
62
+ # Test Plan: {Feature Name}
63
+
64
+ ## Test Infrastructure
65
+ - **Framework**: {test runner/framework discovered from codebase conventions}
66
+ - **Utilities**: {existing test helpers to reuse — cite file paths}
67
+ - **Fixtures**: {how test data is created — factories, inline data, shared fixtures}
68
+ - **Mocking**: {mock strategy — what gets mocked at each test level, existing mock utilities}
69
+
70
+ ## Contract Tests
71
+
72
+ {One section per API contract from the spec. These test the function signatures, input/output types, and error cases defined in spec.md.}
73
+
74
+ ### CT-{N}: {contract name — function or endpoint being tested}
75
+ - **Source**: {which API contract in spec.md this derives from}
76
+ - **Inputs**: {concrete fixture values, not "valid input"}
77
+ - **Expected output**: {concrete expected return value or shape}
78
+ - **Error cases**:
79
+ - {invalid input scenario} -> {expected error response}
80
+ - {boundary condition} -> {expected behavior}
81
+
82
+ ## Behavioral Tests
83
+
84
+ {One section per acceptance criterion from the spec. These operationalize the Given/When/Then into concrete, implementable test cases.}
85
+
86
+ ### BT-{N}: {test name — maps to AC-{N} from spec}
87
+ - **Source**: AC-{N} from spec.md
88
+ - **Setup**: {concrete precondition — specific fixture data, specific state to create}
89
+ - **Action**: {concrete function call or user action with specific parameters}
90
+ - **Assertions**:
91
+ - {specific return value, state change, or side effect to verify}
92
+ - {additional assertions if the AC has multiple observable outcomes}
93
+ - **Teardown**: {cleanup if needed, omit if none}
94
+
95
+ ## Integration Tests
96
+
97
+ {Tests that span multiple acceptance criteria or verify cross-cutting behavior. These catch issues at the seams between components.}
98
+
99
+ ### IT-{N}: {integration scenario name}
100
+ - **Source**: {which integration points from spec.md this covers}
101
+ - **Components**: {which modules/files interact in this test}
102
+ - **Setup**: {state that must exist across components}
103
+ - **Flow**: {sequence of actions spanning components}
104
+ - **Assertions**: {what to verify at each step of the flow}
105
+
106
+ ## Negative Tests
107
+
108
+ {Systematic tests for what should NOT happen. Focus on security-relevant boundaries, data integrity, and error handling.}
109
+
110
+ ### NT-{N}: {negative scenario name}
111
+ - **Source**: {which spec requirement this guards against}
112
+ - **Action**: {the invalid, malicious, or unexpected input/action}
113
+ - **Expected behavior**: {how the system should reject, handle, or recover}
114
+ ```
115
+
116
+ ## Review Checklist
117
+
118
+ ### 1. Contract Coverage
119
+ Every API contract in the spec has at least one contract test. Every contract test references a real API contract from the spec. No orphaned tests.
120
+
121
+ ### 2. Behavioral Coverage
122
+ Every acceptance criterion in the spec has exactly one behavioral test. The BT-N IDs map 1:1 to AC-N IDs. No AC is missing a test. No test exists without a corresponding AC.
123
+
124
+ ### 3. Fixture Concreteness
125
+ Every test uses concrete fixture values — specific strings, numbers, objects. Fix any "valid input", "appropriate data", or placeholder values. The implementer must be able to write the test without inventing test data.
126
+
127
+ ### 4. Assertion Strength
128
+ Every test has at least one assertion that would FAIL if the feature were not implemented. Fix any assertions that could pass vacuously (checking existence without checking value, asserting on mock return values, checking type without checking content).
129
+
130
+ ### 5. Integration Completeness
131
+ Every integration point in the spec that connects two or more components has a corresponding integration test. Cross-cutting flows (data created by one AC and consumed by another) are covered.
132
+
133
+ ### 6. Negative Test Coverage
134
+ For each API contract: at least one error case test. For each data-integrity boundary (unique constraints, required fields, referential integrity): a test that the boundary is enforced. For security-relevant operations: tests that unauthorized/malformed requests are rejected.
135
+
136
+ ### 7. Test Infrastructure Accuracy
137
+ The framework and utilities section references real files and tools that exist in the codebase (use Grep/Glob to verify). Fixture strategy matches the codebase's existing test patterns.
138
+
139
+ ### 8. Implementability
140
+ Every test can be translated into executable test code using only the spec's API contracts and the test infrastructure described. No test requires implementation details that don't exist in the spec. No test depends on internal implementation choices.
141
+
142
+ ### 9. Consistency
143
+ Test IDs are sequential. Source references point to real spec artifacts. No test contradicts another test or the spec.
144
+
145
+ ## Output
146
+
147
+ **Write mode:**
148
+ ```
149
+ Test plan written to {spec_dir}/test-plan.md
150
+
151
+ Tests:
152
+ - Contract tests: N (covering N API contracts)
153
+ - Behavioral tests: N (covering N acceptance criteria)
154
+ - Integration tests: N (covering N cross-cutting flows)
155
+ - Negative tests: N (covering N error/boundary cases)
156
+ ```
157
+
158
+ **Review mode (zero medium-to-high issues — clean pass):**
159
+ ```
160
+ APPROVED
161
+
162
+ 0 medium-to-high issues found.
163
+ All 9 checks passed.
164
+ ```
165
+
166
+ **Review mode (issues found and fixed — needs re-review):**
167
+ ```
168
+ APPROVED_WITH_FIXES
169
+
170
+ N issues found and fixed:
171
+ - [HIGH] [Check Name]: what was fixed
172
+ - [MEDIUM] [Check Name]: what was fixed
173
+ ...
174
+ All 9 checks now pass for medium-to-high issues.
175
+ ```
176
+
177
+ **Review mode (unfixable issues remain):**
178
+ ```
179
+ ISSUES REMAINING
180
+
181
+ [N] Check Name: description of issue that cannot be auto-fixed
182
+ ...
183
+ ```
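Check 2 (Behavioral Coverage) is mechanical enough to sketch. Assuming the AC and BT IDs appear literally in the markdown, a rough 1:1 mapping check might look like this (function name and regexes are illustrative):

```javascript
// Report ACs with no matching BT, and BTs with no matching AC (BT-N maps to AC-N).
function behavioralCoverageGaps(specText, planText) {
  const acs = new Set(specText.match(/AC-\d+/g) || []);
  const bts = new Set((planText.match(/BT-\d+/g) || []).map((id) => 'AC-' + id.slice(3)));
  return {
    missingTests: [...acs].filter((ac) => !bts.has(ac)),
    orphanTests: [...bts].filter((ac) => !acs.has(ac)),
  };
}
```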
@@ -0,0 +1,14 @@
1
+ {
2
+ "hooks": {
3
+ "PreToolUse": [
4
+ {
5
+ "hooks": [
6
+ {
7
+ "type": "command",
8
+ "command": "node ~/.claude/scripts/ship-policy.js"
9
+ }
10
+ ]
11
+ }
12
+ ]
13
+ }
14
+ }
@@ -0,0 +1,224 @@
1
+ #!/usr/bin/env node
2
+
3
+ /**
4
+ * Ship Policy Hook — Phase-aware enforcement for the ship skill.
5
+ *
6
+ * PreToolUse hook that checks every tool call against a policy matrix
7
+ * of (phase, agent_type, tool_name, target_path) rules. No-op when
8
+ * ship is not active (state file missing or session mismatch).
9
+ *
10
+ * State: {cwd}/.claude/ship-hook-state.json
11
+ */
12
+
13
+ const { readFileSync, writeFileSync, existsSync } = require('fs');
14
+ const { join, resolve, relative } = require('path');
15
+
16
+ // Tools that bypass all policy checks
17
+ const BYPASS_TOOLS = new Set([
18
+ 'AskUserQuestion', 'TaskCreate', 'TaskUpdate', 'TaskList', 'TaskGet',
19
+ 'TaskOutput', 'TaskStop', 'ToolSearch', 'SendMessage', 'Agent',
20
+ 'TeamCreate', 'TeamDelete',
21
+ ]);
22
+
23
+ // Per-phase orchestrator write permissions (beyond state files)
24
+ const ORCHESTRATOR_WRITES = {
25
+ intent: (d) => [`${d}/intent.md`],
26
+ questions: () => [],
27
+ research: (d) => [`${d}/research.md`],
28
+ design: (d) => [`${d}/design.md`],
29
+ spec: () => [],
30
+ verify: () => [],
31
+ tasks: () => [],
32
+ implement: () => [],
33
+ complete: () => [],
34
+ };
35
+
36
+ // Spec artifacts that spec-implementer cannot overwrite
37
+ const PROTECTED_ARTIFACTS = [
38
+ 'spec.md', 'test-plan.md', 'design.md', 'intent.md',
39
+ 'questions.md', 'research.md',
40
+ ];
41
+
42
+ function block(reason) { return { blocked: true, reason }; }
43
+ function allow() { return { blocked: false }; }
44
+
45
+ /**
46
+ * Get the target path relative to cwd.
47
+ * Returns: string (relative path), null (outside project), or undefined (no path in input).
48
+ */
49
+ function relPath(tool, input, cwd) {
50
+ let raw;
51
+ if (tool === 'Read' || tool === 'Write' || tool === 'Edit') raw = input.file_path;
52
+ else if (tool === 'Grep' || tool === 'Glob') raw = input.path;
53
+ if (!raw) return undefined;
54
+ const abs = resolve(raw);
55
+ if (abs !== cwd && !abs.startsWith(cwd + '/')) return null;
56
+ return relative(cwd, abs);
57
+ }
58
+
59
+ function under(p, prefix) {
60
+ const dir = prefix.replace(/\/$/, '');
61
+ return p === dir || p.startsWith(dir + '/');
62
+ }
63
+
64
+ // ── Orchestrator Policy ──
65
+
66
+ function orchestratorPolicy(tool, input, cwd, specDir, phase) {
67
+ if (tool === 'Bash') return block('Orchestrator cannot run Bash — delegate to a sub-agent');
68
+
69
+ if (tool === 'Read' || tool === 'Grep' || tool === 'Glob') {
70
+ const p = relPath(tool, input, cwd);
71
+ if (p === null) return allow(); // Outside project (skill files, etc.)
72
+
73
+ // Grep/Glob with no path or project root — check pattern prefix
74
+ if ((tool === 'Grep' || tool === 'Glob') && (p === undefined || p === '')) {
75
+ const pat = input.pattern || '';
76
+ if (/^(docs|references|\.claude)(\/|$)/.test(pat)) return allow();
77
+ return block('Orchestrator cannot search the full project — use a path under docs/, references/, or .claude/');
78
+ }
79
+
80
+ if (p === undefined) return allow();
81
+ if (under(p, 'docs') || under(p, 'references') || under(p, '.claude')) return allow();
82
+ return block(`Orchestrator cannot read source code (${p}) — delegate to a research agent`);
83
+ }
84
+
85
+ if (tool === 'Write' || tool === 'Edit') {
86
+ const p = relPath(tool, input, cwd);
87
+ if (p === null || p === undefined) return allow();
88
+
89
+ // State files always writable
90
+ if (p === `${specDir}/state.yaml` || p === '.claude/ship-hook-state.json') return allow();
91
+
92
+ // Per-phase permissions
93
+ const permFn = ORCHESTRATOR_WRITES[phase];
94
+ if (permFn && permFn(specDir).includes(p)) return allow();
95
+
96
+ return block(`Orchestrator cannot write ${p} in ${phase} phase — delegate to a sub-agent`);
97
+ }
98
+
99
+ return allow();
100
+ }
101
+
102
+ // ── Sub-Agent Policies ──
103
+
104
+ function questionGeneratorPolicy(tool, input, cwd, specDir) {
105
+ const p = relPath(tool, input, cwd);
106
+
107
+ if (tool === 'Read') {
108
+ if (p === null) return allow();
109
+ if (p === `${specDir}/intent.md`) return allow();
110
+ return block(`question-generator can only read ${specDir}/intent.md`);
111
+ }
112
+ if (tool === 'Write') {
113
+ if (p === null) return allow();
114
+ if (p === `${specDir}/questions.md`) return allow();
115
+ return block(`question-generator can only write ${specDir}/questions.md`);
116
+ }
117
+ return block(`question-generator cannot use ${tool}`);
118
+ }
119
+
120
+ function objectiveResearcherPolicy(tool, input, cwd, specDir) {
121
+ const p = relPath(tool, input, cwd);
122
+
123
+ if (tool === 'Read' || tool === 'Grep' || tool === 'Glob') {
124
+ if (p === null) return allow();
125
+ if (p === undefined) return allow(); // Grep/Glob with no path — project-wide search
126
+ if (p.startsWith(`${specDir}/research`) && p.endsWith('.md')) return allow(); // Own output
127
+ if (under(p, 'docs')) return block('objective-researcher cannot read docs/ — research must stay objective');
128
+ return allow();
129
+ }
130
+
131
+ if (tool === 'Write') {
132
+ if (p === null) return allow();
133
+ if (p !== undefined && p.startsWith(`${specDir}/research`) && p.endsWith('.md')) return allow();
134
+ return block(`objective-researcher can only write to ${specDir}/research-*.md`);
135
+ }
136
+
137
+ if (tool === 'Bash') return allow();
138
+ return block(`objective-researcher cannot use ${tool}`);
139
+ }
140
+
141
+ function specDirWriterPolicy(tool, input, cwd, specDir) {
142
+ if (tool === 'Read' || tool === 'Grep' || tool === 'Glob') return allow();
143
+ if (tool === 'Write' || tool === 'Edit') {
144
+ const p = relPath(tool, input, cwd);
145
+ if (p === null || p === undefined) return allow();
146
+ if (under(p, specDir)) return allow();
147
+ return block(`Cannot write outside ${specDir}`);
148
+ }
149
+ if (tool === 'Bash') return block('This agent cannot run Bash commands');
150
+ return allow();
151
+ }
152
+
153
+ function specImplementerPolicy(tool, input, cwd, specDir) {
154
+ if (tool === 'Write' || tool === 'Edit') {
155
+ const p = relPath(tool, input, cwd);
156
+ if (p === null || p === undefined) return allow();
157
+ for (const artifact of PROTECTED_ARTIFACTS) {
158
+ if (p === `${specDir}/${artifact}`) return block(`spec-implementer cannot modify ${artifact}`);
159
+ }
160
+ }
161
+ return allow();
162
+ }
163
+
164
+ function specValidatorPolicy(tool, input, cwd, specDir) {
165
+ if (tool === 'Write' || tool === 'Edit') {
166
+ const p = relPath(tool, input, cwd);
167
+ if (p === null || p === undefined) return allow();
168
+ if (under(p, `${specDir}/tasks`)) return allow(); // Review Notes in task files
169
+ return block('spec-validator cannot write outside task files');
170
+ }
171
+ return allow();
172
+ }
173
+
174
+ // ── Main ──
175
+
176
+ const AGENT_POLICIES = {
177
+ 'question-generator': questionGeneratorPolicy,
178
+ 'objective-researcher': objectiveResearcherPolicy,
179
+ 'spec-writer': specDirWriterPolicy,
180
+ 'test-planner': specDirWriterPolicy,
181
+ 'task-breakdown': specDirWriterPolicy,
182
+ 'spec-implementer': specImplementerPolicy,
183
+ 'spec-validator': specValidatorPolicy,
184
+ };
185
+
186
+ function main() {
187
+ const input = JSON.parse(readFileSync(0, 'utf-8'));
188
+ const cwd = process.cwd();
189
+ const stateFile = join(cwd, '.claude', 'ship-hook-state.json');
190
+
191
+ if (!existsSync(stateFile)) process.exit(0);
192
+
193
+ let state;
194
+ try { state = JSON.parse(readFileSync(stateFile, 'utf-8')); }
195
+ catch { process.exit(0); }
196
+
197
+ // First-touch: inject session_id if missing
198
+ if (!state.session_id) {
199
+ state.session_id = input.session_id;
200
+ try { writeFileSync(stateFile, JSON.stringify(state, null, 2)); } catch {}
201
+ }
202
+
203
+ if (state.session_id !== input.session_id) process.exit(0);
204
+ if (BYPASS_TOOLS.has(input.tool_name)) process.exit(0);
205
+
206
+ const caller = input.agent_type || 'orchestrator';
207
+ const tool = input.tool_name;
208
+ const toolInput = input.tool_input || {};
209
+
210
+ let result;
211
+ if (caller === 'orchestrator') {
212
+ result = orchestratorPolicy(tool, toolInput, cwd, state.spec_dir, state.phase);
213
+ } else {
214
+ const policyFn = AGENT_POLICIES[caller];
215
+ if (!policyFn) process.exit(0); // Unknown agent — not a ship agent
216
+ result = policyFn(tool, toolInput, cwd, state.spec_dir);
217
+ }
218
+
219
+ if (result.blocked) {
220
+ console.log(JSON.stringify({ decision: 'block', reason: result.reason }));
221
+ }
222
+ }
223
+
224
+ main();
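
As a reference for the fields `main()` reads from stdin, a PreToolUse hook payload might look like this (a sketch — the field values are hypothetical, and only `session_id`, `tool_name`, `agent_type`, and `tool_input` are consumed by the code above):

```json
{
  "session_id": "abc123",
  "tool_name": "Edit",
  "agent_type": "spec-validator",
  "tool_input": {
    "file_path": "docs/specs/my-feature/tasks/01-setup.md"
  }
}
```

With this input, `specValidatorPolicy` allows the edit because the path falls under `{spec_dir}/tasks`; the same payload with a path outside that directory would produce a `{ "decision": "block", ... }` response on stdout.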
@@ -560,11 +560,25 @@ function main() {
560
560
  usageLines.push(makeBoxLine(usageDisplay));
561
561
  }
562
562
 
563
+ // Ship phase line (if active)
564
+ const shipLines = [];
565
+ try {
566
+ const shipStatePath = join(data.workspace.project_dir, '.claude', 'ship-hook-state.json');
567
+ const shipStat = statSync(shipStatePath);
568
+ if (Date.now() - shipStat.mtimeMs < 300000) { // < 5 min old
569
+ const shipState = JSON.parse(readFileSync(shipStatePath, 'utf-8'));
570
+ const phase = (shipState.phase || '').toUpperCase();
571
+ const feature = shipState.feature || '';
572
+ const subPhase = shipState.sub_phase ? ` ${shipState.sub_phase}` : '';
573
+ shipLines.push(makeBoxLine(`SHIP: ${phase} [${feature}]${subPhase}`));
574
+ }
575
+ } catch {}
576
+
563
577
  // Bottom border (add 2 to match content line width)
564
578
  const bottomBorder = `${DIM_GREY}╚${'═'.repeat(width + 2)}╝${RESET}`;
565
579
 
566
580
  // Combine all lines
567
- const allLines = [topBorder, line0, ...branchLines, ctxLine, ...usageLines, bottomBorder];
581
+ const allLines = [topBorder, line0, ...shipLines, ...branchLines, ctxLine, ...usageLines, bottomBorder];
568
582
  console.log(allLines.join('\n'));
569
583
  } catch (error) {
570
584
  // Log error for debugging (goes to stderr, not visible in status line)
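
Assuming a feature named `my-feature` in the spec phase during its first review cycle, the new status line renders roughly as (hypothetical values, box borders and color codes omitted):

```
SHIP: SPEC [my-feature] review-cycle-1
```

The 5-minute `mtimeMs` check keeps a stale state file from a previous session out of the status line.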
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: ship
3
- description: End-to-end workflow for shipping complex features through intent discovery, contamination-free research, design discussion, spec generation, task breakdown, and implementation. Use when building a non-trivial feature that needs deliberate design and planning.
3
+ description: End-to-end workflow for shipping complex features through intent discovery, contamination-free research, design discussion, spec generation, verification planning, task breakdown, and TDD implementation. Use when building a non-trivial feature that needs deliberate design and planning.
4
4
  argument-hint: [feature-name]
5
5
  allowed-tools: Read, Write, Edit, Grep, Glob, Bash, Agent, TaskCreate, TaskList, TaskUpdate, TaskGet, AskUserQuestion
6
6
  ---
@@ -40,8 +40,11 @@ Look for `docs/specs/{feature-name}/state.yaml`.
40
40
  | research | `references/step-3-research.md` |
41
41
  | design | `references/step-4-design.md` |
42
42
  | spec | `references/step-5-spec.md` |
43
- | tasks | `references/step-6-tasks.md` |
44
- | implement | `references/step-7-implement.md` |
43
+ | verify | `references/step-6-verify.md` |
44
+ | tasks | `references/step-7-tasks.md` |
45
+ | implement | `references/step-8-implement.md` |
46
+
47
+ Also write/update `.claude/ship-hook-state.json` with `phase`, `feature`, and `spec_dir` from state.yaml (set `sub_phase` to `null`).
45
48
 
46
49
  Read the step file for the current phase and follow its instructions.
47
50
 
@@ -53,4 +56,15 @@ phase: intent
53
56
  dir: docs/specs/{feature-name}
54
57
  ```
55
58
 
59
+ Also write `.claude/ship-hook-state.json` (enables policy enforcement for this session):
60
+
61
+ ```json
62
+ {
63
+ "phase": "intent",
64
+ "feature": "{feature-name}",
65
+ "spec_dir": "docs/specs/{feature-name}",
66
+ "sub_phase": null
67
+ }
68
+ ```
69
+
56
70
  Then read `references/step-1-intent.md` to begin.
@@ -45,6 +45,6 @@ Present the intent document to the user for confirmation. Adjust if they have co
45
45
 
46
46
  ## Task 3: Proceed
47
47
 
48
- Update `{spec_dir}/state.yaml` — set `phase: questions`.
48
+ Update `{spec_dir}/state.yaml` — set `phase: questions`. Update `.claude/ship-hook-state.json` — set `phase` to `"questions"`, `sub_phase` to `null`.
49
49
 
50
50
  Use the Read tool on `references/step-2-questions.md` to generate research questions from the intent.
@@ -14,15 +14,15 @@ Create these tasks and work through them in order:
14
14
 
15
15
  ## Task 1: Generate Questions
16
16
 
17
- Read `{spec_dir}/intent.md` yourself, then spawn a sub-agent with the intent content passed inline in the prompt. The agent has `tools: Write` only — it cannot read any files.
17
+ Spawn the question-generator sub-agent. Tell it the spec directory — it will read the intent document directly.
18
18
 
19
19
  ```
20
20
  Agent tool:
21
21
  subagent_type: "question-generator"
22
- prompt: "Write research questions to {spec_dir}/questions.md based on this intent document:\n\n{paste the full intent.md content here}"
22
+ prompt: "Read the intent document at {spec_dir}/intent.md and write research questions to {spec_dir}/questions.md."
23
23
  ```
24
24
 
25
- The question-generator has zero read access. The intent content comes via the prompt, and its only tool is Write. It cannot explore the codebase.
25
+ The question-generator has `tools: Read, Write` with hook-enforced path restrictions: it can only read `{spec_dir}/intent.md` and write to `{spec_dir}/questions.md`.
26
26
 
27
27
  ## Task 2: Review Questions
28
28
 
@@ -36,6 +36,6 @@ Update `questions.md` based on user feedback. The user may add questions about p
36
36
 
37
37
  ## Task 3: Proceed
38
38
 
39
- Update `{spec_dir}/state.yaml` — set `phase: research`.
39
+ Update `{spec_dir}/state.yaml` — set `phase: research`. Update `.claude/ship-hook-state.json` — set `phase` to `"research"`, `sub_phase` to `null`.
40
40
 
41
41
  Use the Read tool on `references/step-3-research.md` to begin objective codebase research.
@@ -51,6 +51,6 @@ If the research is thin or missing critical areas, spawn the objective-researche
51
51
 
52
52
  ## Task 4: Proceed
53
53
 
54
- Update `{spec_dir}/state.yaml` — set `phase: design`.
54
+ Update `{spec_dir}/state.yaml` — set `phase: design`. Update `.claude/ship-hook-state.json` — set `phase` to `"design"`, `sub_phase` to `null`.
55
55
 
56
56
  Use the Read tool on `references/step-4-design.md` to begin the design discussion with the user.
@@ -63,6 +63,6 @@ Present to the user for confirmation.
63
63
 
64
64
  ## Task 4: Proceed
65
65
 
66
- Update `{spec_dir}/state.yaml` — set `phase: spec`.
66
+ Update `{spec_dir}/state.yaml` — set `phase: spec`. Update `.claude/ship-hook-state.json` — set `phase` to `"spec"`, `sub_phase` to `null`.
67
67
 
68
68
  Use the Read tool on `references/step-5-spec.md` to generate the implementation specification.
@@ -12,7 +12,7 @@ Create these tasks and work through them in order:
12
12
  2. "Generate spec" — spawn spec-writer in write mode
13
13
  3. "Review spec" — spawn spec-writer in review mode, loop until approved
14
14
  4. "Review spec with user" — present the approved spec
15
- 5. "Begin task breakdown" — proceed to the next phase
15
+ 5. "Begin verification planning" — proceed to the next phase
16
16
 
17
17
  ## Task 1: External Research (if needed)
18
18
 
@@ -38,7 +38,7 @@ Agent tool:
38
38
 
39
39
  ## Task 3: Review Loop
40
40
 
41
- Spawn a FRESH instance of spec-writer in review mode. At least one review is mandatory.
41
+ Spawn a FRESH instance of spec-writer in review mode. At least one review is mandatory. Before each review cycle, update `.claude/ship-hook-state.json` `sub_phase` to `"review-cycle-N"` (N = cycle number, starting at 1).
42
42
 
43
43
  ```
44
44
  Agent tool:
@@ -71,6 +71,6 @@ Revise based on user feedback. If changes are substantial, re-run the review loo
71
71
 
72
72
  ## Task 5: Proceed
73
73
 
74
- Update `{spec_dir}/state.yaml` — set `phase: tasks`.
74
+ Update `{spec_dir}/state.yaml` — set `phase: verify`. Update `.claude/ship-hook-state.json` — set `phase` to `"verify"`, `sub_phase` to `null`.
75
75
 
76
- Use the Read tool on `references/step-6-tasks.md` to break the spec into implementation tasks.
76
+ Use the Read tool on `references/step-6-verify.md` to plan verification for the spec.
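
As an illustration of the `sub_phase` updates described above, the state file during the first spec review cycle would look like this (feature name hypothetical; `session_id` is injected by the hook on first tool use):

```json
{
  "phase": "spec",
  "feature": "my-feature",
  "spec_dir": "docs/specs/my-feature",
  "sub_phase": "review-cycle-1",
  "session_id": "abc123"
}
```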
@@ -0,0 +1,64 @@
1
+ # Verification Planning
2
+
3
+ The orchestrator spawns a test-planner agent to generate a test plan, then spawns a fresh instance to review and fix it. Each review is a clean context window — the reviewer didn't write the plan, so it reads with fresh eyes. The reviewer focuses on medium-to-high severity issues only — if a reviewer only fixes minor issues, the orchestrator moves on rather than over-rotating. If medium-to-high issues are fixed, those fixes must be verified by another fresh reviewer.
4
+
5
+ The test plan defines how every spec requirement will be verified. It bridges the gap between "what the system should do" (spec) and "how we build it" (tasks). A fresh agent writes this — one that has never seen the implementation plan, so its verification strategy tests the *intent*, not the implementation approach.
6
+
7
+ Read `{spec_dir}/spec.md` before proceeding.
8
+
9
+ ## Create Tasks
10
+
11
+ Create these tasks and work through them in order:
12
+
13
+ 1. "Generate test plan" — spawn test-planner in write mode
14
+ 2. "Review test plan" — spawn test-planner in review mode, loop until approved
15
+ 3. "Review test plan with user" — present the approved plan
16
+ 4. "Begin task breakdown" — proceed to the next phase
17
+
18
+ ## Task 1: Generate Test Plan
19
+
20
+ Spawn the test-planner in write mode:
21
+
22
+ ```
23
+ Agent tool:
24
+ subagent_type: "test-planner"
25
+ prompt: "Generate the test plan for the feature at {spec_dir}. Read intent.md, research.md, design.md, and spec.md for context. Write the test plan to {spec_dir}/test-plan.md."
26
+ ```
27
+
28
+ ## Task 2: Review Loop
29
+
30
+ Spawn a FRESH instance of test-planner in review mode. At least one review is mandatory. Before each review cycle, update `.claude/ship-hook-state.json` `sub_phase` to `"review-cycle-N"` (N = cycle number, starting at 1).
31
+
32
+ ```
33
+ Agent tool:
34
+ subagent_type: "test-planner"
35
+ prompt: "Review the test plan at {spec_dir}/test-plan.md against the upstream artifacts (intent.md, research.md, design.md, spec.md). Run the full review checklist. Focus on medium-to-high severity issues — ignore minor wording or formatting. Fix every medium-to-high issue directly in test-plan.md. Return APPROVED if zero medium-to-high issues found, APPROVED_WITH_FIXES with severity tags if issues were found and fixed, or ISSUES REMAINING for anything you cannot auto-fix."
36
+ ```
37
+
38
+ **If APPROVED** (zero medium-to-high issues found): The test plan is verified clean. Move to Task 3.
39
+
40
+ **If APPROVED_WITH_FIXES**: Parse the severity of each fix from the reviewer's output:
41
+ - If ANY fix was **HIGH** or **MEDIUM** — those fixes need verification. Spawn another fresh instance to review again.
42
+ - If somehow all fixes were low-severity — the reviewer is finding diminishing returns. Move to Task 3.
43
+
44
+ **If ISSUES REMAINING**: Spawn another fresh instance to review again. The previous reviewer already fixed what it could — the next reviewer may catch different things or resolve what the last one couldn't.
45
+
46
+ If the loop runs more than 5 cycles without a clean APPROVED, present the remaining issues to the user and ask how to proceed.
47
+
48
+ ## Task 3: Review With User
49
+
50
+ Read `{spec_dir}/test-plan.md` and present it to the user. Walk through each section, highlighting:
51
+
52
+ - Contract tests and which API boundaries they cover
53
+ - Behavioral tests and their mapping to acceptance criteria
54
+ - Integration tests and which cross-cutting flows they verify
55
+ - Negative tests and which failure modes they catch
56
+ - Test infrastructure decisions (framework, fixtures, mocking strategy)
57
+
58
+ Ask the user if the verification strategy is complete. Revise based on feedback. If changes are substantial, re-run the review loop (Task 2).
59
+
60
+ ## Task 4: Proceed
61
+
62
+ Update `{spec_dir}/state.yaml` — set `phase: tasks`. Update `.claude/ship-hook-state.json` — set `phase` to `"tasks"`, `sub_phase` to `null`.
63
+
64
+ Use the Read tool on `references/step-7-tasks.md` to break the spec into implementation tasks.
@@ -2,7 +2,7 @@
2
2
 
3
3
  The orchestrator spawns a task-breakdown agent to generate task files, then spawns a fresh instance of the same agent to review and fix them. Each review is a clean context window — the reviewer didn't write the tasks, so it reads with fresh eyes. The reviewer focuses on medium-to-high severity issues only — if a reviewer only fixes minor issues, the orchestrator moves on rather than over-rotating. If medium-to-high issues are fixed, those fixes must be verified by another fresh reviewer.
4
4
 
5
- Read `{spec_dir}/spec.md` before proceeding.
5
+ Read `{spec_dir}/spec.md` and `{spec_dir}/test-plan.md` before proceeding.
6
6
 
7
7
  ## Create Tasks
8
8
 
@@ -20,17 +20,17 @@ Spawn the task-breakdown agent in write mode:
20
20
  ```
21
21
  Agent tool:
22
22
  subagent_type: "task-breakdown"
23
- prompt: "Break the spec at {spec_dir} into implementation task files. Read spec.md, research.md, and design.md for context. Write task files to {spec_dir}/tasks/."
23
+ prompt: "Break the spec at {spec_dir} into implementation task files. Read spec.md, test-plan.md, research.md, and design.md for context. Write task files to {spec_dir}/tasks/."
24
24
  ```
25
25
 
26
26
  ## Task 2: Review Loop
27
27
 
28
- Spawn a FRESH instance of task-breakdown in review mode:
28
+ Spawn a FRESH instance of task-breakdown in review mode. Before each review cycle, update `.claude/ship-hook-state.json` `sub_phase` to `"review-cycle-N"` (N = cycle number, starting at 1):
29
29
 
30
30
  ```
31
31
  Agent tool:
32
32
  subagent_type: "task-breakdown"
33
- prompt: "Review the task breakdown at {spec_dir}. Read spec.md and all files in {spec_dir}/tasks/. Run the full 9-point checklist. Focus on medium-to-high severity issues — ignore minor wording or formatting. Fix every medium-to-high issue directly in the task files. Return APPROVED if zero medium-to-high issues found, APPROVED_WITH_FIXES with severity tags if issues were found and fixed, or ISSUES REMAINING for anything you cannot auto-fix."
33
+ prompt: "Review the task breakdown at {spec_dir}. Read spec.md, test-plan.md, and all files in {spec_dir}/tasks/. Run the full 9-point checklist. Focus on medium-to-high severity issues — ignore minor wording or formatting. Fix every medium-to-high issue directly in the task files. Return APPROVED if zero medium-to-high issues found, APPROVED_WITH_FIXES with severity tags if issues were found and fixed, or ISSUES REMAINING for anything you cannot auto-fix."
34
34
  ```
35
35
 
36
36
  **If APPROVED** (zero issues found): The breakdown is verified clean. Move to Task 3.
@@ -49,12 +49,12 @@ Present the approved task breakdown. For each task, show:
49
49
 
50
50
  - What it does (the criterion)
51
51
  - Why it's in this order (the dependency reasoning)
52
- - How it can be independently verified
52
+ - Which tests from the test plan it references
53
53
 
54
54
  Revise based on user feedback. If changes are substantial, re-run the review loop (Task 2).
55
55
 
56
56
  ## Task 4: Proceed
57
57
 
58
- Update `{spec_dir}/state.yaml` — set `phase: implement`.
58
+ Update `{spec_dir}/state.yaml` — set `phase: implement`. Update `.claude/ship-hook-state.json` — set `phase` to `"implement"`, `sub_phase` to `null`.
59
59
 
60
- Use the Read tool on `references/step-7-implement.md` to begin implementation.
60
+ Use the Read tool on `references/step-8-implement.md` to begin implementation.
@@ -2,7 +2,7 @@
2
2
 
3
3
  Orchestrate implementation using spec-implementer and spec-validator sub-agents. This follows the execute-spec pattern — you dispatch agents, you do not write code yourself.
4
4
 
5
- Read `{spec_dir}/spec.md` and list all task files in `{spec_dir}/tasks/`.
5
+ Read `{spec_dir}/spec.md`, `{spec_dir}/test-plan.md`, and list all task files in `{spec_dir}/tasks/`.
6
6
 
7
7
  ## Step 1: Hydrate Task System
8
8
 
@@ -23,7 +23,7 @@ Work through tasks in dependency order. For each task that is ready (no blockers
23
23
  ```
24
24
  Agent tool:
25
25
  subagent_type: "spec-implementer"
26
- prompt: "Implement the task described in {task_file_path}. Read the task file for requirements, files to modify, and verification steps. Also read {spec_dir}/spec.md for overall context. After implementation, run the verification steps described in the task file."
26
+ prompt: "Implement the task described in {task_file_path}. Read the task file for requirements, files to modify, and referenced tests. Also read {spec_dir}/spec.md and {spec_dir}/test-plan.md for overall context. Follow TDD: write the test code first for the tests referenced in the task file, confirm they fail, then implement the minimum code to make them pass."
27
27
  ```
28
28
 
29
29
  When the implementer returns, use TaskUpdate to mark the task as `in_progress` (implementation done, not yet validated).
@@ -33,7 +33,7 @@ When the implementer returns, use TaskUpdate to mark the task as `in_progress` (
33
33
  ```
34
34
  Agent tool:
35
35
  subagent_type: "spec-validator"
36
- prompt: "Validate the task described in {task_file_path}. Review the code changes, run tests, and verify against the acceptance criteria in {spec_dir}/spec.md. Report pass/fail with details."
36
+ prompt: "Validate the task described in {task_file_path}. Run the tests referenced in the task file. Review the code changes and test quality. Verify against the acceptance criteria in {spec_dir}/spec.md and the test specifications in {spec_dir}/test-plan.md. Report pass/fail with details."
37
37
  ```
38
38
 
39
39
  - **If pass**: Use TaskUpdate to mark the task as `completed`. This unblocks downstream tasks.
@@ -43,7 +43,7 @@ Run independent tasks (no dependency between them) in parallel when possible. Al
43
43
 
44
44
  ## Step 3: Finalize
45
45
 
46
- Update `{spec_dir}/state.yaml` — set `phase: complete`.
46
+ Update `{spec_dir}/state.yaml` — set `phase: complete`. Update `.claude/ship-hook-state.json` — set `phase` to `"complete"`, `sub_phase` to `null`.
47
47
 
48
48
  Present a summary to the user:
49
49
 
@@ -51,4 +51,4 @@ Present a summary to the user:
51
51
  - What tests pass
52
52
  - Any tasks that needed manual intervention
53
53
 
54
- Use the Read tool on `references/step-8-reflect.md` to review and improve this workflow.
54
+ Use the Read tool on `references/step-9-reflect.md` to review and improve this workflow.
@@ -1,18 +1,23 @@
1
1
  # Reflect
2
2
 
3
- Review how this workflow performed and identify improvements.
3
+ ## Cleanup
4
+
5
+ Delete `.claude/ship-hook-state.json` — policy enforcement is no longer needed for this feature.
4
6
 
5
7
  ## Self-Assessment
6
8
 
9
+ Review how this workflow performed and identify improvements.
10
+
7
11
  Consider each phase:
8
12
 
9
13
  1. **Intent capture**: Did the intent document accurately capture what the user wanted? Did the spec drift from the original intent?
10
14
  2. **Question quality**: Were the research questions comprehensive? Were any critical questions missing that caused problems later?
11
15
  3. **Research objectivity**: Did the research stay objective? Did the contamination prevention work — or did implementation opinions leak in despite the separation?
12
16
  4. **Design decisions**: Were the design questions the right ones? Did the user have to course-correct on things that should have been caught earlier?
13
- 5. **Spec completeness**: Were the API contracts and acceptance criteria specific enough for implementation agents?
14
- 6. **Task ordering**: Did the tracer bullet ordering work? Were there dependency issues or tasks that should have been ordered differently?
15
- 7. **Implementation**: Did agents struggle with any tasks? Were the task descriptions clear enough?
17
+ 5. **Spec completeness**: Were the API contracts and acceptance criteria specific enough for downstream agents?
18
+ 6. **Verification planning**: Did the test plan cover the right things? Were there gaps that only surfaced during implementation? Did the test-planner catch spec issues the spec-writer missed?
19
+ 7. **Task ordering**: Did the tracer bullet ordering work? Were there dependency issues or tasks that should have been ordered differently?
20
+ 8. **Implementation**: Did agents follow TDD successfully? Did tests written first actually catch implementation bugs, or were they vacuous? Did any tests need significant rework during implementation?
16
21
 
17
22
  ## Skill Improvement
18
23