specpipe 1.0.0 → 1.0.2

Files changed (46)
  1. package/README.md +116 -1220
  2. package/package.json +3 -2
  3. package/src/cli.js +16 -6
  4. package/src/commands/diff.js +1 -1
  5. package/src/commands/init-agents.js +40 -20
  6. package/src/commands/init-global.js +88 -33
  7. package/src/commands/init-interactive.js +71 -0
  8. package/src/commands/init.js +61 -22
  9. package/src/commands/remove.js +159 -49
  10. package/src/commands/upgrade.js +21 -56
  11. package/src/lib/agent-guards.js +34 -78
  12. package/src/lib/agent-install.js +38 -25
  13. package/src/lib/agents.js +53 -11
  14. package/src/lib/claude-global.js +50 -77
  15. package/src/lib/hooks.js +203 -0
  16. package/src/lib/installer.js +73 -61
  17. package/src/lib/reconcile.js +13 -8
  18. package/templates/{.claude/hooks → hooks}/file-guard.js +26 -21
  19. package/templates/hooks/specpipe-read-guard.sh +94 -21
  20. package/templates/hooks/specpipe-shell-guard.sh +121 -29
  21. package/templates/rules/specpipe-rules.md +77 -0
  22. package/templates/skills/sp-build/SKILL.md +101 -1
  23. package/templates/skills/sp-build-behavior-matrix/SKILL.md +876 -0
  24. package/templates/skills/sp-challenge/SKILL.md +34 -0
  25. package/templates/skills/sp-challenge-behavior-matrix/SKILL.md +289 -0
  26. package/templates/skills/sp-explore/SKILL.md +132 -0
  27. package/templates/skills/sp-explore-behavior-matrix/SKILL.md +862 -0
  28. package/templates/skills/sp-fix/SKILL.md +73 -1
  29. package/templates/skills/sp-fix-behavior-matrix/SKILL.md +338 -0
  30. package/templates/skills/sp-investigate/SKILL.md +70 -0
  31. package/templates/skills/sp-investigate-behavior-matrix/SKILL.md +718 -0
  32. package/templates/skills/sp-plan/SKILL.md +90 -0
  33. package/templates/skills/sp-plan-behavior-matrix/SKILL.md +1037 -0
  34. package/templates/skills/sp-review/SKILL.md +29 -3
  35. package/templates/skills/sp-review-behavior-matrix/SKILL.md +294 -0
  36. package/templates/.claude/CLAUDE.md +0 -79
  37. package/templates/.claude/hooks/path-guard.sh +0 -118
  38. package/templates/.claude/hooks/self-review.sh +0 -27
  39. package/templates/.claude/hooks/sensitive-guard.sh +0 -227
  40. package/templates/.claude/settings.json +0 -68
  41. package/templates/docs/WORKFLOW.md +0 -325
  42. package/templates/docs/specs/.gitkeep +0 -0
  43. package/templates/rules/specpipe-guards.md +0 -40
  44. package/templates/scripts/test-hooks.sh +0 -66
  45. package/templates/{.claude/hooks → hooks}/comment-guard.js +0 -0
  46. package/templates/{.claude/hooks → hooks}/glob-guard.js +0 -0
@@ -30,6 +30,7 @@ Map the plan's attack surface:
30
30
  - Scope boundaries (in/out/suspiciously unmentioned)
31
31
  - Risk acknowledgments (mentioned vs. conspicuously absent)
32
32
  - Story↔AS consistency (stories without acceptance scenarios? contradictions?)
33
+ - Behavior Matrix coverage if present: every state/viewer/surface cell, its AS/GAP/N/A coverage, suspicious N/A cells, weak AS cells, missing surfaces, and cascade/parity obligations
33
34
 
34
35
  Collect all file paths the reviewers will need to read.
35
36
 
@@ -42,8 +43,10 @@ Assess plan complexity and select which lenses to deploy:
42
43
  | Simple (1 spec section, <20 acceptance scenarios, no auth/data) | 2 | Assumptions + Scope |
43
44
  | Standard (multiple sections, auth or data involved) | 3 | + Security |
44
45
  | Complex (multiple integrations, concurrency, migrations, 6+ phases) | 4 | + Failure Modes |
46
+ | Behavior Matrix present (`## Behavior Matrix`) | +1 reviewer, replacing the lowest-value generic lens if needed to stay capped at 4 | + Lifecycle & Parity |
45
47
 
46
48
  When in doubt, use 3 reviewers. 4 is for genuinely complex plans.
49
+ If `## Behavior Matrix` exists, always include the Lifecycle & Parity lens. This lens is not optional: the matrix exists because the feature has state/viewer/surface risk.
47
50
 
48
51
  ## Phase 3: Spawn Parallel Reviewers
49
52
 
@@ -152,6 +155,33 @@ Examine the plan for:
152
155
  - Test burden: Test cases harder to maintain than the feature itself?
153
156
  ```
154
157
 
158
+ **Lifecycle & Parity Adversary (Behavior Matrix lens):**
159
+ ```
160
+ You are the QA lead who historically found state/viewer/surface bugs on staging. Your job is to attack the Behavior Matrix before code exists.
161
+
162
+ Use ONLY the plan's stated states, viewers, surfaces, AS, GAP, N/A, constraints, linked fields, and impact map. Do not invent unrelated edge cases. Focus on whether the matrix faithfully covers the behavior the plan already triggered.
163
+
164
+ Examine the plan for:
165
+ - Missing axes: states/statuses, viewer roles/relationships, or surfaces named elsewhere in the spec but absent from `## Behavior Matrix`.
166
+ - Missing sibling discovery: if the plan changes an existing operation or bug-fix path, check whether it has `## Sibling Surface Map` or an explicit reason discovery is not applicable. High/medium sibling candidates must be `cover`, `GAP-NNN`, or `ignore(reason)`; only `cover` candidates may feed Behavior Matrix surfaces.
167
+ - Missing invariant context: if the plan touches a component named by project-local `docs/invariants/INV-*.md`, confirm the relevant invariant is represented in Constraints, GAPs, Behavior Matrix, or explicitly ignored with a reason. Use the invariant registry README/schema as base knowledge only; README examples are not runtime entries. Do not invent new invariant entries here.
168
+ - Suspicious N/A cells: any `N/A` that could actually occur, lacks a concrete reason, or hides a known QA failure mode.
169
+ - GAP triage: any `GAP` cell that is too important to leave open before build because code would otherwise guess behavior.
170
+ - Weak AS cells: an AS referenced by a matrix cell but not asserting the same state, viewer, surface, source/timing, label/visibility/action, or cascade/parity obligation.
171
+ - Surface parity holes: list/detail/feed/dashboard/worklist/API/email/calendar values that should match but are covered only globally or on one surface.
172
+ - Cascade holes: state transitions that update one surface but omit queues, counts, feed/timeline, notifications/email, calendar/provider state, or read-model invalidation.
173
+ - Delete/orphan/incomplete/out-of-order handling when the matrix implies cross-module data flow: deleted source record, orphan target reference, incomplete source data, target created before source, refresh timing mismatch.
174
+ - Timing/source ambiguity: cells that say "updated" without realtime vs refresh-required vs persisted+served vs transient-in-response.
175
+
176
+ Mandatory output discipline for this lens:
177
+ - Include a section or finding line named **Suspicious N/A/GAP Review** whenever the matrix has any `N/A` or `GAP` cell.
178
+ - For every `N/A`, state one of: `accepted N/A: <reason is concrete>` or `suspicious N/A: <why this may actually occur>`.
179
+ - For every `GAP`, state one of: `safe GAP: <why build can proceed without guessing>` or `blocking GAP: <why code would otherwise guess behavior>`.
180
+ - If there are no suspicious cells, explicitly write `Suspicious N/A/GAP Review: no suspicious N/A/GAP cells found; all reasons concrete and non-release-blocking`.
181
+
182
+ For suggested fixes, do not say "add tests". Fix the spec: add/modify matrix row, convert N/A to AS/GAP, strengthen the AS Then clause, or add a constraint with per-surface coverage.
183
+ ```
184
+
155
185
  ## Phase 4: Collect and Consolidate
156
186
 
157
187
  After all reviewers complete:
@@ -169,6 +199,10 @@ After all reviewers complete:
169
199
  4. **Sort** by severity: Critical → High → Medium → Low
170
200
  5. **Cap** at 15 findings: keep all Critical, top High by specificity, note how many Medium were dropped
171
201
  6. **Cross-reference check** (you, not reviewers): Flag any stories without acceptance scenarios, and any AS that contradicts the story description
202
+ 7. **Behavior Matrix cross-check** (you, not reviewers, when present): Every matrix `Coverage = AS-NNN` must point to an AS that actually asserts the same state/viewer/surface outcome; every `GAP-NNN` must exist in `## Gaps`; every `N/A` must have a reason. Missing or mismatched cells are accepted findings unless another reviewer already caught them.
203
+ 8. **Suspicious N/A/GAP Review cross-check** (you, not reviewers, when present): If the plan has `## Behavior Matrix` and any `N/A` or `GAP` cell, the final challenge output MUST include a `Suspicious N/A/GAP Review` section. Omission is itself a Medium finding because it lets lifecycle/viewer/surface blind spots hide behind "not applicable" or unresolved gaps.
204
+ 9. **Sibling Surface Map cross-check** (you, not reviewers, when present): If `## Sibling Surface Map` exists, every high/medium candidate must have a disposition (`cover`, `GAP-NNN`, or `ignore(reason)`). If the plan touches an existing operation and has no map, flag missing discovery unless the plan states why sibling discovery is not applicable.
205
+ 10. **Invariant registry cross-check** (you, not reviewers, when present): If project-local `docs/invariants/INV-*.md` contains an entry matching the planned component, the plan must either carry it into Constraints/Behavior Matrix/GAPs or explicitly state why it does not apply. Missing invariant handling is a Medium/High finding depending on `status`.
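The mechanical half of the Behavior Matrix cross-check in step 7 can be sketched as below. This is a minimal illustration, not part of the package: the function name `cross_check` and the cell dictionary keys (`cell`, `coverage`, `reason`) are hypothetical, and the sketch only verifies that referenced IDs exist. Whether an AS actually asserts the same state/viewer/surface outcome still has to be judged by reading the scenario.

```python
import re


def cross_check(cells, scenario_ids, gap_ids):
    """Return findings for matrix cells whose coverage reference is broken.

    cells: list of dicts with hypothetical keys 'cell', 'coverage', 'reason'.
    scenario_ids / gap_ids: sets of IDs defined in the spec (e.g. {"AS-001"}).
    """
    findings = []
    for c in cells:
        coverage = c["coverage"]
        if re.fullmatch(r"AS-\d+", coverage):
            if coverage not in scenario_ids:
                findings.append(f"{c['cell']}: {coverage} is not defined in the spec")
        elif re.fullmatch(r"GAP-\d+", coverage):
            if coverage not in gap_ids:
                findings.append(f"{c['cell']}: {coverage} is missing from ## Gaps")
        elif coverage == "N/A":
            if not c.get("reason", "").strip():
                findings.append(f"{c['cell']}: N/A has no reason")
        else:
            findings.append(f"{c['cell']}: unrecognized coverage value {coverage!r}")
    return findings
```

An empty result means every cell points at something that exists; it does not mean the referenced scenarios are strong enough.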
172
206
 
173
207
  ## Phase 5: Adjudicate
174
208
 
@@ -0,0 +1,289 @@
1
+ ---
2
+ description: |
3
+ Adversarial review — spawn hostile reviewers to break the plan before coding.
4
+ Stress-tests assumptions, attacks decisions, finds blind spots in a spec.
5
+ Use when asked to "challenge this plan", "phản biện", "stress test the spec",
6
+ "tìm lỗ hổng", "break this", "red team this", or "attack this design".
7
+ Proactively suggest after /sp-plan produces a spec but before /sp-build —
8
+ catches design issues while they are still cheap to fix.
9
+ Skip for trivial spec changes or pure bug fixes.
10
+ allowed-tools: Read, Bash, Glob, Grep, AskUserQuestion, Agent
11
+ ---
12
+ Adversarial review — spawn hostile reviewers to break the plan before coding.
13
+
14
+ ## Input
15
+
16
+ Target: $ARGUMENTS
17
+
18
+ If argument is a file path → use that.
19
+ If argument is a feature name → search `docs/specs/` for matches.
20
+ If no argument → list recent files in `docs/specs/`, ask user which to challenge.
21
+
22
+ ## Phase 1: Read and Map
23
+
24
+ Read the ENTIRE target file. The spec contains both the feature definition and acceptance scenarios (in `## Stories` section).
25
+
26
+ Map the plan's attack surface:
27
+ - Decisions made (and what was rejected)
28
+ - Assumptions (stated AND implied)
29
+ - Dependencies (external services, APIs, libraries, infra)
30
+ - Scope boundaries (in/out/suspiciously unmentioned)
31
+ - Risk acknowledgments (mentioned vs. conspicuously absent)
32
+ - Story↔AS consistency (stories without acceptance scenarios? contradictions?)
33
+ - Behavior Matrix coverage if present: every state/viewer/surface cell, its AS/GAP/N/A coverage, suspicious N/A cells, weak AS cells, missing surfaces, and cascade/parity obligations
34
+
35
+ Collect all file paths the reviewers will need to read.
36
+
37
+ ## Phase 2: Scale Reviewers
38
+
39
+ Assess plan complexity and select which lenses to deploy:
40
+
41
+ | Complexity Signal | Reviewers | Lenses |
42
+ |-------------------|-----------|--------|
43
+ | Simple (1 spec section, <20 acceptance scenarios, no auth/data) | 2 | Assumptions + Scope |
44
+ | Standard (multiple sections, auth or data involved) | 3 | + Security |
45
+ | Complex (multiple integrations, concurrency, migrations, 6+ phases) | 4 | + Failure Modes |
46
+ | Behavior Matrix present (`## Behavior Matrix`) | +1 reviewer, replacing the lowest-value generic lens if needed to stay capped at 4 | + Lifecycle & Parity |
47
+
48
+ When in doubt, use 3 reviewers. 4 is for genuinely complex plans.
49
+ If `## Behavior Matrix` exists, always include the Lifecycle & Parity lens. This lens is not optional: the matrix exists because the feature has state/viewer/surface risk.
50
+
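The scaling table and the cap of 4 can be read as a small selection rule. The sketch below is illustrative only: `select_lenses` and its parameters are hypothetical names, and the skill does not say which generic lens counts as "lowest-value", so that choice is passed in (defaulting to Scope purely as an assumption).

```python
# Lens sets per complexity signal, taken from the table above.
BASE_LENSES = {
    "simple": ["Assumptions", "Scope"],
    "standard": ["Assumptions", "Scope", "Security"],
    "complex": ["Assumptions", "Scope", "Security", "Failure Modes"],
}
MAX_REVIEWERS = 4
MATRIX_LENS = "Lifecycle & Parity"


def select_lenses(complexity, has_behavior_matrix, lowest_value_lens="Scope"):
    """Pick reviewer lenses; the matrix lens is mandatory when a matrix exists.

    lowest_value_lens is an assumption: the coordinator decides which generic
    lens to drop when adding the matrix lens would exceed the cap.
    """
    lenses = list(BASE_LENSES[complexity])
    if has_behavior_matrix:
        if len(lenses) >= MAX_REVIEWERS:
            lenses.remove(lowest_value_lens)  # make room, stay capped at 4
        lenses.append(MATRIX_LENS)
    return lenses
```

For example, `select_lenses("complex", True)` keeps four reviewers by swapping one generic lens for Lifecycle & Parity, while `select_lenses("standard", True)` simply adds a fourth.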
51
+ ## Phase 3: Spawn Parallel Reviewers
52
+
53
+ Launch reviewers simultaneously using the Agent tool. Each reviewer is an independent subagent that reads the plan files directly and returns findings.
54
+
55
+ **CRITICAL:** Each reviewer prompt MUST include:
56
+ 1. The file paths to read (so they can access the plan directly)
57
+ 2. Their specific adversarial persona and lens
58
+ 3. The exact output format (so you can parse findings consistently)
59
+ 4. The rules of engagement
60
+
61
+ ### Reviewer Prompts
62
+
63
+ For each selected lens, spawn an agent with this structure:
64
+
65
+ ```
66
+ You are a hostile reviewer. Your job is to DESTROY this plan by finding every flaw through the {LENS_NAME} lens.
67
+
68
+ Read these files first:
69
+ {LIST OF FILE PATHS}
70
+
71
+ --- YOUR LENS ---
72
+
73
+ {LENS-SPECIFIC INSTRUCTIONS — see below}
74
+
75
+ --- OUTPUT FORMAT ---
76
+
77
+ For EACH flaw found, output exactly:
78
+
79
+ ### Finding: <title>
80
+ - **Severity:** Critical | High | Medium
81
+ - **Confidence:** N/10 — (9-10: verified in code; 7-8: strong pattern match; 5-6: possible false positive, note caveat; ≤4: omit unless Critical)
82
+ - **Location:** <exact section or heading in the plan>
83
+ - **Flaw:** <what's wrong — be specific>
84
+ - **Evidence:** "<direct quote from the plan>"
85
+ - **Failure scenario:** <step-by-step: how this causes a real problem in production>
86
+ - **Root cause:** <why does this flaw exist? Missing requirement? Wrong assumption?>
87
+ - **Suggested fix:** <specific, actionable — not just "fix it">
88
+
89
+ --- RULES ---
90
+
91
+ - 3-7 findings per lens. Quality over quantity.
92
+ - Be HOSTILE. No praise. No "overall looks good."
93
+ - Be SPECIFIC. Cite exact sections. Quote the plan.
94
+ - Be CONCRETE. Failure scenarios must be step-by-step, not "could be a problem."
95
+ - Skip trivial issues (naming, formatting, style).
96
+ - If the plan is solid for your lens, 1-2 findings is honest. Don't manufacture problems.
97
+ ```
98
+
99
+ ### Lens-Specific Instructions
100
+
101
+ **Security Adversary:**
102
+ ```
103
+ You are an attacker with knowledge of the tech stack and access to the public API.
104
+
105
+ Examine the plan for:
106
+ - Authentication/authorization bypass: Can auth be skipped? Can user A access user B's data? Are role checks at every layer?
107
+ - Injection vectors: Where does user input enter? SQL, shell, HTML, template, log injection? Parameterized queries?
108
+ - Data exposure: What leaks in error messages, logs, API responses? Stack traces? Internal paths? DB schemas?
109
+ - Cryptography: Password hashing (bcrypt/argon2, not MD5/SHA)? Secrets in env vars not code? TLS?
110
+ - Supply chain: New dependencies? Maintained? Known CVEs?
111
+ - OWASP Top 10 (2021): Broken Access Control, Crypto Failures, Injection, Insecure Design, Security Misconfiguration, Vulnerable Components, Identity Failures, Integrity Failures, Logging Failures, SSRF
112
+ ```
113
+
114
+ **Failure Mode Analyst:**
115
+ ```
116
+ You believe Murphy's Law: everything that can go wrong, will — simultaneously, at 3 AM, during peak traffic.
117
+
118
+ Examine the plan for:
119
+ - Partial failures: What if step 3 of 5 fails? Rollback? Atomic writes? Inconsistent state?
120
+ - Concurrency: Race conditions? Two users editing same resource? Shared mutable state? Deadlocks?
121
+ - Cascading failures: Service A down → B also fails? Circuit-breaking? Graceful degradation?
122
+ - Data integrity: Data loss? Corruption? Duplication? DB-level constraints or app-only validation?
123
+ - Recovery: How to recover from each failure? Reversible migrations? Backup restoration time?
124
+ - Deployment: What breaks during deploy? Rollback plan? Migration failures?
125
+ - Idempotency: Retried requests duplicate data? Double-charge? Double-email?
126
+ - Observability: How do you KNOW something failed? Logging? Monitoring? Alerts? Or angry users?
127
+ ```
128
+
129
+ **Assumption Destroyer:**
130
+ ```
131
+ You are a radical skeptic. "It should work" is not evidence. "We assume X" means X is unverified.
132
+
133
+ Examine the plan for:
134
+ - Unverified claims: "The API returns X" — tested? "The library supports Y" — checked docs?
135
+ - Scale assumptions: Expected load? Works at 10x? 100x? O(n²) hiding in "iterate all items"?
136
+ - Environment gaps: Same behavior in dev/staging/prod? Different OS? Docker vs bare metal?
137
+ - Integration risk: Third-party SLA? Rate limits? Their service down → your plan?
138
+ - Data assumptions: Always clean? Unicode? Emoji? Null bytes? 10MB payloads? Empty strings?
139
+ - User behavior: Will users actually do this? What if they click 50 times? Upload 2GB? Use mobile?
140
+ - Timing: "A before B" — always? What if B first? Implicit ordering dependencies?
141
+ - Hidden dependencies: Services, configs, env vars, or manual steps that must exist but aren't documented?
142
+ ```
143
+
144
+ **Scope & Complexity Critic (YAGNI Enforcer):**
145
+ ```
146
+ You believe the best code is no code. The best feature is the one you didn't build.
147
+
148
+ Examine the plan for:
149
+ - Over-engineering: Solving problems that don't exist yet? "In case we need it later" = YAGNI.
150
+ - Premature abstraction: Generic framework for 1 use case? Plugin system nobody asked for?
151
+ - Missing MVP: What's the absolute minimum viable delivery? Can 40% be deferred?
152
+ - Complexity vs value: Distributed system for 5 users? Proportional?
153
+ - Gold plating: Nice-to-have mixed with must-have? Can you ship without the nice-to-haves?
154
+ - Simpler alternative: Boring 10-line solution vs clever 500-line solution?
155
+ - Test burden: Test cases harder to maintain than the feature itself?
156
+ ```
157
+
158
+ **Lifecycle & Parity Adversary (Behavior Matrix lens):**
159
+ ```
160
+ You are the QA lead who historically found state/viewer/surface bugs on staging. Your job is to attack the Behavior Matrix before code exists.
161
+
162
+ Use ONLY the plan's stated states, viewers, surfaces, AS, GAP, N/A, constraints, linked fields, and impact map. Do not invent unrelated edge cases. Focus on whether the matrix faithfully covers the behavior the plan already triggered.
163
+
164
+ Examine the plan for:
165
+ - Missing axes: states/statuses, viewer roles/relationships, or surfaces named elsewhere in the spec but absent from `## Behavior Matrix`.
166
+ - Missing sibling discovery: if the plan changes an existing operation or bug-fix path, check whether it has `## Sibling Surface Map` or an explicit reason discovery is not applicable. High/medium sibling candidates must be `cover`, `GAP-NNN`, or `ignore(reason)`; only `cover` candidates may feed Behavior Matrix surfaces.
167
+ - Missing invariant context: if the plan touches a component named by project-local `docs/invariants/INV-*.md`, confirm the relevant invariant is represented in Constraints, GAPs, Behavior Matrix, or explicitly ignored with a reason. Use the invariant registry README/schema as base knowledge only; README examples are not runtime entries. Do not invent new invariant entries here.
168
+ - Suspicious N/A cells: any `N/A` that could actually occur, lacks a concrete reason, or hides a known QA failure mode.
169
+ - GAP triage: any `GAP` cell that is too important to leave open before build because code would otherwise guess behavior.
170
+ - Weak AS cells: an AS referenced by a matrix cell but not asserting the same state, viewer, surface, source/timing, label/visibility/action, or cascade/parity obligation.
171
+ - Surface parity holes: list/detail/feed/dashboard/worklist/API/email/calendar values that should match but are covered only globally or on one surface.
172
+ - Cascade holes: state transitions that update one surface but omit queues, counts, feed/timeline, notifications/email, calendar/provider state, or read-model invalidation.
173
+ - Delete/orphan/incomplete/out-of-order handling when the matrix implies cross-module data flow: deleted source record, orphan target reference, incomplete source data, target created before source, refresh timing mismatch.
174
+ - Timing/source ambiguity: cells that say "updated" without realtime vs refresh-required vs persisted+served vs transient-in-response.
175
+
176
+ Mandatory output discipline for this lens:
177
+ - Include a section or finding line named **Suspicious N/A/GAP Review** whenever the matrix has any `N/A` or `GAP` cell.
178
+ - For every `N/A`, state one of: `accepted N/A: <reason is concrete>` or `suspicious N/A: <why this may actually occur>`.
179
+ - For every `GAP`, state one of: `safe GAP: <why build can proceed without guessing>` or `blocking GAP: <why code would otherwise guess behavior>`.
180
+ - If there are no suspicious cells, explicitly write `Suspicious N/A/GAP Review: no suspicious N/A/GAP cells found; all reasons concrete and non-release-blocking`.
181
+
182
+ For suggested fixes, do not say "add tests". Fix the spec: add/modify matrix row, convert N/A to AS/GAP, strengthen the AS Then clause, or add a constraint with per-surface coverage.
183
+ ```
184
+
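The mandatory output discipline of the Lifecycle & Parity lens amounts to a fixed report shape. A minimal sketch follows, assuming the reviewer has already judged each cell: `na_gap_review` and the keys `cell`, `coverage`, `suspicious`, and `why` are hypothetical names used for illustration, and only the verdict phrases come from the prompt above.

```python
def na_gap_review(cells):
    """Format the Suspicious N/A/GAP Review section.

    cells: list of dicts; N/A and GAP cells carry a boolean 'suspicious'
    verdict and a 'why' string (both hypothetical field names).
    Returns "" when the matrix has no N/A or GAP cell, since the section
    is only required in that case.
    """
    lines, any_suspicious = [], False
    for c in cells:
        coverage = c["coverage"]
        if coverage == "N/A":
            verdict = "suspicious N/A" if c["suspicious"] else "accepted N/A"
        elif coverage.startswith("GAP"):
            verdict = "blocking GAP" if c["suspicious"] else "safe GAP"
        else:
            continue  # AS-covered cells are outside this review
        any_suspicious = any_suspicious or c["suspicious"]
        lines.append(f"- {c['cell']}: {verdict}: {c['why']}")
    if not lines:
        return ""
    if any_suspicious:
        header = "Suspicious N/A/GAP Review:"
    else:
        header = ("Suspicious N/A/GAP Review: no suspicious N/A/GAP cells found; "
                  "all reasons concrete and non-release-blocking")
    return "\n".join([header] + lines)
```

Every `N/A` and `GAP` cell gets its own line in both branches, so the coordinator's Phase 4 check can confirm that none was skipped.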
185
+ ## Phase 4: Collect and Consolidate
186
+
187
+ After all reviewers complete:
188
+
189
+ 1. **Collect** all findings from all reviewers
190
+ 2. **Deduplicate** — if two lenses found the same root issue, merge into one finding noting both lenses
191
+ 3. **Rate severity** using Likelihood × Impact:
192
+
193
+ | | Low Impact | Medium Impact | High Impact |
194
+ |---|-----------|---------------|-------------|
195
+ | **Likely** | Medium | High | Critical |
196
+ | **Possible** | Low | Medium | High |
197
+ | **Unlikely** | Low | Low | Medium |
198
+
199
+ 4. **Sort** by severity: Critical → High → Medium → Low
200
+ 5. **Cap** at 15 findings: keep all Critical, top High by specificity, note how many Medium were dropped
201
+ 6. **Cross-reference check** (you, not reviewers): Flag any stories without acceptance scenarios, and any AS that contradicts the story description
202
+ 7. **Behavior Matrix cross-check** (you, not reviewers, when present): Every matrix `Coverage = AS-NNN` must point to an AS that actually asserts the same state/viewer/surface outcome; every `GAP-NNN` must exist in `## Gaps`; every `N/A` must have a reason. Missing or mismatched cells are accepted findings unless another reviewer already caught them.
203
+ 8. **Suspicious N/A/GAP Review cross-check** (you, not reviewers, when present): If the plan has `## Behavior Matrix` and any `N/A` or `GAP` cell, the final challenge output MUST include a `Suspicious N/A/GAP Review` section. Omission is itself a Medium finding because it lets lifecycle/viewer/surface blind spots hide behind "not applicable" or unresolved gaps.
204
+ 9. **Sibling Surface Map cross-check** (you, not reviewers, when present): If `## Sibling Surface Map` exists, every high/medium candidate must have a disposition (`cover`, `GAP-NNN`, or `ignore(reason)`). If the plan touches an existing operation and has no map, flag missing discovery unless the plan states why sibling discovery is not applicable.
205
+ 10. **Invariant registry cross-check** (you, not reviewers, when present): If project-local `docs/invariants/INV-*.md` contains an entry matching the planned component, the plan must either carry it into Constraints/Behavior Matrix/GAPs or explicitly state why it does not apply. Missing invariant handling is a Medium/High finding depending on `status`.
206
+
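Steps 3 to 5 (rate, sort, cap) can be sketched as a lookup plus a stable sort. This is an illustration under stated assumptions, not the skill's implementation: `consolidate` and the `likelihood`/`impact` keys are hypothetical, and because "specificity" is not defined numerically, the sketch assumes findings arrive already ordered from most to least specific and relies on sort stability to preserve that order within a severity.

```python
# Likelihood x Impact grid copied from the table above.
SEVERITY = {
    ("likely", "low"): "Medium", ("likely", "medium"): "High", ("likely", "high"): "Critical",
    ("possible", "low"): "Low", ("possible", "medium"): "Medium", ("possible", "high"): "High",
    ("unlikely", "low"): "Low", ("unlikely", "medium"): "Low", ("unlikely", "high"): "Medium",
}
ORDER = ["Critical", "High", "Medium", "Low"]
CAP = 15


def consolidate(findings):
    """Rate, sort, and cap findings; return (kept, dropped_medium_count)."""
    rated = [dict(f, severity=SEVERITY[(f["likelihood"], f["impact"])]) for f in findings]
    rated.sort(key=lambda f: ORDER.index(f["severity"]))  # stable: keeps input order per severity
    critical = [f for f in rated if f["severity"] == "Critical"]
    rest = [f for f in rated if f["severity"] != "Critical"]
    room = max(CAP - len(critical), 0)  # all Critical findings are kept, even past the cap
    kept = critical + rest[:room]
    dropped_medium = sum(1 for f in rest[room:] if f["severity"] == "Medium")
    return kept, dropped_medium
```

The second return value is the number the coordinator reports when noting how many Medium findings were dropped.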
207
+ ## Phase 5: Adjudicate
208
+
209
+ For each finding, YOU (the coordinator) evaluate and propose a disposition:
210
+
211
+ | Disposition | When to use |
212
+ |-------------|-------------|
213
+ | **Accept** | Valid flaw. Plan should be updated. |
214
+ | **Reject** | False positive, acceptable risk, or already handled elsewhere. |
215
+
216
+ Include 1-sentence rationale for each disposition. Be honest — don't reject valid findings to be nice, and don't accept trivial findings to pad the list.
217
+
218
+ ## Phase 6: Present to User
219
+
220
+ Show adjudicated findings using the reviewer output format plus Disposition and Rationale fields.
221
+
222
+ Then present the decision using the `AskUserQuestion` tool:
223
+
224
+ ```json
225
+ {
226
+ "questions": [
227
+ {
228
+ "question": "How to proceed with N accepted findings? RECOMMENDATION: Choose A if mostly Medium fixes, B if any Critical/High findings.",
229
+ "header": "Apply Findings",
230
+ "multiSelect": false,
231
+ "options": [
232
+ {"label": "A) Apply all accepted — bulk-apply all fixes at once | (human: ~30m / CC: ~10m) | Completeness: 8/10 | Trade-off: fast vs. no per-finding control"},
233
+ {"label": "B) Review each — walk through one by one, accept/reject/modify | (human: ~1h / CC: ~20m) | Completeness: 10/10 | Trade-off: precise control vs. slower"}
234
+ ]
235
+ }
236
+ ]
237
+ }
238
+ ```
239
+
240
+ Score: if most findings are High/Critical, recommend B. If mostly Medium with clear fixes, recommend A.
241
+
242
+ If user picks B: for each finding, use `AskUserQuestion`. Append `(Recommended)` to option A if the Phase 5 adjudication = Accept, or to option C if adjudication = Reject:
243
+
244
+ ```json
245
+ {
246
+ "questions": [
247
+ {
248
+ "question": "Finding [C-1]: <title>\n<flaw summary>\nRECOMMENDATION: Choose A — <adjudication rationale>.",
249
+ "header": "Finding C-1",
250
+ "multiSelect": false,
251
+ "options": [
252
+ {"label": "A) Accept — apply the suggested fix (Recommended)"},
253
+ {"label": "B) Modify — accept with changes (describe your modification)"},
254
+ {"label": "C) Reject — skip this finding"}
255
+ ]
256
+ }
257
+ ]
258
+ }
259
+ ```
260
+
261
+ *(Example above shows adjudication = Accept. If adjudication = Reject, move `(Recommended)` to option C instead.)*
262
+
263
+ ## Phase 7: Apply
264
+
265
+ For each accepted finding:
266
+ 1. Edit the target file at the exact location cited
267
+ 2. Apply the fix (or user's modified version)
268
+ 3. Surgical edits only — do NOT rewrite surrounding sections
269
+
270
+ After all edits, show summary:
271
+ ```
272
+ Challenge complete.
273
+ Reviewers: N lenses
274
+ Findings: X total → Y accepted, Z rejected
275
+ Severity: N Critical, N High, N Medium
276
+ Files modified: [list]
277
+ Next: /sp-build to implement, or /sp-plan to regenerate if major changes.
278
+ ```
279
+
280
+ If a reviewer returns > 7 findings, take only top 7 by severity. If a reviewer fails, proceed with remaining reviewers.
281
+
282
+ ## Rules — Non-Negotiable
283
+
284
+ 1. **Spawn reviewers in parallel.** Don't run lenses in your own context.
285
+ 2. **Reviewers read files directly.** Pass paths, not content.
286
+ 3. **Be hostile.** No praise. Not in reviewers, not in adjudication.
287
+ 4. **Quote the plan.** Every finding needs a direct quote in Evidence.
288
+ 5. **Don't manufacture findings.** 3 honest findings > 15 padded ones.
289
+ 6. **Skip style/formatting.** Substance only: logic, security, assumptions, scope.
@@ -141,6 +141,41 @@ Also note the **project domain** from CLAUDE.md (payment, booking, content, heal
141
141
 
142
142
  ---
143
143
 
144
+ ## Phase 0.5 — Sibling Discovery Pass (candidate only)
145
+
146
+ Run this after Phase 0 when the feature changes an existing operation, fixes a bug, or touches state/viewer/surface behavior. Purpose: find sibling entry-points that perform the same domain operation but may not be named in the ticket.
147
+
148
+ This pass produces candidates, not requirements. A candidate may become a confirmed surface only after the user/spec/code evidence supports it. Do not auto-promote noisy matches into acceptance scenarios.
149
+
150
+ **Inputs:** raw symptom text, feature nouns, touched component/module, existing code hits from Phase 0, matching project-local `docs/invariants/INV-*.md` entries, and any shared anchors/constants already found.
151
+
152
+ **Deterministic recipe:**
153
+
154
+ 1. **Seed nouns and verbs:** extract 3-8 terms such as domain object (`appointment`, `invite`, `matchup`), operation (`create`, `reschedule`, `cancel`, `send`), and surface nouns (`outreach`, `modal`, `guide`, `calendar`, `queue`).
155
+ 2. **Shared-anchor callers:** if a helper/constant/schema appears central, use `ga_callers` when GA is available; otherwise grep the anchor. Examples: `_stamp_*`, `*_status`, `send_*invite*`, `log_*outcome*`, `create_*`.
156
+ 3. **Fuzzy sibling names:** search for parallel naming patterns: `create_from_*`, `*_from_<source>`, `send_*invite*`, `*_outcome*`, `reschedule*`, `book_next*`, `cancel*`, `delete*`, and domain-specific verbs from Phase 0.
157
+ 4. **Git change-coupling:** inspect recent co-change around seed files with `git log --full-diff --name-only -- <seed-file>` (without `--full-diff`, the pathspec limits the listed names to the seed file itself) and look for files/functions repeatedly changed with the seed. This is recall-oriented evidence, not proof.
158
+ 5. **GA blast radius if available:** use `ga_impact` for touched symbols/files to find connected blast radius, but do not treat importers-only output as complete sibling discovery. Siblings may be co-changed or share anchors without importing each other.
159
+
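The change-coupling step of the recipe can be sketched as a small parser over `git log` output. The sketch assumes the log was produced with `git log --full-diff --name-only --pretty=format:"COMMIT %H" -- <seed-file>`; when a pathspec is given, `git log` lists only matching paths unless `--full-diff` is added. The `COMMIT ` marker and the function name `co_change_counts` are conventions chosen for this example, not part of the skill.

```python
from collections import Counter


def co_change_counts(log_text, seed_file):
    """Count how often each file changed in the same commit as seed_file.

    log_text: output of the git command described above, where each commit
    starts with a 'COMMIT <hash>' line followed by the changed paths.
    """
    commits, current = [], None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("COMMIT "):
            current = set()
            commits.append(current)
        elif line and current is not None:
            current.add(line)
    counts = Counter()
    for files in commits:
        if seed_file in files:
            counts.update(files - {seed_file})
    return counts
```

Calling `co_change_counts(text, seed).most_common(10)` gives a ranked candidate list; as the recipe says, a high count is recall-oriented evidence, so each hit still needs a disposition in the candidate table.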
160
+ Record every plausible sibling in a table:
161
+
162
+ | Candidate | Operation | Evidence | Confidence | Obligation |
163
+ |---|---|---|---|---|
164
+ | `<surface/path/symbol>` | same create/update/delete/send/read op? | `ga_callers` / grep / co-change / invariant / user text | high / medium / low | cover / GAP / ignore(reason) |
165
+
166
+ Rules:
167
+
168
+ - `high`: direct shared anchor, explicit invariant sibling, or same operation named in user/spec text.
169
+ - `medium`: strong fuzzy naming or repeated co-change with the seed.
170
+ - `low`: weak name similarity only.
171
+ - `cover`: candidate is confirmed in current scope and must feed `/sp-plan` surfaces.
172
+ - `GAP`: candidate seems material but expected behavior/scope is unknown.
173
+ - `ignore(reason)`: candidate is false positive or intentionally out of scope.
174
+
175
+ Exit condition: every high/medium candidate has `cover`, `GAP`, or `ignore(reason)`. Low-confidence candidates can be listed as notes and do not block handoff.
176
+
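The exit condition can be checked mechanically once the candidate table is filled in. A minimal sketch, assuming each table row is represented as a dictionary with hypothetical keys `candidate`, `confidence`, and `obligation`:

```python
import re

# Accepted dispositions: cover, GAP (optionally numbered), or ignore(<non-empty reason>).
DISPOSITION = re.compile(r"^(cover|GAP(-\d+)?|ignore\(\s*\S.*\))$")


def blocking_candidates(candidates):
    """Return high/medium candidates that still lack a valid disposition.

    An empty list means the exit condition is met; low-confidence
    candidates never block handoff.
    """
    return [
        c["candidate"]
        for c in candidates
        if c["confidence"] in ("high", "medium")
        and not DISPOSITION.match(c.get("obligation", "").strip())
    ]
```

An `ignore()` with an empty reason is treated as undecided here, on the reading that the reason is what makes the dismissal auditable later.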
177
+ ---
178
+
144
179
  ## Phase 1 — Why, not what
145
180
 
146
181
  **If Phase 0 found existing code > 30%:**
@@ -466,6 +501,69 @@ If B or C → fix and confirm again. Do not proceed to Phase 6.5 until the user
466
501
 
467
502
  ---
468
503
 
504
+ ## Phase 6.25 — Behavior Matrix discovery axes
505
+
506
+ Run this before the self-audit when the feature touches any state/status/stage, permissions, multiple roles/viewers, repeated read surfaces, cross-module write/read propagation, notification, feed, dashboard, calendar, or external integration.
507
+
508
+ Purpose: capture the three axes that `/sp-plan` needs to build `## Behavior Matrix`. Do not fill matrix cells here. Discovery only identifies axes, source paths, and open questions.
509
+
510
+ ### Axis A — States / lifecycle
511
+
512
+ Derive from the user's flow, business rules, existing code, and scenarios:
513
+
514
+ - Explicit statuses/states/stages, including terminal states.
515
+ - Transition triggers: user action, system event, webhook, cron, retry, admin override.
516
+ - Blocked states: states where the action is hidden, disabled, rejected, or should be `N/A`.
517
+ - Timing: immediate, eventually consistent, queued, retryable, or external-service-dependent.
518
+
519
+ If any state is implied but unnamed, ask:
520
+ > "This behavior depends on record state. Which statuses should support it, and which statuses should block it?"
+
+ ### Axis B — Viewers / roles / relationships
+
+ Derive from permissions, multi-role flow, ownership, assignment, and notification recipients:
+
+ - Actor roles: who can perform the write action.
+ - Viewer roles: who can see the result after the write.
+ - Relationship variants: owner vs assignee vs manager vs admin vs unrelated user vs invited participant.
+ - Recipient identity rules: which email/account/contact identity is authoritative when notifications/calendar/events are sent.
+
+ If the same role can be in different relationships to the record, treat those as separate viewers. Example: `trainer assigned` and `trainer unassigned` are different viewers even if both have role `trainer`.
+
+ If any viewer is implied but unnamed, ask:
+ > "After this change, who needs to see the updated state: actor only, assigned user, manager/admin, external participant, or everyone with list access?"
+
+ ### Axis C — Surfaces / module paths
+
+ Derive from codebase scan, UI sketches, affected screens, APIs, notifications, and integrations:
+
+ - Write surfaces: page/action/form/API/webhook/cron/provider callback that can create or change the state.
+ - Read surfaces: list row, detail page, dashboard count, worklist/queue, feed/activity log, API list, API single-get, export/report, email, push/in-app notification, calendar/provider event, search/index, audit log.
+ - Module Dependency Map: for each write surface, list every read surface/module expected to reflect it.
+ - Existing evidence: attach file paths or route names when Phase 0 found them; mark unknown surfaces as `X / needs confirmation`.
+
+ For every material write/read pair, record:
+
+ | Write / CREATE surface | Read surface | Direction | Timing tier | Source of truth | Open question |
+ |------------------------|--------------|-----------|-------------|-----------------|---------------|
+ | `<form/API/event>` | `<list/detail/feed/...>` | write -> read | `sync` / `async` / `external-down` | DB/read model/provider/cache | `none` / question |
+ | `<read/API/provider>` | `<write form/action>` | read -> write | `sync` / `async` / `external-down` | DB/read model/provider/cache | `none` / question |
+
+ Use both directions when the read surface can initiate or constrain the next write. Example: a worklist row is not just read-only if it contains a reschedule/assign/cancel action.
+
+ Timing tier definitions:
+
+ - `sync` — user should see the result immediately after the transaction or page refresh.
+ - `async` — background worker, projection, queue, webhook, polling, or eventual consistency is expected.
+ - `external-down` — behavior changes when a provider/API is unavailable, delayed, or retrying.
+
+ If any surface pair is unknown, ask:
+ > "Besides the detail page, where else must this state appear or be actionable: list, dashboard, queue/worklist, feed, API, email, calendar, or reports?"
+
+ Exit condition: the handoff has non-empty States, Viewers, and Surfaces lists for stateful features, plus at least one write/read pair for every write surface. If a list is genuinely not applicable, record `N/A` with reason.
+
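The Phase 6.25 exit condition above can be sketched as a validation pass over the handoff. This is only an illustration under assumptions: the handoff object shape, its property names (`states`, `viewers`, `surfaces`, `writeSurfaces`, `pairs`), the `{ na, reason }` marker for a not-applicable list, and both helper functions are hypothetical and do not come from the package.

```javascript
// Hypothetical handoff shape; all property names are assumptions for illustration.
function isFilled(list) {
  // A list passes when it has entries, or is an explicit N/A with a reason.
  if (Array.isArray(list)) return list.length > 0;
  return Boolean(list && list.na === true && list.reason);
}

function axesExitProblems(handoff) {
  const problems = [];
  for (const axis of ["states", "viewers", "surfaces"]) {
    if (!isFilled(handoff[axis])) {
      problems.push(`${axis}: empty and no N/A reason`);
    }
  }
  // Every write surface needs at least one write/read pair.
  const paired = new Set(handoff.pairs.map((p) => p.write));
  for (const w of handoff.writeSurfaces) {
    if (!paired.has(w)) problems.push(`write surface without a read pair: ${w}`);
  }
  return problems;
}

// Illustrative handoff (made up for this sketch).
const handoff = {
  states: ["draft", "published"],
  viewers: { na: true, reason: "single-user tool" },
  surfaces: ["edit form", "list row"],
  writeSurfaces: ["edit form", "publish webhook"],
  pairs: [
    { write: "edit form", read: "list row", direction: "write -> read", timing: "sync" },
  ],
};

const problems = axesExitProblems(handoff);
console.log(problems); // [ 'write surface without a read pair: publish webhook' ]
```

An empty `problems` array corresponds to the exit condition being met; each entry points at an axis or write surface that still needs a question back to the user.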
+ ---
+
  ## Phase 6.5 — Self-audit (blind spot sweep)
 
  **Purpose:** Before writing the handoff summary, step back and think like a senior dev who just received this spec. What would they immediately ask? This step catches the 80% of obvious questions that phase-by-phase discovery misses because it was too focused on following the script. The more thorough this step is, the fewer surprises during implementation.
@@ -619,6 +717,36 @@ Timeout: [if role B does not act within X hours then...]
  - [New or changed fields/tables]
  - Migration: [backfill needed / format conversion / data cleanup]
 
+ **Behavior Matrix discovery axes:** _(required for stateful / role-sensitive / multi-surface features; consumed by `/sp-plan`)_
+
+ Sibling Candidate Table: _(required when Phase 0.5 ran; consumed by `/sp-plan`)_
+ | Candidate | Operation | Evidence | Confidence | Obligation |
+ |---|---|---|---|---|
+ | [surface/path/symbol] | [same create/update/delete/send/read op?] | [ga_callers / grep / co-change / invariant / user text] | high / medium / low | cover / GAP / ignore(reason) |
+
+ Confirmed sibling surfaces for planning:
+ - [surface/path/symbol confirmed from candidate table, or N/A with reason]
+
+ States / lifecycle:
+ - [State/status/stage 1 — transition trigger, terminal? yes/no, blocked? yes/no]
+ - [State/status/stage 2 — transition trigger, terminal? yes/no, blocked? yes/no]
+
+ Viewers / roles / relationships:
+ - [Actor/viewer 1 — role + relationship to record + allowed actions]
+ - [Actor/viewer 2 — role + relationship to record + allowed actions]
+ - [Recipient identity rule if notifications/calendar exist]
+
+ Surfaces / module paths:
+ - Write surfaces: [form/action/API/webhook/cron/provider callback + file/route evidence if known]
+ - Read surfaces: [list/detail/dashboard/worklist/feed/API/email/calendar/search/audit/export + file/route evidence if known]
+ - Unknown surfaces: [X / needs confirmation, or N/A with reason]
+
+ CREATE/READ pair map:
+ | Write / CREATE surface | Read surface | Direction | Timing tier | Source of truth | Open question |
+ |------------------------|--------------|-----------|-------------|-----------------|---------------|
+ | [write surface] | [read surface] | write -> read | sync / async / external-down | DB/read model/provider/cache | none / question |
+ | [read/action surface] | [write surface] | read -> write | sync / async / external-down | DB/read model/provider/cache | none / question |
+
  **Impact on existing system:**
  - [Affected screens/flows + description of impact]
 
@@ -689,6 +817,8 @@ Self-check before writing the output file:
  - [ ] Input validation is clear for every user-facing field
  - [ ] Permissions are clear for every relevant role
  - [ ] If multi-role: cross-role flow confirmed, including timeouts and conflicts
+ - [ ] If stateful / role-sensitive / multi-surface: Behavior Matrix discovery axes are filled with States, Viewers, Surfaces, and CREATE/READ pair map
+ - [ ] If existing-operation or bug-fix discovery ran: Sibling Candidate Table lists every high/medium candidate with cover / GAP / ignore(reason)
  - [ ] UI expectation confirmed — dev team has no room to improvise
  - [ ] Edge cases covered for critical paths *(can be deferred if time-boxed — log as Open questions)*
  - [ ] Out of scope has at least 1 item listed
@@ -728,3 +858,5 @@ If any item is unchecked → return to the corresponding phase and ask more —
  | T17 | No phasing discussion | Large scope, short timeline, features cut mid-build with no plan |
  | T18 | Only asking, never suggesting defaults when client is unsure | Client gets stuck, session drags, no decision made |
  | T19 | Not suggesting a simpler approach when client's expectations are high | Spec says Three.js for a simple animation — CSS was enough; WebSocket for 5-minute data updates — polling was enough |
+ | T20 | Not extracting state/viewer/surface axes | `/sp-plan` has to reconstruct the matrix from prose and misses lifecycle/parity bugs |
+ | T21 | Listing surfaces without CREATE/READ timing | Async projections, external-down behavior, and stale read paths are left for QA to discover after the code is written |