claude-devkit-cli 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,210 @@
1
+ Think hard. This is adversarial plan review — you are the coordinator who sends hostile reviewers to DESTROY a plan, then adjudicates their findings.
2
+
3
+ ## Input
4
+
5
+ Target: $ARGUMENTS
6
+
7
+ If argument is a file path → use that.
8
+ If argument is a feature name → search `docs/test-plans/` and `docs/specs/` for matches.
9
+ If no argument → list recent files in both dirs, ask user which to challenge.
10
+
11
+ ## Phase 1: Read and Map
12
+
13
+ Read the ENTIRE target file. Also read related files:
14
+ - If target is a test plan → also read the corresponding spec in `docs/specs/`
15
+ - If target is a spec → also read the corresponding test plan in `docs/test-plans/`
16
+
17
+ Map the plan's attack surface:
18
+ - Decisions made (and what was rejected)
19
+ - Assumptions (stated AND implied)
20
+ - Dependencies (external services, APIs, libraries, infra)
21
+ - Scope boundaries (in/out/suspiciously unmentioned)
22
+ - Risk acknowledgments (mentioned vs. conspicuously absent)
23
+ - Spec↔plan consistency (use cases without test cases? contradictions?)
24
+
25
+ Collect all file paths the reviewers will need to read.
26
+
27
+ ## Phase 2: Scale Reviewers
28
+
29
+ Assess plan complexity and select which lenses to deploy:
30
+
31
+ | Complexity Signal | Reviewers | Lenses |
32
+ |-------------------|-----------|--------|
33
+ | Simple (1 spec section, <20 test cases, no auth/data) | 2 | Assumptions + Scope |
34
+ | Standard (multiple sections, auth or data involved) | 3 | + Security |
35
+ | Complex (multiple integrations, concurrency, migrations, 6+ phases) | 4 | + Failure Modes |
36
+
37
+ When in doubt, use 3 reviewers. 4 is for genuinely complex plans.
38
+
39
+ ## Phase 3: Spawn Parallel Reviewers
40
+
41
+ Launch reviewers simultaneously using the Agent tool. Each reviewer is an independent subagent that reads the plan files directly and returns findings.
42
+
43
+ **CRITICAL:** Each reviewer prompt MUST include:
44
+ 1. The file paths to read (so they can access the plan directly)
45
+ 2. Their specific adversarial persona and lens
46
+ 3. The exact output format (so you can parse findings consistently)
47
+ 4. The rules of engagement
48
+
49
+ ### Reviewer Prompts
50
+
51
+ For each selected lens, spawn an agent with this structure:
52
+
53
+ ```
54
+ You are a hostile reviewer. Your job is to DESTROY this plan by finding every flaw through the {LENS_NAME} lens.
55
+
56
+ Read these files first:
57
+ {LIST OF FILE PATHS}
58
+
59
+ --- YOUR LENS ---
60
+
61
+ {LENS-SPECIFIC INSTRUCTIONS — see below}
62
+
63
+ --- OUTPUT FORMAT ---
64
+
65
+ For EACH flaw found, output exactly:
66
+
67
+ ### Finding: <title>
68
+ - **Severity:** Critical | High | Medium
69
+ - **Location:** <exact section or heading in the plan>
70
+ - **Flaw:** <what's wrong — be specific>
71
+ - **Evidence:** "<direct quote from the plan>"
72
+ - **Failure scenario:** <step-by-step: how this causes a real problem in production>
73
+ - **Root cause:** <why does this flaw exist? Missing requirement? Wrong assumption?>
74
+ - **Suggested fix:** <specific, actionable — not just "fix it">
75
+
76
+ --- RULES ---
77
+
78
+ - 3-7 findings per lens. Quality over quantity.
79
+ - Be HOSTILE. No praise. No "overall looks good."
80
+ - Be SPECIFIC. Cite exact sections. Quote the plan.
81
+ - Be CONCRETE. Failure scenarios must be step-by-step, not "could be a problem."
82
+ - Skip trivial issues (naming, formatting, style).
83
+ - If the plan is solid for your lens, 1-2 findings is honest. Don't manufacture problems.
84
+ ```
85
+
86
+ ### Lens-Specific Instructions
87
+
88
+ **Security Adversary:**
89
+ ```
90
+ You are an attacker with knowledge of the tech stack and access to the public API.
91
+
92
+ Examine the plan for:
93
+ - Authentication/authorization bypass: Can auth be skipped? Can user A access user B's data? Are role checks at every layer?
94
+ - Injection vectors: Where does user input enter? SQL, shell, HTML, template, log injection? Parameterized queries?
95
+ - Data exposure: What leaks in error messages, logs, API responses? Stack traces? Internal paths? DB schemas?
96
+ - Cryptography: Password hashing (bcrypt/argon2, not MD5/SHA)? Secrets in env vars not code? TLS?
97
+ - Supply chain: New dependencies? Maintained? Known CVEs?
98
+ - OWASP Top 10 (2021): Broken Access Control, Cryptographic Failures, Injection, Insecure Design, Security Misconfiguration, Vulnerable and Outdated Components, Identification and Authentication Failures, Software and Data Integrity Failures, Security Logging and Monitoring Failures, SSRF
99
+ ```
100
+
101
+ **Failure Mode Analyst:**
102
+ ```
103
+ You believe Murphy's Law: everything that can go wrong, will — simultaneously, at 3 AM, during peak traffic.
104
+
105
+ Examine the plan for:
106
+ - Partial failures: What if step 3 of 5 fails? Rollback? Atomic writes? Inconsistent state?
107
+ - Concurrency: Race conditions? Two users editing same resource? Shared mutable state? Deadlocks?
108
+ - Cascading failures: Service A down → B also fails? Circuit-breaking? Graceful degradation?
109
+ - Data integrity: Data loss? Corruption? Duplication? DB-level constraints or app-only validation?
110
+ - Recovery: How to recover from each failure? Reversible migrations? Backup restoration time?
111
+ - Deployment: What breaks during deploy? Rollback plan? Migration failures?
112
+ - Idempotency: Retried requests duplicate data? Double-charge? Double-email?
113
+ - Observability: How do you KNOW something failed? Logging? Monitoring? Alerts? Or angry users?
114
+ ```
115
+
116
+ **Assumption Destroyer:**
117
+ ```
118
+ You are a radical skeptic. "It should work" is not evidence. "We assume X" means X is unverified.
119
+
120
+ Examine the plan for:
121
+ - Unverified claims: "The API returns X" — tested? "The library supports Y" — checked docs?
122
+ - Scale assumptions: Expected load? Works at 10x? 100x? O(n²) hiding in "iterate all items"?
123
+ - Environment gaps: Same behavior in dev/staging/prod? Different OS? Docker vs bare metal?
124
+ - Integration risk: Third-party SLA? Rate limits? Their service down → your plan?
125
+ - Data assumptions: Always clean? Unicode? Emoji? Null bytes? 10MB payloads? Empty strings?
126
+ - User behavior: Will users actually do this? What if they click 50 times? Upload 2GB? Use mobile?
127
+ - Timing: "A before B" — always? What if B first? Implicit ordering dependencies?
128
+ - Hidden dependencies: Services, configs, env vars, or manual steps that must exist but aren't documented?
129
+ ```
130
+
131
+ **Scope & Complexity Critic (YAGNI Enforcer):**
132
+ ```
133
+ You believe the best code is no code. The best feature is the one you didn't build.
134
+
135
+ Examine the plan for:
136
+ - Over-engineering: Solving problems that don't exist yet? "In case we need it later" = YAGNI.
137
+ - Premature abstraction: Generic framework for 1 use case? Plugin system nobody asked for?
138
+ - Missing MVP: What's the absolute minimum viable delivery? Can 40% be deferred?
139
+ - Complexity vs value: Distributed system for 5 users? Proportional?
140
+ - Gold plating: Nice-to-have mixed with must-have? Can you ship without the nice-to-haves?
141
+ - Simpler alternative: Boring 10-line solution vs clever 500-line solution?
142
+ - Test burden: Test cases harder to maintain than the feature itself?
143
+ ```
144
+
145
+ ## Phase 4: Collect and Consolidate
146
+
147
+ After all reviewers complete:
148
+
149
+ 1. **Collect** all findings from all reviewers
150
+ 2. **Deduplicate** — if two lenses found the same root issue, merge into one finding noting both lenses
151
+ 3. **Rate severity** using Likelihood × Impact:
152
+
153
+ | | Low Impact | Medium Impact | High Impact |
154
+ |---|-----------|---------------|-------------|
155
+ | **Likely** | Medium | High | Critical |
156
+ | **Possible** | Low | Medium | High |
157
+ | **Unlikely** | Low | Low | Medium |
158
+
159
+ 4. **Sort** by severity: Critical → High → Medium → Low
160
+ 5. **Cap** at 15 findings: keep all Critical, top High by specificity, note how many Medium were dropped
161
+ 6. **Cross-reference check** (you, not reviewers): If both spec and test plan exist, flag any use cases in spec without test cases, and any test cases that contradict the spec
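
The Likelihood × Impact matrix can be sketched as a small lookup. This is illustrative only (not part of the workflow), and it assumes lowercase labels:

```shell
# Illustrative lookup for the severity matrix above.
# Usage: severity <likely|possible|unlikely> <low|medium|high>
severity() {
  case "$1:$2" in
    likely:high)                              echo Critical ;;
    likely:medium|possible:high)              echo High ;;
    likely:low|possible:medium|unlikely:high) echo Medium ;;
    *)                                        echo Low ;;
  esac
}

severity likely high       # Critical
severity possible medium   # Medium
severity unlikely low      # Low
```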
162
+
163
+ ## Phase 5: Adjudicate
164
+
165
+ For each finding, YOU (the coordinator) evaluate and propose a disposition:
166
+
167
+ | Disposition | When to use |
168
+ |-------------|-------------|
169
+ | **Accept** | Valid flaw. Plan should be updated. |
170
+ | **Reject** | False positive, acceptable risk, or already handled elsewhere. |
171
+
172
+ Include 1-sentence rationale for each disposition. Be honest — don't reject valid findings to be nice, and don't accept trivial findings to pad the list.
173
+
174
+ ## Phase 6: Present to User
175
+
176
+ Show adjudicated findings using the reviewer output format plus Disposition and Rationale fields.
177
+
178
+ Then ask: "How would you like to proceed?"
179
+ 1. **"Apply all accepted"** — update the plan with all accepted fixes
180
+ 2. **"Review each"** — walk through one by one, accept/reject/modify each
181
+
182
+ If user picks "Review each": for each finding, ask "Accept / Reject / Modify fix?"
183
+
184
+ ## Phase 7: Apply
185
+
186
+ For each accepted finding:
187
+ 1. Edit the target file at the exact location cited
188
+ 2. Apply the fix (or user's modified version)
189
+ 3. Surgical edits only — do NOT rewrite surrounding sections
190
+
191
+ After all edits, show summary:
192
+ ```
193
+ Challenge complete.
194
+ Reviewers: N lenses
195
+ Findings: X total → Y accepted, Z rejected
196
+ Severity: N Critical, N High, N Medium
197
+ Files modified: [list]
198
+ Next: /test to implement, or /plan to regenerate if major changes.
199
+ ```
200
+
201
+ If a reviewer returns > 7 findings, take only top 7 by severity. If a reviewer fails, proceed with remaining reviewers.
202
+
203
+ ## Rules — Non-Negotiable
204
+
205
+ 1. **Spawn reviewers in parallel.** Don't run lenses in your own context.
206
+ 2. **Reviewers read files directly.** Pass paths, not content.
207
+ 3. **Be hostile.** No praise. Not in reviewers, not in adjudication.
208
+ 4. **Quote the plan.** Every finding needs a direct quote in Evidence.
209
+ 5. **Don't manufacture findings.** 3 honest findings > 15 padded ones.
210
+ 6. **Skip style/formatting.** Substance only: logic, security, assumptions, scope.
@@ -0,0 +1,97 @@
1
+ EXECUTE, not EXPLORE. Follow these steps exactly. Minimize tool calls.
2
+
3
+ ## Step 1 — Analyze (single compound command)
4
+
5
+ ```bash
6
+ echo "=== STATUS ===" && \
7
+ git status --short 2>/dev/null && \
8
+ echo "=== DIFF STAT ===" && \
9
+ git diff --stat 2>/dev/null && \
10
+ git diff --cached --stat 2>/dev/null && \
11
+ echo "=== METRICS ===" && \
12
+ { git diff --shortstat 2>/dev/null; git diff --cached --shortstat 2>/dev/null; } && \
13
+ echo "=== SECRETS ===" && \
14
+ (git diff 2>/dev/null; git diff --cached 2>/dev/null) | grep -ciE "(api[_-]?key|token|password|secret|private[_-]?key|credential|auth[_-]?token)" || true && \
15
+ echo "=== DEBUG ===" && \
16
+ (git diff 2>/dev/null; git diff --cached 2>/dev/null) | grep -ciE "(console\.log|debugger|print\(|TODO:.*remove|HACK:|FIXME:.*temp|binding\.pry|var_dump)" || true
17
+ ```
18
+
19
+ ---
20
+
21
+ ## Step 2 — Safety checks
22
+
23
+ **Secrets (hard block):** If count > 0, show matched lines and STOP. Do not commit.
24
+
25
+ **Debug code (soft warn):** If count > 0, show matched lines. Proceed only after user confirms they're intentional.
26
+
27
+ **Large diff:** If > 10 files or > 300 lines, note: "Large commit — consider splitting for easier review." Continue unless user says to split.
28
+
29
+ ---
30
+
31
+ ## Step 3 — Stage files
32
+
33
+ Prefer staging specific files by name. Do NOT use `git add -A`.
34
+
35
+ Never stage: `.env`, credentials, build artifacts, generated files, binaries > 1MB.
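
A minimal demonstration of intentional staging, run in a throwaway repo (file names are illustrative):

```shell
# Throwaway demo: stage by name, then confirm exactly what is staged.
cd "$(mktemp -d)" && git init -q
git -c user.email=demo@local -c user.name=demo commit -qm init --allow-empty
printf 'code\n' > app.ts
printf 'SECRET=x\n' > .env
git add app.ts                  # intentional: only the named file, never `git add -A`
git diff --cached --name-only   # prints app.ts; .env stays unstaged
```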
36
+
37
+ ---
38
+
39
+ ## Step 4 — Generate commit message
40
+
41
+ **Format:** `type(scope): description`
42
+
43
+ | Type | When |
44
+ |------|------|
45
+ | `feat` | New feature |
46
+ | `fix` | Bug fix |
47
+ | `docs` | Documentation only |
48
+ | `test` | Tests only |
49
+ | `refactor` | Code change, no behavior change |
50
+ | `chore` | Maintenance, deps, config |
51
+ | `perf` | Performance improvement |
52
+ | `build` | Build system |
53
+ | `ci` | CI/CD changes |
54
+
55
+ **Breaking changes:** If diff removes/renames a public function, export, or API endpoint → use `feat!` or `fix!` type, or add `BREAKING CHANGE:` footer.
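
One heuristic for spotting removed public symbols, shown here on a hard-coded sample diff (the sample lines and the `export|public` pattern are illustrative, not exhaustive):

```shell
# Heuristic sketch: removed lines that declare exported/public symbols
# are a signal to use feat!/fix! or a BREAKING CHANGE footer.
printf '%s\n' '-export function login() {}' '+function login() {}' \
  | grep -E '^-[[:space:]]*(export|public)[[:space:]]' \
  && echo "possible breaking change: use feat!/fix! or a BREAKING CHANGE footer"
```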
56
+
57
+ **Rules:** Under 72 chars. Imperative tense ("add" not "added"). No period. WHAT+WHY, not HOW.
58
+
59
+ **Bad examples — avoid:**
60
+ - ❌ `Updated some files` — not descriptive
61
+ - ❌ `feat(auth): added login validation using bcrypt with salt rounds of 12` — too long, describes HOW
62
+ - ❌ `Fix bug` — not specific
63
+ - ❌ `WIP` — never commit unfinished work
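
The format and length rules above can be checked mechanically. A hypothetical lint helper, sketched for illustration (the type list mirrors the table; "imperative tense" still needs human judgment):

```shell
# Hypothetical subject-line check: conventional type(scope), <=72 chars.
check_subject() {
  printf '%s' "$1" | grep -qE '^(feat|fix|docs|test|refactor|chore|perf|build|ci)(\([a-z0-9-]+\))?!?: [a-z]' \
    && [ "${#1}" -le 72 ] \
    && echo "ok" \
    || echo "reject: want type(scope): description, <=72 chars"
}

check_subject 'feat(auth): add login rate limiting'   # ok
check_subject 'Updated some files'                    # reject: ...
```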
64
+
65
+ ---
66
+
67
+ ## Step 5 — Commit
68
+
69
+ ```bash
70
+ git commit -m "$(cat <<'EOF'
71
+ type(scope): description
72
+
73
+ Co-Authored-By: Claude <noreply@anthropic.com>
74
+ EOF
75
+ )"
76
+ ```
77
+
78
+ **Do NOT push** unless user explicitly asks.
79
+
80
+ ---
81
+
82
+ ## Output
83
+
84
+ ```
85
+ staged: N files (+X/-Y lines)
86
+ checks: secrets ✓ | debug ✓
87
+ commit: abc1234 type(scope): description
88
+ pushed: no
89
+ ```
90
+
91
+ Keep under 5 lines. No explanations.
92
+
93
+ ## Rules
94
+ 1. **Specific files, not `git add -A`.** Stage intentionally.
95
+ 2. **Secrets = hard block.** No exceptions.
96
+ 3. **Never push without explicit request.**
97
+ 4. **One concern per commit.** Mixed features → suggest separate commits.
@@ -0,0 +1,95 @@
1
+ Test-first bug fix. Investigate → Reproduce → Fix → Verify → Learn.
2
+
3
+ Bug: $ARGUMENTS
4
+
5
+ ---
6
+
7
+ ## Phase 0: Investigate
8
+
9
+ Don't jump to code. Understand the bug first:
10
+
11
+ 1. **Parse the report.** Symptom? Expected vs actual? Repro steps?
12
+ 2. **Locate the code.** Grep for keywords from the bug (error messages, function names).
13
+ 3. **Check history.** `git log --oneline -5 -- <file>` and `git blame -L <range> <file>` — who changed this last and why?
14
+ 4. **Form a hypothesis:** "I believe the bug is caused by [X] in [file:function] because [evidence]."
15
+
16
+ If the bug is in a dependency/config/data (not our code), say so before proceeding.
17
+
18
+ ---
19
+
20
+ ## Phase 1: Write a Failing Test
21
+
22
+ Write a test that reproduces the bug. It **MUST fail** with current code.
23
+
24
+ Add a comment: `// Regression: <bug description> — <expected> vs <actual>`
25
+
26
+ Run it:
27
+ ```
28
+ bash scripts/build-test.sh --filter "<test name>"
29
+ ```
30
+
31
+ - **FAILS** → reproduced. Continue.
32
+ - **PASSES** → hypothesis may be wrong. Ask: "Test passes — need different repro steps or environment details."
33
+
34
+ ---
35
+
36
+ ## Phase 2: Fix
37
+
38
+ Make the **minimal change** needed.
39
+
40
+ | Do | Don't |
41
+ |----|-------|
42
+ | Fix the specific bug | Refactor surrounding code |
43
+ | Add a guard for the edge case | Rewrite the function |
44
+ | Explain what and why before editing | Silently change code |
45
+
46
+ ---
47
+
48
+ ## Phase 3: Verify
49
+
50
+ 1. Run the bug test: `bash scripts/build-test.sh --filter "<test name>"` → must PASS.
51
+ 2. Run full suite: `bash scripts/build-test.sh` → no regressions.
52
+
53
+ If other tests break → the fix caused a regression. Investigate. Do NOT weaken existing tests.
54
+
55
+ ---
56
+
57
+ ## Phase 4: Root Cause Analysis
58
+
59
+ After fixing, document:
60
+
61
+ ```
62
+ Symptom: <what the user saw>
63
+ Root cause: <why it happened>
64
+ Gap: <why not caught earlier — missing test? wrong assumption? missing spec?>
65
+ Prevention: <suggest one: type constraint, validation, lint rule, spec update, or test plan update>
66
+ ```
67
+
68
+ This is non-optional for serious bugs. For trivial bugs, the fix summary is enough.
69
+
70
+ ---
71
+
72
+ ## Phase 5: Summary
73
+
74
+ ```
75
+ Bug: <description>
76
+ Hypothesis: <what you predicted> → <confirmed or actual cause>
77
+ Test added: <file>:<test name>
78
+ Fix: <file>:<lines> — <what changed>
79
+ Root cause: <1 sentence>
80
+ Prevention: <suggestion>
81
+ Full suite: All passing ✓
82
+ ```
83
+
84
+ If the bug reveals an undocumented edge case: "Consider updating the spec at docs/specs/<feature>.md."
85
+
86
+ ## Multiple Bugs
87
+
88
+ If `$ARGUMENTS` describes multiple bugs: triage by severity, fix one at a time, commit each separately.
89
+
90
+ ## Rules
91
+ 1. **Investigate before coding.** Hypothesis before test. Evidence before fix.
92
+ 2. **Minimal fix.** One bug, one change. Don't improve the neighborhood.
93
+ 3. **Never weaken tests.** If existing tests break, the fix is wrong.
94
+ 4. **Ask before touching production code** if unsure.
95
+ 5. **One bug, one commit.** Each fix independently revertable.
@@ -0,0 +1,141 @@
1
+ Think hard about this task. A bad plan wastes days, a good plan saves weeks.
2
+
3
+ ## Determine mode
4
+
5
+ Examine `$ARGUMENTS`:
6
+
7
+ - **Mode A — Spec exists:** Argument is a file path → read spec, generate test plan.
8
+ - **Mode B — No spec:** Argument is a description → create spec + test plan.
9
+ - **Mode C — Update:** Argument mentions "update" or existing path → read existing, update surgically.
10
+
11
+ ---
12
+
13
+ ## Phase 0: Codebase Awareness
14
+
15
+ Before writing anything:
16
+ 1. Scan existing code in the feature area — what files, functions, types already exist?
17
+ 2. Check `docs/specs/` — is there already a spec for this or a related feature?
18
+ 3. Check `docs/test-plans/` — any overlap with existing plans?
19
+ 4. Identify project patterns — test framework, naming conventions, directory structure.
20
+
21
+ Don't plan in a vacuum. A spec that ignores existing code creates conflicts.
22
+
23
+ ---
24
+
25
+ ## Phase 1: Draft the Spec (Mode B only)
26
+
27
+ Create at `docs/specs/<feature-name>.md`. Include these sections (skip any that don't apply):
28
+
29
+ - **Overview** — what, why, who. 2-3 sentences.
30
+ - **Data Model** — entities, attributes, relationships (table format)
31
+ - **Use Cases** — UC-NNN with actor, preconditions, flow, postconditions, error cases. Each use case contains:
32
+ - **FR-NNN** (Functional Requirements) — specific behaviors the system must exhibit
33
+ - **SC-NNN** (Success Criteria) — measurable non-functional targets (performance, limits)
34
+ - **State Machine** — states and valid transitions (if applicable)
35
+ - **Settings/Configuration** — configurable behavior and defaults
36
+ - **Constraints & Invariants** — rules that must ALWAYS hold
37
+ - **Error Handling** — how errors surface to users and are logged
38
+ - **Security Considerations** — auth, authorization, data sensitivity
39
+
40
+ Match depth to complexity. Simple CRUD = 1 paragraph overview + 3 use cases. Complex auth system = full template. Don't generate filler for sections that don't apply.
41
+
42
+ Show the draft to the user. Wait for confirmation before generating the test plan.
43
+
44
+ ---
45
+
46
+ ## Phase 2: Clarify Ambiguities
47
+
48
+ Before generating the test plan, scan the spec for gaps. A test plan built on a vague spec produces vague tests.
49
+
50
+ | Lens | What to look for |
51
+ |------|-----------------|
52
+ | **Behavioral gaps** | Missing user actions, undefined system responses, incomplete flows |
53
+ | **Data & persistence** | Undefined entities, missing relationships, unclear storage/lifecycle |
54
+ | **Auth & access** | Who can do what is unclear, missing role definitions |
55
+ | **Non-functional** | Vague adjectives without metrics ("fast", "secure", "scalable") — add SC-NNN with numbers |
56
+ | **Integration** | Third-party API assumptions, unstated dependencies, SLA gaps |
57
+ | **Concurrency & edge cases** | Multi-user scenarios, boundary conditions, error paths not addressed |
58
+
59
+ Identify the top 3-5 ambiguities (most impactful first). For each, ask the user a targeted question with 2-4 concrete options and a recommendation.
60
+
61
+ If the spec is clear and complete, 0 questions is valid. Don't manufacture ambiguity.
62
+
63
+ Write clarifications back into the spec under `## Clarifications — <date>`.
64
+ Then proceed to test plan generation.
65
+
66
+ ---
67
+
68
+ ## Phase 3: Generate the Test Plan
69
+
70
+ Read the spec. For each section, extract:
71
+ 1. Use cases → at least 1 test (happy path) + 1 test (error path) each
72
+ 2. State transitions → test valid AND invalid transitions
73
+ 3. Constraints → test they hold under edge conditions
74
+ 4. Settings → test default AND non-default values
75
+ 5. Cross-cutting concerns (auth, validation) → integration-level tests
76
+
77
+ Prioritize by risk: data loss/security = P0, error handling = P1, cosmetic/rare = P2.
78
+
79
+ ### Output
80
+
81
+ Write to `docs/test-plans/<feature-name>.md`:
82
+
83
+ ```markdown
84
+ # Test Plan: <Feature Name>
85
+
86
+ **Spec:** docs/specs/<feature-name>.md
87
+ **Generated:** <$(date +%Y-%m-%d)>
88
+
89
+ ## Test Cases
90
+
91
+ | ID | Priority | Type | UC | FR/SC | Description | Expected |
92
+ |----|----------|------|----|-------|-------------|----------|
93
+ | TC-001 | P0 | unit | UC-001 | FR-001 | Valid login returns token | 200 + JWT |
94
+ | TC-002 | P0 | unit | UC-001 | FR-002 | Wrong password returns 401 | 401 + error msg |
95
+
96
+ ## Implementation Order
97
+ 1. TC-001, TC-002 (no dependencies — start here)
98
+ 2. TC-003+ (depend on setup from earlier tests)
99
+
100
+ ## Coverage Notes
101
+ - Highest risk areas: ...
102
+ - Existing code needing modification: [file paths]
103
+ ```
104
+
105
+ **Priority:** P0 = must have (blocks release), P1 = should have, P2 = nice to have.
106
+ **Type:** `unit`, `integration`, `e2e`, `snapshot`, `performance`
107
+
108
+ ### What NOT to produce
109
+ - "Test that the feature works" — too vague
110
+ - 50+ test cases for simple CRUD — over-testing
111
+ - Testing implementation details — brittle
112
+ - Duplicate tests verifying same behavior
113
+
114
+ ---
115
+
116
+ ## Phase 4: Summary
117
+
118
+ Show: test case counts (P0/P1/P2), implementation order, estimated scope.
119
+ Next steps: "Use `/test` after each chunk. For complex plans, run `/challenge` first."
120
+
121
+ ## Naming Convention
122
+
123
+ Spec and test plan MUST share the same filename:
124
+ ```
125
+ docs/specs/<feature-name>.md ← kebab-case, 2-3 words
126
+ docs/test-plans/<feature-name>.md ← same name
127
+ ```
128
+ - Use feature name, not module name: `user-auth.md` not `AuthService.md`
129
+ - No prefix/suffix: `user-auth.md` not `spec-user-auth.md`
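
The pairing rule can be verified with a short loop. A sketch in a throwaway directory (the demo filenames are illustrative):

```shell
# Sketch: flag any spec without a same-named test plan.
cd "$(mktemp -d)"
mkdir -p docs/specs docs/test-plans
touch docs/specs/user-auth.md docs/test-plans/user-auth.md docs/specs/billing.md
for spec in docs/specs/*.md; do
  plan="docs/test-plans/$(basename "$spec")"
  [ -f "$plan" ] || echo "missing test plan for $(basename "$spec")"
done
# prints: missing test plan for billing.md
```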
130
+
131
+ **Requirement IDs** — sequential per spec:
132
+ - `UC-001` Use Case, `FR-001` Functional Requirement, `SC-001` Success Criteria, `TC-001` Test Case
133
+ - Every TC must reference at least one FR or SC for traceability.
134
+
135
+ ## Rules
136
+ 1. **Spec-first.** Test plan derives from spec, never from code.
137
+ 2. **Codebase-aware.** Don't plan features that already exist.
138
+ 3. **Actionable.** Every test case must be unambiguous enough to implement directly.
139
+ 4. **Proportional.** Simple feature = simple plan. Don't over-engineer CRUD.
140
+ 5. **Traceable.** Every test links to a use case. No orphan tests.
141
+ 6. **Consistent names.** Spec and test plan always share the same filename.
@@ -0,0 +1,109 @@
1
+ Think hard about this review. You are the last gate before code reaches the main branch.
2
+
3
+ ## Phase 0: Understand Intent
4
+
5
+ 1. Read commit messages:
6
+ ```
7
+ BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||'); BASE="${BASE:-main}"
8
+ git log --oneline "$BASE"...HEAD
9
+ ```
10
+ 2. Check for spec in `docs/specs/` and test plan in `docs/test-plans/` — review against INTENT.
11
+ 3. Read the diff: `git diff "$BASE"...HEAD`
12
+
13
+ If `$ARGUMENTS` provided → scope to those files only.
14
+ If diff > 500 lines → review file-by-file, prioritize by smart focus below.
15
+
16
+ ---
17
+
18
+ ## Phase 1: Smart Focus
19
+
20
+ Auto-detect primary focus from diff content:
21
+
22
+ | Diff contains | Focus heavily on |
23
+ |--------------|-----------------|
24
+ | auth, login, token, session, password, JWT | Security — full depth |
25
+ | SQL, query, database, migration | Injection + data integrity |
26
+ | API, endpoint, route, controller, handler | Input validation + error handling |
27
+ | .env, config, secret, key, credential | Secret exposure |
28
+ | Test files only | Test quality (skip security deep-dive) |
29
+ | Docs/comments only | Accuracy only (minimal review) |
30
+ | Payment, billing, transaction | Correctness + idempotency |
31
+
32
+ Spend 60% of analysis on the primary focus. Cover all categories, but proportionally.
33
+
34
+ ---
35
+
36
+ ## Phase 2: Checklist
37
+
38
+ ### Security (Critical)
39
+ - **Injection:** Search diff for string concatenation in SQL/shell/HTML. Look for `${var}` in queries, `.innerHTML`, template literals in SQL. Flag any user input reaching a query without parameterization.
40
+ - **Auth/Authz:** New endpoint → has auth middleware? Can user A access user B's data? ID in URL without ownership check?
41
+ - **Secrets:** Hardcoded strings matching `sk-`, `ghp_`, `Bearer `, long base64. New env vars committed?
42
+ - **Error exposure:** Catch blocks sending raw errors to users? Stack traces, file paths, DB schemas in responses?
43
+ - **Dependencies:** New packages — maintained? >1000 weekly downloads? Known CVEs?
44
+
45
+ ### Correctness (High)
46
+ - **Logic vs intent:** Does the code do what commits/spec claim? "Add validation" but code just logs?
47
+ - **Edge cases:** null, empty, 0, negative, MAX_INT, unicode, very long strings — handled?
48
+ - **Error handling:** For each try/catch — error logged with context? User shown safe message? Resources cleaned in finally?
49
+ - **Concurrency:** Shared state without locks? Read-then-write without atomicity? Non-atomic DB updates?
50
+ - **Null safety:** Optionals used without guards? `object!.property` without nil check?
51
+
52
+ ### Spec-Test Alignment (Medium)
53
+ - Source changed but no spec update in `docs/specs/`? → flag
54
+ - Source changed but no test update? → flag
55
+ - Spec changed but tests not updated? → flag
56
+ - Code removed but dead tests remain? → flag
57
+ - Spec contains vague requirements without metrics ("fast", "secure", "easy", "scalable")? → flag with suggestion to add SC-NNN with concrete numbers
58
+
59
+ ### Code Quality (Medium)
60
+ - Dead code: removed functions still imported elsewhere?
61
+ - Obvious duplication: copy-pasted blocks that should be shared?
62
+ - Naming: consistent with codebase? Descriptive?
63
+ - Complexity: functions > 40 lines or > 3 nesting levels?
64
+
65
+ ### Performance (Low)
66
+ - Flag N+1 queries, unbounded collections, redundant computation in loops.
67
+
68
+ ---
69
+
70
+ ## Phase 3: Output
71
+
72
+ ```markdown
73
+ ## Code Review: <branch or description>
74
+
75
+ **Scope:** X files, +Y/-Z lines
76
+ **Focus:** <auto-detected>
77
+ **Verdict:** APPROVE / REQUEST CHANGES / NEEDS DISCUSSION
78
+
79
+ ### Critical Issues
80
+ **[C-1] file.ts:42 — SQL injection via unsanitized input**
81
+ `req.query.search` concatenated into SQL. Use parameterized query.
82
+
83
+ ### High Priority
84
+ **[H-1] file.ts:87 — Empty catch swallows DB errors**
85
+ Users see blank screen. Log with context, return safe error.
86
+
87
+ ### Medium Priority
88
+ **[M-1] Spec-test gap — rate limiting not in spec**
89
+ New logic at auth-service.ts:45-62 undocumented.
90
+
91
+ ### Low Priority
92
+ **[L-1] Consider caching config lookup (called 3x per request)**
93
+
94
+ ### Positive Notes
95
+ (At least 1 — reinforce good patterns)
96
+ - Clean middleware separation in auth-middleware.ts
97
+ - Thorough edge case tests
98
+
99
+ ### Summary
100
+ <1-2 sentences: quality + clear next action>
101
+ ```
102
+
103
+ ## Rules
104
+ 1. **Never auto-fix.** Report only.
105
+ 2. **Specific.** Every finding has `file:line` and concrete description.
106
+ 3. **Severity matches impact.** Style nits = Low. Injection = Critical.
107
+ 4. **Positive notes mandatory.** Reviews aren't just about problems.
108
+ 5. **Review against intent.** Not just "clean code?" but "does this match spec/commits?"
109
+ 6. **Proportional.** 5-line doc change ≠ 500-line auth rewrite.