claude-devkit-cli 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/devkit.js +3 -0
- package/package.json +36 -0
- package/src/cli.js +72 -0
- package/src/commands/check.js +33 -0
- package/src/commands/diff.js +90 -0
- package/src/commands/init.js +232 -0
- package/src/commands/list.js +50 -0
- package/src/commands/remove.js +74 -0
- package/src/commands/upgrade.js +108 -0
- package/src/lib/detector.js +93 -0
- package/src/lib/hasher.js +21 -0
- package/src/lib/installer.js +175 -0
- package/src/lib/logger.js +16 -0
- package/src/lib/manifest.js +79 -0
- package/templates/.claude/CLAUDE.md +74 -0
- package/templates/.claude/commands/challenge.md +210 -0
- package/templates/.claude/commands/commit.md +97 -0
- package/templates/.claude/commands/fix.md +95 -0
- package/templates/.claude/commands/plan.md +141 -0
- package/templates/.claude/commands/review.md +109 -0
- package/templates/.claude/commands/test.md +99 -0
- package/templates/.claude/hooks/comment-guard.js +114 -0
- package/templates/.claude/hooks/file-guard.js +120 -0
- package/templates/.claude/hooks/glob-guard.js +96 -0
- package/templates/.claude/hooks/path-guard.sh +73 -0
- package/templates/.claude/hooks/self-review.sh +29 -0
- package/templates/.claude/hooks/sensitive-guard.sh +214 -0
- package/templates/.claude/settings.json +68 -0
- package/templates/docs/WORKFLOW.md +231 -0
- package/templates/docs/specs/.gitkeep +0 -0
- package/templates/docs/test-plans/.gitkeep +0 -0
- package/templates/scripts/build-test.sh +260 -0
@@ -0,0 +1,210 @@ package/templates/.claude/commands/challenge.md
Think hard. This is adversarial plan review — you are the coordinator who sends hostile reviewers to DESTROY a plan, then adjudicates their findings.

## Input

Target: $ARGUMENTS

If argument is a file path → use that.
If argument is a feature name → search `docs/test-plans/` and `docs/specs/` for matches.
If no argument → list recent files in both dirs, ask user which to challenge.
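The three resolution rules above can be sketched in shell. This is a hedged illustration, not part of the command: `resolve_target` is a made-up name, and only the `docs/test-plans/` and `docs/specs/` paths come from this workflow's conventions.

```bash
# Hypothetical sketch of the target-resolution rules; resolve_target is
# an illustrative name.
resolve_target() {
  if [ -f "$1" ]; then
    printf '%s\n' "$1"                 # rule 1: explicit file path wins
  elif [ -n "$1" ]; then
    # rule 2: feature name; first match in either planning directory
    ls docs/test-plans/*"$1"* docs/specs/*"$1"* 2>/dev/null | head -n 1
  else
    # rule 3: no argument; list recent candidates for the user to pick
    ls -t docs/test-plans docs/specs 2>/dev/null
  fi
}
```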

## Phase 1: Read and Map

Read the ENTIRE target file. Also read related files:
- If target is a test plan → also read the corresponding spec in `docs/specs/`
- If target is a spec → also read the corresponding test plan in `docs/test-plans/`

Map the plan's attack surface:
- Decisions made (and what was rejected)
- Assumptions (stated AND implied)
- Dependencies (external services, APIs, libraries, infra)
- Scope boundaries (in/out/suspiciously unmentioned)
- Risk acknowledgments (mentioned vs. conspicuously absent)
- Spec↔plan consistency (use cases without test cases? contradictions?)

Collect all file paths the reviewers will need to read.

## Phase 2: Scale Reviewers

Assess plan complexity and select which lenses to deploy:

| Complexity Signal | Reviewers | Lenses |
|-------------------|-----------|--------|
| Simple (1 spec section, <20 test cases, no auth/data) | 2 | Assumptions + Scope |
| Standard (multiple sections, auth or data involved) | 3 | + Security |
| Complex (multiple integrations, concurrency, migrations, 6+ phases) | 4 | + Failure Modes |

When in doubt, use 3 reviewers. 4 is for genuinely complex plans.
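The table above, including its "when in doubt" default, can be sketched as a lookup. A hedged illustration only; `select_lenses` and the lens tokens are made up for this sketch.

```bash
# Maps the assessed complexity tier to the lens set from the table above;
# anything unrecognized falls back to the 3-reviewer default.
select_lenses() {
  case "$1" in
    simple)     echo "assumptions scope" ;;
    complex)    echo "assumptions scope security failure-modes" ;;
    standard|*) echo "assumptions scope security" ;;
  esac
}
```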

## Phase 3: Spawn Parallel Reviewers

Launch reviewers simultaneously using the Agent tool. Each reviewer is an independent subagent that reads the plan files directly and returns findings.

**CRITICAL:** Each reviewer prompt MUST include:
1. The file paths to read (so they can access the plan directly)
2. Their specific adversarial persona and lens
3. The exact output format (so you can parse findings consistently)
4. The rules of engagement

### Reviewer Prompts

For each selected lens, spawn an agent with this structure:

```
You are a hostile reviewer. Your job is to DESTROY this plan by finding every flaw through the {LENS_NAME} lens.

Read these files first:
{LIST OF FILE PATHS}

--- YOUR LENS ---

{LENS-SPECIFIC INSTRUCTIONS — see below}

--- OUTPUT FORMAT ---

For EACH flaw found, output exactly:

### Finding: <title>
- **Severity:** Critical | High | Medium
- **Location:** <exact section or heading in the plan>
- **Flaw:** <what's wrong — be specific>
- **Evidence:** "<direct quote from the plan>"
- **Failure scenario:** <step-by-step: how this causes a real problem in production>
- **Root cause:** <why does this flaw exist? Missing requirement? Wrong assumption?>
- **Suggested fix:** <specific, actionable — not just "fix it">

--- RULES ---

- 3-7 findings per lens. Quality over quantity.
- Be HOSTILE. No praise. No "overall looks good."
- Be SPECIFIC. Cite exact sections. Quote the plan.
- Be CONCRETE. Failure scenarios must be step-by-step, not "could be a problem."
- Skip trivial issues (naming, formatting, style).
- If the plan is solid for your lens, 1-2 findings is honest. Don't manufacture problems.
```

### Lens-Specific Instructions

**Security Adversary:**
```
You are an attacker with knowledge of the tech stack and access to the public API.

Examine the plan for:
- Authentication/authorization bypass: Can auth be skipped? Can user A access user B's data? Are role checks at every layer?
- Injection vectors: Where does user input enter? SQL, shell, HTML, template, log injection? Parameterized queries?
- Data exposure: What leaks in error messages, logs, API responses? Stack traces? Internal paths? DB schemas?
- Cryptography: Password hashing (bcrypt/argon2, not MD5/SHA)? Secrets in env vars not code? TLS?
- Supply chain: New dependencies? Maintained? Known CVEs?
- OWASP Top 10 (2021): Broken Access Control, Crypto Failures, Injection, Insecure Design, Security Misconfiguration, Vulnerable Components, Identity Failures, Integrity Failures, Logging Failures, SSRF
```

**Failure Mode Analyst:**
```
You believe Murphy's Law: everything that can go wrong, will — simultaneously, at 3 AM, during peak traffic.

Examine the plan for:
- Partial failures: What if step 3 of 5 fails? Rollback? Atomic writes? Inconsistent state?
- Concurrency: Race conditions? Two users editing same resource? Shared mutable state? Deadlocks?
- Cascading failures: Service A down → B also fails? Circuit-breaking? Graceful degradation?
- Data integrity: Data loss? Corruption? Duplication? DB-level constraints or app-only validation?
- Recovery: How to recover from each failure? Reversible migrations? Backup restoration time?
- Deployment: What breaks during deploy? Rollback plan? Migration failures?
- Idempotency: Retried requests duplicate data? Double-charge? Double-email?
- Observability: How do you KNOW something failed? Logging? Monitoring? Alerts? Or angry users?
```

**Assumption Destroyer:**
```
You are a radical skeptic. "It should work" is not evidence. "We assume X" means X is unverified.

Examine the plan for:
- Unverified claims: "The API returns X" — tested? "The library supports Y" — checked docs?
- Scale assumptions: Expected load? Works at 10x? 100x? O(n²) hiding in "iterate all items"?
- Environment gaps: Same behavior in dev/staging/prod? Different OS? Docker vs bare metal?
- Integration risk: Third-party SLA? Rate limits? Their service down → your plan?
- Data assumptions: Always clean? Unicode? Emoji? Null bytes? 10MB payloads? Empty strings?
- User behavior: Will users actually do this? What if they click 50 times? Upload 2GB? Use mobile?
- Timing: "A before B" — always? What if B first? Implicit ordering dependencies?
- Hidden dependencies: Services, configs, env vars, or manual steps that must exist but aren't documented?
```

**Scope & Complexity Critic (YAGNI Enforcer):**
```
You believe the best code is no code. The best feature is the one you didn't build.

Examine the plan for:
- Over-engineering: Solving problems that don't exist yet? "In case we need it later" = YAGNI.
- Premature abstraction: Generic framework for 1 use case? Plugin system nobody asked for?
- Missing MVP: What's the absolute minimum viable delivery? Can 40% be deferred?
- Complexity vs value: Distributed system for 5 users? Proportional?
- Gold plating: Nice-to-have mixed with must-have? Can you ship without the nice-to-haves?
- Simpler alternative: Boring 10-line solution vs clever 500-line solution?
- Test burden: Test cases harder to maintain than the feature itself?
```

## Phase 4: Collect and Consolidate

After all reviewers complete:

1. **Collect** all findings from all reviewers
2. **Deduplicate** — if two lenses found the same root issue, merge into one finding noting both lenses
3. **Rate severity** using Likelihood × Impact:

| | Low Impact | Medium Impact | High Impact |
|---|-----------|---------------|-------------|
| **Likely** | Medium | High | Critical |
| **Possible** | Low | Medium | High |
| **Unlikely** | Low | Low | Medium |

4. **Sort** by severity: Critical → High → Medium → Low
5. **Cap** at 15 findings: keep all Critical, top High by specificity, note how many Medium were dropped
6. **Cross-reference check** (you, not reviewers): If both spec and test plan exist, flag any use cases in spec without test cases, and any test cases that contradict the spec
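The Likelihood × Impact matrix in step 3 is a straight lookup; a minimal sketch (`rate_severity` is an illustrative name, not part of the command):

```bash
# Likelihood × Impact lookup mirroring the matrix in step 3 above.
rate_severity() {  # $1 = likely|possible|unlikely, $2 = low|medium|high
  case "$1:$2" in
    likely:high)                              echo Critical ;;
    likely:medium|possible:high)              echo High ;;
    likely:low|possible:medium|unlikely:high) echo Medium ;;
    *)                                        echo Low ;;
  esac
}
```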

## Phase 5: Adjudicate

For each finding, YOU (the coordinator) evaluate and propose a disposition:

| Disposition | When to use |
|-------------|-------------|
| **Accept** | Valid flaw. Plan should be updated. |
| **Reject** | False positive, acceptable risk, or already handled elsewhere. |

Include 1-sentence rationale for each disposition. Be honest — don't reject valid findings to be nice, and don't accept trivial findings to pad the list.

## Phase 6: Present to User

Show adjudicated findings using the reviewer output format plus Disposition and Rationale fields.

Then ask: "How would you like to proceed?"
1. **"Apply all accepted"** — update the plan with all accepted fixes
2. **"Review each"** — walk through one by one, accept/reject/modify each

If user picks "Review each": for each finding, ask "Accept / Reject / Modify fix?"

## Phase 7: Apply

For each accepted finding:
1. Edit the target file at the exact location cited
2. Apply the fix (or user's modified version)
3. Surgical edits only — do NOT rewrite surrounding sections

After all edits, show summary:
```
Challenge complete.
Reviewers: N lenses
Findings: X total → Y accepted, Z rejected
Severity: N Critical, N High, N Medium
Files modified: [list]
Next: /test to implement, or /plan to regenerate if major changes.
```

If a reviewer returns > 7 findings, take only top 7 by severity. If a reviewer fails, proceed with remaining reviewers.

## Rules — Non-Negotiable

1. **Spawn reviewers in parallel.** Don't run lenses in your own context.
2. **Reviewers read files directly.** Pass paths, not content.
3. **Be hostile.** No praise. Not in reviewers, not in adjudication.
4. **Quote the plan.** Every finding needs a direct quote in Evidence.
5. **Don't manufacture findings.** 3 honest findings > 15 padded ones.
6. **Skip style/formatting.** Substance only: logic, security, assumptions, scope.
@@ -0,0 +1,97 @@ package/templates/.claude/commands/commit.md
EXECUTE, not EXPLORE. Follow these steps exactly. Minimize tool calls.

## Step 1 — Analyze (single compound command)

```bash
echo "=== STATUS ===" && \
git status --short 2>/dev/null && \
echo "=== DIFF STAT ===" && \
git diff --stat 2>/dev/null && \
git diff --cached --stat 2>/dev/null && \
echo "=== METRICS ===" && \
{ git diff --shortstat 2>/dev/null; git diff --cached --shortstat 2>/dev/null; } && \
echo "=== SECRETS ===" && \
(git diff 2>/dev/null; git diff --cached 2>/dev/null) | grep -ciE "(api[_-]?key|token|password|secret|private[_-]?key|credential|auth[_-]?token)" || echo "0" && \
echo "=== DEBUG ===" && \
(git diff 2>/dev/null; git diff --cached 2>/dev/null) | grep -ciE "(console\.log|debugger|print\(|TODO:.*remove|HACK:|FIXME:.*temp|binding\.pry|var_dump)" || echo "0"
```

---

## Step 2 — Safety checks

**Secrets (hard block):** If count > 0, show matched lines and STOP. Do not commit.

**Debug code (soft warn):** If count > 0, show matched lines. Proceed only after user confirms they're intentional.

**Large diff:** If > 10 files or > 300 lines, note: "Large commit — consider splitting for easier review." Continue unless user says to split.

---

## Step 3 — Stage files

Prefer staging specific files by name. Do NOT use `git add -A`.

Never stage: `.env`, credentials, build artifacts, generated files, binaries > 1MB.
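A self-contained demo of the staging rule, in a throwaway repo. The file names are made up for illustration; the point is that naming files keeps untracked junk like `.env` out of the index.

```bash
# Demo in a throwaway repo: stage by name so .env never enters the commit.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
mkdir -p src
printf 'code\n' > src/feature.js
printf 'SECRET=1\n' > .env            # must never be staged
git add src/feature.js                # intentional: name the file, no -A
git status --short                    # .env stays untracked (??)
```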

---

## Step 4 — Generate commit message

**Format:** `type(scope): description`

| Type | When |
|------|------|
| `feat` | New feature |
| `fix` | Bug fix |
| `docs` | Documentation only |
| `test` | Tests only |
| `refactor` | Code change, no behavior change |
| `chore` | Maintenance, deps, config |
| `perf` | Performance improvement |
| `build` | Build system |
| `ci` | CI/CD changes |

**Breaking changes:** If diff removes/renames a public function, export, or API endpoint → use `feat!` or `fix!` type, or add `BREAKING CHANGE:` footer.

**Rules:** Under 72 chars. Imperative tense ("add" not "added"). No period. WHAT+WHY, not HOW.

**Bad examples — avoid:**
- ❌ `Updated some files` — not descriptive
- ❌ `feat(auth): added login validation using bcrypt with salt rounds of 12` — too long, describes HOW
- ❌ `Fix bug` — not specific
- ❌ `WIP` — never commit unfinished work
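The mechanical half of the rules above (length, trailing period) can be checked in shell; imperative tense and WHAT+WHY still need judgment. A sketch only: `check_msg` is a hypothetical helper, not part of this command.

```bash
# Checks only the mechanical rules: <= 72 chars, no trailing period.
check_msg() {
  case "$1" in *.) echo "drop the trailing period"; return 1 ;; esac
  [ ${#1} -le 72 ] || { echo "subject over 72 chars"; return 1; }
  echo ok
}
```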

---

## Step 5 — Commit

```bash
git commit -m "$(cat <<'EOF'
type(scope): description

Co-Authored-By: Claude <noreply@anthropic.com>
EOF
)"
```

**Do NOT push** unless user explicitly asks.

---

## Output

```
staged: N files (+X/-Y lines)
checks: secrets ✓ | debug ✓
commit: abc1234 type(scope): description
pushed: no
```

Keep under 5 lines. No explanations.

## Rules
1. **Specific files, not `git add -A`.** Stage intentionally.
2. **Secrets = hard block.** No exceptions.
3. **Never push without explicit request.**
4. **One concern per commit.** Mixed features → suggest separate commits.
@@ -0,0 +1,95 @@ package/templates/.claude/commands/fix.md
Test-first bug fix. Investigate → Reproduce → Fix → Verify → Learn.

Bug: $ARGUMENTS

---

## Phase 0: Investigate

Don't jump to code. Understand the bug first:

1. **Parse the report.** Symptom? Expected vs actual? Repro steps?
2. **Locate the code.** Grep for keywords from the bug (error messages, function names).
3. **Check history.** `git log --oneline -5 -- <file>` and `git blame -L <range> <file>` — who changed this last and why?
4. **Form a hypothesis:** "I believe the bug is caused by [X] in [file:function] because [evidence]."

If the bug is in a dependency/config/data (not our code), say so before proceeding.

---

## Phase 1: Write a Failing Test

Write a test that reproduces the bug. It **MUST fail** with current code.

Add a comment: `// Regression: <bug description> — <expected> vs <actual>`

Run it:
```
bash scripts/build-test.sh --filter "<test name>"
```

- **FAILS** → reproduced. Continue.
- **PASSES** → hypothesis may be wrong. Ask: "Test passes — need different repro steps or environment details."
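The fail-first gate above can be sketched as a wrapper around the workflow's `scripts/build-test.sh` runner. `run_bug_test` is an illustrative name, and the wording of the two messages is an assumption, not prescribed output.

```bash
# Phase 1 gate: a bug test that PASSES before the fix means the bug was
# not reproduced, so the hypothesis needs revisiting.
run_bug_test() {
  if bash scripts/build-test.sh --filter "$1" >/dev/null 2>&1; then
    echo "PASSES: hypothesis may be wrong; ask for better repro steps"
  else
    echo "FAILS: bug reproduced; continue to the fix"
  fi
}
```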

---

## Phase 2: Fix

Make the **minimal change** needed.

| Do | Don't |
|----|-------|
| Fix the specific bug | Refactor surrounding code |
| Add a guard for the edge case | Rewrite the function |
| Explain what and why before editing | Silently change code |

---

## Phase 3: Verify

1. Run the bug test: `bash scripts/build-test.sh --filter "<test name>"` → must PASS.
2. Run full suite: `bash scripts/build-test.sh` → no regressions.

If other tests break → the fix caused a regression. Investigate. Do NOT weaken existing tests.

---

## Phase 4: Root Cause Analysis

After fixing, document:

```
Symptom: <what the user saw>
Root cause: <why it happened>
Gap: <why not caught earlier — missing test? wrong assumption? missing spec?>
Prevention: <suggest one: type constraint, validation, lint rule, spec update, or test plan update>
```

This is non-optional for serious bugs. For trivial bugs, the fix summary is enough.

---

## Phase 5: Summary

```
Bug: <description>
Hypothesis: <what you predicted> → <confirmed or actual cause>
Test added: <file>:<test name>
Fix: <file>:<lines> — <what changed>
Root cause: <1 sentence>
Prevention: <suggestion>
Full suite: All passing ✓
```

If the bug reveals an undocumented edge case: "Consider updating the spec at docs/specs/<feature>.md."

## Multiple Bugs

If `$ARGUMENTS` describes multiple bugs: triage by severity, fix one at a time, commit each separately.

## Rules
1. **Investigate before coding.** Hypothesis before test. Evidence before fix.
2. **Minimal fix.** One bug, one change. Don't improve the neighborhood.
3. **Never weaken tests.** If existing tests break, the fix is wrong.
4. **Ask before touching production code** if unsure.
5. **One bug, one commit.** Each fix independently revertible.
@@ -0,0 +1,141 @@ package/templates/.claude/commands/plan.md
Think hard about this task. A bad plan wastes days, a good plan saves weeks.

## Determine mode

Examine `$ARGUMENTS`:

- **Mode A — Spec exists:** Argument is a file path → read spec, generate test plan.
- **Mode B — No spec:** Argument is a description → create spec + test plan.
- **Mode C — Update:** Argument mentions "update" or existing path → read existing, update surgically.

---

## Phase 0: Codebase Awareness

Before writing anything:
1. Scan existing code in the feature area — what files, functions, types already exist?
2. Check `docs/specs/` — is there already a spec for this or a related feature?
3. Check `docs/test-plans/` — any overlap with existing plans?
4. Identify project patterns — test framework, naming conventions, directory structure.

Don't plan in a vacuum. A spec that ignores existing code creates conflicts.

---

## Phase 1: Draft the Spec (Mode B only)

Create at `docs/specs/<feature-name>.md`. Include these sections (skip any that don't apply):

- **Overview** — what, why, who. 2-3 sentences.
- **Data Model** — entities, attributes, relationships (table format)
- **Use Cases** — UC-NNN with actor, preconditions, flow, postconditions, error cases. Each use case contains:
  - **FR-NNN** (Functional Requirements) — specific behaviors the system must exhibit
  - **SC-NNN** (Success Criteria) — measurable non-functional targets (performance, limits)
- **State Machine** — states and valid transitions (if applicable)
- **Settings/Configuration** — configurable behavior and defaults
- **Constraints & Invariants** — rules that must ALWAYS hold
- **Error Handling** — how errors surface to users and are logged
- **Security Considerations** — auth, authorization, data sensitivity

Match depth to complexity. Simple CRUD = 1 paragraph overview + 3 use cases. Complex auth system = full template. Don't generate filler for sections that don't apply.

Show the draft to the user. Wait for confirmation before generating the test plan.

---

## Phase 2: Clarify Ambiguities

Before generating the test plan, scan the spec for gaps. A test plan built on a vague spec produces vague tests.

| Lens | What to look for |
|------|-----------------|
| **Behavioral gaps** | Missing user actions, undefined system responses, incomplete flows |
| **Data & persistence** | Undefined entities, missing relationships, unclear storage/lifecycle |
| **Auth & access** | Who can do what is unclear, missing role definitions |
| **Non-functional** | Vague adjectives without metrics ("fast", "secure", "scalable") — add SC-NNN with numbers |
| **Integration** | Third-party API assumptions, unstated dependencies, SLA gaps |
| **Concurrency & edge cases** | Multi-user scenarios, boundary conditions, error paths not addressed |

Identify the top 3-5 ambiguities (most impactful first). For each, ask the user a targeted question with 2-4 concrete options and a recommendation.

If the spec is clear and complete, 0 questions is valid. Don't manufacture ambiguity.

Write clarifications back into the spec under `## Clarifications — <date>`.
Then proceed to test plan generation.

---

## Phase 3: Generate the Test Plan

Read the spec. For each section, extract:
1. Use cases → at least 1 test (happy path) + 1 test (error path) each
2. State transitions → test valid AND invalid transitions
3. Constraints → test they hold under edge conditions
4. Settings → test default AND non-default values
5. Cross-cutting concerns (auth, validation) → integration-level tests

Prioritize by risk: data loss/security = P0, error handling = P1, cosmetic/rare = P2.

### Output

Write to `docs/test-plans/<feature-name>.md`:

```markdown
# Test Plan: <Feature Name>

**Spec:** docs/specs/<feature-name>.md
**Generated:** <$(date +%Y-%m-%d)>

## Test Cases

| ID | Priority | Type | UC | FR/SC | Description | Expected |
|----|----------|------|----|-------|-------------|----------|
| TC-001 | P0 | unit | UC-001 | FR-001 | Valid login returns token | 200 + JWT |
| TC-002 | P0 | unit | UC-001 | FR-002 | Wrong password returns 401 | 401 + error msg |

## Implementation Order
1. TC-001, TC-002 (no dependencies — start here)
2. TC-003+ (depend on setup from earlier tests)

## Coverage Notes
- Highest risk areas: ...
- Existing code needing modification: [file paths]
```

**Priority:** P0 = must have (blocks release), P1 = should have, P2 = nice to have.
**Type:** `unit`, `integration`, `e2e`, `snapshot`, `performance`

### What NOT to produce
- "Test that the feature works" — too vague
- 50+ test cases for simple CRUD — over-testing
- Testing implementation details — brittle
- Duplicate tests verifying same behavior

---

## Phase 4: Summary

Show: test case counts (P0/P1/P2), implementation order, estimated scope.
Next steps: "Use `/test` after each chunk. For complex plans, run `/challenge` first."

## Naming Convention

Spec and test plan MUST share the same filename:
```
docs/specs/<feature-name>.md        ← kebab-case, 2-3 words
docs/test-plans/<feature-name>.md   ← same name
```
- Use feature name, not module name: `user-auth.md` not `AuthService.md`
- No prefix/suffix: `user-auth.md` not `spec-user-auth.md`

**Requirement IDs** — sequential per spec:
- `UC-001` Use Case, `FR-001` Functional Requirement, `SC-001` Success Criteria, `TC-001` Test Case
- Every TC must reference at least one FR or SC for traceability.
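The filename-pairing rule above can be verified mechanically. A sketch under this command's directory conventions; `check_pairs` is an illustrative name.

```bash
# Flags any spec that lacks a same-named test plan.
check_pairs() {
  for spec in docs/specs/*.md; do
    [ -e "$spec" ] || continue        # unmatched glob: no specs yet
    name=$(basename "$spec")
    [ -f "docs/test-plans/$name" ] || echo "missing test plan for $name"
  done
}
```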

## Rules
1. **Spec-first.** Test plan derives from spec, never from code.
2. **Codebase-aware.** Don't plan features that already exist.
3. **Actionable.** Every test case must be unambiguous enough to implement directly.
4. **Proportional.** Simple feature = simple plan. Don't over-engineer CRUD.
5. **Traceable.** Every test links to a use case. No orphan tests.
6. **Consistent names.** Spec and test plan always share the same filename.
@@ -0,0 +1,109 @@ package/templates/.claude/commands/review.md
Think hard about this review. You are the last gate before code reaches the main branch.

## Phase 0: Understand Intent

1. Read commit messages:
```
BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||')
[ -n "$BASE" ] || BASE="main"
git log --oneline "$BASE"...HEAD
```
2. Check for spec in `docs/specs/` and test plan in `docs/test-plans/` — review against INTENT.
3. Read the diff: `git diff "$BASE"...HEAD`

If `$ARGUMENTS` provided → scope to those files only.
If diff > 500 lines → review file-by-file, prioritize by smart focus below.

---

## Phase 1: Smart Focus

Auto-detect primary focus from diff content:

| Diff contains | Focus heavily on |
|--------------|-----------------|
| auth, login, token, session, password, JWT | Security — full depth |
| SQL, query, database, migration | Injection + data integrity |
| API, endpoint, route, controller, handler | Input validation + error handling |
| .env, config, secret, key, credential | Secret exposure |
| Test files only | Test quality (skip security deep-dive) |
| Docs/comments only | Accuracy only (minimal review) |
| Payment, billing, transaction | Correctness + idempotency |

Spend 60% of analysis on the primary focus. Cover all categories, but proportionally.

---

## Phase 2: Checklist

### Security (Critical)
- **Injection:** Search diff for string concatenation in SQL/shell/HTML. Look for `${var}` in queries, `.innerHTML`, template literals in SQL. Flag any user input reaching a query without parameterization.
- **Auth/Authz:** New endpoint → has auth middleware? Can user A access user B's data? ID in URL without ownership check?
- **Secrets:** Hardcoded strings matching `sk-`, `ghp_`, `Bearer `, long base64. New env vars committed?
- **Error exposure:** Catch blocks sending raw errors to users? Stack traces, file paths, DB schemas in responses?
- **Dependencies:** New packages — maintained? >1000 weekly downloads? Known CVEs?
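The secrets bullet lends itself to a rough mechanical first pass. A sketch only: `scan_secrets` is a made-up helper, the regexes are illustrative and not exhaustive, and a match means "look closer", not proof of a leak.

```bash
# Greps a diff (read from stdin) for token-shaped strings: sk- keys,
# GitHub ghp_ tokens, hardcoded Bearer headers.
scan_secrets() {
  grep -nE 'sk-[A-Za-z0-9]{16,}|ghp_[A-Za-z0-9]{16,}|Bearer [A-Za-z0-9._/+-]{20,}'
}
```

Typical use would pipe `git diff "$BASE"...HEAD` into it.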

### Correctness (High)
- **Logic vs intent:** Does the code do what commits/spec claim? "Add validation" but code just logs?
- **Edge cases:** null, empty, 0, negative, MAX_INT, unicode, very long strings — handled?
- **Error handling:** For each try/catch — error logged with context? User shown safe message? Resources cleaned in finally?
- **Concurrency:** Shared state without locks? Read-then-write without atomicity? Non-atomic DB updates?
- **Null safety:** Optionals used without guards? `object!.property` without nil check?

### Spec-Test Alignment (Medium)
- Source changed but no spec update in `docs/specs/`? → flag
- Source changed but no test update? → flag
- Spec changed but tests not updated? → flag
- Code removed but dead tests remain? → flag
- Spec contains vague requirements without metrics ("fast", "secure", "easy", "scalable")? → flag with suggestion to add SC-NNN with concrete numbers

### Code Quality (Medium)
- Dead code: removed functions still imported elsewhere?
- Obvious duplication: copy-pasted blocks that should be shared?
- Naming: consistent with codebase? Descriptive?
- Complexity: functions > 40 lines or > 3 nesting levels?

### Performance (Low)
- Flag N+1 queries, unbounded collections, redundant computation in loops.

---

## Phase 3: Output

```markdown
## Code Review: <branch or description>

**Scope:** X files, +Y/-Z lines
**Focus:** <auto-detected>
**Verdict:** APPROVE / REQUEST CHANGES / NEEDS DISCUSSION

### Critical Issues
**[C-1] file.ts:42 — SQL injection via unsanitized input**
`req.query.search` concatenated into SQL. Use parameterized query.

### High Priority
**[H-1] file.ts:87 — Empty catch swallows DB errors**
Users see blank screen. Log with context, return safe error.

### Medium Priority
**[M-1] Spec-test gap — rate limiting not in spec**
New logic at auth-service.ts:45-62 undocumented.

### Low Priority
**[L-1] Consider caching config lookup (called 3x per request)**

### Positive Notes
(At least 1 — reinforce good patterns)
- Clean middleware separation in auth-middleware.ts
- Thorough edge case tests

### Summary
<1-2 sentences: quality + clear next action>
```

## Rules
1. **Never auto-fix.** Report only.
2. **Specific.** Every finding has `file:line` and concrete description.
3. **Severity matches impact.** Style nits = Low. Injection = Critical.
4. **Positive notes mandatory.** Reviews aren't just about problems.
5. **Review against intent.** Not just "clean code?" but "does this match spec/commits?"
6. **Proportional.** 5-line doc change ≠ 500-line auth rewrite.