claude-raid 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +345 -0
- package/bin/cli.js +34 -0
- package/package.json +37 -0
- package/src/detect-project.js +112 -0
- package/src/doctor.js +201 -0
- package/src/init.js +138 -0
- package/src/merge-settings.js +119 -0
- package/src/remove.js +92 -0
- package/src/update.js +110 -0
- package/template/.claude/agents/archer.md +115 -0
- package/template/.claude/agents/rogue.md +116 -0
- package/template/.claude/agents/warrior.md +114 -0
- package/template/.claude/agents/wizard.md +206 -0
- package/template/.claude/hooks/validate-commit-message.sh +78 -0
- package/template/.claude/hooks/validate-file-naming.sh +73 -0
- package/template/.claude/hooks/validate-no-placeholders.sh +67 -0
- package/template/.claude/hooks/validate-phase-gate.sh +60 -0
- package/template/.claude/hooks/validate-tests-pass.sh +43 -0
- package/template/.claude/hooks/validate-verification.sh +70 -0
- package/template/.claude/raid-rules.md +21 -0
- package/template/.claude/skills/raid-debugging/SKILL.md +159 -0
- package/template/.claude/skills/raid-design/SKILL.md +208 -0
- package/template/.claude/skills/raid-finishing/SKILL.md +123 -0
- package/template/.claude/skills/raid-git-worktrees/SKILL.md +96 -0
- package/template/.claude/skills/raid-implementation/SKILL.md +155 -0
- package/template/.claude/skills/raid-implementation-plan/SKILL.md +173 -0
- package/template/.claude/skills/raid-protocol/SKILL.md +288 -0
- package/template/.claude/skills/raid-review/SKILL.md +133 -0
- package/template/.claude/skills/raid-tdd/SKILL.md +147 -0
- package/template/.claude/skills/raid-verification/SKILL.md +113 -0
@@ -0,0 +1,147 @@
---
name: raid-tdd
description: "Use during implementation. Enforces strict RED-GREEN-REFACTOR. No production code without a failing test first. Challengers attack test quality. No subagents."
---

# Raid TDD — Test-Driven Development

Write the test first. Watch it fail. Write minimal code to pass. Then the others try to break it.

**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.

**Violating the letter of these rules is violating their spirit.**

**TDD is enforced in ALL modes — Full Raid, Skirmish, and Scout. No exceptions.**

## The Iron Law

```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```

Write code before the test? Delete it. Start over. No exceptions.

- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete

## Process Flow

```dot
digraph tdd {
  // shape is a node attribute, so declare it on the nodes, not the edges
  "Fails correctly?" [shape=diamond];
  "Passes?" [shape=diamond];
  "All pass?" [shape=diamond];
  "Next behavior" [shape=doublecircle];

  "Write one failing test" -> "Run test";
  "Run test" -> "Fails correctly?";
  "Fails correctly?" -> "Write one failing test" [label="error, not failure — fix test"];
  "Fails correctly?" -> "Write minimal code to pass" [label="yes, fails for right reason"];
  "Write minimal code to pass" -> "Run test";
  "Run test" -> "Passes?";
  "Passes?" -> "Write minimal code to pass" [label="no"];
  "Passes?" -> "Run FULL test suite" [label="yes"];
  "Run FULL test suite" -> "All pass?";
  "All pass?" -> "Fix regression" [label="no"];
  "Fix regression" -> "Run FULL test suite";
  "All pass?" -> "Refactor (optional)" [label="yes"];
  "Refactor (optional)" -> "Run FULL test suite";
  "Run FULL test suite" -> "Adversarial test review (challengers)" [label="green"];
  "Adversarial test review (challengers)" -> "Next behavior";
}
```

## Red-Green-Refactor

### RED — Write Failing Test
- One test, one behavior. "And" in the name? Split it.
- Clear name describing the behavior, not the implementation
- Real code, no mocks unless absolutely unavoidable
- Good: `test_expired_token_returns_401`
- Bad: `test_auth`, `test1`, `test_the_function`

### Verify RED — Watch It Fail (MANDATORY. Never skip.)
Run the test command from `.claude/raid.json`. Confirm:
- The test **fails** (not errors — fails). A test that errors proves nothing.
- The failure message matches your expectation
- It fails because the feature is missing, not because of typos or import errors

### GREEN — Minimal Code
- Simplest code that makes the test pass. Nothing more.
- No extra features, options, or configurations
- "Can I make it pass with less code?" — if yes, do that instead

### Verify GREEN — Watch It Pass (MANDATORY.)
Run the test command from `.claude/raid.json`. Confirm:
- The new test passes
- ALL existing tests still pass
- Output is clean — no warnings, no deprecation notices

### REFACTOR — Clean Up (only after green)
- Remove duplication, improve names, extract helpers
- Keep tests green throughout — run after every change
- Refactor the TESTS too, not just the implementation

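Concretely, one RED-GREEN cycle from the flow above might look like this minimal sketch (the `check_token` function and the token shape are hypothetical, invented for illustration; they are not part of the raid tooling):

```python
import time

# RED -- the test is written FIRST. Run it and watch it fail (check_token
# doesn't exist yet), confirming it fails for the right reason.
def test_expired_token_returns_401():
    token = {"sub": "alice", "exp": time.time() - 60}  # expired one minute ago
    assert check_token(token) == 401

# GREEN -- the minimal code that makes the test pass. Nothing more: no
# refresh logic, no configuration, no extra status codes.
def check_token(token):
    if token["exp"] < time.time():
        return 401
    return 200
```

The refactor step would then clean up names or extract helpers while re-running the test after every change.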
## Adversarial Test Review

After the TDD cycle, challengers attack the TESTS directly — and build on each other's critiques:

1. **Does this test prove the behavior, or just confirm the implementation?** If you renamed an internal method, would the test break? It shouldn't.
2. **What input would make this test pass even with a broken implementation?** (e.g., a test that only checks the happy path passes for any implementation that doesn't crash)
3. **What edge cases are uncovered?** Empty input, null, boundary values, Unicode, concurrent access.
4. **Is it testing real code or mock behavior?** Mocks that don't match real behavior = false confidence.
5. **Would this catch a regression?** If someone changes the implementation next month, does this test catch the break?

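Question 1 is the most common failure. As an illustration (the `Cart` class and both tests are invented for this example), compare a test coupled to internals with one that pins behavior:

```python
class Cart:
    def __init__(self):
        self._items = []

    def add(self, price):
        self._items.append(price)

    def total(self):
        return sum(self._items)

# Brittle: breaks if _items is renamed or becomes a dict, yet proves
# nothing about what the cart is FOR.
def test_cart_internals():
    cart = Cart()
    cart.add(5)
    assert cart._items == [5]

# Behavioral: survives any refactor that keeps totals correct.
def test_total_sums_added_prices():
    cart = Cart()
    cart.add(5)
    cart.add(7)
    assert cart.total() == 12
```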
**Challengers interact directly:**
- `⚔️ CHALLENGE: @Warrior, your test at line 15 only validates the happy path — here's an input that passes with a broken implementation: ...`
- `🔗 BUILDING ON @Archer: Your edge case finding — the same gap exists in the error path test at line 32...`
- `🔥 ROAST: @Rogue, you claimed the test is implementation-dependent but renaming the internal method doesn't break it — here's proof: ...`

**Challengers don't just report to the Wizard — they fight each other over test quality.**

## Testing Anti-Patterns

| Anti-Pattern | Problem | Fix |
|---|---|---|
| Testing mock behavior, not real behavior | Mocks pass, production breaks | Use real implementations. Mock only external services. |
| Test-only methods in production code | Production API polluted for tests | Test through the public interface. If you can't, the design is wrong. |
| Mocking without understanding the real behavior | Mock returns "success" but the real service returns a different shape | Read the real implementation before mocking. |
| Incomplete mocks | Mock covers the happy path, misses errors | Mock all paths, or don't mock at all. |
| Integration tests as afterthought | Unit tests pass, integration breaks | Write integration tests as part of TDD, not after. |

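The first anti-pattern can be sketched like this (the gateway, its response shapes, and all names here are hypothetical, chosen only to show the trap):

```python
# A fake invented for tests -- it does not match any real service.
class FakeGateway:
    def charge(self, amount):
        return {"status": "ok"}  # the mock's shape, not the real one

def submit_order(gateway, amount):
    # Production code written against the mock's shape.
    return gateway.charge(amount)["status"] == "ok"

def test_submit_order():
    assert submit_order(FakeGateway(), 100)  # green forever...
    # ...even if the real gateway returns {"state": "succeeded"}: a
    # different shape, which would crash submit_order in production
    # with a KeyError while this test stays green.
```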
## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. The test takes 30 seconds. Write it. |
| "I'll test after" | Tests that pass immediately prove nothing about the design. |
| "Need to explore first" | Fine. Throw away the exploration code. Start fresh with TDD. |
| "Test hard = skip it" | Hard to test = hard to use. Fix the design. |
| "TDD slows me down" | TDD is faster than debugging. Every time. |
| "Tests after achieve same goals" | Tests-after ask "what does this code do?" Tests-first ask "what should this code do?" |
| "Keep the code as reference" | You'll "adapt" it. That's testing after with extra steps. Delete means delete. |
| "Just this once" | Every exception is a precedent. Delete the code. Start with TDD. |
| "Already manually tested" | Manual testing leaves no record, can't be re-run, and can't catch regressions. |
| "TDD is dogmatic" | TDD is evidence-based discipline. Dogma would skip the evidence. |

## Verification Checklist

Before claiming a TDD cycle is complete:

- [ ] Every new function has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for the expected reason (not errors)
- [ ] Wrote minimal code to pass (no over-engineering)
- [ ] All tests pass (new and existing)
- [ ] Output clean (no warnings)
- [ ] Tests use real code (minimal mocking)
- [ ] Edge cases and error paths covered
- [ ] Tests describe behavior, not implementation

Can't check all boxes? Don't proceed. Fix what's missing.

## When You're Stuck

| Situation | Action |
|-----------|--------|
| Can't write the test | The requirement is unclear. Ask the Wizard. |
| Test won't fail | You may already have the implementation. Delete it. Write the test. Watch it fail. |
| Test fails but for the wrong reason | Fix the test setup, not the implementation. |
| Can't make it pass with minimal code | The design may be wrong. Discuss before forcing it. |
| Tests are brittle (break on refactor) | Tests are testing implementation, not behavior. Rewrite them. |

@@ -0,0 +1,113 @@
---
name: raid-verification
description: "Use before claiming any work is complete, fixed, or passing. No completion claims without fresh verification evidence. Challengers verify independently. No subagents."
---

# Raid Verification — Evidence Before Claims

Claiming work is complete without verification is dishonesty, not efficiency.

**Core principle:** Evidence before claims, always.

**Violating the letter of this rule is violating its spirit.**

## The Iron Law

```
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
```

If you haven't run the verification command THIS turn, you cannot claim it passes.

## Mode Behavior

- **Full Raid**: Triple verification — implementer + 2 challengers verify independently.
- **Skirmish**: Double verification — implementer + 1 challenger.
- **Scout**: Single verification + Wizard confirms.

## The Gate Function

```
BEFORE claiming any status:
1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
   Use the test command from .claude/raid.json
3. READ: Full output, check exit code, count failures
4. VERIFY: Does the output confirm the claim?
   If NO → state the actual status with evidence
   If YES → state the claim WITH evidence
5. ONLY THEN: Make the claim

Skip any step = lying, not verifying
```

This gate applies to EVERY status claim: "tests pass", "bug fixed", "feature complete", "regression resolved", "no issues found."

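As a sketch, the gate can be applied mechanically (illustrative only; in practice the command comes from `.claude/raid.json`, and the `gate` function here is invented): run the command fresh, read the exit code, and only then phrase the claim together with its evidence.

```python
import subprocess

def gate(test_cmd: str) -> str:
    # Run the FULL command fresh -- no cached results, no "should pass".
    result = subprocess.run(test_cmd, shell=True)
    if result.returncode == 0:
        return f"{test_cmd!r} exited 0. Tests pass."
    # Output didn't confirm the claim: state the actual status instead.
    return f"{test_cmd!r} exited {result.returncode}. NOT done."
```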
## Common Failures

| Claim | Requires | NOT Sufficient |
|-------|----------|----------------|
| "Tests pass" | Test command output: 0 failures | Previous run, "should pass", changed code without re-running |
| "Bug fixed" | Reproduce original symptom: now passes | Code changed, assumed fixed |
| "No regressions" | Full test suite: all pass | Running only new tests |
| "Feature complete" | All acceptance criteria verified with evidence | Self-assessment without running tests |
| "Regression test works" | Red-green cycle verified | Test passes once without seeing it fail |

## Forbidden Phrases Without Evidence

These phrases are NEVER allowed without preceding verification output:

- "Done", "Complete", "Fixed", "Working", "Passing"
- "Should work", "Probably fine", "Looks correct"
- "Great!", "Perfect!", "All good!"

Replace with: `[command] output shows [evidence]. [Claim].`

Example:
- Bad: "Tests pass."
- Good: "`npm test` output: 31 passing, 0 failing. Tests pass."

## Raid Triple Verification

The implementer's claim is NOT sufficient. Challengers verify AND cross-check each other:

1. **Implementer verifies** — runs the tests, reports with evidence (command + output)
2. **Challenger 1 verifies independently** — runs the same tests, confirms the output matches
3. **Challenger 2 verifies adversarially** — runs the tests PLUS tries to break the change with edge cases
4. **Challengers cross-check each other:** `@Archer, you said tests pass but did you run the full suite or just the changed files?` / `🔗 BUILDING ON @Warrior: Your verification missed the integration test at...`

Only after all required verifications confirm — and the challengers have cross-checked each other — does the Wizard accept.

## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Should work now" | RUN the verification. "Should" is not evidence. |
| "I'm confident" | Confidence ≠ evidence. Confident people ship bugs too. |
| "Just this once" | Every "just this once" is a bug waiting to ship. No exceptions. |
| "Agent said success" | Verify independently. Reports lie — run the command yourself. |
| "Partial check is enough" | Partial verification proves nothing about the whole. Run the full suite. |
| "Different words so the rule doesn't apply" | Spirit over letter. If you're claiming status without evidence, you're violating this. |
| "I already verified earlier" | Earlier is not now. Code may have changed. Fresh run required. |

## Red Flags

| Thought | Reality |
|---------|---------|
| "I just ran the tests, they should still pass" | "Should" is not evidence. Run again. |
| "The fix is obvious, it definitely works" | Obvious fixes break in non-obvious ways. Verify. |
| "I don't need to verify a one-line change" | One-line changes cause production outages. Verify. |
| Expressing satisfaction before verification | Satisfaction is not evidence. Run the command first. |
| "Let me just commit and verify after" | Committing before verification = shipping unverified code. |

## Why This Matters

From real-world failure patterns:
- Undefined functions shipped because "tests pass" was claimed without running them
- Missing requirements deployed because "feature complete" was asserted without checking the spec
- Hours wasted because "fixed" was reported without reproducing the original bug
- Trust broken because claims weren't backed by evidence

Every one of these was avoidable: run the command, read the output, THEN claim.

Run the command. Read the output. THEN claim the result. Non-negotiable.