claude-raid 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31) hide show
  1. package/LICENSE +21 -0
  2. package/README.md +345 -0
  3. package/bin/cli.js +34 -0
  4. package/package.json +37 -0
  5. package/src/detect-project.js +112 -0
  6. package/src/doctor.js +201 -0
  7. package/src/init.js +138 -0
  8. package/src/merge-settings.js +119 -0
  9. package/src/remove.js +92 -0
  10. package/src/update.js +110 -0
  11. package/template/.claude/agents/archer.md +115 -0
  12. package/template/.claude/agents/rogue.md +116 -0
  13. package/template/.claude/agents/warrior.md +114 -0
  14. package/template/.claude/agents/wizard.md +206 -0
  15. package/template/.claude/hooks/validate-commit-message.sh +78 -0
  16. package/template/.claude/hooks/validate-file-naming.sh +73 -0
  17. package/template/.claude/hooks/validate-no-placeholders.sh +67 -0
  18. package/template/.claude/hooks/validate-phase-gate.sh +60 -0
  19. package/template/.claude/hooks/validate-tests-pass.sh +43 -0
  20. package/template/.claude/hooks/validate-verification.sh +70 -0
  21. package/template/.claude/raid-rules.md +21 -0
  22. package/template/.claude/skills/raid-debugging/SKILL.md +159 -0
  23. package/template/.claude/skills/raid-design/SKILL.md +208 -0
  24. package/template/.claude/skills/raid-finishing/SKILL.md +123 -0
  25. package/template/.claude/skills/raid-git-worktrees/SKILL.md +96 -0
  26. package/template/.claude/skills/raid-implementation/SKILL.md +155 -0
  27. package/template/.claude/skills/raid-implementation-plan/SKILL.md +173 -0
  28. package/template/.claude/skills/raid-protocol/SKILL.md +288 -0
  29. package/template/.claude/skills/raid-review/SKILL.md +133 -0
  30. package/template/.claude/skills/raid-tdd/SKILL.md +147 -0
  31. package/template/.claude/skills/raid-verification/SKILL.md +113 -0
@@ -0,0 +1,147 @@
1
+ ---
2
+ name: raid-tdd
3
+ description: "Use during implementation. Enforces strict RED-GREEN-REFACTOR. No production code without a failing test first. Challengers attack test quality. No subagents."
4
+ ---
5
+
6
+ # Raid TDD — Test-Driven Development
7
+
8
+ Write the test first. Watch it fail. Write minimal code to pass. Then the others try to break it.
9
+
10
+ **Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
11
+
12
+ **Violating the letter of these rules is violating their spirit.**
13
+
14
+ **TDD is enforced in ALL modes — Full Raid, Skirmish, and Scout. No exceptions.**
15
+
16
+ ## The Iron Law
17
+
18
+ ```
19
+ NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
20
+ ```
21
+
22
+ Write code before the test? Delete it. Start over. No exceptions.
23
+ - Don't keep it as "reference"
24
+ - Don't "adapt" it while writing tests
25
+ - Don't look at it
26
+ - Delete means delete
27
+
28
+ ## Process Flow
29
+
30
+ ```dot
31
+ digraph tdd {
32
+ "Write one failing test" -> "Run test";
33
+ "Run test" -> "Fails correctly?" [shape=diamond];
34
+ "Fails correctly?" -> "Write one failing test" [label="error, not failure — fix test"];
35
+ "Fails correctly?" -> "Write minimal code to pass" [label="yes, fails for right reason"];
36
+ "Write minimal code to pass" -> "Run test";
37
+ "Run test" -> "Passes?" [shape=diamond];
38
+ "Passes?" -> "Write minimal code to pass" [label="no"];
39
+ "Passes?" -> "Run FULL test suite" [label="yes"];
40
+ "Run FULL test suite" -> "All pass?" [shape=diamond];
41
+ "All pass?" -> "Fix regression" [label="no"];
42
+ "Fix regression" -> "Run FULL test suite";
43
+ "All pass?" -> "Refactor (optional)" [label="yes"];
44
+ "Refactor (optional)" -> "Run FULL test suite";
45
+ "Run FULL test suite" -> "Adversarial test review (challengers)" [label="green"];
46
+ "Adversarial test review (challengers)" -> "Next behavior" [shape=doublecircle];
47
+ }
48
+ ```
49
+
50
+ ## Red-Green-Refactor
51
+
52
+ ### RED — Write Failing Test
53
+ - One test, one behavior. "And" in the name? Split it.
54
+ - Clear name describing the behavior, not the implementation
55
+ - Real code, no mocks unless absolutely unavoidable
56
+ - Good: `test_expired_token_returns_401`
57
+ - Bad: `test_auth`, `test1`, `test_the_function`
58
+
59
+ ### Verify RED — Watch It Fail (MANDATORY. Never skip.)
60
+ Run: test command from `.claude/raid.json`
61
+ - Test **fails** (not errors — fails). A test that errors proves nothing.
62
+ - Failure message matches your expectation
63
+ - Fails because the feature is missing, not because of typos or import errors
64
+
65
+ ### GREEN — Minimal Code
66
+ - Simplest code that makes the test pass. Nothing more.
67
+ - No extra features, options, configurations
68
+ - "Can I make it pass with less code?" — if yes, do that instead
69
+
70
+ ### Verify GREEN — Watch It Pass (MANDATORY.)
71
+ Run: test command from `.claude/raid.json`
72
+ - New test passes
73
+ - ALL existing tests still pass
74
+ - Output clean — no warnings, no deprecation notices
75
+
76
+ ### REFACTOR — Clean Up (only after green)
77
+ - Remove duplication, improve names, extract helpers
78
+ - Keep tests green throughout — run after every change
79
+ - Refactor the TESTS too, not just the implementation
80
+
81
+ ## Adversarial Test Review
82
+
83
+ After TDD cycle, challengers attack the TESTS directly — and build on each other's critiques:
84
+
85
+ 1. **Does this test prove the behavior, or just confirm the implementation?** If you renamed an internal method, would the test break? It shouldn't.
86
+ 2. **What input would make this test pass even with a broken implementation?** (e.g., a test that only checks the happy path passes for any implementation that doesn't crash)
87
+ 3. **What edge cases are uncovered?** Empty input, null, boundary values, Unicode, concurrent access.
88
+ 4. **Is it testing real code or mock behavior?** Mocks that don't match real behavior = false confidence.
89
+ 5. **Would this catch a regression?** If someone changes the implementation next month, does this test catch the break?
90
+
91
+ **Challengers interact directly:**
92
+ - `⚔️ CHALLENGE: @Warrior, your test at line 15 only validates the happy path — here's an input that passes with a broken implementation: ...`
93
+ - `🔗 BUILDING ON @Archer: Your edge case finding — the same gap exists in the error path test at line 32...`
94
+ - `🔥 ROAST: @Rogue, you claimed the test is implementation-dependent but renaming the internal method doesn't break it — here's proof: ...`
95
+
96
+ **Challengers don't just report to the Wizard — they fight each other over test quality.**
97
+
98
+ ## Testing Anti-Patterns
99
+
100
+ | Anti-Pattern | Problem | Fix |
101
+ |---|---|---|
102
+ | Testing mock behavior, not real behavior | Mocks pass, production breaks | Use real implementations. Mock only external services. |
103
+ | Test-only methods in production code | Production API polluted for tests | Test through the public interface. If you can't, the design is wrong. |
104
+ | Mocking without understanding the real behavior | Mock returns "success" but real service returns different shape | Read the real implementation before mocking. |
105
+ | Incomplete mocks | Mock covers happy path, misses errors | Mock all paths, or don't mock at all. |
106
+ | Integration tests as afterthought | Unit tests pass, integration breaks | Write integration tests as part of TDD, not after. |
107
+
108
+ ## Common Rationalizations
109
+
110
+ | Excuse | Reality |
111
+ |--------|---------|
112
+ | "Too simple to test" | Simple code breaks. Test takes 30 seconds. Write it. |
113
+ | "I'll test after" | Tests passing immediately prove nothing about the design. |
114
+ | "Need to explore first" | Fine. Throw away exploration code. Start fresh with TDD. |
115
+ | "Test hard = skip it" | Hard to test = hard to use. Fix the design. |
116
+ | "TDD slows me down" | TDD is faster than debugging. Every time. |
117
+ | "Tests after achieve same goals" | Tests-after = "what does this code do?" Tests-first = "what should this code do?" |
118
+ | "Keep the code as reference" | You'll "adapt" it. That's testing after with extra steps. Delete means delete. |
119
+ | "Just this once" | Every exception is a precedent. Delete the code. Start with TDD. |
120
+ | "Already manually tested" | Manual testing leaves no record, can't be re-run, can't catch regressions. |
121
+ | "TDD is dogmatic" | TDD is evidence-based discipline. Dogma would skip the evidence. |
122
+
123
+ ## Verification Checklist
124
+
125
+ Before claiming a TDD cycle is complete:
126
+
127
+ - [ ] Every new function has a test
128
+ - [ ] Watched each test fail before implementing
129
+ - [ ] Each test failed for the expected reason (not errors)
130
+ - [ ] Wrote minimal code to pass (no over-engineering)
131
+ - [ ] All tests pass (new and existing)
132
+ - [ ] Output clean (no warnings)
133
+ - [ ] Tests use real code (minimal mocking)
134
+ - [ ] Edge cases and error paths covered
135
+ - [ ] Tests describe behavior, not implementation
136
+
137
+ Can't check all boxes? Don't proceed. Fix what's missing.
138
+
139
+ ## When You're Stuck
140
+
141
+ | Situation | Action |
142
+ |-----------|--------|
143
+ | Can't write the test | The requirement is unclear. Ask the Wizard. |
144
+ | Test won't fail | You may already have the implementation. Delete it. Write test. Watch it fail. |
145
+ | Test fails but for wrong reason | Fix the test setup, not the implementation. |
146
+ | Can't make it pass with minimal code | The design may be wrong. Discuss before forcing. |
147
+ | Tests are brittle (break on refactor) | Tests are testing implementation, not behavior. Rewrite them. |
@@ -0,0 +1,113 @@
1
+ ---
2
+ name: raid-verification
3
+ description: "Use before claiming any work is complete, fixed, or passing. No completion claims without fresh verification evidence. Challengers verify independently. No subagents."
4
+ ---
5
+
6
+ # Raid Verification — Evidence Before Claims
7
+
8
+ Claiming work is complete without verification is dishonesty, not efficiency.
9
+
10
+ **Core principle:** Evidence before claims, always.
11
+
12
+ **Violating the letter of this rule is violating its spirit.**
13
+
14
+ ## The Iron Law
15
+
16
+ ```
17
+ NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
18
+ ```
19
+
20
+ If you haven't run the verification command THIS turn, you cannot claim it passes.
21
+
22
+ ## Mode Behavior
23
+
24
+ - **Full Raid**: Triple verification — implementer + 2 challengers verify independently.
25
+ - **Skirmish**: Double verification — implementer + 1 challenger.
26
+ - **Scout**: Single verification + Wizard confirms.
27
+
28
+ ## The Gate Function
29
+
30
+ ```
31
+ BEFORE claiming any status:
32
+ 1. IDENTIFY: What command proves this claim?
33
+ 2. RUN: Execute the FULL command (fresh, complete)
34
+ Use test command from .claude/raid.json
35
+ 3. READ: Full output, check exit code, count failures
36
+ 4. VERIFY: Does output confirm the claim?
37
+ If NO → state actual status with evidence
38
+ If YES → state claim WITH evidence
39
+ 5. ONLY THEN: Make the claim
40
+
41
+ Skip any step = lying, not verifying
42
+ ```
43
+
44
+ This gate applies to EVERY status claim: "tests pass", "bug fixed", "feature complete", "regression resolved", "no issues found."
45
+
46
+ ## Common Failures
47
+
48
+ | Claim | Requires | NOT Sufficient |
49
+ |-------|----------|----------------|
50
+ | "Tests pass" | Test command output: 0 failures | Previous run, "should pass", changed code without re-running |
51
+ | "Bug fixed" | Reproduce original symptom: now passes | Code changed, assumed fixed |
52
+ | "No regressions" | Full test suite: all pass | Running only new tests |
53
+ | "Feature complete" | All acceptance criteria verified with evidence | Self-assessment without running tests |
54
+ | "Regression test works" | Red-green cycle verified | Test passes once without seeing it fail |
55
+
56
+ ## Forbidden Phrases Without Evidence
57
+
58
+ These phrases are NEVER allowed without preceding verification output:
59
+
60
+ - "Done", "Complete", "Fixed", "Working", "Passing"
61
+ - "Should work", "Probably fine", "Looks correct"
62
+ - "Great!", "Perfect!", "All good!"
63
+
64
+ Replace with: `[command] output shows [evidence]. [Claim].`
65
+
66
+ Example:
67
+ - Bad: "Tests pass."
68
+ - Good: "`npm test` output: 31 passing, 0 failing. Tests pass."
69
+
70
+ ## Raid Triple Verification
71
+
72
+ The implementer's claim is NOT sufficient. Challengers verify AND cross-check each other:
73
+
74
+ 1. **Implementer verifies** — runs tests, reports with evidence (command + output)
75
+ 2. **Challenger 1 verifies independently** — runs same tests, confirms output matches
76
+ 3. **Challenger 2 verifies adversarially** — runs tests PLUS tries to break it with edge cases
77
+ 4. **Challengers cross-check each other:** `@Archer, you said tests pass but did you run the full suite or just the changed files?` / `🔗 BUILDING ON @Warrior: Your verification missed the integration test at...`
78
+
79
+ Only after all required verifications confirm — and challengers have cross-checked each other — does the Wizard accept.
80
+
81
+ ## Common Rationalizations
82
+
83
+ | Excuse | Reality |
84
+ |--------|---------|
85
+ | "Should work now" | RUN the verification. "Should" is not evidence. |
86
+ | "I'm confident" | Confidence ≠ evidence. Confident people ship bugs too. |
87
+ | "Just this once" | Every "just this once" is a bug waiting to ship. No exceptions. |
88
+ | "Agent said success" | Verify independently. Reports lie — run the command yourself. |
89
+ | "Partial check is enough" | Partial verification proves nothing about the whole. Run the full suite. |
90
+ | "Different words so rule doesn't apply" | Spirit over letter. If you're claiming status without evidence, you're violating this. |
91
+ | "I already verified earlier" | Earlier is not now. Code may have changed. Fresh run required. |
92
+
93
+ ## Red Flags
94
+
95
+ | Thought | Reality |
96
+ |---------|---------|
97
+ | "I just ran the tests, they should still pass" | "Should" is not evidence. Run again. |
98
+ | "The fix is obvious, it definitely works" | Obvious fixes break in non-obvious ways. Verify. |
99
+ | "I don't need to verify a one-line change" | One-line changes cause production outages. Verify. |
100
+ | Expressing satisfaction before verification | Satisfaction is not evidence. Run the command first. |
101
+ | "Let me just commit and verify after" | Committing before verification = shipping unverified code. |
102
+
103
+ ## Why This Matters
104
+
105
+ From real-world failure patterns:
106
+ - Undefined functions shipped because "tests pass" was claimed without running
107
+ - Missing requirements deployed because "feature complete" was asserted without checking spec
108
+ - Hours wasted because "fixed" was reported without reproducing the original bug
109
+ - Trust broken because claims weren't backed by evidence
110
+
111
+ Every one of these was avoidable with: run the command, read the output, THEN claim.
112
+
113
+ Run the command. Read the output. THEN claim the result. Non-negotiable.