agentic-sdlc-wizard 1.47.0 → 1.48.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +1 -1
- package/.claude-plugin/plugin.json +1 -1
- package/CHANGELOG.md +17 -0
- package/CLAUDE_CODE_SDLC_WIZARD.md +2 -2
- package/package.json +1 -1
- package/skills/sdlc/SKILL.md +170 -661
- package/skills/update/SKILL.md +119 -211
package/skills/sdlc/SKILL.md
CHANGED
@@ -9,61 +9,55 @@ effort: high
  ## Task
  $ARGUMENTS

+ Operational checklist. Full protocol lives in `CLAUDE_CODE_SDLC_WIZARD.md` — read it for depth.
+
  ## Full SDLC Checklist

- Your FIRST action must be TodoWrite
+ Your FIRST action must be a TodoWrite covering every phase below. Compact form (omit `activeForm` to use the subject as the spinner label):

  ```
  TodoWrite([
-   // PLANNING
-   { content: "Find and read relevant documentation", status: "in_progress"
-   { content: "Assess doc health - flag issues (ask before cleaning)", status: "pending"
-   { content: "DRY scan: What patterns exist to reuse? New pattern = get approval", status: "pending"
-   { content: "Prove It Gate: adding new component? Research alternatives, prove quality with tests", status: "pending"
-   { content: "Blast radius: What depends on code I'm changing?", status: "pending"
-   { content: "Design system check (if UI change)", status: "pending"
-   { content: "Restate task in own words - verify understanding", status: "pending"
-   { content: "Scrutinize test design - right things tested? Follow TESTING.md?", status: "pending"
-   { content: "Present approach + STATE CONFIDENCE LEVEL", status: "pending"
-   { content: "Signal ready - user exits plan mode", status: "pending"
-   // TRANSITION
-   { content: "Doc sync: update or create feature
-   // IMPLEMENTATION
-   { content: "TDD RED: Write failing test FIRST", status: "pending"
-   { content: "TDD GREEN: Implement, verify test passes", status: "pending"
-   { content: "Run lint/typecheck", status: "pending"
-   { content: "Run ALL tests", status: "pending"
-   { content: "Production build check", status: "pending"
-   // REVIEW
-   { content: "DRY check: Is logic duplicated elsewhere?", status: "pending"
-   { content: "Visual consistency check (if UI change)", status: "pending"
-   { content: "Self-review: run /code-review", status: "pending"
-   { content: "Security review (if warranted)", status: "pending"
-   { content: "Cross-model review (if configured
-   { content: "Scope guard: only changes related to task? No legacy/fallback code left?", status: "pending"
-   // CI
-
-
-
-
-
-   // Consumer repos still use their own CI as configured.
-   { content: "Commit and push to remote", status: "pending", activeForm: "Pushing to remote" },
-   { content: "Watch CI - fix failures, iterate until green (max 2x)", status: "pending", activeForm: "Watching CI" },
-   { content: "Read CI review - implement valid suggestions, iterate until clean", status: "pending", activeForm: "Addressing CI review feedback" },
-   { content: "Meta-repo only: run local shepherd if PR needs E2E score (optional)", status: "pending", activeForm: "Running local shepherd" },
-   { content: "Post-deploy verification (if deploy task — see Deployment Tasks)", status: "pending", activeForm: "Verifying deployment" },
+   // PLANNING
+   { content: "Find and read relevant documentation", status: "in_progress" },
+   { content: "Assess doc health - flag issues (ask before cleaning)", status: "pending" },
+   { content: "DRY scan: What patterns exist to reuse? New pattern = get approval", status: "pending" },
+   { content: "Prove It Gate: adding new component? Research alternatives, prove quality with tests", status: "pending" },
+   { content: "Blast radius: What depends on code I'm changing?", status: "pending" },
+   { content: "Design system check (if UI change)", status: "pending" },
+   { content: "Restate task in own words - verify understanding", status: "pending" },
+   { content: "Scrutinize test design - right things tested? Follow TESTING.md?", status: "pending" },
+   { content: "Present approach + STATE CONFIDENCE LEVEL", status: "pending" },
+   { content: "Signal ready - user exits plan mode", status: "pending" },
+   // TRANSITION
+   { content: "Doc sync: update or create feature doc — MUST be current before commit", status: "pending" },
+   // IMPLEMENTATION
+   { content: "TDD RED: Write failing test FIRST", status: "pending" },
+   { content: "TDD GREEN: Implement, verify test passes", status: "pending" },
+   { content: "Run lint/typecheck", status: "pending" },
+   { content: "Run ALL tests", status: "pending" },
+   { content: "Production build check", status: "pending" },
+   // REVIEW
+   { content: "DRY check: Is logic duplicated elsewhere?", status: "pending" },
+   { content: "Visual consistency check (if UI change)", status: "pending" },
+   { content: "Self-review: run /code-review", status: "pending" },
+   { content: "Security review (if warranted)", status: "pending" },
+   { content: "Cross-model review (if configured)", status: "pending" },
+   { content: "Scope guard: only changes related to task? No legacy/fallback code left?", status: "pending" },
+   // CI SHEPHERD
+   { content: "Commit and push to remote", status: "pending" },
+   { content: "Watch CI - fix failures, iterate until green (max 2x)", status: "pending" },
+   { content: "Read CI review - implement valid suggestions, iterate until clean", status: "pending" },
+   { content: "Meta-repo only: run local shepherd if PR needs E2E score (optional)", status: "pending" },
+   { content: "Post-deploy verification (if deploy task)", status: "pending" },
    // FINAL
-   { content: "Present summary: changes, tests, CI status", status: "pending"
-   { content: "Capture learnings (
-   { content: "Close out plan files: if task came from a plan, mark complete or delete", status: "pending"
+   { content: "Present summary: changes, tests, CI status", status: "pending" },
+   { content: "Capture learnings (after session — TESTING.md, CLAUDE.md, or feature docs)", status: "pending" },
+   { content: "Close out plan files: if task came from a plan, mark complete or delete", status: "pending" }
  ])
  ```

  ## SDLC Quality Checklist (Scoring Rubric)

- Your work is scored on these criteria. **Critical** criteria are must-pass.
-
  | Criterion | Points | Critical? | What Counts |
  |-----------|--------|-----------|-------------|
  | task_tracking | 1 | | Use TodoWrite or TaskCreate |

@@ -76,734 +70,249 @@ Your work is scored on these criteria. **Critical** criteria are must-pass.
  | self_review | 1 | **YES** | Read back files/diffs you modified |
  | clean_code | 1 | | One coherent approach, no dead code |

- **Total: 10 points** (11 for UI tasks, +1 for design_system check)
-
- Critical miss on `tdd_red` or `self_review` = process failure regardless of total score.
+ **Total: 10 points** (11 for UI tasks, +1 for design_system check). Critical miss on `tdd_red` or `self_review` = process failure regardless of total score.

- ## Test Failure Recovery
+ ## Test Failure Recovery

-
- ┌─────────────────────────────────────────────────────────────────────┐
- │ ALL TESTS MUST PASS. NO EXCEPTIONS.                                 │
- │                                                                     │
- │ This is not negotiable. This is not flexible. This is absolute.     │
- └─────────────────────────────────────────────────────────────────────┘
- ```
+ **ALL TESTS MUST PASS. NO EXCEPTIONS.** Test code is app code. Failures are bugs — investigate them like a 15-year SDET, not by brushing aside.

-
- - "Those were already failing" → Fix them first
- - "Not related to my changes" → Doesn't matter, fix it
- - "It's flaky" → Flaky = bug, investigate
+ Not acceptable: "those were already failing", "not related to my changes", "it's flaky" (flaky = bug we haven't found yet).

-
-
- If tests fail:
+ When tests fail:
  1. Identify which test(s) failed
- 2. Diagnose WHY
-
-
- - Test has wrong assertions? Fix the test
- - Test is "flaky"? Investigate - flakiness is just another word for bug
- 3. Fix appropriately (fix code, fix test, or delete dead test)
- 4. Run specific test individually first
- 5. Then run ALL tests
- 6. Still failing? ASK USER - don't spin your wheels
-
- **Flaky tests are bugs, not mysteries:**
- - Sometimes the bug is in app code (race condition, timing issue)
- - Sometimes the bug is in test code (shared state, not parallel-safe)
- - Sometimes the bug is in test environment (cleanup not proper)
-
- Debug it. Find root cause. Fix it properly. Tests ARE code.
-
- ## New Pattern & Test Design Scrutiny (PLANNING)
-
- **New design patterns require human approval:**
- 1. Search first - do similar patterns exist in codebase?
- 2. If YES and they're good - use as building block
- 3. If YES but they're bad - propose improvement, get approval
- 4. If NO (new pattern) - explain why needed, get explicit approval
-
- **Test design scrutiny during planning:**
- - Are we testing the right things?
- - Does test approach follow TESTING.md philosophies?
- - If introducing new test patterns, same scrutiny as code patterns
-
- ## Prove It Gate (REQUIRED for New Additions)
-
- **Adding a new skill, hook, workflow, or component? PROVE IT FIRST:**
-
- 1. **Absorption check:** Can this be added as a section in an existing skill instead of a new component? Default is YES — new skills/hooks need strong justification. Releasing is SDLC, not a separate skill. Debugging is SDLC, not a separate skill. Keep it lean
- 2. **Research:** Does something equivalent already exist (native CC, third-party plugin, existing skill)?
- 3. **If YES:** Why is yours better? Show evidence (A/B test, quality comparison, gap analysis)
- 4. **If NO:** What gap does this fill? Is the gap real or theoretical?
- 5. **Quality tests:** New additions MUST have tests that prove OUTPUT QUALITY, not just existence
- 6. **Less is more:** Every addition is maintenance burden. Default answer is NO unless proven YES
-
- **Existence tests are NOT quality tests:**
- - BAD: "ci-analyzer skill file exists" — proves nothing about quality
- - GOOD: "ci-analyzer recommends lint-first when test-before-lint detected" — proves behavior
-
- **If you can't write a quality test for it, you can't prove it works, so don't add it.**
-
- ## Plan Mode Integration
+ 2. Diagnose WHY: your code broke it (regression — fix code), test is for deleted code (delete test), test has wrong assertions (fix test), "flaky" (investigate — race, shared state, env)
+ 3. Fix appropriately, run specific test individually first, then run ALL tests
+ 4. Still failing after 2 attempts? STOP and ASK USER

-
+ ## Confidence Check (REQUIRED)

-
- 1. **Plan Mode** (editing blocked): Research -> Write plan file -> Present approach + confidence
- 2. **Transition** (after approval): Update feature docs
- 3. **Implementation**: TDD RED -> GREEN -> PASS
+ State your confidence before presenting an approach:

-
+ | Level | Meaning | Action | Effort |
+ |-------|---------|--------|--------|
+ | HIGH (90%+) | Know exactly what to do | Present, proceed after approval | `high` (default) |
+ | MEDIUM (60-89%) | Solid approach, some uncertainty | Present, highlight uncertainties | `high` |
+ | LOW (<60%) | Not sure | Research or try Codex; if still LOW, ASK USER | **`/effort xhigh` now** |
+ | FAILED 2x | Something's wrong | Codex for fresh perspective; if still stuck, STOP | **`/effort max` now** |
+ | CONFUSED | Can't diagnose | Codex; if still confused, STOP and describe | **`/effort max` now** |

-
- - Confidence is **HIGH (95%+)** — you know exactly what to do
- - Task is **single-file or trivial** (config tweak, small bug fix, string change)
- - No new patterns introduced
- - No architectural decisions
+ **Dynamic effort bumping is NOT optional.** "Consider max effort" is the same as "ignore this." Bump BEFORE the next attempt, not after a third failure.

-
- > "Confidence HIGH (95%). Single-file change. Proceeding directly to TDD."
+ ## Plan Mode

-
+ Use plan mode for: multi-file changes, new features, LOW confidence, bugs needing investigation. **Skip plan approval step** (auto-approval) when confidence HIGH (95%+) AND single-file/trivial AND no new patterns AND no architectural decisions — still announce approach, don't wait. When in doubt, wait.

  ## Recommended Model

- **Opt-in: `opus[1m]` (Opus 4.7 with 1M context
-
- **Why opt-in, not default:** A top-level `model` pin in `.claude/settings.json` disables Claude Code's per-turn model auto-selection. That's a real cost — Max-plan users pay for that auto-selection (Sonnet for cheap tasks, Opus for hard ones, plus weekly-limit smoothing). Pin only when you actually need the 1M headroom.
-
- **Why pin to `opus[1m]` when you do opt in:**
- - SDLC sessions (plan → TDD → review → CI shepherd) accumulate context fast — plans, test output, diffs, review artifacts. 200K fills up before you're done.
- - Forced auto-compact mid-task loses your working state. Extra headroom is cheaper than re-reading files.
- - At time of writing, Anthropic lists 1M context at standard pricing for supported Opus/Sonnet models — verify current rates for your plan before relying on this.
+ **Opt-in: `opus[1m]` (Opus 4.7 with 1M context).** `/model opus[1m]` at the start of non-trivial sessions — understand the tradeoff (issue #198). A top-level `model` pin in `.claude/settings.json` disables CC's per-turn auto-selection; pin only when you need 1M headroom. Requires CC v2.1.111+.

- **
-
- **Pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`** when you opt in. Without it, CC's default auto-compact on 1M fires at ~76K and defeats the purpose. The setup wizard's Step 9.5 prompts to write both together (template ships with neither, opt-in only).
-
- **Fall back to `opus` (200K) only when:** your plan charges a premium for long-context prompts, the task is genuinely short (<30K), or team cost controls flag >200K prompts. See the "1M vs 200K Context Window" section in `CLAUDE_CODE_SDLC_WIZARD.md` for details.
-
- ## Confidence Check (REQUIRED)
+ **Pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` when you opt in.** Without it, the default fires at ~76K on 1M. **Pick ONE — do NOT set both `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` AND `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000`** — they compound to 30% × 400K = 120K trigger ≈ 12% of 1M, fires almost immediately (#207). See wizard "Autocompact Tuning" for details.
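The compounding failure the added line warns about (#207) is plain arithmetic. A quick sanity check, under the semantics the text implies — the override percentage is applied to the configured window:

```shell
# Assumed semantics from the note above: trigger = pct% of the configured
# auto-compact window. Setting BOTH knobs compounds them.
pct=30          # CLAUDE_AUTOCOMPACT_PCT_OVERRIDE
window=400000   # CLAUDE_CODE_AUTO_COMPACT_WINDOW
trigger=$(( window * pct / 100 ))
echo "auto-compact fires at ${trigger} tokens"        # 120000
echo "share of a 1M context: $(( trigger * 100 / 1000000 ))%"  # 12
```

With only one knob set, the trigger lands where you expect; with both, it fires at ~12% of a 1M window.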

-
-
- | Level | Meaning | Action | Effort |
- |-------|---------|--------|--------|
- | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
- | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
- | LOW (<60%) | Not sure | Do more research or try cross-model research (Codex) to get to 95%. If still LOW after research, ASK USER | **Run `/effort xhigh` now** — don't wait |
- | FAILED 2x | Something's wrong | Try cross-model research (Codex) for a fresh perspective. If still stuck, STOP and ASK USER | **Run `/effort max` now** — you're already burning cycles at lower effort |
- | CONFUSED | Can't diagnose why something is failing | Try cross-model research (Codex). If still confused, STOP. Describe what you tried, ask for help | **Run `/effort max` now** — stop spinning |
-
- **Dynamic bumping is NOT optional.** "Consider max effort" is the same as "ignore this" in practice. If your confidence drops or tests fail twice, bump effort BEFORE the next attempt — not after a third failure. Spinning at low effort is an SDLC failure mode, not a style choice.
-
- ## Self-Review Loop (CRITICAL)
+ ## Self-Review Loop

  ```
- PLANNING
- ^
- |
- | Issues found?
- | |-- NO -> Present to user
- | +-- YES v
- +------------------------------------------- Ask user: fix in new plan?
+ PLANNING → DOCS → TDD RED → GREEN → Tests Pass → Self-Review
+     ^                                               |
+     +--- Ask user: fix in new plan? ←- Issues found? YES (NO → Present)
  ```

-
- 1. Ask user: "Found issues. Want to create a plan to fix?"
- 2. If yes -> back to PLANNING phase with new plan doc
- 3. Then -> docs update -> TDD -> review (proper SDLC loop)
-
- **How to self-review:**
- 1. Run `/code-review` to review your changes
- 2. It launches parallel agents (CLAUDE.md compliance, bug detection, logic & security)
- 3. Issues at confidence >= 80 are real findings — go back to PLANNING to fix
- 4. Issues below 80 are likely false positives — skip unless obviously valid
- 5. Address issues by going back through the proper SDLC loop
+ The loop goes back to PLANNING, not TDD RED. Run `/code-review`; issues at confidence ≥ 80 are real, < 80 are likely false positives. Found issues → ask "Want a plan to fix?" → new plan → docs → TDD → review.

  ## Cross-Model Review (If Configured)

- **When to run:**
- **When to skip:**
-
- **Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
-
- **The core insight:** The review PROTOCOL is universal across domains. Only the review INSTRUCTIONS change. Code review is the default template below. For non-code domains (research, persuasion, medical content), adapt the `review_instructions` and `verification_checklist` fields while keeping the same handoff/dialogue/convergence loop.
+ **When to run:** high-stakes changes (auth, payments, data), releases/publishes, complex refactors.
+ **When to skip:** trivial changes, time-sensitive hotfixes, risk < review cost.
+ **Prerequisites:** Codex CLI (`npm i -g @openai/codex`) + OpenAI API key.

- **Reviewer always at
+ The PROTOCOL is universal across domains; only `review_instructions` and `verification_checklist` change. **Reviewer always at flagship tier (#233):** if the project pins `model: "sonnet[1m]"` (mixed-mode), the reviewer still runs `gpt-5.5` or Opus 4.7 max — adversarial diversity is the point.

- ### Step 0:
+ ### Step 0: Preflight Self-Review

-
+ At `.reviews/preflight-{review_id}.md`, document what you already checked: `/code-review` passed, all tests passing, specific concerns checked, what you verified manually, known limitations. Reduces reviewer findings to 0-1 per round.

-
- ```markdown
- ## Preflight Self-Review: {feature}
- - [ ] Self-review via /code-review passed
- - [ ] All tests passing
- - [ ] Checked for: [specific concerns for this change]
- - [ ] Verified: [what you manually confirmed]
- - [ ] Known limitations: [what you couldn't verify]
- ```
-
- ### Step 1: Write Mission-First Handoff
+ ### Step 1: Mission-First Handoff

-
+ Write `.reviews/handoff.json`:
  ```jsonc
  {
    "review_id": "feature-xyz-001",
    "status": "PENDING_REVIEW",
    "round": 1,
-   "mission": "What changed and why — 2-3 sentences
-   "success": "What 'correctly reviewed' looks like
-   "failure": "What gets missed if
+   "mission": "What changed and why — 2-3 sentences",
+   "success": "What 'correctly reviewed' looks like",
+   "failure": "What gets missed if reviewer is superficial",
    "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
    "fixes_applied": [],
    "previous_score": null,
    "verification_checklist": [
      "(a) Verify input validation at auth.ts:45 handles empty strings",
-     "(b) Verify test covers
-     "(c) Check no hardcoded secrets in diff"
+     "(b) Verify test covers null-token edge case"
    ],
-   "review_instructions": "Focus on security and edge cases. Be strict — assume bugs may be present
+   "review_instructions": "Focus on security and edge cases. Be strict — assume bugs may be present.",
    "preflight_path": ".reviews/preflight-feature-xyz-001.md",
-   "artifact_path": ".reviews/feature-xyz-001/",
    "pr_number": 205
  }
  ```

- **
- - `mission/success/failure` — Gives the reviewer context. Without this, you get generic "looks good" feedback. With it, reviewers read raw source files and verify specific claims (proven across 4 repos)
- - `verification_checklist` — Specific things to verify with file:line references. NOT "review for correctness" — that's too vague. Each item is independently verifiable
- - `preflight_path` — Shows the reviewer what you already checked, so they focus on what you might have missed
- - `pr_number` (optional) — PreCompact self-heal opt-in (ROADMAP #209). When the review tracks a specific PR, set this. The `precompact-seam-check.sh` hook queries `gh pr view N --json state` on every manual `/compact` and, if the PR is MERGED, treats this handoff as implicit CERTIFIED — unblocking `/compact` even if `status` is still `PENDING_*`. Without `pr_number`, a forgotten PENDING handoff blocks every future manual compact until you flip status by hand or hit the `SDLC_HANDOFF_STALE_DAYS` (default 14) auto-expire fallback. Omit for ad-hoc reviews not tied to a PR.
+ `mission/success/failure` give context (without them: generic "looks good"). `verification_checklist` is specific (file:line), not "review for correctness." `pr_number` (optional) is the **PreCompact self-heal opt-in (ROADMAP #209)**: when set, `precompact-seam-check.sh` checks `gh pr view N --json state` on `/compact` and, if MERGED, treats handoff as implicit CERTIFIED. Without it, a forgotten PENDING handoff blocks every manual compact until you flip status or hit `SDLC_HANDOFF_STALE_DAYS` (default 14).
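The self-heal check described above reduces to one `gh` query. A hedged sketch — the function name is illustrative, the real logic lives in `precompact-seam-check.sh`, and `gh`'s `--jq` flag is used here for brevity:

```shell
# Hypothetical sketch of the PreCompact self-heal: a handoff tied to a
# merged PR is treated as implicitly CERTIFIED.
handoff_is_implicitly_certified() {
  pr=$1
  [ -n "$pr" ] || return 1   # no pr_number in handoff.json -> no self-heal
  state=$(gh pr view "$pr" --json state --jq .state 2>/dev/null) || return 1
  [ "$state" = "MERGED" ]
}
```

A PENDING handoff without `pr_number` takes the other path: it keeps blocking `/compact` until the status is flipped or the stale-days fallback expires it.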

- ### Step 2: Run the
+ ### Step 2: Run the Reviewer

  ```bash
- codex exec \
-   -c 'model_reasoning_effort="xhigh"' \
-   -s danger-full-access \
+ codex exec -c 'model_reasoning_effort="xhigh"' -s danger-full-access \
    -o .reviews/latest-review.md \
- "
-
-
- Verify each checklist item with evidence (file:line, grep results, test output). \
- Output each finding with: ID (1, 2, ...), severity (P0/P1/P2), evidence, \
- and a 'certify condition' (what specific change resolves it). \
- Re-verify any prior-round passes still hold. \
+   "Independent code reviewer. Read .reviews/handoff.json for context. \
+   Verify each checklist item with evidence (file:line, grep, test output). \
+   Each finding: ID, severity (P0/P1/P2), evidence, certify condition. \
    End with: score (1-10), CERTIFIED or NOT CERTIFIED."
  ```

-
-
- **Progress visibility (#259):** Reviews at `xhigh` routinely take 1-5 minutes. Without a heartbeat the user can't distinguish "still thinking" from "crashed silently". For long reviews, swap the bare invocation for `scripts/codex-review-with-progress.sh`:
+ Always `xhigh` — lower settings miss subtle errors. **Progress (#259):** xhigh runs take 1-5 min; for a heartbeat use `scripts/codex-review-with-progress.sh` (`SDLC_CODEX_HEARTBEAT_INTERVAL` tunes). **Sandbox:** Codex's Rust binary needs `SCDynamicStore`; CC's sandbox blocks this. From CC, use `dangerouslyDisableSandbox: true` — Codex has its own sandbox via `-s danger-full-access`. Known issue: [codex#15640](https://github.com/openai/codex/issues/15640).
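A minimal sketch of what a heartbeat wrapper like `scripts/codex-review-with-progress.sh` presumably does — the function name and output format are assumptions, not the shipped script; only the interval variable comes from the text above:

```shell
# Run a long command in the background, printing elapsed time and how many
# bytes it has written, so "still thinking" is distinguishable from "crashed".
run_with_heartbeat() {
  out=$1; shift
  : "${SDLC_CODEX_HEARTBEAT_INTERVAL:=10}"
  "$@" > "$out" 2>&1 &
  pid=$!; start=$(date +%s)
  while kill -0 "$pid" 2>/dev/null; do
    sleep "$SDLC_CODEX_HEARTBEAT_INTERVAL"
    size=$(wc -c < "$out" 2>/dev/null | tr -d ' ')
    echo "[codex $(( $(date +%s) - start ))s elapsed, ${size:-0} bytes written to $out] still running..."
  done
  wait "$pid"
  echo "[codex finished with rc=$?]"
}
# e.g. run_with_heartbeat .reviews/latest-review.md codex exec ... "prompt"
```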

-
- scripts/codex-review-with-progress.sh \
-   .reviews/latest-review.md \
-   "You are an independent code reviewer ..."
- ```
+ CERTIFIED → CI. NOT CERTIFIED → dialogue loop.

-
+ ### Step 3: Dialogue Loop

-
- [codex 0m10s elapsed, 0 bytes written to .reviews/latest-review.md] still running...
- [codex 0m20s elapsed, 1342 bytes written to .reviews/latest-review.md] still running...
- [codex finished in 47s with rc=0]
- ```
+ Per-finding response in `.reviews/response.json`: `{"finding": "1", "action": "FIXED|DISPUTED|ACCEPTED", "summary": "..."}`. Update `handoff.json`: increment `round`, status `PENDING_RECHECK`, add `fixes_applied` (numbered, file:line refs).

-
+ Recheck prompt: "TARGETED RECHECK. For each finding: FIXED → verify certify condition. DISPUTED → ACCEPT if sound, REJECT with reasoning. ACCEPTED → verify applied. Do NOT raise new findings unless P0. End with score, CERTIFIED or NOT CERTIFIED."

- **
+ **Convergence:** 2 rounds is the sweet spot, 3 max (research: 14 repos + 7 papers). After 3 still NOT CERTIFIED → escalate to user.

-
+ **Anti-patterns:** "find at least N problems," "review this," 1-10 without criteria, letting reviewer see author's reasoning (anchoring).

-
+ **Multiple reviewers** (Claude review + Codex + human): `gh api repos/OWNER/REPO/pulls/PR/comments` for all feedback, respond to each reviewer independently (different blind spots), pick stronger argument on conflicts, max 3 iterations per reviewer.

-
-
- 1. Write `.reviews/response.json`:
- ```jsonc
- {
-   "review_id": "feature-xyz-001",
-   "round": 2,
-   "responding_to": ".reviews/latest-review.md",
-   "responses": [
-     { "finding": "1", "action": "FIXED", "summary": "Added missing validation" },
-     { "finding": "2", "action": "DISPUTED", "justification": "Intentional — see CODE_REVIEW_EXCEPTIONS.md" },
-     { "finding": "3", "action": "ACCEPTED", "summary": "Will add test coverage" }
-   ]
- }
- ```
- - **FIXED**: "I fixed this. Here's what changed." Reviewer verifies against certify condition.
- - **DISPUTED**: "This is intentional/incorrect. Here's why." Reviewer accepts or rejects with reasoning.
- - **ACCEPTED**: "You're right. Fixing now." (Same as FIXED, batched.)
-
- 2. Update `handoff.json`: increment `round`, set `"status": "PENDING_RECHECK"`, add `fixes_applied` list with numbered items and file:line references, update `previous_score`.
-
- 3. Run targeted recheck (NOT a full re-review):
- ```bash
- codex exec \
-   -c 'model_reasoning_effort="xhigh"' \
-   -s danger-full-access \
-   -o .reviews/latest-review.md \
-   "TARGETED RECHECK — not a full re-review. Read .reviews/handoff.json \
-   for previous_review path and response.json for the author's responses. \
-   For each finding: FIXED → verify against original certify condition. \
-   DISPUTED → evaluate justification (ACCEPT if sound, REJECT with reasoning). \
-   ACCEPTED → verify it was applied. \
-   Do NOT raise new findings unless P0 (critical/security). \
-   New observations go in 'Notes for next review' (non-blocking). \
-   Re-verify all prior passes still hold. \
-   End with: score (1-10), CERTIFIED or NOT CERTIFIED."
- ```
-
- ### Convergence
-
- **2 rounds is the sweet spot. 3 max.** Research across 14 repos and 7 papers confirms additional rounds beyond 3 produce <5% position shift.
-
- Max 2 recheck rounds (3 total including initial review). If still NOT CERTIFIED after round 3, escalate to the user with a summary of open findings.
+ **Non-code domains** (research, persuasion, medical): same handoff format, adapt `review_instructions` + `verification_checklist`, add `audience` + `stakes`.

-
- Preflight → handoff.json (round 1) → FULL REVIEW
-   |
- CERTIFIED? → YES → CI
-   |
-   NO (scored findings)
-   |
- response.json (FIXED/DISPUTED/ACCEPTED)
-   |
- handoff.json (round 2+) → TARGETED RECHECK
-   |
- CERTIFIED? → YES → CI
-   |
-   NO → one more round, then escalate
- ```
+ ### Release Review Focus

-
+ Before any release/publish, add to `verification_checklist`: **CHANGELOG consistency** (sections present, no lost entries), **Version parity** (package.json + SDLC.md + CHANGELOG + wizard metadata), **Stale examples** (hardcoded version strings), **Docs accuracy** (README + ARCHITECTURE reflect current features), **CLI-distributed file parity** (live skills/hooks match CLI templates).
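The version-parity item above is easy to script. A hedged sketch — the helper name and file list are illustrative assumptions, not the wizard's actual checker:

```shell
# Report release artifacts that do not mention the given version string.
check_version_parity() {
  ver=$1; shift
  for f in "$@"; do
    grep -q "$ver" "$f" || echo "MISSING ${ver} in ${f}"
  done
}
# e.g. check_version_parity "$(node -p 'require("./package.json").version')" \
#        CHANGELOG.md skills/sdlc/SKILL.md
```

Run it as part of the release checklist; any `MISSING` line is a parity failure to fix before publishing.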
 
-
+(Full protocol with rationale and convergence diagrams: `CLAUDE_CODE_SDLC_WIZARD.md` → Cross-Model Review.)
 
-
-- **"Review this"** — Too vague, gets generic feedback. Use mission + verification checklist
-- **Numeric 1-10 scales without criteria** — Unreliable. Decompose into specific checklist items
-- **Letting reviewer see author's reasoning** — Causes anchoring bias. Let them form independent opinion from code
+## Documentation Sync (REQUIRED — During Planning)
 
-
+**Docs MUST be current before commit.** Stale docs = wrong implementations = wasted sessions.
 
-
-- **CHANGELOG consistency** — all sections present, no lost entries during consolidation
-- **Version parity** — package.json, SDLC.md, CHANGELOG, wizard metadata all match
-- **Stale examples** — hardcoded version strings in docs match current release
-- **Docs accuracy** — README, ARCHITECTURE.md reflect current feature set
-- **CLI-distributed file parity** — live skills, hooks, settings match CLI templates
+Standard pattern: `*_DOCS.md` — living documents that grow with the feature (`AUTH_DOCS.md`, `PAYMENTS_DOCS.md`).
 
-
+1. Read feature docs for the area being changed during planning
+2. When a code change contradicts what the doc says → MUST update the feature doc
+3. When a code change extends behavior the doc describes → MUST update the feature doc (add new behavior)
+4. No `*_DOCS.md` exists and feature touches 3+ files → create one
+5. Project has `ROADMAP.md` → mark items done, add new items (ROADMAP feeds CHANGELOG)
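Step 4 can be spot-checked mechanically. A sketch under stated assumptions — the per-feature directory layout and the 3-file threshold mirror the rule above, and `docs_gap` is a hypothetical helper, not wizard code:

```bash
# Flags a feature directory when a change touches 3+ of its files but no
# *_DOCS.md lives there. Pass the feature dir, then the changed paths
# (e.g. from `git diff --name-only`).
docs_gap() {
  dir="$1"; shift
  changed=0
  for f in "$@"; do
    case "$f" in
      "$dir"/*) changed=$((changed + 1)) ;;
    esac
  done
  if [ "$changed" -ge 3 ] && ! ls "$dir"/*_DOCS.md >/dev/null 2>&1; then
    echo "DOCS GAP: $dir ($changed files changed, no *_DOCS.md)"
    return 1
  fi
  echo "ok: $dir"
}
```

A DOCS GAP is a planning-phase finding, not a hard failure — it prompts "create one", per step 4.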
 
-
+`/claude-md-improver` audits CLAUDE.md structure. Run it periodically. It does NOT cover feature docs — the SDLC workflow handles those.
 
-
-2. **Respond per-reviewer** — Each reviewer has different blind spots and priorities. Address each one's findings separately
-3. **Resolve conflicts** — If reviewers disagree, pick the stronger argument, note why
-4. **Iterate until all approve** — Don't merge until every active reviewer is satisfied
-5. **Max 3 iterations per reviewer** — If a reviewer keeps finding new things, escalate to the user
+## CI Feedback Loop — Local Shepherd
 
-
+**NEVER AUTO-MERGE. Do NOT run `gh pr merge --auto`.** Auto-merge fires before review feedback can be read. The shepherd loop IS the process.
 
-
+Mandatory steps:
+1. Push to remote
+2. `gh pr checks --watch`
+3. **Read CI logs whether pass or fail** (`gh run view <RUN_ID> --log`, not just `--log-failed`). Passing CI hides warnings, skipped steps, degraded scores
+4. **Cross-model audit the CI logs** — pipe to a tmp file, run `codex exec -c 'model_reasoning_effort="xhigh"' -s danger-full-access` with *"Audit for silent failures, skipped tests, degraded metrics, warnings-that-should-be-errors."* Tier 1 + Tier 2 separately
+5. CI fails → diagnose, fix, push (max 2 attempts)
+6. CI passes → `gh api repos/OWNER/REPO/pulls/PR/comments` for review feedback
+7. Implement valid suggestions (bugs, perf, missing error handling, dedup, coverage). Skip opinions/style. Max 3 iterations
+8. Explicit `gh pr merge --squash`
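A mechanical first pass for step 3 can be sketched as a grep over the saved log. The pattern list is illustrative, not exhaustive, and the cross-model audit in step 4 remains the real gate:

```bash
# Scans a saved CI log for patterns that hide inside green runs. Save the
# log first, e.g.: gh run view <RUN_ID> --log > /tmp/ci.log
audit_ci_log() {
  log="$1"
  hits=$(grep -icE 'warning|skipped|deprecated|continue-on-error' "$log" || true)
  if [ "$hits" -gt 0 ]; then
    echo "SUSPECT: $hits matching lines in $log — read them before trusting the checkmark"
    return 1
  fi
  echo "clean: no suspect patterns in $log"
}
```

A SUSPECT result means "read those lines", not "CI failed" — the green checkmark stays necessary but not sufficient.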
 
-
-|--------|-------------------|-------------------|
-| **Code (default)** | Security, logic bugs, test coverage | "Verify input validation at file:line" |
-| **Research/Docs** | Factual accuracy, source verification, overclaims | "Verify $736-$804 appears in both docs, no stale $695-$723 remains" |
-| **Persuasion** | Audience psychology, tone, trust | "If you were [audience], what's the moment you'd stop reading?" |
+**Evidence:** PR #145 auto-merged before review was read; reviewer found a P1 dead-code bug that shipped. v1.24.0 only checked the green checkmark on round 2; passing CI hides degraded E2E scores and silent test exclusions. Use idle CI time (3-5 min) for `/compact` if context is long.
 
-
+## Scope, DRY, Patterns, Legacy
 
-
+- **Scope guard** — only task-related changes. Notice something else → NOTE in summary, don't fix unless asked. AI drift into "helpful" changes breaks unrelated things.
+- **DRY** — before coding: "what patterns exist to reuse?" After: "did I duplicate anything?"
+- **New patterns** require human approval: search first, propose if no equivalent, get explicit approval.
+- **DELETE legacy code** — backwards-compat shims, "just in case" fallbacks → gone. If it breaks, fix properly.
 
-
+## Debugging Workflow (Systematic)
 
-
-- **`ci-debug`** — CI failure diagnosis (reads logs, identifies root cause, suggests fix)
-- **`test-writer`** — Quality tests following TESTING.md philosophies
+Reproduce → Isolate → Root Cause → Fix → Regression Test. This is the systematic debugging methodology — do not skip steps. Regressions: `git bisect`. Env-specific: check env vars/OS/deps/permissions, reproduce locally, log at the failure point. 2 failed attempts → STOP and ASK USER.
 
-
+## Release Planning (Task Ships a Release)
 
-
+List all items from ROADMAP, plan each at 95% confidence, identify dependencies, present all plans together (catches conflicts/scope creep), pre-release CI audit across merged PRs (warnings, degraded scores, skipped suites — green checkmark insufficient), user approves, then implement in priority order.
 
-
-1. **Testing the right things?** - Not just that tests pass
-2. **Tests prove correctness?** - Or just verify current behavior?
-3. **Follow our philosophies (TESTING.md)?**
-   - Testing Diamond (integration-heavy)?
-   - Minimal mocking (see table below)?
-   - Real fixtures from captured data?
+## Deployment Tasks
 
-**
+Read `ARCHITECTURE.md` Environments table + Deployment Checklist. **Production requires HIGH (90%+); ANY doubt → ASK USER.** **Post-deploy verification:** health check, log scan, smoke tests, monitor 15 min (prod only). Issues → rollback first, then new SDLC loop.
 
-
+## Test Review (Harder Than Implementation)
 
-
-|-------|--------------|------------|-----------|
-| **E2E** | Full user flow through UI/browser (Playwright, Cypress) | ~5% | Slow, brittle, but proves the real thing works |
-| **Integration** | Real systems via API without UI — real DB, real cache, real services | ~90% | **Best bang for buck.** Fast, stable, high confidence |
-| **Unit** | Pure logic only — no DB, no API, no filesystem | ~5% | Fast but limited scope |
+Critique tests harder than app code: testing the right things? Tests prove correctness or just verify current behavior? Follow TESTING.md (Testing Diamond, minimal mocking, real-captured fixtures).
 
-**
+**Testing Diamond:** E2E ~5% (slow, proves real thing) → Integration ~90% (best bang for buck — real DB/cache/services via API, no UI) → Unit ~5% (pure logic only). If no UI/browser, it's integration, not E2E.
 
-
+**Mocking:**
 
 | What | Mock? | Why |
 |------|-------|-----|
-| Database | NEVER |
-| Cache | NEVER |
+| Database | NEVER | Test DB or in-memory |
+| Cache | NEVER | Isolated test instance |
 | External APIs | YES | Real calls = flaky + expensive |
 | Time/Date | YES | Determinism |
 
-
-
-### Unit Tests = Pure Logic ONLY
-
-A function qualifies for unit testing ONLY if:
-- No database calls
-- No external API calls
-- No file system access
-- No cache calls
-- Input -> Output transformation only
-
-Everything else needs integration tests.
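A sketch of what "pure logic" means in practice — `bump_patch` is a made-up example, not project code; it transforms input to output and touches nothing else:

```bash
# Input -> Output only: no DB, no API, no filesystem, no cache.
# This is the only kind of function that qualifies for a unit test.
bump_patch() {
  IFS=. read -r major minor patch <<EOF
$1
EOF
  echo "$major.$minor.$((patch + 1))"
}
```

The moment a function needs a fixture file, a socket, or a clock, it has crossed into integration-test territory.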
-
-### TDD Tests Must PROVE
-
-| Phase | What It Proves |
-|-------|----------------|
-| RED | Test FAILS -> Bug exists or feature missing |
-| GREEN | Test PASSES -> Fix works or feature implemented |
-| Forever | Regression protection |
-
-## Flaky Test Recovery
-
-**Flaky tests are bugs. Period.** See: [How do you Address and Prevent Flaky Tests?](https://softwareautomation.notion.site/How-do-you-Address-and-Prevent-Flaky-Tests-23c539e19b3c46eeb655642b95237dc0)
-
-When a test fails intermittently:
-1. **Don't dismiss it** — "flaky" means "bug we haven't found yet"
-2. **Identify the layer** — test code? app code? environment?
-3. **Stress-test** — run the suspect test N times to reproduce reliably
-4. **Fix root cause** — don't just retry-and-pray
-5. **If CI infrastructure** — make cosmetic steps non-blocking, keep quality gates strict
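The stress-test step above can be sketched as a loop. The test command is whatever invokes the suspect test in your project; the `npm test` invocation in the usage line is illustrative only:

```bash
# Runs a command N times and reports the failure rate. A genuinely flaky
# test fails some runs; a deterministic one fails all or none.
stress() {
  n="$1"; shift
  fails=0
  i=1
  while [ "$i" -le "$n" ]; do
    "$@" >/dev/null 2>&1 || fails=$((fails + 1))
    i=$((i + 1))
  done
  echo "$fails/$n runs failed"
  [ "$fails" -eq 0 ]
}
```

Usage: `stress 50 npm test -- --grep "checkout flow"`. A nonzero failure count that is neither 0/N nor N/N is strong evidence of a timing or ordering bug, not "CI weather".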
-
-## Scope Guard (Stay in Your Lane)
-
-**Only make changes directly related to the task.**
-
-If you notice something else that should be fixed:
-- NOTE it in your summary ("I noticed X could be improved")
-- DON'T fix it unless asked
-
-**Why this matters:** AI agents can drift into "helpful" changes that weren't requested. This creates unexpected diffs, breaks unrelated things, and makes code review harder.
-
-## Debugging Workflow (Systematic Investigation)
-
-When something breaks and the cause isn't obvious, follow this systematic debugging workflow:
-
-```
-Reproduce → Isolate → Root Cause → Fix → Regression Test
-```
-
-1. **Reproduce** — Can you make it fail consistently? If intermittent, stress-test (run N times). If you can't reproduce it, you can't fix it
-2. **Isolate** — Narrow the scope. Which file? Which function? Which input? Use binary search: comment out half the code, does it still fail?
-3. **Root cause** — Don't fix symptoms. Ask "why?" until you hit the actual cause. "It crashes on line 42" is a symptom. "Null pointer because the API returns undefined when rate-limited" is a root cause
-4. **Fix** — Fix the root cause, not the symptom. Write the fix
-5. **Regression test** — Write a test that fails without your fix and passes with it (TDD GREEN)
+Mocks MUST come from real captured data — never guess shapes. Unit tests qualify ONLY for pure I→O (no DB, API, FS, cache).
 
-**
-- Use `git bisect` to find the exact commit that broke it
-- `git bisect start`, `git bisect bad` (current), `git bisect good <known-good-commit>`
-- Bisect narrows to the breaking commit in O(log n) steps
+**TDD proves:** RED (fails — bug or missing feature), GREEN (passes — fix works), Forever (regression protection).
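RED→GREEN in miniature — `is_even` is a hypothetical function, not project code; the point is that one and the same regression test fails against the bug and passes against the fix:

```bash
# The bug: parity check inverted.
is_even_buggy() { [ $(( $1 % 2 )) -eq 1 ]; }
# The fix.
is_even() { [ $(( $1 % 2 )) -eq 0 ]; }

# The same regression test, run against both implementations:
regression_test() { "$1" 4 && "$1" 100 && ! "$1" 7; }

regression_test is_even_buggy || echo "RED: fails on the bug — bug proven"
regression_test is_even && echo "GREEN: passes on the fix — fix proven"
```

The test then stays in the suite forever as regression protection — the third row of the table.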
 
-
-- Check environment differences: env vars, OS version, dependency versions, file permissions
-- Reproduce the environment locally if possible (Docker, env vars)
-- Add logging at the failure point — don't guess, observe
+## Prove It Gate (New Additions Only)
 
-**
-- After 2 failed fix attempts → STOP and ASK USER
-- If the bug is in code you don't understand → read first, then fix
-- If reproducing requires access you don't have → ASK USER
+Adding a new skill/hook/workflow? Default answer is NO. Prove it: (1) **Absorption check** — can this be a section in an existing skill? (2) Research existing equivalents (native CC, third-party, existing skill). (3) If yes — why is yours better with evidence. (4) If no — real gap or theoretical? (5) **Quality tests** must prove OUTPUT QUALITY (existence tests prove nothing). (6) Less is more — every addition is maintenance burden.
 
-
-
-**This is the "local shepherd" — the CI fix mechanism.** It runs in your active session with full context.
-
-**The SDLC doesn't end at local tests.** CI must pass too.
-
-```
-Local tests pass -> Commit -> Push -> Watch CI
-                                        |
-CI passes? -+-> YES -> Present for review
-            |
-            +-> NO -> Fix -> Push -> Watch CI
-                              |
-                        (max 2 attempts)
-                              |
-                        Still failing?
-                              |
-                        STOP and ASK USER
-```
-
-```
-┌─────────────────────────────────────────────────────────────────────┐
-│ NEVER AUTO-MERGE. NO EXCEPTIONS.                                    │
-│                                                                     │
-│ Do NOT run `gh pr merge --auto`. Ever.                              │
-│ Auto-merge fires before you can read review feedback.               │
-│ The shepherd loop IS the process. Skipping it = shipping bugs.      │
-└─────────────────────────────────────────────────────────────────────┘
-```
-
-**The full shepherd sequence — every step is mandatory:**
-1. Push changes to remote
-2. Watch CI: `gh pr checks --watch`
-3. Read CI logs — **pass or fail**: `gh run view <RUN_ID> --log` (not just `--log-failed`). Passing CI can still hide warnings, skipped steps, or degraded scores. Don't just check the green checkmark
-4. **Cross-model review the CI logs themselves** — pipe `gh run view <RUN_ID> --log` to a tmp file and run `codex exec -c 'model_reasoning_effort="xhigh"' -s danger-full-access` with a prompt like *"Audit this CI log for silent failures, skipped tests, degraded metrics, or warnings-that-should-be-errors. Green checkmark is necessary but not sufficient."* A second model catches things the first missed (e.g., a job that passed but degraded an E2E score by 30%, or a test that was silently excluded). Cheap — one extra `codex exec` per PR. **Run separately on Tier 1 quick-check AND Tier 2 5x evaluation logs** — they exercise different code paths, so a clean Tier 1 audit doesn't imply a clean Tier 2. Evidence from PR #206: Tier 1 audit found 3 P1s (Node 24 false-green, "11/10" score leak, E2E incomplete); Tier 2 audit TBD — value is measured by running both and comparing.
-5. If CI fails → diagnose from logs, fix, push again (max 2 attempts)
-6. If CI passes → read ALL review comments: `gh api repos/OWNER/REPO/pulls/PR/comments`
-7. Fix valid suggestions, push, iterate until clean
-8. Only then: explicit merge with `gh pr merge --squash`
-
-**Why this is non-negotiable:** PR #145 auto-merged a release before review feedback was read. CI reviewer found a P1 dead-code bug that shipped to main. The fix required a follow-up commit. Auto-merge cost more time than the shepherd loop would have taken.
-
-**Why read passing logs:** v1.24.0 release only read logs on failure (round 1), then just checked the green checkmark on round 2. Passing CI can hide warnings, skipped steps, degraded E2E scores, or silent test exclusions. A green checkmark is necessary but not sufficient.
-
-**Context GC (compact during idle):** While waiting for CI (typically 3-5 min), suggest `/compact` if the conversation is long. Think of it like a time-based garbage collector — idle time + high memory pressure = good time to collect. Don't suggest on short conversations.
-
-**CI failures follow same rules as test failures:**
-- Your code broke it? Fix your code
-- CI config issue? Fix the config
-- Flaky? Investigate - flakiness is a bug
-- Stuck? ASK USER
-
-## CI Review Feedback Loop — Local Shepherd (After CI Passes)
-
-**CI passing isn't the end.** If CI includes a code reviewer, read and address its suggestions.
-
-```
-CI passes -> Read review suggestions
-                  |
-Valid improvements? -+-> YES -> Implement -> Run tests -> Push
-                     |                          |
-                     |                   Review again (iterate)
-                     |
-                     +-> NO (just opinions/style) -> Skip, note why
-                     |
-                     +-> None -> Done, present to user
-```
-
-**How to evaluate suggestions:**
-1. Read all CI review comments: `gh api repos/OWNER/REPO/pulls/PR/comments`
-2. For each suggestion, ask: **"Is this a real improvement or just an opinion?"**
-   - **Real improvement:** Fixes a bug, improves performance, adds missing error handling, reduces duplication, improves test coverage → Implement it
-   - **Opinion/style:** Different but equivalent formatting, subjective naming preference, "you could also..." without clear benefit → Skip it
-3. Implement the valid ones, run tests locally, push
-4. CI re-reviews — repeat until no substantive suggestions remain
-5. Max 3 iterations — if reviewer keeps finding new things, ASK USER
-
-**The goal:** User is only brought in at the very end, when both CI and reviewer are satisfied. The code should be polished before human review.
-
-**Customizable behavior** (set during wizard setup):
-- **Auto-implement** (default): Implement valid suggestions autonomously, skip opinions
-- **Ask first**: Present suggestions to user, let them decide which to implement
-- **Skip review feedback**: Ignore CI review suggestions, only fix CI failures
-
-## Context Management
-
-- `/compact` between planning and implementation (plan preserved in summary)
-- `/clear` between unrelated tasks (stale context wastes tokens and misleads)
-- `/clear` after 2+ failed corrections (context polluted — start fresh with better prompt)
-- Auto-compact fires at ~95% capacity — no manual management needed
-- After committing a PR, `/clear` before starting the next feature
-- **Autocompact tuning:** Set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` to trigger compaction earlier (75% for 200K, 30% for 1M). On 1M models, the default fires at ~76K — pick ONE of: `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` **OR** `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (do NOT set both — they compound to 30% × 400K = 120K trigger ≈ 12% of 1M, which fires almost immediately, #207). See wizard doc "Autocompact Tuning" for full details
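The compounding warning above, as arithmetic — a sketch showing why setting both variables misfires:

```bash
# Intended effect of setting ONLY the percentage: 30% of a 1M-token context.
echo "pct alone:    $(( 1000000 * 30 / 100 )) tokens"
# Intended effect of setting ONLY the window: compaction near 400K.
echo "window alone: 400000 tokens"
# Both set: the percentage applies to the window, not the model context.
echo "compounded:   $(( 400000 * 30 / 100 )) tokens"
```

120K is roughly 12% of a 1M context, so with both variables set compaction fires almost as soon as the session warms up.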
-
-**`--bare` mode (v2.1.81+):** `claude -p "prompt" --bare` skips ALL hooks, skills, LSP, and plugins. This is a complete wizard bypass — no SDLC enforcement, no TDD checks, no planning hooks. Use only for scripted headless calls (CI pipelines, automation) where you explicitly don't want wizard enforcement. Never use `--bare` for normal development work.
-
-## DRY Principle
-
-**Before coding:** "What patterns exist I can reuse?"
-**After coding:** "Did I accidentally duplicate anything?"
-
-## Design System Check (If UI Change)
-
-**When to check:** CSS/styling changes, new UI components, color/font usage.
-**When to skip:** Backend-only changes, config/build changes, non-visual code.
-
-**Planning phase - "Design system check":**
-1. Read DESIGN_SYSTEM.md if it exists
-2. Check if change involves colors, fonts, spacing, or components
-3. Verify intended styles match design system tokens
-4. Flag if introducing new patterns not in design system
-
-**Review phase - "Visual consistency check":**
-1. Are colors from the design system palette?
-2. Are fonts/sizes from typography scale?
-3. Are spacing values from the spacing scale?
-4. Do new components follow existing patterns?
-
-**If no DESIGN_SYSTEM.md exists:** Skip these checks (project has no documented design system).
-
-## Release Planning (If Task Involves a Release)
-
-**When to check:** Task mentions "release", "publish", "version bump", "npm publish", or multiple items being shipped together.
-**When to skip:** Single feature implementation, bug fix, or anything that isn't a release.
-
-Before implementing any release items:
-
-1. **List all items** — Read ROADMAP.md (or equivalent), identify every item planned for this release
-2. **Plan each at 95% confidence** — For each item: what files change, what tests prove it works, what's the blast radius. If confidence < 95% on any item, flag it
-3. **Identify blocks** — Which items depend on others? What must go first?
-4. **Present all plans together** — User reviews the complete batch, not one at a time. This catches conflicts, sequencing issues, and scope creep before any code is written
-5. **Pre-release CI audit** — Before cutting the release, review CI runs across ALL PRs merged since last release. Look for: warnings in passing runs, degraded E2E scores, skipped test suites, silent failures masked by `continue-on-error`. Use `gh run list` + `gh run view <ID> --log` to audit. A green checkmark is necessary but not sufficient
-6. **User approves, then implement** — Full SDLC per item (TDD RED → GREEN → self-review), in the prioritized order
-
-**Why batch planning works:** Ad-hoc one-at-a-time implementation leads to unvalidated additions and scope creep. Batch planning catches problems early — if you can't plan it at 95%, you're not ready to ship it.
-
-**Why pre-release CI audit:** v1.24.0 shipped without auditing CI logs across merged PRs #150-#152. Passing CI doesn't mean nothing fishy got through — warnings, degraded scores, and skipped steps can hide in green runs.
-
-## Deployment Tasks (If Task Involves Deploy)
-
-**When to check:** Task mentions "deploy", "release", "push to prod", "staging", etc.
-**When to skip:** Code changes only, no deployment involved.
-
-**Before any deployment:**
-1. Read ARCHITECTURE.md → Find the Environments table and Deployment Checklist
-2. Verify which environment is the target (dev/staging/prod)
-3. Follow the deployment checklist in ARCHITECTURE.md
-
-**Confidence levels for deployment:**
-
-| Target | Required Confidence | If Lower |
-|--------|---------------------|----------|
-| Dev/Preview | MEDIUM or higher | Proceed with caution |
-| Staging | MEDIUM or higher | Proceed, note uncertainties |
-| **Production** | **HIGH only** | **ASK USER before deploying** |
-
-**Production deployment requires:**
-- All tests passing
-- Production build succeeding
-- Changes tested in staging/preview first
-- HIGH confidence (90%+)
-- If ANY doubt → ASK USER first
-
-**If ARCHITECTURE.md has no Environments section:** Ask user "How do you deploy to [target]?" before proceeding.
-
-**After deploying — Post-Deploy Verification:**
-1. Read ARCHITECTURE.md → Find the Post-Deploy Verification table
-2. Run health check for the target environment
-3. Check logs for new errors
-4. Run smoke tests if configured
-5. Monitor error rates for 15 min (production only)
-6. If issues found → rollback first, then start new SDLC loop to fix
-
-**If ARCHITECTURE.md has no Post-Deploy section:** Ask user "How do you verify [target] is working after deploy?"
-
-## DELETE Legacy Code
-
-- Legacy code? DELETE IT
-- Backwards compatibility? NO - DELETE IT
-- "Just in case" fallbacks? DELETE IT
-
-**THE RULE:** Delete old code first. If it breaks, fix it properly.
-
-## Documentation Sync (REQUIRED — During Planning)
-
-Feature docs MUST be current before commit. Docs are code — stale docs mislead future sessions, waste tokens, and cause wrong implementations.
-
-**Standard pattern:** `*_DOCS.md` — living documents that grow with the feature (e.g., `AUTH_DOCS.md`, `PAYMENTS_DOCS.md`, `SEARCH_DOCS.md`). Same philosophy as `TESTING.md` and `ARCHITECTURE.md` — one source of truth per topic, kept current.
-
-```
-┌─────────────────────────────────────────────────────────────────────┐
-│ DOCS MUST BE CURRENT BEFORE COMMIT.                                 │
-│                                                                     │
-│ Stale docs = wrong implementations = wasted sessions.               │
-│ If you changed the feature, update its doc. No exceptions.          │
-└─────────────────────────────────────────────────────────────────────┘
-```
-
-1. **During planning**, read feature docs for the area being changed (`*_DOCS.md`, `docs/features/`, `docs/decisions/`)
-2. If your code change contradicts what the doc says → MUST update the doc
-3. If your code change extends behavior the doc describes → MUST add to the doc
-4. If no `*_DOCS.md` exists and the feature touches 3+ files → create one. Keep it simple: what the feature does, key decisions, gotchas. Same structure as TESTING.md (topic-focused, not exhaustive)
-5. If the project has a `ROADMAP.md` → update it (mark items done, add new items). ROADMAP feeds CHANGELOG — keeping it current means releases write themselves
-
-**Doc staleness signals:** Low confidence in an area often means the docs are stale, missing, or misleading. If you struggle during planning, check whether the docs match the actual code.
-
-**CLAUDE.md health:** `/claude-md-improver` audits CLAUDE.md structure and completeness. Run it periodically. It does NOT cover feature docs — the SDLC workflow handles those.
+If you can't write a quality test for it, you can't prove it works.
 
 ## After Session (Capture Learnings)
 
-
-
-
--
-
-
+| Insight | Destination |
+|---------|-------------|
+| Testing patterns/gotchas | `TESTING.md` |
+| Feature-specific quirks | `*_DOCS.md` (e.g., `AUTH_DOCS.md`) |
+| Architecture decisions | `docs/decisions/` (ADR) or `ARCHITECTURE.md` |
+| General project context | `CLAUDE.md` (or `/revise-claude-md`) |
+| Plan files (work done) | Delete or mark complete (stale plans mislead) |
 
 ### Memory Audit Protocol
 
-Per-user memory at `~/.claude/projects/<proj>/memory/` accumulates private learnings. Some
+Per-user memory at `~/.claude/projects/<proj>/memory/` accumulates private learnings. Some are portable lessons (tool quirks, platform gotchas) worth promoting to wizard docs.
 
-**When to run:**
-- End-of-release (before cutting a tag)
-- After a debugging-heavy session with multiple memory additions
-- On explicit "audit my memory" request
+**When to run:** end-of-release, after debugging-heavy sessions, or on explicit "audit my memory" request.
 
-**
+**Rule-based denylist** (deterministic, no LLM):
+- `type: user` → keep (user identity, preferences — never promote)
+- `type: reference` → keep (external pointers, private by default)
+- `type: project` → manual review (mixed state + portable lesson)
+- `type: feedback` → manual review (mixed personal preference + portable rule)
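The four rules above, sketched as a classifier. The real reference implementation lives in the wizard's test suite; this re-derivation only illustrates the normalization the full protocol requires (quoted values, trailing comments, stray whitespace):

```bash
# Normalizes a frontmatter line like `type: "user"  # note` and returns
# the denylist verdict. Entries outside the four known types fall through
# to human-gated review.
apply_denylist_rule() {
  t="${1#*:}"                                  # drop the "type:" key
  t="${t%%#*}"                                 # strip trailing comment
  t=$(printf '%s' "$t" | tr -d ' \011\047"')   # strip spaces, tabs, quotes
  case "$t" in
    user|reference)   echo keep ;;
    project|feedback) echo manual-review ;;
    *)                echo human-review ;;
  esac
}
```

Keeping the rule table in one `case` statement is what makes it deterministic — no LLM in the loop, exactly as the protocol demands.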
 
-
-- `type: user` → `keep` (user identity, preferences — never promote)
-- `type: reference` → `keep` (external pointers to Discord/URL/etc — private by default)
-- `type: project` → `manual-review` (often mixed state + portable lesson — human decides)
-- `type: feedback` → `manual-review` (often mixed personal preference + portable rule — human decides)
-- Parser must normalize YAML variants (`type: "user"`, `type: user # comment`, surrounding whitespace) — see `tests/test-memory-audit-protocol.sh::apply_denylist_rule` for the reference implementation
-2. **Remaining entries** (no type, or type outside the 4 above) fall through to human-gated review. An LLM-assisted classification runner is Prove-It-Gated: build it only after running this protocol 4+ times with manual classification. Until then, human review at promotion time IS the quality gate
+**Destinations for promote entries** (no new files): tool/platform gotchas → `SDLC.md` `## Lessons Learned`. Testing → `TESTING.md`. Tool quirks tied to a skill → that `SKILL.md`. Process rules → `CLAUDE.md`.
 
-**
+**Tracking:** `promoted_to: <path>` in the memory file's YAML frontmatter; later audits skip already-promoted entries.
 
-
-|---------|--------|
-| Language/tool/platform gotchas (bash, gh CLI, GHA, macOS) | `SDLC.md` → `## Lessons Learned` section |
-| Testing gotchas (flaky patterns, mock-vs-integration lessons) | `TESTING.md` |
-| Tool-specific quirks tied to a skill | That skill's `SKILL.md` |
-| Process rules that should govern the project | `CLAUDE.md` |
+**Human gate is MANDATORY.** Protocol produces diffs; user approves chunk-by-chunk. Never auto-apply. Prove-It: build a `/memory-audit` slash command only after running 4+ times manually. (Full protocol: wizard doc.)
 
-
-
-**Human gate is MANDATORY.** Protocol produces diffs; user approves chunk-by-chunk before apply. Never auto-apply — private memory touching public docs needs human judgement.
-
-**Prove It Gate:** If you find yourself running this protocol 4+ times and manually doing the same classification work, that's evidence to build a `/memory-audit` slash command AND wire the LLM-gated quality tests (8/10 classification, 6/6 destination). Until then, protocol + human review is enough — and no stub tests that skip (they mislead reviewers into thinking a gate exists when it doesn't).
-
-## Post-Mortem: When Process Fails, Feed It Back
-
-**Every process failure becomes an enforcement rule.** When you skip a step and it causes a problem, don't just fix the symptom — add a gate so it can't happen again.
+## Post-Mortem: Process Failures Become Rules
 
 ```
 Incident → Root Cause → New Rule → Test That Proves the Rule → Ship
 ```
 
-
-1. **What happened?** — Describe the incident (what went wrong, what was the impact)
-2. **Root cause** — Not "I forgot" — what structurally allowed the skip? Was it guidance (easy to ignore) instead of a gate (impossible to skip)?
-3. **New rule** — Turn the failure into an enforcement rule in the SDLC skill
-4. **Test** — Write a test that proves the rule exists (TDD — the rule is code too)
-5. **Evidence** — Reference the incident so future readers understand WHY the rule exists
+Don't fix only the symptom. Add a gate so it can't happen again. Example: PR #145 auto-merged before CI review → "NEVER AUTO-MERGE" block + `test_never_auto_merge_gate`.
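The "Test That Proves the Rule" step in miniature — `test_never_auto_merge_gate` is the doc's own example name, but the body here is a hedged sketch, not the wizard's actual test:

```bash
# Fails if the enforcement rule ever gets edited out of the skill file,
# so a post-mortem rule can't silently regress.
test_never_auto_merge_gate() {
  skill="$1"
  if grep -q 'NEVER AUTO-MERGE' "$skill"; then
    echo "ok: auto-merge gate present"
  else
    echo "FAIL: auto-merge gate missing from $skill"
    return 1
  fi
}
```

The rule is code too: this test goes in the suite like any other, and removing the rule turns CI red.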
 
-
+## Context Management & Subagents
 
-
+- `/compact` between planning and implementation (plan preserved in summary)
+- `/clear` between unrelated tasks; after 2+ failed corrections (context polluted)
+- Auto-compact fires at ~95% capacity
+- After committing a PR, `/clear` before next feature
+- `--bare` mode (v2.1.81+) skips ALL hooks/skills/LSP/plugins. Scripted headless only — never normal development.
+- Custom subagents (`.claude/agents/`) run autonomously and return results. Skills guide behavior; agents do work. Use for parallel tasks or fresh context. Examples: `sdlc-reviewer`, `ci-debug`, `test-writer`.
 
-
+## Design System Check (UI Changes Only)
 
-
+Read `DESIGN_SYSTEM.md` if it exists. Verify colors/fonts/spacing match tokens; flag new patterns not in the design system. Skip on backend/config/non-visual code.
+
+---
+
+**Full reference:** `CLAUDE_CODE_SDLC_WIZARD.md` (cross-model review, deployment, debugging, post-mortem, memory audit, design system). `TESTING.md` (testing diamond + mocking). `ARCHITECTURE.md` (environments + post-deploy).