agentic-sdlc-wizard 1.15.0

@@ -0,0 +1,3470 @@
1
+ # Claude Code SDLC Setup Wizard
2
+
3
+ > **Contribute**: This wizard is community-driven. PRs welcome at [github.com/BaseInfinity/agentic-ai-sdlc-wizard](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard) - your discoveries help everyone.
4
+
5
+ > **For Humans**: This wizard helps you implement a battle-tested SDLC enforcement system for Claude Code. It will scan your project, ask questions, and walk you through setup step-by-step. Works for solo developers, teams, and organizations alike.
6
+
7
+ > **Important**: This wizard is a **setup guide**, not a file you keep in your repo. Run it once to generate your SDLC files (hooks, skills, docs), then check for updates periodically by asking Claude "Check if the SDLC wizard has updates".
8
+
9
+ ## What This Is: SDLC for AI Agents
10
+
11
+ **This SDLC is designed for Claude (the AI) to follow, not humans.**
12
+
13
+ You set it up, Claude follows it. The magic is that structured human engineering practices (planning, TDD, confidence levels) happen to be exactly what AI agents need to stay on track.
14
+
15
+ | Human SDLC | Why It Works for AI |
16
+ |------------|---------------------|
17
+ | Plan before coding | AI must understand before acting, or it guesses wrong |
18
+ | TDD Red-Green-Pass | AI needs concrete pass/fail feedback to verify its work |
19
+ | Confidence levels | AI needs to know when to ask vs when to proceed |
20
+ | Self-review | AI catches its own mistakes before showing you |
21
+ | TodoWrite visibility | You see what AI is doing (no black box) |
22
+
23
+ **The result:** Claude follows a disciplined engineering process automatically. You just review and approve.
24
+
25
+ ---
26
+
27
+ ## The Vision
28
+
29
+ **Think Iron Man:** Jarvis is nothing without Tony Stark. Tony Stark is still Tony Stark. But together? They make Iron Man. This SDLC is your suit - you build it over time, improve it for your needs, and it makes you both better.
30
+
31
+ **This wizard is designed to make itself unnecessary.**
32
+
33
+ As Claude Code improves, the wizard absorbs those improvements and removes its own scaffolding. Built-in TDD enforcement? Delete our hook. Native confidence tracking? Remove our guidance. Official code review plugin? Use theirs, delete ours. Every Claude Code release is an opportunity to simplify.
34
+
35
+ **The end goal:** This entire wizard becomes part of Claude Code itself. The patterns here — planning before coding, TDD enforcement, confidence levels, self-review — are exactly what every AI agent needs. Until Anthropic builds them in natively, this wizard bridges the gap.
36
+
37
+ **But here's the key:** This isn't a one-size-fits-all answer. It's a starting point that helps you find YOUR answer. Every project is different. The self-evaluating loop (plan → build → test → review → improve) needs to be tuned to your codebase, your team, your standards. The wizard gives you the framework — you shape it into something bespoke.
38
+
39
+ **The living system:**
40
+ - CI self-heal captures friction signals as GitHub issues for pattern analysis
41
+ - You approve changes to the process
42
+ - Both sides learn over time
43
+ - The system improves the system (recursive improvement)
44
+
45
+ **This is a partnership, not a rulebook.**
46
+
47
+ ---
48
+
49
+ ## KISS: Keep It Simple, Stupid
50
+
51
+ **A core principle of this SDLC - not just for coding, but for the entire development process.**
52
+
53
+ When implementing features, fixing bugs, or designing systems:
54
+ - **If something feels complex** - strip away a layer and simplify
55
+ - **If you're confused** - is this the right approach? Is there a better way?
56
+ - **If it's hard** - question WHY it's hard. Maybe it's hard for the wrong reasons.
57
+
58
+ **Don't power through complexity.** Step back and simplify. The simplest solution that works is usually the best one.
59
+
60
+ This applies to:
61
+ - Code you write
62
+ - Architecture decisions
63
+ - Test strategies
64
+ - The SDLC process itself
65
+
66
+ **When in doubt, simplify.**
67
+
68
+ ---
69
+
70
+ ## Testing AI Tool Updates
71
+
72
+ When your AI tools update, how do you know if the update is safe?
73
+
74
+ **The Problem:**
75
+ - AI behavior is stochastic - same prompt, different outputs
76
+ - Single test runs can mislead (variance looks like regression)
77
+ - "It feels slower" isn't data
78
+
79
+ **The Solution: Statistical A/B Testing**
80
+
81
+ | Phase | What You Test | Question |
82
+ |-------|---------------|----------|
83
+ | **Regression** | Old version vs new version | Did the update break anything? |
84
+ | **Improvement** | New version vs new version + changes | Do suggested changes help? |
85
+
86
+ **Statistical Rigor:**
87
+ - Run multiple trials (5+) to account for variance
88
+ - Use 95% confidence intervals
89
+ - Only claim regression/improvement when CIs don't overlap
90
+ - Overlapping CIs = no significant difference detected = treat the update as safe
91
+
92
+ This guards against false positives (crying wolf over noise) and, with enough trials, false negatives (missing real regressions).
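
The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the wizard's actual CI code: the function names, the score scale, and the assumption that a higher score is better are all hypothetical. It uses a Student-t interval because the number of trials is small.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def confidence_interval(scores):
    """Return the (low, high) 95% confidence interval for the mean score."""
    n = len(scores)
    if n - 1 not in T_CRIT_95:
        raise ValueError("this sketch supports 2 to 10 trials")
    half_width = T_CRIT_95[n - 1] * stdev(scores) / sqrt(n)
    center = mean(scores)
    return (center - half_width, center + half_width)


def compare(baseline, candidate):
    """Claim a change only when the two intervals do not overlap.

    Assumes higher scores are better (hypothetical scoring scale).
    """
    base_low, base_high = confidence_interval(baseline)
    cand_low, cand_high = confidence_interval(candidate)
    if cand_high < base_low:
        return "REGRESSION"
    if cand_low > base_high:
        return "IMPROVEMENT"
    return "NO SIGNIFICANT DIFFERENCE"
```

Calling `compare(old_scores, new_scores)` with five scores per version gives a verdict for Phase A. The same call with "new version" and "new version + changes" covers Phase B. The non-overlap check is deliberately conservative, so small real differences may need more trials to show up.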
93
+
94
+ **How We Apply This:**
95
+ - Weekly workflow tests new Claude Code versions before recommending upgrade
96
+ - Phase A: Does new CC version break SDLC enforcement?
97
+ - Phase B: Do changelog-suggested improvements actually help?
98
+ - Results shown in PR with statistical confidence
99
+
100
+ ---
101
+
102
+ ## Philosophy: Sensible Defaults, Smart Customization
103
+
104
+ This wizard provides **opinionated defaults** optimized for AI agent workflows. You can customize, but understand what's load-bearing.
105
+
106
+ ### CORE NON-NEGOTIABLES (Don't Change These)
107
+
108
+ These aren't preferences - they're **how AI agents stay on track**:
109
+
110
+ | Core Principle | Why It's Critical for AI |
111
+ |----------------|--------------------------|
112
+ | **TDD Red-Green-Pass** | AI agents need concrete pass/fail feedback. Without failing tests first, Claude can't verify its work. This is the feedback loop that keeps implementation correct. |
113
+ | **Testing Diamond** | Integration tests catch real bugs. Unit tests with mocks can "pass" while production fails. AI agents need tests that actually validate behavior. |
114
+ | **Confidence Levels** | Prevents Claude from guessing when uncertain. LOW confidence = ASK USER. This stops runaway bad implementations. |
115
+ | **TodoWrite Visibility** | You need to see what Claude is doing. Without visibility, Claude can go off-track without you knowing. |
116
+ | **Planning Before Coding** | Claude must understand before implementing. Skipping planning = wasted effort and wrong approaches. |
117
+
118
+ **WARNING:** Deviating from these fundamentals will break the system. The SDLC works because these pieces work together. Remove one and the whole system degrades.
119
+
120
+ ---
121
+
122
+ ### SAFELY CUSTOMIZABLE (Change Freely)
123
+
124
+ These adapt to your stack without affecting core behavior:
125
+
126
+ | Customization | Examples |
127
+ |---------------|----------|
128
+ | **Test framework** | Jest, Vitest, pytest, Go testing, etc. |
129
+ | **Commands** | Your specific lint, build, test commands |
130
+ | **Code style** | Tabs/spaces, quotes, semicolons |
131
+ | **Pre-commit checks** | Which checks to run (lint, typecheck, build) |
132
+ | **Documentation structure** | Your doc naming and organization |
133
+ | **Feature doc suffix** | Claude scans for existing patterns, suggests based on what you have, or lets you define custom |
134
+ | **Source directory patterns** | `/src/`, `/app/`, `/lib/`, etc. |
135
+ | **Test directory patterns** | `/tests/`, `/__tests__/`, `/spec/` |
136
+ | **Mocking rules** | What to mock in YOUR stack (external APIs, etc.) |
137
+ | **Code review** | `/code-review` for local, CI review for team visibility |
138
+ | **Security review triggers** | What's security-sensitive in your domain |
139
+
140
+ ---
141
+
142
+ ### RISKY CUSTOMIZATIONS (Strong Warnings)
143
+
144
+ You CAN change these, but understand the trade-offs:
145
+
146
+ | Customization | Default | Risk if Changed |
147
+ |---------------|---------|-----------------|
148
+ | **Testing shape** | Diamond (integration-heavy) | Pyramid (unit-heavy) = mocks can hide real bugs, AI gets false confidence |
149
+ | **TDD strictness** | Strict (test first always) | Flexible = AI may skip tests, no verification of correctness |
150
+ | **Planning mode** | Required for implementation | Skipping = Claude codes without understanding, wasted effort |
151
+ | **Confidence thresholds** | LOW < 60% = must ask | Lower threshold = Claude proceeds when unsure and makes more mistakes |
152
+
153
+ **If you change these:** The wizard will warn you. You can override, but you're accepting the risk.
154
+
155
+ ---
156
+
157
+ ### Smart Recommendations (Not Just Detection)
158
+
159
+ During setup, Claude will:
160
+
161
+ 1. **Scan your project** - Find package managers (package.json, Cargo.toml, go.mod, pyproject.toml, etc.), test files, CI configs
162
+ 2. **Recommend best practices** - Based on YOUR stack and what Claude discovers, not assumptions
163
+ 3. **Explain the recommendation** - Why this approach works best with AI agents
164
+ 4. **Let you decide** - Accept defaults or customize with full understanding
165
+ 5. **Ask if unsure** - Claude will ask rather than guess about your stack
166
+
167
+ **Example:**
168
+ ```
169
+ Scan result: Found Jest, mostly unit tests, heavy mocking
170
+ Recommendation: Testing Diamond with integration tests
171
+ Why: Your current unit tests with mocks may pass while production fails.
172
+ Integration tests give Claude reliable feedback.
173
+ Action: [Accept Recommendation] or [Keep Current Approach (with warnings)]
174
+ ```
175
+
176
+ ---
177
+
178
+ ### The Goal
179
+
180
+ **The True Goal:** Not just keeping AI agents following the SDLC, but creating a **self-improving partnership** where:
181
+ - Humans always feel **in control**
182
+ - Both sides **learn and get better** over time
183
+ - The process **organically evolves** through collaboration
184
+ - **Humans and AI** work together - everyone wins
185
+
186
+ The wizard is a partnership, not a constraint.
187
+
188
+ **What this means in practice:**
189
+ 1. Have a process that Claude follows consistently
190
+ 2. Make the process visible (TodoWrite, confidence levels)
191
+ 3. Enforce quality gates (tests pass, review before commit)
192
+ 4. Let Claude ask when uncertain
193
+ 5. **Customize what makes sense, keep what keeps AI on track**
194
+
195
+ ### Leverage Official Tools (Don't Reinvent)
196
+
197
+ When Anthropic provides official plugins or tools that handle something:
198
+ - **Use theirs, delete ours** - Official tools are maintained, tested, and updated automatically
199
+ - This wizard focuses on what official tools DON'T do (TDD enforcement, confidence levels, planning integration)
200
+
201
+ **Check periodically:** `/plugin > Discover` - new plugins may replace parts of our workflow.
202
+
203
+ ---
204
+
205
+ ## Prerequisites
206
+
207
+ | Requirement | Why |
208
+ |-------------|-----|
209
+ | **Claude Code v2.1.69+** | Required for InstructionsLoaded hook, skill directory variable, and Tasks system |
210
+ | **Git repository** | Files should be committed for team sharing |
211
+
212
+ ---
213
+
214
+ ## Claude Code Feature Updates
215
+
216
+ > **Keep your SDLC current**: Claude Code evolves. This section documents features that enhance the SDLC workflow. Check [Claude Code releases](https://github.com/anthropics/claude-code/releases) periodically.
217
+
218
+ ### Tasks System (v2.1.16+)
219
+
220
+ **What changed**: TodoWrite is now backed by a persistent Tasks system with dependency tracking.
221
+
222
+ **Benefits for SDLC**:
223
+ - Tasks persist across sessions (crash recovery)
224
+ - Sub-agents can see task state
225
+ - Dependencies tracked automatically (RED → GREEN → PASS)
226
+
227
+ **No changes needed**: Your existing TodoWrite calls in skills work automatically with the new system.
228
+
229
+ **Rollback if issues**: Set `CLAUDE_CODE_ENABLE_TASKS=false` environment variable.
230
+
231
+ ### Skill Arguments with $ARGUMENTS (v2.1.19+)
232
+
233
+ **What changed**: Skills can now accept parameters via `$ARGUMENTS` placeholder.
234
+
235
+ **How to use**: Add `argument-hint` to frontmatter and `$ARGUMENTS` in skill content:
236
+
237
+ ```yaml
238
+ ---
239
+ name: sdlc
240
+ description: Full SDLC workflow for implementing features, fixing bugs, refactoring code
241
+ argument-hint: [task description]
242
+ ---
243
+
244
+ ## Task
245
+ $ARGUMENTS
246
+
247
+ ## Phases
248
+ ...rest of skill...
249
+ ```
250
+
251
+ **Usage examples**:
252
+ - `/sdlc fix the login validation bug` → `$ARGUMENTS` = "fix the login validation bug"
253
+ - `/testing unit UserService` → `$ARGUMENTS` = "unit UserService"
254
+
255
+ **Note**: Skills still auto-invoke via hooks. This is optional polish for manual invocation.
256
+
257
+ ### Auto-Memory (v2.1.59+)
258
+
259
+ Claude Code now has built-in auto-memory that persists context across sessions. Manage with `/memory`.
260
+
261
+ **No changes needed**: The wizard's hooks and skills work alongside auto-memory. Memory stores preferences and context; the wizard enforces process.
262
+
263
+ ### Built-in Commands (v2.1.59-v2.1.76)
264
+
265
+ New built-in commands available to use alongside the wizard:
266
+
267
+ | Command | Version | What It Does |
268
+ |---------|---------|--------------|
269
+ | `/memory` | v2.1.59 | Manage persistent auto-memory |
270
+ | `/simplify` | v2.1.63 | Review changed code for reuse/quality |
271
+ | `/batch` | v2.1.63 | Run prompts in batch |
272
+ | `/loop` | v2.1.71 | Run prompts on recurring intervals |
273
+ | `/effort` | v2.1.76 | Set effort level (low/medium/high) |
274
+
275
+ **Tip**: `/simplify` pairs well with the self-review phase. Run it after implementation as an additional quality check.
276
+
277
+ ### Skill Effort Frontmatter (v2.1.80+)
278
+
279
+ Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc` and `/testing` skills use `effort: high` to ensure Claude gives full attention to SDLC tasks.
280
+
281
+ ### InstructionsLoaded Hook (v2.1.69+)
282
+
283
+ New hook event fires when Claude loads instructions at session start. The wizard uses this to validate that `SDLC.md` and `TESTING.md` exist — catches missing wizard files early.
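
The check itself is simple. Here is a minimal sketch of the validation logic, assuming both files live in the project root. The function name and the root location are illustrative assumptions, not the wizard's actual hook script.

```python
from pathlib import Path

# Files the wizard generates. Their location in the project root is an assumption.
REQUIRED_FILES = ("SDLC.md", "TESTING.md")


def missing_wizard_files(project_dir):
    """Return the names of required wizard files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

A hook script could call `missing_wizard_files(".")` at session start and print a warning if the returned list is not empty.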
284
+
285
+ ### Skill Directory Variable (v2.1.69+)
286
+
287
+ Skills can now reference companion files using `${CLAUDE_SKILL_DIR}`. Useful if you add data files alongside your skill markdown.
288
+
289
+ ### Hook Metadata (v2.1.69+)
290
+
291
+ Hook events now include `agent_id` and `agent_type` fields. Hooks can behave differently for subagents vs the main agent if needed.
292
+
293
+ ### Security Hardening (v2.1.49-v2.1.78)
294
+
295
+ Several fixes that strengthen wizard enforcement:
296
+ - **v2.1.49**: Managed hooks can't be bypassed by non-managed settings (tamper-resistant)
297
+ - **v2.1.72**: PreToolUse hooks returning `"allow"` can no longer bypass `deny` permission rules
298
+ - **v2.1.74**: Managed policy `ask` rules can't be bypassed by user `allow` or skill `allowed-tools`
299
+ - **v2.1.77**: Additional PreToolUse deny-bypass hardening
300
+ - **v2.1.78**: Visible startup warning when sandbox dependencies are missing
301
+
302
+ ### Other Notable Changes
303
+
304
+ - **v2.1.50**: `CLAUDE_CODE_SIMPLE` env var disables hooks/skills/CLAUDE.md — be aware this bypasses wizard enforcement
305
+ - **v2.1.72**: HTML comments (`<!-- -->`) in CLAUDE.md are no longer injected into context — useful for internal notes
306
+ - **v2.1.77**: Output token limits increased from 64k to 128k (Opus 4.6/Sonnet 4.6)
307
+ - **v2.1.81**: `--bare` flag for scripted `-p` calls skips hooks/LSP/plugins/skills in headless mode
308
+
309
+ ---
310
+
311
+ ## Prove It's Better
312
+
313
+ **Don't reinvent the wheel.** Use native/built-in features UNLESS you prove your custom version is better. If you can't prove it, delete yours.
314
+
315
+ This applies to everything: native Claude Code commands vs custom skills, framework utilities vs hand-rolled code, library functions vs custom implementations.
316
+
317
+ **How to prove it:**
318
+ 1. Test the native solution — measure quality, speed, reliability
319
+ 2. Test your custom solution — same scenario, same metrics
320
+ 3. Compare side-by-side
321
+ 4. Native >= custom? **Use native. Delete yours.**
322
+ 5. Custom > native? **Keep yours. Document WHY.** Re-evaluate when native improves.
323
+
324
+ **For the wizard's CI/CD:** When the weekly-update workflow detects a new Claude Code feature that overlaps with a wizard feature, the CI should automatically run E2E with both versions and recommend KEEP CUSTOM / SWITCH TO NATIVE / TIE.
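
The comparison step can be sketched as a small decision function. The scores, the `tolerance` parameter, and the function name are hypothetical. Only the three verdict labels come from the text above.

```python
def recommend(native_score, custom_score, tolerance=0.0):
    """Compare two scores measured on the same scenario with the same metric.

    Scores within `tolerance` of each other count as a TIE. Under the rule
    "Native >= custom? Use native", a TIE still resolves in favor of native.
    """
    if abs(native_score - custom_score) <= tolerance:
        return "TIE"
    if custom_score > native_score:
        return "KEEP CUSTOM"
    return "SWITCH TO NATIVE"
```

Setting `tolerance` above zero avoids keeping a custom implementation on the strength of a difference that is really just run-to-run noise.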
325
+
326
+ ---
327
+
328
+ ## What You're Setting Up
329
+
330
+ A workflow enforcement system that makes Claude Code:
331
+ - **Plan before coding** (Planning Mode → research → present approach)
332
+ - **Follow TDD** (write failing tests first, then implement)
333
+ - **Track progress** (TodoWrite for visibility)
334
+ - **Self-review** (catch issues before showing you)
335
+ - **Ask when unsure** (confidence levels prevent guessing)
336
+
337
+ **The Result**: Claude becomes a disciplined engineer who follows your process automatically.
338
+
339
+ ---
340
+
341
+ ## Philosophy First (Read This)
342
+
343
+ Before we configure anything, understand WHY this system works:
344
+
345
+ ### 1. Planning Mode is Your Best Friend
346
+
347
+ **Start almost every task in Planning Mode.** Here's why:
348
+
349
+ **Hidden Benefit: Free Context Reset**
350
+ After planning, you get a free `/compact` - Claude's plan is preserved in the summary, and you start implementation with clean context. This is one of the biggest advantages of plan mode.
351
+
352
+ ```
353
+ ┌─────────────────────────────────────────────────────────────────┐
354
+ │ WITHOUT Planning Mode │
355
+ │ │
356
+ │ User: "Add authentication" │
357
+ │ Claude: *immediately starts writing code* │
358
+ │ Result: Maybe wrong approach, wasted effort, rework │
359
+ └─────────────────────────────────────────────────────────────────┘
360
+
361
+ ┌─────────────────────────────────────────────────────────────────┐
362
+ │ WITH Planning Mode │
363
+ │ │
364
+ │ User: "Add authentication" + enters plan mode │
365
+ │ Claude: *researches codebase, understands patterns* │
366
+ │ Claude: "Here's my approach. Confidence: MEDIUM. Questions..." │
367
+ │ User: *approves or adjusts* │
368
+ │ Claude: *now implements with clear direction* │
369
+ │ Result: Right approach, efficient implementation │
370
+ └─────────────────────────────────────────────────────────────────┘
371
+ ```
372
+
373
+ **Planning Mode + /compact = Maximum Efficiency**:
374
+ 1. Claude researches in Planning Mode
375
+ 2. Claude presents approach with confidence level
376
+ 3. You approve → Claude updates docs
377
+ 4. You run `/compact` → frees context, plan preserved in summary
378
+ 5. Claude implements with clean context
379
+
380
+ ### 2. Confidence Levels Prevent Disasters
381
+
382
+ Claude MUST state confidence before implementing:
383
+
384
+ | Level | Meaning | What Claude Does |
385
+ |-------|---------|------------------|
386
+ | **HIGH (90%+)** | "I know exactly what to do" | Proceeds after your approval |
387
+ | **MEDIUM (60-89%)** | "Solid approach, some unknowns" | Highlights uncertainties |
388
+ | **LOW (<60%)** | "I'm not sure" | **ASKS YOU before proceeding** |
389
+ | **FAILED 2x** | "Something's wrong" | **STOPS and asks for help** |
390
+ | **CONFUSED** | "I don't understand why this is failing" | **STOPS, describes what was tried** |
391
+
392
+ **Why this matters**: You have domain expertise. When Claude is uncertain, asking you takes 30 seconds. Guessing wrong takes 30 minutes to fix.
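
The table maps onto a simple decision rule. The sketch below is only an illustration of that rule: Claude follows it through instructions, not through code like this, and the function names are hypothetical. The CONFUSED state is a judgment call and is not modeled here.

```python
def confidence_level(percent):
    """Map a confidence percentage to the level names used in the table."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 90:
        return "HIGH"
    if percent >= 60:
        return "MEDIUM"
    return "LOW"


def next_action(percent, failed_attempts=0):
    """Return what Claude should do next, following the table's rules."""
    # Two failed attempts override any stated confidence.
    if failed_attempts >= 2:
        return "STOP and ask for help"
    level = confidence_level(percent)
    if level == "LOW":
        return "ASK before proceeding"
    if level == "MEDIUM":
        return "Highlight uncertainties"
    return "Proceed after approval"
```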
393
+
394
+ ### 3. TDD (Recommended, Customize to Your Needs)
395
+
396
+ The classic TDD cycle:
397
+ ```
398
+ RED → Write test that FAILS (proves feature doesn't exist)
399
+ GREEN → Implement feature (test passes)
400
+ PASS → All tests pass (no regressions)
401
+ ```
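
As a concrete illustration of the cycle, here is a tiny Python example. The `slugify` function and its test are hypothetical and stand in for any feature in your codebase.

```python
import re


# RED: this test is written first. Run before slugify() exists, it fails
# with a NameError, which proves the feature is not implemented yet.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  TDD  Red Green ") == "tdd-red-green"


# GREEN: the smallest implementation that makes the test pass.
def slugify(title):
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# PASS: in a real project, run the whole suite here, not just the new test.
test_slugify()
```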
402
+
403
+ **The core principle:** Have a testing strategy. Know what you're testing and why.
404
+
405
+ **Customize for your team:**
406
+ - Strict TDD (test first always)? Great.
407
+ - Test-after for some cases? Fine, just be consistent.
408
+ - The key: **don't commit code that breaks existing tests.**
409
+
410
+ **Test review preference:** Ask the user if they want to review each test before implementation, or trust the TESTING.md guidelines. Tests validate code - some users want oversight, others trust the process. If tests start failing or missing bugs, investigate why.
411
+
412
+ ### 4. Testing Strategy (Define Yours)
413
+
414
+ Here's the "Testing Diamond" approach (recommended for AI agents):
415
+
416
+ ```
417
+          /\           ← Few E2E (automated like Playwright, or manual sign-off)
418
+         /  \
419
+        /    \
420
+       /------\
421
+      |        |       ← MANY Integration (real DB, real cache - BEST BANG FOR BUCK)
422
+      |        |
423
+       \------/
424
+        \    /
425
+         \  /
426
+          \/           ← Few Unit (pure logic only)
427
+ ```
428
+
429
+ **Why Integration Tests are Best Bang for Buck:**
430
+ - **Speed**: Fast enough to run on every change
431
+ - **Stability**: Touch real code, not mocks that lie
432
+ - **Confidence**: If integration tests pass, production usually works
433
+ - **AI-friendly**: Give Claude concrete pass/fail feedback on real behavior
434
+
435
+ **E2E vs Manual Testing:**
436
+ - **E2E (automated)**: Playwright, Cypress - runs without human
437
+ - **Manual testing**: Human sign-off at the very end
438
+ - **Goal**: As close to zero manual testing as possible. Reserve it for final verification, once automated tests already give you full confidence.
439
+
440
+ **But your team decides:**
441
+
442
+ | Question | Your Choice |
443
+ |----------|-------------|
444
+ | Do you need E2E tests? | Maybe not for backend-only services |
445
+ | Heavy on unit tests? | Fine for pure logic codebases |
446
+ | Integration-first? | Great for systems with real DBs |
447
+ | No tests yet? | Start somewhere, even basic tests help |
448
+
449
+ **The point:** Have a testing strategy documented in TESTING.md. Claude will follow whatever approach you define.
450
+
451
+ ### 5. Mocking Strategy (Philosophy, Not Just Tech)
452
+
453
+ **The Problem:** AI agents (and humans) tend to mock too much. Tests that mock everything test nothing - they just verify the mocks work, not the actual code.
454
+
455
+ **Minimal Mocking Philosophy:**
456
+
457
+ | Dependency | Mock It? | Reasoning |
458
+ |------------|----------|-----------|
459
+ | Database | ❌ NEVER | Use test DB or in-memory |
460
+ | Cache | ❌ NEVER | Use isolated test instance |
461
+ | External APIs | ✅ YES | Real calls = flaky + expensive |
462
+ | Time/Date | ✅ YES | Determinism |
463
+
464
+ **The key insight:** When you mock something, you're saying "I trust this works." Only mock things you truly can't control (external APIs, third-party services).
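
Here is a minimal sketch of that split in Python: a real (in-memory) SQLite database and a mocked external email service. The `register_user` function, the table layout, and the `send_welcome` method are hypothetical examples, not part of the wizard.

```python
import sqlite3
from unittest.mock import Mock


def register_user(db, email_client, email):
    """Store the user in the database, then send a welcome email."""
    db.execute("INSERT INTO users (email) VALUES (?)", (email,))
    db.commit()
    email_client.send_welcome(email)


def test_register_user():
    # Database: NOT mocked. An in-memory SQLite instance runs real SQL.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (email TEXT UNIQUE)")

    # External API: mocked, because real calls would be flaky and expensive.
    email_client = Mock()

    register_user(db, email_client, "dev@example.com")

    # Assert on real stored data, not on what a database mock was told.
    rows = db.execute("SELECT email FROM users").fetchall()
    assert rows == [("dev@example.com",)]
    email_client.send_welcome.assert_called_once_with("dev@example.com")
```

Because the SQL actually executes, a typo in the query or a schema mismatch fails the test. A mocked database would let both slip through.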
465
+
466
+ **But your team decides:**
467
+ - Heavy mocking preferred? Document it.
468
+ - No mocking at all? Document it.
469
+ - Mocks from fixtures? Document where fixtures live (e.g., `tests/fixtures/`).
470
+
471
+ **The point:** Have a mocking strategy documented. Claude will follow it. The goal is tests that prove real behavior, not just pass.
472
+
473
+ ### 6. SDET Wisdom (Test Code is First-Class)
474
+
475
+ **Test Code = First-Class Citizen**
476
+ Treat test code like app code - code review, quality standards, not throwaway. Tests are production-critical infrastructure.
477
+
478
+ ### Tests As Building Blocks
479
+
480
+ Existing test patterns are building blocks - leverage them:
481
+ - **Similar tests exist and are good?** - Copy the pattern, adapt for your case
482
+ - **Similar tests exist but are bad?** - Propose improvement, worth the scrutiny
483
+ - **No similar tests?** - More scrutiny needed, may need human input on approach
484
+
485
+ **Existing patterns aren't sacred.** Don't blindly copy bad patterns just because they exist. Improving a stale pattern is worth the effort.
486
+
487
+ **Before fixing a failing test, ask:**
488
+ 1. Do we even need this test? (Is it for deleted/legacy code?)
489
+ 2. Is this tested better elsewhere? (DRY applies to tests too)
490
+ 3. Is the test wrong, or is the code wrong?
491
+
492
+ **Don't ignore flaky tests:**
493
+ - Flaky tests have revealed rare edge case bugs that later hit production
494
+ - "Nothing stings more than a flaky test you ignored coming back to bite you in prod"
495
+ - Dig into every failure - sweeping under the rug compounds problems
496
+
497
+ **Three categories of test failures:**
498
+
499
+ | Category | Examples | Fix |
500
+ |----------|----------|-----|
501
+ | **Test code bug** | Not parallel-safe, shared state, wrong assertions | Fix the test code (most common) |
502
+ | **Application bug** | Race condition, timing issue, edge case | Fix the app code - test found a real bug |
503
+ | **Environment/Infra bug** | CI config, memory, isolation issues | Fix the environment/setup/teardown |
504
+
505
+ ### The Absolute Rule: ALL TESTS MUST PASS
506
+
507
+ ```
508
+ ┌─────────────────────────────────────────────────────────────────────┐
509
+ │ ALL TESTS MUST PASS. NO EXCEPTIONS. │
510
+ │ │
511
+ │ This is not negotiable. This is not flexible. This is absolute. │
512
+ └─────────────────────────────────────────────────────────────────────┘
513
+ ```
514
+
515
+ **Unacceptable excuses:**
516
+ - "Those tests were already failing" → Then fix them first
517
+ - "That's not related to my changes" → Doesn't matter, fix it
518
+ - "It's flaky, just ignore it" → Flaky = bug, investigate it
519
+ - "It passes locally" → CI is the source of truth
520
+ - "It's just a warning" → Warnings become errors, fix it
521
+
522
+ **The fix is always the same:**
523
+ 1. Tests fail → STOP
524
+ 2. Investigate → Find root cause
525
+ 3. Fix → Whatever is actually broken (code, test, or environment)
526
+ 4. All tests pass → THEN commit
527
+
528
+ **Why this is absolute:**
529
+ - Tests are your safety net
530
+ - A failing test means something is wrong
531
+ - Committing with failing tests = committing known bugs
532
+ - "Works on my machine" is not a standard
533
+
534
+ **MCP Awareness for Testing (optional, nuanced):**
535
+ - **Where MCP adds real value:** E2E/browser testing (can't "see" UI without it), graphics projects, external systems Claude can't otherwise access
536
+ - **Often overkill for:** API/Integration tests (reading code/docs is usually sufficient), internal code work
537
+ - **Reality check:** As Claude improves, fewer MCPs are needed. Claude Code has MCP Tool Search, which loads tools on demand once MCP tool definitions would take up more than 10% of the context window
538
+ - **The rule:** Suggest where it adds real value, don't force it. Let user decide.
539
+
540
+ ---
541
+
542
+ ### 7. Delete Legacy Code (No Fallbacks)
543
+
544
+ When refactoring:
545
+ - Delete old code FIRST
546
+ - If something breaks, fix it properly
547
+ - No backwards-compatibility hacks
548
+ - No "just in case" fallbacks
549
+
550
+ **Why this works with TDD:** Your tests are your safety net. If deleting breaks something, tests catch it. Fix properly, don't create hybrid systems. This simplifies your codebase and lets you "play golf" - less code to maintain.
551
+
552
+ ### 8. Documentation Hygiene
553
+
554
+ Before starting any task, Claude should:
555
+
556
+ 1. **Find relevant documentation** - Search for docs related to the feature/system
557
+ 2. **Assess documentation health** - Is it current? Bloated? Useful?
558
+ 3. **ASK before cleaning** - Never delete or refactor docs without user approval
559
+
560
+ **Signs a doc might need attention:**
561
+ - Very large file with mixed concerns
562
+ - Outdated information mixed with current
563
+ - Duplicate information across files
564
+ - Hard to find what you need
565
+
566
+ **But remember:**
567
+ - Complex systems have complex docs - that's OK
568
+ - Size alone doesn't mean bloat - some things ARE complex
569
+ - Context and usefulness matter more than line count
570
+ - When in doubt, ASK the user
571
+
572
+ **The rule:** Identify doc issues during planning, propose cleanup, get approval. Never nuke docs on your own.
573
+
574
+ ### 9. Security Review (Calibrated to Your Project)
575
+
576
+ Security review depth should match your project's risk profile. During wizard setup, Claude will ask about your context to calibrate:
577
+
578
+ **Calibration Questions (during wizard):**
579
+ - Is this a personal project or production?
580
+ - Internal tool or public-facing?
581
+ - Handling sensitive data (PII, payments)?
582
+ - How many users?
583
+ - What's your attack surface?
584
+
585
+ **Then Claude calibrates:**
586
+
587
+ | Project Type | Security Review Depth |
588
+ |--------------|----------------------|
589
+ | Personal/learning project | Quick sanity check ("anything obvious?") |
590
+ | Internal tool, few users | Basic review of exposed endpoints |
591
+ | Production, sensitive data | Full review: auth, input validation, data exposure |
592
+ | Payment/financial | Extra scrutiny, consider external audit |
593
+
594
+ **Quick reference - which changes need review?**
595
+
596
+ | Change Type | Review? |
597
+ |-------------|---------|
598
+ | Auth/login changes | Yes |
599
+ | User input handling | Yes |
600
+ | API endpoints | Yes |
601
+ | Database queries | Yes |
602
+ | File operations | Yes |
603
+ | Internal refactoring | Usually no |
604
+ | UI/styling only | Usually no |
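
The quick reference can be sketched as a lookup. The category labels and the function name below are hypothetical. Treating unrecognized change types as needing review is an added assumption that errs on the side of caution.

```python
# Change types from the quick-reference table (labels are illustrative).
REVIEW_REQUIRED = {"auth", "user_input", "api_endpoint",
                   "database_query", "file_operation"}
USUALLY_SKIPPED = {"internal_refactor", "ui_styling"}


def needs_security_review(change_types):
    """Return True if any change type in the task calls for a security review."""
    for change in change_types:
        if change in REVIEW_REQUIRED:
            return True
        if change not in USUALLY_SKIPPED:
            # Unknown change type: review it rather than guess (assumption).
            return True
    return False
```

For example, a task tagged `["ui_styling", "auth"]` still gets a review because one of its changes touches authentication.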
605
+
606
+ **What to check (when warranted):**
607
+ - Input validation at system boundaries
608
+ - Authentication/authorization on sensitive operations
609
+ - Data exposure risks
610
+ - Patterns appropriate for YOUR stack and attack surface
611
+
612
+ **The principle:** Always do a security review, but depth varies. A personal CLI tool doesn't need the same scrutiny as a payment API. Claude can always say "nothing to see here" for low-risk changes.
613
+
614
+ **Customize in wizard:** You can set your default review depth, and Claude will adjust based on what the code actually touches.
615
+
616
+ ---
617
+
618
+ ## Example Workflow (End-to-End)
619
+
620
+ Here's what a typical task looks like with this system:
621
+
622
+ ```
623
+ ┌─────────────────────────────────────────────────────────────────────────┐
624
+ │ USER: "Add a password reset feature" │
625
+ └─────────────────────────────────────────────────────────────────────────┘
626
+
627
+
628
+ ┌─────────────────────────────────────────────────────────────────────────┐
629
+ │ HOOK FIRES: SDLC baseline reminder + AUTO-INVOKE instruction │
630
+ │ CLAUDE: Sees implementation task → invokes sdlc skill │
631
+ └─────────────────────────────────────────────────────────────────────────┘
632
+
633
+
634
+ ┌─────────────────────────────────────────────────────────────────────────┐
635
+ │ PHASE 1: PLANNING │
636
+ │ │
637
+ │ Claude: │
638
+ │ 1. Creates TodoWrite with SDLC steps │
639
+ │ 2. Searches for relevant docs (auth docs, API docs, etc.) │
640
+ │ 3. Checks doc health - flags if anything needs attention │
641
+ │ 4. Researches codebase (existing auth patterns, DB schema) │
642
+ │ 5. Presents approach: │
643
+ │ │
644
+ │ "My approach: │
645
+ │ - Add /reset-password endpoint │
646
+ │ - Use existing email service │
647
+ │ - Store tokens in users table │
648
+ │ │
649
+ │ Confidence: MEDIUM │
650
+ │ Uncertainty: Not sure about token expiry - 1 hour or 24 hours?" │
651
+ │ │
652
+ │ User: "Use 1 hour. Looks good." │
653
+ └─────────────────────────────────────────────────────────────────────────┘
654
+
655
+
656
+ ┌─────────────────────────────────────────────────────────────────────────┐
657
+ │ PHASE 2: TRANSITION │
658
+ │ │
659
+ │ Claude: │
660
+ │ 1. Updates relevant docs with decisions/discoveries │
661
+ │ 2. "Docs updated. Ready for /compact before implementation?" │
662
+ │ │
663
+ │ User: runs /compact │
664
+ │ │
665
+ │ (Context freed, plan preserved in summary) │
666
+ └─────────────────────────────────────────────────────────────────────────┘
667
+
668
+
669
+ ┌─────────────────────────────────────────────────────────────────────────┐
670
+ │ PHASE 3: IMPLEMENTATION (TDD) │
671
+ │ │
672
+ │ Claude: │
673
+ │ 1. TDD RED: Writes failing test for password reset │
674
+ │ - Test expects endpoint to exist, return success │
675
+ │ - Test FAILS (endpoint doesn't exist yet) │
676
+ │ │
677
+ │ 2. TDD GREEN: Implements password reset │
678
+ │ - Creates endpoint, email logic, token handling │
679
+ │ - Test PASSES │
680
+ │ │
681
+ │ 3. Runs lint/typecheck │
682
+ │ 4. Runs ALL tests - no regressions │
683
+ │ 5. Production build check │
684
+ └─────────────────────────────────────────────────────────────────────────┘
685
+
686
+
687
+ ┌─────────────────────────────────────────────────────────────────────────┐
688
+ │ PHASE 4: REVIEW │
689
+ │ │
690
+ │ Claude: │
691
+ │ 1. DRY check - no duplicated logic │
692
+ │ 2. Self-review with /code-review │
693
+ │ 3. Security review (auth change = yes) │
694
+ │ - ✅ Token properly hashed │
695
+ │ - ✅ Rate limiting on endpoint │
696
+ │ - ✅ No password in logs │
697
+ │ │
698
+ │ 4. Presents summary: │
699
+ │ "Done. Added password reset with 1-hour tokens. │
700
+ │ 3 files changed, tests passing, security reviewed. │
701
+ │ Ready for your review." │
702
+ └─────────────────────────────────────────────────────────────────────────┘
703
+ ```
704
+
705
+ This is what the system enforces automatically. Claude follows this workflow because:
706
+ - **Hooks** remind Claude on every prompt
707
+ - **Skills** provide detailed guidance when invoked
708
+ - **TodoWrite** makes progress visible
709
+ - **Confidence levels** prevent guessing
710
+ - **TDD** gives concrete pass/fail evidence that the change works
711
+ - **Self-review** catches issues before you see them
712
+
713
+ ---
714
+
715
+ ## Recommended Documentation Structure
716
+
717
+ For Claude to be effective at SDLC enforcement, your project should have these docs:
718
+
719
+ | Document | Purpose | Claude Uses For |
720
+ |----------|---------|-----------------|
721
+ | **CLAUDE.md** | Claude-specific instructions | Commands, code style, project rules |
722
+ | **README.md** | Project overview | Understanding what the project does |
723
+ | **ARCHITECTURE.md** | System design, data flows, services | Understanding how components connect |
724
+ | **TESTING.md** | Testing philosophy, patterns, commands | TDD guidance, test organization |
725
+ | **SDLC.md** | Development workflow (this system) | Full SDLC reference |
726
+ | **ROADMAP.md** | Vision, goals, milestones, timeline | Understanding project direction |
727
+ | **CONTRIBUTING.md** | How to contribute, PR process | Guiding external contributors |
728
+ | **Feature docs** | Per-feature documentation | Context for specific changes |
729
+
730
+ **Why these matter:**
731
+ - **CLAUDE.md** - Claude reads this automatically every session. Put commands, style rules, architecture overview here.
732
+ - **ARCHITECTURE.md** - Claude needs to understand how your system fits together before making changes.
733
+ - **TESTING.md** - Claude needs to know your testing approach, what to mock, what not to mock.
734
+ - **ROADMAP.md** - Shows where the project is going. Helps Claude understand priorities and what's next.
735
+ - **CONTRIBUTING.md** - For open source projects, defines how contributions work. Claude follows these when suggesting changes.
736
+ - **Feature docs** - For complex features, Claude reads these during planning to understand context.
737
+
738
+ **Start simple, expand over time:**
739
+ 1. Create CLAUDE.md with commands and basic architecture
740
+ 2. Create TESTING.md with your testing approach
741
+ 3. Add ARCHITECTURE.md when system grows complex
742
+ 4. Add ROADMAP.md when you have clear milestones/vision
743
+ 5. Add CONTRIBUTING.md if open source or team project
744
+ 6. Add feature docs as major features emerge
745
+
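A quick way to see which of the recommended docs a project already has is a small shell check. This is a minimal sketch, not something the wizard ships: the function name `missing_docs` is hypothetical, and the file list simply mirrors the table above (feature docs are left out because their names vary per project).

```bash
# Hypothetical helper: list recommended docs that are missing from a project root.
missing_docs() {
  local root="${1:-.}"
  local doc
  for doc in CLAUDE.md README.md ARCHITECTURE.md TESTING.md SDLC.md ROADMAP.md CONTRIBUTING.md; do
    # Print the name of each doc that does not exist as a regular file.
    [ -f "$root/$doc" ] || printf '%s\n' "$doc"
  done
}

# Usage: missing_docs /path/to/project
```

An empty result means every recommended doc exists; otherwise the output doubles as a to-do list for the "start simple, expand over time" order.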
746
+ ---
747
+
748
+ ## Step 0: Repository Protection & Plugin Setup
749
+
750
+ ### Step 0.0: Enable Branch Protection (CRITICAL)
751
+
752
+ **Before setting up SDLC, protect your main branch.** This is non-negotiable for teams and highly recommended for solo developers.
753
+
754
+ **Why this matters:**
755
+ - SDLC enforcement is only as strong as your merge protection
756
+ - Without branch protection, anyone (including Claude) can push broken code to main
757
+ - Branch protection is a built-in GitHub feature: deterministic, no custom code needed
758
+
759
+ **Solo Developer Settings:**
760
+
761
+ | Setting | Value | Why |
762
+ |---------|-------|-----|
763
+ | Require pull request before merging | ✓ Enabled | All changes go through PR review |
764
+ | Require approvals | **0 (none)** | No one else to approve — CI is your gate |
765
+ | Require status checks to pass | ✓ Enabled | CI must be green |
766
+ | Require branches to be up to date | ✓ Enabled | No stale merges |
767
+ | Include administrators | **✗ Disabled** | You're the only admin; enabling this could lock you out of your own repo |
768
+
769
+ **Team Settings (2+ developers):**
770
+
771
+ | Setting | Value | Why |
772
+ |---------|-------|-----|
773
+ | Require pull request before merging | ✓ Enabled | All changes go through PR review |
774
+ | Require approvals | 1+ (your choice) | Human must approve before merge |
775
+ | Require status checks to pass | ✓ Enabled | CI must be green |
776
+ | Require branches to be up to date | ✓ Enabled | No stale merges |
777
+ | Include administrators | ✓ Enabled | No one bypasses the rules |
778
+
779
+ **How to enable (UI):**
780
+ 1. Go to: `Settings > Branches > Add rule`
781
+ 2. Branch name pattern: `main` (or `master`)
782
+ 3. Enable the settings above (solo or team, as appropriate)
783
+ 4. Add required status checks: `validate`, `e2e-quick-check`
784
+ 5. Save changes
785
+
786
+ **How to enable (CLI — solo dev):**
787
+ ```bash
788
+ gh api repos/OWNER/REPO/branches/main/protection --method PUT --input - << 'EOF'
789
+ {
790
+ "required_status_checks": {
791
+ "strict": true,
792
+ "contexts": ["validate", "e2e-quick-check"]
793
+ },
794
+ "enforce_admins": false,
795
+ "required_pull_request_reviews": null,
796
+ "restrictions": null
797
+ }
798
+ EOF
799
+ ```
800
+
801
+ **How to enable (CLI — team):**
802
+ ```bash
803
+ gh api repos/OWNER/REPO/branches/main/protection --method PUT --input - << 'EOF'
804
+ {
805
+ "required_status_checks": {
806
+ "strict": true,
807
+ "contexts": ["validate", "e2e-quick-check"]
808
+ },
809
+ "enforce_admins": true,
810
+ "required_pull_request_reviews": {
811
+ "required_approving_review_count": 1,
812
+ "dismiss_stale_reviews": true
813
+ },
814
+ "restrictions": null
815
+ }
816
+ EOF
817
+ ```
818
+
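The solo and team payloads above differ only in `enforce_admins` and `required_pull_request_reviews`. A minimal sketch that generates either one from a single function follows; `protection_payload` is a hypothetical helper name, and `OWNER/REPO` remain placeholders exactly as in the commands above.

```bash
# Hypothetical helper: print the branch protection payload for "solo" or "team".
protection_payload() {
  local mode="$1" admins reviews
  case "$mode" in
    solo) admins=false; reviews=null ;;
    team) admins=true
          reviews='{"required_approving_review_count": 1, "dismiss_stale_reviews": true}' ;;
    *) echo "usage: protection_payload solo|team" >&2; return 1 ;;
  esac
  # Emit the same JSON shape as the two commands above.
  cat <<EOF
{
  "required_status_checks": {
    "strict": true,
    "contexts": ["validate", "e2e-quick-check"]
  },
  "enforce_admins": $admins,
  "required_pull_request_reviews": $reviews,
  "restrictions": null
}
EOF
}

# Usage (requires an authenticated gh CLI):
#   protection_payload team | gh api repos/OWNER/REPO/branches/main/protection --method PUT --input -
# Read the rule back afterwards to confirm it was applied:
#   gh api repos/OWNER/REPO/branches/main/protection
```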
819
+ **Optional (teams only):**
820
+
821
+ | Setting | Value | Why |
822
+ |---------|-------|-----|
823
+ | Require CODEOWNERS review | ✓ Enabled | Specific people must approve |
824
+
825
+ **CODEOWNERS file (teams only):**
826
+ Create `.github/CODEOWNERS`:
827
+ ```
828
+ # Default owners for everything
829
+ * @your-username
830
+
831
+ # Or specific paths
832
+ /src/ @dev-team
833
+ /.github/ @platform-team
834
+ ```
835
+
836
+ **The principle:** Built-in protection > custom enforcement. GitHub branch protection is battle-tested, always runs, and can't be accidentally bypassed.
837
+
838
+ **Why PRs even for solo devs?**
839
+
840
+ | Benefit | Solo Dev | Team |
841
+ |---------|----------|------|
842
+ | `/code-review` self-review | ✓ | ✓ |
843
+ | CI must pass before merge | ✓ | ✓ |
844
+ | Clean commit history | ✓ | ✓ |
845
+ | Easy rollback (revert PR) | ✓ | ✓ |
846
+ | Human review required | — | ✓ |
847
+
848
+ **Not required, but good practice.** The SDLC workflow includes a self-review step using `/code-review` (native Claude Code plugin). It launches parallel review agents for CLAUDE.md compliance, bug detection, and logic/security checks. You always have final say — the review just catches things you might miss.
849
+
850
+ **Code review workflows:**
851
+
852
+ | Workflow | When to use | How |
853
+ |----------|------------|-----|
854
+ | **Solo** | Working alone | `/code-review` locally before push |
855
+ | **Team** | Multiple contributors | `/code-review` locally + CI PR review for visibility |
856
+ | **Open Source** | External contributors | CI PR review on contributor PRs |
857
+
858
+ **Solo devs:** Skip approval requirements — CI status checks are your quality gate. The AI code review (`pr-review.yml`) provides automated review without needing human approval. GitHub does not allow PR authors to approve their own PRs, so requiring approvals on a solo repo will block all merges.
859
+
860
+ ---
861
+
862
+ ### Step 0.1: Required Plugins
863
+
864
+ **Install required plugin:**
865
+ ```
866
+ /plugin install claude-md-management@claude-plugin-directory
867
+ ```
868
+ > "Installing claude-md-management (required for CLAUDE.md maintenance)..."
869
+
870
+ This plugin handles:
871
+ - CLAUDE.md quality audits (A-F scores, specific improvement suggestions)
872
+ - Session learning capture via `/revise-claude-md`
873
+
874
+ **Scope:** CLAUDE.md only. Does NOT update feature docs, TESTING.md, ARCHITECTURE.md, hooks, or skills. The SDLC workflow still handles those (see Post-Mortem section for where learnings go).
875
+
876
+ ### Step 0.2: SDLC Core Setup (Wizard Creates)
877
+
878
+ The wizard creates TDD-specific automations that official plugins don't provide:
879
+ - TDD pre-tool-check hook (test-first enforcement)
880
+ - SDLC prompt-check hook (baseline reminders)
881
+ - SDLC skill with confidence levels
882
+ - Planning mode integration
883
+
884
+ ### Step 0.3: Additional Recommendations (Optional)
885
+
886
+ After SDLC setup is complete, run `claude-code-setup` for additional recommendations:
887
+
888
+ ```
889
+ "Based on your codebase, recommend additional automations"
890
+ ```
891
+
892
+ This may suggest:
893
+ - MCP Servers (context7 for docs, Playwright for frontend)
894
+ - Additional hooks (auto-format if Prettier configured)
895
+ - Subagents (security-reviewer if auth code detected)
896
+
897
+ **Claude prompts for each:**
898
+ > "[Detected: Prettier config] Want to add auto-format hook? (y/n)"
899
+
900
+ These are additive—they don't replace our TDD hooks.
901
+
902
+ ### Git Workflow Preference
903
+
904
+ **Claude asks:**
905
+ > "Do you use pull requests for code review? (y/n)"
906
+
907
+ - **Yes → PRs**: Recommend `code-review` plugin, PR workflow guidance
908
+ - **No → Solo/Feature branches**: Skip PR plugins, recommend feature branch workflow
909
+
910
+ Feature branches still recommended for solo devs (keeps main clean, easy rollback).
911
+
912
+ **If using PRs, also ask:**
913
+ > "Auto-clean old bot comments on new pushes? (y/n)"
914
+
915
+ - **Yes** → Add `int128/hide-comment-action` to CI (collapses outdated bot comments)
916
+ - **No** → Skip (some teams prefer full comment history)
917
+
918
+ **Recommendation:** Solo devs: yes (keeps the PR tidy). Teams: ask (some want an audit trail).
919
+
920
+ > "Run AI code review only after tests pass? (y/n)"
921
+
922
+ - **Yes** → PR review workflow waits for CI to pass first (saves API costs on broken code)
923
+ - **No** → Review runs immediately in parallel with tests (faster feedback)
924
+
925
+ **Recommendation:** Yes for most teams. No point reviewing code that doesn't build/pass tests. Saves Claude API costs and reviewer time.
926
+
927
+ > "What reasoning effort for the PR reviewer? (medium/high/max)"
928
+
929
+ | Level | Cost per Review | When to Use |
930
+ |-------|----------------|-------------|
931
+ | `medium` | ~$0.13-0.38 | Default, balanced cost/quality |
932
+ | `high` | ~$0.38-1.00 | Recommended — deeper reasoning catches more |
933
+ | `max` (Opus only) | $1.00+ | Unbounded thinking, highest quality, unpredictable cost |
934
+
935
+ **Recommendation:** `high` for most teams. The reviewer is your quality gate — deeper reasoning catches issues that `medium` misses. `max` is overkill for routine reviews but useful for security-critical or high-risk PRs.
936
+
937
+ **How to set it:** Add `--effort high` (or `medium`/`max`) to `claude_args` in your PR review workflow. You can change this anytime.
938
+
939
+ > "Use sticky PR comments or inline review comments for bot reviews? (sticky/inline)"
940
+
941
+ - **Sticky** → Bot reviews post as single PR comment that updates in place
942
+ - **Inline** → Bot creates GitHub review with inline comments on specific lines
943
+
944
+ **Recommendation:** Sticky for bots. Here's why:
945
+
946
+ | Approach | When to Use |
947
+ |----------|-------------|
948
+ | **Sticky PR comment** | Bots, automated reviews. Updates in place, stays clean. |
949
+ | **Inline review comments** | Humans. Threading on specific lines is valuable. |
950
+
951
+ **The problem with inline bot reviews:**
952
+ - Every push triggers new review → comments pile up
953
+ - `int128/hide-comment-action` only hides PR comments, not review comments
954
+ - PR becomes cluttered with dozens of outdated bot reviews
955
+
956
+ **Sticky comment workflow:**
957
+ 1. Bot posts review as sticky PR comment (single comment, auto-updates)
958
+ 2. User reads review, replies in PR comments if questions
959
+ 3. User adds `needs-review` label to trigger re-review
960
+ 4. Bot updates the SAME sticky comment (no pile-up)
961
+ 5. Label auto-removed, ready for next round
962
+
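The "update the same comment" step can be sketched with the `gh` CLI. This is one possible approach under stated assumptions, not necessarily how the wizard's workflow does it: the marker string and both function names are hypothetical, and it relies on `gh pr comment --edit-last`, which edits the most recent comment made by the authenticated user.

```bash
# Hypothetical marker that makes the bot's sticky comment easy to recognize.
STICKY_MARKER='<!-- sdlc-review-sticky -->'

# Build the comment body: hidden marker on the first line, then the review text.
sticky_body() {
  printf '%s\n%s\n' "$STICKY_MARKER" "$1"
}

# Post or update the sticky comment on a PR (requires an authenticated gh CLI).
# Try to edit the bot user's last comment; if that fails (for example, because
# there is no comment yet), fall back to creating a new one.
post_sticky_review() {
  local pr="$1" review="$2"
  gh pr comment "$pr" --edit-last --body "$(sticky_body "$review")" \
    || gh pr comment "$pr" --body "$(sticky_body "$review")"
}

# Usage: post_sticky_review 42 "Review round 2: no critical findings."
```

This only stays "sticky" if the bot account posts nothing else on the PR, since `--edit-last` targets that user's latest comment regardless of content.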
963
+ **Back-and-forth:** User questions live in PR comments. Bot's response is always the latest sticky comment. Clean and organized.
964
+
965
+ **CI monitoring question:**
966
+ > "Should Claude monitor CI checks after pushing and auto-diagnose failures? (y/n)"
967
+
968
+ - **Yes** → Enable CI feedback loop in SDLC skill, add `gh` CLI to allowedTools
969
+ - **No** → Skip CI monitoring steps (Claude still runs local tests, just doesn't watch CI)
970
+
971
+ **What this does:**
972
+ 1. After pushing, Claude runs `gh pr checks` to watch CI status
973
+ 2. If checks fail, Claude reads logs via `gh run view --log-failed`
974
+ 3. Claude diagnoses the failure and proposes a fix
975
+ 4. Max 2 fix attempts, then asks user
976
+ 5. Job isn't done until CI is green
977
+
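The steps above amount to a bounded retry loop. The sketch below shows the control flow only; `CHECK_CMD` and `FIX_CMD` are hypothetical injection points (the fix step stands in for "Claude diagnoses and pushes a fix"), and the default check command assumes `gh pr checks --watch` exits non-zero when a check fails.

```bash
# Cap on fix attempts, matching step 4 above.
MAX_FIX_ATTEMPTS=2

# Hypothetical loop: re-check CI after each fix, give up after the cap.
ci_feedback_loop() {
  local check_cmd="${CHECK_CMD:-gh pr checks --watch}"
  local fix_cmd="${FIX_CMD:-false}"   # no fixer supplied: stop at the first failure
  local attempt=0
  while ! $check_cmd; do
    if [ "$attempt" -ge "$MAX_FIX_ATTEMPTS" ]; then
      echo "CI still failing after $MAX_FIX_ATTEMPTS fix attempts; asking user." >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "CI failed; fix attempt $attempt of $MAX_FIX_ATTEMPTS" >&2
    # Failure logs would be read here, e.g.: gh run view <RUN_ID> --log-failed
    $fix_cmd || return 1
  done
  echo "CI is green."
}
```

Passing the check and fix steps in as commands keeps the loop testable without a real PR, and makes the "max 2 attempts, then ask" rule a single constant.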
978
+ **Recommendation:** Yes if you have CI configured. This closes the loop between
979
+ "local tests pass" and "PR is actually ready to merge."
980
+
981
+ **Requirements:**
982
+ - `gh` CLI installed and authenticated
983
+ - CI/CD configured (GitHub Actions, etc.)
984
+ - If no CI yet: skip, add later when you set up CI
985
+
986
+ **CI review feedback question (only if CI monitoring is enabled):**
987
+ > "What level of automated review response do you want?"
988
+
989
+ | Level | Name | What autofix handles | Est. API cost |
990
+ |-------|------|---------------------|---------------|
991
+ | **L1** | `ci-only` | CI failures only (broken tests, lint) | ~$0.50/fix |
992
+ | **L2** | `criticals` (default) | + Critical review findings (must-fix) | ~$1/fix |
993
+ | **L3** | `all-findings` | + Every suggestion the reviewer flags | ~$2/fix |
994
+
995
+ > **Cost note:** Higher levels mean more autofix iterations (each ~$0.50).
996
+ > L3 typically adds 1-2 extra iterations per PR but produces cleaner code.
997
+ > You can change this anytime by editing `AUTOFIX_LEVEL` in your ci-autofix workflow.
998
+
999
+ **What this does:**
1000
+ 1. After CI passes, Claude reads the automated code review comments
1001
+ 2. Based on your level: fixes criticals only, or all findings
1002
+ 3. Iterates (push -> re-review) until no findings remain at your chosen level
1003
+ 4. Only brings you in when everything is clean
1004
+ 5. Max 3 iterations to prevent infinite loops
1005
+
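The three levels are cumulative: each one handles everything the previous level does, plus one more kind of finding. A minimal sketch of that decision follows. The function name and the finding kinds (`ci-failure`, `critical`, `suggestion`) are illustrative assumptions; only the level names come from the table above.

```bash
# Cap on autofix rounds, matching the limit described above.
MAX_AUTOFIX_ITERATIONS=3

# Hypothetical check: should autofix act on this finding at this level?
# Returns 0 (yes) or 1 (no).
should_autofix() {
  local level="$1" finding="$2"
  case "$level:$finding" in
    ci-only:ci-failure) return 0 ;;
    criticals:ci-failure|criticals:critical) return 0 ;;
    all-findings:ci-failure|all-findings:critical|all-findings:suggestion) return 0 ;;
    *) return 1 ;;   # unknown level or a finding above the chosen level
  esac
}

# Usage: should_autofix "criticals" "suggestion" || echo "left for the human reviewer"
```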
1006
+ **Check for new plugins periodically:**
1007
+ ```
1008
+ /plugin > Discover
1009
+ ```
1010
+
1011
+ **Re-run `claude-code-setup` periodically** (quarterly, or when your project expands in scope) to catch new automations — MCP servers, hooks, subagents — that weren't relevant at initial setup but are now.
1012
+
1013
+ ### Step 0.4: Auto-Scan Your Project
1014
+
1015
+ **Before asking questions, Claude will automatically scan your project:**
1016
+
1017
+ Claude is language-agnostic and will discover your stack, not assume it:
1018
+
1019
+ ```
1020
+ Claude scans for:
1021
+ ├── Package managers (any language):
1022
+ │ ├── package.json, package-lock.json, pnpm-lock.yaml → Node.js
1023
+ │ ├── Cargo.toml, Cargo.lock → Rust
1024
+ │ ├── go.mod, go.sum → Go
1025
+ │ ├── pyproject.toml, requirements.txt, Pipfile → Python
1026
+ │ ├── Gemfile, Gemfile.lock → Ruby
1027
+ │ ├── build.gradle, pom.xml → Java/Kotlin
1028
+ │ └── ... (any package manifest)
1029
+
1030
+ ├── Source directories: src/, app/, lib/, server/, pkg/, cmd/
1031
+ ├── Test directories: tests/, __tests__/, spec/, *_test.*, test_*.py
1032
+ ├── Test frameworks: detected from config files and test patterns
1033
+ ├── Lint/format tools: from config files
1034
+ ├── CI/CD: .github/workflows/, .gitlab-ci.yml, etc.
1035
+ ├── Feature docs: *_PLAN.md, *_DOCS.md, *_SPEC.md, docs/
1036
+ ├── README, CLAUDE.md, ARCHITECTURE.md
1037
+
1038
+ ├── Deployment targets (for ARCHITECTURE.md environments):
1039
+ │ ├── Dockerfile, docker-compose.yml → Container deployment
1040
+ │ ├── k8s/, kubernetes/, helm/ → Kubernetes
1041
+ │ ├── vercel.json, .vercel/ → Vercel
1042
+ │ ├── netlify.toml → Netlify
1043
+ │ ├── fly.toml → Fly.io
1044
+ │ ├── railway.json, railway.toml → Railway
1045
+ │ ├── render.yaml → Render
1046
+ │ ├── Procfile → Heroku
1047
+ │ ├── app.yaml, appengine/ → Google App Engine
1048
+ │ ├── deploy.sh, deploy/ → Custom scripts
1049
+ │ ├── .github/workflows/deploy*.yml → GitHub Actions deploy
1050
+ │ └── package.json scripts (deploy:*) → npm deploy scripts
1051
+
1052
+ ├── Tool permissions (for allowedTools):
1053
+ │ ├── package.json → Bash(npm *), Bash(node *), Bash(npx *)
1054
+ │ ├── pnpm-lock.yaml → Bash(pnpm *)
1055
+ │ ├── yarn.lock → Bash(yarn *)
1056
+ │ ├── go.mod → Bash(go *)
1057
+ │ ├── Cargo.toml → Bash(cargo *)
1058
+ │ ├── pyproject.toml → Bash(python *), Bash(pip *), Bash(pytest *)
1059
+ │ ├── Gemfile → Bash(ruby *), Bash(bundle *)
1060
+ │ ├── Makefile → Bash(make *)
1061
+ │ ├── docker-compose.yml → Bash(docker *)
1062
+ │ └── .github/workflows/ → Bash(gh *)
1063
+
1064
+ └── Design system (for UI projects):
1065
+ ├── tailwind.config.* → Extract colors, fonts, spacing from theme
1066
+ ├── CSS with --var-name → Extract custom property palette
1067
+ ├── .storybook/ → Reference as design source of truth
1068
+ ├── MUI/Chakra theme files → Reference theming docs + overrides
1069
+ └── /assets/, /images/ → Document asset locations
1070
+ ```
1071
+
1072
+ **If Claude can't detect something, it asks.** Never assumes.
1073
+
1074
+ **Examples are just examples.** The patterns above show common conventions - Claude will discover YOUR actual patterns.
1075
+
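To make the manifest-based detection concrete, here is a rough shell equivalent of the "Package managers" branch of the scan. It is only a sketch of the idea (Claude performs the scan itself rather than running a script like this), and `detect_stack` is a hypothetical name.

```bash
# Hypothetical detector: print one stack name per manifest found in a directory.
detect_stack() {
  local root="${1:-.}"
  [ -f "$root/package.json" ] && echo "Node.js"
  [ -f "$root/Cargo.toml" ]   && echo "Rust"
  [ -f "$root/go.mod" ]       && echo "Go"
  { [ -f "$root/pyproject.toml" ] || [ -f "$root/requirements.txt" ] || [ -f "$root/Pipfile" ]; } && echo "Python"
  [ -f "$root/Gemfile" ]      && echo "Ruby"
  { [ -f "$root/build.gradle" ] || [ -f "$root/pom.xml" ]; } && echo "Java/Kotlin"
  return 0   # finding nothing is not an error
}

# Usage: detect_stack /path/to/project
```

Empty output corresponds to the "can't detect, so ask" case, and multiple lines indicate a polyglot repo.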
1076
+ **Shared vs isolated environments:** Not everyone runs in isolated local dev. Some teams share databases, staging servers, or have infrastructure already running. Claude should ask about your setup - don't assume isolated environments.
1077
+
1078
+ **Claude then presents findings:**
1079
+ ```
1080
+ 📊 Project Scan Results:
1081
+
1082
+ Detected:
1083
+ - Language: TypeScript (tsconfig.json found)
1084
+ - Source: src/
1085
+ - Tests: tests/ (Jest, 47 test files)
1086
+ - Lint: ESLint (.eslintrc.js)
1087
+ - Build: npm run build
1088
+
1089
+ Feature Docs:
1090
+ - Found: AUTH_PLAN.md, PAYMENTS_PLAN.md, API_PLAN.md
1091
+ - Pattern detected: *_PLAN.md (3 files)
1092
+
1093
+ Testing Analysis:
1094
+ - 80% unit tests, 20% integration tests
1095
+ - Heavy mocking detected (jest.mock in 35 files)
1096
+
1097
+ Recommendation: Your current tests rely heavily on mocks.
1098
+ For AI agents, Testing Diamond (integration-heavy) works better.
1099
+ Mocks can "pass" while production fails.
1100
+
1101
+ 🔧 Tool Permissions (detected from stack):
1102
+ Based on your stack, these tools would be useful:
1103
+ - Bash(npm *) ← package.json detected
1104
+ - Bash(node *) ← Node.js project
1105
+ - Bash(npx *) ← npm scripts
1106
+ - Bash(gh *) ← .github/workflows/ detected
1107
+
1108
+ Always included: Read, Edit, Write, Glob, Grep, Task
1109
+
1110
+ Options:
1111
+ [1] Accept suggested permissions (recommended)
1112
+ [2] Customize permissions
1113
+ [3] Skip - I'll manage permissions manually
1114
+
1115
+ 🎨 Design System (UI detected):
1116
+ Found: tailwind.config.js, components/ui/
1117
+
1118
+ Extracted:
1119
+ - Colors: primary (#3B82F6), secondary (#10B981), ...
1120
+ - Fonts: Inter (body), Fira Code (mono)
1121
+ - Breakpoints: sm (640px), md (768px), lg (1024px)
1122
+
1123
+ Options:
1124
+ [1] Generate DESIGN_SYSTEM.md from detected config
1125
+ [2] Point to external design system (Figma, Storybook URL)
1126
+ [3] Skip - no UI work expected in this project
1127
+
1128
+ 🚀 Deployment Targets (auto-detected):
1129
+ Found: vercel.json, .github/workflows/deploy.yml
1130
+
1131
+ Detected environments:
1132
+ - Preview: vercel (auto on PR)
1133
+ - Production: vercel --prod (manual trigger)
1134
+
1135
+ Options:
1136
+ [1] Accept detected deployment config (will populate ARCHITECTURE.md)
1137
+ [2] Let me specify deployment targets manually
1138
+ [3] Skip - no deployment from this project
1139
+
1140
+ 📝 Feature Doc Suffix:
1141
+ Current pattern: *_PLAN.md
1142
+ Recommended: *_DOCS.md (clearer for living documents)
1143
+
1144
+ Options:
1145
+ [1] Keep *_PLAN.md (don't rename existing files)
1146
+ [2] Use *_DOCS.md for NEW docs only (existing stay as-is)
1147
+ [3] Rename all to *_DOCS.md (will rename 3 files)
1148
+ [4] Custom suffix: ____________
1149
+
1150
+ 📄 Feature Doc Structure:
1151
+ Your docs don't follow our recommended structure.
1152
+
1153
+ Your current structure:
1154
+ - AUTH_PLAN.md: Free-form notes, no sections
1155
+ - PAYMENTS_PLAN.md: Has "TODO" and "Notes" sections
1156
+
1157
+ Our recommended structure:
1158
+ - Overview, Architecture, Gotchas, Future Work
1159
+
1160
+ Options:
1161
+ [1] Migrate content into new structure (Claude reorganizes)
1162
+ [2] Create new docs with our structure, archive old ones to /docs/archived/
1163
+ [3] Keep current structure (just be consistent going forward)
1164
+
1165
+ [Accept Recommendations] or [Customize]
1166
+ ```
1167
+
1168
+ **If Claude can't detect something, THEN it asks.**
1169
+
1170
+ ---
1171
+
1172
+ ## Step 1: Confirm or Customize
1173
+
1174
+ Claude presents what it found. You confirm or override:
1175
+
1176
+ ### Project Structure (Auto-Detected)
1177
+
1178
+ **Q1: Source directory:** `src/` ✓ detected
1179
+ ```
1180
+ Override? (leave blank to accept): _______________
1181
+ ```
1182
+
1183
+ **Q2: Where do your tests live?**
1184
+ ```
1185
+ Examples: tests/, __tests__/, src/**/*.test.ts, spec/
1186
+ Your answer: _______________
1187
+ ```
1188
+
1189
+ **Q3: What's your test framework?**
1190
+ ```
1191
+ Options: Jest, Vitest, Playwright, Cypress, pytest, Go testing, other
1192
+ Your answer: _______________
1193
+ ```
1194
+
1195
+ ### Commands
1196
+
1197
+ **Q4: What runs your linter?**
1198
+ ```
1199
+ Examples: npm run lint, pnpm lint, eslint ., biome check
1200
+ Your answer: _______________
1201
+ ```
1202
+
1203
+ **Q5: What runs type checking?**
1204
+ ```
1205
+ Examples: npm run typecheck, tsc --noEmit, mypy, none
1206
+ Your answer: _______________
1207
+ ```
1208
+
1209
+ **Q6: What runs all tests?**
1210
+ ```
1211
+ Examples: npm run test, pnpm test, pytest, go test ./...
1212
+ Your answer: _______________
1213
+ ```
1214
+
1215
+ **Q7: What runs a specific test file?**
1216
+ ```
1217
+ Examples: npm run test -- path/to/test.ts, pytest path/to/test.py
1218
+ Your answer: _______________
1219
+ ```
1220
+
1221
+ **Q8: What builds for production?**
1222
+ ```
1223
+ Examples: npm run build, pnpm build, go build, cargo build
1224
+ Your answer: _______________
1225
+ ```
1226
+
1227
+ ### Deployment
1228
+
1229
+ **Q8.5: How do you deploy? (auto-detected, confirm or override)**
1230
+ ```
1231
+ Detected: [e.g., Vercel, GitHub Actions, Docker, none]
1232
+
1233
+ Environments (will populate ARCHITECTURE.md):
1234
+ ┌─────────────┬──────────────────────┬────────────────────────┐
1235
+ │ Environment │ Trigger │ Deploy Command │
1236
+ ├─────────────┼──────────────────────┼────────────────────────┤
1237
+ │ Preview │ Auto on PR │ vercel │
1238
+ │ Staging │ Push to staging │ [your staging deploy] │
1239
+ │ Production │ Manual / push main │ vercel --prod │
1240
+ └─────────────┴──────────────────────┴────────────────────────┘
1241
+
1242
+ Options:
1243
+ [1] Accept detected config (recommended)
1244
+ [2] Customize environments
1245
+ [3] No deployment config needed
1246
+
1247
+ Your answer: _______________
1248
+ ```
1249
+
1250
+ ### Infrastructure
1251
+
1252
+ **Q9: What database(s) do you use?**
1253
+ ```
1254
+ Examples: PostgreSQL, MySQL, SQLite, MongoDB, none
1255
+ Your answer: _______________
1256
+ ```
1257
+
1258
+ **Q10: Do you use caching (Redis, etc.)?**
1259
+ ```
1260
+ Examples: Redis, Memcached, none
1261
+ Your answer: _______________
1262
+ ```
1263
+
1264
+ **Q11: How long do your tests take?**
1265
+ ```
1266
+ Examples: <1 minute, 1-5 minutes, 5+ minutes
1267
+ Your answer: _______________
1268
+ ```
1269
+
1270
+ ### Output Preferences
1271
+
1272
+ **Q12: How much detail in Claude's responses?**
1273
+ ```
1274
+ Options:
1275
+ - Small - Minimal output, just essentials (experienced users)
1276
+ - Medium - Balanced detail (default, recommended)
1277
+ - Large - Verbose output, full explanations (learning/debugging)
1278
+ Your answer: _______________
1279
+ ```
1280
+
1281
+ This setting affects:
1282
+ - TodoWrite verbosity (brief vs detailed task descriptions)
1283
+ - Planning output (summary vs comprehensive breakdown)
1284
+ - Self-review comments (concise vs thorough)
1285
+
1286
+ Stored in `.claude/settings.json` as `"verbosity": "small|medium|large"`.
1287
+
1288
+ ### Testing Philosophy
1289
+
1290
+ **Q13: What's your testing approach?**
1291
+ ```
1292
+ Options:
1293
+ - Strict TDD (test first always)
1294
+ - Test-after (write tests after implementation)
1295
+ - Mixed (depends on the feature)
1296
+ - Minimal (just critical paths)
1297
+ - None yet (want to start)
1298
+ Your answer: _______________
1299
+ ```
1300
+
1301
+ **Q14: What types of tests do you want?**
1302
+ ```
1303
+ (Check all that apply)
1304
+ [ ] Unit tests (pure logic, isolated)
1305
+ [ ] Integration tests (real DB, real services)
1306
+ [ ] E2E tests (Playwright, Cypress, etc.)
1307
+ [ ] API tests (endpoint testing)
1308
+ [ ] Other: _______________
1309
+ ```
1310
+
1311
+ **Q15: Your mocking philosophy?**
1312
+ ```
1313
+ Options:
1314
+ - Minimal mocking (real DB, mock external APIs only)
1315
+ - Heavy mocking (mock most dependencies)
1316
+ - No mocking (everything real, even external)
1317
+ - Not sure yet
1318
+ Your answer: _______________
1319
+ ```
1320
+
1321
+ ### Code Coverage (Optional)
1322
+
1323
+ **If test framework detected (Jest, pytest, Go, etc.):**
1324
+
1325
+ ```
1326
+ Q16: Code Coverage (Optional)
1327
+
1328
+ Detected: [test framework] with coverage configuration
1329
+
1330
+ Traditional Coverage:
1331
+ [1] Enforce threshold in CI (e.g., 80%) - Fail build if coverage drops
1332
+ [2] Report but don't enforce - Track coverage without blocking
1333
+ [3] Skip traditional coverage
1334
+
1335
+ AI Coverage Suggestions:
1336
+ [4] Enable AI-suggested coverage gaps in PR reviews
1337
+ (Claude notes: "You changed X but didn't add tests for edge case Y")
1338
+ [5] Skip AI suggestions
1339
+
1340
+ (You can choose one from each group, or skip both)
1341
+ Your answer: _______________
1342
+ ```
1343
+
1344
+ **If no test framework detected (docs/AI-heavy project):**
1345
+
1346
+ ```
1347
+ Q16: Code Coverage (Optional)
1348
+
1349
+ No test framework detected (documentation/AI-heavy project).
1350
+
1351
+ Options:
1352
+ [1] AI-suggested coverage gaps in PR reviews (Recommended)
1353
+ (Claude notes when changes affect behavior but lack test scenarios)
1354
+ [2] Skip - not needed for this project
1355
+
1356
+ Your answer: _______________
1357
+ ```
1358
+
1359
+ **How they work:**
1360
+ - **Traditional coverage:** Deterministic line/branch/function percentages via nyc, c8, coverage.py, etc.
1361
+ - **AI coverage suggestions:** Claude analyzes changes and suggests missing test cases based on context
1362
+
1363
+ **Not mutually exclusive:** Both can be used together for comprehensive coverage awareness.
1364
+
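Option [1], enforcing a threshold in CI, boils down to one numeric comparison. The sketch below assumes the coverage percentage has already been extracted from your tool's report. `check_coverage` is a hypothetical helper, and the 80% default is just the example figure from the prompt above.

```bash
# Hypothetical CI gate: succeed only if coverage meets the threshold.
# Both arguments are percentages; awk handles the decimal comparison.
check_coverage() {
  local actual="$1" threshold="${2:-80}"
  if awk -v a="$actual" -v t="$threshold" 'BEGIN { exit !(a + 0 >= t + 0) }'; then
    echo "Coverage ${actual}% meets threshold ${threshold}%"
  else
    echo "Coverage ${actual}% is below threshold ${threshold}%" >&2
    return 1
  fi
}

# Usage in a CI step: check_coverage "$COVERAGE_PCT" 80
```

For option [2] (report without enforcing), call the function and ignore its exit status.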
1365
+ ---
1366
+
1367
+ ### Using Your Answers
1368
+
1369
+ Your answers map to these files:
1370
+
1371
+ | Question | Used In |
1372
+ |----------|---------|
1373
+ | Q1 (source dir) | `tdd-pretool-check.sh` - pattern match |
1374
+ | Q2 (test dir) | `TESTING.md` - documentation |
1375
+ | Q3 (test framework) | `TESTING.md` - documentation |
1376
+ | Q4-Q8 (commands) | `CLAUDE.md` - Commands section |
1377
+ | Q9-Q10 (infra) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
1378
+ | Q11 (test duration) | `SDLC skill` - wait time note |
1379
+ | Q14 (test types, incl. E2E) | `testing skill` - testing diamond top |
1380
+
1381
+ ---
1382
+
1383
+ ## Step 2: Create Directory Structure
1384
+
1385
+ Create these directories in your project root:
1386
+
1387
+ ```bash
1388
+ mkdir -p .claude/hooks
1389
+ mkdir -p .claude/skills/sdlc
1390
+ mkdir -p .claude/skills/testing
1391
+ ```
1392
+
1393
+ **Commit to Git:** Yes! These files should be committed so your whole team gets the same SDLC enforcement. When teammates pull, they get the hooks and skills automatically.
1394
+
1395
+ Your structure should look like:
1396
+ ```
1397
+ your-project/
1398
+ ├── .claude/
1399
+ │ ├── hooks/
1400
+ │ │ ├── sdlc-prompt-check.sh (we'll create)
1401
+ │ │ └── tdd-pretool-check.sh (we'll create)
1402
+ │ ├── skills/
1403
+ │ │ ├── sdlc/
1404
+ │ │ │ └── SKILL.md (we'll create)
1405
+ │ │ └── testing/
1406
+ │ │ └── SKILL.md (we'll create)
1407
+ │ └── settings.json (we'll create)
1408
+ ├── CLAUDE.md (we'll create)
1409
+ ├── SDLC.md (we'll create)
1410
+ └── TESTING.md (we'll create)
1411
+ ```
1412
+
1413
+ ---
1414
+
1415
+ ## Step 3: Create settings.json
1416
+
1417
+ Create `.claude/settings.json`:
1418
+
1419
+ ```json
1420
+ {
1421
+ "verbosity": "medium",
1422
+ "allowedTools": [
1423
+ "Read",
1424
+ "Edit",
1425
+ "Write",
1426
+ "Glob",
1427
+ "Grep",
1428
+ "Task",
1429
+ "Bash(npm *)",
1430
+ "Bash(node *)",
1431
+ "Bash(npx *)",
1432
+ "Bash(gh *)"
1433
+ ],
1434
+ "hooks": {
1435
+ "UserPromptSubmit": [
1436
+ {
1437
+ "hooks": [
1438
+ {
1439
+ "type": "command",
1440
+ "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/sdlc-prompt-check.sh"
1441
+ }
1442
+ ]
1443
+ }
1444
+ ],
1445
+ "PreToolUse": [
1446
+ {
1447
+ "matcher": "Write|Edit|MultiEdit",
1448
+ "hooks": [
1449
+ {
1450
+ "type": "command",
1451
+ "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/tdd-pretool-check.sh"
1452
+ }
1453
+ ]
1454
+ }
1455
+ ]
1456
+ }
1457
+ }
1458
+ ```
1459
+
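The `command` entries above execute the hook scripts directly, so the scripts need to exist and carry the executable bit once they have been created. A minimal sketch of that check follows; `finalize_hooks` is a hypothetical helper, and the two file names are the ones from the directory layout in Step 2.

```bash
# Hypothetical post-setup check: verify both hook scripts exist, then make
# them executable so the "command" entries in settings.json can run them.
finalize_hooks() {
  local root="${1:-.}" hook
  for hook in sdlc-prompt-check.sh tdd-pretool-check.sh; do
    if [ ! -f "$root/.claude/hooks/$hook" ]; then
      echo "missing hook: $hook" >&2
      return 1
    fi
    chmod +x "$root/.claude/hooks/$hook"
  done
  echo "hooks ready"
}

# Usage (from the project root, after the hook files are written): finalize_hooks .
```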
1460
+ ### Allowed Tools (Adaptive)
1461
+
1462
+ The `allowedTools` array is auto-generated based on your stack detected in Step 0.4.
1463
+
1464
+ | If Detected | Tools Added |
1465
+ |-------------|-------------|
1466
+ | `package.json` | `Bash(npm *)`, `Bash(node *)`, `Bash(npx *)` |
1467
+ | `pnpm-lock.yaml` | `Bash(pnpm *)` |
1468
+ | `yarn.lock` | `Bash(yarn *)` |
1469
+ | `go.mod` | `Bash(go *)` |
1470
+ | `Cargo.toml` | `Bash(cargo *)` |
1471
+ | `pyproject.toml` | `Bash(python *)`, `Bash(pip *)`, `Bash(pytest *)` |
1472
+ | `Gemfile` | `Bash(ruby *)`, `Bash(bundle *)` |
1473
+ | `Makefile` | `Bash(make *)` |
1474
+ | `docker-compose.yml` | `Bash(docker *)` |
1475
+ | `.github/workflows/` | `Bash(gh *)` |
1476
+
1477
+ **CI monitoring commands** (covered by `Bash(gh *)` above):
1478
+ - `gh pr checks` / `gh pr checks --watch` - watch CI status
1479
+ - `gh run view <RUN_ID> --log-failed` - read failure logs
1480
+ - `gh run list` - find workflow runs
1481
+
1482
+ **Always included:** `Read`, `Edit`, `Write`, `Glob`, `Grep`, `Task`
1483
+
1484
+ **Why this matters:** Explicitly listing allowed tools:
1485
+ - Prevents unexpected tool usage
1486
+ - Makes permissions visible and auditable
1487
+ - Reduces prompts for approval during work
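
The detection table above amounts to a series of file-existence checks. The sketch below illustrates that mapping only; it is not the wizard's actual implementation, and `emit_if_exists` and `detect_allowed_tools` are hypothetical helper names introduced here.

```shell
#!/bin/bash
# Hypothetical helper: print the given entries only if the path exists.
emit_if_exists() {
  local path="$1"
  shift
  if [ -e "$path" ]; then
    printf '%s\n' "$@"
  fi
}

# Hypothetical helper: print one allowedTools entry per line for the
# project rooted at $1 (defaults to the current directory).
detect_allowed_tools() {
  local root="${1:-.}"
  # Always included, per the section above.
  printf '%s\n' Read Edit Write Glob Grep Task
  emit_if_exists "$root/package.json" 'Bash(npm *)' 'Bash(node *)' 'Bash(npx *)'
  emit_if_exists "$root/pnpm-lock.yaml" 'Bash(pnpm *)'
  emit_if_exists "$root/yarn.lock" 'Bash(yarn *)'
  emit_if_exists "$root/go.mod" 'Bash(go *)'
  emit_if_exists "$root/Cargo.toml" 'Bash(cargo *)'
  emit_if_exists "$root/pyproject.toml" 'Bash(python *)' 'Bash(pip *)' 'Bash(pytest *)'
  emit_if_exists "$root/Gemfile" 'Bash(ruby *)' 'Bash(bundle *)'
  emit_if_exists "$root/Makefile" 'Bash(make *)'
  emit_if_exists "$root/docker-compose.yml" 'Bash(docker *)'
  emit_if_exists "$root/.github/workflows" 'Bash(gh *)'
}
```

Running `detect_allowed_tools .` in a project root prints the entries you would expect to see in `allowedTools`, which makes it easy to compare against the generated `settings.json`.
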
1488
+
1489
+ ### Verbosity Levels
1490
+
1491
+ | Level | Output Style |
1492
+ |-------|--------------|
1493
+ | `small` | Brief, minimal output. Task names are short. Less explanation. |
1494
+ | `medium` | Balanced (default). Clear explanations without excessive detail. |
1495
+ | `large` | Verbose. Full reasoning, detailed breakdowns. Good for learning. |
1496
+
1497
+ ### Why These Hooks?
1498
+
1499
+ | Hook | When It Fires | Purpose |
1500
+ |------|---------------|---------|
1501
+ | `UserPromptSubmit` | Every message you send | Baseline SDLC reminder, skill auto-invoke |
1502
+ | `PreToolUse` | Before Claude edits files | TDD check: "Did you write the test first?" |
1503
+
1504
+ ### How Skill Auto-Invoke Works
1505
+
1506
+ The light hook (`sdlc-prompt-check.sh`, created in Step 4) outputs text that **instructs Claude** to invoke skills:
1507
+
1508
+ ```
1509
+ AUTO-INVOKE SKILLS (Claude MUST do this FIRST):
1510
+ - implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
1511
+ - test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
1512
+ ```
1513
+
1514
+ **This is text-based, not programmatic.** Claude reads this instruction and follows it. When Claude sees your message is an implementation task, it invokes the sdlc skill using the Skill tool. This loads the full SDLC guidance into context.
1515
+
1516
+ **Why text-based works:** Claude Code's hook system allows hooks to add context that Claude reads. Claude is instructed to follow the AUTO-INVOKE rules, and it does. The skills then load detailed guidance only when needed.
1517
+
1518
+ ### Why No PostToolUse Hook?
1519
+
1520
+ **PostToolUse fires after EVERY individual edit.** If Claude makes 10 edits, it fires 10 times.
1521
+
1522
+ Running lint/typecheck after every edit is wasteful. Instead, lint/typecheck is a checklist step in the SDLC skill - run once after all edits, before tests.
1523
+
1524
+ ---
1525
+
1526
+ ## Step 4: Create the Light Hook
1527
+
1528
+ Create `.claude/hooks/sdlc-prompt-check.sh`:
1529
+
1530
+ ```bash
1531
+ #!/bin/bash
1532
+ # Light SDLC hook - baseline reminder every prompt (~100 tokens)
1533
+ # Full guidance in skills: .claude/skills/sdlc/ and .claude/skills/testing/
1534
+
1535
+ cat << 'EOF'
1536
+ SDLC BASELINE:
1537
+ 1. TodoWrite FIRST (plan tasks before coding)
1538
+ 2. STATE CONFIDENCE: HIGH/MEDIUM/LOW
1539
+ 3. LOW confidence? ASK USER before proceeding
1540
+ 4. FAILED 2x? STOP and ASK USER
1541
+ 5. 🛑 ALL TESTS MUST PASS BEFORE COMMIT - NO EXCEPTIONS
1542
+
1543
+ AUTO-INVOKE SKILLS (Claude MUST do this FIRST):
1544
+ - implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
1545
+ - test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
1546
+ - If BOTH match (e.g., "fix the test") → sdlc takes precedence (includes TDD)
1547
+ - DON'T invoke for: questions, explanations, reading/exploring code, simple queries
1548
+ - DON'T wait for user to type /sdlc - AUTO-INVOKE based on task type
1549
+
1550
+ Workflow phases:
1551
+ 1. Plan Mode (research) → Present approach + confidence
1552
+ 2. Transition (update docs) → Request /compact
1553
+ 3. Implementation (TDD after compact)
1554
+ 4. SELF-REVIEW (/code-review) → BEFORE presenting to user
1555
+
1556
+ Quick refs: SDLC.md | TESTING.md | *_PLAN.md for feature
1557
+ EOF
1558
+ ```
1559
+
1560
+ **Make it executable:**
1561
+ ```bash
1562
+ chmod +x .claude/hooks/sdlc-prompt-check.sh
1563
+ ```
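
Because the hook is plain text on stdout, you can check it without starting Claude Code. The sketch below assumes the hook path from this step; `check_light_hook` is a hypothetical helper name, and it only confirms that the two section headings Claude is expected to read are present.

```shell
#!/bin/bash
# Hypothetical smoke check: run the light hook and confirm it prints
# both the baseline rules and the auto-invoke rules.
check_light_hook() {
  local hook="${1:-.claude/hooks/sdlc-prompt-check.sh}"
  local output
  if [ ! -x "$hook" ]; then
    echo "not executable: $hook" >&2
    return 1
  fi
  output=$("$hook") || return 1
  printf '%s\n' "$output" | grep -q 'SDLC BASELINE' || return 1
  printf '%s\n' "$output" | grep -q 'AUTO-INVOKE SKILLS' || return 1
}
```

Call `check_light_hook` from the project root after the `chmod` step. A non-zero exit status means the script is missing, not executable, or no longer prints one of the two sections.
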
1564
+
1565
+ ---
1566
+
1567
+ ## Step 5: Create the TDD Hook
1568
+
1569
+ Create `.claude/hooks/tdd-pretool-check.sh`:
1570
+
1571
+ ```bash
1572
+ #!/bin/bash
1573
+ # PreToolUse hook - TDD enforcement before editing source files
1574
+ # Fires before Write/Edit/MultiEdit tools
1575
+
1576
+ # Read the hook payload from stdin (JSON; the tool's arguments are nested under .tool_input)
1577
+ TOOL_INPUT=$(cat)
1578
+
1579
+ # Extract the file path being edited (requires jq)
1580
+ FILE_PATH=$(echo "$TOOL_INPUT" | jq -r '.tool_input.file_path // empty')
1581
+
1582
+ # CUSTOMIZE: Change this pattern to match YOUR source directory
1583
+ # Examples: "/src/", "/app/", "/lib/", "/packages/", "/server/"
1584
+ if [[ "$FILE_PATH" == *"/src/"* ]]; then
1585
+ # Output additionalContext that Claude will read
1586
+ cat << 'EOF'
1587
+ {"hookSpecificOutput": {"hookEventName": "PreToolUse", "additionalContext": "TDD CHECK: Are you writing IMPLEMENTATION before a FAILING TEST? If yes, STOP. Write the test first (TDD RED), then implement (TDD GREEN)."}}
1588
+ EOF
1589
+ fi
1590
+
1591
+ # No output = allow the tool to proceed
1592
+ ```
1593
+
1594
+ **CUSTOMIZE:**
1595
+ 1. Replace `"/src/"` with your source directory pattern
1596
+ 2. Ensure `jq` is installed (or adapt to your preferred JSON parser)
1597
+
1598
+ **Make it executable:**
1599
+ ```bash
1600
+ chmod +x .claude/hooks/tdd-pretool-check.sh
1601
+ ```
1602
+
1603
+ **Alternative implementations:** You can write this hook in any language. The hook receives JSON on stdin and outputs JSON. See Claude Code docs for hook input/output format.
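
If your implementation code lives in more than one directory, the single `"/src/"` pattern from the CUSTOMIZE note can be replaced with a small matcher. The sketch below is one possible shape, not part of the wizard: `needs_tdd_reminder` is a hypothetical name, the directory list is an example, and the test-file patterns assume common naming conventions (`*.test.*`, `*.spec.*`, `__tests__/`, `tests/`) that may not match your project.

```shell
#!/bin/bash
# Hypothetical helper for tdd-pretool-check.sh: return 0 only when the path
# looks like implementation code that should trigger the TDD reminder.
needs_tdd_reminder() {
  case "$1" in
    # Test files never need the reminder: writing them is the RED step.
    *.test.*|*.spec.*|*/__tests__/*|*/tests/*) return 1 ;;
    # CUSTOMIZE: list every directory that holds implementation code.
    */src/*|*/app/*|*/lib/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

In the hook from this step, the `if [[ "$FILE_PATH" == *"/src/"* ]]; then` line would become `if needs_tdd_reminder "$FILE_PATH"; then`. To exercise the finished hook by hand, pipe a sample payload into it: `printf '%s' '{"tool_input":{"file_path":"/repo/src/a.ts"}}' | .claude/hooks/tdd-pretool-check.sh` (this requires `jq`, as the hook itself does).
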
1604
+
1605
+ ---
1606
+
1607
+ ## Step 6: Create SDLC Skill
1608
+
1609
+ Create `.claude/skills/sdlc/SKILL.md`:
1610
+
1611
+ ````markdown
1612
+ ---
1613
+ name: sdlc
1614
+ description: Full SDLC workflow for implementing features, fixing bugs, refactoring code, and creating new functionality. Use this skill when implementing, fixing, refactoring, adding features, or building new code.
1615
+ argument-hint: [task description]
1616
+ ---
1617
+ # SDLC Skill - Full Development Workflow
1618
+
1619
+ ## Task
1620
+ $ARGUMENTS
1621
+
1622
+ ## Full SDLC Checklist
1623
+
1624
+ Your FIRST action must be TodoWrite with these steps:
1625
+
1626
+ ```
1627
+ TodoWrite([
1628
+ // PLANNING PHASE (Plan Mode for non-trivial tasks)
1629
+ { content: "Find and read relevant documentation", status: "in_progress", activeForm: "Reading docs" },
1630
+ { content: "Assess doc health - flag issues (ask before cleaning)", status: "pending", activeForm: "Checking doc health" },
1631
+ { content: "DRY scan: What patterns exist to reuse?", status: "pending", activeForm: "Scanning for reusable patterns" },
1632
+ { content: "Blast radius: What depends on code I'm changing?", status: "pending", activeForm: "Checking dependencies" },
1633
+ { content: "Restate task in own words - verify understanding", status: "pending", activeForm: "Verifying understanding" },
1634
+ { content: "Scrutinize test design - right things tested? Follow TESTING.md?", status: "pending", activeForm: "Reviewing test approach" },
1635
+ { content: "Present approach + STATE CONFIDENCE LEVEL", status: "pending", activeForm: "Presenting approach" },
1636
+ { content: "Signal ready - user exits plan mode", status: "pending", activeForm: "Awaiting plan approval" },
1637
+ // TRANSITION PHASE (After plan mode, before compact)
1638
+ { content: "Update feature docs with discovered gotchas", status: "pending", activeForm: "Updating feature docs" },
1639
+ { content: "Request /compact before TDD", status: "pending", activeForm: "Requesting compact" },
1640
+ // IMPLEMENTATION PHASE (After compact)
1641
+ { content: "TDD RED: Write failing test FIRST", status: "pending", activeForm: "Writing failing test" },
1642
+ { content: "TDD GREEN: Implement, verify test passes", status: "pending", activeForm: "Implementing feature" },
1643
+ { content: "Run lint/typecheck", status: "pending", activeForm: "Running lint and typecheck" },
1644
+ { content: "Run ALL tests", status: "pending", activeForm: "Running all tests" },
1645
+ { content: "Production build check", status: "pending", activeForm: "Verifying production build" },
1646
+ // REVIEW PHASE
1647
+ { content: "DRY check: Is logic duplicated elsewhere?", status: "pending", activeForm: "Checking for duplication" },
1648
+ { content: "Self-review: run /code-review", status: "pending", activeForm: "Running code review" },
1649
+ { content: "Security review (if warranted)", status: "pending", activeForm: "Checking security implications" },
1650
+ { content: "Cross-model review (if configured — see below)", status: "pending", activeForm: "Running cross-model review" },
1651
+ // CI FEEDBACK LOOP (After local tests pass)
1652
+ { content: "Commit and push to remote", status: "pending", activeForm: "Pushing to remote" },
1653
+ { content: "Watch CI - fix failures, iterate until green (max 2x)", status: "pending", activeForm: "Watching CI" },
1654
+ { content: "Read CI review - implement valid suggestions, iterate until clean", status: "pending", activeForm: "Addressing CI review feedback" },
1655
+ // FINAL
1656
+ { content: "Present summary: changes, tests, CI status", status: "pending", activeForm: "Presenting final summary" }
1657
+ ])
1658
+ ```
1659
+
1660
+ ## New Pattern & Test Design Scrutiny (PLANNING)
1661
+
1662
+ **New design patterns require human approval:**
1663
+ 1. Search first - do similar patterns exist in codebase?
1664
+ 2. If YES and they're good - use as building block
1665
+ 3. If YES but they're bad - propose improvement, get approval
1666
+ 4. If NO (new pattern) - explain why needed, get explicit approval
1667
+
1668
+ **Test design scrutiny during planning:**
1669
+ - Are we testing the right things?
1670
+ - Does test approach follow TESTING.md philosophies?
1671
+ - If introducing new test patterns, same scrutiny as code patterns
1672
+
1673
+ ## Plan Mode Integration
1674
+
1675
+ **Use plan mode for:** Multi-file changes, new features, LOW confidence, bugs needing investigation.
1676
+
1677
+ **Workflow:**
1678
+ 1. **Plan Mode** (editing blocked): Research → Write plan file → Present approach + confidence
1679
+ 2. **Transition** (after approval): Update feature docs → Request /compact
1680
+ 3. **Implementation** (after compact): TDD RED → GREEN → PASS
1681
+
1682
+ **Before TDD, MUST ask:** "Docs updated. Run `/compact` before implementation?"
1683
+
1684
+ ## Confidence Check (REQUIRED)
1685
+
1686
+ Before presenting approach, STATE your confidence:
1687
+
1688
+ | Level | Meaning | Action |
1689
+ |-------|---------|--------|
1690
+ | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval |
1691
+ | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties |
1692
+ | LOW (<60%) | Not sure | ASK USER before proceeding |
1693
+ | FAILED 2x | Something's wrong | STOP. ASK USER immediately |
1694
+ | CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help |
1695
+
1696
+ ## Self-Review Loop (CRITICAL)
1697
+
1698
+ ```
1699
+ PLANNING → DOCS → TDD RED → TDD GREEN → Tests Pass → Self-Review
1699
+     ↑                                                     │
1700
+     │                                                     ↓
1701
+     │                                               Issues found?
1702
+     │                                               ├── NO → Present to user
1703
+     │                                               └── YES ↓
1704
+     └────────────────────────────────────────────── Ask user: fix in new plan?
1706
+ ```
1707
+
1708
+ **The loop goes back to PLANNING, not TDD RED.** When self-review finds issues:
1709
+ 1. Ask user: "Found issues. Want to create a plan to fix?"
1710
+ 2. If yes → back to PLANNING phase with new plan doc
1711
+ 3. Then → docs update → TDD → review (proper SDLC loop)
1712
+
1713
+ **How to self-review:**
1714
+ 1. Run `/code-review` to review your changes
1715
+ 2. It launches parallel agents (CLAUDE.md compliance, bug detection, logic & security)
1716
+ 3. Issues at confidence >= 80 are real findings — go back to PLANNING to fix
1717
+ 4. Issues below 80 are likely false positives — skip unless obviously valid
1718
+ 5. Address issues by going back through the proper SDLC loop
1719
+
1720
+ ## Cross-Model Review (If Configured)
1721
+
1722
+ **When to run:** High-stakes changes (auth, payments, data handling), complex refactors, research-heavy work.
1723
+ **When to skip:** Trivial changes (typo fixes, config tweaks), time-sensitive hotfixes, and changes where the risk is lower than the cost of a review.
1724
+
1725
+ **Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
1726
+
1727
+ **Steps:**
1728
+ 1. After self-review passes, write `.reviews/handoff.json`:
1729
+ ```jsonc
1730
+ {
1731
+ "review_id": "feature-xyz-001",
1732
+ "status": "PENDING_REVIEW",
1733
+ "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
1734
+ "review_instructions": "Review for security, edge cases, and correctness",
1735
+ "artifact_path": ".reviews/feature-xyz-001/"
1736
+ }
1737
+ ```
1738
+ 2. Tell the user to run the independent reviewer:
1739
+ ```bash
1740
+ codex exec \
1741
+ -c 'model_reasoning_effort="xhigh"' \
1742
+ -s danger-full-access \
1743
+ -o .reviews/latest-review.md \
1744
+ "You are an independent code reviewer. Read .reviews/handoff.json, \
1745
+ review the listed files, and write your findings to the artifact_path. \
1746
+ End with CERTIFIED or NOT CERTIFIED."
1747
+ ```
1748
+ 3. Read `.reviews/latest-review.md` — if CERTIFIED, proceed to CI. If NOT CERTIFIED, fix findings and repeat from step 1.
1749
+
1750
+ ```
1751
+ Self-review passes → write handoff.json → user runs codex exec
1751
+      ^                                              |
1752
+      |                                         CERTIFIED? → YES → CI feedback loop
1753
+      |                                              |
1754
+      |                                              → NO (findings)
1755
+      |                                              |
1756
+      └──────── Fix findings ←───────────────────────┘
1757
+          (repeat until CERTIFIED, or ask user)
1759
+ ```
1760
+
1761
+ **Tool-agnostic:** The value is adversarial diversity (different model, different blind spots), not the specific tool. Any competing AI reviewer works.
1762
+
1763
+ **Full protocol:** See the "Cross-Model Review Loop (Optional)" section below for key flags and reasoning effort guidance.
1764
+
1765
+ ## Test Review (Harder Than Implementation)
1766
+
1767
+ During self-review, critique tests HARDER than app code:
1768
+ 1. **Testing the right things?** - Not just that tests pass
1769
+ 2. **Tests prove correctness?** - Or just verify current behavior?
1770
+ 3. **Follow our philosophies (TESTING.md)?**
1771
+ - Testing Diamond (integration-heavy)?
1772
+ - Minimal mocking (real DB, mock external APIs only)?
1773
+ - Real fixtures from captured data?
1774
+
1775
+ **Tests are the foundation.** Bad tests = false confidence = production bugs.
1776
+
1777
+ ## Scope Guard (Stay in Your Lane)
1778
+
1779
+ **Only make changes directly related to the task.**
1780
+
1781
+ If you notice something else that should be fixed:
1782
+ - ✅ NOTE it in your summary ("I noticed X could be improved")
1783
+ - ❌ DON'T fix it unless asked
1784
+
1785
+ **Why this matters:** AI agents can drift into "helpful" changes that weren't requested. This creates unexpected diffs, breaks unrelated things, and makes code review harder.
1786
+
1787
+ ## Test Failure Recovery (SDET Philosophy)
1788
+
1789
+ **🛑 ALL TESTS MUST PASS BEFORE COMMIT**
1790
+
1791
+ **Treat test code like app code.** Test failures are bugs. Investigate them the way a 15-year SDET would - with thought and care, not by brushing them aside.
1792
+
1793
+ If tests fail:
1794
+ 1. Identify which test(s) failed
1795
+ 2. Diagnose WHY - this is the important part:
1796
+ - Your code broke it? Fix your code (regression)
1797
+ - Test is for deleted code? Delete the test
1798
+ - Test has wrong assertions? Fix the test
1799
+ - Test is "flaky"? Investigate - flakiness is just another word for bug
1800
+ 3. Fix appropriately (fix code, fix test, or delete dead test)
1801
+ 4. Run specific test individually first
1802
+ 5. Then run ALL tests
1803
+ 6. Still failing? ASK USER - don't spin your wheels
1804
+
1805
+ **Flaky tests are bugs, not mysteries:**
1806
+ - Sometimes the bug is in app code (race condition, timing issue)
1807
+ - Sometimes the bug is in test code (shared state, not parallel-safe)
1808
+ - Sometimes the bug is in the test environment (improper cleanup)
1809
+
1810
+ Debug it. Find root cause. Fix it properly. Tests ARE code.
1811
+
1812
+ ## Flaky Test Prevention
1813
+
1814
+ **Flaky tests are bugs. Period.** They erode trust in the test suite, slow down teams, and mask real regressions.
1815
+
1816
+ ### Principles
1817
+
1818
+ 1. **Treat test code like app code** — same code review standards, same quality bar. Tests are first-class citizens, not afterthoughts.
1819
+
1820
+ 2. **Investigate every flaky failure** — never ignore a flaky test. It's a bug somewhere in one of three layers:
1821
+ - **Test code** — shared state, not parallel-safe, timing assumptions, missing cleanup
1822
+ - **App code** — race condition, unhandled edge case, non-deterministic behavior
1823
+ - **Environment/infra** — CI runner flakiness, resource contention, external service instability
1824
+
1825
+ 3. **Stress-test new tests** — run new or modified tests N times before merge to sniff out flakiness early. A test that passes 1x but fails on run 50 has a bug.
1826
+
1827
+ 4. **Isolate testing environments** — sanitize state between tests. Don't share databases. Clean up properly. Each test should be independently runnable.
1828
+
1829
+ 5. **Address flakiness immediately** — momentum matters. The longer a flaky test lives, the more trust erodes and the harder root cause becomes to find.
1830
+
1831
+ 6. **Quarantine only if actively fixing** — quarantine is a temporary holding pen, not a permanent ignore. If a test is quarantined for more than a sprint, it needs attention or deletion.
1832
+
1833
+ 7. **Track flaky rates** — you can't fix what you don't measure. Know which tests are flaky and how often.
1834
+
1835
+ ### When the Bug Is in CI Infrastructure
1836
+
1837
+ Sometimes the flakiness is genuinely in CI infrastructure (runner environment, GitHub Actions internals, third-party action bugs). When this happens:
1838
+ - **Make cosmetic steps non-blocking** — PR comments, notifications, and reports should use `continue-on-error: true`
1839
+ - **Keep quality gates strict** — the actual pass/fail decision must NOT have `continue-on-error`
1840
+ - **Separate "fail the build" from "nice to have"** — a missing PR comment is not a regression
1841
+
1842
+ ## CI Feedback Loop (After Commit)
1843
+
1844
+ **The SDLC doesn't end at local tests.** CI must pass too.
1845
+
1846
+ ```
1847
+ Local tests pass -> Commit -> Push -> Watch CI
1847
+                                           |
1848
+                               CI passes? -+-> YES -> Present for review
1849
+                                           |
1850
+                                           +-> NO -> Fix -> Push -> Watch CI
1851
+                                                      |
1852
+                                              (max 2 attempts)
1853
+                                                      |
1854
+                                               Still failing?
1855
+                                                      |
1856
+                                              STOP and ASK USER
1858
+ ```
1859
+
1860
+ **How to watch CI:**
1861
+ 1. Push changes to remote
1862
+ 2. Check CI status:
1863
+ ```bash
1864
+ # Watch checks in real-time (blocks until complete)
1865
+ gh pr checks --watch
1866
+
1867
+ # Or check status without blocking
1868
+ gh pr checks
1869
+
1870
+ # View specific failed run logs
1871
+ gh run view <RUN_ID> --log-failed
1872
+ ```
1873
+ 3. If CI fails:
1874
+ - Read failure logs: `gh run view <RUN_ID> --log-failed`
1875
+ - Diagnose root cause (same philosophy as local test failures)
1876
+ - Fix and push again
1877
+ 4. Max 2 fix attempts - if still failing, ASK USER
1878
+ 5. If CI passes - proceed to present final summary
1879
+
1880
+ **Context GC (compact during idle):** While waiting for CI (typically 3-5 min), suggest `/compact` if the conversation is long. Think of it like a time-based garbage collector — idle time + high memory pressure = good time to collect. Don't suggest on short conversations.
1881
+
1882
+ **CI failures follow same rules as test failures:**
1883
+ - Your code broke it? Fix your code
1884
+ - CI config issue? Fix the config
1885
+ - Flaky? Investigate - flakiness is a bug
1886
+ - Stuck? ASK USER
1887
+
1888
+ ## CI Review Feedback Loop (After CI Passes)
1889
+
1890
+ **CI passing isn't the end.** If CI includes a code reviewer, read and address its suggestions.
1891
+
1892
+ ```
1893
+ CI passes -> Read review suggestions
1893
+                      |
1894
+ Valid improvements? -+-> YES -> Implement -> Run tests -> Push
1895
+                      |                                     |
1896
+                      |                          Review again (iterate)
1897
+                      |
1898
+                      +-> NO (just opinions/style) -> Skip, note why
1899
+                      |
1900
+                      +-> None -> Done, present to user
1902
+ ```
1903
+
1904
+ **How to evaluate suggestions:**
1905
+ 1. Read all CI review comments: `gh api repos/OWNER/REPO/pulls/PR_NUMBER/comments`
1906
+ 2. For each suggestion, ask: **"Is this a real improvement or just an opinion?"**
1907
+ - **Real improvement:** Fixes a bug, improves performance, adds missing error handling, reduces duplication, improves test coverage → Implement it
1908
+ - **Opinion/style:** Different but equivalent formatting, subjective naming preference, "you could also..." without clear benefit → Skip it
1909
+ 3. Implement the valid ones, run tests locally, push
1910
+ 4. CI re-reviews — repeat until no substantive suggestions remain
1911
+ 5. Max 3 iterations — if reviewer keeps finding new things, ASK USER
1912
+
1913
+ **The goal:** User is only brought in at the very end, when both CI and reviewer are satisfied. The code should be polished before human review.
1914
+
1915
+ **Customizable behavior** (set during wizard setup):
1916
+ - **Auto-implement** (default): Implement valid suggestions autonomously, skip opinions
1917
+ - **Ask first**: Present suggestions to user, let them decide which to implement
1918
+ - **Skip review feedback**: Ignore CI review suggestions, only fix CI failures
1919
+
1920
+ ## DRY Principle
1921
+
1922
+ **Before coding:** "What patterns exist I can reuse?"
1923
+ **After coding:** "Did I accidentally duplicate anything?"
1924
+
1925
+ ## DELETE Legacy Code
1926
+
1927
+ - Legacy code? DELETE IT
1928
+ - Backwards compatibility? NO - DELETE IT
1929
+ - "Just in case" fallbacks? DELETE IT
1930
+
1931
+ **THE RULE:** Delete old code first. If it breaks, fix it properly.
1932
+
1933
+ ---
1934
+
1935
+ **Full reference:** SDLC.md
1936
+ ````
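
The cross-model review step in the skill above ends with a verdict line in `.reviews/latest-review.md`. A minimal sketch of reading that verdict follows. `review_verdict` is a hypothetical helper, and it assumes the reviewer puts the verdict on the last non-empty line, as the prompt in the skill requests. It checks `NOT CERTIFIED` first because that phrase also contains `CERTIFIED`.

```shell
#!/bin/bash
# Hypothetical helper: read the reviewer's verdict from the last non-empty
# line of the review file. Prints CERTIFIED, NOT CERTIFIED, or UNKNOWN.
review_verdict() {
  local file="${1:-.reviews/latest-review.md}"
  local last
  # A missing or blank file leaves $last empty and falls through to UNKNOWN.
  last=$(grep -v '^[[:space:]]*$' "$file" 2>/dev/null | tail -n 1) || last=""
  case "$last" in
    # Check the negative first: "NOT CERTIFIED" also contains "CERTIFIED".
    *"NOT CERTIFIED"*) echo "NOT CERTIFIED" ;;
    *"CERTIFIED"*)     echo "CERTIFIED" ;;
    *)                 echo "UNKNOWN" ;;
  esac
}
```

Treating `UNKNOWN` the same as `NOT CERTIFIED` is the safer default: a review that never reached a verdict should not unblock the CI step.
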
1937
+
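
The "stress-test new tests" principle in the skill above can be sketched as a small loop. `stress_test` is a hypothetical helper, not something the wizard installs; it runs whatever test command you pass it and stops at the first failing run.

```shell
#!/bin/bash
# Hypothetical helper: run a test command N times and stop at the first
# failure, reporting which run failed.
# Usage: stress_test RUNS COMMAND [ARGS...]
stress_test() {
  local runs="$1"
  shift
  local i=1
  while [ "$i" -le "$runs" ]; do
    if ! "$@"; then
      echo "FAILED on run $i of $runs" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "passed $runs of $runs runs"
}
```

For example, `stress_test 50 npm test` repeats the suite 50 times; substitute the "run specific test" command from your `CLAUDE.md` to target a single new test. The run number in the failure message is a useful starting point when looking for shared state or timing assumptions.
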
1938
+ ---
1939
+
1940
+ ## Step 7: Create Testing Skill
1941
+
1942
+ Create `.claude/skills/testing/SKILL.md`:
1943
+
1944
+ ````markdown
1945
+ ---
1946
+ name: testing
1947
+ description: TDD and testing philosophy for writing tests, test-driven development, integration tests, and unit tests. Use this skill when writing tests, doing TDD, or debugging test issues.
1948
+ argument-hint: [test type] [target]
1949
+ ---
1950
+ # Testing Skill - TDD & Testing Philosophy
1951
+
1952
+ ## Task
1953
+ $ARGUMENTS
1954
+
1955
+ ## Testing Diamond (CRITICAL)
1956
+
1957
+ ```
1958
+         /\          ← Few E2E (automated or manual sign-off at end)
1959
+        /  \
1960
+       /    \
1961
+      /------\
1962
+     |        |      ← MANY Integration (real DB, real cache - BEST BANG FOR BUCK)
1963
+     |        |
1964
+      \------/
1965
+       \    /
1966
+        \  /
1967
+         \/          ← Few Unit (pure logic only)
1968
+ ```
1969
+
1970
+ **Why Integration Tests are Best Bang for Buck:**
1971
+ - **Speed**: Fast enough to run on every change
1972
+ - **Stability**: Touch real code, not mocks that lie
1973
+ - **Confidence**: If they pass, production usually works
1974
+ - **Real bugs**: Integration tests with real DB catch real bugs
1975
+ - Unit tests with mocks can "pass" while production fails
1976
+
1977
+ ## Minimal Mocking Philosophy
1978
+
1979
+ | What | Mock? | Why |
1980
+ |------|-------|-----|
1981
+ | Database | ❌ NEVER | Use test DB or in-memory |
1982
+ | Cache | ❌ NEVER | Use isolated test instance |
1983
+ | External APIs | ✅ YES | Real calls = flaky + expensive |
1984
+ | Time/Date | ✅ YES | Determinism |
1985
+
1986
+ **Mocks MUST come from REAL captured data:**
1987
+ - Capture real API response
1988
+ - Save to your fixtures directory (Claude will discover where yours is, e.g., `tests/fixtures/`, `test-data/`, etc.)
1989
+ - Import in tests
1990
+ - Never guess mock shapes!
1991
+
1992
+ ## TDD Tests Must PROVE
1993
+
1994
+ | Phase | What It Proves |
1995
+ |-------|----------------|
1996
+ | RED | Test FAILS → Bug exists or feature missing |
1997
+ | GREEN | Test PASSES → Fix works or feature implemented |
1998
+ | Forever | Regression protection |
1999
+
2000
+ **WRONG approach:**
2001
+ ```
2002
+ // ❌ Writing test that passes with current (buggy) code
2003
+ assert currentBuggyBehavior == currentBuggyBehavior // pseudocode
2004
+ ```
2005
+
2006
+ **CORRECT approach:**
2007
+ ```
2008
+ // ✅ Writing test that FAILS with buggy code, PASSES with fix
2009
+ assert result.status == 'success' // pseudocode - adapt to your framework
2010
+ assert result.data != null
2011
+ ```
2012
+
2013
+ ## Unit Tests = Pure Logic ONLY
2014
+
2015
+ A function qualifies for unit testing ONLY if:
2016
+ - ✅ No database calls
2017
+ - ✅ No external API calls
2018
+ - ✅ No file system access
2019
+ - ✅ No cache calls
2020
+ - ✅ Input → Output transformation only
2021
+
2022
+ Everything else needs integration tests.
2023
+
2024
+ ## When Stuck on Tests
2025
+
2026
+ 1. Add console.logs → Check output
2027
+ 2. Run single test in isolation
2028
+ 3. Check fixtures match real API
2029
+ 4. **STILL stuck?** ASK USER
2030
+
2031
+ ## After Session (Capture Learnings)
2032
+
2033
+ If this session revealed testing insights, update the right place:
2034
+ - **Testing patterns, gotchas** → `TESTING.md`
2035
+ - **Feature-specific test quirks** → Feature docs (`*_PLAN.md`)
2036
+ - **General project context** → `CLAUDE.md` (or `/revise-claude-md`)
2037
+
2038
+ ---
2039
+
2040
+ **Full reference:** TESTING.md
2041
+ ````
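
The "mocks must come from real captured data" rule in the skill above can be supported with a small capture helper. This is a sketch under assumptions: `capture_fixture` is a hypothetical name, and the destination path and command are whatever your project uses. It writes to a temporary file first so that a failed or empty capture never overwrites an existing fixture.

```shell
#!/bin/bash
# Hypothetical helper: run a command that prints a real response and save
# its output as a fixture.
# Usage: capture_fixture DEST COMMAND [ARGS...]
capture_fixture() {
  local dest="$1"
  shift
  local tmp
  tmp=$(mktemp) || return 1
  # Capture into a temp file; reject failed commands and empty output.
  if ! "$@" > "$tmp" || [ ! -s "$tmp" ]; then
    echo "capture failed or returned nothing; fixture unchanged" >&2
    rm -f "$tmp"
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  mv "$tmp" "$dest"
}
```

A typical call would look like `capture_fixture tests/fixtures/user.json curl -fsS https://api.example.com/users/1`, where the URL is a placeholder for the real endpoint. Review captured responses for secrets or personal data before committing them.
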
2042
+
2043
+ ---
2044
+
2045
+ ### Visual Regression Testing (Experimental - Niche Use Cases Only)
2046
+
2047
+ **Most apps don't need this.** Standard E2E testing (Playwright, Cypress) covers the vast majority of UI testing needs.
2048
+
2049
+ **What is it?** Pixel-by-pixel or AI-based screenshot comparison:
2050
+ ```
2051
+ Before: Screenshot A (baseline)
2052
+ After: Screenshot B (candidate)
2053
+ Result: Visual diff highlights pixel changes
2054
+ ```
2055
+
2056
+ **When you actually need this (rare):**
2057
+
2058
+ | Use Case | Example | Why Standard E2E Won't Work |
2059
+ |----------|---------|----------------------------|
2060
+ | Wiki/Doc renderers | Markdown → HTML rendering | Output IS the visual, not DOM state |
2061
+ | Canvas/Graphics apps | Drawing tools, charts | No DOM to assert against |
2062
+ | PDF/Image generators | Invoice generators | Binary output, not HTML |
2063
+ | Visual editors | WYSIWYG, design tools | Pixel-perfect matters |
2064
+
2065
+ **When you don't need this (most apps):**
2066
+ Standard E2E testing checks elements exist, text is correct, interactions work. That's enough for:
2067
+ - Normal web apps, forms, CRUD
2068
+ - Dashboards, e-commerce, SaaS products
2069
+
2070
+ **The reality:**
2071
+
2072
+ | Approach | Coverage | Maintenance | Cost |
2073
+ |----------|----------|-------------|------|
2074
+ | Standard E2E | 95%+ of UI bugs | Low | Free |
2075
+ | Visual regression | Remaining 5% edge cases | HIGH | Often paid |
2076
+
2077
+ **Visual regression downsides:**
2078
+ - Baseline images constantly need updating
2079
+ - Flaky due to font rendering, anti-aliasing
2080
+ - CI/OS differences cause false positives
2081
+ - Expensive (Chromatic, Percy charge per snapshot)
2082
+
2083
+ **If you actually need it:**
2084
+ ```javascript
2085
+ // Playwright built-in (free)
2086
+ await expect(page).toHaveScreenshot('rendered-page.png');
2087
+ ```
2088
+
2089
+ **During wizard setup (Step 0.4):** If canvas-heavy or rendering libraries are detected, Claude asks:
2090
+ ```
2091
+ Q?: Visual Output Testing (Experimental)
2092
+
2093
+ Your app appears to generate visual output (canvas/rendering detected).
2094
+ Standard E2E may not cover visual rendering bugs.
2095
+
2096
+ Options:
2097
+ [1] I'll handle visual testing myself (most users)
2098
+ [2] Tell me about visual regression tools (niche)
2099
+ [3] Skip - standard E2E is enough for me
2100
+ ```
2101
+
2102
+ **Default: Skip.** This is not pushed on users.
2103
+
2104
+ ---
2105
+
2106
+ ## Step 8: Create CLAUDE.md
2107
+
2108
+ Create `CLAUDE.md` in your project root. This is your project-specific configuration:
2109
+
2110
+ ```markdown
2111
+ # [Your Project Name] - Development Guidelines
2112
+
2113
+ ## TDD ENFORCEMENT (READ BEFORE CODING!)
2114
+
2115
+ **STOP! Before writing ANY implementation code:**
2116
+
2117
+ 1. **Write failing tests FIRST** (TDD RED phase)
2118
+ 2. **Use integration tests** primarily - see TESTING.md
2119
+ 3. **Use REAL fixtures** for mock data - never guess API shapes
2120
+
2121
+ ## Commands
2122
+
2123
+ <!-- CUSTOMIZE: Replace with your actual commands from Q4-Q8 -->
2124
+
2125
+ - Build: `[your build command]`
2126
+ - Run dev: `[your dev command]`
2127
+ - Lint: `[your lint command]`
2128
+ - Typecheck: `[your typecheck command]`
2129
+ - Run all tests: `[your test command]`
2130
+ - Run specific test: `[your specific test command]`
2131
+
2132
+ ## Code Style
2133
+
2134
+ <!-- CUSTOMIZE: Add your code style rules -->
2135
+
2136
+ - [Your indentation: tabs or spaces?]
2137
+ - [Your quote style: single or double?]
2138
+ - [Semicolons: yes or no?]
2139
+ - Use strict TypeScript
2140
+ - Prefer const over let
2141
+
2142
+ ## Architecture
2143
+
2144
+ <!-- CUSTOMIZE: Brief overview of your project -->
2145
+
2146
+ - Commands/routes live in: [where?]
2147
+ - Core logic lives in: [where?]
2148
+ - Database: [what?]
2149
+ - Cache: [what?]
2150
+
2151
+ ## Git Commits
2152
+
2153
+ - Follow conventional commits: `type(scope): description`
2154
+ - NEVER commit with failing tests
2155
+
2156
+ ## Plan Docs
2157
+
2158
+ - Before coding a feature: READ its `*_PLAN.md` file
2159
+ - After completing work: UPDATE the plan doc
2160
+
2161
+ ## Testing Notes
2162
+
2163
+ <!-- CUSTOMIZE: Any project-specific testing notes -->
2164
+
2165
+ - Test timeout: [how long?]
2166
+ - Special considerations: [any?]
2167
+ ```
2168
+
2169
+ ---
2170
+
2171
+ ## Step 9: Create SDLC.md, TESTING.md, and ARCHITECTURE.md
2172
+
2173
+ These are your full reference docs. Start with stubs and expand over time:
2174
+
2175
+ **ARCHITECTURE.md (IMPORTANT - Dev & Prod Environments):**
2176
+ ```markdown
2177
+ # Architecture
2178
+
2179
+ ## How to Run This Project
2180
+
2181
+ ### Development
2182
+ ```bash
2183
+ # Start dev server
2184
+ [your dev command, e.g., npm run dev]
2185
+
2186
+ # Run with hot reload
2187
+ [your hot reload command]
2188
+
2189
+ # Database (dev)
2190
+ [how to start/connect to dev DB]
2191
+
2192
+ # Other services (Redis, etc.)
2193
+ [how to start dev dependencies]
2194
+ ```
2195
+
2196
+ ### Production
2197
+ ```bash
2198
+ # Build for production
2199
+ [your build command]
2200
+
2201
+ # Start production server
2202
+ [your prod start command]
2203
+
2204
+ # Database (prod)
2205
+ [connection info or how to access]
2206
+ ```
2207
+
2208
+ ## Environments
2209
+
2210
+ <!-- Claude auto-populates this from Q8.5 deployment detection -->
2211
+
2212
+ | Environment | URL | Deploy Command | Trigger |
2213
+ |-------------|-----|----------------|---------|
2214
+ | Local Dev | http://localhost:3000 | `npm run dev` | Manual |
2215
+ | Preview | [auto-generated PR URL] | `vercel` | Auto on PR |
2216
+ | Staging | https://staging.example.com | `[your staging deploy]` | Push to staging |
2217
+ | Production | https://example.com | `vercel --prod` | Manual / push to main |
2218
+
2219
+ ## Deployment Checklist
2220
+
2221
+ **Before deploying to ANY environment:**
2222
+ - [ ] All tests pass locally
2223
+ - [ ] Production build succeeds (`npm run build`)
2224
+ - [ ] No uncommitted changes
2225
+
2226
+ **Before deploying to PRODUCTION:**
2227
+ - [ ] Changes tested in staging/preview first
2228
+ - [ ] STATE CONFIDENCE: HIGH before proceeding
2229
+ - [ ] If LOW confidence → ASK USER before deploying
2230
+
2231
+ **Claude follows this automatically.** When a task involves "deploy to prod" and confidence is LOW, Claude asks before proceeding.
2232
+
2233
+ ## Rollback
2234
+
2235
+ If deployment fails or causes issues:
2236
+
2237
+ | Environment | Rollback Command | Notes |
2238
+ |-------------|------------------|-------|
2239
+ | Preview | [auto-expires or redeploy] | Usually self-heals |
2240
+ | Staging | `[your rollback command]` | [notes] |
2241
+ | Production | `[your rollback command]` | [critical - document clearly] |
2242
+
2243
+ <!-- Add specific rollback procedures as you discover them -->
2244
+
2245
+ ## System Overview
2246
+
2247
+ [Brief description of components and how they connect]
2248
+
2249
+ ## Key Services
2250
+
2251
+ | Service | Purpose | Port |
2252
+ |---------|---------|------|
2253
+ | [API] | [What it does] | [3000] |
2254
+ | [DB] | [What it does] | [5432] |
2255
+
2256
+ ## Gotchas
2257
+
2258
+ <!-- Add environment-specific gotchas as you discover them -->
2259
+ ```
2260
+
2261
+ **Why ARCHITECTURE.md matters:** Claude needs to know how to run your app in dev vs prod. Without this, Claude will ask "how do I start the server?" every time. Put it here once, never answer again.
2262
+
2263
+ **If you already have one:** Claude will scan for existing ARCHITECTURE.md, README.md, or similar and merge/reference it.
2264
+
2265
+ ---
2266
+
2267
+ **SDLC.md:**
2268
+ ```markdown
2269
+ <!-- SDLC Wizard Version: 1.15.0 -->
2270
+ <!-- Setup Date: [DATE] -->
2271
+ <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
2272
+ <!-- Git Workflow: [PRs or Solo] -->
2273
+ <!-- Plugins: claude-md-management -->
2274
+
2275
+ # SDLC - Development Workflow
2276
+
2277
+ See `.claude/skills/sdlc/SKILL.md` for the enforced checklist.
2278
+
2279
+ ## Workflow Overview
2280
+
2281
+ 1. **Planning Mode** → Research, present approach, get approval
2282
+ 2. **Transition** → Update docs, /compact
2283
+ 3. **Implementation** → TDD RED → GREEN → PASS
2284
+ 4. **Review** → Self-review, present summary
2285
+
2286
+ ## Lessons Learned
2287
+
2288
+ <!-- Add gotchas as you discover them -->
2289
+ ```
2290
+
2291
+ **Why the metadata comments?**
2292
+ - Invisible to readers (HTML comments)
2293
+ - Parseable by Claude for idempotent updates
2294
+ - Survives file edits
2295
+ - Travels with the repo
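
To make "parseable for idempotent updates" concrete, here is a minimal Python sketch. The helper names `read_metadata` and `set_metadata` are hypothetical and not part of the wizard; the only thing taken from this document is the `<!-- Key: value -->` comment shape.

```python
import re

# Matches metadata comments such as: <!-- SDLC Wizard Version: 1.15.0 -->
META_RE = re.compile(r"<!--\s*([^:>]+?)\s*:\s*(.*?)\s*-->")


def read_metadata(text: str) -> dict[str, str]:
    """Collect every `<!-- Key: value -->` comment into a dict."""
    return {key: value for key, value in META_RE.findall(text)}


def set_metadata(text: str, key: str, value: str) -> str:
    """Replace the existing comment for `key`, or prepend one if it is missing."""
    new_comment = f"<!-- {key}: {value} -->"
    pattern = re.compile(r"<!--\s*" + re.escape(key) + r"\s*:.*?-->")
    if pattern.search(text):
        return pattern.sub(lambda _match: new_comment, text, count=1)
    return new_comment + "\n" + text
```

Calling `set_metadata` twice with the same arguments leaves the file unchanged after the first call, which is the property that lets the wizard be re-run safely.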
2296
+
2297
+ **TESTING.md:**
2298
+ ```markdown
2299
+ # Testing Guidelines
2300
+
2301
+ See `.claude/skills/testing/SKILL.md` for TDD philosophy.
2302
+
2303
+ ## Test Commands
2304
+
2305
+ - All tests: `[your command]`
2306
+ - Specific test: `[your command]`
2307
+
2308
+ ## Fixtures
2309
+
2310
+ Location: `[Claude will discover or ask - e.g., tests/fixtures/, test-data/]`
2311
+
2312
+ ## Lessons Learned
2313
+
2314
+ <!-- Add testing gotchas as you discover them -->
2315
+ ```
2316
+
2317
+ ---
2318
+
2319
+ **DESIGN_SYSTEM.md (if UI detected):**
2320
+
2321
+ Generated only if design system elements were detected in Step 0.4. Skip this file if no UI work is expected.
2322
+
2323
+ ```markdown
2324
+ # Design System
2325
+
2326
+ ## Source of Truth
2327
+
2328
+ [Storybook URL or Figma link if external, otherwise this document]
2329
+
2330
+ ## Colors
2331
+
2332
+ | Name | Value | Usage |
2333
+ |------|-------|-------|
2334
+ | primary | #3B82F6 | Buttons, links, primary actions |
2335
+ | secondary | #10B981 | Success states, secondary actions |
2336
+ | error | #EF4444 | Error states, destructive actions |
2337
+ | warning | #F59E0B | Warning states, caution |
2338
+ | background | #FFFFFF | Page background |
2339
+ | surface | #F3F4F6 | Cards, elevated surfaces |
2340
+ | text-primary | #111827 | Main body text |
2341
+ | text-secondary | #6B7280 | Secondary, muted text |
2342
+
2343
+ ## Typography
2344
+
2345
+ | Style | Font | Size | Weight | Line Height |
2346
+ |-------|------|------|--------|-------------|
2347
+ | h1 | Inter | 2.25rem | 700 | 1.2 |
2348
+ | h2 | Inter | 1.875rem | 600 | 1.25 |
2349
+ | body | Inter | 1rem | 400 | 1.5 |
2350
+ | code | Fira Code | 0.875rem | 400 | 1.6 |
2351
+
2352
+ ## Spacing
2353
+
2354
+ Using 4px base unit: `4, 8, 12, 16, 24, 32, 48, 64, 96`
2355
+
2356
+ ## Components
2357
+
2358
+ Reference: `components/ui/` or Storybook
2359
+
2360
+ ## Assets
2361
+
2362
+ - Icons: `public/icons/` or icon library name
2363
+ - Images: `public/images/`
2364
+ - Logos: `public/logos/`
2365
+
2366
+ ## Gotchas
2367
+
2368
+ <!-- Add design-specific gotchas as you discover them -->
2369
+ ```
2370
+
2371
+ **Why DESIGN_SYSTEM.md?**
2372
+ - Claude needs to know your visual language when making UI changes
2373
+ - Prevents style drift and inconsistency
2374
+ - Extracted from your actual config (tailwind.config.js, CSS vars) - not guessed
2375
+
2376
+ **If you have an external design system:** Point to the Storybook/Figma URL instead of duplicating its contents.
2377
+
2378
+ ---
2379
+
2380
+ ## Step 10: Verify Setup (Claude Does This Automatically)
2381
+
2382
+ **After creating all files, Claude automatically verifies the setup:**
2383
+
2384
+ ```
2385
+ Claude runs these checks:
2386
+ 1. ✓ Hooks are executable (chmod +x applied)
2387
+ 2. ✓ settings.json is valid JSON
2388
+ 3. ✓ Skill frontmatter has correct name/description
2389
+ 4. ✓ All required files exist
2390
+ 5. ✓ Directory structure is correct
2391
+
2392
+ Verification Results:
2393
+ ├── .claude/hooks/sdlc-prompt-check.sh ✓ executable
2394
+ ├── .claude/hooks/tdd-pretool-check.sh ✓ executable
2395
+ ├── .claude/settings.json ✓ valid JSON
2396
+ ├── .claude/skills/sdlc/SKILL.md ✓ frontmatter OK
2397
+ ├── .claude/skills/testing/SKILL.md ✓ frontmatter OK
2398
+ ├── CLAUDE.md ✓ exists
2399
+ ├── SDLC.md ✓ exists
2400
+ └── TESTING.md ✓ exists
2401
+
2402
+ All checks passed! Setup complete.
2403
+ ```
2404
+
2405
+ **If any check fails:** Claude fixes it automatically or tells you what's wrong.
2406
+
2407
+ **You don't need to verify manually** - Claude handles this as the final step of wizard execution.
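
If you ever want to reproduce part of this verification yourself, the checks could be approximated as below. This is a sketch under assumptions, not the wizard's actual implementation: the function name `verify_setup` is hypothetical, the file list is copied from the verification output above, and only checks 1, 2, and 4 (executable hooks, valid JSON, required files) are covered.

```python
import json
import os
from pathlib import Path

# File list copied from the verification results shown above.
REQUIRED_FILES = [
    ".claude/hooks/sdlc-prompt-check.sh",
    ".claude/hooks/tdd-pretool-check.sh",
    ".claude/settings.json",
    ".claude/skills/sdlc/SKILL.md",
    ".claude/skills/testing/SKILL.md",
    "CLAUDE.md",
    "SDLC.md",
    "TESTING.md",
]


def verify_setup(root: str = ".") -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    base = Path(root)
    problems = []
    for rel in REQUIRED_FILES:
        path = base / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
            continue
        # Hooks must carry an execute bit, otherwise they cannot be run.
        if rel.endswith(".sh") and not os.access(path, os.X_OK):
            problems.append(f"not executable: {rel}")
        if rel.endswith(".json"):
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(f"invalid JSON: {rel} ({exc.msg})")
    return problems
```

Running `verify_setup(".")` from the project root and printing the result gives a quick list of anything that still needs fixing.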
2408
+
2409
+ ---
2410
+
2411
+ ## Step 11: Restart and Verify
2412
+
2413
+ **Restart Claude Code to load the new hooks/skills:**
2414
+
2415
+ 1. Exit this session, start a new one
2416
+ 2. Send any message (even just "hi")
2417
+ 3. You should see "SDLC BASELINE" in the response
2418
+
2419
+ **Test the system:**
2420
+
2421
+ | Test | Expected Result |
2422
+ |------|-----------------|
2423
+ | "What files handle auth?" | Answers without invoking skills |
2424
+ | "Add a logout button" | Auto-invokes sdlc skill, uses TodoWrite |
2425
+ | "Write tests for login" | Auto-invokes testing skill |
2426
+
2427
+ **What happens automatically:**
2428
+
2429
+ | You Do | System Does |
2430
+ |--------|-------------|
2431
+ | Ask to implement something | SDLC skill auto-invokes, TodoWrite starts |
2432
+ | Ask to write tests | Testing skill auto-invokes |
2433
+ | Claude tries to edit code | TDD reminder fires |
2434
+ | Task completes | Compliance check runs |
2435
+
2436
+ **You do NOT need to:** Type `/sdlc` manually, remember all steps, or enforce the process yourself.
2437
+
2438
+ **If it's not working:** Ask Claude to check:
2439
+ - Is `.claude/settings.json` valid JSON?
2440
+ - Are hooks executable? (`chmod +x .claude/hooks/*.sh`)
2441
+ - Does the hook path in `.claude/settings.json` match the script's actual location?
2442
+
2443
+ ---
2444
+
2445
+ ## Step 12: The Workflow
2446
+
2447
+ **Planning Mode** (use for non-trivial tasks):
2448
+
2449
+ 1. Claude researches codebase, reads relevant docs
2450
+ 2. Claude presents approach with **confidence level**
2451
+ 3. You approve or adjust
2452
+ 4. Claude updates docs with discoveries
2453
+ 5. Claude asks: "Run `/compact` before implementation?"
2454
+ 6. You run `/compact` to free context
2455
+ 7. Claude implements with TDD
2456
+
2457
+ **When Claude should ask you:**
2458
+ - LOW confidence → Must ask before proceeding
2459
+ - FAILED 2x → Must stop and ask
2460
+ - Multiple valid approaches → Should present options
2461
+
2462
+ ---
2463
+
2464
+ ## Quick Reference Card
2465
+
2466
+ ### Workflow Phases
2467
+
2468
+ | Phase | What Happens | Key Action |
2469
+ |-------|--------------|------------|
2470
+ | **Planning** | Research, design approach | State confidence |
2471
+ | **Transition** | Update docs | Request /compact |
2472
+ | **Implementation** | TDD RED → GREEN → PASS | All tests pass |
2473
+ | **Review** | Self-review, summary | Present to user |
2474
+
2475
+ ### Confidence Levels
2476
+
2477
+ | Level | Claude Action |
2478
+ |-------|---------------|
2479
+ | HIGH (90%+) | Proceed after approval |
2480
+ | MEDIUM (60-89%) | Highlight uncertainties |
2481
+ | LOW (<60%) | **ASK USER first** |
2482
+ | FAILED 2x | **STOP and ASK** |
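
The table reads naturally as a small decision function. The sketch below is illustrative only: the name `next_action` is hypothetical, and giving the "FAILED 2x" rule precedence over the percentage bands is an assumption, since the table does not state an order.

```python
def next_action(confidence_pct: float, failed_attempts: int = 0) -> str:
    """Map a self-reported confidence percentage to the action in the table above."""
    # Assumption: two failed attempts override any confidence level.
    if failed_attempts >= 2:
        return "STOP and ASK"
    if confidence_pct >= 90:
        return "Proceed after approval"
    if confidence_pct >= 60:
        return "Highlight uncertainties"
    return "ASK USER first"
```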
2483
+
2484
+ ### Hook Summary
2485
+
2486
+ | Hook | Fires | Purpose |
2487
+ |------|-------|---------|
2488
+ | UserPromptSubmit | Every prompt | SDLC baseline + skill trigger |
2489
+ | PreToolUse | Before file edits | TDD reminder |
2490
+
2491
+ ### Key Commands
2492
+
2493
+ | Action | Command |
2494
+ |--------|---------|
2495
+ | Free context after planning | `/compact` |
2496
+ | Enter planning mode | Claude suggests or `/plan` |
2497
+ | Run specific skill | `/sdlc` or `/testing` |
2498
+
2499
+ ---
2500
+
2501
+ ## Troubleshooting
2502
+
2503
+ ### Hook Not Firing
2504
+
2505
+ ```bash
2506
+ # Make sure the hook is executable
2507
+ chmod +x .claude/hooks/sdlc-prompt-check.sh
2508
+
2509
+ # Test hook manually
2510
+ ./.claude/hooks/sdlc-prompt-check.sh
2511
+ # Should output SDLC BASELINE text
2512
+ ```
2513
+
2514
+ ### Skills Not Loading
2515
+
2516
+ 1. Check that the skill frontmatter `name:` matches the skill's directory name
2517
+ 2. Check that the frontmatter `description:` contains the trigger words used in the hook
2518
+ 3. Verify Claude is recognizing implementation tasks
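
The first two checks can be automated. The following sketch assumes a skill lives at `<skills dir>/<name>/SKILL.md` with simple `key: value` frontmatter between `---` lines. The parser is deliberately naive (no nested or multi-line YAML), and the names `read_frontmatter` and `check_skill` are hypothetical.

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` pairs between the leading `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # the closing `---` was never found


def check_skill(skill_md: Path, trigger_words: list[str]) -> list[str]:
    """Return problems for one SKILL.md; an empty list means it looks fine."""
    meta = read_frontmatter(skill_md.read_text(encoding="utf-8"))
    problems = []
    if meta.get("name") != skill_md.parent.name:
        problems.append(f"name {meta.get('name')!r} != directory {skill_md.parent.name!r}")
    description = meta.get("description", "").lower()
    missing = [word for word in trigger_words if word.lower() not in description]
    if missing:
        problems.append(f"description lacks trigger words: {missing}")
    return problems
```

For example, `check_skill(Path(".claude/skills/sdlc/SKILL.md"), ["implement", "fix"])` would report a mismatch if the directory had been renamed without updating `name:`. The trigger words passed in are whatever your hook actually uses.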
2519
+
2520
+ ---
2521
+
2522
+ ## Success Criteria
2523
+
2524
+ You've successfully set up the system when:
2525
+
2526
+ - [ ] Light hook fires on every prompt (you see SDLC BASELINE in responses)
2527
+ - [ ] Claude auto-invokes sdlc skill for implementation tasks
2528
+ - [ ] Claude auto-invokes testing skill for test tasks
2529
+ - [ ] Claude uses TodoWrite to track progress
2530
+ - [ ] Claude states confidence levels
2531
+ - [ ] Claude asks for clarification when LOW confidence
2532
+ - [ ] TDD hook reminds about tests before editing source files
2533
+ - [ ] Claude requests /compact before implementation
2534
+
2535
+ ---
2536
+
2537
+ ## End of Task: Compliance and Mini-Retro
2538
+
2539
+ **Compliance check** (Claude does this after each task):
2540
+ - TodoWrite used? Confidence stated? TDD followed? Tests pass? Self-review done?
2541
+ - If something was skipped: note what and why (intentional vs oversight)
2542
+
2543
+ **Mini-retro** (optional, for meaningful tasks only):
2544
+
2545
+ **This is for the AI's learning, not the human's.** The retro helps Claude identify:
2546
+ - What it struggled with and why
2547
+ - Whether it needs more research in certain areas
2548
+ - Whether bad/legacy code is causing low confidence (indicator of problem area)
2549
+
2550
+ ```
2551
+ - Improve: [something that could be better]
2552
+ - Stop: [something that added friction]
2553
+ - Start: [something new worth trying next time]
2554
+
2555
+ What I struggled with: [area where confidence was low]
2556
+ Suggested doc updates: [if any]
2557
+ Want me to file these? (yes/no/not now)
2558
+ ```
2559
+
2560
+ **Capture learnings (update the right docs):**
2561
+
2562
+ | Learning Type | Update Where |
2563
+ |---------------|--------------|
2564
+ | Feature-specific gotchas, decisions | Feature docs (`*_PLAN.md`, `*_DOCS.md`) |
2565
+ | Testing patterns, gotchas | `TESTING.md` |
2566
+ | Architecture decisions | `ARCHITECTURE.md` |
2567
+ | Commands, general project context | `CLAUDE.md` (or `/revise-claude-md`) |
2568
+
2569
+ **`/revise-claude-md` scope:** Only updates CLAUDE.md. It does NOT touch feature docs, TESTING.md, hooks, or skills. Use it for general project context that applies across the codebase.
2570
+
2571
+ **When to do mini-retro:** After features, tricky bugs, or discovering gotchas. Skip for one-line fixes or questions.
2572
+
2573
+ **The SDLC evolves:** Weekly research, monthly deep-dives, and CI friction signals feed improvements. Human approves, the system gets better.
2574
+
2575
+ **If docs are causing problems:** Sometimes Claude struggles in an area because the docs are bad, legacy, or confusing - just like a human would. Low confidence in an area can indicate the docs need attention.
2576
+
2577
+ ---
2578
+
2579
+ ## Going Further
2580
+
2581
+ ### Create Feature Plan Docs
2582
+
2583
+ For each major feature, create `FEATURE_NAME_PLAN.md`:
2584
+
2585
+ ```markdown
2586
+ # Feature Name
2587
+
2588
+ ## Overview
2589
+ What is this feature? What problem does it solve?
2590
+
2591
+ ## Architecture
2592
+ How does it work? Components, data flow.
2593
+
2594
+ ## Gotchas
2595
+ Things that can trip you up.
2596
+
2597
+ ## Future Work
2598
+ What's planned but not done.
2599
+ ```
2600
+
2601
+ Claude will read these during planning and update them with discoveries.
2602
+
2603
+ ### Expand TESTING.md
2604
+
2605
+ As you discover testing gotchas, add them:
2606
+
2607
+ ```markdown
2608
+ ## Lessons Learned
2609
+
2610
+ ### [Date] - Description
2611
+ **Problem:** What went wrong
2612
+ **Solution:** How to fix it
2613
+ **Prevention:** How to avoid it
2614
+ ```
2615
+
2616
+ ### Customize Skills
2617
+
2618
+ Add project-specific guidance to skills:
2619
+
2620
+ - Domain-specific patterns
2621
+ - Common gotchas
2622
+ - Preferred patterns
2623
+ - Architecture decisions
2624
+
2625
+ ---
2626
+
2627
+ ## Testing AI Apps: What's Different
2628
+
2629
+ AI-driven applications require fundamentally different testing approaches than traditional software.
2630
+
2631
+ ### Why AI Testing is Unique
2632
+
2633
+ | Traditional Apps | AI-Driven Apps |
2634
+ |------------------|----------------|
2635
+ | Deterministic (same input → same output) | **Stochastic** (same input → varying outputs) |
2636
+ | Binary pass/fail tests | **Scored evaluation** with thresholds |
2637
+ | Test once, trust forever | **Continuous monitoring** for drift |
2638
+ | Logic bugs | Hallucination, bias, inaccuracy |
2639
+
2640
+ ### Key AI Testing Concepts
2641
+
2642
+ **1. Multiple Runs for Confidence**
2643
+
2644
+ AI outputs vary. Run evaluations multiple times and look at averages, not single results.
2645
+
2646
+ ```
2647
+ # Bad: Single run
2648
+ score = evaluate(prompt) # 7.2 - is this good or lucky?
2649
+
2650
+ # Good: Multiple runs with confidence interval
2651
+ scores = [evaluate(prompt) for _ in range(5)]
2652
+ # Result: mean = 7.1, 95% CI = [6.8, 7.4] - now we know the range
2653
+ ```
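
A runnable version of the "multiple runs" idea might look like the sketch below. It uses only the standard library, so the Student's t critical values are hard-coded for small samples. The function name `mean_with_ci` and the example scores are assumptions made for illustration.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_with_ci(scores: list[float]) -> tuple[float, float, float]:
    """Return (mean, lower, upper) for a 95% confidence interval on the mean."""
    n = len(scores)
    if not 2 <= n <= 10:
        raise ValueError("this sketch supports 2 to 10 runs")
    mean = statistics.mean(scores)
    # Standard error uses the sample standard deviation (n - 1 denominator).
    margin = T_95[n - 1] * statistics.stdev(scores) / math.sqrt(n)
    return mean, mean - margin, mean + margin
```

With five hypothetical scores `[6.9, 7.3, 7.0, 7.4, 6.9]`, this gives a mean of about 7.1 and an interval of roughly [6.8, 7.4]. A t-based interval is used instead of a normal approximation because five runs is a small sample.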
2654
+
2655
+ **2. Baseline Scores, Not Just Pass/Fail**
2656
+
2657
+ Set baseline metrics (accuracy, relevancy, coherence) and detect regressions over time.
2658
+
2659
+ | Metric | Baseline | Current | Status |
2660
+ |--------|----------|---------|--------|
2661
+ | SDLC compliance | 6.5 | 7.2 | IMPROVED |
2662
+ | Hallucination rate | 5% | 3% | IMPROVED |
2663
+ | Response time | 2.1s | 2.3s | STABLE |
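
One way to produce the Status column is to compare the relative change against a noise tolerance. The sketch below is an assumption-laden illustration: the function name `classify` is hypothetical, and the 10% tolerance was chosen because it is consistent with the rows above, not because the wizard specifies it.

```python
def classify(baseline: float, current: float, *, higher_is_better: bool,
             tolerance: float = 0.10) -> str:
    """Label a metric IMPROVED, STABLE, or REGRESSED relative to its baseline.

    `tolerance` is the relative change treated as noise (10% is an assumption).
    Assumes a non-zero baseline.
    """
    change = (current - baseline) / baseline
    if not higher_is_better:
        change = -change  # for rates and latencies, a drop is an improvement
    if change > tolerance:
        return "IMPROVED"
    if change < -tolerance:
        return "REGRESSED"
    return "STABLE"
```

Under that tolerance, `classify(2.1, 2.3, higher_is_better=False)` is `"STABLE"` because a 9.5% slowdown sits inside the noise band, while `classify(6.5, 7.2, higher_is_better=True)` is `"IMPROVED"`.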
2664
+
2665
+ **3. AI-Specific Risk Categories**
2666
+
2667
+ - **Hallucination**: AI invents facts that aren't true
2668
+ - **Bias**: Unfair treatment of demographic groups
2669
+ - **Adversarial**: Prompt injection attacks
2670
+ - **Data leakage**: Exposing training data or PII
2671
+ - **Drift**: Behavior changes silently over time (model updates, context changes)
2672
+
2673
+ **4. Evaluation Frameworks**
2674
+
2675
+ Consider tools for LLM output testing:
2676
+ - [DeepEval](https://github.com/confident-ai/deepeval) - Open source LLM evaluation
2677
+ - [Deepchecks](https://deepchecks.com) - ML/AI testing and monitoring
2678
+ - Custom scoring pipelines (like this wizard's E2E evaluation)
2679
+
2680
+ ### Practical Advice
2681
+
2682
+ - **Don't trust single AI outputs** - verify with multiple samples or human review
2683
+ - **Set quantitative baselines** - "accuracy must stay above 85%" not "it should work"
2684
+ - **Monitor production** - AI apps can degrade without code changes (model drift, prompt injection)
2685
+ - **Budget for evaluation** - AI testing costs more (API calls, human review, compute)
2686
+ - **Use confidence intervals** - 5 runs with 95% CI is better than 1 run with crossed fingers
2687
+
2688
+ _Sources: [Confident AI](https://www.confident-ai.com/blog/llm-testing-in-2024-top-methods-and-strategies), [IMDA Starter Kit](https://www.imda.gov.sg/-/media/imda/files/about/emerging-tech-and-research/artificial-intelligence/starter-kit-for-testing-llm-based-applications-for-safety-and-reliability.pdf), [aistupidlevel.info methodology](https://aistupidlevel.info/methodology)_
2689
+
2690
+ ---
2691
+
2692
+ ## CI/CD Gotchas
2693
+
2694
+ Common pitfalls when automating AI-assisted development workflows.
2695
+
2696
+ ### `workflow_dispatch` Requires Merge First
2697
+
2698
+ GitHub Actions with `workflow_dispatch` (manual trigger) can only be triggered AFTER the workflow file exists on the default branch.
2699
+
2700
+ | What You Want | What Works |
2701
+ |---------------|------------|
2702
+ | Test new workflow before merge | YAML validation + trigger tests, or test via push/PR events |
2703
+ | Manual trigger new workflow | Merge first, then `gh workflow run` |
2704
+
2705
+ **Why not `act`?** Workflows that use `claude-code-action@v1` require GitHub Actions secrets and runner context that `act` cannot replicate. Use YAML validation and trigger tests instead:
2706
+
2707
+ ```bash
2708
+ # Validate YAML syntax
2709
+ python3 -c "import yaml; yaml.safe_load(open('.github/workflows/my-workflow.yml'))"
2710
+
2711
+ # Run trigger/config tests (if you have them)
2712
+ ./tests/test-workflow-triggers.sh
2713
+ ```
2714
+
2715
+ This catches structural issues before merge. For full GitHub environment testing, merge then trigger.
2716
+
2717
+ ### PR Review with Comment Response (Optional)
2718
+
2719
+ Want Claude to respond to existing PR comments during review? Add comment fetching to your review workflow.
2720
+
2721
+ **The Flow:**
2722
+ 1. PR opens → Claude reviews diff → Posts sticky comment
2723
+ 2. You read review, leave questions/comments on PR
2724
+ 3. Add `needs-review` label
2725
+ 4. Claude fetches your comments + reviews diff again
2726
+ 5. Updated sticky comment addresses your questions
2727
+
2728
+ **Two layers of interaction:**
2729
+
2730
+ | Layer | What | When to Use |
2731
+ |-------|------|-------------|
2732
+ | **Workflow** | Claude addresses comments in sticky review | Quick async response |
2733
+ | **Local terminal** | Ask Claude to fetch comments, have discussion | Deep interactive discussion |
2734
+
2735
+ **Example workflow step:**
2736
+ ```yaml
2737
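+ - name: Fetch PR comments
+   # Assumes GH_TOKEN, REPO, and PR_NUMBER are set in the job or step `env`.
+   # This endpoint returns inline review comments; top-level PR conversation
+   # comments are served by repos/$REPO/issues/$PR_NUMBER/comments instead.
+   run: |
+     gh api "repos/$REPO/pulls/$PR_NUMBER/comments" \
+       --jq '[.[] | {author: .user.login, body: .body}]' > /tmp/comments.json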
2741
+ ```
2742
+
2743
+ Then include `/tmp/comments.json` in Claude's prompt context.
2744
+
2745
+ **Local discussion:**
2746
+ ```
2747
+ You: "Fetch comments from PR #42 and let's discuss the concerns"
2748
+ Claude: [fetches via gh api, discusses with you interactively]
2749
+ ```
2750
+
2751
+ This is optional - skip if you prefer fresh reviews only.
2752
+
2753
+ ### CI Auto-Fix Loop (Optional)
2754
+
2755
+ Automatically fix CI failures and PR review findings. Claude reads the error context, fixes the code, commits, and re-triggers CI. The loop repeats until CI passes and the review has no findings at your chosen level, or until the retry limit is reached.
2756
+
2757
+ **The Loop:**
2758
+ ```
2759
+ Push to PR
+     |
+     v
+ CI runs ──► FAIL ──► ci-autofix: Claude reads logs, fixes, commits [autofix 1/3] ──► re-trigger
+     |
+     └── PASS ──► PR Review ──► has findings at your level? ──► ci-autofix: fixes all ──► re-trigger
+                      |
+                      └── APPROVE, no findings ──► DONE
2767
+ ```
2768
+
2769
+ **Safety measures:**
2770
+ - Never runs on main branch
2771
+ - Max retries (default 3, configurable via `MAX_AUTOFIX_RETRIES`)
2772
+ - `AUTOFIX_LEVEL` controls what findings to act on (`ci-only`, `criticals`, `all-findings`)
2773
+ - Restricted Claude tools (no git, no npm)
2774
+ - Self-modification ban (can't edit its own workflow file)
2775
+ - `[autofix N/M]` commit tags for audit trail
2776
+ - Sticky PR comments show status
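
The retry cap and the `[autofix N/M]` tags work together: counting tagged commits on the branch tells the workflow which attempt it is on. A minimal sketch follows. It assumes the commit subjects have already been collected (for example with `git log --format=%s origin/main..HEAD`), and the function name `next_autofix_tag` is hypothetical.

```python
import re
from typing import Optional

AUTOFIX_TAG = re.compile(r"\[autofix (\d+)/(\d+)\]")


def next_autofix_tag(commit_subjects: list[str], max_retries: int = 3) -> Optional[str]:
    """Return the tag for the next autofix commit, or None once the limit is reached."""
    attempts = sum(1 for subject in commit_subjects if AUTOFIX_TAG.search(subject))
    if attempts >= max_retries:
        return None  # stop the loop and leave the PR for a human
    return f"[autofix {attempts + 1}/{max_retries}]"
```

Deriving the count from commit history, not from a stored counter, means the limit survives workflow restarts and stays visible in the audit trail.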
2777
+
2778
+ **Setup:**
2779
+ 1. Create `.github/workflows/ci-autofix.yml`:
2780
+
2781
+ ```yaml
2782
+ name: CI Auto-Fix
+
+ on:
+   workflow_run:
+     workflows: ["CI", "PR Code Review"]
+     types: [completed]
+
+ permissions:
+   contents: write
+   pull-requests: write
+
+ env:
+   MAX_AUTOFIX_RETRIES: 3
+   AUTOFIX_LEVEL: criticals  # ci-only | criticals | all-findings
+
+ jobs:
+   autofix:
+     runs-on: ubuntu-latest
+     if: |
+       github.event.workflow_run.head_branch != 'main' &&
+       github.event.workflow_run.event == 'pull_request' &&
+       (
+         (github.event.workflow_run.name == 'CI' && github.event.workflow_run.conclusion == 'failure') ||
+         (github.event.workflow_run.name == 'PR Code Review' && github.event.workflow_run.conclusion == 'success')
+       )
+     steps:
+       # Count previous [autofix] commits to enforce max retries
+       # Download CI failure logs or fetch review comment
+       # Check findings at your AUTOFIX_LEVEL (criticals + suggestions)
+       # Run Claude to fix ALL findings with restricted tools
+       # Commit [autofix N/M], push, re-trigger CI
+       # Post sticky PR comment with status
2814
+ ```
2815
+
2816
+ 2. Add `workflow_dispatch:` trigger to your CI workflow (so autofix can re-trigger it)
2817
+ 3. Optionally configure a GitHub App for token generation (pushes made with the default `GITHUB_TOKEN` do not trigger new workflow runs; pushes made with an App token do)
2818
+
2819
+ **Token approaches:**
2820
+
2821
+ | Approach | When | Pros |
2822
+ |----------|------|------|
2823
+ | GITHUB_TOKEN + `gh workflow run` | Default | No extra setup |
2824
+ | GitHub App token | `CI_AUTOFIX_APP_ID` secret exists | Push triggers `synchronize` naturally |
2825
+
2826
+ **Note:** A `workflow_run` trigger only fires once the workflow file that declares it exists on the default branch, so the ci-autofix workflow stays dormant until it is first merged to main.
2827
+
2828
+ > **Template vs. this repo:** The template above uses `ci-autofix.yml` with `criticals` as a safe default for new projects. The wizard's own repo has evolved this into `ci-self-heal.yml` with `all-findings`, a more aggressive configuration we dogfood internally. Either file name works; the loop logic is the same and only the `AUTOFIX_LEVEL` differs.
2829
+
2830
+ ---
2831
+
2832
+ ### Cross-Model Review Loop (Optional)
2833
+
2834
+ Use an independent AI model from a different company as a code reviewer. The author can't grade their own homework — a model with different training data and different biases catches blind spots the authoring model misses.
2835
+
2836
+ **Why this works:** Two AI systems from different companies (e.g., Claude writes, GPT reviews) provide adversarial diversity. They have fundamentally different training, different failure modes, and different strengths. What one misses, the other catches.
2837
+
2838
+ **Use the best model at the deepest reasoning.** This is your quality gate — don't economize on it. Always use the latest, most capable model available (currently GPT-5.4) at maximum reasoning effort (`xhigh`). Cheaper/faster models miss things. The whole point is catching what the authoring model couldn't.
2839
+
2840
+ **Prerequisites:**
2841
+ - Codex CLI installed: `npm i -g @openai/codex`
2842
+ - OpenAI API key configured: `export OPENAI_API_KEY=...`
2843
+ - This is a local workflow tool — not required for CI/CD
2844
+
2845
+ **The Protocol:**
2846
+
2847
+ 1. Create a `.reviews/` directory in your project
2848
+ 2. After Claude completes its SDLC loop (self-review passes), write a handoff file:
2849
+
2850
+ ```jsonc
2851
+ // .reviews/handoff.json
2852
+ {
2853
+ "review_id": "feature-xyz-001",
2854
+ "status": "PENDING_REVIEW",
2855
+ "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
2856
+ "review_instructions": "Review for security, edge cases, and correctness",
2857
+ "artifact_path": ".reviews/feature-xyz-001/"
2858
+ }
2859
+ ```
2860
+
2861
+ 3. Run the independent reviewer:
2862
+
2863
+ ```bash
2864
+ codex exec \
2865
+ -c 'model_reasoning_effort="xhigh"' \
2866
+ -s danger-full-access \
2867
+ -o .reviews/latest-review.md \
2868
+ "You are an independent code reviewer. Read .reviews/handoff.json, \
2869
+ review the listed files, and write your findings to the artifact_path. \
2870
+ End with CERTIFIED or NOT CERTIFIED."
2871
+ ```
2872
+
2873
+ **The Loop:**
2874
+ ```
2875
+ Claude writes code → self-review passes → handoff.json
+        ↑                                        |
+        |                                        v
+        |                         Codex reviews (xhigh reasoning)
+        |                                        |
+        |                            CERTIFIED? -+→ YES → Done
+        |                                        |
+        |                                        +→ NO (findings)
+        |                                        |
+        └────────── Claude fixes findings ←──────┘
+            (repeat until CERTIFIED, or ask user)
2886
+ ```
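
To close the loop automatically, something has to read the verdict back from the saved review. A small sketch of that step follows. It assumes the reviewer obeyed the prompt and ended its output with the verdict on the last non-empty line; the function name `parse_verdict` is hypothetical.

```python
def parse_verdict(review_text: str) -> str:
    """Return "CERTIFIED", "NOT CERTIFIED", or "UNKNOWN" from the review's last non-empty line."""
    lines = [line.strip() for line in review_text.splitlines() if line.strip()]
    if not lines:
        return "UNKNOWN"
    last = lines[-1].upper()
    # Check the negative form first: "NOT CERTIFIED" also contains "CERTIFIED".
    if "NOT CERTIFIED" in last:
        return "NOT CERTIFIED"
    if "CERTIFIED" in last:
        return "CERTIFIED"
    return "UNKNOWN"
```

Treating anything other than a clear verdict as `"UNKNOWN"` keeps the loop conservative: an ambiguous review goes back to the user and is never counted as a pass.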
2887
+
2888
+ **Key flags:**
2889
+ - `-c 'model_reasoning_effort="xhigh"'` — Maximum reasoning depth. This is where you get the most value. Testing showed `xhigh` caught 3 findings that `high` missed on the same content.
2890
+ - `-s danger-full-access` — Full filesystem read/write so the reviewer can read your actual code.
2891
+ - `-o .reviews/latest-review.md` — Save the review output for Claude to read back.
2892
+
2893
+ **Tool-agnostic principle:** The core idea is "use a different model as an independent reviewer." Codex CLI is the concrete example today, but any competing AI tool that can read files and produce structured feedback works. The value comes from the independence and different training, not the specific tool.
2894
+
2895
+ **When to use this:**
2896
+ - High-stakes changes (auth, payments, data handling)
2897
+ - Research-heavy work where accuracy matters more than speed
2898
+ - Complex refactors touching many files
2899
+ - Any time you want higher confidence before merging
2900
+
2901
+ **When to skip:**
2902
+ - Trivial changes (typo fixes, config tweaks)
2903
+ - Time-sensitive hotfixes
2904
+ - Changes where the review cost exceeds the risk
2905
+
2906
+ ---
2907
+
2908
+ ## User Understanding and Periodic Feedback
2909
+
2910
+ **During wizard setup and ongoing use:**
2911
+
2912
+ ### Make Sure User Understands the Process
2913
+
2914
+ At key points, Claude should check:
2915
+ - "Does this workflow make sense to you?"
2916
+ - "Any parts you'd like to customize or skip?"
2917
+ - "Questions about how this works?"
2918
+
2919
+ **The goal:** User should never be confused about what's happening or why. If they are, stop and clarify.
2920
+
2921
+ ### This is a Growing Document
2922
+
2923
+ Remind users:
2924
+ - The SDLC is customizable to their needs
2925
+ - They can try something and change it later
2926
+ - It's built into the system to evolve over time
2927
+ - Their feedback makes the process better
2928
+
2929
+ ### Periodic Check-ins (Minimal, Non-Invasive)
2930
+
2931
+ Occasionally (not every task), Claude can ask:
2932
+ - "Is the SDLC working well for you? Anything causing friction?"
2933
+ - "Any parts of the process you want to adjust?"
2934
+
2935
+ **Keep it minimal.** This is meant to improve the process, not add overhead. If the user seems frustrated or doesn't need it, skip it.
2936
+
2937
+ ### When Claude Gets Lost
2938
+
2939
+ If Claude repeatedly struggles in a codebase area:
2940
+ - Low confidence is an indicator of a problem
2941
+ - Might be legacy code, bad docs, or just unfamiliar patterns
2942
+ - Claude should ask questions rather than guess wrong
2943
+ - Better to ask and be right than to assume and create rework
2944
+
2945
+ **Don't be afraid to ask questions.** It prevents being wrong. This is a symbiotic relationship - the more interaction, the better both sides get.
2946
+
2947
+ ---
2948
+
2949
+ ## Staying Updated (Idempotent Wizard)
2950
+
2951
+ **The wizard is designed to be idempotent.** You can run it on new or existing setups - it aims to detect what you have and only add what's missing.
2952
+
2953
+ ### How to Update
2954
+
2955
+ Ask Claude any of these:
2956
+ > "Check for SDLC wizard updates"
2957
+ > "Run me through the SDLC wizard"
2958
+ > "What am I missing from the latest wizard?"
2959
+ > "Update my SDLC setup"
2960
+
2961
+ **All of these do the same thing:** Claude checks what's new, shows you, and walks you through only what's missing.
2962
+
2963
+ ### Update URLs
2964
+
2965
+ Claude fetches from these URLs (via WebFetch):
2966
+
2967
+ | Resource | URL |
2968
+ |----------|-----|
2969
+ | CHANGELOG | `https://raw.githubusercontent.com/BaseInfinity/agentic-ai-sdlc-wizard/main/CHANGELOG.md` |
2970
+ | Wizard | `https://raw.githubusercontent.com/BaseInfinity/agentic-ai-sdlc-wizard/main/CLAUDE_CODE_SDLC_WIZARD.md` |
2971
+
2972
+ ### What Claude Does (4 Steps)
2973
+
2974
+ **Step 1: Read installed version** from `SDLC.md` metadata:
2975
+ ```
2976
+ <!-- SDLC Wizard Version: X.X.X -->
2977
+ ```
2978
+ If no version comment exists, treat as `0.0.0`.
2979
+
2980
+ **Step 2: Fetch CHANGELOG first** from the CHANGELOG URL above. Parse all entries between user's installed version and the latest version. Show the user what changed. If versions match, say "You're up to date!" and stop.
2981
+
2982
+ **Step 3: Fetch full wizard and compare.** For each wizard step, check if the user already has it:
2983
+
2984
+ | Component | How Claude Checks | If Missing | If Present |
2985
+ |-----------|-------------------|------------|------------|
2986
+ | Plugins | Is it installed? | Prompt to install | Skip (mention you have it) |
2987
+ | Hooks | Does `.claude/hooks/*.sh` exist? | Create | Compare against latest, offer updates |
2988
+ | Skills | Does `.claude/skills/*/SKILL.md` exist? | Create | Compare against latest, offer updates |
2989
+ | Docs | Do `SDLC.md` and `TESTING.md` exist? | Create | Compare against latest, offer updates |
2990
+ | CLAUDE.md | Does it exist? | Create from template | Never modify (fully custom) |
2991
+ | Questions | Were answers recorded in SDLC.md? | Ask them | Skip |
2992
+
2993
+ **Step 4: Apply changes and bump version.** Walk through only missing/changed pieces (opt-in each). Update `<!-- SDLC Wizard Version: X.X.X -->` in SDLC.md to the latest version.
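
Steps 1 and 2 hinge on comparing versions numerically, since a plain string comparison would rank `1.9.0` above `1.13.0`. The sketch below illustrates that under two assumptions: the CHANGELOG versions have already been extracted into a list of strings (the CHANGELOG layout is not specified here), and the function names are hypothetical.

```python
import re

VERSION_RE = re.compile(r"<!--\s*SDLC Wizard Version:\s*(\d+)\.(\d+)\.(\d+)\s*-->")


def installed_version(sdlc_md: str) -> tuple[int, ...]:
    """Read the version comment from SDLC.md text; default to (0, 0, 0) if absent."""
    match = VERSION_RE.search(sdlc_md)
    return tuple(int(part) for part in match.groups()) if match else (0, 0, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "1.13.0" or "v1.13.0" into (1, 13, 0) so tuples compare numerically."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def entries_to_show(installed: tuple[int, ...], changelog_versions: list[str]) -> list[str]:
    """Return the changelog versions newer than `installed`, newest first."""
    newer = [v for v in changelog_versions if parse_version(v) > installed]
    return sorted(newer, key=parse_version, reverse=True)
```

An empty result from `entries_to_show` corresponds to the "You're up to date!" case.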
2994
+
2995
+ ### CHANGELOG Drives the Update Flow
2996
+
2997
+ Claude reads the CHANGELOG to show you what's new **before** applying anything. The wizard file supplies the file templates and step registry that are then used to apply the changes.
2998
+
2999
+ - **CHANGELOG** = What changed and why (Claude shows you this first)
3000
+ - **Wizard** = File templates + step registry (Claude uses this to apply)
3001
+
3002
+ ### Example: Old User Checking for Updates
3003
+
3004
+ ```
3005
+ Claude: "Fetching CHANGELOG to check for updates..."
3006
+
3007
+ Your version: 1.8.0
3008
+ Latest version: 1.13.0
3009
+
3010
+ What's new since 1.8.0:
3011
+ - v1.13.0: Self-update improvements, optional CI notification
3012
+ - v1.12.0: Full system audit, apply step fixes
3013
+ - v1.11.0: Stale output cleanup, error handling
3014
+ - v1.10.0: "Prove It's Better" CI automation
3015
+ - v1.9.0: Workflow consolidation (6 → 5 workflows)
3016
+
3017
+ Now checking your setup against latest wizard...
3018
+
3019
+ ✓ Hooks - up to date
3020
+ ✓ Skills - content differs (update available)
3021
+ ✗ step-update-notify - NOT DONE (new in v1.13.0, optional)
3022
+
3023
+ Summary:
3024
+ - 1 file update available (SDLC skill)
3025
+ - 1 new optional step
3026
+
3027
+ Walk through updates? (y/n)
3028
+ ```
3029
+
3030
+ **The key:** Every addition to the wizard becomes a trackable "step". Existing users are automatically prompted for any new steps they haven't completed.
3031
+
3032
+ ### How State is Tracked
3033
+
3034
+ Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
3035
+
3036
+ ```markdown
3037
+ <!-- SDLC Wizard Version: 1.15.0 -->
3038
+ <!-- Setup Date: 2026-01-24 -->
3039
+ <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
3040
+ <!-- Git Workflow: PRs -->
3041
+ <!-- Plugins: claude-md-management -->
3042
+
3043
+ # SDLC - Development Workflow
3044
+ ...
3045
+ ```
3046
+
3047
+ When Claude runs the wizard:
3048
+ 1. Parse the version and completed steps from SDLC.md
3049
+ 2. Fetch CHANGELOG first — show what's new between installed and latest
3050
+ 3. Fetch full wizard, compare against step registry
3051
+ 4. For anything new that isn't marked complete → walk the user through it
3052
+ 5. Update the metadata after each step completes
3053
+
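Step 1 of the list above (parsing the metadata comments) could look roughly like this. It is an illustrative sketch only: Claude reads the comments directly rather than running a script, and the names `META` and `parse_wizard_state` are hypothetical. The `0.0.0` default mirrors the fallback used by the optional CI workflow later in this section.

```python
import re

# One "<!-- Key: value -->" metadata comment per line, as in the SDLC.md example above.
META = re.compile(r"<!--\s*([^:\n]+?):\s*(.*?)\s*-->")


def parse_wizard_state(sdlc_text: str) -> dict:
    """Extract version, completed steps, and git workflow from SDLC.md text."""
    fields = {key.strip(): value for key, value in META.findall(sdlc_text)}
    steps = fields.get("Completed Steps", "")
    return {
        # Assumption: treat a missing version as 0.0.0 (never set up).
        "version": fields.get("SDLC Wizard Version", "0.0.0"),
        "completed_steps": [s.strip() for s in steps.split(",") if s.strip()],
        "git_workflow": fields.get("Git Workflow"),
    }


sample = (
    "<!-- SDLC Wizard Version: 1.15.0 -->\n"
    "<!-- Setup Date: 2026-01-24 -->\n"
    "<!-- Completed Steps: step-0.1, step-0.2, step-1 -->\n"
    "<!-- Git Workflow: PRs -->\n"
    "\n"
    "# SDLC - Development Workflow\n"
)
state = parse_wizard_state(sample)
print(state["version"], state["completed_steps"])
```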
3054
+ ### Wizard Step Registry
3055
+
3056
+ Every wizard step has a unique ID for tracking:
3057
+
3058
+ | Step ID | Description | Added in Version |
3059
+ |---------|-------------|------------------|
3060
+ | `step-0.1` | Required plugins | 1.2.0 |
3061
+ | `step-0.2` | SDLC core setup | 1.0.0 |
3062
+ | `step-0.3` | Additional recommendations | 1.2.0 |
3063
+ | `step-0.4` | Auto-scan | 1.0.0 |
3064
+ | `step-1` | Confirm/customize | 1.0.0 |
3065
+ | `step-2` | Directory structure | 1.0.0 |
3066
+ | `step-3` | settings.json | 1.0.0 |
3067
+ | `step-4` | Light hook | 1.0.0 |
3068
+ | `step-5` | TDD hook | 1.0.0 |
3069
+ | `step-6` | SDLC skill | 1.0.0 |
3070
+ | `step-7` | Testing skill | 1.0.0 |
3071
+ | `step-8` | CLAUDE.md | 1.0.0 |
3072
+ | `step-9` | SDLC/TESTING/ARCH docs | 1.0.0 |
3073
+ | `question-git-workflow` | Git workflow preference | 1.2.0 |
3074
+ | `step-update-notify` | Optional: CI update notification | 1.13.0 |
3075
+ | `step-cross-model-review` | Optional: Cross-model review loop | 1.16.0 |
3076
+
3077
+ When checking for updates, Claude compares user's completed steps against this registry.
3078
+
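The comparison against the registry amounts to a set difference that keeps registry order. A minimal sketch, assuming the registry is available as a list of `(step ID, description)` pairs; `REGISTRY` here is only a subset of the table above and `pending_steps` is a hypothetical name:

```python
# Subset of the step registry table above: (step ID, description).
REGISTRY = [
    ("step-0.1", "Required plugins"),
    ("step-0.2", "SDLC core setup"),
    ("step-1", "Confirm/customize"),
    ("step-update-notify", "Optional: CI update notification"),
]


def pending_steps(registry, completed):
    """Return registry step IDs, in registry order, that are not yet completed."""
    done = set(completed)
    return [step_id for step_id, _description in registry if step_id not in done]


todo = pending_steps(REGISTRY, ["step-0.1", "step-0.2", "step-1"])
print(todo)  # ['step-update-notify']
```

Because the result depends only on step IDs, a user on any older version gets exactly the steps they are missing, regardless of how many releases they skipped.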
3079
+ ### How New Wizard Features Work
3080
+
3081
+ When we add something new to the wizard:
3082
+
3083
+ 1. **Add it as a trackable step** with a unique ID
3084
+ 2. **Add it to CHANGELOG** so users know what's new
3085
+ 3. **Old users who run "check for updates":**
3086
+    - Claude sees their version is older
3087
+    - Claude finds steps that don't exist in their tracking metadata
3088
+    - Claude walks them through just those steps
3089
+ 4. **New users:**
3090
+    - Go through everything, all steps get marked complete
3091
+
3092
+ **This is recursive** - every future wizard update follows the same pattern.
3093
+
3094
+ ### Why Is It Designed to Be Idempotent?
3095
+
3096
+ Like `apt-get install`:
3097
+ - If package installed → skip
3098
+ - If package missing → install
3099
+ - If package outdated → offer update
3100
+ - Designed to not break existing state
3101
+
3102
+ **Intended benefits:**
3103
+ - **Safe to rerun** - designed to not duplicate or break existing setup
3104
+ - **One command for everyone** - new users, old users, current users
3105
+ - **Preserves customizations** - designed to keep your modifications intact
3106
+ - **Fills gaps** - aims to detect and address what's missing
3107
+
3108
+ > Note: Idempotent behavior is a design goal. Cross-stack setup-path E2E testing is tracked in the roadmap.
3109
+
3110
+ ### What Gets Compared
3111
+
3112
+ | Your File | Compared Against | Action |
3113
+ |-----------|------------------|--------|
3114
+ | `.claude/hooks/*.sh` | Wizard hook templates | Offer update if differs |
3115
+ | `.claude/skills/*/SKILL.md` | Wizard skill templates | Offer update if differs |
3116
+ | `SDLC.md`, `TESTING.md` | Wizard doc templates | Offer update if differs |
3117
+ | `CLAUDE.md` | NOT compared | Never touch (fully custom) |
3118
+
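The table above, together with the "If Missing / If Present" rules earlier, reduces to a small decision function. This is a sketch under stated assumptions: it compares raw text for equality (the wizard does not specify how Claude judges "differs"), and the names `NEVER_TOUCH` and `plan_action` are hypothetical.

```python
from typing import Optional

# Files that are never modified once they exist (per the table above).
NEVER_TOUCH = {"CLAUDE.md"}


def plan_action(name: str, current_text: Optional[str], template_text: str) -> str:
    """Decide what to do with one file.

    current_text is None when the file does not exist in the user's repo.
    Returns "create", "skip", or "offer-update".
    """
    if current_text is None:
        return "create"          # missing: create from the wizard template
    if name in NEVER_TOUCH:
        return "skip"            # fully custom: never compared, never changed
    if current_text == template_text:
        return "skip"            # already up to date
    return "offer-update"        # differs: opt-in update, never forced


print(plan_action("SDLC.md", None, "template"))         # create
print(plan_action("SDLC.md", "old text", "template"))   # offer-update
print(plan_action("CLAUDE.md", "my rules", "template")) # skip
```

Calling it again after an update has been applied returns `skip` for that file, which is the rerun-safe behavior the design aims for.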
3119
+ ### Wizard Update Notification (Optional)
3120
+
3121
+ Want to be notified when a new wizard version is available? Add this lightweight GitHub Action to your repo. It checks weekly, costs $0 (no API key), and creates a GitHub Issue when updates exist.
3122
+
3123
+ **Setup:**
3124
+ 1. Create `.github/workflows/wizard-update-check.yml`:
3125
+
3126
+ ```yaml
3127
+ name: SDLC Wizard Update Check
3128
+
3129
+ on:
3130
+   schedule:
3131
+     - cron: '0 10 * * 1' # Mondays 10 AM UTC
3132
+   workflow_dispatch:
3133
+
3134
+ permissions:
3135
+   issues: write
3136
+   contents: read
3137
+
3138
+ jobs:
3139
+   check-wizard-update:
3140
+     runs-on: ubuntu-latest
3141
+     steps:
3142
+       - uses: actions/checkout@v4
3143
+         with:
3144
+           sparse-checkout: SDLC.md
3145
+
3146
+       - name: Check for wizard updates
3147
+         id: check
3148
+         run: |
3149
+           # Read installed version from SDLC.md metadata
3150
+           INSTALLED=$(grep -o 'SDLC Wizard Version: [0-9.]*' SDLC.md | grep -o '[0-9.]*' || echo "0.0.0")
3151
+           echo "Installed wizard version: $INSTALLED"
3152
+
3153
+           # Fetch latest CHANGELOG
3154
+           curl -sL https://raw.githubusercontent.com/BaseInfinity/agentic-ai-sdlc-wizard/main/CHANGELOG.md -o /tmp/changelog.md
3155
+
3156
+           # Extract latest version (first ## [X.X.X] line)
3157
+           LATEST=$(grep -m1 -oE '\[[0-9]+\.[0-9]+\.[0-9]+\]' /tmp/changelog.md | tr -d '[]')
3158
+           echo "Latest wizard version: $LATEST"
3159
+
3160
+           if [ "$INSTALLED" = "$LATEST" ]; then
3161
+             echo "Up to date"
3162
+             echo "needs_update=false" >> "$GITHUB_OUTPUT"
3163
+             exit 0
3164
+           fi
3165
+
3166
+           echo "Update available: v$INSTALLED -> v$LATEST"
3167
+           echo "installed=$INSTALLED" >> "$GITHUB_OUTPUT"
3168
+           echo "latest=$LATEST" >> "$GITHUB_OUTPUT"
3169
+           echo "needs_update=true" >> "$GITHUB_OUTPUT"
3170
+
3171
+       - name: Extract changelog entries
3172
+         if: steps.check.outputs.needs_update == 'true'
3173
+         run: |
3174
+           python3 -c "
3175
+           import re
3176
+           text = open('/tmp/changelog.md').read()
3177
+           installed = '${{ steps.check.outputs.installed }}'
3178
+           sections = re.split(r'^## ', text, flags=re.MULTILINE)
3179
+           relevant = []
3180
+           for s in sections:
3181
+               m = re.match(r'\[(\d+\.\d+\.\d+)\]', s)
3182
+               if m:
3183
+                   v = m.group(1)
3184
+                   if v == installed:
3185
+                       break
3186
+                   relevant.append('## ' + s)
3187
+           with open('/tmp/changes.md', 'w') as f:
3188
+               f.write(''.join(relevant))
3189
+           "
3190
+
3191
+       - name: Create notification issue
3192
+         if: steps.check.outputs.needs_update == 'true'
3193
+         env:
3194
+           GH_TOKEN: ${{ github.token }}
3195
+           INSTALLED: ${{ steps.check.outputs.installed }}
3196
+           LATEST: ${{ steps.check.outputs.latest }}
3197
+         run: |
3198
+           # Ensure wizard-update label exists
3199
+           gh label create "wizard-update" --color "0E8A16" --description "SDLC Wizard update available" 2>/dev/null || true
3200
+
3201
+           # Skip if open wizard-update issue already exists
3202
+           EXISTING=$(gh issue list --label "wizard-update" --state open --json number --jq '.[0].number' 2>/dev/null || echo "")
3203
+           if [ -n "$EXISTING" ]; then
3204
+             echo "Issue #$EXISTING already open, skipping"
3205
+             exit 0
3206
+           fi
3207
+
3208
+           # Fallback: if the extract-changelog step was skipped or failed, $CHANGES will
3209
+           # contain a plain string so the issue body still makes sense without changelog detail.
3210
+           CHANGES=$(cat /tmp/changes.md 2>/dev/null || echo "See CHANGELOG for details.")
3211
+
3212
+           # Note: ISSUE_EOF terminator indentation is intentional — YAML strips the block's
3213
+           # base indentation, leaving ISSUE_EOF at column 0 in the shell. Do not change it.
3214
+           gh issue create \
3215
+             --title "SDLC Wizard update: v${INSTALLED} -> v${LATEST}" \
3216
+             --label "wizard-update" \
3217
+             --body "$(cat <<ISSUE_EOF
3218
+           ## SDLC Wizard Update Available
3219
+
3220
+           **Installed:** v${INSTALLED}
3221
+           **Latest:** v${LATEST}
3222
+
3223
+           ### What's New
3224
+
3225
+           ${CHANGES}
3226
+
3227
+           ### How to Update
3228
+
3229
+           Ask Claude: **"Check for SDLC wizard updates"**
3230
+
3231
+           Claude will fetch the latest wizard, show what changed, and walk you through updates (opt-in each).
3232
+
3233
+           ---
3234
+           *Auto-generated by wizard update check. Close after updating.*
3235
+           ISSUE_EOF
3236
+           )"
3237
+ ```
3238
+
3239
+ 2. That's it — you'll get a GitHub Issue when updates are available (the `wizard-update` label is auto-created on first run)
3240
+
3241
+ **Cost:** $0. No API key needed. Pure bash/curl/python3. ~10 seconds of GitHub Actions time per week.
3242
+
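One design detail worth knowing: the workflow above flags an update whenever the installed and latest version strings are not equal, so it does not check which one is newer. If you want an ordered comparison, a numeric tuple comparison is one option. This sketch is not part of the workflow; `parse_version` and `needs_update` are hypothetical names, and it assumes plain `X.Y.Z` versions with no pre-release suffixes.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.13.0' (or 'v1.13.0') into (1, 13, 0) for numeric comparison."""
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def needs_update(installed: str, latest: str) -> bool:
    """True only when `latest` is strictly newer than `installed`."""
    return parse_version(installed) < parse_version(latest)


print(needs_update("1.8.0", "1.13.0"))   # True
print(needs_update("1.15.0", "1.15.0"))  # False
print(needs_update("1.16.0", "1.15.0"))  # False
```

Comparing as integers matters because a plain string comparison would rank `1.8.0` above `1.13.0`. An empty or malformed version raises `ValueError` here, which makes a failed CHANGELOG fetch visible instead of silently opening an issue.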
3243
+ ### Why This Approach?
3244
+
3245
+ - **Manual flow (primary):** Uses Claude Code's built-in WebFetch - zero infrastructure
3246
+ - **CI notification (optional):** Lightweight issue creation - no API key, $0 cost
3247
+ - **Opt-in per change** - your customizations stay safe
3248
+ - **Tracks setup steps, not just files** - old users get new features
3249
+
3250
+ ---
3251
+
3252
+ ## Philosophy: Bespoke & Organic
3253
+
3254
+ ### The Real Goal (Read This!)
3255
+
3256
+ **This SDLC becomes YOUR custom-tailored workflow.**
3257
+
3258
+ Like a bespoke suit fitted to your body, this SDLC should grow and adapt to fit YOUR project perfectly. The wizard is a starting point - generic principles that Claude Code uses to build something unique to you.
3259
+
3260
+ **The magic:**
3261
+ - **Generic principles** - This wizard focuses on the "why", not tech specifics
3262
+ - **Claude figures out the details** - Your stack, your commands, your patterns
3263
+ - **Organic growth** - CI friction signals + scheduled research feed continuous improvement
3264
+ - **Recursive improvement** - The more you use it, the more tailored it becomes
3265
+
3266
+ ### Failure is Part of the Process
3267
+
3268
+ **No pain, no gain.**
3269
+
3270
+ When something doesn't work:
3271
+ 1. That's feedback, not failure
3272
+ 2. Claude proposes an adjustment
3273
+ 3. You approve (or tweak)
3274
+ 4. The SDLC gets better
3275
+
3276
+ **Friction is information.** Every time Claude struggles, that's a signal. Maybe the docs need updating. Maybe a gotcha needs documenting. Maybe the process needs simplifying.
3277
+
3278
+ **Don't fear mistakes.** They're how this system learns YOUR project.
3279
+
3280
+ ### Why Generic Principles Matter
3281
+
3282
+ **Less is more. Principles over prescriptions.**
3283
+
3284
+ 1. **"Plan before coding"** not "use exactly this planning template"
3285
+ 2. **"Test your work"** not "use Jest with this exact config"
3286
+ 3. **"Ask when uncertain"** not "if confidence < 60% then ask"
3287
+
3288
+ **Claude adapts the principles to YOUR stack.** Give Claude the philosophy, it figures out your tech details - your commands, your patterns, your workflow.
3289
+
3290
+ **The temptation:** Add more rules, more specifics, more enforcement.
3291
+ **The discipline:** Keep it generic. Trust Claude to adapt. KISS.
3292
+
3293
+ ### Stay Lean, Stay Engaged
3294
+
3295
+ **Don't drown in complexity. Don't turn your brain off.**
3296
+
3297
+ The human's job:
3298
+ - **Stay engaged** - keep the AI agent on track
3299
+ - **Build trust** - as velocity increases, you trust the process more
3300
+ - **Focus on what matters** - planning and confidence levels
3301
+
3302
+ **Maximum efficiency for both parties:**
3303
+ - AI handles execution details
3304
+ - Human handles direction and judgment
3305
+ - Neither is passive
3306
+
3307
+ **When you reach velocity:** You're not checking every line. You trust the process. Your brain focuses on planning and fixing confidence issues - the high-leverage work.
3308
+
3309
+ ### How Tailoring Happens
3310
+
3311
+ **This SDLC fits your project like custom-tailored clothes.**
3312
+
3313
+ The wizard provides generic starting principles, then:
3314
+
3315
+ 1. **Claude encounters your codebase** - Learns your patterns, idioms, structure
3316
+ 2. **Friction happens** - Claude struggles or makes a mistake
3317
+ 3. **Claude proposes a tweak** - "Should I add this gotcha to the docs?"
3318
+ 4. **You approve** - The SDLC becomes more fitted to YOUR project
3319
+ 5. **Repeat** - Each iteration makes it more bespoke
3320
+
3321
+ **After a few cycles:** This SDLC feels native to your project, not bolted on.
3322
+
3323
+ ### The Living System
3324
+
3325
+ > See **The Vision** at the top of this document for the full philosophy — including planned obsolescence, the Iron Man analogy, and tuning to your project.
3326
+
3327
+ ### Evolving with Claude Code
3328
+
3329
+ **Claude Code's agentic capabilities keep improving. This SDLC should evolve with them.**
3330
+
3331
+ Claude should periodically:
3332
+ 1. **Check latest Claude Code docs** - New features? Better patterns? Built-in capabilities?
3333
+ 2. **Research current best practices** - WebSearch for 2026 patterns, compare with what we're doing
3334
+ 3. **Propose SDLC updates** - "Claude Code now has X, should we use it instead of our custom Y?"
3335
+
3336
+ **The goal:** Keep the SDLC pipeline adapting to Claude's latest capabilities. Don't get stuck on old patterns when better ones exist.
3337
+
3338
+ **When Claude discovers something better:**
3339
+ 1. Propose the change with reasoning
3340
+ 2. Human approves
3341
+ 3. Update the SDLC docs
3342
+ 4. The pipeline gets better
3343
+
3344
+ **This SDLC is not static.** It grows with your project AND with Claude Code's evolution.
3345
+
3346
+ ### Stay Lightweight (Use Official Plugins)
3347
+
3348
+ When Anthropic provides official plugins that overlap with this SDLC:
3349
+
3350
+ **Use theirs, delete ours.**
3351
+
3352
+ | Official Plugin | Replaces Our... | Scope |
3353
+ |-----------------|-----------------|-------|
3354
+ | `claude-md-management` | Manual CLAUDE.md audits | CLAUDE.md only (not feature docs, TESTING.md, hooks) |
3355
+ | `code-review` | Custom self-review subagent | Local code review (parallel agents, confidence scoring) |
3356
+ | `commit-commands` | Git commit guidance | Commits only |
3357
+ | `claude-code-setup` | Manual automation discovery | Recommendations only |
3358
+
3359
+ **What we keep (not in official plugins):**
3360
+ - TDD Red-Green-Pass enforcement (hooks)
3361
+ - Confidence levels
3362
+ - Planning mode integration
3363
+ - Testing Diamond guidance
3364
+ - Feature docs, TESTING.md, ARCHITECTURE.md maintenance
3365
+ - Full SDLC workflow (planning → TDD → review)
3366
+
3367
+ **The goal isn't obsolescence - it's efficiency.** Official plugins are maintained by Anthropic, tested across codebases, and updated automatically.
3368
+
3369
+ **Check for new plugins periodically:**
3370
+ ```
3371
+ /plugin > Discover
3372
+ ```
3373
+
3374
+ **Re-run `claude-code-setup` periodically** (quarterly, or when your project expands in scope) to catch new automations — MCP servers, hooks, subagents — that weren't relevant at initial setup but are now.
3375
+
3376
+ ### When Claude Code Improves
3377
+
3378
+ Claude Code is actively improving. When Anthropic adds built-in features:
3379
+
3380
+ | If Claude Code Adds... | Remove Our... |
3381
+ |------------------------|---------------|
3382
+ | Built-in TDD enforcement | `tdd-pretool-check.sh` |
3383
+ | Built-in confidence tracking | Confidence level guidance |
3384
+ | Built-in task tracking | TodoWrite reminders |
3385
+
3386
+ Use the best tool for the job. If Claude Code builds it better, use theirs.
3387
+
3388
+ ---
3389
+
3390
+ ## Community Contributions (Give Back!)
3391
+
3392
+ **This wizard belongs to the community, not any individual.**
3393
+
3394
+ ### Your Discoveries Help Everyone
3395
+
3396
+ When you find something valuable - a gotcha, a pattern, a simplification - consider contributing it back to the wizard repo so others benefit.
3397
+
3398
+ **Periodically, Claude may ask:**
3399
+ > "You discovered something useful here. Want to contribute this back to the wizard repo so others can benefit?"
3400
+
3401
+ Options:
3402
+ - **Yes** - Claude helps you create a PR
3403
+ - **Not now** - Ask again another time
3404
+ - **Never** - Never ask again (stored in config, respected forever)
3405
+
3406
+ ### What Makes a Good Contribution?
3407
+
3408
+ | Good Contributions | Not Ideal |
3409
+ |-------------------|-----------|
3410
+ | Philosophical insights | Project-specific config |
3411
+ | Generalizable patterns | Niche edge cases |
3412
+ | Simplifications | Additional complexity |
3413
+ | Battle-tested gotchas | Theoretical concerns |
3414
+
3415
+ ### The Bar: Scientific & Philosophical
3416
+
3417
+ Keep contributions:
3418
+ - **Scientific** - Based on what works, tested in real projects
3419
+ - **Philosophical** - Explains the "why", not just the "what"
3420
+ - **Generalizable** - Helps most projects, not just yours
3421
+
3422
+ ### How to Contribute
3423
+
3424
+ 1. Fork the repo: `github.com/[REPO_URL]`
3425
+ 2. Edit `CLAUDE_CODE_SDLC_WIZARD.md`
3426
+ 3. Submit a PR explaining the "why"
3427
+ 4. Community discusses, best ideas get merged
3428
+
3429
+ **Decision making is community-driven.** No single person owns this wizard.
3430
+
3431
+ ---
3432
+
3433
+ ## Ike the Cat: Easter Egg Apologies
3434
+
3435
+ **Meet Ike.** He's a sweet little black boy - 8 pounds of pure opinions. Short-to-medium black fur, loud meows, zero tolerance for bugs in production. Lives for Fancy Feast. He watches every coding session from his perch and judges silently. Sometimes not so silently.
3436
+
3437
+ **This is an EASTER EGG, not a regular feature.**
3438
+
3439
+ If the SDLC is working well, users should NEVER be frustrated. Ike jokes are a rare safety valve for when things slip - not something they encounter often.
3440
+
3441
+ **The goal:** SDLC should be smooth enough that no one needs apologies.
3442
+
3443
+ **When frustration happens (rare):**
3444
+ 1. Apologize sincerely
3445
+ 2. Keep it lighthearted with an Ike joke
3446
+ 3. Match THEIR humor, context, even their own words back at them (positively!)
3447
+ 4. SDLC is tough - Ike keeps it light
3448
+
3449
+ **Examples (adapt to user's style AND repo context):**
3450
+ - "Sorry about that! Ike just knocked over my coffee watching me type this. He's judging both of us right now."
3451
+ - "My bad! Ike says I should've asked first. He's very opinionated for an 8-pound cat."
3452
+ - "Ike just walked across my keyboard. He says 'asdfghjkl' which I think means 'write the test first next time.'"
3453
+ - Reference the repo/code they're working on:
3454
+   - (Discord bot) "Ike tried to bet on himself. The odds were not in his favor."
3455
+   - (MMA odds) "Ike thinks he could take on a lightweight. He weighs 8 pounds."
3456
+   - (Mass text) "Ike just sent 47 meows to everyone in my contacts."
3457
+
3458
+ **Be quirky! Have fun with it.** Match the vibe of what they're building.
3459
+
3460
+ **Why Ike?** Apologies should be light, not groveling. Ike keeps it friendly and human.
3461
+
3462
+ **Use their name/nickname** if you know it. Makes it personal.
3463
+
3464
+ **Mirror their communication style.** If they curse, you can curse back (friendly). If they're casual, be casual. Talk like they talk.
3465
+
3466
+ **If they don't like jokes:** Make one joke, then never mention it again. Simple.
3467
+
3468
+ ---
3469
+
3470
+ **You're ready!** Start a new Claude Code session and try implementing something. The system will guide Claude through the proper workflow automatically.