agentic-sdlc-wizard 1.15.0 → 1.20.0

@@ -93,8 +93,10 @@ This prevents both false positives (crying wolf) and false negatives (missing re
93
93
 
94
94
  **How We Apply This:**
95
95
  - Weekly workflow tests new Claude Code versions before recommending upgrade
96
+ - Version-pinned gate: installs the specific CC version and passes it via `path_to_claude_code_executable` so E2E actually runs the new binary
96
97
  - Phase A: Does new CC version break SDLC enforcement?
97
98
  - Phase B: Do changelog-suggested improvements actually help?
99
+ - Green CI = safe to upgrade. Red = stay on current version until fixed
98
100
  - Results shown in PR with statistical confidence
99
101
 
100
102
  ---
@@ -209,6 +211,37 @@ When Anthropic provides official plugins or tools that handle something:
209
211
  | **Claude Code v2.1.69+** | Required for InstructionsLoaded hook, skill directory variable, and Tasks system |
210
212
  | **Git repository** | Files should be committed for team sharing |
211
213
 
214
+ **Blank repos (no CLAUDE.md, no code):** The wizard works on empty repos. Run `npx agentic-sdlc-wizard init` — it installs hooks, skills, and the wizard doc. On first session, the hooks detect missing SDLC files and redirect to `/setup-wizard`, which generates CLAUDE.md, SDLC.md, TESTING.md, and ARCHITECTURE.md interactively. You do NOT need to run Claude's built-in `/init` first — the setup wizard handles everything.
215
+
216
+ ---
217
+
218
+ ## Recommended Effort Level
219
+
220
+ Claude Code's **effort level** controls how much thinking the model does before responding. Higher effort = deeper reasoning but more tokens.
221
+
222
+ | Level | When to Use | How to Set |
223
+ |-------|-------------|------------|
224
+ | `high` | **Default for all SDLC work.** Features, bug fixes, refactoring, tests, reviews | `effort: high` in skill frontmatter (already set) |
225
+ | `max` | LOW confidence, FAILED 2x, architecture decisions, complex debugging, cross-model reviews | `/effort max` (session only — resets next session) |
226
+
227
+ **Why `high` is the default:** The `/sdlc` skill sets `effort: high` in its frontmatter, so every SDLC invocation automatically uses high effort. This gives thorough reasoning without the unbounded token cost of `max`.
228
+
229
+ **When to escalate to `max`:**
230
+ - You hit LOW confidence on your approach — deeper thinking may find clarity
231
+ - You've failed the same thing twice — something non-obvious is wrong
232
+ - Architecture decisions with wide blast radius
233
+ - Complex multi-system debugging where you need to hold many constraints
234
+ - Cross-model review analysis (reading and evaluating external reviewer findings)
235
+
236
+ **How it works:**
237
+ - `/effort max` changes effort for the current session only (resets next session)
238
+ - `effort: high` in SKILL.md frontmatter persists — every `/sdlc` invocation uses `high`
239
+ - You can also type `ultrathink` in any prompt for a single high-effort turn
240
+
241
+ **Cost note:** `max` uses significantly more tokens than `high`. Use it when the problem justifies it, not as a default.
242
+
243
+ > See also: the **Effort** column in the [Confidence Check table](#confidence-check-required) below for per-confidence-level guidance on when to escalate to `max`.
244
+
212
245
  ---
213
246
 
214
247
  ## Claude Code Feature Updates
@@ -250,7 +283,7 @@ $ARGUMENTS
250
283
 
251
284
  **Usage examples**:
252
285
  - `/sdlc fix the login validation bug` → `$ARGUMENTS` = "fix the login validation bug"
253
- - `/testing unit UserService` → `$ARGUMENTS` = "unit UserService"
286
+ - `/sdlc write tests for UserService` → `$ARGUMENTS` = "write tests for UserService"
254
287
 
255
288
  **Note**: Skills still auto-invoke via hooks. This is optional polish for manual invocation.
256
289
 
@@ -276,7 +309,7 @@ New built-in commands available to use alongside the wizard:
276
309
 
277
310
  ### Skill Effort Frontmatter (v2.1.80+)
278
311
 
279
- Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc` and `/testing` skills use `effort: high` to ensure Claude gives full attention to SDLC tasks.
312
+ Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc` skill uses `effort: high` to ensure Claude gives full attention to SDLC tasks.
280
313
 
281
314
  ### InstructionsLoaded Hook (v2.1.69+)
282
315
 
@@ -615,6 +648,33 @@ Security review depth should match your project's risk profile. During wizard se
615
648
 
616
649
  ---
617
650
 
651
+ ## Context Management: `/clear` vs `/compact`
652
+
653
+ Two tools for managing context — use the right one:
654
+
655
+ | | `/compact` | `/clear` |
656
+ |---|---|---|
657
+ | **What it does** | Summarizes conversation, frees space | Resets conversation entirely |
658
+ | **When to use** | Continuing same task, need more room | Switching to an unrelated task |
659
+ | **Preserves** | Summary of decisions + progress | Nothing (fresh start) |
660
+ | **CLAUDE.md** | Re-loaded from disk | Re-loaded from disk |
661
+ | **Hooks/skills/settings** | Unaffected | Unaffected |
662
+ | **Task list** | Persists | Cleared |
663
+
664
+ **Rules of thumb:**
665
+ - `/compact` between planning and implementation (plan preserved in summary)
666
+ - `/clear` between unrelated tasks (stale context wastes tokens and misleads Claude)
667
+ - `/clear` after 2+ failed corrections on the same issue (context is polluted with bad approaches — start fresh with a better prompt)
668
+ - After committing a PR, `/clear` before starting the next feature
669
+
670
+ **Auto-compact** fires automatically at ~95% context capacity. You don't need to manage this manually — Claude Code handles it. The SDLC skill suggests `/compact` during CI idle time as a "context GC" opportunity.
671
+
672
+ **What survives `/compact`:** Key decisions, code changes, task state (as a summary). What can be lost: detailed early-conversation instructions not in CLAUDE.md, specific file contents read long ago.
673
+
674
+ **Best practice:** Put persistent instructions in CLAUDE.md (survives both `/compact` and `/clear`), not in conversation.
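
The rules of thumb above can be read as a small decision procedure. What follows is a minimal sketch, not something the wizard ships: `suggest_context_command` and its two arguments are hypothetical names used only for illustration.

```bash
#!/bin/bash
# Hypothetical helper (not part of the wizard): pick a context command
# from the rules of thumb above.
suggest_context_command() {
  local same_task="$1"               # "yes" when continuing the current task
  local failed_corrections="${2:-0}" # failed corrections on the same issue

  if [ "$failed_corrections" -ge 2 ]; then
    echo "/clear"    # context is polluted with bad approaches
  elif [ "$same_task" = "yes" ]; then
    echo "/compact"  # keep a summary, free up room
  else
    echo "/clear"    # unrelated task: start fresh
  fi
}
```

For example, `suggest_context_command yes 0` prints `/compact`, while `suggest_context_command yes 2` prints `/clear` because repeated failed corrections outweigh task continuity.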
675
+
676
+ ---
677
+
618
678
  ## Example Workflow (End-to-End)
619
679
 
620
680
  Here's what a typical task looks like with this system:
@@ -883,21 +943,25 @@ The wizard creates TDD-specific automations that official plugins don't provide:
883
943
 
884
944
  ### Step 0.3: Additional Recommendations (Optional)
885
945
 
886
- After SDLC setup is complete, run `claude-code-setup` for additional recommendations:
946
+ After SDLC setup is complete, run `/claude-automation-recommender` for stack-specific tooling:
887
947
 
888
948
  ```
889
- "Based on your codebase, recommend additional automations"
949
+ /claude-automation-recommender
890
950
  ```
891
951
 
892
- This may suggest:
893
- - MCP Servers (context7 for docs, Playwright for frontend)
894
- - Additional hooks (auto-format if Prettier configured)
895
- - Subagents (security-reviewer if auth code detected)
952
+ **The wizard is an enforcement engine** — it installs working hooks, skills, and process guardrails that run automatically. **The recommender is a suggestion engine** — it analyzes your codebase and suggests additional automations you might want. They're complementary:
896
953
 
897
- **Claude prompts for each:**
898
- > "[Detected: Prettier config] Want to add auto-format hook? (y/n)"
954
+ | Category | Wizard Ships | Recommender Suggests |
955
+ |----------|-------------|---------------------|
956
+ | SDLC process (TDD, planning, review) | Enforced via hooks + skills | Not covered |
957
+ | CI workflows (self-heal, PR review) | Templates + docs | Not covered |
958
+ | MCP servers (context7, Playwright, DB) | Not covered | Per-stack suggestions |
959
+ | Auto-formatting hooks (Prettier, ESLint) | Not covered | Per-stack suggestions |
960
+ | Type-checking hooks (tsc, mypy) | Not covered | Per-stack suggestions |
961
+ | Subagent templates (code-reviewer, etc.) | Cross-model review only | 8 templates |
962
+ | Plugin recommendations (LSPs, etc.) | Not covered | Per-stack suggestions |
899
963
 
900
- These are additive—they don't replace our TDD hooks.
964
+ The recommender's suggestions are additive; they don't replace the wizard's TDD hooks or SDLC enforcement.
901
965
 
902
966
  ### Git Workflow Preference
903
967
 
@@ -1376,7 +1440,7 @@ Your answers map to these files:
1376
1440
  | Q4-Q8 (commands) | `CLAUDE.md` - Commands section |
1377
1441
  | Q9-Q10 (infra) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
1378
1442
  | Q11 (test duration) | `SDLC skill` - wait time note |
1379
- | Q12 (E2E) | `testing skill` - testing diamond top |
1443
+ | Q12 (E2E) | `TESTING.md` - testing diamond top |
1380
1444
 
1381
1445
  ---
1382
1446
 
@@ -1387,7 +1451,6 @@ Create these directories in your project root:
1387
1451
  ```bash
1388
1452
  mkdir -p .claude/hooks
1389
1453
  mkdir -p .claude/skills/sdlc
1390
- mkdir -p .claude/skills/testing
1391
1454
  ```
1392
1455
 
1393
1456
  **Commit to Git:** Yes! These files should be committed so your whole team gets the same SDLC enforcement. When teammates pull, they get the hooks and skills automatically.
@@ -1506,9 +1569,8 @@ The `allowedTools` array is auto-generated based on your stack detected in Step
1506
1569
  The light hook outputs text that **instructs Claude** to invoke skills:
1507
1570
 
1508
1571
  ```
1509
- AUTO-INVOKE SKILLS (Claude MUST do this FIRST):
1510
- - implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
1511
- - test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
1572
+ AUTO-INVOKE SKILL (Claude MUST do this FIRST):
1573
+ - implement/fix/refactor/feature/bug/build/test/TDD → Invoke: Skill tool, skill="sdlc"
1512
1574
  ```
1513
1575
 
1514
1576
  **This is text-based, not programmatic.** Claude reads this instruction and follows it. When Claude sees your message is an implementation task, it invokes the sdlc skill using the Skill tool. This loads the full SDLC guidance into context.
@@ -1530,7 +1592,7 @@ Create `.claude/hooks/sdlc-prompt-check.sh`:
1530
1592
  ```bash
1531
1593
  #!/bin/bash
1532
1594
  # Light SDLC hook - baseline reminder every prompt (~100 tokens)
1533
- # Full guidance in skills: .claude/skills/sdlc/ and .claude/skills/testing/
1595
+ # Full guidance in skill: .claude/skills/sdlc/
1534
1596
 
1535
1597
  cat << 'EOF'
1536
1598
  SDLC BASELINE:
@@ -1540,10 +1602,8 @@ SDLC BASELINE:
1540
1602
  4. FAILED 2x? STOP and ASK USER
1541
1603
  5. 🛑 ALL TESTS MUST PASS BEFORE COMMIT - NO EXCEPTIONS
1542
1604
 
1543
- AUTO-INVOKE SKILLS (Claude MUST do this FIRST):
1544
- - implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
1545
- - test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
1546
- - If BOTH match (e.g., "fix the test") → sdlc takes precedence (includes TDD)
1605
+ AUTO-INVOKE SKILL (Claude MUST do this FIRST):
1606
+ - implement/fix/refactor/feature/bug/build/test/TDD → Invoke: Skill tool, skill="sdlc"
1547
1607
  - DON'T invoke for: questions, explanations, reading/exploring code, simple queries
1548
1608
  - DON'T wait for user to type /sdlc - AUTO-INVOKE based on task type
1549
1609
 
@@ -1635,7 +1695,7 @@ TodoWrite([
1635
1695
  { content: "Present approach + STATE CONFIDENCE LEVEL", status: "pending", activeForm: "Presenting approach" },
1636
1696
  { content: "Signal ready - user exits plan mode", status: "pending", activeForm: "Awaiting plan approval" },
1637
1697
  // TRANSITION PHASE (After plan mode, before compact)
1638
- { content: "Update feature docs with discovered gotchas", status: "pending", activeForm: "Updating feature docs" },
1698
+ { content: "Doc sync: update feature docs if code change contradicts or extends documented behavior", status: "pending", activeForm: "Syncing feature docs" },
1639
1699
  { content: "Request /compact before TDD", status: "pending", activeForm: "Requesting compact" },
1640
1700
  // IMPLEMENTATION PHASE (After compact)
1641
1701
  { content: "TDD RED: Write failing test FIRST", status: "pending", activeForm: "Writing failing test" },
@@ -1676,7 +1736,7 @@ TodoWrite([
1676
1736
 
1677
1737
  **Workflow:**
1678
1738
  1. **Plan Mode** (editing blocked): Research → Write plan file → Present approach + confidence
1679
- 2. **Transition** (after approval): Update feature docs → Request /compact
1739
+ 2. **Transition** (after approval): Doc sync (update feature docs if code contradicts/extends them) → Request /compact
1680
1740
  3. **Implementation** (after compact): TDD RED → GREEN → PASS
1681
1741
 
1682
1742
  **Before TDD, MUST ask:** "Docs updated. Run `/compact` before implementation?"
@@ -1685,13 +1745,13 @@ TodoWrite([
1685
1745
 
1686
1746
  Before presenting approach, STATE your confidence:
1687
1747
 
1688
- | Level | Meaning | Action |
1689
- |-------|---------|--------|
1690
- | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval |
1691
- | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties |
1692
- | LOW (<60%) | Not sure | ASK USER before proceeding |
1693
- | FAILED 2x | Something's wrong | STOP. ASK USER immediately |
1694
- | CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help |
1748
+ | Level | Meaning | Action | Effort |
1749
+ |-------|---------|--------|--------|
1750
+ | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
1751
+ | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
1752
+ | LOW (<60%) | Not sure | ASK USER before proceeding | Consider `/effort max` |
1753
+ | FAILED 2x | Something's wrong | STOP. ASK USER immediately | Try `/effort max` |
1754
+ | CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | Try `/effort max` |
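
As a rough illustration of how the thresholds in this table combine, here is a minimal sketch. The function name `confidence_effort`, the integer-percent input, and the output format are assumptions made for this example; the wizard itself expresses these rules as text for Claude to follow, and the CONFUSED row is not modeled here.

```bash
#!/bin/bash
# Hypothetical helper (not part of the wizard): map a confidence percentage
# and a failure count to the level and suggested effort from the table above.
confidence_effort() {
  local percent="$1"       # integer 0-100 (assumed input format)
  local failures="${2:-0}" # failed attempts on the same problem

  if [ "$failures" -ge 2 ]; then
    echo "FAILED_2X max"   # STOP and ask the user; try /effort max
  elif [ "$percent" -ge 90 ]; then
    echo "HIGH high"
  elif [ "$percent" -ge 60 ]; then
    echo "MEDIUM high"
  else
    echo "LOW max"         # ask the user first; consider /effort max
  fi
}
```

The failure count is checked first because a second failure calls for stopping regardless of how confident the approach looked.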
1695
1755
 
1696
1756
  ## Self-Review Loop (CRITICAL)
1697
1757
 
@@ -1724,38 +1784,100 @@ PLANNING → DOCS → TDD RED → TDD GREEN → Tests Pass → Self-Review
1724
1784
 
1725
1785
  **Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
1726
1786
 
1727
- **Steps:**
1787
+ ### Round 1: Initial Review
1788
+
1728
1789
  1. After self-review passes, write `.reviews/handoff.json`:
1729
1790
  ```jsonc
1730
1791
  {
1731
1792
  "review_id": "feature-xyz-001",
1732
1793
  "status": "PENDING_REVIEW",
1794
+ "round": 1,
1733
1795
  "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
1734
1796
  "review_instructions": "Review for security, edge cases, and correctness",
1735
1797
  "artifact_path": ".reviews/feature-xyz-001/"
1736
1798
  }
1737
1799
  ```
1738
- 2. Tell the user to run the independent reviewer:
1800
+ 2. Run the independent reviewer:
1739
1801
  ```bash
1740
1802
  codex exec \
1741
1803
  -c 'model_reasoning_effort="xhigh"' \
1742
1804
  -s danger-full-access \
1743
1805
  -o .reviews/latest-review.md \
1744
1806
  "You are an independent code reviewer. Read .reviews/handoff.json, \
1745
- review the listed files, and write your findings to the artifact_path. \
1807
+ review the listed files. Output each finding with: an ID (1, 2, ...), \
1808
+ severity (P0/P1/P2), description, and a 'certify condition' stating \
1809
+ what specific change would resolve it. \
1746
1810
  End with CERTIFIED or NOT CERTIFIED."
1747
1811
  ```
1748
- 3. Read `.reviews/latest-review.md`: if CERTIFIED, proceed to CI. If NOT CERTIFIED, fix findings and repeat from step 1.
1812
+ 3. If CERTIFIED, proceed to CI. If NOT CERTIFIED, go to Round 2.
1813
+
1814
+ ### Round 2+: Dialogue Loop
1815
+
1816
+ When the reviewer finds issues, respond per-finding instead of silently fixing everything:
1817
+
1818
+ 1. Write `.reviews/response.json`:
1819
+ ```jsonc
1820
+ {
1821
+ "review_id": "feature-xyz-001",
1822
+ "round": 2,
1823
+ "responding_to": ".reviews/latest-review.md",
1824
+ "responses": [
1825
+ { "finding": "1", "action": "FIXED", "summary": "Added missing validation" },
1826
+ { "finding": "2", "action": "DISPUTED", "justification": "This is intentional — see CODE_REVIEW_EXCEPTIONS.md" },
1827
+ { "finding": "3", "action": "ACCEPTED", "summary": "Will add test coverage" }
1828
+ ]
1829
+ }
1830
+ ```
1831
+ - **FIXED**: "I fixed this. Here is what changed." Reviewer verifies.
1832
+ - **DISPUTED**: "This is intentional/incorrect. Here is why." Reviewer accepts or rejects.
1833
+ - **ACCEPTED**: "You are right. Fixing now." (Same as FIXED, batched.)
1834
+
1835
+ 2. Update `handoff.json` with `"status": "PENDING_RECHECK"`, increment `round`, add `"response_path"` and `"previous_review"` fields.
1836
+
1837
+ 3. Run targeted recheck (NOT a full re-review):
1838
+ ```bash
1839
+ codex exec \
1840
+ -c 'model_reasoning_effort="xhigh"' \
1841
+ -s danger-full-access \
1842
+ -o .reviews/latest-review.md \
1843
+ "You are doing a TARGETED RECHECK. First read .reviews/handoff.json \
1844
+ to find the previous_review path — read that file for the original \
1845
+ findings and certify conditions. Then read .reviews/response.json \
1846
+ for the author's responses. For each: \
1847
+ FIXED → verify the fix against the original certify condition. \
1848
+ DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
1849
+ ACCEPTED → verify it was applied. \
1850
+ Do NOT raise new findings unless P0 (critical/security). \
1851
+ New observations go in 'Notes for next review' (non-blocking). \
1852
+ End with CERTIFIED or NOT CERTIFIED."
1853
+ ```
1854
+
1855
+ 4. If CERTIFIED → done. If NOT CERTIFIED (rejected disputes or failed fixes) → fix rejected items and repeat.
1856
+
1857
+ ### Convergence
1858
+
1859
+ Max 3 recheck rounds (4 total including initial review). If still NOT CERTIFIED after round 4, escalate to the user with a summary of open findings. Don't spin indefinitely.
1749
1860
 
1750
1861
  ```
1751
- Self-review passes → write handoff.json → user runs codex exec
1752
- ^ |
1753
- | CERTIFIED? YES → CI feedback loop
1754
- | |
1755
- | NO (findings)
1756
- | |
1757
- └──────── Fix findings ←───────────────────────┘
1758
- (repeat until CERTIFIED, or ask user)
1862
+ Self-review passes → handoff.json (round 1, PENDING_REVIEW)
1863
+ |
1864
+ Reviewer: FULL REVIEW (structured findings)
1865
+ |
1866
+ CERTIFIED? YES → CI feedback loop
1867
+ |
1868
+ NO (findings with IDs + certify conditions)
1869
+ |
1870
+ Claude writes response.json:
1871
+ FIXED / DISPUTED / ACCEPTED per finding
1872
+ |
1873
+ handoff.json (round 2+, PENDING_RECHECK)
1874
+ |
1875
+ Reviewer: TARGETED RECHECK (previous findings only)
1876
+ |
1877
+ All resolved? → YES → CERTIFIED
1878
+ |
1879
+ NO → fix rejected items, repeat
1880
+ (max 3 rechecks, then escalate to user)
1759
1881
  ```
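
The loop in the diagram above can be sketched as two small shell functions. This is an illustrative sketch under stated assumptions, not the wizard's implementation: `review_verdict` and `next_review_step` are hypothetical names, and the sketch assumes the reviewer's verdict is the last line of the review that mentions CERTIFIED.

```bash
#!/bin/bash
# Hypothetical helpers (not part of the wizard) for the review dialogue loop.

# Read review text on stdin and print the verdict.
# "NOT CERTIFIED" is matched first because it contains "CERTIFIED".
review_verdict() {
  local last
  last="$(grep 'CERTIFIED' | tail -n 1)"
  case "$last" in
    *"NOT CERTIFIED"*) echo "NOT CERTIFIED" ;;
    *"CERTIFIED"*)     echo "CERTIFIED" ;;
    *)                 echo "UNKNOWN" ;;
  esac
}

# Decide what to do after a given round (1 = initial review, 2-4 = rechecks).
next_review_step() {
  local round="$1"
  local verdict="$2"
  local max_round=4   # 1 initial review + 3 rechecks

  if [ "$verdict" = "CERTIFIED" ]; then
    echo "proceed-to-ci"
  elif [ "$round" -ge "$max_round" ]; then
    echo "escalate-to-user"
  else
    echo "recheck"
  fi
}
```

A typical call would be `next_review_step 2 "$(review_verdict < .reviews/latest-review.md)"`. Any verdict other than CERTIFIED, including an unparseable review, is treated as not certified so the loop fails closed.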
1760
1882
 
1761
1883
  **Tool-agnostic:** The value is adversarial diversity (different model, different blind spots), not the specific tool. Any competing AI reviewer works.
@@ -1811,7 +1933,7 @@ Debug it. Find root cause. Fix it properly. Tests ARE code.
1811
1933
 
1812
1934
  ## Flaky Test Prevention
1813
1935
 
1814
- **Flaky tests are bugs. Period.** They erode trust in the test suite, slow down teams, and mask real regressions.
1936
+ **Flaky tests are bugs. Period.** They erode trust in the test suite, slow down teams, and mask real regressions. For a deep dive, see: [How do you Address and Prevent Flaky Tests?](https://softwareautomation.notion.site/How-do-you-Address-and-Prevent-Flaky-Tests-23c539e19b3c46eeb655642b95237dc0)
1815
1937
 
1816
1938
  ### Principles
1817
1939
 
@@ -1839,7 +1961,9 @@ Sometimes the flakiness is genuinely in CI infrastructure (runner environment, G
1839
1961
  - **Keep quality gates strict** — the actual pass/fail decision must NOT have `continue-on-error`
1840
1962
  - **Separate "fail the build" from "nice to have"** — a missing PR comment is not a regression
1841
1963
 
1842
- ## CI Feedback Loop (After Commit)
1964
+ ## CI Feedback Loop — Local Shepherd (After Commit)
1965
+
1966
+ **This is the "local shepherd" — the primary CI fix mechanism.** It runs in your active session with full context. The optional CI Auto-Fix bot (`.github/workflows/ci-autofix.yml`) is a fallback for unattended PRs only. When both are active, the bot detects your local pushes via SHA comparison and skips automatically.
1843
1967
 
1844
1968
  **The SDLC doesn't end at local tests.** CI must pass too.
1845
1969
 
@@ -1885,7 +2009,7 @@ Local tests pass -> Commit -> Push -> Watch CI
1885
2009
  - Flaky? Investigate - flakiness is a bug
1886
2010
  - Stuck? ASK USER
1887
2011
 
1888
- ## CI Review Feedback Loop (After CI Passes)
2012
+ ## CI Review Feedback Loop — Local Shepherd (After CI Passes)
1889
2013
 
1890
2014
  **CI passing isn't the end.** If CI includes a code reviewer, read and address its suggestions.
1891
2015
 
@@ -1917,6 +2041,25 @@ CI passes -> Read review suggestions
1917
2041
  - **Ask first**: Present suggestions to user, let them decide which to implement
1918
2042
  - **Skip review feedback**: Ignore CI review suggestions, only fix CI failures
1919
2043
 
2044
+ ## Shepherd vs. Bot: Two-Tier CI Fix Model
2045
+
2046
+ | Aspect | Local Shepherd | CI Auto-Fix Bot |
2047
+ |--------|---------------|-----------------|
2048
+ | **When** | Active session (you're working) | Unattended (pushed and walked away) |
2049
+ | **Context** | Full: codebase, conversation, intent | Minimal: `--bare`, 200-line truncated logs |
2050
+ | **Cost** | Session tokens (marginal cost ~$0) | Separate API calls ($0.50-$2.00 per fix) |
2051
+ | **Noise** | 0 extra commits | 1+ `[autofix N/M]` commits per attempt |
2052
+ | **Quality** | High: full diagnosis, targeted fix | Lower: stateless, may repeat same approach |
2053
+ | **Speed** | Immediate: fix locally, push once | Delayed: workflow_run trigger + runner queue |
2054
+ | **Deconfliction** | N/A (is the primary) | SHA check: skips if branch advanced since failure |
2055
+
2056
+ **The shepherd is the default.** It runs as part of the SDLC checklist above whenever you push from an active session. The bot is optional and only adds value for:
2057
+ - Dependabot/Renovate PRs (no human session)
2058
+ - PRs where you push and walk away
2059
+ - Overnight CI runs
2060
+
2061
+ If you set up the bot, the SHA-based suppression ensures they never conflict.
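
The SHA-based suppression can be pictured as a single comparison. The sketch below is an assumption about the idea, not a copy of what `.github/workflows/ci-autofix.yml` actually does: `autofix_decision` and its arguments are hypothetical names.

```bash
#!/bin/bash
# Hypothetical helper (not the actual workflow logic): decide whether the
# auto-fix bot should act on a failed CI run.
autofix_decision() {
  local failed_sha="$1"  # commit the failing CI run was triggered for
  local head_sha="$2"    # current tip of the PR branch

  if [ "$failed_sha" != "$head_sha" ]; then
    echo "skip"  # branch advanced since the failure: someone already pushed
  else
    echo "run"   # nobody has pushed since the failure: bot may attempt a fix
  fi
}
```

In a workflow step, the second argument could come from something like `git rev-parse origin/<branch>` after a fetch, with the first taken from the failed run's metadata.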
2062
+
1920
2063
  ## DRY Principle
1921
2064
 
1922
2065
  **Before coding:** "What patterns exist I can reuse?"
@@ -1937,111 +2080,6 @@ CI passes -> Read review suggestions
1937
2080
 
1938
2081
  ---
1939
2082
 
1940
- ## Step 7: Create Testing Skill
1941
-
1942
- Create `.claude/skills/testing/SKILL.md`:
1943
-
1944
- ````markdown
1945
- ---
1946
- name: testing
1947
- description: TDD and testing philosophy for writing tests, test-driven development, integration tests, and unit tests. Use this skill when writing tests, doing TDD, or debugging test issues.
1948
- argument-hint: [test type] [target]
1949
- ---
1950
- # Testing Skill - TDD & Testing Philosophy
1951
-
1952
- ## Task
1953
- $ARGUMENTS
1954
-
1955
- ## Testing Diamond (CRITICAL)
1956
-
1957
- ```
1958
- /\ ← Few E2E (automated or manual sign-off at end)
1959
- / \
1960
- / \
1961
- /------\
1962
- | | ← MANY Integration (real DB, real cache - BEST BANG FOR BUCK)
1963
- | |
1964
- \------/
1965
- \ /
1966
- \ /
1967
- \/ ← Few Unit (pure logic only)
1968
- ```
1969
-
1970
- **Why Integration Tests are Best Bang for Buck:**
1971
- - **Speed**: Fast enough to run on every change
1972
- - **Stability**: Touch real code, not mocks that lie
1973
- - **Confidence**: If they pass, production usually works
1974
- - **Real bugs**: Integration tests with real DB catch real bugs
1975
- - Unit tests with mocks can "pass" while production fails
1976
-
1977
- ## Minimal Mocking Philosophy
1978
-
1979
- | What | Mock? | Why |
1980
- |------|-------|-----|
1981
- | Database | ❌ NEVER | Use test DB or in-memory |
1982
- | Cache | ❌ NEVER | Use isolated test instance |
1983
- | External APIs | ✅ YES | Real calls = flaky + expensive |
1984
- | Time/Date | ✅ YES | Determinism |
1985
-
1986
- **Mocks MUST come from REAL captured data:**
1987
- - Capture real API response
1988
- - Save to your fixtures directory (Claude will discover where yours is, e.g., `tests/fixtures/`, `test-data/`, etc.)
1989
- - Import in tests
1990
- - Never guess mock shapes!
1991
-
1992
- ## TDD Tests Must PROVE
1993
-
1994
- | Phase | What It Proves |
1995
- |-------|----------------|
1996
- | RED | Test FAILS → Bug exists or feature missing |
1997
- | GREEN | Test PASSES → Fix works or feature implemented |
1998
- | Forever | Regression protection |
1999
-
2000
- **WRONG approach:**
2001
- ```
2002
- // ❌ Writing test that passes with current (buggy) code
2003
- assert currentBuggyBehavior == currentBuggyBehavior // pseudocode
2004
- ```
2005
-
2006
- **CORRECT approach:**
2007
- ```
2008
- // ✅ Writing test that FAILS with buggy code, PASSES with fix
2009
- assert result.status == 'success' // pseudocode - adapt to your framework
2010
- assert result.data != null
2011
- ```
2012
-
2013
- ## Unit Tests = Pure Logic ONLY
2014
-
2015
- A function qualifies for unit testing ONLY if:
2016
- - ✅ No database calls
2017
- - ✅ No external API calls
2018
- - ✅ No file system access
2019
- - ✅ No cache calls
2020
- - ✅ Input → Output transformation only
2021
-
2022
- Everything else needs integration tests.
2023
-
2024
- ## When Stuck on Tests
2025
-
2026
- 1. Add console.logs → Check output
2027
- 2. Run single test in isolation
2028
- 3. Check fixtures match real API
2029
- 4. **STILL stuck?** ASK USER
2030
-
2031
- ## After Session (Capture Learnings)
2032
-
2033
- If this session revealed testing insights, update the right place:
2034
- - **Testing patterns, gotchas** → `TESTING.md`
2035
- - **Feature-specific test quirks** → Feature docs (`*_PLAN.md`)
2036
- - **General project context** → `CLAUDE.md` (or `/revise-claude-md`)
2037
-
2038
- ---
2039
-
2040
- **Full reference:** TESTING.md
2041
- ````
2042
-
2043
- ---
2044
-
2045
2083
  ### Visual Regression Testing (Experimental - Niche Use Cases Only)
2046
2084
 
2047
2085
  **Most apps don't need this.** Standard E2E testing (Playwright, Cypress) covers 99% of UI testing needs.
@@ -2230,9 +2268,27 @@ These are your full reference docs. Start with stubs and expand over time:
2230
2268
 
2231
2269
  **Claude follows this automatically.** When task involves "deploy to prod" and confidence is LOW, Claude will ask before proceeding.
2232
2270
 
2271
+ ## Post-Deploy Verification
2272
+
2273
+ **After deploying to ANY environment, verify it's working:**
2274
+
2275
+ | Environment | Health Check | Log Command | Smoke Test |
2276
+ |-------------|-------------|-------------|------------|
2277
+ | Local Dev | `curl http://localhost:3000/health` | `[your dev log command]` | `npm run test:smoke` |
2278
+ | Staging | `curl https://staging.example.com/health` | `[your staging log command]` | `[your staging smoke test]` |
2279
+ | Production | `curl https://example.com/health` | `[your prod log command, e.g., kubectl logs]` | `[your prod smoke test]` |
2280
+
2281
+ **Monitoring after production deploy:**
2282
+ 1. Watch error rates for 15 minutes (dashboard: `[your monitoring URL]`)
2283
+ 2. Check application logs for new errors: `[your log command]`
2284
+ 3. Run smoke tests against production: `[your smoke test command]`
2285
+ 4. If issues found → roll back first, THEN start a new SDLC loop to fix
2286
+
2287
+ **Claude follows this automatically.** After a deploy task, Claude runs through the Post-Deploy Verification table for the target environment. If any check fails, Claude suggests rollback and a new fix cycle.
2288
+
2233
2289
  ## Rollback
2234
2290
 
2235
- If deployment fails or causes issues:
2291
+ If deployment fails or post-deploy verification catches issues:
2236
2292
 
2237
2293
  | Environment | Rollback Command | Notes |
2238
2294
  |-------------|------------------|-------|
@@ -2266,7 +2322,7 @@ If deployment fails or causes issues:
2266
2322
 
2267
2323
  **SDLC.md:**
2268
2324
  ```markdown
2269
- <!-- SDLC Wizard Version: 1.15.0 -->
2325
+ <!-- SDLC Wizard Version: 1.20.0 -->
2270
2326
  <!-- Setup Date: [DATE] -->
2271
2327
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
2272
2328
  <!-- Git Workflow: [PRs or Solo] -->
@@ -2298,7 +2354,7 @@ See `.claude/skills/sdlc/SKILL.md` for the enforced checklist.
2298
2354
  ```markdown
2299
2355
  # Testing Guidelines
2300
2356
 
2301
- See `.claude/skills/testing/SKILL.md` for TDD philosophy.
2357
+ See `TESTING.md` for TDD philosophy.
2302
2358
 
2303
2359
  ## Test Commands
2304
2360
 
@@ -2394,7 +2450,6 @@ Verification Results:
2394
2450
  ├── .claude/hooks/tdd-pretool-check.sh ✓ executable
2395
2451
  ├── .claude/settings.json ✓ valid JSON
2396
2452
  ├── .claude/skills/sdlc/SKILL.md ✓ frontmatter OK
2397
- ├── .claude/skills/testing/SKILL.md ✓ frontmatter OK
2398
2453
  ├── CLAUDE.md ✓ exists
2399
2454
  ├── SDLC.md ✓ exists
2400
2455
  └── TESTING.md ✓ exists
@@ -2422,14 +2477,14 @@ All checks passed! Setup complete.
2422
2477
  |------|-----------------|
2423
2478
  | "What files handle auth?" | Answers without invoking skills |
2424
2479
  | "Add a logout button" | Auto-invokes sdlc skill, uses TodoWrite |
2425
- | "Write tests for login" | Auto-invokes testing skill |
2480
+ | "Write tests for login" | Auto-invokes sdlc skill |
2426
2481
 
2427
2482
  **What happens automatically:**
2428
2483
 
2429
2484
  | You Do | System Does |
2430
2485
  |--------|-------------|
2431
2486
  | Ask to implement something | SDLC skill auto-invokes, TodoWrite starts |
2432
- | Ask to write tests | Testing skill auto-invokes |
2487
+ | Ask to write tests | SDLC skill auto-invokes |
2433
2488
  | Claude tries to edit code | TDD reminder fires |
2434
2489
  | Task completes | Compliance check runs |
2435
2490
 
@@ -2494,7 +2549,7 @@ All checks passed! Setup complete.
2494
2549
  |--------|---------|
2495
2550
  | Free context after planning | `/compact` |
2496
2551
  | Enter planning mode | Claude suggests or `/plan` |
2497
- | Run specific skill | `/sdlc` or `/testing` |
2552
+ | Run specific skill | `/sdlc` |
2498
2553
 
2499
2554
  ---
2500
2555
 
@@ -2525,7 +2580,7 @@ You've successfully set up the system when:
2525
2580
 
2526
2581
  - [ ] Light hook fires every prompt (you see SDLC BASELINE in responses)
2527
2582
  - [ ] Claude auto-invokes sdlc skill for implementation tasks
2528
- - [ ] Claude auto-invokes testing skill for test tasks
2583
+ - [ ] Claude auto-invokes sdlc skill for test and TDD tasks
2529
2584
  - [ ] Claude uses TodoWrite to track progress
2530
2585
  - [ ] Claude states confidence levels
2531
2586
  - [ ] Claude asks for clarification when LOW confidence
@@ -2578,9 +2633,17 @@ Want me to file these? (yes/no/not now)
2578
2633
 
2579
2634
  ## Going Further
2580
2635
 
2581
- ### Create Feature Plan Docs
2636
+ ### Feature Documentation
2637
+
2638
+ Keep feature docs alongside code. Three patterns, use what fits:
2639
+
2640
+ | Pattern | When to Use | Example |
2641
+ |---------|-------------|---------|
2642
+ | `*_PLAN.md` / `*_DOCS.md` | Per-feature living docs | `AUTH_DOCS.md`, `PAYMENTS_PLAN.md` |
2643
+ | `docs/decisions/NNN-title.md` (ADR) | Architecture decisions that need rationale | `docs/decisions/001-use-postgres.md` |
2644
+ | `docs/features/name.md` | Feature docs in a `docs/` directory | `docs/features/auth.md` |
2582
2645
 
2583
- For each major feature, create `FEATURE_NAME_PLAN.md`:
2646
+ **Feature doc template:**
2584
2647
 
2585
2648
  ```markdown
2586
2649
  # Feature Name
@@ -2598,7 +2661,36 @@ Things that can trip you up.
2598
2661
  What's planned but not done.
2599
2662
  ```
2600
2663
 
2601
- Claude will read these during planning and update them with discoveries.
2664
+ **ADR (Architecture Decision Record) template** for decisions that need context:
2665
+
2666
+ ```markdown
2667
+ # ADR-NNN: Decision Title
2668
+
2669
+ ## Status
2670
+ Accepted | Superseded by ADR-NNN | Deprecated
2671
+
2672
+ ## Context
2673
+ What is the problem? What forces are at play?
2674
+
2675
+ ## Decision
2676
+ What did we decide and why?
2677
+
2678
+ ## Consequences
2679
+ What are the trade-offs? What becomes easier/harder?
2680
+ ```
2681
+
2682
+ Store ADRs in `docs/decisions/`. Number sequentially. Claude reads these during planning to understand why things are built the way they are.
2683
+
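Sequential numbering is easy to get wrong by hand. A minimal sketch of picking the next ADR file name, assuming three-digit zero-padded numbers and lowercase hyphenated titles as in `001-use-postgres.md` (the function name is hypothetical):

```python
import re
from typing import List

# Matches names such as "001-use-postgres.md"; other files are ignored.
ADR_NAME = re.compile(r"^(\d+)-.+\.md$")


def next_adr_filename(existing: List[str], title: str) -> str:
    """Build the next sequential ADR file name from the names already present."""
    numbers = []
    for name in existing:
        match = ADR_NAME.match(name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    # Turn the title into a lowercase, hyphen-separated slug.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{next_number:03d}-{slug}.md"


print(next_adr_filename(["001-use-postgres.md", "README.md"], "Adopt OpenTelemetry"))
```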
2684
+ **Keeping docs in sync with code:**
2685
+
2686
+ Docs drift when code changes but docs don't. The SDLC skill's planning phase detects this:
2687
+
2688
+ - During planning, Claude reads feature docs for the area being changed
2689
+ - If the code change contradicts what the doc says, Claude updates the doc
2690
+ - The "After Session" step routes learnings to the right doc
2691
+ - Stale docs cause low confidence — if Claude struggles, the doc may need updating
2692
+
2693
+ **CLAUDE.md health:** Run `/claude-md-improver` periodically (quarterly or after major changes). It audits CLAUDE.md specifically — structure, clarity, completeness (6 criteria, 100-point rubric). It does NOT cover feature docs, TESTING.md, or ADRs — the SDLC workflow handles those.
2602
2694
 
2603
2695
  ### Expand TESTING.md
2604
2696
 
@@ -2622,6 +2714,10 @@ Add project-specific guidance to skills:
2622
2714
  - Preferred patterns
2623
2715
  - Architecture decisions
2624
2716
 
2717
+ ### Complementary Tools
2718
+
2719
+ The wizard handles SDLC process enforcement. For stack-specific tooling, run `/claude-automation-recommender` — it suggests MCP servers, formatting hooks, type-checking hooks, subagent templates, and plugins based on your detected tech stack. See [Step 0.3](#step-03-additional-recommendations-optional) for the full comparison.
2720
+
2625
2721
  ---
2626
2722
 
2627
2723
  ## Testing AI Apps: What's Different
@@ -2689,6 +2785,49 @@ _Sources: [Confident AI](https://www.confident-ai.com/blog/llm-testing-in-2024-t
2689
2785
 
2690
2786
  ---
2691
2787
 
2788
+ ## Token Efficiency
2789
+
2790
+ Practical techniques to reduce token consumption without sacrificing quality.
2791
+
2792
+ ### Monitor Costs
2793
+
2794
+ | Tool | What It Shows | When to Use |
2795
+ |------|---------------|-------------|
2796
+ | `/cost` | Session total: USD, API time, code changes | After a session to review spend |
2797
+ | `/context` | What's consuming context window space | When hitting context limits |
2798
+ | Status line | Real-time `cost.total_cost_usd` + token counts | Continuous monitoring |
2799
+
2800
+ ### Reduce Consumption
2801
+
2802
+ | Technique | Savings | How |
2803
+ |-----------|---------|-----|
2804
+ | `/compact` between phases | ~40-60% context | Plan → compact → implement (plan preserved) |
2805
+ | `/clear` between tasks | 100% context reset | No stale context from prior work |
2806
+ | Delegate verbose ops to subagents | Separate context | `Agent` tool returns summary, not full output |
2807
+ | Use skills for on-demand knowledge | Smaller base context | Skills load only when invoked |
2808
+ | Scope investigations narrowly | Fewer tokens read | "investigate auth module" > "investigate codebase" |
2809
+ | `--effort low` for simple tasks | ~50% thinking tokens | Simple renames, config changes |
2810
+
2811
+ ### CI Cost Control
2812
+
2813
+ Add `--max-budget-usd` to CI workflows as a safety net:
2814
+
2815
+ ```yaml
2816
+ claude_args: "--max-budget-usd 5.00 --max-turns 30"
2817
+ ```
2818
+
2819
+ | Flag | Purpose |
2820
+ |------|---------|
2821
+ | `--max-budget-usd` | Hard dollar cap per CI invocation |
2822
+ | `--max-turns` | Limit agentic turns (prevents infinite loops) |
2823
+ | `--effort` | `low`/`medium`/`high` controls thinking depth |
2824
+
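If several workflows share these limits, the `claude_args` string can be generated rather than copied. This is a sketch only: the helper name and its validation rules are assumptions, and it uses only the three flags listed in the table above.

```python
from typing import Optional

ALLOWED_EFFORT = ("low", "medium", "high")


def build_claude_args(max_budget_usd: float, max_turns: int,
                      effort: Optional[str] = None) -> str:
    """Compose a claude_args string from the flags in the table above."""
    if max_budget_usd <= 0:
        raise ValueError("max_budget_usd must be positive")
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    parts = [f"--max-budget-usd {max_budget_usd:.2f}", f"--max-turns {max_turns}"]
    if effort is not None:
        if effort not in ALLOWED_EFFORT:
            raise ValueError(f"effort must be one of {ALLOWED_EFFORT}")
        parts.append(f"--effort {effort}")
    return " ".join(parts)


print(build_claude_args(5, 30))
```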
2825
+ ### Advanced: OpenTelemetry
2826
+
2827
+ For organization-wide cost tracking, enable `CLAUDE_CODE_ENABLE_TELEMETRY=1`. This exports per-request `cost_usd`, `input_tokens`, `output_tokens` to any OTLP-compatible backend (Datadog, Honeycomb, Prometheus).
2828
+
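Once those per-request fields reach a backend, rolling them up is simple. The sketch below assumes the records have already been pulled out as plain dictionaries; the `team` grouping key is hypothetical and not something the telemetry export is known to provide.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize_costs(records: Iterable[dict], group_by: str = "team") -> Dict[str, dict]:
    """Sum cost_usd, input_tokens and output_tokens per group."""
    totals = defaultdict(lambda: {"cost_usd": 0.0, "input_tokens": 0, "output_tokens": 0})
    for record in records:
        # Records without the (hypothetical) grouping key land in "unknown".
        bucket = totals[record.get(group_by, "unknown")]
        bucket["cost_usd"] += record["cost_usd"]
        bucket["input_tokens"] += record["input_tokens"]
        bucket["output_tokens"] += record["output_tokens"]
    return dict(totals)


sample = [
    {"team": "platform", "cost_usd": 0.25, "input_tokens": 1000, "output_tokens": 200},
    {"team": "platform", "cost_usd": 0.5, "input_tokens": 3000, "output_tokens": 400},
    {"cost_usd": 1.0, "input_tokens": 500, "output_tokens": 100},
]
print(summarize_costs(sample))
```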
2829
+ ---
2830
+
2692
2831
  ## CI/CD Gotchas
2693
2832
 
2694
2833
  Common pitfalls when automating AI-assisted development workflows.
@@ -2750,7 +2889,9 @@ Claude: [fetches via gh api, discusses with you interactively]
2750
2889
 
2751
2890
  This is optional - skip if you prefer fresh reviews only.
2752
2891
 
2753
- ### CI Auto-Fix Loop (Optional)
2892
+ ### CI Auto-Fix Loop (Optional — Bot Fallback)
2893
+
2894
+ > **Two-tier model:** The SDLC skill's CI loops (above) are the "local shepherd" — they handle CI fixes during active sessions. This bot is the second tier: an unattended fallback for when no one is watching. The bot includes SHA-based suppression — if you push a fix locally before the bot runs, it skips automatically.
2754
2895
 
2755
2896
  Automatically fix CI failures and PR review findings. Claude reads the error context, fixes the code, commits, and re-triggers CI. It loops until CI passes AND the review has no findings at your chosen level, or until max retries are hit.
2756
2897
 
@@ -2852,13 +2993,14 @@ Use an independent AI model from a different company as a code reviewer. The aut
2852
2993
  {
2853
2994
  "review_id": "feature-xyz-001",
2854
2995
  "status": "PENDING_REVIEW",
2996
+ "round": 1,
2855
2997
  "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
2856
2998
  "review_instructions": "Review for security, edge cases, and correctness",
2857
2999
  "artifact_path": ".reviews/feature-xyz-001/"
2858
3000
  }
2859
3001
  ```
2860
3002
 
2861
- 3. Run the independent reviewer:
3003
+ 3. Run the independent reviewer (Round 1 — full review). These commands use your Codex default model — configure it to the latest, most capable model available:
2862
3004
 
2863
3005
  ```bash
2864
3006
  codex exec \
@@ -2866,23 +3008,95 @@ codex exec \
2866
3008
  -s danger-full-access \
2867
3009
  -o .reviews/latest-review.md \
2868
3010
  "You are an independent code reviewer. Read .reviews/handoff.json, \
2869
- review the listed files, and write your findings to the artifact_path. \
3011
+ review the listed files. Output each finding with: an ID (1, 2, ...), \
3012
+ severity (P0/P1/P2), description, and a 'certify condition' stating \
3013
+ what specific change would resolve it. \
2870
3014
  End with CERTIFIED or NOT CERTIFIED."
2871
3015
  ```
2872
3016
 
2873
- **The Loop:**
3017
+ 4. If CERTIFIED → done. If NOT CERTIFIED → enter the dialogue loop.
3018
+
3019
+ **The Dialogue Loop (Round 2+):**
3020
+
3021
+ Instead of silently fixing everything and resubmitting for another full review, respond to each finding:
3022
+
3023
+ ```jsonc
3024
+ // .reviews/response.json
3025
+ {
3026
+ "review_id": "feature-xyz-001",
3027
+ "round": 2,
3028
+ "responding_to": ".reviews/latest-review.md",
3029
+ "responses": [
3030
+ {
3031
+ "finding": "1",
3032
+ "action": "FIXED",
3033
+ "summary": "Added missing mocking table to SKILL.md",
3034
+ "evidence": "git diff shows table at SKILL.md:195-210"
3035
+ },
3036
+ {
3037
+ "finding": "2",
3038
+ "action": "DISPUTED",
3039
+ "justification": "The upgrade path cleanup runs in init.js:205. Verified with test-cli.sh test 29.",
3040
+ "evidence": "tests/test-cli.sh:583-600"
3041
+ },
3042
+ {
3043
+ "finding": "3",
3044
+ "action": "ACCEPTED",
3045
+ "summary": "Will add EVAL_PROMPT_VERSION bump"
3046
+ }
3047
+ ]
3048
+ }
2874
3049
  ```
2875
- Claude writes code → self-review passes → handoff.json
3050
+
3051
+ Three response types:
3052
+ - **FIXED**: "I fixed this. Here is what changed." Reviewer verifies the fix.
3053
+ - **DISPUTED**: "This is intentional/incorrect. Here is why." Reviewer accepts or rejects the reasoning.
3054
+ - **ACCEPTED**: "You are right. Fixing now." (Same outcome as FIXED, used when batching fixes.)
3055
+
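A malformed `response.json` wastes a recheck round, so it can help to check the file before handing it to the reviewer. The rules below are an assumption inferred from the example above (DISPUTED needs a `justification`, FIXED and ACCEPTED need a `summary`); the wizard itself may not enforce them.

```python
from typing import List

ALLOWED_ACTIONS = {"FIXED", "DISPUTED", "ACCEPTED"}


def validate_response(doc: dict) -> List[str]:
    """Return a list of problems found in a response.json document."""
    errors = []
    round_number = doc.get("round")
    if not isinstance(round_number, int) or round_number < 2:
        errors.append("round must be an integer >= 2")
    for index, item in enumerate(doc.get("responses", [])):
        label = f"responses[{index}]"
        if not item.get("finding"):
            errors.append(f"{label}: missing finding ID")
        action = item.get("action")
        if action not in ALLOWED_ACTIONS:
            errors.append(f"{label}: unknown action {action!r}")
        elif action == "DISPUTED" and not item.get("justification"):
            errors.append(f"{label}: DISPUTED needs a justification")
        elif action in ("FIXED", "ACCEPTED") and not item.get("summary"):
            errors.append(f"{label}: {action} needs a summary")
    return errors


good = {
    "review_id": "feature-xyz-001",
    "round": 2,
    "responses": [
        {"finding": "1", "action": "FIXED", "summary": "Added missing mocking table"},
        {"finding": "2", "action": "DISPUTED", "justification": "Cleanup runs in init.js"},
    ],
}
print(validate_response(good))
```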
3056
+ Then update `handoff.json`: set `"status"` to `"PENDING_RECHECK"`, increment `round`, and add the `"response_path"` and `"previous_review"` fields. Run a targeted recheck:
3057
+
3058
+ ```bash
3059
+ codex exec \
3060
+ -c 'model_reasoning_effort="xhigh"' \
3061
+ -s danger-full-access \
3062
+ -o .reviews/latest-review.md \
3063
+ "You are doing a TARGETED RECHECK. First read .reviews/handoff.json \
3064
+ to find the previous_review path — read that file for the original \
3065
+ findings and certify conditions. Then read .reviews/response.json \
3066
+ for the author's responses. For each: \
3067
+ FIXED → verify the fix against the original certify condition. \
3068
+ DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
3069
+ ACCEPTED → verify it was applied. \
3070
+ Do NOT raise new findings unless P0 (critical/security). \
3071
+ New observations go in 'Notes for next review' (non-blocking). \
3072
+ End with CERTIFIED or NOT CERTIFIED."
3073
+ ```
3074
+
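The `handoff.json` update that precedes each recheck can be sketched as a pure function. The function name is hypothetical and the paths in the usage line are taken from the examples above. Only the four fields named in the text are touched.

```python
import copy


def prepare_recheck(handoff: dict, response_path: str, previous_review: str) -> dict:
    """Return a copy of the handoff document updated for a targeted recheck."""
    updated = copy.deepcopy(handoff)
    updated["status"] = "PENDING_RECHECK"
    # Assumption: a handoff without a round field is treated as round 1.
    updated["round"] = handoff.get("round", 1) + 1
    updated["response_path"] = response_path
    updated["previous_review"] = previous_review
    return updated


handoff = {"review_id": "feature-xyz-001", "status": "PENDING_REVIEW", "round": 1}
print(prepare_recheck(handoff, ".reviews/response.json", ".reviews/latest-review.md"))
```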
3075
+ **The key constraint:** Rechecks are scoped to previous findings only. The reviewer cannot block certification with new P1 or P2 observations discovered during recheck; only a new P0 (critical/security) finding can. This prevents scope creep and helps the loop converge.
3076
+
3077
+ **Convergence:** Max 3 recheck rounds (4 total including initial review). If still NOT CERTIFIED after round 4, escalate to the user with a summary of all open findings. Don't spin indefinitely.
3078
+
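The convergence rule reduces to a small decision function. A minimal sketch, with hypothetical names, where round 1 is the full review and rounds 2 to 4 are rechecks:

```python
MAX_RECHECKS = 3  # rounds 2, 3 and 4; round 1 is the full review


def next_step(round_number: int, certified: bool) -> str:
    """Decide what happens after the reviewer's verdict for a given round."""
    if certified:
        return "done"
    if round_number >= 1 + MAX_RECHECKS:
        # Round 4 was the last allowed recheck: hand open findings to the user.
        return "escalate"
    return "recheck"


for round_number in range(1, 5):
    print(round_number, next_step(round_number, certified=False))
```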
3079
+ ```
3080
+ Claude writes code → self-review passes → handoff.json (round 1)
2876
3081
  ↑ |
2877
3082
  | v
2878
- | Codex reviews (xhigh reasoning)
3083
+ | Reviewer: FULL REVIEW
3084
+ | (structured findings with IDs)
2879
3085
  | |
2880
3086
  | CERTIFIED? -+→ YES → Done
2881
3087
  | |
2882
3088
  | +→ NO (findings)
2883
3089
  | |
2884
- └────────── Claude fixes findings ←────────┘
2885
- (repeat until CERTIFIED, or ask user)
3090
+ | Claude writes response.json:
3091
+ | FIXED / DISPUTED / ACCEPTED
3092
+ | |
3093
+ | Reviewer: TARGETED RECHECK
3094
+ | (previous findings only, no new P1/P2)
3095
+ | |
3096
+ | All resolved? → YES → CERTIFIED
3097
+ | |
3098
+ └────────── Fix rejected items ←───────────┘
3099
+ (max 3 rechecks, then escalate to user)
2886
3100
  ```
2887
3101
 
2888
3102
  **Key flags:**
@@ -2952,10 +3166,13 @@ If Claude repeatedly struggles in a codebase area:
2952
3166
 
2953
3167
  ### How to Update
2954
3168
 
2955
- Ask Claude any of these:
3169
+ Use the `/update-wizard` skill for a guided, selective update experience:
3170
+ > `/update-wizard` — full guided update (shows changelog, per-file diff, selective adoption)
3171
+ > `/update-wizard check-only` — just show what changed, don't apply anything
3172
+ > `/update-wizard force-all` — apply all updates without per-file approval
3173
+
3174
+ Or ask Claude directly:
2956
3175
  > "Check for SDLC wizard updates"
2957
- > "Run me through the SDLC wizard"
2958
- > "What am I missing from the latest wizard?"
2959
3176
  > "Update my SDLC setup"
2960
3177
 
2961
3178
  **All of these do the same thing:** Claude checks what's new, shows you, and walks you through only what's missing.
@@ -3004,21 +3221,19 @@ Claude reads the CHANGELOG to show you what's new **before** applying anything.
3004
3221
  ```
3005
3222
  Claude: "Fetching CHANGELOG to check for updates..."
3006
3223
 
3007
- Your version: 1.8.0
3008
- Latest version: 1.13.0
3224
+ Your version: X.Y.0
3225
+ Latest version: X.Z.0
3009
3226
 
3010
- What's new since 1.8.0:
3011
- - v1.13.0: Self-update improvements, optional CI notification
3012
- - v1.12.0: Full system audit, apply step fixes
3013
- - v1.11.0: Stale output cleanup, error handling
3014
- - v1.10.0: "Prove It's Better" CI automation
3015
- - v1.9.0: Workflow consolidation (6 → 5 workflows)
3227
+ What's new since X.Y.0:
3228
+ - vX.Z.0: Latest features and improvements
3229
+ - vX.Y+1.0: Previous version changes
3230
+ (... entries from CHANGELOG between your version and latest ...)
3016
3231
 
3017
3232
  Now checking your setup against latest wizard...
3018
3233
 
3019
3234
  ✓ Hooks - up to date
3020
3235
  ✓ Skills - content differs (update available)
3021
- ✗ step-update-notify - NOT DONE (new in v1.13.0, optional)
3236
+ ✗ step-update-notify - NOT DONE (new in vX.Z.0, optional)
3022
3237
 
3023
3238
  Summary:
3024
3239
  - 1 file update available (SDLC skill)
@@ -3034,7 +3249,7 @@ Walk through updates? (y/n)
3034
3249
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
3035
3250
 
3036
3251
  ```markdown
3037
- <!-- SDLC Wizard Version: 1.15.0 -->
3252
+ <!-- SDLC Wizard Version: 1.20.0 -->
3038
3253
  <!-- Setup Date: 2026-01-24 -->
3039
3254
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
3040
3255
  <!-- Git Workflow: PRs -->
@@ -3067,12 +3282,12 @@ Every wizard step has a unique ID for tracking:
3067
3282
  | `step-4` | Light hook | 1.0.0 |
3068
3283
  | `step-5` | TDD hook | 1.0.0 |
3069
3284
  | `step-6` | SDLC skill | 1.0.0 |
3070
- | `step-7` | Testing skill | 1.0.0 |
3071
3285
  | `step-8` | CLAUDE.md | 1.0.0 |
3072
3286
  | `step-9` | SDLC/TESTING/ARCH docs | 1.0.0 |
3073
3287
  | `question-git-workflow` | Git workflow preference | 1.2.0 |
3074
3288
  | `step-update-notify` | Optional: CI update notification | 1.13.0 |
3075
3289
  | `step-cross-model-review` | Optional: Cross-model review loop | 1.16.0 |
3290
+ | `step-update-wizard` | /update-wizard smart update skill | 1.18.0 |
3076
3291
 
3077
3292
  When checking for updates, Claude compares user's completed steps against this registry.
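
The comparison can be pictured as follows. This is a sketch, not the wizard's actual logic: the function names are hypothetical, the metadata format is the HTML-comment style shown earlier for `SDLC.md`, and the registry is a small excerpt of the table above.

```python
import re
from typing import Dict, List

# Matches metadata comments such as "<!-- SDLC Wizard Version: 1.20.0 -->".
META_RE = re.compile(r"<!--\s*([^:]+?):\s*(.*?)\s*-->")


def parse_wizard_metadata(text: str) -> Dict[str, str]:
    """Extract key/value pairs from the metadata comments in SDLC.md."""
    return {key.strip(): value for key, value in META_RE.findall(text)}


def missing_steps(metadata: Dict[str, str], registry: Dict[str, str]) -> List[str]:
    """List registry step IDs that are absent from the user's completed steps."""
    raw = metadata.get("Completed Steps", "")
    completed = {step.strip() for step in raw.split(",") if step.strip()}
    return [step for step in registry if step not in completed]


sdlc_md = """<!-- SDLC Wizard Version: 1.15.0 -->
<!-- Completed Steps: step-4, step-5, step-6 -->
"""
# Excerpt of the registry table: step ID -> version that introduced it.
registry = {
    "step-4": "1.0.0",
    "step-5": "1.0.0",
    "step-6": "1.0.0",
    "step-update-notify": "1.13.0",
    "step-update-wizard": "1.18.0",
}
print(missing_steps(parse_wizard_metadata(sdlc_md), registry))
```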
3078
3293