agentic-sdlc-wizard 1.15.0 → 1.18.0

package/CHANGELOG.md CHANGED
@@ -4,6 +4,32 @@ All notable changes to the SDLC Wizard.
  
  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
  
+ ## [1.18.0] - 2026-03-30
+
+ ### Added
+ - `/update-wizard` skill — guided update with changelog diff, per-file comparison, selective adoption
+ - `step-update-wizard` in wizard step registry
+ - CLI distributes `skills/update/SKILL.md` (now 8 managed files)
+ - `/update-wizard` reference in wizard "How to Update" section
+
+ ## [1.17.0] - 2026-03-30
+
+ ### Fixed
+ - Setup skill now force-reads entire wizard doc before proceeding (was just "Reference")
+ - README no longer tells users to manually invoke setup — hooks auto-invoke
+ - 3 new tests for setup auto-invoke behavior
+
+ ### Changed
+ - Testing consolidation: `/testing` skill merged into `/sdlc` (#28)
+
+ ## [1.16.0] - 2026-03-29
+
+ ### Added
+ - Cross-model review dialogue protocol — structured FIXED/DISPUTED/ACCEPTED negotiation loop (#40)
+ - P0/P1/P2 severity rubric in PR review prompt (#34)
+ - Effort level recommendations in wizard
+ - 5 enforcement gap fixes in TodoWrite checklist (#39)
+
  ## [1.15.0] - 2026-03-25
  
  ### Added
@@ -211,6 +211,35 @@ When Anthropic provides official plugins or tools that handle something:
  
  ---
  
+ ## Recommended Effort Level
+
+ Claude Code's **effort level** controls how much thinking the model does before responding. Higher effort = deeper reasoning but more tokens.
+
+ | Level | When to Use | How to Set |
+ |-------|-------------|------------|
+ | `high` | **Default for all SDLC work.** Features, bug fixes, refactoring, tests, reviews | `effort: high` in skill frontmatter (already set) |
+ | `max` | LOW confidence, FAILED 2x, architecture decisions, complex debugging, cross-model reviews | `/effort max` (session only — resets next session) |
+
+ **Why `high` is the default:** The `/sdlc` skill sets `effort: high` in its frontmatter, so every SDLC invocation automatically uses high effort. This gives thorough reasoning without the unbounded token cost of `max`.
+
+ **When to escalate to `max`:**
+ - You hit LOW confidence on your approach — deeper thinking may find clarity
+ - You've failed the same thing twice — something non-obvious is wrong
+ - Architecture decisions with wide blast radius
+ - Complex multi-system debugging where you need to hold many constraints
+ - Cross-model review analysis (reading and evaluating external reviewer findings)
+
+ **How it works:**
+ - `/effort max` changes effort for the current session only (resets next session)
+ - `effort: high` in SKILL.md frontmatter persists — every `/sdlc` invocation uses `high`
+ - You can also type `ultrathink` in any prompt for a single high-effort turn
+
+ **Cost note:** `max` uses significantly more tokens than `high`. Use it when the problem justifies it, not as a default.
+
+ > See also: the **Effort** column in the [Confidence Check table](#confidence-check-required) below for per-confidence-level guidance on when to escalate to `max`.
+
+ ---
+
  ## Claude Code Feature Updates
  
  > **Keep your SDLC current**: Claude Code evolves. This section documents features that enhance the SDLC workflow. Check [Claude Code releases](https://github.com/anthropics/claude-code/releases) periodically.
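The escalation guidance in the Recommended Effort Level section added by this hunk reduces to one rule: stay on `high` unless one of the listed triggers applies. A minimal Python sketch of that rule follows. The function name and the trigger labels are hypothetical, invented here for illustration; the wizard states the rule only as prose for Claude to follow.

```python
# Hypothetical sketch of the escalation guidance; not part of the wizard.
# The trigger labels are invented names for the bullets under
# "When to escalate to `max`".
ESCALATION_TRIGGERS = {
    "low_confidence",         # LOW confidence on the approach
    "failed_twice",           # the same thing failed twice
    "architecture_decision",  # wide blast radius
    "complex_debugging",      # multi-system debugging
    "cross_model_review",     # evaluating external reviewer findings
}


def recommended_effort(active_triggers):
    """Return "max" if any escalation trigger applies, else the "high" default."""
    unknown = set(active_triggers) - ESCALATION_TRIGGERS
    if unknown:
        raise ValueError(f"unknown trigger(s): {sorted(unknown)}")
    return "max" if active_triggers else "high"


print(recommended_effort(set()))             # no trigger applies
print(recommended_effort({"failed_twice"}))  # one trigger applies
```

Keeping `high` as the fall-through value mirrors the cost note: `max` is something to opt into per problem, not a default.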
@@ -250,7 +279,7 @@ $ARGUMENTS
  
  **Usage examples**:
  - `/sdlc fix the login validation bug` → `$ARGUMENTS` = "fix the login validation bug"
- - `/testing unit UserService` → `$ARGUMENTS` = "unit UserService"
+ - `/sdlc write tests for UserService` → `$ARGUMENTS` = "write tests for UserService"
  
  **Note**: Skills still auto-invoke via hooks. This is optional polish for manual invocation.
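The usage examples above imply that everything after the skill name becomes `$ARGUMENTS`. A small Python sketch of that split is shown here under that assumption. `split_skill_invocation` is a hypothetical helper, and how Claude Code performs the substitution internally is not shown in this diff.

```python
def split_skill_invocation(prompt):
    """Split "/skill rest of text" into (skill_name, arguments).

    Illustrative only: assumes the arguments are everything after the
    first space, as the usage examples suggest.
    """
    if not prompt.startswith("/"):
        raise ValueError("not a slash-command invocation")
    name, _, arguments = prompt[1:].partition(" ")
    return name, arguments.strip()


name, arguments = split_skill_invocation("/sdlc write tests for UserService")
print(name)       # sdlc
print(arguments)  # write tests for UserService
```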
 
@@ -276,7 +305,7 @@ New built-in commands available to use alongside the wizard:
  
  ### Skill Effort Frontmatter (v2.1.80+)
  
- Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc` and `/testing` skills use `effort: high` to ensure Claude gives full attention to SDLC tasks.
+ Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc` skill uses `effort: high` to ensure Claude gives full attention to SDLC tasks.
  
  ### InstructionsLoaded Hook (v2.1.69+)
  
@@ -1376,7 +1405,7 @@ Your answers map to these files:
  | Q4-Q8 (commands) | `CLAUDE.md` - Commands section |
  | Q9-Q10 (infra) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
  | Q11 (test duration) | `SDLC skill` - wait time note |
- | Q12 (E2E) | `testing skill` - testing diamond top |
+ | Q12 (E2E) | `TESTING.md` - testing diamond top |
  
  ---
  
@@ -1387,7 +1416,6 @@ Create these directories in your project root:
  ```bash
  mkdir -p .claude/hooks
  mkdir -p .claude/skills/sdlc
- mkdir -p .claude/skills/testing
  ```
  
  **Commit to Git:** Yes! These files should be committed so your whole team gets the same SDLC enforcement. When teammates pull, they get the hooks and skills automatically.
@@ -1506,9 +1534,8 @@ The `allowedTools` array is auto-generated based on your stack detected in Step
  The light hook outputs text that **instructs Claude** to invoke skills:
  
  ```
- AUTO-INVOKE SKILLS (Claude MUST do this FIRST):
- - implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
- - test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
+ AUTO-INVOKE SKILL (Claude MUST do this FIRST):
+ - implement/fix/refactor/feature/bug/build/test/TDD → Invoke: Skill tool, skill="sdlc"
  ```
  
  **This is text-based, not programmatic.** Claude reads this instruction and follows it. When Claude sees your message is an implementation task, it invokes the sdlc skill using the Skill tool. This loads the full SDLC guidance into context.
@@ -1530,7 +1557,7 @@ Create `.claude/hooks/sdlc-prompt-check.sh`:
  ```bash
  #!/bin/bash
  # Light SDLC hook - baseline reminder every prompt (~100 tokens)
- # Full guidance in skills: .claude/skills/sdlc/ and .claude/skills/testing/
+ # Full guidance in skill: .claude/skills/sdlc/
  
  cat << 'EOF'
  SDLC BASELINE:
@@ -1540,10 +1567,8 @@ SDLC BASELINE:
  4. FAILED 2x? STOP and ASK USER
  5. 🛑 ALL TESTS MUST PASS BEFORE COMMIT - NO EXCEPTIONS
  
- AUTO-INVOKE SKILLS (Claude MUST do this FIRST):
- - implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
- - test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
- - If BOTH match (e.g., "fix the test") → sdlc takes precedence (includes TDD)
+ AUTO-INVOKE SKILL (Claude MUST do this FIRST):
+ - implement/fix/refactor/feature/bug/build/test/TDD → Invoke: Skill tool, skill="sdlc"
  - DON'T invoke for: questions, explanations, reading/exploring code, simple queries
  - DON'T wait for user to type /sdlc - AUTO-INVOKE based on task type
  
@@ -1685,13 +1710,13 @@ TodoWrite([
  
  Before presenting approach, STATE your confidence:
  
- | Level | Meaning | Action |
- |-------|---------|--------|
- | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval |
- | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties |
- | LOW (<60%) | Not sure | ASK USER before proceeding |
- | FAILED 2x | Something's wrong | STOP. ASK USER immediately |
- | CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help |
+ | Level | Meaning | Action | Effort |
+ |-------|---------|--------|--------|
+ | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
+ | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
+ | LOW (<60%) | Not sure | ASK USER before proceeding | Consider `/effort max` |
+ | FAILED 2x | Something's wrong | STOP. ASK USER immediately | Try `/effort max` |
+ | CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | Try `/effort max` |
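The percentage bands in the new confidence table can be read as a lookup from a confidence estimate to a level, an action, and an effort hint. The sketch below encodes only the three numeric rows; `classify_confidence` is a hypothetical name, and treating confidence as a single number is an assumption made for illustration, since the wizard asks Claude to state its level in prose. FAILED 2x and CONFUSED are situations rather than percentages, so they are left out.

```python
def classify_confidence(percent):
    """Map a 0-100 confidence estimate to (level, action, effort hint).

    Thresholds follow the table: HIGH is 90 and above, MEDIUM is 60 up to
    (but not including) 90, LOW is below 60.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 90:
        return ("HIGH", "Present approach, proceed after approval", "high")
    if percent >= 60:
        return ("MEDIUM", "Present approach, highlight uncertainties", "high")
    return ("LOW", "ASK USER before proceeding", "consider max")


for estimate in (95, 75, 40):
    print(estimate, classify_confidence(estimate))
```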
  
  ## Self-Review Loop (CRITICAL)
  
@@ -1724,38 +1749,100 @@ PLANNING → DOCS → TDD RED → TDD GREEN → Tests Pass → Self-Review
  
  **Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
  
- **Steps:**
+ ### Round 1: Initial Review
+
  1. After self-review passes, write `.reviews/handoff.json`:
  ```jsonc
  {
    "review_id": "feature-xyz-001",
    "status": "PENDING_REVIEW",
+   "round": 1,
    "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
    "review_instructions": "Review for security, edge cases, and correctness",
    "artifact_path": ".reviews/feature-xyz-001/"
  }
  ```
- 2. Tell the user to run the independent reviewer:
+ 2. Run the independent reviewer:
  ```bash
  codex exec \
  -c 'model_reasoning_effort="xhigh"' \
  -s danger-full-access \
  -o .reviews/latest-review.md \
  "You are an independent code reviewer. Read .reviews/handoff.json, \
- review the listed files, and write your findings to the artifact_path. \
+ review the listed files. Output each finding with: an ID (1, 2, ...), \
+ severity (P0/P1/P2), description, and a 'certify condition' stating \
+ what specific change would resolve it. \
+ End with CERTIFIED or NOT CERTIFIED."
+ ```
+ 3. If CERTIFIED → proceed to CI. If NOT CERTIFIED → go to Round 2.
+
+ ### Round 2+: Dialogue Loop
+
+ When the reviewer finds issues, respond per-finding instead of silently fixing everything:
+
+ 1. Write `.reviews/response.json`:
+ ```jsonc
+ {
+   "review_id": "feature-xyz-001",
+   "round": 2,
+   "responding_to": ".reviews/latest-review.md",
+   "responses": [
+     { "finding": "1", "action": "FIXED", "summary": "Added missing validation" },
+     { "finding": "2", "action": "DISPUTED", "justification": "This is intentional — see CODE_REVIEW_EXCEPTIONS.md" },
+     { "finding": "3", "action": "ACCEPTED", "summary": "Will add test coverage" }
+   ]
+ }
+ ```
+ - **FIXED**: "I fixed this. Here is what changed." Reviewer verifies.
+ - **DISPUTED**: "This is intentional/incorrect. Here is why." Reviewer accepts or rejects.
+ - **ACCEPTED**: "You are right. Fixing now." (Same as FIXED, batched.)
+
+ 2. Update `handoff.json` with `"status": "PENDING_RECHECK"`, increment `round`, add `"response_path"` and `"previous_review"` fields.
+
+ 3. Run targeted recheck (NOT a full re-review):
+ ```bash
+ codex exec \
+ -c 'model_reasoning_effort="xhigh"' \
+ -s danger-full-access \
+ -o .reviews/latest-review.md \
+ "You are doing a TARGETED RECHECK. First read .reviews/handoff.json \
+ to find the previous_review path — read that file for the original \
+ findings and certify conditions. Then read .reviews/response.json \
+ for the author's responses. For each: \
+ FIXED → verify the fix against the original certify condition. \
+ DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
+ ACCEPTED → verify it was applied. \
+ Do NOT raise new findings unless P0 (critical/security). \
+ New observations go in 'Notes for next review' (non-blocking). \
  End with CERTIFIED or NOT CERTIFIED."
  ```
- 3. Read `.reviews/latest-review.md` — if CERTIFIED, proceed to CI. If NOT CERTIFIED, fix findings and repeat from step 1.
+
+ 4. If CERTIFIED → done. If NOT CERTIFIED (rejected disputes or failed fixes) → fix rejected items and repeat.
+
+ ### Convergence
+
+ Max 3 recheck rounds (4 total including initial review). If still NOT CERTIFIED after round 4, escalate to the user with a summary of open findings. Don't spin indefinitely.
  
  ```
- Self-review passes → write handoff.json user runs codex exec
- ^ |
- | CERTIFIED? YES CI feedback loop
- | |
- | NO (findings)
- | |
- └──────── Fix findings ←───────────────────────┘
- (repeat until CERTIFIED, or ask user)
+ Self-review passes → handoff.json (round 1, PENDING_REVIEW)
+ |
+ Reviewer: FULL REVIEW (structured findings)
+ |
+ CERTIFIED? YES → CI feedback loop
+ |
+ NO (findings with IDs + certify conditions)
+ |
+ Claude writes response.json:
+ FIXED / DISPUTED / ACCEPTED per finding
+ |
+ handoff.json (round 2+, PENDING_RECHECK)
+ |
+ Reviewer: TARGETED RECHECK (previous findings only)
+ |
+ All resolved? → YES → CERTIFIED
+ |
+ NO → fix rejected items, repeat
+ (max 3 rechecks, then escalate to user)
  ```
  
  **Tool-agnostic:** The value is adversarial diversity (different model, different blind spots), not the specific tool. Any competing AI reviewer works.
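Because the dialogue protocol above depends on `response.json` being well formed, a quick structural check before handing it to the reviewer may be useful. The Python sketch below is an assumption-heavy illustration, not part of the wizard: `validate_response` is a hypothetical helper, the required-field rules are inferred from the example in the diff, and the file is assumed to be plain JSON (the standard `json` module does not accept `jsonc` comments).

```python
import json

ALLOWED_ACTIONS = {"FIXED", "DISPUTED", "ACCEPTED"}


def validate_response(doc):
    """Return a list of problems found in a parsed response.json document."""
    problems = []
    for key in ("review_id", "round", "responding_to", "responses"):
        if key not in doc:
            problems.append(f"missing top-level field: {key}")
    for i, item in enumerate(doc.get("responses", [])):
        action = item.get("action")
        if "finding" not in item:
            problems.append(f"responses[{i}]: missing finding id")
        # Assumption: a dispute needs a justification, and FIXED/ACCEPTED
        # entries carry a summary, as in the example from the diff.
        if action not in ALLOWED_ACTIONS:
            problems.append(f"responses[{i}]: unknown action {action!r}")
        elif action == "DISPUTED" and not item.get("justification"):
            problems.append(f"responses[{i}]: DISPUTED needs a justification")
        elif action in {"FIXED", "ACCEPTED"} and not item.get("summary"):
            problems.append(f"responses[{i}]: {action} needs a summary")
    return problems


example = json.loads("""
{
  "review_id": "feature-xyz-001",
  "round": 2,
  "responding_to": ".reviews/latest-review.md",
  "responses": [
    {"finding": "1", "action": "FIXED", "summary": "Added missing validation"},
    {"finding": "2", "action": "DISPUTED"}
  ]
}
""")
print(validate_response(example))
```

In this example the second response is flagged because a DISPUTED entry without a justification gives the reviewer nothing to accept or reject.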
@@ -1937,111 +2024,6 @@ CI passes -> Read review suggestions
  
  ---
  
- ## Step 7: Create Testing Skill
-
- Create `.claude/skills/testing/SKILL.md`:
-
- ````markdown
- ---
- name: testing
- description: TDD and testing philosophy for writing tests, test-driven development, integration tests, and unit tests. Use this skill when writing tests, doing TDD, or debugging test issues.
- argument-hint: [test type] [target]
- ---
- # Testing Skill - TDD & Testing Philosophy
-
- ## Task
- $ARGUMENTS
-
- ## Testing Diamond (CRITICAL)
-
- ```
-         /\         ← Few E2E (automated or manual sign-off at end)
-        /  \
-       /    \
-      /------\
-     |        |     ← MANY Integration (real DB, real cache - BEST BANG FOR BUCK)
-     |        |
-      \------/
-       \    /
-        \  /
-         \/         ← Few Unit (pure logic only)
- ```
-
- **Why Integration Tests are Best Bang for Buck:**
- - **Speed**: Fast enough to run on every change
- - **Stability**: Touch real code, not mocks that lie
- - **Confidence**: If they pass, production usually works
- - **Real bugs**: Integration tests with real DB catch real bugs
- - Unit tests with mocks can "pass" while production fails
-
- ## Minimal Mocking Philosophy
-
- | What | Mock? | Why |
- |------|-------|-----|
- | Database | ❌ NEVER | Use test DB or in-memory |
- | Cache | ❌ NEVER | Use isolated test instance |
- | External APIs | ✅ YES | Real calls = flaky + expensive |
- | Time/Date | ✅ YES | Determinism |
-
- **Mocks MUST come from REAL captured data:**
- - Capture real API response
- - Save to your fixtures directory (Claude will discover where yours is, e.g., `tests/fixtures/`, `test-data/`, etc.)
- - Import in tests
- - Never guess mock shapes!
-
- ## TDD Tests Must PROVE
-
- | Phase | What It Proves |
- |-------|----------------|
- | RED | Test FAILS → Bug exists or feature missing |
- | GREEN | Test PASSES → Fix works or feature implemented |
- | Forever | Regression protection |
-
- **WRONG approach:**
- ```
- // ❌ Writing test that passes with current (buggy) code
- assert currentBuggyBehavior == currentBuggyBehavior // pseudocode
- ```
-
- **CORRECT approach:**
- ```
- // ✅ Writing test that FAILS with buggy code, PASSES with fix
- assert result.status == 'success' // pseudocode - adapt to your framework
- assert result.data != null
- ```
-
- ## Unit Tests = Pure Logic ONLY
-
- A function qualifies for unit testing ONLY if:
- - ✅ No database calls
- - ✅ No external API calls
- - ✅ No file system access
- - ✅ No cache calls
- - ✅ Input → Output transformation only
-
- Everything else needs integration tests.
-
- ## When Stuck on Tests
-
- 1. Add console.logs → Check output
- 2. Run single test in isolation
- 3. Check fixtures match real API
- 4. **STILL stuck?** ASK USER
-
- ## After Session (Capture Learnings)
-
- If this session revealed testing insights, update the right place:
- - **Testing patterns, gotchas** → `TESTING.md`
- - **Feature-specific test quirks** → Feature docs (`*_PLAN.md`)
- - **General project context** → `CLAUDE.md` (or `/revise-claude-md`)
-
- ---
-
- **Full reference:** TESTING.md
- ````
-
- ---
-
  ### Visual Regression Testing (Experimental - Niche Use Cases Only)
  
  **Most apps don't need this.** Standard E2E testing (Playwright, Cypress) covers 99% of UI testing needs.
@@ -2230,9 +2212,27 @@ These are your full reference docs. Start with stubs and expand over time:
  
  **Claude follows this automatically.** When task involves "deploy to prod" and confidence is LOW, Claude will ask before proceeding.
  
+ ## Post-Deploy Verification
+
+ **After deploying to ANY environment, verify it's working:**
+
+ | Environment | Health Check | Log Command | Smoke Test |
+ |-------------|-------------|-------------|------------|
+ | Local Dev | `curl http://localhost:3000/health` | `[your dev log command]` | `npm run test:smoke` |
+ | Staging | `curl https://staging.example.com/health` | `[your staging log command]` | `[your staging smoke test]` |
+ | Production | `curl https://example.com/health` | `[your prod log command, e.g., kubectl logs]` | `[your prod smoke test]` |
+
+ **Monitoring after production deploy:**
+ 1. Watch error rates for 15 minutes (dashboard: `[your monitoring URL]`)
+ 2. Check application logs for new errors: `[your log command]`
+ 3. Run smoke tests against production: `[your smoke test command]`
+ 4. If issues found → rollback first, THEN start new SDLC loop to fix
+
+ **Claude follows this automatically.** After a deploy task, Claude runs through the Post-Deploy Verification table for the target environment. If any check fails, Claude suggests rollback and a new fix cycle.
+
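The rule in the Post-Deploy Verification section added above (any failed check means roll back first, then start a new fix cycle) can be sketched as a small decision function. Everything named here is hypothetical: `post_deploy_action` and the check names are placeholders, and the real health, log, and smoke commands are project-specific, as the bracketed entries in the table indicate.

```python
def post_deploy_action(check_results):
    """Decide the next step from post-deploy check outcomes.

    check_results maps a check name to True (passed) or False (failed).
    Returns (action, failed_checks).
    """
    failed = [name for name, passed in check_results.items() if not passed]
    if failed:
        # Roll back first, THEN open a new SDLC loop for the fix.
        return "rollback, then start a new SDLC loop", failed
    return "deploy verified", []


action, failed = post_deploy_action(
    {"health": True, "logs": True, "smoke": False}
)
print(action, failed)
```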
  ## Rollback
  
- If deployment fails or causes issues:
+ If deployment fails or post-deploy verification catches issues:
  
  | Environment | Rollback Command | Notes |
  |-------------|------------------|-------|
@@ -2266,7 +2266,7 @@ If deployment fails or causes issues:
  
  **SDLC.md:**
  ```markdown
- <!-- SDLC Wizard Version: 1.15.0 -->
+ <!-- SDLC Wizard Version: 1.18.0 -->
  <!-- Setup Date: [DATE] -->
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  <!-- Git Workflow: [PRs or Solo] -->
@@ -2298,7 +2298,7 @@ See `.claude/skills/sdlc/SKILL.md` for the enforced checklist.
  ```markdown
  # Testing Guidelines
  
- See `.claude/skills/testing/SKILL.md` for TDD philosophy.
+ See `TESTING.md` for TDD philosophy.
  
  ## Test Commands
  
@@ -2394,7 +2394,6 @@ Verification Results:
  ├── .claude/hooks/tdd-pretool-check.sh ✓ executable
  ├── .claude/settings.json ✓ valid JSON
  ├── .claude/skills/sdlc/SKILL.md ✓ frontmatter OK
- ├── .claude/skills/testing/SKILL.md ✓ frontmatter OK
  ├── CLAUDE.md ✓ exists
  ├── SDLC.md ✓ exists
  └── TESTING.md ✓ exists
@@ -2422,14 +2421,14 @@ All checks passed! Setup complete.
  |------|-----------------|
  | "What files handle auth?" | Answers without invoking skills |
  | "Add a logout button" | Auto-invokes sdlc skill, uses TodoWrite |
- | "Write tests for login" | Auto-invokes testing skill |
+ | "Write tests for login" | Auto-invokes sdlc skill |
  
  **What happens automatically:**
  
  | You Do | System Does |
  |--------|-------------|
  | Ask to implement something | SDLC skill auto-invokes, TodoWrite starts |
- | Ask to write tests | Testing skill auto-invokes |
+ | Ask to write tests | SDLC skill auto-invokes |
  | Claude tries to edit code | TDD reminder fires |
  | Task completes | Compliance check runs |
  
@@ -2494,7 +2493,7 @@ All checks passed! Setup complete.
  |--------|---------|
  | Free context after planning | `/compact` |
  | Enter planning mode | Claude suggests or `/plan` |
- | Run specific skill | `/sdlc` or `/testing` |
+ | Run specific skill | `/sdlc` |
  
  ---
  
@@ -2525,7 +2524,7 @@ You've successfully set up the system when:
  
  - [ ] Light hook fires every prompt (you see SDLC BASELINE in responses)
  - [ ] Claude auto-invokes sdlc skill for implementation tasks
- - [ ] Claude auto-invokes testing skill for test tasks
+ - [ ] Claude auto-invokes sdlc skill for all tasks
  - [ ] Claude uses TodoWrite to track progress
  - [ ] Claude states confidence levels
  - [ ] Claude asks for clarification when LOW confidence
@@ -2852,13 +2851,14 @@ Use an independent AI model from a different company as a code reviewer. The aut
  {
    "review_id": "feature-xyz-001",
    "status": "PENDING_REVIEW",
+   "round": 1,
    "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
    "review_instructions": "Review for security, edge cases, and correctness",
    "artifact_path": ".reviews/feature-xyz-001/"
  }
  ```
  
- 3. Run the independent reviewer:
+ 3. Run the independent reviewer (Round 1 — full review). These commands use your Codex default model — configure it to the latest, most capable model available:
  
  ```bash
  codex exec \
@@ -2866,23 +2866,95 @@ codex exec \
  -s danger-full-access \
  -o .reviews/latest-review.md \
  "You are an independent code reviewer. Read .reviews/handoff.json, \
- review the listed files, and write your findings to the artifact_path. \
+ review the listed files. Output each finding with: an ID (1, 2, ...), \
+ severity (P0/P1/P2), description, and a 'certify condition' stating \
+ what specific change would resolve it. \
  End with CERTIFIED or NOT CERTIFIED."
  ```
  
- **The Loop:**
+ 4. If CERTIFIED → done. If NOT CERTIFIED → enter the dialogue loop.
+
+ **The Dialogue Loop (Round 2+):**
+
+ Instead of silently fixing everything and resubmitting for another full review, respond to each finding:
+
+ ```jsonc
+ // .reviews/response.json
+ {
+   "review_id": "feature-xyz-001",
+   "round": 2,
+   "responding_to": ".reviews/latest-review.md",
+   "responses": [
+     {
+       "finding": "1",
+       "action": "FIXED",
+       "summary": "Added missing mocking table to SKILL.md",
+       "evidence": "git diff shows table at SKILL.md:195-210"
+     },
+     {
+       "finding": "2",
+       "action": "DISPUTED",
+       "justification": "The upgrade path cleanup runs in init.js:205. Verified with test-cli.sh test 29.",
+       "evidence": "tests/test-cli.sh:583-600"
+     },
+     {
+       "finding": "3",
+       "action": "ACCEPTED",
+       "summary": "Will add EVAL_PROMPT_VERSION bump"
+     }
+   ]
+ }
+ ```
+
+ Three response types:
+ - **FIXED**: "I fixed this. Here is what changed." Reviewer verifies the fix.
+ - **DISPUTED**: "This is intentional/incorrect. Here is why." Reviewer accepts or rejects the reasoning.
+ - **ACCEPTED**: "You are right. Fixing now." (Same outcome as FIXED, used when batching fixes.)
+
+ Then update `handoff.json` to `"status": "PENDING_RECHECK"`, increment `round`, add `"response_path"` and `"previous_review"` fields. Run a targeted recheck:
+
+ ```bash
+ codex exec \
+ -c 'model_reasoning_effort="xhigh"' \
+ -s danger-full-access \
+ -o .reviews/latest-review.md \
+ "You are doing a TARGETED RECHECK. First read .reviews/handoff.json \
+ to find the previous_review path — read that file for the original \
+ findings and certify conditions. Then read .reviews/response.json \
+ for the author's responses. For each: \
+ FIXED → verify the fix against the original certify condition. \
+ DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
+ ACCEPTED → verify it was applied. \
+ Do NOT raise new findings unless P0 (critical/security). \
+ New observations go in 'Notes for next review' (non-blocking). \
+ End with CERTIFIED or NOT CERTIFIED."
  ```
- Claude writes code → self-review passes → handoff.json
+
+ **The key constraint:** Rechecks are scoped to previous findings only. The reviewer cannot block certification with new P2 observations discovered during recheck. This prevents scope creep and ensures convergence.
+
+ **Convergence:** Max 3 recheck rounds (4 total including initial review). If still NOT CERTIFIED after round 4, escalate to the user with a summary of all open findings. Don't spin indefinitely.
+
+ ```
+ Claude writes code → self-review passes → handoff.json (round 1)
  ↑ |
  | v
- | Codex reviews (xhigh reasoning)
+ | Reviewer: FULL REVIEW
+ | (structured findings with IDs)
  | |
  | CERTIFIED? -+→ YES → Done
  | |
  | +→ NO (findings)
  | |
- └────────── Claude fixes findings ←────────┘
- (repeat until CERTIFIED, or ask user)
+ | Claude writes response.json:
+ | FIXED / DISPUTED / ACCEPTED
+ | |
+ | Reviewer: TARGETED RECHECK
+ | (previous findings only, no new P1/P2)
+ | |
+ | All resolved? → YES → CERTIFIED
+ | |
+ └────────── Fix rejected items ←───────────┘
+ (max 3 rechecks, then escalate to user)
  ```
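The handoff update and the convergence limit described in this hunk can be combined into one step: produce the next-round `handoff.json`, or signal that it is time to escalate. The sketch below is illustrative only. `next_handoff` is a hypothetical helper, and the default paths are assumptions, since the diff names the `response_path` and `previous_review` fields but does not show their values.

```python
import copy

MAX_ROUNDS = 4  # 1 initial review + 3 rechecks, per the convergence rule


def next_handoff(handoff,
                 review_path=".reviews/latest-review.md",
                 response_path=".reviews/response.json"):
    """Return the handoff for the next recheck, or None when it is time to escalate.

    The default paths are assumptions for illustration.
    """
    if handoff["round"] >= MAX_ROUNDS:
        return None  # still NOT CERTIFIED after round 4: escalate to the user
    updated = copy.deepcopy(handoff)  # leave the caller's dict untouched
    updated["status"] = "PENDING_RECHECK"
    updated["round"] = handoff["round"] + 1
    updated["response_path"] = response_path
    updated["previous_review"] = review_path
    return updated


round1 = {"review_id": "feature-xyz-001", "status": "PENDING_REVIEW", "round": 1}
round2 = next_handoff(round1)
print(round2["status"], round2["round"])
```

One practical caveat: the recheck command writes to the same `-o .reviews/latest-review.md` path, so if `previous_review` points there, the earlier findings may need to be copied elsewhere first. The diff does not say how the wizard handles this.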
  
  **Key flags:**
@@ -2952,10 +3024,13 @@ If Claude repeatedly struggles in a codebase area:
  
  ### How to Update
  
- Ask Claude any of these:
+ Use the `/update-wizard` skill for a guided, selective update experience:
+ > `/update-wizard` — full guided update (shows changelog, per-file diff, selective adoption)
+ > `/update-wizard check-only` — just show what changed, don't apply anything
+ > `/update-wizard force-all` — apply all updates without per-file approval
+
+ Or ask Claude directly:
  > "Check for SDLC wizard updates"
- > "Run me through the SDLC wizard"
- > "What am I missing from the latest wizard?"
  > "Update my SDLC setup"
  
  **All of these do the same thing:** Claude checks what's new, shows you, and walks you through only what's missing.
@@ -3034,7 +3109,7 @@ Walk through updates? (y/n)
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
  
  ```markdown
- <!-- SDLC Wizard Version: 1.15.0 -->
+ <!-- SDLC Wizard Version: 1.18.0 -->
  <!-- Setup Date: 2026-01-24 -->
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  <!-- Git Workflow: PRs -->
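The metadata comments above follow a regular `<!-- Key: value -->` shape, so they are straightforward to read mechanically as well. A minimal Python sketch follows, assuming one comment per line and no colon in the key. `parse_wizard_metadata` is a hypothetical helper; the wizard relies on Claude reading these comments rather than on a parser.

```python
import re

# Matches "<!-- Key: value -->" on a single line; the key may not contain ":".
METADATA_RE = re.compile(r"<!--\s*([^:\n]+?):\s*(.*?)\s*-->")


def parse_wizard_metadata(text):
    """Return {key: value} for each '<!-- Key: value -->' comment in text."""
    return {key.strip(): value for key, value in METADATA_RE.findall(text)}


sample = """<!-- SDLC Wizard Version: 1.18.0 -->
<!-- Setup Date: 2026-01-24 -->
<!-- Completed Steps: step-0.1, step-0.2, step-1 -->
<!-- Git Workflow: PRs -->
"""
meta = parse_wizard_metadata(sample)
completed = [step.strip() for step in meta["Completed Steps"].split(",")]
print(meta["SDLC Wizard Version"], completed)
```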
@@ -3067,12 +3142,12 @@ Every wizard step has a unique ID for tracking:
  | `step-4` | Light hook | 1.0.0 |
  | `step-5` | TDD hook | 1.0.0 |
  | `step-6` | SDLC skill | 1.0.0 |
- | `step-7` | Testing skill | 1.0.0 |
  | `step-8` | CLAUDE.md | 1.0.0 |
  | `step-9` | SDLC/TESTING/ARCH docs | 1.0.0 |
  | `question-git-workflow` | Git workflow preference | 1.2.0 |
  | `step-update-notify` | Optional: CI update notification | 1.13.0 |
  | `step-cross-model-review` | Optional: Cross-model review loop | 1.16.0 |
+ | `step-update-wizard` | /update-wizard smart update skill | 1.18.0 |
  
  When checking for updates, Claude compares user's completed steps against this registry.
  
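The comparison described above (completed steps versus the registry) amounts to a set difference ordered by the version that introduced each step. The Python sketch below uses only the registry rows visible in this hunk. `missing_steps` is a hypothetical helper, and the ordering by version is an assumption about what would be useful, not documented wizard behavior.

```python
# Registry rows visible in the diff: step id -> version that introduced it.
STEP_REGISTRY = {
    "step-4": "1.0.0",
    "step-5": "1.0.0",
    "step-6": "1.0.0",
    "step-8": "1.0.0",
    "step-9": "1.0.0",
    "question-git-workflow": "1.2.0",
    "step-update-notify": "1.13.0",
    "step-cross-model-review": "1.16.0",
    "step-update-wizard": "1.18.0",
}


def missing_steps(completed, registry=STEP_REGISTRY):
    """Return registry steps not yet completed, oldest introducing version first."""
    def version_key(step_id):
        # Compare versions numerically so that 1.13.0 sorts after 1.2.0.
        return tuple(int(part) for part in registry[step_id].split("."))

    pending = [step for step in registry if step not in completed]
    return sorted(pending, key=version_key)


done = {"step-4", "step-5", "step-6", "step-8", "step-9", "question-git-workflow"}
print(missing_steps(done))
```

Versions are compared as integer tuples because plain string comparison would place `1.13.0` before `1.2.0`.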