npm - agentic-sdlc-wizard - Versions diffs - 1.15.0 → 1.18.0 - Mend

agentic-sdlc-wizard 1.15.0 → 1.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12) hide show

package/CHANGELOG.md +26 -0
package/CLAUDE_CODE_SDLC_WIZARD.md +231 -156
package/README.md +51 -35
package/cli/bin/sdlc-wizard.js +14 -3
package/cli/init.js +202 -10
package/cli/templates/hooks/instructions-loaded-check.sh +1 -1
package/cli/templates/hooks/sdlc-prompt-check.sh +16 -5
package/cli/templates/skills/sdlc/SKILL.md +159 -30
package/cli/templates/skills/setup/SKILL.md +176 -0
package/cli/templates/skills/update/SKILL.md +141 -0
package/package.json +1 -1
package/cli/templates/skills/testing/SKILL.md +0 -97

package/CHANGELOG.md CHANGED Viewed

@@ -4,6 +4,32 @@ All notable changes to the SDLC Wizard.
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
+## [1.18.0] - 2026-03-30
+### Added
+- `/update-wizard` skill — guided update with changelog diff, per-file comparison, selective adoption
+- `step-update-wizard` in wizard step registry
+- CLI distributes `skills/update/SKILL.md` (now 8 managed files)
+- `/update-wizard` reference in wizard "How to Update" section
+## [1.17.0] - 2026-03-30
+### Fixed
+- Setup skill now force-reads entire wizard doc before proceeding (was just "Reference")
+- README no longer tells users to manually invoke setup — hooks auto-invoke
+- 3 new tests for setup auto-invoke behavior
+### Changed
+- Testing consolidation: `/testing` skill merged into `/sdlc` (#28)
+## [1.16.0] - 2026-03-29
+### Added
+- Cross-model review dialogue protocol — structured FIXED/DISPUTED/ACCEPTED negotiation loop (#40)
+- P0/P1/P2 severity rubric in PR review prompt (#34)
+- Effort level recommendations in wizard
+- 5 enforcement gap fixes in TodoWrite checklist (#39)
 ## [1.15.0] - 2026-03-25
 ### Added

package/CLAUDE_CODE_SDLC_WIZARD.md CHANGED Viewed

@@ -211,6 +211,35 @@ When Anthropic provides official plugins or tools that handle something:
 ---
+## Recommended Effort Level
+Claude Code's **effort level** controls how much thinking the model does before responding. Higher effort = deeper reasoning but more tokens.
+| Level | When to Use | How to Set |
+|-------|-------------|------------|
+| `high` | **Default for all SDLC work.** Features, bug fixes, refactoring, tests, reviews | `effort: high` in skill frontmatter (already set) |
+| `max` | LOW confidence, FAILED 2x, architecture decisions, complex debugging, cross-model reviews | `/effort max` (session only — resets next session) |
+**Why `high` is the default:** The `/sdlc` skill sets `effort: high` in its frontmatter, so every SDLC invocation automatically uses high effort. This gives thorough reasoning without the unbounded token cost of `max`.
+**When to escalate to `max`:**
+- You hit LOW confidence on your approach — deeper thinking may find clarity
+- You've failed the same thing twice — something non-obvious is wrong
+- Architecture decisions with wide blast radius
+- Complex multi-system debugging where you need to hold many constraints
+- Cross-model review analysis (reading and evaluating external reviewer findings)
+**How it works:**
+- `/effort max` changes effort for the current session only (resets next session)
+- `effort: high` in SKILL.md frontmatter persists — every `/sdlc` invocation uses `high`
+- You can also type `ultrathink` in any prompt for a single high-effort turn
+**Cost note:** `max` uses significantly more tokens than `high`. Use it when the problem justifies it, not as a default.
+> See also: the **Effort** column in the [Confidence Check table](#confidence-check-required) below for per-confidence-level guidance on when to escalate to `max`.
+---
 ## Claude Code Feature Updates
 > **Keep your SDLC current**: Claude Code evolves. This section documents features that enhance the SDLC workflow. Check [Claude Code releases](https://github.com/anthropics/claude-code/releases) periodically.
@@ -250,7 +279,7 @@ $ARGUMENTS
 **Usage examples**:
 - `/sdlc fix the login validation bug` → `$ARGUMENTS` = "fix the login validation bug"
-- `/testing unit UserService` → `$ARGUMENTS` = "unit UserService"
+- `/sdlc write tests for UserService` → `$ARGUMENTS` = "write tests for UserService"
 **Note**: Skills still auto-invoke via hooks. This is optional polish for manual invocation.
@@ -276,7 +305,7 @@ New built-in commands available to use alongside the wizard:
 ### Skill Effort Frontmatter (v2.1.80+)
-Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc` and `/testing` skills use `effort: high` to ensure Claude gives full attention to SDLC tasks.
+Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc` skill uses `effort: high` to ensure Claude gives full attention to SDLC tasks.
 ### InstructionsLoaded Hook (v2.1.69+)
@@ -1376,7 +1405,7 @@ Your answers map to these files:
 | Q4-Q8 (commands) | `CLAUDE.md` - Commands section |
 | Q9-Q10 (infra) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
 | Q11 (test duration) | `SDLC skill` - wait time note |
-| Q12 (E2E) | `testing skill` - testing diamond top |
+| Q12 (E2E) | `TESTING.md` - testing diamond top |
 ---
@@ -1387,7 +1416,6 @@ Create these directories in your project root:
 ```bash
 mkdir -p .claude/hooks
 mkdir -p .claude/skills/sdlc
-mkdir -p .claude/skills/testing
 ```
 **Commit to Git:** Yes! These files should be committed so your whole team gets the same SDLC enforcement. When teammates pull, they get the hooks and skills automatically.
@@ -1506,9 +1534,8 @@ The `allowedTools` array is auto-generated based on your stack detected in Step
 The light hook outputs text that **instructs Claude** to invoke skills:
 ```
-AUTO-INVOKE SKILLS (Claude MUST do this FIRST):
-- implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
-- test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
+AUTO-INVOKE SKILL (Claude MUST do this FIRST):
+- implement/fix/refactor/feature/bug/build/test/TDD → Invoke: Skill tool, skill="sdlc"
 ```
 **This is text-based, not programmatic.** Claude reads this instruction and follows it. When Claude sees your message is an implementation task, it invokes the sdlc skill using the Skill tool. This loads the full SDLC guidance into context.
@@ -1530,7 +1557,7 @@ Create `.claude/hooks/sdlc-prompt-check.sh`:
 ```bash
 #!/bin/bash
 # Light SDLC hook - baseline reminder every prompt (~100 tokens)
-# Full guidance in skills: .claude/skills/sdlc/ and .claude/skills/testing/
+# Full guidance in skill: .claude/skills/sdlc/
 cat << 'EOF'
 SDLC BASELINE:
@@ -1540,10 +1567,8 @@ SDLC BASELINE:
 4. FAILED 2x? STOP and ASK USER
 5. 🛑 ALL TESTS MUST PASS BEFORE COMMIT - NO EXCEPTIONS
-AUTO-INVOKE SKILLS (Claude MUST do this FIRST):
-- implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
-- test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
-- If BOTH match (e.g., "fix the test") → sdlc takes precedence (includes TDD)
+AUTO-INVOKE SKILL (Claude MUST do this FIRST):
+- implement/fix/refactor/feature/bug/build/test/TDD → Invoke: Skill tool, skill="sdlc"
 - DON'T invoke for: questions, explanations, reading/exploring code, simple queries
 - DON'T wait for user to type /sdlc - AUTO-INVOKE based on task type
@@ -1685,13 +1710,13 @@ TodoWrite([
 Before presenting approach, STATE your confidence:
-| Level | Meaning | Action |
-|-------|---------|--------|
-| HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval |
-| MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties |
-| LOW (<60%) | Not sure | ASK USER before proceeding |
-| FAILED 2x | Something's wrong | STOP. ASK USER immediately |
-| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help |
+| Level | Meaning | Action | Effort |
+|-------|---------|--------|--------|
+| HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
+| MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
+| LOW (<60%) | Not sure | ASK USER before proceeding | Consider `/effort max` |
+| FAILED 2x | Something's wrong | STOP. ASK USER immediately | Try `/effort max` |
+| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | Try `/effort max` |
 ## Self-Review Loop (CRITICAL)
@@ -1724,38 +1749,100 @@ PLANNING → DOCS → TDD RED → TDD GREEN → Tests Pass → Self-Review
 **Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
-**Steps:**
+### Round 1: Initial Review
 1. After self-review passes, write `.reviews/handoff.json`:
    ```jsonc
    {
      "review_id": "feature-xyz-001",
      "status": "PENDING_REVIEW",
+     "round": 1,
      "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
      "review_instructions": "Review for security, edge cases, and correctness",
      "artifact_path": ".reviews/feature-xyz-001/"
    }
    ```
-2. Tell the user to run the independent reviewer:
+2. Run the independent reviewer:
    ```bash
    codex exec \
      -c 'model_reasoning_effort="xhigh"' \
      -s danger-full-access \
      -o .reviews/latest-review.md \
      "You are an independent code reviewer. Read .reviews/handoff.json, \
-      review the listed files, and write your findings to the artifact_path. \
+      review the listed files. Output each finding with: an ID (1, 2, ...), \
+      severity (P0/P1/P2), description, and a 'certify condition' stating \
+      what specific change would resolve it. \
+      End with CERTIFIED or NOT CERTIFIED."
+   ```
+3. If CERTIFIED → proceed to CI. If NOT CERTIFIED → go to Round 2.
+### Round 2+: Dialogue Loop
+When the reviewer finds issues, respond per-finding instead of silently fixing everything:
+1. Write `.reviews/response.json`:
+   ```jsonc
+   {
+     "review_id": "feature-xyz-001",
+     "round": 2,
+     "responding_to": ".reviews/latest-review.md",
+     "responses": [
+       { "finding": "1", "action": "FIXED", "summary": "Added missing validation" },
+       { "finding": "2", "action": "DISPUTED", "justification": "This is intentional — see CODE_REVIEW_EXCEPTIONS.md" },
+       { "finding": "3", "action": "ACCEPTED", "summary": "Will add test coverage" }
+     ]
+   }
+   ```
+   - **FIXED**: "I fixed this. Here is what changed." Reviewer verifies.
+   - **DISPUTED**: "This is intentional/incorrect. Here is why." Reviewer accepts or rejects.
+   - **ACCEPTED**: "You are right. Fixing now." (Same as FIXED, batched.)
+2. Update `handoff.json` with `"status": "PENDING_RECHECK"`, increment `round`, add `"response_path"` and `"previous_review"` fields.
+3. Run targeted recheck (NOT a full re-review):
+   ```bash
+   codex exec \
+     -c 'model_reasoning_effort="xhigh"' \
+     -s danger-full-access \
+     -o .reviews/latest-review.md \
+     "You are doing a TARGETED RECHECK. First read .reviews/handoff.json \
+      to find the previous_review path — read that file for the original \
+      findings and certify conditions. Then read .reviews/response.json \
+      for the author's responses. For each: \
+      FIXED → verify the fix against the original certify condition. \
+      DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
+      ACCEPTED → verify it was applied. \
+      Do NOT raise new findings unless P0 (critical/security). \
+      New observations go in 'Notes for next review' (non-blocking). \
       End with CERTIFIED or NOT CERTIFIED."
    ```
-3. Read `.reviews/latest-review.md` — if CERTIFIED, proceed to CI. If NOT CERTIFIED, fix findings and repeat from step 1.
+4. If CERTIFIED → done. If NOT CERTIFIED (rejected disputes or failed fixes) → fix rejected items and repeat.
+### Convergence
+Max 3 recheck rounds (4 total including initial review). If still NOT CERTIFIED after round 4, escalate to the user with a summary of open findings. Don't spin indefinitely.
 ```
-Self-review passes → write handoff.json → user runs codex exec
-    ^                                              |
-    |                                    CERTIFIED? → YES → CI feedback loop
-    |                                              |
-    |                                              → NO (findings)
-    |                                              |
-    └──────── Fix findings ←───────────────────────┘
-                  (repeat until CERTIFIED, or ask user)
+Self-review passes → handoff.json (round 1, PENDING_REVIEW)
+                            |
+                   Reviewer: FULL REVIEW (structured findings)
+                            |
+                   CERTIFIED? → YES → CI feedback loop
+                            |
+                            NO (findings with IDs + certify conditions)
+                            |
+                   Claude writes response.json:
+                     FIXED / DISPUTED / ACCEPTED per finding
+                            |
+                   handoff.json (round 2+, PENDING_RECHECK)
+                            |
+                   Reviewer: TARGETED RECHECK (previous findings only)
+                            |
+                   All resolved? → YES → CERTIFIED
+                            |
+                            NO → fix rejected items, repeat
+                            (max 3 rechecks, then escalate to user)
 ```
 **Tool-agnostic:** The value is adversarial diversity (different model, different blind spots), not the specific tool. Any competing AI reviewer works.
@@ -1937,111 +2024,6 @@ CI passes -> Read review suggestions
 ---
-## Step 7: Create Testing Skill
-Create `.claude/skills/testing/SKILL.md`:
-````markdown
----
-name: testing
-description: TDD and testing philosophy for writing tests, test-driven development, integration tests, and unit tests. Use this skill when writing tests, doing TDD, or debugging test issues.
-argument-hint: [test type] [target]
----
-# Testing Skill - TDD & Testing Philosophy
-## Task
-$ARGUMENTS
-## Testing Diamond (CRITICAL)
-```
-    /\         ← Few E2E (automated or manual sign-off at end)
-   /  \
-  /    \
- /------\
-|        |     ← MANY Integration (real DB, real cache - BEST BANG FOR BUCK)
-|        |
- \------/
-  \    /
-   \  /
-    \/         ← Few Unit (pure logic only)
-```
-**Why Integration Tests are Best Bang for Buck:**
-- **Speed**: Fast enough to run on every change
-- **Stability**: Touch real code, not mocks that lie
-- **Confidence**: If they pass, production usually works
-- **Real bugs**: Integration tests with real DB catch real bugs
-- Unit tests with mocks can "pass" while production fails
-## Minimal Mocking Philosophy
-| What | Mock? | Why |
-|------|-------|-----|
-| Database | ❌ NEVER | Use test DB or in-memory |
-| Cache | ❌ NEVER | Use isolated test instance |
-| External APIs | ✅ YES | Real calls = flaky + expensive |
-| Time/Date | ✅ YES | Determinism |
-**Mocks MUST come from REAL captured data:**
-- Capture real API response
-- Save to your fixtures directory (Claude will discover where yours is, e.g., `tests/fixtures/`, `test-data/`, etc.)
-- Import in tests
-- Never guess mock shapes!
-## TDD Tests Must PROVE
-| Phase | What It Proves |
-|-------|----------------|
-| RED | Test FAILS → Bug exists or feature missing |
-| GREEN | Test PASSES → Fix works or feature implemented |
-| Forever | Regression protection |
-**WRONG approach:**
-```
-// ❌ Writing test that passes with current (buggy) code
-assert currentBuggyBehavior == currentBuggyBehavior  // pseudocode
-```
-**CORRECT approach:**
-```
-// ✅ Writing test that FAILS with buggy code, PASSES with fix
-assert result.status == 'success'   // pseudocode - adapt to your framework
-assert result.data != null
-```
-## Unit Tests = Pure Logic ONLY
-A function qualifies for unit testing ONLY if:
-- ✅ No database calls
-- ✅ No external API calls
-- ✅ No file system access
-- ✅ No cache calls
-- ✅ Input → Output transformation only
-Everything else needs integration tests.
-## When Stuck on Tests
-1. Add console.logs → Check output
-2. Run single test in isolation
-3. Check fixtures match real API
-4. **STILL stuck?** ASK USER
-## After Session (Capture Learnings)
-If this session revealed testing insights, update the right place:
-- **Testing patterns, gotchas** → `TESTING.md`
-- **Feature-specific test quirks** → Feature docs (`*_PLAN.md`)
-- **General project context** → `CLAUDE.md` (or `/revise-claude-md`)
----
-**Full reference:** TESTING.md
-````
----
 ### Visual Regression Testing (Experimental - Niche Use Cases Only)
 **Most apps don't need this.** Standard E2E testing (Playwright, Cypress) covers 99% of UI testing needs.
@@ -2230,9 +2212,27 @@ These are your full reference docs. Start with stubs and expand over time:
 **Claude follows this automatically.** When task involves "deploy to prod" and confidence is LOW, Claude will ask before proceeding.
+## Post-Deploy Verification
+**After deploying to ANY environment, verify it's working:**
+| Environment | Health Check | Log Command | Smoke Test |
+|-------------|-------------|-------------|------------|
+| Local Dev | `curl http://localhost:3000/health` | `[your dev log command]` | `npm run test:smoke` |
+| Staging | `curl https://staging.example.com/health` | `[your staging log command]` | `[your staging smoke test]` |
+| Production | `curl https://example.com/health` | `[your prod log command, e.g., kubectl logs]` | `[your prod smoke test]` |
+**Monitoring after production deploy:**
+1. Watch error rates for 15 minutes (dashboard: `[your monitoring URL]`)
+2. Check application logs for new errors: `[your log command]`
+3. Run smoke tests against production: `[your smoke test command]`
+4. If issues found → rollback first, THEN start new SDLC loop to fix
+**Claude follows this automatically.** After a deploy task, Claude runs through the Post-Deploy Verification table for the target environment. If any check fails, Claude suggests rollback and a new fix cycle.
 ## Rollback
-If deployment fails or causes issues:
+If deployment fails or post-deploy verification catches issues:
 | Environment | Rollback Command | Notes |
 |-------------|------------------|-------|
@@ -2266,7 +2266,7 @@ If deployment fails or causes issues:
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.15.0 -->
+<!-- SDLC Wizard Version: 1.18.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -2298,7 +2298,7 @@ See `.claude/skills/sdlc/SKILL.md` for the enforced checklist.
 ```markdown
 # Testing Guidelines
-See `.claude/skills/testing/SKILL.md` for TDD philosophy.
+See `TESTING.md` for TDD philosophy.
 ## Test Commands
@@ -2394,7 +2394,6 @@ Verification Results:
 ├── .claude/hooks/tdd-pretool-check.sh    ✓ executable
 ├── .claude/settings.json                  ✓ valid JSON
 ├── .claude/skills/sdlc/SKILL.md          ✓ frontmatter OK
-├── .claude/skills/testing/SKILL.md       ✓ frontmatter OK
 ├── CLAUDE.md                              ✓ exists
 ├── SDLC.md                                ✓ exists
 └── TESTING.md                             ✓ exists
@@ -2422,14 +2421,14 @@ All checks passed! Setup complete.
 |------|-----------------|
 | "What files handle auth?" | Answers without invoking skills |
 | "Add a logout button" | Auto-invokes sdlc skill, uses TodoWrite |
-| "Write tests for login" | Auto-invokes testing skill |
+| "Write tests for login" | Auto-invokes sdlc skill |
 **What happens automatically:**
 | You Do | System Does |
 |--------|-------------|
 | Ask to implement something | SDLC skill auto-invokes, TodoWrite starts |
-| Ask to write tests | Testing skill auto-invokes |
+| Ask to write tests | SDLC skill auto-invokes |
 | Claude tries to edit code | TDD reminder fires |
 | Task completes | Compliance check runs |
@@ -2494,7 +2493,7 @@ All checks passed! Setup complete.
 |--------|---------|
 | Free context after planning | `/compact` |
 | Enter planning mode | Claude suggests or `/plan` |
-| Run specific skill | `/sdlc` or `/testing` |
+| Run specific skill | `/sdlc` |
 ---
@@ -2525,7 +2524,7 @@ You've successfully set up the system when:
 - [ ] Light hook fires every prompt (you see SDLC BASELINE in responses)
 - [ ] Claude auto-invokes sdlc skill for implementation tasks
-- [ ] Claude auto-invokes testing skill for test tasks
+- [ ] Claude auto-invokes sdlc skill for all tasks
 - [ ] Claude uses TodoWrite to track progress
 - [ ] Claude states confidence levels
 - [ ] Claude asks for clarification when LOW confidence
@@ -2852,13 +2851,14 @@ Use an independent AI model from a different company as a code reviewer. The aut
 {
   "review_id": "feature-xyz-001",
   "status": "PENDING_REVIEW",
+  "round": 1,
   "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
   "review_instructions": "Review for security, edge cases, and correctness",
   "artifact_path": ".reviews/feature-xyz-001/"
 }
 ```
-3. Run the independent reviewer:
+3. Run the independent reviewer (Round 1 — full review). These commands use your Codex default model — configure it to the latest, most capable model available:
 ```bash
 codex exec \
@@ -2866,23 +2866,95 @@ codex exec \
   -s danger-full-access \
   -o .reviews/latest-review.md \
   "You are an independent code reviewer. Read .reviews/handoff.json, \
-   review the listed files, and write your findings to the artifact_path. \
+   review the listed files. Output each finding with: an ID (1, 2, ...), \
+   severity (P0/P1/P2), description, and a 'certify condition' stating \
+   what specific change would resolve it. \
    End with CERTIFIED or NOT CERTIFIED."
 ```
-**The Loop:**
+4. If CERTIFIED → done. If NOT CERTIFIED → enter the dialogue loop.
+**The Dialogue Loop (Round 2+):**
+Instead of silently fixing everything and resubmitting for another full review, respond to each finding:
+```jsonc
+// .reviews/response.json
+{
+  "review_id": "feature-xyz-001",
+  "round": 2,
+  "responding_to": ".reviews/latest-review.md",
+  "responses": [
+    {
+      "finding": "1",
+      "action": "FIXED",
+      "summary": "Added missing mocking table to SKILL.md",
+      "evidence": "git diff shows table at SKILL.md:195-210"
+    },
+    {
+      "finding": "2",
+      "action": "DISPUTED",
+      "justification": "The upgrade path cleanup runs in init.js:205. Verified with test-cli.sh test 29.",
+      "evidence": "tests/test-cli.sh:583-600"
+    },
+    {
+      "finding": "3",
+      "action": "ACCEPTED",
+      "summary": "Will add EVAL_PROMPT_VERSION bump"
+    }
+  ]
+}
+```
+Three response types:
+- **FIXED**: "I fixed this. Here is what changed." Reviewer verifies the fix.
+- **DISPUTED**: "This is intentional/incorrect. Here is why." Reviewer accepts or rejects the reasoning.
+- **ACCEPTED**: "You are right. Fixing now." (Same outcome as FIXED, used when batching fixes.)
+Then update `handoff.json` to `"status": "PENDING_RECHECK"`, increment `round`, add `"response_path"` and `"previous_review"` fields. Run a targeted recheck:
+```bash
+codex exec \
+  -c 'model_reasoning_effort="xhigh"' \
+  -s danger-full-access \
+  -o .reviews/latest-review.md \
+  "You are doing a TARGETED RECHECK. First read .reviews/handoff.json \
+   to find the previous_review path — read that file for the original \
+   findings and certify conditions. Then read .reviews/response.json \
+   for the author's responses. For each: \
+   FIXED → verify the fix against the original certify condition. \
+   DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
+   ACCEPTED → verify it was applied. \
+   Do NOT raise new findings unless P0 (critical/security). \
+   New observations go in 'Notes for next review' (non-blocking). \
+   End with CERTIFIED or NOT CERTIFIED."
 ```
-Claude writes code → self-review passes → handoff.json
+**The key constraint:** Rechecks are scoped to previous findings only. The reviewer cannot block certification with new P2 observations discovered during recheck. This prevents scope creep and ensures convergence.
+**Convergence:** Max 3 recheck rounds (4 total including initial review). If still NOT CERTIFIED after round 4, escalate to the user with a summary of all open findings. Don't spin indefinitely.
+```
+Claude writes code → self-review passes → handoff.json (round 1)
     ↑                                          |
     |                                          v
-    |                              Codex reviews (xhigh reasoning)
+    |                              Reviewer: FULL REVIEW
+    |                              (structured findings with IDs)
     |                                          |
     |                              CERTIFIED? -+→ YES → Done
     |                                          |
     |                                          +→ NO (findings)
     |                                          |
-    └────────── Claude fixes findings ←────────┘
-                    (repeat until CERTIFIED, or ask user)
+    |                              Claude writes response.json:
+    |                                FIXED / DISPUTED / ACCEPTED
+    |                                          |
+    |                              Reviewer: TARGETED RECHECK
+    |                              (previous findings only, no new P1/P2)
+    |                                          |
+    |                              All resolved? → YES → CERTIFIED
+    |                                          |
+    └────────── Fix rejected items ←───────────┘
+                    (max 3 rechecks, then escalate to user)
 ```
 **Key flags:**
@@ -2952,10 +3024,13 @@ If Claude repeatedly struggles in a codebase area:
 ### How to Update
-Ask Claude any of these:
+Use the `/update-wizard` skill for a guided, selective update experience:
+> `/update-wizard` — full guided update (shows changelog, per-file diff, selective adoption)
+> `/update-wizard check-only` — just show what changed, don't apply anything
+> `/update-wizard force-all` — apply all updates without per-file approval
+Or ask Claude directly:
 > "Check for SDLC wizard updates"
-> "Run me through the SDLC wizard"
-> "What am I missing from the latest wizard?"
 > "Update my SDLC setup"
 **All of these do the same thing:** Claude checks what's new, shows you, and walks you through only what's missing.
@@ -3034,7 +3109,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 ```markdown
-<!-- SDLC Wizard Version: 1.15.0 -->
+<!-- SDLC Wizard Version: 1.18.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->
@@ -3067,12 +3142,12 @@ Every wizard step has a unique ID for tracking:
 | `step-4` | Light hook | 1.0.0 |
 | `step-5` | TDD hook | 1.0.0 |
 | `step-6` | SDLC skill | 1.0.0 |
-| `step-7` | Testing skill | 1.0.0 |
 | `step-8` | CLAUDE.md | 1.0.0 |
 | `step-9` | SDLC/TESTING/ARCH docs | 1.0.0 |
 | `question-git-workflow` | Git workflow preference | 1.2.0 |
 | `step-update-notify` | Optional: CI update notification | 1.13.0 |
 | `step-cross-model-review` | Optional: Cross-model review loop | 1.16.0 |
+| `step-update-wizard` | /update-wizard smart update skill | 1.18.0 |
 When checking for updates, Claude compares user's completed steps against this registry.