npm - agentic-sdlc-wizard - Versions diffs - 1.15.0 → 1.20.0 - Mend

agentic-sdlc-wizard 1.15.0 → 1.20.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12) hide show

package/CHANGELOG.md +45 -0
package/CLAUDE_CODE_SDLC_WIZARD.md +398 -183
package/README.md +51 -35
package/cli/bin/sdlc-wizard.js +14 -3
package/cli/init.js +202 -10
package/cli/templates/hooks/instructions-loaded-check.sh +1 -1
package/cli/templates/hooks/sdlc-prompt-check.sh +16 -5
package/cli/templates/skills/sdlc/SKILL.md +188 -33
package/cli/templates/skills/setup/SKILL.md +178 -0
package/cli/templates/skills/update/SKILL.md +141 -0
package/package.json +1 -1
package/cli/templates/skills/testing/SKILL.md +0 -97

package/CLAUDE_CODE_SDLC_WIZARD.md CHANGED Viewed

@@ -93,8 +93,10 @@ This prevents both false positives (crying wolf) and false negatives (missing re
 **How We Apply This:**
 - Weekly workflow tests new Claude Code versions before recommending upgrade
+- Version-pinned gate: installs the specific CC version and passes it via `path_to_claude_code_executable` so E2E actually runs the new binary
 - Phase A: Does new CC version break SDLC enforcement?
 - Phase B: Do changelog-suggested improvements actually help?
+- Green CI = safe to upgrade. Red = stay on current version until fixed
 - Results shown in PR with statistical confidence
 ---
@@ -209,6 +211,37 @@ When Anthropic provides official plugins or tools that handle something:
 | **Claude Code v2.1.69+** | Required for InstructionsLoaded hook, skill directory variable, and Tasks system |
 | **Git repository** | Files should be committed for team sharing |
+**Blank repos (no CLAUDE.md, no code):** The wizard works on empty repos. Run `npx agentic-sdlc-wizard init` — it installs hooks, skills, and the wizard doc. On first session, the hooks detect missing SDLC files and redirect to `/setup-wizard`, which generates CLAUDE.md, SDLC.md, TESTING.md, and ARCHITECTURE.md interactively. You do NOT need to run Claude's built-in `/init` first — the setup wizard handles everything.
+---
+## Recommended Effort Level
+Claude Code's **effort level** controls how much thinking the model does before responding. Higher effort = deeper reasoning but more tokens.
+| Level | When to Use | How to Set |
+|-------|-------------|------------|
+| `high` | **Default for all SDLC work.** Features, bug fixes, refactoring, tests, reviews | `effort: high` in skill frontmatter (already set) |
+| `max` | LOW confidence, FAILED 2x, architecture decisions, complex debugging, cross-model reviews | `/effort max` (session only — resets next session) |
+**Why `high` is the default:** The `/sdlc` skill sets `effort: high` in its frontmatter, so every SDLC invocation automatically uses high effort. This gives thorough reasoning without the unbounded token cost of `max`.
+**When to escalate to `max`:**
+- You hit LOW confidence on your approach — deeper thinking may find clarity
+- You've failed the same thing twice — something non-obvious is wrong
+- Architecture decisions with wide blast radius
+- Complex multi-system debugging where you need to hold many constraints
+- Cross-model review analysis (reading and evaluating external reviewer findings)
+**How it works:**
+- `/effort max` changes effort for the current session only (resets next session)
+- `effort: high` in SKILL.md frontmatter persists — every `/sdlc` invocation uses `high`
+- You can also type `ultrathink` in any prompt for a single high-effort turn
+**Cost note:** `max` uses significantly more tokens than `high`. Use it when the problem justifies it, not as a default.
+> See also: the **Effort** column in the [Confidence Check table](#confidence-check-required) below for per-confidence-level guidance on when to escalate to `max`.
 ---
 ## Claude Code Feature Updates
@@ -250,7 +283,7 @@ $ARGUMENTS
 **Usage examples**:
 - `/sdlc fix the login validation bug` → `$ARGUMENTS` = "fix the login validation bug"
-- `/testing unit UserService` → `$ARGUMENTS` = "unit UserService"
+- `/sdlc write tests for UserService` → `$ARGUMENTS` = "write tests for UserService"
 **Note**: Skills still auto-invoke via hooks. This is optional polish for manual invocation.
@@ -276,7 +309,7 @@ New built-in commands available to use alongside the wizard:
 ### Skill Effort Frontmatter (v2.1.80+)
-Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc` and `/testing` skills use `effort: high` to ensure Claude gives full attention to SDLC tasks.
+Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc` skill uses `effort: high` to ensure Claude gives full attention to SDLC tasks.
 ### InstructionsLoaded Hook (v2.1.69+)
@@ -615,6 +648,33 @@ Security review depth should match your project's risk profile. During wizard se
 ---
+## Context Management: `/clear` vs `/compact`
+Two tools for managing context — use the right one:
+| | `/compact` | `/clear` |
+|---|---|---|
+| **What it does** | Summarizes conversation, frees space | Resets conversation entirely |
+| **When to use** | Continuing same task, need more room | Switching to an unrelated task |
+| **Preserves** | Summary of decisions + progress | Nothing (fresh start) |
+| **CLAUDE.md** | Re-loaded from disk | Re-loaded from disk |
+| **Hooks/skills/settings** | Unaffected | Unaffected |
+| **Task list** | Persists | Cleared |
+**Rules of thumb:**
+- `/compact` between planning and implementation (plan preserved in summary)
+- `/clear` between unrelated tasks (stale context wastes tokens and misleads Claude)
+- `/clear` after 2+ failed corrections on the same issue (context is polluted with bad approaches — start fresh with a better prompt)
+- After committing a PR, `/clear` before starting the next feature
+**Auto-compact** fires automatically at ~95% context capacity. You don't need to manage this manually — Claude Code handles it. The SDLC skill suggests `/compact` during CI idle time as a "context GC" opportunity.
+**What survives `/compact`:** Key decisions, code changes, task state (as a summary). What can be lost: detailed early-conversation instructions not in CLAUDE.md, specific file contents read long ago.
+**Best practice:** Put persistent instructions in CLAUDE.md (survives both `/compact` and `/clear`), not in conversation.
+---
 ## Example Workflow (End-to-End)
 Here's what a typical task looks like with this system:
@@ -883,21 +943,25 @@ The wizard creates TDD-specific automations that official plugins don't provide:
 ### Step 0.3: Additional Recommendations (Optional)
-After SDLC setup is complete, run `claude-code-setup` for additional recommendations:
+After SDLC setup is complete, run `/claude-automation-recommender` for stack-specific tooling:
 ```
-"Based on your codebase, recommend additional automations"
+/claude-automation-recommender
 ```
-This may suggest:
-- MCP Servers (context7 for docs, Playwright for frontend)
-- Additional hooks (auto-format if Prettier configured)
-- Subagents (security-reviewer if auth code detected)
+**The wizard is an enforcement engine** — it installs working hooks, skills, and process guardrails that run automatically. **The recommender is a suggestion engine** — it analyzes your codebase and suggests additional automations you might want. They're complementary:
-**Claude prompts for each:**
-> "[Detected: Prettier config] Want to add auto-format hook? (y/n)"
+| Category | Wizard Ships | Recommender Suggests |
+|----------|-------------|---------------------|
+| SDLC process (TDD, planning, review) | Enforced via hooks + skills | Not covered |
+| CI workflows (self-heal, PR review) | Templates + docs | Not covered |
+| MCP servers (context7, Playwright, DB) | Not covered | Per-stack suggestions |
+| Auto-formatting hooks (Prettier, ESLint) | Not covered | Per-stack suggestions |
+| Type-checking hooks (tsc, mypy) | Not covered | Per-stack suggestions |
+| Subagent templates (code-reviewer, etc.) | Cross-model review only | 8 templates |
+| Plugin recommendations (LSPs, etc.) | Not covered | Per-stack suggestions |
-These are additive—they don't replace our TDD hooks.
+The recommender's suggestions are additive — they don't replace the wizard's TDD hooks or SDLC enforcement.
 ### Git Workflow Preference
@@ -1376,7 +1440,7 @@ Your answers map to these files:
 | Q4-Q8 (commands) | `CLAUDE.md` - Commands section |
 | Q9-Q10 (infra) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
 | Q11 (test duration) | `SDLC skill` - wait time note |
-| Q12 (E2E) | `testing skill` - testing diamond top |
+| Q12 (E2E) | `TESTING.md` - testing diamond top |
 ---
@@ -1387,7 +1451,6 @@ Create these directories in your project root:
 ```bash
 mkdir -p .claude/hooks
 mkdir -p .claude/skills/sdlc
-mkdir -p .claude/skills/testing
 ```
 **Commit to Git:** Yes! These files should be committed so your whole team gets the same SDLC enforcement. When teammates pull, they get the hooks and skills automatically.
@@ -1506,9 +1569,8 @@ The `allowedTools` array is auto-generated based on your stack detected in Step
 The light hook outputs text that **instructs Claude** to invoke skills:
 ```
-AUTO-INVOKE SKILLS (Claude MUST do this FIRST):
-- implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
-- test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
+AUTO-INVOKE SKILL (Claude MUST do this FIRST):
+- implement/fix/refactor/feature/bug/build/test/TDD → Invoke: Skill tool, skill="sdlc"
 ```
 **This is text-based, not programmatic.** Claude reads this instruction and follows it. When Claude sees your message is an implementation task, it invokes the sdlc skill using the Skill tool. This loads the full SDLC guidance into context.
@@ -1530,7 +1592,7 @@ Create `.claude/hooks/sdlc-prompt-check.sh`:
 ```bash
 #!/bin/bash
 # Light SDLC hook - baseline reminder every prompt (~100 tokens)
-# Full guidance in skills: .claude/skills/sdlc/ and .claude/skills/testing/
+# Full guidance in skill: .claude/skills/sdlc/
 cat << 'EOF'
 SDLC BASELINE:
@@ -1540,10 +1602,8 @@ SDLC BASELINE:
 4. FAILED 2x? STOP and ASK USER
 5. 🛑 ALL TESTS MUST PASS BEFORE COMMIT - NO EXCEPTIONS
-AUTO-INVOKE SKILLS (Claude MUST do this FIRST):
-- implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
-- test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
-- If BOTH match (e.g., "fix the test") → sdlc takes precedence (includes TDD)
+AUTO-INVOKE SKILL (Claude MUST do this FIRST):
+- implement/fix/refactor/feature/bug/build/test/TDD → Invoke: Skill tool, skill="sdlc"
 - DON'T invoke for: questions, explanations, reading/exploring code, simple queries
 - DON'T wait for user to type /sdlc - AUTO-INVOKE based on task type
@@ -1635,7 +1695,7 @@ TodoWrite([
   { content: "Present approach + STATE CONFIDENCE LEVEL", status: "pending", activeForm: "Presenting approach" },
   { content: "Signal ready - user exits plan mode", status: "pending", activeForm: "Awaiting plan approval" },
   // TRANSITION PHASE (After plan mode, before compact)
-  { content: "Update feature docs with discovered gotchas", status: "pending", activeForm: "Updating feature docs" },
+  { content: "Doc sync: update feature docs if code change contradicts or extends documented behavior", status: "pending", activeForm: "Syncing feature docs" },
   { content: "Request /compact before TDD", status: "pending", activeForm: "Requesting compact" },
   // IMPLEMENTATION PHASE (After compact)
   { content: "TDD RED: Write failing test FIRST", status: "pending", activeForm: "Writing failing test" },
@@ -1676,7 +1736,7 @@ TodoWrite([
 **Workflow:**
 1. **Plan Mode** (editing blocked): Research → Write plan file → Present approach + confidence
-2. **Transition** (after approval): Update feature docs → Request /compact
+2. **Transition** (after approval): Doc sync (update feature docs if code contradicts/extends them) → Request /compact
 3. **Implementation** (after compact): TDD RED → GREEN → PASS
 **Before TDD, MUST ask:** "Docs updated. Run `/compact` before implementation?"
@@ -1685,13 +1745,13 @@ TodoWrite([
 Before presenting approach, STATE your confidence:
-| Level | Meaning | Action |
-|-------|---------|--------|
-| HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval |
-| MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties |
-| LOW (<60%) | Not sure | ASK USER before proceeding |
-| FAILED 2x | Something's wrong | STOP. ASK USER immediately |
-| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help |
+| Level | Meaning | Action | Effort |
+|-------|---------|--------|--------|
+| HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
+| MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
+| LOW (<60%) | Not sure | ASK USER before proceeding | Consider `/effort max` |
+| FAILED 2x | Something's wrong | STOP. ASK USER immediately | Try `/effort max` |
+| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | Try `/effort max` |
 ## Self-Review Loop (CRITICAL)
@@ -1724,38 +1784,100 @@ PLANNING → DOCS → TDD RED → TDD GREEN → Tests Pass → Self-Review
 **Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
-**Steps:**
+### Round 1: Initial Review
 1. After self-review passes, write `.reviews/handoff.json`:
    ```jsonc
    {
      "review_id": "feature-xyz-001",
      "status": "PENDING_REVIEW",
+     "round": 1,
      "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
      "review_instructions": "Review for security, edge cases, and correctness",
      "artifact_path": ".reviews/feature-xyz-001/"
    }
    ```
-2. Tell the user to run the independent reviewer:
+2. Run the independent reviewer:
    ```bash
    codex exec \
      -c 'model_reasoning_effort="xhigh"' \
      -s danger-full-access \
      -o .reviews/latest-review.md \
      "You are an independent code reviewer. Read .reviews/handoff.json, \
-      review the listed files, and write your findings to the artifact_path. \
+      review the listed files. Output each finding with: an ID (1, 2, ...), \
+      severity (P0/P1/P2), description, and a 'certify condition' stating \
+      what specific change would resolve it. \
       End with CERTIFIED or NOT CERTIFIED."
    ```
-3. Read `.reviews/latest-review.md` — if CERTIFIED, proceed to CI. If NOT CERTIFIED, fix findings and repeat from step 1.
+3. If CERTIFIED → proceed to CI. If NOT CERTIFIED → go to Round 2.
+### Round 2+: Dialogue Loop
+When the reviewer finds issues, respond per-finding instead of silently fixing everything:
+1. Write `.reviews/response.json`:
+   ```jsonc
+   {
+     "review_id": "feature-xyz-001",
+     "round": 2,
+     "responding_to": ".reviews/latest-review.md",
+     "responses": [
+       { "finding": "1", "action": "FIXED", "summary": "Added missing validation" },
+       { "finding": "2", "action": "DISPUTED", "justification": "This is intentional — see CODE_REVIEW_EXCEPTIONS.md" },
+       { "finding": "3", "action": "ACCEPTED", "summary": "Will add test coverage" }
+     ]
+   }
+   ```
+   - **FIXED**: "I fixed this. Here is what changed." Reviewer verifies.
+   - **DISPUTED**: "This is intentional/incorrect. Here is why." Reviewer accepts or rejects.
+   - **ACCEPTED**: "You are right. Fixing now." (Same as FIXED, batched.)
+2. Update `handoff.json` with `"status": "PENDING_RECHECK"`, increment `round`, add `"response_path"` and `"previous_review"` fields.
+3. Run targeted recheck (NOT a full re-review):
+   ```bash
+   codex exec \
+     -c 'model_reasoning_effort="xhigh"' \
+     -s danger-full-access \
+     -o .reviews/latest-review.md \
+     "You are doing a TARGETED RECHECK. First read .reviews/handoff.json \
+      to find the previous_review path — read that file for the original \
+      findings and certify conditions. Then read .reviews/response.json \
+      for the author's responses. For each: \
+      FIXED → verify the fix against the original certify condition. \
+      DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
+      ACCEPTED → verify it was applied. \
+      Do NOT raise new findings unless P0 (critical/security). \
+      New observations go in 'Notes for next review' (non-blocking). \
+      End with CERTIFIED or NOT CERTIFIED."
+   ```
+4. If CERTIFIED → done. If NOT CERTIFIED (rejected disputes or failed fixes) → fix rejected items and repeat.
+### Convergence
+Max 3 recheck rounds (4 total including initial review). If still NOT CERTIFIED after round 4, escalate to the user with a summary of open findings. Don't spin indefinitely.
 ```
-Self-review passes → write handoff.json → user runs codex exec
-    ^                                              |
-    |                                    CERTIFIED? → YES → CI feedback loop
-    |                                              |
-    |                                              → NO (findings)
-    |                                              |
-    └──────── Fix findings ←───────────────────────┘
-                  (repeat until CERTIFIED, or ask user)
+Self-review passes → handoff.json (round 1, PENDING_REVIEW)
+                            |
+                   Reviewer: FULL REVIEW (structured findings)
+                            |
+                   CERTIFIED? → YES → CI feedback loop
+                            |
+                            NO (findings with IDs + certify conditions)
+                            |
+                   Claude writes response.json:
+                     FIXED / DISPUTED / ACCEPTED per finding
+                            |
+                   handoff.json (round 2+, PENDING_RECHECK)
+                            |
+                   Reviewer: TARGETED RECHECK (previous findings only)
+                            |
+                   All resolved? → YES → CERTIFIED
+                            |
+                            NO → fix rejected items, repeat
+                            (max 3 rechecks, then escalate to user)
 ```
 **Tool-agnostic:** The value is adversarial diversity (different model, different blind spots), not the specific tool. Any competing AI reviewer works.
@@ -1811,7 +1933,7 @@ Debug it. Find root cause. Fix it properly. Tests ARE code.
 ## Flaky Test Prevention
-**Flaky tests are bugs. Period.** They erode trust in the test suite, slow down teams, and mask real regressions.
+**Flaky tests are bugs. Period.** They erode trust in the test suite, slow down teams, and mask real regressions. For a deep dive, see: [How do you Address and Prevent Flaky Tests?](https://softwareautomation.notion.site/How-do-you-Address-and-Prevent-Flaky-Tests-23c539e19b3c46eeb655642b95237dc0)
 ### Principles
@@ -1839,7 +1961,9 @@ Sometimes the flakiness is genuinely in CI infrastructure (runner environment, G
 - **Keep quality gates strict** — the actual pass/fail decision must NOT have `continue-on-error`
 - **Separate "fail the build" from "nice to have"** — a missing PR comment is not a regression
-## CI Feedback Loop (After Commit)
+## CI Feedback Loop — Local Shepherd (After Commit)
+**This is the "local shepherd" — the primary CI fix mechanism.** It runs in your active session with full context. The optional CI Auto-Fix bot (`.github/workflows/ci-autofix.yml`) is a fallback for unattended PRs only. When both are active, the bot detects your local pushes via SHA comparison and skips automatically.
 **The SDLC doesn't end at local tests.** CI must pass too.
@@ -1885,7 +2009,7 @@ Local tests pass -> Commit -> Push -> Watch CI
 - Flaky? Investigate - flakiness is a bug
 - Stuck? ASK USER
-## CI Review Feedback Loop (After CI Passes)
+## CI Review Feedback Loop — Local Shepherd (After CI Passes)
 **CI passing isn't the end.** If CI includes a code reviewer, read and address its suggestions.
@@ -1917,6 +2041,25 @@ CI passes -> Read review suggestions
 - **Ask first**: Present suggestions to user, let them decide which to implement
 - **Skip review feedback**: Ignore CI review suggestions, only fix CI failures
+## Shepherd vs. Bot: Two-Tier CI Fix Model
+| Aspect | Local Shepherd | CI Auto-Fix Bot |
+|--------|---------------|-----------------|
+| **When** | Active session (you're working) | Unattended (pushed and walked away) |
+| **Context** | Full: codebase, conversation, intent | Minimal: `--bare`, 200-line truncated logs |
+| **Cost** | Session tokens (marginal cost ~$0) | Separate API calls ($0.50-$2.00 per fix) |
+| **Noise** | 0 extra commits | 1+ `[autofix N/M]` commits per attempt |
+| **Quality** | High: full diagnosis, targeted fix | Lower: stateless, may repeat same approach |
+| **Speed** | Immediate: fix locally, push once | Delayed: workflow_run trigger + runner queue |
+| **Deconfliction** | N/A (is the primary) | SHA check: skips if branch advanced since failure |
+**The shepherd is the default.** It runs as part of the SDLC checklist above whenever you push from an active session. The bot is optional and only adds value for:
+- Dependabot/Renovate PRs (no human session)
+- PRs where you push and walk away
+- Overnight CI runs
+If you set up the bot, the SHA-based suppression ensures they never conflict.
 ## DRY Principle
 **Before coding:** "What patterns exist I can reuse?"
@@ -1937,111 +2080,6 @@ CI passes -> Read review suggestions
 ---
-## Step 7: Create Testing Skill
-Create `.claude/skills/testing/SKILL.md`:
-````markdown
----
-name: testing
-description: TDD and testing philosophy for writing tests, test-driven development, integration tests, and unit tests. Use this skill when writing tests, doing TDD, or debugging test issues.
-argument-hint: [test type] [target]
----
-# Testing Skill - TDD & Testing Philosophy
-## Task
-$ARGUMENTS
-## Testing Diamond (CRITICAL)
-```
-    /\         ← Few E2E (automated or manual sign-off at end)
-   /  \
-  /    \
- /------\
-|        |     ← MANY Integration (real DB, real cache - BEST BANG FOR BUCK)
-|        |
- \------/
-  \    /
-   \  /
-    \/         ← Few Unit (pure logic only)
-```
-**Why Integration Tests are Best Bang for Buck:**
-- **Speed**: Fast enough to run on every change
-- **Stability**: Touch real code, not mocks that lie
-- **Confidence**: If they pass, production usually works
-- **Real bugs**: Integration tests with real DB catch real bugs
-- Unit tests with mocks can "pass" while production fails
-## Minimal Mocking Philosophy
-| What | Mock? | Why |
-|------|-------|-----|
-| Database | ❌ NEVER | Use test DB or in-memory |
-| Cache | ❌ NEVER | Use isolated test instance |
-| External APIs | ✅ YES | Real calls = flaky + expensive |
-| Time/Date | ✅ YES | Determinism |
-**Mocks MUST come from REAL captured data:**
-- Capture real API response
-- Save to your fixtures directory (Claude will discover where yours is, e.g., `tests/fixtures/`, `test-data/`, etc.)
-- Import in tests
-- Never guess mock shapes!
-## TDD Tests Must PROVE
-| Phase | What It Proves |
-|-------|----------------|
-| RED | Test FAILS → Bug exists or feature missing |
-| GREEN | Test PASSES → Fix works or feature implemented |
-| Forever | Regression protection |
-**WRONG approach:**
-```
-// ❌ Writing test that passes with current (buggy) code
-assert currentBuggyBehavior == currentBuggyBehavior  // pseudocode
-```
-**CORRECT approach:**
-```
-// ✅ Writing test that FAILS with buggy code, PASSES with fix
-assert result.status == 'success'   // pseudocode - adapt to your framework
-assert result.data != null
-```
-## Unit Tests = Pure Logic ONLY
-A function qualifies for unit testing ONLY if:
-- ✅ No database calls
-- ✅ No external API calls
-- ✅ No file system access
-- ✅ No cache calls
-- ✅ Input → Output transformation only
-Everything else needs integration tests.
-## When Stuck on Tests
-1. Add console.logs → Check output
-2. Run single test in isolation
-3. Check fixtures match real API
-4. **STILL stuck?** ASK USER
-## After Session (Capture Learnings)
-If this session revealed testing insights, update the right place:
-- **Testing patterns, gotchas** → `TESTING.md`
-- **Feature-specific test quirks** → Feature docs (`*_PLAN.md`)
-- **General project context** → `CLAUDE.md` (or `/revise-claude-md`)
----
-**Full reference:** TESTING.md
-````
----
 ### Visual Regression Testing (Experimental - Niche Use Cases Only)
 **Most apps don't need this.** Standard E2E testing (Playwright, Cypress) covers 99% of UI testing needs.
@@ -2230,9 +2268,27 @@ These are your full reference docs. Start with stubs and expand over time:
 **Claude follows this automatically.** When task involves "deploy to prod" and confidence is LOW, Claude will ask before proceeding.
+## Post-Deploy Verification
+**After deploying to ANY environment, verify it's working:**
+| Environment | Health Check | Log Command | Smoke Test |
+|-------------|-------------|-------------|------------|
+| Local Dev | `curl http://localhost:3000/health` | `[your dev log command]` | `npm run test:smoke` |
+| Staging | `curl https://staging.example.com/health` | `[your staging log command]` | `[your staging smoke test]` |
+| Production | `curl https://example.com/health` | `[your prod log command, e.g., kubectl logs]` | `[your prod smoke test]` |
+**Monitoring after production deploy:**
+1. Watch error rates for 15 minutes (dashboard: `[your monitoring URL]`)
+2. Check application logs for new errors: `[your log command]`
+3. Run smoke tests against production: `[your smoke test command]`
+4. If issues found → rollback first, THEN start new SDLC loop to fix
+**Claude follows this automatically.** After a deploy task, Claude runs through the Post-Deploy Verification table for the target environment. If any check fails, Claude suggests rollback and a new fix cycle.
 ## Rollback
-If deployment fails or causes issues:
+If deployment fails or post-deploy verification catches issues:
 | Environment | Rollback Command | Notes |
 |-------------|------------------|-------|
@@ -2266,7 +2322,7 @@ If deployment fails or causes issues:
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.15.0 -->
+<!-- SDLC Wizard Version: 1.20.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -2298,7 +2354,7 @@ See `.claude/skills/sdlc/SKILL.md` for the enforced checklist.
 ```markdown
 # Testing Guidelines
-See `.claude/skills/testing/SKILL.md` for TDD philosophy.
+See `TESTING.md` for TDD philosophy.
 ## Test Commands
@@ -2394,7 +2450,6 @@ Verification Results:
 ├── .claude/hooks/tdd-pretool-check.sh    ✓ executable
 ├── .claude/settings.json                  ✓ valid JSON
 ├── .claude/skills/sdlc/SKILL.md          ✓ frontmatter OK
-├── .claude/skills/testing/SKILL.md       ✓ frontmatter OK
 ├── CLAUDE.md                              ✓ exists
 ├── SDLC.md                                ✓ exists
 └── TESTING.md                             ✓ exists
@@ -2422,14 +2477,14 @@ All checks passed! Setup complete.
 |------|-----------------|
 | "What files handle auth?" | Answers without invoking skills |
 | "Add a logout button" | Auto-invokes sdlc skill, uses TodoWrite |
-| "Write tests for login" | Auto-invokes testing skill |
+| "Write tests for login" | Auto-invokes sdlc skill |
 **What happens automatically:**
 | You Do | System Does |
 |--------|-------------|
 | Ask to implement something | SDLC skill auto-invokes, TodoWrite starts |
-| Ask to write tests | Testing skill auto-invokes |
+| Ask to write tests | SDLC skill auto-invokes |
 | Claude tries to edit code | TDD reminder fires |
 | Task completes | Compliance check runs |
@@ -2494,7 +2549,7 @@ All checks passed! Setup complete.
 |--------|---------|
 | Free context after planning | `/compact` |
 | Enter planning mode | Claude suggests or `/plan` |
-| Run specific skill | `/sdlc` or `/testing` |
+| Run specific skill | `/sdlc` |
 ---
@@ -2525,7 +2580,7 @@ You've successfully set up the system when:
 - [ ] Light hook fires every prompt (you see SDLC BASELINE in responses)
 - [ ] Claude auto-invokes sdlc skill for implementation tasks
-- [ ] Claude auto-invokes testing skill for test tasks
+- [ ] Claude auto-invokes sdlc skill for all tasks
 - [ ] Claude uses TodoWrite to track progress
 - [ ] Claude states confidence levels
 - [ ] Claude asks for clarification when LOW confidence
@@ -2578,9 +2633,17 @@ Want me to file these? (yes/no/not now)
 ## Going Further
-### Create Feature Plan Docs
+### Feature Documentation
+Keep feature docs alongside code. Three patterns, use what fits:
+| Pattern | When to Use | Example |
+|---------|-------------|---------|
+| `*_PLAN.md` / `*_DOCS.md` | Per-feature living docs | `AUTH_DOCS.md`, `PAYMENTS_PLAN.md` |
+| `docs/decisions/NNN-title.md` (ADR) | Architecture decisions that need rationale | `docs/decisions/001-use-postgres.md` |
+| `docs/features/name.md` | Feature docs in a `docs/` directory | `docs/features/auth.md` |
-For each major feature, create `FEATURE_NAME_PLAN.md`:
+**Feature doc template:**
 ```markdown
 # Feature Name
@@ -2598,7 +2661,36 @@ Things that can trip you up.
 What's planned but not done.
 ```
-Claude will read these during planning and update them with discoveries.
+**ADR (Architecture Decision Record) template** — for decisions that need context:
+```markdown
+# ADR-NNN: Decision Title
+## Status
+Accepted | Superseded by ADR-NNN | Deprecated
+## Context
+What is the problem? What forces are at play?
+## Decision
+What did we decide and why?
+## Consequences
+What are the trade-offs? What becomes easier/harder?
+```
+Store ADRs in `docs/decisions/`. Number sequentially. Claude reads these during planning to understand why things are built the way they are.
+**Keeping docs in sync with code:**
+Docs drift when code changes but docs don't. The SDLC skill's planning phase detects this:
+- During planning, Claude reads feature docs for the area being changed
+- If the code change contradicts what the doc says, Claude updates the doc
+- The "After Session" step routes learnings to the right doc
+- Stale docs cause low confidence — if Claude struggles, the doc may need updating
+**CLAUDE.md health:** Run `/claude-md-improver` periodically (quarterly or after major changes). It audits CLAUDE.md specifically — structure, clarity, completeness (6 criteria, 100-point rubric). It does NOT cover feature docs, TESTING.md, or ADRs — the SDLC workflow handles those.
 ### Expand TESTING.md
@@ -2622,6 +2714,10 @@ Add project-specific guidance to skills:
 - Preferred patterns
 - Architecture decisions
+### Complementary Tools
+The wizard handles SDLC process enforcement. For stack-specific tooling, run `/claude-automation-recommender` — it suggests MCP servers, formatting hooks, type-checking hooks, subagent templates, and plugins based on your detected tech stack. See [Step 0.3](#step-03-additional-recommendations-optional) for the full comparison.
 ---
 ## Testing AI Apps: What's Different
@@ -2689,6 +2785,49 @@ _Sources: [Confident AI](https://www.confident-ai.com/blog/llm-testing-in-2024-t
 ---
+## Token Efficiency
+Practical techniques to reduce token consumption without sacrificing quality.
+### Monitor Costs
+| Tool | What It Shows | When to Use |
+|------|---------------|-------------|
+| `/cost` | Session total: USD, API time, code changes | After a session to review spend |
+| `/context` | What's consuming context window space | When hitting context limits |
+| Status line | Real-time `cost.total_cost_usd` + token counts | Continuous monitoring |
+### Reduce Consumption
+| Technique | Savings | How |
+|-----------|---------|-----|
+| `/compact` between phases | ~40-60% context | Plan → compact → implement (plan preserved) |
+| `/clear` between tasks | 100% context reset | No stale context from prior work |
+| Delegate verbose ops to subagents | Separate context | `Agent` tool returns summary, not full output |
+| Use skills for on-demand knowledge | Smaller base context | Skills load only when invoked |
+| Scope investigations narrowly | Fewer tokens read | "investigate auth module" > "investigate codebase" |
+| `--effort low` for simple tasks | ~50% thinking tokens | Simple renames, config changes |
+### CI Cost Control
+Add `--max-budget-usd` to CI workflows as a safety net:
+```yaml
+claude_args: "--max-budget-usd 5.00 --max-turns 30"
+```
+| Flag | Purpose |
+|------|---------|
+| `--max-budget-usd` | Hard dollar cap per CI invocation |
+| `--max-turns` | Limit agentic turns (prevents infinite loops) |
+| `--effort` | `low`/`medium`/`high` controls thinking depth |
+### Advanced: OpenTelemetry
+For organization-wide cost tracking, enable `CLAUDE_CODE_ENABLE_TELEMETRY=1`. This exports per-request `cost_usd`, `input_tokens`, `output_tokens` to any OTLP-compatible backend (Datadog, Honeycomb, Prometheus).
+---
 ## CI/CD Gotchas
 Common pitfalls when automating AI-assisted development workflows.
@@ -2750,7 +2889,9 @@ Claude: [fetches via gh api, discusses with you interactively]
 This is optional - skip if you prefer fresh reviews only.
-### CI Auto-Fix Loop (Optional)
+### CI Auto-Fix Loop (Optional — Bot Fallback)
+> **Two-tier model:** The SDLC skill's CI loops (above) are the "local shepherd" — they handle CI fixes during active sessions. This bot is the second tier: an unattended fallback for when no one is watching. The bot includes SHA-based suppression — if you push a fix locally before the bot runs, it skips automatically.
 Automatically fix CI failures and PR review findings. Claude reads the error context, fixes the code, commits, and re-triggers CI. Loops until CI passes AND review has no findings at your chosen level, or max retries hit.
@@ -2852,13 +2993,14 @@ Use an independent AI model from a different company as a code reviewer. The aut
 {
   "review_id": "feature-xyz-001",
   "status": "PENDING_REVIEW",
+  "round": 1,
   "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
   "review_instructions": "Review for security, edge cases, and correctness",
   "artifact_path": ".reviews/feature-xyz-001/"
 }
 ```
-3. Run the independent reviewer:
+3. Run the independent reviewer (Round 1 — full review). These commands use your Codex default model — configure it to the latest, most capable model available:
 ```bash
 codex exec \
@@ -2866,23 +3008,95 @@ codex exec \
   -s danger-full-access \
   -o .reviews/latest-review.md \
   "You are an independent code reviewer. Read .reviews/handoff.json, \
-   review the listed files, and write your findings to the artifact_path. \
+   review the listed files. Output each finding with: an ID (1, 2, ...), \
+   severity (P0/P1/P2), description, and a 'certify condition' stating \
+   what specific change would resolve it. \
    End with CERTIFIED or NOT CERTIFIED."
 ```
-**The Loop:**
+4. If CERTIFIED → done. If NOT CERTIFIED → enter the dialogue loop.
+**The Dialogue Loop (Round 2+):**
+Instead of silently fixing everything and resubmitting for another full review, respond to each finding:
+```jsonc
+// .reviews/response.json
+{
+  "review_id": "feature-xyz-001",
+  "round": 2,
+  "responding_to": ".reviews/latest-review.md",
+  "responses": [
+    {
+      "finding": "1",
+      "action": "FIXED",
+      "summary": "Added missing mocking table to SKILL.md",
+      "evidence": "git diff shows table at SKILL.md:195-210"
+    },
+    {
+      "finding": "2",
+      "action": "DISPUTED",
+      "justification": "The upgrade path cleanup runs in init.js:205. Verified with test-cli.sh test 29.",
+      "evidence": "tests/test-cli.sh:583-600"
+    },
+    {
+      "finding": "3",
+      "action": "ACCEPTED",
+      "summary": "Will add EVAL_PROMPT_VERSION bump"
+    }
+  ]
+}
 ```
-Claude writes code → self-review passes → handoff.json
+Three response types:
+- **FIXED**: "I fixed this. Here is what changed." Reviewer verifies the fix.
+- **DISPUTED**: "This is intentional/incorrect. Here is why." Reviewer accepts or rejects the reasoning.
+- **ACCEPTED**: "You are right. Fixing now." (Same outcome as FIXED, used when batching fixes.)
+Then update `handoff.json` to `"status": "PENDING_RECHECK"`, increment `round`, add `"response_path"` and `"previous_review"` fields. Run a targeted recheck:
+```bash
+codex exec \
+  -c 'model_reasoning_effort="xhigh"' \
+  -s danger-full-access \
+  -o .reviews/latest-review.md \
+  "You are doing a TARGETED RECHECK. First read .reviews/handoff.json \
+   to find the previous_review path — read that file for the original \
+   findings and certify conditions. Then read .reviews/response.json \
+   for the author's responses. For each: \
+   FIXED → verify the fix against the original certify condition. \
+   DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
+   ACCEPTED → verify it was applied. \
+   Do NOT raise new findings unless P0 (critical/security). \
+   New observations go in 'Notes for next review' (non-blocking). \
+   End with CERTIFIED or NOT CERTIFIED."
+```
+**The key constraint:** Rechecks are scoped to previous findings only. The reviewer cannot block certification with new P2 observations discovered during recheck. This prevents scope creep and ensures convergence.
+**Convergence:** Max 3 recheck rounds (4 total including initial review). If still NOT CERTIFIED after round 4, escalate to the user with a summary of all open findings. Don't spin indefinitely.
+```
+Claude writes code → self-review passes → handoff.json (round 1)
     ↑                                          |
     |                                          v
-    |                              Codex reviews (xhigh reasoning)
+    |                              Reviewer: FULL REVIEW
+    |                              (structured findings with IDs)
     |                                          |
     |                              CERTIFIED? -+→ YES → Done
     |                                          |
     |                                          +→ NO (findings)
     |                                          |
-    └────────── Claude fixes findings ←────────┘
-                    (repeat until CERTIFIED, or ask user)
+    |                              Claude writes response.json:
+    |                                FIXED / DISPUTED / ACCEPTED
+    |                                          |
+    |                              Reviewer: TARGETED RECHECK
+    |                              (previous findings only, no new P1/P2)
+    |                                          |
+    |                              All resolved? → YES → CERTIFIED
+    |                                          |
+    └────────── Fix rejected items ←───────────┘
+                    (max 3 rechecks, then escalate to user)
 ```
 **Key flags:**
@@ -2952,10 +3166,13 @@ If Claude repeatedly struggles in a codebase area:
 ### How to Update
-Ask Claude any of these:
+Use the `/update-wizard` skill for a guided, selective update experience:
+> `/update-wizard` — full guided update (shows changelog, per-file diff, selective adoption)
+> `/update-wizard check-only` — just show what changed, don't apply anything
+> `/update-wizard force-all` — apply all updates without per-file approval
+Or ask Claude directly:
 > "Check for SDLC wizard updates"
-> "Run me through the SDLC wizard"
-> "What am I missing from the latest wizard?"
 > "Update my SDLC setup"
 **All of these do the same thing:** Claude checks what's new, shows you, and walks you through only what's missing.
@@ -3004,21 +3221,19 @@ Claude reads the CHANGELOG to show you what's new **before** applying anything.
 ```
 Claude: "Fetching CHANGELOG to check for updates..."
-Your version: 1.8.0
-Latest version: 1.13.0
+Your version: X.Y.0
+Latest version: X.Z.0
-What's new since 1.8.0:
-- v1.13.0: Self-update improvements, optional CI notification
-- v1.12.0: Full system audit, apply step fixes
-- v1.11.0: Stale output cleanup, error handling
-- v1.10.0: "Prove It's Better" CI automation
-- v1.9.0: Workflow consolidation (6 → 5 workflows)
+What's new since X.Y.0:
+- vX.Z.0: Latest features and improvements
+- vX.Y+1.0: Previous version changes
+  (... entries from CHANGELOG between your version and latest ...)
 Now checking your setup against latest wizard...
 ✓ Hooks - up to date
 ✓ Skills - content differs (update available)
-✗ step-update-notify - NOT DONE (new in v1.13.0, optional)
+✗ step-update-notify - NOT DONE (new in vX.Z.0, optional)
 Summary:
 - 1 file update available (SDLC skill)
@@ -3034,7 +3249,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 ```markdown
-<!-- SDLC Wizard Version: 1.15.0 -->
+<!-- SDLC Wizard Version: 1.20.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->
@@ -3067,12 +3282,12 @@ Every wizard step has a unique ID for tracking:
 | `step-4` | Light hook | 1.0.0 |
 | `step-5` | TDD hook | 1.0.0 |
 | `step-6` | SDLC skill | 1.0.0 |
-| `step-7` | Testing skill | 1.0.0 |
 | `step-8` | CLAUDE.md | 1.0.0 |
 | `step-9` | SDLC/TESTING/ARCH docs | 1.0.0 |
 | `question-git-workflow` | Git workflow preference | 1.2.0 |
 | `step-update-notify` | Optional: CI update notification | 1.13.0 |
 | `step-cross-model-review` | Optional: Cross-model review loop | 1.16.0 |
+| `step-update-wizard` | /update-wizard smart update skill | 1.18.0 |
 When checking for updates, Claude compares user's completed steps against this registry.