agentic-sdlc-wizard 1.44.0 → 1.45.0

@@ -13,7 +13,7 @@
  "name": "sdlc-wizard",
  "source": ".",
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
- "version": "1.44.0",
+ "version": "1.45.0",
  "author": {
  "name": "Stefan Ayala"
  },
@@ -1,6 +1,6 @@
  {
  "name": "sdlc-wizard",
- "version": "1.44.0",
+ "version": "1.45.0",
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
  "author": {
  "name": "Stefan Ayala",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,39 @@ All notable changes to the SDLC Wizard.
4
4
 
5
5
  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
6
6
 
7
+ ## [1.45.0] - 2026-04-27
8
+
9
+ ### Added
10
+
11
+ - **PreCompact path (c) — SHA-ancestry self-heal** (closes #257). Consumer reported PreCompact blocking `/compact` even when the cited Codex review WAS actually CERTIFIED — the user just forgot to bump `handoff.json status` from `PENDING_RECHECK` → `CERTIFIED`. Existing self-heals don't cover this solo-developer pattern: path (a) needs `pr_number`, path (b) needs `mtime > 14d`. New path (c) heals when: handoff is `PENDING_*` with no `pr_number`, every SHA cited in `fixes_applied[]` is reachable from HEAD (`git merge-base --is-ancestor`), AND `.reviews/latest-review.md` contains `CERTIFIED` without `NOT CERTIFIED`. Path (b) still runs if (c) abstains (no SHAs / no review file).
12
+ - **Robust extraction**: awk extracts the `fixes_applied[]` block via bracket-depth + escape-aware string-literal tracking. `]` inside string literals (e.g. `"[x] FIXED..."` markdown checkboxes, `"...\"]"` escaped-quote-bracket) does NOT terminate the array prematurely.
13
+ - **UUID resilience**: strips 8-4-4-4-12 hex UUIDs before SHA extraction so ticket IDs in fixes_applied entries (Linear, Jira, mission UUIDs) don't false-block the heal.
14
+ - **Phantom SHA gate**: every cited SHA must pass `git merge-base --is-ancestor` against HEAD. Phantom SHAs (typos, references to other repos) correctly fail and block the heal.
15
+ - 9 new test-hooks tests (positive heal, phantom blocks, NOT CERTIFIED blocks, missing review.md blocks, partial coverage blocks, fall-through to stale, markdown-checkbox bracket, UUID alongside real SHA, escaped-quote bracket). Codex round 3 CERTIFIED 10/10 (rounds 1-2 surfaced bracket-extraction edge cases — markdown `[x]`, escaped quotes — and UUID false-block; all fixed).
16
+
17
+ ### Files
18
+
19
+ - `hooks/precompact-seam-check.sh` — new path (c) block with depth-counted + escape-aware awk extraction, sed UUID strip, ancestry check
20
+ - `tests/test-hooks.sh` — `_precompact_init_repo_with_commit` helper + 9 new path (c) tests
21
+
22
+ ## [1.44.1] - 2026-04-27
23
+
24
+ ### Fixed
25
+
26
+ - **Autocompact compound-misconfig detection** — closes #207. Consumer reported autocompact firing at 12% context on a fresh `opus[1m]` session because they set BOTH `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` AND `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (a natural misreading of `CLAUDE_CODE_SDLC_WIZARD.md:1008`'s "or"-joined cell). The two compound: `30% × 400000 = 120000 tokens ≈ 12% of 1M`.
27
+ - **Doc fix**: `CLAUDE_CODE_SDLC_WIZARD.md` 1M-vs-200K table now writes `**OR** ... (pick one)` and adds a `> ⚠ Do NOT set both` callout that explains the compound math and points at the runtime detection.
28
+ - **Runtime detection**: `instructions-loaded-check.sh` (InstructionsLoaded hook) reads `.claude/settings.json` for both env vars, computes the effective trigger, and warns with the math when both are set — diagnosable from the warning alone.
29
+ - **Shipped skill drift**: `skills/sdlc/SKILL.md` was still calling `opus[1m]` the "default" (stale post-#198) AND repeating the same ambiguous "30 or 400000" wording it ships to consumers. Both fixed: opus[1m] now framed as opt-in with #198 reference; autocompact tuning line says "pick ONE of: ... OR ... (do NOT set both)".
30
+ - 4 new test-hooks tests (warns / silent on PCT-only / silent on WINDOW-only / shows effective trigger), 3 new test-doc-consistency tests (wizard doc + sdlc skill regression guards), size-cap test fixture extended to include the new branch (cap raised 1500 → 1700 to accommodate). Codex round 2 CERTIFIED 9/10 (round 1 surfaced the size-cap, shipped-skill drift, and InstructionsLoaded vs SessionStart wording — all fixed).
31
+
32
+ ### Files
33
+
34
+ - `hooks/instructions-loaded-check.sh` — new compound-misconfig detection block (single-line warning with full env var names + effective trigger math)
35
+ - `CLAUDE_CODE_SDLC_WIZARD.md` — line 1008 alternatives clarification + `> ⚠ Do NOT set both` callout
36
+ - `skills/sdlc/SKILL.md` — `opus[1m]` reframed `Default` → `Opt-in` (matches wizard doc post-#198); autocompact tuning line now warns against the compound config
37
+ - `tests/test-hooks.sh` — 4 new tests + size-cap fixture extended + cap raised
38
+ - `tests/test-doc-consistency.sh` — 3 new regression guards (wizard doc + sdlc skill)
39
+
7
40
  ## [1.44.0] - 2026-04-27
8
41
 
9
42
  ### Fixed
@@ -1005,7 +1005,9 @@ Claude Code supports both 200K and 1M context windows. **`opus[1m]` is an opt-in
1005
1005
  | **Cost** | Standard pricing | Anthropic currently lists the 1M window at standard pricing across the full context for supported Opus/Sonnet models — **verify current rates at [docs.anthropic.com/pricing](https://docs.anthropic.com/)** before assuming no premium |
1006
1006
  | **Auto-mode** | **Enabled** — Claude Code chooses model per turn | **Disabled** — top-level `model` tells CC you've chosen explicitly |
1007
1007
  | **Auto-compact** | Default ~95% works well | Fires at ~76K by default ([issue #34332](https://github.com/anthropics/claude-code/issues/34332)) — pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` |
1008
- | **Suggested override (if you pin)** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` or `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` |
1008
+ | **Suggested override (if you pin)** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` **OR** `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (pick one) |
1009
+
1010
+ > **⚠ Do NOT set both.** `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` and `CLAUDE_CODE_AUTO_COMPACT_WINDOW` are alternatives, not complementary. Setting both compounds: `30% × 400000 = 120000` tokens, which is ~12% of a 1M window — autocompact fires almost immediately, destroying the headroom you opted in for. Pick one knob: either lower the trigger percentage (`PCT_OVERRIDE=30`) on the model's default 1M window, OR cap the working window (`AUTO_COMPACT_WINDOW=400000`) at the model's default 95% trigger. The `instructions-loaded-check.sh` `InstructionsLoaded` hook (fires on session start/resume) detects this misconfig and prints the effective trigger so you can debug from the warning alone (#207).
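The arithmetic in the callout can be checked with a few lines of shell. This is a minimal sketch, assuming the two settings combine multiplicatively as the callout describes; the function name `effective_trigger` is hypothetical, and the 95% default trigger and 1M window are the figures quoted in the text rather than values read from Claude Code itself.

```shell
# effective_trigger PCT WINDOW
# Prints the token count at which autocompact would fire, assuming the
# trigger is PCT percent of WINDOW (the model described in the callout).
effective_trigger() {
  echo $(( $1 * $2 / 100 ))
}

# Each knob on its own leaves plenty of headroom on a 1M window.
echo "PCT_OVERRIDE=30 only, 1M window:          $(effective_trigger 30 1000000)"
echo "AUTO_COMPACT_WINDOW=400000 only, 95%:     $(effective_trigger 95 400000)"

# Both together compound to the early trigger reported in #207.
echo "both set (30% of 400000):                 $(effective_trigger 30 400000)"
```

Either setting alone puts the trigger at 300000 or 380000 tokens; only the combination drops it to 120000.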
1009
1011
 
1010
1012
  **Why `opus[1m]` is opt-in (issue #198):**
1011
1013
  - **Pinning disables auto-mode.** Max-plan users pay for Claude Code's per-turn model selection (Sonnet for cheap tasks, Opus for hard ones, plus weekly-limit smoothing). A top-level `model` gives that up.
@@ -2715,6 +2717,53 @@ Options:
2715
2717
 
2716
2718
  ---
2717
2719
 
2720
+ ### Browser Tooling Policy
2721
+
2722
+ Three different jobs, three different tools. Conflating them is the source of recurring agent failures — `Playwright MCP` for an auth-heavy registrar dashboard wastes a session, browser-use for a deterministic regression test gives flaky CI.
2723
+
2724
+ | Tool | Job | Profile model | When to pick |
2725
+ |------|-----|---------------|--------------|
2726
+ | **Playwright tests** | Deterministic regression suite, CI/release gate | Isolated by design — each test gets a clean browser context per [Playwright docs](https://playwright.dev/docs/browser-contexts) | Asserting expected user flows; running on every PR; gating deploy |
2727
+ | **Playwright MCP** | Live browser debugging, visual QA, DOM inspection | Default mode uses a **persistent Playwright-managed profile** at `ms-playwright/mcp-{channel}-{workspace-hash}` ([docs](https://playwright.dev/docs/getting-started-mcp#user-profile)) — NOT the user's regular Chrome profile. Other modes: `--isolated` (ephemeral context per session), `--user-data-dir=PATH` (caller-supplied dir), CDP attach (`--cdp-endpoint`), extension mode (attach to user's running browser tab) | One-off "look at this page" / "click this button and tell me what happens"; visual verification mid-session |
2728
+ | **Real-browser tooling** (browser-use, CDP-attach, Chrome profile) | Authenticated, profile-dependent, stateful operator flows | **When configured for real-browser mode** (e.g., browser-use `Browser.from_system_chrome()` or CLI `--profile`/`connect`), uses the user's actual Chrome profile (cookies, extensions, logged-in sessions). Default browser-use CLI runs headless Chromium — opt into the real-profile mode explicitly | Registrar dashboards (Porkbun, GoDaddy), DNS setup, cloud-provider consoles, wallet-adjacent Web3 flows, logged-in admin panels — anywhere preserving cookies/profile/extensions matters more than clean isolation |
2729
+
2730
+ **The core insight:** Playwright tests' isolation is a *feature*, not a bug. Playwright MCP's persistent-managed-profile default is also a feature — it preserves session continuity across debug interactions in a SINGLE agent. The collision case is concurrent agents (#251) sharing the same managed profile. Real-browser tooling is the right call only when the task IS the user's authenticated session, and only when explicitly configured to attach to that profile.
2731
+
2732
+ #### When to recommend real-browser tooling (#225)
2733
+
2734
+ Trigger examples — if the task description includes any of these, suggest real-browser tooling (browser-use or CDP-attach) over Playwright MCP:
2735
+
2736
+ - Registrar dashboards (domain purchase, DNS records, nameserver changes)
2737
+ - DNS setup / DNSLink / custom-domain configuration
2738
+ - Cloud/provider dashboards (AWS console, Cloudflare, Vercel, registrars)
2739
+ - Wallet-adjacent Web3 flows (token approvals, contract interactions in a logged-in MetaMask)
2740
+ - Logged-in admin panels (Stripe, Vercel, GitHub admin pages requiring 2FA-cached session)
2741
+ - Anywhere preserving cookies/profile/extensions matters more than clean isolation
2742
+
2743
+ These all share a property: the agent's job IS the authenticated session. A clean automation browser is the wrong model.
2744
+
2745
+ #### Playwright MCP profile-lock policy (#251)
2746
+
2747
+ `Playwright MCP`'s default mode reuses a single persistent managed profile (`ms-playwright/mcp-{channel}-{workspace-hash}`) across stdio sessions — which is the right call for single-agent debugging because it preserves session continuity across calls, but breaks down when **multiple agents or MCP clients run concurrently** (two CC sessions, one CC + one Codex, etc.). Concurrent stdio sessions collide on the same Chrome user-data directory and corrupt each other's session state.
2748
+
2749
+ **Upstream Playwright rejected default-isolated as the global default.** See [microsoft/playwright#40419](https://github.com/microsoft/playwright/issues/40419) and the discussion on [microsoft/playwright#40420](https://github.com/microsoft/playwright/pull/40420) — maintainer feedback: *"That's unfortunately very breaking."* Changing the default would silently break every existing single-agent setup that relies on the persistent managed profile.
2750
+
2751
+ **Wizard policy (per-user, opt-in at the wizard layer, not upstream):**
2752
+
2753
+ - **Single-agent / single MCP client (default):** Use Playwright MCP's default persistent-managed-profile mode. No special config required.
2754
+ - **Concurrent agents / multiple MCP clients:** Pick one of these per client to avoid profile-lock collisions: (a) `--isolated` (ephemeral context per session, no persistence), (b) `--user-data-dir=$TMPDIR/playwright-mcp-$AGENT_ID` (caller-supplied dir, isolated per agent), or (c) `--cdp-endpoint` to attach each agent to a separately-launched browser. None of these require an upstream breaking change.
2755
+ - **Real-browser / profile-dependent flows:** Don't use Playwright MCP at all — use real-browser tooling explicitly configured to attach to the user's profile (e.g., `browser-use` with `Browser.from_system_chrome()` or CLI `--profile`). The task is the session, not isolated automation.
2756
+
2757
+ This rule is per-workflow, not global. Setup wizard does NOT auto-configure isolated profiles — adoption is explicit, gated on the user signaling concurrent-agent intent.
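For the concurrent-agent case, option (b) can be sketched as a small shell helper that derives a separate profile directory for each agent. The helper name `playwright_mcp_profile_dir` and the agent names are hypothetical; only the `--user-data-dir` flag and the `$TMPDIR/playwright-mcp-$AGENT_ID` path shape come from the policy above. The fallback to `/tmp` when `TMPDIR` is unset is an added assumption.

```shell
# Hypothetical helper: build one profile directory per agent, following
# the $TMPDIR/playwright-mcp-$AGENT_ID shape from the policy.
playwright_mcp_profile_dir() {
  agent_id="$1"
  # Assumption: fall back to /tmp when TMPDIR is unset.
  base="${TMPDIR:-/tmp}"
  # Drop a trailing slash so the joined path has no "//".
  echo "${base%/}/playwright-mcp-${agent_id}"
}

# Print the flag each agent would pass to its own Playwright MCP server.
for agent in agent-a agent-b; do
  echo "--user-data-dir=$(playwright_mcp_profile_dir "$agent")"
done
```

Because the directory depends only on the agent identifier, two clients with different identifiers never share a Chrome user-data directory, while each one still keeps its own state between calls.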
2758
+
2759
+ #### Anti-patterns
2760
+
2761
+ - **Using Playwright MCP for registrar dashboards** — its persistent managed profile is NOT your real Chrome profile, so your registrar's logged-in session/2FA cookies aren't there. You'll be re-logging in on every interaction. Use real-browser tooling configured for your real Chrome profile instead.
2762
+ - **Using profile-coupled / stateful browser tooling for deterministic CI tests** — when browser-use (or any tool) is configured to use a real Chrome profile, cached state, extension chrome, and stale cookies pollute the test. Use Playwright tests with isolated browser contexts for CI.
2763
+ - **Setting Playwright MCP `--isolated` globally as a default** — breaks single-agent flows that rely on the persistent managed profile for session continuity across debug interactions. Upstream Playwright rejected this for the same reason. Make it explicit per-workflow when concurrent agents are running.
2764
+
2765
+ ---
2766
+
2718
2767
  ## Step 8: Create CLAUDE.md
2719
2768
 
2720
2769
  Create `CLAUDE.md` in your project root. This is your project-specific configuration:
@@ -2918,7 +2967,7 @@ If deployment fails or post-deploy verification catches issues:
 
  **SDLC.md:**
  ```markdown
- <!-- SDLC Wizard Version: 1.44.0 -->
+ <!-- SDLC Wizard Version: 1.45.0 -->
  <!-- Setup Date: [DATE] -->
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  <!-- Git Workflow: [PRs or Solo] -->
@@ -3983,7 +4032,7 @@ Walk through updates? (y/n)
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 
  ```markdown
- <!-- SDLC Wizard Version: 1.44.0 -->
+ <!-- SDLC Wizard Version: 1.45.0 -->
  <!-- Setup Date: 2026-01-24 -->
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  <!-- Git Workflow: PRs -->
@@ -170,6 +170,28 @@ fi
  # this hook and model-effort-check.sh both fire on SessionStart, so two checks
  # would double-print the nudge and risk drifting out of sync.
 
+ # Autocompact compound-misconfig check (#207). Setting BOTH
+ # CLAUDE_AUTOCOMPACT_PCT_OVERRIDE and CLAUDE_CODE_AUTO_COMPACT_WINDOW
+ # compounds — e.g. 30% × 400000 = 120000 token trigger, which on a 1M
+ # window fires at ~12% of context. The wizard doc lists them as
+ # alternatives ("PCT_OVERRIDE=30 OR AUTO_COMPACT_WINDOW=400000") but the
+ # "or" is easy to misread, and the consumer in #207 hit autocompact at
+ # 12% in a fresh session. Surface the misconfig with the effective
+ # trigger so it's diagnosable from the warning alone.
+ SETTINGS_JSON="$PROJECT_DIR/.claude/settings.json"
+ if [ -f "$SETTINGS_JSON" ]; then
+ AC_PCT=$(grep -o '"CLAUDE_AUTOCOMPACT_PCT_OVERRIDE"[[:space:]]*:[[:space:]]*"[0-9]*"' "$SETTINGS_JSON" \
+ | head -1 | sed 's/.*"\([0-9]*\)"$/\1/')
+ AC_WIN=$(grep -o '"CLAUDE_CODE_AUTO_COMPACT_WINDOW"[[:space:]]*:[[:space:]]*"[0-9]*"' "$SETTINGS_JSON" \
+ | head -1 | sed 's/.*"\([0-9]*\)"$/\1/')
+ if [ -n "$AC_PCT" ] && [ -n "$AC_WIN" ]; then
+ # Effective trigger = pct% of window (integer math; both pure digits per the regex).
+ AC_TRIGGER=$(( AC_PCT * AC_WIN / 100 ))
+ AC_PCT_OF_1M=$(( AC_TRIGGER * 100 / 1000000 ))
+ echo "WARNING: autocompact compound misconfig — CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=${AC_PCT} AND CLAUDE_CODE_AUTO_COMPACT_WINDOW=${AC_WIN} both set in .claude/settings.json compound to ${AC_TRIGGER} tokens (~${AC_PCT_OF_1M}% of 1M). Pick one — see wizard doc '1M vs 200K' (#207)."
+ fi
+ fi
+
  # Dual-channel install check (#181) — nudge when CLI skills + Claude plugin both present.
  # #238: silenced once the user opts in via an ack sentinel. Sentinel is per-host
  # (lives under $SDLC_WIZARD_CACHE_DIR/dual-channel-acknowledged) since the dual
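To see what input the compound-misconfig check expects, the sketch below runs the same grep/sed extraction against a throwaway settings file. The helper name `read_env_digits` and the `env` wrapper object in the sample file are assumptions made for illustration. The regex only matches values written as quoted digits, so an unquoted number such as `30` would not be picked up by this pattern.

```shell
# Hypothetical helper: pull a quoted all-digit value for KEY out of FILE,
# using the same grep/sed pattern as the hook.
read_env_digits() {
  key="$1"; file="$2"
  grep -o "\"$key\"[[:space:]]*:[[:space:]]*\"[0-9]*\"" "$file" \
    | head -1 | sed 's/.*"\([0-9]*\)"$/\1/'
}

# Throwaway settings file; the "env" layout is an assumed example.
demo_dir=$(mktemp -d)
cat > "$demo_dir/settings.json" <<'EOF'
{
  "env": {
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30",
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "400000"
  }
}
EOF

pct=$(read_env_digits CLAUDE_AUTOCOMPACT_PCT_OVERRIDE "$demo_dir/settings.json")
win=$(read_env_digits CLAUDE_CODE_AUTO_COMPACT_WINDOW "$demo_dir/settings.json")

# Same condition as the hook: only react when both values were found.
if [ -n "$pct" ] && [ -n "$win" ]; then
  trigger=$(( pct * win / 100 ))
  echo "both set: effective trigger ${trigger} tokens"
fi

rm -rf "$demo_dir"
```

Removing either key from the sample file leaves the corresponding variable empty, so the condition fails and nothing is printed, which matches the "silent on PCT-only / silent on WINDOW-only" behaviour listed in the changelog.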
@@ -33,10 +33,16 @@ HOLD_REASONS=""
  # Check 1: Codex review mid-cycle
  # Self-heal paths (ordered by preference):
  # (a) #209: handoff has pr_number + gh reports PR MERGED → implicit CERTIFIED (silent)
- # (b) #229: handoff has no pr_number but mtime > SDLC_HANDOFF_STALE_DAYS days
- # implicit CERTIFIED with WARN (the handoff predates #209 or was never
- # PR-linked; blocking forever over a forgotten artifact is worse UX than
- # the bug we're preventing). Default threshold: 14 days.
+ # (c) #257: handoff has no pr_number BUT every SHA cited in fixes_applied[]
+ # is in HEAD's ancestry AND .reviews/latest-review.md contains CERTIFIED
+ # (without "NOT CERTIFIED") implicit CERTIFIED (silent). Catches the
+ # solo-developer pattern: write fixes, commit them, run targeted
+ # recheck, see CERTIFIED in latest-review.md, ship — and forget to
+ # update handoff.json status. The visible signals (commits landed +
+ # review file) already say "done" so blocking is high false-positive.
+ # (b) #229: handoff has no pr_number, no SHA-ancestry heal, but mtime >
+ # SDLC_HANDOFF_STALE_DAYS days → implicit CERTIFIED with WARN
+ # (forgotten artifact; blocking forever is worse UX). Default: 14 days.
  HANDOFF="$ROOT/.reviews/handoff.json"
  # Validate SDLC_HANDOFF_STALE_DAYS as non-negative integer. Anything else
  # (empty, "foo", "-3", "10.5") silently falls back to 14 — we don't want a
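The review-file condition in path (c) can be tried in isolation. The sketch below wraps the two `grep` checks in a hypothetical function `review_is_certified` and runs it against sample files whose contents are invented for illustration. Because the word `CERTIFIED` also appears inside `NOT CERTIFIED`, the second check is what keeps a failed review from healing.

```shell
# Hypothetical wrapper around the two grep checks used by path (c):
# the file must exist, mention CERTIFIED, and never say NOT CERTIFIED.
review_is_certified() {
  review_md="$1"
  [ -f "$review_md" ] \
    && grep -qE '\bCERTIFIED\b' "$review_md" 2>/dev/null \
    && ! grep -qE '\bNOT CERTIFIED\b' "$review_md" 2>/dev/null
}

# Sample review files (contents are made up for this demo).
demo_dir=$(mktemp -d)
printf 'Round 3: CERTIFIED 10/10\n' > "$demo_dir/pass.md"
printf 'Round 1: NOT CERTIFIED (2 findings)\n' > "$demo_dir/fail.md"

# missing.md is deliberately never created.
for f in pass.md fail.md missing.md; do
  if review_is_certified "$demo_dir/$f"; then
    echo "$f: heal allowed"
  else
    echo "$f: heal blocked"
  fi
done

rm -rf "$demo_dir"
```

One consequence of this check is that a review file recording an earlier `NOT CERTIFIED` round next to a later `CERTIFIED` one is still blocked, since any occurrence of `NOT CERTIFIED` fails the gate.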
@@ -62,8 +68,77 @@ if [ -f "$HANDOFF" ]; then
  [ "$PR_STATE" = "MERGED" ] && HEALED=1
  fi
  else
+ # Path (c) #257: SHA-ancestry self-heal. Look for git SHAs cited
+ # in fixes_applied[]; if every cited SHA is reachable from HEAD
+ # AND .reviews/latest-review.md says CERTIFIED, the review IS
+ # closed, the user just forgot to bump status. Silent heal.
+ REVIEW_MD="$ROOT/.reviews/latest-review.md"
+ if [ -f "$REVIEW_MD" ] \
+ && grep -qE '\bCERTIFIED\b' "$REVIEW_MD" 2>/dev/null \
+ && ! grep -qE '\bNOT CERTIFIED\b' "$REVIEW_MD" 2>/dev/null; then
+ # Extract the fixes_applied[] block via bracket-depth
+ # tracking — naive `/\]/` matching breaks on `]` inside
+ # string literals (e.g. "[x] FIXED in <sha>" markdown
+ # checkboxes), which would let phantom SHAs after the
+ # broken-early bracket leak past path (c) and false-heal.
+ # Codex P1 round 1.
+ FIXES_BLOCK=$(awk '
+ BEGIN { in_block = 0; depth = 0; started = 0 }
+ /"fixes_applied"/ { in_block = 1 }
+ in_block {
+ print
+ in_string = 0
+ escaped = 0
+ for (i = 1; i <= length($0); i++) {
+ c = substr($0, i, 1)
+ # Honor JSON backslash escapes: \" inside a
+ # string is a literal quote, NOT a string
+ # terminator. Without this, a fixes_applied
+ # entry containing `\"]` falsely flips the
+ # in_string flag and exits the array early —
+ # letting later phantom SHAs leak past path
+ # (c) and false-heal (Codex round 2 P1).
+ if (escaped) { escaped = 0; continue }
+ if (c == "\\") { escaped = 1; continue }
+ if (c == "\"") { in_string = !in_string; continue }
+ if (in_string) continue
+ if (c == "[") { depth++; started = 1 }
+ else if (c == "]") { depth-- }
+ }
+ if (started && depth <= 0) in_block = 0
+ }
+ ' "$HANDOFF" 2>/dev/null)
+ if [ -n "$FIXES_BLOCK" ] && [ -d "$ROOT/.git" ]; then
+ # Strip UUIDs (8-4-4-4-12 hex pattern) BEFORE extracting
+ # SHA candidates. UUIDs have a fixed shape; their hex
+ # segments would otherwise match \b[0-9a-f]{7,40}\b and
+ # fail the ancestry check, false-blocking certified
+ # reviews that cite UUIDs in fixes_applied (mission
+ # UUIDs, Linear/Jira ticket IDs, etc.). Codex P2 round 1.
+ # POSIX-compatible: no `\b` (BSD sed doesn't support it).
+ # The hyphenated 8-4-4-4-12 shape is specific enough
+ # that false-stripping a real SHA is implausible.
+ CLEANED=$(echo "$FIXES_BLOCK" | sed -E 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}//g')
+ SHAS=$(echo "$CLEANED" | grep -oE '\b[0-9a-f]{7,40}\b' | sort -u)
+ if [ -n "$SHAS" ]; then
+ # Every cited SHA must be reachable from HEAD —
+ # phantom SHAs (e.g. typos, references to other
+ # repos) correctly fail ancestry and block the heal.
+ ALL_IN_HEAD=1
+ for sha in $SHAS; do
+ if ! git -C "$ROOT" merge-base --is-ancestor "$sha" HEAD 2>/dev/null; then
+ ALL_IN_HEAD=0
+ break
+ fi
+ done
+ [ "$ALL_IN_HEAD" -eq 1 ] && HEALED=1
+ fi
+ fi
+ fi
  # Path (b): stale-handoff auto-expire (#229). Only when no pr_number
- # we must not short-circuit PR-linked reviews.
+ # AND path (c) didn't already heal. We must not short-circuit
+ # PR-linked reviews.
+ if [ "$HEALED" -ne 1 ]; then
  # Try GNU stat first (Linux: `-c %Y` gives mtime, BSD stat errors out
  # so `||` fires). Then BSD stat (macOS: `-f %m` gives mtime). The
  # reverse order fails on Linux because `stat -f` on GNU means
@@ -83,6 +158,7 @@ if [ -f "$HANDOFF" ]; then
  ;;
  esac
  fi
+ fi
  if [ "$HEALED" -ne 1 ]; then
  HOLD_REASONS="${HOLD_REASONS} - Codex review is ${STATUS}. Round-1 evidence lives in this context — compacting now loses what round-2 needs to re-verify.
  Resolve: wait for CERTIFIED (or escalate) before /compact."$'\n'
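The UUID strip and the ancestry gate from path (c) can be exercised outside the hook. This is a minimal sketch: the function names `extract_sha_candidates` and `all_shas_in_head` are hypothetical, the sample `fixes_applied` entry is invented, and the sed/grep patterns and the `git merge-base --is-ancestor` call are the ones shown in the hunk above. Like the hook, it assumes a grep that understands `\b`.

```shell
# Hypothetical helper: strip 8-4-4-4-12 hex UUIDs from stdin, then list
# the unique 7-40 character lowercase-hex tokens as SHA candidates.
extract_sha_candidates() {
  sed -E 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}//g' \
    | grep -oE '\b[0-9a-f]{7,40}\b' | sort -u
}

# Hypothetical helper: succeed only if every SHA read from stdin is an
# ancestor of HEAD in the repository at $1. Empty input succeeds here;
# the hook guards that case separately with its own non-empty check.
all_shas_in_head() {
  repo="$1"
  while read -r sha; do
    git -C "$repo" merge-base --is-ancestor "$sha" HEAD 2>/dev/null || return 1
  done
  return 0
}

# Invented entry: a markdown checkbox, a short SHA, and a ticket UUID.
sample='"fixes_applied": ["[x] FIXED in abc1234 (ticket 123e4567-e89b-12d3-a456-426614174000)"]'
echo "$sample" | extract_sha_candidates
```

Without the sed step, the `123e4567` and `426614174000` segments of the UUID would also be reported as candidates and would then fail the ancestry check. A typical use is `extract_sha_candidates < handoff-block.txt | all_shas_in_head "$ROOT"`, where the input file name is a placeholder.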
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agentic-sdlc-wizard",
- "version": "1.44.0",
+ "version": "1.45.0",
  "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
  "bin": {
  "sdlc-wizard": "cli/bin/sdlc-wizard.js"
@@ -170,16 +170,18 @@ When auto-approving, still announce your approach — just don't wait for approv
170
170
 
171
171
  ## Recommended Model
172
172
 
173
- **Default: `opus[1m]` (Opus 4.7 with 1M context window).** Run `/model opus[1m]` at the start of any non-trivial SDLC session.
173
+ **Opt-in: `opus[1m]` (Opus 4.7 with 1M context window).** Run `/model opus[1m]` at the start of any non-trivial SDLC session — but understand the tradeoff first (issue #198).
174
174
 
175
- **Why:**
175
+ **Why opt-in, not default:** A top-level `model` pin in `.claude/settings.json` disables Claude Code's per-turn model auto-selection. That's a real cost — Max-plan users pay for that auto-selection (Sonnet for cheap tasks, Opus for hard ones, plus weekly-limit smoothing). Pin only when you actually need the 1M headroom.
176
+
177
+ **Why pin to `opus[1m]` when you do opt in:**
176
178
  - SDLC sessions (plan → TDD → review → CI shepherd) accumulate context fast — plans, test output, diffs, review artifacts. 200K fills up before you're done.
177
179
  - Forced auto-compact mid-task loses your working state. Extra headroom is cheaper than re-reading files.
178
180
  - At time of writing, Anthropic lists 1M context at standard pricing for supported Opus/Sonnet models — verify current rates for your plan before relying on this.
179
181
 
180
182
  **Requires Claude Code v2.1.111+** for Opus 4.7.
181
183
 
182
- **Pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`.** Without it, CC's default auto-compact on 1M fires at ~76K and defeats the purpose. The wizard's `cli/templates/settings.json` sets both defaults on install.
184
+ **Pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`** when you opt in. Without it, CC's default auto-compact on 1M fires at ~76K and defeats the purpose. The setup wizard's Step 9.5 prompts to write both together (template ships with neither, opt-in only).
183
185
 
184
186
  **Fall back to `opus` (200K) only when:** your plan charges a premium for long-context prompts, the task is genuinely short (<30K), or team cost controls flag >200K prompts. See the "1M vs 200K Context Window" section in `CLAUDE_CODE_SDLC_WIZARD.md` for details.
185
187
 
@@ -606,7 +608,7 @@ CI passes -> Read review suggestions
606
608
  - `/clear` after 2+ failed corrections (context polluted — start fresh with better prompt)
607
609
  - Auto-compact fires at ~95% capacity — no manual management needed
608
610
  - After committing a PR, `/clear` before starting the next feature
609
- - **Autocompact tuning:** Set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` to trigger compaction earlier (75% for 200K, 30% for 1M). On 1M models, the default fires at ~76K — set 30% or `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` to use the full context window. See wizard doc "Autocompact Tuning" for full details
611
+ - **Autocompact tuning:** Set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` to trigger compaction earlier (75% for 200K, 30% for 1M). On 1M models, the default fires at ~76K — pick ONE of: `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` **OR** `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (do NOT set both — they compound to 30% × 400K = 120K trigger ≈ 12% of 1M, which fires almost immediately, #207). See wizard doc "Autocompact Tuning" for full details
610
612
 
611
613
  **`--bare` mode (v2.1.81+):** `claude -p "prompt" --bare` skips ALL hooks, skills, LSP, and plugins. This is a complete wizard bypass — no SDLC enforcement, no TDD checks, no planning hooks. Use only for scripted headless calls (CI pipelines, automation) where you explicitly don't want wizard enforcement. Never use `--bare` for normal development work.
612
614
 
@@ -131,9 +131,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
131
131
 
132
132
  ```
133
133
  Installed: 1.24.0
134
- Latest: 1.44.0
134
+ Latest: 1.45.0
135
135
 
136
136
  What changed:
137
+ - [1.45.0] PreCompact path (c) — SHA-ancestry self-heal — closes #257. Solo-developer pattern: write fixes, commit them, run targeted Codex recheck, see CERTIFIED in `.reviews/latest-review.md`, ship the feature. Forgetting to bump `handoff.json status` from `PENDING_RECHECK` → `CERTIFIED` is realistic — the file is buried and the visible signals (commits + review file) already say "done". PreCompact hook now self-heals silently when: (a) handoff is `PENDING_*` with no `pr_number`, (b) every SHA cited in `fixes_applied[]` is reachable from HEAD via `git merge-base --is-ancestor`, AND (c) `.reviews/latest-review.md` contains CERTIFIED without `NOT CERTIFIED`. Bracket extraction is escape-aware (depth counter + JSON `\\\"` handling) so `]` inside string literals doesn't terminate the array early. UUIDs (8-4-4-4-12 hex) stripped before SHA extraction so ticket IDs in fixes_applied don't false-block the heal. 9 new tests, Codex round 3 CERTIFIED 10/10.
138
+ - [1.44.1] Autocompact compound-misconfig detection — closes #207. Consumer reported autocompact firing at 12% context on a fresh opus[1m] session because they set BOTH `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` AND `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (a natural misreading of the "or"-joined override cell). The two compound: 30% × 400K = 120K trigger ≈ 12% of 1M. Three-pronged fix: (a) wizard doc clarifies alternatives with a `> ⚠ Do NOT set both` callout that shows the compound math; (b) `instructions-loaded-check.sh` (InstructionsLoaded hook) detects when both env vars are set in `.claude/settings.json`, computes the effective trigger, and warns with the math; (c) shipped `skills/sdlc/SKILL.md` was still calling opus[1m] the "default" (stale post-#198) AND repeating the same ambiguous wording — both fixed. 4 new hook tests + 3 new doc-consistency tests + size-cap fixture extended. Codex round 2 CERTIFIED 9/10.
137
139
  - [1.44.0] Install-path & cache hygiene — closes #254, #239, #238 filed by consumer codeguesser after upgrading 1.32.0 → 1.42.1. (1) `cli/init.js` FILES list now ships `hooks/_find-sdlc-root.sh` — the helper sourced by all 5 hooks was missing from npm install path, so every session emitted `_find-sdlc-root.sh: No such file or directory` + `dedupe_plugin_or_project: command not found` and the SDLC walk-up logic was silently dead. (2) `init --force` now invalidates `~/.cache/sdlc-wizard/latest-version` so post-upgrade hooks re-fetch fresh values from npm instead of serving the pre-upgrade cache for 24h (which produced reverse "1.42.1 → 1.41.1" nudges). (3) instructions-loaded-check.sh now uses semver-direction comparison via new `semver_lt` function: nudge only fires when installed < latest, equality is silent, reverse direction is silent. Cache sanity-check rejects poisoned values (cached "latest" < installed → force refetch). (4) When `npm view` fails AND cache empty, hook now surfaces a one-line warning instead of going silent. (5) Dual-channel install nudge gains an opt-in silence sentinel — set via `mkdir -p $SDLC_WIZARD_CACHE_DIR && touch $SDLC_WIZARD_CACHE_DIR/dual-channel-acknowledged` (printed inside the nudge itself for discoverability). 8 new tests across test-cli.sh + test-hooks.sh, Codex CERTIFIED 10/10 round 2.
138
140
  - [1.43.0] Token-spike anomaly detection — ROADMAP #220 closure. New `hooks/token-spike-check.sh` (SessionStart, opt-in via `.metrics/`) ingests CC transcript usage (`input_tokens` / `output_tokens` / `cache_creation_input_tokens` / `cache_read_input_tokens`) into `.metrics/token-history.jsonl`, then warns when the last session's `costly_tokens` (input + cache_creation + output, excluding the cheap cache_read tier) exceeds median + 2σ over a rolling baseline. Catches silent CC-side caching regressions (per Anthropic's 2026-04-23 post-mortem) before they surface on the invoice. Uses MAD-based spread for the median metric so a single baseline outlier doesn't mask the next spike. 14 quality tests in `tests/test-token-spike.sh` (incl. malicious-transcript privacy probe, flat-baseline floor, median-vs-mean contrast, concurrent-ingest mkdir lock).
139
141
  - [1.42.2] PreCompact self-heal documented — ROADMAP #209 closure. Added `pr_number` opt-in to all 3 handoff template schemas (skill Step 1; wizard Round 1 + cross-model section). Self-heal logic shipped earlier with #229 but was undocumented, leaving the dead-code path. New `test_handoff_template_documents_pr_number` enforces template/doc parity. Together with #229 (mtime auto-expire) closes the "stuck PENDING handoff blocks /compact forever" footgun from both directions.