agentic-sdlc-wizard 1.44.1 → 1.46.0

@@ -13,7 +13,7 @@
13
13
  "name": "sdlc-wizard",
14
14
  "source": ".",
15
15
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
16
- "version": "1.44.1",
16
+ "version": "1.46.0",
17
17
  "author": {
18
18
  "name": "Stefan Ayala"
19
19
  },
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "sdlc-wizard",
3
- "version": "1.44.1",
3
+ "version": "1.46.0",
4
4
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
5
5
  "author": {
6
6
  "name": "Stefan Ayala",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,37 @@ All notable changes to the SDLC Wizard.
4
4
 
5
5
  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
6
6
 
7
+ ## [1.46.0] - 2026-04-27
8
+
9
+ ### Added
10
+
11
+ - **PreCompact dry-run env vars** (closes #240). Consumer reported clobbering their real `.reviews/handoff.json` while smoke-testing the PreCompact hook — the only way to verify hook behavior was to `cp` real state aside, fabricate fakes, and restore. Two new env vars simulate state in-memory:
12
+ - `SDLC_DRY_RUN_HANDOFF_STATUS=<value>` — overrides the handoff.json read entirely. Useful values: `PENDING_REVIEW`/`PENDING_RECHECK` (block), `CERTIFIED` (silent). Skips file I/O.
13
+ - `SDLC_DRY_RUN_GIT_STATE=rebase|merge|cherry-pick` — simulates an in-flight git op. No real `.git/` needed.
14
+ - **Safety**: unknown values (typos like `bogus`) fall back to real-state checks rather than silently bypassing safety. Codex round 1 caught this bypass risk; the fix uses a `DRY_RUN_GIT_HANDLED` flag so only known scenarios short-circuit the real check.
15
+ - **No mutations**: dry-run paths are pure read-only simulation. Subsequent runs without env vars see clean state.
16
+ - 7 new test-hooks tests (positive simulations + override of real PENDING + typo fallback + no-mutation guarantee). Codex round 2 CERTIFIED 10/10.
17
+
18
+ ### Files
19
+
20
+ - `hooks/precompact-seam-check.sh` — comment block + dry-run handoff `if/elif` branch + dry-run git `case` with `DRY_RUN_GIT_HANDLED` flag; real-state check gated on flag
21
+ - `tests/test-hooks.sh` — 7 new dry-run tests
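The typo-fallback behavior described in this entry can be illustrated with a minimal shell sketch. It is not the shipped hook: `simulate_git_state` and `real_git_check` are hypothetical names, and the real `.git/` inspection is stubbed out. Only the `case` plus handled-flag shape follows the description of `DRY_RUN_GIT_HANDLED`.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the described pattern, not the shipped hook.
# real_git_check is a hypothetical stand-in for the real .git/ inspection.
real_git_check() {
  echo "real-check"
}

# Prints "simulated:<op>" for a known dry-run value. Unset, empty, or
# unknown values (such as typos) defer to the real check instead.
simulate_git_state() {
  local state="${1:-}"
  local handled=0
  case "$state" in
    rebase|merge|cherry-pick)
      echo "simulated:$state"
      handled=1
      ;;
  esac
  # Only known scenarios short-circuit; everything else falls through.
  if [ "$handled" -eq 0 ]; then
    real_git_check
  fi
}
```

Calling `simulate_git_state "${SDLC_DRY_RUN_GIT_STATE:-}"` with a value like `bogus` reaches the stubbed real check, which is the property the entry calls out: a typo should never silently bypass safety.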
22
+
23
+ ## [1.45.0] - 2026-04-27
24
+
25
+ ### Added
26
+
27
+ - **PreCompact path (c) — SHA-ancestry self-heal** (closes #257). Consumer reported PreCompact blocking `/compact` even when the cited Codex review WAS actually CERTIFIED — the user just forgot to bump `handoff.json status` from `PENDING_RECHECK` → `CERTIFIED`. Existing self-heals don't cover this solo-developer pattern: path (a) needs `pr_number`, path (b) needs `mtime > 14d`. New path (c) heals when: handoff is `PENDING_*` with no `pr_number`, every SHA cited in `fixes_applied[]` is reachable from HEAD (`git merge-base --is-ancestor`), AND `.reviews/latest-review.md` contains `CERTIFIED` without `NOT CERTIFIED`. Path (b) still runs if (c) abstains (no SHAs / no review file).
28
+ - **Robust extraction**: awk extracts the `fixes_applied[]` block via bracket-depth + escape-aware string-literal tracking. `]` inside string literals (e.g. `"[x] FIXED..."` markdown checkboxes, `"...\"]"` escaped-quote-bracket) does NOT terminate the array prematurely.
29
+ - **UUID resilience**: strips 8-4-4-4-12 hex UUIDs before SHA extraction so ticket IDs in fixes_applied entries (Linear, Jira, mission UUIDs) don't false-block the heal.
30
+ - **Phantom SHA gate**: every cited SHA must pass `git merge-base --is-ancestor` against HEAD. Phantom SHAs (typos, references to other repos) correctly fail and block the heal.
31
+ - 9 new test-hooks tests (positive heal, phantom blocks, NOT CERTIFIED blocks, missing review.md blocks, partial coverage blocks, fall-through to stale, markdown-checkbox bracket, UUID alongside real SHA, escaped-quote bracket). Codex round 3 CERTIFIED 10/10 (rounds 1-2 surfaced bracket-extraction edge cases — markdown `[x]`, escaped quotes — and UUID false-block; all fixed).
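The review-file condition for path (c) can be sketched as a small shell function. This is an illustration under assumptions, not the hook's code: `review_is_certified` is a hypothetical helper name, and it uses plain substring matching rather than word-boundary matching.

```shell
#!/usr/bin/env bash
# Sketch only: review_is_certified is a hypothetical helper name.
# Succeeds when the file mentions CERTIFIED but never "NOT CERTIFIED".
# Both checks are needed because "NOT CERTIFIED" itself contains "CERTIFIED".
review_is_certified() {
  local review_md="$1"
  # A missing review file means path (c) abstains.
  [ -f "$review_md" ] || return 1
  grep -q 'CERTIFIED' "$review_md" || return 1
  if grep -q 'NOT CERTIFIED' "$review_md"; then
    return 1
  fi
  return 0
}
```

One consequence of a file-level check like this: a review file that keeps an earlier `NOT CERTIFIED` round alongside a later `CERTIFIED` one does not satisfy the condition, so the heal abstains and the stale-handoff path (b) is the remaining fallback.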
32
+
33
+ ### Files
34
+
35
+ - `hooks/precompact-seam-check.sh` — new path (c) block with depth-counted + escape-aware awk extraction, sed UUID strip, ancestry check
36
+ - `tests/test-hooks.sh` — `_precompact_init_repo_with_commit` helper + 9 new path (c) tests
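The UUID strip followed by SHA extraction can be tried in isolation with a short sketch. `extract_sha_candidates` is a hypothetical helper name; the two patterns are the ones quoted in this diff, and `\b` in `grep -E` is assumed to be supported by the local grep.

```shell
#!/usr/bin/env bash
# Sketch only: extract_sha_candidates is a hypothetical helper name.
# Reads text on stdin, removes 8-4-4-4-12 hex UUIDs, then prints each
# remaining 7-40 character lowercase hex token once, sorted.
extract_sha_candidates() {
  sed -E 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}//g' \
    | grep -oE '\b[0-9a-f]{7,40}\b' \
    | sort -u
}
```

Without the `sed` step, the 8-character and 12-character segments of a UUID would themselves match the SHA pattern and then fail the ancestry check, which is the false-block this release describes. Each surviving candidate would still need `git merge-base --is-ancestor <sha> HEAD` to pass before a heal.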
37
+
7
38
  ## [1.44.1] - 2026-04-27
8
39
 
9
40
  ### Fixed
@@ -2717,6 +2717,53 @@ Options:
2717
2717
 
2718
2718
  ---
2719
2719
 
2720
+ ### Browser Tooling Policy
2721
+
2722
+ Three different jobs, three different tools. Conflating them is the source of recurring agent failures: `Playwright MCP` for an auth-heavy registrar dashboard wastes a session, and browser-use for a deterministic regression test gives flaky CI.
2723
+
2724
+ | Tool | Job | Profile model | When to pick |
2725
+ |------|-----|---------------|--------------|
2726
+ | **Playwright tests** | Deterministic regression suite, CI/release gate | Isolated by design — each test gets a clean browser context per [Playwright docs](https://playwright.dev/docs/browser-contexts) | Asserting expected user flows; running on every PR; gating deploy |
2727
+ | **Playwright MCP** | Live browser debugging, visual QA, DOM inspection | Default mode uses a **persistent Playwright-managed profile** at `ms-playwright/mcp-{channel}-{workspace-hash}` ([docs](https://playwright.dev/docs/getting-started-mcp#user-profile)) — NOT the user's regular Chrome profile. Other modes: `--isolated` (ephemeral context per session), `--user-data-dir=PATH` (caller-supplied dir), CDP attach (`--cdp-endpoint`), extension mode (attach to user's running browser tab) | One-off "look at this page" / "click this button and tell me what happens"; visual verification mid-session |
2728
+ | **Real-browser tooling** (browser-use, CDP-attach, Chrome profile) | Authenticated, profile-dependent, stateful operator flows | **When configured for real-browser mode** (e.g., browser-use `Browser.from_system_chrome()` or CLI `--profile`/`connect`), uses the user's actual Chrome profile (cookies, extensions, logged-in sessions). Default browser-use CLI runs headless Chromium — opt into the real-profile mode explicitly | Registrar dashboards (Porkbun, GoDaddy), DNS setup, cloud-provider consoles, wallet-adjacent Web3 flows, logged-in admin panels — anywhere preserving cookies/profile/extensions matters more than clean isolation |
2729
+
2730
+ **The core insight:** Playwright tests' isolation is a *feature*, not a bug. Playwright MCP's persistent-managed-profile default is also a feature — it preserves session continuity across debug interactions in a SINGLE agent. The collision case is concurrent agents (#251) sharing the same managed profile. Real-browser tooling is the right call only when the task IS the user's authenticated session, and only when explicitly configured to attach to that profile.
2731
+
2732
+ #### When to recommend real-browser tooling (#225)
2733
+
2734
+ Trigger examples — if the task description includes any of these, suggest real-browser tooling (browser-use or CDP-attach) over Playwright MCP:
2735
+
2736
+ - Registrar dashboards (domain purchase, DNS records, nameserver changes)
2737
+ - DNS setup / DNSLink / custom-domain configuration
2738
+ - Cloud/provider dashboards (AWS console, Cloudflare, Vercel, registrars)
2739
+ - Wallet-adjacent Web3 flows (token approvals, contract interactions in a logged-in MetaMask)
2740
+ - Logged-in admin panels (Stripe, Vercel, GitHub admin pages requiring 2FA-cached session)
2741
+ - Anywhere preserving cookies/profile/extensions matters more than clean isolation
2742
+
2743
+ These all share a property: the agent's job IS the authenticated session. A clean automation browser is the wrong model.
2744
+
2745
+ #### Playwright MCP profile-lock policy (#251)
2746
+
2747
+ `Playwright MCP`'s default mode reuses a single persistent managed profile (`ms-playwright/mcp-{channel}-{workspace-hash}`) across stdio sessions — which is the right call for single-agent debugging because it preserves session continuity across calls, but breaks down when **multiple agents or MCP clients run concurrently** (two CC sessions, one CC + one Codex, etc.). Concurrent stdio sessions collide on the same Chrome user-data directory and corrupt each other's session state.
2748
+
2749
+ **Upstream Playwright rejected default-isolated as the global default.** See [microsoft/playwright#40419](https://github.com/microsoft/playwright/issues/40419) and the discussion on [microsoft/playwright#40420](https://github.com/microsoft/playwright/pull/40420) — maintainer feedback: *"That's unfortunately very breaking."* Changing the default would silently break every existing single-agent setup that relies on the persistent managed profile.
2750
+
2751
+ **Wizard policy (per-user, opt-in at the wizard layer, not upstream):**
2752
+
2753
+ - **Single-agent / single MCP client (default):** Use Playwright MCP's default persistent-managed-profile mode. No special config required.
2754
+ - **Concurrent agents / multiple MCP clients:** Pick one of these per client to avoid profile-lock collisions: (a) `--isolated` (ephemeral context per session, no persistence), (b) `--user-data-dir=$TMPDIR/playwright-mcp-$AGENT_ID` (caller-supplied dir, isolated per agent), or (c) `--cdp-endpoint` to attach each agent to a separately-launched browser. None of these require an upstream breaking change.
2755
+ - **Real-browser / profile-dependent flows:** Don't use Playwright MCP at all — use real-browser tooling explicitly configured to attach to the user's profile (e.g., `browser-use` with `Browser.from_system_chrome()` or CLI `--profile`). The task is the session, not isolated automation.
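For the concurrent-agent case, one way to choose a flag per client is sketched here. `mcp_profile_flag` is a hypothetical helper, not part of the wizard or of Playwright MCP. Falling back to `--isolated` when no agent ID is available and defaulting `TMPDIR` to `/tmp` are assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Sketch only: mcp_profile_flag is a hypothetical helper, not part of the
# wizard or of Playwright MCP. It prints the flag a given agent could pass.
mcp_profile_flag() {
  local agent_id="${1:-}"
  if [ -z "$agent_id" ]; then
    # No stable agent identity: fall back to an ephemeral context.
    echo "--isolated"
  else
    # One user-data directory per agent avoids sharing the managed profile.
    echo "--user-data-dir=${TMPDIR:-/tmp}/playwright-mcp-${agent_id}"
  fi
}
```

The printed flag would be appended to whatever command each MCP client already uses to launch Playwright MCP. A single-agent setup would skip this entirely and keep the default persistent managed profile.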
2756
+
2757
+ This rule is per-workflow, not global. The setup wizard does NOT auto-configure isolated profiles; adoption is explicit and gated on the user signaling concurrent-agent intent.
2758
+
2759
+ #### Anti-patterns
2760
+
2761
+ - **Using Playwright MCP for registrar dashboards** — its persistent managed profile is NOT your real Chrome profile, so your registrar's logged-in session/2FA cookies aren't there. You'll be re-logging in on every interaction. Use real-browser tooling configured for your real Chrome profile instead.
2762
+ - **Using profile-coupled / stateful browser tooling for deterministic CI tests** — when browser-use (or any tool) is configured to use a real Chrome profile, cached state, extension chrome, and stale cookies pollute the test. Use Playwright tests with isolated browser contexts for CI.
2763
+ - **Setting Playwright MCP `--isolated` globally as a default** — breaks single-agent flows that rely on the persistent managed profile for session continuity across debug interactions. Upstream Playwright rejected this for the same reason. Make it explicit per-workflow when concurrent agents are running.
2764
+
2765
+ ---
2766
+
2720
2767
  ## Step 8: Create CLAUDE.md
2721
2768
 
2722
2769
  Create `CLAUDE.md` in your project root. This is your project-specific configuration:
@@ -2920,7 +2967,7 @@ If deployment fails or post-deploy verification catches issues:
2920
2967
 
2921
2968
  **SDLC.md:**
2922
2969
  ```markdown
2923
- <!-- SDLC Wizard Version: 1.44.1 -->
2970
+ <!-- SDLC Wizard Version: 1.46.0 -->
2924
2971
  <!-- Setup Date: [DATE] -->
2925
2972
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
2926
2973
  <!-- Git Workflow: [PRs or Solo] -->
@@ -3985,7 +4032,7 @@ Walk through updates? (y/n)
3985
4032
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
3986
4033
 
3987
4034
  ```markdown
3988
- <!-- SDLC Wizard Version: 1.44.1 -->
4035
+ <!-- SDLC Wizard Version: 1.46.0 -->
3989
4036
  <!-- Setup Date: 2026-01-24 -->
3990
4037
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
3991
4038
  <!-- Git Workflow: PRs -->
@@ -30,13 +30,29 @@ ROOT="${CLAUDE_PROJECT_DIR:-$PWD}"
30
30
 
31
31
  HOLD_REASONS=""
32
32
 
33
+ # #240: Dry-run / simulation env vars. Let consumers verify hook behavior
34
+ # without mutating real .reviews/handoff.json or .git/ state.
35
+ # SDLC_DRY_RUN_HANDOFF_STATUS=<value> — overrides handoff.json status
36
+ # lookup (skip the file read entirely)
37
+ # SDLC_DRY_RUN_GIT_STATE=rebase|merge|cherry-pick — simulates an in-flight
38
+ # git operation
39
+ # When set, dry-run values short-circuit the real-state checks below. The
40
+ # hook still emits the same HOLD/silent decision so consumers can smoke-test
41
+ # every code path. No filesystem writes — purely read-only simulation.
42
+
33
43
  # Check 1: Codex review mid-cycle
34
44
  # Self-heal paths (ordered by preference):
35
45
  # (a) #209: handoff has pr_number + gh reports PR MERGED → implicit CERTIFIED (silent)
36
- # (b) #229: handoff has no pr_number but mtime > SDLC_HANDOFF_STALE_DAYS days
37
- # implicit CERTIFIED with WARN (the handoff predates #209 or was never
38
- # PR-linked; blocking forever over a forgotten artifact is worse UX than
39
- # the bug we're preventing). Default threshold: 14 days.
46
+ # (c) #257: handoff has no pr_number BUT every SHA cited in fixes_applied[]
47
+ # is in HEAD's ancestry AND .reviews/latest-review.md contains CERTIFIED
48
+ # (without "NOT CERTIFIED") → implicit CERTIFIED (silent). Catches the
49
+ # solo-developer pattern: write fixes, commit them, run targeted
50
+ # recheck, see CERTIFIED in latest-review.md, ship — and forget to
51
+ # update handoff.json status. The visible signals (commits landed +
52
+ # review file) already say "done" so blocking is high false-positive.
53
+ # (b) #229: handoff has no pr_number, no SHA-ancestry heal, but mtime >
54
+ # SDLC_HANDOFF_STALE_DAYS days → implicit CERTIFIED with WARN
55
+ # (forgotten artifact; blocking forever is worse UX). Default: 14 days.
40
56
  HANDOFF="$ROOT/.reviews/handoff.json"
41
57
  # Validate SDLC_HANDOFF_STALE_DAYS as non-negative integer. Anything else
42
58
  # (empty, "foo", "-3", "10.5") silently falls back to 14 — we don't want a
@@ -48,7 +64,16 @@ case "$STALE_DAYS_RAW" in
48
64
  *) STALE_DAYS="$STALE_DAYS_RAW" ;;
49
65
  esac
50
66
  STALE_WARN=""
51
- if [ -f "$HANDOFF" ]; then
67
+ # #240: dry-run override skips the real handoff.json read.
68
+ if [ -n "${SDLC_DRY_RUN_HANDOFF_STATUS:-}" ]; then
69
+ STATUS="$SDLC_DRY_RUN_HANDOFF_STATUS"
70
+ case "$STATUS" in
71
+ PENDING_REVIEW|PENDING_RECHECK)
72
+ HOLD_REASONS="${HOLD_REASONS} - Codex review is ${STATUS}. Round-1 evidence lives in this context — compacting now loses what round-2 needs to re-verify.
73
+ Resolve: wait for CERTIFIED (or escalate) before /compact."$'\n'
74
+ ;;
75
+ esac
76
+ elif [ -f "$HANDOFF" ]; then
52
77
  STATUS=$(grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' "$HANDOFF" 2>/dev/null | head -1 | sed 's/.*"\([^"]*\)"$/\1/')
53
78
  case "$STATUS" in
54
79
  PENDING_REVIEW|PENDING_RECHECK)
@@ -62,8 +87,77 @@ if [ -f "$HANDOFF" ]; then
62
87
  [ "$PR_STATE" = "MERGED" ] && HEALED=1
63
88
  fi
64
89
  else
90
+ # Path (c) #257: SHA-ancestry self-heal. Look for git SHAs cited
91
+ # in fixes_applied[]; if every cited SHA is reachable from HEAD
92
+ # AND .reviews/latest-review.md says CERTIFIED, the review IS
93
+ # closed, the user just forgot to bump status. Silent heal.
94
+ REVIEW_MD="$ROOT/.reviews/latest-review.md"
95
+ if [ -f "$REVIEW_MD" ] \
96
+ && grep -qE '\bCERTIFIED\b' "$REVIEW_MD" 2>/dev/null \
97
+ && ! grep -qE '\bNOT CERTIFIED\b' "$REVIEW_MD" 2>/dev/null; then
98
+ # Extract the fixes_applied[] block via bracket-depth
99
+ # tracking — naive `/\]/` matching breaks on `]` inside
100
+ # string literals (e.g. "[x] FIXED in <sha>" markdown
101
+ # checkboxes), which would let phantom SHAs after the
102
+ # broken-early bracket leak past path (c) and false-heal.
103
+ # Codex P1 round 1.
104
+ FIXES_BLOCK=$(awk '
105
+ BEGIN { in_block = 0; depth = 0; started = 0 }
106
+ /"fixes_applied"/ { in_block = 1 }
107
+ in_block {
108
+ print
109
+ in_string = 0
110
+ escaped = 0
111
+ for (i = 1; i <= length($0); i++) {
112
+ c = substr($0, i, 1)
113
+ # Honor JSON backslash escapes: \" inside a
114
+ # string is a literal quote, NOT a string
115
+ # terminator. Without this, a fixes_applied
116
+ # entry containing `\"]` falsely flips the
117
+ # in_string flag and exits the array early —
118
+ # letting later phantom SHAs leak past path
119
+ # (c) and false-heal (Codex round 2 P1).
120
+ if (escaped) { escaped = 0; continue }
121
+ if (c == "\\") { escaped = 1; continue }
122
+ if (c == "\"") { in_string = !in_string; continue }
123
+ if (in_string) continue
124
+ if (c == "[") { depth++; started = 1 }
125
+ else if (c == "]") { depth-- }
126
+ }
127
+ if (started && depth <= 0) in_block = 0
128
+ }
129
+ ' "$HANDOFF" 2>/dev/null)
130
+ if [ -n "$FIXES_BLOCK" ] && [ -d "$ROOT/.git" ]; then
131
+ # Strip UUIDs (8-4-4-4-12 hex pattern) BEFORE extracting
132
+ # SHA candidates. UUIDs have a fixed shape; their hex
133
+ # segments would otherwise match \b[0-9a-f]{7,40}\b and
134
+ # fail the ancestry check, false-blocking certified
135
+ # reviews that cite UUIDs in fixes_applied (mission
136
+ # UUIDs, Linear/Jira ticket IDs, etc.). Codex P2 round 1.
137
+ # POSIX-compatible: no `\b` (BSD sed doesn't support it).
138
+ # The hyphenated 8-4-4-4-12 shape is specific enough
139
+ # that false-stripping a real SHA is implausible.
140
+ CLEANED=$(echo "$FIXES_BLOCK" | sed -E 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}//g')
141
+ SHAS=$(echo "$CLEANED" | grep -oE '\b[0-9a-f]{7,40}\b' | sort -u)
142
+ if [ -n "$SHAS" ]; then
143
+ # Every cited SHA must be reachable from HEAD —
144
+ # phantom SHAs (e.g. typos, references to other
145
+ # repos) correctly fail ancestry and block the heal.
146
+ ALL_IN_HEAD=1
147
+ for sha in $SHAS; do
148
+ if ! git -C "$ROOT" merge-base --is-ancestor "$sha" HEAD 2>/dev/null; then
149
+ ALL_IN_HEAD=0
150
+ break
151
+ fi
152
+ done
153
+ [ "$ALL_IN_HEAD" -eq 1 ] && HEALED=1
154
+ fi
155
+ fi
156
+ fi
65
157
  # Path (b): stale-handoff auto-expire (#229). Only when no pr_number
66
- # we must not short-circuit PR-linked reviews.
158
+ # AND path (c) didn't already heal. We must not short-circuit
159
+ # PR-linked reviews.
160
+ if [ "$HEALED" -ne 1 ]; then
67
161
  # Try GNU stat first (Linux: `-c %Y` gives mtime, BSD stat errors out
68
162
  # so `||` fires). Then BSD stat (macOS: `-f %m` gives mtime). The
69
163
  # reverse order fails on Linux because `stat -f` on GNU means
@@ -83,6 +177,7 @@ if [ -f "$HANDOFF" ]; then
83
177
  ;;
84
178
  esac
85
179
  fi
180
+ fi
86
181
  if [ "$HEALED" -ne 1 ]; then
87
182
  HOLD_REASONS="${HOLD_REASONS} - Codex review is ${STATUS}. Round-1 evidence lives in this context — compacting now loses what round-2 needs to re-verify.
88
183
  Resolve: wait for CERTIFIED (or escalate) before /compact."$'\n'
@@ -92,8 +187,37 @@ if [ -f "$HANDOFF" ]; then
92
187
  fi
93
188
 
94
189
  # Check 2: in-progress git operation
190
+ # #240: dry-run override simulates a git op without needing a real .git/.
95
191
  GITDIR="$ROOT/.git"
96
- if [ -d "$GITDIR" ]; then
192
+ # Step 1: when dry-run var matches a known scenario, simulate it.
193
+ # Otherwise (unset, empty, or unknown value) fall through to the real
194
+ # .git/ checks below — this prevents an unintended safety bypass when
195
+ # the user typos the env var (e.g. SDLC_DRY_RUN_GIT_STATE=bogus would
196
+ # previously skip real checks entirely; Codex P1 round 1).
197
+ DRY_RUN_GIT_HANDLED=0
198
+ case "${SDLC_DRY_RUN_GIT_STATE:-}" in
199
+ rebase)
200
+ HOLD_REASONS="${HOLD_REASONS} - Git rebase in progress. Compacting mid-rebase loses the operation's context.
201
+ Resolve: finish or abort the rebase before /compact."$'\n'
202
+ DRY_RUN_GIT_HANDLED=1
203
+ ;;
204
+ merge)
205
+ HOLD_REASONS="${HOLD_REASONS} - Git merge in progress. Compacting mid-merge loses the operation's context.
206
+ Resolve: finish or abort the merge before /compact."$'\n'
207
+ DRY_RUN_GIT_HANDLED=1
208
+ ;;
209
+ cherry-pick)
210
+ HOLD_REASONS="${HOLD_REASONS} - Git cherry-pick in progress. Compacting mid-cherry-pick loses the operation's context.
211
+ Resolve: finish or abort the cherry-pick before /compact."$'\n'
212
+ DRY_RUN_GIT_HANDLED=1
213
+ ;;
214
+ esac
215
+
216
+ # Step 2: real .git/ check fires if dry-run didn't simulate a scenario.
217
+ # Empty/unset SDLC_DRY_RUN_GIT_STATE → real check (default behavior).
218
+ # Unknown value (e.g. typo "bogus") → also falls through to real check
219
+ # rather than silently bypassing safety. The safer-than-the-typo path.
220
+ if [ "$DRY_RUN_GIT_HANDLED" -eq 0 ] && [ -d "$GITDIR" ]; then
97
221
  if [ -e "$GITDIR/REBASE_HEAD" ] || [ -d "$GITDIR/rebase-merge" ] || [ -d "$GITDIR/rebase-apply" ]; then
98
222
  HOLD_REASONS="${HOLD_REASONS} - Git rebase in progress. Compacting mid-rebase loses the operation's context.
99
223
  Resolve: finish or abort the rebase before /compact."$'\n'
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "agentic-sdlc-wizard",
3
- "version": "1.44.1",
3
+ "version": "1.46.0",
4
4
  "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
5
5
  "bin": {
6
6
  "sdlc-wizard": "cli/bin/sdlc-wizard.js"
@@ -131,9 +131,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
131
131
 
132
132
  ```
133
133
  Installed: 1.24.0
134
- Latest: 1.44.1
134
+ Latest: 1.46.0
135
135
 
136
136
  What changed:
137
+ - [1.46.0] PreCompact dry-run env vars — closes #240. Smoke-testing PreCompact previously required cp'ing real `.reviews/handoff.json` and `.git/` aside, fabricating fake state, restoring — error-prone (consumer clobbered real handoff.json mid-test). Two new env vars: `SDLC_DRY_RUN_HANDOFF_STATUS=PENDING_RECHECK|CERTIFIED|...` simulates handoff status (overrides the real file read); `SDLC_DRY_RUN_GIT_STATE=rebase|merge|cherry-pick` simulates an in-flight git op (no real .git/ needed). Empty/unset → real-state checks (no behavior change). Unknown values (typos) → fall back to real check, NOT silent bypass — Codex round 1 caught the bypass risk and we fixed it with a DRY_RUN_GIT_HANDLED flag. 7 new test-hooks tests. Codex round 2 CERTIFIED 10/10.
138
+ - [1.45.0] PreCompact path (c) — SHA-ancestry self-heal — closes #257. Solo-developer pattern: write fixes, commit them, run targeted Codex recheck, see CERTIFIED in `.reviews/latest-review.md`, ship the feature. Forgetting to bump `handoff.json status` from `PENDING_RECHECK` → `CERTIFIED` is realistic — the file is buried and the visible signals (commits + review file) already say "done". PreCompact hook now self-heals silently when: (a) handoff is `PENDING_*` with no `pr_number`, (b) every SHA cited in `fixes_applied[]` is reachable from HEAD via `git merge-base --is-ancestor`, AND (c) `.reviews/latest-review.md` contains CERTIFIED without `NOT CERTIFIED`. Bracket extraction is escape-aware (depth counter + JSON `\\\"` handling) so `]` inside string literals doesn't terminate the array early. UUIDs (8-4-4-4-12 hex) stripped before SHA extraction so ticket IDs in fixes_applied don't false-block the heal. 9 new tests, Codex round 3 CERTIFIED 10/10.
137
139
  - [1.44.1] Autocompact compound-misconfig detection — closes #207. Consumer reported autocompact firing at 12% context on a fresh opus[1m] session because they set BOTH `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` AND `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (a natural misreading of the "or"-joined override cell). The two compound: 30% × 400K = 120K trigger ≈ 12% of 1M. Three-pronged fix: (a) wizard doc clarifies alternatives with a `> ⚠ Do NOT set both` callout that shows the compound math; (b) `instructions-loaded-check.sh` (InstructionsLoaded hook) detects when both env vars are set in `.claude/settings.json`, computes the effective trigger, and warns with the math; (c) shipped `skills/sdlc/SKILL.md` was still calling opus[1m] the "default" (stale post-#198) AND repeating the same ambiguous wording — both fixed. 4 new hook tests + 3 new doc-consistency tests + size-cap fixture extended. Codex round 2 CERTIFIED 9/10.
138
140
  - [1.44.0] Install-path & cache hygiene — closes #254, #239, #238 filed by consumer codeguesser after upgrading 1.32.0 → 1.42.1. (1) `cli/init.js` FILES list now ships `hooks/_find-sdlc-root.sh` — the helper sourced by all 5 hooks was missing from npm install path, so every session emitted `_find-sdlc-root.sh: No such file or directory` + `dedupe_plugin_or_project: command not found` and the SDLC walk-up logic was silently dead. (2) `init --force` now invalidates `~/.cache/sdlc-wizard/latest-version` so post-upgrade hooks re-fetch fresh values from npm instead of serving the pre-upgrade cache for 24h (which produced reverse "1.42.1 → 1.41.1" nudges). (3) instructions-loaded-check.sh now uses semver-direction comparison via new `semver_lt` function: nudge only fires when installed < latest, equality is silent, reverse direction is silent. Cache sanity-check rejects poisoned values (cached "latest" < installed → force refetch). (4) When `npm view` fails AND cache empty, hook now surfaces a one-line warning instead of going silent. (5) Dual-channel install nudge gains an opt-in silence sentinel — set via `mkdir -p $SDLC_WIZARD_CACHE_DIR && touch $SDLC_WIZARD_CACHE_DIR/dual-channel-acknowledged` (printed inside the nudge itself for discoverability). 8 new tests across test-cli.sh + test-hooks.sh, Codex CERTIFIED 10/10 round 2.
139
141
  - [1.43.0] Token-spike anomaly detection — ROADMAP #220 closure. New `hooks/token-spike-check.sh` (SessionStart, opt-in via `.metrics/`) ingests CC transcript usage (`input_tokens` / `output_tokens` / `cache_creation_input_tokens` / `cache_read_input_tokens`) into `.metrics/token-history.jsonl`, then warns when the last session's `costly_tokens` (input + cache_creation + output, excluding the cheap cache_read tier) exceeds median + 2σ over a rolling baseline. Catches silent CC-side caching regressions (per Anthropic's 2026-04-23 post-mortem) before they surface on the invoice. Uses MAD-based spread for the median metric so a single baseline outlier doesn't mask the next spike. 14 quality tests in `tests/test-token-spike.sh` (incl. malicious-transcript privacy probe, flat-baseline floor, median-vs-mean contrast, concurrent-ingest mkdir lock).