agentic-sdlc-wizard 1.44.0 → 1.45.0
- package/.claude-plugin/marketplace.json +1 -1
- package/.claude-plugin/plugin.json +1 -1
- package/CHANGELOG.md +33 -0
- package/CLAUDE_CODE_SDLC_WIZARD.md +52 -3
- package/hooks/instructions-loaded-check.sh +22 -0
- package/hooks/precompact-seam-check.sh +81 -5
- package/package.json +1 -1
- package/skills/sdlc/SKILL.md +6 -4
- package/skills/update/SKILL.md +3 -1
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,39 @@ All notable changes to the SDLC Wizard.
|
|
|
4
4
|
|
|
5
5
|
> **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
|
|
6
6
|
|
|
7
|
+
## [1.45.0] - 2026-04-27
|
|
8
|
+
|
|
9
|
+
### Added
|
|
10
|
+
|
|
11
|
+
- **PreCompact path (c) — SHA-ancestry self-heal** (closes #257). Consumer reported PreCompact blocking `/compact` even when the cited Codex review WAS actually CERTIFIED — the user just forgot to bump `handoff.json status` from `PENDING_RECHECK` → `CERTIFIED`. Existing self-heals don't cover this solo-developer pattern: path (a) needs `pr_number`, path (b) needs `mtime > 14d`. New path (c) heals when: handoff is `PENDING_*` with no `pr_number`, every SHA cited in `fixes_applied[]` is reachable from HEAD (`git merge-base --is-ancestor`), AND `.reviews/latest-review.md` contains `CERTIFIED` without `NOT CERTIFIED`. Path (b) still runs if (c) abstains (no SHAs / no review file).
|
|
12
|
+
- **Robust extraction**: awk extracts the `fixes_applied[]` block via bracket-depth + escape-aware string-literal tracking. `]` inside string literals (e.g. `"[x] FIXED..."` markdown checkboxes, `"...\"]"` escaped-quote-bracket) does NOT terminate the array prematurely.
|
|
13
|
+
- **UUID resilience**: strips 8-4-4-4-12 hex UUIDs before SHA extraction so ticket IDs in fixes_applied entries (Linear, Jira, mission UUIDs) don't false-block the heal.
|
|
14
|
+
- **Phantom SHA gate**: every cited SHA must pass `git merge-base --is-ancestor` against HEAD. Phantom SHAs (typos, references to other repos) correctly fail and block the heal.
|
|
15
|
+
- 9 new test-hooks tests (positive heal, phantom blocks, NOT CERTIFIED blocks, missing review.md blocks, partial coverage blocks, fall-through to stale, markdown-checkbox bracket, UUID alongside real SHA, escaped-quote bracket). Codex round 3 CERTIFIED 10/10 (rounds 1-2 surfaced bracket-extraction edge cases — markdown `[x]`, escaped quotes — and UUID false-block; all fixed).
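The UUID-resilience step above can be sketched in isolation. This is a minimal illustration, not the shipped hook: `extract_sha_candidates` is a hypothetical helper name, the sample entry is made up, and `grep -w` is swapped in for the hook's `\b` word boundaries. The `sed` expression is the same one the hook uses.

```shell
# Hypothetical helper (not part of the package): strip 8-4-4-4-12 hex UUIDs,
# then list the remaining 7-40 character lowercase-hex words as SHA candidates.
extract_sha_candidates() {
  printf '%s\n' "$1" \
    | sed -E 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}//g' \
    | grep -owE '[0-9a-f]{7,40}' \
    | sort -u
}

# Made-up fixes_applied entry: one short SHA plus a ticket UUID.
entry='"[x] FIXED in a1b2c3d (ticket 123e4567-e89b-12d3-a456-426614174000)"'
extract_sha_candidates "$entry"
```

Without the `sed` step, the UUID segments `123e4567` and `426614174000` would also be reported as candidates and would then fail the ancestry check.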
|
|
16
|
+
|
|
17
|
+
### Files
|
|
18
|
+
|
|
19
|
+
- `hooks/precompact-seam-check.sh` — new path (c) block with depth-counted + escape-aware awk extraction, sed UUID strip, ancestry check
|
|
20
|
+
- `tests/test-hooks.sh` — `_precompact_init_repo_with_commit` helper + 9 new path (c) tests
|
|
21
|
+
|
|
22
|
+
## [1.44.1] - 2026-04-27
|
|
23
|
+
|
|
24
|
+
### Fixed
|
|
25
|
+
|
|
26
|
+
- **Autocompact compound-misconfig detection** — closes #207. Consumer reported autocompact firing at 12% context on a fresh `opus[1m]` session because they set BOTH `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` AND `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (a natural misreading of `CLAUDE_CODE_SDLC_WIZARD.md:1008`'s "or"-joined cell). The two compound: `30% × 400000 = 120000 tokens ≈ 12% of 1M`.
|
|
27
|
+
- **Doc fix**: `CLAUDE_CODE_SDLC_WIZARD.md` 1M-vs-200K table now writes `**OR** ... (pick one)` and adds a `> ⚠ Do NOT set both` callout that explains the compound math and points at the runtime detection.
|
|
28
|
+
- **Runtime detection**: `instructions-loaded-check.sh` (InstructionsLoaded hook) reads `.claude/settings.json` for both env vars, computes the effective trigger, and warns with the math when both are set — diagnosable from the warning alone.
|
|
29
|
+
- **Shipped skill drift**: `skills/sdlc/SKILL.md` was still calling `opus[1m]` the "default" (stale post-#198) AND repeating the same ambiguous "30 or 400000" wording it ships to consumers. Both fixed: opus[1m] now framed as opt-in with #198 reference; autocompact tuning line says "pick ONE of: ... OR ... (do NOT set both)".
|
|
30
|
+
- 4 new test-hooks tests (warns / silent on PCT-only / silent on WINDOW-only / shows effective trigger), 3 new test-doc-consistency tests (wizard doc + sdlc skill regression guards), size-cap test fixture extended to include the new branch (cap raised 1500 → 1700 to accommodate). Codex round 2 CERTIFIED 9/10 (round 1 surfaced the size-cap, shipped-skill drift, and InstructionsLoaded vs SessionStart wording — all fixed).
|
|
31
|
+
|
|
32
|
+
### Files
|
|
33
|
+
|
|
34
|
+
- `hooks/instructions-loaded-check.sh` — new compound-misconfig detection block (single-line warning with full env var names + effective trigger math)
|
|
35
|
+
- `CLAUDE_CODE_SDLC_WIZARD.md` — line 1008 alternatives clarification + `> ⚠ Do NOT set both` callout
|
|
36
|
+
- `skills/sdlc/SKILL.md` — `opus[1m]` reframed `Default` → `Opt-in` (matches wizard doc post-#198); autocompact tuning line now warns against the compound config
|
|
37
|
+
- `tests/test-hooks.sh` — 4 new tests + size-cap fixture extended + cap raised
|
|
38
|
+
- `tests/test-doc-consistency.sh` — 3 new regression guards (wizard doc + sdlc skill)
|
|
39
|
+
|
|
7
40
|
## [1.44.0] - 2026-04-27
|
|
8
41
|
|
|
9
42
|
### Fixed
|
package/CLAUDE_CODE_SDLC_WIZARD.md
CHANGED
|
@@ -1005,7 +1005,9 @@ Claude Code supports both 200K and 1M context windows. **`opus[1m]` is an opt-in
|
|
|
1005
1005
|
| **Cost** | Standard pricing | Anthropic currently lists the 1M window at standard pricing across the full context for supported Opus/Sonnet models — **verify current rates at [docs.anthropic.com/pricing](https://docs.anthropic.com/)** before assuming no premium |
|
|
1006
1006
|
| **Auto-mode** | **Enabled** — Claude Code chooses model per turn | **Disabled** — top-level `model` tells CC you've chosen explicitly |
|
|
1007
1007
|
| **Auto-compact** | Default ~95% works well | Fires at ~76K by default ([issue #34332](https://github.com/anthropics/claude-code/issues/34332)) — pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` |
|
|
1008
|
-
| **Suggested override (if you pin)** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`
|
|
1008
|
+
| **Suggested override (if you pin)** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` **OR** `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (pick one) |
|
|
1009
|
+
|
|
1010
|
+
> **⚠ Do NOT set both.** `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` and `CLAUDE_CODE_AUTO_COMPACT_WINDOW` are alternatives, not complementary. Setting both compounds: `30% × 400000 = 120000` tokens, which is ~12% of a 1M window — autocompact fires almost immediately, destroying the headroom you opted in for. Pick one knob: either lower the trigger percentage (`PCT_OVERRIDE=30`) on the model's default 1M window, OR cap the working window (`AUTO_COMPACT_WINDOW=400000`) at the model's default 95% trigger. The `instructions-loaded-check.sh` `InstructionsLoaded` hook (fires on session start/resume) detects this misconfig and prints the effective trigger so you can debug from the warning alone (#207).
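The compound math in the callout can be checked with a few lines of shell. This is a sketch under the assumption that the two values simply multiply, as the callout describes; `effective_trigger` is a hypothetical helper name, and it uses the same integer arithmetic as the warning in `instructions-loaded-check.sh`.

```shell
# Hypothetical helper (not part of the package): effective autocompact
# trigger when BOTH knobs are set, expressed in tokens and as a share of 1M.
effective_trigger() {
  pct="$1"   # CLAUDE_AUTOCOMPACT_PCT_OVERRIDE, e.g. 30
  win="$2"   # CLAUDE_CODE_AUTO_COMPACT_WINDOW, e.g. 400000
  trigger=$(( pct * win / 100 ))            # pct% of the capped window
  pct_of_1m=$(( trigger * 100 / 1000000 ))  # share of a 1M context window
  echo "${trigger} tokens (~${pct_of_1m}% of 1M)"
}

effective_trigger 30 400000   # prints: 120000 tokens (~12% of 1M)
```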
|
|
1009
1011
|
|
|
1010
1012
|
**Why `opus[1m]` is opt-in (issue #198):**
|
|
1011
1013
|
- **Pinning disables auto-mode.** Max-plan users pay for Claude Code's per-turn model selection (Sonnet for cheap tasks, Opus for hard ones, plus weekly-limit smoothing). A top-level `model` gives that up.
|
|
@@ -2715,6 +2717,53 @@ Options:
|
|
|
2715
2717
|
|
|
2716
2718
|
---
|
|
2717
2719
|
|
|
2720
|
+
### Browser Tooling Policy
|
|
2721
|
+
|
|
2722
|
+
Three different jobs, three different tools. Conflating them is the source of recurring agent failures — `Playwright MCP` for an auth-heavy registrar dashboard wastes a session, browser-use for a deterministic regression test gives flaky CI.
|
|
2723
|
+
|
|
2724
|
+
| Tool | Job | Profile model | When to pick |
|
|
2725
|
+
|------|-----|---------------|--------------|
|
|
2726
|
+
| **Playwright tests** | Deterministic regression suite, CI/release gate | Isolated by design — each test gets a clean browser context per [Playwright docs](https://playwright.dev/docs/browser-contexts) | Asserting expected user flows; running on every PR; gating deploy |
|
|
2727
|
+
| **Playwright MCP** | Live browser debugging, visual QA, DOM inspection | Default mode uses a **persistent Playwright-managed profile** at `ms-playwright/mcp-{channel}-{workspace-hash}` ([docs](https://playwright.dev/docs/getting-started-mcp#user-profile)) — NOT the user's regular Chrome profile. Other modes: `--isolated` (ephemeral context per session), `--user-data-dir=PATH` (caller-supplied dir), CDP attach (`--cdp-endpoint`), extension mode (attach to user's running browser tab) | One-off "look at this page" / "click this button and tell me what happens"; visual verification mid-session |
|
|
2728
|
+
| **Real-browser tooling** (browser-use, CDP-attach, Chrome profile) | Authenticated, profile-dependent, stateful operator flows | **When configured for real-browser mode** (e.g., browser-use `Browser.from_system_chrome()` or CLI `--profile`/`connect`), uses the user's actual Chrome profile (cookies, extensions, logged-in sessions). Default browser-use CLI runs headless Chromium — opt into the real-profile mode explicitly | Registrar dashboards (Porkbun, GoDaddy), DNS setup, cloud-provider consoles, wallet-adjacent Web3 flows, logged-in admin panels — anywhere preserving cookies/profile/extensions matters more than clean isolation |
|
|
2729
|
+
|
|
2730
|
+
**The core insight:** Playwright tests' isolation is a *feature*, not a bug. Playwright MCP's persistent-managed-profile default is also a feature — it preserves session continuity across debug interactions in a SINGLE agent. The collision case is concurrent agents (#251) sharing the same managed profile. Real-browser tooling is the right call only when the task IS the user's authenticated session, and only when explicitly configured to attach to that profile.
|
|
2731
|
+
|
|
2732
|
+
#### When to recommend real-browser tooling (#225)
|
|
2733
|
+
|
|
2734
|
+
Trigger examples — if the task description includes any of these, suggest real-browser tooling (browser-use or CDP-attach) over Playwright MCP:
|
|
2735
|
+
|
|
2736
|
+
- Registrar dashboards (domain purchase, DNS records, nameserver changes)
|
|
2737
|
+
- DNS setup / DNSLink / custom-domain configuration
|
|
2738
|
+
- Cloud/provider dashboards (AWS console, Cloudflare, Vercel, registrars)
|
|
2739
|
+
- Wallet-adjacent Web3 flows (token approvals, contract interactions in a logged-in MetaMask)
|
|
2740
|
+
- Logged-in admin panels (Stripe, Vercel, GitHub admin pages requiring 2FA-cached session)
|
|
2741
|
+
- Anywhere preserving cookies/profile/extensions matters more than clean isolation
|
|
2742
|
+
|
|
2743
|
+
These all share a property: the agent's job IS the authenticated session. A clean automation browser is the wrong model.
|
|
2744
|
+
|
|
2745
|
+
#### Playwright MCP profile-lock policy (#251)
|
|
2746
|
+
|
|
2747
|
+
`Playwright MCP`'s default mode reuses a single persistent managed profile (`ms-playwright/mcp-{channel}-{workspace-hash}`) across stdio sessions — which is the right call for single-agent debugging because it preserves session continuity across calls, but breaks down when **multiple agents or MCP clients run concurrently** (two CC sessions, one CC + one Codex, etc.). Concurrent stdio sessions collide on the same Chrome user-data directory and corrupt each other's session state.
|
|
2748
|
+
|
|
2749
|
+
**Upstream Playwright rejected default-isolated as the global default.** See [microsoft/playwright#40419](https://github.com/microsoft/playwright/issues/40419) and the discussion on [microsoft/playwright#40420](https://github.com/microsoft/playwright/pull/40420) — maintainer feedback: *"That's unfortunately very breaking."* Changing the default would silently break every existing single-agent setup that relies on the persistent managed profile.
|
|
2750
|
+
|
|
2751
|
+
**Wizard policy (per-user, opt-in at the wizard layer, not upstream):**
|
|
2752
|
+
|
|
2753
|
+
- **Single-agent / single MCP client (default):** Use Playwright MCP's default persistent-managed-profile mode. No special config required.
|
|
2754
|
+
- **Concurrent agents / multiple MCP clients:** Pick one of these per client to avoid profile-lock collisions: (a) `--isolated` (ephemeral context per session, no persistence), (b) `--user-data-dir=$TMPDIR/playwright-mcp-$AGENT_ID` (caller-supplied dir, isolated per agent), or (c) `--cdp-endpoint` to attach each agent to a separately-launched browser. None of these require an upstream breaking change.
|
|
2755
|
+
- **Real-browser / profile-dependent flows:** Don't use Playwright MCP at all — use real-browser tooling explicitly configured to attach to the user's profile (e.g., `browser-use` with `Browser.from_system_chrome()` or CLI `--profile`). The task is the session, not isolated automation.
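Option (b) for concurrent agents could be wired up roughly as follows. This is a sketch only: `playwright_mcp_profile_flag` is a hypothetical helper, the agent ID `codex-1` is a placeholder, and the fallback to `/tmp` when `TMPDIR` is unset is an assumption rather than wizard behavior.

```shell
# Hypothetical helper (not part of the package): build a per-agent
# --user-data-dir flag so concurrent MCP clients never share a profile dir.
playwright_mcp_profile_flag() {
  agent_id="$1"
  base="${TMPDIR:-/tmp}"   # assumption: fall back to /tmp if TMPDIR is unset
  # ${base%/} drops a trailing slash (macOS TMPDIR values end with one).
  echo "--user-data-dir=${base%/}/playwright-mcp-${agent_id}"
}

playwright_mcp_profile_flag codex-1
```

The printed flag would be appended to whatever command launches the Playwright MCP server for that particular client; each agent gets its own ID and therefore its own directory.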
|
|
2756
|
+
|
|
2757
|
+
This rule is per-workflow, not global. Setup wizard does NOT auto-configure isolated profiles — adoption is explicit, gated on the user signaling concurrent-agent intent.
|
|
2758
|
+
|
|
2759
|
+
#### Anti-patterns
|
|
2760
|
+
|
|
2761
|
+
- **Using Playwright MCP for registrar dashboards** — its persistent managed profile is NOT your real Chrome profile, so your registrar's logged-in session/2FA cookies aren't there. You'll be re-logging in on every interaction. Use real-browser tooling configured for your real Chrome profile instead.
|
|
2762
|
+
- **Using profile-coupled / stateful browser tooling for deterministic CI tests** — when browser-use (or any tool) is configured to use a real Chrome profile, cached state, extension chrome, and stale cookies pollute the test. Use Playwright tests with isolated browser contexts for CI.
|
|
2763
|
+
- **Setting Playwright MCP `--isolated` globally as a default** — breaks single-agent flows that rely on the persistent managed profile for session continuity across debug interactions. Upstream Playwright rejected this for the same reason. Make it explicit per-workflow when concurrent agents are running.
|
|
2764
|
+
|
|
2765
|
+
---
|
|
2766
|
+
|
|
2718
2767
|
## Step 8: Create CLAUDE.md
|
|
2719
2768
|
|
|
2720
2769
|
Create `CLAUDE.md` in your project root. This is your project-specific configuration:
|
|
@@ -2918,7 +2967,7 @@ If deployment fails or post-deploy verification catches issues:
|
|
|
2918
2967
|
|
|
2919
2968
|
**SDLC.md:**
|
|
2920
2969
|
```markdown
|
|
2921
|
-
<!-- SDLC Wizard Version: 1.
|
|
2970
|
+
<!-- SDLC Wizard Version: 1.45.0 -->
|
|
2922
2971
|
<!-- Setup Date: [DATE] -->
|
|
2923
2972
|
<!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
|
|
2924
2973
|
<!-- Git Workflow: [PRs or Solo] -->
|
|
@@ -3983,7 +4032,7 @@ Walk through updates? (y/n)
|
|
|
3983
4032
|
Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
|
|
3984
4033
|
|
|
3985
4034
|
```markdown
|
|
3986
|
-
<!-- SDLC Wizard Version: 1.
|
|
4035
|
+
<!-- SDLC Wizard Version: 1.45.0 -->
|
|
3987
4036
|
<!-- Setup Date: 2026-01-24 -->
|
|
3988
4037
|
<!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
|
|
3989
4038
|
<!-- Git Workflow: PRs -->
|
package/hooks/instructions-loaded-check.sh
CHANGED
|
@@ -170,6 +170,28 @@ fi
|
|
|
170
170
|
# this hook and model-effort-check.sh both fire on SessionStart, so two checks
|
|
171
171
|
# would double-print the nudge and risk drifting out of sync.
|
|
172
172
|
|
|
173
|
+
# Autocompact compound-misconfig check (#207). Setting BOTH
|
|
174
|
+
# CLAUDE_AUTOCOMPACT_PCT_OVERRIDE and CLAUDE_CODE_AUTO_COMPACT_WINDOW
|
|
175
|
+
# compounds — e.g. 30% × 400000 = 120000 token trigger, which on a 1M
|
|
176
|
+
# window fires at ~12% of context. The wizard doc lists them as
|
|
177
|
+
# alternatives ("PCT_OVERRIDE=30 OR AUTO_COMPACT_WINDOW=400000") but the
|
|
178
|
+
# "or" is easy to misread, and the consumer in #207 hit autocompact at
|
|
179
|
+
# 12% in a fresh session. Surface the misconfig with the effective
|
|
180
|
+
# trigger so it's diagnosable from the warning alone.
|
|
181
|
+
SETTINGS_JSON="$PROJECT_DIR/.claude/settings.json"
|
|
182
|
+
if [ -f "$SETTINGS_JSON" ]; then
|
|
183
|
+
AC_PCT=$(grep -o '"CLAUDE_AUTOCOMPACT_PCT_OVERRIDE"[[:space:]]*:[[:space:]]*"[0-9]*"' "$SETTINGS_JSON" \
|
|
184
|
+
| head -1 | sed 's/.*"\([0-9]*\)"$/\1/')
|
|
185
|
+
AC_WIN=$(grep -o '"CLAUDE_CODE_AUTO_COMPACT_WINDOW"[[:space:]]*:[[:space:]]*"[0-9]*"' "$SETTINGS_JSON" \
|
|
186
|
+
| head -1 | sed 's/.*"\([0-9]*\)"$/\1/')
|
|
187
|
+
if [ -n "$AC_PCT" ] && [ -n "$AC_WIN" ]; then
|
|
188
|
+
# Effective trigger = pct% of window (integer math; both pure digits per the regex).
|
|
189
|
+
AC_TRIGGER=$(( AC_PCT * AC_WIN / 100 ))
|
|
190
|
+
AC_PCT_OF_1M=$(( AC_TRIGGER * 100 / 1000000 ))
|
|
191
|
+
echo "WARNING: autocompact compound misconfig — CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=${AC_PCT} AND CLAUDE_CODE_AUTO_COMPACT_WINDOW=${AC_WIN} both set in .claude/settings.json compound to ${AC_TRIGGER} tokens (~${AC_PCT_OF_1M}% of 1M). Pick one — see wizard doc '1M vs 200K' (#207)."
|
|
192
|
+
fi
|
|
193
|
+
fi
|
|
194
|
+
|
|
173
195
|
# Dual-channel install check (#181) — nudge when CLI skills + Claude plugin both present.
|
|
174
196
|
# #238: silenced once the user opts in via an ack sentinel. Sentinel is per-host
|
|
175
197
|
# (lives under $SDLC_WIZARD_CACHE_DIR/dual-channel-acknowledged) since the dual
|
package/hooks/precompact-seam-check.sh
CHANGED
|
@@ -33,10 +33,16 @@ HOLD_REASONS=""
|
|
|
33
33
|
# Check 1: Codex review mid-cycle
|
|
34
34
|
# Self-heal paths (ordered by preference):
|
|
35
35
|
# (a) #209: handoff has pr_number + gh reports PR MERGED → implicit CERTIFIED (silent)
|
|
36
|
-
# (
|
|
37
|
-
#
|
|
38
|
-
#
|
|
39
|
-
#
|
|
36
|
+
# (c) #257: handoff has no pr_number BUT every SHA cited in fixes_applied[]
|
|
37
|
+
# is in HEAD's ancestry AND .reviews/latest-review.md contains CERTIFIED
|
|
38
|
+
# (without "NOT CERTIFIED") → implicit CERTIFIED (silent). Catches the
|
|
39
|
+
# solo-developer pattern: write fixes, commit them, run targeted
|
|
40
|
+
# recheck, see CERTIFIED in latest-review.md, ship — and forget to
|
|
41
|
+
# update handoff.json status. The visible signals (commits landed +
|
|
42
|
+
# review file) already say "done" so blocking is high false-positive.
|
|
43
|
+
# (b) #229: handoff has no pr_number, no SHA-ancestry heal, but mtime >
|
|
44
|
+
# SDLC_HANDOFF_STALE_DAYS days → implicit CERTIFIED with WARN
|
|
45
|
+
# (forgotten artifact; blocking forever is worse UX). Default: 14 days.
|
|
40
46
|
HANDOFF="$ROOT/.reviews/handoff.json"
|
|
41
47
|
# Validate SDLC_HANDOFF_STALE_DAYS as non-negative integer. Anything else
|
|
42
48
|
# (empty, "foo", "-3", "10.5") silently falls back to 14 — we don't want a
|
|
@@ -62,8 +68,77 @@ if [ -f "$HANDOFF" ]; then
|
|
|
62
68
|
[ "$PR_STATE" = "MERGED" ] && HEALED=1
|
|
63
69
|
fi
|
|
64
70
|
else
|
|
71
|
+
# Path (c) #257: SHA-ancestry self-heal. Look for git SHAs cited
|
|
72
|
+
# in fixes_applied[]; if every cited SHA is reachable from HEAD
|
|
73
|
+
# AND .reviews/latest-review.md says CERTIFIED, the review IS
|
|
74
|
+
# closed, the user just forgot to bump status. Silent heal.
|
|
75
|
+
REVIEW_MD="$ROOT/.reviews/latest-review.md"
|
|
76
|
+
if [ -f "$REVIEW_MD" ] \
|
|
77
|
+
&& grep -qE '\bCERTIFIED\b' "$REVIEW_MD" 2>/dev/null \
|
|
78
|
+
&& ! grep -qE '\bNOT CERTIFIED\b' "$REVIEW_MD" 2>/dev/null; then
|
|
79
|
+
# Extract the fixes_applied[] block via bracket-depth
|
|
80
|
+
# tracking — naive `/\]/` matching breaks on `]` inside
|
|
81
|
+
# string literals (e.g. "[x] FIXED in <sha>" markdown
|
|
82
|
+
# checkboxes), which would let phantom SHAs after the
|
|
83
|
+
# broken-early bracket leak past path (c) and false-heal.
|
|
84
|
+
# Codex P1 round 1.
|
|
85
|
+
FIXES_BLOCK=$(awk '
|
|
86
|
+
BEGIN { in_block = 0; depth = 0; started = 0 }
|
|
87
|
+
/"fixes_applied"/ { in_block = 1 }
|
|
88
|
+
in_block {
|
|
89
|
+
print
|
|
90
|
+
in_string = 0
|
|
91
|
+
escaped = 0
|
|
92
|
+
for (i = 1; i <= length($0); i++) {
|
|
93
|
+
c = substr($0, i, 1)
|
|
94
|
+
# Honor JSON backslash escapes: \" inside a
|
|
95
|
+
# string is a literal quote, NOT a string
|
|
96
|
+
# terminator. Without this, a fixes_applied
|
|
97
|
+
# entry containing `\"]` falsely flips the
|
|
98
|
+
# in_string flag and exits the array early —
|
|
99
|
+
# letting later phantom SHAs leak past path
|
|
100
|
+
# (c) and false-heal (Codex round 2 P1).
|
|
101
|
+
if (escaped) { escaped = 0; continue }
|
|
102
|
+
if (c == "\\") { escaped = 1; continue }
|
|
103
|
+
if (c == "\"") { in_string = !in_string; continue }
|
|
104
|
+
if (in_string) continue
|
|
105
|
+
if (c == "[") { depth++; started = 1 }
|
|
106
|
+
else if (c == "]") { depth-- }
|
|
107
|
+
}
|
|
108
|
+
if (started && depth <= 0) in_block = 0
|
|
109
|
+
}
|
|
110
|
+
' "$HANDOFF" 2>/dev/null)
|
|
111
|
+
if [ -n "$FIXES_BLOCK" ] && [ -d "$ROOT/.git" ]; then
|
|
112
|
+
# Strip UUIDs (8-4-4-4-12 hex pattern) BEFORE extracting
|
|
113
|
+
# SHA candidates. UUIDs have a fixed shape; their hex
|
|
114
|
+
# segments would otherwise match \b[0-9a-f]{7,40}\b and
|
|
115
|
+
# fail the ancestry check, false-blocking certified
|
|
116
|
+
# reviews that cite UUIDs in fixes_applied (mission
|
|
117
|
+
# UUIDs, Linear/Jira ticket IDs, etc.). Codex P2 round 1.
|
|
118
|
+
# POSIX-compatible: no `\b` (BSD sed doesn't support it).
|
|
119
|
+
# The hyphenated 8-4-4-4-12 shape is specific enough
|
|
120
|
+
# that false-stripping a real SHA is implausible.
|
|
121
|
+
CLEANED=$(echo "$FIXES_BLOCK" | sed -E 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}//g')
|
|
122
|
+
SHAS=$(echo "$CLEANED" | grep -oE '\b[0-9a-f]{7,40}\b' | sort -u)
|
|
123
|
+
if [ -n "$SHAS" ]; then
|
|
124
|
+
# Every cited SHA must be reachable from HEAD —
|
|
125
|
+
# phantom SHAs (e.g. typos, references to other
|
|
126
|
+
# repos) correctly fail ancestry and block the heal.
|
|
127
|
+
ALL_IN_HEAD=1
|
|
128
|
+
for sha in $SHAS; do
|
|
129
|
+
if ! git -C "$ROOT" merge-base --is-ancestor "$sha" HEAD 2>/dev/null; then
|
|
130
|
+
ALL_IN_HEAD=0
|
|
131
|
+
break
|
|
132
|
+
fi
|
|
133
|
+
done
|
|
134
|
+
[ "$ALL_IN_HEAD" -eq 1 ] && HEALED=1
|
|
135
|
+
fi
|
|
136
|
+
fi
|
|
137
|
+
fi
|
|
65
138
|
# Path (b): stale-handoff auto-expire (#229). Only when no pr_number
|
|
66
|
-
#
|
|
139
|
+
# AND path (c) didn't already heal. We must not short-circuit
|
|
140
|
+
# PR-linked reviews.
|
|
141
|
+
if [ "$HEALED" -ne 1 ]; then
|
|
67
142
|
# Try GNU stat first (Linux: `-c %Y` gives mtime, BSD stat errors out
|
|
68
143
|
# so `||` fires). Then BSD stat (macOS: `-f %m` gives mtime). The
|
|
69
144
|
# reverse order fails on Linux because `stat -f` on GNU means
|
|
@@ -83,6 +158,7 @@ if [ -f "$HANDOFF" ]; then
|
|
|
83
158
|
;;
|
|
84
159
|
esac
|
|
85
160
|
fi
|
|
161
|
+
fi
|
|
86
162
|
if [ "$HEALED" -ne 1 ]; then
|
|
87
163
|
HOLD_REASONS="${HOLD_REASONS} - Codex review is ${STATUS}. Round-1 evidence lives in this context — compacting now loses what round-2 needs to re-verify.
|
|
88
164
|
Resolve: wait for CERTIFIED (or escalate) before /compact."$'\n'
|
package/package.json
CHANGED
package/skills/sdlc/SKILL.md
CHANGED
|
@@ -170,16 +170,18 @@ When auto-approving, still announce your approach — just don't wait for approv
|
|
|
170
170
|
|
|
171
171
|
## Recommended Model
|
|
172
172
|
|
|
173
|
-
**
|
|
173
|
+
**Opt-in: `opus[1m]` (Opus 4.7 with 1M context window).** Run `/model opus[1m]` at the start of any non-trivial SDLC session — but understand the tradeoff first (issue #198).
|
|
174
174
|
|
|
175
|
-
**Why:**
|
|
175
|
+
**Why opt-in, not default:** A top-level `model` pin in `.claude/settings.json` disables Claude Code's per-turn model auto-selection. That's a real cost — Max-plan users pay for that auto-selection (Sonnet for cheap tasks, Opus for hard ones, plus weekly-limit smoothing). Pin only when you actually need the 1M headroom.
|
|
176
|
+
|
|
177
|
+
**Why pin to `opus[1m]` when you do opt in:**
|
|
176
178
|
- SDLC sessions (plan → TDD → review → CI shepherd) accumulate context fast — plans, test output, diffs, review artifacts. 200K fills up before you're done.
|
|
177
179
|
- Forced auto-compact mid-task loses your working state. Extra headroom is cheaper than re-reading files.
|
|
178
180
|
- At time of writing, Anthropic lists 1M context at standard pricing for supported Opus/Sonnet models — verify current rates for your plan before relying on this.
|
|
179
181
|
|
|
180
182
|
**Requires Claude Code v2.1.111+** for Opus 4.7.
|
|
181
183
|
|
|
182
|
-
**Pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30
|
|
184
|
+
**Pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`** when you opt in. Without it, CC's default auto-compact on 1M fires at ~76K and defeats the purpose. The setup wizard's Step 9.5 prompts to write both together (template ships with neither, opt-in only).
|
|
183
185
|
|
|
184
186
|
**Fall back to `opus` (200K) only when:** your plan charges a premium for long-context prompts, the task is genuinely short (<30K), or team cost controls flag >200K prompts. See the "1M vs 200K Context Window" section in `CLAUDE_CODE_SDLC_WIZARD.md` for details.
|
|
185
187
|
|
|
@@ -606,7 +608,7 @@ CI passes -> Read review suggestions
|
|
|
606
608
|
- `/clear` after 2+ failed corrections (context polluted — start fresh with better prompt)
|
|
607
609
|
- Auto-compact fires at ~95% capacity — no manual management needed
|
|
608
610
|
- After committing a PR, `/clear` before starting the next feature
|
|
609
|
-
- **Autocompact tuning:** Set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` to trigger compaction earlier (75% for 200K, 30% for 1M). On 1M models, the default fires at ~76K —
|
|
611
|
+
- **Autocompact tuning:** Set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` to trigger compaction earlier (75% for 200K, 30% for 1M). On 1M models, the default fires at ~76K — pick ONE of: `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` **OR** `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (do NOT set both — they compound to 30% × 400K = 120K trigger ≈ 12% of 1M, which fires almost immediately, #207). See wizard doc "Autocompact Tuning" for full details
|
|
610
612
|
|
|
611
613
|
**`--bare` mode (v2.1.81+):** `claude -p "prompt" --bare` skips ALL hooks, skills, LSP, and plugins. This is a complete wizard bypass — no SDLC enforcement, no TDD checks, no planning hooks. Use only for scripted headless calls (CI pipelines, automation) where you explicitly don't want wizard enforcement. Never use `--bare` for normal development work.
|
|
612
614
|
|
package/skills/update/SKILL.md
CHANGED
|
@@ -131,9 +131,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
|
|
|
131
131
|
|
|
132
132
|
```
|
|
133
133
|
Installed: 1.24.0
|
|
134
|
-
Latest: 1.
|
|
134
|
+
Latest: 1.45.0
|
|
135
135
|
|
|
136
136
|
What changed:
|
|
137
|
+
- [1.45.0] PreCompact path (c) — SHA-ancestry self-heal — closes #257. Solo-developer pattern: write fixes, commit them, run targeted Codex recheck, see CERTIFIED in `.reviews/latest-review.md`, ship the feature. Forgetting to bump `handoff.json status` from `PENDING_RECHECK` → `CERTIFIED` is realistic — the file is buried and the visible signals (commits + review file) already say "done". PreCompact hook now self-heals silently when: (a) handoff is `PENDING_*` with no `pr_number`, (b) every SHA cited in `fixes_applied[]` is reachable from HEAD via `git merge-base --is-ancestor`, AND (c) `.reviews/latest-review.md` contains CERTIFIED without `NOT CERTIFIED`. Bracket extraction is escape-aware (depth counter + JSON `\\\"` handling) so `]` inside string literals doesn't terminate the array early. UUIDs (8-4-4-4-12 hex) stripped before SHA extraction so ticket IDs in fixes_applied don't false-block the heal. 9 new tests, Codex round 3 CERTIFIED 10/10.
|
|
138
|
+
- [1.44.1] Autocompact compound-misconfig detection — closes #207. Consumer reported autocompact firing at 12% context on a fresh opus[1m] session because they set BOTH `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` AND `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (a natural misreading of the "or"-joined override cell). The two compound: 30% × 400K = 120K trigger ≈ 12% of 1M. Three-pronged fix: (a) wizard doc clarifies alternatives with a `> ⚠ Do NOT set both` callout that shows the compound math; (b) `instructions-loaded-check.sh` (InstructionsLoaded hook) detects when both env vars are set in `.claude/settings.json`, computes the effective trigger, and warns with the math; (c) shipped `skills/sdlc/SKILL.md` was still calling opus[1m] the "default" (stale post-#198) AND repeating the same ambiguous wording — both fixed. 4 new hook tests + 3 new doc-consistency tests + size-cap fixture extended. Codex round 2 CERTIFIED 9/10.
|
|
137
139
|
- [1.44.0] Install-path & cache hygiene — closes #254, #239, #238 filed by consumer codeguesser after upgrading 1.32.0 → 1.42.1. (1) `cli/init.js` FILES list now ships `hooks/_find-sdlc-root.sh` — the helper sourced by all 5 hooks was missing from npm install path, so every session emitted `_find-sdlc-root.sh: No such file or directory` + `dedupe_plugin_or_project: command not found` and the SDLC walk-up logic was silently dead. (2) `init --force` now invalidates `~/.cache/sdlc-wizard/latest-version` so post-upgrade hooks re-fetch fresh values from npm instead of serving the pre-upgrade cache for 24h (which produced reverse "1.42.1 → 1.41.1" nudges). (3) instructions-loaded-check.sh now uses semver-direction comparison via new `semver_lt` function: nudge only fires when installed < latest, equality is silent, reverse direction is silent. Cache sanity-check rejects poisoned values (cached "latest" < installed → force refetch). (4) When `npm view` fails AND cache empty, hook now surfaces a one-line warning instead of going silent. (5) Dual-channel install nudge gains an opt-in silence sentinel — set via `mkdir -p $SDLC_WIZARD_CACHE_DIR && touch $SDLC_WIZARD_CACHE_DIR/dual-channel-acknowledged` (printed inside the nudge itself for discoverability). 8 new tests across test-cli.sh + test-hooks.sh, Codex CERTIFIED 10/10 round 2.
|
|
138
140
|
- [1.43.0] Token-spike anomaly detection — ROADMAP #220 closure. New `hooks/token-spike-check.sh` (SessionStart, opt-in via `.metrics/`) ingests CC transcript usage (`input_tokens` / `output_tokens` / `cache_creation_input_tokens` / `cache_read_input_tokens`) into `.metrics/token-history.jsonl`, then warns when the last session's `costly_tokens` (input + cache_creation + output, excluding the cheap cache_read tier) exceeds median + 2σ over a rolling baseline. Catches silent CC-side caching regressions (per Anthropic's 2026-04-23 post-mortem) before they surface on the invoice. Uses MAD-based spread for the median metric so a single baseline outlier doesn't mask the next spike. 14 quality tests in `tests/test-token-spike.sh` (incl. malicious-transcript privacy probe, flat-baseline floor, median-vs-mean contrast, concurrent-ingest mkdir lock).
|
|
139
141
|
- [1.42.2] PreCompact self-heal documented — ROADMAP #209 closure. Added `pr_number` opt-in to all 3 handoff template schemas (skill Step 1; wizard Round 1 + cross-model section). Self-heal logic shipped earlier with #229 but was undocumented, leaving the dead-code path. New `test_handoff_template_documents_pr_number` enforces template/doc parity. Together with #229 (mtime auto-expire) closes the "stuck PENDING handoff blocks /compact forever" footgun from both directions.
|