agentic-sdlc-wizard 1.44.1 → 1.45.0
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,21 @@ All notable changes to the SDLC Wizard.
|
|
|
4
4
|
|
|
5
5
|
> **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
|
|
6
6
|
|
|
7
|
+
## [1.45.0] - 2026-04-27
|
|
8
|
+
|
|
9
|
+
### Added
|
|
10
|
+
|
|
11
|
+
- **PreCompact path (c) — SHA-ancestry self-heal** (closes #257). Consumer reported PreCompact blocking `/compact` even when the cited Codex review WAS actually CERTIFIED — the user just forgot to bump `handoff.json status` from `PENDING_RECHECK` → `CERTIFIED`. Existing self-heals don't cover this solo-developer pattern: path (a) needs `pr_number`, path (b) needs `mtime > 14d`. New path (c) heals when: handoff is `PENDING_*` with no `pr_number`, every SHA cited in `fixes_applied[]` is reachable from HEAD (`git merge-base --is-ancestor`), AND `.reviews/latest-review.md` contains `CERTIFIED` without `NOT CERTIFIED`. Path (b) still runs if (c) abstains (no SHAs / no review file).
|
|
12
|
+
- **Robust extraction**: awk extracts the `fixes_applied[]` block via bracket-depth + escape-aware string-literal tracking. `]` inside string literals (e.g. `"[x] FIXED..."` markdown checkboxes, `"...\"]"` escaped-quote-bracket) does NOT terminate the array prematurely.
|
|
13
|
+
- **UUID resilience**: strips 8-4-4-4-12 hex UUIDs before SHA extraction so ticket IDs in fixes_applied entries (Linear, Jira, mission UUIDs) don't false-block the heal.
|
|
14
|
+
- **Phantom SHA gate**: every cited SHA must pass `git merge-base --is-ancestor` against HEAD. Phantom SHAs (typos, references to other repos) correctly fail and block the heal.
|
|
15
|
+
- 9 new test-hooks tests (positive heal, phantom blocks, NOT CERTIFIED blocks, missing review.md blocks, partial coverage blocks, fall-through to stale, markdown-checkbox bracket, UUID alongside real SHA, escaped-quote bracket). Codex round 3 CERTIFIED 10/10 (rounds 1-2 surfaced bracket-extraction edge cases — markdown `[x]`, escaped quotes — and UUID false-block; all fixed).
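
The review-file condition for path (c) can be sketched as a small shell function. This is a minimal illustration rather than the hook's actual code: the function name `review_is_certified` and the sample file contents are hypothetical, and it assumes a `grep` that understands `\b` word boundaries, as GNU grep does.

```shell
# Hypothetical helper: succeeds only when the review file exists, contains the
# word CERTIFIED, and does not contain the phrase NOT CERTIFIED.
review_is_certified() {
  review_md="$1"
  [ -f "$review_md" ] \
    && grep -qE '\bCERTIFIED\b' "$review_md" \
    && ! grep -qE '\bNOT CERTIFIED\b' "$review_md"
}

# Demo with throwaway files (contents are made up for illustration).
demo_dir=$(mktemp -d)
printf 'Round 3: CERTIFIED 10/10\n' > "$demo_dir/pass.md"
printf 'Round 1: NOT CERTIFIED (2 findings)\n' > "$demo_dir/fail.md"

for f in pass.md fail.md missing.md; do
  if review_is_certified "$demo_dir/$f"; then
    echo "$f: eligible for heal"
  else
    echo "$f: heal blocked"
  fi
done

rm -rf "$demo_dir"
```

Only the first file should be reported as eligible. A file that says `NOT CERTIFIED` also contains the word `CERTIFIED`, which is why the second, negated check is needed.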
|
|
16
|
+
|
|
17
|
+
### Files
|
|
18
|
+
|
|
19
|
+
- `hooks/precompact-seam-check.sh` — new path (c) block with depth-counted + escape-aware awk extraction, sed UUID strip, ancestry check
|
|
20
|
+
- `tests/test-hooks.sh` — `_precompact_init_repo_with_commit` helper + 9 new path (c) tests
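
To see why the UUID strip has to run before SHA extraction, here is a minimal sketch using the same `sed` and `grep` patterns that appear in the hook. The function name `extract_sha_candidates` and the sample entry are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: print candidate commit SHAs found in the text on stdin,
# after first deleting anything shaped like an 8-4-4-4-12 hex UUID.
extract_sha_candidates() {
  sed -E 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}//g' \
    | grep -oE '\b[0-9a-f]{7,40}\b' \
    | sort -u
}

# Made-up fixes_applied entry: one short SHA plus a ticket UUID.
sample='"[x] FIXED in abc1234 (mission 123e4567-e89b-12d3-a456-426614174000)"'

printf '%s\n' "$sample" | extract_sha_candidates
```

With the strip in place, only `abc1234` is printed. Without it, the UUID segments `123e4567` and `426614174000` would also match the 7-to-40 hex pattern, fail the ancestry check, and block the heal.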
|
|
21
|
+
|
|
7
22
|
## [1.44.1] - 2026-04-27
|
|
8
23
|
|
|
9
24
|
### Fixed
|
|
@@ -2717,6 +2717,53 @@ Options:
|
|
|
2717
2717
|
|
|
2718
2718
|
---
|
|
2719
2719
|
|
|
2720
|
+
### Browser Tooling Policy
|
|
2721
|
+
|
|
2722
|
+
Three different jobs, three different tools. Conflating them is a recurring source of agent failures: using `Playwright MCP` for an auth-heavy registrar dashboard wastes a session, and using browser-use for a deterministic regression test produces flaky CI.
|
|
2723
|
+
|
|
2724
|
+
| Tool | Job | Profile model | When to pick |
|
|
2725
|
+
|------|-----|---------------|--------------|
|
|
2726
|
+
| **Playwright tests** | Deterministic regression suite, CI/release gate | Isolated by design — each test gets a clean browser context per [Playwright docs](https://playwright.dev/docs/browser-contexts) | Asserting expected user flows; running on every PR; gating deploy |
|
|
2727
|
+
| **Playwright MCP** | Live browser debugging, visual QA, DOM inspection | Default mode uses a **persistent Playwright-managed profile** at `ms-playwright/mcp-{channel}-{workspace-hash}` ([docs](https://playwright.dev/docs/getting-started-mcp#user-profile)) — NOT the user's regular Chrome profile. Other modes: `--isolated` (ephemeral context per session), `--user-data-dir=PATH` (caller-supplied dir), CDP attach (`--cdp-endpoint`), extension mode (attach to user's running browser tab) | One-off "look at this page" / "click this button and tell me what happens"; visual verification mid-session |
|
|
2728
|
+
| **Real-browser tooling** (browser-use, CDP-attach, Chrome profile) | Authenticated, profile-dependent, stateful operator flows | **When configured for real-browser mode** (e.g., browser-use `Browser.from_system_chrome()` or CLI `--profile`/`connect`), uses the user's actual Chrome profile (cookies, extensions, logged-in sessions). Default browser-use CLI runs headless Chromium — opt into the real-profile mode explicitly | Registrar dashboards (Porkbun, GoDaddy), DNS setup, cloud-provider consoles, wallet-adjacent Web3 flows, logged-in admin panels — anywhere preserving cookies/profile/extensions matters more than clean isolation |
|
|
2729
|
+
|
|
2730
|
+
**The core insight:** Playwright tests' isolation is a *feature*, not a bug. Playwright MCP's persistent-managed-profile default is also a feature — it preserves session continuity across debug interactions in a SINGLE agent. The collision case is concurrent agents (#251) sharing the same managed profile. Real-browser tooling is the right call only when the task IS the user's authenticated session, and only when explicitly configured to attach to that profile.
|
|
2731
|
+
|
|
2732
|
+
#### When to recommend real-browser tooling (#225)
|
|
2733
|
+
|
|
2734
|
+
Trigger examples — if the task description includes any of these, suggest real-browser tooling (browser-use or CDP-attach) over Playwright MCP:
|
|
2735
|
+
|
|
2736
|
+
- Registrar dashboards (domain purchase, DNS records, nameserver changes)
|
|
2737
|
+
- DNS setup / DNSLink / custom-domain configuration
|
|
2738
|
+
- Cloud/provider dashboards (AWS console, Cloudflare, Vercel, registrars)
|
|
2739
|
+
- Wallet-adjacent Web3 flows (token approvals, contract interactions in a logged-in MetaMask)
|
|
2740
|
+
- Logged-in admin panels (Stripe, Vercel, GitHub admin pages requiring 2FA-cached session)
|
|
2741
|
+
- Anywhere preserving cookies/profile/extensions matters more than clean isolation
|
|
2742
|
+
|
|
2743
|
+
These all share a property: the agent's job IS the authenticated session. A clean automation browser is the wrong model.
|
|
2744
|
+
|
|
2745
|
+
#### Playwright MCP profile-lock policy (#251)
|
|
2746
|
+
|
|
2747
|
+
`Playwright MCP`'s default mode reuses a single persistent managed profile (`ms-playwright/mcp-{channel}-{workspace-hash}`) across stdio sessions — which is the right call for single-agent debugging because it preserves session continuity across calls, but breaks down when **multiple agents or MCP clients run concurrently** (two CC sessions, one CC + one Codex, etc.). Concurrent stdio sessions collide on the same Chrome user-data directory and corrupt each other's session state.
|
|
2748
|
+
|
|
2749
|
+
**Upstream Playwright rejected default-isolated as the global default.** See [microsoft/playwright#40419](https://github.com/microsoft/playwright/issues/40419) and the discussion on [microsoft/playwright#40420](https://github.com/microsoft/playwright/pull/40420) — maintainer feedback: *"That's unfortunately very breaking."* Changing the default would silently break every existing single-agent setup that relies on the persistent managed profile.
|
|
2750
|
+
|
|
2751
|
+
**Wizard policy (per-user, opt-in at the wizard layer, not upstream):**
|
|
2752
|
+
|
|
2753
|
+
- **Single-agent / single MCP client (default):** Use Playwright MCP's default persistent-managed-profile mode. No special config required.
|
|
2754
|
+
- **Concurrent agents / multiple MCP clients:** Pick one of these per client to avoid profile-lock collisions: (a) `--isolated` (ephemeral context per session, no persistence), (b) `--user-data-dir=$TMPDIR/playwright-mcp-$AGENT_ID` (caller-supplied dir, isolated per agent), or (c) `--cdp-endpoint` to attach each agent to a separately-launched browser. None of these require an upstream breaking change.
|
|
2755
|
+
- **Real-browser / profile-dependent flows:** Don't use Playwright MCP at all — use real-browser tooling explicitly configured to attach to the user's profile (e.g., `browser-use` with `Browser.from_system_chrome()` or CLI `--profile`). The task is the session, not isolated automation.
|
|
2756
|
+
|
|
2757
|
+
This rule is per-workflow, not global. The setup wizard does NOT auto-configure isolated profiles: adoption is explicit and gated on the user signaling concurrent-agent intent.
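
The per-client choice described above could be wrapped in a small helper that prints the extra flag for each mode. This is a sketch under stated assumptions: the function name `playwright_mcp_flag`, the mode names, and the agent identifiers are hypothetical, and the function only prints a flag. It does not launch Playwright MCP or a browser.

```shell
# Hypothetical helper: print the extra Playwright MCP flag for a given
# concurrency mode. The second argument is an assumed caller-supplied agent ID.
playwright_mcp_flag() {
  mode="$1"
  agent_id="$2"
  case "$mode" in
    single)   ;;  # default persistent managed profile, no extra flag
    isolated) echo "--isolated" ;;
    per-agent)
      echo "--user-data-dir=${TMPDIR:-/tmp}/playwright-mcp-${agent_id}"
      ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Example: two concurrent agents each get their own profile directory.
playwright_mcp_flag per-agent agent-a
playwright_mcp_flag per-agent agent-b
```

Because each agent ID maps to a different directory, concurrent sessions no longer share one Chrome user-data directory. The `single` mode prints nothing, which matches the policy that the default needs no special config.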
|
|
2758
|
+
|
|
2759
|
+
#### Anti-patterns
|
|
2760
|
+
|
|
2761
|
+
- **Using Playwright MCP for registrar dashboards** — its persistent managed profile is NOT your real Chrome profile, so your registrar's logged-in session/2FA cookies aren't there. You'll be re-logging in on every interaction. Use real-browser tooling configured for your real Chrome profile instead.
|
|
2762
|
+
- **Using profile-coupled / stateful browser tooling for deterministic CI tests** — when browser-use (or any tool) is configured to use a real Chrome profile, cached state, extension chrome, and stale cookies pollute the test. Use Playwright tests with isolated browser contexts for CI.
|
|
2763
|
+
- **Setting Playwright MCP `--isolated` globally as a default** — breaks single-agent flows that rely on the persistent managed profile for session continuity across debug interactions. Upstream Playwright rejected this for the same reason. Make it explicit per-workflow when concurrent agents are running.
|
|
2764
|
+
|
|
2765
|
+
---
|
|
2766
|
+
|
|
2720
2767
|
## Step 8: Create CLAUDE.md
|
|
2721
2768
|
|
|
2722
2769
|
Create `CLAUDE.md` in your project root. This is your project-specific configuration:
|
|
@@ -2920,7 +2967,7 @@ If deployment fails or post-deploy verification catches issues:
|
|
|
2920
2967
|
|
|
2921
2968
|
**SDLC.md:**
|
|
2922
2969
|
```markdown
|
|
2923
|
-
<!-- SDLC Wizard Version: 1.
|
|
2970
|
+
<!-- SDLC Wizard Version: 1.45.0 -->
|
|
2924
2971
|
<!-- Setup Date: [DATE] -->
|
|
2925
2972
|
<!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
|
|
2926
2973
|
<!-- Git Workflow: [PRs or Solo] -->
|
|
@@ -3985,7 +4032,7 @@ Walk through updates? (y/n)
|
|
|
3985
4032
|
Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
|
|
3986
4033
|
|
|
3987
4034
|
```markdown
|
|
3988
|
-
<!-- SDLC Wizard Version: 1.
|
|
4035
|
+
<!-- SDLC Wizard Version: 1.45.0 -->
|
|
3989
4036
|
<!-- Setup Date: 2026-01-24 -->
|
|
3990
4037
|
<!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
|
|
3991
4038
|
<!-- Git Workflow: PRs -->
|
|
@@ -33,10 +33,16 @@ HOLD_REASONS=""
|
|
|
33
33
|
# Check 1: Codex review mid-cycle
|
|
34
34
|
# Self-heal paths (ordered by preference):
|
|
35
35
|
# (a) #209: handoff has pr_number + gh reports PR MERGED → implicit CERTIFIED (silent)
|
|
36
|
-
# (
|
|
37
|
-
#
|
|
38
|
-
#
|
|
39
|
-
#
|
|
36
|
+
# (c) #257: handoff has no pr_number BUT every SHA cited in fixes_applied[]
|
|
37
|
+
# is in HEAD's ancestry AND .reviews/latest-review.md contains CERTIFIED
|
|
38
|
+
# (without "NOT CERTIFIED") → implicit CERTIFIED (silent). Catches the
|
|
39
|
+
# solo-developer pattern: write fixes, commit them, run targeted
|
|
40
|
+
# recheck, see CERTIFIED in latest-review.md, ship — and forget to
|
|
41
|
+
# update handoff.json status. The visible signals (commits landed +
|
|
42
|
+
# review file) already say "done" so blocking is high false-positive.
|
|
43
|
+
# (b) #229: handoff has no pr_number, no SHA-ancestry heal, but mtime >
|
|
44
|
+
# SDLC_HANDOFF_STALE_DAYS days → implicit CERTIFIED with WARN
|
|
45
|
+
# (forgotten artifact; blocking forever is worse UX). Default: 14 days.
|
|
40
46
|
HANDOFF="$ROOT/.reviews/handoff.json"
|
|
41
47
|
# Validate SDLC_HANDOFF_STALE_DAYS as non-negative integer. Anything else
|
|
42
48
|
# (empty, "foo", "-3", "10.5") silently falls back to 14 — we don't want a
|
|
@@ -62,8 +68,77 @@ if [ -f "$HANDOFF" ]; then
|
|
|
62
68
|
[ "$PR_STATE" = "MERGED" ] && HEALED=1
|
|
63
69
|
fi
|
|
64
70
|
else
|
|
71
|
+
# Path (c) #257: SHA-ancestry self-heal. Look for git SHAs cited
|
|
72
|
+
# in fixes_applied[]; if every cited SHA is reachable from HEAD
|
|
73
|
+
# AND .reviews/latest-review.md says CERTIFIED, the review IS
|
|
74
|
+
# closed, the user just forgot to bump status. Silent heal.
|
|
75
|
+
REVIEW_MD="$ROOT/.reviews/latest-review.md"
|
|
76
|
+
if [ -f "$REVIEW_MD" ] \
|
|
77
|
+
&& grep -qE '\bCERTIFIED\b' "$REVIEW_MD" 2>/dev/null \
|
|
78
|
+
&& ! grep -qE '\bNOT CERTIFIED\b' "$REVIEW_MD" 2>/dev/null; then
|
|
79
|
+
# Extract the fixes_applied[] block via bracket-depth
|
|
80
|
+
# tracking — naive `/\]/` matching breaks on `]` inside
|
|
81
|
+
# string literals (e.g. "[x] FIXED in <sha>" markdown
|
|
82
|
+
# checkboxes), which would let phantom SHAs after the
|
|
83
|
+
# broken-early bracket leak past path (c) and false-heal.
|
|
84
|
+
# Codex P1 round 1.
|
|
85
|
+
FIXES_BLOCK=$(awk '
|
|
86
|
+
BEGIN { in_block = 0; depth = 0; started = 0 }
|
|
87
|
+
/"fixes_applied"/ { in_block = 1 }
|
|
88
|
+
in_block {
|
|
89
|
+
print
|
|
90
|
+
in_string = 0
|
|
91
|
+
escaped = 0
|
|
92
|
+
for (i = 1; i <= length($0); i++) {
|
|
93
|
+
c = substr($0, i, 1)
|
|
94
|
+
# Honor JSON backslash escapes: \" inside a
|
|
95
|
+
# string is a literal quote, NOT a string
|
|
96
|
+
# terminator. Without this, a fixes_applied
|
|
97
|
+
# entry containing `\"]` falsely flips the
|
|
98
|
+
# in_string flag and exits the array early —
|
|
99
|
+
# letting later phantom SHAs leak past path
|
|
100
|
+
# (c) and false-heal (Codex round 2 P1).
|
|
101
|
+
if (escaped) { escaped = 0; continue }
|
|
102
|
+
if (c == "\\") { escaped = 1; continue }
|
|
103
|
+
if (c == "\"") { in_string = !in_string; continue }
|
|
104
|
+
if (in_string) continue
|
|
105
|
+
if (c == "[") { depth++; started = 1 }
|
|
106
|
+
else if (c == "]") { depth-- }
|
|
107
|
+
}
|
|
108
|
+
if (started && depth <= 0) in_block = 0
|
|
109
|
+
}
|
|
110
|
+
' "$HANDOFF" 2>/dev/null)
|
|
111
|
+
if [ -n "$FIXES_BLOCK" ] && [ -d "$ROOT/.git" ]; then
|
|
112
|
+
# Strip UUIDs (8-4-4-4-12 hex pattern) BEFORE extracting
|
|
113
|
+
# SHA candidates. UUIDs have a fixed shape; their hex
|
|
114
|
+
# segments would otherwise match \b[0-9a-f]{7,40}\b and
|
|
115
|
+
# fail the ancestry check, false-blocking certified
|
|
116
|
+
# reviews that cite UUIDs in fixes_applied (mission
|
|
117
|
+
# UUIDs, Linear/Jira ticket IDs, etc.). Codex P2 round 1.
|
|
118
|
+
# POSIX-compatible: no `\b` (BSD sed doesn't support it).
|
|
119
|
+
# The hyphenated 8-4-4-4-12 shape is specific enough
|
|
120
|
+
# that false-stripping a real SHA is implausible.
|
|
121
|
+
CLEANED=$(echo "$FIXES_BLOCK" | sed -E 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}//g')
|
|
122
|
+
SHAS=$(echo "$CLEANED" | grep -oE '\b[0-9a-f]{7,40}\b' | sort -u)
|
|
123
|
+
if [ -n "$SHAS" ]; then
|
|
124
|
+
# Every cited SHA must be reachable from HEAD —
|
|
125
|
+
# phantom SHAs (e.g. typos, references to other
|
|
126
|
+
# repos) correctly fail ancestry and block the heal.
|
|
127
|
+
ALL_IN_HEAD=1
|
|
128
|
+
for sha in $SHAS; do
|
|
129
|
+
if ! git -C "$ROOT" merge-base --is-ancestor "$sha" HEAD 2>/dev/null; then
|
|
130
|
+
ALL_IN_HEAD=0
|
|
131
|
+
break
|
|
132
|
+
fi
|
|
133
|
+
done
|
|
134
|
+
[ "$ALL_IN_HEAD" -eq 1 ] && HEALED=1
|
|
135
|
+
fi
|
|
136
|
+
fi
|
|
137
|
+
fi
|
|
65
138
|
# Path (b): stale-handoff auto-expire (#229). Only when no pr_number
|
|
66
|
-
#
|
|
139
|
+
# AND path (c) didn't already heal. We must not short-circuit
|
|
140
|
+
# PR-linked reviews.
|
|
141
|
+
if [ "$HEALED" -ne 1 ]; then
|
|
67
142
|
# Try GNU stat first (Linux: `-c %Y` gives mtime, BSD stat errors out
|
|
68
143
|
# so `||` fires). Then BSD stat (macOS: `-f %m` gives mtime). The
|
|
69
144
|
# reverse order fails on Linux because `stat -f` on GNU means
|
|
@@ -83,6 +158,7 @@ if [ -f "$HANDOFF" ]; then
|
|
|
83
158
|
;;
|
|
84
159
|
esac
|
|
85
160
|
fi
|
|
161
|
+
fi
|
|
86
162
|
if [ "$HEALED" -ne 1 ]; then
|
|
87
163
|
HOLD_REASONS="${HOLD_REASONS} - Codex review is ${STATUS}. Round-1 evidence lives in this context — compacting now loses what round-2 needs to re-verify.
|
|
88
164
|
Resolve: wait for CERTIFIED (or escalate) before /compact."$'\n'
|
package/package.json
CHANGED
package/skills/update/SKILL.md
CHANGED
|
@@ -131,9 +131,10 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
|
|
|
131
131
|
|
|
132
132
|
```
|
|
133
133
|
Installed: 1.24.0
|
|
134
|
-
Latest: 1.
|
|
134
|
+
Latest: 1.45.0
|
|
135
135
|
|
|
136
136
|
What changed:
|
|
137
|
+
- [1.45.0] PreCompact path (c) — SHA-ancestry self-heal — closes #257. Solo-developer pattern: write fixes, commit them, run targeted Codex recheck, see CERTIFIED in `.reviews/latest-review.md`, ship the feature. Forgetting to bump `handoff.json status` from `PENDING_RECHECK` → `CERTIFIED` is realistic — the file is buried and the visible signals (commits + review file) already say "done". PreCompact hook now self-heals silently when: (a) handoff is `PENDING_*` with no `pr_number`, (b) every SHA cited in `fixes_applied[]` is reachable from HEAD via `git merge-base --is-ancestor`, AND (c) `.reviews/latest-review.md` contains CERTIFIED without `NOT CERTIFIED`. Bracket extraction is escape-aware (depth counter + JSON `\\\"` handling) so `]` inside string literals doesn't terminate the array early. UUIDs (8-4-4-4-12 hex) stripped before SHA extraction so ticket IDs in fixes_applied don't false-block the heal. 9 new tests, Codex round 3 CERTIFIED 10/10.
|
|
137
138
|
- [1.44.1] Autocompact compound-misconfig detection — closes #207. Consumer reported autocompact firing at 12% context on a fresh opus[1m] session because they set BOTH `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` AND `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` (a natural misreading of the "or"-joined override cell). The two compound: 30% × 400K = 120K trigger ≈ 12% of 1M. Three-pronged fix: (a) wizard doc clarifies alternatives with a `> ⚠ Do NOT set both` callout that shows the compound math; (b) `instructions-loaded-check.sh` (InstructionsLoaded hook) detects when both env vars are set in `.claude/settings.json`, computes the effective trigger, and warns with the math; (c) shipped `skills/sdlc/SKILL.md` was still calling opus[1m] the "default" (stale post-#198) AND repeating the same ambiguous wording — both fixed. 4 new hook tests + 3 new doc-consistency tests + size-cap fixture extended. Codex round 2 CERTIFIED 9/10.
|
|
138
139
|
- [1.44.0] Install-path & cache hygiene — closes #254, #239, #238 filed by consumer codeguesser after upgrading 1.32.0 → 1.42.1. (1) `cli/init.js` FILES list now ships `hooks/_find-sdlc-root.sh` — the helper sourced by all 5 hooks was missing from npm install path, so every session emitted `_find-sdlc-root.sh: No such file or directory` + `dedupe_plugin_or_project: command not found` and the SDLC walk-up logic was silently dead. (2) `init --force` now invalidates `~/.cache/sdlc-wizard/latest-version` so post-upgrade hooks re-fetch fresh values from npm instead of serving the pre-upgrade cache for 24h (which produced reverse "1.42.1 → 1.41.1" nudges). (3) instructions-loaded-check.sh now uses semver-direction comparison via new `semver_lt` function: nudge only fires when installed < latest, equality is silent, reverse direction is silent. Cache sanity-check rejects poisoned values (cached "latest" < installed → force refetch). (4) When `npm view` fails AND cache empty, hook now surfaces a one-line warning instead of going silent. (5) Dual-channel install nudge gains an opt-in silence sentinel — set via `mkdir -p $SDLC_WIZARD_CACHE_DIR && touch $SDLC_WIZARD_CACHE_DIR/dual-channel-acknowledged` (printed inside the nudge itself for discoverability). 8 new tests across test-cli.sh + test-hooks.sh, Codex CERTIFIED 10/10 round 2.
|
|
139
140
|
- [1.43.0] Token-spike anomaly detection — ROADMAP #220 closure. New `hooks/token-spike-check.sh` (SessionStart, opt-in via `.metrics/`) ingests CC transcript usage (`input_tokens` / `output_tokens` / `cache_creation_input_tokens` / `cache_read_input_tokens`) into `.metrics/token-history.jsonl`, then warns when the last session's `costly_tokens` (input + cache_creation + output, excluding the cheap cache_read tier) exceeds median + 2σ over a rolling baseline. Catches silent CC-side caching regressions (per Anthropic's 2026-04-23 post-mortem) before they surface on the invoice. Uses MAD-based spread for the median metric so a single baseline outlier doesn't mask the next spike. 14 quality tests in `tests/test-token-spike.sh` (incl. malicious-transcript privacy probe, flat-baseline floor, median-vs-mean contrast, concurrent-ingest mkdir lock).
|