agentic-sdlc-wizard 1.40.1 → 1.41.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -13,7 +13,7 @@
13
13
  "name": "sdlc-wizard",
14
14
  "source": ".",
15
15
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
16
- "version": "1.40.1",
16
+ "version": "1.41.1",
17
17
  "author": {
18
18
  "name": "Stefan Ayala"
19
19
  },
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "sdlc-wizard",
3
- "version": "1.40.1",
3
+ "version": "1.41.1",
4
4
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
5
5
  "author": {
6
6
  "name": "Stefan Ayala",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,22 @@ All notable changes to the SDLC Wizard.
4
4
 
5
5
  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
6
6
 
7
+ ## [1.41.1] - 2026-04-26
8
+
9
+ ### Added
10
+
11
+ - **MCP-tool hooks audit documented** (ROADMAP #218). CC 2.1.118 introduced `type: "mcp_tool"` for hooks. Audited all 5 wizard hooks (sdlc-prompt-check, instructions-loaded-check, tdd-pretool-check, model-effort-check, precompact-seam-check) against MCP-tool migration criteria: portability, gating semantics, cross-tool state. Conclusion: all 5 stay bash. Per-hook rationale documented in CLAUDE_CODE_SDLC_WIZARD.md → "Known CC Gotchas → MCP-tool hooks audit". New `tests/test-mcp-hook-audit.sh` (7 tests) ensures the audit doesn't get re-litigated by future maintainers; if a hook DOES migrate later, the test is the natural place to update with new rationale.
12
+
13
+ ## [1.41.0] - 2026-04-26
14
+
15
+ ### Added
16
+
17
+ - **Post-mortem 2026-04-23 lessons folded into wizard docs** (ROADMAP #221). [Anthropic's 2026-04-23 post-mortem](https://www.anthropic.com/engineering/april-23-postmortem) provides independent third-party evidence for three SDLC-relevant failure modes; this release captures all three:
18
+ - **Don't rely on CC default effort** — the post-mortem confirmed CC has flipped reasoning_effort defaults across versions (high → medium → xhigh/high). Recommended Effort section now cites this as evidence and reinforces that `/effort max` should be set explicitly every session, never assumed from the default.
19
+ - **Extended-thinking + caching + idle sessions can drop thinking blocks** — new "Known CC Gotchas" top-level section documents the failure mode (cached prompt prefix re-served after idle pruning silently drops thinking blocks downstream), with a workaround (start fresh session with `claude --continue` if quality degrades mid-session) and a detection signal pointer to ROADMAP #220.
20
+ - **Brevity-cap audit + regression guard** — audited every `skills/*/SKILL.md` and `hooks/*.sh` for compounding brevity constraints (`≤N words`, `be concise`, `keep brief`). Audit clean; no system-prompt brevity caps in the wizard. New `tests/test-postmortem-lessons.sh` (7 tests) includes a regression guard that fails CI if a future PR introduces one.
21
+ - "Known CC Gotchas" is now a documented section pattern; future CC failure modes get folded here with workarounds.
22
+
7
23
  ## [1.40.1] - 2026-04-26
8
24
 
9
25
  ### Added
@@ -257,6 +257,8 @@ Claude Code's **effort level** controls how much thinking the model does before
257
257
 
258
258
  **Why `high` was the previous default:** Claude Code uses **adaptive thinking** to dynamically allocate reasoning budget per turn. On Pro and Max plans, the default effort level was **medium (85)**, which causes the model to under-allocate reasoning on complex multi-step tasks — leading to shallow analysis, missed edge cases, and "lazy" outputs. This was [confirmed by Anthropic engineer Boris Cherny](https://github.com/anthropics/claude-code/issues/42796) and is documented at [code.claude.com](https://code.claude.com/docs/en/model-config). API, Team, and Enterprise plans default to high effort and are not affected.
259
259
 
260
+ **Don't rely on the CC default — set effort yourself.** Anthropic's [2026-04-23 post-mortem](https://www.anthropic.com/engineering/april-23-postmortem) is independent third-party evidence that CC has flipped reasoning_effort defaults across versions (high → medium → xhigh/high). The default has changed before and will change again. The wizard's `model-effort-check.sh` hook nudges to `xhigh`/`max` at session start specifically because the in-product default is not load-bearing — it can shift release-to-release without notice. Set `/effort max` explicitly every session you do SDLC work, and treat any "I assumed the default was X" reasoning as a bug.
261
+
260
262
  The `/sdlc` skill sets `effort: high` in its frontmatter as a baseline, overriding the medium default on every SDLC invocation. **On Opus 4.7, run `/effort max` at session start** — the frontmatter is a floor, not a ceiling, and `max` is where SDLC-compliant work actually happens on 4.7.
261
263
 
262
264
  **Nuclear option — disable adaptive thinking entirely:** Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` in your environment or settings.json `env` block. This forces a fixed reasoning budget per turn instead of letting the model dynamically allocate. Use this if you observe persistent quality issues even with `effort: high`. See [Claude Code model config docs](https://code.claude.com/docs/en/model-config) for details.
@@ -437,6 +439,64 @@ Several fixes that strengthen wizard enforcement:
437
439
 
438
440
  ---
439
441
 
442
+ ## Known CC Gotchas
443
+
444
+ This section documents Claude Code failure modes that have been observed in the wild — typically surfaced via post-mortems, GitHub issues, or our own catches data. Each entry has a workaround and a permanent fix when one exists.
445
+
446
+ ### Extended-thinking + caching + idle sessions can drop thinking blocks (post-mortem 2026-04-23)
447
+
448
+ [Anthropic's 2026-04-23 post-mortem](https://www.anthropic.com/engineering/april-23-postmortem) describes a caching bug that "continuously dropped thinking blocks from subsequent requests" — surfaced primarily as silent quality degradation in long sessions. The failure mode mixed three ingredients:
449
+
450
+ 1. **Extended thinking enabled** (high/xhigh/max effort triggers thinking-block production)
451
+ 2. **Prompt caching active** (CC re-uses cached prompt prefixes across turns)
452
+ 3. **Idle sessions** (context pruning during idle pulls thinking blocks out of the cache window)
453
+
454
+ When a cached prompt prefix is re-served after idle pruning, downstream thinking blocks can be silently absent — the model produces shorter, less-considered responses despite the requested effort level. Symptom: a session that was reasoning deeply earlier suddenly returns terse answers without obvious cause.
455
+
456
+ **Workaround**: if you hit suspicious shallow reasoning mid-session — especially after a long idle gap — start a fresh session with `claude --continue` to reset cache state. The wizard's PreCompact hook gates manual `/compact` precisely because compacting at bad seams can also pull thinking blocks out of context.
457
+
458
+ **Detection signal**: the wizard's `model-effort-check.sh` already loud-warns below `xhigh`. Combine with token-spike anomaly detection (ROADMAP #220) once shipped.
459
+
460
+ ### Prompt brevity caps can compound across turns (post-mortem 2026-04-23)
461
+
462
+ The same post-mortem documented that a length-limit prompt change (one of several brevity edits, including a line like `"keep text between tool calls to ≤25 words"`) correlated with a measurable ~3% drop on one evaluation. The post-mortem attributes the drop to the broader length-limit prompt change, not to that single sentence alone.
463
+
464
+ **Wizard policy** (audited 2026-04-26): the wizard's SKILL.md files and hook stdout do **not** impose compounding brevity constraints — no `≤N words`, `<N words`, `be concise`, or `keep brief` instructions to Claude. The wizard's own response-style guidance is in CC's user-level instructions, not injected into every system prompt.
465
+
466
+ **Regression guard**: `tests/test-postmortem-lessons.sh` greps every `skills/*/SKILL.md` and `hooks/*.sh` for these patterns and fails CI if a future PR introduces one. The check is case-insensitive and ignores shell comments (`#`) but treats Markdown content (including headings) as instructions to Claude. Add new skills with this in mind — terseness for the user is fine, terseness as a system-prompt constraint is not.
467
+
468
+ ### `cleanupPeriodDays` and TodoWrite retention (CC 2.1.117+)
469
+
470
+ See the dedicated subsection under [Tasks System](#tasks-system-v2116) (above, in Claude Code Feature Updates) for the full breakdown. Short version: pin `cleanupPeriodDays: 30` or higher in `.claude/settings.json` if you ever pause SDLC work for more than a week. The wizard ships this default in `cli/templates/settings.json` and the CLI's smart-merge preserves user overrides on `init --force`.
471
+
472
+ ### MCP-tool hooks audit (ROADMAP #218, CC 2.1.118)
473
+
474
+ CC 2.1.118 introduced `type: "mcp_tool"` for hooks — a hook can now directly invoke an MCP tool instead of running a bash script. **Audit (2026-04-26) of all 5 wizard hooks concluded: none migrate, all stay bash.** This subsection documents the per-hook reasoning so future audits don't redo the work; if a future PR migrates a hook to MCP, update this entry with the new rationale rather than deleting it.
475
+
476
+ **Decision criteria applied** (any one rules out MCP):
477
+
478
+ 1. **Portability** — bash hooks port to the **shipped** Codex sibling (`~/codex-sdlc-wizard`) and to a **planned** OpenCode sibling (ROADMAP #91, not yet shipped) without rewrite. MCP hooks are CC-specific. Cross-host portability is an XDLC requirement.
479
+ 2. **Fail-closed gating** — hooks that block an action (exit 2 from PreCompact) need a fail-closed contract: any error in the hook MUST keep the block in place. CC docs ([code.claude.com/docs/en/hooks](https://code.claude.com/docs/en/hooks)) confirm `mcp_tool` hooks CAN gate via `decision: "block"` JSON output, but **MCP server errors are non-blocking by design** — if the MCP server is down/slow/buggy, the action proceeds. That breaks the fail-closed contract. Bash exit 2 fails closed.
480
+ 3. **Local-only state** — hooks that read/write `~/.cache/sdlc-wizard/` or `.reviews/handoff.json` don't surface state across tool boundaries. MCP adds a wire format without consumers.
481
+
482
+ **Per-hook decision** (each row applies at least one criterion explicitly):
483
+
484
+ - **`sdlc-prompt-check.sh`** (UserPromptSubmit, ~132 lines) — emits the SDLC BASELINE text on every prompt; writes effort-bump signals to `~/.cache/sdlc-wizard/effort-signals.log` for self-consumption on next invocation. Decision: **Stay bash.** Portability criterion: same script ships to Codex sibling unchanged. Local-state criterion: signal log is local-only.
485
+ - **`instructions-loaded-check.sh`** (~202 lines) — InstructionsLoaded event; validates SDLC files exist, fetches npm `latest` with daily file cache (`~/.cache/sdlc-wizard/npm-latest.json`), emits staleness warnings. Decision: **Stay bash.** Portability criterion: Codex sibling has its own equivalent of session-start validation; bash port is direct. Local-state criterion: cache file is local.
486
+ - **`tdd-pretool-check.sh`** (~29 lines) — PreToolUse on Write/Edit/MultiEdit; emits the TDD reminder text. Decision: **Stay bash.** Portability criterion: trivially portable (one-screen text emit). Gating criterion: not applicable (this hook does not block, it advises). MCP would add a runtime dependency for zero functional gain.
487
+ - **`model-effort-check.sh`** (~69 lines) — SessionStart event; reads `CLAUDE_CODE_EFFORT` env var, emits silent/soft/loud nudge per-tier. Decision: **Stay bash.** Portability criterion: env-var read maps 1:1 to any agent runtime. Local-state criterion: not applicable, hook is stateless.
488
+ - **`precompact-seam-check.sh`** (~125 lines) — PreCompact event (matcher: `manual`); reads `.reviews/handoff.json` via jq, blocks manual `/compact` with exit 2 + stderr message when status is `PENDING_*` and the linked PR (if any) isn't merged. Decision: **Stay bash.** Fail-closed gating criterion: bash exit 2 fails closed by definition; an MCP `mcp_tool` hook returning `decision: "block"` works on the happy path, but if the MCP server crashes/times out the action proceeds — that flips the safety property from fail-closed to fail-open. For a hook whose entire job is to prevent context loss at bad seams, fail-open is the wrong default.
489
+
490
+ **When to revisit this audit:**
491
+
492
+ - A future hook genuinely needs cross-tool structured state surfacing (e.g., a "score history reader" that an MCP-aware skill consumes directly).
493
+ - Anthropic deprecates bash hooks in favor of `mcp_tool` (currently both are first-class).
494
+ - Codex / OpenCode siblings gain native MCP-tool hook support (then portability is no longer an MCP-rules-out).
495
+
496
+ Until then, default answer for new hooks is bash.
497
+
498
+ ---
499
+
440
500
  ## Prove It's Better
441
501
 
442
502
  **Don't reinvent the wheel.** Use native/built-in features UNLESS you prove your custom version is better. If you can't prove it, delete yours.
@@ -2840,7 +2900,7 @@ If deployment fails or post-deploy verification catches issues:
2840
2900
 
2841
2901
  **SDLC.md:**
2842
2902
  ```markdown
2843
- <!-- SDLC Wizard Version: 1.40.1 -->
2903
+ <!-- SDLC Wizard Version: 1.41.1 -->
2844
2904
  <!-- Setup Date: [DATE] -->
2845
2905
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
2846
2906
  <!-- Git Workflow: [PRs or Solo] -->
@@ -3902,7 +3962,7 @@ Walk through updates? (y/n)
3902
3962
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
3903
3963
 
3904
3964
  ```markdown
3905
- <!-- SDLC Wizard Version: 1.40.1 -->
3965
+ <!-- SDLC Wizard Version: 1.41.1 -->
3906
3966
  <!-- Setup Date: 2026-01-24 -->
3907
3967
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
3908
3968
  <!-- Git Workflow: PRs -->
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "agentic-sdlc-wizard",
3
- "version": "1.40.1",
3
+ "version": "1.41.1",
4
4
  "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
5
5
  "bin": {
6
6
  "sdlc-wizard": "cli/bin/sdlc-wizard.js"
@@ -131,9 +131,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
131
131
 
132
132
  ```
133
133
  Installed: 1.24.0
134
- Latest: 1.40.1
134
+ Latest: 1.41.1
135
135
 
136
136
  What changed:
137
+ - [1.41.1] MCP-tool hooks audit — ROADMAP #218. Audited all 5 wizard hooks against CC 2.1.118's `type: "mcp_tool"` migration option; conclusion: all stay bash (portability to Codex/OpenCode siblings + exit-code gating semantics rule out MCP). Per-hook rationale documented under "Known CC Gotchas → MCP-tool hooks audit". 7 quality tests.
138
+ - [1.41.0] Post-mortem 2026-04-23 lessons folded into wizard — ROADMAP #221. New "Known CC Gotchas" section documents extended-thinking + caching + idle-session failure mode. Recommended Effort section cites the post-mortem as third-party evidence ("don't rely on CC default — set effort yourself"). Brevity-cap audit clean, regression guard added. 7 quality tests.
137
139
  - [1.40.1] cleanupPeriodDays: 30 pinned in template — ROADMAP #225. CC 2.1.117 expanded `cleanupPeriodDays` to also cover `~/.claude/tasks/`. Aggressive defaults could prune in-progress TodoWrite checklists for paused long-running features. Wizard now ships a 30-day floor + documented gotcha. 7 quality tests.
138
140
  - [1.40.0] CLI version detection in /update-wizard — ROADMAP #232. New Step 1.5 detects locally installed `agentic-sdlc-wizard` CLI version (npm ls + npx cache inspection, both with semver-aware ordering), compares to `registry.npmjs.org/agentic-sdlc-wizard/latest`, and surfaces a 3-way upgrade choice BEFORE drift detection: A) refresh CLI cache only (default, safest), B) `init --force` re-init with explicit non-settings overwrite warning, C) skip. Closes the gap where in-session file updates landed but the user's stale npx cache kept running an old CLI. Mirrors `claude update` UX. 8 quality tests, mutation-verified.
139
141
  - [1.39.1] Step 7.7 hoist — dead-plugin cleanup now runs even when wizard versions match. Previously `/update-wizard` exited at "you're up to date" before reaching Step 7.7, so users on the latest wizard with a stale `~/.claude/settings.json` plugin registration were never offered cleanup. New `tests/test-update-skill-step-7-7.sh` (8 quality tests) guards the ordering.