npm - agentic-sdlc-wizard - Versions diffs - 1.40.0 → 1.41.0 - Mend

agentic-sdlc-wizard 1.40.0 → 1.41.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +16 -0
package/CLAUDE_CODE_SDLC_WIZARD.md +44 -2
package/cli/init.js +9 -0
package/cli/templates/settings.json +1 -0
package/package.json +1 -1
package/skills/update/SKILL.md +3 -1

package/.claude-plugin/marketplace.json CHANGED Viewed

@@ -13,7 +13,7 @@
       "name": "sdlc-wizard",
       "source": ".",
       "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
-      "version": "1.40.0",
+      "version": "1.41.0",
       "author": {
         "name": "Stefan Ayala"
       },

package/.claude-plugin/plugin.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "sdlc-wizard",
-  "version": "1.40.0",
+  "version": "1.41.0",
   "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
   "author": {
     "name": "Stefan Ayala",

package/CHANGELOG.md CHANGED Viewed

@@ -4,6 +4,22 @@ All notable changes to the SDLC Wizard.
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
+## [1.41.0] - 2026-04-26
+### Added
+- **Post-mortem 2026-04-23 lessons folded into wizard docs** (ROADMAP #221). [Anthropic's 2026-04-23 post-mortem](https://www.anthropic.com/engineering/april-23-postmortem) provides independent third-party evidence for three SDLC-relevant failure modes; this release captures all three:
+  - **Don't rely on CC default effort** — the post-mortem confirmed CC has flipped reasoning_effort defaults across versions (high → medium → xhigh/high). Recommended Effort section now cites this as evidence and reinforces that `/effort max` should be set explicitly every session, never assumed from the default.
+  - **Extended-thinking + caching + idle sessions can drop thinking blocks** — new "Known CC Gotchas" top-level section documents the failure mode (cached prompt prefix re-served after idle pruning silently drops thinking blocks downstream), with a workaround (start fresh session with `claude --continue` if quality degrades mid-session) and a detection signal pointer to ROADMAP #220.
+  - **Brevity-cap audit + regression guard** — audited every `skills/*/SKILL.md` and `hooks/*.sh` for compounding brevity constraints (`≤N words`, `be concise`, `keep brief`). Audit clean; no system-prompt brevity caps in the wizard. New `tests/test-postmortem-lessons.sh` (7 tests) includes a regression guard that fails CI if a future PR introduces one.
+- "Known CC Gotchas" is now a documented section pattern; future CC failure modes get folded here with workarounds.
+## [1.40.1] - 2026-04-26
+### Added
+- **`cleanupPeriodDays: 30` pinned in template settings** (ROADMAP #225). CC 2.1.117 expanded `cleanupPeriodDays` to also cover `~/.claude/tasks/` — the directory where the Tasks system persists in-progress TodoWrite state. With aggressive defaults (some CC versions defaulted to 7 days), SDLC checklists for paused long-running features could be silently pruned. `cli/templates/settings.json` now ships `"cleanupPeriodDays": 30` as a top-level field. `CLAUDE_CODE_SDLC_WIZARD.md` documents the gotcha + override path. New `tests/test-cleanup-period-guidance.sh` (7 tests) asserts template default + wizard rationale don't regress.
 ## [1.40.0] - 2026-04-25
 ### Added

package/CLAUDE_CODE_SDLC_WIZARD.md CHANGED Viewed

@@ -257,6 +257,8 @@ Claude Code's **effort level** controls how much thinking the model does before
 **Why `high` was the previous default:** Claude Code uses **adaptive thinking** to dynamically allocate reasoning budget per turn. On Pro and Max plans, the default effort level was **medium (85)**, which causes the model to under-allocate reasoning on complex multi-step tasks — leading to shallow analysis, missed edge cases, and "lazy" outputs. This was [confirmed by Anthropic engineer Boris Cherny](https://github.com/anthropics/claude-code/issues/42796) and is documented at [code.claude.com](https://code.claude.com/docs/en/model-config). API, Team, and Enterprise plans default to high effort and are not affected.
+**Don't rely on the CC default — set effort yourself.** Anthropic's [2026-04-23 post-mortem](https://www.anthropic.com/engineering/april-23-postmortem) is independent third-party evidence that CC has flipped reasoning_effort defaults across versions (high → medium → xhigh/high). The default has changed before and will change again. The wizard's `model-effort-check.sh` hook nudges to `xhigh`/`max` at session start specifically because the in-product default is not load-bearing — it can shift release-to-release without notice. Set `/effort max` explicitly every session you do SDLC work, and treat any "I assumed the default was X" reasoning as a bug.
 The `/sdlc` skill sets `effort: high` in its frontmatter as a baseline, overriding the medium default on every SDLC invocation. **On Opus 4.7, run `/effort max` at session start** — the frontmatter is a floor, not a ceiling, and `max` is where SDLC-compliant work actually happens on 4.7.
 **Nuclear option — disable adaptive thinking entirely:** Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` in your environment or settings.json `env` block. This forces a fixed reasoning budget per turn instead of letting the model dynamically allocate. Use this if you observe persistent quality issues even with `effort: high`. See [Claude Code model config docs](https://code.claude.com/docs/en/model-config) for details.
@@ -311,6 +313,14 @@ If you notice Claude Code producing shallow outputs despite `effort: high`, add
 **Rollback if issues**: Set `CLAUDE_CODE_ENABLE_TASKS=false` environment variable.
+#### Known CC gotcha: `cleanupPeriodDays` and TodoWrite retention (CC 2.1.117+)
+CC 2.1.117 expanded the `cleanupPeriodDays` setting to also cover `~/.claude/tasks/` — the directory where the Tasks system persists in-progress TodoWrite state across sessions. If `cleanupPeriodDays` is too low (CC's default has been as aggressive as 7 days in some versions), an SDLC checklist for a paused long-running feature can be silently pruned out from under you.
+**Wizard pins a safe default**: `cli/templates/settings.json` ships `"cleanupPeriodDays": 30`. The wizard's SDLC skill makes TodoWrite step 1 of every task, so the floor matters. Recommendation: keep it at **30 or higher** if you ever pause work for more than a week. If a task list disappears mid-cycle, check this setting first.
+**To override**: set `cleanupPeriodDays` to a higher number in your project's `.claude/settings.json`. The CLI's smart-merge preserves user overrides on `init --force`.
 ### Skill Arguments with $ARGUMENTS (v2.1.19+)
 **What changed**: Skills can now accept parameters via `$ARGUMENTS` placeholder.
@@ -429,6 +439,38 @@ Several fixes that strengthen wizard enforcement:
 ---
+## Known CC Gotchas
+This section documents Claude Code failure modes that have been observed in the wild — typically surfaced via post-mortems, GitHub issues, or our own catches data. Each entry has a workaround and a permanent fix when one exists.
+### Extended-thinking + caching + idle sessions can drop thinking blocks (post-mortem 2026-04-23)
+[Anthropic's 2026-04-23 post-mortem](https://www.anthropic.com/engineering/april-23-postmortem) describes a caching bug that "continuously dropped thinking blocks from subsequent requests" — surfaced primarily as silent quality degradation in long sessions. The failure mode mixed three ingredients:
+1. **Extended thinking enabled** (high/xhigh/max effort triggers thinking-block production)
+2. **Prompt caching active** (CC re-uses cached prompt prefixes across turns)
+3. **Idle sessions** (context pruning during idle pulls thinking blocks out of the cache window)
+When a cached prompt prefix is re-served after idle pruning, downstream thinking blocks can be silently absent — the model produces shorter, less-considered responses despite the requested effort level. Symptom: a session that was reasoning deeply earlier suddenly returns terse answers without obvious cause.
+**Workaround**: if you hit suspicious shallow reasoning mid-session — especially after a long idle gap — start a fresh session with `claude --continue` to reset cache state. The wizard's PreCompact hook gates manual `/compact` precisely because compacting at bad seams can also pull thinking blocks out of context.
+**Detection signal**: the wizard's `model-effort-check.sh` already loud-warns below `xhigh`. Combine with token-spike anomaly detection (ROADMAP #220) once shipped.
+### Prompt brevity caps can compound across turns (post-mortem 2026-04-23)
+The same post-mortem documented that a length-limit prompt change (one of several brevity edits, including a line like `"keep text between tool calls to ≤25 words"`) correlated with a measurable ~3% drop on one evaluation. The post-mortem attributes the drop to the broader length-limit prompt change, not to that single sentence alone.
+**Wizard policy** (audited 2026-04-26): the wizard's SKILL.md files and hook stdout do **not** impose compounding brevity constraints — no `≤N words`, `<N words`, `be concise`, or `keep brief` instructions to Claude. The wizard's own response-style guidance is in CC's user-level instructions, not injected into every system prompt.
+**Regression guard**: `tests/test-postmortem-lessons.sh` greps every `skills/*/SKILL.md` and `hooks/*.sh` for these patterns and fails CI if a future PR introduces one. The check is case-insensitive and ignores shell comments (`#`) but treats Markdown content (including headings) as instructions to Claude. Add new skills with this in mind — terseness for the user is fine, terseness as a system-prompt constraint is not.
+### `cleanupPeriodDays` and TodoWrite retention (CC 2.1.117+)
+See the dedicated subsection under [Tasks System](#tasks-system-v2116) (above, in Claude Code Feature Updates) for the full breakdown. Short version: pin `cleanupPeriodDays: 30` or higher in `.claude/settings.json` if you ever pause SDLC work for more than a week. The wizard ships this default in `cli/templates/settings.json` and the CLI's smart-merge preserves user overrides on `init --force`.
+---
 ## Prove It's Better
 **Don't reinvent the wheel.** Use native/built-in features UNLESS you prove your custom version is better. If you can't prove it, delete yours.
@@ -2832,7 +2874,7 @@ If deployment fails or post-deploy verification catches issues:
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.40.0 -->
+<!-- SDLC Wizard Version: 1.41.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -3894,7 +3936,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 ```markdown
-<!-- SDLC Wizard Version: 1.40.0 -->
+<!-- SDLC Wizard Version: 1.41.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->

package/cli/init.js CHANGED Viewed

@@ -78,6 +78,15 @@ function mergeSettings(existingPath, templatePath, force) {
       existing.model = template.model;
     }
+    // Merge cleanupPeriodDays (ROADMAP #225): only set the template default when
+    // the user has not chosen a value. NEVER overwrite under --force — retention
+    // policy is a user preference (they may want >30 for long pauses, or <30 for
+    // disk-tight setups). The wizard's job is to provide a safe floor on fresh
+    // installs, not to clobber an explicit choice.
+    if ('cleanupPeriodDays' in template && !('cleanupPeriodDays' in existing)) {
+      existing.cleanupPeriodDays = template.cleanupPeriodDays;
+    }
     // Merge env field
     if (template.env) {
       if (!existing.env || typeof existing.env !== 'object' || Array.isArray(existing.env)) {

package/cli/templates/settings.json CHANGED Viewed

@@ -1,4 +1,5 @@
 {
+  "cleanupPeriodDays": 30,
   "hooks": {
     "UserPromptSubmit": [
       {

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "agentic-sdlc-wizard",
-  "version": "1.40.0",
+  "version": "1.41.0",
   "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
   "bin": {
     "sdlc-wizard": "cli/bin/sdlc-wizard.js"

package/skills/update/SKILL.md CHANGED Viewed

@@ -131,9 +131,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
 ```
 Installed: 1.24.0
-Latest:    1.40.0
+Latest:    1.41.0
 What changed:
+- [1.41.0] Post-mortem 2026-04-23 lessons folded into wizard — ROADMAP #221. New "Known CC Gotchas" section documents extended-thinking + caching + idle-session failure mode. Recommended Effort section cites the post-mortem as third-party evidence ("don't rely on CC default — set effort yourself"). Brevity-cap audit clean, regression guard added. 7 quality tests.
+- [1.40.1] cleanupPeriodDays: 30 pinned in template — ROADMAP #225. CC 2.1.117 expanded `cleanupPeriodDays` to also cover `~/.claude/tasks/`. Aggressive defaults could prune in-progress TodoWrite checklists for paused long-running features. Wizard now ships a 30-day floor + documented gotcha. 7 quality tests.
 - [1.40.0] CLI version detection in /update-wizard — ROADMAP #232. New Step 1.5 detects locally installed `agentic-sdlc-wizard` CLI version (npm ls + npx cache inspection, both with semver-aware ordering), compares to `registry.npmjs.org/agentic-sdlc-wizard/latest`, and surfaces a 3-way upgrade choice BEFORE drift detection: A) refresh CLI cache only (default, safest), B) `init --force` re-init with explicit non-settings overwrite warning, C) skip. Closes the gap where in-session file updates landed but the user's stale npx cache kept running an old CLI. Mirrors `claude update` UX. 8 quality tests, mutation-verified.
 - [1.39.1] Step 7.7 hoist — dead-plugin cleanup now runs even when wizard versions match. Previously `/update-wizard` exited at "you're up to date" before reaching Step 7.7, so users on the latest wizard with a stale `~/.claude/settings.json` plugin registration were never offered cleanup. New `tests/test-update-skill-step-7-7.sh` (8 quality tests) guards the ordering.
 - [1.39.0] Community feature-discovery scanner — ROADMAP #207. `tests/e2e/scan-community.sh` extracts unknown `/slash-command` mentions from transcript text (Reddit / HN / Discord exports), dedupes against `tests/e2e/known-slash-commands.txt` allowlist, emits JSON digest of candidates with count + sample. Replaces the deleted CI scan-community job (per #231 Phase 3) with a maintainer-runnable offline scan. 14 quality tests.