npm - agentic-sdlc-wizard - Versions diffs - 1.33.0 → 1.35.0 - Mend

agentic-sdlc-wizard 1.33.0 → 1.35.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15) hide show

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +52 -0
package/CLAUDE_CODE_SDLC_WIZARD.md +106 -38
package/README.md +19 -1
package/cli/init.js +1 -0
package/cli/templates/settings.json +11 -4
package/hooks/hooks.json +11 -0
package/hooks/instructions-loaded-check.sh +104 -4
package/hooks/precompact-seam-check.sh +75 -0
package/hooks/sdlc-prompt-check.sh +63 -0
package/package.json +1 -1
package/skills/sdlc/SKILL.md +44 -7
package/skills/setup/SKILL.md +44 -11
package/skills/update/SKILL.md +50 -1

package/.claude-plugin/marketplace.json CHANGED Viewed

@@ -13,7 +13,7 @@
       "name": "sdlc-wizard",
       "source": ".",
       "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
-      "version": "1.33.0",
+      "version": "1.35.0",
       "author": {
         "name": "Stefan Ayala"
       },

package/.claude-plugin/plugin.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "sdlc-wizard",
-  "version": "1.33.0",
+  "version": "1.35.0",
   "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
   "author": {
     "name": "Stefan Ayala",

package/CHANGELOG.md CHANGED Viewed

@@ -4,6 +4,58 @@ All notable changes to the SDLC Wizard.
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
+## [1.35.0] - 2026-04-19
+### Added
+- **PreCompact seam gate** (#205 / ROADMAP #208). New `hooks/precompact-seam-check.sh` blocks manual `/compact` when `.reviews/handoff.json` status is `PENDING_REVIEW`/`PENDING_RECHECK` or a git rebase/merge/cherry-pick is in flight. Matcher is `manual` — auto-compact is deliberately NOT gated (blocking it could push past 100% context). Requires Claude Code v2.1.105+. 10 quality tests.
+- **Self-healing PreCompact** (#206 / ROADMAP #209). Hook now treats a stale `PENDING_*` handoff as implicit `CERTIFIED` when optional `pr_number` field is present and `gh pr view <N>` reports `MERGED`. Fixes the "forgot to flip status to CERTIFIED after merge" consumer bug. Graceful fallback: if `pr_number` absent, `gh` missing, offline, or any error → existing block behavior. 4 new quality tests with mocked `gh` binary.
+- **Dynamic effort auto-bump hook** (#202 / ROADMAP #195). `sdlc-prompt-check.sh` scans `UserPromptSubmit` payload for LOW-confidence / FAILED-repeatedly / CONFUSED phrases and logs a timestamped signal. At ≥2 recent signals in a 30-min window, emits a loud `!! EFFORT BUMP REQUIRED !!` block with the exact `/effort xhigh` command. 8 quality tests.
+- **Loud staleness nudge** (#201 / ROADMAP #196). `instructions-loaded-check.sh` now caches npm-latest for 24h and prints a loud multi-line warning when the installed wizard is ≥3 minor versions behind. 1–2 minor keeps the existing mild one-liner.
+- **Session-start CC auto-update nudge** (#192 / ROADMAP #85). Instructions-loaded hook queries for open auto-update PRs and nudges the user to review before compacting.
+- **Hook token-cost caps** (#203). 4 new size-cap tests across every hook. Negative control injects echo bloat to prove the caps trip.
+- **Permissions allowlist** (#204). Top read-only tools pre-approved in `.claude/settings.json` to cut permission prompts during automation.
+- **Codex-audit-on-CI-logs shepherd step** (docs). SDLC skill now requires running `codex exec xhigh` against both Tier 1 and Tier 2 CI logs separately — catches silent failures, degraded metrics, and warnings-promoted-to-errors that the green checkmark hides. Dogfooded on PR #206: caught 4 P1s in pre-existing CI infra (tracked as ROADMAP #210, #211, #215 + regression of #93).
+### Fixed
+- **CI persist-to-PR-branch race** (#196 / ROADMAP #193). Tier 2 no longer aborts the whole run on a single low-score trial; records the trial instead.
+- **Setup wizard `allowedTools` → `permissions.allow`** (#200 / ROADMAP #197). CC v2.1 renamed the field; setup template updated.
+- **Setup wizard `opus[1m]` opt-in** (#199 / ROADMAP #198). Default no longer force-pins; respects explicit user choice.
+- **SessionStart hook model-field absence** (#180 / commit 3b23860). Model isn't exposed in SessionStart payload; hook now detects effort-only.
+### Docs
+- Memory lessons promoted (#194): tier2 exit-code pattern + pipeline liveness.
+- Community paths (#191 / ROADMAP #98): issue + PR templates + Discussions enabled.
+- ROADMAP backlog filed this cycle: #210 (Node 24 false-green), #211 (Tier 1 "11/10" score), #212 (local-Max E2E), #213 (ship degradation env vars by default — blocked on #214), #214 (adaptive-thinking A/B Prove-It), #215 (Tier 2 persist dead code), #216 (repo rename).
+## [1.34.0] - 2026-04-17
+### Added
+- Memory Audit Protocol for promoting private-memory lessons to shared docs (#189)
+  - New `/sdlc` subsection under "After Session (Capture Learnings)" defines a three-bucket classifier (`promote` / `keep` / `manual-review`) with a rule-based privacy denylist (`type: user`/`reference` → keep, `project`/`feedback` → manual-review)
+  - YAML frontmatter parser in `tests/test-memory-audit-protocol.sh` normalizes inline comments, quoted values, and whitespace so variants like `type: "user" # external` still route to keep
+  - `SDLC.md` now has a `## Lessons Learned` section seeded with 7 verified technical gotchas (GH CLI stdout, `workflows` YAML scope, GITHUB_TOKEN workflow triggers, GHA `${{ }}` backtick substitution, macOS bash 3.x, stderr/stdout separation for JSON parsing, `continue-on-error` + `||` masking); each entry cites its originating PR or incident date and was re-verified with a runnable repro before promotion
+  - 10-fixture corpus at `tests/fixtures/memory-audit-corpus/` (6 promote / 2 keep / 2 manual-review) with `test_expected` frontmatter seeds the future LLM-gated quality runner
+  - 12-test protocol suite covers structure, rule-based denylist, YAML-variant hardening, corpus consistency (promote fixtures route to manual-review under rule-based), and corpus shape
+  - Codex xhigh 3-round code review: 4/10 → 8/10 → 10/10 CERTIFIED. Caught two false lessons in private memory (`${3:-{}}` brace-default claim and `--argjson result` jq-conflict claim) that were retracted with dated strikethroughs — the protocol's first real use prevented its own false claims from shipping
+  - CLI distributes skill updates + new SDLC.md section; CI wire-up in `.github/workflows/ci.yml` (validate job)
+- API feature detection shepherd for Claude API release notes (#100, PRs #184, #186, #187)
+  - LLM-free weekly detector at `.github/workflows/weekly-api-update.yml` polls `platform.claude.com/docs/en/release-notes/api.md`
+  - `scripts/parse-api-changelog.py` parses ATX date headers with ordinal-date normalizer and bullet-summary capture (non-date sub-headers like `#### SDKs` no longer terminate bullet extraction); 200-char truncation with ellipsis; tab scrub
+  - `scripts/persist-api-state.sh` writes last-seen date with branch-protection-safe non-blocking push; opens/updates a single `api-review-needed` tracking issue with enriched bullet summaries (not just dates)
+  - `instructions-loaded-check.sh` nudges at session start when open issues exist; gated on local workflow presence so consumer forks see only their own detector's issues
+  - 33 tests including 8 fixture-based parser tests (bullet capture, subheader boundary, tab scrub, truncation, ordinal dates) and 2 integration tests
+  - Codex xhigh 5 rounds across 2 PRs: 9/10 CERTIFIED. Found-in-prod P0 hotfix in #187 — `gh api` writes JSON error bodies to stdout (not stderr), so the label-create `already_exists` check was broken after the first successful dispatch; pattern now captures both streams
+### Fixed
+- `gh api` error handling in `weekly-api-update.yml` now captures stdout+stderr together for `already_exists` detection on label creation (#187). Added as portable lesson in `SDLC.md` Lessons Learned
+### Docs
+- `/less-permission-prompts` Claude Code native skill surfaced in wizard and setup documentation (#183)
+- README community section restyled with visual Discord badge for Automation Station
 ## [1.33.0] - 2026-04-17
 ### Added

package/CLAUDE_CODE_SDLC_WIZARD.md CHANGED Viewed

@@ -403,7 +403,7 @@ The `if` field on individual hook handlers filters by tool name AND arguments us
 | `matcher` | Group (all hooks in array) | Tool name only | Regex (`Write\|Edit`) |
 | `if` | Individual handler | Tool name + arguments | Permission rule (`Edit(src/**)`) |
-**Pattern examples:** `Edit(*.ts)`, `Write(src/**)`, `Bash(git *)`. Same syntax as `allowedTools` in settings.json.
+**Pattern examples:** `Edit(*.ts)`, `Write(src/**)`, `Bash(git *)`. Same syntax as `permissions.allow` in settings.json.
 **Only works on tool-use events:** `PreToolUse`, `PostToolUse`, `PostToolUseFailure`. Adding `if` to non-tool events prevents the hook from running.
@@ -846,6 +846,22 @@ Two tools for managing context — use the right one:
 **Best practice:** Put persistent instructions in CLAUDE.md (survives both `/compact` and `/clear`), not in conversation.
+### Compact at Seams, Not Thresholds (PreCompact hook)
+**The threshold is the trigger, not the decision.** 25-30% remaining (~70% used) is the commonly-cited "sweet spot" but ignores *what you're doing* at that moment. Compacting mid-Codex-round loses the round-1 evidence and certify conditions that round-2 needs to re-verify. Compacting mid-rebase strands the operation without the context that was setting it up.
+A **seam** is a point where losing conversational context is safe:
+- Commit boundary (change persisted to git)
+- Codex `CERTIFIED` (review cycle closed)
+- PR merged (work shipped)
+- ROADMAP item marked DONE
+The wizard's `PreCompact` hook (`hooks/precompact-seam-check.sh`) enforces this for **manual** `/compact` only — it reads `.reviews/handoff.json` and blocks with `HOLD` + exit 2 when status is `PENDING_REVIEW` / `PENDING_RECHECK`, and also blocks when a git rebase, merge, or cherry-pick is in progress. Auto-compact is **not** gated — blocking it could push past 100% context and lose everything. Requires Claude Code **v2.1.105+** (PreCompact event introduced 2026-04-13).
+**What's NOT checked:** in-progress TodoWrite tasks. Claude Code does not persist TodoWrite state to a file readable from a hook, so "finish the current todo first" is on you, not the hook. Watch the TodoWrite panel before you `/compact`.
+Override: resolve the blocker (certify the review, finish the rebase), or temporarily disable the hook in `.claude/settings.json`. Don't suppress the warning reflexively — the warning is the point.
 ### Autocompact Tuning
 Override the default auto-compact threshold with environment variables. These are community-discovered settings referenced in upstream issues ([#34332](https://github.com/anthropics/claude-code/issues/34332), [#42375](https://github.com/anthropics/claude-code/issues/42375)) — not yet officially documented by Anthropic. For a rigorous benchmarking methodology to validate these thresholds, see [AUTOCOMPACT_BENCHMARK.md](AUTOCOMPACT_BENCHMARK.md).
@@ -855,7 +871,9 @@ Override the default auto-compact threshold with environment variables. These ar
 | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` | Trigger compaction at this % of context capacity (1-100) | ~95% |
 | `CLAUDE_CODE_AUTO_COMPACT_WINDOW` | Override context capacity in tokens (useful for 1M models) | Model default |
-**Recommended:** The SDLC Wizard CLI sets `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` and `"model": "opus[1m]"` in `.claude/settings.json` by default (tuned for the 1M context window — compacts at ~300K). To customize, edit `.claude/settings.json`:
+**Opt-in (issue #198):** The SDLC Wizard CLI ships `.claude/settings.json` with **no** `model` or `env` pin so Claude Code's auto-mode stays enabled. The setup skill's Step 9.5 asks whether to opt into `"model": "opus[1m]"` + `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M window — compacts at ~300K). Default answer is **No**. Pinning the model at the top level tells Claude Code you've explicitly chosen a model and turns off per-turn model auto-selection — a real tradeoff, so we ask. Power users who want guaranteed Opus 4.7 + 1M context answer yes.
+To opt in by hand, edit `.claude/settings.json`:
 ```json
 {
@@ -876,7 +894,7 @@ export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30
 | Use Case | AUTOCOMPACT % | Why |
 |----------|--------------|-----|
-| **SDLC default (`opus[1m]`)** | **30%** | **Fires at ~300K on 1M — right balance for plan + TDD + review sessions** |
+| **Opt-in SDLC setup (`opus[1m]`)** | **30%** | **Fires at ~300K on 1M — right balance for plan + TDD + review sessions. Paired with the opt-in `opus[1m]` pin (see issue #198)** |
 | General development (200K `opus`) | 75% | Leaves room for implementation after planning |
 | Complex refactors (200K `opus`) | 80% | Slightly more context before compaction |
 | CI pipelines | 60% | Short tasks, compact early to stay fast |
@@ -892,31 +910,33 @@ The thresholds above are community consensus — not empirically validated. For
 ### 1M vs 200K Context Window
-Claude Code supports both 200K and 1M context windows. **Default to `opus[1m]` for SDLC work** — the 1M headroom is free until you actually use it.
+Claude Code supports both 200K and 1M context windows. **`opus[1m]` is an opt-in power-user pin** — ask yourself whether you actually need the headroom before setting it, because pinning the model at the top level disables Claude Code's auto-mode (see issue #198).
-| | 200K Context (`opus`) | 1M Context (`opus[1m]`) **← default** |
+| | 200K Context (default / auto-mode) | 1M Context (`opus[1m]`, opt-in) |
 |---|---|---|
-| **Best for** | Short one-off tasks | Normal SDLC cycles + multi-feature work |
+| **Best for** | Most work — auto-mode picks Sonnet/Opus per turn | Multi-feature / long plan+TDD+review cycles where a single session really crosses 100K+ |
 | **Typical usage** | 50-80K tokens per task | 50-80K typical, up to 200K+ for complex workflows |
 | **Cost** | Standard pricing | Anthropic currently lists the 1M window at standard pricing across the full context for supported Opus/Sonnet models — **verify current rates at [docs.anthropic.com/pricing](https://docs.anthropic.com/)** before assuming no premium |
-| **Auto-compact** | Default 95% works well | Fires at ~76K by default ([issue #34332](https://github.com/anthropics/claude-code/issues/34332)) — **tune to 30%** |
-| **Suggested override** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` or `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` |
+| **Auto-mode** | **Enabled** — Claude Code chooses model per turn | **Disabled** — top-level `model` tells CC you've chosen explicitly |
+| **Auto-compact** | Default ~95% works well | Fires at ~76K by default ([issue #34332](https://github.com/anthropics/claude-code/issues/34332)) — pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` |
+| **Suggested override (if you pin)** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` or `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` |
+**Why `opus[1m]` is opt-in (issue #198):**
+- **Pinning disables auto-mode.** Max-plan users pay for Claude Code's per-turn model selection (Sonnet for cheap tasks, Opus for hard ones, plus weekly-limit smoothing). A top-level `model` gives that up.
+- **The 1M headroom has to earn it.** If your typical session stays under 150K, you're giving up auto-mode for headroom you're not using.
+- **Power users who want guaranteed Opus 4.7 + 1M** — go ahead, it's a real win for long shepherding sessions. Just make it a conscious choice, not a silent default.
-**Why `opus[1m]` as default:**
-- **Long SDLC sessions accumulate context fast** — plan → TDD → review → CI shepherd on a single feature regularly crosses 100K tokens
-- **Safety margin against autocompact loss** — cheaper to have headroom than to re-read files after a forced compact
-- **At time of writing, Anthropic lists 1M context at standard pricing for supported Opus/Sonnet models.** Verify current rates for your plan before relying on this — see [docs.anthropic.com/pricing](https://docs.anthropic.com/)
+**Opt in when:** you routinely cross 100K tokens in a single session (plan → TDD → review → CI shepherd on one feature), you want Opus 4.7 specifically (not Sonnet), and you're OK losing auto-mode.
-**Set it up:** `/model opus[1m]` in your session, or set `"model": "opus[1m]"` in `.claude/settings.json`. The CLI template ships with this default. Requires Claude Code v2.1.111+ for Opus 4.7.
+**Stay on auto-mode (default) when:** you're unsure, your work is mixed short/long, or you want Claude Code to do the model math for you.
-**Fall back to `opus` (200K) when:**
-- Your plan or organization charges higher rates for long-context prompts (check your billing)
-- You're doing genuinely short one-off tasks and want slightly faster responses
-- Your team has cost controls that flag >200K prompts
+**How to opt in:** run `/model opus[1m]` in your session (transient), or set `"model": "opus[1m]"` in `.claude/settings.json` (persistent). Requires Claude Code v2.1.111+ for Opus 4.7. The setup wizard's Step 9.5 also asks once, with default No.
+**How to opt out:** remove the `model` line from `.claude/settings.json`, or run `/model` and pick "Default (recommended)".
 **Cost awareness:** Larger windows let you consume more tokens in one session, and total cost always scales with tokens consumed regardless of tier. Use `/cost` to monitor — a 900K-token session is meaningfully more expensive than an 80K one even at standard rates.
-**Autocompact pairing (important):** If you default to `opus[1m]`, also set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` — otherwise CC's default autocompact fires at ~76K and destroys the headroom you're paying for. The CLI template sets this automatically.
+**Autocompact pairing (important):** If you opt into `opus[1m]`, also set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` — otherwise CC's default autocompact fires at ~76K and destroys the headroom you're paying for. Step 9.5 writes both together when you opt in.
 ---
@@ -1300,7 +1320,7 @@ Feature branches still recommended for solo devs (keeps main clean, easy rollbac
 **CI monitoring detail:**
 > "Should Claude monitor CI checks after pushing and auto-diagnose failures? (y/n)"
-- **Yes** → Enable CI feedback loop in SDLC skill, add `gh` CLI to allowedTools
+- **Yes** → Enable CI feedback loop in SDLC skill, add `gh` CLI to `permissions.allow`
 - **No** → Skip CI monitoring steps (Claude still runs local tests, just doesn't watch CI)
 **CI review feedback question (only if CI monitoring is enabled):**
@@ -1365,7 +1385,7 @@ Claude scans for:
 │   ├── .github/workflows/deploy*.yml     → GitHub Actions deploy
 │   └── package.json scripts (deploy:*)   → npm deploy scripts
 │
-├── Tool permissions (for allowedTools):
+├── Tool permissions (for permissions.allow):
 │   ├── package.json           → Bash(npm *), Bash(node *), Bash(npx *)
 │   ├── pnpm-lock.yaml         → Bash(pnpm *)
 │   ├── yarn.lock              → Bash(yarn *)
@@ -1759,18 +1779,20 @@ Create `.claude/settings.json`:
 ```json
 {
   "verbosity": "medium",
-  "allowedTools": [
-    "Read",
-    "Edit",
-    "Write",
-    "Glob",
-    "Grep",
-    "Task",
-    "Bash(npm *)",
-    "Bash(node *)",
-    "Bash(npx *)",
-    "Bash(gh *)"
-  ],
+  "permissions": {
+    "allow": [
+      "Read",
+      "Edit",
+      "Write",
+      "Glob",
+      "Grep",
+      "Task",
+      "Bash(npm *)",
+      "Bash(node *)",
+      "Bash(npx *)",
+      "Bash(gh *)"
+    ]
+  },
   "hooks": {
     "UserPromptSubmit": [
       {
@@ -1800,7 +1822,7 @@ Create `.claude/settings.json`:
 ### Allowed Tools (Adaptive)
-The `allowedTools` array is auto-generated based on your stack detected in Step 0.4.
+The `permissions.allow` array is auto-generated based on your stack detected in Step 0.4. (Historical note: pre-#197 guidance used a top-level `allowedTools` array — that form silently disables Claude Code auto-mode, so the wizard writes `permissions.allow` now.)
 | If Detected | Tools Added |
 |-------------|-------------|
@@ -1841,6 +1863,9 @@ The `allowedTools` array is auto-generated based on your stack detected in Step
 |------|---------------|---------|
 | `UserPromptSubmit` | Every message you send | Baseline SDLC reminder, skill auto-invoke |
 | `PreToolUse` | Before Claude edits files | TDD check: "Did you write the test first?" Uses `if` field to only fire on source files |
+| `InstructionsLoaded` | On first SDLC.md/CLAUDE.md load | Staleness nudges (wizard version, review-protocol reminders, CC release alerts) |
+| `SessionStart` | On `claude` startup | Detect stale effort setting / model upgrades |
+| `PreCompact` (manual only) | When user runs `/compact` | **Seam gate** — blocks manual compact when `.reviews/handoff.json` is `PENDING_REVIEW`/`PENDING_RECHECK` or a git rebase/merge/cherry-pick is in progress. Auto-compact is NOT gated (blocking it risks pushing past 100% context and losing everything). Requires Claude Code v2.1.105+ |
 ### How Skill Auto-Invoke Works
@@ -2057,9 +2082,11 @@ Before presenting approach, STATE your confidence:
 |-------|---------|--------|--------|
 | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
 | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
-| LOW (<60%) | Not sure | ASK USER before proceeding | Consider `/effort max` |
-| FAILED 2x | Something's wrong | STOP. ASK USER immediately | Try `/effort max` |
-| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | Try `/effort max` |
+| LOW (<60%) | Not sure | ASK USER before proceeding | **Run `/effort xhigh` now** — don't wait |
+| FAILED 2x | Something's wrong | STOP. ASK USER immediately | **Run `/effort max` now** — you're burning cycles at lower effort |
+| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | **Run `/effort max` now** — stop spinning |
+**Dynamic bumping is NOT optional.** "Consider max effort" is the same as "ignore this" in practice. If your confidence drops or tests fail twice, bump effort BEFORE the next attempt — spinning at low effort is an SDLC failure mode.
 ## Self-Review Loop (CRITICAL)
@@ -2626,6 +2653,28 @@ These are your full reference docs. Start with stubs and expand over time:
 **Claude follows this automatically.** After a deploy task, Claude runs through the Post-Deploy Verification table for the target environment. If any check fails, Claude suggests rollback and a new fix cycle.
+## Pipeline Liveness Audits — CI green ≠ data flowing
+A green CI badge only means "no step crashed." It does **not** mean "this pipeline is still producing the output it's supposed to produce." Long-running pipelines — scheduled benchmarks, nightly analytics jobs, weekly report generators, any workflow that appends to a log or dataset — can silently stop producing output while every run still reports success. Regression tests alone do not catch this: the fault lives between "green status" and "observable artifact."
+**Symptom to watch for:** a file, table, or dashboard that's supposed to be updated by a scheduled workflow stops advancing even though the workflow keeps running green.
+**Concrete example from this repo (2026-04-18):** `tests/e2e/score-history.jsonl` hadn't been appended to since 2026-03-30, yet weekly runs kept completing. Two stacked causes:
+1. On the 2026-04-13 run, a `CRITICAL MISS` caused `evaluate.sh` to exit 1 *with* a valid score payload; the tier2 wrapper aborted on the non-zero exit and dropped the trial (fix: PR #193 — disambiguate infra error from legitimate low-score exit via JSON payload, not exit code).
+2. On other runs, a separate PR-branch push race (`refs/pull/N/merge` checkout vs. `refs/heads/<branch>` push) silently dropped the new trial because `continue-on-error: true` was set on the push step.
+The second failure was silent (push step protected by `continue-on-error`); the first was a red CI run but nobody was watching weekly runs closely enough to notice the artifact had stopped advancing. Either way, the artifact's liveness would have caught the stall weeks earlier than a CI-badge-only review.
+**Audit pattern — run when you touch a pipeline, and when asked to check pipeline health:**
+1. **Identify the observable output** — the artifact the pipeline is supposed to produce (file, PR, issue, log row, dashboard row).
+2. **Check its liveness** — what's the last timestamp? If the pipeline runs weekly but the artifact is 3+ cycles stale, that's a stall, not a lull.
+3. **Walk backward from the artifact to CI** — find the step that writes it, read that specific step's logs (not just the top-level status), confirm the write actually happened.
+4. **When `continue-on-error: true` is present upstream of the write step**, treat that step as suspect by default — its failures are masked.
+**When Claude should run this.** Claude runs the liveness audit when it merges or edits a scheduled workflow, when it ships a change that writes to a long-running artifact, and whenever the user asks to check a pipeline's health. It is not a background task — if no one is looking, the audit does not happen.
 ## Rollback
 If deployment fails or post-deploy verification catches issues:
@@ -2662,7 +2711,7 @@ If deployment fails or post-deploy verification catches issues:
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.33.0 -->
+<!-- SDLC Wizard Version: 1.35.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -3135,6 +3184,8 @@ Want me to file these? (yes/no/not now)
 **`/revise-claude-md` scope:** Only updates CLAUDE.md. It does NOT touch feature docs, TESTING.md, hooks, or skills. Use it for general project context that applies across the codebase.
+**Memory Audit Protocol:** Per-user memory at `~/.claude/projects/<proj>/memory/` accumulates private learnings. Some are portable technical lessons that belong in shared docs. The `/sdlc` skill's **Memory Audit Protocol** section (under "After Session (Capture Learnings)") defines a three-bucket classifier (`promote` / `keep` / `manual-review`) with a type-based denylist that keeps `user`/`reference` entries private and routes `project`/`feedback` entries to human review. Run at end-of-release or after debugging-heavy sessions. Human approves every promotion chunk-by-chunk before apply.
 **When to do mini-retro:** After features, tricky bugs, or discovering gotchas. Skip for one-line fixes or questions.
 **The SDLC evolves:** Weekly research, monthly deep-dives, and CI friction signals feed improvements. Human approves, the system gets better.
@@ -3721,7 +3772,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 ```markdown
-<!-- SDLC Wizard Version: 1.33.0 -->
+<!-- SDLC Wizard Version: 1.35.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->
@@ -4060,6 +4111,23 @@ When Anthropic provides official plugins that overlap with this SDLC:
 **Re-run `claude-code-setup` periodically** (quarterly, or when your project expands in scope) to catch new automations — MCP servers, hooks, subagents — that weren't relevant at initial setup but are now.
+**API feature shepherd (self-maintenance, roadmap #100):**
+The wizard watches the **Anthropic API changelog** — not just Claude Code CLI releases — for new betas, tools, and agent features. The detector runs in `.github/workflows/weekly-api-update.yml`, is intentionally LLM-free, and only opens a tracking issue labeled `api-review-needed` when new entries appear at `platform.claude.com/docs/en/release-notes/api`.
+When that issue is open, the session-start hook nudges you. The session (not the workflow) does the deep research + adoption via the full SDLC loop. This mirrors the "local shepherd" pattern used for CI fixes: cheap Action-layer detection + session-time analysis beats expensive Action-layer LLM calls.
+The gap this closes: the advisor tool (API beta, `advisor-tool-2026-03-01`) shipped and was missed for several days before manual discovery. Detector would have flagged it on the next weekly tick.
+**Complementary native skills worth knowing:**
+| Native Skill | What It Does | When to Run |
+|--------------|--------------|-------------|
+| `/less-permission-prompts` | Scans transcripts for common read-only Bash/MCP calls and proposes a prioritized allowlist | After a few sessions — reduces permission friction without auto mode |
+| `/permissions` | Pre-allow specific commands and check them into `.claude/settings.json` | Anytime you want an auditable team allowlist |
+These are shipped by Claude Code itself. The wizard doesn't reimplement them — it points you at them so you benefit from the native version's ongoing maintenance.
 ### When Claude Code Improves
 Claude Code is actively improving. When they add built-in features:

package/README.md CHANGED Viewed

@@ -237,8 +237,26 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
 ## Community
-Come join **[Automation Station](https://discord.com/invite/fGPEF7GHrF)** — a community Discord packed with software engineers bringing 40+ years of combined experience across every area of the stack (frontend, backend, infra, embedded, data, QA, DevOps, you name it). Share patterns, ask questions, compare notes on AI agents, automation, and SDLC tooling.
+<div align="center">
+[![Discord](https://img.shields.io/badge/Discord-Automation%20Station-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.com/invite/fGPEF7GHrF)
+**[Automation Station](https://discord.com/invite/fGPEF7GHrF)** — a community Discord packed with software engineers bringing 40+ years of combined experience across every area of the stack.
+_Frontend · Backend · Infra · Embedded · Data · QA · DevOps_
+Share patterns, ask questions, compare notes on AI agents, automation, and SDLC tooling.
+</div>
 ## Contributing
 PRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for evaluation methodology and testing.
+## Feedback
+Three ways to report bugs, request features, or ask questions:
+- **In-session:** run `/feedback` inside any Claude Code session using this wizard — auto-fills context and redacts secrets before filing
+- **Issue templates:** [bug report](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/issues/new?template=bug_report.md), [feature request](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/issues/new?template=feature_request.md), [question](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/issues/new?template=question.md)
+- **Discussions:** open-ended conversations at [github.com/BaseInfinity/agentic-ai-sdlc-wizard/discussions](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/discussions)

package/cli/init.js CHANGED Viewed

@@ -24,6 +24,7 @@ const FILES = [
   { src: 'hooks/tdd-pretool-check.sh', dest: '.claude/hooks/tdd-pretool-check.sh', executable: true, base: REPO_ROOT },
   { src: 'hooks/instructions-loaded-check.sh', dest: '.claude/hooks/instructions-loaded-check.sh', executable: true, base: REPO_ROOT },
   { src: 'hooks/model-effort-check.sh', dest: '.claude/hooks/model-effort-check.sh', executable: true, base: REPO_ROOT },
+  { src: 'hooks/precompact-seam-check.sh', dest: '.claude/hooks/precompact-seam-check.sh', executable: true, base: REPO_ROOT },
   { src: 'skills/sdlc/SKILL.md', dest: '.claude/skills/sdlc/SKILL.md', base: REPO_ROOT },
   { src: 'skills/setup/SKILL.md', dest: '.claude/skills/setup/SKILL.md', base: REPO_ROOT },
   { src: 'skills/update/SKILL.md', dest: '.claude/skills/update/SKILL.md', base: REPO_ROOT },

package/cli/templates/settings.json CHANGED Viewed

@@ -1,8 +1,4 @@
 {
-  "model": "opus[1m]",
-  "env": {
-    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30"
-  },
   "hooks": {
     "UserPromptSubmit": [
       {
@@ -45,6 +41,17 @@
           }
         ]
       }
+    ],
+    "PreCompact": [
+      {
+        "matcher": "manual",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/precompact-seam-check.sh"
+          }
+        ]
+      }
     ]
   }
 }

package/hooks/hooks.json CHANGED Viewed

@@ -42,6 +42,17 @@
           }
         ]
       }
+    ],
+    "PreCompact": [
+      {
+        "matcher": "manual",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/precompact-seam-check.sh"
+          }
+        ]
+      }
     ]
   }
 }

package/hooks/instructions-loaded-check.sh CHANGED Viewed

@@ -34,14 +34,71 @@ if [ -n "$MISSING" ]; then
     echo "Invoke Skill tool, skill=\"setup-wizard\" to generate them."
 fi
-# Version update check (non-blocking, best-effort)
+# Version update check (non-blocking, best-effort).
+# Fetches npm latest at most once per 24h (ROADMAP #196). Prints a stronger
+# multi-line nudge when the gap is ≥3 minor versions — the one-liner gets
+# skipped after weeks of ignoring it (user feedback 2026-04-18).
 SDLC_MD="$PROJECT_DIR/SDLC.md"
+# Strict x.y.z semver — rejects whitespace, "junk", "1.alpha.0", etc.
+SEMVER_RE='^[0-9]+\.[0-9]+\.[0-9]+$'
 if [ -f "$SDLC_MD" ]; then
     INSTALLED_VERSION=$(grep -o 'SDLC Wizard Version: [0-9.]*' "$SDLC_MD" | head -1 | sed 's/SDLC Wizard Version: //')
-    if [ -n "$INSTALLED_VERSION" ] && command -v npm > /dev/null 2>&1; then
-        LATEST_VERSION=$(npm view agentic-sdlc-wizard version 2>/dev/null) || true
+    if [ -n "$INSTALLED_VERSION" ] && [[ "$INSTALLED_VERSION" =~ $SEMVER_RE ]]; then
+        VERSION_CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
+        VERSION_CACHE_FILE="$VERSION_CACHE_DIR/latest-version"
+        LATEST_VERSION=""
+        # Use cache if present, <24h old, and contents are valid semver
+        if [ -f "$VERSION_CACHE_FILE" ]; then
+            if stat -f %m "$VERSION_CACHE_FILE" > /dev/null 2>&1; then
+                CACHE_MTIME=$(stat -f %m "$VERSION_CACHE_FILE")
+            else
+                CACHE_MTIME=$(stat -c %Y "$VERSION_CACHE_FILE" 2>/dev/null || echo 0)
+            fi
+            CACHE_AGE=$(( $(date +%s) - CACHE_MTIME ))
+            if [ "$CACHE_AGE" -lt 86400 ]; then
+                CACHE_CONTENT=$(cat "$VERSION_CACHE_FILE" 2>/dev/null) || CACHE_CONTENT=""
+                if [[ "$CACHE_CONTENT" =~ $SEMVER_RE ]]; then
+                    LATEST_VERSION="$CACHE_CONTENT"
+                fi
+            fi
+        fi
+        # Fetch from npm if cache miss / stale / malformed
+        if [ -z "$LATEST_VERSION" ] && command -v npm > /dev/null 2>&1; then
+            NPM_RESULT=$(npm view agentic-sdlc-wizard version 2>/dev/null) || NPM_RESULT=""
+            if [[ "$NPM_RESULT" =~ $SEMVER_RE ]]; then
+                LATEST_VERSION="$NPM_RESULT"
+                mkdir -p "$VERSION_CACHE_DIR" 2>/dev/null || true
+                printf '%s' "$LATEST_VERSION" > "$VERSION_CACHE_FILE" 2>/dev/null || true
+            fi
+        fi
         if [ -n "$LATEST_VERSION" ] && [ "$LATEST_VERSION" != "$INSTALLED_VERSION" ]; then
-            echo "SDLC Wizard update available: ${INSTALLED_VERSION} → ${LATEST_VERSION} (run /update-wizard)"
+            # Minor-version delta: 1.25.0 vs 1.34.0 → 9
+            INSTALLED_MINOR=$(echo "$INSTALLED_VERSION" | awk -F. '{print $2+0}')
+            LATEST_MINOR=$(echo "$LATEST_VERSION" | awk -F. '{print $2+0}')
+            INSTALLED_MAJOR=$(echo "$INSTALLED_VERSION" | awk -F. '{print $1+0}')
+            LATEST_MAJOR=$(echo "$LATEST_VERSION" | awk -F. '{print $1+0}')
+            MINOR_DELTA=0
+            if [ "$INSTALLED_MAJOR" = "$LATEST_MAJOR" ]; then
+                MINOR_DELTA=$(( LATEST_MINOR - INSTALLED_MINOR ))
+            else
+                # Major bump: treat as a very large delta
+                MINOR_DELTA=99
+            fi
+            if [ "$MINOR_DELTA" -ge 3 ]; then
+                echo ""
+                echo "!! WARNING: SDLC Wizard is ${MINOR_DELTA} minor versions behind !!"
+                echo "   Installed: ${INSTALLED_VERSION}"
+                echo "   Latest:    ${LATEST_VERSION}"
+                echo "   You're missing bug fixes and features shipped across ${MINOR_DELTA} releases."
+                echo "   Strongly recommend running /update-wizard before starting new work."
+                echo ""
+            else
+                echo "SDLC Wizard update available: ${INSTALLED_VERSION} → ${LATEST_VERSION} (run /update-wizard)"
+            fi
         fi
     fi
 fi
@@ -98,6 +155,49 @@ if [ -d "$PROJECT_DIR/.claude/skills/update" ]; then
     done
 fi
+# API feature review nudge (#100) — surface open 'api-review-needed' issues
+# opened by .github/workflows/weekly-api-update.yml so the session picks up
+# new API features without waiting for manual discovery.
+#
+# Gated on LOCAL presence of the detector workflow: the CLI distributes this
+# hook to consumer projects, and we don't want to pester those users with
+# upstream-wizard issues. The nudge only fires when the current repo owns
+# the detector (= the wizard repo or a fork of it).
+if [ -f "$PROJECT_DIR/.github/workflows/weekly-api-update.yml" ] && \
+   command -v gh > /dev/null 2>&1; then
+    # Query the current repo (not hardcoded upstream) — in a fork, users see
+    # their own detector's issues, not ours.
+    API_REVIEW_COUNT=$(gh issue list \
+        --state open \
+        --label "api-review-needed" \
+        --limit 1 \
+        --json number \
+        --jq 'length' 2>/dev/null) || API_REVIEW_COUNT=""
+    if [[ "$API_REVIEW_COUNT" =~ ^[0-9]+$ ]] && [ "$API_REVIEW_COUNT" -gt 0 ]; then
+        echo "Anthropic API features pending review: ${API_REVIEW_COUNT} open issue(s) with label 'api-review-needed' (see .github/workflows/weekly-api-update.yml)"
+    fi
+fi
+# Claude Code release review nudge (#85) — surface open 'auto-update' PRs
+# opened by .github/workflows/weekly-update.yml so new CC releases get triaged
+# before they bit-rot (relevance-HIGH PRs can sit for days otherwise).
+#
+# Gated on LOCAL presence of weekly-update.yml: the CLI distributes this hook
+# to consumer projects, which don't own the detector — don't pester them with
+# upstream-wizard PRs.
+if [ -f "$PROJECT_DIR/.github/workflows/weekly-update.yml" ] && \
+   command -v gh > /dev/null 2>&1; then
+    CC_UPDATE_COUNT=$(gh pr list \
+        --state open \
+        --label "auto-update" \
+        --limit 1 \
+        --json number \
+        --jq 'length' 2>/dev/null) || CC_UPDATE_COUNT=""
+    if [[ "$CC_UPDATE_COUNT" =~ ^[0-9]+$ ]] && [ "$CC_UPDATE_COUNT" -gt 0 ]; then
+        echo "Claude Code update pending review: ${CC_UPDATE_COUNT} open auto-update PR(s) (see .github/workflows/weekly-update.yml)"
+    fi
+fi
 # Claude Code version check (non-blocking, best-effort)
 if command -v claude > /dev/null 2>&1 && command -v npm > /dev/null 2>&1; then
     CC_LOCAL=$(claude --version 2>/dev/null | grep -o '[0-9][0-9.]*' | head -1) || true

package/hooks/precompact-seam-check.sh ADDED Viewed

@@ -0,0 +1,75 @@
+#!/bin/bash
+# PreCompact hook - block manual /compact at non-seam boundaries
+# Fires on manual /compact only (auto-compact is threshold-driven; blocking
+# it could push past 100% context and lose everything). Matcher: "manual"
+# in .claude/settings.json.
+#
+# Requires Claude Code v2.1.105+ (PreCompact event introduced April 13, 2026).
+#
+# Seam policy: compacting mid-cycle loses evidence the next round needs.
+# Block when:
+#   (1) .reviews/handoff.json status is PENDING_REVIEW or PENDING_RECHECK
+#       → mid-Codex-round, compact after CERTIFIED
+#   (2) git rebase/merge/cherry-pick in progress
+#       → finish in-progress git operation first
+# Allow otherwise.
+[ ! -t 0 ] && INPUT=$(cat) || INPUT=""
+# Determine project root: prefer $CLAUDE_PROJECT_DIR, fall back to cwd
+ROOT="${CLAUDE_PROJECT_DIR:-$PWD}"
+HOLD_REASONS=""
+# Check 1: Codex review mid-cycle
+# Self-heal: if handoff has a pr_number and gh reports that PR as MERGED,
+# the handoff is a stale artifact from a closed review — treat as implicit
+# CERTIFIED so a forgotten status field doesn't permanently block /compact.
+HANDOFF="$ROOT/.reviews/handoff.json"
+if [ -f "$HANDOFF" ]; then
+    STATUS=$(grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' "$HANDOFF" 2>/dev/null | head -1 | sed 's/.*"\([^"]*\)"$/\1/')
+    case "$STATUS" in
+        PENDING_REVIEW|PENDING_RECHECK)
+            PR_NUMBER=$(grep -o '"pr_number"[[:space:]]*:[[:space:]]*[0-9][0-9]*' "$HANDOFF" 2>/dev/null | head -1 | grep -o '[0-9][0-9]*$')
+            HEALED=0
+            if [ -n "$PR_NUMBER" ] && command -v gh >/dev/null 2>&1; then
+                PR_STATE=$(gh pr view "$PR_NUMBER" --json state --jq .state 2>/dev/null)
+                [ "$PR_STATE" = "MERGED" ] && HEALED=1
+            fi
+            if [ "$HEALED" -ne 1 ]; then
+                HOLD_REASONS="${HOLD_REASONS}  - Codex review is ${STATUS}. Round-1 evidence lives in this context — compacting now loses what round-2 needs to re-verify.
+    Resolve: wait for CERTIFIED (or escalate) before /compact."$'\n'
+            fi
+            ;;
+    esac
+fi
+# Check 2: in-progress git operation
+GITDIR="$ROOT/.git"
+if [ -d "$GITDIR" ]; then
+    if [ -e "$GITDIR/REBASE_HEAD" ] || [ -d "$GITDIR/rebase-merge" ] || [ -d "$GITDIR/rebase-apply" ]; then
+        HOLD_REASONS="${HOLD_REASONS}  - Git rebase in progress. Compacting mid-rebase loses the operation's context.
+    Resolve: finish or abort the rebase before /compact."$'\n'
+    fi
+    if [ -e "$GITDIR/MERGE_HEAD" ]; then
+        HOLD_REASONS="${HOLD_REASONS}  - Git merge in progress. Compacting mid-merge loses the operation's context.
+    Resolve: finish or abort the merge before /compact."$'\n'
+    fi
+    if [ -e "$GITDIR/CHERRY_PICK_HEAD" ]; then
+        HOLD_REASONS="${HOLD_REASONS}  - Git cherry-pick in progress. Compacting mid-cherry-pick loses the operation's context.
+    Resolve: finish or abort the cherry-pick before /compact."$'\n'
+    fi
+fi
+if [ -n "$HOLD_REASONS" ]; then
+    {
+        echo "HOLD: manual /compact at a non-seam. Compacting mid-cycle loses evidence the next round needs."
+        echo ""
+        echo "$HOLD_REASONS"
+        echo "Natural seams: commit boundary, Codex CERTIFIED, PR merge, ROADMAP item DONE."
+        echo "Override: resolve the blocker above, or temporarily disable this hook in .claude/settings.json."
+    } >&2
+    exit 2
+fi
+exit 0

package/hooks/sdlc-prompt-check.sh CHANGED Viewed

@@ -17,6 +17,69 @@ else
     exit 0
 fi
+# Effort auto-bump (ROADMAP #195). Watches this UserPromptSubmit payload for
+# LOW-confidence / FAILED-repeatedly / CONFUSED phrases, logs a timestamped
+# signal, and emits a loud '/effort xhigh' nudge when ≥2 signals land inside
+# a 30-minute window. Enforces the SDLC confidence table mid-session so
+# Claude stops burning budget at 'high' after confidence drops.
+EFFORT_CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
+EFFORT_SIGNALS="$EFFORT_CACHE_DIR/effort-signals.log"
+PROMPT_TEXT=""
+if [ ! -t 0 ] && command -v jq > /dev/null 2>&1; then
+    STDIN_JSON=$(cat)
+    if [ -n "$STDIN_JSON" ]; then
+        PROMPT_TEXT=$(printf '%s' "$STDIN_JSON" | jq -r '.prompt // empty' 2>/dev/null) || PROMPT_TEXT=""
+    fi
+fi
+if [ -n "$PROMPT_TEXT" ]; then
+    LOWER=$(printf '%s' "$PROMPT_TEXT" | tr '[:upper:]' '[:lower:]')
+    SIGNAL_REASON=""
+    # Every trigger requires first-person ownership or a structured-label
+    # form, so educational/quoted mentions ("How do I name a low confidence
+    # badge?", "What does 'failed again' mean?") don't fire.
+    case "$LOWER" in
+        *"i'm stuck"*|*"i am stuck"*|*"im stuck"*|\
+        *"i'm confused"*|*"i am confused"*|*"im confused"*|\
+        *"i tried twice"*|*"i've tried twice"*|*"ive tried twice"*|\
+        *"i can't figure"*|*"i cant figure"*|\
+        *"i'm not sure why"*|*"i am not sure why"*|*"im not sure why"*|\
+        *"my confidence is low"*|*"my confidence: low"*|*"confidence: low"*|\
+        *"it's still failing"*|*"its still failing"*|\
+        *"it keeps failing"*|*"this keeps failing"*|\
+        *"it failed again"*|*"this failed again"*|\
+        *"failed twice"*|*"failed 2x"*)
+            SIGNAL_REASON="low"
+            ;;
+    esac
+    if [ -n "$SIGNAL_REASON" ]; then
+        # Group the write so redirection errors (e.g., unwritable HOME,
+        # cache-dir-is-a-file) land on /dev/null instead of leaking to stderr.
+        {
+            if mkdir -p "$EFFORT_CACHE_DIR" && [ -d "$EFFORT_CACHE_DIR" ]; then
+                # Prune entries older than 1h on every write to cap log size.
+                if [ -f "$EFFORT_SIGNALS" ]; then
+                    PRUNE_THRESH=$(( $(date +%s) - 3600 ))
+                    awk -v t="$PRUNE_THRESH" '$1+0 >= t' "$EFFORT_SIGNALS" > "${EFFORT_SIGNALS}.tmp" \
+                        && mv "${EFFORT_SIGNALS}.tmp" "$EFFORT_SIGNALS"
+                fi
+                printf '%s\t%s\n' "$(date +%s)" "$SIGNAL_REASON" >> "$EFFORT_SIGNALS"
+            fi
+        } 2>/dev/null || true
+    fi
+fi
+if [ -f "$EFFORT_SIGNALS" ]; then
+    NOW=$(date +%s)
+    THRESH=$(( NOW - 1800 ))
+    RECENT=$(awk -v t="$THRESH" '$1+0 >= t' "$EFFORT_SIGNALS" 2>/dev/null | wc -l | tr -d ' ')
+    if [ "${RECENT:-0}" -ge 2 ]; then
+        echo ""
+        echo "!! EFFORT BUMP REQUIRED: ${RECENT} low-confidence signals in last 30 min !!"
+        echo "   Run /effort xhigh NOW — spinning at 'high' after confidence drops wastes budget."
+        echo "   (Auto-enforcement of the SDLC confidence table. ROADMAP #195.)"
+        echo ""
+    fi
+fi
 if [ ! -s "$PROJECT_DIR/SDLC.md" ] || [ ! -s "$PROJECT_DIR/TESTING.md" ]; then
     cat << 'SETUP'
 SETUP NOT COMPLETE: SDLC.md and/or TESTING.md are missing.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "agentic-sdlc-wizard",
-  "version": "1.33.0",
+  "version": "1.35.0",
   "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
   "bin": {
     "sdlc-wizard": "./cli/bin/sdlc-wizard.js"

package/skills/sdlc/SKILL.md CHANGED Viewed

@@ -184,9 +184,11 @@ Before presenting approach, STATE your confidence:
 |-------|---------|--------|--------|
 | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
 | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
-| LOW (<60%) | Not sure | Do more research or try cross-model research (Codex) to get to 95%. If still LOW after research, ASK USER | Consider `/effort max` |
-| FAILED 2x | Something's wrong | Try cross-model research (Codex) for a fresh perspective. If still stuck, STOP and ASK USER | Try `/effort max` |
-| CONFUSED | Can't diagnose why something is failing | Try cross-model research (Codex). If still confused, STOP. Describe what you tried, ask for help | Try `/effort max` |
+| LOW (<60%) | Not sure | Do more research or try cross-model research (Codex) to get to 95%. If still LOW after research, ASK USER | **Run `/effort xhigh` now** — don't wait |
+| FAILED 2x | Something's wrong | Try cross-model research (Codex) for a fresh perspective. If still stuck, STOP and ASK USER | **Run `/effort max` now** — you're already burning cycles at lower effort |
+| CONFUSED | Can't diagnose why something is failing | Try cross-model research (Codex). If still confused, STOP. Describe what you tried, ask for help | **Run `/effort max` now** — stop spinning |
+**Dynamic bumping is NOT optional.** "Consider max effort" is the same as "ignore this" in practice. If your confidence drops or tests fail twice, bump effort BEFORE the next attempt — not after a third failure. Spinning at low effort is an SDLC failure mode, not a style choice.
 ## Self-Review Loop (CRITICAL)
@@ -536,10 +538,11 @@ Local tests pass -> Commit -> Push -> Watch CI
 1. Push changes to remote
 2. Watch CI: `gh pr checks --watch`
 3. Read CI logs — **pass or fail**: `gh run view <RUN_ID> --log` (not just `--log-failed`). Passing CI can still hide warnings, skipped steps, or degraded scores. Don't just check the green checkmark
-4. If CI fails → diagnose from logs, fix, push again (max 2 attempts)
-5. If CI passes → read ALL review comments: `gh api repos/OWNER/REPO/pulls/PR/comments`
-6. Fix valid suggestions, push, iterate until clean
-7. Only then: explicit merge with `gh pr merge --squash`
+4. **Cross-model review the CI logs themselves** — pipe `gh run view <RUN_ID> --log` to a tmp file and run `codex exec -c 'model_reasoning_effort="xhigh"' -s danger-full-access` with a prompt like *"Audit this CI log for silent failures, skipped tests, degraded metrics, or warnings-that-should-be-errors. Green checkmark is necessary but not sufficient."* A second model catches things the first missed (e.g., a job that passed but degraded an E2E score by 30%, or a test that was silently excluded). Cheap — one extra `codex exec` per PR. **Run separately on Tier 1 quick-check AND Tier 2 5x evaluation logs** — they exercise different code paths, so a clean Tier 1 audit doesn't imply a clean Tier 2. Evidence from PR #206: Tier 1 audit found 3 P1s (Node 24 false-green, "11/10" score leak, E2E incomplete); Tier 2 audit TBD — value is measured by running both and comparing.
+5. If CI fails → diagnose from logs, fix, push again (max 2 attempts)
+6. If CI passes → read ALL review comments: `gh api repos/OWNER/REPO/pulls/PR/comments`
+7. Fix valid suggestions, push, iterate until clean
+8. Only then: explicit merge with `gh pr merge --squash`
 **Why this is non-negotiable:** PR #145 auto-merged a release before review feedback was read. CI reviewer found a P1 dead-code bug that shipped to main. The fix required a follow-up commit. Auto-merge cost more time than the shepherd loop would have taken.
@@ -717,6 +720,40 @@ If this session revealed insights, update the right place:
 - **General project context** → `CLAUDE.md` (or `/revise-claude-md`)
 - **Plan files** → If this session's work came from a plan file, delete it or mark it complete. Stale plans mislead future sessions into thinking work is still pending
+### Memory Audit Protocol
+Per-user memory at `~/.claude/projects/<proj>/memory/` accumulates private learnings. Some belong there (user preferences, external references). Others are portable technical lessons (tool quirks, platform gotchas, bash/GHA/macOS footguns) that would save the next contributor hours. Run this audit to promote the portable ones.
+**When to run:**
+- End-of-release (before cutting a tag)
+- After a debugging-heavy session with multiple memory additions
+- On explicit "audit my memory" request
+**Classify each memory file in `~/.claude/projects/<proj>/memory/`:**
+1. **Rule-based denylist (deterministic, no LLM):**
+   - `type: user` → `keep` (user identity, preferences — never promote)
+   - `type: reference` → `keep` (external pointers to Discord/URL/etc — private by default)
+   - `type: project` → `manual-review` (often mixed state + portable lesson — human decides)
+   - `type: feedback` → `manual-review` (often mixed personal preference + portable rule — human decides)
+   - Parser must normalize YAML variants (`type: "user"`, `type: user # comment`, surrounding whitespace) — see `tests/test-memory-audit-protocol.sh::apply_denylist_rule` for the reference implementation
+2. **Remaining entries** (no type, or type outside the 4 above) fall through to human-gated review. An LLM-assisted classification runner is Prove-It-Gated: build it only after running this protocol 4+ times with manual classification. Until then, human review at promotion time IS the quality gate
+**Destinations for `promote` entries (no new files — use existing wizard destinations):**
+| Content | Target |
+|---------|--------|
+| Language/tool/platform gotchas (bash, gh CLI, GHA, macOS) | `SDLC.md` → `## Lessons Learned` section |
+| Testing gotchas (flaky patterns, mock-vs-integration lessons) | `TESTING.md` |
+| Tool-specific quirks tied to a skill | That skill's `SKILL.md` |
+| Process rules that should govern the project | `CLAUDE.md` |
+**Tracking:** When you promote an entry, add `promoted_to: <path>` to that memory file's YAML frontmatter. Subsequent audits skip already-promoted entries.
+**Human gate is MANDATORY.** Protocol produces diffs; user approves chunk-by-chunk before apply. Never auto-apply — private memory touching public docs needs human judgement.
+**Prove It Gate:** If you find yourself running this protocol 4+ times and manually doing the same classification work, that's evidence to build a `/memory-audit` slash command AND wire the LLM-gated quality tests (8/10 classification, 6/6 destination). Until then, protocol + human review is enough — and no stub tests that skip (they mislead reviewers into thinking a gate exists when it doesn't).
 ## Post-Mortem: When Process Fails, Feed It Back
 **Every process failure becomes an enforcement rule.** When you skip a step and it causes a problem, don't just fix the symptom — add a gate so it can't happen again.

package/skills/setup/SKILL.md CHANGED Viewed

@@ -174,31 +174,61 @@ Skip this step if no branding assets or UI/content patterns detected.
 ### Step 9: Configure Tool Permissions
-Based on detected stack, suggest `allowedTools` entries for `.claude/settings.json`:
+Based on detected stack, suggest entries for `permissions.allow` in `.claude/settings.json`:
 - Package manager commands (npm, pnpm, yarn, cargo, go, pip, etc.)
 - Build/test commands
 - CI tools (gh)
+Write the shape as:
+```json
+{
+  "permissions": {
+    "allow": [
+      "Bash(npm:*)",
+      "Bash(npx:*)",
+      "Bash(git:*)",
+      "Bash(gh:*)"
+    ]
+  }
+}
+```
+**Do NOT write the deprecated top-level `allowedTools` array** (issue #197). Claude Code treats the presence of `allowedTools` in project settings as "user has explicitly scoped tool permissions" and silently disables its auto-mode classifier — same failure family as the model pin in #198. `permissions.allow` is the supported successor and does not trip the auto-mode gate.
 Present suggestions and let the user confirm.
-### Step 9.5: Context Window Configuration
+### Step 9.5: Context Window Configuration (Opt-In)
+The CLI ships `cli/templates/settings.json` with **no** `model` or `env` pin by default. This preserves Claude Code's built-in model auto-selection (Sonnet for cheap tasks, Opus for hard ones) and the upstream autocompact threshold. Power users who want guaranteed 1M context can opt in during setup.
+**Why this is opt-in (issue #198):** A top-level `"model"` in `settings.json` tells Claude Code "the user has explicitly chosen a model" and disables auto-mode for the session. That is a real tradeoff — the pin is only worth it when you actually need the 1M headroom and want to lock to Opus 4.7.
+**Ask the user exactly once in Step 9.5:**
+> Pin the session to `opus[1m]` (Opus 4.7 with 1M context) and set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`?
+>
+> - **No (default):** Leaves auto-mode enabled. Claude Code picks the model per turn, compaction follows upstream defaults. Recommended for most users.
+> - **Yes:** Long SDLC sessions (plan → TDD → review → CI shepherd on one feature) regularly cross 100K tokens; the 1M window gives headroom and 30% autocompact fires at ~300K. Requires Claude Code v2.1.111+ and comfort with losing model auto-selection.
+>
+> `[y/N]`
-The CLI ships `cli/templates/settings.json` with `"model": "opus[1m]"` and `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M context window — compacts at ~300K). This is the SDLC wizard default. Confirm the installed values, or fall back to 200K if the user prefers:
+**If the user answers No (default):** Make no edits to `.claude/settings.json`. Auto-mode stays on. Done.
-- **1M default (`opus[1m]`):** Confirm `"model": "opus[1m]"` at top level and `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "30"` under `env`. Requires Claude Code v2.1.111+ for Opus 4.7.
-- **200K fallback (`opus`):** Edit `.claude/settings.json` — change `"model"` to `"opus"` and raise `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` to `"75"` (otherwise `30%` of 200K compacts too early at 60K).
+**If the user answers Yes:** Edit `.claude/settings.json` and add both fields at the top level:
-To fall back to 200K, edit `.claude/settings.json`:
 ```json
 {
-  "model": "opus",
+  "model": "opus[1m]",
   "env": {
-    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "75"
+    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30"
   }
 }
 ```
-For CI pipelines, consider `"60"` — short tasks benefit from compacting early.
+Mention the escape hatch either way:
+- To opt out later: remove the `model` line (and optionally the `env` block) from `.claude/settings.json`, or run `/model` and pick "Default (recommended)".
+- For CI pipelines with short tasks, consider `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60` — compact early to stay fast.
 This is project-scoped and shared with the team via git.
@@ -224,10 +254,13 @@ Tell the user:
 > **Exit Claude Code and restart it** for the new configuration to take effect.
 > On restart, the SDLC hook will fire and you'll see the checklist in every response.
 >
-> **Optional next step:**
+> **Optional next steps:**
 > - Run `/claude-automation-recommender` for stack-specific tooling suggestions (MCP servers, formatting hooks, type-checking hooks, plugins)
+> - After a few sessions, run `/less-permission-prompts` — a native Claude Code skill
+>   that scans your transcripts for common read-only Bash/MCP calls and proposes a
+>   prioritized allowlist. Reduces permission friction without enabling auto mode.
 >
-> The recommender is complementary to the SDLC wizard — it adds tooling recommendations, not process enforcement.
+> Both are complementary to the SDLC wizard — they add tooling and quality-of-life, not process enforcement.
 ## Rules

package/skills/update/SKILL.md CHANGED Viewed

@@ -46,9 +46,10 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
 ```
 Installed: 1.24.0
-Latest:    1.33.0
+Latest:    1.34.0
 What changed:
+- [1.34.0] API feature detection shepherd for Claude releases, Memory Audit Protocol with 7 verified lessons (+2 caught-and-retracted), /less-permission-prompts surfaced, ...
 - [1.33.0] opus[1m] as SDLC default, dual-channel install drift guardrails, model/effort session-start nudge, ...
 - [1.32.0] Opus 4.7 + xhigh support, model/effort upgrade detection, benchmark ceiling audit, ...
 - [1.31.0] Hook false-positive fix for non-SDLC dirs, ephemeral marketplace path warning, ...
@@ -109,6 +110,54 @@ NEVER overwrite settings.json. Instead:
 The CLI's `init --force` already has smart merge logic for settings.json. If the manual merge gets complicated, suggest: `npx agentic-sdlc-wizard init --force` (it preserves custom hooks).
+### Step 7.5: Model Pin Migration (Issue #198)
+Wizard versions 1.31.0–1.33.x unconditionally wrote `"model": "opus[1m]"` and `"env": { "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30" }` to `.claude/settings.json`. Issue #198 flipped that to opt-in because a top-level `model` disables Claude Code's auto-mode for the session.
+If the user is upgrading from a pre-#198 version, check their `.claude/settings.json`:
+1. **If `model` is `"opus[1m]"` and `env.CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` is `"30"`** — this is likely the old wizard-installed pair, not an intentional user choice. Ask:
+   > Your `.claude/settings.json` pins `model: "opus[1m]"` with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`.
+   > This pair was the SDLC wizard default in 1.31.0–1.33.x, but it disables Claude Code's auto-mode (see issue #198).
+   >
+   > - **Remove the pin** (recommended for most users) — keeps auto-mode enabled, lets Claude Code pick the model per turn.
+   > - **Keep the pin** — you want guaranteed Opus 4.7 + 1M context, and you're OK giving up model auto-selection.
+   >
+   > Remove, keep, or decide later? `[r/k/l]`
+2. **If only one of the two fields matches** (e.g. `model: "opus[1m]"` but custom autocompact, or vice versa) — treat as intentional customization. Do not prompt.
+3. **If `model` is some other value** (e.g. `"sonnet"`, `"opus"`) — treat as user's explicit choice. Do not touch.
+4. **If neither field is set** — user is already on the new default. No action.
+When removing: edit the file in place, drop the `model` key (and the `env.CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` key if nothing else is in `env`, otherwise leave `env` alone). Never touch other keys the user added.
+### Step 7.6: `allowedTools` → `permissions.allow` Migration (Issue #197)
+Wizard versions before #197 guided users to write a top-level `allowedTools` array in `.claude/settings.json`. Claude Code silently disables its auto-mode classifier when that key is present, even with `defaultMode: "auto"` set globally.
+If the user's `.claude/settings.json` has a top-level `allowedTools` array, offer to migrate:
+1. **If only `allowedTools` is present** (no `permissions.allow`) — ask:
+   > Your `.claude/settings.json` has a top-level `allowedTools` array. This silently disables Claude Code auto-mode (see issue #197). The supported successor is `permissions.allow`, which accepts the same patterns but doesn't trip the auto-mode gate.
+   >
+   > - **Migrate** (recommended): move all entries into `permissions.allow`, remove the old `allowedTools`.
+   > - **Keep** — you have a specific reason to use the legacy key.
+   > - **Later** — don't touch it now.
+   >
+   > `[m/k/l]`
+2. **If both `allowedTools` and `permissions.allow` are present** — flag it: the two lists may have diverged. Show both arrays to the user. On migrate, append every entry from `allowedTools` to the end of `permissions.allow` (preserving order within each list), then drop the legacy `allowedTools` key. **Do NOT dedup.** If the same string appears in both lists, it stays in both positions — Claude Code treats duplicate entries as a no-op, but dedup would silently remove user data that the user might have intended. If the user explicitly asks to dedup, do that as a separate follow-up edit.
+3. **If only `permissions.allow` is present** — user is already on the new shape. No action.
+4. **If neither is present** — no action.
+When migrating: preserve every entry byte-for-byte; only the container key changes. Do not reorder, dedup, or expand wildcards. Other top-level keys (hooks, env, model, custom user fields) are never touched.
 ### Step 8: Apply Selected Changes
 For each file the user approved: