npm - agentic-sdlc-wizard - Versions diffs - 1.34.0 → 1.36.0 - Mend

agentic-sdlc-wizard 1.34.0 → 1.36.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15) hide show

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +53 -0
package/CLAUDE_CODE_SDLC_WIZARD.md +95 -44
package/README.md +8 -0
package/cli/init.js +1 -0
package/cli/templates/settings.json +11 -4
package/hooks/hooks.json +11 -0
package/hooks/instructions-loaded-check.sh +81 -4
package/hooks/precompact-seam-check.sh +75 -0
package/hooks/sdlc-prompt-check.sh +63 -0
package/package.json +1 -1
package/skills/sdlc/SKILL.md +10 -7
package/skills/setup/SKILL.md +39 -9
package/skills/update/SKILL.md +51 -1

package/.claude-plugin/marketplace.json CHANGED Viewed

@@ -13,7 +13,7 @@
       "name": "sdlc-wizard",
       "source": ".",
       "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
-      "version": "1.34.0",
+      "version": "1.36.0",
       "author": {
         "name": "Stefan Ayala"
       },

package/.claude-plugin/plugin.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "sdlc-wizard",
-  "version": "1.34.0",
+  "version": "1.36.0",
   "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
   "author": {
     "name": "Stefan Ayala",

package/CHANGELOG.md CHANGED Viewed

@@ -4,6 +4,59 @@ All notable changes to the SDLC Wizard.
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
+## [1.36.0] - 2026-04-23
+### Added
+- **Regression test: every ci.yml `steps.X.outputs.Y` reference must resolve** (#214 / ROADMAP #215). Python+PyYAML test walks ci.yml, builds per-job map of step_id → emitted outputs (handles `NAME=val` and heredoc `NAME<<EOF`), and flags any dead gate. Caught #215's original bug and guards against re-introduction.
+- **Regression test: no `oven-sh/setup-bun` in workflows** (#217 / ROADMAP #210). Defensive guard against Node 20 deprecation reintroduction. Committed negative control writes a tmp fixture with the banned pattern, asserts grep catches it, tears down — proves the regex is live-fire correct.
+- **ROADMAP #218** — evaluate CC 2.1.118 `type: "mcp_tool"` hook capability (Prove-It gated).
+- **ROADMAP #219** — re-verify #198 model-pin guidance against CC 2.1.117 restart-persistence behavior.
+- **ROADMAP #220** — token-spike anomaly detection (from Anthropic 2026-04-23 post-mortem).
+- **ROADMAP #221** — fold post-mortem lessons into wizard docs (explicit effort / extended-thinking+caching+idle gotcha / verbosity-cap audit).
+- **ROADMAP #222** — prompt-compounding audit harness (A/B each of ~40 prompt-injection sites).
+- **ROADMAP #223** — adopt GPT-5.5 in review-tier guidance after calibration (standard $5/$30 is Codex-usable ceiling; Pro $30/$180 is ChatGPT-only, reserve for release-blocker one-offs).
+### Fixed
+- **Tier 2 persist-scores dead gate** (#214 / ROADMAP #215). The Tier 2 "Persist scores to PR branch" step was gated on `steps.check-baseline.outputs.should_simulate`, but the Tier 2 `check-baseline` only emits `has_baseline`. The step had been silently dead, so `score-history.jsonl` never got appended from Tier 2 runs. One-word fix (`should_simulate` → `has_baseline`) plus the new cross-job output-parser regression test.
+- **`score-history.jsonl` `max_score` correctness** (#216 / ROADMAP #211). Both Tier 1 and Tier 2 hardcoded `--argjson max_score 10`, causing UI scenarios (which score out of 11 via design_system bonus) to record `11/10` — nonsensical and breaking downstream analytics. Both sites now read `MAX_SCORE` from the eval result file; shell `case` statement guards non-numeric inputs; regression test grep-asserts no hardcoded literal remains.
+- **CC 2.1.118 `/cost` → `/usage` doc rename** (#209). Claude Code 2.1.118 consolidated `/cost` and `/stats` into `/usage` with aliases preserved. Wizard docs and test-self-update now use `/usage` as canonical (alias note inline).
+### Docs
+- Roadmap entries for all additions above (#218-223), plus minor wording correction to #221(c) post-Codex review (attributes 3% drop to broader length-limit prompt change, not a single sentence).
+### Process
+- Codex cross-model review run on every PR in the v1.36.0 batch (3 code PRs: 10/10 + 10/10 + 8/10; 4 doc/roadmap PRs: 10/10 + 10/10 + 7/10→fixed + 9/10). Shepherd-loop discipline re-enforced after a process miss mid-cycle — logged as feedback memory `feedback_shepherd_loop_per_pr.md`.
+## [1.35.0] - 2026-04-19
+### Added
+- **PreCompact seam gate** (#205 / ROADMAP #208). New `hooks/precompact-seam-check.sh` blocks manual `/compact` when `.reviews/handoff.json` status is `PENDING_REVIEW`/`PENDING_RECHECK` or a git rebase/merge/cherry-pick is in flight. Matcher is `manual` — auto-compact is deliberately NOT gated (blocking it could push past 100% context). Requires Claude Code v2.1.105+. 10 quality tests.
+- **Self-healing PreCompact** (#206 / ROADMAP #209). Hook now treats a stale `PENDING_*` handoff as implicit `CERTIFIED` when optional `pr_number` field is present and `gh pr view <N>` reports `MERGED`. Fixes the "forgot to flip status to CERTIFIED after merge" consumer bug. Graceful fallback: if `pr_number` absent, `gh` missing, offline, or any error → existing block behavior. 4 new quality tests with mocked `gh` binary.
+- **Dynamic effort auto-bump hook** (#202 / ROADMAP #195). `sdlc-prompt-check.sh` scans `UserPromptSubmit` payload for LOW-confidence / FAILED-repeatedly / CONFUSED phrases and logs a timestamped signal. At ≥2 recent signals in a 30-min window, emits a loud `!! EFFORT BUMP REQUIRED !!` block with the exact `/effort xhigh` command. 8 quality tests.
+- **Loud staleness nudge** (#201 / ROADMAP #196). `instructions-loaded-check.sh` now caches npm-latest for 24h and prints a loud multi-line warning when the installed wizard is ≥3 minor versions behind. 1–2 minor keeps the existing mild one-liner.
+- **Session-start CC auto-update nudge** (#192 / ROADMAP #85). Instructions-loaded hook queries for open auto-update PRs and nudges the user to review before compacting.
+- **Hook token-cost caps** (#203). 4 new size-cap tests across every hook. Negative control injects echo bloat to prove the caps trip.
+- **Permissions allowlist** (#204). Top read-only tools pre-approved in `.claude/settings.json` to cut permission prompts during automation.
+- **Codex-audit-on-CI-logs shepherd step** (docs). SDLC skill now requires running `codex exec xhigh` against both Tier 1 and Tier 2 CI logs separately — catches silent failures, degraded metrics, and warnings-promoted-to-errors that the green checkmark hides. Dogfooded on PR #206: caught 4 P1s in pre-existing CI infra (tracked as ROADMAP #210, #211, #215 + regression of #93).
+### Fixed
+- **CI persist-to-PR-branch race** (#196 / ROADMAP #193). Tier 2 no longer aborts the whole run on a single low-score trial; records the trial instead.
+- **Setup wizard `allowedTools` → `permissions.allow`** (#200 / ROADMAP #197). CC v2.1 renamed the field; setup template updated.
+- **Setup wizard `opus[1m]` opt-in** (#199 / ROADMAP #198). Default no longer force-pins; respects explicit user choice.
+- **SessionStart hook model-field absence** (#180 / commit 3b23860). Model isn't exposed in SessionStart payload; hook now detects effort-only.
+### Docs
+- Memory lessons promoted (#194): tier2 exit-code pattern + pipeline liveness.
+- Community paths (#191 / ROADMAP #98): issue + PR templates + Discussions enabled.
+- ROADMAP backlog filed this cycle: #210 (Node 24 false-green), #211 (Tier 1 "11/10" score), #212 (local-Max E2E), #213 (ship degradation env vars by default — blocked on #214), #214 (adaptive-thinking A/B Prove-It), #215 (Tier 2 persist dead code), #216 (repo rename).
 ## [1.34.0] - 2026-04-17
 ### Added

package/CLAUDE_CODE_SDLC_WIZARD.md CHANGED Viewed

@@ -240,11 +240,13 @@ When Anthropic provides official plugins or tools that handle something:
 Claude Code's **effort level** controls how much thinking the model does before responding. Higher effort = deeper reasoning but more tokens.
+> ⚠️ **On Opus 4.7, effort below `xhigh` breaks SDLC compliance in practice.** Unlike 4.6, Opus 4.7 respects effort levels *strictly* — at `high` or below it scopes work tighter (shallow reasoning, skipped TDD, no self-review) rather than going above-and-beyond. Treat the table below accordingly: **`max` is the recommended default, `xhigh` is the floor**, `high` or below is for trivial grep/search subagents only.
 | Level | When to Use | How to Set |
 |-------|-------------|------------|
-| `high` | Standard SDLC work. Features, bug fixes, refactoring, tests, reviews | `effort: high` in skill frontmatter (already set) |
-| `xhigh` | **Recommended default for coding and agentic work (Opus 4.7+).** Long-running tasks, repeated tool calls, deep exploration. Claude Code defaults to this on Opus 4.7 | `/effort xhigh` or set in skill frontmatter |
-| `max` | LOW confidence, FAILED 2x, architecture decisions, complex debugging, cross-model reviews. Reserve for genuinely frontier problems — on most workloads `max` adds cost for small quality gains | `/effort max` (session only — resets next session) |
+| `high` or below | **Not for SDLC work on Opus 4.7.** Only for trivial grep/search subagents or one-shot questions that don't require planning | `effort: high` in a specific subagent frontmatter only |
+| `xhigh` | **Floor for SDLC work on Opus 4.7.** Long-running tasks, repeated tool calls, deep exploration. Claude Code defaults to this on Opus 4.7 | `/effort xhigh` or set in skill frontmatter |
+| `max` | **Recommended default for Opus 4.7 SDLC work.** Multi-file changes, architecture decisions, debugging, cross-model reviews, any task touching wizard/skill/CI code | `/effort max` (session only — resets next session) |
 **Effort level changes in Opus 4.7 (April 2026):**
 - **`xhigh` is new** — sits between `high` and `max`, designed for coding and agentic work (30+ minute tasks with token budgets in the millions)
@@ -255,7 +257,7 @@ Claude Code's **effort level** controls how much thinking the model does before
 **Why `high` was the previous default:** Claude Code uses **adaptive thinking** to dynamically allocate reasoning budget per turn. On Pro and Max plans, the default effort level was **medium (85)**, which causes the model to under-allocate reasoning on complex multi-step tasks — leading to shallow analysis, missed edge cases, and "lazy" outputs. This was [confirmed by Anthropic engineer Boris Cherny](https://github.com/anthropics/claude-code/issues/42796) and is documented at [code.claude.com](https://code.claude.com/docs/en/model-config). API, Team, and Enterprise plans default to high effort and are not affected.
-The `/sdlc` skill sets `effort: high` in its frontmatter, overriding the medium default on every SDLC invocation. Consider upgrading to `effort: xhigh` on Opus 4.7+ for deeper reasoning on complex tasks.
+The `/sdlc` skill sets `effort: high` in its frontmatter as a baseline, overriding the medium default on every SDLC invocation. **On Opus 4.7, run `/effort max` at session start** — the frontmatter is a floor, not a ceiling, and `max` is where SDLC-compliant work actually happens on 4.7.
 **Nuclear option — disable adaptive thinking entirely:** Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` in your environment or settings.json `env` block. This forces a fixed reasoning budget per turn instead of letting the model dynamically allocate. Use this if you observe persistent quality issues even with `effort: high`. See [Claude Code model config docs](https://code.claude.com/docs/en/model-config) for details.
@@ -403,7 +405,7 @@ The `if` field on individual hook handlers filters by tool name AND arguments us
 | `matcher` | Group (all hooks in array) | Tool name only | Regex (`Write\|Edit`) |
 | `if` | Individual handler | Tool name + arguments | Permission rule (`Edit(src/**)`) |
-**Pattern examples:** `Edit(*.ts)`, `Write(src/**)`, `Bash(git *)`. Same syntax as `allowedTools` in settings.json.
+**Pattern examples:** `Edit(*.ts)`, `Write(src/**)`, `Bash(git *)`. Same syntax as `permissions.allow` in settings.json.
 **Only works on tool-use events:** `PreToolUse`, `PostToolUse`, `PostToolUseFailure`. Adding `if` to non-tool events prevents the hook from running.
@@ -846,6 +848,22 @@ Two tools for managing context — use the right one:
 **Best practice:** Put persistent instructions in CLAUDE.md (survives both `/compact` and `/clear`), not in conversation.
+### Compact at Seams, Not Thresholds (PreCompact hook)
+**The threshold is the trigger, not the decision.** 25-30% remaining (~70% used) is the commonly-cited "sweet spot" but ignores *what you're doing* at that moment. Compacting mid-Codex-round loses the round-1 evidence and certify conditions that round-2 needs to re-verify. Compacting mid-rebase strands the operation without the context that was setting it up.
+A **seam** is a point where losing conversational context is safe:
+- Commit boundary (change persisted to git)
+- Codex `CERTIFIED` (review cycle closed)
+- PR merged (work shipped)
+- ROADMAP item marked DONE
+The wizard's `PreCompact` hook (`hooks/precompact-seam-check.sh`) enforces this for **manual** `/compact` only — it reads `.reviews/handoff.json` and blocks with `HOLD` + exit 2 when status is `PENDING_REVIEW` / `PENDING_RECHECK`, and also blocks when a git rebase, merge, or cherry-pick is in progress. Auto-compact is **not** gated — blocking it could push past 100% context and lose everything. Requires Claude Code **v2.1.105+** (PreCompact event introduced 2026-04-13).
+**What's NOT checked:** in-progress TodoWrite tasks. Claude Code does not persist TodoWrite state to a file readable from a hook, so "finish the current todo first" is on you, not the hook. Watch the TodoWrite panel before you `/compact`.
+Override: resolve the blocker (certify the review, finish the rebase), or temporarily disable the hook in `.claude/settings.json`. Don't suppress the warning reflexively — the warning is the point.
 ### Autocompact Tuning
 Override the default auto-compact threshold with environment variables. These are community-discovered settings referenced in upstream issues ([#34332](https://github.com/anthropics/claude-code/issues/34332), [#42375](https://github.com/anthropics/claude-code/issues/42375)) — not yet officially documented by Anthropic. For a rigorous benchmarking methodology to validate these thresholds, see [AUTOCOMPACT_BENCHMARK.md](AUTOCOMPACT_BENCHMARK.md).
@@ -855,7 +873,9 @@ Override the default auto-compact threshold with environment variables. These ar
 | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` | Trigger compaction at this % of context capacity (1-100) | ~95% |
 | `CLAUDE_CODE_AUTO_COMPACT_WINDOW` | Override context capacity in tokens (useful for 1M models) | Model default |
-**Recommended:** The SDLC Wizard CLI sets `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` and `"model": "opus[1m]"` in `.claude/settings.json` by default (tuned for the 1M context window — compacts at ~300K). To customize, edit `.claude/settings.json`:
+**Opt-in (issue #198):** The SDLC Wizard CLI ships `.claude/settings.json` with **no** `model` or `env` pin so Claude Code's auto-mode stays enabled. The setup skill's Step 9.5 asks whether to opt into `"model": "opus[1m]"` + `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M window — compacts at ~300K). Default answer is **No**. Pinning the model at the top level tells Claude Code you've explicitly chosen a model and turns off per-turn model auto-selection — a real tradeoff, so we ask. Power users who want guaranteed Opus 4.7 + 1M context answer yes.
+To opt in by hand, edit `.claude/settings.json`:
 ```json
 {
@@ -876,7 +896,7 @@ export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30
 | Use Case | AUTOCOMPACT % | Why |
 |----------|--------------|-----|
-| **SDLC default (`opus[1m]`)** | **30%** | **Fires at ~300K on 1M — right balance for plan + TDD + review sessions** |
+| **Opt-in SDLC setup (`opus[1m]`)** | **30%** | **Fires at ~300K on 1M — right balance for plan + TDD + review sessions. Paired with the opt-in `opus[1m]` pin (see issue #198)** |
 | General development (200K `opus`) | 75% | Leaves room for implementation after planning |
 | Complex refactors (200K `opus`) | 80% | Slightly more context before compaction |
 | CI pipelines | 60% | Short tasks, compact early to stay fast |
@@ -892,31 +912,33 @@ The thresholds above are community consensus — not empirically validated. For
 ### 1M vs 200K Context Window
-Claude Code supports both 200K and 1M context windows. **Default to `opus[1m]` for SDLC work** — the 1M headroom is free until you actually use it.
+Claude Code supports both 200K and 1M context windows. **`opus[1m]` is an opt-in power-user pin** — ask yourself whether you actually need the headroom before setting it, because pinning the model at the top level disables Claude Code's auto-mode (see issue #198).
-| | 200K Context (`opus`) | 1M Context (`opus[1m]`) **← default** |
+| | 200K Context (default / auto-mode) | 1M Context (`opus[1m]`, opt-in) |
 |---|---|---|
-| **Best for** | Short one-off tasks | Normal SDLC cycles + multi-feature work |
+| **Best for** | Most work — auto-mode picks Sonnet/Opus per turn | Multi-feature / long plan+TDD+review cycles where a single session really crosses 100K+ |
 | **Typical usage** | 50-80K tokens per task | 50-80K typical, up to 200K+ for complex workflows |
 | **Cost** | Standard pricing | Anthropic currently lists the 1M window at standard pricing across the full context for supported Opus/Sonnet models — **verify current rates at [docs.anthropic.com/pricing](https://docs.anthropic.com/)** before assuming no premium |
-| **Auto-compact** | Default 95% works well | Fires at ~76K by default ([issue #34332](https://github.com/anthropics/claude-code/issues/34332)) — **tune to 30%** |
-| **Suggested override** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` or `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` |
+| **Auto-mode** | **Enabled** — Claude Code chooses model per turn | **Disabled** — top-level `model` tells CC you've chosen explicitly |
+| **Auto-compact** | Default ~95% works well | Fires at ~76K by default ([issue #34332](https://github.com/anthropics/claude-code/issues/34332)) — pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` |
+| **Suggested override (if you pin)** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` or `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` |
+**Why `opus[1m]` is opt-in (issue #198):**
+- **Pinning disables auto-mode.** Max-plan users pay for Claude Code's per-turn model selection (Sonnet for cheap tasks, Opus for hard ones, plus weekly-limit smoothing). A top-level `model` gives that up.
+- **The 1M headroom has to earn it.** If your typical session stays under 150K, you're giving up auto-mode for headroom you're not using.
+- **Power users who want guaranteed Opus 4.7 + 1M** — go ahead, it's a real win for long shepherding sessions. Just make it a conscious choice, not a silent default.
-**Why `opus[1m]` as default:**
-- **Long SDLC sessions accumulate context fast** — plan → TDD → review → CI shepherd on a single feature regularly crosses 100K tokens
-- **Safety margin against autocompact loss** — cheaper to have headroom than to re-read files after a forced compact
-- **At time of writing, Anthropic lists 1M context at standard pricing for supported Opus/Sonnet models.** Verify current rates for your plan before relying on this — see [docs.anthropic.com/pricing](https://docs.anthropic.com/)
+**Opt in when:** you routinely cross 100K tokens in a single session (plan → TDD → review → CI shepherd on one feature), you want Opus 4.7 specifically (not Sonnet), and you're OK losing auto-mode.
-**Set it up:** `/model opus[1m]` in your session, or set `"model": "opus[1m]"` in `.claude/settings.json`. The CLI template ships with this default. Requires Claude Code v2.1.111+ for Opus 4.7.
+**Stay on auto-mode (default) when:** you're unsure, your work is mixed short/long, or you want Claude Code to do the model math for you.
-**Fall back to `opus` (200K) when:**
-- Your plan or organization charges higher rates for long-context prompts (check your billing)
-- You're doing genuinely short one-off tasks and want slightly faster responses
-- Your team has cost controls that flag >200K prompts
+**How to opt in:** run `/model opus[1m]` in your session (transient), or set `"model": "opus[1m]"` in `.claude/settings.json` (persistent). Requires Claude Code v2.1.111+ for Opus 4.7. The setup wizard's Step 9.5 also asks once, with default No.
-**Cost awareness:** Larger windows let you consume more tokens in one session, and total cost always scales with tokens consumed regardless of tier. Use `/cost` to monitor — a 900K-token session is meaningfully more expensive than an 80K one even at standard rates.
+**How to opt out:** remove the `model` line from `.claude/settings.json`, or run `/model` and pick "Default (recommended)".
-**Autocompact pairing (important):** If you default to `opus[1m]`, also set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` — otherwise CC's default autocompact fires at ~76K and destroys the headroom you're paying for. The CLI template sets this automatically.
+**Cost awareness:** Larger windows let you consume more tokens in one session, and total cost always scales with tokens consumed regardless of tier. Use `/usage` to monitor (aliases: `/cost`, `/stats`) — a 900K-token session is meaningfully more expensive than an 80K one even at standard rates.
+**Autocompact pairing (important):** If you opt into `opus[1m]`, also set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` — otherwise CC's default autocompact fires at ~76K and destroys the headroom you're paying for. Step 9.5 writes both together when you opt in.
 ---
@@ -1300,7 +1322,7 @@ Feature branches still recommended for solo devs (keeps main clean, easy rollbac
 **CI monitoring detail:**
 > "Should Claude monitor CI checks after pushing and auto-diagnose failures? (y/n)"
-- **Yes** → Enable CI feedback loop in SDLC skill, add `gh` CLI to allowedTools
+- **Yes** → Enable CI feedback loop in SDLC skill, add `gh` CLI to `permissions.allow`
 - **No** → Skip CI monitoring steps (Claude still runs local tests, just doesn't watch CI)
 **CI review feedback question (only if CI monitoring is enabled):**
@@ -1365,7 +1387,7 @@ Claude scans for:
 │   ├── .github/workflows/deploy*.yml     → GitHub Actions deploy
 │   └── package.json scripts (deploy:*)   → npm deploy scripts
 │
-├── Tool permissions (for allowedTools):
+├── Tool permissions (for permissions.allow):
 │   ├── package.json           → Bash(npm *), Bash(node *), Bash(npx *)
 │   ├── pnpm-lock.yaml         → Bash(pnpm *)
 │   ├── yarn.lock              → Bash(yarn *)
@@ -1759,18 +1781,20 @@ Create `.claude/settings.json`:
 ```json
 {
   "verbosity": "medium",
-  "allowedTools": [
-    "Read",
-    "Edit",
-    "Write",
-    "Glob",
-    "Grep",
-    "Task",
-    "Bash(npm *)",
-    "Bash(node *)",
-    "Bash(npx *)",
-    "Bash(gh *)"
-  ],
+  "permissions": {
+    "allow": [
+      "Read",
+      "Edit",
+      "Write",
+      "Glob",
+      "Grep",
+      "Task",
+      "Bash(npm *)",
+      "Bash(node *)",
+      "Bash(npx *)",
+      "Bash(gh *)"
+    ]
+  },
   "hooks": {
     "UserPromptSubmit": [
       {
@@ -1800,7 +1824,7 @@ Create `.claude/settings.json`:
 ### Allowed Tools (Adaptive)
-The `allowedTools` array is auto-generated based on your stack detected in Step 0.4.
+The `permissions.allow` array is auto-generated based on your stack detected in Step 0.4. (Historical note: pre-#197 guidance used a top-level `allowedTools` array — that form silently disables Claude Code auto-mode, so the wizard writes `permissions.allow` now.)
 | If Detected | Tools Added |
 |-------------|-------------|
@@ -1841,6 +1865,9 @@ The `allowedTools` array is auto-generated based on your stack detected in Step
 |------|---------------|---------|
 | `UserPromptSubmit` | Every message you send | Baseline SDLC reminder, skill auto-invoke |
 | `PreToolUse` | Before Claude edits files | TDD check: "Did you write the test first?" Uses `if` field to only fire on source files |
+| `InstructionsLoaded` | On first SDLC.md/CLAUDE.md load | Staleness nudges (wizard version, review-protocol reminders, CC release alerts) |
+| `SessionStart` | On `claude` startup | Detect stale effort setting / model upgrades |
+| `PreCompact` (manual only) | When user runs `/compact` | **Seam gate** — blocks manual compact when `.reviews/handoff.json` is `PENDING_REVIEW`/`PENDING_RECHECK` or a git rebase/merge/cherry-pick is in progress. Auto-compact is NOT gated (blocking it risks pushing past 100% context and losing everything). Requires Claude Code v2.1.105+ |
 ### How Skill Auto-Invoke Works
@@ -2057,9 +2084,11 @@ Before presenting approach, STATE your confidence:
 |-------|---------|--------|--------|
 | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
 | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
-| LOW (<60%) | Not sure | ASK USER before proceeding | Consider `/effort max` |
-| FAILED 2x | Something's wrong | STOP. ASK USER immediately | Try `/effort max` |
-| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | Try `/effort max` |
+| LOW (<60%) | Not sure | ASK USER before proceeding | **Run `/effort xhigh` now** — don't wait |
+| FAILED 2x | Something's wrong | STOP. ASK USER immediately | **Run `/effort max` now** — you're burning cycles at lower effort |
+| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | **Run `/effort max` now** — stop spinning |
+**Dynamic bumping is NOT optional.** "Consider max effort" is the same as "ignore this" in practice. If your confidence drops or tests fail twice, bump effort BEFORE the next attempt — spinning at low effort is an SDLC failure mode.
 ## Self-Review Loop (CRITICAL)
@@ -2626,6 +2655,28 @@ These are your full reference docs. Start with stubs and expand over time:
 **Claude follows this automatically.** After a deploy task, Claude runs through the Post-Deploy Verification table for the target environment. If any check fails, Claude suggests rollback and a new fix cycle.
+## Pipeline Liveness Audits — CI green ≠ data flowing
+A green CI badge only means "no step crashed." It does **not** mean "this pipeline is still producing the output it's supposed to produce." Long-running pipelines — scheduled benchmarks, nightly analytics jobs, weekly report generators, any workflow that appends to a log or dataset — can silently stop producing output while every run still reports success. Regression tests alone do not catch this: the fault lives between "green status" and "observable artifact."
+**Symptom to watch for:** a file, table, or dashboard that's supposed to be updated by a scheduled workflow stops advancing even though the workflow keeps running green.
+**Concrete example from this repo (2026-04-18):** `tests/e2e/score-history.jsonl` hadn't been appended to since 2026-03-30, yet weekly runs kept completing. Two stacked causes:
+1. On the 2026-04-13 run, a `CRITICAL MISS` caused `evaluate.sh` to exit 1 *with* a valid score payload; the tier2 wrapper aborted on the non-zero exit and dropped the trial (fix: PR #193 — disambiguate infra error from legitimate low-score exit via JSON payload, not exit code).
+2. On other runs, a separate PR-branch push race (`refs/pull/N/merge` checkout vs. `refs/heads/<branch>` push) silently dropped the new trial because `continue-on-error: true` was set on the push step.
+The second failure was silent (push step protected by `continue-on-error`); the first was a red CI run but nobody was watching weekly runs closely enough to notice the artifact had stopped advancing. Either way, the artifact's liveness would have caught the stall weeks earlier than a CI-badge-only review.
+**Audit pattern — run when you touch a pipeline, and when asked to check pipeline health:**
+1. **Identify the observable output** — the artifact the pipeline is supposed to produce (file, PR, issue, log row, dashboard row).
+2. **Check its liveness** — what's the last timestamp? If the pipeline runs weekly but the artifact is 3+ cycles stale, that's a stall, not a lull.
+3. **Walk backward from the artifact to CI** — find the step that writes it, read that specific step's logs (not just the top-level status), confirm the write actually happened.
+4. **When `continue-on-error: true` is present upstream of the write step**, treat that step as suspect by default — its failures are masked.
+**When Claude should run this.** Claude runs the liveness audit when it merges or edits a scheduled workflow, when it ships a change that writes to a long-running artifact, and whenever the user asks to check a pipeline's health. It is not a background task — if no one is looking, the audit does not happen.
 ## Rollback
 If deployment fails or post-deploy verification catches issues:
@@ -2662,7 +2713,7 @@ If deployment fails or post-deploy verification catches issues:
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.34.0 -->
+<!-- SDLC Wizard Version: 1.36.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -3312,7 +3363,7 @@ Practical techniques to reduce token consumption without sacrificing quality.
 | Tool | What It Shows | When to Use |
 |------|---------------|-------------|
-| `/cost` | Session total: USD, API time, code changes | After a session to review spend |
+| `/usage` | Session total: USD, API time, code changes (aliases: `/cost`, `/stats`) | After a session to review spend |
 | `/context` | What's consuming context window space | When hitting context limits |
 | Status line | Real-time `cost.total_cost_usd` + token counts | Continuous monitoring |
@@ -3723,7 +3774,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 ```markdown
-<!-- SDLC Wizard Version: 1.34.0 -->
+<!-- SDLC Wizard Version: 1.36.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->

package/README.md CHANGED Viewed

@@ -252,3 +252,11 @@ Share patterns, ask questions, compare notes on AI agents, automation, and SDLC
 ## Contributing
 PRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for evaluation methodology and testing.
+## Feedback
+Three ways to report bugs, request features, or ask questions:
+- **In-session:** run `/feedback` inside any Claude Code session using this wizard — auto-fills context and redacts secrets before filing
+- **Issue templates:** [bug report](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/issues/new?template=bug_report.md), [feature request](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/issues/new?template=feature_request.md), [question](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/issues/new?template=question.md)
+- **Discussions:** open-ended conversations at [github.com/BaseInfinity/agentic-ai-sdlc-wizard/discussions](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/discussions)

package/cli/init.js CHANGED Viewed

@@ -24,6 +24,7 @@ const FILES = [
   { src: 'hooks/tdd-pretool-check.sh', dest: '.claude/hooks/tdd-pretool-check.sh', executable: true, base: REPO_ROOT },
   { src: 'hooks/instructions-loaded-check.sh', dest: '.claude/hooks/instructions-loaded-check.sh', executable: true, base: REPO_ROOT },
   { src: 'hooks/model-effort-check.sh', dest: '.claude/hooks/model-effort-check.sh', executable: true, base: REPO_ROOT },
+  { src: 'hooks/precompact-seam-check.sh', dest: '.claude/hooks/precompact-seam-check.sh', executable: true, base: REPO_ROOT },
   { src: 'skills/sdlc/SKILL.md', dest: '.claude/skills/sdlc/SKILL.md', base: REPO_ROOT },
   { src: 'skills/setup/SKILL.md', dest: '.claude/skills/setup/SKILL.md', base: REPO_ROOT },
   { src: 'skills/update/SKILL.md', dest: '.claude/skills/update/SKILL.md', base: REPO_ROOT },

package/cli/templates/settings.json CHANGED Viewed

@@ -1,8 +1,4 @@
 {
-  "model": "opus[1m]",
-  "env": {
-    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30"
-  },
   "hooks": {
     "UserPromptSubmit": [
       {
@@ -45,6 +41,17 @@
           }
         ]
       }
+    ],
+    "PreCompact": [
+      {
+        "matcher": "manual",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/precompact-seam-check.sh"
+          }
+        ]
+      }
     ]
   }
 }

package/hooks/hooks.json CHANGED Viewed

@@ -42,6 +42,17 @@
           }
         ]
       }
+    ],
+    "PreCompact": [
+      {
+        "matcher": "manual",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/precompact-seam-check.sh"
+          }
+        ]
+      }
     ]
   }
 }

package/hooks/instructions-loaded-check.sh CHANGED Viewed

@@ -34,14 +34,71 @@ if [ -n "$MISSING" ]; then
     echo "Invoke Skill tool, skill=\"setup-wizard\" to generate them."
 fi
-# Version update check (non-blocking, best-effort)
+# Version update check (non-blocking, best-effort).
+# Fetches npm latest at most once per 24h (ROADMAP #196). Prints a stronger
+# multi-line nudge when the gap is ≥3 minor versions — the one-liner gets
+# skipped after weeks of ignoring it (user feedback 2026-04-18).
 SDLC_MD="$PROJECT_DIR/SDLC.md"
+# Strict x.y.z semver — rejects whitespace, "junk", "1.alpha.0", etc.
+SEMVER_RE='^[0-9]+\.[0-9]+\.[0-9]+$'
 if [ -f "$SDLC_MD" ]; then
     INSTALLED_VERSION=$(grep -o 'SDLC Wizard Version: [0-9.]*' "$SDLC_MD" | head -1 | sed 's/SDLC Wizard Version: //')
-    if [ -n "$INSTALLED_VERSION" ] && command -v npm > /dev/null 2>&1; then
-        LATEST_VERSION=$(npm view agentic-sdlc-wizard version 2>/dev/null) || true
+    if [ -n "$INSTALLED_VERSION" ] && [[ "$INSTALLED_VERSION" =~ $SEMVER_RE ]]; then
+        VERSION_CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
+        VERSION_CACHE_FILE="$VERSION_CACHE_DIR/latest-version"
+        LATEST_VERSION=""
+        # Use cache if present, <24h old, and contents are valid semver
+        if [ -f "$VERSION_CACHE_FILE" ]; then
+            if stat -f %m "$VERSION_CACHE_FILE" > /dev/null 2>&1; then
+                CACHE_MTIME=$(stat -f %m "$VERSION_CACHE_FILE")
+            else
+                CACHE_MTIME=$(stat -c %Y "$VERSION_CACHE_FILE" 2>/dev/null || echo 0)
+            fi
+            CACHE_AGE=$(( $(date +%s) - CACHE_MTIME ))
+            if [ "$CACHE_AGE" -lt 86400 ]; then
+                CACHE_CONTENT=$(cat "$VERSION_CACHE_FILE" 2>/dev/null) || CACHE_CONTENT=""
+                if [[ "$CACHE_CONTENT" =~ $SEMVER_RE ]]; then
+                    LATEST_VERSION="$CACHE_CONTENT"
+                fi
+            fi
+        fi
+        # Fetch from npm if cache miss / stale / malformed
+        if [ -z "$LATEST_VERSION" ] && command -v npm > /dev/null 2>&1; then
+            NPM_RESULT=$(npm view agentic-sdlc-wizard version 2>/dev/null) || NPM_RESULT=""
+            if [[ "$NPM_RESULT" =~ $SEMVER_RE ]]; then
+                LATEST_VERSION="$NPM_RESULT"
+                mkdir -p "$VERSION_CACHE_DIR" 2>/dev/null || true
+                printf '%s' "$LATEST_VERSION" > "$VERSION_CACHE_FILE" 2>/dev/null || true
+            fi
+        fi
         if [ -n "$LATEST_VERSION" ] && [ "$LATEST_VERSION" != "$INSTALLED_VERSION" ]; then
-            echo "SDLC Wizard update available: ${INSTALLED_VERSION} → ${LATEST_VERSION} (run /update-wizard)"
+            # Minor-version delta: 1.25.0 vs 1.34.0 → 9
+            INSTALLED_MINOR=$(echo "$INSTALLED_VERSION" | awk -F. '{print $2+0}')
+            LATEST_MINOR=$(echo "$LATEST_VERSION" | awk -F. '{print $2+0}')
+            INSTALLED_MAJOR=$(echo "$INSTALLED_VERSION" | awk -F. '{print $1+0}')
+            LATEST_MAJOR=$(echo "$LATEST_VERSION" | awk -F. '{print $1+0}')
+            MINOR_DELTA=0
+            if [ "$INSTALLED_MAJOR" = "$LATEST_MAJOR" ]; then
+                MINOR_DELTA=$(( LATEST_MINOR - INSTALLED_MINOR ))
+            else
+                # Major bump: treat as a very large delta
+                MINOR_DELTA=99
+            fi
+            if [ "$MINOR_DELTA" -ge 3 ]; then
+                echo ""
+                echo "!! WARNING: SDLC Wizard is ${MINOR_DELTA} minor versions behind !!"
+                echo "   Installed: ${INSTALLED_VERSION}"
+                echo "   Latest:    ${LATEST_VERSION}"
+                echo "   You're missing bug fixes and features shipped across ${MINOR_DELTA} releases."
+                echo "   Strongly recommend running /update-wizard before starting new work."
+                echo ""
+            else
+                echo "SDLC Wizard update available: ${INSTALLED_VERSION} → ${LATEST_VERSION} (run /update-wizard)"
+            fi
         fi
     fi
 fi
@@ -121,6 +178,26 @@ if [ -f "$PROJECT_DIR/.github/workflows/weekly-api-update.yml" ] && \
     fi
 fi
+# Claude Code release review nudge (#85) — surface open 'auto-update' PRs
+# opened by .github/workflows/weekly-update.yml so new CC releases get triaged
+# before they bit-rot (relevance-HIGH PRs can sit for days otherwise).
+#
+# Gated on LOCAL presence of weekly-update.yml: the CLI distributes this hook
+# to consumer projects, which don't own the detector — don't pester them with
+# upstream-wizard PRs.
+if [ -f "$PROJECT_DIR/.github/workflows/weekly-update.yml" ] && \
+   command -v gh > /dev/null 2>&1; then
+    CC_UPDATE_COUNT=$(gh pr list \
+        --state open \
+        --label "auto-update" \
+        --limit 1 \
+        --json number \
+        --jq 'length' 2>/dev/null) || CC_UPDATE_COUNT=""
+    if [[ "$CC_UPDATE_COUNT" =~ ^[0-9]+$ ]] && [ "$CC_UPDATE_COUNT" -gt 0 ]; then
+        echo "Claude Code update pending review: ${CC_UPDATE_COUNT} open auto-update PR(s) (see .github/workflows/weekly-update.yml)"
+    fi
+fi
 # Claude Code version check (non-blocking, best-effort)
 if command -v claude > /dev/null 2>&1 && command -v npm > /dev/null 2>&1; then
     CC_LOCAL=$(claude --version 2>/dev/null | grep -o '[0-9][0-9.]*' | head -1) || true

package/hooks/precompact-seam-check.sh ADDED Viewed

@@ -0,0 +1,75 @@
+#!/bin/bash
+# PreCompact hook - block manual /compact at non-seam boundaries
+# Fires on manual /compact only (auto-compact is threshold-driven; blocking
+# it could push past 100% context and lose everything). Matcher: "manual"
+# in .claude/settings.json.
+#
+# Requires Claude Code v2.1.105+ (PreCompact event introduced April 13, 2026).
+#
+# Seam policy: compacting mid-cycle loses evidence the next round needs.
+# Block when:
+#   (1) .reviews/handoff.json status is PENDING_REVIEW or PENDING_RECHECK
+#       → mid-Codex-round, compact after CERTIFIED
+#   (2) git rebase/merge/cherry-pick in progress
+#       → finish in-progress git operation first
+# Allow otherwise.
+[ ! -t 0 ] && INPUT=$(cat) || INPUT=""
+# Determine project root: prefer $CLAUDE_PROJECT_DIR, fall back to cwd
+ROOT="${CLAUDE_PROJECT_DIR:-$PWD}"
+HOLD_REASONS=""
+# Check 1: Codex review mid-cycle
+# Self-heal: if handoff has a pr_number and gh reports that PR as MERGED,
+# the handoff is a stale artifact from a closed review — treat as implicit
+# CERTIFIED so a forgotten status field doesn't permanently block /compact.
+HANDOFF="$ROOT/.reviews/handoff.json"
+if [ -f "$HANDOFF" ]; then
+    STATUS=$(grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' "$HANDOFF" 2>/dev/null | head -1 | sed 's/.*"\([^"]*\)"$/\1/')
+    case "$STATUS" in
+        PENDING_REVIEW|PENDING_RECHECK)
+            PR_NUMBER=$(grep -o '"pr_number"[[:space:]]*:[[:space:]]*[0-9][0-9]*' "$HANDOFF" 2>/dev/null | head -1 | grep -o '[0-9][0-9]*$')
+            HEALED=0
+            if [ -n "$PR_NUMBER" ] && command -v gh >/dev/null 2>&1; then
+                PR_STATE=$(gh pr view "$PR_NUMBER" --json state --jq .state 2>/dev/null)
+                [ "$PR_STATE" = "MERGED" ] && HEALED=1
+            fi
+            if [ "$HEALED" -ne 1 ]; then
+                HOLD_REASONS="${HOLD_REASONS}  - Codex review is ${STATUS}. Round-1 evidence lives in this context — compacting now loses what round-2 needs to re-verify.
+    Resolve: wait for CERTIFIED (or escalate) before /compact."$'\n'
+            fi
+            ;;
+    esac
+fi
+# Check 2: in-progress git operation
+GITDIR="$ROOT/.git"
+if [ -d "$GITDIR" ]; then
+    if [ -e "$GITDIR/REBASE_HEAD" ] || [ -d "$GITDIR/rebase-merge" ] || [ -d "$GITDIR/rebase-apply" ]; then
+        HOLD_REASONS="${HOLD_REASONS}  - Git rebase in progress. Compacting mid-rebase loses the operation's context.
+    Resolve: finish or abort the rebase before /compact."$'\n'
+    fi
+    if [ -e "$GITDIR/MERGE_HEAD" ]; then
+        HOLD_REASONS="${HOLD_REASONS}  - Git merge in progress. Compacting mid-merge loses the operation's context.
+    Resolve: finish or abort the merge before /compact."$'\n'
+    fi
+    if [ -e "$GITDIR/CHERRY_PICK_HEAD" ]; then
+        HOLD_REASONS="${HOLD_REASONS}  - Git cherry-pick in progress. Compacting mid-cherry-pick loses the operation's context.
+    Resolve: finish or abort the cherry-pick before /compact."$'\n'
+    fi
+fi
+if [ -n "$HOLD_REASONS" ]; then
+    {
+        echo "HOLD: manual /compact at a non-seam. Compacting mid-cycle loses evidence the next round needs."
+        echo ""
+        echo "$HOLD_REASONS"
+        echo "Natural seams: commit boundary, Codex CERTIFIED, PR merge, ROADMAP item DONE."
+        echo "Override: resolve the blocker above, or temporarily disable this hook in .claude/settings.json."
+    } >&2
+    exit 2
+fi
+exit 0

package/hooks/sdlc-prompt-check.sh CHANGED Viewed

@@ -17,6 +17,69 @@ else
     exit 0
 fi
+# Effort auto-bump (ROADMAP #195). Watches this UserPromptSubmit payload for
+# LOW-confidence / FAILED-repeatedly / CONFUSED phrases, logs a timestamped
+# signal, and emits a loud '/effort xhigh' nudge when ≥2 signals land inside
+# a 30-minute window. Enforces the SDLC confidence table mid-session so
+# Claude stops burning budget at 'high' after confidence drops.
+EFFORT_CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
+EFFORT_SIGNALS="$EFFORT_CACHE_DIR/effort-signals.log"
+PROMPT_TEXT=""
+if [ ! -t 0 ] && command -v jq > /dev/null 2>&1; then
+    STDIN_JSON=$(cat)
+    if [ -n "$STDIN_JSON" ]; then
+        PROMPT_TEXT=$(printf '%s' "$STDIN_JSON" | jq -r '.prompt // empty' 2>/dev/null) || PROMPT_TEXT=""
+    fi
+fi
+if [ -n "$PROMPT_TEXT" ]; then
+    LOWER=$(printf '%s' "$PROMPT_TEXT" | tr '[:upper:]' '[:lower:]')
+    SIGNAL_REASON=""
+    # Every trigger requires first-person ownership or a structured-label
+    # form, so educational/quoted mentions ("How do I name a low confidence
+    # badge?", "What does 'failed again' mean?") don't fire.
+    case "$LOWER" in
+        *"i'm stuck"*|*"i am stuck"*|*"im stuck"*|\
+        *"i'm confused"*|*"i am confused"*|*"im confused"*|\
+        *"i tried twice"*|*"i've tried twice"*|*"ive tried twice"*|\
+        *"i can't figure"*|*"i cant figure"*|\
+        *"i'm not sure why"*|*"i am not sure why"*|*"im not sure why"*|\
+        *"my confidence is low"*|*"my confidence: low"*|*"confidence: low"*|\
+        *"it's still failing"*|*"its still failing"*|\
+        *"it keeps failing"*|*"this keeps failing"*|\
+        *"it failed again"*|*"this failed again"*|\
+        *"failed twice"*|*"failed 2x"*)
+            SIGNAL_REASON="low"
+            ;;
+    esac
+    if [ -n "$SIGNAL_REASON" ]; then
+        # Group the write so redirection errors (e.g., unwritable HOME,
+        # cache-dir-is-a-file) land on /dev/null instead of leaking to stderr.
+        {
+            if mkdir -p "$EFFORT_CACHE_DIR" && [ -d "$EFFORT_CACHE_DIR" ]; then
+                # Prune entries older than 1h on every write to cap log size.
+                if [ -f "$EFFORT_SIGNALS" ]; then
+                    PRUNE_THRESH=$(( $(date +%s) - 3600 ))
+                    awk -v t="$PRUNE_THRESH" '$1+0 >= t' "$EFFORT_SIGNALS" > "${EFFORT_SIGNALS}.tmp" \
+                        && mv "${EFFORT_SIGNALS}.tmp" "$EFFORT_SIGNALS"
+                fi
+                printf '%s\t%s\n' "$(date +%s)" "$SIGNAL_REASON" >> "$EFFORT_SIGNALS"
+            fi
+        } 2>/dev/null || true
+    fi
+fi
+if [ -f "$EFFORT_SIGNALS" ]; then
+    NOW=$(date +%s)
+    THRESH=$(( NOW - 1800 ))
+    RECENT=$(awk -v t="$THRESH" '$1+0 >= t' "$EFFORT_SIGNALS" 2>/dev/null | wc -l | tr -d ' ')
+    if [ "${RECENT:-0}" -ge 2 ]; then
+        echo ""
+        echo "!! EFFORT BUMP REQUIRED: ${RECENT} low-confidence signals in last 30 min !!"
+        echo "   Run /effort xhigh NOW — spinning at 'high' after confidence drops wastes budget."
+        echo "   (Auto-enforcement of the SDLC confidence table. ROADMAP #195.)"
+        echo ""
+    fi
+fi
 if [ ! -s "$PROJECT_DIR/SDLC.md" ] || [ ! -s "$PROJECT_DIR/TESTING.md" ]; then
     cat << 'SETUP'
 SETUP NOT COMPLETE: SDLC.md and/or TESTING.md are missing.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "agentic-sdlc-wizard",
-  "version": "1.34.0",
+  "version": "1.36.0",
   "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
   "bin": {
     "sdlc-wizard": "./cli/bin/sdlc-wizard.js"

package/skills/sdlc/SKILL.md CHANGED Viewed

@@ -184,9 +184,11 @@ Before presenting approach, STATE your confidence:
 |-------|---------|--------|--------|
 | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
 | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
-| LOW (<60%) | Not sure | Do more research or try cross-model research (Codex) to get to 95%. If still LOW after research, ASK USER | Consider `/effort max` |
-| FAILED 2x | Something's wrong | Try cross-model research (Codex) for a fresh perspective. If still stuck, STOP and ASK USER | Try `/effort max` |
-| CONFUSED | Can't diagnose why something is failing | Try cross-model research (Codex). If still confused, STOP. Describe what you tried, ask for help | Try `/effort max` |
+| LOW (<60%) | Not sure | Do more research or try cross-model research (Codex) to get to 95%. If still LOW after research, ASK USER | **Run `/effort xhigh` now** — don't wait |
+| FAILED 2x | Something's wrong | Try cross-model research (Codex) for a fresh perspective. If still stuck, STOP and ASK USER | **Run `/effort max` now** — you're already burning cycles at lower effort |
+| CONFUSED | Can't diagnose why something is failing | Try cross-model research (Codex). If still confused, STOP. Describe what you tried, ask for help | **Run `/effort max` now** — stop spinning |
+**Dynamic bumping is NOT optional.** "Consider max effort" is the same as "ignore this" in practice. If your confidence drops or tests fail twice, bump effort BEFORE the next attempt — not after a third failure. Spinning at low effort is an SDLC failure mode, not a style choice.
 ## Self-Review Loop (CRITICAL)
@@ -536,10 +538,11 @@ Local tests pass -> Commit -> Push -> Watch CI
 1. Push changes to remote
 2. Watch CI: `gh pr checks --watch`
 3. Read CI logs — **pass or fail**: `gh run view <RUN_ID> --log` (not just `--log-failed`). Passing CI can still hide warnings, skipped steps, or degraded scores. Don't just check the green checkmark
-4. If CI fails → diagnose from logs, fix, push again (max 2 attempts)
-5. If CI passes → read ALL review comments: `gh api repos/OWNER/REPO/pulls/PR/comments`
-6. Fix valid suggestions, push, iterate until clean
-7. Only then: explicit merge with `gh pr merge --squash`
+4. **Cross-model review the CI logs themselves** — pipe `gh run view <RUN_ID> --log` to a tmp file and run `codex exec -c 'model_reasoning_effort="xhigh"' -s danger-full-access` with a prompt like *"Audit this CI log for silent failures, skipped tests, degraded metrics, or warnings-that-should-be-errors. Green checkmark is necessary but not sufficient."* A second model catches things the first missed (e.g., a job that passed but degraded an E2E score by 30%, or a test that was silently excluded). Cheap — one extra `codex exec` per PR. **Run separately on Tier 1 quick-check AND Tier 2 5x evaluation logs** — they exercise different code paths, so a clean Tier 1 audit doesn't imply a clean Tier 2. Evidence from PR #206: Tier 1 audit found 3 P1s (Node 24 false-green, "11/10" score leak, E2E incomplete); Tier 2 audit TBD — value is measured by running both and comparing.
+5. If CI fails → diagnose from logs, fix, push again (max 2 attempts)
+6. If CI passes → read ALL review comments: `gh api repos/OWNER/REPO/pulls/PR/comments`
+7. Fix valid suggestions, push, iterate until clean
+8. Only then: explicit merge with `gh pr merge --squash`
 **Why this is non-negotiable:** PR #145 auto-merged a release before review feedback was read. CI reviewer found a P1 dead-code bug that shipped to main. The fix required a follow-up commit. Auto-merge cost more time than the shepherd loop would have taken.

package/skills/setup/SKILL.md CHANGED Viewed

@@ -174,31 +174,61 @@ Skip this step if no branding assets or UI/content patterns detected.
 ### Step 9: Configure Tool Permissions
-Based on detected stack, suggest `allowedTools` entries for `.claude/settings.json`:
+Based on detected stack, suggest entries for `permissions.allow` in `.claude/settings.json`:
 - Package manager commands (npm, pnpm, yarn, cargo, go, pip, etc.)
 - Build/test commands
 - CI tools (gh)
+Write the shape as:
+```json
+{
+  "permissions": {
+    "allow": [
+      "Bash(npm:*)",
+      "Bash(npx:*)",
+      "Bash(git:*)",
+      "Bash(gh:*)"
+    ]
+  }
+}
+```
+**Do NOT write the deprecated top-level `allowedTools` array** (issue #197). Claude Code treats the presence of `allowedTools` in project settings as "user has explicitly scoped tool permissions" and silently disables its auto-mode classifier — same failure family as the model pin in #198. `permissions.allow` is the supported successor and does not trip the auto-mode gate.
 Present suggestions and let the user confirm.
-### Step 9.5: Context Window Configuration
+### Step 9.5: Context Window Configuration (Opt-In)
+The CLI ships `cli/templates/settings.json` with **no** `model` or `env` pin by default. This preserves Claude Code's built-in model auto-selection (Sonnet for cheap tasks, Opus for hard ones) and the upstream autocompact threshold. Power users who want guaranteed 1M context can opt in during setup.
+**Why this is opt-in (issue #198):** A top-level `"model"` in `settings.json` tells Claude Code "the user has explicitly chosen a model" and disables auto-mode for the session. That is a real tradeoff — the pin is only worth it when you actually need the 1M headroom and want to lock to Opus 4.7.
+**Ask the user exactly once in Step 9.5:**
+> Pin the session to `opus[1m]` (Opus 4.7 with 1M context) and set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`?
+>
+> - **No (default):** Leaves auto-mode enabled. Claude Code picks the model per turn, compaction follows upstream defaults. Recommended for most users.
+> - **Yes:** Long SDLC sessions (plan → TDD → review → CI shepherd on one feature) regularly cross 100K tokens; the 1M window gives headroom and 30% autocompact fires at ~300K. Requires Claude Code v2.1.111+ and comfort with losing model auto-selection.
+>
+> `[y/N]`
-The CLI ships `cli/templates/settings.json` with `"model": "opus[1m]"` and `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M context window — compacts at ~300K). This is the SDLC wizard default. Confirm the installed values, or fall back to 200K if the user prefers:
+**If the user answers No (default):** Make no edits to `.claude/settings.json`. Auto-mode stays on. Done.
-- **1M default (`opus[1m]`):** Confirm `"model": "opus[1m]"` at top level and `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "30"` under `env`. Requires Claude Code v2.1.111+ for Opus 4.7.
-- **200K fallback (`opus`):** Edit `.claude/settings.json` — change `"model"` to `"opus"` and raise `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` to `"75"` (otherwise `30%` of 200K compacts too early at 60K).
+**If the user answers Yes:** Edit `.claude/settings.json` and add both fields at the top level:
-To fall back to 200K, edit `.claude/settings.json`:
 ```json
 {
-  "model": "opus",
+  "model": "opus[1m]",
   "env": {
-    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "75"
+    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30"
   }
 }
 ```
-For CI pipelines, consider `"60"` — short tasks benefit from compacting early.
+Mention the escape hatch either way:
+- To opt out later: remove the `model` line (and optionally the `env` block) from `.claude/settings.json`, or run `/model` and pick "Default (recommended)".
+- For CI pipelines with short tasks, consider `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60` — compact early to stay fast.
 This is project-scoped and shared with the team via git.

package/skills/update/SKILL.md CHANGED Viewed

@@ -46,9 +46,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
 ```
 Installed: 1.24.0
-Latest:    1.34.0
+Latest:    1.36.0
 What changed:
+- [1.36.0] CC 2.1.118 `/usage` canonical + aliases, Tier 2 dead-gate fix (#215), score-history max_score correctness (#211), setup-bun regression guard (#210), post-mortem learnings (#220-222), GPT-5.5 adoption plan (#223), MCP-tool hooks + #198 re-verify in backlog (#218/#219)
+- [1.35.0] PreCompact seam gate + self-heal (#208/#209), effort auto-bump on LOW/FAILED/CONFUSED (#195), wizard staleness nudge (#196), Codex CI-log audit pattern, ...
 - [1.34.0] API feature detection shepherd for Claude releases, Memory Audit Protocol with 7 verified lessons (+2 caught-and-retracted), /less-permission-prompts surfaced, ...
 - [1.33.0] opus[1m] as SDLC default, dual-channel install drift guardrails, model/effort session-start nudge, ...
 - [1.32.0] Opus 4.7 + xhigh support, model/effort upgrade detection, benchmark ceiling audit, ...
@@ -110,6 +112,54 @@ NEVER overwrite settings.json. Instead:
 The CLI's `init --force` already has smart merge logic for settings.json. If the manual merge gets complicated, suggest: `npx agentic-sdlc-wizard init --force` (it preserves custom hooks).
+### Step 7.5: Model Pin Migration (Issue #198)
+Wizard versions 1.31.0–1.33.x unconditionally wrote `"model": "opus[1m]"` and `"env": { "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30" }` to `.claude/settings.json`. Issue #198 flipped that to opt-in because a top-level `model` disables Claude Code's auto-mode for the session.
+If the user is upgrading from a pre-#198 version, check their `.claude/settings.json`:
+1. **If `model` is `"opus[1m]"` and `env.CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` is `"30"`** — this is likely the old wizard-installed pair, not an intentional user choice. Ask:
+   > Your `.claude/settings.json` pins `model: "opus[1m]"` with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`.
+   > This pair was the SDLC wizard default in 1.31.0–1.33.x, but it disables Claude Code's auto-mode (see issue #198).
+   >
+   > - **Remove the pin** (recommended for most users) — keeps auto-mode enabled, lets Claude Code pick the model per turn.
+   > - **Keep the pin** — you want guaranteed Opus 4.7 + 1M context, and you're OK giving up model auto-selection.
+   >
+   > Remove, keep, or decide later? `[r/k/l]`
+2. **If only one of the two fields matches** (e.g. `model: "opus[1m]"` but custom autocompact, or vice versa) — treat as intentional customization. Do not prompt.
+3. **If `model` is some other value** (e.g. `"sonnet"`, `"opus"`) — treat as user's explicit choice. Do not touch.
+4. **If neither field is set** — user is already on the new default. No action.
+When removing: edit the file in place, drop the `model` key (and the `env.CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` key if nothing else is in `env`, otherwise leave `env` alone). Never touch other keys the user added.
+### Step 7.6: `allowedTools` → `permissions.allow` Migration (Issue #197)
+Wizard versions before #197 guided users to write a top-level `allowedTools` array in `.claude/settings.json`. Claude Code silently disables its auto-mode classifier when that key is present, even with `defaultMode: "auto"` set globally.
+If the user's `.claude/settings.json` has a top-level `allowedTools` array, offer to migrate:
+1. **If only `allowedTools` is present** (no `permissions.allow`) — ask:
+   > Your `.claude/settings.json` has a top-level `allowedTools` array. This silently disables Claude Code auto-mode (see issue #197). The supported successor is `permissions.allow`, which accepts the same patterns but doesn't trip the auto-mode gate.
+   >
+   > - **Migrate** (recommended): move all entries into `permissions.allow`, remove the old `allowedTools`.
+   > - **Keep** — you have a specific reason to use the legacy key.
+   > - **Later** — don't touch it now.
+   >
+   > `[m/k/l]`
+2. **If both `allowedTools` and `permissions.allow` are present** — flag it: the two lists may have diverged. Show both arrays to the user. On migrate, append every entry from `allowedTools` to the end of `permissions.allow` (preserving order within each list), then drop the legacy `allowedTools` key. **Do NOT dedup.** If the same string appears in both lists, it stays in both positions — Claude Code treats duplicate entries as a no-op, but dedup would silently remove user data that the user might have intended. If the user explicitly asks to dedup, do that as a separate follow-up edit.
+3. **If only `permissions.allow` is present** — user is already on the new shape. No action.
+4. **If neither is present** — no action.
+When migrating: preserve every entry byte-for-byte; only the container key changes. Do not reorder, dedup, or expand wildcards. Other top-level keys (hooks, env, model, custom user fields) are never touched.
 ### Step 8: Apply Selected Changes
 For each file the user approved: