agentic-sdlc-wizard 1.33.0 → 1.35.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -13,7 +13,7 @@
13
13
  "name": "sdlc-wizard",
14
14
  "source": ".",
15
15
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
16
- "version": "1.33.0",
16
+ "version": "1.35.0",
17
17
  "author": {
18
18
  "name": "Stefan Ayala"
19
19
  },
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "sdlc-wizard",
3
- "version": "1.33.0",
3
+ "version": "1.35.0",
4
4
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
5
5
  "author": {
6
6
  "name": "Stefan Ayala",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,58 @@ All notable changes to the SDLC Wizard.
4
4
 
5
5
  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
6
6
 
7
+ ## [1.35.0] - 2026-04-19
8
+
9
+ ### Added
10
+
11
+ - **PreCompact seam gate** (#205 / ROADMAP #208). New `hooks/precompact-seam-check.sh` blocks manual `/compact` when `.reviews/handoff.json` status is `PENDING_REVIEW`/`PENDING_RECHECK` or a git rebase/merge/cherry-pick is in flight. Matcher is `manual` — auto-compact is deliberately NOT gated (blocking it could push past 100% context). Requires Claude Code v2.1.105+. 10 quality tests.
12
+ - **Self-healing PreCompact** (#206 / ROADMAP #209). Hook now treats a stale `PENDING_*` handoff as implicit `CERTIFIED` when optional `pr_number` field is present and `gh pr view <N>` reports `MERGED`. Fixes the "forgot to flip status to CERTIFIED after merge" consumer bug. Graceful fallback: if `pr_number` absent, `gh` missing, offline, or any error → existing block behavior. 4 new quality tests with mocked `gh` binary.
13
+ - **Dynamic effort auto-bump hook** (#202 / ROADMAP #195). `sdlc-prompt-check.sh` scans `UserPromptSubmit` payload for LOW-confidence / FAILED-repeatedly / CONFUSED phrases and logs a timestamped signal. At ≥2 recent signals in a 30-min window, emits a loud `!! EFFORT BUMP REQUIRED !!` block with the exact `/effort xhigh` command. 8 quality tests.
14
+ - **Loud staleness nudge** (#201 / ROADMAP #196). `instructions-loaded-check.sh` now caches npm-latest for 24h and prints a loud multi-line warning when the installed wizard is ≥3 minor versions behind. 1–2 minor keeps the existing mild one-liner.
15
+ - **Session-start CC auto-update nudge** (#192 / ROADMAP #85). Instructions-loaded hook queries for open auto-update PRs and nudges the user to review before compacting.
16
+ - **Hook token-cost caps** (#203). 4 new size-cap tests across every hook. Negative control injects echo bloat to prove the caps trip.
17
+ - **Permissions allowlist** (#204). Top read-only tools pre-approved in `.claude/settings.json` to cut permission prompts during automation.
18
+ - **Codex-audit-on-CI-logs shepherd step** (docs). SDLC skill now requires running `codex exec xhigh` against both Tier 1 and Tier 2 CI logs separately — catches silent failures, degraded metrics, and warnings-promoted-to-errors that the green checkmark hides. Dogfooded on PR #206: caught 4 P1s in pre-existing CI infra (tracked as ROADMAP #210, #211, #215 + regression of #93).
19
+
20
+ ### Fixed
21
+
22
+ - **CI persist-to-PR-branch race** (#196 / ROADMAP #193). Tier 2 no longer aborts the whole run on a single low-score trial; records the trial instead.
23
+ - **Setup wizard `allowedTools` → `permissions.allow`** (#200 / ROADMAP #197). CC v2.1 renamed the field; setup template updated.
24
+ - **Setup wizard `opus[1m]` opt-in** (#199 / ROADMAP #198). Default no longer force-pins; respects explicit user choice.
25
+ - **SessionStart hook model-field absence** (#180 / commit 3b23860). Model isn't exposed in SessionStart payload; hook now detects effort-only.
26
+
27
+ ### Docs
28
+
29
+ - Memory lessons promoted (#194): tier2 exit-code pattern + pipeline liveness.
30
+ - Community paths (#191 / ROADMAP #98): issue + PR templates + Discussions enabled.
31
+ - ROADMAP backlog filed this cycle: #210 (Node 24 false-green), #211 (Tier 1 "11/10" score), #212 (local-Max E2E), #213 (ship degradation env vars by default — blocked on #214), #214 (adaptive-thinking A/B Prove-It), #215 (Tier 2 persist dead code), #216 (repo rename).
32
+
33
+ ## [1.34.0] - 2026-04-17
34
+
35
+ ### Added
36
+ - Memory Audit Protocol for promoting private-memory lessons to shared docs (#189)
37
+ - New `/sdlc` subsection under "After Session (Capture Learnings)" defines a three-bucket classifier (`promote` / `keep` / `manual-review`) with a rule-based privacy denylist (`type: user`/`reference` → keep, `project`/`feedback` → manual-review)
38
+ - YAML frontmatter parser in `tests/test-memory-audit-protocol.sh` normalizes inline comments, quoted values, and whitespace so variants like `type: "user" # external` still route to keep
39
+ - `SDLC.md` now has a `## Lessons Learned` section seeded with 7 verified technical gotchas (GH CLI stdout, `workflows` YAML scope, GITHUB_TOKEN workflow triggers, GHA `${{ }}` backtick substitution, macOS bash 3.x, stderr/stdout separation for JSON parsing, `continue-on-error` + `||` masking); each entry cites its originating PR or incident date and was re-verified with a runnable repro before promotion
40
+ - 10-fixture corpus at `tests/fixtures/memory-audit-corpus/` (6 promote / 2 keep / 2 manual-review) with `test_expected` frontmatter seeds the future LLM-gated quality runner
41
+ - 12-test protocol suite covers structure, rule-based denylist, YAML-variant hardening, corpus consistency (promote fixtures route to manual-review under rule-based), and corpus shape
42
+ - Codex xhigh 3-round code review: 4/10 → 8/10 → 10/10 CERTIFIED. Caught two false lessons in private memory (`${3:-{}}` brace-default claim and `--argjson result` jq-conflict claim) that were retracted with dated strikethroughs — the protocol's first real use prevented its own false claims from shipping
43
+ - CLI distributes skill updates + new SDLC.md section; CI wire-up in `.github/workflows/ci.yml` (validate job)
44
+ - API feature detection shepherd for Claude API release notes (#100, PRs #184, #186, #187)
45
+ - LLM-free weekly detector at `.github/workflows/weekly-api-update.yml` polls `platform.claude.com/docs/en/release-notes/api.md`
46
+ - `scripts/parse-api-changelog.py` parses ATX date headers with ordinal-date normalizer and bullet-summary capture (non-date sub-headers like `#### SDKs` no longer terminate bullet extraction); 200-char truncation with ellipsis; tab scrub
47
+ - `scripts/persist-api-state.sh` writes last-seen date with branch-protection-safe non-blocking push; opens/updates a single `api-review-needed` tracking issue with enriched bullet summaries (not just dates)
48
+ - `instructions-loaded-check.sh` nudges at session start when open issues exist; gated on local workflow presence so consumer forks see only their own detector's issues
49
+ - 33 tests including 8 fixture-based parser tests (bullet capture, subheader boundary, tab scrub, truncation, ordinal dates) and 2 integration tests
50
+ - Codex xhigh 5 rounds across 2 PRs: 9/10 CERTIFIED. Found-in-prod P0 hotfix in #187 — `gh api` writes JSON error bodies to stdout (not stderr), so the label-create `already_exists` check was broken after the first successful dispatch; pattern now captures both streams
51
+
52
+ ### Fixed
53
+ - `gh api` error handling in `weekly-api-update.yml` now captures stdout+stderr together for `already_exists` detection on label creation (#187). Added as portable lesson in `SDLC.md` Lessons Learned
54
+
55
+ ### Docs
56
+ - `/less-permission-prompts` Claude Code native skill surfaced in wizard and setup documentation (#183)
57
+ - README community section restyled with visual Discord badge for Automation Station
58
+
7
59
  ## [1.33.0] - 2026-04-17
8
60
 
9
61
  ### Added
@@ -403,7 +403,7 @@ The `if` field on individual hook handlers filters by tool name AND arguments us
403
403
  | `matcher` | Group (all hooks in array) | Tool name only | Regex (`Write\|Edit`) |
404
404
  | `if` | Individual handler | Tool name + arguments | Permission rule (`Edit(src/**)`) |
405
405
 
406
- **Pattern examples:** `Edit(*.ts)`, `Write(src/**)`, `Bash(git *)`. Same syntax as `allowedTools` in settings.json.
406
+ **Pattern examples:** `Edit(*.ts)`, `Write(src/**)`, `Bash(git *)`. Same syntax as `permissions.allow` in settings.json.
407
407
 
408
408
  **Only works on tool-use events:** `PreToolUse`, `PostToolUse`, `PostToolUseFailure`. Adding `if` to non-tool events prevents the hook from running.
409
409
 
@@ -846,6 +846,22 @@ Two tools for managing context — use the right one:
846
846
 
847
847
  **Best practice:** Put persistent instructions in CLAUDE.md (survives both `/compact` and `/clear`), not in conversation.
848
848
 
849
+ ### Compact at Seams, Not Thresholds (PreCompact hook)
850
+
851
+ **The threshold is the trigger, not the decision.** 25-30% remaining (~70% used) is the commonly-cited "sweet spot" but ignores *what you're doing* at that moment. Compacting mid-Codex-round loses the round-1 evidence and certify conditions that round-2 needs to re-verify. Compacting mid-rebase strands the operation without the context that was setting it up.
852
+
853
+ A **seam** is a point where losing conversational context is safe:
854
+ - Commit boundary (change persisted to git)
855
+ - Codex `CERTIFIED` (review cycle closed)
856
+ - PR merged (work shipped)
857
+ - ROADMAP item marked DONE
858
+
859
+ The wizard's `PreCompact` hook (`hooks/precompact-seam-check.sh`) enforces this for **manual** `/compact` only — it reads `.reviews/handoff.json` and blocks with `HOLD` + exit 2 when status is `PENDING_REVIEW` / `PENDING_RECHECK`, and also blocks when a git rebase, merge, or cherry-pick is in progress. Auto-compact is **not** gated — blocking it could push past 100% context and lose everything. Requires Claude Code **v2.1.105+** (PreCompact event introduced 2026-04-13).
860
+
861
+ **What's NOT checked:** in-progress TodoWrite tasks. Claude Code does not persist TodoWrite state to a file readable from a hook, so "finish the current todo first" is on you, not the hook. Watch the TodoWrite panel before you `/compact`.
862
+
863
+ Override: resolve the blocker (certify the review, finish the rebase), or temporarily disable the hook in `.claude/settings.json`. Don't suppress the warning reflexively — the warning is the point.
864
+
849
865
  ### Autocompact Tuning
850
866
 
851
867
  Override the default auto-compact threshold with environment variables. These are community-discovered settings referenced in upstream issues ([#34332](https://github.com/anthropics/claude-code/issues/34332), [#42375](https://github.com/anthropics/claude-code/issues/42375)) — not yet officially documented by Anthropic. For a rigorous benchmarking methodology to validate these thresholds, see [AUTOCOMPACT_BENCHMARK.md](AUTOCOMPACT_BENCHMARK.md).
@@ -855,7 +871,9 @@ Override the default auto-compact threshold with environment variables. These ar
855
871
  | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` | Trigger compaction at this % of context capacity (1-100) | ~95% |
856
872
  | `CLAUDE_CODE_AUTO_COMPACT_WINDOW` | Override context capacity in tokens (useful for 1M models) | Model default |
857
873
 
858
- **Recommended:** The SDLC Wizard CLI sets `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` and `"model": "opus[1m]"` in `.claude/settings.json` by default (tuned for the 1M context window — compacts at ~300K). To customize, edit `.claude/settings.json`:
874
+ **Opt-in (issue #198):** The SDLC Wizard CLI ships `.claude/settings.json` with **no** `model` or `env` pin so Claude Code's auto-mode stays enabled. The setup skill's Step 9.5 asks whether to opt into `"model": "opus[1m]"` + `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M window — compacts at ~300K). Default answer is **No**. Pinning the model at the top level tells Claude Code you've explicitly chosen a model and turns off per-turn model auto-selection — a real tradeoff, so we ask. Power users who want guaranteed Opus 4.7 + 1M context answer yes.
875
+
876
+ To opt in by hand, edit `.claude/settings.json`:
859
877
 
860
878
  ```json
861
879
  {
@@ -876,7 +894,7 @@ export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30
876
894
 
877
895
  | Use Case | AUTOCOMPACT % | Why |
878
896
  |----------|--------------|-----|
879
- | **SDLC default (`opus[1m]`)** | **30%** | **Fires at ~300K on 1M — right balance for plan + TDD + review sessions** |
897
+ | **Opt-in SDLC setup (`opus[1m]`)** | **30%** | **Fires at ~300K on 1M — right balance for plan + TDD + review sessions. Paired with the opt-in `opus[1m]` pin (see issue #198)** |
880
898
  | General development (200K `opus`) | 75% | Leaves room for implementation after planning |
881
899
  | Complex refactors (200K `opus`) | 80% | Slightly more context before compaction |
882
900
  | CI pipelines | 60% | Short tasks, compact early to stay fast |
@@ -892,31 +910,33 @@ The thresholds above are community consensus — not empirically validated. For
892
910
 
893
911
  ### 1M vs 200K Context Window
894
912
 
895
- Claude Code supports both 200K and 1M context windows. **Default to `opus[1m]` for SDLC work** — the 1M headroom is free until you actually use it.
913
+ Claude Code supports both 200K and 1M context windows. **`opus[1m]` is an opt-in power-user pin** — ask yourself whether you actually need the headroom before setting it, because pinning the model at the top level disables Claude Code's auto-mode (see issue #198).
896
914
 
897
- | | 200K Context (`opus`) | 1M Context (`opus[1m]`) **← default** |
915
+ | | 200K Context (default / auto-mode) | 1M Context (`opus[1m]`, opt-in) |
898
916
  |---|---|---|
899
- | **Best for** | Short one-off tasks | Normal SDLC cycles + multi-feature work |
917
+ | **Best for** | Most work — auto-mode picks Sonnet/Opus per turn | Multi-feature / long plan+TDD+review cycles where a single session really crosses 100K+ |
900
918
  | **Typical usage** | 50-80K tokens per task | 50-80K typical, up to 200K+ for complex workflows |
901
919
  | **Cost** | Standard pricing | Anthropic currently lists the 1M window at standard pricing across the full context for supported Opus/Sonnet models — **verify current rates at [docs.anthropic.com/pricing](https://docs.anthropic.com/)** before assuming no premium |
902
- | **Auto-compact** | Default 95% works well | Fires at ~76K by default ([issue #34332](https://github.com/anthropics/claude-code/issues/34332)) **tune to 30%** |
903
- | **Suggested override** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` or `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` |
920
+ | **Auto-mode** | **Enabled** Claude Code chooses model per turn | **Disabled** top-level `model` tells CC you've chosen explicitly |
921
+ | **Auto-compact** | Default ~95% works well | Fires at ~76K by default ([issue #34332](https://github.com/anthropics/claude-code/issues/34332)) — pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` |
922
+ | **Suggested override (if you pin)** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` or `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` |
923
+
924
+ **Why `opus[1m]` is opt-in (issue #198):**
925
+ - **Pinning disables auto-mode.** Max-plan users pay for Claude Code's per-turn model selection (Sonnet for cheap tasks, Opus for hard ones, plus weekly-limit smoothing). A top-level `model` gives that up.
926
+ - **The 1M headroom has to earn it.** If your typical session stays under 150K, you're giving up auto-mode for headroom you're not using.
927
+ - **Power users who want guaranteed Opus 4.7 + 1M** — go ahead, it's a real win for long shepherding sessions. Just make it a conscious choice, not a silent default.
904
928
 
905
- **Why `opus[1m]` as default:**
906
- - **Long SDLC sessions accumulate context fast** — plan → TDD → review → CI shepherd on a single feature regularly crosses 100K tokens
907
- - **Safety margin against autocompact loss** — cheaper to have headroom than to re-read files after a forced compact
908
- - **At time of writing, Anthropic lists 1M context at standard pricing for supported Opus/Sonnet models.** Verify current rates for your plan before relying on this — see [docs.anthropic.com/pricing](https://docs.anthropic.com/)
929
+ **Opt in when:** you routinely cross 100K tokens in a single session (plan → TDD → review → CI shepherd on one feature), you want Opus 4.7 specifically (not Sonnet), and you're OK losing auto-mode.
909
930
 
910
- **Set it up:** `/model opus[1m]` in your session, or set `"model": "opus[1m]"` in `.claude/settings.json`. The CLI template ships with this default. Requires Claude Code v2.1.111+ for Opus 4.7.
931
+ **Stay on auto-mode (default) when:** you're unsure, your work is mixed short/long, or you want Claude Code to do the model math for you.
911
932
 
912
- **Fall back to `opus` (200K) when:**
913
- - Your plan or organization charges higher rates for long-context prompts (check your billing)
914
- - You're doing genuinely short one-off tasks and want slightly faster responses
915
- - Your team has cost controls that flag >200K prompts
933
+ **How to opt in:** run `/model opus[1m]` in your session (transient), or set `"model": "opus[1m]"` in `.claude/settings.json` (persistent). Requires Claude Code v2.1.111+ for Opus 4.7. The setup wizard's Step 9.5 also asks once, with default No.
934
+
935
+ **How to opt out:** remove the `model` line from `.claude/settings.json`, or run `/model` and pick "Default (recommended)".
916
936
 
917
937
  **Cost awareness:** Larger windows let you consume more tokens in one session, and total cost always scales with tokens consumed regardless of tier. Use `/cost` to monitor — a 900K-token session is meaningfully more expensive than an 80K one even at standard rates.
918
938
 
919
- **Autocompact pairing (important):** If you default to `opus[1m]`, also set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` — otherwise CC's default autocompact fires at ~76K and destroys the headroom you're paying for. The CLI template sets this automatically.
939
+ **Autocompact pairing (important):** If you opt into `opus[1m]`, also set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` — otherwise CC's default autocompact fires at ~76K and destroys the headroom you're paying for. Step 9.5 writes both together when you opt in.
920
940
 
921
941
  ---
922
942
 
@@ -1300,7 +1320,7 @@ Feature branches still recommended for solo devs (keeps main clean, easy rollbac
1300
1320
  **CI monitoring detail:**
1301
1321
  > "Should Claude monitor CI checks after pushing and auto-diagnose failures? (y/n)"
1302
1322
 
1303
- - **Yes** → Enable CI feedback loop in SDLC skill, add `gh` CLI to allowedTools
1323
+ - **Yes** → Enable CI feedback loop in SDLC skill, add `gh` CLI to `permissions.allow`
1304
1324
  - **No** → Skip CI monitoring steps (Claude still runs local tests, just doesn't watch CI)
1305
1325
 
1306
1326
  **CI review feedback question (only if CI monitoring is enabled):**
@@ -1365,7 +1385,7 @@ Claude scans for:
1365
1385
  │ ├── .github/workflows/deploy*.yml → GitHub Actions deploy
1366
1386
  │ └── package.json scripts (deploy:*) → npm deploy scripts
1367
1387
 
1368
- ├── Tool permissions (for allowedTools):
1388
+ ├── Tool permissions (for permissions.allow):
1369
1389
  │ ├── package.json → Bash(npm *), Bash(node *), Bash(npx *)
1370
1390
  │ ├── pnpm-lock.yaml → Bash(pnpm *)
1371
1391
  │ ├── yarn.lock → Bash(yarn *)
@@ -1759,18 +1779,20 @@ Create `.claude/settings.json`:
1759
1779
  ```json
1760
1780
  {
1761
1781
  "verbosity": "medium",
1762
- "allowedTools": [
1763
- "Read",
1764
- "Edit",
1765
- "Write",
1766
- "Glob",
1767
- "Grep",
1768
- "Task",
1769
- "Bash(npm *)",
1770
- "Bash(node *)",
1771
- "Bash(npx *)",
1772
- "Bash(gh *)"
1773
- ],
1782
+ "permissions": {
1783
+ "allow": [
1784
+ "Read",
1785
+ "Edit",
1786
+ "Write",
1787
+ "Glob",
1788
+ "Grep",
1789
+ "Task",
1790
+ "Bash(npm *)",
1791
+ "Bash(node *)",
1792
+ "Bash(npx *)",
1793
+ "Bash(gh *)"
1794
+ ]
1795
+ },
1774
1796
  "hooks": {
1775
1797
  "UserPromptSubmit": [
1776
1798
  {
@@ -1800,7 +1822,7 @@ Create `.claude/settings.json`:
1800
1822
 
1801
1823
  ### Allowed Tools (Adaptive)
1802
1824
 
1803
- The `allowedTools` array is auto-generated based on your stack detected in Step 0.4.
1825
+ The `permissions.allow` array is auto-generated based on your stack detected in Step 0.4. (Historical note: pre-#197 guidance used a top-level `allowedTools` array — that form silently disables Claude Code auto-mode, so the wizard writes `permissions.allow` now.)
1804
1826
 
1805
1827
  | If Detected | Tools Added |
1806
1828
  |-------------|-------------|
@@ -1841,6 +1863,9 @@ The `allowedTools` array is auto-generated based on your stack detected in Step
1841
1863
  |------|---------------|---------|
1842
1864
  | `UserPromptSubmit` | Every message you send | Baseline SDLC reminder, skill auto-invoke |
1843
1865
  | `PreToolUse` | Before Claude edits files | TDD check: "Did you write the test first?" Uses `if` field to only fire on source files |
1866
+ | `InstructionsLoaded` | On first SDLC.md/CLAUDE.md load | Staleness nudges (wizard version, review-protocol reminders, CC release alerts) |
1867
+ | `SessionStart` | On `claude` startup | Detect stale effort setting / model upgrades |
1868
+ | `PreCompact` (manual only) | When user runs `/compact` | **Seam gate** — blocks manual compact when `.reviews/handoff.json` is `PENDING_REVIEW`/`PENDING_RECHECK` or a git rebase/merge/cherry-pick is in progress. Auto-compact is NOT gated (blocking it risks pushing past 100% context and losing everything). Requires Claude Code v2.1.105+ |
1844
1869
 
1845
1870
  ### How Skill Auto-Invoke Works
1846
1871
 
@@ -2057,9 +2082,11 @@ Before presenting approach, STATE your confidence:
2057
2082
  |-------|---------|--------|--------|
2058
2083
  | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
2059
2084
  | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
2060
- | LOW (<60%) | Not sure | ASK USER before proceeding | Consider `/effort max` |
2061
- | FAILED 2x | Something's wrong | STOP. ASK USER immediately | Try `/effort max` |
2062
- | CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | Try `/effort max` |
2085
+ | LOW (<60%) | Not sure | ASK USER before proceeding | **Run `/effort xhigh` now** — don't wait |
2086
+ | FAILED 2x | Something's wrong | STOP. ASK USER immediately | **Run `/effort max` now** — you're burning cycles at lower effort |
2087
+ | CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | **Run `/effort max` now** — stop spinning |
2088
+
2089
+ **Dynamic bumping is NOT optional.** "Consider max effort" is the same as "ignore this" in practice. If your confidence drops or tests fail twice, bump effort BEFORE the next attempt — spinning at low effort is an SDLC failure mode.
2063
2090
 
2064
2091
  ## Self-Review Loop (CRITICAL)
2065
2092
 
@@ -2626,6 +2653,28 @@ These are your full reference docs. Start with stubs and expand over time:
2626
2653
 
2627
2654
  **Claude follows this automatically.** After a deploy task, Claude runs through the Post-Deploy Verification table for the target environment. If any check fails, Claude suggests rollback and a new fix cycle.
2628
2655
 
2656
+ ## Pipeline Liveness Audits — CI green ≠ data flowing
2657
+
2658
+ A green CI badge only means "no step crashed." It does **not** mean "this pipeline is still producing the output it's supposed to produce." Long-running pipelines — scheduled benchmarks, nightly analytics jobs, weekly report generators, any workflow that appends to a log or dataset — can silently stop producing output while every run still reports success. Regression tests alone do not catch this: the fault lives between "green status" and "observable artifact."
2659
+
2660
+ **Symptom to watch for:** a file, table, or dashboard that's supposed to be updated by a scheduled workflow stops advancing even though the workflow keeps running green.
2661
+
2662
+ **Concrete example from this repo (2026-04-18):** `tests/e2e/score-history.jsonl` hadn't been appended to since 2026-03-30, yet weekly runs kept completing. Two stacked causes:
2663
+
2664
+ 1. On the 2026-04-13 run, a `CRITICAL MISS` caused `evaluate.sh` to exit 1 *with* a valid score payload; the tier2 wrapper aborted on the non-zero exit and dropped the trial (fix: PR #193 — disambiguate infra error from legitimate low-score exit via JSON payload, not exit code).
2665
+ 2. On other runs, a separate PR-branch push race (`refs/pull/N/merge` checkout vs. `refs/heads/<branch>` push) silently dropped the new trial because `continue-on-error: true` was set on the push step.
2666
+
2667
+ The second failure was silent (push step protected by `continue-on-error`); the first was a red CI run but nobody was watching weekly runs closely enough to notice the artifact had stopped advancing. Either way, the artifact's liveness would have caught the stall weeks earlier than a CI-badge-only review.
2668
+
2669
+ **Audit pattern — run when you touch a pipeline, and when asked to check pipeline health:**
2670
+
2671
+ 1. **Identify the observable output** — the artifact the pipeline is supposed to produce (file, PR, issue, log row, dashboard row).
2672
+ 2. **Check its liveness** — what's the last timestamp? If the pipeline runs weekly but the artifact is 3+ cycles stale, that's a stall, not a lull.
2673
+ 3. **Walk backward from the artifact to CI** — find the step that writes it, read that specific step's logs (not just the top-level status), confirm the write actually happened.
2674
+ 4. **When `continue-on-error: true` is present upstream of the write step**, treat that step as suspect by default — its failures are masked.
2675
+
2676
+ **When Claude should run this.** Claude runs the liveness audit when it merges or edits a scheduled workflow, when it ships a change that writes to a long-running artifact, and whenever the user asks to check a pipeline's health. It is not a background task — if no one is looking, the audit does not happen.
2677
+
2629
2678
  ## Rollback
2630
2679
 
2631
2680
  If deployment fails or post-deploy verification catches issues:
@@ -2662,7 +2711,7 @@ If deployment fails or post-deploy verification catches issues:
2662
2711
 
2663
2712
  **SDLC.md:**
2664
2713
  ```markdown
2665
- <!-- SDLC Wizard Version: 1.33.0 -->
2714
+ <!-- SDLC Wizard Version: 1.35.0 -->
2666
2715
  <!-- Setup Date: [DATE] -->
2667
2716
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
2668
2717
  <!-- Git Workflow: [PRs or Solo] -->
@@ -3135,6 +3184,8 @@ Want me to file these? (yes/no/not now)
3135
3184
 
3136
3185
  **`/revise-claude-md` scope:** Only updates CLAUDE.md. It does NOT touch feature docs, TESTING.md, hooks, or skills. Use it for general project context that applies across the codebase.
3137
3186
 
3187
+ **Memory Audit Protocol:** Per-user memory at `~/.claude/projects/<proj>/memory/` accumulates private learnings. Some are portable technical lessons that belong in shared docs. The `/sdlc` skill's **Memory Audit Protocol** section (under "After Session (Capture Learnings)") defines a three-bucket classifier (`promote` / `keep` / `manual-review`) with a type-based denylist that keeps `user`/`reference` entries private and routes `project`/`feedback` entries to human review. Run at end-of-release or after debugging-heavy sessions. Human approves every promotion chunk-by-chunk before apply.
3188
+
3138
3189
  **When to do mini-retro:** After features, tricky bugs, or discovering gotchas. Skip for one-line fixes or questions.
3139
3190
 
3140
3191
  **The SDLC evolves:** Weekly research, monthly deep-dives, and CI friction signals feed improvements. Human approves, the system gets better.
@@ -3721,7 +3772,7 @@ Walk through updates? (y/n)
3721
3772
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
3722
3773
 
3723
3774
  ```markdown
3724
- <!-- SDLC Wizard Version: 1.33.0 -->
3775
+ <!-- SDLC Wizard Version: 1.35.0 -->
3725
3776
  <!-- Setup Date: 2026-01-24 -->
3726
3777
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
3727
3778
  <!-- Git Workflow: PRs -->
@@ -4060,6 +4111,23 @@ When Anthropic provides official plugins that overlap with this SDLC:
4060
4111
 
4061
4112
  **Re-run `claude-code-setup` periodically** (quarterly, or when your project expands in scope) to catch new automations — MCP servers, hooks, subagents — that weren't relevant at initial setup but are now.
4062
4113
 
4114
+ **API feature shepherd (self-maintenance, roadmap #100):**
4115
+
4116
+ The wizard watches the **Anthropic API changelog** — not just Claude Code CLI releases — for new betas, tools, and agent features. The detector runs in `.github/workflows/weekly-api-update.yml`, is intentionally LLM-free, and only opens a tracking issue labeled `api-review-needed` when new entries appear at `platform.claude.com/docs/en/release-notes/api`.
4117
+
4118
+ When that issue is open, the session-start hook nudges you. The session (not the workflow) does the deep research + adoption via the full SDLC loop. This mirrors the "local shepherd" pattern used for CI fixes: cheap Action-layer detection + session-time analysis beats expensive Action-layer LLM calls.
4119
+
4120
+ The gap this closes: the advisor tool (API beta, `advisor-tool-2026-03-01`) shipped and was missed for several days before manual discovery. Detector would have flagged it on the next weekly tick.
4121
+
4122
+ **Complementary native skills worth knowing:**
4123
+
4124
+ | Native Skill | What It Does | When to Run |
4125
+ |--------------|--------------|-------------|
4126
+ | `/less-permission-prompts` | Scans transcripts for common read-only Bash/MCP calls and proposes a prioritized allowlist | After a few sessions — reduces permission friction without auto mode |
4127
+ | `/permissions` | Pre-allow specific commands and check them into `.claude/settings.json` | Anytime you want an auditable team allowlist |
4128
+
4129
+ These are shipped by Claude Code itself. The wizard doesn't reimplement them — it points you at them so you benefit from the native version's ongoing maintenance.
4130
+
4063
4131
  ### When Claude Code Improves
4064
4132
 
4065
4133
  Claude Code is actively improving. When they add built-in features:
package/README.md CHANGED
@@ -237,8 +237,26 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
237
237
 
238
238
  ## Community
239
239
 
240
- Come join **[Automation Station](https://discord.com/invite/fGPEF7GHrF)** — a community Discord packed with software engineers bringing 40+ years of combined experience across every area of the stack (frontend, backend, infra, embedded, data, QA, DevOps, you name it). Share patterns, ask questions, compare notes on AI agents, automation, and SDLC tooling.
240
+ <div align="center">
241
+
242
+ [![Discord](https://img.shields.io/badge/Discord-Automation%20Station-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.com/invite/fGPEF7GHrF)
243
+
244
+ **[Automation Station](https://discord.com/invite/fGPEF7GHrF)** — a community Discord packed with software engineers bringing 40+ years of combined experience across every area of the stack.
245
+
246
+ _Frontend · Backend · Infra · Embedded · Data · QA · DevOps_
247
+
248
+ Share patterns, ask questions, compare notes on AI agents, automation, and SDLC tooling.
249
+
250
+ </div>
241
251
 
242
252
  ## Contributing
243
253
 
244
254
  PRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for evaluation methodology and testing.
255
+
256
+ ## Feedback
257
+
258
+ Three ways to report bugs, request features, or ask questions:
259
+
260
+ - **In-session:** run `/feedback` inside any Claude Code session using this wizard — auto-fills context and redacts secrets before filing
261
+ - **Issue templates:** [bug report](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/issues/new?template=bug_report.md), [feature request](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/issues/new?template=feature_request.md), [question](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/issues/new?template=question.md)
262
+ - **Discussions:** open-ended conversations at [github.com/BaseInfinity/agentic-ai-sdlc-wizard/discussions](https://github.com/BaseInfinity/agentic-ai-sdlc-wizard/discussions)
package/cli/init.js CHANGED
@@ -24,6 +24,7 @@ const FILES = [
24
24
  { src: 'hooks/tdd-pretool-check.sh', dest: '.claude/hooks/tdd-pretool-check.sh', executable: true, base: REPO_ROOT },
25
25
  { src: 'hooks/instructions-loaded-check.sh', dest: '.claude/hooks/instructions-loaded-check.sh', executable: true, base: REPO_ROOT },
26
26
  { src: 'hooks/model-effort-check.sh', dest: '.claude/hooks/model-effort-check.sh', executable: true, base: REPO_ROOT },
27
+ { src: 'hooks/precompact-seam-check.sh', dest: '.claude/hooks/precompact-seam-check.sh', executable: true, base: REPO_ROOT },
27
28
  { src: 'skills/sdlc/SKILL.md', dest: '.claude/skills/sdlc/SKILL.md', base: REPO_ROOT },
28
29
  { src: 'skills/setup/SKILL.md', dest: '.claude/skills/setup/SKILL.md', base: REPO_ROOT },
29
30
  { src: 'skills/update/SKILL.md', dest: '.claude/skills/update/SKILL.md', base: REPO_ROOT },
@@ -1,8 +1,4 @@
1
1
  {
2
- "model": "opus[1m]",
3
- "env": {
4
- "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30"
5
- },
6
2
  "hooks": {
7
3
  "UserPromptSubmit": [
8
4
  {
@@ -45,6 +41,17 @@
45
41
  }
46
42
  ]
47
43
  }
44
+ ],
45
+ "PreCompact": [
46
+ {
47
+ "matcher": "manual",
48
+ "hooks": [
49
+ {
50
+ "type": "command",
51
+ "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/precompact-seam-check.sh"
52
+ }
53
+ ]
54
+ }
48
55
  ]
49
56
  }
50
57
  }
package/hooks/hooks.json CHANGED
@@ -42,6 +42,17 @@
42
42
  }
43
43
  ]
44
44
  }
45
+ ],
46
+ "PreCompact": [
47
+ {
48
+ "matcher": "manual",
49
+ "hooks": [
50
+ {
51
+ "type": "command",
52
+ "command": "${CLAUDE_PLUGIN_ROOT}/hooks/precompact-seam-check.sh"
53
+ }
54
+ ]
55
+ }
45
56
  ]
46
57
  }
47
58
  }
@@ -34,14 +34,71 @@ if [ -n "$MISSING" ]; then
34
34
  echo "Invoke Skill tool, skill=\"setup-wizard\" to generate them."
35
35
  fi
36
36
 
37
- # Version update check (non-blocking, best-effort)
37
+ # Version update check (non-blocking, best-effort).
38
+ # Fetches npm latest at most once per 24h (ROADMAP #196). Prints a stronger
39
+ # multi-line nudge when the gap is ≥3 minor versions — the one-liner gets
40
+ # skipped after weeks of ignoring it (user feedback 2026-04-18).
38
41
  SDLC_MD="$PROJECT_DIR/SDLC.md"
42
+ # Strict x.y.z semver — rejects whitespace, "junk", "1.alpha.0", etc.
43
+ SEMVER_RE='^[0-9]+\.[0-9]+\.[0-9]+$'
39
44
  if [ -f "$SDLC_MD" ]; then
40
45
  INSTALLED_VERSION=$(grep -o 'SDLC Wizard Version: [0-9.]*' "$SDLC_MD" | head -1 | sed 's/SDLC Wizard Version: //')
41
- if [ -n "$INSTALLED_VERSION" ] && command -v npm > /dev/null 2>&1; then
42
- LATEST_VERSION=$(npm view agentic-sdlc-wizard version 2>/dev/null) || true
46
+ if [ -n "$INSTALLED_VERSION" ] && [[ "$INSTALLED_VERSION" =~ $SEMVER_RE ]]; then
47
+ VERSION_CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
48
+ VERSION_CACHE_FILE="$VERSION_CACHE_DIR/latest-version"
49
+ LATEST_VERSION=""
50
+
51
+ # Use cache if present, <24h old, and contents are valid semver
52
+ if [ -f "$VERSION_CACHE_FILE" ]; then
53
+ if stat -f %m "$VERSION_CACHE_FILE" > /dev/null 2>&1; then
54
+ CACHE_MTIME=$(stat -f %m "$VERSION_CACHE_FILE")
55
+ else
56
+ CACHE_MTIME=$(stat -c %Y "$VERSION_CACHE_FILE" 2>/dev/null || echo 0)
57
+ fi
58
+ CACHE_AGE=$(( $(date +%s) - CACHE_MTIME ))
59
+ if [ "$CACHE_AGE" -lt 86400 ]; then
60
+ CACHE_CONTENT=$(cat "$VERSION_CACHE_FILE" 2>/dev/null) || CACHE_CONTENT=""
61
+ if [[ "$CACHE_CONTENT" =~ $SEMVER_RE ]]; then
62
+ LATEST_VERSION="$CACHE_CONTENT"
63
+ fi
64
+ fi
65
+ fi
66
+
67
+ # Fetch from npm if cache miss / stale / malformed
68
+ if [ -z "$LATEST_VERSION" ] && command -v npm > /dev/null 2>&1; then
69
+ NPM_RESULT=$(npm view agentic-sdlc-wizard version 2>/dev/null) || NPM_RESULT=""
70
+ if [[ "$NPM_RESULT" =~ $SEMVER_RE ]]; then
71
+ LATEST_VERSION="$NPM_RESULT"
72
+ mkdir -p "$VERSION_CACHE_DIR" 2>/dev/null || true
73
+ printf '%s' "$LATEST_VERSION" > "$VERSION_CACHE_FILE" 2>/dev/null || true
74
+ fi
75
+ fi
76
+
43
77
  if [ -n "$LATEST_VERSION" ] && [ "$LATEST_VERSION" != "$INSTALLED_VERSION" ]; then
44
- echo "SDLC Wizard update available: ${INSTALLED_VERSION}${LATEST_VERSION} (run /update-wizard)"
78
+ # Minor-version delta: 1.25.0 vs 1.34.09
79
+ INSTALLED_MINOR=$(echo "$INSTALLED_VERSION" | awk -F. '{print $2+0}')
80
+ LATEST_MINOR=$(echo "$LATEST_VERSION" | awk -F. '{print $2+0}')
81
+ INSTALLED_MAJOR=$(echo "$INSTALLED_VERSION" | awk -F. '{print $1+0}')
82
+ LATEST_MAJOR=$(echo "$LATEST_VERSION" | awk -F. '{print $1+0}')
83
+ MINOR_DELTA=0
84
+ if [ "$INSTALLED_MAJOR" = "$LATEST_MAJOR" ]; then
85
+ MINOR_DELTA=$(( LATEST_MINOR - INSTALLED_MINOR ))
86
+ else
87
+ # Major bump: treat as a very large delta
88
+ MINOR_DELTA=99
89
+ fi
90
+
91
+ if [ "$MINOR_DELTA" -ge 3 ]; then
92
+ echo ""
93
+ echo "!! WARNING: SDLC Wizard is ${MINOR_DELTA} minor versions behind !!"
94
+ echo " Installed: ${INSTALLED_VERSION}"
95
+ echo " Latest: ${LATEST_VERSION}"
96
+ echo " You're missing bug fixes and features shipped across ${MINOR_DELTA} releases."
97
+ echo " Strongly recommend running /update-wizard before starting new work."
98
+ echo ""
99
+ else
100
+ echo "SDLC Wizard update available: ${INSTALLED_VERSION} → ${LATEST_VERSION} (run /update-wizard)"
101
+ fi
45
102
  fi
46
103
  fi
47
104
  fi
@@ -98,6 +155,49 @@ if [ -d "$PROJECT_DIR/.claude/skills/update" ]; then
98
155
  done
99
156
  fi
100
157
 
158
+ # API feature review nudge (#100) — surface open 'api-review-needed' issues
159
+ # opened by .github/workflows/weekly-api-update.yml so the session picks up
160
+ # new API features without waiting for manual discovery.
161
+ #
162
+ # Gated on LOCAL presence of the detector workflow: the CLI distributes this
163
+ # hook to consumer projects, and we don't want to pester those users with
164
+ # upstream-wizard issues. The nudge only fires when the current repo owns
165
+ # the detector (= the wizard repo or a fork of it).
166
+ if [ -f "$PROJECT_DIR/.github/workflows/weekly-api-update.yml" ] && \
167
+ command -v gh > /dev/null 2>&1; then
168
+ # Query the current repo (not hardcoded upstream) — in a fork, users see
169
+ # their own detector's issues, not ours.
170
+ API_REVIEW_COUNT=$(gh issue list \
171
+ --state open \
172
+ --label "api-review-needed" \
173
+ --limit 1 \
174
+ --json number \
175
+ --jq 'length' 2>/dev/null) || API_REVIEW_COUNT=""
176
+ if [[ "$API_REVIEW_COUNT" =~ ^[0-9]+$ ]] && [ "$API_REVIEW_COUNT" -gt 0 ]; then
177
+ echo "Anthropic API features pending review: ${API_REVIEW_COUNT} open issue(s) with label 'api-review-needed' (see .github/workflows/weekly-api-update.yml)"
178
+ fi
179
+ fi
180
+
181
+ # Claude Code release review nudge (#85) — surface open 'auto-update' PRs
182
+ # opened by .github/workflows/weekly-update.yml so new CC releases get triaged
183
+ # before they bit-rot (relevance-HIGH PRs can sit for days otherwise).
184
+ #
185
+ # Gated on LOCAL presence of weekly-update.yml: the CLI distributes this hook
186
+ # to consumer projects, which don't own the detector — don't pester them with
187
+ # upstream-wizard PRs.
188
+ if [ -f "$PROJECT_DIR/.github/workflows/weekly-update.yml" ] && \
189
+ command -v gh > /dev/null 2>&1; then
190
+ CC_UPDATE_COUNT=$(gh pr list \
191
+ --state open \
192
+ --label "auto-update" \
193
+ --limit 1 \
194
+ --json number \
195
+ --jq 'length' 2>/dev/null) || CC_UPDATE_COUNT=""
196
+ if [[ "$CC_UPDATE_COUNT" =~ ^[0-9]+$ ]] && [ "$CC_UPDATE_COUNT" -gt 0 ]; then
197
+ echo "Claude Code update pending review: ${CC_UPDATE_COUNT} open auto-update PR(s) (see .github/workflows/weekly-update.yml)"
198
+ fi
199
+ fi
200
+
101
201
  # Claude Code version check (non-blocking, best-effort)
102
202
  if command -v claude > /dev/null 2>&1 && command -v npm > /dev/null 2>&1; then
103
203
  CC_LOCAL=$(claude --version 2>/dev/null | grep -o '[0-9][0-9.]*' | head -1) || true
@@ -0,0 +1,75 @@
1
+ #!/bin/bash
2
+ # PreCompact hook - block manual /compact at non-seam boundaries
3
+ # Fires on manual /compact only (auto-compact is threshold-driven; blocking
4
+ # it could push past 100% context and lose everything). Matcher: "manual"
5
+ # in .claude/settings.json.
6
+ #
7
+ # Requires Claude Code v2.1.105+ (PreCompact event introduced April 13, 2026).
8
+ #
9
+ # Seam policy: compacting mid-cycle loses evidence the next round needs.
10
+ # Block when:
11
+ # (1) .reviews/handoff.json status is PENDING_REVIEW or PENDING_RECHECK
12
+ # → mid-Codex-round, compact after CERTIFIED
13
+ # (2) git rebase/merge/cherry-pick in progress
14
+ # → finish in-progress git operation first
15
+ # Allow otherwise.
16
+
17
+ [ ! -t 0 ] && INPUT=$(cat) || INPUT=""
18
+
19
+ # Determine project root: prefer $CLAUDE_PROJECT_DIR, fall back to cwd
20
+ ROOT="${CLAUDE_PROJECT_DIR:-$PWD}"
21
+
22
+ HOLD_REASONS=""
23
+
24
+ # Check 1: Codex review mid-cycle
25
+ # Self-heal: if handoff has a pr_number and gh reports that PR as MERGED,
26
+ # the handoff is a stale artifact from a closed review — treat as implicit
27
+ # CERTIFIED so a forgotten status field doesn't permanently block /compact.
28
+ HANDOFF="$ROOT/.reviews/handoff.json"
29
+ if [ -f "$HANDOFF" ]; then
30
+ STATUS=$(grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' "$HANDOFF" 2>/dev/null | head -1 | sed 's/.*"\([^"]*\)"$/\1/')
31
+ case "$STATUS" in
32
+ PENDING_REVIEW|PENDING_RECHECK)
33
+ PR_NUMBER=$(grep -o '"pr_number"[[:space:]]*:[[:space:]]*[0-9][0-9]*' "$HANDOFF" 2>/dev/null | head -1 | grep -o '[0-9][0-9]*$')
34
+ HEALED=0
35
+ if [ -n "$PR_NUMBER" ] && command -v gh >/dev/null 2>&1; then
36
+ PR_STATE=$(gh pr view "$PR_NUMBER" --json state --jq .state 2>/dev/null)
37
+ [ "$PR_STATE" = "MERGED" ] && HEALED=1
38
+ fi
39
+ if [ "$HEALED" -ne 1 ]; then
40
+ HOLD_REASONS="${HOLD_REASONS} - Codex review is ${STATUS}. Round-1 evidence lives in this context — compacting now loses what round-2 needs to re-verify.
41
+ Resolve: wait for CERTIFIED (or escalate) before /compact."$'\n'
42
+ fi
43
+ ;;
44
+ esac
45
+ fi
46
+
47
+ # Check 2: in-progress git operation
48
+ GITDIR="$ROOT/.git"
49
+ if [ -d "$GITDIR" ]; then
50
+ if [ -e "$GITDIR/REBASE_HEAD" ] || [ -d "$GITDIR/rebase-merge" ] || [ -d "$GITDIR/rebase-apply" ]; then
51
+ HOLD_REASONS="${HOLD_REASONS} - Git rebase in progress. Compacting mid-rebase loses the operation's context.
52
+ Resolve: finish or abort the rebase before /compact."$'\n'
53
+ fi
54
+ if [ -e "$GITDIR/MERGE_HEAD" ]; then
55
+ HOLD_REASONS="${HOLD_REASONS} - Git merge in progress. Compacting mid-merge loses the operation's context.
56
+ Resolve: finish or abort the merge before /compact."$'\n'
57
+ fi
58
+ if [ -e "$GITDIR/CHERRY_PICK_HEAD" ]; then
59
+ HOLD_REASONS="${HOLD_REASONS} - Git cherry-pick in progress. Compacting mid-cherry-pick loses the operation's context.
60
+ Resolve: finish or abort the cherry-pick before /compact."$'\n'
61
+ fi
62
+ fi
63
+
64
+ if [ -n "$HOLD_REASONS" ]; then
65
+ {
66
+ echo "HOLD: manual /compact at a non-seam. Compacting mid-cycle loses evidence the next round needs."
67
+ echo ""
68
+ echo "$HOLD_REASONS"
69
+ echo "Natural seams: commit boundary, Codex CERTIFIED, PR merge, ROADMAP item DONE."
70
+ echo "Override: resolve the blocker above, or temporarily disable this hook in .claude/settings.json."
71
+ } >&2
72
+ exit 2
73
+ fi
74
+
75
+ exit 0
@@ -17,6 +17,69 @@ else
17
17
  exit 0
18
18
  fi
19
19
 
20
+ # Effort auto-bump (ROADMAP #195). Watches this UserPromptSubmit payload for
21
+ # LOW-confidence / FAILED-repeatedly / CONFUSED phrases, logs a timestamped
22
+ # signal, and emits a loud '/effort xhigh' nudge when ≥2 signals land inside
23
+ # a 30-minute window. Enforces the SDLC confidence table mid-session so
24
+ # Claude stops burning budget at 'high' after confidence drops.
25
+ EFFORT_CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
26
+ EFFORT_SIGNALS="$EFFORT_CACHE_DIR/effort-signals.log"
27
+ PROMPT_TEXT=""
28
+ if [ ! -t 0 ] && command -v jq > /dev/null 2>&1; then
29
+ STDIN_JSON=$(cat)
30
+ if [ -n "$STDIN_JSON" ]; then
31
+ PROMPT_TEXT=$(printf '%s' "$STDIN_JSON" | jq -r '.prompt // empty' 2>/dev/null) || PROMPT_TEXT=""
32
+ fi
33
+ fi
34
+ if [ -n "$PROMPT_TEXT" ]; then
35
+ LOWER=$(printf '%s' "$PROMPT_TEXT" | tr '[:upper:]' '[:lower:]')
36
+ SIGNAL_REASON=""
37
+ # Every trigger requires first-person ownership or a structured-label
38
+ # form, so educational/quoted mentions ("How do I name a low confidence
39
+ # badge?", "What does 'failed again' mean?") don't fire.
40
+ case "$LOWER" in
41
+ *"i'm stuck"*|*"i am stuck"*|*"im stuck"*|\
42
+ *"i'm confused"*|*"i am confused"*|*"im confused"*|\
43
+ *"i tried twice"*|*"i've tried twice"*|*"ive tried twice"*|\
44
+ *"i can't figure"*|*"i cant figure"*|\
45
+ *"i'm not sure why"*|*"i am not sure why"*|*"im not sure why"*|\
46
+ *"my confidence is low"*|*"my confidence: low"*|*"confidence: low"*|\
47
+ *"it's still failing"*|*"its still failing"*|\
48
+ *"it keeps failing"*|*"this keeps failing"*|\
49
+ *"it failed again"*|*"this failed again"*|\
50
+ *"failed twice"*|*"failed 2x"*)
51
+ SIGNAL_REASON="low"
52
+ ;;
53
+ esac
54
+ if [ -n "$SIGNAL_REASON" ]; then
55
+ # Group the write so redirection errors (e.g., unwritable HOME,
56
+ # cache-dir-is-a-file) land on /dev/null instead of leaking to stderr.
57
+ {
58
+ if mkdir -p "$EFFORT_CACHE_DIR" && [ -d "$EFFORT_CACHE_DIR" ]; then
59
+ # Prune entries older than 1h on every write to cap log size.
60
+ if [ -f "$EFFORT_SIGNALS" ]; then
61
+ PRUNE_THRESH=$(( $(date +%s) - 3600 ))
62
+ awk -v t="$PRUNE_THRESH" '$1+0 >= t' "$EFFORT_SIGNALS" > "${EFFORT_SIGNALS}.tmp" \
63
+ && mv "${EFFORT_SIGNALS}.tmp" "$EFFORT_SIGNALS"
64
+ fi
65
+ printf '%s\t%s\n' "$(date +%s)" "$SIGNAL_REASON" >> "$EFFORT_SIGNALS"
66
+ fi
67
+ } 2>/dev/null || true
68
+ fi
69
+ fi
70
+ if [ -f "$EFFORT_SIGNALS" ]; then
71
+ NOW=$(date +%s)
72
+ THRESH=$(( NOW - 1800 ))
73
+ RECENT=$(awk -v t="$THRESH" '$1+0 >= t' "$EFFORT_SIGNALS" 2>/dev/null | wc -l | tr -d ' ')
74
+ if [ "${RECENT:-0}" -ge 2 ]; then
75
+ echo ""
76
+ echo "!! EFFORT BUMP REQUIRED: ${RECENT} low-confidence signals in last 30 min !!"
77
+ echo " Run /effort xhigh NOW — spinning at 'high' after confidence drops wastes budget."
78
+ echo " (Auto-enforcement of the SDLC confidence table. ROADMAP #195.)"
79
+ echo ""
80
+ fi
81
+ fi
82
+
20
83
  if [ ! -s "$PROJECT_DIR/SDLC.md" ] || [ ! -s "$PROJECT_DIR/TESTING.md" ]; then
21
84
  cat << 'SETUP'
22
85
  SETUP NOT COMPLETE: SDLC.md and/or TESTING.md are missing.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "agentic-sdlc-wizard",
3
- "version": "1.33.0",
3
+ "version": "1.35.0",
4
4
  "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
5
5
  "bin": {
6
6
  "sdlc-wizard": "./cli/bin/sdlc-wizard.js"
@@ -184,9 +184,11 @@ Before presenting approach, STATE your confidence:
184
184
  |-------|---------|--------|--------|
185
185
  | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
186
186
  | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
187
- | LOW (<60%) | Not sure | Do more research or try cross-model research (Codex) to get to 95%. If still LOW after research, ASK USER | Consider `/effort max` |
188
- | FAILED 2x | Something's wrong | Try cross-model research (Codex) for a fresh perspective. If still stuck, STOP and ASK USER | Try `/effort max` |
189
- | CONFUSED | Can't diagnose why something is failing | Try cross-model research (Codex). If still confused, STOP. Describe what you tried, ask for help | Try `/effort max` |
187
+ | LOW (<60%) | Not sure | Do more research or try cross-model research (Codex) to get to 95%. If still LOW after research, ASK USER | **Run `/effort xhigh` now** — don't wait |
188
+ | FAILED 2x | Something's wrong | Try cross-model research (Codex) for a fresh perspective. If still stuck, STOP and ASK USER | **Run `/effort max` now** — you're already burning cycles at lower effort |
189
+ | CONFUSED | Can't diagnose why something is failing | Try cross-model research (Codex). If still confused, STOP. Describe what you tried, ask for help | **Run `/effort max` now** — stop spinning |
190
+
191
+ **Dynamic bumping is NOT optional.** "Consider max effort" is the same as "ignore this" in practice. If your confidence drops or tests fail twice, bump effort BEFORE the next attempt — not after a third failure. Spinning at low effort is an SDLC failure mode, not a style choice.
190
192
 
191
193
  ## Self-Review Loop (CRITICAL)
192
194
 
@@ -536,10 +538,11 @@ Local tests pass -> Commit -> Push -> Watch CI
536
538
  1. Push changes to remote
537
539
  2. Watch CI: `gh pr checks --watch`
538
540
  3. Read CI logs — **pass or fail**: `gh run view <RUN_ID> --log` (not just `--log-failed`). Passing CI can still hide warnings, skipped steps, or degraded scores. Don't just check the green checkmark
539
- 4. If CI fails diagnose from logs, fix, push again (max 2 attempts)
540
- 5. If CI passesread ALL review comments: `gh api repos/OWNER/REPO/pulls/PR/comments`
541
- 6. Fix valid suggestions, push, iterate until clean
542
- 7. Only then: explicit merge with `gh pr merge --squash`
541
+ 4. **Cross-model review the CI logs themselves** pipe `gh run view <RUN_ID> --log` to a tmp file and run `codex exec -c 'model_reasoning_effort="xhigh"' -s danger-full-access` with a prompt like *"Audit this CI log for silent failures, skipped tests, degraded metrics, or warnings-that-should-be-errors. Green checkmark is necessary but not sufficient."* A second model catches things the first missed (e.g., a job that passed but degraded an E2E score by 30%, or a test that was silently excluded). Cheap — one extra `codex exec` per PR. **Run separately on Tier 1 quick-check AND Tier 2 5x evaluation logs** — they exercise different code paths, so a clean Tier 1 audit doesn't imply a clean Tier 2. Evidence from PR #206: Tier 1 audit found 3 P1s (Node 24 false-green, "11/10" score leak, E2E incomplete); Tier 2 audit TBD — value is measured by running both and comparing.
542
+ 5. If CI failsdiagnose from logs, fix, push again (max 2 attempts)
543
+ 6. If CI passes read ALL review comments: `gh api repos/OWNER/REPO/pulls/PR/comments`
544
+ 7. Fix valid suggestions, push, iterate until clean
545
+ 8. Only then: explicit merge with `gh pr merge --squash`
543
546
 
544
547
  **Why this is non-negotiable:** PR #145 auto-merged a release before review feedback was read. CI reviewer found a P1 dead-code bug that shipped to main. The fix required a follow-up commit. Auto-merge cost more time than the shepherd loop would have taken.
545
548
 
@@ -717,6 +720,40 @@ If this session revealed insights, update the right place:
717
720
  - **General project context** → `CLAUDE.md` (or `/revise-claude-md`)
718
721
  - **Plan files** → If this session's work came from a plan file, delete it or mark it complete. Stale plans mislead future sessions into thinking work is still pending
719
722
 
723
+ ### Memory Audit Protocol
724
+
725
+ Per-user memory at `~/.claude/projects/<proj>/memory/` accumulates private learnings. Some belong there (user preferences, external references). Others are portable technical lessons (tool quirks, platform gotchas, bash/GHA/macOS footguns) that would save the next contributor hours. Run this audit to promote the portable ones.
726
+
727
+ **When to run:**
728
+ - End-of-release (before cutting a tag)
729
+ - After a debugging-heavy session with multiple memory additions
730
+ - On explicit "audit my memory" request
731
+
732
+ **Classify each memory file in `~/.claude/projects/<proj>/memory/`:**
733
+
734
+ 1. **Rule-based denylist (deterministic, no LLM):**
735
+ - `type: user` → `keep` (user identity, preferences — never promote)
736
+ - `type: reference` → `keep` (external pointers to Discord/URL/etc — private by default)
737
+ - `type: project` → `manual-review` (often mixed state + portable lesson — human decides)
738
+ - `type: feedback` → `manual-review` (often mixed personal preference + portable rule — human decides)
739
+ - Parser must normalize YAML variants (`type: "user"`, `type: user # comment`, surrounding whitespace) — see `tests/test-memory-audit-protocol.sh::apply_denylist_rule` for the reference implementation
740
+ 2. **Remaining entries** (no type, or type outside the 4 above) fall through to human-gated review. An LLM-assisted classification runner is Prove-It-Gated: build it only after running this protocol 4+ times with manual classification. Until then, human review at promotion time IS the quality gate
741
+
742
+ **Destinations for `promote` entries (no new files — use existing wizard destinations):**
743
+
744
+ | Content | Target |
745
+ |---------|--------|
746
+ | Language/tool/platform gotchas (bash, gh CLI, GHA, macOS) | `SDLC.md` → `## Lessons Learned` section |
747
+ | Testing gotchas (flaky patterns, mock-vs-integration lessons) | `TESTING.md` |
748
+ | Tool-specific quirks tied to a skill | That skill's `SKILL.md` |
749
+ | Process rules that should govern the project | `CLAUDE.md` |
750
+
751
+ **Tracking:** When you promote an entry, add `promoted_to: <path>` to that memory file's YAML frontmatter. Subsequent audits skip already-promoted entries.
752
+
753
+ **Human gate is MANDATORY.** Protocol produces diffs; user approves chunk-by-chunk before apply. Never auto-apply — private memory touching public docs needs human judgement.
754
+
755
+ **Prove It Gate:** If you find yourself running this protocol 4+ times and manually doing the same classification work, that's evidence to build a `/memory-audit` slash command AND wire the LLM-gated quality tests (8/10 classification, 6/6 destination). Until then, protocol + human review is enough — and no stub tests that skip (they mislead reviewers into thinking a gate exists when it doesn't).
756
+
720
757
  ## Post-Mortem: When Process Fails, Feed It Back
721
758
 
722
759
  **Every process failure becomes an enforcement rule.** When you skip a step and it causes a problem, don't just fix the symptom — add a gate so it can't happen again.
@@ -174,31 +174,61 @@ Skip this step if no branding assets or UI/content patterns detected.
174
174
 
175
175
  ### Step 9: Configure Tool Permissions
176
176
 
177
- Based on detected stack, suggest `allowedTools` entries for `.claude/settings.json`:
177
+ Based on detected stack, suggest entries for `permissions.allow` in `.claude/settings.json`:
178
178
  - Package manager commands (npm, pnpm, yarn, cargo, go, pip, etc.)
179
179
  - Build/test commands
180
180
  - CI tools (gh)
181
181
 
182
+ Write the shape as:
183
+
184
+ ```json
185
+ {
186
+ "permissions": {
187
+ "allow": [
188
+ "Bash(npm:*)",
189
+ "Bash(npx:*)",
190
+ "Bash(git:*)",
191
+ "Bash(gh:*)"
192
+ ]
193
+ }
194
+ }
195
+ ```
196
+
197
+ **Do NOT write the deprecated top-level `allowedTools` array** (issue #197). Claude Code treats the presence of `allowedTools` in project settings as "user has explicitly scoped tool permissions" and silently disables its auto-mode classifier — same failure family as the model pin in #198. `permissions.allow` is the supported successor and does not trip the auto-mode gate.
198
+
182
199
  Present suggestions and let the user confirm.
183
200
 
184
- ### Step 9.5: Context Window Configuration
201
+ ### Step 9.5: Context Window Configuration (Opt-In)
202
+
203
+ The CLI ships `cli/templates/settings.json` with **no** `model` or `env` pin by default. This preserves Claude Code's built-in model auto-selection (Sonnet for cheap tasks, Opus for hard ones) and the upstream autocompact threshold. Power users who want guaranteed 1M context can opt in during setup.
204
+
205
+ **Why this is opt-in (issue #198):** A top-level `"model"` in `settings.json` tells Claude Code "the user has explicitly chosen a model" and disables auto-mode for the session. That is a real tradeoff — the pin is only worth it when you actually need the 1M headroom and want to lock to Opus 4.7.
206
+
207
+ **Ask the user exactly once in Step 9.5:**
208
+
209
+ > Pin the session to `opus[1m]` (Opus 4.7 with 1M context) and set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`?
210
+ >
211
+ > - **No (default):** Leaves auto-mode enabled. Claude Code picks the model per turn, compaction follows upstream defaults. Recommended for most users.
212
+ > - **Yes:** Long SDLC sessions (plan → TDD → review → CI shepherd on one feature) regularly cross 100K tokens; the 1M window gives headroom and 30% autocompact fires at ~300K. Requires Claude Code v2.1.111+ and comfort with losing model auto-selection.
213
+ >
214
+ > `[y/N]`
185
215
 
186
- The CLI ships `cli/templates/settings.json` with `"model": "opus[1m]"` and `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M context window compacts at ~300K). This is the SDLC wizard default. Confirm the installed values, or fall back to 200K if the user prefers:
216
+ **If the user answers No (default):** Make no edits to `.claude/settings.json`. Auto-mode stays on. Done.
187
217
 
188
- - **1M default (`opus[1m]`):** Confirm `"model": "opus[1m]"` at top level and `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "30"` under `env`. Requires Claude Code v2.1.111+ for Opus 4.7.
189
- - **200K fallback (`opus`):** Edit `.claude/settings.json` — change `"model"` to `"opus"` and raise `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` to `"75"` (otherwise `30%` of 200K compacts too early at 60K).
218
+ **If the user answers Yes:** Edit `.claude/settings.json` and add both fields at the top level:
190
219
 
191
- To fall back to 200K, edit `.claude/settings.json`:
192
220
  ```json
193
221
  {
194
- "model": "opus",
222
+ "model": "opus[1m]",
195
223
  "env": {
196
- "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "75"
224
+ "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30"
197
225
  }
198
226
  }
199
227
  ```
200
228
 
201
- For CI pipelines, consider `"60"` — short tasks benefit from compacting early.
229
+ Mention the escape hatch either way:
230
+ - To opt out later: remove the `model` line (and optionally the `env` block) from `.claude/settings.json`, or run `/model` and pick "Default (recommended)".
231
+ - For CI pipelines with short tasks, consider `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60` — compact early to stay fast.
202
232
 
203
233
  This is project-scoped and shared with the team via git.
204
234
 
@@ -224,10 +254,13 @@ Tell the user:
224
254
  > **Exit Claude Code and restart it** for the new configuration to take effect.
225
255
  > On restart, the SDLC hook will fire and you'll see the checklist in every response.
226
256
  >
227
- > **Optional next step:**
257
+ > **Optional next steps:**
228
258
  > - Run `/claude-automation-recommender` for stack-specific tooling suggestions (MCP servers, formatting hooks, type-checking hooks, plugins)
259
+ > - After a few sessions, run `/less-permission-prompts` — a native Claude Code skill
260
+ > that scans your transcripts for common read-only Bash/MCP calls and proposes a
261
+ > prioritized allowlist. Reduces permission friction without enabling auto mode.
229
262
  >
230
- > The recommender is complementary to the SDLC wizard — it adds tooling recommendations, not process enforcement.
263
+ > Both are complementary to the SDLC wizard — they add tooling and quality-of-life, not process enforcement.
231
264
 
232
265
  ## Rules
233
266
 
@@ -46,9 +46,10 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
46
46
 
47
47
  ```
48
48
  Installed: 1.24.0
49
- Latest: 1.33.0
49
+ Latest: 1.34.0
50
50
 
51
51
  What changed:
52
+ - [1.34.0] API feature detection shepherd for Claude releases, Memory Audit Protocol with 7 verified lessons (+2 caught-and-retracted), /less-permission-prompts surfaced, ...
52
53
  - [1.33.0] opus[1m] as SDLC default, dual-channel install drift guardrails, model/effort session-start nudge, ...
53
54
  - [1.32.0] Opus 4.7 + xhigh support, model/effort upgrade detection, benchmark ceiling audit, ...
54
55
  - [1.31.0] Hook false-positive fix for non-SDLC dirs, ephemeral marketplace path warning, ...
@@ -109,6 +110,54 @@ NEVER overwrite settings.json. Instead:
109
110
 
110
111
  The CLI's `init --force` already has smart merge logic for settings.json. If the manual merge gets complicated, suggest: `npx agentic-sdlc-wizard init --force` (it preserves custom hooks).
111
112
 
113
+ ### Step 7.5: Model Pin Migration (Issue #198)
114
+
115
+ Wizard versions 1.31.0–1.33.x unconditionally wrote `"model": "opus[1m]"` and `"env": { "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30" }` to `.claude/settings.json`. Issue #198 flipped that to opt-in because a top-level `model` disables Claude Code's auto-mode for the session.
116
+
117
+ If the user is upgrading from a pre-#198 version, check their `.claude/settings.json`:
118
+
119
+ 1. **If `model` is `"opus[1m]"` and `env.CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` is `"30"`** — this is likely the old wizard-installed pair, not an intentional user choice. Ask:
120
+
121
+ > Your `.claude/settings.json` pins `model: "opus[1m]"` with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`.
122
+ > This pair was the SDLC wizard default in 1.31.0–1.33.x, but it disables Claude Code's auto-mode (see issue #198).
123
+ >
124
+ > - **Remove the pin** (recommended for most users) — keeps auto-mode enabled, lets Claude Code pick the model per turn.
125
+ > - **Keep the pin** — you want guaranteed Opus 4.7 + 1M context, and you're OK giving up model auto-selection.
126
+ >
127
+ > Remove, keep, or decide later? `[r/k/l]`
128
+
129
+ 2. **If only one of the two fields matches** (e.g. `model: "opus[1m]"` but custom autocompact, or vice versa) — treat as intentional customization. Do not prompt.
130
+
131
+ 3. **If `model` is some other value** (e.g. `"sonnet"`, `"opus"`) — treat as user's explicit choice. Do not touch.
132
+
133
+ 4. **If neither field is set** — user is already on the new default. No action.
134
+
135
+ When removing: edit the file in place, drop the `model` key (and the `env.CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` key if nothing else is in `env`, otherwise leave `env` alone). Never touch other keys the user added.
136
+
137
+ ### Step 7.6: `allowedTools` → `permissions.allow` Migration (Issue #197)
138
+
139
+ Wizard versions before #197 guided users to write a top-level `allowedTools` array in `.claude/settings.json`. Claude Code silently disables its auto-mode classifier when that key is present, even with `defaultMode: "auto"` set globally.
140
+
141
+ If the user's `.claude/settings.json` has a top-level `allowedTools` array, offer to migrate:
142
+
143
+ 1. **If only `allowedTools` is present** (no `permissions.allow`) — ask:
144
+
145
+ > Your `.claude/settings.json` has a top-level `allowedTools` array. This silently disables Claude Code auto-mode (see issue #197). The supported successor is `permissions.allow`, which accepts the same patterns but doesn't trip the auto-mode gate.
146
+ >
147
+ > - **Migrate** (recommended): move all entries into `permissions.allow`, remove the old `allowedTools`.
148
+ > - **Keep** — you have a specific reason to use the legacy key.
149
+ > - **Later** — don't touch it now.
150
+ >
151
+ > `[m/k/l]`
152
+
153
+ 2. **If both `allowedTools` and `permissions.allow` are present** — flag it: the two lists may have diverged. Show both arrays to the user. On migrate, append every entry from `allowedTools` to the end of `permissions.allow` (preserving order within each list), then drop the legacy `allowedTools` key. **Do NOT dedup.** If the same string appears in both lists, it stays in both positions — Claude Code treats duplicate entries as a no-op, but dedup would silently remove user data that the user might have intended. If the user explicitly asks to dedup, do that as a separate follow-up edit.
154
+
155
+ 3. **If only `permissions.allow` is present** — user is already on the new shape. No action.
156
+
157
+ 4. **If neither is present** — no action.
158
+
159
+ When migrating: preserve every entry byte-for-byte; only the container key changes. Do not reorder, dedup, or expand wildcards. Other top-level keys (hooks, env, model, custom user fields) are never touched.
160
+
112
161
  ### Step 8: Apply Selected Changes
113
162
 
114
163
  For each file the user approved: