agentic-sdlc-wizard 1.35.0 → 1.36.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,33 @@ All notable changes to the SDLC Wizard.
|
|
|
4
4
|
|
|
5
5
|
> **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
|
|
6
6
|
|
|
7
|
+
## [1.36.0] - 2026-04-23
|
|
8
|
+
|
|
9
|
+
### Added
|
|
10
|
+
|
|
11
|
+
- **Regression test: every ci.yml `steps.X.outputs.Y` reference must resolve** (#214 / ROADMAP #215). Python+PyYAML test walks ci.yml, builds per-job map of step_id → emitted outputs (handles `NAME=val` and heredoc `NAME<<EOF`), and flags any dead gate. Caught #215's original bug and guards against re-introduction.
|
|
12
|
+
- **Regression test: no `oven-sh/setup-bun` in workflows** (#217 / ROADMAP #210). Defensive guard against Node 20 deprecation reintroduction. Committed negative control writes a tmp fixture with the banned pattern, asserts grep catches it, tears down — proves the regex is live-fire correct.
|
|
13
|
+
- **ROADMAP #218** — evaluate CC 2.1.118 `type: "mcp_tool"` hook capability (Prove-It gated).
|
|
14
|
+
- **ROADMAP #219** — re-verify #198 model-pin guidance against CC 2.1.117 restart-persistence behavior.
|
|
15
|
+
- **ROADMAP #220** — token-spike anomaly detection (from Anthropic 2026-04-23 post-mortem).
|
|
16
|
+
- **ROADMAP #221** — fold post-mortem lessons into wizard docs (explicit effort / extended-thinking+caching+idle gotcha / verbosity-cap audit).
|
|
17
|
+
- **ROADMAP #222** — prompt-compounding audit harness (A/B each of ~40 prompt-injection sites).
|
|
18
|
+
- **ROADMAP #223** — adopt GPT-5.5 in review-tier guidance after calibration (standard $5/$30 is Codex-usable ceiling; Pro $30/$180 is ChatGPT-only, reserve for release-blocker one-offs).
|
|
19
|
+
|
|
20
|
+
### Fixed
|
|
21
|
+
|
|
22
|
+
- **Tier 2 persist-scores dead gate** (#214 / ROADMAP #215). The Tier 2 "Persist scores to PR branch" step was gated on `steps.check-baseline.outputs.should_simulate`, but the Tier 2 `check-baseline` only emits `has_baseline`. The step had been silently dead, so `score-history.jsonl` never got appended from Tier 2 runs. One-word fix (`should_simulate` → `has_baseline`) plus the new cross-job output-parser regression test.
|
|
23
|
+
- **`score-history.jsonl` `max_score` correctness** (#216 / ROADMAP #211). Both Tier 1 and Tier 2 hardcoded `--argjson max_score 10`, causing UI scenarios (which score out of 11 via design_system bonus) to record `11/10` — nonsensical and breaking downstream analytics. Both sites now read `MAX_SCORE` from the eval result file; shell `case` statement guards non-numeric inputs; regression test grep-asserts no hardcoded literal remains.
|
|
24
|
+
- **CC 2.1.118 `/cost` → `/usage` doc rename** (#209). Claude Code 2.1.118 consolidated `/cost` and `/stats` into `/usage` with aliases preserved. Wizard docs and test-self-update now use `/usage` as canonical (alias note inline).
|
|
25
|
+
|
|
26
|
+
### Docs
|
|
27
|
+
|
|
28
|
+
- Roadmap entries for all additions above (#218-223), plus minor wording correction to #221(c) post-Codex review (attributes 3% drop to broader length-limit prompt change, not a single sentence).
|
|
29
|
+
|
|
30
|
+
### Process
|
|
31
|
+
|
|
32
|
+
- Codex cross-model review run on every PR in the v1.36.0 batch (3 code PRs: 10/10 + 10/10 + 8/10; 4 doc/roadmap PRs: 10/10 + 10/10 + 7/10→fixed + 9/10). Shepherd-loop discipline re-enforced after a process miss mid-cycle — logged as feedback memory `feedback_shepherd_loop_per_pr.md`.
|
|
33
|
+
|
|
7
34
|
## [1.35.0] - 2026-04-19
|
|
8
35
|
|
|
9
36
|
### Added
|
|
@@ -240,11 +240,13 @@ When Anthropic provides official plugins or tools that handle something:
|
|
|
240
240
|
|
|
241
241
|
Claude Code's **effort level** controls how much thinking the model does before responding. Higher effort = deeper reasoning but more tokens.
|
|
242
242
|
|
|
243
|
+
> ⚠️ **On Opus 4.7, effort below `xhigh` breaks SDLC compliance in practice.** Unlike 4.6, Opus 4.7 respects effort levels *strictly* — at `high` or below it scopes work tighter (shallow reasoning, skipped TDD, no self-review) rather than going above-and-beyond. Treat the table below accordingly: **`max` is the recommended default, `xhigh` is the floor**, `high` or below is for trivial grep/search subagents only.
|
|
244
|
+
|
|
243
245
|
| Level | When to Use | How to Set |
|
|
244
246
|
|-------|-------------|------------|
|
|
245
|
-
| `high` |
|
|
246
|
-
| `xhigh` | **
|
|
247
|
-
| `max` |
|
|
247
|
+
| `high` or below | **Not for SDLC work on Opus 4.7.** Only for trivial grep/search subagents or one-shot questions that don't require planning | `effort: high` in a specific subagent frontmatter only |
|
|
248
|
+
| `xhigh` | **Floor for SDLC work on Opus 4.7.** Long-running tasks, repeated tool calls, deep exploration. Claude Code defaults to this on Opus 4.7 | `/effort xhigh` or set in skill frontmatter |
|
|
249
|
+
| `max` | **Recommended default for Opus 4.7 SDLC work.** Multi-file changes, architecture decisions, debugging, cross-model reviews, any task touching wizard/skill/CI code | `/effort max` (session only — resets next session) |
|
|
248
250
|
|
|
249
251
|
**Effort level changes in Opus 4.7 (April 2026):**
|
|
250
252
|
- **`xhigh` is new** — sits between `high` and `max`, designed for coding and agentic work (30+ minute tasks with token budgets in the millions)
|
|
@@ -255,7 +257,7 @@ Claude Code's **effort level** controls how much thinking the model does before
|
|
|
255
257
|
|
|
256
258
|
**Why `high` was the previous default:** Claude Code uses **adaptive thinking** to dynamically allocate reasoning budget per turn. On Pro and Max plans, the default effort level was **medium (85)**, which causes the model to under-allocate reasoning on complex multi-step tasks — leading to shallow analysis, missed edge cases, and "lazy" outputs. This was [confirmed by Anthropic engineer Boris Cherny](https://github.com/anthropics/claude-code/issues/42796) and is documented at [code.claude.com](https://code.claude.com/docs/en/model-config). API, Team, and Enterprise plans default to high effort and are not affected.
|
|
257
259
|
|
|
258
|
-
The `/sdlc` skill sets `effort: high` in its frontmatter, overriding the medium default on every SDLC invocation.
|
|
260
|
+
The `/sdlc` skill sets `effort: high` in its frontmatter as a baseline, overriding the medium default on every SDLC invocation. **On Opus 4.7, run `/effort max` at session start** — the frontmatter is a floor, not a ceiling, and `max` is where SDLC-compliant work actually happens on 4.7.
|
|
259
261
|
|
|
260
262
|
**Nuclear option — disable adaptive thinking entirely:** Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` in your environment or settings.json `env` block. This forces a fixed reasoning budget per turn instead of letting the model dynamically allocate. Use this if you observe persistent quality issues even with `effort: high`. See [Claude Code model config docs](https://code.claude.com/docs/en/model-config) for details.
|
|
261
263
|
|
|
@@ -934,7 +936,7 @@ Claude Code supports both 200K and 1M context windows. **`opus[1m]` is an opt-in
|
|
|
934
936
|
|
|
935
937
|
**How to opt out:** remove the `model` line from `.claude/settings.json`, or run `/model` and pick "Default (recommended)".
|
|
936
938
|
|
|
937
|
-
**Cost awareness:** Larger windows let you consume more tokens in one session, and total cost always scales with tokens consumed regardless of tier. Use `/
|
|
939
|
+
**Cost awareness:** Larger windows let you consume more tokens in one session, and total cost always scales with tokens consumed regardless of tier. Use `/usage` to monitor (aliases: `/cost`, `/stats`) — a 900K-token session is meaningfully more expensive than an 80K one even at standard rates.
|
|
938
940
|
|
|
939
941
|
**Autocompact pairing (important):** If you opt into `opus[1m]`, also set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` — otherwise CC's default autocompact fires at ~76K and destroys the headroom you're paying for. Step 9.5 writes both together when you opt in.
|
|
940
942
|
|
|
@@ -2711,7 +2713,7 @@ If deployment fails or post-deploy verification catches issues:
|
|
|
2711
2713
|
|
|
2712
2714
|
**SDLC.md:**
|
|
2713
2715
|
```markdown
|
|
2714
|
-
<!-- SDLC Wizard Version: 1.
|
|
2716
|
+
<!-- SDLC Wizard Version: 1.36.0 -->
|
|
2715
2717
|
<!-- Setup Date: [DATE] -->
|
|
2716
2718
|
<!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
|
|
2717
2719
|
<!-- Git Workflow: [PRs or Solo] -->
|
|
@@ -3361,7 +3363,7 @@ Practical techniques to reduce token consumption without sacrificing quality.
|
|
|
3361
3363
|
|
|
3362
3364
|
| Tool | What It Shows | When to Use |
|
|
3363
3365
|
|------|---------------|-------------|
|
|
3364
|
-
| `/
|
|
3366
|
+
| `/usage` | Session total: USD, API time, code changes (aliases: `/cost`, `/stats`) | After a session to review spend |
|
|
3365
3367
|
| `/context` | What's consuming context window space | When hitting context limits |
|
|
3366
3368
|
| Status line | Real-time `cost.total_cost_usd` + token counts | Continuous monitoring |
|
|
3367
3369
|
|
|
@@ -3772,7 +3774,7 @@ Walk through updates? (y/n)
|
|
|
3772
3774
|
Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
|
|
3773
3775
|
|
|
3774
3776
|
```markdown
|
|
3775
|
-
<!-- SDLC Wizard Version: 1.
|
|
3777
|
+
<!-- SDLC Wizard Version: 1.36.0 -->
|
|
3776
3778
|
<!-- Setup Date: 2026-01-24 -->
|
|
3777
3779
|
<!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
|
|
3778
3780
|
<!-- Git Workflow: PRs -->
|
package/package.json
CHANGED
package/skills/update/SKILL.md
CHANGED
|
@@ -46,9 +46,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
|
|
|
46
46
|
|
|
47
47
|
```
|
|
48
48
|
Installed: 1.24.0
|
|
49
|
-
Latest: 1.
|
|
49
|
+
Latest: 1.36.0
|
|
50
50
|
|
|
51
51
|
What changed:
|
|
52
|
+
- [1.36.0] CC 2.1.118 `/usage` canonical + aliases, Tier 2 dead-gate fix (#215), score-history max_score correctness (#211), setup-bun regression guard (#210), post-mortem learnings (#220-222), GPT-5.5 adoption plan (#223), MCP-tool hooks + #198 re-verify in backlog (#218/#219)
|
|
53
|
+
- [1.35.0] PreCompact seam gate + self-heal (#208/#209), effort auto-bump on LOW/FAILED/CONFUSED (#195), wizard staleness nudge (#196), Codex CI-log audit pattern, ...
|
|
52
54
|
- [1.34.0] API feature detection shepherd for Claude releases, Memory Audit Protocol with 7 verified lessons (+2 caught-and-retracted), /less-permission-prompts surfaced, ...
|
|
53
55
|
- [1.33.0] opus[1m] as SDLC default, dual-channel install drift guardrails, model/effort session-start nudge, ...
|
|
54
56
|
- [1.32.0] Opus 4.7 + xhigh support, model/effort upgrade detection, benchmark ceiling audit, ...
|