agentic-sdlc-wizard 1.61.0 → 1.63.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,54 @@ All notable changes to the SDLC Wizard.
|
|
|
4
4
|
|
|
5
5
|
> **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
|
|
6
6
|
|
|
7
|
+
## [1.63.0] - 2026-04-30
|
|
8
|
+
|
|
9
|
+
### Added (cache-cost observability closeout — closes ROADMAP #204)
|
|
10
|
+
|
|
11
|
+
- **Explicit cache-miss regression test in `tests/test-token-spike.sh`.** ROADMAP #204 had a Prove-It Gate: "require at least one quality test proves the hook/skill catches an actual cache-cost regression pattern." This release closes that gate. New `test_cache_miss_pattern_triggers_spike_warning` builds a 20-row baseline of cache-hit-heavy sessions (high cache_read, low cache_creation), appends one cache-miss session (cache_read collapses to 0, cache_creation spikes to 50000), and asserts the >2σ spike warning fires. Negative control `test_high_cache_read_no_warning` confirms the detector keys on cost-bearing fields (`costly_tokens = input + cache_creation + output`), not raw token count — a session with 10× cache reads but unchanged costly_tokens does NOT fire (hot cache is healthy, not expensive).
|
|
12
|
+
|
|
13
|
+
- **Wizard doc gains "Cache-Cost Surprises" subsection in Token Efficiency.** Quantifies the 10-20× silent blowup pattern, points at `hooks/token-spike-check.sh` as the detection mechanism, references the Anthropic 2026-04-23 post-mortem, and lists practices to avoid silent invalidation. NOT added to `skills/sdlc/SKILL.md` — that file is already at the 5000-token session-load threshold (audit-session-load.sh trim ceiling). The hook itself fires the warning automatically; the wizard doc has the action-focused guidance for when a user explicitly asks.
|
|
14
|
+
|
|
15
|
+
### Closed (absorbed)
|
|
16
|
+
|
|
17
|
+
- **#204 — Cache-cost guardrail hook.** Codex strategic-pass-2 review confirmed: the cache-miss pattern surfaces directly in `costly_tokens` (the metric `#220`'s `hooks/token-spike-check.sh` tracks). #204 is absorbed by #220 + the v1.63.0 regression test + docs. No new hook needed — the existing one already covers it.
|
|
18
|
+
|
|
19
|
+
### Files
|
|
20
|
+
|
|
21
|
+
- `tests/test-token-spike.sh` (+2 tests, 16 total)
|
|
22
|
+
- `CLAUDE_CODE_SDLC_WIZARD.md` (Cache-Cost Surprises subsection in Token Efficiency)
|
|
23
|
+
- `ROADMAP.md` (#204 marked absorbed v1.63.0)
|
|
24
|
+
- `CHANGELOG.md`, `SDLC.md`, `skills/update/SKILL.md`, `package.json`, `.claude-plugin/plugin.json` + `marketplace.json` (1.62.0 → 1.63.0)
|
|
25
|
+
|
|
26
|
+
## [1.62.0] - 2026-04-30
|
|
27
|
+
|
|
28
|
+
### Fixed
|
|
29
|
+
|
|
30
|
+
- **Backfilled 5 corrupted rows in `tests/e2e/score-history.jsonl`** — closes ROADMAP #211. UI-scenario rows (`add-ui-component`, `ui-styling-change`) had `score:11, max_score:10` because the design_system criterion adds an 11th point but the historical writer capped max at 10. Live scoring code was already correct (PR #216, v1.36.0); this PR backfills the historical data so trend analytics aren't poisoned. Found via Codex strategic-priority review: 5 rows on lines 22–25 + 30 fixed via single `sed -i 's/"score":11,"max_score":10/"score":11,"max_score":11/g'`. JSON validity preserved on all 5 rows (verified via per-line `jq empty`).
|
|
31
|
+
|
|
32
|
+
### Closed (paperwork — already shipped, table rows were stale)
|
|
33
|
+
|
|
34
|
+
Codex strategic-priority review (`.reviews/grouping-review.md`) audited the open ROADMAP table and found 6 rows still presented as open despite shipping in earlier releases:
|
|
35
|
+
|
|
36
|
+
- **#207** — Community feature-discovery scanner (scanner shipped v1.39.0, fetcher v1.56.0 PR #286)
|
|
37
|
+
- **#215** — Tier 2 dead persist step (fixed v1.36.0; Tier 2 jobs subsequently deleted entirely per #212 Option 1)
|
|
38
|
+
- **#217** — `model-effort-check.sh` loud warning below xhigh (shipped 2026-04-24; three-tier logic at `hooks/model-effort-check.sh:44`)
|
|
39
|
+
- **#78** — Firmware E2E fixture (shipped earlier)
|
|
40
|
+
- **#79** — Domain-Adaptive Testing Diamond (shipped earlier)
|
|
41
|
+
- **#80** — SDLC Effectiveness Scoreboard (shipped earlier)
|
|
42
|
+
|
|
43
|
+
All six table rows now properly tombstoned with the shipping release reference. Reduces roadmap noise so future "what's next?" reads are honest.
|
|
44
|
+
|
|
45
|
+
### Verified (config-side)
|
|
46
|
+
|
|
47
|
+
- **#219 — model-pin guidance against CC 2.1.117+ persistence change.** Verified locally on CC 2.1.118 (npm latest 2.1.123): both `cli/templates/settings.json` and `.claude/settings.json` have `has("model") == false`. The new persistence semantics (session-picked model now remembered across restarts) are orthogonal to #198's recommendation to omit the pin. `tests/test-cli.sh:1155` already asserts no default model pin so no new test needed. Optional manual UX check noted in roadmap row.
|
|
48
|
+
|
|
49
|
+
### Files
|
|
50
|
+
|
|
51
|
+
- `tests/e2e/score-history.jsonl` (5 rows backfilled, lines 22–25 + 30)
|
|
52
|
+
- `ROADMAP.md` (8 stale rows tombstoned)
|
|
53
|
+
- `CHANGELOG.md`, `SDLC.md`, `CLAUDE_CODE_SDLC_WIZARD.md`, `skills/update/SKILL.md`, `package.json`, `.claude-plugin/plugin.json` + `marketplace.json` (1.61.0 → 1.62.0)
|
|
54
|
+
|
|
7
55
|
## [1.61.0] - 2026-04-30
|
|
8
56
|
|
|
9
57
|
### Added
|
|
@@ -2974,7 +2974,7 @@ If deployment fails or post-deploy verification catches issues:
|
|
|
2974
2974
|
|
|
2975
2975
|
**SDLC.md:**
|
|
2976
2976
|
```markdown
|
|
2977
|
-
<!-- SDLC Wizard Version: 1.
|
|
2977
|
+
<!-- SDLC Wizard Version: 1.63.0 -->
|
|
2978
2978
|
<!-- Setup Date: [DATE] -->
|
|
2979
2979
|
<!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
|
|
2980
2980
|
<!-- Git Workflow: [PRs or Solo] -->
|
|
@@ -3657,6 +3657,20 @@ claude_args: "--max-budget-usd 5.00 --max-turns 30"
|
|
|
3657
3657
|
|
|
3658
3658
|
For organization-wide cost tracking, enable `CLAUDE_CODE_ENABLE_TELEMETRY=1`. This exports per-request `cost_usd`, `input_tokens`, `output_tokens` to any OTLP-compatible backend (Datadog, Honeycomb, Prometheus).
|
|
3659
3659
|
|
|
3660
|
+
### Cache-Cost Surprises
|
|
3661
|
+
|
|
3662
|
+
Anthropic prompt-cache reads bill at ~10% of normal input rate ($1.50/M vs $15/M on Opus). When cache hits drop unexpectedly, the same workload silently costs **10–20×** more — a pattern HN has documented multiple times. Two common triggers:
|
|
3663
|
+
|
|
3664
|
+
1. **Mid-session edits to `CLAUDE.md`, `SDLC.md`, project settings, or any file the cached prefix references.** The cache key includes the project context; editing context invalidates the prefix and forces re-upload (`cache_creation` spikes, `cache_read` collapses).
|
|
3665
|
+
2. **Upstream caching bugs.** Anthropic's [2026-04-23 post-mortem](https://www.anthropic.com/engineering/april-23-postmortem) documented one bug that "continuously dropped thinking blocks from subsequent requests" — invisible until the invoice arrived.
|
|
3666
|
+
|
|
3667
|
+
**Detection:** the wizard's `hooks/token-spike-check.sh` (ROADMAP #220, on-by-default for projects with a `.metrics/` directory) tracks per-session `costly_tokens = input + cache_creation + output` (excluding `cache_read`) and warns at `>2σ` over a rolling baseline. The cache-miss pattern (cache_read collapses → cache_creation spikes) shows up in `costly_tokens` directly. Test coverage in `tests/test-token-spike.sh:test_cache_miss_pattern_triggers_spike_warning` proves the absorption.
|
|
3668
|
+
|
|
3669
|
+
**Practices to avoid silent cache-cost blowups:**
|
|
3670
|
+
- Avoid editing `CLAUDE.md`/`SDLC.md`/project settings mid-session when cache stability matters. If you must, accept the next request will re-upload the prefix.
|
|
3671
|
+
- Run `/usage` (or `/cost`) after long sessions to spot anomalies before they accumulate.
|
|
3672
|
+
- Inspect `.metrics/token-history.jsonl` directly when you suspect a regression — fields are documented at the top of `tests/e2e/token-analytics.sh`.
|
|
3673
|
+
|
|
3660
3674
|
---
|
|
3661
3675
|
|
|
3662
3676
|
## CI/CD Gotchas
|
|
@@ -4039,7 +4053,7 @@ Walk through updates? (y/n)
|
|
|
4039
4053
|
Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
|
|
4040
4054
|
|
|
4041
4055
|
```markdown
|
|
4042
|
-
<!-- SDLC Wizard Version: 1.
|
|
4056
|
+
<!-- SDLC Wizard Version: 1.63.0 -->
|
|
4043
4057
|
<!-- Setup Date: 2026-01-24 -->
|
|
4044
4058
|
<!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
|
|
4045
4059
|
<!-- Git Workflow: PRs -->
|
package/package.json
CHANGED
package/skills/update/SKILL.md
CHANGED
|
@@ -93,9 +93,11 @@ Parse CHANGELOG entries between the user's installed version and latest. Present
|
|
|
93
93
|
|
|
94
94
|
```
|
|
95
95
|
Installed: 1.42.0
|
|
96
|
-
Latest: 1.
|
|
96
|
+
Latest: 1.63.0
|
|
97
97
|
|
|
98
98
|
What changed:
|
|
99
|
+
- [1.63.0] cache-cost observability closeout (#204 absorbed by #220) — `tests/test-token-spike.sh` gains explicit cache-miss regression test + negative-control test. SDLC skill + wizard doc gain "Cache-Cost Surprises" sections covering 10-20× silent cost blowups (mid-session CLAUDE.md edits, idle pruning, upstream cache bugs) and detection via `hooks/token-spike-check.sh`'s `costly_tokens` metric.
|
|
100
|
+
- [1.62.0] roadmap hygiene + #211 backfill — closes paperwork-stale rows (#207, #211 historical, #215, #217, #78, #79, #80, #219). Backfilled 5 corrupted `score-history.jsonl` rows from `max_score:10` → `max_score:11` (UI scenarios with design_system criterion). Codex strategic review confirmed scope.
|
|
99
101
|
- [1.61.0] calibration scenarios for #96 Phase 3 PR 2 — `tests/e2e/scenarios/calibration-careful-read.md` (parsePrice with 5 edge-case formats) tests whether self-review catches missed requirements. Score delta between SDLC and naive agents on this scenario is a calibration signal for `lift-proof.sh`
|
|
100
102
|
- [1.60.0] wizard-installation lift-proof harness (#96 Phase 3 PR 1) — `tests/e2e/lift-proof.sh` runs same scenario on bare vs wizard-installed fixture, emits score delta. Closes the "does the wizard work?" question. Honestly zero-API (sim + eval on Max)
|
|
101
103
|
- [1.59.0] evaluator on Max via `claude --print` (#228) — `EVAL_USE_CLI=1` swaps `evaluate.sh`'s per-criterion judge transport from `curl` → API to `claude --print --output-format json`. local-shepherd.sh sets it by default, so the local path is honestly zero-API
|