agentic-sdlc-wizard 1.60.0 → 1.62.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -13,7 +13,7 @@
13
13
  "name": "sdlc-wizard",
14
14
  "source": ".",
15
15
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
16
- "version": "1.60.0",
16
+ "version": "1.62.0",
17
17
  "author": {
18
18
  "name": "Stefan Ayala"
19
19
  },
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "sdlc-wizard",
3
- "version": "1.60.0",
3
+ "version": "1.62.0",
4
4
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
5
5
  "author": {
6
6
  "name": "Stefan Ayala",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,62 @@ All notable changes to the SDLC Wizard.
4
4
 
5
5
  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
6
6
 
7
+ ## [1.62.0] - 2026-04-30
8
+
9
+ ### Fixed
10
+
11
+ - **Backfilled 5 corrupted rows in `tests/e2e/score-history.jsonl`** — closes ROADMAP #211. UI-scenario rows (`add-ui-component`, `ui-styling-change`) had `score:11, max_score:10` because the design_system criterion adds an 11th point but the historical writer capped max at 10. Live scoring code was already correct (PR #216, v1.36.0); this PR backfills the historical data so trend analytics aren't poisoned. Found via Codex strategic-priority review: 5 rows on lines 22–25 + 30 fixed via single `sed -i 's/"score":11,"max_score":10/"score":11,"max_score":11/g'`. JSON validity preserved on all 5 rows (verified via per-line `jq empty`).
12
+
13
+ ### Closed (paperwork — already shipped, table rows were stale)
14
+
15
+ Codex strategic-priority review (`.reviews/grouping-review.md`) audited the open ROADMAP table and found 6 rows still presented as open despite shipping in earlier releases:
16
+
17
+ - **#207** — Community feature-discovery scanner (scanner shipped v1.39.0, fetcher v1.56.0 PR #286)
18
+ - **#215** — Tier 2 dead persist step (fixed v1.36.0; Tier 2 jobs subsequently deleted entirely per #212 Option 1)
19
+ - **#217** — `model-effort-check.sh` loud warning below xhigh (shipped 2026-04-24; three-tier logic at `hooks/model-effort-check.sh:44`)
20
+ - **#78** — Firmware E2E fixture (shipped earlier)
21
+ - **#79** — Domain-Adaptive Testing Diamond (shipped earlier)
22
+ - **#80** — SDLC Effectiveness Scoreboard (shipped earlier)
23
+
24
+ All six table rows now properly tombstoned with the shipping release reference. Reduces roadmap noise so future "what's next?" reads are honest.
25
+
26
+ ### Verified (config-side)
27
+
28
+ - **#219 — model-pin guidance against CC 2.1.117+ persistence change.** Verified locally on CC 2.1.118 (npm latest 2.1.123): both `cli/templates/settings.json` and `.claude/settings.json` have `has("model") == false`. The new persistence semantics (session-picked model now remembered across restarts) are orthogonal to #198's recommendation to omit the pin. `tests/test-cli.sh:1155` already asserts no default model pin so no new test needed. Optional manual UX check noted in roadmap row.
29
+
30
+ ### Files
31
+
32
+ - `tests/e2e/score-history.jsonl` (5 rows backfilled, lines 22–25 + 30)
33
+ - `ROADMAP.md` (8 stale rows tombstoned)
34
+ - `CHANGELOG.md`, `SDLC.md`, `CLAUDE_CODE_SDLC_WIZARD.md`, `skills/update/SKILL.md`, `package.json`, `.claude-plugin/plugin.json` + `marketplace.json` (1.61.0 → 1.62.0)
35
+
36
+ ## [1.61.0] - 2026-04-30
37
+
38
+ ### Added
39
+
40
+ - **Calibration scenario suite — closes ROADMAP #96 Phase 3 PR 2.** New `tests/e2e/scenarios/calibration-careful-read.md` is the first scenario in a new `calibration-*` family designed specifically to reward self-review and punish rushed implementations. Builds on PR 1 (the lift-proof harness, v1.60.0).
41
+
42
+ The scenario asks the agent to add a `parsePrice(input)` utility that handles five distinct formats (standard, cents-only, comma thousand-separator, surrounding whitespace, invalid). A self-reviewing agent reads all five requirements before coding; a rushed agent skims the first example and ships `parseFloat(s.replace('$', ''))` — which silently corrupts `'$1,000.00'` to `1` (a thousand-fold pricing bug). The score delta between these two agent profiles on this scenario is a load-bearing **calibration signal** for `lift-proof.sh`.
43
+
44
+ Out of scope: actual end-to-end calibration verification (does a low-effort agent actually score lower on this scenario than an xhigh agent?). That's deferred to ROADMAP #212(i) Prove-It Gate paired runs. PR 2 ships the scenario; #212(i) runs the comparison.
45
+
46
+ ### Tests
47
+
48
+ - New `tests/test-calibration-scenarios.sh` (6 tests): asserts a `calibration-*.md` exists, has the standard scenario format (`# Scenario:`, `## Task`, `## Fixture:`, calibration-signal docs), and lists ≥2 numbered requirements (the careful-read multi-path signal).
49
+ - Wired into `ci.yml` and `CONTRIBUTING.md` test list.
50
+
51
+ ### Files
52
+
53
+ - `tests/e2e/scenarios/calibration-careful-read.md` (new) — the parsePrice scenario
54
+ - `tests/test-calibration-scenarios.sh` (new) — format validator
55
+ - `.github/workflows/ci.yml` — new test step
56
+ - `CONTRIBUTING.md` — test list parity
57
+ - `CHANGELOG.md`, `ROADMAP.md`, `SDLC.md`, `CLAUDE_CODE_SDLC_WIZARD.md`, `skills/update/SKILL.md`, `package.json`, `.claude-plugin/plugin.json` + `marketplace.json` (1.60.0 → 1.61.0)
58
+
59
+ ### What's still TBD
60
+
61
+ The `calibration-*` family is intentionally extensible. Future scenarios could test other SDLC virtues — scope guard (don't fix things outside the task), TDD discipline (write tests before implementation), regression hygiene (don't break existing tests). Each new calibration scenario is a separate PR; PR 2 establishes the format.
62
+
7
63
  ## [1.60.0] - 2026-04-30
8
64
 
9
65
  ### Added
@@ -2974,7 +2974,7 @@ If deployment fails or post-deploy verification catches issues:
2974
2974
 
2975
2975
  **SDLC.md:**
2976
2976
  ```markdown
2977
- <!-- SDLC Wizard Version: 1.60.0 -->
2977
+ <!-- SDLC Wizard Version: 1.62.0 -->
2978
2978
  <!-- Setup Date: [DATE] -->
2979
2979
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
2980
2980
  <!-- Git Workflow: [PRs or Solo] -->
@@ -4039,7 +4039,7 @@ Walk through updates? (y/n)
4039
4039
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
4040
4040
 
4041
4041
  ```markdown
4042
- <!-- SDLC Wizard Version: 1.60.0 -->
4042
+ <!-- SDLC Wizard Version: 1.62.0 -->
4043
4043
  <!-- Setup Date: 2026-01-24 -->
4044
4044
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
4045
4045
  <!-- Git Workflow: PRs -->
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "agentic-sdlc-wizard",
3
- "version": "1.60.0",
3
+ "version": "1.62.0",
4
4
  "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
5
5
  "bin": {
6
6
  "sdlc-wizard": "cli/bin/sdlc-wizard.js"
@@ -93,9 +93,11 @@ Parse CHANGELOG entries between the user's installed version and latest. Present
93
93
 
94
94
  ```
95
95
  Installed: 1.42.0
96
- Latest: 1.60.0
96
+ Latest: 1.62.0
97
97
 
98
98
  What changed:
99
+ - [1.62.0] roadmap hygiene + #211 backfill — closes paperwork-stale rows (#207, #211 historical, #215, #217, #78, #79, #80, #219). Backfilled 5 corrupted `score-history.jsonl` rows from `max_score:10` → `max_score:11` (UI scenarios with design_system criterion). Codex strategic review confirmed scope.
100
+ - [1.61.0] calibration scenarios for #96 Phase 3 PR 2 — `tests/e2e/scenarios/calibration-careful-read.md` (parsePrice with 5 edge-case formats) tests whether self-review catches missed requirements. Score delta between SDLC and naive agents on this scenario is a calibration signal for `lift-proof.sh`
99
101
  - [1.60.0] wizard-installation lift-proof harness (#96 Phase 3 PR 1) — `tests/e2e/lift-proof.sh` runs same scenario on bare vs wizard-installed fixture, emits score delta. Closes the "does the wizard work?" question. Honestly zero-API (sim + eval on Max)
100
102
  - [1.59.0] evaluator on Max via `claude --print` (#228) — `EVAL_USE_CLI=1` swaps `evaluate.sh`'s per-criterion judge transport from `curl` → API to `claude --print --output-format json`. local-shepherd.sh sets it by default, so the local path is honestly zero-API
101
103
  - [1.58.0] ground-truth gate for E2E benchmark (#96 Phase 2) — `tests/e2e/ground-truth.sh` runs `npm test` post-sim; final score capped at 5 if tests fail. Catches "agent followed protocol but produced broken code"