agentic-sdlc-wizard 1.59.0 → 1.60.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -13,7 +13,7 @@
13
13
  "name": "sdlc-wizard",
14
14
  "source": ".",
15
15
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
16
- "version": "1.59.0",
16
+ "version": "1.60.0",
17
17
  "author": {
18
18
  "name": "Stefan Ayala"
19
19
  },
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "sdlc-wizard",
3
- "version": "1.59.0",
3
+ "version": "1.60.0",
4
4
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
5
5
  "author": {
6
6
  "name": "Stefan Ayala",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,43 @@ All notable changes to the SDLC Wizard.
4
4
 
5
5
  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
6
6
 
7
+ ## [1.60.0] - 2026-04-30
8
+
9
+ ### Added
10
+
11
+ - **Wizard-installation lift-proof harness — closes ROADMAP #96 Phase 3 PR 1.** The benchmark series goes:
12
+ - **Phase 1** (v1.57.0): de-coached the simulation prompt so the agent isn't told what's scored.
13
+ - **Phase 2** (v1.58.0): added the ground-truth gate so the judge can't pass broken code.
14
+ - **Phase 3 PR 1** (this): measures **the wizard's contribution itself** by running the same scenario against a **bare** fixture (no `.claude/`) and a **wizard-installed** fixture, then emitting the score delta. Positive delta = the wizard lifts organic SDLC behavior. That's the load-bearing claim of the entire harness.
15
+
16
+ New script `tests/e2e/lift-proof.sh`:
17
+ - Two legs in tmp dirs — bare (empty `.claude/`) and wizard-installed (`hooks/skills/settings.json` copied from `$REPO_ROOT/.claude/`).
18
+ - Same parity flags as `local-shepherd.sh` (`--max-turns 55 --output-format json --add-dir tests/e2e`).
19
+ - Both legs invoke `evaluate.sh` under `EVAL_USE_CLI=1` (#228) — honestly zero-API end-to-end.
20
+ - Emits structured JSON artifact at `.benchmark/lift-proof-<timestamp>.json` with `bare_score`, `wizard_score`, `delta`, `lift`, plus provenance.
21
+ - `--dry-run` mode for CI smoke + tests; `--scenario` flag (default `add-feature`); `--output` to override artifact path.
22
+
23
+ - **`tests/e2e/lib/wizard-installer.sh:install_wizard_into_fixture()`** — single source of truth for "the wizard installed into a project." Replaces the inline `cp -R` lines in `local-shepherd.sh:_build_strip_dir`. Accepts `<source_dir> <target_dir>`, copies `hooks/`, `skills/`, and `settings.json` into `<target_dir>/.claude/`. Errors loudly on missing source or target instead of silently no-op'ing.
24
+
25
+ ### Tests
26
+
27
+ - New `tests/test-wizard-installer.sh` (17 tests): library exposes `install_wizard_into_fixture()`, runtime check that hooks+skills+settings land in target, error paths for missing source/target, `local-shepherd.sh` sources the lib (single source of truth), `lift-proof.sh` runs claude --print twice / calls evaluator / emits delta / inherits `EVAL_USE_CLI=1` / writes artifact.
28
+ - Wired into `ci.yml` and `CONTRIBUTING.md` test list.
29
+
30
+ ### Files
31
+
32
+ - `tests/e2e/lib/wizard-installer.sh` (new) — reusable install helper
33
+ - `tests/e2e/lift-proof.sh` (new) — bare-vs-wizard orchestrator
34
+ - `tests/e2e/local-shepherd.sh` — `_build_strip_dir` now sources the lib instead of inlining `cp -R` (DRY)
35
+ - `tests/test-wizard-installer.sh` (new, 17 tests)
36
+ - `.github/workflows/ci.yml` (new test step)
37
+ - `CONTRIBUTING.md` (test list)
38
+ - Version bump 1.59.0 → 1.60.0
39
+
40
+ ### What's still TBD for #96 Phase 3
41
+
42
+ - **PR 2** — calibration scenarios with subtle bugs to test self-review effectiveness (1-3 focused scenarios proving the harness catches real behavioral misses). Lands as a separate PR per Codex's overnight prioritization recommendation: keep fixture plumbing separate from scenario design.
43
+
7
44
  ## [1.59.0] - 2026-04-30
8
45
 
9
46
  ### Added
@@ -2974,7 +2974,7 @@ If deployment fails or post-deploy verification catches issues:
2974
2974
 
2975
2975
  **SDLC.md:**
2976
2976
  ```markdown
2977
- <!-- SDLC Wizard Version: 1.59.0 -->
2977
+ <!-- SDLC Wizard Version: 1.60.0 -->
2978
2978
  <!-- Setup Date: [DATE] -->
2979
2979
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
2980
2980
  <!-- Git Workflow: [PRs or Solo] -->
@@ -4039,7 +4039,7 @@ Walk through updates? (y/n)
4039
4039
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
4040
4040
 
4041
4041
  ```markdown
4042
- <!-- SDLC Wizard Version: 1.59.0 -->
4042
+ <!-- SDLC Wizard Version: 1.60.0 -->
4043
4043
  <!-- Setup Date: 2026-01-24 -->
4044
4044
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
4045
4045
  <!-- Git Workflow: PRs -->
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "agentic-sdlc-wizard",
3
- "version": "1.59.0",
3
+ "version": "1.60.0",
4
4
  "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
5
5
  "bin": {
6
6
  "sdlc-wizard": "cli/bin/sdlc-wizard.js"
@@ -93,9 +93,10 @@ Parse CHANGELOG entries between the user's installed version and latest. Present
93
93
 
94
94
  ```
95
95
  Installed: 1.42.0
96
- Latest: 1.59.0
96
+ Latest: 1.60.0
97
97
 
98
98
  What changed:
99
+ - [1.60.0] wizard-installation lift-proof harness (#96 Phase 3 PR 1) — `tests/e2e/lift-proof.sh` runs same scenario on bare vs wizard-installed fixture, emits score delta. Closes the "does the wizard work?" question. Honestly zero-API (sim + eval on Max)
99
100
  - [1.59.0] evaluator on Max via `claude --print` (#228) — `EVAL_USE_CLI=1` swaps `evaluate.sh`'s per-criterion judge transport from `curl` → API to `claude --print --output-format json`. local-shepherd.sh sets it by default, so the local path is honestly zero-API
100
101
  - [1.58.0] ground-truth gate for E2E benchmark (#96 Phase 2) — `tests/e2e/ground-truth.sh` runs `npm test` post-sim; final score capped at 5 if tests fail. Catches "agent followed protocol but produced broken code"
101
102
  - [1.57.0] de-coach E2E benchmark prompt (#96 Phase 1) — remove answer-key leakage that saturated benchmark scores at 10/10; new neutral task framing measures organic SDLC behavior