npm - agentic-sdlc-wizard - Versions diffs - 1.57.0 → 1.58.0 - Mend

agentic-sdlc-wizard 1.57.0 → 1.58.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +36 -0
package/CLAUDE_CODE_SDLC_WIZARD.md +2 -2
package/package.json +1 -1
package/skills/update/SKILL.md +2 -1

package/.claude-plugin/marketplace.json CHANGED

@@ -13,7 +13,7 @@
       "name": "sdlc-wizard",
       "source": ".",
       "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
-      "version": "1.57.0",
+      "version": "1.58.0",
       "author": {
         "name": "Stefan Ayala"
       },

package/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "sdlc-wizard",
-  "version": "1.57.0",
+  "version": "1.58.0",
   "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
   "author": {
     "name": "Stefan Ayala",

package/CHANGELOG.md CHANGED

@@ -4,6 +4,42 @@ All notable changes to the SDLC Wizard.
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
+## [1.58.0] - 2026-04-30
+### Added
+- **Ground-truth gate for the E2E benchmark — closes ROADMAP #96 Phase 2.** New `tests/e2e/ground-truth.sh` runs the fixture's own test suite (`npm test`) post-simulation and emits structured JSON: `{tests_run, tests_pass, tests_rc, tests_tail}`. `local-shepherd.sh` calls it after the evaluator, before the score-history append. If tests fail, the final score is **capped at 5/10** (configurable via `GROUND_TRUTH_FAIL_CAP`) — the judge can't tell if `npm test` actually passes; only running it can.
+  Combined with Phase 1's de-coaching (v1.57.0), the benchmark now requires **both** judge approval **and** real test passage. Catches "agent followed protocol but produced broken code" false-positives — exactly the failure mode the v1.32.0 cross-model audit warned about.
+  Score-history rows now record `original_judge_score`, `tests_run`, `tests_pass`, `ground_truth_gated` so trend analytics can distinguish judge noise from real regressions.
+  Cross-platform timeout: uses `timeout` if available, falls back to `gtimeout` (coreutils on macOS), and finally a portable `perl` fallback for stock systems. Default `GROUND_TRUTH_TIMEOUT=120s`.
+  Escape hatches:
+  - `SDLC_SHEPHERD_SKIP_GROUND_TRUTH=1` — disables the gate entirely (raw judge scores)
+  - `SDLC_SHEPHERD_FIXTURE_DIR=...` — override fixture location (default `tests/e2e/fixtures/test-repo`)
+  - `SDLC_SHEPHERD_GROUND_TRUTH=...` — override script path (used by tests)
+### Tests
+- New `tests/test-ground-truth.sh` (11 tests): passing/failing/no-test/no-package/missing-dir/no-args/help/JSON-validity/timeout-enforcement.
+- 4 new integration tests in `tests/test-local-shepherd.sh` (38 total): gate caps judge=9 to score=5, gate leaves passing judge alone, no-tests fixture skips gate, `SKIP_GROUND_TRUTH` env var fully disables gate.
+- Wired into `ci.yml` and `CONTRIBUTING.md` test list.
+### Files
+- `tests/e2e/ground-truth.sh` (new, 105 lines)
+- `tests/test-ground-truth.sh` (new, 11 tests)
+- `tests/e2e/local-shepherd.sh` (gate logic + new score-history fields)
+- `tests/test-local-shepherd.sh` (4 new integration tests)
+- `.github/workflows/ci.yml` + `CONTRIBUTING.md` (test list)
+- Version bump 1.57.0 → 1.58.0
+### Phase 3 (future)
+- Install wizard files into the local-shepherd test fixture so the simulation tests "wizard-installed agent" vs "wizard-less agent." That's the actual *"does the wizard work?"* test — pairs with calibration scenarios that include subtle bugs to test self-review effectiveness.
 ## [1.57.0] - 2026-04-30
 ### Fixed
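
The gate and the timeout fallback described in the 1.58.0 entry can be sketched roughly as below. This is not the code shipped in `tests/e2e/ground-truth.sh` or `tests/e2e/local-shepherd.sh`, which this diff does not show. The function names `apply_ground_truth_cap` and `run_with_timeout`, the `true`/`false` string encoding of `tests_run` and `tests_pass`, and integer judge scores are all assumptions made for illustration. Only the environment variable names and the default cap of 5 come from the changelog.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; helper names and argument encoding are hypothetical.

# apply_ground_truth_cap JUDGE_SCORE TESTS_RUN TESTS_PASS
# Prints the final score. Assumes an integer score and "true"/"false" flags.
apply_ground_truth_cap() {
  local judge_score="$1" tests_run="$2" tests_pass="$3"
  local cap="${GROUND_TRUTH_FAIL_CAP:-5}"   # default cap per the changelog

  # Escape hatch: gate disabled, raw judge score passes through.
  if [ "${SDLC_SHEPHERD_SKIP_GROUND_TRUTH:-0}" = "1" ]; then
    echo "$judge_score"
    return 0
  fi

  # Cap only when tests actually ran, failed, and the judge scored above the cap.
  if [ "$tests_run" = "true" ] && [ "$tests_pass" != "true" ] \
     && [ "$judge_score" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$judge_score"
  fi
}

# run_with_timeout DURATION COMMAND [ARGS...]
# Tries `timeout`, then `gtimeout`, then a perl alarm/exec fallback.
run_with_timeout() {
  local duration="$1"
  shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$duration" "$@"
  elif command -v gtimeout >/dev/null 2>&1; then
    gtimeout "$duration" "$@"
  else
    # perl's alarm takes whole seconds, so strip a trailing "s" (e.g. "120s").
    # The pending alarm survives exec, so SIGALRM ends the command on expiry.
    perl -e 'alarm shift; exec @ARGV' "${duration%s}" "$@"
  fi
}
```

Under these assumptions, `apply_ground_truth_cap 9 true false` prints `5`, while `apply_ground_truth_cap 9 true true` prints `9`. Something like `run_with_timeout "${GROUND_TRUTH_TIMEOUT:-120s}" npm test` would then bound the fixture's test run. Note that the exit status on expiry differs between branches: GNU `timeout` returns 124, whereas the perl fallback ends with a SIGALRM status.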

package/CLAUDE_CODE_SDLC_WIZARD.md CHANGED

@@ -2974,7 +2974,7 @@ If deployment fails or post-deploy verification catches issues:
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.57.0 -->
+<!-- SDLC Wizard Version: 1.58.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -4039,7 +4039,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 ```markdown
-<!-- SDLC Wizard Version: 1.57.0 -->
+<!-- SDLC Wizard Version: 1.58.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agentic-sdlc-wizard",
-  "version": "1.57.0",
+  "version": "1.58.0",
   "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
   "bin": {
     "sdlc-wizard": "cli/bin/sdlc-wizard.js"

package/skills/update/SKILL.md CHANGED

@@ -93,9 +93,10 @@ Parse CHANGELOG entries between the user's installed version and latest. Present
 ```
 Installed: 1.42.0
-Latest:    1.57.0
+Latest:    1.58.0
 What changed:
+- [1.58.0] ground-truth gate for E2E benchmark (#96 Phase 2) — `tests/e2e/ground-truth.sh` runs `npm test` post-sim; final score capped at 5 if tests fail. Catches "agent followed protocol but produced broken code"
 - [1.57.0] de-coach E2E benchmark prompt (#96 Phase 1) — remove answer-key leakage that saturated benchmark scores at 10/10; new neutral task framing measures organic SDLC behavior
 - [1.56.0] community feature-discovery fetcher (#207) — `tests/e2e/fetch-community.sh` pulls Reddit + HN; pipe to `scan-community.sh` to surface candidate /slash-commands
 - [1.55.0] shrink weekly-update.yml dead code (#231 Phase 4) — delete env.VERSION_SCENARIO + has_overlap/overlap_paths outputs + placeholder/parse steps; 289 → 161 lines (-44%)