npm - agentic-sdlc-wizard - Versions diffs - 1.29.0 → 1.30.0 - Mend

agentic-sdlc-wizard 1.29.0 → 1.30.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +26 -0
package/CLAUDE_CODE_SDLC_WIZARD.md +22 -3
package/README.md +1 -1
package/package.json +1 -1
package/skills/update/SKILL.md +2 -1

package/.claude-plugin/marketplace.json CHANGED

@@ -13,7 +13,7 @@
       "name": "sdlc-wizard",
       "source": ".",
       "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
-      "version": "1.29.0",
+      "version": "1.30.0",
       "author": {
         "name": "Stefan Ayala"
       },

package/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "sdlc-wizard",
-  "version": "1.29.0",
+  "version": "1.30.0",
   "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
   "author": {
     "name": "Stefan Ayala",

package/CHANGELOG.md CHANGED

@@ -4,6 +4,32 @@ All notable changes to the SDLC Wizard.
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
+## [1.30.0] - 2026-04-12
+### Added
+- CC degradation detection (#96, PR #166)
+  - Score persistence: CI now git-commits `score-history.jsonl` to PR branch after E2E runs, feeding CUSUM drift detection with real data
+  - Fork guard (`head.repo.full_name == github.repository`) prevents silent push failures on fork PRs
+  - Injection-safe: `head.ref` passed via `env:` block, not inline `${{ }}`
+  - Wizard effort section hardened: explains adaptive thinking root cause (Boris Cherny GH #42796), scopes "medium default" to Pro/Max plans, cites code.claude.com docs
+  - `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING` documented as opt-in hardening (not default)
+  - Anti-laziness CLAUDE.md guidance section targeting specific mechanisms (adaptive thinking, effort levels, thinking budget)
+  - 14 behavioral tests (`test-degradation-detection.sh`)
+- Model A/B comparison workflow (#94, PRs #164, #165)
+  - `workflow_dispatch` benchmark: Opus vs Sonnet on E2E scenarios with 95% CI
+  - Matrix strategy over scenarios, parameterized model/trials/max_turns
+  - Wizard installation verification before simulation (P0 fix)
+  - jq-based artifact construction (safe against empty outputs)
+  - 37 quality tests (`test-model-comparison.sh`)
+- Firmware-embedded E2E fixture (#78, PR #163)
+  - Python SD card overlay manager, 3 device configs (Raspberry Pi, STM32, ESP32)
+  - SIL + config validation tests within fixture
+  - Domain-adaptive testing proof: firmware indicators, Python overlay, multi-device differentiation
+  - 12 quality tests (`test-firmware-fixture.sh`)
+### Fixed
+- P0 shell injection in model comparison workflow: `${{ inputs.model }}` directly in `run:` blocks. Fixed by passing all inputs through `env:` block (caught by Codex review)
 ## [1.29.0] - 2026-04-07
 ### Added
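
The 1.30.0 entry above says CI commits `score-history.jsonl` so that CUSUM drift detection has real data, but this package diff shows neither the detector nor the file schema. As a rough illustration only, a one-sided (downward) CUSUM over JSON Lines records might look like the sketch below. The `score` field name and the `target`, `slack`, and `threshold` values are assumptions made for this example, not the wizard's actual configuration.

```python
import json


def load_scores(jsonl_text, field="score"):
    """Read one numeric score per non-empty JSON Lines record.

    The "score" field name is hypothetical; the real schema is not shown here.
    """
    scores = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            scores.append(float(json.loads(line)[field]))
    return scores


def cusum_downward(scores, target, slack, threshold):
    """One-sided CUSUM for detecting a sustained drop below `target`.

    Accumulates how far each score falls below (target - slack), resetting to
    zero whenever the sum would go negative. Returns the index of the first
    observation at which the cumulative sum exceeds `threshold`, or None.
    """
    cumulative = 0.0
    for index, score in enumerate(scores):
        cumulative = max(0.0, cumulative + (target - slack - score))
        if cumulative > threshold:
            return index
    return None


# Illustrative data only: four stable runs followed by a gradual decline.
history = "\n".join(
    json.dumps({"score": s}) for s in [8.0, 8.1, 7.9, 8.0, 7.0, 6.8, 6.9, 6.7]
)
scores = load_scores(history)
drift_index = cusum_downward(scores, target=8.0, slack=0.25, threshold=2.0)
print(drift_index)
```

With these made-up numbers the alarm fires on the third low run rather than the first, which is the point of accumulating deviations: a single noisy score does not trigger it, while a sustained decline does.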

package/CLAUDE_CODE_SDLC_WIZARD.md CHANGED

@@ -224,7 +224,11 @@ Claude Code's **effort level** controls how much thinking the model does before
 | `high` | **Default for all SDLC work.** Features, bug fixes, refactoring, tests, reviews | `effort: high` in skill frontmatter (already set) |
 | `max` | LOW confidence, FAILED 2x, architecture decisions, complex debugging, cross-model reviews | `/effort max` (session only — resets next session) |
-**Why `high` is the default:** The `/sdlc` skill sets `effort: high` in its frontmatter, so every SDLC invocation automatically uses high effort. This gives thorough reasoning without the unbounded token cost of `max`.
+**Why `high` is the default:** Claude Code uses **adaptive thinking** to dynamically allocate reasoning budget per turn. On Pro and Max plans, the default effort level is **medium (85)**, which causes the model to under-allocate reasoning on complex multi-step tasks — leading to shallow analysis, missed edge cases, and "lazy" outputs. This was [confirmed by Anthropic engineer Boris Cherny](https://github.com/anthropics/claude-code/issues/42796) and is documented at [code.claude.com](https://code.claude.com/docs/en/model-config). API, Team, and Enterprise plans default to high effort and are not affected.
+The `/sdlc` skill sets `effort: high` in its frontmatter, overriding the medium default on every SDLC invocation. This gives thorough reasoning without the unbounded token cost of `max`.
+**Nuclear option — disable adaptive thinking entirely:** Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` in your environment or settings.json `env` block. This forces a fixed reasoning budget per turn instead of letting the model dynamically allocate. Use this if you observe persistent quality issues even with `effort: high`. See [Claude Code model config docs](https://code.claude.com/docs/en/model-config) for details.
 **When to escalate to `max`:**
 - You hit LOW confidence on your approach — deeper thinking may find clarity
@@ -242,6 +246,21 @@ Claude Code's **effort level** controls how much thinking the model does before
 > See also: the **Effort** column in the [Confidence Check table](#confidence-check-required) below for per-confidence-level guidance on when to escalate to `max`.
+### Anti-Laziness Guidance for CLAUDE.md
+If you notice Claude Code producing shallow outputs despite `effort: high`, add these instructions to your project's `CLAUDE.md`. These target the **specific mechanisms** behind quality degradation — adaptive thinking and effort levels — rather than vague directives:
+```markdown
+## Quality Anchoring
+- This project uses effort: high via SDLC skill frontmatter. Do not reduce reasoning depth.
+- Adaptive thinking may under-allocate your thinking budget on complex tasks. When working on
+  multi-file changes, architecture decisions, or debugging: reason through the full problem
+  before acting, even if the system prompt suggests taking the "simplest approach first."
+- If you catch yourself skipping steps, re-read the task requirements and verify completeness.
+```
+**Why this works:** Claude Code's hidden system prompt includes "Go straight to the point. Try the simplest approach first." This is good for simple queries but causes the model to under-invest in reasoning on complex SDLC tasks. The instructions above don't fight the system prompt — they provide task-specific context that justifies deeper reasoning. Note that CLAUDE.md instructions can be partially overridden by the system prompt, so `effort: high` in skill frontmatter remains the primary defense.
 ---
 ## Claude Code Feature Updates
@@ -2609,7 +2628,7 @@ If deployment fails or post-deploy verification catches issues:
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.29.0 -->
+<!-- SDLC Wizard Version: 1.30.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -3668,7 +3687,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 ```markdown
-<!-- SDLC Wizard Version: 1.29.0 -->
+<!-- SDLC Wizard Version: 1.30.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->
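
The hunks above describe wizard state stored in `SDLC.md` as `<!-- Key: value -->` metadata comments. The wizard's own parsing logic is not part of this diff; the sketch below is one possible way to read such comments, assuming each one sits on a single line. The function names `parse_wizard_metadata` and `completed_steps` are hypothetical, and the sample text is a shortened copy of the fields shown in the diff.

```python
import re

# Matches single-line HTML comments of the form <!-- Key: value -->.
METADATA_PATTERN = re.compile(r"<!--\s*([^:<>]+?)\s*:\s*(.*?)\s*-->")


def parse_wizard_metadata(markdown_text):
    """Collect `<!-- Key: value -->` comments into a dict keyed by the label."""
    return {key: value for key, value in METADATA_PATTERN.findall(markdown_text)}


def completed_steps(metadata):
    """Split the comma-separated "Completed Steps" value into a list."""
    raw = metadata.get("Completed Steps", "")
    return [step.strip() for step in raw.split(",") if step.strip()]


sample = """\
<!-- SDLC Wizard Version: 1.30.0 -->
<!-- Setup Date: 2026-01-24 -->
<!-- Completed Steps: step-0.1, step-0.2, step-1 -->
<!-- Git Workflow: PRs -->
"""
metadata = parse_wizard_metadata(sample)
print(metadata["SDLC Wizard Version"])
print(completed_steps(metadata))
```

Comments without a `Key: value` shape are skipped rather than raising an error, so ordinary HTML comments elsewhere in the file would not interfere.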

package/README.md CHANGED

@@ -229,7 +229,7 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
 | Document | What It Covers |
 |----------|---------------|
 | [ARCHITECTURE.md](ARCHITECTURE.md) | System design, 5-layer diagram, data flows, file structure |
-| [CI_CD.md](CI_CD.md) | All 6 workflows, E2E scoring, tier system, SDP, integrity checks |
+| [CI_CD.md](CI_CD.md) | All 7 workflows, E2E scoring, tier system, SDP, integrity checks |
 | [SDLC.md](SDLC.md) | Version tracking, enforcement rules, SDLC configuration |
 | [TESTING.md](TESTING.md) | Testing philosophy, test diamond, TDD approach |
 | [CHANGELOG.md](CHANGELOG.md) | Version history, what changed and when |

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agentic-sdlc-wizard",
-  "version": "1.29.0",
+  "version": "1.30.0",
   "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
   "bin": {
     "sdlc-wizard": "./cli/bin/sdlc-wizard.js"

package/skills/update/SKILL.md CHANGED

@@ -46,9 +46,10 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
 ```
 Installed: 1.24.0
-Latest:    1.29.0
+Latest:    1.30.0
 What changed:
+- [1.30.0] Firmware fixture, model A/B comparison workflow, CC degradation detection, ...
 - [1.29.0] Node 24 compliance, autocompact in settings.json, effectiveness scoreboard, ...
 - [1.28.0] Autocompact benchmarking methodology, canary fact mechanism, benchmark harness, ...
 - [1.27.0] Domain-adaptive testing diamond, 3 domain fixtures, 25 quality tests, ...
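
The update skill's hunk context says it parses all CHANGELOG entries between the installed version and the latest. How the skill does this is not shown in the diff; the sketch below is one way to select those versions, assuming headings follow the `## [x.y.z] - date` form seen in `CHANGELOG.md`. The function name `versions_between` and the sample changelog text are hypothetical.

```python
import re

# Matches headings such as "## [1.30.0] - 2026-04-12".
HEADING_PATTERN = re.compile(r"^## \[(\d+)\.(\d+)\.(\d+)\]", re.MULTILINE)


def versions_between(changelog_text, installed, latest):
    """Return versions v with installed < v <= latest, newest first.

    Versions are compared as integer tuples so that 1.10.0 sorts after 1.9.0.
    Pre-release suffixes are not handled in this sketch.
    """
    low = tuple(int(part) for part in installed.split("."))
    high = tuple(int(part) for part in latest.split("."))
    found = {
        tuple(int(part) for part in match.groups())
        for match in HEADING_PATTERN.finditer(changelog_text)
    }
    selected = sorted((v for v in found if low < v <= high), reverse=True)
    return [".".join(str(part) for part in v) for v in selected]


changelog = """\
## [1.30.0] - 2026-04-12
### Added
## [1.29.0] - 2026-04-07
### Added
## [1.24.0] - (date omitted in this sample)
"""
print(versions_between(changelog, "1.24.0", "1.30.0"))
```

Comparing integer tuples instead of raw strings matters here: a plain string comparison would place `1.9.0` after `1.30.0`.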