agentic-sdlc-wizard 1.28.0 → 1.30.0

@@ -13,7 +13,7 @@
  "name": "sdlc-wizard",
  "source": ".",
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
- "version": "1.28.0",
+ "version": "1.30.0",
  "author": {
    "name": "Stefan Ayala"
  },
@@ -1,6 +1,6 @@
  {
    "name": "sdlc-wizard",
-   "version": "1.28.0",
+   "version": "1.30.0",
    "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
    "author": {
      "name": "Stefan Ayala",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,58 @@ All notable changes to the SDLC Wizard.

  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.

+ ## [1.30.0] - 2026-04-12
+
+ ### Added
+ - CC degradation detection (#96, PR #166)
+   - Score persistence: CI now git-commits `score-history.jsonl` to PR branch after E2E runs, feeding CUSUM drift detection with real data
+   - Fork guard (`head.repo.full_name == github.repository`) prevents silent push failures on fork PRs
+   - Injection-safe: `head.ref` passed via `env:` block, not inline `${{ }}`
+   - Wizard effort section hardened: explains adaptive thinking root cause (Boris Cherny GH #42796), scopes "medium default" to Pro/Max plans, cites code.claude.com docs
+   - `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING` documented as opt-in hardening (not default)
+   - Anti-laziness CLAUDE.md guidance section targeting specific mechanisms (adaptive thinking, effort levels, thinking budget)
+   - 14 behavioral tests (`test-degradation-detection.sh`)
+ - Model A/B comparison workflow (#94, PRs #164, #165)
+   - `workflow_dispatch` benchmark: Opus vs Sonnet on E2E scenarios with 95% CI
+   - Matrix strategy over scenarios, parameterized model/trials/max_turns
+   - Wizard installation verification before simulation (P0 fix)
+   - jq-based artifact construction (safe against empty outputs)
+   - 37 quality tests (`test-model-comparison.sh`)
+ - Firmware-embedded E2E fixture (#78, PR #163)
+   - Python SD card overlay manager, 3 device configs (Raspberry Pi, STM32, ESP32)
+   - SIL + config validation tests within fixture
+   - Domain-adaptive testing proof: firmware indicators, Python overlay, multi-device differentiation
+   - 12 quality tests (`test-firmware-fixture.sh`)
+
+ ### Fixed
+ - P0 shell injection in model comparison workflow: `${{ inputs.model }}` directly in `run:` blocks. Fixed by passing all inputs through `env:` block (caught by Codex review)
+
+ ## [1.29.0] - 2026-04-07
+
+ ### Added
+ - Node 24 compliance across all GitHub Actions workflows (#93, PR #160)
+   - 5 action version bumps: checkout@v5, setup-node@v5, upload-artifact@v6, create-pull-request@v8, sticky-pull-request-comment@v3
+   - 2 third-party actions replaced with `gh` CLI: `int128/hide-comment-action` → GraphQL `minimizeComment`, `softprops/action-gh-release` → `gh release create`
+   - 4 node-version bumps from 20 to 22
+   - 13 new compliance regression tests (`test-node24-compliance.sh`)
+   - Expression injection P0 in release.yml caught by CI reviewer and fixed
+ - Autocompact env var in settings.json (#88, PR #161)
+   - CLI now ships `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` in `settings.json` `env` field (200K default)
+   - Smart merge preserves existing user env vars on upgrade; `--force` resets to defaults
+   - Handles malformed env values (arrays, strings) gracefully with type validation
+   - Setup wizard Step 9.5 references settings.json instead of shell profiles; 1M users guided to 30%
+   - 9 new tests (41 total CLI tests)
+ - Effectiveness scoreboard (#80, PR #162)
+   - `.metrics/catches.jsonl`: 52 historical bug catches extracted from repo history
+   - `catch-analytics.sh`: DDE (Defect Detection Effectiveness) per layer, escape rates, severity breakdown
+   - Results: cross-model-review (48%) and self-review (46%) nearly tied; self-review missed 28 bugs caught downstream; all 3 P0s caught by cross-model or CI review
+   - 14 new quality tests (`test-effectiveness-scoreboard.sh`)
+   - Log automation deferred until analytics proven useful (prove-it gate)
+
+ ### Fixed
+ - Expression injection in `release.yml`: `${{ github.ref_name }}` directly in `run:` shell command allowed tag-based code injection. Fixed by passing through `TAG_NAME` env var (P0, caught by CI reviewer)
+ - `$TOTAL_` variable name collision in `catch-analytics.sh`: bash parsed as undefined variable `TOTAL_` instead of `$TOTAL` + underscore. Fixed with `${TOTAL}_` brace syntax (P0, caught by CI reviewer)
+
  ## [1.28.0] - 2026-04-06

  ### Added
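
The injection fixes in the 1.29.0 and 1.30.0 entries above share one pattern: untrusted text (a tag name, a workflow input) reaches the shell as an environment variable instead of being spliced into the command string. The workflows themselves are YAML and are not shown in this diff, so the following Node.js sketch only illustrates the principle. `printViaEnv` and the hostile tag value are hypothetical, and a POSIX `sh` is assumed to be on the path.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical helper: the script text is a fixed string, and the untrusted
// value is only ever read through "$TAG_NAME", so the shell treats it as data.
function printViaEnv(tagName) {
  return execFileSync('sh', ['-c', 'printf "%s" "$TAG_NAME"'], {
    env: { ...process.env, TAG_NAME: tagName },
    encoding: 'utf8',
  });
}

// A tag crafted to break out of a quoted string if it were interpolated
// directly into the command text.
const hostileTag = 'v1.30.0"; echo INJECTED; "';

// The value should come back unchanged rather than being executed.
console.log(printViaEnv(hostileTag) === hostileTag);
```

Had the same value been concatenated into the command text, the embedded quote and semicolon would have been parsed as shell syntax, which is the failure mode the changelog describes for inline `${{ }}` expressions.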
@@ -224,7 +224,11 @@ Claude Code's **effort level** controls how much thinking the model does before
  | `high` | **Default for all SDLC work.** Features, bug fixes, refactoring, tests, reviews | `effort: high` in skill frontmatter (already set) |
  | `max` | LOW confidence, FAILED 2x, architecture decisions, complex debugging, cross-model reviews | `/effort max` (session only — resets next session) |

- **Why `high` is the default:** The `/sdlc` skill sets `effort: high` in its frontmatter, so every SDLC invocation automatically uses high effort. This gives thorough reasoning without the unbounded token cost of `max`.
+ **Why `high` is the default:** Claude Code uses **adaptive thinking** to dynamically allocate reasoning budget per turn. On Pro and Max plans, the default effort level is **medium (85)**, which causes the model to under-allocate reasoning on complex multi-step tasks — leading to shallow analysis, missed edge cases, and "lazy" outputs. This was [confirmed by Anthropic engineer Boris Cherny](https://github.com/anthropics/claude-code/issues/42796) and is documented at [code.claude.com](https://code.claude.com/docs/en/model-config). API, Team, and Enterprise plans default to high effort and are not affected.
+
+ The `/sdlc` skill sets `effort: high` in its frontmatter, overriding the medium default on every SDLC invocation. This gives thorough reasoning without the unbounded token cost of `max`.
+
+ **Nuclear option — disable adaptive thinking entirely:** Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` in your environment or settings.json `env` block. This forces a fixed reasoning budget per turn instead of letting the model dynamically allocate. Use this if you observe persistent quality issues even with `effort: high`. See [Claude Code model config docs](https://code.claude.com/docs/en/model-config) for details.

  **When to escalate to `max`:**
  - You hit LOW confidence on your approach — deeper thinking may find clarity
@@ -242,6 +246,21 @@ Claude Code's **effort level** controls how much thinking the model does before

  > See also: the **Effort** column in the [Confidence Check table](#confidence-check-required) below for per-confidence-level guidance on when to escalate to `max`.

+ ### Anti-Laziness Guidance for CLAUDE.md
+
+ If you notice Claude Code producing shallow outputs despite `effort: high`, add these instructions to your project's `CLAUDE.md`. These target the **specific mechanisms** behind quality degradation — adaptive thinking and effort levels — rather than vague directives:
+
+ ```markdown
+ ## Quality Anchoring
+ - This project uses effort: high via SDLC skill frontmatter. Do not reduce reasoning depth.
+ - Adaptive thinking may under-allocate your thinking budget on complex tasks. When working on
+   multi-file changes, architecture decisions, or debugging: reason through the full problem
+   before acting, even if the system prompt suggests taking the "simplest approach first."
+ - If you catch yourself skipping steps, re-read the task requirements and verify completeness.
+ ```
+
+ **Why this works:** Claude Code's hidden system prompt includes "Go straight to the point. Try the simplest approach first." This is good for simple queries but causes the model to under-invest in reasoning on complex SDLC tasks. The instructions above don't fight the system prompt — they provide task-specific context that justifies deeper reasoning. Note that CLAUDE.md instructions can be partially overridden by the system prompt, so `effort: high` in skill frontmatter remains the primary defense.
+
  ---

  ## Claude Code Feature Updates
@@ -807,10 +826,19 @@ Override the default auto-compact threshold with environment variables. These ar
  | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` | Trigger compaction at this % of context capacity (1-100) | ~95% |
  | `CLAUDE_CODE_AUTO_COMPACT_WINDOW` | Override context capacity in tokens (useful for 1M models) | Model default |

- Set these in your shell profile (`~/.bashrc`, `~/.zshrc`) or per-project `.envrc`:
+ **Recommended:** The SDLC Wizard CLI sets `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` in `.claude/settings.json` by default (200K model optimized). To customize, edit the `env` field in `.claude/settings.json`:
+
+ ```json
+ {
+   "env": {
+     "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "75"
+   }
+ }
+ ```
+
+ Alternatively, set via shell profile (`~/.bashrc`, `~/.zshrc`) or per-project `.envrc`:

  ```bash
- # Example: compact earlier on a 200K model
  export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75
  ```

@@ -2600,7 +2628,7 @@ If deployment fails or post-deploy verification catches issues:

  **SDLC.md:**
  ```markdown
- <!-- SDLC Wizard Version: 1.28.0 -->
+ <!-- SDLC Wizard Version: 1.30.0 -->
  <!-- Setup Date: [DATE] -->
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  <!-- Git Workflow: [PRs or Solo] -->
@@ -3659,7 +3687,7 @@ Walk through updates? (y/n)
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):

  ```markdown
- <!-- SDLC Wizard Version: 1.28.0 -->
+ <!-- SDLC Wizard Version: 1.30.0 -->
  <!-- Setup Date: 2026-01-24 -->
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  <!-- Git Workflow: PRs -->
package/README.md CHANGED
@@ -229,7 +229,7 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
  | Document | What It Covers |
  |----------|---------------|
  | [ARCHITECTURE.md](ARCHITECTURE.md) | System design, 5-layer diagram, data flows, file structure |
- | [CI_CD.md](CI_CD.md) | All 6 workflows, E2E scoring, tier system, SDP, integrity checks |
+ | [CI_CD.md](CI_CD.md) | All 7 workflows, E2E scoring, tier system, SDP, integrity checks |
  | [SDLC.md](SDLC.md) | Version tracking, enforcement rules, SDLC configuration |
  | [TESTING.md](TESTING.md) | Testing philosophy, test diamond, TDD approach |
  | [CHANGELOG.md](CHANGELOG.md) | Version history, what changed and when |
package/cli/init.js CHANGED
@@ -51,6 +51,18 @@ function mergeSettings(existingPath, templatePath, force) {
    const existing = JSON.parse(fs.readFileSync(existingPath, 'utf8'));
    const template = JSON.parse(fs.readFileSync(templatePath, 'utf8'));

+   // Merge env field
+   if (template.env) {
+     if (!existing.env || typeof existing.env !== 'object' || Array.isArray(existing.env)) {
+       existing.env = {};
+     }
+     for (const [key, val] of Object.entries(template.env)) {
+       if (!(key in existing.env) || force) {
+         existing.env[key] = val;
+       }
+     }
+   }
+
    if (!existing.hooks) existing.hooks = {};

    for (const [event, templateEntries] of Object.entries(template.hooks || {})) {
@@ -1,4 +1,7 @@
  {
+   "env": {
+     "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "75"
+   },
    "hooks": {
      "UserPromptSubmit": [
        {
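
To make the merge behaviour in the `mergeSettings` hunk easier to check, here is a minimal standalone sketch of the env-merge step applied to the template above. `mergeEnv` is a hypothetical name chosen for this illustration (the package performs the merge inline), and `MY_VAR` is made-up user data.

```javascript
// Standalone restatement of the env-merge logic from the init.js hunk.
// mergeEnv is a hypothetical wrapper; the package does this inline.
function mergeEnv(existing, template, force) {
  if (template.env) {
    // Anything that is not a plain object (missing, string, array) is replaced.
    if (!existing.env || typeof existing.env !== 'object' || Array.isArray(existing.env)) {
      existing.env = {};
    }
    for (const [key, val] of Object.entries(template.env)) {
      // Existing user values win unless force is set.
      if (!(key in existing.env) || force) {
        existing.env[key] = val;
      }
    }
  }
  return existing;
}

const template = { env: { CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: '75' } };
const userSettings = () => ({ env: { CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: '30', MY_VAR: 'x' } });

const kept = mergeEnv(userSettings(), template, false);              // upgrade without --force
const reset = mergeEnv(userSettings(), template, true);              // upgrade with --force
const repaired = mergeEnv({ env: ['not', 'an', 'object'] }, template, false);

console.log(kept.env, reset.env, repaired.env);
```

As written, `force` only overwrites keys that the template defines, so in this sketch an unrelated user variable such as `MY_VAR` survives a forced merge, while a malformed `env` value is replaced by a fresh object.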
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "agentic-sdlc-wizard",
-   "version": "1.28.0",
+   "version": "1.30.0",
    "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
    "bin": {
      "sdlc-wizard": "./cli/bin/sdlc-wizard.js"
@@ -183,13 +183,23 @@ Present suggestions and let the user confirm.

  ### Step 9.5: Context Window Configuration

- Recommend autocompact settings based on the user's context window:
+ The CLI already sets `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` in `.claude/settings.json` `env` field (200K default). Ask the user which model context window they use:
+
+ - **200K models (default):** Already configured. Confirm `75` is set in `settings.json` `env` field
+ - **1M models:** Update `settings.json` `env` field: set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` to `"30"` — the default fires at ~76K on 1M, wasting 92% of the window. Optionally also add `CLAUDE_CODE_AUTO_COMPACT_WINDOW` set to `"400000"`
+
+ To update, edit `.claude/settings.json`:
+ ```json
+ {
+   "env": {
+     "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30"
+   }
+ }
+ ```

- - **200K models (default):** Suggest `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` — leaves room for implementation after planning
- - **1M models:** Suggest `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` or `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` — the default fires at ~76K on 1M, wasting 92% of the window
- - **CI pipelines:** Suggest 60% — short tasks, compact early
+ For CI pipelines, consider `"60"` — short tasks benefit from compacting early.

- Tell the user to add the export to their shell profile (`~/.bashrc`, `~/.zshrc`) or project `.envrc`. This is guidance, not enforcement the wizard doesn't write shell profiles.
+ This is project-scoped and shared with the team via git.

  ### Step 10: Customize Hooks

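
The percentages recommended in Step 9.5 are easier to compare as token counts. The sketch below assumes the threshold is simply the context capacity multiplied by the override percentage, as the `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` table row describes. `compactionThreshold` and `unusedPercent` are hypothetical helpers, not part of the package, and how `CLAUDE_CODE_AUTO_COMPACT_WINDOW` combines with the percentage is deliberately left out because this diff does not specify it.

```javascript
// Hypothetical helper: token count at which compaction would fire, assuming
// threshold = capacity * pct / 100.
function compactionThreshold(windowTokens, pct) {
  return Math.round((windowTokens * pct) / 100);
}

// Hypothetical helper: percentage of the window never used if compaction
// fires at `fireAtTokens`.
function unusedPercent(fireAtTokens, windowTokens) {
  return Math.round((1 - fireAtTokens / windowTokens) * 100);
}

console.log(compactionThreshold(200_000, 75));   // 150000 (200K default)
console.log(compactionThreshold(1_000_000, 30)); // 300000 (1M recommendation)
console.log(compactionThreshold(200_000, 60));   // 120000 (CI suggestion on 200K)
console.log(unusedPercent(76_000, 1_000_000));   // 92 (the "~76K on 1M" case)
```

The last line reproduces the "wasting 92% of the window" figure quoted in the step: firing at roughly 76K tokens leaves about 924K of a 1M window untouched.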
@@ -46,9 +46,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.

  ```
  Installed: 1.24.0
- Latest: 1.28.0
+ Latest: 1.30.0

  What changed:
+ - [1.30.0] Firmware fixture, model A/B comparison workflow, CC degradation detection, ...
+ - [1.29.0] Node 24 compliance, autocompact in settings.json, effectiveness scoreboard, ...
  - [1.28.0] Autocompact benchmarking methodology, canary fact mechanism, benchmark harness, ...
  - [1.27.0] Domain-adaptive testing diamond, 3 domain fixtures, 25 quality tests, ...
  - [1.26.0] Codex SDLC Adapter plan, claw-code/OmO/OmX research, CC feature discovery verified, ...