agentic-sdlc-wizard 1.27.0 → 1.28.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -13,7 +13,7 @@
  "name": "sdlc-wizard",
  "source": ".",
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
- "version": "1.27.0",
+ "version": "1.28.0",
  "author": {
  "name": "Stefan Ayala"
  },
@@ -1,6 +1,6 @@
  {
  "name": "sdlc-wizard",
- "version": "1.27.0",
+ "version": "1.28.0",
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
  "author": {
  "name": "Stefan Ayala",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,23 @@ All notable changes to the SDLC Wizard.

  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.

+ ## [1.28.0] - 2026-04-06
+
+ ### Added
+ - Autocompact benchmarking methodology — first rigorous framework for testing `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` thresholds (#92, PR #158)
+ - `AUTOCOMPACT_BENCHMARK.md`: experimental design, canary fact mechanism, cost estimation, limitations
+ - `tests/benchmarks/run-benchmark.sh`: parameterized harness with `--dry-run`, threshold validation, multi-turn session via `--resume`
+ - `tests/benchmarks/analyze-results.sh`: statistical comparison tables using `stats.sh`
+ - 3 task files (short/medium/long) with canary fact injection for context preservation measurement
+ - `canary-facts.json`: 5 domain-independent facts for binary recall scoring with negation detection
+ - `.github/workflows/benchmark-autocompact.yml`: `workflow_dispatch` with matrix strategy across thresholds
+ - 26 quality tests proving methodology rigor, harness behavior, and research standards
+
+ ### Changed
+ - README bio: reflects full-stack founding engineer background (not just SDET/QA)
+ - Wizard doc autocompact section references benchmarking methodology
+ - Workflow count updated (5→6) across README and CI
+
  ## [1.27.0] - 2026-04-05

  ### Added
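The changelog above says `run-benchmark.sh` performs threshold validation before launching runs. As a rough illustration of what validating a `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` value involves (the rule that valid thresholds are whole percentages 1–100 is an assumption here, not taken from the harness itself):

```shell
#!/bin/sh
# Sketch of threshold validation for an autocompact percentage override.
# Assumption: valid values are integer percentages in [1, 100].
validate_threshold() {
  case "$1" in
    ''|*[!0-9]*) echo "invalid: $1" ;;          # empty or non-numeric input
    *) if [ "$1" -ge 1 ] && [ "$1" -le 100 ]; then
         echo "ok: $1"
       else
         echo "out of range: $1"
       fi ;;
  esac
}

validate_threshold 75    # ok: 75
validate_threshold 150   # out of range: 150
validate_threshold abc   # invalid: abc
```

Rejecting bad values up front matters in a benchmark harness: a typo'd threshold would otherwise burn a full (paid) multi-turn session before the mistake surfaced.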
@@ -800,7 +800,7 @@ Two tools for managing context — use the right one:

  ### Autocompact Tuning

- Override the default auto-compact threshold with environment variables. These are community-discovered settings referenced in upstream issues ([#34332](https://github.com/anthropics/claude-code/issues/34332), [#42375](https://github.com/anthropics/claude-code/issues/42375)) — not yet officially documented by Anthropic:
+ Override the default auto-compact threshold with environment variables. These are community-discovered settings referenced in upstream issues ([#34332](https://github.com/anthropics/claude-code/issues/34332), [#42375](https://github.com/anthropics/claude-code/issues/42375)) — not yet officially documented by Anthropic. For a rigorous benchmarking methodology to validate these thresholds, see [AUTOCOMPACT_BENCHMARK.md](AUTOCOMPACT_BENCHMARK.md).

  | Variable | What It Does | Default |
  |----------|-------------|---------|
@@ -828,6 +828,10 @@ export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75

  **Note:** These env vars may change as Claude Code evolves. Check [Claude Code settings docs](https://docs.anthropic.com/en/docs/claude-code/settings) for the latest supported configuration.

+ ### Benchmarking Methodology
+
+ The thresholds above are community consensus — not empirically validated. For rigorous benchmarking of autocompact thresholds (measuring task quality, context preservation, and cost at each setting), see [AUTOCOMPACT_BENCHMARK.md](AUTOCOMPACT_BENCHMARK.md). It provides a controlled experimental methodology with a novel "canary fact" mechanism for measuring context preservation post-compaction.
+
  ### 1M vs 200K Context Window

  Claude Code supports both 200K and 1M context windows. Choose based on your task:
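The "canary fact" mechanism added in this release scores whether a planted fact survives compaction, with negation detection so a denial doesn't count as recall. A minimal sketch of that binary scoring idea (the scoring rules and function name here are illustrative assumptions, not the benchmark's actual implementation in `canary-facts.json`):

```shell
#!/bin/sh
# Illustrative binary recall scoring with naive negation detection.
# score_recall TRANSCRIPT FACT -> prints 1 if the fact is recalled, else 0.
score_recall() {
  transcript="$1"; fact="$2"
  if printf '%s' "$transcript" | grep -Eqi "not $fact|no $fact"; then
    echo 0   # negated mention counts as a miss
  elif printf '%s' "$transcript" | grep -qi "$fact"; then
    echo 1   # fact recalled post-compaction
  else
    echo 0   # fact lost
  fi
}

score_recall "the deploy key is stored in vault-7" "vault-7"   # 1
score_recall "there is no vault-7 anywhere" "vault-7"          # 0
```

Binary scoring keeps the metric cheap to compute across many runs; the negation check guards against the model confidently asserting the canary fact's absence.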
@@ -2596,7 +2600,7 @@ If deployment fails or post-deploy verification catches issues:

  **SDLC.md:**
  ```markdown
- <!-- SDLC Wizard Version: 1.27.0 -->
+ <!-- SDLC Wizard Version: 1.28.0 -->
  <!-- Setup Date: [DATE] -->
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  <!-- Git Workflow: [PRs or Solo] -->
@@ -3655,7 +3659,7 @@ Walk through updates? (y/n)
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):

  ```markdown
- <!-- SDLC Wizard Version: 1.27.0 -->
+ <!-- SDLC Wizard Version: 1.28.0 -->
  <!-- Setup Date: 2026-01-24 -->
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  <!-- Git Workflow: PRs -->
package/README.md CHANGED
@@ -2,7 +2,7 @@

  A **self-evolving Software Development Life Cycle (SDLC) enforcement system for AI coding agents**. Makes Claude plan before coding, test before shipping, and ask when uncertain. Measures itself getting better over time.

- **Built on 15+ years of SDET and QA engineering experience** — battle-tested patterns from real production systems, baked into an AI agent that follows tried-and-true software quality practices so you don't have to enforce them manually.
+ **Built on 15+ years of software engineering and founding engineering experience** — battle-tested patterns from real production systems, baked into an AI agent that follows tried-and-true software quality practices so you don't have to enforce them manually.

  ## Install

@@ -229,7 +229,7 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
  | Document | What It Covers |
  |----------|---------------|
  | [ARCHITECTURE.md](ARCHITECTURE.md) | System design, 5-layer diagram, data flows, file structure |
- | [CI_CD.md](CI_CD.md) | All 5 workflows, E2E scoring, tier system, SDP, integrity checks |
+ | [CI_CD.md](CI_CD.md) | All 6 workflows, E2E scoring, tier system, SDP, integrity checks |
  | [SDLC.md](SDLC.md) | Version tracking, enforcement rules, SDLC configuration |
  | [TESTING.md](TESTING.md) | Testing philosophy, test diamond, TDD approach |
  | [CHANGELOG.md](CHANGELOG.md) | Version history, what changed and when |
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "agentic-sdlc-wizard",
3
- "version": "1.27.0",
3
+ "version": "1.28.0",
4
4
  "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
5
5
  "bin": {
6
6
  "sdlc-wizard": "./cli/bin/sdlc-wizard.js"
@@ -46,9 +46,10 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
46
46
 
47
47
  ```
48
48
  Installed: 1.24.0
49
- Latest: 1.27.0
49
+ Latest: 1.28.0
50
50
 
51
51
  What changed:
52
+ - [1.28.0] Autocompact benchmarking methodology, canary fact mechanism, benchmark harness, ...
52
53
  - [1.27.0] Domain-adaptive testing diamond, 3 domain fixtures, 25 quality tests, ...
53
54
  - [1.26.0] Codex SDLC Adapter plan, claw-code/OmO/OmX research, CC feature discovery verified, ...
54
55
  - [1.25.0] Plugin format, 6 distribution channels (curl, Homebrew, gh, GitHub Releases), ...