agentic-sdlc-wizard 1.68.0 → 1.70.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -13,7 +13,7 @@
13
13
  "name": "sdlc-wizard",
14
14
  "source": ".",
15
15
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
16
- "version": "1.68.0",
16
+ "version": "1.70.0",
17
17
  "author": {
18
18
  "name": "Stefan Ayala"
19
19
  },
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "sdlc-wizard",
3
- "version": "1.68.0",
3
+ "version": "1.70.0",
4
4
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
5
5
  "author": {
6
6
  "name": "Stefan Ayala",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,59 @@ All notable changes to the SDLC Wizard.
4
4
 
5
5
  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
6
6
 
7
+ ## [1.70.0] - 2026-05-05
8
+
9
+ ### Token-bloat fix: TDD CHECK nudge fires once per CC session
10
+
11
+ `hooks/tdd-pretool-check.sh` was emitting a ~50-token JSON nudge ("TDD CHECK: Are you writing IMPLEMENTATION before a FAILING TEST?") on every Write/Edit/MultiEdit touching `src/**`. After the SDLC skill auto-invokes (which already covers TDD RED/GREEN), the per-Edit nudge is duplicate context — typical SDLC session has 10-30 src Edits = ~0.5-1.5K wasted tokens.
12
+
13
+ Now gated on per-`session_id` sentinel under `$SDLC_WIZARD_CACHE_DIR/tdd-shown-<id>`, atomic-claimed via subshell `set -C` (noclobber). Same pattern as v1.69.0 BASELINE gate.
14
+
15
+ ### Behavior
16
+
17
+ - **First src/ edit of a CC session** → TDD CHECK emits as before.
18
+ - **Subsequent src/ edits (same session_id)** → TDD CHECK suppressed.
19
+ - **New CC session (different session_id)** → TDD CHECK re-emits.
20
+ - **Non-src/ files** → no output (existing behavior, regardless of sentinel). Editing README first does NOT consume the sentinel slot — TDD CHECK still fires on first src/ edit afterward.
21
+ - **No session_id in stdin** (legacy CC, direct shell tests) → emits every src/ edit (back-compat preserved).
22
+ - **N parallel src/ edits with same session_id** → exactly 1 TDD CHECK emit (atomic claim).
23
+
24
+ ### Files
25
+
26
+ - `hooks/tdd-pretool-check.sh` — atomic-claim sentinel + jq-decoupled session_id extraction.
27
+ - `tests/test-tdd-pretool-fires-once.sh` (new — 9 cases including 50-parallel concurrency, non-src/ doesn't consume sentinel, suppressed-fire-empty assertion).
28
+ - `.github/workflows/ci.yml`, `CONTRIBUTING.md` — wire new test into validate job + contributor checklist.
29
+ - `CHANGELOG.md`, `SDLC.md`, `skills/update/SKILL.md`, `package.json`, `.claude-plugin/plugin.json` + `marketplace.json`, `CLAUDE_CODE_SDLC_WIZARD.md` (1.69.0 → 1.70.0).
30
+
31
+ ### Notes
32
+
33
+ ROADMAP #236 functional-bloat audit, phase 2. Phase 1 (v1.69.0) trimmed the BASELINE block (~12K tokens/session). Phase 2 trims the per-Edit nudge. Combined savings on a 50-prompt + 20-Edit session: ~13.5K tokens. Audit method continues — measure cost × frequency, judge value, don't blind-delete. Other always-on hooks (`model-effort-check`, `precompact-seam-check`, `token-spike-check`) remain silent at healthy state and are not bloat.
34
+
35
+ ## [1.69.0] - 2026-05-04
36
+
37
+ ### Token-bloat fix: BASELINE block fires once per CC session
38
+
39
+ Cuts ~12K tokens/session of duplicate context for users with >3 prompts. The `SDLC BASELINE` block in `hooks/sdlc-prompt-check.sh` (~250 tokens) was firing on every `UserPromptSubmit` — once Claude has the SDLC skill auto-invoked (covers TodoWrite/confidence/workflow phases), every subsequent re-emission is pure duplication. Now gated by a per-`session_id` sentinel under `$SDLC_WIZARD_CACHE_DIR/baseline-shown-<id>`, pruned at 7d.
40
+
41
+ ### Behavior
42
+
43
+ - **First prompt of a CC session** → BASELINE emits as before (cold-start nudge survives).
44
+ - **Subsequent prompts (same session_id)** → BASELINE suppressed.
45
+ - **New CC session (different session_id)** → BASELINE re-emits.
46
+ - **No session_id in stdin** (legacy CC, direct shell tests) → BASELINE emits every fire (back-compat preserved).
47
+ - `SETUP NOT COMPLETE` warning + `EFFORT BUMP REQUIRED` nudge **continue to fire every prompt** — they're dynamic state warnings, not static reminders.
48
+
49
+ ### Files
50
+
51
+ - `hooks/sdlc-prompt-check.sh` — extracts `session_id` from stdin JSON; gates the static BASELINE block via per-session sentinel; prunes >7d sentinels on emit.
52
+ - `tests/test-baseline-fires-once.sh` (new — 8 cases covering first-fire, suppression, different-session re-emit, no-session-id back-compat, SETUP-warning persistence, EFFORT-bump persistence, cross-cache-dir isolation, byte-shrink verification).
53
+ - `.github/workflows/ci.yml` — wires new test into `validate` job.
54
+ - `CHANGELOG.md`, `SDLC.md`, `skills/update/SKILL.md`, `package.json`, `.claude-plugin/plugin.json` + `marketplace.json`, `CLAUDE_CODE_SDLC_WIZARD.md` (1.68.0 → 1.69.0).
55
+
56
+ ### Notes
57
+
58
+ Discovered during ROADMAP #236 functional-bloat audit. Identified `sdlc-prompt-check.sh` as the #1 amplifier (every-prompt × 22 lines × N prompts). Audit method: measure cost × frequency, judge value — not blind delete-and-see. Per-prompt BASELINE failed cost/value once skill is loaded; conditional warnings + effort-bump detector earned their keep and stayed untouched. Other hooks (`model-effort-check`, `precompact-seam-check`, `token-spike-check`) are silent at healthy state — not bloat.
59
+
7
60
  ## [1.68.0] - 2026-05-04
8
61
 
9
62
  ### Closed (paperwork-stale roadmap rows)
@@ -2976,7 +2976,7 @@ If deployment fails or post-deploy verification catches issues:
2976
2976
 
2977
2977
  **SDLC.md:**
2978
2978
  ```markdown
2979
- <!-- SDLC Wizard Version: 1.68.0 -->
2979
+ <!-- SDLC Wizard Version: 1.70.0 -->
2980
2980
  <!-- Setup Date: [DATE] -->
2981
2981
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
2982
2982
  <!-- Git Workflow: [PRs or Solo] -->
@@ -4055,7 +4055,7 @@ Walk through updates? (y/n)
4055
4055
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
4056
4056
 
4057
4057
  ```markdown
4058
- <!-- SDLC Wizard Version: 1.68.0 -->
4058
+ <!-- SDLC Wizard Version: 1.70.0 -->
4059
4059
  <!-- Setup Date: 2026-01-24 -->
4060
4060
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
4061
4061
  <!-- Git Workflow: PRs -->
@@ -43,10 +43,25 @@ fi
43
43
  EFFORT_CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
44
44
  EFFORT_SIGNALS="$EFFORT_CACHE_DIR/effort-signals.log"
45
45
  PROMPT_TEXT=""
46
- if [ ! -t 0 ] && command -v jq > /dev/null 2>&1; then
46
+ SESSION_ID=""
47
+ # Read stdin once regardless of jq availability — session_id extraction
48
+ # is jq-independent (Codex round 1 P1: BASELINE gate failed when jq was
49
+ # missing or broken). Prompt extraction still needs jq because prompt
50
+ # content can contain arbitrary multi-line text + escapes.
51
+ if [ ! -t 0 ]; then
47
52
  STDIN_JSON=$(cat)
48
53
  if [ -n "$STDIN_JSON" ]; then
49
- PROMPT_TEXT=$(printf '%s' "$STDIN_JSON" | jq -r '.prompt // empty' 2>/dev/null) || PROMPT_TEXT=""
54
+ # session_id is a UUID-shaped string with no escapable content
55
+ # in CC's stdin contract — regex extraction is sufficient.
56
+ # `tr -cd` later strips anything filename-unsafe, so a malformed
57
+ # input cannot escape the cache dir.
58
+ SESSION_ID=$(printf '%s' "$STDIN_JSON" \
59
+ | grep -o '"session_id"[[:space:]]*:[[:space:]]*"[^"]*"' \
60
+ | head -1 \
61
+ | sed 's/.*"\([^"]*\)"$/\1/')
62
+ if command -v jq > /dev/null 2>&1; then
63
+ PROMPT_TEXT=$(printf '%s' "$STDIN_JSON" | jq -r '.prompt // empty' 2>/dev/null) || PROMPT_TEXT=""
64
+ fi
50
65
  fi
51
66
  fi
52
67
  if [ -n "$PROMPT_TEXT" ]; then
@@ -109,7 +124,49 @@ SETUP
109
124
  exit 0
110
125
  fi
111
126
 
112
- cat << 'EOF'
127
+ # Token-bloat fix: BASELINE block fires once per CC session (~250 tok × 50
128
+ # prompts = ~12K wasted tokens before this gate). Once Claude has the SDLC
129
+ # skill auto-invoked (covers TodoWrite/confidence/workflow), this static
130
+ # block is duplicate context. Sentinel is per-session_id so a fresh CC
131
+ # session re-emits the cold-start nudge. Without session_id (legacy CC, or
132
+ # direct shell tests with no JSON stdin), behavior is unchanged — emits
133
+ # every fire. SETUP-not-complete + EFFORT-bump branches above are NOT
134
+ # gated; they're dynamic state warnings that must fire every prompt.
135
+ #
136
+ # Concurrency: claim is atomic via `set -C` (noclobber) — the redirect
137
+ # `: > "$path"` create-or-fails. Across N parallel fires with the same
138
+ # session_id, exactly one wins the claim and emits BASELINE; the rest
139
+ # see file-exists and suppress. (Codex round 1 P1: previous "check then
140
+ # write after emit" pattern allowed N parallel fires to all emit.)
141
+ SHOULD_EMIT_BASELINE=1
142
+ BASELINE_SENTINEL=""
143
+ if [ -n "$SESSION_ID" ]; then
144
+ BASELINE_CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
145
+ # Strip path-traversal chars from session_id before using in filename
146
+ # (defense-in-depth — CC session_ids are UUIDs, but never trust stdin).
147
+ SAFE_SID=$(printf '%s' "$SESSION_ID" | tr -cd 'A-Za-z0-9._-')
148
+ if [ -n "$SAFE_SID" ]; then
149
+ BASELINE_SENTINEL="$BASELINE_CACHE_DIR/baseline-shown-${SAFE_SID}"
150
+ mkdir -p "$BASELINE_CACHE_DIR" 2>/dev/null || true
151
+ # Atomic create-or-fail: subshell sets noclobber so `: > "$path"`
152
+ # fails (rc≠0) if the file already exists. The full conditional
153
+ # tree:
154
+ # - claim succeeds → emit (we won the race)
155
+ # - claim fails AND file exists → suppress (someone else won)
156
+ # - claim fails AND file doesn't exist → cache unwritable;
157
+ # fall back to emit so user never loses cold-start nudge.
158
+ if (set -C; : > "$BASELINE_SENTINEL") 2>/dev/null; then
159
+ SHOULD_EMIT_BASELINE=1
160
+ elif [ -f "$BASELINE_SENTINEL" ]; then
161
+ SHOULD_EMIT_BASELINE=0
162
+ else
163
+ SHOULD_EMIT_BASELINE=1
164
+ fi
165
+ fi
166
+ fi
167
+
168
+ if [ "$SHOULD_EMIT_BASELINE" -eq 1 ]; then
169
+ cat << 'EOF'
113
170
  SDLC BASELINE:
114
171
  1. TodoWrite FIRST (plan tasks before coding)
115
172
  2. STATE CONFIDENCE: HIGH/MEDIUM/LOW
@@ -130,3 +187,9 @@ Workflow phases:
130
187
 
131
188
  Quick refs: SDLC.md | TESTING.md | *_PLAN.md for feature
132
189
  EOF
190
+ # Prune sentinels older than 7d so cache doesn't grow forever.
191
+ # Best-effort: errors silently swallowed.
192
+ if [ -n "$BASELINE_SENTINEL" ]; then
193
+ find "$BASELINE_CACHE_DIR" -name 'baseline-shown-*' -type f -mtime +7 -delete 2>/dev/null || true
194
+ fi
195
+ fi
@@ -17,13 +17,59 @@ TOOL_INPUT=$(cat)
17
17
  # Extract the file path being edited (requires jq)
18
18
  FILE_PATH=$(echo "$TOOL_INPUT" | jq -r '.tool_input.file_path // empty')
19
19
 
20
+ # session_id extraction is jq-independent (same pattern as sdlc-prompt-check.sh
21
+ # v1.69.0 — Codex round 1 P1 from that PR proved jq-coupling silently disabled
22
+ # the gate when jq was missing/broken). UUIDs are simple strings, no escapes.
23
+ SESSION_ID=$(printf '%s' "$TOOL_INPUT" \
24
+ | grep -o '"session_id"[[:space:]]*:[[:space:]]*"[^"]*"' \
25
+ | head -1 \
26
+ | sed 's/.*"\([^"]*\)"$/\1/')
27
+
20
28
  # CUSTOMIZE: Change this pattern to match YOUR source directory
21
29
  # Examples: "/src/", "/app/", "/lib/", "/packages/", "/server/"
22
30
  if [[ "$FILE_PATH" == *"/src/"* ]]; then
23
- # Output additionalContext that Claude will read
24
- cat << 'EOF'
31
+ # Token-bloat fix (v1.70.0): nudge fires once per CC session. Once Claude
32
+ # has the SDLC skill auto-invoked (covers TDD RED/GREEN), the per-Edit
33
+ # nudge becomes duplicate context — typical session has 10-30 src Edits
34
+ # = ~0.5-1.5K wasted tokens. Same atomic-noclobber claim pattern as
35
+ # sdlc-prompt-check.sh BASELINE gate.
36
+ #
37
+ # No-session_id stdin (legacy CC, direct shell tests) → emit every fire,
38
+ # preserving back-compat with existing tests in test-hooks.sh that don't
39
+ # pass session_id.
40
+ SHOULD_EMIT=1
41
+ if [ -n "$SESSION_ID" ]; then
42
+ CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
43
+ SAFE_SID=$(printf '%s' "$SESSION_ID" | tr -cd 'A-Za-z0-9._-')
44
+ if [ -n "$SAFE_SID" ]; then
45
+ SENTINEL="$CACHE_DIR/tdd-shown-${SAFE_SID}"
46
+ mkdir -p "$CACHE_DIR" 2>/dev/null || true
47
+ # Atomic claim: subshell `set -C` makes `: > path` create-or-fail.
48
+ # Conditional tree:
49
+ # - claim succeeds → emit (we won the race)
50
+ # - claim fails AND file exists → suppress (someone else won)
51
+ # - claim fails AND file missing → cache unwritable; fall back
52
+ # to emit so user never loses cold-start nudge.
53
+ if (set -C; : > "$SENTINEL") 2>/dev/null; then
54
+ SHOULD_EMIT=1
55
+ elif [ -f "$SENTINEL" ]; then
56
+ SHOULD_EMIT=0
57
+ else
58
+ SHOULD_EMIT=1
59
+ fi
60
+ fi
61
+ fi
62
+
63
+ if [ "$SHOULD_EMIT" -eq 1 ]; then
64
+ # Output additionalContext that Claude will read
65
+ cat << 'EOF'
25
66
  {"hookSpecificOutput": {"hookEventName": "PreToolUse", "additionalContext": "TDD CHECK: Are you writing IMPLEMENTATION before a FAILING TEST? If yes, STOP. Write the test first (TDD RED), then implement (TDD GREEN)."}}
26
67
  EOF
68
+ # Prune sentinels older than 7d so cache doesn't grow forever.
69
+ if [ -n "$SESSION_ID" ] && [ -n "$SAFE_SID" ]; then
70
+ find "$CACHE_DIR" -name 'tdd-shown-*' -type f -mtime +7 -delete 2>/dev/null || true
71
+ fi
72
+ fi
27
73
  fi
28
74
 
29
75
  # No output = allow the tool to proceed
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "agentic-sdlc-wizard",
3
- "version": "1.68.0",
3
+ "version": "1.70.0",
4
4
  "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
5
5
  "bin": {
6
6
  "sdlc-wizard": "cli/bin/sdlc-wizard.js"
@@ -93,13 +93,12 @@ Parse CHANGELOG entries between the user's installed version and latest. Present
93
93
 
94
94
  ```
95
95
  Installed: 1.42.0
96
- Latest: 1.68.0
96
+ Latest: 1.70.0
97
97
 
98
98
  What changed:
99
- - [1.68.0] roadmap hygiene closed #97 (Anthropic Policy & Research alignment audit NO-GO with one validating parallel: April 2026 "Automated Alignment Researchers" paper conceptually parallel to our cross-model review, our implementation predates it and mitigates noted weaknesses). Research write-up at `.reviews/research-97-anthropic-policy.md`. Also closes #243 follow-up — `.metrics/token-history.jsonl` accumulating (8 rows on maintainer machine, well over 5-record threshold). 6/6 external audits NO-GO.
100
- - [1.67.0] roadmap hygiene closed #99 (AutoGPT integration audit NO-GO; AutoGPT is an agent host like Claude Code / Codex / OpenCode, no hook primitive maps, audience is agent-builders not SWE workflow). Research write-up at `.reviews/research-99-autogpt.md`. 5/5 external-product audits NO-GO (continues #76 + #77 + #235 + #95).
101
- - [1.66.0] roadmap hygiene — closed #95 (Nous Research NO-GO different layer; Nous builds models + agent frameworks, we enforce SDLC process). Rolls in PR #309 codex callout near top + issue #308 close (4/4 API entries audited).
102
- - [1.65.0] roadmap hygiene — closed #210 (Node 24 false-green, already shipped PR #217) + #235 (Thoughtworks AI Evals NO-GO).
99
+ - [1.70.0] token-bloat fix phase 2 `hooks/tdd-pretool-check.sh` TDD CHECK JSON nudge (the per-`Write/Edit` "Are you writing IMPLEMENTATION before a FAILING TEST?" reminder) now fires once per CC `session_id` instead of every src/ edit. Saves ~0.5-1.5K tokens/session (10-30 src Edits × ~50 tok). Same atomic-noclobber claim pattern as v1.69.0 BASELINE gate. Non-src/ edits don't consume the sentinel slot.
100
+ - [1.69.0] token-bloat fix phase 1`hooks/sdlc-prompt-check.sh` BASELINE block (the ~250-token "TodoWrite FIRST / STATE CONFIDENCE / AUTO-INVOKE" reminder) now fires once per CC `session_id`. Saves ~12K tokens/session. SETUP-not-complete + EFFORT-bump warnings still fire every prompt (dynamic state).
101
+ - [1.68.0–1.65.0] roadmap hygiene — five paperwork closes: #97 Anthropic Policy NO-GO + AAR-paper validating parallel; #99 AutoGPT NO-GO; #95 Nous NO-GO; #243 token-history liveness verified; #210 Node-24 false-green; #235 Thoughtworks AI Evals NO-GO. **6/6 external-product audits NO-GO** (continues #76, #77). Research write-ups in `.reviews/research-*.md`.
103
102
  - [1.64.0] XDLC ecosystem cross-references — README, wizard doc, and ROADMAP now cross-reference all three sibling packages (`agentic-sdlc-wizard`, `codex-sdlc-wizard`, `claude-gdlc-wizard`). New "Ecosystem (Sibling Projects)" section in README. 3 new doc-consistency tests prevent drift.
104
103
  - [1.63.0] cache-cost observability closeout (#204 absorbed by #220) — `tests/test-token-spike.sh` gains explicit cache-miss regression test + negative-control test. SDLC skill + wizard doc gain "Cache-Cost Surprises" sections covering 10-20× silent cost blowups (mid-session CLAUDE.md edits, idle pruning, upstream cache bugs) and detection via `hooks/token-spike-check.sh`'s `costly_tokens` metric.
105
104
  - [1.62.0] roadmap hygiene + #211 backfill — closes paperwork-stale rows (#207, #211 historical, #215, #217, #78, #79, #80, #219). Backfilled 5 corrupted `score-history.jsonl` rows from `max_score:10` → `max_score:11` (UI scenarios with design_system criterion). Codex strategic review confirmed scope.