npm - agentic-sdlc-wizard - Versions diffs - 1.67.0 → 1.69.0 - Mend

agentic-sdlc-wizard 1.67.0 → 1.69.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +43 -0
package/CLAUDE_CODE_SDLC_WIZARD.md +2 -2
package/hooks/sdlc-prompt-check.sh +66 -3
package/package.json +1 -1
package/skills/update/SKILL.md +3 -4

package/.claude-plugin/marketplace.json CHANGED Viewed

@@ -13,7 +13,7 @@
       "name": "sdlc-wizard",
       "source": ".",
       "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
-      "version": "1.67.0",
+      "version": "1.69.0",
       "author": {
         "name": "Stefan Ayala"
       },

package/.claude-plugin/plugin.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "sdlc-wizard",
-  "version": "1.67.0",
+  "version": "1.69.0",
   "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
   "author": {
     "name": "Stefan Ayala",

package/CHANGELOG.md CHANGED Viewed

@@ -4,6 +4,49 @@ All notable changes to the SDLC Wizard.
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
+## [1.69.0] - 2026-05-04
+### Token-bloat fix: BASELINE block fires once per CC session
+Cuts ~12K tokens/session of duplicate context for users with >3 prompts. The `SDLC BASELINE` block in `hooks/sdlc-prompt-check.sh` (~250 tokens) was firing on every `UserPromptSubmit` — once Claude has the SDLC skill auto-invoked (covers TodoWrite/confidence/workflow phases), every subsequent re-emission is pure duplication. Now gated by a per-`session_id` sentinel under `$SDLC_WIZARD_CACHE_DIR/baseline-shown-<id>`, pruned at 7d.
+### Behavior
+- **First prompt of a CC session** → BASELINE emits as before (cold-start nudge survives).
+- **Subsequent prompts (same session_id)** → BASELINE suppressed.
+- **New CC session (different session_id)** → BASELINE re-emits.
+- **No session_id in stdin** (legacy CC, direct shell tests) → BASELINE emits every fire (back-compat preserved).
+- `SETUP NOT COMPLETE` warning + `EFFORT BUMP REQUIRED` nudge **continue to fire every prompt** — they're dynamic state warnings, not static reminders.
+### Files
+- `hooks/sdlc-prompt-check.sh` — extracts `session_id` from stdin JSON; gates the static BASELINE block via per-session sentinel; prunes >7d sentinels on emit.
+- `tests/test-baseline-fires-once.sh` (new — 8 cases covering first-fire, suppression, different-session re-emit, no-session-id back-compat, SETUP-warning persistence, EFFORT-bump persistence, cross-cache-dir isolation, byte-shrink verification).
+- `.github/workflows/ci.yml` — wires new test into `validate` job.
+- `CHANGELOG.md`, `SDLC.md`, `skills/update/SKILL.md`, `package.json`, `.claude-plugin/plugin.json` + `marketplace.json`, `CLAUDE_CODE_SDLC_WIZARD.md` (1.68.0 → 1.69.0).
+### Notes
+Discovered during ROADMAP #236 functional-bloat audit. Identified `sdlc-prompt-check.sh` as the #1 amplifier (every-prompt × 22 lines × N prompts). Audit method: measure cost × frequency, judge value — not blind delete-and-see. Per-prompt BASELINE failed cost/value once skill is loaded; conditional warnings + effort-bump detector earned their keep and stayed untouched. Other hooks (`model-effort-check`, `precompact-seam-check`, `token-spike-check`) are silent at healthy state — not bloat.
+## [1.68.0] - 2026-05-04
+### Closed (paperwork-stale roadmap rows)
+- **ROADMAP #97 — Anthropic Policy & Research alignment audit** ✅ DONE 2026-05-04 with NO-GO + one validating parallel verdict. Research write-up at `.reviews/research-97-anthropic-policy.md`. RSP, Transparency Hub, and Research page audited. RSP: not applicable (Anthropic's internal model-dev policy). Transparency: tangential (model-card disclosures, security-guidance overlap covered by #101). Research page: the April 2026 "Automated Alignment Researchers" paper is **conceptually parallel** to our cross-model review pattern — independent third-party validation that LLM-as-reviewer-of-LLM works. Our implementation predates the paper (PR #189 / ROADMAP #72 mission-first cross-model review) and already mitigates its noted weaknesses (reward hacking, limited generalization) via vendor-diverse adversarial framing + verification checklist. Constitution + Economic Futures skipped as clearly off-topic. **6/6 external audits NO-GO** (continues #76, #77, #95, #99, #235).
+- **ROADMAP #243 — token-spike-check follow-up** ✅ CLOSED 2026-05-04. The 2-week verify-window opened by `hooks/token-spike-check.sh` (shipped v1.43.0, 2026-04-27) has elapsed: `wc -l .metrics/token-history.jsonl` shows 8 rows accumulated on maintainer machine, well above the 5-record rolling-baseline threshold. SessionStart skip-recent filter and transcript-dir resolution are working as designed. No code changes.
+### Files
+- `.reviews/research-97-anthropic-policy.md` (new — research write-up, force-added past `.reviews/` gitignore)
+- `ROADMAP.md` (#97 marked DONE with verdict + AAR paper reference)
+- `CHANGELOG.md`, `SDLC.md`, `skills/update/SKILL.md`, `package.json`, `.claude-plugin/plugin.json` + `marketplace.json`, `CLAUDE_CODE_SDLC_WIZARD.md` (1.67.0 → 1.68.0)
+### Notes
+Zero code changes. Same pattern as v1.65.0 + v1.66.0 + v1.67.0 paperwork closes. Open backlog after this release: `#302` (user-level setup skill, design-blocked) + ROADMAP top items #212 (multi-day, partial-API), #9 OpenCode (separate session per maintainer).
 ## [1.67.0] - 2026-05-04
 ### Closed (paperwork-stale roadmap rows)

package/CLAUDE_CODE_SDLC_WIZARD.md CHANGED Viewed

@@ -2976,7 +2976,7 @@ If deployment fails or post-deploy verification catches issues:
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.67.0 -->
+<!-- SDLC Wizard Version: 1.69.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -4055,7 +4055,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 ```markdown
-<!-- SDLC Wizard Version: 1.67.0 -->
+<!-- SDLC Wizard Version: 1.69.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->

package/hooks/sdlc-prompt-check.sh CHANGED Viewed

@@ -43,10 +43,25 @@ fi
 EFFORT_CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
 EFFORT_SIGNALS="$EFFORT_CACHE_DIR/effort-signals.log"
 PROMPT_TEXT=""
-if [ ! -t 0 ] && command -v jq > /dev/null 2>&1; then
+SESSION_ID=""
+# Read stdin once regardless of jq availability — session_id extraction
+# is jq-independent (Codex round 1 P1: BASELINE gate failed when jq was
+# missing or broken). Prompt extraction still needs jq because prompt
+# content can contain arbitrary multi-line text + escapes.
+if [ ! -t 0 ]; then
     STDIN_JSON=$(cat)
     if [ -n "$STDIN_JSON" ]; then
-        PROMPT_TEXT=$(printf '%s' "$STDIN_JSON" | jq -r '.prompt // empty' 2>/dev/null) || PROMPT_TEXT=""
+        # session_id is a UUID-shaped string with no escapable content
+        # in CC's stdin contract — regex extraction is sufficient.
+        # `tr -cd` later strips anything filename-unsafe, so a malformed
+        # input cannot escape the cache dir.
+        SESSION_ID=$(printf '%s' "$STDIN_JSON" \
+            | grep -o '"session_id"[[:space:]]*:[[:space:]]*"[^"]*"' \
+            | head -1 \
+            | sed 's/.*"\([^"]*\)"$/\1/')
+        if command -v jq > /dev/null 2>&1; then
+            PROMPT_TEXT=$(printf '%s' "$STDIN_JSON" | jq -r '.prompt // empty' 2>/dev/null) || PROMPT_TEXT=""
+        fi
     fi
 fi
 if [ -n "$PROMPT_TEXT" ]; then
@@ -109,7 +124,49 @@ SETUP
     exit 0
 fi
-cat << 'EOF'
+# Token-bloat fix: BASELINE block fires once per CC session (~250 tok × 50
+# prompts = ~12K wasted tokens before this gate). Once Claude has the SDLC
+# skill auto-invoked (covers TodoWrite/confidence/workflow), this static
+# block is duplicate context. Sentinel is per-session_id so a fresh CC
+# session re-emits the cold-start nudge. Without session_id (legacy CC, or
+# direct shell tests with no JSON stdin), behavior is unchanged — emits
+# every fire. SETUP-not-complete + EFFORT-bump branches above are NOT
+# gated; they're dynamic state warnings that must fire every prompt.
+#
+# Concurrency: claim is atomic via `set -C` (noclobber) — the redirect
+# `: > "$path"` create-or-fails. Across N parallel fires with the same
+# session_id, exactly one wins the claim and emits BASELINE; the rest
+# see file-exists and suppress. (Codex round 1 P1: previous "check then
+# write after emit" pattern allowed N parallel fires to all emit.)
+SHOULD_EMIT_BASELINE=1
+BASELINE_SENTINEL=""
+if [ -n "$SESSION_ID" ]; then
+    BASELINE_CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
+    # Strip path-traversal chars from session_id before using in filename
+    # (defense-in-depth — CC session_ids are UUIDs, but never trust stdin).
+    SAFE_SID=$(printf '%s' "$SESSION_ID" | tr -cd 'A-Za-z0-9._-')
+    if [ -n "$SAFE_SID" ]; then
+        BASELINE_SENTINEL="$BASELINE_CACHE_DIR/baseline-shown-${SAFE_SID}"
+        mkdir -p "$BASELINE_CACHE_DIR" 2>/dev/null || true
+        # Atomic create-or-fail: subshell sets noclobber so `: > "$path"`
+        # fails (rc≠0) if the file already exists. The full conditional
+        # tree:
+        #   - claim succeeds → emit (we won the race)
+        #   - claim fails AND file exists → suppress (someone else won)
+        #   - claim fails AND file doesn't exist → cache unwritable;
+        #     fall back to emit so user never loses cold-start nudge.
+        if (set -C; : > "$BASELINE_SENTINEL") 2>/dev/null; then
+            SHOULD_EMIT_BASELINE=1
+        elif [ -f "$BASELINE_SENTINEL" ]; then
+            SHOULD_EMIT_BASELINE=0
+        else
+            SHOULD_EMIT_BASELINE=1
+        fi
+    fi
+fi
+if [ "$SHOULD_EMIT_BASELINE" -eq 1 ]; then
+    cat << 'EOF'
 SDLC BASELINE:
 1. TodoWrite FIRST (plan tasks before coding)
 2. STATE CONFIDENCE: HIGH/MEDIUM/LOW
@@ -130,3 +187,9 @@ Workflow phases:
 Quick refs: SDLC.md | TESTING.md | *_PLAN.md for feature
 EOF
+    # Prune sentinels older than 7d so cache doesn't grow forever.
+    # Best-effort: errors silently swallowed.
+    if [ -n "$BASELINE_SENTINEL" ]; then
+        find "$BASELINE_CACHE_DIR" -name 'baseline-shown-*' -type f -mtime +7 -delete 2>/dev/null || true
+    fi
+fi

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "agentic-sdlc-wizard",
-  "version": "1.67.0",
+  "version": "1.69.0",
   "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
   "bin": {
     "sdlc-wizard": "cli/bin/sdlc-wizard.js"

package/skills/update/SKILL.md CHANGED Viewed

@@ -93,12 +93,11 @@ Parse CHANGELOG entries between the user's installed version and latest. Present
 ```
 Installed: 1.42.0
-Latest:    1.67.0
+Latest:    1.69.0
 What changed:
-- [1.67.0] roadmap hygiene — closed #99 (AutoGPT integration audit — NO-GO; AutoGPT is an agent host like Claude Code / Codex / OpenCode, no hook primitive maps, audience is agent-builders not SWE workflow). Research write-up at `.reviews/research-99-autogpt.md`. 5/5 external-product audits NO-GO (continues #76 + #77 + #235 + #95).
-- [1.66.0] roadmap hygiene — closed #95 (Nous Research competitive audit — NO-GO; different layer of the stack, Nous builds open-weights LLMs + agent frameworks, we enforce SDLC process; pattern continues with #76 + #77 + #235 NO-GOs). Research write-up at `.reviews/research-95-nous.md`. Also rolls in PR #309 (codex sibling callout near top of README + wizard doc) and `gh issue close 308` (4/4 API entries audited, zero wizard changes — Sonnet 1M retirement only affects 4.5/4, our mixed-mode tier pins to 4.6).
-- [1.65.0] roadmap hygiene — closed paperwork-stale rows #210 (Node 24 false-green test, already shipped in PR #217) and #235 (Thoughtworks AI Evals research — NO-GO verdict, methodology already implemented under different naming; pattern continues with #76 + #77 NO-GOs). Research write-up at `.reviews/research-235-ai-evals.md`. No code changes.
+- [1.69.0] token-bloat fix — `hooks/sdlc-prompt-check.sh` BASELINE block (the ~250-token "TodoWrite FIRST / STATE CONFIDENCE / AUTO-INVOKE" reminder) now fires once per CC `session_id` instead of every prompt. Saves ~12K tokens/session for any user with >3 prompts. SETUP-not-complete + EFFORT-bump warnings still fire every prompt (dynamic state). Sentinel pruned at 7d. No-session-id stdin keeps current behavior (legacy CC + tests).
+- [1.68.0–1.65.0] roadmap hygiene — five paperwork closes: #97 Anthropic Policy NO-GO + AAR-paper validating parallel; #99 AutoGPT NO-GO; #95 Nous NO-GO; #243 token-history liveness verified; #210 Node-24 false-green; #235 Thoughtworks AI Evals NO-GO. **6/6 external-product audits NO-GO** (continues #76, #77). Research write-ups in `.reviews/research-*.md`.
 - [1.64.0] XDLC ecosystem cross-references — README, wizard doc, and ROADMAP now cross-reference all three sibling packages (`agentic-sdlc-wizard`, `codex-sdlc-wizard`, `claude-gdlc-wizard`). New "Ecosystem (Sibling Projects)" section in README. 3 new doc-consistency tests prevent drift.
 - [1.63.0] cache-cost observability closeout (#204 absorbed by #220) — `tests/test-token-spike.sh` gains explicit cache-miss regression test + negative-control test. SDLC skill + wizard doc gain "Cache-Cost Surprises" sections covering 10-20× silent cost blowups (mid-session CLAUDE.md edits, idle pruning, upstream cache bugs) and detection via `hooks/token-spike-check.sh`'s `costly_tokens` metric.
 - [1.62.0] roadmap hygiene + #211 backfill — closes paperwork-stale rows (#207, #211 historical, #215, #217, #78, #79, #80, #219). Backfilled 5 corrupted `score-history.jsonl` rows from `max_score:10` → `max_score:11` (UI scenarios with design_system criterion). Codex strategic review confirmed scope.