npm - agentic-sdlc-wizard - Versions diffs - 1.42.2 → 1.43.0 - Mend

agentic-sdlc-wizard 1.42.2 → 1.43.0

Files changed (8)

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +14 -0
package/CLAUDE_CODE_SDLC_WIZARD.md +2 -2
package/hooks/hooks.json +4 -0
package/hooks/token-spike-check.sh +60 -0
package/package.json +1 -1
package/skills/update/SKILL.md +2 -1

package/.claude-plugin/marketplace.json CHANGED

@@ -13,7 +13,7 @@
       "name": "sdlc-wizard",
       "source": ".",
       "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
-      "version": "1.42.2",
+      "version": "1.43.0",
       "author": {
         "name": "Stefan Ayala"
       },

package/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "sdlc-wizard",
-  "version": "1.42.2",
+  "version": "1.43.0",
   "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
   "author": {
     "name": "Stefan Ayala",

package/CHANGELOG.md CHANGED

@@ -4,6 +4,20 @@ All notable changes to the SDLC Wizard.
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
+## [1.43.0] - 2026-04-27
+### Added
+- **Token-spike anomaly detection** (ROADMAP #220 closure). New SessionStart hook `hooks/token-spike-check.sh` walks the CC transcript dir (`~/.claude/projects/<sanitized-cwd>/*.jsonl`), sums per-session `usage.{input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens}` from every assistant message with a usage block, and idempotently appends one record per `session_id` to `.metrics/token-history.jsonl`. The hook then warns when the most recent completed session's `costly_tokens` (= `input + cache_creation + output`, excluding the cheap ~$1.50/M `cache_read` tier) exceeds the rolling baseline by more than 2σ. Anthropic's 2026-04-23 post-mortem documented a CC caching bug that "continuously dropped thinking blocks from subsequent requests" — invisible until the invoice arrived; this hook surfaces the same shape of regression the moment it occurs. The `--metric median` mode (default) uses MAD (median absolute deviation) instead of stdev for the spread term, so a single outlier session in the baseline doesn't mask the next genuine spike. Hook is gated on `.metrics/` existing in the project root (opt-in for consumers, on for the wizard repo which already maintains `.metrics/catches.jsonl`). 14 quality tests in `tests/test-token-spike.sh` cover burn calculation against summed transcript fields, idempotent ingest, positive/negative spike detection, the min-baseline floor (no false positives on <5-record windows), the median-vs-mean contrast (both `--metric` modes invoked, asserting median warns and mean does not on an outlier-inflated fixture), flat-baseline minimum-spread floor (1000→1100 suppressed, 1000→50000 still fires), privacy/type-coercion (a malicious transcript with `"USER_SECRET_INPUT"` strings in usage fields cannot leak content into history), concurrent-ingest atomic-lock serialization (parallel ingests produce 1 record per session), and hook gating + warning surface.
+### Files
+- New `hooks/token-spike-check.sh` (SessionStart, opt-in)
+- New `tests/e2e/token-analytics.sh` (writer + checker engine; supports `--ingest`, `--check`, `--report`, `--metric median|mean`, `--window`, `--threshold-sigma`)
+- New `tests/test-token-spike.sh` (14 quality tests)
+- Hook registered in `hooks/hooks.json` and `.claude/settings.json` SessionStart event
+- `SDLC.md` hooks table + file tree updated
 ## [1.42.2] - 2026-04-26
 ### Documented
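
The 1.43.0 changelog entry describes the spike check in prose: `costly_tokens` is `input + cache_creation + output`, and a warning fires when the latest session exceeds the rolling median by more than 2σ, with a MAD-based spread, a minimum baseline of 5 records, and a minimum-spread floor for flat baselines. The analytics script that implements this (`tests/e2e/token-analytics.sh`) is not among the eight files in this diff, so the block below is only a minimal sketch of the described logic, not the package's code. The function names, the 1.4826 MAD scaling constant, and the 10%-of-median minimum spread are assumptions chosen for illustration.

```shell
#!/bin/bash
# Sketch only: median + MAD spike check as described in the changelog.
# Function names, the 1.4826 scale factor, and the 10%-of-median spread
# floor are assumptions, not values taken from the package.

# costly_tokens INPUT CACHE_CREATION OUTPUT: sum of the non-cache_read tiers.
costly_tokens() {
    echo $(( $1 + $2 + $3 ))
}

# median VALUE...: print the median of the numeric arguments.
median() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else printf "%.1f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# spike_check LAST BASELINE...: print "spike" or "ok".
spike_check() {
    local last="$1"; shift
    # Min-baseline floor: stay quiet on windows of fewer than 5 records.
    if [ "$#" -lt 5 ]; then echo "ok"; return 0; fi
    local med mad x
    local devs=()
    med=$(median "$@")
    # Absolute deviations from the median; their median is the MAD.
    for x in "$@"; do
        devs+=("$(awk -v a="$x" -v m="$med" \
            'BEGIN { d = a - m; if (d < 0) d = -d; printf "%.1f\n", d }')")
    done
    mad=$(median "${devs[@]}")
    awk -v last="$last" -v med="$med" -v mad="$mad" 'BEGIN {
        spread = 1.4826 * mad        # assumed sigma-equivalent scaling of MAD
        min_spread = 0.10 * med      # assumed floor for flat baselines
        if (spread < min_spread) spread = min_spread
        if (last > med + 2 * spread) print "spike"; else print "ok"
    }'
}
```

Under these assumed constants, `spike_check 1100 1000 1000 1000 1000 1000` prints `ok` and `spike_check 50000 1000 1000 1000 1000 1000` prints `spike`, matching the flat-baseline cases the changelog lists (1000→1100 suppressed, 1000→50000 still fires). Because the spread comes from the MAD rather than the standard deviation, a single 90000-token outlier in the baseline does not raise the threshold.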

package/CLAUDE_CODE_SDLC_WIZARD.md CHANGED

@@ -2918,7 +2918,7 @@ If deployment fails or post-deploy verification catches issues:
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.42.2 -->
+<!-- SDLC Wizard Version: 1.43.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -3983,7 +3983,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 ```markdown
-<!-- SDLC Wizard Version: 1.42.2 -->
+<!-- SDLC Wizard Version: 1.43.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->

package/hooks/hooks.json CHANGED

@@ -39,6 +39,10 @@
           {
             "type": "command",
             "command": "${CLAUDE_PLUGIN_ROOT}/hooks/model-effort-check.sh"
+          },
+          {
+            "type": "command",
+            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/token-spike-check.sh"
           }
         ]
       }

package/hooks/token-spike-check.sh ADDED

@@ -0,0 +1,60 @@
+#!/bin/bash
+# SessionStart hook — token-spike anomaly detection (ROADMAP #220).
+#
+# Reads CC transcript history, computes per-session token burn, and warns
+# if the last completed session's burn deviates >2σ above the rolling median.
+# Catches silent CC-side regressions (caching bugs, prompt-inflation defaults)
+# that only otherwise surface on the invoice. Reference: Anthropic 2026-04-23
+# post-mortem on the dropped-thinking-blocks caching bug.
+#
+# Gated on `.metrics/` directory existing in the project root — opt-in for
+# consumers, on-by-default for the wizard repo (which already maintains
+# `.metrics/catches.jsonl` for the effectiveness scoreboard).
+#
+# Non-blocking: always exits 0.
+# Token-bloat fix: when both project + plugin register this hook, plugin yields.
+HOOK_DIR="${BASH_SOURCE[0]%/*}"
+[ "$HOOK_DIR" = "${BASH_SOURCE[0]}" ] && HOOK_DIR="."
+# shellcheck disable=SC1091
+source "$HOOK_DIR/_find-sdlc-root.sh"
+dedupe_plugin_or_project "${BASH_SOURCE[0]}" || { [ ! -t 0 ] && cat > /dev/null; exit 0; }
+# Drain stdin (SessionStart sends JSON; we don't need any of it)
+[ ! -t 0 ] && cat > /dev/null
+ROOT="${CLAUDE_PROJECT_DIR:-$PWD}"
+# Gate 1: opt-in via .metrics/ directory
+[ -d "$ROOT/.metrics" ] || exit 0
+# Gate 2: analytics script must exist. Resolve hook-relative first so the
+# wizard repo's hook always finds its own analytics regardless of how
+# CLAUDE_PROJECT_DIR is set (e.g., test fixtures pointing at a tmp dir).
+# Fall back to project-relative for consumer forks that ship the script.
+ANALYTICS=""
+for candidate in \
+    "$HOOK_DIR/../tests/e2e/token-analytics.sh" \
+    "$ROOT/tests/e2e/token-analytics.sh"; do
+    if [ -x "$candidate" ]; then
+        ANALYTICS="$candidate"
+        break
+    fi
+done
+[ -n "$ANALYTICS" ] || exit 0
+# Gate 3: jq is required by the analytics script
+command -v jq > /dev/null 2>&1 || exit 0
+ARGS=(--history "$ROOT/.metrics/token-history.jsonl" --ingest --check)
+# Test override: SDLC_TOKEN_SPIKE_TRANSCRIPT_DIR points the ingest at a
+# fixture directory instead of the real ~/.claude/projects/... path.
+if [ -n "$SDLC_TOKEN_SPIKE_TRANSCRIPT_DIR" ]; then
+    ARGS+=(--transcript-dir "$SDLC_TOKEN_SPIKE_TRANSCRIPT_DIR" --no-skip-recent)
+fi
+OUTPUT=$("$ANALYTICS" "${ARGS[@]}" 2>&1) || true
+[ -n "$OUTPUT" ] && echo "$OUTPUT"
+exit 0
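
The hook above hands `--ingest` to the analytics script, which the changelog describes as appending one record per `session_id` idempotently and serializing concurrent ingests with an atomic `mkdir` lock. That script is not shown in this diff, so the following is a hedged sketch of the general pattern only. The `append_once` name, the `<history>.lock` directory, the retry limits, and the assumption that records are compact JSON containing `"session_id":"<id>"` are all hypothetical.

```shell
#!/bin/bash
# Sketch only: idempotent, lock-serialized append of one record per session_id.
# append_once, the "<history>.lock" directory, and the retry limits are
# hypothetical; records are assumed to be compact single-line JSON.

# append_once HISTORY_FILE SESSION_ID RECORD_JSON
append_once() {
    local history="$1" session_id="$2" record="$3"
    local lock="$history.lock"
    local tries=0
    # mkdir is atomic: exactly one concurrent caller creates the directory.
    until mkdir "$lock" 2>/dev/null; do
        tries=$((tries + 1))
        # Give up after about 5 seconds rather than block session start.
        [ "$tries" -ge 50 ] && return 1
        sleep 0.1
    done
    # Append only if no existing record carries this session_id.
    if ! grep -qF "\"session_id\":\"$session_id\"" "$history" 2>/dev/null; then
        printf '%s\n' "$record" >> "$history"
    fi
    rmdir "$lock"
}
```

Calling `append_once` twice with the same session ID leaves a single line in the history file, and parallel callers queue on the lock directory instead of interleaving writes. A production version would also need to clear a stale lock left behind by a killed process, which this sketch does not attempt.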

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agentic-sdlc-wizard",
-  "version": "1.42.2",
+  "version": "1.43.0",
   "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
   "bin": {
     "sdlc-wizard": "cli/bin/sdlc-wizard.js"

package/skills/update/SKILL.md CHANGED

@@ -131,9 +131,10 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
 ```
 Installed: 1.24.0
-Latest:    1.42.2
+Latest:    1.43.0
 What changed:
+- [1.43.0] Token-spike anomaly detection — ROADMAP #220 closure. New `hooks/token-spike-check.sh` (SessionStart, opt-in via `.metrics/`) ingests CC transcript usage (`input_tokens` / `output_tokens` / `cache_creation_input_tokens` / `cache_read_input_tokens`) into `.metrics/token-history.jsonl`, then warns when the last session's `costly_tokens` (input + cache_creation + output, excluding the cheap cache_read tier) exceeds median + 2σ over a rolling baseline. Catches silent CC-side caching regressions (per Anthropic's 2026-04-23 post-mortem) before they surface on the invoice. Uses MAD-based spread for the median metric so a single baseline outlier doesn't mask the next spike. 14 quality tests in `tests/test-token-spike.sh` (incl. malicious-transcript privacy probe, flat-baseline floor, median-vs-mean contrast, concurrent-ingest mkdir lock).
 - [1.42.2] PreCompact self-heal documented — ROADMAP #209 closure. Added `pr_number` opt-in to all 3 handoff template schemas (skill Step 1; wizard Round 1 + cross-model section). Self-heal logic shipped earlier with #229 but was undocumented, leaving the dead-code path. New `test_handoff_template_documents_pr_number` enforces template/doc parity. Together with #229 (mtime auto-expire) closes the "stuck PENDING handoff blocks /compact forever" footgun from both directions.
 - [1.42.1] CI hygiene fix — skip Claude PR review on wizard self-PRs. 7 self-PRs (v1.39.0–v1.42.0) had shipped with red `review` job (API canary firing on dead credit balance). Treated as "expected" but red normalizes red. Workflow `if:` now skips review on `BaseInfinity/claude-sdlc-wizard` repo only; consumer projects unaffected. 7 quality tests, mutation-verified (== inversion fails).
 - [1.42.0] AGENTS.md interop detection — ROADMAP #205 phase (a). Setup wizard auto-scan now lists AGENTS.md (cross-tool agent-instructions standard, CC issue #6235); new Step 4.5 surfaces a 3-way decision (dual-maintain / merge / skip) when AGENTS.md is detected. Phase (b) write-fresh and phase (d) drift-test deferred. 7 quality tests.