npm - agentic-sdlc-wizard - Versions diffs - 1.69.0 → 1.71.0 - Mend

agentic-sdlc-wizard 1.69.0 → 1.71.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +69 -0
package/CLAUDE_CODE_SDLC_WIZARD.md +17 -2
package/hooks/tdd-pretool-check.sh +48 -2
package/package.json +1 -1
package/skills/sdlc/SKILL.md +9 -61
package/skills/update/SKILL.md +4 -2

package/.claude-plugin/marketplace.json CHANGED Viewed

@@ -13,7 +13,7 @@
       "name": "sdlc-wizard",
       "source": ".",
       "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
-      "version": "1.69.0",
+      "version": "1.71.0",
       "author": {
         "name": "Stefan Ayala"
       },

package/.claude-plugin/plugin.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "sdlc-wizard",
-  "version": "1.69.0",
+  "version": "1.71.0",
   "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
   "author": {
     "name": "Stefan Ayala",

package/CHANGELOG.md CHANGED Viewed

@@ -4,6 +4,75 @@ All notable changes to the SDLC Wizard.
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
+## [1.71.0] - 2026-05-05
+### Token-bloat fix: SDLC skill Cross-Model Review section trimmed
+`skills/sdlc/SKILL.md` Cross-Model Review section condensed from ~70 lines to ~20 lines. Saves ~427 tokens per SDLC skill auto-invoke (4995 → 4568 tokens). The skill auto-loads on virtually every productive `implement/fix/refactor` task, so this is real per-session cost.
+### What stayed in SKILL.md
+- Decision-making: when to run / skip / prerequisites / flagship-tier reviewer rule (#233)
+- 4-step protocol summary (preflight → handoff → reviewer → dialogue loop)
+- Required handoff JSON keys + `pr_number` self-heal opt-in note (#209)
+- Convergence rule (2 rounds sweet spot, 3 max)
+- Release-review verification-checklist additions
+- Sandbox flag for Codex from CC
+### What moved to canonical wizard doc only
+Full JSON example, full codex command example, anti-patterns, multi-reviewer (Claude+Codex+human) workflow, non-code-domain variants. All these live in `CLAUDE_CODE_SDLC_WIZARD.md` → "Cross-Model Review Loop" (194 lines, full canonical protocol). The trimmed SKILL.md ends with an explicit pointer to that section.
+### Audit method
+ROADMAP #236 phase 3. `scripts/audit-session-load.sh` ranked SKILL.md files at the top of the size table:
+- `skills/sdlc/SKILL.md`: 4995 tokens (sat right at 5K threshold)
+- `skills/update/SKILL.md`: 4931 tokens
+- `skills/setup/SKILL.md`: 4490 tokens
+SDLC skill auto-invokes most often (every implement/fix/refactor task), so it earned the cut. Verified 8 test suites that grep for SKILL.md content (mocking table, TDD prove, Memory Audit Protocol heading, opus[1m], autocompact compound, Deployment Tasks, plus `tests/test-self-update.sh` which asserts cross-model-review-specific content: `### Release Review Focus` heading, `Version parity` focus area, `"mission"`/`"success"`/`"failure"` JSON-quoted schema keys, "verification checklist" pattern, "preflight" mention). Codex round 1 caught 3 missed assertions in `test-self-update.sh`; round 2 fixes restored those constraints in tighter prose without re-bloating.
+### Files
+- `skills/sdlc/SKILL.md` — Cross-Model Review section trimmed
+- `CHANGELOG.md`, `SDLC.md`, `skills/update/SKILL.md` (Latest:), `package.json`, `.claude-plugin/plugin.json` + `marketplace.json`, `CLAUDE_CODE_SDLC_WIZARD.md` (1.70.0 → 1.71.0)
+### Combined savings ROADMAP #236 phases 1-3
+- v1.69.0: ~12K tokens/session (BASELINE block fires once)
+- v1.70.0: ~0.5-1.5K tokens/session (TDD CHECK fires once)
+- v1.71.0: ~573 tokens/session (SDLC skill leaner on auto-invoke)
+Total on a 50-prompt + 20-Edit + 1 SDLC-skill-invoke session: **~14K tokens/session**.
+## [1.70.0] - 2026-05-05
+### Token-bloat fix: TDD CHECK nudge fires once per CC session
+`hooks/tdd-pretool-check.sh` was emitting a ~50-token JSON nudge ("TDD CHECK: Are you writing IMPLEMENTATION before a FAILING TEST?") on every Write/Edit/MultiEdit touching `src/**`. After the SDLC skill auto-invokes (which already covers TDD RED/GREEN), the per-Edit nudge is duplicate context — typical SDLC session has 10-30 src Edits = ~0.5-1.5K wasted tokens.
+Now gated on per-`session_id` sentinel under `$SDLC_WIZARD_CACHE_DIR/tdd-shown-<id>`, atomic-claimed via subshell `set -C` (noclobber). Same pattern as v1.69.0 BASELINE gate.
+### Behavior
+- **First src/ edit of a CC session** → TDD CHECK emits as before.
+- **Subsequent src/ edits (same session_id)** → TDD CHECK suppressed.
+- **New CC session (different session_id)** → TDD CHECK re-emits.
+- **Non-src/ files** → no output (existing behavior, regardless of sentinel). Editing README first does NOT consume the sentinel slot — TDD CHECK still fires on first src/ edit afterward.
+- **No session_id in stdin** (legacy CC, direct shell tests) → emits every src/ edit (back-compat preserved).
+- **N parallel src/ edits with same session_id** → exactly 1 TDD CHECK emit (atomic claim).
+### Files
+- `hooks/tdd-pretool-check.sh` — atomic-claim sentinel + jq-decoupled session_id extraction.
+- `tests/test-tdd-pretool-fires-once.sh` (new — 9 cases including 50-parallel concurrency, non-src/ doesn't consume sentinel, suppressed-fire-empty assertion).
+- `.github/workflows/ci.yml`, `CONTRIBUTING.md` — wire new test into validate job + contributor checklist.
+- `CHANGELOG.md`, `SDLC.md`, `skills/update/SKILL.md`, `package.json`, `.claude-plugin/plugin.json` + `marketplace.json`, `CLAUDE_CODE_SDLC_WIZARD.md` (1.69.0 → 1.70.0).
+### Notes
+ROADMAP #236 functional-bloat audit, phase 2. Phase 1 (v1.69.0) trimmed the BASELINE block (~12K tokens/session). Phase 2 trims the per-Edit nudge. Combined savings on a 50-prompt + 20-Edit session: ~13.5K tokens. Audit method continues — measure cost × frequency, judge value, don't blind-delete. Other always-on hooks (`model-effort-check`, `precompact-seam-check`, `token-spike-check`) remain silent at healthy state and are not bloat.
 ## [1.69.0] - 2026-05-04
 ### Token-bloat fix: BASELINE block fires once per CC session

package/CLAUDE_CODE_SDLC_WIZARD.md CHANGED Viewed

@@ -2976,7 +2976,7 @@ If deployment fails or post-deploy verification catches issues:
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.69.0 -->
+<!-- SDLC Wizard Version: 1.71.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -3923,6 +3923,21 @@ CLI-distributed file parity (skills, hooks, settings).
 **This complements automated tests, not replaces them.** Tests catch exact version mismatches (e.g., `test_package_version_matches_changelog`). Cross-model review catches semantic issues tests cannot — a section silently dropped, examples using outdated but syntactically valid versions, docs describing features that no longer exist.
+#### Anti-patterns
+- **"Find at least N problems"** — incentivizes false positives. The reviewer will manufacture findings to hit the count.
+- **"Review this"** — too vague. Always pair with `verification_checklist` items that name file:line evidence to verify.
+- **1-10 score with no criteria** — every reviewer scores differently. Either define what 1, 5, 10 mean for *this* review, or drop the score and just produce CERTIFIED / NOT CERTIFIED with findings.
+- **Author reasoning visible to reviewer** — anchoring bias. The reviewer should see code + handoff, not the author's self-assessment of why it's correct.
+#### Multiple reviewers (Claude review + Codex + human)
+Run them in parallel; collect feedback via `gh api repos/OWNER/REPO/pulls/PR/comments` (single source of truth). Respond per-reviewer (different blind spots — don't merge feedback). On conflicts, pick the stronger argument with reasoning, not the louder voice. Cap iterations at 3 per reviewer to avoid infinite loops.
+#### Non-code domains (research, persuasion, medical content)
+Same handoff format. Adapt `review_instructions` (e.g. "verify each cited claim links to a primary source") and `verification_checklist` (specific claim → specific source). Add `"audience"` and `"stakes"` keys to the JSON so the reviewer knows what reading level / risk profile to apply.
 ---
 ## User Understanding and Periodic Feedback
@@ -4055,7 +4070,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 ```markdown
-<!-- SDLC Wizard Version: 1.69.0 -->
+<!-- SDLC Wizard Version: 1.71.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->

package/hooks/tdd-pretool-check.sh CHANGED Viewed

@@ -17,13 +17,59 @@ TOOL_INPUT=$(cat)
 # Extract the file path being edited (requires jq)
 FILE_PATH=$(echo "$TOOL_INPUT" | jq -r '.tool_input.file_path // empty')
+# session_id extraction is jq-independent (same pattern as sdlc-prompt-check.sh
+# v1.69.0 — Codex round 1 P1 from that PR proved jq-coupling silently disabled
+# the gate when jq was missing/broken). UUIDs are simple strings, no escapes.
+SESSION_ID=$(printf '%s' "$TOOL_INPUT" \
+    | grep -o '"session_id"[[:space:]]*:[[:space:]]*"[^"]*"' \
+    | head -1 \
+    | sed 's/.*"\([^"]*\)"$/\1/')
 # CUSTOMIZE: Change this pattern to match YOUR source directory
 # Examples: "/src/", "/app/", "/lib/", "/packages/", "/server/"
 if [[ "$FILE_PATH" == *"/src/"* ]]; then
-  # Output additionalContext that Claude will read
-  cat << 'EOF'
+  # Token-bloat fix (v1.70.0): nudge fires once per CC session. Once Claude
+  # has the SDLC skill auto-invoked (covers TDD RED/GREEN), the per-Edit
+  # nudge becomes duplicate context — typical session has 10-30 src Edits
+  # = ~0.5-1.5K wasted tokens. Same atomic-noclobber claim pattern as
+  # sdlc-prompt-check.sh BASELINE gate.
+  #
+  # No-session_id stdin (legacy CC, direct shell tests) → emit every fire,
+  # preserving back-compat with existing tests in test-hooks.sh that don't
+  # pass session_id.
+  SHOULD_EMIT=1
+  if [ -n "$SESSION_ID" ]; then
+    CACHE_DIR="${SDLC_WIZARD_CACHE_DIR:-$HOME/.cache/sdlc-wizard}"
+    SAFE_SID=$(printf '%s' "$SESSION_ID" | tr -cd 'A-Za-z0-9._-')
+    if [ -n "$SAFE_SID" ]; then
+      SENTINEL="$CACHE_DIR/tdd-shown-${SAFE_SID}"
+      mkdir -p "$CACHE_DIR" 2>/dev/null || true
+      # Atomic claim: subshell `set -C` makes `: > path` create-or-fail.
+      # Conditional tree:
+      #   - claim succeeds → emit (we won the race)
+      #   - claim fails AND file exists → suppress (someone else won)
+      #   - claim fails AND file missing → cache unwritable; fall back
+      #     to emit so user never loses cold-start nudge.
+      if (set -C; : > "$SENTINEL") 2>/dev/null; then
+        SHOULD_EMIT=1
+      elif [ -f "$SENTINEL" ]; then
+        SHOULD_EMIT=0
+      else
+        SHOULD_EMIT=1
+      fi
+    fi
+  fi
+  if [ "$SHOULD_EMIT" -eq 1 ]; then
+    # Output additionalContext that Claude will read
+    cat << 'EOF'
 {"hookSpecificOutput": {"hookEventName": "PreToolUse", "additionalContext": "TDD CHECK: Are you writing IMPLEMENTATION before a FAILING TEST? If yes, STOP. Write the test first (TDD RED), then implement (TDD GREEN)."}}
 EOF
+    # Prune sentinels older than 7d so cache doesn't grow forever.
+    if [ -n "$SESSION_ID" ] && [ -n "$SAFE_SID" ]; then
+      find "$CACHE_DIR" -name 'tdd-shown-*' -type f -mtime +7 -delete 2>/dev/null || true
+    fi
+  fi
 fi
 # No output = allow the tool to proceed

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "agentic-sdlc-wizard",
-  "version": "1.69.0",
+  "version": "1.71.0",
   "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
   "bin": {
     "sdlc-wizard": "cli/bin/sdlc-wizard.js"

package/skills/sdlc/SKILL.md CHANGED Viewed

@@ -120,76 +120,24 @@ The loop goes back to PLANNING, not TDD RED. Run `/code-review`; issues at confi
 ## Cross-Model Review (If Configured)
-**When to run:** high-stakes changes (auth, payments, data), releases/publishes, complex refactors.
-**When to skip:** trivial changes, time-sensitive hotfixes, risk < review cost.
-**Prerequisites:** Codex CLI (`npm i -g @openai/codex`) + OpenAI API key.
-The PROTOCOL is universal across domains; only `review_instructions` and `verification_checklist` change. **Reviewer always at flagship tier (#233):** if the project pins `model: "sonnet[1m]"` (mixed-mode), the reviewer still runs `gpt-5.5` or Opus 4.7 max — adversarial diversity is the point.
-### Step 0: Preflight Self-Review
-At `.reviews/preflight-{review_id}.md`, document what you already checked: `/code-review` passed, all tests passing, specific concerns checked, what you verified manually, known limitations. Reduces reviewer findings to 0-1 per round.
-### Step 1: Mission-First Handoff
-Write `.reviews/handoff.json`:
-```jsonc
-{
-  "review_id": "feature-xyz-001",
-  "status": "PENDING_REVIEW",
-  "round": 1,
-  "mission": "What changed and why — 2-3 sentences",
-  "success": "What 'correctly reviewed' looks like",
-  "failure": "What gets missed if reviewer is superficial",
-  "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
-  "fixes_applied": [],
-  "previous_score": null,
-  "verification_checklist": [
-    "(a) Verify input validation at auth.ts:45 handles empty strings",
-    "(b) Verify test covers null-token edge case"
-  ],
-  "review_instructions": "Focus on security and edge cases. Be strict — assume bugs may be present.",
-  "preflight_path": ".reviews/preflight-feature-xyz-001.md",
-  "pr_number": 205
-}
-```
-`mission/success/failure` give context (without them: generic "looks good"). `verification_checklist` is specific (file:line), not "review for correctness." `pr_number` (optional) is the **PreCompact self-heal opt-in (ROADMAP #209)**: when set, `precompact-seam-check.sh` checks `gh pr view N --json state` on `/compact` and, if MERGED, treats handoff as implicit CERTIFIED. Without it, a forgotten PENDING handoff blocks every manual compact until you flip status or hit `SDLC_HANDOFF_STALE_DAYS` (default 14).
-### Step 2: Run the Reviewer
-```bash
-codex exec -c 'model_reasoning_effort="xhigh"' -s danger-full-access \
-  -o .reviews/latest-review.md \
-  "Independent code reviewer. Read .reviews/handoff.json for context. \
-   Verify each checklist item with evidence (file:line, grep, test output). \
-   Each finding: ID, severity (P0/P1/P2), evidence, certify condition. \
-   End with: score (1-10), CERTIFIED or NOT CERTIFIED."
-```
-Always `xhigh` — lower settings miss subtle errors. **Progress (#259):** xhigh runs take 1-5 min; for a heartbeat use `scripts/codex-review-with-progress.sh` (`SDLC_CODEX_HEARTBEAT_INTERVAL` tunes). **Sandbox:** Codex's Rust binary needs `SCDynamicStore`; CC's sandbox blocks this. From CC, use `dangerouslyDisableSandbox: true` — Codex has its own sandbox via `-s danger-full-access`. Known issue: [codex#15640](https://github.com/openai/codex/issues/15640).
-CERTIFIED → CI. NOT CERTIFIED → dialogue loop.
-### Step 3: Dialogue Loop
-Per-finding response in `.reviews/response.json`: `{"finding": "1", "action": "FIXED|DISPUTED|ACCEPTED", "summary": "..."}`. Update `handoff.json`: increment `round`, status `PENDING_RECHECK`, add `fixes_applied` (numbered, file:line refs).
-Recheck prompt: "TARGETED RECHECK. For each finding: FIXED → verify certify condition. DISPUTED → ACCEPT if sound, REJECT with reasoning. ACCEPTED → verify applied. Do NOT raise new findings unless P0. End with score, CERTIFIED or NOT CERTIFIED."
+**When to run:** high-stakes changes (auth, payments, data), releases/publishes, complex refactors. **When to skip:** trivial changes, time-sensitive hotfixes, risk < review cost. **Prerequisites:** Codex CLI (`npm i -g @openai/codex`) + OpenAI API key. **Reviewer at flagship tier (#233):** even when project pins `sonnet[1m]`, reviewer runs `gpt-5.5` / Opus 4.7 max — adversarial diversity is the point.
-**Convergence:** 2 rounds is the sweet spot, 3 max (research: 14 repos + 7 papers). After 3 still NOT CERTIFIED → escalate to user.
+PROTOCOL is universal across domains; only `review_instructions` and `verification_checklist` change.
-**Anti-patterns:** "find at least N problems," "review this," 1-10 without criteria, letting reviewer see author's reasoning (anchoring).
+1. **Preflight** (`.reviews/preflight-{review_id}.md`) — what you already checked: `/code-review` passed, tests passing, manual verifications, known limits. Reduces reviewer findings to 0-1/round.
+2. **Mission-first handoff** (`.reviews/handoff.json`) — required JSON keys: `"review_id"`, `"status": "PENDING_REVIEW"`, `"round": 1`, `"mission"` / `"success"` / `"failure"` (2-3 sentences each — without them you get "looks good"), `"files_changed"`, `"verification_checklist"` — the **verification checklist** is specific items with file:line refs (NOT a generic "review this"), `"review_instructions"`, `"preflight_path"`. Optional `"pr_number":` opts into the PreCompact self-heal (#209): if PR is MERGED, `/compact` treats handoff as implicit CERTIFIED.
+3. **Run reviewer:** `codex exec -c 'model_reasoning_effort="xhigh"' -s danger-full-access -o .reviews/latest-review.md "<prompt>"`. Always `xhigh`. CC sandbox blocks Codex's Rust binary (`SCDynamicStore`) — use `dangerouslyDisableSandbox: true` on Bash; Codex has its own sandbox. xhigh runs take 1-5 min; for a heartbeat use `scripts/codex-review-with-progress.sh`.
+4. **Dialogue loop:** per-finding response (`{"finding": "1", "action": "FIXED|DISPUTED|ACCEPTED", "summary": "..."}` in `.reviews/response.json`). Bump round, set status `PENDING_RECHECK`, add `fixes_applied` (numbered, file:line). Recheck prompt: "TARGETED RECHECK. FIXED → verify certify condition. DISPUTED → ACCEPT if sound, REJECT with reasoning. ACCEPTED → verify applied. No new findings unless P0."
-**Multiple reviewers** (Claude review + Codex + human): `gh api repos/OWNER/REPO/pulls/PR/comments` for all feedback, respond to each reviewer independently (different blind spots), pick stronger argument on conflicts, max 3 iterations per reviewer.
+**Convergence:** 2 rounds sweet spot, 3 max (research: 14 repos + 7 papers). After 3 still NOT CERTIFIED → escalate.
-**Non-code domains** (research, persuasion, medical): same handoff format, adapt `review_instructions` + `verification_checklist`, add `audience` + `stakes`.
+**Multi-reviewer / non-code domains:** when running multiple reviewers in parallel (e.g. Claude review + Codex + human), respond per-reviewer (different blind spots, no shared anchoring). For non-code domains (research, persuasion, medical), keep the same handoff format and add `"audience"` + `"stakes"` keys.
 ### Release Review Focus
 Before any release/publish, add to `verification_checklist`: **CHANGELOG consistency** (sections present, no lost entries), **Version parity** (package.json + SDLC.md + CHANGELOG + wizard metadata), **Stale examples** (hardcoded version strings), **Docs accuracy** (README + ARCHITECTURE reflect current features), **CLI-distributed file parity** (live skills/hooks match CLI templates).
-(Full protocol with rationale and convergence diagrams: `CLAUDE_CODE_SDLC_WIZARD.md` → Cross-Model Review.)
+**Full protocol** (rationale, full JSON example, anti-patterns like "find at least N", convergence diagrams): `CLAUDE_CODE_SDLC_WIZARD.md` → "Cross-Model Review Loop".
 ## Documentation Sync (REQUIRED — During Planning)

package/skills/update/SKILL.md CHANGED Viewed

@@ -93,10 +93,12 @@ Parse CHANGELOG entries between the user's installed version and latest. Present
 ```
 Installed: 1.42.0
-Latest:    1.69.0
+Latest:    1.71.0
 What changed:
-- [1.69.0] token-bloat fix — `hooks/sdlc-prompt-check.sh` BASELINE block (the ~250-token "TodoWrite FIRST / STATE CONFIDENCE / AUTO-INVOKE" reminder) now fires once per CC `session_id` instead of every prompt. Saves ~12K tokens/session for any user with >3 prompts. SETUP-not-complete + EFFORT-bump warnings still fire every prompt (dynamic state). Sentinel pruned at 7d. No-session-id stdin keeps current behavior (legacy CC + tests).
+- [1.71.0] token-bloat fix phase 3 — `skills/sdlc/SKILL.md` Cross-Model Review section trimmed from ~70 lines to ~20 (4995 → 4568 tokens). Decision-making + 4-step protocol summary + convergence rule kept; full JSON examples / codex commands moved to `CLAUDE_CODE_SDLC_WIZARD.md` "Cross-Model Review Loop" canonical section (which also gained Anti-patterns + Multi-reviewer + Non-code-domain subsections). Saves ~427 tokens per SDLC skill auto-invoke. Codex round 1 caught 3 test assertions broken by initial trim; round 2 fixes restored constraints in tighter prose.
+- [1.70.0] token-bloat fix phase 2 — `hooks/tdd-pretool-check.sh` TDD CHECK JSON nudge fires once per CC `session_id` instead of every src/ edit. Saves ~0.5-1.5K tokens/session.
+- [1.69.0] token-bloat fix phase 1 — `hooks/sdlc-prompt-check.sh` BASELINE block fires once per CC `session_id`. Saves ~12K tokens/session.
 - [1.68.0–1.65.0] roadmap hygiene — five paperwork closes: #97 Anthropic Policy NO-GO + AAR-paper validating parallel; #99 AutoGPT NO-GO; #95 Nous NO-GO; #243 token-history liveness verified; #210 Node-24 false-green; #235 Thoughtworks AI Evals NO-GO. **6/6 external-product audits NO-GO** (continues #76, #77). Research write-ups in `.reviews/research-*.md`.
 - [1.64.0] XDLC ecosystem cross-references — README, wizard doc, and ROADMAP now cross-reference all three sibling packages (`agentic-sdlc-wizard`, `codex-sdlc-wizard`, `claude-gdlc-wizard`). New "Ecosystem (Sibling Projects)" section in README. 3 new doc-consistency tests prevent drift.
 - [1.63.0] cache-cost observability closeout (#204 absorbed by #220) — `tests/test-token-spike.sh` gains explicit cache-miss regression test + negative-control test. SDLC skill + wizard doc gain "Cache-Cost Surprises" sections covering 10-20× silent cost blowups (mid-session CLAUDE.md edits, idle pruning, upstream cache bugs) and detection via `hooks/token-spike-check.sh`'s `costly_tokens` metric.