npm - @oneie/claude - Versions diffs - 0.1.0 → 0.2.1 - Mend

@oneie/claude 0.1.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/commands/do.md +62 -7
package/package.json +1 -1
package/scripts/do-analyze.sh +38 -28
package/scripts/do-auto.sh +117 -0
package/scripts/do-recon-cache.sh +50 -0
package/scripts/do-smoke.sh +7 -5
package/scripts/w1-recon.ts +139 -0
package/scripts/w4-rubric.ts +142 -0

package/commands/do.md CHANGED Viewed

@@ -31,8 +31,11 @@ Comprehensive and fast are not in tension: the **tier decides which gates run**,
 |---|---|---|
 | Tier prune | Step 1 | PATCH walks `code` only; a gate never runs above the phases it guards |
 | Tool ladder | every decision | bash → Haiku → Sonnet → Opus; stop at the first that decides |
+| **Context reset** | cycle boundary | multi-cycle plans run each cycle in a fresh subprocess (`do-auto.sh`) — prior-cycle chatter (~15–25k/cycle) never accumulates; ~90–150k saved on a 6-cycle plan |
 | **TRIVIAL fast-path** | BUILD | **0 agent spawns** — read ≤3 files inline, edit, bash verify, inline rubric, 1 learnings line |
-| Recon cache | W1 | KV `recon:{sha}:{sha}` hit (<14d) → skip re-read, `mark(recon:hit)` |
+| Recon cache | W1 | `do-recon-cache.sh check` — sha-keyed local store, <14d → **0 tokens**, skip all spawns (saves ~12k/recon) |
+| W1 prefix cache | W1 miss | `w1-recon.ts` caches the ~5,400-token rules+agent prefix across Haiku calls (~4,900 saved/repeat) |
+| W4 rubric cache | W4 | `w4-rubric.ts` caches the ~10,900-token rubric+spec block across 6 Haiku calls — 1 write + 5 reads (~46k saved/run) |
 | Verify-only-changed | W3→W4 | dependency cone from `git diff`; full verify only at cycle close |
 | Cross-cycle pre-warm | W4 close | one fire-and-forget Haiku pre-reads the next cycle's W1 targets |
 | Recon cap | W1 | 400-word receipt; high-signal slices → `.w2-spec.json`, not the raw dump |
@@ -72,9 +75,10 @@ C2·C3·C4 are siblings → their W1/W2 run concurrently and their W3a edits mer
 | Input | Treat as | Entry |
 |---|---|---|
 | bare text (`"add usage billing"`) | **IDEA** (default) | walk the spine from the top |
-| `<slug>` or a `plans/*.md` path | existing work | resolve its slug, walk from the first gap |
-| `--auto` | run all cycles of a todo continuously | BUILD engine, trust-aware |
+| `<slug>` or a `plans/*.md` path | existing work | resolve its slug; ≥2 incomplete cycles → auto context-isolated loop, else walk from the first gap inline |
 | `--wave N` | force one wave of a todo | BUILD engine |
+| `--next-cycle` | *(internal)* run one batch then exit — the loop's recursion guard, set by `do-auto.sh`; never typed by a human | BUILD engine |
+| `--auto` | *(legacy alias)* same as a bare multi-cycle `/do <slug>` — the loop is automatic now | BUILD engine, trust-aware |
 Derive the **slug** from the idea: kebab-case, 2–4 words (`"add usage billing"` → `usage-billing`). Every artifact on the spine is keyed by this slug.
@@ -148,7 +152,40 @@ After the spine: **LEARN / close** — write one `learnings.md` entry (slug, tie
 **Plan-outcome kill-switch (zero LLM).** Re-run the plan's `outcome:` command every cycle close. Exit 0 → the goal is met: remaining cycles enter justify-or-drop (default drop), `--auto` halts. Over the tier ceiling → `warn` + justify. Weak rubric 3× → halt, the *plan* is wrong, not the code. Goal-drift (outcome fails 3× with green rubrics) → force a re-plan.
-**`--auto` continuation (trust-aware, reads `.do-trust.json`).** `trusted` (composite ≥ 0.85 × 3+) → start the next cycle immediately. `standard` (0.65–0.85) → continue. `cautious` (< 0.65 × 2) → halt, run `/do next` to resume. Absent file → `standard`.
+**One command, complexity-sized.** `/do <anything>` is the only thing a human types — an idea, a slug, or a `plans/<slug>-todo.md` path. The tier sizes the work and the spine prunes itself: a typo is edited and verified inline (one cycle, no loop); a feature writes its promise, spec, todo, tests, and docs and then runs every cycle. The human never picks the mode, never types a flag, never runs a second command.
+**Context isolation is automatic for multi-cycle plans.** When `/do` resolves to a todo with **≥2 incomplete cycles**, it does not run them inline — it hands the loop to the internal engine so every cycle gets a **fresh context**:
+```bash
+# /do runs this itself — the user never types it
+bun .claude/scripts/do-auto.sh <slug>
+```
+`do-auto.sh` loops: each iteration spawns a clean `/do <slug> --next-cycle` that runs exactly one batch (W1→W4), ticks its boxes, writes state, and exits. Because each cycle starts from near-zero context, prior-cycle recon/edit/verify chatter never accumulates. The main session just kicks off the loop and reports the close.
+A **single** incomplete cycle (or a PATCH/FIX) runs inline — there's nothing to isolate from, and spawning a subprocess would cost more than it saves.
+**`--next-cycle` is internal.** It means "run one batch, tick boxes, exit — do **not** loop." Only `do-auto.sh` passes it; it is the recursion guard that keeps a fresh `/do` from re-entering the loop. A human never types it.
+State that survives the reset (all on disk — the loop is stateless in memory):
+- `plans/<slug>-todo.md` — checked boxes are the progress bar; the next `/do --next-cycle` reads them and skips completed cycles automatically
+- `.do-trust.json` — trust level and consecutive score history (drives auto-continue vs halt)
+- `.w4-improvements.json` — open items that become mandatory W1 targets next cycle
+- `.w2-spec.json` / `.w3-receipts.json` — write-once state for soft-resume within a cycle
+What is intentionally discarded each reset: prior-cycle conversation and agent output prose. The checkboxes captured everything that matters; the prose was scaffolding.
+**Trust gates the loop (reads `.do-trust.json`).** `trusted` (composite ≥ 0.85 × 3+) → next cycle fires immediately. `standard` (0.65–0.85) → continue. `cautious` (< 0.65 × 2) → `do-auto.sh` halts and tells the human to re-run `/do <slug>` once the issue is fixed. Absent file → `standard`.
+**Token math for a 6-cycle COMPLEX plan:**
+| Mode | Context per cycle | Cumulative context |
+|---|---|---|
+| inline (everything in one session) | +15–25k per cycle | ~90–150k by cycle 6 |
+| context-isolated (the default) | ~1–2k (fresh invocation) | ~1–2k every cycle |
+| **Saving** | — | **~90–150k tokens** |
+This compounds with the W1 cache and W4 rubric cache: a cycle 6 run with context isolation + both SDK caches costs roughly **1/10th** of the same cycle run inline in a long session.
 ---
@@ -165,7 +202,19 @@ echo "$RESOLVED" | jq -se 'all(.doc_only)' >/dev/null \
 TSC=$(bunx tsc --noEmit 2>&1 | grep -c "error TS" || echo 0)   # write .w0-baseline.json {tscErrors, loc, tests}
 ```
-**W1 — Recon.** Read `.w4-improvements.json` open items first — they're mandatory recon targets. Check the recon cache (KV `recon:{sha}:{sha}` <14d) → hit skips the re-read. ≤5 files → read inline. ≥6 → spawn `w1-recon` (Haiku · low) for ALL files in ONE message; each returns structured findings; skip `relevance_score < 0.4`; all-filtered → halt and broaden (zero-findings guard). Receipt capped at 400 words; persist high-signal slices into `.w2-spec.json`. *(The `w1-recon` agent carries the two-track existing-code + primitive-inventory contract.)*
+**W1 — Recon.** Read `.w4-improvements.json` open items first — they're mandatory recon targets.
+**Cache check (zero tokens):**
+```bash
+.claude/scripts/do-recon-cache.sh check $CYCLE_TARGET_PATHS 2>/dev/null && RECON_CACHE_HIT=true || true
+```
+Hit → load findings from stdout, skip all agent spawns. Also run `do-recon-cache.sh prune` once per session to evict entries older than 14 days.
+Miss → ≤5 files → read inline. ≥6 → run the SDK script (prompt-cached rules + agent-prompt block, writes result to `.w1-cache/`):
+```bash
+bun .claude/scripts/w1-recon.ts --targets "$FILES" --mode RECON
+```
+Script exits 2 if `ANTHROPIC_API_KEY` is absent → fall back to spawning `w1-recon` agent (Haiku · low) for ALL files in ONE message. Either path: skip `relevance_score < 0.4`; all-filtered → halt and broaden (zero-findings guard). Receipt capped at 400 words; persist high-signal slices into `.w2-spec.json`. *(The `w1-recon` agent carries the two-track existing-code + primitive-inventory contract.)*
 **W2 — Decide (never delegated — Opus · high).** Write the goal/deliverable/UX gate (3 sentences) first. Then: reconcile names against `dictionary.md`; run the **compress check** before any new primitive (name 3 existing primitives that compose it → `compose` removes it from the diff, `new` needs a one-line justification + same-diff doc edit). The pre-mortem + trade-offs were already captured at the `spec` stop (`template-spec.md`) — carry the failure modes forward as test cases, don't redo them. Classify each W1 finding Act / Keep / Defer. Output diff specs + write `.w2-spec.json` (+ `.w2-doc-plan.json` if a doc trigger fires). *(The `w2-decide` agent carries the compose-target table — which canonical doc to check per primitive type — plus context-triggers for surgical doc injection.)*
@@ -178,7 +227,11 @@ $(cycle.demo.command)                       # the goal gate — exit 0 = pass
 # doc-sync gate (reads .w2-doc-plan.json): stale-name=0, links ok, contract mtime current
 DELTA_TSC=$((TSC_NOW - TSC_BASELINE))       # hard gate: ≤ 0, no new type errors ever
 ```
-Rubric (inline for TRIVIAL/SIMPLE; for COMPLEX spawn 6 Haiku · medium — goal-fit, security, stability, simplicity, speed, adversarial — alongside `pr-review-toolkit` (code-reviewer, silent-failure-hunter, type-design-analyzer) + `find-bugs` + `accessibility` on UI):
+Rubric (inline for TRIVIAL/SIMPLE; for COMPLEX — run the SDK script first (6 Haiku in parallel, rubric + spec block cached across all calls):
+```bash
+bun .claude/scripts/w4-rubric.ts --files "$(git diff HEAD --name-only | tr '\n' ',')"
+```
+Script exits 2 if key absent → fall back to spawning 6 Haiku agents · medium. Either path runs alongside `pr-review-toolkit` (code-reviewer, silent-failure-hunter, type-design-analyzer) + `find-bugs` + `accessibility` on UI):
 ```
 composite = 0.35·goal-fit + 0.20·security + 0.20·stability + 0.15·simplicity + 0.10·speed
 gate: composite ≥ 0.65  AND  goal-fit ≥ 0.50 (hard)  AND  no adversarial > 0.5  AND  delta_tsc ≤ 0
@@ -206,12 +259,14 @@ PATCH clears {1,2,3,8}. FEATURE clears all eight.
 - **Closed loop** — every cycle ends in `mark`/`warn`, never a silent return.
 - **Cheapest tool that decides wins** — a phase that needs no LLM call is the best kind.
 - **W4 max 3 loops**, then halt and report.
+- **One command.** A human types `/do <anything>` and nothing else. Complexity sizing, spine pruning, and (for multi-cycle plans) the context-isolated loop are all automatic — never a flag, never a second command.
+- **Every cycle gets a fresh context.** A multi-cycle `/do` runs each cycle in a clean subprocess via the internal loop. Prior-cycle conversation is noise; the todo checkboxes are the only state that carries forward.
 ---
 ## Available to /do — the toolbox
-**Scripts** (`.claude/scripts/`): `do-tier.sh` (tier + pruned spine + classifier + ceiling) · `do-folder.sh` (folder-aware verify/build) · `do-survey.sh` (reuse verdict) · `do-reconcile.sh` (substrate dim/verb/dead-name gate) · `do-analyze.sh` (spec↔todo coverage gate) · `do-prove.sh` (surface-detect proof) · `do-smoke.sh` (deterministic outcome).
+**Scripts** (`.claude/scripts/`): `do-auto.sh` (*internal* — the context-isolated loop `/do` drives for multi-cycle plans) · `do-tier.sh` (tier + pruned spine + classifier + ceiling) · `do-folder.sh` (folder-aware verify/build) · `do-survey.sh` (reuse verdict) · `do-reconcile.sh` (substrate dim/verb/dead-name gate) · `do-analyze.sh` (deliverable↔cycle coverage gate) · `do-prove.sh` (surface-detect proof) · `do-smoke.sh` (deterministic outcome) · `w1-recon.ts` (prompt-cached recon) · `w4-rubric.ts` (cached parallel rubric).
 **Templates**: `text/template-frame.md` (promise) · `plans/template-spec.md` (design + pre-mortem + decisions) · `plans/template-todo.md` (plan + parallel budget + testing policy) · `plans/agent-template.md` (agent definition).

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@oneie/claude",
-  "version": "0.1.0",
+  "version": "0.2.1",
   "description": "ONE Claude Code plugin — /do lifecycle, W1-W4 BUILD engine, ONE substrate via MCP, signals, and rules",
   "files": [
     ".claude-plugin",

package/scripts/do-analyze.sh CHANGED Viewed

@@ -1,7 +1,15 @@
 #!/usr/bin/env bash
 # do-analyze.sh — ANALYZE coverage gate (ex-Spec-Kit /analyze), read-only. Run at the
-# todo→code boundary, BEFORE build spends worktree tokens. Builds the D#↔C# coverage matrix
-# and the AC→test check. CRITICAL (uncovered deliverable) → exit 1, blocks build. Never edits.
+# todo→code boundary, BEFORE build spends worktree tokens. Verifies the plan against the
+# current template-todo.md contract: a `deliverables:` frontmatter block, and per-cycle
+# `**Deliverable:**` + (`demo:` | `**Cycle outcome:**`) lines.
+#
+# CRITICAL (exit 1, blocks build):
+#   - deliverables: frontmatter is empty            → plan ships nothing
+#   - a ## C# cycle has no **Deliverable:** line     → cycle ships nothing (coverage hole)
+# HIGH (warn, does not block):
+#   - a cycle has no demo:/Cycle outcome line        → AC→test gap (no planned verification)
+# Never edits.
 # Usage:  do-analyze.sh <todo.md>
 set -euo pipefail
 todo="${1:?usage: do-analyze.sh <todo.md>}"
@@ -9,34 +17,36 @@ todo="${1:?usage: do-analyze.sh <todo.md>}"
 crit=0; high=0
-# 1. coverage: every deliverable line (- D#) must cite at least one cycle (C#)
-uncovered=0
-while IFS= read -r line; do
-  did=$(printf '%s' "$line" | grep -oE 'D[0-9]+' | head -1)
-  if ! printf '%s' "$line" | grep -qE 'C[0-9]+'; then
-    echo "CRITICAL: $did maps to no cycle"; crit=1; uncovered=$((uncovered+1))
-  fi
-done < <(grep -E '^[[:space:]]*- D[0-9]+ ' "$todo")
+# 1. plan ships something: deliverables: frontmatter has >=1 real entry (- kind: ...)
+ndel=$(awk '
+  /^deliverables:/ {f=1; next}
+  f && /^[a-z_]+:/ {f=0}
+  f && /^[[:space:]]*-[[:space:]]/ {n++}
+  END {print n+0}
+' "$todo")
+if [ "$ndel" -eq 0 ]; then
+  echo "CRITICAL: deliverables: frontmatter is empty — plan ships nothing"; crit=1
+fi
-ndel=$(grep -cE '^[[:space:]]*- D[0-9]+ ' "$todo" || true)
-ncyc=$(grep -cE '^## C[0-9]+ ' "$todo" || true)
-# 2. orphan cycles: every ## C# section should be cited by some deliverable
-cited=$(grep -oE 'C[0-9]+' "$todo" | sort -u)
-for c in $(grep -oE '^## C[0-9]+ ' "$todo" | grep -oE 'C[0-9]+'); do
-  # a cycle is cited if it appears outside its own header (i.e. in a deliverable line)
-  hits=$(grep -E "^[[:space:]]*- D[0-9]+ .*\b$c\b" "$todo" | wc -l | tr -d ' ')
-  [ "$hits" -eq 0 ] && { echo "HIGH: cycle $c has no deliverable (orphan)"; high=$((high+1)); }
-done
-# 3. AC→test: every cycle section should carry a demo: line (DoD row 2, planned half)
-nodemo=0
-for c in $(grep -oE '^## C[0-9]+ ' "$todo" | grep -oE 'C[0-9]+'); do
-  block=$(awk "/^## $c /{f=1} f&&/^## C[0-9]+ /&&!/^## $c /{if(seen)exit} {if(f)print; if(/^## $c /)seen=1}" "$todo")
-  printf '%s' "$block" | grep -qiE 'demo:' || { echo "HIGH: $c has no planned test (demo: line)"; high=$((high+1)); nodemo=$((nodemo+1)); }
+# 2. every cycle ships a deliverable AND has a planned test. Walk each ## C# section
+#    (header to the next ## header — ### W-waves and **bold** lines stay inside).
+ncyc=0; nodeliv=0; notest=0
+for c in $(grep -oE '^## C[0-9]+' "$todo" | grep -oE 'C[0-9]+'); do
+  ncyc=$((ncyc+1))
+  block=$(awk -v c="$c" '
+    $0 ~ "^## "c"( |$)" {f=1; print; next}
+    f && /^## / {exit}
+    f {print}
+  ' "$todo")
+  printf '%s' "$block" | grep -qE '^\*\*Deliverable' || {
+    echo "CRITICAL: $c has no **Deliverable:** line (cycle ships nothing — coverage hole)"; crit=1; nodeliv=$((nodeliv+1))
+  }
+  printf '%s' "$block" | grep -qiE 'demo:|^\*\*Cycle outcome' || {
+    echo "HIGH: $c has no planned test (demo: or **Cycle outcome:** line)"; high=$((high+1)); notest=$((notest+1))
+  }
 done
 echo "----"
-echo "coverage: $ndel deliverables / $ncyc cycles · uncovered=$uncovered · no-demo=$nodemo · high=$high"
+echo "coverage: $ndel deliverables / $ncyc cycles · cycles-without-deliverable=$nodeliv · cycles-without-test=$notest · high=$high"
 if [ "$crit" -ne 0 ]; then echo "ANALYZE: CRITICAL — fix coverage before build"; exit 1; fi
-echo "ANALYZE: pass (coverage 100%)"; exit 0
+echo "ANALYZE: pass (every cycle ships + is testable)"; exit 0

package/scripts/do-auto.sh ADDED Viewed

@@ -0,0 +1,117 @@
+#!/usr/bin/env bash
+# do-auto.sh — INTERNAL context-isolated loop for multi-cycle /do plans.
+#
+# Not a user command. `/do <slug>` invokes this itself when the resolved todo has
+# >=2 incomplete cycles. Each cycle runs in a FRESH `claude -p "/do <slug> --next-cycle"`
+# subprocess, resetting the context window to near-zero. State passes entirely through
+# disk: the todo file (checked boxes), .do-trust.json (trust), .w4-improvements.json.
+#
+# Why it exists: run inline, each closed cycle leaves ~15-25k tokens of recon/edit/verify
+# chatter in context that the next cycle has no use for — ~90-150k tokens of noise by
+# cycle 6. Fresh-per-cycle eliminates it. The todo checkboxes are the only carried state.
+#
+# Internal usage (driven by do.md, not a human):
+#   bun .claude/scripts/do-auto.sh <slug> [--max-cycles N] [--dry-run]
+set -euo pipefail
+SLUG=""
+MAX_CYCLES=30
+DRY_RUN=false
+while [ "$#" -gt 0 ]; do
+  case "$1" in
+    --max-cycles) MAX_CYCLES="$2"; shift 2 ;;
+    --dry-run)    DRY_RUN=true; shift ;;
+    -*)           echo "unknown flag: $1" >&2; exit 1 ;;
+    *)            SLUG="$1"; shift ;;
+  esac
+done
+[ -z "$SLUG" ] && { echo "usage: do-auto.sh <slug> [--max-cycles N] [--dry-run]" >&2; exit 1; }
+# Validate slug is safe (kebab-case only) before it touches any shell string.
+# Prevents command injection if the slug ever contains metacharacters.
+if ! printf '%s' "$SLUG" | grep -qE '^[a-zA-Z0-9][a-zA-Z0-9_-]*$'; then
+  echo "[do-auto] unsafe slug (must be alphanumeric + hyphens/underscores): $SLUG" >&2; exit 1
+fi
+# Trust boundary: this script runs in the developer's own workspace, invoked by /do
+# which resolves the slug from a plans/ file the developer controls. The spawned
+# claude subprocess uses --dangerously-skip-permissions because it is non-interactive
+# — it cannot prompt the human for tool approvals. The workspace is the isolation
+# boundary. Do not expose this script to untrusted input or run it in shared environments.
+TODO="plans/${SLUG}-todo.md"
+TRUST=".do-trust.json"
+[ -f "$TODO" ] || { echo "[do-auto] $TODO not found — run /do $SLUG first" >&2; exit 1; }
+_remaining() {
+  # Count cycles in the Status section that are open ([ ]) or in-flight ([~]) — i.e. not [x].
+  # Pattern starts with '\[' (not '-') so grep never mistakes it for an option flag.
+  grep -cE '\[[ ~]\] C[0-9]+' "$TODO" 2>/dev/null || echo 0
+}
+_trust() {
+  jq -r '.level // "standard"' "$TRUST" 2>/dev/null || echo "standard"
+}
+_plan_done() {
+  # Plan close item is ticked. -e marks the pattern explicitly (dash-safe).
+  grep -qe '\[x\].*Plan outcome command exits 0' "$TODO" 2>/dev/null
+}
+i=0
+prev_remaining=-1
+stall=0
+while [ "$i" -lt "$MAX_CYCLES" ]; do
+  if _plan_done; then
+    echo "[do-auto] plan complete — outcome line ticked"
+    break
+  fi
+  remaining=$(_remaining)
+  if [ "$remaining" -eq 0 ]; then
+    echo "[do-auto] no open cycles — plan complete"
+    break
+  fi
+  # Stall guard: a cycle that ticks no box means the subprocess halted/errored.
+  # Retrying the identical state forever just burns subprocesses — halt after 2 stalls.
+  if [ "$remaining" -eq "$prev_remaining" ]; then
+    stall=$((stall + 1))
+    if [ "$stall" -ge 2 ]; then
+      echo "[do-auto] no progress for 2 iterations (${remaining} cycle(s) stuck) — halting." >&2
+      echo "[do-auto] The last cycle ticked no checkbox. Inspect $TODO, fix the blocker, then re-run /do $SLUG." >&2
+      exit 1
+    fi
+  else
+    stall=0
+  fi
+  prev_remaining=$remaining
+  trust=$(_trust)
+  if [ "$trust" = "cautious" ]; then
+    echo "[do-auto] trust=cautious — halting. Fix the issue, then re-run /do $SLUG."
+    exit 1
+  fi
+  i=$((i + 1))
+  echo ""
+  echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+  echo "[do-auto] iteration ${i} — ${remaining} cycle(s) remaining — trust=${trust}"
+  echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+  echo ""
+  if $DRY_RUN; then
+    echo "[do-auto] DRY RUN: would invoke — claude --dangerously-skip-permissions -p \"/do $SLUG --next-cycle\""
+    break
+  fi
+  # Fresh context: no --resume, no --continue. Each cycle is a clean slate.
+  claude --dangerously-skip-permissions -p "/do $SLUG --next-cycle"
+done
+if [ "$i" -ge "$MAX_CYCLES" ]; then
+  echo "[do-auto] hit --max-cycles $MAX_CYCLES — halting"
+  exit 1
+fi

package/scripts/do-recon-cache.sh ADDED Viewed

@@ -0,0 +1,50 @@
+#!/usr/bin/env bash
+# do-recon-cache.sh — sha-keyed W1 recon cache, 14-day TTL, local file store
+#
+# check <file>...   → if ALL files are cached and fresh, print findings to stdout, exit 0
+#                     else exit 1 (caller proceeds to agent spawn)
+# write <sha> <file> → store findings file under CACHE_DIR/<sha>.json
+# prune              → delete entries older than MAX_AGE_DAYS
+set -euo pipefail
+CACHE_DIR="${CACHE_DIR:-.w1-cache}"
+MAX_AGE_DAYS=14
+cmd="${1:-}"; shift || true
+_sha() { git hash-object "$1" 2>/dev/null || sha256sum "$1" | cut -c1-40; }
+case "$cmd" in
+  check)
+    [ "$#" -eq 0 ] && exit 1
+    mkdir -p "$CACHE_DIR"
+    all_findings=""
+    for path in "$@"; do
+      [ -f "$path" ] || continue
+      sha=$(_sha "$path")
+      cache_file="$CACHE_DIR/${sha}.json"
+      [ -f "$cache_file" ] || exit 1
+      # stale?
+      if find "$cache_file" -mtime "+${MAX_AGE_DAYS}" -print 2>/dev/null | grep -q .; then
+        rm -f "$cache_file"
+        exit 1
+      fi
+      all_findings="${all_findings}$(cat "$cache_file")"$'\n\n'
+    done
+    printf '%s' "$all_findings"
+    exit 0
+    ;;
+  write)
+    sha="$1"; src="$2"
+    mkdir -p "$CACHE_DIR"
+    cp "$src" "$CACHE_DIR/${sha}.json"
+    ;;
+  prune)
+    [ -d "$CACHE_DIR" ] || exit 0
+    count=$(find "$CACHE_DIR" -name '*.json' -mtime "+${MAX_AGE_DAYS}" -print -delete | wc -l)
+    echo "pruned ${count} stale entries from $CACHE_DIR"
+    ;;
+  *)
+    echo "usage: do-recon-cache.sh check <file>... | write <sha> <file> | prune" >&2
+    exit 1
+    ;;
+esac

package/scripts/do-smoke.sh CHANGED Viewed

@@ -27,10 +27,12 @@ echo "== C9 do-survey =="
 "$S/do-survey.sh" zxqwfoobar 2>/dev/null | grep -q 'VERDICT: build' && ok "novel → build" || no "novel"
 "$S/do-survey.sh" signal 2>/dev/null | grep -qE 'VERDICT: (extend|expose)' && ok "existing → extend/expose" || no "existing"
-echo "== C14 do-analyze =="
-"$S/do-analyze.sh" plans/do-loop-todo.md >/dev/null 2>&1 && ok "this plan → 100% coverage" || no "plan coverage"
-tmp=$(mktemp); printf 'deliverables:\n  - D1 x (C1)\n  - D2 orphan no cycle\n## C1 — y\n' > "$tmp"
-"$S/do-analyze.sh" "$tmp" >/dev/null 2>&1 && no "uncovered should fail" || ok "uncovered deliverable → CRITICAL exit 1"; rm -f "$tmp"
+echo "== C14 do-analyze (current template format: deliverables: + **Deliverable:** + demo/Cycle outcome) =="
+"$S/do-analyze.sh" plans/tools-router-todo.md >/dev/null 2>&1 && ok "real plan → every cycle ships + testable" || no "real plan coverage"
+tmp=$(mktemp); printf 'deliverables:\n  - api: foo.ts — does X (C1)\nsource_of_truth:\n## C1 — y\n(no deliverable line)\n' > "$tmp"
+"$S/do-analyze.sh" "$tmp" >/dev/null 2>&1 && no "cycle w/o deliverable should fail" || ok "cycle ships nothing → CRITICAL exit 1"; rm -f "$tmp"
+tmp=$(mktemp); printf 'deliverables:\n  - api: foo.ts — does X (C1)\nsource_of_truth:\n## C1 — y\n**Deliverable:** `foo.ts`\n**Cycle outcome:** vitest passes\n' > "$tmp"
+"$S/do-analyze.sh" "$tmp" >/dev/null 2>&1 && ok "well-formed mini-plan → pass" || no "well-formed should pass"; rm -f "$tmp"
 echo "== C11 do-prove =="
 pv=$("$S/do-prove.sh" one.ie/web/src/components/X.tsx 2>/dev/null); echo "$pv" | grep -q 'surface: frontend' && ok "frontend → /browser" || no "frontend"
@@ -46,7 +48,7 @@ echo '{"receiver":"cost:cycle","data":{"tokens":{"input":1},"model":"sonnet","co
 echo "== /do-loop removed (single front door) =="
 # scan commands+agents only (the engine surface); the 'no separate' note is the lone allowed mention
 if grep -rn '/do-loop' .claude/commands .claude/agents 2>/dev/null | grep -vq 'no separate'; then no "/do-loop still referenced"; else ok "/do-loop gone (only the 'no separate' note)"; fi
-[ -f .claude/commands/do-lifecycle.md ] && ok "do-lifecycle.md is the spec" || no "do-lifecycle.md missing"
+[ -f .claude/commands/do.md ] && ok "do.md is the spec (single front door)" || no "do.md missing"
 echo "== C5 seed lifecycle (one.ie/web vitest) =="
 if command -v bunx >/dev/null 2>&1; then

package/scripts/w1-recon.ts ADDED Viewed

@@ -0,0 +1,139 @@
+#!/usr/bin/env bun
+/**
+ * w1-recon.ts — W1 recon with Anthropic SDK prompt caching
+ *
+ * Caches the static rules + agent-prompt block so repeated W1 calls on the
+ * same codebase skip re-tokenizing those layers (~1k tokens / call).
+ * Also writes findings to .w1-cache/ (sha-keyed) so the next cycle is free.
+ *
+ * Usage:
+ *   bun .claude/scripts/w1-recon.ts --targets <file,...> [--mode RECON|SURVEY|INVESTIGATE]
+ *
+ * Falls back gracefully: if ANTHROPIC_API_KEY is missing, exits 2 (caller uses Agent spawn).
+ */
+import Anthropic from "@anthropic-ai/sdk";
+import { readFileSync, existsSync, mkdirSync, writeFileSync } from "fs";
+import { execFileSync } from "child_process";
+import { join } from "path";
+const SCRIPTS_DIR = new URL(".", import.meta.url).pathname;
+const AGENTS_DIR = join(SCRIPTS_DIR, "../agents");
+const RULES_DIR = join(SCRIPTS_DIR, "../rules");
+const CACHE_DIR = ".w1-cache";
+if (!process.env.ANTHROPIC_API_KEY) {
+  process.stderr.write("[w1-recon] no ANTHROPIC_API_KEY — falling back to agent spawn\n");
+  process.exit(2);
+}
+function parseArgs() {
+  const args = process.argv.slice(2);
+  const out = { targets: [] as string[], mode: "RECON" };
+  for (let i = 0; i < args.length; i++) {
+    if (args[i] === "--targets" && args[i + 1]) out.targets = args[++i].split(",").filter(Boolean);
+    if (args[i] === "--mode" && args[i + 1]) out.mode = args[++i].toUpperCase();
+  }
+  return out;
+}
+// Recon is read-only mapping — it needs the locked-rule vocabulary (engine) and
+// doc conventions, NOT the UI rules (astro/react/ui/design) that guide W3 edits.
+// Curating here cuts ~7k tokens of irrelevant context from every recon call AND
+// keeps the merged static block comfortably above Haiku's 4,096-token cache floor.
+const W1_RULES = ["engine.md", "documentation.md"];
+function loadRules(): string {
+  return W1_RULES
+    .map(f => join(RULES_DIR, f))
+    .filter(existsSync)
+    .map(p => readFileSync(p, "utf8"))
+    .join("\n\n---\n\n");
+}
+function loadAgentPrompt(): string {
+  const path = join(AGENTS_DIR, "w1-recon.md");
+  if (!existsSync(path)) return "";
+  return readFileSync(path, "utf8").replace(/^---[\s\S]*?---\n/, "").trim();
+}
+function fileSha(path: string): string {
+  try { return execFileSync("git", ["hash-object", path], { encoding: "utf8" }).trim(); }
+  catch { return ""; }
+}
+function checkCache(shas: string[]): string | null {
+  if (!existsSync(CACHE_DIR) || shas.some(s => !s)) return null;
+  const entries: string[] = [];
+  for (const sha of shas) {
+    const p = join(CACHE_DIR, `${sha}.json`);
+    if (!existsSync(p)) return null;
+    entries.push(readFileSync(p, "utf8"));
+  }
+  return entries.join("\n\n");
+}
+function writeCache(sha: string, findings: string) {
+  mkdirSync(CACHE_DIR, { recursive: true });
+  writeFileSync(join(CACHE_DIR, `${sha}.json`), findings);
+}
+async function main() {
+  const { targets, mode } = parseArgs();
+  if (!targets.length) { process.stderr.write("--targets required\n"); process.exit(1); }
+  const shas = targets.map(fileSha);
+  const hit = checkCache(shas);
+  if (hit) {
+    process.stderr.write(`[w1-recon] cache hit (${targets.length} file(s))\n`);
+    process.stdout.write(hit);
+    return;
+  }
+  const client = new Anthropic();
+  const rulesText = loadRules();
+  const agentPrompt = loadAgentPrompt();
+  const targetContents = targets
+    .map(t => {
+      try { return `### ${t}\n\`\`\`\n${readFileSync(t, "utf8").slice(0, 4000)}\n\`\`\``; }
+      catch { return `### ${t}\n[not found]`; }
+    })
+    .join("\n\n");
+  const improvements = existsSync(".w4-improvements.json")
+    ? `\n\n## Open improvements\n\`\`\`json\n${readFileSync(".w4-improvements.json", "utf8")}\n\`\`\``
+    : "";
+  // ONE merged cached block. Rules + agent prompt are both static, so a single
+  // breakpoint at the end caches the whole prefix. Splitting them would leave the
+  // ~1.4k-token agent block below Haiku's 4,096 cache floor — it would never cache.
+  const staticPrefix = [rulesText && `# Rules\n\n${rulesText}`, agentPrompt]
+    .filter(Boolean)
+    .join("\n\n---\n\n");
+  const system: Anthropic.TextBlockParam[] = [
+    { type: "text", text: staticPrefix, cache_control: { type: "ephemeral" } } as any,
+  ];
+  const res = await client.messages.create({
+    model: "claude-haiku-4-5-20251001",
+    max_tokens: 1024,
+    system: system as any,
+    messages: [{
+      role: "user",
+      content: `Mode: ${mode}\n\nTarget files:\n${targetContents}${improvements}\n\nProduce the findings report. Under 400 words. End with the W1 receipt line.`,
+    }],
+  });
+  const findings = res.content.filter(b => b.type === "text").map(b => (b as Anthropic.TextBlock).text).join("\n");
+  // Write to sha-keyed cache
+  shas.filter(Boolean).forEach(sha => writeCache(sha, findings));
+  const u = res.usage as any;
+  process.stderr.write(`[w1-recon] tokens in=${u.input_tokens} cache_write=${u.cache_creation_input_tokens ?? 0} cache_read=${u.cache_read_input_tokens ?? 0}\n`);
+  process.stdout.write(findings);
+}
+main().catch(e => { process.stderr.write(`[w1-recon] error: ${e}\n`); process.exit(1); });

package/scripts/w4-rubric.ts ADDED Viewed

@@ -0,0 +1,142 @@
+#!/usr/bin/env bun
+/**
+ * w4-rubric.ts — W4 rubric scoring with Anthropic SDK prompt caching
+ *
+ * Runs 6 Haiku calls in parallel (goal-fit, security, stability, simplicity,
+ * speed, adversarial). The static rubric + spec block is cached — each call
+ * pays only for the unique diff/files suffix.
+ *
+ * Usage:
+ *   bun .claude/scripts/w4-rubric.ts --files <file,...>
+ *
+ * Output: JSON to stdout — { composite, gate, dimensions, adversarial }
+ * Falls back: exits 2 if ANTHROPIC_API_KEY missing (caller spawns agents).
+ */
+import Anthropic from "@anthropic-ai/sdk";
+import { readFileSync, existsSync } from "fs";
+import { execFileSync } from "child_process";
+import { join } from "path";
+if (!process.env.ANTHROPIC_API_KEY) {
+  process.stderr.write("[w4-rubric] no ANTHROPIC_API_KEY — falling back to agent spawn\n");
+  process.exit(2);
+}
+const SCRIPTS_DIR = new URL(".", import.meta.url).pathname;
+const RUBRICS_PATH = join(SCRIPTS_DIR, "../../plans/rubrics.md");
+const SPEC_PATH = ".w2-spec.json";
+const DIMS = [
+  { key: "goal-fit",    weight: 0.35, instruction: "Score goal-fit: does the diff advance the plan outcome? Read Goal/deliverable from the spec. 0 = no movement, 1 = fully delivers." },
+  // These are grep-pattern strings sent to the LLM for it to search — not code being executed.
+  { key: "security",    weight: 0.20, instruction: "Score security: check for hardcoded secrets, calls to eval, dangerouslySetInnerHTML, missing Zod at API boundaries, wildcard CORS, TypeQL string concat. 1 = all greps return 0." },
+  { key: "stability",   weight: 0.20, instruction: "Score stability: check new `any`, @ts-ignore without comment, silent returns, retired names (knowledge|connections|people|node|scent|alarm|trail|colony). 1 = all zero." },
+  { key: "simplicity",  weight: 0.15, instruction: "Score simplicity: focused single-purpose files, functions under 20 lines, no backwards-compat shims or WHAT comments. 1 = tight, no ceremony." },
+  { key: "speed",       weight: 0.10, instruction: "Score speed: does the diff increase bundle size or build time? Any new client:load where client:idle suffices? 1 = no regressions." },
+  { key: "adversarial", weight: 0,    instruction: "Adversarial check: is there any finding that should block this cycle? If yes, name it briefly. If nothing blocks, return null for 'finding'." },
+] as const;
+function loadRubrics(): string {
+  if (existsSync(RUBRICS_PATH)) return readFileSync(RUBRICS_PATH, "utf8");
+  return "Score each dimension 0.0–1.0. Explain why in one sentence. Name the specific gap in 'improve', or 'clean' if 1.0.";
+}
+function loadSpec(): string {
+  if (existsSync(SPEC_PATH)) return readFileSync(SPEC_PATH, "utf8").slice(0, 3000);
+  return "{}";
+}
+function diffStat(): string {
+  try { return execFileSync("git", ["diff", "HEAD", "--stat"], { encoding: "utf8" }).slice(0, 1000); }
+  catch { return "(no diff)"; }
+}
+function fileSnippets(files: string[]): string {
+  return files
+    .map(f => {
+      try { return `### ${f}\n\`\`\`\n${readFileSync(f, "utf8").slice(0, 1500)}\n\`\`\``; }
+      catch { return `### ${f}\n[not found]`; }
+    })
+    .join("\n\n");
+}
+type DimResult = { key: string; score: number; why: string; improve: string; finding?: string | null };
+async function scoreDim(
+  client: Anthropic,
+  cachedSystem: Anthropic.TextBlockParam[],
+  dim: typeof DIMS[number],
+): Promise<DimResult> {
+  const res = await client.messages.create({
+    model: "claude-haiku-4-5-20251001",
+    max_tokens: 256,
+    system: cachedSystem as any,
+    messages: [{
+      role: "user",
+      content: `${dim.instruction}
+The rubric, cycle spec, diff summary, and touched files are in the system context above.
+Return compact JSON only:
+${dim.key === "adversarial"
+  ? '{"finding": "<what blocks or null>"}'
+  : '{"score": 0.0–1.0, "why": "<one sentence>", "improve": "<gap or clean>"}'}`,
+    }],
+  });
+  const text = res.content.filter(b => b.type === "text").map(b => (b as Anthropic.TextBlock).text).join("");
+  try {
+    const match = text.match(/\{[\s\S]*?\}/);
+    if (match) return { key: dim.key, score: 0.5, why: "", improve: "", ...JSON.parse(match[0]) };
+  } catch {}
+  return { key: dim.key, score: 0.5, why: "parse error", improve: text.slice(0, 80) };
+}
+async function main() {
+  const args = process.argv.slice(2);
+  const filesIdx = args.indexOf("--files");
+  const files = filesIdx >= 0 ? args[filesIdx + 1].split(",").filter(Boolean) : [];
+  const client = new Anthropic();
+  const rubricText = loadRubrics();
+  const spec = loadSpec();
+  const diff = diffStat();
+  const snippets = fileSnippets(files);
+  // ONE cached block — rubric + spec + diff + snippets are byte-identical across all
+  // 6 parallel dimension calls. Caching them here (instead of repeating diff+snippets
+  // in each user message) turns 6× re-tokenization into 1 cache write + 5 cache reads.
+  const cachedSystem: Anthropic.TextBlockParam[] = [{
+    type: "text",
+    text: `# Rubric Reference\n\n${rubricText}\n\n# Cycle Spec\n\`\`\`json\n${spec}\n\`\`\`\n\n# Diff summary\n${diff}\n\n# Touched files\n${snippets}`,
+    cache_control: { type: "ephemeral" },
+  } as any];
+  const results = await Promise.all(
+    DIMS.map(dim => scoreDim(client, cachedSystem, dim))
+  );
+  const scored = results.filter(r => r.key !== "adversarial");
+  const composite = scored.reduce((sum, r) => {
+    const w = DIMS.find(d => d.key === r.key)!.weight;
+    return sum + r.score * w;
+  }, 0);
+  const goalFit = scored.find(r => r.key === "goal-fit")?.score ?? 0;
+  const adversarial = results.find(r => r.key === "adversarial");
+  const output = {
+    composite: Math.round(composite * 100) / 100,
+    gate: composite >= 0.65 && goalFit >= 0.50,
+    dimensions: Object.fromEntries(
+      scored.map(r => [r.key, { score: r.score, why: r.why, improve: r.improve }])
+    ),
+    adversarial: adversarial?.finding ?? null,
+  };
+  process.stderr.write(`[w4-rubric] composite=${output.composite} gate=${output.gate}\n`);
+  process.stdout.write(JSON.stringify(output, null, 2));
+}
+main().catch(e => { process.stderr.write(`[w4-rubric] error: ${e}\n`); process.exit(1); });