@oneie/claude 0.1.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/commands/do.md CHANGED
@@ -31,8 +31,11 @@ Comprehensive and fast are not in tension: the **tier decides which gates run**,
31
31
  |---|---|---|
32
32
  | Tier prune | Step 1 | PATCH walks `code` only; a gate never runs above the phases it guards |
33
33
  | Tool ladder | every decision | bash → Haiku → Sonnet → Opus; stop at the first that decides |
34
+ | **Context reset** | cycle boundary | multi-cycle plans run each cycle in a fresh subprocess (`do-auto.sh`) — prior-cycle chatter (~15–25k/cycle) never accumulates; ~90–150k saved on a 6-cycle plan |
34
35
  | **TRIVIAL fast-path** | BUILD | **0 agent spawns** — read ≤3 files inline, edit, bash verify, inline rubric, 1 learnings line |
35
- | Recon cache | W1 | KV `recon:{sha}:{sha}` hit (<14d) → skip re-read, `mark(recon:hit)` |
36
+ | Recon cache | W1 | `do-recon-cache.sh check` sha-keyed local store, <14d → **0 tokens**, skip all spawns (saves ~12k/recon) |
37
+ | W1 prefix cache | W1 miss | `w1-recon.ts` caches the ~5,400-token rules+agent prefix across Haiku calls (~4,900 saved/repeat) |
38
+ | W4 rubric cache | W4 | `w4-rubric.ts` caches the ~10,900-token rubric+spec block across 6 Haiku calls — 1 write + 5 reads (~46k saved/run) |
36
39
  | Verify-only-changed | W3→W4 | dependency cone from `git diff`; full verify only at cycle close |
37
40
  | Cross-cycle pre-warm | W4 close | one fire-and-forget Haiku pre-reads the next cycle's W1 targets |
38
41
  | Recon cap | W1 | 400-word receipt; high-signal slices → `.w2-spec.json`, not the raw dump |
@@ -72,9 +75,10 @@ C2·C3·C4 are siblings → their W1/W2 run concurrently and their W3a edits mer
72
75
  | Input | Treat as | Entry |
73
76
  |---|---|---|
74
77
  | bare text (`"add usage billing"`) | **IDEA** (default) | walk the spine from the top |
75
- | `<slug>` or a `plans/*.md` path | existing work | resolve its slug, walk from the first gap |
76
- | `--auto` | run all cycles of a todo continuously | BUILD engine, trust-aware |
78
+ | `<slug>` or a `plans/*.md` path | existing work | resolve its slug; ≥2 incomplete cycles → auto context-isolated loop, else walk from the first gap inline |
77
79
  | `--wave N` | force one wave of a todo | BUILD engine |
80
+ | `--next-cycle` | *(internal)* run one batch then exit — the loop's recursion guard, set by `do-auto.sh`; never typed by a human | BUILD engine |
81
+ | `--auto` | *(legacy alias)* same as a bare multi-cycle `/do <slug>` — the loop is automatic now | BUILD engine, trust-aware |
78
82
 
79
83
  Derive the **slug** from the idea: kebab-case, 2–4 words (`"add usage billing"` → `usage-billing`). Every artifact on the spine is keyed by this slug.
80
84
 
@@ -148,7 +152,40 @@ After the spine: **LEARN / close** — write one `learnings.md` entry (slug, tie
148
152
 
149
153
  **Plan-outcome kill-switch (zero LLM).** Re-run the plan's `outcome:` command every cycle close. Exit 0 → the goal is met: remaining cycles enter justify-or-drop (default drop), `--auto` halts. Over the tier ceiling → `warn` + justify. Weak rubric 3× → halt, the *plan* is wrong, not the code. Goal-drift (outcome fails 3× with green rubrics) → force a re-plan.
150
154
 
151
- **`--auto` continuation (trust-aware, reads `.do-trust.json`).** `trusted` (composite 0.85 × 3+) start the next cycle immediately. `standard` (0.65–0.85) continue. `cautious` (< 0.65 × 2) halt, run `/do next` to resume. Absent file `standard`.
155
+ **One command, complexity-sized.** `/do <anything>` is the only thing a human types — an idea, a slug, or a `plans/<slug>-todo.md` path. The tier sizes the work and the spine prunes itself: a typo is edited and verified inline (one cycle, no loop); a feature writes its promise, spec, todo, tests, and docs and then runs every cycle. The human never picks the mode, never types a flag, never runs a second command.
156
+
157
+ **Context isolation is automatic for multi-cycle plans.** When `/do` resolves to a todo with **≥2 incomplete cycles**, it does not run them inline — it hands the loop to the internal engine so every cycle gets a **fresh context**:
158
+
159
+ ```bash
160
+ # /do runs this itself — the user never types it
161
+ bun .claude/scripts/do-auto.sh <slug>
162
+ ```
163
+
164
+ `do-auto.sh` loops: each iteration spawns a clean `/do <slug> --next-cycle` that runs exactly one batch (W1→W4), ticks its boxes, writes state, and exits. Because each cycle starts from near-zero context, prior-cycle recon/edit/verify chatter never accumulates. The main session just kicks off the loop and reports the close.
165
+
166
+ A **single** incomplete cycle (or a PATCH/FIX) runs inline — there's nothing to isolate from, and spawning a subprocess would cost more than it saves.
167
+
168
+ **`--next-cycle` is internal.** It means "run one batch, tick boxes, exit — do **not** loop." Only `do-auto.sh` passes it; it is the recursion guard that keeps a fresh `/do` from re-entering the loop. A human never types it.
169
+
170
+ State that survives the reset (all on disk — the loop is stateless in memory):
171
+ - `plans/<slug>-todo.md` — checked boxes are the progress bar; the next `/do --next-cycle` reads them and skips completed cycles automatically
172
+ - `.do-trust.json` — trust level and consecutive score history (drives auto-continue vs halt)
173
+ - `.w4-improvements.json` — open items that become mandatory W1 targets next cycle
174
+ - `.w2-spec.json` / `.w3-receipts.json` — write-once state for soft-resume within a cycle
175
+
176
+ What is intentionally discarded each reset: prior-cycle conversation and agent output prose. The checkboxes captured everything that matters; the prose was scaffolding.
177
+
178
+ **Trust gates the loop (reads `.do-trust.json`).** `trusted` (composite ≥ 0.85 × 3+) → next cycle fires immediately. `standard` (0.65–0.85) → continue. `cautious` (< 0.65 × 2) → `do-auto.sh` halts and tells the human to re-run `/do <slug>` once the issue is fixed. Absent file → `standard`.
179
+
180
+ **Token math for a 6-cycle COMPLEX plan:**
181
+
182
+ | Mode | Context per cycle | Cumulative context |
183
+ |---|---|---|
184
+ | inline (everything in one session) | +15–25k per cycle | ~90–150k by cycle 6 |
185
+ | context-isolated (the default) | ~1–2k (fresh invocation) | ~1–2k every cycle |
186
+ | **Saving** | — | **~90–150k tokens** |
187
+
188
+ This compounds with the W1 cache and W4 rubric cache: a cycle 6 run with context isolation + both SDK caches costs roughly **1/10th** of the same cycle run inline in a long session.
152
189
 
153
190
  ---
154
191
 
@@ -165,7 +202,19 @@ echo "$RESOLVED" | jq -se 'all(.doc_only)' >/dev/null \
165
202
  TSC=$(bunx tsc --noEmit 2>&1 | grep -c "error TS" || echo 0) # write .w0-baseline.json {tscErrors, loc, tests}
166
203
  ```
167
204
 
168
- **W1 — Recon.** Read `.w4-improvements.json` open items first — they're mandatory recon targets. Check the recon cache (KV `recon:{sha}:{sha}` <14d) → hit skips the re-read. ≤5 files → read inline. ≥6 → spawn `w1-recon` (Haiku · low) for ALL files in ONE message; each returns structured findings; skip `relevance_score < 0.4`; all-filtered → halt and broaden (zero-findings guard). Receipt capped at 400 words; persist high-signal slices into `.w2-spec.json`. *(The `w1-recon` agent carries the two-track existing-code + primitive-inventory contract.)*
205
+ **W1 — Recon.** Read `.w4-improvements.json` open items first — they're mandatory recon targets.
206
+
207
+ **Cache check (zero tokens):**
208
+ ```bash
209
+ .claude/scripts/do-recon-cache.sh check $CYCLE_TARGET_PATHS 2>/dev/null && RECON_CACHE_HIT=true || true
210
+ ```
211
+ Hit → load findings from stdout, skip all agent spawns. Also run `do-recon-cache.sh prune` once per session to evict entries older than 14 days.
212
+
213
+ Miss → ≤5 files → read inline. ≥6 → run the SDK script (prompt-cached rules + agent-prompt block, writes result to `.w1-cache/`):
214
+ ```bash
215
+ bun .claude/scripts/w1-recon.ts --targets "$FILES" --mode RECON
216
+ ```
217
+ Script exits 2 if `ANTHROPIC_API_KEY` is absent → fall back to spawning `w1-recon` agent (Haiku · low) for ALL files in ONE message. Either path: skip `relevance_score < 0.4`; all-filtered → halt and broaden (zero-findings guard). Receipt capped at 400 words; persist high-signal slices into `.w2-spec.json`. *(The `w1-recon` agent carries the two-track existing-code + primitive-inventory contract.)*
169
218
 
170
219
  **W2 — Decide (never delegated — Opus · high).** Write the goal/deliverable/UX gate (3 sentences) first. Then: reconcile names against `dictionary.md`; run the **compress check** before any new primitive (name 3 existing primitives that compose it → `compose` removes it from the diff, `new` needs a one-line justification + same-diff doc edit). The pre-mortem + trade-offs were already captured at the `spec` stop (`template-spec.md`) — carry the failure modes forward as test cases, don't redo them. Classify each W1 finding Act / Keep / Defer. Output diff specs + write `.w2-spec.json` (+ `.w2-doc-plan.json` if a doc trigger fires). *(The `w2-decide` agent carries the compose-target table — which canonical doc to check per primitive type — plus context-triggers for surgical doc injection.)*
171
220
 
@@ -178,7 +227,11 @@ $(cycle.demo.command) # the goal gate — exit 0 = pass
178
227
  # doc-sync gate (reads .w2-doc-plan.json): stale-name=0, links ok, contract mtime current
179
228
  DELTA_TSC=$((TSC_NOW - TSC_BASELINE)) # hard gate: ≤ 0, no new type errors ever
180
229
  ```
181
- Rubric (inline for TRIVIAL/SIMPLE; for COMPLEX spawn 6 Haiku · medium goal-fit, security, stability, simplicity, speed, adversarial alongside `pr-review-toolkit` (code-reviewer, silent-failure-hunter, type-design-analyzer) + `find-bugs` + `accessibility` on UI):
230
+ Rubric (inline for TRIVIAL/SIMPLE; for COMPLEX — run the SDK script first (6 Haiku in parallel, rubric + spec block cached across all calls):
231
+ ```bash
232
+ bun .claude/scripts/w4-rubric.ts --files "$(git diff HEAD --name-only | tr '\n' ',')"
233
+ ```
234
+ Script exits 2 if key absent → fall back to spawning 6 Haiku agents · medium. Either path runs alongside `pr-review-toolkit` (code-reviewer, silent-failure-hunter, type-design-analyzer) + `find-bugs` + `accessibility` on UI):
182
235
  ```
183
236
  composite = 0.35·goal-fit + 0.20·security + 0.20·stability + 0.15·simplicity + 0.10·speed
184
237
  gate: composite ≥ 0.65 AND goal-fit ≥ 0.50 (hard) AND no adversarial > 0.5 AND delta_tsc ≤ 0
@@ -206,12 +259,14 @@ PATCH clears {1,2,3,8}. FEATURE clears all eight.
206
259
  - **Closed loop** — every cycle ends in `mark`/`warn`, never a silent return.
207
260
  - **Cheapest tool that decides wins** — a phase that needs no LLM call is the best kind.
208
261
  - **W4 max 3 loops**, then halt and report.
262
+ - **One command.** A human types `/do <anything>` and nothing else. Complexity sizing, spine pruning, and (for multi-cycle plans) the context-isolated loop are all automatic — never a flag, never a second command.
263
+ - **Every cycle gets a fresh context.** A multi-cycle `/do` runs each cycle in a clean subprocess via the internal loop. Prior-cycle conversation is noise; the todo checkboxes are the only state that carries forward.
209
264
 
210
265
  ---
211
266
 
212
267
  ## Available to /do — the toolbox
213
268
 
214
- **Scripts** (`.claude/scripts/`): `do-tier.sh` (tier + pruned spine + classifier + ceiling) · `do-folder.sh` (folder-aware verify/build) · `do-survey.sh` (reuse verdict) · `do-reconcile.sh` (substrate dim/verb/dead-name gate) · `do-analyze.sh` (spectodo coverage gate) · `do-prove.sh` (surface-detect proof) · `do-smoke.sh` (deterministic outcome).
269
+ **Scripts** (`.claude/scripts/`): `do-auto.sh` (*internal* — the context-isolated loop `/do` drives for multi-cycle plans) · `do-tier.sh` (tier + pruned spine + classifier + ceiling) · `do-folder.sh` (folder-aware verify/build) · `do-survey.sh` (reuse verdict) · `do-reconcile.sh` (substrate dim/verb/dead-name gate) · `do-analyze.sh` (deliverablecycle coverage gate) · `do-prove.sh` (surface-detect proof) · `do-smoke.sh` (deterministic outcome) · `w1-recon.ts` (prompt-cached recon) · `w4-rubric.ts` (cached parallel rubric).
215
270
 
216
271
  **Templates**: `text/template-frame.md` (promise) · `plans/template-spec.md` (design + pre-mortem + decisions) · `plans/template-todo.md` (plan + parallel budget + testing policy) · `plans/agent-template.md` (agent definition).
217
272
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@oneie/claude",
3
- "version": "0.1.0",
3
+ "version": "0.2.1",
4
4
  "description": "ONE Claude Code plugin — /do lifecycle, W1-W4 BUILD engine, ONE substrate via MCP, signals, and rules",
5
5
  "files": [
6
6
  ".claude-plugin",
@@ -1,7 +1,15 @@
1
1
  #!/usr/bin/env bash
2
2
  # do-analyze.sh — ANALYZE coverage gate (ex-Spec-Kit /analyze), read-only. Run at the
3
- # todo→code boundary, BEFORE build spends worktree tokens. Builds the D#↔C# coverage matrix
4
- # and the AC→test check. CRITICAL (uncovered deliverable) exit 1, blocks build. Never edits.
3
+ # todo→code boundary, BEFORE build spends worktree tokens. Verifies the plan against the
4
+ # current template-todo.md contract: a `deliverables:` frontmatter block, and per-cycle
5
+ # `**Deliverable:**` + (`demo:` | `**Cycle outcome:**`) lines.
6
+ #
7
+ # CRITICAL (exit 1, blocks build):
8
+ # - deliverables: frontmatter is empty → plan ships nothing
9
+ # - a ## C# cycle has no **Deliverable:** line → cycle ships nothing (coverage hole)
10
+ # HIGH (warn, does not block):
11
+ # - a cycle has no demo:/Cycle outcome line → AC→test gap (no planned verification)
12
+ # Never edits.
5
13
  # Usage: do-analyze.sh <todo.md>
6
14
  set -euo pipefail
7
15
  todo="${1:?usage: do-analyze.sh <todo.md>}"
@@ -9,34 +17,36 @@ todo="${1:?usage: do-analyze.sh <todo.md>}"
9
17
 
10
18
  crit=0; high=0
11
19
 
12
- # 1. coverage: every deliverable line (- D#) must cite at least one cycle (C#)
13
- uncovered=0
14
- while IFS= read -r line; do
15
- did=$(printf '%s' "$line" | grep -oE 'D[0-9]+' | head -1)
16
- if ! printf '%s' "$line" | grep -qE 'C[0-9]+'; then
17
- echo "CRITICAL: $did maps to no cycle"; crit=1; uncovered=$((uncovered+1))
18
- fi
19
- done < <(grep -E '^[[:space:]]*- D[0-9]+ ' "$todo")
20
+ # 1. plan ships something: deliverables: frontmatter has >=1 real entry (- kind: ...)
21
+ ndel=$(awk '
22
+ /^deliverables:/ {f=1; next}
23
+ f && /^[a-z_]+:/ {f=0}
24
+ f && /^[[:space:]]*-[[:space:]]/ {n++}
25
+ END {print n+0}
26
+ ' "$todo")
27
+ if [ "$ndel" -eq 0 ]; then
28
+ echo "CRITICAL: deliverables: frontmatter is empty — plan ships nothing"; crit=1
29
+ fi
20
30
 
21
- ndel=$(grep -cE '^[[:space:]]*- D[0-9]+ ' "$todo" || true)
22
- ncyc=$(grep -cE '^## C[0-9]+ ' "$todo" || true)
23
-
24
- # 2. orphan cycles: every ## C# section should be cited by some deliverable
25
- cited=$(grep -oE 'C[0-9]+' "$todo" | sort -u)
26
- for c in $(grep -oE '^## C[0-9]+ ' "$todo" | grep -oE 'C[0-9]+'); do
27
- # a cycle is cited if it appears outside its own header (i.e. in a deliverable line)
28
- hits=$(grep -E "^[[:space:]]*- D[0-9]+ .*\b$c\b" "$todo" | wc -l | tr -d ' ')
29
- [ "$hits" -eq 0 ] && { echo "HIGH: cycle $c has no deliverable (orphan)"; high=$((high+1)); }
30
- done
31
-
32
- # 3. AC→test: every cycle section should carry a demo: line (DoD row 2, planned half)
33
- nodemo=0
34
- for c in $(grep -oE '^## C[0-9]+ ' "$todo" | grep -oE 'C[0-9]+'); do
35
- block=$(awk "/^## $c /{f=1} f&&/^## C[0-9]+ /&&!/^## $c /{if(seen)exit} {if(f)print; if(/^## $c /)seen=1}" "$todo")
36
- printf '%s' "$block" | grep -qiE 'demo:' || { echo "HIGH: $c has no planned test (demo: line)"; high=$((high+1)); nodemo=$((nodemo+1)); }
31
+ # 2. every cycle ships a deliverable AND has a planned test. Walk each ## C# section
32
+ # (header to the next ## header — ### W-waves and **bold** lines stay inside).
33
+ ncyc=0; nodeliv=0; notest=0
34
+ for c in $(grep -oE '^## C[0-9]+' "$todo" | grep -oE 'C[0-9]+'); do
35
+ ncyc=$((ncyc+1))
36
+ block=$(awk -v c="$c" '
37
+ $0 ~ "^## "c"( |$)" {f=1; print; next}
38
+ f && /^## / {exit}
39
+ f {print}
40
+ ' "$todo")
41
+ printf '%s' "$block" | grep -qE '^\*\*Deliverable' || {
42
+ echo "CRITICAL: $c has no **Deliverable:** line (cycle ships nothing coverage hole)"; crit=1; nodeliv=$((nodeliv+1))
43
+ }
44
+ printf '%s' "$block" | grep -qiE 'demo:|^\*\*Cycle outcome' || {
45
+ echo "HIGH: $c has no planned test (demo: or **Cycle outcome:** line)"; high=$((high+1)); notest=$((notest+1))
46
+ }
37
47
  done
38
48
 
39
49
  echo "----"
40
- echo "coverage: $ndel deliverables / $ncyc cycles · uncovered=$uncovered · no-demo=$nodemo · high=$high"
50
+ echo "coverage: $ndel deliverables / $ncyc cycles · cycles-without-deliverable=$nodeliv · cycles-without-test=$notest · high=$high"
41
51
  if [ "$crit" -ne 0 ]; then echo "ANALYZE: CRITICAL — fix coverage before build"; exit 1; fi
42
- echo "ANALYZE: pass (coverage 100%)"; exit 0
52
+ echo "ANALYZE: pass (every cycle ships + is testable)"; exit 0
@@ -0,0 +1,117 @@
1
+ #!/usr/bin/env bash
2
+ # do-auto.sh — INTERNAL context-isolated loop for multi-cycle /do plans.
3
+ #
4
+ # Not a user command. `/do <slug>` invokes this itself when the resolved todo has
5
+ # >=2 incomplete cycles. Each cycle runs in a FRESH `claude -p "/do <slug> --next-cycle"`
6
+ # subprocess, resetting the context window to near-zero. State passes entirely through
7
+ # disk: the todo file (checked boxes), .do-trust.json (trust), .w4-improvements.json.
8
+ #
9
+ # Why it exists: run inline, each closed cycle leaves ~15-25k tokens of recon/edit/verify
10
+ # chatter in context that the next cycle has no use for — ~90-150k tokens of noise by
11
+ # cycle 6. Fresh-per-cycle eliminates it. The todo checkboxes are the only carried state.
12
+ #
13
+ # Internal usage (driven by do.md, not a human):
14
+ # bun .claude/scripts/do-auto.sh <slug> [--max-cycles N] [--dry-run]
15
+ set -euo pipefail
16
+
17
+ SLUG=""
18
+ MAX_CYCLES=30
19
+ DRY_RUN=false
20
+
21
+ while [ "$#" -gt 0 ]; do
22
+ case "$1" in
23
+ --max-cycles) MAX_CYCLES="$2"; shift 2 ;;
24
+ --dry-run) DRY_RUN=true; shift ;;
25
+ -*) echo "unknown flag: $1" >&2; exit 1 ;;
26
+ *) SLUG="$1"; shift ;;
27
+ esac
28
+ done
29
+
30
+ [ -z "$SLUG" ] && { echo "usage: do-auto.sh <slug> [--max-cycles N] [--dry-run]" >&2; exit 1; }
31
+
32
+ # Validate slug is safe (kebab-case only) before it touches any shell string.
33
+ # Prevents command injection if the slug ever contains metacharacters.
34
+ if ! printf '%s' "$SLUG" | grep -qE '^[a-zA-Z0-9][a-zA-Z0-9_-]*$'; then
35
+ echo "[do-auto] unsafe slug (must be alphanumeric + hyphens/underscores): $SLUG" >&2; exit 1
36
+ fi
37
+
38
+ # Trust boundary: this script runs in the developer's own workspace, invoked by /do
39
+ # which resolves the slug from a plans/ file the developer controls. The spawned
40
+ # claude subprocess uses --dangerously-skip-permissions because it is non-interactive
41
+ # — it cannot prompt the human for tool approvals. The workspace is the isolation
42
+ # boundary. Do not expose this script to untrusted input or run it in shared environments.
43
+ TODO="plans/${SLUG}-todo.md"
44
+ TRUST=".do-trust.json"
45
+
46
+ [ -f "$TODO" ] || { echo "[do-auto] $TODO not found — run /do $SLUG first" >&2; exit 1; }
47
+
48
+ _remaining() {
49
+ # Count cycles in the Status section that are open ([ ]) or in-flight ([~]) — i.e. not [x].
50
+ # Pattern starts with '\[' (not '-') so grep never mistakes it for an option flag.
51
+ grep -cE '\[[ ~]\] C[0-9]+' "$TODO" 2>/dev/null || echo 0
52
+ }
53
+
54
+ _trust() {
55
+ jq -r '.level // "standard"' "$TRUST" 2>/dev/null || echo "standard"
56
+ }
57
+
58
+ _plan_done() {
59
+ # Plan close item is ticked. -e marks the pattern explicitly (dash-safe).
60
+ grep -qe '\[x\].*Plan outcome command exits 0' "$TODO" 2>/dev/null
61
+ }
62
+
63
+ i=0
64
+ prev_remaining=-1
65
+ stall=0
66
+ while [ "$i" -lt "$MAX_CYCLES" ]; do
67
+ if _plan_done; then
68
+ echo "[do-auto] plan complete — outcome line ticked"
69
+ break
70
+ fi
71
+
72
+ remaining=$(_remaining)
73
+ if [ "$remaining" -eq 0 ]; then
74
+ echo "[do-auto] no open cycles — plan complete"
75
+ break
76
+ fi
77
+
78
+ # Stall guard: a cycle that ticks no box means the subprocess halted/errored.
79
+ # Retrying the identical state forever just burns subprocesses — halt after 2 stalls.
80
+ if [ "$remaining" -eq "$prev_remaining" ]; then
81
+ stall=$((stall + 1))
82
+ if [ "$stall" -ge 2 ]; then
83
+ echo "[do-auto] no progress for 2 iterations (${remaining} cycle(s) stuck) — halting." >&2
84
+ echo "[do-auto] The last cycle ticked no checkbox. Inspect $TODO, fix the blocker, then re-run /do $SLUG." >&2
85
+ exit 1
86
+ fi
87
+ else
88
+ stall=0
89
+ fi
90
+ prev_remaining=$remaining
91
+
92
+ trust=$(_trust)
93
+ if [ "$trust" = "cautious" ]; then
94
+ echo "[do-auto] trust=cautious — halting. Fix the issue, then re-run /do $SLUG."
95
+ exit 1
96
+ fi
97
+
98
+ i=$((i + 1))
99
+ echo ""
100
+ echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
101
+ echo "[do-auto] iteration ${i} — ${remaining} cycle(s) remaining — trust=${trust}"
102
+ echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
103
+ echo ""
104
+
105
+ if $DRY_RUN; then
106
+ echo "[do-auto] DRY RUN: would invoke — claude --dangerously-skip-permissions -p \"/do $SLUG --next-cycle\""
107
+ break
108
+ fi
109
+
110
+ # Fresh context: no --resume, no --continue. Each cycle is a clean slate.
111
+ claude --dangerously-skip-permissions -p "/do $SLUG --next-cycle"
112
+ done
113
+
114
+ if [ "$i" -ge "$MAX_CYCLES" ]; then
115
+ echo "[do-auto] hit --max-cycles $MAX_CYCLES — halting"
116
+ exit 1
117
+ fi
@@ -0,0 +1,50 @@
1
+ #!/usr/bin/env bash
2
+ # do-recon-cache.sh — sha-keyed W1 recon cache, 14-day TTL, local file store
3
+ #
4
+ # check <file>... → if ALL files are cached and fresh, print findings to stdout, exit 0
5
+ # else exit 1 (caller proceeds to agent spawn)
6
+ # write <sha> <file> → store findings file under CACHE_DIR/<sha>.json
7
+ # prune → delete entries older than MAX_AGE_DAYS
8
+ set -euo pipefail
9
+
10
+ CACHE_DIR="${CACHE_DIR:-.w1-cache}"
11
+ MAX_AGE_DAYS=14
12
+ cmd="${1:-}"; shift || true
13
+
14
+ _sha() { git hash-object "$1" 2>/dev/null || sha256sum "$1" | cut -c1-40; }
15
+
16
+ case "$cmd" in
17
+ check)
18
+ [ "$#" -eq 0 ] && exit 1
19
+ mkdir -p "$CACHE_DIR"
20
+ all_findings=""
21
+ for path in "$@"; do
22
+ [ -f "$path" ] || continue
23
+ sha=$(_sha "$path")
24
+ cache_file="$CACHE_DIR/${sha}.json"
25
+ [ -f "$cache_file" ] || exit 1
26
+ # stale?
27
+ if find "$cache_file" -mtime "+${MAX_AGE_DAYS}" -print 2>/dev/null | grep -q .; then
28
+ rm -f "$cache_file"
29
+ exit 1
30
+ fi
31
+ all_findings="${all_findings}$(cat "$cache_file")"$'\n\n'
32
+ done
33
+ printf '%s' "$all_findings"
34
+ exit 0
35
+ ;;
36
+ write)
37
+ sha="$1"; src="$2"
38
+ mkdir -p "$CACHE_DIR"
39
+ cp "$src" "$CACHE_DIR/${sha}.json"
40
+ ;;
41
+ prune)
42
+ [ -d "$CACHE_DIR" ] || exit 0
43
+ count=$(find "$CACHE_DIR" -name '*.json' -mtime "+${MAX_AGE_DAYS}" -print -delete | wc -l)
44
+ echo "pruned ${count} stale entries from $CACHE_DIR"
45
+ ;;
46
+ *)
47
+ echo "usage: do-recon-cache.sh check <file>... | write <sha> <file> | prune" >&2
48
+ exit 1
49
+ ;;
50
+ esac
@@ -27,10 +27,12 @@ echo "== C9 do-survey =="
27
27
  "$S/do-survey.sh" zxqwfoobar 2>/dev/null | grep -q 'VERDICT: build' && ok "novel → build" || no "novel"
28
28
  "$S/do-survey.sh" signal 2>/dev/null | grep -qE 'VERDICT: (extend|expose)' && ok "existing → extend/expose" || no "existing"
29
29
 
30
- echo "== C14 do-analyze =="
31
- "$S/do-analyze.sh" plans/do-loop-todo.md >/dev/null 2>&1 && ok "this plan → 100% coverage" || no "plan coverage"
32
- tmp=$(mktemp); printf 'deliverables:\n - D1 x (C1)\n - D2 orphan no cycle\n## C1 — y\n' > "$tmp"
33
- "$S/do-analyze.sh" "$tmp" >/dev/null 2>&1 && no "uncovered should fail" || ok "uncovered deliverable → CRITICAL exit 1"; rm -f "$tmp"
30
+ echo "== C14 do-analyze (current template format: deliverables: + **Deliverable:** + demo/Cycle outcome) =="
31
+ "$S/do-analyze.sh" plans/tools-router-todo.md >/dev/null 2>&1 && ok "real plan → every cycle ships + testable" || no "real plan coverage"
32
+ tmp=$(mktemp); printf 'deliverables:\n - api: foo.ts — does X (C1)\nsource_of_truth:\n## C1 — y\n(no deliverable line)\n' > "$tmp"
33
+ "$S/do-analyze.sh" "$tmp" >/dev/null 2>&1 && no "cycle w/o deliverable should fail" || ok "cycle ships nothing → CRITICAL exit 1"; rm -f "$tmp"
34
+ tmp=$(mktemp); printf 'deliverables:\n - api: foo.ts — does X (C1)\nsource_of_truth:\n## C1 — y\n**Deliverable:** `foo.ts`\n**Cycle outcome:** vitest passes\n' > "$tmp"
35
+ "$S/do-analyze.sh" "$tmp" >/dev/null 2>&1 && ok "well-formed mini-plan → pass" || no "well-formed should pass"; rm -f "$tmp"
34
36
 
35
37
  echo "== C11 do-prove =="
36
38
  pv=$("$S/do-prove.sh" one.ie/web/src/components/X.tsx 2>/dev/null); echo "$pv" | grep -q 'surface: frontend' && ok "frontend → /browser" || no "frontend"
@@ -46,7 +48,7 @@ echo '{"receiver":"cost:cycle","data":{"tokens":{"input":1},"model":"sonnet","co
46
48
  echo "== /do-loop removed (single front door) =="
47
49
  # scan commands+agents only (the engine surface); the 'no separate' note is the lone allowed mention
48
50
  if grep -rn '/do-loop' .claude/commands .claude/agents 2>/dev/null | grep -vq 'no separate'; then no "/do-loop still referenced"; else ok "/do-loop gone (only the 'no separate' note)"; fi
49
- [ -f .claude/commands/do-lifecycle.md ] && ok "do-lifecycle.md is the spec" || no "do-lifecycle.md missing"
51
+ [ -f .claude/commands/do.md ] && ok "do.md is the spec (single front door)" || no "do.md missing"
50
52
 
51
53
  echo "== C5 seed lifecycle (one.ie/web vitest) =="
52
54
  if command -v bunx >/dev/null 2>&1; then
@@ -0,0 +1,139 @@
1
+ #!/usr/bin/env bun
2
+ /**
3
+ * w1-recon.ts — W1 recon with Anthropic SDK prompt caching
4
+ *
5
+ * Caches the static rules + agent-prompt block so repeated W1 calls on the
6
+ * same codebase skip re-tokenizing those layers (~1k tokens / call).
7
+ * Also writes findings to .w1-cache/ (sha-keyed) so the next cycle is free.
8
+ *
9
+ * Usage:
10
+ * bun .claude/scripts/w1-recon.ts --targets <file,...> [--mode RECON|SURVEY|INVESTIGATE]
11
+ *
12
+ * Falls back gracefully: if ANTHROPIC_API_KEY is missing, exits 2 (caller uses Agent spawn).
13
+ */
14
+
15
+ import Anthropic from "@anthropic-ai/sdk";
16
+ import { readFileSync, existsSync, mkdirSync, writeFileSync } from "fs";
17
+ import { execFileSync } from "child_process";
18
+ import { join } from "path";
19
+
20
+ const SCRIPTS_DIR = new URL(".", import.meta.url).pathname;
21
+ const AGENTS_DIR = join(SCRIPTS_DIR, "../agents");
22
+ const RULES_DIR = join(SCRIPTS_DIR, "../rules");
23
+ const CACHE_DIR = ".w1-cache";
24
+
25
+ if (!process.env.ANTHROPIC_API_KEY) {
26
+ process.stderr.write("[w1-recon] no ANTHROPIC_API_KEY — falling back to agent spawn\n");
27
+ process.exit(2);
28
+ }
29
+
30
+ function parseArgs() {
31
+ const args = process.argv.slice(2);
32
+ const out = { targets: [] as string[], mode: "RECON" };
33
+ for (let i = 0; i < args.length; i++) {
34
+ if (args[i] === "--targets" && args[i + 1]) out.targets = args[++i].split(",").filter(Boolean);
35
+ if (args[i] === "--mode" && args[i + 1]) out.mode = args[++i].toUpperCase();
36
+ }
37
+ return out;
38
+ }
39
+
40
+ // Recon is read-only mapping — it needs the locked-rule vocabulary (engine) and
41
+ // doc conventions, NOT the UI rules (astro/react/ui/design) that guide W3 edits.
42
+ // Curating here cuts ~7k tokens of irrelevant context from every recon call AND
43
+ // keeps the merged static block comfortably above Haiku's 4,096-token cache floor.
44
+ const W1_RULES = ["engine.md", "documentation.md"];
45
+
46
+ function loadRules(): string {
47
+ return W1_RULES
48
+ .map(f => join(RULES_DIR, f))
49
+ .filter(existsSync)
50
+ .map(p => readFileSync(p, "utf8"))
51
+ .join("\n\n---\n\n");
52
+ }
53
+
54
+ function loadAgentPrompt(): string {
55
+ const path = join(AGENTS_DIR, "w1-recon.md");
56
+ if (!existsSync(path)) return "";
57
+ return readFileSync(path, "utf8").replace(/^---[\s\S]*?---\n/, "").trim();
58
+ }
59
+
60
+ function fileSha(path: string): string {
61
+ try { return execFileSync("git", ["hash-object", path], { encoding: "utf8" }).trim(); }
62
+ catch { return ""; }
63
+ }
64
+
65
+ function checkCache(shas: string[]): string | null {
66
+ if (!existsSync(CACHE_DIR) || shas.some(s => !s)) return null;
67
+ const entries: string[] = [];
68
+ for (const sha of shas) {
69
+ const p = join(CACHE_DIR, `${sha}.json`);
70
+ if (!existsSync(p)) return null;
71
+ entries.push(readFileSync(p, "utf8"));
72
+ }
73
+ return entries.join("\n\n");
74
+ }
75
+
76
+ function writeCache(sha: string, findings: string) {
77
+ mkdirSync(CACHE_DIR, { recursive: true });
78
+ writeFileSync(join(CACHE_DIR, `${sha}.json`), findings);
79
+ }
80
+
81
+ async function main() {
82
+ const { targets, mode } = parseArgs();
83
+ if (!targets.length) { process.stderr.write("--targets required\n"); process.exit(1); }
84
+
85
+ const shas = targets.map(fileSha);
86
+ const hit = checkCache(shas);
87
+ if (hit) {
88
+ process.stderr.write(`[w1-recon] cache hit (${targets.length} file(s))\n`);
89
+ process.stdout.write(hit);
90
+ return;
91
+ }
92
+
93
+ const client = new Anthropic();
94
+ const rulesText = loadRules();
95
+ const agentPrompt = loadAgentPrompt();
96
+
97
+ const targetContents = targets
98
+ .map(t => {
99
+ try { return `### ${t}\n\`\`\`\n${readFileSync(t, "utf8").slice(0, 4000)}\n\`\`\``; }
100
+ catch { return `### ${t}\n[not found]`; }
101
+ })
102
+ .join("\n\n");
103
+
104
+ const improvements = existsSync(".w4-improvements.json")
105
+ ? `\n\n## Open improvements\n\`\`\`json\n${readFileSync(".w4-improvements.json", "utf8")}\n\`\`\``
106
+ : "";
107
+
108
+ // ONE merged cached block. Rules + agent prompt are both static, so a single
109
+ // breakpoint at the end caches the whole prefix. Splitting them would leave the
110
+ // ~1.4k-token agent block below Haiku's 4,096 cache floor — it would never cache.
111
+ const staticPrefix = [rulesText && `# Rules\n\n${rulesText}`, agentPrompt]
112
+ .filter(Boolean)
113
+ .join("\n\n---\n\n");
114
+ const system: Anthropic.TextBlockParam[] = [
115
+ { type: "text", text: staticPrefix, cache_control: { type: "ephemeral" } } as any,
116
+ ];
117
+
118
+ const res = await client.messages.create({
119
+ model: "claude-haiku-4-5-20251001",
120
+ max_tokens: 1024,
121
+ system: system as any,
122
+ messages: [{
123
+ role: "user",
124
+ content: `Mode: ${mode}\n\nTarget files:\n${targetContents}${improvements}\n\nProduce the findings report. Under 400 words. End with the W1 receipt line.`,
125
+ }],
126
+ });
127
+
128
+ const findings = res.content.filter(b => b.type === "text").map(b => (b as Anthropic.TextBlock).text).join("\n");
129
+
130
+ // Write to sha-keyed cache
131
+ shas.filter(Boolean).forEach(sha => writeCache(sha, findings));
132
+
133
+ const u = res.usage as any;
134
+ process.stderr.write(`[w1-recon] tokens in=${u.input_tokens} cache_write=${u.cache_creation_input_tokens ?? 0} cache_read=${u.cache_read_input_tokens ?? 0}\n`);
135
+
136
+ process.stdout.write(findings);
137
+ }
138
+
139
+ main().catch(e => { process.stderr.write(`[w1-recon] error: ${e}\n`); process.exit(1); });
@@ -0,0 +1,142 @@
1
+ #!/usr/bin/env bun
2
+ /**
3
+ * w4-rubric.ts — W4 rubric scoring with Anthropic SDK prompt caching
4
+ *
5
+ * Runs 6 Haiku calls in parallel (goal-fit, security, stability, simplicity,
6
+ * speed, adversarial). The static rubric + spec block is cached — each call
7
+ * pays only for the unique diff/files suffix.
8
+ *
9
+ * Usage:
10
+ * bun .claude/scripts/w4-rubric.ts --files <file,...>
11
+ *
12
+ * Output: JSON to stdout — { composite, gate, dimensions, adversarial }
13
+ * Falls back: exits 2 if ANTHROPIC_API_KEY missing (caller spawns agents).
14
+ */
15
+
16
+ import Anthropic from "@anthropic-ai/sdk";
17
+ import { readFileSync, existsSync } from "fs";
18
+ import { execFileSync } from "child_process";
19
+ import { join } from "path";
20
+
21
+ if (!process.env.ANTHROPIC_API_KEY) {
22
+ process.stderr.write("[w4-rubric] no ANTHROPIC_API_KEY — falling back to agent spawn\n");
23
+ process.exit(2);
24
+ }
25
+
26
+ const SCRIPTS_DIR = new URL(".", import.meta.url).pathname;
27
+ const RUBRICS_PATH = join(SCRIPTS_DIR, "../../plans/rubrics.md");
28
+ const SPEC_PATH = ".w2-spec.json";
29
+
30
+ const DIMS = [
31
+ { key: "goal-fit", weight: 0.35, instruction: "Score goal-fit: does the diff advance the plan outcome? Read Goal/deliverable from the spec. 0 = no movement, 1 = fully delivers." },
32
+ // These are grep-pattern strings sent to the LLM for it to search — not code being executed.
33
+ { key: "security", weight: 0.20, instruction: "Score security: check for hardcoded secrets, calls to eval, dangerouslySetInnerHTML, missing Zod at API boundaries, wildcard CORS, TypeQL string concat. 1 = all greps return 0." },
34
+ { key: "stability", weight: 0.20, instruction: "Score stability: check new `any`, @ts-ignore without comment, silent returns, retired names (knowledge|connections|people|node|scent|alarm|trail|colony). 1 = all zero." },
35
+ { key: "simplicity", weight: 0.15, instruction: "Score simplicity: focused single-purpose files, functions under 20 lines, no backwards-compat shims or WHAT comments. 1 = tight, no ceremony." },
36
+ { key: "speed", weight: 0.10, instruction: "Score speed: does the diff increase bundle size or build time? Any new client:load where client:idle suffices? 1 = no regressions." },
37
+ { key: "adversarial", weight: 0, instruction: "Adversarial check: is there any finding that should block this cycle? If yes, name it briefly. If nothing blocks, return null for 'finding'." },
38
+ ] as const;
39
+
40
+ function loadRubrics(): string {
41
+ if (existsSync(RUBRICS_PATH)) return readFileSync(RUBRICS_PATH, "utf8");
42
+ return "Score each dimension 0.0–1.0. Explain why in one sentence. Name the specific gap in 'improve', or 'clean' if 1.0.";
43
+ }
44
+
45
+ function loadSpec(): string {
46
+ if (existsSync(SPEC_PATH)) return readFileSync(SPEC_PATH, "utf8").slice(0, 3000);
47
+ return "{}";
48
+ }
49
+
50
+ function diffStat(): string {
51
+ try { return execFileSync("git", ["diff", "HEAD", "--stat"], { encoding: "utf8" }).slice(0, 1000); }
52
+ catch { return "(no diff)"; }
53
+ }
54
+
55
+ function fileSnippets(files: string[]): string {
56
+ return files
57
+ .map(f => {
58
+ try { return `### ${f}\n\`\`\`\n${readFileSync(f, "utf8").slice(0, 1500)}\n\`\`\``; }
59
+ catch { return `### ${f}\n[not found]`; }
60
+ })
61
+ .join("\n\n");
62
+ }
63
+
64
+ type DimResult = { key: string; score: number; why: string; improve: string; finding?: string | null };
65
+
66
+ async function scoreDim(
67
+ client: Anthropic,
68
+ cachedSystem: Anthropic.TextBlockParam[],
69
+ dim: typeof DIMS[number],
70
+ ): Promise<DimResult> {
71
+ const res = await client.messages.create({
72
+ model: "claude-haiku-4-5-20251001",
73
+ max_tokens: 256,
74
+ system: cachedSystem as any,
75
+ messages: [{
76
+ role: "user",
77
+ content: `${dim.instruction}
78
+
79
+ The rubric, cycle spec, diff summary, and touched files are in the system context above.
80
+
81
+ Return compact JSON only:
82
+ ${dim.key === "adversarial"
83
+ ? '{"finding": "<what blocks or null>"}'
84
+ : '{"score": 0.0–1.0, "why": "<one sentence>", "improve": "<gap or clean>"}'}`,
85
+ }],
86
+ });
87
+
88
+ const text = res.content.filter(b => b.type === "text").map(b => (b as Anthropic.TextBlock).text).join("");
89
+ try {
90
+ const match = text.match(/\{[\s\S]*?\}/);
91
+ if (match) return { key: dim.key, score: 0.5, why: "", improve: "", ...JSON.parse(match[0]) };
92
+ } catch {}
93
+ return { key: dim.key, score: 0.5, why: "parse error", improve: text.slice(0, 80) };
94
+ }
95
+
96
+ async function main() {
97
+ const args = process.argv.slice(2);
98
+ const filesIdx = args.indexOf("--files");
99
+ const files = filesIdx >= 0 ? args[filesIdx + 1].split(",").filter(Boolean) : [];
100
+
101
+ const client = new Anthropic();
102
+ const rubricText = loadRubrics();
103
+ const spec = loadSpec();
104
+ const diff = diffStat();
105
+ const snippets = fileSnippets(files);
106
+
107
+ // ONE cached block — rubric + spec + diff + snippets are byte-identical across all
108
+ // 6 parallel dimension calls. Caching them here (instead of repeating diff+snippets
109
+ // in each user message) turns 6× re-tokenization into 1 cache write + 5 cache reads.
110
+ const cachedSystem: Anthropic.TextBlockParam[] = [{
111
+ type: "text",
112
+ text: `# Rubric Reference\n\n${rubricText}\n\n# Cycle Spec\n\`\`\`json\n${spec}\n\`\`\`\n\n# Diff summary\n${diff}\n\n# Touched files\n${snippets}`,
113
+ cache_control: { type: "ephemeral" },
114
+ } as any];
115
+
116
+ const results = await Promise.all(
117
+ DIMS.map(dim => scoreDim(client, cachedSystem, dim))
118
+ );
119
+
120
+ const scored = results.filter(r => r.key !== "adversarial");
121
+ const composite = scored.reduce((sum, r) => {
122
+ const w = DIMS.find(d => d.key === r.key)!.weight;
123
+ return sum + r.score * w;
124
+ }, 0);
125
+
126
+ const goalFit = scored.find(r => r.key === "goal-fit")?.score ?? 0;
127
+ const adversarial = results.find(r => r.key === "adversarial");
128
+
129
+ const output = {
130
+ composite: Math.round(composite * 100) / 100,
131
+ gate: composite >= 0.65 && goalFit >= 0.50,
132
+ dimensions: Object.fromEntries(
133
+ scored.map(r => [r.key, { score: r.score, why: r.why, improve: r.improve }])
134
+ ),
135
+ adversarial: adversarial?.finding ?? null,
136
+ };
137
+
138
+ process.stderr.write(`[w4-rubric] composite=${output.composite} gate=${output.gate}\n`);
139
+ process.stdout.write(JSON.stringify(output, null, 2));
140
+ }
141
+
142
+ main().catch(e => { process.stderr.write(`[w4-rubric] error: ${e}\n`); process.exit(1); });