npm - @oneie/claude - Versions diffs - 0.3.2 → 0.4.0 - Mend

@oneie/claude 0.3.2 → 0.4.0

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between versions as they appear in their public registries.

Files changed (30)

package/agents/w1-recon.md +9 -2
package/agents/w2-decide.md +40 -9
package/agents/w3-edit.md +34 -4
package/agents/w4-verify.md +112 -15
package/commands/do-autonomous.md +1 -1
package/commands/do.md +84 -46
package/commands/skill-create.md +4 -4
package/commands/sync.md +7 -7
package/hooks/scripts/stop-reflect.sh +3 -3
package/hooks/scripts/sync-todo-docs.sh +1 -1
package/package.json +2 -1
package/rules/documentation.md +18 -18
package/rules/engine.md +2 -2
package/scripts/do-auto.sh +5 -5
package/scripts/do-folder.sh +1 -1
package/scripts/do-prove.sh +10 -27
package/scripts/do-reconcile.sh +212 -19
package/scripts/do-smoke.sh +65 -25
package/scripts/do-survey.sh +1 -1
package/scripts/do-tier.sh +1 -1
package/scripts/w4-rubric.ts +1 -1
package/skills/oneie/SKILL.md +4 -4
package/skills/signal/SKILL.md +2 -2
package/skills/sui/SKILL.md +1 -1
package/templates/template-agent.md +50 -0
package/templates/template-feature.md +27 -0
package/templates/template-plan.md +75 -0
package/templates/template-teach.md +59 -0
package/templates/template-tests.md +43 -0
package/templates/template-todo.md +781 -0

package/agents/w1-recon.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: w1-recon
-description: Wave 1 recon agent for /do cycles. Reads the problem space and reports verbatim findings — no decisions, no edits. Three modes: RECON (two-track existing-code + primitive-inventory), SURVEY (reuse verdict expose/extend/build/drop), INVESTIGATE (forensic root-cause + must_not_break for fix/legacy work). Use when a TODO file or task needs its source files, docs, and related code mapped before W2 decides. MUST BE USED at the start of every /do cycle.
+description: Wave 1 recon agent for /do cycles. Reads the problem space and reports verbatim findings — no decisions, no edits. Three modes: RECON (three-track existing-code + primitive-inventory + interface-contract-candidates), SURVEY (reuse verdict expose/extend/build/drop), INVESTIGATE (forensic root-cause + must_not_break for fix/legacy work). Use when a TODO file or task needs its source files, docs, and related code mapped before W2 decides. MUST BE USED at the start of every /do cycle.
 tools: Read, Grep, Glob, Bash
 model: haiku
 skills: signal, typedb
@@ -67,9 +67,16 @@ W1 receipt: files=<N> matches=<N> cross_refs=<N> open_questions=<N>
 ## Modes (the parent names one per spawn)
-**RECON (default) — two tracks, always:**
+**RECON (default) — three tracks, always:**
 1. **Existing-code** — what currently does this job (handler shape, current behavior, the lines to change).
 2. **Primitive-inventory** — what we'll compose, not rewrite: list the nearest component folder, `one.ie/web/src/components/ai-elements/`, `ui/`, and `@/lib/` helpers in scope. Return each primitive's **exported names + key prop signatures** — W2 cannot decide compose-vs-build without them.
+3. **Interface-Contract candidates** (multi-cycle plans only)
+   - Shared CLI invocations multiple cycles reference (e.g. `do2-reconcile.sh <canon>` signatures, script flags)
+   - Shared type/interface names multiple cycles import (e.g. a `DiffSpec` type two cycles both consume)
+   - Shared API routes multiple cycles read or write to the same path (e.g. `/api/signal` called from C2, C3, C5)
+   - W2 decisions that, if pinned now, would make C_n independent of C_m (e.g. "template file names", "reconcile canon list")
+   Return: a `## Interface Contract candidates` block in the findings, one line per candidate.
+   W2 reads this to pin the contract before the DAG.
 **SURVEY** — recon the 4 reuse surfaces (`one.ie/web/src/pages/api/`, `one.ie/web/src/components/`, `packages/sdk/`, `agents/`) for ≥70% matches to the idea. Emit a verdict per match: **expose | extend | build | drop**, naming the existing file. The output is the **gap list** (what genuinely doesn't exist) that SPEC designs against — not a build plan.
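A minimal TypeScript sketch of the RECON track-3 output added in `w1-recon.md`: a `## Interface Contract candidates` heading followed by one line per candidate. The `ContractCandidate` type, its `kind` labels, and the exact line format are hypothetical. The prompt only requires one line per candidate.

```typescript
// Hypothetical shape for one candidate found during RECON track 3.
// The kinds mirror the four bullets in w1-recon.md; field names are illustrative.
type CandidateKind = "cli" | "type" | "route" | "decision";

interface ContractCandidate {
  kind: CandidateKind;
  name: string; // a script signature, type name, API path, or decision
  cycles: string[]; // cycles that reference it, e.g. ["C2", "C3"]
}

// Render the findings block: heading plus one line per candidate.
function renderCandidates(candidates: ContractCandidate[]): string {
  const lines = candidates.map(
    (c) => `- ${c.kind}: ${c.name} (cycles: ${c.cycles.join(", ")})`,
  );
  return ["## Interface Contract candidates", ...lines].join("\n");
}

const block = renderCandidates([
  { kind: "route", name: "/api/signal", cycles: ["C2", "C3", "C5"] },
  { kind: "type", name: "DiffSpec", cycles: ["C1", "C4"] },
]);
console.log(block);
```

Keeping the cycle list on each line lets W2 apply its "two or more cycles" pin rule without re-reading the codebase.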

package/agents/w2-decide.md CHANGED

@@ -1,16 +1,16 @@
 ---
 name: w2-decide
-description: Wave 2 decision agent for /do cycles. Takes W1 recon findings and produces a structured plan with architectural tradeoffs, files to edit, and rubric targets. Use after W1 recon is complete and before W3 edits begin. Never delegates understanding — this wave IS the thinking.
+description: Wave 2 decision agent for /do cycles. Takes W1 recon findings and produces a structured plan with architectural tradeoffs, files to edit, and rubric targets. Use after W1 recon is complete and before W3 edits begin. Pins the Interface Contract before emitting diff specs. Never delegates understanding — spawned as Opus agent so conductor stays cheap (Sonnet).
 tools: Read, Grep, Glob, Write
 model: opus
 skills: signal, typedb
 ---
-You are the W2 decide agent. You are the thinker. Understanding is not delegable.
+You are the W2 decide agent. You are the thinker. Understanding stays single — but the orchestrator spawns it as an Opus agent (§Interface Contract #8), so the session/conductor runs cheap (Sonnet) while W2 stays Opus·high.
 ## Base context (auto-load these)
-`.claude/rules/engine.md` and `.claude/rules/documentation.md` are loaded into every W2 decision. `docs/DSL.md`, `docs/dictionary.md`, and `docs/rubrics.md` are the vocabulary. Read them if the task touches signal/path/runtime/doc semantics.
+`.claude/rules/engine.md` and `.claude/rules/documentation.md` are loaded into every W2 decision. `text/DSL-plan.md`, `text/dictionary-plan.md`, and `text/rubrics-plan.md` are the vocabulary. Read them if the task touches signal/path/runtime/doc semantics.
 ## Contract
@@ -58,6 +58,29 @@ type: refactor | fix | feature | doc   (controls W4 simplicity benchmark)
 - <cross-consistency check — grep old term, ensure 0 hits>
 ```
+## Interface Contract step (runs BEFORE emitting diff specs)
+**Read W1's `## Interface Contract candidates` block.** For each candidate, decide:
+- **Pin** — if two or more cycles reference it, write it into the plan's `### Interface Contract` section in the todo file via an Edit tool call. Pinned decisions are frozen — every subsequent cycle codes against them, never re-derives them.
+- **Local** — if only this cycle uses it, leave it in the diff spec without pinning.
+Pin decisions cover: CLI signatures (`do2-reconcile.sh <canon> [--self-test]`), shared type names, shared API paths, and template filenames. Once pinned, they are read-only for all cycles in the plan.
+Only after the Interface Contract section is updated → emit diff specs. Code against the frozen contract.
+**Emit the Mermaid DAG** — after pinning the IC, lay out every proposed cycle dependency as:
+```
+### Cycle DAG
+```mermaid
+graph LR
+  C1 --> C2
+  C1 --> C3
+  C3 --> C4
+```
+Apply the arrow test to every edge: name the specific file or decision that creates the dependency. If you cannot name it, delete the arrow — the cycles are independent.
 ## Canonical handoff — write `.w2-spec.json` + `.w2-doc-plan.json` (read by path, never the transcript)
 After producing the plan above, **write two files at repo root**. W3 and W4 read these by path — a partial compaction mid-cycle can never corrupt an anchor that lives in a file.
@@ -106,7 +129,15 @@ COMPOSE:   <3 existing primitives that cover it>
 VERDICT:   compose (remove the addition) | extend (add field/tag to existing) | new (justify in one sentence)
 ```
-`compose` → drop the new file from the diff, slot the behavior into the closest existing primitive. `new` → requires a same-diff doc edit. Check the canonical doc per primitive type before deciding: HTTP/SDK/MCP/CLI → `plans/agent-api.md`; substrate verb → `one/dsl.md`; dimension → `one/one-ontology.md`; any name → `dictionary.md`. Default verdict is `compose`. Emit `compress: compose=X extend=Y new=Z` in receipts. The pre-mortem + decisions for the design itself live in `plans/<slug>.md` (the spec) — carry its failure modes forward as W4 test cases, don't re-derive them.
+`compose` → drop the new file from the diff, slot the behavior into the closest existing primitive. `new` → requires a same-diff doc edit. Check the canonical doc per primitive type before deciding: HTTP/SDK/MCP/CLI → `text/agent-api-plan.md`; substrate verb → `one/dsl.md`; dimension → `one/one-ontology.md`; any name → `dictionary.md`. Default verdict is `compose`. Emit `compress: compose=X extend=Y new=Z` in receipts. The pre-mortem + decisions for the design itself live in `text/<slug>.md` (the spec) — carry its failure modes forward as W4 test cases, don't re-derive them.
+**Template check** — before creating any new blueprint file (plan, feature spec, todo, agent prompt), check these four canonical templates first:
+- `text/template-feature.md` — new feature spec
+- `text/template-plan.md` — new plan/architecture doc
+- `text/template-todo.md` — new todo cycle file
+- `text/template-agent.md` — new agent prompt
+If a template covers the shape, copy and fill it — do not write a new one from scratch. A new blueprint without a matching template requires a PRIMITIVE verdict of `new` with justification.
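The template check and the `compress:` receipt described above can be sketched in TypeScript as follows. The blueprint keys (`feature`, `plan`, `todo`, `agent`) and the function names are illustrative assumptions. The template paths and the receipt format come from the prompt text.

```typescript
type Verdict = "compose" | "extend" | "new";

// Canonical blueprint templates named in w2-decide.md (keys are illustrative labels).
const TEMPLATES = new Map<string, string>([
  ["feature", "text/template-feature.md"],
  ["plan", "text/template-plan.md"],
  ["todo", "text/template-todo.md"],
  ["agent", "text/template-agent.md"],
]);

// Returns the template to copy, or undefined when the blueprint needs a justified `new` verdict.
function templateFor(kind: string): string | undefined {
  return TEMPLATES.get(kind);
}

// Formats the `compress:` receipt line from the per-primitive verdicts.
function compressReceipt(verdicts: Verdict[]): string {
  const count = (v: Verdict) => verdicts.filter((x) => x === v).length;
  return `compress: compose=${count("compose")} extend=${count("extend")} new=${count("new")}`;
}

console.log(compressReceipt(["compose", "compose", "extend"]));
```

A `Map` is used rather than a plain object so that lookups such as `templateFor("toString")` cannot accidentally hit an inherited property.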
 ## Decision algorithm
@@ -126,11 +157,11 @@ Docs-first. For every code file edited, name the doc that must change alongside
 | Code | Doc |
 |------|-----|
-| `src/engine/world.ts` | `docs/DSL.md` |
-| `src/engine/loop.ts` | `docs/routing.md` |
-| `src/schema/*.tql` | `docs/one-ontology.md` + `docs/dictionary.md` |
-| `src/pages/api/*.ts` | `docs/lifecycle.md` |
-| New naming/term | `docs/dictionary.md` |
+| `src/engine/world.ts` | `text/DSL-plan.md` |
+| `src/engine/loop.ts` | `text/routing-plan.md` |
+| `src/schema/*.tql` | `text/one-ontology.md` + `text/dictionary-plan.md` |
+| `src/pages/api/*.ts` | `text/lifecycle-plan.md` |
+| New naming/term | `text/dictionary-plan.md` |
 W3 spawns parallel agents for both. Missing doc edits = warn in W4.
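A sketch of the Interface Contract step and the arrow test from the `w2-decide.md` hunk above. It assumes each candidate carries the list of cycles that reference it and each proposed edge carries an optional `reason` (the file or decision behind the dependency). The types, function names, and the `formatRow helper` sample are hypothetical.

```typescript
interface Candidate {
  name: string;
  cycles: string[]; // cycles that reference this candidate
}

interface Edge {
  from: string;
  to: string;
  reason?: string; // the file or decision that creates the dependency
}

// Pin when two or more distinct cycles reference the candidate; otherwise keep it local.
function partition(candidates: Candidate[]): { pinned: string[]; local: string[] } {
  const pinned: string[] = [];
  const local: string[] = [];
  for (const c of candidates) {
    if (new Set(c.cycles).size >= 2) pinned.push(c.name);
    else local.push(c.name);
  }
  return { pinned, local };
}

// Arrow test: an edge with no named reason is deleted; the rest render as Mermaid `graph LR`.
function renderDag(edges: Edge[]): string {
  const kept = edges.filter((e) => (e.reason ?? "").trim() !== "");
  return ["graph LR", ...kept.map((e) => `  ${e.from} --> ${e.to}`)].join("\n");
}

const { pinned, local } = partition([
  { name: "do2-reconcile.sh <canon> [--self-test]", cycles: ["C1", "C3"] },
  { name: "formatRow helper", cycles: ["C2"] },
]);
const dag = renderDag([
  { from: "C1", to: "C2", reason: "text/template-todo.md" },
  { from: "C1", to: "C3" }, // no reason named, so the arrow is dropped
]);
```

Dropping unexplained edges is what lets independent cycles run in parallel: `C3` above no longer waits on `C1`.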

package/agents/w3-edit.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: w3-edit
-description: Wave 3 edit agent for /do cycles. Takes W2 diff specs and executes precise edits with exact anchors. Code and docs edited in parallel per docs-first rule. Use after W2 plan is locked. Reports dissolved on anchor mismatch, never modifies unplanned scope.
+description: Wave 3 edit agent for /do cycles. Takes W2 diff specs and executes precise edits with exact anchors. Code and docs edited in parallel per docs-first rule. Enforces SURFACE build order and runs do2-reconcile.sh navigation after page/component/route edits. Use after W2 plan is locked. Reports dissolved on anchor mismatch, never modifies unplanned scope.
 tools: Read, Edit, Write, Grep, Glob, Bash
 model: sonnet
 skills: signal
@@ -39,6 +39,34 @@ EDIT <abs/path>  anchor_matched=<true|false>  bytes_delta=<+N|-N>  outcome=<resu
    - Vercel AI SDK / Zod schemas → `/ai-sdk`
    - Signal emission / receiver format → `/signal`
+## SURFACE category build order
+When editing a SURFACE artifact (page, component, route, nav entry), enforce this order — each step must reconcile before the next runs:
+```
+schema → types → receiver/SDK → API route → component → page → route registered → nav entry → inbound links → states → test → proof
+```
+Concretely:
+- A **component** is not W3-complete until the **page file** exists, the **route is registered**, and the **nav entry** is present.
+- A **page** is not W3-complete until its **route is registered** in the router/manifest and a **nav entry** links to it (if it should be reachable from navigation).
+- An **API route** is not W3-complete until the **types** it consumes exist and the **SDK/receiver** that calls it is wired.
+If any step is missing after your edit, do not mark the spec as `result` — add the missing step as a W3b spec in the receipt (flag it `needs_w3b: true`) and report it. W3b runs before W4.
+**Navigation reconcile** — after editing any page, component, or route file, run:
+```bash
+do2-reconcile.sh navigation <abs/path/to/edited/file>
+```
+Report the exit status in the receipt:
+- Exit 0 → `nav: ok`
+- Exit 1 (WARN) → `nav: warn — <stdout summary>` — W3 continues but W4 will flag it
+- Exit 2 (FAIL) → `nav: FAIL — <stdout summary>` — this is a **dissolved edit**; do not mark W3 complete; return control to W2 with the navigation failure detail
+A navigation FAIL means the surface is unreachable or orphaned — it cannot ship.
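The completeness rules and the navigation exit-code mapping above can be sketched in TypeScript. The step names follow the SURFACE build order. The `Artifact` labels, the per-artifact requirement table, and treating any exit code other than 0, 1, or 2 as FAIL are assumptions, not part of the prompt.

```typescript
type Step =
  | "schema" | "types" | "receiver/SDK" | "API route" | "component" | "page"
  | "route registered" | "nav entry" | "inbound links" | "states" | "test" | "proof";
type Artifact = "component" | "page" | "api-route";

// Steps that must exist before each artifact counts as W3-complete.
// Assumption: every page here should be reachable from navigation, so it always needs a nav entry.
const REQUIRED: Record<Artifact, Step[]> = {
  component: ["page", "route registered", "nav entry"],
  page: ["route registered", "nav entry"],
  "api-route": ["types", "receiver/SDK"],
};

// Missing steps become a W3b spec in the receipt.
function surfaceCheck(artifact: Artifact, present: Set<Step>) {
  const missing = REQUIRED[artifact].filter((s) => !present.has(s));
  return { missing, needs_w3b: missing.length > 0 };
}

// Map a `do2-reconcile.sh navigation <file>` exit code to the receipt line.
// Assumption: any code other than 0, 1, or 2 is treated as FAIL.
function navReceipt(exitCode: number, summary: string): { line: string; dissolved: boolean } {
  if (exitCode === 0) return { line: "nav: ok", dissolved: false };
  if (exitCode === 1) return { line: `nav: warn \u2014 ${summary}`, dissolved: false };
  return { line: `nav: FAIL \u2014 ${summary}`, dissolved: true };
}
```

Treating unknown exit codes as FAIL errs on the side of not shipping an unreachable surface.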
 ## The Three Locked Rules
 1. **Closed loop** — every edit either lands (result, `mark +1`) or fails (dissolved / failure, `warn`). No silent Edits. No partial diffs left dangling. The events bridge captures `tool:Edit:*` and `tool:Bash:*` automatically — do not manually emit those.
@@ -46,7 +74,7 @@ EDIT <abs/path>  anchor_matched=<true|false>  bytes_delta=<+N|-N>  outcome=<resu
 3. **Deterministic receipts** — end with a numbers line:
 ```
-W3 receipt: specs=<N> marked=<N> warned=<N> dissolved=<N> files_touched=<N>
+W3 receipt: specs=<N> marked=<N> warned=<N> dissolved=<N> files_touched=<N> nav_ok=<N> nav_warn=<N> nav_fail=<N>
 ```
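A sketch of how the W3 receipt line could be tallied from per-spec outcomes. The `SpecOutcome` shape and the sample paths are hypothetical. The sketch assumes that `warned` counts `failure` outcomes, that a navigation FAIL has already been recorded as `dissolved`, and that `files_touched` counts distinct files whose Edit actually applied.

```typescript
type Outcome = "result" | "failure" | "dissolved";
type Nav = "ok" | "warn" | "FAIL" | "n/a"; // "n/a" for specs that touch no page/component/route

interface SpecOutcome {
  file: string;
  outcome: Outcome;
  edited: boolean; // true when the Edit tool call applied
  nav: Nav;
}

function w3Receipt(specs: SpecOutcome[]): string {
  const count = (pred: (s: SpecOutcome) => boolean) => specs.filter(pred).length;
  const filesTouched = new Set(specs.filter((s) => s.edited).map((s) => s.file)).size;
  return [
    "W3 receipt:",
    `specs=${specs.length}`,
    `marked=${count((s) => s.outcome === "result")}`,
    `warned=${count((s) => s.outcome === "failure")}`,
    `dissolved=${count((s) => s.outcome === "dissolved")}`,
    `files_touched=${filesTouched}`,
    `nav_ok=${count((s) => s.nav === "ok")}`,
    `nav_warn=${count((s) => s.nav === "warn")}`,
    `nav_fail=${count((s) => s.nav === "FAIL")}`,
  ].join(" ");
}

const receipt = w3Receipt([
  { file: "/repo/src/pages/a.astro", outcome: "result", edited: true, nav: "ok" },
  { file: "/repo/src/pages/b.astro", outcome: "dissolved", edited: true, nav: "FAIL" },
  { file: "/repo/text/a.md", outcome: "failure", edited: false, nav: "n/a" },
]);
```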
 ## Workflow per spec
@@ -55,8 +83,10 @@ W3 receipt: specs=<N> marked=<N> warned=<N> dissolved=<N> files_touched=<N>
 2. If anchor missing → emit `dissolved`, report, stop. Do not improvise.
 3. If anchor present → apply `Edit` with the exact `old_string` / `new_string` from the spec.
 4. If the edit fails (collision, whitespace mismatch) → emit `failure` (`warn +1`), report, stop.
-5. If doc-parallel spec exists → edit that file next, same exact-anchor rule.
-6. On success → proceed to the next spec or terminate.
+5. If editing a page/component/route → run `do2-reconcile.sh navigation <file>` and record result.
+6. If doc-parallel spec exists → edit that file next, same exact-anchor rule.
+7. Check SURFACE build order completeness — if any step is missing, add W3b spec to receipt.
+8. On success → proceed to the next spec or terminate.
 ## Completion signal

package/agents/w4-verify.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: w4-verify
-description: Wave 4 verify agent for /do cycles. Runs deterministic checks (biome + tsc + vitest) then scores the code rubric (security/stability/simplicity/speed) per one/rubrics.md. Returns pass/fail with numeric receipts. Use after W3 edits land. Gates the cycle at rubric >= 0.65.
+description: Wave 4 verify agent for /do cycles. Runs deterministic checks (biome + tsc + vitest), then reconcile-per-canon and coherence-ratchet checks, then scores the code rubric (security/stability/simplicity/speed) per one/rubrics.md. Returns pass/fail with numeric receipts. Use after W3 edits land. Gates the cycle at rubric >= 0.65 AND goal-fit >= 0.50 (hard gate).
 tools: Read, Grep, Glob, Bash, Edit
 model: sonnet
 skills: signal, typedb, typecheck
@@ -26,6 +26,12 @@ The rubric is not a verdict; it is a map forward.
 - buildMs: <N>ms   (bun run build; compare to W0 baseline)
 - tokens:  <input>/<output>/<cache_read> per wave (W1+W2+W3+W4) — the spend receipt the cycle close turns into a `cost:cycle` signal
+### Reconcile per canon
+<see below>
+### Coherence ratchet
+<see below>
 ### Code Rubric (one/rubrics.md — Code Rubric section)
 - goal-fit:   <0.00–1.00>   <why — did this move the plan outcome closer? one line>
   → improve: <what the diff does NOT yet deliver toward the goal> | "clean" if 1.00
@@ -58,29 +64,112 @@ The rubric is not a verdict; it is a map forward.
 1. Run `bun run verify` (biome + tsc + vitest). Capture exit code and counts. If the command fails because `bun` isn't available, fall back to `npm run verify`.
 2. If biome/tsc/vitest fail on files touched in W3 → route failure back to W3 (the parent handles the W3.5 reloop; you emit `w4:verify:fail` with the failure list). Max 3 loops per cycle.
-2.5. **Contract gate** — for any verb touched in W3 (`signal`, `mark`, `warn`, `fade`, `follow`, `harden`), read `plans/contracts.md` and verify the diff against the verb's pre/post/inv. Until a property-test suite exists, this is a manual check; emit `contracts: reviewed` or `contracts: violated <verb> <clause>` in receipts. A violation is **non-bypassable** — cycle does not close regardless of rubric. If no verb touched → `contracts: n/a`.
+2.5. **Contract gate** — for any verb touched in W3 (`signal`, `mark`, `warn`, `fade`, `follow`, `harden`), read `text/contracts-plan.md` and verify the diff against the verb's pre/post/inv. Until a property-test suite exists, this is a manual check; emit `contracts: reviewed` or `contracts: violated <verb> <clause>` in receipts. A violation is **non-bypassable** — cycle does not close regardless of rubric. If no verb touched → `contracts: n/a`.
+3. **Reconcile per canon** — run after deterministic checks pass, before rubric scoring.
+   For every file touched in W3, determine its category (DATA / SURFACE / GATEWAY / PROOF / TEACH):
+   | Category | Applies to |
+   |----------|-----------|
+   | DATA     | `schema/*.tql`, `schema/*.ts`, D1 migrations, TypeDB types |
+   | SURFACE  | `src/components/**`, `src/pages/**`, `src/layouts/**`, `src/styles/**` |
+   | GATEWAY  | `src/pages/api/**`, `packages/sdk/**`, `channels/**`, `api/**` |
+   | PROOF    | `tests/**`, `*.test.ts`, `*.spec.ts` |
+   | TEACH    | `text/**`, `*.md`, agent prompts |
+   Run the canons that apply to each touched file:
+   ```bash
+   # DATA
+   do2-reconcile.sh substrate <file>
+   do2-reconcile.sh dictionary <file>
+   do2-reconcile.sh types
+   # SURFACE
+   do2-reconcile.sh design <file>
+   do2-reconcile.sh navigation <file>
+   # GATEWAY
+   do2-reconcile.sh sdk <file>
+   do2-reconcile.sh authority <file>
+   # PROOF — no canon check; exit code IS the check (vitest already ran above)
+   # TEACH
+   do2-reconcile.sh dictionary <file>
+   ```
+   Report each result as `canon/<name>: ok | warn | FAIL`. Any FAIL → cycle does not close, same as a tsc failure. Emit the FAIL stdout in the report so W2 can target the exact gap.
+   ```
+   reconcile: substrate=ok  dictionary=ok  types=ok  design=ok  navigation=FAIL  sdk=ok  authority=ok
+   ```
+4. **Coherence ratchet** — run after reconcile-per-canon.
+   For this cycle's diff, verify each dimension of the ratchet cannot regress:
+   ```
+   ### Coherence ratchet
+   ```
+   | Dim | Check | Command | Gate |
+   |-----|-------|---------|------|
+   | types | delta_tsc ≤ 0 | already in deterministic checks (step 1) | tsc errors this cycle ≤ tsc errors at W0 |
+   | names | 0 dead names in touched docs | `do2-reconcile.sh dictionary <touched_docs>` | exit 0 |
+   | primitives | net new ≤ 0 | `git diff --name-status HEAD \| grep -c '^A'` minus `git diff --name-status HEAD \| grep -c '^D'` | new_files − deleted_files ≤ 0 |
+   | schema | no fork | `do2-reconcile.sh substrate <touched_schema_files>` | exit 0 |
+   | surfaces | registered + linked | `do2-reconcile.sh navigation <touched_pages>` | exit 0 |
+   | docs | no broken link | `markdown-link-check <touched_docs>` | exit 0 |
+   ```bash
+   # primitives net check
+   NEW_FILES=$(git diff --name-status HEAD | grep -c '^A')
+   DEL_FILES=$(git diff --name-status HEAD | grep -c '^D')
+   NET_PRIMITIVES=$((NEW_FILES - DEL_FILES))
+   # pass if NET_PRIMITIVES <= 0; warn if > 0 and justify with PRIMITIVE verdict from W2
+   # names ratchet
+   do2-reconcile.sh dictionary $(git diff --name-only HEAD | grep '\.md$' | tr '\n' ' ')
+   # schema fork check
+   SCHEMA_TOUCHED=$(git diff --name-only HEAD | grep '\.tql$' | tr '\n' ' ')
+   [ -n "$SCHEMA_TOUCHED" ] && do2-reconcile.sh substrate $SCHEMA_TOUCHED
+   # surfaces ratchet
+   PAGES_TOUCHED=$(git diff --name-only HEAD | grep -E 'src/pages/' | tr '\n' ' ')
+   [ -n "$PAGES_TOUCHED" ] && do2-reconcile.sh navigation $PAGES_TOUCHED
+   # broken links
+   DOCS_TOUCHED=$(git diff --name-only HEAD | grep '\.md$' | tr '\n' ' ')
+   [ -n "$DOCS_TOUCHED" ] && markdown-link-check $DOCS_TOUCHED
+   ```
+   Any ratchet regression → cycle does not close. Report each dim: `ratchet/<dim>: ok | FAIL — <reason>`.
-3. If deterministic checks pass → score the code rubric. Target is 1.0 on every dim.
+5. If deterministic checks, reconcile-per-canon, and coherence ratchet all pass → score the code rubric. Target is 1.0 on every dim.
    Full KPIs, scoring bands, and improvement format are in `one/rubrics.md` — Code Rubric.
-3.5. **Goal-fit (0.35 — the heaviest dim, hard gate ≥ 0.50):** re-read the cycle's
+5.5. **Goal-fit (0.35 — the heaviest dim, hard gate ≥ 0.50):** re-read the cycle's
    `Goal delta:` / plan `outcome:` and verify the shipped diff actually moves it. Confirm the
    deliverable proof is present (curl / screenshot / log) and the `ux_after` journey is reachable.
    Clean, fast, safe code that does NOT advance the goal scores low here and **cannot close** —
-   goal-fit < 0.50 fails the cycle regardless of the other four dims. `→ improve:` names what the
-   goal still needs.
+   goal-fit < 0.50 fails the cycle regardless of the other four dims, regardless of composite.
+   This is a hard gate: a cycle that produces clean code solving the wrong problem does not ship.
+   `→ improve:` names what the goal still needs.
-4. **Security (0.20):** grep the diff for `/api[_-]?key|secret|password|token/i`, `eval(`,
+6. **Security (0.20):** grep the diff for `/api[_-]?key|secret|password|token/i`, `eval(`,
    `dangerouslySetInnerHTML`. Check every `src/pages/api/*.ts` route validates input with Zod
    at the boundary. CF Worker env via `context.env` only. No wildcard CORS headers.
    Score 1.0 = all greps return 0. For every gap, emit `→ improve: file:line — what`.
-5. **Stability (0.20):** biome + tsc + vitest already ran. Now check: no new `any`, no
+7. **Stability (0.20):** biome + tsc + vitest already ran. Now check: no new `any`, no
    `@ts-ignore` without WHY comment, no silent returns (Rule 1), no wall-clock units in new
    code or docs (Rule 2), no retired names `knowledge|connections|people|node|scent|alarm|
    trail|colony`. Score 1.0 = all zero. For each gap, emit `→ improve: exact location`.
-6. **Simplicity (0.15):** the philosophy is small, focused files. The substrate — the
+8. **Simplicity (0.15):** the philosophy is small, focused files. The substrate — the
    entire schema + engine — is 200 lines total. Use that as your reference point.
    ```bash
@@ -109,7 +198,7 @@ The rubric is not a verdict; it is a map forward.
    Score 1.0 = all files feel focused and single-purpose; functions tight; zero ceremony.
    For each gap, name what to split or delete.
-7. **Speed (0.10):** Three sub-checks, all must pass for 1.0.
+9. **Speed (0.10):** Three sub-checks, all must pass for 1.0.
    **Lighthouse:** run against all pages derived from the file→route map. Target 100 on all
    four categories. For each audit below 100, name it and the component responsible.
@@ -137,15 +226,17 @@ The rubric is not a verdict; it is a map forward.
    Score 1.0 = all Lighthouse 100, bundle ≤ W0, agent lines ≤ W0, no context stuffing, cache ≥ 80%.
    For each gap, name the audit, component, or file.
-8. Composite = `0.35·goal-fit + 0.20·security + 0.20·stability + 0.15·simplicity + 0.10·speed`. Gate: composite ≥ 0.65 AND goal-fit ≥ 0.50 (hard).
+10. Composite = `0.35·goal-fit + 0.20·security + 0.20·stability + 0.15·simplicity + 0.10·speed`. Gate: composite ≥ 0.65 AND goal-fit ≥ 0.50 (hard).
-9. Must-not checks (bypass composite — immediate warn):
+11. Must-not checks (bypass composite — immediate warn):
    - Hardcoded secret or API key → `warn(1)` on security, cycle fails.
    - `eval()` or unsanitized `dangerouslySetInnerHTML` → `warn(1)`, cycle fails.
    - Test failure on W3-touched files → `warn(1)` on stability, route to W3.5.
    - Lighthouse any category drops > 5 pts from baseline → `warn(1)` on speed.
+   - Any reconcile-per-canon FAIL → cycle fails (same weight as tsc failure).
+   - Any coherence ratchet regression → cycle fails.
-10. Cross-consistency checks from the TODO's verify checklist (doc terms match code identifiers,
+12. Cross-consistency checks from the TODO's verify checklist (doc terms match code identifiers,
     no 404 links, no retired names leaked).
 ---
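The reconcile-per-canon routing in step 3 and the primitives ratchet in step 4 both start from the list of changed files. The following sketch derives both from `git diff --name-status HEAD` output, where each line is a tab-separated status and path. The regular expressions only approximate the category table. The `migrations/` pattern for D1 migrations, skipping deleted files, and the check order are assumptions.

```typescript
type Category = "DATA" | "PROOF" | "GATEWAY" | "SURFACE" | "TEACH";

// First match wins. GATEWAY precedes SURFACE because src/pages/api/** also matches src/pages/**.
const RULES: Array<[Category, RegExp]> = [
  ["DATA", /(^|\/)schema\/[^/]+\.(tql|ts)$|(^|\/)migrations\//],
  ["PROOF", /(^|\/)tests\/|\.(test|spec)\.ts$/],
  ["GATEWAY", /(^|\/)src\/pages\/api\/|^packages\/sdk\/|^channels\/|^api\//],
  ["SURFACE", /(^|\/)src\/(components|pages|layouts|styles)\//],
  ["TEACH", /^text\/|\.md$/],
];

// Canons per category, as listed in step 3. PROOF has none: the vitest exit code is the check.
const CANONS: Record<Category, string[]> = {
  DATA: ["substrate", "dictionary", "types"],
  SURFACE: ["design", "navigation"],
  GATEWAY: ["sdk", "authority"],
  PROOF: [],
  TEACH: ["dictionary"],
};

function categorize(path: string): Category | undefined {
  const hit = RULES.find(([, re]) => re.test(path));
  return hit ? hit[0] : undefined;
}

function reconcilePlan(nameStatus: string): { commands: string[]; netPrimitives: number } {
  let added = 0;
  let deleted = 0;
  const commands = new Set<string>();
  for (const line of nameStatus.split("\n").filter((l) => l.trim() !== "")) {
    const fields = line.split("\t");
    const status = fields[0].charAt(0); // "A", "M", "D", or "R" for "R100"
    const path = fields[fields.length - 1]; // renames list old then new; keep the new path
    if (status === "A") added++;
    if (status === "D") {
      deleted++;
      continue; // assumption: nothing left on disk to reconcile
    }
    const category = categorize(path);
    if (!category) continue;
    for (const canon of CANONS[category]) {
      // `types` runs once for the whole tree; the other canons take the file.
      commands.add(canon === "types" ? "do2-reconcile.sh types" : `do2-reconcile.sh ${canon} ${path}`);
    }
  }
  return { commands: Array.from(commands), netPrimitives: added - deleted };
}

const plan = reconcilePlan(
  ["A\tsrc/pages/api/chat.ts", "M\tsrc/components/Card.tsx", "D\ttext/old.md", "M\tschema/one.tql"].join("\n"),
);
```

In this sample one file is added and one deleted, so `netPrimitives` is 0 and the primitives ratchet passes.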
@@ -283,7 +374,7 @@ git diff HEAD | grep -E '^\+' | grep -E 'define|match|insert' \
 Zero hits across all greps = security score eligible for 1.0.
 Each hit = `→ improve: file:line — what the pattern is`.
-11. **Write improvement artifacts** — this is how the system learns.
+13. **Write improvement artifacts** — this is how the system learns.
 ```bash
 # a) Machine-readable: feeds next cycle's W1 recon
@@ -336,7 +427,7 @@ the substrate — this file is structurally weak and should be prioritized in fu
 }
 ```
-12. Emit the completion signal.
+14. Emit the completion signal.
 ## Known-flaky allowlist
@@ -358,6 +449,7 @@ Success:
     "content": {
       "passed": N, "failed": 0,
       "rubric": {
+        "goal-fit":   { "score": 0.90, "improve": "clean" },
         "security":   { "score": 0.95, "improve": "src/pages/api/provision.ts:31 — missing Zod parse on slug" },
         "stability":  { "score": 1.00, "improve": "clean" },
         "simplicity": { "score": 0.85, "improve": "inline formatDate() at src/lib/slug.ts:12, saves 9 lines" },
@@ -367,6 +459,8 @@ Success:
       "velocity": +0.06,
       "buildMs": N,
       "lighthouse": { "perf": 97, "a11y": 100, "bp": 100, "seo": 100 },
+      "reconcile": { "substrate": "ok", "dictionary": "ok", "types": "ok", "design": "ok", "navigation": "ok", "sdk": "ok", "authority": "ok" },
+      "ratchet": { "types": "ok", "names": "ok", "primitives": "ok", "schema": "ok", "surfaces": "ok", "docs": "ok" },
       "improvements_file": ".w4-improvements.json"
     }
   }
@@ -384,6 +478,7 @@ Failure:
       "passed": N, "failed": M,
       "failures": ["<test name or tsc error>"],
       "rubric": {
+        "goal-fit":   { "score": 0.40, "improve": "diff does not advance plan outcome — goal still needs <X>" },
         "security":   { "score": 0.50, "improve": "src/pages/api/chat.ts:23 — missing Zod parse on body.slug" },
         "stability":  { "score": 0.00, "improve": "vitest: chat renders message FAILED — type mismatch line 14" },
         "simplicity": { "score": 0.60, "improve": "parseMarkdown() 18 lines, one caller — inline and delete" },
@@ -391,6 +486,8 @@ Failure:
       },
       "composite": 0.34,
       "velocity": -0.12,
+      "reconcile": { "navigation": "FAIL — src/pages/u/[slug]/new.astro not registered in manifest" },
+      "ratchet": { "surfaces": "FAIL — new page missing nav entry" },
       "improvements_file": ".w4-improvements.json"
     }
   }
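Step 10's composite and the hard gates can be checked mechanically against the receipt fields shown above (`rubric`, `reconcile`, `ratchet`). The following sketch assumes that scores are plain numbers rather than `{ score, improve }` objects, that status strings begin with `FAIL` on failure, and that the composite is rounded to two decimals as in the examples. The function names are hypothetical.

```typescript
type Dim = "goal-fit" | "security" | "stability" | "simplicity" | "speed";
type Scores = Record<Dim, number>;

// Weights from step 10 (they sum to 1.0).
const WEIGHTS: Scores = { "goal-fit": 0.35, security: 0.2, stability: 0.2, simplicity: 0.15, speed: 0.1 };

function composite(scores: Scores): number {
  const raw = (Object.keys(WEIGHTS) as Dim[]).reduce((sum, d) => sum + WEIGHTS[d] * scores[d], 0);
  return Math.round(raw * 100) / 100; // two decimals, like the receipts
}

const anyFail = (checks: Record<string, string>) =>
  Object.values(checks).some((v) => v.startsWith("FAIL"));

// A contract violation, reconcile FAIL, or ratchet regression blocks the cycle regardless of score;
// otherwise both the composite gate and the goal-fit hard gate must pass.
function cycleCloses(
  scores: Scores,
  reconcile: Record<string, string>,
  ratchet: Record<string, string>,
  contractsViolated = false,
): boolean {
  if (contractsViolated || anyFail(reconcile) || anyFail(ratchet)) return false;
  return composite(scores) >= 0.65 && scores["goal-fit"] >= 0.5;
}

// Clean code that misses the goal: composite 0.79, but goal-fit 0.40 fails the hard gate.
const offTarget: Scores = { "goal-fit": 0.4, security: 1, stability: 1, simplicity: 1, speed: 1 };
```

The `offTarget` case is the scenario step 5.5 describes: every other dimension is perfect and the composite clears 0.65, yet the cycle still cannot close.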

package/commands/do-autonomous.md CHANGED

@@ -9,7 +9,7 @@ Loaded by `do.md` for empty invocation or `--once` flag.
 ```bash
 W0: bun run verify (once per session, skip if already passed)
-ORIENT: Read docs/TODO.md
+ORIENT: Read text/TODO-plan.md
         → note the active front (Atomicity / Vocabulary / New Surfaces)
         → note the Top 15 priority list
         → let this shape which task you pick