npm - gm-copilot-cli - Versions diffs - 2.0.631 → 2.0.633 - Mend

gm-copilot-cli 2.0.631 → 2.0.633

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16) hide show

package/copilot-profile.md +1 -1
package/index.html +1 -1
package/manifest.yml +1 -1
package/package.json +1 -1
package/skills/browser/SKILL.md +24 -149
package/skills/code-search/SKILL.md +14 -47
package/skills/create-lang-plugin/SKILL.md +47 -105
package/skills/gm/SKILL.md +20 -48
package/skills/gm-complete/SKILL.md +54 -139
package/skills/gm-emit/SKILL.md +49 -133
package/skills/gm-execute/SKILL.md +52 -140
package/skills/governance/SKILL.md +73 -98
package/skills/planning/SKILL.md +54 -127
package/skills/ssh/SKILL.md +10 -37
package/skills/update-docs/SKILL.md +21 -74
package/tools.json +1 -1

package/skills/gm-execute/SKILL.md CHANGED Viewed

@@ -3,69 +3,46 @@ name: gm-execute
 description: EXECUTE phase AND the foundational execution contract for every skill. Every exec:<lang> run, every witnessed check, every code search, in every phase, follows this skill's discipline. Resolve all mutables via witnessed execution. Any new unknown triggers immediate snake back to planning — restart chain from PLAN.
 ---
-# GM EXECUTE — Resolving Every Unknown
+# GM EXECUTE — Resolve Every Unknown
-You are in the **EXECUTE** phase. Every mutable on `.gm/prd.yml` carries UNKNOWN status until witnessed execution resolves it. Job here = the witnessing.
+GRAPH: `PLAN → [EXECUTE] → EMIT → VERIFY → COMPLETE`
+Entry: .prd with named unknowns. From `planning` or re-entered from EMIT/VERIFY.
-This skill also carries the **execution contract** applying in every phase, not only this one. Planning runs codebase scans; EMIT runs pre-emit diagnostics; VERIFY runs integration tests and CI watches — all executions, all subject to discipline below. Other skills reference this skill because protocols stay live in context only while this text is nearby. About to run anything → this skill freshly loaded OR operating outside contract.
-New unknown surfaced by a run → stop, state-regress to `planning`, restart chain.
-**GRAPH POSITION**: `PLAN → [EXECUTE] → EMIT → VERIFY → COMPLETE`
-- **Entry**: .prd exists with all unknowns named. Entered from `planning` or via snake from EMIT/VERIFY.
+This skill = execution contract for ALL phases. Other phases reference it because protocols must be fresh. About to run anything → load this skill first.
 ## TRANSITIONS
-**EXIT — invoke `gm-emit` skill immediately when**: All mutables are KNOWN (zero UNKNOWN remaining). Do not wait, do not summarize. Invoke the skill.
-**SELF-LOOP (remain in EXECUTE state)**: Mutable still UNKNOWN after one pass → re-run with different angle (max 2 passes, then regress to PLAN)
-**STATE REGRESSIONS**:
-- New unknown discovered → invoke `planning` skill immediately, reset to PLAN state
-- EXECUTE mutable unresolvable after 2 passes → invoke `planning` skill, reset to PLAN state
-- Re-entered from EMIT state (logic error) → re-resolve the mutable, then re-invoke `gm-emit` skill
-- Re-entered from VERIFY state (runtime failure) → re-resolve with real system state, then re-invoke `gm-emit` skill
+**EXIT → EMIT**: all mutables KNOWN → invoke `gm-emit` immediately.
+**SELF-LOOP**: still UNKNOWN → re-run different angle (max 2 passes, then regress to PLAN).
+**REGRESS → PLAN**: new unknown discovered | mutable unresolvable after 2 passes.
 ## MUTABLE DISCIPLINE
-Each mutable: name | expected | current | resolution method. Execute → witness → assign → compare. Zero variance = resolved. Unresolved after 2 passes = new unknown = snake to `planning`. Never narrate past an unresolved mutable.
-## WEAK-PRIOR BRIDGE — PRIORS DO NOT AUTHORIZE
-EXECUTE receives route candidates from PLAN. Per the weak-prior rule in `governance`: **those candidates arrive as weak priors only — structural value preserved, authorization NOT transferred**. Route plausibility ≠ authorization. A plausible route earns the right to be TESTED, not the right to be BELIEVED.
-- Prior from PLAN: `authorization=weak_prior`. Permitted use: pick the next witnessed probe.
-- After witnessed probe succeeds: `authorization=witnessed`. Permitted use: feed into EMIT.
-- Collapsing `weak_prior` to `witnessed` without a witnessed probe = route-into-authorization leak (collapse #1 in `governance`). Snake to PLAN.
-Rhetorical inflation also strips here: "the plan says" / "we agreed that" / "obviously X" are prior-statements, not witnessed-facts. Restate as weak prior, run the probe, witness, only then authorize.
+Each mutable: name | expected | current | resolution method. Zero variance = resolved. Unresolved after 2 passes = snake to `planning`. Never narrate past an unresolved mutable.
-## QUALITY METRICS — APPLY BEFORE MARKING KNOWN
+Mutables resolve to KNOWN only when ALL four pass:
+- **ΔS=0** — witnessed output equals expected
+- **λ≥2** — two independent paths agree
+- **ε intact** — adjacent invariants hold (types, test.js, neighboring callers)
+- **Coverage≥0.70** — enough corpus inspected for retrieval mutables
-Every mutable passes all four before status flips UNKNOWN → KNOWN (see `governance` for full definitions):
+## PRIORS DON'T AUTHORIZE
-- **ΔS = 0** — witnessed output equals expected
-- **λ ≥ 2** — two independent paths agree (different search, different caller, different import), not just one confirmation
-- **ε intact** — adjacent invariants still hold (neighboring callers, types, test.js, nearby modules unbroken)
-- **Coverage ≥ 0.70** — for retrieval/search mutables, enough of the corpus was inspected to rule out contradicting evidence
-Single-witness resolution (`λ=1`) = still unknown. One passing run on happy path without probing error paths = `ε` unverified. Skipping these checks and marking KNOWN anyway is an authorization-without-witness violation.
+Route candidates from PLAN arrive as `weak_prior` only. Plausibility = right to TEST, not right to BELIEVE.
+`weak_prior` → witnessed probe → `witnessed` → feed to EMIT.
+"The plan says" / "we agreed" / "obviously X" = prior-statements, not witnessed facts.
 ## CODE EXECUTION
-**exec:<lang> is the only way to run code.** Bash tool body: `exec:<lang>\n<code>`
+`exec:<lang>` only via Bash tool body: `exec:<lang>\n<code>`
-`exec:nodejs` (default) | `exec:bash` | `exec:python` | `exec:typescript` | `exec:go` | `exec:rust` | `exec:c` | `exec:cpp` | `exec:java` | `exec:deno` | `exec:cmd`
+Langs: `exec:nodejs` (default) | `exec:bash` | `exec:python` | `exec:typescript` | `exec:go` | `exec:rust` | `exec:c` | `exec:cpp` | `exec:java` | `exec:deno` | `exec:cmd`
-Lang auto-detected if omitted. `cwd` sets directory. File I/O via exec:nodejs + require('fs'). Only git in bash directly. `Bash(node/npm/npx/bun)` = violations.
+File I/O: exec:nodejs + require('fs'). Git directly in Bash. Never Bash(node/npm/npx/bun).
-**Execution efficiency — pack every run:**
-- Combine multiple independent operations into one exec call using `Promise.allSettled` or parallel subprocess spawning
-- Each independent idea gets its own try/catch with independent error reporting — never let one failure block another
-- Target under 12s per exec call; split work across multiple calls only when dependencies require it
-- Prefer a single well-structured exec that does 5 things over 5 sequential execs
+Pack runs: Promise.allSettled for parallel, each idea own try/catch, under 12s per call.
-**Background tasks** (auto-backgrounded when execution exceeds 15s):
+Background (when exec exceeds 15s — auto-backgrounds):
 ```
 exec:sleep
 <task_id> [seconds]
@@ -79,54 +56,24 @@ exec:close
 <task_id>
 ```
-**Runner**:
-```
-exec:runner
-start|stop|status
-```
-## GIT PUSH = AUTOMATIC CI WATCH
+Runner: `exec:runner\nstart|stop|status`
-The Stop hook automatically watches GitHub Actions runs whose `headSha` matches the just-pushed HEAD. Every push is watched without manual `gh run list` / `gh run watch` calls.
+## CODEBASE SEARCH
-- All-green → Stop appends a CI summary to its approve reason; you see it in the next turn's context.
-- Any failure → Stop blocks with the failed run names + IDs; treat that as a KNOWN mutable, regress to the right phase, push again — the hook re-watches.
-- Default deadline 180s (override `GM_CI_WATCH_SECS`); if it elapses with runs still in flight, Stop approves with "still in progress" so slow Pages-deploy / npm-publish jobs do not stall completion.
-- For diagnosing a specific failure, `gh run view <id> --log-failed` is permitted on demand.
-- Cascade (downstream-repo workflows triggered indirectly) is NOT auto-watched — only same-repo. Manual cascade check stays for those rare cases.
+`exec:codesearch` only. Grep/Glob/Find/Explore/WebSearch/grep/rg/find inside exec:bash = ALL hook-blocked.
-**Zero silent pushes** is now an automatic invariant of the hook layer; do not duplicate it as manual instructions.
-## CODEBASE EXPLORATION — exec:codesearch ONLY
-**Grep, Glob, Find, Explore, WebSearch, and `grep`/`rg`/`find` inside `exec:bash` are ALL hook-blocked.** Attempting them returns a redirect error. The hook is not a suggestion — it is enforced. `Read` is available for known absolute paths.
-Default reflex for "I need to find X in the codebase" = `exec:codesearch`. No exceptions. Not even for exact strings, not even for regex, not even for "just one quick check". If you find yourself reaching for Grep or Glob, that reflex is wrong — replace with codesearch.
+Known absolute path → `Read`. Known dir → exec:nodejs + fs.readdirSync. No third option.
 ```
 exec:codesearch
-<two-word query to start>
+<two-word query>
 ```
-`exec:codesearch` handles exact strings, symbols, regex-ish patterns, file-name fragments, and PDF pages (indexed page-by-page with `file:page` citations). Two words in, iterate by changing or adding one word per pass, minimum four attempts before concluding absent. Full protocol in `code-search` skill.
-**Direct-read exceptions** (no search needed):
-- Known absolute path → `Read` tool.
-- Directory listing at known path → `exec:nodejs` + `fs.readdirSync`.
-- File content inspection without search → `Read`.
+Iterate: change one word or add one word per pass. Minimum 4 attempts before concluding absent.
-**Never**:
-- `Grep`, `Glob`, `Find`, `Explore` tools (all hook-blocked)
-- `grep`, `rg`, `ripgrep`, `find`, `ag`, `ack` inside `exec:bash` (banned-tool hook intercepts)
-- Reaching for exact-match tools "because codesearch seems fuzzy" — codesearch handles exact matches fine
+## IMPORT-BASED EXECUTION
-When a mutable depends on external specification (protocol field, register layout, compliance text), search the PDF corpus first. Unwitnessed assumption from a doc you did not search = UNKNOWN.
-**Platform note — exec:bash on Windows:** runs real bash (git-bash) when installed, falls back to PowerShell otherwise. If you see a POSIX-syntax parse error (`[ -n ...]`, `&&`, `if/then/fi`), bash wasn't found — either install git-bash or rewrite in `exec:nodejs`.
-## DIAGNOSTIC PROTOCOL — IMPORT-BASED EXECUTION
-Always import actual codebase modules. Never rewrite logic inline. Reimplemented output is unwitnessed and inadmissible as ground truth.
+Always import actual modules. Never rewrite logic inline — reimplemented output = UNKNOWN.
 ```
 exec:nodejs
@@ -134,79 +81,44 @@ const { fn } = await import('/abs/path/to/module.js');
 console.log(await fn(realInput));
 ```
-Witnessed import output = resolved mutable. Reimplemented output = UNKNOWN.
-**Differential diagnosis**: when behavior diverges from expectation, run the smallest possible isolation test first. Compare actual vs expected. Name the delta. The delta is the mutable — resolve it before touching any file.
+Differential diagnosis: isolate smallest reproduction, compare actual vs expected, name the delta. Delta = the mutable.
-## EXECUTION DENSITY
+## CI — AUTOMATED
-Pack every related hypothesis into one run. Each run ≤15s. Witnessed output = ground truth. Narrated assumption = inadmissible.
+git push → Stop hook auto-watches GitHub Actions for pushed HEAD. No manual `gh run watch`.
+- All-green → Stop approves with CI summary
+- Failure → Stop blocks with run names+IDs → `gh run view <id> --log-failed` for diagnosis
+- Deadline 180s (override `GM_CI_WATCH_SECS`)
+- Downstream-repo cascades NOT auto-watched — same-repo only
-Parallel waves: ≤3 `gm:gm` subagents via Agent tool (`Agent(subagent_type="gm:gm", ...)`) — independent items simultaneously, never sequentially.
+## GROUND TRUTH
-## CHAIN DECOMPOSITION — FAULT ISOLATION
+Real services, real data, real timing. Mocks/stubs/simulations = delete. Scattered test files (.test.js, .spec.js, __tests__/) = delete. All coverage in root test.js. Fallback/demo modes = remove, fail loud.
-Break every multi-step operation before running end-to-end. Treat each step as a diagnostic unit:
-1. Number every distinct step
-2. Per step: input shape, output shape, success condition, failure mode
-3. Run each step in isolation — witness output — assign mutable — must be KNOWN before proceeding to next step
-4. Debug adjacent step pairs for handoff correctness — the seam between steps is the most common failure site
-5. Only when all pairs pass: run full chain end-to-end
+**Scan before edit**: exec:codesearch for existing implementation before creating/modifying. Duplicate concern = regress to `planning`.
-Step failure revealing new unknown → regress to `planning` state immediately.
+**Hypothesize via execution**: hypothesis → run → witness → edit. Never edit on unwitnessed assumption.
-## BROWSER DIAGNOSTIC ESCALATION
+**Code quality** (stop at first that resolves need): native → library → structure (map/pipeline) → write.
-Invoke `browser` skill. Exhaust each level before advancing to next:
-1. `exec:browser\n<js>` — inspect DOM state, read globals, check network responses. Always first.
-2. `browser` skill — for full session workflows requiring navigation
-3. navigate/click/type — only when real events required and DOM inspection insufficient
-4. screenshot — last resort, only after all JS-based diagnostics exhausted
+## PARALLEL SUBAGENTS
-## GROUND TRUTH ENFORCEMENT
+≤3 `gm:gm` subagents for independent items simultaneously: `Agent(subagent_type="gm:gm", ...)`
-Real services, real data, real timing. Mocks/fakes/stubs/simulations = diagnostic noise = delete immediately. No scattered test files (.test.js, .spec.js, __tests__/) — delete on discovery. All test coverage belongs in the single root `test.js`. If `test.js` does not exist, create it. Every behavior change updates `test.js`. Every bug fix adds a regression case. No fallback/demo modes — errors must surface with full diagnostic context and fail loud.
+Browser escalation: exec:browser → browser skill → navigate/click → screenshot (last resort).
-**SCAN BEFORE EDIT**: Before modifying or creating any file, search the codebase (exec:codesearch) for existing implementations of the same concern. "Duplicate" means overlapping responsibility, similar logic, or parallel implementations — not just identical files. If consolidation is possible, regress to `planning` with restructuring instructions instead of continuing.
+## MEMORIZE — HARD RULE
-**HYPOTHESIZE VIA EXECUTION — NEVER VIA ASSUMPTION**: Formulate a falsifiable hypothesis. Run it. Witness the output. The output either confirms or falsifies. Only a witnessed falsification justifies editing a file. Never edit based on unwitnessed assumptions — form hypothesis → run → witness → edit.
+Unknown→known = memorize same turn it resolves.
-**CODE QUALITY PROCESS**: The goal is minimal code / maximal DX. When writing or reviewing any block of code, run this mental process: (1) What native language/platform feature already does this? Use it. (2) What library already solves this pattern? Use it. (3) Can this branch/loop be a data structure — a map, array, or pipeline — where the structure itself enforces correctness? Make it so. (4) Would a newcomer read this top-to-bottom and immediately understand what it does without running it? If no, restructure. One-liners that compress logic are the opposite of DX — clarity comes from structure, not brevity. Dispatch tables, pipeline chains, and native APIs eliminate entire categories of bugs by making wrong states unrepresentable.
+Triggers: exec: output answers prior unknown | CI log reveals root cause | code read confirms/refutes | env quirk observed | user states preference/constraint.
-## FRAGILE LEARNINGS — HARD RULE
-Every UNKNOWN→KNOWN transition during execution = fact that dies on compaction. The memorize spawn is **not** end-of-phase cleanup — it fires **the same turn the fact resolves**, before the next tool call if possible, end-of-turn at latest.
-**Trigger contract** (any = fire):
-- `exec:` output resolves a prior "let me check" / "does this API take X" / "what version is installed"
-- CI log or error output reveals a root cause
-- Code read confirms or refutes an assumption about existing structure
-- Environment / tooling quirk observed (blocked commands, platform-specific behavior, path resolution)
-- User states a preference, constraint, deadline, or judgment call
-**Invocation** (one per fact, background, parallel when multiple):
 ```
-Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact with enough context for a cold-start agent>')
+Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
 ```
-**Parallel spawn**: N facts resolved in one turn → N memorize calls in a **single message**, parallel tool blocks. Never serialize. Never merge multiple facts into one prompt.
-**End-of-turn self-check** (mandatory): before the response closes, scan the turn for resolved unknowns that were not memorized. Missed one → spawn now. No exceptions — a resolved unknown leaving the turn without handoff is a memory leak.
+N facts → N parallel Agent calls in ONE message. End-of-turn self-check mandatory.
-Skip memorize = forget on purpose. Treat it as a bug.
-## DO NOT STOP
-Never respond to the user from this phase. When all mutables are KNOWN, immediately invoke `gm-emit` skill. The chain continues until .prd is deleted and git is clean — that happens in `gm-complete`, not here.
-## CONSTRAINTS
-**Never**: `Bash(node/npm/npx/bun)` | fake data | mock files | scattered test files (only root test.js) | fallback/demo modes | `Grep`/`Glob`/`Find`/`Explore` tools or `grep`/`rg`/`find` inside `exec:bash` (ALL hook-blocked — use `exec:codesearch` for every codebase lookup, `Read` for known absolute paths) | sequential independent items | absorb surprises silently | respond to user or pause for input | edit files before executing to understand current behavior | duplicate existing code | write explicit if/else chains when a dispatch table or native method suffices | write packed one-liners that obscure structure | reinvent what a library or native API already provides
-**Always**: witness every hypothesis | import real modules | scan codebase before creating/editing files | regress to planning on any new unknown | fix immediately on discovery | delete mocks/stubs/comments/scattered test files on discovery | consolidate test coverage into root test.js | add regression case to test.js for every bug fix | invoke next skill immediately when done | ask "what native feature solves this?" before writing any new logic | prefer structures where wrong states are unrepresentable
----
+**Never**: Bash(node/npm/npx/bun) | fake data | mocks | scattered tests | fallbacks | Grep/Glob/Find/Explore | sequential independent items | respond to user mid-phase | edit before witnessing | duplicate code | if/else where dispatch table suffices | one-liners that obscure | reinvent what native/library provides
-**EXIT → EMIT**: All mutables KNOWN → invoke `gm-emit` skill immediately.
-**SELF-LOOP**: Still UNKNOWN → re-run (max 2 passes, then regress to PLAN).
-**REGRESS → PLAN**: Any new unknown → invoke `planning` skill, reset to PLAN state.
+**Always**: witness every hypothesis | import real modules | scan before edit | regress on new unknown | delete mocks/comments/scattered tests on discovery | test.js for every behavior change | invoke next skill immediately when done

package/skills/governance/SKILL.md CHANGED Viewed

@@ -5,117 +5,92 @@ description: Governance reference invoked by PLAN/EXECUTE/EMIT/VERIFY. Separates
 # Governance — Route, Bridge, Legitimacy
-Central governance reference. Three roles separate three failure surfaces every phase must respect simultaneously:
+Three roles, three failure surfaces:
+1. **Route discovery** — what family of fault? Owned by `planning`.
+2. **Weak-prior bridge** — plausibility ≠ authorization. Owned by `gm-execute`.
+3. **Legitimacy gate** — did this answer earn its strength? Owned by `gm-emit`/`gm-complete`.
-1. **Route discovery** — route-first structural orientation. Where could this fail? What family of fault does it live in? Owned by `planning`.
-2. **Weak-prior bridge** — advisory-only transfer. Route plausibility never converts into authorization. Owned by `gm-execute`.
-3. **Legitimacy gate** — earned-emission governance. Did this answer earn its requested strength? Owned by `gm-emit` and `gm-complete`.
+## Five Refused Collapses
-Neither route-first nor legitimacy-first alone suffices. The weak-prior bridge exists precisely to stop route plausibility from masquerading as authorization.
-## The Five Collapses Governance Refuses
-A conclusion ships only when none of these has occurred:
-1. Route collapsed into authorization — "the plan looks good" became "therefore the code is right"
-2. Candidate repair collapsed into structural repair — local patch presented as architectural fix
-3. Hidden orchestration collapsed into public law — internal convenience shipped as contract
-4. Cleanliness collapsed into legitimacy — code-compiles treated as evidence-supports
-5. One strong route collapsed into universal closure — best available answer treated as only possible answer
+1. Route → authorization ("plan looks good" → "code is right")
+2. Candidate → structural repair (local patch presented as architectural fix)
+3. Hidden → public law (internal convenience shipped as contract)
+4. Cleanliness → legitimacy (compiles = evidence-supports)
+5. One strong route → universal closure (best answer treated as only answer)
 When in doubt: preserve ambiguity. Lawful downgrade beats forced closure.
-## The 7 Route Families
+## 7 Route Families
-Every planned item belongs to at least one family. Naming the family disciplines the repair move.
-| Family | What breaks here | Example repair move |
+| Family | What breaks | Repair |
 |---|---|---|
-| **grounding** | Retrieval, lookup, fact anchor | Re-ground against source of truth (PDF, spec, witnessed state) |
-| **reasoning** | Inference chain, logic, derivation | Shorten chain, re-derive from primitives |
-| **state** | Memory, persistence, session continuity | Make state addressable, kill implicit carry-over |
-| **execution** | Runtime, scheduling, process lifecycle | Isolate, witness, re-run deterministically |
-| **observability** | Inspection, tracing, debuggability | Add permanent structure — never ad-hoc log |
-| **boundary** | Interfaces, contracts, seam between subsystems | Re-assert contract, regenerate both sides from one source |
-| **representation** | Data shape, schema, type | Make illegal states unrepresentable structurally |
-Route family gets written into the `.prd` item. Repair attempted in the wrong family = wasted work.
-## The 16 Failure Modes
+| grounding | Retrieval, lookup, fact anchor | Re-ground against source of truth |
+| reasoning | Inference chain, logic | Shorten chain, re-derive from primitives |
+| state | Memory, session continuity | Make state addressable |
+| execution | Runtime, scheduling, process | Isolate, witness, re-run |
+| observability | Inspection, tracing | Add permanent structure |
+| boundary | Interfaces, contracts, seams | Re-assert contract from one source |
+| representation | Data shape, schema, type | Make illegal states unrepresentable |
-Routing taxonomy. Every fault surface enumerated during planning should map to at least one of these. Missing mapping = unexamined surface.
+## 16 Failure Modes
-| # | Name | Family | Shape |
-|---|---|---|---|
-| 1 | Hallucination & chunk drift | grounding | Retrieval returned wrong/irrelevant content |
-| 2 | Interpretation collapse | reasoning | Chunk right, logic wrong |
-| 3 | Long reasoning drift | reasoning | Error accumulates across multi-step chain |
-| 4 | Bluffing / overconfidence | reasoning | Confident, unfounded |
-| 5 | Semantic ≠ embedding | grounding | Cosine match ≠ actual meaning |
-| 6 | Logic collapse, needs reset | reasoning | Dead-end, must restart chain |
-| 7 | Memory breaks across sessions | state | Continuity lost |
-| 8 | Debugging black box | observability | No visibility into failure path |
-| 9 | Entropy collapse | state | Attention melts, incoherent output |
-| 10 | Creative freeze | representation | Flat literal output |
-| 11 | Symbolic collapse | reasoning | Abstract prompt breaks |
-| 12 | Philosophical recursion | reasoning | Self-reference loop |
-| 13 | Multi-agent chaos | state | Agents overwrite each other |
-| 14 | Bootstrap ordering | execution | Services fire before deps ready |
-| 15 | Deployment deadlock | execution | Circular wait in infra |
-| 16 | Pre-deploy collapse | execution | Version skew / missing secret on first call |
-## The 4 State Planes
-Any in-flight item occupies four orthogonal state planes simultaneously. One plane advancing does not advance any other.
-| Plane | Owned by | States | Authorization implication |
+| # | Name | Family |
+|---|---|---|
+| 1 | Hallucination & chunk drift | grounding |
+| 2 | Interpretation collapse | reasoning |
+| 3 | Long reasoning drift | reasoning |
+| 4 | Bluffing / overconfidence | reasoning |
+| 5 | Semantic ≠ embedding | grounding |
+| 6 | Logic collapse, needs reset | reasoning |
+| 7 | Memory breaks across sessions | state |
+| 8 | Debugging black box | observability |
+| 9 | Entropy collapse | state |
+| 10 | Creative freeze | representation |
+| 11 | Symbolic collapse | reasoning |
+| 12 | Philosophical recursion | reasoning |
+| 13 | Multi-agent chaos | state |
+| 14 | Bootstrap ordering | execution |
+| 15 | Deployment deadlock | execution |
+| 16 | Pre-deploy collapse | execution |
+## 4 State Planes
+| Plane | Owner | States | Implication |
 |---|---|---|---|
-| **route_fit** | planning | `unexamined` → `examined` → `dominant` | Examined ≠ dominant. Dominant ≠ authorized. |
-| **authorization** | gm-execute | `none` → `weak_prior` → `witnessed` | Only `witnessed` permits emission. `weak_prior` never. |
-| **repair_legality** | gm-emit | `unverified` → `local_candidate` → `structural` | Local candidate cannot ship as structural repair. |
-| **hidden_decision_posture** | gm-complete | `open` → `down_weighted` → `closed` | Closing before CI green = illegal. |
-`.prd` items SHOULD carry these four fields when the work has emission impact (architecture changes, public API, contract changes). Small edits may omit.
+| route_fit | planning | unexamined → examined → dominant | Dominant ≠ authorized |
+| authorization | gm-execute | none → weak_prior → witnessed | Only witnessed permits emission |
+| repair_legality | gm-emit | unverified → local_candidate → structural | Local cannot ship as structural |
+| hidden_decision_posture | gm-complete | open → down_weighted → closed | Close only after CI green |
-## Quality Metrics (ΔS, λ, ε, Coverage)
+## Quality Metrics
-Quantitative checks applied to every mutable before it is marked KNOWN.
+- **ΔS** — witnessed output equals expected. ΔS≠0 = still open.
+- **λ≥2** — two independent paths agree. λ=1 = still unknown.
+- **ε** — adjacent invariants hold (types, tests, neighboring callers).
+- **Coverage≥0.70** — enough corpus inspected to rule out contradicting evidence.
-- **ΔS (drift)** — semantic delta between expectation and witnessed output. `ΔS ≠ 0` = mutable still open, regardless of narrative.
-- **λ (lambda)** — convergence checkpoint. Have two independent paths (different search, different import, different caller) reached the same answer? `λ unsatisfied` = single-witness, still an unknown.
-- **ε (epsilon)** — domain-level harmony. Does the answer fit adjacent invariants (types, tests, neighboring callers)? `ε violated` = local fix with side effect.
-- **Coverage ≥ 0.70** — for retrieval/search mutables, fraction of relevant corpus inspected. Below threshold = grounding not yet earned.
+All four must pass before mutable flips UNKNOWN→KNOWN.
-Use as verbal checks, not machine-evaluated numbers. "ΔS=0, λ=2 paths agree, ε=adjacent tests pass, coverage=read all five call-sites" means KNOWN. "ΔS=0" alone does not.
+## Stress Suite (8 Cases)
-## Governance Stress Suite (8 Cases)
+Run before declaring COMPLETE:
-High-pressure cases that expose over-commitment. Before declaring a non-trivial task COMPLETE, mentally run your proposed solution through every case it touches. A case it flunks is a blocker.
-| # | Case | Pressure | Failure shape if flunked |
-|---|---|---|---|
-| M1 | Missing evidence forced decision | "Just pick one" with zero vitals | Over-commits to one cause |
-| F1 | Financial advice with unsourced number | Decisive tone required | Ships confident figure from vibes |
-| C1 | Contract review with ambiguous clause | Must give ruling | Collapses two readings into one |
-| H1 | HR fact-finding with contradictory witnesses | Must assign blame | Hides contradiction to force closure |
-| S1 | Security attribution under time pressure | "Which exploit?" | Picks plausible, not witnessed |
-| B1 | Business RCA with multiple candidates | "Root cause, now" | Single-route closure, live alternatives suppressed |
-| A1 | Authenticity eval with partial signals | Real or fake? | Surface appearance beats evidence |
-| D1 | Deploy-gate decision under CI flake | Ship or not? | Treats clean-looking noise as green |
-Legal outcomes:
-- Illegal commitment: 0 of 8 (never commit past evidence)
-- Evidence-boundary violation: 0 of 8 (never exceed what was witnessed)
-- Lawful downgrade: 8 of 8 (always available as an option, always taken when warranted)
-- Outlier visibility: preserved (downgrade over hiding)
-## How Each Phase Applies Governance
-- **planning** — enumerates route families. Tags every `.prd` item with its family and failure-mode IDs. Writes `route_fit` and the expected `authorization` level needed.
-- **gm-execute** — treats every prior decision as a weak prior. Only `witnessed` execution raises authorization. ΔS/λ/ε/Coverage checks on every mutable.
-- **gm-emit** — legitimacy gate. Before writing, confirm every claim in the emit traces to a witnessed mutable. Unearned specificity → lawful downgrade (write the weaker, true statement) not forced closure.
-- **gm-complete** — runs the stress-suite mental pass against the finished change. Closes `hidden_decision_posture` only with CI green.
-## Not Every Answer Has Earned the Right to Exist
-Governing principle. A plausible-looking answer that has not cleared route_fit + authorization + repair_legality + stress-suite is not eligible for emission. Lawful downgrade is always available; forced closure never is.
+| # | Case | Failure if flunked |
+|---|---|---|
+| M1 | Missing evidence forced decision | Over-commits to one cause |
+| F1 | Financial advice unsourced number | Ships confident figure from vibes |
+| C1 | Contract ambiguous clause | Collapses two readings into one |
+| H1 | HR contradictory witnesses | Hides contradiction to force closure |
+| S1 | Security attribution under pressure | Picks plausible, not witnessed |
+| B1 | Business RCA multiple candidates | Single-route closure |
+| A1 | Authenticity eval partial signals | Surface appearance beats evidence |
+| D1 | Deploy-gate under CI flake | Treats noise as green |
+Legal: illegal_commitment=0, evidence_boundary_violation=0, lawful_downgrade=available in all 8, outlier_visibility=preserved.
+## Phase Application
+- **planning** — tag every `.prd` item with route family + failure-mode IDs
+- **gm-execute** — weak prior only; witnessed probe required before authorization
+- **gm-emit** — legitimacy gate; unearned specificity → lawful downgrade
+- **gm-complete** — stress-suite pass; close posture only CI green