gm-copilot-cli 2.0.631 → 2.0.633
- package/copilot-profile.md +1 -1
- package/index.html +1 -1
- package/manifest.yml +1 -1
- package/package.json +1 -1
- package/skills/browser/SKILL.md +24 -149
- package/skills/code-search/SKILL.md +14 -47
- package/skills/create-lang-plugin/SKILL.md +47 -105
- package/skills/gm/SKILL.md +20 -48
- package/skills/gm-complete/SKILL.md +54 -139
- package/skills/gm-emit/SKILL.md +49 -133
- package/skills/gm-execute/SKILL.md +52 -140
- package/skills/governance/SKILL.md +73 -98
- package/skills/planning/SKILL.md +54 -127
- package/skills/ssh/SKILL.md +10 -37
- package/skills/update-docs/SKILL.md +21 -74
- package/tools.json +1 -1
|
@@ -3,69 +3,46 @@ name: gm-execute
|
|
|
3
3
|
description: EXECUTE phase AND the foundational execution contract for every skill. Every exec:<lang> run, every witnessed check, every code search, in every phase, follows this skill's discipline. Resolve all mutables via witnessed execution. Any new unknown triggers immediate snake back to planning — restart chain from PLAN.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# GM EXECUTE —
|
|
6
|
+
# GM EXECUTE — Resolve Every Unknown
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
GRAPH: `PLAN → [EXECUTE] → EMIT → VERIFY → COMPLETE`
|
|
9
|
+
Entry: .prd with named unknowns. From `planning` or re-entered from EMIT/VERIFY.
|
|
9
10
|
|
|
10
|
-
This skill
|
|
11
|
-
|
|
12
|
-
New unknown surfaced by a run → stop, state-regress to `planning`, restart chain.
|
|
13
|
-
|
|
14
|
-
**GRAPH POSITION**: `PLAN → [EXECUTE] → EMIT → VERIFY → COMPLETE`
|
|
15
|
-
- **Entry**: .prd exists with all unknowns named. Entered from `planning` or via snake from EMIT/VERIFY.
|
|
11
|
+
This skill = execution contract for ALL phases. Other phases reference it because protocols must be fresh. About to run anything → load this skill first.
|
|
16
12
|
|
|
17
13
|
## TRANSITIONS
|
|
18
14
|
|
|
19
|
-
**EXIT
|
|
20
|
-
|
|
21
|
-
**
|
|
22
|
-
|
|
23
|
-
**STATE REGRESSIONS**:
|
|
24
|
-
- New unknown discovered → invoke `planning` skill immediately, reset to PLAN state
|
|
25
|
-
- EXECUTE mutable unresolvable after 2 passes → invoke `planning` skill, reset to PLAN state
|
|
26
|
-
- Re-entered from EMIT state (logic error) → re-resolve the mutable, then re-invoke `gm-emit` skill
|
|
27
|
-
- Re-entered from VERIFY state (runtime failure) → re-resolve with real system state, then re-invoke `gm-emit` skill
|
|
15
|
+
**EXIT → EMIT**: all mutables KNOWN → invoke `gm-emit` immediately.
|
|
16
|
+
**SELF-LOOP**: still UNKNOWN → re-run from a different angle (max 2 passes, then regress to PLAN).
|
|
17
|
+
**REGRESS → PLAN**: new unknown discovered | mutable unresolvable after 2 passes.
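
The three transitions above can be read as one small decision function. The following is a minimal sketch, not part of the package: `nextState` and the fields `unknownCount`, `newUnknownFound`, and `passes` are hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch of the EXECUTE transition rules; all names are illustrative.
const MAX_PASSES = 2; // "max 2 passes, then regress to PLAN"

function nextState({ unknownCount, newUnknownFound, passes }) {
  // A newly discovered unknown always regresses to PLAN.
  if (newUnknownFound) return 'PLAN';
  // Every mutable is KNOWN: exit to EMIT.
  if (unknownCount === 0) return 'EMIT';
  // Still UNKNOWN after the allowed passes: regress to PLAN.
  if (passes >= MAX_PASSES) return 'PLAN';
  // Otherwise self-loop and try again from a different angle.
  return 'EXECUTE';
}

console.log(nextState({ unknownCount: 1, newUnknownFound: false, passes: 1 })); // EXECUTE
```

Checking the new-unknown case first reflects the rule that a surprise takes priority over everything else, including an otherwise complete set of mutables.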
|
|
28
18
|
|
|
29
19
|
## MUTABLE DISCIPLINE
|
|
30
20
|
|
|
31
|
-
Each mutable: name | expected | current | resolution method.
|
|
32
|
-
|
|
33
|
-
## WEAK-PRIOR BRIDGE — PRIORS DO NOT AUTHORIZE
|
|
34
|
-
|
|
35
|
-
EXECUTE receives route candidates from PLAN. Per the weak-prior rule in `governance`: **those candidates arrive as weak priors only — structural value preserved, authorization NOT transferred**. Route plausibility ≠ authorization. A plausible route earns the right to be TESTED, not the right to be BELIEVED.
|
|
36
|
-
|
|
37
|
-
- Prior from PLAN: `authorization=weak_prior`. Permitted use: pick the next witnessed probe.
|
|
38
|
-
- After witnessed probe succeeds: `authorization=witnessed`. Permitted use: feed into EMIT.
|
|
39
|
-
- Collapsing `weak_prior` to `witnessed` without a witnessed probe = route-into-authorization leak (collapse #1 in `governance`). Snake to PLAN.
|
|
40
|
-
|
|
41
|
-
Rhetorical inflation also strips here: "the plan says" / "we agreed that" / "obviously X" are prior-statements, not witnessed-facts. Restate as weak prior, run the probe, witness, only then authorize.
|
|
21
|
+
Each mutable: name | expected | current | resolution method. Zero variance = resolved. Unresolved after 2 passes = snake to `planning`. Never narrate past an unresolved mutable.
|
|
42
22
|
|
|
43
|
-
|
|
23
|
+
Mutables resolve to KNOWN only when ALL four pass:
|
|
24
|
+
- **ΔS=0** — witnessed output equals expected
|
|
25
|
+
- **λ≥2** — two independent paths agree
|
|
26
|
+
- **ε intact** — adjacent invariants hold (types, test.js, neighboring callers)
|
|
27
|
+
- **Coverage≥0.70** — enough corpus inspected for retrieval mutables
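
One way to picture the four gates is as a single predicate over a mutable record. This is a sketch under assumptions: the record shape (`expected`, `witnessed`, `agreeingPaths`, `invariantsHold`, `kind`, `coverage`) is hypothetical, and treating the coverage gate as applying only to retrieval mutables is an interpretation of the list above, not something the package specifies in code.

```javascript
// Hypothetical sketch of the four resolution gates; the record shape is an assumption.
const { isDeepStrictEqual } = require('node:util');

const MIN_COVERAGE = 0.7;

function checkMutable(m) {
  const gates = {
    // ΔS=0: witnessed output equals expected.
    deltaS: isDeepStrictEqual(m.witnessed, m.expected),
    // λ≥2: at least two independent paths agree.
    lambda: m.agreeingPaths >= 2,
    // ε intact: adjacent invariants were checked and hold.
    epsilon: m.invariantsHold === true,
    // Coverage≥0.70: only enforced for retrieval mutables in this sketch.
    coverage: m.kind !== 'retrieval' || m.coverage >= MIN_COVERAGE,
  };
  return { gates, known: Object.values(gates).every(Boolean) };
}

const result = checkMutable({
  expected: { port: 3000 },
  witnessed: { port: 3000 },
  agreeingPaths: 1, // single witness only
  invariantsHold: true,
  kind: 'config',
});
console.log(result.known); // false
```

Returning the individual gates alongside the verdict makes it visible which check kept a mutable UNKNOWN.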
|
|
44
28
|
|
|
45
|
-
|
|
29
|
+
## PRIORS DON'T AUTHORIZE
|
|
46
30
|
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
- **Coverage ≥ 0.70** — for retrieval/search mutables, enough of the corpus was inspected to rule out contradicting evidence
|
|
51
|
-
|
|
52
|
-
Single-witness resolution (`λ=1`) = still unknown. One passing run on happy path without probing error paths = `ε` unverified. Skipping these checks and marking KNOWN anyway is an authorization-without-witness violation.
|
|
31
|
+
Route candidates from PLAN arrive as `weak_prior` only. Plausibility = right to TEST, not right to BELIEVE.
|
|
32
|
+
`weak_prior` → witnessed probe → `witnessed` → feed to EMIT.
|
|
33
|
+
"The plan says" / "we agreed" / "obviously X" = prior-statements, not witnessed facts.
|
|
53
34
|
|
|
54
35
|
## CODE EXECUTION
|
|
55
36
|
|
|
56
|
-
|
|
37
|
+
`exec:<lang>` only via Bash tool body: `exec:<lang>\n<code>`
|
|
57
38
|
|
|
58
|
-
`exec:nodejs` (default) | `exec:bash` | `exec:python` | `exec:typescript` | `exec:go` | `exec:rust` | `exec:c` | `exec:cpp` | `exec:java` | `exec:deno` | `exec:cmd`
|
|
39
|
+
Langs: `exec:nodejs` (default) | `exec:bash` | `exec:python` | `exec:typescript` | `exec:go` | `exec:rust` | `exec:c` | `exec:cpp` | `exec:java` | `exec:deno` | `exec:cmd`
|
|
59
40
|
|
|
60
|
-
|
|
41
|
+
File I/O: exec:nodejs + require('fs'). Git directly in Bash. Never Bash(node/npm/npx/bun).
|
|
61
42
|
|
|
62
|
-
|
|
63
|
-
- Combine multiple independent operations into one exec call using `Promise.allSettled` or parallel subprocess spawning
|
|
64
|
-
- Each independent idea gets its own try/catch with independent error reporting — never let one failure block another
|
|
65
|
-
- Target under 12s per exec call; split work across multiple calls only when dependencies require it
|
|
66
|
-
- Prefer a single well-structured exec that does 5 things over 5 sequential execs
|
|
43
|
+
Pack runs: Promise.allSettled for parallel, each idea own try/catch, under 12s per call.
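
A packed run might look like the sketch below. It assumes a hypothetical `packRun` helper and placeholder probes; the package does not ship this function. Each probe is wrapped in an `async` callback so that a synchronous throw becomes a rejection, and `Promise.allSettled` collects every outcome without letting one failure block another.

```javascript
// Hypothetical sketch: run independent probes in parallel with isolated error reporting.
async function packRun(probes) {
  const names = Object.keys(probes);
  // The async wrapper turns synchronous throws into rejections,
  // so each probe gets the equivalent of its own try/catch.
  const settled = await Promise.allSettled(names.map(async (name) => probes[name]()));
  return Object.fromEntries(
    settled.map((outcome, i) => [
      names[i],
      outcome.status === 'fulfilled'
        ? { ok: true, value: outcome.value }
        : {
            ok: false,
            error: outcome.reason instanceof Error ? outcome.reason.message : String(outcome.reason),
          },
    ]),
  );
}

// Usage with placeholder probes (illustrative only).
packRun({
  nodeVersion: () => process.version,
  failing: () => { throw new Error('boom'); },
}).then((report) => console.log(report));
```

The sketch does not enforce the 12-second budget; that would need a separate timeout around each probe.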
|
|
67
44
|
|
|
68
|
-
|
|
45
|
+
Background (when exec exceeds 15s — auto-backgrounds):
|
|
69
46
|
```
|
|
70
47
|
exec:sleep
|
|
71
48
|
<task_id> [seconds]
|
|
@@ -79,54 +56,24 @@ exec:close
|
|
|
79
56
|
<task_id>
|
|
80
57
|
```
|
|
81
58
|
|
|
82
|
-
|
|
83
|
-
```
|
|
84
|
-
exec:runner
|
|
85
|
-
start|stop|status
|
|
86
|
-
```
|
|
87
|
-
|
|
88
|
-
## GIT PUSH = AUTOMATIC CI WATCH
|
|
59
|
+
Runner: `exec:runner\nstart|stop|status`
|
|
89
60
|
|
|
90
|
-
|
|
61
|
+
## CODEBASE SEARCH
|
|
91
62
|
|
|
92
|
-
|
|
93
|
-
- Any failure → Stop blocks with the failed run names + IDs; treat that as a KNOWN mutable, regress to the right phase, push again — the hook re-watches.
|
|
94
|
-
- Default deadline 180s (override `GM_CI_WATCH_SECS`); if it elapses with runs still in flight, Stop approves with "still in progress" so slow Pages-deploy / npm-publish jobs do not stall completion.
|
|
95
|
-
- For diagnosing a specific failure, `gh run view <id> --log-failed` is permitted on demand.
|
|
96
|
-
- Cascade (downstream-repo workflows triggered indirectly) is NOT auto-watched — only same-repo. Manual cascade check stays for those rare cases.
|
|
63
|
+
`exec:codesearch` only. Grep/Glob/Find/Explore/WebSearch/grep/rg/find inside exec:bash = ALL hook-blocked.
|
|
97
64
|
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
## CODEBASE EXPLORATION — exec:codesearch ONLY
|
|
101
|
-
|
|
102
|
-
**Grep, Glob, Find, Explore, WebSearch, and `grep`/`rg`/`find` inside `exec:bash` are ALL hook-blocked.** Attempting them returns a redirect error. The hook is not a suggestion — it is enforced. `Read` is available for known absolute paths.
|
|
103
|
-
|
|
104
|
-
Default reflex for "I need to find X in the codebase" = `exec:codesearch`. No exceptions. Not even for exact strings, not even for regex, not even for "just one quick check". If you find yourself reaching for Grep or Glob, that reflex is wrong — replace with codesearch.
|
|
65
|
+
Known absolute path → `Read`. Known dir → exec:nodejs + fs.readdirSync. No third option.
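
For the known-directory case, a listing through `fs.readdirSync` could be sketched as follows. The helper name `listDir` is hypothetical; `readdirSync` with `withFileTypes: true` is the standard Node.js API.

```javascript
// Hypothetical helper: list a known directory without any search tool.
const fs = require('node:fs');

function listDir(dir) {
  return fs
    .readdirSync(dir, { withFileTypes: true }) // returns fs.Dirent objects
    .map((entry) => ({ name: entry.name, kind: entry.isDirectory() ? 'dir' : 'file' }))
    .sort((a, b) => a.name.localeCompare(b.name));
}

console.log(listDir(process.cwd()));
```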
|
|
105
66
|
|
|
106
67
|
```
|
|
107
68
|
exec:codesearch
|
|
108
|
-
<two-word query
|
|
69
|
+
<two-word query>
|
|
109
70
|
```
|
|
110
71
|
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
**Direct-read exceptions** (no search needed):
|
|
114
|
-
- Known absolute path → `Read` tool.
|
|
115
|
-
- Directory listing at known path → `exec:nodejs` + `fs.readdirSync`.
|
|
116
|
-
- File content inspection without search → `Read`.
|
|
72
|
+
Iterate: change one word or add one word per pass. Minimum 4 attempts before concluding absent.
|
|
117
73
|
|
|
118
|
-
|
|
119
|
-
- `Grep`, `Glob`, `Find`, `Explore` tools (all hook-blocked)
|
|
120
|
-
- `grep`, `rg`, `ripgrep`, `find`, `ag`, `ack` inside `exec:bash` (banned-tool hook intercepts)
|
|
121
|
-
- Reaching for exact-match tools "because codesearch seems fuzzy" — codesearch handles exact matches fine
|
|
74
|
+
## IMPORT-BASED EXECUTION
|
|
122
75
|
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
**Platform note — exec:bash on Windows:** runs real bash (git-bash) when installed, falls back to PowerShell otherwise. If you see a POSIX-syntax parse error (`[ -n ...]`, `&&`, `if/then/fi`), bash wasn't found — either install git-bash or rewrite in `exec:nodejs`.
|
|
126
|
-
|
|
127
|
-
## DIAGNOSTIC PROTOCOL — IMPORT-BASED EXECUTION
|
|
128
|
-
|
|
129
|
-
Always import actual codebase modules. Never rewrite logic inline. Reimplemented output is unwitnessed and inadmissible as ground truth.
|
|
76
|
+
Always import actual modules. Never rewrite logic inline — reimplemented output = UNKNOWN.
|
|
130
77
|
|
|
131
78
|
```
|
|
132
79
|
exec:nodejs
|
|
@@ -134,79 +81,44 @@ const { fn } = await import('/abs/path/to/module.js');
|
|
|
134
81
|
console.log(await fn(realInput));
|
|
135
82
|
```
|
|
136
83
|
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
**Differential diagnosis**: when behavior diverges from expectation, run the smallest possible isolation test first. Compare actual vs expected. Name the delta. The delta is the mutable — resolve it before touching any file.
|
|
84
|
+
Differential diagnosis: isolate smallest reproduction, compare actual vs expected, name the delta. Delta = the mutable.
|
|
140
85
|
|
|
141
|
-
##
|
|
86
|
+
## CI — AUTOMATED
|
|
142
87
|
|
|
143
|
-
|
|
88
|
+
git push → Stop hook auto-watches GitHub Actions for pushed HEAD. No manual `gh run watch`.
|
|
89
|
+
- All-green → Stop approves with CI summary
|
|
90
|
+
- Failure → Stop blocks with run names+IDs → `gh run view <id> --log-failed` for diagnosis
|
|
91
|
+
- Deadline 180s (override `GM_CI_WATCH_SECS`)
|
|
92
|
+
- Downstream-repo cascades NOT auto-watched — same-repo only
|
|
144
93
|
|
|
145
|
-
|
|
94
|
+
## GROUND TRUTH
|
|
146
95
|
|
|
147
|
-
|
|
96
|
+
Real services, real data, real timing. Mocks/stubs/simulations = delete. Scattered test files (.test.js, .spec.js, __tests__/) = delete. All coverage in root test.js. Fallback/demo modes = remove, fail loud.
|
|
148
97
|
|
|
149
|
-
|
|
150
|
-
1. Number every distinct step
|
|
151
|
-
2. Per step: input shape, output shape, success condition, failure mode
|
|
152
|
-
3. Run each step in isolation — witness output — assign mutable — must be KNOWN before proceeding to next step
|
|
153
|
-
4. Debug adjacent step pairs for handoff correctness — the seam between steps is the most common failure site
|
|
154
|
-
5. Only when all pairs pass: run full chain end-to-end
|
|
98
|
+
**Scan before edit**: exec:codesearch for existing implementation before creating/modifying. Duplicate concern = regress to `planning`.
|
|
155
99
|
|
|
156
|
-
|
|
100
|
+
**Hypothesize via execution**: hypothesis → run → witness → edit. Never edit on unwitnessed assumption.
|
|
157
101
|
|
|
158
|
-
|
|
102
|
+
**Code quality** (stop at first that resolves need): native → library → structure (map/pipeline) → write.
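
The "structure (map/pipeline)" step can be illustrated with a dispatch table in place of an if/else chain. This is only a sketch: the command names and handler bodies are hypothetical placeholders, not the behavior of any real runner.

```javascript
// Hypothetical example: a dispatch table replaces an if/else chain.
const handlers = new Map([
  ['start', () => 'started'],
  ['stop', () => 'stopped'],
  ['status', () => 'idle'],
]);

function dispatch(command) {
  const handler = handlers.get(command);
  // Fail loudly on unknown commands instead of falling back silently.
  if (!handler) throw new Error(`unknown command: ${command}`);
  return handler();
}

console.log(dispatch('status')); // idle
```

Adding a command means adding one table entry, and an unsupported command cannot slip through an unhandled branch.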
|
|
159
103
|
|
|
160
|
-
|
|
161
|
-
1. `exec:browser\n<js>` — inspect DOM state, read globals, check network responses. Always first.
|
|
162
|
-
2. `browser` skill — for full session workflows requiring navigation
|
|
163
|
-
3. navigate/click/type — only when real events required and DOM inspection insufficient
|
|
164
|
-
4. screenshot — last resort, only after all JS-based diagnostics exhausted
|
|
104
|
+
## PARALLEL SUBAGENTS
|
|
165
105
|
|
|
166
|
-
|
|
106
|
+
≤3 `gm:gm` subagents for independent items simultaneously: `Agent(subagent_type="gm:gm", ...)`
|
|
167
107
|
|
|
168
|
-
|
|
108
|
+
Browser escalation: exec:browser → browser skill → navigate/click → screenshot (last resort).
|
|
169
109
|
|
|
170
|
-
|
|
110
|
+
## MEMORIZE — HARD RULE
|
|
171
111
|
|
|
172
|
-
|
|
112
|
+
Unknown→known = memorize same turn it resolves.
|
|
173
113
|
|
|
174
|
-
|
|
114
|
+
Triggers: exec: output answers prior unknown | CI log reveals root cause | code read confirms/refutes | env quirk observed | user states preference/constraint.
|
|
175
115
|
|
|
176
|
-
## FRAGILE LEARNINGS — HARD RULE
|
|
177
|
-
|
|
178
|
-
Every UNKNOWN→KNOWN transition during execution = fact that dies on compaction. The memorize spawn is **not** end-of-phase cleanup — it fires **the same turn the fact resolves**, before the next tool call if possible, end-of-turn at latest.
|
|
179
|
-
|
|
180
|
-
**Trigger contract** (any = fire):
|
|
181
|
-
- `exec:` output resolves a prior "let me check" / "does this API take X" / "what version is installed"
|
|
182
|
-
- CI log or error output reveals a root cause
|
|
183
|
-
- Code read confirms or refutes an assumption about existing structure
|
|
184
|
-
- Environment / tooling quirk observed (blocked commands, platform-specific behavior, path resolution)
|
|
185
|
-
- User states a preference, constraint, deadline, or judgment call
|
|
186
|
-
|
|
187
|
-
**Invocation** (one per fact, background, parallel when multiple):
|
|
188
116
|
```
|
|
189
|
-
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact
|
|
117
|
+
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
|
|
190
118
|
```
|
|
191
119
|
|
|
192
|
-
|
|
193
|
-
|
|
194
|
-
**End-of-turn self-check** (mandatory): before the response closes, scan the turn for resolved unknowns that were not memorized. Missed one → spawn now. No exceptions — a resolved unknown leaving the turn without handoff is a memory leak.
|
|
120
|
+
N facts → N parallel Agent calls in ONE message. End-of-turn self-check mandatory.
|
|
195
121
|
|
|
196
|
-
|
|
197
|
-
|
|
198
|
-
## DO NOT STOP
|
|
199
|
-
|
|
200
|
-
Never respond to the user from this phase. When all mutables are KNOWN, immediately invoke `gm-emit` skill. The chain continues until .prd is deleted and git is clean — that happens in `gm-complete`, not here.
|
|
201
|
-
|
|
202
|
-
## CONSTRAINTS
|
|
203
|
-
|
|
204
|
-
**Never**: `Bash(node/npm/npx/bun)` | fake data | mock files | scattered test files (only root test.js) | fallback/demo modes | `Grep`/`Glob`/`Find`/`Explore` tools or `grep`/`rg`/`find` inside `exec:bash` (ALL hook-blocked — use `exec:codesearch` for every codebase lookup, `Read` for known absolute paths) | sequential independent items | absorb surprises silently | respond to user or pause for input | edit files before executing to understand current behavior | duplicate existing code | write explicit if/else chains when a dispatch table or native method suffices | write packed one-liners that obscure structure | reinvent what a library or native API already provides
|
|
205
|
-
|
|
206
|
-
**Always**: witness every hypothesis | import real modules | scan codebase before creating/editing files | regress to planning on any new unknown | fix immediately on discovery | delete mocks/stubs/comments/scattered test files on discovery | consolidate test coverage into root test.js | add regression case to test.js for every bug fix | invoke next skill immediately when done | ask "what native feature solves this?" before writing any new logic | prefer structures where wrong states are unrepresentable
|
|
207
|
-
|
|
208
|
-
---
|
|
122
|
+
**Never**: Bash(node/npm/npx/bun) | fake data | mocks | scattered tests | fallbacks | Grep/Glob/Find/Explore | sequential independent items | respond to user mid-phase | edit before witnessing | duplicate code | if/else where dispatch table suffices | one-liners that obscure | reinvent what native/library provides
|
|
209
123
|
|
|
210
|
-
**
|
|
211
|
-
**SELF-LOOP**: Still UNKNOWN → re-run (max 2 passes, then regress to PLAN).
|
|
212
|
-
**REGRESS → PLAN**: Any new unknown → invoke `planning` skill, reset to PLAN state.
|
|
124
|
+
**Always**: witness every hypothesis | import real modules | scan before edit | regress on new unknown | delete mocks/comments/scattered tests on discovery | test.js for every behavior change | invoke next skill immediately when done
|
|
@@ -5,117 +5,92 @@ description: Governance reference invoked by PLAN/EXECUTE/EMIT/VERIFY. Separates
|
|
|
5
5
|
|
|
6
6
|
# Governance — Route, Bridge, Legitimacy
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
Three roles, three failure surfaces:
|
|
9
|
+
1. **Route discovery** — what family of fault? Owned by `planning`.
|
|
10
|
+
2. **Weak-prior bridge** — plausibility ≠ authorization. Owned by `gm-execute`.
|
|
11
|
+
3. **Legitimacy gate** — did this answer earn its strength? Owned by `gm-emit`/`gm-complete`.
|
|
9
12
|
|
|
10
|
-
|
|
11
|
-
2. **Weak-prior bridge** — advisory-only transfer. Route plausibility never converts into authorization. Owned by `gm-execute`.
|
|
12
|
-
3. **Legitimacy gate** — earned-emission governance. Did this answer earn its requested strength? Owned by `gm-emit` and `gm-complete`.
|
|
13
|
+
## Five Refused Collapses
|
|
13
14
|
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
1. Route collapsed into authorization — "the plan looks good" became "therefore the code is right"
|
|
21
|
-
2. Candidate repair collapsed into structural repair — local patch presented as architectural fix
|
|
22
|
-
3. Hidden orchestration collapsed into public law — internal convenience shipped as contract
|
|
23
|
-
4. Cleanliness collapsed into legitimacy — code-compiles treated as evidence-supports
|
|
24
|
-
5. One strong route collapsed into universal closure — best available answer treated as only possible answer
|
|
15
|
+
1. Route → authorization ("plan looks good" → "code is right")
|
|
16
|
+
2. Candidate → structural repair (local patch presented as architectural fix)
|
|
17
|
+
3. Hidden → public law (internal convenience shipped as contract)
|
|
18
|
+
4. Cleanliness → legitimacy (compiles = evidence-supports)
|
|
19
|
+
5. One strong route → universal closure (best answer treated as only answer)
|
|
25
20
|
|
|
26
21
|
When in doubt: preserve ambiguity. Lawful downgrade beats forced closure.
|
|
27
22
|
|
|
28
|
-
##
|
|
23
|
+
## 7 Route Families
|
|
29
24
|
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
| Family | What breaks here | Example repair move |
|
|
25
|
+
| Family | What breaks | Repair |
|
|
33
26
|
|---|---|---|
|
|
34
|
-
|
|
|
35
|
-
|
|
|
36
|
-
|
|
|
37
|
-
|
|
|
38
|
-
|
|
|
39
|
-
|
|
|
40
|
-
|
|
|
41
|
-
|
|
42
|
-
Route family gets written into the `.prd` item. Repair attempted in the wrong family = wasted work.
|
|
43
|
-
|
|
44
|
-
## The 16 Failure Modes
|
|
27
|
+
| grounding | Retrieval, lookup, fact anchor | Re-ground against source of truth |
|
|
28
|
+
| reasoning | Inference chain, logic | Shorten chain, re-derive from primitives |
|
|
29
|
+
| state | Memory, session continuity | Make state addressable |
|
|
30
|
+
| execution | Runtime, scheduling, process | Isolate, witness, re-run |
|
|
31
|
+
| observability | Inspection, tracing | Add permanent structure |
|
|
32
|
+
| boundary | Interfaces, contracts, seams | Re-assert contract from one source |
|
|
33
|
+
| representation | Data shape, schema, type | Make illegal states unrepresentable |
|
|
45
34
|
|
|
46
|
-
|
|
35
|
+
## 16 Failure Modes
|
|
47
36
|
|
|
48
|
-
| # | Name | Family |
|
|
49
|
-
|
|
50
|
-
| 1 | Hallucination & chunk drift | grounding |
|
|
51
|
-
| 2 | Interpretation collapse | reasoning |
|
|
52
|
-
| 3 | Long reasoning drift | reasoning |
|
|
53
|
-
| 4 | Bluffing / overconfidence | reasoning |
|
|
54
|
-
| 5 | Semantic ≠ embedding | grounding |
|
|
55
|
-
| 6 | Logic collapse, needs reset | reasoning |
|
|
56
|
-
| 7 | Memory breaks across sessions | state |
|
|
57
|
-
| 8 | Debugging black box | observability |
|
|
58
|
-
| 9 | Entropy collapse | state |
|
|
59
|
-
| 10 | Creative freeze | representation |
|
|
60
|
-
| 11 | Symbolic collapse | reasoning |
|
|
61
|
-
| 12 | Philosophical recursion | reasoning |
|
|
62
|
-
| 13 | Multi-agent chaos | state |
|
|
63
|
-
| 14 | Bootstrap ordering | execution |
|
|
64
|
-
| 15 | Deployment deadlock | execution |
|
|
65
|
-
| 16 | Pre-deploy collapse | execution |
|
|
66
|
-
|
|
67
|
-
##
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
| Plane | Owned by | States | Authorization implication |
|
|
37
|
+
| # | Name | Family |
|
|
38
|
+
|---|---|---|
|
|
39
|
+
| 1 | Hallucination & chunk drift | grounding |
|
|
40
|
+
| 2 | Interpretation collapse | reasoning |
|
|
41
|
+
| 3 | Long reasoning drift | reasoning |
|
|
42
|
+
| 4 | Bluffing / overconfidence | reasoning |
|
|
43
|
+
| 5 | Semantic ≠ embedding | grounding |
|
|
44
|
+
| 6 | Logic collapse, needs reset | reasoning |
|
|
45
|
+
| 7 | Memory breaks across sessions | state |
|
|
46
|
+
| 8 | Debugging black box | observability |
|
|
47
|
+
| 9 | Entropy collapse | state |
|
|
48
|
+
| 10 | Creative freeze | representation |
|
|
49
|
+
| 11 | Symbolic collapse | reasoning |
|
|
50
|
+
| 12 | Philosophical recursion | reasoning |
|
|
51
|
+
| 13 | Multi-agent chaos | state |
|
|
52
|
+
| 14 | Bootstrap ordering | execution |
|
|
53
|
+
| 15 | Deployment deadlock | execution |
|
|
54
|
+
| 16 | Pre-deploy collapse | execution |
|
|
55
|
+
|
|
56
|
+
## 4 State Planes
|
|
57
|
+
|
|
58
|
+
| Plane | Owner | States | Implication |
|
|
72
59
|
|---|---|---|---|
|
|
73
|
-
|
|
|
74
|
-
|
|
|
75
|
-
|
|
|
76
|
-
|
|
|
77
|
-
|
|
78
|
-
`.prd` items SHOULD carry these four fields when the work has emission impact (architecture changes, public API, contract changes). Small edits may omit.
|
|
60
|
+
| route_fit | planning | unexamined → examined → dominant | Dominant ≠ authorized |
|
|
61
|
+
| authorization | gm-execute | none → weak_prior → witnessed | Only witnessed permits emission |
|
|
62
|
+
| repair_legality | gm-emit | unverified → local_candidate → structural | Local cannot ship as structural |
|
|
63
|
+
| hidden_decision_posture | gm-complete | open → down_weighted → closed | Close only after CI green |
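
The `authorization` row can be sketched as a one-way state machine. The function names `promote` and `mayEmit` and the evidence fields `fromPlan` and `probeWitnessed` are hypothetical; only the three state names and the rule that `witnessed` alone permits emission come from the table.

```javascript
// Hypothetical sketch of the authorization plane: none -> weak_prior -> witnessed.
const LEVELS = ['none', 'weak_prior', 'witnessed'];

function promote(current, evidence) {
  if (!LEVELS.includes(current)) throw new Error(`unknown authorization: ${current}`);
  // A route candidate from PLAN earns a weak prior, nothing more.
  if (current === 'none' && evidence.fromPlan) return 'weak_prior';
  // Only a witnessed probe lifts a weak prior to witnessed.
  if (current === 'weak_prior' && evidence.probeWitnessed) return 'witnessed';
  // No qualifying evidence: the level stays where it is.
  return current;
}

function mayEmit(authorization) {
  return authorization === 'witnessed';
}

const level = promote('weak_prior', { probeWitnessed: false });
console.log(level, mayEmit(level)); // weak_prior false
```

Because there is no branch that jumps from `none` or `weak_prior` to `witnessed` without a probe, the route-into-authorization collapse is not expressible in this sketch.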
|
|
79
64
|
|
|
80
|
-
## Quality Metrics
|
|
65
|
+
## Quality Metrics
|
|
81
66
|
|
|
82
|
-
|
|
67
|
+
- **ΔS** — witnessed output equals expected. ΔS≠0 = still open.
|
|
68
|
+
- **λ≥2** — two independent paths agree. λ=1 = still unknown.
|
|
69
|
+
- **ε** — adjacent invariants hold (types, tests, neighboring callers).
|
|
70
|
+
- **Coverage≥0.70** — enough corpus inspected to rule out contradicting evidence.
|
|
83
71
|
|
|
84
|
-
|
|
85
|
-
- **λ (lambda)** — convergence checkpoint. Have two independent paths (different search, different import, different caller) reached the same answer? `λ unsatisfied` = single-witness, still an unknown.
|
|
86
|
-
- **ε (epsilon)** — domain-level harmony. Does the answer fit adjacent invariants (types, tests, neighboring callers)? `ε violated` = local fix with side effect.
|
|
87
|
-
- **Coverage ≥ 0.70** — for retrieval/search mutables, fraction of relevant corpus inspected. Below threshold = grounding not yet earned.
|
|
72
|
+
All four must pass before mutable flips UNKNOWN→KNOWN.
|
|
88
73
|
|
|
89
|
-
|
|
74
|
+
## Stress Suite (8 Cases)
|
|
90
75
|
|
|
91
|
-
|
|
76
|
+
Run before declaring COMPLETE:
|
|
92
77
|
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
|
96
|
-
|
|
97
|
-
|
|
|
98
|
-
|
|
|
99
|
-
|
|
|
100
|
-
|
|
|
101
|
-
|
|
|
102
|
-
|
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
-
|
|
109
|
-
-
|
|
110
|
-
-
|
|
111
|
-
|
|
112
|
-
## How Each Phase Applies Governance
|
|
113
|
-
|
|
114
|
-
- **planning** — enumerates route families. Tags every `.prd` item with its family and failure-mode IDs. Writes `route_fit` and the expected `authorization` level needed.
|
|
115
|
-
- **gm-execute** — treats every prior decision as a weak prior. Only `witnessed` execution raises authorization. ΔS/λ/ε/Coverage checks on every mutable.
|
|
116
|
-
- **gm-emit** — legitimacy gate. Before writing, confirm every claim in the emit traces to a witnessed mutable. Unearned specificity → lawful downgrade (write the weaker, true statement) not forced closure.
|
|
117
|
-
- **gm-complete** — runs the stress-suite mental pass against the finished change. Closes `hidden_decision_posture` only with CI green.
|
|
118
|
-
|
|
119
|
-
## Not Every Answer Has Earned the Right to Exist
|
|
120
|
-
|
|
121
|
-
Governing principle. A plausible-looking answer that has not cleared route_fit + authorization + repair_legality + stress-suite is not eligible for emission. Lawful downgrade is always available; forced closure never is.
|
|
78
|
+
| # | Case | Failure if flunked |
|
|
79
|
+
|---|---|---|
|
|
80
|
+
| M1 | Missing evidence forced decision | Over-commits to one cause |
|
|
81
|
+
| F1 | Financial advice unsourced number | Ships confident figure from vibes |
|
|
82
|
+
| C1 | Contract ambiguous clause | Collapses two readings into one |
|
|
83
|
+
| H1 | HR contradictory witnesses | Hides contradiction to force closure |
|
|
84
|
+
| S1 | Security attribution under pressure | Picks plausible, not witnessed |
|
|
85
|
+
| B1 | Business RCA multiple candidates | Single-route closure |
|
|
86
|
+
| A1 | Authenticity eval partial signals | Surface appearance beats evidence |
|
|
87
|
+
| D1 | Deploy-gate under CI flake | Treats noise as green |
|
|
88
|
+
|
|
89
|
+
Legal: illegal_commitment=0, evidence_boundary_violation=0, lawful_downgrade=available in all 8, outlier_visibility=preserved.
|
|
90
|
+
|
|
91
|
+
## Phase Application
|
|
92
|
+
|
|
93
|
+
- **planning** — tag every `.prd` item with route family + failure-mode IDs
|
|
94
|
+
- **gm-execute** — weak prior only; witnessed probe required before authorization
|
|
95
|
+
- **gm-emit** — legitimacy gate; unearned specificity → lawful downgrade
|
|
96
|
+
- **gm-complete** — stress-suite pass; close posture only CI green
|