gm-kilo 2.0.887 → 2.0.889
- package/bin/plugkit.sha256 +6 -6
- package/bin/plugkit.version +1 -1
- package/bin/rtk.sha256 +2 -2
- package/package.json +1 -1
- package/skills/browser/SKILL.md +16 -15
- package/skills/code-search/SKILL.md +13 -15
- package/skills/create-lang-plugin/SKILL.md +22 -26
- package/skills/gm/SKILL.md +31 -105
- package/skills/gm-complete/SKILL.md +46 -71
- package/skills/gm-emit/SKILL.md +40 -65
- package/skills/gm-execute/SKILL.md +35 -104
- package/skills/governance/SKILL.md +24 -23
- package/skills/pages/SKILL.md +42 -92
- package/skills/planning/SKILL.md +40 -153
- package/skills/research/SKILL.md +8 -14
- package/skills/ssh/SKILL.md +15 -9
- package/skills/textprocessing/SKILL.md +17 -25
- package/skills/update-docs/SKILL.md +15 -24
|
@@ -3,35 +3,36 @@ name: gm-complete
|
|
|
3
3
|
description: VERIFY and COMPLETE phase. End-to-end system verification and git enforcement. Any new unknown triggers immediate snake back to planning — restart chain.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# GM COMPLETE — Verify
|
|
6
|
+
# GM COMPLETE — Verify, then close
|
|
7
7
|
|
|
8
|
-
|
|
9
|
-
Entry: all EMIT gates passed. From `gm-emit`.
|
|
8
|
+
Entry: EMIT gates clear, from `gm-emit`. Exit: `.prd` deleted + test.js green + pushed + CI green → `update-docs`.
|
|
10
9
|
|
|
11
|
-
|
|
10
|
+
Cross-cutting dispositions live in `gm` SKILL.md.
|
|
12
11
|
|
|
13
|
-
|
|
14
|
-
**EXIT → COMPLETE**: .prd deleted + test.js passes + pushed + CI green → invoke `update-docs`.
|
|
15
|
-
**REGRESS → EMIT**: broken file output.
|
|
16
|
-
**REGRESS → EXECUTE**: logic wrong.
|
|
17
|
-
**REGRESS → PLAN**: new unknown or wrong requirements.
|
|
12
|
+
## Transitions
|
|
18
13
|
|
|
19
|
-
|
|
14
|
+
- `.prd` items remain → `gm-execute`
|
|
15
|
+
- `.prd` empty AND test.js green AND pushed AND CI green → `update-docs`
|
|
16
|
+
- Broken file output → `gm-emit`
|
|
17
|
+
- Wrong logic → `gm-execute`
|
|
18
|
+
- New unknown or wrong requirements → `planning`
|
|
20
19
|
|
|
21
|
-
|
|
20
|
+
Failure triage: broken output to EMIT, wrong logic to EXECUTE, new unknown to PLAN. Never patch around surprises.
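The triage rule above reads naturally as a lookup table. A minimal sketch, assuming hypothetical failure-kind labels (`broken_output`, `wrong_logic`, `new_unknown`, `wrong_requirements`) that are not identifiers from the package; only the target skill names come from the Transitions list.

```javascript
// Hypothetical failure kinds mapped to the skill each one regresses to.
// The keys are illustrative labels; the values are the skills named in Transitions.
const REGRESS_TARGET = new Map([
  ['broken_output', 'gm-emit'],
  ['wrong_logic', 'gm-execute'],
  ['new_unknown', 'planning'],
  ['wrong_requirements', 'planning'],
]);

function triage(failureKind) {
  // An unclassified failure is itself a surprise: fail loudly rather than guess a route.
  if (!REGRESS_TARGET.has(failureKind)) {
    throw new Error(`unclassified failure kind: ${failureKind}`);
  }
  return REGRESS_TARGET.get(failureKind);
}
```

Using a table instead of a chain of conditionals keeps the routing in one place, in line with the "dispatch over if/else" preference stated in the EMIT gate.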
|
|
21
|
+
|
|
22
|
+
## Mutables that must resolve before COMPLETE
|
|
22
23
|
|
|
23
24
|
- `witnessed_e2e` — real end-to-end run with witnessed output
|
|
24
|
-
- `browser_validated` —
|
|
25
|
+
- `browser_validated` — for any change touching client / UI / browser-facing code, see gate below. test.js + node-side imports DO NOT satisfy this gate.
|
|
25
26
|
- `git_clean` — `git status --porcelain` returns empty
|
|
26
27
|
- `git_pushed` — `git log origin/main..HEAD --oneline` returns empty
|
|
27
|
-
- `ci_passed` —
|
|
28
|
-
- `prd_empty` — `.gm/prd.yml` deleted
|
|
29
|
-
- `stress_suite_clear` — change walked through
|
|
30
|
-
- `hidden_decision_posture` —
|
|
28
|
+
- `ci_passed` — every GitHub Actions run reaches `conclusion: success`
|
|
29
|
+
- `prd_empty` — `.gm/prd.yml` deleted
|
|
30
|
+
- `stress_suite_clear` — change walked through M1–D1 (governance), none flunked
|
|
31
|
+
- `hidden_decision_posture` — open → down_weighted → closed only when CI is green AND stress suite is clear
|
|
31
32
|
|
|
32
|
-
##
|
|
33
|
+
## End-to-end verification
|
|
33
34
|
|
|
34
|
-
|
|
35
|
+
Real system, real data, witness actual output. Doc updates, "saying done", and screenshots alone are not verification.
|
|
35
36
|
|
|
36
37
|
```
|
|
37
38
|
exec:nodejs
|
|
@@ -39,31 +40,23 @@ const { fn } = await import('/abs/path/to/module.js');
|
|
|
39
40
|
console.log(await fn(realInput));
|
|
40
41
|
```
|
|
41
42
|
|
|
42
|
-
|
|
43
|
+
After every success, enumerate what remains — never stop at first green.
|
|
43
44
|
|
|
44
|
-
##
|
|
45
|
+
## Browser validation gate
|
|
45
46
|
|
|
46
|
-
|
|
47
|
+
Required when this session changed any code that runs in a browser: anything under `client/`, UI components, shaders, page-loaded JS, served HTML, gh-pages assets, dev-server endpoints, or any module imported into the page bundle.
|
|
47
48
|
|
|
48
|
-
Trigger detection (any one
|
|
49
|
-
- `git diff --name-only origin/main..HEAD` includes paths under `client/`, `apps/*/index.js` with client export, `docs/`, `*.html`, shader files, or any file imported by a browser entry.
|
|
50
|
-
- New/changed export consumed by `window.*` or rendered in DOM/canvas/WebGL.
|
|
51
|
-
- Visual, layout, animation, input, network-on-page, or shader behavior altered.
|
|
49
|
+
Trigger detection (any one): `git diff --name-only origin/main..HEAD` includes paths under `client/`, `apps/*/index.js` with client export, `docs/`, `*.html`, shader files, or any file imported by a browser entry; new/changed export consumed by `window.*` or rendered in DOM/canvas/WebGL; visual, layout, animation, input, network-on-page, or shader behavior altered.
|
|
52
50
|
|
|
53
|
-
|
|
54
|
-
1. Boot the real server (or open the static page) on a known URL — witness HTTP 200.
|
|
55
|
-
2. `exec:browser` → `page.goto(url)` → wait for app init (poll for the global the change affects, e.g. `window.__app.<system>`).
|
|
56
|
-
3. Probe via `page.evaluate(() => …)` — assert the specific invariant the change was supposed to establish (instance counts, scene meshes, DOM nodes, render stats, network frames, etc.).
|
|
57
|
-
4. Capture the witnessed numbers in the response. "Looks fine" is not a witness.
|
|
58
|
-
5. Failures → regress to `gm-execute` (logic) or `gm-emit` (output) — never paper over.
|
|
51
|
+
Protocol: boot the real server (or open the static page) on a known URL — witness HTTP 200. `exec:browser` → `page.goto(url)` → wait for app init by polling for the global the change affects (`window.__app.<system>`). Probe via `page.evaluate(() => …)` asserting the specific invariant the change was supposed to establish — instance counts, scene meshes, DOM nodes, render stats, network frames. Capture witnessed numbers in the response — "looks fine" is not a witness. Failures route to `gm-execute` (logic) or `gm-emit` (output) — never paper over.
|
|
59
52
|
|
|
60
|
-
Long-running probes
|
|
53
|
+
Long-running probes split into navigate-call → `exec:wait N` → probe-call to stay under the per-call budget. Do not stack multi-second `setTimeout` inside one `exec:browser` invocation.
|
|
61
54
|
|
|
62
|
-
Exempt only when: change is server-only with zero browser-facing surface, OR repository has no browser surface at all (pure CLI/library). Exemption requires explicit tag in the response: `BROWSER EXEMPT: <reason — must reference diff paths showing zero browser-facing surface>`.
|
|
55
|
+
Exempt only when: change is server-only with zero browser-facing surface, OR the repository has no browser surface at all (pure CLI / library). Exemption requires explicit tag in the response: `BROWSER EXEMPT: <reason — must reference diff paths showing zero browser-facing surface>`. Default posture is NOT exempt — burden is on the agent to prove exemption with diff evidence.
|
|
63
56
|
|
|
64
|
-
|
|
57
|
+
Pre-flight: run `git diff --name-only origin/main..HEAD` and grep for `client/|docs/|\.html$|\.glsl$|\.frag$|\.vert$`. Any hit AND no `exec:browser` block in this session → mandatory regression to `gm-execute`.
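The pre-flight check can be sketched as a pure function over the changed-path list. This is an illustration of the rule, not the package's own implementation: the regular expression is copied from the line above, while `preflight`, `browserFacingHits`, and the `sawExecBrowser` flag are hypothetical names introduced here.

```javascript
// Pattern copied from the pre-flight line; it flags browser-facing paths.
const BROWSER_SURFACE = /client\/|docs\/|\.html$|\.glsl$|\.frag$|\.vert$/;

// changedPaths: the lines printed by `git diff --name-only origin/main..HEAD`.
function browserFacingHits(changedPaths) {
  return changedPaths.filter((path) => BROWSER_SURFACE.test(path));
}

// sawExecBrowser is a hypothetical flag: did this session include an exec:browser block?
function preflight(changedPaths, sawExecBrowser) {
  const hits = browserFacingHits(changedPaths);
  const mustRegress = hits.length > 0 && !sawExecBrowser;
  return { hits, next: mustRegress ? 'gm-execute' : 'continue' };
}
```

For example, `preflight(['server/api.js', 'client/ui.js'], false)` reports one hit and routes to `gm-execute`, while the same diff with a witnessed browser run continues.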
|
|
65
58
|
|
|
66
|
-
##
|
|
59
|
+
## Integration test gate
|
|
67
60
|
|
|
68
61
|
```
|
|
69
62
|
exec:nodejs
|
|
@@ -72,9 +65,9 @@ try { execSync('node test.js', { stdio: 'inherit', timeout: 30000 }); console.lo
|
|
|
72
65
|
catch (e) { console.error('FAIL'); process.exit(1); }
|
|
73
66
|
```
|
|
74
67
|
|
|
75
|
-
Failure →
|
|
68
|
+
Failure → `gm-execute`. No test.js in a repo with testable surface → `gm-execute` to create it.
|
|
76
69
|
|
|
77
|
-
##
|
|
70
|
+
## Git enforcement
|
|
78
71
|
|
|
79
72
|
```
|
|
80
73
|
exec:bash
|
|
@@ -82,48 +75,30 @@ git status --porcelain
|
|
|
82
75
|
git log origin/main..HEAD --oneline
|
|
83
76
|
```
|
|
84
77
|
|
|
85
|
-
Both must return empty. Local commit without push
|
|
86
|
-
|
|
87
|
-
## CI — AUTOMATED
|
|
88
|
-
|
|
89
|
-
Stop hook watches all GitHub Actions runs for the pushed HEAD. Do not call `gh run list` manually.
|
|
90
|
-
- All-green → Stop approves with CI summary in next turn context
|
|
91
|
-
- Failure → Stop blocks with run names+IDs → investigate with `gh run view <id> --log-failed`, fix, push, hook re-watches
|
|
92
|
-
- Deadline 180s (override `GM_CI_WATCH_SECS`) → slow jobs get "still in progress" approve
|
|
78
|
+
Both must return empty. Local commit without push is not complete.
|
|
93
79
|
|
|
94
|
-
##
|
|
80
|
+
## CI is automated
|
|
95
81
|
|
|
96
|
-
|
|
82
|
+
The Stop hook watches Actions for the pushed HEAD. Do not call `gh run list` manually. All-green → Stop approves with CI summary in next-turn context. Failure → Stop blocks with run names + IDs; investigate via `gh run view <id> --log-failed`, fix, push, hook re-watches. Deadline 180s (override `GM_CI_WATCH_SECS`); slow jobs get a "still in progress" approve.
|
|
97
83
|
|
|
98
|
-
##
|
|
84
|
+
## Hygiene sweep
|
|
99
85
|
|
|
100
|
-
Before declaring complete:
|
|
101
86
|
1. Files >200 lines → split
|
|
102
87
|
2. Comments in code → remove
|
|
103
|
-
3. Scattered test files (
|
|
104
|
-
4. Mock/stub/simulation files → delete
|
|
105
|
-
5. Unnecessary doc files (not CHANGELOG
|
|
106
|
-
6. Duplicate concern →
|
|
88
|
+
3. Scattered test files (`.test.js`, `.spec.js`, `__tests__/`, `fixtures/`, `mocks/`) → delete, consolidate into root `test.js`
|
|
89
|
+
4. Mock / stub / simulation files → delete
|
|
90
|
+
5. Unnecessary doc files (not CHANGELOG, CLAUDE, README, TODO.md) → delete
|
|
91
|
+
6. Duplicate concern → regress to `planning` with restructuring instructions
|
|
107
92
|
7. Hardcoded values → derive from ground truth
|
|
108
|
-
8. Fallback/demo modes → remove, fail loud
|
|
109
|
-
9. TODO.md → empty
|
|
110
|
-
10. CHANGELOG.md →
|
|
93
|
+
8. Fallback / demo modes → remove, fail loud
|
|
94
|
+
9. TODO.md → empty or deleted
|
|
95
|
+
10. CHANGELOG.md → entries for this session
|
|
111
96
|
11. Observability gaps → server subsystems expose `/debug/<subsystem>`; client modules register in `window.__debug`
|
|
112
|
-
12. Memorize → every fact from verification handed off via background Agent(memorize) at moment of resolution
|
|
113
|
-
13. Deploy/publish → if deployable, deploy; if npm package, publish
|
|
97
|
+
12. Memorize → every fact from verification handed off via background `Agent(memorize)` at moment of resolution
|
|
98
|
+
13. Deploy / publish → if deployable, deploy; if npm package, publish
|
|
114
99
|
14. GitHub Pages → check `.github/workflows/pages.yml` + `docs/index.html` exist; invoke `pages` skill if absent
|
|
115
|
-
15. Governance stress-suite → walk change through M1,F1,C1,H1,S1,B1,A1,D1; any flunk
|
|
116
|
-
|
|
117
|
-
## MEMORIZE
|
|
118
|
-
|
|
119
|
-
```
|
|
120
|
-
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
|
|
121
|
-
```
|
|
122
|
-
|
|
123
|
-
One per fact, parallel, same turn resolved. End-of-turn self-check mandatory.
|
|
124
|
-
|
|
125
|
-
## COMPLETION DEFINITION
|
|
100
|
+
15. Governance stress-suite → walk change through M1, F1, C1, H1, S1, B1, A1, D1; any flunk regresses to the owning phase
|
|
126
101
|
|
|
127
|
-
|
|
102
|
+
## Completion
|
|
128
103
|
|
|
129
|
-
|
|
104
|
+
All true at once: witnessed e2e | browser_validated when client work touched | failure paths exercised | test.js passes | `.prd` deleted | git clean and pushed | CI green | hygiene sweep clean | TODO.md gone | CHANGELOG.md updated.
|
package/skills/gm-emit/SKILL.md
CHANGED
|
@@ -3,30 +3,34 @@ name: gm-emit
|
|
|
3
3
|
description: EMIT phase. Pre-emit debug, write files, post-emit verify from disk. Any new unknown triggers immediate snake back to planning — restart chain.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# GM EMIT — Write and
|
|
6
|
+
# GM EMIT — Write and verify from disk
|
|
7
7
|
|
|
8
|
-
|
|
9
|
-
Entry: all mutables KNOWN. From `gm-execute` or re-entered from VERIFY.
|
|
8
|
+
Entry: every mutable KNOWN, from `gm-execute` or re-entered from VERIFY. Exit: gates clear → `gm-complete`.
|
|
10
9
|
|
|
11
|
-
|
|
10
|
+
Cross-cutting dispositions live in `gm` SKILL.md.
|
|
12
11
|
|
|
13
|
-
|
|
14
|
-
**SELF-LOOP**: post-emit variance with known cause → fix, re-verify, stay in EMIT.
|
|
15
|
-
**REGRESS → EXECUTE**: pre-emit reveals known logic error.
|
|
16
|
-
**REGRESS → PLAN**: pre-emit reveals new unknown | post-emit variance with unknown cause | scope changed.
|
|
12
|
+
## Transitions
|
|
17
13
|
|
|
18
|
-
|
|
14
|
+
- All gates clear → `gm-complete`
|
|
15
|
+
- Post-emit variance with known cause → fix in-band, re-verify, stay in EMIT
|
|
16
|
+
- Pre-emit reveals known logic error → `gm-execute`
|
|
17
|
+
- Pre-emit reveals new unknown OR post-emit variance with unknown cause OR scope changed → `planning`
|
|
19
18
|
|
|
20
|
-
|
|
21
|
-
1. **Earned specificity** — traces to `authorization=witnessed`, not inflated from weak prior?
|
|
22
|
-
2. **Repair legality** — local patch dressed as structural repair? Downgrade scope or snake to PLAN.
|
|
23
|
-
3. **Lawful downgrade** — can a weaker, true statement replace it? PREFER the downgrade.
|
|
24
|
-
4. **Alternative-route suppression** — live competing route being silenced? Preserve it.
|
|
25
|
-
5. **Strongest objection** — if a reviewer pushed back on this change, what would the sharpest argument be? Articulate it. Cannot articulate = have not understood the alternatives = regress to `gm-execute`.
|
|
19
|
+
## Legitimacy gate (before pre-emit run)
|
|
26
20
|
|
|
27
|
-
|
|
21
|
+
For every claim landing in a file, answer five questions:
|
|
28
22
|
|
|
29
|
-
|
|
23
|
+
1. Earned specificity — does it trace to `authorization=witnessed`, or is it inflated from a weak prior?
|
|
24
|
+
2. Repair legality — is a local patch dressed as structural repair? Downgrade scope or regress to PLAN.
|
|
25
|
+
3. Lawful downgrade — can a weaker, true statement replace it? Prefer the downgrade.
|
|
26
|
+
4. Alternative-route suppression — is a live competing route being silenced? Preserve it.
|
|
27
|
+
5. Strongest objection — what would the sharpest reviewer pushback be? Articulate it. Cannot articulate = have not understood the alternatives → `gm-execute`.
|
|
28
|
+
|
|
29
|
+
Any failure regresses to `gm-execute` to witness what was missing, or `planning` if the gap is structural.
|
|
30
|
+
|
|
31
|
+
## Pre-emit run
|
|
32
|
+
|
|
33
|
+
Mandatory before writing any file.
|
|
30
34
|
|
|
31
35
|
```
|
|
32
36
|
exec:nodejs
|
|
@@ -34,58 +38,29 @@ const { fn } = await import('/abs/path/to/module.js');
|
|
|
34
38
|
console.log(await fn(realInput));
|
|
35
39
|
```
|
|
36
40
|
|
|
37
|
-
|
|
38
|
-
2. Run proposed logic in isolation WITHOUT writing — witness with real inputs
|
|
39
|
-
3. Probe failure paths with real error inputs
|
|
40
|
-
4. Compare: matches expected → write. Unexpected → new unknown → `planning`.
|
|
41
|
+
Import the actual module from disk to witness current behavior as the baseline. Run the proposed logic in isolation without writing — witness with real inputs and with real error inputs. Match expected → write. Unexpected → new unknown → `planning`.
|
|
41
42
|
|
|
42
|
-
##
|
|
43
|
+
## Writing
|
|
43
44
|
|
|
44
|
-
`exec:nodejs` with `require('fs')`. Write only when every gate mutable
|
|
45
|
+
`exec:nodejs` with `require('fs')`. Write only when every gate mutable resolves simultaneously.
|
|
45
46
|
|
|
46
|
-
##
|
|
47
|
+
## Post-emit verification
|
|
47
48
|
|
|
48
|
-
|
|
49
|
-
2. Run identical inputs as pre-emit — must match pre-emit baseline exactly
|
|
50
|
-
3. Known variance → fix immediately, re-verify (EMIT self-loop)
|
|
51
|
-
4. Unknown variance → new unknown → invoke `planning`
|
|
49
|
+
Re-import from disk — in-memory state is stale and inadmissible. Run identical inputs as pre-emit; output must match the baseline exactly. Known variance → fix and re-verify (self-loop). Unknown variance → `planning`.
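The three outcomes of post-emit verification can be sketched as a small classifier. This is an assumption-laden illustration: `postEmitVerdict` and the `causeKnown` flag are hypothetical, and comparing JSON text is only one way to interpret "match exactly" (it is sensitive to key order and requires JSON-serializable output).

```javascript
// Classify post-emit output against the pre-emit baseline.
// Comparison is on JSON text, so key order matters and values must be serializable.
function postEmitVerdict(baseline, postEmit, causeKnown) {
  const same = JSON.stringify(baseline) === JSON.stringify(postEmit);
  if (same) return 'match';
  // Known variance stays in EMIT (self-loop); unknown variance is a new unknown.
  return causeKnown ? 'self-loop' : 'planning';
}
```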
|
|
52
50
|
|
|
53
|
-
##
|
|
51
|
+
## Gate (all true at once)
|
|
54
52
|
|
|
55
|
-
- Legitimacy gate passed;
|
|
56
|
-
- Pre-emit passed with real inputs
|
|
53
|
+
- Legitimacy gate passed; no refused collapse
|
|
54
|
+
- Pre-emit passed with real inputs and real error inputs
|
|
57
55
|
- Post-emit matches pre-emit exactly
|
|
58
|
-
- Hot
|
|
59
|
-
- No mocks
|
|
60
|
-
-
|
|
61
|
-
-
|
|
62
|
-
- Files ≤200 lines
|
|
63
|
-
- No duplicate concern (run exec:codesearch for primary concern after writing;
|
|
64
|
-
- No comments
|
|
65
|
-
- Observability: new server subsystems expose `/debug/<subsystem>`; new client modules in `window.__debug`
|
|
66
|
-
- Structure: no if/else where dispatch
|
|
67
|
-
-
|
|
68
|
-
- CHANGELOG.md updated; TODO.md cleared
|
|
69
|
-
|
|
70
|
-
## FIX ON SIGHT — HARD RULE
|
|
71
|
-
|
|
72
|
-
Pre-emit run, post-emit run, or legitimacy gate surfaces ANY issue (failing assertion, stderr, type/lint error, unexpected variance, broken import, runtime throw) → fix at root cause this turn, re-run pre-emit AND post-emit, advance only when all gates pass simultaneously. Never write-and-promise-fix-later, never `try/catch`-to-hide, never `.skip`, never silence with redirection. Known variance → fix and re-verify (self-loop). Unknown variance → regress to `planning`.
|
|
73
|
-
|
|
74
|
-
## CODE EXECUTION
|
|
75
|
-
|
|
76
|
-
`exec:<lang>` only. File writes via exec:nodejs + require('fs'). Never Bash(node/npm/npx/bun).
|
|
77
|
-
Pack runs: Promise.allSettled, each idea own try/catch, under 12s per call.
|
|
78
|
-
|
|
79
|
-
## CODEBASE SEARCH
|
|
80
|
-
|
|
81
|
-
`exec:codesearch` only. Grep/Glob/Find/Explore = hook-blocked. Known path → `Read`.
|
|
82
|
-
|
|
83
|
-
## MEMORIZE
|
|
84
|
-
|
|
85
|
-
```
|
|
86
|
-
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
|
|
87
|
-
```
|
|
88
|
-
|
|
89
|
-
Same turn as resolution. Parallel when multiple. End-of-turn self-check mandatory.
|
|
90
|
-
|
|
91
|
-
**Never**: write before pre-emit | advance with post-emit variance | absorb surprises | respond to user mid-phase
|
|
56
|
+
- Hot-reloadable; errors throw with context (no `|| default`, no `catch { return null }`, no fallbacks)
|
|
57
|
+
- No mocks, fakes, stubs, or scattered test files (delete on discovery)
|
|
58
|
+
- Any behavior change has a corresponding assertion in `test.js` — a change no test catches is a change you cannot prove
|
|
59
|
+
- Browser-facing change → post-emit verify includes a live `exec:browser` witness (boot server → `page.goto` → `page.evaluate` asserting the invariant the change established). Node-side import + test.js does not satisfy this — the final gate runs again in `gm-complete`.
|
|
60
|
+
- Files ≤ 200 lines
|
|
61
|
+
- No duplicate concern (run `exec:codesearch` for the primary concern after writing; overlap → `planning`)
|
|
62
|
+
- No comments, no hardcoded values, no adjectives in identifiers, no unnecessary files
|
|
63
|
+
- Observability: new server subsystems expose `/debug/<subsystem>`; new client modules register in `window.__debug`
|
|
64
|
+
- Structure: no if/else where dispatch suffices; no one-liners that obscure; no reinvented APIs
|
|
65
|
+
- Every fact resolved this phase memorized via background `Agent(memorize)`
|
|
66
|
+
- CHANGELOG.md updated; TODO.md cleared or deleted
|
|
@@ -3,70 +3,61 @@ name: gm-execute
|
|
|
3
3
|
description: EXECUTE phase AND the foundational execution contract for every skill. Every exec:<lang> run, every witnessed check, every code search, in every phase, follows this skill's discipline. Resolve all mutables via witnessed execution. Any new unknown triggers immediate snake back to planning — restart chain from PLAN.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# GM EXECUTE — Resolve
|
|
6
|
+
# GM EXECUTE — Resolve every unknown by witness
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
Entry: `.prd` with named unknowns. Exit: every mutable KNOWN → invoke `gm-emit`.
|
|
9
9
|
|
|
10
|
-
This skill
|
|
10
|
+
This skill is the execution contract for ALL phases — pre-emit witnesses, post-emit verifies, e2e checks all run on this discipline. Cross-cutting dispositions live in `gm` SKILL.md.
|
|
11
11
|
|
|
12
|
-
##
|
|
12
|
+
## Transitions
|
|
13
13
|
|
|
14
|
-
-
|
|
15
|
-
-
|
|
16
|
-
-
|
|
14
|
+
- All mutables KNOWN → `gm-emit`
|
|
15
|
+
- Still UNKNOWN → re-run from a different angle (max 2 passes)
|
|
16
|
+
- New unknown OR unresolvable after 2 passes → `planning`
|
|
17
17
|
|
|
18
|
-
##
|
|
18
|
+
## Mutable discipline
|
|
19
19
|
|
|
20
|
-
Each mutable: name
|
|
20
|
+
Each mutable carries: name, expected, current, resolution method.
|
|
21
21
|
|
|
22
|
-
Resolves to KNOWN only when
|
|
23
|
-
- **ΔS=0** — witnessed output equals expected
|
|
24
|
-
- **λ≥2** — two independent paths agree
|
|
25
|
-
- **ε intact** — adjacent invariants hold
|
|
26
|
-
- **Coverage≥0.70** — enough corpus inspected
|
|
27
|
-
|
|
28
|
-
Unresolved after 2 passes = regress to `planning`. Never narrate past an unresolved mutable.
|
|
29
|
-
|
|
30
|
-
## PRIORS DON'T AUTHORIZE
|
|
31
|
-
|
|
32
|
-
Route candidates from PLAN = `weak_prior` only. Plausibility = right to TEST, not BELIEVE.
|
|
33
|
-
weak_prior → witnessed probe → witnessed → feed to EMIT. "The plan says" / "obviously X" = prior, not fact.
|
|
22
|
+
Resolves to KNOWN only when all four pass:
|
|
34
23
|
|
|
35
|
-
|
|
24
|
+
- **ΔS = 0** — witnessed output equals expected
|
|
25
|
+
- **λ ≥ 2** — two independent paths agree
|
|
26
|
+
- **ε intact** — adjacent invariants hold
|
|
27
|
+
- **Coverage ≥ 0.70** — enough corpus inspected to rule out contradiction
|
|
36
28
|
|
|
37
|
-
|
|
29
|
+
Unresolved after 2 passes regresses to `planning`. Never narrate past an unresolved mutable.
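The four conditions and the two-pass limit can be sketched as a pair of predicates. The thresholds are the ones listed above; the field names (`deltaS`, `agreeingPaths`, `invariantsHold`, `coverage`) and the function names are hypothetical, chosen only for this illustration.

```javascript
// A mutable resolves to KNOWN only when all four conditions hold together.
function resolvesToKnown({ deltaS, agreeingPaths, invariantsHold, coverage }) {
  return (
    deltaS === 0 &&            // witnessed output equals expected
    agreeingPaths >= 2 &&      // two independent paths agree
    invariantsHold === true && // adjacent invariants hold
    coverage >= 0.7            // enough corpus inspected
  );
}

// passesUsed counts completed resolution attempts for this mutable.
function nextStep(mutable, passesUsed) {
  if (resolvesToKnown(mutable)) return 'KNOWN';
  return passesUsed >= 2 ? 'planning' : 'retry-from-different-angle';
}
```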
|
|
38
30
|
|
|
39
|
-
|
|
31
|
+
Route candidates from PLAN are `weak_prior` only. Plausibility is the right to test, not the right to believe. A claim with no witness in the current session is a hypothesis — say so when stating it, and say what would settle it. The next reader (you, next turn) needs to know which lines were earned and which were carried forward.
|
|
40
32
|
|
|
41
|
-
##
|
|
33
|
+
## Verification budget
|
|
42
34
|
|
|
43
|
-
`
|
|
35
|
+
Spend on `.prd` items in descending order of consequence-if-wrong × distance-from-witnessed. Items whose failure would collapse the headline finding must reach witnessed status before EMIT; sub-argument-level items need at minimum a stated fallback path.
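One way to read the ordering rule is as a sort on the product of two scores. A minimal sketch, assuming each item carries hypothetical numeric `consequence` and `distance` fields (for instance on a 0 to 1 scale); the source does not define how either quantity is measured.

```javascript
// Order .prd items by consequence-if-wrong times distance-from-witnessed, highest first.
// Both fields are hypothetical scores assigned by whoever plans the verification.
function budgetOrder(items) {
  // Copy before sorting so the caller's array is left untouched.
  return [...items].sort(
    (a, b) => b.consequence * b.distance - a.consequence * a.distance
  );
}
```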
|
|
44
36
|
|
|
45
|
-
|
|
37
|
+
## Code execution
|
|
46
38
|
|
|
47
|
-
File I/O
|
|
39
|
+
`exec:<lang>` only via Bash. Languages: nodejs (default), bash, python, typescript, go, rust, c, cpp, java, deno, cmd. File I/O via `exec:nodejs` + `require('fs')`. Git directly in Bash. Never `Bash(node/npm/npx/bun)`.
|
|
48
40
|
|
|
49
|
-
Pack runs: Promise.allSettled
|
|
50
|
-
Runner: `exec:runner\nstart|stop|status`
|
|
41
|
+
Pack runs: `Promise.allSettled`, each idea own try/catch, under 12s per call. Runner: `exec:runner\n{start|stop|status}`.
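A pack run along these lines might look like the sketch below. It is an illustration under assumptions: `runPack` and `withTimeout` are hypothetical helpers, the per-probe cap is one way of keeping the whole call inside the budget, and `Promise.allSettled` plays the role of the per-idea try/catch by isolating each failure.

```javascript
// Reject if the wrapped promise has not settled within ms milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// probes: an object mapping a hypothetical probe name to a function to run.
async function runPack(probes, ms = 12000) {
  const names = Object.keys(probes);
  const settled = await Promise.allSettled(
    // Promise.resolve().then(fn) turns a synchronous throw into a rejection.
    names.map((name) => withTimeout(Promise.resolve().then(probes[name]), ms))
  );
  return names.map((name, i) => {
    const result = settled[i];
    return result.status === 'fulfilled'
      ? { name, ok: true, detail: result.value }
      : { name, ok: false, detail: String(result.reason) };
  });
}
```

Every probe reports its own outcome, so one failing idea does not hide the results of the others.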
|
|
51
42
|
|
|
52
|
-
Every exec daemonizes. The hook tails the task logfile up to 30s wall-clock and returns whatever
|
|
43
|
+
Every exec daemonizes. The hook tails the task logfile up to 30s wall-clock and returns whatever is there — short tasks complete inside the window and look synchronous; long tasks return a task_id with partial output. Continue with `exec:tail` (drain, bounded), `exec:watch` (resume blocking until match or timeout), or `exec:close` (terminate). Never re-spawn a long task to check on it — that orphans the first one. `exec:wait` is a pure timer; `exec:sleep` blocks on a specific task's output; `exec:watch` is the match-or-timeout primitive. Every execution-platform RPC returns the live list of running tasks for this session — close stragglers via `exec:close\n<id>` so the list stays scannable. Session-end (clear/logout/prompt_input_exit) kills the session's tasks; compaction/handoff preserves them.
|
|
53
44
|
|
|
54
|
-
|
|
45
|
+
Utility verbs (`exec:wait`, `exec:sleep`, `exec:status`, `exec:close`, `exec:pause`, `exec:type`, `exec:runner`, `exec:kill-port`, `exec:recall`, `exec:memorize`, `exec:forget`) take their argument on the next line. Inline form (`exec:status <id>`) is denied by the hook.
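The next-line rule can be illustrated with a small checker. This sketches the rule as stated, not the hook's actual implementation: `checkUtilityForm` and its return labels are hypothetical, and only the verb list is taken from the text above.

```javascript
// Utility verbs listed in the text above.
const UTILITY_VERBS = new Set([
  'wait', 'sleep', 'status', 'close', 'pause', 'type',
  'runner', 'kill-port', 'recall', 'memorize', 'forget',
]);

// block: the full text of one exec block, first line included.
function checkUtilityForm(block) {
  const firstLine = block.split('\n')[0];
  const match = /^exec:([a-z-]+)(.*)$/.exec(firstLine);
  if (!match || !UTILITY_VERBS.has(match[1])) return 'not-a-utility-verb';
  // Anything after the verb on the same line is the inline form the hook denies.
  return match[2].trim() === '' ? 'ok' : 'denied-inline-argument';
}
```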
|
|
55
46
|
|
|
56
|
-
|
|
47
|
+
## Codebase search
|
|
57
48
|
|
|
58
|
-
|
|
49
|
+
`exec:codesearch` only. Grep, Glob, Find, Explore, raw grep/rg/find inside `exec:bash` are all hook-blocked.
|
|
59
50
|
|
|
60
51
|
```
|
|
61
52
|
exec:codesearch
|
|
62
53
|
<two-word query>
|
|
63
54
|
```
|
|
64
55
|
|
|
65
|
-
|
|
56
|
+
Start two words, change/add one per pass, minimum four attempts before concluding absent. Known absolute path → `Read`. Known directory → `exec:nodejs` + `fs.readdirSync`.
|
|
66
57
|
|
|
67
|
-
##
|
|
58
|
+
## Import-based execution
|
|
68
59
|
|
|
69
|
-
|
|
60
|
+
Hypotheses become real by importing actual modules from disk. Reimplemented behavior is UNKNOWN.
|
|
70
61
|
|
|
71
62
|
```
|
|
72
63
|
exec:nodejs
|
|
@@ -74,76 +65,16 @@ const { fn } = await import('/abs/path/to/module.js');
|
|
|
74
65
|
console.log(await fn(realInput));
|
|
75
66
|
```
|
|
76
67
|
|
|
77
|
-
Differential diagnosis: smallest reproduction → compare actual vs expected → name the delta
|
|
78
|
-
|
|
79
|
-
## CI — AUTOMATED
|
|
80
|
-
|
|
81
|
-
`git push` → Stop hook auto-watches Actions for pushed HEAD. Same-repo only — downstream cascades not auto-watched.
|
|
82
|
-
- Green → Stop approves with summary
|
|
83
|
-
- Failure → run names+IDs → `gh run view <id> --log-failed`
|
|
84
|
-
- Deadline 180s (override `GM_CI_WATCH_SECS`)
|
|
85
|
-
|
|
86
|
-
## GROUND TRUTH
|
|
87
|
-
|
|
88
|
-
Real services, real data, real timing. Mocks/stubs/scattered tests/fallbacks = delete.
|
|
89
|
-
|
|
90
|
-
**Scan before edit**: exec:codesearch before creating/modifying. Duplicate concern = regress to `planning`.
|
|
91
|
-
**Hypothesize via execution**: hypothesis → run → witness → edit. Never edit on unwitnessed assumption.
|
|
92
|
-
**Code quality**: native → library → structure (map/pipeline) → write.
|
|
93
|
-
|
|
94
|
-
## PARALLEL SUBAGENTS
|
|
95
|
-
|
|
96
|
-
≤3 `gm:gm` subagents for independent items in ONE message. Browser escalation: exec:browser → browser skill → screenshot last resort.
|
|
97
|
-
|
|
98
|
-
## RECALL — HARD RULE
|
|
99
|
-
|
|
100
|
-
Before resolving any new unknown via fresh execution, recall first.
|
|
101
|
-
|
|
102
|
-
```
|
|
103
|
-
exec:recall
|
|
104
|
-
<2-6 word query>
|
|
105
|
-
```
|
|
106
|
-
|
|
107
|
-
Triggers: "did we hit this" | feels familiar | new sub-task in known project | about to comment a non-obvious choice | about to ask user something likely discussed.
|
|
108
|
-
|
|
109
|
-
Hits = weak_prior; still witness. Empty = proceed. Capped 6s, ~5ms when serve running. ~200 tokens / 5 hits.
|
|
110
|
-
|
|
111
|
-
## MEMORIZE — HARD RULE
|
|
112
|
-
|
|
113
|
-
Unknown→known = same-turn memorize.
|
|
114
|
-
|
|
115
|
-
```
|
|
116
|
-
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
|
|
117
|
-
```
|
|
118
|
-
|
|
119
|
-
Triggers: exec output answers prior unknown | CI log reveals root cause | code read confirms/refutes | env quirk | user states preference/constraint.
|
|
120
|
-
|
|
121
|
-
N facts → N parallel Agent calls in ONE message. End-of-turn self-check mandatory.
|
|
122
|
-
|
|
123
|
-
## FIX ON SIGHT — HARD RULE
|
|
124
|
-
|
|
125
|
-
Issue surfaced mid-execution (failing test, exec stderr, broken import, runtime exception, lint/type error, deprecation warning, unexpected output) is fixed THIS turn, at root cause, in-band. Never `// TODO`, never `try/catch`-to-swallow, never `2>/dev/null`, never `.skip`, never "out of scope" inside the same file. Re-witness after fix. New unknown surfaced by the fix → regress to `planning`. Genuine out-of-scope → write a `.gm/prd.yml` item before continuing.
|
|
126
|
-
|
|
127
|
-
**Incidental errors auto-plan**: a reasonably-fixable issue that is *not* what the user asked about — pre-existing build break, lockfile drift, broken dep feature, dead import in adjacent module, missing artifact, neighboring lint failure — still belongs to the agent. Add it to `.gm/prd.yml` the same turn it surfaces and execute it before COMPLETE. Do not ask the user; do not narrate past it; do not file it as "next session." Only errors that genuinely need user credentials, decisions, or external services that are down are name-and-stop, recorded with `blockedBy: external`.
|
|
128
|
-
|
|
129
|
-
**Obvious re-architecting auto-plans**: same discipline for clear refactor wins surfaced mid-task — code competing with an existing library/package that does the same thing, multi-file ad-hoc logic one import would replace, duplicated logic asking for one helper. Regress to `planning`, add the item, execute. Bar is *obvious + reachable from this session*; speculative refactors stay out.
|
|
130
|
-
|
|
131
|
-
**Cross-session PRD**: items in `.gm/prd.yml` from prior sessions are this session's work the moment they're discovered. Finish every item in the file before COMPLETE — including ones the current user message did not mention. "From another session" is not an exemption.
|
|
132
|
-
|
|
133
|
-
## BROWSER WITNESS — HARD RULE
|
|
68
|
+
Differential diagnosis: smallest reproduction → compare actual vs expected → name the delta — that delta is the mutable.
|
|
134
69
|
|
|
135
|
-
|
|
70
|
+
## Edits depend on witnesses
|
|
136
71
|
|
|
137
|
-
|
|
138
|
-
1. Boot server / open page → HTTP 200 witnessed
|
|
139
|
-
2. `exec:browser` → `page.goto(url)` → poll the affected global (`window.__app.<system>`, `window.__debug.<module>`)
|
|
140
|
-
3. `page.evaluate` asserting the specific invariant the change establishes — capture numbers
|
|
141
|
-
4. Variance → fix at root cause, re-witness (FIX ON SIGHT). Never advance to EMIT on unwitnessed client behavior.
|
|
72
|
+
Hypothesis → run → witness → edit. An edit before a witness is a guess. Scan via `exec:codesearch` before creating or modifying — duplicate concern regresses to `planning`. Code-quality preference: native → library → structure → write.
|
|
142
73
|
|
|
143
|
-
|
|
74
|
+
## Parallel subagents
|
|
144
75
|
|
|
145
|
-
|
|
76
|
+
Up to 3 `gm:gm` subagents for independent items in one message. Browser escalation: `exec:browser` → `browser` skill → screenshot only as last resort.
|
|
146
77
|
|
|
147
|
-
|
|
78
|
+
## CI is automated
|
|
148
79
|
|
|
149
|
-
|
|
80
|
+
`git push` triggers the Stop hook to watch Actions for the pushed HEAD on the same repo (downstream cascades are not auto-watched). Green → Stop approves with summary; failure → run names + IDs surfaced, investigate via `gh run view <id> --log-failed`. Deadline 180s (override `GM_CI_WATCH_SECS`).
|
|
@@ -3,24 +3,25 @@ name: governance
|
|
|
3
3
|
description: Governance reference invoked by PLAN/EXECUTE/EMIT/VERIFY. Separates route discovery (PLAN) from weak-prior handoff (EXECUTE) from earned-emission legitimacy (EMIT/VERIFY). Encodes 16-failure taxonomy, 4 state planes, ΔS/λ/ε/Coverage metrics, governance stress suite.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# Governance — Route,
|
|
6
|
+
# Governance — Route, bridge, legitimacy
|
|
7
7
|
|
|
8
|
-
Three roles, three failure surfaces
|
|
9
|
-
1. **Route discovery** — what family of fault? Owned by `planning`.
|
|
10
|
-
2. **Weak-prior bridge** — plausibility ≠ authorization. Owned by `gm-execute`.
|
|
11
|
-
3. **Legitimacy gate** — did this answer earn its strength? Owned by `gm-emit`/`gm-complete`.
|
|
8
|
+
Three roles, three failure surfaces.
|
|
12
9
|
|
|
13
|
-
|
|
10
|
+
1. Route discovery — what family of fault? Owned by `planning`.
|
|
11
|
+
2. Weak-prior bridge — plausibility is not authorization. Owned by `gm-execute`.
|
|
12
|
+
3. Legitimacy gate — did this answer earn its strength? Owned by `gm-emit` and `gm-complete`.
|
|
14
13
|
|
|
15
|
-
|
|
16
|
-
|
|
14
|
+
## Five refused collapses
|
|
15
|
+
|
|
16
|
+
1. Route → authorization ("plan looks good" treated as "code is right")
|
|
17
|
+
2. Candidate → structural repair (local patch shipped as architectural fix)
|
|
17
18
|
3. Hidden → public law (internal convenience shipped as contract)
|
|
18
|
-
4. Cleanliness → legitimacy (compiles
|
|
19
|
+
4. Cleanliness → legitimacy (compiles treated as evidence-supports)
|
|
19
20
|
5. One strong route → universal closure (best answer treated as only answer)
|
|
20
21
|
|
|
21
|
-
When in doubt: preserve ambiguity. Lawful downgrade beats forced closure.
|
|
22
|
+
When in doubt, preserve ambiguity. Lawful downgrade beats forced closure.
|
|
22
23
|
|
|
23
|
-
## 7
|
|
24
|
+
## 7 route families
|
|
24
25
|
|
|
25
26
|
| Family | What breaks | Repair |
|
|
26
27
|
|---|---|---|
|
|
@@ -32,7 +33,7 @@ When in doubt: preserve ambiguity. Lawful downgrade beats forced closure.
|
|
|
32
33
|
| boundary | Interfaces, contracts, seams | Re-assert contract from one source |
|
|
33
34
|
| representation | Data shape, schema, type | Make illegal states unrepresentable |
|
|
34
35
|
|
|
35
|
-
## 16
|
|
36
|
+
## 16 failure modes
|
|
36
37
|
|
|
37
38
|
| # | Name | Family |
|
|
38
39
|
|---|---|---|
|
|
@@ -53,7 +54,7 @@ When in doubt: preserve ambiguity. Lawful downgrade beats forced closure.
|
|
|
53
54
|
| 15 | Deployment deadlock | execution |
|
|
54
55
|
| 16 | Pre-deploy collapse | execution |
|
|
55
56
|
|
|
56
|
-
## 4
|
|
57
|
+
## 4 state planes
|
|
57
58
|
|
|
58
59
|
| Plane | Owner | States | Implication |
|
|
59
60
|
|---|---|---|---|
|
|
@@ -62,18 +63,18 @@ When in doubt: preserve ambiguity. Lawful downgrade beats forced closure.
|
|
|
62
63
|
| repair_legality | gm-emit | unverified → local_candidate → structural | Local cannot ship as structural |
|
|
63
64
|
| hidden_decision_posture | gm-complete | open → down_weighted → closed | Close only after CI green |
|
|
64
65
|
|
|
65
|
-
## Quality
|
|
66
|
+
## Quality metrics
|
|
66
67
|
|
|
67
68
|
- **ΔS** — witnessed output equals expected. ΔS≠0 = still open.
|
|
68
|
-
-
|
|
69
|
+
- **λ ≥ 2** — two independent paths agree. λ=1 = still unknown.
|
|
69
70
|
- **ε** — adjacent invariants hold (types, tests, neighboring callers).
|
|
70
|
-
- **Coverage≥0.70** — enough corpus inspected to rule out contradicting evidence.
|
|
71
|
+
- **Coverage ≥ 0.70** — enough corpus inspected to rule out contradicting evidence.
|
|
71
72
|
|
|
72
|
-
All four
|
|
73
|
+
All four pass before a mutable flips UNKNOWN → KNOWN.
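The four-metric gate above can be sketched as a single predicate. This is an illustration under stated assumptions: the field names (`deltaS`, `lambda`, `epsilon`, `coverage`) and their types are hypothetical, chosen only to mirror the bullets; the skill file does not define a data shape.

```javascript
// Assumed shapes (not from the source):
//   deltaS   - number, 0 means witnessed output equals expected
//   lambda   - count of independent paths that agree
//   epsilon  - boolean, true when adjacent invariants hold
//   coverage - fraction of the corpus inspected, 0..1
function metricFailures({ deltaS, lambda, epsilon, coverage }) {
  const failures = [];
  if (deltaS !== 0) failures.push("ΔS");
  if (!(lambda >= 2)) failures.push("λ");
  if (epsilon !== true) failures.push("ε");
  if (!(coverage >= 0.7)) failures.push("Coverage");
  return failures;
}

// A mutable flips to KNOWN only when every metric passes.
function resolveState(metrics) {
  return metricFailures(metrics).length === 0 ? "KNOWN" : "UNKNOWN";
}
```

Returning the list of failing metrics, rather than a bare boolean, makes it visible which condition is keeping a mutable open.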
|
|
73
74
|
|
|
74
|
-
## Stress
|
|
75
|
+
## Stress suite
|
|
75
76
|
|
|
76
|
-
Run before declaring COMPLETE:
|
|
77
|
+
Run before declaring COMPLETE.
|
|
77
78
|
|
|
78
79
|
| # | Case | Failure if flunked |
|
|
79
80
|
|---|---|---|
|
|
@@ -86,11 +87,11 @@ Run before declaring COMPLETE:
|
|
|
86
87
|
| A1 | Authenticity eval partial signals | Surface appearance beats evidence |
|
|
87
88
|
| D1 | Deploy-gate under CI flake | Treats noise as green |
|
|
88
89
|
|
|
89
|
-
Legal: illegal_commitment=0
|
|
90
|
+
Legal: `illegal_commitment=0`, `evidence_boundary_violation=0`, `lawful_downgrade=available` in all 8, `outlier_visibility=preserved`.
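One way to read the legality line is as a per-case check over the eight stress cases. The sketch below assumes each case result is an object carrying all four fields plus an `id`; that record shape, and the function names, are hypothetical and not defined by the skill file.

```javascript
// The text says the conditions must hold "in all 8" cases.
const EXPECTED_CASES = 8;

// Assumption: every case reports all four fields listed in the legality line.
function isLegalCase(c) {
  return (
    c.illegal_commitment === 0 &&
    c.evidence_boundary_violation === 0 &&
    c.lawful_downgrade === "available" &&
    c.outlier_visibility === "preserved"
  );
}

// IDs of the cases that violate at least one condition.
function illegalCases(results) {
  return results.filter((c) => !isLegalCase(c)).map((c) => c.id);
}

// The suite is legal only if all eight cases are present and none is illegal.
function suiteIsLegal(results) {
  return results.length === EXPECTED_CASES && illegalCases(results).length === 0;
}
```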
|
|
90
91
|
|
|
91
|
-
## Phase
|
|
92
|
+
## Phase application
|
|
92
93
|
|
|
93
94
|
- **planning** — tag every `.prd` item with route family + failure-mode IDs
|
|
94
|
-
- **gm-execute** — weak prior only; witnessed probe
|
|
95
|
+
- **gm-execute** — weak prior only; witnessed probe before authorization
|
|
95
96
|
- **gm-emit** — legitimacy gate; unearned specificity → lawful downgrade
|
|
96
|
-
- **gm-complete** — stress-suite pass; close posture only CI green
|
|
97
|
+
- **gm-complete** — stress-suite pass; close posture only when CI is green
|