gm-thebird 2.0.1012

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,121 @@
1
+ ---
2
+ name: create-lang-plugin
3
+ description: Create a lang/ plugin that wires any CLI tool or language runtime into gm-cc — adds exec:<id> dispatch, optional LSP diagnostics, and optional prompt context injection. Zero hook configuration required.
4
+ ---
5
+
6
+ # Create lang plugin
7
+
8
+ Single CommonJS file at `<projectDir>/lang/<id>.js`. Auto-discovered — no hook editing.
9
+
10
+ ## Plugin shape
11
+
12
+ ```js
13
+ 'use strict';
14
+ module.exports = {
15
+ id: 'mytool',
16
+ exec: {
17
+ match: /^exec:mytool/,
18
+ run(code, cwd) { /* returns string or Promise<string> */ }
19
+ },
20
+ lsp: {
21
+ check(fileContent, cwd) { /* returns Diagnostic[] */ }
22
+ },
23
+ extensions: ['.ext'],
24
+ context: `=== mytool ===\n...`
25
+ };
26
+ ```
27
+
28
+ `type Diagnostic = { line: number; col: number; severity: 'error'|'warning'; message: string }`
29
+
30
+ `exec.run` runs in a child process, 30s timeout, async OK. Called when Claude writes `exec:mytool\n<code>`. `lsp.check` is synchronous-only, called per prompt-submit. `context` is injected into every prompt, truncated to 2000 chars.
31
+
32
+ ## Identify the tool
33
+
34
+ What is the CLI name or npm package? Does it run a single expression (`tool eval`, `tool -e`, HTTP POST) or a file (`tool run <file>`)? What is its lint/check mode and output format? File extensions? Does it require a running server, or does it run headless?
35
+
36
+ ## exec.run patterns
37
+
38
+ HTTP eval against a running server:
39
+
40
+ ```js
41
+ function httpPost(port, urlPath, body) {
42
+ return new Promise((resolve, reject) => {
43
+ const data = JSON.stringify(body);
44
+ const req = http.request(
45
+ { hostname: '127.0.0.1', port, path: urlPath, method: 'POST',
46
+ headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(data) } },
47
+ res => { let raw = ''; res.on('data', c => raw += c); res.on('end', () => resolve(JSON.parse(raw))); }
48
+ );
49
+ req.setTimeout(8000, () => { req.destroy(); reject(new Error('timeout')); });
50
+ req.on('error', reject);
51
+ req.write(data); req.end();
52
+ });
53
+ }
54
+ ```
55
+
56
+ File-based, headless:
57
+
58
+ ```js
59
+ function runFile(code, cwd) {
60
+ const tmp = path.join(os.tmpdir(), `plugin_${Date.now()}.ext`);
61
+ fs.writeFileSync(tmp, code);
62
+ try { return execFileSync('mytool', ['run', tmp], { cwd, encoding: 'utf8', timeout: 10000 }); }
63
+ finally { try { fs.unlinkSync(tmp); } catch (_) {} }
64
+ }
65
+ ```
66
+
67
+ Single-expression detection:
68
+
69
+ ```js
70
+ const isSingleExpr = code => !code.trim().includes('\n') && !/\b(func|def|fn |class|import)\b/.test(code);
71
+ ```
72
+
73
+ ## lsp.check
74
+
75
+ ```js
76
+ function check(fileContent, cwd) {
77
+ const tmp = path.join(os.tmpdir(), `lsp_${Math.random().toString(36).slice(2)}.ext`);
78
+ try {
79
+ fs.writeFileSync(tmp, fileContent);
80
+ const r = spawnSync('mytool', ['check', tmp], { encoding: 'utf8', cwd });
81
+ return (r.stdout + r.stderr).split('\n').reduce((acc, line) => {
82
+ const m = line.match(/^.+:(\d+):(\d+):\s+(error|warning):\s+(.+)$/);
83
+ if (m) acc.push({ line: +m[1], col: +m[2], severity: m[3], message: m[4].trim() });
84
+ return acc;
85
+ }, []);
86
+ } catch (_) { return []; }
87
+ finally { try { fs.unlinkSync(tmp); } catch (_) {} }
88
+ }
89
+ ```
90
+
91
+ ## context
92
+
93
+ Under 300 chars:
94
+
95
+ ```js
96
+ context: `=== mytool ===\nexec:mytool\n<expression>\n\nRuns via <how>. Use for <when>.`
97
+ ```
98
+
99
+ ## Verify
100
+
101
+ ```
102
+ exec:nodejs
103
+ const p = require('/abs/path/lang/mytool.js');
104
+ console.log(p.id, typeof p.exec.run, p.exec.match.toString());
105
+ ```
106
+
107
+ Then test dispatch:
108
+
109
+ ```
110
+ exec:mytool
111
+ <simple test expression>
112
+ ```
113
+
114
+ ## Constraints
115
+
116
+ - `exec.run` async OK, 30s timeout
117
+ - `lsp.check` synchronous only — no Promises
118
+ - CommonJS only — no ES module syntax
119
+ - No persistent processes
120
+ - `id` must match filename exactly
121
+ - First match wins — keep `match` specific
@@ -0,0 +1,69 @@
1
+ ---
2
+ name: gm
3
+ description: Agent (not skill) - immutable programming state machine. Always invoke for all work coordination.
4
+ ---
5
+
6
+ # GM — Orchestrator
7
+
8
+ Invoke `planning` immediately. Skill tool only.
9
+
10
+ Phases: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS. Each loaded by Skill invocation; reading the summary is not being in the phase.
11
+
12
+ ## The world the answer lives in
13
+
14
+ The user's request is the authorization. The PRD records it. Doubts during execution resolve by witnessed probe, by recall, or by re-reading the PRD. Questions back to the user only when the next action is destructive-irreversible AND uncovered by the PRD, or when intent is genuinely irrecoverable from PRD, memory, code, and the public web. `exec:pause` is the channel; in-conversation asking is beneath that. Web-search before pausing on anything the public web could plausibly answer.
15
+
16
+ The obvious read of "deeply integrate", "all of them", "every X", "across the whole Y" is wider, not narrower. Pick the maximum reachable shape, declare the read in one line so the user can interrupt, execute. Multi-repo scope, build cost, CI duration, binary-size impact are never grounds to re-confirm. When scope exceeds reach, write every witnessable subset into the PRD as separate items and finish them all. Residuals within the spirit of the ask and reachable from this session re-enter `planning`, append PRD items, and execute — silently-but-declared, never name-and-stop, never asked back. The only name-and-stop residual is one that is genuinely outside the spirit of the ask OR genuinely unreachable from this session; everything else is this turn's work. Before declaring done, scan once more: any reachable in-spirit residual found means re-enter PLAN, not stop.
17
+
18
+ When a PRD holds remaining items, do every reachable one — never offer the user a numbered choice between strategies, never serialize "approach A then approach B then approach C" through the user's inbox. Independent items run as parallel `gm:gm` subagents in one message; sequential items execute back-to-back without re-asking permission between them. The user's authorization for the PRD is authorization for every item in it. A response that asks "1, 2, 3, or 4?" when the PRD is non-empty is the failure this rule guards against — pick the obvious reading, declare it in one line, execute all of them.
19
+
20
+ What ships runs. Stubs, mocks, placeholder returns, fixture-only paths, demo-mode short-circuits, and "TODO: implement" bodies are forbidden in shipped code — they ship green checks that lie. A shim is allowed only when it delegates to real upstream behavior; before adding one, check whether a published library already covers the surface, because local reimplementations drift and age. The behavioral test for fakeness: real input through real code into real output, witnessed. Anything less is provisional.
21
+
22
+ CI is the build. For Rust crates in this org (rs-exec, rs-codeinsight, rs-search, rs-learn, rs-plugkit) and the gm publish chain, `git push` triggers the build matrix; `cargo build` and `cargo test` are not run locally. Local toolchain mismatches, missing deps, or rustc version skew never block a push — push, watch CI via the Stop hook, fix on red. "I cannot witness without a local build" is wrong here: the witness is the green CI run on the pushed HEAD, and the cascade fans the green binary to all 12 downstream platform repos. Pausing for a local build wall is forced closure dressed as caution.
23
+
24
+ Every issue surfaced during work is fixed in-band, this turn, at root cause. Pre-existing build breaks, neighboring lint failures, lockfile drift, broken deps, and stale generated files surfaced while doing the user's task become new PRD items the same turn and finish before COMPLETE. Same rule for obvious refactor wins: hand-rolled code that an existing library covers, multi-file ad-hoc systems one import would replace. The bar is *obvious + reachable from this session*. Items left in `.gm/prd.yml` from prior sessions are this session's work the moment they're seen.
25
+
26
+ Editing browser-facing code requires `exec:browser` witness in the same turn — boot the surface, navigate, assert the specific invariant via `page.evaluate`, capture the numbers. EXECUTE witnesses on edit, EMIT re-witnesses post-write, VERIFY runs the final gate. The exemption (pure-prose static document with no behavior change) is tagged in the response with the reason.
27
+
28
+ Code does mechanics; meaning goes through the textprocessing skill. Summarize, classify, extract intent, rewrite, translate, semantic dedup, rank, label, decide-if-two-texts-mean-the-same — all routed through `Agent(subagent_type='gm:textprocessing', model='haiku', ...)`, N items in N parallel calls. A keyword-list or regex-on-meaning-phrases loop deciding semantic questions is a stub of this skill.
29
+
30
+ Every program emits structured JSONL to `~/.claude/gm-log/<date>/<subsystem>.jsonl`. Inspect via `plugkit log {tail|grep|stats}` before re-running with print debugging. Code the agent writes extends the project's existing observability surface; if none exists, the smallest correct shim is one JSONL appender to `.gm/log/`. Emit on state transitions, error boundaries, external IO, nontrivial decisions; skip loop bodies and parser steps.
31
+
32
+ ## Recall and memorize
33
+
34
+ Before resolving any unknown via fresh execution, recall first.
35
+
36
+ ```
37
+ exec:recall
38
+ <2-6 word query>
39
+ ```
40
+
41
+ Hits arrive as `weak_prior` — they earn the right to be tested, not believed. Empty results confirm the unknown is fresh.
42
+
43
+ A witness that flips an unknown to known is incomplete until the fact lives outside this context window. Memorize is the back-half of the same act, not a later chore — it fires alongside the witness, in the background, in haiku, never blocking the next probe. Resolutions hand off as they happen.
44
+
45
+ ```
46
+ Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
47
+ ```
48
+
49
+ One subagent per fact, fan out in parallel — batching dilutes the signal. The mutables file is the receipt: every entry that flips to `status: witnessed` carries a matching haiku in flight. A status change without a memorize is unfinished work the next turn cannot see.
50
+
51
+ ## Execution order
52
+
53
+ The spool is the universal dispatch surface. Write a file to `.gm/exec-spool/in/<lang-or-verb>/<N>.<ext>`; the watcher executes and streams `out/<N>.out` + `out/<N>.err` + `out/<N>.json`. Languages: nodejs, python, bash, typescript, go, rust, c, cpp, java, deno. Verbs: codesearch, recall, memorize, wait, sleep, status, close, browser, runner, type, kill-port, forget, feedback, learn-status, learn-debug, learn-build, discipline, pause, health.
54
+
55
+ Order of cheapness:
56
+
57
+ 1. Recall — `in/recall/<N>.txt` with the query
58
+ 2. Codebase search — `in/codesearch/<N>.txt` with two-word query, 90% of lookups
59
+ 3. Code execution — `in/<lang>/<N>.<ext>`
60
+ 4. Web (`WebFetch`, `WebSearch`) — env facts not in codebase
61
+ 5. User — last resort
62
+
63
+ Bash accepts ONLY git commands directly (`git status`, `git commit`, `git push`, `git log`, `gh ...`). Everything else — code AND every utility verb — dispatches via the spool. Never `Bash(node/npm/npx/bun)`, never `Bash(exec:<anything>)`. `git push` triggers auto CI watch via Stop hook.
64
+
65
+ Skill chain: `planning` → `gm-execute` → `gm-emit` → `gm-complete` → `update-docs`.
66
+
67
+ ## Response shape
68
+
69
+ Terse. Fragments OK. `[thing] [action] [reason]. [next step].` Code, commits, PRs use normal prose. Drop terseness for security or destructive moves.
@@ -0,0 +1,106 @@
1
+ ---
2
+ name: gm-complete
3
+ description: VERIFY and COMPLETE phase. End-to-end system verification and git enforcement. Any new unknown triggers immediate snake back to planning — restart chain.
4
+ ---
5
+
6
+ # GM COMPLETE — Verify, then close
7
+
8
+ Entry: EMIT gates clear, from `gm-emit`. Exit: `.prd` deleted + test.js green + pushed + CI green → `update-docs`.
9
+
10
+ Cross-cutting dispositions live in `gm` SKILL.md.
11
+
12
+ ## Transitions
13
+
14
+ - `.prd` items remain → `gm-execute`
15
+ - `.prd` empty AND test.js green AND pushed AND CI green → `update-docs`
16
+ - Broken file output → `gm-emit`
17
+ - Wrong logic → `gm-execute`
18
+ - New unknown or wrong requirements → `planning`
19
+
20
+ Failure triage: broken output to EMIT, wrong logic to EXECUTE, new unknown to PLAN. Never patch around surprises.
21
+
22
+ ## Mutables that must resolve before COMPLETE
23
+
24
+ - `witnessed_e2e` — real end-to-end run with witnessed output
25
+ - `browser_validated` — for any change touching client / UI / browser-facing code, see gate below. test.js + node-side imports DO NOT satisfy this gate.
26
+ - `git_clean` — `git status --porcelain` returns empty
27
+ - `git_pushed` — `git log origin/main..HEAD --oneline` returns empty
28
+ - `ci_passed` — every GitHub Actions run reaches `conclusion: success`
29
+ - `mutables_resolved` — `.gm/mutables.yml` deleted OR every entry `status: witnessed`. Stop hook hard-blocks turn-stop while any entry is `status: unknown`.
30
+ - `prd_empty` — `.gm/prd.yml` deleted AFTER residual scan: enumerate every in-spirit reachable residual surfaced this session; any hit re-enters `planning`, appends PRD items, executes. Empty PRD is necessary, not sufficient — done = empty PRD AND zero reachable in-spirit residuals. Out-of-spirit-or-unreachable residuals are named in the response and skipped; everything else is this turn's work.
31
+ - `stress_suite_clear` — change walked through M1–D1 (governance), none flunked
32
+ - `hidden_decision_posture` — open → down_weighted → closed only when CI is green AND stress suite is clear
33
+
34
+ ## End-to-end verification
35
+
36
+ Real system, real data, witness actual output. Doc updates, "saying done", and screenshots alone are not verification. Write the e2e probe to the spool (`.gm/exec-spool/in/nodejs/<N>.js`):
37
+
38
+ ```
39
+ const { fn } = await import('/abs/path/to/module.js');
40
+ console.log(await fn(realInput));
41
+ ```
42
+
43
+ After every success, enumerate what remains — never stop at first green.
44
+
45
+ ## Browser validation gate
46
+
47
+ Required when this session changed any code that runs in a browser: anything under `client/`, UI components, shaders, page-loaded JS, served HTML, gh-pages assets, dev-server endpoints, or any module imported into the page bundle.
48
+
49
+ Trigger detection (any one): `git diff --name-only origin/main..HEAD` includes paths under `client/`, `apps/*/index.js` with client export, `docs/`, `*.html`, shader files, or any file imported by a browser entry; new/changed export consumed by `window.*` or rendered in DOM/canvas/WebGL; visual, layout, animation, input, network-on-page, or shader behavior altered.
50
+
51
+ Protocol: boot the real server (or open the static page) on a known URL — witness HTTP 200. `exec:browser` → `page.goto(url)` → wait for app init by polling for the global the change affects (`window.__app.<system>`). Probe via `page.evaluate(() => …)` asserting the specific invariant the change was supposed to establish — instance counts, scene meshes, DOM nodes, render stats, network frames. Capture witnessed numbers in the response — "looks fine" is not a witness. Failures route to `gm-execute` (logic) or `gm-emit` (output) — never paper over.
52
+
53
+ Long-running probes split into navigate-call → `exec:wait N` → probe-call to stay under the per-call budget. Do not stack multi-second `setTimeout` inside one `exec:browser` invocation.
54
+
55
+ Exempt only when: change is server-only with zero browser-facing surface, OR the repository has no browser surface at all (pure CLI / library). Exemption requires explicit tag in the response: `BROWSER EXEMPT: <reason — must reference diff paths showing zero browser-facing surface>`. Default posture is NOT exempt — burden is on the agent to prove exemption with diff evidence.
56
+
57
+ Pre-flight: run `git diff --name-only origin/main..HEAD` directly via Bash, then dispatch a nodejs spool file that reads the diff list and filters lines matching `client/|docs/|\.html$|\.glsl$|\.frag$|\.vert$`. Any hit AND no `exec:browser` block in this session → mandatory regression to `gm-execute`.
58
+
59
+ ## Integration test gate
60
+
61
+ Write to `.gm/exec-spool/in/nodejs/<N>.js`:
62
+
63
+ ```
64
+ const { execSync } = require('child_process');
65
+ try { execSync('node test.js', { stdio: 'inherit', timeout: 30000 }); console.log('PASS'); }
66
+ catch (e) { console.error('FAIL'); process.exit(1); }
67
+ ```
68
+
69
+ Failure → `gm-execute`. No test.js in a repo with testable surface → `gm-execute` to create it.
70
+
71
+ ## Git enforcement
72
+
73
+ Run directly via Bash:
74
+
75
+ ```
76
+ git status --porcelain
77
+ git log origin/main..HEAD --oneline
78
+ ```
79
+
80
+ Both must return empty. Local commit without push is not complete.
81
+
82
+ ## CI is automated
83
+
84
+ The Stop hook watches Actions for the pushed HEAD. Do not call `gh run list` manually. All-green → Stop approves with CI summary in next-turn context. Failure → Stop blocks with run names + IDs; investigate via `gh run view <id> --log-failed`, fix, push, hook re-watches. Deadline 180s (override `GM_CI_WATCH_SECS`); slow jobs get a "still in progress" approve.
85
+
86
+ ## Hygiene sweep
87
+
88
+ 1. Files >200 lines → split
89
+ 2. Comments in code → remove
90
+ 3. Scattered test files (`.test.js`, `.spec.js`, `__tests__/`, `fixtures/`, `mocks/`) → delete, consolidate into root `test.js`
91
+ 4. Mock / stub / simulation files → delete
92
+ 5. Unnecessary doc files (not CHANGELOG, CLAUDE, README, TODO.md) → delete
93
+ 6. Duplicate concern → regress to `planning` with restructuring instructions
94
+ 7. Hardcoded values → derive from ground truth
95
+ 8. Fallback / demo modes → remove, fail loud
96
+ 9. TODO.md → empty or deleted
97
+ 10. CHANGELOG.md → entries for this session
98
+ 11. Observability gaps → server subsystems expose `/debug/<subsystem>`; client modules register in `window.__debug`
99
+ 12. Memorize → every fact from verification handed off via background `Agent(memorize)` at moment of resolution
100
+ 13. Deploy / publish → if deployable, deploy; if npm package, publish
101
+ 14. GitHub Pages → check `.github/workflows/pages.yml` + `docs/index.html` exist; invoke `pages` skill if absent
102
+ 15. Governance stress-suite → walk change through M1, F1, C1, H1, S1, B1, A1, D1; any flunk regresses to the owning phase
103
+
104
+ ## Completion
105
+
106
+ All true at once: witnessed e2e | browser_validated when client work touched | failure paths exercised | test.js passes | `.prd` deleted | git clean and pushed | CI green | hygiene sweep clean | TODO.md gone | CHANGELOG.md updated.
@@ -0,0 +1,70 @@
1
+ ---
2
+ name: gm-emit
3
+ description: EMIT phase. Pre-emit debug, write files, post-emit verify from disk. Any new unknown triggers immediate snake back to planning — restart chain.
4
+ ---
5
+
6
+ # GM EMIT — Write and verify from disk
7
+
8
+ Entry: every mutable KNOWN, from `gm-execute` or re-entered from VERIFY. Exit: gates clear → `gm-complete`.
9
+
10
+ Cross-cutting dispositions live in `gm` SKILL.md.
11
+
12
+ ## Transitions
13
+
14
+ - All gates clear → `gm-complete`
15
+ - Post-emit variance with known cause → fix in-band, re-verify, stay in EMIT
16
+ - Pre-emit reveals known logic error → `gm-execute`
17
+ - Pre-emit reveals new unknown OR post-emit variance with unknown cause OR scope changed → `planning`
18
+
19
+ ## Legitimacy gate (before pre-emit run)
20
+
21
+ For every claim landing in a file, answer five questions:
22
+
23
+ 1. Earned specificity — does it trace to `authorization=witnessed`, or is it inflated from a weak prior?
24
+ 2. Repair legality — is a local patch dressed as structural repair? Downgrade scope or regress to PLAN.
25
+ 3. Lawful downgrade — can a weaker, true statement replace it? Prefer the downgrade.
26
+ 4. Alternative-route suppression — is a live competing route being silenced? Preserve it.
27
+ 5. Strongest objection — what would the sharpest reviewer pushback be? Articulate it. Cannot articulate = have not understood the alternatives → `gm-execute`.
28
+
29
+ Any failure regresses to `gm-execute` to witness what was missing, or `planning` if the gap is structural.
30
+
31
+ ## Pre-emit run
32
+
33
+ Mandatory before writing any file. Write the probe to the spool (`.gm/exec-spool/in/nodejs/<N>.js`):
34
+
35
+ ```
36
+ const { fn } = await import('/abs/path/to/module.js');
37
+ console.log(await fn(realInput));
38
+ ```
39
+
40
+ Import the actual module from disk to witness current behavior as the baseline. Run the proposed logic in isolation without writing — witness with real inputs and with real error inputs. Match expected → write. Unexpected → new unknown → `planning`.
41
+
42
+ ## Writing
43
+
44
+ Use the Write tool, or a nodejs spool file with `require('fs')`. Write only when every gate mutable resolves simultaneously.
45
+
46
+ ## Post-emit verification
47
+
48
+ Re-import from disk — in-memory state is stale and inadmissible. Run identical inputs as pre-emit; output must match the baseline exactly. Known variance → fix and re-verify (self-loop). Unknown variance → `planning`.
49
+
50
+ ## Mutables gate
51
+
52
+ Before pre-emit run, read `.gm/mutables.yml`. Any entry with `status: unknown` → regress to `gm-execute`. The pre-tool-use hook hard-blocks Write/Edit/NotebookEdit while unresolved entries exist; trying to emit anyway returns deny. Zero unresolved is the precondition for every legitimacy question below.
53
+
54
+ ## Gate (all true at once)
55
+
56
+ - `.gm/mutables.yml` empty/absent OR every entry `status: witnessed` with filled `witness_evidence`
57
+ - Legitimacy gate passed; no refused collapse
58
+ - Pre-emit passed with real inputs and real error inputs
59
+ - Post-emit matches pre-emit exactly
60
+ - Hot-reloadable; errors throw with context (no `|| default`, no `catch { return null }`, no fallbacks)
61
+ - No mocks, fakes, stubs, or scattered test files (delete on discovery)
62
+ - Any behavior change has a corresponding assertion in `test.js` — a change no test catches is a change you cannot prove
63
+ - Browser-facing change → post-emit verify includes a live `exec:browser` witness (boot server → `page.goto` → `page.evaluate` asserting the invariant the change established). Node-side import + test.js does not satisfy this — the final gate runs again in `gm-complete`.
64
+ - Files ≤ 200 lines
65
+ - No duplicate concern (run `exec:codesearch` for the primary concern after writing; overlap → `planning`)
66
+ - No comments, no hardcoded values, no adjectives in identifiers, no unnecessary files
67
+ - Observability: new server subsystems expose `/debug/<subsystem>`; new client modules register in `window.__debug`
68
+ - Structure: no if/else where dispatch suffices; no one-liners that obscure; no reinvented APIs
69
+ - Every fact resolved this phase memorized via background `Agent(memorize)`
70
+ - CHANGELOG.md updated; TODO.md cleared or deleted
@@ -0,0 +1,84 @@
1
+ ---
2
+ name: gm-execute
3
+ description: EXECUTE phase AND the foundational execution contract for every skill. Every exec:<lang> run, every witnessed check, every code search, in every phase, follows this skill's discipline. Resolve all mutables via witnessed execution. Any new unknown triggers immediate snake back to planning — restart chain from PLAN.
4
+ ---
5
+
6
+ # GM EXECUTE — Resolve every unknown by witness
7
+
8
+ Entry: `.prd` with named unknowns. Exit: every mutable KNOWN → invoke `gm-emit`.
9
+
10
+ A `@<discipline>` sigil propagates from PLAN through every recall, codesearch, and memorize call; reads without one fan across default plus enabled disciplines, writes without one go to default only.
11
+
12
+ This skill is the execution contract for ALL phases — pre-emit witnesses, post-emit verifies, e2e checks all run on this discipline. Cross-cutting dispositions live in `gm` SKILL.md.
13
+
14
+ ## Transitions
15
+
16
+ - All mutables KNOWN → `gm-emit`
17
+ - Still UNKNOWN → re-run from a different angle (max 2 passes)
18
+ - New unknown OR unresolvable after 2 passes → `planning`
19
+
20
+ ## Mutable discipline
21
+
22
+ Each mutable carries: name, expected, current, resolution method.
23
+
24
+ Resolves to KNOWN only when all four pass:
25
+
26
+ - **ΔS = 0** — witnessed output equals expected
27
+ - **λ ≥ 2** — two independent paths agree
28
+ - **ε intact** — adjacent invariants hold
29
+ - **Coverage ≥ 0.70** — enough corpus inspected to rule out contradiction
30
+
31
+ Unresolved after 2 passes regresses to `planning`. Never narrate past an unresolved mutable.
32
+
33
+ Every witness that resolves a mutable writes back to `.gm/mutables.yml` the same step: set `status: witnessed` and fill `witness_evidence` with concrete proof (file:line, codesearch hit, exec output snippet). No write-back = the mutable stays unknown and the EMIT-gate stays closed. The hook reads this file; the agent's memory of "I resolved it" does not unblock anything.
34
+
35
+ Route candidates from PLAN are `weak_prior` only. Plausibility is the right to test, not the right to believe. A claim with no witness in the current session is a hypothesis — say so when stating it, and say what would settle it. The next reader (you, next turn) needs to know which lines were earned and which were carried forward.
36
+
37
+ ## Verification budget
38
+
39
+ Spend on `.prd` items in descending order of consequence-if-wrong × distance-from-witnessed. Items whose failure would collapse the headline finding must reach witnessed status before EMIT; sub-argument-level items need at minimum a stated fallback path.
40
+
41
+ ## Code execution
42
+
43
+ Code AND utility verbs both run through the file-spool. Write a file to `.gm/exec-spool/in/<lang-or-verb>/<N>.<ext>` — language stems (`in/nodejs/42.js`, `in/python/43.py`, `in/bash/44.sh`, plus typescript, go, rust, c, cpp, java, deno) or verb stems (`in/codesearch/45.txt`, `in/recall/46.txt`, `in/memorize/47.md`, plus wait, sleep, status, close, browser, runner, type, kill-port, forget, feedback, learn-status, learn-debug, learn-build, discipline, pause, health). The spool watcher executes and streams stdout to `out/<N>.out`, stderr to `out/<N>.err`, then writes `out/<N>.json` metadata sidecar at completion (taskId, lang, ok, exitCode, durationMs, timedOut, startedAt, endedAt). Both streams return as systemMessage with `--- stdout ---` / `--- stderr ---` separators. File I/O via a nodejs spool file + `require('fs')`. Only `git` and `gh` run directly in Bash. Never `Bash(node/npm/npx/bun)`, never `Bash(exec:<anything>)`.
44
+
45
+ Pack runs: `Promise.allSettled`, each idea own try/catch, under 12s per call. Runner: write `in/runner/<N>.txt` with body `start` | `stop` | `status`.
46
+
47
+ Every exec daemonizes. The hook tails the task logfile up to 30s wall-clock and returns whatever is there — short tasks complete inside the window and look synchronous; long tasks return a task_id with partial output. Continue with `exec:tail` (drain, bounded), `exec:watch` (resume blocking until match or timeout), or `exec:close` (terminate). Never re-spawn a long task to check on it — that orphans the first one. `exec:wait` is a pure timer; `exec:sleep` blocks on a specific task's output; `exec:watch` is the match-or-timeout primitive. Every execution-platform RPC returns the live list of running tasks for this session — close stragglers via `exec:close\n<id>` so the list stays scannable. Session-end (clear/logout/prompt_input_exit) kills the session's tasks; compaction/handoff preserves them.
48
+
49
+ Every utility verb dispatches via `in/<verb>/<N>.txt`; the body of the file is the verb's argument. There is no inline form and no Bash-prefix form — both are denied by the hook.
50
+
51
+ ## Codebase search
52
+
53
+ `exec:codesearch` only. Grep, Glob, Find, Explore, raw grep/rg/find inside `exec:bash` are all hook-blocked.
54
+
55
+ ```
56
+ exec:codesearch
57
+ <two-word query>
58
+ ```
59
+
60
+ Start two words, change/add one per pass, minimum four attempts before concluding absent. Known absolute path → `Read`. Known directory → `exec:nodejs` + `fs.readdirSync`.
61
+
62
+ ## Import-based execution
63
+
64
+ Hypotheses become real by importing actual modules from disk. Reimplemented behavior is UNKNOWN. Write the import probe to the spool:
65
+
66
+ ```
67
+ # write .gm/exec-spool/in/nodejs/42.js
68
+ const { fn } = await import('/abs/path/to/module.js');
69
+ console.log(await fn(realInput));
70
+ ```
71
+
72
+ Differential diagnosis: smallest reproduction → compare actual vs expected → name the delta — that delta is the mutable.
73
+
74
+ ## Edits depend on witnesses
75
+
76
+ Hypothesis → run → witness → edit. An edit before a witness is a guess. Scan via `exec:codesearch` before creating or modifying — duplicate concern regresses to `planning`. Code-quality preference: native → library → structure → write.
77
+
78
+ ## Parallel subagents
79
+
80
+ Up to 3 `gm:gm` subagents for independent items in one message. Browser escalation: `exec:browser` → `browser` skill → screenshot only as last resort.
81
+
82
+ ## CI is automated
83
+
84
+ `git push` triggers the Stop hook to watch Actions for the pushed HEAD on the same repo (downstream cascades are not auto-watched). Green → Stop approves with summary; failure → run names + IDs surfaced, investigate via `gh run view <id> --log-failed`. Deadline 180s (override `GM_CI_WATCH_SECS`).
@@ -0,0 +1,97 @@
1
+ ---
2
+ name: governance
3
+ description: Governance reference invoked by PLAN/EXECUTE/EMIT/VERIFY. Separates route discovery (PLAN) from weak-prior handoff (EXECUTE) from earned-emission legitimacy (EMIT/VERIFY). Encodes 16-failure taxonomy, 4 state planes, ΔS/λ/ε/Coverage metrics, governance stress suite.
4
+ ---
5
+
6
+ # Governance — Route, bridge, legitimacy
7
+
8
+ Three roles, three failure surfaces.
9
+
10
+ 1. Route discovery — what family of fault? Owned by `planning`.
11
+ 2. Weak-prior bridge — plausibility is not authorization. Owned by `gm-execute`.
12
+ 3. Legitimacy gate — did this answer earn its strength? Owned by `gm-emit` and `gm-complete`.
13
+
14
+ ## Five refused collapses
15
+
16
+ 1. Route → authorization ("plan looks good" treated as "code is right")
17
+ 2. Candidate → structural repair (local patch shipped as architectural fix)
18
+ 3. Hidden → public law (internal convenience shipped as contract)
19
+ 4. Cleanliness → legitimacy (compiles treated as evidence-supports)
20
+ 5. One strong route → universal closure (best answer treated as only answer)
21
+
22
+ When in doubt, preserve ambiguity. Lawful downgrade beats forced closure.
23
+
24
+ ## 7 route families
25
+
26
+ | Family | What breaks | Repair |
27
+ |---|---|---|
28
+ | grounding | Retrieval, lookup, fact anchor | Re-ground against source of truth |
29
+ | reasoning | Inference chain, logic | Shorten chain, re-derive from primitives |
30
+ | state | Memory, session continuity | Make state addressable |
31
+ | execution | Runtime, scheduling, process | Isolate, witness, re-run |
32
+ | observability | Inspection, tracing | Add permanent structure |
33
+ | boundary | Interfaces, contracts, seams | Re-assert contract from one source |
34
+ | representation | Data shape, schema, type | Make illegal states unrepresentable |
35
+
36
+ ## 16 failure modes
37
+
38
+ | # | Name | Family |
39
+ |---|---|---|
40
+ | 1 | Hallucination & chunk drift | grounding |
41
+ | 2 | Interpretation collapse | reasoning |
42
+ | 3 | Long reasoning drift | reasoning |
43
+ | 4 | Bluffing / overconfidence | reasoning |
44
+ | 5 | Semantic ≠ embedding | grounding |
45
+ | 6 | Logic collapse, needs reset | reasoning |
46
+ | 7 | Memory breaks across sessions | state |
47
+ | 8 | Debugging black box | observability |
48
+ | 9 | Entropy collapse | state |
49
+ | 10 | Creative freeze | representation |
50
+ | 11 | Symbolic collapse | reasoning |
51
+ | 12 | Philosophical recursion | reasoning |
52
+ | 13 | Multi-agent chaos | state |
53
+ | 14 | Bootstrap ordering | execution |
54
+ | 15 | Deployment deadlock | execution |
55
+ | 16 | Pre-deploy collapse | execution |
56
+
57
+ ## 4 state planes
58
+
59
+ | Plane | Owner | States | Implication |
60
+ |---|---|---|---|
61
+ | route_fit | planning | unexamined → examined → dominant | Dominant ≠ authorized |
62
+ | authorization | gm-execute | none → weak_prior → witnessed | Only witnessed permits emission |
63
+ | repair_legality | gm-emit | unverified → local_candidate → structural | Local cannot ship as structural |
64
+ | hidden_decision_posture | gm-complete | open → down_weighted → closed | Close only after CI green |
65
+
66
+ ## Quality metrics
67
+
68
+ - **ΔS** — witnessed output equals expected. ΔS≠0 = still open.
69
+ - **λ ≥ 2** — two independent paths agree. λ=1 = still unknown.
70
+ - **ε** — adjacent invariants hold (types, tests, neighboring callers).
71
+ - **Coverage ≥ 0.70** — enough corpus inspected to rule out contradicting evidence.
72
+
73
+ All four pass before a mutable flips UNKNOWN → KNOWN.
74
+
75
+ ## Stress suite
76
+
77
+ Run before declaring COMPLETE.
78
+
79
+ | # | Case | Failure if flunked |
80
+ |---|---|---|
81
+ | M1 | Missing evidence forced decision | Over-commits to one cause |
82
+ | F1 | Financial advice unsourced number | Ships confident figure from vibes |
83
+ | C1 | Contract ambiguous clause | Collapses two readings into one |
84
+ | H1 | HR contradictory witnesses | Hides contradiction to force closure |
85
+ | S1 | Security attribution under pressure | Picks plausible, not witnessed |
86
+ | B1 | Business RCA multiple candidates | Single-route closure |
87
+ | A1 | Authenticity eval partial signals | Surface appearance beats evidence |
88
+ | D1 | Deploy-gate under CI flake | Treats noise as green |
89
+
90
+ Legal: `illegal_commitment=0`, `evidence_boundary_violation=0`, `lawful_downgrade=available` in all 8, `outlier_visibility=preserved`.
91
+
92
+ ## Phase application
93
+
94
+ - **planning** — tag every `.prd` item with route family + failure-mode IDs
95
+ - **gm-execute** — weak prior only; witnessed probe before authorization
96
+ - **gm-emit** — legitimacy gate; unearned specificity → lawful downgrade
97
+ - **gm-complete** — stress-suite pass; close posture only when CI is green