gm-copilot-cli 2.0.726 → 2.0.1063

@@ -0,0 +1,19 @@
1
+ ---
2
+ name: gm-cc
3
+ description: AI-native software engineering via skill-driven orchestration on cc; bootstraps plugkit for task execution and session isolation
4
+ allowed-tools: Skill
5
+ ---
6
+
7
+ # GM — cc Platform
8
+
9
+ AI-native software engineering orchestrated via skill chain: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS.
10
+
11
+ **Bootstrap pattern**: `bun x gm-plugkit@latest --daemon` downloads the correct platform binary, verifies SHA256, and starts the spool watcher daemon. Call once at session start; idempotent on subsequent calls. All execution routes through the file-spool: write to `.gm/exec-spool/in/<lang>/<N>.<ext>` or `in/<verb>/<N>.txt`, poll `out/<N>.json` for results.
12
+
13
+ **Session-ID threading (no session-start hook)**: At skill invoke time, generate or detect SESSION_ID (env var `SESSION_ID` or `uuid()`). Pass `sessionId: "<id>"` in every rs-exec RPC body (spawn, tail, watch, etc.) and every spool-written task body. All task-scoped cleanup (deleteTask, getTask, appendOutput, killSessionTasks) requires matching sessionId. Absence is forbidden — hard reject by rs-exec handler.
14
+
15
+ **Spool dispatch surface**: Write to `.gm/exec-spool/in/<lang>/<N>.<ext>` (languages: nodejs, python, bash, typescript, go, rust, c, cpp, java, deno) or `in/<verb>/<N>.txt` (verbs: codesearch, recall, memorize, wait, sleep, status, close, browser, runner, etc.). Watcher executes and streams `out/<N>.out` (stdout) + `out/<N>.err` (stderr) line-by-line, then `out/<N>.json` metadata (exitCode, durationMs, timedOut, startedAt, endedAt) at completion.
16
+
17
+ **End-to-end skill chaining (skills-based platforms)**: When gm SKILL.md includes `end-to-end: true`, adapter detects signal and parses stdout for trailing JSON: `{"nextSkill": "...", "context": {...}, "phase": "..."}`. If nextSkill is non-null, invoke `Skill(skill="gm:<nextSkill>")` with context dict, repeat until null. This auto-chains 5 invocations into 1 user invocation.
18
+
19
+ Every task returns complete: taskId, exitCode, durationMs, timedOut, stdout, stderr. Background tasks return immediately with task_id; continue with exec:tail, exec:watch, or exec:close.
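The spool layout described in this file can be sketched as a small pair of helpers. This is a minimal sketch, assuming only the paths named above (`.gm/exec-spool/in/<lang>/<N>.<ext>` and `out/<N>.json`); the function names `spoolPaths`, `writeTask`, and `readResult` are hypothetical and not part of gm-plugkit. How `sessionId` is embedded in a spool task body is not shown in the source, so it is left out here.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: builds the input and result paths for task N.
function spoolPaths(root, lang, n, ext) {
  const spool = path.join(root, '.gm', 'exec-spool');
  return {
    input: path.join(spool, 'in', lang, `${n}.${ext}`),
    result: path.join(spool, 'out', `${n}.json`),
  };
}

// Writes the task source into the spool; the watcher daemon is assumed to pick it up.
function writeTask(root, lang, n, ext, source) {
  const { input } = spoolPaths(root, lang, n, ext);
  fs.mkdirSync(path.dirname(input), { recursive: true });
  fs.writeFileSync(input, source);
  return input;
}

// Returns the parsed completion metadata once out/<N>.json exists, otherwise null.
function readResult(root, lang, n, ext) {
  const { result } = spoolPaths(root, lang, n, ext);
  if (!fs.existsSync(result)) return null;
  return JSON.parse(fs.readFileSync(result, 'utf8'));
}
```

A caller would invoke `writeTask` once and then call `readResult` on a timer until it returns a non-null object. A real poller would probably also need to tolerate a half-written JSON file, which this sketch does not handle.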
@@ -0,0 +1,19 @@
1
+ ---
2
+ name: gm-codex
3
+ description: AI-native software engineering via skill-driven orchestration on codex; bootstraps plugkit for task execution and session isolation
4
+ allowed-tools: Skill
5
+ ---
6
+
7
+ # GM — codex Platform
8
+
9
+ AI-native software engineering orchestrated via skill chain: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS.
10
+
11
+ **Bootstrap pattern**: `bun x gm-plugkit@latest --daemon` downloads the correct platform binary, verifies SHA256, and starts the spool watcher daemon. Call once at session start; idempotent on subsequent calls. All execution routes through the file-spool: write to `.gm/exec-spool/in/<lang>/<N>.<ext>` or `in/<verb>/<N>.txt`, poll `out/<N>.json` for results.
12
+
13
+ **Session-ID threading (no session-start hook)**: At skill invoke time, generate or detect SESSION_ID (env var `SESSION_ID` or `uuid()`). Pass `sessionId: "<id>"` in every rs-exec RPC body (spawn, tail, watch, etc.) and every spool-written task body. All task-scoped cleanup (deleteTask, getTask, appendOutput, killSessionTasks) requires matching sessionId. Absence is forbidden — hard reject by rs-exec handler.
14
+
15
+ **Spool dispatch surface**: Write to `.gm/exec-spool/in/<lang>/<N>.<ext>` (languages: nodejs, python, bash, typescript, go, rust, c, cpp, java, deno) or `in/<verb>/<N>.txt` (verbs: codesearch, recall, memorize, wait, sleep, status, close, browser, runner, etc.). Watcher executes and streams `out/<N>.out` (stdout) + `out/<N>.err` (stderr) line-by-line, then `out/<N>.json` metadata (exitCode, durationMs, timedOut, startedAt, endedAt) at completion.
16
+
17
+ **End-to-end skill chaining (skills-based platforms)**: When gm SKILL.md includes `end-to-end: true`, adapter detects signal and parses stdout for trailing JSON: `{"nextSkill": "...", "context": {...}, "phase": "..."}`. If nextSkill is non-null, invoke `Skill(skill="gm:<nextSkill>")` with context dict, repeat until null. This auto-chains 5 invocations into 1 user invocation.
18
+
19
+ Every task returns complete: taskId, exitCode, durationMs, timedOut, stdout, stderr. Background tasks return immediately with task_id; continue with exec:tail, exec:watch, or exec:close.
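The end-to-end chaining rule in this file (parse trailing JSON from stdout, invoke the next skill, repeat until `nextSkill` is null) could look roughly like the sketch below. It assumes the trailing JSON is the last thing on stdout; `parseTrailingJson`, `runChain`, the `invokeSkill` callback, and the `maxSteps` guard are hypothetical names introduced for illustration, not the adapter's actual API.

```javascript
// Returns the JSON object that ends stdout, or null if there is none.
function parseTrailingJson(stdout) {
  const text = stdout.trimEnd();
  if (!text.endsWith('}')) return null;
  // Try each '{' from the right until the remaining suffix parses as JSON.
  for (let i = text.lastIndexOf('{'); i !== -1; i = text.lastIndexOf('{', i - 1)) {
    try {
      return JSON.parse(text.slice(i));
    } catch (err) {
      // Index 0 is the last candidate; stop here to avoid rescanning it.
      if (i === 0) return null;
    }
  }
  return null;
}

// invokeSkill(name, context) stands in for Skill(skill="gm:<nextSkill>") and returns stdout.
function runChain(invokeSkill, firstSkill, maxSteps = 10) {
  const visited = [];
  let next = { nextSkill: firstSkill, context: {} };
  while (next && next.nextSkill && visited.length < maxSteps) {
    visited.push(next.nextSkill);
    const stdout = invokeSkill(`gm:${next.nextSkill}`, next.context);
    next = parseTrailingJson(stdout);
  }
  return visited;
}
```

The `maxSteps` bound is a defensive choice for the sketch, so that a skill that keeps naming itself as `nextSkill` cannot loop forever; the source does not state whether the adapter has such a limit.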
@@ -3,121 +3,104 @@ name: gm-complete
3
3
  description: VERIFY and COMPLETE phase. End-to-end system verification and git enforcement. Any new unknown triggers immediate snake back to planning — restart chain.
4
4
  ---
5
5
 
6
- # GM COMPLETE — Verify and Complete
6
+ # GM COMPLETE — Verify, then close
7
7
 
8
- GRAPH: `PLAN → EXECUTE → EMIT → [VERIFY] → UPDATE-DOCS → COMPLETE`
9
- Entry: all EMIT gates passed. From `gm-emit`.
8
+ Entry: EMIT gates clear, from `gm-emit`. Exit: `.prd` deleted + test.js green + pushed + CI green → `update-docs`.
10
9
 
11
- ## TRANSITIONS
10
+ Cross-cutting dispositions live in `gm` SKILL.md.
12
11
 
13
- **EXIT → EXECUTE**: .prd items remain → invoke `gm-execute` immediately.
14
- **EXIT → COMPLETE**: .prd deleted + test.js passes + pushed + CI green → invoke `update-docs`.
15
- **REGRESS → EMIT**: broken file output.
16
- **REGRESS → EXECUTE**: logic wrong.
17
- **REGRESS → PLAN**: new unknown or wrong requirements.
12
+ ## Transitions
18
13
 
19
- Failure triage: broken output → EMIT | wrong logic → EXECUTE | new unknown → PLAN. Never patch around surprises.
14
+ - `.prd` items remain → `gm-execute`
15
+ - `.prd` empty AND test.js green AND pushed AND CI green → `update-docs`
16
+ - Broken file output → `gm-emit`
17
+ - Wrong logic → `gm-execute`
18
+ - New unknown or wrong requirements → `planning`
20
19
 
21
- ## MUTABLES ALL MUST RESOLVE BEFORE COMPLETE
20
+ Failure triage: broken output to EMIT, wrong logic to EXECUTE, new unknown to PLAN. Never patch around surprises.
21
+
22
+ ## Mutables that must resolve before COMPLETE
22
23
 
23
24
  - `witnessed_e2e` — real end-to-end run with witnessed output
24
- - `browser_validated` — MANDATORY for any change touching client/UI/browser-facing code (anything served to a browser, rendered, or whose output is visible to a user). Must invoke `browser` skill, navigate the live page, and witness the change in `window` / DOM / scene state. test.js + node-side imports DO NOT satisfy this gate. See BROWSER VALIDATION GATE below.
25
+ - `browser_validated` — for any change touching client / UI / browser-facing code, see gate below. test.js + node-side imports DO NOT satisfy this gate.
25
26
  - `git_clean` — `git status --porcelain` returns empty
26
27
  - `git_pushed` — `git log origin/main..HEAD --oneline` returns empty
27
- - `ci_passed` — all GitHub Actions runs reach `conclusion: success`
28
- - `prd_empty` — `.gm/prd.yml` deleted (file must not exist)
29
- - `stress_suite_clear` — change walked through all applicable governance stress cases (M1-D1), none flunk
30
- - `hidden_decision_posture` — advances open→down_weighted→closed only when CI green + stress suite clear
28
+ - `ci_passed` — every GitHub Actions run reaches `conclusion: success`
29
+ - `mutables_resolved` — `.gm/mutables.yml` deleted OR every entry `status: witnessed`. Stop hook hard-blocks turn-stop while any entry is `status: unknown`.
30
+ - `prd_empty` — `.gm/prd.yml` deleted AFTER residual scan: enumerate every in-spirit reachable residual surfaced this session; any hit re-enters `planning`, appends PRD items, executes. Empty PRD is necessary, not sufficient — done = empty PRD AND zero reachable in-spirit residuals. Out-of-spirit-or-unreachable residuals are named in the response and skipped; everything else is this turn's work.
31
+ - `stress_suite_clear` — change walked through M1–D1 (governance), none flunked
32
+ - `hidden_decision_posture` — open → down_weighted → closed only when CI is green AND stress suite is clear
31
33
 
32
- ## END-TO-END VERIFICATION
34
+ ## End-to-end verification
33
35
 
34
- Run real system, real data, witness actual output. NOT verification: docs updates, saying done, screenshots alone.
36
+ Real system, real data, witness actual output. Doc updates, "saying done", and screenshots alone are not verification. Write the e2e probe to the spool (`.gm/exec-spool/in/nodejs/<N>.js`):
35
37
 
36
38
  ```
37
- exec:nodejs
38
39
  const { fn } = await import('/abs/path/to/module.js');
39
40
  console.log(await fn(realInput));
40
41
  ```
41
42
 
42
- Browser/UI: invoke `browser` skill. After every success: enumerate what remains — never stop at first green.
43
+ After every success, enumerate what remains — never stop at first green.
44
+
45
+ ## Browser validation gate
46
+
47
+ Required when this session changed any code that runs in a browser: anything under `client/`, UI components, shaders, page-loaded JS, served HTML, gh-pages assets, dev-server endpoints, or any module imported into the page bundle.
43
48
 
44
- ## BROWSER VALIDATION GATE MANDATORY FOR CLIENT WORK
49
+ Trigger detection (any one): `git diff --name-only origin/main..HEAD` includes paths under `client/`, `apps/*/index.js` with client export, `docs/`, `*.html`, shader files, or any file imported by a browser entry; new/changed export consumed by `window.*` or rendered in DOM/canvas/WebGL; visual, layout, animation, input, network-on-page, or shader behavior altered.
45
50
 
46
- If this session changed any code that runs in a browser (anything under client/, UI components, shaders, page-loaded JS, served HTML, gh-pages assets, dev-server endpoints, or any module imported into the page bundle), `browser_validated` MUST resolve before COMPLETE. Skipping it because "node tests pass" or "test.js is green" is a forced-closure refusal of witnessed verification.
51
+ Protocol: boot the real server (or open the static page) on a known URL and witness HTTP 200. `exec:browser` → `page.goto(url)` → wait for app init by polling for the global the change affects (`window.__app.<system>`). Probe via `page.evaluate(() => …)`, asserting the specific invariant the change was supposed to establish: instance counts, scene meshes, DOM nodes, render stats, network frames. Capture witnessed numbers in the response — "looks fine" is not a witness. Failures route to `gm-execute` (logic) or `gm-emit` (output) — never paper over.
47
52
 
48
- Trigger detection (any one suffices):
49
- - `git diff --name-only origin/main..HEAD` includes paths under `client/`, `apps/*/index.js` with client export, `docs/`, `*.html`, shader files, or any file imported by a browser entry.
50
- - New/changed export consumed by `window.*` or rendered in DOM/canvas/WebGL.
51
- - Visual, layout, animation, input, network-on-page, or shader behavior altered.
53
+ Long-running probes split into navigate-call → `exec:wait N` → probe-call to stay under the per-call budget. Do not stack multi-second `setTimeout` inside one `exec:browser` invocation.
52
54
 
53
- Required protocol:
54
- 1. Boot the real server (or open the static page) on a known URL — witness HTTP 200.
55
- 2. `exec:browser` → `page.goto(url)` → wait for app init (poll for the global the change affects, e.g. `window.__app.<system>`).
56
- 3. Probe via `page.evaluate(() => …)` — assert the specific invariant the change was supposed to establish (instance counts, scene meshes, DOM nodes, render stats, network frames, etc.).
57
- 4. Capture the witnessed numbers in the response. "Looks fine" is not a witness.
58
- 5. Failures → regress to `gm-execute` (logic) or `gm-emit` (output) — never paper over.
55
+ Exempt only when: change is server-only with zero browser-facing surface, OR the repository has no browser surface at all (pure CLI / library). Exemption requires explicit tag in the response: `BROWSER EXEMPT: <reason — must reference diff paths showing zero browser-facing surface>`. Default posture is NOT exempt — burden is on the agent to prove exemption with diff evidence.
59
56
 
60
- Long-running probes: split into navigate-call → `exec:wait N` → probe-call to stay under the per-call budget. Do not stack multi-second `setTimeout` inside one `exec:browser` invocation.
57
+ Pre-flight: run `git diff --name-only origin/main..HEAD` directly via Bash, then dispatch a nodejs spool file that reads the diff list and filters lines matching `client/|docs/|\.html$|\.glsl$|\.frag$|\.vert$`. Any hit AND no `exec:browser` block in this session → mandatory regression to `gm-execute`.
61
58
 
62
- Exempt only when: change is server-only with zero browser-facing surface, OR repository has no browser surface at all (pure CLI/library). Tag the exemption in the response with the reason; do not silently skip.
59
+ ## Integration test gate
63
60
 
64
- ## INTEGRATION TEST GATE
61
+ Write to `.gm/exec-spool/in/nodejs/<N>.js`:
65
62
 
66
63
  ```
67
- exec:nodejs
68
64
  const { execSync } = require('child_process');
69
65
  try { execSync('node test.js', { stdio: 'inherit', timeout: 30000 }); console.log('PASS'); }
70
66
  catch (e) { console.error('FAIL'); process.exit(1); }
71
67
  ```
72
68
 
73
- Failure → regress to `gm-execute`. No test.js + testable surface → regress to `gm-execute` to create it.
69
+ Failure → `gm-execute`. No test.js in a repo with testable surface → `gm-execute` to create it.
74
70
 
75
- ## GIT ENFORCEMENT
71
+ ## Git enforcement
72
+
73
+ Run directly via Bash:
76
74
 
77
75
  ```
78
- exec:bash
79
76
  git status --porcelain
80
77
  git log origin/main..HEAD --oneline
81
78
  ```
82
79
 
83
- Both must return empty. Local commit without push ≠ complete.
80
+ Both must return empty. Local commit without push is not complete.
84
81
 
85
- ## CI AUTOMATED
82
+ ## CI is automated
86
83
 
87
- Stop hook watches all GitHub Actions runs for the pushed HEAD. Do not call `gh run list` manually.
88
- - All-green → Stop approves with CI summary in next turn context
89
- - Failure → Stop blocks with run names+IDs → investigate with `gh run view <id> --log-failed`, fix, push, hook re-watches
90
- - Deadline 180s (override `GM_CI_WATCH_SECS`) → slow jobs get "still in progress" approve
84
+ The Stop hook watches Actions for the pushed HEAD. Do not call `gh run list` manually. All-green → Stop approves with CI summary in next-turn context. Failure → Stop blocks with run names + IDs; investigate via `gh run view <id> --log-failed`, fix, push, hook re-watches. Deadline 180s (override `GM_CI_WATCH_SECS`); slow jobs get a "still in progress" approve.
91
85
 
92
- ## HYGIENE SWEEP
86
+ ## Hygiene sweep
93
87
 
94
- Before declaring complete:
95
88
  1. Files >200 lines → split
96
89
  2. Comments in code → remove
97
- 3. Scattered test files (.test.js, .spec.js, __tests__/, fixtures/, mocks/) → delete, consolidate into root test.js
98
- 4. Mock/stub/simulation files → delete
99
- 5. Unnecessary doc files (not CHANGELOG/CLAUDE/README/TODO.md) → delete
100
- 6. Duplicate concern → snake to `planning` with restructuring instructions
90
+ 3. Scattered test files (`.test.js`, `.spec.js`, `__tests__/`, `fixtures/`, `mocks/`) → delete, consolidate into root `test.js`
91
+ 4. Mock / stub / simulation files → delete
92
+ 5. Unnecessary doc files (not CHANGELOG, CLAUDE, README, TODO.md) → delete
93
+ 6. Duplicate concern → regress to `planning` with restructuring instructions
101
94
  7. Hardcoded values → derive from ground truth
102
- 8. Fallback/demo modes → remove, fail loud
103
- 9. TODO.md → empty/deleted
104
- 10. CHANGELOG.md → has entries for this session
95
+ 8. Fallback / demo modes → remove, fail loud
96
+ 9. TODO.md → empty or deleted
97
+ 10. CHANGELOG.md → entries for this session
105
98
  11. Observability gaps → server subsystems expose `/debug/<subsystem>`; client modules register in `window.__debug`
106
- 12. Memorize → every fact from verification handed off via background Agent(memorize) at moment of resolution
107
- 13. Deploy/publish → if deployable, deploy; if npm package, publish
99
+ 12. Memorize → every fact from verification handed off via background `Agent(memorize)` at moment of resolution
100
+ 13. Deploy / publish → if deployable, deploy; if npm package, publish
108
101
  14. GitHub Pages → check `.github/workflows/pages.yml` + `docs/index.html` exist; invoke `pages` skill if absent
109
- 15. Governance stress-suite → walk change through M1,F1,C1,H1,S1,B1,A1,D1; any flunk = regress to owning phase
110
-
111
- ## MEMORIZE
112
-
113
- ```
114
- Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
115
- ```
116
-
117
- One per fact, parallel, same turn resolved. End-of-turn self-check mandatory.
118
-
119
- ## COMPLETION DEFINITION
102
+ 15. Governance stress-suite → walk change through M1, F1, C1, H1, S1, B1, A1, D1; any flunk regresses to the owning phase
120
103
 
121
- All: witnessed e2e | browser_validated (when client work touched) | failure paths exercised | test.js passes | .prd deleted | git clean+pushed | CI green | hygiene sweep clean | TODO.md gone | CHANGELOG.md updated
104
+ ## Completion
122
105
 
123
- **Never**: claim done without witnessed output | claim done on a client change without browser-validation witness | stop while .prd has items | skip hygiene | skip test.js | uncommitted/unpushed work | stop at first green
106
+ All true at once: witnessed e2e | browser_validated when client work touched | failure paths exercised | test.js passes | `.prd` deleted | git clean and pushed | CI green | hygiene sweep clean | TODO.md gone | CHANGELOG.md updated.
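The pre-flight step in the browser validation gate (filter the `git diff --name-only origin/main..HEAD` list for browser-facing paths) can be sketched as follows. The regular expression is copied from the gate text; the function name `browserFacingPaths` is a hypothetical illustration, and the sketch takes the diff output as a string rather than running git itself.

```javascript
// Pattern taken from the pre-flight description in the gate text.
const BROWSER_FACING = /client\/|docs\/|\.html$|\.glsl$|\.frag$|\.vert$/;

// diffOutput is the raw text printed by `git diff --name-only origin/main..HEAD`.
function browserFacingPaths(diffOutput) {
  return diffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && BROWSER_FACING.test(line));
}
```

Under the gate as written, a non-empty result combined with no `exec:browser` block in the session would mean regression to `gm-execute`. Note that this pattern covers only part of the trigger list; files imported by a browser entry, for example, would need a separate check.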
@@ -0,0 +1,19 @@
1
+ ---
2
+ name: gm-copilot-cli
3
+ description: AI-native software engineering via skill-driven orchestration on copilot-cli; bootstraps plugkit for task execution and session isolation
4
+ allowed-tools: Skill
5
+ ---
6
+
7
+ # GM — copilot-cli Platform
8
+
9
+ AI-native software engineering orchestrated via skill chain: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS.
10
+
11
+ **Bootstrap pattern**: `bun x gm-plugkit@latest --daemon` downloads the correct platform binary, verifies SHA256, and starts the spool watcher daemon. Call once at session start; idempotent on subsequent calls. All execution routes through the file-spool: write to `.gm/exec-spool/in/<lang>/<N>.<ext>` or `in/<verb>/<N>.txt`, poll `out/<N>.json` for results.
12
+
13
+ **Session-ID threading (no session-start hook)**: At skill invoke time, generate or detect SESSION_ID (env var `SESSION_ID` or `uuid()`). Pass `sessionId: "<id>"` in every rs-exec RPC body (spawn, tail, watch, etc.) and every spool-written task body. All task-scoped cleanup (deleteTask, getTask, appendOutput, killSessionTasks) requires matching sessionId. Absence is forbidden — hard reject by rs-exec handler.
14
+
15
+ **Spool dispatch surface**: Write to `.gm/exec-spool/in/<lang>/<N>.<ext>` (languages: nodejs, python, bash, typescript, go, rust, c, cpp, java, deno) or `in/<verb>/<N>.txt` (verbs: codesearch, recall, memorize, wait, sleep, status, close, browser, runner, etc.). Watcher executes and streams `out/<N>.out` (stdout) + `out/<N>.err` (stderr) line-by-line, then `out/<N>.json` metadata (exitCode, durationMs, timedOut, startedAt, endedAt) at completion.
16
+
17
+ **End-to-end skill chaining (skills-based platforms)**: When gm SKILL.md includes `end-to-end: true`, adapter detects signal and parses stdout for trailing JSON: `{"nextSkill": "...", "context": {...}, "phase": "..."}`. If nextSkill is non-null, invoke `Skill(skill="gm:<nextSkill>")` with context dict, repeat until null. This auto-chains 5 invocations into 1 user invocation.
18
+
19
+ Every task returns complete: taskId, exitCode, durationMs, timedOut, stdout, stderr. Background tasks return immediately with task_id; continue with exec:tail, exec:watch, or exec:close.
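The session-ID threading rule in this file (take `SESSION_ID` from the environment or generate a UUID, then attach `sessionId` to every request body) might be sketched like this. `resolveSessionId` and `withSession` are hypothetical helper names, and the `op` field in the usage note is an invented placeholder; the real rs-exec RPC body shape is not shown in the source.

```javascript
const crypto = require('crypto');

// Uses the SESSION_ID environment variable when set, otherwise a fresh UUID.
function resolveSessionId(env = process.env) {
  return env.SESSION_ID || crypto.randomUUID();
}

// Returns a copy of the body with sessionId attached; a missing id fails loudly,
// mirroring the hard reject the handler is described as performing.
function withSession(body, sessionId) {
  if (!sessionId) throw new Error('sessionId is required');
  return { ...body, sessionId };
}
```

For example, `withSession({ op: 'spawn' }, resolveSessionId())` yields a body that carries the same id for spawn, tail, and watch calls, provided the caller resolves the id once and reuses it.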
@@ -0,0 +1,19 @@
1
+ ---
2
+ name: gm-cursor
3
+ description: AI-native software engineering via skill-driven orchestration on cursor; bootstraps plugkit for task execution and session isolation
4
+ allowed-tools: Skill
5
+ ---
6
+
7
+ # GM — cursor Platform
8
+
9
+ AI-native software engineering orchestrated via skill chain: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS.
10
+
11
+ **Bootstrap pattern**: `bun x gm-plugkit@latest --daemon` downloads the correct platform binary, verifies SHA256, and starts the spool watcher daemon. Call once at session start; idempotent on subsequent calls. All execution routes through the file-spool: write to `.gm/exec-spool/in/<lang>/<N>.<ext>` or `in/<verb>/<N>.txt`, poll `out/<N>.json` for results.
12
+
13
+ **Session-ID threading (no session-start hook)**: At skill invoke time, generate or detect SESSION_ID (env var `SESSION_ID` or `uuid()`). Pass `sessionId: "<id>"` in every rs-exec RPC body (spawn, tail, watch, etc.) and every spool-written task body. All task-scoped cleanup (deleteTask, getTask, appendOutput, killSessionTasks) requires matching sessionId. Absence is forbidden — hard reject by rs-exec handler.
14
+
15
+ **Spool dispatch surface**: Write to `.gm/exec-spool/in/<lang>/<N>.<ext>` (languages: nodejs, python, bash, typescript, go, rust, c, cpp, java, deno) or `in/<verb>/<N>.txt` (verbs: codesearch, recall, memorize, wait, sleep, status, close, browser, runner, etc.). Watcher executes and streams `out/<N>.out` (stdout) + `out/<N>.err` (stderr) line-by-line, then `out/<N>.json` metadata (exitCode, durationMs, timedOut, startedAt, endedAt) at completion.
16
+
17
+ **End-to-end skill chaining (skills-based platforms)**: When gm SKILL.md includes `end-to-end: true`, adapter detects signal and parses stdout for trailing JSON: `{"nextSkill": "...", "context": {...}, "phase": "..."}`. If nextSkill is non-null, invoke `Skill(skill="gm:<nextSkill>")` with context dict, repeat until null. This auto-chains 5 invocations into 1 user invocation.
18
+
19
+ Every task returns complete: taskId, exitCode, durationMs, timedOut, stdout, stderr. Background tasks return immediately with task_id; continue with exec:tail, exec:watch, or exec:close.
@@ -3,85 +3,68 @@ name: gm-emit
3
3
  description: EMIT phase. Pre-emit debug, write files, post-emit verify from disk. Any new unknown triggers immediate snake back to planning — restart chain.
4
4
  ---
5
5
 
6
- # GM EMIT — Write and Verify
6
+ # GM EMIT — Write and verify from disk
7
7
 
8
- GRAPH: `PLAN → EXECUTE → [EMIT] → VERIFY → COMPLETE`
9
- Entry: all mutables KNOWN. From `gm-execute` or re-entered from VERIFY.
8
+ Entry: every mutable KNOWN, from `gm-execute` or re-entered from VERIFY. Exit: gates clear → `gm-complete`.
10
9
 
11
- ## TRANSITIONS
10
+ Cross-cutting dispositions live in `gm` SKILL.md.
12
11
 
13
- **EXIT → VERIFY**: all gate conditions true → invoke `gm-complete` immediately.
14
- **SELF-LOOP**: post-emit variance with known cause → fix, re-verify, stay in EMIT.
15
- **REGRESS → EXECUTE**: pre-emit reveals known logic error.
16
- **REGRESS → PLAN**: pre-emit reveals new unknown | post-emit variance with unknown cause | scope changed.
12
+ ## Transitions
17
13
 
18
- ## LEGITIMACY GATE (before pre-emit run)
14
+ - All gates clear → `gm-complete`
15
+ - Post-emit variance with known cause → fix in-band, re-verify, stay in EMIT
16
+ - Pre-emit reveals known logic error → `gm-execute`
17
+ - Pre-emit reveals new unknown OR post-emit variance with unknown cause OR scope changed → `planning`
19
18
 
20
- For every claim landing in a file:
21
- 1. **Earned specificity** — traces to `authorization=witnessed`, not inflated from weak prior?
22
- 2. **Repair legality** — local patch dressed as structural repair? Downgrade scope or snake to PLAN.
23
- 3. **Lawful downgrade** — can a weaker, true statement replace it? PREFER the downgrade.
24
- 4. **Alternative-route suppression** — live competing route being silenced? Preserve it.
25
- 5. **Strongest objection** — if a reviewer pushed back on this change, what would the sharpest argument be? Articulate it. Cannot articulate = have not understood the alternatives = regress to `gm-execute`.
19
+ ## Legitimacy gate (before pre-emit run)
26
20
 
27
- Fail any → regress to `gm-execute` to witness what was missing, or `planning` if gap is structural.
21
+ For every claim landing in a file, answer five questions:
28
22
 
29
- ## PRE-EMIT RUN (mandatory before writing any file)
23
+ 1. Earned specificity — does it trace to `authorization=witnessed`, or is it inflated from a weak prior?
24
+ 2. Repair legality — is a local patch dressed as structural repair? Downgrade scope or regress to PLAN.
25
+ 3. Lawful downgrade — can a weaker, true statement replace it? Prefer the downgrade.
26
+ 4. Alternative-route suppression — is a live competing route being silenced? Preserve it.
27
+ 5. Strongest objection — what would the sharpest reviewer pushback be? Articulate it. Cannot articulate = have not understood the alternatives → `gm-execute`.
28
+
29
+ Any failure regresses to `gm-execute` to witness what was missing, or `planning` if the gap is structural.
30
+
31
+ ## Pre-emit run
32
+
33
+ Mandatory before writing any file. Write the probe to the spool (`.gm/exec-spool/in/nodejs/<N>.js`):
30
34
 
31
35
  ```
32
- exec:nodejs
33
36
  const { fn } = await import('/abs/path/to/module.js');
34
37
  console.log(await fn(realInput));
35
38
  ```
36
39
 
37
- 1. Import actual module from disk → witness current behavior as baseline
38
- 2. Run proposed logic in isolation WITHOUT writing — witness with real inputs
39
- 3. Probe failure paths with real error inputs
40
- 4. Compare: matches expected → write. Unexpected → new unknown → `planning`.
41
-
42
- ## WRITING FILES
40
+ Import the actual module from disk to witness current behavior as the baseline. Run the proposed logic in isolation without writing — witness with real inputs and with real error inputs. Match expected → write. Unexpected → new unknown → `planning`.
43
41
 
44
- `exec:nodejs` with `require('fs')`. Write only when every gate mutable resolved simultaneously.
42
+ ## Writing
45
43
 
46
- ## POST-EMIT VERIFICATION (immediately after writing)
44
+ Use the Write tool, or a nodejs spool file with `require('fs')`. Write only when every gate mutable resolves simultaneously.
47
45
 
48
- 1. Re-import from disk (not in-memory — stale is inadmissible)
49
- 2. Run identical inputs as pre-emit — must match pre-emit baseline exactly
50
- 3. Known variance → fix immediately, re-verify (EMIT self-loop)
51
- 4. Unknown variance → new unknown → invoke `planning`
52
-
53
- ## GATE CONDITIONS (all true simultaneously)
54
-
55
- - Legitimacy gate passed; none of five refused collapses
56
- - Pre-emit passed with real inputs + error inputs
57
- - Post-emit matches pre-emit exactly
58
- - Hot reloadable; errors throw with context (no fallbacks, `|| default`, `catch { return null }`)
59
- - No mocks/fakes/stubs/scattered test files (delete on discovery)
60
- - Behavior change in this emit = a corresponding assertion in test.js (a change no test would catch is a change you cannot prove)
61
- - If this emit changes any browser-facing code (client/, served HTML/JS, shaders, page-bundle imports, gh-pages assets), the post-emit verify MUST include a live browser witness via `exec:browser` (boot server → page.goto → page.evaluate asserting the invariant the change established). Node-side import + test.js does NOT satisfy this — see `gm-complete` BROWSER VALIDATION GATE.
62
- - Files ≤200 lines
63
- - No duplicate concern (run exec:codesearch for primary concern after writing; any overlap → `planning`)
64
- - No comments; no hardcoded values; no adjectives in identifiers; no unnecessary files
65
- - Observability: new server subsystems expose `/debug/<subsystem>`; new client modules in `window.__debug`
66
- - Structure: no if/else where dispatch table suffices; no one-liners that require decoding; no reinvented APIs
67
- - All facts resolved this phase memorized via background Agent(memorize)
68
- - CHANGELOG.md updated; TODO.md cleared/deleted
46
+ ## Post-emit verification
69
47
 
70
- ## CODE EXECUTION
48
+ Re-import from disk — in-memory state is stale and inadmissible. Run identical inputs as pre-emit; output must match the baseline exactly. Known variance → fix and re-verify (self-loop). Unknown variance → `planning`.
71
49
 
72
- `exec:<lang>` only. File writes via exec:nodejs + require('fs'). Never Bash(node/npm/npx/bun).
73
- Pack runs: Promise.allSettled, each idea own try/catch, under 12s per call.
50
+ ## Mutables gate
74
51
 
75
- ## CODEBASE SEARCH
52
+ Before pre-emit run, read `.gm/mutables.yml`. Any entry with `status: unknown` → regress to `gm-execute`. The pre-tool-use hook hard-blocks Write/Edit/NotebookEdit while unresolved entries exist; trying to emit anyway returns deny. Zero unresolved is the precondition for every legitimacy question below.
76
53
 
77
- `exec:codesearch` only. Grep/Glob/Find/Explore = hook-blocked. Known path → `Read`.
54
+ ## Gate (all true at once)
78
55
 
79
- ## MEMORIZE
80
-
81
- ```
82
- Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
83
- ```
84
-
85
- Same turn as resolution. Parallel when multiple. End-of-turn self-check mandatory.
86
-
87
- **Never**: write before pre-emit | advance with post-emit variance | absorb surprises | respond to user mid-phase
56
+ - `.gm/mutables.yml` empty/absent OR every entry `status: witnessed` with filled `witness_evidence`
57
+ - Legitimacy gate passed; no refused collapse
58
+ - Pre-emit passed with real inputs and real error inputs
59
+ - Post-emit matches pre-emit exactly
60
+ - Hot-reloadable; errors throw with context (no `|| default`, no `catch { return null }`, no fallbacks)
61
+ - No mocks, fakes, stubs, or scattered test files (delete on discovery)
62
+ - Any behavior change has a corresponding assertion in `test.js`; a change no test catches is a change you cannot prove
63
+ - Browser-facing change → post-emit verify includes a live `exec:browser` witness (boot server → `page.goto` → `page.evaluate` asserting the invariant the change established). Node-side import + test.js does not satisfy this — the final gate runs again in `gm-complete`.
64
+ - Files ≤200 lines
65
+ - No duplicate concern (run `exec:codesearch` for the primary concern after writing; overlap → `planning`)
66
+ - No comments, no hardcoded values, no adjectives in identifiers, no unnecessary files
67
+ - Observability: new server subsystems expose `/debug/<subsystem>`; new client modules register in `window.__debug`
68
+ - Structure: no if/else where dispatch suffices; no one-liners that obscure; no reinvented APIs
69
+ - Every fact resolved this phase memorized via background `Agent(memorize)`
70
+ - CHANGELOG.md updated; TODO.md cleared or deleted
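The first gate condition above (mutables file empty or absent, or every entry `status: witnessed` with filled `witness_evidence`) can be expressed as a small predicate. This is a sketch under assumptions: it takes already-parsed entries as plain objects, since the YAML layout of `.gm/mutables.yml` is not shown in the source, and `unresolvedMutables` and `emitAllowed` are hypothetical names.

```javascript
// entries: parsed entries from .gm/mutables.yml (parsing is not shown here).
// An absent or empty file is represented by undefined or an empty array.
function unresolvedMutables(entries) {
  return (entries || []).filter(
    (entry) => entry.status !== 'witnessed' || !entry.witness_evidence
  );
}

// True when nothing blocks the emit: no entries, or all witnessed with evidence.
function emitAllowed(entries) {
  return unresolvedMutables(entries).length === 0;
}
```

Returning the list of offending entries rather than a bare boolean lets the caller name exactly which mutables force the regression to `gm-execute`.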
@@ -3,123 +3,86 @@ name: gm-execute
3
3
  description: EXECUTE phase AND the foundational execution contract for every skill. Every exec:<lang> run, every witnessed check, every code search, in every phase, follows this skill's discipline. Resolve all mutables via witnessed execution. Any new unknown triggers immediate snake back to planning — restart chain from PLAN.
4
4
  ---
5
5
 
6
- # GM EXECUTE — Resolve Every Unknown
6
+ # GM EXECUTE — Resolve every unknown by witness
7
7
 
8
- GRAPH: `PLAN → [EXECUTE] → EMIT → VERIFY → COMPLETE`. Entry: .prd with named unknowns.
8
+ Entry: `.prd` with named unknowns. Exit: every mutable KNOWN → invoke `gm-emit`.
9
9
 
10
- This skill = execution contract for ALL phases. About to run anything → load this first.
10
+ A `@<discipline>` sigil propagates from PLAN through every recall, codesearch, and memorize call; reads without one fan across default plus enabled disciplines, writes without one go to default only.
11
11
 
12
- ## TRANSITIONS
12
+ This skill is the execution contract for ALL phases — pre-emit witnesses, post-emit verifies, e2e checks all run on this discipline. Cross-cutting dispositions live in `gm` SKILL.md.
13
13
 
14
- - **EXIT → EMIT**: all mutables KNOWN → invoke `gm-emit`.
15
- - **SELF-LOOP**: still UNKNOWN → re-run different angle (max 2 passes).
16
- - **REGRESS → PLAN**: new unknown | unresolvable after 2 passes.
14
+ ## Transitions
17
15
 
18
- ## MUTABLE DISCIPLINE
16
+ - All mutables KNOWN → `gm-emit`
17
+ - Still UNKNOWN → re-run from a different angle (max 2 passes)
18
+ - New unknown OR unresolvable after 2 passes → `planning`
19
19
 
20
- Each mutable: name | expected | current | resolution method.
20
+ ## Mutable discipline
21
21
 
22
- Resolves to KNOWN only when ALL four pass:
23
- - **ΔS=0** — witnessed output equals expected
24
- - **λ≥2** — two independent paths agree
25
- - **ε intact** — adjacent invariants hold
26
- - **Coverage≥0.70** — enough corpus inspected
22
+ Each mutable carries: name, expected, current, resolution method.
27
23
 
28
- Unresolved after 2 passes = regress to `planning`. Never narrate past an unresolved mutable.
24
+ Resolves to KNOWN only when all four pass:
29
25
 
30
- ## PRIORS DON'T AUTHORIZE
26
+ - **ΔS = 0** — witnessed output equals expected
27
+ - **λ ≥ 2** — two independent paths agree
28
+ - **ε intact** — adjacent invariants hold
29
+ - **Coverage ≥ 0.70** — enough corpus inspected to rule out contradiction
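The four gates above can be read as one predicate. The following is a minimal sketch under stated assumptions, not the package's implementation: the record fields (`expected`, `witnessed`, `agreeingPaths`, `invariantsHold`, `coverage`) are hypothetical names chosen for illustration, and equality is checked naively via `JSON.stringify`.

```javascript
// Hypothetical mutable record; none of these field names come from the package.
function isKnown(mutable) {
  // ΔS = 0: witnessed output equals expected (naive structural comparison).
  const deltaZero =
    JSON.stringify(mutable.witnessed) === JSON.stringify(mutable.expected);
  // λ ≥ 2: at least two independent paths agree.
  const twoPaths = mutable.agreeingPaths >= 2;
  // ε intact: adjacent invariants still hold.
  const invariants = mutable.invariantsHold === true;
  // Coverage ≥ 0.70: enough of the corpus was inspected.
  const covered = mutable.coverage >= 0.70;
  return deltaZero && twoPaths && invariants && covered;
}

const example = {
  expected: 200,
  witnessed: 200,
  agreeingPaths: 2,
  invariantsHold: true,
  coverage: 0.75,
};
console.log(isKnown(example));
console.log(isKnown({ ...example, agreeingPaths: 1 }));
```

A record that fails any single gate stays UNKNOWN, which matches the "all four pass" wording.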
31
30
 
32
- Route candidates from PLAN = `weak_prior` only. Plausibility = right to TEST, not BELIEVE.
33
- weak_prior → witnessed probe → witnessed → feed to EMIT. "The plan says" / "obviously X" = prior, not fact.
31
+ Unresolved after 2 passes regresses to `planning`. Never narrate past an unresolved mutable.
34
32
 
35
- Claims in response prose stand or fall by their last witness. A claim with no witness in this session is a hypothesis, not a finding; say so when you state it, and say what would settle it. The next reader (you, next turn) needs to know which lines were earned and which were carried forward.
33
+ Every witness that resolves a mutable writes back to `.gm/mutables.yml` the same step: set `status: witnessed` and fill `witness_evidence` with concrete proof (file:line, codesearch hit, exec output snippet). No write-back = the mutable stays unknown and the EMIT-gate stays closed. The hook reads this file; the agent's memory of "I resolved it" does not unblock anything.
36
34
 
37
- ## VERIFICATION BUDGET
35
+ Route candidates from PLAN are `weak_prior` only. Plausibility is the right to test, not the right to believe. A claim with no witness in the current session is a hypothesis — say so when stating it, and say what would settle it. The next reader (you, next turn) needs to know which lines were earned and which were carried forward.
38
36
 
39
- Spend on `.prd` items in descending order of consequence-if-wrong × distance-from-witnessed. Items whose failure would collapse the headline finding must reach witnessed status before EMIT; items with sub-argument-level consequence need at minimum a stated fallback path.
37
+ ## Verification budget
40
38
 
41
- ## CODE EXECUTION
39
+ Spend on `.prd` items in descending order of consequence-if-wrong × distance-from-witnessed. Items whose failure would collapse the headline finding must reach witnessed status before EMIT; sub-argument-level items need at minimum a stated fallback path.
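One way to picture the ordering rule is a sort on the product of the two factors. This is only a sketch: the numeric `consequence` and `distance` scores (assumed to lie in [0, 1]) and the item names are hypothetical, since the text does not say how either factor is quantified.

```javascript
// Hypothetical scoring: both factors are assumed to be numbers in [0, 1].
function orderByBudget(items) {
  // Copy before sorting so the caller's array is left untouched.
  return [...items].sort(
    (a, b) => b.consequence * b.distance - a.consequence * a.distance
  );
}

const prdItems = [
  { name: 'log format', consequence: 0.2, distance: 0.9 },
  { name: 'auth flow', consequence: 0.9, distance: 0.8 },
  { name: 'cache ttl', consequence: 0.5, distance: 0.5 },
];

const ordered = orderByBudget(prdItems).map((item) => item.name);
console.log(ordered);
```

Items at the front of the list are the ones that should reach witnessed status first.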
42
40
 
43
- `exec:<lang>` only via Bash: `exec:<lang>\n<code>`
41
+ ## Code execution
44
42
 
45
- Langs: `nodejs` (default) | `bash` | `python` | `typescript` | `go` | `rust` | `c` | `cpp` | `java` | `deno` | `cmd`
43
+ Code AND utility verbs both run through the file-spool. Write a file to `.gm/exec-spool/in/<lang-or-verb>/<N>.<ext>` — language stems (`in/nodejs/42.js`, `in/python/43.py`, `in/bash/44.sh`, plus typescript, go, rust, c, cpp, java, deno) or verb stems (`in/codesearch/45.txt`, `in/recall/46.txt`, `in/memorize/47.md`, plus wait, sleep, status, close, browser, runner, type, kill-port, forget, feedback, learn-status, learn-debug, learn-build, discipline, pause, health). The spool watcher executes and streams stdout to `out/<N>.out`, stderr to `out/<N>.err`, then writes `out/<N>.json` metadata sidecar at completion (taskId, lang, ok, exitCode, durationMs, timedOut, startedAt, endedAt). Both streams return as systemMessage with `--- stdout ---` / `--- stderr ---` separators. File I/O via a nodejs spool file + `require('fs')`. Only `git` and `gh` run directly in Bash. Never `Bash(node/npm/npx/bun)`, never `Bash(exec:<anything>)`.
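The write-then-poll cycle described above might look like the following from the submitting side. This is a hedged sketch: `submitTask` and `waitForResult` are hypothetical helper names, how `N` is allocated is not stated in the text, and the polling interval is arbitrary. Only the `in/<stem>/<N>.<ext>` and `out/<N>.json` paths come from the description.

```javascript
const fs = require('fs');
const path = require('path');

// Write a task body into the spool and return the metadata path to poll.
// `spoolRoot` would normally be `.gm/exec-spool`; choosing `n` is left to the caller.
function submitTask(spoolRoot, stem, n, ext, body) {
  const inDir = path.join(spoolRoot, 'in', stem);
  fs.mkdirSync(inDir, { recursive: true });
  fs.writeFileSync(path.join(inDir, `${n}.${ext}`), body);
  return path.join(spoolRoot, 'out', `${n}.json`);
}

// Poll until the metadata sidecar appears or the deadline passes.
async function waitForResult(metaPath, timeoutMs = 30000, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (fs.existsSync(metaPath)) {
      try {
        return JSON.parse(fs.readFileSync(metaPath, 'utf8'));
      } catch {
        // The file may be mid-write; fall through and retry.
      }
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return null; // No metadata yet; the task may still be running.
}
```

A caller would use something like `submitTask('.gm/exec-spool', 'nodejs', 42, 'js', source)` and then await `waitForResult` on the returned path. A `null` result means the sidecar has not appeared yet, not that the task failed.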
46
44
 
47
- File I/O: exec:nodejs + require('fs'). Git directly in Bash. **Never** Bash(node/npm/npx/bun).
45
+ Pack runs: `Promise.allSettled`, each idea in its own try/catch, under 12s per call. Runner: write `in/runner/<N>.txt` with body `start` | `stop` | `status`.
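A pack run along these lines could be sketched as below. The helper names `withTimeout` and `packRun` and the `{ name, run }` probe shape are assumptions made for illustration; only `Promise.allSettled`, the per-idea try/catch, and the 12s ceiling come from the text. Note that the timeout only stops waiting on a probe. It does not cancel the underlying work.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run independent probes in parallel; each has its own try/catch and time limit.
async function packRun(probes, limitMs = 12000) {
  const settled = await Promise.allSettled(
    probes.map(async ({ name, run }) => {
      try {
        return { name, ok: true, value: await withTimeout(run(), limitMs) };
      } catch (error) {
        return { name, ok: false, error: String(error.message || error) };
      }
    })
  );
  // Every probe catches its own failure, so rejected entries are not expected;
  // the fallback branch keeps the result shape uniform if one slips through.
  return settled.map((entry) =>
    entry.status === 'fulfilled'
      ? entry.value
      : { name: 'unknown', ok: false, error: String(entry.reason) }
  );
}
```

One failing or slow idea then shows up as `ok: false` in its own slot and does not abort the rest of the pack.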
48
46
 
49
- Pack runs: Promise.allSettled parallel, each idea own try/catch, under 12s per call.
50
- Runner: `exec:runner\nstart|stop|status`
47
+ Every exec daemonizes. The hook tails the task logfile up to 30s wall-clock and returns whatever is there — short tasks complete inside the window and look synchronous; long tasks return a task_id with partial output. Continue with `exec:tail` (drain, bounded), `exec:watch` (resume blocking until match or timeout), or `exec:close` (terminate). Never re-spawn a long task to check on it — that orphans the first one. `exec:wait` is a pure timer; `exec:sleep` blocks on a specific task's output; `exec:watch` is the match-or-timeout primitive. Every execution-platform RPC returns the live list of running tasks for this session — close stragglers via `exec:close\n<id>` so the list stays scannable. Session-end (clear/logout/prompt_input_exit) kills the session's tasks; compaction/handoff preserves them.
51
48
 
52
- ## CODEBASE SEARCH
49
+ Every utility verb dispatches via `in/<verb>/<N>.txt`; the body of the file is the verb's argument. There is no inline form and no Bash-prefix form — both are denied by the hook.
53
50
 
54
- `exec:codesearch` only. Grep/Glob/Find/Explore/grep/rg/find = hook-blocked.
51
+ ## Codebase search
55
52
 
56
- Known absolute path → `Read`. Known dir → exec:nodejs + fs.readdirSync.
53
+ `exec:codesearch` only. Grep, Glob, Find, Explore, raw grep/rg/find inside `exec:bash` are all hook-blocked.
57
54
 
58
55
  ```
59
56
  exec:codesearch
60
57
  <two-word query>
61
58
  ```
62
59
 
63
- Iterate: change/add one word per pass. Min 4 attempts before concluding absent.
64
-
65
- ## IMPORT-BASED EXECUTION
66
-
67
- Always import actual modules. Reimplemented = UNKNOWN.
68
-
69
- ```
70
- exec:nodejs
71
- const { fn } = await import('/abs/path/to/module.js');
72
- console.log(await fn(realInput));
73
- ```
74
-
75
- Differential diagnosis: smallest reproduction → compare actual vs expected → name the delta = mutable.
76
-
77
- ## CI — AUTOMATED
60
+ Start with two words, change or add one per pass, and make at least four attempts before concluding absent. Known absolute path → `Read`. Known directory → `exec:nodejs` + `fs.readdirSync`.
78
61
 
79
- `git push` → Stop hook auto-watches Actions for pushed HEAD. Same-repo only — downstream cascades not auto-watched.
80
- - Green → Stop approves with summary
81
- - Failure → run names+IDs → `gh run view <id> --log-failed`
82
- - Deadline 180s (override `GM_CI_WATCH_SECS`)
62
+ ## Utility verb failure handling
83
63
 
84
- ## GROUND TRUTH
64
+ **Utility verb failures must surface**: exec:memorize, exec:recall, exec:codesearch, and other utility verbs may fail (socket unavailable, timeout, network error). Failures do not block witness completion but must be reported to the user with error context. Fallback mechanisms (AGENTS.md for memorize) ensure memory preservation even when rs-learn is temporarily unavailable.
85
65
 
86
- Real services, real data, real timing. Mocks/stubs/scattered tests/fallbacks = delete.
66
+ ## Import-based execution
87
67
 
88
- **Scan before edit**: exec:codesearch before creating/modifying. Duplicate concern = regress to `planning`.
89
- **Hypothesize via execution**: hypothesis → run → witness → edit. Never edit on unwitnessed assumption.
90
- **Code quality**: native → library → structure (map/pipeline) → write.
91
-
92
- ## PARALLEL SUBAGENTS
93
-
94
- ≤3 `gm:gm` subagents for independent items in ONE message. Browser escalation: exec:browser → browser skill → screenshot last resort.
95
-
96
- ## RECALL — HARD RULE
97
-
98
- Before resolving any new unknown via fresh execution, recall first.
68
+ Hypotheses become real by importing actual modules from disk. Reimplemented behavior is UNKNOWN. Write the import probe to the spool:
99
69
 
100
70
  ```
101
- exec:recall
102
- <2-6 word query>
71
+ // write to .gm/exec-spool/in/nodejs/42.js
72
+ const { fn } = await import('/abs/path/to/module.js');
73
+ console.log(await fn(realInput));
103
74
  ```
104
75
 
105
- Triggers: "did we hit this" | feels familiar | new sub-task in known project | about to comment a non-obvious choice | about to ask user something likely discussed.
76
+ Differential diagnosis: smallest reproduction → compare actual vs expected → name the delta; that delta is the mutable.
106
77
 
107
- Hits = weak_prior; still witness. Empty = proceed. Capped 6s, ~5ms when serve running. ~200 tokens / 5 hits.
108
-
109
- ## MEMORIZE — HARD RULE
110
-
111
- Unknown→known = same-turn memorize.
112
-
113
- ```
114
- Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
115
- ```
78
+ ## Edits depend on witnesses
116
79
 
117
- Triggers: exec output answers prior unknown | CI log reveals root cause | code read confirms/refutes | env quirk | user states preference/constraint.
80
+ Hypothesis → run → witness → edit. An edit before a witness is a guess. Scan via `exec:codesearch` before creating or modifying — duplicate concern regresses to `planning`. Code-quality preference: native → library → structure → write.
118
81
 
119
- N facts → N parallel Agent calls in ONE message. End-of-turn self-check mandatory.
82
+ ## Parallel subagents
120
83
 
121
- ## CONSTRAINTS
84
+ Up to 3 `gm:gm` subagents for independent items in one message. Browser escalation: `exec:browser` → `browser` skill → screenshot only as last resort.
122
85
 
123
- **Never**: Bash(node/npm/npx/bun) | fake data | mocks | scattered tests | fallbacks | Grep/Glob/Find/Explore | sequential independent items | respond mid-phase | edit before witnessing | duplicate code | if/else where dispatch suffices | one-liners that obscure | reinvent native/library
86
+ ## CI is automated
124
87
 
125
- **Always**: witness every hypothesis | import real modules | scan before edit | regress on new unknown | delete mocks/comments/scattered tests on discovery | update test.js for behavior changes | invoke next skill immediately when done | weight verification by load
88
+ `git push` triggers the Stop hook to watch Actions for the pushed HEAD on the same repo (downstream cascades are not auto-watched). Green → Stop approves with summary; failure → run names + IDs surfaced, investigate via `gh run view <id> --log-failed`. Deadline 180s (override `GM_CI_WATCH_SECS`).
@@ -0,0 +1,19 @@
1
+ ---
2
+ name: gm-gc
3
+ description: AI-native software engineering via skill-driven orchestration on gc; bootstraps plugkit for task execution and session isolation
4
+ allowed-tools: Skill
5
+ ---
6
+
7
+ # GM — gc Platform
8
+
9
+ AI-native software engineering orchestrated via skill chain: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS.
10
+
11
+ **Bootstrap pattern**: `bun x gm-plugkit@latest --daemon` downloads the correct platform binary, verifies SHA256, and starts the spool watcher daemon. Call once at session start; idempotent on subsequent calls. All execution routes through the file-spool: write to `.gm/exec-spool/in/<lang>/<N>.<ext>` or `in/<verb>/<N>.txt`, poll `out/<N>.json` for results.
12
+
13
+ **Session-ID threading (no session-start hook)**: At skill invoke time, generate or detect SESSION_ID (env var `SESSION_ID` or `uuid()`). Pass `sessionId: "<id>"` in every rs-exec RPC body (spawn, tail, watch, etc.) and every spool-written task body. All task-scoped cleanup (deleteTask, getTask, appendOutput, killSessionTasks) requires matching sessionId. Absence is forbidden — hard reject by rs-exec handler.
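Session-ID threading as described could be sketched like this. The helper names `resolveSessionId` and `withSession` are hypothetical, and Node's `crypto.randomUUID` stands in for the `uuid()` mentioned in the text. The exact body schema accepted by rs-exec is not specified here, so only the `sessionId` key is taken from the source.

```javascript
const { randomUUID } = require('crypto');

// Reuse SESSION_ID from the environment when present, otherwise mint a new one.
function resolveSessionId(env = process.env) {
  return env.SESSION_ID || randomUUID();
}

// Attach the session id to an RPC or spool task body; other fields are untouched.
function withSession(body, sessionId) {
  return { ...body, sessionId };
}
```

Resolving the id once at skill invoke time and passing every body through `withSession` keeps the id consistent across spawn, tail, watch, and cleanup calls.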
14
+
15
+ **Spool dispatch surface**: Write to `.gm/exec-spool/in/<lang>/<N>.<ext>` (languages: nodejs, python, bash, typescript, go, rust, c, cpp, java, deno) or `in/<verb>/<N>.txt` (verbs: codesearch, recall, memorize, wait, sleep, status, close, browser, runner, etc.). Watcher executes and streams `out/<N>.out` (stdout) + `out/<N>.err` (stderr) line-by-line, then `out/<N>.json` metadata (exitCode, durationMs, timedOut, startedAt, endedAt) at completion.
16
+
17
+ **End-to-end skill chaining (skills-based platforms)**: When gm SKILL.md includes `end-to-end: true`, adapter detects signal and parses stdout for trailing JSON: `{"nextSkill": "...", "context": {...}, "phase": "..."}`. If nextSkill is non-null, invoke `Skill(skill="gm:<nextSkill>")` with context dict, repeat until null. This auto-chains 5 invocations into 1 user invocation.
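The chaining loop above could be approximated as follows. This is a sketch under assumptions: it treats the trailing JSON as occupying the final line of stdout, `invokeSkill` is a hypothetical stand-in for the platform's `Skill(...)` call, and `maxHops` is an illustrative guard that the text does not mention.

```javascript
// Parse the last line of stdout as the chaining signal, or return null.
function parseTrailingSignal(stdout) {
  const lines = stdout.trimEnd().split('\n');
  const last = lines[lines.length - 1];
  try {
    const parsed = JSON.parse(last);
    return parsed && typeof parsed === 'object' && 'nextSkill' in parsed
      ? parsed
      : null;
  } catch {
    return null; // Final line is not JSON; no chaining signal.
  }
}

// Follow nextSkill until it is null; `invokeSkill` is supplied by the caller.
async function runChain(firstSkill, invokeSkill, maxHops = 5) {
  const visited = [];
  let skill = firstSkill;
  let context = {};
  for (let hop = 0; skill && hop < maxHops; hop += 1) {
    visited.push(skill);
    const stdout = await invokeSkill(`gm:${skill}`, context);
    const signal = parseTrailingSignal(stdout);
    skill = signal ? signal.nextSkill : null;
    context = signal && signal.context ? signal.context : {};
  }
  return visited;
}
```

Output without a parsable trailing object ends the chain, the same as an explicit `"nextSkill": null`.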
18
+
19
+ Every task returns a complete record: taskId, exitCode, durationMs, timedOut, stdout, stderr. Background tasks return immediately with task_id; continue with exec:tail, exec:watch, or exec:close.