gm-kilo 2.0.631 → 2.0.633
- package/bin/plugkit-darwin-arm64 +0 -0
- package/bin/plugkit-darwin-x64 +0 -0
- package/bin/plugkit-linux-arm64 +0 -0
- package/bin/plugkit-linux-x64 +0 -0
- package/bin/plugkit-win32-arm64.exe +0 -0
- package/bin/plugkit-win32-x64.exe +0 -0
- package/bin/plugkit.exe +0 -0
- package/package.json +1 -1
- package/skills/browser/SKILL.md +24 -149
- package/skills/code-search/SKILL.md +14 -47
- package/skills/create-lang-plugin/SKILL.md +47 -105
- package/skills/gm/SKILL.md +20 -48
- package/skills/gm-complete/SKILL.md +54 -139
- package/skills/gm-emit/SKILL.md +49 -133
- package/skills/gm-execute/SKILL.md +52 -140
- package/skills/governance/SKILL.md +73 -98
- package/skills/planning/SKILL.md +54 -127
- package/skills/ssh/SKILL.md +10 -37
- package/skills/update-docs/SKILL.md +21 -74
package/skills/gm/SKILL.md
CHANGED
|
@@ -5,68 +5,40 @@ description: Agent (not skill) - immutable programming state machine. Always inv
|
|
|
5
5
|
|
|
6
6
|
# GM — Skill-First Orchestrator
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
Invoke `planning` skill immediately. Skill tool only — never Agent tool for skills.
|
|
9
9
|
|
|
10
|
-
|
|
10
|
+
## STATE MACHINE
|
|
11
11
|
|
|
12
|
-
|
|
12
|
+
Top of chain. No mutables resolved. Phases: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS.
|
|
13
|
+
Each phase loads protocols via Skill invocation only. Reading summary ≠ being in phase.
|
|
13
14
|
|
|
14
|
-
|
|
15
|
+
`gm-execute` = execution contract (all phases). `governance` = route/legitimacy reference (load once).
|
|
15
16
|
|
|
16
|
-
|
|
17
|
+
## MEMORIZE — HARD RULE
|
|
17
18
|
|
|
18
|
-
|
|
19
|
+
Unknown→known = memorize same turn it resolves. Background, non-blocking.
|
|
19
20
|
|
|
20
|
-
|
|
21
|
+
Triggers: exec: output answers prior unknown | code read confirms/refutes assumption | CI log reveals root cause | user states preference/constraint | fix worked for non-obvious reason | env quirk observed.
|
|
21
22
|
|
|
22
|
-
## FRAGILE LEARNINGS — HARD RULE
|
|
23
|
-
|
|
24
|
-
Every unknown→known transition in this session = fact that dies on compaction unless handed off **the same turn it resolves**. Not end of phase. Not end of chain. Same turn.
|
|
25
|
-
|
|
26
|
-
**Automatic trigger** — spawn `memorize` the moment any of these happens:
|
|
27
|
-
- An `exec:` run's output answers an earlier "let me check" / "I don't know yet"
|
|
28
|
-
- A code read confirms or refutes an assumption
|
|
29
|
-
- A CI log reveals a root cause
|
|
30
|
-
- User states a preference, constraint, deadline, or decision
|
|
31
|
-
- A fix worked for a non-obvious reason
|
|
32
|
-
- A tool / environment quirk bit once (blocked commands, path oddities, platform gotchas)
|
|
33
|
-
|
|
34
|
-
**Invocation** (background, non-blocking, continue working in the same message):
|
|
35
23
|
```
|
|
36
|
-
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<
|
|
24
|
+
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
|
|
37
25
|
```
|
|
38
26
|
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
**End-of-turn self-check** (mandatory before handing control back): scan the turn for resolved unknowns that were not memorized. Any found → spawn them now, parallel, before the response closes.
|
|
42
|
-
|
|
43
|
-
Resolve an unknown, skip memorize = memory leak. Treat it as a bug, not a style choice.
|
|
44
|
-
|
|
45
|
-
## USER DONE TALKING
|
|
46
|
-
|
|
47
|
-
User gave task. User waiting. Not co-pilot — person whose time you conserve by running chain end-to-end. Mid-chain questions ("should I proceed?", "which approach?", "look right before continue?") = chain breaks. Every break forces user back into loop they offloaded.
|
|
48
|
-
|
|
49
|
-
Unknown resolution order — fixed:
|
|
50
|
-
1. **Code execution** — witnessed run (`exec:<lang>`, `exec:codesearch`, import real module). Covers 90%+ of mutables.
|
|
51
|
-
2. **Web** — `WebFetch` / `WebSearch` for API docs, spec PDFs, library versions, framework conventions. Covers environment facts not in this codebase.
|
|
52
|
-
3. **User** — only after code and web exhausted. Only for genuinely ambiguous scope that makes planning impossible, or destructive-irreversible decisions (force-push, drop prod table, publish). Not for preferences resolvable from existing code conventions.
|
|
53
|
-
|
|
54
|
-
An unknown that could fall to step 1 or 2 is not a clarifying question — it is a missed run. "Want me to..." or "Should I..." mid-chain = invoke next skill instead.
|
|
55
|
-
|
|
56
|
-
Clarification allowed at top of chain (before first `planning`) when scope is genuinely unreadable. After chain starts: policy carries it.
|
|
27
|
+
Multiple facts → parallel Agent calls in ONE message. End-of-turn: scan for un-memorized resolutions → spawn now.
|
|
57
28
|
|
|
58
|
-
|
|
59
|
-
- `planning` skill → `gm-execute` skill → `gm-emit` skill → `gm-complete` skill → `update-docs` skill
|
|
60
|
-
- `memorize` sub-agent — background only, non-sequential. `Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<what was learned>')`
|
|
29
|
+
## EXECUTION ORDER
|
|
61
30
|
|
|
62
|
-
|
|
31
|
+
1. Code execution (exec:<lang>, exec:codesearch) — 90%+ of unknowns
|
|
32
|
+
2. Web (WebFetch/WebSearch) — env facts not in codebase
|
|
33
|
+
3. User — only when 1+2 exhausted AND decision is destructive-irreversible
|
|
63
34
|
|
|
64
|
-
|
|
35
|
+
"Should I..." mid-chain = invoke next skill instead.
|
|
65
36
|
|
|
66
|
-
|
|
37
|
+
Skill chain: `planning` → `gm-execute` → `gm-emit` → `gm-complete` → `update-docs`
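One way to picture the chain is as an ordered list with a lookup for the next skill. This is only a sketch: the skill names come from the line above, while `CHAIN` and `nextSkill` are hypothetical names and not part of the package.

```javascript
// Skill names copied from the chain above; the array itself is illustrative.
const CHAIN = ['planning', 'gm-execute', 'gm-emit', 'gm-complete', 'update-docs'];

// Returns the skill that follows `current`, or null at the end of the chain.
// Throws on a name that is not in the chain so a typo fails loudly.
function nextSkill(current) {
  const index = CHAIN.indexOf(current);
  if (index === -1) throw new Error(`unknown skill: ${current}`);
  return CHAIN[index + 1] ?? null;
}

console.log(nextSkill('gm-emit'));
```

The lookup covers only the forward path; regressions to an earlier phase are described separately in each skill's TRANSITIONS section.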
|
|
67
38
|
|
|
68
|
-
|
|
39
|
+
exec:<lang> only. Never Bash(node/npm/npx/bun). git push = auto CI watch via Stop hook.
|
|
69
40
|
|
|
70
|
-
|
|
41
|
+
## RESPONSE POLICY
|
|
71
42
|
|
|
72
|
-
|
|
43
|
+
Terse. Drop filler. Fragments OK. Pattern: `[thing] [action] [reason]. [next step].`
|
|
44
|
+
Code/commits/PRs = normal prose. Security/destructive = drop terseness.
|
package/skills/gm-complete/SKILL.md
CHANGED
|
@@ -3,69 +3,34 @@ name: gm-complete
|
|
|
3
3
|
description: VERIFY and COMPLETE phase. End-to-end system verification and git enforcement. Any new unknown triggers immediate snake back to planning — restart chain.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# GM COMPLETE —
|
|
6
|
+
# GM COMPLETE — Verify and Complete
|
|
7
7
|
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
**GRAPH POSITION**: `PLAN → EXECUTE → EMIT → [VERIFY] → UPDATE-DOCS → COMPLETE`
|
|
11
|
-
- **Entry**: All EMIT gates passed. Entered from `gm-emit`.
|
|
12
|
-
|
|
13
|
-
## WHERE YOU ARE
|
|
14
|
-
|
|
15
|
-
Files written. Question now: does the whole system work end-to-end, and does the world outside local repo (CI, downstream pipelines, deployed surfaces) agree. Every check = witnessed execution: `node test.js`, `gh run watch`, diagnostic repros on failure. Contract in `gm-execute`; protocols not fresh → verification drifts to narrated claims ("change should work") over witnessed ones ("change produced output X"). Load first.
|
|
16
|
-
|
|
17
|
-
## VERIFICATION → UNKNOWNS
|
|
18
|
-
|
|
19
|
-
Failing test, red CI, surprising downstream cascade ≠ things to patch around. = new fault surfaces becoming visible. Classify failure:
|
|
20
|
-
- Wrong file output → regress to EMIT
|
|
21
|
-
- Wrong logic → regress to EXECUTE
|
|
22
|
-
- Genuinely new unknown or wrong requirement → regress to PLANNING
|
|
23
|
-
|
|
24
|
-
Let chain carry you. Stop-and-fix-here = how silent-failure bugs reach prod. Machine assumes you regress; trust it.
|
|
25
|
-
|
|
26
|
-
## FRAGILE LEARNINGS — HARD RULE
|
|
27
|
-
|
|
28
|
-
Phase where environment reality hits hardest — CI runner quirks, flaky-test patterns, timing thresholds, deploy cadences, cross-repo cascade behaviors. Highest-value memorization surface. Each fact → memorize **the same turn it resolves**, background, parallel when multiple:
|
|
29
|
-
|
|
30
|
-
```
|
|
31
|
-
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
|
|
32
|
-
```
|
|
33
|
-
|
|
34
|
-
One call per fact. **End-of-turn self-check** mandatory: any resolved unknown un-memorized → spawn before closing response. Full trigger contract in `planning` / `gm-execute`.
|
|
8
|
+
GRAPH: `PLAN → EXECUTE → EMIT → [VERIFY] → UPDATE-DOCS → COMPLETE`
|
|
9
|
+
Entry: all EMIT gates passed. From `gm-emit`.
|
|
35
10
|
|
|
36
11
|
## TRANSITIONS
|
|
37
12
|
|
|
38
|
-
**EXIT
|
|
39
|
-
|
|
40
|
-
**
|
|
41
|
-
|
|
42
|
-
**
|
|
43
|
-
- Verification reveals broken file output → invoke `gm-emit` skill, reset to EMIT state, re-verify on return
|
|
44
|
-
- Verification reveals logic error → invoke `gm-execute` skill, reset to EXECUTE state, re-emit and re-verify on return
|
|
45
|
-
- Verification reveals new unknown → invoke `planning` skill, reset to PLAN state
|
|
46
|
-
- Verification reveals wrong requirements → invoke `planning` skill, reset to PLAN state
|
|
47
|
-
|
|
48
|
-
**TRIAGE on failure**: broken file output → regress to `gm-emit` | wrong logic → regress to `gm-execute` | new unknown or wrong requirements → regress to `planning`
|
|
49
|
-
|
|
50
|
-
**RULE**: Any surprise = new unknown = regress to `planning`. Never patch around surprises.
|
|
51
|
-
|
|
52
|
-
## MUTABLE DISCIPLINE
|
|
13
|
+
**EXIT → EXECUTE**: .prd items remain → invoke `gm-execute` immediately.
|
|
14
|
+
**EXIT → COMPLETE**: .prd deleted + test.js passes + pushed + CI green → invoke `update-docs`.
|
|
15
|
+
**REGRESS → EMIT**: broken file output.
|
|
16
|
+
**REGRESS → EXECUTE**: logic wrong.
|
|
17
|
+
**REGRESS → PLAN**: new unknown or wrong requirements.
|
|
53
18
|
|
|
54
|
-
|
|
55
|
-
- `git_clean=UNKNOWN` until `exec:bash\ngit status --porcelain` returns empty
|
|
56
|
-
- `git_pushed=UNKNOWN` until `git log origin/main..HEAD --oneline` returns empty
|
|
57
|
-
- `ci_passed=UNKNOWN` until all GitHub Actions runs triggered by the push reach `conclusion: success`
|
|
58
|
-
- `prd_empty=UNKNOWN` until `.gm/prd.yml` is deleted (not just empty — file must not exist)
|
|
59
|
-
- `stress_suite_clear=UNKNOWN` until the change has been mentally walked through every applicable case in the `governance` stress suite (M1, F1, C1, H1, S1, B1, A1, D1) and none flunks. Flunk = regress to the phase that owns the gap.
|
|
60
|
-
- `hidden_decision_posture=open` until CI green. Posture advances `open → down_weighted` only when some evidence is in, `down_weighted → closed` only when CI green + stress suite clear. Closing early = collapse #3 (hidden orchestration into public law).
|
|
19
|
+
Failure triage: broken output → EMIT | wrong logic → EXECUTE | new unknown → PLAN. Never patch around surprises.
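The triage rule reads naturally as a dispatch table. The sketch below assumes failures have already been classified into hypothetical keys (`broken_output`, `wrong_logic`, and so on); only the three regression targets come from the skill text.

```javascript
// Hypothetical failure classes mapped to the skill that owns the fix.
const REGRESSION_TARGET = Object.freeze({
  broken_output: 'gm-emit',
  wrong_logic: 'gm-execute',
  new_unknown: 'planning',
  wrong_requirements: 'planning',
});

// Returns the skill to re-enter for a classified failure.
// An unclassified failure is treated as a surprise, that is, a new unknown.
function triage(failureClass) {
  return Object.hasOwn(REGRESSION_TARGET, failureClass)
    ? REGRESSION_TARGET[failureClass]
    : 'planning';
}

console.log(triage('wrong_logic'));
```

Defaulting to `planning` mirrors "never patch around surprises": anything that cannot be named is routed back to the top of the chain rather than fixed in place.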
|
|
61
20
|
|
|
62
|
-
|
|
21
|
+
## MUTABLES — ALL MUST RESOLVE BEFORE COMPLETE
|
|
63
22
|
|
|
64
|
-
|
|
23
|
+
- `witnessed_e2e` — real end-to-end run with witnessed output
|
|
24
|
+
- `git_clean` — `git status --porcelain` returns empty
|
|
25
|
+
- `git_pushed` — `git log origin/main..HEAD --oneline` returns empty
|
|
26
|
+
- `ci_passed` — all GitHub Actions runs reach `conclusion: success`
|
|
27
|
+
- `prd_empty` — `.gm/prd.yml` deleted (file must not exist)
|
|
28
|
+
- `stress_suite_clear` — change walked through all applicable governance stress cases (M1-D1), none flunk
|
|
29
|
+
- `hidden_decision_posture` — advances open→down_weighted→closed only when CI green + stress suite clear
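A minimal sketch of how these mutables could be tracked, assuming each one starts as UNKNOWN and flips to `true` once witnessed. The object shape, the `UNKNOWN` symbol, and the reading that `hidden_decision_posture` counts as resolved only at `closed` are assumptions for illustration, not the package's implementation.

```javascript
const UNKNOWN = Symbol('UNKNOWN');

// Mutable names are taken from the list above; everything else is illustrative.
function createMutables() {
  return {
    witnessed_e2e: UNKNOWN,
    git_clean: UNKNOWN,
    git_pushed: UNKNOWN,
    ci_passed: UNKNOWN,
    prd_empty: UNKNOWN,
    stress_suite_clear: UNKNOWN,
    hidden_decision_posture: 'open',
  };
}

// Lists every mutable that still blocks COMPLETE.
// Assumption: boolean mutables resolve at true, the posture resolves at 'closed'.
function unresolved(mutables) {
  return Object.entries(mutables)
    .filter(([name, value]) =>
      name === 'hidden_decision_posture' ? value !== 'closed' : value !== true)
    .map(([name]) => name);
}

const mutables = createMutables();
mutables.git_clean = true;
console.log(unresolved(mutables));
```

COMPLETE would be reachable only when `unresolved` returns an empty array.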
|
|
65
30
|
|
|
66
|
-
|
|
31
|
+
## END-TO-END VERIFICATION
|
|
67
32
|
|
|
68
|
-
NOT verification: docs updates,
|
|
33
|
+
Run real system, real data, witness actual output. NOT verification: docs updates, saying done, screenshots alone.
|
|
69
34
|
|
|
70
35
|
```
|
|
71
36
|
exec:nodejs
|
|
@@ -73,115 +38,65 @@ const { fn } = await import('/abs/path/to/module.js');
|
|
|
73
38
|
console.log(await fn(realInput));
|
|
74
39
|
```
|
|
75
40
|
|
|
76
|
-
|
|
77
|
-
1. Identify which subsystem produced the unexpected output
|
|
78
|
-
2. Reproduce the failure in isolation (single function, single module)
|
|
79
|
-
3. Name the delta between expected and actual — this is the mutable
|
|
80
|
-
4. Triage: broken file output → regress to EMIT | wrong logic → regress to EXECUTE | new unknown → regress to PLAN
|
|
81
|
-
5. Never fix a symptom without identifying and fixing the root cause
|
|
82
|
-
|
|
83
|
-
For browser/UI: invoke `browser` skill with real workflows. Server + client features require both exec:nodejs AND browser diagnostics. After every success: enumerate what remains — never stop at first green. First green is not COMPLETE.
|
|
41
|
+
Browser/UI: invoke `browser` skill. After every success: enumerate what remains — never stop at first green.
|
|
84
42
|
|
|
85
43
|
## INTEGRATION TEST GATE
|
|
86
44
|
|
|
87
|
-
Before git enforcement, run the project's `test.js` if it exists:
|
|
88
|
-
|
|
89
45
|
```
|
|
90
46
|
exec:nodejs
|
|
91
47
|
const { execSync } = require('child_process');
|
|
92
|
-
try { execSync('node test.js', { stdio: 'inherit', timeout: 30000 }); console.log('
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
Failure = regression to `gm-execute`. Do not proceed to git enforcement with failing tests.
|
|
96
|
-
|
|
97
|
-
If `test.js` does not exist and the project has testable surface, regress to `gm-execute` to create it.
|
|
98
|
-
|
|
99
|
-
## CODE EXECUTION
|
|
100
|
-
|
|
101
|
-
**exec:<lang> is the only way to run code.** Bash tool body: `exec:<lang>\n<code>`
|
|
102
|
-
|
|
103
|
-
`exec:nodejs` (default) | `exec:bash` | `exec:python` | `exec:typescript` | `exec:go` | `exec:rust` | `exec:java` | `exec:deno` | `exec:cmd`
|
|
104
|
-
|
|
105
|
-
Only git in bash directly. Background tasks: `exec:sleep\n<id>`, `exec:status\n<id>`, `exec:close\n<id>`. Runner: `exec:runner\nstart|stop|status`.
|
|
106
|
-
|
|
107
|
-
**Execution efficiency — pack every run:**
|
|
108
|
-
- Combine multiple independent operations into one exec call using `Promise.allSettled` or parallel subprocess spawning
|
|
109
|
-
- Each independent idea gets its own try/catch with independent error reporting — never let one failure block another
|
|
110
|
-
- Target under 12s per exec call; split work across multiple calls only when dependencies require it
|
|
111
|
-
- Prefer a single well-structured exec that does 5 things over 5 sequential execs
|
|
112
|
-
|
|
113
|
-
## CODEBASE EXPLORATION — exec:codesearch ONLY
|
|
114
|
-
|
|
115
|
-
```
|
|
116
|
-
exec:codesearch
|
|
117
|
-
<two-word query>
|
|
48
|
+
try { execSync('node test.js', { stdio: 'inherit', timeout: 30000 }); console.log('PASS'); }
|
|
49
|
+
catch (e) { console.error('FAIL'); process.exit(1); }
|
|
118
50
|
```
|
|
119
51
|
|
|
120
|
-
|
|
52
|
+
Failure → regress to `gm-execute`. No test.js + testable surface → regress to `gm-execute` to create it.
|
|
121
53
|
|
|
122
54
|
## GIT ENFORCEMENT
|
|
123
55
|
|
|
124
56
|
```
|
|
125
57
|
exec:bash
|
|
126
58
|
git status --porcelain
|
|
127
|
-
```
|
|
128
|
-
Must return empty.
|
|
129
|
-
|
|
130
|
-
```
|
|
131
|
-
exec:bash
|
|
132
59
|
git log origin/main..HEAD --oneline
|
|
133
60
|
```
|
|
134
|
-
Must return empty. If not: stage → commit → push → re-verify. Local commit without push ≠ complete.
|
|
135
61
|
|
|
136
|
-
|
|
62
|
+
Both must return empty. Local commit without push ≠ complete.
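The two checks can be folded into one helper, sketched below. The git commands are the ones shown above; `GIT_CHECKS`, `passes`, and `gitEnforcement` are hypothetical names, and the injectable `run` parameter exists only so the logic can be exercised without a repository.

```javascript
const { execSync } = require('node:child_process');

// Commands copied from the section above, keyed by the mutable they resolve.
const GIT_CHECKS = {
  git_clean: 'git status --porcelain',
  git_pushed: 'git log origin/main..HEAD --oneline',
};

// A check passes only when the command prints nothing but whitespace.
function passes(output) {
  return output.trim() === '';
}

// Runs every check and reports a boolean per mutable.
// By default each command is executed with execSync in the current directory.
function gitEnforcement(run = (cmd) => execSync(cmd, { encoding: 'utf8' })) {
  return Object.fromEntries(
    Object.entries(GIT_CHECKS).map(([name, cmd]) => [name, passes(run(cmd))])
  );
}
```

Calling `gitEnforcement()` inside a repository would run both commands; a result of `{ git_clean: true, git_pushed: true }` corresponds to "both must return empty".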
|
|
137
63
|
|
|
138
|
-
|
|
64
|
+
## CI — AUTOMATED
|
|
139
65
|
|
|
140
|
-
|
|
141
|
-
-
|
|
142
|
-
-
|
|
143
|
-
-
|
|
66
|
+
Stop hook watches all GitHub Actions runs for the pushed HEAD. Do not call `gh run list` manually.
|
|
67
|
+
- All-green → Stop approves with CI summary in next turn context
|
|
68
|
+
- Failure → Stop blocks with run names+IDs → investigate with `gh run view <id> --log-failed`, fix, push, hook re-watches
|
|
69
|
+
- Deadline 180s (override `GM_CI_WATCH_SECS`) → slow jobs get "still in progress" approve
|
|
144
70
|
|
|
145
|
-
##
|
|
71
|
+
## HYGIENE SWEEP
|
|
146
72
|
|
|
147
|
-
Before declaring complete
|
|
73
|
+
Before declaring complete:
|
|
74
|
+
1. Files >200 lines → split
|
|
75
|
+
2. Comments in code → remove
|
|
76
|
+
3. Scattered test files (.test.js, .spec.js, __tests__/, fixtures/, mocks/) → delete, consolidate into root test.js
|
|
77
|
+
4. Mock/stub/simulation files → delete
|
|
78
|
+
5. Unnecessary doc files (not CHANGELOG/CLAUDE/README/TODO.md) → delete
|
|
79
|
+
6. Duplicate concern → snake to `planning` with restructuring instructions
|
|
80
|
+
7. Hardcoded values → derive from ground truth
|
|
81
|
+
8. Fallback/demo modes → remove, fail loud
|
|
82
|
+
9. TODO.md → empty/deleted
|
|
83
|
+
10. CHANGELOG.md → has entries for this session
|
|
84
|
+
11. Observability gaps → server subsystems expose `/debug/<subsystem>`; client modules register in `window.__debug`
|
|
85
|
+
12. Memorize → every fact from verification handed off via background Agent(memorize) at moment of resolution
|
|
86
|
+
13. Deploy/publish → if deployable, deploy; if npm package, publish
|
|
87
|
+
14. GitHub Pages → check `.github/workflows/pages.yml` + `docs/index.html` exist; invoke `pages` skill if absent
|
|
88
|
+
15. Governance stress-suite → walk change through M1,F1,C1,H1,S1,B1,A1,D1; any flunk = regress to owning phase
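Item 1 of the sweep is mechanical enough to script. The sketch below is one possible check, assuming the caller already has a list of file paths; `countLines` and `findOversized` are hypothetical helpers, and the 200-line limit is the one stated in the list.

```javascript
const fs = require('node:fs');

// Counts lines without treating a trailing newline as an extra empty line.
function countLines(text) {
  if (text === '') return 0;
  const parts = text.split('\n').length;
  return text.endsWith('\n') ? parts - 1 : parts;
}

// Returns the paths whose content exceeds `limit` lines.
// `read` is injectable so the check can be tried without touching disk.
function findOversized(paths, limit = 200, read = (p) => fs.readFileSync(p, 'utf8')) {
  return paths.filter((p) => countLines(read(p)) > limit);
}
```

Any path returned by `findOversized` would be a candidate for splitting before completion is declared.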
|
|
148
89
|
|
|
149
|
-
|
|
150
|
-
2. **Comments in code** → remove all
|
|
151
|
-
3. **Scattered test files** (.test.js, .spec.js, __tests__/, fixtures/, mocks/) → delete, consolidate coverage into root `test.js`
|
|
152
|
-
4. **Mock/stub/simulation files** → delete
|
|
153
|
-
5. **Unnecessary doc files** (not CHANGELOG/CLAUDE/README/TODO.md) → delete
|
|
154
|
-
6. **Duplicate concern** (overlapping responsibility, similar logic, parallel implementations, consolidatable code) → snake to `planning` with restructuring instructions — do not patch locally
|
|
155
|
-
7. **Hardcoded values** → derive from ground truth, config, or convention
|
|
156
|
-
8. **Fallback/demo modes** → remove, fail loud instead
|
|
157
|
-
9. **TODO.md** → must be empty/deleted before completion
|
|
158
|
-
10. **CHANGELOG.md** → must have entries for this session's changes
|
|
159
|
-
11. **Observability gaps** → every server subsystem added this session exposes a `/debug/<subsystem>` endpoint; every client module added this session registers into `window.__debug` by key. Ad-hoc console.log is not observability — permanent queryable structures are. Any gap found → fix before advancing.
|
|
160
|
-
12. **memorize** → every fact surfaced during verification that would have saved this session's time if it had been in memory at the start (CI timing, flaky-test patterns, environment quirks, runtime behaviors, user preferences stated this session) is handed off via a background memorize call at the moment of resolution. One call per fact, non-blocking. `Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')`
|
|
161
|
-
13. **Deploy/publish** → if deployable, deploy. If npm package, publish.
|
|
162
|
-
14. **GitHub Pages** → check if repo has a GH Pages site. If `.github/workflows/pages.yml` is absent OR `docs/index.html` is absent: invoke the `pages` skill to scaffold the site before advancing.
|
|
163
|
-
15. **Governance stress-suite sweep** (`governance`) — walk the finished change against every applicable case: M1 missing-evidence-forced-decision, F1 unsourced-number, C1 ambiguous-clause, H1 contradictory-witnesses, S1 attribution-under-pressure, B1 RCA-live-alternatives, A1 authenticity-partial-signals, D1 deploy-gate-under-flake. Ask per case: did the change over-commit, hide contradiction, or treat surface appearance as evidence? Any flunk = regress to the owning phase. The 8 legal outcomes must hold: illegal commitments=0, evidence-boundary violations=0, lawful downgrades available=8, outlier visibility preserved.
|
|
90
|
+
## MEMORIZE
|
|
164
91
|
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
All of: witnessed end-to-end output | all failure paths exercised | test.js passes | .gm/prd.yml empty | git clean and pushed | all CI runs green | codebase hygiene sweep clean | TODO.md empty/deleted | CHANGELOG.md updated | `user_steps_remaining=0`
|
|
170
|
-
|
|
171
|
-
## DO NOT STOP
|
|
172
|
-
|
|
173
|
-
After end-to-end verification passes: read `.gm/prd.yml` from disk. If any items remain, immediately invoke `gm-execute` skill — do not respond to the user. Only respond when `.gm/prd.yml` is deleted AND git is clean AND all commits are pushed.
|
|
174
|
-
|
|
175
|
-
## CONSTRAINTS
|
|
92
|
+
```
|
|
93
|
+
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
|
|
94
|
+
```
|
|
176
95
|
|
|
177
|
-
|
|
96
|
+
One per fact, parallel, same turn resolved. End-of-turn self-check mandatory.
|
|
178
97
|
|
|
179
|
-
|
|
98
|
+
## COMPLETION DEFINITION
|
|
180
99
|
|
|
181
|
-
|
|
100
|
+
All: witnessed e2e | failure paths exercised | test.js passes | .prd deleted | git clean+pushed | CI green | hygiene sweep clean | TODO.md gone | CHANGELOG.md updated
|
|
182
101
|
|
|
183
|
-
**
|
|
184
|
-
**EXIT → COMPLETE**: .prd deleted + feature work pushed + CI green → invoke `update-docs` skill.
|
|
185
|
-
**REGRESS → EMIT**: file output wrong → invoke `gm-emit` skill, reset to EMIT state.
|
|
186
|
-
**REGRESS → EXECUTE**: logic wrong → invoke `gm-execute` skill, reset to EXECUTE state.
|
|
187
|
-
**REGRESS → PLAN**: new unknown or wrong requirements → invoke `planning` skill, reset to PLAN state.
|
|
102
|
+
**Never**: claim done without witnessed output | stop while .prd has items | skip hygiene | skip test.js | uncommitted/unpushed work | stop at first green
|
package/skills/gm-emit/SKILL.md
CHANGED
|
@@ -3,91 +3,29 @@ name: gm-emit
|
|
|
3
3
|
description: EMIT phase. Pre-emit debug, write files, post-emit verify from disk. Any new unknown triggers immediate snake back to planning — restart chain.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# GM EMIT —
|
|
6
|
+
# GM EMIT — Write and Verify
|
|
7
7
|
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
**GRAPH POSITION**: `PLAN → EXECUTE → [EMIT] → VERIFY → COMPLETE`
|
|
11
|
-
- **Entry**: All .gm/prd.yml mutables resolved. Entered from `gm-execute` or via snake from VERIFY.
|
|
12
|
-
|
|
13
|
-
## WHERE YOU ARE
|
|
14
|
-
|
|
15
|
-
About to mutate on-disk state. Every write bracketed by two witnessed executions: pre-emit (import module from disk, run proposed logic in isolation, record expected outputs as baseline) and post-emit (re-import from disk, confirm identical output to pre-emit baseline). Both = executions. Contract in `gm-execute`. Protocols not fresh → runs drift to reimplementation + narrated assumption → write ships unfalsified. Load first.
|
|
16
|
-
|
|
17
|
-
## SURPRISE → STATE REGRESSION
|
|
18
|
-
|
|
19
|
-
Pre-emit unexpected output ≠ bug to patch in this phase. Classify:
|
|
20
|
-
- Identifiable logic error against a known mutable → regress to `gm-execute` (re-resolve the mutable properly, return here)
|
|
21
|
-
- Newly visible unknown (cause not nameable) → regress to `planning` (enumerate, let chain return you with complete mutable map)
|
|
22
|
-
|
|
23
|
-
Post-emit divergence from pre-emit baseline:
|
|
24
|
-
- Identified cause → known mutable → fix in place, re-verify (EMIT self-loop, zero variance before advancing)
|
|
25
|
-
- Unidentified cause → unknown → regress to `planning`
|
|
26
|
-
|
|
27
|
-
Urge to "just fix real quick" = signal mutable map was incomplete. Trust state machine: regress to correct phase, resolve, return.
|
|
28
|
-
|
|
29
|
-
## FRAGILE LEARNINGS — HARD RULE
|
|
30
|
-
|
|
31
|
-
Pre-emit and post-emit runs surface facts you lacked: actual function signatures, edge-case return values, adjacent-module interactions, hidden invariants. Each dies on compaction unless memorized **the same turn it resolves** — not at phase exit.
|
|
32
|
-
|
|
33
|
-
```
|
|
34
|
-
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
|
|
35
|
-
```
|
|
36
|
-
|
|
37
|
-
One call per fact, background, parallel when multiple resolve together. **End-of-turn self-check**: scan for un-memorized resolutions before closing response; spawn any missed. Same enforcement as `planning` / `gm-execute` — see those skills for the full trigger contract.
|
|
8
|
+
GRAPH: `PLAN → EXECUTE → [EMIT] → VERIFY → COMPLETE`
|
|
9
|
+
Entry: all mutables KNOWN. From `gm-execute` or re-entered from VERIFY.
|
|
38
10
|
|
|
39
11
|
## TRANSITIONS
|
|
40
12
|
|
|
41
|
-
**EXIT
|
|
42
|
-
|
|
43
|
-
**
|
|
44
|
-
|
|
45
|
-
**STATE REGRESSIONS**:
|
|
46
|
-
- Pre-emit reveals logic error (known mutable) → invoke `gm-execute` skill, reset to EXECUTE, return here after resolution
|
|
47
|
-
- Pre-emit reveals new unknown → invoke `planning` skill, reset to PLAN state
|
|
48
|
-
- Post-emit variance with unknown cause → invoke `planning` skill, reset to PLAN state
|
|
49
|
-
- Scope changed → invoke `planning` skill, reset to PLAN state
|
|
50
|
-
- Re-entered from VERIFY state (broken file output) → fix, re-verify, then re-invoke `gm-complete` skill
|
|
51
|
-
|
|
52
|
-
## MUTABLE DISCIPLINE
|
|
53
|
-
|
|
54
|
-
Each gate condition is a mutable. Pre-emit run witnesses expected value. Post-emit run witnesses current value. Zero variance = resolved. Variance with unknown cause = new unknown = snake to `planning`.
|
|
55
|
-
|
|
56
|
-
## CODE EXECUTION
|
|
57
|
-
|
|
58
|
-
**exec:<lang> is the only way to run code.** Bash tool body: `exec:<lang>\n<code>`
|
|
59
|
-
|
|
60
|
-
`exec:nodejs` (default) | `exec:bash` | `exec:python` | `exec:typescript` | `exec:go` | `exec:rust` | `exec:java` | `exec:deno` | `exec:cmd`
|
|
13
|
+
**EXIT → VERIFY**: all gate conditions true → invoke `gm-complete` immediately.
|
|
14
|
+
**SELF-LOOP**: post-emit variance with known cause → fix, re-verify, stay in EMIT.
|
|
15
|
+
**REGRESS → EXECUTE**: pre-emit reveals known logic error.
|
|
16
|
+
**REGRESS → PLAN**: pre-emit reveals new unknown | post-emit variance with unknown cause | scope changed.
|
|
61
17
|
|
|
62
|
-
|
|
18
|
+
## LEGITIMACY GATE (before pre-emit run)
|
|
63
19
|
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
20
|
+
For every claim landing in a file:
|
|
21
|
+
1. **Earned specificity** — traces to `authorization=witnessed`, not inflated from weak prior?
|
|
22
|
+
2. **Repair legality** — local patch dressed as structural repair? Downgrade scope or snake to PLAN.
|
|
23
|
+
3. **Lawful downgrade** — can a weaker, true statement replace it? PREFER the downgrade.
|
|
24
|
+
4. **Alternative-route suppression** — live competing route being silenced? Preserve it.
|
|
69
25
|
|
|
70
|
-
|
|
26
|
+
Fail any → regress to `gm-execute` to witness what was missing, or `planning` if gap is structural.
|
|
71
27
|
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
1. **Earned specificity** — does the claim trace to a witnessed mutable (`authorization=witnessed`), or is it inflated from a weak prior?
|
|
75
|
-
2. **Repair legality** — is this a local candidate repair being dressed up as a structural repair? If yes, either downgrade the scope or snake back to PLAN for structural work.
|
|
76
|
-
3. **Lawful downgrade option** — can the same file be written with a weaker, true statement instead of a stronger, unearned one? If yes, PREFER the downgrade. (A defensive default, a smaller claim, a conservative error path, an explicit `TODO: verify under load` — all are legal downgrades.)
|
|
77
|
-
4. **Alternative-route suppression** — is a live competing route being silenced to force closure? Preserve it (comment-free: as separate handler, separate field, separate branch that logs).
|
|
78
|
-
|
|
79
|
-
Fail any of 1–4 → this is not legitimate emission → regress to `gm-execute` to witness what was missing, or `planning` if the gap is structural.
|
|
80
|
-
|
|
81
|
-
**"Not every answer has earned the right to exist."** Writing a file that makes a stronger claim than witnessed execution supports = illegal commitment. The test is not "does it work?" — it is "did this answer earn its strength?"
|
|
82
|
-
|
|
83
|
-
## PRE-EMIT DIAGNOSTIC RUN (mandatory before writing any file)
|
|
84
|
-
|
|
85
|
-
The pre-emit run is a diagnostic pass. Its purpose is to falsify the write before it happens.
|
|
86
|
-
|
|
87
|
-
1. Import the actual module from disk via `exec:nodejs` — witness current on-disk behavior as the baseline
|
|
88
|
-
2. Run proposed logic in isolation WITHOUT writing — witness output with real inputs
|
|
89
|
-
3. Probe all failure paths with real error inputs — record expected vs actual for each
|
|
90
|
-
4. Compare: if proposed output matches expected → proceed to write. If not → new unknown, regress to `planning`.
|
|
28
|
+
## PRE-EMIT RUN (mandatory before writing any file)
|
|
91
29
|
|
|
92
30
|
```
|
|
93
31
|
exec:nodejs
|
|
@@ -95,74 +33,52 @@ const { fn } = await import('/abs/path/to/module.js');
|
|
|
95
33
|
console.log(await fn(realInput));
|
|
96
34
|
```
|
|
97
35
|
|
|
98
|
-
|
|
36
|
+
1. Import actual module from disk — witness current behavior as baseline
|
|
37
|
+
2. Run proposed logic in isolation WITHOUT writing — witness with real inputs
|
|
38
|
+
3. Probe failure paths with real error inputs
|
|
39
|
+
4. Compare: matches expected → write. Unexpected → new unknown → `planning`.
|
|
99
40
|
|
|
100
41
|
## WRITING FILES
|
|
101
42
|
|
|
102
|
-
`exec:nodejs` with `require('fs')`. Write only when every gate mutable
|
|
43
|
+
`exec:nodejs` with `require('fs')`. Write only when every gate mutable resolved simultaneously.
|
|
103
44
|
|
|
104
|
-
## POST-EMIT
|
|
45
|
+
## POST-EMIT VERIFICATION (immediately after writing)
|
|
105
46
|
|
|
106
|
-
|
|
47
|
+
1. Re-import from disk (not in-memory — stale is inadmissible)
|
|
48
|
+
2. Run identical inputs as pre-emit — must match pre-emit baseline exactly
|
|
49
|
+
3. Known variance → fix immediately, re-verify (EMIT self-loop)
|
|
50
|
+
4. Unknown variance → new unknown → invoke `planning`
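The comparison in steps 2 to 4 can be sketched as a small classifier. It assumes the pre-emit baseline and the post-emit output are plain values that can be compared structurally; `postEmitVerdict` and the `causeKnown` flag are hypothetical, standing in for the judgment of whether the delta has a named cause.

```javascript
const { isDeepStrictEqual } = require('node:util');

// Classifies a post-emit run against the pre-emit baseline.
// 'match'     -> output is identical to the baseline
// 'self-loop' -> variance with an identified cause: fix and re-verify in EMIT
// 'planning'  -> variance with no identified cause: treat as a new unknown
function postEmitVerdict(baseline, actual, causeKnown = false) {
  if (isDeepStrictEqual(baseline, actual)) return 'match';
  return causeKnown ? 'self-loop' : 'planning';
}

console.log(postEmitVerdict({ total: 3 }, { total: 3 }));
```

A `match` verdict satisfies only the post-emit gate condition; the remaining gate conditions still apply before moving on to `gm-complete`.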
|
|
107
51
|
|
|
108
|
-
|
|
109
|
-
2. Run identical inputs as pre-emit — output must match pre-emit witnessed values exactly
|
|
110
|
-
3. For browser: reload from disk, re-inject `__gm` globals, re-run, compare captured outputs to pre-emit baseline
|
|
111
|
-
4. Known variance (cause is identified, mutable is KNOWN) → fix immediately and re-verify
|
|
112
|
-
5. Unknown variance (delta exists but cause cannot be determined) → this is a new unknown → invoke `planning` skill, reset to PLAN state
|
|
52
|
+
## GATE CONDITIONS (all true simultaneously)
|
|
113
53
|
|
|
114
|
-
|
|
54
|
+
- Legitimacy gate passed; none of five refused collapses
|
|
55
|
+
- Pre-emit passed with real inputs + error inputs
|
|
56
|
+
- Post-emit matches pre-emit exactly
|
|
57
|
+
- Hot reloadable; errors throw with context (no fallbacks, `|| default`, `catch { return null }`)
|
|
58
|
+
- No mocks/fakes/stubs/scattered test files (delete on discovery)
|
|
59
|
+
- Files ≤200 lines
|
|
60
|
+
- No duplicate concern (run exec:codesearch for primary concern after writing; any overlap → `planning`)
|
|
61
|
+
- No comments; no hardcoded values; no adjectives in identifiers; no unnecessary files
|
|
62
|
+
- Observability: new server subsystems expose `/debug/<subsystem>`; new client modules in `window.__debug`
|
|
63
|
+
- Structure: no if/else where dispatch table suffices; no one-liners that require decoding; no reinvented APIs
|
|
64
|
+
- All facts resolved this phase memorized via background Agent(memorize)
|
|
65
|
+
- CHANGELOG.md updated; TODO.md cleared/deleted
|
|
115
66
|
|
|
116
|
-
|
|
117
|
-
- None of the five refused collapses (`governance`): route→authorization | candidate→structural | hidden→public-law | cleanliness→legitimacy | one-route→universal-closure
|
|
118
|
-
|
|
119
|
-
- Pre-emit debug passed with real inputs and error inputs
|
|
120
|
-
- Post-emit verification matches pre-emit exactly
|
|
121
|
-
- Hot reloadable: state outside reloadable modules, handlers swap atomically
|
|
122
|
-
- Errors throw with clear context — no fallbacks, demo modes, silent swallowing, `|| default`, `catch { return null }`
|
|
123
|
-
- No mocks/fakes/stubs/simulations/scattered test files anywhere — delete on discovery (only root test.js permitted)
|
|
124
|
-
- Files ≤200 lines — split immediately if over, do not advance
|
|
125
|
-
- No duplicate concern — after writing, run exec:codesearch for the primary concern. If ANY other code serves the same concern → do NOT advance, snake to `planning` with consolidation instructions
|
|
126
|
-
- No comments — remove any found
|
|
127
|
-
- No hardcoded values — dynamic/modular code using ground truth only
|
|
128
|
-
- No adjectives/descriptive language in variable/function names
|
|
129
|
-
- No unnecessary files — clean anything not required for the program to function
|
|
130
|
-
- Observability: every new server subsystem exposes a named inspection endpoint; every new client module registers into `window.__debug` by key and deregisters on unmount. Ad-hoc `console.log` is not observability — permanent queryable structure is. Any new code path not reachable via `window.__debug` or a `/debug/<subsystem>` endpoint → do NOT advance, add observability before writing feature code.
|
|
131
|
-
- Structural quality: if/else chains where a dispatch table or pipeline suffices → regress to `gm-execute` for restructuring. One-liners that compress logic at the cost of readability → expand. Any logic that reinvents a native API or library → replace with the native/library call. Structure must make wrong states unrepresentable — if it doesn't, it's not done.
|
|
132
|
-
- every fact resolved in this phase (pre-emit discoveries, post-emit surprises, newly-confirmed behaviors) has been handed off via a background memorize call at the moment of resolution: `Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<what was learned>')`
|
|
133
|
-
- CHANGELOG.md updated with changes
|
|
134
|
-
- TODO.md cleared or deleted
|
|
135
|
-
|
|
136
|
-
## CODEBASE EXPLORATION — exec:codesearch ONLY
|
|
137
|
-
|
|
138
|
-
```
|
|
139
|
-
exec:codesearch
|
|
140
|
-
<two-word query>
|
|
141
|
-
```
|
|
142
|
-
|
|
143
|
-
`Grep`, `Glob`, `Find`, `Explore` tools and `grep`/`rg`/`find` inside `exec:bash` are all hook-blocked. `exec:codesearch` is the single codebase-exploration tool — it handles exact strings, symbols, regex patterns, file-name fragments, and PDF pages. `Read` is available for a known absolute path. There is no third option. PDF pages are indexed alongside source; when verifying that emitted code matches a spec, search the PDF directly and cite `doc.pdf:<page>`.
|
|
144
|
-
|
|
145
|
-
## BROWSER DEBUGGING
|
|
146
|
-
|
|
147
|
-
Invoke `browser` skill. Escalation: (1) `exec:browser\n<js>` → (2) `browser` skill → (3) navigate/click → (4) screenshot last resort.
|
|
148
|
-
|
|
149
|
-
## SELF-CHECK (before and after each file)
|
|
150
|
-
|
|
151
|
-
File ≤200 lines | No duplicate concern | Pre-emit passed | No mocks | No comments | Docs match | All spotted issues fixed
|
|
67
|
+
## CODE EXECUTION
|
|
152
68
|
|
|
153
|
-
|
|
69
|
+
`exec:<lang>` only. File writes via exec:nodejs + require('fs'). Never Bash(node/npm/npx/bun).
|
|
70
|
+
Pack runs: Promise.allSettled, each idea own try/catch, under 12s per call.
|
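Reviewer note: the "pack runs" rule above (`Promise.allSettled`, each idea in its own try/catch, a time budget per call) might look roughly like this. It is a sketch under assumptions: `runIdea` and `runPack` are hypothetical names, and treating the budget as a per-idea timeout passed in by the caller is one possible reading of "under 12s per call".

```javascript
// Hypothetical wrapper: runs one idea with its own try/catch and a time budget.
async function runIdea(name, fn, budgetMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${name} exceeded ${budgetMs}ms`)), budgetMs);
  });
  try {
    return { name, value: await Promise.race([fn(), timeout]) };
  } catch (error) {
    // Rethrow with context so the settled result names the failing idea.
    throw new Error(`idea "${name}" failed: ${error.message}`);
  } finally {
    clearTimeout(timer);
  }
}

// Runs every idea concurrently; one failure never hides the other results.
function runPack(ideas, budgetMs) {
  return Promise.allSettled(ideas.map(([name, fn]) => runIdea(name, fn, budgetMs)));
}
```

A call such as `runPack([['parse', parseIdea], ['fetch', fetchIdea]], 12000)` (with hypothetical idea functions) would return one settled entry per idea, each either `fulfilled` with a value or `rejected` with a contextual error.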
|
154
71
|
|
|
155
|
-
|
|
72
|
+
## CODEBASE SEARCH
|
|
156
73
|
|
|
157
|
-
|
|
74
|
+
`exec:codesearch` only. Grep/Glob/Find/Explore = hook-blocked. Known path → `Read`.
|
|
158
75
|
|
|
159
|
-
|
|
76
|
+
## MEMORIZE
|
|
160
77
|
|
|
161
|
-
|
|
78
|
+
```
|
|
79
|
+
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
|
|
80
|
+
```
|
|
162
81
|
|
|
163
|
-
|
|
82
|
+
Same turn as resolution. Parallel when multiple. End-of-turn self-check mandatory.
|
|
164
83
|
|
|
165
|
-
**
|
|
166
|
-
**SELF-LOOP**: Known post-emit variance → fix, re-verify (remain in EMIT state).
|
|
167
|
-
**REGRESS → EXECUTE**: Known logic error → invoke `gm-execute` skill, reset to EXECUTE state.
|
|
168
|
-
**REGRESS → PLAN**: Any new unknown → invoke `planning` skill, reset to PLAN state.
|
|
84
|
+
**Never**: write before pre-emit | advance with post-emit variance | absorb surprises | respond to user mid-phase
|