npm - gm-kilo - Versions diffs - 2.0.631 → 2.0.633 - Mend

gm-kilo 2.0.631 → 2.0.633

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19) hide show

package/bin/plugkit-darwin-arm64 +0 -0
package/bin/plugkit-darwin-x64 +0 -0
package/bin/plugkit-linux-arm64 +0 -0
package/bin/plugkit-linux-x64 +0 -0
package/bin/plugkit-win32-arm64.exe +0 -0
package/bin/plugkit-win32-x64.exe +0 -0
package/bin/plugkit.exe +0 -0
package/package.json +1 -1
package/skills/browser/SKILL.md +24 -149
package/skills/code-search/SKILL.md +14 -47
package/skills/create-lang-plugin/SKILL.md +47 -105
package/skills/gm/SKILL.md +20 -48
package/skills/gm-complete/SKILL.md +54 -139
package/skills/gm-emit/SKILL.md +49 -133
package/skills/gm-execute/SKILL.md +52 -140
package/skills/governance/SKILL.md +73 -98
package/skills/planning/SKILL.md +54 -127
package/skills/ssh/SKILL.md +10 -37
package/skills/update-docs/SKILL.md +21 -74

package/skills/gm/SKILL.md CHANGED Viewed

@@ -5,68 +5,40 @@ description: Agent (not skill) - immutable programming state machine. Always inv
 # GM — Skill-First Orchestrator
-**Invoke the `planning` skill immediately.** Use the Skill tool with `skill: "planning"`.
+Invoke `planning` skill immediately. Skill tool only — never Agent tool for skills.
-**CRITICAL: Skills are invoked via the Skill tool ONLY. Do NOT use the Agent tool to load skills.**
+## STATE MACHINE
-## WHERE YOU ARE
+Top of chain. No mutables resolved. Phases: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS.
+Each phase loads protocols via Skill invocation only. Reading summary ≠ being in phase.
-Top of state machine. No mutables resolved, no files read, no phase protocols loaded. Each phase (PLAN, EXECUTE, EMIT, VERIFY, UPDATE-DOCS) carries its own protocols (mutable discipline, pre-emit diagnostic, post-emit verify, hygiene sweep, CI watch). Protocols enter context only when you invoke the corresponding Skill. Reading a summary ≠ being in them.
+`gm-execute` = execution contract (all phases). `governance` = route/legitimacy reference (load once).
-Transitions = state changes, not reminders. Phase exit condition met → next Skill invocation moves you. Without invocation: still in prior state, regardless of prose.
+## MEMORIZE — HARD RULE
-`gm-execute` = execution contract. Defines "running code" across every phase: `exec:<lang>` = only runner; `exec:codesearch` = only exploration; witnessed output = only ground truth; import real modules over reimplementation. Execution happens in every phase, not only EXECUTE. About to run anything, `gm-execute` protocols not fresh in context → operating outside contract → reload `gm-execute` first.
+Unknown→known = memorize same turn it resolves. Background, non-blocking.
-`governance` = governance reference. Route discovery (7 route families, 16 failure taxonomy) feeds `planning`. Weak-prior bridge (plausibility never equals authorization) constrains `gm-execute`. Legitimacy gate (earned specificity, lawful downgrade, five refused collapses) gates `gm-emit` and `gm-complete`. Load once at session start.
+Triggers: exec: output answers prior unknown | code read confirms/refutes assumption | CI log reveals root cause | user states preference/constraint | fix worked for non-obvious reason | env quirk observed.
-## FRAGILE LEARNINGS — HARD RULE
-Every unknown→known transition in this session = fact that dies on compaction unless handed off **the same turn it resolves**. Not end of phase. Not end of chain. Same turn.
-**Automatic trigger** — spawn `memorize` the moment any of these happens:
-- An `exec:` run's output answers an earlier "let me check" / "I don't know yet"
-- A code read confirms or refutes an assumption
-- A CI log reveals a root cause
-- User states a preference, constraint, deadline, or decision
-- A fix worked for a non-obvious reason
-- A tool / environment quirk bit once (blocked commands, path oddities, platform gotchas)
-**Invocation** (background, non-blocking, continue working in the same message):
 ```
-Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<single fact with enough context to be useful cold>')
+Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
 ```
-**Parallel**: multiple facts resolve in one turn → spawn multiple memorize agents in the **same message** (parallel tool blocks). One call per fact. Never serialize, never batch into one prompt.
-**End-of-turn self-check** (mandatory before handing control back): scan the turn for resolved unknowns that were not memorized. Any found → spawn them now, parallel, before the response closes.
-Resolve an unknown, skip memorize = memory leak. Treat it as a bug, not a style choice.
-## USER DONE TALKING
-User gave task. User waiting. Not co-pilot — person whose time you conserve by running chain end-to-end. Mid-chain questions ("should I proceed?", "which approach?", "look right before continue?") = chain breaks. Every break forces user back into loop they offloaded.
-Unknown resolution order — fixed:
-1. **Code execution** — witnessed run (`exec:<lang>`, `exec:codesearch`, import real module). Covers 90%+ of mutables.
-2. **Web** — `WebFetch` / `WebSearch` for API docs, spec PDFs, library versions, framework conventions. Covers environment facts not in this codebase.
-3. **User** — only after code and web exhausted. Only for genuinely ambiguous scope that makes planning impossible, or destructive-irreversible decisions (force-push, drop prod table, publish). Not for preferences resolvable from existing code conventions.
-An unknown that could fall to step 1 or 2 is not a clarifying question — it is a missed run. "Want me to..." or "Should I..." mid-chain = invoke next skill instead.
-Clarification allowed at top of chain (before first `planning`) when scope is genuinely unreadable. After chain starts: policy carries it.
+Multiple facts → parallel Agent calls in ONE message. End-of-turn: scan for un-memorized resolutions → spawn now.
-All work coordination, planning, execution, and verification happens through the skill tree starting with `planning`:
-- `planning` skill → `gm-execute` skill → `gm-emit` skill → `gm-complete` skill → `update-docs` skill
-- `memorize` sub-agent — background only, non-sequential. `Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<what was learned>')`
+## EXECUTION ORDER
-All code execution uses `exec:<lang>` via the Bash tool — never direct `Bash(node ...)` or `Bash(npm ...)`.
+1. Code execution (exec:<lang>, exec:codesearch) — 90%+ of unknowns
+2. Web (WebFetch/WebSearch) — env facts not in codebase
+3. User — only when 1+2 exhausted AND decision is destructive-irreversible
-**Every `git push` triggers GitHub Actions. CI is auto-watched by the Stop hook — outcomes (green / failed / still-running) appear in next-turn context. No manual `gh run watch` needed. `gh run view <id> --log-failed` is for on-demand failure diagnosis only. Full behaviour in `gm-execute`.**
+"Should I..." mid-chain = invoke next skill instead.
-Do not use `EnterPlanMode`. Do not run code directly via Bash. Invoke `planning` skill first.
+Skill chain: `planning` → `gm-execute` → `gm-emit` → `gm-complete` → `update-docs`
-## RESPONSE POLICY — ALWAYS ACTIVE
+exec:<lang> only. Never Bash(node/npm/npx/bun). git push = auto CI watch via Stop hook.
-Always terse. Technical substance stays. Fluff dies. Drop articles, filler, pleasantries, hedging. Fragments OK. Short synonyms. Technical terms exact. Pattern: `[thing] [action] [reason]. [next step].`
+## RESPONSE POLICY
-Code, commits, and PR descriptions write in normal prose. Security warnings, destructive confirmations, and genuinely ambiguous sequences also drop terseness. Everything else stays terse.
+Terse. Drop filler. Fragments OK. Pattern: `[thing] [action] [reason]. [next step].`
+Code/commits/PRs = normal prose. Security/destructive = drop terseness.

package/skills/gm-complete/SKILL.md CHANGED Viewed

@@ -3,69 +3,34 @@ name: gm-complete
 description: VERIFY and COMPLETE phase. End-to-end system verification and git enforcement. Any new unknown triggers immediate snake back to planning — restart chain.
 ---
-# GM COMPLETE — Verification and Completion
+# GM COMPLETE — Verify and Complete
-You are in the **VERIFY → COMPLETE** phase. Files are written. Prove the whole system works end-to-end. Any new unknown = snake to `planning`, restart chain.
-**GRAPH POSITION**: `PLAN → EXECUTE → EMIT → [VERIFY] → UPDATE-DOCS → COMPLETE`
-- **Entry**: All EMIT gates passed. Entered from `gm-emit`.
-## WHERE YOU ARE
-Files written. Question now: does the whole system work end-to-end, and does the world outside local repo (CI, downstream pipelines, deployed surfaces) agree. Every check = witnessed execution: `node test.js`, `gh run watch`, diagnostic repros on failure. Contract in `gm-execute`; protocols not fresh → verification drifts to narrated claims ("change should work") over witnessed ones ("change produced output X"). Load first.
-## VERIFICATION → UNKNOWNS
-Failing test, red CI, surprising downstream cascade ≠ things to patch around. = new fault surfaces becoming visible. Classify failure:
-- Wrong file output → regress to EMIT
-- Wrong logic → regress to EXECUTE
-- Genuinely new unknown or wrong requirement → regress to PLANNING
-Let chain carry you. Stop-and-fix-here = how silent-failure bugs reach prod. Machine assumes you regress; trust it.
-## FRAGILE LEARNINGS — HARD RULE
-Phase where environment reality hits hardest — CI runner quirks, flaky-test patterns, timing thresholds, deploy cadences, cross-repo cascade behaviors. Highest-value memorization surface. Each fact → memorize **the same turn it resolves**, background, parallel when multiple:
-```
-Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
-```
-One call per fact. **End-of-turn self-check** mandatory: any resolved unknown un-memorized → spawn before closing response. Full trigger contract in `planning` / `gm-execute`.
+GRAPH: `PLAN → EXECUTE → EMIT → [VERIFY] → UPDATE-DOCS → COMPLETE`
+Entry: all EMIT gates passed. From `gm-emit`.
 ## TRANSITIONS
-**EXIT — .gm/prd.yml items remain**: Verified items completed, .gm/prd.yml still has pending items → invoke `gm-execute` skill immediately (next wave). Do not stop.
-**EXIT — COMPLETE**: .gm/prd.yml empty + test.js passes + all work pushed + CI green → invoke `update-docs` skill.
-**STATE REGRESSIONS**:
-- Verification reveals broken file output → invoke `gm-emit` skill, reset to EMIT state, re-verify on return
-- Verification reveals logic error → invoke `gm-execute` skill, reset to EXECUTE state, re-emit and re-verify on return
-- Verification reveals new unknown → invoke `planning` skill, reset to PLAN state
-- Verification reveals wrong requirements → invoke `planning` skill, reset to PLAN state
-**TRIAGE on failure**: broken file output → regress to `gm-emit` | wrong logic → regress to `gm-execute` | new unknown or wrong requirements → regress to `planning`
-**RULE**: Any surprise = new unknown = regress to `planning`. Never patch around surprises.
-## MUTABLE DISCIPLINE
+**EXIT → EXECUTE**: .prd items remain → invoke `gm-execute` immediately.
+**EXIT → COMPLETE**: .prd deleted + test.js passes + pushed + CI green → invoke `update-docs`.
+**REGRESS → EMIT**: broken file output.
+**REGRESS → EXECUTE**: logic wrong.
+**REGRESS → PLAN**: new unknown or wrong requirements.
-- `witnessed_e2e=UNKNOWN` until real end-to-end run produces witnessed output
-- `git_clean=UNKNOWN` until `exec:bash\ngit status --porcelain` returns empty
-- `git_pushed=UNKNOWN` until `git log origin/main..HEAD --oneline` returns empty
-- `ci_passed=UNKNOWN` until all GitHub Actions runs triggered by the push reach `conclusion: success`
-- `prd_empty=UNKNOWN` until `.gm/prd.yml` is deleted (not just empty — file must not exist)
-- `stress_suite_clear=UNKNOWN` until the change has been mentally walked through every applicable case in the `governance` stress suite (M1, F1, C1, H1, S1, B1, A1, D1) and none flunks. Flunk = regress to the phase that owns the gap.
-- `hidden_decision_posture=open` until CI green. Posture advances `open → down_weighted` only when some evidence is in, `down_weighted → closed` only when CI green + stress suite clear. Closing early = collapse #3 (hidden orchestration into public law).
+Failure triage: broken output → EMIT | wrong logic → EXECUTE | new unknown → PLAN. Never patch around surprises.
-All must resolve to KNOWN (or `closed` for posture) before COMPLETE. Any UNKNOWN = absolute barrier.
+## MUTABLES — ALL MUST RESOLVE BEFORE COMPLETE
-## END-TO-END DIAGNOSTIC VERIFICATION
+- `witnessed_e2e` — real end-to-end run with witnessed output
+- `git_clean` — `git status --porcelain` returns empty
+- `git_pushed` — `git log origin/main..HEAD --oneline` returns empty
+- `ci_passed` — all GitHub Actions runs reach `conclusion: success`
+- `prd_empty` — `.gm/prd.yml` deleted (file must not exist)
+- `stress_suite_clear` — change walked through all applicable governance stress cases (M1-D1), none flunk
+- `hidden_decision_posture` — advances open→down_weighted→closed only when CI green + stress suite clear
-Run the real system with real data. Witness actual output. This is a full-system fault-detection pass.
+## END-TO-END VERIFICATION
-NOT verification: docs updates, status text, saying done, screenshots alone, marker files. Unwitnessed claims are inadmissible.
+Run real system, real data, witness actual output. NOT verification: docs updates, saying done, screenshots alone.
 ```
 exec:nodejs
@@ -73,115 +38,65 @@ const { fn } = await import('/abs/path/to/module.js');
 console.log(await fn(realInput));
 ```
-**Failure triage protocol**: when end-to-end fails, do not patch blindly. Isolate the fault:
-1. Identify which subsystem produced the unexpected output
-2. Reproduce the failure in isolation (single function, single module)
-3. Name the delta between expected and actual — this is the mutable
-4. Triage: broken file output → regress to EMIT | wrong logic → regress to EXECUTE | new unknown → regress to PLAN
-5. Never fix a symptom without identifying and fixing the root cause
-For browser/UI: invoke `browser` skill with real workflows. Server + client features require both exec:nodejs AND browser diagnostics. After every success: enumerate what remains — never stop at first green. First green is not COMPLETE.
+Browser/UI: invoke `browser` skill. After every success: enumerate what remains — never stop at first green.
 ## INTEGRATION TEST GATE
-Before git enforcement, run the project's `test.js` if it exists:
 ```
 exec:nodejs
 const { execSync } = require('child_process');
-try { execSync('node test.js', { stdio: 'inherit', timeout: 30000 }); console.log('test.js: PASS'); } catch (e) { console.error('test.js: FAIL'); process.exit(1); }
-```
-Failure = regression to `gm-execute`. Do not proceed to git enforcement with failing tests.
-If `test.js` does not exist and the project has testable surface, regress to `gm-execute` to create it.
-## CODE EXECUTION
-**exec:<lang> is the only way to run code.** Bash tool body: `exec:<lang>\n<code>`
-`exec:nodejs` (default) | `exec:bash` | `exec:python` | `exec:typescript` | `exec:go` | `exec:rust` | `exec:java` | `exec:deno` | `exec:cmd`
-Only git in bash directly. Background tasks: `exec:sleep\n<id>`, `exec:status\n<id>`, `exec:close\n<id>`. Runner: `exec:runner\nstart|stop|status`.
-**Execution efficiency — pack every run:**
-- Combine multiple independent operations into one exec call using `Promise.allSettled` or parallel subprocess spawning
-- Each independent idea gets its own try/catch with independent error reporting — never let one failure block another
-- Target under 12s per exec call; split work across multiple calls only when dependencies require it
-- Prefer a single well-structured exec that does 5 things over 5 sequential execs
-## CODEBASE EXPLORATION — exec:codesearch ONLY
-```
-exec:codesearch
-<two-word query>
+try { execSync('node test.js', { stdio: 'inherit', timeout: 30000 }); console.log('PASS'); }
+catch (e) { console.error('FAIL'); process.exit(1); }
 ```
-`Grep`, `Glob`, `Find`, `Explore`, and `grep`/`rg`/`find` inside `exec:bash` are all hook-blocked. `exec:codesearch` is the single codebase-exploration tool. `Read` is available for a known absolute path. PDFs in the repo are part of the same index — when verifying a change conforms to a published spec, search the spec PDF directly and cite `doc.pdf:<page>` as evidence. A verification that references a PDF without having searched it is unwitnessed.
+Failure → regress to `gm-execute`. No test.js + testable surface → regress to `gm-execute` to create it.
 ## GIT ENFORCEMENT
 ```
 exec:bash
 git status --porcelain
-```
-Must return empty.
-```
-exec:bash
 git log origin/main..HEAD --oneline
 ```
-Must return empty. If not: stage → commit → push → re-verify. Local commit without push ≠ complete.
-## CI ENFORCEMENT — AUTOMATED
+Both must return empty. Local commit without push ≠ complete.
-The Stop hook automatically watches GitHub Actions runs whose `headSha` matches the just-pushed HEAD. You do not call `gh run list` / `gh run watch` manually.
+## CI — AUTOMATED
-- All-green runs → Stop approves with a CI summary appended to context for the next turn.
-- Any failed / cancelled / timed-out / action_required run → Stop blocks with the failed run names + IDs in the reason. Treat that as a KNOWN mutable: investigate (`gh run view <id> --log-failed` only when explicitly diagnosing), fix the root cause, regress to the appropriate phase, push again — the hook re-watches automatically.
-- Watch deadline default 180s, override with `GM_CI_WATCH_SECS`. If the deadline passes with runs still in flight, Stop approves with "still in progress" so you do not block on slow Pages-deploy / npm-publish jobs forever.
-- Cascade awareness: if a push to this repo triggers downstream workflows in another repo, those are NOT automatically watched (only same-repo). Manual cascade check stays for those rare cases.
+Stop hook watches all GitHub Actions runs for the pushed HEAD. Do not call `gh run list` manually.
+- All-green → Stop approves with CI summary in next turn context
+- Failure → Stop blocks with run names+IDs → investigate with `gh run view <id> --log-failed`, fix, push, hook re-watches
+- Deadline 180s (override `GM_CI_WATCH_SECS`) → slow jobs get "still in progress" approve
-## CODEBASE HYGIENE SWEEP
+## HYGIENE SWEEP
-Before declaring complete, sweep the entire codebase for violations:
+Before declaring complete:
+1. Files >200 lines → split
+2. Comments in code → remove
+3. Scattered test files (.test.js, .spec.js, __tests__/, fixtures/, mocks/) → delete, consolidate into root test.js
+4. Mock/stub/simulation files → delete
+5. Unnecessary doc files (not CHANGELOG/CLAUDE/README/TODO.md) → delete
+6. Duplicate concern → snake to `planning` with restructuring instructions
+7. Hardcoded values → derive from ground truth
+8. Fallback/demo modes → remove, fail loud
+9. TODO.md → empty/deleted
+10. CHANGELOG.md → has entries for this session
+11. Observability gaps → server subsystems expose `/debug/<subsystem>`; client modules register in `window.__debug`
+12. Memorize → every fact from verification handed off via background Agent(memorize) at moment of resolution
+13. Deploy/publish → if deployable, deploy; if npm package, publish
+14. GitHub Pages → check `.github/workflows/pages.yml` + `docs/index.html` exist; invoke `pages` skill if absent
+15. Governance stress-suite → walk change through M1,F1,C1,H1,S1,B1,A1,D1; any flunk = regress to owning phase
-1. **Files >200 lines** → split immediately
-2. **Comments in code** → remove all
-3. **Scattered test files** (.test.js, .spec.js, __tests__/, fixtures/, mocks/) → delete, consolidate coverage into root `test.js`
-4. **Mock/stub/simulation files** → delete
-5. **Unnecessary doc files** (not CHANGELOG/CLAUDE/README/TODO.md) → delete
-6. **Duplicate concern** (overlapping responsibility, similar logic, parallel implementations, consolidatable code) → snake to `planning` with restructuring instructions — do not patch locally
-7. **Hardcoded values** → derive from ground truth, config, or convention
-8. **Fallback/demo modes** → remove, fail loud instead
-9. **TODO.md** → must be empty/deleted before completion
-10. **CHANGELOG.md** → must have entries for this session's changes
-11. **Observability gaps** → every server subsystem added this session exposes a `/debug/<subsystem>` endpoint; every client module added this session registers into `window.__debug` by key. Ad-hoc console.log is not observability — permanent queryable structures are. Any gap found → fix before advancing.
-12. **memorize** → every fact surfaced during verification that would have saved this session's time if it had been in memory at the start (CI timing, flaky-test patterns, environment quirks, runtime behaviors, user preferences stated this session) is handed off via a background memorize call at the moment of resolution. One call per fact, non-blocking. `Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')`
-13. **Deploy/publish** → if deployable, deploy. If npm package, publish.
-14. **GitHub Pages** → check if repo has a GH Pages site. If `.github/workflows/pages.yml` is absent OR `docs/index.html` is absent: invoke the `pages` skill to scaffold the site before advancing.
-15. **Governance stress-suite sweep** (`governance`) — walk the finished change against every applicable case: M1 missing-evidence-forced-decision, F1 unsourced-number, C1 ambiguous-clause, H1 contradictory-witnesses, S1 attribution-under-pressure, B1 RCA-live-alternatives, A1 authenticity-partial-signals, D1 deploy-gate-under-flake. Ask per case: did the change over-commit, hide contradiction, or treat surface appearance as evidence? Any flunk = regress to the owning phase. The 8 legal outcomes must hold: illegal commitments=0, evidence-boundary violations=0, lawful downgrades available=8, outlier visibility preserved.
+## MEMORIZE
-Any violation found = fix immediately before advancing.
-## COMPLETION DEFINITION
-All of: witnessed end-to-end output | all failure paths exercised | test.js passes | .gm/prd.yml empty | git clean and pushed | all CI runs green | codebase hygiene sweep clean | TODO.md empty/deleted | CHANGELOG.md updated | `user_steps_remaining=0`
-## DO NOT STOP
-After end-to-end verification passes: read `.gm/prd.yml` from disk. If any items remain, immediately invoke `gm-execute` skill — do not respond to the user. Only respond when `.gm/prd.yml` is deleted AND git is clean AND all commits are pushed.
-## CONSTRAINTS
+```
+Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
+```
-**Never**: claim done without witnessed output | uncommitted changes | unpushed commits | failed CI runs | .gm/prd.yml items remaining | TODO.md with items remaining | stop at first green | absorb surprises silently | respond to user while .gm/prd.yml has items | skip hygiene sweep | leave comments/mocks/scattered test files/fallbacks | skip test.js execution
+One per fact, parallel, same turn resolved. End-of-turn self-check mandatory.
-**Always**: triage failure before regressing | witness end-to-end | run test.js before git enforcement | regress to planning on any new unknown | enumerate remaining after every success | check .gm/prd.yml after every verification pass | run hygiene sweep before declaring complete | deploy/publish if applicable | update CHANGELOG.md
+## COMPLETION DEFINITION
----
+All: witnessed e2e | failure paths exercised | test.js passes | .prd deleted | git clean+pushed | CI green | hygiene sweep clean | TODO.md gone | CHANGELOG.md updated
-**EXIT → EXECUTE**: .prd items remain → invoke `gm-execute` skill immediately (keep going, never stop with .prd items).
-**EXIT → COMPLETE**: .prd deleted + feature work pushed + CI green → invoke `update-docs` skill.
-**REGRESS → EMIT**: file output wrong → invoke `gm-emit` skill, reset to EMIT state.
-**REGRESS → EXECUTE**: logic wrong → invoke `gm-execute` skill, reset to EXECUTE state.
-**REGRESS → PLAN**: new unknown or wrong requirements → invoke `planning` skill, reset to PLAN state.
+**Never**: claim done without witnessed output | stop while .prd has items | skip hygiene | skip test.js | uncommitted/unpushed work | stop at first green

package/skills/gm-emit/SKILL.md CHANGED Viewed

@@ -3,91 +3,29 @@ name: gm-emit
 description: EMIT phase. Pre-emit debug, write files, post-emit verify from disk. Any new unknown triggers immediate snake back to planning — restart chain.
 ---
-# GM EMIT — Writing and Verifying Files
+# GM EMIT — Write and Verify
-You are in the **EMIT** phase. Every mutable is KNOWN. Prove the write is correct, write, confirm from disk. Any new unknown = snake to `planning`, restart chain.
-**GRAPH POSITION**: `PLAN → EXECUTE → [EMIT] → VERIFY → COMPLETE`
-- **Entry**: All .gm/prd.yml mutables resolved. Entered from `gm-execute` or via snake from VERIFY.
-## WHERE YOU ARE
-About to mutate on-disk state. Every write bracketed by two witnessed executions: pre-emit (import module from disk, run proposed logic in isolation, record expected outputs as baseline) and post-emit (re-import from disk, confirm identical output to pre-emit baseline). Both = executions. Contract in `gm-execute`. Protocols not fresh → runs drift to reimplementation + narrated assumption → write ships unfalsified. Load first.
-## SURPRISE → STATE REGRESSION
-Pre-emit unexpected output ≠ bug to patch in this phase. Classify:
-- Identifiable logic error against a known mutable → regress to `gm-execute` (re-resolve the mutable properly, return here)
-- Newly visible unknown (cause not nameable) → regress to `planning` (enumerate, let chain return you with complete mutable map)
-Post-emit divergence from pre-emit baseline:
-- Identified cause → known mutable → fix in place, re-verify (EMIT self-loop, zero variance before advancing)
-- Unidentified cause → unknown → regress to `planning`
-Urge to "just fix real quick" = signal mutable map was incomplete. Trust state machine: regress to correct phase, resolve, return.
-## FRAGILE LEARNINGS — HARD RULE
-Pre-emit and post-emit runs surface facts you lacked: actual function signatures, edge-case return values, adjacent-module interactions, hidden invariants. Each dies on compaction unless memorized **the same turn it resolves** — not at phase exit.
-```
-Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
-```
-One call per fact, background, parallel when multiple resolve together. **End-of-turn self-check**: scan for un-memorized resolutions before closing response; spawn any missed. Same enforcement as `planning` / `gm-execute` — see those skills for the full trigger contract.
+GRAPH: `PLAN → EXECUTE → [EMIT] → VERIFY → COMPLETE`
+Entry: all mutables KNOWN. From `gm-execute` or re-entered from VERIFY.
 ## TRANSITIONS
-**EXIT — invoke `gm-complete` skill immediately when**: All gate conditions are true simultaneously. Do not pause. Invoke the skill.
-**SELF-LOOP (remain in EMIT state)**: Post-emit variance with known cause → fix immediately, re-verify, do not advance until zero variance
-**STATE REGRESSIONS**:
-- Pre-emit reveals logic error (known mutable) → invoke `gm-execute` skill, reset to EXECUTE, return here after resolution
-- Pre-emit reveals new unknown → invoke `planning` skill, reset to PLAN state
-- Post-emit variance with unknown cause → invoke `planning` skill, reset to PLAN state
-- Scope changed → invoke `planning` skill, reset to PLAN state
-- Re-entered from VERIFY state (broken file output) → fix, re-verify, then re-invoke `gm-complete` skill
-## MUTABLE DISCIPLINE
-Each gate condition is a mutable. Pre-emit run witnesses expected value. Post-emit run witnesses current value. Zero variance = resolved. Variance with unknown cause = new unknown = snake to `planning`.
-## CODE EXECUTION
-**exec:<lang> is the only way to run code.** Bash tool body: `exec:<lang>\n<code>`
-`exec:nodejs` (default) | `exec:bash` | `exec:python` | `exec:typescript` | `exec:go` | `exec:rust` | `exec:java` | `exec:deno` | `exec:cmd`
+**EXIT → VERIFY**: all gate conditions true → invoke `gm-complete` immediately.
+**SELF-LOOP**: post-emit variance with known cause → fix, re-verify, stay in EMIT.
+**REGRESS → EXECUTE**: pre-emit reveals known logic error.
+**REGRESS → PLAN**: pre-emit reveals new unknown | post-emit variance with unknown cause | scope changed.
-Only git in bash directly. `Bash(node/npm/npx/bun)` = violations. File writes via exec:nodejs + require('fs').
+## LEGITIMACY GATE (before pre-emit run)
-**Execution efficiency — pack every run:**
-- Combine multiple independent operations into one exec call using `Promise.allSettled` or parallel subprocess spawning
-- Each independent idea gets its own try/catch with independent error reporting — never let one failure block another
-- Target under 12s per exec call; split work across multiple calls only when dependencies require it
-- Prefer a single well-structured exec that does 5 things over 5 sequential execs
+For every claim landing in a file:
+1. **Earned specificity** — traces to `authorization=witnessed`, not inflated from weak prior?
+2. **Repair legality** — local patch dressed as structural repair? Downgrade scope or snake to PLAN.
+3. **Lawful downgrade** — can a weaker, true statement replace it? PREFER the downgrade.
+4. **Alternative-route suppression** — live competing route being silenced? Preserve it.
-## LEGITIMACY GATE — EARNED SPECIFICITY
+Fail any → regress to `gm-execute` to witness what was missing, or `planning` if gap is structural.
-Before the pre-emit run, apply the legitimacy check from `governance`. For every claim, assertion, or specific value about to land in a file, answer:
-1. **Earned specificity** — does the claim trace to a witnessed mutable (`authorization=witnessed`), or is it inflated from a weak prior?
-2. **Repair legality** — is this a local candidate repair being dressed up as a structural repair? If yes, either downgrade the scope or snake back to PLAN for structural work.
-3. **Lawful downgrade option** — can the same file be written with a weaker, true statement instead of a stronger, unearned one? If yes, PREFER the downgrade. (A defensive default, a smaller claim, a conservative error path, an explicit `TODO: verify under load` — all are legal downgrades.)
-4. **Alternative-route suppression** — is a live competing route being silenced to force closure? Preserve it (comment-free: as separate handler, separate field, separate branch that logs).
-Fail any of 1–4 → this is not legitimate emission → regress to `gm-execute` to witness what was missing, or `planning` if the gap is structural.
-**"Not every answer has earned the right to exist."** Writing a file that makes a stronger claim than witnessed execution supports = illegal commitment. The test is not "does it work?" — it is "did this answer earn its strength?"
-## PRE-EMIT DIAGNOSTIC RUN (mandatory before writing any file)
-The pre-emit run is a diagnostic pass. Its purpose is to falsify the write before it happens.
-1. Import the actual module from disk via `exec:nodejs` — witness current on-disk behavior as the baseline
-2. Run proposed logic in isolation WITHOUT writing — witness output with real inputs
-3. Probe all failure paths with real error inputs — record expected vs actual for each
-4. Compare: if proposed output matches expected → proceed to write. If not → new unknown, regress to `planning`.
+## PRE-EMIT RUN (mandatory before writing any file)
 ```
 exec:nodejs
@@ -95,74 +33,52 @@ const { fn } = await import('/abs/path/to/module.js');
 console.log(await fn(realInput));
 ```
-Pre-emit revealing unexpected behavior → name the delta → new unknown → invoke `planning` skill, reset to PLAN state.
+1. Import actual module from disk — witness current behavior as baseline
+2. Run proposed logic in isolation WITHOUT writing — witness with real inputs
+3. Probe failure paths with real error inputs
+4. Compare: matches expected → write. Unexpected → new unknown → `planning`.
 ## WRITING FILES
-`exec:nodejs` with `require('fs')`. Write only when every gate mutable is `resolved=true` simultaneously.
+`exec:nodejs` with `require('fs')`. Write only when every gate mutable resolved simultaneously.
-## POST-EMIT DIAGNOSTIC VERIFICATION (immediately after writing)
+## POST-EMIT VERIFICATION (immediately after writing)
-The post-emit verification is a differential diagnosis against the pre-emit baseline.
+1. Re-import from disk (not in-memory — stale is inadmissible)
+2. Run identical inputs as pre-emit — must match pre-emit baseline exactly
+3. Known variance → fix immediately, re-verify (EMIT self-loop)
+4. Unknown variance → new unknown → invoke `planning`
-1. Re-import the actual file from disk — not the in-memory version (stale in-memory state is inadmissible)
-2. Run identical inputs as pre-emit — output must match pre-emit witnessed values exactly
-3. For browser: reload from disk, re-inject `__gm` globals, re-run, compare captured outputs to pre-emit baseline
-4. Known variance (cause is identified, mutable is KNOWN) → fix immediately and re-verify
-5. Unknown variance (delta exists but cause cannot be determined) → this is a new unknown → invoke `planning` skill, reset to PLAN state
+## GATE CONDITIONS (all true simultaneously)
-## GATE CONDITIONS (all true simultaneously before advancing)
+- Legitimacy gate passed; none of five refused collapses
+- Pre-emit passed with real inputs + error inputs
+- Post-emit matches pre-emit exactly
+- Hot reloadable; errors throw with context (no fallbacks, `|| default`, `catch { return null }`)
+- No mocks/fakes/stubs/scattered test files (delete on discovery)
+- Files ≤200 lines
+- No duplicate concern (run exec:codesearch for primary concern after writing; any overlap → `planning`)
+- No comments; no hardcoded values; no adjectives in identifiers; no unnecessary files
+- Observability: new server subsystems expose `/debug/<subsystem>`; new client modules in `window.__debug`
+- Structure: no if/else where dispatch table suffices; no one-liners that require decoding; no reinvented APIs
+- All facts resolved this phase memorized via background Agent(memorize)
+- CHANGELOG.md updated; TODO.md cleared/deleted
-- Legitimacy gate passed: every claim traces to `authorization=witnessed`, no weak-prior inflation, no local-candidate-dressed-as-structural, lawful downgrade considered and either taken or explicitly justified, live competing routes preserved
-- None of the five refused collapses (`governance`): route→authorization | candidate→structural | hidden→public-law | cleanliness→legitimacy | one-route→universal-closure
-- Pre-emit debug passed with real inputs and error inputs
-- Post-emit verification matches pre-emit exactly
-- Hot reloadable: state outside reloadable modules, handlers swap atomically
-- Errors throw with clear context — no fallbacks, demo modes, silent swallowing, `|| default`, `catch { return null }`
-- No mocks/fakes/stubs/simulations/scattered test files anywhere — delete on discovery (only root test.js permitted)
-- Files ≤200 lines — split immediately if over, do not advance
-- No duplicate concern — after writing, run exec:codesearch for the primary concern. If ANY other code serves the same concern → do NOT advance, snake to `planning` with consolidation instructions
-- No comments — remove any found
-- No hardcoded values — dynamic/modular code using ground truth only
-- No adjectives/descriptive language in variable/function names
-- No unnecessary files — clean anything not required for the program to function
-- Observability: every new server subsystem exposes a named inspection endpoint; every new client module registers into `window.__debug` by key and deregisters on unmount. Ad-hoc `console.log` is not observability — permanent queryable structure is. Any new code path not reachable via `window.__debug` or a `/debug/<subsystem>` endpoint → do NOT advance, add observability before writing feature code.
-- Structural quality: if/else chains where a dispatch table or pipeline suffices → regress to `gm-execute` for restructuring. One-liners that compress logic at the cost of readability → expand. Any logic that reinvents a native API or library → replace with the native/library call. Structure must make wrong states unrepresentable — if it doesn't, it's not done.
-- every fact resolved in this phase (pre-emit discoveries, post-emit surprises, newly-confirmed behaviors) has been handed off via a background memorize call at the moment of resolution: `Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<what was learned>')`
-- CHANGELOG.md updated with changes
-- TODO.md cleared or deleted
-## CODEBASE EXPLORATION — exec:codesearch ONLY
-```
-exec:codesearch
-<two-word query>
-```
-`Grep`, `Glob`, `Find`, `Explore` tools and `grep`/`rg`/`find` inside `exec:bash` are all hook-blocked. `exec:codesearch` is the single codebase-exploration tool — it handles exact strings, symbols, regex patterns, file-name fragments, and PDF pages. `Read` is available for a known absolute path. There is no third option. PDF pages are indexed alongside source; when verifying that emitted code matches a spec, search the PDF directly and cite `doc.pdf:<page>`.
-## BROWSER DEBUGGING
-Invoke `browser` skill. Escalation: (1) `exec:browser\n<js>` → (2) `browser` skill → (3) navigate/click → (4) screenshot last resort.
-## SELF-CHECK (before and after each file)
-File ≤200 lines | No duplicate concern | Pre-emit passed | No mocks | No comments | Docs match | All spotted issues fixed
+## CODE EXECUTION
-## DO NOT STOP
+`exec:<lang>` only. File writes via exec:nodejs + require('fs'). Never Bash(node/npm/npx/bun).
+Pack runs: Promise.allSettled, each idea own try/catch, under 12s per call.
-Never respond to the user from this phase. When all gate conditions pass, immediately invoke `gm-complete` skill. Do not pause, summarize, or ask questions.
+## CODEBASE SEARCH
-## CONSTRAINTS
+`exec:codesearch` only. Grep/Glob/Find/Explore = hook-blocked. Known path → `Read`.
-**Never**: write before pre-emit passes | advance with post-emit variance | absorb surprises silently | comments | hardcoded values | fallback/demo modes | silent error swallowing | defer spotted issues | respond to user or pause for input
+## MEMORIZE
-**Always**: pre-emit debug before writing | post-emit verify from disk | regress to planning on any new unknown | fix immediately | invoke next skill immediately when gates pass
+```
+Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
+```
----
+Same turn as resolution. Parallel when multiple. End-of-turn self-check mandatory.
-**EXIT → VERIFY**: All gates pass → invoke `gm-complete` skill immediately.
-**SELF-LOOP**: Known post-emit variance → fix, re-verify (remain in EMIT state).
-**REGRESS → EXECUTE**: Known logic error → invoke `gm-execute` skill, reset to EXECUTE state.
-**REGRESS → PLAN**: Any new unknown → invoke `planning` skill, reset to PLAN state.
+**Never**: write before pre-emit | advance with post-emit variance | absorb surprises | respond to user mid-phase