npm - gm-qwen - Versions diffs - 2.0.888 → 2.0.889 - Mend

gm-qwen 2.0.888 → 2.0.889

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16) hide show

package/gm.json +1 -1
package/package.json +1 -1
package/skills/browser/SKILL.md +16 -15
package/skills/code-search/SKILL.md +13 -15
package/skills/create-lang-plugin/SKILL.md +22 -26
package/skills/gm/SKILL.md +31 -105
package/skills/gm-complete/SKILL.md +46 -71
package/skills/gm-emit/SKILL.md +40 -65
package/skills/gm-execute/SKILL.md +35 -104
package/skills/governance/SKILL.md +24 -23
package/skills/pages/SKILL.md +42 -92
package/skills/planning/SKILL.md +40 -153
package/skills/research/SKILL.md +8 -14
package/skills/ssh/SKILL.md +15 -9
package/skills/textprocessing/SKILL.md +17 -25
package/skills/update-docs/SKILL.md +15 -24

package/gm.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "gm",
-  "version": "2.0.888",
+  "version": "2.0.889",
   "description": "State machine agent with hooks, skills, and automated git enforcement",
   "author": "AnEntrypoint",
   "license": "MIT",

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "gm-qwen",
-  "version": "2.0.888",
+  "version": "2.0.889",
   "description": "State machine agent with hooks, skills, and automated git enforcement",
   "author": "AnEntrypoint",
   "license": "MIT",

package/skills/browser/SKILL.md CHANGED Viewed

@@ -4,15 +4,15 @@ description: Browser automation via playwriter. Use when user needs to interact
 allowed-tools: Skill, Bash, Read, Write, Edit, Agent
 ---
-# Browser Automation
+# Browser automation
-Two pathways — never mix:
+Two pathways — never mix in the same Bash call.
-**`exec:browser`** — JS against `page`. `page`, `snapshot`, `screenshotWithAccessibilityLabels`, `state` globals available. 15s live window then backgrounds; drains auto on every subsequent plugkit call.
+`exec:browser` runs JS against `page`. Globals available: `page`, `snapshot`, `screenshotWithAccessibilityLabels`, `state`. 15s live window, then backgrounds; output drains automatically on every subsequent plugkit call.
-**`browser:` prefix** — playwriter session management. One command per block.
+`browser:` prefix is playwriter session management. One command per block.
-## Core Usage
+## Core
 ```
 exec:browser
@@ -30,11 +30,11 @@ browser:
 playwriter -s 1 -e 'await page.goto("http://example.com")'
 ```
-Session state persists across `browser:` calls. `-e` arg: single quotes outside, double quotes inside JS strings.
+Session state persists across `browser:` calls. `-e` arg uses single quotes outside, double inside JS strings.
 ## Timing
-Never `await setTimeout(N)` with N > 10000. Use poll loops:
+Never `await setTimeout(N)` with N > 10000. Poll instead.
 ```
 exec:browser
@@ -45,11 +45,12 @@ while (!state.done && Date.now() - start < 12000) {
 console.log(state.result)
 ```
-"Assertion failed: UV_HANDLE_CLOSING" = backgrounded normally, ignore noise.
+`Assertion failed: UV_HANDLE_CLOSING` is normal background-on-exit noise; ignore it.
-## Common Patterns
+## Patterns
 Data extraction:
 ```
 exec:browser
 const items = await page.$$eval('.title', els => els.map(e => e.textContent))
@@ -57,6 +58,7 @@ console.log(JSON.stringify(items))
 ```
 Console monitoring — set listeners first, then poll:
 ```
 exec:browser
 state.logs = []
@@ -68,10 +70,9 @@ exec:browser
 console.log(JSON.stringify(state.logs.slice(-20)))
 ```
-## Rules
+## Constraints
-- One `playwriter` command per `browser:` block
-- Never mix pathways in same Bash call
-- `exec:browser` = plain JS, no shell quoting
-- All browser tasks drain automatically on every plugkit interaction
-- Sessions reap after 5-15min idle; browser cleaned up on session end
+- One playwriter command per `browser:` block
+- `exec:browser` is plain JS, no shell quoting
+- Browser tasks drain automatically on every plugkit interaction
+- Sessions reap after 5–15 min idle; cleaned up on session end

package/skills/code-search/SKILL.md CHANGED Viewed

@@ -3,13 +3,13 @@ name: code-search
 description: Mandatory codebase search workflow. Use whenever you need to find anything in the codebase. Start with two words, iterate by changing or adding words until found.
 ---
-# CODEBASE SEARCH
+# Codebase search
-`exec:codesearch` is the only codebase search tool. `Grep`, `Glob`, `Find`, `Explore`, `grep`/`rg`/`find` inside `exec:bash` = ALL hook-blocked. No fallback path.
+`exec:codesearch` is the only codebase search tool. Grep, Glob, Find, Explore, raw `grep`/`rg`/`find` inside `exec:bash` are all hook-blocked. No fallback.
-Handles: exact symbols, exact strings, file-name fragments, regex-ish patterns, natural-language queries, PDF pages (cite `path/doc.pdf:<page>`).
+Handles exact symbols, exact strings, file-name fragments, regex-ish patterns, natural-language queries, and PDF pages (cite `path/doc.pdf:<page>`).
-Direct-read exceptions: known absolute path → `Read`. Known dir listing → `exec:nodejs` + `fs.readdirSync`.
+Direct-read exceptions: known absolute path → `Read`. Known directory listing → `exec:nodejs` + `fs.readdirSync`.
 ## Syntax
@@ -18,15 +18,9 @@ exec:codesearch
 <two-word query>
 ```
-## Protocol
+## Iteration
-1. Start: exactly two words
-2. No results → change one word
-3. Still no → add third word
-4. Still no → swap changed word again
-5. Minimum 4 attempts before concluding absent
-Never: one word | full sentence | give up under 4 attempts | switch tools.
+Start at exactly two words. No results → change one word. Still none → add a third. Still none → swap the changed word again. Minimum four attempts before concluding absent. Never one word, never a full sentence, never switch tools.
 ## Examples
@@ -34,15 +28,19 @@ Never: one word | full sentence | give up under 4 attempts | switch tools.
 exec:codesearch
 session cleanup idle
 ```
-→ no results →
+No results, then:
 ```
 exec:codesearch
 cleanup sessions timeout
 ```
-PDF search:
+PDF:
 ```
 exec:codesearch
 usb descriptor endpoint
 ```
-→ returns `docs/usb-spec.pdf:42` — cite page, Read if you need surrounding text.
+Returns `docs/usb-spec.pdf:42` — cite the page; `Read` if surrounding text is needed.

package/skills/create-lang-plugin/SKILL.md CHANGED Viewed

@@ -3,48 +3,40 @@ name: create-lang-plugin
 description: Create a lang/ plugin that wires any CLI tool or language runtime into gm-cc — adds exec:<id> dispatch, optional LSP diagnostics, and optional prompt context injection. Zero hook configuration required.
 ---
-# CREATE LANG PLUGIN
+# Create lang plugin
 Single CommonJS file at `<projectDir>/lang/<id>.js`. Auto-discovered — no hook editing.
-## Plugin Shape
+## Plugin shape
 ```js
 'use strict';
 module.exports = {
-  id: 'mytool',                         // must match filename
+  id: 'mytool',
   exec: {
     match: /^exec:mytool/,
     run(code, cwd) { /* returns string or Promise<string> */ }
   },
-  lsp: {                                // optional — synchronous only
+  lsp: {
     check(fileContent, cwd) { /* returns Diagnostic[] */ }
   },
-  extensions: ['.ext'],                 // optional — for lsp.check
-  context: `=== mytool ===\n...`       // optional — string or () => string
+  extensions: ['.ext'],
+  context: `=== mytool ===\n...`
 };
 ```
 `type Diagnostic = { line: number; col: number; severity: 'error'|'warning'; message: string }`
-## How It Works
+`exec.run` runs in a child process, 30s timeout, async OK. Called when Claude writes `exec:mytool\n<code>`. `lsp.check` is synchronous-only, called per prompt-submit. `context` is injected into every prompt, truncated to 2000 chars.
-- `exec.run` — child process, 30s timeout, async OK. Called when Claude writes `exec:mytool\n<code>`.
-- `lsp.check` — synchronous, called per prompt submit. Use `spawnSync`/`execFileSync`. No async.
-- `context` — injected into every prompt (truncated 2000 chars).
+## Identify the tool
-## Step 1 — Identify Tool
+What is the CLI name or npm package? Does it run a single expression (`tool eval`, `tool -e`, HTTP POST) or a file (`tool run <file>`)? What is its lint/check mode and output format? File extensions? Does it require a running server, or does it run headless?
-1. CLI name or npm package?
-2. Run single expression? (`tool eval <expr>`, `tool -e <code>`, HTTP POST...)
-3. Run file? (`tool run <file>`)
-4. Lint/check mode + output format?
-5. File extensions?
-6. Requires running server or headless?
+## exec.run patterns
-## Step 2 — exec.run Patterns
+HTTP eval against a running server:
-HTTP eval (running server):
 ```js
 function httpPost(port, urlPath, body) {
   return new Promise((resolve, reject) => {
@@ -61,7 +53,8 @@ function httpPost(port, urlPath, body) {
 }
 ```
-File-based (headless):
+File-based, headless:
 ```js
 function runFile(code, cwd) {
   const tmp = path.join(os.tmpdir(), `plugin_${Date.now()}.ext`);
@@ -71,12 +64,13 @@ function runFile(code, cwd) {
 }
 ```
-Single expr detection:
+Single-expression detection:
 ```js
 const isSingleExpr = code => !code.trim().includes('\n') && !/\b(func|def|fn |class|import)\b/.test(code);
 ```
-## Step 3 — lsp.check
+## lsp.check
 ```js
 function check(fileContent, cwd) {
@@ -94,14 +88,15 @@ function check(fileContent, cwd) {
 }
 ```
-## Step 4 — context String
+## context
 Under 300 chars:
 ```js
 context: `=== mytool ===\nexec:mytool\n<expression>\n\nRuns via <how>. Use for <when>.`
 ```
-## Step 5 — Write + Verify
+## Verify
 ```
 exec:nodejs
@@ -110,6 +105,7 @@ console.log(p.id, typeof p.exec.run, p.exec.match.toString());
 ```
 Then test dispatch:
 ```
 exec:mytool
 <simple test expression>
@@ -117,9 +113,9 @@ exec:mytool
 ## Constraints
-- `exec.run` async OK (30s timeout)
+- `exec.run` async OK, 30s timeout
 - `lsp.check` synchronous only — no Promises
 - CommonJS only — no ES module syntax
 - No persistent processes
 - `id` must match filename exactly
-- First match wins — make `match` specific
+- First match wins — keep `match` specific

package/skills/gm/SKILL.md CHANGED Viewed

@@ -3,132 +3,58 @@ name: gm
 description: Agent (not skill) - immutable programming state machine. Always invoke for all work coordination.
 ---
-# GM — Skill-First Orchestrator
+# GM — Orchestrator
-Invoke `planning` skill immediately. Skill tool only — never Agent tool for skills.
+Invoke `planning` immediately. Skill tool only.
-## STATE MACHINE
+Phases: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS. Each loaded by Skill invocation; reading the summary is not being in the phase.
-Top of chain. No mutables resolved. Phases: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS.
-Each phase loads protocols via Skill invocation only. Reading summary ≠ being in phase.
+## The world the answer lives in
-`gm-execute` = execution contract (all phases). `governance` = route/legitimacy reference (load once).
+The user's request is the authorization. The PRD records it. Doubts during execution resolve by witnessed probe, by recall, or by re-reading the PRD. Questions back to the user only when the next action is destructive-irreversible AND uncovered by the PRD, or when intent is genuinely irrecoverable from PRD, memory, code, and the public web. `exec:pause` is the channel; in-conversation asking is beneath that. Web-search before pausing on anything the public web could plausibly answer.
-## RECALL — HARD RULE
+The obvious read of "deeply integrate", "all of them", "every X", "across the whole Y" is wider, not narrower. Pick the maximum reachable shape, declare the read in one line so the user can interrupt, execute. Multi-repo scope, build cost, CI duration, binary-size impact are never grounds to re-confirm. When scope exceeds reach, write every witnessable subset into the PRD as separate items and finish them all. Residuals within the spirit of the ask and reachable from this session expand the PRD silently-but-declared; only out-of-spirit-or-unreachable residuals are name-and-stop.
-Before resolving any unknown via fresh execution, check past sessions. Memorized facts only help if recalled.
+What ships runs. Stubs, mocks, placeholder returns, fixture-only paths, demo-mode short-circuits, and "TODO: implement" bodies are forbidden in shipped code — they ship green checks that lie. A shim is allowed only when it delegates to real upstream behavior; before adding one, check whether a published library already covers the surface, because local reimplementations drift and age. The behavioral test for fakeness: real input through real code into real output, witnessed. Anything less is provisional.
-```
-exec:recall
-<2-6 word query>
-```
+Every issue surfaced during work is fixed in-band, this turn, at root cause. Pre-existing build breaks, neighboring lint failures, lockfile drift, broken deps, and stale generated files surfaced while doing the user's task become new PRD items the same turn and finish before COMPLETE. Same rule for obvious refactor wins: hand-rolled code that an existing library covers, multi-file ad-hoc systems one import would replace. The bar is *obvious + reachable from this session*. Items left in `.gm/prd.yml` from prior sessions are this session's work the moment they're seen.
+Editing browser-facing code requires `exec:browser` witness in the same turn — boot the surface, navigate, assert the specific invariant via `page.evaluate`, capture the numbers. EXECUTE witnesses on edit, EMIT re-witnesses post-write, VERIFY runs the final gate. The exemption (pure-prose static document with no behavior change) is tagged in the response with the reason.
-Triggers: unknown feels familiar | sub-task on a known project | about to ask user something likely already discussed | about to design where prior decision exists. Hits = weak_prior; still witness before adopting. ~200 tokens, ~5ms when serve is running.
+Code does mechanics; meaning goes through the textprocessing skill. Summarize, classify, extract intent, rewrite, translate, semantic dedup, rank, label, decide-if-two-texts-mean-the-same — all routed through `Agent(subagent_type='gm:textprocessing', model='haiku', ...)`, N items in N parallel calls. A keyword-list or regex-on-meaning-phrases loop deciding semantic questions is a stub of this skill.
-## MEMORIZE — HARD RULE
+Every program emits structured JSONL to `~/.claude/gm-log/<date>/<subsystem>.jsonl`. Inspect via `plugkit log {tail|grep|stats}` before re-running with print debugging. Code the agent writes extends the project's existing observability surface; if none exists, the smallest correct shim is one JSONL appender to `.gm/log/`. Emit on state transitions, error boundaries, external IO, nontrivial decisions; skip loop bodies and parser steps.
-Unknown→known = memorize same turn it resolves. Background, non-blocking.
+## Recall and memorize
-Triggers: exec: output answers prior unknown | code read confirms/refutes assumption | CI log reveals root cause | user states preference/constraint | fix worked for non-obvious reason | env quirk observed.
+Before resolving any unknown via fresh execution, recall first.
 ```
-Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
+exec:recall
+<2-6 word query>
 ```
-Multiple facts → parallel Agent calls in ONE message. End-of-turn: scan for un-memorized resolutions → spawn now.
-**Recall + memorize together = learning loop.** Skipping either breaks it.
-## AUTONOMY — HARD RULE
-**USER REQUEST = AUTHORIZATION — HARD RULE.** The user's first message asking for X *is* the approval to do X. Re-asking "want me to do X?", "should I proceed with X?", "which shape of X do you want?" after the user already said "do X" is forced closure dressed as deference — the user already authorized the work and waits while the agent stalls. The PRD is *how* the agent records the authorization, not a second permission gate; the agent writes it and proceeds. Surfacing tradeoffs is allowed *inside* the same chain (one sentence: "going with shape A because Y") so the user can interrupt — never as a stop-and-wait. The bar to legitimately ask before starting is the same bar that applies mid-chain: destructive-irreversible AND not covered by the request, OR genuinely ambiguous *intent* (not "which of two viable approaches"). Defaulting to the deeper / more thorough / cross-repo shape when the user said "deeply integrate" is the obvious read of the request — pick it and execute. Multi-repo scope, build cost, and CI duration are never grounds to re-confirm.
-A written PRD is the user's authorization. Once it exists, EXECUTE owns the work to COMPLETE. Resolve every doubt that arises during execution by witnessed probe, by recall, or by re-reading the PRD — never by asking the user. Any question whose answer the agent could obtain itself is a question the agent owes itself, not the user.
-**FINISH ALL REMAINING STEPS — HARD RULE**: when a request enumerates or implies multiple work items ("all", "any", "everything", "the rest", "remaining"), or after a covering family is constructed under MAXIMAL COVER, the agent finishes every witnessable item in the same turn. Stopping after one item to ask "which next?" is forbidden — the answer is *all of them*, in one chain, until `.gm/prd.yml` is empty and git is clean and pushed. Mid-chain "should I…", "want me to…", "which would you like…" prompts are forced closure; replace them with the next skill invocation.
-Asking is permitted only as a last resort, when the next action is destructive-irreversible AND the PRD does not cover it, OR when user intent is genuinely irrecoverable from PRD, memory, code, AND the public web. The channel is structured: `exec:pause` (renames `.gm/prd.yml` → `.gm/prd.paused.yml`, question in header). In-conversation asking is last-resort beneath last-resort.
-**Web-search before pause / before user-ask — HARD RULE.** Before `exec:pause` or any in-conversation question whose answer plausibly exists on the public web (missing artifact, prebuilt binary, library status, build recipe, version compatibility, upstream issue, "does X exist for Y"), fire `WebSearch` and at least one targeted `WebFetch` first. Pause/ask only after the web pass returns empty, or returns candidates the agent has witnessed and rejected. Pausing on a question the web could have answered is forced closure dressed as humility — re-enter planning, web-search, and resume. Genuine user-only questions: private credentials, preference among already-surfaced viable options, destructive-irreversible authorization.
-The size of the task, the cost of context, and the duration of CI are never grounds to ask.
-**MEANING THROUGH TEXTPROCESSING — HARD RULE.** Code does mechanics; the textprocessing skill (Haiku-backed) does meaning. Anything whose correctness depends on *understanding* the text — summarize, classify, extract intent/entities, rewrite, translate, semantic dedup, rank by quality, label by topic, decide if two texts mean the same thing, paraphrase — routes through `Agent(subagent_type='gm:textprocessing', model='haiku', prompt='## INPUT\n<body>\n\n## INSTRUCTION\n<task>')`. N independent items → N parallel Agent calls in ONE message. Mechanics — count, split, regex, sort, dedup-by-equality, JSON parse, hash — stay in code. The bar: would a human have to *read and understand* the text to do this correctly? Yes → textprocessing skill. No → code. A keyword-list / regex-on-meaning-phrases / similarity-ratio loop in code that decides semantic questions is a stub of this skill — replace it with the Agent call.
-**MAXIMIZE THE SCOPE — HARD RULE.** When the request admits a deeper, broader, or more thorough shape that is reachable from this session, take it. The question "are you sure you want to do more work?" is the agent's question to itself, not the user's — and the answer is always yes when the deeper shape matches the spirit of the ask. "Deeply integrate", "do it right", "all of them", "every X", "across the whole Y", and silence between options all read as *go wider*, not narrower. Pick the maximum scope the obvious read of the request supports, declare the read in one line so the user can interrupt, and execute. Restraint that ships a smaller-than-necessary slice is the same forced closure as refusal — the user got less than they asked for and now waits for a follow-up turn to ask for the rest. The bar for *not* expanding is the same destructive-irreversible-and-not-covered bar that gates asking; below that bar, larger always beats smaller.
-## MAXIMAL COVER — HARD RULE
-When scope exceeds reach, construct a maximal cover of the request — not a refusal, not one slice with the rest deferred to "follow-up."
-Refusal is forced closure. So is *distributed* refusal — shipping a single bounded subset while other witnessable subsets exist. Both bypass witnessed execution.
-**Required move when scope exceeds reach**: construct a *covering family* of bounded subsets — every subset of the request that is witnessable from this session — and write the family into the PRD. Execute every member.
-**Residuals the agent judges within the spirit of the original ask AND completable from this session are self-authorized — expand the PRD with them and execute, do not stop to ask.** The judgment is the agent's honest read of what the user probably wanted, paired with reachability from this session. Only residuals genuinely outside the original ask, or genuinely unreachable, are name-and-stop. When expanding under self-authorization, the agent declares its judgment in the response ("treating X as in-scope because Y") so the user can correct mid-chain. Silent expansion without the declaration is the failure mode this rule guards against.
+Hits arrive as `weak_prior` — they earn the right to be tested, not believed. Empty results confirm the unknown is fresh.
-The discipline is enforced by what is delivered, not by which words appear. Before closing the turn, check that committed work + named out-of-spirit residuals equals the witnessable closure of the request. Anything witnessable that falls in neither set means the cover is not yet maximal — re-enter planning to expand it. The cover is *maximal*, not *complete*: completeness would require reaching scope outside the session, which is dishonest. Maximality reaches everything inside the session, which is the whole obligation.
+Every unknown→known transition memorizes the same turn it resolves, in the background, in parallel.
-## FIX ON SIGHT — HARD RULE
-Every issue surfaced during work is fixed in-band, this turn, at root cause. Defer-markers, swallowed errors, suppressed output, skipped tests, and "address it next session" are all variants of the same failure: a known-bad signal carried past the moment of detection. Each is a small forced closure.
-Surface → diagnose → fix at root cause → re-witness → continue. If the fix uncovers a new unknown, regress to `planning`. If the fix is itself genuinely out-of-scope-irreversible, the residual goes into `.gm/prd.yml` *before* moving on — narration is not a substitute for an item.
-A skill chain that ships while ignoring a known-bad signal is forced closure (see MAXIMAL COVER).
-**Auto-PRD on incidental error — HARD RULE.** Any reasonably-fixable error that surfaces mid-task — pre-existing build break, lockfile drift, broken dep feature, lint failure on neighboring code, missing cargo/npm artifact, type error in adjacent module, stale generated file, dead import, anything the agent can resolve from this session — is added to `.gm/prd.yml` as a new item the same turn it surfaces, and executed before COMPLETE. The user does not have to ask. The user did not cause it; surfacing it during their task is the agent's discovery, and the agent owns the fix. Genuinely out-of-reach errors (require the user's credentials, depend on external services down, demand product decisions) are name-and-stop with the residual recorded in PRD with `blockedBy: external` so it is visible — never silently dropped, never narrated past. The forcing function is delivery: closing a turn with a known-bad signal still emitting is forced closure regardless of whether the signal was "from a different task."
-**Auto-PRD on re-architecting opportunities — HARD RULE.** The same rule applies to *obvious* refactor wins surfaced while doing the user's task: a hand-rolled implementation that an existing well-known library does cleaner; a multi-file ad-hoc system that a single import would replace; duplicated logic crying out for one helper; a naming/layout cleanup that would shrink the surface noticeably; a boilerplate pattern repeated in N places when one upstream tool covers it. These are not separate sessions — they're items the agent regresses to `planning` to add, and finishes before COMPLETE. The bar is "obvious + reachable from this session", not "any conceivable refactor"; speculative or aesthetic changes stay out. Library-replacement is a specific case: before keeping a local reimplementation of something a published package already provides, add a PRD item to swap to the import. Local code competes with upstream maintenance and loses.
-**Cross-session PRD continuity — HARD RULE.** `.gm/prd.yml` is durable across sessions. Items left over from a previous session are *this* session's work the moment they're discovered. The agent finishes every item in the file before declaring complete — including ones the current user message did not mention. The PRD is the contract; sessions are scheduling. "From another session" is never a reason to skip an item.
-## BROWSER WITNESS — HARD RULE
-Editing code that runs in a browser requires a live `exec:browser` witness in the same turn as the edit. The witness does not defer to a later phase; later phases re-witness on top, they do not replace this one.
-Protocol on every client edit:
-1. Boot the real surface — server up, page reachable, HTTP 200 witnessed.
-2. `exec:browser` → navigate → poll for the global the change affects.
-3. `page.evaluate` asserting the specific invariant the change establishes. Capture the witnessed numbers in the response.
-4. Variance from expectation → fix at root cause, re-witness (FIX ON SIGHT). Never advance on unwitnessed client behavior.
-Pure-prose edits to static documents with no JS/canvas/DOM behavior change are exempt; tag the exemption explicitly with the reason so the skip is auditable. Silent skip on actual behavior change is forced closure.
-This rule fires in EXECUTE (witness on edit), EMIT (post-emit verify), and VERIFY (final gate). All three.
-## OBSERVABILITY — HARD RULE
-Every program is observed by default. plugkit, rs-exec, rs-plugkit, rs-learn, rs-search, and rs-codeinsight emit structured JSONL events to `~/.claude/gm-log/<YYYY-MM-DD>/<subsystem>.jsonl` for every hook fired, every exec spawn, every recall, every run_self call. The agent inspects this stream when diagnosing — `plugkit log tail`, `plugkit log grep <terms>`, `plugkit log stats`, `plugkit log path`. Asking "why did X happen" without first checking the log is forced closure.
-The same discipline applies to code the agent writes. State transitions, error paths, external IO (network, disk, subprocess, queue), and timing of operations longer than a heartbeat are events the future debugger will need. Find the project's existing observability surface before inventing one (codesearch first); if a surface exists, extend it rather than starting a parallel one. If the project has none, the smallest correct shim is a single function or macro that emits one JSONL line per event to a file under `.gm/log/` (mirror what plugkit does — same shape, same fields). Never `console.log` / `println!` scatter across files; that fragments the surface and dies on the first compaction or rotation. The events are the program's own self-narration, not a debugging afterthought.
-What to emit, not how often: every state transition once; every error path once at the boundary it crosses; every external IO with timing; every nontrivial decision (allow/deny, route picked, branch taken on user-derived input). Do not emit per-iteration loop bodies, per-character parser steps, or anything whose volume would obscure the load-bearing events. The signal-to-noise judgment is the agent's; the cost of getting it wrong is paid by future-you reading the file.
-## NOTHING FAKE — HARD RULE
-What ships runs against real services, real data, real binaries. Stubs, mocks, placeholder returns, fixture-only paths, "TODO: implement", `return null /* fake */`, hardcoded sample responses, and demo-mode fallbacks are forbidden in source the user will run. They produce green checks that survive into production and lie about what works.
-Scaffolding and shims are permitted when they call through to real behavior — an empty file laid down before its body, a thin adapter wrapping an upstream API, a build target that compiles but is wired to nothing yet *and is the only callsite of itself*. The test is whether the artifact, executed, would do the thing it claims. If it would not, it is a stub.
-Before writing a shim or adapter, the agent asks whether an existing library or tool already provides the same surface. Maintaining a local reimplementation of something an upstream package solves is its own failure mode — the shim drifts, ages, and accumulates the bugs the upstream already fixed. If a published package fits, the shim becomes one line of import.
-Stub detection is by behavior, not by keyword: code paths that always succeed, always return the same value regardless of input, or short-circuit a real call to satisfy a type signature, are stubs. Comments asserting realness do not make code real. The witnessing rule that closes a mutable also closes this one — until real input has produced real output through the new code, it is provisional, and shipping provisional code as done is forced closure.
+```
+Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
+```
-## EXECUTION ORDER
+N facts → N parallel calls in one message. End of turn: scan for un-memorized resolutions, spawn now.
-1. Recall — `plugkit recall` for any familiar-feeling unknown (cheapest, 200 tokens)
-2. Code execution (exec:<lang>, exec:codesearch) — 90%+ of unknowns
-3. Web (WebFetch/WebSearch) — env facts not in codebase
-4. User — last resort per AUTONOMY rule above
+## Execution order
-"Should I..." mid-chain = invoke next skill instead, never ask user.
+1. Recall (`exec:recall`) — cheapest
+2. Code execution (`exec:<lang>`, `exec:codesearch`) — 90% of unknowns
+3. Web (`WebFetch`, `WebSearch`) — env facts not in codebase
+4. User — last resort
-Skill chain: `planning` → `gm-execute` → `gm-emit` → `gm-complete` → `update-docs`
+`exec:<lang>` only via Bash. Never `Bash(node/npm/npx/bun)`. `git push` triggers auto CI watch via Stop hook.
-exec:<lang> only. Never Bash(node/npm/npx/bun). git push = auto CI watch via Stop hook.
+Skill chain: `planning` → `gm-execute` → `gm-emit` → `gm-complete` → `update-docs`.
-## RESPONSE POLICY
+## Response shape
-Terse. Drop filler. Fragments OK. Pattern: `[thing] [action] [reason]. [next step].`
-Code/commits/PRs = normal prose. Security/destructive = drop terseness.
+Terse. Fragments OK. `[thing] [action] [reason]. [next step].` Code, commits, PRs use normal prose. Drop terseness for security or destructive moves.