npm - gm-codex - Versions diffs - 2.0.888 → 2.0.890 - Mend

gm-codex 2.0.888 → 2.0.890

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21) hide show

package/.codex-plugin/plugin.json +1 -1
package/bin/plugkit.sha256 +6 -6
package/bin/plugkit.version +1 -1
package/bin/rtk.sha256 +2 -2
package/gm.json +2 -2
package/package.json +1 -1
package/plugin.json +1 -1
package/skills/browser/SKILL.md +16 -15
package/skills/code-search/SKILL.md +13 -15
package/skills/create-lang-plugin/SKILL.md +22 -26
package/skills/gm/SKILL.md +31 -105
package/skills/gm-complete/SKILL.md +46 -71
package/skills/gm-emit/SKILL.md +40 -65
package/skills/gm-execute/SKILL.md +35 -104
package/skills/governance/SKILL.md +24 -23
package/skills/pages/SKILL.md +42 -92
package/skills/planning/SKILL.md +40 -153
package/skills/research/SKILL.md +8 -14
package/skills/ssh/SKILL.md +15 -9
package/skills/textprocessing/SKILL.md +17 -25
package/skills/update-docs/SKILL.md +15 -24

package/skills/pages/SKILL.md CHANGED Viewed

@@ -3,41 +3,37 @@ name: pages
 description: Scaffold and maintain a GitHub Pages site. Buildless in browser (webjsx + rippleui via CDN), flatspace for content aggregation built during GH Actions. Use when user wants to create or update a GH Pages site.
 ---
-# Pages — GitHub Pages Site Scaffolder
+# Pages — GitHub Pages site scaffolder
-Scaffold a complete GH Pages site: **no local build step**, content managed via flatspace flat-file CMS, UI via webjsx + rippleui CDN, GH Actions builds and deploys.
-**Follow full gm skill chain: planning → gm-execute → gm-emit → gm-complete → update-docs.**
+Scaffold a complete GH Pages site with no local build step. Content via flatspace flat-file CMS, UI via webjsx + rippleui CDN, GH Actions builds and deploys. Follow the full chain: `planning → gm-execute → gm-emit → gm-complete → update-docs`.
 ## Stack
 | Layer | Tool | How |
-|-------|------|-----|
+|---|---|---|
 | UI rendering | [webjsx](https://webjsx.org) | ES module via importmap, `applyDiff` for DOM updates |
 | Styling | [rippleui](https://ripple-ui.com) | CDN `<link>` — Tailwind-based component classes |
 | Content CMS | [flatspace](https://npmjs.com/package/flatspace) | Aggregates `content/` → `docs/data/*.json` at build time |
 | Build | GH Actions | `npx flatspace` runs in CI, commits output to `docs/` |
-| Hosting | GitHub Pages | Source: `docs/` branch, or GH Actions deploy |
+| Hosting | GitHub Pages | Source set to "GitHub Actions" |
-## Directory Layout
+## Layout
 ```
 <project>/
-  content/              # Source content (markdown, json, yaml)
-    pages/              # Static pages (index.md, about.md, ...)
-    posts/              # Blog posts or articles
-    data/               # Structured data files
-  docs/                 # GH Pages root (gitignored build output except index.html)
-    index.html          # Entry point — committed, never regenerated
-    app.js              # Main webjsx app — committed
-    data/               # flatspace output — gitignored, built by CI
-  .github/
-    workflows/
-      pages.yml         # Build + deploy workflow
-  flatspace.config.js   # flatspace aggregation config
+  content/
+    pages/
+    posts/
+    data/
+  docs/
+    index.html       # committed, never regenerated
+    app.js           # committed
+    data/            # flatspace output, gitignored
+  .github/workflows/pages.yml
+  flatspace.config.js
 ```
-## index.html — Buildless Entry
+## index.html
 ```html
 <!DOCTYPE html>
@@ -63,28 +59,19 @@ Scaffold a complete GH Pages site: **no local build step**, content managed via
 </html>
 ```
-## app.js — webjsx App Pattern
+## app.js
 ```js
 import { applyDiff } from 'webjsx';
 const h = (tag, props, ...children) => ({ tag, props: props || {}, children });
 const state = { page: null, data: {} };
-async function loadData(path) {
-  const res = await fetch(path);
-  return res.json();
-}
-function render() {
-  applyDiff(document.getElementById('root'), App(state));
-}
+async function loadData(path) { return (await fetch(path)).json(); }
+function render() { applyDiff(document.getElementById('root'), App(state)); }
 function App(s) {
-  if (!s.page) return h('div', { class: 'flex justify-center p-8' },
-    h('span', { class: 'spinner' })
-  );
+  if (!s.page) return h('div', { class: 'flex justify-center p-8' }, h('span', { class: 'spinner' }));
   return h('div', { class: 'max-w-4xl mx-auto p-4' },
     h('nav', { class: 'navbar bg-backgroundSecondary mb-6' },
       h('span', { class: 'navbar-brand text-xl font-bold' }, s.page.title)
@@ -100,11 +87,7 @@ function Section(section) {
   );
 }
-async function main() {
-  state.page = await loadData('./data/index.json');
-  render();
-}
+async function main() { state.page = await loadData('./data/index.json'); render(); }
 main();
 ```
@@ -122,11 +105,10 @@ module.exports = {
 };
 ```
-## GH Actions Workflow — pages.yml
+## pages.yml
 ```yaml
 name: Deploy GitHub Pages
 on:
   push:
     branches: [main]
@@ -142,14 +124,10 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-node@v4
-        with:
-          node-version: '20'
+        with: { node-version: '20' }
       - name: Build content with flatspace
         run: npx flatspace
       - name: Commit built data
         run: |
           git config user.name "github-actions[bot]"
@@ -157,11 +135,8 @@ jobs:
           git add docs/data/
           git diff --staged --quiet || git commit -m "chore: build content [skip ci]"
           git push
-      - name: Upload Pages artifact
-        uses: actions/upload-pages-artifact@v3
-        with:
-          path: docs/
+      - uses: actions/upload-pages-artifact@v3
+        with: { path: docs/ }
   deploy:
     needs: build
@@ -174,26 +149,14 @@ jobs:
         uses: actions/deploy-pages@v4
 ```
-## Scaffolding Steps
+## Scaffold sequence
-When user says "set up pages" or "create GH pages site":
+Read existing `docs/` and `content/` if present — never clobber existing content. Create the directory structure. Write `docs/index.html`, `docs/app.js`, `flatspace.config.js`, `.github/workflows/pages.yml`, `content/pages/index.md` with minimal frontmatter (`title`, `sections` array). Add `docs/data/` to `.gitignore`. Verify GH Pages setting is "GitHub Actions" in repo Settings — remind the user if you can't verify.
-1. **Read** existing `docs/` and `content/` if present — never clobber existing content
-2. **Create** directory structure above
-3. **Write** `docs/index.html` with correct site title
-4. **Write** `docs/app.js` with webjsx app skeleton
-5. **Write** `flatspace.config.js`
-6. **Write** `.github/workflows/pages.yml`
-7. **Write** `content/pages/index.md` with minimal frontmatter (`title`, `sections` array)
-8. **Add** `docs/data/` to `.gitignore` (built by CI, not committed by humans)
-9. **Verify** GH Pages setting is "GitHub Actions" in repo Settings — remind user if can't verify
-## rippleui Component Classes Quick Reference
-Use these directly in JSX className strings — no config needed:
+## rippleui classes
 | Component | Class |
-|-----------|-------|
+|---|---|
 | Button | `btn btn-primary`, `btn btn-secondary`, `btn btn-ghost` |
 | Card | `card p-4` |
 | Input | `input input-primary` |
@@ -203,30 +166,18 @@ Use these directly in JSX className strings — no config needed:
 | Spinner | `spinner` |
 | Divider | `divider` |
-Background: `bg-backgroundPrimary`, `bg-backgroundSecondary`. Text: `text-content1`, `text-content2`.
-**CSS variable warning**: rippleui color vars (e.g. `--gray-2`) are raw space-separated RGB tuples — not valid CSS colors. Never use them in `rgb()` directly from JS. Use the component classes instead.
-## webjsx Patterns
-**No JSX transpile needed** — use `h()` factory or import from CDN with importmap and write JSX in `.jsx` files served directly (Chrome supports importmap natively).
-For `.js` files without transpile, use the `h` factory pattern shown above.
+Background `bg-backgroundPrimary`, `bg-backgroundSecondary`. Text `text-content1`, `text-content2`. rippleui CSS color vars (e.g. `--gray-2`) are raw space-separated RGB tuples — invalid in `rgb()` directly. Use the component classes instead.
-For `.jsx` with native importmap (no build):
-```js
-/** @jsxImportSource webjsx */
-import { applyDiff } from 'webjsx';
-```
-Only works if server sets correct MIME type for `.jsx` — GH Pages does not. Use `.js` + `h()` factory.
+## webjsx
-**applyDiff signature**: `applyDiff(domNode, vnodeOrArray)` — never pass a string, always pass a vnode from `h()`.
+No JSX transpile needed. Use the `h()` factory in `.js` files served directly. `.jsx` with native importmap requires the server to set the correct MIME type, which GH Pages does not — stay in `.js` + `h()`.
-**State updates**: mutate `state`, call `render()` — no reactive system, explicit re-render on every change.
+`applyDiff(domNode, vnodeOrArray)` — never pass a string. State updates mutate `state` and call `render()`; no reactive system.
-## Content Format (flatspace input)
+## Content format
 Markdown with YAML frontmatter:
 ```markdown
 ---
 title: Home
@@ -240,19 +191,18 @@ sections:
 Full markdown body here.
 ```
-flatspace outputs `docs/data/pages/index.json`:
+Output `docs/data/pages/index.json`:
 ```json
 { "title": "Home", "sections": [...], "body": "<p>Full markdown body here.</p>", "slug": "index" }
 ```
-## Known Gotchas
-**GH Pages must be set to "GitHub Actions"** in Settings → Pages. "Deploy from branch" ignores the deploy-pages action entirely.
+## Gotchas
-**docs/data/ must be in .gitignore** but `docs/index.html` and `docs/app.js` must NOT be ignored — they are the committed source files.
+GH Pages must be set to "GitHub Actions" in Settings → Pages. "Deploy from branch" ignores the deploy-pages action.
-**flatspace npx cold start**: first CI run downloads flatspace — takes ~10s extra. Subsequent runs use cache if `actions/setup-node` cache is configured.
+`docs/data/` is gitignored; `docs/index.html` and `docs/app.js` are not — they are the committed source files.
-**importmap browser support**: all modern browsers support importmap. No IE, no Safari < 16.4. For GH Pages this is fine.
+`npx flatspace` cold-start is ~10s on first CI run; subsequent runs use the `actions/setup-node` cache.
-**webjsx CDN version**: pin to explicit version in importmap (e.g. `@0.0.42`) — avoid `@latest` to prevent silent breakage on upstream updates.
+Pin the webjsx CDN version in importmap (e.g. `@0.0.42`) — `@latest` breaks silently on upstream updates.

package/skills/planning/SKILL.md CHANGED Viewed

@@ -4,131 +4,40 @@ description: State machine orchestrator. Mutable discovery, PRD construction, an
 allowed-tools: Skill, Bash, Write, Read, Agent
 ---
-# Planning — State Machine Orchestrator
+# Planning — PLAN phase
-Runs `PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS → COMPLETE`. Re-enter on any new unknown in any phase.
+Translate the request into `.gm/prd.yml` and hand to `gm-execute`. Re-enter on any new unknown in any phase.
-## RECALL — HARD RULE
+Cross-cutting dispositions (autonomy, fix-on-sight, nothing-fake, browser-witness, scope, recall, memorize) live in `gm` SKILL.md; this skill only carries what is unique to PLAN.
-Before naming any unknown, run recall.
+## Transitions
-```
-exec:recall
-<2-6 word query>
-```
-Triggers: matches prior topic | "have we seen this" | designing where prior decision likely exists | quirk feels familiar | sub-task in known project.
-Hits = weak_prior; witness via EXECUTE before adopting. Skip recall only on brand-new project / trivially-bounded edit / surgical user instruction.
-## MEMORIZE — HARD RULE
-Every unknown→known = same-turn memorize. Background, parallel, never batched.
-```
-Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
-```
-Triggers: exec output answers prior unknown | code read confirms/refutes | CI log reveals root cause | user states preference/constraint | fix worked non-obviously | env quirk observed.
-N facts → N parallel Agent calls in ONE message.
-## STATE MACHINE
-**FORWARD**: PLAN → `gm-execute` | EXECUTE → `gm-emit` | EMIT → `gm-complete` | VERIFY .prd remains → `gm-execute` | VERIFY .prd empty+pushed → `update-docs`
-**REGRESSIONS**: new unknown anywhere → `planning` | EXECUTE unresolvable 2 passes → `planning` | EMIT logic error → `gm-execute` | VERIFY broken output → `gm-emit` | VERIFY logic wrong → `gm-execute`
-Runs until: .gm/prd.yml empty AND git clean AND all pushes confirmed AND CI green.
-## AUTONOMY — HARD RULE
-**USER REQUEST = AUTHORIZATION.** The user's message asking for X is the green light. PLAN's job is to translate that ask into a PRD and start — not to re-confirm. "Want me to do X?", "should I take shape A or B?", "this is multi-repo work, OK to proceed?" after the user said "do X" are all forced closure. When the user surfaces a tradeoff (deep vs light, single-file vs cross-repo), pick the read that matches the obvious meaning of the request — "deeply integrate" means deep, "all platforms" means all — declare the choice in one line ("going with A because Y") and execute. Multi-repo scope, build cost, CI duration, binary-size impact, and "this will take a while" are never grounds to re-confirm. The user already knows; that's why they asked.
-PRD written → execute to COMPLETE without asking the user. Doubts that arise during execution are resolved by witnessed probe, by recall, or by re-reading the PRD — never by asking. Any question whose answer is reachable from the agent's tools belongs to the agent, not the user.
-Asking is last-resort: destructive-irreversible without PRD coverage, OR user intent irrecoverable from PRD/memory/code/web. Channel: `exec:pause` (renames `prd.yml` → `prd.paused.yml`; question in header). In-conversation asking is last-resort beneath last-resort.
-**WEB-SEARCH BEFORE PAUSE — HARD RULE.** Before `exec:pause` for any blocking question whose answer could plausibly exist on the public web — missing artifact, unknown library/API, prebuilt binary, version compatibility, build recipe, upstream status — fire `WebSearch` and at least one `WebFetch` first. Only after the web pass returns empty (or returns options the agent then witnesses and rejects) is `exec:pause` legitimate. Pausing on a question the web could have answered is forced closure dressed up as humility — fix on sight by re-entering planning, web-searching, and resuming. The only questions that genuinely require user-ask are ones the public web cannot answer: private credentials, intent/preference between viable options the agent has *already surfaced*, destructive-irreversible authorization.
-**Cannot stop while**: `.gm/prd.yml` has items | git uncommitted | git unpushed.
-**MAXIMIZE THE SCOPE — HARD RULE.** When the request admits a deeper, broader, or more thorough shape that is reachable from this session, plan it that way. The internal "is this too much work?" question is the agent's, not the user's — and the answer is yes when the larger shape matches the spirit of the ask. "Deeply integrate", "do it right", "all of them", "every X", "across the whole Y", "keep going", and silence between options all read as *go wider*. Default to the maximum scope the obvious read of the request supports; declare the read in one PRD line so the user can interrupt; execute. A PRD that ships a smaller-than-necessary slice while a larger reachable shape exists is the same failure as distributed refusal. The bar for *not* expanding is the same destructive-irreversible-and-not-covered bar that gates asking; below that bar, larger always beats smaller.
-## MAXIMAL COVER — HARD RULE
-When scope exceeds reach, expand the cover. Don't refuse. Don't ship one slice with the rest abandoned as "follow-up" — that's distributed refusal: the same failure dressed up as triage.
-**Required move when scope exceeds reach**: construct a *covering family* — every bounded subset of the request that is witnessable from this session — and write the family into the PRD as separate items, with the dependency graph explicit so independent members parallelize. Execute every member.
-**Residuals the agent judges within the spirit of the original ask AND completable from this session are self-authorized — expand the PRD with them and execute, do not stop to ask.** The judgment is the agent's honest read of what the user probably wanted, paired with reachability from this session. Only residuals genuinely outside the original ask, or genuinely unreachable from this session, are name-and-stop. When expanding under self-authorization, the agent declares its judgment in the response ("treating X as in-scope because Y") so the user can correct mid-chain. Silent expansion without the declaration is the failure mode this rule guards against.
-Enforcement is on what is delivered, not on which words appear. Before closing the turn, check that committed work + named out-of-spirit residuals = witnessable closure of the request. Gap = cover not yet maximal → re-enter PLAN to expand.
-## FIX ON SIGHT — HARD RULE
-Every issue surfaced during planning, execution, or verification is fixed in-band, the same session, at root cause. A known-bad signal carried past the moment of detection — by deferral, suppression, silencing, skipping, or "next time" narration — is a small forced closure.
-Surface → diagnose → fix → re-witness → continue. New unknown surfaced by the fix → regress here. Genuinely out-of-scope-irreversible → the residual goes into `.gm/prd.yml` *before* moving on; narration is not a substitute for an item.
+- PLAN done → `gm-execute`
+- New unknown anywhere in chain → re-enter PLAN
+- EXECUTE unresolvable after 2 passes → PLAN
+- VERIFY: `.prd` empty + git clean + pushed → `update-docs`; else → `gm-execute`
-**Auto-plan incidental errors**: any fixable error encountered while working on the user's task — even one unrelated to the original ask (pre-existing build break, neighboring lint failure, lockfile drift, broken dep, stale generated file) — is added to `.gm/prd.yml` as a new item the same turn it surfaces and executed before COMPLETE. The user does not have to notice it or ask. Closing a turn with a known-bad signal still emitting is forced closure. Only errors needing external credentials, product decisions, or down services are name-and-stop with `blockedBy: external`.
+Cannot stop while `.gm/prd.yml` has items, git is dirty, or commits are unpushed.
-**Auto-plan obvious re-architecting**: same rule for clear refactor wins surfaced mid-task — hand-rolled code competing with a known library that does the same thing cleaner, multi-file ad-hoc system one import would replace, duplicated logic crying out for one helper. Re-enter PLAN, add the item, execute. Bar is *obvious + reachable from this session*; speculative or aesthetic changes stay out.
+## Orient
-**Cross-session PRD continuity**: `.gm/prd.yml` is durable. Items left over from previous sessions are this session's work as soon as they're seen. Finish every item before COMPLETE — including ones the current user message did not mention. "From another session" is never a reason to skip.
+Open every plan with one parallel pack of `exec:recall` + `exec:codesearch` against the request's nouns. Hits land as `weak_prior`; misses confirm the unknown is fresh. The pack runs in one message.
-## BROWSER WITNESS — HARD RULE
+## Mutable discovery
-A `.prd` item that touches browser-facing code is not plan-complete unless its acceptance criteria include a live `exec:browser` witness with a `page.evaluate` assertion against the specific invariant the change establishes. "Manual verification", "test.js passes", and "browser test optional" are all unwitnessed and therefore unacceptable.
+For each aspect of the work, ask: what do I not know, what could go wrong, what depends on what, what am I assuming. Unwitnessed assumptions are mutables.
-The trigger is functional, not a path-list: any change whose effect is observable in the DOM, canvas, WebGL surface, network frames captured by the page, or any global the page exposes, requires the browser witness. Pure-prose edits to static documents with no behavior change are exempt; the exemption is tagged on the item with the reason.
+Fault surfaces to scan: file existence, API shape, data format, dep versions, runtime behavior, env differences, error conditions, concurrency, integration seams, backwards compat, rollback paths, CI correctness.
-Propagation: EXECUTE witnesses on edit, EMIT re-witnesses post-write, VERIFY runs the final gate. The plan must encode the rule so all three layers fire.
+Tag every item with a route family (grounding | reasoning | state | execution | observability | boundary | representation) and cross-reference the 16-failure taxonomy. `governance` skill holds the table.
-## NOTHING FAKE — HARD RULE
+`existingImpl=UNKNOWN` is the default; resolve via `exec:codesearch` before adding the item. An existing concern routes to consolidation, not addition.
-Plan items resolve when real input flows through real code into real output. Stubs, mocks, placeholder returns, fixture-only branches, "TODO: implement", and demo-mode short-circuits do not count as resolution — they are mutables wearing closed-status disguise.
+Plan exits when zero new unknowns surfaced last pass AND every item has acceptance criteria AND deps are mapped.
-Acceptance criteria must witness behavior, not the existence of a function with the right name. "X is implemented" is not acceptance; "X called with real Y produces real Z" is. The agent that satisfies the criterion via a stub has built something that will lie when production calls it.
+## .prd format
-Scaffolding and shims are permitted only when the shim *delegates* to real behavior — wraps an upstream API, calls a real subprocess, hits a real disk. Before adding a shim, the plan asks whether a published library or tool already provides that surface; maintaining a local reimplementation of an upstream solution is its own failure mode and the shim line should usually become an import line.
-The fake-detection test is behavioral: would the code, executed against the inputs it claims to accept, produce the outputs it claims to produce? If the answer requires "after we fill in the body" or "once X is wired up", the plan item is open, not done.
-## ORIENT — HARD RULE
-Open every plan with a parallel pack of `exec:recall` and `exec:codesearch` against the request's nouns. Hits land as `weak_prior`; misses confirm the unknown is fresh. The pack runs in one message — never serially. The agent that skips orient pays the same cost in fresh probes a turn later, plus the price of disagreeing with its own prior witness.
-## PRD — HARD RULE
-`./.gm/prd.yml` is the authorization. It is written before EXECUTE fires for any task that touches more than one line in one file. The cost of writing it equals the cost of skipping it; what the file buys is durable trace, resumability, and the cover-maximality check.
-## PLAN PHASE — MUTABLE DISCOVERY
-For every aspect: what do I not know (UNKNOWN) | what could go wrong (failure mode) | what depends on what (blocking/blockedBy) | what assumptions am I making (unwitnessed hypothesis = mutable).
-Fault surfaces: file existence | API shape | data format | dep versions | runtime behavior | env differences | error conditions | concurrency | integration seams | backwards compat | rollback paths | CI correctness.
-**Route family** (governance): tag every item — grounding|reasoning|state|execution|observability|boundary|representation.
-**Failure-mode mapping**: cross-reference 16-failure taxonomy.
-**MANDATORY CODEBASE SCAN**: `existingImpl=UNKNOWN` for every item. Resolve via exec:codesearch before adding. Existing concern → consolidation, not addition.
-**EXIT PLAN**: zero new unknowns last pass AND every item has acceptance criteria AND deps mapped → launch subagents or invoke `gm-execute`.
-## OBSERVABILITY — MANDATORY EVERY PASS
-Server: every subsystem exposes `/debug/<subsystem>`. Structured logs `{subsystem, severity, ts}`.
-Client: `window.__debug` live registry; modules register on mount.
-`console.log` ≠ observability. Discovery of gap → add .prd item immediately, never deferred.
-**No parallel observability surfaces.** `window.__debug` is THE in-page registry; `test.js` at project root is the sole out-of-page test asset. Any new file whose purpose is to exercise, smoke-test, demo, or sandbox in-page behavior outside that registry fights the discipline — extend the registry instead.
-## .PRD FORMAT
-Path: `./.gm/prd.yml`. Write via `exec:nodejs` + `fs.writeFileSync`. Delete when empty.
+Path: `./.gm/prd.yml`. Write via `exec:nodejs` + `fs.writeFileSync`. Delete the file when empty.
 ```yaml
 - id: kebab-id
@@ -150,64 +59,42 @@ Path: `./.gm/prd.yml`. Write via `exec:nodejs` + `fs.writeFileSync`. Delete when
     - failure mode
 ```
-**load** axis (consequence — convergence 3.3.0): 0.9 = headline collapses if wrong. 0.7 = sub-argument rebuilt. 0.4 = local patch. 0.1 = nothing breaks. **Verification budget = load × (1 − tier_confidence)**. High-load + low-tier item = top priority. λ>0.75 must be witnessed before EMIT.
-Status: pending → in_progress → completed (remove). Effort: small <15min | medium <45min | large >1h.
-## PARALLEL SUBAGENT LAUNCH
-After .prd written, ≤3 parallel `gm:gm` subagents for independent items in ONE message. Browser tasks serialize.
-`Agent(subagent_type="gm:gm", prompt="Work on .prd item: <id>. .prd path: <path>. Item: <full YAML>.")`
-Not parallelizable → invoke `gm-execute` directly.
+`load` is consequence-if-wrong: 0.9 = headline collapses, 0.7 = sub-argument rebuilt, 0.4 = local patch, 0.1 = nothing breaks. Verification budget = `load × (1 − tier_confidence)`. λ>0.75 must reach witnessed before EMIT.
-## EXECUTION RULES
+`status`: pending → in_progress → completed (then remove). `effort`: small <15min | medium <45min | large >1h.
-`exec:<lang>` only via Bash. File I/O via exec:nodejs + fs. Git directly in Bash. Never Bash(node/npm/npx/bun).
+## Parallel subagent launch
-File paths in `exec:nodejs` are platform-literal — node's `fs` does not auto-resolve POSIX shortcuts like `/tmp`. Use `os.tmpdir()` and `path.join` for portable temp paths; reserve `/tmp/...` for `exec:bash` heredocs where the shell rewrites the prefix.
+After `.prd` is written, up to 3 parallel `gm:gm` subagents for independent items in one message. Browser tasks serialize.
-Every `exec:<lang>` and `exec:bash` call should pass `--timeout-ms <ms>` (mandatory at the rs-exec RPC layer; the CLI emits a transitional warning when omitted). On timeout, partial streamed output is preserved and the runner emits `[exec timed out after Nms; partial output above]` — treat the marker as authoritative truncation and re-issue with a higher budget rather than retrying blindly.
-**Utility-verb syntax**: `exec:wait`, `exec:sleep`, `exec:status`, `exec:close`, `exec:pause`, `exec:type`, `exec:runner`, `exec:kill-port`, `exec:recall`, `exec:memorize`, `exec:forget` all take their argument on the **next line** (heredoc body), never inline. `exec:status\n<task_id>` polls one task; bare `exec:status` lists all. `exec:sleep\n<task_id>` blocks until the task produces output or completes. `exec:close\n<task_id>` terminates and removes a task. Inline forms (`exec:status <id>`) are denied by the hook.
-`exec:codesearch` only — Glob/Grep/Find/Explore hook-blocked. Start 2 words → change/add one per pass → minimum 4 attempts before concluding absent.
-Pack runs: Promise.allSettled for parallel. Each idea own try/catch. Under 12s per call.
-## DEV WORKFLOW
-No comments. No scattered test files. 200-line limit per file. Fail loud. No duplication. Scan before edit. AGENTS.md via memorize agent only. CHANGELOG.md append per commit.
+```
+Agent(subagent_type="gm:gm", prompt="Work on .prd item: <id>. .prd path: <path>. Item: <full YAML>.")
+```
-**Minimal code process** (stop at first that resolves): native → library → structure (map/pipeline) → write.
+Items not parallelizable → invoke `gm-execute` directly.
-## SINGLE INTEGRATION TEST POLICY
+## Observability gates in the plan
-One `test.js` at project root. **200-line hard cap.** No fixtures, mocks, or scattered test files under any naming convention. Plain assertions, real data, real system. `gm-complete` runs it. Failure = regression to EXECUTE.
+Server: every subsystem exposes `/debug/<subsystem>`; structured logs `{subsystem, severity, ts}`. Client: `window.__debug` live registry; modules register on mount. `console.log` is not observability. Discovery of a gap during PLAN adds a `.prd` item the same pass — never deferred.
-Any second test runner — under any name, in any directory — is a smuggled parallel surface and fights the discipline. If a behavior needs to be exercised in-page, register it in `window.__debug` and assert via `test.js`.
+`window.__debug` is THE in-page registry; `test.js` at project root is the sole out-of-page test asset. Any new file whose purpose is to exercise, smoke-test, demo, or sandbox in-page behavior outside that registry fights the discipline — extend the registry instead.
-**Purpose: maximum surface coverage in 200 lines.** test.js is a budget, not a target. Every line should witness a load-bearing behavior; redundant assertions are dead weight. Subsystems get *one* group each — combined groups (e.g. `profiles+observability+auth+context+cron+batch`) are the norm, not the exception. As thoth grew from 17 → 21 → 14 named groups while the surface tripled, the win came from collapsing per-subsystem groups into multi-subsystem ones.
+## Test discipline encoded in the plan
-**Use overlap to exclude.** When subsystem A's test exercises B as a side effect, B does not need its own group — drop the redundant assertion. Examples that have proven out:
-- The agent-machine tool-loop test exercises bash dispatch → no separate bash test needed beyond a smoke-call inside the tools+toolsets group.
-- The dashboard test asserts the API surface AND that the registry has ≥N tools → covers tool registration coverage.
-- The plugins+memory group exercises observability metrics + achievements → no need for a separate plugins-extra group.
-- The gateway test exercising one platform plus a platform-stub-shape loop covers all 18 adapters in one group.
+One `test.js` at project root, 200-line hard cap, real data, real system. No fixtures, mocks, or scattered tests. A second test runner under any name in any directory is a smuggled parallel surface.
-**Adding a new subsystem:** first try to fold its assertion into the closest existing group. Only create a new group when the subsystem's failure mode is genuinely orthogonal (e.g. compressor's iterative-update behavior is not exercised by any other group). Test surface should grow linearly with subsystem count, not multiplicatively, and the line budget is the forcing function.
+The 200 lines are a *budget* for maximum surface coverage, not a target. Subsystems get one combined group each — names joined with `+` (`home+config+skin`, `mcp+swe+distributions+account+credpool`). When a new subsystem's failure mode overlaps an existing group's side-effects, fold the assertion in rather than creating a new group. When `wc -l test.js > 200`, the discipline is *merge groups + drop redundancy*, never split.
-**Pattern that works:** name combined groups by joining their subsystems with `+`, e.g. `home+config+skin`, `mcp+swe+distributions+account+credpool`, `env+pi+cli+tui+setup+website`. Future readers see the coverage at a glance. A group title with 4–6 components is healthy; a group with 1 component should be questioned.
+## Execution norms encoded in the plan
-**Hygiene at edit time:** every change to test.js prefers compaction over expansion. If `wc -l test.js > 200`, the discipline is *not* "split" — it's "merge groups + drop redundancy" until it fits. If the budget is genuinely insufficient for the load-bearing surface, the right move is to question whether the assertion is load-bearing, not to lift the cap.
+`exec:<lang>` only via Bash; file I/O via `exec:nodejs` + `fs`; git directly in Bash; never `Bash(node/npm/npx/bun)`. Paths in `exec:nodejs` are platform-literal — use `os.tmpdir()` and `path.join`, reserve `/tmp/...` for `exec:bash` heredocs. Every `exec:<lang>` and `exec:bash` call passes `--timeout-ms <ms>`; on timeout, partial output is preserved and the runner emits `[exec timed out after Nms; partial output above]` — re-issue with a higher budget rather than retrying blindly.
-## RESPONSE POLICY
+`exec:codesearch` only — Grep/Glob/Find/Explore are hook-blocked. Start two words, change/add one per pass, minimum four attempts before concluding absent.
-Terse. Drop filler. Fragments OK. Pattern: `[thing] [action] [reason]. [next step].` Code/commits/PRs = normal prose.
+Pack runs use `Promise.allSettled`, each idea its own try/catch, under 12s per call.
-## CONSTRAINTS
+## Dev workflow encoded in the plan
-**Never**: Bash(node/npm/npx/bun) | skip planning | partial execution | stop while .prd has items | stop while git dirty | sequential independent items | screenshot before JS exhausted | fallback/demo modes | swallow errors | duplicate concern | leave comments | scattered tests | if/else where map suffices | one-liners that obscure | leave resolved unknown un-memorized | batch memorize | serialize memorize spawns
+No comments. 200-line per-file cap. Fail loud. No duplication. Scan before edit. AGENTS.md edits route through the memorize sub-agent only. CHANGELOG.md gets one entry per commit.
-**Always**: invoke Skill at every transition | regress to planning on new unknown | witnessed execution only | scan before edits | enumerate observability gaps every pass | follow chain end-to-end | prefer dispatch tables and pipelines | make wrong states unrepresentable | spawn memorize same turn unknown resolves | end-of-turn self-check
+Minimal-code process, stop at the first that resolves: native → library → structure (map / pipeline) → write.

package/skills/research/SKILL.md CHANGED Viewed

@@ -6,21 +6,17 @@ allowed-tools: Skill, Bash, Agent, WebFetch, WebSearch, Read, Write
 # Research
-Lead orchestrates. Workers fetch. Findings converge. The lead never reads pages — workers do.
+Lead orchestrates. Workers fetch. Findings converge on disk. The lead never reads pages — workers do.
-## Shape of the work
+Effort matches stakes. A single fact is one short fetch. A vendor comparison is a handful of workers, each owning one vendor. A landscape survey is ten or more, each owning one axis. Spending a fan-out on a fact wastes tokens; spending a fact-fetch on a landscape under-delivers.
-Breadth first, depth on demand. Open with a wide sweep that maps the terrain; commit deep dives only where the sweep surfaces something load-bearing. A narrow opening misses the alternative the user actually needed.
-Effort matches stakes. A single fact warrants one short fetch. A vendor comparison warrants a handful of workers, each owning one vendor. A landscape survey warrants ten or more, each owning one axis. Spending a fan-out on a fact wastes tokens; spending a fact-fetch on a landscape under-delivers.
-Workers run in parallel. Independent questions launch in one message, never serialized. Serial fan-out is the default failure mode — guard against it explicitly.
+Breadth first, depth on demand. Open with a wide sweep that maps the terrain, then commit deep dives only where the sweep surfaces something load-bearing. A narrow opening misses the alternative the user actually needed.
 ## Worker contract
-Each worker receives: the precise question it owns, the shape of the answer (bullets, table row, prose paragraph), the boundary of what it must not pursue, and the destination path under `.gm/research/<slug>/<worker-id>.md` for its findings. Vague briefs produce vague returns; the lead's job is to make the brief unambiguous before spawning.
+Each worker receives the precise question it owns, the shape of the answer (bullets, table row, prose paragraph), the boundary of what it must not pursue, and the destination path under `.gm/research/<slug>/<worker-id>.md`. Workers write structured findings to disk and return only a path plus a one-line summary. The lead reads the paths it cares about; the rest stay on disk. Returning full prose through the agent boundary burns context that the synthesis pass needs.
-Workers write structured findings to disk and return only a path plus a one-line summary. The lead reads paths it cares about; the rest stay on disk. Returning full prose through the agent boundary burns context that the synthesis pass needs.
+Workers run in parallel — independent questions launch in one message, never serialized.
 ## Citations
@@ -28,13 +24,11 @@ A claim without a source URL is a hallucination waiting to be quoted. Workers at
 ## Source quality
-Content farms and SEO-optimised listicles outrank primary sources on most queries. Prefer vendor docs, RFCs, primary repos, dated blog posts from named authors, and academic preprints over aggregator pages. When two sources disagree, the older primary usually beats the newer aggregator.
+Vendor docs, RFCs, primary repos, dated blog posts from named authors, and academic preprints beat aggregator pages. When two sources disagree, the older primary usually beats the newer aggregator.
 ## Convergence
-Synthesis happens once, after all workers return. Mid-flight summarisation truncates findings the next worker would have built on. The lead holds the question, the workers' paths, and the synthesis prompt; everything else lives on disk.
-If a worker's return reveals a new axis the original plan missed, expand the fan-out — do not stretch an existing worker past its brief.
+Synthesis happens once, after all workers return. Mid-flight summarisation truncates findings the next worker would have built on. If a worker's return reveals a new axis the original plan missed, expand the fan-out — do not stretch an existing worker past its brief.
 ## When not to fan out
@@ -42,4 +36,4 @@ One question, one page, one fetch. A single `WebFetch` answers it. The fan-out m
 ## Handoff
-Final answer cites every load-bearing claim, names the workers' output paths for audit, and surfaces disagreements between sources rather than averaging them away.
+Final answer cites every load-bearing claim, names the workers' output paths for audit, and surfaces disagreements rather than averaging them away.

package/skills/ssh/SKILL.md CHANGED Viewed

@@ -3,13 +3,14 @@ name: ssh
 description: Run shell commands on remote SSH hosts via exec:ssh. Reads targets from ~/.claude/ssh-targets.json. Use for deploying, monitoring, or controlling remote machines.
 ---
-# exec:ssh — Remote SSH Execution
+# exec:ssh — Remote SSH execution
-Runs shell commands on remote host. No manual connection needed.
+Runs shell commands on a remote host. No manual connection needed.
 ## Setup
 `~/.claude/ssh-targets.json`:
 ```json
 {
   "default": { "host": "192.168.1.10", "port": 22, "username": "pi", "password": "pass" },
@@ -17,7 +18,7 @@ Runs shell commands on remote host. No manual connection needed.
 }
 ```
-Fields: `host` (required), `port` (default 22), `username` (required), `password` OR `keyPath` + optional `passphrase`.
+`host` and `username` required. `port` defaults to 22. Auth: `password` OR `keyPath` + optional `passphrase`.
 ## Usage
@@ -26,28 +27,32 @@ exec:ssh
 <shell command>
 ```
-Named host with `@name` on first line:
+Named host with `@name` on the first line:
 ```
 exec:ssh
 @prod
 sudo systemctl restart myapp
 ```
-## Process Persistence
+## Process persistence
+SSH kills child processes on close. To survive disconnect:
-SSH kills child processes on close. To persist:
 ```
 exec:ssh
 sudo systemctl reset-failed myunit 2>/dev/null; systemd-run --unit=myunit bash -c 'your-command'
 ```
-Unique name:
+Unique unit name per launch:
 ```
 exec:ssh
 systemd-run --unit=job-$(date +%s) bash -c 'nohup myprogram &'
 ```
-Fallback (no systemd):
+No-systemd fallback:
 ```
 exec:ssh
 setsid nohup bash -c 'myprogram > /tmp/out.log 2>&1' &
@@ -55,7 +60,8 @@ setsid nohup bash -c 'myprogram > /tmp/out.log 2>&1' &
 ## Dependency
-Requires `ssh2` npm package in `~/.claude/gm-tools`:
+Requires `ssh2` in `~/.claude/gm-tools`:
 ```
 exec:bash
 cd ~/.claude/gm-tools && npm install ssh2