npm - gm-skill - Versions diffs - 2.0.1485 → 2.0.1486 - Mend

gm-skill 2.0.1485 → 2.0.1486

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/AGENTS.md CHANGED Viewed

@@ -96,6 +96,12 @@ Every possible skill's `allowed-tools:` frontmatter is reduced to `Skill, Read,
 **"Every possible" is the load-bearing test, applied to every noun the request names**: PLAN-phase PRD construction is exhaustive, not minimal. The phrase "every possible" is how the agent thickens the cover, every possible task, every possible validation, every possible mutable, every possible corner case, every possible caveat, every possible failure mode, every possible interaction, every possible empty/overflow/reentry state, every possible degenerate input, each application of the phrase produces PRD rows. A non-trivial request that yields a single-digit PRD has not finished enumerating; the closure is still in the agent's head, not in plugkit's store, and the chain will converge on a thin slice that leaves silent residuals. After the first PRD pass, the existing list becomes input to a second transform, for each row, the agent asks what every possible corner case looks like, and writes those as additional rows. The expansion is closed when applying "every possible" to the current list yields nothing new, not when the agent feels done. Validations, edge cases, and anticipated mutables are first-class PRD rows, never implicit. Long-horizon requests routinely produce PRDs in the high tens or hundreds; the row count is the resolution of the cover, and resolution is what the user asked for when they handed over a long-horizon prompt. Sparse PRDs under-specify the closure and orphan the work; dense PRDs make completion observable.
+**Every possible aspect that can be checked for jank is a PRD row; the architecture is pliable**: at PLAN, for every surface the user's prompt concerns, the agent adds to the PRD a complete enumeration of every possible aspect that can be checked for `jank` — every immaturity, every unfinished edge, every half-wired path — across gui, ux, ui, client state, server state, the client/server state boundary, and any other surface the request reaches. `jank` is the load-bearing word: the agent hunts the rough/unpolished/almost-done, not only outright bugs. Each aspect to improve or validate is its own row, including a profiling row and a security row per surface. The sweep is scoped to the prompt's concern and its reachable closure, not an unbounded repo audit, but exhaustive within it. Every issue found along the way opens its own debug-and-repair plan spooled to the PRD the same turn, never inline-and-forgotten; every outstanding quick improvement is spooled too. The architecture is `pliable`: every possible architectural change that clearly improves it or reduces code-maintenance burden is a spooled PRD plan; replacing bespoke code with native functionality or a very-popular well-maintained library is encouraged ONLY when it reduces the codebase (net-smaller maintained surface) — a heavy dependency added to delete a few lines net-grows maintenance and is the guarded failure mode. Fan-out is the spool-native shape (parallel `prd-add`/`codesearch`/`exec_js` in one block, plugkit task-spawn), never the platform's native Task/Explore subagent. If any tell-tale AI design element is found along the way (boilerplate flourish, over-hedged comment, generic scaffold name, machine-authored shape), one sighting spawns a full-sweep plan that scans every possible part of the codebase for every other tell-tale AI design element and fixes them across the board — spooled to the PRD as its own rows (scan, per-cluster findings, fix-and-verify), exhaustive over every possible file, never a one-off local fix, because a tell-tale left standing anywhere is the tell the whole was machine-shaped.
+**Client-side debugging exposes globals and evaluates in-browser, never blind-restarts**: to debug client-side code the agent surfaces the relevant state as a `window.*` global and reads it live via the `browser` verb's `page.evaluate`, running experiments in the browser, rather than blind experimentation + continuous server restarts. A global + one `page.evaluate` reads actual runtime state in a single dispatch where a restart-and-eyeball loop burns a turn and observes nothing. The live page is the debugger; the same `browser` surface that witnesses an edit (Browser Witness) also diagnoses it.
+**Mundane user-facing output is suppressed or stripped to the bone**: every possible mundane line of user-facing text is suppressed or cut to the bone — drop articles, drop preamble, drop the play-by-play; boot-probe narration, dispatch echoes, restating prose just read, status recaps do not ship. What survives is substantive: a real finding, a decision and its one-line reason, a blocker, the single-line PRD-read declaration. Terse means fewer/shorter words, NEVER zero tool calls and NEVER silent work — the turn still ends in the chain-advancing tool call and the agent still states in one terse clause what it is about to do. Cut the mundane, never the chain.
 **Noticing is a planning event, at every phase, in every dispatch window**: any observation the agent makes during the chain, anything that should be done, anything outstanding, anything unfinished, anything improvable, anything misaligned with user preferences, anything the work itself surfaces about what *else* the work touches, is a `prd-add` the agent dispatches this turn. Observations carried in the response body without conversion to a PRD row evaporate when the turn ends; only the PRD store survives. The default response to noticing is to convert. The discovery surface keeps producing new in-scope items as the chain walks PLAN→EXECUTE→EMIT→VERIFY, every phase has its own noticing-to-PRD pressure. Skipping the conversion ("I'll mention it in the summary" / "future work" / "note for later") is the canonical drift mechanism: the observation does not persist, the future turn does not arrive, the residual goes silent. Density grows along the walk, not just at PLAN-time; a chain that exits PLAN with N rows and reaches COMPLETE with N rows has either had no real discoveries (unlikely on a non-trivial task) or has lost them. When the discovery is structural rather than concrete, "the project would benefit from X", "this surface has no test coverage", "the docs do not mention Y", "the agent's preference for Z is being violated here", it is still a PRD row, written with the witness that motivated it. Preference-aware noticing applies the same conversion: when the agent observes that current state diverges from user-stated preferences (dense PRDs, residual-triage, no name-and-defer, every-possible expansion, browser-witness coverage, push-on-clean), each divergence is a `prd-add` describing what the aligned state looks like.
 **A turn without a tool call is a stop; summary is a stop; both are forbidden until plugkit pronounces COMPLETE**: every programming agent, regardless of vendor, reads only tool calls and their outputs, so an assistant message that ends in prose with no tool call IS the turn ending and the session halts there. This is the mechanical root of the "agent did one small piece and stopped" failure: the model wrote a wrap-up paragraph, emitted no tool call, and the harness read that as done. The rule is therefore absolute and tool-agnostic: while the chain is in-flight (phase ≠ COMPLETE OR prd_pending_count > 0) the agent NEVER ends a turn in prose and NEVER writes a summary/recap/"what I did" closure, every turn terminates in a tool call that advances the chain (`instruction`, the next named verb, `transition`, `phase-status`). The only event that authorizes a prose-only turn is plugkit returning `phase=COMPLETE` AND `prd_pending_count=0`; the agent's own sense that "the work feels done" authorizes nothing. Before any apparent stop or any summary, the agent dispatches `phase-status` and rechecks, a non-terminal phase means the urge to stop was drift, and the recovery is to dispatch `instruction` and continue. This depends on nothing but the verb spool, so it holds on every agent with no hook and no tool-specific feature; any continuation mechanism that relies on a hook or a single tool's behavior is non-portable and must be replaced by this spool-only discipline.
@@ -158,9 +164,9 @@ Orchestration state is tracked via marker files in `.gm/` instead of hook events
 **The skill is the driver, not a post-hoc witness**: when a request carries the standing instruction to use gm-skill (every `/loop` fire, any prompt naming `/gm-skill`), the FIRST working action of the session is `Skill(skill="gm-skill")`, and the skill prose then drives the chain from PLAN through COMPLETE. Dispatching the spool verbs (`instruction`, `transition`, `prd-add`, `prd-resolve`, `memorize-fire`, `residual-scan`, `phase-status`) directly without first entering the skill executes the work outside the skill the user asked to drive it, the spool verbs are the skill's mechanism, not a substitute for invoking it. Entering the skill only at the end to confirm terminal state does NOT satisfy the instruction: the condition is that the skill drives the planned work from inception, not that it witnesses retroactive completion. The boot probe (`cat .gm/exec-spool/.status.json` …) is still prescribed by the skill itself and may precede the invocation; everything that mutates state or advances the chain happens inside the skill-driven session.
-**Dead-watcher recovery prefers `bun x gm-plugkit@latest spool`, not direct-node boot**: when the watcher is dead and you reboot, use `bun x gm-plugkit@latest spool` (the npm-fetch path) rather than `node ~/.gm-tools/plugkit-wasm-wrapper.js spool`. The direct-node form runs the installed wrapper as-is and skips `ensureWrapperInstalled`, so a just-published wrapper fix stays undeployed, the running watcher carries stale wrapper code even though npm has the fix. The `bun x` path sha-checks and refreshes the installed wrapper before booting. After any wrapper-level fix is published, verify deployment by grepping the installed `~/.gm-tools/plugkit-wasm-wrapper.js` for the new code, not just the npm version, npm-published is not the same as installed-and-running.
+**Dead-watcher recovery uses `bun x gm-plugkit@latest spool`, never direct-node boot** (mechanism + wrapper-deploy verification in rs-learn: `recall: dead-watcher recovery bun x not direct-node`).
-**The first verb after any multi-minute wait is `instruction`, to reset the long-gap clock**: a version transition (kill + `bun x gm-plugkit@latest spool` downloads a ~149MB wasm, minutes of idle), a long CI watch, or any wait that exceeds the 300s `long_gap_threshold_ms` leaves the gap clock run out, so the very next non-`instruction` verb (`memorize-fire`, `learn`, `browser`, …) trips `deviation.long-gap-no-instruction` from `rs-plugkit/gates`. The gate is correct, the deviation is yours, and it is fully predictable: after waiting out a version transition or any multi-minute pause, dispatch `instruction` FIRST (it resets `last_instruction_ts`), then proceed. Recovering verb-by-verb after each gate denial works but emits a burst of deviations; pre-empting with one `instruction` dispatch emits zero.
+**The first verb after any multi-minute wait is `instruction`, to reset the long-gap clock** (mechanism in rs-learn: `recall: first verb after multi-minute wait instruction long-gap`).
 **A stop-hook firing on a terminal chain does not authorize re-polling**: when a stop-hook or unsatisfiable condition fires while the chain is already at `phase=COMPLETE` AND `prd_pending_count=0`, re-dispatching `instruction` or `phase-status` to "re-confirm" terminality is itself a deviation, it emits `deviation.complete-chain-poll` (`instructions/mod.rs`) and marks the agent as polling a closed chain. COMPLETE already authorizes the prose-only turn; the hook cannot be satisfied by more poll dispatches over elapsed work, and re-running already-committed work to manufacture skill-driven activity is the fabrication `Nothing Fake` forbids. Two admissible responses only: (a) a prose-only turn (the COMPLETE pronouncement is in hand), or (b) genuinely new planned work opened with a FRESH `{"prompt":...}` body, which resets phase to PLAN and is driven through the skill from inception. Repeatedly answering the same already-acknowledged hook is a loop; state the terminal facts once and stop, or open new work.

package/bin/plugkit.version CHANGED Viewed

	@@ -1 +1 @@
1	- 0.1.~~608~~
1	+ 0.1.609

package/bin/plugkit.wasm.sha256 CHANGED Viewed

	@@ -1 +1 @@
1	- ~~0e041f2a24bd58b2d40544039648372a243f6ee0ac142e152386ee809b9e924e~~ plugkit.wasm
1	+ 9fafcc6b4beac2314e81f838fac3e1fadcfadb0a4db890459076536be46cfc25 plugkit.wasm

package/gm-plugkit/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "gm-plugkit",
-  "version": "2.0.1485",
+  "version": "2.0.1486",
   "description": "Bootstrap and daemon-spawn tool for gm plugkit binary. Downloads the correct platform binary, verifies SHA256, and starts the spool watcher daemon. Includes plugkit-wasm-wrapper for WASM-based spool watching.",
   "main": "index.js",
   "bin": {

package/gm.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "gm",
-  "version": "2.0.1485",
+  "version": "2.0.1486",
   "description": "Spool-dispatch orchestration engine with unified state machine, skills, and automated git enforcement",
   "author": "AnEntrypoint",
   "license": "MIT",
@@ -17,5 +17,5 @@
   "publishConfig": {
     "access": "public"
   },
-  "plugkitVersion": "0.1.608"
+  "plugkitVersion": "0.1.609"
 }

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "gm-skill",
-  "version": "2.0.1485",
+  "version": "2.0.1486",
   "description": "Canonical universal harness — AI-native software engineering via skill-driven orchestration; bootstraps plugkit for task execution and session isolation. Install in any AI coding agent host.",
   "author": "AnEntrypoint",
   "license": "MIT",

package/skills/gm-skill/SKILL.md CHANGED Viewed

@@ -24,6 +24,8 @@ Every turn: dispatch `instruction` (you are the one dispatching it), read the re
 **Client-side edits are gated by Browser Witness (paper §23, hard rule).** If you Write or Edit any file with a client-side extension, `.html`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`, `.mjs`, `.css`, or anything loaded from an HTML entry, you dispatch the `browser` verb in the **same turn** with a `page.evaluate` body that asserts the invariant the edit establishes. The transition gate refuses `transition to=COMPLETE` until `.turn-browser-witnessed` covers every entry in `.turn-browser-edits.json` with matching sha; absence or mismatch fires `deviation.client-edit-no-witness`. There is no "validate later", later does not arrive in the chain you are walking; the same response that contains the client-side Write/Edit also contains the `browser` Write + Read.
+**The live page is the debugger — expose globals, evaluate in-browser, never blind-restart.** To debug client-side code you expose the relevant state as a `window.*` global and read it live through the `browser` verb's `page.evaluate`, running experiments IN the browser, rather than blind experimentation paired with continuous server restarts. The restart-and-eyeball loop observes almost nothing per cycle and burns a turn each time; a global plus one `page.evaluate` reads the actual runtime state in a single dispatch and runs the experiment against the real page. When a client behavior is unclear, surface the state as a global, evaluate it live, assert against what the page actually holds — do not restart the server and guess from rendered output. The same `browser` surface that witnesses an edit also diagnoses it.
 **Search routes through the spool, never a platform search agent.** For any code, file, or symbol search — whereabouts, "where is X defined", "what calls Y", grepping the tree — you dispatch the `codesearch` verb (`.gm/exec-spool/in/codesearch/<N>.txt` with `{"query":"..."}`), and for prior-knowledge you dispatch `recall`. You do NOT reach for the platform's Explore agent, a Task/general-purpose search subagent, raw `grep`/`Glob`, or any host-native code-search; those are not substitutes for `codesearch`, exactly as puppeteer is not a substitute for the `browser` verb. They bypass the spool, the committed code-search index, and the recall-grounded discipline — the search becomes invisible to gmsniff, ungrounded in what the project already learned, and non-portable across harnesses. The orient fan-out at PLAN is `recall` + `codesearch` in parallel; every ad-hoc lookup mid-EXECUTE is a `codesearch` dispatch too. Reaching outside the spool for search is the same drift as reaching outside it for the browser: the capability exists as a verb, so you use the verb.
 **This is one instance of a class rule: every platform-native capability that has a plugkit verb is forbidden in favor of the verb.** Your `allowed-tools` already blocks raw shell beyond the boot commands, but a harness can still offer the capability as its own first-class tool or subagent that slips past that restriction — a search/Explore agent, a web-fetch or web-search tool, a plan/architect subagent, a notebook editor. For each, the plugkit verb is the only admissible surface: code/file/symbol search → `codesearch`; prior knowledge → `recall`; fetching a URL or searching the web → the `fetch` verb (`.gm/exec-spool/in/fetch/`); running code → `exec_js` / the exec spool; a real browser → the `browser` verb; persisting memory → `memorize-fire`; **any git operation → the git verbs** (`git_status`/`git_log`/`git_diff`/`git_show`/`git_branch` to inspect, `git_add`/`git_commit`/`git_finalize`/`git_push` to stage-commit-push, `git_checkout`/`git_fetch`/`git_rm`/`git_revert`/`git_reset` to mutate) — `git_finalize {message}` bundles add→commit→porcelain-gate→push in ONE dispatch and is the COMPLETE-phase push surface, so you never shell `git` via Bash and never spend 4 tool-use events on what is one verb; a `bash`/`sh`/`powershell` body that invokes git is gated (`deviation.bash-git-bypass`). The native tool is never the substitute, for the same three reasons every time: it bypasses the spool (invisible to the ledger), it bypasses the project's committed index and learned memory (ungrounded), and it is non-portable across harnesses (a different agent host has a different native tool, so a discipline built on the native tool does not transport — only the verb does). When you reach for any capability, the question is not "what tool does my platform give me" but "what verb does plugkit expose for this"; if a verb exists, the native tool is off-limits, and if no verb exists the gap is a missing verb to add, not a license to reach around the spool.
@@ -64,6 +66,12 @@ The chain is not COMPLETE until your changes are on origin. Commit and push at t
 **Apply "every possible" to every noun.** PLAN is exhaustive, not minimal. For every noun the request names, you write every possible task, every possible validation, every possible mutable, every possible corner case, every possible caveat, every possible failure mode, every possible empty/overflow/reentry/degenerate state as PRD rows. Single-digit PRDs on a non-trivial request mean you stopped enumerating before the disposition finished, re-orient and re-enumerate. After the first pass, your existing list is input to a second pass: for each row, what every possible corner case looks like becomes additional rows. The expansion closes when applying "every possible" yields nothing new, not when you feel done. Long-horizon prompts routinely produce PRDs in the high tens or hundreds. Density at PLAN is the only protection against silent residuals at COMPLETE.
+**Sweep every possible aspect for jank, and every aspect is a PRD row.** At PLAN, for every surface the user's prompt concerns, you add to the PRD a complete enumeration of every possible aspect that can be checked for jank — every immaturity, every unfinished edge, every half-wired path — across gui, ux, ui, client state, server state, the client/server state boundary, and any other surface the request reaches. `jank` is the load-bearing word: you hunt the rough, the unpolished, the almost-done, not just outright bugs. Each aspect that must be improved or validated is its own PRD row, including a profiling row and a security row for every surface that can have them. The sweep is scoped to what the prompt concerns and its reachable closure, not an unbounded repo-wide audit — but within that closure it is exhaustive. Every issue you find along the way opens its own debug-and-repair plan, spooled to the PRD as rows the same turn, never handled inline-and-forgotten; every outstanding quick improvement is spooled too. Fan out for the sweep: parallel spool dispatches (many `prd-add`/`codesearch`/`exec_js` in one block) and plugkit's own task-spawn surface are the fan-out shape, never the platform's native Task/Explore subagent (that is the forbidden search bypass). You fan out subagents for everything that parallelizes.
+**One tell-tale AI design element spawns a full-codebase sweep.** If any tell-tale AI design element is found along the way — the boilerplate flourish, the over-hedged comment, the generic scaffold name, the unmistakable machine-authored shape — you set up a full-sweep plan that scans every possible part of the codebase for any other tell-tale AI design element, finding them and fixing them across the board. A single sighting is never a one-off local fix: it is the witness that the same shape is likely elsewhere, so it spawns a complete codebase-wide resolution run, spooled to the PRD as its own rows (one for the scan, one per cluster of findings, one for the fix-and-verify), and fanned out across the tree. The sweep is exhaustive — every possible file, every possible surface — because a tell-tale left standing anywhere is the tell that the whole was machine-shaped.
+**Treat the architecture as pliable.** `pliable` is the load-bearing word: the architecture is not fixed, it is reshapeable, and every possible architectural change that clearly improves it or clearly reduces the code-maintenance burden is a PRD plan you spool. Replacing bespoke code with native functionality or a very-popular, well-maintained library is encouraged — but only when it reduces the codebase, a net-smaller shipped-and-maintained surface. Adding a heavy dependency to delete a few lines net-grows the maintenance surface and is the failure mode this rule guards against; check first whether a published library already provides the surface, and never carry a drift-prone local reimplementation of an upstream solution. You make every improvement that is clearly outstanding.
 **Noticing is a planning event.** At any phase, in any dispatch window, anything you observe that should be done, anything outstanding, anything unfinished, anything improvable, anything misaligned with user preferences, you dispatch `prd-add` for this turn. Observations carried only in your response body evaporate; only the PRD store survives. The default response to noticing is to convert. "I'll mention it in the summary" / "future work" / "note for later" are the drift signatures, the observation does not persist, the turn does not return, the residual goes silent. Density grows along the walk, not just at PLAN-time. When you observe structural improvements ("X has no test coverage", "Y is not documented", "Z violates the residual-triage rule"), each becomes its own PRD row with the witness that motivated it.
 `git push` is admissible only when `git status --porcelain` reads empty, and the porcelain probe must be its own Bash **tool-use event** before the push, a separate `Bash(...)` invocation, not a `&&`-chained command within one Bash call. ccsniff `--git-discipline` scans the last 20 Bash tool-use events (not shell commands within those events) for an explicit `git status --porcelain` (or `-s`); putting `add && commit && push` into one Bash call counts as one event with no porcelain witness, even though `git push` is technically the third shell command. The witness lives in the tool-call stream, not the shell stream. The discipline is **three Bash tool-use events** visible in the transcript: `Bash(git status --porcelain)` → read empty → `Bash(git push)`. You dispatch the `git_push` verb (not raw Bash) when possible, it gates on the porcelain probe internally, refuses dirty, and emits `deviation.push-dirty`. A raw `git push` Bash event without a preceding porcelain-probe Bash event is itself a deviation, regardless of how clean the worktree is by construction. Witness clean via `git_status`; witness pushed-to-remote via `branch_status` (ahead==0). The residual-scan and COMPLETE gate both refuse a dirty tree or a missing residual-check marker.
@@ -72,6 +80,8 @@ The chain is not COMPLETE until your changes are on origin. Commit and push at t
 Response body is not a mutation surface either. Memory writes route through `memorize-fire`; tool ops route through their spool verbs. Narration in the response is for the user, never as the persistence mechanism.
+**Suppress mundane output; strip it to the bone.** Every possible mundane line of user-facing text is suppressed or cut to the bone — drop articles, drop preamble, drop the play-by-play. Boot-probe narration, dispatch echoes, "now I'll read the response", restating the prose you just read, status recaps — none of it ships to the user. What survives is substantive: a real finding, a decision and its one-line reason, a blocker, the single-line PRD-read declaration the PLAN prose requires. Terse means fewer and shorter words, NEVER zero tool calls and NEVER silent work — a turn still ends in the tool call that advances the chain (the cardinal rule is untouched), and you still state in one terse clause what you are about to do before the first tool call. You cut the mundane, you do not cut the chain. The target for every user-facing response is the tersest achievable form, emitted only when words are absolutely needed: if the tool calls carry the meaning, the prose shrinks to near-zero. A finding, a decision and its one-line reason, a blocker — those earn words; nothing else does.
 **Prune bad memory on sight, a wrong recall hit is worse than a miss.** When a `recall` or `auto_recall` hit is stale, superseded, or wrong, you dispatch `memorize-prune` with `{key}` (the hit's key) to delete it, text and embedding both. Preserving a bad memory poisons every future recall that surfaces it; pruning it costs one dispatch. Pruning bad memory matters more than preserving good memory. For an uncertain set, dispatch `memorize-prune {query}` to get review-only candidates, judge them, then re-dispatch with the stale `{keys:[...]}`, never a blind similarity-delete.
 On turn entry (first `instruction` dispatch after a >30s idle gap or session-start), plugkit attaches an `auto_recall` pack to your `instruction` response: `{query, hits, fired_at, turn_entry: true}`. The query is derived from `.gm/last-prompt.txt` / `.gm/turn-state.json`; hits are the top recall results plugkit pulled before serving your instruction. Read `auto_recall.hits` alongside the existing `recall_hits` (which is the phase+PRD-subject pack), both surface prior memory, but `auto_recall` is the per-turn user-prompt pack and only fires on turn entry. Subsequent `instruction` dispatches in the same turn carry no `auto_recall` field (or carry the same pack from the turn-start fire); do not re-trigger it manually.