gm-plugkit 2.0.1484 → 2.0.1486

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/SKILL.md CHANGED
@@ -24,6 +24,8 @@ Every turn: dispatch `instruction` (you are the one dispatching it), read the re
24
24
 
25
25
  **Client-side edits are gated by Browser Witness (paper §23, hard rule).** If you Write or Edit any file with a client-side extension, `.html`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`, `.mjs`, `.css`, or anything loaded from an HTML entry, you dispatch the `browser` verb in the **same turn** with a `page.evaluate` body that asserts the invariant the edit establishes. The transition gate refuses `transition to=COMPLETE` until `.turn-browser-witnessed` covers every entry in `.turn-browser-edits.json` with matching sha; absence or mismatch fires `deviation.client-edit-no-witness`. There is no "validate later", later does not arrive in the chain you are walking; the same response that contains the client-side Write/Edit also contains the `browser` Write + Read.
26
26
 
27
+ **The live page is the debugger — expose globals, evaluate in-browser, never blind-restart.** To debug client-side code you expose the relevant state as a `window.*` global and read it live through the `browser` verb's `page.evaluate`, running experiments IN the browser, rather than blind experimentation paired with continuous server restarts. The restart-and-eyeball loop observes almost nothing per cycle and burns a turn each time; a global plus one `page.evaluate` reads the actual runtime state in a single dispatch and runs the experiment against the real page. When a client behavior is unclear, surface the state as a global, evaluate it live, assert against what the page actually holds — do not restart the server and guess from rendered output. The same `browser` surface that witnesses an edit also diagnoses it.
28
+
27
29
  **Search routes through the spool, never a platform search agent.** For any code, file, or symbol search — whereabouts, "where is X defined", "what calls Y", grepping the tree — you dispatch the `codesearch` verb (`.gm/exec-spool/in/codesearch/<N>.txt` with `{"query":"..."}`), and for prior-knowledge you dispatch `recall`. You do NOT reach for the platform's Explore agent, a Task/general-purpose search subagent, raw `grep`/`Glob`, or any host-native code-search; those are not substitutes for `codesearch`, exactly as puppeteer is not a substitute for the `browser` verb. They bypass the spool, the committed code-search index, and the recall-grounded discipline — the search becomes invisible to gmsniff, ungrounded in what the project already learned, and non-portable across harnesses. The orient fan-out at PLAN is `recall` + `codesearch` in parallel; every ad-hoc lookup mid-EXECUTE is a `codesearch` dispatch too. Reaching outside the spool for search is the same drift as reaching outside it for the browser: the capability exists as a verb, so you use the verb.
28
30
 
29
31
  **This is one instance of a class rule: every platform-native capability that has a plugkit verb is forbidden in favor of the verb.** Your `allowed-tools` already blocks raw shell beyond the boot commands, but a harness can still offer the capability as its own first-class tool or subagent that slips past that restriction — a search/Explore agent, a web-fetch or web-search tool, a plan/architect subagent, a notebook editor. For each, the plugkit verb is the only admissible surface: code/file/symbol search → `codesearch`; prior knowledge → `recall`; fetching a URL or searching the web → the `fetch` verb (`.gm/exec-spool/in/fetch/`); running code → `exec_js` / the exec spool; a real browser → the `browser` verb; persisting memory → `memorize-fire`; **any git operation → the git verbs** (`git_status`/`git_log`/`git_diff`/`git_show`/`git_branch` to inspect, `git_add`/`git_commit`/`git_finalize`/`git_push` to stage-commit-push, `git_checkout`/`git_fetch`/`git_rm`/`git_revert`/`git_reset` to mutate) — `git_finalize {message}` bundles add→commit→porcelain-gate→push in ONE dispatch and is the COMPLETE-phase push surface, so you never shell `git` via Bash and never spend 4 tool-use events on what is one verb; a `bash`/`sh`/`powershell` body that invokes git is gated (`deviation.bash-git-bypass`). The native tool is never the substitute, for the same three reasons every time: it bypasses the spool (invisible to the ledger), it bypasses the project's committed index and learned memory (ungrounded), and it is non-portable across harnesses (a different agent host has a different native tool, so a discipline built on the native tool does not transport — only the verb does). When you reach for any capability, the question is not "what tool does my platform give me" but "what verb does plugkit expose for this"; if a verb exists, the native tool is off-limits, and if no verb exists the gap is a missing verb to add, not a license to reach around the spool.
@@ -64,6 +66,12 @@ The chain is not COMPLETE until your changes are on origin. Commit and push at t
64
66
 
65
67
  **Apply "every possible" to every noun.** PLAN is exhaustive, not minimal. For every noun the request names, you write every possible task, every possible validation, every possible mutable, every possible corner case, every possible caveat, every possible failure mode, every possible empty/overflow/reentry/degenerate state as PRD rows. Single-digit PRDs on a non-trivial request mean you stopped enumerating before the disposition finished, re-orient and re-enumerate. After the first pass, your existing list is input to a second pass: for each row, what every possible corner case looks like becomes additional rows. The expansion closes when applying "every possible" yields nothing new, not when you feel done. Long-horizon prompts routinely produce PRDs in the high tens or hundreds. Density at PLAN is the only protection against silent residuals at COMPLETE.
66
68
 
69
+ **Sweep every possible aspect for jank, and every aspect is a PRD row.** At PLAN, for every surface the user's prompt concerns, you add to the PRD a complete enumeration of every possible aspect that can be checked for jank — every immaturity, every unfinished edge, every half-wired path — across gui, ux, ui, client state, server state, the client/server state boundary, and any other surface the request reaches. `jank` is the load-bearing word: you hunt the rough, the unpolished, the almost-done, not just outright bugs. Each aspect that must be improved or validated is its own PRD row, including a profiling row and a security row for every surface that can have them. The sweep is scoped to what the prompt concerns and its reachable closure, not an unbounded repo-wide audit — but within that closure it is exhaustive. Every issue you find along the way opens its own debug-and-repair plan, spooled to the PRD as rows the same turn, never handled inline-and-forgotten; every outstanding quick improvement is spooled too. Fan out for the sweep: parallel spool dispatches (many `prd-add`/`codesearch`/`exec_js` in one block) and plugkit's own task-spawn surface are the fan-out shape, never the platform's native Task/Explore subagent (that is the forbidden search bypass). You fan out subagents for everything that parallelizes.
70
+
71
+ **One tell-tale AI design element spawns a full-codebase sweep.** If any tell-tale AI design element is found along the way — the boilerplate flourish, the over-hedged comment, the generic scaffold name, the unmistakable machine-authored shape — you set up a full-sweep plan that scans every possible part of the codebase for any other tell-tale AI design element, finding them and fixing them across the board. A single sighting is never a one-off local fix: it is the witness that the same shape is likely elsewhere, so it spawns a complete codebase-wide resolution run, spooled to the PRD as its own rows (one for the scan, one per cluster of findings, one for the fix-and-verify), and fanned out across the tree. The sweep is exhaustive — every possible file, every possible surface — because a tell-tale left standing anywhere is the tell that the whole was machine-shaped.
72
+
73
+ **Treat the architecture as pliable.** `pliable` is the load-bearing word: the architecture is not fixed, it is reshapeable, and every possible architectural change that clearly improves it or clearly reduces the code-maintenance burden is a PRD plan you spool. Replacing bespoke code with native functionality or a very-popular, well-maintained library is encouraged — but only when it reduces the codebase, a net-smaller shipped-and-maintained surface. Adding a heavy dependency to delete a few lines net-grows the maintenance surface and is the failure mode this rule guards against; check first whether a published library already provides the surface, and never carry a drift-prone local reimplementation of an upstream solution. You make every improvement that is clearly outstanding.
74
+
67
75
  **Noticing is a planning event.** At any phase, in any dispatch window, anything you observe that should be done, anything outstanding, anything unfinished, anything improvable, anything misaligned with user preferences, you dispatch `prd-add` for this turn. Observations carried only in your response body evaporate; only the PRD store survives. The default response to noticing is to convert. "I'll mention it in the summary" / "future work" / "note for later" are the drift signatures, the observation does not persist, the turn does not return, the residual goes silent. Density grows along the walk, not just at PLAN-time. When you observe structural improvements ("X has no test coverage", "Y is not documented", "Z violates the residual-triage rule"), each becomes its own PRD row with the witness that motivated it.
68
76
 
69
77
  `git push` is admissible only when `git status --porcelain` reads empty, and the porcelain probe must be its own Bash **tool-use event** before the push, a separate `Bash(...)` invocation, not a `&&`-chained command within one Bash call. ccsniff `--git-discipline` scans the last 20 Bash tool-use events (not shell commands within those events) for an explicit `git status --porcelain` (or `-s`); putting `add && commit && push` into one Bash call counts as one event with no porcelain witness, even though `git push` is technically the third shell command. The witness lives in the tool-call stream, not the shell stream. The discipline is **three Bash tool-use events** visible in the transcript: `Bash(git status --porcelain)` → read empty → `Bash(git push)`. You dispatch the `git_push` verb (not raw Bash) when possible, it gates on the porcelain probe internally, refuses dirty, and emits `deviation.push-dirty`. A raw `git push` Bash event without a preceding porcelain-probe Bash event is itself a deviation, regardless of how clean the worktree is by construction. Witness clean via `git_status`; witness pushed-to-remote via `branch_status` (ahead==0). The residual-scan and COMPLETE gate both refuse a dirty tree or a missing residual-check marker.
@@ -72,6 +80,8 @@ The chain is not COMPLETE until your changes are on origin. Commit and push at t
72
80
 
73
81
  Response body is not a mutation surface either. Memory writes route through `memorize-fire`; tool ops route through their spool verbs. Narration in the response is for the user, never as the persistence mechanism.
74
82
 
83
+ **Suppress mundane output; strip it to the bone.** Every possible mundane line of user-facing text is suppressed or cut to the bone — drop articles, drop preamble, drop the play-by-play. Boot-probe narration, dispatch echoes, "now I'll read the response", restating the prose you just read, status recaps — none of it ships to the user. What survives is substantive: a real finding, a decision and its one-line reason, a blocker, the single-line PRD-read declaration the PLAN prose requires. Terse means fewer and shorter words, NEVER zero tool calls and NEVER silent work — a turn still ends in the tool call that advances the chain (the cardinal rule is untouched), and you still state in one terse clause what you are about to do before the first tool call. You cut the mundane, you do not cut the chain. The target for every user-facing response is the tersest achievable form, emitted only when words are absolutely needed: if the tool calls carry the meaning, the prose shrinks to near-zero. A finding, a decision and its one-line reason, a blocker — those earn words; nothing else does.
84
+
75
85
  **Prune bad memory on sight, a wrong recall hit is worse than a miss.** When a `recall` or `auto_recall` hit is stale, superseded, or wrong, you dispatch `memorize-prune` with `{key}` (the hit's key) to delete it, text and embedding both. Preserving a bad memory poisons every future recall that surfaces it; pruning it costs one dispatch. Pruning bad memory matters more than preserving good memory. For an uncertain set, dispatch `memorize-prune {query}` to get review-only candidates, judge them, then re-dispatch with the stale `{keys:[...]}`, never a blind similarity-delete.
76
86
 
77
87
  On turn entry (first `instruction` dispatch after a >30s idle gap or session-start), plugkit attaches an `auto_recall` pack to your `instruction` response: `{query, hits, fired_at, turn_entry: true}`. The query is derived from `.gm/last-prompt.txt` / `.gm/turn-state.json`; hits are the top recall results plugkit pulled before serving your instruction. Read `auto_recall.hits` alongside the existing `recall_hits` (which is the phase+PRD-subject pack), both surface prior memory, but `auto_recall` is the per-turn user-prompt pack and only fires on turn entry. Subsequent `instruction` dispatches in the same turn carry no `auto_recall` field (or carry the same pack from the turn-start fire); do not re-trigger it manually.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "gm-plugkit",
3
- "version": "2.0.1484",
3
+ "version": "2.0.1486",
4
4
  "description": "Bootstrap and daemon-spawn tool for gm plugkit binary. Downloads the correct platform binary, verifies SHA256, and starts the spool watcher daemon. Includes plugkit-wasm-wrapper for WASM-based spool watching.",
5
5
  "main": "index.js",
6
6
  "bin": {
@@ -2757,6 +2757,16 @@ async function runSpoolWatcher(instance, spoolDir) {
2757
2757
  const relPath = path.relative(inDir, filePath);
2758
2758
  const dir = path.dirname(relPath);
2759
2759
  const verb = dir === '.' ? path.basename(filePath, path.extname(filePath)) : dir;
2760
+ // Defense-in-depth beyond walkDir's dot-dir skip: a real verb is a single clean
2761
+ // segment (e.g. instruction, prd-resolve). A derived verb containing a path
2762
+ // separator or a dot-segment means the file lives under a stray nested spool
2763
+ // (in/prd-resolve/.gm/exec-spool/…); dispatching it builds a bogus verb+outName
2764
+ // and ENOENT-storms every tick. Skip + unmark so it never re-enters the loop.
2765
+ if (/[\\/]/.test(verb) || verb.split(/[\\/]/).some(seg => seg.startsWith('.'))) {
2766
+ try { logEvent('plugkit', 'spool.skip-nested-verb', { rel: relPath, derived_verb: verb }); } catch (_) {}
2767
+ unmarkProcessed(key);
2768
+ return;
2769
+ }
2760
2770
  let body = content.trim() || '{}';
2761
2771
  const taskBase = path.basename(filePath, path.extname(filePath));
2762
2772
 
package/plugkit.version CHANGED
@@ -1 +1 @@
1
- 0.1.608
1
+ 0.1.609