gm-skill 2.0.1614 → 2.0.1616
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +4 -4
- package/gm-plugkit/instructions/browser.md +6 -4
- package/gm-plugkit/instructions/emit.md +1 -1
- package/gm-plugkit/instructions/entry.md +2 -2
- package/gm-plugkit/instructions/execute.md +3 -3
- package/gm-plugkit/instructions/plan.md +1 -1
- package/gm-plugkit/instructions/update_docs.md +2 -2
- package/gm-plugkit/instructions/verify.md +4 -4
- package/gm-plugkit/package.json +1 -1
- package/gm-plugkit/plugkit-wasm-wrapper.js +19 -1
- package/gm.json +1 -1
- package/package.json +1 -1
- package/prompts/prompt-submit.txt +11 -11
- package/skills/gm-skill/SKILL.md +3 -3
package/AGENTS.md
CHANGED
@@ -60,13 +60,13 @@ Record only non-obvious technical caveats that cost multiple runs to discover; r

 **No comments in code** -- no inline, block, or JSDoc comments anywhere (source, generated output, hooks, scripts). `test.js checkNoComments()` is the structural guard (fails on any leading `//` over tracked `.js/.mjs/.cjs`); one sighting spawns the full-tree sweep.

-**No UTF-8 BOM in any tracked source file
+**No UTF-8 BOM in any tracked source file** -- always `-Encoding utf8` (no BOM) or the `Write` tool; PowerShell defaults betray this. `test.js checkNoBom()` is the structural guard; one sighting spawns the full-tree sweep. Cause + breakage mechanics in rs-learn (`recall: BOM regression incident`).

-**No graphical symbols; convert to industry-standard text on sight.**
+**No graphical symbols; convert to industry-standard text on sight.** Any non-ASCII decorative glyph (arrows, box/geometric glyphs, stars, dots, bullets, checks/crosses, emojis) is forbidden in all output and source -- convert it to its plain-ASCII equivalent the same turn (the word, `->`, `-`/`*`, `[x]`/`[ ]`, done/todo/pass/fail). Tell-tale-AI class: one sighting spawns the full-codebase sweep, never a one-off edit. Exempt: functional code operators (`=>`, `??`, `?.`, comparison/math), frozen changelog/git-log entries, binary stores, intentional icon-font/CSS-content product glyphs. `ccsniff --glyph-discipline` flags decorative glyphs post-hoc (run each audit, like `--git-discipline`/`--search-discipline`).

 **Skill SKILL.md files:** strip explanatory prose; keep ONLY invocation syntax, transition markers (`->`), gate conditions, constraint lists, exact-usage code examples.

-**Implicit, not explicit, in skill prose.** Skill files (and prompt-submit.txt) elicit behavior, they do not describe it: terse imperative principles that trigger already-learned dispositions, not numbered procedures. Forbidden: step-by-step recipes, "see paper section X", citations to the site/papers, multi-step manuals. A skill that reads like a manual gets imitated as a script and breaks at the first edge case. The papers and site are outputs of the discipline, not inputs; never link from a skill into the docs. Cross-cutting rules needing a citation belong here, not in skills.
+**Implicit, not explicit, in skill prose.** Skill files (and prompt-submit.txt) elicit behavior, they do not describe it: terse imperative principles that trigger already-learned dispositions, not numbered procedures. A passage describes when the agent could re-derive it from the goal (a recipe, a do-X-then-Y sequence, a trigger-instance list, over-explained rationale, restated code-mechanism); it elicits when it constructs a predicament where the wrong move is structurally incoherent or self-evidently loses. Convert the former to the latter, but the boot-edge ABI a wrong guess breaks -- exact spool paths, JSON field names, verb names, file globs, deviation identifiers, gate names -- is non-derivable mechanism that stays explicit; only derivable procedure converts. Forbidden: step-by-step recipes, "see paper section X", citations to the site/papers, multi-step manuals. A skill that reads like a manual gets imitated as a script and breaks at the first edge case. The papers and site are outputs of the discipline, not inputs; never link from a skill into the docs. Cross-cutting rules needing a citation belong here, not in skills.

 ## Build

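The three source-hygiene rules in this hunk each name a guard: `checkNoComments()`, `checkNoBom()` and `ccsniff --glyph-discipline`. The diff does not show any of their code. A minimal sketch of equivalent checks follows. It assumes the guards work on file contents already in memory. All function names and the glyph-to-ASCII table are hypothetical, not the package's implementation.

```javascript
// Hypothetical stand-ins for checkNoBom(), checkNoComments(), and the
// glyph-discipline audit named in AGENTS.md; none of these names are
// exported by gm-skill or gm-plugkit.

// A UTF-8 byte-order mark is the byte sequence EF BB BF at offset 0.
// Works on a Node Buffer or any Uint8Array.
function hasUtf8Bom(bytes) {
  return bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
}

// Lines whose first non-blank characters are `//`, the shape the note says
// checkNoComments() fails on. Returns 1-based line numbers.
function leadingLineComments(source) {
  return source
    .split('\n')
    .map((text, i) => ({ line: i + 1, text }))
    .filter(({ text }) => text.trimStart().startsWith('//'))
    .map(({ line }) => line);
}

// Small illustrative table; a real audit would cover far more glyphs.
const GLYPH_TO_ASCII = new Map([
  ['\u2192', '->'],  // rightwards arrow
  ['\u2022', '-'],   // bullet
  ['\u2713', '[x]'], // check mark
  ['\u2610', '[ ]'], // empty ballot box
]);

// Replace known glyphs with their ASCII equivalents, code point by code point.
function asciifyGlyphs(text) {
  let out = '';
  for (const ch of text) out += GLYPH_TO_ASCII.get(ch) ?? ch;
  return out;
}

// Remaining non-ASCII code points, as hex, for a human to judge against
// the exemptions (frozen changelogs, product icon glyphs, and so on).
function nonAsciiCodePoints(text) {
  const hits = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp > 0x7f) hits.push(cp.toString(16));
  }
  return hits;
}
```

A line-prefix scan like this also matches `//` at the start of a line inside a multi-line string or template literal, so a guard built this way reports some false positives.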
@@ -162,7 +162,7 @@ Orchestration state is tracked via `.gm/` marker files, not hook events; the CLI

 **Process-of-elimination is the debugging paradigm EVERYWHERE, and manual real-services witness is the verification paradigm EVERYWHERE.** Every debug -- code, wasm, cascade, browser, the spooler itself -- enumerates candidate causes as mutables and eliminates each by a witness read against real input (`exec_js`/`codesearch`/`Read`/`browser page.evaluate`), each elimination revealing the next, never guess-and-restart/a-b-test/shotgun. Every verification is manual labour against the real thing -- the single mock-free `test.js`, the live page, the real service, the live wasm -- never an automated unit/mock suite standing in for the real-services witness (the conventional-testing tell-tale gm replaces). Stated in `instructions/execute.md` (the served EXECUTE prose) so it reaches every LLM in-session.

-**The first verb after a genuine multi-minute IDLE is `instruction`, to reset the long-gap clock**:
+**The first verb after a genuine multi-minute IDLE is `instruction`, to reset the long-gap clock**: only spool verbs reset it, so a long investigation in platform tools trips a false stall -- interleave `instruction`/`prd-add` to stay warm, and dispatch `instruction` BEFORE any predictable blocking wait. Threshold + platform-tool exception in rs-learn (`recall: first verb after multi-minute wait instruction long-gap`).

 **A stop-hook firing on a terminal chain does not authorize re-polling**: when a stop-hook fires while already at `phase=COMPLETE` AND `prd_pending_count=0`, re-dispatching `instruction`/`phase-status` to "re-confirm" is a deviation (`deviation.complete-chain-poll`, `instructions/mod.rs`). Two admissible responses: (a) a prose-only turn (COMPLETE is in hand), or (b) genuinely new planned work opened with a FRESH `{"prompt":...}` body (resets phase to PLAN, driven through the skill). Repeatedly answering the same hook is a loop; state the terminal facts once and stop, or open new work.

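The stop-hook rule reduces to a predicate over the two fields the note names, `phase` and `prd_pending_count`. The sketch assumes a status object carrying exactly those field names; the rest of its shape and the returned action labels are hypothetical. Routing the fresh prompt through `instruction` follows the entry.md transition line further down this diff.

```javascript
// Hypothetical decision helper for answering a stop-hook. Only `phase`
// and `prd_pending_count` are field names taken from the AGENTS.md note.
function stopHookResponse(status, newPrompt = null) {
  const terminal = status.phase === 'COMPLETE' && status.prd_pending_count === 0;
  if (!terminal) {
    // Not terminal: a next transition exists, so keep driving the chain.
    return { action: 'continue-chain' };
  }
  if (newPrompt) {
    // Genuinely new work: a FRESH {"prompt": ...} body, which resets to PLAN.
    return { action: 'dispatch', verb: 'instruction', body: { prompt: newPrompt } };
  }
  // Terminal and nothing new: answer in prose. Re-dispatching instruction or
  // phase-status here is deviation.complete-chain-poll.
  return { action: 'prose-only' };
}
```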
package/gm-plugkit/instructions/browser.md
CHANGED
@@ -4,7 +4,7 @@

 **Every edit to code that runs in a browser requires a live `browser` dispatch in the same turn as the edit.** Client-side surfaces -- `.html`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`, `.mjs`, `.css`, web components, service workers, every asset loaded by `<script>`, every path reached by `import` from a browser-side entry -- must be witnessed by a live `page.evaluate` of the specific invariant the edit establishes. A passing node test, build, `curl` of the HTML, or static-analysis pass witnesses server delivery, not browser behavior, and is non-substitutive. The witness IS the proof; prose is not.

-
+The witness is a live `page.evaluate` asserting the specific invariant against the real surface -- server up, HTTP 200, the global the change affects polled until present -- values captured into `stdout`; variance means a root-cause fix and re-witness, not advance. Anything short of the live assertion -- unwitnessed behavior, an assert fired before the global is present, validation queued for "later" -- leaves the edit unproven, and an unproven client edit is forced closure.

 Fires across phases: **EXECUTE** edit -> same-turn browser dispatch asserting the invariant; **EMIT** post-emit re-witness (page still passes after the full diff); **VERIFY** final gate -- `deviation.browser-witness-hash-mismatch` fires if a witnessed file changed without re-witnessing. Pure-prose static-document edits (no JS, no CSS-driven behavior, no DOM mutation) are the ONLY exempt category, and the exemption must be named explicitly in the response so the skip is auditable. Silent skip on actual behavior change is forced closure.

@@ -12,18 +12,20 @@ YOU drive the browser through the spool: plugkit holds the Chromium handle, per-

 ## Body shapes

-The body is a string,
+The body is a string, these shapes:

 ```
 session new
 session list
 session close <id>
 <arbitrary JS expression evaluated in page context>
+<https://... bare URL>
+url=<url>\n<expression>
 timeout=<ms>\n<expression>
 capture\n<expression>
 ```

-A bare expression with no live session opens
+**Open on the page you want to test, not a blank one.** A bare `https://...` URL body navigates the session straight to that page and returns `{url, title}` -- the simplest "show me this page." `url=<url>\n<expression>` navigates first, then runs your expression on the loaded page, so the global/DOM you assert is already there in one dispatch instead of a blank surface you must `page.goto` yourself. `url=` composes with `timeout=` and `capture` -- stack the prefix lines in order `timeout=`, then `url=`, then `capture`, the expression last; the prepended `page.goto` rides inside the capture so its navigation console/network is captured too. A bare expression with no URL prefix and no live session opens against `about:blank`; with a live session it reuses it. `session new` returns the id you carry; with more than one open, target it via `session=<id>\n<expr>`. (`session close` and `session kill` are aliases.) Default per-eval timeout 120000ms; operations that legitimately exceed it prefix `timeout=<ms>\n` (wrapper clamps to 120000ms). The response carries `timeout_ms_used`; `browser.runner-timeout` fires at the cap -- read `stderr`, narrow or raise, never retry blind at the same budget.

 **`capture\n<expression>` is the zero-boilerplate debug path -- prefer it.** Prefix your script with `capture` (or `profile`) on its own line and the wrapper auto-attaches `page.on('console'|'pageerror'|'requestfinished')` before your code runs, runs your script in an async wrapper (your top-level `await`/`return` work unchanged), and returns `{result: <your return>, debug: {console, pageErrors, network, performance}}` -- page console logs, uncaught errors, per-request network timing, and navigation performance, captured for free. Combine with timeout via `timeout=<ms>\ncapture\n<expr>`. Use the bare expression only when you do not want the capture overhead.

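The prefix rules in this hunk are easy to get wrong when you assemble a body by hand: `timeout=`, then `url=`, then `capture`, with the expression last. The sketch below is a small builder that enforces that order. It assumes the caller builds the body string in JavaScript before dispatching it. `buildBrowserBody` is hypothetical and not part of plugkit. The 120000ms clamp mirrors the wrapper behaviour described above.

```javascript
// Hypothetical helper, not part of plugkit: assembles a `browser` verb body
// with the prefix lines in the documented order (timeout=, url=, capture),
// expression last.
const MAX_EVAL_TIMEOUT_MS = 120000;

function buildBrowserBody({ expression, url, timeoutMs, capture = false }) {
  if (typeof expression !== 'string' || expression.length === 0) {
    throw new TypeError('expression is required');
  }
  const lines = [];
  if (timeoutMs !== undefined) {
    if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
      throw new RangeError('timeoutMs must be a positive integer');
    }
    // The wrapper clamps to 120000ms anyway; clamping here keeps the body honest.
    lines.push(`timeout=${Math.min(timeoutMs, MAX_EVAL_TIMEOUT_MS)}`);
  }
  if (url !== undefined) {
    // The wrapper matches the URL as \S+ followed by a newline, so a URL
    // containing whitespace would not be recognized as a url= line.
    if (/\s/.test(url)) throw new RangeError('url must not contain whitespace');
    lines.push(`url=${url}`);
  }
  if (capture) lines.push('capture');
  lines.push(expression);
  return lines.join('\n');
}

// Navigate, then assert a global on the loaded page, with capture enabled.
const body = buildBrowserBody({
  timeoutMs: 30000,
  url: 'http://localhost:3000/',
  capture: true,
  expression: 'return await page.evaluate(() => typeof window.app);',
});
```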
@@ -41,7 +43,7 @@ The window opens on the user's screen -- that IS the witness. `GM_BROWSER_HEADLE

 ## Profile and debug recipes

-The page is a genuine profiler and debugger -- use it, never guess-and-restart. The `capture\n` prefix above does all of this for free; reach for the manual recipe below only for custom capture. Attach the listeners
+The page is a genuine profiler and debugger -- use it, never guess-and-restart. The `capture\n` prefix above does all of this for free; reach for the manual recipe below only for custom capture. Attach the listeners, then navigate, in one script so nothing fires before they are listening -- the captured arrays are the live witness:

 ```
 const logs=[],errs=[],net=[];
package/gm-plugkit/instructions/emit.md
CHANGED
@@ -16,7 +16,7 @@ Feed search outputs into EMIT only when the digest matches the live filesystem;

 One write per artifact, then a disk Read against every touched path to assert the change -- verified disk state IS the witness, not the tool-call return. On discrepancy, regress to root cause, do not retry.

-**Client-side artifacts: write-then-browser-witness, same turn.** If the artifact is `.html .js .jsx .ts .tsx .vue .svelte .mjs .css` or any browser-loaded path, the disk Read is necessary but not sufficient -- also dispatch a `browser` verb that `page.evaluate`s the invariant the artifact establishes (the page-side assertion is the real witness; the disk Read only witnesses serialization). Skipping it ships a green-checked stub. The COMPLETE gate refuses
+**Client-side artifacts: write-then-browser-witness, same turn.** If the artifact is `.html .js .jsx .ts .tsx .vue .svelte .mjs .css` or any browser-loaded path, the disk Read is necessary but not sufficient -- also dispatch a `browser` verb that `page.evaluate`s the invariant the artifact establishes (the page-side assertion is the real witness; the disk Read only witnesses serialization). Skipping it ships a green-checked stub. The COMPLETE gate refuses while any client-side file edited this session lacks its paired browser-witness (`deviation.client-edit-no-witness`, gates.rs); the missing witness is the next dispatch.

 ## Artifact scope

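Two gate deviations in this diff share one piece of bookkeeping:

- `deviation.client-edit-no-witness`: a client-side file was edited without a paired browser witness.
- `deviation.browser-witness-hash-mismatch`: a witnessed file changed without re-witnessing.

The sketch below models them as a ledger keyed by path. It assumes the gate compares a content hash recorded at witness time. The class and its methods are hypothetical, and gates.rs may work differently. The extension list is the one emit.md gives; "any browser-loaded path" is broader than it.

```javascript
const { createHash } = require('crypto');

// Extensions emit.md lists as client-side; "any browser-loaded path" is
// broader than this list, so a real check needs project knowledge too.
const CLIENT_EXTENSIONS = ['.html', '.js', '.jsx', '.ts', '.tsx', '.vue', '.svelte', '.mjs', '.css'];

function isClientSide(filePath) {
  return CLIENT_EXTENSIONS.some((ext) => filePath.endsWith(ext));
}

function sha256(text) {
  return createHash('sha256').update(text).digest('hex');
}

// Hypothetical ledger: latest written hash per edited client file, and the
// hash each file had when a browser dispatch last witnessed it.
class WitnessLedger {
  constructor() {
    this.edited = new Map();
    this.witnessed = new Map();
  }

  recordEdit(filePath, content) {
    if (isClientSide(filePath)) this.edited.set(filePath, sha256(content));
  }

  recordWitness(filePath, content) {
    this.witnessed.set(filePath, sha256(content));
  }

  // One entry per edited client file that a COMPLETE gate would refuse on.
  deviations() {
    const out = [];
    for (const [filePath, hash] of this.edited) {
      const seen = this.witnessed.get(filePath);
      if (seen === undefined) {
        out.push({ path: filePath, id: 'deviation.client-edit-no-witness' });
      } else if (seen !== hash) {
        out.push({ path: filePath, id: 'deviation.browser-witness-hash-mismatch' });
      }
    }
    return out;
  }
}
```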
package/gm-plugkit/instructions/entry.md
CHANGED
@@ -81,7 +81,7 @@ Route KV writes to `<cwd>/.gm/disciplines/<ns>/`. `@<name>` prefix sets namespac

 ## Inspection routing

-
+Every capability has exactly one sanctioned surface and the platform's native tools are never it: code/file/symbol search is the `codesearch` verb (cwd-indexed -- a sibling repo is `Read` by path, never expected from `codesearch`), runtime-state files (spool response JSON, `.status.json`) are `Read`, and Bash survives only for the boot probe and shell-only non-git tooling (`npm`, `bun x`, `curl`). Reaching for Glob/Grep/Explore or any host-native search is reaching around the surface -- it is blocked; the verb IS the surface. Spool responses are synchronous; poll external state via `until <check>; do sleep N; done`.

 ## Memorize

@@ -89,6 +89,6 @@ Write the recall index only via `memorize-fire`; surfaces outside it produce mem

 ## Return to plugkit

-
+Any uncertainty about the next move -- drift, a gate denial, a silent stretch in a non-trivial phase -- is itself the signal to dispatch `instruction`, because your memory of the prose went stale the moment phase/PRD/mutables shifted. It is cheap, synchronous, idempotent; the cost is all on the under-dispatch side. Every gate denial names the next verb in its `reason` field; read it and dispatch that verb, never improvise around the denial -- a denial with no follow-up dispatch is a session that gave up, and the chain is not COMPLETE while you have given up.

 Transition: SESSION_ID threaded AND spool reachable -> dispatch `instruction` with `{"prompt":"<user request>"}` so plugkit derives orient_nouns + recall_hits; later same-chain dispatches may use empty body.
package/gm-plugkit/instructions/execute.md
CHANGED
@@ -8,7 +8,7 @@ L3 distance + audit: real input -> real code -> real output, witnessed.

 Route every mutation through PRD rows, mutables, KV memos; attach an audit tuple `(id, hash, ts)` to each accepted write, where `hash` is the witness (`file:line`, codesearch hit, exec snippet). `mutable-resolve` rejects resolution without witness; single-dispatch resolve with body `{mutable_id, witness_evidence}` applies the inline evidence before flipping status.

-Every code/file/symbol lookup is a `codesearch` dispatch -- never a platform Explore agent, Task/general-purpose search subagent, or raw grep
+Every code/file/symbol lookup is a `codesearch` dispatch -- never a platform Explore agent, Task/general-purpose search subagent, or raw grep -- the same drift as reaching for puppeteer over the `browser` verb. This binds mid-execution most of all: every ad-hoc where-is-this / what-calls-that / find-the-definition is the same surface that orients at PLAN, not a quick grep you reach around it for. The capability is a verb; dispatch the verb.

 ## Witness

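This hunk specifies two things: the audit tuple `(id, hash, ts)`, and the `mutable-resolve` rule (no resolution without witness). The single-dispatch body `{mutable_id, witness_evidence}` applies the evidence before flipping status. The sketch models both over an in-memory store. It assumes `hash` is a SHA-256 digest of the witness text. The note only says the hash "is the witness", so storing a digest rather than the raw `file:line` reference is an assumption. The status strings and the store shape are also assumptions.

```javascript
const { createHash } = require('crypto');

// Hypothetical in-memory store; plugkit's real mutable store is not shown here.
function makeStore() {
  return { mutables: new Map(), audit: [] };
}

// (id, hash, ts): here `hash` is a SHA-256 digest of the witness text,
// e.g. "src/app.js:42", a codesearch hit, or an exec snippet.
function auditTuple(id, witness) {
  const hash = createHash('sha256').update(witness).digest('hex');
  return { id, hash, ts: Date.now() };
}

// Same body shape as the documented single dispatch: {mutable_id, witness_evidence}.
// Evidence is applied before the status flips; no evidence, no resolution.
function mutableResolve(store, { mutable_id, witness_evidence }) {
  const m = store.mutables.get(mutable_id);
  if (!m) throw new Error(`unknown mutable ${mutable_id}`);
  if (typeof witness_evidence !== 'string' || witness_evidence.trim() === '') {
    throw new Error('mutable-resolve rejected: witness_evidence required');
  }
  m.witness = witness_evidence;
  m.status = 'resolved';
  const tuple = auditTuple(mutable_id, witness_evidence);
  store.audit.push(tuple);
  return tuple;
}
```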
@@ -24,7 +24,7 @@ State diverging from the PRD's assumed shape is a new mutable, not background no

 ## Discovery: additive vs reshaping

-Real input is the highest-yield discovery surface; every observation converts to a PRD row this turn, never a "future work" note -- a corner case
+Real input is the highest-yield discovery surface; every observation converts to a PRD row this turn, never a "future work" note -- whatever real input surfaces (a corner case, a tool caveat, a failure mode, an adjacent file/import, deviation-bearing stderr, a prior commit violating a user preference such as a sparse PRD, untriaged residual, or missing browser-witness) is a row, the list never closed. Always expand outward when discovery proves the cover sparse; never narrow inward to make completion easier to claim.

 Two kinds, two moves. **Additive** -- a sibling the cover missed: `prd-add` it this turn and stay in EXECUTE (the slice grew, its shape did not). **Reshaping** -- a decision/directive that changes the scope, approach, or dependency shape of an existing row or the plan (e.g. "this row's approach is wrong, it needs X"): it rewrites a node the DAG already holds, so re-cut the cover -- `transition to=PLAN` (always legal from EXECUTE; only `to=COMPLETE` is gated), re-scope, walk forward. Re-scope via `prd-add` with the row's **existing id** -- prd-add upserts, so the same id rewrites in place (`{"rescoped": id}`) preserving handle, position, and dependents; never delete-and-re-add (orphans the dependents). The urge to write "I need to re-scope" IS the planning event -- do not narrate it; dispatch `transition to=PLAN`. Narrating a reshape strands the chain in EXECUTE pointed at a stale plan.

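The re-scope rule depends on `prd-add` being an upsert: the same id rewrites the row in place and reports `{"rescoped": id}`. Delete-and-re-add, by contrast, loses the row's position. A JavaScript `Map` behaves the same way, because setting an existing key keeps its insertion order, so it makes a compact model. The store, the `deps` field, and the `{added: id}` result for new rows are hypothetical.

```javascript
// Hypothetical PRD store keyed by row id. Map.set on an existing key keeps
// that key's original position, which models prd-add's in-place upsert.
class PrdStore {
  constructor() {
    this.rows = new Map();
  }

  // Upsert: an existing id is rewritten in place and reported as rescoped.
  prdAdd(row) {
    const existed = this.rows.has(row.id);
    this.rows.set(row.id, { ...row });
    return existed ? { rescoped: row.id } : { added: row.id };
  }

  // The forbidden pattern, for comparison: the row loses its position.
  deleteAndReAdd(row) {
    this.rows.delete(row.id);
    this.rows.set(row.id, { ...row });
  }

  order() {
    return [...this.rows.keys()];
  }
}

const prd = new PrdStore();
prd.prdAdd({ id: 'parse', title: 'parse url= prefix' });
prd.prdAdd({ id: 'witness', title: 'browser-witness the page', deps: ['parse'] });
// Re-scope 'parse' with its existing id: same slot, dependents still point at it.
const outcome = prd.prdAdd({ id: 'parse', title: 'parse url= and bare-URL bodies' });
```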
@@ -36,7 +36,7 @@ First emit = closure of the transform; scaffold + IOU externalizes residual cost

 Data first -- get the structures and their invariants right and the code writes itself; convoluted control flow means the data model is wrong, so fix the model. Make invalid state unrepresentable -- pass parameters over hidden globals, encode the constraint in the type/shape so the bad combination cannot be constructed. Reason from physical constraints (latency, bandwidth, memory, coordination, the worst node) before designing within them. Keep the spine flat, each unit single-focus and understandable at its call site. Make misuse structurally impossible, not documented-against. Optimize the worst case, not the average; design every failure path explicitly (full -> degraded -> safe-fail -> explicit-error), never a silent catastrophic mode. Measure, do not assume -- profile before optimizing, implement both and compare on real input when in genuine dispute. When a change regresses something that worked, revert first and investigate second: restore green, then diagnose from a known-good base. Fail fast and loud over limping on bad state.

-**Process of elimination is the debugging paradigm on every surface, and manual labour against real services is how you witness.** Never guess-and-restart, a/b-test, or shotgun variants: enumerate the candidate causes as mutables, then eliminate each by a witness read against REAL input -- `exec_js` against the real service, `codesearch`/`Read` against the real source, the `browser` verb's `page.evaluate` against a `window.*` global on the live page. Each elimination reveals the next mutable; record it and keep going until one cause survives every other's refutation. Reading the live runtime once observes more than a hundred blind restarts. Profile
+**Process of elimination is the debugging paradigm on every surface, and manual labour against real services is how you witness.** Never guess-and-restart, a/b-test, or shotgun variants: enumerate the candidate causes as mutables, then eliminate each by a witness read against REAL input -- `exec_js` against the real service, `codesearch`/`Read` against the real source, the `browser` verb's `page.evaluate` against a `window.*` global on the live page. Each elimination reveals the next mutable; record it and keep going until one cause survives every other's refutation. Reading the live runtime once observes more than a hundred blind restarts. Profile on the real surface, not from intuition: wrap the suspect node and read the live numbers. In node, `exec_js` carries `duration_ms` for free, surfaces your own timing and `process.memoryUsage()` on stdout, and lands the thrown-error `stack` on stderr -- read both channels (numbers on stdout, stack on stderr). In the browser, a body prefixed `capture\n<script>` auto-returns `{result, debug:{console, pageErrors, network, performance}}` with zero boilerplate. Profile to LOCATE the slow/broken node, then eliminate hypotheses by live measurement. Verification is the same labour: run the real thing and witness the real output (the single mock-free `test.js`, the live page, the real service), never an automated unit/mock harness standing in for the real-services witness. Apparent tooling failure is part of this -- it is your mechanical self-recovery by elimination, never a question for the user.

 ## Memorize

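The profiling advice in this hunk can be turned into a small wrapper to paste into an `exec_js` body: time the suspect node and print the numbers, including `process.memoryUsage()`, on stdout. A thrown error's `stack` still reaches stderr through `exec_js` itself. The sketch assumes Node 16 or later for the global `performance`. The `measure` name and its JSON line are hypothetical. Because `exec_js` already reports `duration_ms` for the whole run, the wrapper is only useful for timing individual steps inside it.

```javascript
// Hypothetical timing wrapper for an exec_js body. exec_js already reports
// duration_ms for the whole run and puts a thrown error's stack on stderr;
// this only adds per-step numbers on stdout. Needs Node 16+ for `performance`.
function measure(label, fn) {
  const heapBefore = process.memoryUsage().heapUsed;
  const start = performance.now();
  let ok = false;
  try {
    const value = fn();
    ok = true;
    return value;
  } finally {
    // Runs on success and on throw, so a failing step still reports its timing.
    console.log(JSON.stringify({
      label,
      ok,
      duration_ms: Number((performance.now() - start).toFixed(3)),
      heap_delta_bytes: process.memoryUsage().heapUsed - heapBefore,
    }));
  }
}

// Time two candidate steps separately to see which one is slow.
const parsed = measure('parse', () => JSON.parse('{"n":1}'));
const total = measure('sum', () => Array.from({ length: 100000 }, (_, i) => i).reduce((a, b) => a + b, 0));
```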
package/gm-plugkit/instructions/plan.md
CHANGED
@@ -24,7 +24,7 @@ Cut the cover so the hardest reachable node comes first: the row exercising the

 ## Noticing-to-PRD

-Anything noticed during orient or expansion that is not yet a row -- outstanding work, an unfinished surface, an improvable shape, a preference misalignment, an adjacent concern -- is a `prd-add` this turn. Observations carried only in the response body evaporate; only the store survives. "We should also..." / "worth noting..."
+Anything noticed during orient or expansion that is not yet a row -- outstanding work, an unfinished surface, an improvable shape, a preference misalignment, an adjacent concern -- is a `prd-add` this turn. Observations carried only in the response body evaporate; only the store survives. "We should also..." / "worth noting..." is a row with the witness that motivated it, not a remark. A noticing that is structural (a coverage gap, a missing doc, a prior commit that broke a rule) or preference-aware (state drifting from density-at-PLAN, residual-triage, push-on-clean, every-possible expansion, or browser-witness coverage) is the same event: each its own row describing the aligned state.

 ## Mutables

package/gm-plugkit/instructions/update_docs.md
CHANGED
@@ -8,7 +8,7 @@ Docs reflect the current state of the system, not its history. Every rule in AGE

 Edit AGENTS.md/CLAUDE.md inline -- the top of the preserved hierarchy and the only doc that survives context summarization. `memorize-fire` is the parallel surface (`.gm/exec-spool/in/memorize-fire/<N>.txt`, raw text or `{text, namespace?}`) where `recall`/`auto_recall` retrieve the fact on future turns. AGENTS.md is the staging ground; the store is the recall surface. Migration is the agent's dual-write, not a file-scan: landing a load-bearing rule in AGENTS.md, fire the same rule to the store the same session so it surfaces in `auto_recall`. An automatic ingest cannot run -- the classifier cannot judge which paragraphs are recall-worthy rules vs narrative, so the agent judges at write time. Never pass `namespace:"AGENTS.md"` (mislabeled namespace); load-bearing rules go to the default namespace. Multiple facts = multiple parallel requests in one message.

-**Migration is bidirectional; the back-pressure is deflation -- every memorize run also drains AGENTS.md.** AGENTS.md grows monotonically if flow is only inward and bloats past the budget it protects. So every session firing `memorize-fire` for new facts ALSO picks a few existing AGENTS.md entries that have gone detail-heavy/single-crate/single-platform (the material the Documentation Policy assigns to rs-learn), `memorize-fire`s the substance to the default namespace, and deletes or compresses the paragraph to a one-line pointer in the same commit. Eligible =
+**Migration is bidirectional; the back-pressure is deflation -- every memorize run also drains AGENTS.md.** AGENTS.md grows monotonically if flow is only inward and bloats past the budget it protects. So every session firing `memorize-fire` for new facts ALSO picks a few existing AGENTS.md entries that have gone detail-heavy/single-crate/single-platform (the material the Documentation Policy assigns to rs-learn), `memorize-fire`s the substance to the default namespace, and deletes or compresses the paragraph to a one-line pointer in the same commit. Eligible = anything a future agent reaches for via `recall` rather than needing resident every prompt; resident = the cross-cutting rule, drainable = the fact-base caveat. Top-level cross-cutting rules stay; everything recall-reachable drains. Witnessed both ways: the fact lands in the store AND the byte-count drops. A few entries per run, never a wholesale rewrite. Skipping the drain is the slow-bloat drift the policy exists to prevent.

 ## README.md

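The dual-write rule relies on the `memorize-fire` spool input. Each fact goes in its own file at `.gm/exec-spool/in/memorize-fire/<N>.txt`, holding raw text or `{text, namespace?}`, and never `namespace:"AGENTS.md"`. The sketch writes those files from Node. It assumes `<N>` is simply the next unused integer in that directory, starting at 1; the note does not say how N is chosen. `fireMemorize` is hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical writer for memorize-fire spool requests: one file per fact.
// Assumes <N> is the next unused integer in the directory, starting at 1.
function fireMemorize(cwd, facts) {
  const requests = facts.map((f) => (typeof f === 'string' ? { text: f } : f));
  // Validate everything before writing anything.
  for (const r of requests) {
    if (r.namespace === 'AGENTS.md') {
      throw new Error('load-bearing rules go to the default namespace, not "AGENTS.md"');
    }
  }
  const dir = path.join(cwd, '.gm', 'exec-spool', 'in', 'memorize-fire');
  fs.mkdirSync(dir, { recursive: true });
  const used = fs
    .readdirSync(dir)
    .map((name) => /^(\d+)\.txt$/.exec(name))
    .filter(Boolean)
    .map((m) => Number(m[1]));
  let next = used.length > 0 ? Math.max(...used) + 1 : 1;
  return requests.map(({ text, namespace }) => {
    // Raw text for the default namespace, {text, namespace} when one is named.
    const content = namespace ? JSON.stringify({ text, namespace }) : text;
    const file = path.join(dir, `${next++}.txt`);
    fs.writeFileSync(file, content, 'utf8'); // Node's utf8 encoding writes no BOM
    return file;
  });
}
```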
@@ -24,7 +24,7 @@ One entry per commit landed this session: the commit subject plus a one-sentence

 ## Commit and Push

-Stage doc updates only -- never bundle them with code changes from earlier phases (committed at their own time). One commit, present-tense imperative subject. Push via the git verbs: `git_finalize {message}` bundles add -> commit -> porcelain-gate -> push in one dispatch, or `git_add` the doc paths then `git_commit` then `git_push`. The verbs gate on the porcelain probe internally and refuse a dirty tree (`deviation.push-dirty`); a raw `git` shell body is gated `deviation.bash-git-bypass`. If you ever fall back to raw Bash git, the porcelain probe
+Stage doc updates only -- never bundle them with code changes from earlier phases (committed at their own time). One commit, present-tense imperative subject. Push via the git verbs: `git_finalize {message}` bundles add -> commit -> porcelain-gate -> push in one dispatch, or `git_add` the doc paths then `git_commit` then `git_push`. The verbs gate on the porcelain probe internally and refuse a dirty tree (`deviation.push-dirty`); a raw `git` shell body is gated `deviation.bash-git-bypass`. If you ever fall back to raw Bash git, the porcelain probe is its own `Bash(git status --porcelain)` event before the push, never `&&`-chained -- a chained `add && commit && push` carries no separable witness, so ccsniff `--git-discipline` sees an unwitnessed push. A doc commit stages only paths matching AGENTS.md, CLAUDE.md, README.md, SKILLS.md, CHANGELOG.md, LICENSE*, docs/**, or site/**; any non-doc path means you bundled phases -- split it out before staging. The push triggers the docs pipeline and IS the validation dispatch.

 ## COMPLETE

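The commit rule gives an explicit allow-list for a doc commit: AGENTS.md, CLAUDE.md, README.md, SKILLS.md, CHANGELOG.md, LICENSE*, docs/** and site/**. Any other staged path means phases were bundled. The sketch checks that list over repo-relative paths and adds the clean-tree test a raw-git fallback would run as its own step. Treating the patterns as anchored at the repository root is an assumption, and all three function names are hypothetical.

```javascript
// Hypothetical check for the doc-commit allow-list in update_docs.md.
// Patterns are assumed to be relative to the repository root.
const DOC_FILES = new Set(['AGENTS.md', 'CLAUDE.md', 'README.md', 'SKILLS.md', 'CHANGELOG.md']);

function isDocPath(p) {
  const rel = p.replace(/\\/g, '/').replace(/^\.\//, '');
  if (DOC_FILES.has(rel)) return true;
  if (/^LICENSE[^/]*$/.test(rel)) return true; // LICENSE*
  return rel.startsWith('docs/') || rel.startsWith('site/'); // docs/**, site/**
}

// Any non-doc path means earlier-phase code was bundled into the doc commit.
function nonDocPaths(stagedPaths) {
  return stagedPaths.filter((p) => !isDocPath(p));
}

// `git status --porcelain` prints nothing for a clean tree, so empty
// (whitespace-only) output is the clean signal.
function porcelainIsClean(output) {
  return output.trim() === '';
}
```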
package/gm-plugkit/instructions/verify.md
CHANGED
@@ -24,7 +24,7 @@ Write `test.js` at root, 200-line ceiling, real services only (mock-free) -- thi

 ## Residual-scan

-Run `residual-scan` before COMPLETE; it examines the open surface
+Run `residual-scan` before COMPLETE; it examines the open surface -- PRD pending, browser sessions, dirty tree, untracked artifacts, and browser-witness coverage for client-side files modified this session -- and a non-empty result is non-convergent. Non-empty = non-convergent -> expand the PRD with the reachable in-spirit residual via `prd-add` and re-execute. One-shot per stop window via marker. `reason: "browser sessions still open"` -> close each (`browser` `session close <id>`; `session list` enumerates); retrying the scan without closing is the idle-mid-chain/polling deviation -- the denial names the next verb, dispatch it.

 Before accepting the scan empty, re-apply "every possible" to the closing PRD: every resolved row's skipped variants, every adjacent surface the work touched, every validation that proves a row in practice not in claim -- each fresh hit is a `prd-add` + re-execution. A clean scan on a short PRD for a long-horizon prompt is a false negative. Noticing-to-PRD is unchanged: anything observed while testing/reading diffs/inspecting closing state converts this turn and re-executes; stopping at "tests pass" while noticing named follow-on work is the canonical VERIFY drift.

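The residual-scan note lists what counts as open surface, and any hit means the chain is non-convergent:

- pending PRD rows
- open browser sessions
- a dirty tree
- untracked artifacts
- client-side files edited this session without a browser witness

The sketch classifies a snapshot object into those categories. The snapshot's field names and all reason strings except one are hypothetical. Only the `"browser sessions still open"` reason and the `session close <id>` body come from the note.

```javascript
// Hypothetical snapshot -> residual list. An empty list means convergent.
function residualScan(s) {
  const residual = [];
  if (s.prdPending.length) residual.push({ reason: 'prd rows pending', items: s.prdPending });
  if (s.openSessions.length) {
    residual.push({
      reason: 'browser sessions still open',
      // The denial names the next verb: close each session, then re-scan.
      next: s.openSessions.map((id) => ({ verb: 'browser', body: `session close ${id}` })),
    });
  }
  if (s.dirtyPaths.length) residual.push({ reason: 'dirty tree', items: s.dirtyPaths });
  if (s.untracked.length) residual.push({ reason: 'untracked artifacts', items: s.untracked });
  if (s.unwitnessedClientEdits.length) {
    residual.push({ reason: 'client-side edits lack browser witness', items: s.unwitnessedClientEdits });
  }
  return residual;
}

function isConvergent(s) {
  return residualScan(s).length === 0;
}
```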
@@ -44,13 +44,13 @@ Attach `witness_evidence` of the admitted form to every mutable in the closing s

 ## Completion

-The chain enters COMPLETE only when your `transition` returns COMPLETE phase; the on-disk state file moves only on a `transition`. **Done is plugkit's pronouncement, not yours** -- gate
+The chain enters COMPLETE only when your `transition` returns COMPLETE phase; the on-disk state file moves only on a `transition`. **Done is plugkit's pronouncement, not yours** -- even gate-allowance is not done; only a dispatched `transition` returning COMPLETE is, so a narrated PLAN->COMPLETE walk with the gate open or the verb un-dispatched is fabrication. Announcing "I'm finished" before the gate flips is narrating; the cure is `phase-status`. Not-COMPLETE means a next transition exists; idle and "waiting for the user" mid-chain are deviations (closure was authorized at request time). Between the four observations, between residual-scan and COMPLETE, between push and transition, re-dispatch `instruction` -- VERIFY is where premature COMPLETE happens.

 **You close rs-learn's learning loop here -- you are the model in the loop and hold the one signal every core needs: how well this turn went.** Before COMPLETE, self-report turn quality 0.0-1.0 (1.0 = clean convergent walk, no deviations, pushed, gate-allowed; lower as deviations/rework accumulate). That number drives every core, none of which can observe it without you:

-- **Instant MicroLoRA adapter:**
+- **Instant MicroLoRA adapter:** only with >=2 enabled disciplines (default-only = correct no-op, skip). Init once per session `learn {verb:init_instant, body:{targets:[<enabled namespaces>]}}`, then `learn {verb:feedback, body:{embedding:<bge embedding of this turn's task>, payload:{quality:<0-1>}}}` -- high quality shifts the helpful namespace's logit up so recall promotes it.
 - **FastGRNN router:** `learn {verb:record_outcome, body:{target:"<your model id>", quality:<0-1>}}` so the next `route_hint` reflects learned outcomes.
-- **Deep EWC core** and **GAT attention relation weights
+- **Deep EWC core** and **GAT attention relation weights:** the same quality as `record_loss` / `nudge_relation`.

 rs-learn never calls a model; it emits the need and you supply the answer. Skipping the self-report leaves the cores untrained.

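The self-report feeds three dispatch shapes that the note spells out:

- `init_instant`: once per session, and only with two or more enabled disciplines.
- `feedback`: carries the turn's embedding and `payload.quality`.
- `record_outcome`: carries a target model id.

The sketch builds those bodies from one quality number. The function, its parameters, and clamping the score into [0, 1] are assumptions. The `record_loss` and `nudge_relation` bodies are left out because the note does not give their shape.

```javascript
// Hypothetical builder for the rs-learn dispatches listed in verify.md.
function learnDispatches({ quality, enabledNamespaces, embedding, modelId, instantInitialized }) {
  if (!Number.isFinite(quality)) throw new TypeError('quality must be a number');
  const q = Math.min(1, Math.max(0, quality));
  const out = [];
  // MicroLoRA only acts with >= 2 enabled disciplines; default-only is a no-op.
  if (enabledNamespaces.length >= 2) {
    if (!instantInitialized) {
      out.push({ verb: 'init_instant', body: { targets: [...enabledNamespaces] } });
    }
    out.push({ verb: 'feedback', body: { embedding, payload: { quality: q } } });
  }
  // The FastGRNN router gets the outcome regardless of discipline count.
  out.push({ verb: 'record_outcome', body: { target: modelId, quality: q } });
  return out;
}
```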
package/gm-plugkit/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "gm-plugkit",
-"version": "2.0.
+"version": "2.0.1616",
 "description": "Bootstrap and daemon-spawn tool for gm plugkit binary. Downloads the correct platform binary, verifies SHA256, and starts the spool watcher daemon. Includes plugkit-wasm-wrapper for WASM-based spool watching.",
 "main": "index.js",
 "bin": {
package/gm-plugkit/plugkit-wasm-wrapper.js
CHANGED
@@ -2000,6 +2000,22 @@ function makeHostFunctions(instanceRef) {
 evalBody = timeoutMatch[2];
 }
 }
+let startUrl = null;
+const urlMatch = evalBody.match(/^url=(\S+)[ \t]*\n([\s\S]*)$/);
+if (urlMatch) {
+startUrl = urlMatch[1];
+evalBody = urlMatch[2];
+} else {
+const bare = evalBody.trim();
+if (/^https?:\/\/\S+$/.test(bare)) {
+startUrl = bare;
+evalBody = 'return {url: page.url(), title: await page.title()};';
+}
+}
+const navTimeout = Math.min(timeoutMs, 60000);
+const gotoPrefix = startUrl
+? `await page.goto(${JSON.stringify(startUrl)},{waitUntil:'load',timeout:${navTimeout}});\n`
+: '';
 const captureMatch = evalBody.match(/^(?:capture|profile)[ \t]*\n([\s\S]*)$/);
 if (captureMatch) {
 const userScript = captureMatch[1];
@@ -2007,9 +2023,11 @@ function makeHostFunctions(instanceRef) {
|
|
|
2007
2023
|
+ `try{page.on('console',m=>{try{__logs.push({type:m.type(),text:m.text()});}catch(_){}});`
|
|
2008
2024
|
+ `page.on('pageerror',e=>{try{__errs.push(String(e&&e.message||e));}catch(_){}});`
|
|
2009
2025
|
+ `page.on('requestfinished',r=>{try{const t=r.timing();__net.push({url:String(r.url()).slice(0,120),dur_ms:Math.round(t.responseEnd),ttfb_ms:Math.round(t.responseStart)});}catch(_){}});}catch(_){}\n`
|
|
2010
|
-
+ `const __result = await (async () => {\n${userScript}\n})();\n`
|
|
2026
|
+
+ `const __result = await (async () => {\n${gotoPrefix}${userScript}\n})();\n`
|
|
2011
2027
|
+ `let __perf=null;try{__perf=await page.evaluate(()=>{const n=performance.getEntriesByType('navigation')[0];return n?{load_ms:Math.round(n.loadEventEnd||0),dcl_ms:Math.round(n.domContentLoadedEventEnd||0),resources:performance.getEntriesByType('resource').length,now:Math.round(performance.now())}:null;});}catch(_){}\n`
|
|
2012
2028
|
+ `return {result:__result,debug:{console:__logs,pageErrors:__errs,network:__net.slice(0,30),performance:__perf}};`;
|
|
2029
|
+
} else if (startUrl) {
|
|
2030
|
+
evalBody = `${gotoPrefix}${evalBody}`;
|
|
2013
2031
|
}
|
|
2014
2032
|
const outerTimeoutMs = Math.min(timeoutMs + 6000, 126000);
|
|
2015
2033
|
const r = runBrowserRunner(pw, ['-s', pwSessionId, '--timeout', String(timeoutMs), '-e', evalBody], outerTimeoutMs, cwd, sessionId);
|
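To make the new body grammar concrete, here is a standalone restatement of the parsing order in the `makeHostFunctions` hunk above. `parseBrowserBody` is a hypothetical name: it returns the script instead of handing it to `runBrowserRunner`, and it drops the capture-mode listener wrapper, keeping only the `url=` / bare-URL / `capture` detection and the `page.goto` prefix.

```javascript
// Sketch of the wrapper's parsing order: an optional `url=` first line (or a body
// that is only an http(s) URL) is stripped first, then a `capture`/`profile` line.
function parseBrowserBody(body, timeoutMs) {
  let startUrl = null;
  let script = body;
  const urlMatch = script.match(/^url=(\S+)[ \t]*\n([\s\S]*)$/);
  if (urlMatch) {
    startUrl = urlMatch[1];
    script = urlMatch[2];
  } else if (/^https?:\/\/\S+$/.test(script.trim())) {
    // A bare URL navigates and returns the landing page's url and title.
    startUrl = script.trim();
    script = 'return {url: page.url(), title: await page.title()};';
  }
  // Navigation is capped at 60s even when the overall timeout is longer.
  const navTimeout = Math.min(timeoutMs, 60000);
  const gotoPrefix = startUrl
    ? `await page.goto(${JSON.stringify(startUrl)},{waitUntil:'load',timeout:${navTimeout}});\n`
    : '';
  const captureMatch = script.match(/^(?:capture|profile)[ \t]*\n([\s\S]*)$/);
  return {
    startUrl,
    capture: Boolean(captureMatch),
    // The real wrapper additionally wraps capture scripts in listeners and a debug object.
    script: gotoPrefix + (captureMatch ? captureMatch[1] : script),
  };
}

const parsed = parseBrowserBody('url=https://example.com\ncapture\nreturn document.title;', 30000);
console.log(parsed.startUrl, parsed.capture);
```

Because the real wrapper splices `gotoPrefix` into the capture IIFE only after the `page.on` listeners are registered, a body of the form `url=<address>\ncapture\n<script>` also records console output, page errors and network timing from the initial load.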
package/gm.json
CHANGED
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "gm-skill",
-  "version": "2.0.
+  "version": "2.0.1616",
   "description": "Canonical universal harness — AI-native software engineering via skill-driven orchestration; bootstraps plugkit for task execution and session isolation. Install in any AI coding agent host.",
   "author": "AnEntrypoint",
   "license": "MIT",
package/prompts/prompt-submit.txt
CHANGED
@@ -4,15 +4,15 @@ BLOCKING REQUIREMENT -- YOUR FIRST ACTION MUST BE: invoke the gm-skill (the sing
 
 YOU ARE THE ORCHESTRATOR. Plugkit is a stateful library: it serves instructions and tracks state when YOU dispatch verbs, and it does NOT act on its own, advance phases, or validate transitions in the background. Every state change is a verb YOU write into `.gm/exec-spool/in/<verb>/<N>.txt`. If you find yourself waiting for plugkit, polling its output dir, or saying "the orchestrator will handle/validate/transition" -- STOP and dispatch the verb. The gm skill is your entry surface, not an actor: you invoke gm; gm tells you to dispatch `instruction`; the instruction response names the next verb. Skills do NOT auto-chain; plugkit does NOT auto-advance; you drive both.
 
-
-
-PLAN
-EXECUTE
-EMIT
-VERIFY
-
+Each phase exits exactly when its residual set empties, and you exit it by dispatching `transition` with the next phase as the body string -- the residual IS the exit condition, not a schedule:
+gm loaded -> `instruction` (first action, every turn)
+PLAN, last pass surfaced zero new unknowns -> body "EXECUTE"
+EXECUTE, every mutable witnessed -> body "EMIT"
+EMIT, every gate passes -> body "VERIFY"
+VERIFY, PRD empty AND worktree clean AND pushed AND mutables witnessed -> body "COMPLETE"; any PRD item still open -> body "EXECUTE"
+A non-empty residual means the phase has not exited, whatever the narration claims.
 
-Regressions
+Regressions are your dispatches too: a discovery transitions backward to the earliest phase whose output it invalidated -- a fresh unknown reopens PLAN, broken EMIT logic reopens EXECUTE, a broken VERIFY file reopens EMIT, but VERIFY exposing wrong logic skips back past EMIT to EXECUTE. Re-entry is a first-class move, may span more than one phase, never a failure.
 
 A phase claim in text without the corresponding `transition` dispatch is fabrication; plugkit's phase walk is ground truth, your narration is not. After PLAN, dispatch independent PRD items in parallel -- batch independent verb dispatches into one message (N request writes, then N response reads), never sequential for independent work.
 
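The exit table and the regression rule above can be modeled as two pure functions. This is a hypothetical illustration of the rules as written, not plugkit code (plugkit does not advance phases on its own); the state fields such as `newUnknownsLastPass` and `unwitnessedMutables`, and the discovery labels, are invented names for the conditions in the text.

```javascript
const PHASES = ['PLAN', 'EXECUTE', 'EMIT', 'VERIFY', 'COMPLETE'];

// Returns the body string for the next `transition` dispatch, or null while the
// current phase's residual is still non-empty (the phase has not exited).
function nextTransition(phase, s) {
  switch (phase) {
    case 'PLAN':
      return s.newUnknownsLastPass === 0 ? 'EXECUTE' : null;
    case 'EXECUTE':
      return s.unwitnessedMutables === 0 ? 'EMIT' : null;
    case 'EMIT':
      return s.failingGates === 0 ? 'VERIFY' : null;
    case 'VERIFY':
      // Any open PRD item sends the walk back to EXECUTE.
      if (s.openPrdItems > 0) return 'EXECUTE';
      return s.worktreeClean && s.pushed && s.unwitnessedMutables === 0 ? 'COMPLETE' : null;
    default:
      return null;
  }
}

// Which phase's output each kind of discovery invalidates (illustrative labels).
const INVALIDATES = { freshUnknown: 'PLAN', wrongLogic: 'EXECUTE', brokenFile: 'EMIT' };

// Regress to the earliest invalidated phase; this may skip back several phases.
function regressionTarget(discoveries) {
  const idx = discoveries
    .map((d) => PHASES.indexOf(INVALIDATES[d]))
    .filter((i) => i >= 0);
  return idx.length ? PHASES[Math.min(...idx)] : null;
}

console.log(regressionTarget(['brokenFile', 'wrongLogic']));
```

The last line prints `EXECUTE`: VERIFY exposing wrong logic skips past EMIT, even when a broken file was found in the same pass.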
@@ -24,7 +24,7 @@ Every unknown->known transition MUST be memorized THE SAME TURN it resolves, not
 
 The ONLY acceptable form is the spool dispatch: write `.gm/exec-spool/in/memorize-fire/<N>.txt` with a single fact (enough context for a cold-start agent); the wasm orchestrator embeds and persists it. No subagent, no model call -- the agent IS the model.
 
-
+Fire the instant an unknown becomes known -- the same turn, before the next tool. Any exec output, code read, CI log, error, user-stated preference, constraint, deadline, or judgment call, surprising fix, or env quirk (blocked command, path oddity, platform difference) that converts a not-yet-known into a known IS that instant; if you could not have answered it before the result and can now, memorize it now.
 
 Parallel dispatch: N facts -> N memorize-fire writes in ONE message, never serialized. End-of-turn self-check (mandatory): before closing any response, scan the whole turn for exec outputs and code reads that resolved an unknown but were NOT memorized, and dispatch all missed ones now. "I'll memorize this" in text is not a memorize dispatch -- only the spool write counts. Skipping memorize = memory leak = critical bug.
 
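A minimal Node.js sketch of the parallel memorize dispatch, assuming each fact is written as plain text and that `<N>` is a sequence number chosen by the caller; the text specifies neither the numbering scheme nor a body format beyond "a single fact", and `dispatchMemorizeBatch` is a hypothetical helper. The demo writes into a temporary directory instead of a real `.gm/exec-spool`, so no watcher consumes the files.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Writes one request file per fact under <spoolRoot>/in/memorize-fire/ and returns
// the paths. All writes happen before any response is read, so N independent facts
// go out as one batch rather than N serialized round trips.
function dispatchMemorizeBatch(spoolRoot, facts, startN = 1) {
  const dir = path.join(spoolRoot, 'in', 'memorize-fire');
  fs.mkdirSync(dir, { recursive: true });
  return facts.map((fact, i) => {
    const file = path.join(dir, `${startN + i}.txt`);
    // Node's 'utf8' encoding writes no BOM.
    fs.writeFileSync(file, fact, 'utf8');
    return file;
  });
}

// Demo against a throwaway directory; the facts are illustrative placeholders.
const spoolRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'gm-spool-demo-'));
const written = dispatchMemorizeBatch(spoolRoot, [
  'Example fact: the test runner needs NODE_ENV=test.',
  'Example fact: CI caches node_modules by lockfile hash.',
]);
console.log(written.length);
```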
@@ -42,7 +42,7 @@ When scope exceeds reach, expand the cover; don't refuse and don't ship one slic
 
 === AUTO-RECALL ON TURN ENTRY ===
 
-On the first `instruction`
+On the first `instruction` after a >30s idle gap or session-start, the response carries an extra `auto_recall: {query, hits, fired_at, turn_entry: true}` pack alongside the usual `recall_hits` (the phase+PRD-subject pack) -- read it the same way. It fires once per turn; for a different query mid-turn, dispatch the `auto-recall` verb with your prompt as the body.
 
 === NO WAITING FOR PLUGKIT -- HARD RULE ===
 
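The turn-entry trigger reduces to an idle-gap check. Here is a hypothetical sketch, assuming the gap is measured between successive `instruction` dispatches (the text does not say which activity resets the clock); `makeAutoRecallGate` is an invented name.

```javascript
// Threshold from the text: a gap strictly greater than 30 seconds.
const IDLE_GAP_MS = 30_000;

// Returns a function to call on every `instruction` dispatch. It answers whether
// that dispatch is a turn entry (session start, or the first after an idle gap),
// which is when the auto_recall pack would be attached.
function makeAutoRecallGate(idleGapMs = IDLE_GAP_MS) {
  let lastInstructionAt = null; // null until the session's first instruction
  return function onInstruction(nowMs) {
    const fires = lastInstructionAt === null || nowMs - lastInstructionAt > idleGapMs;
    lastInstructionAt = nowMs;
    return fires;
  };
}

const onInstruction = makeAutoRecallGate();
console.log(onInstruction(0), onInstruction(5_000), onInstruction(40_000));
```

This prints `true false true`: the first call is session start, the second comes after only a 5s gap, and the third follows a 35s gap.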
@@ -50,7 +50,7 @@ Plugkit is synchronous from your perspective: write the request file, the watche
 
 === MUTABLES.YML -- MACHINE-CHECKED DISCIPLINE ===
 
-`.gm/mutables.yml` is co-equal with `.gm/prd.yml`: PLAN enumerates every unknown into it, EXECUTE resolves each to `status: witnessed` with filled `witness_evidence`, EMIT is hard-blocked while any entry is `status: unknown`.
+`.gm/mutables.yml` is co-equal with `.gm/prd.yml`: PLAN enumerates every unknown into it, EXECUTE resolves each to `status: witnessed` with filled `witness_evidence`, EMIT is hard-blocked while any entry is `status: unknown`. Until every entry is witnessed, the gate denies Write/Edit, `git commit`, `git push`, and turn-stop alike -- there is no reachable forward move with an unknown still open, so resolution is not optional housekeeping but the only exit. Resolution = write-back with concrete proof (file:line, codesearch hit, exec output); "I resolved it" without updating the file leaves the gate closed.
 
 === SMOKE-PAGE BAN -- USE window.__debug ===
 
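The gate rule can be sketched as a check over already-parsed `mutables.yml` entries. Assumptions: YAML parsing is omitted, entries carry hypothetical `id`, `status` and `witness_evidence` fields, and an entry counts as resolved only when it is `witnessed` with non-empty evidence. The text does not say whether plugkit's real gate inspects the evidence field itself.

```javascript
// Actions the text says stay denied while any entry is unresolved.
const GATED_ACTIONS = new Set(['write', 'edit', 'git commit', 'git push', 'turn-stop']);

// Resolved = witnessed AND backed by concrete proof written back into the file.
function isResolved(entry) {
  return entry.status === 'witnessed'
    && typeof entry.witness_evidence === 'string'
    && entry.witness_evidence.trim() !== '';
}

// Returns whether the action may proceed, plus the ids still blocking the gate.
function gateDecision(entries, action) {
  const open = entries.filter((e) => !isResolved(e)).map((e) => e.id);
  if (GATED_ACTIONS.has(action) && open.length > 0) {
    return { allow: false, open };
  }
  return { allow: true, open };
}

const demoEntries = [
  { id: 'm1', status: 'witnessed', witness_evidence: 'src/a.js:42' },
  { id: 'm2', status: 'unknown', witness_evidence: '' },
];
console.log(gateDecision(demoEntries, 'git push'));
```

Saying "I resolved it" changes nothing here: only rewriting `m2` with `status: 'witnessed'` and real evidence flips `allow` to true.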
package/skills/gm-skill/SKILL.md
CHANGED
@@ -50,9 +50,9 @@ bun x gm-plugkit@latest spool > /dev/null 2>&1 &
 
 **Apparent tooling failure is NEVER grounds to ask the user, and never a reason to a/b-test or blind-restart.** "The spooler is not working" / a missing spool response / a stale watcher is YOUR mechanical, self-service recovery, not a question for the user: honor a future `busy_until` (wait), else boot the watcher and re-dispatch -- you have the authority to boot, so asking the user to do it (or to do anything the verbs can do) is a paper-spirit violation. The spooler mechanics are sound by construction (`.status.json` is written atomically temp+rename, every long verb advertises `busy_until`), so a transient unreadable/stale read is a respawn/idle-teardown window to boot through, not a broken tool. When a transient boot hiccup occurs (e.g. `FailedToOpenSocket`), retry `bun x gm-plugkit@latest spool` -- blips resolve in seconds; never escalate to the user and never fall back to a non-`@latest` cache (it lands a stale watcher). This is the gm method applied to your own tooling: record each candidate cause as a mutable, eliminate it by witness, discover more, keep going.
 
-**Debug the live page via globals + process-of-elimination, never
+**Debug the live page via globals + process-of-elimination, never guess-and-restart, variant-after-variant, or a/b testing.** Surface the relevant state as a `window.*` global and read it live via the `browser` verb's `page.evaluate`, eliminating hypotheses one at a time -- record each as a mutable, witness its resolution, add the mutables it reveals. This record-eliminate-discover loop is the core of gm, the browser most of all.
 
-**gm genuinely profiles and debugs on both surfaces --
+**gm genuinely profiles and debugs on both surfaces -- measure, never eyeball.** The numbers exist and are cheap to read: node wall-time, memory, and the thrown stack on `exec_js`; page console, uncaught errors, network timing, and navigation performance on the `browser` verb. Profile to LOCATE the slow/broken node, then eliminate hypotheses by live measurement against your `window.*` globals -- never guess-and-restart. Two zero-boilerplate affordances make the reach trivial: every `exec_js` response carries `duration_ms`; a `browser` body prefixed `capture\n<script>` auto-returns `{result, debug:{console, pageErrors, network, performance}}`, so the listeners and timing reads come for free.
 
 From PowerShell, write spool input as UTF-8 no-BOM (`-Encoding utf8` or `[System.IO.File]::WriteAllText`); the 5.1 default UTF-16+BOM trips `spool.body-encoding-recoded`. Prefer the `Write` tool for JSON bodies. First-turn body is `{"prompt":"<user request>"}` (derives orient_nouns + recall_hits); later same-conversation turns may use `{}`. A `Write` to `in/<verb>/` that errors `ENOENT` (a fast watcher consumed and unlinked the file before the tool's post-write stat) has STILL dispatched -- confirm via the `out/` response, never blind-retry (a non-idempotent verb like `git_finalize` would double-fire); a Bash heredoc `cat > in/<verb>/<N>.txt` has no post-write stat and never surfaces this.
 
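Here is a sketch of a BOM-free spool write from Node.js. The read-back is handled the way the text handles the `Write` tool's post-write stat: an `ENOENT` means the watcher already consumed the file, so the request counts as dispatched and must not be retried. `writeSpoolRequest` and `hasBom` are hypothetical helpers, and the demo uses a temporary directory in place of `.gm/exec-spool`.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// True when the bytes start with a UTF-8 BOM (EF BB BF) or a UTF-16 BOM (FF FE / FE FF).
function hasBom(buf) {
  const utf8 = buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf;
  const utf16 = buf.length >= 2
    && ((buf[0] === 0xff && buf[1] === 0xfe) || (buf[0] === 0xfe && buf[1] === 0xff));
  return utf8 || utf16;
}

// Writes a JSON body to <spoolRoot>/in/<verb>/<n>.txt as UTF-8 without a BOM,
// then reads it back to confirm the encoding.
function writeSpoolRequest(spoolRoot, verb, n, body) {
  const dir = path.join(spoolRoot, 'in', verb);
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${n}.txt`);
  fs.writeFileSync(file, JSON.stringify(body), 'utf8');
  try {
    if (hasBom(fs.readFileSync(file))) throw new Error(`BOM in ${file}`);
    return { file, status: 'written' };
  } catch (err) {
    // The watcher consumed and unlinked the file: it HAS dispatched.
    // Confirm via the out/ response; never blind-retry.
    if (err.code === 'ENOENT') return { file, status: 'consumed' };
    throw err;
  }
}

const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'gm-spool-demo-'));
const res = writeSpoolRequest(demoRoot, 'instruction', 1, { prompt: 'add a --dry-run flag' });
console.log(res.status); // "written": no watcher runs in this demo
```

Node's `'utf8'` encoding never prepends a BOM, so on this path the read-back mainly exists to surface the `ENOENT` case; `hasBom` can also be used on its own to audit files produced by other tools.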
@@ -84,4 +84,4 @@ The chain is not COMPLETE until changes are on origin. Commit and push at the en
 
 **Prune bad memory on sight -- a wrong recall hit is worse than a miss.** A stale/superseded/wrong `recall` or `auto_recall` hit gets `memorize-prune {key}` (deletes text + embedding). For an uncertain set, `memorize-prune {query}` returns review-only candidates; judge, then re-dispatch the stale `{keys:[...]}` -- never a blind similarity-delete.
 
-On turn entry
+On turn entry plugkit attaches an `auto_recall` pack derived from the prompt; read its hits alongside `recall_hits` (the phase+PRD-subject pack). It fires once per turn entry on its own -- do not re-trigger it.
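For an uncertain set, pruning takes two dispatches with a judgment in between. Here is a hypothetical sketch of the bodies: the candidate shape (`key`, `text`), the helper names and the example memories are assumptions, and the staleness predicate stands in for your own review.

```javascript
// Step 1, review-only: a {query} body returns candidates and deletes nothing.
const pruneQuery = (query) => ({ query });

// Step 2: keep only the candidates your review judged stale and build the
// follow-up {keys:[...]} body; returns null when nothing should be deleted.
function pruneReviewed(candidates, isStale) {
  const keys = candidates.filter(isStale).map((c) => c.key);
  return keys.length > 0 ? { keys } : null;
}

// Illustrative candidates, as if returned by the review-only query.
const candidates = [
  { key: 'k1', text: 'build runs on node 16' },
  { key: 'k2', text: 'build runs on node 22' },
];
console.log(JSON.stringify(pruneQuery('node version')));
console.log(JSON.stringify(pruneReviewed(candidates, (c) => c.text.includes('node 16'))));
```

This prints `{"query":"node version"}` and then `{"keys":["k1"]}`. Only the explicitly judged key is deleted, never every similar hit.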