gm-skill 2.0.1609 → 2.0.1611
package/AGENTS.md
CHANGED
@@ -158,6 +158,8 @@ Orchestration state is tracked via `.gm/` marker files, not hook events; the CLI
**Apparent tooling failure is mechanical self-recovery, NEVER a question for the user and never an a/b-test/blind-restart.** "The spooler is not working" / a missing spool response / a stale watcher is the agent's own job to fix: honor a future `busy_until` (wait), else boot the watcher and re-dispatch; on a transient boot hiccup (`FailedToOpenSocket`) retry `@latest`, never the non-`@latest` cache (stale). The spooler is sound by construction -- `.status.json` is written atomically (temp+rename, `atomicWriteJson`) and every long verb advertises `busy_until` -- so a transient unreadable/stale read is a respawn/idle-teardown window to boot through, not a broken tool; asking the user to do what a verb can do is a paper-spirit violation. Debug the live page via `window.*` globals + the `browser` verb's `page.evaluate` as a process of elimination, never variant-after-variant a/b testing. This IS the core gm method on every surface including its own tooling: record all mutables, eliminate each by witness, discover more, keep going.
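The recovery order in the paragraph above (honor a future `busy_until`, otherwise boot the watcher and re-dispatch) can be sketched as a small decision function. This is only an illustration: the status shape (a `busy_until` field in epoch milliseconds), the function name, and the step names are assumptions made for this example, not the actual plugkit schema.

```javascript
// Hypothetical sketch: choose the next self-recovery step from a parsed
// `.status.json`. The epoch-millisecond format of busy_until and the step
// names are assumptions for illustration only.
function nextRecoveryStep(status, nowMs) {
  // An unreadable or missing status is treated as a respawn window to boot
  // through, not as a broken tool.
  if (status === null || typeof status !== 'object') {
    return { step: 'boot-and-redispatch' };
  }
  const busyUntil = Number(status.busy_until);
  // A future busy_until means a long verb is still running: wait it out.
  if (Number.isFinite(busyUntil) && busyUntil > nowMs) {
    return { step: 'wait', waitMs: busyUntil - nowMs };
  }
  // Otherwise boot the watcher and dispatch the same request again.
  return { step: 'boot-and-redispatch' };
}

const now = 1000000;
const waiting = nextRecoveryStep({ busy_until: now + 5000 }, now);
const stale = nextRecoveryStep({ busy_until: now - 1 }, now);
const unreadable = nextRecoveryStep(null, now);
console.log(waiting, stale, unreadable);
```

None of the three branches ends in a question for the user, which is the point of the rule.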
+
**Process-of-elimination is the debugging paradigm EVERYWHERE, and manual real-services witness is the verification paradigm EVERYWHERE.** Every debug -- code, wasm, cascade, browser, the spooler itself -- enumerates candidate causes as mutables and eliminates each by a witness read against real input (`exec_js`/`codesearch`/`Read`/`browser page.evaluate`), each elimination revealing the next, never guess-and-restart/a-b-test/shotgun. Every verification is manual labour against the real thing -- the single mock-free `test.js`, the live page, the real service, the live wasm -- never an automated unit/mock suite standing in for the real-services witness (the conventional-testing tell-tale gm replaces). Stated in `instructions/execute.md` (the served EXECUTE prose) so it reaches every LLM in-session.
**The first verb after a genuine multi-minute IDLE is `instruction`, to reset the long-gap clock**: gate fires when >300s since last instruction AND >300s since any SPOOL verb. Platform `Bash`/`Read`/`Edit`/`Grep` do NOT reset the clock -- a long investigation run in them trips a false stall; interleave `prd-add` or `instruction` to keep warm. For a predictable blocking wait (`TaskOutput`/`gh run watch`), dispatch `instruction` BEFORE entering the wait. Detail + platform-tool exception in rs-learn (`recall: first verb after multi-minute wait instruction long-gap`).
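The gate condition stated above is a conjunction of two clocks, which a short sketch makes explicit. The function and parameter names are hypothetical and the timestamps are plain epoch milliseconds; only the 300 s threshold and the AND come from the text.

```javascript
// Sketch of the long-gap gate as described: it fires only when BOTH the
// instruction clock and the spool-verb clock exceed 300 seconds.
// Names are hypothetical; timestamps are epoch milliseconds.
const LONG_GAP_MS = 300 * 1000;

function longGapGateFires(nowMs, lastInstructionMs, lastSpoolVerbMs) {
  const sinceInstruction = nowMs - lastInstructionMs;
  const sinceSpoolVerb = nowMs - lastSpoolVerbMs;
  return sinceInstruction > LONG_GAP_MS && sinceSpoolVerb > LONG_GAP_MS;
}

// Ten minutes with no instruction and no spool verb: the gate fires.
const idleBoth = longGapGateFires(600000, 0, 0);
// A spool verb (for example `prd-add`) 60 s ago keeps the session warm.
const keptWarm = longGapGateFires(600000, 0, 540000);
// Exactly 300 s is not "more than" 300 s.
const atThreshold = longGapGateFires(300000, 0, 0);
console.log({ idleBoth, keptWarm, atThreshold });
```

Platform tools such as `Bash` or `Read` do not appear as inputs here, which mirrors why a long run in them does not reset either clock.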
**A stop-hook firing on a terminal chain does not authorize re-polling**: when a stop-hook fires while already at `phase=COMPLETE` AND `prd_pending_count=0`, re-dispatching `instruction`/`phase-status` to "re-confirm" is a deviation (`deviation.complete-chain-poll`, `instructions/mod.rs`). Two admissible responses: (a) a prose-only turn (COMPLETE is in hand), or (b) genuinely new planned work opened with a FRESH `{"prompt":...}` body (resets phase to PLAN, driven through the skill). Repeatedly answering the same hook is a loop; state the terminal facts once and stop, or open new work.
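As a rough sketch of the rule above, the terminal check and the two admissible responses can be written as a pure function. The state field names follow the text (`phase`, `prd_pending_count`); the function names and returned labels are hypothetical, and the non-terminal branch is outside what the paragraph specifies.

```javascript
// Hypothetical sketch of the stop-hook rule for a terminal chain.
function isTerminalChain(state) {
  return state.phase === 'COMPLETE' && state.prd_pending_count === 0;
}

function responseToStopHook(state, hasNewPlannedWork) {
  if (!isTerminalChain(state)) {
    // Not covered by this rule; the label is only a placeholder.
    return 'not-terminal';
  }
  // Re-polling `instruction`/`phase-status` is never one of the options.
  return hasNewPlannedWork ? 'fresh-prompt-body' : 'prose-only-turn';
}

const done = { phase: 'COMPLETE', prd_pending_count: 0 };
const a = responseToStopHook(done, false);
const b = responseToStopHook(done, true);
const c = responseToStopHook({ phase: 'COMPLETE', prd_pending_count: 2 }, false);
console.log(a, b, c);
```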
@@ -36,6 +36,27 @@ The window opens on the user's screen -- that IS the witness. `GM_BROWSER_HEADLE
`session new` (or a bare expression with no live session) spawns a locally-profiled Chromium at `<cwd>/.gm/browser-profile/`; the runner attaches via `--direct <wsEndpoint>`. Cookies/storage/extensions persist across sessions, turns, and runs. A second concurrent launch contends the SingletonLock; the watcher reuses the live CDP rather than re-launching. The runner's extension-attach mode ("Waiting for extension to connect") is never the default or what you want -- seeing it in `stderr` means the host failed to spawn local Chromium; dispatch `instruction` for recovery, not a blind retry.
+
## Profile and debug recipes
+
The page is a genuine profiler and debugger -- use it, never guess-and-restart. Attach the listeners BEFORE `page.goto`, then return the captured arrays from one script (all witnessed live):
+
```
const logs=[],errs=[],net=[];
page.on('console',m=>logs.push({type:m.type(),text:m.text()}));
page.on('pageerror',e=>errs.push(String(e&&e.message||e)));
page.on('requestfinished',r=>{const t=r.timing();net.push({url:r.url(),dur_ms:Math.round(t.responseEnd),ttfb_ms:Math.round(t.responseStart)});});
await page.goto(URL,{waitUntil:'load'});
const perf=await page.evaluate(()=>{const n=performance.getEntriesByType('navigation')[0]||{};return {load_ms:Math.round(n.loadEventEnd||0),dcl_ms:Math.round(n.domContentLoadedEventEnd||0),resources:performance.getEntriesByType('resource').length,now:Math.round(performance.now())};});
return {logs,errs,net,perf};
```
+
- **Console + uncaught errors**: `page.on('console')` captures every page `console.*`; `page.on('pageerror')` captures uncaught exceptions (a `try/catch` in the page swallows them -- they surface as a console.error instead). This is your debug log.
+
- **Performance**: `performance.getEntriesByType('navigation')[0]` gives `loadEventEnd`/`domContentLoadedEventEnd`; `getEntriesByType('resource')` gives per-asset timing; `performance.now()`/`PerformanceObserver` for in-page measures. This is your profiler.
+
- **Network timing**: `request.timing()` fields (`responseEnd`, `responseStart`, ...) are ALREADY relative to `startTime` -- use `Math.round(t.responseEnd)` directly for duration; subtracting `startTime` yields a garbage huge-negative (witnessed). `-1` means N/A.
+
- **State**: expose any runtime value as `window.__x` in the app or via `page.evaluate(()=>{window.__x=...})`, then read it with another `page.evaluate` -- the live global beats a restart. Surface relevant state as a global on purpose so a single evaluate observes it.
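The network-timing bullet above can be illustrated with a small helper that runs on a plain object shaped like the result of `request.timing()`. The helper name and the sample numbers are made up for this sketch; the two behaviours it encodes (fields already relative to `startTime`, `-1` meaning N/A) are the ones stated in the list.

```javascript
// Hypothetical helper: summarize a timing object shaped like request.timing().
// Fields other than startTime are treated as milliseconds relative to
// startTime, and -1 is treated as "not available".
function summarizeTiming(t) {
  const rel = (v) => (typeof v === 'number' && v >= 0 ? Math.round(v) : null);
  return { ttfb_ms: rel(t.responseStart), dur_ms: rel(t.responseEnd) };
}

// Illustrative values only, not measured data.
const sample = { startTime: 1717000000000, responseStart: 41.6, responseEnd: 120.4 };
const summary = summarizeTiming(sample);
const unavailable = summarizeTiming({ startTime: 1717000000000, responseStart: -1, responseEnd: -1 });

// The mistake the bullet warns about: subtracting startTime a second time.
const doubleSubtracted = Math.round(sample.responseEnd - sample.startTime);
console.log(summary, unavailable, doubleSubtracted);
```

Mapping `-1` to `null` keeps an unavailable measurement from being read as a very fast one.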
+
Profile to LOCATE (which call/resource is slow), then eliminate hypotheses by live measurement -- never a/b-test by restarting. The node side mirrors this: `exec_js` with `process.hrtime.bigint()`/`performance.now()` timing, `process.memoryUsage()`, and `stderr` stack capture is a genuine node profiler+debugger.
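A minimal sketch of the node-side measurements named above, using only `process.hrtime.bigint()`, `process.memoryUsage()`, and the thrown error's `stack`. The `profile` wrapper and its labels are hypothetical; it is written as standalone Node code, and how a given `exec_js` dispatch returns stdout and stderr is as described in the text rather than shown here.

```javascript
// Hypothetical wrapper: time a synchronous function and report the heap delta.
function profile(label, fn) {
  const heapBefore = process.memoryUsage().heapUsed;
  const start = process.hrtime.bigint();
  let result;
  let error = null;
  try {
    result = fn();
  } catch (e) {
    error = e;
  }
  // hrtime.bigint() is in nanoseconds; convert the difference to milliseconds.
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  const heapDeltaBytes = process.memoryUsage().heapUsed - heapBefore;
  // Numbers go to stdout, the stack (if any) goes to stderr.
  console.log(JSON.stringify({ label, elapsedMs, heapDeltaBytes, ok: error === null }));
  if (error) console.error(error.stack);
  return { result, error, elapsedMs, heapDeltaBytes };
}

const summed = profile('sum-to-1e6', () => {
  let s = 0;
  for (let i = 0; i < 1e6; i++) s += i;
  return s;
});
const failed = profile('throws', () => {
  throw new Error('boom');
});
```

Wrapping several suspects with the same helper gives comparable numbers, which is what locating the slow call requires. The heap delta is indicative only, since garbage collection can run at any point during the measurement.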
## Discipline
Never spawn Chromium yourself, `npm i puppeteer`, or shell `chrome.exe`; the verb owns the handle, and bypassing it orphans state plugkit cannot reap and breaks the next session's first read. Navigate by evaluating `location.href = '...'` through the spool; screenshot by dispatching the verb that returns one. A dispatch returning `ok:false` with a launch error is plugkit reporting the environment refused -- read `stderr`, dispatch `instruction`, do not loop the same body.
@@ -36,6 +36,8 @@ First emit = closure of the transform; scaffold + IOU externalizes residual cost
Data first -- get the structures and their invariants right and the code writes itself; convoluted control flow means the data model is wrong, so fix the model. Make invalid state unrepresentable -- pass parameters over hidden globals, encode the constraint in the type/shape so the bad combination cannot be constructed. Reason from physical constraints (latency, bandwidth, memory, coordination, the worst node) before designing within them. Keep the spine flat, each unit single-focus and understandable at its call site. Make misuse structurally impossible, not documented-against. Optimize the worst case, not the average; design every failure path explicitly (full -> degraded -> safe-fail -> explicit-error), never a silent catastrophic mode. Measure, do not assume -- profile before optimizing, implement both and compare on real input when in genuine dispute. When a change regresses something that worked, revert first and investigate second: restore green, then diagnose from a known-good base. Fail fast and loud over limping on bad state.
+
**Process of elimination is the debugging paradigm on every surface, and manual labour against real services is how you witness.** Never guess-and-restart, a/b-test, or shotgun variants: enumerate the candidate causes as mutables, then eliminate each by a witness read against REAL input -- `exec_js` against the real service, `codesearch`/`Read` against the real source, the `browser` verb's `page.evaluate` against a `window.*` global on the live page. Each elimination reveals the next mutable; record it and keep going until one cause survives every other's refutation. Reading the live runtime once observes more than a hundred blind restarts. Profile genuinely on both surfaces: in node, `exec_js` with `process.hrtime.bigint()`/`performance.now()` around the suspect code, `process.memoryUsage()`, and the thrown-error `stack` (stdout carries the numbers, stderr the stack); in the browser, the `browser` verb with `page.on('console')`/`page.on('pageerror')` capture + `performance.getEntriesByType('navigation'|'resource')` + `request.timing().responseEnd` (see browser prose for the recipe). Profile to LOCATE the slow/broken node, then eliminate hypotheses by live measurement. Verification is the same labour: run the real thing and witness the real output (the single mock-free `test.js`, the live page, the real service), never an automated unit/mock harness standing in for the real-services witness. Apparent tooling failure is part of this -- it is your mechanical self-recovery by elimination, never a question for the user.
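The elimination loop described above can be sketched as a ledger of candidate causes, each paired with a witness. Everything named here is hypothetical: the `eliminate` function, the candidate causes, and the `observed` object, which stands in for values that would in practice come from a live read (`exec_js`, `Read`, or `page.evaluate`). The sketch also omits the discovery of new mutables that each elimination can trigger.

```javascript
// Hypothetical ledger: each candidate cause ("mutable") carries a witness,
// a function that reads observed state and returns true when the cause is
// refuted by what was actually seen.
function eliminate(candidates) {
  const survivors = [];
  const ledger = [];
  for (const c of candidates) {
    const refuted = c.witness();
    ledger.push({ cause: c.cause, refuted });
    if (!refuted) survivors.push(c.cause);
  }
  return { survivors, ledger };
}

// Stand-in observations for illustration; not real measurements.
const observed = { cacheControl: 'no-store', workerRegistered: false, apiStatus: 500 };

const outcome = eliminate([
  { cause: 'stale HTTP cache', witness: () => observed.cacheControl === 'no-store' },
  { cause: 'service worker serving an old bundle', witness: () => observed.workerRegistered === false },
  { cause: 'API returning an error', witness: () => observed.apiStatus === 200 },
]);
console.log(outcome.survivors);
```

The ledger keeps the refuted causes as well as the survivors, so the record shows what was ruled out and by which reading.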
## Memorize
Write the recall index only via `memorize-fire`; other surfaces produce memos the index never sees. Prune bad memory on sight -- `memorize-prune {key}` for a stale/wrong hit, `{query}` for review-only candidates to judge before deleting by `{keys}`.
package/gm-plugkit/package.json
CHANGED
@@ -1,6 +1,6 @@
{
"name": "gm-plugkit",
-
"version": "2.0.
+
"version": "2.0.1611",
"description": "Bootstrap and daemon-spawn tool for gm plugkit binary. Downloads the correct platform binary, verifies SHA256, and starts the spool watcher daemon. Includes plugkit-wasm-wrapper for WASM-based spool watching.",
"main": "index.js",
"bin": {
package/gm.json
CHANGED
package/package.json
CHANGED
@@ -1,6 +1,6 @@
{
"name": "gm-skill",
-
"version": "2.0.
+
"version": "2.0.1611",
"description": "Canonical universal harness — AI-native software engineering via skill-driven orchestration; bootstraps plugkit for task execution and session isolation. Install in any AI coding agent host.",
"author": "AnEntrypoint",
"license": "MIT",
package/skills/gm-skill/SKILL.md
CHANGED
@@ -52,6 +52,8 @@ bun x gm-plugkit@latest spool > /dev/null 2>&1 &
**Debug the live page via globals + process-of-elimination, never a/b testing.** When a browser/client issue is hard, the move is NOT to guess-and-restart or try variant after variant: surface the relevant state as a `window.*` global and read it live via the `browser` verb's `page.evaluate`, running experiments in the page to eliminate hypotheses one by one (record each as a mutable, witness its resolution, add the new mutables it reveals). A global plus one evaluate observes real runtime state in a single dispatch; the restart-and-eyeball / a/b loop observes almost nothing and burns turns. This process -- record all mutables, eliminate by witness, discover more, keep going -- is the core of gm and applies to every debugging surface, the browser most of all.
+
**gm genuinely profiles and debugs on both surfaces -- do it, do not eyeball.** Node via `exec_js`: wrap the suspect code in `process.hrtime.bigint()`/`performance.now()`, read `process.memoryUsage()`, capture thrown-error `stack` (stdout returns the numbers, stderr the stack). Browser via the `browser` verb: attach `page.on('console')` + `page.on('pageerror')` before `page.goto`, then `page.evaluate` `performance.getEntriesByType('navigation'|'resource')` and your `window.*` globals; for network use `request.timing().responseEnd` directly (it is already relative to startTime). Profile to LOCATE the slow/broken node, then eliminate hypotheses by live measurement -- never guess-and-restart. Both capabilities are witnessed-working; reach for them on every hard performance or correctness question.
From PowerShell, write spool input as UTF-8 no-BOM (`-Encoding utf8` or `[System.IO.File]::WriteAllText`); the 5.1 default UTF-16+BOM trips `spool.body-encoding-recoded`. Prefer the `Write` tool for JSON bodies. First-turn body is `{"prompt":"<user request>"}` (derives orient_nouns + recall_hits); later same-conversation turns may use `{}`. A `Write` to `in/<verb>/` that errors `ENOENT` (a fast watcher consumed and unlinked the file before the tool's post-write stat) has STILL dispatched -- confirm via the `out/` response, never blind-retry (a non-idempotent verb like `git_finalize` would double-fire); a Bash heredoc `cat > in/<verb>/<N>.txt` has no post-write stat and never surfaces this.
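One way to check a spool body's encoding before dispatch is to look at its leading bytes. The sketch below is an illustration under assumptions: `detectBom` is a hypothetical helper, not how plugkit detects a recoded body, and the UTF-16 sample is constructed in memory rather than produced by PowerShell.

```javascript
// Hypothetical helper: classify the byte-order mark at the start of a buffer.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return 'utf-8-bom';
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return 'utf-16le-bom';
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return 'utf-16be-bom';
  return 'none';
}

const body = JSON.stringify({ prompt: '<user request>' });

// UTF-8 without a BOM: the form the text asks for.
const clean = detectBom(Buffer.from(body, 'utf8'));
// UTF-16LE with a BOM, built by hand to resemble the encoding the text warns about.
const utf16 = detectBom(Buffer.concat([Buffer.from([0xff, 0xfe]), Buffer.from(body, 'utf16le')]));
// UTF-8 with a BOM is also not "no-BOM".
const utf8Bom = detectBom(Buffer.concat([Buffer.from([0xef, 0xbb, 0xbf]), Buffer.from(body, 'utf8')]));
console.log({ clean, utf16, utf8Bom });
```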
**Batch writes and reads together.** Write request + Read response is one logical step -- issue both in one block, not three turns. Fan-out is the same: N independent verbs = N Writes in one block then N Reads in one block. Only a real data dependency (verb B needs A's response) forces separate turns.