gm-skill 2.0.1611 → 2.0.1613

@@ -12,7 +12,7 @@ YOU drive the browser through the spool: plugkit holds the Chromium handle, per-
 
  ## Body shapes
 
- The body is a string, five shapes only:
+ The body is a string, six shapes only:
 
  ```
  session new
@@ -20,10 +20,13 @@ session list
  session close <id>
  <arbitrary JS expression evaluated in page context>
  timeout=<ms>\n<expression>
+ capture\n<expression>
  ```
 
  A bare expression with no live session opens one against `about:blank`; with a live session it reuses it. `session new` returns the id you carry; with more than one open, target it via `session=<id>\n<expr>`. (`session close` and `session kill` are aliases.) Default per-eval timeout 120000ms; operations that legitimately exceed it prefix `timeout=<ms>\n` (wrapper clamps to 120000ms). The response carries `timeout_ms_used`; `browser.runner-timeout` fires at the cap -- read `stderr`, narrow or raise, never retry blind at the same budget.
 
+ **`capture\n<expression>` is the zero-boilerplate debug path -- prefer it.** Prefix your script with `capture` (or `profile`) on its own line and the wrapper auto-attaches `page.on('console'|'pageerror'|'requestfinished')` before your code runs, runs your script in an async wrapper (your top-level `await`/`return` work unchanged), and returns `{result: <your return>, debug: {console, pageErrors, network, performance}}` -- page console logs, uncaught errors, per-request network timing, and navigation performance, captured for free. Combine with timeout via `timeout=<ms>\ncapture\n<expr>`. Use the bare expression only when you do not want the capture overhead.
+
  ## Envelope
 
  `{ok, stdout, stderr, exit_code, session_id?}`. `stdout` = stringified eval result; `stderr` = page errors + launch diagnostics; `exit_code` non-zero = the dispatch did not land -- read `stderr` and re-dispatch, never blind.
@@ -38,7 +41,7 @@ The window opens on the user's screen -- that IS the witness. `GM_BROWSER_HEADLE
 
  ## Profile and debug recipes
 
- The page is a genuine profiler and debugger -- use it, never guess-and-restart. Attach the listeners BEFORE `page.goto`, then return the captured arrays from one script (all witnessed live):
+ The page is a genuine profiler and debugger -- use it, never guess-and-restart. The `capture\n` prefix above does all of this for free; reach for the manual recipe below only for custom capture. Attach the listeners BEFORE `page.goto`, then return the captured arrays from one script (all witnessed live):
 
  ```
  const logs=[],errs=[],net=[];
@@ -54,6 +57,7 @@ return {logs,errs,net,perf};
  - **Performance**: `performance.getEntriesByType('navigation')[0]` gives `loadEventEnd`/`domContentLoadedEventEnd`; `getEntriesByType('resource')` gives per-asset timing; `performance.now()`/`PerformanceObserver` for in-page measures. This is your profiler.
  - **Network timing**: `request.timing()` fields (`responseEnd`, `responseStart`, ...) are ALREADY relative to `startTime` -- use `Math.round(t.responseEnd)` directly for duration; subtracting `startTime` yields a garbage huge-negative (witnessed). `-1` means N/A.
  - **State**: expose any runtime value as `window.__x` in the app or via `page.evaluate(()=>{window.__x=...})`, then read it with another `page.evaluate` -- the live global beats a restart. Surface relevant state as a global on purpose so a single evaluate observes it.
+ - **Screenshots**: to actually SEE a screenshot, save it to a file and `Read` that path -- `await page.screenshot({path:'<abs>/shot.png'})` then `Read <abs>/shot.png` (the Read tool renders the PNG visually; witnessed). NEVER `return` the screenshot Buffer inline -- it stringifies to useless bytes in the envelope. The same applies to any binary: write it to a path, then Read the path.
 
  Profile to LOCATE (which call/resource is slow), then eliminate hypotheses by live measurement -- never a/b-test by restarting. The node side mirrors this: `exec_js` with `process.hrtime.bigint()`/`performance.now()` timing, `process.memoryUsage()`, and `stderr` stack capture is a genuine node profiler+debugger.
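
The body shapes described in this file compose by stacking prefix lines in front of the script. The following is a minimal sketch of that composition, not code from the package: `buildBrowserBody` and its option names are hypothetical, and the URL is a placeholder. It follows only the documented order `timeout=<ms>\ncapture\n<expr>`; how a `session=<id>` line combines with the other prefixes is not stated in this diff, so it is left out.

```javascript
// Hypothetical helper (not part of gm-skill): composes a `browser` body string
// from the documented shapes `timeout=<ms>\n`, `capture\n`, and the script itself.
function buildBrowserBody(script, { capture = false, timeoutMs } = {}) {
  const lines = [];
  // Documented order: the timeout line first, then the capture line, then the script.
  if (timeoutMs !== undefined) {
    if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
      throw new RangeError(`timeoutMs must be a positive integer, got ${timeoutMs}`);
    }
    lines.push(`timeout=${timeoutMs}`);
  }
  if (capture) lines.push('capture');
  lines.push(script);
  return lines.join('\n');
}

// Placeholder script: navigate, then return the page title.
const body = buildBrowserBody(
  "await page.goto('https://example.com'); return await page.title();",
  { capture: true, timeoutMs: 60000 },
);
console.log(body);
```

With `capture` set, the response is documented to carry `{result, debug: {console, pageErrors, network, performance}}`; without it, the body is just the bare expression.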
 
@@ -36,7 +36,7 @@ First emit = closure of the transform; scaffold + IOU externalizes residual cost
 
  Data first -- get the structures and their invariants right and the code writes itself; convoluted control flow means the data model is wrong, so fix the model. Make invalid state unrepresentable -- pass parameters over hidden globals, encode the constraint in the type/shape so the bad combination cannot be constructed. Reason from physical constraints (latency, bandwidth, memory, coordination, the worst node) before designing within them. Keep the spine flat, each unit single-focus and understandable at its call site. Make misuse structurally impossible, not documented-against. Optimize the worst case, not the average; design every failure path explicitly (full -> degraded -> safe-fail -> explicit-error), never a silent catastrophic mode. Measure, do not assume -- profile before optimizing, implement both and compare on real input when in genuine dispute. When a change regresses something that worked, revert first and investigate second: restore green, then diagnose from a known-good base. Fail fast and loud over limping on bad state.
 
- **Process of elimination is the debugging paradigm on every surface, and manual labour against real services is how you witness.** Never guess-and-restart, a/b-test, or shotgun variants: enumerate the candidate causes as mutables, then eliminate each by a witness read against REAL input -- `exec_js` against the real service, `codesearch`/`Read` against the real source, the `browser` verb's `page.evaluate` against a `window.*` global on the live page. Each elimination reveals the next mutable; record it and keep going until one cause survives every other's refutation. Reading the live runtime once observes more than a hundred blind restarts. Profile genuinely on both surfaces: in node, `exec_js` with `process.hrtime.bigint()`/`performance.now()` around the suspect code, `process.memoryUsage()`, and the thrown-error `stack` (stdout carries the numbers, stderr the stack); in the browser, the `browser` verb with `page.on('console')`/`page.on('pageerror')` capture + `performance.getEntriesByType('navigation'|'resource')` + `request.timing().responseEnd` (see browser prose for the recipe). Profile to LOCATE the slow/broken node, then eliminate hypotheses by live measurement. Verification is the same labour: run the real thing and witness the real output (the single mock-free `test.js`, the live page, the real service), never an automated unit/mock harness standing in for the real-services witness. Apparent tooling failure is part of this -- it is your mechanical self-recovery by elimination, never a question for the user.
+ **Process of elimination is the debugging paradigm on every surface, and manual labour against real services is how you witness.** Never guess-and-restart, a/b-test, or shotgun variants: enumerate the candidate causes as mutables, then eliminate each by a witness read against REAL input -- `exec_js` against the real service, `codesearch`/`Read` against the real source, the `browser` verb's `page.evaluate` against a `window.*` global on the live page. Each elimination reveals the next mutable; record it and keep going until one cause survives every other's refutation. Reading the live runtime once observes more than a hundred blind restarts. Profile genuinely on both surfaces: in node, `exec_js` with `process.hrtime.bigint()`/`performance.now()` around the suspect code, `process.memoryUsage()`, and the thrown-error `stack` (stdout carries the numbers, stderr the stack); in the browser, a `browser` body prefixed `capture\n<script>` auto-returns `{result, debug:{console, pageErrors, network, performance}}` (zero boilerplate), or hand-attach `page.on('console')`/`page.on('pageerror')` + `performance.getEntriesByType('navigation'|'resource')` + `request.timing().responseEnd` (see browser prose). `exec_js` responses carry `duration_ms` for free. Profile to LOCATE the slow/broken node, then eliminate hypotheses by live measurement. Verification is the same labour: run the real thing and witness the real output (the single mock-free `test.js`, the live page, the real service), never an automated unit/mock harness standing in for the real-services witness. Apparent tooling failure is part of this -- it is your mechanical self-recovery by elimination, never a question for the user.
 
  ## Memorize
 
@@ -1,6 +1,6 @@
  {
  "name": "gm-plugkit",
- "version": "2.0.1611",
+ "version": "2.0.1613",
  "description": "Bootstrap and daemon-spawn tool for gm plugkit binary. Downloads the correct platform binary, verifies SHA256, and starts the spool watcher daemon. Includes plugkit-wasm-wrapper for WASM-based spool watching.",
  "main": "index.js",
  "bin": {
@@ -1876,6 +1876,7 @@ function makeHostFunctions(instanceRef) {
  else if (lang === 'bash') { cmd = 'bash'; args = ['-c', code]; }
  else if (lang === 'deno') { cmd = 'deno'; args = ['eval', code]; }
  else { return writeWasmJson(instanceRef.value, { ok: false, error: `unsupported lang: ${lang}` }); }
+ const __execT0 = Date.now();
  const result = spawnSync(cmd, args, { encoding: 'utf-8', timeout: timeoutMs, cwd, env: process.env });
  return writeWasmJson(instanceRef.value, {
  ok: result.status === 0,
@@ -1883,6 +1884,7 @@ function makeHostFunctions(instanceRef) {
  stderr: result.stderr || '',
  exit_code: result.status === null ? -1 : result.status,
  timed_out: result.signal === 'SIGTERM',
+ duration_ms: Date.now() - __execT0,
  });
  } catch (e) {
  return writeWasmJson(instanceRef.value, { ok: false, error: e.message });
@@ -1998,6 +2000,17 @@ function makeHostFunctions(instanceRef) {
  evalBody = timeoutMatch[2];
  }
  }
+ const captureMatch = evalBody.match(/^(?:capture|profile)[ \t]*\n([\s\S]*)$/);
+ if (captureMatch) {
+ const userScript = captureMatch[1];
+ evalBody = `const __logs=[],__errs=[],__net=[];\n`
+ + `try{page.on('console',m=>{try{__logs.push({type:m.type(),text:m.text()});}catch(_){}});`
+ + `page.on('pageerror',e=>{try{__errs.push(String(e&&e.message||e));}catch(_){}});`
+ + `page.on('requestfinished',r=>{try{const t=r.timing();__net.push({url:String(r.url()).slice(0,120),dur_ms:Math.round(t.responseEnd),ttfb_ms:Math.round(t.responseStart)});}catch(_){}});}catch(_){}\n`
+ + `const __result = await (async () => {\n${userScript}\n})();\n`
+ + `let __perf=null;try{__perf=await page.evaluate(()=>{const n=performance.getEntriesByType('navigation')[0];return n?{load_ms:Math.round(n.loadEventEnd||0),dcl_ms:Math.round(n.domContentLoadedEventEnd||0),resources:performance.getEntriesByType('resource').length,now:Math.round(performance.now())}:null;});}catch(_){}\n`
+ + `return {result:__result,debug:{console:__logs,pageErrors:__errs,network:__net.slice(0,30),performance:__perf}};`;
+ }
  const outerTimeoutMs = Math.min(timeoutMs + 6000, 126000);
  const r = runBrowserRunner(pw, ['-s', pwSessionId, '--timeout', String(timeoutMs), '-e', evalBody], outerTimeoutMs, cwd, sessionId);
  const ok = r.status === 0;
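
The prefix detection in the hunk above can be exercised on its own. This is a small sketch that reuses the regular expression exactly as it appears in the diff; the wrapper function `extractCaptureScript` is a hypothetical name added for illustration. It shows which bodies opt into capture and which fall through as bare expressions. It assumes no earlier normalization of `evalBody` beyond the `timeout=` stripping visible in the hunk context.

```javascript
// Regex copied verbatim from the diff; the surrounding helper is illustrative only.
const CAPTURE_PREFIX = /^(?:capture|profile)[ \t]*\n([\s\S]*)$/;

// Returns the user script when the body opts into capture, otherwise null.
function extractCaptureScript(evalBody) {
  const m = evalBody.match(CAPTURE_PREFIX);
  return m ? m[1] : null;
}

console.log(extractCaptureScript('capture\nreturn 1;'));      // 'return 1;'
console.log(extractCaptureScript('profile \t\nreturn 1;'));   // 'return 1;' (trailing spaces/tabs allowed)
console.log(extractCaptureScript('capture return 1;'));       // null (keyword must be on its own line)
console.log(extractCaptureScript('Capture\nreturn 1;'));      // null (match is case-sensitive)
console.log(extractCaptureScript('capture\r\nreturn 1;'));    // null (a CR before the LF is not accepted)
```

One consequence, if the body really does reach this match unmodified: a body written with CRLF line endings would not be recognized as a capture body and would be evaluated as a plain expression instead.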
@@ -3096,6 +3109,7 @@ async function runSpoolWatcher(instance, spoolDir) {
 
  const ptr = Number(result & 0xffffffffn);
  const len = Number(result >> 32n);
+ guardWasmRange(instance.exports.memory.buffer, ptr, len, `spool-dispatch:${verb}`);
  const resultBytes = new Uint8Array(instance.exports.memory.buffer, ptr, len);
  let resultStr = new TextDecoder().decode(resultBytes);
 
@@ -3805,6 +3819,7 @@ if (_isCliEntry) (async () => {
  const result = dispatch(verbPtr, verbBytes.length, bodyPtr, bodyBytes.length);
  const ptr = Number(result & 0xffffffffn);
  const len = Number(result >> 32n);
+ guardWasmRange(instance.exports.memory.buffer, ptr, len, `cli-dispatch:${verb}`);
  const out = new TextDecoder().decode(new Uint8Array(instance.exports.memory.buffer, ptr, len));
  process.stdout.write(out);
  let parsed;
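
Both dispatch sites above unpack a 64-bit result into a pointer (low 32 bits) and a length (high 32 bits), then call `guardWasmRange` before creating a view over WASM memory. The body of `guardWasmRange` is not part of this diff, so the sketch below is an assumption about what such a guard might check, under the hypothetical name `guardWasmRangeSketch`; only the unpacking arithmetic is taken from the diff.

```javascript
// Hypothetical stand-in for guardWasmRange: the real implementation is not shown in the diff.
// Assumed behaviour: throw a labelled RangeError when [ptr, ptr+len) is not inside the buffer.
function guardWasmRangeSketch(buffer, ptr, len, label) {
  const inRange = Number.isInteger(ptr) && Number.isInteger(len)
    && ptr >= 0 && len >= 0 && ptr + len <= buffer.byteLength;
  if (!inRange) {
    throw new RangeError(`${label}: ptr=${ptr} len=${len} outside ${buffer.byteLength}-byte memory`);
  }
}

// Same unpacking as the diff: low 32 bits are the pointer, high 32 bits the length.
function unpackResult(result) {
  return { ptr: Number(result & 0xffffffffn), len: Number(result >> 32n) };
}

// Demo: a one-page (64 KiB) memory with a small JSON payload written at offset 16.
const memory = new WebAssembly.Memory({ initial: 1 });
const payload = new TextEncoder().encode('{"ok":true}');
new Uint8Array(memory.buffer).set(payload, 16);

const packed = (BigInt(payload.length) << 32n) | 16n;
const { ptr, len } = unpackResult(packed);
guardWasmRangeSketch(memory.buffer, ptr, len, 'demo-dispatch');
const text = new TextDecoder().decode(new Uint8Array(memory.buffer, ptr, len));
console.log(text); // {"ok":true}
```

Without a guard, an out-of-range `new Uint8Array(buffer, ptr, len)` already throws a generic `RangeError`; a labelled check like this one would mainly add the dispatch site to the message, which may be the motivation for the `spool-dispatch:`/`cli-dispatch:` labels.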
package/gm.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "gm",
- "version": "2.0.1611",
+ "version": "2.0.1613",
  "description": "Spool-dispatch orchestration engine with unified state machine, skills, and automated git enforcement",
  "author": "AnEntrypoint",
  "license": "MIT",
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "gm-skill",
- "version": "2.0.1611",
+ "version": "2.0.1613",
  "description": "Canonical universal harness — AI-native software engineering via skill-driven orchestration; bootstraps plugkit for task execution and session isolation. Install in any AI coding agent host.",
  "author": "AnEntrypoint",
  "license": "MIT",
@@ -52,7 +52,7 @@ bun x gm-plugkit@latest spool > /dev/null 2>&1 &
 
  **Debug the live page via globals + process-of-elimination, never a/b testing.** When a browser/client issue is hard, the move is NOT to guess-and-restart or try variant after variant: surface the relevant state as a `window.*` global and read it live via the `browser` verb's `page.evaluate`, running experiments in the page to eliminate hypotheses one by one (record each as a mutable, witness its resolution, add the new mutables it reveals). A global plus one evaluate observes real runtime state in a single dispatch; the restart-and-eyeball / a/b loop observes almost nothing and burns turns. This process -- record all mutables, eliminate by witness, discover more, keep going -- is the core of gm and applies to every debugging surface, the browser most of all.
 
- **gm genuinely profiles and debugs on both surfaces -- do it, do not eyeball.** Node via `exec_js`: wrap the suspect code in `process.hrtime.bigint()`/`performance.now()`, read `process.memoryUsage()`, capture thrown-error `stack` (stdout returns the numbers, stderr the stack). Browser via the `browser` verb: attach `page.on('console')` + `page.on('pageerror')` before `page.goto`, then `page.evaluate` `performance.getEntriesByType('navigation'|'resource')` and your `window.*` globals; for network use `request.timing().responseEnd` directly (it is already relative to startTime). Profile to LOCATE the slow/broken node, then eliminate hypotheses by live measurement -- never guess-and-restart. Both capabilities are witnessed-working; reach for them on every hard performance or correctness question.
+ **gm genuinely profiles and debugs on both surfaces -- do it, do not eyeball.** Node via `exec_js`: wrap the suspect code in `process.hrtime.bigint()`/`performance.now()`, read `process.memoryUsage()`, capture thrown-error `stack` (stdout returns the numbers, stderr the stack). Browser via the `browser` verb: attach `page.on('console')` + `page.on('pageerror')` before `page.goto`, then `page.evaluate` `performance.getEntriesByType('navigation'|'resource')` and your `window.*` globals; for network use `request.timing().responseEnd` directly (it is already relative to startTime). Profile to LOCATE the slow/broken node, then eliminate hypotheses by live measurement -- never guess-and-restart. Both capabilities are witnessed-working; reach for them on every hard performance or correctness question. Two zero-boilerplate affordances make this trivial: every `exec_js` response carries `duration_ms` (free node wall-time); and a `browser` body prefixed `capture\n<script>` auto-returns `{result, debug:{console, pageErrors, network, performance}}` so you get page logs, uncaught errors, network timing, and navigation performance without writing the `page.on` setup.
 
  From PowerShell, write spool input as UTF-8 no-BOM (`-Encoding utf8` or `[System.IO.File]::WriteAllText`); the 5.1 default UTF-16+BOM trips `spool.body-encoding-recoded`. Prefer the `Write` tool for JSON bodies. First-turn body is `{"prompt":"<user request>"}` (derives orient_nouns + recall_hits); later same-conversation turns may use `{}`. A `Write` to `in/<verb>/` that errors `ENOENT` (a fast watcher consumed and unlinked the file before the tool's post-write stat) has STILL dispatched -- confirm via the `out/` response, never blind-retry (a non-idempotent verb like `git_finalize` would double-fire); a Bash heredoc `cat > in/<verb>/<N>.txt` has no post-write stat and never surfaces this.