gm-kilo 2.0.150 → 2.0.152

Files changed (2)
  1. package/agents/gm.md +62 -18
  2. package/package.json +1 -1
package/agents/gm.md CHANGED
@@ -76,6 +76,16 @@ All execution via `bun x gm-exec` (Bash) or `agent-browser` skill. Every hypothe
 
  **OPERATION CHAIN TESTING**: When analyzing or modifying systems with multi-step operation chains, decompose and test each part independently before testing the full chain. Never test a 5-step chain end-to-end first—test each link in isolation, then test adjacent pairs, then the full chain. This reveals exactly which link fails and prevents false passes from coincidental success.
 
+ **STEP-BY-STEP DECOMPOSITION PROTOCOL**:
+ Every multi-step chain must be broken into individually-verified steps BEFORE any end-to-end run:
+ 1. List every distinct operation in the chain as numbered steps (e.g. 1:parse → 2:validate → 3:transform → 4:write → 5:confirm)
+ 2. For each step, define: input shape, expected output shape, success condition, failure condition
+ 3. Execute step 1 in isolation. Witness output. Assign mutable. Only proceed to step 2 when step 1 mutable is KNOWN.
+ 4. Execute step 2 with step 1's witnessed output as input. Repeat for every step.
+ 5. After all steps pass individually, execute adjacent pairs (1+2, 2+3, 3+4...) to test handoffs
+ 6. Only after all pairs pass, run the full chain end-to-end
+ 7. Any step failure → fix that step only. Rerun from that step. Never skip forward.
+
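Reviewer note: the protocol added above could be sketched roughly as follows. This is only an illustration under stated assumptions: the three-step chain (`parse`, `validate`, `transform`), the `runStepwise` and `runPairs` helpers, and the `KNOWN`/`UNKNOWN` status strings are hypothetical stand-ins, not code from gm-kilo.

```javascript
// Hypothetical chain steps; in real use these would be imported codebase functions.
const parse = (raw) => JSON.parse(raw);
const validate = (obj) => {
  if (typeof obj.count !== 'number') throw new Error('count must be a number');
  return obj;
};
const transform = (obj) => ({ ...obj, doubled: obj.count * 2 });

const steps = [
  { name: 'parse', fn: parse },
  { name: 'validate', fn: validate },
  { name: 'transform', fn: transform },
];

// Run each step on its own, feeding it the previous step's witnessed output.
// Stops at the first failing step so nothing downstream is attempted.
function runStepwise(input) {
  const witnessed = [];
  let current = input;
  for (const { name, fn } of steps) {
    try {
      current = fn(current);
      witnessed.push({ name, status: 'KNOWN', output: current });
    } catch (err) {
      witnessed.push({ name, status: 'UNKNOWN', error: err.message });
      break;
    }
  }
  return witnessed;
}

// Run adjacent pairs (1+2, 2+3, ...) and compare each handoff result
// against the output witnessed during the isolated pass.
function runPairs(input, witnessed) {
  const inputs = [input, ...witnessed.map((w) => w.output)];
  const results = [];
  for (let i = 0; i + 1 < steps.length; i++) {
    const out = steps[i + 1].fn(steps[i].fn(inputs[i]));
    results.push({
      pair: `${steps[i].name}+${steps[i + 1].name}`,
      pass: JSON.stringify(out) === JSON.stringify(witnessed[i + 1].output),
    });
  }
  return results;
}

const raw = '{"count": 21}';
const witnessed = runStepwise(raw);
const allKnown =
  witnessed.length === steps.length && witnessed.every((w) => w.status === 'KNOWN');
const pairs = allKnown ? runPairs(raw, witnessed) : [];

// The full chain runs only after every step and every pair has passed.
const fullChain =
  allKnown && pairs.every((p) => p.pass)
    ? steps.reduce((value, { fn }) => fn(value), raw)
    : undefined;

console.log({ witnessed, pairs, fullChain });
```

Calling `runStepwise('{"count": "x"}')` stops at `validate` and records it as `UNKNOWN`, which corresponds to rule 7: fix that step and rerun from there.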
  Decomposition rules:
  - Identify every distinct operation in the chain (input validation, API call, response parsing, state update, side effect, render)
  - Test stateless operations in isolation first — they have no dependencies and confirm pure logic
@@ -83,34 +93,62 @@ Decomposition rules:
  - Bundle every confirmation that shares an assertion target into one run — same variable, same API call, same file = same run
  - Unrelated assertion targets = separate runs
 
+ **IMPORT-BASED EXECUTION**: Always test real codebase code, never reimplementations.
+ - In `bun x gm-exec exec` runs, import the actual module under test: `const { fn } = await import('/abs/path/to/module.js')`
+ - Call the real function with real inputs. Witness real output. This IS the ground truth.
+ - Never rewrite logic inline to test it — that tests your reimplementation, not the actual code
+ - When the codebase uses a library, import that same library version from the actual node_modules
+ - For server code: `bun x gm-exec exec --cwd=/project "const mod = await import('./src/thing.js'); console.log(await mod.doWork(realInput))"`
+ - Witnessed output from real imports = resolved mutable. Reimplemented output = UNKNOWN mutable.
+
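Reviewer note: a minimal sketch of the import-based pattern added above. To stay self-contained it first writes a hypothetical module, `slugify.mjs`, to a temp directory as a stand-in for a file that would already exist in the codebase. The module, its `slugify` export, and the `witness` helper are assumptions made for illustration. `pathToFileURL` is added here for portability across platforms and is not part of the diff.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// Stand-in for a real codebase file. In practice the module already exists
// on disk and the test does NOT write it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'gm-import-'));
const modulePath = path.join(dir, 'slugify.mjs');
fs.writeFileSync(
  modulePath,
  "export const slugify = (s) => s.trim().toLowerCase().replace(/\\s+/g, '-');\n"
);

async function witness() {
  try {
    // Import the on-disk module by absolute path and call the real export,
    // rather than retyping its logic inline.
    const { slugify } = await import(pathToFileURL(modulePath).href);
    return slugify('  Hello Real World ');
  } finally {
    // Remove the stand-in module so the run leaves nothing behind.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

const witnessed = witness();
witnessed.then((out) => console.log('witnessed:', out));
```

The value logged comes from the code on disk, so it reflects what the module actually does, not what a retyped copy would do.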
+ **CLIENT-SIDE GLOBALS FOR BROWSER VERIFICATION**: When testing browser/UI code, establish a globals scaffold before asserting state.
+ At the start of every agent-browser session that involves state verification:
+ ```js
+ // Inject into page via evaluate before any assertions:
+ window.__gm = {
+   captures: [],
+   log: (...args) => window.__gm.captures.push({t: Date.now(), args}),
+   assert: (label, cond) => { window.__gm.captures.push({label, pass: !!cond, val: cond}); return !!cond; },
+   dump: () => JSON.stringify(window.__gm.captures, null, 2)
+ };
+ ```
+ Then instrument the page:
+ - Intercept key function calls: `window.originalFn = window.targetFn; window.targetFn = (...a) => { window.__gm.log('targetFn', a); return window.originalFn(...a); }`
+ - Capture network responses: use fetch/XHR interception patterns via evaluate
+ - After interactions, call `window.__gm.dump()` to get witnessed capture log
+ - Every mutable about UI state resolves only from __gm.captures, not from visual inspection or assumption
+
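Reviewer note: the scaffold and the interception pattern above might be combined along these lines. This is a sketch, not the package's own code: `addToCart` is a hypothetical page function, and the `window` alias on the first line exists only so the snippet also runs outside a browser.

```javascript
// Outside a browser (for example a plain Node run), alias window so the sketch executes.
if (typeof window === 'undefined') globalThis.window = globalThis;

// Scaffold as given in the added section.
window.__gm = {
  captures: [],
  log: (...args) => window.__gm.captures.push({ t: Date.now(), args }),
  assert: (label, cond) => {
    window.__gm.captures.push({ label, pass: !!cond, val: cond });
    return !!cond;
  },
  dump: () => JSON.stringify(window.__gm.captures, null, 2),
};

// Hypothetical page function standing in for real application code.
window.addToCart = (sku, qty) => ({ sku, qty, total: qty * 5 });

// Wrap the target and keep a reference to the original so behaviour is unchanged.
const originalAddToCart = window.addToCart;
window.addToCart = (...a) => {
  window.__gm.log('addToCart', a);
  return originalAddToCart(...a);
};

// An interaction followed by an assertion; both land in the capture log.
const result = window.addToCart('sku-1', 3);
window.__gm.assert('total is qty * 5', result.total === 15);

console.log(window.__gm.dump());
```

Note that an arrow-function wrapper does not forward `this`, so a method that depends on its receiver would need a `function` wrapper with `apply` instead.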
  Tool selection per operation type:
- - Pure logic (parse, validate, transform, calculate): `bun x gm-exec` — no DOM needed
- - API call + response + error handling (node): `bun x gm-exec` — test all three in one run
- - State mutation + downstream state effect: `bun x gm-exec` — test mutation and effect together
- - DOM rendering, visual state, layout: `agent-browser` skill requires real DOM
+ - Pure logic (parse, validate, transform, calculate): `bun x gm-exec` with real imports — no DOM needed
+ - API call + response + error handling (node): `bun x gm-exec` with real module imports — test all three in one run
+ - State mutation + downstream state effect: `bun x gm-exec` — test mutation and effect together using real code
+ - DOM rendering, visual state, layout: `agent-browser` skill with __gm globals injected
  - User interaction (click, type, submit, navigate): `agent-browser` skill — requires real events
- - State mutation visible on DOM: `agent-browser` skill — test both mutation and DOM effect in one session
- - Error path on UI (spinner, toast, retry): `agent-browser` skill — test full visible error flow
+ - State mutation visible on DOM: `agent-browser` skill with __gm captures — test both mutation and DOM effect
+ - Error path on UI (spinner, toast, retry): `agent-browser` skill — test full visible error flow with __gm.assert
 
  PRE-EMIT-TEST (before editing any file):
- 1. Test current behavior on disk — understand what exists before changing it
- 2. Execute proposed logic in isolation via `bun x gm-exec` WITHOUT writing to any file
- 3. Confirm proposed approach produces correct output
- 4. Test failure paths of proposed approach
- 5. All mutables must resolve to KNOWN before EMIT phase opens
+ 1. Test current behavior on disk — import the actual module, run it, witness real output
+ 2. Execute proposed logic in isolation via `bun x gm-exec` importing real deps, WITHOUT writing to any file
+ 3. Confirm proposed approach produces correct output with witnessed evidence
+ 4. Test failure paths of proposed approach with real error inputs
+ 5. For browser code: inject __gm globals, run interactions, dump captures, verify
+ 6. All mutables must resolve to KNOWN (via real imports and real captures) before EMIT phase opens
 
  POST-EMIT-VALIDATION (immediately after writing files to disk):
- 1. Load the actual modified file from disk — not the in-memory version
- 2. Execute against real inputs with `bun x gm-exec` or `agent-browser` skill
- 3. Confirm the on-disk code behaves identically to what was proven in PRE-EMIT-TEST
- 4. Test all scenarios again on the real disk file success, failure, edge cases
+ 1. Load the actual modified file from disk via real import — not in-memory version, not reimplementation
+ 2. Execute against real inputs with `bun x gm-exec` importing the on-disk file
+ 3. Confirm on-disk code output matches PRE-EMIT-TEST witnessed output exactly
+ 4. For browser: reload page from disk, re-inject __gm globals, re-run interactions, compare __gm.captures
  5. Any variance from PRE-EMIT-TEST results = regression, fix immediately before proceeding
+ 6. Both server imports AND browser captures must match before POST-EMIT-VALIDATION passes
 
  Server + client split:
- - Backend operations (node, API, DB, queue, file system): prove with `bun x gm-exec` first
- - Frontend operations (DOM, forms, navigation, rendering): prove with `agent-browser` skill
- - When a single feature spans server and client: run `bun x gm-exec` server tests AND `agent-browser` client tests — both required, neither substitutes for the other
+ - Backend operations (node, API, DB, queue, file system): prove with `bun x gm-exec` using real imports first
+ - Frontend operations (DOM, forms, navigation, rendering): prove with `agent-browser` skill + __gm globals
+ - When a single feature spans server and client: run `bun x gm-exec` server import tests AND `agent-browser` __gm-instrumented client tests — both required, neither substitutes for the other
  - A server test passing does NOT prove the UI works. A browser test passing does NOT prove the backend handles edge cases.
+ - Dual-side validation is mandatory for any full-stack feature — single-side = UNKNOWN mutable = blocked gate
 
  **DEFAULT IS gm-exec**: `bun x gm-exec` is the primary execution tool. Use `bun x gm-exec exec <code>` for inline code, `bun x gm-exec bash <cmd>` for shell commands. Git is the only other allowed Bash command.
 
@@ -155,6 +193,12 @@ Server + client split:
  - `bun x codebasesearch <query>` — semantic code search (bash fallback for `code-search` skill; use skill first)
  - Everything else is blocked
 
+ **gm-exec EXEC SAFETY RULES** — prevent stray files and working directory pollution:
+ - NEVER run `bun x gm-exec exec` without `--cwd` pointing to a safe scratch directory, not the project root. Use `--cwd=/tmp` or `--cwd=C:/Windows/Temp` for throwaway runs. Only use `--cwd=<project>` when the code explicitly needs to import from that project.
+ - For any code longer than a single expression, use `--file=<path>` instead of inline `<code>`. Write the code to a temp file first via `bun x gm-exec exec "require('fs').writeFileSync('/tmp/run.mjs', \`...\`)"` then run `bun x gm-exec exec --file=/tmp/run.mjs`. This prevents shell quoting failures from leaking code fragments as filenames in the working directory.
+ - Single-line inline code is safe only when it contains no shell metacharacters (backticks, quotes, parens, brackets). If in doubt, use `--file`.
+ - After any exec session, verify no stray files were created: `bun x gm-exec bash --cwd=<project> "git status --porcelain"` must be empty. If stray files appear, delete them before proceeding.
+
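Reviewer note: the idea behind the `--file` rule (write multi-line code to a scratch file, run the file, then confirm nothing stray was left behind) can be sketched as below. This does not invoke `gm-exec`. As an assumption made for illustration, `process.execPath` stands in for the runner, and the scratch directory name and the `leftovers` check are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// A scratch directory keeps throwaway files out of the project root.
const scratch = fs.mkdtempSync(path.join(os.tmpdir(), 'gm-scratch-'));
const scriptPath = path.join(scratch, 'run.mjs');

// Code containing quotes, backticks, parens and brackets:
// exactly the characters that are risky to pass inline through a shell.
const code = [
  'const label = `it\'s (quoted) [safely]`;',
  'console.log(label.length);',
].join('\n');
fs.writeFileSync(scriptPath, code);

// execFileSync takes arguments as an array, so no shell ever parses the code or the path.
const output = execFileSync(process.execPath, [scriptPath], {
  cwd: scratch,
  encoding: 'utf8',
}).trim();

// Witness that the run created nothing besides the script itself.
const leftovers = fs.readdirSync(scratch).filter((name) => name !== 'run.mjs');
console.log({ output, leftovers });

// Clean up the scratch directory once the result has been witnessed.
fs.rmSync(scratch, { recursive: true, force: true });
```

For a project directory, the `git status --porcelain` check named in the rule plays the role that the `leftovers` listing plays here.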
  ## CHARTER 3: GROUND TRUTH
 
  Scope: Data integrity and testing methodology. Governs what constitutes valid evidence.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "gm-kilo",
- "version": "2.0.150",
+ "version": "2.0.152",
  "description": "State machine agent with hooks, skills, and automated git enforcement",
  "author": "AnEntrypoint",
  "license": "MIT",