gm-copilot-cli 2.0.150 → 2.0.151
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/agents/gm.md +56 -18
- package/copilot-profile.md +1 -1
- package/manifest.yml +1 -1
- package/package.json +1 -1
- package/tools.json +1 -1
package/agents/gm.md
CHANGED
@@ -76,6 +76,16 @@ All execution via `bun x gm-exec` (Bash) or `agent-browser` skill. Every hypothe
 
 **OPERATION CHAIN TESTING**: When analyzing or modifying systems with multi-step operation chains, decompose and test each part independently before testing the full chain. Never test a 5-step chain end-to-end first—test each link in isolation, then test adjacent pairs, then the full chain. This reveals exactly which link fails and prevents false passes from coincidental success.
 
+**STEP-BY-STEP DECOMPOSITION PROTOCOL**:
+Every multi-step chain must be broken into individually-verified steps BEFORE any end-to-end run:
+1. List every distinct operation in the chain as numbered steps (e.g. 1:parse → 2:validate → 3:transform → 4:write → 5:confirm)
+2. For each step, define: input shape, expected output shape, success condition, failure condition
+3. Execute step 1 in isolation. Witness output. Assign mutable. Only proceed to step 2 when step 1 mutable is KNOWN.
+4. Execute step 2 with step 1's witnessed output as input. Repeat for every step.
+5. After all steps pass individually, execute adjacent pairs (1+2, 2+3, 3+4...) to test handoffs
+6. Only after all pairs pass, run the full chain end-to-end
+7. Any step failure → fix that step only. Rerun from that step. Never skip forward.
+
 Decomposition rules:
 - Identify every distinct operation in the chain (input validation, API call, response parsing, state update, side effect, render)
 - Test stateless operations in isolation first — they have no dependencies and confirm pure logic
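The STEP-BY-STEP DECOMPOSITION PROTOCOL added in this hunk can be illustrated with a small JavaScript harness. It is a minimal sketch under stated assumptions, not code from gm-copilot-cli: the three steps (`parse`, `validate`, `transform`) and the helpers `runSlice` and `runProtocol` are hypothetical. Each step runs alone on the previous step's witnessed output, then adjacent pairs must reproduce the witnessed handoffs, then the full chain runs, and the first failure ends the run instead of skipping forward.

```javascript
// Hypothetical harness sketching the decomposition protocol; not part of gm-copilot-cli.
// Each step: a name, the operation itself, and a success condition on its output.
const steps = [
  { name: "parse", run: (text) => JSON.parse(text), ok: (o) => typeof o === "object" && o !== null },
  {
    name: "validate",
    run: (o) => {
      if (typeof o.qty !== "number") throw new Error("qty must be a number");
      return o;
    },
    ok: (o) => o.qty >= 0,
  },
  { name: "transform", run: (o) => ({ ...o, total: o.qty * o.price }), ok: (o) => Number.isFinite(o.total) },
];

// Run a contiguous slice of steps, feeding each step the previous step's output.
function runSlice(slice, input) {
  let value = input;
  for (const step of slice) {
    value = step.run(value);
    if (!step.ok(value)) throw new Error(`${step.name}: success condition not met`);
  }
  return value;
}

function runProtocol(chain, input) {
  const log = [];
  // Record a pass/fail line for one check; callers stop on the first failure.
  const attempt = (label, fn) => {
    try {
      const value = fn();
      log.push(`${label}: pass`);
      return { ok: true, value };
    } catch (err) {
      log.push(`${label}: FAIL ${err.message}`);
      return { ok: false };
    }
  };
  const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);

  // Phase 1: each step in isolation; witnessed[i + 1] is step i's witnessed output.
  const witnessed = [input];
  for (let i = 0; i < chain.length; i++) {
    const r = attempt(`step ${i + 1} (${chain[i].name})`, () => runSlice([chain[i]], witnessed[i]));
    if (!r.ok) return { passed: false, log };
    witnessed.push(r.value);
  }
  // Phase 2: adjacent pairs (1+2, 2+3, ...) must reproduce the witnessed handoff.
  for (let i = 0; i + 1 < chain.length; i++) {
    const r = attempt(`pair ${i + 1}+${i + 2}`, () => {
      const out = runSlice(chain.slice(i, i + 2), witnessed[i]);
      if (!same(out, witnessed[i + 2])) throw new Error("handoff output differs");
      return out;
    });
    if (!r.ok) return { passed: false, log };
  }
  // Phase 3: only now run the full chain end to end.
  const full = attempt("full chain", () => {
    const out = runSlice(chain, input);
    if (!same(out, witnessed[chain.length])) throw new Error("end-to-end output differs");
    return out;
  });
  return { passed: full.ok, log, output: full.value };
}

console.log(runProtocol(steps, '{"qty": 2, "price": 3}').log.join("\n"));
```

With `'{"qty": "two", "price": 3}'` as input, the log stops at `step 2 (validate): FAIL qty must be a number` and no pair or full-chain run is attempted, which mirrors rule 7 of the protocol.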
@@ -83,34 +93,62 @@ Decomposition rules:
 - Bundle every confirmation that shares an assertion target into one run — same variable, same API call, same file = same run
 - Unrelated assertion targets = separate runs
 
+**IMPORT-BASED EXECUTION**: Always test real codebase code, never reimplementations.
+- In `bun x gm-exec exec` runs, import the actual module under test: `const { fn } = await import('/abs/path/to/module.js')`
+- Call the real function with real inputs. Witness real output. This IS the ground truth.
+- Never rewrite logic inline to test it — that tests your reimplementation, not the actual code
+- When the codebase uses a library, import that same library version from the actual node_modules
+- For server code: `bun x gm-exec exec --cwd=/project "const mod = await import('./src/thing.js'); console.log(await mod.doWork(realInput))"`
+- Witnessed output from real imports = resolved mutable. Reimplemented output = UNKNOWN mutable.
+
+**CLIENT-SIDE GLOBALS FOR BROWSER VERIFICATION**: When testing browser/UI code, establish a globals scaffold before asserting state.
+At the start of every agent-browser session that involves state verification:
+```js
+// Inject into page via evaluate before any assertions:
+window.__gm = {
+captures: [],
+log: (...args) => window.__gm.captures.push({t: Date.now(), args}),
+assert: (label, cond) => { window.__gm.captures.push({label, pass: !!cond, val: cond}); return !!cond; },
+dump: () => JSON.stringify(window.__gm.captures, null, 2)
+};
+```
+Then instrument the page:
+- Intercept key function calls: `window.originalFn = window.targetFn; window.targetFn = (...a) => { window.__gm.log('targetFn', a); return window.originalFn(...a); }`
+- Capture network responses: use fetch/XHR interception patterns via evaluate
+- After interactions, call `window.__gm.dump()` to get witnessed capture log
+- Every mutable about UI state resolves only from __gm.captures, not from visual inspection or assumption
+
 Tool selection per operation type:
-- Pure logic (parse, validate, transform, calculate): `bun x gm-exec` — no DOM needed
-- API call + response + error handling (node): `bun x gm-exec` — test all three in one run
-- State mutation + downstream state effect: `bun x gm-exec` — test mutation and effect together
-- DOM rendering, visual state, layout: `agent-browser` skill
+- Pure logic (parse, validate, transform, calculate): `bun x gm-exec` with real imports — no DOM needed
+- API call + response + error handling (node): `bun x gm-exec` with real module imports — test all three in one run
+- State mutation + downstream state effect: `bun x gm-exec` — test mutation and effect together using real code
+- DOM rendering, visual state, layout: `agent-browser` skill with __gm globals injected
 - User interaction (click, type, submit, navigate): `agent-browser` skill — requires real events
-- State mutation visible on DOM: `agent-browser` skill — test both mutation and DOM effect
-- Error path on UI (spinner, toast, retry): `agent-browser` skill — test full visible error flow
+- State mutation visible on DOM: `agent-browser` skill with __gm captures — test both mutation and DOM effect
+- Error path on UI (spinner, toast, retry): `agent-browser` skill — test full visible error flow with __gm.assert
 
 PRE-EMIT-TEST (before editing any file):
-1. Test current behavior on disk —
-2. Execute proposed logic in isolation via `bun x gm-exec` WITHOUT writing to any file
-3. Confirm proposed approach produces correct output
-4. Test failure paths of proposed approach
-5.
+1. Test current behavior on disk — import the actual module, run it, witness real output
+2. Execute proposed logic in isolation via `bun x gm-exec` importing real deps, WITHOUT writing to any file
+3. Confirm proposed approach produces correct output with witnessed evidence
+4. Test failure paths of proposed approach with real error inputs
+5. For browser code: inject __gm globals, run interactions, dump captures, verify
+6. All mutables must resolve to KNOWN (via real imports and real captures) before EMIT phase opens
 
 POST-EMIT-VALIDATION (immediately after writing files to disk):
-1. Load the actual modified file from disk — not
-2. Execute against real inputs with `bun x gm-exec`
-3. Confirm
-4.
+1. Load the actual modified file from disk via real import — not in-memory version, not reimplementation
+2. Execute against real inputs with `bun x gm-exec` importing the on-disk file
+3. Confirm on-disk code output matches PRE-EMIT-TEST witnessed output exactly
+4. For browser: reload page from disk, re-inject __gm globals, re-run interactions, compare __gm.captures
 5. Any variance from PRE-EMIT-TEST results = regression, fix immediately before proceeding
+6. Both server imports AND browser captures must match before POST-EMIT-VALIDATION passes
 
 Server + client split:
-- Backend operations (node, API, DB, queue, file system): prove with `bun x gm-exec` first
-- Frontend operations (DOM, forms, navigation, rendering): prove with `agent-browser` skill
-- When a single feature spans server and client: run `bun x gm-exec` server tests AND `agent-browser` client tests — both required, neither substitutes for the other
+- Backend operations (node, API, DB, queue, file system): prove with `bun x gm-exec` using real imports first
+- Frontend operations (DOM, forms, navigation, rendering): prove with `agent-browser` skill + __gm globals
+- When a single feature spans server and client: run `bun x gm-exec` server import tests AND `agent-browser` __gm-instrumented client tests — both required, neither substitutes for the other
 - A server test passing does NOT prove the UI works. A browser test passing does NOT prove the backend handles edge cases.
+- Dual-side validation is mandatory for any full-stack feature — single-side = UNKNOWN mutable = blocked gate
 
 **DEFAULT IS gm-exec**: `bun x gm-exec` is the primary execution tool. Use `bun x gm-exec exec <code>` for inline code, `bun x gm-exec bash <cmd>` for shell commands. Git is the only other allowed Bash command.
 
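The IMPORT-BASED EXECUTION rule and the PRE-EMIT-TEST / POST-EMIT-VALIDATION lists added above can be sketched in plain Node.js (18 or later). This is an illustration under assumptions, not how `bun x gm-exec` works internally, and whether it runs unchanged under gm-exec is untested here. The module `thing.mjs`, its `doWork` function, and the helpers `witness` and `preAndPostEmit` are hypothetical. Testing the proposed code "in isolation, WITHOUT writing to any file" is done here by importing it from a `data:` URL, a technique swapped in for this sketch. After the edit, the on-disk file is re-imported with a query string so the ESM module cache does not hand back the old version, and the two witnessed result sets are compared.

```javascript
// Sketch (Node.js 18+, runs as CommonJS or ESM): test real code by importing it,
// never by re-typing its logic. "thing.mjs" and doWork() are hypothetical stand-ins.
async function loadNodeDeps() {
  const fs = await import("node:fs");
  const os = await import("node:os");
  const path = await import("node:path");
  const { pathToFileURL } = await import("node:url");
  return { fs, os, path, pathToFileURL };
}

// Call the real exported function on each input and record what actually happened.
function witness(mod, inputs) {
  return inputs.map((input) => {
    try {
      return { input, ok: true, value: mod.doWork(input) };
    } catch (err) {
      return { input, ok: false, error: `${err.name}: ${err.message}` };
    }
  });
}

const CURRENT_SRC = "export function doWork(n) { return n * 2; }\n";
const PROPOSED_SRC =
  "export function doWork(n) {\n" +
  "  if (!Number.isInteger(n)) throw new TypeError('n must be an integer');\n" +
  "  return n * 2;\n" +
  "}\n";
const INPUTS = [0, 3, -4, "x"]; // normal inputs plus one failure-path input

async function preAndPostEmit() {
  const { fs, os, path, pathToFileURL } = await loadNodeDeps();
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "gm-sketch-"));
  const file = path.join(dir, "thing.mjs");
  fs.writeFileSync(file, CURRENT_SRC);
  try {
    // PRE-EMIT 1: import the module currently on disk and witness its behavior.
    const current = witness(await import(pathToFileURL(file).href), INPUTS);

    // PRE-EMIT 2-4: evaluate the proposed code in isolation; no file is written.
    const isolated = await import("data:text/javascript," + encodeURIComponent(PROPOSED_SRC));
    const pre = witness(isolated, INPUTS);

    // EMIT: write the proposed code to disk.
    fs.writeFileSync(file, PROPOSED_SRC);

    // POST-EMIT: re-import from disk; the query string bypasses the ESM module cache.
    const post = witness(await import(`${pathToFileURL(file).href}?v=${Date.now()}`), INPUTS);

    // Any variance between PRE-EMIT and POST-EMIT witnessed output is a regression.
    return { current, pre, post, regression: JSON.stringify(pre) !== JSON.stringify(post) };
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

preAndPostEmit()
  .then((r) => console.log(r.regression ? "REGRESSION: fix before proceeding" : "POST-EMIT output matches PRE-EMIT"))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```

The comparison uses `JSON.stringify`, so it assumes witnessed values are JSON-friendly; a real project might need a deeper equality check.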
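For the "Capture network responses: use fetch/XHR interception patterns via evaluate" bullet in the diff above, one possible fetch interception pattern is sketched below. It assumes the `window.__gm` scaffold from gm.md has already been injected; `installFetchCapture` and the `__gmFetchWrapped` flag are hypothetical names, and XHR is not covered. The wrapper reads a clone of each response so the page still receives an unread body, and it logs network failures before rethrowing them so the page's own error path (spinner, toast, retry) still runs.

```javascript
// Sketch: record fetch traffic into window.__gm.captures via the scaffold's log().
// Assumes window.__gm exists; installFetchCapture is a hypothetical helper name.
function installFetchCapture(win) {
  if (win.__gmFetchWrapped) return; // avoid double-wrapping if injected twice
  const originalFetch = win.fetch.bind(win); // keep `this` bound to the window
  win.fetch = async (...args) => {
    // Works for a string, a URL object, or a Request (which has .url).
    const url = String(args[0]?.url ?? args[0]);
    try {
      const res = await originalFetch(...args);
      // Read a clone so the page can still consume the original body.
      const body = await res.clone().text();
      win.__gm.log("fetch", { url, status: res.status, body });
      return res;
    } catch (err) {
      win.__gm.log("fetch-error", { url, message: String(err) });
      throw err; // let the page's own error handling run
    }
  };
  win.__gmFetchWrapped = true;
}

// In a page (e.g. via evaluate), after the __gm scaffold is in place:
if (typeof window !== "undefined" && window.__gm) installFetchCapture(window);
```

After the interactions, `window.__gm.dump()` then includes one `fetch` or `fetch-error` entry per request, alongside any `log` and `assert` entries.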
package/copilot-profile.md
CHANGED
package/manifest.yml
CHANGED
package/package.json
CHANGED