gm-oc 2.0.176 → 2.0.178
- package/agents/gm.md +16 -6
- package/hooks/pre-tool-use-hook.js +31 -20
- package/package.json +1 -1
package/agents/gm.md
CHANGED
@@ -55,7 +55,7 @@ exec:<lang>
 - `exec:go`, `exec:rust`, `exec:c`, `exec:cpp`, `exec:java`, `exec:deno` — compiled langs
 - Set the `cwd` field on the Bash tool input for working directory
 
-**`agent-browser` skill** — Browser automation.
+**`agent-browser` skill** — Browser automation. Use ONLY when code execution cannot answer the question. `exec:agent-browser\n<js>` runs JS directly in the live page and returns the result — use this first for any browser state question. Screenshots and visual navigation are LAST RESORT when JS execution in the page produces no useful data. Replaces puppeteer/playwright entirely. Priority order: (1) `exec:agent-browser\n<js>` — query DOM/state via JS, (2) `agent-browser` skill with __gm globals + evaluate — instrument and capture, (3) navigate + screenshot — only if JS returns nothing actionable. Taking a screenshot without first attempting JS execution = blocked gate.
 
 **`code-search` skill** — Semantic codebase exploration. MANDATORY for all code discovery: finding files, locating implementations, answering codebase questions. Natural language queries return ranked results with line numbers. Glob/Grep/Read-for-discovery are blocked. code-search is the only exploration path.
 
@@ -131,15 +131,25 @@ Then instrument the page:
 - After interactions, call `window.__gm.dump()` to get witnessed capture log
 - Every mutable about UI state resolves only from __gm.captures, not from visual inspection or assumption
 
+**BROWSER TESTING HIERARCHY** — always exhaust lower tiers before escalating:
+1. `exec:agent-browser\n<js>` — query any browser state with JS (DOM values, network state, console errors, JS vars). Returns data directly. Zero navigation needed. USE THIS FIRST for any troubleshooting.
+2. `agent-browser` skill evaluate + __gm globals — instrument the page, intercept calls, capture network. Use when step 1 returns insufficient context.
+3. `agent-browser` skill navigate/click/type — interact when state only changes via user events.
+4. `agent-browser` skill screenshot — LAST RESORT only. Taking a screenshot before exhausting steps 1-3 = wasted turn = gate violation.
+
+For troubleshooting: test each part of the chain independently with JS execution before any navigation. Never use browse-and-screenshot as a diagnostic strategy.
+
 Tool selection per operation type:
 - Pure logic (parse, validate, transform, calculate): `exec:nodejs` with real imports — no DOM needed
 - API call + response + error handling (node): `exec:nodejs` with real module imports — test all three in one run
 - State mutation + downstream state effect: `exec:nodejs` — test mutation and effect together using real code
 - Shell commands, file system ops, git: `exec:bash` — multi-line shell supported
-- DOM
--
--
--
+- DOM state, JS variables, network responses: `exec:agent-browser\n<js>` — query directly, no navigation
+- DOM rendering, visual state, layout: `agent-browser` skill evaluate with __gm globals — only after JS query fails
+- User interaction (click, type, submit, navigate): `agent-browser` skill — only when state requires real events
+- State mutation visible on DOM: `agent-browser` skill with __gm captures — test mutation and DOM effect together
+- Error path on UI (spinner, toast, retry): `agent-browser` skill with __gm.assert — full visible error flow
+- Screenshots: absolute last resort — only when all JS execution approaches exhausted
 
 PRE-EMIT-TEST (before editing any file):
 1. Test current behavior on disk — use `exec:nodejs` to import the actual module, witness real output
@@ -491,7 +501,7 @@ When constraints conflict:
 
 No policy conflict is preserved. Every conflict is resolved at the moment it is spotted.
 
-**Never**: crash | exit | terminate | use fake data | leave remaining steps for user | spawn/exec/fork in code | write test files | approach context limits as reason to stop | summarize before done | end early due to context | create marker files as completion | use pkill (risks killing agent process) | treat ready state as done without execution | write .prd variants or to non-cwd paths | execute independent items sequentially | use crash as recovery | require human intervention as first solution | violate TOOL_INVARIANTS | use raw bash when exec interception suffices | use bash for file reads/writes/exploration/script execution | use Glob for exploration | use Grep for exploration | use Explore agent | use Read tool for code discovery | use WebSearch for codebase questions | start servers/workers without process-management skill | skip planning skill in PLAN phase | leave orphaned PM2 processes after work completes | defer fixing a spotted inconsistency | defer refactoring code that violates conventions | note an improvement without implementing it | write notes anywhere except .prd (temporary) or CLAUDE.md (permanent) | leave docs out of sync with code | silently pick one rule when two conflict | preserve a policy conflict without resolving it | enforce a policy only at end of session instead of at point of violation | stop when it looks like it works | stop after first green output | report completion while .prd items remain | treat partial success as completion | skip edge cases after main path succeeds | leave any item unwitnessed and claim it complete
+**Never**: crash | exit | terminate | use fake data | leave remaining steps for user | spawn/exec/fork in code | write test files | approach context limits as reason to stop | summarize before done | end early due to context | create marker files as completion | use pkill (risks killing agent process) | treat ready state as done without execution | write .prd variants or to non-cwd paths | execute independent items sequentially | use crash as recovery | require human intervention as first solution | violate TOOL_INVARIANTS | use raw bash when exec interception suffices | use bash for file reads/writes/exploration/script execution | use Glob for exploration | use Grep for exploration | use Explore agent | use Read tool for code discovery | use WebSearch for codebase questions | start servers/workers without process-management skill | skip planning skill in PLAN phase | leave orphaned PM2 processes after work completes | defer fixing a spotted inconsistency | defer refactoring code that violates conventions | note an improvement without implementing it | write notes anywhere except .prd (temporary) or CLAUDE.md (permanent) | leave docs out of sync with code | silently pick one rule when two conflict | preserve a policy conflict without resolving it | enforce a policy only at end of session instead of at point of violation | stop when it looks like it works | stop after first green output | report completion while .prd items remain | treat partial success as completion | skip edge cases after main path succeeds | leave any item unwitnessed and claim it complete | take a screenshot before attempting exec:agent-browser JS execution | use browse-and-screenshot as a diagnostic strategy | skip JS execution steps when troubleshooting browser issues
 
 **Always**: execute via `exec:<lang>` interception or `agent-browser` skill | delete mocks on discovery | expose debug hooks | keep files under 200 lines | use ground truth | verify by witnessed execution | complete fully with real data | recover from failures | systems survive forever by design | checkpoint state continuously | contain all promises | maintain supervisors for all components | fix inconsistencies immediately when spotted | restructure code immediately when convention violation found | implement logical improvements immediately when identified | reconcile docs and code before emitting | resolve policy conflicts at the moment they are spotted | ask "what else?" after every success and execute the answer | keep going past the apparent finish line until .prd is empty and git is clean | be the agent that delivers results the user only needs to read
 
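The browser testing hierarchy added to gm.md above reads as an escalation rule: start at tier 1 and move up only when the lower tier cannot answer the question. The following is a minimal sketch of that rule, not code from the package. The function name `pickBrowserTier` and its three boolean flags are hypothetical, introduced only to illustrate the ordering.

```javascript
// Hypothetical illustration of the escalation order described in gm.md.
// Tier labels paraphrase the diff; the flags are assumptions, not package API.
const TIERS = [
  'exec:agent-browser JS query',
  'agent-browser evaluate + __gm globals',
  'agent-browser navigate/click/type',
  'agent-browser screenshot',
];

function pickBrowserTier({
  jsQueryInsufficient = false, // tier 1 ran but returned too little context
  needsUserEvents = false,     // state only changes via real user events
  lowerTiersExhausted = false, // tiers 1-3 were all tried without success
} = {}) {
  if (lowerTiersExhausted) return { tier: 4, action: TIERS[3] };
  if (needsUserEvents) return { tier: 3, action: TIERS[2] };
  if (jsQueryInsufficient) return { tier: 2, action: TIERS[1] };
  // Default: always begin with a direct JS query in the live page.
  return { tier: 1, action: TIERS[0] };
}

console.log(pickBrowserTier());
console.log(pickBrowserTier({ jsQueryInsufficient: true }));
```

With no flags set the sketch always returns tier 1, which mirrors the "USE THIS FIRST" wording in the diff. A screenshot is only reachable once the caller states that every lower tier has been tried.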
package/hooks/pre-tool-use-hook.js
CHANGED
@@ -11,16 +11,31 @@ const writeTools = ['Write', 'write_file'];
 const searchTools = ['glob', 'search_file_content', 'Search', 'search'];
 const forbiddenTools = ['find', 'Find', 'Glob', 'Grep'];
 
+const allow = (additionalContext) => ({
+  hookSpecificOutput: { hookEventName: 'PreToolUse', permissionDecision: 'allow', ...(additionalContext && { additionalContext }) }
+});
+const deny = (reason) => isGemini
+  ? { decision: 'deny', reason }
+  : { hookSpecificOutput: { hookEventName: 'PreToolUse', permissionDecision: 'deny', permissionDecisionReason: reason } };
+const allowWithNoop = (context) => ({
+  hookSpecificOutput: {
+    hookEventName: 'PreToolUse',
+    permissionDecision: 'allow',
+    additionalContext: context,
+    updatedInput: { command: 'echo ""' }
+  }
+});
+
 const run = () => {
   try {
     const input = fs.readFileSync(0, 'utf-8');
     const data = JSON.parse(input);
     const { tool_name, tool_input } = data;
 
-    if (!tool_name) return
+    if (!tool_name) return allow();
 
     if (forbiddenTools.includes(tool_name)) {
-      return
+      return deny('Use the code-search skill for codebase exploration instead of Grep/Glob/find. Describe what you need in plain language — it understands intent, not just patterns.');
     }
 
     if (writeTools.includes(tool_name)) {
@@ -30,7 +45,7 @@ const run = () => {
       const base = path.basename(file_path).toLowerCase();
       if ((ext === '.md' || ext === '.txt' || base.startsWith('features_list')) &&
           !base.startsWith('claude') && !base.startsWith('readme') && !inSkillsDir) {
-        return
+        return deny('Cannot create documentation files. Only CLAUDE.md and readme.md are maintained. For task-specific notes, use .prd. For permanent reference material, add to CLAUDE.md.');
       }
       if (/\.(test|spec)\.(js|ts|jsx|tsx|mjs|cjs)$/.test(base) ||
           /^(jest|vitest|mocha|ava|jasmine|tap)\.(config|setup)/.test(base) ||
@@ -38,24 +53,24 @@ const run = () => {
           file_path.includes('/tests/') || file_path.includes('/fixtures/') ||
           file_path.includes('/test-data/') || file_path.includes('/__mocks__/') ||
           /\.(snap|stub|mock|fixture)\.(js|ts|json)$/.test(base)) {
-        return
+        return deny('Test files forbidden on disk. Use Bash tool with real services for all testing.');
       }
     }
 
-    if (searchTools.includes(tool_name)) return
+    if (searchTools.includes(tool_name)) return allow();
 
     if (tool_name === 'Task' && (tool_input?.subagent_type || '') === 'Explore') {
-      return
+      return deny('Use gm:thorns-overview for codebase insight, then use gm:code-search');
     }
 
     if (tool_name === 'EnterPlanMode') {
-      return
+      return deny('Plan mode is disabled. Use GM agent planning (PLAN→EXECUTE→EMIT→VERIFY→COMPLETE state machine) via gm:gm subagent instead.');
     }
 
     if (tool_name === 'Skill') {
       const skill = (tool_input?.skill || '').toLowerCase().replace(/^gm:/, '');
       if (skill === 'explore' || skill === 'search') {
-        return
+        return deny('Use the code-search skill for codebase exploration. Describe what you need in plain language — it understands intent, not just patterns.');
       }
     }
 
@@ -66,7 +81,7 @@ const run = () => {
         const rawLang = (execMatch[1] || '').toLowerCase();
         const code = execMatch[2];
         if (/^\s*agent-browser\s/.test(code)) {
-          return
+          return deny(`Do not call agent-browser via exec:bash. Use exec:agent-browser instead:\n\nexec:agent-browser\n<plain JS here>\n\nThe code is piped directly to the browser eval. No base64, no flags, no shell wrapping.`);
         }
         const cwd = tool_input?.cwd;
         const detectLang = (src) => {
@@ -134,35 +149,31 @@ const run = () => {
           } else {
             result = runWithFile(lang, safeCode);
           }
-          return
+          return allowWithNoop(`exec:${lang} output:\n\n${result || '(no output)'}`);
         } catch (e) {
-          return
+          return allowWithNoop(`exec:${lang} error:\n\n${(e.stdout || '') + (e.stderr || '') || e.message || '(exec failed)'}`);
         }
       }
 
       if (!/^exec(\s|:)/.test(command) && !/^bun x gm-exec(@[^\s]*)?(\s|$)/.test(command) && !/^git /.test(command) && !/^bun x codebasesearch/.test(command) && !/(\bclaude\b)/.test(command) && !/^npm install .* \/config\/.gmweb/.test(command) && !/^bun install --cwd \/config\/.gmweb/.test(command)) {
         let helpText = '';
         try { helpText = '\n\n' + execSync('bun x gm-exec --help', { timeout: 10000 }).toString().trim(); } catch (e) {}
-        return
+        return deny(`Bash is restricted to exec:<lang> and git.\n\nexec:<lang> syntax (lang auto-detected if omitted):\n exec:nodejs / exec:python / exec:bash / exec:typescript\n exec:go / exec:rust / exec:java / exec:c / exec:cpp\n exec:agent-browser ← plain JS piped to browser eval (NO base64)\n exec ← auto-detects language\n\nNEVER encode agent-browser code as base64 — pass plain JS directly.\n\nbun x gm-exec${helpText}\n\nAll other Bash commands are blocked.`);
       }
     }
 
     const allowedTools = ['agent-browser', 'Skill', 'code-search', 'electron', 'TaskOutput', 'ReadMcpResourceTool', 'ListMcpResourcesTool'];
-    if (allowedTools.includes(tool_name)) return
+    if (allowedTools.includes(tool_name)) return allow();
 
-    return
+    return allow();
   } catch (error) {
-    return
+    return allow();
   }
 };
 
 try {
   const result = run();
-
-    console.log(JSON.stringify({ decision: isGemini ? 'deny' : 'block', reason: result.reason }));
-    process.exit(0);
-  }
-  if (isGemini) console.log(JSON.stringify({ decision: 'allow' }));
+  console.log(JSON.stringify(result));
   process.exit(0);
 } catch (error) {
   process.exit(0);
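After this change, every path through `run()` returns a response object, and the entry point prints it with `JSON.stringify(result)`. The sketch below copies the three helpers from the added lines to show the shapes they produce. `isGemini` is defined outside the hunks shown here, so the sketch passes it in through a hypothetical `makeHelpers` wrapper. That wrapper is an assumption made only to keep the example self-contained, not how the hook is structured.

```javascript
// Sketch only: helper bodies mirror the added lines in the diff.
// `makeHelpers` and its `isGemini` parameter are hypothetical; the real hook
// reads `isGemini` from a variable defined outside the visible hunks.
const makeHelpers = (isGemini) => ({
  // Plain allow; additionalContext is included only when it is truthy.
  allow: (additionalContext) => ({
    hookSpecificOutput: {
      hookEventName: 'PreToolUse',
      permissionDecision: 'allow',
      ...(additionalContext && { additionalContext }),
    },
  }),
  // Deny uses a flat shape when isGemini is set, a nested one otherwise.
  deny: (reason) => isGemini
    ? { decision: 'deny', reason }
    : { hookSpecificOutput: { hookEventName: 'PreToolUse', permissionDecision: 'deny', permissionDecisionReason: reason } },
  // Allow, attach context, and replace the Bash command with a no-op echo.
  allowWithNoop: (context) => ({
    hookSpecificOutput: {
      hookEventName: 'PreToolUse',
      permissionDecision: 'allow',
      additionalContext: context,
      updatedInput: { command: 'echo ""' },
    },
  }),
});

const { allow, deny, allowWithNoop } = makeHelpers(false);
console.log(JSON.stringify(allow()));
console.log(JSON.stringify(deny('Plan mode is disabled.')));
console.log(JSON.stringify(allowWithNoop('exec:nodejs output:\n\n42')));
```

Judging only from the lines visible in the diff, the `exec:<lang>` path uses `allowWithNoop`: the output of the intercepted run is attached as `additionalContext`, and the Bash input is rewritten to `echo ""`, presumably so the original command is not executed a second time.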