gm-oc 2.0.176 → 2.0.178

package/agents/gm.md CHANGED
@@ -55,7 +55,7 @@ exec:<lang>
  - `exec:go`, `exec:rust`, `exec:c`, `exec:cpp`, `exec:java`, `exec:deno` — compiled langs
  - Set the `cwd` field on the Bash tool input for working directory
 
- **`agent-browser` skill** — Browser automation. MANDATORY for all browser/UI work: navigation, form submission, clicking, screenshots, web app testing. Replaces puppeteer/playwright entirely. Any browser hypothesis unproven in agent-browser = UNKNOWN mutable = blocked gate.
+ **`agent-browser` skill** — Browser automation. Use ONLY when code execution cannot answer the question. `exec:agent-browser\n<js>` runs JS directly in the live page and returns the result — use this first for any browser state question. Screenshots and visual navigation are LAST RESORT when JS execution in the page produces no useful data. Replaces puppeteer/playwright entirely. Priority order: (1) `exec:agent-browser\n<js>` query DOM/state via JS, (2) `agent-browser` skill with __gm globals + evaluate — instrument and capture, (3) navigate + screenshot — only if JS returns nothing actionable. Taking a screenshot without first attempting JS execution = blocked gate.
 
  **`code-search` skill** — Semantic codebase exploration. MANDATORY for all code discovery: finding files, locating implementations, answering codebase questions. Natural language queries return ranked results with line numbers. Glob/Grep/Read-for-discovery are blocked. code-search is the only exploration path.
 
@@ -131,15 +131,25 @@ Then instrument the page:
  - After interactions, call `window.__gm.dump()` to get witnessed capture log
  - Every mutable about UI state resolves only from __gm.captures, not from visual inspection or assumption
 
+ **BROWSER TESTING HIERARCHY** — always exhaust lower tiers before escalating:
+ 1. `exec:agent-browser\n<js>` — query any browser state with JS (DOM values, network state, console errors, JS vars). Returns data directly. Zero navigation needed. USE THIS FIRST for any troubleshooting.
+ 2. `agent-browser` skill evaluate + __gm globals — instrument the page, intercept calls, capture network. Use when step 1 returns insufficient context.
+ 3. `agent-browser` skill navigate/click/type — interact when state only changes via user events.
+ 4. `agent-browser` skill screenshot — LAST RESORT only. Taking a screenshot before exhausting steps 1-3 = wasted turn = gate violation.
+
+ For troubleshooting: test each part of the chain independently with JS execution before any navigation. Never use browse-and-screenshot as a diagnostic strategy.
+
  Tool selection per operation type:
  - Pure logic (parse, validate, transform, calculate): `exec:nodejs` with real imports — no DOM needed
  - API call + response + error handling (node): `exec:nodejs` with real module imports — test all three in one run
  - State mutation + downstream state effect: `exec:nodejs` — test mutation and effect together using real code
  - Shell commands, file system ops, git: `exec:bash` — multi-line shell supported
- - DOM rendering, visual state, layout: `agent-browser` skill with __gm globals injected
- - User interaction (click, type, submit, navigate): `agent-browser` skill — requires real events
- - State mutation visible on DOM: `agent-browser` skill with __gm captures test both mutation and DOM effect
- - Error path on UI (spinner, toast, retry): `agent-browser` skill — test full visible error flow with __gm.assert
+ - DOM state, JS variables, network responses: `exec:agent-browser\n<js>` query directly, no navigation
+ - DOM rendering, visual state, layout: `agent-browser` skill evaluate with __gm globals only after JS query fails
+ - User interaction (click, type, submit, navigate): `agent-browser` skill — only when state requires real events
+ - State mutation visible on DOM: `agent-browser` skill with __gm captures — test mutation and DOM effect together
+ - Error path on UI (spinner, toast, retry): `agent-browser` skill with __gm.assert — full visible error flow
+ - Screenshots: absolute last resort — only when all JS execution approaches exhausted
 
  PRE-EMIT-TEST (before editing any file):
  1. Test current behavior on disk — use `exec:nodejs` to import the actual module, witness real output
@@ -491,7 +501,7 @@ When constraints conflict:
 
  No policy conflict is preserved. Every conflict is resolved at the moment it is spotted.
 
- **Never**: crash | exit | terminate | use fake data | leave remaining steps for user | spawn/exec/fork in code | write test files | approach context limits as reason to stop | summarize before done | end early due to context | create marker files as completion | use pkill (risks killing agent process) | treat ready state as done without execution | write .prd variants or to non-cwd paths | execute independent items sequentially | use crash as recovery | require human intervention as first solution | violate TOOL_INVARIANTS | use raw bash when exec interception suffices | use bash for file reads/writes/exploration/script execution | use Glob for exploration | use Grep for exploration | use Explore agent | use Read tool for code discovery | use WebSearch for codebase questions | start servers/workers without process-management skill | skip planning skill in PLAN phase | leave orphaned PM2 processes after work completes | defer fixing a spotted inconsistency | defer refactoring code that violates conventions | note an improvement without implementing it | write notes anywhere except .prd (temporary) or CLAUDE.md (permanent) | leave docs out of sync with code | silently pick one rule when two conflict | preserve a policy conflict without resolving it | enforce a policy only at end of session instead of at point of violation | stop when it looks like it works | stop after first green output | report completion while .prd items remain | treat partial success as completion | skip edge cases after main path succeeds | leave any item unwitnessed and claim it complete
+ **Never**: crash | exit | terminate | use fake data | leave remaining steps for user | spawn/exec/fork in code | write test files | approach context limits as reason to stop | summarize before done | end early due to context | create marker files as completion | use pkill (risks killing agent process) | treat ready state as done without execution | write .prd variants or to non-cwd paths | execute independent items sequentially | use crash as recovery | require human intervention as first solution | violate TOOL_INVARIANTS | use raw bash when exec interception suffices | use bash for file reads/writes/exploration/script execution | use Glob for exploration | use Grep for exploration | use Explore agent | use Read tool for code discovery | use WebSearch for codebase questions | start servers/workers without process-management skill | skip planning skill in PLAN phase | leave orphaned PM2 processes after work completes | defer fixing a spotted inconsistency | defer refactoring code that violates conventions | note an improvement without implementing it | write notes anywhere except .prd (temporary) or CLAUDE.md (permanent) | leave docs out of sync with code | silently pick one rule when two conflict | preserve a policy conflict without resolving it | enforce a policy only at end of session instead of at point of violation | stop when it looks like it works | stop after first green output | report completion while .prd items remain | treat partial success as completion | skip edge cases after main path succeeds | leave any item unwitnessed and claim it complete | take a screenshot before attempting exec:agent-browser JS execution | use browse-and-screenshot as a diagnostic strategy | skip JS execution steps when troubleshooting browser issues
 
  **Always**: execute via `exec:<lang>` interception or `agent-browser` skill | delete mocks on discovery | expose debug hooks | keep files under 200 lines | use ground truth | verify by witnessed execution | complete fully with real data | recover from failures | systems survive forever by design | checkpoint state continuously | contain all promises | maintain supervisors for all components | fix inconsistencies immediately when spotted | restructure code immediately when convention violation found | implement logical improvements immediately when identified | reconcile docs and code before emitting | resolve policy conflicts at the moment they are spotted | ask "what else?" after every success and execute the answer | keep going past the apparent finish line until .prd is empty and git is clean | be the agent that delivers results the user only needs to read
 
@@ -11,16 +11,31 @@ const writeTools = ['Write', 'write_file'];
  const searchTools = ['glob', 'search_file_content', 'Search', 'search'];
  const forbiddenTools = ['find', 'Find', 'Glob', 'Grep'];
 
+ const allow = (additionalContext) => ({
+ hookSpecificOutput: { hookEventName: 'PreToolUse', permissionDecision: 'allow', ...(additionalContext && { additionalContext }) }
+ });
+ const deny = (reason) => isGemini
+ ? { decision: 'deny', reason }
+ : { hookSpecificOutput: { hookEventName: 'PreToolUse', permissionDecision: 'deny', permissionDecisionReason: reason } };
+ const allowWithNoop = (context) => ({
+ hookSpecificOutput: {
+ hookEventName: 'PreToolUse',
+ permissionDecision: 'allow',
+ additionalContext: context,
+ updatedInput: { command: 'echo ""' }
+ }
+ });
+
  const run = () => {
  try {
  const input = fs.readFileSync(0, 'utf-8');
  const data = JSON.parse(input);
  const { tool_name, tool_input } = data;
 
- if (!tool_name) return { allow: true };
+ if (!tool_name) return allow();
 
  if (forbiddenTools.includes(tool_name)) {
- return { block: true, reason: 'Use the code-search skill for codebase exploration instead of Grep/Glob/find. Describe what you need in plain language — it understands intent, not just patterns.' };
+ return deny('Use the code-search skill for codebase exploration instead of Grep/Glob/find. Describe what you need in plain language — it understands intent, not just patterns.');
  }
 
  if (writeTools.includes(tool_name)) {
@@ -30,7 +45,7 @@ const run = () => {
  const base = path.basename(file_path).toLowerCase();
  if ((ext === '.md' || ext === '.txt' || base.startsWith('features_list')) &&
  !base.startsWith('claude') && !base.startsWith('readme') && !inSkillsDir) {
- return { block: true, reason: 'Cannot create documentation files. Only CLAUDE.md and readme.md are maintained. For task-specific notes, use .prd. For permanent reference material, add to CLAUDE.md.' };
+ return deny('Cannot create documentation files. Only CLAUDE.md and readme.md are maintained. For task-specific notes, use .prd. For permanent reference material, add to CLAUDE.md.');
  }
  if (/\.(test|spec)\.(js|ts|jsx|tsx|mjs|cjs)$/.test(base) ||
  /^(jest|vitest|mocha|ava|jasmine|tap)\.(config|setup)/.test(base) ||
@@ -38,24 +53,24 @@ const run = () => {
  file_path.includes('/tests/') || file_path.includes('/fixtures/') ||
  file_path.includes('/test-data/') || file_path.includes('/__mocks__/') ||
  /\.(snap|stub|mock|fixture)\.(js|ts|json)$/.test(base)) {
- return { block: true, reason: 'Test files forbidden on disk. Use Bash tool with real services for all testing.' };
+ return deny('Test files forbidden on disk. Use Bash tool with real services for all testing.');
  }
  }
 
- if (searchTools.includes(tool_name)) return { allow: true };
+ if (searchTools.includes(tool_name)) return allow();
 
  if (tool_name === 'Task' && (tool_input?.subagent_type || '') === 'Explore') {
- return { block: true, reason: 'Use gm:thorns-overview for codebase insight, then use gm:code-search' };
+ return deny('Use gm:thorns-overview for codebase insight, then use gm:code-search');
  }
 
  if (tool_name === 'EnterPlanMode') {
- return { block: true, reason: 'Plan mode is disabled. Use GM agent planning (PLAN→EXECUTE→EMIT→VERIFY→COMPLETE state machine) via gm:gm subagent instead.' };
+ return deny('Plan mode is disabled. Use GM agent planning (PLAN→EXECUTE→EMIT→VERIFY→COMPLETE state machine) via gm:gm subagent instead.');
  }
 
  if (tool_name === 'Skill') {
  const skill = (tool_input?.skill || '').toLowerCase().replace(/^gm:/, '');
  if (skill === 'explore' || skill === 'search') {
- return { block: true, reason: 'Use the code-search skill for codebase exploration. Describe what you need in plain language — it understands intent, not just patterns.' };
+ return deny('Use the code-search skill for codebase exploration. Describe what you need in plain language — it understands intent, not just patterns.');
  }
  }
 
@@ -66,7 +81,7 @@ const run = () => {
  const rawLang = (execMatch[1] || '').toLowerCase();
  const code = execMatch[2];
  if (/^\s*agent-browser\s/.test(code)) {
- return { block: true, reason: `Do not call agent-browser via exec:bash. Use exec:agent-browser instead:\n\nexec:agent-browser\n<plain JS here>\n\nThe code is piped directly to the browser eval. No base64, no flags, no shell wrapping.` };
+ return deny(`Do not call agent-browser via exec:bash. Use exec:agent-browser instead:\n\nexec:agent-browser\n<plain JS here>\n\nThe code is piped directly to the browser eval. No base64, no flags, no shell wrapping.`);
  }
  const cwd = tool_input?.cwd;
  const detectLang = (src) => {
@@ -134,35 +149,31 @@ const run = () => {
  } else {
  result = runWithFile(lang, safeCode);
  }
- return { block: true, reason: `exec ran successfully. Output:\n\n${result || '(no output)'}` };
+ return allowWithNoop(`exec:${lang} output:\n\n${result || '(no output)'}`);
  } catch (e) {
- return { block: true, reason: `exec ran. Error:\n\n${(e.stdout || '') + (e.stderr || '') || e.message || '(exec failed)'}` };
+ return allowWithNoop(`exec:${lang} error:\n\n${(e.stdout || '') + (e.stderr || '') || e.message || '(exec failed)'}`);
  }
  }
 
  if (!/^exec(\s|:)/.test(command) && !/^bun x gm-exec(@[^\s]*)?(\s|$)/.test(command) && !/^git /.test(command) && !/^bun x codebasesearch/.test(command) && !/(\bclaude\b)/.test(command) && !/^npm install .* \/config\/.gmweb/.test(command) && !/^bun install --cwd \/config\/.gmweb/.test(command)) {
  let helpText = '';
  try { helpText = '\n\n' + execSync('bun x gm-exec --help', { timeout: 10000 }).toString().trim(); } catch (e) {}
- return { block: true, reason: `Bash is restricted to exec:<lang> and git.\n\nexec:<lang> syntax (lang auto-detected if omitted):\n exec:nodejs / exec:python / exec:bash / exec:typescript\n exec:go / exec:rust / exec:java / exec:c / exec:cpp\n exec:agent-browser ← plain JS piped to browser eval (NO base64)\n exec ← auto-detects language\n\nNEVER encode agent-browser code as base64 — pass plain JS directly.\n\nbun x gm-exec${helpText}\n\nAll other Bash commands are blocked.` };
+ return deny(`Bash is restricted to exec:<lang> and git.\n\nexec:<lang> syntax (lang auto-detected if omitted):\n exec:nodejs / exec:python / exec:bash / exec:typescript\n exec:go / exec:rust / exec:java / exec:c / exec:cpp\n exec:agent-browser ← plain JS piped to browser eval (NO base64)\n exec ← auto-detects language\n\nNEVER encode agent-browser code as base64 — pass plain JS directly.\n\nbun x gm-exec${helpText}\n\nAll other Bash commands are blocked.`);
  }
  }
 
  const allowedTools = ['agent-browser', 'Skill', 'code-search', 'electron', 'TaskOutput', 'ReadMcpResourceTool', 'ListMcpResourcesTool'];
- if (allowedTools.includes(tool_name)) return { allow: true };
+ if (allowedTools.includes(tool_name)) return allow();
 
- return { allow: true };
+ return allow();
  } catch (error) {
- return { allow: true };
+ return allow();
  }
  };
 
  try {
  const result = run();
- if (result.block) {
- console.log(JSON.stringify({ decision: isGemini ? 'deny' : 'block', reason: result.reason }));
- process.exit(0);
- }
- if (isGemini) console.log(JSON.stringify({ decision: 'allow' }));
+ console.log(JSON.stringify(result));
  process.exit(0);
  } catch (error) {
  process.exit(0);
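
As a rough illustration of the hook change above, the sketch below restates the three helpers added in 2.0.178 and prints the JSON each one serializes to. It rests on two assumptions: in the diff, `isGemini` is defined outside the hunks shown, so here it is passed in as a parameter; and `makeHelpers`, `defaultMode`, and `geminiMode` are hypothetical names used only for this example, not part of the package.

```javascript
// Hypothetical wrapper: the package defines these helpers at module scope and
// reads `isGemini` from code that is not visible in this diff.
const makeHelpers = (isGemini) => {
  // Allow the tool call, attaching context only when some is provided.
  const allow = (additionalContext) => ({
    hookSpecificOutput: {
      hookEventName: 'PreToolUse',
      permissionDecision: 'allow',
      ...(additionalContext && { additionalContext })
    }
  });

  // Deny the tool call; the Gemini branch uses the flat { decision, reason } shape.
  const deny = (reason) => isGemini
    ? { decision: 'deny', reason }
    : {
        hookSpecificOutput: {
          hookEventName: 'PreToolUse',
          permissionDecision: 'deny',
          permissionDecisionReason: reason
        }
      };

  // Allow the call, replace the Bash command with a no-op, and carry the
  // exec output back in additionalContext.
  const allowWithNoop = (context) => ({
    hookSpecificOutput: {
      hookEventName: 'PreToolUse',
      permissionDecision: 'allow',
      additionalContext: context,
      updatedInput: { command: 'echo ""' }
    }
  });

  return { allow, deny, allowWithNoop };
};

const defaultMode = makeHelpers(false);
const geminiMode = makeHelpers(true);

console.log(JSON.stringify(defaultMode.allow()));
console.log(JSON.stringify(defaultMode.deny('blocked')));
console.log(JSON.stringify(geminiMode.deny('blocked')));
console.log(JSON.stringify(defaultMode.allowWithNoop('exec:nodejs output:\n\n42')));
```

Only `deny` branches on `isGemini` in the diff; `allow` and `allowWithNoop` return the same shape in both modes. A completed `exec:<lang>` run previously came back as `{ block: true, reason }`, and in 2.0.178 it is returned through `allowWithNoop`, so the output arrives as `additionalContext` on an allowed no-op command instead of as a denial reason.
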
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "gm-oc",
- "version": "2.0.176",
+ "version": "2.0.178",
  "description": "State machine agent with hooks, skills, and automated git enforcement",
  "author": "AnEntrypoint",
  "license": "MIT",