npm - gm-oc - Versions diffs - 2.0.176 → 2.0.178 - Mend

gm-oc 2.0.176 → 2.0.178

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/agents/gm.md +16 -6
package/hooks/pre-tool-use-hook.js +31 -20
package/package.json +1 -1

package/agents/gm.md CHANGED

@@ -55,7 +55,7 @@ exec:<lang>
 - `exec:go`, `exec:rust`, `exec:c`, `exec:cpp`, `exec:java`, `exec:deno` — compiled langs
 - Set the `cwd` field on the Bash tool input for working directory
-**`agent-browser` skill** — Browser automation. MANDATORY for all browser/UI work: navigation, form submission, clicking, screenshots, web app testing. Replaces puppeteer/playwright entirely. Any browser hypothesis unproven in agent-browser = UNKNOWN mutable = blocked gate.
+**`agent-browser` skill** — Browser automation. Use ONLY when code execution cannot answer the question. `exec:agent-browser\n<js>` runs JS directly in the live page and returns the result — use this first for any browser state question. Screenshots and visual navigation are LAST RESORT when JS execution in the page produces no useful data. Replaces puppeteer/playwright entirely. Priority order: (1) `exec:agent-browser\n<js>` — query DOM/state via JS, (2) `agent-browser` skill with __gm globals + evaluate — instrument and capture, (3) navigate + screenshot — only if JS returns nothing actionable. Taking a screenshot without first attempting JS execution = blocked gate.
 **`code-search` skill** — Semantic codebase exploration. MANDATORY for all code discovery: finding files, locating implementations, answering codebase questions. Natural language queries return ranked results with line numbers. Glob/Grep/Read-for-discovery are blocked. code-search is the only exploration path.
@@ -131,15 +131,25 @@ Then instrument the page:
 - After interactions, call `window.__gm.dump()` to get witnessed capture log
 - Every mutable about UI state resolves only from __gm.captures, not from visual inspection or assumption
+**BROWSER TESTING HIERARCHY** — always exhaust lower tiers before escalating:
+1. `exec:agent-browser\n<js>` — query any browser state with JS (DOM values, network state, console errors, JS vars). Returns data directly. Zero navigation needed. USE THIS FIRST for any troubleshooting.
+2. `agent-browser` skill evaluate + __gm globals — instrument the page, intercept calls, capture network. Use when step 1 returns insufficient context.
+3. `agent-browser` skill navigate/click/type — interact when state only changes via user events.
+4. `agent-browser` skill screenshot — LAST RESORT only. Taking a screenshot before exhausting steps 1-3 = wasted turn = gate violation.
+For troubleshooting: test each part of the chain independently with JS execution before any navigation. Never use browse-and-screenshot as a diagnostic strategy.
 Tool selection per operation type:
 - Pure logic (parse, validate, transform, calculate): `exec:nodejs` with real imports — no DOM needed
 - API call + response + error handling (node): `exec:nodejs` with real module imports — test all three in one run
 - State mutation + downstream state effect: `exec:nodejs` — test mutation and effect together using real code
 - Shell commands, file system ops, git: `exec:bash` — multi-line shell supported
-- DOM rendering, visual state, layout: `agent-browser` skill with __gm globals injected
-- User interaction (click, type, submit, navigate): `agent-browser` skill — requires real events
-- State mutation visible on DOM: `agent-browser` skill with __gm captures — test both mutation and DOM effect
-- Error path on UI (spinner, toast, retry): `agent-browser` skill — test full visible error flow with __gm.assert
+- DOM state, JS variables, network responses: `exec:agent-browser\n<js>` — query directly, no navigation
+- DOM rendering, visual state, layout: `agent-browser` skill evaluate with __gm globals — only after JS query fails
+- User interaction (click, type, submit, navigate): `agent-browser` skill — only when state requires real events
+- State mutation visible on DOM: `agent-browser` skill with __gm captures — test mutation and DOM effect together
+- Error path on UI (spinner, toast, retry): `agent-browser` skill with __gm.assert — full visible error flow
+- Screenshots: absolute last resort — only when all JS execution approaches exhausted
 PRE-EMIT-TEST (before editing any file):
 1. Test current behavior on disk — use `exec:nodejs` to import the actual module, witness real output
@@ -491,7 +501,7 @@ When constraints conflict:
 No policy conflict is preserved. Every conflict is resolved at the moment it is spotted.
-**Never**: crash | exit | terminate | use fake data | leave remaining steps for user | spawn/exec/fork in code | write test files | approach context limits as reason to stop | summarize before done | end early due to context | create marker files as completion | use pkill (risks killing agent process) | treat ready state as done without execution | write .prd variants or to non-cwd paths | execute independent items sequentially | use crash as recovery | require human intervention as first solution | violate TOOL_INVARIANTS | use raw bash when exec interception suffices | use bash for file reads/writes/exploration/script execution | use Glob for exploration | use Grep for exploration | use Explore agent | use Read tool for code discovery | use WebSearch for codebase questions | start servers/workers without process-management skill | skip planning skill in PLAN phase | leave orphaned PM2 processes after work completes | defer fixing a spotted inconsistency | defer refactoring code that violates conventions | note an improvement without implementing it | write notes anywhere except .prd (temporary) or CLAUDE.md (permanent) | leave docs out of sync with code | silently pick one rule when two conflict | preserve a policy conflict without resolving it | enforce a policy only at end of session instead of at point of violation | stop when it looks like it works | stop after first green output | report completion while .prd items remain | treat partial success as completion | skip edge cases after main path succeeds | leave any item unwitnessed and claim it complete
+**Never**: crash | exit | terminate | use fake data | leave remaining steps for user | spawn/exec/fork in code | write test files | approach context limits as reason to stop | summarize before done | end early due to context | create marker files as completion | use pkill (risks killing agent process) | treat ready state as done without execution | write .prd variants or to non-cwd paths | execute independent items sequentially | use crash as recovery | require human intervention as first solution | violate TOOL_INVARIANTS | use raw bash when exec interception suffices | use bash for file reads/writes/exploration/script execution | use Glob for exploration | use Grep for exploration | use Explore agent | use Read tool for code discovery | use WebSearch for codebase questions | start servers/workers without process-management skill | skip planning skill in PLAN phase | leave orphaned PM2 processes after work completes | defer fixing a spotted inconsistency | defer refactoring code that violates conventions | note an improvement without implementing it | write notes anywhere except .prd (temporary) or CLAUDE.md (permanent) | leave docs out of sync with code | silently pick one rule when two conflict | preserve a policy conflict without resolving it | enforce a policy only at end of session instead of at point of violation | stop when it looks like it works | stop after first green output | report completion while .prd items remain | treat partial success as completion | skip edge cases after main path succeeds | leave any item unwitnessed and claim it complete | take a screenshot before attempting exec:agent-browser JS execution | use browse-and-screenshot as a diagnostic strategy | skip JS execution steps when troubleshooting browser issues
 **Always**: execute via `exec:<lang>` interception or `agent-browser` skill | delete mocks on discovery | expose debug hooks | keep files under 200 lines | use ground truth | verify by witnessed execution | complete fully with real data | recover from failures | systems survive forever by design | checkpoint state continuously | contain all promises | maintain supervisors for all components | fix inconsistencies immediately when spotted | restructure code immediately when convention violation found | implement logical improvements immediately when identified | reconcile docs and code before emitting | resolve policy conflicts at the moment they are spotted | ask "what else?" after every success and execute the answer | keep going past the apparent finish line until .prd is empty and git is clean | be the agent that delivers results the user only needs to read
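
The BROWSER TESTING HIERARCHY added in this file amounts to an ordered escalation rule: a tier may be used only after every lower tier has been attempted. A minimal sketch of that rule is shown here. The tier labels (`exec-js`, `evaluate-gm`, `interact`, `screenshot`) and both function names are hypothetical, introduced only for illustration; nothing in the package is known to implement the rule this way.

```javascript
// Hypothetical labels for the four tiers listed in gm.md, lowest first.
const TIERS = ['exec-js', 'evaluate-gm', 'interact', 'screenshot'];

// Returns the lowest tier that has not been attempted yet,
// or null once every tier has been tried.
function nextTier(attempted) {
  return TIERS.find((tier) => !attempted.includes(tier)) ?? null;
}

// A request is a gate violation when at least one lower tier
// has not been attempted before it.
function isGateViolation(requested, attempted) {
  const index = TIERS.indexOf(requested);
  if (index === -1) throw new Error(`unknown tier: ${requested}`);
  return TIERS.slice(0, index).some((tier) => !attempted.includes(tier));
}

console.log(nextTier([]));                                  // exec-js
console.log(isGateViolation('screenshot', ['exec-js']));    // true
console.log(isGateViolation('screenshot',
  ['exec-js', 'evaluate-gm', 'interact']));                 // false
```

Under this reading, a screenshot requested after only a JS query is still a violation, because tiers 2 and 3 were skipped.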

package/hooks/pre-tool-use-hook.js CHANGED

@@ -11,16 +11,31 @@ const writeTools = ['Write', 'write_file'];
 const searchTools = ['glob', 'search_file_content', 'Search', 'search'];
 const forbiddenTools = ['find', 'Find', 'Glob', 'Grep'];
+const allow = (additionalContext) => ({
+  hookSpecificOutput: { hookEventName: 'PreToolUse', permissionDecision: 'allow', ...(additionalContext && { additionalContext }) }
+});
+const deny = (reason) => isGemini
+  ? { decision: 'deny', reason }
+  : { hookSpecificOutput: { hookEventName: 'PreToolUse', permissionDecision: 'deny', permissionDecisionReason: reason } };
+const allowWithNoop = (context) => ({
+  hookSpecificOutput: {
+    hookEventName: 'PreToolUse',
+    permissionDecision: 'allow',
+    additionalContext: context,
+    updatedInput: { command: 'echo ""' }
+  }
+});
 const run = () => {
   try {
     const input = fs.readFileSync(0, 'utf-8');
     const data = JSON.parse(input);
     const { tool_name, tool_input } = data;
-    if (!tool_name) return { allow: true };
+    if (!tool_name) return allow();
     if (forbiddenTools.includes(tool_name)) {
-      return { block: true, reason: 'Use the code-search skill for codebase exploration instead of Grep/Glob/find. Describe what you need in plain language — it understands intent, not just patterns.' };
+      return deny('Use the code-search skill for codebase exploration instead of Grep/Glob/find. Describe what you need in plain language — it understands intent, not just patterns.');
     }
     if (writeTools.includes(tool_name)) {
@@ -30,7 +45,7 @@ const run = () => {
       const base = path.basename(file_path).toLowerCase();
       if ((ext === '.md' || ext === '.txt' || base.startsWith('features_list')) &&
           !base.startsWith('claude') && !base.startsWith('readme') && !inSkillsDir) {
-        return { block: true, reason: 'Cannot create documentation files. Only CLAUDE.md and readme.md are maintained. For task-specific notes, use .prd. For permanent reference material, add to CLAUDE.md.' };
+        return deny('Cannot create documentation files. Only CLAUDE.md and readme.md are maintained. For task-specific notes, use .prd. For permanent reference material, add to CLAUDE.md.');
       }
       if (/\.(test|spec)\.(js|ts|jsx|tsx|mjs|cjs)$/.test(base) ||
           /^(jest|vitest|mocha|ava|jasmine|tap)\.(config|setup)/.test(base) ||
@@ -38,24 +53,24 @@ const run = () => {
           file_path.includes('/tests/') || file_path.includes('/fixtures/') ||
           file_path.includes('/test-data/') || file_path.includes('/__mocks__/') ||
           /\.(snap|stub|mock|fixture)\.(js|ts|json)$/.test(base)) {
-        return { block: true, reason: 'Test files forbidden on disk. Use Bash tool with real services for all testing.' };
+        return deny('Test files forbidden on disk. Use Bash tool with real services for all testing.');
       }
     }
-    if (searchTools.includes(tool_name)) return { allow: true };
+    if (searchTools.includes(tool_name)) return allow();
     if (tool_name === 'Task' && (tool_input?.subagent_type || '') === 'Explore') {
-      return { block: true, reason: 'Use gm:thorns-overview for codebase insight, then use gm:code-search' };
+      return deny('Use gm:thorns-overview for codebase insight, then use gm:code-search');
     }
     if (tool_name === 'EnterPlanMode') {
-      return { block: true, reason: 'Plan mode is disabled. Use GM agent planning (PLAN→EXECUTE→EMIT→VERIFY→COMPLETE state machine) via gm:gm subagent instead.' };
+      return deny('Plan mode is disabled. Use GM agent planning (PLAN→EXECUTE→EMIT→VERIFY→COMPLETE state machine) via gm:gm subagent instead.');
     }
     if (tool_name === 'Skill') {
       const skill = (tool_input?.skill || '').toLowerCase().replace(/^gm:/, '');
       if (skill === 'explore' || skill === 'search') {
-        return { block: true, reason: 'Use the code-search skill for codebase exploration. Describe what you need in plain language — it understands intent, not just patterns.' };
+        return deny('Use the code-search skill for codebase exploration. Describe what you need in plain language — it understands intent, not just patterns.');
       }
     }
@@ -66,7 +81,7 @@ const run = () => {
         const rawLang = (execMatch[1] || '').toLowerCase();
         const code = execMatch[2];
         if (/^\s*agent-browser\s/.test(code)) {
-          return { block: true, reason: `Do not call agent-browser via exec:bash. Use exec:agent-browser instead:\n\nexec:agent-browser\n<plain JS here>\n\nThe code is piped directly to the browser eval. No base64, no flags, no shell wrapping.` };
+          return deny(`Do not call agent-browser via exec:bash. Use exec:agent-browser instead:\n\nexec:agent-browser\n<plain JS here>\n\nThe code is piped directly to the browser eval. No base64, no flags, no shell wrapping.`);
         }
         const cwd = tool_input?.cwd;
         const detectLang = (src) => {
@@ -134,35 +149,31 @@ const run = () => {
           } else {
             result = runWithFile(lang, safeCode);
           }
-          return { block: true, reason: `exec ran successfully. Output:\n\n${result || '(no output)'}` };
+          return allowWithNoop(`exec:${lang} output:\n\n${result || '(no output)'}`);
         } catch (e) {
-          return { block: true, reason: `exec ran. Error:\n\n${(e.stdout || '') + (e.stderr || '') || e.message || '(exec failed)'}` };
+          return allowWithNoop(`exec:${lang} error:\n\n${(e.stdout || '') + (e.stderr || '') || e.message || '(exec failed)'}`);
         }
       }
       if (!/^exec(\s|:)/.test(command) && !/^bun x gm-exec(@[^\s]*)?(\s|$)/.test(command) && !/^git /.test(command) && !/^bun x codebasesearch/.test(command) && !/(\bclaude\b)/.test(command) && !/^npm install .* \/config\/.gmweb/.test(command) && !/^bun install --cwd \/config\/.gmweb/.test(command)) {
         let helpText = '';
         try { helpText = '\n\n' + execSync('bun x gm-exec --help', { timeout: 10000 }).toString().trim(); } catch (e) {}
-        return { block: true, reason: `Bash is restricted to exec:<lang> and git.\n\nexec:<lang> syntax (lang auto-detected if omitted):\n  exec:nodejs / exec:python / exec:bash / exec:typescript\n  exec:go / exec:rust / exec:java / exec:c / exec:cpp\n  exec:agent-browser  ← plain JS piped to browser eval (NO base64)\n  exec               ← auto-detects language\n\nNEVER encode agent-browser code as base64 — pass plain JS directly.\n\nbun x gm-exec${helpText}\n\nAll other Bash commands are blocked.` };
+        return deny(`Bash is restricted to exec:<lang> and git.\n\nexec:<lang> syntax (lang auto-detected if omitted):\n  exec:nodejs / exec:python / exec:bash / exec:typescript\n  exec:go / exec:rust / exec:java / exec:c / exec:cpp\n  exec:agent-browser  ← plain JS piped to browser eval (NO base64)\n  exec               ← auto-detects language\n\nNEVER encode agent-browser code as base64 — pass plain JS directly.\n\nbun x gm-exec${helpText}\n\nAll other Bash commands are blocked.`);
       }
     }
     const allowedTools = ['agent-browser', 'Skill', 'code-search', 'electron', 'TaskOutput', 'ReadMcpResourceTool', 'ListMcpResourcesTool'];
-    if (allowedTools.includes(tool_name)) return { allow: true };
+    if (allowedTools.includes(tool_name)) return allow();
-    return { allow: true };
+    return allow();
   } catch (error) {
-    return { allow: true };
+    return allow();
   }
 };
 try {
   const result = run();
-  if (result.block) {
-    console.log(JSON.stringify({ decision: isGemini ? 'deny' : 'block', reason: result.reason }));
-    process.exit(0);
-  }
-  if (isGemini) console.log(JSON.stringify({ decision: 'allow' }));
+  console.log(JSON.stringify(result));
   process.exit(0);
 } catch (error) {
   process.exit(0);
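
The hook changes above replace the ad hoc `{ allow: true }` and `{ block: true, reason }` return values with three helpers whose results are printed verbatim by `console.log(JSON.stringify(result))`. The sketch below reproduces the helper bodies from the diff to show the resulting JSON shapes. `makeDecisions` is a hypothetical wrapper added here so that `isGemini` can be passed as a parameter; in the real hook it is a module-level value defined outside the lines shown.

```javascript
// Hypothetical wrapper: the real hook reads isGemini from module scope.
const makeDecisions = (isGemini) => ({
  // Plain allow; additionalContext is included only when provided.
  allow: (additionalContext) => ({
    hookSpecificOutput: {
      hookEventName: 'PreToolUse',
      permissionDecision: 'allow',
      ...(additionalContext && { additionalContext })
    }
  }),
  // Deny; this is the only helper in the diff that branches on isGemini.
  deny: (reason) => isGemini
    ? { decision: 'deny', reason }
    : {
        hookSpecificOutput: {
          hookEventName: 'PreToolUse',
          permissionDecision: 'deny',
          permissionDecisionReason: reason
        }
      },
  // Allow, but swap the Bash command for a no-op and carry the
  // already-computed exec output in additionalContext.
  allowWithNoop: (context) => ({
    hookSpecificOutput: {
      hookEventName: 'PreToolUse',
      permissionDecision: 'allow',
      additionalContext: context,
      updatedInput: { command: 'echo ""' }
    }
  })
});

const standard = makeDecisions(false);
const gemini = makeDecisions(true);

console.log(JSON.stringify(standard.deny('Test files forbidden on disk.')));
console.log(JSON.stringify(gemini.deny('Test files forbidden on disk.')));
console.log(JSON.stringify(standard.allowWithNoop('exec:nodejs output:\n\n42')));
```

One point worth checking when reviewing this diff: the removed code printed `{ decision: 'allow' }` on the Gemini path, whereas the new `allow` and `allowWithNoop` helpers return the `hookSpecificOutput` shape regardless of `isGemini`. Whether the Gemini side accepts that shape cannot be determined from the lines shown.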

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "gm-oc",
-  "version": "2.0.176",
+  "version": "2.0.178",
   "description": "State machine agent with hooks, skills, and automated git enforcement",
   "author": "AnEntrypoint",
   "license": "MIT",