gm-copilot-cli 2.0.277 → 2.0.279

@@ -1,6 +1,6 @@
  ---
  name: gm
- version: 2.0.277
+ version: 2.0.279
  description: State machine agent with hooks, skills, and automated git enforcement
  author: AnEntrypoint
  repository: https://github.com/AnEntrypoint/gm-copilot-cli
package/index.html CHANGED
@@ -18,7 +18,7 @@
  <script type="module">
  import { createElement as h, applyDiff, Fragment } from "webjsx";
  const PLATFORM_NAME="Copilot CLI",PLATFORM_TYPE="CLI Tool",PLATFORM_TYPE_COLOR="#3b82f6";
- const DESCRIPTION="State machine agent with hooks, skills, and automated git enforcement",VERSION="2.0.277";
+ const DESCRIPTION="State machine agent with hooks, skills, and automated git enforcement",VERSION="2.0.279";
  const GITHUB_URL="https://github.com/AnEntrypoint/gm-copilot-cli",BADGE_LABEL="copilot-cli";
  const FEATURES=[{"title":"State Machine","desc":"Immutable PLAN→EXECUTE→EMIT→VERIFY→COMPLETE phases with full mutable tracking"},{"title":"Semantic Search","desc":"Natural language codebase exploration via codesearch skill — no grep needed"},{"title":"Hooks","desc":"Pre-tool, session-start, prompt-submit, and stop hooks for full lifecycle control"},{"title":"Agents","desc":"gm, codesearch, and websearch agents pre-configured and ready to use"},{"title":"MCP Integration","desc":"Model Context Protocol server support built in"},{"title":"Auto-Recovery","desc":"Supervisor hierarchy ensures the system never crashes"}],INSTALL_STEPS=[{"desc":"Install via GitHub CLI","cmd":"gh extension install AnEntrypoint/gm-copilot-cli"},{"desc":"Restart your terminal — activates automatically"}];
  const CURRENT_PLATFORM="gm-copilot-cli";
package/manifest.yml CHANGED
@@ -1,5 +1,5 @@
  name: gm
- version: 2.0.277
+ version: 2.0.279
  description: State machine agent with hooks, skills, and automated git enforcement
  author: AnEntrypoint
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "gm-copilot-cli",
- "version": "2.0.277",
+ "version": "2.0.279",
  "description": "State machine agent with hooks, skills, and automated git enforcement",
  "author": "AnEntrypoint",
  "license": "MIT",
@@ -0,0 +1,171 @@
+ ---
+ name: browser
+ description: Browser automation via playwriter. Use when user needs to interact with websites, navigate pages, fill forms, click buttons, take screenshots, extract data, test web apps, or automate any browser task.
+ allowed-tools: Bash(browser:*), Bash(exec:browser*)
+ ---
+
+ # Browser Automation with playwriter
+
+ ## Two Pathways
+
+ **Session commands** — use `browser:` prefix via Bash for all browser control.
+
+ Create a session first, then run commands against it:
+
+ ```
+ browser:
+ playwriter session new
+ ```
+
+ Returns a numeric session ID (e.g. `1`). Use that ID for all subsequent commands.
+
+ ```
+ browser:
+ playwriter -s 1 -e 'await page.goto("https://example.com")'
+ ```
+
+ ```
+ browser:
+ playwriter -s 1 -e 'await snapshot({ page })'
+ ```
+
+ ```
+ browser:
+ playwriter -s 1 -e 'await screenshotWithAccessibilityLabels({ page })'
+ ```
+
+ State persists across calls within a session:
+
+ ```
+ browser:
+ playwriter -s 1 -e 'state.x = 1'
+ ```
+
+ ```
+ browser:
+ playwriter -s 1 -e 'console.log(state.x)'
+ ```
+
+ List active sessions:
+
+ ```
+ browser:
+ playwriter session list
+ ```
+
+ **JS eval in browser** — use `exec:browser` via Bash when you need to run JavaScript in the page context directly.
+
+ ```
+ exec:browser
+ await page.goto('https://example.com')
+ await snapshot({ page })
+ ```
+
+ ```
+ exec:browser
+ const title = await page.title()
+ console.log(title)
+ ```
+
+ Always use single quotes for the `-e` argument to avoid shell quoting issues.
+
+ ## Core Workflow
+
+ Every browser automation follows this pattern:
+
+ 1. **Create session**: `playwriter session new` (note the returned ID)
+ 2. **Navigate**: `playwriter -s <id> -e 'await page.goto("https://example.com")'`
+ 3. **Snapshot**: `playwriter -s <id> -e 'await snapshot({ page })'`
+ 4. **Interact**: click, fill, type via JS expressions
+ 5. **Re-snapshot**: after navigation or DOM changes
+
+ ```
+ browser:
+ playwriter session new
+ playwriter -s 1 -e 'await page.goto("https://example.com/form")'
+ playwriter -s 1 -e 'await snapshot({ page })'
+ playwriter -s 1 -e 'await page.fill("[name=email]", "user@example.com")'
+ playwriter -s 1 -e 'await page.click("[type=submit]")'
+ playwriter -s 1 -e 'await page.waitForLoadState("networkidle")'
+ playwriter -s 1 -e 'await snapshot({ page })'
+ ```
+
+ ## Common Patterns
+
+ ### Navigation and Snapshot
+
+ ```
+ browser:
+ playwriter session new
+ playwriter -s 1 -e 'await page.goto("https://example.com")'
+ playwriter -s 1 -e 'await snapshot({ page })'
+ ```
+
+ ### Screenshot with Accessibility Labels
+
+ ```
+ browser:
+ playwriter -s 1 -e 'await screenshotWithAccessibilityLabels({ page })'
+ ```
+
+ ### Data Extraction
+
+ ```
+ exec:browser
+ await page.goto('https://example.com/products')
+ const items = await page.$$eval('.product-title', els => els.map(e => e.textContent))
+ console.log(JSON.stringify(items))
+ ```
+
+ ### Persistent State Across Steps
+
+ ```
+ browser:
+ playwriter -s 1 -e 'state.loginDone = false'
+ playwriter -s 1 -e 'await page.goto("https://app.example.com/login")'
+ playwriter -s 1 -e 'await page.fill("[name=user]", "admin")'
+ playwriter -s 1 -e 'await page.fill("[name=pass]", "secret")'
+ playwriter -s 1 -e 'await page.click("[type=submit]")'
+ playwriter -s 1 -e 'state.loginDone = true'
+ ```
+
+ ### Multiple Sessions
+
+ ```
+ browser:
+ playwriter session new
+ playwriter session new
+ playwriter -s 1 -e 'await page.goto("https://site-a.com")'
+ playwriter -s 2 -e 'await page.goto("https://site-b.com")'
+ playwriter session list
+ ```
+
+ ## JavaScript Evaluation (exec pathway)
+
+ Use `exec:browser` via Bash when you need direct page access. The body is plain JavaScript executed in the browser context.
+
+ ```
+ exec:browser
+ await page.goto('https://example.com')
+ await snapshot({ page })
+ ```
+
+ ```
+ exec:browser
+ const links = await page.$$eval('a', els => els.map(e => e.href))
+ console.log(JSON.stringify(links))
+ ```
+
+ Never add shell quoting or escaping to the exec body — write plain JavaScript directly.
+
+ ## Key Patterns for Agents
+
+ **Which pathway to use**:
+ - Multi-step session workflows → `browser:` prefix with `playwriter -s <id> -e '...'`
+ - Quick JS eval or data extraction → `exec:browser` with plain JS body
+
+ **Always use single quotes** for the `-e` argument to playwriter to avoid shell interpretation.
+
+ **Session IDs are numeric**: `playwriter session new` returns `1`, `2`, etc. Use the exact returned value.
+
+ **Snapshot before interacting**: always call `await snapshot({ page })` to understand current page state before clicking or filling.
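
Note on the quoting rule in the added `browser` skill: it asks for single quotes around the `-e` argument. The sketch below shows one way to apply that rule programmatically, assuming a POSIX shell. `shellSingleQuote` and `playwriterCommand` are hypothetical helpers, not part of this package or of playwriter, and the integer check on the session ID is an assumption based on the skill's statement that IDs are numeric. The sketch also covers a case the skill does not spell out, where the JavaScript itself contains a single quote.

```javascript
// Hypothetical helper (not part of the package): wraps text in POSIX single
// quotes so the shell passes it through unchanged.
function shellSingleQuote(text) {
  // Inside single quotes every character is literal, so an embedded ' has to
  // close the quote, add an escaped ', and reopen the quote: '\''
  return "'" + text.replace(/'/g, "'\\''") + "'";
}

// Builds the command shape shown in the skill: playwriter -s <id> -e '<js>'
function playwriterCommand(sessionId, expression) {
  if (!Number.isInteger(sessionId)) {
    throw new TypeError("sessionId must be an integer");
  }
  return `playwriter -s ${sessionId} -e ${shellSingleQuote(expression)}`;
}

console.log(playwriterCommand(1, 'await page.goto("https://example.com")'));
console.log(playwriterCommand(1, "await page.click('text=Sign in')"));
```

In the second call the quotes around `text=Sign in` come out as `'\''`, which is how a POSIX shell represents a literal single quote inside a single-quoted string.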
@@ -84,9 +84,9 @@ Alias: `exec:search`. **Glob, Grep, Read, Explore, WebSearch are hook-blocked**
 
  ## BROWSER AUTOMATION
 
- Invoke `agent-browser` skill. Escalation — exhaust each before advancing:
- 1. `exec:agent-browser\n<js>` — query DOM/state via JS
- 2. `agent-browser` skill + `__gm` globals instrument and capture
+ Invoke `browser` skill. Escalation — exhaust each before advancing:
+ 1. `exec:browser\n<js>` — query DOM/state via JS
+ 2. `browser` skill for full session workflows
  3. navigate/click/type — only when real events required
  4. screenshot — last resort only
 
@@ -97,7 +97,7 @@ Invoke `agent-browser` skill. Escalation — exhaust each before advancing:
  **`gm-emit`** — Write files to disk when all mutables resolved.
  **`gm-complete`** — End-to-end verification and git enforcement.
  **`update-docs`** — Refresh README, CLAUDE.md, and docs to reflect session changes. Invoked by `gm-complete`.
- **`agent-browser`** — Browser automation. Invoke inside EXECUTE for all browser/UI work.
+ **`browser`** — Browser automation. Invoke inside EXECUTE for all browser/UI work.
 
  ## DO NOT STOP
 
@@ -47,7 +47,7 @@ const { fn } = await import('/abs/path/to/module.js');
  console.log(await fn(realInput));
  ```
 
- For browser/UI: invoke `agent-browser` skill with real workflows. Server + client features require both exec:nodejs AND agent-browser. After every success: enumerate what remains — never stop at first green.
+ For browser/UI: invoke `browser` skill with real workflows. Server + client features require both exec:nodejs AND browser. After every success: enumerate what remains — never stop at first green.
 
  ## CODE EXECUTION
 
@@ -81,7 +81,7 @@ Alias: `exec:search`. **Glob, Grep, Read, Explore are hook-blocked** — use `ex
 
  ## BROWSER DEBUGGING
 
- Invoke `agent-browser` skill. Escalation: (1) `exec:agent-browser\n<js>` → (2) skill + `__gm` globals → (3) navigate/click → (4) screenshot last resort.
+ Invoke `browser` skill. Escalation: (1) `exec:browser\n<js>` → (2) `browser` skill → (3) navigate/click → (4) screenshot last resort.
 
  ## SELF-CHECK (before and after each file)
 
@@ -93,17 +93,12 @@ Step failure revealing new unknown → snake to `planning`.
 
  ## BROWSER DEBUGGING
 
- Invoke `agent-browser` skill. Escalation — exhaust each before advancing:
- 1. `exec:agent-browser\n<js>` — query DOM/state. Always first.
- 2. `agent-browser` skill + `__gm` globals instrument and capture
+ Invoke `browser` skill. Escalation — exhaust each before advancing:
+ 1. `exec:browser\n<js>` — query DOM/state. Always first.
+ 2. `browser` skill for full session workflows
  3. navigate/click/type — only when real events required
  4. screenshot — last resort
 
- `__gm` scaffold:
- ```js
- window.__gm = { captures: [], log: (...a) => window.__gm.captures.push({t:Date.now(),a}), assert: (l,c) => { window.__gm.captures.push({l,pass:!!c,val:c}); return !!c; }, dump: () => JSON.stringify(window.__gm.captures,null,2) };
- ```
-
  ## GROUND TRUTH
 
  Real services, real data, real timing. Mocks/fakes/stubs = delete immediately. No .test.js/.spec.js. Delete on discovery.
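
Note on the `__gm` scaffold removed in the hunk above: it is a single dense line, so an expanded reading may help when judging what the removal drops. The sketch below keeps the same three methods and capture shapes. The `installGm` wrapper and its `target` parameter are hypothetical additions so the code runs outside a browser; the original assigned directly to `window.__gm`.

```javascript
// Expanded form of the removed one-line scaffold. `installGm` and `target`
// are illustrative additions, not names from the package.
function installGm(target) {
  const gm = {
    captures: [],
    // Record arbitrary values together with a timestamp.
    log: (...a) => gm.captures.push({ t: Date.now(), a }),
    // Record a labelled check and return whether the condition was truthy.
    assert: (l, c) => {
      gm.captures.push({ l, pass: !!c, val: c });
      return !!c;
    },
    // Serialize everything captured so far as indented JSON.
    dump: () => JSON.stringify(gm.captures, null, 2),
  };
  target.__gm = gm;
  return gm;
}

const gm = installGm(globalThis); // in a page, the target would be `window`
gm.log("loaded", 42);
gm.assert("title is set", "Example Domain");
console.log(gm.dump());
```

After this change the skill text no longer mentions any capture helper, so anything comparable would have to be written inside an `exec:browser` body.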
package/tools.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "gm",
- "version": "2.0.277",
+ "version": "2.0.279",
  "description": "State machine agent with hooks, skills, and automated git enforcement",
  "tools": [
  {
@@ -1,577 +0,0 @@
- ---
- name: agent-browser
- description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
- allowed-tools: agent-browser, Bash(agent-browser:*), Bash(exec:agent-browser*)
- ---
-
- # Browser Automation with agent-browser
-
- ## Two Pathways
-
- **Browser CLI commands** — use `agent-browser:` prefix via Bash for all browser control: navigating, clicking, filling forms, taking screenshots, reading snapshots.
-
- ```
- agent-browser:
- open http://localhost:3001
- wait 2000
- snapshot -i
- ```
-
- Single commands:
-
- ```
- agent-browser:
- open http://example.com
- ```
-
- ```
- agent-browser:
- close
- ```
-
- **JS eval in browser** — use `exec:agent-browser` via Bash when you need to run JavaScript in the page context. The body is piped to `eval --stdin`. Use this for DOM inspection, custom extraction logic, or anything requiring programmatic page access.
-
- ```
- exec:agent-browser
- document.title
- ```
-
- ```
- exec:agent-browser
- JSON.stringify([...document.querySelectorAll('h1')].map(h => h.textContent))
- ```
-
- **Always close tabs when done**: every `open` is tracked. Use `agent-browser:\nclose` (or `--session <name> close`) when finished. Leaving sessions open accumulates stale tabs — the hook will warn you when other sessions are still open.
-
- ## Core Workflow
-
- Every browser automation follows this pattern:
-
- 1. **Navigate**: `agent-browser open <url>`
- 2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
- 3. **Interact**: Use refs to click, fill, select
- 4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
-
- ```bash
- agent-browser open https://example.com/form
- agent-browser snapshot -i
- # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
-
- agent-browser fill @e1 "user@example.com"
- agent-browser fill @e2 "password123"
- agent-browser click @e3
- agent-browser wait --load networkidle
- agent-browser snapshot -i # Check result
- ```
-
- ## Essential Commands
-
- ```bash
- # Navigation
- agent-browser open <url> # Navigate (aliases: goto, navigate)
- agent-browser close # Close browser
-
- # Snapshot
- agent-browser snapshot -i # Interactive elements with refs (recommended)
- agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
- agent-browser snapshot -s "#selector" # Scope to CSS selector
-
- # Interaction (use @refs from snapshot)
- agent-browser click @e1 # Click element
- agent-browser fill @e2 "text" # Clear and type text
- agent-browser type @e2 "text" # Type without clearing
- agent-browser select @e1 "option" # Select dropdown option
- agent-browser check @e1 # Check checkbox
- agent-browser press Enter # Press key
- agent-browser scroll down 500 # Scroll page
-
- # Get information
- agent-browser get text @e1 # Get element text
- agent-browser get url # Get current URL
- agent-browser get title # Get page title
-
- # Wait
- agent-browser wait @e1 # Wait for element
- agent-browser wait --load networkidle # Wait for network idle
- agent-browser wait --url "**/page" # Wait for URL pattern
- agent-browser wait 2000 # Wait milliseconds
-
- # Capture
- agent-browser screenshot # Screenshot to temp dir
- agent-browser screenshot --full # Full page screenshot
- agent-browser pdf output.pdf # Save as PDF
- ```
-
- ## Common Patterns
-
- ### Form Submission
-
- ```bash
- agent-browser open https://example.com/signup
- agent-browser snapshot -i
- agent-browser fill @e1 "Jane Doe"
- agent-browser fill @e2 "jane@example.com"
- agent-browser select @e3 "California"
- agent-browser check @e4
- agent-browser click @e5
- agent-browser wait --load networkidle
- ```
-
- ### Authentication with State Persistence
-
- ```bash
- # Login once and save state
- agent-browser open https://app.example.com/login
- agent-browser snapshot -i
- agent-browser fill @e1 "$USERNAME"
- agent-browser fill @e2 "$PASSWORD"
- agent-browser click @e3
- agent-browser wait --url "**/dashboard"
- agent-browser state save auth.json
-
- # Reuse in future sessions
- agent-browser state load auth.json
- agent-browser open https://app.example.com/dashboard
- ```
-
- ### Data Extraction
-
- ```bash
- agent-browser open https://example.com/products
- agent-browser snapshot -i
- agent-browser get text @e5 # Get specific element text
- agent-browser get text body > page.txt # Get all page text
-
- # JSON output for parsing
- agent-browser snapshot -i --json
- agent-browser get text @e1 --json
- ```
-
- ### Parallel Sessions
-
- ```bash
- agent-browser --session site1 open https://site-a.com
- agent-browser --session site2 open https://site-b.com
-
- agent-browser --session site1 snapshot -i
- agent-browser --session site2 snapshot -i
-
- agent-browser session list
- ```
-
- ### Connect to Existing Chrome
-
- ```bash
- # Auto-discover running Chrome with remote debugging enabled
- agent-browser --auto-connect open https://example.com
- agent-browser --auto-connect snapshot
-
- # Or with explicit CDP port
- agent-browser --cdp 9222 snapshot
- ```
-
- ### Visual Browser (Headed Mode)
-
- Use `--headed` as the first flag on the first line — it propagates to all commands in the block:
-
- ```
- agent-browser:
- --headed open https://example.com
- wait --load networkidle
- snapshot -i
- ```
-
- ```
- agent-browser:
- --headed open https://example.com
- highlight @e1
- record start demo.webm
- ```
-
- ### Local Files (PDFs, HTML)
-
- ```bash
- # Open local files with file:// URLs
- agent-browser --allow-file-access open file:///path/to/document.pdf
- agent-browser --allow-file-access open file:///path/to/page.html
- agent-browser screenshot output.png
- ```
-
- ### iOS Simulator (Mobile Safari)
-
- ```bash
- # List available iOS simulators
- agent-browser device list
-
- # Launch Safari on a specific device
- agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
-
- # Same workflow as desktop - snapshot, interact, re-snapshot
- agent-browser -p ios snapshot -i
- agent-browser -p ios tap @e1 # Tap (alias for click)
- agent-browser -p ios fill @e2 "text"
- agent-browser -p ios swipe up # Mobile-specific gesture
-
- # Take screenshot
- agent-browser -p ios screenshot mobile.png
-
- # Close session (shuts down simulator)
- agent-browser -p ios close
- ```
-
- **Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
-
- **Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
-
- ## Ref Lifecycle (Important)
-
- Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
-
- - Clicking links or buttons that navigate
- - Form submissions
- - Dynamic content loading (dropdowns, modals)
-
- ```bash
- agent-browser click @e5 # Navigates to new page
- agent-browser snapshot -i # MUST re-snapshot
- agent-browser click @e1 # Use new refs
- ```
-
- ## Semantic Locators (Alternative to Refs)
-
- When refs are unavailable or unreliable, use semantic locators:
-
- ```bash
- agent-browser find text "Sign In" click
- agent-browser find label "Email" fill "user@test.com"
- agent-browser find role button click --name "Submit"
- agent-browser find placeholder "Search" type "query"
- agent-browser find testid "submit-btn" click
- ```
-
- ## JavaScript Evaluation (exec pathway)
-
- Use this pathway when you need to run JavaScript in the browser context — not ordinary CLI commands. This goes through `exec:agent-browser` via Bash, which pipes your code to `agent-browser eval --stdin`. **Shell quoting can corrupt complex expressions** — use the heredoc form.
-
- Use `exec:agent-browser` via Bash. The code body is piped directly to `agent-browser eval --stdin` — no shell, no escaping.
-
- ```
- exec:agent-browser
- document.title
- ```
-
- ```
- exec:agent-browser
- JSON.stringify(
- Array.from(document.querySelectorAll("img"))
- .filter(i => !i.alt)
- .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
- )
- ```
-
- Never base64-encode the code. Never add `agent-browser eval` flags. Write plain JavaScript directly as the exec body.
-
- ## Complete Command Reference
-
- ### Core Navigation & Lifecycle
- ```bash
- agent-browser open <url> # Navigate (aliases: goto, navigate)
- agent-browser close # Close browser (aliases: quit, exit)
- agent-browser back # Go back
- agent-browser forward # Go forward
- agent-browser reload # Reload page
- ```
-
- ### Snapshots & Element References
- ```bash
- agent-browser snapshot # Accessibility tree with semantic refs
- agent-browser snapshot -i # Interactive elements with @e refs
- agent-browser snapshot -i -C # Include cursor-interactive divs (onclick, pointer)
- agent-browser snapshot -s "#sel" # Scope snapshot to CSS selector
- agent-browser snapshot --json # JSON output for parsing
- ```
-
- ### Interaction - Click, Fill, Type, Select
- ```bash
- agent-browser click <sel> # Click element
- agent-browser click <sel> --new-tab # Open link in new tab
- agent-browser dblclick <sel> # Double-click
- agent-browser focus <sel> # Focus element
- agent-browser type <sel> <text> # Type into element (append)
- agent-browser fill <sel> <text> # Clear and fill
- agent-browser select <sel> <val> # Select dropdown option
- agent-browser check <sel> # Check checkbox
- agent-browser uncheck <sel> # Uncheck checkbox
- agent-browser press <key> # Press key (Enter, Tab, Control+a, etc.) (alias: key)
- ```
-
- ### Keyboard & Text Input
- ```bash
- agent-browser keyboard type <text> # Type with real keystrokes (no selector, uses focus)
- agent-browser keyboard inserttext <text> # Insert text without triggering key events
- agent-browser keydown <key> # Hold key down
- agent-browser keyup <key> # Release key
- ```
-
- ### Mouse & Drag
- ```bash
- agent-browser hover <sel> # Hover element
- agent-browser drag <src> <tgt> # Drag and drop
- agent-browser mouse move <x> <y> # Move mouse to coordinates
- agent-browser mouse down [button] # Press mouse button (left/right/middle)
- agent-browser mouse up [button] # Release mouse button
- agent-browser mouse wheel <dy> [dx] # Scroll wheel
- ```
-
- ### Scrolling & Viewport
- ```bash
- agent-browser scroll <dir> [px] # Scroll (up/down/left/right, optional px)
- agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
- agent-browser set viewport <w> <h> # Set viewport size (e.g., 1920 1080)
- agent-browser set device <name> # Emulate device (e.g., "iPhone 14")
- ```
-
- ### Get Information
- ```bash
- agent-browser get text <sel> # Get text content
- agent-browser get html <sel> # Get innerHTML
- agent-browser get value <sel> # Get input value
- agent-browser get attr <sel> <attr> # Get attribute value
- agent-browser get title # Get page title
- agent-browser get url # Get current URL
- agent-browser get count <sel> # Count matching elements
- agent-browser get box <sel> # Get bounding box {x, y, width, height}
- agent-browser get styles <sel> # Get computed CSS styles
- ```
-
- ### Check State
- ```bash
- agent-browser is visible <sel> # Check if visible
- agent-browser is enabled <sel> # Check if enabled (not disabled)
- agent-browser is checked <sel> # Check if checked (checkbox/radio)
- ```
-
- ### File Operations
- ```bash
- agent-browser upload <sel> <files> # Upload files to file input
- agent-browser screenshot [path] # Screenshot to temp or custom path
- agent-browser screenshot --full # Full page screenshot
- agent-browser screenshot --annotate # Annotated with numbered element labels
- agent-browser pdf <path> # Save as PDF
- ```
-
- ### Semantic Locators (Alternative to Selectors)
- ```bash
- agent-browser find role <role> <action> [value] # By ARIA role
- agent-browser find text <text> <action> # By text content
- agent-browser find label <label> <action> [value] # By form label
- agent-browser find placeholder <ph> <action> [value] # By placeholder text
- agent-browser find alt <text> <action> # By alt text
- agent-browser find title <text> <action> # By title attribute
- agent-browser find testid <id> <action> [value] # By data-testid
- agent-browser find first <sel> <action> [value] # First matching element
- agent-browser find last <sel> <action> [value] # Last matching element
- agent-browser find nth <n> <sel> <action> [value] # Nth matching element
-
- # Role examples: button, link, textbox, combobox, checkbox, radio, heading, list, etc.
- # Actions: click, fill, type, hover, focus, check, uncheck, text
- # Options: --name <name> (filter by accessible name), --exact (exact text match)
- ```
-
- ### Waiting
- ```bash
- agent-browser wait <selector> # Wait for element to be visible
- agent-browser wait <ms> # Wait for time in milliseconds
- agent-browser wait --text "Welcome" # Wait for text to appear
- agent-browser wait --url "**/dash" # Wait for URL pattern
- agent-browser wait --load networkidle # Wait for load state (load, domcontentloaded, networkidle)
- agent-browser wait --fn "window.ready === true" # Wait for JS condition
- ```
-
- ### JavaScript Evaluation (exec pathway — not a direct tool command)
- ```
- exec:agent-browser
- <plain JS>
- ```
- Use `exec:agent-browser` via Bash. Code is piped to `agent-browser eval --stdin`. No base64, no flags.
-
- ### Browser Environment
- ```bash
- agent-browser set geo <lat> <lng> # Set geolocation
- agent-browser set offline [on|off] # Toggle offline mode
- agent-browser set headers <json> # Set HTTP headers
- agent-browser set credentials <u> <p> # HTTP basic auth
- agent-browser set media [dark|light] # Emulate color scheme (prefers-color-scheme)
- ```
-
- ### Cookies & Storage
- ```bash
- agent-browser cookies # Get all cookies
- agent-browser cookies set <name> <val> # Set cookie
- agent-browser cookies clear # Clear cookies
- agent-browser storage local # Get all localStorage
- agent-browser storage local <key> # Get specific key
- agent-browser storage local set <k> <v> # Set value
- agent-browser storage local clear # Clear all localStorage
- agent-browser storage session # Same for sessionStorage
- agent-browser storage session <key> # Get sessionStorage key
- agent-browser storage session set <k> <v> # Set sessionStorage
- agent-browser storage session clear # Clear sessionStorage
- ```
-
- ### Network & Interception
- ```bash
- agent-browser network route <url> # Intercept requests
- agent-browser network route <url> --abort # Block requests
- agent-browser network route <url> --body <json> # Mock response with JSON
- agent-browser network unroute [url] # Remove routes
- agent-browser network requests # View tracked requests
- agent-browser network requests --filter api # Filter by keyword
- ```
-
- ### Tabs & Windows
- ```bash
- agent-browser tab # List active tabs
- agent-browser tab new [url] # Open new tab (optionally with URL)
- agent-browser tab <n> # Switch to tab n
- agent-browser tab close [n] # Close tab (current or specific)
- agent-browser window new # Open new window
- ```
-
- ### Frames
- ```bash
- agent-browser frame <sel> # Switch to iframe by selector
- agent-browser frame main # Switch back to main frame
- ```
-
- ### Dialogs
- ```bash
- agent-browser dialog accept [text] # Accept alert/confirm (with optional prompt text)
- agent-browser dialog dismiss # Dismiss dialog
- ```
-
- ### State Persistence (Auth, Sessions)
- ```bash
- agent-browser state save <path> # Save authenticated session
- agent-browser state load <path> # Load session state
- agent-browser state list # List saved state files
- agent-browser state show <file> # Show state summary
- agent-browser state rename <old> <new> # Rename state
- agent-browser state clear [name] # Clear specific session
- agent-browser state clear --all # Clear all states
- agent-browser state clean --older-than <days> # Delete old states
- ```
-
- ### Debugging & Analysis
- ```bash
- agent-browser highlight <sel> # Highlight element visually
- agent-browser console # View console messages (log, error, warn)
- agent-browser console --clear # Clear console
- agent-browser errors # View JavaScript errors
- agent-browser errors --clear # Clear errors
- agent-browser trace start [path] # Start DevTools trace
- agent-browser trace stop [path] # Stop and save trace
- agent-browser profiler start # Start Chrome DevTools profiler
- agent-browser profiler stop [path] # Stop and save .json profile
- ```
-
- ### Visual Debugging
- ```
- agent-browser:
- --headed open <url>
- record start <file.webm>
- ```
- ```
- agent-browser:
- record stop
- ```
-
- ### Comparisons & Diffs
- ```bash
- agent-browser diff snapshot # Compare current vs last snapshot
- agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot
- agent-browser diff snapshot --selector "#main" --compact # Scoped diff
- agent-browser diff screenshot --baseline before.png # Visual pixel diff
- agent-browser diff screenshot --baseline b.png -o d.png # Save diff to custom path
- agent-browser diff screenshot --baseline b.png -t 0.2 # Color threshold 0-1
- agent-browser diff url https://v1.com https://v2.com # Compare two URLs
- agent-browser diff url https://v1.com https://v2.com --screenshot # With visual diff
- agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scoped
- ```
-
- ### Sessions & Parallelism
- ```bash
- agent-browser --session <name> <cmd> # Run in named session (isolated instance)
- agent-browser session list # List active sessions
- agent-browser session show # Show current session
- # Example: agent-browser --session agent1 open site.com
- # agent-browser --session agent2 open other.com
- ```
-
- ### Browser Connection
- ```bash
- agent-browser connect <port> # Connect via Chrome DevTools Protocol
- agent-browser --auto-connect open <url> # Auto-discover running Chrome
- agent-browser --cdp 9222 <cmd> # Explicit CDP port
- ```
-
- ### Setup & Installation
- ```bash
- agent-browser install # Download Chromium browser
- agent-browser install --with-deps # Also install system dependencies (Linux)
- ```
-
- ### Advanced: Local Files & Protocols
- ```bash
- agent-browser --allow-file-access open file:///path/to/file.pdf
- agent-browser --allow-file-access open file:///path/to/page.html
- ```
-
- ### Advanced: iOS/Mobile Testing
- ```bash
- agent-browser device list # List available iOS simulators
- agent-browser -p ios --device "iPhone 16 Pro" open <url> # Launch on device
- agent-browser -p ios snapshot -i # Snapshot on iOS
- agent-browser -p ios tap @e1 # Tap (alias for click)
- agent-browser -p ios swipe up # Mobile gestures
- agent-browser -p ios screenshot mobile.png
- agent-browser -p ios close # Close simulator
- # Requires: macOS, Xcode, Appium (npm install -g appium && appium driver install xcuitest)
- ```
-
- ## Windows: "Daemon not found" Fix
-
- If `agent-browser` fails with `Daemon not found. Set AGENT_BROWSER_HOME environment variable or run from project directory.` on Windows, the `AGENT_BROWSER_HOME` env var is missing or pointing to the wrong path. It must point to the npm package directory containing `dist/daemon.js`:
-
- ```cmd
- :: Find and set the correct path (run in cmd or PowerShell)
- for /f "delims=" %i in ('npm root -g') do setx AGENT_BROWSER_HOME "%i\agent-browser"
- ```
-
- Open a new terminal after running. See `references/windows-troubleshooting.md` for details on stale port files and Git Bash setup.
-
- ## Key Patterns for Agents
-
- **Always use agent-browser instead of puppeteer, playwright, or playwright-core** — it has the same capabilities with simpler syntax and better integration with AI agents.
-
- **Which pathway to use**:
- - Ordinary browser control (navigation, clicking, forms, screenshots) → call the `agent-browser` tool directly
- - Need to run JavaScript in the page → use `exec:agent-browser` via Bash with plain JS as the body
-
- **Multi-step workflows** (ordinary pathway — direct tool calls):
- 1. `agent-browser open <url>`
- 2. `agent-browser snapshot -i` (get refs)
- 3. `agent-browser fill @e1 "value"`
- 4. `agent-browser click @e2`
- 5. `agent-browser wait --load networkidle` (after navigation)
- 6. `agent-browser snapshot -i` (re-snapshot for new refs)
-
- **JavaScript inspection** (exec pathway — when you need page access):
- ```
- exec:agent-browser
- document.title
- ```
-
- **Debugging complex interactions**: Use headed mode — put `--headed` as the first flag on the first line of an `agent-browser:` block. It propagates to all subsequent commands in the block.
-
- **Ground truth verification**: Use the ordinary pathway (`agent-browser screenshot`) for visual confirmation; use the exec pathway for JavaScript-level inspection.
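
Note on moving from the removed skill to the added one: the multi-step workflow of the removed `agent-browser` skill (open, snapshot, fill, click, wait for network idle, re-snapshot) has a step-for-step counterpart in the JavaScript expressions the added `browser` skill uses. The sketch below pairs them up and emits an `exec:browser` body. It is a reading aid, not a migration guide from the package: the step labels (`open`, `fill`, `waitLoad`, and so on) are hypothetical, and it assumes the exec body accepts the same `page` calls that the session examples show. The removed skill addressed elements by refs such as `@e1`, while the added skill's examples use CSS selectors, so selectors have to be supplied by hand.

```javascript
// Illustrative only: step labels are hypothetical, borrowed from the removed
// agent-browser commands. They are not an API of this package or of playwriter.
const lit = (value) => JSON.stringify(value); // render a JS string literal

const EXPRESSIONS = {
  open: (s) => `await page.goto(${lit(s.url)})`,
  snapshot: () => "await snapshot({ page })",
  fill: (s) => `await page.fill(${lit(s.selector)}, ${lit(s.value)})`,
  click: (s) => `await page.click(${lit(s.selector)})`,
  waitLoad: (s) => `await page.waitForLoadState(${lit(s.state)})`,
};

// Returns an exec:browser block: the header line followed by plain JavaScript,
// with no shell quoting added to the body.
function execBrowserBody(steps) {
  const lines = steps.map((step) => {
    if (!Object.prototype.hasOwnProperty.call(EXPRESSIONS, step.op)) {
      throw new Error(`unknown step: ${step.op}`);
    }
    return EXPRESSIONS[step.op](step);
  });
  return ["exec:browser", ...lines].join("\n");
}

console.log(execBrowserBody([
  { op: "open", url: "https://example.com/form" },
  { op: "snapshot" },
  { op: "fill", selector: "[name=email]", value: "user@example.com" },
  { op: "click", selector: "[type=submit]" },
  { op: "waitLoad", state: "networkidle" },
  { op: "snapshot" },
]));
```

Commands of the removed skill that have no counterpart in the added text, such as `state save`, `diff`, or the iOS options, are deliberately left out rather than guessed at.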