gm-copilot-cli 2.0.278 → 2.0.279
- package/copilot-profile.md +1 -1
- package/index.html +1 -1
- package/manifest.yml +1 -1
- package/package.json +1 -1
- package/skills/browser/SKILL.md +171 -0
- package/skills/gm/SKILL.md +4 -4
- package/skills/gm-complete/SKILL.md +1 -1
- package/skills/gm-emit/SKILL.md +1 -1
- package/skills/gm-execute/SKILL.md +3 -8
- package/tools.json +1 -1
- package/skills/agent-browser/SKILL.md +0 -577
package/copilot-profile.md
CHANGED
package/index.html
CHANGED
|
@@ -18,7 +18,7 @@
|
|
|
18
18
|
<script type="module">
|
|
19
19
|
import { createElement as h, applyDiff, Fragment } from "webjsx";
|
|
20
20
|
const PLATFORM_NAME="Copilot CLI",PLATFORM_TYPE="CLI Tool",PLATFORM_TYPE_COLOR="#3b82f6";
|
|
21
|
-
const DESCRIPTION="State machine agent with hooks, skills, and automated git enforcement",VERSION="2.0.
|
|
21
|
+
const DESCRIPTION="State machine agent with hooks, skills, and automated git enforcement",VERSION="2.0.279";
|
|
22
22
|
const GITHUB_URL="https://github.com/AnEntrypoint/gm-copilot-cli",BADGE_LABEL="copilot-cli";
|
|
23
23
|
const FEATURES=[{"title":"State Machine","desc":"Immutable PLAN→EXECUTE→EMIT→VERIFY→COMPLETE phases with full mutable tracking"},{"title":"Semantic Search","desc":"Natural language codebase exploration via codesearch skill — no grep needed"},{"title":"Hooks","desc":"Pre-tool, session-start, prompt-submit, and stop hooks for full lifecycle control"},{"title":"Agents","desc":"gm, codesearch, and websearch agents pre-configured and ready to use"},{"title":"MCP Integration","desc":"Model Context Protocol server support built in"},{"title":"Auto-Recovery","desc":"Supervisor hierarchy ensures the system never crashes"}],INSTALL_STEPS=[{"desc":"Install via GitHub CLI","cmd":"gh extension install AnEntrypoint/gm-copilot-cli"},{"desc":"Restart your terminal — activates automatically"}];
|
|
24
24
|
const CURRENT_PLATFORM="gm-copilot-cli";
|
package/manifest.yml
CHANGED
package/package.json
CHANGED
package/skills/browser/SKILL.md
ADDED
|
@@ -0,0 +1,171 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: browser
|
|
3
|
+
description: Browser automation via playwriter. Use when user needs to interact with websites, navigate pages, fill forms, click buttons, take screenshots, extract data, test web apps, or automate any browser task.
|
|
4
|
+
allowed-tools: Bash(browser:*), Bash(exec:browser*)
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Browser Automation with playwriter
|
|
8
|
+
|
|
9
|
+
## Two Pathways
|
|
10
|
+
|
|
11
|
+
**Session commands** — use `browser:` prefix via Bash for all browser control.
|
|
12
|
+
|
|
13
|
+
Create a session first, then run commands against it:
|
|
14
|
+
|
|
15
|
+
```
|
|
16
|
+
browser:
|
|
17
|
+
playwriter session new
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
Returns a numeric session ID (e.g. `1`). Use that ID for all subsequent commands.
|
|
21
|
+
|
|
22
|
+
```
|
|
23
|
+
browser:
|
|
24
|
+
playwriter -s 1 -e 'await page.goto("https://example.com")'
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
```
|
|
28
|
+
browser:
|
|
29
|
+
playwriter -s 1 -e 'await snapshot({ page })'
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
```
|
|
33
|
+
browser:
|
|
34
|
+
playwriter -s 1 -e 'await screenshotWithAccessibilityLabels({ page })'
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
State persists across calls within a session:
|
|
38
|
+
|
|
39
|
+
```
|
|
40
|
+
browser:
|
|
41
|
+
playwriter -s 1 -e 'state.x = 1'
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
```
|
|
45
|
+
browser:
|
|
46
|
+
playwriter -s 1 -e 'console.log(state.x)'
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
List active sessions:
|
|
50
|
+
|
|
51
|
+
```
|
|
52
|
+
browser:
|
|
53
|
+
playwriter session list
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
**JS eval in browser** — use `exec:browser` via Bash when you need to run JavaScript in the page context directly.
|
|
57
|
+
|
|
58
|
+
```
|
|
59
|
+
exec:browser
|
|
60
|
+
await page.goto('https://example.com')
|
|
61
|
+
await snapshot({ page })
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
```
|
|
65
|
+
exec:browser
|
|
66
|
+
const title = await page.title()
|
|
67
|
+
console.log(title)
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
Always use single quotes for the `-e` argument to avoid shell quoting issues.
|
|
71
|
+
|
|
72
|
+
## Core Workflow
|
|
73
|
+
|
|
74
|
+
Every browser automation follows this pattern:
|
|
75
|
+
|
|
76
|
+
1. **Create session**: `playwriter session new` (note the returned ID)
|
|
77
|
+
2. **Navigate**: `playwriter -s <id> -e 'await page.goto("https://example.com")'`
|
|
78
|
+
3. **Snapshot**: `playwriter -s <id> -e 'await snapshot({ page })'`
|
|
79
|
+
4. **Interact**: click, fill, type via JS expressions
|
|
80
|
+
5. **Re-snapshot**: after navigation or DOM changes
|
|
81
|
+
|
|
82
|
+
```
|
|
83
|
+
browser:
|
|
84
|
+
playwriter session new
|
|
85
|
+
playwriter -s 1 -e 'await page.goto("https://example.com/form")'
|
|
86
|
+
playwriter -s 1 -e 'await snapshot({ page })'
|
|
87
|
+
playwriter -s 1 -e 'await page.fill("[name=email]", "user@example.com")'
|
|
88
|
+
playwriter -s 1 -e 'await page.click("[type=submit]")'
|
|
89
|
+
playwriter -s 1 -e 'await page.waitForLoadState("networkidle")'
|
|
90
|
+
playwriter -s 1 -e 'await snapshot({ page })'
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
## Common Patterns
|
|
94
|
+
|
|
95
|
+
### Navigation and Snapshot
|
|
96
|
+
|
|
97
|
+
```
|
|
98
|
+
browser:
|
|
99
|
+
playwriter session new
|
|
100
|
+
playwriter -s 1 -e 'await page.goto("https://example.com")'
|
|
101
|
+
playwriter -s 1 -e 'await snapshot({ page })'
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
### Screenshot with Accessibility Labels
|
|
105
|
+
|
|
106
|
+
```
|
|
107
|
+
browser:
|
|
108
|
+
playwriter -s 1 -e 'await screenshotWithAccessibilityLabels({ page })'
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
### Data Extraction
|
|
112
|
+
|
|
113
|
+
```
|
|
114
|
+
exec:browser
|
|
115
|
+
await page.goto('https://example.com/products')
|
|
116
|
+
const items = await page.$$eval('.product-title', els => els.map(e => e.textContent))
|
|
117
|
+
console.log(JSON.stringify(items))
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
### Persistent State Across Steps
|
|
121
|
+
|
|
122
|
+
```
|
|
123
|
+
browser:
|
|
124
|
+
playwriter -s 1 -e 'state.loginDone = false'
|
|
125
|
+
playwriter -s 1 -e 'await page.goto("https://app.example.com/login")'
|
|
126
|
+
playwriter -s 1 -e 'await page.fill("[name=user]", "admin")'
|
|
127
|
+
playwriter -s 1 -e 'await page.fill("[name=pass]", "secret")'
|
|
128
|
+
playwriter -s 1 -e 'await page.click("[type=submit]")'
|
|
129
|
+
playwriter -s 1 -e 'state.loginDone = true'
|
|
130
|
+
```
|
|
131
|
+
|
|
132
|
+
### Multiple Sessions
|
|
133
|
+
|
|
134
|
+
```
|
|
135
|
+
browser:
|
|
136
|
+
playwriter session new
|
|
137
|
+
playwriter session new
|
|
138
|
+
playwriter -s 1 -e 'await page.goto("https://site-a.com")'
|
|
139
|
+
playwriter -s 2 -e 'await page.goto("https://site-b.com")'
|
|
140
|
+
playwriter session list
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
## JavaScript Evaluation (exec pathway)
|
|
144
|
+
|
|
145
|
+
Use `exec:browser` via Bash when you need direct page access. The body is plain JavaScript executed in the browser context.
|
|
146
|
+
|
|
147
|
+
```
|
|
148
|
+
exec:browser
|
|
149
|
+
await page.goto('https://example.com')
|
|
150
|
+
await snapshot({ page })
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
```
|
|
154
|
+
exec:browser
|
|
155
|
+
const links = await page.$$eval('a', els => els.map(e => e.href))
|
|
156
|
+
console.log(JSON.stringify(links))
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
Never add shell quoting or escaping to the exec body — write plain JavaScript directly.
|
|
160
|
+
|
|
161
|
+
## Key Patterns for Agents
|
|
162
|
+
|
|
163
|
+
**Which pathway to use**:
|
|
164
|
+
- Multi-step session workflows → `browser:` prefix with `playwriter -s <id> -e '...'`
|
|
165
|
+
- Quick JS eval or data extraction → `exec:browser` with plain JS body
|
|
166
|
+
|
|
167
|
+
**Always use single quotes** for the `-e` argument to playwriter to avoid shell interpretation.
|
|
168
|
+
|
|
169
|
+
**Session IDs are numeric**: `playwriter session new` returns `1`, `2`, etc. Use the exact returned value.
|
|
170
|
+
|
|
171
|
+
**Snapshot before interacting**: always call `await snapshot({ page })` to understand current page state before clicking or filling.
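The single-quote rule in the added skill text can be illustrated with a small sketch. The helpers `shellSingleQuote` and `playwriterCommand` are hypothetical names introduced here for illustration; they are not part of gm-copilot-cli or playwriter. The sketch assumes a POSIX shell, where a single quote inside a single-quoted string is written as `'\''`, and it only builds the `playwriter -s <id> -e '...'` command string shown above without running anything.

```javascript
// Hypothetical helpers for illustration; not part of gm-copilot-cli or playwriter.

// Quote a string for a POSIX shell: wrap it in single quotes and rewrite
// each embedded ' as '\'' (close the quote, escaped quote, reopen).
function shellSingleQuote(text) {
  return "'" + text.replace(/'/g, "'\\''") + "'";
}

// Build the `playwriter -s <id> -e '<js>'` form used in the skill examples.
function playwriterCommand(sessionId, js) {
  if (!Number.isInteger(sessionId)) {
    throw new TypeError("session ID must be an integer");
  }
  return `playwriter -s ${sessionId} -e ${shellSingleQuote(js)}`;
}

// Double quotes inside the expression need no escaping.
console.log(playwriterCommand(1, 'await page.goto("https://example.com")'));
// Single quotes inside the expression must be escaped for the shell.
console.log(playwriterCommand(1, "await page.click('[type=submit]')"));
```

Writing selectors and URLs with double quotes inside the expression, as the skill examples do, avoids the escaping case entirely.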
|
package/skills/gm/SKILL.md
CHANGED
|
@@ -84,9 +84,9 @@ Alias: `exec:search`. **Glob, Grep, Read, Explore, WebSearch are hook-blocked**
|
|
|
84
84
|
|
|
85
85
|
## BROWSER AUTOMATION
|
|
86
86
|
|
|
87
|
-
Invoke `
|
|
88
|
-
1. `exec:
|
|
89
|
-
2. `
|
|
87
|
+
Invoke `browser` skill. Escalation — exhaust each before advancing:
|
|
88
|
+
1. `exec:browser\n<js>` — query DOM/state via JS
|
|
89
|
+
2. `browser` skill — for full session workflows
|
|
90
90
|
3. navigate/click/type — only when real events required
|
|
91
91
|
4. screenshot — last resort only
|
|
92
92
|
|
|
@@ -97,7 +97,7 @@ Invoke `agent-browser` skill. Escalation — exhaust each before advancing:
|
|
|
97
97
|
**`gm-emit`** — Write files to disk when all mutables resolved.
|
|
98
98
|
**`gm-complete`** — End-to-end verification and git enforcement.
|
|
99
99
|
**`update-docs`** — Refresh README, CLAUDE.md, and docs to reflect session changes. Invoked by `gm-complete`.
|
|
100
|
-
**`
|
|
100
|
+
**`browser`** — Browser automation. Invoke inside EXECUTE for all browser/UI work.
|
|
101
101
|
|
|
102
102
|
## DO NOT STOP
|
|
103
103
|
|
package/skills/gm-complete/SKILL.md
CHANGED
|
@@ -47,7 +47,7 @@ const { fn } = await import('/abs/path/to/module.js');
|
|
|
47
47
|
console.log(await fn(realInput));
|
|
48
48
|
```
|
|
49
49
|
|
|
50
|
-
For browser/UI: invoke `
|
|
50
|
+
For browser/UI: invoke `browser` skill with real workflows. Server + client features require both exec:nodejs AND browser. After every success: enumerate what remains — never stop at first green.
|
|
51
51
|
|
|
52
52
|
## CODE EXECUTION
|
|
53
53
|
|
package/skills/gm-emit/SKILL.md
CHANGED
|
@@ -81,7 +81,7 @@ Alias: `exec:search`. **Glob, Grep, Read, Explore are hook-blocked** — use `ex
|
|
|
81
81
|
|
|
82
82
|
## BROWSER DEBUGGING
|
|
83
83
|
|
|
84
|
-
Invoke `
|
|
84
|
+
Invoke `browser` skill. Escalation: (1) `exec:browser\n<js>` → (2) `browser` skill → (3) navigate/click → (4) screenshot last resort.
|
|
85
85
|
|
|
86
86
|
## SELF-CHECK (before and after each file)
|
|
87
87
|
|
package/skills/gm-execute/SKILL.md
CHANGED
|
@@ -93,17 +93,12 @@ Step failure revealing new unknown → snake to `planning`.
|
|
|
93
93
|
|
|
94
94
|
## BROWSER DEBUGGING
|
|
95
95
|
|
|
96
|
-
Invoke `
|
|
97
|
-
1. `exec:
|
|
98
|
-
2. `
|
|
96
|
+
Invoke `browser` skill. Escalation — exhaust each before advancing:
|
|
97
|
+
1. `exec:browser\n<js>` — query DOM/state. Always first.
|
|
98
|
+
2. `browser` skill — for full session workflows
|
|
99
99
|
3. navigate/click/type — only when real events required
|
|
100
100
|
4. screenshot — last resort
|
|
101
101
|
|
|
102
|
-
`__gm` scaffold:
|
|
103
|
-
```js
|
|
104
|
-
window.__gm = { captures: [], log: (...a) => window.__gm.captures.push({t:Date.now(),a}), assert: (l,c) => { window.__gm.captures.push({l,pass:!!c,val:c}); return !!c; }, dump: () => JSON.stringify(window.__gm.captures,null,2) };
|
|
105
|
-
```
|
|
106
|
-
|
|
107
102
|
## GROUND TRUTH
|
|
108
103
|
|
|
109
104
|
Real services, real data, real timing. Mocks/fakes/stubs = delete immediately. No .test.js/.spec.js. Delete on discovery.
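For context on the `__gm` scaffold removed in this hunk, the following is a sketch of the same object expanded over several lines with comments. It assumes `globalThis` as a stand-in for `window` so that it can run outside a browser; the name `root` and the sample calls at the end are illustrative and do not appear in the package.

```javascript
// Sketch only: `root` stands in for the browser's `window` (an assumption
// so this runs under Node); in a page it would be `window` itself.
const root = globalThis;

root.__gm = {
  captures: [],
  // Record arbitrary values together with a timestamp.
  log: (...a) => root.__gm.captures.push({ t: Date.now(), a }),
  // Record a labelled check and return whether the value was truthy.
  assert: (l, c) => {
    root.__gm.captures.push({ l, pass: !!c, val: c });
    return !!c;
  },
  // Serialize everything captured so far as indented JSON.
  dump: () => JSON.stringify(root.__gm.captures, null, 2),
};

// Illustrative usage: one log entry and one passing check.
root.__gm.log("clicked", 42);
const ok = root.__gm.assert("title is set", "Example Domain");
console.log(ok, root.__gm.captures.length);
```

The object only accumulates entries in memory; nothing is reported until `dump()` is called.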
|
package/tools.json
CHANGED
package/skills/agent-browser/SKILL.md
REMOVED
|
@@ -1,577 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
name: agent-browser
|
|
3
|
-
description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
|
|
4
|
-
allowed-tools: agent-browser, Bash(agent-browser:*), Bash(exec:agent-browser*)
|
|
5
|
-
---
|
|
6
|
-
|
|
7
|
-
# Browser Automation with agent-browser
|
|
8
|
-
|
|
9
|
-
## Two Pathways
|
|
10
|
-
|
|
11
|
-
**Browser CLI commands** — use `agent-browser:` prefix via Bash for all browser control: navigating, clicking, filling forms, taking screenshots, reading snapshots.
|
|
12
|
-
|
|
13
|
-
```
|
|
14
|
-
agent-browser:
|
|
15
|
-
open http://localhost:3001
|
|
16
|
-
wait 2000
|
|
17
|
-
snapshot -i
|
|
18
|
-
```
|
|
19
|
-
|
|
20
|
-
Single commands:
|
|
21
|
-
|
|
22
|
-
```
|
|
23
|
-
agent-browser:
|
|
24
|
-
open http://example.com
|
|
25
|
-
```
|
|
26
|
-
|
|
27
|
-
```
|
|
28
|
-
agent-browser:
|
|
29
|
-
close
|
|
30
|
-
```
|
|
31
|
-
|
|
32
|
-
**JS eval in browser** — use `exec:agent-browser` via Bash when you need to run JavaScript in the page context. The body is piped to `eval --stdin`. Use this for DOM inspection, custom extraction logic, or anything requiring programmatic page access.
|
|
33
|
-
|
|
34
|
-
```
|
|
35
|
-
exec:agent-browser
|
|
36
|
-
document.title
|
|
37
|
-
```
|
|
38
|
-
|
|
39
|
-
```
|
|
40
|
-
exec:agent-browser
|
|
41
|
-
JSON.stringify([...document.querySelectorAll('h1')].map(h => h.textContent))
|
|
42
|
-
```
|
|
43
|
-
|
|
44
|
-
**Always close tabs when done**: every `open` is tracked. Use `agent-browser:\nclose` (or `--session <name> close`) when finished. Leaving sessions open accumulates stale tabs — the hook will warn you when other sessions are still open.
|
|
45
|
-
|
|
46
|
-
## Core Workflow
|
|
47
|
-
|
|
48
|
-
Every browser automation follows this pattern:
|
|
49
|
-
|
|
50
|
-
1. **Navigate**: `agent-browser open <url>`
|
|
51
|
-
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
|
|
52
|
-
3. **Interact**: Use refs to click, fill, select
|
|
53
|
-
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
|
|
54
|
-
|
|
55
|
-
```bash
|
|
56
|
-
agent-browser open https://example.com/form
|
|
57
|
-
agent-browser snapshot -i
|
|
58
|
-
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
|
|
59
|
-
|
|
60
|
-
agent-browser fill @e1 "user@example.com"
|
|
61
|
-
agent-browser fill @e2 "password123"
|
|
62
|
-
agent-browser click @e3
|
|
63
|
-
agent-browser wait --load networkidle
|
|
64
|
-
agent-browser snapshot -i # Check result
|
|
65
|
-
```
|
|
66
|
-
|
|
67
|
-
## Essential Commands
|
|
68
|
-
|
|
69
|
-
```bash
|
|
70
|
-
# Navigation
|
|
71
|
-
agent-browser open <url> # Navigate (aliases: goto, navigate)
|
|
72
|
-
agent-browser close # Close browser
|
|
73
|
-
|
|
74
|
-
# Snapshot
|
|
75
|
-
agent-browser snapshot -i # Interactive elements with refs (recommended)
|
|
76
|
-
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
|
|
77
|
-
agent-browser snapshot -s "#selector" # Scope to CSS selector
|
|
78
|
-
|
|
79
|
-
# Interaction (use @refs from snapshot)
|
|
80
|
-
agent-browser click @e1 # Click element
|
|
81
|
-
agent-browser fill @e2 "text" # Clear and type text
|
|
82
|
-
agent-browser type @e2 "text" # Type without clearing
|
|
83
|
-
agent-browser select @e1 "option" # Select dropdown option
|
|
84
|
-
agent-browser check @e1 # Check checkbox
|
|
85
|
-
agent-browser press Enter # Press key
|
|
86
|
-
agent-browser scroll down 500 # Scroll page
|
|
87
|
-
|
|
88
|
-
# Get information
|
|
89
|
-
agent-browser get text @e1 # Get element text
|
|
90
|
-
agent-browser get url # Get current URL
|
|
91
|
-
agent-browser get title # Get page title
|
|
92
|
-
|
|
93
|
-
# Wait
|
|
94
|
-
agent-browser wait @e1 # Wait for element
|
|
95
|
-
agent-browser wait --load networkidle # Wait for network idle
|
|
96
|
-
agent-browser wait --url "**/page" # Wait for URL pattern
|
|
97
|
-
agent-browser wait 2000 # Wait milliseconds
|
|
98
|
-
|
|
99
|
-
# Capture
|
|
100
|
-
agent-browser screenshot # Screenshot to temp dir
|
|
101
|
-
agent-browser screenshot --full # Full page screenshot
|
|
102
|
-
agent-browser pdf output.pdf # Save as PDF
|
|
103
|
-
```
|
|
104
|
-
|
|
105
|
-
## Common Patterns
|
|
106
|
-
|
|
107
|
-
### Form Submission
|
|
108
|
-
|
|
109
|
-
```bash
|
|
110
|
-
agent-browser open https://example.com/signup
|
|
111
|
-
agent-browser snapshot -i
|
|
112
|
-
agent-browser fill @e1 "Jane Doe"
|
|
113
|
-
agent-browser fill @e2 "jane@example.com"
|
|
114
|
-
agent-browser select @e3 "California"
|
|
115
|
-
agent-browser check @e4
|
|
116
|
-
agent-browser click @e5
|
|
117
|
-
agent-browser wait --load networkidle
|
|
118
|
-
```
|
|
119
|
-
|
|
120
|
-
### Authentication with State Persistence
|
|
121
|
-
|
|
122
|
-
```bash
|
|
123
|
-
# Login once and save state
|
|
124
|
-
agent-browser open https://app.example.com/login
|
|
125
|
-
agent-browser snapshot -i
|
|
126
|
-
agent-browser fill @e1 "$USERNAME"
|
|
127
|
-
agent-browser fill @e2 "$PASSWORD"
|
|
128
|
-
agent-browser click @e3
|
|
129
|
-
agent-browser wait --url "**/dashboard"
|
|
130
|
-
agent-browser state save auth.json
|
|
131
|
-
|
|
132
|
-
# Reuse in future sessions
|
|
133
|
-
agent-browser state load auth.json
|
|
134
|
-
agent-browser open https://app.example.com/dashboard
|
|
135
|
-
```
|
|
136
|
-
|
|
137
|
-
### Data Extraction
|
|
138
|
-
|
|
139
|
-
```bash
|
|
140
|
-
agent-browser open https://example.com/products
|
|
141
|
-
agent-browser snapshot -i
|
|
142
|
-
agent-browser get text @e5 # Get specific element text
|
|
143
|
-
agent-browser get text body > page.txt # Get all page text
|
|
144
|
-
|
|
145
|
-
# JSON output for parsing
|
|
146
|
-
agent-browser snapshot -i --json
|
|
147
|
-
agent-browser get text @e1 --json
|
|
148
|
-
```
|
|
149
|
-
|
|
150
|
-
### Parallel Sessions
|
|
151
|
-
|
|
152
|
-
```bash
|
|
153
|
-
agent-browser --session site1 open https://site-a.com
|
|
154
|
-
agent-browser --session site2 open https://site-b.com
|
|
155
|
-
|
|
156
|
-
agent-browser --session site1 snapshot -i
|
|
157
|
-
agent-browser --session site2 snapshot -i
|
|
158
|
-
|
|
159
|
-
agent-browser session list
|
|
160
|
-
```
|
|
161
|
-
|
|
162
|
-
### Connect to Existing Chrome
|
|
163
|
-
|
|
164
|
-
```bash
|
|
165
|
-
# Auto-discover running Chrome with remote debugging enabled
|
|
166
|
-
agent-browser --auto-connect open https://example.com
|
|
167
|
-
agent-browser --auto-connect snapshot
|
|
168
|
-
|
|
169
|
-
# Or with explicit CDP port
|
|
170
|
-
agent-browser --cdp 9222 snapshot
|
|
171
|
-
```
|
|
172
|
-
|
|
173
|
-
### Visual Browser (Headed Mode)
|
|
174
|
-
|
|
175
|
-
Use `--headed` as the first flag on the first line — it propagates to all commands in the block:
|
|
176
|
-
|
|
177
|
-
```
|
|
178
|
-
agent-browser:
|
|
179
|
-
--headed open https://example.com
|
|
180
|
-
wait --load networkidle
|
|
181
|
-
snapshot -i
|
|
182
|
-
```
|
|
183
|
-
|
|
184
|
-
```
|
|
185
|
-
agent-browser:
|
|
186
|
-
--headed open https://example.com
|
|
187
|
-
highlight @e1
|
|
188
|
-
record start demo.webm
|
|
189
|
-
```
|
|
190
|
-
|
|
191
|
-
### Local Files (PDFs, HTML)
|
|
192
|
-
|
|
193
|
-
```bash
|
|
194
|
-
# Open local files with file:// URLs
|
|
195
|
-
agent-browser --allow-file-access open file:///path/to/document.pdf
|
|
196
|
-
agent-browser --allow-file-access open file:///path/to/page.html
|
|
197
|
-
agent-browser screenshot output.png
|
|
198
|
-
```
|
|
199
|
-
|
|
200
|
-
### iOS Simulator (Mobile Safari)
|
|
201
|
-
|
|
202
|
-
```bash
|
|
203
|
-
# List available iOS simulators
|
|
204
|
-
agent-browser device list
|
|
205
|
-
|
|
206
|
-
# Launch Safari on a specific device
|
|
207
|
-
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
|
|
208
|
-
|
|
209
|
-
# Same workflow as desktop - snapshot, interact, re-snapshot
|
|
210
|
-
agent-browser -p ios snapshot -i
|
|
211
|
-
agent-browser -p ios tap @e1 # Tap (alias for click)
|
|
212
|
-
agent-browser -p ios fill @e2 "text"
|
|
213
|
-
agent-browser -p ios swipe up # Mobile-specific gesture
|
|
214
|
-
|
|
215
|
-
# Take screenshot
|
|
216
|
-
agent-browser -p ios screenshot mobile.png
|
|
217
|
-
|
|
218
|
-
# Close session (shuts down simulator)
|
|
219
|
-
agent-browser -p ios close
|
|
220
|
-
```
|
|
221
|
-
|
|
222
|
-
**Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
|
|
223
|
-
|
|
224
|
-
**Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
|
|
225
|
-
|
|
226
|
-
## Ref Lifecycle (Important)
|
|
227
|
-
|
|
228
|
-
Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
|
|
229
|
-
|
|
230
|
-
- Clicking links or buttons that navigate
|
|
231
|
-
- Form submissions
|
|
232
|
-
- Dynamic content loading (dropdowns, modals)
|
|
233
|
-
|
|
234
|
-
```bash
|
|
235
|
-
agent-browser click @e5 # Navigates to new page
|
|
236
|
-
agent-browser snapshot -i # MUST re-snapshot
|
|
237
|
-
agent-browser click @e1 # Use new refs
|
|
238
|
-
```
|
|
239
|
-
|
|
240
|
-
## Semantic Locators (Alternative to Refs)
|
|
241
|
-
|
|
242
|
-
When refs are unavailable or unreliable, use semantic locators:
|
|
243
|
-
|
|
244
|
-
```bash
|
|
245
|
-
agent-browser find text "Sign In" click
|
|
246
|
-
agent-browser find label "Email" fill "user@test.com"
|
|
247
|
-
agent-browser find role button click --name "Submit"
|
|
248
|
-
agent-browser find placeholder "Search" type "query"
|
|
249
|
-
agent-browser find testid "submit-btn" click
|
|
250
|
-
```
|
|
251
|
-
|
|
252
|
-
## JavaScript Evaluation (exec pathway)
|
|
253
|
-
|
|
254
|
-
Use this pathway when you need to run JavaScript in the browser context — not ordinary CLI commands. This goes through `exec:agent-browser` via Bash, which pipes your code to `agent-browser eval --stdin`. **Shell quoting can corrupt complex expressions** — use the heredoc form.
|
|
255
|
-
|
|
256
|
-
Use `exec:agent-browser` via Bash. The code body is piped directly to `agent-browser eval --stdin` — no shell, no escaping.
|
|
257
|
-
|
|
258
|
-
```
|
|
259
|
-
exec:agent-browser
|
|
260
|
-
document.title
|
|
261
|
-
```
|
|
262
|
-
|
|
263
|
-
```
|
|
264
|
-
exec:agent-browser
|
|
265
|
-
JSON.stringify(
|
|
266
|
-
Array.from(document.querySelectorAll("img"))
|
|
267
|
-
.filter(i => !i.alt)
|
|
268
|
-
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
|
|
269
|
-
)
|
|
270
|
-
```
|
|
271
|
-
|
|
272
|
-
Never base64-encode the code. Never add `agent-browser eval` flags. Write plain JavaScript directly as the exec body.
|
|
273
|
-
|
|
274
|
-
## Complete Command Reference
|
|
275
|
-
|
|
276
|
-
### Core Navigation & Lifecycle
|
|
277
|
-
```bash
|
|
278
|
-
agent-browser open <url> # Navigate (aliases: goto, navigate)
|
|
279
|
-
agent-browser close # Close browser (aliases: quit, exit)
|
|
280
|
-
agent-browser back # Go back
|
|
281
|
-
agent-browser forward # Go forward
|
|
282
|
-
agent-browser reload # Reload page
|
|
283
|
-
```
|
|
284
|
-
|
|
285
|
-
### Snapshots & Element References
|
|
286
|
-
```bash
|
|
287
|
-
agent-browser snapshot # Accessibility tree with semantic refs
|
|
288
|
-
agent-browser snapshot -i # Interactive elements with @e refs
|
|
289
|
-
agent-browser snapshot -i -C # Include cursor-interactive divs (onclick, pointer)
|
|
290
|
-
agent-browser snapshot -s "#sel" # Scope snapshot to CSS selector
|
|
291
|
-
agent-browser snapshot --json # JSON output for parsing
|
|
292
|
-
```
|
|
293
|
-
|
|
294
|
-
### Interaction - Click, Fill, Type, Select
|
|
295
|
-
```bash
|
|
296
|
-
agent-browser click <sel> # Click element
|
|
297
|
-
agent-browser click <sel> --new-tab # Open link in new tab
|
|
298
|
-
agent-browser dblclick <sel> # Double-click
|
|
299
|
-
agent-browser focus <sel> # Focus element
|
|
300
|
-
agent-browser type <sel> <text> # Type into element (append)
|
|
301
|
-
agent-browser fill <sel> <text> # Clear and fill
|
|
302
|
-
agent-browser select <sel> <val> # Select dropdown option
|
|
303
|
-
agent-browser check <sel> # Check checkbox
|
|
304
|
-
agent-browser uncheck <sel> # Uncheck checkbox
|
|
305
|
-
agent-browser press <key> # Press key (Enter, Tab, Control+a, etc.) (alias: key)
|
|
306
|
-
```
|
|
307
|
-
|
|
308
|
-
### Keyboard & Text Input
|
|
309
|
-
```bash
|
|
310
|
-
agent-browser keyboard type <text> # Type with real keystrokes (no selector, uses focus)
|
|
311
|
-
agent-browser keyboard inserttext <text> # Insert text without triggering key events
|
|
312
|
-
agent-browser keydown <key> # Hold key down
|
|
313
|
-
agent-browser keyup <key> # Release key
|
|
314
|
-
```
|
|
315
|
-
|
|
316
|
-
### Mouse & Drag
|
|
317
|
-
```bash
|
|
318
|
-
agent-browser hover <sel> # Hover element
|
|
319
|
-
agent-browser drag <src> <tgt> # Drag and drop
|
|
320
|
-
agent-browser mouse move <x> <y> # Move mouse to coordinates
|
|
321
|
-
agent-browser mouse down [button] # Press mouse button (left/right/middle)
|
|
322
|
-
agent-browser mouse up [button] # Release mouse button
|
|
323
|
-
agent-browser mouse wheel <dy> [dx] # Scroll wheel
|
|
324
|
-
```
|
|
325
|
-
|
|
326
|
-
### Scrolling & Viewport
|
|
327
|
-
```bash
|
|
328
|
-
agent-browser scroll <dir> [px] # Scroll (up/down/left/right, optional px)
|
|
329
|
-
agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
|
|
330
|
-
agent-browser set viewport <w> <h> # Set viewport size (e.g., 1920 1080)
|
|
331
|
-
agent-browser set device <name> # Emulate device (e.g., "iPhone 14")
|
|
332
|
-
```
|
|
333
|
-
|
|
334
|
-
### Get Information
|
|
335
|
-
```bash
|
|
336
|
-
agent-browser get text <sel> # Get text content
|
|
337
|
-
agent-browser get html <sel> # Get innerHTML
|
|
338
|
-
agent-browser get value <sel> # Get input value
|
|
339
|
-
agent-browser get attr <sel> <attr> # Get attribute value
|
|
340
|
-
agent-browser get title # Get page title
|
|
341
|
-
agent-browser get url # Get current URL
|
|
342
|
-
agent-browser get count <sel> # Count matching elements
|
|
343
|
-
agent-browser get box <sel> # Get bounding box {x, y, width, height}
|
|
344
|
-
agent-browser get styles <sel> # Get computed CSS styles
|
|
345
|
-
```
|
|
346
|
-
|
|
347
|
-
### Check State
|
|
348
|
-
```bash
|
|
349
|
-
agent-browser is visible <sel> # Check if visible
|
|
350
|
-
agent-browser is enabled <sel> # Check if enabled (not disabled)
|
|
351
|
-
agent-browser is checked <sel> # Check if checked (checkbox/radio)
|
|
352
|
-
```
|
|
353
|
-
|
|
354
|
-
### File Operations
|
|
355
|
-
```bash
|
|
356
|
-
agent-browser upload <sel> <files> # Upload files to file input
|
|
357
|
-
agent-browser screenshot [path] # Screenshot to temp or custom path
|
|
358
|
-
agent-browser screenshot --full # Full page screenshot
|
|
359
|
-
agent-browser screenshot --annotate # Annotated with numbered element labels
|
|
360
|
-
agent-browser pdf <path> # Save as PDF
|
|
361
|
-
```
|
|
362
|
-
|
|
363
|
-
### Semantic Locators (Alternative to Selectors)
|
|
364
|
-
```bash
|
|
365
|
-
agent-browser find role <role> <action> [value] # By ARIA role
|
|
366
|
-
agent-browser find text <text> <action> # By text content
|
|
367
|
-
agent-browser find label <label> <action> [value] # By form label
|
|
368
|
-
agent-browser find placeholder <ph> <action> [value] # By placeholder text
|
|
369
|
-
agent-browser find alt <text> <action> # By alt text
|
|
370
|
-
agent-browser find title <text> <action> # By title attribute
|
|
371
|
-
agent-browser find testid <id> <action> [value] # By data-testid
|
|
372
|
-
agent-browser find first <sel> <action> [value] # First matching element
|
|
373
|
-
agent-browser find last <sel> <action> [value] # Last matching element
|
|
374
|
-
agent-browser find nth <n> <sel> <action> [value] # Nth matching element
|
|
375
|
-
|
|
376
|
-
# Role examples: button, link, textbox, combobox, checkbox, radio, heading, list, etc.
|
|
377
|
-
# Actions: click, fill, type, hover, focus, check, uncheck, text
|
|
378
|
-
# Options: --name <name> (filter by accessible name), --exact (exact text match)
|
|
379
|
-
```
|
|
380
|
-
|
|
381
|
-
### Waiting
|
|
382
|
-
```bash
|
|
383
|
-
agent-browser wait <selector> # Wait for element to be visible
|
|
384
|
-
agent-browser wait <ms> # Wait for time in milliseconds
|
|
385
|
-
agent-browser wait --text "Welcome" # Wait for text to appear
|
|
386
|
-
agent-browser wait --url "**/dash" # Wait for URL pattern
|
|
387
|
-
agent-browser wait --load networkidle # Wait for load state (load, domcontentloaded, networkidle)
|
|
388
|
-
agent-browser wait --fn "window.ready === true" # Wait for JS condition
|
|
389
|
-
```
|
|
390
|
-
|
|
391
|
-
### JavaScript Evaluation (exec pathway — not a direct tool command)
|
|
392
|
-
```
|
|
393
|
-
exec:agent-browser
|
|
394
|
-
<plain JS>
|
|
395
|
-
```
|
|
396
|
-
Use `exec:agent-browser` via Bash. Code is piped to `agent-browser eval --stdin`. No base64, no flags.
|
|
397
|
-
|
|
398
|
-
### Browser Environment
|
|
399
|
-
```bash
|
|
400
|
-
agent-browser set geo <lat> <lng> # Set geolocation
|
|
401
|
-
agent-browser set offline [on|off] # Toggle offline mode
|
|
402
|
-
agent-browser set headers <json> # Set HTTP headers
|
|
403
|
-
agent-browser set credentials <u> <p> # HTTP basic auth
|
|
404
|
-
agent-browser set media [dark|light] # Emulate color scheme (prefers-color-scheme)
|
|
405
|
-
```
|
|
406
|
-
|
|
407
|
-
### Cookies & Storage
|
|
408
|
-
```bash
|
|
409
|
-
agent-browser cookies # Get all cookies
|
|
410
|
-
agent-browser cookies set <name> <val> # Set cookie
|
|
411
|
-
agent-browser cookies clear # Clear cookies
|
|
412
|
-
agent-browser storage local # Get all localStorage
|
|
413
|
-
agent-browser storage local <key> # Get specific key
|
|
414
|
-
agent-browser storage local set <k> <v> # Set value
|
|
415
|
-
agent-browser storage local clear # Clear all localStorage
|
|
416
|
-
agent-browser storage session # Same for sessionStorage
|
|
417
|
-
agent-browser storage session <key> # Get sessionStorage key
|
|
418
|
-
agent-browser storage session set <k> <v> # Set sessionStorage
|
|
419
|
-
agent-browser storage session clear # Clear sessionStorage
|
|
420
|
-
```
|
|
421
|
-
|
|
422
|
-
### Network & Interception
|
|
423
|
-
```bash
|
|
424
|
-
agent-browser network route <url> # Intercept requests
|
|
425
|
-
agent-browser network route <url> --abort # Block requests
|
|
426
|
-
agent-browser network route <url> --body <json> # Mock response with JSON
|
|
427
|
-
agent-browser network unroute [url] # Remove routes
|
|
428
|
-
agent-browser network requests # View tracked requests
|
|
429
|
-
agent-browser network requests --filter api # Filter by keyword
|
|
430
|
-
```
|
|
431
|
-
|
|
432
|
-
### Tabs & Windows
|
|
433
|
-
```bash
|
|
434
|
-
agent-browser tab # List active tabs
|
|
435
|
-
agent-browser tab new [url] # Open new tab (optionally with URL)
|
|
436
|
-
agent-browser tab <n> # Switch to tab n
|
|
437
|
-
agent-browser tab close [n] # Close tab (current or specific)
|
|
438
|
-
agent-browser window new # Open new window
|
|
439
|
-
```
|
|
440
|
-
|
|
441
|
-
### Frames
|
|
442
|
-
```bash
|
|
443
|
-
agent-browser frame <sel> # Switch to iframe by selector
|
|
444
|
-
agent-browser frame main # Switch back to main frame
|
|
445
|
-
```
|
|
446
|
-
|
|
447
|
-
### Dialogs
|
|
448
|
-
```bash
|
|
449
|
-
agent-browser dialog accept [text] # Accept alert/confirm (with optional prompt text)
|
|
450
|
-
agent-browser dialog dismiss # Dismiss dialog
|
|
451
|
-
```
|
|
452
|
-
|
|
453
|
-
### State Persistence (Auth, Sessions)
|
|
454
|
-
```bash
|
|
455
|
-
agent-browser state save <path> # Save authenticated session
|
|
456
|
-
agent-browser state load <path> # Load session state
|
|
457
|
-
agent-browser state list # List saved state files
|
|
458
|
-
agent-browser state show <file> # Show state summary
|
|
459
|
-
agent-browser state rename <old> <new> # Rename state
|
|
460
|
-
agent-browser state clear [name] # Clear specific session
|
|
461
|
-
agent-browser state clear --all # Clear all states
|
|
462
|
-
agent-browser state clean --older-than <days> # Delete old states
|
|
463
|
-
```
|
|
464
|
-
|
|
465
|
-
### Debugging & Analysis
|
|
466
|
-
```bash
|
|
467
|
-
agent-browser highlight <sel> # Highlight element visually
|
|
468
|
-
agent-browser console # View console messages (log, error, warn)
|
|
469
|
-
agent-browser console --clear # Clear console
|
|
470
|
-
agent-browser errors # View JavaScript errors
|
|
471
|
-
agent-browser errors --clear # Clear errors
|
|
472
|
-
agent-browser trace start [path] # Start DevTools trace
|
|
473
|
-
agent-browser trace stop [path] # Stop and save trace
|
|
474
|
-
agent-browser profiler start # Start Chrome DevTools profiler
|
|
475
|
-
agent-browser profiler stop [path] # Stop and save .json profile
|
|
476
|
-
```
|
|
477
|
-
|
|
478
|
-
### Visual Debugging
|
|
479
|
-
```
|
|
480
|
-
agent-browser:
|
|
481
|
-
--headed open <url>
|
|
482
|
-
record start <file.webm>
|
|
483
|
-
```
|
|
484
|
-
```
|
|
485
|
-
agent-browser:
|
|
486
|
-
record stop
|
|
487
|
-
```
|
|
488
|
-
|
|
489
|
-
### Comparisons & Diffs
|
|
490
|
-
```bash
|
|
491
|
-
agent-browser diff snapshot # Compare current vs last snapshot
|
|
492
|
-
agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot
|
|
493
|
-
agent-browser diff snapshot --selector "#main" --compact # Scoped diff
|
|
494
|
-
agent-browser diff screenshot --baseline before.png # Visual pixel diff
|
|
495
|
-
agent-browser diff screenshot --baseline b.png -o d.png # Save diff to custom path
|
|
496
|
-
agent-browser diff screenshot --baseline b.png -t 0.2 # Color threshold 0-1
|
|
497
|
-
agent-browser diff url https://v1.com https://v2.com # Compare two URLs
|
|
498
|
-
agent-browser diff url https://v1.com https://v2.com --screenshot # With visual diff
|
|
499
|
-
agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scoped
|
|
500
|
-
```
|
|
501
|
-
|
|
502
|
-
### Sessions & Parallelism
|
|
503
|
-
```bash
|
|
504
|
-
agent-browser --session <name> <cmd> # Run in named session (isolated instance)
|
|
505
|
-
agent-browser session list # List active sessions
|
|
506
|
-
agent-browser session show # Show current session
|
|
507
|
-
# Example: agent-browser --session agent1 open site.com
|
|
508
|
-
# agent-browser --session agent2 open other.com
|
|
509
|
-
```
|
|
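As a rough illustration of the two-session example above, the sketch below launches both commands in the background and waits for them. It assumes that separate named sessions can safely run concurrently, which the "isolated instance" comment suggests but does not guarantee. `in_session` and `open_in_parallel` are hypothetical helper names, and `DRY_RUN` is an illustrative switch that prints the commands instead of running them; none of these are part of agent-browser.

```bash
# Hypothetical wrapper: run one agent-browser command in a named session.
# With DRY_RUN=1 the command is printed instead of executed.
in_session() {
  local name="$1"
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "agent-browser --session $name $*"
  else
    agent-browser --session "$name" "$@"
  fi
}

# Open two URLs, each in its own session, as background jobs,
# then block until both jobs have finished.
open_in_parallel() {
  in_session agent1 open "$1" &
  in_session agent2 open "$2" &
  wait
}
```

Calling `DRY_RUN=1; open_in_parallel site.com other.com` prints the two commands from the example; the order of the two lines is not fixed because the jobs run concurrently.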
510
|
-
|
|
511
|
-
### Browser Connection
|
|
512
|
-
```bash
|
|
513
|
-
agent-browser connect <port> # Connect via Chrome DevTools Protocol
|
|
514
|
-
agent-browser --auto-connect open <url> # Auto-discover running Chrome
|
|
515
|
-
agent-browser --cdp 9222 <cmd> # Explicit CDP port
|
|
516
|
-
```
|
|
517
|
-
|
|
518
|
-
### Setup & Installation
|
|
519
|
-
```bash
|
|
520
|
-
agent-browser install # Download Chromium browser
|
|
521
|
-
agent-browser install --with-deps # Also install system dependencies (Linux)
|
|
522
|
-
```
|
|
523
|
-
|
|
524
|
-
### Advanced: Local Files & Protocols
|
|
525
|
-
```bash
|
|
526
|
-
agent-browser --allow-file-access open file:///path/to/file.pdf
|
|
527
|
-
agent-browser --allow-file-access open file:///path/to/page.html
|
|
528
|
-
```
|
|
529
|
-
|
|
530
|
-
### Advanced: iOS/Mobile Testing
|
|
531
|
-
```bash
|
|
532
|
-
agent-browser device list # List available iOS simulators
|
|
533
|
-
agent-browser -p ios --device "iPhone 16 Pro" open <url> # Launch on device
|
|
534
|
-
agent-browser -p ios snapshot -i # Snapshot on iOS
|
|
535
|
-
agent-browser -p ios tap @e1 # Tap (alias for click)
|
|
536
|
-
agent-browser -p ios swipe up # Mobile gestures
|
|
537
|
-
agent-browser -p ios screenshot mobile.png
|
|
538
|
-
agent-browser -p ios close # Close simulator
|
|
539
|
-
# Requires: macOS, Xcode, Appium (npm install -g appium && appium driver install xcuitest)
|
|
540
|
-
```
|
|
541
|
-
|
|
542
|
-
## Windows: "Daemon not found" Fix
|
|
543
|
-
|
|
544
|
-
If `agent-browser` fails with `Daemon not found. Set AGENT_BROWSER_HOME environment variable or run from project directory.` on Windows, the `AGENT_BROWSER_HOME` env var is missing or pointing to the wrong path. It must point to the npm package directory containing `dist/daemon.js`:
|
|
545
|
-
|
|
546
|
-
```cmd
|
|
547
|
-
:: Find and set the correct path (run in cmd.exe; this `for /f` syntax is not valid in PowerShell, and inside a .bat file `%i` must be written `%%i`)
|
|
548
|
-
for /f "delims=" %i in ('npm root -g') do setx AGENT_BROWSER_HOME "%i\agent-browser"
|
|
549
|
-
```
|
|
550
|
-
|
|
551
|
-
Open a new terminal after running. See `references/windows-troubleshooting.md` for details on stale port files and Git Bash setup.
|
|
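For Git Bash users, the requirement above (the variable must point at a directory containing `dist/daemon.js`) could be checked before retrying. This is a minimal sketch; `check_agent_browser_home` is a hypothetical helper name and is not shipped with agent-browser, and it assumes the shell can resolve the Windows path stored in the variable.

```bash
# Hypothetical helper: succeeds only if AGENT_BROWSER_HOME is set and
# the directory it names contains dist/daemon.js.
check_agent_browser_home() {
  if [ -z "${AGENT_BROWSER_HOME:-}" ]; then
    echo "AGENT_BROWSER_HOME is not set" >&2
    return 1
  fi
  if [ ! -f "$AGENT_BROWSER_HOME/dist/daemon.js" ]; then
    echo "no dist/daemon.js under $AGENT_BROWSER_HOME" >&2
    return 1
  fi
  echo "ok: $AGENT_BROWSER_HOME/dist/daemon.js"
}
```

Running `check_agent_browser_home` in a fresh terminal shows whether the `setx` value was picked up; a non-zero exit status means the variable is unset or points at the wrong directory.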
552
|
-
|
|
553
|
-
## Key Patterns for Agents
|
|
554
|
-
|
|
555
|
-
**Always use agent-browser instead of puppeteer, playwright, or playwright-core** — it has the same capabilities with simpler syntax and better integration with AI agents.
|
|
556
|
-
|
|
557
|
-
**Which pathway to use**:
|
|
558
|
-
- Ordinary browser control (navigation, clicking, forms, screenshots) → call the `agent-browser` tool directly
|
|
559
|
-
- Need to run JavaScript in the page → use `exec:agent-browser` via Bash with plain JS as the body
|
|
560
|
-
|
|
561
|
-
**Multi-step workflows** (ordinary pathway — direct tool calls):
|
|
562
|
-
1. `agent-browser open <url>`
|
|
563
|
-
2. `agent-browser snapshot -i` (get refs)
|
|
564
|
-
3. `agent-browser fill @e1 "value"`
|
|
565
|
-
4. `agent-browser click @e2`
|
|
566
|
-
5. `agent-browser wait --load networkidle` (after navigation)
|
|
567
|
-
6. `agent-browser snapshot -i` (re-snapshot for new refs)
|
|
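The six steps above can be sketched as a small shell function. This is an illustration only: `ab` and `fill_and_submit` are hypothetical helper names, `DRY_RUN` is an illustrative switch that prints each command instead of running it, and the refs `@e1` and `@e2` are placeholders, since real refs have to be read from the snapshot output.

```bash
# Hypothetical helper: run one agent-browser command, or only print it
# when DRY_RUN=1 so the sequence can be reviewed without a browser.
ab() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "agent-browser $*"
  else
    agent-browser "$@"
  fi
}

# Hypothetical helper: open a page, fill one field, click, wait for the
# page to settle, and re-snapshot. Stops at the first failing step.
fill_and_submit() {
  local url="$1" value="$2"
  ab open "$url" &&
    ab snapshot -i &&
    ab fill @e1 "$value" &&
    ab click @e2 &&
    ab wait --load networkidle &&
    ab snapshot -i
}
```

With `DRY_RUN=1`, `fill_and_submit https://example.com hello` prints the six commands in order. In a real run the refs would be chosen after inspecting the first snapshot rather than hard-coded.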
568
|
-
|
|
569
|
-
**JavaScript inspection** (exec pathway — when you need page access):
|
|
570
|
-
```
|
|
571
|
-
exec:agent-browser
|
|
572
|
-
document.title
|
|
573
|
-
```
|
|
574
|
-
|
|
575
|
-
**Debugging complex interactions**: Use headed mode — put `--headed` as the first flag on the first line of an `agent-browser:` block. It propagates to all subsequent commands in the block.
|
|
576
|
-
|
|
577
|
-
**Ground truth verification**: Use the ordinary pathway (`agent-browser screenshot`) for visual confirmation; use the exec pathway for JavaScript-level inspection.
|