@openrig/cli 0.1.2 → 0.1.4

Files changed (104)
  1. package/daemon/assets/guidance/openrig-start.md +16 -1
  2. package/daemon/dist/adapters/claude-code-adapter.d.ts +12 -0
  3. package/daemon/dist/adapters/claude-code-adapter.d.ts.map +1 -1
  4. package/daemon/dist/adapters/claude-code-adapter.js +92 -3
  5. package/daemon/dist/adapters/claude-code-adapter.js.map +1 -1
  6. package/daemon/dist/adapters/codex-runtime-adapter.d.ts +5 -0
  7. package/daemon/dist/adapters/codex-runtime-adapter.d.ts.map +1 -1
  8. package/daemon/dist/adapters/codex-runtime-adapter.js +82 -2
  9. package/daemon/dist/adapters/codex-runtime-adapter.js.map +1 -1
  10. package/daemon/dist/domain/agent-manifest.d.ts.map +1 -1
  11. package/daemon/dist/domain/agent-manifest.js +2 -1
  12. package/daemon/dist/domain/agent-manifest.js.map +1 -1
  13. package/daemon/dist/domain/native-resume-probe.d.ts.map +1 -1
  14. package/daemon/dist/domain/native-resume-probe.js +24 -1
  15. package/daemon/dist/domain/native-resume-probe.js.map +1 -1
  16. package/daemon/dist/domain/profile-resolver.js +1 -1
  17. package/daemon/dist/domain/profile-resolver.js.map +1 -1
  18. package/daemon/dist/domain/runtime-adapter.d.ts +1 -0
  19. package/daemon/dist/domain/runtime-adapter.d.ts.map +1 -1
  20. package/daemon/dist/domain/runtime-adapter.js.map +1 -1
  21. package/daemon/dist/domain/startup-orchestrator.d.ts.map +1 -1
  22. package/daemon/dist/domain/startup-orchestrator.js +10 -1
  23. package/daemon/dist/domain/startup-orchestrator.js.map +1 -1
  24. package/daemon/specs/agents/analyst/agent.yaml +10 -1
  25. package/daemon/specs/agents/design/agent.yaml +10 -1
  26. package/daemon/specs/agents/design/guidance/role.md +13 -0
  27. package/daemon/specs/agents/impl/agent.yaml +10 -1
  28. package/daemon/specs/agents/impl/guidance/role.md +20 -0
  29. package/daemon/specs/agents/lead/agent.yaml +10 -1
  30. package/daemon/specs/agents/lead/guidance/role.md +18 -0
  31. package/daemon/specs/agents/qa/agent.yaml +10 -1
  32. package/daemon/specs/agents/qa/guidance/role.md +52 -0
  33. package/daemon/specs/agents/reviewer/agent.yaml +10 -1
  34. package/daemon/specs/agents/reviewer/guidance/role.md +13 -0
  35. package/daemon/specs/agents/shared/agent.yaml +38 -0
  36. package/daemon/specs/agents/shared/skills/agent-browser/LOCAL-INSIGHTS.md +189 -0
  37. package/daemon/specs/agents/shared/skills/agent-browser/SKILL.md +417 -0
  38. package/daemon/specs/agents/shared/skills/brainstorming/SKILL.md +96 -0
  39. package/daemon/specs/agents/shared/skills/containerized-e2e/SKILL.md +256 -0
  40. package/daemon/specs/agents/shared/skills/containerized-e2e/scripts/Dockerfile +39 -0
  41. package/daemon/specs/agents/shared/skills/containerized-e2e/scripts/build-e2e-image.sh +37 -0
  42. package/daemon/specs/agents/shared/skills/containerized-e2e/templates/control-plane-test.yaml +40 -0
  43. package/daemon/specs/agents/shared/skills/containerized-e2e/templates/e2e-report-template.md +94 -0
  44. package/daemon/specs/agents/shared/skills/containerized-e2e/templates/expansion-collision-fragment.yaml +13 -0
  45. package/daemon/specs/agents/shared/skills/containerized-e2e/templates/expansion-pod-fragment.yaml +14 -0
  46. package/daemon/specs/agents/shared/skills/development-team/SKILL.md +149 -0
  47. package/daemon/specs/agents/shared/skills/dogfood/SKILL.md +220 -0
  48. package/daemon/specs/agents/shared/skills/dogfood/references/issue-taxonomy.md +109 -0
  49. package/daemon/specs/agents/shared/skills/dogfood/templates/dogfood-report-template.md +53 -0
  50. package/daemon/specs/agents/shared/skills/executing-plans/SKILL.md +84 -0
  51. package/daemon/specs/agents/shared/skills/frontend-design/LICENSE.txt +177 -0
  52. package/daemon/specs/agents/shared/skills/frontend-design/SKILL.md +42 -0
  53. package/daemon/specs/agents/shared/skills/openrig-user/SKILL.md +468 -0
  54. package/daemon/specs/agents/shared/skills/orchestration-team/SKILL.md +234 -0
  55. package/daemon/specs/agents/shared/skills/review-team/SKILL.md +210 -0
  56. package/daemon/specs/agents/shared/skills/systematic-debugging/CREATION-LOG.md +119 -0
  57. package/daemon/specs/agents/shared/skills/systematic-debugging/SKILL.md +296 -0
  58. package/daemon/specs/agents/shared/skills/systematic-debugging/condition-based-waiting-example.ts +158 -0
  59. package/daemon/specs/agents/shared/skills/systematic-debugging/condition-based-waiting.md +115 -0
  60. package/daemon/specs/agents/shared/skills/systematic-debugging/defense-in-depth.md +122 -0
  61. package/daemon/specs/agents/shared/skills/systematic-debugging/find-polluter.sh +63 -0
  62. package/daemon/specs/agents/shared/skills/systematic-debugging/root-cause-tracing.md +169 -0
  63. package/daemon/specs/agents/shared/skills/systematic-debugging/test-academic.md +14 -0
  64. package/daemon/specs/agents/shared/skills/systematic-debugging/test-pressure-1.md +58 -0
  65. package/daemon/specs/agents/shared/skills/systematic-debugging/test-pressure-2.md +68 -0
  66. package/daemon/specs/agents/shared/skills/systematic-debugging/test-pressure-3.md +69 -0
  67. package/daemon/specs/agents/shared/skills/test-driven-development/SKILL.md +371 -0
  68. package/daemon/specs/agents/shared/skills/test-driven-development/testing-anti-patterns.md +299 -0
  69. package/daemon/specs/agents/shared/skills/using-superpowers/SKILL.md +95 -0
  70. package/daemon/specs/agents/shared/skills/verification-before-completion/SKILL.md +139 -0
  71. package/daemon/specs/agents/shared/skills/writing-plans/SKILL.md +116 -0
  72. package/daemon/specs/agents/synthesizer/agent.yaml +10 -1
  73. package/daemon/specs/demo.CULTURE.md +92 -0
  74. package/daemon/specs/demo.yaml +91 -0
  75. package/daemon/specs/implementation-pair.yaml +3 -3
  76. package/daemon/specs/product-team.CULTURE.md +137 -0
  77. package/daemon/specs/product-team.yaml +5 -4
  78. package/dist/client.d.ts +8 -1
  79. package/dist/client.d.ts.map +1 -1
  80. package/dist/client.js +15 -6
  81. package/dist/client.js.map +1 -1
  82. package/dist/commands/daemon.d.ts.map +1 -1
  83. package/dist/commands/daemon.js +5 -1
  84. package/dist/commands/daemon.js.map +1 -1
  85. package/dist/commands/up.js +2 -2
  86. package/dist/commands/up.js.map +1 -1
  87. package/dist/daemon-lifecycle.d.ts.map +1 -1
  88. package/dist/daemon-lifecycle.js +54 -7
  89. package/dist/daemon-lifecycle.js.map +1 -1
  90. package/dist/fetch-with-timeout.d.ts +9 -0
  91. package/dist/fetch-with-timeout.d.ts.map +1 -0
  92. package/dist/fetch-with-timeout.js +41 -0
  93. package/dist/fetch-with-timeout.js.map +1 -0
  94. package/dist/index.d.ts.map +1 -1
  95. package/dist/index.js +2 -1
  96. package/dist/index.js.map +1 -1
  97. package/dist/mcp-server.d.ts.map +1 -1
  98. package/dist/mcp-server.js +2 -1
  99. package/dist/mcp-server.js.map +1 -1
  100. package/dist/version.d.ts +2 -0
  101. package/dist/version.d.ts.map +1 -0
  102. package/dist/version.js +8 -0
  103. package/dist/version.js.map +1 -0
  104. package/package.json +1 -1
@@ -0,0 +1,189 @@
1
+ # agent-browser: Local Dev Insights
2
+
3
+ > Companion to the official SKILL.md. These are gotchas, corrections, and best practices
4
+ > discovered through hands-on testing that the upstream skill doesn't cover.
5
+ > Last updated: 2026-02-20 | Tested against: v0.13.0
6
+
7
+ ---
8
+
9
+ ## Command Compatibility Matrix
10
+
11
+ **Not all `get` subcommands accept @refs.** This is the #1 source of confusion.
12
+
13
+ | Command | @refs | CSS selectors | Notes |
14
+ |---------|-------|---------------|-------|
15
+ | `get text @e1` | YES | YES | Works with both |
16
+ | `get html` | NO | YES | Fails silently with refs |
17
+ | `get box` | NO | YES | Returns `{x, y, width, height}` JSON |
18
+ | `get styles` | NO | YES | Returns compact summary (font, color, bg, border-radius) |
19
+ | `get value` | NO | YES | For form inputs |
20
+ | `get attr` | NO | YES | Any HTML attribute |
21
+ | `get count` | N/A | YES | Returns element count |
22
+ | `get url` | N/A | N/A | No selector needed |
23
+ | `get title` | N/A | N/A | No selector needed |
24
+ | `click` | YES | YES | Works with both |
25
+ | `fill` | YES | YES | Works with both |
26
+ | `highlight` | NO | YES | Skill shows `highlight @e1` but this fails |
27
+
28
+ **Rule of thumb:** Interaction commands (click, fill, type, check, select) work with @refs.
29
+ Inspection commands (get html/box/styles, highlight) need CSS selectors.
30
+
31
+ ## CSS Selectors: Strict Mode
32
+
33
+ Playwright strict mode means CSS selectors must match **exactly one element**. If multiple match, you get an error listing all matches (which is actually helpful for debugging).
34
+
35
+ **Strategies for unique selectors:**
36
+ - Use IDs: `#fork-button`
37
+ - Use unique attributes: `[data-testid="submit"]`
38
+ - Combine: `.header > a:first-child`
39
+ - Use `nth`: `.item:nth-child(3)`
40
+
41
+ ## Ref Lifecycle: The Golden Rule
42
+
43
+ Refs are invalidated by **any page state change**. This includes:
44
+ - Navigation (click links, `open`, `back`, `forward`)
45
+ - Scoped snapshots (`snapshot -s`) <-- easy to forget this one
46
+ - Form submissions
47
+ - Dynamic content (modals, dropdowns, AJAX loads)
48
+ - Even `snapshot` itself replaces all previous refs
49
+
50
+ **Pattern:** Always snapshot immediately before interacting. Never cache refs across multiple actions that change the page.
51
+
52
+ ## Snapshot Mode Comparison
53
+
54
+ | Flag | What it returns | When to use |
55
+ |------|----------------|-------------|
56
+ | `-i` | Interactive elements only | **Default choice** - best token efficiency |
57
+ | `-i -C` | Interactive + cursor-interactive | When divs with onclick aren't showing up |
58
+ | `-c` | Compact (removes empty nodes) | Unreliable - can return "Empty page" on some sites |
59
+ | `-d N` | Depth-limited | When `-i` returns too much |
60
+ | `-s "#sel"` | Scoped to selector | Laser focus on one component |
61
+ | `--json` | JSON format | Programmatic parsing |
62
+
63
+ **Token efficiency example:** GitHub repo page with 4,574 DOM elements → `snapshot -i` returns ~25 lines.
64
+
65
+ ## Annotated Screenshots
66
+
67
+ `screenshot --annotate` is powerful but **can hang on complex pages** (known issue #509). If it hangs:
68
+ 1. Kill with Ctrl-C or timeout
69
+ 2. Fall back to regular `screenshot` + separate `snapshot -i`
70
+ 3. Reserve `screenshot --annotate` for simpler pages, where it works best
71
+
72
+ The annotated screenshot also **caches refs**, so you can interact with elements immediately after without a separate snapshot.
73
+
74
+ ## Network Monitoring
75
+
76
+ ```bash
77
+ # See all requests (captured since page was opened)
78
+ agent-browser network requests
79
+
80
+ # Filter to just API calls (huge noise reduction)
81
+ agent-browser network requests --filter "/api/"
82
+
83
+ # Mock an API response
84
+ agent-browser network route "https://api.example.com/data" --body '{"mocked": true}'
85
+
86
+ # Block a request (e.g., analytics)
87
+ agent-browser network route "https://www.google-analytics.com/*" --abort
88
+ ```
89
+
90
+ Requests are captured from session start. The `--filter` flag is essential on real sites - without it you get dozens of CSS/image/analytics requests.
91
+
92
+ ## JavaScript Eval Patterns
93
+
94
+ ```bash
95
+ # Quick one-liner (single quotes, no nesting)
96
+ agent-browser eval 'document.title'
97
+
98
+ # Complex JS (ALWAYS use --stdin for anything with quotes/arrows/template literals)
99
+ agent-browser eval --stdin <<'EVALEOF'
100
+ JSON.stringify(
101
+ Array.from(document.querySelectorAll("a"))
102
+ .map(a => ({ text: a.textContent.trim(), href: a.href }))
103
+ .filter(a => a.text.length > 0)
104
+ .slice(0, 10)
105
+ )
106
+ EVALEOF
107
+
108
+ # Fetch API from browser context (uses page cookies/auth)
109
+ agent-browser eval --stdin <<'EVALEOF'
110
+ (async () => {
111
+ const res = await fetch('/api/data');
112
+ return JSON.stringify(await res.json());
113
+ })()
114
+ EVALEOF
115
+ ```
116
+
117
+ ## Session Management
118
+
119
+ - **Always close when done:** `agent-browser close` prevents leaked daemon processes
120
+ - **Headed mode for debugging:** `agent-browser --headed open <url>`
121
+ - **Persistent headed config:** Add `{"headed": true}` to `~/.agent-browser/config.json`
122
+ - **Named sessions for parallel work:** `agent-browser --session name open <url>`
123
+
124
+ ## Authentication: What Actually Works
125
+
126
+ **`--session-name` (state save/restore) does NOT work for all apps.** It saves cookies and localStorage, but logins may not persist for apps that use HTTP-only cookies, server-side sessions, or complex auth flows. Tested and failed on: tbbc (The Big Blue Cloud / localhost:8083).
127
+
128
+ **`--profile` (persistent Chrome profile) is the reliable approach.** It preserves everything - cookies, localStorage, IndexedDB, cache, service workers. This is what actually works for real apps.
129
+
130
+ ### Saved Profiles
131
+
132
+ | Profile | Service | URL | Command |
133
+ |---------|---------|-----|---------|
134
+ | `tbbc` | The Big Blue Cloud | `http://localhost:8083` | `agent-browser --profile ~/.agent-browser/profiles/tbbc open http://localhost:8083` |
135
+ | `localhost-3000` | Specright Formulate (Clerk auth) | `http://localhost:3000` | `agent-browser --profile ~/.agent-browser/profiles/localhost-3000 open http://localhost:3000` |
136
+ | `localhost-3010-email` | Smart Report Writer (email login) | `http://localhost:3010` | `agent-browser --profile ~/.agent-browser/profiles/localhost-3010-email open http://localhost:3010` |
137
+ | `localhost-3010-google` | Smart Report Writer (Google auth) | `http://localhost:3010` | See Google OAuth note below |
138
+
139
+ ### Google OAuth Profiles
140
+
141
+ Google blocks sign-in from the bundled Chromium ("This browser or app may not be secure"). The workaround is to use the **real Chrome binary** with automation detection disabled:
142
+
143
+ ```bash
144
+ agent-browser \
145
+ --profile ~/.agent-browser/profiles/localhost-3010-google \
146
+ --executable-path "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
147
+ --args "--disable-blink-features=AutomationControlled" \
148
+ open http://localhost:3010
149
+ ```
150
+
151
+ **This applies to ANY profile that needs Google OAuth.** Always use `--executable-path` + `--args` for Google sign-in flows.
152
+
153
+ ### Auth Setup Pattern
154
+
155
+ ```bash
156
+ # First time: login in headed mode (user enters password)
157
+ agent-browser --profile ~/.agent-browser/profiles/<name> --headed true open <login-url>
158
+ # ... user logs in manually ...
159
+ agent-browser close
160
+
161
+ # Every future run: headless, already authenticated
162
+ agent-browser --profile ~/.agent-browser/profiles/<name> open <app-url>
163
+ ```
164
+
165
+ ### Encryption
166
+
167
+ Session state files in `~/.agent-browser/sessions/` are encrypted with AES-256-GCM.
168
+ The key is stored at `~/.agent-browser/.encryption-key` (chmod 600)
169
+ and loaded via the `AGENT_BROWSER_ENCRYPTION_KEY` env var set in `~/.zshrc`.
170
+
171
+ Note: `--profile` directories are NOT encrypted (they're standard Chromium profile dirs).
172
+ Keep `~/.agent-browser/profiles/` permissions locked down.
173
+
174
+ ## Updating the Official Skill
175
+
176
+ To sync SKILL.md with upstream while preserving local insights:
177
+
178
+ ```bash
179
+ # Download latest official SKILL.md
180
+ curl -sL https://raw.githubusercontent.com/vercel-labs/agent-browser/main/skills/agent-browser/SKILL.md \
181
+ -o ~/.claude/skills/agent-browser/SKILL.md
182
+
183
+ # Re-append the local insights reference (3 lines at end of SKILL.md)
184
+ cat >> ~/.claude/skills/agent-browser/SKILL.md << 'EOF'
185
+
186
+ ## Local Dev Insights
187
+ **IMPORTANT:** Read `LOCAL-INSIGHTS.md` in this skill directory for gotchas, corrections, and tested workflows discovered through hands-on use that this upstream skill doesn't cover.
188
+ EOF
189
+ ```
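The command compatibility matrix in LOCAL-INSIGHTS.md could be turned into a small pre-flight check so a script fails loudly instead of handing an `@ref` to a command that only takes CSS selectors. The sketch below is a hypothetical helper and not part of agent-browser: the names `ref_support` and `guard_target` are invented here, and the yes/no values are copied from the matrix and its rule of thumb (tested against v0.13.0 according to the file), so they may not hold for other versions. Commands the matrix does not mention are reported as `unknown` rather than guessed.

```bash
# Hypothetical helpers, not part of agent-browser.

# ref_support COMMAND [SUBCOMMAND]
# Prints yes, no, n/a, or unknown, following the LOCAL-INSIGHTS.md matrix.
ref_support() {
  case "${1:-}" in
    click|fill|type|check|select) echo yes ;;   # interaction commands
    highlight) echo no ;;                       # needs a CSS selector
    get)
      case "${2:-}" in
        text) echo yes ;;
        html|box|styles|value|attr) echo no ;;  # CSS selectors only
        count|url|title) echo n/a ;;            # refs do not apply
        *) echo unknown ;;
      esac ;;
    *) echo unknown ;;                          # not covered by the matrix
  esac
}

# guard_target TARGET COMMAND [SUBCOMMAND]
# Returns 1 (with a message on stderr) when TARGET is an @ref and the
# matrix says the command does not accept refs; returns 0 otherwise.
guard_target() {
  case "${1:-}" in
    @*)
      if [ "$(ref_support "${2:-}" "${3:-}")" = "no" ]; then
        echo "refusing: '${2:-} ${3:-}' needs a CSS selector, got $1" >&2
        return 1
      fi ;;
  esac
  return 0
}
```

A script might call `guard_target "$target" get html` before running `agent-browser get html "$target"`, which surfaces the mistake that the matrix says otherwise fails silently.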
@@ -0,0 +1,417 @@
1
+ ---
2
+ name: agent-browser
3
+ description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
4
+ allowed-tools: Bash(npx agent-browser:*), Bash(agent-browser:*)
5
+ ---
6
+
7
+ # Browser Automation with agent-browser
8
+
9
+ ## Core Workflow
10
+
11
+ Every browser automation follows this pattern:
12
+
13
+ 1. **Navigate**: `agent-browser open <url>`
14
+ 2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
15
+ 3. **Interact**: Use refs to click, fill, select
16
+ 4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
17
+
18
+ ```bash
19
+ agent-browser open https://example.com/form
20
+ agent-browser snapshot -i
21
+ # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
22
+
23
+ agent-browser fill @e1 "user@example.com"
24
+ agent-browser fill @e2 "password123"
25
+ agent-browser click @e3
26
+ agent-browser wait --load networkidle
27
+ agent-browser snapshot -i # Check result
28
+ ```
29
+
30
+ ## Command Chaining
31
+
32
+ Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
33
+
34
+ ```bash
35
+ # Chain open + wait + snapshot in one call
36
+ agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
37
+
38
+ # Chain multiple interactions
39
+ agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
40
+
41
+ # Navigate and capture
42
+ agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
43
+ ```
44
+
45
+ **When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
46
+
47
+ ## Essential Commands
48
+
49
+ ```bash
50
+ # Navigation
51
+ agent-browser open <url> # Navigate (aliases: goto, navigate)
52
+ agent-browser close # Close browser
53
+
54
+ # Snapshot
55
+ agent-browser snapshot -i # Interactive elements with refs (recommended)
56
+ agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
57
+ agent-browser snapshot -s "#selector" # Scope to CSS selector
58
+
59
+ # Interaction (use @refs from snapshot)
60
+ agent-browser click @e1 # Click element
61
+ agent-browser click @e1 --new-tab # Click and open in new tab
62
+ agent-browser fill @e2 "text" # Clear and type text
63
+ agent-browser type @e2 "text" # Type without clearing
64
+ agent-browser select @e1 "option" # Select dropdown option
65
+ agent-browser check @e1 # Check checkbox
66
+ agent-browser press Enter # Press key
67
+ agent-browser scroll down 500 # Scroll page
68
+
69
+ # Get information
70
+ agent-browser get text @e1 # Get element text
71
+ agent-browser get url # Get current URL
72
+ agent-browser get title # Get page title
73
+
74
+ # Wait
75
+ agent-browser wait @e1 # Wait for element
76
+ agent-browser wait --load networkidle # Wait for network idle
77
+ agent-browser wait --url "**/page" # Wait for URL pattern
78
+ agent-browser wait 2000 # Wait milliseconds
79
+
80
+ # Capture
81
+ agent-browser screenshot # Screenshot to temp dir
82
+ agent-browser screenshot --full # Full page screenshot
83
+ agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
84
+ agent-browser pdf output.pdf # Save as PDF
85
+
86
+ # Diff (compare page states)
87
+ agent-browser diff snapshot # Compare current vs last snapshot
88
+ agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
89
+ agent-browser diff screenshot --baseline before.png # Visual pixel diff
90
+ agent-browser diff url <url1> <url2> # Compare two pages
91
+ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy
92
+ agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
93
+ ```
94
+
95
+ ## Common Patterns
96
+
97
+ ### Form Submission
98
+
99
+ ```bash
100
+ agent-browser open https://example.com/signup
101
+ agent-browser snapshot -i
102
+ agent-browser fill @e1 "Jane Doe"
103
+ agent-browser fill @e2 "jane@example.com"
104
+ agent-browser select @e3 "California"
105
+ agent-browser check @e4
106
+ agent-browser click @e5
107
+ agent-browser wait --load networkidle
108
+ ```
109
+
110
+ ### Authentication with State Persistence
111
+
112
+ ```bash
113
+ # Login once and save state
114
+ agent-browser open https://app.example.com/login
115
+ agent-browser snapshot -i
116
+ agent-browser fill @e1 "$USERNAME"
117
+ agent-browser fill @e2 "$PASSWORD"
118
+ agent-browser click @e3
119
+ agent-browser wait --url "**/dashboard"
120
+ agent-browser state save auth.json
121
+
122
+ # Reuse in future sessions
123
+ agent-browser state load auth.json
124
+ agent-browser open https://app.example.com/dashboard
125
+ ```
126
+
127
+ ### Session Persistence
128
+
129
+ ```bash
130
+ # Auto-save/restore cookies and localStorage across browser restarts
131
+ agent-browser --session-name myapp open https://app.example.com/login
132
+ # ... login flow ...
133
+ agent-browser close # State auto-saved to ~/.agent-browser/sessions/
134
+
135
+ # Next time, state is auto-loaded
136
+ agent-browser --session-name myapp open https://app.example.com/dashboard
137
+
138
+ # Encrypt state at rest
139
+ export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
140
+ agent-browser --session-name secure open https://app.example.com
141
+
142
+ # Manage saved states
143
+ agent-browser state list
144
+ agent-browser state show myapp-default.json
145
+ agent-browser state clear myapp
146
+ agent-browser state clean --older-than 7
147
+ ```
148
+
149
+ ### Data Extraction
150
+
151
+ ```bash
152
+ agent-browser open https://example.com/products
153
+ agent-browser snapshot -i
154
+ agent-browser get text @e5 # Get specific element text
155
+ agent-browser get text body > page.txt # Get all page text
156
+
157
+ # JSON output for parsing
158
+ agent-browser snapshot -i --json
159
+ agent-browser get text @e1 --json
160
+ ```
161
+
162
+ ### Parallel Sessions
163
+
164
+ ```bash
165
+ agent-browser --session site1 open https://site-a.com
166
+ agent-browser --session site2 open https://site-b.com
167
+
168
+ agent-browser --session site1 snapshot -i
169
+ agent-browser --session site2 snapshot -i
170
+
171
+ agent-browser session list
172
+ ```
173
+
174
+ ### Connect to Existing Chrome
175
+
176
+ ```bash
177
+ # Auto-discover running Chrome with remote debugging enabled
178
+ agent-browser --auto-connect open https://example.com
179
+ agent-browser --auto-connect snapshot
180
+
181
+ # Or with explicit CDP port
182
+ agent-browser --cdp 9222 snapshot
183
+ ```
184
+
185
+ ### Visual Browser (Debugging)
186
+
187
+ ```bash
188
+ agent-browser --headed open https://example.com
189
+ agent-browser highlight @e1 # Highlight element
190
+ agent-browser record start demo.webm # Record session
191
+ agent-browser profiler start # Start Chrome DevTools profiling
192
+ agent-browser profiler stop trace.json # Stop and save profile (path optional)
193
+ ```
194
+
195
+ ### Local Files (PDFs, HTML)
196
+
197
+ ```bash
198
+ # Open local files with file:// URLs
199
+ agent-browser --allow-file-access open file:///path/to/document.pdf
200
+ agent-browser --allow-file-access open file:///path/to/page.html
201
+ agent-browser screenshot output.png
202
+ ```
203
+
204
+ ### iOS Simulator (Mobile Safari)
205
+
206
+ ```bash
207
+ # List available iOS simulators
208
+ agent-browser device list
209
+
210
+ # Launch Safari on a specific device
211
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
212
+
213
+ # Same workflow as desktop - snapshot, interact, re-snapshot
214
+ agent-browser -p ios snapshot -i
215
+ agent-browser -p ios tap @e1 # Tap (alias for click)
216
+ agent-browser -p ios fill @e2 "text"
217
+ agent-browser -p ios swipe up # Mobile-specific gesture
218
+
219
+ # Take screenshot
220
+ agent-browser -p ios screenshot mobile.png
221
+
222
+ # Close session (shuts down simulator)
223
+ agent-browser -p ios close
224
+ ```
225
+
226
+ **Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
227
+
228
+ **Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
229
+
230
+ ## Diffing (Verifying Changes)
231
+
232
+ Use `diff snapshot` after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.
233
+
234
+ ```bash
235
+ # Typical workflow: snapshot -> action -> diff
236
+ agent-browser snapshot -i # Take baseline snapshot
237
+ agent-browser click @e2 # Perform action
238
+ agent-browser diff snapshot # See what changed (auto-compares to last snapshot)
239
+ ```
240
+
241
+ For visual regression testing or monitoring:
242
+
243
+ ```bash
244
+ # Save a baseline screenshot, then compare later
245
+ agent-browser screenshot baseline.png
246
+ # ... time passes or changes are made ...
247
+ agent-browser diff screenshot --baseline baseline.png
248
+
249
+ # Compare staging vs production
250
+ agent-browser diff url https://staging.example.com https://prod.example.com --screenshot
251
+ ```
252
+
253
+ `diff snapshot` output uses `+` for additions and `-` for removals, similar to git diff. `diff screenshot` produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.
254
+
255
+ ## Timeouts and Slow Pages
256
+
257
+ The default Playwright timeout is 60 seconds for local browsers. For slow websites or large pages, use explicit waits instead of relying on the default timeout:
258
+
259
+ ```bash
260
+ # Wait for network activity to settle (best for slow pages)
261
+ agent-browser wait --load networkidle
262
+
263
+ # Wait for a specific element to appear
264
+ agent-browser wait "#content"
265
+ agent-browser wait @e1
266
+
267
+ # Wait for a specific URL pattern (useful after redirects)
268
+ agent-browser wait --url "**/dashboard"
269
+
270
+ # Wait for a JavaScript condition
271
+ agent-browser wait --fn "document.readyState === 'complete'"
272
+
273
+ # Wait a fixed duration (milliseconds) as a last resort
274
+ agent-browser wait 5000
275
+ ```
276
+
277
+ When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait <selector>` or `wait @ref`.
278
+
279
+ ## Session Management and Cleanup
280
+
281
+ When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
282
+
283
+ ```bash
284
+ # Each agent gets its own isolated session
285
+ agent-browser --session agent1 open site-a.com
286
+ agent-browser --session agent2 open site-b.com
287
+
288
+ # Check active sessions
289
+ agent-browser session list
290
+ ```
291
+
292
+ Always close your browser session when done to avoid leaked processes:
293
+
294
+ ```bash
295
+ agent-browser close # Close default session
296
+ agent-browser --session agent1 close # Close specific session
297
+ ```
298
+
299
+ If a previous session was not closed properly, the daemon may still be running. Use `agent-browser close` to clean it up before starting new work.
300
+
301
+ ## Ref Lifecycle (Important)
302
+
303
+ Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
304
+
305
+ - Clicking links or buttons that navigate
306
+ - Form submissions
307
+ - Dynamic content loading (dropdowns, modals)
308
+
309
+ ```bash
310
+ agent-browser click @e5 # Navigates to new page
311
+ agent-browser snapshot -i # MUST re-snapshot
312
+ agent-browser click @e1 # Use new refs
313
+ ```
314
+
315
+ ## Annotated Screenshots (Vision Mode)
316
+
317
+ Use `--annotate` to take a screenshot with numbered labels overlaid on interactive elements. Each label `[N]` maps to ref `@eN`. This also caches refs, so you can interact with elements immediately without a separate snapshot.
318
+
319
+ ```bash
320
+ agent-browser screenshot --annotate
321
+ # Output includes the image path and a legend:
322
+ # [1] @e1 button "Submit"
323
+ # [2] @e2 link "Home"
324
+ # [3] @e3 textbox "Email"
325
+ agent-browser click @e2 # Click using ref from annotated screenshot
326
+ ```
327
+
328
+ Use annotated screenshots when:
329
+ - The page has unlabeled icon buttons or visual-only elements
330
+ - You need to verify visual layout or styling
331
+ - Canvas or chart elements are present (invisible to text snapshots)
332
+ - You need spatial reasoning about element positions
333
+
334
+ ## Semantic Locators (Alternative to Refs)
335
+
336
+ When refs are unavailable or unreliable, use semantic locators:
337
+
338
+ ```bash
339
+ agent-browser find text "Sign In" click
340
+ agent-browser find label "Email" fill "user@test.com"
341
+ agent-browser find role button click --name "Submit"
342
+ agent-browser find placeholder "Search" type "query"
343
+ agent-browser find testid "submit-btn" click
344
+ ```
345
+
346
+ ## JavaScript Evaluation (eval)
347
+
348
+ Use `eval` to run JavaScript in the browser context. **Shell quoting can corrupt complex expressions** -- use `--stdin` or `-b` to avoid issues.
349
+
350
+ ```bash
351
+ # Simple expressions work with regular quoting
352
+ agent-browser eval 'document.title'
353
+ agent-browser eval 'document.querySelectorAll("img").length'
354
+
355
+ # Complex JS: use --stdin with heredoc (RECOMMENDED)
356
+ agent-browser eval --stdin <<'EVALEOF'
357
+ JSON.stringify(
358
+ Array.from(document.querySelectorAll("img"))
359
+ .filter(i => !i.alt)
360
+ .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
361
+ )
362
+ EVALEOF
363
+
364
+ # Alternative: base64 encoding (avoids all shell escaping issues)
365
+ agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"
366
+ ```
367
+
368
+ **Why this matters:** When the shell processes your command, inner double quotes, `!` characters (history expansion), backticks, and `$()` can all corrupt the JavaScript before it reaches agent-browser. The `--stdin` and `-b` flags bypass shell interpretation entirely.
369
+
370
+ **Rules of thumb:**
371
+ - Single-line, no nested quotes -> regular `eval 'expression'` with single quotes is fine
372
+ - Nested quotes, arrow functions, template literals, or multiline -> use `eval --stdin <<'EVALEOF'`
373
+ - Programmatic/generated scripts -> use `eval -b` with base64
374
+
375
+ ## Configuration File
376
+
377
+ Create `agent-browser.json` in the project root for persistent settings:
378
+
379
+ ```json
380
+ {
381
+ "headed": true,
382
+ "proxy": "http://localhost:8080",
383
+ "profile": "./browser-data"
384
+ }
385
+ ```
386
+
387
+ Priority (lowest to highest): `~/.agent-browser/config.json` < `./agent-browser.json` < env vars < CLI flags. Use `--config <path>` or `AGENT_BROWSER_CONFIG` env var for a custom config file (exits with error if missing/invalid). All CLI options map to camelCase keys (e.g., `--executable-path` -> `"executablePath"`). Boolean flags accept `true`/`false` values (e.g., `--headed false` overrides config). Extensions from user and project configs are merged, not replaced.
388
+
389
+ ## Deep-Dive Documentation
390
+
391
+ | Reference | When to Use |
392
+ |-----------|-------------|
393
+ | [references/commands.md](references/commands.md) | Full command reference with all options |
394
+ | [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
395
+ | [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
396
+ | [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
397
+ | [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging and documentation |
398
+ | [references/profiling.md](references/profiling.md) | Chrome DevTools profiling for performance analysis |
399
+ | [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
400
+
401
+ ## Ready-to-Use Templates
402
+
403
+ | Template | Description |
404
+ |----------|-------------|
405
+ | [templates/form-automation.sh](templates/form-automation.sh) | Form filling with validation |
406
+ | [templates/authenticated-session.sh](templates/authenticated-session.sh) | Login once, reuse state |
407
+ | [templates/capture-workflow.sh](templates/capture-workflow.sh) | Content extraction with screenshots |
408
+
409
+ ```bash
410
+ ./templates/form-automation.sh https://example.com/form
411
+ ./templates/authenticated-session.sh https://app.example.com/login
412
+ ./templates/capture-workflow.sh https://example.com ./output
413
+ ```
414
+
415
+ ## Local Dev Insights
416
+
417
+ **IMPORTANT:** Read `LOCAL-INSIGHTS.md` in this skill directory for gotchas, corrections, and tested workflows discovered through hands-on use that this upstream skill doesn't cover.
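The quoting rules of thumb for `eval` could be sketched as a small classifier that decides how a JavaScript snippet should be handed to the CLI. This is a hypothetical helper and not part of agent-browser: `eval_mode` is a name invented here, it only inspects the string, and it follows the stricter LOCAL-INSIGHTS.md wording (anything with quotes, arrow functions, template literals, or multiple lines goes through `--stdin`), which is more conservative than SKILL.md's own examples.

```bash
# Hypothetical helper, not part of agent-browser.

# eval_mode JS
# Prints "inline" when the rules of thumb allow eval 'JS' with single
# quotes, and "stdin" when the snippet should go through eval --stdin.
eval_mode() {
  nl='
'
  case "$1" in
    *"$nl"*) echo stdin ;;          # multiline script
    *\'*|*\"*|*\`*) echo stdin ;;   # any quote, or a template-literal backtick
    *'=>'*) echo stdin ;;           # arrow function
    *) echo inline ;;               # plain single-line expression
  esac
}
```

For a snippet classified as `stdin`, one plausible way to pass it along is `printf '%s' "$js" | agent-browser eval --stdin`, assuming `--stdin` reads piped input the same way it reads the heredocs shown in the skill.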