@assistkick/create 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (178)
  1. package/dist/bin/create.d.ts +2 -0
  2. package/dist/bin/create.js +25 -0
  3. package/dist/bin/create.js.map +1 -0
  4. package/dist/src/scaffolder.d.ts +22 -0
  5. package/dist/src/scaffolder.js +120 -0
  6. package/dist/src/scaffolder.js.map +1 -0
  7. package/package.json +24 -0
  8. package/templates/product-system/.env.example +8 -0
  9. package/templates/product-system/CLAUDE.md +45 -0
  10. package/templates/product-system/package.json +32 -0
  11. package/templates/product-system/packages/backend/package.json +37 -0
  12. package/templates/product-system/packages/backend/src/middleware/auth_middleware.test.ts +86 -0
  13. package/templates/product-system/packages/backend/src/middleware/auth_middleware.ts +35 -0
  14. package/templates/product-system/packages/backend/src/routes/auth.ts +463 -0
  15. package/templates/product-system/packages/backend/src/routes/coherence.ts +187 -0
  16. package/templates/product-system/packages/backend/src/routes/graph.ts +67 -0
  17. package/templates/product-system/packages/backend/src/routes/kanban.ts +201 -0
  18. package/templates/product-system/packages/backend/src/routes/pipeline.ts +41 -0
  19. package/templates/product-system/packages/backend/src/routes/projects.ts +122 -0
  20. package/templates/product-system/packages/backend/src/routes/users.ts +97 -0
  21. package/templates/product-system/packages/backend/src/server.ts +159 -0
  22. package/templates/product-system/packages/backend/src/services/auth_service.test.ts +115 -0
  23. package/templates/product-system/packages/backend/src/services/auth_service.ts +82 -0
  24. package/templates/product-system/packages/backend/src/services/coherence-review.ts +339 -0
  25. package/templates/product-system/packages/backend/src/services/email_service.ts +75 -0
  26. package/templates/product-system/packages/backend/src/services/init.ts +80 -0
  27. package/templates/product-system/packages/backend/src/services/invitation_service.test.ts +235 -0
  28. package/templates/product-system/packages/backend/src/services/invitation_service.ts +193 -0
  29. package/templates/product-system/packages/backend/src/services/password_reset_service.test.ts +151 -0
  30. package/templates/product-system/packages/backend/src/services/password_reset_service.ts +135 -0
  31. package/templates/product-system/packages/backend/src/services/project_service.test.ts +215 -0
  32. package/templates/product-system/packages/backend/src/services/project_service.ts +171 -0
  33. package/templates/product-system/packages/backend/src/services/pty_session_manager.test.ts +88 -0
  34. package/templates/product-system/packages/backend/src/services/pty_session_manager.ts +279 -0
  35. package/templates/product-system/packages/backend/src/services/terminal_ws_handler.ts +133 -0
  36. package/templates/product-system/packages/backend/src/services/user_management_service.test.ts +158 -0
  37. package/templates/product-system/packages/backend/src/services/user_management_service.ts +128 -0
  38. package/templates/product-system/packages/backend/tsconfig.json +22 -0
  39. package/templates/product-system/packages/frontend/index.html +13 -0
  40. package/templates/product-system/packages/frontend/package-lock.json +2666 -0
  41. package/templates/product-system/packages/frontend/package.json +30 -0
  42. package/templates/product-system/packages/frontend/public/favicon.svg +16 -0
  43. package/templates/product-system/packages/frontend/src/App.tsx +29 -0
  44. package/templates/product-system/packages/frontend/src/api/client.ts +386 -0
  45. package/templates/product-system/packages/frontend/src/api/client_projects.test.ts +104 -0
  46. package/templates/product-system/packages/frontend/src/api/client_refresh.test.ts +145 -0
  47. package/templates/product-system/packages/frontend/src/components/CoherenceView.tsx +414 -0
  48. package/templates/product-system/packages/frontend/src/components/GraphLegend.tsx +124 -0
  49. package/templates/product-system/packages/frontend/src/components/GraphSettings.tsx +112 -0
  50. package/templates/product-system/packages/frontend/src/components/GraphView.tsx +370 -0
  51. package/templates/product-system/packages/frontend/src/components/InviteUserDialog.tsx +85 -0
  52. package/templates/product-system/packages/frontend/src/components/KanbanView.tsx +470 -0
  53. package/templates/product-system/packages/frontend/src/components/LoginPage.tsx +116 -0
  54. package/templates/product-system/packages/frontend/src/components/ProjectSelector.tsx +187 -0
  55. package/templates/product-system/packages/frontend/src/components/QaIssueSheet.tsx +192 -0
  56. package/templates/product-system/packages/frontend/src/components/SidePanel.tsx +231 -0
  57. package/templates/product-system/packages/frontend/src/components/TerminalView.tsx +200 -0
  58. package/templates/product-system/packages/frontend/src/components/Toolbar.tsx +84 -0
  59. package/templates/product-system/packages/frontend/src/components/UsersView.tsx +249 -0
  60. package/templates/product-system/packages/frontend/src/constants/graph.ts +191 -0
  61. package/templates/product-system/packages/frontend/src/hooks/useAuth.tsx +54 -0
  62. package/templates/product-system/packages/frontend/src/hooks/useGraph.ts +27 -0
  63. package/templates/product-system/packages/frontend/src/hooks/useKanban.ts +21 -0
  64. package/templates/product-system/packages/frontend/src/hooks/useProjects.ts +86 -0
  65. package/templates/product-system/packages/frontend/src/hooks/useTheme.ts +26 -0
  66. package/templates/product-system/packages/frontend/src/hooks/useToast.tsx +62 -0
  67. package/templates/product-system/packages/frontend/src/hooks/use_projects_logic.test.ts +61 -0
  68. package/templates/product-system/packages/frontend/src/main.tsx +12 -0
  69. package/templates/product-system/packages/frontend/src/pages/accept_invitation_page.tsx +167 -0
  70. package/templates/product-system/packages/frontend/src/pages/forgot_password_page.tsx +100 -0
  71. package/templates/product-system/packages/frontend/src/pages/register_page.tsx +137 -0
  72. package/templates/product-system/packages/frontend/src/pages/reset_password_page.tsx +146 -0
  73. package/templates/product-system/packages/frontend/src/routes/ProtectedRoute.tsx +12 -0
  74. package/templates/product-system/packages/frontend/src/routes/accept_invitation.tsx +14 -0
  75. package/templates/product-system/packages/frontend/src/routes/dashboard.tsx +221 -0
  76. package/templates/product-system/packages/frontend/src/routes/forgot_password.tsx +13 -0
  77. package/templates/product-system/packages/frontend/src/routes/login.tsx +14 -0
  78. package/templates/product-system/packages/frontend/src/routes/register.tsx +14 -0
  79. package/templates/product-system/packages/frontend/src/routes/reset_password.tsx +13 -0
  80. package/templates/product-system/packages/frontend/src/styles/index.css +3358 -0
  81. package/templates/product-system/packages/frontend/src/utils/auth_validation.test.ts +51 -0
  82. package/templates/product-system/packages/frontend/src/utils/auth_validation.ts +19 -0
  83. package/templates/product-system/packages/frontend/src/utils/login_validation.test.ts +61 -0
  84. package/templates/product-system/packages/frontend/src/utils/login_validation.ts +24 -0
  85. package/templates/product-system/packages/frontend/src/utils/logout.test.ts +63 -0
  86. package/templates/product-system/packages/frontend/src/utils/node_sizing.test.ts +62 -0
  87. package/templates/product-system/packages/frontend/src/utils/node_sizing.ts +24 -0
  88. package/templates/product-system/packages/frontend/src/utils/task_status.test.ts +53 -0
  89. package/templates/product-system/packages/frontend/src/utils/task_status.ts +14 -0
  90. package/templates/product-system/packages/frontend/tsconfig.json +21 -0
  91. package/templates/product-system/packages/frontend/vite.config.ts +20 -0
  92. package/templates/product-system/packages/shared/.env.example +3 -0
  93. package/templates/product-system/packages/shared/README.md +1 -0
  94. package/templates/product-system/packages/shared/db/migrate.ts +32 -0
  95. package/templates/product-system/packages/shared/db/migrations/0000_dashing_gorgon.sql +128 -0
  96. package/templates/product-system/packages/shared/db/migrations/meta/0000_snapshot.json +819 -0
  97. package/templates/product-system/packages/shared/db/migrations/meta/_journal.json +13 -0
  98. package/templates/product-system/packages/shared/db/schema.ts +137 -0
  99. package/templates/product-system/packages/shared/drizzle.config.js +14 -0
  100. package/templates/product-system/packages/shared/lib/claude-service.ts +215 -0
  101. package/templates/product-system/packages/shared/lib/coherence.ts +278 -0
  102. package/templates/product-system/packages/shared/lib/completeness.ts +30 -0
  103. package/templates/product-system/packages/shared/lib/constants.ts +327 -0
  104. package/templates/product-system/packages/shared/lib/db.ts +81 -0
  105. package/templates/product-system/packages/shared/lib/git_workflow.ts +110 -0
  106. package/templates/product-system/packages/shared/lib/graph.ts +186 -0
  107. package/templates/product-system/packages/shared/lib/kanban.ts +161 -0
  108. package/templates/product-system/packages/shared/lib/markdown.ts +205 -0
  109. package/templates/product-system/packages/shared/lib/pipeline-state-store.ts +124 -0
  110. package/templates/product-system/packages/shared/lib/pipeline.ts +489 -0
  111. package/templates/product-system/packages/shared/lib/prompt_builder.ts +170 -0
  112. package/templates/product-system/packages/shared/lib/relevance_search.ts +159 -0
  113. package/templates/product-system/packages/shared/lib/session.ts +152 -0
  114. package/templates/product-system/packages/shared/lib/validator.ts +117 -0
  115. package/templates/product-system/packages/shared/lib/work_summary_parser.ts +130 -0
  116. package/templates/product-system/packages/shared/package.json +30 -0
  117. package/templates/product-system/packages/shared/scripts/assign-project.ts +52 -0
  118. package/templates/product-system/packages/shared/tools/add_edge.ts +61 -0
  119. package/templates/product-system/packages/shared/tools/add_node.ts +101 -0
  120. package/templates/product-system/packages/shared/tools/end_session.ts +87 -0
  121. package/templates/product-system/packages/shared/tools/get_gaps.ts +87 -0
  122. package/templates/product-system/packages/shared/tools/get_kanban.ts +125 -0
  123. package/templates/product-system/packages/shared/tools/get_node.ts +78 -0
  124. package/templates/product-system/packages/shared/tools/get_status.ts +98 -0
  125. package/templates/product-system/packages/shared/tools/migrate_to_turso.ts +385 -0
  126. package/templates/product-system/packages/shared/tools/move_card.ts +143 -0
  127. package/templates/product-system/packages/shared/tools/rebuild_index.ts +77 -0
  128. package/templates/product-system/packages/shared/tools/remove_edge.ts +59 -0
  129. package/templates/product-system/packages/shared/tools/remove_node.ts +96 -0
  130. package/templates/product-system/packages/shared/tools/resolve_question.ts +75 -0
  131. package/templates/product-system/packages/shared/tools/search_nodes.ts +106 -0
  132. package/templates/product-system/packages/shared/tools/start_session.ts +144 -0
  133. package/templates/product-system/packages/shared/tools/update_node.ts +133 -0
  134. package/templates/product-system/packages/shared/tsconfig.json +24 -0
  135. package/templates/product-system/pnpm-workspace.yaml +2 -0
  136. package/templates/product-system/smoke_test.ts +219 -0
  137. package/templates/product-system/tests/coherence_review.test.ts +562 -0
  138. package/templates/product-system/tests/db_sqlite_fallback.test.ts +75 -0
  139. package/templates/product-system/tests/edge_type_color_coding.test.ts +147 -0
  140. package/templates/product-system/tests/emit-tool-use-events.test.ts +85 -0
  141. package/templates/product-system/tests/feature_kind.test.ts +139 -0
  142. package/templates/product-system/tests/gap_indicators.test.ts +199 -0
  143. package/templates/product-system/tests/graceful_init.test.ts +142 -0
  144. package/templates/product-system/tests/graph_legend.test.ts +314 -0
  145. package/templates/product-system/tests/graph_settings_sheet.test.ts +804 -0
  146. package/templates/product-system/tests/hide_defined_filter.test.ts +205 -0
  147. package/templates/product-system/tests/kanban.test.ts +529 -0
  148. package/templates/product-system/tests/neighborhood_focus.test.ts +132 -0
  149. package/templates/product-system/tests/node_search.test.ts +340 -0
  150. package/templates/product-system/tests/node_sizing.test.ts +170 -0
  151. package/templates/product-system/tests/node_type_toggle_filters.test.ts +285 -0
  152. package/templates/product-system/tests/node_type_visual_encoding.test.ts +103 -0
  153. package/templates/product-system/tests/pipeline-state-store.test.ts +268 -0
  154. package/templates/product-system/tests/pipeline-unit.test.ts +593 -0
  155. package/templates/product-system/tests/pipeline.test.ts +195 -0
  156. package/templates/product-system/tests/pipeline_stats_all_cards.test.ts +193 -0
  157. package/templates/product-system/tests/play_all.test.ts +296 -0
  158. package/templates/product-system/tests/qa_issue_sheet.test.ts +464 -0
  159. package/templates/product-system/tests/relevance_search.test.ts +186 -0
  160. package/templates/product-system/tests/search_reorder.test.ts +88 -0
  161. package/templates/product-system/tests/serve_ui.test.ts +281 -0
  162. package/templates/product-system/tests/serve_ui_drizzle.test.ts +114 -0
  163. package/templates/product-system/tests/session_context_recall.test.ts +135 -0
  164. package/templates/product-system/tests/side_panel.test.ts +345 -0
  165. package/templates/product-system/tests/spec_completeness_label.test.ts +69 -0
  166. package/templates/product-system/tests/url_routing_test.ts +122 -0
  167. package/templates/product-system/tests/user_login.test.ts +150 -0
  168. package/templates/product-system/tests/user_registration.test.ts +205 -0
  169. package/templates/product-system/tests/web_terminal.test.ts +572 -0
  170. package/templates/product-system/tests/work_summary.test.ts +211 -0
  171. package/templates/product-system/tests/zoom_pan.test.ts +43 -0
  172. package/templates/product-system/tsconfig.json +24 -0
  173. package/templates/skills/product-bootstrap/SKILL.md +312 -0
  174. package/templates/skills/product-code-reviewer/SKILL.md +147 -0
  175. package/templates/skills/product-debugger/SKILL.md +206 -0
  176. package/templates/skills/product-debugger/references/agent-browser.md +1156 -0
  177. package/templates/skills/product-developer/SKILL.md +182 -0
  178. package/templates/skills/product-interview/SKILL.md +220 -0
@@ -0,0 +1,1156 @@
1
+ # agent-browser
2
+
3
+ Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.
4
+
5
+ ## Installation
6
+
7
+ ### Global Installation (recommended)
8
+
9
+ Installs the native Rust binary for maximum performance:
10
+
11
+ ```bash
12
+ npm install -g agent-browser
13
+ agent-browser install # Download Chromium
14
+ ```
15
+
16
+ This is the fastest option -- commands run through the native Rust CLI directly with sub-millisecond parsing overhead.
17
+
18
+ ### Quick Start (no install)
19
+
20
+ Run directly with `npx` if you want to try it without installing globally:
21
+
22
+ ```bash
23
+ npx agent-browser install # Download Chromium (first time only)
24
+ npx agent-browser open example.com
25
+ ```
26
+
27
+ > **Note:** `npx` routes through Node.js before reaching the Rust CLI, so it is noticeably slower than a global install. For regular use, install globally.
28
+
29
+ ### Project Installation (local dependency)
30
+
31
+ For projects that want to pin the version in `package.json`:
32
+
33
+ ```bash
34
+ npm install agent-browser
35
+ npx agent-browser install
36
+ ```
37
+
38
+ Then use via `npx` or `package.json` scripts:
39
+
40
+ ```bash
41
+ npx agent-browser open example.com
42
+ ```
43
+
44
+ ### Homebrew (macOS)
45
+
46
+ ```bash
47
+ brew install agent-browser
48
+ agent-browser install # Download Chromium
49
+ ```
50
+
51
+ ### From Source
52
+
53
+ ```bash
54
+ git clone https://github.com/vercel-labs/agent-browser
55
+ cd agent-browser
56
+ pnpm install
57
+ pnpm build
58
+ pnpm build:native # Requires Rust (https://rustup.rs)
59
+ pnpm link --global # Makes agent-browser available globally
60
+ agent-browser install
61
+ ```
62
+
63
+ ### Linux Dependencies
64
+
65
+ On Linux, install system dependencies:
66
+
67
+ ```bash
68
+ agent-browser install --with-deps
69
+ # or manually: npx playwright install-deps chromium
70
+ ```
71
+
72
+ ## Quick Start
73
+
74
+ ```bash
75
+ agent-browser open example.com
76
+ agent-browser snapshot # Get accessibility tree with refs
77
+ agent-browser click @e2 # Click by ref from snapshot
78
+ agent-browser fill @e3 "test@example.com" # Fill by ref
79
+ agent-browser get text @e1 # Get text by ref
80
+ agent-browser screenshot page.png
81
+ agent-browser close
82
+ ```
83
+
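The ref-based flow above can be scripted so that a failed step halts the run instead of letting later commands act on the wrong page. The following is a minimal sketch, not part of agent-browser: `ab_steps` and the `AB_CMD` override are hypothetical names introduced here, and the sketch assumes the CLI exits non-zero when a command fails.

```bash
# Hypothetical helper: run agent-browser commands in order,
# stopping at the first one that exits non-zero.
# AB_CMD is an illustrative override for the binary name (not an agent-browser setting).
ab_steps() {
  local step
  for step in "$@"; do
    # Each step is one command line such as 'click @e2'.
    # Word-splitting is intentional, so steps must not contain arguments with spaces.
    ${AB_CMD:-agent-browser} $step || {
      echo "step failed: $step" >&2
      return 1
    }
  done
}
```

A call such as `ab_steps "open example.com" "snapshot" "click @e2"` would then run the three commands in sequence. Steps that need quoted arguments (for example a `fill` value containing spaces) are better issued directly.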
84
+ ### Traditional Selectors (also supported)
85
+
86
+ ```bash
87
+ agent-browser click "#submit"
88
+ agent-browser fill "#email" "test@example.com"
89
+ agent-browser find role button click --name "Submit"
90
+ ```
91
+
92
+ ## Commands
93
+
94
+ ### Core Commands
95
+
96
+ ```bash
97
+ agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
98
+ agent-browser click <sel> # Click element (--new-tab to open in new tab)
99
+ agent-browser dblclick <sel> # Double-click element
100
+ agent-browser focus <sel> # Focus element
101
+ agent-browser type <sel> <text> # Type into element
102
+ agent-browser fill <sel> <text> # Clear and fill
103
+ agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
104
+ agent-browser keyboard type <text> # Type with real keystrokes (no selector, current focus)
105
+ agent-browser keyboard inserttext <text> # Insert text without key events (no selector)
106
+ agent-browser keydown <key> # Hold key down
107
+ agent-browser keyup <key> # Release key
108
+ agent-browser hover <sel> # Hover element
109
+ agent-browser select <sel> <val> # Select dropdown option
110
+ agent-browser check <sel> # Check checkbox
111
+ agent-browser uncheck <sel> # Uncheck checkbox
112
+ agent-browser scroll <dir> [px] # Scroll (up/down/left/right, --selector <sel>)
113
+ agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
114
+ agent-browser drag <src> <tgt> # Drag and drop
115
+ agent-browser upload <sel> <files> # Upload files
116
+ agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path)
117
+ agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
118
+ agent-browser pdf <path> # Save as PDF
119
+ agent-browser snapshot # Accessibility tree with refs (best for AI)
120
+ agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
121
+ agent-browser connect <port> # Connect to browser via CDP
122
+ agent-browser close # Close browser (aliases: quit, exit)
123
+ ```
124
+
125
+ ### Get Info
126
+
127
+ ```bash
128
+ agent-browser get text <sel> # Get text content
129
+ agent-browser get html <sel> # Get innerHTML
130
+ agent-browser get value <sel> # Get input value
131
+ agent-browser get attr <sel> <attr> # Get attribute
132
+ agent-browser get title # Get page title
133
+ agent-browser get url # Get current URL
134
+ agent-browser get count <sel> # Count matching elements
135
+ agent-browser get box <sel> # Get bounding box
136
+ agent-browser get styles <sel> # Get computed styles
137
+ ```
138
+
139
+ ### Check State
140
+
141
+ ```bash
142
+ agent-browser is visible <sel> # Check if visible
143
+ agent-browser is enabled <sel> # Check if enabled
144
+ agent-browser is checked <sel> # Check if checked
145
+ ```
146
+
147
+ ### Find Elements (Semantic Locators)
148
+
149
+ ```bash
150
+ agent-browser find role <role> <action> [value] # By ARIA role
151
+ agent-browser find text <text> <action> # By text content
152
+ agent-browser find label <label> <action> [value] # By label
153
+ agent-browser find placeholder <ph> <action> [value] # By placeholder
154
+ agent-browser find alt <text> <action> # By alt text
155
+ agent-browser find title <text> <action> # By title attr
156
+ agent-browser find testid <id> <action> [value] # By data-testid
157
+ agent-browser find first <sel> <action> [value] # First match
158
+ agent-browser find last <sel> <action> [value] # Last match
159
+ agent-browser find nth <n> <sel> <action> [value] # Nth match
160
+ ```
161
+
162
+ **Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`
163
+
164
+ **Options:** `--name <name>` (filter role by accessible name), `--exact` (require exact text match)
165
+
166
+ **Examples:**
167
+ ```bash
168
+ agent-browser find role button click --name "Submit"
169
+ agent-browser find text "Sign In" click
170
+ agent-browser find label "Email" fill "test@test.com"
171
+ agent-browser find first ".item" click
172
+ agent-browser find nth 2 "a" text
173
+ ```
174
+
175
+ ### Wait
176
+
177
+ ```bash
178
+ agent-browser wait <selector> # Wait for element to be visible
179
+ agent-browser wait <ms> # Wait for time (milliseconds)
180
+ agent-browser wait --text "Welcome" # Wait for text to appear
181
+ agent-browser wait --url "**/dash" # Wait for URL pattern
182
+ agent-browser wait --load networkidle # Wait for load state
183
+ agent-browser wait --fn "window.ready === true" # Wait for JS condition
184
+ ```
185
+
186
+ **Load states:** `load`, `domcontentloaded`, `networkidle`
187
+
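Waiting on a load state is typically paired with navigation. As a hedged sketch using only the `open`, `wait --load`, and `get title` commands listed in this document, the hypothetical function `ab_open_ready` below chains them so the title is read only after the network has gone idle. `AB_CMD` is an illustrative override for the binary name, not an agent-browser setting.

```bash
# Hypothetical helper: open a URL, wait for the network to go idle,
# then print the page title. Stops early if any step fails.
ab_open_ready() {
  local url="$1"
  ${AB_CMD:-agent-browser} open "$url" &&
    ${AB_CMD:-agent-browser} wait --load networkidle &&
    ${AB_CMD:-agent-browser} get title
}
```

For example, `ab_open_ready example.com` would run the three commands in order against the default session.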
188
+ ### Mouse Control
189
+
190
+ ```bash
191
+ agent-browser mouse move <x> <y> # Move mouse
192
+ agent-browser mouse down [button] # Press button (left/right/middle)
193
+ agent-browser mouse up [button] # Release button
194
+ agent-browser mouse wheel <dy> [dx] # Scroll wheel
195
+ ```
196
+
197
+ ### Browser Settings
198
+
199
+ ```bash
200
+ agent-browser set viewport <w> <h> # Set viewport size
201
+ agent-browser set device <name> # Emulate device ("iPhone 14")
202
+ agent-browser set geo <lat> <lng> # Set geolocation
203
+ agent-browser set offline [on|off] # Toggle offline mode
204
+ agent-browser set headers <json> # Extra HTTP headers
205
+ agent-browser set credentials <u> <p> # HTTP basic auth
206
+ agent-browser set media [dark|light] # Emulate color scheme
207
+ ```
208
+
209
+ ### Cookies & Storage
210
+
211
+ ```bash
212
+ agent-browser cookies # Get all cookies
213
+ agent-browser cookies set <name> <val> # Set cookie
214
+ agent-browser cookies clear # Clear cookies
215
+
216
+ agent-browser storage local # Get all localStorage
217
+ agent-browser storage local <key> # Get specific key
218
+ agent-browser storage local set <k> <v> # Set value
219
+ agent-browser storage local clear # Clear all
220
+
221
+ agent-browser storage session # Same for sessionStorage
222
+ ```
223
+
224
+ ### Network
225
+
226
+ ```bash
227
+ agent-browser network route <url> # Intercept requests
228
+ agent-browser network route <url> --abort # Block requests
229
+ agent-browser network route <url> --body <json> # Mock response
230
+ agent-browser network unroute [url] # Remove routes
231
+ agent-browser network requests # View tracked requests
232
+ agent-browser network requests --filter api # Filter requests
233
+ ```
234
+
235
+ ### Tabs & Windows
236
+
237
+ ```bash
238
+ agent-browser tab # List tabs
239
+ agent-browser tab new [url] # New tab (optionally with URL)
240
+ agent-browser tab <n> # Switch to tab n
241
+ agent-browser tab close [n] # Close tab
242
+ agent-browser window new # New window
243
+ ```
244
+
245
+ ### Frames
246
+
247
+ ```bash
248
+ agent-browser frame <sel> # Switch to iframe
249
+ agent-browser frame main # Back to main frame
250
+ ```
251
+
252
+ ### Dialogs
253
+
254
+ ```bash
255
+ agent-browser dialog accept [text] # Accept (with optional prompt text)
256
+ agent-browser dialog dismiss # Dismiss
257
+ ```
258
+
259
+ ### Diff
260
+
261
+ ```bash
262
+ agent-browser diff snapshot # Compare current vs last snapshot
263
+ agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
264
+ agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
265
+ agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
266
+ agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
267
+ agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
268
+ agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
269
+ agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
270
+ agent-browser diff url https://v1.com https://v2.com --wait-until networkidle # Custom wait strategy
271
+ agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
272
+ ```
273
+
274
+ ### Debug
275
+
276
+ ```bash
277
+ agent-browser trace start [path] # Start recording trace
278
+ agent-browser trace stop [path] # Stop and save trace
279
+ agent-browser profiler start # Start Chrome DevTools profiling
280
+ agent-browser profiler stop [path] # Stop and save profile (.json)
281
+ agent-browser console # View console messages (log, error, warn, info)
282
+ agent-browser console --clear # Clear console
283
+ agent-browser errors # View page errors (uncaught JavaScript exceptions)
284
+ agent-browser errors --clear # Clear errors
285
+ agent-browser highlight <sel> # Highlight element
286
+ agent-browser state save <path> # Save auth state
287
+ agent-browser state load <path> # Load auth state
288
+ agent-browser state list # List saved state files
289
+ agent-browser state show <file> # Show state summary
290
+ agent-browser state rename <old> <new> # Rename state file
291
+ agent-browser state clear [name] # Clear states for session
292
+ agent-browser state clear --all # Clear all saved states
293
+ agent-browser state clean --older-than <days> # Delete old states
294
+ ```
295
+
296
+ ### Navigation
297
+
298
+ ```bash
299
+ agent-browser back # Go back
300
+ agent-browser forward # Go forward
301
+ agent-browser reload # Reload page
302
+ ```
303
+
304
+ ### Setup
305
+
306
+ ```bash
307
+ agent-browser install # Download Chromium browser
308
+ agent-browser install --with-deps # Also install system deps (Linux)
309
+ ```
310
+
311
+ ## Sessions
312
+
313
+ Run multiple isolated browser instances:
314
+
315
+ ```bash
316
+ # Different sessions
317
+ agent-browser --session agent1 open site-a.com
318
+ agent-browser --session agent2 open site-b.com
319
+
320
+ # Or via environment variable
321
+ AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
322
+
323
+ # List active sessions
324
+ agent-browser session list
325
+ # Output:
326
+ # Active sessions:
327
+ # -> default
328
+ # agent1
329
+
330
+ # Show current session
331
+ agent-browser session
332
+ ```
333
+
334
+ Each session has its own:
335
+ - Browser instance
336
+ - Cookies and storage
337
+ - Navigation history
338
+ - Authentication state
339
+
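When several agents share one machine, passing the session name on every call is easy to forget. A small wrapper can make it explicit. This is only a sketch: `ab_in` and `AB_CMD` are hypothetical names, and the wrapper relies solely on the `--session` flag shown above.

```bash
# Hypothetical wrapper: run one agent-browser command inside a named session.
# Usage: ab_in <session> <command> [args...]
ab_in() {
  local session="$1"
  shift
  ${AB_CMD:-agent-browser} --session "$session" "$@"
}
```

With this in place, `ab_in agent1 open site-a.com` and `ab_in agent2 open site-b.com` would target two isolated browser instances.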
340
+ ## Persistent Profiles
341
+
342
+ By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use `--profile` to persist state across browser restarts:
343
+
344
+ ```bash
345
+ # Use a persistent profile directory
346
+ agent-browser --profile ~/.myapp-profile open myapp.com
347
+
348
+ # Login once, then reuse the authenticated session
349
+ agent-browser --profile ~/.myapp-profile open myapp.com/dashboard
350
+
351
+ # Or via environment variable
352
+ AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
353
+ ```
354
+
355
+ The profile directory stores:
356
+ - Cookies and localStorage
357
+ - IndexedDB data
358
+ - Service workers
359
+ - Browser cache
360
+ - Login sessions
361
+
362
+ **Tip**: Use different profile paths for different projects to keep their browser state isolated.
363
+
364
+ ## Session Persistence
365
+
366
+ Alternatively, use `--session-name` to automatically save and restore cookies and localStorage across browser restarts:
367
+
368
+ ```bash
369
+ # Auto-save/load state for "twitter" session
370
+ agent-browser --session-name twitter open twitter.com
371
+
372
+ # Login once, then state persists automatically
373
+ # State files stored in ~/.agent-browser/sessions/
374
+
375
+ # Or via environment variable
376
+ export AGENT_BROWSER_SESSION_NAME=twitter
377
+ agent-browser open twitter.com
378
+ ```
379
+
380
+ ### State Encryption
381
+
382
+ Encrypt saved session data at rest with AES-256-GCM:
383
+
384
+ ```bash
385
+ # Generate key: openssl rand -hex 32
386
+ export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex-key>
387
+
388
+ # State files are now encrypted automatically
389
+ agent-browser --session-name secure open example.com
390
+ ```
391
+
392
+ | Variable | Description |
393
+ |----------|-------------|
394
+ | `AGENT_BROWSER_SESSION_NAME` | Auto-save/load state persistence name |
395
+ | `AGENT_BROWSER_ENCRYPTION_KEY` | 64-char hex key for AES-256-GCM encryption |
396
+ | `AGENT_BROWSER_STATE_EXPIRE_DAYS` | Auto-delete states older than N days (default: 30) |
397
+
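Because the key must be a 64-character hex string, it can help to validate it before exporting. The check below is a sketch under that single assumption. `is_hex_key` is a hypothetical helper, and accepting both upper- and lowercase hex digits is a choice made here, not something the source confirms.

```bash
# Hypothetical check: succeed only if the argument is exactly 64 hex characters.
is_hex_key() {
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 64 ]              # also rejects empty or short strings
}
```

One way to use it: `key=$(openssl rand -hex 32)` followed by `is_hex_key "$key" && export AGENT_BROWSER_ENCRYPTION_KEY="$key"`.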
398
+ ## Security
399
+
400
+ agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature:
401
+
402
+ - **Authentication Vault** -- Store credentials locally (always encrypted) and reference them by name, so the LLM never sees passwords. If `AGENT_BROWSER_ENCRYPTION_KEY` is not set, a key is auto-generated at `~/.agent-browser/.encryption-key`. Example: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin`, then `agent-browser auth login github`
403
+ - **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
404
+ - **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
405
+ - **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json`
406
+ - **Action Confirmation** -- Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
407
+ - **Output Length Limits** -- Prevent context flooding: `--max-output 50000`
408
+
409
+ | Variable | Description |
410
+ |----------|-------------|
411
+ | `AGENT_BROWSER_CONTENT_BOUNDARIES` | Wrap page output in boundary markers |
412
+ | `AGENT_BROWSER_MAX_OUTPUT` | Max characters for page output |
413
+ | `AGENT_BROWSER_ALLOWED_DOMAINS` | Comma-separated allowed domain patterns |
414
+ | `AGENT_BROWSER_ACTION_POLICY` | Path to action policy JSON file |
415
+ | `AGENT_BROWSER_CONFIRM_ACTIONS` | Action categories requiring confirmation |
416
+ | `AGENT_BROWSER_CONFIRM_INTERACTIVE` | Enable interactive confirmation prompts |
417
+
418
+ See [Security documentation](https://agent-browser.vercel.app/security) for details.
419
+
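Since every feature is opt-in, one way to avoid forgetting a flag is to route all calls through a wrapper that always applies them. The sketch below combines three of the flags listed above. `ab_safe`, `ALLOWED`, and `AB_CMD` are hypothetical names, and placing the flags before the subcommand (as `--session` is placed elsewhere in this document) is an assumption rather than documented behavior.

```bash
# Hypothetical wrapper: apply a fixed set of opt-in security flags to every call.
# ALLOWED is an illustrative variable holding comma-separated domain patterns.
ab_safe() {
  local allowed="${ALLOWED:-example.com,*.example.com}"
  ${AB_CMD:-agent-browser} \
    --content-boundaries \
    --allowed-domains "$allowed" \
    --max-output 50000 \
    "$@"
}
```

For example, `ALLOWED="myapp.com,*.myapp.com" ab_safe open myapp.com` would restrict navigation to that domain and its subdomains while capping output length.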
420
+ ## Snapshot Options
421
+
422
+ The `snapshot` command supports filtering to reduce output size:
423
+
424
+ ```bash
425
+ agent-browser snapshot # Full accessibility tree
426
+ agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
427
+ agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
428
+ agent-browser snapshot -c # Compact (remove empty structural elements)
429
+ agent-browser snapshot -d 3 # Limit depth to 3 levels
430
+ agent-browser snapshot -s "#main" # Scope to CSS selector
431
+ agent-browser snapshot -i -c -d 5 # Combine options
432
+ ```
433
+
434
+ | Option | Description |
435
+ |--------|-------------|
436
+ | `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
437
+ | `-C, --cursor` | Include cursor-interactive elements (cursor:pointer, onclick, tabindex) |
438
+ | `-c, --compact` | Remove empty structural elements |
439
+ | `-d, --depth <n>` | Limit tree depth |
440
+ | `-s, --selector <sel>` | Scope to CSS selector |
441
+
442
+ The `-C` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.
443
+
444
+ ## Annotated Screenshots
445
+
446
+ The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows.
447
+
448
+ ```bash
449
+ agent-browser screenshot --annotate
450
+ # -> Screenshot saved to /tmp/screenshot-2026-02-17T12-00-00-abc123.png
451
+ # [1] @e1 button "Submit"
452
+ # [2] @e2 link "Home"
453
+ # [3] @e3 textbox "Email"
454
+ ```
455
+
456
+ After an annotated screenshot, refs are cached so you can immediately interact with elements:
457
+
458
+ ```bash
459
+ agent-browser screenshot --annotate ./page.png
460
+ agent-browser click @e2 # Click the "Home" link labeled [2]
461
+ ```
462
+
463
+ This is useful for multimodal AI models that can reason about visual layout, unlabeled icon buttons, canvas elements, or visual state that the text accessibility tree cannot capture.
464
+
465
+ ## Options
466
+
467
+ | Option | Description |
468
+ |--------|-------------|
469
+ | `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
470
+ | `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) |
471
+ | `--profile <path>` | Persistent browser profile directory (or `AGENT_BROWSER_PROFILE` env) |
472
+ | `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) |
473
+ | `--headers <json>` | Set HTTP headers scoped to the URL's origin |
474
+ | `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
475
+ | `--extension <path>` | Load browser extension (repeatable; or `AGENT_BROWSER_EXTENSIONS` env) |
476
+ | `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
477
+ | `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
478
+ | `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
479
+ | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
480
+ | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
481
+ | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
482
+ | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
483
+ | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
484
+ | `--json` | JSON output (for agents) |
485
+ | `--full, -f` | Full page screenshot |
486
+ | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
487
+ | `--headed` | Show browser window (not headless) |
488
+ | `--cdp <port\|url>` | Connect via Chrome DevTools Protocol (port or WebSocket URL) |
489
+ | `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) |
490
+ | `--color-scheme <scheme>` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) |
491
+ | `--download-path <path>` | Default download directory (or `AGENT_BROWSER_DOWNLOAD_PATH` env) |
492
+ | `--content-boundaries` | Wrap page output in boundary markers for LLM safety (or `AGENT_BROWSER_CONTENT_BOUNDARIES` env) |
493
+ | `--max-output <chars>` | Truncate page output to N characters (or `AGENT_BROWSER_MAX_OUTPUT` env) |
494
+ | `--allowed-domains <list>` | Comma-separated allowed domain patterns (or `AGENT_BROWSER_ALLOWED_DOMAINS` env) |
495
+ | `--action-policy <path>` | Path to action policy JSON file (or `AGENT_BROWSER_ACTION_POLICY` env) |
496
+ | `--confirm-actions <list>` | Action categories requiring confirmation (or `AGENT_BROWSER_CONFIRM_ACTIONS` env) |
497
+ | `--confirm-interactive` | Interactive confirmation prompts; auto-denies if stdin is not a TTY (or `AGENT_BROWSER_CONFIRM_INTERACTIVE` env) |
498
+ | `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) |
499
+ | `--debug` | Debug output |
500
+
501
+ ## Configuration
502
+
503
+ Create an `agent-browser.json` file to set persistent defaults instead of repeating flags on every command.
504
+
505
+ **Configuration sources (lowest to highest priority):**
506
+
507
+ 1. `~/.agent-browser/config.json` -- user-level defaults
508
+ 2. `./agent-browser.json` -- project-level overrides (in working directory)
509
+ 3. `AGENT_BROWSER_*` environment variables override config file values
510
+ 4. CLI flags override everything
511
+
512
+ **Example `agent-browser.json`:**
513
+
514
+ ```json
515
+ {
516
+ "headed": true,
517
+ "proxy": "http://localhost:8080",
518
+ "profile": "./browser-data",
519
+ "userAgent": "my-agent/1.0",
520
+ "ignoreHttpsErrors": true
521
+ }
522
+ ```
523
+
524
+ Use `--config <path>` or `AGENT_BROWSER_CONFIG` to load a specific config file instead of the defaults:
525
+
526
+ ```bash
527
+ agent-browser --config ./ci-config.json open example.com
528
+ AGENT_BROWSER_CONFIG=./ci-config.json agent-browser open example.com
529
+ ```
530
+
531
+ All options from the table above can be set in the config file using camelCase keys (e.g., `--executable-path` becomes `"executablePath"`, `--proxy-bypass` becomes `"proxyBypass"`). Unknown keys are ignored for forward compatibility.
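
The flag-to-key mapping described above is a plain kebab-case to camelCase conversion. A minimal sketch follows; `flagToConfigKey` is a hypothetical helper written for illustration and is not part of agent-browser.

```typescript
// Hypothetical helper: converts a CLI flag name to its camelCase config key.
function flagToConfigKey(flag: string): string {
  return flag
    .replace(/^--/, "") // drop the leading dashes
    .replace(/-([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

console.log(flagToConfigKey("--executable-path"));
```

The same rule accounts for the `"ignoreHttpsErrors"` key in the example config, which corresponds to `--ignore-https-errors`.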
532
+
533
+ Boolean flags accept an optional `true`/`false` value to override config settings. For example, `--headed false` disables `"headed": true` from config. A bare `--headed` is equivalent to `--headed true`.
534
+
535
+ Auto-discovered config files that are missing are silently ignored. If `--config <path>` points to a missing or invalid file, agent-browser exits with an error. Extensions from user and project configs are merged (concatenated), not replaced.
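
The precedence order and the extension-merging rule described above can be sketched as a layered merge. This is an illustration under stated assumptions rather than the tool's real code: `BrowserConfig` and `resolveConfig` are hypothetical names, the `extensions` key name is assumed, and letting an environment or CLI value replace the merged list is also an assumption (the README only states that user and project extensions are concatenated).

```typescript
// Hypothetical shape: only a few of the documented options are modelled.
interface BrowserConfig {
  headed?: boolean;
  proxy?: string;
  userAgent?: string;
  extensions?: string[]; // assumed key name
}

// Later layers win: user config < project config < environment < CLI flags.
function resolveConfig(
  user: BrowserConfig,
  project: BrowserConfig,
  env: BrowserConfig,
  cli: BrowserConfig,
): BrowserConfig {
  const merged: BrowserConfig = { ...user, ...project, ...env, ...cli };
  // Extensions from user and project configs are concatenated, not replaced.
  const fromFiles = [...(user.extensions ?? []), ...(project.extensions ?? [])];
  // Assumption: an explicit env or CLI value overrides the concatenated list.
  const extensions = cli.extensions ?? env.extensions ?? fromFiles;
  if (extensions.length > 0) merged.extensions = extensions;
  return merged;
}
```

For example, a project file that sets `proxy` overrides the user-level `proxy`, while a CLI `--headed` still wins over both files.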
536
+
537
+ > **Tip:** If your project-level `agent-browser.json` contains environment-specific values (paths, proxies), consider adding it to `.gitignore`.
538
+
539
+ ## Default Timeout
540
+
541
+ The default Playwright timeout for standard operations (clicks, waits, fills, etc.) is 25 seconds. This is intentionally below the CLI's 30-second IPC read timeout so that Playwright returns a proper error instead of the CLI timing out with EAGAIN.
542
+
543
+ Override the default timeout via environment variable:
544
+
545
+ ```bash
546
+ # Set a longer timeout for slow pages (in milliseconds)
547
+ export AGENT_BROWSER_DEFAULT_TIMEOUT=45000
548
+ ```
549
+
550
+ > **Note:** Setting this above 30000 (30s) may cause EAGAIN errors on slow operations because the CLI's read timeout will expire before Playwright responds. The CLI retries transient errors automatically, but response times will increase.
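
The relationship between the two timeouts can be made explicit with a small check. The sketch below is hypothetical (`resolveTimeout` is not an agent-browser function), and falling back to the default for missing or invalid values is an assumption.

```typescript
const DEFAULT_TIMEOUT_MS = 25000; // documented Playwright default
const IPC_READ_TIMEOUT_MS = 30000; // documented CLI IPC read timeout

// Hypothetical helper: interprets a raw AGENT_BROWSER_DEFAULT_TIMEOUT value.
function resolveTimeout(raw: string | undefined): {
  timeoutMs: number;
  exceedsIpcTimeout: boolean;
} {
  const parsed = raw === undefined ? NaN : Number(raw);
  // Assumption: missing, non-numeric, or non-positive values use the default.
  const timeoutMs =
    Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUT_MS;
  // Above the IPC read timeout, the CLI may report EAGAIN before Playwright errors.
  return { timeoutMs, exceedsIpcTimeout: timeoutMs > IPC_READ_TIMEOUT_MS };
}

console.log(resolveTimeout("45000"));
```

In a Node.js script the argument would typically be `process.env.AGENT_BROWSER_DEFAULT_TIMEOUT`.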
551
+
552
+ | Variable | Description |
553
+ |----------|-------------|
554
+ | `AGENT_BROWSER_DEFAULT_TIMEOUT` | Default Playwright timeout in ms (default: 25000) |
555
+
556
+ ## Selectors
557
+
558
+ ### Refs (Recommended for AI)
559
+
560
+ Refs provide deterministic element selection from snapshots:
561
+
562
+ ```bash
563
+ # 1. Get snapshot with refs
564
+ agent-browser snapshot
565
+ # Output:
566
+ # - heading "Example Domain" [ref=e1] [level=1]
567
+ # - button "Submit" [ref=e2]
568
+ # - textbox "Email" [ref=e3]
569
+ # - link "Learn more" [ref=e4]
570
+
571
+ # 2. Use refs to interact
572
+ agent-browser click @e2 # Click the button
573
+ agent-browser fill @e3 "test@example.com" # Fill the textbox
574
+ agent-browser get text @e1 # Get heading text
575
+ agent-browser hover @e4 # Hover the link
576
+ ```
577
+
578
+ **Why use refs?**
579
+ - **Deterministic**: A ref points to the exact element from the snapshot
580
+ - **Fast**: No DOM re-query needed
581
+ - **AI-friendly**: Snapshot + ref workflow is optimal for LLMs
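
To show how the text snapshot maps to refs, here is a minimal parser for the output format in the example above. The line format is inferred from that sample only, so treat the regular expression as an assumption; `parseSnapshotRefs` is a hypothetical helper, and `--json` output is the more robust choice for real agents.

```typescript
interface SnapshotRef {
  ref: string; // e.g. "@e2", ready to pass to click/fill
  role: string; // e.g. "button"
  name: string; // accessible name, e.g. "Submit"
}

// Hypothetical parser for lines such as: - button "Submit" [ref=e2]
function parseSnapshotRefs(snapshot: string): SnapshotRef[] {
  const linePattern = /^\s*-\s+(\w+)\s+"([^"]*)".*\[ref=(e\d+)\]/;
  const refs: SnapshotRef[] = [];
  for (const text of snapshot.split("\n")) {
    const m = linePattern.exec(text);
    if (m) {
      refs.push({ role: m[1] ?? "", name: m[2] ?? "", ref: "@" + (m[3] ?? "") });
    }
  }
  return refs;
}

const sampleSnapshot = [
  '- heading "Example Domain" [ref=e1] [level=1]',
  '- button "Submit" [ref=e2]',
  '- textbox "Email" [ref=e3]',
  '- link "Learn more" [ref=e4]',
].join("\n");

console.log(parseSnapshotRefs(sampleSnapshot));
```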
582
+
583
+ ### CSS Selectors
584
+
585
+ ```bash
586
+ agent-browser click "#id"
587
+ agent-browser click ".class"
588
+ agent-browser click "div > button"
589
+ ```
590
+
591
+ ### Text & XPath
592
+
593
+ ```bash
594
+ agent-browser click "text=Submit"
595
+ agent-browser click "xpath=//button"
596
+ ```
597
+
598
+ ### Semantic Locators
599
+
600
+ ```bash
601
+ agent-browser find role button click --name "Submit"
602
+ agent-browser find label "Email" fill "test@test.com"
603
+ ```
604
+
605
+ ## Agent Mode
606
+
607
+ Use `--json` for machine-readable output:
608
+
609
+ ```bash
610
+ agent-browser snapshot --json
611
+ # Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
612
+
613
+ agent-browser get text @e1 --json
614
+ agent-browser is visible @e2 --json
615
+ ```
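
An agent would typically parse the `--json` response and look up a ref by role and name. The sketch below assumes the response shape shown in the comment above; `findRef` is a hypothetical helper, and treating a response without `success: true` as "no match" is an assumption, since the failure shape is not documented here.

```typescript
interface RefInfo {
  role: string;
  name: string;
}

// Shape taken from the sample output above; other fields may exist.
interface SnapshotResponse {
  success: boolean;
  data?: { snapshot: string; refs: Record<string, RefInfo> };
}

// Hypothetical helper: returns a ref such as "@e1", or undefined if none matches.
function findRef(rawJson: string, role: string, label: string): string | undefined {
  const res = JSON.parse(rawJson) as SnapshotResponse;
  if (!res.success || !res.data) return undefined; // assumed failure handling
  for (const [id, info] of Object.entries(res.data.refs)) {
    if (info.role === role && info.name === label) return "@" + id;
  }
  return undefined;
}

const sampleResponse =
  '{"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"}}}}';
console.log(findRef(sampleResponse, "heading", "Title"));
```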
616
+
617
+ ### Optimal AI Workflow
618
+
619
+ ```bash
620
+ # 1. Navigate and get snapshot
621
+ agent-browser open example.com
622
+ agent-browser snapshot -i --json # AI parses tree and refs
623
+
624
+ # 2. AI identifies target refs from snapshot
625
+ # 3. Execute actions using refs
626
+ agent-browser click @e2
627
+ agent-browser fill @e3 "input text"
628
+
629
+ # 4. Get new snapshot if page changed
630
+ agent-browser snapshot -i --json
631
+ ```
632
+
633
+ ### Command Chaining
634
+
635
+ Commands can be chained with `&&` in a single shell invocation. The browser persists via a background daemon, so chaining is safe and more efficient:
636
+
637
+ ```bash
638
+ # Open, wait for load, and snapshot in one call
639
+ agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
640
+
641
+ # Chain multiple interactions
642
+ agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
643
+
644
+ # Navigate and screenshot
645
+ agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
646
+ ```
647
+
648
+ Use `&&` when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs before interacting).
649
+
650
+ ## Headed Mode
651
+
652
+ Show the browser window for debugging:
653
+
654
+ ```bash
655
+ agent-browser open example.com --headed
656
+ ```
657
+
658
+ This opens a visible browser window instead of running headless.
659
+
660
+ ## Authenticated Sessions
661
+
662
+ Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows:
663
+
664
+ ```bash
665
+ # Headers are scoped to api.example.com only
666
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
667
+
668
+ # Requests to api.example.com include the auth header
669
+ agent-browser snapshot -i --json
670
+ agent-browser click @e2
671
+
672
+ # Navigate to another domain - headers are NOT sent (safe!)
673
+ agent-browser open other-site.com
674
+ ```
675
+
676
+ This is useful for:
677
+ - **Skipping login flows** - Authenticate via headers instead of UI
678
+ - **Switching users** - Start new sessions with different auth tokens
679
+ - **API testing** - Access protected endpoints directly
680
+ - **Security** - Headers are scoped to the origin, not leaked to other domains
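
The origin scoping described above can be pictured as a lookup keyed by URL origin. This is a conceptual sketch, not agent-browser's implementation: `ScopedHeaders` is a hypothetical class, and it expects full URLs with a scheme, whereas the CLI accepts bare hosts such as `api.example.com`.

```typescript
type HeaderMap = Record<string, string>;

// Hypothetical model of origin-scoped headers.
class ScopedHeaders {
  private byOrigin = new Map<string, HeaderMap>();

  // Register headers for the origin of the given URL.
  set(url: string, headers: HeaderMap): void {
    this.byOrigin.set(new URL(url).origin, headers);
  }

  // Return headers only when the request origin matches exactly.
  headersFor(requestUrl: string): HeaderMap {
    return this.byOrigin.get(new URL(requestUrl).origin) ?? {};
  }
}

const scoped = new ScopedHeaders();
scoped.set("https://api.example.com", { Authorization: "Bearer <token>" });
console.log(scoped.headersFor("https://other-site.com/")); // no headers for other origins
```

Because the key is the full origin (scheme, host, and port), a request to a different host or port receives no stored headers in this model.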
681
+
682
+ To set headers for multiple origins, use `--headers` with each `open` command:
683
+
684
+ ```bash
685
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
686
+ agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
687
+ ```
688
+
689
+ For global headers (all domains), use `set headers`:
690
+
691
+ ```bash
692
+ agent-browser set headers '{"X-Custom-Header": "value"}'
693
+ ```
694
+
695
+ ## Custom Browser Executable
696
+
697
+ Use a custom browser executable instead of the bundled Chromium. This is useful for:
698
+ - **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
699
+ - **System browsers**: Use an existing Chrome/Chromium installation
700
+ - **Custom builds**: Use modified browser builds
701
+
702
+ ### CLI Usage
703
+
704
+ ```bash
705
+ # Via flag
706
+ agent-browser --executable-path /path/to/chromium open example.com
707
+
708
+ # Via environment variable
709
+ AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
710
+ ```
711
+
712
+ ### Serverless Example (Vercel/AWS Lambda)
713
+
714
+ ```typescript
715
+ import chromium from '@sparticuz/chromium';
716
+ import { BrowserManager } from 'agent-browser';
717
+
718
+ export async function handler() {
719
+ const browser = new BrowserManager();
720
+ await browser.launch({
721
+ executablePath: await chromium.executablePath(),
722
+ headless: true,
723
+ });
724
+ // ... use browser
725
+ }
726
+ ```
727
+
728
+ ## Local Files
729
+
730
+ Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:
731
+
732
+ ```bash
733
+ # Enable file access (required for JavaScript to access local files)
734
+ agent-browser --allow-file-access open file:///path/to/document.pdf
735
+ agent-browser --allow-file-access open file:///path/to/page.html
736
+
737
+ # Take screenshot of a local PDF
738
+ agent-browser --allow-file-access open file:///Users/me/report.pdf
739
+ agent-browser screenshot report.png
740
+ ```
741
+
742
+ The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:
743
+ - Load and render local files
744
+ - Access other local files via JavaScript (XHR, fetch)
745
+ - Load local resources (images, scripts, stylesheets)
746
+
747
+ **Note:** This flag only works with Chromium. For security, it's disabled by default.
748
+
749
+ ## CDP Mode
750
+
751
+ Connect to an existing browser via Chrome DevTools Protocol:
752
+
753
+ ```bash
754
+ # Start Chrome with: google-chrome --remote-debugging-port=9222
755
+
756
+ # Connect once, then run commands without --cdp
757
+ agent-browser connect 9222
758
+ agent-browser snapshot
759
+ agent-browser tab
760
+ agent-browser close
761
+
762
+ # Or pass --cdp on each command
763
+ agent-browser --cdp 9222 snapshot
764
+
765
+ # Connect to remote browser via WebSocket URL
766
+ agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
767
+ ```
768
+
769
+ The `--cdp` flag accepts either:
770
+ - A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
771
+ - A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services
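
The two accepted forms can be distinguished with a simple check. The following is a sketch of that rule only; `resolveCdpTarget` is a hypothetical name, and rejecting any other input with an error is an assumption about behaviour the README does not specify.

```typescript
// Hypothetical helper: maps a --cdp value to the endpoint it refers to.
function resolveCdpTarget(value: string): string {
  if (/^\d+$/.test(value)) {
    const port = Number(value);
    if (port < 1 || port > 65535) {
      throw new RangeError(`Invalid port: ${value}`);
    }
    // Port numbers refer to a local browser.
    return `http://localhost:${port}`;
  }
  if (/^wss?:\/\//.test(value)) {
    // Full WebSocket URLs are used as-is for remote browser services.
    return value;
  }
  throw new Error(`Expected a port or a ws:// or wss:// URL, got: ${value}`);
}

console.log(resolveCdpTarget("9222"));
```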
772
+
773
+ This enables control of:
774
+ - Electron apps
775
+ - Chrome/Chromium instances with remote debugging
776
+ - WebView2 applications
777
+ - Any browser exposing a CDP endpoint
778
+
779
+ ### Auto-Connect
780
+
781
+ Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port:
782
+
783
+ ```bash
784
+ # Auto-discover running Chrome with remote debugging
785
+ agent-browser --auto-connect open example.com
786
+ agent-browser --auto-connect snapshot
787
+
788
+ # Or via environment variable
789
+ AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot
790
+ ```
791
+
792
+ Auto-connect discovers Chrome by:
793
+ 1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
794
+ 2. Falling back to probing common debugging ports (9222, 9229)
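
As a rough illustration of the first discovery step, the sketch below parses the contents of a `DevToolsActivePort` file. It assumes the commonly observed two-line layout (port on the first line, browser WebSocket path on the second); that layout, the `127.0.0.1` host, and the helper name `parseDevToolsActivePort` are assumptions, not details confirmed by this README.

```typescript
// Hypothetical parser for DevToolsActivePort contents (assumed two-line layout).
function parseDevToolsActivePort(
  contents: string,
): { port: number; wsUrl: string } | undefined {
  const [portLine, pathLine] = contents.split(/\r?\n/);
  const port = Number(portLine);
  if (!Number.isInteger(port) || port <= 0 || !pathLine) return undefined;
  return { port, wsUrl: `ws://127.0.0.1:${port}${pathLine.trim()}` };
}

// Fallback ports named in the README, probed when the file is unavailable.
const fallbackPorts = [9222, 9229];

console.log(parseDevToolsActivePort("9222\n/devtools/browser/abc-123\n"));
```

Returning `undefined` for malformed contents lets a caller move on to probing `fallbackPorts`.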
795
+
796
+ This is useful when:
797
+ - Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port)
798
+ - You want a zero-configuration connection to your existing browser
799
+ - You don't want to track which port Chrome is using
800
+
801
+ ## Streaming (Browser Preview)
802
+
803
+ Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
804
+
805
+ ### Enable Streaming
806
+
807
+ Set the `AGENT_BROWSER_STREAM_PORT` environment variable:
808
+
809
+ ```bash
810
+ AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
811
+ ```
812
+
813
+ This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
814
+
815
+ ### WebSocket Protocol
816
+
817
+ Connect to `ws://localhost:9223` to receive frames and send input:
818
+
819
+ **Receive frames:**
820
+ ```json
821
+ {
822
+ "type": "frame",
823
+ "data": "<base64-encoded-jpeg>",
824
+ "metadata": {
825
+ "deviceWidth": 1280,
826
+ "deviceHeight": 720,
827
+ "pageScaleFactor": 1,
828
+ "offsetTop": 0,
829
+ "scrollOffsetX": 0,
830
+ "scrollOffsetY": 0
831
+ }
832
+ }
833
+ ```
834
+
835
+ **Send mouse events:**
836
+ ```json
837
+ {
838
+ "type": "input_mouse",
839
+ "eventType": "mousePressed",
840
+ "x": 100,
841
+ "y": 200,
842
+ "button": "left",
843
+ "clickCount": 1
844
+ }
845
+ ```
846
+
847
+ **Send keyboard events:**
848
+ ```json
849
+ {
850
+ "type": "input_keyboard",
851
+ "eventType": "keyDown",
852
+ "key": "Enter",
853
+ "code": "Enter"
854
+ }
855
+ ```
856
+
857
+ **Send touch events:**
858
+ ```json
859
+ {
860
+ "type": "input_touch",
861
+ "eventType": "touchStart",
862
+ "touchPoints": [{ "x": 100, "y": 200 }]
863
+ }
864
+ ```
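
A client may find it convenient to give these messages types. The definitions below are derived only from the JSON samples above, so any additional fields or event types are unknown here; the names `FrameMessage`, `MouseInput`, `parseFrame`, and `mousePressMessage` are hypothetical.

```typescript
// Shape of a streamed frame, as shown in the sample above.
interface FrameMessage {
  type: "frame";
  data: string; // base64-encoded JPEG
  metadata: {
    deviceWidth: number;
    deviceHeight: number;
    pageScaleFactor: number;
    offsetTop: number;
    scrollOffsetX: number;
    scrollOffsetY: number;
  };
}

// Shape of a mouse input message, as shown in the sample above.
interface MouseInput {
  type: "input_mouse";
  eventType: string;
  x: number;
  y: number;
  button: string;
  clickCount: number;
}

// Returns the frame if the raw message looks like one, otherwise undefined.
function parseFrame(raw: string): FrameMessage | undefined {
  const msg = JSON.parse(raw) as Partial<FrameMessage>;
  if (msg.type !== "frame" || typeof msg.data !== "string" || !msg.metadata) {
    return undefined;
  }
  return msg as FrameMessage;
}

// Builds the serialized "mousePressed" message from the sample.
function mousePressMessage(x: number, y: number): string {
  const msg: MouseInput = {
    type: "input_mouse",
    eventType: "mousePressed",
    x,
    y,
    button: "left",
    clickCount: 1,
  };
  return JSON.stringify(msg);
}
```

The string returned by `mousePressMessage(100, 200)` could be passed to a WebSocket client's `send` method; `parseFrame` would be called on each incoming message.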
865
+
866
+ ### Programmatic API
867
+
868
+ For advanced use, control streaming directly from code via `BrowserManager`:
869
+
870
+ ```typescript
871
+ import { BrowserManager } from 'agent-browser';
872
+
873
+ const browser = new BrowserManager();
874
+ await browser.launch({ headless: true });
875
+ await browser.navigate('https://example.com');
876
+
877
+ // Start screencast
878
+ await browser.startScreencast((frame) => {
879
+ // frame.data is base64-encoded image
880
+ // frame.metadata contains viewport info
881
+ console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
882
+ }, {
883
+ format: 'jpeg',
884
+ quality: 80,
885
+ maxWidth: 1280,
886
+ maxHeight: 720,
887
+ });
888
+
889
+ // Inject mouse events
890
+ await browser.injectMouseEvent({
891
+ type: 'mousePressed',
892
+ x: 100,
893
+ y: 200,
894
+ button: 'left',
895
+ });
896
+
897
+ // Inject keyboard events
898
+ await browser.injectKeyboardEvent({
899
+ type: 'keyDown',
900
+ key: 'Enter',
901
+ code: 'Enter',
902
+ });
903
+
904
+ // Stop when done
905
+ await browser.stopScreencast();
906
+ ```
907
+
908
+ ## Architecture
909
+
910
+ agent-browser uses a client-daemon architecture:
911
+
912
+ 1. **Rust CLI** (fast native binary) - Parses commands, communicates with daemon
913
+ 2. **Node.js Daemon** - Manages Playwright browser instance
914
+ 3. **Fallback** - If the native binary is unavailable, the CLI runs on Node.js directly
915
+
916
+ The daemon starts automatically on first command and persists between commands for fast subsequent operations.
917
+
918
+ **Browser Engine:** Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.
919
+
920
+ ## Platforms
921
+
922
+ | Platform | Binary | Fallback |
923
+ |----------|--------|----------|
924
+ | macOS ARM64 | Native Rust | Node.js |
925
+ | macOS x64 | Native Rust | Node.js |
926
+ | Linux ARM64 | Native Rust | Node.js |
927
+ | Linux x64 | Native Rust | Node.js |
928
+ | Windows x64 | Native Rust | Node.js |
929
+
930
+ ## Usage with AI Agents
931
+
932
+ ### Just ask the agent
933
+
934
+ The simplest approach is to tell your agent to use it:
935
+
936
+ ```
937
+ Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
938
+ ```
939
+
940
+ The `--help` output is comprehensive and most agents can figure it out from there.
941
+
942
+ ### AI Coding Assistants (recommended)
943
+
944
+ Add the skill to your AI coding assistant for richer context:
945
+
946
+ ```bash
947
+ npx skills add vercel-labs/agent-browser
948
+ ```
949
+
950
+ This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.
951
+
952
+ ### Claude Code
953
+
954
+ Install as a Claude Code skill:
955
+
956
+ ```bash
957
+ npx skills add vercel-labs/agent-browser
958
+ ```
959
+
960
+ This adds the skill to `.claude/skills/agent-browser/SKILL.md` in your project. The skill teaches Claude Code the full agent-browser workflow, including the snapshot-ref interaction pattern, session management, and timeout handling.
961
+
962
+ ### AGENTS.md / CLAUDE.md
963
+
964
+ For more consistent results, add to your project or global instructions file:
965
+
966
+ ```markdown
967
+ ## Browser Automation
968
+
969
+ Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
970
+
971
+ Core workflow:
972
+ 1. `agent-browser open <url>` - Navigate to page
973
+ 2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
974
+ 3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
975
+ 4. Re-snapshot after page changes
976
+ ```
977
+
978
+ ## Integrations
979
+
980
+ ### iOS Simulator
981
+
982
+ Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
983
+
984
+ **Setup:**
985
+
986
+ ```bash
987
+ # Install Appium and XCUITest driver
988
+ npm install -g appium
989
+ appium driver install xcuitest
990
+ ```
991
+
992
+ **Usage:**
993
+
994
+ ```bash
995
+ # List available iOS simulators
996
+ agent-browser device list
997
+
998
+ # Launch Safari on a specific device
999
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
1000
+
1001
+ # Same commands as desktop
1002
+ agent-browser -p ios snapshot -i
1003
+ agent-browser -p ios tap @e1
1004
+ agent-browser -p ios fill @e2 "text"
1005
+ agent-browser -p ios screenshot mobile.png
1006
+
1007
+ # Mobile-specific commands
1008
+ agent-browser -p ios swipe up
1009
+ agent-browser -p ios swipe down 500
1010
+
1011
+ # Close session
1012
+ agent-browser -p ios close
1013
+ ```
1014
+
1015
+ Or use environment variables:
1016
+
1017
+ ```bash
1018
+ export AGENT_BROWSER_PROVIDER=ios
1019
+ export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
1020
+ agent-browser open https://example.com
1021
+ ```
1022
+
1023
+ | Variable | Description |
1024
+ |----------|-------------|
1025
+ | `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode |
1026
+ | `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
1027
+ | `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) |
1028
+
1029
+ **Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.
1030
+
1031
+ **Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.
1032
+
1033
+ #### Real Device Support
1034
+
1035
+ Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
1036
+
1037
+ **1. Get your device UDID:**
1038
+ ```bash
1039
+ xcrun xctrace list devices
1040
+ # or
1041
+ system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
1042
+ ```
1043
+
1044
+ **2. Sign WebDriverAgent (one-time):**
1045
+ ```bash
1046
+ # Open the WebDriverAgent Xcode project
1047
+ cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
1048
+ open WebDriverAgent.xcodeproj
1049
+ ```
1050
+
1051
+ In Xcode:
1052
+ - Select the `WebDriverAgentRunner` target
1053
+ - Go to Signing & Capabilities
1054
+ - Select your Team (requires Apple Developer account, free tier works)
1055
+ - Let Xcode manage signing automatically
1056
+
1057
+ **3. Use with agent-browser:**
1058
+ ```bash
1059
+ # Connect device via USB, then:
1060
+ agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com
1061
+
1062
+ # Or use the device name if unique
1063
+ agent-browser -p ios --device "John's iPhone" open https://example.com
1064
+ ```
1065
+
1066
+ **Real device notes:**
1067
+ - First run installs WebDriverAgent to the device (may require Trust prompt)
1068
+ - Device must be unlocked and connected via USB
1069
+ - Slightly slower initial connection than simulator
1070
+ - Tests against real Safari performance and behavior
1071
+
1072
+ ### Browserbase
1073
+
1074
+ [Browserbase](https://browserbase.com) provides remote browser infrastructure that simplifies deploying browsing agents. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
1075
+
1076
+ To enable Browserbase, use the `-p` flag:
1077
+
1078
+ ```bash
1079
+ export BROWSERBASE_API_KEY="your-api-key"
1080
+ export BROWSERBASE_PROJECT_ID="your-project-id"
1081
+ agent-browser -p browserbase open https://example.com
1082
+ ```
1083
+
1084
+ Or use environment variables for CI/scripts:
1085
+
1086
+ ```bash
1087
+ export AGENT_BROWSER_PROVIDER=browserbase
1088
+ export BROWSERBASE_API_KEY="your-api-key"
1089
+ export BROWSERBASE_PROJECT_ID="your-project-id"
1090
+ agent-browser open https://example.com
1091
+ ```
1092
+
1093
+ When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.
1094
+
1095
+ Get your API key and project ID from the [Browserbase Dashboard](https://browserbase.com/overview).
1096
+
1097
+ ### Browser Use
1098
+
1099
+ [Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).
1100
+
1101
+ To enable Browser Use, use the `-p` flag:
1102
+
1103
+ ```bash
1104
+ export BROWSER_USE_API_KEY="your-api-key"
1105
+ agent-browser -p browseruse open https://example.com
1106
+ ```
1107
+
1108
+ Or use environment variables for CI/scripts:
1109
+
1110
+ ```bash
1111
+ export AGENT_BROWSER_PROVIDER=browseruse
1112
+ export BROWSER_USE_API_KEY="your-api-key"
1113
+ agent-browser open https://example.com
1114
+ ```
1115
+
1116
+ When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.
1117
+
1118
+ Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started, with pay-as-you-go pricing afterward.
1119
+
1120
+ ### Kernel
1121
+
1122
+ [Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
1123
+
1124
+ To enable Kernel, use the `-p` flag:
1125
+
1126
+ ```bash
1127
+ export KERNEL_API_KEY="your-api-key"
1128
+ agent-browser -p kernel open https://example.com
1129
+ ```
1130
+
1131
+ Or use environment variables for CI/scripts:
1132
+
1133
+ ```bash
1134
+ export AGENT_BROWSER_PROVIDER=kernel
1135
+ export KERNEL_API_KEY="your-api-key"
1136
+ agent-browser open https://example.com
1137
+ ```
1138
+
1139
+ Optional configuration via environment variables:
1140
+
1141
+ | Variable | Description | Default |
1142
+ |----------|-------------|---------|
1143
+ | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
1144
+ | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
1145
+ | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
1146
+ | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
1147
+
1148
+ When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.
1149
+
1150
+ **Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.
1151
+
1152
+ Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com).
1153
+
1154
+ ## License
1155
+
1156
+ Apache-2.0