@monoes/monomindcli 1.10.28 → 1.10.30

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (87)
  1. package/.claude/helpers/auto-memory-hook.mjs +39 -4
  2. package/.claude/helpers/handlers/edit-handler.cjs +145 -0
  3. package/.claude/helpers/handlers/route-handler.cjs +393 -0
  4. package/.claude/helpers/handlers/session-handler.cjs +167 -0
  5. package/.claude/helpers/handlers/session-restore-handler.cjs +343 -0
  6. package/.claude/helpers/handlers/task-handler.cjs +329 -0
  7. package/.claude/helpers/hook-handler.cjs +114 -2247
  8. package/.claude/helpers/intelligence.cjs +21 -2
  9. package/.claude/helpers/learning-service.mjs +166 -8
  10. package/.claude/helpers/memory-palace.cjs +72 -12
  11. package/.claude/helpers/router.cjs +79 -5
  12. package/.claude/helpers/statusline.cjs +193 -399
  13. package/.claude/helpers/utils/micro-agents.cjs +338 -0
  14. package/.claude/helpers/utils/monograph.cjs +349 -0
  15. package/.claude/helpers/utils/telemetry.cjs +144 -0
  16. package/.claude/skills/agent-browser-testing/SKILL.md +3 -2
  17. package/.claude/skills/monomind/browse-agentcore.md +116 -0
  18. package/.claude/skills/monomind/browse-electron.md +189 -0
  19. package/.claude/skills/monomind/browse-qa.md +229 -0
  20. package/.claude/skills/monomind/browse-references/authentication.md +162 -0
  21. package/.claude/skills/monomind/browse-references/trust-boundaries.md +41 -0
  22. package/.claude/skills/monomind/browse-references/video-recording.md +84 -0
  23. package/.claude/skills/monomind/browse-slack.md +189 -0
  24. package/.claude/skills/monomind/browse-vercel.md +240 -0
  25. package/.claude/skills/monomind/browse.md +724 -0
  26. package/dist/src/browser/actions.d.ts +13 -0
  27. package/dist/src/browser/actions.d.ts.map +1 -0
  28. package/dist/src/browser/actions.js +201 -0
  29. package/dist/src/browser/actions.js.map +1 -0
  30. package/dist/src/browser/browser.d.ts +14 -0
  31. package/dist/src/browser/browser.d.ts.map +1 -0
  32. package/dist/src/browser/browser.js +198 -0
  33. package/dist/src/browser/browser.js.map +1 -0
  34. package/dist/src/browser/cdp.d.ts +17 -0
  35. package/dist/src/browser/cdp.d.ts.map +1 -0
  36. package/dist/src/browser/cdp.js +106 -0
  37. package/dist/src/browser/cdp.js.map +1 -0
  38. package/dist/src/browser/index.d.ts +11 -0
  39. package/dist/src/browser/index.d.ts.map +1 -0
  40. package/dist/src/browser/index.js +11 -0
  41. package/dist/src/browser/index.js.map +1 -0
  42. package/dist/src/browser/network.d.ts +11 -0
  43. package/dist/src/browser/network.d.ts.map +1 -0
  44. package/dist/src/browser/network.js +81 -0
  45. package/dist/src/browser/network.js.map +1 -0
  46. package/dist/src/browser/screenshot.d.ts +15 -0
  47. package/dist/src/browser/screenshot.d.ts.map +1 -0
  48. package/dist/src/browser/screenshot.js +36 -0
  49. package/dist/src/browser/screenshot.js.map +1 -0
  50. package/dist/src/browser/session.d.ts +8 -0
  51. package/dist/src/browser/session.d.ts.map +1 -0
  52. package/dist/src/browser/session.js +50 -0
  53. package/dist/src/browser/session.js.map +1 -0
  54. package/dist/src/browser/snapshot.d.ts +12 -0
  55. package/dist/src/browser/snapshot.d.ts.map +1 -0
  56. package/dist/src/browser/snapshot.js +147 -0
  57. package/dist/src/browser/snapshot.js.map +1 -0
  58. package/dist/src/browser/tabs.d.ts +8 -0
  59. package/dist/src/browser/tabs.d.ts.map +1 -0
  60. package/dist/src/browser/tabs.js +25 -0
  61. package/dist/src/browser/tabs.js.map +1 -0
  62. package/dist/src/browser/types.d.ts +109 -0
  63. package/dist/src/browser/types.d.ts.map +1 -0
  64. package/dist/src/browser/types.js +16 -0
  65. package/dist/src/browser/types.js.map +1 -0
  66. package/dist/src/browser/wait.d.ts +4 -0
  67. package/dist/src/browser/wait.d.ts.map +1 -0
  68. package/dist/src/browser/wait.js +122 -0
  69. package/dist/src/browser/wait.js.map +1 -0
  70. package/dist/src/commands/browse.d.ts +8 -0
  71. package/dist/src/commands/browse.d.ts.map +1 -0
  72. package/dist/src/commands/browse.js +573 -0
  73. package/dist/src/commands/browse.js.map +1 -0
  74. package/dist/src/commands/index.d.ts.map +1 -1
  75. package/dist/src/commands/index.js +2 -0
  76. package/dist/src/commands/index.js.map +1 -1
  77. package/dist/src/commands/init.d.ts.map +1 -1
  78. package/dist/src/commands/init.js +25 -1
  79. package/dist/src/commands/init.js.map +1 -1
  80. package/dist/src/init/executor.d.ts.map +1 -1
  81. package/dist/src/init/executor.js +27 -0
  82. package/dist/src/init/executor.js.map +1 -1
  83. package/dist/src/ui/dashboard-v2.html +1692 -0
  84. package/dist/src/ui/server.mjs +15 -1
  85. package/dist/tsconfig.tsbuildinfo +1 -1
  86. package/package.json +2 -1
  87. package/scripts/understand-analyze.mjs +14 -1
@@ -0,0 +1,724 @@
1
+ ---
2
+ name: monomind:browse
3
+ description: State-of-the-art browser automation skill for UI testing, web scraping, and agent-driven navigation using agent-browser (v0.27+). Token-optimized with ref-based element selection, batch execution, KV-cache prompt ordering, on-demand screenshots, and full monomind memory integration.
4
+ version: 2.0.0
5
+ triggers:
6
+ - /browse
7
+ - monomind:browse
8
+ - browse the web
9
+ - test the UI
10
+ - browser automation
11
+ - open the browser
12
+ - navigate to
13
+ - click on the website
14
+ - fill out the form
15
+ - test login
16
+ - take a screenshot
17
+ - check the page
18
+ - web vitals
19
+ - react tree
20
+ - scrape
21
+ - crawl
22
+ tools:
23
+ - Bash
24
+ requires:
25
+ - agent-browser >= 0.25.4
26
+ ---
27
+
28
+ # monomind:browse
29
+
30
+ State-of-the-art browser automation using agent-browser. Optimized for minimal token consumption, maximum test coverage, and deep monomind integration.
31
+
32
+ ---
33
+
34
+ ## Setup (Run Once)
35
+
36
+ ```bash
37
+ # Install
38
+ npm install -g agent-browser
39
+
40
+ # Download Chrome (first time only)
41
+ agent-browser install
42
+
43
+ # Verify
44
+ agent-browser --version # should be >= 0.25.4
45
+ agent-browser doctor # check all systems
46
+ ```
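The `>= 0.25.4` check above can be scripted rather than read by eye. The sketch below is a hypothetical helper, not part of agent-browser: `version_ge` is an illustrative name, and it assumes plain dotted numeric versions. It compares field by field, so `0.100.0` correctly ranks above `0.25.4`.

```bash
# Hypothetical helper (not part of agent-browser).
# Succeeds when dotted version $1 is >= dotted version $2.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      xi = (i <= n) ? x[i] + 0 : 0   # missing fields count as 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0                            # all fields equal
  }'
}

version_ge 0.27.0 0.25.4 && echo "0.27.0 is new enough"
version_ge 0.25.3 0.25.4 || echo "0.25.3 is too old"
```

To apply it to a real install, pass in the version number taken from `agent-browser --version`. The exact output format of that command is not shown here, so the extraction step is left to the caller.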
47
+
48
+ ---
49
+
50
+ ## Token Efficiency Rules (ALWAYS FOLLOW)
51
+
52
+ These rules are non-negotiable. Violating them wastes tokens and degrades performance.
53
+
54
+ 1. **Batch multi-step flows** — Use `agent-browser batch` to execute sequences in a single process invocation. Eliminates per-command startup overhead.
55
+ 2. **Snapshot with `-i` flag** — Interactive-only snapshots are 93% smaller than full trees. Never call `agent-browser snapshot` without `-i` unless you need to understand full page structure.
56
+ 3. **Reuse refs** — After a snapshot, refs (`@e1`, `@e2`) are stable for the current page. Do NOT re-snapshot unless the page changed. One snapshot per page-state.
57
+ 4. **On-demand screenshots only** — Screenshots add ~800ms and ~1500 tokens as images. Only call `agent-browser screenshot` when: element not in a11y tree, visual verification is required, or the task explicitly needs visual proof.
58
+ 5. **Use batch for read-only chains** — `agent-browser open url && agent-browser snapshot -i` is two process starts. `agent-browser batch "open url" "snapshot -i"` is one.
59
+ 6. **Scope snapshots** — When testing a form or component, use `agent-browser snapshot -i -s "#form-id"` to scope to that subtree only.
60
+ 7. **Prefer `wait --text` over polling** — Never sleep and re-snapshot. Use `agent-browser wait --text "Expected"` or `wait --url "**pattern"`.
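The figures in the rules above are easier to weigh as arithmetic. The sketch below only restates the rough numbers quoted there (93% smaller, ~800ms, ~1500 tokens). They are estimates from this document, not measurements, and the function names are illustrative.

```bash
# Figures are the rough estimates quoted in the rules above, not measurements.

# A snapshot that is P% smaller implies the full tree is 1 / (1 - P/100) times larger.
snapshot_ratio() {
  awk -v pct="$1" 'BEGIN { printf "%.1f\n", 1 / (1 - pct / 100) }'
}

# $1 = number of screenshots, at ~800 ms and ~1500 tokens each.
screenshot_overhead() {
  awk -v n="$1" 'BEGIN { printf "%d ms, %d tokens\n", n * 800, n * 1500 }'
}

snapshot_ratio 93         # prints 14.3
screenshot_overhead 10    # prints 8000 ms, 15000 tokens
```

On these numbers, a full snapshot costs about 14 times an interactive one, and a screenshot at every step of a ten-step flow adds about 8 seconds and 15,000 tokens.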
61
+
62
+ ---
63
+
64
+ ## Core Loop
65
+
66
+ ```
67
+ OPEN → SNAPSHOT -i → ACT (by ref) → SNAPSHOT -i (if page changed) → VERIFY → REPEAT
68
+ ```
69
+
70
+ ### Minimal golden-path example
71
+
72
+ ```bash
73
+ # Batch: open + snapshot in one call
74
+ agent-browser batch "open https://app.example.com/login" "snapshot -i"
75
+ # → snapshot output: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign In" [ref=e3]
76
+
77
+ agent-browser fill @e1 "user@test.com"
78
+ agent-browser fill @e2 "SecurePass123!"
79
+ agent-browser click @e3
80
+ agent-browser wait --url "**/dashboard" --timeout 8000
81
+
82
+ # Re-snapshot only because URL changed (new page state)
83
+ agent-browser snapshot -i
84
+ ```
85
+
86
+ ### Full batch mode (most token-efficient)
87
+
88
+ ```bash
89
+ agent-browser batch \
90
+ "open https://app.example.com/login" \
91
+ "snapshot -i" \
92
+ "fill @e1 user@test.com" \
93
+ "fill @e2 SecurePass123!" \
94
+ "click @e3" \
95
+ "wait --url **/dashboard" \
96
+ "snapshot -i"
97
+ ```
98
+
99
+ Or as JSON via stdin (best for programmatic use):
100
+
101
+ ```bash
102
+ echo '[
103
+ ["open", "https://app.example.com/login"],
104
+ ["snapshot", "-i"],
105
+ ["fill", "@e1", "user@test.com"],
106
+ ["fill", "@e2", "SecurePass123!"],
107
+ ["click", "@e3"],
108
+ ["wait", "--url", "**/dashboard"],
109
+ ["snapshot", "-i"]
110
+ ]' | agent-browser batch --json --bail
111
+ ```
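When the JSON payload is generated rather than typed, a small builder avoids quoting mistakes. The sketch below is a hypothetical helper, not part of agent-browser: `batch_json` and `json_escape` are illustrative names. It turns simple command strings into the array-of-arrays shape shown above. It splits on whitespace only, so it assumes no argument contains a space.

```bash
# Hypothetical helpers (not part of agent-browser).

# Escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Emit one JSON array per command string, wrapped in an outer array.
# Runs in a subshell so `set -f` and the variables do not leak.
batch_json() (
  out='['
  cmd_sep=''
  set -f                      # keep * and ** literal while splitting
  for cmd in "$@"; do
    out="$out$cmd_sep["
    word_sep=''
    for word in $cmd; do      # naive whitespace split: no quoted arguments
      out="$out$word_sep\"$(json_escape "$word")\""
      word_sep=','
    done
    out="$out]"
    cmd_sep=','
  done
  printf '%s]\n' "$out"
)

batch_json "open https://app.example.com/login" "snapshot -i"
```

The output can then be piped into `agent-browser batch --json --bail` as in the example above. Values that contain spaces still need hand-written JSON.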
112
+
113
+ ---
114
+
115
+ ## All Commands Reference
116
+
117
+ ### Navigation
118
+
119
+ ```bash
120
+ agent-browser open <url> # Launch + navigate (aliases: goto, navigate)
121
+ agent-browser open # Launch on about:blank (for pre-nav setup)
122
+ agent-browser back # Go back
123
+ agent-browser forward # Go forward
124
+ agent-browser reload # Reload
125
+ agent-browser pushstate <url> # SPA client-side nav (Next.js, Remix, etc.)
126
+ agent-browser close # Close browser
127
+ agent-browser close --all # Close all sessions
128
+ ```
129
+
130
+ ### Snapshot (primary observation tool)
131
+
132
+ ```bash
133
+ agent-browser snapshot # Full accessibility tree + refs
134
+ agent-browser snapshot -i # Interactive elements only (USE THIS BY DEFAULT)
135
+ agent-browser snapshot -i --urls # Include href URLs for links
136
+ agent-browser snapshot -c # Compact (strip empty structural nodes)
137
+ agent-browser snapshot -d 3 # Limit depth to 3 levels
138
+ agent-browser snapshot -s "#main" # Scope to CSS selector
139
+ agent-browser snapshot -i -c -d 5 # Combined: compact interactive, max depth 5
140
+ agent-browser snapshot --json # JSON output for programmatic use
141
+ ```
142
+
143
+ **Output example:**
144
+ ```
145
+ - textbox "Email" [ref=e1]
146
+ - textbox "Password" [ref=e2]
147
+ - button "Sign In" [ref=e3]
148
+ - link "Forgot password" [ref=e4]
149
+ ```
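Scripts that drive a flow need the ref for a given element without hard-coding `@e3`. The sketch below is a hypothetical helper, not part of agent-browser: `ref_for` is an illustrative name. It looks up a ref by role and accessible name, and it assumes snapshot lines look exactly like the output example above. Real output may carry indentation or extra attributes that this does not handle.

```bash
# Hypothetical helper (not part of agent-browser).
# Reads snapshot text on stdin; prints the @ref of the first line
# matching role $1 and accessible name $2, or nothing if none matches.
ref_for() (
  awk -v role="$1" -v name="$2" '
    {
      prefix = "- " role " \"" name "\" [ref="
      i = index($0, prefix)
      if (i > 0) {
        rest = substr($0, i + length(prefix))
        sub(/\].*/, "", rest)   # drop the closing bracket and anything after
        print "@" rest
        exit
      }
    }'
)

# Sample input copied from the output example above.
snapshot='- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2]
- button "Sign In" [ref=e3]
- link "Forgot password" [ref=e4]'

printf '%s\n' "$snapshot" | ref_for button "Sign In"   # prints @e3
```

Against a live page, the same function would read from `agent-browser snapshot -i` instead of the sample variable.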
150
+
151
+ ### Interaction
152
+
153
+ ```bash
154
+ # Primary: use @ref from snapshot (fastest, no DOM re-query)
155
+ agent-browser click @e3
156
+ agent-browser fill @e1 "value"
157
+ agent-browser dblclick @e5
158
+ agent-browser hover @e6
159
+ agent-browser check @e7 # checkbox
160
+ agent-browser uncheck @e8
161
+ agent-browser select @e9 "Option A"
162
+ agent-browser focus @e2
163
+
164
+ # Keyboard
165
+ agent-browser press Enter
166
+ agent-browser press Tab
167
+ agent-browser press "Control+a"
168
+ agent-browser keyboard type "hello world" # real keystrokes
169
+ agent-browser keyboard inserttext "hello" # insert without key events
170
+ agent-browser keydown Shift
171
+ agent-browser keyup Shift
172
+
173
+ # Drag & drop, file upload
174
+ agent-browser drag @e1 @e2
175
+ agent-browser upload @e3 /path/to/file.pdf
176
+
177
+ # Scroll
178
+ agent-browser scroll down 500 # scroll 500px down
179
+ agent-browser scroll up
180
+ agent-browser scrollintoview @e10
181
+ agent-browser scroll down --selector "#feed"
182
+ ```
183
+
184
+ ### Semantic locators (fallback when no ref available)
185
+
186
+ ```bash
187
+ agent-browser find role button click --name "Submit"
188
+ agent-browser find text "Sign In" click
189
+ agent-browser find label "Email" fill "test@test.com"
190
+ agent-browser find placeholder "Search..." fill "query"
191
+ agent-browser find testid "submit-btn" click
192
+ agent-browser find first ".item" click
193
+ agent-browser find nth 2 "a" text
194
+ ```
195
+
196
+ ### Wait (never sleep/poll manually)
197
+
198
+ ```bash
199
+ agent-browser wait 500 # ms delay (use sparingly)
200
+ agent-browser wait "#spinner" --state hidden # wait for element to hide
201
+ agent-browser wait --text "Success" # wait for text to appear
202
+ agent-browser wait --url "**/dashboard" # wait for URL pattern
203
+ agent-browser wait --load networkidle # wait for network idle
204
+ agent-browser wait --fn "window.ready === true" # wait for JS condition
205
+ agent-browser wait --fn "!document.body.innerText.includes('Loading')"
206
+ ```
207
+
208
+ ### Read state
209
+
210
+ ```bash
211
+ agent-browser get text @e1 # text content
212
+ agent-browser get html @e2 # innerHTML
213
+ agent-browser get value @e3 # input value
214
+ agent-browser get attr @e4 "href" # attribute
215
+ agent-browser get title # page title
216
+ agent-browser get url # current URL
217
+ agent-browser get count ".item" # count matching elements
218
+ agent-browser get box @e1 # bounding box
219
+ agent-browser get styles @e1 # computed CSS styles
220
+
221
+ agent-browser is visible @e1 # boolean check
222
+ agent-browser is enabled @e1
223
+ agent-browser is checked @e1
224
+ ```
225
+
226
+ ### Screenshots (use sparingly)
227
+
228
+ ```bash
229
+ agent-browser screenshot # auto-path in /tmp
230
+ agent-browser screenshot page.png # specific path
231
+ agent-browser screenshot --full full-page.png # full-page scroll capture
232
+ agent-browser screenshot --annotate # numbered refs overlaid → use with visual models
233
+ agent-browser pdf report.pdf # save as PDF
234
+ ```
235
+
236
+ **Annotated screenshot pattern** (for visual debugging or multimodal LLMs):
237
+ ```bash
238
+ agent-browser screenshot --annotate
239
+ # Output: [1] @e1 button "Submit" [2] @e2 link "Home" [3] @e3 textbox "Email"
240
+ # Now refs are cached — interact immediately without re-snapshot
241
+ agent-browser click @e1
242
+ ```
243
+
244
+ ### Diff (regression testing)
245
+
246
+ ```bash
247
+ agent-browser diff snapshot # current vs last snapshot
248
+ agent-browser diff snapshot --baseline ./before.txt # vs saved file
249
+ agent-browser diff snapshot -s "#main" --compact # scoped diff
250
+ agent-browser diff screenshot --baseline before.png # pixel diff
251
+ agent-browser diff screenshot --baseline b.png -t 0.2 # threshold 0–1
252
+ agent-browser diff url https://v1.com https://v2.com # compare two URLs
253
+ agent-browser diff url https://v1.com https://v2.com --screenshot # + visual
254
+ ```
255
+
256
+ ### Tabs & multi-tab
257
+
258
+ ```bash
259
+ agent-browser tab # list all tabs
260
+ agent-browser tab new https://example.com # new tab
261
+ agent-browser tab new --label docs https://docs.example.com # named tab
262
+ agent-browser tab docs # switch by label
263
+ agent-browser tab t2 # switch by stable id
264
+ agent-browser tab close docs # close by label
265
+ agent-browser window new # new window
266
+ agent-browser frame "#iframe-id" # switch to iframe
267
+ agent-browser frame main # back to main frame
268
+ ```
269
+
270
+ **Multi-tab parallel test pattern:**
271
+ ```bash
272
+ agent-browser tab new --label app https://app.example.com
273
+ agent-browser tab new --label docs https://docs.example.com
274
+ agent-browser tab app
275
+ agent-browser snapshot -i # refs for app tab
276
+ agent-browser click @e3
277
+ agent-browser tab docs
278
+ agent-browser snapshot -i # refs for docs tab
279
+ ```
280
+
281
+ ### Dialogs
282
+
283
+ ```bash
284
+ agent-browser dialog accept "confirmation text"
285
+ agent-browser dialog dismiss
286
+ agent-browser dialog status # is a dialog currently open?
287
+ ```
288
+
289
+ Note: `alert` and `beforeunload` are auto-accepted by default. `confirm` and `prompt` need explicit handling.
290
+
291
+ ### Network interception
292
+
293
+ ```bash
294
+ agent-browser network route "https://api.example.com/*" --abort # block endpoint
295
+ agent-browser network route "*" --abort --resource-type script # block all JS
296
+ agent-browser network route "https://api/*" --body '{"data":[]}' # mock response
297
+ agent-browser network unroute "https://api.example.com/*"
298
+ agent-browser network requests # view tracked requests
299
+ agent-browser network requests --filter api # filter by URL substring
300
+ agent-browser network requests --type xhr,fetch # filter by type
301
+ agent-browser network requests --method POST
302
+ agent-browser network requests --status 2xx
303
+ agent-browser network request <requestId> # full request/response
304
+ agent-browser network har start # record HAR
305
+ agent-browser network har stop output.har # stop + save
306
+ ```
307
+
308
+ **Pre-nav setup pattern** (set network routes BEFORE navigating):
309
+ ```bash
310
+ agent-browser batch \
311
+ '["open"]' \
312
+ '["network", "route", "*", "--abort", "--resource-type", "script"]' \
313
+ '["cookies", "set", "--curl", "auth.curl", "--domain", "localhost"]' \
314
+ '["navigate", "http://localhost:3000"]'
315
+ ```
316
+
317
+ ### Cookies & storage
318
+
319
+ ```bash
320
+ agent-browser cookies # get all cookies
321
+ agent-browser cookies set name value # set cookie
322
+ agent-browser cookies set --curl cookies.curl # import from cURL dump / JSON / header string
323
+ agent-browser cookies clear
324
+
325
+ agent-browser storage local # get localStorage
326
+ agent-browser storage local myKey # get specific key
327
+ agent-browser storage local set myKey myValue # set
328
+ agent-browser storage local clear
329
+ agent-browser storage session # sessionStorage (same API)
330
+ ```
331
+
332
+ ### Browser settings
333
+
334
+ ```bash
335
+ agent-browser set viewport 1280 720 2 # width height deviceScaleFactor
336
+ agent-browser set device "iPhone 15 Pro" # device emulation
337
+ agent-browser set geo 37.7749 -122.4194 # geolocation
338
+ agent-browser set offline on # offline mode
339
+ agent-browser set headers '{"Authorization":"Bearer tok"}' # global headers
340
+ agent-browser set credentials user pass # HTTP basic auth
341
+ agent-browser set media dark # color scheme
342
+ ```
343
+
344
+ ### Clipboard
345
+
346
+ ```bash
347
+ agent-browser clipboard read
348
+ agent-browser clipboard write "Hello"
349
+ agent-browser clipboard copy # Ctrl+C
350
+ agent-browser clipboard paste # Ctrl+V
351
+ ```
352
+
353
+ ### Mouse (raw control, use only when refs/semantic locators fail)
354
+
355
+ ```bash
356
+ agent-browser mouse move 100 200
357
+ agent-browser mouse down left
358
+ agent-browser mouse up left
359
+ agent-browser mouse wheel 100 0 # dy dx
360
+ ```
361
+
362
+ ### React DevTools (v0.27+)
363
+
364
+ Requires launching with `--enable react-devtools`:
365
+
366
+ ```bash
367
+ # Launch with hook installed
368
+ agent-browser open --enable react-devtools https://your-react-app.com
369
+
370
+ # Inspect component tree
371
+ agent-browser react tree # full component hierarchy
372
+ agent-browser react inspect 5 # fiber ID → props, hooks, state, source
373
+ agent-browser react renders start # begin render profiling
374
+ agent-browser react renders stop # print profile (mount/re-render counts)
375
+ agent-browser react renders stop --json # JSON output
376
+ agent-browser react suspense # Suspense boundaries + root-cause classifier
377
+ agent-browser react suspense --only-dynamic # hide static boundaries
378
+ ```
379
+
380
+ ### Web Vitals (framework-agnostic)
381
+
382
+ ```bash
383
+ agent-browser vitals # LCP, CLS, TTFB, FCP, INP + React hydration
384
+ agent-browser vitals https://example.com # test specific URL
385
+ agent-browser vitals --json # JSON output
386
+ ```
387
+
388
+ ### Tracing & profiling
389
+
390
+ ```bash
391
+ agent-browser trace start trace.zip # start Chrome trace
392
+ agent-browser trace stop trace.zip # stop and save
393
+ agent-browser profiler start # DevTools profiler
394
+ agent-browser profiler stop profile.json # stop and save
395
+
396
+ agent-browser console # browser console messages
397
+ agent-browser console --json # structured CDP output
398
+ agent-browser console --clear
399
+ agent-browser errors # uncaught JS exceptions
400
+ agent-browser errors --clear
401
+ ```
402
+
403
+ ### Sessions & auth
404
+
405
+ ```bash
406
+ # Isolated sessions (each has own browser, cookies, history)
407
+ agent-browser --session agent1 open site-a.com
408
+ agent-browser --session agent2 open site-b.com
409
+ agent-browser session list
410
+
411
+ # Persist state across restarts
412
+ agent-browser --session-name myapp open app.example.com
413
+ # → auto-saves to ~/.agent-browser/sessions/myapp
414
+
415
+ # Reuse existing Chrome login
416
+ agent-browser profiles # list Chrome profiles
417
+ agent-browser --profile Default open gmail.com
418
+
419
+ # Save / load state
420
+ agent-browser state save ./auth.json # save cookies + localStorage
421
+ agent-browser state load ./auth.json
422
+ agent-browser --state ./auth.json open https://app.example.com/dashboard
423
+
424
+ # Auth vault (credentials never sent to LLM)
425
+ echo "mypassword" | agent-browser auth save github --url https://github.com/login --username me --password-stdin
426
+ agent-browser auth login github
427
+
428
+ # Encrypted state at rest
429
+ export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex>
430
+ agent-browser --session-name secure open example.com
431
+ ```
432
+
433
+ ### Dashboard (observability)
434
+
435
+ ```bash
436
+ agent-browser dashboard start # port 4848
437
+ agent-browser dashboard start --port 8080
438
+ agent-browser dashboard stop
439
+ # → open http://localhost:4848 for live viewport + activity feed + AI chat
440
+ ```
441
+
442
+ ### Init scripts
443
+
444
+ ```bash
445
+ agent-browser open --init-script ./setup.js https://app.example.com
446
+ agent-browser addinitscript "window.__TEST__ = true"
447
+ agent-browser removeinitscript <identifier>
448
+ ```
449
+
450
+ ### iOS / Mobile (real Safari)
451
+
452
+ Requires: `npm install -g appium && appium driver install xcuitest`
453
+
454
+ ```bash
455
+ agent-browser device list
456
+ agent-browser -p ios --device "iPhone 15 Pro" open https://example.com
457
+ agent-browser -p ios snapshot -i
458
+ agent-browser -p ios tap @e1
459
+ agent-browser -p ios fill @e2 "text"
460
+ agent-browser -p ios swipe up
461
+ agent-browser -p ios screenshot mobile.png
462
+ agent-browser -p ios close
463
+ ```
464
+
465
+ ### Cloud providers
466
+
467
+ | Provider | Env var | Flag |
468
+ |---------|---------|------|
469
+ | Browserbase | `BROWSERBASE_API_KEY` | `-p browserbase` |
470
+ | Browser Use | `BROWSER_USE_API_KEY` | `-p browseruse` |
471
+ | Browserless | `BROWSERLESS_API_KEY` | `-p browserless` |
472
+ | Kernel | `KERNEL_API_KEY` | `-p kernel` |
473
+ | AWS AgentCore | AWS credentials | `-p agentcore` |
474
+
475
+ All commands work identically regardless of provider.
476
+
477
+ ### Streaming
478
+
479
+ ```bash
480
+ agent-browser stream status # see WebSocket port
481
+ agent-browser stream enable --port 9223
482
+ agent-browser stream disable
483
+ ```
484
+
485
+ ### CDP / Electron apps
486
+
487
+ ```bash
488
+ agent-browser connect 9222 # connect to port, persist for session
489
+ agent-browser --cdp 9222 snapshot # per-command
490
+ agent-browser --cdp wss://remote/cdp snapshot
491
+ agent-browser --auto-connect snapshot # auto-discover running Chrome
492
+ ```
493
+
494
+ ---
495
+
496
+ ## Test Flows
497
+
498
+ ### Login / Auth
499
+
500
+ ```bash
501
+ agent-browser batch \
502
+ "open https://app.example.com/login" \
503
+ "snapshot -i"
504
+ # Identify the real refs in the snapshot output, then substitute them for the @e[...] placeholders:
505
+ agent-browser fill @e[email] "user@test.com"
506
+ agent-browser fill @e[password] "TestPass123!"
507
+ agent-browser click @e[submit]
508
+ agent-browser wait --url "**/dashboard" --timeout 8000
509
+ ```
510
+
511
+ ### Form with validation
512
+
513
+ ```bash
514
+ # Test happy path
515
+ agent-browser batch "open /form" "snapshot -i"
516
+ agent-browser fill @e1 "John Doe"
517
+ agent-browser fill @e2 "john@test.com"
518
+ agent-browser select @e3 "Option A"
519
+ agent-browser check @e4
520
+ agent-browser click @e5
521
+ agent-browser wait --text "submitted"
522
+ agent-browser screenshot pass-form.png
523
+
524
+ # Test validation (empty submit)
525
+ agent-browser reload
526
+ agent-browser snapshot -i
527
+ agent-browser click @e5 # submit empty
528
+ agent-browser wait --text "required"
529
+ agent-browser snapshot -i # verify error messages
530
+
531
+ # Invalid email
532
+ agent-browser fill @e2 "not-an-email"
533
+ agent-browser click @e5
534
+ agent-browser snapshot -i
535
+ ```
536
+
537
+ ### CRUD
538
+
539
+ ```bash
540
+ # Create
541
+ agent-browser click @e[add]
542
+ agent-browser fill @e[name] "New Item"
543
+ agent-browser click @e[save]
544
+ agent-browser wait --text "New Item"
545
+
546
+ # Update
547
+ agent-browser click @e[edit]
548
+ agent-browser fill @e[name] "Updated Item"
549
+ agent-browser click @e[save]
550
+ agent-browser wait --text "Updated Item"
551
+
552
+ # Delete
553
+ agent-browser click @e[delete]
554
+ agent-browser wait --text "Are you sure"
555
+ agent-browser click @e[confirm]
556
+ agent-browser wait --fn "!document.body.innerText.includes('Updated Item')"
557
+ ```
558
+
559
+ ### Multi-step wizard
560
+
561
+ ```bash
562
+ # Step 1
563
+ agent-browser batch "open /wizard" "snapshot -i"
564
+ agent-browser fill @e1 "value"
565
+ agent-browser click @e[next]
566
+ agent-browser wait --text "Step 2"
567
+
568
+ # Step 2
569
+ agent-browser snapshot -i
570
+ agent-browser select @e2 "choice"
571
+ agent-browser click @e[next]
572
+
573
+ # Step 3 — verify summary
574
+ agent-browser snapshot -i
575
+ agent-browser get text @e[summary]
576
+ agent-browser click @e[confirm]
577
+ agent-browser wait --text "Complete"
578
+ ```
579
+
580
+ ### Regression test (diff baseline)
581
+
582
+ ```bash
583
+ # Save baseline
584
+ agent-browser open https://app.example.com
585
+ agent-browser snapshot -i > baseline.txt
586
+
587
+ # After a deploy, compare:
588
+ agent-browser open https://app.example.com
589
+ agent-browser diff snapshot --baseline ./baseline.txt
590
+ ```
591
+
592
+ ### API mocking
593
+
594
+ ```bash
595
+ # Mock API response, test UI reaction
596
+ agent-browser batch \
597
+ '["open"]' \
598
+ '["network", "route", "https://api.example.com/users", "--body", "{\"data\":[]}"]' \
599
+ '["navigate", "https://app.example.com/users"]'
600
+ agent-browser snapshot -i
601
+ # → verify "No users found" empty state renders correctly
602
+ ```
603
+
604
+ ### React app deep inspection
605
+
606
+ ```bash
607
+ agent-browser open --enable react-devtools https://your-react-app.com
608
+ agent-browser react tree
609
+ agent-browser vitals --json
610
+ agent-browser react renders start
611
+ # ... trigger user interactions ...
612
+ agent-browser react renders stop
613
+ agent-browser react suspense --only-dynamic
614
+ ```
615
+
616
+ ---
617
+
618
+ ## Configuration (agent-browser.json)
619
+
620
+ Create this file in the project root to set persistent defaults:
621
+
622
+ ```json
623
+ {
624
+ "$schema": "https://agent-browser.dev/schema.json",
625
+ "maxOutput": 50000,
626
+ "contentBoundaries": true,
627
+ "idleTimeout": "5m",
628
+ "screenshotDir": "./screenshots",
629
+ "screenshotFormat": "jpeg",
630
+ "screenshotQuality": 80
631
+ }
632
+ ```
633
+
634
+ Key security defaults for agent deployments:
635
+ ```json
636
+ {
637
+ "contentBoundaries": true,
638
+ "maxOutput": 50000,
639
+ "allowedDomains": ["app.example.com", "*.example.com"],
640
+ "noAutoDialog": false
641
+ }
642
+ ```
643
+
644
+ Key env vars:
645
+ ```bash
646
+ AGENT_BROWSER_SESSION=<name> # session isolation
647
+ AGENT_BROWSER_SESSION_NAME=<name> # auto-persist state
648
+ AGENT_BROWSER_MAX_OUTPUT=50000 # prevent context flooding
649
+ AGENT_BROWSER_DEFAULT_TIMEOUT=30000 # op timeout in ms (default: 25000)
650
+ AGENT_BROWSER_IDLE_TIMEOUT_MS=300000 # daemon auto-shutdown after idle
651
+ AGENT_BROWSER_CONTENT_BOUNDARIES=1 # LLM-safe output delimiters
652
+ AGENT_BROWSER_HEADED=1 # visible browser (debugging)
653
+ AGENT_BROWSER_STREAM_PORT=9223 # fixed WebSocket stream port
654
+ AI_GATEWAY_API_KEY=gw_... # for `agent-browser chat`
655
+ AI_GATEWAY_MODEL=anthropic/claude-sonnet-4-6
656
+ ```
657
+
658
+ ---
659
+
660
+ ## Monomind Integration
661
+
662
+ ### Store successful test flows
663
+
664
+ ```bash
665
+ npx monomind memory store \
666
+ --namespace browse \
667
+ --key "login-flow-<app>" \
668
+ --value "open /login → snapshot -i → fill @e[email] → fill @e[pw] → click @e[submit] → wait **/dashboard"
669
+ ```
670
+
671
+ ### Retrieve before testing
672
+
673
+ ```bash
674
+ npx monomind memory search --query "login flow" --namespace browse
675
+ ```
676
+
677
+ ### Report bugs as tasks
678
+
679
+ ```bash
680
+ npx monomind task create \
681
+ --title "UI Bug: form submits with empty email" \
682
+ --description "Steps: open /login, click submit without filling email. No validation shown. Screenshot: /tmp/bug-123.png"
683
+ ```
684
+
685
+ ### Save auth state for reuse across sessions
686
+
687
+ ```bash
688
+ # Once logged in:
689
+ agent-browser state save .monomind/auth/<app>.json
690
+
691
+ # Future sessions:
692
+ agent-browser --state .monomind/auth/<app>.json open https://app.example.com
693
+ ```
694
+
695
+ ---
696
+
697
+ ## Anti-patterns (NEVER DO)
698
+
699
+ | Anti-pattern | Why | Fix |
700
+ |---|---|---|
701
+ | `agent-browser snapshot` (no `-i`) for every step | Full tree = 10–20x tokens | Use `snapshot -i` by default; take a full snapshot only when you need the whole page structure |
702
+ | Playwright MCP | 13,700-token schema tax before step 1 | Use agent-browser directly |
703
+ | Screenshot every step | Adds ~800ms and ~1500 tokens each | Screenshot only on fail/visual-required |
704
+ | Re-snapshot without page change | Wastes tokens | Reuse refs from last snapshot |
705
+ | `sleep N` between actions | Slow, fragile | Use `wait --text`, `wait --url`, `wait --fn` |
706
+ | CSS selectors when refs available | Slower, can break | Always prefer `@eN` refs from snapshot |
707
+ | Separate commands when batch works | Extra process starts | Use `batch` for multi-step flows |
708
+
709
+ ---
710
+
711
+ ## Checklist
712
+
713
+ When this skill is activated:
714
+ - [ ] `agent-browser --version` — confirm >= 0.25.4 (or run `npm install -g agent-browser`)
715
+ - [ ] `agent-browser doctor` — check Chrome + daemon health
716
+ - [ ] Get target URL from user if not provided
717
+ - [ ] Use `batch "open <url>" "snapshot -i"` to start
718
+ - [ ] Use refs (`@eN`) from snapshot output for all interactions
719
+ - [ ] Only re-snapshot after confirmed page-state change
720
+ - [ ] Only screenshot when visual evidence is required
721
+ - [ ] Use `wait --text / --url / --fn` instead of sleep or polling
722
+ - [ ] Report results: ✓ PASS / ✗ FAIL (steps to reproduce) / ⚠ WARN
723
+ - [ ] Store successful patterns in monomind memory (`browse` namespace)
724
+ - [ ] Create monomind task for any found bugs
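The PASS / FAIL / WARN reporting step in the checklist can be kept uniform with a tiny formatter. The sketch below is a hypothetical helper, not part of agent-browser or monomind: `report` is an illustrative name, and the line layout is an assumption, since the checklist names only the three markers.

```bash
# Hypothetical helper (not part of agent-browser or monomind).
# Usage: report PASS|FAIL|WARN <message...>
report() {
  status=$1
  shift
  case $status in
    PASS) mark='✓' ;;
    FAIL) mark='✗' ;;
    WARN) mark='⚠' ;;
    *) echo "unknown status: $status" >&2; return 2 ;;
  esac
  printf '%s %s %s\n' "$mark" "$status" "$*"
}

report PASS "login flow"
report FAIL "form submits with empty email (steps: open /login, click submit)"
```

Putting the steps to reproduce in the FAIL message follows the checklist's reporting rule and gives the follow-up task a ready description.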