agent-browser-stealth 0.14.0-fork.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (80) hide show
  1. package/LICENSE +201 -0
  2. package/README.md +1214 -0
  3. package/bin/agent-browser-darwin-arm64 +0 -0
  4. package/bin/agent-browser-darwin-x64 +0 -0
  5. package/bin/agent-browser-linux-arm64 +0 -0
  6. package/bin/agent-browser-linux-x64 +0 -0
  7. package/bin/agent-browser-win32-x64.exe +0 -0
  8. package/bin/agent-browser.js +109 -0
  9. package/dist/actions.d.ts +17 -0
  10. package/dist/actions.d.ts.map +1 -0
  11. package/dist/actions.js +1977 -0
  12. package/dist/actions.js.map +1 -0
  13. package/dist/browser.d.ts +611 -0
  14. package/dist/browser.d.ts.map +1 -0
  15. package/dist/browser.js +2425 -0
  16. package/dist/browser.js.map +1 -0
  17. package/dist/daemon.d.ts +66 -0
  18. package/dist/daemon.d.ts.map +1 -0
  19. package/dist/daemon.js +632 -0
  20. package/dist/daemon.js.map +1 -0
  21. package/dist/diff.d.ts +18 -0
  22. package/dist/diff.d.ts.map +1 -0
  23. package/dist/diff.js +271 -0
  24. package/dist/diff.js.map +1 -0
  25. package/dist/encryption.d.ts +50 -0
  26. package/dist/encryption.d.ts.map +1 -0
  27. package/dist/encryption.js +85 -0
  28. package/dist/encryption.js.map +1 -0
  29. package/dist/ios-actions.d.ts +11 -0
  30. package/dist/ios-actions.d.ts.map +1 -0
  31. package/dist/ios-actions.js +228 -0
  32. package/dist/ios-actions.js.map +1 -0
  33. package/dist/ios-manager.d.ts +266 -0
  34. package/dist/ios-manager.d.ts.map +1 -0
  35. package/dist/ios-manager.js +1073 -0
  36. package/dist/ios-manager.js.map +1 -0
  37. package/dist/protocol.d.ts +26 -0
  38. package/dist/protocol.d.ts.map +1 -0
  39. package/dist/protocol.js +932 -0
  40. package/dist/protocol.js.map +1 -0
  41. package/dist/snapshot.d.ts +67 -0
  42. package/dist/snapshot.d.ts.map +1 -0
  43. package/dist/snapshot.js +514 -0
  44. package/dist/snapshot.js.map +1 -0
  45. package/dist/state-utils.d.ts +77 -0
  46. package/dist/state-utils.d.ts.map +1 -0
  47. package/dist/state-utils.js +178 -0
  48. package/dist/state-utils.js.map +1 -0
  49. package/dist/stealth.d.ts +29 -0
  50. package/dist/stealth.d.ts.map +1 -0
  51. package/dist/stealth.js +1103 -0
  52. package/dist/stealth.js.map +1 -0
  53. package/dist/stream-server.d.ts +117 -0
  54. package/dist/stream-server.d.ts.map +1 -0
  55. package/dist/stream-server.js +309 -0
  56. package/dist/stream-server.js.map +1 -0
  57. package/dist/types.d.ts +854 -0
  58. package/dist/types.d.ts.map +1 -0
  59. package/dist/types.js +2 -0
  60. package/dist/types.js.map +1 -0
  61. package/package.json +84 -0
  62. package/scripts/build-all-platforms.sh +68 -0
  63. package/scripts/check-creepjs-headless.js +137 -0
  64. package/scripts/check-sannysoft-webdriver.js +112 -0
  65. package/scripts/check-version-sync.js +39 -0
  66. package/scripts/copy-native.js +36 -0
  67. package/scripts/postinstall.js +275 -0
  68. package/scripts/sync-upstream.sh +142 -0
  69. package/scripts/sync-version.js +87 -0
  70. package/skills/agent-browser/SKILL.md +470 -0
  71. package/skills/agent-browser/references/authentication.md +202 -0
  72. package/skills/agent-browser/references/commands.md +263 -0
  73. package/skills/agent-browser/references/profiling.md +120 -0
  74. package/skills/agent-browser/references/proxy-support.md +194 -0
  75. package/skills/agent-browser/references/session-management.md +193 -0
  76. package/skills/agent-browser/references/snapshot-refs.md +194 -0
  77. package/skills/agent-browser/references/video-recording.md +173 -0
  78. package/skills/agent-browser/templates/authenticated-session.sh +100 -0
  79. package/skills/agent-browser/templates/capture-workflow.sh +69 -0
  80. package/skills/agent-browser/templates/form-automation.sh +62 -0
@@ -0,0 +1,470 @@
1
+ ---
2
+ name: agent-browser
3
+ description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
4
+ allowed-tools: Bash(npx agent-browser-stealth:*), Bash(npx agent-browser:*), Bash(agent-browser:*)
5
+ ---
6
+
7
+ # Browser Automation with agent-browser
8
+
9
+ Install package: `npm install -g agent-browser-stealth` (CLI command remains `agent-browser` for compatibility).
10
+
11
+ ## Core Workflow
12
+
13
+ Every browser automation follows this pattern:
14
+
15
+ 1. **Navigate**: `agent-browser open <url>`
16
+ 2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
17
+ 3. **Interact**: Use refs to click, fill, select
18
+ 4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
19
+
20
+ ```bash
21
+ agent-browser open https://example.com/form
22
+ agent-browser snapshot -i
23
+ # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
24
+
25
+ agent-browser fill @e1 "user@example.com"
26
+ agent-browser fill @e2 "password123"
27
+ agent-browser click @e3
28
+ agent-browser wait --load networkidle
29
+ agent-browser snapshot -i # Check result
30
+ ```
31
+
32
+ ## Command Chaining
33
+
34
+ Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
35
+
36
+ ```bash
37
+ # Chain open + wait + snapshot in one call
38
+ agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
39
+
40
+ # Chain multiple interactions
41
+ agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
42
+
43
+ # Navigate and capture
44
+ agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
45
+ ```
46
+
47
+ **When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
48
+
49
+ ## Essential Commands
50
+
51
+ ```bash
52
+ # Navigation
53
+ agent-browser open <url> # Navigate (aliases: goto, navigate)
54
+ agent-browser close # Close browser
55
+ agent-browser --version # Show CLI version (fork builds include upstream/fork)
56
+
57
+ # Snapshot
58
+ agent-browser snapshot -i # Interactive elements with refs (recommended)
59
+ agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
60
+ agent-browser snapshot -s "#selector" # Scope to CSS selector
61
+
62
+ # Interaction (use @refs from snapshot)
63
+ agent-browser click @e1 # Click element
64
+ agent-browser click @e1 --new-tab # Click and open in new tab
65
+ agent-browser fill @e2 "text" # Clear and type text
66
+ agent-browser type @e2 "text" --delay 120 # Type without clearing (human-like pacing)
67
+ agent-browser select @e1 "option" # Select dropdown option
68
+ agent-browser check @e1 # Check checkbox
69
+ agent-browser press Enter # Press key
70
+ agent-browser keyboard type "text" --delay 90 # Type at current focus (no selector)
71
+ agent-browser keyboard inserttext "text" # Insert without key events
72
+ agent-browser scroll down 500 # Scroll page
73
+
74
+ # Get information
75
+ agent-browser get text @e1 # Get element text
76
+ agent-browser get url # Get current URL
77
+ agent-browser get title # Get page title
78
+
79
+ # Wait
80
+ agent-browser wait @e1 # Wait for element
81
+ agent-browser wait --load networkidle # Wait for network idle
82
+ agent-browser wait --url "**/page" # Wait for URL pattern
83
+ agent-browser wait 2000 # Wait milliseconds
84
+ agent-browser wait 2000-5000 # Random wait between 2-5 seconds
85
+
86
+ # Capture
87
+ agent-browser screenshot # Screenshot to temp dir
88
+ agent-browser screenshot --full # Full page screenshot
89
+ agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
90
+ agent-browser pdf output.pdf # Save as PDF
91
+
92
+ # Diff (compare page states)
93
+ agent-browser diff snapshot # Compare current vs last snapshot
94
+ agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
95
+ agent-browser diff screenshot --baseline before.png # Visual pixel diff
96
+ agent-browser diff url <url1> <url2> # Compare two pages
97
+ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy
98
+ agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
99
+ ```
100
+
101
+ ## Common Patterns
102
+
103
+ ### Form Submission
104
+
105
+ ```bash
106
+ agent-browser open https://example.com/signup
107
+ agent-browser snapshot -i
108
+ agent-browser fill @e1 "Jane Doe"
109
+ agent-browser fill @e2 "jane@example.com"
110
+ agent-browser select @e3 "California"
111
+ agent-browser check @e4
112
+ agent-browser click @e5
113
+ agent-browser wait --load networkidle
114
+ ```
115
+
116
+ ### Authentication with State Persistence
117
+
118
+ ```bash
119
+ # Login once and save state
120
+ agent-browser open https://app.example.com/login
121
+ agent-browser snapshot -i
122
+ agent-browser fill @e1 "$USERNAME"
123
+ agent-browser fill @e2 "$PASSWORD"
124
+ agent-browser click @e3
125
+ agent-browser wait --url "**/dashboard"
126
+ agent-browser state save auth.json
127
+
128
+ # Reuse in future sessions
129
+ agent-browser state load auth.json
130
+ agent-browser open https://app.example.com/dashboard
131
+ ```
132
+
133
+ ### Session Persistence
134
+
135
+ ```bash
136
+ # Auto-save/restore cookies and localStorage across browser restarts
137
+ agent-browser --session-name myapp open https://app.example.com/login
138
+ # ... login flow ...
139
+ agent-browser close # State auto-saved to ~/.agent-browser/sessions/
140
+
141
+ # Next time, state is auto-loaded
142
+ agent-browser --session-name myapp open https://app.example.com/dashboard
143
+
144
+ # Encrypt state at rest
145
+ export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
146
+ agent-browser --session-name secure open https://app.example.com
147
+
148
+ # Manage saved states
149
+ agent-browser state list
150
+ agent-browser state show myapp-default.json
151
+ agent-browser state clear myapp
152
+ agent-browser state clean --older-than 7
153
+ ```
154
+
155
+ ### Data Extraction
156
+
157
+ ```bash
158
+ agent-browser open https://example.com/products
159
+ agent-browser snapshot -i
160
+ agent-browser get text @e5 # Get specific element text
161
+ agent-browser get text body > page.txt # Get all page text
162
+
163
+ # JSON output for parsing
164
+ agent-browser snapshot -i --json
165
+ agent-browser get text @e1 --json
166
+ ```
167
+
168
+ ### Parallel Sessions
169
+
170
+ ```bash
171
+ agent-browser --session site1 open https://site-a.com
172
+ agent-browser --session site2 open https://site-b.com
173
+
174
+ agent-browser --session site1 snapshot -i
175
+ agent-browser --session site2 snapshot -i
176
+
177
+ agent-browser session list
178
+ ```
179
+
180
+ ### Connect to Existing Chrome
181
+
182
+ By default in this fork, commands without `--cdp` require an existing browser at `localhost:9333`. If CDP is unavailable, the command fails fast (no automatic local browser launch).
183
+
184
+ ```bash
185
+ # Auto-discover running Chrome with remote debugging enabled
186
+ agent-browser --auto-connect open https://example.com
187
+ agent-browser --auto-connect snapshot
188
+
189
+ # Or with explicit CDP port
190
+ agent-browser --cdp 9222 snapshot
191
+ ```
192
+
193
+ ### Color Scheme (Dark Mode)
194
+
195
+ ```bash
196
+ # Persistent dark mode via flag (applies to all pages and new tabs)
197
+ agent-browser --color-scheme dark open https://example.com
198
+
199
+ # Or via environment variable
200
+ AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com
201
+
202
+ # Or set during session (persists for subsequent commands)
203
+ agent-browser set media dark
204
+ ```
205
+
206
+ ### Visual Browser (Debugging)
207
+
208
+ ```bash
209
+ agent-browser --headed open https://example.com
210
+ agent-browser highlight @e1 # Highlight element
211
+ agent-browser record start demo.webm # Record session
212
+ agent-browser profiler start # Start Chrome DevTools profiling
213
+ agent-browser profiler stop trace.json # Stop and save profile (path optional)
214
+ ```
215
+
216
+ ### Local Files (PDFs, HTML)
217
+
218
+ ```bash
219
+ # Open local files with file:// URLs
220
+ agent-browser --allow-file-access open file:///path/to/document.pdf
221
+ agent-browser --allow-file-access open file:///path/to/page.html
222
+ agent-browser screenshot output.png
223
+ ```
224
+
225
+ ### Project Policy
226
+
227
+ - `--profile` / `AGENT_BROWSER_PROFILE` are forbidden
228
+ - `--channel` / `AGENT_BROWSER_CHANNEL` are forbidden
229
+ - Use existing browser sessions (default CDP `localhost:9333`) or pass `--cdp` explicitly
230
+
231
+ ### Stealth Mode (Always On)
232
+
233
+ Stealth is always active -- no flags needed. All sessions automatically apply anti-detection patches (navigator.webdriver removal, UA override, plugin injection, WebGL masking, humanized interactions, etc.).
234
+
235
+ Chromium launches in managed mode use Chrome channel by default for a genuine browser binary fingerprint.
236
+
237
+ For best results against strong bot detection, use `--headed` and `--session-name`.
238
+
239
+ ### Auto Region Detection
240
+
241
+ The browser automatically detects the target site's region from the URL TLD and sets matching locale, timezone, and Accept-Language headers. For example, navigating to `shopee.tw` sets locale `zh-TW` and timezone `Asia/Taipei`. This reduces server-side risk scoring from region-signal mismatches.
242
+
243
+ Override: `AGENT_BROWSER_LOCALE`, `AGENT_BROWSER_TIMEZONE` env vars.
244
+
245
+ ### Captcha Detection & Auto-Retry
246
+
247
+ When a navigation lands on a captcha/verification page, the browser automatically retries up to 2 times with randomized backoff (3-7s). If detection persists, a warning is shown suggesting `--headed` mode or `--session-name` persistence.
248
+
249
+ ### iOS Simulator (Mobile Safari)
250
+
251
+ ```bash
252
+ # List available iOS simulators
253
+ agent-browser device list
254
+
255
+ # Launch Safari on a specific device
256
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
257
+
258
+ # Same workflow as desktop - snapshot, interact, re-snapshot
259
+ agent-browser -p ios snapshot -i
260
+ agent-browser -p ios tap @e1 # Tap (alias for click)
261
+ agent-browser -p ios fill @e2 "text"
262
+ agent-browser -p ios swipe up # Mobile-specific gesture
263
+
264
+ # Take screenshot
265
+ agent-browser -p ios screenshot mobile.png
266
+
267
+ # Close session (shuts down simulator)
268
+ agent-browser -p ios close
269
+ ```
270
+
271
+ **Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
272
+
273
+ **Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
274
+
275
+ ## Diffing (Verifying Changes)
276
+
277
+ Use `diff snapshot` after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.
278
+
279
+ ```bash
280
+ # Typical workflow: snapshot -> action -> diff
281
+ agent-browser snapshot -i # Take baseline snapshot
282
+ agent-browser click @e2 # Perform action
283
+ agent-browser diff snapshot # See what changed (auto-compares to last snapshot)
284
+ ```
285
+
286
+ For visual regression testing or monitoring:
287
+
288
+ ```bash
289
+ # Save a baseline screenshot, then compare later
290
+ agent-browser screenshot baseline.png
291
+ # ... time passes or changes are made ...
292
+ agent-browser diff screenshot --baseline baseline.png
293
+
294
+ # Compare staging vs production
295
+ agent-browser diff url https://staging.example.com https://prod.example.com --screenshot
296
+ ```
297
+
298
+ `diff snapshot` output uses `+` for additions and `-` for removals, similar to git diff. `diff screenshot` produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.
299
+
300
+ ## Timeouts and Slow Pages
301
+
302
+ The default Playwright timeout is 25 seconds for local browsers. This can be overridden with the `AGENT_BROWSER_DEFAULT_TIMEOUT` environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:
303
+
304
+ ```bash
305
+ # Wait for network activity to settle (best for slow pages)
306
+ agent-browser wait --load networkidle
307
+
308
+ # Wait for a specific element to appear
309
+ agent-browser wait "#content"
310
+ agent-browser wait @e1
311
+
312
+ # Wait for a specific URL pattern (useful after redirects)
313
+ agent-browser wait --url "**/dashboard"
314
+
315
+ # Wait for a JavaScript condition
316
+ agent-browser wait --fn "document.readyState === 'complete'"
317
+
318
+ # Wait a fixed duration (milliseconds) as a last resort
319
+ agent-browser wait 5000
320
+
321
+ # Random wait between 2-5 seconds (useful for anti-detection)
322
+ agent-browser wait 2000-5000
323
+ ```
324
+
325
+ When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait <selector>` or `wait @ref`.
326
+
327
+ ### Humanized Interactions
328
+
329
+ agent-browser automatically humanizes interactions to avoid behavioral detection:
330
+
331
+ - **Randomized typing**: `type --delay` varies each keystroke delay by +-40%
332
+ - **Random wait ranges**: `wait 2000-5000` pauses for a random duration in that range
333
+ - **Bezier curve mouse**: Before every `click`, the mouse moves along a natural-looking curve
334
+
335
+ These behaviors are always active. For sensitive sites, combine with `--headed` and `--session-name` for best results.
336
+
337
+ ## Session Management and Cleanup
338
+
339
+ When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
340
+
341
+ ```bash
342
+ # Each agent gets its own isolated session
343
+ agent-browser --session agent1 open site-a.com
344
+ agent-browser --session agent2 open site-b.com
345
+
346
+ # Check active sessions
347
+ agent-browser session list
348
+ ```
349
+
350
+ Always close your browser session when done to avoid leaked processes:
351
+
352
+ ```bash
353
+ agent-browser close # Close default session
354
+ agent-browser --session agent1 close # Close specific session
355
+ ```
356
+
357
+ If a previous session was not closed properly, the daemon may still be running. Use `agent-browser close` to clean it up before starting new work.
358
+
359
+ ## Ref Lifecycle (Important)
360
+
361
+ Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
362
+
363
+ - Clicking links or buttons that navigate
364
+ - Form submissions
365
+ - Dynamic content loading (dropdowns, modals)
366
+
367
+ ```bash
368
+ agent-browser click @e5 # Navigates to new page
369
+ agent-browser snapshot -i # MUST re-snapshot
370
+ agent-browser click @e1 # Use new refs
371
+ ```
372
+
373
+ ## Annotated Screenshots (Vision Mode)
374
+
375
+ Use `--annotate` to take a screenshot with numbered labels overlaid on interactive elements. Each label `[N]` maps to ref `@eN`. This also caches refs, so you can interact with elements immediately without a separate snapshot.
376
+
377
+ ```bash
378
+ agent-browser screenshot --annotate
379
+ # Output includes the image path and a legend:
380
+ # [1] @e1 button "Submit"
381
+ # [2] @e2 link "Home"
382
+ # [3] @e3 textbox "Email"
383
+ agent-browser click @e2 # Click using ref from annotated screenshot
384
+ ```
385
+
386
+ Use annotated screenshots when:
387
+ - The page has unlabeled icon buttons or visual-only elements
388
+ - You need to verify visual layout or styling
389
+ - Canvas or chart elements are present (invisible to text snapshots)
390
+ - You need spatial reasoning about element positions
391
+
392
+ ## Semantic Locators (Alternative to Refs)
393
+
394
+ When refs are unavailable or unreliable, use semantic locators:
395
+
396
+ ```bash
397
+ agent-browser find text "Sign In" click
398
+ agent-browser find label "Email" fill "user@test.com"
399
+ agent-browser find role button click --name "Submit"
400
+ agent-browser find placeholder "Search" type "query"
401
+ agent-browser find testid "submit-btn" click
402
+ ```
403
+
404
+ ## JavaScript Evaluation (eval)
405
+
406
+ Use `eval` to run JavaScript in the browser context. **Shell quoting can corrupt complex expressions** -- use `--stdin` or `-b` to avoid issues.
407
+
408
+ ```bash
409
+ # Simple expressions work with regular quoting
410
+ agent-browser eval 'document.title'
411
+ agent-browser eval 'document.querySelectorAll("img").length'
412
+
413
+ # Complex JS: use --stdin with heredoc (RECOMMENDED)
414
+ agent-browser eval --stdin <<'EVALEOF'
415
+ JSON.stringify(
416
+ Array.from(document.querySelectorAll("img"))
417
+ .filter(i => !i.alt)
418
+ .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
419
+ )
420
+ EVALEOF
421
+
422
+ # Alternative: base64 encoding (avoids all shell escaping issues)
423
+ agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"
424
+ ```
425
+
426
+ **Why this matters:** When the shell processes your command, inner double quotes, `!` characters (history expansion), backticks, and `$()` can all corrupt the JavaScript before it reaches agent-browser. The `--stdin` and `-b` flags bypass shell interpretation entirely.
427
+
428
+ **Rules of thumb:**
429
+ - Single-line, no nested quotes -> regular `eval 'expression'` with single quotes is fine
430
+ - Nested quotes, arrow functions, template literals, or multiline -> use `eval --stdin <<'EVALEOF'`
431
+ - Programmatic/generated scripts -> use `eval -b` with base64
432
+
433
+ ## Configuration File
434
+
435
+ Create `agent-browser.json` in the project root for persistent settings:
436
+
437
+ ```json
438
+ {
439
+ "headed": true,
440
+ "proxy": "http://localhost:8080"
441
+ }
442
+ ```
443
+
444
+ Priority (lowest to highest): `~/.agent-browser/config.json` < `./agent-browser.json` < env vars < CLI flags. Use `--config <path>` or `AGENT_BROWSER_CONFIG` env var for a custom config file (exits with error if missing/invalid). All CLI options map to camelCase keys (e.g., `--executable-path` -> `"executablePath"`). Boolean flags accept `true`/`false` values (e.g., `--headed false` overrides config). Extensions from user and project configs are merged, not replaced.
445
+
446
+ ## Deep-Dive Documentation
447
+
448
+ | Reference | When to Use |
449
+ |-----------|-------------|
450
+ | [references/commands.md](references/commands.md) | Full command reference with all options |
451
+ | [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
452
+ | [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
453
+ | [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
454
+ | [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging and documentation |
455
+ | [references/profiling.md](references/profiling.md) | Chrome DevTools profiling for performance analysis |
456
+ | [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
457
+
458
+ ## Ready-to-Use Templates
459
+
460
+ | Template | Description |
461
+ |----------|-------------|
462
+ | [templates/form-automation.sh](templates/form-automation.sh) | Form filling with validation |
463
+ | [templates/authenticated-session.sh](templates/authenticated-session.sh) | Login once, reuse state |
464
+ | [templates/capture-workflow.sh](templates/capture-workflow.sh) | Content extraction with screenshots |
465
+
466
+ ```bash
467
+ ./templates/form-automation.sh https://example.com/form
468
+ ./templates/authenticated-session.sh https://app.example.com/login
469
+ ./templates/capture-workflow.sh https://example.com ./output
470
+ ```
@@ -0,0 +1,202 @@
1
+ # Authentication Patterns
2
+
3
+ Login flows, session persistence, OAuth, 2FA, and authenticated browsing.
4
+
5
+ **Related**: [session-management.md](session-management.md) for state persistence details, [SKILL.md](../SKILL.md) for quick start.
6
+
7
+ ## Contents
8
+
9
+ - [Basic Login Flow](#basic-login-flow)
10
+ - [Saving Authentication State](#saving-authentication-state)
11
+ - [Restoring Authentication](#restoring-authentication)
12
+ - [OAuth / SSO Flows](#oauth--sso-flows)
13
+ - [Two-Factor Authentication](#two-factor-authentication)
14
+ - [HTTP Basic Auth](#http-basic-auth)
15
+ - [Cookie-Based Auth](#cookie-based-auth)
16
+ - [Token Refresh Handling](#token-refresh-handling)
17
+ - [Security Best Practices](#security-best-practices)
18
+
19
+ ## Basic Login Flow
20
+
21
+ ```bash
22
+ # Navigate to login page
23
+ agent-browser open https://app.example.com/login
24
+ agent-browser wait --load networkidle
25
+
26
+ # Get form elements
27
+ agent-browser snapshot -i
28
+ # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Sign In"
29
+
30
+ # Fill credentials
31
+ agent-browser fill @e1 "user@example.com"
32
+ agent-browser fill @e2 "password123"
33
+
34
+ # Submit
35
+ agent-browser click @e3
36
+ agent-browser wait --load networkidle
37
+
38
+ # Verify login succeeded
39
+ agent-browser get url # Should be dashboard, not login
40
+ ```
41
+
42
+ ## Saving Authentication State
43
+
44
+ After logging in, save state for reuse:
45
+
46
+ ```bash
47
+ # Login first (see above)
48
+ agent-browser open https://app.example.com/login
49
+ agent-browser snapshot -i
50
+ agent-browser fill @e1 "user@example.com"
51
+ agent-browser fill @e2 "password123"
52
+ agent-browser click @e3
53
+ agent-browser wait --url "**/dashboard"
54
+
55
+ # Save authenticated state
56
+ agent-browser state save ./auth-state.json
57
+ ```
58
+
59
+ ## Restoring Authentication
60
+
61
+ Skip login by loading saved state:
62
+
63
+ ```bash
64
+ # Load saved auth state
65
+ agent-browser state load ./auth-state.json
66
+
67
+ # Navigate directly to protected page
68
+ agent-browser open https://app.example.com/dashboard
69
+
70
+ # Verify authenticated
71
+ agent-browser snapshot -i
72
+ ```
73
+
74
+ ## OAuth / SSO Flows
75
+
76
+ For OAuth redirects:
77
+
78
+ ```bash
79
+ # Start OAuth flow
80
+ agent-browser open https://app.example.com/auth/google
81
+
82
+ # Handle redirects automatically
83
+ agent-browser wait --url "**/accounts.google.com**"
84
+ agent-browser snapshot -i
85
+
86
+ # Fill Google credentials
87
+ agent-browser fill @e1 "user@gmail.com"
88
+ agent-browser click @e2 # Next button
89
+ agent-browser wait 2000
90
+ agent-browser snapshot -i
91
+ agent-browser fill @e3 "password"
92
+ agent-browser click @e4 # Sign in
93
+
94
+ # Wait for redirect back
95
+ agent-browser wait --url "**/app.example.com**"
96
+ agent-browser state save ./oauth-state.json
97
+ ```
98
+
99
+ ## Two-Factor Authentication
100
+
101
+ Handle 2FA with manual intervention:
102
+
103
+ ```bash
104
+ # Login with credentials
105
+ agent-browser open https://app.example.com/login --headed # Show browser
106
+ agent-browser snapshot -i
107
+ agent-browser fill @e1 "user@example.com"
108
+ agent-browser fill @e2 "password123"
109
+ agent-browser click @e3
110
+
111
+ # Wait for user to complete 2FA manually
112
+ echo "Complete 2FA in the browser window..."
113
+ agent-browser wait --url "**/dashboard" --timeout 120000
114
+
115
+ # Save state after 2FA
116
+ agent-browser state save ./2fa-state.json
117
+ ```
118
+
119
+ ## HTTP Basic Auth
120
+
121
+ For sites using HTTP Basic Authentication:
122
+
123
+ ```bash
124
+ # Set credentials before navigation
125
+ agent-browser set credentials username password
126
+
127
+ # Navigate to protected resource
128
+ agent-browser open https://protected.example.com/api
129
+ ```
130
+
131
+ ## Cookie-Based Auth
132
+
133
+ Manually set authentication cookies:
134
+
135
+ ```bash
136
+ # Set auth cookie
137
+ agent-browser cookies set session_token "abc123xyz"
138
+
139
+ # Navigate to protected page
140
+ agent-browser open https://app.example.com/dashboard
141
+ ```
142
+
143
+ ## Token Refresh Handling
144
+
145
+ For sessions with expiring tokens:
146
+
147
+ ```bash
148
+ #!/bin/bash
149
+ # Wrapper that handles token refresh
150
+
151
+ STATE_FILE="./auth-state.json"
152
+
153
+ # Try loading existing state
154
+ if [[ -f "$STATE_FILE" ]]; then
155
+ agent-browser state load "$STATE_FILE"
156
+ agent-browser open https://app.example.com/dashboard
157
+
158
+ # Check if session is still valid
159
+ URL=$(agent-browser get url)
160
+ if [[ "$URL" == *"/login"* ]]; then
161
+ echo "Session expired, re-authenticating..."
162
+ # Perform fresh login
163
+ agent-browser snapshot -i
164
+ agent-browser fill @e1 "$USERNAME"
165
+ agent-browser fill @e2 "$PASSWORD"
166
+ agent-browser click @e3
167
+ agent-browser wait --url "**/dashboard"
168
+ agent-browser state save "$STATE_FILE"
169
+ fi
170
+ else
171
+ # First-time login
172
+ agent-browser open https://app.example.com/login
173
+ # ... login flow ...
174
+ fi
175
+ ```
176
+
177
+ ## Security Best Practices
178
+
179
+ 1. **Never commit state files** - They contain session tokens
180
+ ```bash
181
+ echo "*.auth-state.json" >> .gitignore
182
+ ```
183
+
184
+ 2. **Use environment variables for credentials**
185
+ ```bash
186
+ agent-browser fill @e1 "$APP_USERNAME"
187
+ agent-browser fill @e2 "$APP_PASSWORD"
188
+ ```
189
+
190
+ 3. **Clean up after automation**
191
+ ```bash
192
+ agent-browser cookies clear
193
+ rm -f ./auth-state.json
194
+ ```
195
+
196
+ 4. **Use short-lived sessions for CI/CD**
197
+ ```bash
198
+ # Don't persist state in CI
199
+ agent-browser open https://app.example.com/login
200
+ # ... login and perform actions ...
201
+ agent-browser close # Session ends, nothing persisted
202
+ ```