agent-browser-stealth 0.14.0

Files changed (77)
  1. package/LICENSE +201 -0
  2. package/README.md +1219 -0
  3. package/bin/agent-browser-darwin-arm64 +0 -0
  4. package/bin/agent-browser-local +0 -0
  5. package/bin/agent-browser.js +109 -0
  6. package/dist/actions.d.ts +17 -0
  7. package/dist/actions.d.ts.map +1 -0
  8. package/dist/actions.js +1917 -0
  9. package/dist/actions.js.map +1 -0
  10. package/dist/browser.d.ts +598 -0
  11. package/dist/browser.d.ts.map +1 -0
  12. package/dist/browser.js +2287 -0
  13. package/dist/browser.js.map +1 -0
  14. package/dist/daemon.d.ts +66 -0
  15. package/dist/daemon.d.ts.map +1 -0
  16. package/dist/daemon.js +603 -0
  17. package/dist/daemon.js.map +1 -0
  18. package/dist/diff.d.ts +18 -0
  19. package/dist/diff.d.ts.map +1 -0
  20. package/dist/diff.js +271 -0
  21. package/dist/diff.js.map +1 -0
  22. package/dist/encryption.d.ts +50 -0
  23. package/dist/encryption.d.ts.map +1 -0
  24. package/dist/encryption.js +85 -0
  25. package/dist/encryption.js.map +1 -0
  26. package/dist/ios-actions.d.ts +11 -0
  27. package/dist/ios-actions.d.ts.map +1 -0
  28. package/dist/ios-actions.js +228 -0
  29. package/dist/ios-actions.js.map +1 -0
  30. package/dist/ios-manager.d.ts +266 -0
  31. package/dist/ios-manager.d.ts.map +1 -0
  32. package/dist/ios-manager.js +1073 -0
  33. package/dist/ios-manager.js.map +1 -0
  34. package/dist/protocol.d.ts +26 -0
  35. package/dist/protocol.d.ts.map +1 -0
  36. package/dist/protocol.js +935 -0
  37. package/dist/protocol.js.map +1 -0
  38. package/dist/snapshot.d.ts +67 -0
  39. package/dist/snapshot.d.ts.map +1 -0
  40. package/dist/snapshot.js +514 -0
  41. package/dist/snapshot.js.map +1 -0
  42. package/dist/state-utils.d.ts +77 -0
  43. package/dist/state-utils.d.ts.map +1 -0
  44. package/dist/state-utils.js +178 -0
  45. package/dist/state-utils.js.map +1 -0
  46. package/dist/stealth.d.ts +22 -0
  47. package/dist/stealth.d.ts.map +1 -0
  48. package/dist/stealth.js +614 -0
  49. package/dist/stealth.js.map +1 -0
  50. package/dist/stream-server.d.ts +117 -0
  51. package/dist/stream-server.d.ts.map +1 -0
  52. package/dist/stream-server.js +309 -0
  53. package/dist/stream-server.js.map +1 -0
  54. package/dist/types.d.ts +855 -0
  55. package/dist/types.d.ts.map +1 -0
  56. package/dist/types.js +2 -0
  57. package/dist/types.js.map +1 -0
  58. package/package.json +85 -0
  59. package/scripts/build-all-platforms.sh +68 -0
  60. package/scripts/check-creepjs-headless.js +137 -0
  61. package/scripts/check-sannysoft-webdriver.js +112 -0
  62. package/scripts/check-version-sync.js +39 -0
  63. package/scripts/copy-native.js +36 -0
  64. package/scripts/postinstall.js +275 -0
  65. package/scripts/sync-upstream.sh +142 -0
  66. package/scripts/sync-version.js +69 -0
  67. package/skills/agent-browser/SKILL.md +464 -0
  68. package/skills/agent-browser/references/authentication.md +202 -0
  69. package/skills/agent-browser/references/commands.md +263 -0
  70. package/skills/agent-browser/references/profiling.md +120 -0
  71. package/skills/agent-browser/references/proxy-support.md +194 -0
  72. package/skills/agent-browser/references/session-management.md +193 -0
  73. package/skills/agent-browser/references/snapshot-refs.md +194 -0
  74. package/skills/agent-browser/references/video-recording.md +173 -0
  75. package/skills/agent-browser/templates/authenticated-session.sh +100 -0
  76. package/skills/agent-browser/templates/capture-workflow.sh +69 -0
  77. package/skills/agent-browser/templates/form-automation.sh +62 -0
@@ -0,0 +1,464 @@
1
+ ---
2
+ name: agent-browser
3
+ description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
4
+ allowed-tools: Bash(npx agent-browser-stealth:*), Bash(npx agent-browser:*), Bash(agent-browser:*)
5
+ ---
6
+
7
+ # Browser Automation with agent-browser
8
+
9
+ Install package: `npm install -g agent-browser-stealth` (CLI command remains `agent-browser` for compatibility).
10
+
11
+ ## Core Workflow
12
+
13
+ Every browser automation follows this pattern:
14
+
15
+ 1. **Navigate**: `agent-browser open <url>`
16
+ 2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
17
+ 3. **Interact**: Use refs to click, fill, select
18
+ 4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
19
+
20
+ ```bash
21
+ agent-browser open https://example.com/form
22
+ agent-browser snapshot -i
23
+ # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
24
+
25
+ agent-browser fill @e1 "user@example.com"
26
+ agent-browser fill @e2 "password123"
27
+ agent-browser click @e3
28
+ agent-browser wait --load networkidle
29
+ agent-browser snapshot -i # Check result
30
+ ```
31
+
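Scripts that drive this workflow need to pull the refs out of the snapshot text before they can interact. A minimal sketch, assuming refs appear as `@e` followed by digits as in the sample output above (the real snapshot format may differ); `extract_refs` is a hypothetical helper, not part of the CLI:

```bash
# Hypothetical helper: print each distinct @eN ref found on stdin, one per line.
# Assumes refs look like @e1, @e2, ... as in the sample output above.
extract_refs() {
  grep -oE '@e[0-9]+' | awk '!seen[$0]++'
}
```

Possible usage: `agent-browser snapshot -i | extract_refs`.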
32
+ ## Command Chaining
33
+
34
+ Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
35
+
36
+ ```bash
37
+ # Chain open + wait + snapshot in one call
38
+ agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
39
+
40
+ # Chain multiple interactions
41
+ agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
42
+
43
+ # Navigate and capture
44
+ agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
45
+ ```
46
+
47
+ **When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
48
+
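The "navigate and capture" chain can be wrapped in a function so that it stops at the first failing step. This is only a sketch: `open_and_capture` is a hypothetical name, and it uses just the `open`, `wait --load networkidle`, and `screenshot` commands shown above.

```bash
# Hypothetical wrapper: open a URL, wait for the network to settle, then
# save a screenshot. Each step runs only if the previous one succeeded.
open_and_capture() {
  local url="$1" out="$2"
  agent-browser open "$url" \
    && agent-browser wait --load networkidle \
    && agent-browser screenshot "$out"
}
```

Possible usage: `open_and_capture https://example.com page.png`. No intermediate output is parsed here, which is why chaining is appropriate.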
49
+ ## Essential Commands
50
+
51
+ ```bash
52
+ # Navigation
53
+ agent-browser open <url> # Navigate (aliases: goto, navigate)
54
+ agent-browser close # Close browser
55
+
56
+ # Snapshot
57
+ agent-browser snapshot -i # Interactive elements with refs (recommended)
58
+ agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
59
+ agent-browser snapshot -s "#selector" # Scope to CSS selector
60
+
61
+ # Interaction (use @refs from snapshot)
62
+ agent-browser click @e1 # Click element
63
+ agent-browser click @e1 --new-tab # Click and open in new tab
64
+ agent-browser fill @e2 "text" # Clear and type text
65
+ agent-browser type @e2 "text" # Type without clearing
66
+ agent-browser select @e1 "option" # Select dropdown option
67
+ agent-browser check @e1 # Check checkbox
68
+ agent-browser press Enter # Press key
69
+ agent-browser keyboard type "text" # Type at current focus (no selector)
70
+ agent-browser keyboard inserttext "text" # Insert without key events
71
+ agent-browser scroll down 500 # Scroll page
72
+
73
+ # Get information
74
+ agent-browser get text @e1 # Get element text
75
+ agent-browser get url # Get current URL
76
+ agent-browser get title # Get page title
77
+
78
+ # Wait
79
+ agent-browser wait @e1 # Wait for element
80
+ agent-browser wait --load networkidle # Wait for network idle
81
+ agent-browser wait --url "**/page" # Wait for URL pattern
82
+ agent-browser wait 2000 # Wait milliseconds
83
+ agent-browser wait 2000-5000 # Random wait between 2-5 seconds
84
+
85
+ # Capture
86
+ agent-browser screenshot # Screenshot to temp dir
87
+ agent-browser screenshot --full # Full page screenshot
88
+ agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
89
+ agent-browser pdf output.pdf # Save as PDF
90
+
91
+ # Diff (compare page states)
92
+ agent-browser diff snapshot # Compare current vs last snapshot
93
+ agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
94
+ agent-browser diff screenshot --baseline before.png # Visual pixel diff
95
+ agent-browser diff url <url1> <url2> # Compare two pages
96
+ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy
97
+ agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
98
+ ```
99
+
100
+ ## Common Patterns
101
+
102
+ ### Form Submission
103
+
104
+ ```bash
105
+ agent-browser open https://example.com/signup
106
+ agent-browser snapshot -i
107
+ agent-browser fill @e1 "Jane Doe"
108
+ agent-browser fill @e2 "jane@example.com"
109
+ agent-browser select @e3 "California"
110
+ agent-browser check @e4
111
+ agent-browser click @e5
112
+ agent-browser wait --load networkidle
113
+ ```
114
+
115
+ ### Authentication with State Persistence
116
+
117
+ ```bash
118
+ # Login once and save state
119
+ agent-browser open https://app.example.com/login
120
+ agent-browser snapshot -i
121
+ agent-browser fill @e1 "$USERNAME"
122
+ agent-browser fill @e2 "$PASSWORD"
123
+ agent-browser click @e3
124
+ agent-browser wait --url "**/dashboard"
125
+ agent-browser state save auth.json
126
+
127
+ # Reuse in future sessions
128
+ agent-browser state load auth.json
129
+ agent-browser open https://app.example.com/dashboard
130
+ ```
131
+
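The login example above reads `$USERNAME` and `$PASSWORD` from the environment. If either is unset, the form would be filled with an empty string. A minimal guard, assuming bash (it relies on indirect expansion); `require_env` is a hypothetical helper:

```bash
# Hypothetical guard: fail early if any named environment variable is unset
# or empty, so the login flow never fills a blank credential.
require_env() {
  local name
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}
```

Possible usage: `require_env USERNAME PASSWORD || exit 1` before the first `fill` command.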
132
+ ### Session Persistence
133
+
134
+ ```bash
135
+ # Auto-save/restore cookies and localStorage across browser restarts
136
+ agent-browser --session-name myapp open https://app.example.com/login
137
+ # ... login flow ...
138
+ agent-browser close # State auto-saved to ~/.agent-browser/sessions/
139
+
140
+ # Next time, state is auto-loaded
141
+ agent-browser --session-name myapp open https://app.example.com/dashboard
142
+
143
+ # Encrypt state at rest
144
+ export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
145
+ agent-browser --session-name secure open https://app.example.com
146
+
147
+ # Manage saved states
148
+ agent-browser state list
149
+ agent-browser state show myapp-default.json
150
+ agent-browser state clear myapp
151
+ agent-browser state clean --older-than 7
152
+ ```
153
+
154
+ ### Data Extraction
155
+
156
+ ```bash
157
+ agent-browser open https://example.com/products
158
+ agent-browser snapshot -i
159
+ agent-browser get text @e5 # Get specific element text
160
+ agent-browser get text body > page.txt # Get all page text
161
+
162
+ # JSON output for parsing
163
+ agent-browser snapshot -i --json
164
+ agent-browser get text @e1 --json
165
+ ```
166
+
167
+ ### Parallel Sessions
168
+
169
+ ```bash
170
+ agent-browser --session site1 open https://site-a.com
171
+ agent-browser --session site2 open https://site-b.com
172
+
173
+ agent-browser --session site1 snapshot -i
174
+ agent-browser --session site2 snapshot -i
175
+
176
+ agent-browser session list
177
+ ```
178
+
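When the list of sites is not fixed, the same pattern can be generated in a loop. A sketch under the assumption that one session per URL is wanted; the `site1`, `site2`, ... naming follows the example above and `open_in_sessions` is a hypothetical helper:

```bash
# Hypothetical helper: open each URL in its own named session
# (site1, site2, ...), stopping at the first failure.
open_in_sessions() {
  local i=1 url
  for url in "$@"; do
    agent-browser --session "site$i" open "$url" || return 1
    i=$((i + 1))
  done
}
```

Possible usage: `open_in_sessions https://site-a.com https://site-b.com && agent-browser session list`.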
179
+ ### Connect to Existing Chrome
180
+
181
+ ```bash
182
+ # Auto-discover running Chrome with remote debugging enabled
183
+ agent-browser --auto-connect open https://example.com
184
+ agent-browser --auto-connect snapshot
185
+
186
+ # Or with explicit CDP port
187
+ agent-browser --cdp 9222 snapshot
188
+ ```
189
+
190
+ ### Color Scheme (Dark Mode)
191
+
192
+ ```bash
193
+ # Persistent dark mode via flag (applies to all pages and new tabs)
194
+ agent-browser --color-scheme dark open https://example.com
195
+
196
+ # Or via environment variable
197
+ AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com
198
+
199
+ # Or set during session (persists for subsequent commands)
200
+ agent-browser set media dark
201
+ ```
202
+
203
+ ### Visual Browser (Debugging)
204
+
205
+ ```bash
206
+ agent-browser --headed open https://example.com
207
+ agent-browser highlight @e1 # Highlight element
208
+ agent-browser record start demo.webm # Record session
209
+ agent-browser profiler start # Start Chrome DevTools profiling
210
+ agent-browser profiler stop trace.json # Stop and save profile (path optional)
211
+ ```
212
+
213
+ ### Local Files (PDFs, HTML)
214
+
215
+ ```bash
216
+ # Open local files with file:// URLs
217
+ agent-browser --allow-file-access open file:///path/to/document.pdf
218
+ agent-browser --allow-file-access open file:///path/to/page.html
219
+ agent-browser screenshot output.png
220
+ ```
221
+
222
+ ### Stealth Mode (Avoid Bot Detection)
223
+
224
+ Stealth mode is enabled by default. It patches common automation-detection vectors (`navigator.webdriver`, plugins, WebGL, etc.) so that websites cannot easily identify the browser as automated.
225
+
226
+ ```bash
227
+ # Stealth is on by default -- just use normally
228
+ agent-browser open https://example.com
229
+
230
+ # Disable stealth if needed for debugging
231
+ agent-browser --stealth false open https://example.com
232
+ ```
233
+
234
+ Stealth capabilities vary by connection type:
235
+
236
+ - Local launch: Chromium launch args + context init scripts
237
+ - CDP / `--auto-connect`: context init scripts
238
+ - Cloud providers: context init scripts (Kernel may also apply provider-managed stealth)
239
+
240
+ For troubleshooting, run with `--debug` to print the active stealth connection type and capabilities.
241
+
242
+ ### iOS Simulator (Mobile Safari)
243
+
244
+ ```bash
245
+ # List available iOS simulators
246
+ agent-browser device list
247
+
248
+ # Launch Safari on a specific device
249
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
250
+
251
+ # Same workflow as desktop - snapshot, interact, re-snapshot
252
+ agent-browser -p ios snapshot -i
253
+ agent-browser -p ios tap @e1 # Tap (alias for click)
254
+ agent-browser -p ios fill @e2 "text"
255
+ agent-browser -p ios swipe up # Mobile-specific gesture
256
+
257
+ # Take screenshot
258
+ agent-browser -p ios screenshot mobile.png
259
+
260
+ # Close session (shuts down simulator)
261
+ agent-browser -p ios close
262
+ ```
263
+
264
+ **Requirements:** macOS with Xcode and Appium (`npm install -g appium && appium driver install xcuitest`)
265
+
266
+ **Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
267
+
268
+ ## Diffing (Verifying Changes)
269
+
270
+ Use `diff snapshot` after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.
271
+
272
+ ```bash
273
+ # Typical workflow: snapshot -> action -> diff
274
+ agent-browser snapshot -i # Take baseline snapshot
275
+ agent-browser click @e2 # Perform action
276
+ agent-browser diff snapshot # See what changed (auto-compares to last snapshot)
277
+ ```
278
+
279
+ For visual regression testing or monitoring:
280
+
281
+ ```bash
282
+ # Save a baseline screenshot, then compare later
283
+ agent-browser screenshot baseline.png
284
+ # ... time passes or changes are made ...
285
+ agent-browser diff screenshot --baseline baseline.png
286
+
287
+ # Compare staging vs production
288
+ agent-browser diff url https://staging.example.com https://prod.example.com --screenshot
289
+ ```
290
+
291
+ `diff snapshot` output uses `+` for additions and `-` for removals, similar to git diff. `diff screenshot` produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.
292
+
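Because additions and removals are marked by the first character of each line, the output of `diff snapshot` can be summarized in a script. A rough sketch, assuming every changed line begins with `+` or `-` and unchanged lines do not (the exact output layout is not confirmed here); `summarize_diff` is a hypothetical helper:

```bash
# Hypothetical helper: count added (+) and removed (-) lines read from stdin.
summarize_diff() {
  awk '/^\+/ {a++} /^-/ {r++} END {printf "added=%d removed=%d\n", a, r}'
}
```

Possible usage: `agent-browser diff snapshot | summarize_diff`.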
293
+ ## Timeouts and Slow Pages
294
+
295
+ The default Playwright timeout is 25 seconds for local browsers. This can be overridden with the `AGENT_BROWSER_DEFAULT_TIMEOUT` environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:
296
+
297
+ ```bash
298
+ # Wait for network activity to settle (best for slow pages)
299
+ agent-browser wait --load networkidle
300
+
301
+ # Wait for a specific element to appear
302
+ agent-browser wait "#content"
303
+ agent-browser wait @e1
304
+
305
+ # Wait for a specific URL pattern (useful after redirects)
306
+ agent-browser wait --url "**/dashboard"
307
+
308
+ # Wait for a JavaScript condition
309
+ agent-browser wait --fn "document.readyState === 'complete'"
310
+
311
+ # Wait a fixed duration (milliseconds) as a last resort
312
+ agent-browser wait 5000
313
+
314
+ # Random wait between 2-5 seconds (useful for anti-detection)
315
+ agent-browser wait 2000-5000
316
+ ```
317
+
318
+ When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait <selector>` or `wait @ref`.
319
+
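Since `AGENT_BROWSER_DEFAULT_TIMEOUT` takes milliseconds, a value given in seconds is an easy mistake. A small sketch of the conversion, assuming bash; `seconds_to_ms` is a hypothetical helper:

```bash
# Hypothetical helper: convert whole seconds to the millisecond value that
# AGENT_BROWSER_DEFAULT_TIMEOUT expects. Rejects anything but digits.
seconds_to_ms() {
  case "$1" in
    ''|*[!0-9]*) echo "expected whole seconds, got: $1" >&2; return 1 ;;
  esac
  echo "$(( 10#$1 * 1000 ))"
}
```

Possible usage: `AGENT_BROWSER_DEFAULT_TIMEOUT="$(seconds_to_ms 45)" agent-browser open https://example.com`.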
320
+ ### Humanized Interactions
321
+
322
+ agent-browser automatically humanizes interactions to avoid behavioral detection:
323
+
324
+ - **Randomized typing**: `type --delay` varies each keystroke delay by ±40%
325
+ - **Random wait ranges**: `wait 2000-5000` pauses for a random duration in that range
326
+ - **Bezier curve mouse**: Before every `click`, the mouse moves along a natural-looking curve
327
+
328
+ These behaviors are always active. For sensitive sites, combine with `--headed` and `--profile` for best results.
329
+
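To make the 40% figure concrete: a base delay of 100 ms would give per-keystroke delays between 60 ms and 140 ms. The sketch below only illustrates that arithmetic; it is not the tool's implementation, and `jitter_bounds` is a hypothetical name.

```bash
# Illustrative arithmetic only: the smallest and largest per-keystroke delay
# (in ms) when a base delay is varied by 40% in either direction.
jitter_bounds() {
  local base="$1"
  echo "$(( base * 60 / 100 )) $(( base * 140 / 100 ))"
}
```

For example, `jitter_bounds 100` prints `60 140`.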
330
+ ## Session Management and Cleanup
331
+
332
+ When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
333
+
334
+ ```bash
335
+ # Each agent gets its own isolated session
336
+ agent-browser --session agent1 open https://site-a.com
337
+ agent-browser --session agent2 open https://site-b.com
338
+
339
+ # Check active sessions
340
+ agent-browser session list
341
+ ```
342
+
343
+ Always close your browser session when done to avoid leaked processes:
344
+
345
+ ```bash
346
+ agent-browser close # Close default session
347
+ agent-browser --session agent1 close # Close specific session
348
+ ```
349
+
350
+ If a previous session was not closed properly, the daemon may still be running. Use `agent-browser close` to clean it up before starting new work.
351
+
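One way to avoid leaked sessions in scripts is to close the session whether or not the work succeeded. A minimal sketch using only the `--session <name> close` form shown above; `with_session` is a hypothetical wrapper:

```bash
# Hypothetical wrapper: run a command, then always close the named session,
# even if the command failed. Returns the command's exit status.
with_session() {
  local name="$1" status=0
  shift
  "$@" || status=$?
  agent-browser --session "$name" close
  return "$status"
}
```

Possible usage: `with_session agent1 ./scrape.sh`, where `./scrape.sh` stands for any script that runs its commands with `--session agent1`.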
352
+ ## Ref Lifecycle (Important)
353
+
354
+ Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
355
+
356
+ - Clicking links or buttons that navigate
357
+ - Form submissions
358
+ - Dynamic content loading (dropdowns, modals)
359
+
360
+ ```bash
361
+ agent-browser click @e5 # Navigates to new page
362
+ agent-browser snapshot -i # MUST re-snapshot
363
+ agent-browser click @e1 # Use new refs
364
+ ```
365
+
366
+ ## Annotated Screenshots (Vision Mode)
367
+
368
+ Use `--annotate` to take a screenshot with numbered labels overlaid on interactive elements. Each label `[N]` maps to ref `@eN`. This also caches refs, so you can interact with elements immediately without a separate snapshot.
369
+
370
+ ```bash
371
+ agent-browser screenshot --annotate
372
+ # Output includes the image path and a legend:
373
+ # [1] @e1 button "Submit"
374
+ # [2] @e2 link "Home"
375
+ # [3] @e3 textbox "Email"
376
+ agent-browser click @e2 # Click using ref from annotated screenshot
377
+ ```
378
+
379
+ Use annotated screenshots when:
380
+ - The page has unlabeled icon buttons or visual-only elements
381
+ - You need to verify visual layout or styling
382
+ - Canvas or chart elements are present (invisible to text snapshots)
383
+ - You need spatial reasoning about element positions
384
+
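If the legend is captured as text, a ref can be looked up by its visible name. A sketch that assumes legend lines have the shape `[2] @e2 link "Home"` as in the example above (the real output may carry extra lines or a different layout); `ref_for_name` is a hypothetical helper:

```bash
# Hypothetical helper: read an annotated-screenshot legend on stdin and print
# the ref of the first entry whose quoted name equals the first argument.
# Assumes legend lines look like: [2] @e2 link "Home"
ref_for_name() {
  awk -v name="$1" '
    match($0, /"[^"]*"/) && substr($0, RSTART + 1, RLENGTH - 2) == name {
      print $2
      exit
    }
  '
}
```

Possible usage: `agent-browser screenshot --annotate | ref_for_name "Home"`.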
385
+ ## Semantic Locators (Alternative to Refs)
386
+
387
+ When refs are unavailable or unreliable, use semantic locators:
388
+
389
+ ```bash
390
+ agent-browser find text "Sign In" click
391
+ agent-browser find label "Email" fill "user@test.com"
392
+ agent-browser find role button click --name "Submit"
393
+ agent-browser find placeholder "Search" type "query"
394
+ agent-browser find testid "submit-btn" click
395
+ ```
396
+
397
+ ## JavaScript Evaluation (eval)
398
+
399
+ Use `eval` to run JavaScript in the browser context. **Shell quoting can corrupt complex expressions** -- use `--stdin` or `-b` to avoid issues.
400
+
401
+ ```bash
402
+ # Simple expressions work with regular quoting
403
+ agent-browser eval 'document.title'
404
+ agent-browser eval 'document.querySelectorAll("img").length'
405
+
406
+ # Complex JS: use --stdin with heredoc (RECOMMENDED)
407
+ agent-browser eval --stdin <<'EVALEOF'
408
+ JSON.stringify(
409
+ Array.from(document.querySelectorAll("img"))
410
+ .filter(i => !i.alt)
411
+ .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
412
+ )
413
+ EVALEOF
414
+
415
+ # Alternative: base64 encoding (avoids all shell escaping issues)
416
+ agent-browser eval -b "$(printf '%s' 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64 | tr -d '\n')"
417
+ ```
418
+
419
+ **Why this matters:** When the shell processes your command, inner double quotes, `!` characters (history expansion), backticks, and `$()` can all corrupt the JavaScript before it reaches agent-browser. The `--stdin` and `-b` flags bypass shell interpretation entirely.
420
+
421
+ **Rules of thumb:**
422
+ - Single-line, no nested quotes -> regular `eval 'expression'` with single quotes is fine
423
+ - Nested quotes, arrow functions, template literals, or multiline -> use `eval --stdin <<'EVALEOF'`
424
+ - Programmatic/generated scripts -> use `eval -b` with base64
425
+
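For the base64 route, note that some `base64` implementations wrap long output across lines. A small sketch that always yields a single line; `encode_js` is a hypothetical helper, and whether `eval -b` tolerates embedded newlines is not confirmed here:

```bash
# Hypothetical helper: base64-encode JavaScript read from stdin as a single
# line. Some base64 implementations wrap long output, so newlines are removed.
encode_js() {
  base64 | tr -d '\n'
}
```

Possible usage: `agent-browser eval -b "$(encode_js < script.js)"`, where `script.js` is a placeholder for a generated script.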
426
+ ## Configuration File
427
+
428
+ Create `agent-browser.json` in the project root for persistent settings:
429
+
430
+ ```json
431
+ {
432
+ "headed": true,
433
+ "proxy": "http://localhost:8080",
434
+ "profile": "./browser-data"
435
+ }
436
+ ```
437
+
438
+ Priority (lowest to highest): `~/.agent-browser/config.json` < `./agent-browser.json` < env vars < CLI flags. Use `--config <path>` or `AGENT_BROWSER_CONFIG` env var for a custom config file (exits with error if missing/invalid). All CLI options map to camelCase keys (e.g., `--executable-path` -> `"executablePath"`). Boolean flags accept `true`/`false` values (e.g., `--headed false` overrides config). Extensions from user and project configs are merged, not replaced.
439
+
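The flag-to-key mapping is mechanical: drop the leading `--` and capitalize the letter after each hyphen. A sketch of that rule in bash, for illustration only (`flag_to_key` is a hypothetical helper, not something the CLI provides):

```bash
# Hypothetical helper: convert a CLI flag such as --executable-path into the
# camelCase config key described above (executablePath).
flag_to_key() {
  local flag="${1#--}" out="" part first=1
  local IFS='-'
  for part in $flag; do
    if [ "$first" -eq 1 ]; then
      out="$part"
      first=0
    else
      out="${out}$(printf '%s' "${part:0:1}" | tr '[:lower:]' '[:upper:]')${part:1}"
    fi
  done
  printf '%s\n' "$out"
}
```

For example, `flag_to_key --executable-path` prints `executablePath`.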
440
+ ## Deep-Dive Documentation
441
+
442
+ | Reference | When to Use |
443
+ |-----------|-------------|
444
+ | [references/commands.md](references/commands.md) | Full command reference with all options |
445
+ | [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
446
+ | [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
447
+ | [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
448
+ | [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging and documentation |
449
+ | [references/profiling.md](references/profiling.md) | Chrome DevTools profiling for performance analysis |
450
+ | [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
451
+
452
+ ## Ready-to-Use Templates
453
+
454
+ | Template | Description |
455
+ |----------|-------------|
456
+ | [templates/form-automation.sh](templates/form-automation.sh) | Form filling with validation |
457
+ | [templates/authenticated-session.sh](templates/authenticated-session.sh) | Login once, reuse state |
458
+ | [templates/capture-workflow.sh](templates/capture-workflow.sh) | Content extraction with screenshots |
459
+
460
+ ```bash
461
+ ./templates/form-automation.sh https://example.com/form
462
+ ./templates/authenticated-session.sh https://app.example.com/login
463
+ ./templates/capture-workflow.sh https://example.com ./output
464
+ ```
@@ -0,0 +1,202 @@
1
+ # Authentication Patterns
2
+
3
+ Login flows, session persistence, OAuth, 2FA, and authenticated browsing.
4
+
5
+ **Related**: [session-management.md](session-management.md) for state persistence details, [SKILL.md](../SKILL.md) for quick start.
6
+
7
+ ## Contents
8
+
9
+ - [Basic Login Flow](#basic-login-flow)
10
+ - [Saving Authentication State](#saving-authentication-state)
11
+ - [Restoring Authentication](#restoring-authentication)
12
+ - [OAuth / SSO Flows](#oauth--sso-flows)
13
+ - [Two-Factor Authentication](#two-factor-authentication)
14
+ - [HTTP Basic Auth](#http-basic-auth)
15
+ - [Cookie-Based Auth](#cookie-based-auth)
16
+ - [Token Refresh Handling](#token-refresh-handling)
17
+ - [Security Best Practices](#security-best-practices)
18
+
19
+ ## Basic Login Flow
20
+
21
+ ```bash
22
+ # Navigate to login page
23
+ agent-browser open https://app.example.com/login
24
+ agent-browser wait --load networkidle
25
+
26
+ # Get form elements
27
+ agent-browser snapshot -i
28
+ # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Sign In"
29
+
30
+ # Fill credentials
31
+ agent-browser fill @e1 "user@example.com"
32
+ agent-browser fill @e2 "password123"
33
+
34
+ # Submit
35
+ agent-browser click @e3
36
+ agent-browser wait --load networkidle
37
+
38
+ # Verify login succeeded
39
+ agent-browser get url # Should be dashboard, not login
40
+ ```
41
+
42
+ ## Saving Authentication State
43
+
44
+ After logging in, save state for reuse:
45
+
46
+ ```bash
47
+ # Login first (see above)
48
+ agent-browser open https://app.example.com/login
49
+ agent-browser snapshot -i
50
+ agent-browser fill @e1 "user@example.com"
51
+ agent-browser fill @e2 "password123"
52
+ agent-browser click @e3
53
+ agent-browser wait --url "**/dashboard"
54
+
55
+ # Save authenticated state
56
+ agent-browser state save ./auth-state.json
57
+ ```
58
+
59
+ ## Restoring Authentication
60
+
61
+ Skip login by loading saved state:
62
+
63
+ ```bash
64
+ # Load saved auth state
65
+ agent-browser state load ./auth-state.json
66
+
67
+ # Navigate directly to protected page
68
+ agent-browser open https://app.example.com/dashboard
69
+
70
+ # Verify authenticated
71
+ agent-browser snapshot -i
72
+ ```
73
+
74
+ ## OAuth / SSO Flows
75
+
76
+ For OAuth redirects:
77
+
78
+ ```bash
79
+ # Start OAuth flow
80
+ agent-browser open https://app.example.com/auth/google
81
+
82
+ # Wait for the redirect to the Google sign-in page
83
+ agent-browser wait --url "**/accounts.google.com**"
84
+ agent-browser snapshot -i
85
+
86
+ # Fill Google credentials
87
+ agent-browser fill @e1 "user@gmail.com"
88
+ agent-browser click @e2 # Next button
89
+ agent-browser wait 2000
90
+ agent-browser snapshot -i
91
+ agent-browser fill @e3 "password"
92
+ agent-browser click @e4 # Sign in
93
+
94
+ # Wait for redirect back
95
+ agent-browser wait --url "**/app.example.com**"
96
+ agent-browser state save ./oauth-state.json
97
+ ```
98
+
99
+ ## Two-Factor Authentication
100
+
101
+ Handle 2FA with manual intervention:
102
+
103
+ ```bash
104
+ # Login with credentials
105
+ agent-browser open https://app.example.com/login --headed # Show browser
106
+ agent-browser snapshot -i
107
+ agent-browser fill @e1 "user@example.com"
108
+ agent-browser fill @e2 "password123"
109
+ agent-browser click @e3
110
+
111
+ # Wait for user to complete 2FA manually
112
+ echo "Complete 2FA in the browser window..."
113
+ agent-browser wait --url "**/dashboard" --timeout 120000
114
+
115
+ # Save state after 2FA
116
+ agent-browser state save ./2fa-state.json
117
+ ```
118
+
119
+ ## HTTP Basic Auth
120
+
121
+ For sites using HTTP Basic Authentication:
122
+
123
+ ```bash
124
+ # Set credentials before navigation
125
+ agent-browser set credentials username password
126
+
127
+ # Navigate to protected resource
128
+ agent-browser open https://protected.example.com/api
129
+ ```
130
+
131
+ ## Cookie-Based Auth
132
+
133
+ Manually set authentication cookies:
134
+
135
+ ```bash
136
+ # Set auth cookie
137
+ agent-browser cookies set session_token "abc123xyz"
138
+
139
+ # Navigate to protected page
140
+ agent-browser open https://app.example.com/dashboard
141
+ ```
142
+
143
+ ## Token Refresh Handling
144
+
145
+ For sessions with expiring tokens:
146
+
147
+ ```bash
148
+ #!/bin/bash
149
+ # Wrapper that handles token refresh
150
+
151
+ STATE_FILE="./auth-state.json"
152
+
153
+ # Try loading existing state
154
+ if [[ -f "$STATE_FILE" ]]; then
155
+ agent-browser state load "$STATE_FILE"
156
+ agent-browser open https://app.example.com/dashboard
157
+
158
+ # Check if session is still valid
159
+ URL=$(agent-browser get url)
160
+ if [[ "$URL" == *"/login"* ]]; then
161
+ echo "Session expired, re-authenticating..."
162
+ # Perform fresh login
163
+ agent-browser snapshot -i
164
+ agent-browser fill @e1 "$USERNAME"
165
+ agent-browser fill @e2 "$PASSWORD"
166
+ agent-browser click @e3
167
+ agent-browser wait --url "**/dashboard"
168
+ agent-browser state save "$STATE_FILE"
169
+ fi
170
+ else
171
+ # First-time login
172
+ agent-browser open https://app.example.com/login
173
+ # ... login flow ...
174
+ fi
175
+ ```
176
+
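The expiry check in the script above depends on the app redirecting expired sessions to a URL containing `/login`. That assumption can be isolated in one place; `is_login_url` is a hypothetical helper and the `/login` path is taken from the example app:

```bash
# Hypothetical check: succeed if the given URL looks like the login page,
# which the script above treats as a sign that the saved session expired.
is_login_url() {
  case "$1" in
    */login*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Possible usage: `is_login_url "$(agent-browser get url)" && echo "Session expired, re-authenticating..."`.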
177
+ ## Security Best Practices
178
+
179
+ 1. **Never commit state files** - They contain session tokens
180
+ ```bash
181
+ echo "*-state.json" >> .gitignore
182
+ ```
183
+
184
+ 2. **Use environment variables for credentials**
185
+ ```bash
186
+ agent-browser fill @e1 "$APP_USERNAME"
187
+ agent-browser fill @e2 "$APP_PASSWORD"
188
+ ```
189
+
190
+ 3. **Clean up after automation**
191
+ ```bash
192
+ agent-browser cookies clear
193
+ rm -f ./auth-state.json
194
+ ```
195
+
196
+ 4. **Use short-lived sessions for CI/CD**
197
+ ```bash
198
+ # Don't persist state in CI
199
+ agent-browser open https://app.example.com/login
200
+ # ... login and perform actions ...
201
+ agent-browser close # Session ends, nothing persisted
202
+ ```
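For the first practice, appending to `.gitignore` on every run leaves duplicate lines behind. A sketch of an idempotent variant; `ignore_pattern` is a hypothetical helper, and the pattern passed in should match whatever state file names the project actually uses:

```bash
# Hypothetical helper: add a pattern to .gitignore only if it is not already
# listed, so repeated runs do not append duplicate lines.
ignore_pattern() {
  local pattern="$1" file="${2:-.gitignore}"
  touch "$file"
  grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
}
```

Possible usage: `ignore_pattern 'auth-state.json'`.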