agentsys 5.3.7 → 5.4.0

Files changed (40)
  1. package/.agnix.toml +17 -7
  2. package/.claude-plugin/marketplace.json +13 -2
  3. package/.claude-plugin/plugin.json +1 -1
  4. package/.gitmodules +3 -0
  5. package/AGENTS.md +4 -4
  6. package/CHANGELOG.md +21 -0
  7. package/README.md +46 -5
  8. package/lib/adapter-transforms.js +3 -1
  9. package/package.json +1 -1
  10. package/site/assets/css/main.css +39 -1
  11. package/site/assets/js/main.js +24 -0
  12. package/site/content.json +4 -4
  13. package/site/index.html +82 -7
  14. package/site/ux-spec.md +5 -5
  15. package/agent-knowledge/AGENTS.md +0 -231
  16. package/agent-knowledge/acp-with-codex-gemini-copilot-claude.md +0 -504
  17. package/agent-knowledge/ai-cli-advanced-integration-patterns.md +0 -670
  18. package/agent-knowledge/ai-cli-non-interactive-programmatic-usage.md +0 -1394
  19. package/agent-knowledge/all-in-one-plus-modular-packages.md +0 -576
  20. package/agent-knowledge/cli-browser-automation-agents.md +0 -936
  21. package/agent-knowledge/github-org-project-management.md +0 -319
  22. package/agent-knowledge/github-org-structure-patterns.md +0 -268
  23. package/agent-knowledge/kiro-supervised-autopilot.md +0 -400
  24. package/agent-knowledge/multi-product-org-docs.md +0 -622
  25. package/agent-knowledge/oss-org-naming-patterns.md +0 -368
  26. package/agent-knowledge/resources/acp-with-codex-gemini-copilot-claude-sources.json +0 -408
  27. package/agent-knowledge/resources/ai-cli-non-interactive-programmatic-usage-sources.json +0 -500
  28. package/agent-knowledge/resources/all-in-one-plus-modular-packages-sources.json +0 -310
  29. package/agent-knowledge/resources/cli-browser-automation-agents-sources.json +0 -428
  30. package/agent-knowledge/resources/github-org-project-management-sources.json +0 -239
  31. package/agent-knowledge/resources/github-org-structure-patterns-sources.json +0 -293
  32. package/agent-knowledge/resources/kiro-supervised-autopilot-sources.json +0 -135
  33. package/agent-knowledge/resources/multi-product-org-docs-sources.json +0 -514
  34. package/agent-knowledge/resources/oss-org-naming-patterns-sources.json +0 -458
  35. package/agent-knowledge/resources/skill-plugin-distribution-patterns-sources.json +0 -290
  36. package/agent-knowledge/resources/terminal-browsers-agent-automation-sources.json +0 -758
  37. package/agent-knowledge/resources/web-session-persistence-cli-agents-sources.json +0 -528
  38. package/agent-knowledge/skill-plugin-distribution-patterns.md +0 -661
  39. package/agent-knowledge/terminal-browsers-agent-automation.md +0 -776
  40. package/agent-knowledge/web-session-persistence-cli-agents.md +0 -1352
@@ -1,936 +0,0 @@
1
- # Learning Guide: CLI-First Browser Automation for AI Agents
2
-
3
- **Generated**: 2026-02-20
4
- **Sources**: 32 resources analyzed
5
- **Depth**: deep
6
-
7
- ---
8
-
9
- ## Prerequisites
10
-
11
- - Basic familiarity with Node.js `npx` and/or Python `pip`/`uv`
12
- - Understanding of what cookies and browser sessions are
13
- - Awareness of Chrome DevTools Protocol (CDP) at a conceptual level
14
- - A working Node.js 18+ or Python 3.11+ environment
15
-
16
- ---
17
-
18
- ## TL;DR
19
-
20
- - **Playwright CLI** (`npx playwright`) is the lowest-friction entry point: `codegen`, `screenshot`, `pdf`, `open`, and `show-trace` commands require zero scripting. The `--save-storage` / `--load-storage` flags give you a complete auth-handoff pattern in two commands.
21
- - **Playwright MCP** exposes roughly 30 named tools (`browser_navigate`, `browser_click`, `browser_take_screenshot`, etc.) to any MCP-capable agent host without writing a single line of Playwright API code.
22
- - **browser-use** is the highest-level Python option: it ships a CLI (`browser-use open`, `browser-use click N`, `browser-use screenshot`) and a Python agent API. Designed specifically for LLM agents.
23
- - **Steel Browser** and **Browserless** expose a REST API (`POST /v1/screenshot`, `/v1/scrape`, `/v1/pdf`) so agents can drive a browser with plain `curl` or `fetch`.
24
- - For the auth handoff, the canonical pattern is: headed `npx playwright codegen --save-storage=auth.json <login-url>` → user logs in → agent uses `--load-storage=auth.json` on all subsequent headless commands.
25
-
26
- ---
27
-
28
- ## Core Concepts
29
-
30
- ### 1. The Three Automation Layers
31
-
32
- Browser automation tools fall into three layers. Understanding which layer you are at tells you how much boilerplate you need.
33
-
34
- **Layer 1 - CLI/REST verbs (zero boilerplate)**
35
- You call a binary or HTTP endpoint. No session object, no page object, no awaiting. Examples: `npx playwright screenshot`, `curl http://localhost:3000/v1/screenshot`, `browser-use click 3`.
36
-
37
- **Layer 2 - MCP tools (agent-native, zero boilerplate)**
38
- The browser is a running server exposing named tools. Your agent calls `browser_navigate({url})` as a tool call, same as any other MCP tool. The session is persistent across calls. No JS or Python required in the agent.
39
-
40
- **Layer 3 - Library API (full power, more boilerplate)**
41
- You write Playwright/Puppeteer/Rod scripts. Full control of every event but you must manage the async lifecycle yourself.
42
-
43
- For AI agents, Layer 1 and Layer 2 are almost always preferable. Layer 3 is the implementation layer for building Layer 1/2 wrappers.
44
-
45
- ### 2. Headed vs Headless
46
-
47
- - **Headed**: real browser window appears. Required for user-interactive auth flows.
48
- - **Headless**: no window. Faster, suitable for automated runs after auth is established.
49
-
50
- Every tool covered here can run in either mode, selected with a flag or option; which mode is the default varies by tool.
51
-
52
- ### 3. The Auth Handoff Problem
53
-
54
- Most interesting pages require login. The canonical pattern for CLI agents:
55
-
56
- 1. **Trigger a headed browser** so the user can see and interact with a real login form.
57
- 2. **Capture the resulting session** (cookies + localStorage) into a file.
58
- 3. **Inject that file** into all subsequent headless requests.
59
-
60
- This is a one-time human action. The agent then operates fully autonomously from step 3 onward.
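The three steps above can be sketched as a pair of command builders. This is a minimal Python sketch and not part of the removed file: the helper names `capture_auth_cmd` and `headless_screenshot_cmd` are hypothetical, and the only flags used are `--save-storage` and `--load-storage`, as named in this guide.

```python
from pathlib import Path


def capture_auth_cmd(login_url: str, auth_file: Path) -> list:
    """Steps 1-2: headed browser; the session is written to auth_file on close."""
    return ["npx", "playwright", "codegen", f"--save-storage={auth_file}", login_url]


def headless_screenshot_cmd(url: str, out_png: Path, auth_file: Path) -> list:
    """Step 3: headless screenshot that reuses the captured session."""
    return ["npx", "playwright", "screenshot", f"--load-storage={auth_file}", url, str(out_png)]


if __name__ == "__main__":
    auth = Path.home() / ".agent" / "auth" / "example-auth.json"
    print(capture_auth_cmd("https://example.com/login", auth))
    print(headless_screenshot_cmd("https://example.com/dashboard", Path("dashboard.png"), auth))
```

Each list can be handed to `subprocess.run(cmd, check=True)`. Building an argument list rather than a shell string means the auth path is resolved in Python and does not depend on shell expansion.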
61
-
62
- ### 4. Playwright storageState JSON Format
63
-
64
- `npx playwright codegen --save-storage=auth.json` writes a JSON file with this structure:
65
-
66
- ```json
- {
-   "cookies": [
-     {
-       "name": "session",
-       "value": "abc123...",
-       "domain": ".example.com",
-       "path": "/",
-       "expires": 1771234567.0,
-       "httpOnly": true,
-       "secure": true,
-       "sameSite": "Lax"
-     }
-   ],
-   "origins": [
-     {
-       "origin": "https://example.com",
-       "localStorage": [
-         { "name": "auth_token", "value": "eyJ..." }
-       ]
-     }
-   ]
- }
- ```
90
-
91
- This file is read directly by `--load-storage` and by Playwright MCP's `--storage-state`, and it can be converted to Netscape cookies.txt for `curl`/`wget`/`yt-dlp`.
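Because every cookie in this file carries an `expires` timestamp, an agent can check whether a captured session is still usable before relying on it. The following is a minimal Python sketch under stated assumptions: the function name `expired_cookies` is hypothetical, and session cookies are assumed to be stored with `expires` set to `-1`.

```python
import json
import time
from typing import Optional


def expired_cookies(state: dict, now: Optional[float] = None) -> list:
    """Return the names of cookies whose `expires` timestamp is in the past.

    Cookies with expires == -1 (assumed to mean "session cookie") are skipped.
    """
    now = time.time() if now is None else now
    return [
        c["name"]
        for c in state.get("cookies", [])
        if c.get("expires", -1) != -1 and c["expires"] <= now
    ]


if __name__ == "__main__":
    with open("auth.json") as f:  # file produced by --save-storage
        stale = expired_cookies(json.load(f))
    print("expired:", stale)
```

If the returned list contains the cookie your site uses for its session, re-run the headed capture step.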
92
-
93
- ### 5. Netscape cookies.txt Format
94
-
95
- The Netscape cookie file format is a 7-field tab-separated text file:
96
-
97
- ```
98
- # Netscape HTTP Cookie File
99
- # Generated by browser-automation tool
100
-
101
- .example.com TRUE / TRUE 1771234567 session abc123...
102
- example.com FALSE /api FALSE 0 csrf_token xyz789
103
- ```
104
-
105
- Fields: `domain`, `include_subdomains` (TRUE/FALSE), `path`, `https_only` (TRUE/FALSE), `expires_unix_epoch` (0 = session cookie), `name`, `value`.
106
-
107
- Lines starting with `#` are comments. Lines starting with `#HttpOnly_` indicate HttpOnly cookies.
108
-
109
- Used by: `curl -b cookies.txt`, `wget --load-cookies`, `yt-dlp --cookies`, `httpx`.
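The field layout described above can be sketched as a single-line parser. This is an illustrative Python sketch, not code from the removed file; `parse_netscape_line` is a hypothetical helper name, and it assumes the fields are separated by real tab characters.

```python
def parse_netscape_line(line: str):
    """Parse one cookies.txt line into a dict; return None for comments and blanks."""
    line = line.rstrip("\r\n")
    # "#HttpOnly_" is a marker prefix, not a comment, so check it first.
    http_only = line.startswith("#HttpOnly_")
    if http_only:
        line = line[len("#HttpOnly_"):]
    elif not line.strip() or line.startswith("#"):
        return None
    parts = line.split("\t")
    if len(parts) != 7:
        return None
    domain, include_subdomains, path, https_only, expires, name, value = parts
    return {
        "domain": domain,
        "include_subdomains": include_subdomains == "TRUE",
        "path": path,
        "https_only": https_only == "TRUE",
        "expires": int(expires),  # 0 means session cookie
        "name": name,
        "value": value,
        "http_only": http_only,
    }
```

Checking the `#HttpOnly_` prefix before the generic `#` comment test matters: a parser that discards every line starting with `#` silently drops HttpOnly cookies.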
110
-
111
- ---
112
-
113
- ## Tools Reference
114
-
115
- ### Playwright CLI (`npx playwright`)
116
-
117
- **Installation**: `npm install -D playwright` or `npm install -g playwright`
118
-
119
- **Core commands**:
120
-
121
- | Command | What it does | Key flags |
122
- |---------|-------------|-----------|
123
- | `npx playwright codegen [url]` | Opens headed browser, records interactions to test script | `--save-storage=auth.json`, `-o out.js`, `--target python` |
124
- | `npx playwright screenshot [url] [file]` | Headless screenshot | `--full-page`, `--load-storage=auth.json`, `-b chromium\|firefox\|webkit` |
125
- | `npx playwright pdf [url] [file]` | Save page as PDF (Chromium only) | `--paper-format=A4`, `--load-storage=auth.json` |
126
- | `npx playwright open [url]` | Open headed browser interactively | `--load-storage=auth.json`, `--save-storage=auth.json` |
127
- | `npx playwright show-trace [file]` | View recorded trace | `--port 9323` |
128
-
129
- **The auth handoff in two commands**:
130
-
131
- ```bash
132
- # Step 1: User logs in (headed browser opens, user sees real page)
133
- npx playwright codegen --save-storage=auth.json https://example.com/login
134
- # (user logs in manually, closes browser, auth.json now has cookies)
135
-
136
- # Step 2: Agent uses saved session for headless work
137
- npx playwright screenshot --load-storage=auth.json \
138
- https://example.com/dashboard dashboard.png
139
-
140
- npx playwright pdf --load-storage=auth.json \
141
- https://example.com/report report.pdf
142
- ```
143
-
144
- **Note on interactivity**: `npx playwright codegen` opens a visible browser and a side panel with generated code. The user can navigate, log in, and then close the window. The `--save-storage` flag captures state at close. This is the cleanest agent-triggered human-auth pattern available.
145
-
146
- **Standard options** (shared across all commands):
147
- - `--browser` / `-b`: `cr` (chromium), `ff` (firefox), `wk` (webkit), `msedge`, `chrome`
148
- - `--device`: emulate device (`"iPhone 13"`, `"Pixel 5"`)
149
- - `--viewport-size`: `"1280,720"`
150
- - `--user-agent`, `--lang`, `--timezone`, `--geolocation`
151
- - `--proxy-server`
152
- - `--ignore-https-errors`
153
- - `--user-data-dir`: use persistent Chrome profile with existing logins
154
- - `--channel`: `chrome`, `msedge`, `chrome-beta`
155
-
156
- **Screenshot example with wait**:
157
- ```bash
158
- npx playwright screenshot \
159
- --full-page \
160
- --wait-for-selector=".dashboard-loaded" \
161
- --load-storage=auth.json \
162
- https://example.com/dashboard \
163
- out.png
164
- ```
165
-
166
- ---
167
-
168
- ### Playwright MCP (`@playwright/mcp`)
169
-
170
- **What it is**: A Model Context Protocol server that exposes browser automation as ~30 named tools. Any MCP-capable agent host (Claude Desktop, VS Code Copilot, Cursor, Cline, Windsurf) can call these tools without any Playwright code.
171
-
172
- **Installation** (add to MCP client config):
173
- ```json
174
- {
175
- "mcpServers": {
176
- "playwright": {
177
- "command": "npx",
178
- "args": ["@playwright/mcp@latest"]
179
- }
180
- }
181
- }
182
- ```
183
-
184
- **With options** (headed browser + persistent auth):
185
- ```json
186
- {
187
- "mcpServers": {
188
- "playwright": {
189
- "command": "npx",
190
- "args": [
191
- "@playwright/mcp@latest",
192
- "--browser", "chrome",
193
- "--user-data-dir", "/home/user/.playwright-agent-profile"
194
- ]
195
- }
196
- }
197
- }
198
- ```
199
-
200
- **Available MCP tools** (complete list):
201
-
202
- *Core navigation & interaction*:
203
- | Tool | Description |
204
- |------|-------------|
205
- | `browser_navigate` | Navigate to a URL |
206
- | `browser_navigate_back` | Go back in history |
207
- | `browser_click` | Click an element (by accessibility label/text/role) |
208
- | `browser_type` | Type text into a focused field |
209
- | `browser_fill_form` | Fill multiple form fields at once |
210
- | `browser_select_option` | Choose from a dropdown |
211
- | `browser_hover` | Hover over an element |
212
- | `browser_drag` | Drag and drop between elements |
213
- | `browser_press_key` | Send keyboard input |
214
- | `browser_handle_dialog` | Respond to alert/confirm/prompt dialogs |
215
- | `browser_file_upload` | Upload a file |
216
-
217
- *Page inspection*:
218
- | Tool | Description |
219
- |------|-------------|
220
- | `browser_snapshot` | Get accessibility tree of current page (preferred over screenshot for LLMs) |
221
- | `browser_take_screenshot` | Capture PNG screenshot |
222
- | `browser_evaluate` | Run JavaScript and return result |
223
- | `browser_console_messages` | Get browser console logs |
224
- | `browser_network_requests` | List all network requests since load |
225
- | `browser_wait_for` | Wait for text to appear/disappear or timeout |
226
-
227
- *Tab & session management*:
228
- | Tool | Description |
229
- |------|-------------|
230
- | `browser_tabs` | List, create, close, or switch tabs |
231
- | `browser_resize` | Resize browser window |
232
- | `browser_close` | Close current page |
233
- | `browser_install` | Install browser binaries |
234
-
235
- *Vision mode (requires `--caps vision`)*:
236
- | Tool | Description |
237
- |------|-------------|
238
- | `browser_mouse_click_xy` | Click at pixel coordinates |
239
- | `browser_mouse_move_xy` | Move mouse to coordinates |
240
- | `browser_mouse_drag_xy` | Drag using pixel coordinates |
241
- | `browser_mouse_wheel` | Scroll |
242
-
243
- *PDF (requires `--caps pdf`)*:
244
- | Tool | Description |
245
- |------|-------------|
246
- | `browser_pdf_save` | Save current page as PDF |
247
-
248
- *Testing assertions (requires `--caps testing`)*:
249
- | Tool | Description |
250
- |------|-------------|
251
- | `browser_verify_text_visible` | Assert text is present |
252
- | `browser_verify_element_visible` | Assert element exists |
253
- | `browser_generate_locator` | Generate stable CSS/Aria selector |
254
-
255
- **How agents use Playwright MCP**:
256
-
257
- The server runs a persistent headed or headless browser. The agent calls tools sequentially:
258
-
259
- ```
260
- agent → browser_navigate({url: "https://example.com/login"})
261
- agent → browser_snapshot() # read page structure
262
- agent → browser_type({element: "Email field", text: "user@example.com"})
263
- agent → browser_type({element: "Password field", text: "..."})
264
- agent → browser_click({element: "Sign in button"})
265
- agent → browser_snapshot() # verify login succeeded
266
- agent → browser_navigate({url: "https://example.com/dashboard"})
267
- agent → browser_take_screenshot({filename: "dashboard.png"})
268
- ```
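On the wire, each step in the sequence above is an MCP `tools/call` request encoded as JSON-RPC 2.0. The sketch below only builds those messages; it is a hedged illustration in Python, `tool_call` is a hypothetical helper, and the argument shapes are copied from the sequence above rather than from the server's published tool schemas. An MCP host normally performs the `initialize` handshake and sends these requests for you.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC requires each in-flight request to be distinguishable.
_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


if __name__ == "__main__":
    print(tool_call("browser_navigate", {"url": "https://example.com/login"}))
    print(tool_call("browser_snapshot", {}))
```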
269
-
270
- **Auth handoff with Playwright MCP**:
271
-
272
- Option A - Persistent profile (simplest):
273
- ```json
274
- "args": ["@playwright/mcp@latest", "--user-data-dir", "/path/to/profile"]
275
- ```
276
- User logs into a normal Chrome window using that profile once. The agent reuses that profile on every later run, until the site's own session expires.
277
-
278
- Option B - Storage state file:
279
- ```json
280
- "args": ["@playwright/mcp@latest", "--storage-state", "/path/to/auth.json"]
281
- ```
282
- Auth was captured separately (e.g., with `npx playwright codegen --save-storage`).
283
-
284
- Option C - Chrome extension bridge:
285
- ```json
286
- "args": ["@playwright/mcp@latest", "--extension"]
287
- ```
288
- The agent connects to your currently running Chrome browser tab. Uses whatever session is already active.
289
-
290
- Option D - CDP endpoint (connect to running Chrome):
291
- ```json
292
- "args": [
293
- "@playwright/mcp@latest",
294
- "--cdp-endpoint", "http://localhost:9222"
295
- ]
296
- ```
297
- User launches Chrome with `--remote-debugging-port=9222`, logs in, agent connects to that live session.
298
-
299
- **Key insight**: `browser_snapshot` returns the accessibility tree as structured text, not a screenshot. This is far more token-efficient for LLM consumption and does not require a vision model.
300
-
301
- ---
302
-
303
- ### browser-use (Python)
304
-
305
- **What it is**: A Python library with a CLI and agent API designed specifically for LLM-driven browser automation. The agent receives a high-level task description and plans/executes browser interactions autonomously.
306
-
307
- **Installation**:
308
- ```bash
309
- pip install browser-use
310
- # or
311
- uv add browser-use
312
- uvx browser-use install # downloads Chromium
313
- ```
314
-
315
- **CLI interface** (stateful session persists between commands):
316
- ```bash
317
- browser-use open https://example.com # navigate
318
- browser-use state # list clickable elements by index
319
- browser-use click 5 # click element #5
320
- browser-use type "search query" # type text
321
- browser-use screenshot page.png # capture screen
322
- browser-use close # end session
323
- ```
324
-
325
- **Agent API** (LLM controls browser autonomously):
326
- ```python
- from browser_use import Agent, Browser, ChatBrowserUse
- import asyncio
-
- async def run():
-     browser = Browser()
-     llm = ChatBrowserUse()  # or use OpenAI, Anthropic, etc.
-     agent = Agent(
-         task="Log into GitHub, go to my notifications, summarize the top 3",
-         llm=llm,
-         browser=browser,
-     )
-     result = await agent.run()
-     print(result)
-
- asyncio.run(run())
- ```
343
-
344
- **Auth with real Chrome profile**:
345
- ```python
- from browser_use import Browser, BrowserConfig
-
- browser = Browser(config=BrowserConfig(
-     chrome_instance_path="/usr/bin/google-chrome",
-     # Uses default Chrome profile with existing logins
- ))
- ```
353
-
354
- **Custom tools extension**:
355
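- ```python
- from browser_use import Agent, Controller
- from browser_use.browser.context import BrowserContext
-
- # Custom actions are registered on a Controller, which is then passed to the Agent.
- # These import paths follow older browser-use releases and may differ in yours.
- controller = Controller()
-
- @controller.action("Read the current page URL and return it")
- async def get_current_url(browser: BrowserContext) -> str:
-     page = await browser.get_current_page()
-     return page.url
-
- # agent = Agent(task="<task>", llm=llm, controller=controller)
- ```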
364
-
365
- **Comparison to Playwright MCP**: browser-use is more autonomous - you give it a task and it figures out the steps. Playwright MCP gives you individual tool calls (more control, less autonomy). browser-use requires Python; Playwright MCP is language-agnostic.
366
-
367
- ---
368
-
369
- ### puppeteer-extra
370
-
371
- **What it is**: Puppeteer with a plugin system. The key plugin is `puppeteer-extra-plugin-stealth` which patches ~20 bot-detection signals.
372
-
373
- **Installation**:
374
- ```bash
375
- npm install puppeteer-extra puppeteer-extra-plugin-stealth
376
- ```
377
-
378
- **Basic usage** (still requires scripting, no CLI wrapper):
379
- ```javascript
380
- const puppeteer = require('puppeteer-extra');
381
- const StealthPlugin = require('puppeteer-extra-plugin-stealth');
382
- puppeteer.use(StealthPlugin());
383
-
384
- const browser = await puppeteer.launch({ headless: false });
385
- const page = await browser.newPage();
386
- await page.goto('https://example.com');
387
- await page.screenshot({ path: 'screenshot.png' });
388
- await browser.close();
389
- ```
390
-
391
- **Key note**: There is no standalone puppeteer CLI tool for agents. Puppeteer is a library only. For CLI-driven use, Playwright CLI is the better choice. Puppeteer-extra's main value is stealth for avoiding bot detection.
392
-
393
- **Comparison to Playwright**: Playwright is now generally preferred. Playwright has a built-in CLI, supports 3 browser engines natively, and has a richer ecosystem including MCP. Puppeteer supports Chrome/Firefox only and has no CLI.
394
-
395
- ---
396
-
397
- ### Chrome DevTools Protocol (CDP) Direct
398
-
399
- **What it is**: CDP is the underlying wire protocol that Playwright, Puppeteer, and all Chromium-based automation tools use. You can drive Chrome directly via HTTP and WebSocket without any framework.
400
-
401
- **Launching Chrome with debugging port**:
402
- ```bash
403
- # Headed (user-visible) - good for auth
404
- google-chrome \
405
- --remote-debugging-port=9222 \
406
- --user-data-dir=/tmp/chrome-agent \
407
- https://example.com/login
408
-
409
- # Or headless
410
- google-chrome \
411
- --headless=new \
412
- --remote-debugging-port=9222 \
413
- --user-data-dir=/tmp/chrome-agent
414
- ```
415
-
416
- **HTTP API endpoints** (no WebSocket needed for these):
417
- ```bash
418
- # List tabs
419
- curl http://localhost:9222/json/list
420
-
421
- # Create new tab
422
- curl "http://localhost:9222/json/new?https://example.com"
423
-
424
- # Close tab
425
- curl "http://localhost:9222/json/close/{targetId}"
426
-
427
- # Get browser version
428
- curl http://localhost:9222/json/version
429
- ```
430
-
431
- **WebSocket CDP commands** (for actual page control):
432
- ```javascript
433
- const CDP = require('chrome-remote-interface');
434
-
435
- async function captureAuth() {
436
- const client = await CDP();
437
- const { Network, Page } = client;
438
-
439
- await Network.enable();
440
- await Page.enable();
441
- await Page.navigate({ url: 'https://example.com/login' });
442
- await Page.loadEventFired();
443
-
444
- // After user logs in (poll or wait), capture cookies
445
- const { cookies } = await Network.getAllCookies();
446
- console.log(JSON.stringify(cookies));
447
-
448
- await client.close();
449
- }
450
- ```
451
-
452
- **CLI REPL with chrome-remote-interface**:
453
- ```bash
454
- npm install -g chrome-remote-interface
455
-
456
- # List targets
457
- chrome-remote-interface list
458
-
459
- # Open a URL in new tab
460
- chrome-remote-interface new 'https://example.com'
461
-
462
- # Interactive REPL (send CDP commands interactively)
463
- chrome-remote-interface inspect
464
- # Then inside REPL:
465
- # > Page.navigate({url: 'https://example.com'})
466
- # > Network.getAllCookies()
467
- ```
468
-
469
- **Getting cookies via CDP**:
470
- ```bash
471
- # Using websocat + jq (pure CLI, no Node.js needed after browser launch)
472
- WS=$(curl -s http://localhost:9222/json/list | jq -r '.[0].webSocketDebuggerUrl')
473
- echo '{"id":1,"method":"Network.getAllCookies"}' \
474
- | websocat "$WS" \
475
- | jq '.result.cookies[]'
476
- ```
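The `jq -r '.[0].webSocketDebuggerUrl'` step above takes whichever target happens to be listed first, which may not be a page. A slightly more careful selection can be sketched in Python as follows. This is an illustration only: `first_page_ws_url` is a hypothetical helper, and it assumes the `/json/list` response is a JSON array of objects with `type` and `webSocketDebuggerUrl` keys.

```python
from typing import Optional


def first_page_ws_url(targets: list) -> Optional[str]:
    """Return the WebSocket URL of the first target of type "page", if any."""
    for target in targets:
        if target.get("type") == "page" and "webSocketDebuggerUrl" in target:
            return target["webSocketDebuggerUrl"]
    return None


if __name__ == "__main__":
    import json
    import urllib.request

    # Requires Chrome started with --remote-debugging-port=9222.
    with urllib.request.urlopen("http://localhost:9222/json/list") as resp:
        print(first_page_ws_url(json.load(resp)))
```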
477
-
478
- **CDP verdict for agents**: CDP is powerful but verbose. Best used as a foundation layer. The chrome-remote-interface REPL is useful for exploration. For production agent use, Playwright MCP or Playwright CLI are cleaner because they handle the WebSocket protocol, target management, and element selectors automatically.
479
-
480
- ---
481
-
482
- ### Browserless (Self-Hosted REST API)
483
-
484
- **What it is**: A Docker service that wraps headless Chrome and exposes a REST API. Agents call HTTP endpoints without managing any browser process.
485
-
486
- **Run locally**:
487
- ```bash
488
- docker run -p 3000:3000 ghcr.io/browserless/chrome
489
- ```
490
-
491
- **REST endpoints** (all `POST` with JSON body):
492
-
493
- ```bash
494
- # Screenshot
495
- curl -X POST http://localhost:3000/screenshot \
496
- -H "Content-Type: application/json" \
497
- -d '{"url": "https://example.com", "fullPage": true}' \
498
- --output out.png
499
-
500
- # PDF
501
- curl -X POST http://localhost:3000/pdf \
502
- -H "Content-Type: application/json" \
503
- -d '{"url": "https://example.com"}' \
504
- --output out.pdf
505
-
506
- # HTML content
507
- curl -X POST http://localhost:3000/content \
508
- -H "Content-Type: application/json" \
509
- -d '{"url": "https://example.com"}'
510
-
511
- # Execute Puppeteer script
512
- curl -X POST http://localhost:3000/function \
513
- -H "Content-Type: application/json" \
514
- -d '{
515
- "code": "module.exports = async ({page, context}) => { await page.goto(context.url); return await page.title(); }",
516
- "context": {"url": "https://example.com"}
517
- }'
518
- ```
519
-
520
- **Passing cookies to Browserless**:
521
- ```bash
522
- # Inject cookies in the request body
523
- curl -X POST http://localhost:3000/screenshot \
524
- -H "Content-Type: application/json" \
525
- -d '{
526
- "url": "https://example.com/dashboard",
527
- "cookies": [
528
- {"name": "session", "value": "abc123", "domain": "example.com"}
529
- ]
530
- }' --output dashboard.png
531
- ```
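The request body above can be generated from a captured storageState file instead of being written by hand. The following is a minimal Python sketch; `browserless_body` is a hypothetical helper, and it forwards only the three cookie fields that appear in the example above, since which other fields the service accepts is not established here.

```python
import json


def browserless_body(url: str, state: dict) -> str:
    """Build a JSON request body carrying the cookies from a storageState dict."""
    cookies = [
        {"name": c["name"], "value": c["value"], "domain": c["domain"]}
        for c in state.get("cookies", [])
    ]
    return json.dumps({"url": url, "cookies": cookies})


if __name__ == "__main__":
    with open("auth.json") as f:  # file produced by --save-storage
        print(browserless_body("https://example.com/dashboard", json.load(f)))
```

The printed string can be passed to `curl` with `-d @-` on standard input or written to a file and sent with `-d @body.json`.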
532
-
533
- **Trade-off**: Requires Docker. But once running, agents just need `curl`. No Node.js, no Python. Good for polyglot agents.
534
-
535
- ---
536
-
537
- ### Steel Browser (Self-Hosted REST API)
538
-
539
- **What it is**: An open-source browser API service similar to Browserless but with a session-oriented architecture. Good for multi-step authenticated workflows.
540
-
541
- **Run locally**:
542
- ```bash
543
- # Via npm
544
- npx @steel-dev/steel start
545
- # Or Docker
546
- docker run -p 3000:3000 ghcr.io/steel-dev/steel
547
- ```
548
-
549
- **REST endpoints**:
550
-
551
- ```bash
552
- # Create a session (returns sessionId)
553
- SESSION=$(curl -s -X POST http://localhost:3000/v1/sessions \
554
- -H "Content-Type: application/json" \
555
- -d '{"blockAds": true}' | jq -r '.id')
556
-
557
- # Screenshot a URL (stateless quick action)
558
- curl -X POST http://localhost:3000/v1/screenshot \
559
- -H "Content-Type: application/json" \
560
- -d '{"url": "https://example.com", "fullPage": true}' \
561
- --output out.png
562
-
563
- # Scrape page content
564
- curl -X POST http://localhost:3000/v1/scrape \
565
- -H "Content-Type: application/json" \
566
- -d '{"url": "https://example.com"}'
567
-
568
- # PDF
569
- curl -X POST http://localhost:3000/v1/pdf \
570
- -H "Content-Type: application/json" \
571
- -d '{"url": "https://example.com"}' \
572
- --output out.pdf
573
- ```
574
-
575
- **Sessions persist cookies** across requests - once you log into a page within a session, all subsequent requests in that session are authenticated.
576
-
577
- **Connect Playwright to Steel session**:
578
- ```javascript
579
- const { chromium } = require('playwright');
580
- const browser = await chromium.connectOverCDP(
581
- `ws://localhost:3000?sessionId=${sessionId}`
582
- );
583
- ```
584
-
585
- ---
586
-
587
- ## The Auth Handoff: Three Patterns
588
-
589
- ### Pattern 1: Playwright CLI (Recommended for CLI Agents)
590
-
591
- **When to use**: Your agent runs from a shell and you want to avoid writing any framework code.
592
-
593
- ```bash
594
- # ---- Human does this once ----
595
- # Open headed browser for user login
596
- npx playwright codegen \
597
- --save-storage="$HOME/.agent/auth/example-auth.json" \
598
- https://example.com/login
599
- # [Browser opens, user logs in and closes the browser, example-auth.json is written]
600
-
601
- # ---- Agent does this autonomously ----
602
- npx playwright screenshot \
603
- --load-storage="$HOME/.agent/auth/example-auth.json" \
604
- https://example.com/dashboard \
605
- /tmp/dashboard.png
606
-
607
- # Agent can also generate a full-page PDF
608
- npx playwright pdf \
609
- --load-storage="$HOME/.agent/auth/example-auth.json" \
610
- https://example.com/report \
611
- /tmp/report.pdf
612
- ```
613
-
614
- **No Playwright code written. No async/await. Just CLI commands.**
615
-
616
- ### Pattern 2: Playwright MCP with Chrome Extension (Recommended for MCP Agents)
617
-
618
- **When to use**: Your agent is running inside an MCP host and you want to connect to the user's real logged-in browser.
619
-
620
- ```json
621
- {
622
- "mcpServers": {
623
- "playwright": {
624
- "command": "npx",
625
- "args": ["@playwright/mcp@latest", "--extension"]
626
- }
627
- }
628
- }
629
- ```
630
-
631
- 1. User installs "Playwright MCP Bridge" Chrome extension.
632
- 2. User is already logged into sites in their normal Chrome.
633
- 3. Agent calls `browser_navigate` / `browser_snapshot` / `browser_click` directly on those tabs.
634
- 4. No auth file needed - the user's live session is used.
635
-
636
- ### Pattern 3: CDP + Chrome --remote-debugging-port
637
-
638
- **When to use**: You want the most direct control, or you're already running Chrome elsewhere.
639
-
640
- ```bash
641
- # User launches Chrome with debugging enabled
642
- google-chrome \
643
- --remote-debugging-port=9222 \
644
- --user-data-dir=$HOME/.agent-chrome-profile \
645
- https://example.com/login
646
-
647
- # User logs in normally.
648
-
649
- # Agent now connects and captures cookies
650
- node -e "
651
- const CDP = require('chrome-remote-interface');
652
- CDP(async (client) => {
653
- await client.Network.enable();
654
- const {cookies} = await client.Network.getAllCookies();
655
- const fs = require('fs');
656
- // Convert to Playwright storageState format
657
- fs.writeFileSync('auth.json', JSON.stringify({cookies, origins: []}, null, 2));
658
- await client.close();
659
- });
660
- "
661
- ```
662
-
663
- Then use `auth.json` with `npx playwright screenshot --load-storage=auth.json ...`.
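Cookies returned by `Network.getAllCookies` carry extra CDP-only fields (for example `size`, `session`, `priority`) and may omit `sameSite`. A conservative conversion keeps only the fields shown in the storageState format earlier in this guide. The Python sketch below is an illustration under assumptions: `cdp_to_storage_state` is a hypothetical helper, and defaulting a missing `sameSite` to `"Lax"` is a choice made here, not documented behavior.

```python
import json

# Cookie fields that appear in the storageState example in this guide.
STORAGE_STATE_KEYS = ("name", "value", "domain", "path", "expires", "httpOnly", "secure")


def cdp_to_storage_state(cdp_cookies: list) -> dict:
    """Reduce CDP cookie objects to the storageState cookie shape."""
    cookies = []
    for c in cdp_cookies:
        cookie = {key: c[key] for key in STORAGE_STATE_KEYS}
        # Assumption: treat an unset sameSite as "Lax".
        cookie["sameSite"] = c.get("sameSite", "Lax")
        cookies.append(cookie)
    return {"cookies": cookies, "origins": []}


if __name__ == "__main__":
    sample = [{
        "name": "session", "value": "abc123", "domain": ".example.com", "path": "/",
        "expires": 1771234567.0, "size": 13, "httpOnly": True, "secure": True,
        "session": False, "priority": "Medium",
    }]
    print(json.dumps(cdp_to_storage_state(sample), indent=2))
```

Note that `origins` stays empty: this route captures cookies only, so any auth token the site keeps in localStorage is not carried over.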
664
-
665
- ---
666
-
667
- ## Converting Between Cookie Formats
668
-
669
- ### Playwright storageState → Netscape cookies.txt
670
-
671
- Useful when you want to use the captured session with `curl`, `wget`, or `yt-dlp`.
672
-
673
- ```python
- import json
-
- # Convert Playwright storageState (auth.json) to Netscape cookies.txt on stdout.
- with open('auth.json') as f:
-     auth = json.load(f)
-
- print("# Netscape HTTP Cookie File")
- for c in auth.get('cookies', []):
-     domain = c['domain']
-     include_subdomains = 'TRUE' if domain.startswith('.') else 'FALSE'
-     path = c.get('path', '/')
-     https_only = 'TRUE' if c.get('secure', False) else 'FALSE'
-     # storageState uses -1 for session cookies; cookies.txt uses 0
-     expires = int(c['expires']) if c.get('expires', -1) != -1 else 0
-     name = c['name']
-     value = c['value']
-     print(f"{domain}\t{include_subdomains}\t{path}\t{https_only}\t{expires}\t{name}\t{value}")
- ```
689
-
690
- ```bash
691
- python3 convert.py > cookies.txt
692
- curl -b cookies.txt https://example.com/api/data
693
- wget --load-cookies=cookies.txt https://example.com/api/data
694
- yt-dlp --cookies cookies.txt https://example.com/video
695
- ```
696
-
697
- ### Netscape cookies.txt → Playwright storageState
698
-
699
- ```python
- import json
-
- def netscape_to_playwright(cookies_file):
-     cookies = []
-     with open(cookies_file) as f:
-         for line in f:
-             line = line.rstrip('\r\n')
-             # "#HttpOnly_" marks an HttpOnly cookie; it is not a comment
-             http_only = line.startswith('#HttpOnly_')
-             if http_only:
-                 line = line[len('#HttpOnly_'):]
-             elif not line.strip() or line.startswith('#'):
-                 continue
-             parts = line.split('\t')
-             if len(parts) != 7:
-                 continue
-             domain, incl_sub, path, https_only, expires, name, value = parts
-             cookies.append({
-                 'name': name,
-                 'value': value,
-                 'domain': domain,
-                 'path': path,
-                 # cookies.txt uses 0 for session cookies; storageState uses -1
-                 'expires': float(expires) if expires and expires != '0' else -1,
-                 'httpOnly': http_only,
-                 'secure': https_only == 'TRUE',
-                 'sameSite': 'None'
-             })
-     return {'cookies': cookies, 'origins': []}
-
- state = netscape_to_playwright('cookies.txt')
- with open('auth.json', 'w') as f:
-     json.dump(state, f, indent=2)
- ```
728
-
729
- ---
730
-
731
- ## Comparison Table
732
-
733
- | Tool | Interface | Auth Handoff | Boilerplate | Best For |
734
- |------|-----------|-------------|-------------|---------|
735
- | **Playwright CLI** | Shell commands | `--save-storage` / `--load-storage` | Zero | CLI agents, shell scripts |
736
- | **Playwright MCP** | MCP tool calls | `--storage-state`, `--extension`, `--cdp-endpoint` | Zero | MCP agent hosts (Claude, Cursor, etc.) |
737
- | **browser-use** | Python + CLI | Chrome profile reuse | Low (Python) | Autonomous task agents (Python) |
738
- | **Chrome CDP direct** | WebSocket + HTTP | Manual cookie capture | Medium (JS) | Fine-grained control, low-level |
739
- | **chrome-remote-interface** | CLI REPL + JS | `Network.getAllCookies()` | Low-medium | Exploration, scripting |
740
- | **Browserless** | REST API (curl) | Cookie injection in JSON body | Zero (needs Docker) | Polyglot agents, Docker-friendly |
741
- | **Steel Browser** | REST API (curl) | Session-scoped cookie persistence | Zero (needs Docker/npx) | Multi-step auth workflows |
742
- | **puppeteer-extra** | JS library | Manual scripting | High | Bot-detection avoidance |
743
-
744
- ---
745
-
746
- ## Common Pitfalls
747
-
748
- | Pitfall | Why It Happens | How to Avoid |
749
- |---------|---------------|--------------|
750
- | Capturing auth.json but cookies expire | Session cookies have short TTL | Check `expires` field; re-capture if expired. Use `--user-data-dir` for persistent profile instead. |
751
- | Playwright PDF not working | PDF command only works with Chromium | Always pass `-b chromium` or `--channel chrome` for PDF |
752
- | Screenshot captures login page, not dashboard | Session not loaded | Always pass `--load-storage=auth.json` |
753
- | Browser bot-detection blocking | Playwright leaves fingerprints | Use `--channel chrome` (real Chrome binary) instead of Chromium, or use puppeteer-extra-stealth. |
754
- | MCP tools using accessibility tree but page has poor ARIA | Site has no semantic markup | Fall back to `browser_take_screenshot` + vision, or use `browser_evaluate` for DOM queries |
755
- | CDP WebSocket closes on page navigation | WebSocket is per-target | Re-attach after navigation using Target events |
756
- | Netscape cookies.txt parse error | Wrong line endings (CRLF vs LF) | Normalize to LF on Unix: `sed -i 's/\r//' cookies.txt` |
757
- | `browser-use` agent gets stuck in loop | LLM hallucinating element states | Set `max_steps` limit; use `browser-use state` to inspect actual element indices |
758
- | auth.json committed to git | Forgot to gitignore | Add `*.auth.json`, `auth/`, `.auth/` to `.gitignore` |
759
-
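The first pitfall says to check the `expires` field before reusing a captured file. A minimal sketch of that check, assuming the storage-state layout used elsewhere in this document (a top-level `cookies` list whose entries carry `expires` as a Unix timestamp, with `-1` for session cookies); the function name `expired_cookies` is hypothetical.

```python
import json
import time


def expired_cookies(state, now=None):
    """Return the names of cookies whose expiry timestamp has passed.

    Session cookies (expires <= 0) carry no timestamp and are not reported.
    """
    now = time.time() if now is None else now
    return [
        c["name"]
        for c in state.get("cookies", [])
        if c.get("expires", -1) > 0 and c["expires"] <= now
    ]


# Usage sketch (assumes an auth.json captured earlier):
# with open("auth.json") as f:
#     stale = expired_cookies(json.load(f))
# if stale:
#     print("Re-capture auth; expired cookies:", stale)
```

An empty result does not prove the session is still valid, since a server can revoke a session early, so the screenshot check described under Best Practices still applies.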
760
- ---
761
-
762
- ## Best Practices
763
-
764
- 1. **Store auth files outside the repo** — use `~/.agent/auth/{service}-auth.json` or environment-relative paths. Never commit session files. (Multiple sources)
765
-
766
- 2. **Prefer `--user-data-dir` over `--save-storage` for long-running agents** — user data directories persist across browser restarts, handle refresh tokens, and work for sites that rotate session cookies. (Playwright MCP docs)
767
-
768
- 3. **Use `browser_snapshot` over screenshots for text extraction** — the accessibility tree is ~10x more token-efficient than describing a screenshot and does not require a vision model. (Playwright MCP README)
769
-
770
- 4. **Use `--channel chrome` (real Chrome) when bot detection is an issue** — websites fingerprint Chrome vs Chromium. The real Chrome binary passes more checks. (Playwright docs, chrome-for-testing)
771
-
772
- 5. **Separate the headed auth step from the headless work step** — document these as two distinct phases in your agent code. This makes re-authentication easy when sessions expire. (browser-use docs)
773
-
774
- 6. **For multi-step workflows, use session-based tools** — Steel Browser sessions and Playwright MCP's persistent browser maintain cookie state across page navigations automatically. One-shot REST calls lose state. (Steel Browser docs)
775
-
776
- 7. **Test for element visibility before interaction** — use `--wait-for-selector` (CLI) or `browser_wait_for` (MCP) to avoid flaky automation on dynamic pages. (Playwright CLI docs)
777
-
778
- 8. **Validate the captured auth immediately** — after `--save-storage`, run one screenshot with `--load-storage` and check it shows the logged-in state before using the auth file in production. (Playwright docs)
779
-
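Practice 1 can be sketched in a few lines. The helper names `auth_file_for` and `is_inside_git_repo` are hypothetical; the path layout follows the `~/.agent/auth/{service}-auth.json` convention suggested above, and the repository check is a plain walk up the directory tree looking for `.git`, which is an assumption about how a repo is recognized rather than a call into git itself.

```python
from pathlib import Path


def auth_file_for(service, base=None):
    """Build the auth file path for a service without creating anything."""
    root = Path(base) if base is not None else Path.home() / ".agent" / "auth"
    return root / f"{service}-auth.json"


def is_inside_git_repo(path):
    """True if the path or any parent directory contains a .git entry."""
    p = Path(path).resolve()
    return any((parent / ".git").exists() for parent in [p, *p.parents])


# Usage sketch: refuse to write session files into a working tree.
# target = auth_file_for("myapp")
# if is_inside_git_repo(target.parent):
#     raise SystemExit(f"Refusing to store auth inside a git repo: {target}")
```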
780
- ---
781
-
782
- ## Code Examples
783
-
784
- ### Complete Shell-Only Auth Handoff
785
-
786
- ```bash
787
- #!/bin/bash
788
- # auth-handoff.sh - Agent auth handoff using only Playwright CLI
789
-
790
- AUTH_FILE="$HOME/.agent/auth/myapp-auth.json"
791
- BASE_URL="https://myapp.example.com"
792
-
793
- # Phase 1: Human auth (run once, or when session expires)
794
- capture_auth() {
795
-     mkdir -p "$(dirname "$AUTH_FILE")"
796
-     echo "Opening browser for login..."
797
-     npx playwright codegen \
798
-         --save-storage="$AUTH_FILE" \
799
-         "$BASE_URL/login"
800
-     echo "Auth captured: $AUTH_FILE"
801
- }
802
-
803
- # Phase 2: Agent uses auth headlessly
804
- take_screenshot() {
805
-     local url="$1"
806
-     local out="$2"
807
-     npx playwright screenshot \
808
-         --load-storage="$AUTH_FILE" \
809
-         --full-page \
810
-         "$url" "$out"
811
- }
812
-
813
- save_pdf() {
814
-     local url="$1"
815
-     local out="$2"
816
-     npx playwright pdf \
817
-         --load-storage="$AUTH_FILE" \
818
-         -b chromium \
819
-         "$url" "$out"
820
- }
821
-
822
- # If the auth file is missing, capture it (staleness is not checked here)
823
- if [ ! -f "$AUTH_FILE" ]; then
824
-     capture_auth
825
- fi
826
-
827
- # Agent work
828
- take_screenshot "$BASE_URL/dashboard" /tmp/dashboard.png
829
- save_pdf "$BASE_URL/report/monthly" /tmp/monthly-report.pdf
830
- ```
831
-
832
- ### Playwright MCP Agent Workflow (Conceptual)
833
-
834
- When an MCP agent wants to do browser work:
835
-
836
- ```
837
- # Agent internal monologue:
838
- # 1. Check if page is accessible
839
- tool_call: browser_navigate({url: "https://app.example.com/dashboard"})
840
- tool_call: browser_snapshot()
841
- # → Returns accessibility tree; if login wall detected, trigger auth flow
842
-
843
- # 2. If login needed (persistent profile approach):
844
- # Agent tells user: "Please log into the browser window that just opened"
845
- # (Browser was started with --user-data-dir, user's existing login may already work)
846
-
847
- # 3. Once authenticated, proceed
848
- tool_call: browser_snapshot() # verify dashboard loaded
849
- tool_call: browser_evaluate({function: "() => document.title"}) # extract data
850
- tool_call: browser_take_screenshot({filename: "/tmp/dashboard.png"})
851
- ```
852
-
853
- ### Python Agent with browser-use + Cookie Export
854
-
855
- ```python
856
- import asyncio, json
857
- from browser_use import Agent, Browser, BrowserConfig, ChatBrowserUse
858
-
859
- async def authenticated_scrape():
860
-     # Option A: Use existing Chrome profile (simplest for auth)
861
-     browser = Browser(config=BrowserConfig(
862
-         chrome_instance_path="/usr/bin/google-chrome",
863
-         headless=False,
864
-     ))
865
-
866
-     # Option B: Use previously saved Playwright storageState
867
-     # browser = Browser(config=BrowserConfig(storage_state="auth.json"))
868
-
869
-     llm = ChatBrowserUse()
870
-     agent = Agent(
871
-         task="""
872
-         Go to https://app.example.com/reports.
873
-         Find the most recent report dated this month.
874
-         Download it or return its URL.
875
-         """,
876
-         llm=llm,
877
-         browser=browser,
878
-         max_steps=20,
879
-     )
880
-
881
-     result = await agent.run()
882
-     print(result)
883
-     await browser.close()
884
-
885
- asyncio.run(authenticated_scrape())
886
- ```
887
-
888
- ### curl with Cookies from Playwright Auth
889
-
890
- ```bash
891
- # After capturing auth.json with playwright codegen --save-storage
892
-
893
- # Quick Python converter (inline)
894
- python3 -c "
895
- import json, sys
896
- data = json.load(open('auth.json'))
897
- print('# Netscape HTTP Cookie File')
898
- for c in data.get('cookies', []):
899
-     dom = c['domain']
900
-     sub = 'TRUE' if dom.startswith('.') else 'FALSE'
901
-     sec = 'TRUE' if c.get('secure') else 'FALSE'
902
-     exp = int(c.get('expires', 0)) if c.get('expires', -1) > 0 else 0
903
-     print(f\"{dom}\t{sub}\t{c['path']}\t{sec}\t{exp}\t{c['name']}\t{c['value']}\")
904
- " > cookies.txt
905
-
906
- # Use with curl
907
- curl -b cookies.txt https://app.example.com/api/data | jq .
908
-
909
- # Use with wget
910
- wget --load-cookies=cookies.txt -O data.json https://app.example.com/api/data
911
-
912
- # Use with yt-dlp
913
- yt-dlp --cookies cookies.txt https://app.example.com/video/123
914
- ```
915
-
916
- ---
917
-
918
- ## Further Reading
919
-
920
- | Resource | Type | Why Recommended |
921
- |----------|------|-----------------|
922
- | [Playwright CLI docs](https://playwright.dev/docs/cli) | Official Docs | Authoritative reference for all CLI commands and flags |
923
- | [Playwright Auth docs](https://playwright.dev/docs/auth) | Official Docs | Comprehensive guide to storageState, setup projects, session reuse |
924
- | [Playwright MCP on GitHub](https://github.com/microsoft/playwright-mcp) | Official Repo | Complete tool list, config options, Chrome extension setup |
925
- | [browser-use on GitHub](https://github.com/browser-use/browser-use) | Official Repo | Agent API, CLI reference, custom tools, production deployment |
926
- | [Chrome DevTools Protocol](https://chromedevtools.github.io/devtools-protocol/) | Official Spec | Complete CDP domain/method reference |
927
- | [chrome-remote-interface](https://github.com/cyrus-and/chrome-remote-interface) | Library | Node.js CDP wrapper with CLI REPL |
928
- | [Steel Browser](https://github.com/steel-dev/steel-browser) | Open Source | REST API browser service, session management |
929
- | [Browserless](https://github.com/browserless/browserless) | Open Source | Docker REST browser service |
930
- | [yt-dlp cookies guide](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp) | Guide | Netscape cookie format, browser extension recommendations |
931
- | [puppeteer-extra-stealth](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth) | Plugin | 20+ bot-detection patches for Puppeteer |
932
-
933
- ---
934
-
935
- *Generated by /learn from 32 sources.*
936
- *See `resources/cli-browser-automation-agents-sources.json` for full source metadata.*