agentmb 0.1.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. package/README.md +1002 -153
  2. package/dist/browser/actions.d.ts +25 -2
  3. package/dist/browser/actions.d.ts.map +1 -1
  4. package/dist/browser/actions.js +122 -22
  5. package/dist/browser/actions.js.map +1 -1
  6. package/dist/browser/manager.d.ts +35 -0
  7. package/dist/browser/manager.d.ts.map +1 -1
  8. package/dist/browser/manager.js +244 -16
  9. package/dist/browser/manager.js.map +1 -1
  10. package/dist/cli/commands/actions.d.ts.map +1 -1
  11. package/dist/cli/commands/actions.js +342 -80
  12. package/dist/cli/commands/actions.js.map +1 -1
  13. package/dist/cli/commands/browser-launch.d.ts +7 -0
  14. package/dist/cli/commands/browser-launch.d.ts.map +1 -0
  15. package/dist/cli/commands/browser-launch.js +116 -0
  16. package/dist/cli/commands/browser-launch.js.map +1 -0
  17. package/dist/cli/commands/session.d.ts.map +1 -1
  18. package/dist/cli/commands/session.js +76 -4
  19. package/dist/cli/commands/session.js.map +1 -1
  20. package/dist/cli/index.js +3 -1
  21. package/dist/cli/index.js.map +1 -1
  22. package/dist/daemon/index.js +2 -2
  23. package/dist/daemon/index.js.map +1 -1
  24. package/dist/daemon/routes/actions.d.ts.map +1 -1
  25. package/dist/daemon/routes/actions.js +516 -78
  26. package/dist/daemon/routes/actions.js.map +1 -1
  27. package/dist/daemon/routes/interaction.d.ts.map +1 -1
  28. package/dist/daemon/routes/interaction.js +10 -1
  29. package/dist/daemon/routes/interaction.js.map +1 -1
  30. package/dist/daemon/routes/sessions.d.ts.map +1 -1
  31. package/dist/daemon/routes/sessions.js +314 -3
  32. package/dist/daemon/routes/sessions.js.map +1 -1
  33. package/dist/daemon/routes/state.d.ts.map +1 -1
  34. package/dist/daemon/routes/state.js +26 -0
  35. package/dist/daemon/routes/state.js.map +1 -1
  36. package/dist/daemon/server.js +1 -1
  37. package/dist/daemon/session.d.ts +19 -0
  38. package/dist/daemon/session.d.ts.map +1 -1
  39. package/dist/daemon/session.js +13 -0
  40. package/dist/daemon/session.js.map +1 -1
  41. package/dist/policy/types.d.ts.map +1 -1
  42. package/dist/policy/types.js +14 -12
  43. package/dist/policy/types.js.map +1 -1
  44. package/package.json +4 -2
  45. package/skills/agentmb/SKILL.md +541 -0
  46. package/skills/agentmb/references/authentication.md +180 -0
  47. package/skills/agentmb/references/browser-modes.md +167 -0
  48. package/skills/agentmb/references/commands.md +231 -0
  49. package/skills/agentmb/references/locator-modes.md +254 -0
  50. package/skills/agentmb/references/session-management.md +260 -0
package/README.md CHANGED
@@ -2,30 +2,26 @@
2
2
 
3
3
  Agent-ready local browser runtime for stable, auditable web automation.
4
4
 
5
- ## What It Does
6
-
7
- `agent-managed-browser` provides a persistent Chromium daemon with session management, CLI/Python SDK access, and human login handoff support. It is designed for coding/ops agents that need reproducible browser workflows instead of fragile one-off scripts.
5
+ > **For AI agents**: Load [`skills/agentmb/SKILL.md`](./skills/agentmb/SKILL.md) into your context before working with agentmb. It covers the core workflow, locator mode decision guide, essential commands, and common patterns in ~300 lines — no need to read this full README first. Deep references are in [`skills/agentmb/references/`](./skills/agentmb/references/).
8
6
 
9
- ## Use Cases
7
+ ## What It Does
10
8
 
11
- - **Agent web tasks**: Let Codex/Claude run navigation, click/fill, extraction, screenshot, and evaluation in a controlled runtime.
12
- - **Human-in-the-loop login**: Switch to headed mode for manual login, then return to headless automation with the same profile.
13
- - **E2E and CI verification**: Run isolated smoke/auth/handoff/cdp checks with configurable port and data dir.
14
- - **Local automation service**: Keep one daemon running and let multiple tools/agents reuse sessions safely.
9
+ `agent-managed-browser` runs a persistent **Chromium stable** browser daemon (via Playwright's bundled Chromium stable channel) with session management, structured audit logs, multi-modal element targeting, and human login handoff. It exposes a REST API, a CLI, and a Python SDK.
15
10
 
16
- Local Chromium runtime for AI agents, with:
11
+ The browser engine is Chromium (Chrome-compatible). Firefox and WebKit are not supported. Node.js 20 LTS is the runtime baseline.
17
12
 
18
- - daemon API (`agentmb`)
19
- - CLI (`agentmb`)
20
- - Python SDK (`agentmb`)
13
+ Designed for coding and ops agents that need reproducible, inspectable browser workflows rather than fragile one-off scripts.
21
14
 
22
- This repo supports macOS, Linux, and Windows.
15
+ ## Use Cases
23
16
 
24
- ## Agent Skill
17
+ - **Agent web tasks**: navigate, click, fill, extract, screenshot, evaluate JavaScript, all via API or SDK.
18
+ - **Human-in-the-loop login**: switch to headed mode for manual login, then return to headless automation with the same profile and cookies intact.
19
+ - **E2E and CI verification**: run isolated smoke/auth/CDP/policy checks with configurable port and data dir.
20
+ - **Local automation service**: one daemon, multiple sessions, multiple agents reusing sessions safely.
25
21
 
26
- For Codex/Claude/AgentMB operation guidance (initialization, core commands, troubleshooting), see:
22
+ Supports macOS, Linux, and Windows.
27
23
 
28
- - [agentmb-operations-skill/SKILL.md](./agentmb-operations-skill/SKILL.md)
24
+ ---
29
25
 
30
26
  ## Quick Start
31
27
 
@@ -45,9 +41,22 @@ npm link
45
41
  agentmb start
46
42
  ```
47
43
 
48
- ## Install from npm / pip
44
+ In another terminal:
49
45
 
50
- ### macOS / Linux
46
+ ```bash
47
+ agentmb status
48
+ agentmb session new --profile demo
49
+ agentmb session list
50
+ agentmb navigate <session-id> https://example.com
51
+ agentmb screenshot <session-id> -o ./shot.png
52
+ agentmb stop
53
+ ```
54
+
55
+ ---
56
+
57
+ ## Install
58
+
59
+ ### npm + pip (macOS / Linux)
51
60
 
52
61
  ```bash
53
62
  npm i -g agentmb
@@ -56,239 +65,936 @@ agentmb --help
56
65
  python3 -c "import agentmb; print(agentmb.__version__)"
57
66
  ```
58
67
 
59
- ### Windows (PowerShell)
68
+ ### npm + pip (Windows PowerShell)
60
69
 
61
70
  ```powershell
62
71
  npm i -g agentmb
63
72
  py -m pip install --user agentmb
64
73
  agentmb --help
65
- py -c "import agentmb; print(agentmb.__version__)"
66
74
  ```
67
75
 
68
76
  Package roles:
69
- - npm package: CLI + daemon runtime
70
- - pip package: Python SDK client
77
+ - `npm` package: CLI + daemon runtime (Chromium via Playwright)
78
+ - `pip` package: Python SDK client (httpx + pydantic v2)
71
79
 
72
- In another terminal:
73
-
74
- ```bash
75
- agentmb status
76
- agentmb session new --profile demo
77
- agentmb session list
78
- agentmb navigate <session-id> https://example.com
79
- agentmb screenshot <session-id> -o ./shot.png
80
- agentmb stop
81
- ```
80
+ ---
82
81
 
83
82
  ## Python SDK
84
83
 
85
84
  ```bash
86
85
  python3 -m pip install -e sdk/python
87
- python3 -c "from agentmb import BrowserClient; print('SDK OK')"
88
86
  ```
89
87
 
90
- ## Install By Platform
88
+ ```python
89
+ from agentmb import BrowserClient
90
+
91
+ with BrowserClient(base_url="http://127.0.0.1:19315") as client:
92
+     sess = client.sessions.create(headless=True, profile="demo")
93
+     sess.navigate("https://example.com")
94
+     res = sess.screenshot()
95
+     res.save("shot.png")
96
+     sess.close()
97
+ ```
98
+
99
+ ---
91
100
 
92
- For full installation steps on all environments:
101
+ ## Locator Models
93
102
 
94
- - macOS
95
- - Linux (Ubuntu / Debian)
96
- - Windows (PowerShell / WSL2)
103
+ Three targeting modes based on page stability and replay requirements.
97
104
 
98
- See [INSTALL.md](./INSTALL.md).
105
+ ### 1) Selector Mode
99
106
 
100
- ## Action Reference
107
+ Plain CSS selectors passed directly.
101
108
 
102
- | Action | CLI command | Description |
103
- |---|---|---|
104
- | navigate | `agentmb navigate <sess> <url>` | Navigate to URL |
105
- | screenshot | `agentmb screenshot <sess> -o out.png` | Capture screenshot |
106
- | eval | `agentmb eval <sess> <expr>` | Run JavaScript expression |
107
- | extract | `agentmb extract <sess> <selector>` | Extract text/attributes |
108
- | click | `agentmb click <sess> <selector>` | Click element |
109
- | fill | `agentmb fill <sess> <selector> <value>` | Fill form field |
110
- | type | `agentmb type <sess> <selector> <text>` | Type char-by-char |
111
- | press | `agentmb press <sess> <selector> <key>` | Press key / combo (e.g. `Enter`, `Control+a`) |
112
- | select | `agentmb select <sess> <selector> <val>` | Select `<option>` in a `<select>` |
113
- | hover | `agentmb hover <sess> <selector>` | Hover over element |
114
- | wait-selector | `agentmb wait-selector <sess> <selector>` | Wait for element state |
115
- | wait-url | `agentmb wait-url <sess> <pattern>` | Wait for URL pattern |
116
- | upload | `agentmb upload <sess> <selector> <file>` | Upload local file to file input |
117
- | download | `agentmb download <sess> <selector> -o out` | Click link and save download |
118
- | element-map | `agentmb element-map <sess>` | Scan page, label interactive elements with stable IDs |
119
- | get | `agentmb get <sess> <property> <selector>` | Read text/html/value/attr/count/box from element |
120
- | assert | `agentmb assert <sess> <property> <selector>` | Assert visible/enabled/checked state |
121
- | wait-stable | `agentmb wait-stable <sess>` | Wait for network idle + DOM quiet + overlay gone |
122
-
123
- Actions that accept `<selector>` also accept `--element-id <eid>` (from `element-map`) as an alternative stable locator. Both remain backward-compatible.
124
-
125
- ### Element Map
126
-
127
- ```bash
128
- # Scan the page and label all interactive elements
109
+ ```bash
110
+ agentmb click <session-id> "#submit"
111
+ agentmb fill <session-id> "#email" "name@example.com"
112
+ agentmb get <session-id> text "#title"
113
+ ```
114
+
115
+ Best for: stable pages where selectors are reliable.
116
+
117
+ ### 2) Element-ID Mode (`element-map`)
118
+
119
+ Step 1: scan the page, get stable `element_id` values.
120
+
121
+ ```bash
129
122
  agentmb element-map <session-id>
130
- # table: element_id | tag | role | text | rect
123
+ agentmb element-map <session-id> --include-unlabeled # also surface icon-only elements
124
+ ```
131
125
 
132
- # Use element_id in subsequent actions (no selector drift)
126
+ Step 2: pass the ID to any action.
127
+
128
+ ```bash
133
129
  agentmb click <session-id> e3 --element-id
134
- agentmb fill <session-id> e5 "hello" --element-id
130
+ agentmb fill <session-id> e5 "hello" --element-id
131
+ agentmb get <session-id> text e3 --element-id
132
+ agentmb assert <session-id> visible e3 --element-id
133
+ ```
134
+
135
+ `label` field per element is synthesized using a 7-level priority chain:
136
+
137
+ | Priority | Source | `label_source` value |
138
+ |---|---|---|
139
+ | 1 | `aria-label` attribute | `"aria-label"` |
140
+ | 2 | `title` attribute | `"title"` |
141
+ | 3 | `aria-labelledby` target text | `"aria-labelledby"` |
142
+ | 4 | SVG `<title>` / `<desc>` | `"svg-title"` |
143
+ | 5 | `innerText` (trimmed) | `"text"` |
144
+ | 6 | `placeholder` attribute | `"placeholder"` |
145
+ | 7 | Fallback (icon-only) | `"none"` / `"[tag @ x,y]"` |
135
146
 
136
- # Read element properties
137
- agentmb get <session-id> text --element-id e3
138
- agentmb get <session-id> value --element-id e5
139
- agentmb get <session-id> count .item-class
140
- agentmb get <session-id> attr "#logo" --attr-name src
147
+ Icon-only elements get `label_source="none"` by default; `--include-unlabeled` adds a `[tag @ x,y]` coordinate fallback.
141
148
 
142
- # Assert element state
143
- agentmb assert <session-id> visible --element-id e3
144
- agentmb assert <session-id> enabled "#submit" --expected true
145
- agentmb assert <session-id> checked "#agree" --expected false
149
+ Best for: selector drift, dynamic class names, and icon-heavy SPAs.
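
The 7-level priority chain in the table above reads as a first-match lookup. A minimal Python sketch of that reading, not taken from the package: the function `synthesize_label` and the dict keys (`aria_label`, `inner_text`, and so on) are hypothetical, and keeping `label_source` as `"none"` for the coordinate fallback is an assumption.

```python
from typing import Tuple

# Priority order copied from the table: (hypothetical attribute key, label_source value).
LABEL_PRIORITY = [
    ("aria_label", "aria-label"),
    ("title", "title"),
    ("aria_labelledby_text", "aria-labelledby"),
    ("svg_title", "svg-title"),
    ("inner_text", "text"),
    ("placeholder", "placeholder"),
]


def synthesize_label(el: dict, include_unlabeled: bool = False) -> Tuple[str, str]:
    """Return (label, label_source) using the first non-empty source."""
    for key, source in LABEL_PRIORITY:
        value = (el.get(key) or "").strip()
        if value:
            return value, source
    if include_unlabeled:
        # Assumption: the coordinate fallback keeps label_source "none".
        label = f"[{el.get('tag', '?')} @ {el.get('x', 0)},{el.get('y', 0)}]"
        return label, "none"
    return "", "none"
```

With this reading, an element that has both a `title` and visible text is labeled by its `title`, because `title` ranks above `innerText`.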
146
150
 
147
- # Wait for page to be fully stable (network idle + DOM quiet + overlays gone)
148
- agentmb wait-stable <session-id> --timeout-ms 10000 --dom-stable-ms 300
149
- agentmb wait-stable <session-id> --overlay-selector "#loading-overlay"
151
+ ### 3) Snapshot-Ref Mode (`snapshot-map` + `ref_id`)
152
+
153
+ Step 1: create a server-side snapshot.
154
+
155
+ ```bash
156
+ agentmb snapshot-map <session-id>
157
+ agentmb snapshot-map <session-id> --include-unlabeled
150
158
  ```
151
159
 
152
- ### Element Map (Python SDK)
160
+ Step 2: use the returned `ref_id` (`snap_XXXXXX:eN`) in API/SDK calls.
161
+
162
+ - `page_rev` is an integer counter returned with each snapshot; it increments on every main-frame navigation. Poll it directly to detect page changes without taking a full snapshot:
163
+
164
+ ```http
165
+ GET /api/v1/sessions/:id/page_rev
166
+ → { "status": "ok", "session_id": "...", "page_rev": 3, "url": "https://..." }
167
+ ```
153
168
 
154
169
  ```python
155
- with client.create_session(headless=True) as sess:
156
- sess.navigate("https://example.com")
170
+ rev = sess.page_rev() # PageRevResult with .page_rev, .url
171
+ ```
172
+
173
+ - If the page has navigated since the snapshot, using a stale `ref_id` returns `409 stale_ref` with a structured payload:
174
+
175
+ ```json
176
+ {
177
+   "error": "stale_ref: page changed",
178
+   "suggestions": ["call snapshot_map to get fresh ref_ids", "re-run your step with the new ref_id"]
179
+ }
180
+ ```
181
+
182
+ - Recovery: call `snapshot-map` again, retry with new `ref_id`.
183
+
184
+ Best for: deterministic replay and safe automation on changing pages.
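
The recovery rule above (take a new snapshot, retry with the new `ref_id`) can be sketched as a small retry loop. This is an illustration under stated assumptions, not the SDK's API: `StaleRefError` is a hypothetical stand-in for however a client surfaces `409 stale_ref`, and `take_snapshot` / `act` are caller-supplied callables.

```python
from typing import Callable


class StaleRefError(Exception):
    """Hypothetical stand-in for a 409 stale_ref response."""


def act_with_fresh_ref(
    take_snapshot: Callable[[], str],
    act: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Re-snapshot and retry while the action reports a stale ref_id."""
    last_error = None
    for _ in range(max_attempts):
        ref_id = take_snapshot()  # fresh ref_id for the current page
        try:
            return act(ref_id)
        except StaleRefError as exc:  # page navigated between snapshot and action
            last_error = exc
    raise RuntimeError(f"still stale after {max_attempts} attempts") from last_error
```

Bounding the attempts matters on pages that navigate continuously, where every snapshot could be stale by the time the action runs.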
185
+
186
+ ### Mode Selection Guide
187
+
188
+ | Page Type | Recommended Mode |
189
+ |---|---|
190
+ | Text-rich pages (docs, GitHub, HN) | `element-map` + `--element-id` |
191
+ | Icon/SVG-dense SPAs (social apps, dashboards) | CSS selector or `--include-unlabeled` |
192
+ | `contenteditable` / custom components | `eval getBoundingClientRect` + `click-at` |
193
+ | Image feeds (Unsplash, Pinterest) | `snapshot-map` (images have `alt` text) |
194
+
195
+ | Action | Approach |
196
+ |---|---|
197
+ | Search / navigation | Construct the URL directly |
198
+ | Click a labeled button | `element-map` eid or CSS selector |
199
+ | Click `contenteditable` | `click-at <sess> <x> <y>` (get coords via `bbox`) |
200
+ | Scroll SPA content area | Check `scrolled` + `scrollable_hint` in response; use `eval el.scrollBy()` if needed |
201
+ | File upload from disk | `upload <sess> <selector> <file>` (MIME inferred from extension) |
202
+ | File upload from URL | API: `POST /sessions/:id/upload_url` |
203
+ | Click JS-signed links | `click-at` to trigger a real click event |
204
+
205
+ ---
206
+
207
+ ## Action Reference
208
+
209
+ Use `agentmb --help` and `agentmb <command> --help` for full flags.
210
+
211
+ ### Navigation
212
+
213
+ | Command | Notes |
214
+ |---|---|
215
+ | `agentmb navigate <sess> <url>` | Navigate; `--wait-until load\|networkidle\|commit` |
216
+ | `agentmb back <sess>` / `forward <sess>` / `reload <sess>` | Browser history |
217
+ | `agentmb wait-url <sess> <pattern>` | Wait for URL match |
218
+ | `agentmb wait-load-state <sess>` | Wait for load state |
219
+ | `agentmb wait-function <sess> <expr>` | Wait for JS condition |
220
+ | `agentmb wait-text <sess> <text>` | Wait for text to appear |
221
+ | `agentmb wait-stable <sess>` | Network idle + DOM quiet + optional overlay clear |
222
+
223
+ ### Locator / Read / Assert
224
+
225
+ | Command | Notes |
226
+ |---|---|
227
+ | `agentmb element-map <sess>` | Scan; inject `element_id`; return `label` + `label_source` |
228
+ | `agentmb element-map <sess> --include-unlabeled` | Include icon-only elements; fallback label = `[tag @ x,y]` |
229
+ | `agentmb snapshot-map <sess>` | Server snapshot with `page_rev`; returns `ref_id` per element |
230
+ | `agentmb get <sess> <property> <selector-or-eid>` | Read `text/html/value/attr/count/box` |
231
+ | `agentmb assert <sess> <property> <selector-or-eid>` | Assert `visible/enabled/checked` |
232
+ | `agentmb extract <sess> <selector>` | Extract text/attributes as list |
233
+
234
+ `selector-or-eid` accepts a CSS selector, `--element-id` (element-map), or `--ref-id` (snapshot-map) on all commands.
235
+
236
+ ### Element Interaction
237
+
238
+ | Command | Notes |
239
+ |---|---|
240
+ | `agentmb click <sess> <selector-or-eid>` | Click; `contenteditable` supported; returns `422` with diagnostics + `recovery_hint` on failure |
241
+ | `agentmb dblclick <sess> <selector-or-eid>` | Double-click |
242
+ | `agentmb fill <sess> <selector-or-eid> <value>` | Fast fill (replaces value) |
243
+ | `agentmb type <sess> <selector-or-eid> <text>` | Type character by character; `--delay-ms <ms>` |
244
+ | `agentmb press <sess> <selector-or-eid> <key>` | Key / combo (`Enter`, `Tab`, `Control+a`) |
245
+ | `agentmb select <sess> <selector> <value...>` | Select `<option>` in `<select>` |
246
+ | `agentmb hover <sess> <selector-or-eid>` | Hover |
247
+ | `agentmb focus <sess> <selector-or-eid>` | Focus |
248
+ | `agentmb check <sess> <selector-or-eid>` / `uncheck` | Checkbox / radio |
249
+ | `agentmb drag <sess> <source> <target>` | Drag-and-drop; also accepts `--source-ref-id` / `--target-ref-id` |
250
+
251
+ **API/SDK — click advanced options:**
252
+
253
+ ```python
254
+ # executor: 'strict' (default) or 'auto_fallback'
255
+ # auto_fallback: tries Playwright click; if it times out due to overlay/intercept,
256
+ # falls back to page.mouse.click(center_x, center_y).
257
+ # When clicking inside an <iframe>, auto_fallback automatically adds the frame's
258
+ # page-level offset so coordinates land correctly.
259
+ # Response includes executed_via: 'high_level' | 'low_level'
260
+ sess.click(selector="#btn", executor="auto_fallback", timeout_ms=3000)
261
+
262
+ # stability: optional pre/post waits to handle animated UIs
263
+ sess.click(selector="#btn", stability={
264
+     "wait_before_ms": 200,      # pause before the action
265
+     "wait_after_ms": 100,       # pause after the action
266
+     "wait_dom_stable_ms": 500,  # wait for DOM readyState before acting
267
+ })
268
+ ```
269
+
270
+ **API/SDK — fill humanization:**
271
+
272
+ ```python
273
+ # fill_strategy='type': types character-by-character (slower, more human-like)
274
+ # char_delay_ms: delay between keystrokes in ms (used with fill_strategy='type')
275
+ sess.fill(selector="#inp", value="hello", fill_strategy="type", char_delay_ms=30)
276
+ ```
277
+
278
+ ### Scroll and Feed
279
+
280
+ | Command | Notes |
281
+ |---|---|
282
+ | `agentmb scroll <sess> <selector-or-eid>` | Scroll element; structured response (see below) |
283
+ | `agentmb scroll-into-view <sess> <selector-or-eid>` | Scroll element into viewport |
284
+ | `agentmb scroll-until <sess>` | Scroll until stop condition (`--stop-selector`, `--stop-text`, `--max-scrolls`) |
285
+ | `agentmb load-more-until <sess> <btn-selector> <item-selector>` | Repeatedly click load-more |
286
+
287
+ **`scroll` response fields:**
288
+
289
+ ```json
290
+ {
291
+   "scrolled": true,
292
+   "warning": "element not scrollable — scrolled nearest scrollable ancestor",
293
+   "scrollable_hint": [
294
+     { "selector": "#feed", "tag": "div", "scrollHeight": 4200, "clientHeight": 600 },
295
+     ...
296
+   ]
297
+ }
298
+ ```
299
+
300
+ - `scrolled` — `true` if any scroll movement occurred
301
+ - `warning` — present when the target element itself is not scrollable and a fallback was used
302
+ - `scrollable_hint` — top-5 scrollable descendants ranked by `scrollHeight`; use these selectors in subsequent `scroll` calls when `scrolled=false`
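
The three fields above suggest a simple fallback: keep the current target if it scrolled, otherwise retry with the tallest hinted container. A minimal sketch over the response dict, assuming the field names shown in the JSON example; `next_scroll_target` is a hypothetical helper, not part of the SDK.

```python
from typing import Optional


def next_scroll_target(response: dict, current_selector: str) -> Optional[str]:
    """Pick the selector to use for the next scroll call."""
    if response.get("scrolled"):
        return current_selector  # the current target moved; keep using it
    hints = response.get("scrollable_hint") or []
    # Re-rank defensively by scrollHeight, tallest first.
    ranked = sorted(hints, key=lambda h: h.get("scrollHeight", 0), reverse=True)
    for hint in ranked:
        selector = hint.get("selector")
        if selector and selector != current_selector:
            return selector
    return None  # nothing scrollable was suggested
```

Returning `None` lets the caller decide whether to fall back to `eval` with `el.scrollBy()` or to stop.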
303
+
304
+ **`scroll_until` / `load_more_until` response** includes `session_id` for chaining:
305
+
306
+ ```json
307
+ { "status": "ok", "session_id": "sess_...", "scrolls": 12, "stop_reason": "stop_text_found" }
308
+ ```
309
+
310
+ **API/SDK — scroll_until with step_delay:**
311
+
312
+ ```python
313
+ # step_delay_ms: wait between each scroll step (default = stall_ms)
314
+ sess.scroll_until(scroll_selector="#feed", direction="down",
315
+                   stop_selector=".end", max_scrolls=20, step_delay_ms=150)
316
+ ```
317
+
318
+ ### Coordinate and Low-Level Input
319
+
320
+ | Command | Notes |
321
+ |---|---|
322
+ | `agentmb click-at <sess> <x> <y>` | Click absolute page coordinates |
323
+ | `agentmb wheel <sess> --dx --dy` | Low-level wheel event |
324
+ | `agentmb insert-text <sess> <text>` | Insert text into focused element (no keyboard simulation) |
325
+ | `agentmb bbox <sess> <selector-or-eid>` | Bounding box + center coordinates; accepts `--element-id` / `--ref-id` |
326
+ | `agentmb mouse-move <sess> [x] [y]` | Move mouse to absolute coordinates; or use `--selector`/`--element-id`/`--ref-id` to resolve element center |
327
+ | `agentmb mouse-down <sess>` / `mouse-up <sess>` | Mouse button press / release |
328
+ | `agentmb key-down <sess> <key>` / `key-up <sess> <key>` | Raw key press / release |
329
+
330
+ **API/SDK — smooth mouse movement:**
331
+
332
+ ```python
333
+ # Move by absolute coordinates with smooth interpolation
334
+ res = sess.mouse_move(x=400, y=300, steps=10)
157
335
 
158
- # Scan page elements
159
- result = sess.element_map()
160
- for el in result.elements:
161
- print(el.element_id, el.tag, el.role, el.text)
336
+ # Move to an element center by selector / element_id / ref_id (x/y resolved server-side)
337
+ res = sess.mouse_move(selector="#submit-btn", steps=5)
338
+ res = sess.mouse_move(element_id="e3", steps=5)
339
+ res = sess.mouse_move(ref_id="snap_000001:e3")
162
340
 
163
- # Click by element_id
164
- btn = next(e for e in result.elements if e.role == "button")
165
- sess.click(element_id=btn.element_id)
341
+ # Response includes x, y, steps fields
342
+ print(res.x, res.y, res.steps)
343
+ ```
344
+
345
+ CLI equivalents:
346
+ ```bash
347
+ agentmb mouse-move <sess> 400 300 --steps 10
348
+ agentmb mouse-move <sess> --selector "#btn" --steps 5
349
+ agentmb mouse-move <sess> --element-id e3
350
+ agentmb mouse-move <sess> --ref-id snap_000001:e3
351
+ ```
352
+
353
+ ### Semantic Find
354
+
355
+ Locate elements by Playwright semantic locators without knowing CSS selectors.
356
+
357
+ ```bash
358
+ # CLI
359
+ agentmb find <sess> role "button" --name "Submit"
360
+ agentmb find <sess> text "Sign in" --exact
361
+ agentmb find <sess> placeholder "Search…"
362
+ agentmb find <sess> label "Email address"
363
+ agentmb find <sess> alt_text "Product photo" --nth 2
364
+ ```
365
+
366
+ ```python
367
+ # query_type: 'role' | 'text' | 'label' | 'placeholder' | 'alt_text'
368
+ # Returns: found (bool), count, tag, text, bbox, nth
369
+ res = sess.find(query_type="role", query="button", name="Submit")
370
+ res = sess.find(query_type="text", query="Sign in", exact=True)
371
+ res = sess.find(query_type="placeholder", query="Search…")
372
+ res = sess.find(query_type="label", query="Email address")
373
+ res = sess.find(query_type="alt_text", query="Product photo", nth=2)
374
+ ```
375
+
376
+ | `query_type` | Playwright call |
377
+ |---|---|
378
+ | `role` | `page.getByRole(query, { name, exact })` |
379
+ | `text` | `page.getByText(query, { exact })` |
380
+ | `label` | `page.getByLabel(query, { exact })` |
381
+ | `placeholder` | `page.getByPlaceholder(query, { exact })` |
382
+ | `alt_text` | `page.getByAltText(query, { exact })` |
166
383
 
167
- # Read / assert
168
- text = sess.get("text", element_id=btn.element_id)
169
- check = sess.assert_state("visible", selector="#main", expected=True)
170
- print(check.passed)
384
+ Returns `FindResult` with `found`, `count`, `nth`, `tag`, `text`, `bbox`.
385
+
386
+ ### Batch Execution — run_steps (API / SDK)
387
+
388
+ Execute a sequence of actions in a single request. Supports `stop_on_error`.
389
+
390
+ Each step's `params` accepts `selector`, `element_id`, or `ref_id` interchangeably for element targeting:
391
+
392
+ ```python
393
+ # First, take a snapshot to get ref_ids
394
+ snap = sess.snapshot_map()
395
+ btn_ref = next(e.ref_id for e in snap.elements if "Login" in (e.label or ""))
396
+
397
+ result = sess.run_steps([
398
+     {"action": "navigate", "params": {"url": "https://example.com"}},
399
+     {"action": "click", "params": {"ref_id": btn_ref}},  # ref_id from snapshot
400
+     {"action": "fill", "params": {"element_id": "e5", "value": "user@example.com"}},  # element_id
401
+     {"action": "fill", "params": {"selector": "#pass", "value": "secret"}},  # CSS selector
402
+     {"action": "press", "params": {"selector": "#pass", "key": "Enter"}},
403
+     {"action": "wait_for_selector", "params": {"selector": ".dashboard"}},
404
+     {"action": "screenshot", "params": {"format": "png"}},
405
+ ], stop_on_error=True)
406
+
407
+ print(result.status) # 'ok' | 'partial' | 'failed'
408
+ print(result.completed_steps) # number of steps that succeeded
409
+ for step in result.results:
410
+     print(step.step, step.action, step.error)
411
+ ```
412
+
413
+ - A stale `ref_id` (page navigated since snapshot) returns a step-level error, not a request crash. Use `stop_on_error=False` to continue remaining steps.
414
+ - Supported actions: `navigate`, `click`, `fill`, `type`, `press`, `hover`, `scroll`, `wait_for_selector`, `wait_text`, `screenshot`, `eval`. Max 100 steps per call.
415
+
416
+ ### File Transfer
417
+
418
+ | Command | Notes |
419
+ |---|---|
420
+ | `agentmb upload <sess> <selector> <file>` | Upload file from disk; MIME auto-inferred from extension (`--mime-type` to override) |
421
+ | `agentmb download <sess> <selector-or-eid> -o <file>` | Trigger download; accepts `--element-id` / `--ref-id`; requires `--accept-downloads` on session |
422
+
423
+ **download guard**: sessions created without `accept_downloads=True` return `422 download_not_enabled`:
424
+
425
+ ```python
426
+ # Correct — enable at session creation time
427
+ sess = client.sessions.create(accept_downloads=True)
428
+ sess.download(selector="#dl-link", output_path="./file.pdf")
171
429
 
172
- # Stability gate before next scan
173
- sess.wait_page_stable(timeout_ms=8000, overlay_selector="#spinner")
430
+ # download also accepts element_id / ref_id
431
+ sess.download(element_id="e7", output_path="./file.pdf")
432
+ sess.download(ref_id="snap_000001:e7", output_path="./file.pdf")
433
+ ```
434
+
435
+ ```bash
436
+ agentmb session new --accept-downloads
437
+ agentmb download <sess> "#dl-link" -o file.pdf
438
+ agentmb download <sess> e7 --element-id -o file.pdf
439
+ ```
440
+
441
+ **CLI + API/SDK — upload from URL:**
442
+
443
+ ```bash
444
+ agentmb upload-url <sess> https://example.com/assets/photo.jpg "#file-input"
445
+ # optional: --filename photo.jpg --mime-type image/jpeg
446
+ ```
447
+
448
+ ```python
449
+ # Fetches the URL server-side (Node fetch), writes to temp file, uploads to file input.
450
+ res = sess.upload_url(
451
+     url="https://example.com/assets/photo.jpg",
452
+     selector="#file-input",
453
+     filename="photo.jpg",    # optional; defaults to last URL path segment
454
+     mime_type="image/jpeg",  # optional; defaults to application/octet-stream
455
+ )
456
+ # res.size_bytes, res.fetched_bytes, res.filename
457
+ ```
458
+
459
+ ### Session State (Cookie / Storage)
460
+
461
+ | Command | Notes |
462
+ |---|---|
463
+ | `agentmb cookie-list <sess>` | List all cookies |
464
+ | `agentmb cookie-clear <sess>` | Clear all cookies |
465
+ | `agentmb cookie-delete <sess> <name>` | Delete a specific cookie by name (optionally `--domain .example.com`) |
466
+ | `agentmb storage-export <sess> -o state.json` | Export Playwright storageState (cookies + origins) |
467
+ | `agentmb storage-import <sess> state.json` | Restore cookies from storageState; `origins_skipped` count returned |
468
+
469
+ **API/SDK — delete cookie by name:**
470
+
471
+ ```python
472
+ # Removes matching cookies, preserves the rest. domain is optional filter.
473
+ res = sess.delete_cookie("session_token")
474
+ res = sess.delete_cookie("tracker", domain=".example.com")
475
+ # res.removed, res.remaining
476
+ ```
477
+
478
+ ### Observability and Debug
479
+
480
+ | Command | Notes |
481
+ |---|---|
482
+ | `agentmb screenshot <sess> -o out.png` | Screenshot; `--full-page`, `--format png\|jpeg` |
483
+ | `agentmb annotated-screenshot <sess> --highlight <sel>` | Screenshot with colored element overlays |
484
+ | `agentmb eval <sess> <expr>` | Evaluate JavaScript; returns raw result |
485
+ | `agentmb console-log <sess>` | Browser console entries; `--tail N` |
486
+ | `agentmb page-errors <sess>` | Uncaught JS errors from the page |
487
+ | `agentmb dialogs <sess>` | Auto-dismissed dialog history (alert/confirm/prompt) |
488
+ | `agentmb logs <sess>` | Session audit log tail (all actions, policy events, CDP calls) |
489
+ | `agentmb trace start <sess>` / `trace stop <sess> -o trace.zip` | Playwright trace capture |
490
+
491
+ ### Browser Environment and Controls
492
+
493
+ | Command | Notes |
494
+ |---|---|
495
+ | `agentmb set-viewport <sess> <w> <h>` | Resize viewport |
496
+ | `agentmb clipboard-write <sess> <text>` / `clipboard-read <sess>` | Clipboard access |
497
+ | `agentmb policy <sess> [profile]` | Get or set safety policy profile |
498
+ | `agentmb cdp-ws <sess>` | Print browser-level CDP WebSocket URL |
499
+
500
+ **CLI — browser settings:**
501
+
502
+ ```bash
503
+ agentmb settings <sess> # print viewport, UA, headless, profile, current URL
174
504
  ```
175
505
 
506
+ **API/SDK — browser settings:**
507
+
508
+ ```python
509
+ # Returns viewport, user_agent, url, headless, profile for a session.
510
+ settings = sess.get_settings()
511
+ print(settings.viewport, settings.user_agent, settings.headless)
512
+ ```
513
+
514
+ ---
515
+
176
516
  ## Multi-Page Management
177
517
 
178
518
  ```bash
179
- agentmb pages list <session-id> # list all open tabs
180
- agentmb pages new <session-id> # open a new tab
519
+ agentmb pages list <session-id> # list all open tabs
520
+ agentmb pages new <session-id> # open a new tab
181
521
  agentmb pages switch <session-id> <page-id> # make a tab the active target
182
522
  agentmb pages close <session-id> <page-id> # close a tab (last tab protected)
183
523
  ```
184
524
 
525
+ ### Direct Page Targeting (R09)
526
+
527
+ Pass `page_id` in the request body to any action route to target a specific tab **without switching** the session's active tab. All major actions support `page_id`: `navigate`, `click`, `fill`, `type`, `press`, `eval`, `screenshot`, `element_map`, `snapshot_map`, `scroll`.
528
+
529
+ ```python
530
+ # Open tabs, work on each independently — no switching required
531
+ p2 = sess.new_page() # returns page_id string
532
+ sess.navigate("https://tab2.example.com", page_id=p2)
533
+ sess.screenshot(page_id=p2)
534
+
535
+ # Concurrent multi-page work
536
+ import asyncio
537
+ async with AsyncBrowserClient() as client:
538
+     sess = await client.sessions.create(profile="work")
539
+     async with sess:
540
+         p1 = (await sess.pages())[0].page_id
541
+         p2 = await sess.new_page()
542
+         await asyncio.gather(
543
+             sess.navigate("https://site.com/a", page_id=p1),
544
+             sess.navigate("https://site.com/b", page_id=p2),
545
+         )
546
+ ```
547
+
548
+ CLI with `--page-id`:
549
+ ```bash
550
+ agentmb navigate $SID https://site.com/page2 --page-id $P2
551
+ agentmb screenshot $SID -o p2.png --page-id $P2
552
+ ```
553
+
554
+ ---
555
+
185
556
  ## Network Route Mocks
186
557
 
187
558
  ```bash
188
- agentmb route list <session-id> # list active mocks
559
+ agentmb route list <session-id>
189
560
  agentmb route add <session-id> "**/api/**" \
190
561
  --status 200 --body '{"ok":true}' \
191
- --content-type application/json # intercept requests
192
- agentmb route rm <session-id> "**/api/**" # remove a mock
562
+ --content-type application/json
563
+ agentmb route rm <session-id> "**/api/**"
193
564
  ```
194
565
 
195
- ## Playwright Trace Recording
566
+ Route mocks are applied at context level, so they persist across page navigations within the same session.
567
+
568
+ ### Regex Pattern Matching (R09)
569
+
570
+ In addition to glob patterns, route mocks accept JavaScript-style `/regex/flags` strings:
196
571
 
197
572
  ```bash
198
- agentmb trace start <session-id> # start recording
199
- # ... do actions ...
200
- agentmb trace stop <session-id> -o trace.zip # save ZIP
201
- npx playwright show-trace trace.zip # open in Playwright UI
573
+ agentmb route add <session-id> "/\/api\/.*\.json/i" \
574
+ --status 200 --body '{"mocked":true}' --content-type application/json
575
+ ```
576
+
577
+ ```python
578
+ sess.add_route(r"/\/api\/.*\.json/i", mock={"status": 200, "body": '{"mocked":true}'})
579
+ ```
580
+
581
+ The pattern is auto-detected: if it matches `/expression/flags`, it is compiled as a `RegExp` and passed to Playwright's `context.route()`. An invalid regex falls back to being treated as a glob string.
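The auto-detection rule can be illustrated with a small Python sketch. This is not the daemon's implementation (which runs in JavaScript); `parse_route_pattern` and the flag mapping are hypothetical names chosen here, and Python's `re` syntax only approximates JavaScript's.

```python
import re

# JavaScript flag letters mapped to the closest Python equivalents (assumption:
# other JS flags such as g, u, y do not matter when matching a single URL).
_JS_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}
_REGEX_LITERAL = re.compile(r"^/(.+)/([a-z]*)$", re.DOTALL)


def parse_route_pattern(pattern: str):
    """Return ("regex", compiled) for /expr/flags strings, else ("glob", pattern)."""
    m = _REGEX_LITERAL.match(pattern)
    if not m:
        return ("glob", pattern)
    body, flag_letters = m.groups()
    flags = 0
    for letter in flag_letters:
        flags |= _JS_FLAGS.get(letter, 0)
    try:
        return ("regex", re.compile(body, flags))
    except re.error:
        # Invalid regex: keep the original string and treat it as a glob.
        return ("glob", pattern)


kind, compiled = parse_route_pattern(r"/\/api\/.*\.json/i")
print(kind, bool(compiled.search("https://site.com/API/users.JSON")))
print(parse_route_pattern("**/api/**"))
print(parse_route_pattern("/(/"))
```

One consequence of a rule like this is that a glob which happens to start and end with `/` (for example `/api/v1/`) would be read as a regex, so glob patterns are safest written with a leading `**`.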
582
+
583
+ ### Network Delay Simulation
584
+
585
+ Add `delay_ms` to a mock to simulate network latency:
586
+
587
+ ```python
588
+ sess.add_route("**/slow-api/**", mock={"status": 200, "body": "ok", "delay_ms": 500})
589
+ ```
590
+
591
+ ---
592
+
593
+ ## Three Browser Running Modes
594
+
595
+ agentmb supports three distinct browser modes, differing in **which browser binary is used and how it is connected**.
596
+
597
+ | Mode | Browser | How Connected | Profile Persistence |
598
+ |---|---|---|---|
599
+ | **1. Managed Chromium** | Playwright bundled Chromium | agentmb spawns & owns | Persistent or ephemeral |
600
+ | **2. Managed Chrome Stable** | System Chrome / Edge | agentmb spawns & owns | Persistent or ephemeral |
601
+ | **3. CDP Attach** (Bold Mode) | Any running Chrome-compatible | agentmb attaches via CDP | Owned by external process |
602
+
603
+ ```
604
+ ┌─────────────────────────────────────────────────────────┐
605
+ │ agentmb daemon │
606
+ │ REST API POST /api/v1/sessions (+ preflight check) │
607
+ └───────────┬──────────────────┬──────────────┬───────────┘
608
+ │ │ │
609
+ launchPersistent() launchPersistent() connectOverCDP()
610
+ (bundled Chromium) (system Chrome/Edge) (external process)
611
+ │ │ │
612
+ ┌────────────▼────┐ ┌──────────▼────┐ ┌────▼──────────────┐
613
+ │ Mode 1 │ │ Mode 2 │ │ Mode 3 │
614
+ │ Managed │ │ Managed │ │ CDP Attach │
615
+ │ Chromium │ │ Chrome Stable │ │ (Bold Mode) │
616
+ │ │ │ / Edge │ │ launch_mode= │
617
+ │ profile=name │ │ browser_ │ │ attach │
618
+ │ or ephemeral=T │ │ channel=chrome│ │ │
619
+ └─────────────────┘ └───────────────┘ └───────────────────┘
202
620
  ```
203
621
 
204
- ## CDP WebSocket URL
622
+ ### Mode 1: Managed Chromium (default)
623
+
624
+ agentmb spawns the **Playwright-bundled Chromium** binary. No system Chrome required. Works in headless (CI) and headed modes.
625
+
626
+ Within managed modes, choose a **profile strategy**:
627
+
628
+ **Agent Workspace** — named profile; cookies, localStorage, and browser state persist across runs:
629
+
630
+ ```python
631
+ sess = client.sessions.create(profile="gmail-account")
632
+ ```
205
633
 
206
634
  ```bash
207
- agentmb cdp-ws <session-id> # print browser CDP WebSocket URL
635
+ agentmb session new --profile gmail-account
208
636
  ```
209
637
 
210
- ## Linux Headed Mode
638
+ **Pure Sandbox** (ephemeral temp directory); all data is auto-deleted on `close()`:
211
639
 
212
- Linux visual/headed mode requires Xvfb.
640
+ ```python
641
+ sess = client.sessions.create(ephemeral=True)
642
+ ```
213
643
 
214
644
  ```bash
215
- sudo apt-get install -y xvfb
216
- bash scripts/xvfb-headed.sh
645
+ agentmb session new --ephemeral
217
646
  ```
218
647
 
219
- ## Verify
648
+ ### Mode 2: Managed Chrome Stable
649
+
650
+ agentmb spawns a **system-installed Chrome or Edge** binary via Playwright. Requires Chrome Stable or Edge to be installed on the host. Both Agent Workspace and Pure Sandbox profile strategies apply.
651
+
652
+ ```python
653
+ sess = client.sessions.create(browser_channel="chrome") # system Chrome Stable
654
+ sess = client.sessions.create(browser_channel="msedge") # system Edge
655
+ sess = client.sessions.create(executable_path="/path/to/chrome") # custom binary path
656
+ ```
220
657
 
221
658
  ```bash
222
- bash scripts/verify.sh
659
+ agentmb session new --browser-channel chrome
660
+ agentmb session new --browser-channel msedge
661
+ agentmb session new --executable-path /usr/bin/chromium-browser
223
662
  ```
224
663
 
225
- ## npm Release Setup
664
+ Valid `browser_channel` values: `chromium` (Playwright bundled, default), `chrome` (system Chrome Stable), `msedge` (system Edge). `browser_channel` and `executable_path` are mutually exclusive.
665
+
666
+ ### Mode 3: CDP Attach (Bold Mode)
667
+
668
+ agentmb **attaches to an already-running Chrome** process via the Chrome DevTools Protocol. The remote browser is **not terminated** on `close()`; only the Playwright connection is dropped. This mode exposes a lower `navigator.webdriver` fingerprint than the managed modes and supports extensions.
669
+
670
+ Three profile variants are available, depending on which `--user-data-dir` Chrome is launched with:
671
+
672
+ | Variant | `--user-data-dir` | State | Typical Use |
673
+ |---|---|---|---|
674
+ | **A. Sandbox** | temp dir (auto) | ephemeral | clean-slate CI runs, throwaway sessions |
675
+ | **B. Dedicated Profile** | custom persistent dir | persistent, isolated | automation account, persistent login |
676
+ | **C. User Chrome** | your real Chrome profile | inherits all cookies & extensions | leverage personal login state |
677
+
678
+ #### Variant A: Sandbox (ephemeral temp dir)
679
+
680
+ `agentmb browser-launch` creates a fresh temp profile automatically. Clean slate — no cookies, no extensions.
226
681
 
227
682
  ```bash
228
- # login once
229
- npm login
230
- npm whoami
683
+ agentmb browser-launch --port 9222
684
+ # → launches Chrome with --user-data-dir=/tmp/agentmb-cdp-9222 (temp, ephemeral)
685
+ # → CDP URL: http://127.0.0.1:9222
686
+ ```
687
+
688
+ ```python
689
+ sess = client.sessions.create(launch_mode="attach", cdp_url="http://127.0.0.1:9222")
690
+ sess.navigate("https://example.com")
691
+ sess.close() # disconnects only — Chrome stays alive
692
+ ```
693
+
694
+ #### Variant B: Dedicated Profile (isolated persistent profile)
231
695
 
232
- # check package payload before publish
233
- npm run pack:check
696
+ Pass a fixed `--user-data-dir` to Chrome. State (cookies, localStorage) persists across restarts. Completely isolated from your personal Chrome.
234
697
 
235
- # publish from repo root
236
- npm publish
698
+ ```bash
699
+ # macOS (on Linux, replace the binary path with google-chrome)
700
+ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
701
+ --remote-debugging-port=9222 \
702
+ --user-data-dir="$HOME/.agentmb-profiles/my-automation-profile" \
703
+ --no-first-run --no-default-browser-check
704
+
705
+ # Windows
706
+ "C:\Program Files\Google\Chrome\Application\chrome.exe" ^
707
+ --remote-debugging-port=9222 ^
708
+ --user-data-dir="%APPDATA%\agentmb-profiles\my-automation-profile"
237
709
  ```
238
710
 
239
- If your global npm cache has permission issues, this repo uses project-local cache (`.npm-cache`) via `.npmrc`.
711
+ ```python
712
+ sess = client.sessions.create(launch_mode="attach", cdp_url="http://127.0.0.1:9222")
713
+ ```
240
714
 
241
- ## Environment Variables
715
+ #### Variant C: User Chrome (reuse your real Chrome profile)
716
+
717
+ Point Chrome at your existing user profile to inherit all logged-in sessions, saved passwords, and installed extensions. **Chrome must not already be running with that profile** when you launch with remote debugging.
718
+
719
+ ```bash
720
+ # macOS — close Chrome first, then:
721
+ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
722
+ --remote-debugging-port=9222 \
723
+ --user-data-dir="$HOME/Library/Application Support/Google/Chrome"
724
+
725
+ # Linux
726
+ google-chrome --remote-debugging-port=9222 \
727
+ --user-data-dir="$HOME/.config/google-chrome"
728
+
729
+ # Windows
730
+ "C:\Program Files\Google\Chrome\Application\chrome.exe" ^
731
+ --remote-debugging-port=9222 ^
732
+ --user-data-dir="%LOCALAPPDATA%\Google\Chrome\User Data"
733
+ ```
734
+
735
+ ```python
736
+ sess = client.sessions.create(launch_mode="attach", cdp_url="http://127.0.0.1:9222")
737
+ # → all cookies, extensions, and login state from your personal Chrome are available
738
+ ```
739
+
740
+ **Warning**: actions performed via agentmb will affect your real Chrome profile (cookies written, history created, etc.). Use Variant B when in doubt.
741
+
742
+ ---
743
+
744
+ Attach a session (all variants):
745
+
746
+ ```bash
747
+ agentmb session new --launch-mode attach --cdp-url http://127.0.0.1:9222
748
+ ```
749
+
750
+ **Note**: `launch_mode=attach` is incompatible with `browser_channel` and `executable_path` (preflight returns `400`). CDP attach gives agentmb control over **all tabs** in the connected browser.
751
+
752
+ ### Session Seal
753
+
754
+ Mark a session as sealed to prevent accidental deletion:
755
+
756
+ ```python
757
+ sess.seal()
758
+ # Now sess.close() / DELETE returns 423 session_sealed
759
+ ```
760
+
761
+ ```bash
762
+ agentmb session seal <session-id>
763
+ agentmb session rm <session-id> # → error: session is sealed
764
+ ```
765
+
766
+ ### Session-Level Proxy (R09)
767
+
768
+ Route all browser traffic through a proxy for a specific session:
769
+
770
+ ```python
771
+ sess = client.sessions.create(profile="demo", proxy_url="http://user:pass@proxy.example.com:8080")
772
+ ```
773
+
774
+ ```bash
775
+ agentmb session new --proxy http://user:pass@proxy.example.com:8080
776
+ ```
777
+
778
+ The proxy is applied at browser-context level, affecting all navigation and network requests within the session.
779
+
780
+ ### Video Recording (R09)
781
+
782
+ Record a video of the session's activity. The video is saved to a temp directory and is retrievable via the API:
783
+
784
+ ```python
785
+ sess = client.sessions.create(profile="demo", record_video=True)
786
+ sess.navigate("https://example.com")
787
+ # ... perform actions ...
788
+ info = sess.video_path() # { video_path: "/tmp/agentmb-video-<sid>/..." }
789
+ ```
790
+
791
+ ```bash
792
+ agentmb session new --record-video
793
+ # After actions, retrieve or save the video via the REST API (these are HTTP routes, not shell commands):
+ # GET /api/v1/sessions/<id>/video
+ # POST /api/v1/sessions/<id>/video/save { "dest_path": "/output/recording.webm" }
796
+ ```
797
+
798
+ ### Preflight Validation
799
+
800
+ The `POST /api/v1/sessions` endpoint validates parameters before launching and returns `400 preflight_failed` for:
801
+ - `browser_channel` + `executable_path` used together (mutually exclusive)
802
+ - `browser_channel` not in `['chromium', 'chrome', 'msedge']`
803
+ - `launch_mode=attach` without `cdp_url`
804
+ - `cdp_url` with invalid URL format
805
+ - `launch_mode=attach` combined with `browser_channel` or `executable_path`
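The five rules above can be mirrored client-side to catch bad parameters before a request is sent. The following is a minimal sketch under stated assumptions: `preflight_session` is a hypothetical helper, the error strings are invented for illustration, and the set of URL schemes accepted for `cdp_url` is a guess rather than the daemon's actual rule.

```python
from urllib.parse import urlparse

VALID_CHANNELS = {"chromium", "chrome", "msedge"}


def preflight_session(params: dict) -> list[str]:
    """Collect human-readable preflight violations for a session-create body."""
    errors = []
    channel = params.get("browser_channel")
    exe = params.get("executable_path")
    attach = params.get("launch_mode") == "attach"
    cdp_url = params.get("cdp_url")

    if channel and exe:
        errors.append("browser_channel and executable_path are mutually exclusive")
    if channel and channel not in VALID_CHANNELS:
        errors.append(f"browser_channel must be one of {sorted(VALID_CHANNELS)}")
    if attach and not cdp_url:
        errors.append("launch_mode=attach requires cdp_url")
    if cdp_url:
        parsed = urlparse(cdp_url)
        # Assumed scheme list; the daemon may accept a different set.
        if parsed.scheme not in {"http", "https", "ws", "wss"} or not parsed.netloc:
            errors.append("cdp_url is not a valid URL")
    if attach and (channel or exe):
        errors.append("launch_mode=attach cannot be combined with browser_channel or executable_path")
    return errors


print(preflight_session({"launch_mode": "attach"}))
```

An empty list means the body would pass these checks; the daemon remains the source of truth and may reject a request for other reasons.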
806
+
807
+ ---
808
+
809
+ ## CDP Access
810
+
811
+ agentmb uses Chromium stable as the browser engine. The protocol exposed is the full **Chrome DevTools Protocol (CDP)** as implemented in Chromium/Chrome. Three distinct access modes are provided.
812
+
813
+ ### 1. CDP Command Passthrough (REST)
814
+
815
+ Send any DevTools Protocol method to the session's CDP session.
816
+
817
+ ```http
818
+ GET /api/v1/sessions/:id/cdp → session CDP info
819
+ POST /api/v1/sessions/:id/cdp
820
+ {"method": "Page.captureScreenshot", "params": {"format": "png"}}
821
+ ```
822
+
823
+ All CDP calls are written to the session audit log (`type="cdp"`, `method`, `session_id`, `purpose`, `operator`). Error responses are sanitized (stack frames and internal paths stripped before logging).
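For callers without the SDK, the passthrough request can be assembled with the Python standard library. This sketch only builds the request object; the base URL (local daemon on the default port `19315`), the session id, and the helper name `build_cdp_request` are assumptions for illustration.

```python
import json
import urllib.request


def build_cdp_request(base_url, session_id, method, params=None, token=None):
    """Build (but do not send) a POST request for the CDP passthrough route."""
    body = json.dumps({"method": method, "params": params or {}}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Same token header the REST endpoints use when auth is enabled.
        headers["X-API-Token"] = token
    url = f"{base_url}/api/v1/sessions/{session_id}/cdp"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_cdp_request(
    "http://127.0.0.1:19315",  # assumed local daemon on the default port
    "sess_example",            # hypothetical session id
    "Page.captureScreenshot",
    {"format": "png"},
)
print(req.get_method(), req.full_url)
# To send it against a running daemon: urllib.request.urlopen(req)
```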
824
+
825
+ ### 2. CDP WebSocket Passthrough
826
+
827
+ Returns the browser-level `ws://` endpoint. Connect Puppeteer, Chrome DevTools, or any CDP client directly.
828
+
829
+ ```bash
830
+ agentmb cdp-ws <session-id>
831
+ # → ws://127.0.0.1:NNNN/devtools/browser/...
832
+ ```
833
+
834
+ ```python
835
+ ws_url = sess.cdp_ws_url()
836
+ # connect with puppeteer, pyppeteer, or raw websocket
837
+ ```
838
+
839
+ Note: The WebSocket URL is for the full browser process (not per-page). It is only available when the daemon uses a non-persistent browser launch. Auth-gated: requires the same `X-API-Token` as REST endpoints when auth is enabled.
840
+
841
+ ### 3. CDP Network Emulation
842
+
843
+ Apply network throttling or offline mode via an internal CDP session attached per-session. Does not require external CDP tooling.
844
+
845
+ ```bash
846
+ agentmb set-network <session-id> \
847
+ --latency-ms 200 \
848
+ --download-kbps 512 \
849
+ --upload-kbps 256
850
+
851
+ agentmb set-network <session-id> --offline # full offline mode
852
+ agentmb reset-network <session-id> # restore normal conditions
853
+ ```
854
+
855
+ ```python
856
+ sess.network_conditions(offline=False, latency_ms=200,
857
+ download_kbps=512, upload_kbps=256)
858
+ ```
859
+
860
+ ---
861
+
862
+ ## Profile Management (API / SDK)
863
+
864
+ Profiles persist cookies, localStorage, and browser state between sessions.
865
+
866
+ ```python
867
+ # List all profiles on disk
868
+ result = client.list_profiles()
869
+ for p in result.profiles:
+     print(p.name, p.path, p.last_used)
871
+
872
+ # Reset a profile (wipes data dir and recreates empty directory)
873
+ # Returns 409 if a live session is currently using the profile.
874
+ result = client.reset_profile("demo")
875
+ # result.status == "ok"
876
+ ```
877
+
878
+ REST:
879
+ ```
880
+ GET /api/v1/profiles → ProfileListResult
881
+ POST /api/v1/profiles/:name/reset → ProfileResetResult
882
+ ```
883
+
884
+ Profile directories are stored under `AGENTMB_DATA_DIR/profiles/<name>/`.
885
+
886
+ ---
887
+
888
+ ## Local Awareness Pipeline (R09)
889
+
890
+ Allow sessions to scan local directories via a whitelist. Agents can inspect local file structures without requiring shell access.
891
+
892
+ ### Session Creation with `allow_dirs`
242
893
 
243
- Common runtime env vars:
894
+ ```python
895
+ sess = client.sessions.create(
896
+ profile="demo",
897
+ allow_dirs=["/tmp/reports", "/home/user/docs"],
898
+ )
899
+ ```
900
+
901
+ ```bash
902
+ agentmb session new --allow-dir /tmp/reports --allow-dir /home/user/docs
903
+ ```
904
+
905
+ ### File Scan Endpoint
906
+
907
+ ```http
908
+ GET /api/v1/utils/ls?session_id=<sid>&path=/tmp/reports&depth=2
909
+ ```
910
+
911
+ Returns:
912
+ ```json
913
+ {
914
+ "path": "/tmp/reports",
915
+ "session_id": "sess_...",
916
+ "entries": [
917
+ { "name": "report.pdf", "type": "file", "path": "/tmp/reports/report.pdf", "size": 12345 },
918
+ { "name": "charts", "type": "dir", "path": "/tmp/reports/charts", "children": [...] }
919
+ ]
920
+ }
921
+ ```
922
+
923
+ **Access control**:
924
+ - `403` if the session has no `allow_dirs` configured.
925
+ - `403` if the requested path is outside all allowed directories (prevents path traversal).
926
+ - `depth` capped at 5 levels.
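The whitelist check amounts to path containment after normalization. Below is a rough Python sketch of that idea, not the daemon's code: `is_path_allowed` and `clamp_depth` are hypothetical helpers, and resolving paths with `os.path.realpath` before comparing is an assumption about how traversal is prevented.

```python
import os


def is_path_allowed(requested: str, allow_dirs: list[str]) -> bool:
    """True if `requested` resolves to a location inside one of `allow_dirs`."""
    real = os.path.realpath(requested)
    for allowed in allow_dirs:
        root = os.path.realpath(allowed)
        # commonpath equals root only when real is root itself or nested under it
        if os.path.commonpath([real, root]) == root:
            return True
    return False


def clamp_depth(depth: int, max_depth: int = 5) -> int:
    """Cap the requested recursion depth at the documented 5-level limit."""
    return min(depth, max_depth)


allowed = ["/tmp/reports", "/home/user/docs"]
print(is_path_allowed("/tmp/reports/charts", allowed))
print(is_path_allowed("/tmp/reports/../secrets", allowed))
```

Comparing resolved paths with `commonpath` rather than a string prefix avoids treating a sibling such as `/tmp/reports-old` as if it were inside `/tmp/reports`.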
927
+
928
+ ---
244
929
 
245
- - `AGENTMB_PORT` (default `19315`)
246
- - `AGENTMB_DATA_DIR` (default `~/.agentmb`)
247
- - `AGENTMB_API_TOKEN` (optional API auth)
248
- - `AGENTMB_ENCRYPTION_KEY` (optional AES-256-GCM profile encryption key, 32 bytes as base64 or hex)
249
- - `AGENTMB_LOG_LEVEL` (default `info`)
250
- - `AGENTMB_POLICY_PROFILE` (default `safe`) — daemon-wide default safety policy profile
930
+ ## Sensitive Website Warning (R09)
931
+
932
+ Every `navigate` response includes a `sensitive_warning` field when the target domain matches a built-in category pattern. The field is absent (not `null`) for non-sensitive domains, so existing clients are unaffected.
933
+
934
+ ```json
935
+ {
936
+ "status": "ok",
937
+ "url": "https://onlinebanking.example.com/login",
938
+ "title": "Login",
939
+ "sensitive_warning": {
940
+ "domain": "onlinebanking.example.com",
941
+ "category": "financial",
942
+ "message": "Navigating to potentially sensitive domain: onlinebanking.example.com"
943
+ }
944
+ }
945
+ ```
946
+
947
+ Built-in categories: `financial` (bank/payment/stripe/paypal), `medical` (health/hospital/pharma), `gambling` (casino/betting), `adult` (adult content), `crypto` (bitcoin/wallet/exchange).
948
+
949
+ Custom patterns via environment variable (comma-separated regex strings):
950
+
951
+ ```bash
952
+ AGENTMB_SENSITIVE_DOMAINS="gov,military,defence" agentmb start
953
+ ```
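To make the matching behavior concrete, here is a hedged Python sketch of category lookup by regex. The keyword patterns are taken from the category hints above and are not the daemon's real pattern list; the `adult` category is omitted because no keywords are listed for it, and both `classify_domain` and the `"custom"` label are hypothetical.

```python
import re
from urllib.parse import urlparse

# Illustrative keyword patterns only; the daemon's built-in list may differ.
BUILTIN = {
    "financial": r"bank|payment|stripe|paypal",
    "medical": r"health|hospital|pharma",
    "gambling": r"casino|betting",
    "crypto": r"bitcoin|wallet|exchange",
}


def classify_domain(url: str, extra_patterns: str = ""):
    """Return (domain, category) for a sensitive URL, or None otherwise."""
    domain = urlparse(url).hostname or ""
    for category, pattern in BUILTIN.items():
        if re.search(pattern, domain, re.IGNORECASE):
            return (domain, category)
    # extra_patterns mimics AGENTMB_SENSITIVE_DOMAINS: comma-separated regexes
    for pattern in filter(None, (p.strip() for p in extra_patterns.split(","))):
        if re.search(pattern, domain, re.IGNORECASE):
            return (domain, "custom")
    return None


print(classify_domain("https://onlinebanking.example.com/login"))
print(classify_domain("https://portal.gov.example/", "gov,military,defence"))
```

Short patterns such as `gov` match anywhere in the hostname, so anchoring a custom pattern (for example `\.gov$`) may be worth considering if false positives matter.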
954
+
955
+ Python SDK:
956
+ ```python
957
+ result = sess.navigate("https://mybank.example.com/")
958
+ if getattr(result, "sensitive_warning", None):
+     print(f"Warning: {result.sensitive_warning['message']}")
960
+ ```
961
+
962
+ ---
251
963
 
252
964
  ## Safety Execution Policy
253
965
 
254
- agentmb enforces a configurable **safety execution policy** that throttles actions, enforces per-domain rate limits, and blocks sensitive actions (e.g. form submissions, file uploads) unless explicitly permitted.
966
+ Rate limiting and action guardrails are enforced per session and per domain.
255
967
 
256
968
  ### Profiles
257
969
 
258
970
  | Profile | Min interval | Jitter | Max actions/min | Sensitive actions |
259
971
  |---|---|---|---|---|
260
- | `safe` | 1500 ms | 300–800 ms | 8 | blocked by default |
972
+ | `safe` | 1500 ms | 300–800 ms | 8 | blocked (HTTP 403) |
261
973
  | `permissive` | 200 ms | 0–100 ms | 60 | allowed |
262
974
  | `disabled` | 0 ms | 0 ms | unlimited | allowed |
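
As a rough illustration of how the table's numbers could translate into a per-action delay, consider the sketch below. How the daemon actually combines the minimum interval with jitter is not stated here, so the additive rule, the `PROFILES` dictionary, and `next_delay_ms` are all assumptions.

```python
import random

# Values copied from the profile table; None means no per-minute cap.
PROFILES = {
    "safe": {"min_interval_ms": 1500, "jitter_ms": (300, 800), "max_per_min": 8},
    "permissive": {"min_interval_ms": 200, "jitter_ms": (0, 100), "max_per_min": 60},
    "disabled": {"min_interval_ms": 0, "jitter_ms": (0, 0), "max_per_min": None},
}


def next_delay_ms(profile: str, ms_since_last_action: float, rng=random) -> float:
    """Delay before the next action: remaining min interval plus random jitter."""
    cfg = PROFILES[profile]
    remaining = max(0, cfg["min_interval_ms"] - ms_since_last_action)
    low, high = cfg["jitter_ms"]
    return remaining + rng.uniform(low, high)


print(next_delay_ms("disabled", 0))
print(round(next_delay_ms("safe", 0)))
```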
263
975
 
264
- Set the daemon-wide default via env var:
976
+ Set the daemon-wide default via an environment variable:
977
+
265
978
  ```bash
266
979
  AGENTMB_POLICY_PROFILE=disabled node dist/daemon/index.js # CI / trusted automation
267
- AGENTMB_POLICY_PROFILE=safe node dist/daemon/index.js # social-media / sensitive workflows
980
+ AGENTMB_POLICY_PROFILE=safe node dist/daemon/index.js # untrusted / social-media flows
268
981
  ```
269
982
 
270
- ### Per-session override (CLI)
983
+ ### Per-session override
271
984
 
272
985
  ```bash
273
- agentmb policy <session-id> # get current policy
274
- agentmb policy <session-id> safe # switch to safe profile
275
- agentmb policy <session-id> permissive # switch to permissive
986
+ agentmb policy <session-id> # get current profile
987
+ agentmb policy <session-id> safe # switch to safe
988
+ agentmb policy <session-id> permissive # switch to permissive
276
989
  agentmb policy <session-id> safe --allow-sensitive # safe + allow sensitive actions
277
990
  ```
278
991
 
279
- ### Per-session override (Python SDK)
280
-
281
992
  ```python
282
- from agentmb import BrowserClient
283
-
284
- with BrowserClient() as client:
285
- sess = client.sessions.create()
286
- policy = sess.set_policy("safe", allow_sensitive_actions=False)
287
- print(policy.max_retries_per_domain) # 3
288
- current = sess.get_policy()
993
+ sess.set_policy("safe", allow_sensitive_actions=False)
994
+ info = sess.get_policy() # → PolicyInfo
289
995
  ```
290
996
 
291
- ### Audit logs
997
+ ### Audit log (policy events)
292
998
 
293
999
  All policy events (`throttle`, `jitter`, `cooldown`, `deny`, `retry`) are written to the session audit log with `type="policy"`.
294
1000
 
@@ -296,14 +1002,157 @@ All policy events (`throttle`, `jitter`, `cooldown`, `deny`, `retry`) are writte
296
1002
  agentmb logs <session-id> # shows policy events inline
297
1003
  ```
298
1004
 
299
- ### Sensitive actions
1005
+ ### Sensitive action guard
300
1006
 
301
- Mark any action as sensitive by passing `"sensitive": true` in the request body. With `safe` profile and `allow_sensitive_actions=false`, the request returns HTTP 403:
1007
+ Pass `"sensitive": true` in any request body to mark that action as sensitive. With the `safe` profile and `allow_sensitive_actions=false`, the request is blocked and returns:
302
1008
 
303
1009
  ```json
304
1010
  { "error": "sensitive action blocked by policy", "policy_event": "deny" }
305
1011
  ```
306
1012
 
1013
+ HTTP status: `403`.
1014
+
1015
+ ---
1016
+
1017
+ ## Security
1018
+
1019
+ ### API Token Authentication
1020
+
1021
+ All endpoints require `X-API-Token` or `Authorization: Bearer <token>` when `AGENTMB_API_TOKEN` is set.
1022
+
1023
+ ```bash
1024
+ export AGENTMB_API_TOKEN="my-secret-token"
1025
+ ```
1026
+
1027
+ Requests without a valid token return `401 Unauthorized`. CDP REST and WebSocket endpoints are subject to the same token check.
1028
+
1029
+ ### Profile Encryption
1030
+
1031
+ Browser profiles (cookies, storage) are encrypted at rest using AES-256-GCM when `AGENTMB_ENCRYPTION_KEY` is set.
1032
+
1033
+ ```bash
1034
+ # 32-byte key, base64 or hex encoded
1035
+ export AGENTMB_ENCRYPTION_KEY="$(openssl rand -base64 32)"
1036
+ ```
1037
+
1038
+ Profiles written without a key cannot be read with one, and vice versa.
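A key can also be generated and sanity-checked from Python before exporting it. This is a sketch under assumptions: `generate_key_b64` and `decode_key` are hypothetical helpers, and telling hex from base64 by length (64 hex characters) is a convention chosen here, not necessarily how the daemon parses the variable.

```python
import base64
import binascii
import secrets


def generate_key_b64() -> str:
    """Return a fresh random 32-byte key, base64 encoded."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def decode_key(text: str) -> bytes:
    """Decode a hex or base64 key and insist on exactly 32 bytes."""
    text = text.strip()
    try:
        if len(text) == 64:
            raw = bytes.fromhex(text)  # 32 bytes written as 64 hex characters
        else:
            raw = base64.b64decode(text, validate=True)
    except (ValueError, binascii.Error) as exc:
        raise ValueError("key is neither valid hex nor valid base64") from exc
    if len(raw) != 32:
        raise ValueError(f"expected 32 bytes, got {len(raw)}")
    return raw


key = generate_key_b64()
print(len(decode_key(key)))
```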
1039
+
1040
+ ### Input Validation (Preflight)
1041
+
1042
+ Every action route runs preflight checks before execution:
1043
+
1044
+ - `timeout_ms`: must be in range `[50, 60000]` ms. Out-of-range values return `400 preflight_failed` with `{ field, constraint, value }`.
1045
+ - `fill` value: max 100,000 characters. Longer values return `400 preflight_failed`.
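The two documented limits can be checked before a request leaves the client. In this sketch `preflight_action` is a hypothetical helper; the `{ field, constraint, value }` shape follows the text above, but the exact constraint strings, and reporting the length rather than the full `fill` value, are assumptions.

```python
def preflight_action(body: dict):
    """Return a preflight_failed detail dict, or None if the body passes."""
    timeout = body.get("timeout_ms")
    if timeout is not None and not (50 <= timeout <= 60000):
        return {"field": "timeout_ms", "constraint": "range [50, 60000]", "value": timeout}
    value = body.get("value")
    if isinstance(value, str) and len(value) > 100_000:
        # Report the length instead of echoing a very large string back.
        return {"field": "value", "constraint": "max length 100000", "value": len(value)}
    return None


print(preflight_action({"selector": "#q", "value": "hello", "timeout_ms": 10}))
```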
1046
+
1047
+ ### Error Diagnostics and Recovery Hints
1048
+
1049
+ When an action fails (element not found, timeout, detached context, overlay intercept), the route returns `422` with a structured diagnostic payload:
1050
+
1051
+ ```json
1052
+ {
1053
+ "error": "Timeout 3000ms exceeded.",
1054
+ "url": "https://example.com",
1055
+ "readyState": "complete",
1056
+ "recovery_hint": "Increase timeout_ms or add stability.wait_before_ms; ensure element is visible before acting"
1057
+ }
1058
+ ```
1059
+
1060
+ `recovery_hint` categories:
1061
+ - **Timeout / waiting for**: increase `timeout_ms` or add `stability.wait_before_ms`; verify element visibility
1062
+ - **Target closed / detached**: page navigated or element removed; re-navigate or call `snapshot_map` again
1063
+ - **Not found / no element**: check selector; use `snapshot_map` to verify element exists on current page
1064
+ - **Intercept / overlap / obscured**: element covered by overlay; try `executor=auto_fallback` or scroll into view first
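The category list reads like a keyword lookup over the error message. A minimal sketch of that idea follows; the patterns are derived from the category names above, the hint strings are paraphrased, and `recovery_hint` is a hypothetical function, not the daemon's classifier.

```python
import re

# (pattern, hint) pairs checked in order; wording paraphrases the list above.
_HINTS = [
    (r"timeout|waiting for", "increase timeout_ms or add stability.wait_before_ms"),
    (r"target closed|detached", "re-navigate or call snapshot_map again"),
    (r"not found|no element", "check selector; verify with snapshot_map"),
    (r"intercept|overlap|obscured", "try executor=auto_fallback or scroll into view"),
]


def recovery_hint(error_message: str):
    """Return the first matching hint for an error message, or None."""
    for pattern, hint in _HINTS:
        if re.search(pattern, error_message, re.IGNORECASE):
            return hint
    return None


print(recovery_hint("Timeout 3000ms exceeded."))
```

An agent could use the returned hint to choose a retry strategy, for example raising `timeout_ms` on a timeout but refreshing `snapshot_map` when the target was detached.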
1065
+
1066
+ ### Audit Logging
1067
+
1068
+ Every action, CDP call, and policy event is appended to a per-session JSONL audit log:
1069
+
1070
+ ```json
1071
+ {
1072
+ "ts": "2026-02-28T10:00:01.234Z",
1073
+ "v": 1,
1074
+ "session_id": "s_abc123",
1075
+ "action_id": "act_xyz",
1076
+ "type": "action",
1077
+ "action": "click",
1078
+ "url": "https://example.com",
1079
+ "selector": "#submit",
1080
+ "result": { "status": "ok", "duration_ms": 142 },
1081
+ "purpose": "submit search form",
1082
+ "operator": "codex-agent"
1083
+ }
1084
+ ```
1085
+
1086
+ Fields: `purpose` (why), `operator` (who/what). Set via request body or `X-Operator` header.
1087
+
1088
+ ```bash
1089
+ agentmb logs <session-id> --tail 50
1090
+ ```
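Because the log is JSONL, each line parses independently, which makes ad hoc analysis straightforward. The sketch below summarizes entries by `type`; `summarize_audit_log` is a hypothetical helper, the second sample entry is an invented minimal policy record, and the on-disk location of the log is deliberately left out since it is not specified here.

```python
import json
from collections import Counter


def summarize_audit_log(lines):
    """Count entries per `type` and collect ids of actions that did not succeed."""
    counts = Counter()
    failures = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        counts[entry.get("type", "unknown")] += 1
        result = entry.get("result") or {}
        if entry.get("type") == "action" and result.get("status") != "ok":
            failures.append(entry.get("action_id"))
    return counts, failures


sample = [
    '{"ts": "2026-02-28T10:00:01.234Z", "v": 1, "session_id": "s_abc123", '
    '"action_id": "act_xyz", "type": "action", "action": "click", '
    '"result": {"status": "ok", "duration_ms": 142}}',
    # Hypothetical minimal policy entry, for illustration only
    '{"ts": "2026-02-28T10:00:02.000Z", "v": 1, "session_id": "s_abc123", "type": "policy"}',
]
counts, failures = summarize_audit_log(sample)
print(dict(counts), failures)
```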
1091
+
1092
+ ---
1093
+
1094
+ ## Human Login Handoff
1095
+
1096
+ Switch a session to headed (visible) mode, log in manually, then return to headless automation with the same cookies and storage.
1097
+
1098
+ ```bash
1099
+ agentmb login <session-id>
1100
+ # → browser window opens
1101
+ # → log in manually
1102
+ # → press Enter in terminal to return to headless mode
1103
+ ```
1104
+
1105
+ ---
1106
+
1107
+ ## Linux Headed Mode
1108
+
1109
+ Linux visual/headed mode requires Xvfb:
1110
+
1111
+ ```bash
1112
+ sudo apt-get install -y xvfb
1113
+ bash scripts/xvfb-headed.sh
1114
+ ```
1115
+
1116
+ ---
1117
+
1118
+ ## Playwright Trace Recording
1119
+
1120
+ ```bash
1121
+ agentmb trace start <session-id>
1122
+ # ... perform actions ...
1123
+ agentmb trace stop <session-id> -o trace.zip
1124
+ npx playwright show-trace trace.zip
1125
+ ```
1126
+
1127
+ ---
1128
+
1129
+ ## Verify
1130
+
1131
+ Runs: build → daemon start → 24 pytest suites → daemon stop. Requires that no daemon is already running on the configured port.
1132
+
1133
+ ```bash
1134
+ bash scripts/verify.sh # uses default port 19315
1135
+ AGENTMB_PORT=19320 bash scripts/verify.sh
1136
+ ```
1137
+
1138
+ Expected output: `ALL GATES PASSED (27/27)`.
1139
+
1140
+ ---
1141
+
1142
+ ## Environment Variables
1143
+
1144
+ | Variable | Default | Purpose |
1145
+ |---|---|---|
1146
+ | `AGENTMB_PORT` | `19315` | Daemon HTTP port |
1147
+ | `AGENTMB_DATA_DIR` | `~/.agentmb` | Profiles and logs directory |
1148
+ | `AGENTMB_API_TOKEN` | _(none)_ | Require this token on all requests |
1149
+ | `AGENTMB_ENCRYPTION_KEY` | _(none)_ | AES-256-GCM key for profile encryption (32 bytes, base64 or hex) |
1150
+ | `AGENTMB_LOG_LEVEL` | `info` | Daemon log verbosity |
1151
+ | `AGENTMB_POLICY_PROFILE` | `safe` | Default safety policy profile (`safe\|permissive\|disabled`) |
1152
+ | `AGENTMB_SENSITIVE_DOMAINS` | _(none)_ | Comma-separated regex patterns to append to the sensitive domain list (e.g. `gov,military`) |
1153
+
1154
+ ---
1155
+
307
1156
  ## License
308
1157
 
309
1158
  MIT