agentmb 0.3.1 → 0.3.2

Files changed (33)
  1. package/README.md +188 -4
  2. package/dist/browser/actions.d.ts +1 -1
  3. package/dist/browser/actions.d.ts.map +1 -1
  4. package/dist/browser/actions.js +4 -3
  5. package/dist/browser/actions.js.map +1 -1
  6. package/dist/browser/manager.d.ts +21 -0
  7. package/dist/browser/manager.d.ts.map +1 -1
  8. package/dist/browser/manager.js +127 -12
  9. package/dist/browser/manager.js.map +1 -1
  10. package/dist/cli/commands/actions.d.ts.map +1 -1
  11. package/dist/cli/commands/actions.js +37 -10
  12. package/dist/cli/commands/actions.js.map +1 -1
  13. package/dist/cli/commands/session.d.ts.map +1 -1
  14. package/dist/cli/commands/session.js +9 -0
  15. package/dist/cli/commands/session.js.map +1 -1
  16. package/dist/cli/index.js +1 -1
  17. package/dist/daemon/routes/actions.d.ts.map +1 -1
  18. package/dist/daemon/routes/actions.js +97 -12
  19. package/dist/daemon/routes/actions.js.map +1 -1
  20. package/dist/daemon/routes/interaction.d.ts.map +1 -1
  21. package/dist/daemon/routes/interaction.js +10 -1
  22. package/dist/daemon/routes/interaction.js.map +1 -1
  23. package/dist/daemon/routes/sessions.d.ts.map +1 -1
  24. package/dist/daemon/routes/sessions.js +107 -1
  25. package/dist/daemon/routes/sessions.js.map +1 -1
  26. package/dist/daemon/server.js +1 -1
  27. package/package.json +4 -2
  28. package/skills/agentmb/SKILL.md +541 -0
  29. package/skills/agentmb/references/authentication.md +180 -0
  30. package/skills/agentmb/references/browser-modes.md +167 -0
  31. package/skills/agentmb/references/commands.md +231 -0
  32. package/skills/agentmb/references/locator-modes.md +254 -0
  33. package/skills/agentmb/references/session-management.md +260 -0
@@ -0,0 +1,254 @@
1
+ # Locator Modes — Deep Reference
2
+
3
+ Four modes for targeting page elements. **Start at Priority 1, move down only if needed.**
4
+
5
+ ---
6
+
7
+ ## Priority Order
8
+
9
+ ```
10
+ Priority 1: element-map   → --element-id   default; works for most pages
11
+ Priority 2: CSS Selector  → direct         when element-map labels are empty/unreliable
12
+ Priority 3: snapshot-map  → --ref-id       when you need atomicity or batch operations
13
+ Priority 4: click-at        coordinates    last resort; contenteditable, canvas
14
+ ```
15
+
16
+ ---
17
+
18
+ ## Priority 1 — element-map + --element-id
19
+
20
+ ### How it works
21
+
22
+ `element-map` injects a stable `element_id` (`e1`, `e2`, …) into every interactable element on the page and returns a list with label, tag, and selector. The IDs persist until the next `element-map` call.
23
+
24
+ ```bash
25
+ agentmb element-map <session-id>
26
+ agentmb element-map <session-id> --include-unlabeled # also include icon-only elements
27
+ ```
28
+
29
+ Example output:
30
+ ```
31
+ e1 [button] Submit
32
+ e2 [input] Email address (placeholder="Enter email")
33
+ e3 [a] Sign in
34
+ e4 [button] ☰ (label_source=none — icon only)
35
+ ```
36
+
37
+ Pass the ID to any action:
38
+ ```bash
39
+ agentmb click <session-id> e1 --element-id
40
+ agentmb fill <session-id> e2 "user@example.com" --element-id
41
+ agentmb get <session-id> text e3 --element-id
42
+ agentmb assert <session-id> visible e1 --element-id
43
+ agentmb bbox <session-id> e1 --element-id
44
+ ```
45
+
46
+ ### label_source Priority Chain
47
+
48
+ The `label` field is synthesized by checking sources in this order (first non-empty wins):
49
+
50
+ | Priority | Source | `label_source` value |
51
+ |---|---|---|
52
+ | 1 | `aria-label` attribute | `"aria-label"` |
53
+ | 2 | `title` attribute | `"title"` |
54
+ | 3 | `aria-labelledby` target text | `"aria-labelledby"` |
55
+ | 4 | SVG `<title>` / `<desc>` | `"svg-title"` |
56
+ | 5 | `innerText` (trimmed) | `"text"` |
57
+ | 6 | `placeholder` attribute | `"placeholder"` |
58
+ | 7 | Fallback (icon-only) | `"none"` / `"[tag @ x,y]"` |
59
+
60
+ If `label_source=none`, the element has no readable label. Add `--include-unlabeled` to get a coordinate-based `[tag @ x,y]` fallback, or switch to CSS selector (Priority 2).
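
The priority chain above can be sketched as a first-non-empty lookup. This is a minimal illustration, not agentmb's actual implementation: the dictionary keys (`aria_label`, `inner_text`, and so on) are hypothetical names for values that would be read from the DOM, and only the `label_source` strings come from the table.

```python
from typing import Mapping, Tuple

# Order mirrors the table above. The left-hand keys are hypothetical field
# names; the right-hand strings are the documented label_source values.
LABEL_SOURCES = [
    ("aria_label", "aria-label"),
    ("title", "title"),
    ("aria_labelledby_text", "aria-labelledby"),
    ("svg_title", "svg-title"),
    ("inner_text", "text"),
    ("placeholder", "placeholder"),
]


def resolve_label(attrs: Mapping[str, str]) -> Tuple[str, str]:
    """Return (label, label_source); the first non-empty source wins."""
    for key, source in LABEL_SOURCES:
        value = (attrs.get(key) or "").strip()
        if value:
            return value, source
    # Nothing readable: corresponds to the icon-only fallback row.
    return "", "none"


print(resolve_label({"title": "Menu", "inner_text": "☰"}))
print(resolve_label({"inner_text": "  Submit  ", "placeholder": "unused"}))
print(resolve_label({}))
```

Note that a `title` attribute outranks visible text under this ordering, so an icon button with both is labeled by its `title`.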
61
+
62
+ ### Best for
63
+ - Text-rich pages: docs, GitHub, Hacker News, dashboards
64
+ - Forms with labeled inputs
65
+ - Buttons with accessible text
66
+
67
+ ---
68
+
69
+ ## Priority 2 — CSS Selector
70
+
71
+ Pass a CSS selector directly — no prior scan needed.
72
+
73
+ ```bash
74
+ agentmb click <session-id> "button[data-testid=submit]"
75
+ agentmb fill <session-id> "#email" "user@example.com"
76
+ agentmb get <session-id> text ".product-title"
77
+ agentmb assert <session-id> visible ".modal"
78
+ ```
79
+
80
+ Python SDK:
81
+ ```python
82
+ sess.click(selector="button[data-testid=submit]")
83
+ sess.fill(selector="#email", value="user@example.com")
84
+ ```
85
+
86
+ ### Best for
87
+ - Icon-dense SPAs where `element-map` returns `label_source=none` for most elements
88
+ - Pages with stable, predictable `data-testid` or `id` attributes
89
+ - When you already know the selector (no scan needed)
90
+
91
+ ### When NOT to use
92
+ - Selectors with dynamic class names like `.css-3xk7a9` — they break on re-render
93
+ - For such pages, fall back to element-map or snapshot-map
94
+
95
+ ---
96
+
97
+ ## Priority 3 — snapshot-map + --ref-id
98
+
99
+ ### How it works
100
+
101
+ `snapshot-map` captures a server-side snapshot of the page's element state with a `page_rev` counter. Each element gets a stable `ref_id` (`snap_XXXXXX:eN`). The ref is valid as long as the page has not navigated since the snapshot.
102
+
103
+ ```bash
104
+ agentmb snapshot-map <session-id>
105
+ agentmb snapshot-map <session-id> --include-unlabeled
106
+ ```
107
+
108
+ Example output:
109
+ ```
110
+ snap_000001:e1 [button] Login
111
+ snap_000001:e3 [input] Username
112
+ snap_000001:e7 [a] Forgot password?
113
+ ```
114
+
115
+ Pass the ref_id to any action:
116
+ ```bash
117
+ agentmb click <session-id> snap_000001:e1 --ref-id
118
+ agentmb fill <session-id> snap_000001:e3 "alice" --ref-id
119
+ ```
120
+
121
+ Python SDK:
122
+ ```python
123
+ snap = sess.snapshot_map()
124
+
125
+ # Find by label
126
+ btn = next(e for e in snap.elements if "Login" in (e.label or ""))
127
+ sess.click(ref_id=btn.ref_id)
128
+
129
+ # Or use in run_steps
130
+ sess.run_steps([
131
+     {"action": "click", "params": {"ref_id": btn.ref_id}},
132
+     {"action": "fill", "params": {"ref_id": snap.elements[2].ref_id, "value": "alice"}},
133
+ ])
134
+ ```
135
+
136
+ ### ref_id Format
137
+
138
+ ```
139
+ snap_XXXXXX:eN
140
+ │           │
141
+ │           └─ element index within the snapshot
142
+ └─ 6-char snapshot ID (hex)
143
+ ```
144
+
145
+ Examples: `snap_000001:e1`, `snap_a3f9c2:e15`
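
A client that needs to group refs by snapshot could split a `ref_id` along the format described above. The sketch below is illustrative only: `parse_ref_id` and `RefId` are hypothetical helpers, not part of the SDK, and the pattern assumes lowercase hex digits as in the two examples.

```python
import re
from typing import NamedTuple

# Assumption: the 6-char snapshot ID uses lowercase hex, as in the examples.
REF_ID_RE = re.compile(r"snap_([0-9a-f]{6}):e(\d+)")


class RefId(NamedTuple):
    snapshot_id: str
    element_index: int


def parse_ref_id(ref_id: str) -> RefId:
    """Split 'snap_XXXXXX:eN' into its snapshot ID and element index."""
    match = REF_ID_RE.fullmatch(ref_id)
    if match is None:
        raise ValueError(f"not a snapshot ref_id: {ref_id!r}")
    return RefId(match.group(1), int(match.group(2)))


print(parse_ref_id("snap_a3f9c2:e15"))
```

Rejecting a bare `e1` this way catches the common mistake of passing an `element-map` ID together with `--ref-id`.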
146
+
147
+ ### page_rev — Detecting Page Changes
148
+
149
+ `page_rev` is an integer that increments on every main-frame navigation. Poll it cheaply to detect page changes without taking a full snapshot:
150
+
151
+ ```bash
152
+ # HTTP
153
+ GET /api/v1/sessions/:id/page_rev
154
+ → { "status": "ok", "session_id": "...", "page_rev": 3, "url": "https://..." }
155
+ ```
156
+
157
+ ```python
158
+ rev = sess.page_rev() # PageRevResult: .page_rev, .url
159
+ ```
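
One way to use the counter is to poll it until it moves away from a known baseline. The helper below is a sketch under stated assumptions: `wait_for_page_change` is a hypothetical name, and it takes a plain callable so it does not depend on any particular SDK object.

```python
import time
from typing import Callable


def wait_for_page_change(
    get_rev: Callable[[], int],
    baseline: int,
    timeout_s: float = 10.0,
    interval_s: float = 0.25,
) -> bool:
    """Poll get_rev() until it differs from baseline; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_rev() != baseline:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Assumed wiring to the SDK call shown above:
#   before = sess.page_rev().page_rev
#   sess.click(selector="a.next")
#   changed = wait_for_page_change(lambda: sess.page_rev().page_rev, before)
```

Because `page_rev` only increments on main-frame navigation, this detects page loads but not in-page DOM updates.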
160
+
161
+ ### Stale Ref Detection and Recovery
162
+
163
+ If the page has navigated since the snapshot, using a stale `ref_id` returns:
164
+
165
+ ```
166
+ HTTP 409 stale_ref
167
+ {
168
+ "error": "stale_ref: page changed",
169
+ "suggestions": ["call snapshot_map to get fresh ref_ids", "re-run your step with the new ref_id"]
170
+ }
171
+ ```
172
+
173
+ Recovery pattern:
174
+ ```python
175
+ try:
176
+     sess.click(ref_id="snap_000001:e1")
177
+ except httpx.HTTPStatusError as e:
178
+     if e.response.status_code == 409:
179
+         snap = sess.snapshot_map() # refresh
180
+         btn = next(el for el in snap.elements if "Login" in (el.label or ""))
181
+         sess.click(ref_id=btn.ref_id) # retry
182
+ ```
183
+
184
+ ### run_steps + ref_id
185
+
186
+ In `run_steps`, each step with a stale `ref_id` returns a step-level error (not a request crash). Use `stop_on_error=False` to continue remaining steps past a single stale ref.
187
+
188
+ ```python
189
+ result = sess.run_steps(steps, stop_on_error=False)
190
+ for step in result.results:
191
+     if step.error and "stale_ref" in str(step.error):
192
+         pass  # handle stale ref for this specific step
193
+ ```
194
+
195
+ ### Best for
196
+ - Dynamic/reactive pages where element positions change
197
+ - Batch operations (`run_steps`) where you need consistent refs across all steps
198
+ - When you need to confirm an element's existence at snapshot time before acting
199
+
200
+ ---
201
+
202
+ ## Priority 4 — Coordinates (click-at)
203
+
204
+ Use for `contenteditable` regions, canvas elements, and custom components, or when all other modes fail.
205
+
206
+ ```bash
207
+ agentmb bbox <session-id> "#editor"
208
+ # → { "x": 100, "y": 200, "width": 400, "height": 300, "center_x": 300, "center_y": 350 }
209
+
210
+ agentmb click-at <session-id> 300 350 # absolute page coordinates
211
+ agentmb wheel <session-id> --dx 0 --dy 300
212
+ ```
213
+
214
+ Python SDK:
215
+ ```python
216
+ box = sess.bbox("#editor")
217
+ sess.click_at(x=box.center_x, y=box.center_y)
218
+ ```
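
The `center_x` and `center_y` values are simply the midpoint of the box. As a small worked check using the numbers from the `bbox` output above (the `Box` class is a hypothetical stand-in, not the SDK's result type):

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    x: float
    y: float
    width: float
    height: float

    @property
    def center(self) -> Tuple[float, float]:
        # Midpoint: top-left corner plus half of each dimension.
        return (self.x + self.width / 2, self.y + self.height / 2)


print(Box(100, 200, 400, 300).center)  # (300.0, 350.0)
```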
219
+
220
+ ---
221
+
222
+ ## Mode Comparison Table
223
+
224
+ | Dimension | element-map | CSS Selector | snapshot-map | click-at |
225
+ |---|---|---|---|---|
226
+ | Requires prior scan | Yes | No | Yes | Requires `bbox` |
227
+ | Stable across re-render | Yes (until re-map) | Depends on selector | Until nav | Always |
228
+ | Detects stale state | No | No | Yes (409) | No |
229
+ | Works for icon-only | With `--include-unlabeled` | Yes | With `--include-unlabeled` | Yes |
230
+ | Good for run_steps | OK | OK | Best (stale detection) | Not practical |
231
+ | Token cost | Scan needed | Zero | Scan needed | Scan needed |
232
+
233
+ ---
234
+
235
+ ## Semantic Find (Alternative Locator)
236
+
237
+ Locate elements by Playwright semantic locators without knowing selectors. Returns `found`, `count`, `bbox`.
238
+
239
+ ```python
240
+ # query_type: 'role' | 'text' | 'label' | 'placeholder' | 'alt_text'
241
+ res = sess.find(query_type="role", query="button", name="Submit")
242
+ res = sess.find(query_type="text", query="Sign in", exact=True)
243
+ res = sess.find(query_type="placeholder", query="Search…")
244
+ res = sess.find(query_type="label", query="Email address")
245
+ ```
246
+
247
+ CLI:
248
+ ```bash
249
+ agentmb find <session-id> role button --name "Submit"
250
+ agentmb find <session-id> text "Sign in"
251
+ agentmb find <session-id> placeholder "Search…" --json
252
+ ```
253
+
254
+ Use `find` as a complement to element-map when you know the semantic intent (role, label) but not the CSS selector.
@@ -0,0 +1,260 @@
1
+ # Session Management — Deep Reference
2
+
3
+ ---
4
+
5
+ ## Session Lifecycle
6
+
7
+ ```
8
+ create → active → sealed (optional) → closed
9
+          │                            │
10
+          └── all actions available    └── profile data persists (if named)
11
+                                           or is deleted (if ephemeral)
12
+ ```
13
+
14
+ States:
15
+ - **active**: session is running, accepts all commands
16
+ - **sealed**: protected from deletion (`423` on `rm`); all actions still work
17
+ - **zombie**: browser process died unexpectedly; session entry remains but is non-functional
18
+
19
+ ---
20
+
21
+ ## Session Creation Options
22
+
23
+ ### Named Profile (Persistent)
24
+
25
+ Cookies, localStorage, and browser state persist across runs. Use the same `--profile` name to reuse saved state.
26
+
27
+ ```bash
28
+ agentmb session new --profile gmail-account
29
+ agentmb session new --profile shopify-store --headed
30
+ ```
31
+
32
+ ```python
33
+ sess = client.sessions.create(profile="gmail-account")
34
+ ```
35
+
36
+ Profile data stored under `AGENTMB_DATA_DIR/profiles/<name>/` (default `~/.agentmb/profiles/`).
37
+
38
+ ### Pure Sandbox (Ephemeral)
39
+
40
+ Temp directory — all data auto-deleted on `close()` or daemon restart.
41
+
42
+ ```bash
43
+ agentmb session new --ephemeral
44
+ ```
45
+
46
+ ```python
47
+ sess = client.sessions.create(ephemeral=True)
48
+ ```
49
+
50
+ ### Headed vs Headless
51
+
52
+ ```bash
53
+ agentmb session new --profile demo # headless (default)
54
+ agentmb session new --profile demo --headed # visible browser window
55
+ ```
56
+
57
+ ```python
58
+ sess = client.sessions.create(profile="demo", headless=False)
59
+ ```
60
+
61
+ Linux headed mode requires Xvfb: `sudo apt-get install -y xvfb && bash scripts/xvfb-headed.sh`
62
+
63
+ ### Downloads
64
+
65
+ File downloads are disabled by default. Enable at creation time:
66
+
67
+ ```bash
68
+ agentmb session new --accept-downloads
69
+ ```
70
+
71
+ ```python
72
+ sess = client.sessions.create(accept_downloads=True)
73
+ ```
74
+
75
+ ### Policy Profile
76
+
77
+ Rate limiting and action guardrails per-session. Override at creation:
78
+
79
+ ```bash
80
+ agentmb session new --profile demo --policy permissive
81
+ ```
82
+
83
+ ```python
84
+ sess = client.sessions.create(profile="demo", policy="permissive")
85
+ ```
86
+
87
+ Profiles:
88
+ | Profile | Min interval | Max actions/min | Sensitive actions |
89
+ |---|---|---|---|
90
+ | `safe` | 1500 ms | 8 | blocked |
91
+ | `permissive` | 200 ms | 60 | allowed |
92
+ | `disabled` | 0 ms | unlimited | allowed |
93
+
94
+ Change policy for a running session:
95
+ ```bash
96
+ agentmb policy <sid> permissive
97
+ agentmb policy <sid> safe --allow-sensitive
98
+ ```
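
To see how the two limits in the table interact, the sketch below computes how long a client would have to wait before its next action. It is a client-side pacing illustration only: `ActionThrottle` and `Policy` are hypothetical names, and whether the daemon delays or rejects over-limit actions is not stated here.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional


@dataclass(frozen=True)
class Policy:
    min_interval_ms: int
    max_per_min: Optional[int]  # None means unlimited


# Values copied from the table above.
POLICIES = {
    "safe": Policy(1500, 8),
    "permissive": Policy(200, 60),
    "disabled": Policy(0, None),
}


class ActionThrottle:
    def __init__(self, policy: Policy, clock: Callable[[], float] = time.monotonic):
        self.policy = policy
        self.clock = clock
        self.recent: Deque[float] = deque()  # action timestamps, last 60 s

    def delay_needed(self) -> float:
        """Seconds to wait before the next action satisfies both limits."""
        now = self.clock()
        while self.recent and now - self.recent[0] >= 60.0:
            self.recent.popleft()
        wait = 0.0
        if self.recent:
            gap = self.policy.min_interval_ms / 1000
            wait = max(wait, self.recent[-1] + gap - now)
        limit = self.policy.max_per_min
        if limit is not None and len(self.recent) >= limit:
            # Wait until the oldest action leaves the 60-second window.
            wait = max(wait, self.recent[0] + 60.0 - now)
        return wait

    def record(self) -> None:
        self.recent.append(self.clock())
```

Under `safe`, the per-minute cap is the binding limit: 1500 ms spacing alone would allow 40 actions per minute, but only 8 are permitted.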
99
+
100
+ ---
101
+
102
+ ## Session Commands
103
+
104
+ ```bash
105
+ agentmb session new [flags] # create; prints session-id
106
+ agentmb session list # list all active sessions
107
+ agentmb session get <sid> # show details: profile, headless, url, created_at
108
+ agentmb session rm <sid> # close + delete
109
+ agentmb session seal <sid> # protect from deletion
110
+ agentmb settings <sid> # show viewport, user_agent, headless, url, profile
111
+ ```
112
+
113
+ Python SDK:
114
+ ```python
115
+ sess = client.sessions.create(profile="demo")
116
+ sessions = client.sessions.list()
117
+ info = client.sessions.get(sess.id)
118
+ sess.seal()
119
+ sess.close()
120
+ settings = sess.get_settings() # viewport, user_agent, headless, url, profile
121
+ ```
122
+
123
+ ---
124
+
125
+ ## Multi-Page (Tabs) Management
126
+
127
+ Multiple tabs in the same session share profile (cookies, storage) but have independent navigation state.
128
+
129
+ ```bash
130
+ agentmb pages list <sid> # list all tabs
131
+ agentmb pages new <sid> # open new blank tab → returns page-id
132
+ agentmb pages switch <sid> <page-id> # make tab active target
133
+ agentmb pages close <sid> <page-id> # close tab
134
+ # Note: closing the last tab returns 409 (session must have ≥ 1 tab)
135
+ ```
136
+
137
+ Python SDK:
138
+ ```python
139
+ # Open additional tabs
140
+ page2_id = sess.new_page() # returns page_id
141
+ page3_id = sess.new_page()
142
+
143
+ # Switch between tabs
144
+ sess.switch_page(page2_id)
145
+ sess.navigate("https://other.example.com")
146
+
147
+ # List tabs
148
+ pages = sess.pages() # list[PageInfo]: .page_id, .url, .title, .active
149
+
150
+ # Close tab
151
+ sess.close_page(page3_id)
152
+
153
+ # Work on original tab
154
+ sess.switch_page(pages[0].page_id)
155
+ ```
156
+
157
+ ---
158
+
159
+ ## page_id Direct Targeting (R09-C03)
160
+
161
+ Instead of switching the active tab before every action, pass `page_id` directly to any action
162
+ request. All major action routes support this param:
163
+ `navigate`, `click`, `fill`, `type`, `press`, `eval`, `screenshot`, `element_map`, `snapshot_map`, `scroll`.
164
+
165
+ ```python
166
+ # Create session + open multiple tabs
167
+ p1 = sess.pages()[0].page_id
168
+ p2 = sess.new_page() # returns page_id string
169
+ p3 = sess.new_page()
170
+
171
+ # Navigate each independently — no switch_page() needed
172
+ sess.navigate("https://site.com/a", page_id=p1)
173
+ sess.navigate("https://site.com/b", page_id=p2)
174
+ sess.navigate("https://site.com/c", page_id=p3)
175
+
176
+ # element_map + interact on a non-active tab
177
+ em = sess.element_map(page_id=p2)
178
+ sess.click(element_id="e3", page_id=p2)
179
+
180
+ # Screenshot any tab
181
+ shot = sess.screenshot(page_id=p3)
182
+ ```
183
+
184
+ REST (add `page_id` to request body):
185
+ ```json
186
+ POST /api/v1/sessions/:id/navigate
187
+ { "url": "https://example.com", "page_id": "page_abc123" }
188
+ ```
189
+
190
+ Error: `404` if `page_id` not found in session — call `GET /api/v1/sessions/:id/pages` to list valid IDs.
191
+
192
+ ---
193
+
194
+ ## Multi-Agent Concurrency
195
+
196
+ Different agents can share a daemon but must use **separate sessions** (different profiles).
197
+
198
+ ```bash
199
+ # Agent A
200
+ agentmb session new --profile agent-a-work
201
+ # Agent B (separate, isolated)
202
+ agentmb session new --profile agent-b-work
203
+ ```
204
+
205
+ Sessions are fully isolated: cookies, navigation, and page state do not leak between them.
206
+
207
+ **Concurrent access to the same session** is not recommended — actions are not queued, and concurrent commands on one session may produce unpredictable results.
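
If several workers in one process must share a session anyway, the caller can serialize its own commands. The sketch below shows one way to do that with a lock per session ID; `SessionSerializer` is a hypothetical helper, not part of agentmb, and it only protects callers inside a single Python process.

```python
import threading
from collections import defaultdict
from typing import Callable, DefaultDict, TypeVar

T = TypeVar("T")


class SessionSerializer:
    """Run at most one action at a time per session ID."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks: DefaultDict[str, threading.Lock] = defaultdict(threading.Lock)

    def run(self, session_id: str, action: Callable[[], T]) -> T:
        with self._guard:
            lock = self._locks[session_id]
        with lock:
            return action()


# Assumed usage with the SDK calls shown elsewhere in this reference:
#   serializer.run(sess.id, lambda: sess.click(selector="#submit"))
```

Actions on different sessions still run in parallel, which matches the isolation model described above.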
208
+
209
+ ---
210
+
211
+ ## Session Seal
212
+
213
+ Sealed sessions cannot be deleted until explicitly unsealed. Useful for long-running sessions that should not be accidentally closed.
214
+
215
+ ```bash
216
+ agentmb session seal <session-id>
217
+ agentmb session rm <session-id> # → 423 session_sealed
218
+ ```
219
+
220
+ ```python
221
+ sess.seal()
222
+ sess.close() # → 423 SessionSealedError
223
+ ```
224
+
225
+ Unseal via REST:
226
+ ```
227
+ DELETE /api/v1/sessions/:id/seal
228
+ ```
229
+
230
+ ---
231
+
232
+ ## Profile Management
233
+
234
+ ```python
235
+ # List all profiles
236
+ result = client.list_profiles()
237
+ for p in result.profiles:
238
+     print(p.name, p.path, p.last_used)
239
+
240
+ # Reset a profile (wipes data dir)
241
+ # Returns 409 if a live session is currently using the profile
242
+ client.reset_profile("demo")
243
+ ```
244
+
245
+ REST:
246
+ ```
247
+ GET /api/v1/profiles → ProfileListResult
248
+ POST /api/v1/profiles/:name/reset → ProfileResetResult
249
+ ```
250
+
251
+ ---
252
+
253
+ ## Environment Variables Affecting Sessions
254
+
255
+ | Variable | Default | Notes |
256
+ |---|---|---|
257
+ | `AGENTMB_DATA_DIR` | `~/.agentmb` | Root dir for profiles and logs |
258
+ | `AGENTMB_POLICY_PROFILE` | `safe` | Daemon-wide default policy |
259
+ | `AGENTMB_API_TOKEN` | _(none)_ | Token required on all requests |
260
+ | `AGENTMB_ENCRYPTION_KEY` | _(none)_ | Encrypts profiles at rest with AES-256-GCM |