agentmb 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

package/README.md +188 -4
package/dist/browser/actions.d.ts +1 -1
package/dist/browser/actions.d.ts.map +1 -1
package/dist/browser/actions.js +4 -3
package/dist/browser/actions.js.map +1 -1
package/dist/browser/manager.d.ts +21 -0
package/dist/browser/manager.d.ts.map +1 -1
package/dist/browser/manager.js +127 -12
package/dist/browser/manager.js.map +1 -1
package/dist/cli/commands/actions.d.ts.map +1 -1
package/dist/cli/commands/actions.js +37 -10
package/dist/cli/commands/actions.js.map +1 -1
package/dist/cli/commands/session.d.ts.map +1 -1
package/dist/cli/commands/session.js +9 -0
package/dist/cli/commands/session.js.map +1 -1
package/dist/cli/index.js +1 -1
package/dist/daemon/routes/actions.d.ts.map +1 -1
package/dist/daemon/routes/actions.js +97 -12
package/dist/daemon/routes/actions.js.map +1 -1
package/dist/daemon/routes/interaction.d.ts.map +1 -1
package/dist/daemon/routes/interaction.js +10 -1
package/dist/daemon/routes/interaction.js.map +1 -1
package/dist/daemon/routes/sessions.d.ts.map +1 -1
package/dist/daemon/routes/sessions.js +107 -1
package/dist/daemon/routes/sessions.js.map +1 -1
package/dist/daemon/server.js +1 -1
package/package.json +4 -2
package/skills/agentmb/SKILL.md +541 -0
package/skills/agentmb/references/authentication.md +180 -0
package/skills/agentmb/references/browser-modes.md +167 -0
package/skills/agentmb/references/commands.md +231 -0
package/skills/agentmb/references/locator-modes.md +254 -0
package/skills/agentmb/references/session-management.md +260 -0

package/skills/agentmb/references/locator-modes.md ADDED

@@ -0,0 +1,254 @@
+# Locator Modes — Deep Reference
+Four modes for targeting page elements. **Start at Priority 1, move down only if needed.**
+---
+## Priority Order
+```
+Priority 1: element-map  →  --element-id      default; works for most pages
+Priority 2: CSS Selector  →  direct             when element-map labels are empty/unreliable
+Priority 3: snapshot-map →  --ref-id           when you need atomicity or batch operations
+Priority 4: click-at coordinates               last resort; contenteditable, canvas
+```
+---
+## Priority 1 — element-map + --element-id
+### How it works
+`element-map` injects a stable `element_id` (`e1`, `e2`, …) into every interactable element on the page and returns a list with label, tag, and selector. The IDs persist until the next `element-map` call.
+```bash
+agentmb element-map <session-id>
+agentmb element-map <session-id> --include-unlabeled   # also include icon-only elements
+```
+Example output:
+```
+e1  [button]  Submit
+e2  [input]   Email address    (placeholder="Enter email")
+e3  [a]       Sign in
+e4  [button]  ☰               (label_source=none — icon only)
+```
+Pass the ID to any action:
+```bash
+agentmb click  <session-id> e1 --element-id
+agentmb fill   <session-id> e2 "user@example.com" --element-id
+agentmb get    <session-id> text e3 --element-id
+agentmb assert <session-id> visible e1 --element-id
+agentmb bbox   <session-id> e1 --element-id
+```
+### label_source Priority Chain
+The `label` field is synthesized by checking sources in this order (first non-empty wins):
+| Priority | Source | `label_source` value |
+|---|---|---|
+| 1 | `aria-label` attribute | `"aria-label"` |
+| 2 | `title` attribute | `"title"` |
+| 3 | `aria-labelledby` target text | `"aria-labelledby"` |
+| 4 | SVG `<title>` / `<desc>` | `"svg-title"` |
+| 5 | `innerText` (trimmed) | `"text"` |
+| 6 | `placeholder` attribute | `"placeholder"` |
+| 7 | Fallback (icon-only) | `"none"` / `"[tag @ x,y]"` |
+If `label_source=none`, the element has no readable label. Add `--include-unlabeled` to get a coordinate-based `[tag @ x,y]` fallback, or switch to CSS selector (Priority 2).
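The priority chain in the table can be sketched as a small pure function. This only illustrates the ordering ("first non-empty wins"); it is not agentmb's implementation. The function name `synthesize_label` and the dictionary keys `aria_labelledby_text`, `svg_title`, and `inner_text` are hypothetical stand-ins for values the scanner would read from the DOM.

```python
from typing import Mapping, Optional, Tuple

# (key in the hypothetical attribute dict, label_source value from the table)
LABEL_SOURCES = [
    ("aria-label", "aria-label"),
    ("title", "title"),
    ("aria_labelledby_text", "aria-labelledby"),
    ("svg_title", "svg-title"),
    ("inner_text", "text"),
    ("placeholder", "placeholder"),
]


def synthesize_label(attrs: Mapping[str, Optional[str]]) -> Tuple[str, str]:
    """Return (label, label_source); the first non-empty source wins."""
    for key, source in LABEL_SOURCES:
        value = (attrs.get(key) or "").strip()
        if value:
            return value, source
    # Nothing readable: corresponds to the icon-only fallback row.
    return "", "none"
```

A whitespace-only `aria-label` is treated as empty here, so a later source such as `placeholder` can still win; whether the real scanner trims every source the same way is an assumption.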
+### Best for
+- Text-rich pages: docs, GitHub, Hacker News, dashboards
+- Forms with labeled inputs
+- Buttons with accessible text
+---
+## Priority 2 — CSS Selector
+Pass a CSS selector directly — no prior scan needed.
+```bash
+agentmb click  <session-id> "button[data-testid=submit]"
+agentmb fill   <session-id> "#email" "user@example.com"
+agentmb get    <session-id> text ".product-title"
+agentmb assert <session-id> visible ".modal"
+```
+Python SDK:
+```python
+sess.click(selector="button[data-testid=submit]")
+sess.fill(selector="#email", value="user@example.com")
+```
+### Best for
+- Icon-dense SPAs where `element-map` returns `label_source=none` for most elements
+- Pages with stable, predictable `data-testid` or `id` attributes
+- When you already know the selector (no scan needed)
+### When NOT to use
+- Selectors with dynamic class names like `.css-3xk7a9`: they break on re-render. Use element-map or snapshot-map instead.
+---
+## Priority 3 — snapshot-map + --ref-id
+### How it works
+`snapshot-map` captures a server-side snapshot of the page's element state with a `page_rev` counter. Each element gets a stable `ref_id` (`snap_XXXXXX:eN`). The ref is valid as long as the page has not navigated since the snapshot.
+```bash
+agentmb snapshot-map <session-id>
+agentmb snapshot-map <session-id> --include-unlabeled
+```
+Example output:
+```
+snap_000001:e1  [button]  Login
+snap_000001:e3  [input]   Username
+snap_000001:e7  [a]       Forgot password?
+```
+Pass the ref_id to any action:
+```bash
+agentmb click  <session-id> snap_000001:e1 --ref-id
+agentmb fill   <session-id> snap_000001:e3 "alice" --ref-id
+```
+Python SDK:
+```python
+snap = sess.snapshot_map()
+# Find by label
+btn = next(e for e in snap.elements if "Login" in (e.label or ""))
+sess.click(ref_id=btn.ref_id)
+# Or use in run_steps
+sess.run_steps([
+    {"action": "click", "params": {"ref_id": btn.ref_id}},
+    {"action": "fill",  "params": {"ref_id": snap.elements[2].ref_id, "value": "alice"}},
+])
+```
+### ref_id Format
+```
+snap_XXXXXX:eN
+│             │
+│             └─ element index within the snapshot
+└─ 6-char snapshot ID (hex)
+```
+Examples: `snap_000001:e1`, `snap_a3f9c2:e15`
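A client that stores or logs refs may want to split them into their two parts. The following is a minimal sketch based only on the format shown above (6 hex characters, then `:e` and an integer index); `RefId` and `parse_ref_id` are hypothetical helper names, not part of the SDK.

```python
import re
from typing import NamedTuple


class RefId(NamedTuple):
    snapshot_id: str    # 6-char hex snapshot ID, e.g. "a3f9c2"
    element_index: int  # N in "eN"


_REF_ID_RE = re.compile(r"snap_([0-9a-fA-F]{6}):e(\d+)")


def parse_ref_id(ref_id: str) -> RefId:
    """Split 'snap_XXXXXX:eN' into its snapshot ID and element index."""
    match = _REF_ID_RE.fullmatch(ref_id)
    if match is None:
        raise ValueError(f"not a valid ref_id: {ref_id!r}")
    return RefId(match.group(1), int(match.group(2)))
```

Grouping refs by `snapshot_id` makes it easy to see which ones came from the same `snapshot-map` call and would therefore go stale together.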
+### page_rev — Detecting Page Changes
+`page_rev` is an integer that increments on every main-frame navigation. Poll it cheaply to detect page changes without taking a full snapshot:
+```
+# HTTP
+GET /api/v1/sessions/:id/page_rev
+→ { "status": "ok", "session_id": "...", "page_rev": 3, "url": "https://..." }
+```
+```python
+rev = sess.page_rev()   # PageRevResult: .page_rev, .url
+```
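Polling `page_rev` can be wrapped in a small wait helper. This is a sketch under stated assumptions: the helper name, timeout, and interval are illustrative, and the page revision is supplied through a plain callable so the loop does not depend on any particular client.

```python
import time
from typing import Callable


def wait_for_page_rev_change(
    get_page_rev: Callable[[], int],
    baseline: int,
    timeout_s: float = 10.0,
    interval_s: float = 0.25,
) -> int:
    """Poll until page_rev differs from baseline; return the new value."""
    deadline = time.monotonic() + timeout_s
    while True:
        current = get_page_rev()
        if current != baseline:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page_rev still {baseline} after {timeout_s}s")
        time.sleep(interval_s)
```

With the SDK call shown above, usage would look roughly like `wait_for_page_rev_change(lambda: sess.page_rev().page_rev, baseline=rev.page_rev)`.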
+### Stale Ref Detection and Recovery
+If the page has navigated since the snapshot, using a stale `ref_id` returns:
+```
+HTTP 409 stale_ref
+{
+  "error": "stale_ref: page changed",
+  "suggestions": ["call snapshot_map to get fresh ref_ids", "re-run your step with the new ref_id"]
+}
+```
+Recovery pattern:
+```python
+import httpx
+
+try:
+    sess.click(ref_id="snap_000001:e1")
+except httpx.HTTPStatusError as e:
+    if e.response.status_code != 409:
+        raise                               # not a stale ref; surface it
+    snap = sess.snapshot_map()              # refresh
+    btn = next(el for el in snap.elements if "Login" in (el.label or ""))
+    sess.click(ref_id=btn.ref_id)           # retry
+### run_steps + ref_id
+In `run_steps`, each step with a stale `ref_id` returns a step-level error (not a request crash). Use `stop_on_error=False` to continue remaining steps past a single stale ref.
+```python
+result = sess.run_steps(steps, stop_on_error=False)
+for step in result.results:
+    if step.error and "stale_ref" in str(step.error):
+        # handle the stale ref for this specific step
+        pass
+```
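One way to act on step-level errors is to collect the indices of the stale steps first, then take a single fresh snapshot and re-run only those steps. The sketch below assumes only what is described above (each result has an `error` attribute); `stale_step_indices` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real step results.

```python
from types import SimpleNamespace
from typing import Iterable, List


def stale_step_indices(results: Iterable) -> List[int]:
    """Indices of step results whose error mentions stale_ref."""
    return [
        i for i, step in enumerate(results)
        if getattr(step, "error", None) and "stale_ref" in str(step.error)
    ]


# Stand-in objects shaped like the step results described above.
fake_results = [
    SimpleNamespace(error=None),
    SimpleNamespace(error="stale_ref: page changed"),
    SimpleNamespace(error="timeout"),
]
print(stale_step_indices(fake_results))  # → [1]
```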
+### Best for
+- Dynamic/reactive pages where element positions change
+- Batch operations (`run_steps`) where you need consistent refs across all steps
+- When you need to confirm an element's existence at snapshot time before acting
+---
+## Priority 4 — Coordinates (click-at)
+Use for `contenteditable` regions, canvas elements, and custom components, or when all other modes fail.
+```bash
+agentmb bbox <session-id> "#editor"
+# → { "x": 100, "y": 200, "width": 400, "height": 300, "center_x": 300, "center_y": 350 }
+agentmb click-at <session-id> 300 350     # absolute page coordinates
+agentmb wheel    <session-id> --dx 0 --dy 300
+```
+Python SDK:
+```python
+box = sess.bbox("#editor")
+sess.click_at(x=box.center_x, y=box.center_y)
+```
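The `center_x` and `center_y` values in the sample `bbox` output are consistent with the midpoint of the rectangle. A minimal sketch of that arithmetic, using a hypothetical `Box` dataclass rather than the SDK's own result type:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float
    y: float
    width: float
    height: float

    @property
    def center_x(self) -> float:
        return self.x + self.width / 2

    @property
    def center_y(self) -> float:
        return self.y + self.height / 2


# Same numbers as the bbox example above.
box = Box(x=100, y=200, width=400, height=300)
print(box.center_x, box.center_y)  # → 300.0 350.0
```

Computing an offset target (for example, a point near the top-left corner of an editor) follows the same pattern with a different fraction of `width` and `height`.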
+---
+## Mode Comparison Table
+| Dimension | element-map | CSS Selector | snapshot-map | click-at |
+|---|---|---|---|---|
+| Requires prior scan | Yes | No | Yes | Requires `bbox` |
+| Stable across re-render | Yes (until re-map) | Depends on selector | Until nav | Always |
+| Detects stale state | No | No | Yes (409) | No |
+| Works for icon-only | With `--include-unlabeled` | Yes | With `--include-unlabeled` | Yes |
+| Good for run_steps | OK | OK | Best (stale detection) | Not practical |
+| Token cost | Scan needed | Zero | Scan needed | Scan needed |
+---
+## Semantic Find (Alternative Locator)
+Locate elements by Playwright semantic locators without knowing selectors. Returns `found`, `count`, `bbox`.
+```python
+# query_type: 'role' | 'text' | 'label' | 'placeholder' | 'alt_text'
+res = sess.find(query_type="role", query="button", name="Submit")
+res = sess.find(query_type="text", query="Sign in", exact=True)
+res = sess.find(query_type="placeholder", query="Search…")
+res = sess.find(query_type="label", query="Email address")
+```
+CLI:
+```bash
+agentmb find <session-id> role button --name "Submit"
+agentmb find <session-id> text "Sign in"
+agentmb find <session-id> placeholder "Search…" --json
+```
+Use `find` as a complement to element-map when you know the semantic intent (role, label) but not the CSS selector.

package/skills/agentmb/references/session-management.md ADDED

@@ -0,0 +1,260 @@
+# Session Management — Deep Reference
+---
+## Session Lifecycle
+```
+create  →  active  →  sealed (optional)  →  closed
+           │                                  │
+           └── all actions available          └── profile data persists (if named)
+                                                   or is deleted (if ephemeral)
+```
+States:
+- **active**: session is running, accepts all commands
+- **sealed**: protected from deletion (`423` on `rm`); all actions still work
+- **zombie**: browser process died unexpectedly; session entry remains but is non-functional
+---
+## Session Creation Options
+### Named Profile (Persistent)
+Cookies, localStorage, and browser state persist across runs. Use the same `--profile` name to reuse saved state.
+```bash
+agentmb session new --profile gmail-account
+agentmb session new --profile shopify-store --headed
+```
+```python
+sess = client.sessions.create(profile="gmail-account")
+```
+Profile data is stored under `AGENTMB_DATA_DIR/profiles/<name>/` (default `~/.agentmb/profiles/`).
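To locate a profile's data from a script, the path rule above can be sketched as follows. `profile_dir` is a hypothetical helper, and expanding `~` on the client side is an assumption about how the default is resolved.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def profile_dir(name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve <AGENTMB_DATA_DIR>/profiles/<name>, defaulting to ~/.agentmb."""
    env = os.environ if env is None else env
    data_dir = env.get("AGENTMB_DATA_DIR") or "~/.agentmb"
    return Path(data_dir).expanduser() / "profiles" / name
```

Passing `env` explicitly keeps the function easy to test; calling `profile_dir("gmail-account")` with no second argument reads the real environment.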
+### Pure Sandbox (Ephemeral)
+Uses a temp directory; all data is auto-deleted on `close()` or daemon restart.
+```bash
+agentmb session new --ephemeral
+```
+```python
+sess = client.sessions.create(ephemeral=True)
+```
+### Headed vs Headless
+```bash
+agentmb session new --profile demo              # headless (default)
+agentmb session new --profile demo --headed     # visible browser window
+```
+```python
+sess = client.sessions.create(profile="demo", headless=False)
+```
+Linux headed mode requires Xvfb: `sudo apt-get install -y xvfb && bash scripts/xvfb-headed.sh`
+### Downloads
+File downloads are disabled by default. Enable at creation time:
+```bash
+agentmb session new --accept-downloads
+```
+```python
+sess = client.sessions.create(accept_downloads=True)
+```
+### Policy Profile
+Each session has a policy profile that sets rate limits and action guardrails. Override it at creation:
+```bash
+agentmb session new --profile demo --policy permissive
+```
+```python
+sess = client.sessions.create(profile="demo", policy="permissive")
+```
+Policy profiles:
+| Profile | Min interval | Max actions/min | Sensitive actions |
+|---|---|---|---|
+| `safe` | 1500 ms | 8 | blocked |
+| `permissive` | 200 ms | 60 | allowed |
+| `disabled` | 0 ms | unlimited | allowed |
+Change policy for a running session:
+```bash
+agentmb policy <sid> permissive
+agentmb policy <sid> safe --allow-sensitive
+```
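A client can pace itself to stay inside a policy instead of running into the limit. The sketch below encodes the policy table and computes how long to wait before the next action. This reference does not say how the daemon enforces the limits, so the sliding 60-second window is an assumption, and `Policy` and `wait_ms` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Policy:
    min_interval_ms: int
    max_actions_per_min: Optional[int]  # None = unlimited


# Values copied from the policy table above.
POLICIES = {
    "safe": Policy(1500, 8),
    "permissive": Policy(200, 60),
    "disabled": Policy(0, None),
}


def wait_ms(policy: Policy, past_ms: Sequence[int], now_ms: int) -> int:
    """Milliseconds to wait before the next action is within limits.

    past_ms: ascending timestamps (ms) of earlier actions.
    """
    wait = 0
    if past_ms:
        # Respect the minimum gap since the most recent action.
        wait = max(wait, past_ms[-1] + policy.min_interval_ms - now_ms)
    limit = policy.max_actions_per_min
    if limit is not None and len(past_ms) >= limit:
        # The limit-th most recent action must be at least 60 s old.
        wait = max(wait, past_ms[-limit] + 60_000 - now_ms)
    return max(wait, 0)
```

Under `safe`, an action attempted 500 ms after the previous one would wait another 1000 ms; under `disabled`, the function always returns 0.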
+---
+## Session Commands
+```bash
+agentmb session new [flags]        # create; prints session-id
+agentmb session list               # list all active sessions
+agentmb session get <sid>          # show details: profile, headless, url, created_at
+agentmb session rm <sid>           # close + delete
+agentmb session seal <sid>         # protect from deletion
+agentmb settings <sid>             # show viewport, user_agent, headless, url, profile
+```
+Python SDK:
+```python
+sess = client.sessions.create(profile="demo")
+sessions = client.sessions.list()
+info = client.sessions.get(sess.id)
+sess.seal()
+sess.close()
+settings = sess.get_settings()   # viewport, user_agent, headless, url, profile
+```
+---
+## Multi-Page (Tabs) Management
+Multiple tabs in the same session share profile (cookies, storage) but have independent navigation state.
+```bash
+agentmb pages list <sid>                    # list all tabs
+agentmb pages new <sid>                     # open new blank tab → returns page-id
+agentmb pages switch <sid> <page-id>        # make tab active target
+agentmb pages close <sid> <page-id>         # close tab
+# Note: closing the last tab returns 409 (session must have ≥ 1 tab)
+```
+Python SDK:
+```python
+# Open additional tabs
+page2_id = sess.new_page()         # returns page_id
+page3_id = sess.new_page()
+# Switch between tabs
+sess.switch_page(page2_id)
+sess.navigate("https://other.example.com")
+# List tabs
+pages = sess.pages()               # list[PageInfo]: .page_id, .url, .title, .active
+# Close tab
+sess.close_page(page3_id)
+# Work on original tab
+sess.switch_page(pages[0].page_id)
+```
+---
+## page_id Direct Targeting (R09-C03)
+Instead of switching the active tab before every action, pass `page_id` directly to any action
+request. All major action routes support this param:
+`navigate`, `click`, `fill`, `type`, `press`, `eval`, `screenshot`, `element_map`, `snapshot_map`, `scroll`.
+```python
+# Create session + open multiple tabs
+p1 = sess.pages()[0].page_id
+p2 = sess.new_page()   # returns page_id string
+p3 = sess.new_page()
+# Navigate each independently — no switch_page() needed
+sess.navigate("https://site.com/a", page_id=p1)
+sess.navigate("https://site.com/b", page_id=p2)
+sess.navigate("https://site.com/c", page_id=p3)
+# element_map + interact on a non-active tab
+em = sess.element_map(page_id=p2)
+sess.click(element_id="e3", page_id=p2)
+# Screenshot any tab
+shot = sess.screenshot(page_id=p3)
+```
+REST (add `page_id` to request body):
+```
+POST /api/v1/sessions/:id/navigate
+{ "url": "https://example.com", "page_id": "page_abc123" }
+```
+Error: `404` if `page_id` not found in session — call `GET /api/v1/sessions/:id/pages` to list valid IDs.
+---
+## Multi-Agent Concurrency
+Different agents can share a daemon but must use **separate sessions** (different profiles).
+```bash
+# Agent A
+agentmb session new --profile agent-a-work
+# Agent B (separate, isolated)
+agentmb session new --profile agent-b-work
+```
+Sessions are fully isolated: cookies, navigation, and page state do not leak between them.
+**Concurrent access to the same session** is not recommended — actions are not queued, and concurrent commands on one session may produce unpredictable results.
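If several threads in one process must share a session anyway, the client can serialize its own commands. This is a sketch of that idea, not an agentmb feature: `run_exclusive` is a hypothetical helper, and it only protects callers inside the same process. Agents in separate processes should still use separate sessions.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

_session_locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
_registry_lock = threading.Lock()


def run_exclusive(session_id: str, action: Callable[[], T]) -> T:
    """Run action while holding the lock for session_id."""
    with _registry_lock:  # guard creation of the per-session lock
        lock = _session_locks[session_id]
    with lock:
        return action()
```

Usage would look roughly like `run_exclusive(sess.id, lambda: sess.click(selector="#submit"))`, so two threads never issue commands to the same session at once.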
+---
+## Session Seal
+Sealed sessions cannot be deleted until explicitly unsealed. Useful for long-running sessions that should not be accidentally closed.
+```bash
+agentmb session seal <session-id>
+agentmb session rm <session-id>   # → 423 session_sealed
+```
+```python
+sess.seal()
+sess.close()  # → 423 SessionSealedError
+```
+Unseal via REST:
+```
+DELETE /api/v1/sessions/:id/seal
+```
+---
+## Profile Management
+```python
+# List all profiles
+result = client.list_profiles()
+for p in result.profiles:
+    print(p.name, p.path, p.last_used)
+# Reset a profile (wipes data dir)
+# Returns 409 if a live session is currently using the profile
+client.reset_profile("demo")
+```
+REST:
+```
+GET  /api/v1/profiles              → ProfileListResult
+POST /api/v1/profiles/:name/reset  → ProfileResetResult
+```
+---
+## Environment Variables Affecting Sessions
+| Variable | Default | Notes |
+|---|---|---|
+| `AGENTMB_DATA_DIR` | `~/.agentmb` | Root dir for profiles and logs |
+| `AGENTMB_POLICY_PROFILE` | `safe` | Daemon-wide default policy |
+| `AGENTMB_API_TOKEN` | _(none)_ | Token required on all requests |
+| `AGENTMB_ENCRYPTION_KEY` | _(none)_ | AES-256-GCM encrypt profiles at rest |