arise-browser 0.2.3 → 0.3.1

@@ -4,7 +4,7 @@ export function registerHealthRoute(app) {
  return {
  status: "ok",
  connected: session.isConnected,
- version: "0.2.3",
+ version: "0.3.1",
  };
  });
  }
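The health route above reports `connected` next to the bumped `version`, and the skill text later in this diff tells callers to wait until `/health` returns `{"connected":true}`. A minimal sketch of such a wait loop follows. The function name `wait_until_connected`, the retry count, and the delay are illustrative assumptions and not part of the package; the base URL is the one the skill file uses.

```shell
# Poll GET /health until the body reports "connected": true.
# Arguments (all optional, defaults are illustrative assumptions):
#   $1 base URL, $2 max attempts, $3 seconds between attempts
wait_until_connected() {
  wuc_base="${1:-http://localhost:9867}"
  wuc_max="${2:-60}"
  wuc_delay="${3:-2}"
  wuc_try=1
  while [ "$wuc_try" -le "$wuc_max" ]; do
    # -s hides progress output, -f turns HTTP errors into a non-zero exit
    wuc_body="$(curl -sf "$wuc_base/health" || true)"
    # Accept optional whitespace around the colon in the JSON body
    if printf '%s' "$wuc_body" | grep -Eq '"connected"[[:space:]]*:[[:space:]]*true'; then
      return 0
    fi
    wuc_try=$((wuc_try + 1))
    sleep "$wuc_delay"
  done
  return 1
}

# Usage: wait_until_connected http://localhost:9867 60 2 && echo "ready"
```

Matching on the `connected` field rather than on HTTP status alone is deliberate here: the troubleshooting table in this diff notes that health can respond with `connected: false` when Chrome has crashed.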
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "arise-browser",
- "version": "0.2.3",
+ "version": "0.3.1",
  "description": "AI browser automation engine — persistent refs, multi-strategy actions, behavior recording",
  "type": "module",
  "main": "./dist/src/index.js",
@@ -8,20 +8,51 @@ metadata:
  openclaw:
  emoji: "🌐"
  requires:
- bins: ["npx"]
+ bins: ["npx", "docker"]
  ---

  # AriseBrowser

- Control a real Chrome browser via HTTP API. Persistent element refs, YAML accessibility snapshots, WebRTC live view.
+ You are controlling a **real Chrome browser**, like a human sitting in front of a screen. You see the page through snapshots, and you interact by clicking, typing, and selecting — not by writing JavaScript or constructing URLs.

  ## MANDATORY RULES

  **You MUST follow these rules. No exceptions.**

- 1. **Do NOT call any API endpoint until `/health` returns `{"connected":true}`.** The server needs time to start the Docker container and Chrome. Poll `/health` in a loop.
- 2. **Every browser task follows: Navigate → Snapshot → Act → Snapshot → Act → Done.** Always snapshot before acting; you need refs from the snapshot to target elements.
- 3. **Refs are persistent.** Do NOT re-snapshot just to reuse a ref. Only snapshot when the page changes significantly.
+ 1. **Wait for ready.** Do NOT call any endpoint until `/health` returns `{"connected":true}`.
+ 2. **Snapshot is your eyes.** After every navigate or significant action, call `/snapshot` to see what's on the page. Read the snapshot to find element refs (e0, e5, e12...) and understand the page structure.
+ 3. **Act through refs.** To click a button, select a dropdown, or type in a field — use `/action` with the ref from your snapshot. Do NOT construct URLs with query parameters to change page state. Use `select`, `click`, and `type` actions instead.
+ 4. **NEVER use `/evaluate` to extract data.** The snapshot already contains all visible text, links, buttons, and form elements in a structured format. `/evaluate` is only for rare edge cases where data is hidden from the accessibility tree.
+ 5. **NEVER use `/text` as your primary data source.** `/text` returns unstructured plain text that is hard to parse. Use `/snapshot` — it gives you structured elements with refs, roles, names, and links.
+ 6. **Refs are persistent.** Do NOT re-snapshot just to reuse a ref. Only re-snapshot when the page content changes.
+
+ ## How to Think
+
+ You are a person using a browser. Snapshot is your eyes, action is your hands.
+
+ - **To sort results** → find the sort dropdown in the snapshot → use `select` action on its ref
+ - **To search** → find the search box ref → `type` your query → `press_key` Enter
+ - **To go to next page** → find the "Next" button ref → `click` it
+ - **To read product info** → it's already in the snapshot (names, prices, ratings are all there as text)
+
+ ### Example: Sort Amazon results by Best Sellers
+
+ ```bash
+ # 1. Navigate
+ curl -X POST /navigate -d '{"url": "https://amazon.com/s?k=laptop"}'
+
+ # 2. Snapshot — see the page
+ curl /snapshot
+ # → combobox "Sort by:" [ref=e187] with options including "Best Sellers"
+ # → link "Product Name" [ref=e226], generic "4.4" [ref=e231], link "$599" [ref=e246]
+
+ # 3. Select from dropdown using ref
+ curl -X POST /action -d '{"type": "select", "ref": "e187", "value": "exact-aware-popularity-rank"}'
+
+ # 4. Snapshot again — results are now sorted
+ curl /snapshot
+ # → Read product names, prices, ratings directly from snapshot text
+ ```

  ## Step 1: Start the Server

@@ -32,7 +63,7 @@ npm install -g arise-browser

  Start:
  ```bash
- npx arise-browser --virtual-display --port 9867 --host 0.0.0.0 &
+ npx arise-browser --virtual-display --port 9867 &
  ```

  **Wait for ready (MANDATORY):**
@@ -46,25 +77,15 @@ First run takes ~2 minutes (Docker pulls Neko Chrome image ~700MB).

  After installation succeeds, inform the user:

- 1. **Firewall**: Open these ports in your cloud security group:
- - `9867/tcp` — arise-browser API
- - `6090/tcp`: Neko WebRTC UI (browser live view)
- - `52000-52100/udp` — WebRTC media data
- 2. **Watch the browser**: Open `http://<server-ip>:6090` in your browser, password: `neko`
- 3. **HTTPS (optional)**: For production, put a reverse proxy in front. Recommended: [Caddy](https://caddyserver.com/) with `tls internal` (self-signed, no domain needed) or your own domain for auto Let's Encrypt.
- 4. **Passwords**: Default Neko passwords are `neko` (viewer) and `admin` (admin). Change via `--neko-password` and `--neko-admin-password` flags.
+ 1. **Watch the browser**: Open `http://localhost:6090` in your browser, password: `neko`
+ 2. **Remote access**: Add `--host 0.0.0.0` and `ARISE_BROWSER_TOKEN=<secret>` to the start command, then open ports `9867/tcp`, `6090/tcp`, `52000-52100/udp` in your firewall.
+ 3. **Passwords**: Change default Neko passwords via `--neko-password` and `--neko-admin-password` flags.

- ## Step 2: Use the Browser
+ ## Step 2: Core Loop

  Base URL: `http://localhost:9867`

- Every task follows this loop:
-
- ```
- Navigate → Snapshot → Act → Snapshot → Act → ... → Done
- ```
-
- ### Navigate
+ ### Navigate to a URL

  ```bash
  curl -X POST http://localhost:9867/navigate \
@@ -72,61 +93,55 @@ curl -X POST http://localhost:9867/navigate \
  -d '{"url": "https://example.com"}'
  ```

- ### Snapshot (get page state)
+ ### Snapshot: see the page

- Returns a YAML accessibility tree with element refs (e0, e5, e12...).
+ Returns a YAML accessibility tree. Every interactive element has a ref you can act on.

  ```bash
- # Full snapshot
  curl http://localhost:9867/snapshot
+ ```

- # Diff mode: only changes since last snapshot (saves tokens)
- curl "http://localhost:9867/snapshot?diff=true"
+ What you'll see in a snapshot:
+ ```yaml
+ - combobox "Sort by:" [ref=e187] ← dropdown, use select action
+ - link "Product Name" [ref=e226] ← clickable link
+ - textbox "Search" [ref=e14] ← input field, use type action
+ - button "Add to cart" [ref=e281] ← button, use click action
+ - generic "4.4" [ref=e231] ← text content (rating)
+ - generic "$599.99" [ref=e246] ← text content (price)
  ```

- ### Act on elements
+ Use `?diff=true` after the first snapshot to only see changes (saves tokens).

- Use refs from the snapshot. Refs are **persistent**: they survive across snapshots, no need to re-snapshot before reusing a ref.
+ ### Act: interact with elements
+
+ Use the ref from your snapshot:

  ```bash
- # Click
+ # Click a link or button
  curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
- -d '{"type": "click", "ref": "e5"}'
+ -d '{"type": "click", "ref": "e226"}'

- # Type text
+ # Type in a text field
  curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
- -d '{"type": "type", "ref": "e12", "text": "search query"}'
+ -d '{"type": "type", "ref": "e14", "text": "search query"}'

- # Press key
+ # Press a key (Enter, Tab, Escape, etc.)
  curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
  -d '{"type": "press_key", "key": "Enter"}'

- # Scroll
+ # Select from a dropdown
  curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
- -d '{"type": "scroll", "direction": "down", "amount": 500}'
+ -d '{"type": "select", "ref": "e187", "value": "option-value"}'

- # Hover
+ # Scroll down
  curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
- -d '{"type": "hover", "ref": "e7"}'
-
- # Select dropdown
- curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
- -d '{"type": "select", "ref": "e3", "value": "option1"}'
+ -d '{"type": "scroll", "direction": "down", "amount": 500}'
  ```

- ### Extract content
+ ### Repeat

- ```bash
- # Page text
- curl http://localhost:9867/text
-
- # Screenshot (JPEG)
- curl http://localhost:9867/screenshot > screenshot.jpg
-
- # Execute JavaScript
- curl -X POST http://localhost:9867/evaluate -H "Content-Type: application/json" \
- -d '{"expression": "document.title"}'
- ```
+ After each action that changes the page, snapshot again to see the result. Then act on the new refs.

  ## Step 3: Stop

@@ -138,11 +153,11 @@ The Docker container is automatically stopped and cleaned up.

  ## Tips

+ - **Read the snapshot carefully.** Product names, prices, ratings, links — they're all there. No need for JavaScript or regex.
  - Use `?diff=true` after the first snapshot to save tokens.
- - Refs persist across snapshots — don't re-snapshot just to reuse a ref.
  - Batch actions: `POST /actions` with `{"actions": [...], "stopOnError": true}`.
  - Tabs: `GET /tabs`, `POST /tab` with `{"action": "create|switch|close"}`.
- - Use `tabId` param on any endpoint to target a specific tab without switching.
+ - Screenshot (`GET /screenshot`) is useful to show the user what you see.

  ## Troubleshooting

@@ -151,9 +166,5 @@ The Docker container is automatically stopped and cleaned up.
  | First run slow | Docker pulling Neko image (~700MB), wait ~2 min |
  | Health returns `connected: false` | Chrome crashed — restart arise-browser |
  | Neko UI loads but no video | Open UDP 52000-52100 in firewall/security group |
- | Neko UI click no response | Use admin password `admin`, or restart container (implicit hosting enabled) |
- | Action returns error | Snapshot first to get valid refs, then act |
-
- ## Full API Reference
-
- See [references/api.md](references/api.md) for all endpoints, parameters, and advanced features (recording, PDF export, batch actions).
+ | Action returns error | Snapshot first to get valid refs, then act on them |
+ | Can't find an element | Scroll down and snapshot again; the element may be below the fold |
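
The new troubleshooting rows both come down to reading a valid ref out of a fresh snapshot. As a rough illustration, the sketch below pulls a ref from snapshot text, assuming each element appears on one line shaped like `- role "name" [ref=eN]`, as in the sample lines quoted in this diff. The helper name `find_ref` is hypothetical, and the real snapshot output may be nested or carry extra attributes that this does not handle.

```shell
# Print the ref of the first snapshot line matching a role and accessible name.
# Reads snapshot text on stdin. Assumes lines shaped like:
#   - role "name" [ref=eN]
find_ref() {
  fr_role="$1"
  fr_name="$2"
  # -F treats the pattern as a fixed string, so names with regex
  # metacharacters (such as "$599.99") match literally.
  grep -F -- "$fr_role \"$fr_name\" [ref=" \
    | head -n 1 \
    | sed -n 's/.*\[ref=\(e[0-9][0-9]*\)\].*/\1/p'
}

# Usage (assumes the /snapshot response body is the YAML text itself):
#   ref="$(curl -s http://localhost:9867/snapshot | find_ref textbox "Search")"
```

An empty result means the element was not in the snapshot, which is the case the last table row addresses by scrolling and snapshotting again.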