arise-browser 0.2.3 → 0.3.0

@@ -4,7 +4,7 @@ export function registerHealthRoute(app) {
  return {
  status: "ok",
  connected: session.isConnected,
- version: "0.2.3",
+ version: "0.3.0",
  };
  });
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "arise-browser",
- "version": "0.2.3",
+ "version": "0.3.0",
  "description": "AI browser automation engine — persistent refs, multi-strategy actions, behavior recording",
  "type": "module",
  "main": "./dist/src/index.js",
@@ -13,15 +13,46 @@ metadata:

  # AriseBrowser

- Control a real Chrome browser via HTTP API. Persistent element refs, YAML accessibility snapshots, WebRTC live view.
+ You are controlling a **real Chrome browser**, like a human sitting in front of a screen. You see the page through snapshots, and you interact by clicking, typing, and selecting — not by writing JavaScript or constructing URLs.

  ## MANDATORY RULES

  **You MUST follow these rules. No exceptions.**

- 1. **Do NOT call any API endpoint until `/health` returns `{"connected":true}`.** The server needs time to start the Docker container and Chrome. Poll `/health` in a loop.
- 2. **Every browser task follows: Navigate → Snapshot → Act → Snapshot → Act → Done.** Always snapshot before acting; you need refs from the snapshot to target elements.
- 3. **Refs are persistent.** Do NOT re-snapshot just to reuse a ref. Only snapshot when the page changes significantly.
+ 1. **Wait for ready.** Do NOT call any endpoint until `/health` returns `{"connected":true}`.
+ 2. **Snapshot is your eyes.** After every navigate or significant action, call `/snapshot` to see what's on the page. Read the snapshot to find element refs (e0, e5, e12...) and understand the page structure.
+ 3. **Act through refs.** To click a button, select a dropdown, or type in a field — use `/action` with the ref from your snapshot. Do NOT construct URLs with query parameters to change page state. Use `select`, `click`, and `type` actions instead.
+ 4. **NEVER use `/evaluate` to extract data.** The snapshot already contains all visible text, links, buttons, and form elements in a structured format. `/evaluate` is only for rare edge cases where data is hidden from the accessibility tree.
+ 5. **NEVER use `/text` as your primary data source.** `/text` returns unstructured plain text that is hard to parse. Use `/snapshot` — it gives you structured elements with refs, roles, names, and links.
+ 6. **Refs are persistent.** Do NOT re-snapshot just to reuse a ref. Only re-snapshot when the page content changes.
+
+ ## How to Think
+
+ You are a person using a browser. Snapshot is your eyes, action is your hands.
+
+ - **To sort results** → find the sort dropdown in the snapshot → use `select` action on its ref
+ - **To search** → find the search box ref → `type` your query → `press_key` Enter
+ - **To go to next page** → find the "Next" button ref → `click` it
+ - **To read product info** → it's already in the snapshot (names, prices, ratings are all there as text)
+
+ ### Example: Sort Amazon results by Best Sellers
+
+ ```bash
+ # 1. Navigate
+ curl -X POST /navigate -d '{"url": "https://amazon.com/s?k=laptop"}'
+
+ # 2. Snapshot — see the page
+ curl /snapshot
+ # → combobox "Sort by:" [ref=e187] with options including "Best Sellers"
+ # → link "Product Name" [ref=e226], generic "4.4" [ref=e231], link "$599" [ref=e246]
+
+ # 3. Select from dropdown using ref
+ curl -X POST /action -d '{"type": "select", "ref": "e187", "value": "exact-aware-popularity-rank"}'
+
+ # 4. Snapshot again — results are now sorted
+ curl /snapshot
+ # → Read product names, prices, ratings directly from snapshot text
+ ```

  ## Step 1: Start the Server
 
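Note on the hunk above: the new rule 1 drops the old "Poll `/health` in a loop" hint, so a minimal sketch of that wait may help. It assumes the base URL `http://localhost:9867` given later in the file and the `connected` field returned by the health route; the function names, retry count, and sleep interval are hypothetical and not part of the package.

```bash
#!/bin/sh
# Sketch only: block until /health reports "connected": true.
# BASE_URL comes from the doc; the retry knobs below are assumptions.
BASE_URL="${BASE_URL:-http://localhost:9867}"

# Succeeds when the JSON read from stdin contains "connected": true.
is_connected() {
  grep -Eq '"connected"[[:space:]]*:[[:space:]]*true'
}

# Poll /health up to $1 times (default 60), sleeping $2 seconds (default 2)
# between attempts. Returns 0 once connected, 1 if the attempts run out.
wait_for_ready() {
  max_tries="${1:-60}"
  sleep_secs="${2:-2}"
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    if curl -fsS "$BASE_URL/health" 2>/dev/null | is_connected; then
      return 0
    fi
    i=$((i + 1))
    sleep "$sleep_secs"
  done
  return 1
}
```

A caller would run `wait_for_ready || exit 1` before the first `/navigate` request.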
@@ -51,20 +82,14 @@ After installation succeeds, inform the user:
  - `6090/tcp` — Neko WebRTC UI (browser live view)
  - `52000-52100/udp` — WebRTC media data
  2. **Watch the browser**: Open `http://<server-ip>:6090` in your browser, password: `neko`
- 3. **HTTPS (optional)**: For production, put a reverse proxy in front. Recommended: [Caddy](https://caddyserver.com/) with `tls internal` (self-signed, no domain needed) or your own domain for auto Let's Encrypt.
- 4. **Passwords**: Default Neko passwords are `neko` (viewer) and `admin` (admin). Change via `--neko-password` and `--neko-admin-password` flags.
+ 3. **HTTPS (optional)**: For production, put Caddy in front with `tls internal` (self-signed, no domain needed).
+ 4. **Passwords**: Default Neko passwords are `neko` (viewer) and `admin` (admin). Change via CLI flags.

- ## Step 2: Use the Browser
+ ## Step 2: Core Loop

  Base URL: `http://localhost:9867`

- Every task follows this loop:
-
- ```
- Navigate → Snapshot → Act → Snapshot → Act → ... → Done
- ```
-
- ### Navigate
+ ### Navigate to a URL

  ```bash
  curl -X POST http://localhost:9867/navigate \
@@ -72,61 +97,55 @@ curl -X POST http://localhost:9867/navigate \
  -d '{"url": "https://example.com"}'
  ```

- ### Snapshot (get page state)
+ ### Snapshot: see the page

- Returns a YAML accessibility tree with element refs (e0, e5, e12...).
+ Returns a YAML accessibility tree. Every interactive element has a ref you can act on.

  ```bash
- # Full snapshot
  curl http://localhost:9867/snapshot
+ ```

- # Diff mode: only changes since last snapshot (saves tokens)
- curl "http://localhost:9867/snapshot?diff=true"
+ What you'll see in a snapshot:
+ ```yaml
+ - combobox "Sort by:" [ref=e187] ← dropdown, use select action
+ - link "Product Name" [ref=e226] ← clickable link
+ - textbox "Search" [ref=e14] ← input field, use type action
+ - button "Add to cart" [ref=e281] ← button, use click action
+ - generic "4.4" [ref=e231] ← text content (rating)
+ - generic "$599.99" [ref=e246] ← text content (price)
  ```

- ### Act on elements
+ Use `?diff=true` after the first snapshot to only see changes (saves tokens).

- Use refs from the snapshot. Refs are **persistent**: they survive across snapshots, no need to re-snapshot before reusing a ref.
+ ### Act: interact with elements
+
+ Use the ref from your snapshot:

  ```bash
- # Click
+ # Click a link or button
  curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
- -d '{"type": "click", "ref": "e5"}'
+ -d '{"type": "click", "ref": "e226"}'

- # Type text
+ # Type in a text field
  curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
- -d '{"type": "type", "ref": "e12", "text": "search query"}'
+ -d '{"type": "type", "ref": "e14", "text": "search query"}'

- # Press key
+ # Press a key (Enter, Tab, Escape, etc.)
  curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
  -d '{"type": "press_key", "key": "Enter"}'

- # Scroll
+ # Select from a dropdown
  curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
- -d '{"type": "scroll", "direction": "down", "amount": 500}'
+ -d '{"type": "select", "ref": "e187", "value": "option-value"}'

- # Hover
+ # Scroll down
  curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
- -d '{"type": "hover", "ref": "e7"}'
-
- # Select dropdown
- curl -X POST http://localhost:9867/action -H "Content-Type: application/json" \
- -d '{"type": "select", "ref": "e3", "value": "option1"}'
+ -d '{"type": "scroll", "direction": "down", "amount": 500}'
  ```

- ### Extract content
+ ### Repeat

- ```bash
- # Page text
- curl http://localhost:9867/text
-
- # Screenshot (JPEG)
- curl http://localhost:9867/screenshot > screenshot.jpg
-
- # Execute JavaScript
- curl -X POST http://localhost:9867/evaluate -H "Content-Type: application/json" \
- -d '{"expression": "document.title"}'
- ```
+ After each action that changes the page, snapshot again to see the result. Then act on the new refs.

  ## Step 3: Stop
 
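Note on the hunk above: the new "Core Loop" text describes snapshot, then act by ref, then snapshot again. The sketch below shows one way a caller might script that, assuming snapshot lines look like the YAML sample in the diff (`- role "name" [ref=eN]`). The helpers `find_ref`, `act`, and `sort_results` are hypothetical names, and the JSON is built by plain string interpolation, so values containing double quotes would need escaping first.

```bash
#!/bin/sh
# Sketch only: helper names are illustrative, not part of arise-browser.
BASE_URL="${BASE_URL:-http://localhost:9867}"

# Read a snapshot on stdin and print the ref of the first element whose
# role and accessible name match exactly, e.g. find_ref combobox "Sort by:".
find_ref() {
  role="$1"
  name="$2"
  grep -F -- "- $role \"$name\" [ref=" | head -n 1 |
    sed -n 's/.*\[ref=\([A-Za-z0-9]*\)\].*/\1/p'
}

# POST one JSON action body to /action, as in the curl examples in the diff.
act() {
  curl -fsS -X POST "$BASE_URL/action" \
    -H "Content-Type: application/json" -d "$1"
}

# Navigate, snapshot, select a dropdown option by ref, then snapshot again.
sort_results() {
  url="$1"
  dropdown_name="$2"
  option_value="$3"
  curl -fsS -X POST "$BASE_URL/navigate" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"$url\"}" >/dev/null || return 1
  ref=$(curl -fsS "$BASE_URL/snapshot" | find_ref combobox "$dropdown_name")
  [ -n "$ref" ] || return 1
  act "{\"type\": \"select\", \"ref\": \"$ref\", \"value\": \"$option_value\"}" \
    >/dev/null || return 1
  curl -fsS "$BASE_URL/snapshot"
}
```

Looking the ref up by role and name, rather than hard-coding `e187`, keeps the script tied to what the snapshot actually reported.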
@@ -138,11 +157,11 @@ The Docker container is automatically stopped and cleaned up.

  ## Tips

+ - **Read the snapshot carefully.** Product names, prices, ratings, links — they're all there. No need for JavaScript or regex.
  - Use `?diff=true` after the first snapshot to save tokens.
- - Refs persist across snapshots — don't re-snapshot just to reuse a ref.
  - Batch actions: `POST /actions` with `{"actions": [...], "stopOnError": true}`.
  - Tabs: `GET /tabs`, `POST /tab` with `{"action": "create|switch|close"}`.
- - Use `tabId` param on any endpoint to target a specific tab without switching.
+ - Screenshot (`GET /screenshot`) is useful to show the user what you see, but do NOT use it as your primary data source.

  ## Troubleshooting
 
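Note on the hunk above: the batch tip gives only the envelope `{"actions": [...], "stopOnError": true}`. The sketch below assembles that envelope from individual action bodies. It assumes each entry has the same shape as a single `/action` body, which the diff does not confirm; `batch_body` and `run_batch` are hypothetical helper names.

```bash
#!/bin/sh
# Sketch only: assumes batch entries mirror single /action bodies.
BASE_URL="${BASE_URL:-http://localhost:9867}"

# Join pre-built action JSON objects (one per argument) into the
# request body described in the Tips section.
batch_body() {
  body=""
  for action in "$@"; do
    if [ -n "$body" ]; then
      body="$body, "
    fi
    body="$body$action"
  done
  printf '{"actions": [%s], "stopOnError": true}\n' "$body"
}

# POST the assembled body to /actions.
run_batch() {
  curl -fsS -X POST "$BASE_URL/actions" \
    -H "Content-Type: application/json" -d "$(batch_body "$@")"
}
```

For example, `run_batch '{"type": "type", "ref": "e14", "text": "laptop"}' '{"type": "press_key", "key": "Enter"}'` would send a type-then-Enter search as one request.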
@@ -151,9 +170,5 @@ The Docker container is automatically stopped and cleaned up.
  | First run slow | Docker pulling Neko image (~700MB), wait ~2 min |
  | Health returns `connected: false` | Chrome crashed — restart arise-browser |
  | Neko UI loads but no video | Open UDP 52000-52100 in firewall/security group |
- | Neko UI click no response | Use admin password `admin`, or restart container (implicit hosting enabled) |
- | Action returns error | Snapshot first to get valid refs, then act |
-
- ## Full API Reference
-
- See [references/api.md](references/api.md) for all endpoints, parameters, and advanced features (recording, PDF export, batch actions).
+ | Action returns error | Snapshot first to get valid refs, then act on them |
+ | Can't find an element | Scroll down and snapshot again; the element may be below the fold |