opensteer 0.8.18 → 0.9.0

@@ -1,179 +1,142 @@
1
1
  ---
2
2
  name: opensteer
3
- description: "Handles Opensteer browser automation, structured DOM extraction, and browser-backed API reverse engineering. Use when the user mentions Opensteer, browser automation, real Chromium sessions, DOM extraction, network capture, reverse-engineering a site API, or browser-grade fetch."
3
+ description: "Use when the task needs real browser automation, DOM exploration, browser session state, network capture, or browser-backed request replay with Opensteer. The default pattern is: explore with the CLI first, then write the final code with the SDK."
4
4
  argument-hint: "[goal]"
5
5
  ---
6
6
 
7
7
  # Opensteer
8
8
 
9
- Opensteer gives agents a real Chromium browser for two jobs normal code cannot do:
9
+ Use Opensteer when normal code is not enough because the task depends on a real browser:
10
10
 
11
- 1. **DOM automation** — interact with pages and extract structured data via snapshot-based element targeting.
12
- 2. **API reverse engineering** — capture real browser traffic, identify APIs, test transport portability, and write reusable code.
11
+ - interact with a real page
12
+ - inspect browser state like cookies or storage
13
+ - capture real network traffic from real browser actions
14
+ - replay requests through the browser session
15
+ - turn the discovery into plain TypeScript
13
16
 
14
- The workflow is always: **explore with CLI, then write reusable code with SDK.**
17
+ The default workflow is:
15
18
 
16
- ## When To Use
19
+ 1. Use the CLI to explore the site.
20
+ 2. Figure out the page or API behavior.
21
+ 3. Save stable targets with `persist` when useful.
22
+ 4. Write the final reusable code with the SDK.
17
23
 
18
- - Task involves a website's API, network traffic, auth headers, or replay → **API workflow**
19
- - Task involves page content, forms, clicking, typing, or extracting visible data → **DOM workflow**
20
- - Task involves browser profiles, attaching to Chrome, or workspace setup → **Browser management**
21
- - Unsure → Start with the API workflow. Capture traffic first, then decide.
22
-
23
- ## Rules
24
-
25
- 1. Set `--workspace <id>` on every command, or export `OPENSTEER_WORKSPACE`.
26
- 2. Re-snapshot after every navigation before using element numbers.
27
- 3. Import as `import { Opensteer } from "opensteer"` — never a relative path.
28
- 4. SDK constructor needs only two fields:
29
- ```ts
30
- const opensteer = new Opensteer({ workspace: "demo", rootDir: process.cwd() });
31
- ```
32
- 5. `persist` is the only naming mechanism for reusable targets and extractions.
33
- 6. `--capture-network <label>` is opt-in. Add it to any action when you need traffic.
34
- 7. `opensteer.fetch()` works without a page open — it uses the session cookie jar and transport stack directly.
35
- 8. Element numbers come from `c="N"` attributes in snapshot HTML. Always snapshot first, then act.
24
+ Do not stop at manual exploration if the user needs automation. Explore first, then convert the result into SDK code.
36
25
 
37
- ---
26
+ ## When To Use
38
27
 
39
- ## DOM Automation
28
+ - The task needs a real browser session.
29
+ - The task involves clicks, forms, DOM extraction, or navigation.
30
+ - The task involves cookies, localStorage, sessionStorage, or auth state.
31
+ - The task involves reverse-engineering a site API from real browser traffic.
32
+ - The task needs browser-backed `fetch()` instead of plain Node HTTP.
33
+ - The task needs coordinate-based fallback because DOM targeting is not enough.
40
34
 
41
- ### Step 1: Open and snapshot
35
+ If the user wants to manually drive a browser and record the flow, use the `recorder` skill instead.
42
36
 
43
- ```bash
44
- opensteer open https://example.com --workspace demo
45
- opensteer snapshot action --workspace demo
46
- ```
37
+ ## Core Rules
47
38
 
48
- Read the `html` output. Find `c="N"` markers; these are your element IDs.
39
+ 1. Always use a workspace for stateful commands: `--workspace <id>` or `OPENSTEER_WORKSPACE`.
40
+ 2. In this repo, prefer `pnpm run opensteer:local -- <command>` instead of bare `opensteer ...`.
41
+ 3. Re-snapshot after navigation or big UI changes before reusing element numbers.
42
+ 4. Use the CLI to discover. Use the SDK for the final implementation.
43
+ 5. Use `persist` for stable reusable targets and extraction payloads.
44
+ 6. Use `exec` for SDK code and API experiments. Use `evaluate` only for page-context JavaScript.
45
+ 7. If `fetch()` fails with auth errors, inspect `state()`, `cookies()`, and `storage()` before changing transport.
46
+ 8. Keep the final output simple. Prefer ordinary TypeScript with `Opensteer`, not extra abstraction unless the user asks for it.
47
+ 9. Close the browser when you are done. Do not leave headed browser windows running on the user's machine. Use `opensteer browser delete --workspace <id>` or the matching SDK cleanup when the session does not need to stay open.
49
48
 
50
- ### Step 2: Interact
49
+ ## Choose A Path
51
50
 
52
- ```bash
53
- opensteer click 7 --workspace demo --persist "search button"
54
- opensteer input 5 "laptop" --workspace demo --press-enter --persist "search input"
55
- opensteer hover 3 --workspace demo --persist "menu trigger"
56
- opensteer scroll down 400 --workspace demo
57
- ```
51
+ - Page interaction or extraction: use the DOM path.
52
+ - API discovery or replay: use the network path.
53
+ - Canvas, WebGL, or hard-to-target UI: use computer-use.
54
+ - Browser profile, attach mode, or workspace management: use browser management.
55
+ - Unsure: start by capturing network traffic.
58
56
 
59
- `--persist <key>` saves the element's structural DOM path for deterministic SDK replay. Element number is always required on CLI — persist is save-only, not a targeting mode.
57
+ ## DOM Path
60
58
 
61
- ### Step 3: Extract
59
+ Use this when the goal is clicking, typing, navigating, or extracting visible data.
62
60
 
63
- Re-snapshot, then extract with a JSON schema referencing element numbers:
61
+ ### CLI exploration
64
62
 
65
63
  ```bash
64
+ opensteer open https://example.com --workspace demo
65
+ opensteer snapshot action --workspace demo
66
+ opensteer input 5 "laptop" --workspace demo --press-enter --persist "search input"
67
+ opensteer click 7 --workspace demo --persist "search button"
66
68
  opensteer snapshot extraction --workspace demo
67
69
  opensteer extract '{"items":[{"name":{"element":13},"price":{"element":14}}]}' \
68
- --persist "search results" --workspace demo
70
+ --workspace demo \
71
+ --persist "search results"
69
72
  ```
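The extraction schema passed to `opensteer extract` is plain JSON, so it can be built programmatically rather than typed by hand. The sketch below is a minimal illustration, assuming the schema shape shown in the command above (an `items` array whose fields each point at an element number). `buildItemsSchema` is a hypothetical helper, not part of the Opensteer SDK.

```typescript
// Hypothetical helper (not an Opensteer API): builds the JSON schema shape
// used in the `opensteer extract` example, one field per element number.
type ElementRef = { element: number };
type ItemSchema = Record<string, ElementRef>;

export function buildItemsSchema(
  fields: Record<string, number>,
): { items: ItemSchema[] } {
  const item: ItemSchema = {};
  for (const name of Object.keys(fields)) {
    const element = fields[name];
    if (element !== undefined) {
      item[name] = { element };
    }
  }
  return { items: [item] };
}

// Reproduces the schema string from the CLI example.
const schema = buildItemsSchema({ name: 13, price: 14 });
console.log(JSON.stringify(schema));
```

The element numbers still have to come from a fresh `snapshot extraction`, since they change whenever the page does.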
70
73
 
71
- The positional argument is the JSON schema. `--persist` names it for SDK replay.
72
-
73
- ### Step 4: Close
74
-
75
- ```bash
76
- opensteer close --workspace demo
77
- ```
74
+ Element numbers come from `c="N"` markers in the snapshot HTML.
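To see which element numbers a snapshot exposes, the `c="N"` markers can be pulled out of the HTML string with a small script. This is a rough sketch that assumes the markers appear as ordinary `c="N"` attributes preceded by whitespace. `listElementNumbers` is a hypothetical helper, not something Opensteer ships.

```typescript
// Hypothetical helper (not an Opensteer API): collects the distinct numbers
// from c="N" attributes in a snapshot HTML string, in ascending order.
export function listElementNumbers(snapshotHtml: string): number[] {
  // Require whitespace before `c=` so attributes such as data-c="3" are skipped.
  const pattern = /\sc="(\d+)"/g;
  const numbers: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(snapshotHtml)) !== null) {
    const n = Number(match[1]);
    if (numbers.indexOf(n) === -1) {
      numbers.push(n);
    }
  }
  return numbers.sort((a, b) => a - b);
}
```

A regex is enough for a quick inventory. It is not an HTML parser, so treat the result as a hint and confirm against the snapshot itself.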
78
75
 
79
- ### SDK Replay
76
+ ### SDK implementation
80
77
 
81
78
  ```ts
82
79
  import { Opensteer } from "opensteer";
83
- const opensteer = new Opensteer({ workspace: "demo", rootDir: process.cwd() });
84
-
85
- try {
86
- await opensteer.open("https://example.com");
87
80
 
88
- // Replay by persist key: no element numbers or snapshots needed
89
- await opensteer.input({ persist: "search input", text: "laptop", pressEnter: true });
90
- await opensteer.click({ persist: "search button" });
81
+ const opensteer = new Opensteer({ workspace: "demo", rootDir: process.cwd() });
91
82
 
92
- // Extract using cached schema
93
- const data = await opensteer.extract({ persist: "search results" });
94
- console.log(data);
95
- } finally {
96
- await opensteer.close();
97
- }
83
+ await opensteer.open("https://example.com");
84
+ await opensteer.input({ persist: "search input", text: "laptop", pressEnter: true });
85
+ await opensteer.click({ persist: "search button" });
86
+ const data = await opensteer.extract({ persist: "search results" });
98
87
  ```
99
88
 
100
- When `persist` is used with `element`, it **saves** the path. When used alone, it **resolves** from cache.
89
+ Use `selector` in SDK code only when a stable CSS selector is cleaner than `persist`.
101
90
 
102
- ---
91
+ ## Network Path
103
92
 
104
- ## API Reverse Engineering
93
+ Use this when the goal is to find or replay a site API.
105
94
 
106
- ### Step 1: Capture traffic
107
-
108
- Open the site and trigger the real browser action with `--capture-network`:
95
+ ### CLI exploration
109
96
 
110
97
  ```bash
111
98
  opensteer open https://example.com --workspace demo
112
99
  opensteer goto https://example.com/search --workspace demo --capture-network page-load
113
100
  opensteer input 5 "laptop" --workspace demo --press-enter --capture-network search
101
+ opensteer network query --workspace demo --capture search --json
102
+ opensteer network detail rec_123 --workspace demo --probe
114
103
  ```
115
104
 
116
- ### Step 2: Find the API
117
-
118
- ```bash
119
- opensteer network query --workspace demo --capture search
120
- opensteer network query --workspace demo --capture search --hostname api.example.com --json
121
- ```
122
-
123
- `--json` filters to JSON and GraphQL responses only. Other filters: `--url`, `--path`, `--method`, `--status`, `--type`, `--before`, `--after`, `--limit`.
124
-
125
- ### Step 3: Inspect and probe transport
126
-
127
- ```bash
128
- opensteer network detail rec_123 --workspace demo
129
- opensteer network detail rec_123 --probe --workspace demo
130
- ```
131
-
132
- The first command shows: URL, method, request headers, cookies sent, request/response body preview, GraphQL metadata, redirect chain.
133
-
134
- Add `--probe` to also test which transport works for this API:
105
+ Use `network detail --probe` to learn which transport works.
135
106
 
136
- | Transport | Meaning | SDK usage |
137
- | ------------- | ------------------------------ | ----------------------------------------------- |
138
- | `direct-http` | Plain HTTP works | `this.fetch(url)` (default) |
139
- | `matched-tls` | Needs TLS fingerprint matching | `this.fetch(url, { transport: "matched-tls" })` |
140
- | `page-http` | Needs a live browser page | `this.fetch(url, { transport: "page" })` |
107
+ ### Session state checks
141
108
 
142
- ### Step 4: Check browser state (if 401/403)
109
+ Use these when auth or browser state matters:
143
110
 
144
111
  ```bash
145
112
  opensteer state example.com --workspace demo
146
113
  ```
147
114
 
148
- Look for session cookies, CSRF tokens, or storage-backed auth that the request depends on.
149
-
150
- ### Step 5: Test and write code with exec
115
+ ```ts
116
+ const cookies = await opensteer.cookies("example.com");
117
+ const localStorage = await opensteer.storage("example.com", "local");
118
+ const sessionStorage = await opensteer.storage("example.com", "session");
119
+ const state = await opensteer.state("example.com");
120
+ ```
151
121
 
152
- Use `exec` to test the API call with the SDK. The code you write here is the same code that goes in your final script:
122
+ ### Prove the request with `exec`
153
123
 
154
124
  ```bash
155
125
  opensteer exec "
156
- const r = await this.fetch('https://api.example.com/search', {
126
+ const response = await this.fetch('https://api.example.com/search', {
157
127
  method: 'POST',
158
128
  headers: { 'content-type': 'application/json' },
159
129
  body: JSON.stringify({ keyword: 'laptop', count: 24 }),
160
130
  });
161
- return { status: r.status, data: await r.json() };
131
+ return { status: response.status, data: await response.json() };
162
132
  " --workspace demo
163
133
  ```
164
134
 
165
- `exec` runs JavaScript with the Opensteer SDK bound as `this`. It supports `await` and has access to `this.fetch()`, `this.cookies()`, `this.evaluate()`, and all other SDK methods.
166
-
167
- IMPORTANT: Use `exec` instead of `evaluate` for API calls. `evaluate` runs inside the browser page where anti-bot scripts can detect and block programmatic requests. `exec` runs through the framework's transport stack which automatically finds the least detectable path.
168
-
169
- ### Step 6: Write the final SDK script
170
-
171
- ```bash
172
- opensteer close --workspace demo
173
- ```
135
+ ### SDK implementation
174
136
 
175
137
  ```ts
176
138
  import { Opensteer } from "opensteer";
139
+
177
140
  const opensteer = new Opensteer({ workspace: "demo", rootDir: process.cwd() });
178
141
 
179
142
  export async function search(keyword: string) {
@@ -186,161 +149,75 @@ export async function search(keyword: string) {
186
149
  }
187
150
  ```
188
151
 
189
- `opensteer.fetch()` accepts standard Web Fetch API syntax (`body` as string, `headers` as `Record<string, string>`). It auto-selects the best transport and includes browser cookies by default.
152
+ Use ordinary `fetch()` syntax. Only set `transport` explicitly if probing showed you need it.
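One way to keep the transport choice explicit is to map the probe result to fetch options in a single place. The sketch below follows the transport table from the 0.8.18 text (`direct-http`, `matched-tls`, `page-http`). Whether 0.9.0 reports the same names is an assumption, and `transportOptionFor` is a hypothetical helper rather than an SDK function.

```typescript
// Transport names as listed in the 0.8.18 probe table (assumed unchanged in 0.9.0).
type ProbedTransport = "direct-http" | "matched-tls" | "page-http";

// Options fragment to spread into the fetch() init object.
type FetchTransportOption = { transport?: "matched-tls" | "page" };

// Hypothetical helper (not an Opensteer API): returns no override when plain
// HTTP works, and an explicit transport only when probing showed it is needed.
export function transportOptionFor(probed: ProbedTransport): FetchTransportOption {
  switch (probed) {
    case "direct-http":
      return {}; // default transport, nothing to set
    case "matched-tls":
      return { transport: "matched-tls" };
    case "page-http":
      return { transport: "page" };
  }
}
```

The returned object could be spread into the options passed to `opensteer.fetch()`, for example `{ method: "POST", ...transportOptionFor("matched-tls") }`.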
190
153
 
191
- ---
192
-
193
- ## Browser Management
154
+ ## Computer-Use
194
155
 
195
- ### Import a Chrome profile
196
-
197
- Copy cookies, localStorage, and session state from an existing Chrome installation:
156
+ Use this only when DOM targeting is not enough.
198
157
 
199
158
  ```bash
200
- opensteer browser clone --workspace my-site \
201
- --source-user-data-dir "$HOME/Library/Application Support/Google/Chrome" \
202
- --source-profile-directory Default
159
+ opensteer computer click 245 380 --workspace demo --capture-network action
160
+ opensteer computer type "search query" --workspace demo
161
+ opensteer computer key Enter --workspace demo
162
+ opensteer computer screenshot --workspace demo
203
163
  ```
204
164
 
205
- ### Attach to a running browser
206
-
207
- Start Chrome with remote debugging, then connect Opensteer:
208
-
209
- ```bash
210
- # Terminal 1: launch Chrome
211
- /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
212
-
213
- # Terminal 2: attach Opensteer
214
- opensteer open https://example.com --workspace demo --attach-endpoint http://localhost:9222
165
+ ```ts
166
+ await opensteer.computerExecute({
167
+ action: { type: "click", x: 245, y: 380 },
168
+ });
215
169
  ```
216
170
 
217
- ### Headful mode
171
+ After coordinate-based actions, switch back to normal extraction or request analysis as soon as possible.
218
172
 
219
- ```bash
220
- opensteer open https://example.com --workspace demo --headless false
221
- ```
173
+ ## Browser Management
222
174
 
223
- ### Workspace lifecycle
175
+ Use these when the task is about browser setup rather than page logic.
224
176
 
225
177
  ```bash
178
+ opensteer browser discover
179
+ opensteer browser inspect --attach-endpoint http://localhost:9222
180
+ opensteer browser clone --workspace demo \
181
+ --source-user-data-dir "$HOME/Library/Application Support/Google/Chrome" \
182
+ --source-profile-directory Default
226
183
  opensteer browser status --workspace demo
227
- opensteer browser reset --workspace demo # reset browser data, keep workspace
228
- opensteer browser delete --workspace demo # delete workspace entirely
229
- ```
230
-
231
- ---
232
-
233
- ## Tabs
234
-
235
- ```bash
236
- opensteer tab list --workspace demo
237
- opensteer tab new https://other-page.com --workspace demo
238
- opensteer tab 2 --workspace demo # switch to tab 2
239
- opensteer tab close 2 --workspace demo
240
- ```
241
-
242
- ## Run JavaScript
243
-
244
- Use `exec` for API calls and SDK operations (runs in Node.js, avoids bot detection):
245
-
246
- ```bash
247
- opensteer exec "await this.fetch('https://api.example.com/data').then(r => r.json())" --workspace demo
248
- ```
249
-
250
- Use `evaluate` only for DOM inspection (runs inside the browser page):
251
-
252
- ```bash
253
- opensteer evaluate "document.title" --workspace demo
184
+ opensteer browser reset --workspace demo
185
+ opensteer browser delete --workspace demo
254
186
  ```
255
187
 
256
- ## Computer-Use (Coordinate-Based)
257
-
258
- For canvas, WebGL, or complex iframes where DOM element targeting fails:
188
+ Attach to an existing browser:
259
189
 
260
190
  ```bash
261
- opensteer computer click 245 380 --workspace demo --capture-network action
262
- opensteer computer type "search query" --workspace demo
263
- opensteer computer key Enter --workspace demo
264
- opensteer computer scroll 400 300 --dx 0 --dy -200 --workspace demo
265
- opensteer computer screenshot --workspace demo
191
+ opensteer open https://example.com --workspace demo --attach-endpoint http://localhost:9222
266
192
  ```
267
193
 
268
- ---
194
+ Cloud mode:
269
195
 
270
- ## CLI Quick Reference
271
-
272
- | Command | Positional args | Key flags |
273
- | ---------------------------------------------------------- | -------------------- | ----------------------------------------------------------------------- |
274
- | `open <url>` | url | `--headless`, `--provider`, `--attach-endpoint`, `--attach-header` |
275
- | `close` | — | — |
276
- | `status` | — | — |
277
- | `goto <url>` | url | `--capture-network` |
278
- | `snapshot [mode]` | action \| extraction | — |
279
- | `click <element>` | element number | `--persist`, `--capture-network`, `--button` |
280
- | `hover <element>` | element number | `--persist`, `--capture-network` |
281
- | `input <element> <text>` | element, text | `--persist`, `--press-enter`, `--capture-network` |
282
- | `scroll <dir> <amount>` | direction, amount | `--element`, `--persist`, `--capture-network` |
283
- | `extract <schema>` | JSON schema | `--persist` |
284
- | `evaluate <script>` | JS expression | — |
285
- | `network query` | — | `--capture`, `--url`, `--hostname`, `--json`, `--limit`, +6 filters |
286
- | `network detail <id>` | recordId | `--probe` |
287
- | `fetch <url>` | url | `--method`, `--header`, `--query`, `--body`, `--transport`, `--cookies` |
288
- | `state [domain]` | domain (optional) | — |
289
- | `exec <expression>` | JS expression | — |
290
- | `tab list / new / <n> / close` | varies | — |
291
- | `computer click/type/key/scroll/move/drag/screenshot/wait` | varies | `--capture-network` |
292
-
293
- ## SDK Quick Reference
196
+ - Use `--provider cloud` on CLI when needed.
197
+ - Common env vars are `OPENSTEER_BASE_URL`, `OPENSTEER_API_KEY`, and `OPENSTEER_CLOUD_APP_BASE_URL`.
294
198
 
295
- ```ts
296
- // Browser lifecycle
297
- await opensteer.open(url);
298
- await opensteer.goto(url, { captureNetwork?: "label" });
299
- await opensteer.close();
300
-
301
- // DOM actions — save to cache
302
- await opensteer.click({ element: 7, persist: "name" });
303
- await opensteer.input({ element: 5, text: "...", persist: "name", pressEnter: true });
304
- await opensteer.hover({ element: 3, persist: "name" });
305
- await opensteer.scroll({ direction: "down", amount: 400 });
306
-
307
- // DOM actions — resolve from cache
308
- await opensteer.click({ persist: "name" });
309
- await opensteer.input({ persist: "name", text: "..." });
310
-
311
- // Extraction
312
- await opensteer.extract({ persist: "name" }); // cached schema
313
- await opensteer.extract({ persist: "name", schema: { ... } }); // inline schema
314
-
315
- // Network discovery
316
- const records = await opensteer.network.query({ capture: "label", limit: 20 });
317
- const detail = await opensteer.network.detail(recordId);
318
-
319
- // Fetch — standard Web Fetch API syntax, auto-selects transport
320
- const response = await opensteer.fetch(url, {
321
- method: "POST",
322
- headers: { "content-type": "application/json" },
323
- body: JSON.stringify({ keyword: "laptop" }),
324
- });
199
+ ## Useful SDK Surface
325
200
 
326
- // Browser state
327
- const cookies = await opensteer.cookies("domain.com"); // .has(), .get(), .serialize()
328
- const state = await opensteer.state("domain.com");
201
+ Use these often:
329
202
 
330
- // Snapshots
331
- const html = await opensteer.snapshot("action");
332
- const html = await opensteer.snapshot("extraction");
333
- ```
334
-
335
- ---
203
+ - `open(url)`
204
+ - `goto(url, { captureNetwork? })`
205
+ - `snapshot("action" | "extraction")`
206
+ - `click()`, `hover()`, `input()`, `scroll()`
207
+ - `extract()`
208
+ - `network.query()`
209
+ - `network.detail()`
210
+ - `waitForNetwork()`, `waitForResponse()`, `waitForPage()`
211
+ - `cookies()`, `storage()`, `state()`
212
+ - `fetch()`
213
+ - `computerExecute()`
214
+ - `addInitScript()`
215
+ - `browser.status()`, `browser.clone()`, `browser.reset()`, `browser.delete()`
336
216
 
337
- ## Common Issues
217
+ ## Guardrails
338
218
 
339
- | Symptom | Fix |
340
- | -------------------------------------------- | -------------------------------------------------------------------------------------------------- |
341
- | Element numbers wrong after navigation | Re-snapshot before using element numbers |
342
- | `fetch()` returns 401/403 | Check `state`: the request depends on session cookies or tokens |
343
- | Direct HTTP blocked, browser transport works | Site uses TLS fingerprinting — use `transport: "matched-tls"` |
344
- | Extract returns empty data | Element numbers changed — re-snapshot and rebuild the schema |
345
- | `fetch()` fails with no session | Call `opensteer.goto(url)` first to establish cookies, then `fetch()` |
346
- | Using `evaluate` for API calls | Use `exec` instead — `evaluate` runs inside the page where anti-bot scripts can intercept requests |
219
+ - Snapshot before using element numbers.
220
+ - Snapshot again after UI changes.
221
+ - Do not use `evaluate` for API work.
222
+ - Do not keep the result as a manual-only workflow if the user needs reusable automation.
223
+ - Prefer a small final script over a large framework.
@@ -81,4 +81,4 @@ Cloud recording requires:
81
81
  ## References
82
82
 
83
83
  - [Recorder Reference](references/recorder-reference.md)
84
- - [Opensteer SDK Reference](../opensteer/references/sdk-reference.md)
84
+ - [Opensteer Skill](../opensteer/SKILL.md)