opensteer 0.8.2 → 0.8.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -5,32 +5,101 @@ Use the CLI when you need a fast JSON-first loop against a repo-local workspace
5
5
  ## Sections
6
6
 
7
7
  - [Quickstart](#quickstart)
8
+ - [Snapshot Output — What To Read](#snapshot-output--what-to-read)
9
+ - [End-to-End Example](#end-to-end-example)
8
10
  - [Browser Lifecycle And Profile Cloning](#browser-lifecycle-and-profile-cloning)
9
11
  - [Browser Modes](#browser-modes)
10
12
  - [Advanced Semantic Operations](#advanced-semantic-operations)
11
- - [Extraction Schema Examples](#extraction-schema-examples)
13
+ - [Extraction Schema And Array Auto-Generalization](#extraction-schema-and-array-auto-generalization)
12
14
 
13
15
  ## Quickstart
14
16
 
15
17
  ```bash
16
- opensteer browser status --workspace demo
17
18
  opensteer open https://example.com --workspace demo
18
19
  opensteer snapshot action --workspace demo
19
- opensteer click --workspace demo --element 3
20
- opensteer input --workspace demo --selector "input[type=search]" --text "search term" --press-enter true
20
+ # Read the "html" field in the JSON output. Find elements by their c="N" attributes.
21
+ # Example: <input c="5" placeholder="Search"> means element 5 is the search input.
22
+ # Example: <button c="7">Search</button> means element 7 is the search button.
23
+
24
+ # Act on elements AND persist their paths with human-readable descriptions
25
+ opensteer run dom.input --workspace demo \
26
+ --input-json '{"target":{"kind":"element","element":5},"text":"search term","persistAsDescription":"search input"}'
27
+ opensteer run dom.click --workspace demo \
28
+ --input-json '{"target":{"kind":"element","element":7},"persistAsDescription":"search button"}'
29
+
30
+ # Re-snapshot after navigation, then persist an extraction descriptor
21
31
  opensteer snapshot extraction --workspace demo
22
32
  opensteer extract --workspace demo \
23
33
  --description "page summary" \
24
- --schema-json '{"title":{"selector":"title"},"url":{"source":"current_url"}}'
34
+ --schema-json '{"title":{"element":3},"url":{"source":"current_url"}}'
35
+
36
+ # Replay later with just descriptions — no snapshot needed
37
+ opensteer click --workspace demo --description "search button"
25
38
  opensteer extract --workspace demo --description "page summary"
26
39
  opensteer close --workspace demo
27
40
  ```
28
41
 
29
42
  - Stateful CLI commands currently require `--workspace <id>`.
30
- - With a workspace, browser mode defaults to `persistent`.
31
- - Use `snapshot action` before `--element <n>` targets.
32
- - `extract --description --schema-json ...` writes or updates a persisted extraction descriptor.
33
- - `extract --description ...` replays the stored extraction payload with no schema.
43
+ - Use `snapshot action` to discover page elements during exploration.
44
+ - Use `opensteer run dom.*` with `persistAsDescription` to cache element paths under descriptions.
45
+ - Replay cached actions with `--description` alone no snapshot needed.
46
+ - `extract --description --schema-json` writes a persisted extraction descriptor.
47
+ - `extract --description` replays the stored extraction.
48
+
49
+ ## Snapshot Output — What To Read
50
+
51
+ `snapshot action` and `snapshot extraction` both return JSON:
52
+
53
+ ```json
54
+ {
55
+ "url": "https://example.com/search?q=airpods",
56
+ "title": "Search Results",
57
+ "mode": "extraction",
58
+ "html": "<span c=\"12\">$549.99</span>\n<a c=\"15\" href=\"/p/product-1\">\n <div c=\"16\">Apple AirPods Max</div>\n</a>\n<a c=\"18\" href=\"/b/apple\">Apple</a>...",
59
+ "counters": [{"element":12,"tagName":"SPAN","pathHint":"span",...}, ...]
60
+ }
61
+ ```
62
+
63
+ **Read the `html` field.** It is a clean, filtered DOM. Hidden elements, scripts, and styles are already removed. Every element has a `c="N"` attribute.
64
+
65
+ - `c="N"` in the HTML = `element: N` in commands and extraction schemas
66
+ - `snapshot action` keeps interactive elements (buttons, inputs, links) for clicking/typing
67
+ - `snapshot extraction` keeps all visible content (text, prices, titles) for data extraction
68
+ - Do NOT parse the `counters` array to find elements — it is verbose metadata. Read the HTML string, find the `c="N"` values, and use those numbers.
69
+
70
+ ## End-to-End Example
71
+
72
+ Goal: go to a site, search for a product, extract all results.
73
+
74
+ ```bash
75
+ # 1. Open the page
76
+ opensteer open https://example.com --workspace demo
77
+
78
+ # 2. Snapshot to discover elements
79
+ opensteer snapshot action --workspace demo
80
+ # Read html field. Find: <input c="5" placeholder="Search"> and <button c="7">Search</button>
81
+
82
+ # 3. Type search term and persist the element path
83
+ opensteer run dom.input --workspace demo \
84
+ --input-json '{"target":{"kind":"element","element":5},"text":"airpods","pressEnter":true,"persistAsDescription":"search input"}'
85
+
86
+ # 4. Re-snapshot the results page (always re-snapshot after navigation!)
87
+ opensteer snapshot extraction --workspace demo
88
+ # Read html field. Find product items with their c="N" values:
89
+ # <div c="13">Apple AirPods Max</div> <span c="14">$549.99</span>
90
+ # <div c="22">Apple AirPods Pro</div> <span c="23">$249.99</span>
91
+
92
+ # 5. Extract all results — array auto-generalizes from template rows
93
+ opensteer extract --workspace demo \
94
+ --description "search results" \
95
+ --schema-json '{"items":[{"name":{"element":13},"price":{"element":14}},{"name":{"element":22},"price":{"element":23}}]}'
96
+ # Returns ALL matching rows on the page, not just the 2 templates.
97
+
98
+ # 6. Close
99
+ opensteer close --workspace demo
100
+ ```
101
+
102
+ Total: 6 commands.
34
103
 
35
104
  ## Browser Lifecycle And Profile Cloning
36
105
 
@@ -94,6 +163,9 @@ opensteer run network.query --workspace demo \
94
163
  opensteer run request-plan.infer --workspace demo \
95
164
  --input-json '{"recordId":"rec_123","key":"products.search","version":"v1"}'
96
165
 
166
+ opensteer run request-plan.infer --workspace demo \
167
+ --input-json '{"recordId":"rec_123","key":"products.search.portable","version":"v1","transport":"direct-http"}'
168
+
97
169
  opensteer run request.execute --workspace demo \
98
170
  --input-json '{"key":"products.search","query":{"q":"laptop"}}'
99
171
  ```
@@ -102,13 +174,11 @@ opensteer run request.execute --workspace demo \
102
174
  - Use `run page.goto` when you need `networkTag` on navigation. The short `goto` form only parses the URL positional.
103
175
  - Use `run dom.click` / `run dom.input` / `run dom.hover` / `run dom.scroll` when you need `persistAsDescription`.
104
176
 
105
- ## Extraction Schema Examples
177
+ ## Extraction Schema And Array Auto-Generalization
106
178
 
107
- ```bash
108
- opensteer snapshot extraction --workspace demo
109
- ```
179
+ Always run `snapshot extraction` before building a schema — you need the `c="N"` counter values from the HTML.
110
180
 
111
- Explicit field bindings:
181
+ Flat field bindings:
112
182
 
113
183
  ```bash
114
184
  opensteer extract --workspace demo \
@@ -120,14 +190,39 @@ opensteer extract --workspace demo \
120
190
  --schema-json '{"url":{"selector":"a.primary","attribute":"href"},"pageUrl":{"source":"current_url"}}'
121
191
  ```
122
192
 
123
- Arrays with representative rows:
193
+ Array extraction with auto-generalization:
124
194
 
125
195
  ```bash
126
196
  opensteer extract --workspace demo \
127
- --description "items" \
128
- --schema-json '{"items":[{"title":{"selector":"#products li:nth-child(1) .title"},"price":{"selector":"#products li:nth-child(1) .price"}},{"title":{"selector":"#products li:nth-child(2) .title"},"price":{"selector":"#products li:nth-child(2) .price"}}]}'
197
+ --description "product list" \
198
+ --schema-json '{"items":[{"name":{"element":13},"price":{"element":14}},{"name":{"element":22},"price":{"element":23}}]}'
199
+ ```
200
+
201
+ The extractor uses the 1-2 representative rows as **templates**, then automatically finds **ALL matching rows** on the page:
202
+
203
+ ```json
204
+ {
205
+ "items": [
206
+ {"name": "Apple AirPods Max", "price": "$549.99"},
207
+ {"name": "Apple AirPods Pro", "price": "$249.99"},
208
+ {"name": "Apple AirPods 4", "price": "$129.99"},
209
+ ...
210
+ ]
211
+ }
129
212
  ```
130
213
 
131
- - Build the exact JSON object you want. The extractor does not accept semantic placeholders like `"string"` or prompt-style schemas.
132
- - Use `element` fields only with counters from a fresh snapshot in the same live session.
133
- - For arrays, include one or more representative objects. Add multiple examples when repeated rows have structural variants.
214
+ Rules:
215
+
216
+ - Build the exact JSON shape you want. The extractor does not accept `"string"` or prompt-style schemas.
217
+ - Each leaf must be `{ element: N }`, `{ selector: "..." }`, optional `attribute`, or `{ source: "current_url" }`.
218
+ - Use `element` fields during CLI exploration with a fresh snapshot. Deterministic scripts use `description`.
219
+ - For arrays, provide 1-2 representative objects. Add 2 when repeated rows have structural variants.
220
+ - Nested arrays are not supported.
221
+
222
+ ## What NOT To Do
223
+
224
+ - Do NOT use `page.evaluate()` to scrape DOM data. Use `extract()` with element-based schemas.
225
+ - Do NOT parse the `counters` array to find elements. Read the `html` string and find `c="N"` values.
226
+ - Do NOT use CSS selectors in reusable scripts. Use `description` from cached descriptors.
227
+ - Do NOT write loops to enumerate list items. Use array extraction with 1-2 template rows.
228
+ - Do NOT skip re-snapshot after navigation. Always re-snapshot before targeting new elements.
@@ -19,10 +19,10 @@ Use this workflow when the deliverable is a custom API, a replayable request pla
19
19
  2. Tag the important navigation or interactions with `networkTag`.
20
20
  3. Inspect the captured traffic and isolate the relevant records.
21
21
  4. Save useful captures to the workspace if they need to survive later analysis.
22
- 5. Prove the request with `rawRequest()`.
23
- 6. Promote the winning record into a request plan.
22
+ 5. Probe the request with `rawRequest()` — try `direct-http` first, then `context-http`.
23
+ 6. Infer a request plan from the probed record pass `transport` if you proved portability.
24
24
  7. Add recipes or auth recipes if replay needs deterministic setup.
25
- 8. Replay the plan from code.
25
+ 8. Replay the plan from code — works immediately, no extra steps.
26
26
 
27
27
  This workflow should carry equal weight with DOM automation. Use it whenever the browser page is only the launcher for the real target request.
28
28
 
@@ -47,7 +47,6 @@ await opensteer.goto({
47
47
  });
48
48
 
49
49
  await opensteer.click({
50
- selector: "button.load-products",
51
50
  description: "load products",
52
51
  networkTag: "products-load",
53
52
  });
@@ -63,26 +62,29 @@ const records = await opensteer.queryNetwork({
63
62
  });
64
63
  ```
65
64
 
66
- 3. Test the request directly.
65
+ 3. Probe the request — try `direct-http` first to test portability.
67
66
 
68
67
  ```ts
69
68
  const response = await opensteer.rawRequest({
70
- transport: "context-http",
69
+ transport: "direct-http",
71
70
  url: "https://example.com/api/products",
72
71
  method: "POST",
73
72
  body: {
74
73
  json: { page: 1 },
75
74
  },
76
75
  });
76
+ // If direct-http returns 200, the API is portable (no browser needed).
77
+ // If it fails, try context-http — the API needs browser session state.
77
78
  ```
78
79
 
79
- 4. Promote a captured record into a request plan.
80
+ 4. Infer a request plan pass `transport` if you proved portability.
80
81
 
81
82
  ```ts
82
83
  await opensteer.inferRequestPlan({
83
- recordId: records.records[0]!.id,
84
+ recordId: response.recordId,
84
85
  key: "products.search",
85
86
  version: "v1",
87
+ transport: "direct-http", // use the transport you proved works
86
88
  });
87
89
  ```
88
90
 
@@ -117,17 +119,18 @@ await opensteer.runAuthRecipe({
117
119
  ## CLI Equivalents
118
120
 
119
121
  ```bash
120
- opensteer open --workspace demo
122
+ opensteer open https://example.com/app --workspace demo
121
123
  opensteer run page.goto --workspace demo \
122
124
  --input-json '{"url":"https://example.com/app","networkTag":"page-load"}'
123
- opensteer run dom.click --workspace demo \
124
- --input-json '{"target":{"kind":"selector","selector":"button.load-products"},"persistAsDescription":"load products","networkTag":"products-load"}'
125
+ opensteer click --workspace demo --description "load products"
126
+ # or with networkTag: opensteer run dom.click --workspace demo \
127
+ # --input-json '{"target":{"kind":"description","description":"load products"},"networkTag":"products-load"}'
125
128
  opensteer run network.query --workspace demo \
126
- --input-json '{"tag":"products-load","includeBodies":true,"limit":20}'
129
+ --input-json '{"source":"saved","tag":"products-load","includeBodies":true,"limit":20}'
127
130
  opensteer run request.raw --workspace demo \
128
- --input-json '{"transport":"context-http","url":"https://example.com/api/products","method":"POST","body":{"json":{"page":1}}}'
131
+ --input-json '{"transport":"direct-http","url":"https://example.com/api/products","method":"POST","body":{"json":{"page":1}}}'
129
132
  opensteer run request-plan.infer --workspace demo \
130
- --input-json '{"recordId":"rec_123","key":"products.search","version":"v1"}'
133
+ --input-json '{"recordId":"rec_123","key":"products.search","version":"v1","transport":"direct-http"}'
131
134
  opensteer run request.execute --workspace demo \
132
135
  --input-json '{"key":"products.search","query":{"q":"laptop"}}'
133
136
  ```
@@ -152,6 +155,22 @@ const context = await opensteer.rawRequest({
152
155
 
153
156
  If `direct-http` returns 200, the API is portable and does not need a browser for future calls. If only `context-http` works, the API depends on browser session state.
154
157
 
158
+ After proving portability, infer the plan with an explicit transport override:
159
+
160
+ ```ts
161
+ await opensteer.inferRequestPlan({
162
+ recordId: records.records[0]!.id,
163
+ key: "products.search.portable",
164
+ version: "v1",
165
+ transport: "direct-http",
166
+ });
167
+ ```
168
+
169
+ ```bash
170
+ opensteer run request-plan.infer --workspace demo \
171
+ --input-json '{"recordId":"rec_123","key":"products.search.portable","version":"v1","transport":"direct-http"}'
172
+ ```
173
+
155
174
  ## Auth Token Acquisition
156
175
 
157
176
  When you discover an auth endpoint, acquire a token and use it to probe for data APIs that may be behind auth:
@@ -211,7 +230,8 @@ Common mistakes:
211
230
  Additional guidance:
212
231
 
213
232
  - Capture the browser action first if authentication, cookies, or minted tokens may matter.
214
- - Prefer `direct-http` only after proving the request no longer depends on live browser state.
233
+ - Probe with `direct-http` first. If it works, pass `transport: "direct-http"` to `inferRequestPlan` so the plan is portable. If it fails, fall back to `context-http`.
215
234
  - `inferRequestPlan()` throws if the key+version already exists. Catch the error or bump the version.
235
+ - Inferred plans are immediately usable — `request.execute` works right after inference.
216
236
  - Use recipes when the request plan needs deterministic setup work. Use auth recipes when the setup is specifically auth refresh or login state.
217
237
  - Stay in the DOM workflow only when the rendered page itself is the deliverable. Move here when the request is the durable artifact.
@@ -39,6 +39,12 @@ const opensteer = new Opensteer({
39
39
 
40
40
  ## DOM Automation And Extraction
41
41
 
42
+ Opensteer uses a two-phase workflow: **explore** with the CLI, then **replay** with the SDK.
43
+
44
+ ### Phase 1 — Exploration (one-time, via CLI or setup script)
45
+
46
+ Run `opensteer snapshot action --workspace demo` from the CLI first. Read the `html` field in the JSON output — it is a clean filtered DOM with `c="N"` attributes. Use those counter numbers as the `element` parameter below. The SDK also exposes `snapshot()`, but this guide keeps discovery in the CLI so the DOM HTML is easy to inspect from the terminal.
47
+
42
48
  ```ts
43
49
  import { Opensteer } from "opensteer";
44
50
 
@@ -48,21 +54,21 @@ const opensteer = new Opensteer({
48
54
  });
49
55
 
50
56
  await opensteer.open("https://example.com");
51
- await opensteer.snapshot("action");
52
57
 
58
+ // element numbers come from c="N" values in the snapshot html field
53
59
  await opensteer.click({
54
- selector: "button.primary",
55
- description: "primary button",
60
+ element: 3,
61
+ description: "primary button", // caches the element path
56
62
  });
57
63
 
58
64
  await opensteer.input({
59
- selector: "input[type=search]",
60
- description: "search input",
65
+ element: 7,
66
+ description: "search input", // caches the element path
61
67
  text: "laptop",
62
68
  pressEnter: true,
63
69
  });
64
70
 
65
- const data = await opensteer.extract({
71
+ await opensteer.extract({
66
72
  description: "page summary",
67
73
  schema: {
68
74
  title: { selector: "title" },
@@ -70,32 +76,83 @@ const data = await opensteer.extract({
70
76
  },
71
77
  });
72
78
 
73
- const replayed = await opensteer.extract({
74
- description: "page summary",
79
+ await opensteer.close();
80
+ ```
81
+
82
+ ### Phase 2 — Deterministic replay (the actual reusable script)
83
+
84
+ Use `description` alone for everything — resolves from cached descriptors:
85
+
86
+ ```ts
87
+ const opensteer = new Opensteer({
88
+ workspace: "demo",
89
+ rootDir: process.cwd(),
75
90
  });
91
+
92
+ await opensteer.open("https://example.com");
93
+
94
+ await opensteer.click({ description: "primary button" });
95
+ await opensteer.input({ description: "search input", text: "laptop", pressEnter: true });
96
+ const data = await opensteer.extract({ description: "page summary" });
97
+
98
+ await opensteer.close();
76
99
  ```
77
100
 
78
101
  DOM rules:
79
102
 
80
- - Use `snapshot("action")` before counter-based interactions.
81
- - Use `snapshot("extraction")` to inspect the page structure and design the output object.
82
- - Treat snapshots as planning artifacts. `extract()` reads current page state and replays persisted extraction descriptors from deterministic, snapshot-backed payloads.
83
- - `selector + description` or `element + description` persists a DOM action descriptor. Bare `description` replays it later.
103
+ - Deterministic scripts use `description` for all interactions and extractions — no snapshots, no selectors.
104
+ - `element + description` persists a DOM action descriptor. Bare `description` replays it later.
84
105
  - `description + schema` writes or updates a persisted extraction descriptor. Bare `description` replays it later.
106
+ - Use `element` targets only during the exploration phase with a fresh snapshot from the CLI.
85
107
  - Keep DOM data collection in `extract()`, not `evaluate()` or raw page DOM parsing, when the result can be expressed as structured fields.
108
+ - CSS selectors exist as a low-level escape hatch but are not recommended for reusable scripts.
86
109
 
87
- Supported extraction field shapes in the current public SDK:
110
+ Supported extraction field shapes:
88
111
 
89
- - `{ element: N }`
112
+ - `{ element: N }` — requires a prior CLI snapshot; use during exploration only
90
113
  - `{ element: N, attribute: "href" }`
91
114
  - `{ selector: ".price" }`
92
115
  - `{ selector: "img.hero", attribute: "src" }`
93
116
  - `{ source: "current_url" }`
94
117
 
95
- For arrays, include one or more representative objects in the schema. Add multiple examples when repeated rows have structural variants.
118
+ For arrays, provide 1-2 representative objects. The extractor auto-generalizes from these templates to find ALL matching rows on the page:
119
+
120
+ ```ts
121
+ const results = await opensteer.extract({
122
+ description: "search results",
123
+ schema: {
124
+ items: [
125
+ { name: { element: 13 }, price: { element: 14 } },
126
+ { name: { element: 22 }, price: { element: 23 } },
127
+ ],
128
+ },
129
+ });
130
+ // results.items contains ALL matching rows on the page, not just the 2 templates
131
+ ```
96
132
 
97
133
  Do not use `prompt` or semantic placeholder values such as `"string"` in the current public SDK. The extractor expects explicit schema objects, arrays, and field descriptors.
98
134
 
135
+ ### What extract() Returns
136
+
137
+ `extract()` returns a plain JSON object matching your schema shape:
138
+
139
+ ```ts
140
+ // Flat schema:
141
+ { title: "Search Results", url: "https://..." }
142
+
143
+ // Array schema (auto-generalized from 1-2 templates):
144
+ {
145
+ items: [
146
+ { name: "Apple AirPods Max", price: "$549.99" },
147
+ { name: "Apple AirPods Pro", price: "$249.99" },
148
+ { name: "Apple AirPods 4", price: "$129.99" },
149
+ // ... ALL matching rows
150
+ ]
151
+ }
152
+ ```
153
+
154
+ Use `extract()` for structured data. Do NOT use `evaluate()` or raw DOM parsing when `extract()` can express the result.
155
+
99
156
  ## Browser Admin
100
157
 
101
158
  ```ts
@@ -150,6 +207,13 @@ await opensteer.inferRequestPlan({
150
207
  version: "v1",
151
208
  });
152
209
 
210
+ await opensteer.inferRequestPlan({
211
+ recordId: records.records[0]!.id,
212
+ key: "products.search.portable",
213
+ version: "v1",
214
+ transport: "direct-http",
215
+ });
216
+
153
217
  await opensteer.saveNetwork({
154
218
  tag: "products-load",
155
219
  });
@@ -199,7 +263,6 @@ Session and page control:
199
263
 
200
264
  Interaction and extraction:
201
265
 
202
- - `snapshot("action" | "extraction")`
203
266
  - `click({ element | selector | description, networkTag? })`
204
267
  - `hover({ element | selector | description, networkTag? })`
205
268
  - `input({ element | selector | description, text, pressEnter?, networkTag? })`
@@ -219,8 +282,8 @@ Inspection and evaluation:
219
282
  Request capture and replay:
220
283
 
221
284
  - `rawRequest({ transport?, pageRef?, url, method?, headers?, body?, followRedirects? })`
222
- - `inferRequestPlan({ recordId, key, version, lifecycle? })`
223
- - `writeRequestPlan({ key, version, payload, lifecycle?, tags?, provenance?, freshness? })`
285
+ - `inferRequestPlan({ recordId, key, version, transport? })`
286
+ - `writeRequestPlan({ key, version, payload, tags?, provenance?, freshness? })`
224
287
  - `getRequestPlan({ key, version? })`
225
288
  - `listRequestPlans({ key? })`
226
289
  - `request(key, { path?, query?, headers?, body? })`
@@ -251,7 +314,8 @@ Lifecycle:
251
314
 
252
315
  - Wrap long-running browser ownership in `try/finally` and call `close()`.
253
316
  - Use `networkTag` on actions that trigger requests you may inspect later.
254
- - Use `description` when the interaction should be replayable across sessions.
255
- - Use `description` plus `schema` when an extraction should be replayable across sessions.
256
- - Use `element` targets only with a fresh snapshot from the same live session.
317
+ - Use `description` for all interactions and extractions in deterministic scripts.
318
+ - Use `description` plus `schema` to persist an extraction descriptor. Bare `description` replays it.
319
+ - Use `element` targets only during CLI exploration with a fresh snapshot. Deterministic scripts use `description`.
320
+ - The SDK does expose `snapshot()`, but this workflow keeps element discovery in the CLI with `snapshot action`.
257
321
  - Prefer Opensteer methods over raw Playwright so browser, extraction, and replay semantics stay consistent.