opensteer 0.8.2 → 0.8.4
- package/README.md +7 -6
- package/dist/{chunk-X3G6QSCF.js → chunk-C7GWMSTV.js} +1700 -577
- package/dist/chunk-C7GWMSTV.js.map +1 -0
- package/dist/cli/bin.cjs +6115 -4757
- package/dist/cli/bin.cjs.map +1 -1
- package/dist/cli/bin.js +320 -52
- package/dist/cli/bin.js.map +1 -1
- package/dist/index.cjs +1751 -614
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +484 -539
- package/dist/index.d.ts +484 -539
- package/dist/index.js +30 -26
- package/dist/index.js.map +1 -1
- package/package.json +5 -4
- package/skills/opensteer/SKILL.md +66 -13
- package/skills/opensteer/references/cli-reference.md +115 -20
- package/skills/opensteer/references/request-workflow.md +35 -15
- package/skills/opensteer/references/sdk-reference.md +85 -21
- package/dist/chunk-X3G6QSCF.js.map +0 -1
|
@@ -5,32 +5,101 @@ Use the CLI when you need a fast JSON-first loop against a repo-local workspace
|
|
|
5
5
|
## Sections
|
|
6
6
|
|
|
7
7
|
- [Quickstart](#quickstart)
|
|
8
|
+
- [Snapshot Output — What To Read](#snapshot-output--what-to-read)
|
|
9
|
+
- [End-to-End Example](#end-to-end-example)
|
|
8
10
|
- [Browser Lifecycle And Profile Cloning](#browser-lifecycle-and-profile-cloning)
|
|
9
11
|
- [Browser Modes](#browser-modes)
|
|
10
12
|
- [Advanced Semantic Operations](#advanced-semantic-operations)
|
|
11
|
-
- [Extraction Schema
|
|
13
|
+
- [Extraction Schema And Array Auto-Generalization](#extraction-schema-and-array-auto-generalization)
|
|
12
14
|
|
|
13
15
|
## Quickstart
|
|
14
16
|
|
|
15
17
|
```bash
|
|
16
|
-
opensteer browser status --workspace demo
|
|
17
18
|
opensteer open https://example.com --workspace demo
|
|
18
19
|
opensteer snapshot action --workspace demo
|
|
19
|
-
|
|
20
|
-
|
|
20
|
+
# Read the "html" field in the JSON output. Find elements by their c="N" attributes.
|
|
21
|
+
# Example: <input c="5" placeholder="Search"> means element 5 is the search input.
|
|
22
|
+
# Example: <button c="7">Search</button> means element 7 is the search button.
|
|
23
|
+
|
|
24
|
+
# Act on elements AND persist their paths with human-readable descriptions
|
|
25
|
+
opensteer run dom.input --workspace demo \
|
|
26
|
+
--input-json '{"target":{"kind":"element","element":5},"text":"search term","persistAsDescription":"search input"}'
|
|
27
|
+
opensteer run dom.click --workspace demo \
|
|
28
|
+
--input-json '{"target":{"kind":"element","element":7},"persistAsDescription":"search button"}'
|
|
29
|
+
|
|
30
|
+
# Re-snapshot after navigation, then persist an extraction descriptor
|
|
21
31
|
opensteer snapshot extraction --workspace demo
|
|
22
32
|
opensteer extract --workspace demo \
|
|
23
33
|
--description "page summary" \
|
|
24
|
-
--schema-json '{"title":{"
|
|
34
|
+
--schema-json '{"title":{"element":3},"url":{"source":"current_url"}}'
|
|
35
|
+
|
|
36
|
+
# Replay later with just descriptions — no snapshot needed
|
|
37
|
+
opensteer click --workspace demo --description "search button"
|
|
25
38
|
opensteer extract --workspace demo --description "page summary"
|
|
26
39
|
opensteer close --workspace demo
|
|
27
40
|
```
|
|
28
41
|
|
|
29
42
|
- Stateful CLI commands currently require `--workspace <id>`.
|
|
30
|
-
-
|
|
31
|
-
- Use `
|
|
32
|
-
-
|
|
33
|
-
- `extract --description
|
|
43
|
+
- Use `snapshot action` to discover page elements during exploration.
|
|
44
|
+
- Use `opensteer run dom.*` with `persistAsDescription` to cache element paths under descriptions.
|
|
45
|
+
- Replay cached actions with `--description` alone — no snapshot needed.
|
|
46
|
+
- `extract --description --schema-json` writes a persisted extraction descriptor.
|
|
47
|
+
- `extract --description` replays the stored extraction.
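The `--input-json` payloads in the quickstart are ordinary JSON, so they can be generated rather than hand-typed. The sketch below builds the `dom.input` command line shown above. The helper names `shellSingleQuote` and `domInputCommand` are hypothetical and not part of the opensteer package, and the workspace id is assumed to be a simple token that needs no quoting.

```ts
// Hypothetical helpers for illustration; not part of the opensteer package.

// Wrap a string in single quotes for a POSIX shell, escaping embedded single quotes.
function shellSingleQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Build the `opensteer run dom.input` command used in the quickstart.
// Assumes `workspace` is a plain identifier such as "demo".
function domInputCommand(
  workspace: string,
  element: number,
  text: string,
  description: string,
): string {
  const payload = {
    target: { kind: "element", element },
    text,
    persistAsDescription: description,
  };
  return (
    "opensteer run dom.input --workspace " +
    workspace +
    " --input-json " +
    shellSingleQuote(JSON.stringify(payload))
  );
}

console.log(domInputCommand("demo", 5, "search term", "search input"));
```

Generating the payload with `JSON.stringify` avoids quoting mistakes when the typed text itself contains quotes.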
|
|
48
|
+
|
|
49
|
+
## Snapshot Output — What To Read
|
|
50
|
+
|
|
51
|
+
`snapshot action` and `snapshot extraction` both return JSON:
|
|
52
|
+
|
|
53
|
+
```json
|
|
54
|
+
{
|
|
55
|
+
"url": "https://example.com/search?q=airpods",
|
|
56
|
+
"title": "Search Results",
|
|
57
|
+
"mode": "extraction",
|
|
58
|
+
"html": "<span c=\"12\">$549.99</span>\n<a c=\"15\" href=\"/p/product-1\">\n <div c=\"16\">Apple AirPods Max</div>\n</a>\n<a c=\"18\" href=\"/b/apple\">Apple</a>...",
|
|
59
|
+
"counters": [{"element":12,"tagName":"SPAN","pathHint":"span",...}, ...]
|
|
60
|
+
}
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
**Read the `html` field.** It is a clean, filtered DOM. Hidden elements, scripts, and styles are already removed. Every element has a `c="N"` attribute.
|
|
64
|
+
|
|
65
|
+
- `c="N"` in the HTML = `element: N` in commands and extraction schemas
|
|
66
|
+
- `snapshot action` keeps interactive elements (buttons, inputs, links) for clicking/typing
|
|
67
|
+
- `snapshot extraction` keeps all visible content (text, prices, titles) for data extraction
|
|
68
|
+
- Do NOT parse the `counters` array to find elements — it is verbose metadata. Read the HTML string, find the `c="N"` values, and use those numbers.
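As a rough illustration of reading the `html` field, the sketch below lists every `c="N"` value together with its tag name. `listCounters` is a hypothetical helper and not an opensteer API. It relies on a simple regular expression and assumes attribute values do not contain `>`.

```ts
// Hypothetical helper for illustration; not part of the opensteer package.
interface CounterHit {
  element: number; // the N in c="N"
  tag: string; // lower-cased tag name
}

// Scan a snapshot `html` string for opening tags that carry a c="N" attribute.
function listCounters(html: string): CounterHit[] {
  const hits: CounterHit[] = [];
  // Opening tag, then any attributes, then whitespace followed by c="digits".
  const pattern = /<([a-zA-Z][\w-]*)\b[^>]*?\sc="(\d+)"[^>]*>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    hits.push({ tag: match[1].toLowerCase(), element: Number(match[2]) });
  }
  return hits;
}

const sampleHtml =
  '<span c="12">$549.99</span>\n<a c="15" href="/p/product-1">\n  <div c="16">Apple AirPods Max</div>\n</a>';
console.log(listCounters(sampleHtml));
```

The numbers it reports are the same values passed as `element` in commands and extraction schemas.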
|
|
69
|
+
|
|
70
|
+
## End-to-End Example
|
|
71
|
+
|
|
72
|
+
Goal: go to a site, search for a product, extract all results.
|
|
73
|
+
|
|
74
|
+
```bash
|
|
75
|
+
# 1. Open the page
|
|
76
|
+
opensteer open https://example.com --workspace demo
|
|
77
|
+
|
|
78
|
+
# 2. Snapshot to discover elements
|
|
79
|
+
opensteer snapshot action --workspace demo
|
|
80
|
+
# Read html field. Find: <input c="5" placeholder="Search"> and <button c="7">Search</button>
|
|
81
|
+
|
|
82
|
+
# 3. Type search term and persist the element path
|
|
83
|
+
opensteer run dom.input --workspace demo \
|
|
84
|
+
--input-json '{"target":{"kind":"element","element":5},"text":"airpods","pressEnter":true,"persistAsDescription":"search input"}'
|
|
85
|
+
|
|
86
|
+
# 4. Re-snapshot the results page (always re-snapshot after navigation!)
|
|
87
|
+
opensteer snapshot extraction --workspace demo
|
|
88
|
+
# Read html field. Find product items with their c="N" values:
|
|
89
|
+
# <div c="13">Apple AirPods Max</div> <span c="14">$549.99</span>
|
|
90
|
+
# <div c="22">Apple AirPods Pro</div> <span c="23">$249.99</span>
|
|
91
|
+
|
|
92
|
+
# 5. Extract all results — array auto-generalizes from template rows
|
|
93
|
+
opensteer extract --workspace demo \
|
|
94
|
+
--description "search results" \
|
|
95
|
+
--schema-json '{"items":[{"name":{"element":13},"price":{"element":14}},{"name":{"element":22},"price":{"element":23}}]}'
|
|
96
|
+
# Returns ALL matching rows on the page, not just the 2 templates.
|
|
97
|
+
|
|
98
|
+
# 6. Close
|
|
99
|
+
opensteer close --workspace demo
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
Total: 6 commands.
|
|
34
103
|
|
|
35
104
|
## Browser Lifecycle And Profile Cloning
|
|
36
105
|
|
|
@@ -94,6 +163,9 @@ opensteer run network.query --workspace demo \
|
|
|
94
163
|
opensteer run request-plan.infer --workspace demo \
|
|
95
164
|
--input-json '{"recordId":"rec_123","key":"products.search","version":"v1"}'
|
|
96
165
|
|
|
166
|
+
opensteer run request-plan.infer --workspace demo \
|
|
167
|
+
--input-json '{"recordId":"rec_123","key":"products.search.portable","version":"v1","transport":"direct-http"}'
|
|
168
|
+
|
|
97
169
|
opensteer run request.execute --workspace demo \
|
|
98
170
|
--input-json '{"key":"products.search","query":{"q":"laptop"}}'
|
|
99
171
|
```
|
|
@@ -102,13 +174,11 @@ opensteer run request.execute --workspace demo \
|
|
|
102
174
|
- Use `run page.goto` when you need `networkTag` on navigation. The short `goto` form only parses the URL positional.
|
|
103
175
|
- Use `run dom.click` / `run dom.input` / `run dom.hover` / `run dom.scroll` when you need `persistAsDescription`.
|
|
104
176
|
|
|
105
|
-
## Extraction Schema
|
|
177
|
+
## Extraction Schema And Array Auto-Generalization
|
|
106
178
|
|
|
107
|
-
|
|
108
|
-
opensteer snapshot extraction --workspace demo
|
|
109
|
-
```
|
|
179
|
+
Always run `snapshot extraction` before building a schema — you need the `c="N"` counter values from the HTML.
|
|
110
180
|
|
|
111
|
-
|
|
181
|
+
Flat field bindings:
|
|
112
182
|
|
|
113
183
|
```bash
|
|
114
184
|
opensteer extract --workspace demo \
|
|
@@ -120,14 +190,39 @@ opensteer extract --workspace demo \
|
|
|
120
190
|
--schema-json '{"url":{"selector":"a.primary","attribute":"href"},"pageUrl":{"source":"current_url"}}'
|
|
121
191
|
```
|
|
122
192
|
|
|
123
|
-
|
|
193
|
+
Array extraction with auto-generalization:
|
|
124
194
|
|
|
125
195
|
```bash
|
|
126
196
|
opensteer extract --workspace demo \
|
|
127
|
-
--description "
|
|
128
|
-
--schema-json '{"items":[{"
|
|
197
|
+
--description "product list" \
|
|
198
|
+
--schema-json '{"items":[{"name":{"element":13},"price":{"element":14}},{"name":{"element":22},"price":{"element":23}}]}'
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
The extractor uses the 1-2 representative rows as **templates**, then automatically finds **ALL matching rows** on the page:
|
|
202
|
+
|
|
203
|
+
```json
|
|
204
|
+
{
|
|
205
|
+
"items": [
|
|
206
|
+
{"name": "Apple AirPods Max", "price": "$549.99"},
|
|
207
|
+
{"name": "Apple AirPods Pro", "price": "$249.99"},
|
|
208
|
+
{"name": "Apple AirPods 4", "price": "$129.99"},
|
|
209
|
+
...
|
|
210
|
+
]
|
|
211
|
+
}
|
|
129
212
|
```
|
|
130
213
|
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
-
|
|
214
|
+
Rules:
|
|
215
|
+
|
|
216
|
+
- Build the exact JSON shape you want. The extractor does not accept `"string"` or prompt-style schemas.
|
|
217
|
+
- Each leaf must be `{ element: N }` or `{ selector: "..." }`, either with an optional `attribute`, or `{ source: "current_url" }`.
|
|
218
|
+
- Use `element` fields during CLI exploration with a fresh snapshot. Deterministic scripts use `description`.
|
|
219
|
+
- For arrays, provide 1-2 representative objects. Provide two when repeated rows have structural variants.
|
|
220
|
+
- Nested arrays are not supported.
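The rules above can be checked before a schema is sent to the extractor. The following is a minimal sketch written only from those rules. `isLeaf` and `checkSchema` are hypothetical names, and the extractor's real validation may differ.

```ts
// Hypothetical validator derived from the rules above; not the package's own code.

// A leaf is { element: N }, { selector: "..." } (each with optional attribute),
// or { source: "current_url" }.
function isLeaf(value: unknown): boolean {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const v = value as Record<string, unknown>;
  const attributeOk = v.attribute === undefined || typeof v.attribute === "string";
  if (typeof v.element === "number") return Number.isInteger(v.element) && attributeOk;
  if (typeof v.selector === "string") return attributeOk;
  return v.source === "current_url";
}

// Walk a schema and collect human-readable problems; an empty list means it passed.
function checkSchema(value: unknown, path: string = "$", insideArray: boolean = false): string[] {
  if (typeof value === "string") {
    return [path + ": plain strings are not valid field descriptors"];
  }
  if (isLeaf(value)) return [];
  const problems: string[] = [];
  if (Array.isArray(value)) {
    if (insideArray) return [path + ": nested arrays are not supported"];
    value.forEach((row, index) => {
      problems.push(...checkSchema(row, path + "[" + index + "]", true));
    });
    return problems;
  }
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      problems.push(...checkSchema(record[key], path + "." + key, insideArray));
    }
    return problems;
  }
  return [path + ": expected an object, array, or field descriptor"];
}

console.log(checkSchema({ title: { element: 3 }, url: { source: "current_url" } }));
```

Running the check first turns a prompt-style schema such as `{"title":"string"}` into a readable error before any browser work starts.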
|
|
221
|
+
|
|
222
|
+
## What NOT To Do
|
|
223
|
+
|
|
224
|
+
- Do NOT use `page.evaluate()` to scrape DOM data. Use `extract()` with element-based schemas.
|
|
225
|
+
- Do NOT parse the `counters` array to find elements. Read the `html` string and find `c="N"` values.
|
|
226
|
+
- Do NOT use CSS selectors in reusable scripts. Use `description` from cached descriptors.
|
|
227
|
+
- Do NOT write loops to enumerate list items. Use array extraction with 1-2 template rows.
|
|
228
|
+
- Do NOT skip re-snapshot after navigation. Always re-snapshot before targeting new elements.
|
|
@@ -19,10 +19,10 @@ Use this workflow when the deliverable is a custom API, a replayable request pla
|
|
|
19
19
|
2. Tag the important navigation or interactions with `networkTag`.
|
|
20
20
|
3. Inspect the captured traffic and isolate the relevant records.
|
|
21
21
|
4. Save useful captures to the workspace if they need to survive later analysis.
|
|
22
|
-
5.
|
|
23
|
-
6.
|
|
22
|
+
5. Probe the request with `rawRequest()` — try `direct-http` first, then `context-http`.
|
|
23
|
+
6. Infer a request plan from the probed record — pass `transport` if you proved portability.
|
|
24
24
|
7. Add recipes or auth recipes if replay needs deterministic setup.
|
|
25
|
-
8. Replay the plan from code.
|
|
25
|
+
8. Replay the plan from code — works immediately, no extra steps.
|
|
26
26
|
|
|
27
27
|
This workflow should carry equal weight with DOM automation. Use it whenever the browser page is only the launcher for the real target request.
|
|
28
28
|
|
|
@@ -47,7 +47,6 @@ await opensteer.goto({
|
|
|
47
47
|
});
|
|
48
48
|
|
|
49
49
|
await opensteer.click({
|
|
50
|
-
selector: "button.load-products",
|
|
51
50
|
description: "load products",
|
|
52
51
|
networkTag: "products-load",
|
|
53
52
|
});
|
|
@@ -63,26 +62,29 @@ const records = await opensteer.queryNetwork({
|
|
|
63
62
|
});
|
|
64
63
|
```
|
|
65
64
|
|
|
66
|
-
3.
|
|
65
|
+
3. Probe the request — try `direct-http` first to test portability.
|
|
67
66
|
|
|
68
67
|
```ts
|
|
69
68
|
const response = await opensteer.rawRequest({
|
|
70
|
-
transport: "
|
|
69
|
+
transport: "direct-http",
|
|
71
70
|
url: "https://example.com/api/products",
|
|
72
71
|
method: "POST",
|
|
73
72
|
body: {
|
|
74
73
|
json: { page: 1 },
|
|
75
74
|
},
|
|
76
75
|
});
|
|
76
|
+
// If direct-http returns 200, the API is portable (no browser needed).
|
|
77
|
+
// If it fails, try context-http — the API needs browser session state.
|
|
77
78
|
```
|
|
78
79
|
|
|
79
|
-
4.
|
|
80
|
+
4. Infer a request plan — pass `transport` if you proved portability.
|
|
80
81
|
|
|
81
82
|
```ts
|
|
82
83
|
await opensteer.inferRequestPlan({
|
|
83
|
-
recordId:
|
|
84
|
+
recordId: response.recordId,
|
|
84
85
|
key: "products.search",
|
|
85
86
|
version: "v1",
|
|
87
|
+
transport: "direct-http", // use the transport you proved works
|
|
86
88
|
});
|
|
87
89
|
```
|
|
88
90
|
|
|
@@ -117,17 +119,18 @@ await opensteer.runAuthRecipe({
|
|
|
117
119
|
## CLI Equivalents
|
|
118
120
|
|
|
119
121
|
```bash
|
|
120
|
-
opensteer open --workspace demo
|
|
122
|
+
opensteer open https://example.com/app --workspace demo
|
|
121
123
|
opensteer run page.goto --workspace demo \
|
|
122
124
|
--input-json '{"url":"https://example.com/app","networkTag":"page-load"}'
|
|
123
|
-
opensteer
|
|
124
|
-
|
|
125
|
+
opensteer click --workspace demo --description "load products"
|
|
126
|
+
# or with networkTag: opensteer run dom.click --workspace demo \
|
|
127
|
+
# --input-json '{"target":{"kind":"description","description":"load products"},"networkTag":"products-load"}'
|
|
125
128
|
opensteer run network.query --workspace demo \
|
|
126
|
-
--input-json '{"tag":"products-load","includeBodies":true,"limit":20}'
|
|
129
|
+
--input-json '{"source":"saved","tag":"products-load","includeBodies":true,"limit":20}'
|
|
127
130
|
opensteer run request.raw --workspace demo \
|
|
128
|
-
--input-json '{"transport":"
|
|
131
|
+
--input-json '{"transport":"direct-http","url":"https://example.com/api/products","method":"POST","body":{"json":{"page":1}}}'
|
|
129
132
|
opensteer run request-plan.infer --workspace demo \
|
|
130
|
-
--input-json '{"recordId":"rec_123","key":"products.search","version":"v1"}'
|
|
133
|
+
--input-json '{"recordId":"rec_123","key":"products.search","version":"v1","transport":"direct-http"}'
|
|
131
134
|
opensteer run request.execute --workspace demo \
|
|
132
135
|
--input-json '{"key":"products.search","query":{"q":"laptop"}}'
|
|
133
136
|
```
|
|
@@ -152,6 +155,22 @@ const context = await opensteer.rawRequest({
|
|
|
152
155
|
|
|
153
156
|
If `direct-http` returns 200, the API is portable and does not need a browser for future calls. If only `context-http` works, the API depends on browser session state.
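The probing order can be expressed as a small fallback loop. This is a sketch under stated assumptions: `pickTransport` is a hypothetical helper, the `send` callback stands in for whatever issues the probe (for example a wrapper around `rawRequest()`), the `status` field on its result is assumed rather than taken from the SDK's types, and any 2xx status is treated as success.

```ts
// Hypothetical helper for illustration; not part of the opensteer package.
type Transport = "direct-http" | "context-http";

interface ProbeResult {
  transport: Transport;
  status: number;
}

// Try direct-http first, then context-http; return the first transport that succeeds.
// `send` is supplied by the caller and performs the actual request.
async function pickTransport(
  send: (transport: Transport) => Promise<{ status: number }>,
): Promise<ProbeResult | null> {
  const order: Transport[] = ["direct-http", "context-http"];
  for (const transport of order) {
    try {
      const { status } = await send(transport);
      // Assumption: any 2xx counts as a working probe.
      if (status >= 200 && status < 300) return { transport, status };
    } catch {
      // A thrown error is treated as a failed probe; move on to the next transport.
    }
  }
  return null; // neither transport worked
}
```

The transport it returns is the one to record when inferring the plan.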
|
|
154
157
|
|
|
158
|
+
After proving portability, infer the plan with an explicit transport override:
|
|
159
|
+
|
|
160
|
+
```ts
|
|
161
|
+
await opensteer.inferRequestPlan({
|
|
162
|
+
recordId: records.records[0]!.id,
|
|
163
|
+
key: "products.search.portable",
|
|
164
|
+
version: "v1",
|
|
165
|
+
transport: "direct-http",
|
|
166
|
+
});
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
```bash
|
|
170
|
+
opensteer run request-plan.infer --workspace demo \
|
|
171
|
+
--input-json '{"recordId":"rec_123","key":"products.search.portable","version":"v1","transport":"direct-http"}'
|
|
172
|
+
```
|
|
173
|
+
|
|
155
174
|
## Auth Token Acquisition
|
|
156
175
|
|
|
157
176
|
When you discover an auth endpoint, acquire a token and use it to probe for data APIs that may be behind auth:
|
|
@@ -211,7 +230,8 @@ Common mistakes:
|
|
|
211
230
|
Additional guidance:
|
|
212
231
|
|
|
213
232
|
- Capture the browser action first if authentication, cookies, or minted tokens may matter.
|
|
214
|
-
-
|
|
233
|
+
- Probe with `direct-http` first. If it works, pass `transport: "direct-http"` to `inferRequestPlan` so the plan is portable. If it fails, fall back to `context-http`.
|
|
215
234
|
- `inferRequestPlan()` throws if the key+version already exists. Catch the error or bump the version.
|
|
235
|
+
- Inferred plans are immediately usable — `request.execute` works right after inference.
|
|
216
236
|
- Use recipes when the request plan needs deterministic setup work. Use auth recipes when the setup is specifically auth refresh or login state.
|
|
217
237
|
- Stay in the DOM workflow only when the rendered page itself is the deliverable. Move here when the request is the durable artifact.
|
|
@@ -39,6 +39,12 @@ const opensteer = new Opensteer({
|
|
|
39
39
|
|
|
40
40
|
## DOM Automation And Extraction
|
|
41
41
|
|
|
42
|
+
Opensteer uses a two-phase workflow: **explore** with the CLI, then **replay** with the SDK.
|
|
43
|
+
|
|
44
|
+
### Phase 1 — Exploration (one-time, via CLI or setup script)
|
|
45
|
+
|
|
46
|
+
Run `opensteer snapshot action --workspace demo` from the CLI first. Read the `html` field in the JSON output — it is a clean filtered DOM with `c="N"` attributes. Use those counter numbers as the `element` parameter below. The SDK also exposes `snapshot()`, but this guide keeps discovery in the CLI so the DOM HTML is easy to inspect from the terminal.
|
|
47
|
+
|
|
42
48
|
```ts
|
|
43
49
|
import { Opensteer } from "opensteer";
|
|
44
50
|
|
|
@@ -48,21 +54,21 @@ const opensteer = new Opensteer({
|
|
|
48
54
|
});
|
|
49
55
|
|
|
50
56
|
await opensteer.open("https://example.com");
|
|
51
|
-
await opensteer.snapshot("action");
|
|
52
57
|
|
|
58
|
+
// element numbers come from c="N" values in the snapshot html field
|
|
53
59
|
await opensteer.click({
|
|
54
|
-
|
|
55
|
-
description: "primary button",
|
|
60
|
+
element: 3,
|
|
61
|
+
description: "primary button", // caches the element path
|
|
56
62
|
});
|
|
57
63
|
|
|
58
64
|
await opensteer.input({
|
|
59
|
-
|
|
60
|
-
description: "search input",
|
|
65
|
+
element: 7,
|
|
66
|
+
description: "search input", // caches the element path
|
|
61
67
|
text: "laptop",
|
|
62
68
|
pressEnter: true,
|
|
63
69
|
});
|
|
64
70
|
|
|
65
|
-
|
|
71
|
+
await opensteer.extract({
|
|
66
72
|
description: "page summary",
|
|
67
73
|
schema: {
|
|
68
74
|
title: { selector: "title" },
|
|
@@ -70,32 +76,83 @@ const data = await opensteer.extract({
|
|
|
70
76
|
},
|
|
71
77
|
});
|
|
72
78
|
|
|
73
|
-
|
|
74
|
-
|
|
79
|
+
await opensteer.close();
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
### Phase 2 — Deterministic replay (the actual reusable script)
|
|
83
|
+
|
|
84
|
+
Use `description` alone for everything — resolves from cached descriptors:
|
|
85
|
+
|
|
86
|
+
```ts
|
|
87
|
+
const opensteer = new Opensteer({
|
|
88
|
+
workspace: "demo",
|
|
89
|
+
rootDir: process.cwd(),
|
|
75
90
|
});
|
|
91
|
+
|
|
92
|
+
await opensteer.open("https://example.com");
|
|
93
|
+
|
|
94
|
+
await opensteer.click({ description: "primary button" });
|
|
95
|
+
await opensteer.input({ description: "search input", text: "laptop", pressEnter: true });
|
|
96
|
+
const data = await opensteer.extract({ description: "page summary" });
|
|
97
|
+
|
|
98
|
+
await opensteer.close();
|
|
76
99
|
```
|
|
77
100
|
|
|
78
101
|
DOM rules:
|
|
79
102
|
|
|
80
|
-
-
|
|
81
|
-
-
|
|
82
|
-
- Treat snapshots as planning artifacts. `extract()` reads current page state and replays persisted extraction descriptors from deterministic, snapshot-backed payloads.
|
|
83
|
-
- `selector + description` or `element + description` persists a DOM action descriptor. Bare `description` replays it later.
|
|
103
|
+
- Deterministic scripts use `description` for all interactions and extractions — no snapshots, no selectors.
|
|
104
|
+
- `element + description` persists a DOM action descriptor. Bare `description` replays it later.
|
|
84
105
|
- `description + schema` writes or updates a persisted extraction descriptor. Bare `description` replays it later.
|
|
106
|
+
- Use `element` targets only during the exploration phase with a fresh snapshot from the CLI.
|
|
85
107
|
- Keep DOM data collection in `extract()`, not `evaluate()` or raw page DOM parsing, when the result can be expressed as structured fields.
|
|
108
|
+
- CSS selectors exist as a low-level escape hatch but are not recommended for reusable scripts.
|
|
86
109
|
|
|
87
|
-
Supported extraction field shapes
|
|
110
|
+
Supported extraction field shapes:
|
|
88
111
|
|
|
89
|
-
- `{ element: N }`
|
|
112
|
+
- `{ element: N }` — requires a prior CLI snapshot; use during exploration only
|
|
90
113
|
- `{ element: N, attribute: "href" }`
|
|
91
114
|
- `{ selector: ".price" }`
|
|
92
115
|
- `{ selector: "img.hero", attribute: "src" }`
|
|
93
116
|
- `{ source: "current_url" }`
|
|
94
117
|
|
|
95
|
-
For arrays,
|
|
118
|
+
For arrays, provide 1-2 representative objects. The extractor auto-generalizes from these templates to find ALL matching rows on the page:
|
|
119
|
+
|
|
120
|
+
```ts
|
|
121
|
+
const results = await opensteer.extract({
|
|
122
|
+
description: "search results",
|
|
123
|
+
schema: {
|
|
124
|
+
items: [
|
|
125
|
+
{ name: { element: 13 }, price: { element: 14 } },
|
|
126
|
+
{ name: { element: 22 }, price: { element: 23 } },
|
|
127
|
+
],
|
|
128
|
+
},
|
|
129
|
+
});
|
|
130
|
+
// results.items contains ALL matching rows on the page, not just the 2 templates
|
|
131
|
+
```
|
|
96
132
|
|
|
97
133
|
Do not use `prompt` or semantic placeholder values such as `"string"` in the current public SDK. The extractor expects explicit schema objects, arrays, and field descriptors.
|
|
98
134
|
|
|
135
|
+
### What extract() Returns
|
|
136
|
+
|
|
137
|
+
`extract()` returns a plain JSON object matching your schema shape:
|
|
138
|
+
|
|
139
|
+
```ts
|
|
140
|
+
// Flat schema:
|
|
141
|
+
{ title: "Search Results", url: "https://..." }
|
|
142
|
+
|
|
143
|
+
// Array schema (auto-generalized from 1-2 templates):
|
|
144
|
+
{
|
|
145
|
+
items: [
|
|
146
|
+
{ name: "Apple AirPods Max", price: "$549.99" },
|
|
147
|
+
{ name: "Apple AirPods Pro", price: "$249.99" },
|
|
148
|
+
{ name: "Apple AirPods 4", price: "$129.99" },
|
|
149
|
+
// ... ALL matching rows
|
|
150
|
+
]
|
|
151
|
+
}
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
Use `extract()` for structured data. Do NOT use `evaluate()` or raw DOM parsing when `extract()` can express the result.
|
|
155
|
+
|
|
99
156
|
## Browser Admin
|
|
100
157
|
|
|
101
158
|
```ts
|
|
@@ -150,6 +207,13 @@ await opensteer.inferRequestPlan({
|
|
|
150
207
|
version: "v1",
|
|
151
208
|
});
|
|
152
209
|
|
|
210
|
+
await opensteer.inferRequestPlan({
|
|
211
|
+
recordId: records.records[0]!.id,
|
|
212
|
+
key: "products.search.portable",
|
|
213
|
+
version: "v1",
|
|
214
|
+
transport: "direct-http",
|
|
215
|
+
});
|
|
216
|
+
|
|
153
217
|
await opensteer.saveNetwork({
|
|
154
218
|
tag: "products-load",
|
|
155
219
|
});
|
|
@@ -199,7 +263,6 @@ Session and page control:
|
|
|
199
263
|
|
|
200
264
|
Interaction and extraction:
|
|
201
265
|
|
|
202
|
-
- `snapshot("action" | "extraction")`
|
|
203
266
|
- `click({ element | selector | description, networkTag? })`
|
|
204
267
|
- `hover({ element | selector | description, networkTag? })`
|
|
205
268
|
- `input({ element | selector | description, text, pressEnter?, networkTag? })`
|
|
@@ -219,8 +282,8 @@ Inspection and evaluation:
|
|
|
219
282
|
Request capture and replay:
|
|
220
283
|
|
|
221
284
|
- `rawRequest({ transport?, pageRef?, url, method?, headers?, body?, followRedirects? })`
|
|
222
|
-
- `inferRequestPlan({ recordId, key, version,
|
|
223
|
-
- `writeRequestPlan({ key, version, payload,
|
|
285
|
+
- `inferRequestPlan({ recordId, key, version, transport? })`
|
|
286
|
+
- `writeRequestPlan({ key, version, payload, tags?, provenance?, freshness? })`
|
|
224
287
|
- `getRequestPlan({ key, version? })`
|
|
225
288
|
- `listRequestPlans({ key? })`
|
|
226
289
|
- `request(key, { path?, query?, headers?, body? })`
|
|
@@ -251,7 +314,8 @@ Lifecycle:
|
|
|
251
314
|
|
|
252
315
|
- Wrap long-running browser ownership in `try/finally` and call `close()`.
|
|
253
316
|
- Use `networkTag` on actions that trigger requests you may inspect later.
|
|
254
|
-
- Use `description`
|
|
255
|
-
- Use `description` plus `schema`
|
|
256
|
-
- Use `element` targets only with a fresh snapshot
|
|
317
|
+
- Use `description` for all interactions and extractions in deterministic scripts.
|
|
318
|
+
- Use `description` plus `schema` to persist an extraction descriptor. Bare `description` replays it.
|
|
319
|
+
- Use `element` targets only during CLI exploration with a fresh snapshot. Deterministic scripts use `description`.
|
|
320
|
+
- The SDK does expose `snapshot()`, but this workflow keeps element discovery in the CLI with `snapshot action`.
|
|
257
321
|
- Prefer Opensteer methods over raw Playwright so browser, extraction, and replay semantics stay consistent.
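The `try/finally` guidance in the list above can be wrapped once and reused. The sketch below is generic: `withSession` and the `Closable` interface are hypothetical, and the only thing assumed about the session object is an async `close()` method like the one used in the examples above.

```ts
// Hypothetical helper for illustration; not part of the opensteer package.
interface Closable {
  close(): Promise<void>;
}

// Run `work` against a session and always close the session afterwards,
// even when `work` throws.
async function withSession<S extends Closable, T>(
  session: S,
  work: (session: S) => Promise<T>,
): Promise<T> {
  try {
    return await work(session);
  } finally {
    await session.close();
  }
}
```

A script would pass its session instance as the first argument and put its replay steps in the callback.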
|