opensteer 0.8.5 → 0.8.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/{chunk-C7GWMSTV.js → chunk-KYRC6CLB.js} +7946 -7899
- package/dist/chunk-KYRC6CLB.js.map +1 -0
- package/dist/cli/bin.cjs +279 -233
- package/dist/cli/bin.cjs.map +1 -1
- package/dist/cli/bin.js +21 -21
- package/dist/cli/bin.js.map +1 -1
- package/dist/index.cjs +8020 -7966
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +78 -11
- package/dist/index.d.ts +78 -11
- package/dist/index.js +3 -3
- package/dist/index.js.map +1 -1
- package/package.json +7 -7
- package/skills/opensteer/SKILL.md +10 -9
- package/skills/opensteer/references/cli-reference.md +37 -6
- package/skills/opensteer/references/request-workflow.md +2 -0
- package/skills/opensteer/references/sdk-reference.md +2 -0
- package/dist/chunk-C7GWMSTV.js.map +0 -1
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "opensteer",
|
|
3
|
-
"version": "0.8.
|
|
3
|
+
"version": "0.8.6",
|
|
4
4
|
"description": "Opensteer browser automation, replay, and reverse-engineering toolkit.",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"type": "module",
|
|
@@ -56,14 +56,14 @@
|
|
|
56
56
|
"sharp": "^0.34.5",
|
|
57
57
|
"skills": "^1.4.6",
|
|
58
58
|
"ws": "^8.18.0",
|
|
59
|
-
"@opensteer/engine-playwright": "0.8.
|
|
60
|
-
"@opensteer/runtime-core": "0.1.
|
|
59
|
+
"@opensteer/engine-playwright": "0.8.1",
|
|
60
|
+
"@opensteer/runtime-core": "0.1.1"
|
|
61
61
|
},
|
|
62
62
|
"optionalDependencies": {
|
|
63
63
|
"webcrack": "^2.15.1"
|
|
64
64
|
},
|
|
65
65
|
"peerDependencies": {
|
|
66
|
-
"@opensteer/engine-abp": "0.8.
|
|
66
|
+
"@opensteer/engine-abp": "0.8.2"
|
|
67
67
|
},
|
|
68
68
|
"peerDependenciesMeta": {
|
|
69
69
|
"@opensteer/engine-abp": {
|
|
@@ -71,9 +71,9 @@
|
|
|
71
71
|
}
|
|
72
72
|
},
|
|
73
73
|
"devDependencies": {
|
|
74
|
-
"@opensteer/browser-core": "0.7.
|
|
75
|
-
"@opensteer/engine-abp": "0.8.
|
|
76
|
-
"@opensteer/protocol": "0.7.
|
|
74
|
+
"@opensteer/browser-core": "0.7.1",
|
|
75
|
+
"@opensteer/engine-abp": "0.8.2",
|
|
76
|
+
"@opensteer/protocol": "0.7.1"
|
|
77
77
|
},
|
|
78
78
|
"scripts": {
|
|
79
79
|
"build": "tsup && node ../../scripts/sync-package-skills.mjs",
|
|
@@ -10,9 +10,9 @@ argument-hint: "[goal]"
|
|
|
10
10
|
|
|
11
11
|
1. Run `snapshot action` or `snapshot extraction` first. The output is JSON with `{url, title, mode, html, counters}`. **Read the `html` field** — it is a clean filtered DOM with inline `c="N"` attributes marking every element. Do NOT parse the `counters` array for element discovery — it is verbose metadata.
|
|
12
12
|
2. Use `element` + `persistAsDescription` to act on elements. Use `extract()` with `description` + `schema` to extract data. Do NOT use `page.evaluate()`, CSS selectors, or raw DOM parsing when `extract()` can express the output.
|
|
13
|
-
3.
|
|
13
|
+
3. Extraction schemas are **literal**. If you provide 2 template rows, you get exactly 2 rows back. The framework consolidates those templates into a generalized selector behind the scenes and saves it as a descriptor. Replaying with `description` alone (no schema) uses that generalized selector to return **ALL** matching rows.
|
|
14
14
|
4. `persistAsDescription` requires the verbose `opensteer run dom.*` syntax. The short CLI commands (`click`, `input`, etc.) do NOT support it.
|
|
15
|
-
5. Phase 1 = CLI exploration (snapshot, act, extract). Phase 2 = deterministic
|
|
15
|
+
5. Phase 1 = CLI exploration (snapshot, act, extract with schema). Phase 2 = deterministic replay using `description` alone returns all matching data. No snapshots in Phase 2.
|
|
16
16
|
|
|
17
17
|
If invoked directly, treat `$ARGUMENTS` as the concrete browser or replay goal. First decide whether the task is primarily DOM automation, request capture/replay, or workspace browser administration.
|
|
18
18
|
|
|
@@ -73,23 +73,23 @@ opensteer snapshot extraction --workspace demo
|
|
|
73
73
|
# → Read html field: <div c="13">Apple AirPods</div> <span c="14">$189.99</span> ...
|
|
74
74
|
|
|
75
75
|
opensteer extract --workspace demo --description "search results" \
|
|
76
|
-
--schema-json '{"items":[{"name":{"element":13},"price":{"element":14}}]}'
|
|
77
|
-
# → Returns
|
|
76
|
+
--schema-json '{"items":[{"name":{"element":13},"price":{"element":14}},{"name":{"element":22},"price":{"element":23}}]}'
|
|
77
|
+
# → Returns exactly 2 rows (the literal template values)
|
|
78
|
+
# → Behind the scenes: consolidates templates into a generalized selector and saves it as a descriptor
|
|
78
79
|
|
|
79
80
|
opensteer close --workspace demo
|
|
80
81
|
```
|
|
81
82
|
|
|
82
|
-
**Phase 2 — Deterministic
|
|
83
|
+
**Phase 2 — Deterministic replay (reusable):**
|
|
83
84
|
|
|
84
85
|
1. Use `description` alone for all interactions — resolves from cached descriptors.
|
|
85
|
-
2. Use `description
|
|
86
|
-
3.
|
|
87
|
-
4. No snapshot calls needed in scripts. Just descriptions.
|
|
86
|
+
2. Use `description` alone for extraction replay — uses the generalized selector to return **ALL** matching rows.
|
|
87
|
+
3. No snapshot calls needed. Just descriptions.
|
|
88
88
|
|
|
89
89
|
## Shared Rules
|
|
90
90
|
|
|
91
91
|
- The short CLI commands (`click`, `input`, etc.) accept exactly one of `--element`, `--selector`, or `--description`. Use `opensteer run dom.*` with `--input-json` when you need `persistAsDescription`.
|
|
92
|
-
- For extraction, `description + schema`
|
|
92
|
+
- For extraction, `description + schema` returns literal template values and saves a generalized extraction descriptor. `description` alone replays the descriptor and returns ALL matching rows.
|
|
93
93
|
- Extraction schemas are explicit JSON objects and arrays. Each leaf must be `{ element: N }`, `{ selector: "..." }`, optional `attribute`, or `{ source: "current_url" }`.
|
|
94
94
|
- Persisted extraction replay is deterministic and snapshot-backed. Do not replace `extract()` with `evaluate()` or custom DOM parsing when the desired output fits the extraction schema.
|
|
95
95
|
- Use recipes for deterministic setup work. Use auth recipes for auth refresh/setup specifically. They live in separate registries.
|
|
@@ -102,3 +102,4 @@ opensteer close --workspace demo
|
|
|
102
102
|
- Using `page.evaluate()` or CSS selectors instead of `extract()`. Use extract with element-based schemas.
|
|
103
103
|
- Forgetting to re-snapshot after navigation. Always re-snapshot before targeting new elements.
|
|
104
104
|
- Using short CLI (`click`, `input`) when `persistAsDescription` is needed. Use `opensteer run dom.*`.
|
|
105
|
+
- Expecting `extract --schema-json` with array templates to return all rows. The schema is literal — you get back exactly the rows you specified. Use description-only replay (`extract --description`) to get all matching rows.
|
|
@@ -45,6 +45,8 @@ opensteer close --workspace demo
|
|
|
45
45
|
- Replay cached actions with `--description` alone — no snapshot needed.
|
|
46
46
|
- `extract --description --schema-json` writes a persisted extraction descriptor.
|
|
47
47
|
- `extract --description` replays the stored extraction.
|
|
48
|
+
- Saved-network persistence (`network.save`, `network.query` with `source: "saved"`, and `network.clear`) is SQLite-backed and initializes on first use.
|
|
49
|
+
- Generic workspace and browser commands do not require SQLite capability unless they touch saved-network persistence.
|
|
48
50
|
|
|
49
51
|
## Snapshot Output — What To Read
|
|
50
52
|
|
|
@@ -174,10 +176,12 @@ opensteer run request.execute --workspace demo \
|
|
|
174
176
|
- Use `run page.goto` when you need `networkTag` on navigation. The short `goto` form only parses the URL positional.
|
|
175
177
|
- Use `run dom.click` / `run dom.input` / `run dom.hover` / `run dom.scroll` when you need `persistAsDescription`.
|
|
176
178
|
|
|
177
|
-
## Extraction Schema
|
|
179
|
+
## Extraction Schema
|
|
178
180
|
|
|
179
181
|
Always run `snapshot extraction` before building a schema — you need the `c="N"` counter values from the HTML.
|
|
180
182
|
|
|
183
|
+
Schemas are **literal**: each `element` reference points to one specific DOM element, and you get back exactly the values you asked for. The schema is not a prompt or pattern — it is a precise specification.
|
|
184
|
+
|
|
181
185
|
Flat field bindings:
|
|
182
186
|
|
|
183
187
|
```bash
|
|
@@ -190,7 +194,13 @@ opensteer extract --workspace demo \
|
|
|
190
194
|
--schema-json '{"url":{"selector":"a.primary","attribute":"href"},"pageUrl":{"source":"current_url"}}'
|
|
191
195
|
```
|
|
192
196
|
|
|
193
|
-
Array
|
|
197
|
+
## Array Extraction (Two-Step Process)
|
|
198
|
+
|
|
199
|
+
Extracting lists of items (product rows, search results, etc.) is a two-step process:
|
|
200
|
+
|
|
201
|
+
**Step 1 — Teach the pattern (schema + description):**
|
|
202
|
+
|
|
203
|
+
Provide 2 structurally similar rows as templates. The extractor returns exactly those rows (literal), but behind the scenes consolidates the templates into a generalized selector and saves it as a descriptor.
|
|
194
204
|
|
|
195
205
|
```bash
|
|
196
206
|
opensteer extract --workspace demo \
|
|
@@ -198,31 +208,52 @@ opensteer extract --workspace demo \
|
|
|
198
208
|
--schema-json '{"items":[{"name":{"element":13},"price":{"element":14}},{"name":{"element":22},"price":{"element":23}}]}'
|
|
199
209
|
```
|
|
200
210
|
|
|
201
|
-
|
|
211
|
+
Returns exactly 2 rows (the literal template values):
|
|
212
|
+
```json
|
|
213
|
+
{
|
|
214
|
+
"items": [
|
|
215
|
+
{"name": "Apple AirPods Max", "price": "$549.99"},
|
|
216
|
+
{"name": "Apple AirPods Pro", "price": "$249.99"}
|
|
217
|
+
]
|
|
218
|
+
}
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
**Step 2 — Replay to get all rows (description only, no schema):**
|
|
222
|
+
|
|
223
|
+
Replay the saved descriptor. The generalized selector finds ALL matching rows on the page:
|
|
202
224
|
|
|
225
|
+
```bash
|
|
226
|
+
opensteer extract --workspace demo --description "product list"
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
Returns all matching rows:
|
|
203
230
|
```json
|
|
204
231
|
{
|
|
205
232
|
"items": [
|
|
206
233
|
{"name": "Apple AirPods Max", "price": "$549.99"},
|
|
207
234
|
{"name": "Apple AirPods Pro", "price": "$249.99"},
|
|
208
235
|
{"name": "Apple AirPods 4", "price": "$129.99"},
|
|
209
|
-
|
|
236
|
+
{"name": "Apple AirPods 4 (ANC)", "price": "$179.99"}
|
|
210
237
|
]
|
|
211
238
|
}
|
|
212
239
|
```
|
|
213
240
|
|
|
241
|
+
This replay is deterministic and works across page updates, pagination, and different page states.
|
|
242
|
+
|
|
214
243
|
Rules:
|
|
215
244
|
|
|
216
245
|
- Build the exact JSON shape you want. The extractor does not accept `"string"` or prompt-style schemas.
|
|
217
246
|
- Each leaf must be `{ element: N }`, `{ selector: "..." }`, optional `attribute`, or `{ source: "current_url" }`.
|
|
218
247
|
- Use `element` fields during CLI exploration with a fresh snapshot. Deterministic scripts use `description`.
|
|
219
|
-
- For arrays, provide
|
|
248
|
+
- For arrays, provide 2 representative objects from different positions in the list. This gives the consolidator enough structural signal to generalize. Add more when rows have structural variants.
|
|
220
249
|
- Nested arrays are not supported.
|
|
250
|
+
- Do NOT expect the initial `extract --schema-json` call to return all rows. It returns exactly the template rows. Use description-only replay for the full list.
|
|
221
251
|
|
|
222
252
|
## What NOT To Do
|
|
223
253
|
|
|
224
254
|
- Do NOT use `page.evaluate()` to scrape DOM data. Use `extract()` with element-based schemas.
|
|
225
255
|
- Do NOT parse the `counters` array to find elements. Read the `html` string and find `c="N"` values.
|
|
226
256
|
- Do NOT use CSS selectors in reusable scripts. Use `description` from cached descriptors.
|
|
227
|
-
- Do NOT write loops to enumerate list items. Use array extraction
|
|
257
|
+
- Do NOT write loops to enumerate list items. Use array extraction: provide 2 template rows in the schema, then replay with description only to get all rows.
|
|
258
|
+
- Do NOT expect `extract --schema-json` with array templates to return all rows. It returns exactly the templates. Replay with `--description` alone for the full list.
|
|
228
259
|
- Do NOT skip re-snapshot after navigation. Always re-snapshot before targeting new elements.
|
|
@@ -96,6 +96,8 @@ await opensteer.saveNetwork({
|
|
|
96
96
|
});
|
|
97
97
|
```
|
|
98
98
|
|
|
99
|
+
Saved-network persistence is SQLite-backed and initializes on first use. Generic workspace and browser flows do not require SQLite capability unless they touch saved-network persistence.
|
|
100
|
+
|
|
99
101
|
6. Replay the plan from code.
|
|
100
102
|
|
|
101
103
|
```ts
|
|
@@ -35,6 +35,8 @@ const opensteer = new Opensteer({
|
|
|
35
35
|
- `opensteer.browser.status()`, `clone()`, `reset()`, and `delete()` manage the persistent workspace browser.
|
|
36
36
|
- `close()` shuts the current session and, for persistent workspaces, closes the live browser process.
|
|
37
37
|
- `disconnect()` detaches local runtime handles and leaves the workspace/browser files intact.
|
|
38
|
+
- Saved-network persistence is SQLite-backed and initializes on first `saveNetwork()`, saved-source `queryNetwork()`, or `clearNetwork()` use.
|
|
39
|
+
- Generic workspace and browser helpers do not require SQLite capability unless they touch saved-network persistence.
|
|
38
40
|
- The current public SDK does not expose `Opensteer.attach()`, cloud session helpers, or the ABP engine.
|
|
39
41
|
|
|
40
42
|
## DOM Automation And Extraction
|