opensteer 0.5.1 → 0.5.2
- package/CHANGELOG.md +3 -0
- package/README.md +123 -85
- package/bin/opensteer.mjs +87 -1
- package/dist/cli/skills-installer.cjs +230 -0
- package/dist/cli/skills-installer.d.cts +28 -0
- package/dist/cli/skills-installer.d.ts +28 -0
- package/dist/cli/skills-installer.js +201 -0
- package/package.json +8 -2
- package/skills/README.md +29 -0
- package/skills/electron/SKILL.md +85 -0
- package/skills/electron/references/opensteer-electron-recipes.md +86 -0
- package/skills/electron/references/opensteer-electron-workflow.md +85 -0
- package/skills/opensteer/SKILL.md +168 -0
- package/skills/opensteer/references/cli-reference.md +154 -0
- package/skills/opensteer/references/examples.md +116 -0
- package/skills/opensteer/references/sdk-reference.md +143 -0
@@ -0,0 +1,168 @@
---
name: opensteer
description: "Browser automation, web scraping, and structured data extraction using Opensteer CLI and SDK. Use when the agent needs to: navigate web pages, interact with elements (click, type, select, hover), extract structured data from pages, take snapshots or screenshots, manage browser tabs and cookies, or generate scraper/automation scripts. Also use when the user asks to create a scraper, automation script, or replay a browsing session as code."
---

# Opensteer Browser Automation

Opensteer provides persistent browser automation via a CLI and TypeScript SDK. It maintains browser sessions across calls and caches resolved element paths for deterministic replay.

## CRITICAL: Always Use Opensteer Methods Over Playwright

Opensteer methods are optimized for scraping — they handle waiting, element resolution, and selector caching automatically. **Never use raw Playwright when an Opensteer method exists.**

| Wrong (raw Playwright)                                                        | Right (Opensteer)                                              |
| ----------------------------------------------------------------------------- | -------------------------------------------------------------- |
| `page.evaluate(() => [...document.querySelectorAll('.item')].map(...))`       | `opensteer.extract({ description: "product listing" })`        |
| `page.click('.submit')`                                                       | `opensteer.click({ description: "the submit button" })`        |
| `page.fill('#search', 'query')`                                               | `opensteer.input({ description: "search input", text: "q" })`  |

**Why:** `opensteer.extract()` caches structural selectors that work across pages sharing the same template. Raw `querySelectorAll` is brittle, non-replayable, and bypasses the caching system. The only valid use of `opensteer.page.evaluate()` is calling `fetch()` for API-based extraction when a site has internal REST/GraphQL endpoints.

## Default Workflow

**Always use the CLI for exploration first. Only write scripts when the user asks.**

1. **Explore with CLI** — Open pages, snapshot, interact with elements interactively
2. **Cache selectors** — Re-run actions with `--description` flags to cache element paths for replay
3. **Cache extractions** — Run `extract` with `--description` for every page type the scraper will visit
4. **Generate script** — Use cached descriptions in TypeScript (no counters needed)

**Namespace links CLI and SDK.** The `--name` flag on `opensteer open` defines the cache namespace. `new Opensteer({ name: "..." })` in the SDK reads from the same cache. These must match.

## CLI Exploration

```bash
# 1. Set session once per shell
export OPENSTEER_SESSION=my-session

# 2. Open page with namespace
opensteer open https://example.com/products --name "product-scraper"

# 3. Snapshot for interactions or data
opensteer snapshot action       # Interactive elements with counters
opensteer snapshot extraction   # Data-oriented HTML with counters

# 4. Interact using counter numbers from snapshot
opensteer click 3
opensteer input 5 "laptop" --pressEnter

# 5. Cache actions with --description for replay
opensteer click 3 --description "the products link"
opensteer input 5 "laptop" --pressEnter --description "the search input"

# 6. Extract data: snapshot extraction → identify counters → extract with schema
opensteer snapshot extraction
opensteer extract '{"products":[{"name":{"element":11},"price":{"element":12}},{"name":{"element":25},"price":{"element":26}}]}' \
  --description "product listing with name and price"

# 7. Cache extractions for ALL page types the scraper will visit
opensteer click 11 --description "first product link"
opensteer snapshot extraction
opensteer extract '{"title":{"element":3},"price":{"element":7}}' \
  --description "product detail page"

# 8. Always close when done
opensteer close
```

**Key rules:**

- Set `--name` on `open` to define cache namespace
- Specify snapshot mode explicitly: `action` (interactions) or `extraction` (data)
- `snapshot extraction` shows structure; `extract` produces JSON — never parse snapshot HTML manually
- Use `--description` to cache selectors for replay (one character difference = cache miss)
- For arrays, include all items in the schema — Opensteer caches the structural pattern and finds all matches on replay
- `open` does raw `page.goto()`; use `navigate` for subsequent pages (includes stability wait)
- Re-snapshot after navigation or significant page changes

## Writing Scraper Scripts

Read [sdk-reference.md](references/sdk-reference.md) for exact method signatures before writing any script.

### Template

```typescript
import { Opensteer } from "opensteer";

async function run() {
  const opensteer = new Opensteer({
    name: "product-scraper", // MUST match --name from CLI exploration
    storage: { rootDir: process.cwd() },
  });

  await opensteer.launch({ headless: false });

  try {
    await opensteer.goto("https://example.com/products");

    await opensteer.input({
      text: "laptop",
      description: "the search input", // exact match to CLI --description
    });

    // Use extract with description — no schema needed when cache exists
    const data = await opensteer.extract({
      description: "product listing with name and price",
    });

    console.log(JSON.stringify(data, null, 2));
  } finally {
    await opensteer.close();
  }
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

### Script Rules

- No top-level `await` — wrap in `async function run()` + `run().catch(...)`
- Default to `headless: false` (many sites block headless)
- Use cached `description` strings for all interactions and extractions
- Do NOT add wait calls before SDK actions — they handle waiting internally
- Use `opensteer.waitForText("literal text")` or `page.waitForSelector("css")` only for page transitions or confirming SPA content loaded
- Run with: `npx tsx scraper.ts`
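
Because a cached description must match character for character, one way to keep CLI and script strings in sync is to hold them in a single constant object. The sketch below is illustrative only: `DESCRIPTIONS` and `isCachedDescription` are hypothetical names, not part of the Opensteer SDK, and the strings are the ones cached in the CLI Exploration commands.

```typescript
// Descriptions cached during CLI exploration, copied verbatim from the commands.
const DESCRIPTIONS = {
  productsLink: "the products link",
  searchInput: "the search input",
  productListing: "product listing with name and price",
  firstProductLink: "first product link",
  productDetail: "product detail page",
} as const;

type Description = (typeof DESCRIPTIONS)[keyof typeof DESCRIPTIONS];

// Hypothetical guard: true only for an exact, character-for-character match.
function isCachedDescription(value: string): value is Description {
  return (Object.values(DESCRIPTIONS) as string[]).includes(value);
}

console.log(isCachedDescription("the search input"));  // true
console.log(isCachedDescription("the search input ")); // false (trailing space)
console.log(isCachedDescription("The search input"));  // false (capital letter)
```

A script would then pass `DESCRIPTIONS.searchInput` rather than retyping the string, so a typo becomes a compile-time error instead of a silent cache miss.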

## Browser Connection

- **Sandbox (default):** `opensteer open <url>` — fresh Chromium, no user sessions
- **Connect (existing browser):** `opensteer open --connect-url http://localhost:9222` — attach to a running CDP-enabled browser. Verify CDP: `curl -s http://127.0.0.1:9222/json/version`

## Element Targeting (preference order)

1. **Counter** (from snapshot): `click 5` — fast, needs fresh snapshot
2. **Description** (cached): `click --description "the submit button"` — replayable
3. **CSS selector**: `click --selector "#btn"` — explicit but brittle

## Snapshot Modes

```bash
opensteer snapshot action       # Interactable elements (default)
opensteer snapshot extraction   # Flattened HTML for data extraction
opensteer snapshot clickable    # Only clickable elements
opensteer snapshot scrollable   # Only scrollable containers
opensteer snapshot full         # Raw HTML — only for debugging
```

All modes except `full` are filtered to show only the relevant elements, each annotated with a counter.

## Debugging

When a scraper produces wrong or missing data, diagnose in this order:

1. **Timing** — SPA content not rendered. Add `waitForSelector` or `waitForText` before extraction.
2. **Missing cache** — Forgot to cache extraction during CLI exploration for a page type.
3. **Obstacles** — Cookie banners, modals, or login walls blocking the target.
4. **Missing data** — Some pages genuinely lack certain fields. Handle with null checks.

**Do NOT replace `opensteer.extract()` with `page.evaluate()` + `querySelectorAll` when debugging.** The extraction logic is not the problem — fix timing, caching, or obstacles instead.
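
For the missing-data case, a small normalization step can make absent fields explicit. This is a minimal sketch, assuming the extracted record has optional `title` and `price` string fields as in the product detail schema; `RawDetail`, `Detail`, and `normalizeDetail` are hypothetical names, and the actual shape returned by `opensteer.extract()` may differ.

```typescript
// Hypothetical shape of one extracted detail record; any field may be missing.
interface RawDetail {
  title?: string | null;
  price?: string | null;
}

interface Detail {
  title: string | null;
  price: string | null;
}

// Trim strings and collapse undefined or empty values to null, so downstream
// code can tell "field absent on this page" apart from real content.
function normalizeDetail(raw: RawDetail | null | undefined): Detail {
  const clean = (value: string | null | undefined): string | null => {
    const trimmed = value?.trim();
    return trimmed ? trimmed : null;
  };
  return { title: clean(raw?.title), price: clean(raw?.price) };
}

console.log(normalizeDetail({ title: "  Laptop  " })); // { title: 'Laptop', price: null }
```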

## Reference

- CLI commands: [cli-reference.md](references/cli-reference.md)
- SDK API: [sdk-reference.md](references/sdk-reference.md)
- Full examples: [examples.md](references/examples.md)
@@ -0,0 +1,154 @@
# Opensteer CLI Command Reference

All commands output JSON. Set session once per shell:

```bash
export OPENSTEER_SESSION=my-session
# Or for non-interactive runners:
export OPENSTEER_CLIENT_ID=agent-1
```

Global flags: `--session <id>`, `--name <namespace>`, `--headless`, `--description <text>`.

## Navigation

```bash
opensteer open <url>                                 # Open browser, navigate to URL
opensteer open <url> --name "my-scraper"             # Set selector cache namespace
opensteer open <url> --headless                      # Headless mode
opensteer open --connect-url http://localhost:9222   # Connect to running browser
opensteer navigate <url>                             # Navigate with visual stability wait
opensteer navigate <url> --timeout 60000             # Custom timeout (default 30s)
opensteer back                                       # Go back in history
opensteer forward                                    # Go forward in history
opensteer reload                                     # Reload page
opensteer close                                      # Close browser and stop server
opensteer close --all                                # Close all active sessions
opensteer sessions                                   # List active sessions
opensteer status                                     # Show resolved session/name state
```

`open` does raw `page.goto()` (no stability wait). `navigate` includes `waitForVisualStability`. Use `open` once to start, then `navigate` for subsequent pages.

## Observation

```bash
opensteer snapshot action         # Interactable elements (default mode)
opensteer snapshot extraction     # Flattened HTML for data scraping
opensteer snapshot clickable      # Only clickable elements
opensteer snapshot scrollable     # Only scrollable containers
opensteer snapshot full           # Minimal cleaning, full HTML
opensteer state                   # URL + title + cleaned HTML
opensteer screenshot              # Save screenshot to screenshot.png
opensteer screenshot output.png   # Save to specific file
opensteer screenshot --fullPage   # Full page screenshot
```

## Actions

The first positional argument is the element counter (`c="N"` from the snapshot).

```bash
opensteer click 5                                   # Click by counter
opensteer click --description "the submit button"   # By cached description
opensteer click 5 --button right                    # Right-click
opensteer click 5 --clickCount 2                    # Double-click
opensteer hover 4                                   # Hover over element
opensteer hover --description "the user menu"       # Hover by description
```

## Input

```bash
opensteer input 3 "Hello"                 # Type into element (clears first)
opensteer input 3 "Hello" --clear false   # Append text
opensteer input 3 "query" --pressEnter    # Type and press Enter
opensteer input --description "the search input" --text "query" --pressEnter
```

## Select / Scroll

```bash
opensteer select 9 --label "Option A"   # Select by visible label
opensteer select 9 --value "opt-a"      # Select by value attribute
opensteer select 9 --index 2            # Select by index
opensteer scroll                        # Scroll page down (default)
opensteer scroll --direction up         # Scroll up
opensteer scroll --direction down --amount 1000
opensteer scroll 12                     # Scroll within container element
```

## Keyboard

```bash
opensteer press Enter
opensteer press Tab
opensteer press Escape
opensteer press "Control+a"
opensteer type "Hello World"   # Type into focused element
```

## Element Info

```bash
opensteer get-text 5        # Get element text content
opensteer get-text --description "the heading"
opensteer get-value 3       # Get input/textarea value
opensteer get-attrs 5       # Get all HTML attributes
opensteer get-html          # Full page HTML
opensteer get-html "main"   # HTML of element matching selector
```

## Tabs

```bash
opensteer tabs                          # List open tabs with indices
opensteer tab-new                       # Open new blank tab
opensteer tab-new https://example.com   # Open URL in new tab
opensteer tab-switch 0                  # Switch to tab by index
opensteer tab-close                     # Close current tab
opensteer tab-close 2                   # Close specific tab
```

## Cookies

```bash
opensteer cookies                             # Get all cookies
opensteer cookies --url https://example.com   # Cookies for specific URL
opensteer cookie-set --name token --value abc123
opensteer cookies-clear                       # Clear all cookies
opensteer cookies-export /tmp/cookies.json    # Export to file
opensteer cookies-import /tmp/cookies.json    # Import from file
```

## Utility

```bash
opensteer eval "document.title"   # Execute JS in page
opensteer wait-for "Success"      # Wait for text to appear
opensteer wait-for "Success" --timeout 5000
opensteer wait-selector "h1"      # Wait for selector to appear
```

## Data Extraction

### Counter-based (preferred)

```bash
opensteer snapshot extraction
# Read counters from output, then:
opensteer extract '{"title":{"element":3},"price":{"element":7}}'
opensteer extract '{"url":{"element":5,"attribute":"href"}}'
opensteer extract '{"pageUrl":{"source":"current_url"},"title":{"element":3}}'

# Arrays: include multiple items to identify the pattern
opensteer extract '{"results":[{"title":{"element":11},"url":{"element":10,"attribute":"href"}},{"title":{"element":16},"url":{"element":15,"attribute":"href"}}]}'
```
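
Hand-writing the JSON argument gets error-prone as schemas grow. The sketch below shows one possible way to build it programmatically and quote it for a POSIX shell. `Field` and `extractArg` are hypothetical names introduced for illustration; only the field shapes (`element`, `attribute`, `source: "current_url"`) come from the commands above.

```typescript
// One field of a counter-based schema, mirroring the shapes used in the commands.
type Field =
  | { element: number; attribute?: string }
  | { source: "current_url" };

type Schema = Record<string, Field | Record<string, Field>[]>;

// Hypothetical helper: serialize the schema and wrap it in single quotes,
// escaping any embedded single quote as '\'' so the shell sees one argument.
function extractArg(schema: Schema): string {
  const json = JSON.stringify(schema);
  return `'${json.replace(/'/g, "'\\''")}'`;
}

const arg = extractArg({
  title: { element: 3 },
  url: { element: 5, attribute: "href" },
});

console.log(`opensteer extract ${arg}`);
// opensteer extract '{"title":{"element":3},"url":{"element":5,"attribute":"href"}}'
```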

### AI-based (limited and requires LLM API keys)

```bash
opensteer extract '{"title":"","price":""}' --description "product details"
```

Always prefer counter-based. AI extraction requires `@ai-sdk/*` packages and does NOT work from workspace root scripts.
@@ -0,0 +1,116 @@
# Opensteer Examples

## Full Workflow: CLI Exploration to Scraper Script

### Step 1: Explore and cache with CLI

```bash
export OPENSTEER_SESSION=eures-session

opensteer open https://europa.eu/eures/portal/jv-se/home --name "eures-jobs"

opensteer snapshot action
opensteer input 5 "software engineer" --pressEnter --description "the job search input"
opensteer click 12 --description "the search button"

# Wait for results, then extract job listings
opensteer snapshot extraction
opensteer extract '{"jobs":[{"title":{"element":20},"company":{"element":22},"url":{"element":20,"attribute":"href"}},{"title":{"element":35},"company":{"element":37},"url":{"element":35,"attribute":"href"}},{"title":{"element":50},"company":{"element":52},"url":{"element":50,"attribute":"href"}}]}' \
  --description "job listing with title company and url"

# Cache detail page extraction too
opensteer click 20 --description "first job link"
opensteer snapshot extraction
opensteer extract '{"title":{"element":3},"company":{"element":7},"location":{"element":12},"description":{"element":18}}' \
  --description "job detail page"

opensteer close
```

### Step 2: Generate replay script (same namespace, same descriptions)

```typescript
import { Opensteer } from "opensteer";

async function run() {
  const opensteer = new Opensteer({
    name: "eures-jobs",
    storage: { rootDir: process.cwd() },
  });

  await opensteer.launch({ headless: false });

  try {
    await opensteer.goto("https://europa.eu/eures/portal/jv-se/home");

    await opensteer.input({
      text: "software engineer",
      description: "the job search input",
    });
    await opensteer.click({ description: "the search button" });

    await opensteer.waitForText("Showing 1 to 10");

    // Extract all job listings using cached description — no schema needed
    const listings = await opensteer.extract({
      description: "job listing with title company and url",
    });

    // Visit each detail page and extract using cached description
    for (const job of listings.jobs) {
      await opensteer.goto(job.url);
      await opensteer.page.waitForSelector("h1");

      const detail = await opensteer.extract({
        description: "job detail page",
      });
      console.log(JSON.stringify(detail, null, 2));
    }
  } finally {
    await opensteer.close();
  }
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

## API-Based Extraction

When a site has internal APIs (REST, GraphQL, Algolia), navigate first for cookies, then use `fetch()` inside `page.evaluate()`. This is the only valid use of `page.evaluate()` for data.

```typescript
import { Opensteer } from "opensteer";

async function run() {
  const opensteer = new Opensteer({
    name: "api-scraper",
    storage: { rootDir: process.cwd() },
  });

  await opensteer.launch({ headless: false });

  try {
    // Navigate first to establish session cookies
    await opensteer.goto("https://example.com");

    const data = await opensteer.page.evaluate(async () => {
      const res = await fetch("https://api.example.com/search?q=shoes&limit=100", {
        headers: { "Content-Type": "application/json" },
      });
      return res.json();
    });

    console.log(JSON.stringify(data, null, 2));
  } finally {
    await opensteer.close();
  }
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
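
When the query terms vary between runs, it may be safer to build the API URL with `URL` and `searchParams` than to concatenate strings, so that spaces and special characters are encoded. The helper below is a sketch under that assumption; `buildSearchUrl` is a hypothetical name, and the endpoint is the placeholder used in the example above.

```typescript
// Hypothetical helper: build an encoded query URL in Node before it is
// handed to the browser-side fetch() call.
function buildSearchUrl(
  base: string,
  params: Record<string, string | number>,
): string {
  const url = new URL(base);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const searchUrl = buildSearchUrl("https://api.example.com/search", {
  q: "running shoes",
  limit: 100,
});

console.log(searchUrl);
// https://api.example.com/search?q=running+shoes&limit=100
```

Playwright's `page.evaluate` accepts a serializable value as its second argument, so the resulting string can be passed into the page function rather than hard-coded there. Checking `res.ok` before calling `res.json()` would also surface HTTP errors instead of a JSON parse failure.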
@@ -0,0 +1,143 @@
# Opensteer SDK API Reference

The SDK is the `Opensteer` class imported from `'opensteer'`. **Only the methods listed below exist.** Do NOT call CLI command names as SDK methods.

## Construction and Lifecycle

```typescript
const opensteer = new Opensteer({
  name: "my-scraper",
  storage: { rootDir: process.cwd() },
});
await opensteer.launch({ headless: false });
await opensteer.close();

// Or wrap an existing Playwright page:
const opensteer = Opensteer.from(existingPage, { name: "my-scraper" });
```

## Properties

```typescript
opensteer.page; // Raw Playwright Page — only for page.evaluate(fetch), page.waitForSelector, page.waitForTimeout
opensteer.context; // Raw Playwright BrowserContext
```

## Navigation

```typescript
await opensteer.goto(url); // Navigate + waitForVisualStability
await opensteer.goto(url, { timeout: 60000 }); // Custom timeout
```

## Observation

```typescript
const html = await opensteer.snapshot(); // Action mode (default)
const html = await opensteer.snapshot({ mode: "extraction" }); // Extraction mode
const state = await opensteer.state(); // { url, title, html }
const buffer = await opensteer.screenshot(); // PNG buffer
const jpeg = await opensteer.screenshot({ type: "jpeg", fullPage: true });
```

## Interactions

```typescript
await opensteer.click({ element: 5 });
await opensteer.click({ description: "the submit button" });
await opensteer.click({ selector: "#btn" });
await opensteer.dblclick({ element: 7 });
await opensteer.rightclick({ element: 7 });
await opensteer.hover({ element: 4 });
await opensteer.input({ element: 3, text: "Hello" });
await opensteer.input({ description: "search", text: "q", pressEnter: true });
await opensteer.select({ element: 9, label: "Option A" });
await opensteer.scroll();
await opensteer.scroll({ direction: "up", amount: 500 });
```

## Data Extraction

```typescript
// Replay from cached descriptions (preferred in scraper scripts)
const data = await opensteer.extract({
  description: "product details",
});

// Counter-based (during exploration or when no cache exists)
const data = await opensteer.extract({
  schema: { title: { element: 3 }, price: { element: 7 } },
  description: "product details",
});
```

Schema field types: `{ element: N }`, `{ element: N, attribute: "href" }`, `{ selector: ".price" }`, `{ source: "current_url" }`.

For arrays, include multiple items in the schema. Opensteer caches the structural pattern and expands to all matching items on replay.

## Keyboard

```typescript
await opensteer.pressKey("Enter");
await opensteer.pressKey("Control+a");
await opensteer.type("Hello World");
```

## Element Info

```typescript
const text = await opensteer.getElementText({ element: 5 });
const value = await opensteer.getElementValue({ element: 3 });
const attrs = await opensteer.getElementAttributes({ element: 5 });
const box = await opensteer.getElementBoundingBox({ element: 5 });
const html = await opensteer.getHtml();
const html = await opensteer.getHtml("main");
const title = await opensteer.getTitle();
```

## Wait

**Do NOT use wait calls before SDK actions** — each action handles waiting internally. Only use explicit waits for page transitions or confirming SPA content loaded.

```typescript
await opensteer.waitForText("Success"); // Literal text on page
await opensteer.waitForText("Success", { timeout: 5000 });
await opensteer.page.waitForSelector("article"); // CSS selector
await opensteer.page.waitForSelector(".loading", { state: "hidden" });
```

## Tabs

```typescript
const tabs = await opensteer.tabs();
await opensteer.newTab("https://example.com");
await opensteer.switchTab(0);
await opensteer.closeTab(1);
```

## Cookies

```typescript
const cookies = await opensteer.getCookies();
await opensteer.setCookie({ name: "token", value: "abc" });
await opensteer.clearCookies();
await opensteer.exportCookies("/tmp/cookies.json");
await opensteer.importCookies("/tmp/cookies.json");
```

## File Upload

```typescript
await opensteer.uploadFile({ element: 5, paths: ["/path/to/file.pdf"] });
```

## Methods That DO NOT Exist

| Wrong (throws)                   | Correct                                |
| -------------------------------- | -------------------------------------- |
| `opensteer.evaluate(...)`        | `opensteer.page.evaluate(...)`         |
| `opensteer.waitForSelector(...)` | `opensteer.page.waitForSelector(...)`  |
| `opensteer.waitForLoad(...)`     | `opensteer.page.waitForLoadState(...)` |
| `opensteer.navigate(...)`        | `opensteer.goto(...)`                  |
| `opensteer.browser_launch(...)`  | `opensteer.launch(...)`                |
| `opensteer.browser_close(...)`   | `opensteer.close(...)`                 |
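
Since CLI command names are not SDK methods, a lookup table can help when translating an exploration session into a script. The mapping below is a sketch inferred by pairing the CLI reference sections with the SDK sections of the same name; it is not an official table, and `CLI_TO_SDK` and `sdkMethodFor` are hypothetical names.

```typescript
// CLI command name -> SDK method, inferred from the matching reference
// sections (Navigation, Keyboard, Element Info, Tabs, Cookies, Utility).
const CLI_TO_SDK: Record<string, string> = {
  navigate: "goto",
  press: "pressKey",
  "get-text": "getElementText",
  "get-value": "getElementValue",
  "get-attrs": "getElementAttributes",
  "get-html": "getHtml",
  "tab-new": "newTab",
  "tab-switch": "switchTab",
  "tab-close": "closeTab",
  cookies: "getCookies",
  "cookie-set": "setCookie",
  "cookies-clear": "clearCookies",
  "cookies-export": "exportCookies",
  "cookies-import": "importCookies",
  "wait-for": "waitForText",
  "wait-selector": "page.waitForSelector",
  eval: "page.evaluate",
};

// Returns the SDK method for a CLI command, or undefined if none is listed.
function sdkMethodFor(cliCommand: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(CLI_TO_SDK, cliCommand)
    ? CLI_TO_SDK[cliCommand]
    : undefined;
}

console.log(sdkMethodFor("navigate"));      // goto
console.log(sdkMethodFor("wait-selector")); // page.waitForSelector
```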