@kilospark/webact 2.5.1 → 2.6.0
- package/SKILL.md +86 -62
- package/package.json +1 -1
- package/webact.js +90 -22
package/SKILL.md
CHANGED
@@ -9,12 +9,12 @@ Control Chrome directly via the Chrome DevTools Protocol. No Playwright, no MCP

 ## How to Run Commands

-All commands use `webact.
+All commands use the `webact` CLI. First, check if `webact` is on PATH by running `which webact`. If found, use `webact` directly. If not, fall back to `node <base-dir>/webact.js` where `<base-dir>` is this skill's base directory.

 ### Session Setup (once)

 ```bash
-
+webact launch
 ```

 This launches Chrome (or connects to an existing instance) and creates a session. All subsequent commands auto-discover the session - no session ID needed.
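Reviewer sketch (not part of the package): the PATH check that the new wording describes can be read as a small decision function. `resolveWebact`, `webactArgv`, and the injected `isOnPath` predicate are hypothetical names introduced here for illustration; a real script would presumably back `isOnPath` with `which webact`, as the added SKILL.md line suggests.

```javascript
// Hypothetical helper, not part of webact: choose how to invoke the CLI.
// `isOnPath` is an injected predicate (for example, a wrapper around `which webact`).
function resolveWebact(baseDir, isOnPath) {
  if (isOnPath("webact")) {
    // Preferred form: call the binary directly.
    return ["webact"];
  }
  // Fallback: run the script through node from the skill's base directory.
  return ["node", `${baseDir}/webact.js`];
}

// Build a full argv for a subcommand.
function webactArgv(baseDir, isOnPath, ...subcommand) {
  return [...resolveWebact(baseDir, isOnPath), ...subcommand];
}

// Example with a stubbed lookup that reports webact as missing.
console.log(webactArgv("/path/to/skill", () => false, "navigate", "https://example.com").join(" "));
```

Injecting the lookup keeps the decision testable without touching the shell; the base directory shown is a placeholder.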
@@ -24,11 +24,11 @@ This launches Chrome (or connects to an existing instance) and creates a session
 Use direct CLI commands. Each is a single bash call:

 ```bash
-
-
-
-
-
+webact navigate https://example.com
+webact click button.submit
+webact keyboard "hello world"
+webact press Enter
+webact dom
 ```

 **Auto-brief:** State-changing commands (navigate, click, hover, press Enter/Tab, scroll, select, waitfor) auto-print a compact page summary showing URL, title, inputs, buttons, links, and total element counts. You usually don't need a separate `dom` call. Use `dom` only when you need the full page structure, `axtree -i` for a quick list of all interactive elements, or `axtree` for the full semantic tree.
@@ -37,47 +37,48 @@ node <base-dir>/webact.js dom

 | Command | Example |
 |---------|---------|
-| `navigate <url>` | `
-| `back` | `
-| `forward` | `
-| `reload` | `
-| `dom [selector] [--full]` | `
-| `axtree [selector] [-i]` | `
-| `observe` | `
-| `screenshot` | `
-| `pdf [path]` | `
-| `click <selector>` | `
-| `doubleclick <selector>` | `
-| `rightclick <selector>` | `
-| `hover <selector>` | `
-| `focus <selector>` | `
-| `clear <selector>` | `
-| `type <selector> <text>` | `
-| `keyboard <text>` | `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-| `
-
-
+| `navigate <url>` | `webact navigate https://example.com` |
+| `back` | `webact back` |
+| `forward` | `webact forward` |
+| `reload` | `webact reload` |
+| `dom [selector] [--full]` | `webact dom` or `webact dom .results` |
+| `axtree [selector] [-i]` | `webact axtree` or `webact axtree -i` |
+| `observe` | `webact observe` |
+| `screenshot` | `webact screenshot` |
+| `pdf [path]` | `webact pdf` or `webact pdf /tmp/page.pdf` |
+| `click <selector>` | `webact click button.submit` |
+| `doubleclick <selector>` | `webact doubleclick td.cell` |
+| `rightclick <selector>` | `webact rightclick .context-target` |
+| `hover <selector>` | `webact hover .menu-trigger` |
+| `focus <selector>` | `webact focus input[name=q]` |
+| `clear <selector>` | `webact clear input[name=q]` |
+| `type <selector> <text>` | `webact type input[name=q] search query` |
+| `keyboard <text>` | `webact keyboard hello world` |
+| `paste <text>` | `webact paste Hello world` |
+| `select <selector> <value>` | `webact select select#country US` |
+| `upload <selector> <file>` | `webact upload input[type=file] /tmp/photo.png` |
+| `drag <from> <to>` | `webact drag .card .dropzone` |
+| `dialog <accept\|dismiss> [text]` | `webact dialog accept` |
+| `waitfor <selector> [ms]` | `webact waitfor .dropdown 5000` |
+| `waitfornav [ms]` | `webact waitfornav` |
+| `press <key\|combo>` | `webact press Enter` or `webact press Ctrl+A` |
+| `scroll <target> [px]` | `webact scroll down 500` or `webact scroll top` |
+| `eval <js>` | `webact eval document.title` |
+| `cookies [get\|set\|clear\|delete]` | `webact cookies` or `webact cookies set name val` |
+| `console [show\|errors\|listen]` | `webact console` or `webact console errors` |
+| `block <pattern>` | `webact block images css` or `webact block off` |
+| `viewport <w> <h>` | `webact viewport mobile` or `webact viewport 1024 768` |
+| `frames` | `webact frames` |
+| `frame <id\|selector>` | `webact frame main` or `webact frame iframe#embed` |
+| `download [path\|list]` | `webact download path /tmp/dl` or `webact download list` |
+| `tabs` | `webact tabs` |
+| `tab <id>` | `webact tab ABC123` |
+| `newtab [url]` | `webact newtab https://example.com` |
+| `close` | `webact close` |
+| `activate` | `webact activate` |
+| `minimize` | `webact minimize` |
+
+**`type` vs `keyboard` vs `paste`:** Use `type` to focus a specific input and fill it. Use `keyboard` to type at the current caret position - essential for rich text editors (Slack, Google Docs, Notion) where `type`'s focus call resets the cursor. Use `paste` to insert text via a ClipboardEvent - works with apps that intercept paste (Google Docs, Notion) and is faster than `keyboard` for large text.

 **`click` behavior:** Waits up to 5s for the element, scrolls it into view, then clicks. No manual waits needed for dynamic elements.

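Reviewer sketch (not part of the package): the `type` vs `keyboard` vs `paste` guidance can be summarized as a small decision helper. `chooseTextCommand` and the `LARGE_TEXT` cutoff are assumptions made for illustration; SKILL.md only says `paste` is faster for "large text" and gives no number.

```javascript
// Assumed cutoff for "large text"; the source gives no concrete threshold.
const LARGE_TEXT = 200;

// Hypothetical helper, not part of webact: pick a text-entry command.
function chooseTextCommand({ text, selector = null, richEditor = false }) {
  const bulky = text.includes("\n") || text.length > LARGE_TEXT;
  if (richEditor) {
    // Rich editors: avoid `type`, whose focus call resets the cursor.
    return bulky ? ["paste", text] : ["keyboard", text];
  }
  if (selector) {
    // Plain inputs: focus the field and fill it.
    return ["type", selector, text];
  }
  // No selector available: type at the current caret position.
  return ["keyboard", text];
}

console.log(chooseTextCommand({ text: "search query", selector: "input[name=q]" }));
```

The returned array is the subcommand and its arguments, ready to append to a `webact` invocation.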
@@ -93,7 +94,9 @@ node <base-dir>/webact.js dom

 **`press` combos:** Supports modifier keys: `Ctrl+A` (select all), `Ctrl+C` (copy), `Meta+V` (paste on Mac), `Shift+Enter`, etc. Modifiers: Ctrl, Alt, Shift, Meta/Cmd.

-
+**Mac keyboard note:** On macOS, app shortcuts documented as `Ctrl+Alt+<key>` (e.g., Google Docs heading shortcuts `Ctrl+Alt+1` through `Ctrl+Alt+6`) must be sent as `Meta+Alt+<key>` through CDP. Mac's Ctrl key is not the Command key these apps expect. Example: `press Meta+Alt+2` for Heading 2 in Google Docs.
+
+**`scroll` targets:** `up`/`down` (default 400px, or specify pixels), `top`/`bottom`, or a CSS selector to scroll an element into view. **Element-scoped:** `scroll <selector> <up|down|top|bottom> [px]` scrolls within a container element instead of the page — essential for apps with custom scroll containers (Google Docs, Slack).

 **`block` patterns:** Block resource types (`images`, `css`, `fonts`, `media`, `scripts`) or URL substrings. Speeds up page loads. Use `block off` to disable.

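Reviewer sketch (not part of the package): the Mac keyboard note amounts to a rewrite rule for `Ctrl+Alt+<key>` combos. `toMacCombo` is a hypothetical helper; webact itself does not appear to perform this translation, which is why the note asks the caller to send `Meta+Alt+<key>` explicitly.

```javascript
// Hypothetical helper, not part of webact: rewrite documented Ctrl+Alt shortcuts for macOS.
// `platform` is expected to be a Node `process.platform` value ("darwin" on macOS).
function toMacCombo(combo, platform) {
  if (platform !== "darwin") return combo;
  const parts = combo.split("+");
  const key = parts[parts.length - 1];
  const mods = parts.slice(0, -1);
  const lower = mods.map((m) => m.toLowerCase());
  // Only Ctrl+Alt+<key> combos are remapped, matching the note's scope.
  if (!(lower.includes("ctrl") && lower.includes("alt"))) return combo;
  const remapped = mods.map((m) => (m.toLowerCase() === "ctrl" ? "Meta" : m));
  return [...remapped, key].join("+");
}

console.log(toMacCombo("Ctrl+Alt+2", "darwin")); // prints "Meta+Alt+2"
```

Plain `Ctrl+A` style combos are left untouched, since the note only covers shortcuts documented as `Ctrl+Alt+<key>`.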
@@ -119,7 +122,7 @@ When given a goal, follow this loop:

 1. **PLAN** - Break the goal into steps. Chain predictable sequences (click → type → press Enter) into a single command array.

-2. **ACT** - Write command JSON (or array), run `
+2. **ACT** - Write command JSON (or array), run `webact run <sessionId>`. Actions auto-print a page brief.

 3. **DECIDE** - Read the brief. Expected state? Continue. Login wall / CAPTCHA? Tell user. Need more detail? Use `dom`. Goal complete? Report.

@@ -149,7 +152,7 @@ When given a goal, follow this loop:

 ```bash
 # Launch Chrome and get a session ID
-
+webact launch
 # Output: Session: a1b2c3d4
 # Command file: /tmp/webact-command-a1b2c3d4.json (path varies by OS)
 ```
@@ -180,30 +183,51 @@ Read the DOM output and identify elements by:

 If a CSS selector doesn't work, use `eval` to find elements by text content:
 ```bash
-
+webact eval "[...document.querySelectorAll('a')].find(a => a.textContent.includes('Sign in'))?.getAttribute('href')"
 ```

 ## Common Patterns

-All examples assume you've already run `
+All examples assume you've already run `webact launch`.

 **Navigate and read** (navigate auto-prints brief - no separate dom needed):
 ```bash
-
+webact navigate https://news.ycombinator.com
 ```

 **Fill a form:**
 ```bash
-
-
-
+webact click input[name=q]
+webact type input[name=q] search query
+webact press Enter
 ```

 **Rich text editors and @mentions:**
 ```bash
-
-
-
-
-
+webact click .ql-editor
+webact keyboard Hello @alice
+webact waitfor [data-qa='tab_complete_ui_item'] 5000
+webact click [data-qa='tab_complete_ui_item']
+webact keyboard " check this out"
 ```
+
+## Complex Web Apps
+
+Some apps have non-standard DOMs that require specific approaches.
+
+**Google Docs:**
+- Use `keyboard` (not `type`) — Google Docs has a custom editor, not standard inputs
+- Use `paste` for inserting blocks of text — faster and more reliable than `keyboard` for multi-line content
+- Heading shortcuts: `press Meta+Alt+1` through `press Meta+Alt+6` (NOT `Ctrl+Alt` — see Mac keyboard note above)
+- Scrolling: Use `scroll .kix-appview-editor down 500` — page-level scroll doesn't reach the document content
+
+**Slack:**
+- Message composition: Click the message input, then use `keyboard` to type
+- Message extraction: Use `eval` to query Slack's virtual DOM — standard CSS selectors are unreliable due to virtual scrolling
+- Example: `eval [...document.querySelectorAll('[data-qa="virtual-list-item"]')].map(el => el.textContent).join('\n')`
+
+**General rich editors (Notion, Quill, ProseMirror, etc.):**
+- Prefer `paste` over `keyboard` for multi-line text — many editors handle paste events specially
+- Use `keyboard` for short inline text and @mentions
+- Use `eval` when you need to extract content from editors with virtual rendering
+- If `paste` doesn't work for a specific app, fall back to `eval` with a custom ClipboardEvent
package/package.json
CHANGED
package/webact.js
CHANGED
@@ -4624,6 +4624,38 @@ async function cmdKeyboard(text) {
     console.log(`OK keyboard "${text.substring(0, 50)}${text.length > 50 ? "..." : ""}"`);
   });
 }
+async function cmdPaste(text) {
+  if (!text) {
+    console.error("Usage: webact.js paste <text>");
+    process.exit(1);
+  }
+  await withCDP(async (cdp) => {
+    const result = await cdp.send("Runtime.evaluate", {
+      expression: `
+        (function() {
+          const el = document.activeElement;
+          if (!el) return { error: 'No active element to paste into' };
+          const dt = new DataTransfer();
+          dt.setData('text/plain', ${JSON.stringify(text)});
+          const evt = new ClipboardEvent('paste', {
+            clipboardData: dt,
+            bubbles: true,
+            cancelable: true,
+          });
+          el.dispatchEvent(evt);
+          return { ok: true };
+        })()
+      `,
+      returnByValue: true
+    });
+    const val = result.result.value;
+    if (val && val.error) {
+      console.error(val.error);
+      process.exit(1);
+    }
+    console.log(`OK pasted "${text.substring(0, 50)}${text.length > 50 ? "..." : ""}"`);
+  });
+}
 async function cmdWaitFor(selector, timeoutMs) {
   if (!selector) {
     console.error("Usage: webact.js waitfor <selector> [timeout_ms]");
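Reviewer sketch (not part of the package): `cmdPaste` embeds the user's text into the page-side expression with `JSON.stringify`, which keeps quotes, backticks, and newlines from breaking the generated source. The snippet below rebuilds the same expression shape and runs it under Node against stand-in globals. `buildPasteExpression`, `runInFakePage`, and the `Fake*` classes are hypothetical names for illustration; in the real command the expression is evaluated in Chrome via `Runtime.evaluate`.

```javascript
// Same page-side expression shape as the cmdPaste hunk above.
function buildPasteExpression(text) {
  return `
    (function() {
      const el = document.activeElement;
      if (!el) return { error: 'No active element to paste into' };
      const dt = new DataTransfer();
      dt.setData('text/plain', ${JSON.stringify(text)});
      const evt = new ClipboardEvent('paste', { clipboardData: dt, bubbles: true, cancelable: true });
      el.dispatchEvent(evt);
      return { ok: true };
    })()
  `;
}

// Assumed stand-ins for browser globals so the expression can run under Node.
class FakeDataTransfer {
  constructor() { this.items = new Map(); }
  setData(type, value) { this.items.set(type, value); }
  getData(type) { return this.items.get(type) ?? ""; }
}
class FakeClipboardEvent {
  constructor(type, init) { this.type = type; Object.assign(this, init); }
}

// Evaluate the expression with the stand-ins bound to the names it expects.
function runInFakePage(expression, activeElement) {
  const run = new Function("document", "DataTransfer", "ClipboardEvent", `return (${expression});`);
  return run({ activeElement }, FakeDataTransfer, FakeClipboardEvent);
}

// A fake focused element that records the pasted text it receives.
const received = [];
const editor = { dispatchEvent: (evt) => received.push(evt.clipboardData.getData("text/plain")) };
const tricky = 'He said "hi" and `left`\nsecond line ${not.interpolated}';
console.log(runInFakePage(buildPasteExpression(tricky), editor), received[0] === tricky);
```

A synthetic paste event only has an effect when the page's own code listens for `paste`, which matches the SKILL.md wording that the command works with apps that intercept paste.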
@@ -4775,33 +4807,64 @@ async function cmdPress(key) {
     }
   });
 }
-async function cmdScroll(
-  if (!
-    console.error("Usage: webact.js scroll <up|down|top|bottom|selector> [pixels]");
+async function cmdScroll(args) {
+  if (!args.length) {
+    console.error("Usage: webact.js scroll <up|down|top|bottom|selector> [pixels]\n webact.js scroll <selector> <up|down|top|bottom> [pixels]");
     process.exit(1);
   }
-  const
+  const directions = ["up", "down", "top", "bottom"];
+  const first = args[0];
+  const lower = first.toLowerCase();
+  const secondIsDirection = args[1] && directions.includes(args[1].toLowerCase());
+  const firstIsDirection = directions.includes(lower);
   await withCDP(async (cdp) => {
-    if (
-
-
-
-
-
-
-
-
-
-
-
-
+    if (firstIsDirection) {
+      if (lower === "top") {
+        await cdp.send("Runtime.evaluate", { expression: "window.scrollTo(0, 0)" });
+      } else if (lower === "bottom") {
+        await cdp.send("Runtime.evaluate", { expression: "window.scrollTo(0, document.body.scrollHeight)" });
+      } else {
+        const pixels = parseInt(args[1], 10) || 400;
+        const deltaY = lower === "up" ? -pixels : pixels;
+        await cdp.send("Input.dispatchMouseEvent", {
+          type: "mouseWheel",
+          x: 200,
+          y: 200,
+          deltaX: 0,
+          deltaY
+        });
+      }
+    } else if (secondIsDirection) {
+      const selector = first;
+      const dir = args[1].toLowerCase();
+      const pixels = parseInt(args[2], 10) || 400;
+      const result = await cdp.send("Runtime.evaluate", {
+        expression: `
+          (function() {
+            const el = document.querySelector(${JSON.stringify(selector)});
+            if (!el) return { error: 'Element not found' };
+            const dir = ${JSON.stringify(dir)};
+            const pixels = ${pixels};
+            if (dir === 'top') el.scrollTop = 0;
+            else if (dir === 'bottom') el.scrollTop = el.scrollHeight;
+            else el.scrollBy(0, dir === 'up' ? -pixels : pixels);
+            return { tag: el.tagName.toLowerCase(), dir };
+          })()
+        `,
+        returnByValue: true
       });
+      const val = result.result.value;
+      if (val && val.error) {
+        console.error(val.error);
+        process.exit(1);
+      }
+      console.log(`Scrolled ${val.dir} within ${val.tag} ${selector}`);
     } else {
       const result = await cdp.send("Runtime.evaluate", {
         expression: `
           (function() {
-            const el = document.querySelector(${JSON.stringify(
-            if (!el) return { error: 'Element not found: ${
+            const el = document.querySelector(${JSON.stringify(first)});
+            if (!el) return { error: 'Element not found: ' + ${JSON.stringify(first)} };
             el.scrollIntoView({ block: 'center', behavior: 'smooth' });
             return { tag: el.tagName.toLowerCase() };
           })()
@@ -4809,11 +4872,11 @@ async function cmdScroll(target, amount) {
         returnByValue: true
       });
       const val = result.result.value;
-      if (val.error) {
+      if (val && val.error) {
         console.error(val.error);
         process.exit(1);
       }
-      console.log(`Scrolled to ${val.tag} ${
+      console.log(`Scrolled to ${val.tag} ${first}`);
     }
     await new Promise((r) => setTimeout(r, 100));
     console.log(await getPageBrief(cdp));
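Reviewer sketch (not part of the package): the rewritten `cmdScroll` now picks one of three branches from its argument list. The pure function below mirrors that branch selection so it can be checked without a browser. `classifyScroll` and the `mode` labels are hypothetical; the real command sends CDP calls instead of returning a plan.

```javascript
const DIRECTIONS = ["up", "down", "top", "bottom"];

// Hypothetical pure version of the branch selection in cmdScroll.
function classifyScroll(args) {
  if (!args.length) throw new Error("scroll needs at least one argument");
  const first = args[0];
  const lower = first.toLowerCase();
  if (DIRECTIONS.includes(lower)) {
    // Page-level scroll; pixels only matter for up/down and default to 400.
    const pixels = lower === "up" || lower === "down" ? parseInt(args[1], 10) || 400 : null;
    return { mode: "page", direction: lower, pixels };
  }
  if (args[1] && DIRECTIONS.includes(args[1].toLowerCase())) {
    // Element-scoped scroll: <selector> <direction> [px].
    const direction = args[1].toLowerCase();
    return { mode: "element", selector: first, direction, pixels: parseInt(args[2], 10) || 400 };
  }
  // Anything else is treated as a selector to scroll into view.
  return { mode: "into-view", selector: first };
}

console.log(classifyScroll([".kix-appview-editor", "down", "500"]));
```

As in the diff, the direction check on the first argument wins, and `parseInt(...) || 400` means a pixel value of `0` also falls back to 400.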
@@ -5694,6 +5757,9 @@ async function dispatch(command, args) {
     case "keyboard":
       await cmdKeyboard(args.join(" "));
       break;
+    case "paste":
+      await cmdPaste(args.join(" "));
+      break;
     case "select":
       await cmdSelect(resolveSelector(args[0]), ...args.slice(1));
       break;
@@ -5716,7 +5782,7 @@ async function dispatch(command, args) {
       await cmdPress(args[0]);
       break;
     case "scroll":
-      await cmdScroll(args
+      await cmdScroll(args);
       break;
     case "eval":
       await cmdEval(args.join(" "));
@@ -5817,6 +5883,7 @@ Commands:
 clear <selector> Clear an input field or contenteditable
 type <sel> <text> Type text into element (focuses selector first)
 keyboard <text> Type text at current caret position (no selector)
+paste <text> Paste text via ClipboardEvent (for rich editors)
 select <sel> <val> Select option(s) from a <select> by value or label
 upload <sel> <file> Upload file(s) to a file input
 drag <from> <to> Drag from one element to another
@@ -5825,6 +5892,7 @@ Commands:
 waitfornav [ms] Wait for page navigation to complete (default 10000ms)
 press <key> Press a key or combo (Enter, Ctrl+A, Meta+C, etc.)
 scroll <target> [px] Scroll: up, down, top, bottom, or CSS selector [pixels]
+scroll <sel> <dir> [px] Scroll within element: up, down, top, bottom [pixels]
 eval <js> Evaluate JavaScript
 cookies [get|set|clear|delete] Manage browser cookies
 console [show|errors|listen] View console output or JS errors