@kilospark/webact 2.5.1 → 2.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/SKILL.md +86 -62
  2. package/package.json +1 -1
  3. package/webact.js +90 -22
package/SKILL.md CHANGED
@@ -9,12 +9,12 @@ Control Chrome directly via the Chrome DevTools Protocol. No Playwright, no MCP
 
  ## How to Run Commands
 
- All commands use `webact.js` from this skill's base directory. The base directory is provided when the skill loads - use it as the path prefix.
+ All commands use the `webact` CLI. First, check if `webact` is on PATH by running `which webact`. If found, use `webact` directly. If not, fall back to `node <base-dir>/webact.js` where `<base-dir>` is this skill's base directory.
 
  ### Session Setup (once)
 
  ```bash
- node <base-dir>/webact.js launch
+ webact launch
  ```
 
  This launches Chrome (or connects to an existing instance) and creates a session. All subsequent commands auto-discover the session - no session ID needed.
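The new lookup rule in this hunk (use `webact` if it is on PATH, otherwise `node <base-dir>/webact.js`) can be sketched in Node as follows. It assumes a POSIX `which` is available; `resolveWebact` and `webactOnPath` are hypothetical helper names, not part of the package.

```javascript
// Sketch: pick the webact invocation per the 2.6.0 SKILL.md rule.
// Assumption: a POSIX `which` exists; helper names are hypothetical.
const { spawnSync } = require("child_process");
const path = require("path");

// True when `which webact` exits 0, i.e. webact is on PATH.
function webactOnPath() {
  const res = spawnSync("which", ["webact"], { stdio: "ignore" });
  return res.status === 0;
}

// Returns the argv prefix that webact subcommands are appended to.
function resolveWebact(baseDir, isOnPath = webactOnPath) {
  if (isOnPath()) return ["webact"];
  return ["node", path.join(baseDir, "webact.js")];
}

// Usage with a hypothetical skill directory and the PATH lookup forced to "not found".
const argv = [...resolveWebact("/opt/skills/webact", () => false), "navigate", "https://example.com"];
console.log(argv.join(" "));
```

Injecting `isOnPath` keeps the decision testable. If `which` itself is missing, `spawnSync` reports an error with a `null` status, so the sketch falls back to the `node` form.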
@@ -24,11 +24,11 @@ This launches Chrome (or connects to an existing instance) and creates a session
  Use direct CLI commands. Each is a single bash call:
 
  ```bash
- node <base-dir>/webact.js navigate https://example.com
- node <base-dir>/webact.js click button.submit
- node <base-dir>/webact.js keyboard "hello world"
- node <base-dir>/webact.js press Enter
- node <base-dir>/webact.js dom
+ webact navigate https://example.com
+ webact click button.submit
+ webact keyboard "hello world"
+ webact press Enter
+ webact dom
  ```
 
  **Auto-brief:** State-changing commands (navigate, click, hover, press Enter/Tab, scroll, select, waitfor) auto-print a compact page summary showing URL, title, inputs, buttons, links, and total element counts. You usually don't need a separate `dom` call. Use `dom` only when you need the full page structure, `axtree -i` for a quick list of all interactive elements, or `axtree` for the full semantic tree.
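Per the auto-brief note, several commands already print the page summary, so a follow-up `dom` is usually redundant. A wrapper that scripts webact could encode that list directly. The command set below is copied from the note; `printsBrief` is a hypothetical helper, not a webact API.

```javascript
// Commands that, per SKILL.md, auto-print the page brief.
// `press` only does so for Enter and Tab.
const BRIEF_COMMANDS = new Set(["navigate", "click", "hover", "scroll", "select", "waitfor"]);
const BRIEF_KEYS = new Set(["enter", "tab"]);

// Hypothetical helper: does this webact call already show the page brief?
function printsBrief(command, args = []) {
  if (command === "press") {
    return BRIEF_KEYS.has(String(args[0] ?? "").toLowerCase());
  }
  return BRIEF_COMMANDS.has(command);
}

console.log(printsBrief("press", ["Enter"])); // true
console.log(printsBrief("press", ["Ctrl+A"])); // false
console.log(printsBrief("keyboard", ["hello"])); // false
```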
@@ -37,47 +37,48 @@ node <base-dir>/webact.js dom
 
  | Command | Example |
  |---------|---------|
- | `navigate <url>` | `node webact.js navigate https://example.com` |
- | `back` | `node webact.js back` |
- | `forward` | `node webact.js forward` |
- | `reload` | `node webact.js reload` |
- | `dom [selector] [--full]` | `node webact.js dom` or `node webact.js dom .results` |
- | `axtree [selector] [-i]` | `node webact.js axtree` or `node webact.js axtree -i` |
- | `observe` | `node webact.js observe` |
- | `screenshot` | `node webact.js screenshot` |
- | `pdf [path]` | `node webact.js pdf` or `node webact.js pdf /tmp/page.pdf` |
- | `click <selector>` | `node webact.js click button.submit` |
- | `doubleclick <selector>` | `node webact.js doubleclick td.cell` |
- | `rightclick <selector>` | `node webact.js rightclick .context-target` |
- | `hover <selector>` | `node webact.js hover .menu-trigger` |
- | `focus <selector>` | `node webact.js focus input[name=q]` |
- | `clear <selector>` | `node webact.js clear input[name=q]` |
- | `type <selector> <text>` | `node webact.js type input[name=q] search query` |
- | `keyboard <text>` | `node webact.js keyboard hello world` |
- | `select <selector> <value>` | `node webact.js select select#country US` |
- | `upload <selector> <file>` | `node webact.js upload input[type=file] /tmp/photo.png` |
- | `drag <from> <to>` | `node webact.js drag .card .dropzone` |
- | `dialog <accept\|dismiss> [text]` | `node webact.js dialog accept` |
- | `waitfor <selector> [ms]` | `node webact.js waitfor .dropdown 5000` |
- | `waitfornav [ms]` | `node webact.js waitfornav` |
- | `press <key\|combo>` | `node webact.js press Enter` or `node webact.js press Ctrl+A` |
- | `scroll <target> [px]` | `node webact.js scroll down 500` or `node webact.js scroll top` |
- | `eval <js>` | `node webact.js eval document.title` |
- | `cookies [get\|set\|clear\|delete]` | `node webact.js cookies` or `node webact.js cookies set name val` |
- | `console [show\|errors\|listen]` | `node webact.js console` or `node webact.js console errors` |
- | `block <pattern>` | `node webact.js block images css` or `node webact.js block off` |
- | `viewport <w> <h>` | `node webact.js viewport mobile` or `node webact.js viewport 1024 768` |
- | `frames` | `node webact.js frames` |
- | `frame <id\|selector>` | `node webact.js frame main` or `node webact.js frame iframe#embed` |
- | `download [path\|list]` | `node webact.js download path /tmp/dl` or `node webact.js download list` |
- | `tabs` | `node webact.js tabs` |
- | `tab <id>` | `node webact.js tab ABC123` |
- | `newtab [url]` | `node webact.js newtab https://example.com` |
- | `close` | `node webact.js close` |
- | `activate` | `node webact.js activate` |
- | `minimize` | `node webact.js minimize` |
-
- **`type` vs `keyboard`:** Use `type` to focus a specific input and fill it. Use `keyboard` to type at the current caret position - essential for rich text editors (Slack, Google Docs, Notion) where `type`'s focus call resets the cursor.
+ | `navigate <url>` | `webact navigate https://example.com` |
+ | `back` | `webact back` |
+ | `forward` | `webact forward` |
+ | `reload` | `webact reload` |
+ | `dom [selector] [--full]` | `webact dom` or `webact dom .results` |
+ | `axtree [selector] [-i]` | `webact axtree` or `webact axtree -i` |
+ | `observe` | `webact observe` |
+ | `screenshot` | `webact screenshot` |
+ | `pdf [path]` | `webact pdf` or `webact pdf /tmp/page.pdf` |
+ | `click <selector>` | `webact click button.submit` |
+ | `doubleclick <selector>` | `webact doubleclick td.cell` |
+ | `rightclick <selector>` | `webact rightclick .context-target` |
+ | `hover <selector>` | `webact hover .menu-trigger` |
+ | `focus <selector>` | `webact focus input[name=q]` |
+ | `clear <selector>` | `webact clear input[name=q]` |
+ | `type <selector> <text>` | `webact type input[name=q] search query` |
+ | `keyboard <text>` | `webact keyboard hello world` |
+ | `paste <text>` | `webact paste Hello world` |
+ | `select <selector> <value>` | `webact select select#country US` |
+ | `upload <selector> <file>` | `webact upload input[type=file] /tmp/photo.png` |
+ | `drag <from> <to>` | `webact drag .card .dropzone` |
+ | `dialog <accept\|dismiss> [text]` | `webact dialog accept` |
+ | `waitfor <selector> [ms]` | `webact waitfor .dropdown 5000` |
+ | `waitfornav [ms]` | `webact waitfornav` |
+ | `press <key\|combo>` | `webact press Enter` or `webact press Ctrl+A` |
+ | `scroll <target> [px]` | `webact scroll down 500` or `webact scroll top` |
+ | `eval <js>` | `webact eval document.title` |
+ | `cookies [get\|set\|clear\|delete]` | `webact cookies` or `webact cookies set name val` |
+ | `console [show\|errors\|listen]` | `webact console` or `webact console errors` |
+ | `block <pattern>` | `webact block images css` or `webact block off` |
+ | `viewport <w> <h>` | `webact viewport mobile` or `webact viewport 1024 768` |
+ | `frames` | `webact frames` |
+ | `frame <id\|selector>` | `webact frame main` or `webact frame iframe#embed` |
+ | `download [path\|list]` | `webact download path /tmp/dl` or `webact download list` |
+ | `tabs` | `webact tabs` |
+ | `tab <id>` | `webact tab ABC123` |
+ | `newtab [url]` | `webact newtab https://example.com` |
+ | `close` | `webact close` |
+ | `activate` | `webact activate` |
+ | `minimize` | `webact minimize` |
+
+ **`type` vs `keyboard` vs `paste`:** Use `type` to focus a specific input and fill it. Use `keyboard` to type at the current caret position - essential for rich text editors (Slack, Google Docs, Notion) where `type`'s focus call resets the cursor. Use `paste` to insert text via a ClipboardEvent - works with apps that intercept paste (Google Docs, Notion) and is faster than `keyboard` for large text.
 
  **`click` behavior:** Waits up to 5s for the element, scrolls it into view, then clicks. No manual waits needed for dynamic elements.
 
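The `click` note describes a "wait up to 5s for the element, then act" behaviour. A generic poll-until-deadline loop of that kind is sketched below. It illustrates the pattern only and is not webact's implementation; `waitUntil` and the fake `check` function are hypothetical.

```javascript
// Poll `check` until it returns a truthy value or `timeoutMs` elapses.
// In a browser tool, `check` would be a DOM lookup; here it is any (async) function.
async function waitUntil(check, timeoutMs = 5000, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await check();
    if (value) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage: a fake element that "appears" on the third poll.
let polls = 0;
waitUntil(() => (++polls >= 3 ? { tag: "button" } : null), 1000, 10).then((el) =>
  console.log(`found <${el.tag}> after ${polls} polls`)
);
```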
@@ -93,7 +94,9 @@ node <base-dir>/webact.js dom
 
  **`press` combos:** Supports modifier keys: `Ctrl+A` (select all), `Ctrl+C` (copy), `Meta+V` (paste on Mac), `Shift+Enter`, etc. Modifiers: Ctrl, Alt, Shift, Meta/Cmd.
 
- **`scroll` targets:** `up`/`down` (default 400px, or specify pixels), `top`/`bottom`, or a CSS selector to scroll an element into view.
+ **Mac keyboard note:** On macOS, app shortcuts documented as `Ctrl+Alt+<key>` (e.g., Google Docs heading shortcuts `Ctrl+Alt+1` through `Ctrl+Alt+6`) must be sent as `Meta+Alt+<key>` through CDP. Mac's Ctrl key is not the Command key these apps expect. Example: `press Meta+Alt+2` for Heading 2 in Google Docs.
+
+ **`scroll` targets:** `up`/`down` (default 400px, or specify pixels), `top`/`bottom`, or a CSS selector to scroll an element into view. **Element-scoped:** `scroll <selector> <up|down|top|bottom> [px]` scrolls within a container element instead of the page — essential for apps with custom scroll containers (Google Docs, Slack).
 
  **`block` patterns:** Block resource types (`images`, `css`, `fonts`, `media`, `scripts`) or URL substrings. Speeds up page loads. Use `block off` to disable.
 
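The new Mac keyboard note amounts to a rewrite rule for shortcuts documented as `Ctrl+Alt+<key>`. The note puts that translation on the caller, so the helper below is a hypothetical convenience for scripts, not something webact is known to do itself. It only rewrites on macOS (`process.platform === "darwin"`) and only the documented `Ctrl`+`Alt` case.

```javascript
// Hypothetical helper (not part of webact): apply the SKILL.md Mac rule,
// sending shortcuts documented as Ctrl+Alt+<key> as Meta+Alt+<key> on macOS.
function toMacCombo(combo, platform = process.platform) {
  if (platform !== "darwin") return combo;
  const parts = combo.split("+");
  const isMod = (p, name) => p.toLowerCase() === name;
  const hasCtrl = parts.some((p) => isMod(p, "ctrl"));
  const hasAlt = parts.some((p) => isMod(p, "alt"));
  if (!(hasCtrl && hasAlt)) return combo; // only the documented Ctrl+Alt case
  return parts.map((p) => (isMod(p, "ctrl") ? "Meta" : p)).join("+");
}

// Google Docs Heading 2, as it should be sent from a Mac:
console.log(`webact press ${toMacCombo("Ctrl+Alt+2", "darwin")}`); // webact press Meta+Alt+2
```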
@@ -119,7 +122,7 @@ When given a goal, follow this loop:
 
  1. **PLAN** - Break the goal into steps. Chain predictable sequences (click → type → press Enter) into a single command array.
 
- 2. **ACT** - Write command JSON (or array), run `node <base-dir>/webact.js run <sessionId>`. Actions auto-print a page brief.
+ 2. **ACT** - Write command JSON (or array), run `webact run <sessionId>`. Actions auto-print a page brief.
 
  3. **DECIDE** - Read the brief. Expected state? Continue. Login wall / CAPTCHA? Tell user. Need more detail? Use `dom`. Goal complete? Report.
 
@@ -149,7 +152,7 @@ When given a goal, follow this loop:
 
  ```bash
  # Launch Chrome and get a session ID
- node <base-dir>/webact.js launch
+ webact launch
  # Output: Session: a1b2c3d4
  # Command file: /tmp/webact-command-a1b2c3d4.json (path varies by OS)
  ```
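Most commands auto-discover the session, but `run <sessionId>` takes the ID explicitly, so a driving script may need to read it from `launch`. A minimal parser is sketched below. It assumes the output contains lines shaped like the `# Output:` comments above (the exact format is not shown in this diff); `parseLaunchOutput` is a hypothetical name.

```javascript
// Hypothetical parser for `webact launch` output, assuming lines like:
//   Session: a1b2c3d4
//   Command file: /tmp/webact-command-a1b2c3d4.json
// Missing lines yield null rather than a guessed value.
function parseLaunchOutput(output) {
  const session = /^Session:\s*(\S+)/m.exec(output);
  const file = /^Command file:\s*(.+?)\s*$/m.exec(output);
  return {
    sessionId: session ? session[1] : null,
    commandFile: file ? file[1] : null,
  };
}

const sample = "Session: a1b2c3d4\nCommand file: /tmp/webact-command-a1b2c3d4.json\n";
console.log(parseLaunchOutput(sample));
```

Reading the command-file path from the output, rather than rebuilding it, avoids assuming where the temp directory lives on each OS.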
@@ -180,30 +183,51 @@ Read the DOM output and identify elements by:
 
  If a CSS selector doesn't work, use `eval` to find elements by text content:
  ```bash
- node webact.js eval "[...document.querySelectorAll('a')].find(a => a.textContent.includes('Sign in'))?.getAttribute('href')"
+ webact eval "[...document.querySelectorAll('a')].find(a => a.textContent.includes('Sign in'))?.getAttribute('href')"
  ```
 
  ## Common Patterns
 
- All examples assume you've already run `node webact.js launch`.
+ All examples assume you've already run `webact launch`.
 
  **Navigate and read** (navigate auto-prints brief - no separate dom needed):
  ```bash
- node webact.js navigate https://news.ycombinator.com
+ webact navigate https://news.ycombinator.com
  ```
 
  **Fill a form:**
  ```bash
- node webact.js click input[name=q]
- node webact.js type input[name=q] search query
- node webact.js press Enter
+ webact click input[name=q]
+ webact type input[name=q] search query
+ webact press Enter
  ```
 
  **Rich text editors and @mentions:**
  ```bash
- node webact.js click .ql-editor
- node webact.js keyboard Hello @alice
- node webact.js waitfor [data-qa='tab_complete_ui_item'] 5000
- node webact.js click [data-qa='tab_complete_ui_item']
- node webact.js keyboard " check this out"
+ webact click .ql-editor
+ webact keyboard Hello @alice
+ webact waitfor [data-qa='tab_complete_ui_item'] 5000
+ webact click [data-qa='tab_complete_ui_item']
+ webact keyboard " check this out"
  ```
+
+ ## Complex Web Apps
+
+ Some apps have non-standard DOMs that require specific approaches.
+
+ **Google Docs:**
+ - Use `keyboard` (not `type`) — Google Docs has a custom editor, not standard inputs
+ - Use `paste` for inserting blocks of text — faster and more reliable than `keyboard` for multi-line content
+ - Heading shortcuts: `press Meta+Alt+1` through `press Meta+Alt+6` (NOT `Ctrl+Alt` — see Mac keyboard note above)
+ - Scrolling: Use `scroll .kix-appview-editor down 500` — page-level scroll doesn't reach the document content
+
+ **Slack:**
+ - Message composition: Click the message input, then use `keyboard` to type
+ - Message extraction: Use `eval` to query Slack's virtual DOM — standard CSS selectors are unreliable due to virtual scrolling
+ - Example: `eval [...document.querySelectorAll('[data-qa="virtual-list-item"]')].map(el => el.textContent).join('\n')`
+
+ **General rich editors (Notion, Quill, ProseMirror, etc.):**
+ - Prefer `paste` over `keyboard` for multi-line text — many editors handle paste events specially
+ - Use `keyboard` for short inline text and @mentions
+ - Use `eval` when you need to extract content from editors with virtual rendering
+ - If `paste` doesn't work for a specific app, fall back to `eval` with a custom ClipboardEvent
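The Slack extraction example passes JavaScript containing brackets, parentheses, quotes and `=>` to `eval`. Typed unquoted into bash, the `(` is rejected as a syntax error, so the snippet needs shell quoting, as in the earlier `eval` example that wraps its JavaScript in double quotes. A common POSIX technique is to wrap the text in single quotes and turn each embedded `'` into `'\''`. `shQuote` below is a hypothetical helper implementing that technique.

```javascript
// POSIX single-quote escaping: inside '...' nothing is special except ' itself,
// which is closed, emitted as \', and reopened: ' -> '\''
function shQuote(s) {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

// The Slack extraction snippet from SKILL.md, as a JS string ('\\n' keeps a literal \n).
const snippet =
  `[...document.querySelectorAll('[data-qa="virtual-list-item"]')]` +
  `.map(el => el.textContent).join('\\n')`;

// Prints a command line that bash passes to webact as a single argument.
console.log(`webact eval ${shQuote(snippet)}`);
```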
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@kilospark/webact",
-   "version": "2.5.1",
+   "version": "2.6.0",
    "description": "CLI for browser automation via Chrome DevTools Protocol",
    "main": "webact.js",
    "bin": {
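Because SKILL.md now prefers whatever `webact` is on PATH, an older global install could be picked up that lacks the commands added in this release (`paste` and element-scoped `scroll`). A minimal version gate is sketched below. How you obtain the installed version string is left open, and the comparison handles only plain `major.minor.patch` strings; prerelease tags are ignored, so this is not full semver.

```javascript
// Compare dotted versions numerically; true when `version` >= `minimum`.
// Prerelease suffixes such as "-beta.1" are dropped, so this is not full semver.
function versionAtLeast(version, minimum) {
  const parse = (v) => v.split("-")[0].split(".").map((n) => parseInt(n, 10) || 0);
  const a = parse(version);
  const b = parse(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true;
}

// `paste` and `scroll <sel> <dir>` first appear in 2.6.0.
console.log(versionAtLeast("2.5.1", "2.6.0")); // false
console.log(versionAtLeast("2.6.0", "2.6.0")); // true
console.log(versionAtLeast("2.10.0", "2.6.0")); // true
```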
package/webact.js CHANGED
@@ -4624,6 +4624,38 @@ async function cmdKeyboard(text) {
      console.log(`OK keyboard "${text.substring(0, 50)}${text.length > 50 ? "..." : ""}"`);
    });
  }
+ async function cmdPaste(text) {
+   if (!text) {
+     console.error("Usage: webact.js paste <text>");
+     process.exit(1);
+   }
+   await withCDP(async (cdp) => {
+     const result = await cdp.send("Runtime.evaluate", {
+       expression: `
+         (function() {
+           const el = document.activeElement;
+           if (!el) return { error: 'No active element to paste into' };
+           const dt = new DataTransfer();
+           dt.setData('text/plain', ${JSON.stringify(text)});
+           const evt = new ClipboardEvent('paste', {
+             clipboardData: dt,
+             bubbles: true,
+             cancelable: true,
+           });
+           el.dispatchEvent(evt);
+           return { ok: true };
+         })()
+       `,
+       returnByValue: true
+     });
+     const val = result.result.value;
+     if (val && val.error) {
+       console.error(val.error);
+       process.exit(1);
+     }
+     console.log(`OK pasted "${text.substring(0, 50)}${text.length > 50 ? "..." : ""}"`);
+   });
+ }
  async function cmdWaitFor(selector, timeoutMs) {
    if (!selector) {
      console.error("Usage: webact.js waitfor <selector> [timeout_ms]");
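`cmdPaste` builds a `DataTransfer`, wraps it in a bubbling, cancelable `ClipboardEvent('paste')` and dispatches it on `document.activeElement`. Because the event is created by script, the browser does not perform its own text insertion for it, so the command relies on the page's paste handler reading `clipboardData` and inserting the text. That matches the SKILL.md guidance that `paste` suits apps that intercept paste. The Node sketch below (Node 15+ for the global `Event`/`EventTarget`) reproduces that handshake. `DataTransfer` and `ClipboardEvent` are browser-only, so `FakeDataTransfer` and `FakeClipboardEvent` are hypothetical stand-ins.

```javascript
// Minimal stand-ins for browser-only classes (hypothetical, for illustration).
class FakeDataTransfer {
  constructor() {
    this.store = new Map();
  }
  setData(type, value) {
    this.store.set(type, String(value));
  }
  getData(type) {
    return this.store.get(type) ?? "";
  }
}

class FakeClipboardEvent extends Event {
  constructor(type, init = {}) {
    super(type, init); // bubbles / cancelable are handled by Event
    this.clipboardData = init.clipboardData ?? null;
  }
}

// A toy "rich editor" that handles paste itself.
const editor = new EventTarget();
editor.content = "";
editor.addEventListener("paste", (e) => {
  e.preventDefault(); // take over the paste
  editor.content += e.clipboardData.getData("text/plain");
});

// Same shape as the event cmdPaste dispatches.
const dt = new FakeDataTransfer();
dt.setData("text/plain", "Line one\nLine two");
const evt = new FakeClipboardEvent("paste", { clipboardData: dt, bubbles: true, cancelable: true });
const notCanceled = editor.dispatchEvent(evt);

console.log(JSON.stringify(editor.content)); // "Line one\nLine two"
console.log(notCanceled); // false, because the handler called preventDefault()
```

`cmdPaste` currently ignores the return value of `dispatchEvent`. A `false` result means some handler called `preventDefault()`, which could serve as a hint that the page actually took over the paste.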
@@ -4775,33 +4807,64 @@ async function cmdPress(key) {
      }
    });
  }
- async function cmdScroll(target, amount) {
-   if (!target) {
-     console.error("Usage: webact.js scroll <up|down|top|bottom|selector> [pixels]");
+ async function cmdScroll(args) {
+   if (!args.length) {
+     console.error("Usage: webact.js scroll <up|down|top|bottom|selector> [pixels]\n webact.js scroll <selector> <up|down|top|bottom> [pixels]");
      process.exit(1);
    }
-   const lower = target.toLowerCase();
+   const directions = ["up", "down", "top", "bottom"];
+   const first = args[0];
+   const lower = first.toLowerCase();
+   const secondIsDirection = args[1] && directions.includes(args[1].toLowerCase());
+   const firstIsDirection = directions.includes(lower);
    await withCDP(async (cdp) => {
-     if (lower === "top") {
-       await cdp.send("Runtime.evaluate", { expression: "window.scrollTo(0, 0)" });
-     } else if (lower === "bottom") {
-       await cdp.send("Runtime.evaluate", { expression: "window.scrollTo(0, document.body.scrollHeight)" });
-     } else if (lower === "up" || lower === "down") {
-       const pixels = parseInt(amount, 10) || 400;
-       const deltaY = lower === "up" ? -pixels : pixels;
-       await cdp.send("Input.dispatchMouseEvent", {
-         type: "mouseWheel",
-         x: 200,
-         y: 200,
-         deltaX: 0,
-         deltaY
+     if (firstIsDirection) {
+       if (lower === "top") {
+         await cdp.send("Runtime.evaluate", { expression: "window.scrollTo(0, 0)" });
+       } else if (lower === "bottom") {
+         await cdp.send("Runtime.evaluate", { expression: "window.scrollTo(0, document.body.scrollHeight)" });
+       } else {
+         const pixels = parseInt(args[1], 10) || 400;
+         const deltaY = lower === "up" ? -pixels : pixels;
+         await cdp.send("Input.dispatchMouseEvent", {
+           type: "mouseWheel",
+           x: 200,
+           y: 200,
+           deltaX: 0,
+           deltaY
+         });
+       }
+     } else if (secondIsDirection) {
+       const selector = first;
+       const dir = args[1].toLowerCase();
+       const pixels = parseInt(args[2], 10) || 400;
+       const result = await cdp.send("Runtime.evaluate", {
+         expression: `
+           (function() {
+             const el = document.querySelector(${JSON.stringify(selector)});
+             if (!el) return { error: 'Element not found' };
+             const dir = ${JSON.stringify(dir)};
+             const pixels = ${pixels};
+             if (dir === 'top') el.scrollTop = 0;
+             else if (dir === 'bottom') el.scrollTop = el.scrollHeight;
+             else el.scrollBy(0, dir === 'up' ? -pixels : pixels);
+             return { tag: el.tagName.toLowerCase(), dir };
+           })()
+         `,
+         returnByValue: true
        });
+       const val = result.result.value;
+       if (val && val.error) {
+         console.error(val.error);
+         process.exit(1);
+       }
+       console.log(`Scrolled ${val.dir} within ${val.tag} ${selector}`);
      } else {
        const result = await cdp.send("Runtime.evaluate", {
          expression: `
            (function() {
-             const el = document.querySelector(${JSON.stringify(target)});
-             if (!el) return { error: 'Element not found: ${target}' };
+             const el = document.querySelector(${JSON.stringify(first)});
+             if (!el) return { error: 'Element not found: ' + ${JSON.stringify(first)} };
              el.scrollIntoView({ block: 'center', behavior: 'smooth' });
              return { tag: el.tagName.toLowerCase() };
            })()
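This hunk also changes the `Element not found` message from interpolating the raw selector inside a single-quoted JS string to concatenating `JSON.stringify(first)`. With the old form, a selector that itself contains a single quote, such as the `[data-qa='tab_complete_ui_item']` selectors used in SKILL.md, yields an expression that does not parse. The sketch below rebuilds both expression strings, trimmed to the relevant lines (`oldExpression` and `newExpression` are hypothetical names). It checks them with `new Function`, which compiles without executing, so the missing `document` in Node does not matter.

```javascript
// Old form: selector pasted verbatim inside '...'.
function oldExpression(target) {
  return `(function() {
    const el = document.querySelector(${JSON.stringify(target)});
    if (!el) return { error: 'Element not found: ${target}' };
    return { tag: el.tagName.toLowerCase() };
  })()`;
}

// New form: selector emitted as a JSON string literal and concatenated.
function newExpression(first) {
  return `(function() {
    const el = document.querySelector(${JSON.stringify(first)});
    if (!el) return { error: 'Element not found: ' + ${JSON.stringify(first)} };
    return { tag: el.tagName.toLowerCase() };
  })()`;
}

// Compile only: true if the code is syntactically valid JavaScript.
function parses(code) {
  try {
    new Function(code);
    return true;
  } catch {
    return false;
  }
}

const sel = "[data-qa='tab_complete_ui_item']";
console.log(parses(oldExpression(sel))); // false
console.log(parses(newExpression(sel))); // true
```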
@@ -4809,11 +4872,11 @@ async function cmdScroll(target, amount) {
          returnByValue: true
        });
        const val = result.result.value;
-       if (val.error) {
+       if (val && val.error) {
          console.error(val.error);
          process.exit(1);
        }
-       console.log(`Scrolled to ${val.tag} ${target}`);
+       console.log(`Scrolled to ${val.tag} ${first}`);
      }
      await new Promise((r) => setTimeout(r, 100));
      console.log(await getPageBrief(cdp));
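The new `cmdScroll(args)` picks one of three forms by checking whether the first or the second argument is `up`, `down`, `top` or `bottom`. The same decision is pulled out below as a pure function so each form can be checked without a browser. `parseScrollArgs` is a hypothetical name, and the returned objects are illustrative, not webact's internal structures.

```javascript
const DIRECTIONS = ["up", "down", "top", "bottom"];

// Mirrors the branching in cmdScroll(args).
function parseScrollArgs(args) {
  if (!args.length) throw new Error("Usage: scroll <target> [px] | scroll <selector> <dir> [px]");
  const first = args[0];
  const lower = first.toLowerCase();
  if (DIRECTIONS.includes(lower)) {
    // Page scroll: `scroll down 500`, `scroll top` (pixels unused for top/bottom)
    return { mode: "page", dir: lower, pixels: parseInt(args[1], 10) || 400 };
  }
  if (args[1] && DIRECTIONS.includes(args[1].toLowerCase())) {
    // Element-scoped: `scroll .kix-appview-editor down 500`
    return { mode: "element", selector: first, dir: args[1].toLowerCase(), pixels: parseInt(args[2], 10) || 400 };
  }
  // Anything else: scroll the matching element into view; extra args are ignored
  return { mode: "intoView", selector: first };
}

console.log(parseScrollArgs([".kix-appview-editor", "down", "500"]));
console.log(parseScrollArgs([".feed", "300"]));
```

Two consequences follow from this logic. `scroll .feed 300` is treated as scroll-into-view, because `300` is not a direction. And because of the `|| 400` default, a pixel count of `0` falls back to 400.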
@@ -5694,6 +5757,9 @@ async function dispatch(command, args) {
      case "keyboard":
        await cmdKeyboard(args.join(" "));
        break;
+     case "paste":
+       await cmdPaste(args.join(" "));
+       break;
      case "select":
        await cmdSelect(resolveSelector(args[0]), ...args.slice(1));
        break;
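The dispatcher hands `paste` (like `keyboard`) its arguments re-joined with single spaces, so the text the page receives depends on how the shell split it. Quoting the text, or invoking webact without a shell through Node's `child_process.execFileSync`, keeps it as one argument with newlines intact. `dispatchText` mirrors the `args.join(" ")` in this hunk. `buildPasteCall` is a hypothetical helper, and the sketch only builds the call because running it needs webact and an open session.

```javascript
// Mirrors the dispatcher: case "paste": await cmdPaste(args.join(" "))
function dispatchText(args) {
  return args.join(" ");
}

// Hypothetical helper: argv for running `webact paste` with no shell involved.
function buildPasteCall(text, program = "webact") {
  return { file: program, args: ["paste", text] };
}

// Unquoted shell input arrives as separate words, re-joined with single spaces:
console.log(JSON.stringify(dispatchText(["Hello", "world"]))); // "Hello world"

// A single argv element keeps newlines and blank lines:
const call = buildPasteCall("Para 1\n\nPara 2");
console.log(JSON.stringify(dispatchText(call.args.slice(1)))); // "Para 1\n\nPara 2"

// To actually run it (requires webact on PATH and a launched session):
// require("child_process").execFileSync(call.file, call.args, { stdio: "inherit" });
```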
@@ -5716,7 +5782,7 @@ async function dispatch(command, args) {
        await cmdPress(args[0]);
        break;
      case "scroll":
-       await cmdScroll(args[0], args[1]);
+       await cmdScroll(args);
        break;
      case "eval":
        await cmdEval(args.join(" "));
@@ -5817,6 +5883,7 @@ Commands:
    clear <selector>                Clear an input field or contenteditable
    type <sel> <text>               Type text into element (focuses selector first)
    keyboard <text>                 Type text at current caret position (no selector)
+   paste <text>                    Paste text via ClipboardEvent (for rich editors)
    select <sel> <val>              Select option(s) from a <select> by value or label
    upload <sel> <file>             Upload file(s) to a file input
    drag <from> <to>                Drag from one element to another
@@ -5825,6 +5892,7 @@ Commands:
    waitfornav [ms]                 Wait for page navigation to complete (default 10000ms)
    press <key>                     Press a key or combo (Enter, Ctrl+A, Meta+C, etc.)
    scroll <target> [px]            Scroll: up, down, top, bottom, or CSS selector [pixels]
+   scroll <sel> <dir> [px]         Scroll within element: up, down, top, bottom [pixels]
    eval <js>                       Evaluate JavaScript
    cookies [get|set|clear|delete]  Manage browser cookies
    console [show|errors|listen]    View console output or JS errors