chromeflow 0.1.27 → 0.1.29

package/CLAUDE.md CHANGED
@@ -22,22 +22,13 @@ Do NOT ask "should I open the browser?" — just do it. The user expects seamles
  `scroll_page` then retry, or use `highlight_region` to show the user. Never use
  `osascript`, `applescript`, or any shell command to control the browser.
 
- 2. **Use `get_elements` to get pixel coordinates, not `take_screenshot`.** `get_elements`
- returns exact DOM coordinates (always accurate). `take_screenshot` is a last resort only
- when you need to SEE the visual layout, not for finding positions. Correct order when
- `click_element` or `fill_input` fails: try `get_elements` first to get exact coords,
- then `highlight_region` using those coords. Only use `take_screenshot` if you genuinely
- need to see what the page looks like. Screenshots now include a red coordinate grid to
- help read positions; use the grid labels, not visual estimates.
-
- 3. **`open_page` already waits for navigation.** Never call `wait_for_navigation`
- immediately after `open_page` — it will time out.
-
- 4. **When `click_element` fails:** first try `scroll_page(down)` then retry
- `click_element`. If it still fails, `take_screenshot` and use `highlight_region`
- with pixel coordinates from the image.
-
- 5. **Use `wait_for_selector` to wait for async page changes** (build completion, modals,
+ 2. **Never use `take_screenshot` to find element positions or confirm actions.**
+ `get_elements` returns exact DOM coordinates; always use that first. `get_page_text`
+ tells you what happened after an action; always use that before reaching for a screenshot.
+ `take_screenshot` is only for when you genuinely have no idea what the page looks like
+ and DOM queries can't help. It is a last resort, not a routine check.
+
+ 3. **Use `wait_for_selector` to wait for async page changes** (build completion, modals,
  toasts). Never poll with repeated `take_screenshot` calls.
 
  ## Guided flow pattern
@@ -49,27 +40,25 @@ Do NOT ask "should I open the browser?" — just do it. The user expects seamles
  3. For each step:
     a. Claude acts directly:
        click_element("Save") — press buttons/links Claude can press
-       wait_for_selector(".success") or get_page_text() — ALWAYS confirm after click; click_element returns after 600ms regardless of outcome
+       get_page_text() or wait_for_selector(".success") — ALWAYS confirm after click; click_element returns after 600ms regardless of outcome
        fill_input("Product name", "Pro") — fill fields Claude knows the answer to (works on React, CodeMirror, and contenteditable)
        clear_overlays() — call this immediately after fill_input succeeds
-       scroll_page("down") — reveal off-screen content then retry
-       scroll_to_element("label text") — jump directly to a known field instead of guessing pixel scroll amount
+       scroll_to_element("label text") — jump directly to a known field; prefer this over scroll_page when the target is known
+       scroll_page("down") — reveal off-screen content when target location is unknown
     b. Check results with text, not vision:
        get_page_text() — read errors/status after actions
        wait_for_selector(".success") — wait for async changes (builds, modals)
        execute_script("document.title") — query DOM state programmatically
-    c. When click_element or fill_input fails and you need pixel coords:
-       click_element("Save") — try this first, ALWAYS
-       [if fails] get_elements() — get EXACT DOM coords, use these in highlight_region
-       highlight_region(x,y,w,h,msg) — use exact coords from get_elements, not estimates
-       [after wait_for_click] get_page_text() — confirm result, NOT take_screenshot
-       [last resort only] take_screenshot() — returns image to Claude only (no file, no clipboard)
-       take_and_copy_screenshot() — same as above BUT also saves PNG + copies to clipboard
+    c. When an element can't be found or clicked:
+       scroll_page("down") and retry — always try this first
+       get_elements() — get EXACT DOM coords, use these in highlight_region
+       highlight_region(x,y,w,h,msg) — use exact coords from get_elements
+       [absolute last resort] take_screenshot() — only if you genuinely can't identify the element from DOM
     d. Pause for the user when needed:
        find_and_highlight(text, msg) — show the user what to do
        wait_for_click() — wait for user interaction
-       [after wait_for_click + fill_input] clear_overlays() — always clear after filling
-    e. mark_step_done(i) — check off the step
+       [after fill_input] clear_overlays() — always clear after filling
+    e. mark_step_done(i) — check off the step after it is complete
  4. clear_overlays() — clean up when done
  ```
 
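The confirm-after-click rule in step 3a could be wrapped in a small helper along these lines. This is only a sketch under stated assumptions: the `tools` object, the way each tool is invoked as an async function, and the return shapes are hypothetical stand-ins, not chromeflow's actual interface.

```javascript
// Hypothetical wrapper: `tools` maps tool names to async functions.
// The names mirror the tools listed in CLAUDE.md; the signatures and
// return values are assumptions made for illustration only.
async function clickAndConfirm(tools, label, successSelector) {
  await tools.click_element(label);
  // click_element returns after 600ms regardless of outcome, so the
  // result has to be confirmed with a separate call.
  if (successSelector) {
    // Assumed to resolve to a truthy value once the selector appears.
    const appeared = await tools.wait_for_selector(successSelector);
    return { confirmed: Boolean(appeared), via: "wait_for_selector" };
  }
  // No known selector: read the page text and let the caller inspect it.
  const text = await tools.get_page_text();
  return { confirmed: null, via: "get_page_text", text };
}
```

Neither path takes a screenshot, which matches the "check results with text, not vision" rule in step 3b.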
@@ -102,24 +91,46 @@ After a secret key or API key is revealed:
 
  Use the absolute path for `envPath` — it's the Claude Code working directory + `/.env`.
 
+ To capture and share a screenshot (e.g. for uploading to a form or pasting into a chat),
+ use `take_and_copy_screenshot()` — it saves a PNG to ~/Downloads and copies it to the clipboard.
+
  ## Working with complex forms
- - Before filling a large or unfamiliar form, call `get_form_fields()` to get a full inventory of every field (type, label, current value, vertical position). This prevents missing fields and avoids positional guesswork.
- - `fill_input` works on React-controlled inputs, contenteditable (Stripe, Notion), and **CodeMirror 6 editors** — it auto-detects all three. No `execute_script` workaround needed.
- - `scroll_to_element("label text or #selector")` scrolls a specific field into view without guessing pixel offsets.
- - For multi-session tasks (long forms that may exceed context), call `save_page_state()` as a checkpoint. A future session can call `restore_page_state()` to reload all field values from the saved snapshot.
+ - Before filling a large or unfamiliar form, call `get_form_fields()` to get a full inventory
+ of every field (type, label, current value, vertical position). Use `get_elements()` when
+ you need pixel coordinates of visible elements; use `get_form_fields()` when you need to
+ understand the full structure of a form including fields below the fold.
+ - `fill_input` works on React-controlled inputs, contenteditable (Stripe, Notion), and
+ **CodeMirror 6 editors** — it auto-detects all three. No `execute_script` workaround needed.
+ - Prefer `scroll_to_element("label text or #selector")` over `scroll_page` whenever you know
+ which field or section you need — it scrolls precisely without guessing pixel amounts.
+ - For multi-session tasks (long forms that may exceed context), call `save_page_state()` as a
+ checkpoint. A future session can call `restore_page_state()` to reload all field values.
 
  ## Working with multiple tabs
- - Before opening a new tab, call `list_tabs()` to check if the target URL is already open — use `switch_to_tab` to return to it instead of opening a duplicate.
- - `open_page(url, new_tab=true)` opens a URL without losing the current tab. Use sparingly — prefer switching to an existing tab over opening a new one.
+ - Before opening a new tab, call `list_tabs()` to check if the target URL is already open —
+ use `switch_to_tab` to return to it instead of opening a duplicate.
+ - `open_page(url, new_tab=true)` opens a URL without losing the current tab. Use sparingly —
+ prefer switching to an existing tab over opening a new one.
  - `switch_to_tab("1")` switches by tab number; `switch_to_tab("form")` matches by URL or title substring.
- - `list_tabs()` shows all open tabs with their index, title, and URL.
 
  ## Error handling
- - After any action → `get_page_text()` to check for errors (not `take_screenshot`)
- - After `click_element("Save")` / form submission → use `get_page_text()` or `wait_for_selector` to confirm. Never use `wait_for_navigation`; most form saves don't navigate.
- - After `click_element` always confirm with `wait_for_selector(".selector")` or `get_page_text()`. `click_element` returns success after 600ms even if the action had no effect.
- - `click_element` not found → `scroll_page("down")` then retry
- - Still not found → `get_elements()` to get exact coords, then `highlight_region(x,y,w,h,msg)` using those coords. Only use `take_screenshot()` if you need to visually inspect the page.
- - `fill_input` not found → `click_element(hint)` to focus the field, then retry `fill_input`. If still failing, use `find_and_highlight(hint, "Click here — I'll fill it in")` (NO `valueToType`) then `wait_for_click()` then retry `fill_input` — after the user focuses the field by clicking, the active-element fallback fills it automatically. `find_and_highlight` uses DOM positioning (pixel-perfect) — only fall back to `take_screenshot` + `highlight_region` if `find_and_highlight` returns false. After `fill_input` succeeds, immediately call `clear_overlays()` to remove the highlight. Only use `valueToType` when the user genuinely must type the value themselves (e.g. password, personal data).
- - Waiting for async result (build, save, deploy) → `wait_for_selector(selector, timeout)`
- - Never use Bash to work around a stuck browser interaction
+
+ **After any action**, confirm with `get_page_text()` or `wait_for_selector` — never take a
+ screenshot to check what happened.
+
+ **`click_element` not found:**
+ 1. `scroll_page("down")` then retry `click_element`
+ 2. `get_elements()` to get exact coords → `highlight_region(x,y,w,h,msg)`
+ 3. `take_screenshot()` only if you still can't identify the element from DOM queries
+
+ **`fill_input` not found:**
+ 1. `click_element(hint)` to focus the field, then retry `fill_input`
+ 2. `find_and_highlight(hint, "Click here — I'll fill it in")` (no `valueToType`) then
+ `wait_for_click()` — the user's click focuses the field and `fill_input`'s active-element
+ fallback fills it automatically
+ 3. Call `clear_overlays()` after `fill_input` succeeds
+ 4. Only use `valueToType` when the user must personally type the value (password, personal data)
+
+ **Waiting for async results** (build, save, deploy): `wait_for_selector(selector, timeout)` — never poll with screenshots.
+
+ **Never use Bash to work around a stuck browser interaction.**
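The new three-step `click_element` fallback order can be read as a simple escalation ladder. The sketch below illustrates that order only; the `tools` object and every return shape (`{ found }` from `click_element`, an array of `{ text, x, y, width, height }` from `get_elements`) are assumptions for the example, not the package's documented API.

```javascript
// Hypothetical sketch of the `click_element` fallback order described above.
// `tools` and all return shapes are assumed for illustration:
//   click_element(label) -> { found: boolean }
//   get_elements()       -> [{ text, x, y, width, height }]
async function clickWithFallback(tools, label, msg) {
  // Direct attempt first.
  if ((await tools.click_element(label)).found) return "clicked";

  // Step 1: scroll down, then retry the click.
  await tools.scroll_page("down");
  if ((await tools.click_element(label)).found) return "clicked_after_scroll";

  // Step 2: take exact DOM coordinates and highlight the region for the user.
  const elements = await tools.get_elements();
  const match = elements.find((el) => (el.text || "").includes(label));
  if (match) {
    await tools.highlight_region(match.x, match.y, match.width, match.height, msg);
    return "highlighted";
  }

  // Step 3: screenshot only when DOM queries cannot identify the element.
  await tools.take_screenshot();
  return "screenshot";
}
```

The screenshot call is reachable only after both the scroll-and-retry and the `get_elements` lookup have failed, which is the ordering the 0.1.29 text asks for.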
@@ -2,7 +2,7 @@ import { z } from "zod";
  function registerFlowTools(server, bridge) {
    server.tool(
      "scroll_page",
-     "Scroll the page or the focused panel up or down. Use this when a button (e.g. Save) is below the visible area of a panel or page. After scrolling, retry click_element.",
+     "Scroll the page or the focused panel up or down. Use this when the target location is unknown. If you know which field or element you need, use scroll_to_element instead \u2014 it scrolls precisely without guessing. After scrolling, retry click_element or fill_input.",
      {
        direction: z.enum(["down", "up"]).describe("Scroll direction"),
        amount: z.number().optional().describe("Pixels to scroll (default 400)")
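The updated description amounts to a small decision rule: use `scroll_to_element` when the target is known, `scroll_page` otherwise. A minimal sketch of that rule follows. The `{ tool, args }` return shape and the `target` argument name are hypothetical; only `direction`, `amount`, and the 400-pixel default come from the schema shown in the diff.

```javascript
// Hypothetical helper that picks a scroll tool call from what is known about the target.
// The { tool, args } shape and the `target` key are illustrative assumptions;
// `direction` and `amount` follow the scroll_page schema shown in the diff.
function chooseScroll(target, direction = "down") {
  if (typeof target === "string" && target.trim() !== "") {
    // Known label or selector: scroll precisely to it.
    return { tool: "scroll_to_element", args: { target: target.trim() } };
  }
  // Unknown location: scroll by the documented default of 400 pixels.
  return { tool: "scroll_page", args: { direction, amount: 400 } };
}
```

For instance, `chooseScroll("Billing address")` selects `scroll_to_element`, while `chooseScroll(undefined, "up")` falls back to `scroll_page`.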
@@ -110,7 +110,7 @@ Examples: scroll_to_element("#submit-btn"), scroll_to_element("Billing address")
    );
    server.tool(
      "mark_step_done",
-     "Mark a step in the guide panel as completed (shows a green check). Call this after wait_for_click resolves.",
+     "Mark a step in the guide panel as completed (shows a green check). Call this after any step finishes \u2014 whether Claude acted autonomously or the user completed a highlighted step via wait_for_click.",
      {
        stepIndex: z.number().int().describe("0-based index of the step to mark done")
      },
@@ -28,7 +28,7 @@ function registerHighlightTools(server, bridge) {
        content: [
          {
            type: "text",
-           text: response.found ? `Element containing "${text}" highlighted.` : `Element containing "${text}" not found. Try take_screenshot to identify the element visually.`
+           text: response.found ? `Element containing "${text}" highlighted.` : `Element containing "${text}" not found. Try get_elements() to get exact DOM coordinates, or take_screenshot() only if you need to see the visual layout.`
          }
        ]
      };
@@ -36,7 +36,7 @@ function registerHighlightTools(server, bridge) {
    );
    server.tool(
      "highlight_region",
-     "Highlight a specific pixel region on the page with an instructional callout. Use this after take_screenshot when you can see the element's position.",
+     "Highlight a specific pixel region on the page with an instructional callout. Use the exact coordinates returned by get_elements \u2014 do not estimate positions. Only use take_screenshot first if get_elements cannot identify the element.",
      {
        x: z.number().describe("Left edge of the region in CSS pixels"),
        y: z.number().describe("Top edge of the region in CSS pixels"),
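Since the new description asks for exact `get_elements` coordinates rather than estimates, a caller might pass them through with a helper like the one below. This is a sketch only: the `{ x, y, width, height }` element shape is an assumption about what `get_elements` returns, and the argument order `(x, y, w, h, msg)` is taken from the CLAUDE.md listing.

```javascript
// Hypothetical conversion from one get_elements() entry to highlight_region arguments.
// The element shape { x, y, width, height } is assumed for illustration; the
// argument order (x, y, w, h, msg) follows the CLAUDE.md listing.
function toHighlightArgs(element, msg) {
  const { x, y, width, height } = element;
  for (const [name, value] of Object.entries({ x, y, width, height })) {
    if (!Number.isFinite(value)) {
      throw new TypeError(`element.${name} must be a finite number`);
    }
  }
  // Pass the coordinates through unchanged: no rounding, padding, or estimation.
  return [x, y, width, height, msg];
}
```

Rejecting missing or non-numeric fields up front keeps a bad lookup from turning into a highlight at an arbitrary position.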
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "chromeflow",
-   "version": "0.1.27",
+   "version": "0.1.29",
    "description": "Browser guidance MCP server for Claude Code — highlights, clicks, fills, and captures from the web so you don't have to.",
    "type": "module",
    "bin": {