npm - chromeflow - Versions diffs - 0.1.8 → 0.1.10 - Mend

chromeflow 0.1.8 → 0.1.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4) hide show

package/CLAUDE.md CHANGED Viewed

@@ -22,11 +22,13 @@ Do NOT ask "should I open the browser?" — just do it. The user expects seamles
    `scroll_page` then retry, or use `highlight_region` to show the user. Never use
    `osascript`, `applescript`, or any shell command to control the browser.
-2. **`take_screenshot` is only for pixel-position lookups before `highlight_region`.** Every
-   other state check — after navigation, after a click, after `wait_for_click`, to confirm an
-   action succeeded — must use `get_page_text` or `wait_for_selector`. Never take a screenshot
-   as a "let me see what's on screen" step. Correct order: try `click_element` → if it fails,
-   THEN `take_screenshot` → `highlight_region`. Never screenshot preemptively.
+2. **Use `get_elements` to get pixel coordinates, not `take_screenshot`.** `get_elements`
+   returns exact DOM coordinates (always accurate). `take_screenshot` is a last resort only
+   when you need to SEE the visual layout — not for finding positions. Correct order when
+   `click_element` or `fill_input` fails: try `get_elements` first to get exact coords →
+   then `highlight_region` using those coords. Only use `take_screenshot` if you genuinely
+   need to see what the page looks like. Screenshots now include a red coordinate grid to
+   help read positions — use the grid labels, not visual estimates.
 3. **`open_page` already waits for navigation.** Never call `wait_for_navigation`
    immediately after `open_page` — it will time out.
@@ -54,11 +56,12 @@ Do NOT ask "should I open the browser?" — just do it. The user expects seamles
         get_page_text()                     — read errors/status after actions
         wait_for_selector(".success")       — wait for async changes (builds, modals)
         execute_script("document.title")    — query DOM state programmatically
-   c. Only take a screenshot when click_element failed and you need pixel coords:
+   c. When click_element or fill_input fails and you need pixel coords:
         click_element("Save")               — try this first, ALWAYS
-        [if fails] take_screenshot()        — now get coords, not before
-        highlight_region(x,y,w,h,msg)       — point user to exact location
+        [if fails] get_elements()           — get EXACT DOM coords, use these in highlight_region
+        highlight_region(x,y,w,h,msg)       — use exact coords from get_elements, not estimates
         [after wait_for_click] get_page_text() — confirm result, NOT take_screenshot
+        [last resort only] take_screenshot() — only if you need to see the visual layout
    d. Pause for the user when needed:
         find_and_highlight(text, msg)        — show the user what to do
         wait_for_click()                    — wait for user interaction
@@ -100,7 +103,7 @@ Use the absolute path for `envPath` — it's the Claude Code working directory +
 - After any action → `get_page_text()` to check for errors (not `take_screenshot`)
 - After `click_element("Save")` / form submission → use `get_page_text()` or `wait_for_selector` to confirm. Never use `wait_for_navigation` — most form saves don't navigate.
 - `click_element` not found → `scroll_page("down")` then retry
-- Still not found → `take_screenshot()` then `highlight_region(x,y,w,h,msg)`
+- Still not found → `get_elements()` to get exact coords, then `highlight_region(x,y,w,h,msg)` using those coords. Only use `take_screenshot()` if you need to visually inspect the page.
 - `fill_input` not found → `click_element(hint)` to focus the field, then retry `fill_input`. If still failing, use `find_and_highlight(hint, "Click here — I'll fill it in")` (NO `valueToType`) then `wait_for_click()` then retry `fill_input` — after the user focuses the field by clicking, the active-element fallback fills it automatically. `find_and_highlight` uses DOM positioning (pixel-perfect) — only fall back to `take_screenshot` + `highlight_region` if `find_and_highlight` returns false. After `fill_input` succeeds, immediately call `clear_overlays()` to remove the highlight. Only use `valueToType` when the user genuinely must type the value themselves (e.g. password, personal data).
 - Waiting for async result (build, save, deploy) → `wait_for_selector(selector, timeout)`
 - Never use Bash to work around a stuck browser interaction

package/dist/index.js CHANGED Viewed

@@ -27,7 +27,7 @@ async function main() {
   const bridge = new WsBridge();
   const server = new McpServer({
     name: "chromeflow",
-    version: "0.1.8"
+    version: "0.1.10"
   });
   registerBrowserTools(server, bridge);
   registerHighlightTools(server, bridge);

package/dist/tools/browser.js CHANGED Viewed

@@ -46,6 +46,31 @@ function registerBrowserTools(server, bridge) {
       };
     }
   );
+  server.tool(
+    "get_elements",
+    `Get the exact pixel positions of all visible interactive elements on the page (inputs, buttons, links, selects).
+Use this INSTEAD OF take_screenshot when you need coordinates for highlight_region \u2014 the coordinates are exact DOM values, not estimates.
+Returns a numbered list with element type, label, and precise x/y/width/height in CSS pixels.
+After calling this, use those exact coordinates in highlight_region \u2014 do NOT adjust them.`,
+    {},
+    async () => {
+      const response = await bridge.request({ type: "get_elements" });
+      if (response.type !== "elements_response") throw new Error("Unexpected response");
+      const els = response.elements;
+      if (els.length === 0) {
+        return { content: [{ type: "text", text: "No visible interactive elements found on page." }] };
+      }
+      const lines = els.map(
+        (e) => `${e.index}. ${e.type} "${e.label}" \u2014 x:${e.x} y:${e.y} w:${e.width} h:${e.height}`
+      );
+      return {
+        content: [{ type: "text", text: `Visible interactive elements:
+${lines.join("\n")}
+Use these exact x/y values in highlight_region.` }]
+      };
+    }
+  );
   server.tool(
     "execute_script",
     `Execute JavaScript in the current page's context and return the result as a string.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "chromeflow",
-  "version": "0.1.8",
+  "version": "0.1.10",
   "description": "Browser guidance MCP server for Claude Code — highlights, clicks, fills, and captures from the web so you don't have to.",
   "type": "module",
   "bin": {