npm - chromeflow - Versions diffs - 0.5.0 → 0.7.1 - Mend

chromeflow 0.5.0 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/CLAUDE.md CHANGED Viewed

@@ -25,7 +25,9 @@ Do NOT ask "should I open the browser?" — just do it. The user expects seamles
 2. **Never use `take_screenshot` to read page content.** After `scroll_page`, after
    `click_element`, after navigation — always call `get_page_text`, not `take_screenshot`.
    `get_page_text` returns up to 10,000 characters; if truncated it tells you the next
-   `startIndex` to paginate. Screenshots are only for locating an element's pixel position
+   `startIndex` to paginate. When you only need to confirm a specific phrase is present,
+   prefer `find_text("phrase")` — it returns matches with context and selectors instead of
+   dumping the whole page. Screenshots are only for locating an element's pixel position
    when DOM queries have already failed. Never take more than 1–2 screenshots in a row.
 3. **Use `wait_for_selector` to wait for async page changes** (build completion, modals,
@@ -98,13 +100,18 @@ After a secret key or API key is revealed:
 Use the absolute path for `envPath` — it's the Claude Code working directory + `/.env`.
 To capture and share a screenshot (e.g. for uploading to a form or pasting into a chat),
-use `take_and_copy_screenshot()` — it saves a PNG to ~/Downloads and copies it to the clipboard.
+use `take_screenshot(copy_to_clipboard=true, save_to="downloads")` — saves a PNG to ~/Downloads
+and copies it to the clipboard. The defaults (`copy_to_clipboard=false, save_to="none"`) return
+the image to Claude only.
 ## Working with complex forms
 - Before filling a large or unfamiliar form, call `get_form_fields()` to get a full inventory
   of every field (type, label, current value, vertical position, and section heading). Use
   `get_elements()` when you need pixel coordinates of visible elements; use `get_form_fields()`
   when you need to understand the full structure of a form including fields below the fold.
+  If you only need one or two specific fields, use `find_input("hint")` instead — targeted
+  lookup is much cheaper than the full inventory and returns labels you can pipe straight
+  into `fill_input`.
 - `get_form_fields()` includes `[type=file]` fields even when they are visually hidden behind
   custom drag-and-drop zones. Use `set_file_input(hint, filePath)` to upload a file — provide
   the label/hint text and the absolute path to the file on disk.
@@ -147,6 +154,24 @@ use `take_and_copy_screenshot()` — it saves a PNG to ~/Downloads and copies it
 - For multi-session tasks (long forms that may exceed context), call `save_page_state()` as a
   checkpoint. A future session can call `restore_page_state()` to reload all field values.
+## Discovery — find without dumping the whole page
+Three lightweight tools save tokens vs `get_page_text` / `get_form_fields` when you don't need the full content:
+- `find_text("Saved successfully")` — grep the DOM. Returns surrounding context, a CSS selector, and a `clickable` flag for each match. Use this instead of `get_page_text` when you're checking whether a specific phrase is present, or to locate a button by its visible text. If `clickable=true`, pipe the matched text straight into `click_element`.
+- `find_input("Email")` — fuzzy form-field lookup, top-N. Returns labels you can pipe straight into `fill_input(label, value)` — both tools share the same match ranks (`aria-eq` → `placeholder-eq` → `label-text-eq` → `name-eq` → `id-eq` → `*-includes` → `fuzzy-text-walk`). Cheaper than `get_form_fields` when you just need a couple of specific fields. Pass `type_filter="email"` to restrict to a specific input type.
+- `wait_for_text("Saved")` — wait for text to appear without knowing the selector ahead of time. Complements `wait_for_selector` for the case where you only know the post-action message.
+All three pierce open shadow roots and accept `frame="iframe.selector"` for same-origin iframes. Pass `regex=true` on `find_text` / `wait_for_text` for case-insensitive regex matching. Pass `exact=true` on `find_input` to refuse fuzzy text-walk matches.
+```
+find_text("Build complete", scope_selector=".log-output")     — only check the build log section
+find_input("Card number", type_filter="text")                  — find Stripe's card-number field
+wait_for_text("Deploy successful", timeout_ms=30000)           — wait up to 30s after clicking Deploy
+```
+Reach for these BEFORE `get_page_text` / `get_form_fields` when the goal is "is X here?" or "where is X?". Reserve `get_page_text` for reading actual content, and `get_form_fields` for understanding a whole form's structure.
 ## Working with multiple tabs
 - Before opening a new tab, call `list_tabs()` to check if the target URL is already open —
   use `switch_to_tab` to return to it instead of opening a duplicate.
@@ -277,7 +302,7 @@ instead of DOM text): `get_page_text()` returns nothing useful. Zoom out and scr
 ```js
 // Shrink to fit wide content, then screenshot
 document.body.style.zoom = '0.4';
-// use take_and_copy_screenshot() to read it
+// use take_screenshot() to read it
 // restore afterward:
 document.body.style.zoom = '1';
 ```

package/README.md CHANGED Viewed

@@ -86,8 +86,7 @@ Claude will navigate, highlight steps, click what it can, pause for anything sen
 | Run arbitrary JS | `execute_script` |
 | Read browser console output | `get_console_logs` |
 | Capture credentials to `.env` | `read_element`, `write_to_env` |
-| Screenshot (element location only) | `take_screenshot` |
-| Screenshot + save + copy to clipboard | `take_and_copy_screenshot` |
+| Screenshot (Claude-only by default; pass `copy_to_clipboard` / `save_to` to share) | `take_screenshot` |
 | Screenshot the terminal window | `capture_terminal` |
 | Save/restore form state across tabs | `save_page_state`, `restore_page_state` |

package/dist/setup.js CHANGED Viewed

@@ -160,8 +160,7 @@ const CHROMEFLOW_TOOLS = [
   "scroll_to_element",
   "save_page_state",
   "restore_page_state",
-  // v0.1.25+
-  "take_and_copy_screenshot",
+  // v0.1.25+ → merged into take_screenshot in v0.7.0
   // v0.1.32+
   "fill_form",
   // v0.1.36+
@@ -177,7 +176,11 @@ const CHROMEFLOW_TOOLS = [
   // v0.1.57+
   "inspect_request_headers",
   // v0.2.1+
-  "wait_for_change"
+  "wait_for_change",
+  // v0.6.0+
+  "find_text",
+  "find_input",
+  "wait_for_text"
 ].map((t) => `mcp__chromeflow__${t}`);
 function patchSettingsLocalJson(cwd) {
   const claudeDir = join(cwd, ".claude");

package/dist/tools/browser.js CHANGED Viewed

@@ -53,58 +53,52 @@ ${lines.join("\n")}` }]
   );
   server.tool(
     "take_screenshot",
-    "Capture a screenshot of the current page. IMPORTANT: Do NOT use this to read page content or check what is on the page \u2014 call get_page_text instead, which is faster and returns searchable text. Screenshots are ONLY for locating a specific element's pixel coordinates when get_elements has already failed. Never take a screenshot immediately after open_page, scroll_page, or click_element \u2014 always use get_page_text after those actions. Never take more than 1-2 screenshots in a row. To also save or copy the image, use take_and_copy_screenshot instead.",
-    {},
-    async () => {
-      const response = await bridge.request({ type: "screenshot" });
+    `Capture a screenshot of the current page. By default returns the image to Claude only; pass copy_to_clipboard or save_to to also share the image outside Claude (paste into a chat, upload to a form, keep as a file).
+IMPORTANT: Do NOT use this to read page content \u2014 call get_page_text instead, which is faster and returns searchable text. Screenshots are ONLY for locating an element's pixel coordinates when DOM queries have already failed. Never take a screenshot immediately after open_page, scroll_page, or click_element. Never take more than 1-2 screenshots in a row.`,
+    {
+      copy_to_clipboard: z.boolean().optional().describe("Copy the PNG to the system clipboard (macOS only). Default false."),
+      save_to: z.enum(["downloads", "cwd", "none"]).optional().describe(`Save the PNG to disk: "downloads" (~/Downloads), "cwd" (Claude's working directory), or "none" (default \u2014 image returned only to Claude).`)
+    },
+    async ({ copy_to_clipboard = false, save_to = "none" }) => {
+      const sharing = copy_to_clipboard || save_to !== "none";
+      const response = await bridge.request({ type: "screenshot", grid: !sharing });
       if (response.type !== "screenshot_response") {
         throw new Error("Unexpected response from extension");
       }
-      return {
-        content: [
-          {
-            type: "image",
-            data: response.image,
-            mimeType: "image/png"
-          },
-          {
-            type: "text",
-            text: `Screenshot captured (${response.width}x${response.height}). Analyze the image to identify element positions for highlighting.`
-          }
-        ]
-      };
-    }
-  );
-  server.tool(
-    "take_and_copy_screenshot",
-    `Take a screenshot, return it to Claude, copy it to the system clipboard, and save it as a PNG file.
-Use this instead of take_screenshot when you need the image outside of Claude \u2014 to paste into a chat, upload to a form, or keep as a file.
-Unlike take_screenshot (Claude-only), this also puts the image on the clipboard and saves it to disk.
-save_to controls where the PNG is saved: "downloads" (default) saves to ~/Downloads, "cwd" saves to Claude's current working directory.`,
-    {
-      save_to: z.enum(["downloads", "cwd"]).optional().describe(`Where to save the PNG file: "downloads" (~/Downloads, default) or "cwd" (Claude's current working directory)`)
-    },
-    async ({ save_to = "downloads" }) => {
-      const response = await bridge.request({ type: "screenshot", grid: false });
-      if (response.type !== "screenshot_response") throw new Error("Unexpected response from extension");
+      if (!sharing) {
+        return {
+          content: [
+            { type: "image", data: response.image, mimeType: "image/png" },
+            {
+              type: "text",
+              text: `Screenshot captured (${response.width}x${response.height}). Analyze the image to identify element positions for highlighting.`
+            }
+          ]
+        };
+      }
       const timestamp = (/* @__PURE__ */ new Date()).toISOString().replace(/[:.]/g, "-").slice(0, 19);
       const filename = `chromeflow-${timestamp}.png`;
       const imageBuffer = Buffer.from(response.image, "base64");
       const tmpPath = join(tmpdir(), filename);
       writeFileSync(tmpPath, imageBuffer);
-      const savePath = save_to === "cwd" ? join(process.cwd(), filename) : join(homedir(), "Downloads", filename);
-      copyFileSync(tmpPath, savePath);
-      let clipboardNote = "";
-      try {
-        execSync(`osascript -e 'set the clipboard to (read (POSIX file "${tmpPath}") as \xABclass PNGf\xBB)'`);
-        clipboardNote = "Copied to clipboard. ";
-      } catch {
-        clipboardNote = "";
+      const notes = [];
+      if (save_to !== "none") {
+        const savePath = save_to === "cwd" ? join(process.cwd(), filename) : join(homedir(), "Downloads", filename);
+        copyFileSync(tmpPath, savePath);
+        notes.push(`Saved to ${savePath}`);
+      }
+      if (copy_to_clipboard) {
+        try {
+          execSync(`osascript -e 'set the clipboard to (read (POSIX file "${tmpPath}") as \xABclass PNGf\xBB)'`);
+          notes.push("Copied to clipboard");
+        } catch {
+        }
       }
       return {
         content: [
           { type: "image", data: response.image, mimeType: "image/png" },
-          { type: "text", text: `${clipboardNote}Saved to ${savePath}` }
+          { type: "text", text: notes.length ? notes.join(". ") + "." : `Screenshot captured (${response.width}x${response.height}).` }
         ]
       };
     }
@@ -285,15 +279,7 @@ For iframe contenteditables: pass \`frame\` (a CSS selector for the iframe). typ
   );
   server.tool(
     "set_file_input",
-    `Upload a file to a file input field. Works even when the input is visually hidden behind a custom drag-and-drop zone.
-Uses Chrome DevTools Protocol to set the file \u2014 the only way to bypass the browser's file-input script restriction.
-Returns success=true ONLY if an observable change is detected within wait_ms: either the page-level file count goes up, or the file is consumed by the page's React handler (input is reset), or verify_selector matches a new element on the page. Otherwise success=false with a clear message \u2014 typically because the page rejected the file (size/type) or the React handler hasn't run yet.
-For rapid batch uploads (multiple set_file_input calls in a row), this commit-wait prevents the second CDP call from overwriting the first before React reads it \u2014 no manual sleep needed between calls.
-hint: label text, name, or CSS selector of the file input (or empty string to target the first file input on the page).
-file_path: absolute path to the file on the local filesystem (e.g. /Users/you/Downloads/task.zip).`,
+    "Upload a file to a file input \u2014 works even when the input is hidden behind a custom drag-and-drop zone. Returns success=true only after an observable commit (file count goes up, input gets reset, or verify_selector appears within wait_ms). See CLAUDE.md for batch-upload guidance.",
     {
       hint: z.string().describe("Label text, name, or surrounding text of the file input. Use empty string to target the first file input on the page."),
       file_path: z.string().describe("Absolute path to the file to upload (e.g. /Users/you/Downloads/task.zip)"),
@@ -339,13 +325,9 @@ Returns the matched element's tag/name/id/type so you can verify it was the righ
   );
   server.tool(
     "execute_script",
-    `Execute JavaScript in the current page's context and return the result as a string.
-Use this to read framework state, check DOM properties, or interact with page APIs that aren't reachable via text.
-Prefer get_page_text for reading visible content. Use this for programmatic DOM queries (e.g. checking an element's attribute, reading a value not visible in text).
-Top-level return statements are supported (e.g. multi-statement scripts with \`return value;\`).
-Top-level \`await\` is supported \u2014 write \`return await fetch(url).then(r => r.json())\` directly without the window.__variable + sleep + re-read pattern. Detected automatically when the code contains the \`await\` keyword.
-If the page called alert()/confirm()/prompt() since the last check, the message will appear as PAGE ALERT in the result \u2014 read it and act on it.
-NOTE: Pages with strict Content Security Policy (e.g. Stripe, GitHub) will fall through to a CDP path that bypasses CSP \u2014 but the script still runs, so retries usually aren't needed.`,
+    `Execute JavaScript in the current page's context and return the result. Use for reading framework state or DOM properties not visible in text \u2014 prefer get_page_text for visible content. Top-level \`return\` and \`await\` are supported.
+CSP-strict pages (Stripe, GitHub) silently fall through to a CDP eval path. Page alerts (alert/confirm/prompt) fired since the last script appear as PAGE ALERT in the result.`,
     {
       code: z.string().describe(
         "JavaScript expression or multi-statement script to evaluate in the page. Top-level `return` is supported."

package/dist/tools/flow.js CHANGED Viewed

@@ -187,6 +187,197 @@ Examples: scroll_to_element("#submit-btn"), scroll_to_element("Billing address")
       return { content: [{ type: "text", text: msg }] };
     }
   );
+  server.tool(
+    "find_text",
+    `Search the page for text and get back actionable matches without dumping the whole DOM. Use this instead of get_page_text when you only need to know "is X on the page?" or "where is the Save button?".
+For each match, returns the surrounding context, the nearest meaningful element (button/link/heading/role/label/etc.), a best-effort CSS selector, and a clickable flag. If a match is clickable, pipe the matched text into click_element to act on it.
+Use when:
+- Checking whether a toast / error message / heading appeared after an action
+- Locating one of multiple buttons by text
+- Finding all instances of a phrase to count or inspect them
+Do NOT use for: reading large blocks of body text \u2014 use get_page_text(selector=...) for that. find_text returns one short snippet per match, not the full content.
+Pierces open shadow roots. Pass frame="iframe.selector" to search inside a same-origin iframe.`,
+    {
+      query: z.string().describe(
+        "Text to search for. Substring match by default; pass regex=true to interpret as a case-insensitive regex."
+      ),
+      max: z.number().int().min(1).optional().describe("Maximum matches to return (default 10). total_matches is reported even when truncated."),
+      scope_selector: z.string().optional().describe('Limit search to descendants of this CSS selector (e.g. ".main-panel", "#dialog"). Default searches the whole body.'),
+      regex: z.boolean().optional().describe("Treat query as a regex (case-insensitive). Default false."),
+      visible_only: z.boolean().optional().describe("Skip matches inside display:none / visibility:hidden / aria-hidden=true ancestors. Default true."),
+      context_chars: z.number().int().min(0).optional().describe("Characters of surrounding context to include before/after each match. Default 60."),
+      frame: z.string().optional().describe('Same-origin iframe CSS selector (e.g. "iframe.editor") to search inside. Cross-origin iframes are not supported.')
+    },
+    async ({ query, max, scope_selector, regex, visible_only, context_chars, frame }) => {
+      const response = await bridge.request({
+        type: "find_text",
+        query,
+        max,
+        scope_selector,
+        regex,
+        visible_only,
+        context_chars,
+        frame
+      });
+      const r = response;
+      if (r.frame_error) {
+        return { content: [{ type: "text", text: r.frame_error }] };
+      }
+      if (r.scope_missed) {
+        return {
+          content: [
+            { type: "text", text: `Scope selector "${scope_selector}" did not match any element \u2014 no search performed.` }
+          ]
+        };
+      }
+      if (r.matches.length === 0) {
+        return {
+          content: [
+            { type: "text", text: `No matches found for "${query}".` }
+          ]
+        };
+      }
+      const lines = r.matches.map((m, i) => {
+        const role = m.role ? `, role=${m.role}` : "";
+        const click = m.clickable ? " \u2014 clickable" : "";
+        const pos = m.position ? ` at (${m.position.x}, ${m.position.y}, ${m.position.width}\xD7${m.position.height})` : "";
+        return `  ${i + 1}. [${m.tag}${role}]${click}${pos} \u2014 selector: ${m.selector}
+     "${m.text}"
+     context: ${m.context}`;
+      });
+      const header = r.truncated ? `Found ${r.matches.length} of ${r.total_matches} matches for "${query}":` : `Found ${r.matches.length} match${r.matches.length === 1 ? "" : "es"} for "${query}":`;
+      return {
+        content: [{ type: "text", text: `${header}
+${lines.join("\n")}` }]
+      };
+    }
+  );
+  server.tool(
+    "find_input",
+    `Locate form inputs whose label / placeholder / aria-label / name / id matches a hint, returning the top N with their section heading. Use this instead of get_form_fields when you only need a couple of fields \u2014 it's the targeted lookup, not the full inventory.
+Match strength is reported as match_kind: aria-eq / placeholder-eq / label-text-eq / name-eq / id-eq are exact matches; *-includes are partial matches; fuzzy-text-walk is the lowest-confidence fallback.
+Returned labels are designed to be piped straight into fill_input(label, value), which uses the same fuzzy ranks to find the same field again. No CSS selector is returned \u2014 fill_input matches by label text, not by selector.
+Use when:
+- "Is the Email field on this page?"
+- "Find the price input below the fold"
+- "Which input has placeholder 'you@example.com'?"
+Do NOT use for: filling fields (use fill_input / fill_form). For the full form inventory (every field including hidden ones), use get_form_fields.
+Pierces open shadow roots. Pass frame="iframe.selector" to search inside a same-origin iframe. Pass exact=true to refuse fuzzy text-walk and *-includes matches when the hint is short and could collide with neighbours.`,
+    {
+      query: z.string().describe(
+        "Hint to match against the field's label, placeholder, aria-label, name, or id (e.g. 'Email', 'price', 'Card number')"
+      ),
+      type_filter: z.string().optional().describe(
+        'Restrict to a specific input type \u2014 "email", "checkbox", "file", "textarea", "select", "number", etc. Default "any".'
+      ),
+      max: z.number().int().min(1).optional().describe("Maximum fields to return (default 5). total_matches is reported even when truncated."),
+      exact: z.boolean().optional().describe(
+        "If true, return only exact equality matches (aria-eq / placeholder-eq / label-text-eq / name-eq / id-eq). Skips fuzzy text-walk and *-includes. Default false."
+      ),
+      frame: z.string().optional().describe("Same-origin iframe CSS selector to search inside. Cross-origin iframes are not supported.")
+    },
+    async ({ query, type_filter, max, exact, frame }) => {
+      const response = await bridge.request({
+        type: "find_input",
+        query,
+        type_filter,
+        max,
+        exact,
+        frame
+      });
+      const r = response;
+      if (r.frame_error) {
+        return { content: [{ type: "text", text: r.frame_error }] };
+      }
+      if (r.fields.length === 0) {
+        return {
+          content: [{ type: "text", text: `No input fields found matching "${query}".` }]
+        };
+      }
+      const lines = r.fields.map((f, i) => {
+        const placeholderPart = f.placeholder ? ` placeholder="${f.placeholder}"` : "";
+        const valuePart = f.value ? ` value="${f.value}"` : "";
+        const underPart = f.under ? ` [under: "${f.under}"]` : "";
+        const posPart = f.position ? ` at y=${f.position.y}` : "";
+        return `  ${i + 1}. "${f.label}" type=${f.type}${placeholderPart}${valuePart}${underPart} \u2014 match: ${f.match_kind}${posPart}`;
+      });
+      const header = r.truncated ? `Found ${r.fields.length} of ${r.total_matches} input(s) for "${query}":` : `Found ${r.fields.length} input${r.fields.length === 1 ? "" : "s"} for "${query}":`;
+      return {
+        content: [{ type: "text", text: `${header}
+${lines.join("\n")}
+To fill: fill_input("${r.fields[0].label}", "<value>")` }]
+      };
+    }
+  );
+  server.tool(
+    "wait_for_text",
+    `Wait for text to appear in the DOM. Complement to wait_for_selector for the case where you only know the message text \u2014 no selector required. Uses a MutationObserver under the hood, no polling.
+Resolves on the first match (or if the text is already present). Returns the elapsed time, the matched text, and the surrounding context.
+Use when:
+- "Click Save, then wait for 'Saved successfully' to show"
+- "Wait until the deploy log says 'Build complete'"
+- Any case where the post-action signal is a phrase, not a known selector
+Do NOT use for: waiting on a known CSS selector (use wait_for_selector \u2014 slightly cheaper).
+Pierces open shadow roots. Pass frame="iframe.selector" to wait for text inside a same-origin iframe.`,
+    {
+      query: z.string().describe("Text to wait for (substring by default; pass regex=true for a regex)"),
+      timeout_ms: z.number().int().min(100).optional().describe("Maximum milliseconds to wait (default 10000)"),
+      scope_selector: z.string().optional().describe("Limit the observation to a CSS selector's subtree (e.g. '.toast-region')"),
+      regex: z.boolean().optional().describe("Treat query as a regex (case-insensitive). Default false."),
+      frame: z.string().optional().describe("Same-origin iframe CSS selector to wait inside. Cross-origin iframes are not supported.")
+    },
+    async ({ query, timeout_ms, scope_selector, regex, frame }) => {
+      const wsTimeout = Math.max(15e3, (timeout_ms ?? 1e4) + 5e3);
+      const response = await bridge.request(
+        {
+          type: "wait_for_text",
+          query,
+          timeout_ms,
+          scope_selector,
+          regex,
+          frame
+        },
+        wsTimeout
+      );
+      const r = response;
+      if (r.frame_error) {
+        return { content: [{ type: "text", text: r.frame_error }] };
+      }
+      if (!r.found) {
+        return {
+          content: [
+            { type: "text", text: `Timed out after ${r.elapsed_ms}ms waiting for "${query}".` }
+          ]
+        };
+      }
+      const ctx = r.context ? `
+context: ${r.context}` : "";
+      const sel = r.selector ? `
+selector: ${r.selector}` : "";
+      return {
+        content: [
+          {
+            type: "text",
+            text: `Found "${r.text}" after ${r.elapsed_ms}ms.${sel}${ctx}`
+          }
+        ]
+      };
+    }
+  );
   server.tool(
     "fill_form",
     `Fill multiple form fields in a single call by targeting each field by its label text.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "chromeflow",
-  "version": "0.5.0",
+  "version": "0.7.1",
   "description": "Browser guidance MCP server for Claude Code — highlights, clicks, fills, and captures from the web so you don't have to.",
   "type": "module",
   "bin": {