npm - chrometools-mcp - Versions diffs - 3.5.0 → 3.5.1 - Mend

chrometools-mcp 3.5.0 → 3.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/CHANGELOG.md +12 -0
package/README.md +8 -8
package/element-finder-utils.js +15 -2
package/index.js +16 -3
package/package.json +1 -1
package/pom/apom-tree-converter.js +22 -14
package/server/tool-definitions.js +3 -3
package/server/tool-schemas.js +2 -2
package/utils/actions/click-action.js +10 -3
package/utils/actions/screenshot-action.js +4 -4
package/utils/element-actions.js +51 -11

package/CHANGELOG.md CHANGED Viewed

@@ -2,6 +2,18 @@
 All notable changes to this project will be documented in this file.
+## [3.5.1] - 2026-02-16
+### Fixed
+- **APOM selector uniqueness** — `generateSelector()` in APOM tree converter now verifies CSS selector uniqueness against the entire document instead of just parent element. Fixes critical bug where `click(id: "button_X")` could click the wrong element (e.g., navigation button instead of action button) when multiple elements shared the same class like `.ant-btn`
+- **findElementsByText click timeout** — `executeElementAction` click now uses adaptive strategy with 5s timeout and JS fallback instead of raw Puppeteer `element.click()` which could hang indefinitely on elements inside complex layouts (antd Tabs, scrollable containers)
+- **findElementsByText non-unique selectors** — `getUniqueSelectorInPage` fallback now checks selector uniqueness at each level of path building (max depth 5→8), preventing clicks on wrong elements when multiple matches exist
+### Changed
+- **Screenshot defaults optimized** — Default format changed from PNG/auto to JPEG quality 40 for all screenshot tools, reducing token usage from ~15-25k to ~5-10k tokens per screenshot
+- **Action screenshots compressed** — Screenshots from click/findElementsByText/hover with `screenshot: true` now use lightweight JPEG (quality 40, maxWidth 800) instead of raw PNG, dramatically reducing context consumption
+- **Jimp warmup** — Pre-warms Jimp image processor at server startup (non-blocking, after transport connect) to avoid cold-start delays on first screenshot
 ## [3.5.0] - 2026-02-16
 ### Added

package/README.md CHANGED Viewed

@@ -8,7 +8,7 @@
 **For AI Agents & Developers:**
 - 🎯 **56+ specialized tools** for browser automation - from simple clicks to Figma comparisons
-- 🧠 **APOM (Agent Page Object Model)** - AI-friendly page representation (~8-10k tokens vs 15-25k for screenshots)
+- 🧠 **APOM (Agent Page Object Model)** - AI-friendly page representation (~8-10k tokens vs 5-10k for screenshots)
 - 🔄 **Persistent browser sessions** - pages stay open between commands for iterative workflows
 - ⚡ **Framework-aware** - handles React, Vue, Angular events and state updates automatically
 - 📸 **Visual testing** - compare designs pixel-by-pixel with Figma integration
@@ -322,7 +322,7 @@ executeScenario({ name: "login_flow", parameters: { email: "user@test.com" } })
 1. ✅ **`analyzePage()`** - PRIMARY tool for reading page content
    - Gets forms, inputs, buttons, links with current values
    - Use `refresh: true` after interactions to see updated state
-   - Efficient: 2-5k tokens vs screenshot 15-25k
+   - Efficient: 2-5k tokens vs screenshot 5-10k
 2. ✅ **`findElementsByText()`** - Find specific elements by visible text
 3. ✅ **`getElement()`** - Get HTML of specific element
 4. ⚠️ **`executeScript()`** - LAST RESORT, only if above failed
@@ -397,7 +397,7 @@ executeScenario({ name: "login_flow", parameters: { email: "user@test.com" } })
   - `useLegacyFormat` (optional): Return legacy format instead of APOM (default: false - APOM is the default)
   - `registerElements` (optional): Auto-register elements for ID-based usage (default: true)   - `groupBy` (optional): 'type' or 'flat' - how to group elements (default: 'type') - **Why better than screenshot**:
   - Shows actual data (form values, validation errors) not just visual
-  - Uses 2-5k tokens vs screenshot 15-25k tokens
+  - Uses 2-5k tokens vs screenshot 5-10k tokens
   - Returns structured data with **unique element IDs** for easy interaction
   - **Detects UI frameworks** (MUI, Ant Design, Chakra, Bootstrap, Vuetify, Semantic UI)  - **Extracts dropdown options** from both native `<select>` and custom UI components- **Returns**:
   - **APOM format** (default): Tree-structured Page Object Model with unique IDs     - `tree` - Hierarchical tree of page elements (optimized: ~82% smaller than flat format)
@@ -674,11 +674,11 @@ Capture optimized screenshot of specific element with smart compression and auto
   - `padding` (optional): Padding in pixels (default: 0)
   - `maxWidth` (optional): Max width for auto-scaling (default: 1024, null for original size)
   - `maxHeight` (optional): Max height for auto-scaling (default: 8000, null for original size)
-  - `quality` (optional): JPEG quality 1-100 (default: 80)
-  - `format` (optional): 'png', 'jpeg', or 'auto' (default: 'auto')
+  - `quality` (optional): JPEG quality 1-100 (default: 40)
+  - `format` (optional): 'png', 'jpeg', or 'auto' (default: 'jpeg')
 - **Use case**: Visual documentation, bug reports
-- **Returns**: Optimized image with metadata
-- **Default behavior**: Auto-scales to 1024px width and 8000px height (API limit) and uses smart compression to reduce AI token usage
+- **Returns**: Optimized image with metadata (~5-10k tokens)
+- **Default behavior**: JPEG at quality 40, auto-scales to 1024px width and 8000px height (API limit). For higher quality, explicitly set `quality` and `format` parameters
 - **Automatic compression**: If image exceeds 3 MB, automatically reduces quality or scales down to fit within limit
 - **For original quality**: Set `maxWidth: null`, `maxHeight: null` and `format: 'png'` (still enforces 3 MB limit)
@@ -692,7 +692,7 @@ Save optimized screenshot to filesystem without returning in context, with autom
   - `maxHeight` (optional): Max height for auto-scaling (default: 8000, null for original)
   - `quality` (optional): JPEG quality 1-100 (default: 80)
   - `format` (optional): 'png', 'jpeg', or 'auto' (default: 'auto')
-- **Use case**: Baseline screenshots, file storage
+- **Use case**: Baseline screenshots, file storage (higher quality defaults than `screenshot` tool)
 - **Returns**: File path and metadata (not image data)
 - **Default behavior**: Auto-scales and compresses to save disk space
 - **Automatic compression**: If image exceeds 3 MB, automatically reduces quality or scales down to fit within limit

package/element-finder-utils.js CHANGED Viewed

@@ -462,8 +462,10 @@ function getUniqueSelectorInPage(element) {
   }
   // 8. Fallback: nth-of-type with path
+  // Build path up to 8 levels, verifying uniqueness
   let current = element;
   const path = [];
+  const MAX_PATH_DEPTH = 8;
   while (current && current.tagName) {
     let selector = current.tagName.toLowerCase();
@@ -497,10 +499,21 @@ function getUniqueSelectorInPage(element) {
     }
     path.unshift(selector);
+    // Check if current path is already unique
+    try {
+      const candidateSelector = path.join(' > ');
+      if (document.querySelectorAll(candidateSelector).length === 1) {
+        return candidateSelector;
+      }
+    } catch (e) {
+      // Invalid selector, continue building path
+    }
     current = current.parentElement;
-    // Stop at body or after 5 levels
-    if (!current || current.tagName === 'BODY' || path.length >= 5) {
+    // Stop at body or after max depth
+    if (!current || current.tagName === 'BODY' || path.length >= MAX_PATH_DEPTH) {
       break;
     }
   }

package/index.js CHANGED Viewed

@@ -1,4 +1,4 @@
-#!/usr/bin/env node
+#!/usr/bin/env node
 import {Server} from "@modelcontextprotocol/sdk/server/index.js";
 import {StdioServerTransport} from "@modelcontextprotocol/sdk/server/stdio.js";
@@ -2334,7 +2334,7 @@ Start coding now.`;
             return {
               content: [
                 { type: 'text', text: JSON.stringify(response, null, 2) },
-                { type: 'image', data: actionResult.screenshot, mimeType: 'image/png' }
+                { type: 'image', data: actionResult.screenshot, mimeType: actionResult.screenshotMimeType || 'image/png' }
               ]
             };
           }
@@ -2760,7 +2760,7 @@ Start coding now.`;
             return {
               content: [
                 { type: 'text', text: JSON.stringify(response, null, 2) },
-                { type: 'image', data: actionResult.screenshot, mimeType: 'image/png' }
+                { type: 'image', data: actionResult.screenshot, mimeType: actionResult.screenshotMimeType || 'image/png' }
               ]
             };
           }
@@ -3938,6 +3938,19 @@ async function main() {
   console.error("chrometools-mcp server running on stdio");
   console.error("Browser will be initialized on first openBrowser call");
+  // Pre-warm Jimp AFTER server is connected (non-blocking)
+  // Jimp v0.22 constructor is thenable - awaiting it before server.connect()
+  // would block transport and cause MCP client timeout
+  (async () => {
+    try {
+      const img = await new Jimp(1, 1, 0x000000ff);
+      await img.getBufferAsync(Jimp.MIME_JPEG);
+      console.error("[chrometools-mcp] Jimp pre-warmed");
+    } catch (e) {
+      console.error("[chrometools-mcp] Jimp pre-warm failed:", e.message);
+    }
+  })();
 }
 main().catch((error) => {

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "chrometools-mcp",
-  "version": "3.5.0",
+  "version": "3.5.1",
   "description": "MCP (Model Context Protocol) server for Chrome automation using Puppeteer. Persistent browser sessions, UI framework detection (MUI, Ant Design, etc.), Page Object support, visual testing, Figma comparison. Works seamlessly in WSL, Linux, macOS, and Windows.",
   "type": "module",
   "main": "index.js",

package/pom/apom-tree-converter.js CHANGED Viewed

@@ -1003,30 +1003,29 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
     if (stableClass) {
       const escapedClass = CSS.escape(stableClass);
       const classSelector = `.${escapedClass}`;
-      // Verify it's unique within parent context
-      if (element.parentElement) {
-        try {
-          const matches = element.parentElement.querySelectorAll(classSelector);
-          if (matches.length === 1 && matches[0] === element) {
-            return classSelector;
-          }
-        } catch (e) {
-          // Invalid selector, continue to path-based approach
+      // Verify it's unique in the ENTIRE document (not just parent)
+      try {
+        const matches = document.querySelectorAll(classSelector);
+        if (matches.length === 1 && matches[0] === element) {
+          return classSelector;
         }
+      } catch (e) {
+        // Invalid selector, continue to path-based approach
       }
     }
-    // Build path from parent
+    // Build path from element to body, checking uniqueness at each level
     const path = [];
     let current = element;
+    const MAX_PATH_DEPTH = 8;
-    while (current && current !== document.body) {
+    while (current && current !== document.body && path.length < MAX_PATH_DEPTH) {
       let selector = current.tagName.toLowerCase();
       // Add stable class if available (escaped for CSS selector safety)
-      const stableClass = getStableClassName(current);
-      if (stableClass) {
-        selector += `.${CSS.escape(stableClass)}`;
+      const cls = getStableClassName(current);
+      if (cls) {
+        selector += `.${CSS.escape(cls)}`;
       }
       // Add nth-of-type if needed
@@ -1041,6 +1040,15 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
       }
       path.unshift(selector);
+      // Check if current path is already unique in the document
+      try {
+        const candidateSelector = path.join(' > ');
+        if (document.querySelectorAll(candidateSelector).length === 1) {
+          return candidateSelector;
+        }
+      } catch (e) { /* continue building path */ }
       current = current.parentElement;
     }

package/server/tool-definitions.js CHANGED Viewed

@@ -92,7 +92,7 @@ export const toolDefinitions = [
       },
       {
         name: "screenshot",
-        description: "Capture element image (15-25k tokens). Use analyzePage for form data/validation (8-10k tokens).",
+        description: "Capture element image (5-10k tokens). Use analyzePage for form data/validation (8-10k tokens).",
         inputSchema: {
           type: "object",
           properties: {
@@ -100,8 +100,8 @@ export const toolDefinitions = [
             padding: { type: "number", description: "Padding px (default: 0)" },
             maxWidth: { type: "number", description: "Max width px (default: 1024, null=original)" },
             maxHeight: { type: "number", description: "Max height px (default: 8000, null=original)" },
-            quality: { type: "number", minimum: 1, maximum: 100, description: "JPEG quality (default: 80)" },
-            format: { type: "string", enum: ["png", "jpeg", "auto"], description: "Format (default: auto)" },
+            quality: { type: "number", minimum: 1, maximum: 100, description: "JPEG quality (default: 40)" },
+            format: { type: "string", enum: ["png", "jpeg", "auto"], description: "Format (default: jpeg)" },
           },
           required: ["selector"],
         },

package/server/tool-schemas.js CHANGED Viewed

@@ -115,8 +115,8 @@ export const ScreenshotSchema = z.object({
   padding: z.number().optional().describe("Padding around element in pixels (default: 0)"),
   maxWidth: z.number().nullable().optional().describe("Maximum width in pixels, auto-scales if larger (default: 1024, set to null for original size)"),
   maxHeight: z.number().nullable().optional().describe("Maximum height in pixels, auto-scales if larger (default: 8000 for API limit, set to null for original size)"),
-  quality: z.number().min(1).max(100).optional().describe("JPEG quality 1-100 (default: 80, only applies to JPEG format)"),
-  format: z.enum(['png', 'jpeg', 'auto']).optional().describe("Image format: 'png', 'jpeg', or 'auto' (default: 'auto' - chooses based on size)"),
+  quality: z.number().min(1).max(100).optional().describe("JPEG quality 1-100 (default: 40)"),
+  format: z.enum(['png', 'jpeg', 'auto']).optional().describe("Image format (default: 'jpeg')"),
 }).refine(data => (data.id && !data.selector) || (!data.id && data.selector), {
   message: "Either 'id' or 'selector' must be provided, but not both"
 });

package/utils/actions/click-action.js CHANGED Viewed

@@ -5,6 +5,7 @@
 import { runPostClickDiagnostics, formatDiagnosticsForAI } from '../post-click-diagnostics.js';
 import { generateClickHints } from '../hints-generator.js';
+import { processScreenshot } from '../screenshot-processor.js';
 /**
  * Execute click action on element with adaptive strategy
@@ -186,10 +187,16 @@ export async function executeClickAction(page, element, options = {}) {
     { type: "text", text: `Clicked: ${identifier}${hintsText}${diagnosticsText}` }
   ];
-  // Only add screenshot if requested
+  // Only add screenshot if requested — lightweight JPEG for action confirmation
   if (screenshot === true) {
-    const screenshotData = await page.screenshot({ encoding: 'base64', fullPage: false });
-    content.push({ type: "image", data: screenshotData, mimeType: "image/png" });
+    const screenshotBuffer = await page.screenshot({ encoding: 'binary', fullPage: false });
+    const processed = await processScreenshot(screenshotBuffer, {
+      maxWidth: 800,
+      maxHeight: 4000,
+      quality: 40,
+      format: 'jpeg',
+    });
+    content.push({ type: "image", data: processed.buffer.toString('base64'), mimeType: processed.mimeType });
   }
   return { content };

package/utils/actions/screenshot-action.js CHANGED Viewed

@@ -14,8 +14,8 @@ import { processScreenshot } from '../image-processing.js';
  * @param {number} options.padding - Padding around element in pixels (default: 0)
  * @param {number|null} options.maxWidth - Max width for scaling (default: 1024, null for original)
  * @param {number|null} options.maxHeight - Max height for scaling (default: 8000, null for original)
- * @param {number} options.quality - JPEG quality 1-100 (default: 80)
- * @param {string} options.format - Image format: 'png', 'jpeg', 'auto' (default: 'auto')
+ * @param {number} options.quality - JPEG quality 1-100 (default: 50)
+ * @param {string} options.format - Image format: 'png', 'jpeg', 'auto' (default: 'jpeg')
  * @returns {Promise<Object>} Result with content array (text + image)
  */
 export async function executeScreenshotAction(page, element, options = {}) {
@@ -24,8 +24,8 @@ export async function executeScreenshotAction(page, element, options = {}) {
     padding = 0,
     maxWidth = 1024,
     maxHeight = 8000,
-    quality = 80,
-    format = 'auto'
+    quality = 40,
+    format = 'jpeg'
   } = options;
   // Scroll to element to ensure it's in viewport

package/utils/element-actions.js CHANGED Viewed

@@ -1,3 +1,25 @@
+import { processScreenshot } from './screenshot-processor.js';
+// Lightweight action screenshot: small JPEG for confirming actions worked
+// These are "report" screenshots, not for detailed analysis
+async function takeActionScreenshot(page, clip) {
+  const screenshotBuffer = await page.screenshot({
+    encoding: 'binary',
+    fullPage: false,
+    ...(clip ? { clip } : {})
+  });
+  const processed = await processScreenshot(screenshotBuffer, {
+    maxWidth: 800,
+    maxHeight: 4000,
+    quality: 40,
+    format: 'jpeg',
+  });
+  return {
+    data: processed.buffer.toString('base64'),
+    mimeType: processed.mimeType,
+  };
+}
 // Helper function to execute actions on elements
 export async function executeElementAction(page, selector, action) {
   if (!action || !action.type) {
@@ -17,13 +39,27 @@ export async function executeElementAction(page, selector, action) {
   switch (action.type) {
     case 'click':
-      await element.click();
+      // Scroll element into view first (direct JS, avoids Puppeteer's scrollIntoViewIfNeeded hang)
+      await element.evaluate(el => el.scrollIntoView({ behavior: 'instant', block: 'center' }));
+      // Click with timeout + JS fallback (Puppeteer's click can hang in complex layouts)
+      try {
+        await Promise.race([
+          element.click(),
+          new Promise((_, reject) => setTimeout(() => reject(new Error('click timeout')), 5000))
+        ]);
+      } catch (e) {
+        // Fallback to JS click (bypasses Puppeteer's coordinate-based click)
+        await element.evaluate(el => el.click());
+      }
       await new Promise(resolve => setTimeout(resolve, action.waitAfter || 1500));
       result.message = `Clicked on ${selector}`;
       if (action.screenshot) {
-        const screenshot = await page.screenshot({ encoding: 'base64', fullPage: false });
-        result.screenshot = screenshot;
+        const { data, mimeType } = await takeActionScreenshot(page);
+        result.screenshot = data;
+        result.screenshotMimeType = mimeType;
       }
       break;
@@ -38,8 +74,9 @@ export async function executeElementAction(page, selector, action) {
       result.message = `Typed "${action.text}" into ${selector}`;
       if (action.screenshot) {
-        const screenshot = await page.screenshot({ encoding: 'base64', fullPage: false });
-        result.screenshot = screenshot;
+        const { data, mimeType } = await takeActionScreenshot(page);
+        result.screenshot = data;
+        result.screenshotMimeType = mimeType;
       }
       break;
@@ -65,9 +102,10 @@ export async function executeElementAction(page, selector, action) {
         width: Math.max(box.width, 1),
         height: Math.max(box.height, 1)
       };
-      const screenshot = await page.screenshot({ clip, encoding: 'base64' });
+      const { data: screenshotData, mimeType: screenshotMime } = await takeActionScreenshot(page, clip);
       result.message = `Captured screenshot of ${selector}`;
-      result.screenshot = screenshot;
+      result.screenshot = screenshotData;
+      result.screenshotMimeType = screenshotMime;
       break;
     case 'hover':
@@ -76,8 +114,9 @@ export async function executeElementAction(page, selector, action) {
       result.message = `Hovered over ${selector}`;
       if (action.screenshot) {
-        const screenshot = await page.screenshot({ encoding: 'base64', fullPage: false });
-        result.screenshot = screenshot;
+        const { data, mimeType } = await takeActionScreenshot(page);
+        result.screenshot = data;
+        result.screenshotMimeType = mimeType;
       }
       break;
@@ -101,8 +140,9 @@ export async function executeElementAction(page, selector, action) {
       result.message = `Applied styles to ${selector}`;
       if (action.screenshot) {
-        const screenshot = await page.screenshot({ encoding: 'base64', fullPage: false });
-        result.screenshot = screenshot;
+        const { data, mimeType } = await takeActionScreenshot(page);
+        result.screenshot = data;
+        result.screenshotMimeType = mimeType;
       }
       break;