npm - chrometools-mcp - Versions diffs - 3.5.0 → 3.5.2 - Mend

chrometools-mcp 3.5.0 → 3.5.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14) hide show

package/CHANGELOG.md +23 -0
package/README.md +16 -8
package/element-finder-utils.js +15 -2
package/index.js +17 -4
package/models/TextInputModel.js +22 -4
package/models/TextareaModel.js +19 -3
package/models/index.js +39 -1
package/package.json +1 -1
package/pom/apom-tree-converter.js +125 -26
package/server/tool-definitions.js +3 -3
package/server/tool-schemas.js +2 -2
package/utils/actions/click-action.js +10 -3
package/utils/actions/screenshot-action.js +4 -4
package/utils/element-actions.js +51 -11

package/CHANGELOG.md CHANGED Viewed

@@ -2,6 +2,29 @@
 All notable changes to this project will be documented in this file.
+## [3.5.2] - 2026-02-16
+### Added
+- **Modal/Dialog detection (React Portals)** — `analyzePage` now detects modals rendered via React Portals (antd, MUI, Bootstrap, Chakra, Element UI, Headless UI, Radix, Mantine). ModalModel class in Element Model system with `role="dialog"` / `aria-modal="true"` matching. Portal wrapper ancestors are force-included in APOM tree with compact format. Modal metadata includes title and action buttons
+- **TxtInp `clear` action** — TextInput model now supports `executeModelAction(action: "clear")` for clearing pre-filled form fields
+### Fixed
+- **React controlled input clearing** — `type(clearFirst: true)` now works correctly with React/Vue/Angular controlled inputs (antd `<Input>`, MUI `<TextField>`, etc.). Uses native `HTMLInputElement.prototype.value` setter to bypass framework value trackers that ignored programmatic `el.value = ''` changes. Applied to both TextInputModel and TextareaModel
+- **"ModelRegistry is not defined" error** — Fixed sporadic ReferenceError when calling `executeModelAction` or `click` after page navigation. Bare `ModelRegistry` identifier was inaccessible after `eval()` in strict mode contexts; changed to `window.ModelRegistry` reference
+- **Modal output bloat** — ModalModel now only matches actual dialog elements (`role="dialog"`), not framework wrapper divs (`ant-modal-root`, `ant-modal-wrap`). Reduces `modalCount` from 3 to 1 per modal and forces wrapper ancestors to compact container format
+## [3.5.1] - 2026-02-16
+### Fixed
+- **APOM selector uniqueness** — `generateSelector()` in APOM tree converter now verifies CSS selector uniqueness against the entire document instead of just parent element. Fixes critical bug where `click(id: "button_X")` could click the wrong element (e.g., navigation button instead of action button) when multiple elements shared the same class like `.ant-btn`
+- **findElementsByText click timeout** — `executeElementAction` click now uses adaptive strategy with 5s timeout and JS fallback instead of raw Puppeteer `element.click()` which could hang indefinitely on elements inside complex layouts (antd Tabs, scrollable containers)
+- **findElementsByText non-unique selectors** — `getUniqueSelectorInPage` fallback now checks selector uniqueness at each level of path building (max depth 5→8), preventing clicks on wrong elements when multiple matches exist
+### Changed
+- **Screenshot defaults optimized** — Default format changed from PNG/auto to JPEG quality 40 for all screenshot tools, reducing token usage from ~15-25k to ~5-10k tokens per screenshot
+- **Action screenshots compressed** — Screenshots from click/findElementsByText/hover with `screenshot: true` now use lightweight JPEG (quality 40, maxWidth 800) instead of raw PNG, dramatically reducing context consumption
+- **Jimp warmup** — Pre-warms Jimp image processor at server startup (non-blocking, after transport connect) to avoid cold-start delays on first screenshot
 ## [3.5.0] - 2026-02-16
 ### Added

package/README.md CHANGED Viewed

@@ -8,7 +8,7 @@
 **For AI Agents & Developers:**
 - 🎯 **56+ specialized tools** for browser automation - from simple clicks to Figma comparisons
-- 🧠 **APOM (Agent Page Object Model)** - AI-friendly page representation (~8-10k tokens vs 15-25k for screenshots)
+- 🧠 **APOM (Agent Page Object Model)** - AI-friendly page representation (~8-10k tokens vs 5-10k for screenshots)
 - 🔄 **Persistent browser sessions** - pages stay open between commands for iterative workflows
 - ⚡ **Framework-aware** - handles React, Vue, Angular events and state updates automatically
 - 📸 **Visual testing** - compare designs pixel-by-pixel with Figma integration
@@ -322,7 +322,7 @@ executeScenario({ name: "login_flow", parameters: { email: "user@test.com" } })
 1. ✅ **`analyzePage()`** - PRIMARY tool for reading page content
    - Gets forms, inputs, buttons, links with current values
    - Use `refresh: true` after interactions to see updated state
-   - Efficient: 2-5k tokens vs screenshot 15-25k
+   - Efficient: 2-5k tokens vs screenshot 5-10k
 2. ✅ **`findElementsByText()`** - Find specific elements by visible text
 3. ✅ **`getElement()`** - Get HTML of specific element
 4. ⚠️ **`executeScript()`** - LAST RESORT, only if above failed
@@ -335,6 +335,14 @@ executeScenario({ name: "login_flow", parameters: { email: "user@test.com" } })
    - Example: `executeModelAction({id: "input_34", action: "check"})`
    - Example: `executeModelAction({selector: ".datepicker", action: "SetDate", params: {date: "2024-03-15"}})`
    - See `models/` directory for available models and actions
+   - Available models: TxtInp, Sel, Btn, Chk, Radio, TxtArea, Link, Range, DatePicker, DateInp, FileInp, ColorInp, **Modal**, default
+#### Modal/Dialog Support
+- **Automatic detection**: APOM detects modals rendered via React Portals (antd, MUI, Bootstrap, Chakra, Mantine, Element UI, Headless UI, Radix)
+- **Detection methods**: `role="dialog"`, `aria-modal="true"`, framework-specific CSS classes
+- **Animation-proof**: Modal elements are included even during CSS appear animations (opacity: 0)
+- **Rich metadata**: Modal nodes include `title` and `actions` (button labels) in metadata
+- **In APOM tree**: Modals appear as `type: "dialog"` with `model: "Modal"`, containing all interactive children
 **Why specialized tools matter:**
 - ✅ Trigger proper browser events (click, input, change)
@@ -397,7 +405,7 @@ executeScenario({ name: "login_flow", parameters: { email: "user@test.com" } })
   - `useLegacyFormat` (optional): Return legacy format instead of APOM (default: false - APOM is the default)
   - `registerElements` (optional): Auto-register elements for ID-based usage (default: true)   - `groupBy` (optional): 'type' or 'flat' - how to group elements (default: 'type') - **Why better than screenshot**:
   - Shows actual data (form values, validation errors) not just visual
-  - Uses 2-5k tokens vs screenshot 15-25k tokens
+  - Uses 2-5k tokens vs screenshot 5-10k tokens
   - Returns structured data with **unique element IDs** for easy interaction
   - **Detects UI frameworks** (MUI, Ant Design, Chakra, Bootstrap, Vuetify, Semantic UI)  - **Extracts dropdown options** from both native `<select>` and custom UI components- **Returns**:
   - **APOM format** (default): Tree-structured Page Object Model with unique IDs     - `tree` - Hierarchical tree of page elements (optimized: ~82% smaller than flat format)
@@ -674,11 +682,11 @@ Capture optimized screenshot of specific element with smart compression and auto
   - `padding` (optional): Padding in pixels (default: 0)
   - `maxWidth` (optional): Max width for auto-scaling (default: 1024, null for original size)
   - `maxHeight` (optional): Max height for auto-scaling (default: 8000, null for original size)
-  - `quality` (optional): JPEG quality 1-100 (default: 80)
-  - `format` (optional): 'png', 'jpeg', or 'auto' (default: 'auto')
+  - `quality` (optional): JPEG quality 1-100 (default: 40)
+  - `format` (optional): 'png', 'jpeg', or 'auto' (default: 'jpeg')
 - **Use case**: Visual documentation, bug reports
-- **Returns**: Optimized image with metadata
-- **Default behavior**: Auto-scales to 1024px width and 8000px height (API limit) and uses smart compression to reduce AI token usage
+- **Returns**: Optimized image with metadata (~5-10k tokens)
+- **Default behavior**: JPEG at quality 40, auto-scales to 1024px width and 8000px height (API limit). For higher quality, explicitly set `quality` and `format` parameters
 - **Automatic compression**: If image exceeds 3 MB, automatically reduces quality or scales down to fit within limit
 - **For original quality**: Set `maxWidth: null`, `maxHeight: null` and `format: 'png'` (still enforces 3 MB limit)
@@ -692,7 +700,7 @@ Save optimized screenshot to filesystem without returning in context, with autom
   - `maxHeight` (optional): Max height for auto-scaling (default: 8000, null for original)
   - `quality` (optional): JPEG quality 1-100 (default: 80)
   - `format` (optional): 'png', 'jpeg', or 'auto' (default: 'auto')
-- **Use case**: Baseline screenshots, file storage
+- **Use case**: Baseline screenshots, file storage (higher quality defaults than `screenshot` tool)
 - **Returns**: File path and metadata (not image data)
 - **Default behavior**: Auto-scales and compresses to save disk space
 - **Automatic compression**: If image exceeds 3 MB, automatically reduces quality or scales down to fit within limit

package/element-finder-utils.js CHANGED Viewed

@@ -462,8 +462,10 @@ function getUniqueSelectorInPage(element) {
   }
   // 8. Fallback: nth-of-type with path
+  // Build path up to 8 levels, verifying uniqueness
   let current = element;
   const path = [];
+  const MAX_PATH_DEPTH = 8;
   while (current && current.tagName) {
     let selector = current.tagName.toLowerCase();
@@ -497,10 +499,21 @@ function getUniqueSelectorInPage(element) {
     }
     path.unshift(selector);
+    // Check if current path is already unique
+    try {
+      const candidateSelector = path.join(' > ');
+      if (document.querySelectorAll(candidateSelector).length === 1) {
+        return candidateSelector;
+      }
+    } catch (e) {
+      // Invalid selector, continue building path
+    }
     current = current.parentElement;
-    // Stop at body or after 5 levels
-    if (!current || current.tagName === 'BODY' || path.length >= 5) {
+    // Stop at body or after max depth
+    if (!current || current.tagName === 'BODY' || path.length >= MAX_PATH_DEPTH) {
       break;
     }
   }

package/index.js CHANGED Viewed

@@ -1,4 +1,4 @@
-#!/usr/bin/env node
+#!/usr/bin/env node
 import {Server} from "@modelcontextprotocol/sdk/server/index.js";
 import {StdioServerTransport} from "@modelcontextprotocol/sdk/server/stdio.js";
@@ -584,7 +584,7 @@ async function executeToolInternal(name, args) {
             // Initialize registry if needed
             const registry = window.__MODEL_REGISTRY__ || (() => {
-              const reg = new ModelRegistry();
+              const reg = new window.ModelRegistry();
               if (window.ELEMENT_MODELS_CLASSES) {
                 reg.registerAll(window.ELEMENT_MODELS_CLASSES);
               }
@@ -2334,7 +2334,7 @@ Start coding now.`;
             return {
               content: [
                 { type: 'text', text: JSON.stringify(response, null, 2) },
-                { type: 'image', data: actionResult.screenshot, mimeType: 'image/png' }
+                { type: 'image', data: actionResult.screenshot, mimeType: actionResult.screenshotMimeType || 'image/png' }
               ]
             };
           }
@@ -2760,7 +2760,7 @@ Start coding now.`;
             return {
               content: [
                 { type: 'text', text: JSON.stringify(response, null, 2) },
-                { type: 'image', data: actionResult.screenshot, mimeType: 'image/png' }
+                { type: 'image', data: actionResult.screenshot, mimeType: actionResult.screenshotMimeType || 'image/png' }
               ]
             };
           }
@@ -3938,6 +3938,19 @@ async function main() {
   console.error("chrometools-mcp server running on stdio");
   console.error("Browser will be initialized on first openBrowser call");
+  // Pre-warm Jimp AFTER server is connected (non-blocking)
+  // Jimp v0.22 constructor is thenable - awaiting it before server.connect()
+  // would block transport and cause MCP client timeout
+  (async () => {
+    try {
+      const img = await new Jimp(1, 1, 0x000000ff);
+      await img.getBufferAsync(Jimp.MIME_JPEG);
+      console.error("[chrometools-mcp] Jimp pre-warmed");
+    } catch (e) {
+      console.error("[chrometools-mcp] Jimp pre-warm failed:", e.message);
+    }
+  })();
 }
 main().catch((error) => {

package/models/TextInputModel.js CHANGED Viewed

@@ -42,13 +42,22 @@ export class TextInputModel extends BaseInputModel {
     try {
       // Method 1: Try Puppeteer typing (works for most cases)
       try {
-        // Focus and clear using JS (most reliable)
+        // Focus and clear using native setter (works with React/Vue/Angular controlled inputs)
         await withTimeout(
           () => this.element.evaluate((el, shouldClear) => {
             el.focus();
             el.click();
             if (shouldClear) {
-              el.value = '';
+              // Use native HTMLInputElement setter to bypass React's value tracker
+              // React overrides the value setter and ignores programmatic changes via el.value = ''
+              const nativeSetter = Object.getOwnPropertyDescriptor(
+                window.HTMLInputElement.prototype, 'value'
+              )?.set;
+              if (nativeSetter) {
+                nativeSetter.call(el, '');
+              } else {
+                el.value = '';
+              }
               el.dispatchEvent(new Event('input', { bubbles: true }));
               el.dispatchEvent(new Event('change', { bubbles: true }));
             }
@@ -78,11 +87,20 @@ export class TextInputModel extends BaseInputModel {
         // Fall through to JS method
       }
-      // Method 2: Fallback to direct JS value setting
+      // Method 2: Fallback to direct JS value setting (with React-compatible native setter)
       await withTimeout(
         () => this.element.evaluate((el, newValue, shouldClear) => {
           el.focus();
-          el.value = shouldClear ? newValue : el.value + newValue;
+          const finalValue = shouldClear ? newValue : el.value + newValue;
+          // Use native setter to bypass React's value tracker
+          const nativeSetter = Object.getOwnPropertyDescriptor(
+            window.HTMLInputElement.prototype, 'value'
+          )?.set;
+          if (nativeSetter) {
+            nativeSetter.call(el, finalValue);
+          } else {
+            el.value = finalValue;
+          }
           el.dispatchEvent(new Event('input', { bubbles: true }));
           el.dispatchEvent(new Event('change', { bubbles: true }));
         }, value, clearFirst),

package/models/TextareaModel.js CHANGED Viewed

@@ -54,7 +54,15 @@ export class TextareaModel extends BaseInputModel {
           await withTimeout(
             () => this.element.evaluate(el => {
               el.focus();
-              el.value = '';
+              // Use native setter to bypass React's value tracker
+              const nativeSetter = Object.getOwnPropertyDescriptor(
+                window.HTMLTextAreaElement.prototype, 'value'
+              )?.set;
+              if (nativeSetter) {
+                nativeSetter.call(el, '');
+              } else {
+                el.value = '';
+              }
               el.dispatchEvent(new Event('input', { bubbles: true }));
               el.dispatchEvent(new Event('change', { bubbles: true }));
             }),
@@ -82,11 +90,19 @@ export class TextareaModel extends BaseInputModel {
         // Fall through to JS method
       }
-      // Method 2: Fallback to direct JS value setting
+      // Method 2: Fallback to direct JS value setting (with React-compatible native setter)
       await withTimeout(
         () => this.element.evaluate((el, newValue, shouldClear) => {
           el.focus();
-          el.value = shouldClear ? newValue : el.value + newValue;
+          const finalValue = shouldClear ? newValue : el.value + newValue;
+          const nativeSetter = Object.getOwnPropertyDescriptor(
+            window.HTMLTextAreaElement.prototype, 'value'
+          )?.set;
+          if (nativeSetter) {
+            nativeSetter.call(el, finalValue);
+          } else {
+            el.value = finalValue;
+          }
           el.dispatchEvent(new Event('input', { bubbles: true }));
           el.dispatchEvent(new Event('change', { bubbles: true }));
         }, value, clearFirst),

package/models/index.js CHANGED Viewed

@@ -19,7 +19,7 @@ class TextInputModel extends ElementModel {
   }
   getActions() {
-    return ['type', 'click', 'hover', 'screenshot'];
+    return ['type', 'clear', 'click', 'hover', 'screenshot'];
   }
   matches(element, elementType) {
@@ -32,6 +32,7 @@ class TextInputModel extends ElementModel {
   getActionHandler(actionName) {
     const handlers = {
       'type': 'executeTypeAction',
+      'clear': 'executeTypeAction',
       'click': 'executeClickAction',
       'hover': 'executeHoverAction',
       'screenshot': 'executeScreenshotAction'
@@ -381,6 +382,42 @@ class ColorInputModel extends ElementModel {
   }
 }
+/**
+ * Modal/Dialog Model
+ * Handles: Modal dialogs, popups, overlays (React Portals, framework modals)
+ * Detects elements rendered via portals outside the main React tree
+ */
+class ModalModel extends ElementModel {
+  getName() {
+    return 'Modal';
+  }
+  getActions() {
+    return ['screenshot', 'close', 'scrollTo'];
+  }
+  getPriority() {
+    return 200; // High priority — check before containers
+  }
+  matches(element, elementType) {
+    // Only match actual dialog elements, not framework wrappers
+    // Framework wrappers are detected separately for portal inclusion
+    if (element.getAttribute('role') === 'dialog') return true;
+    if (element.getAttribute('aria-modal') === 'true') return true;
+    return false;
+  }
+  getActionHandler(actionName) {
+    const handlers = {
+      'screenshot': 'executeScreenshotAction',
+      'close': 'executeClickAction',
+      'scrollTo': 'executeScrollToAction'
+    };
+    return handlers[actionName] || null;
+  }
+}
 /**
  * Default Model (fallback for non-interactive elements)
  */
@@ -424,6 +461,7 @@ const MODELS = [
   DateInputModel,
   FileInputModel,
   ColorInputModel,
+  ModalModel,
   DefaultModel
 ];

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "chrometools-mcp",
-  "version": "3.5.0",
+  "version": "3.5.2",
   "description": "MCP (Model Context Protocol) server for Chrome automation using Puppeteer. Persistent browser sessions, UI framework detection (MUI, Ant Design, etc.), Page Object support, visual testing, Figma comparison. Works seamlessly in WSL, Linux, macOS, and Windows.",
   "type": "module",
   "main": "index.js",

package/pom/apom-tree-converter.js CHANGED Viewed

@@ -16,7 +16,7 @@ function initializeModelRegistry() {
   }
   // Create and populate registry
-  const registry = new ModelRegistry();
+  const registry = new (window.ModelRegistry || ModelRegistry)();
   // Register all models (order doesn't matter, priority is handled internally)
   if (typeof window !== 'undefined' && window.ELEMENT_MODELS_CLASSES) {
@@ -73,12 +73,51 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
   let idCounter = 0;
   const elementIds = new WeakMap();
   const interactiveElements = new WeakSet();
+  const modalElements = new WeakSet(); // Elements inside modal portals (skip visibility checks)
+  const modalAncestors = new WeakSet(); // Portal wrapper ancestors (force compact format)
   // First pass: mark all interactive elements
   if (interactiveOnly) {
     markInteractiveElements(document.body);
   }
+  // Second pass: detect modal/dialog portals and force-mark for inclusion
+  // Modals are rendered via React Portals outside the main tree and may have
+  // opacity: 0 during animation — force-include them and all their descendants
+  if (interactiveOnly) {
+    // Framework-specific portal container patterns (used only for detection, not model assignment)
+    const portalPatterns = [
+      'ant-modal-root', 'ant-modal-wrap',
+      'MuiDialog-root', 'MuiModal-root',
+      'modal-dialog',
+      'chakra-modal__content-container',
+      'el-dialog__wrapper', 'el-overlay-dialog',
+      'headlessui-dialog',
+      'radix-dialog',
+      'mantine-Modal-root',
+    ];
+    function isPortalElement(el) {
+      if (el.getAttribute('role') === 'dialog') return true;
+      if (el.getAttribute('aria-modal') === 'true') return true;
+      const classes = el.className || '';
+      if (typeof classes !== 'string') return false;
+      return portalPatterns.some(p => classes.includes(p));
+    }
+    // Scan body direct children for framework-specific portal roots
+    for (const child of document.body.children) {
+      if (isPortalElement(child)) {
+        forceMarkModalTree(child);
+      }
+    }
+    // Also find deeper dialog elements (some frameworks nest portals)
+    document.querySelectorAll('[role="dialog"], [aria-modal="true"]').forEach(el => {
+      if (!modalElements.has(el)) {
+        forceMarkModalTree(el);
+      }
+    });
+  }
   // Build tree from body
   result.tree = buildNode(document.body, null, 0, []);
@@ -408,6 +447,31 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
     interactiveElements.add(document.body);
   }
+  /**
+   * Force-mark modal portal element and all its descendants for inclusion in APOM tree.
+   * Modal portals (React Portals) are rendered outside the main tree and may have
+   * opacity: 0 during CSS animations — this ensures they're always included.
+   */
+  function forceMarkModalTree(element) {
+    modalElements.add(element);
+    interactiveElements.add(element);
+    // Mark all descendants — inputs, buttons, etc. inside the modal
+    element.querySelectorAll('*').forEach(el => {
+      modalElements.add(el);
+      interactiveElements.add(el);
+    });
+    // Mark ancestors up to body (so the path from body to modal is traversed)
+    // Must add to modalElements for isVisible() bypass (0x0 dimensions, opacity:0)
+    // Also mark as modalAncestors to force compact format (portal wrappers are pass-through)
+    let current = element.parentElement;
+    while (current && current !== document.body) {
+      modalElements.add(current);
+      modalAncestors.add(current);
+      interactiveElements.add(current);
+      current = current.parentElement;
+    }
+  }
   /**
    * Check if element is in viewport
    */
@@ -431,7 +495,11 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
    */
   function isVisible(el) {
     // Check dimensions first (works for fixed position elements)
-    if (el.offsetWidth === 0 || el.offsetHeight === 0) return false;
+    // Exception: modal portal wrapper divs may have 0x0 dimensions
+    // while their visible content (dialog, inputs, buttons) does not
+    if (el.offsetWidth === 0 || el.offsetHeight === 0) {
+      if (!modalElements.has(el)) return false;
+    }
     // Check computed styles
     const style = window.getComputedStyle(el);
@@ -439,11 +507,12 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
       return false;
     }
-    // Check opacity, but allow exceptions for inputs that are commonly styled with opacity:0
-    // (checkboxes, radios, file inputs often use opacity:0 with custom visual overlay)
+    // Check opacity, but allow exceptions for:
+    // - inputs styled with opacity:0 (checkboxes, radios, file inputs with custom overlay)
+    // - elements inside modal portals (opacity: 0 during CSS appear animation)
     const tag = el.tagName.toLowerCase();
     const isStylableInput = tag === 'input' && ['checkbox', 'radio', 'file'].includes(el.type);
-    if (style.opacity === '0' && !isStylableInput) {
+    if (style.opacity === '0' && !isStylableInput && !modalElements.has(el)) {
       return false;
     }
@@ -457,7 +526,8 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
     // Additional check: element should be in viewport or have offsetParent
     // This handles elements inside position:fixed containers (Angular Material)
-    return el.offsetParent !== null || style.position === 'fixed' || style.position === 'sticky';
+    // Exception: modal portal elements may lack offsetParent
+    return el.offsetParent !== null || style.position === 'fixed' || style.position === 'sticky' || modalElements.has(el);
   }
   /**
@@ -505,8 +575,31 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
       }
     }
+    // Modal containers: promote to interactive node with metadata
+    if (modelName === 'Modal') {
+      elementType.isInteractive = true;
+      elementType.type = 'dialog';
+      // Extract modal title
+      const titleEl = element.querySelector(
+        '.ant-modal-title, .MuiDialogTitle-root, [class*="modal-title"], [class*="dialog-title"], .modal-header h5, .modal-header h4'
+      );
+      const titleText = titleEl ? titleEl.textContent.trim().substring(0, 100) : null;
+      // Extract action buttons
+      const buttons = element.querySelectorAll('button');
+      const actions = Array.from(buttons)
+        .map(b => b.textContent.trim())
+        .filter(t => t && t.length > 0 && t.length < 30);
+      elementType.metadata = {
+        ...(elementType.metadata || {}),
+        ...(titleText ? { title: titleText } : {}),
+        ...(actions.length ? { actions: actions.slice(0, 5) } : {})
+      };
+    }
     // Build node - minimize non-interactive parents
-    const isInteractive = elementType.isInteractive;
+    // Modal ancestors (portal wrappers) are forced to compact format —
+    // they have onclick handlers (close on outside click) but are just pass-through containers
+    const isInteractive = elementType.isInteractive && !modalAncestors.has(element);
     // Build node structure based on mode
     let node;
@@ -562,16 +655,14 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
     // Update metadata counters
     result.metadata.totalElements++;
-    if (elementType.isInteractive) {
+    if (isInteractive) {
       result.metadata.interactiveCount++;
     }
     if (elementType.type === 'form') {
       result.metadata.formCount++;
     }
-    if (position && (position.type === 'fixed' || position.type === 'absolute')) {
-      if (position.zIndex && position.zIndex >= 100) {
-        result.metadata.modalCount++;
-      }
+    if (modelName === 'Modal') {
+      result.metadata.modalCount++;
     }
     if (depth > result.metadata.maxDepth) {
       result.metadata.maxDepth = depth;
@@ -1003,30 +1094,29 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
     if (stableClass) {
       const escapedClass = CSS.escape(stableClass);
       const classSelector = `.${escapedClass}`;
-      // Verify it's unique within parent context
-      if (element.parentElement) {
-        try {
-          const matches = element.parentElement.querySelectorAll(classSelector);
-          if (matches.length === 1 && matches[0] === element) {
-            return classSelector;
-          }
-        } catch (e) {
-          // Invalid selector, continue to path-based approach
+      // Verify it's unique in the ENTIRE document (not just parent)
+      try {
+        const matches = document.querySelectorAll(classSelector);
+        if (matches.length === 1 && matches[0] === element) {
+          return classSelector;
         }
+      } catch (e) {
+        // Invalid selector, continue to path-based approach
       }
     }
-    // Build path from parent
+    // Build path from element to body, checking uniqueness at each level
     const path = [];
     let current = element;
+    const MAX_PATH_DEPTH = 8;
-    while (current && current !== document.body) {
+    while (current && current !== document.body && path.length < MAX_PATH_DEPTH) {
       let selector = current.tagName.toLowerCase();
       // Add stable class if available (escaped for CSS selector safety)
-      const stableClass = getStableClassName(current);
-      if (stableClass) {
-        selector += `.${CSS.escape(stableClass)}`;
+      const cls = getStableClassName(current);
+      if (cls) {
+        selector += `.${CSS.escape(cls)}`;
       }
       // Add nth-of-type if needed
@@ -1041,6 +1131,15 @@ function buildAPOMTree(interactiveOnly = true, viewportOnly = false) {
       }
       path.unshift(selector);
+      // Check if current path is already unique in the document
+      try {
+        const candidateSelector = path.join(' > ');
+        if (document.querySelectorAll(candidateSelector).length === 1) {
+          return candidateSelector;
+        }
+      } catch (e) { /* continue building path */ }
       current = current.parentElement;
     }

package/server/tool-definitions.js CHANGED Viewed

@@ -92,7 +92,7 @@ export const toolDefinitions = [
       },
       {
         name: "screenshot",
-        description: "Capture element image (15-25k tokens). Use analyzePage for form data/validation (8-10k tokens).",
+        description: "Capture element image (5-10k tokens). Use analyzePage for form data/validation (8-10k tokens).",
         inputSchema: {
           type: "object",
           properties: {
@@ -100,8 +100,8 @@ export const toolDefinitions = [
             padding: { type: "number", description: "Padding px (default: 0)" },
             maxWidth: { type: "number", description: "Max width px (default: 1024, null=original)" },
             maxHeight: { type: "number", description: "Max height px (default: 8000, null=original)" },
-            quality: { type: "number", minimum: 1, maximum: 100, description: "JPEG quality (default: 80)" },
-            format: { type: "string", enum: ["png", "jpeg", "auto"], description: "Format (default: auto)" },
+            quality: { type: "number", minimum: 1, maximum: 100, description: "JPEG quality (default: 40)" },
+            format: { type: "string", enum: ["png", "jpeg", "auto"], description: "Format (default: jpeg)" },
           },
           required: ["selector"],
         },

package/server/tool-schemas.js CHANGED Viewed

@@ -115,8 +115,8 @@ export const ScreenshotSchema = z.object({
   padding: z.number().optional().describe("Padding around element in pixels (default: 0)"),
   maxWidth: z.number().nullable().optional().describe("Maximum width in pixels, auto-scales if larger (default: 1024, set to null for original size)"),
   maxHeight: z.number().nullable().optional().describe("Maximum height in pixels, auto-scales if larger (default: 8000 for API limit, set to null for original size)"),
-  quality: z.number().min(1).max(100).optional().describe("JPEG quality 1-100 (default: 80, only applies to JPEG format)"),
-  format: z.enum(['png', 'jpeg', 'auto']).optional().describe("Image format: 'png', 'jpeg', or 'auto' (default: 'auto' - chooses based on size)"),
+  quality: z.number().min(1).max(100).optional().describe("JPEG quality 1-100 (default: 40)"),
+  format: z.enum(['png', 'jpeg', 'auto']).optional().describe("Image format (default: 'jpeg')"),
 }).refine(data => (data.id && !data.selector) || (!data.id && data.selector), {
   message: "Either 'id' or 'selector' must be provided, but not both"
 });

package/utils/actions/click-action.js CHANGED Viewed

@@ -5,6 +5,7 @@
 import { runPostClickDiagnostics, formatDiagnosticsForAI } from '../post-click-diagnostics.js';
 import { generateClickHints } from '../hints-generator.js';
+import { processScreenshot } from '../screenshot-processor.js';
 /**
  * Execute click action on element with adaptive strategy
@@ -186,10 +187,16 @@ export async function executeClickAction(page, element, options = {}) {
     { type: "text", text: `Clicked: ${identifier}${hintsText}${diagnosticsText}` }
   ];
-  // Only add screenshot if requested
+  // Only add screenshot if requested — lightweight JPEG for action confirmation
   if (screenshot === true) {
-    const screenshotData = await page.screenshot({ encoding: 'base64', fullPage: false });
-    content.push({ type: "image", data: screenshotData, mimeType: "image/png" });
+    const screenshotBuffer = await page.screenshot({ encoding: 'binary', fullPage: false });
+    const processed = await processScreenshot(screenshotBuffer, {
+      maxWidth: 800,
+      maxHeight: 4000,
+      quality: 40,
+      format: 'jpeg',
+    });
+    content.push({ type: "image", data: processed.buffer.toString('base64'), mimeType: processed.mimeType });
   }
   return { content };

package/utils/actions/screenshot-action.js CHANGED Viewed

@@ -14,8 +14,8 @@ import { processScreenshot } from '../image-processing.js';
  * @param {number} options.padding - Padding around element in pixels (default: 0)
  * @param {number|null} options.maxWidth - Max width for scaling (default: 1024, null for original)
  * @param {number|null} options.maxHeight - Max height for scaling (default: 8000, null for original)
- * @param {number} options.quality - JPEG quality 1-100 (default: 80)
- * @param {string} options.format - Image format: 'png', 'jpeg', 'auto' (default: 'auto')
+ * @param {number} options.quality - JPEG quality 1-100 (default: 50)
+ * @param {string} options.format - Image format: 'png', 'jpeg', 'auto' (default: 'jpeg')
  * @returns {Promise<Object>} Result with content array (text + image)
  */
 export async function executeScreenshotAction(page, element, options = {}) {
@@ -24,8 +24,8 @@ export async function executeScreenshotAction(page, element, options = {}) {
     padding = 0,
     maxWidth = 1024,
     maxHeight = 8000,
-    quality = 80,
-    format = 'auto'
+    quality = 40,
+    format = 'jpeg'
   } = options;
   // Scroll to element to ensure it's in viewport

package/utils/element-actions.js CHANGED Viewed

@@ -1,3 +1,25 @@
+import { processScreenshot } from './screenshot-processor.js';
+// Lightweight action screenshot: small JPEG for confirming actions worked
+// These are "report" screenshots, not for detailed analysis
+async function takeActionScreenshot(page, clip) {
+  const screenshotBuffer = await page.screenshot({
+    encoding: 'binary',
+    fullPage: false,
+    ...(clip ? { clip } : {})
+  });
+  const processed = await processScreenshot(screenshotBuffer, {
+    maxWidth: 800,
+    maxHeight: 4000,
+    quality: 40,
+    format: 'jpeg',
+  });
+  return {
+    data: processed.buffer.toString('base64'),
+    mimeType: processed.mimeType,
+  };
+}
 // Helper function to execute actions on elements
 export async function executeElementAction(page, selector, action) {
   if (!action || !action.type) {
@@ -17,13 +39,27 @@ export async function executeElementAction(page, selector, action) {
   switch (action.type) {
     case 'click':
-      await element.click();
+      // Scroll element into view first (direct JS, avoids Puppeteer's scrollIntoViewIfNeeded hang)
+      await element.evaluate(el => el.scrollIntoView({ behavior: 'instant', block: 'center' }));
+      // Click with timeout + JS fallback (Puppeteer's click can hang in complex layouts)
+      try {
+        await Promise.race([
+          element.click(),
+          new Promise((_, reject) => setTimeout(() => reject(new Error('click timeout')), 5000))
+        ]);
+      } catch (e) {
+        // Fallback to JS click (bypasses Puppeteer's coordinate-based click)
+        await element.evaluate(el => el.click());
+      }
       await new Promise(resolve => setTimeout(resolve, action.waitAfter || 1500));
       result.message = `Clicked on ${selector}`;
       if (action.screenshot) {
-        const screenshot = await page.screenshot({ encoding: 'base64', fullPage: false });
-        result.screenshot = screenshot;
+        const { data, mimeType } = await takeActionScreenshot(page);
+        result.screenshot = data;
+        result.screenshotMimeType = mimeType;
       }
       break;
@@ -38,8 +74,9 @@ export async function executeElementAction(page, selector, action) {
       result.message = `Typed "${action.text}" into ${selector}`;
       if (action.screenshot) {
-        const screenshot = await page.screenshot({ encoding: 'base64', fullPage: false });
-        result.screenshot = screenshot;
+        const { data, mimeType } = await takeActionScreenshot(page);
+        result.screenshot = data;
+        result.screenshotMimeType = mimeType;
       }
       break;
@@ -65,9 +102,10 @@ export async function executeElementAction(page, selector, action) {
         width: Math.max(box.width, 1),
         height: Math.max(box.height, 1)
       };
-      const screenshot = await page.screenshot({ clip, encoding: 'base64' });
+      const { data: screenshotData, mimeType: screenshotMime } = await takeActionScreenshot(page, clip);
       result.message = `Captured screenshot of ${selector}`;
-      result.screenshot = screenshot;
+      result.screenshot = screenshotData;
+      result.screenshotMimeType = screenshotMime;
       break;
     case 'hover':
@@ -76,8 +114,9 @@ export async function executeElementAction(page, selector, action) {
       result.message = `Hovered over ${selector}`;
       if (action.screenshot) {
-        const screenshot = await page.screenshot({ encoding: 'base64', fullPage: false });
-        result.screenshot = screenshot;
+        const { data, mimeType } = await takeActionScreenshot(page);
+        result.screenshot = data;
+        result.screenshotMimeType = mimeType;
       }
       break;
@@ -101,8 +140,9 @@ export async function executeElementAction(page, selector, action) {
       result.message = `Applied styles to ${selector}`;
       if (action.screenshot) {
-        const screenshot = await page.screenshot({ encoding: 'base64', fullPage: false });
-        result.screenshot = screenshot;
+        const { data, mimeType } = await takeActionScreenshot(page);
+        result.screenshot = data;
+        result.screenshotMimeType = mimeType;
       }
       break;