chrometools-mcp 2.4.2 → 3.1.2

Files changed (48)
  1. package/CHANGELOG.md +540 -0
  2. package/COMPONENT_MAPPING_SPEC.md +1217 -0
  3. package/README.md +494 -38
  4. package/bridge/bridge-client.js +472 -0
  5. package/bridge/bridge-service.js +399 -0
  6. package/bridge/install.js +241 -0
  7. package/browser/browser-manager.js +107 -2
  8. package/browser/page-manager.js +226 -69
  9. package/docs/CHROME_EXTENSION.md +219 -0
  10. package/docs/PAGE_OBJECT_MODEL_CONCEPT.md +1756 -0
  11. package/element-finder-utils.js +138 -28
  12. package/extension/background.js +643 -0
  13. package/extension/content.js +715 -0
  14. package/extension/icons/create-icons.js +164 -0
  15. package/extension/icons/icon128.png +0 -0
  16. package/extension/icons/icon16.png +0 -0
  17. package/extension/icons/icon48.png +0 -0
  18. package/extension/manifest.json +58 -0
  19. package/extension/popup/popup.css +437 -0
  20. package/extension/popup/popup.html +102 -0
  21. package/extension/popup/popup.js +415 -0
  22. package/extension/recorder-overlay.css +93 -0
  23. package/figma-tools.js +120 -0
  24. package/index.js +3347 -2518
  25. package/models/BaseInputModel.js +93 -0
  26. package/models/CheckboxGroupModel.js +199 -0
  27. package/models/CheckboxModel.js +103 -0
  28. package/models/ColorInputModel.js +53 -0
  29. package/models/DateInputModel.js +67 -0
  30. package/models/RadioGroupModel.js +126 -0
  31. package/models/RangeInputModel.js +60 -0
  32. package/models/SelectModel.js +97 -0
  33. package/models/TextInputModel.js +34 -0
  34. package/models/TextareaModel.js +59 -0
  35. package/models/TimeInputModel.js +49 -0
  36. package/models/index.js +122 -0
  37. package/package.json +3 -2
  38. package/pom/apom-converter.js +267 -0
  39. package/pom/apom-tree-converter.js +515 -0
  40. package/pom/element-id-generator.js +175 -0
  41. package/recorder/page-object-generator.js +16 -0
  42. package/recorder/scenario-executor.js +80 -2
  43. package/server/tool-definitions.js +839 -656
  44. package/server/tool-groups.js +3 -2
  45. package/server/tool-schemas.js +367 -296
  46. package/server/websocket-bridge.js +447 -0
  47. package/utils/selector-resolver.js +186 -0
  48. package/utils/ui-framework-detector.js +392 -0
package/README.md CHANGED
@@ -8,16 +8,18 @@ MCP server for Chrome automation using Puppeteer with persistent browser session
8
8
  - [Usage](#usage)
9
9
  - [AI Optimization Features](#ai-optimization-features) ⭐ **NEW**
10
10
  - [Scenario Recorder](#scenario-recorder) ⭐ **NEW** - Visual UI-based recording with smart optimization
11
- - [Available Tools](#available-tools) - **40+ Tools Total**
12
- - [AI-Powered Tools](#ai-powered-tools) ⭐ **NEW** - smartFindElement, analyzePage, getAllInteractiveElements, findElementsByText
11
+ - [Available Tools](#available-tools) - **46+ Tools Total**
12
+ - [AI-Powered Tools](#ai-powered-tools) ⭐ **NEW** - smartFindElement, analyzePage, getElementByApomId, getAllInteractiveElements, findElementsByText
13
13
  - [Core Tools](#1-core-tools) - ping, openBrowser
14
- - [Interaction Tools](#2-interaction-tools) - click, type, scrollTo
14
+ - [Interaction Tools](#2-interaction-tools) - click, type, scrollTo, selectOption, selectFromGroup, drag, scrollHorizontal
15
15
  - [Inspection Tools](#3-inspection-tools) - getElement, getComputedCss, getBoxModel, screenshot
16
16
  - [Advanced Tools](#4-advanced-tools) - executeScript, getConsoleLogs, listNetworkRequests, getNetworkRequest, filterNetworkRequests, hover, setStyles, setViewport, getViewport, navigateTo
17
- - [Recorder Tools](#6-recorder-tools) ⭐ **NEW** - enableRecorder, executeScenario, listScenarios, searchScenarios, getScenarioInfo, deleteScenario, exportScenarioAsCode, appendScenarioToFile, generatePageObject
17
+ - [Tab Management Tools](#5-tab-management-tools) ⭐ **NEW** - listTabs, switchTab
18
+ - [Recorder Tools](#7-recorder-tools) ⭐ **NEW** - enableRecorder, executeScenario, listScenarios, searchScenarios, getScenarioInfo, deleteScenario, exportScenarioAsCode, appendScenarioToFile, generatePageObject
18
19
  - [Typical Workflow Example](#typical-workflow-example)
19
20
  - [Tool Usage Tips](#tool-usage-tips)
20
21
  - [Configuration](#configuration)
22
+ - [Multi-Instance Support](#multi-instance-support) ⭐ **NEW** - Run multiple MCP servers simultaneously
21
23
  - [WSL Setup Guide](#wsl-setup-guide) → [Full WSL Guide](WSL_SETUP.md)
22
24
  - [Development](#development)
23
25
  - [Features](#features)
@@ -113,6 +115,39 @@ executeScenario({ name: "login_flow", parameters: { email: "user@test.com" } })
113
115
 
114
116
  ## Available Tools
115
117
 
118
+ ### ⚠️ Tool Usage Priority
119
+
120
+ **CRITICAL: Always use specialized tools first. Never jump to `executeScript` as first choice.**
121
+
122
+ #### For Clicking/Interaction
123
+ 1. ✅ **`click()`** - PRIMARY tool for all clicks
124
+ - Works correctly with React/Vue/Angular synthetic events
125
+ - Handles button clicks, link navigation, form submissions
126
+ 2. ✅ **`findElementsByText()` + action** - When selector is unknown, find by text
127
+ 3. ⚠️ **`executeScript()`** - LAST RESORT, only if above failed
128
+
129
+ #### For Filling Forms
130
+ 1. ✅ **`type()`** - PRIMARY tool for all text input
131
+ - Properly updates React hooks, Vue reactive data
132
+ - Auto-clears field before typing (configurable)
133
+ 2. ⚠️ **`executeScript()`** - LAST RESORT, only if above failed
134
+
135
+ #### For Reading Page State
136
+ 1. ✅ **`analyzePage()`** - PRIMARY tool for reading page content
137
+ - Gets forms, inputs, buttons, links with current values
138
+ - Use `refresh: true` after interactions to see updated state
139
+ - Efficient: 2-5k tokens vs screenshot 15-25k
140
+ 2. ✅ **`findElementsByText()`** - Find specific elements by visible text
141
+ 3. ✅ **`getElement()`** - Get HTML of specific element
142
+ 4. ⚠️ **`executeScript()`** - LAST RESORT, only if above failed
143
+
144
+ **Why specialized tools matter:**
145
+ - ✅ Trigger proper browser events (click, input, change)
146
+ - ✅ Work with React/Vue/Angular synthetic event systems
147
+ - ✅ Update framework state correctly (React hooks, Vue reactivity)
148
+ - ✅ Handle animations, navigation, and async updates
149
+ - ❌ `executeScript` bypasses framework events and may fail silently
150
+
116
151
  ### AI-Powered Tools
117
152
 
118
153
  #### smartFindElement ⭐
@@ -152,24 +187,81 @@ Get current page state and structure. Returns complete map of forms (with values
152
187
  - **Parameters**:
153
188
  - `refresh` (optional): Force refresh cache to get CURRENT state after changes (default: false)
154
189
  - `includeAll` (optional): Include ALL page elements, not just interactive ones (default: false). Useful for layout work - find any element, get its selector, then use `getComputedCss` or `setStyles` on it.
190
+ - `useLegacyFormat` (optional): Return legacy format instead of APOM (default: false - **APOM is now the default**) 🔄 **BREAKING CHANGE**
191
+ - `registerElements` (optional): Auto-register elements for ID-based usage (default: true) ⭐ **APOM**
192
+ - `groupBy` (optional): 'type' or 'flat' - how to group elements (default: 'type') ⭐ **APOM**
155
193
  - **Why better than screenshot**:
156
194
  - Shows actual data (form values, validation errors) not just visual
157
195
  - Uses 2-5k tokens vs screenshot 15-25k tokens
158
- - Returns structured data with selectors
196
+ - Returns structured data with **unique element IDs** for easy interaction
197
+ - **Detects UI frameworks** (MUI, Ant Design, Chakra, Bootstrap, Vuetify, Semantic UI) ⭐
198
+ - **Extracts dropdown options** from both native `<select>` and custom UI components ⭐
159
199
  - **Returns**:
160
- - By default: Complete map of forms (with current values), inputs, buttons, links, navigation with selectors
161
- - With `includeAll: true`: Also includes `allElements` array with ALL visible page elements (divs, spans, headings, etc.) - each with selector, tag, text, classes, id
200
+ - **APOM format** (default): Tree-structured Page Object Model with unique IDs **NOW DEFAULT**
201
+ - `tree` - Hierarchical tree of page elements (optimized: ~82% smaller than flat format)
202
+ - Each node: `{ tag, id?, type?, sel, ch?, bounds?, meta? }`
203
+ - Interactive elements have `bounds` and full metadata
204
+ - Parent containers have minimal info (position only)
205
+ - `groups` - Radio/checkbox groups with options (name, value, label, checked state)
206
+ - `meta` - Page metadata (url, title, timestamp, element counts)
207
+ - Elements automatically registered - use IDs with `click({ id: "..." })`, `type({ id: "..." })`, etc.
208
+ - **Token-optimized**: Minified JSON, simplified parents, no redundant data
209
+ - Example: `analyzePage()` returns APOM, then use `click({ id: "button_45" })` or `type({ id: "input_20", text: "..." })`
210
+ - **Use `getElementByApomId({ id: "input_20" })`** to get full details for any element
211
+ - **Legacy format** (`useLegacyFormat: true`): Classic format for backward compatibility
212
+ - Complete map of forms (with current values), inputs, buttons, links, navigation with selectors
213
+ - **Each element includes `uiFramework` info** (name, version, component type) ⭐
214
+ - **Select elements include `options` array** with value, text, index, selected, disabled, group ⭐
215
+ - With `includeAll: true`: Also includes `allElements` array with ALL visible page elements (divs, spans, headings, etc.) - each with selector, tag, text, classes, id
162
216
  - **Example workflow**:
163
217
  1. `openBrowser({ url: "..." })`
164
- 2. `analyzePage()` ← Initial analysis
165
- 3. `click({ selector: "submit-btn" })`
166
- 4. **`analyzePage({ refresh: true })`** ← See what changed after click!
218
+ 2. `analyzePage()` ← Initial analysis, returns elements with IDs
219
+ 3. `type({ id: "input_20", text: "user@example.com" })` ← Use APOM ID
220
+ 4. `click({ id: "button_45" })` ← Use APOM ID
221
+ 5. **`analyzePage({ refresh: true })`** ← See what changed after click!
167
222
  - **Layout work example**:
168
223
  1. `analyzePage({ includeAll: true })` ← Get all elements
169
224
  2. Find element you want to style (e.g., `div.header`)
170
225
  3. `getComputedCss({ selector: "div.header" })` ← Get current styles
171
226
  4. `setStyles({ selector: "div.header", styles: [...] })` ← Apply new styles
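As a rough illustration of the APOM `tree` shape described in this section, the sketch below walks a tree and collects the IDs of registered elements. The sample tree and the helper name `collectApomIds` are hypothetical. Only the node fields (`tag`, `id`, `type`, `sel`, `ch`) come from the README text, and the real `analyzePage` output may differ in detail.

```javascript
// Hypothetical sample shaped like the documented node: { tag, id?, type?, sel, ch? }
const sampleTree = {
  tag: "form",
  sel: "form#login",
  ch: [
    { tag: "input", id: "input_20", type: "email", sel: "input[name='email']" },
    { tag: "input", id: "input_21", type: "password", sel: "input[name='password']" },
    {
      tag: "div",
      sel: "div.actions",
      ch: [{ tag: "button", id: "button_45", type: "submit", sel: "button[type='submit']" }],
    },
  ],
};

// Depth-first walk; nodes without an `id` (plain containers) are skipped.
function collectApomIds(node, found = []) {
  if (node.id) found.push({ id: node.id, sel: node.sel });
  for (const child of node.ch || []) collectApomIds(child, found);
  return found;
}

const apomIds = collectApomIds(sampleTree).map((entry) => entry.id);
console.log(apomIds);
```

The collected IDs are the values that would be passed as `id` to `click`, `type`, or `getElementByApomId`.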
172
227
 
228
+ #### getElementByApomId ⭐ **NEW**
229
+ Get detailed information about a specific element by its APOM ID from `analyzePage`. Use this to inspect elements without re-analyzing the entire page.
230
+ - **Parameters**:
231
+ - `id` (required): APOM element ID (e.g., `"input_20"`, `"button_45"`)
232
+ - **Use case**: Get full details for a specific element (bounds, attributes, computed styles)
233
+ - **Returns**: Element details including:
234
+ - `id`: Element APOM ID
235
+ - `selector`: CSS selector
236
+ - `tag`: HTML tag name
237
+ - `type`: Input type (for inputs)
238
+ - `text`: Visible text content
239
+ - `bounds`: `{ x, y, width, height }` position and size
240
+ - `attributes`: All HTML attributes
241
+ - `computedStyles`: Key CSS properties (display, visibility, color, background, etc.)
242
+ - `isVisible`: Whether element is visible
243
+ - `isEnabled`: Whether element is enabled (not disabled)
244
+ - **Example**:
245
+ ```javascript
246
+ // Get details for specific input field
247
+ getElementByApomId({ id: "input_20" })
248
+
249
+ // Returns:
250
+ {
251
+ "success": true,
252
+ "id": "input_20",
253
+ "selector": "input[name='email']",
254
+ "tag": "input",
255
+ "type": "email",
256
+ "text": "",
257
+ "bounds": { "x": 100, "y": 200, "width": 300, "height": 40 },
258
+ "attributes": { "name": "email", "placeholder": "Enter email" },
259
+ "computedStyles": { "display": "block", "visibility": "visible" },
260
+ "isVisible": true,
261
+ "isEnabled": true
262
+ }
263
+ ```
264
+
173
265
  #### getAllInteractiveElements
174
266
  Get all clickable/fillable elements with their selectors.
175
267
  - **Parameters**:
@@ -201,25 +293,45 @@ Opens browser and navigates to URL. Browser stays open for further interactions.
201
293
  ### 2. Interaction Tools
202
294
 
203
295
  #### click
204
- Click an element with optional result screenshot.
296
+ Click an element with optional result screenshot. **PREFERRED**: Use APOM ID from `analyzePage` for reliable targeting.
205
297
  - **Parameters**:
206
- - `selector` (required): CSS selector
298
+ - `id` (optional): APOM element ID from analyzePage (e.g., `"button_45"`, `"link_7"`). **Preferred over selector.**
299
+ - `selector` (optional): CSS selector. Use when APOM ID is not available.
300
+ - ⚠️ Either `id` OR `selector` required (mutually exclusive)
207
301
  - `waitAfter` (optional): Wait time in ms (default: 1500)
208
302
  - `screenshot` (optional): Capture screenshot (default: false for performance) ⚡
209
303
  - `timeout` (optional): Max operation time in ms (default: 30000)
210
304
  - **Use case**: Buttons, links, form submissions
211
305
  - **Returns**: Confirmation text + optional screenshot
212
306
  - **Performance**: 2-10x faster without screenshot
307
+ - **Example**:
308
+ ```javascript
309
+ // PREFERRED: Using APOM ID
310
+ click({ id: "button_45" })
311
+
312
+ // Alternative: Using CSS selector
313
+ click({ selector: "button[type='submit']" })
314
+ ```
213
315
 
214
316
  #### type
215
- Type text into input fields with optional clearing and typing delay.
317
+ Type text into input fields with optional clearing and typing delay. **PREFERRED**: Use APOM ID from `analyzePage` for reliable targeting.
216
318
  - **Parameters**:
217
- - `selector` (required): CSS selector
319
+ - `id` (optional): APOM element ID from analyzePage (e.g., `"input_20"`). **Preferred over selector.**
320
+ - `selector` (optional): CSS selector. Use when APOM ID is not available.
321
+ - ⚠️ Either `id` OR `selector` required (mutually exclusive)
218
322
  - `text` (required): Text to type
219
323
  - `delay` (optional): Delay between keystrokes in ms
220
324
  - `clearFirst` (optional): Clear field first (default: true)
221
325
  - **Use case**: Filling forms, search boxes, text inputs
222
326
  - **Returns**: Confirmation text
327
+ - **Example**:
328
+ ```javascript
329
+ // PREFERRED: Using APOM ID
330
+ type({ id: "input_20", text: "user@example.com" })
331
+
332
+ // Alternative: Using CSS selector
333
+ type({ selector: "input[name='email']", text: "user@example.com" })
334
+ ```
223
335
 
224
336
  #### scrollTo
225
337
  Scroll page to bring element into view.
@@ -229,6 +341,82 @@ Scroll page to bring element into view.
229
341
  - **Use case**: Lazy loading, sticky elements, visibility checks
230
342
  - **Returns**: Final scroll position
231
343
 
344
+ #### selectOption
345
+ Select option in dropdown (HTML select elements). **PREFERRED**: Use APOM ID from `analyzePage` for reliable targeting.
346
+ - **Parameters**:
347
+ - `id` (optional): APOM element ID from analyzePage (e.g., `"select_5"`). **Preferred over selector.**
348
+ - `selector` (optional): CSS selector. Use when APOM ID is not available.
349
+ - ⚠️ Either `id` OR `selector` required (mutually exclusive)
350
+ - `value` (optional): Option value attribute (priority 1)
351
+ - `text` (optional): Option text content (priority 2)
352
+ - `index` (optional): Option index, 0-based (priority 3)
353
+ - **Use case**: Form dropdowns, filtering, selection menus
354
+ - **Returns**: Selected option details (value, text, index)
355
+ - **Selection priority**: If multiple parameters specified, tries value → text → index
356
+ - **AI Integration**: Use `analyzePage` to see all available options with their values, text, and indices
357
+ - **Example**:
358
+ ```javascript
359
+ // PREFERRED: Using APOM ID
360
+ selectOption({ id: "select_5", value: "US" })
361
+
362
+ // Alternative: Using CSS selector
363
+ selectOption({ selector: "select[name='country']", text: "United States" })
364
+ ```
365
+
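The value → text → index priority can be sketched as a plain lookup. This is an illustration of the documented order, not the package's implementation: the helper name `resolveOption` is hypothetical, and falling through to the next parameter when an earlier one finds no match is an assumption.

```javascript
// Options shaped like the documented `options` entries (value, text, index).
const countryOptions = [
  { value: "US", text: "United States", index: 0 },
  { value: "CA", text: "Canada", index: 1 },
  { value: "MX", text: "Mexico", index: 2 },
];

// Tries value first, then text, then index; returns the first match or null.
function resolveOption(options, { value, text, index } = {}) {
  if (value !== undefined) {
    const byValue = options.find((o) => o.value === value);
    if (byValue) return byValue;
  }
  if (text !== undefined) {
    const byText = options.find((o) => o.text === text);
    if (byText) return byText;
  }
  if (index !== undefined) {
    return options.find((o) => o.index === index) || null;
  }
  return null;
}

// `value` wins even though `text` points at a different option.
console.log(resolveOption(countryOptions, { value: "CA", text: "Mexico" }));
```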
366
+ #### selectFromGroup ⭐ **NEW**
367
+ Select option(s) from radio or checkbox group by name attribute. Works at abstract group level instead of individual clicks.
368
+ - **Parameters**:
369
+ - `name` (required): Name attribute of the radio/checkbox group (e.g., 'size', 'toppings')
370
+ - `value` (optional): Single value to select (for radio or single checkbox)
371
+ - `values` (optional): Array of values to select (for checkbox group)
372
+ - `text` (optional): Label text to match (alternative to value)
373
+ - `texts` (optional): Array of label texts to match (for checkbox group)
374
+ - `by` (optional): Match by 'value', 'text', or 'auto' (default: 'auto')
375
+ - `mode` (optional): For checkboxes - 'set' (replace all), 'add', 'remove', 'toggle' (default: 'set')
376
+ - **Use case**: Radio buttons, checkbox groups, form options
377
+ - **Returns**: Result with changes made and current selection state
378
+ - **AI Integration**: Use `analyzePage` to see available groups in `groups` section with all options and labels
379
+ - **Examples**:
380
+ ```javascript
381
+ // Radio group - select single option
382
+ selectFromGroup({ name: "size", value: "large" })
383
+ selectFromGroup({ name: "size", text: "Extra Large" })
384
+
385
+ // Checkbox group - set specific values (uncheck others)
386
+ selectFromGroup({ name: "toppings", values: ["cheese", "bacon"] })
387
+
388
+ // Checkbox group - add to existing selection
389
+ selectFromGroup({ name: "toppings", values: ["mushrooms"], mode: "add" })
390
+
391
+ // Checkbox group - remove specific values
392
+ selectFromGroup({ name: "toppings", values: ["onions"], mode: "remove" })
393
+
394
+ // Checkbox group - toggle values
395
+ selectFromGroup({ name: "toppings", texts: ["Extra Cheese"], mode: "toggle" })
396
+ ```
397
+
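The four `mode` values can be read as set operations on the currently checked values. The sketch below models that reading; the helper name `applyGroupMode` is hypothetical, and the exact behaviour of the real tool (for example with unknown values) is not confirmed by the README.

```javascript
// Returns the checked values after applying `requested` to `current` under `mode`.
function applyGroupMode(current, requested, mode = "set") {
  const next = new Set(current);
  switch (mode) {
    case "set": // replace the whole selection
      return [...new Set(requested)];
    case "add": // keep existing, check the requested ones
      requested.forEach((v) => next.add(v));
      break;
    case "remove": // uncheck only the requested ones
      requested.forEach((v) => next.delete(v));
      break;
    case "toggle": // flip each requested value
      requested.forEach((v) => (next.has(v) ? next.delete(v) : next.add(v)));
      break;
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
  return [...next];
}

const checkedToppings = ["cheese", "onions"];
console.log(applyGroupMode(checkedToppings, ["cheese", "bacon"], "toggle"));
```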
398
+ #### drag
399
+ Drag element by mouse (click-hold-move-release). Simulates real mouse drag, not scrollbar scrolling.
400
+ - **Parameters**:
401
+ - `selector` (required): CSS selector for element to drag
402
+ - `direction` (required): 'up', 'down', 'left', 'right', 'up-left', 'up-right', 'down-left', 'down-right'
403
+ - `distance` (optional): Distance in pixels (default: 100)
404
+ - `duration` (optional): Drag duration in milliseconds (default: 500)
405
+ - **Use case**: Interactive maps (Google Maps, Leaflet), Gantt charts, SVG diagrams, canvas elements, sliders, drag-to-pan interfaces
406
+ - **How it works**: Moves mouse to element center, presses mouse button, drags to target position, releases button
407
+ - **NOT for**: Standard overflow scrollbars (use `scrollTo` or `scrollHorizontal` instead)
408
+ - **Returns**: Start/end mouse positions and drag delta
409
+
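To make the `direction` and `distance` parameters concrete, the sketch below converts them into a mouse delta in page coordinates, where y grows downward. The helper name `dragDelta` is hypothetical, and applying the full `distance` on each axis for diagonal directions is an assumption; the README does not say how diagonals are measured.

```javascript
// Unit vectors in screen coordinates: negative y is up, positive y is down.
const DIRECTION_VECTORS = {
  up: [0, -1],
  down: [0, 1],
  left: [-1, 0],
  right: [1, 0],
  "up-left": [-1, -1],
  "up-right": [1, -1],
  "down-left": [-1, 1],
  "down-right": [1, 1],
};

// Default distance mirrors the documented default of 100 pixels.
function dragDelta(direction, distance = 100) {
  const vector = DIRECTION_VECTORS[direction];
  if (!vector) throw new Error(`Unsupported direction: ${direction}`);
  const [dx, dy] = vector;
  return { dx: dx * distance, dy: dy * distance };
}

console.log(dragDelta("down-right", 250));
```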
410
+ #### scrollHorizontal
411
+ Scroll element horizontally (for tables, carousels, wide content).
412
+ - **Parameters**:
413
+ - `selector` (required): CSS selector for element to scroll
414
+ - `direction` (required): 'left' or 'right'
415
+ - `amount` (required): Number of pixels to scroll, or 'full' to scroll to the end
416
+ - `behavior` (optional): 'auto' or 'smooth' (default: 'auto')
417
+ - **Use case**: Wide tables, image carousels, horizontally scrollable containers
418
+ - **Returns**: Scroll state (position, total width, visible width, scroll availability)
419
+
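The `amount` parameter accepts either a pixel count or `'full'`. A minimal sketch of how a target scroll position could be derived from the returned scroll state follows. The helper name `nextScrollLeft` and the field names `scrollLeft`, `scrollWidth`, and `clientWidth` are assumptions borrowed from the standard DOM properties, not confirmed names from the tool's response.

```javascript
// Computes the new horizontal offset, clamped to the scrollable range.
function nextScrollLeft({ scrollLeft, scrollWidth, clientWidth }, direction, amount) {
  const max = Math.max(0, scrollWidth - clientWidth);
  if (amount === "full") return direction === "right" ? max : 0;
  const delta = direction === "right" ? amount : -amount;
  return Math.min(max, Math.max(0, scrollLeft + delta));
}

// A 1200px-wide table inside a 400px viewport, currently scrolled 100px.
const tableState = { scrollLeft: 100, scrollWidth: 1200, clientWidth: 400 };
console.log(nextScrollLeft(tableState, "right", 300));
```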
232
420
  ### 3. Inspection Tools
233
421
 
234
422
  #### getElement
@@ -352,10 +540,21 @@ Filter requests by URL pattern with full details.
352
540
  3. `filterNetworkRequests({ urlPattern: "api/..." })` - get all matching requests with details
353
541
 
354
542
  #### hover
355
- Simulate mouse hover over element.
356
- - **Parameters**: `selector` (required)
543
+ Simulate mouse hover over element. **PREFERRED**: Use APOM ID from `analyzePage` for reliable targeting.
544
+ - **Parameters**:
545
+ - `id` (optional): APOM element ID from analyzePage (e.g., `"button_10"`). **Preferred over selector.**
546
+ - `selector` (optional): CSS selector. Use when APOM ID is not available.
547
+ - ⚠️ Either `id` OR `selector` required (mutually exclusive)
357
548
  - **Use case**: Testing hover effects, tooltips, dropdown menus
358
549
  - **Returns**: Confirmation text
550
+ - **Example**:
551
+ ```javascript
552
+ // PREFERRED: Using APOM ID
553
+ hover({ id: "button_10" })
554
+
555
+ // Alternative: Using CSS selector
556
+ hover({ selector: ".dropdown-trigger" })
557
+ ```
359
558
 
360
559
  #### setStyles
361
560
  Apply inline CSS styles to element for live editing.
@@ -388,7 +587,49 @@ Navigate to different URL while keeping browser instance.
388
587
  - **Use case**: Moving between pages in workflow
389
588
  - **Returns**: New page title
390
589
 
391
- ### 5. Figma Tools ⭐ ENHANCED
590
+ ### 5. Tab Management Tools ⭐ NEW
591
+
592
+ Tools for managing multiple browser tabs. New tabs opened via `window.open()`, `target="_blank"`, or user actions are automatically detected and tracked.
593
+
594
+ #### listTabs
595
+ List all open browser tabs with their URLs, titles, and active status.
596
+ - **Parameters**: None
597
+ - **Returns**:
598
+ - `tabs`: Array of `{ index, url, title, isActive }`
599
+ - `totalCount`: Number of open tabs
600
+ - `newTabsDetected` (optional): Array of tabs opened since last check
601
+ - **Use case**: See all open tabs, check for newly opened tabs
602
+
603
+ ```javascript
604
+ // Example response
605
+ {
606
+ "tabs": [
607
+ { "index": 0, "url": "https://example.com", "title": "Example", "isActive": false },
608
+ { "index": 1, "url": "https://google.com", "title": "Google", "isActive": true }
609
+ ],
610
+ "totalCount": 2,
611
+ "newTabsDetected": [
612
+ { "timestamp": "2026-01-25T...", "url": "https://google.com", "openerUrl": "https://example.com" }
613
+ ]
614
+ }
615
+ ```
616
+
617
+ #### switchTab
618
+ Switch to a different browser tab by index or URL pattern.
619
+ - **Parameters**:
620
+ - `tab` (required): Tab index (number, 0-based) or URL pattern (string, partial match)
621
+ - **Use case**: Switch between tabs for multi-tab workflows
622
+ - **Returns**: `{ success, switchedTo: { url, title } }`
623
+
624
+ ```javascript
625
+ // Switch by index
626
+ switchTab({ tab: 0 })
627
+
628
+ // Switch by URL pattern
629
+ switchTab({ tab: "google.com" })
630
+ ```
631
+
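The two accepted forms of `tab` (a 0-based index or a partial URL match) can be sketched as a lookup over the `listTabs` result. The helper name `resolveTab` is hypothetical, and picking the first tab whose URL contains the pattern is an assumption about how partial matching behaves.

```javascript
// Tabs copied from the listTabs example response.
const openTabs = [
  { index: 0, url: "https://example.com", title: "Example", isActive: false },
  { index: 1, url: "https://google.com", title: "Google", isActive: true },
];

// Numbers are treated as tab indexes, strings as URL substrings.
function resolveTab(tabs, tab) {
  if (typeof tab === "number") {
    return tabs.find((t) => t.index === tab) || null;
  }
  if (typeof tab === "string") {
    return tabs.find((t) => t.url.includes(tab)) || null;
  }
  return null;
}

console.log(resolveTab(openTabs, "google.com"));
```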
632
+ ### 6. Figma Tools ⭐ ENHANCED
392
633
 
393
634
  Design-to-code validation, file browsing, design system extraction, and comparison tools with automatic 3 MB compression.
394
635
 
@@ -472,6 +713,27 @@ Extract complete color palette with usage statistics.
472
713
  - Usage examples (where the color is used)
473
714
  - Sorted by usage frequency
474
715
 
716
+ #### convertFigmaToCode ⭐ NEW
717
+ Convert Figma designs to React/Tailwind code with AI assistance.
718
+ - **Parameters**:
719
+ - `figmaToken` (optional): Figma API token
720
+ - `fileKey` (required): Figma file key
721
+ - `nodeId` (required): Frame/component ID (formats: '123:456' or '123-456')
722
+ - `framework` (optional): 'react', 'react-typescript', or 'html' (default: 'react')
723
+ - `includeComments` (optional): Include code comments (default: true)
724
+ - **Use case**: Rapid prototyping, design-to-code workflow, implementing Figma designs
725
+ - **How it works**:
726
+ 1. Fetches design structure (layout, colors, typography, spacing)
727
+ 2. Gets rendered design image at 2x resolution
728
+ 3. Returns AI-optimized instructions with simplified JSON structure
729
+ 4. AI generates clean React/Tailwind code matching the design
730
+ - **Returns**: Formatted instruction prompt containing:
731
+ - Design image reference
732
+ - Simplified JSON structure with layout, styling, text properties
733
+ - Framework-specific guidelines (React components, TypeScript types, Tailwind classes)
734
+ - Quality requirements (semantic HTML, accessibility, accurate spacing)
735
+ - **Best for**: UI components, landing pages, card designs, navigation bars
736
+
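Since `nodeId` is accepted in both `'123:456'` and `'123-456'` form, a caller may want a single canonical form for comparison or caching. The sketch below normalizes to the colon form; the helper name `normalizeNodeId` is hypothetical, it only handles the two simple formats listed above, and whether the tool performs a similar normalization internally is not stated in the README.

```javascript
// Accepts "123:456" or "123-456" and returns "123:456"; anything else throws.
function normalizeNodeId(nodeId) {
  const match = /^(\d+)[:-](\d+)$/.exec(String(nodeId).trim());
  if (!match) throw new Error(`Unrecognized node ID format: ${nodeId}`);
  return `${match[1]}:${match[2]}`;
}

console.log(normalizeNodeId("123-456"));
```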
475
737
  #### getFigmaFrame
476
738
  Export and download a Figma frame as PNG/JPG image with automatic compression.
477
739
  - **Parameters**:
@@ -513,7 +775,7 @@ Extract detailed design specifications from Figma including text content, colors
513
775
  - **Dimensions**: Width, height, x, y coordinates
514
776
  - **Children**: Recursive tree with text extraction from all child elements
515
777
 
516
- ### 6. Recorder Tools ⭐ NEW
778
+ ### 7. Recorder Tools ⭐ NEW
517
779
 
518
780
  **URL-Based Storage (v2.1+)**: Scenarios are automatically organized by website domain in `~/.config/chrometools-mcp/projects/{domain}/scenarios/`.
519
781
 
@@ -882,18 +1144,29 @@ Generate Page Object Model (POM) class from current page structure. Analyzes pag
882
1144
  // 1. Open page
883
1145
  openBrowser({ url: "https://example.com/form" })
884
1146
 
885
- // 2. Fill form
886
- type({ selector: "input[name='email']", text: "user@example.com" })
887
- type({ selector: "input[name='password']", text: "secret123" })
1147
+ // 2. Analyze page to get element IDs
1148
+ analyzePage()
1149
+ // Returns: { tree: {...}, groups: {...}, meta: {...} }
1150
+ // Elements: input_20 (email), input_21 (password), button_45 (submit)
888
1151
 
889
- // 3. Submit
890
- click({ selector: "button[type='submit']" })
1152
+ // 3. Fill form using APOM IDs (preferred)
1153
+ type({ id: "input_20", text: "user@example.com" })
1154
+ type({ id: "input_21", text: "secret123" })
891
1155
 
892
- // 4. Verify
893
- getElement({ selector: ".success-message" })
1156
+ // 4. Submit using APOM ID
1157
+ click({ id: "button_45" })
1158
+
1159
+ // 5. Verify
1160
+ analyzePage({ refresh: true }) // See updated state
894
1161
  screenshot({ selector: ".dashboard", padding: 20 })
895
1162
  ```
896
1163
 
1164
+ **Alternative: Using CSS selectors (still supported)**
1165
+ ```javascript
1166
+ type({ selector: "input[name='email']", text: "user@example.com" })
1167
+ click({ selector: "button[type='submit']" })
1168
+ ```
1169
+
897
1170
  ---
898
1171
 
899
1172
  ## Tool Usage Tips
@@ -997,9 +1270,9 @@ Each tool definition is sent to the AI in every request, consuming context token
997
1270
  | `debug` | Debugging & network | `getConsoleLogs`, `listNetworkRequests`, `getNetworkRequest`, `filterNetworkRequests` (4) |
998
1271
  | `advanced` | Advanced automation & AI | `executeScript`, `setStyles`, `setViewport`, `getViewport`, `navigateTo`, `smartFindElement`, `analyzePage`, `getAllInteractiveElements`, `findElementsByText` (9) |
999
1272
  | `recorder` | Scenario recording | `enableRecorder`, `executeScenario`, `listScenarios`, `searchScenarios`, `getScenarioInfo`, `deleteScenario`, `exportScenarioAsCode`, `appendScenarioToFile`, `generatePageObject` (9) |
1000
- | `figma` | Figma integration | `getFigmaFrame`, `compareFigmaToElement`, `getFigmaSpecs`, `parseFigmaUrl`, `listFigmaPages`, `searchFigmaFrames`, `getFigmaComponents`, `getFigmaStyles`, `getFigmaColorPalette` (9) |
1273
+ | `figma` | Figma integration | `getFigmaFrame`, `compareFigmaToElement`, `getFigmaSpecs`, `parseFigmaUrl`, `listFigmaPages`, `searchFigmaFrames`, `getFigmaComponents`, `getFigmaStyles`, `getFigmaColorPalette`, `convertFigmaToCode` (10) |
1001
1274
 
1002
- **Total:** 43 tools across 7 groups
1275
+ **Total:** 44 tools across 7 groups
1003
1276
 
1004
1277
  **Configuration:**
1005
1278
 
@@ -1162,17 +1435,29 @@ npx @modelcontextprotocol/inspector node index.js
1162
1435
 
1163
1436
  ## Features
1164
1437
 
1165
- - **27+ Powerful Tools**: Complete toolkit for browser automation
1438
+ - **44+ Powerful Tools**: Complete toolkit for browser automation
1166
1439
  - Core: ping, openBrowser
1167
- - Interaction: click, type, scrollTo
1168
- - Inspection: getElement, getComputedCss, getBoxModel, screenshot
1169
- - Advanced: executeScript, getConsoleLogs, getNetworkRequests, hover, setStyles, setViewport, getViewport, navigateTo
1170
- - AI-Powered: smartFindElement, analyzePage, getAllInteractiveElements, findElementsByText
1171
- - Recorder: enableRecorder, executeScenario, listScenarios, searchScenarios, getScenarioInfo, deleteScenario
1172
- - Figma: getFigmaFrame, compareFigmaToElement, getFigmaSpecs
1440
+ - Interaction: click, type, scrollTo, selectOption, selectFromGroup, drag, scrollHorizontal
1441
+ - Inspection: getElement, getComputedCss, getBoxModel, screenshot, saveScreenshot
1442
+ - Advanced: executeScript, getConsoleLogs, listNetworkRequests, getNetworkRequest, filterNetworkRequests, hover, setStyles, setViewport, getViewport, navigateTo, waitForElement
1443
+ - AI-Powered: smartFindElement, analyzePage, getElementByApomId, getAllInteractiveElements, findElementsByText ⭐ **NEW**
1444
+ - Recorder: enableRecorder, executeScenario, listScenarios, searchScenarios, getScenarioInfo, deleteScenario, exportScenarioAsCode, appendScenarioToFile, generatePageObject
1445
+ - Figma: getFigmaFrame, compareFigmaToElement, getFigmaSpecs, parseFigmaUrl, listFigmaPages, searchFigmaFrames, getFigmaComponents, getFigmaStyles, getFigmaColorPalette, convertFigmaToCode
1446
+ - **UI Framework Detection**: Automatic detection of MUI, Ant Design, Chakra UI, Bootstrap, Vuetify, Semantic UI ⭐ **NEW**
1447
+ - **Smart Dropdown Handling**: Extracts options from both native `<select>` and custom UI framework components ⭐ **NEW**
1448
+ - **APOM (Agent Page Object Model)**: Automatic element ID assignment for reliable interaction ⭐ **NEW**
1449
+ - `analyzePage()` returns elements with unique IDs (e.g., `input_20`, `button_45`)
1450
+ - Use `id` parameter in click/type/hover/selectOption for stable targeting
1451
+ - Use `getElementByApomId()` to get detailed element info
1173
1452
  - **Console Log Capture**: Automatic JavaScript console monitoring
1174
1453
  - **Network Request Monitoring**: Track all HTTP/API requests (XHR, Fetch, etc.)
1175
1454
  - **Persistent Browser Sessions**: Browser tabs remain open between requests
1455
+ - **Multi-Instance Support**: Run multiple MCP servers simultaneously with automatic discovery ⭐ **NEW**
1456
+ - Dynamic port allocation (9223-9227)
1457
+ - Chrome Extension port scanning every 20s
1458
+ - Broadcast pattern for parallel AI clients
1459
+ - Graceful handling of ungraceful shutdowns
1460
+ - **Auto-Sync Active Tab**: MCP server automatically syncs to user's currently active tab ⭐ **NEW**
1176
1461
  - **Visual Browser (GUI Mode)**: See automation in real-time
1177
1462
  - **Cross-platform**: Works on Windows/WSL, Linux, macOS
1178
1463
  - **Simple Installation**: One command with npx
@@ -1180,9 +1465,180 @@ npx @modelcontextprotocol/inspector node index.js
1180
1465
  - **AI-Friendly**: Detailed descriptions optimized for AI agents
1181
1466
  - **Responsive Testing**: Built-in viewport control for mobile/tablet/desktop
1182
1467
 
1468
+ ## Multi-Instance Support
1469
+
1470
+ ⭐ **NEW**: Run up to 8 MCP servers simultaneously, connecting/disconnecting at any time without coordination.
1471
+
1472
+ ### Overview
1473
+
1474
+ ChromeTools MCP uses a **Bridge Architecture** for reliable multi-instance support:
1475
+
1476
+ - **Multiple AI clients** (0-8) can connect/disconnect at any time
1477
+ - **No scanning delays** — instant connection to persistent Bridge Service
1478
+ - **Resilient** — Bridge survives MCP process crashes, maintains state
1479
+ - **Chrome lifecycle** — Bridge starts/stops with Chrome Extension
1480
+
1481
+ ### How It Works
1482
+
+ ```
+ ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
+ │ Claude Desktop  │  │  Telegram Bot   │  │  Custom Script  │
+ │   MCP Client    │  │   MCP Client    │  │   MCP Client    │
+ └────────┬────────┘  └────────┬────────┘  └────────┬────────┘
+          │                    │                    │
+          │ WebSocket          │ WebSocket          │ WebSocket
+          │ (client)           │ (client)           │ (client)
+          │                    │                    │
+          └────────────────────┼────────────────────┘
+                               │
+                               │
+               ┌───────────────────────────────┐
+               │    Bridge Service (:9223)     │
+               │    (Native Messaging Host)    │
+               │                               │
+               │  • Stores tabs state          │
+               │  • Stores recordings          │
+               │  • Broadcasts events          │
+               │  • Accepts 0-8 clients        │
+               └───────────────┬───────────────┘
+                               │
+                               │ Native Messaging (stdio)
+                               │
+               ┌───────────────┴───────────────┐
+               │       Chrome Extension        │
+               │       (Event Producer)        │
+               │                               │
+               │  • Tracks all tabs            │
+               │  • Records user actions       │
+               │  • Sends events to Bridge     │
+               └───────────────┬───────────────┘
+                               │
+                               │
+               ┌───────────────────────────────┐
+               │        Chrome Browser         │
+               └───────────────────────────────┘
+ ```
1521
+
1522
+ ### Installation
1523
+
1524
+ **One-time setup** (installs Native Messaging Bridge):
1525
+
1526
+ ```bash
1527
+ npx chrometools-mcp --install-bridge
1528
+ ```
1529
+
1530
+ This:
1531
+ 1. Creates Bridge Service files in `~/.chrometools/`
1532
+ 2. Registers Native Messaging Host in system (Windows Registry / Chrome config)
1533
+ 3. Bridge will auto-start when Chrome Extension loads
1534
+
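For orientation, the registration in step 2 comes down to a small JSON manifest that tells Chrome where the host executable lives and which extension may launch it. The sketch below builds such a manifest. The field names follow Chrome's Native Messaging host manifest format, but the host name `com.example.chrometools_bridge` and the launcher file name are placeholders chosen for illustration, not values taken from this package.

```javascript
// Sketch: build a Chrome Native Messaging host manifest.
// ASSUMPTION: the host name and launcher path are illustrative placeholders.
const os = require('node:os');
const path = require('node:path');

// Extension ID as quoted under "Technical Details" in this README.
const EXTENSION_ID = 'dmehkibmncgphijnigkahhlekgajhpbl';

function buildHostManifest(hostName, launcherPath, extensionId = EXTENSION_ID) {
  return {
    name: hostName,
    description: 'Bridge between the Chrome Extension and MCP clients',
    path: launcherPath, // must be an absolute path on Linux and macOS
    type: 'stdio',      // the only transport Chrome offers for native hosts
    allowed_origins: [`chrome-extension://${extensionId}/`], // no wildcards allowed
  };
}

const manifest = buildHostManifest(
  'com.example.chrometools_bridge',
  path.join(os.homedir(), '.chrometools', 'bridge-launcher') // hypothetical file name
);
console.log(JSON.stringify(manifest, null, 2));
```

Restricting `allowed_origins` to a single, stable extension ID is what lets only this extension start the host.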
+ **Verify installation:**
+ ```bash
+ npx chrometools-mcp --check-bridge
+ ```
+
+ ### Architecture
+
+ **1. Bridge Service (Persistent Intermediary)**
+ - Launched by Chrome via Native Messaging when the Extension starts
+ - Runs a WebSocket server on port 9223
+ - Stores state: tabs, recordings, recorder state
+ - Lives as long as Chrome is running
+ - Accepts 0-8 simultaneous MCP clients
+
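The Bridge's bookkeeping described above (keep state, cap clients at 8, send a snapshot on connect, broadcast every event) can be sketched without any transport. This is a minimal illustration, not the package's `bridge-service.js`: the class, method, and event names are hypothetical, and a "client" here is any object with a `send(message)` method standing in for a WebSocket connection.

```javascript
// Sketch of the Bridge's bookkeeping, with the WebSocket transport left out.
// ASSUMPTION: class, method, and event names are hypothetical.
const MAX_CLIENTS = 8;

class BridgeHub {
  constructor() {
    this.state = { tabs: {}, recordings: [], recorder: { active: false } };
    this.clients = new Set(); // each client only needs a send(message) method
  }

  // Register a client; refuse it once the limit is reached.
  addClient(client) {
    if (this.clients.size >= MAX_CLIENTS) return false;
    this.clients.add(client);
    // Full snapshot on connect (deep copy so later updates do not mutate it).
    client.send({ type: 'state', payload: JSON.parse(JSON.stringify(this.state)) });
    return true;
  }

  removeClient(client) {
    this.clients.delete(client); // state is kept, so a later reconnect starts warm
  }

  // Called for every event arriving from the extension.
  handleExtensionEvent(event) {
    if (event.type === 'tab_updated') this.state.tabs[event.tab.id] = event.tab;
    if (event.type === 'tab_closed') delete this.state.tabs[event.tabId];
    for (const client of this.clients) client.send({ type: 'event', payload: event });
  }
}

// Usage with fake clients that simply collect what they are sent.
const hub = new BridgeHub();
const makeClient = () => ({ inbox: [], send(msg) { this.inbox.push(msg); } });

const a = makeClient();
hub.addClient(a);
hub.handleExtensionEvent({ type: 'tab_updated', tab: { id: 1, url: 'https://example.com' } });
hub.removeClient(a);

const b = makeClient(); // connects after the event, still sees tab 1
hub.addClient(b);
console.log(b.inbox[0].payload.tabs[1].url); // https://example.com
```

Because the state lives in the hub rather than in any client, a client that connects late or reconnects gets the same picture as one that was attached the whole time.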
+ **2. Chrome Extension (Event Producer)**
+ - Tracks all browser tabs (created, updated, closed, activated)
+ - Records user actions (clicks, typing, navigation)
+ - Sends ALL events to the Bridge via Native Messaging
+ - Has no knowledge of MCP clients; it only produces events
+
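On the wire, Chrome's Native Messaging protocol frames each message as UTF-8 JSON preceded by a 32-bit length in native byte order, exchanged over the host's stdin/stdout. The sketch below shows that framing with Node's `Buffer`. The function names and the example event shapes are assumptions made for illustration; they are not taken from the package.

```javascript
// Sketch: frame and parse Chrome Native Messaging packets.
// Each packet is a 32-bit length (native byte order) followed by UTF-8 JSON.
// ASSUMPTION: function names and event shapes are hypothetical.
const os = require('node:os');

const littleEndian = os.endianness() === 'LE';

function encodeMessage(message) {
  const body = Buffer.from(JSON.stringify(message), 'utf8');
  const header = Buffer.alloc(4);
  if (littleEndian) header.writeUInt32LE(body.length, 0);
  else header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Returns the decoded messages plus the trailing bytes of any incomplete packet.
function decodeMessages(buffer) {
  const messages = [];
  let offset = 0;
  while (buffer.length - offset >= 4) {
    const length = littleEndian ? buffer.readUInt32LE(offset) : buffer.readUInt32BE(offset);
    if (buffer.length - offset - 4 < length) break; // body not fully received yet
    const start = offset + 4;
    messages.push(JSON.parse(buffer.toString('utf8', start, start + length)));
    offset = start + length;
  }
  return { messages, rest: buffer.subarray(offset) };
}

// Usage: two packets arrive back to back, the second one cut short by a byte.
const wire = Buffer.concat([
  encodeMessage({ type: 'tab_activated', tabId: 7 }),
  encodeMessage({ type: 'click', selector: '#submit' }),
]);
const { messages, rest } = decodeMessages(wire.subarray(0, wire.length - 1));
console.log(messages.length, 'complete message(s),', rest.length, 'byte(s) buffered');
```

A host reading from stdin would keep `rest` and prepend it to the next chunk, since stream reads do not respect packet boundaries.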
+ **3. MCP Server (Event Consumer)**
+ - Connects to the Bridge as a WebSocket client
+ - Receives the full state immediately on connect
+ - Gets real-time event updates
+ - Can disconnect/reconnect at any time without losing state
+
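The consumer side of this flow, a snapshot first and incremental events afterwards, can be sketched as a small state mirror. This is only an illustration of the pattern: the message envelope (`type: 'state'` / `type: 'event'`) and the event names are assumptions, not the package's actual protocol.

```javascript
// Sketch of an MCP-side consumer: apply a snapshot, then incremental events.
// ASSUMPTION: message envelope and event names are hypothetical.
class BridgeStateMirror {
  constructor() {
    this.tabs = new Map();
    this.activeTabId = null;
  }

  // Feed every message received from the Bridge into this method.
  apply(message) {
    if (message.type === 'state') {
      // A snapshot replaces whatever was held before, so reconnects need no merging.
      const tabs = Object.values(message.payload.tabs);
      this.tabs = new Map(tabs.map((tab) => [tab.id, tab]));
      this.activeTabId = message.payload.activeTabId ?? null;
      return;
    }
    const event = message.payload;
    switch (event.type) {
      case 'tab_created':
      case 'tab_updated':
        this.tabs.set(event.tab.id, event.tab);
        break;
      case 'tab_closed':
        this.tabs.delete(event.tabId);
        if (this.activeTabId === event.tabId) this.activeTabId = null;
        break;
      case 'tab_activated':
        this.activeTabId = event.tabId;
        break;
      default:
        break; // ignore events this consumer does not understand
    }
  }
}

// Usage: one snapshot followed by three live events.
const mirror = new BridgeStateMirror();
mirror.apply({
  type: 'state',
  payload: { tabs: { 1: { id: 1, url: 'https://example.com' } }, activeTabId: 1 },
});
mirror.apply({ type: 'event', payload: { type: 'tab_created', tab: { id: 2, url: 'about:blank' } } });
mirror.apply({ type: 'event', payload: { type: 'tab_activated', tabId: 2 } });
mirror.apply({ type: 'event', payload: { type: 'tab_closed', tabId: 1 } });
console.log([...mirror.tabs.keys()], mirror.activeTabId); // [ 2 ] 2
```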
+ ### Use Cases
+
+ **Ephemeral AI Sessions**
+ ```bash
+ # User sends message to Telegram bot
+ # → Claude Code starts, connects to Bridge
+ # → Gets current tabs state instantly
+ # → Performs automation
+ # → Claude Code exits, disconnects
+ # → Bridge keeps running, state preserved
+
+ # Next message: same flow, instant state access
+ ```
+
+ **Parallel Workflows**
+ ```bash
+ # Claude Desktop: form automation
+ # Telegram Bot: monitoring & debugging
+ # Custom script: data extraction
+
+ # All connected to same Bridge
+ # All see same browser state
+ # All can control Chrome
+ ```
+
+ ### Configuration
+
+ No configuration is needed after installation. Just run:
+
+ ```bash
+ npx chrometools-mcp
+ ```
+
+ The MCP server automatically connects to the Bridge on startup.
+
+ ### CLI Options
+
+ ```bash
+ npx chrometools-mcp --install-bridge    # Install Native Messaging Bridge
+ npx chrometools-mcp --uninstall-bridge  # Uninstall Bridge
+ npx chrometools-mcp --check-bridge      # Check if Bridge is installed
+ npx chrometools-mcp --help              # Show help
+ ```
+
+ ### Technical Details
+
+ | Component | Technology | Port |
+ |-----------|------------|------|
+ | Bridge Service | Node.js + WebSocket Server | 9223 |
+ | Extension ↔ Bridge | Native Messaging (stdio) | n/a |
+ | MCP ↔ Bridge | WebSocket (client) | 9223 |
+
+ **Max Clients:** 8 simultaneous MCP connections
+
+ **State on Connect:** Full state (tabs, recordings, recorder state) sent immediately
+
+ **Extension ID:** `dmehkibmncgphijnigkahhlekgajhpbl` (stable, generated from key)
+
+ ### Troubleshooting
+
+ **Bridge not connecting:**
+ ```bash
+ # Check if Bridge is installed
+ npx chrometools-mcp --check-bridge
+
+ # Reinstall if needed
+ npx chrometools-mcp --install-bridge
+
+ # Reload extension in chrome://extensions
+ ```
+
+ **Extension shows "Disconnected":**
+ - The Bridge only runs while the Chrome Extension is active
+ - Close and reopen Chrome
+ - Check the Extension's Service Worker console for errors
+
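When the checks above pass but clients still cannot connect, it can help to confirm that something is actually listening on port 9223. The following is a small diagnostic sketch using Node's `net` module. It assumes the Bridge listens on the loopback address `127.0.0.1`, which the README does not state, and an open port only shows that some process is listening there, not that the process is the Bridge.

```javascript
// Sketch: check whether any process accepts TCP connections on the Bridge port.
// ASSUMPTION: the Bridge binds to 127.0.0.1; the README only gives the port.
const net = require('node:net');

function isPortOpen(port, host = '127.0.0.1', timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (open) => {
      socket.destroy();
      resolve(open); // later calls are harmless: a promise settles only once
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('connect', () => finish(true));
    socket.on('error', () => finish(false)); // e.g. ECONNREFUSED when nothing listens
  });
}

isPortOpen(9223).then((open) => {
  console.log(open ? 'Something is listening on 9223' : 'Nothing is listening on 9223');
});
```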
  ## Architecture
 
- - Uses Puppeteer for Chrome automation
- - MCP Server SDK for protocol implementation
- - Zod for schema validation
- - Stdio transport for communication
+ - **Puppeteer** for Chrome automation
+ - **MCP Server SDK** for protocol implementation
+ - **Native Messaging Bridge** for persistent Extension ↔ MCP communication
+ - **WebSocket** for multi-client support (Bridge as server, MCP as clients)
+ - **Zod** for schema validation
+ - **Stdio transport** for MCP communication