npm - chrometools-mcp - Versions diffs - 2.5.0 → 3.1.2 - Mend

chrometools-mcp 2.5.0 → 3.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (48)

package/CHANGELOG.md +420 -0
package/COMPONENT_MAPPING_SPEC.md +1217 -0
package/README.md +406 -38
package/bridge/bridge-client.js +472 -0
package/bridge/bridge-service.js +399 -0
package/bridge/install.js +241 -0
package/browser/browser-manager.js +107 -2
package/browser/page-manager.js +226 -69
package/docs/CHROME_EXTENSION.md +219 -0
package/docs/PAGE_OBJECT_MODEL_CONCEPT.md +1756 -0
package/extension/background.js +643 -0
package/extension/content.js +715 -0
package/extension/icons/create-icons.js +164 -0
package/extension/icons/icon128.png +0 -0
package/extension/icons/icon16.png +0 -0
package/extension/icons/icon48.png +0 -0
package/extension/manifest.json +58 -0
package/extension/popup/popup.css +437 -0
package/extension/popup/popup.html +102 -0
package/extension/popup/popup.js +415 -0
package/extension/recorder-overlay.css +93 -0
package/index.js +3347 -2901
package/models/BaseInputModel.js +93 -0
package/models/CheckboxGroupModel.js +199 -0
package/models/CheckboxModel.js +103 -0
package/models/ColorInputModel.js +53 -0
package/models/DateInputModel.js +67 -0
package/models/RadioGroupModel.js +126 -0
package/models/RangeInputModel.js +60 -0
package/models/SelectModel.js +97 -0
package/models/TextInputModel.js +34 -0
package/models/TextareaModel.js +59 -0
package/models/TimeInputModel.js +49 -0
package/models/index.js +122 -0
package/package.json +3 -2
package/pom/apom-converter.js +267 -0
package/pom/apom-tree-converter.js +515 -0
package/pom/element-id-generator.js +175 -0
package/recorder/page-object-generator.js +16 -0
package/recorder/scenario-executor.js +80 -2
package/server/tool-definitions.js +839 -713
package/server/tool-groups.js +1 -1
package/server/tool-schemas.js +367 -326
package/server/websocket-bridge.js +447 -0
package/utils/selector-resolver.js +186 -0
package/utils/ui-framework-detector.js +392 -0
package/RELEASE_NOTES_v2.5.0.md +0 -109
package/npm_publish_output.txt +0 -0

package/CHANGELOG.md CHANGED

@@ -2,6 +2,426 @@
 All notable changes to this project will be documented in this file.
+## [3.1.2] - 2026-01-26
+### Added
+- **Multi-tab scenario recording** — Automatic recording of tab switches during scenario capture
+  - When user switches tabs during recording, an `openTab` action is automatically recorded
+  - Records tab URL, title, and switch reason for accurate playback
+  - Only records if switching to a different tab (ignores same-tab activations)
+  - Location: `extension/background.js` (chrome.tabs.onActivated listener)
+### Changed
+- **openTab navigation strategy** — Changed from `networkidle2` to `domcontentloaded` in scenario executor
+  - Prevents timeout errors when opening tabs with continuous ad/tracking loading
+  - Consistent with other navigation improvements in 3.1.1
+  - Location: `recorder/scenario-executor.js:979`
+### Fixed
+- **openTab with empty URL uses look-ahead to next action's URL** — Smart URL detection for tab switches
+  - When openTab has empty URL, executor looks at next action's tabUrl
+  - If next action has real URL, uses it for tab opening/switching
+  - Fixes scenarios where new tab was opened but URL loaded immediately after
+  - Empty URLs without look-ahead match are still skipped (prevents about:blank tabs)
+  - Location: `recorder/scenario-executor.js:168-176, 987-990`
+- **Added 500ms delay before tab switch** — Prevents race conditions during scenario playback
+  - Allows previous tab's pending processes (navigation, AJAX, form submissions) to complete
+  - Ensures stable state before switching to new tab
+  - Location: `recorder/scenario-executor.js:990-992`
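The look-ahead rule for `openTab` described in the 3.1.2 notes can be sketched as a small pure function. This is an illustration only, not the code from `recorder/scenario-executor.js`: the action shape `{ type, url, tabUrl }`, the helper names, and the choice to treat `about:blank` as an empty URL are all assumptions.

```javascript
// Hypothetical action shape: { type, url, tabUrl }. Only `openTab` and
// `tabUrl` are named in the changelog; everything else here is assumed.
function isRealUrl(url) {
  // Assumption: about:blank counts as "no URL", like an empty string.
  return typeof url === "string" && url.trim() !== "" && url !== "about:blank";
}

// Returns the URL an openTab action should use, or null if it should be skipped.
function resolveOpenTabUrl(actions, index) {
  const action = actions[index];
  if (isRealUrl(action.url)) return action.url;

  // Look ahead exactly one action and borrow its tabUrl if it is usable.
  const next = actions[index + 1];
  if (next && isRealUrl(next.tabUrl)) return next.tabUrl;

  return null; // skip: avoids opening an about:blank tab
}

const recorded = [
  { type: "openTab", url: "" },
  { type: "click", tabUrl: "https://example.com/cart" },
  { type: "openTab", url: "" },
];
console.log(resolveOpenTabUrl(recorded, 0)); // https://example.com/cart
console.log(resolveOpenTabUrl(recorded, 2)); // null
```

Returning `null` rather than an empty string keeps the skip case explicit for the caller.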
+## [3.1.1] - 2026-01-26
+### Fixed
+- **Scenario recording and saving flow** — Fixed critical bugs preventing scenario recording
+  - Fixed Extension → Bridge → MCP communication flow for recording control
+  - `startRecording` now properly sends command to Extension via Bridge WebSocket
+  - `stopRecording` correctly retrieves recorded actions from Extension state
+  - `saveScenario` now successfully saves scenarios to correct project directory
+  - Recording state properly synchronized between Extension popup and MCP tools
+  - Location: `index.js` (startRecording/stopRecording/saveScenario handlers)
+- **Navigation timeout for slow websites** — Fixed timeout errors on sites with continuous loading
+  - Increased navigation timeout from 30s to 60s
+  - Changed wait strategy from `networkidle2` to `domcontentloaded` (less strict)
+  - Fixes timeout errors on sites like Yahoo that continuously load ads and tracking scripts
+  - Location: `browser/page-manager.js:188-193`
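The navigation change in 3.1.1 maps onto the options object that Puppeteer's `page.goto` accepts. The sketch below only captures the two values named in the notes; the constant name is an assumption and the actual code in `browser/page-manager.js` may be structured differently.

```javascript
// Values taken from the changelog entry; the constant name is hypothetical.
const NAVIGATION_OPTIONS = Object.freeze({
  waitUntil: "domcontentloaded", // previously "networkidle2"
  timeout: 60_000,               // milliseconds; previously 30_000
});

// Intended use with a Puppeteer page (not executed here):
//   await page.goto(url, NAVIGATION_OPTIONS);
console.log(NAVIGATION_OPTIONS.waitUntil, NAVIGATION_OPTIONS.timeout);
```

`domcontentloaded` resolves once the HTML is parsed, so pages that keep loading ads or trackers no longer hold the navigation open until the timeout.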
+### Changed
+- **Improved WebSocket message handling** — Better error reporting and state management
+  - Bridge now properly forwards recording commands to Extension
+  - MCP server correctly receives recording state updates from Bridge
+  - Clear error messages when Extension is not connected or recording fails
+## [3.1.0] - 2026-01-26
+### Added
+- **Native Messaging Bridge Architecture** — complete rewrite of Extension ↔ MCP communication
+  - Bridge Service runs as Native Messaging Host (launched by Chrome with Extension)
+  - MCP servers connect as WebSocket clients (not servers)
+  - Supports 0-8 simultaneous MCP clients connecting/disconnecting at any time
+  - Full state (tabs, recordings) sent immediately on client connect
+  - No more scanning delays — instant connection to persistent Bridge
+- **CLI commands for Bridge management**
+  - `--install-bridge` — Install Native Messaging Bridge (one-time setup)
+  - `--uninstall-bridge` — Remove Bridge installation
+  - `--check-bridge` — Verify Bridge is installed
+  - `--help` — Show all CLI options
+- **Stable Extension ID** via manifest key
+  - Extension ID is now deterministic: `dmehkibmncgphijnigkahhlekgajhpbl`
+  - Required for Native Messaging Host registration
+- **New extension icons** — Chrome/robot themed design (16, 48, 128px)
+### Changed
+- **Extension is now Event Producer** — sends all events to Bridge, doesn't manage WebSocket connections
+- **MCP is now Event Consumer** — connects to Bridge as client, receives state on demand
+- **Bridge lifecycle** — starts with Chrome Extension, stops when Chrome closes
+- Removed port scanning (9223-9227) — Bridge uses single fixed port 9223
+### Architecture
+```
+Chrome Extension (producer) → Native Messaging → Bridge Service (:9223) ← WebSocket ← MCP clients (0-8)
+```
+### Migration
+1. Run `npx chrometools-mcp --install-bridge` once
+2. Reload Extension in chrome://extensions
+3. Use normally — MCP auto-connects to Bridge
+## [3.0.4] - 2026-01-26
+### Added
+- **Smart tab tracking for scenario recording**
+  - Recording automatically follows the active tab
+  - When user switches tabs during recording, an `openTab` action is recorded
+  - When user opens new tab, an `openTab` action is recorded
+  - Actions from non-active tabs are automatically filtered out
+  - `openTab` action ensures the tab exists during playback (opens a tab with the URL if it does not exist)
+  - Scenario executor supports `openTab` with automatic tab reuse and creation
+- **MCP tools for programmatic recording control**
+  - `startRecording` - Start recording from AI/code
+  - `stopRecording` - Stop and retrieve recorded actions
+  - `getRecorderState` - Query current recording state
+  - `saveScenario` - Save recorded actions as scenario
+  - AI agents can now fully control recording without manual interaction
+### Changed
+- Recording now tracks `currentTabId` instead of being locked to `startTabId`
+- Content scripts only send actions when their tab is the active recording target
+- Replaced `switchTab`/`newTab` with unified `openTab` action type
+- `openTab` intelligently checks if tab already exists before creating new one
+- `enableRecorder` now mentions programmatic control tools
+## [3.0.3] - 2026-01-25
+### Added
+- **Multi-instance MCP server support** via dynamic port allocation
+  - MCP server automatically finds available port from range 9223-9227
+  - Chrome Extension scans for running MCP instances every 20 seconds (port scanning)
+  - Extension connects to multiple MCP servers simultaneously (broadcast pattern)
+  - Enables multiple AI clients (Claude Desktop, Telegram bot, etc.) to work in parallel
+  - Handles abrupt shutdowns (e.g. `kill -9`) gracefully via `WebSocket.onclose`
+### Changed
+- **Auto-sync active tab when user switches tabs manually**
+  - MCP server now syncs Puppeteer's `lastPage` when extension reports `tab_activated`
+  - Callback pattern avoids circular dependencies between websocket-bridge and page-manager
+  - MCP commands automatically target the user's currently active tab
+### Fixed
+- **Input recording deduplication** in Chrome Extension
+  - Extension now records only final text value after blur/Enter or 1.5s pause
+  - Eliminated intermediate keystroke recordings (e.g., "test" → "test1" recorded as one action)
+  - Improved debouncing with `inputStartValues` tracking
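The input deduplication rule in 3.0.3 (one action per edit, committed on blur/Enter or after a 1.5s pause) can be illustrated offline on a list of timestamped events. This is a sketch under assumptions: the event shape `{ field, value, time, commit }` and the function name are hypothetical, and the extension itself works with live DOM events and timers rather than a pre-recorded array.

```javascript
const PAUSE_MS = 1500; // pause length from the changelog entry

// events: [{ field, value, time, commit }] sorted by time (hypothetical shape).
// `commit` marks a blur or Enter. Returns one action per editing burst.
function collapseInputEvents(events, pauseMs = PAUSE_MS) {
  const actions = [];
  const pending = new Map(); // field -> { value, lastTime }

  const flush = (field) => {
    const burst = pending.get(field);
    if (burst === undefined) return;
    pending.delete(field);
    actions.push({ type: "type", field, value: burst.value });
  };

  for (const ev of events) {
    const burst = pending.get(ev.field);
    // A long enough pause ends the previous burst before this event is applied.
    if (burst !== undefined && ev.time - burst.lastTime >= pauseMs) {
      flush(ev.field);
    }
    pending.set(ev.field, { value: ev.value, lastTime: ev.time });
    if (ev.commit) flush(ev.field);
  }

  // Whatever is still pending when the recording ends is flushed too.
  for (const field of [...pending.keys()]) flush(field);
  return actions;
}

const keystrokes = [
  { field: "q", value: "t", time: 0 },
  { field: "q", value: "te", time: 100 },
  { field: "q", value: "test", time: 300 },
  { field: "q", value: "test1", time: 2500 },
  { field: "q", value: "test12", time: 2600, commit: true },
];
console.log(collapseInputEvents(keystrokes).map((a) => a.value)); // [ 'test', 'test12' ]
```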
+## [3.0.2] - 2026-01-25
+### Added
+- **Extension installation instructions for AI agents**
+  - `listTabs`, `switchTab`, `enableRecorder` now include install steps when extension not connected
+  - `openBrowser` shows warning when connected to existing Chrome (extension needs manual install)
+  - Clear step-by-step instructions with extension path for manual installation
+  - Alternative fix: close all Chrome windows and restart MCP for auto-install
+### Changed
+- Improved extension status reporting with `extensionConnected` flag in responses
+## [3.0.1] - 2026-01-25
+### Fixed
+- **Multi-tab support** - Fixed extension-based tab switching
+  - `switchTab` now uses Chrome Extension for reliable tab switching
+  - Auto-connects Puppeteer to switched tab (fixes `analyzePage` after tab switch)
+  - `puppeteerConnected: true` in response confirms Puppeteer sync
+## [3.0.0] - 2026-01-25
+### BREAKING CHANGES
+- **Chrome Extension for Recording** - Scenario recording now requires Chrome Extension
+  - Old HTML-injection recorder (`injectRecorder`) removed
+  - Extension auto-loads when Chrome is started by chrometools-mcp
+  - Recording controlled via Extension popup (click CT icon in toolbar)
+### Added
+- **ChromeTools Chrome Extension** (`extension/` folder)
+  - Full tab tracking via Chrome tabs API (catches ALL new tabs including Ctrl+T, context menu)
+  - Scenario recording via content script (works across all domains)
+  - Popup UI for recording control
+  - WebSocket connection to MCP server for real-time communication
+- **WebSocket Bridge** (`server/websocket-bridge.js`)
+  - Bidirectional communication between Extension and MCP server
+  - Port 9223 (CHROME_DEBUG_PORT + 1)
+  - Tab state sync, recorder commands, scenario save/list
+- **Auto-load Extension** - Chrome launched with `--load-extension` flag
+  - Extension automatically installed when Chrome starts
+  - No manual installation required
+### Changed
+- `enableRecorder` tool now checks Extension connection status instead of injecting HTML
+- Tab tracking improved: Extension provides complete tab list including manually opened tabs
+- Recording state persisted in `chrome.storage.local` (survives cross-domain navigation)
+### Removed
+- HTML injection functionality from `recorder-script.js` (the file itself is kept for reference)
+- `pagesWithRecorder` tracking (Extension handles this now)
+- `setupRecorderAutoReinjection` function (Extension handles this now)
+## [2.9.0] - 2026-01-25
+### Added
+- **Automatic new tab detection** - Tracks tabs opened via `window.open()`, `target="_blank"`, or user actions
+  - New tabs automatically become the active page
+  - Network monitoring, console capture, and recorder auto-injection enabled on new tabs
+  - New tab events queued for AI notification via `listTabs`
+- **`listTabs` tool** - List all open browser tabs
+  - Returns: `{ tabs: [{ index, url, title, isActive }], totalCount }`
+  - Includes `newTabsDetected` array when new tabs were opened since last check
+  - Use tab index with `switchTab` to change active tab
+- **`switchTab` tool** - Switch between browser tabs
+  - Parameters: `tab` - Tab index (number) or URL pattern (string, partial match)
+  - Makes the specified tab active for subsequent commands
+  - Returns: `{ success, switchedTo: { url, title } }`
+### Changed
+- `openPages` Map now tracks all tabs including those opened externally
+- Browser `targetcreated` event handler added for automatic tab tracking
+## [2.8.0] - 2026-01-25
+### Added
+- **`getElementByApomId` tool** - Get detailed element information by APOM ID
+  - Parameters: `id` (required) - APOM element ID from analyzePage (e.g., `"input_20"`)
+  - Returns: Full element details including bounds, attributes, computed styles, visibility
+  - Use case: Inspect specific elements without re-analyzing entire page
+### Changed
+- **APOM format optimization** - ~82% token reduction
+  - Tree-structured output with hierarchical parent-child relationships
+  - Minified JSON output (no pretty printing)
+  - Parent nodes contain only position info (no bounds/metadata)
+  - Interactive elements retain full details (bounds, type, metadata)
+  - Groups section for radio/checkbox groups with options
+- **Separate `id` and `selector` parameters** for click, type, hover, selectOption
+  - **PREFERRED**: Use `id` parameter with APOM ID from analyzePage (e.g., `click({ id: "button_45" })`)
+  - **ALTERNATIVE**: Use `selector` parameter with CSS selector (e.g., `click({ selector: ".submit" })`)
+  - Parameters are mutually exclusive (use one or the other)
+  - Makes API clearer: agent knows exactly what it's passing
+  - Updated tool descriptions with PREFERRED/ALTERNATIVE guidance
+### Removed
+- **`registerPageObject` tool** - No longer needed
+  - `analyzePage()` now automatically registers elements with unique IDs
+  - Use APOM IDs directly with click/type/hover/selectOption tools
+  - Simplifies workflow: just call `analyzePage()` and use the returned IDs
+### Performance
+- APOM token usage reduced from ~31,000 to ~5,684 tokens on typical pages
+- Tree structure eliminates redundant parent information
+- Minified JSON further reduces output size
+## [2.7.0] - 2026-01-25
+### 🔄 BREAKING CHANGE
+- **`analyzePage` now returns APOM format by default**
+  - Previous default was legacy format, now APOM is default
+  - Use `useLegacyFormat: true` to get old format if needed
+  - Migration: callers that depend on the legacy output must pass `useLegacyFormat: true`
+  - Rationale: APOM is superior - provides unique IDs, better structure, automatic registration
+### Added
+- **🎉 Agent Page Object Model (APOM) - Now Default Format**
+  - `analyzePage()` returns structured APOM format (no parameters needed!)
+  - New parameter: `useLegacyFormat` - Return old format (default: false)
+  - Parameter: `registerElements` - Auto-register elements (default: true)
+  - Parameter: `groupBy: 'type' | 'flat'` - Control element grouping
+  - Returns: `{ pageId, url, title, timestamp, elements, groups, metadata }`
+  - Each element gets unique ID: `input_0`, `button_1`, `form_0`, `radio_0`, `checkbox_0`
+  - Elements automatically registered in persistent `window.__ELEMENT_REGISTRY__`
+  - **Use IDs instead of CSS selectors**: `type({ id: "input_0", text: "..." })`
+  - **Backward compatible**: Set `useLegacyFormat: true` for old format
+  - **Tested**: Fully functional with real-world forms
+- **New POM Modules**
+  - `pom/element-id-generator.js` (171 lines) - Smart ID generation
+    - Priority: data-testid > id attribute > semantic path + index
+    - Supports: input, button, link, form, textarea, select, radio, checkbox
+  - `pom/apom-converter.js` (294 lines) - Convert analyzePage to APOM
+    - Transforms legacy format to structured model
+    - Groups elements by type (forms, inputs, buttons, links)
+    - Generates pageId, metadata
+- **Input Models Architecture** - Modular input handling system
+  - New `models/` directory with specialized input handlers
+  - `BaseInputModel` - Abstract base class with common interface (setValue, getValue, clear, focus)
+  - `TextInputModel` - Default for text-like inputs (text, email, password, search, tel, url)
+  - `TimeInputModel` - Correct handling for `input[type="time"]` via JavaScript value assignment
+  - `DateInputModel` - Handles date, datetime-local, month, week inputs
+  - `ColorInputModel` - Color picker input handling
+  - `RangeInputModel` - Slider/range input handling
+  - `SelectModel` - HTML `<select>` element handling
+  - `CheckboxModel` - Single checkbox toggle
+  - `TextareaModel` - Multi-line text input
+  - `InputModelFactory` - Factory pattern for selecting appropriate model
+  - Fixes issue where keyboard input didn't work for time inputs (only minutes showed)
+- **Radio/Checkbox Group Models** - Abstract group-level operations
+  - `RadioGroupModel` - Select single option from radio group by name, value, or label text
+  - `CheckboxGroupModel` - Multi-select from checkbox group with modes:
+    - `set` - Replace all selections
+    - `add` - Check additional values
+    - `remove` - Uncheck specific values
+    - `toggle` - Flip specific values
+- **`selectFromGroup` tool** - New MCP tool for radio/checkbox group selection
+  - Parameters: `name` (required), `value`, `values`, `text`, `texts`, `mode`, `by`
+  - Works with radio groups (single selection) and checkbox groups (multi-selection)
+  - Match by value attribute or label text (`by: 'value' | 'text' | 'auto'`)
+  - Example: `selectFromGroup({ name: "size", value: "large" })`
+  - Example: `selectFromGroup({ name: "toppings", values: ["cheese", "bacon"], mode: "add" })`
+- **Radio/Checkbox Groups in `analyzePage`**
+  - APOM output now includes `groups` section with radio and checkbox groups
+  - Each group shows: name, all options with values, labels, checked state
+  - Labels extracted from: parent `<label>`, `<label for="id">`, aria-label attribute
+  - Example output:
+    ```json
+    "groups": {
+      "radio": {
+        "size": {
+          "options": [
+            { "value": "small", "label": "Small", "checked": false },
+            { "value": "large", "label": "Large", "checked": true }
+          ]
+        }
+      },
+      "checkbox": { ... }
+    }
+    ```
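The four `CheckboxGroupModel` modes listed above (`set`, `add`, `remove`, `toggle`) amount to set operations on the checked values. A minimal sketch, assuming a hypothetical helper that works on plain arrays rather than DOM checkboxes; it is not the code from `models/CheckboxGroupModel.js`.

```javascript
// Returns the new set of checked values; `current` is not mutated.
// Hypothetical helper for illustration only.
function applyCheckboxMode(current, values, mode) {
  const checked = new Set(current);
  switch (mode) {
    case "set": // replace all selections
      return new Set(values);
    case "add": // check additional values
      for (const v of values) checked.add(v);
      return checked;
    case "remove": // uncheck specific values
      for (const v of values) checked.delete(v);
      return checked;
    case "toggle": // flip specific values
      for (const v of values) {
        if (checked.has(v)) checked.delete(v);
        else checked.add(v);
      }
      return checked;
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

const toppings = ["cheese"];
console.log([...applyCheckboxMode(toppings, ["bacon"], "add")]);            // [ 'cheese', 'bacon' ]
console.log([...applyCheckboxMode(toppings, ["cheese", "olives"], "toggle")]); // [ 'olives' ]
```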
+### Fixed
+- **Critical Bug**: Variable shadowing in analyzePage APOM conversion (commit e1e63e2)
+  - Fixed `const analysis` shadowing in else block causing undefined analysis
+- **Critical Bug**: Element registry not persisting across page.evaluate calls (commit faadd0e)
+  - Changed `const elementRegistry = new Map()` to `window.__ELEMENT_REGISTRY__`
+  - Registry now persists between tool calls
+  - All selector-resolver functions exported to window
+- **API Error**: `oneOf` not supported at top level in tool schemas
+  - Removed `oneOf` blocks from click, type, hover, selectOption tool definitions
+  - Both `id` and `selector` parameters now optional with description indicating one is required
+  - Fixes error: `tools.19.custom.input_schema: input_schema does not support oneOf, allOf, or anyOf at the top level`
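The `oneOf` fix above moves the "one of `id` or `selector`" rule out of the schema and into a runtime check. The sketch below shows one way that could look; the schema text, the `pickTarget` helper, and the decision to reject calls that pass both parameters are assumptions, not the package's actual definitions.

```javascript
// Illustrative schema for a click-like tool; descriptions are assumptions.
const clickInputSchema = {
  type: "object",
  properties: {
    id: { type: "string", description: "APOM ID from analyzePage. Provide this or `selector`." },
    selector: { type: "string", description: "CSS selector. Provide this or `id`." },
  },
  // No top-level oneOf/anyOf/allOf and no `required`: the rule is checked at call time.
};

// Hypothetical runtime check replacing the schema-level oneOf.
function pickTarget(args) {
  const hasId = typeof args.id === "string" && args.id !== "";
  const hasSelector = typeof args.selector === "string" && args.selector !== "";
  if (!hasId && !hasSelector) throw new Error("One of `id` or `selector` is required");
  if (hasId && hasSelector) throw new Error("Pass either `id` or `selector`, not both");
  return hasId
    ? { kind: "apomId", value: args.id }
    : { kind: "css", value: args.selector };
}

console.log(pickTarget({ id: "button_45" }));     // { kind: 'apomId', value: 'button_45' }
console.log(pickTarget({ selector: ".submit" })); // { kind: 'css', value: '.submit' }
```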
+### Changed
+- **`analyzePage` enhanced with APOM support**
+  - Now supports both legacy and APOM formats
+  - Cache logic updated to handle generateIds parameter
+  - Elements automatically registered when generateIds=true
+  - Radio/checkbox elements now include label text
+- **`utils/selector-resolver.js` updated for persistence**
+  - Registry stored in `window.__ELEMENT_REGISTRY__` instead of local const
+  - All functions (registerElement, resolveSelector, etc.) exported to window
+  - Survives across multiple page.evaluate contexts
+- **`navigateTo` auto-opens browser** - No longer throws error when no page is open
+  - Automatically opens browser at specified URL if no page is currently open
+  - Eliminates need to manually call `openBrowser` before navigation
+  - Falls back gracefully with informative message
+- **`type` tool uses Input Models** - Automatically selects appropriate model based on input type
+  - Time inputs now correctly set full value (e.g., "18:30" not just "30")
+  - Date inputs work without keyboard simulation issues
+  - All specialized inputs handled by their respective models
+### Technical Details
+- APOM conversion happens in browser context via page.evaluate
+- Element IDs remain stable across page refreshes (based on testid/id/structure)
+- Dual selector mode: all tools accept both IDs and CSS selectors
+- Input models use JavaScript value assignment with proper event dispatching
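The ID priority stated for `pom/element-id-generator.js` (data-testid, then the id attribute, then a structural fallback) can be sketched as a simple fallback chain. Everything concrete here is an assumption for illustration: the descriptor shape `{ type, testId, id }`, returning the testid or id value unchanged, and the per-type counter used for the fallback. The real generator works on DOM elements and may format IDs differently.

```javascript
// Hypothetical descriptor: { type, testId, id }. ID formats are assumed.
function createIdGenerator() {
  const nextIndex = new Map(); // element type -> next free index

  return function generateId(element) {
    if (element.testId) return element.testId; // 1. data-testid wins
    if (element.id) return element.id;         // 2. then the id attribute
    const index = nextIndex.get(element.type) ?? 0; // 3. type plus running index
    nextIndex.set(element.type, index + 1);
    return `${element.type}_${index}`;
  };
}

const generateId = createIdGenerator();
console.log(generateId({ type: "input", testId: "login-email" })); // login-email
console.log(generateId({ type: "input", id: "password" }));        // password
console.log(generateId({ type: "input" }));                        // input_0
console.log(generateId({ type: "button" }));                       // button_0
console.log(generateId({ type: "input" }));                        // input_1
```

Because the first two branches depend only on attributes, they give the same ID after a page refresh, which is the stability property the notes describe.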
+## [2.6.0] - 2026-01-25
+### Added
+- **UI Framework Detection** - Automatic detection of UI component libraries (MUI, Ant Design, Chakra UI, Bootstrap, Vuetify, Semantic UI)
+  - New utility: `utils/ui-framework-detector.js`
+  - Detects framework name, version, and component type for each element
+  - Integrated into `analyzePage` - all elements now include `uiFramework` field
+  - Extracts options from both native `<select>` and custom framework dropdowns
+- **Enhanced Select/Dropdown Options Extraction** - Smart extraction of dropdown options from various UI libraries
+  - Native HTML `<select>` with `<optgroup>` support
+  - Material-UI (MUI) Select components
+  - Ant Design Select components
+  - Chakra UI, Bootstrap, Vuetify, Semantic UI dropdowns
+  - Options include: value, text, index, selected, disabled, group
+  - Handles cases where options aren't rendered until dropdown opens (with informative notes)
+- **Page Object ID Support** - Use element IDs instead of CSS selectors
+  - New utility: `utils/selector-resolver.js` - Registry for Page Object element IDs
+  - New tool: `registerPageObject` - Register elements from Page Object for use with IDs
+  - **Backward compatible**: All interaction tools (click, type, selectOption, hover, etc.) now accept BOTH:
+    - Page Object IDs (e.g., `"login_email_input"`)
+    - CSS selectors (e.g., `"input[name='email']"`)
+  - Element registry persists in page context between tool calls
+- **`registerPageObject` tool** - Register Page Object elements for ID-based interaction
+  - Parameters:
+    - `elements` (required) - Array of {id, selector, metadata}
+    - `clearExisting` (optional) - Clear registry before registering
+  - Enables using meaningful IDs instead of fragile CSS selectors
+  - Example: After registering, use `click("login_submit_button")` instead of `click("button[type='submit']")`
+- **Enhanced Page Object Generation** - Page Objects now include comprehensive element information
+  - Each element gets unique ID: `{name}_{timestamp}_{random}`
+  - Select elements include full options array with groups
+  - UI framework detection for all elements
+  - Metadata includes: type, label, placeholder, required, validation hints
+### Changed
+- **`analyzePage` enhanced with UI framework detection**
+  - All form fields and inputs now include `uiFramework` field
+  - Select elements use smart extraction: works with both vanilla HTML and UI frameworks
+  - Better handling of MUI, Ant Design, and other component libraries
+- **All interaction tools now support dual selector mode**
+  - Tools affected: `click`, `type`, `selectOption`, `hover`, `scrollTo`, `drag`, `setStyles`
+  - Automatically resolves Page Object IDs to CSS selectors
+  - Error messages indicate whether identifier was Page Object ID or CSS selector
+  - No breaking changes - existing CSS selector usage works unchanged
+### Technical Details
+- Selector resolution happens in page context using injected `selector-resolver.js`
+- UI framework detection uses class names, data attributes, and DOM structure analysis
+- Element registry stored in browser page context (survives navigation within same page)
+- New helper function `resolveSelector(page, identifier)` in `index.js`
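The dual selector mode in 2.6.0 comes down to a registry lookup with a CSS fallback. The following is a minimal sketch, not the package's `resolveSelector(page, identifier)`: it uses a plain in-memory `Map` instead of the page context, and the names `registerElements` and `resolveIdentifier` are hypothetical. Only the `elements` entries of `{ id, selector, metadata }` and the `clearExisting` option come from the notes above.

```javascript
const registry = new Map(); // Page Object ID -> { selector, metadata }

// Mirrors the documented registerPageObject parameters.
function registerElements(elements, { clearExisting = false } = {}) {
  if (clearExisting) registry.clear();
  for (const { id, selector, metadata = {} } of elements) {
    registry.set(id, { selector, metadata });
  }
}

// Registered IDs resolve to their selector; anything else is treated as CSS.
function resolveIdentifier(identifier) {
  const entry = registry.get(identifier);
  return entry
    ? { selector: entry.selector, source: "pageObjectId" }
    : { selector: identifier, source: "cssSelector" };
}

registerElements([{ id: "login_submit_button", selector: "button[type='submit']" }]);
console.log(resolveIdentifier("login_submit_button")); // { selector: "button[type='submit']", source: 'pageObjectId' }
console.log(resolveIdentifier("input[name='email']")); // { selector: "input[name='email']", source: 'cssSelector' }
```

Returning the `source` alongside the selector is what lets an error message say whether the identifier was a Page Object ID or a CSS selector.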
 ## [2.5.0] - 2026-01-21
 ### Added