chrometools-mcp 2.4.2 → 3.1.2

Files changed (48)
  1. package/CHANGELOG.md +540 -0
  2. package/COMPONENT_MAPPING_SPEC.md +1217 -0
  3. package/README.md +494 -38
  4. package/bridge/bridge-client.js +472 -0
  5. package/bridge/bridge-service.js +399 -0
  6. package/bridge/install.js +241 -0
  7. package/browser/browser-manager.js +107 -2
  8. package/browser/page-manager.js +226 -69
  9. package/docs/CHROME_EXTENSION.md +219 -0
  10. package/docs/PAGE_OBJECT_MODEL_CONCEPT.md +1756 -0
  11. package/element-finder-utils.js +138 -28
  12. package/extension/background.js +643 -0
  13. package/extension/content.js +715 -0
  14. package/extension/icons/create-icons.js +164 -0
  15. package/extension/icons/icon128.png +0 -0
  16. package/extension/icons/icon16.png +0 -0
  17. package/extension/icons/icon48.png +0 -0
  18. package/extension/manifest.json +58 -0
  19. package/extension/popup/popup.css +437 -0
  20. package/extension/popup/popup.html +102 -0
  21. package/extension/popup/popup.js +415 -0
  22. package/extension/recorder-overlay.css +93 -0
  23. package/figma-tools.js +120 -0
  24. package/index.js +3347 -2518
  25. package/models/BaseInputModel.js +93 -0
  26. package/models/CheckboxGroupModel.js +199 -0
  27. package/models/CheckboxModel.js +103 -0
  28. package/models/ColorInputModel.js +53 -0
  29. package/models/DateInputModel.js +67 -0
  30. package/models/RadioGroupModel.js +126 -0
  31. package/models/RangeInputModel.js +60 -0
  32. package/models/SelectModel.js +97 -0
  33. package/models/TextInputModel.js +34 -0
  34. package/models/TextareaModel.js +59 -0
  35. package/models/TimeInputModel.js +49 -0
  36. package/models/index.js +122 -0
  37. package/package.json +3 -2
  38. package/pom/apom-converter.js +267 -0
  39. package/pom/apom-tree-converter.js +515 -0
  40. package/pom/element-id-generator.js +175 -0
  41. package/recorder/page-object-generator.js +16 -0
  42. package/recorder/scenario-executor.js +80 -2
  43. package/server/tool-definitions.js +839 -656
  44. package/server/tool-groups.js +3 -2
  45. package/server/tool-schemas.js +367 -296
  46. package/server/websocket-bridge.js +447 -0
  47. package/utils/selector-resolver.js +186 -0
  48. package/utils/ui-framework-detector.js +392 -0
package/CHANGELOG.md CHANGED
@@ -2,6 +2,546 @@
2
2
 
3
3
  All notable changes to this project will be documented in this file.
4
4
 
5
+ ## [3.1.2] - 2026-01-26
6
+
7
+ ### Added
8
+ - **Multi-tab scenario recording** — Automatic recording of tab switches during scenario capture
9
+ - When the user switches tabs during recording, an `openTab` action is automatically recorded
10
+ - Records tab URL, title, and switch reason for accurate playback
11
+ - Only records if switching to a different tab (ignores same-tab activations)
12
+ - Location: `extension/background.js` (chrome.tabs.onActivated listener)
13
+
14
+ ### Changed
15
+ - **openTab navigation strategy** — Changed from `networkidle2` to `domcontentloaded` in scenario executor
16
+ - Prevents timeout errors when opening tabs with continuous ad/tracking loading
17
+ - Consistent with other navigation improvements in 3.1.1
18
+ - Location: `recorder/scenario-executor.js:979`
19
+
20
+ ### Fixed
21
+ - **openTab with empty URL uses look-ahead to next action's URL** — Smart URL detection for tab switches
22
+ - When `openTab` has an empty URL, the executor looks at the next action's `tabUrl`
23
+ - If the next action has a real URL, the executor uses it for tab opening/switching
24
+ - Fixes scenarios where a new tab was opened and its URL was loaded immediately afterwards
25
+ - Empty URLs without look-ahead match are still skipped (prevents about:blank tabs)
26
+ - Location: `recorder/scenario-executor.js:168-176, 987-990`
27
+
28
+ - **Added 500ms delay before tab switch** — Prevents race conditions during scenario playback
29
+ - Allows previous tab's pending processes (navigation, AJAX, form submissions) to complete
30
+ - Ensures stable state before switching to new tab
31
+ - Location: `recorder/scenario-executor.js:990-992`
32
+
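The look-ahead rule for `openTab` in 3.1.2 can be sketched as below. This is a minimal illustration, not the code in `recorder/scenario-executor.js`: the helper name `resolveOpenTabUrl`, the `url` field name, and the treatment of `about:blank` as "not a real URL" are assumptions. Only the `openTab` action type and the `tabUrl` field come from the changelog.

```javascript
// Hypothetical helper: decide which URL an `openTab` action should use.
// Returns null when the action should be skipped.
function resolveOpenTabUrl(actions, index) {
  const isRealUrl = (u) => typeof u === "string" && u !== "" && u !== "about:blank";

  const action = actions[index];
  if (isRealUrl(action.url)) return action.url;

  // Empty URL: look ahead to the next recorded action's tabUrl.
  const next = actions[index + 1];
  if (next && isRealUrl(next.tabUrl)) return next.tabUrl;

  // No usable URL anywhere: skip instead of opening an about:blank tab.
  return null;
}

const scenario = [
  { type: "openTab", url: "" },
  { type: "click", tabUrl: "https://example.com/cart" },
  { type: "openTab", url: "" },
];
console.log(resolveOpenTabUrl(scenario, 0)); // https://example.com/cart
console.log(resolveOpenTabUrl(scenario, 2)); // null
```

Returning `null` rather than an empty string keeps the "skip this action" case explicit for the caller.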
33
+ ## [3.1.1] - 2026-01-26
34
+
35
+ ### Fixed
36
+ - **Scenario recording and saving flow** — Fixed critical bugs preventing scenario recording
37
+ - Fixed Extension → Bridge → MCP communication flow for recording control
38
+ - `startRecording` now properly sends command to Extension via Bridge WebSocket
39
+ - `stopRecording` correctly retrieves recorded actions from Extension state
40
+ - `saveScenario` now successfully saves scenarios to correct project directory
41
+ - Recording state properly synchronized between Extension popup and MCP tools
42
+ - Location: `index.js` (startRecording/stopRecording/saveScenario handlers)
43
+
44
+ - **Navigation timeout for slow websites** — Fixed timeout errors on sites with continuous loading
45
+ - Increased navigation timeout from 30s to 60s
46
+ - Changed wait strategy from `networkidle2` to `domcontentloaded` (less strict)
47
+ - Fixes timeout errors on sites like Yahoo that continuously load ads and tracking scripts
48
+ - Location: `browser/page-manager.js:188-193`
49
+
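A minimal sketch of the 3.1.1 navigation settings, assuming a Puppeteer-style `page` object. `page.goto` with the `waitUntil` and `timeout` options is Puppeteer's documented API. The wrapper name `navigate` and the stub page used for the demonstration are hypothetical, and this is not the code at `browser/page-manager.js:188-193`.

```javascript
// Hypothetical wrapper around Puppeteer's page.goto using the 3.1.1 settings.
const NAVIGATION_TIMEOUT_MS = 60_000; // raised from 30s

function navigate(page, url) {
  // "domcontentloaded" resolves once the HTML is parsed, so pages that keep
  // loading ads or trackers no longer hold navigation open as "networkidle2" did.
  return page.goto(url, { waitUntil: "domcontentloaded", timeout: NAVIGATION_TIMEOUT_MS });
}

// Stub standing in for a real Puppeteer page, so the sketch runs on its own.
const calls = [];
const fakePage = {
  goto(url, options) {
    calls.push({ url, options });
    return Promise.resolve(null);
  },
};

navigate(fakePage, "https://example.com/");
console.log(calls[0].options.waitUntil); // domcontentloaded
```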
50
+ ### Changed
51
+ - **Improved WebSocket message handling** — Better error reporting and state management
52
+ - Bridge now properly forwards recording commands to Extension
53
+ - MCP server correctly receives recording state updates from Bridge
54
+ - Clear error messages when Extension is not connected or recording fails
55
+
56
+ ## [3.1.0] - 2026-01-26
57
+
58
+ ### Added
59
+ - **Native Messaging Bridge Architecture** — complete rewrite of Extension ↔ MCP communication
60
+ - Bridge Service runs as Native Messaging Host (launched by Chrome with Extension)
61
+ - MCP servers connect as WebSocket clients (not servers)
62
+ - Supports 0-8 simultaneous MCP clients connecting/disconnecting at any time
63
+ - Full state (tabs, recordings) sent immediately on client connect
64
+ - No more scanning delays — instant connection to persistent Bridge
65
+
66
+ - **CLI commands for Bridge management**
67
+ - `--install-bridge` — Install Native Messaging Bridge (one-time setup)
68
+ - `--uninstall-bridge` — Remove Bridge installation
69
+ - `--check-bridge` — Verify Bridge is installed
70
+ - `--help` — Show all CLI options
71
+
72
+ - **Stable Extension ID** via manifest key
73
+ - Extension ID is now deterministic: `dmehkibmncgphijnigkahhlekgajhpbl`
74
+ - Required for Native Messaging Host registration
75
+
76
+ - **New extension icons** — Chrome/robot themed design (16, 48, 128px)
77
+
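The reason a stable Extension ID matters: a Chrome Native Messaging host manifest lists the extension origins allowed to launch the host, so the ID has to be known at install time. The sketch below builds such a manifest as a plain object. The field names follow Chrome's Native Messaging manifest format; the host name, description, and executable path are placeholders, and the manifest actually written by `--install-bridge` is not shown in this changelog.

```javascript
// Extension ID taken from the changelog; everything else below is illustrative.
const EXTENSION_ID = "dmehkibmncgphijnigkahhlekgajhpbl";

// Build a Chrome Native Messaging host manifest for a host name and executable path.
function buildHostManifest(hostName, bridgePath) {
  return {
    name: hostName,
    description: "Bridge between the Chrome Extension and MCP clients",
    path: bridgePath,
    type: "stdio",
    // Only the extension with this exact ID may launch the host.
    allowed_origins: [`chrome-extension://${EXTENSION_ID}/`],
  };
}

const manifest = buildHostManifest(
  "com.example.chrometools_bridge", // placeholder host name
  "/opt/chrometools/bridge-service" // placeholder path
);
console.log(JSON.stringify(manifest, null, 2));
```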
78
+ ### Changed
79
+ - **Extension is now Event Producer** — sends all events to Bridge, doesn't manage WebSocket connections
80
+ - **MCP is now Event Consumer** — connects to Bridge as client, receives state on demand
81
+ - **Bridge lifecycle** — starts with Chrome Extension, stops when Chrome closes
82
+ - Removed port scanning (9223-9227) — Bridge uses single fixed port 9223
83
+
84
+ ### Architecture
85
+ ```
86
+ Chrome Extension (producer) → Native Messaging → Bridge Service (:9223) ← WebSocket ← MCP clients (0-8)
87
+ ```
88
+
89
+ ### Migration
90
+ 1. Run `npx chrometools-mcp --install-bridge` once
91
+ 2. Reload Extension in chrome://extensions
92
+ 3. Use normally — MCP auto-connects to Bridge
93
+
94
+ ## [3.0.4] - 2026-01-26
95
+
96
+ ### Added
97
+ - **Smart tab tracking for scenario recording**
98
+ - Recording automatically follows the active tab
99
+ - When the user switches tabs during recording, an `openTab` action is recorded
100
+ - When the user opens a new tab, an `openTab` action is recorded
101
+ - Actions from non-active tabs are automatically filtered out
102
+ - `openTab` action ensures the tab exists during playback (opens a tab with the URL if none exists)
103
+ - Scenario executor supports `openTab` with automatic tab reuse and creation
104
+ - **MCP tools for programmatic recording control**
105
+ - `startRecording` - Start recording from AI/code
106
+ - `stopRecording` - Stop and retrieve recorded actions
107
+ - `getRecorderState` - Query current recording state
108
+ - `saveScenario` - Save recorded actions as scenario
109
+ - AI agents can now fully control recording without manual interaction
110
+
111
+ ### Changed
112
+ - Recording now tracks `currentTabId` instead of being locked to `startTabId`
113
+ - Content scripts only send actions when their tab is the active recording target
114
+ - Replaced `switchTab`/`newTab` with unified `openTab` action type
115
+ - `openTab` intelligently checks if tab already exists before creating new one
116
+ - `enableRecorder` now mentions programmatic control tools
117
+
118
+ ## [3.0.3] - 2026-01-25
119
+
120
+ ### Added
121
+ - **Multi-instance MCP server support** via dynamic port allocation
122
+ - MCP server automatically finds available port from range 9223-9227
123
+ - Chrome Extension scans for running MCP instances every 20 seconds (port scanning)
124
+ - Extension connects to multiple MCP servers simultaneously (broadcast pattern)
125
+ - Enables multiple AI clients (Claude Desktop, Telegram bot, etc.) to work in parallel
126
+ - Graceful handling of ungraceful shutdowns (kill -9) via WebSocket.onclose
127
+
128
+ ### Changed
129
+ - **Auto-sync active tab when user switches tabs manually**
130
+ - MCP server now syncs Puppeteer's `lastPage` when extension reports `tab_activated`
131
+ - Callback pattern avoids circular dependencies between websocket-bridge and page-manager
132
+ - MCP commands automatically target the user's currently active tab
133
+
134
+ ### Fixed
135
+ - **Input recording deduplication** in Chrome Extension
136
+ - Extension now records only final text value after blur/Enter or 1.5s pause
137
+ - Eliminated intermediate keystroke recordings (e.g., "test" → "test1" recorded as one action)
138
+ - Improved debouncing with `inputStartValues` tracking
139
+
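The deduplication rule (record only the final value after blur/Enter or a 1.5s pause) can be illustrated offline with a reducer over timestamped events. This is a sketch under assumptions: the extension itself presumably uses timers in the content script, and the event shape, the function name `collapseInputEvents`, and the recorded action shape are all hypothetical.

```javascript
const PAUSE_MS = 1500;

// Hypothetical reducer: collapse raw events for one field into recorded actions.
function collapseInputEvents(events) {
  const recorded = [];
  let pending = null; // last unrecorded { value, t }

  const flush = () => {
    if (pending !== null) recorded.push({ action: "type", text: pending.value });
    pending = null;
  };

  for (const ev of events) {
    // A long enough gap since the last keystroke commits what was typed so far.
    if (pending !== null && ev.t - pending.t >= PAUSE_MS) flush();

    if (ev.kind === "input") pending = { value: ev.value, t: ev.t };
    else flush(); // "blur" or "enter" commits immediately
  }
  flush(); // end of recording commits any remainder
  return recorded;
}

const events = [
  { kind: "input", value: "t", t: 0 },
  { kind: "input", value: "te", t: 100 },
  { kind: "input", value: "tes", t: 200 },
  { kind: "input", value: "test", t: 300 },
  { kind: "input", value: "test1", t: 700 },
  { kind: "blur", t: 900 },
];
console.log(collapseInputEvents(events)); // [ { action: 'type', text: 'test1' } ]
```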
140
+ ## [3.0.2] - 2026-01-25
141
+
142
+ ### Added
143
+ - **Extension installation instructions for AI agents**
144
+ - `listTabs`, `switchTab`, `enableRecorder` now include install steps when extension not connected
145
+ - `openBrowser` shows warning when connected to existing Chrome (extension needs manual install)
146
+ - Clear step-by-step instructions with extension path for manual installation
147
+ - Alternative fix: close all Chrome windows and restart MCP for auto-install
148
+
149
+ ### Changed
150
+ - Improved extension status reporting with `extensionConnected` flag in responses
151
+
152
+ ## [3.0.1] - 2026-01-25
153
+
154
+ ### Fixed
155
+ - **Multi-tab support** - Fixed extension-based tab switching
156
+ - `switchTab` now uses Chrome Extension for reliable tab switching
157
+ - Auto-connects Puppeteer to switched tab (fixes `analyzePage` after tab switch)
158
+ - `puppeteerConnected: true` in response confirms Puppeteer sync
159
+
160
+ ## [3.0.0] - 2026-01-25
161
+
162
+ ### BREAKING CHANGES
163
+ - **Chrome Extension for Recording** - Scenario recording now requires Chrome Extension
164
+ - Old HTML-injection recorder (`injectRecorder`) removed
165
+ - Extension auto-loads when Chrome is started by chrometools-mcp
166
+ - Recording controlled via Extension popup (click CT icon in toolbar)
167
+
168
+ ### Added
169
+ - **ChromeTools Chrome Extension** (`extension/` folder)
170
+ - Full tab tracking via Chrome tabs API (catches ALL new tabs including Ctrl+T, context menu)
171
+ - Scenario recording via content script (works across all domains)
172
+ - Popup UI for recording control
173
+ - WebSocket connection to MCP server for real-time communication
174
+
175
+ - **WebSocket Bridge** (`server/websocket-bridge.js`)
176
+ - Bidirectional communication between Extension and MCP server
177
+ - Port 9223 (CHROME_DEBUG_PORT + 1)
178
+ - Tab state sync, recorder commands, scenario save/list
179
+
180
+ - **Auto-load Extension** - Chrome launched with `--load-extension` flag
181
+ - Extension automatically installed when Chrome starts
182
+ - No manual installation required
183
+
184
+ ### Changed
185
+ - `enableRecorder` tool now checks Extension connection status instead of injecting HTML
186
+ - Tab tracking improved: Extension provides complete tab list including manually opened tabs
187
+ - Recording state persisted in `chrome.storage.local` (survives cross-domain navigation)
188
+
189
+ ### Removed
190
+ - `recorder-script.js` HTML injection functionality (the file is kept for reference)
191
+ - `pagesWithRecorder` tracking (Extension handles this now)
192
+ - `setupRecorderAutoReinjection` function (Extension handles this now)
193
+
194
+ ## [2.9.0] - 2026-01-25
195
+
196
+ ### Added
197
+ - **Automatic new tab detection** - Tracks tabs opened via `window.open()`, `target="_blank"`, or user actions
198
+ - New tabs automatically become the active page
199
+ - Network monitoring, console capture, and recorder auto-injection enabled on new tabs
200
+ - New tab events queued for AI notification via `listTabs`
201
+
202
+ - **`listTabs` tool** - List all open browser tabs
203
+ - Returns: `{ tabs: [{ index, url, title, isActive }], totalCount }`
204
+ - Includes `newTabsDetected` array when new tabs were opened since last check
205
+ - Use tab index with `switchTab` to change active tab
206
+
207
+ - **`switchTab` tool** - Switch between browser tabs
208
+ - Parameters: `tab` - Tab index (number) or URL pattern (string, partial match)
209
+ - Makes the specified tab active for subsequent commands
210
+ - Returns: `{ success, switchedTo: { url, title } }`
211
+
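How the `tab` parameter of `switchTab` might be resolved against the `listTabs` result, as a minimal sketch. The resolver name `findTab` is hypothetical, and two details are assumptions: the numeric index is taken to equal the array position, and the first tab whose URL contains the pattern wins.

```javascript
// Hypothetical resolver for the `tab` parameter: a number is an index,
// a string is matched as a substring of the tab URL.
function findTab(tabs, tab) {
  if (typeof tab === "number") return tabs[tab] ?? null;
  return tabs.find((t) => t.url.includes(tab)) ?? null;
}

const tabs = [
  { index: 0, url: "https://example.com/", title: "Home", isActive: true },
  { index: 1, url: "https://example.com/docs/api", title: "API", isActive: false },
];

console.log(findTab(tabs, 1).title);       // API
console.log(findTab(tabs, "/docs").title); // API
console.log(findTab(tabs, "missing"));     // null
```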
212
+ ### Changed
213
+ - `openPages` Map now tracks all tabs including those opened externally
214
+ - Browser `targetcreated` event handler added for automatic tab tracking
215
+
216
+ ## [2.8.0] - 2026-01-25
217
+
218
+ ### Added
219
+ - **`getElementByApomId` tool** - Get detailed element information by APOM ID
220
+ - Parameters: `id` (required) - APOM element ID from analyzePage (e.g., `"input_20"`)
221
+ - Returns: Full element details including bounds, attributes, computed styles, visibility
222
+ - Use case: Inspect specific elements without re-analyzing entire page
223
+
224
+ ### Changed
225
+ - **APOM format optimization** - ~82% token reduction
226
+ - Tree-structured output with hierarchical parent-child relationships
227
+ - Minified JSON output (no pretty printing)
228
+ - Parent nodes contain only position info (no bounds/metadata)
229
+ - Interactive elements retain full details (bounds, type, metadata)
230
+ - Groups section for radio/checkbox groups with options
231
+
232
+ - **Separate `id` and `selector` parameters** for click, type, hover, selectOption
233
+ - **PREFERRED**: Use `id` parameter with APOM ID from analyzePage (e.g., `click({ id: "button_45" })`)
234
+ - **ALTERNATIVE**: Use `selector` parameter with CSS selector (e.g., `click({ selector: ".submit" })`)
235
+ - Parameters are mutually exclusive (use one or the other)
236
+ - Makes API clearer: agent knows exactly what it's passing
237
+ - Updated tool descriptions with PREFERRED/ALTERNATIVE guidance
238
+
239
+ ### Removed
240
+ - **`registerPageObject` tool** - No longer needed
241
+ - `analyzePage()` now automatically registers elements with unique IDs
242
+ - Use APOM IDs directly with click/type/hover/selectOption tools
243
+ - Simplifies workflow: just call `analyzePage()` and use the returned IDs
244
+
245
+ ### Performance
246
+ - APOM token usage reduced from ~31,000 to ~5,684 tokens on typical pages
247
+ - Tree structure eliminates redundant parent information
248
+ - Minified JSON further reduces output size
249
+
250
+ ## [2.7.0] - 2026-01-25
251
+
252
+ ### 🔄 BREAKING CHANGE
253
+ - **`analyzePage` now returns APOM format by default**
254
+ - Previous default was legacy format, now APOM is default
255
+ - Use `useLegacyFormat: true` to get old format if needed
256
+ - Migration: `analyzePage()` now returns APOM instead of the legacy format
257
+ - Rationale: APOM is superior - provides unique IDs, better structure, automatic registration
258
+
259
+ ### Added
260
+ - **🎉 Agent Page Object Model (APOM) - Now Default Format**
261
+ - `analyzePage()` returns structured APOM format (no parameters needed!)
262
+ - New parameter: `useLegacyFormat` - Return old format (default: false)
263
+ - Parameter: `registerElements` - Auto-register elements (default: true)
264
+ - Parameter: `groupBy: 'type' | 'flat'` - Control element grouping
265
+ - Returns: `{ pageId, url, title, timestamp, elements, groups, metadata }`
266
+ - Each element gets unique ID: `input_0`, `button_1`, `form_0`, `radio_0`, `checkbox_0`
267
+ - Elements automatically registered in persistent `window.__ELEMENT_REGISTRY__`
268
+ - **Use IDs instead of CSS selectors**: `type({ id: "input_0", text: "..." })`
269
+ - **Backward compatible**: Set `useLegacyFormat: true` for old format
270
+ - **Tested**: Fully functional with real-world forms
271
+
272
+ - **New POM Modules**
273
+ - `pom/element-id-generator.js` (171 lines) - Smart ID generation
274
+ - Priority: data-testid > id attribute > semantic path + index
275
+ - Supports: input, button, link, form, textarea, select, radio, checkbox
276
+ - `pom/apom-converter.js` (294 lines) - Convert analyzePage to APOM
277
+ - Transforms legacy format to structured model
278
+ - Groups elements by type (forms, inputs, buttons, links)
279
+ - Generates pageId, metadata
280
+
281
+ - **Input Models Architecture** - Modular input handling system
282
+ - New `models/` directory with specialized input handlers
283
+ - `BaseInputModel` - Abstract base class with common interface (setValue, getValue, clear, focus)
284
+ - `TextInputModel` - Default for text-like inputs (text, email, password, search, tel, url)
285
+ - `TimeInputModel` - Correct handling for `input[type="time"]` via JavaScript value assignment
286
+ - `DateInputModel` - Handles date, datetime-local, month, week inputs
287
+ - `ColorInputModel` - Color picker input handling
288
+ - `RangeInputModel` - Slider/range input handling
289
+ - `SelectModel` - HTML `<select>` element handling
290
+ - `CheckboxModel` - Single checkbox toggle
291
+ - `TextareaModel` - Multi-line text input
292
+ - `InputModelFactory` - Factory pattern for selecting appropriate model
293
+ - Fixes issue where keyboard input didn't work for time inputs (only minutes showed)
294
+
295
+ - **Radio/Checkbox Group Models** - Abstract group-level operations
296
+ - `RadioGroupModel` - Select single option from radio group by name, value, or label text
297
+ - `CheckboxGroupModel` - Multi-select from checkbox group with modes:
298
+ - `set` - Replace all selections
299
+ - `add` - Check additional values
300
+ - `remove` - Uncheck specific values
301
+ - `toggle` - Flip specific values
302
+
303
+ - **`selectFromGroup` tool** - New MCP tool for radio/checkbox group selection
304
+ - Parameters: `name` (required), `value`, `values`, `text`, `texts`, `mode`, `by`
305
+ - Works with radio groups (single selection) and checkbox groups (multi-selection)
306
+ - Match by value attribute or label text (`by: 'value' | 'text' | 'auto'`)
307
+ - Example: `selectFromGroup({ name: "size", value: "large" })`
308
+ - Example: `selectFromGroup({ name: "toppings", values: ["cheese", "bacon"], mode: "add" })`
309
+
310
+ - **Radio/Checkbox Groups in `analyzePage`**
311
+ - APOM output now includes `groups` section with radio and checkbox groups
312
+ - Each group shows: name, all options with values, labels, checked state
313
+ - Labels extracted from: parent `<label>`, `<label for="id">`, aria-label attribute
314
+ - Example output:
315
+ ```json
316
+ "groups": {
317
+ "radio": {
318
+ "size": {
319
+ "options": [
320
+ { "value": "small", "label": "Small", "checked": false },
321
+ { "value": "large", "label": "Large", "checked": true }
322
+ ]
323
+ }
324
+ },
325
+ "checkbox": { ... }
326
+ }
327
+ ```
328
+
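The four `CheckboxGroupModel` modes listed above behave like set operations on the checked values. The sketch below shows that reading as a pure function; the name `applyGroupMode` and its signature are hypothetical, and the real model operates on DOM checkboxes rather than arrays.

```javascript
// Hypothetical pure function mirroring the four CheckboxGroupModel modes.
// `checked` holds the currently checked values; the result holds the values
// that should be checked afterwards.
function applyGroupMode(checked, values, mode) {
  const result = new Set(checked);
  switch (mode) {
    case "set": // replace all selections
      return new Set(values);
    case "add": // check additional values
      values.forEach((v) => result.add(v));
      return result;
    case "remove": // uncheck specific values
      values.forEach((v) => result.delete(v));
      return result;
    case "toggle": // flip specific values
      values.forEach((v) => (result.has(v) ? result.delete(v) : result.add(v)));
      return result;
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

const current = ["cheese", "olives"];
console.log([...applyGroupMode(current, ["bacon"], "set")]);              // [ 'bacon' ]
console.log([...applyGroupMode(current, ["cheese", "bacon"], "add")]);    // [ 'cheese', 'olives', 'bacon' ]
console.log([...applyGroupMode(current, ["olives"], "remove")]);          // [ 'cheese' ]
console.log([...applyGroupMode(current, ["cheese", "bacon"], "toggle")]); // [ 'olives', 'bacon' ]
```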
329
+ ### Fixed
330
+ - **Critical Bug**: Variable shadowing in analyzePage APOM conversion (commit e1e63e2)
331
+ - Fixed `const analysis` shadowing in else block causing undefined analysis
332
+ - **Critical Bug**: Element registry not persisting across page.evaluate calls (commit faadd0e)
333
+ - Changed `const elementRegistry = new Map()` to `window.__ELEMENT_REGISTRY__`
334
+ - Registry now persists between tool calls
335
+ - All selector-resolver functions exported to window
336
+ - **API Error**: `oneOf` not supported at top level in tool schemas
337
+ - Removed `oneOf` blocks from click, type, hover, selectOption tool definitions
338
+ - Both `id` and `selector` parameters now optional with description indicating one is required
339
+ - Fixes error: `tools.19.custom.input_schema: input_schema does not support oneOf, allOf, or anyOf at the top level`
340
+
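Why moving the registry to `window.__ELEMENT_REGISTRY__` fixes persistence: a `const` declared inside an evaluated function is created anew on every call, while a property on the global object outlives the call. The sketch below uses `globalThis` so it also runs in Node. The function names `registerElement` and `resolveSelector` appear in the changelog, but the signatures and bodies here are assumptions.

```javascript
// In the browser the global object is `window`; `globalThis` is used here so
// the sketch also runs in Node.
function getRegistry() {
  // Create the Map once and reuse it on every later call.
  if (!globalThis.__ELEMENT_REGISTRY__) globalThis.__ELEMENT_REGISTRY__ = new Map();
  return globalThis.__ELEMENT_REGISTRY__;
}

function registerElement(id, selector) {
  getRegistry().set(id, selector);
}

function resolveSelector(identifier) {
  // Known IDs map to their selector; anything else is treated as a CSS selector.
  return getRegistry().get(identifier) ?? identifier;
}

// First "tool call": analyzePage registers an element.
registerElement("input_0", "input[name='email']");
// Later "tool call": the ID still resolves because the Map lives on the global object.
console.log(resolveSelector("input_0")); // input[name='email']
console.log(resolveSelector(".submit")); // .submit
```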
341
+ ### Changed
342
+ - **`analyzePage` enhanced with APOM support**
343
+ - Now supports both legacy and APOM formats
344
+ - Cache logic updated to handle generateIds parameter
345
+ - Elements automatically registered when generateIds=true
346
+ - Radio/checkbox elements now include label text
347
+
348
+ - **`utils/selector-resolver.js` updated for persistence**
349
+ - Registry stored in `window.__ELEMENT_REGISTRY__` instead of local const
350
+ - All functions (registerElement, resolveSelector, etc.) exported to window
351
+ - Survives across multiple page.evaluate contexts
352
+
353
+ - **`navigateTo` auto-opens browser** - No longer throws error when no page is open
354
+ - Automatically opens browser at specified URL if no page is currently open
355
+ - Eliminates need to manually call `openBrowser` before navigation
356
+ - Falls back gracefully with informative message
357
+
358
+ - **`type` tool uses Input Models** - Automatically selects appropriate model based on input type
359
+ - Time inputs now correctly set full value (e.g., "18:30" not just "30")
360
+ - Date inputs work without keyboard simulation issues
361
+ - All specialized inputs handled by their respective models
362
+
363
+ ### Technical Details
364
+ - APOM conversion happens in browser context via page.evaluate
365
+ - Element IDs remain stable across page refreshes (based on testid/id/structure)
366
+ - Dual selector mode: all tools accept both IDs and CSS selectors
367
+ - Input models use JavaScript value assignment with proper event dispatching
368
+
369
+ ## [2.6.0] - 2026-01-25
370
+
371
+ ### Added
372
+ - **UI Framework Detection** - Automatic detection of UI component libraries (MUI, Ant Design, Chakra UI, Bootstrap, Vuetify, Semantic UI)
373
+ - New utility: `utils/ui-framework-detector.js`
374
+ - Detects framework name, version, and component type for each element
375
+ - Integrated into `analyzePage` - all elements now include `uiFramework` field
376
+ - Extracts options from both native `<select>` and custom framework dropdowns
377
+
378
+ - **Enhanced Select/Dropdown Options Extraction** - Smart extraction of dropdown options from various UI libraries
379
+ - Native HTML `<select>` with `<optgroup>` support
380
+ - Material-UI (MUI) Select components
381
+ - Ant Design Select components
382
+ - Chakra UI, Bootstrap, Vuetify, Semantic UI dropdowns
383
+ - Options include: value, text, index, selected, disabled, group
384
+ - Handles cases where options aren't rendered until dropdown opens (with informative notes)
385
+
386
+ - **Page Object ID Support** - Use element IDs instead of CSS selectors
387
+ - New utility: `utils/selector-resolver.js` - Registry for Page Object element IDs
388
+ - New tool: `registerPageObject` - Register elements from Page Object for use with IDs
389
+ - **Backward compatible**: All interaction tools (click, type, selectOption, hover, etc.) now accept BOTH:
390
+ - Page Object IDs (e.g., `"login_email_input"`)
391
+ - CSS selectors (e.g., `"input[name='email']"`)
392
+ - Element registry persists in page context between tool calls
393
+
394
+ - **`registerPageObject` tool** - Register Page Object elements for ID-based interaction
395
+ - Parameters:
396
+ - `elements` (required) - Array of {id, selector, metadata}
397
+ - `clearExisting` (optional) - Clear registry before registering
398
+ - Enables using meaningful IDs instead of fragile CSS selectors
399
+ - Example: After registering, use `click("login_submit_button")` instead of `click("button[type='submit']")`
400
+
401
+ - **Enhanced Page Object Generation** - Page Objects now include comprehensive element information
402
+ - Each element gets unique ID: `{name}_{timestamp}_{random}`
403
+ - Select elements include full options array with groups
404
+ - UI framework detection for all elements
405
+ - Metadata includes: type, label, placeholder, required, validation hints
406
+
407
+ ### Changed
408
+ - **`analyzePage` enhanced with UI framework detection**
409
+ - All form fields and inputs now include `uiFramework` field
410
+ - Select elements use smart extraction: works with both vanilla HTML and UI frameworks
411
+ - Better handling of MUI, Ant Design, and other component libraries
412
+
413
+ - **All interaction tools now support dual selector mode**
414
+ - Tools affected: `click`, `type`, `selectOption`, `hover`, `scrollTo`, `drag`, `setStyles`
415
+ - Automatically resolves Page Object IDs to CSS selectors
416
+ - Error messages indicate whether identifier was Page Object ID or CSS selector
417
+ - No breaking changes - existing CSS selector usage works unchanged
418
+
419
+ ### Technical Details
420
+ - Selector resolution happens in page context using injected `selector-resolver.js`
421
+ - UI framework detection uses class names, data attributes, and DOM structure analysis
422
+ - Element registry stored in browser page context (survives navigation within same page)
423
+ - New helper function `resolveSelector(page, identifier)` in `index.js`
424
+
425
+ ## [2.5.0] - 2026-01-21
426
+
427
+ ### Added
428
+ - **`selectOption` tool** - Select options in HTML dropdown elements with intelligent priority-based selection
429
+ - Parameters: `selector` (required), `value`, `text`, or `index` (specify at least one)
430
+ - Selection priority: value → text → index (tries value first, falls back to text, then index)
431
+ - Automatically triggers `input` and `change` events for React and other frameworks
432
+ - Returns selected option details (value, text, index)
433
+ - Location: `index.js:911-979`, schemas in `server/tool-schemas.js:40-45`, definitions in `server/tool-definitions.js:234-247`, tool group in `server/tool-groups.js:10`
434
+
435
+ - **`drag` tool** - Drag element by mouse (click-hold-move-release) in any direction
436
+ - Parameters: `selector` (required), `direction` (required: 'up', 'down', 'left', 'right', 'up-left', 'up-right', 'down-left', 'down-right'), `distance` (optional, default: 100), `duration` (optional, default: 500ms)
437
+ - Emulates real mouse drag: moves to element center, presses button, drags, releases button
438
+ - Supports 8 directions including 4 diagonal directions for maximum flexibility
439
+ - Use for: interactive maps (Google Maps, Leaflet), Gantt charts, SVG diagrams, canvas, drag-to-pan interfaces
440
+ - NOT for: standard overflow scrollbars (use `scrollTo` or `scrollHorizontal` instead)
441
+ - Location: `index.js:982-1091`, schemas in `server/tool-schemas.js:47-53`, definitions in `server/tool-definitions.js:248-261`, tool group in `server/tool-groups.js:10`
442
+
443
+ - **`scrollHorizontal` tool** - Scroll element horizontally for tables, carousels, and wide content
444
+ - Parameters: `selector` (required), `direction` (required: 'left' or 'right'), `amount` (required: pixels or 'full'), `behavior` (optional: 'auto' or 'smooth')
445
+ - Supports precise pixel-based scrolling or 'full' to scroll to the end
446
+ - Returns detailed scroll state: position, total width, visible width, and scroll availability (canScrollLeft, canScrollRight)
447
+ - Uses native `scrollTo` API with smooth/auto behavior options
448
+ - Location: `index.js:1055-1117`, schemas in `server/tool-schemas.js:55-60`, definitions in `server/tool-definitions.js:262-275`, tool group in `server/tool-groups.js:10`
449
+
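The `selectOption` priority (value, then text, then index) can be sketched as a small matcher. The function name `pickOption` and the exact-match comparison for `text` are assumptions; the real tool also dispatches `input` and `change` events after selecting, which is not shown here.

```javascript
// Hypothetical matcher implementing the documented priority:
// value first, then text, then index.
function pickOption(options, { value, text, index } = {}) {
  if (value !== undefined) {
    const byValue = options.find((o) => o.value === value);
    if (byValue) return byValue;
  }
  if (text !== undefined) {
    const byText = options.find((o) => o.text === text);
    if (byText) return byText;
  }
  if (index !== undefined && options[index]) return options[index];
  return null;
}

const options = [
  { value: "s", text: "Small", index: 0 },
  { value: "l", text: "Large", index: 1 },
];

console.log(pickOption(options, { value: "l" }).text);                 // Large
console.log(pickOption(options, { value: "xl", text: "Small" }).text); // Small
console.log(pickOption(options, { index: 1 }).value);                  // l
```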
450
+ ### Fixed
451
+ - **🔥 CRITICAL: Fixed `drag` tool implementation** - Now correctly emulates mouse drag instead of changing scrollLeft/scrollTop
452
+ - Problem: Previous implementation used `scrollLeft`/`scrollTop` animation which only works for `overflow: auto/scroll` containers
453
+ - Impact: **Did not work with custom drag-to-scroll interfaces** like:
454
+ - ❌ Interactive maps (Google Maps, Leaflet, Mapbox)
455
+ - ❌ Gantt charts and timeline diagrams (SVG-based)
456
+ - ❌ Canvas elements with pan/zoom
457
+ - ❌ Custom drag handlers (React DnD, interact.js)
458
+ - Solution: Complete rewrite using Puppeteer's `page.mouse` API:
459
+ 1. Finds element center position
460
+ 2. Moves mouse to center (`page.mouse.move`)
461
+ 3. Presses mouse button (`page.mouse.down`)
462
+ 4. Drags to target position with smooth motion (`page.mouse.move` with steps)
463
+ 5. Releases mouse button (`page.mouse.up`)
464
+ - Result: **Now works with ANY drag-scrollable element** including SVG diagrams, maps, and custom implementations
465
+ - Location: `index.js:982-1091`, updated description in `README.md:277-285`
466
+ - Reported by: User testing on Gantt chart with `<svg class="gantt">` element
467
+
468
+ - **Fixed `analyzePage` crash with `includeAll: true` on SVG elements** - Now handles both HTML and SVG className types
469
+ - Problem: `className.split is not a function` error when page contains SVG elements
470
+ - Cause: SVG elements have `className` as `SVGAnimatedString` object (with `.baseVal` property), not a string
471
+ - Solution: Added type checking - uses `className.baseVal` for SVG elements, direct string for HTML
472
+ - Location: `index.js:2126-2137`
473
+
474
+ - **🔥 CRITICAL: Fixed Tailwind CSS selector generation bug** - `getUniqueSelectorInPage` now works correctly with Tailwind/utility-first CSS frameworks
475
+ - Problem: Generated invalid CSS selectors like `button.hover:bg-blue-700` containing special characters (`:`, `/`, `[]`)
476
+ - Impact: **ALL AI-powered tools failed** with `SyntaxError: invalid selector` on Tailwind/styled-components apps:
477
+ - ❌ `analyzePage` - couldn't read page state
478
+ - ❌ `findElementsByText` - couldn't find elements by text
479
+ - ❌ `smartFindElement` - couldn't find elements by description
480
+ - ❌ `getAllInteractiveElements` - couldn't list interactive elements
481
+ - Solution: Complete rewrite of selector generation logic with intelligent filtering:
482
+ 1. **New priority hierarchy** (most reliable first):
483
+ - `#id` (ID attribute)
484
+ - `[data-testid="..."]` (test IDs, very common in modern apps)
485
+ - `[data-*="..."]` (other data attributes)
486
+ - `[aria-label="..."]` (accessibility labels)
487
+ - `[role="..."]` (ARIA roles)
488
+ - `[name="..."]` (form element names)
489
+ - `tag.semantic-class` (non-Tailwind classes only)
490
+ - `tag:nth-of-type(n)` (fallback with path)
491
+ 2. **Tailwind class filtering** - New `isTailwindClass()` function detects and excludes:
492
+ - Variant classes with `:` (hover:, focus:, md:, lg:, etc.)
493
+ - Fraction classes with `/` (w-1/2, space-x-1/2)
494
+ - Arbitrary values with `[]` (bg-[#1da1f2], w-[500px])
495
+ - 60+ common Tailwind prefixes (bg-, text-, p-, m-, flex-, etc.)
496
+ 3. **CSS.escape() integration** - All selectors properly escaped (with fallback for old browsers)
497
+ 4. **Semantic attribute prioritization** - Prefers stable, meaningful selectors over utility classes
498
+ - Result: **Unblocks testing of ALL modern apps** using Tailwind, styled-components, CSS modules, Emotion, etc.
499
+ - Location: `element-finder-utils.js:316-509` (complete rewrite, ~200 lines)
500
+ - Reported by: AI agent encountering `SyntaxError` on every tool call in React+Tailwind app
501
+
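Two of the fixes above, the SVG `className` handling and the Tailwind class filtering, can be sketched together. This is an illustration, not the code in `element-finder-utils.js`: `isTailwindClass` is named in the changelog but its body here is assumed, the prefix list is a small subset of the 60+ mentioned, and the helpers `getClassNames` and `semanticClasses` are hypothetical.

```javascript
// SVG elements expose className as an object with a `baseVal` string;
// HTML elements expose a plain string. Normalize both to an array of names.
function getClassNames(element) {
  const raw = typeof element.className === "string"
    ? element.className
    : (element.className && element.className.baseVal) || "";
  return raw.split(/\s+/).filter(Boolean);
}

// Illustrative subset of utility prefixes.
const UTILITY_PREFIXES = ["bg-", "text-", "p-", "m-", "flex-"];

function isTailwindClass(name) {
  // Variants (hover:), fractions (w-1/2) and arbitrary values (w-[500px])
  // contain characters that are invalid in an unescaped CSS class selector.
  if (/[:\/\[\]]/.test(name)) return true;
  return UTILITY_PREFIXES.some((prefix) => name.startsWith(prefix));
}

function semanticClasses(element) {
  return getClassNames(element).filter((name) => !isTailwindClass(name));
}

// Plain objects stand in for DOM elements so the sketch runs outside a browser.
const button = { className: "submit-btn bg-blue-500 hover:bg-blue-700 w-1/2 p-4" };
const svg = { className: { baseVal: "gantt w-[500px]" } };

console.log(semanticClasses(button)); // [ 'submit-btn' ]
console.log(semanticClasses(svg));    // [ 'gantt' ]
```

Filtering before building the selector avoids producing strings such as `button.hover:bg-blue-700`; any class that survives the filter would still need `CSS.escape()` in a browser context.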
502
+ ### Changed
503
+ - **Improved tool descriptions for better AI agent behavior** - Prevents premature use of `executeScript`
504
+ - `click` - Emphasized as PRIMARY tool for clicking, works with React/Vue/Angular synthetic events
505
+ - `type` - Emphasized as PRIMARY tool for input, updates React hooks and Vue reactive data correctly
506
+ - `executeScript` - ⚠️ Marked as LAST RESORT with strict warnings, never use for clicking/typing/reading
507
+ - `findElementsByText` - Highlighted as alternative to executeScript for finding elements
508
+ - `analyzePage` - Emphasized as PRIMARY tool for reading page state, more efficient than executeScript
509
+ - Location: `server/tool-definitions.js:31,45,162,510,489`
510
+
511
+ - **Added "Tool Usage Priority" section to README** - Clear hierarchy preventing executeScript abuse
512
+ - Three workflows: Clicking/Interaction, Filling Forms, Reading Page State
513
+ - Each shows specialized tools first (click, type, analyzePage), executeScript last
514
+ - Explains why specialized tools work with React/Vue/Angular while executeScript may fail
515
+ - Location: `README.md:116-147`
516
+
517
+ - **`analyzePage` enhancement** - Now detects and reports HTML select elements with all available options
518
+ - Select fields in forms and inputs sections now include `options` array with value, text, index, selected, and disabled status
519
+ - Includes `selectedIndex`, `selectedValue`, and `selectedText` for current selection
520
+ - Enables AI agents to see all dropdown options without additional queries
521
+ - Makes `selectOption` tool usage more intelligent and reliable
522
+ - Location: `index.js:1632-1660` (forms), `index.js:1691-1713` (inputs)
523
+
524
+ - **Tool groups** - Added 3 new tools to `interaction` group: `selectOption`, `drag`, `scrollHorizontal`
525
+ - Total interaction tools: 8 (was 5)
526
+ - Total tools in project: 44+ (was 40+)
527
+ - Location: `server/tool-groups.js:10`
528
+
529
+ - **`convertFigmaToCode` tool** - Convert Figma designs to React/Tailwind code with AI assistance
530
+ - Parameters: `figmaToken` (optional), `fileKey` (required), `nodeId` (required), `framework` (optional: 'react', 'react-typescript', 'html'), `includeComments` (optional, default: true)
531
+ - Fetches design structure (layout, colors, typography, spacing) and rendered image at 2x scale
532
+ - Returns AI-optimized instruction prompt with simplified JSON structure and framework-specific guidelines
533
+ - Supports React (JavaScript), React (TypeScript), and pure HTML with Tailwind CSS
534
+ - Generates clean, semantic code with proper spacing, accessibility, and component structure
535
+ - Uses existing Figma token mechanism (from parameter or FIGMA_TOKEN env var)
536
+ - Location: `index.js:1676-1779`, schemas in `server/tool-schemas.js:225-231`, definitions in `server/tool-definitions.js:448-462`, tool group in `server/tool-groups.js:53`, helper in `figma-tools.js:381-499`
537
+
538
+ - **`simplifyNode` helper** - New function in figma-tools.js for code generation
539
+ - Recursively extracts essential design properties from Figma node structure
540
+ - Captures: layout (flexbox), dimensions, padding/gaps, colors (fills/strokes), effects (shadows), typography, border radius
541
+ - Filters out invisible elements and rounds numeric values for cleaner output
542
+ - Used by `convertFigmaToCode` to provide AI with actionable design data
543
+ - Location: `figma-tools.js:381-499`
544
+
5
545
  ## [2.4.2] - 2026-01-05
6
546
 
7
547
  ### Added