pagebolt-mcp 1.8.2 → 1.9.1
- package/package.json +1 -1
- package/server.json +2 -2
- package/src/index.mjs +304 -15
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "pagebolt-mcp",
-  "version": "1.
+  "version": "1.9.1",
   "description": "MCP server for PageBolt — take screenshots, generate PDFs, create OG images, inspect pages, record demo videos with Audio Guide narration, from AI coding assistants like Claude, Cursor, and Windsurf.",
   "main": "src/index.mjs",
   "module": "src/index.mjs",
package/server.json
CHANGED
@@ -6,12 +6,12 @@
     "url": "https://github.com/Custodia-Admin/pagebolt-mcp",
     "source": "github"
   },
-  "version": "1.
+  "version": "1.9.1",
   "packages": [
     {
       "registryType": "npm",
       "identifier": "pagebolt-mcp",
-      "version": "1.
+      "version": "1.9.1",
       "transport": {
         "type": "stdio"
       },
package/src/index.mjs
CHANGED
@@ -61,7 +61,7 @@ async function callApi(endpoint, options = {}) {
   const method = options.method || 'GET';
   const headers = {
     'x-api-key': API_KEY,
-    'user-agent': 'pagebolt-mcp/1.
+    'user-agent': 'pagebolt-mcp/1.9.1',
     ...(options.body ? { 'Content-Type': 'application/json' } : {}),
   };
   const body = options.body ? JSON.stringify(options.body) : undefined;
@@ -159,7 +159,7 @@ const styleSchema = z.object({
 
 // ─── Server Instructions ────────────────────────────────────────
 const SERVER_INSTRUCTIONS = `
-PageBolt gives you
+PageBolt gives you tools for web capture and browser automation. All tools use your API key automatically.
 
 ## Tools Overview
 
@@ -168,7 +168,9 @@ PageBolt gives you 8 tools for web capture and browser automation. All tools use
 | take_screenshot | Capture a URL, HTML, or Markdown as PNG/JPEG/WebP | 1 request |
 | generate_pdf | Convert a URL or HTML to PDF, saves to disk | 1 request |
 | create_og_image | Generate social card images from templates or custom HTML | 1 request |
-|
+| observe_page | Agent-optimized page observation: id-indexed elements, page-type classification, suggested actions (+ optional content/ARIA/screenshot) | 1 request |
+| visual_diff | Pixel-level visual comparison of two pages | 1 request |
+| run_sequence | Multi-step browser automation with screenshot/PDF/diff outputs | 1 request per output |
 | record_video | Record browser automation as MP4/WebM/GIF with cursor effects | 3 requests |
 | inspect_page | Get structured map of page elements with CSS selectors | 1 request |
 | list_devices | List 25+ device presets (iPhone, iPad, MacBook, etc.) | 0 (free) |
@@ -176,6 +178,10 @@ PageBolt gives you 8 tools for web capture and browser automation. All tools use
 | create_session | Create a persistent browser session (Starter+ only) | 0 (free to create) |
 | destroy_session | Destroy a persistent browser session | 0 (free) |
 
+## Agent Perception: observe_page vs inspect_page
+
+For AI agents that need to understand and act on an arbitrary page, prefer **observe_page** — it returns a compact, token-budgeted observation (id-indexed elements + page-type + grouped suggested actions) in one call, and can optionally bundle readable content, the ARIA tree, and a screenshot. Use **inspect_page** when you specifically want the full raw element/heading/link/image inventory. Both return reliable CSS selectors you can pass to run_sequence.
+
 ## Key Workflow: Inspect Before You Interact
 
 When building sequences or videos, ALWAYS use inspect_page first to discover reliable CSS selectors:
@@ -185,6 +191,37 @@ When building sequences or videos, ALWAYS use inspect_page first to discover rel
 
 This avoids guessing selectors like "#submit" when the actual element is "#submitBtn".
 
+## Handling Dynamic UI: Dropdowns, Popovers, and Modals
+
+Clicking menus, avatars, profile icons, "⋯" buttons, hamburger toggles, or anything that opens a dropdown/popover/modal creates an overlay that floats ABOVE the page. This is the #1 cause of broken multi-step automations:
+- Subsequent steps get visually obscured by the still-open overlay.
+- A click intended for the underlying page lands on the overlay (or its backdrop) and navigates somewhere unexpected.
+
+Rules:
+1. **Don't open menus you don't need.** For a high-level tour, navigate directly to the destination URL (from inspect_page / observe_page) instead of clicking through a dropdown.
+2. **If you open an overlay, the very next step must commit to it** — either interact with an element INSIDE the overlay, or explicitly close it before continuing. There is no "press_key" action, so close an overlay with an evaluate step (note: max 2 evaluate steps per sequence):
+{ "action": "evaluate", "script": "document.activeElement&&document.activeElement.blur&&document.activeElement.blur();document.dispatchEvent(new KeyboardEvent('keydown',{key:'Escape',bubbles:true}));" }
+(Clicking a blank area can also work, but may hit the overlay backdrop and navigate — prefer the evaluate approach or click a known-safe element.)
+3. **Never chain clicks across a state change you haven't re-perceived.** Selectors gathered before a menu opened or a route changed may now point at the wrong (or covered) element.
+
+## Re-perceive Between Actions (avoid getting lost)
+
+run_sequence and record_video execute a FIXED, pre-planned list of steps — they do NOT re-check the page between steps. For anything beyond a short, predictable flow, work iteratively instead of blind-batching:
+1. observe_page (or take_screenshot) to see the CURRENT state.
+2. Perform ONE meaningful action (a short run_sequence, or a single click/fill).
+3. observe_page / take_screenshot AGAIN, then choose the next action from the fresh result.
+Repeat. This is how an agent recovers from unexpected popovers, redirects, or layout shifts. Use session_id (create_session, Starter+) on run_sequence to keep cookies/auth/scroll state across these iterations.
+
+For record_video specifically (one continuous capture, no mid-recording re-perception): keep the flow short and predictable, use ONLY selectors verified via inspect_page/observe_page, and add a dismiss step after anything that could open an overlay.
+
+## Visual Diff
+
+Use visual_diff to compare two pages pixel-by-pixel. Returns a diff image with changed pixels highlighted in red.
+- Supports fullPage: true to diff entire scrollable pages (not just the viewport)
+- Supports all screenshot options: device emulation, dark mode, selectors, blocking, etc.
+- Use in run_sequence as a "diff" step to automate browser interactions before comparing — navigate, click, fill forms, then diff against another URL.
+- threshold: 0.1 (default) — lower values catch more subtle differences
+
 ## Styling Screenshots
 
 Use the "style" parameter on take_screenshot for beautiful styled captures:
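Rule 2 of the "Handling Dynamic UI" section added in the hunk above (commit to an overlay on the very next step) can be illustrated with a short run_sequence step list. This is a minimal sketch, not part of the package: the URL and the `#avatar-menu` selector are hypothetical placeholders, `closeOverlayStep` is an illustrative helper name, and the evaluate script is copied from the added instructions. Only step fields that appear in this diff (`action`, `url`, `selector`, `script`, `name`) are used.

```javascript
// Evaluate script copied from the server instructions in this diff: blur the focused
// element, then dispatch an Escape keydown on the document.
const CLOSE_OVERLAY_SCRIPT =
  "document.activeElement&&document.activeElement.blur&&document.activeElement.blur();" +
  "document.dispatchEvent(new KeyboardEvent('keydown',{key:'Escape',bubbles:true}));";

// Hypothetical helper (not part of pagebolt-mcp): builds the dismiss step.
function closeOverlayStep() {
  return { action: 'evaluate', script: CLOSE_OVERLAY_SCRIPT };
}

// Hypothetical flow: the URL and selector are placeholders, not real page details.
const steps = [
  { action: 'navigate', url: 'https://example.com/dashboard' },
  { action: 'click', selector: '#avatar-menu' },     // opens a dropdown overlay
  closeOverlayStep(),                                 // the very next step closes it
  { action: 'screenshot', name: 'dashboard-clean' },  // one output step is required
];
```

Whether an Escape keydown actually closes a given overlay depends on the page's own handlers, so a screenshot or observe_page call afterwards is still the way to confirm the result.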
@@ -226,8 +263,9 @@ Use blockBanners on almost every request to get clean captures. Combine blockAds
 - response_type: "json" returns base64 data instead of binary (useful for programmatic use)
 - record_video pace presets: "fast" (0.5x), "normal" (1x), "slow" (2x), "dramatic" (3x), "cinematic" (4.5x)
 - record_video cursor styles: "highlight", "circle", "spotlight", "dot", "classic"
-- run_sequence requires at least 1 screenshot
--
+- run_sequence requires at least 1 output step (screenshot, pdf, or diff)
+- run_sequence supports "diff" steps: automate interactions, then diff current page against another URL/HTML
+- record_video does NOT allow screenshot/pdf/diff steps — the whole sequence IS the video
 - Max 2 evaluate (JavaScript) steps per sequence/video
 - fullPage: true on screenshots captures the entire scrollable page
 - fullPageScroll: true triggers lazy-loaded images before capture
@@ -236,8 +274,8 @@ Use blockBanners on almost every request to get clean captures. Combine blockAds
 
 | Action | Cost |
 |--------|------|
-| Screenshot, PDF, OG image, Inspect | 1 request each |
-| Sequence | 1 request per output (screenshot/pdf) |
+| Screenshot, PDF, OG image, Inspect, Visual Diff | 1 request each |
+| Sequence | 1 request per output (screenshot/pdf/diff) |
 | Video recording | 3 requests flat |
 | list_devices, check_usage | Free |
 `.trim();
@@ -246,7 +284,7 @@ Use blockBanners on almost every request to get clean captures. Combine blockAds
 function createConfiguredServer() {
   const srv = new McpServer({
     name: 'pagebolt',
-    version: '1.
+    version: '1.9.1',
   }, {
     instructions: SERVER_INSTRUCTIONS,
   });
@@ -513,14 +551,14 @@ server.tool(
 // ═══════════════════════════════════════════════════════════════════
 server.tool(
   'run_sequence',
-  'Execute a multi-step browser automation sequence. Navigate pages, interact with elements (click, fill, select), and capture multiple screenshots/PDFs in a single browser session. Each output counts as 1 API request.',
+  'Execute a multi-step browser automation sequence. Navigate pages, interact with elements (click, fill, select), and capture multiple screenshots/PDFs/diffs in a single browser session. Use the "diff" step to compare the current page state against another URL after automation. Each output counts as 1 API request.',
   {
     steps: z.array(
       z.object({
         action: z.enum([
           'navigate', 'click', 'dblclick', 'fill', 'select', 'hover',
           'scroll', 'wait', 'wait_for', 'evaluate',
-          'screenshot', 'pdf',
+          'screenshot', 'pdf', 'diff',
         ]).describe('The action to perform'),
         url: z.string().url().optional().describe('URL to navigate to (for navigate action)'),
         selector: z.string().optional().describe('CSS selector for the target element (also used for element screenshots)'),
@@ -530,20 +568,25 @@ server.tool(
         x: z.number().optional().describe('Horizontal scroll position in pixels (scroll action). Use when scrolling horizontally without a selector.'),
         y: z.number().optional().describe('Vertical scroll position in pixels (scroll action). REQUIRED when no selector is provided — e.g. {"action":"scroll","y":800} scrolls 800px down.'),
         script: z.string().max(5000).optional().describe('JavaScript to execute in page context (for evaluate action)'),
-        name: z.string().optional().describe('Name for the output (for screenshot/pdf actions)'),
+        name: z.string().optional().describe('Name for the output (for screenshot/pdf/diff actions)'),
         format: z.string().optional().describe('Image format: png, jpeg, webp (screenshot) or A4, Letter (pdf)'),
-        fullPage: z.boolean().optional().describe('Capture full scrollable page (for screenshot
-        fullPageScroll: z.boolean().optional().describe('Auto-scroll for lazy images (for screenshot
+        fullPage: z.boolean().optional().describe('Capture full scrollable page (for screenshot/diff actions)'),
+        fullPageScroll: z.boolean().optional().describe('Auto-scroll for lazy images (for screenshot/diff actions)'),
         quality: z.number().int().min(1).max(100).optional().describe('JPEG/WebP quality (for screenshot action)'),
         omitBackground: z.boolean().optional().describe('Transparent background (for screenshot action)'),
-        delay: z.number().int().min(0).max(10000).optional().describe('Pre-capture delay in ms (for screenshot
+        delay: z.number().int().min(0).max(10000).optional().describe('Pre-capture delay in ms (for screenshot/diff actions)'),
         landscape: z.boolean().optional().describe('Landscape orientation (for pdf action)'),
         printBackground: z.boolean().optional().describe('Include CSS backgrounds (for pdf action)'),
         margin: z.string().optional().describe('CSS margin for all sides (for pdf action)'),
         scale: z.number().min(0.1).max(2).optional().describe('Rendering scale (for pdf action)'),
         style: styleSchema,
+        // ── Diff-specific step properties ──
+        url_b: z.string().url().optional().describe('URL of the comparison page (for diff action). The current page state is "A"; this URL is rendered as "B".'),
+        html_b: z.string().optional().describe('HTML of the comparison page (for diff action). The current page state is "A"; this HTML is rendered as "B".'),
+        selector_a: z.string().optional().describe('CSS selector to capture on the current page as side "A" (for diff action). If omitted, captures the full viewport/page.'),
+        threshold: z.number().min(0).max(1).optional().describe('Pixelmatch sensitivity 0–1 (for diff action, default: 0.1). Lower = more sensitive.'),
       })
-    ).min(1).max(20).describe('Array of steps to execute in order. Must include at least one screenshot or
+    ).min(1).max(20).describe('Array of steps to execute in order. Must include at least one output step (screenshot, pdf, or diff). Max 20 steps, max 5 outputs.'),
     viewport: z.object({
       width: z.number().int().min(320).max(3840).optional().describe('Viewport width (default: 1280)'),
       height: z.number().int().min(200).max(2160).optional().describe('Viewport height (default: 720)'),
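The limits stated in the updated `steps` description (1 to 20 steps, at least one and at most five output steps) and in the server instructions (at most two evaluate steps) can be checked before a request is sent. The sketch below is a hypothetical pre-flight helper, not code from pagebolt-mcp; `checkSequence` and the example URLs are assumptions, and the API may enforce further rules that this diff does not show.

```javascript
// Hypothetical pre-flight check (not part of pagebolt-mcp). The limits are taken
// from the schema description and server instructions shown in this diff.
const OUTPUT_ACTIONS = new Set(['screenshot', 'pdf', 'diff']);

function checkSequence(steps) {
  const problems = [];
  if (!Array.isArray(steps) || steps.length < 1 || steps.length > 20) {
    problems.push('steps must be an array of 1 to 20 items');
    return problems;
  }
  const outputs = steps.filter((s) => OUTPUT_ACTIONS.has(s.action)).length;
  const evaluates = steps.filter((s) => s.action === 'evaluate').length;
  if (outputs < 1) problems.push('at least one screenshot, pdf, or diff step is required');
  if (outputs > 5) problems.push('at most 5 output steps are allowed');
  if (evaluates > 2) problems.push('at most 2 evaluate steps are allowed');
  return problems;
}

// Usage: a sequence whose only output is the new "diff" step.
// The current page is side "A"; url_b is rendered as side "B".
const diffSequence = [
  { action: 'navigate', url: 'https://example.com/pricing' },
  { action: 'diff', url_b: 'https://staging.example.com/pricing', threshold: 0.1, name: 'pricing-diff' },
];
const diffProblems = checkSequence(diffSequence);
```

Returning a list of problems rather than throwing lets an agent report every violation at once before spending a request.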
@@ -596,6 +639,20 @@ server.tool(
             type: 'text',
             text: `[${output.name}] PDF generated — ${output.size_bytes} bytes, step ${output.step_index}`,
           });
+        } else if (output.type === 'diff') {
+          content.push({
+            type: 'image',
+            data: output.data,
+            mimeType: 'image/png',
+          });
+          content.push({
+            type: 'text',
+            text: `[${output.name}] Diff — ${output.changed_pct}% changed (${output.changed_pixels?.toLocaleString()} of ${output.total_pixels?.toLocaleString()} pixels), step ${output.step_index}` +
+              (output.changed_pct === 0 ? ' — Pages are visually identical.' :
+               output.changed_pct < 1 ? ' — Minor differences.' :
+               output.changed_pct < 10 ? ' — Moderate differences.' :
+               ' — Significant differences.'),
+          });
         }
       }
 
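The nested ternary added for diff outputs in the hunk above is a four-way threshold ladder on `changed_pct`. Pulled out as a standalone function it looks like the sketch below; `diffVerdict` is a hypothetical name used only for illustration, and the returned strings drop the leading separator that the handler prepends.

```javascript
// Hypothetical helper mirroring the threshold ladder in the run_sequence diff output.
// changedPct is the percentage of changed pixels reported by the API.
function diffVerdict(changedPct) {
  if (changedPct === 0) return 'Pages are visually identical.';
  if (changedPct < 1) return 'Minor differences.';
  if (changedPct < 10) return 'Moderate differences.';
  return 'Significant differences.';
}

const verdicts = [0, 0.5, 5, 10].map(diffVerdict);
```

One property of this ladder worth keeping in mind: if `changed_pct` were ever missing from a response, every comparison would be false and the text would fall through to the "Significant differences" branch.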
@@ -928,6 +985,235 @@ server.tool(
   }
 );
 
+// ═══════════════════════════════════════════════════════════════════
+// Tool: observe_page — agent-optimized page observation (perception layer)
+// ═══════════════════════════════════════════════════════════════════
+server.tool(
+  'observe_page',
+  'Get a compact, token-budgeted "observation" of any web page, purpose-built for AI agents. In ONE request it returns: id-indexed interactive elements (role, name, CSS selector, state), a heuristic page-type classification (login, signup, search, article, form, generic), and grouped "suggested actions" (login flow, search, primary buttons, navigation). Optionally include readable content (Markdown), the ARIA tree, and a screenshot. This is the fastest way for an agent to understand and act on an un-instrumented page — far more token-efficient than a raw screenshot or full DOM. Use the returned selectors with run_sequence to act. Costs 1 API request.',
+  {
+    // ── Source ──
+    url: z.string().url().optional().describe('URL to observe (required if no html)'),
+    html: z.string().optional().describe('Raw HTML to observe (required if no url)'),
+    // ── Observation shape ──
+    maxElements: z.number().int().min(1).max(150).optional().describe('Cap on interactive elements returned (default 40, max 150). Lower = fewer tokens.'),
+    includeRects: z.boolean().optional().describe('Include bounding boxes {x,y,w,h} per element (default false — omit to save tokens)'),
+    includeContent: z.boolean().optional().describe('Also extract the main readable content as Markdown (default false)'),
+    includeAriaTree: z.boolean().optional().describe('Also include the interesting-only ARIA accessibility tree (default false)'),
+    includeScreenshot: z.boolean().optional().describe('Also capture a screenshot in the same page load (default false)'),
+    screenshotFormat: z.enum(['jpeg', 'png', 'webp']).optional().describe('Screenshot format when includeScreenshot is true (default jpeg)'),
+    screenshotFullPage: z.boolean().optional().describe('Capture the full scrollable page for the screenshot (default false)'),
+    // ── Viewport ──
+    width: z.number().int().min(1).max(3840).optional().describe('Viewport width in pixels (default: 1280)'),
+    height: z.number().int().min(1).max(2160).optional().describe('Viewport height in pixels (default: 720)'),
+    viewportDevice: z.string().optional().describe('Device preset for viewport emulation (e.g. "iphone_14_pro"). Use list_devices to see all presets.'),
+    deviceScaleFactor: z.number().min(1).max(3).optional().describe('Device pixel ratio (default: 1)'),
+    // ── Timing ──
+    waitUntil: z.enum(['load', 'domcontentloaded', 'networkidle0', 'networkidle2']).optional().describe('When to consider navigation finished (default: networkidle2)'),
+    waitForSelector: z.string().optional().describe('Wait for this CSS selector to appear before observing'),
+    navigationTimeout: z.number().int().min(0).max(30000).optional().describe('Navigation timeout in ms (default: 25000)'),
+    // ── Emulation ──
+    darkMode: z.boolean().optional().describe('Emulate dark color scheme (default: false)'),
+    timeZone: z.string().optional().describe('Override browser timezone'),
+    userAgent: z.string().optional().describe('Override the browser User-Agent string'),
+    // ── Auth & headers ──
+    cookies: z.array(cookieSchema).optional().describe('Cookies to set — array of "name=value" strings or { name, value, domain? } objects'),
+    headers: z.record(z.string(), z.string()).optional().describe('Extra HTTP headers to send with the request'),
+    authorization: z.string().optional().describe('Authorization header value (e.g. "Bearer <token>")'),
+    bypassCSP: z.boolean().optional().describe('Bypass Content-Security-Policy on the page'),
+    // ── Blocking ──
+    blockBanners: z.boolean().optional().describe('Hide cookie consent banners (default: false)'),
+    blockAds: z.boolean().optional().describe('Block advertisements on the page'),
+    blockChats: z.boolean().optional().describe('Block live chat widgets'),
+    blockTrackers: z.boolean().optional().describe('Block tracking scripts'),
+  },
+  async (params) => {
+    if (!params.url && !params.html) {
+      return { content: [{ type: 'text', text: 'Error: Either "url" or "html" is required.' }], isError: true };
+    }
+
+    try {
+      const res = await callApi('/api/v1/observe', { method: 'POST', body: params });
+      const data = await res.json();
+
+      const lines = [];
+      lines.push(`Page: ${data.title || '(untitled)'} (${data.url})`);
+      lines.push(`Type: ${data.pageType}`);
+      if (data.metadata && data.metadata.httpStatusCode) lines.push(`HTTP Status: ${data.metadata.httpStatusCode}`);
+      lines.push('');
+
+      if (data.actions && data.actions.length > 0) {
+        lines.push('Suggested actions:');
+        for (const a of data.actions) {
+          lines.push(` ${a.intent}: ${a.elementIds.join(', ')}`);
+        }
+        lines.push('');
+      }
+
+      if (data.elements && data.elements.length > 0) {
+        lines.push(`Interactive elements (${data.elements.length}):`);
+        for (const el of data.elements) {
+          let line = ` ${el.id} [${el.role}${el.type ? ` ${el.type}` : ''}]`;
+          if (el.name) line += ` "${el.name}"`;
+          if (el.state && el.state.length) line += ` {${el.state.join(',')}}`;
+          line += ` — selector: ${el.selector}`;
+          if (el.href) line += ` → ${el.href}`;
+          lines.push(line);
+        }
+        lines.push('');
+      }
+
+      if (data.forms && data.forms.length > 0) {
+        lines.push(`Forms (${data.forms.length}):`);
+        for (const f of data.forms) {
+          lines.push(` ${f.selector} (${f.method} ${f.action || '(none)'}): fields ${f.fieldIds.join(', ')}`);
+        }
+        lines.push('');
+      }
+
+      if (data.headings && data.headings.length > 0) {
+        lines.push('Outline:');
+        for (const h of data.headings) lines.push(` ${' '.repeat(h.level - 1)}H${h.level}: ${h.text}`);
+        lines.push('');
+      }
+
+      if (data.content && data.content.markdown) {
+        lines.push(`Readable content (${data.content.wordCount} words):`);
+        lines.push(data.content.markdown.slice(0, 4000) + (data.content.markdown.length > 4000 ? '\n…(truncated)' : ''));
+        lines.push('');
+      }
+
+      if (data.ariaTree) {
+        lines.push('ARIA tree:');
+        lines.push(JSON.stringify(data.ariaTree, null, 2));
+        lines.push('');
+      }
+
+      lines.push(`Stats: ${data.stats.elementCount} elements, ~${data.stats.estimatedTokens} tokens. Duration: ${data.duration_ms}ms`);
+
+      const content = [{ type: 'text', text: lines.join('\n') }];
+      if (data.screenshot && data.screenshot.base64) {
+        content.unshift({ type: 'image', data: data.screenshot.base64, mimeType: imageMimeType(data.screenshot.format) });
+      }
+      return { content };
+    } catch (err) {
+      return { content: [{ type: 'text', text: `Observe error: ${err.message}` }], isError: true };
+    }
+  }
+);
+
+// ═══════════════════════════════════════════════════════════════════
+// Tool: visual_diff — pixel-level visual comparison
+// ═══════════════════════════════════════════════════════════════════
+server.tool(
+  'visual_diff',
+  'Compare two web pages (or HTML strings) pixel-by-pixel and return a diff image highlighting all visual differences. Supports full-page capture, device emulation, element selectors, and all screenshot-like options. Returns the diff image, changed pixel count, and percentage changed. Costs 1 API request.',
+  {
+    // ── Sources ──
+    url_a: z.string().url().optional().describe('URL of the first page (required if no html_a)'),
+    url_b: z.string().url().optional().describe('URL of the second page (required if no html_b)'),
+    html_a: z.string().optional().describe('Raw HTML for the first page (required if no url_a)'),
+    html_b: z.string().optional().describe('Raw HTML for the second page (required if no url_b)'),
+    // ── Diff sensitivity ──
+    threshold: z.number().min(0).max(1).optional().describe('Pixelmatch sensitivity 0–1 (default: 0.1). Lower = more sensitive to subtle differences.'),
+    // ── Viewport ──
+    width: z.number().int().min(1).max(3840).optional().describe('Viewport width in pixels (default: 1280)'),
+    height: z.number().int().min(1).max(2160).optional().describe('Viewport height in pixels (default: 720)'),
+    viewportDevice: z.string().optional().describe('Device preset for viewport emulation (e.g. "iphone_14_pro"). Use list_devices to see all presets.'),
+    viewportMobile: z.boolean().optional().describe('Enable mobile meta viewport emulation'),
+    viewportHasTouch: z.boolean().optional().describe('Enable touch event emulation'),
+    viewportLandscape: z.boolean().optional().describe('Landscape orientation'),
+    deviceScaleFactor: z.number().min(1).max(3).optional().describe('Device pixel ratio (default: 1)'),
+    // ── Capture region ──
+    fullPage: z.boolean().optional().describe('Capture the full scrollable page for both sides (default: false)'),
+    fullPageScroll: z.boolean().optional().describe('Auto-scroll pages before capture to trigger lazy-loaded images'),
+    fullPageScrollDelay: z.number().int().min(0).max(2000).optional().describe('Delay between scroll steps in ms (default: 400)'),
+    fullPageScrollBy: z.number().int().optional().describe('Pixels to scroll per step (default: viewport height)'),
+    fullPageMaxHeight: z.number().int().optional().describe('Maximum pixel height cap for full-page captures'),
+    selector: z.string().optional().describe('CSS selector — capture only this element on both pages'),
+    clip: z.object({
+      x: z.number(),
+      y: z.number(),
+      width: z.number(),
+      height: z.number(),
+    }).optional().describe('Crop region { x, y, width, height } in pixels'),
+    // ── Timing ──
+    delay: z.number().int().min(0).max(30000).optional().describe('Milliseconds to wait before capture on both pages (default: 0)'),
+    click: z.string().optional().describe('CSS selector to click before capturing on both pages'),
+    waitUntil: z.enum(['load', 'domcontentloaded', 'networkidle0', 'networkidle2']).optional().describe('When to consider navigation finished (default: networkidle2)'),
+    waitForSelector: z.string().optional().describe('Wait for this CSS selector to appear before capturing'),
+    navigationTimeout: z.number().int().min(0).max(30000).optional().describe('Navigation timeout in ms (default: 25000)'),
+    // ── Emulation ──
+    darkMode: z.boolean().optional().describe('Emulate dark color scheme (default: false)'),
+    reducedMotion: z.boolean().optional().describe('Emulate prefers-reduced-motion to disable animations'),
+    mediaType: z.enum(['screen', 'print']).optional().describe('Emulate CSS media type'),
+    timeZone: z.string().optional().describe('Override browser timezone (e.g. "America/New_York")'),
+    geolocation: z.object({
+      latitude: z.number(),
+      longitude: z.number(),
+      accuracy: z.number().optional(),
+    }).optional().describe('Emulate geolocation { latitude, longitude, accuracy? }'),
+    userAgent: z.string().optional().describe('Override the browser User-Agent string'),
+    // ── Auth & headers ──
+    cookies: z.array(cookieSchema).optional().describe('Cookies to set — array of "name=value" strings or { name, value, domain? } objects'),
+    headers: z.record(z.string(), z.string()).optional().describe('Extra HTTP headers to send with the request'),
+    authorization: z.string().optional().describe('Authorization header value (e.g. "Bearer <token>")'),
+    bypassCSP: z.boolean().optional().describe('Bypass Content-Security-Policy on the page'),
+    // ── Content manipulation ──
+    hideSelectors: z.array(z.string()).optional().describe('Array of CSS selectors to hide before capture'),
+    injectCss: z.string().optional().describe('Custom CSS to inject before capturing (max 50KB)'),
+    injectJs: z.string().optional().describe('Custom JavaScript to execute before capturing (max 50KB)'),
+    // ── Blocking ──
+    blockBanners: z.boolean().optional().describe('Hide cookie consent banners (default: false)'),
+    blockAds: z.boolean().optional().describe('Block advertisements on the page'),
+    blockChats: z.boolean().optional().describe('Block live chat widgets on the page'),
+    blockTrackers: z.boolean().optional().describe('Block tracking scripts on the page'),
+    blockRequests: z.array(z.string()).optional().describe('URL patterns to block (array of strings)'),
+    blockResources: z.array(z.string()).optional().describe('Resource types to block (e.g. ["image", "font"])'),
+  },
+  async (params) => {
+    if (!params.url_a && !params.html_a) {
+      return { content: [{ type: 'text', text: 'Error: One of "url_a" or "html_a" is required.' }], isError: true };
+    }
+    if (!params.url_b && !params.html_b) {
+      return { content: [{ type: 'text', text: 'Error: One of "url_b" or "html_b" is required.' }], isError: true };
+    }
+
+    try {
+      const res = await callApi('/api/v1/diff', {
+        method: 'POST',
+        body: params,
+      });
+
+      const data = await res.json();
+
+      const content = [
+        {
+          type: 'image',
+          data: data.diff_image.replace(/^data:image\/png;base64,/, ''),
+          mimeType: 'image/png',
+        },
+        {
+          type: 'text',
+          text: `Visual diff complete.\n` +
+            ` Changed: ${data.changed_pct}% (${data.changed_pixels.toLocaleString()} of ${data.total_pixels.toLocaleString()} pixels)\n` +
+            ` URL A: ${data.url_a || '(html)'}\n` +
+            ` URL B: ${data.url_b || '(html)'}\n` +
+            ` Duration: ${data.duration_ms}ms\n` +
+            (data.changed_pct === 0 ? ' Result: Pages are visually identical.' :
+             data.changed_pct < 1 ? ' Result: Minor visual differences detected.' :
+             data.changed_pct < 10 ? ' Result: Moderate visual differences detected.' :
+             ' Result: Significant visual differences detected.'),
+        },
+      ];
+
+      return { content };
+    } catch (err) {
+      return { content: [{ type: 'text', text: `Visual diff error: ${err.message}` }], isError: true };
+    }
+  }
+);
+
 // ═══════════════════════════════════════════════════════════════════
 // Tool: list_devices
 // ═══════════════════════════════════════════════════════════════════
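Two small pieces of the visual_diff handler added in the hunk above are easy to exercise in isolation: the check that each side has a URL or HTML source, and the removal of the PNG data-URI prefix before the image is returned as MCP content. The sketch below restates both as standalone functions; `validateDiffSources` and `stripPngDataUri` are hypothetical names, and the sample inputs are made up.

```javascript
// Hypothetical helper mirroring the source checks at the top of the visual_diff handler.
// Returns an error message, or null when both sides have a source.
function validateDiffSources(params) {
  if (!params.url_a && !params.html_a) return 'One of "url_a" or "html_a" is required.';
  if (!params.url_b && !params.html_b) return 'One of "url_b" or "html_b" is required.';
  return null;
}

// Same regular expression as the handler: drop a leading PNG data-URI prefix so that
// only the raw base64 payload remains. Strings without the prefix pass through unchanged.
function stripPngDataUri(value) {
  return value.replace(/^data:image\/png;base64,/, '');
}

// A URL on one side and raw HTML on the other is accepted by these checks.
const mixedSources = validateDiffSources({ url_a: 'https://example.com', html_b: '<h1>Hi</h1>' });
const rawBase64 = stripPngDataUri('data:image/png;base64,AAAA');
```

Because the prefix pattern is anchored and specific to PNG, a response that used a different image type would be passed through with its prefix intact.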
@@ -1191,6 +1477,9 @@ Based on the inspection and the description, plan 5–12 action steps. Rules:
 { "action": "wait", "ms": 1500, "live": true }
 - Do NOT pad with wait steps between steps that don't need load time — pace handles inter-step timing automatically.
 - Do NOT use zoom unless the user explicitly asked for it.
+- **Avoid opening dropdowns/menus/popovers** unless the demo is specifically about their contents — they stay open and obscure or misdirect later steps. Prefer navigating directly to the target URL (from the inspection) over clicking through a menu. The recording cannot re-check the page between steps, so a stuck-open overlay will break everything after it.
+- If a step DOES open an overlay, the next step must either act on an element inside it or close it. There is no key-press action; close with an evaluate step (max 2 per video):
+{ "action": "evaluate", "script": "document.activeElement&&document.activeElement.blur&&document.activeElement.blur();document.dispatchEvent(new KeyboardEvent('keydown',{key:'Escape',bubbles:true}));" }
 
 **Step 3 — Write the narration script**
 Write an audioGuide.script that matches the step count. Format: