@mcp-b/smart-dom-reader 0.0.0-beta-20260221154800

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 mcp-b contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,539 @@
+ # @mcp-b/smart-dom-reader
+
+ > Token-efficient DOM extraction for AI agents - Extract interactive elements, semantic structure, and stable CSS selectors for LLM-powered browser automation
+
+ [![npm version](https://img.shields.io/npm/v/@mcp-b/smart-dom-reader?style=flat-square)](https://www.npmjs.com/package/@mcp-b/smart-dom-reader)
+ [![npm downloads](https://img.shields.io/npm/dm/@mcp-b/smart-dom-reader?style=flat-square)](https://www.npmjs.com/package/@mcp-b/smart-dom-reader)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
+ [![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-blue?style=flat-square)](https://www.typescriptlang.org/)
+ [![Zero Dependencies](https://img.shields.io/badge/Dependencies-0-green?style=flat-square)](https://bundlephobia.com/package/@mcp-b/smart-dom-reader)
+
+ **[Full Documentation](https://docs.mcp-b.ai/packages/smart-dom-reader)** | **[Quick Start](https://docs.mcp-b.ai/quickstart)**
+
+ **@mcp-b/smart-dom-reader** extracts DOM structure optimized for AI/LLM consumption. Get stable CSS selectors, interactive elements, and semantic page structure while minimizing token usage. Perfect for AI-powered browser automation, userscript generation, and web scraping with Claude, ChatGPT, or any LLM.
+
+ ## Why Use @mcp-b/smart-dom-reader?
+
+ | Feature | Benefit |
+ |---------|---------|
+ | **Token-Efficient** | Progressive extraction minimizes LLM context window usage |
+ | **Stable Selectors** | Ranked CSS selectors (ID > data-testid > ARIA > classes) for reliable automation |
+ | **AI-Optimized Output** | Structured data designed for LLM understanding |
+ | **Zero Dependencies** | Lightweight, runs in any browser environment |
+ | **Shadow DOM Support** | Traverses shadow roots and, when enabled, iframes |
+ | **Stateless API** | Works with any document context - Puppeteer, Playwright, browser extensions |
+
+ ## Use Cases
+
+ - **AI Browser Automation**: Generate robust selectors for Puppeteer/Playwright scripts
+ - **Userscript Generation**: LLMs create browser automation scripts with stable selectors
+ - **Web Scraping**: Extract structured data with semantic context for AI processing
+ - **Test Automation**: Generate reliable test selectors with multiple fallback strategies
+ - **Accessibility Analysis**: Extract ARIA labels, roles, and semantic landmarks
+
+ ## Key Features
+
+ - **Two extraction approaches**: Progressive (step-by-step) and Full (single-pass)
+ - **Stateless architecture**: All functions accept document/element parameters
+ - **Multiple selector strategies**: CSS, XPath, text-based, data-testid
+ - **Smart content detection**: Automatically identifies main content areas
+ - **Context preservation**: Maintains element relationships and semantic context
+ - **Shadow DOM & iframe support**: Traverses complex DOM structures
+ - **Token-efficient**: Optimized for LLM context windows
+
+ ## Installation
+
+ ```bash
+ npm install @mcp-b/smart-dom-reader
+ ```
+
+ ## Two Extraction Approaches
+
+ ### 1. Full Extraction (SmartDOMReader)
+
+ **When to use:** You need all information upfront and have sufficient token budget for processing the complete output. Best for automation, testing, and scenarios where you know exactly what you need.
+
+ ```typescript
+ import { SmartDOMReader } from '@mcp-b/smart-dom-reader';
+
+ // Pass document explicitly - no window dependency
+ const doc = document; // or any Document object
+
+ // Interactive mode - extract only interactive elements
+ const interactiveData = SmartDOMReader.extractInteractive(doc);
+
+ // Full mode - extract interactive + semantic elements
+ const fullData = SmartDOMReader.extractFull(doc);
+
+ // Custom options
+ const customData = SmartDOMReader.extractInteractive(doc, {
+   mainContentOnly: true,
+   viewportOnly: true,
+   includeHidden: false
+ });
+ ```
+
+ ### 2. Progressive Extraction (ProgressiveExtractor)
+
+ **When to use:** Working with AI/LLMs where token efficiency is critical. Allows making intelligent decisions at each step rather than extracting everything upfront.
+
+ ```typescript
+ import { ProgressiveExtractor } from '@mcp-b/smart-dom-reader';
+
+ // Step 1: Get high-level page structure (minimal tokens)
+ // Structure can be extracted from the whole document or a specific container element
+ const structure = ProgressiveExtractor.extractStructure(document);
+ console.log(structure.summary); // Quick stats about the page
+ console.log(structure.regions); // Map of page regions
+ console.log(structure.suggestions); // AI-friendly hints
+
+ // Step 2: Extract details from specific region based on structure
+ const mainContent = ProgressiveExtractor.extractRegion(
+   structure.summary.mainContentSelector || 'main', // fall back when no main region was detected
+   document,
+   { mode: 'interactive' }
+ );
+
+ // Step 3: Extract readable content from a region
+ const articleText = ProgressiveExtractor.extractContent(
+   'article.main-article',
+   document,
+   { includeHeadings: true, includeLists: true }
+ );
+
+ // Structure scoped to a container (e.g., navigation only)
+ const nav = document.querySelector('nav');
+ if (nav) {
+   const navOutline = ProgressiveExtractor.extractStructure(nav);
+   // navOutline.regions will only include elements within <nav>
+ }
+ ```
+
+ ## Extraction Modes
+
+ ### Interactive Mode
+ Focuses on elements users can interact with:
+ - Buttons and button-like elements
+ - Links
+ - Form inputs (text, select, textarea)
+ - Clickable elements with handlers
+ - Form structures and associations
+
+ ### Full Mode
+ Includes everything from interactive mode plus:
+ - Semantic HTML elements (articles, sections, nav)
+ - Headings hierarchy
+ - Images with alt text
+ - Tables and lists
+ - Content structure and relationships
+
+ ## API Comparison
+
+ ### Full Extraction API
+
+ ```typescript
+ // Class-based with options
+ const reader = new SmartDOMReader({
+   mode: 'interactive',
+   mainContentOnly: true,
+   viewportOnly: false
+ });
+ const result = reader.extract(document);
+
+ // Static methods for convenience
+ SmartDOMReader.extractInteractive(document);
+ SmartDOMReader.extractFull(document);
+ SmartDOMReader.extractFromElement(element, 'interactive');
+ ```
+
+ ### Progressive Extraction API
+
+ ```typescript
+ // Step 1: Structure overview (Document or Element)
+ const overview = ProgressiveExtractor.extractStructure(document);
+ // Returns: regions, forms, summary, suggestions
+
+ // Step 2: Region extraction
+ const region = ProgressiveExtractor.extractRegion(
+   selector,
+   document,
+   options
+ );
+ // Returns: Full SmartDOMResult for that region
+
+ // Step 3: Content extraction
+ const content = ProgressiveExtractor.extractContent(
+   selector,
+   document,
+   { includeMedia: true }
+ );
+ // Returns: Text content, headings, lists, tables, media
+ ```
+
+ ## Output Structure
+
+ Both approaches return structured data optimized for AI processing:
+
+ ```typescript
+ interface SmartDOMResult {
+   mode: 'interactive' | 'full';
+   timestamp: number;
+
+   page: {
+     url: string;
+     title: string;
+     hasErrors: boolean;
+     isLoading: boolean;
+     hasModals: boolean;
+     hasFocus?: string;
+   };
+
+   landmarks: {
+     navigation: string[];
+     main: string[];
+     forms: string[];
+     headers: string[];
+     footers: string[];
+     articles: string[];
+     sections: string[];
+   };
+
+   interactive: {
+     buttons: ExtractedElement[];
+     links: ExtractedElement[];
+     inputs: ExtractedElement[];
+     forms: FormInfo[];
+     clickable: ExtractedElement[];
+   };
+
+   semantic?: { // Only in full mode
+     headings: ExtractedElement[];
+     images: ExtractedElement[];
+     tables: ExtractedElement[];
+     lists: ExtractedElement[];
+     articles: ExtractedElement[];
+   };
+
+   metadata?: { // Only in full mode
+     totalElements: number;
+     extractedElements: number;
+     mainContent?: string;
+     language?: string;
+   };
+ }
+ ```
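As a rough illustration of how a result of this shape might be condensed before it is handed to an LLM, the sketch below builds a one-line summary from a few fields. `ResultLike` and `summarize` are hypothetical names introduced here; only the field names are taken from the `SmartDOMResult` interface above, and the type is a trimmed local mirror rather than the library's own export.

```typescript
// Trimmed local mirror of SmartDOMResult (assumption: only the fields used below).
interface ResultLike {
  mode: 'interactive' | 'full';
  page: { url: string; title: string; hasModals: boolean };
  landmarks: { navigation: string[]; main: string[]; forms: string[] };
  interactive: { buttons: unknown[]; links: unknown[]; inputs: unknown[] };
}

// Hypothetical helper: collapse a result into a short, prompt-friendly line.
function summarize(result: ResultLike): string {
  const { page, landmarks, interactive } = result;
  const parts = [
    `"${page.title}" (${page.url})`,
    `${interactive.buttons.length} buttons`,
    `${interactive.links.length} links`,
    `${interactive.inputs.length} inputs`,
    `${landmarks.forms.length} forms`,
  ];
  if (page.hasModals) {
    parts.push('modal open');
  }
  return parts.join(', ');
}

// Hand-written sample data, not real extractor output.
const sampleResult: ResultLike = {
  mode: 'interactive',
  page: { url: 'https://example.com/login', title: 'Sign in', hasModals: false },
  landmarks: { navigation: ['nav'], main: ['main'], forms: ['#login-form'] },
  interactive: { buttons: [{}], links: [{}, {}], inputs: [{}, {}] },
};

const summaryLine = summarize(sampleResult);
console.log(summaryLine);
```

A summary like this could serve as the first message in a progressive flow, with region details requested only when the model asks for them.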
+
+ ## Element Information
+
+ Each extracted element includes comprehensive selector strategies with ranking (stable-first):
+
+ ```typescript
+ interface ExtractedElement {
+   tag: string;
+   text: string;
+
+   selector: {
+     css: string; // Best CSS selector (ranked stable-first)
+     xpath: string; // XPath selector
+     textBased?: string; // Text-content based hint
+     dataTestId?: string; // data-testid if available
+     ariaLabel?: string; // ARIA label if available
+     candidates?: Array<{
+       type: 'id' | 'data-testid' | 'role-aria' | 'name' | 'class-path' | 'css-path' | 'xpath' | 'text';
+       value: string;
+       score: number; // Higher = more stable/robust
+     }>;
+   };
+
+   attributes: Record<string, string>;
+
+   context: {
+     nearestForm?: string;
+     nearestSection?: string;
+     nearestMain?: string;
+     nearestNav?: string;
+     parentChain: string[];
+   };
+
+   // Compact flags: only present when true to save tokens
+   interaction: {
+     click?: boolean;
+     change?: boolean;
+     submit?: boolean;
+     nav?: boolean;
+     disabled?: boolean;
+     hidden?: boolean;
+     role?: string; // aria role when present
+     form?: string; // associated form selector
+   };
+ }
+ ```
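Because the `interaction` flags are only present when true, a missing flag can be read as false. The sketch below shows one way a consumer might use that to keep only elements a script can act on. `ElementLike` and `actionable` are hypothetical names for illustration; the type is a trimmed local mirror of `ExtractedElement`, not an export of the library.

```typescript
// Trimmed local mirror of ExtractedElement (assumption: only the fields used below).
interface ElementLike {
  tag: string;
  text: string;
  selector: { css: string };
  interaction: { click?: boolean; disabled?: boolean; hidden?: boolean };
}

// Hypothetical helper: drop elements flagged as disabled or hidden.
// An absent flag is undefined, which is falsy, so it counts as "not set".
function actionable(elements: ElementLike[]): ElementLike[] {
  return elements.filter((el) => !el.interaction.disabled && !el.interaction.hidden);
}

// Hand-written sample data, not real extractor output.
const sampleElements: ElementLike[] = [
  { tag: 'button', text: 'Save', selector: { css: '#save' }, interaction: { click: true } },
  { tag: 'button', text: 'Delete', selector: { css: '#delete' }, interaction: { click: true, disabled: true } },
  { tag: 'a', text: 'Skip', selector: { css: 'a.skip' }, interaction: { hidden: true } },
];

const usableSelectors = actionable(sampleElements).map((el) => el.selector.css);
console.log(usableSelectors);
```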
+
+ ## Options
+
+ | Option | Type | Default | Description |
+ |--------|------|---------|-------------|
+ | `mode` | `'interactive' \| 'full'` | `'interactive'` | Extraction mode |
+ | `maxDepth` | `number` | `5` | Maximum traversal depth |
+ | `includeHidden` | `boolean` | `false` | Include hidden elements |
+ | `includeShadowDOM` | `boolean` | `true` | Traverse shadow DOM |
+ | `includeIframes` | `boolean` | `false` | Traverse iframes |
+ | `viewportOnly` | `boolean` | `false` | Only visible viewport elements |
+ | `mainContentOnly` | `boolean` | `false` | Focus on main content area |
+ | `customSelectors` | `string[]` | `[]` | Additional selectors to extract |
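To make the defaults in the table concrete, here is a minimal sketch of how partial options could be merged over them. `ExtractionOptionsSketch`, `DEFAULT_OPTIONS_SKETCH`, and `resolveOptions` are hypothetical names; the library's own option type and merge logic are not shown in this README and may differ.

```typescript
// Local transcription of the options table (assumption: not the library's exported type).
interface ExtractionOptionsSketch {
  mode: 'interactive' | 'full';
  maxDepth: number;
  includeHidden: boolean;
  includeShadowDOM: boolean;
  includeIframes: boolean;
  viewportOnly: boolean;
  mainContentOnly: boolean;
  customSelectors: string[];
}

// Default values copied from the table above.
const DEFAULT_OPTIONS_SKETCH: ExtractionOptionsSketch = {
  mode: 'interactive',
  maxDepth: 5,
  includeHidden: false,
  includeShadowDOM: true,
  includeIframes: false,
  viewportOnly: false,
  mainContentOnly: false,
  customSelectors: [],
};

// Hypothetical helper: caller-supplied values win, everything else falls back to the default.
function resolveOptions(overrides: Partial<ExtractionOptionsSketch> = {}): ExtractionOptionsSketch {
  return {
    ...DEFAULT_OPTIONS_SKETCH,
    ...overrides,
    // Copy the array so callers never share the default instance.
    customSelectors: [...(overrides.customSelectors ?? DEFAULT_OPTIONS_SKETCH.customSelectors)],
  };
}

const resolved = resolveOptions({ mainContentOnly: true, includeIframes: true });
console.log(resolved);
```

Note that iframes are off by default while shadow DOM traversal is on, so only the former needs an explicit opt-in.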
+
+ ## Usage Examples
+
+ ### AI Userscript Generation (Progressive Approach)
+ ```typescript
+ // First, understand the page structure
+ const structure = ProgressiveExtractor.extractStructure(document);
+
+ // AI decides which region to focus on based on structure
+ const targetRegion = structure.regions.main?.selector || 'body';
+
+ // Extract detailed information from chosen region
+ const details = ProgressiveExtractor.extractRegion(
+   targetRegion,
+   document,
+   { mode: 'interactive', viewportOnly: true }
+ );
+
+ // Generate userscript prompt with focused context
+ const prompt = `
+ Page: ${details.page.title}
+ Main form: ${details.interactive.forms[0]?.selector}
+ Submit button: ${details.interactive.buttons.find(b => b.text.includes('Submit'))?.selector.css}
+
+ Write a userscript to auto-fill and submit this form.
+ `;
+ ```
+
+ ### Test Automation (Full Extraction)
+ ```typescript
+ // Get all interactive elements at once
+ const testData = SmartDOMReader.extractInteractive(document, {
+   customSelectors: ['[data-test]', '[data-cy]']
+ });
+
+ // Use multiple selector strategies for robust testing
+ testData.interactive.buttons.forEach(button => {
+   console.log(`Button: ${button.text}`);
+   console.log(`  CSS: ${button.selector.css}`);
+   console.log(`  XPath: ${button.selector.xpath}`);
+   console.log(`  TestID: ${button.selector.dataTestId}`);
+   console.log(`  Ranked candidates:`, button.selector.candidates?.slice(0, 3));
+ });
+ ```
+
+ ### Content Analysis (Progressive Approach)
+ ```typescript
+ // Get structure first
+ const structure = ProgressiveExtractor.extractStructure(document);
+
+ // Extract readable content from main area
+ const content = ProgressiveExtractor.extractContent(
+   structure.summary.mainContentSelector || 'main',
+   document,
+   { includeHeadings: true, includeTables: true }
+ );
+
+ console.log(`Word count: ${content.metadata.wordCount}`);
+ console.log(`Headings: ${content.text.headings?.length}`);
+ console.log(`Has interactive elements: ${content.metadata.hasInteractive}`);
+ ```
+
+ ## Stateless Architecture
+
+ All methods are stateless and accept document/element parameters explicitly:
+
+ ```typescript
+ // No window or document globals required
+ function extractFromIframe(iframe: HTMLIFrameElement) {
+   const iframeDoc = iframe.contentDocument;
+   if (iframeDoc) {
+     return SmartDOMReader.extractInteractive(iframeDoc);
+   }
+ }
+
+ // Works with any document context
+ function extractFromShadowRoot(shadowRoot: ShadowRoot) {
+   const container = shadowRoot.querySelector('.container');
+   if (container) {
+     return SmartDOMReader.extractFromElement(container);
+   }
+ }
+
+ /**
+  * Stateless bundle string (for extensions / userScripts)
+  *
+  * The library also provides a self-contained IIFE bundle as a string
+  * export that can be injected and executed without touching window scope.
+  */
+ import { SMART_DOM_READER_BUNDLE } from '@mcp-b/smart-dom-reader/bundle-string';
+
+ // e.g. execute('extractStructure', { selector: undefined, formatOptions: { detail: 'summary' } })
+ function execute(method: string, args: unknown) {
+   const code = `(() => {\n${SMART_DOM_READER_BUNDLE}\nreturn SmartDOMReaderBundle.executeExtraction(${JSON.stringify(
+     method
+   )}, ${JSON.stringify(args)});\n})()`;
+   // inject `code` into the page (e.g., chrome.userScripts.execute)
+   return code;
+ }
+
+ // Note: The bundle contains guarded fallbacks (e.g., typeof require === 'function')
+ // that are no-ops in the browser; there are no runtime imports.
+ ```
+
+ ## Design Philosophy
+
+ This library is designed to provide:
+
+ 1. **Token Efficiency**: Progressive extraction minimizes token usage for AI applications
+ 2. **Flexibility**: Choose between complete extraction or step-by-step approach
+ 3. **Statelessness**: No global dependencies, works in any JavaScript environment
+ 4. **Multiple Selector Strategies**: Robust element targeting with fallbacks
+ 5. **Semantic Understanding**: Preserves meaning and relationships
+ 6. **Interactive Focus**: Prioritizes elements users interact with
+ 7. **Context Preservation**: Maintains element relationships
+ 8. **Framework Agnostic**: Works with any web application
+
+ ## Frequently Asked Questions
+
+ ### How is this different from Cheerio or jsdom?
+
+ This library is **AI-optimized**:
+ - Outputs structured data designed for LLM consumption
+ - Provides ranked selectors with stability scores
+ - Progressive extraction minimizes token usage
+ - Preserves semantic context (landmarks, forms, interactivity)
+
+ ### Can I use this with Puppeteer/Playwright?
+
+ Yes! The library is stateless - pass any `document` object:
+
+ ```typescript
+ const page = await browser.newPage();
+ await page.goto('https://example.com');
+ // page.evaluate runs in the browser, so SmartDOMReader must already be
+ // available in the page context (e.g., injected via the bundle string)
+ const result = await page.evaluate(() => {
+   return SmartDOMReader.extractInteractive(document);
+ });
+ ```
+
+ ### How do selector rankings work?
+
+ Selectors are ranked by stability (higher = more reliable):
+ 1. **ID selectors** (score: 100) - `#unique-id`
+ 2. **data-testid** (score: 90) - `[data-testid="submit"]`
+ 3. **ARIA** (score: 80) - `[role="button"][aria-label="Submit"]`
+ 4. **Name/ID attributes** (score: 70) - `input[name="email"]`
+ 5. **Class paths** (score: 50) - `.form-container .submit-btn`
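The ranking above can be applied on the consumer side by sorting an element's `candidates` by `score` and taking the first one that clears a threshold. This is a minimal sketch of that idea, not the library's internal ranking code; `SelectorCandidate`, `rankCandidates`, and `bestSelector` are hypothetical names, and the scores in the sample are the ones quoted in the list.

```typescript
// Local mirror of one entry in selector.candidates (assumption: not a library export).
interface SelectorCandidate {
  type: 'id' | 'data-testid' | 'role-aria' | 'name' | 'class-path' | 'css-path' | 'xpath' | 'text';
  value: string;
  score: number; // higher = more stable
}

// Sort a copy so the caller's array order is left untouched.
function rankCandidates(candidates: SelectorCandidate[]): SelectorCandidate[] {
  return [...candidates].sort((a, b) => b.score - a.score);
}

// Hypothetical helper: most stable selector at or above minScore, if any.
function bestSelector(candidates: SelectorCandidate[], minScore = 0): string | undefined {
  const top = rankCandidates(candidates).find((c) => c.score >= minScore);
  return top ? top.value : undefined;
}

// Hand-written sample: an element without an id, so data-testid should win.
const sampleCandidates: SelectorCandidate[] = [
  { type: 'class-path', value: '.form-container .submit-btn', score: 50 },
  { type: 'data-testid', value: '[data-testid="submit"]', score: 90 },
  { type: 'role-aria', value: '[role="button"][aria-label="Submit"]', score: 80 },
];

const chosen = bestSelector(sampleCandidates);
const strictOnly = bestSelector(sampleCandidates, 95);
console.log(chosen, strictOnly);
```

Passing a threshold such as 95 makes the helper return `undefined` rather than a fragile selector, which a caller could treat as a signal to re-extract with more detail.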
+
+ ### Does it handle Shadow DOM?
+
+ Yes! Shadow roots are traversed by default (`includeShadowDOM` defaults to `true`); set it to `false` to skip them.
+
+ ### What's the token overhead vs raw HTML?
+
+ Progressive extraction can reduce token usage by 80-95% compared to raw HTML, depending on the page and what you extract.
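To check a figure like this on your own pages, the reduction is simply the share of tokens saved relative to the raw HTML. The sketch below assumes a crude "about four characters per token" estimate, which is a common rule of thumb and not something this package provides; `estimateTokens` and `reductionPercent` are hypothetical helpers, and the numbers in the example are illustrative, not measurements.

```typescript
// Crude token estimate (assumption: ~4 characters per token; use a real tokenizer for accuracy).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Percentage of tokens saved, rounded to one decimal place.
function reductionPercent(rawTokens: number, extractedTokens: number): number {
  if (rawTokens <= 0) {
    throw new Error('rawTokens must be positive');
  }
  return Math.round(((rawTokens - extractedTokens) * 1000) / rawTokens) / 10;
}

// Illustrative numbers only: 20,000 tokens of raw HTML versus a 2,000-token extraction.
const saved = reductionPercent(20000, 2000);
console.log(`${saved}% fewer tokens`);
```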
+
+ ## Comparison with Alternatives
+
+ | Feature | @mcp-b/smart-dom-reader | Cheerio | jsdom | Raw DOM |
+ |---------|-------------------------|---------|-------|---------|
+ | AI-Optimized Output | Yes | No | No | No |
+ | Ranked Selectors | Yes | No | No | No |
+ | Token Efficiency | Progressive | N/A | N/A | N/A |
+ | Shadow DOM | Yes | No | Limited | Yes |
+ | Browser Environment | Native | Parse only | Simulated | Native |
+ | Zero Dependencies | Yes | No | No | Yes |
+
+ ## Credits
+
+ Inspired by:
+ - [stacking-contexts-inspector](https://github.com/andreadev-it/stacking-contexts-inspector) - DOM traversal techniques
+ - [dom-to-semantic-markdown](https://github.com/romansky/dom-to-semantic-markdown) - Content scoring algorithms
+ - [z-context](https://github.com/gwwar/z-context) - Selector generation approaches
+
+ ## Related Packages
+
+ - [`@mcp-b/global`](https://docs.mcp-b.ai/packages/global) - W3C Web Model Context API polyfill
+ - [`@mcp-b/transports`](https://docs.mcp-b.ai/packages/transports) - Browser-specific MCP transports
+ - [`@mcp-b/chrome-devtools-mcp`](https://docs.mcp-b.ai/packages/chrome-devtools-mcp) - Connect desktop AI agents to browser
+ - [`@modelcontextprotocol/sdk`](https://www.npmjs.com/package/@modelcontextprotocol/sdk) - Official MCP SDK
+
+ ## Resources
+
+ - [WebMCP Documentation](https://docs.mcp-b.ai)
+ - [Model Context Protocol Spec](https://modelcontextprotocol.io)
+ - [MCP GitHub Repository](https://github.com/modelcontextprotocol)
+
+ ## License
+
+ MIT - see [LICENSE](../../LICENSE) for details
+
+ ## Support
+
+ - [GitHub Issues](https://github.com/WebMCP-org/npm-packages/issues)
+ - [Documentation](https://docs.mcp-b.ai)
+ - [Discord Community](https://discord.gg/a9fBR6Bw)
+
+ ## MCP Server (Golden Path)
+
+ For AI agents, use the bundled MCP server, which returns XML-wrapped Markdown instead of JSON. This keeps responses concise and readable for LLMs while providing clear structural boundaries.
+
+ - Output format: always an XML envelope with a single section tag containing Markdown in CDATA
+   - Structure: `<page title="..." url="...">\n <outline><![CDATA[ ...markdown... ]]></outline>\n</page>`
+   - Region: `<page ...>\n <section><![CDATA[ ...markdown... ]]></section>\n</page>`
+   - Content: `<page ...>\n <content><![CDATA[ ...markdown... ]]></content>\n</page>`
+ - Golden path sequence:
+   1) `dom_extract_structure` → get page outline and pick a target
+   2) `dom_extract_region` → get actionable selectors for that area
+   3) Write a script; if unstable, re-run with higher detail or limits
+   4) Optional: `dom_extract_content` for readable text context
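For clients that want the Markdown without the envelope, the three shapes listed above can be unwrapped with a small parser. The sketch below uses a regular expression and assumes the envelope looks exactly as described: one `<page>` element, one inner tag, attribute values without embedded double quotes or `>`, and no `]]>` inside the Markdown. `ParsedEnvelope` and `parseEnvelope` are hypothetical names; a real XML parser would be the safer choice for anything beyond a quick script.

```typescript
interface ParsedEnvelope {
  title: string | undefined;
  url: string | undefined;
  kind: 'outline' | 'section' | 'content';
  markdown: string;
}

// Hypothetical helper: pull the page attributes and the CDATA Markdown out of one envelope.
function parseEnvelope(xml: string): ParsedEnvelope | undefined {
  const envelope =
    /<page\b([^>]*)>\s*<(outline|section|content)><!\[CDATA\[([\s\S]*?)\]\]><\/\2>\s*<\/page>/.exec(xml);
  if (!envelope) {
    return undefined;
  }
  const attrs = envelope[1];
  const kind = envelope[2] as ParsedEnvelope['kind'];
  const body = envelope[3];

  // Read a double-quoted attribute value from the <page ...> tag, if present.
  const readAttr = (name: string): string | undefined => {
    const found = new RegExp(`\\b${name}="([^"]*)"`).exec(attrs);
    return found ? found[1] : undefined;
  };

  return { title: readAttr('title'), url: readAttr('url'), kind, markdown: body.trim() };
}

// Hand-written sample in the "Structure" shape, not real server output.
const sampleXml =
  '<page title="Example" url="https://example.com">\n <outline><![CDATA[ # Heading\n- item ]]></outline>\n</page>';

const parsed = parseEnvelope(sampleXml);
console.log(parsed);
```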
+
+ ### Running the server
+
+ Ensure the library is built so the formatter is available:
+
+ ```
+ pnpm -w --filter @mcp-b/smart-dom-reader run build
+ ```
+
+ Build and update the embedded bundle, then start the MCP server (stdio):
+
+ ```
+ pnpm --filter @mcp-b/smart-dom-reader bundle:mcp
+ pnpm --filter @mcp-b/smart-dom-reader-server run start
+ ```
+
+ Or directly with tsx:
+
+ ```
+ tsx smart-dom-reader/mcp-server/src/index.ts
+ ```
+
+ ### Tool overview (inputs only)
+
+ - `browser_connect` → `{ headless?: boolean, executablePath?: string }`
+ - `browser_navigate` → `{ url: string }`
+ - `dom_extract_structure` → `{ selector?: string, detail?: 'summary'|'region'|'deep', maxTextLength?: number, maxElements?: number }`
+ - `dom_extract_region` → `{ selector: string, options?: { mode?: 'interactive'|'full', includeHidden?: boolean, maxDepth?: number, detail?: 'summary'|'region'|'deep', maxTextLength?: number, maxElements?: number } }`
+ - `dom_extract_content` → `{ selector: string, options?: { includeHeadings?: boolean, includeLists?: boolean, includeMedia?: boolean, maxTextLength?: number, detail?: 'summary'|'region'|'deep', maxElements?: number } }`
+ - `dom_extract_interactive` → `{ selector?: string, options?: { viewportOnly?: boolean, maxDepth?: number, detail?: 'summary'|'region'|'deep', maxTextLength?: number, maxElements?: number } }`
+ - `browser_screenshot` → `{ path?: string, fullPage?: boolean }`
+ - `browser_close` → `{}`
+
+ All extraction tools return XML-wrapped Markdown with a short “Next:” instruction at the bottom to guide the following step.
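The input shapes above translate directly into types on the client side. The sketch below transcribes the `dom_extract_region` input and wraps it in a `tools/call` request as defined by the Model Context Protocol's JSON-RPC framing. `ExtractRegionInput` and `toolCall` are hypothetical names; transport setup (stdio) and response handling are left out.

```typescript
type Detail = 'summary' | 'region' | 'deep';

// Transcription of the dom_extract_region input listed above.
interface ExtractRegionInput {
  selector: string;
  options?: {
    mode?: 'interactive' | 'full';
    includeHidden?: boolean;
    maxDepth?: number;
    detail?: Detail;
    maxTextLength?: number;
    maxElements?: number;
  };
}

// Hypothetical helper: build an MCP "tools/call" JSON-RPC request for a named tool.
function toolCall<T>(id: number, name: string, args: T) {
  return {
    jsonrpc: '2.0' as const,
    id,
    method: 'tools/call' as const,
    params: { name, arguments: args },
  };
}

// Example arguments for step 2 of the golden path (values are illustrative).
const regionInput: ExtractRegionInput = {
  selector: 'main',
  options: { mode: 'interactive', detail: 'region', maxElements: 50 },
};

const request = toolCall(1, 'dom_extract_region', regionInput);
console.log(JSON.stringify(request, null, 2));
```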
+
+ ## Local Testing (Playwright)
+
+ Run the library in a real browser against local HTML (no network):
+
+ ```
+ pnpm --filter @mcp-b/smart-dom-reader bundle:mcp
+ pnpm --filter @mcp-b/smart-dom-reader test:local
+ ```
+
+ What it validates:
+ - Stable selectors (ID, data-testid, role+aria, name/id)
+ - Semantic extraction (headings/images/tables/lists)
+ - Shadow DOM detection