design-clone 1.1.1 → 2.1.0

Files changed (70)
  1. package/README.md +42 -20
  2. package/SKILL.md +74 -0
  3. package/bin/commands/clone-site.js +75 -10
  4. package/bin/commands/init.js +33 -1
  5. package/bin/commands/verify.js +5 -3
  6. package/bin/utils/validate.js +24 -8
  7. package/docs/cli-reference.md +224 -2
  8. package/docs/codebase-summary.md +309 -0
  9. package/docs/design-clone-architecture.md +290 -45
  10. package/docs/pixel-perfect.md +35 -4
  11. package/docs/project-roadmap.md +382 -0
  12. package/docs/troubleshooting.md +5 -4
  13. package/package.json +12 -6
  14. package/src/ai/__pycache__/analyze-structure.cpython-313.pyc +0 -0
  15. package/src/ai/__pycache__/extract-design-tokens.cpython-313.pyc +0 -0
  16. package/src/ai/analyze-structure.py +73 -3
  17. package/src/ai/extract-design-tokens.py +356 -13
  18. package/src/ai/prompts/__pycache__/__init__.cpython-313.pyc +0 -0
  19. package/src/ai/prompts/__pycache__/design_tokens.cpython-313.pyc +0 -0
  20. package/src/ai/prompts/__pycache__/structure_analysis.cpython-313.pyc +0 -0
  21. package/src/ai/prompts/__pycache__/ux_audit.cpython-313.pyc +0 -0
  22. package/src/ai/prompts/design_tokens.py +133 -0
  23. package/src/ai/prompts/structure_analysis.py +329 -10
  24. package/src/ai/prompts/ux_audit.py +198 -0
  25. package/src/ai/ux-audit.js +596 -0
  26. package/src/core/animation-extractor.js +526 -0
  27. package/src/core/app-state-snapshot.js +511 -0
  28. package/src/core/content-counter.js +342 -0
  29. package/src/core/cookie-handler.js +1 -1
  30. package/src/core/css-extractor.js +4 -4
  31. package/src/core/dimension-extractor.js +93 -21
  32. package/src/core/dimension-output.js +103 -6
  33. package/src/core/discover-pages.js +242 -14
  34. package/src/core/dom-tree-analyzer.js +298 -0
  35. package/src/core/extract-assets.js +1 -1
  36. package/src/core/framework-detector.js +538 -0
  37. package/src/core/html-extractor.js +45 -4
  38. package/src/core/lazy-loader.js +7 -7
  39. package/src/core/multi-page-screenshot.js +9 -6
  40. package/src/core/page-readiness.js +8 -8
  41. package/src/core/screenshot.js +311 -7
  42. package/src/core/section-cropper.js +209 -0
  43. package/src/core/section-detector.js +386 -0
  44. package/src/core/semantic-enhancer.js +492 -0
  45. package/src/core/state-capture.js +598 -0
  46. package/src/core/tests/test-section-cropper.js +177 -0
  47. package/src/core/tests/test-section-detector.js +55 -0
  48. package/src/core/video-capture.js +546 -0
  49. package/src/route-discoverers/angular-discoverer.js +157 -0
  50. package/src/route-discoverers/astro-discoverer.js +123 -0
  51. package/src/route-discoverers/base-discoverer.js +242 -0
  52. package/src/route-discoverers/index.js +106 -0
  53. package/src/route-discoverers/next-discoverer.js +130 -0
  54. package/src/route-discoverers/nuxt-discoverer.js +138 -0
  55. package/src/route-discoverers/react-discoverer.js +139 -0
  56. package/src/route-discoverers/svelte-discoverer.js +109 -0
  57. package/src/route-discoverers/universal-discoverer.js +227 -0
  58. package/src/route-discoverers/vue-discoverer.js +118 -0
  59. package/src/utils/__init__.py +1 -1
  60. package/src/utils/__pycache__/__init__.cpython-313.pyc +0 -0
  61. package/src/utils/__pycache__/env.cpython-313.pyc +0 -0
  62. package/src/utils/browser.js +11 -37
  63. package/src/utils/playwright.js +213 -0
  64. package/src/verification/generate-audit-report.js +398 -0
  65. package/src/verification/verify-footer.js +493 -0
  66. package/src/verification/verify-header.js +486 -0
  67. package/src/verification/verify-layout.js +2 -2
  68. package/src/verification/verify-menu.js +4 -20
  69. package/src/verification/verify-slider.js +533 -0
  70. package/src/utils/puppeteer.js +0 -281
@@ -22,10 +22,52 @@ node src/core/screenshot.js [options]
22
22
  | --close | bool | false | Close browser after capture (false keeps session) |
23
23
  | --extract-html | bool | false | Extract cleaned HTML |
24
24
  | --extract-css | bool | false | Extract all CSS from page |
25
+ | --extract-animations | bool | true* | Extract @keyframes and transitions (enabled with --extract-css) |
25
26
  | --filter-unused | bool | true | Filter CSS to remove unused selectors |
27
+ | --capture-hover | bool | false | Capture hover state screenshots and generate :hover CSS |
28
+ | --no-semantic | bool | false | Disable WordPress semantic HTML enhancement (Phase 3) |
26
29
  | --verbose | bool | false | Verbose logging |
27
30
 
28
- **Output**: JSON with screenshot paths and metadata. Includes `browserRestarts` count tracking for stability monitoring.
31
+ *Defaults to true when `--extract-css` is enabled; disable it with `--extract-animations false`.
32
+
33
+ **Output**: JSON with screenshot paths and metadata, including a `browserRestarts` count for stability monitoring. When `--capture-hover` is enabled, the output also includes hover state results. When semantic enhancement is enabled (the default), the output includes `semanticStats` with enhancement details (sections enhanced; IDs, classes, and roles added).
34
+
35
+ ### Semantic HTML Enhancement (Phase 3)
36
+
37
+ Semantic HTML enhancement is enabled by default when extracting HTML. It injects WordPress-compatible semantic IDs, classes, and ARIA roles into the extracted HTML.
38
+
39
+ **What's Added**:
40
+ - **IDs**: `site-header`, `main-content`, `site-footer`, `site-navigation`, `primary-sidebar`, `hero-section`
41
+ - **Classes**: `site-header`, `main-navigation`, `nav-menu`, `site-main`, `content-area`, `widget-area`, `sidebar`, `site-footer`, `hero`
42
+ - **ARIA Roles**: `banner` (header), `navigation` (nav), `main`, `complementary` (sidebar), `contentinfo` (footer)
43
+
44
+ **Detection Priority**:
45
+ 1. Semantic HTML tags (`<header>`, `<nav>`, `<main>`, `<aside>`, `<footer>`)
46
+ 2. ARIA role attributes (`banner`, `navigation`, `main`, `complementary`, `contentinfo`)
47
+ 3. Class patterns (header, nav, main, sidebar, footer, hero)
48
+
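The detection order above can be sketched as a small function. This is a minimal illustration, not the package's implementation: `detectSectionTypeSketch`, the lookup tables, and the plain-object stand-in for a DOM element are assumptions, and the substring match on class names is a simplification.

```javascript
// Hypothetical sketch of the three-tier detection order (tags, then roles, then classes).
// Plain objects ({ tag, role, className }) stand in for DOM elements so this runs in Node.
const TAG_TO_SECTION = new Map([
  ['header', 'header'], ['nav', 'nav'], ['main', 'main'],
  ['aside', 'sidebar'], ['footer', 'footer'],
]);
const ROLE_TO_SECTION = new Map([
  ['banner', 'header'], ['navigation', 'nav'], ['main', 'main'],
  ['complementary', 'sidebar'], ['contentinfo', 'footer'],
]);
const CLASS_PATTERNS = ['header', 'nav', 'main', 'sidebar', 'footer', 'hero'];

function detectSectionTypeSketch(el) {
  // Priority 1: semantic HTML tag.
  const tag = String(el.tag || '').toLowerCase();
  if (TAG_TO_SECTION.has(tag)) return TAG_TO_SECTION.get(tag);

  // Priority 2: ARIA role attribute.
  const role = String(el.role || '').toLowerCase();
  if (ROLE_TO_SECTION.has(role)) return ROLE_TO_SECTION.get(role);

  // Priority 3: class name patterns (assumed here to be a substring match).
  const classes = String(el.className || '').toLowerCase().split(/\s+/).filter(Boolean);
  for (const pattern of CLASS_PATTERNS) {
    if (classes.some((cls) => cls.includes(pattern))) return pattern;
  }
  return null; // no section type detected
}
```

Because the tag check runs first, an element such as `{ tag: 'nav', role: 'main' }` resolves to `nav` in this sketch; a role only matters when the tag is not a landmark tag.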
49
+ **Usage**:
50
+ ```bash
51
+ # Enable semantic enhancement (default)
52
+ node src/core/screenshot.js --url https://example.com --output ./out --extract-html
53
+
54
+ # Disable semantic enhancement
55
+ node src/core/screenshot.js --url https://example.com --output ./out --extract-html --no-semantic
56
+ ```
57
+
58
+ **Example Output Metadata**:
59
+ ```json
60
+ {
61
+ "html": "path/to/source.html",
62
+ "semanticStats": {
63
+ "sectionsEnhanced": 5,
64
+ "idsAdded": 3,
65
+ "classesAdded": 4,
66
+ "rolesAdded": 2,
67
+ "warnings": []
68
+ }
69
+ }
70
+ ```
29
71
 
30
72
  ## filter-css.js
31
73
 
@@ -82,9 +124,62 @@ node src/core/extract-assets.js --url URL --output DIR
82
124
  Validate navigation structure.
83
125
 
84
126
  ```bash
85
- node src/verification/verify-menu.js --html FILE
127
+ node src/verification/verify-menu.js --html FILE [--url URL] [--output DIR] [--verbose]
128
+ ```
129
+
130
+ | Option | Description |
131
+ |--------|-------------|
132
+ | --html | Path to HTML file |
133
+ | --url | URL to test (alternative to --html) |
134
+ | --output | Output directory for screenshots |
135
+ | --verbose | Show detailed progress |
136
+
137
+ ## verify-header.js
138
+
139
+ Verify header components (Phase 1).
140
+
141
+ ```bash
142
+ node src/verification/verify-header.js --html FILE [--url URL] [--output DIR] [--verbose]
143
+ ```
144
+
145
+ Tests: logo presence, navigation visibility, CTA buttons, sticky/fixed behavior, z-index layering, height consistency.
146
+
147
+ ## verify-footer.js
148
+
149
+ Verify footer components (Phase 1).
150
+
151
+ ```bash
152
+ node src/verification/verify-footer.js --html FILE [--url URL] [--output DIR] [--verbose]
153
+ ```
154
+
155
+ Tests: position at bottom, multi-column layout, link sections, copyright text, social icons, background contrast.
156
+
157
+ ## verify-slider.js
158
+
159
+ Verify slider/carousel components (Phase 1).
160
+
161
+ ```bash
162
+ node src/verification/verify-slider.js --html FILE [--url URL] [--output DIR] [--verbose]
86
163
  ```
87
164
 
165
+ Tests: library detection (Swiper, Slick, Owl, native), navigation arrows, pagination dots, autoplay behavior, current slide indicator.
166
+
167
+ ## generate-audit-report.js
168
+
169
+ Aggregate verification results into a consolidated report (Phase 1).
170
+
171
+ ```bash
172
+ node src/verification/generate-audit-report.js --dir DIR [--output FILE] [--verbose]
173
+ ```
174
+
175
+ | Option | Description |
176
+ |--------|-------------|
177
+ | --dir | Directory containing verification JSON results |
178
+ | --output | Output path for report (default: component-audit.md) |
179
+ | --verbose | Show detailed progress |
180
+
181
+ Output: Markdown report with summary table, side-by-side screenshots, responsive analysis, CSS suggestions.
182
+
88
183
  ## verify-layout.js
89
184
 
90
185
  Verify layout consistency.
@@ -92,3 +187,130 @@ Verify layout consistency.
92
187
  ```bash
93
188
  node src/verification/verify-layout.js --html FILE
94
189
  ```
190
+
191
+ ## state-capture.js
192
+
193
+ Capture hover states for interactive elements.
194
+
195
+ Used internally by screenshot.js with `--capture-hover` flag.
196
+
197
+ | Export | Description |
198
+ |--------|-------------|
199
+ | `captureAllHoverStates(page, cssString, outputDir)` | Detect interactive elements and capture normal/hover screenshots |
200
+ | `captureHoverState(page, selector, outputDir, index)` | Capture hover state for a single element |
201
+ | `generateHoverCss(results)` | Generate `:hover` CSS rules from captured style diffs |
202
+ | `detectInteractiveElements(page, cssString)` | Detect interactive elements via CSS analysis and DOM query |
203
+
204
+ **Key Features:**
205
+ - Dual detection: CSS-based (`:hover` selectors) and DOM-based (interactive elements, transitions)
206
+ - Per-element style diff capture (backgroundColor, color, transform, boxShadow, etc.)
207
+ - Automatic screenshot pair generation (normal + hover states)
208
+ - CSS rule generation from detected style changes
209
+ - Validates selectors and skips hidden/invisible elements
210
+
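The "CSS rule generation from detected style changes" step might look roughly like the sketch below. The shape of `results` (a `selector` plus a `styleDiff` map of camelCase properties to hover values) is an assumption made for illustration; the real `generateHoverCss(results)` input format is not shown in this diff.

```javascript
// Hypothetical sketch: turn per-element style diffs into :hover rules.
// Assumed input shape: [{ selector: '.btn', styleDiff: { backgroundColor: '...' } }]

// Convert a camelCase style property (boxShadow) to its CSS name (box-shadow).
function toKebabCase(prop) {
  return prop.replace(/[A-Z]/g, (ch) => `-${ch.toLowerCase()}`);
}

function hoverCssSketch(results) {
  return results
    // Skip entries with no selector or no detected style change.
    .filter((r) => r.selector && Object.keys(r.styleDiff || {}).length > 0)
    .map((r) => {
      const declarations = Object.entries(r.styleDiff)
        .map(([prop, value]) => `  ${toKebabCase(prop)}: ${value};`)
        .join('\n');
      return `${r.selector}:hover {\n${declarations}\n}`;
    })
    .join('\n\n');
}
```

Elements whose styles did not change on hover produce no rule, which keeps the generated stylesheet limited to observed differences.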
211
+ ## ux-audit.js
212
+
213
+ UX quality assessment using Gemini Vision AI (Phase 2).
214
+
215
+ ```bash
216
+ node src/ai/ux-audit.js --screenshots <dir> [--output <dir>] [--url <url>] [--verbose]
217
+ ```
218
+
219
+ | Option | Required | Description |
220
+ |--------|----------|-------------|
221
+ | --screenshots | yes | Directory containing viewport screenshots (desktop.png, tablet.png, mobile.png) |
222
+ | --output | no | Output directory for report and JSON results (default: same as screenshots) |
223
+ | --url | no | Original URL (for report metadata) |
224
+ | --verbose | no | Show detailed progress |
225
+
226
+ **Requires**: GEMINI_API_KEY or GOOGLE_API_KEY environment variable
227
+
228
+ **Output**:
229
+ - `ux-audit.md`: Markdown report with scores, issues, and recommendations
230
+ - `ux-audit.json`: Structured results (aggregated scores, viewport breakdown, issues, recommendations)
231
+
232
+ **Evaluation Categories** (0-100 score each):
233
+ 1. Visual Hierarchy - Content prominence, scanning patterns, call-to-action visibility
234
+ 2. Navigation - Touch targets, menu discoverability, current page indicator
235
+ 3. Typography - Text size, line height, contrast ratio, readability
236
+ 4. Spacing - Padding/margins, element breathing room, touch target spacing
237
+ 5. Interactive Elements - Button affordance, link distinguishability, focus states
238
+ 6. Responsive - Content reflow, no horizontal scroll, text truncation, breakpoint transitions
239
+
240
+ **Viewport Analysis**: Evaluates all three viewports (desktop: 1920×1080, tablet: 768×1024, mobile: 375×812) and generates weighted scores (desktop 40%, tablet 30%, mobile 30%).
241
+
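As a worked example of the weighting described above, the following sketch combines per-viewport scores using the 40/30/30 split. The function name, the input shape, and the rounding to a whole number are assumptions; the diff does not show how the package rounds or validates scores.

```javascript
// Hypothetical sketch of the weighted viewport score (desktop 40%, tablet 30%, mobile 30%).
const VIEWPORT_WEIGHTS = { desktop: 0.4, tablet: 0.3, mobile: 0.3 };

function weightedScoreSketch(scores) {
  let total = 0;
  for (const [viewport, weight] of Object.entries(VIEWPORT_WEIGHTS)) {
    const score = scores[viewport];
    if (typeof score !== 'number' || score < 0 || score > 100) {
      throw new RangeError(`missing or out-of-range score for ${viewport}`);
    }
    total += score * weight;
  }
  // Rounding to an integer is an assumption made for readability.
  return Math.round(total);
}

// 80 * 0.4 + 70 * 0.3 + 60 * 0.3 = 32 + 21 + 18 = 71
const exampleScore = weightedScoreSketch({ desktop: 80, tablet: 70, mobile: 60 });
```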
242
+ **Issue Severity Levels**:
243
+ - Critical (0-30 score): Blocks tasks or causes confusion
244
+ - Major (31-60 score): Degrades experience significantly
245
+ - Minor (61-80 score): Polish improvements
246
+
247
+ **Scoring Scale**:
248
+ - 90-100: Excellent, industry-leading UX
249
+ - 70-89: Good, meets modern standards
250
+ - 50-69: Adequate, room for improvement
251
+ - 30-49: Poor, significant issues
252
+ - 0-29: Critical, requires immediate attention
253
+
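The scoring scale above maps directly onto a lookup by lower bound. The sketch below is illustrative only; `ratingForSketch` and the short labels are assumptions, not names from the package.

```javascript
// Hypothetical sketch: map a 0-100 score to the scoring-scale band listed above.
const SCORING_SCALE = [
  { min: 90, label: 'Excellent' },
  { min: 70, label: 'Good' },
  { min: 50, label: 'Adequate' },
  { min: 30, label: 'Poor' },
  { min: 0, label: 'Critical' },
];

function ratingForSketch(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError('score must be a number between 0 and 100');
  }
  // Bands are ordered from highest to lowest, so the first match wins.
  return SCORING_SCALE.find((band) => score >= band.min).label;
}
```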
254
+ ## clone-site.js
255
+
256
+ Clone multiple pages from a website with an integrated UX audit (Phase 2).
257
+
258
+ ```bash
259
+ design-clone clone-site <url> [options]
260
+ ```
261
+
262
+ | Option | Description |
263
+ |--------|-------------|
264
+ | --pages <paths> | Comma-separated paths (e.g., /,/about,/contact) |
265
+ | --max-pages <n> | Maximum pages to auto-discover (default: 10) |
266
+ | --viewports <list> | Viewport list (default: desktop,tablet,mobile) |
267
+ | --output <dir> | Custom output directory |
268
+ | --ai | Extract design tokens using Gemini AI (requires GEMINI_API_KEY) |
269
+ | --ux-audit | Run UX audit using Gemini Vision (requires GEMINI_API_KEY) |
270
+ | --yes, -y | Skip confirmation prompt |
271
+
272
+ **Integrated Workflow** (when using --ux-audit):
273
+ 1. Discover or use manual pages
274
+ 2. Capture screenshots across viewports
275
+ 3. Merge CSS files
276
+ 4. Extract design tokens (with --ai)
277
+ 5. **Run UX audit** (with --ux-audit) - Analyzes homepage screenshots via Gemini Vision
278
+ 6. Rewrite links
279
+ 7. Generate manifest
280
+
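The way the `--ai` and `--ux-audit` flags gate steps 4 and 5 of the workflow can be sketched as follows. The function and step names are hypothetical labels for the numbered steps above, not identifiers from `clone-site.js`.

```javascript
// Hypothetical sketch: which workflow steps run for a given flag combination.
function planWorkflowSketch({ ai = false, uxAudit = false } = {}) {
  const steps = ['discover-pages', 'capture-screenshots', 'merge-css'];
  if (ai) steps.push('extract-design-tokens'); // step 4, only with --ai
  if (uxAudit) steps.push('run-ux-audit');      // step 5, only with --ux-audit
  steps.push('rewrite-links', 'generate-manifest');
  return steps;
}
```

With neither flag set, the plan in this sketch contains only the five unconditional steps.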
281
+ **UX Audit Output**: When enabled, generates `analysis/ux-audit.md` and `analysis/ux-audit.json` in the output directory.
282
+
283
+ **Examples**:
284
+ ```bash
285
+ design-clone clone-site https://example.com --ux-audit
286
+ design-clone clone-site https://example.com --pages /,/about,/contact --ux-audit
287
+ design-clone clone-site https://example.com --ai --ux-audit
288
+ ```
289
+
290
+ ## ux_audit.py
291
+
292
+ Python module providing UX audit prompts for Gemini Vision integration.
293
+
294
+ ```python
295
+ from src.ai.prompts.ux_audit import build_ux_audit_prompt, build_aggregation_prompt
296
+
297
+ # Build viewport-specific prompt
298
+ prompt = build_ux_audit_prompt(viewport='mobile')
299
+
300
+ # Build aggregation prompt for multiple viewports
301
+ aggregation = build_aggregation_prompt(desktop_results, tablet_results, mobile_results)
302
+ ```
303
+
304
+ **Functions**:
305
+ - `build_ux_audit_prompt(viewport)` - Build prompt with viewport-specific checks (mobile/tablet/desktop)
306
+ - `build_aggregation_prompt(desktop, tablet, mobile)` - Combine viewport results into unified assessment
307
+
308
+ **Constants**:
309
+ - `UX_AUDIT_PROMPT` - Base UX evaluation prompt (6 categories)
310
+ - `VIEWPORT_CONTEXT` - Dictionary of viewport-specific evaluation criteria
311
+ - `AGGREGATION_PROMPT` - Template for combining viewport results with weighted averaging
312
+
313
+ **Viewport Weighting**:
314
+ - Desktop: 40% (primary interaction model)
315
+ - Tablet: 30% (hybrid interaction)
316
+ - Mobile: 30% (touch-first)
@@ -0,0 +1,309 @@
1
+ # Codebase Summary
2
+
3
+ ## Overview
4
+
5
+ Design Clone is a comprehensive design extraction toolkit that captures website designs through multi-viewport screenshots, extracts HTML/CSS, analyzes structure with AI, and enhances semantic HTML for WordPress compatibility.
6
+
7
+ ## Core Architecture
8
+
9
+ ### Directory Structure
10
+
11
+ ```
12
+ design-clone/
13
+ ├── src/
14
+ │ ├── core/ # Core extraction & processing modules
15
+ │ │ ├── screenshot.js # Multi-viewport screenshot capture
16
+ │ │ ├── html-extractor.js # HTML extraction + semantic enhancement
17
+ │ │ ├── semantic-enhancer.js # WordPress semantic HTML injection (Phase 3)
18
+ │ │ ├── css-extractor.js # CSS extraction & property tracking
19
+ │ │ ├── filter-css.js # Unused CSS selector removal
20
+ │ │ ├── animation-extractor.js # @keyframes & transition extraction
21
+ │ │ ├── state-capture.js # Hover state capture
22
+ │ │ ├── extract-assets.js # Image/font/icon downloading
23
+ │ │ ├── design-tokens.js # Design token extraction
24
+ │ │ ├── dom-tree-analyzer.js # DOM hierarchy for structure analysis
25
+ │ │ ├── dimension-extractor.js # Component dimension measurement
26
+ │ │ ├── section-cropper.js # Section extraction for AI analysis
27
+ │ │ ├── page-readiness.js # Page stability detection
28
+ │ │ ├── lazy-loader.js # Lazy loading trigger & wait
29
+ │ │ ├── cookie-handler.js # Cookie banner dismissal
30
+ │ │ ├── content-counter.js # Content statistics
31
+ │ │ ├── video-capture.js # Scroll animation recording
32
+ │ │ └── app-state-snapshot.js # App state persistence
33
+ │ ├── ai/ # AI analysis modules
34
+ │ │ ├── ux-audit.js # UX audit runner
35
+ │ │ └── prompts/ # AI prompts
36
+ │ ├── verification/ # Verification scripts
37
+ │ └── utils/ # Shared utilities
38
+ │ ├── browser.js # Browser abstraction facade
39
+ │ ├── env.js # Environment resolution
40
+ │ └── helpers.js # CLI utilities
41
+ ├── tests/ # Unit tests
42
+ │ ├── test-semantic-enhancer.js # Semantic enhancer tests (59 tests)
43
+ │ └── [other test files]
44
+ └── package.json
45
+ ```
46
+
47
+ ## Key Modules
48
+
49
+ ### 1. semantic-enhancer.js (Phase 3)
50
+
51
+ **Purpose**: Inject WordPress-compatible semantic IDs, classes, and ARIA roles into extracted HTML while preserving original styling.
52
+
53
+ **Key Exports**:
54
+ - `SEMANTIC_MAPPINGS` - Mapping definitions for header, nav, main, sidebar, footer, hero
55
+ - `detectSectionType(element)` - Detect section type via semantic tags (priority 1), ARIA roles (priority 2), class patterns (priority 3)
56
+ - `applySemanticAttributes(element, sectionType, options)` - Add ID/classes/roles to element
57
+ - `handleMultipleNavs(navElements, usedIds)` - Handle multiple nav elements with aria-label
58
+ - `enhanceSemanticHTML(html, domHierarchy)` - Browser-context enhancement (uses DOMParser)
59
+ - `enhanceSemanticHTMLInPage(page, html)` - Playwright-context enhancement (recommended for Node.js)
60
+
61
+ **Semantic Mappings**:
62
+ ```javascript
63
+ header: { id: 'site-header', classes: ['site-header'], role: 'banner' }
64
+ nav: { id: 'site-navigation', classes: ['main-navigation', 'nav-menu'], role: 'navigation' }
65
+ main: { id: 'main-content', classes: ['site-main', 'content-area'], role: 'main' }
66
+ sidebar: { id: 'primary-sidebar', classes: ['widget-area', 'sidebar'], role: 'complementary' }
67
+ footer: { id: 'site-footer', classes: ['site-footer'], role: 'contentinfo' }
68
+ hero: { id: 'hero-section', classes: ['hero'], role: null }
69
+ ```
70
+
71
+ **Detection Priority**:
72
+ 1. Semantic HTML tags (header, nav, main, aside, footer)
73
+ 2. ARIA role attributes (banner, navigation, main, complementary, contentinfo)
74
+ 3. Class pattern matching (header, nav, main, sidebar, footer, hero)
75
+
76
+ **Rules**:
77
+ - Add ID only if none exists (avoid duplicates)
78
+ - Append classes (never replace existing)
79
+ - Set role only if not present
80
+ - Handle multiple navs with proper aria-label (Primary Menu, Footer Menu, etc.)
81
+
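The first three rules above can be illustrated with a small sketch. It uses a plain object in place of a DOM element, and `applySemanticSketch` is a hypothetical name; only the `header` mapping values come from the table above. Running it twice on the same element also shows why additive rules make the enhancement repeatable.

```javascript
// Hypothetical sketch of the additive rules; a plain object stands in for a DOM element.
const HEADER_MAPPING = { id: 'site-header', classes: ['site-header'], role: 'banner' };

function applySemanticSketch(el, mapping, usedIds) {
  const stats = { idsAdded: 0, classesAdded: 0, rolesAdded: 0 };

  // Rule: add an ID only if the element has none and the ID is not already used.
  if (!el.id && mapping.id && !usedIds.has(mapping.id)) {
    el.id = mapping.id;
    usedIds.add(mapping.id);
    stats.idsAdded += 1;
  }

  // Rule: append classes, never replace existing ones.
  for (const cls of mapping.classes) {
    if (!el.classList.includes(cls)) {
      el.classList.push(cls);
      stats.classesAdded += 1;
    }
  }

  // Rule: set the role only if none is present.
  if (!el.role && mapping.role) {
    el.role = mapping.role;
    stats.rolesAdded += 1;
  }
  return stats;
}

const usedIds = new Set();
const header = { id: '', classList: ['top-bar'], role: '' };
const firstPass = applySemanticSketch(header, HEADER_MAPPING, usedIds);
const secondPass = applySemanticSketch(header, HEADER_MAPPING, usedIds); // adds nothing
```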
82
+ ### 2. html-extractor.js (Modified)
83
+
84
+ **New Function**: `extractAndEnhanceHtml(page, options)`
85
+
86
+ Extracts clean HTML and optionally applies semantic enhancement via semantic-enhancer.js.
87
+
88
+ **Options**:
89
+ ```javascript
90
+ {
91
+ enhanceSemantic: true, // Enable semantic enhancement (default: true)
92
+ frameworkPatterns: [...] // Custom framework patterns to remove
93
+ }
94
+ ```
95
+
96
+ **Returns**:
97
+ ```javascript
98
+ {
99
+ html: string, // Enhanced HTML
100
+ warnings: string[], // Processing warnings
101
+ elementCount: number, // DOM element count
102
+ semanticStats: { // Only if enhanceSemantic=true
103
+ sectionsEnhanced: number,
104
+ idsAdded: number,
105
+ classesAdded: number,
106
+ rolesAdded: number,
107
+ warnings: string[]
108
+ }
109
+ }
110
+ ```
111
+
112
+ **Existing Functions**:
113
+ - `extractCleanHtml(page, frameworkPatterns)` - Remove scripts, event handlers, framework attributes
114
+
115
+ ### 3. screenshot.js (Modified)
116
+
117
+ **New Flag**: `--no-semantic`
118
+
119
+ Disable WordPress semantic HTML enhancement in extracted HTML. By default, semantic enhancement is enabled.
120
+
121
+ **Usage**:
122
+ ```bash
123
+ node src/core/screenshot.js --url https://example.com --output ./out --extract-html --no-semantic
124
+ ```
125
+
126
+ ### 4. multi-page-screenshot.js (Modified)
127
+
128
+ Uses `extractAndEnhanceHtml()` instead of separate extraction steps.
129
+
130
+ ## Processing Pipeline
131
+
132
+ ### Multi-Viewport Screenshot Flow
133
+
134
+ ```
135
+ Input URL
136
+ ├─ Desktop (1440x900)
137
+ ├─ Tablet (768x1024)
138
+ └─ Mobile (375x812)
139
+
140
+ ├── Wait for page readiness (DOM stable, fonts loaded, styles stable)
141
+ ├── Dismiss cookie banners
142
+ ├── Trigger lazy loading
143
+ ├── Force lazy images visible
144
+ ├── Capture screenshots
145
+
146
+ ├── Optional: Extract HTML
147
+ │ ├─ Clean HTML (remove scripts, framework attrs)
148
+ │ └─ Semantic enhance (add WordPress IDs/classes/roles)
149
+
150
+ ├── Optional: Extract CSS
151
+ │ ├─ Collect all stylesheet rules
152
+ │ ├─ Extract @keyframes & transitions
153
+ │ └─ Filter unused selectors
154
+
155
+ ├── Optional: Capture hover states
156
+ │ ├─ Identify interactive elements
157
+ │ ├─ Screenshot before/during hover
158
+ │ └─ Generate :hover CSS rules
159
+
160
+ └── Output: Screenshots + metadata
161
+
162
+ Output Files
163
+ ├── desktop.png, tablet.png, mobile.png
164
+ ├── source.html (cleaned + optionally semantically enhanced)
165
+ ├── source.css, source-raw.css
166
+ ├── animations.css, animation-tokens.json
167
+ ├── hover.css (if --capture-hover)
168
+ ├── structure.md (if GEMINI_API_KEY set)
169
+ └── tokens.json
170
+ ```
171
+
172
+ ## Testing
173
+
174
+ ### Test Files
175
+
176
+ - `tests/test-semantic-enhancer.js` - 59 unit tests covering:
177
+ - SEMANTIC_MAPPINGS exports
178
+ - Section type detection (header, nav, main, sidebar, footer, hero)
179
+ - Semantic attribute application
180
+ - Multiple nav handling with aria-labels
181
+ - HTML enhancement stats
182
+ - Page.evaluate integration
183
+
184
+ **Run Tests**:
185
+ ```bash
186
+ node tests/test-semantic-enhancer.js
187
+ ```
188
+
189
+ ## Data Flow
190
+
191
+ ### Semantic Enhancement Data Flow
192
+
193
+ ```
194
+ extractAndEnhanceHtml()
195
+ ├─ extractCleanHtml(page)
196
+ │ └─ page.evaluate()
197
+ │ ├─ Clone document
198
+ │ ├─ Remove scripts/noscript
199
+ │ ├─ Remove malicious CSS links
200
+ │ ├─ Remove event handlers
201
+ │ ├─ Remove framework attributes
202
+ │ ├─ Inline critical layout styles
203
+ │ └─ Return cleaned HTML + warnings
204
+
205
+ └─ enhanceSemanticHTMLInPage(page, html)
206
+ └─ page.evaluate(enhancementLogic)
207
+ ├─ Parse HTML with DOMParser
208
+ ├─ Detect sections (semantic tags → ARIA roles → class patterns)
209
+ ├─ Apply IDs/classes/roles
210
+ ├─ Handle multiple navs with aria-labels
211
+ ├─ Detect hero sections
212
+ └─ Return enhanced HTML + stats
213
+ ```
214
+
215
+ ## Configuration & Environment
216
+
217
+ ### CLI Options (screenshot.js)
218
+
219
+ | Option | Default | Phase | Description |
220
+ |--------|---------|-------|-------------|
221
+ | --url | required | - | Target URL |
222
+ | --output | required | - | Output directory |
223
+ | --viewports | all | - | Comma-separated viewport names |
224
+ | --full-page | true | - | Capture full page height |
225
+ | --max-size | 5 | - | Max file size (MB) before compression |
226
+ | --headless | false | - | Run in headless mode |
227
+ | --scroll-delay | 1500 | - | Pause time (ms) between scroll steps |
228
+ | --extract-html | false | - | Extract cleaned HTML |
229
+ | --extract-css | false | - | Extract CSS |
230
+ | --filter-unused | true | - | Filter unused CSS selectors |
231
+ | --capture-hover | false | 2 | Capture hover states |
232
+ | --section-mode | false | - | Enable section-based capture |
233
+ | --no-semantic | false | 3 | Disable semantic HTML enhancement |
234
+ | --video | false | - | Record scroll animation |
235
+
236
+ ### Environment Variables
237
+
238
+ ```bash
239
+ GEMINI_API_KEY=... # For AI structure analysis
240
+ ```
241
+
242
+ ## Design Patterns
243
+
244
+ ### Error Handling
245
+
246
+ All modules use try-catch with warning accumulation. Failed processing steps return partial results rather than throwing.
247
+
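A minimal sketch of the warning-accumulation pattern described above, assuming synchronous steps for brevity. `runStepsSketch` and the `{ data, warnings }` result shape are illustrative assumptions, not the package's API.

```javascript
// Hypothetical sketch: run each step, collect warnings, and return partial results.
function runStepsSketch(steps) {
  const result = { data: {}, warnings: [] };
  for (const [name, step] of Object.entries(steps)) {
    try {
      result.data[name] = step();
    } catch (err) {
      // A failed step is recorded as a warning instead of aborting the run.
      result.warnings.push(`${name} failed: ${err.message}`);
    }
  }
  return result;
}

const partial = runStepsSketch({
  html: () => '<p>ok</p>',
  css: () => { throw new Error('timeout'); },
});
// partial.data contains only `html`; the css failure appears in partial.warnings.
```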
248
+ ### Idempotency
249
+
250
+ Semantic enhancement is idempotent: running it on already-enhanced HTML produces the same result (IDs, classes, and roles that are already present are skipped).
251
+
252
+ ### Performance
253
+
254
+ - Combined landmark selector reduces querySelectorAll calls (8 → 1)
255
+ - Processed element tracking prevents double-counting from overlapping selectors
256
+ - Index-based element matching for reliability during DOM cloning
257
+
258
+ ### Validation
259
+
260
+ - Input validation on HTML strings (non-empty, valid string type)
261
+ - Browser context validation (DOMParser vs page.evaluate)
262
+ - ID uniqueness tracking with usedIds Set
263
+ - DOM size warnings (>50k elements)
264
+
265
+ ## Version History
266
+
267
+ ### Phase 1
268
+ - Multi-viewport screenshots
269
+ - HTML/CSS extraction
270
+ - Asset extraction
271
+
272
+ ### Phase 2
273
+ - Hover state capture
274
+ - UX audit with Gemini
275
+ - Design token extraction
276
+ - DOM tree analysis
277
+
278
+ ### Phase 3
279
+ - WordPress semantic HTML enhancement (CURRENT)
280
+ - Semantic ID/class/role injection
281
+ - ARIA landmark support
282
+ - Multiple nav handling
283
+
284
+ ## Integration Points
285
+
286
+ ### With screenshot.js
287
+ - New `--no-semantic` flag to disable enhancement
288
+ - Automatic semantic enhancement when extracting HTML (unless disabled)
289
+
290
+ ### With html-extractor.js
291
+ - New `extractAndEnhanceHtml()` function wraps extraction + enhancement
292
+ - `enhanceSemantic` option controls semantic injection
293
+
294
+ ### With multi-page-screenshot.js
295
+ - Uses `extractAndEnhanceHtml()` for HTML extraction
296
+
297
+ ## Dependencies
298
+
299
+ - **playwright** - Browser automation
300
+ - **sharp** - Image compression (optional)
301
+ - **google-genai** - AI analysis (optional, for Phase 2 features)
302
+
303
+ ## Limitations & Considerations
304
+
305
+ 1. **Browser Context Required**: `enhanceSemanticHTML()` requires DOMParser (browser). Use `enhanceSemanticHTMLInPage()` for Playwright.
306
+ 2. **Non-Invasive**: Semantic enhancement never removes existing attributes, only adds/appends.
307
+ 3. **False Positive Prevention**: Class pattern detection limited to container elements (div, section, article) to avoid false positives.
308
+ 4. **Multiple Navs**: Each nav gets a unique aria-label (Primary Menu, Footer Menu, Navigation 2, etc.)
309
+ 5. **Hero Section Detection**: Only top-level hero elements (not within header/footer) are detected.
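The unique aria-label behavior in item 4 could be sketched as below. The labeling rule is a guess built only from the example labels given (Primary Menu, Footer Menu, Navigation 2); `labelNavsSketch`, the `inFooter` flag, and the fallback numbering are assumptions, and the real `handleMultipleNavs(navElements, usedIds)` may work differently.

```javascript
// Hypothetical sketch: assign a unique aria-label to each nav.
// Each nav is a plain object: { ariaLabel?: string, inFooter?: boolean }.
function labelNavsSketch(navs) {
  // Labels that already exist are kept and reserved (enhancement is additive only).
  const used = new Set(navs.map((nav) => nav.ariaLabel).filter(Boolean));
  return navs.map((nav, index) => {
    if (nav.ariaLabel) return nav.ariaLabel;
    const preferred = nav.inFooter ? 'Footer Menu' : 'Primary Menu';
    // Fall back to a numbered label when the preferred one is already taken.
    const label = used.has(preferred) ? `Navigation ${index + 1}` : preferred;
    used.add(label);
    return label;
  });
}
```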