design-clone 1.1.1 → 2.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +42 -20
- package/SKILL.md +74 -0
- package/bin/commands/clone-site.js +75 -10
- package/bin/commands/init.js +33 -1
- package/bin/commands/verify.js +5 -3
- package/bin/utils/validate.js +24 -8
- package/docs/cli-reference.md +224 -2
- package/docs/codebase-summary.md +309 -0
- package/docs/design-clone-architecture.md +290 -45
- package/docs/pixel-perfect.md +35 -4
- package/docs/project-roadmap.md +382 -0
- package/docs/troubleshooting.md +5 -4
- package/package.json +12 -6
- package/src/ai/__pycache__/analyze-structure.cpython-313.pyc +0 -0
- package/src/ai/__pycache__/extract-design-tokens.cpython-313.pyc +0 -0
- package/src/ai/analyze-structure.py +73 -3
- package/src/ai/extract-design-tokens.py +356 -13
- package/src/ai/prompts/__pycache__/__init__.cpython-313.pyc +0 -0
- package/src/ai/prompts/__pycache__/design_tokens.cpython-313.pyc +0 -0
- package/src/ai/prompts/__pycache__/structure_analysis.cpython-313.pyc +0 -0
- package/src/ai/prompts/__pycache__/ux_audit.cpython-313.pyc +0 -0
- package/src/ai/prompts/design_tokens.py +133 -0
- package/src/ai/prompts/structure_analysis.py +329 -10
- package/src/ai/prompts/ux_audit.py +198 -0
- package/src/ai/ux-audit.js +596 -0
- package/src/core/animation-extractor.js +526 -0
- package/src/core/app-state-snapshot.js +511 -0
- package/src/core/content-counter.js +342 -0
- package/src/core/cookie-handler.js +1 -1
- package/src/core/css-extractor.js +4 -4
- package/src/core/dimension-extractor.js +93 -21
- package/src/core/dimension-output.js +103 -6
- package/src/core/discover-pages.js +242 -14
- package/src/core/dom-tree-analyzer.js +298 -0
- package/src/core/extract-assets.js +1 -1
- package/src/core/framework-detector.js +538 -0
- package/src/core/html-extractor.js +45 -4
- package/src/core/lazy-loader.js +7 -7
- package/src/core/multi-page-screenshot.js +9 -6
- package/src/core/page-readiness.js +8 -8
- package/src/core/screenshot.js +311 -7
- package/src/core/section-cropper.js +209 -0
- package/src/core/section-detector.js +386 -0
- package/src/core/semantic-enhancer.js +492 -0
- package/src/core/state-capture.js +598 -0
- package/src/core/tests/test-section-cropper.js +177 -0
- package/src/core/tests/test-section-detector.js +55 -0
- package/src/core/video-capture.js +546 -0
- package/src/route-discoverers/angular-discoverer.js +157 -0
- package/src/route-discoverers/astro-discoverer.js +123 -0
- package/src/route-discoverers/base-discoverer.js +242 -0
- package/src/route-discoverers/index.js +106 -0
- package/src/route-discoverers/next-discoverer.js +130 -0
- package/src/route-discoverers/nuxt-discoverer.js +138 -0
- package/src/route-discoverers/react-discoverer.js +139 -0
- package/src/route-discoverers/svelte-discoverer.js +109 -0
- package/src/route-discoverers/universal-discoverer.js +227 -0
- package/src/route-discoverers/vue-discoverer.js +118 -0
- package/src/utils/__init__.py +1 -1
- package/src/utils/__pycache__/__init__.cpython-313.pyc +0 -0
- package/src/utils/__pycache__/env.cpython-313.pyc +0 -0
- package/src/utils/browser.js +11 -37
- package/src/utils/playwright.js +213 -0
- package/src/verification/generate-audit-report.js +398 -0
- package/src/verification/verify-footer.js +493 -0
- package/src/verification/verify-header.js +486 -0
- package/src/verification/verify-layout.js +2 -2
- package/src/verification/verify-menu.js +4 -20
- package/src/verification/verify-slider.js +533 -0
- package/src/utils/puppeteer.js +0 -281
package/docs/cli-reference.md
CHANGED

@@ -22,10 +22,52 @@ node src/core/screenshot.js [options]

| --close | bool | false | Close browser after capture (false keeps session) |
| --extract-html | bool | false | Extract cleaned HTML |
| --extract-css | bool | false | Extract all CSS from page |
| --extract-animations | bool | true* | Extract @keyframes and transitions (enabled with --extract-css) |
| --filter-unused | bool | true | Filter CSS to remove unused selectors |
| --capture-hover | bool | false | Capture hover state screenshots and generate :hover CSS |
| --no-semantic | bool | false | Disable WordPress semantic HTML enhancement (Phase 3) |
| --verbose | bool | false | Verbose logging |

*Defaults to true when `--extract-css` is enabled; it can be disabled with `--extract-animations false`.
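
A minimal sketch of how the footnoted default could be resolved. It assumes animation extraction only ever runs together with `--extract-css` (the table ties the two together, but the exact precedence is not documented), and the option names `extractCss` / `extractAnimations` are hypothetical, not the script's internals.

```javascript
// Hypothetical sketch: resolve the effective --extract-animations value.
// Option names are assumptions; the real argument parser may differ.
function resolveExtractAnimations({ extractCss = false, extractAnimations } = {}) {
  // Assumption: animation extraction piggybacks on CSS extraction.
  if (!extractCss) return false;
  // `--extract-animations false` may arrive as a string from argv.
  if (extractAnimations === false || extractAnimations === 'false') return false;
  // Flag omitted (or set to true): default on.
  return true;
}

console.log(resolveExtractAnimations({ extractCss: true }));                             // true
console.log(resolveExtractAnimations({ extractCss: true, extractAnimations: 'false' })); // false
console.log(resolveExtractAnimations({ extractCss: false }));                            // false
```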

**Output**: JSON with screenshot paths and metadata. Includes a `browserRestarts` count for stability monitoring. When `--capture-hover` is enabled, the output also includes hover state results. When semantic enhancement is enabled (the default), the output includes `semanticStats` with enhancement details (sections enhanced, IDs/classes/roles added).

### Semantic HTML Enhancement (Phase 3)

Semantic HTML enhancement is enabled by default when extracting HTML. It injects WordPress-compatible semantic IDs, classes, and ARIA roles into the extracted HTML.

**What's Added**:
- **IDs**: `site-header`, `main-content`, `site-footer`, `site-navigation`, `primary-sidebar`, `hero-section`
- **Classes**: `site-header`, `main-navigation`, `nav-menu`, `site-main`, `content-area`, `widget-area`, `sidebar`, `site-footer`, `hero`
- **ARIA Roles**: `banner` (header), `navigation` (nav), `main`, `complementary` (sidebar), `contentinfo` (footer)

**Detection Priority**:
1. Semantic HTML tags (`<header>`, `<nav>`, `<main>`, `<aside>`, `<footer>`)
2. ARIA role attributes (`banner`, `navigation`, `main`, `complementary`, `contentinfo`)
3. Class patterns (header, nav, main, sidebar, footer, hero)
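
The detection order above can be sketched as a pure function. This is an illustration under stated assumptions: elements are plain `{ tag, role, className }` objects instead of DOM nodes, the class regexes are guesses rather than the module's real patterns, and restricting class matching to `div`/`section`/`article` follows the false-positive note in the codebase summary.

```javascript
// Hypothetical sketch of the three-tier detection order (not the real detectSectionType).
const TAG_TO_TYPE = new Map([
  ['header', 'header'], ['nav', 'nav'], ['main', 'main'], ['aside', 'sidebar'], ['footer', 'footer'],
]);
const ROLE_TO_TYPE = new Map([
  ['banner', 'header'], ['navigation', 'nav'], ['main', 'main'],
  ['complementary', 'sidebar'], ['contentinfo', 'footer'],
]);
// Illustrative patterns; earlier entries win when several match.
const CLASS_PATTERNS = [
  ['header', /\bheader\b/i],
  ['nav', /\bnav(igation)?\b/i],
  ['main', /\bmain\b/i],
  ['sidebar', /\bsidebar\b/i],
  ['footer', /\bfooter\b/i],
  ['hero', /\bhero\b/i],
];
const CONTAINER_TAGS = new Set(['div', 'section', 'article']);

function detectSectionType({ tag = '', role = '', className = '' }) {
  const t = tag.toLowerCase();
  // 1. Semantic HTML tag
  if (TAG_TO_TYPE.has(t)) return TAG_TO_TYPE.get(t);
  // 2. ARIA role attribute
  if (ROLE_TO_TYPE.has(role)) return ROLE_TO_TYPE.get(role);
  // 3. Class pattern, only on generic containers
  if (CONTAINER_TAGS.has(t)) {
    for (const [type, pattern] of CLASS_PATTERNS) {
      if (pattern.test(className)) return type;
    }
  }
  return null;
}

// The ARIA role outranks the misleading class name:
console.log(detectSectionType({ tag: 'div', role: 'contentinfo', className: 'header' })); // footer
```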

**Usage**:
```bash
# Enable semantic enhancement (default)
node src/core/screenshot.js --url https://example.com --output ./out --extract-html

# Disable semantic enhancement
node src/core/screenshot.js --url https://example.com --output ./out --extract-html --no-semantic
```

**Example Output Metadata**:
```json
{
  "html": "path/to/source.html",
  "semanticStats": {
    "sectionsEnhanced": 5,
    "idsAdded": 3,
    "classesAdded": 4,
    "rolesAdded": 2,
    "warnings": []
  }
}
```

## filter-css.js

@@ -82,9 +124,62 @@ node src/core/extract-assets.js --url URL --output DIR

Validate navigation structure.

```bash
node src/verification/verify-menu.js --html FILE [--url URL] [--output DIR] [--verbose]
```

| Option | Description |
|--------|-------------|
| --html | Path to HTML file |
| --url | URL to test (alternative to --html) |
| --output | Output directory for screenshots |
| --verbose | Show detailed progress |
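
A sketch of parsing these flags with Node's built-in `util.parseArgs` (available since Node 18.3). The function name `parseVerifyArgs` is hypothetical and the shipped script may use its own parser; only the flag names and the "`--url` as an alternative to `--html`" rule come from the table.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical sketch: parse the verify-*.js flags documented in the table.
function parseVerifyArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      html: { type: 'string' },
      url: { type: 'string' },
      output: { type: 'string' },
      verbose: { type: 'boolean' },
    },
  });
  // --url is documented as an alternative to --html, so require one of them.
  if (!values.html && !values.url) {
    throw new Error('Provide --html FILE or --url URL');
  }
  // Normalize the boolean so callers never see `undefined`.
  return { ...values, verbose: values.verbose === true };
}

console.log(parseVerifyArgs(['--html', 'out/source.html', '--verbose']));
```

Because `parseArgs` runs in strict mode by default, an unrecognized flag throws instead of being silently ignored.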

## verify-header.js

Verify header components (Phase 1).

```bash
node src/verification/verify-header.js --html FILE [--url URL] [--output DIR] [--verbose]
```

Tests: logo presence, navigation visibility, CTA buttons, sticky/fixed behavior, z-index layering, height consistency.

## verify-footer.js

Verify footer components (Phase 1).

```bash
node src/verification/verify-footer.js --html FILE [--url URL] [--output DIR] [--verbose]
```

Tests: position at bottom, multi-column layout, link sections, copyright text, social icons, background contrast.

## verify-slider.js

Verify slider/carousel components (Phase 1).

```bash
node src/verification/verify-slider.js --html FILE [--url URL] [--output DIR] [--verbose]
```

Tests: library detection (Swiper, Slick, Owl, native), navigation arrows, pagination dots, autoplay behavior, current slide indicator.
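
A minimal sketch of class-based library detection. The marker classes are the ones each library conventionally adds (Swiper's `swiper`/`swiper-container`, Slick's `slick-slider`/`slick-initialized`, Owl Carousel's `owl-carousel`), but whether verify-slider.js checks exactly these is an assumption. The sketch also assumes a slider has already been found, so "native" is simply the fallback.

```javascript
// Hypothetical sketch: identify a slider library from class attribute strings.
const SLIDER_SIGNATURES = [
  ['swiper', /\bswiper(-container)?\b/],
  ['slick', /\bslick-(slider|initialized)\b/],
  ['owl', /\bowl-carousel\b/],
];

function detectSliderLibrary(classNames) {
  // classNames: class attribute strings collected from candidate slider elements
  for (const [library, pattern] of SLIDER_SIGNATURES) {
    if (classNames.some((cls) => pattern.test(cls))) return library;
  }
  // No known library marker: treat it as a native/custom slider.
  return 'native';
}

console.log(detectSliderLibrary(['gallery slick-slider slick-initialized'])); // slick
```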

## generate-audit-report.js

Aggregate verification results into a consolidated report (Phase 1).

```bash
node src/verification/generate-audit-report.js --dir DIR [--output FILE] [--verbose]
```

| Option | Description |
|--------|-------------|
| --dir | Directory containing verification JSON results |
| --output | Output path for report (default: component-audit.md) |
| --verbose | Show detailed progress |

Output: Markdown report with a summary table, side-by-side screenshots, responsive analysis, and CSS suggestions.
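
A sketch of the summary-table step, assuming each verification result has been reduced to a hypothetical `{ component, passed, total }` shape; the JSON actually written by the verify-*.js scripts is not documented here, and the column names are illustrative.

```javascript
// Hypothetical sketch: render per-component results as a Markdown summary table.
function buildSummaryTable(results) {
  const header = ['| Component | Checks | Score | Status |', '|-----------|--------|-------|--------|'];
  const rows = results.map(({ component, passed, total }) => {
    // Guard against division by zero for components with no checks.
    const pct = total > 0 ? Math.round((passed / total) * 100) : 0;
    const status = total > 0 && passed === total ? 'PASS' : 'FAIL';
    return `| ${component} | ${passed}/${total} | ${pct}% | ${status} |`;
  });
  return [...header, ...rows].join('\n');
}

console.log(buildSummaryTable([
  { component: 'header', passed: 6, total: 6 },
  { component: 'footer', passed: 4, total: 6 },
]));
```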

## verify-layout.js

Verify layout consistency.

@@ -92,3 +187,130 @@ Verify layout consistency.

```bash
node src/verification/verify-layout.js --html FILE
```

## state-capture.js

Capture hover states for interactive elements.

Used internally by screenshot.js with the `--capture-hover` flag.

| Export | Description |
|--------|-------------|
| `captureAllHoverStates(page, cssString, outputDir)` | Detect interactive elements and capture normal/hover screenshots |
| `captureHoverState(page, selector, outputDir, index)` | Capture hover state for a single element |
| `generateHoverCss(results)` | Generate `:hover` CSS rules from captured style diffs |
| `detectInteractiveElements(page, cssString)` | Detect interactive elements via CSS analysis and DOM query |

**Key Features:**
- Dual detection: CSS-based (`:hover` selectors) and DOM-based (interactive elements, transitions)
- Per-element style diff capture (backgroundColor, color, transform, boxShadow, etc.)
- Automatic screenshot pair generation (normal + hover states)
- CSS rule generation from detected style changes
- Validates selectors and skips hidden/invisible elements
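
A sketch of the "CSS rule generation from detected style changes" idea. It is not `generateHoverCss()` itself: the input shape `{ selector, diff: { camelCaseProp: { normal, hover } } }` and the name `buildHoverCss` are assumptions made for illustration.

```javascript
// Hypothetical sketch: turn captured normal/hover style diffs into :hover rules.
function toKebabCase(prop) {
  // backgroundColor -> background-color
  return prop.replace(/[A-Z]/g, (ch) => `-${ch.toLowerCase()}`);
}

function buildHoverCss(results) {
  const rules = [];
  for (const { selector, diff } of results) {
    // Keep only properties whose value actually changes on hover.
    const decls = Object.entries(diff)
      .filter(([, { normal, hover }]) => normal !== hover)
      .map(([prop, { hover }]) => `  ${toKebabCase(prop)}: ${hover};`);
    if (decls.length > 0) rules.push(`${selector}:hover {\n${decls.join('\n')}\n}`);
  }
  return rules.join('\n\n');
}

console.log(buildHoverCss([
  {
    selector: '.btn',
    diff: {
      backgroundColor: { normal: 'rgb(0, 0, 255)', hover: 'rgb(0, 0, 200)' },
      color: { normal: 'white', hover: 'white' },
    },
  },
]));
```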

## ux-audit.js

UX quality assessment using Gemini Vision AI (Phase 2).

```bash
node src/ai/ux-audit.js --screenshots <dir> [--output <dir>] [--url <url>] [--verbose]
```

| Option | Required | Description |
|--------|----------|-------------|
| --screenshots | yes | Directory containing viewport screenshots (desktop.png, tablet.png, mobile.png) |
| --output | no | Output directory for report and JSON results (default: same as screenshots) |
| --url | no | Original URL (for report metadata) |
| --verbose | no | Show detailed progress |

**Requires**: `GEMINI_API_KEY` or `GOOGLE_API_KEY` environment variable

**Output**:
- `ux-audit.md`: Markdown report with scores, issues, and recommendations
- `ux-audit.json`: Structured results (aggregated scores, viewport breakdown, issues, recommendations)

**Evaluation Categories** (0-100 score each):
1. Visual Hierarchy - Content prominence, scanning patterns, call-to-action visibility
2. Navigation - Touch targets, menu discoverability, current page indicator
3. Typography - Text size, line height, contrast ratio, readability
4. Spacing - Padding/margins, element breathing room, touch target spacing
5. Interactive Elements - Button affordance, link distinguishability, focus states
6. Responsive - Content reflow, no horizontal scroll, text truncation, breakpoint transitions

**Viewport Analysis**: Evaluates all three viewports (desktop: 1920×1080, tablet: 768×1024, mobile: 375×812) and generates weighted scores (desktop 40%, tablet 30%, mobile 30%).
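
The documented 40/30/30 weighting can be sketched as a weighted average. The input shape (one 0-100 score per viewport) is an assumption; ux-audit.js may aggregate per category or round differently.

```javascript
// Minimal sketch of the documented viewport weighting.
const VIEWPORT_WEIGHTS = { desktop: 0.4, tablet: 0.3, mobile: 0.3 };

function weightedScore(scores) {
  let total = 0;
  for (const [viewport, weight] of Object.entries(VIEWPORT_WEIGHTS)) {
    const score = scores[viewport];
    if (typeof score !== 'number') throw new Error(`Missing score for ${viewport}`);
    total += weight * score;
  }
  // Weights sum to 1, so the result stays on the 0-100 scale.
  return Math.round(total);
}

// 0.4 * 80 + 0.3 * 70 + 0.3 * 60 = 32 + 21 + 18 = 71
console.log(weightedScore({ desktop: 80, tablet: 70, mobile: 60 })); // 71
```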

**Issue Severity Levels**:
- Critical (0-30 score): Blocks tasks or causes confusion
- Major (31-60 score): Degrades experience significantly
- Minor (61-80 score): Polish improvements

**Scoring Scale**:
- 90-100: Excellent, industry-leading UX
- 70-89: Good, meets modern standards
- 50-69: Adequate, room for improvement
- 30-49: Poor, significant issues
- 0-29: Critical, requires immediate attention
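
Both bandings above can be expressed as simple threshold checks. The boundaries are copied from the two lists; note that they do not line up exactly (a score of 30 is "Critical" severity but rated "Poor"), which this sketch keeps as documented rather than reconciling. Treating scores above 80 as having no issue severity is an assumption.

```javascript
// Sketch: map a 0-100 category score onto the documented severity bands.
function issueSeverity(score) {
  if (score <= 30) return 'critical';
  if (score <= 60) return 'major';
  if (score <= 80) return 'minor';
  return null; // Assumption: no issue severity above 80
}

// Sketch: map a 0-100 score onto the documented scoring scale.
function scoreRating(score) {
  if (score >= 90) return 'Excellent';
  if (score >= 70) return 'Good';
  if (score >= 50) return 'Adequate';
  if (score >= 30) return 'Poor';
  return 'Critical';
}

console.log(issueSeverity(30), scoreRating(30)); // critical Poor
```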

## clone-site.js

Clone multiple pages from a website with an integrated UX audit (Phase 2).

```bash
design-clone clone-site <url> [options]
```

| Option | Description |
|--------|-------------|
| `--pages <paths>` | Comma-separated paths (e.g., `/,/about,/contact`) |
| `--max-pages <n>` | Maximum pages to auto-discover (default: 10) |
| `--viewports <list>` | Viewport list (default: desktop,tablet,mobile) |
| `--output <dir>` | Custom output directory |
| `--ai` | Extract design tokens using Gemini AI (requires GEMINI_API_KEY) |
| `--ux-audit` | Run UX audit using Gemini Vision (requires GEMINI_API_KEY) |
| `--yes`, `-y` | Skip confirmation prompt |
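
A sketch of turning a `--pages` value into absolute URLs with the standard `URL` constructor. The helper name `resolvePages`, the whitespace trimming, and the de-duplication are assumptions about reasonable behavior, not documented clone-site.js behavior.

```javascript
// Hypothetical sketch: resolve comma-separated --pages paths against the site URL.
function resolvePages(baseUrl, pagesArg) {
  const urls = pagesArg
    .split(',')
    .map((p) => p.trim())
    .filter(Boolean) // tolerate trailing commas and empty entries
    .map((p) => new URL(p, baseUrl).href); // "/about" -> "https://example.com/about"
  // Drop duplicates while keeping the original order.
  return [...new Set(urls)];
}

console.log(resolvePages('https://example.com', '/,/about,/contact'));
```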

**Integrated Workflow** (when using `--ux-audit`):
1. Discover pages automatically, or use the paths given with `--pages`
2. Capture screenshots across viewports
3. Merge CSS files
4. Extract design tokens (with `--ai`)
5. **Run UX audit** (with `--ux-audit`): analyzes homepage screenshots via Gemini Vision
6. Rewrite links
7. Generate manifest

**UX Audit Output**: When enabled, generates `analysis/ux-audit.md` and `analysis/ux-audit.json` in the output directory.

**Examples**:
```bash
design-clone clone-site https://example.com --ux-audit
design-clone clone-site https://example.com --pages /,/about,/contact --ux-audit
design-clone clone-site https://example.com --ai --ux-audit
```

## ux_audit.py

Python module providing UX audit prompts for Gemini Vision integration.

```python
from src.ai.prompts.ux_audit import build_ux_audit_prompt, build_aggregation_prompt

# Build viewport-specific prompt
prompt = build_ux_audit_prompt(viewport='mobile')

# Build aggregation prompt for multiple viewports.
# desktop_results, tablet_results and mobile_results are the per-viewport
# audit results obtained earlier (not defined in this snippet).
aggregation = build_aggregation_prompt(desktop_results, tablet_results, mobile_results)
```

**Functions**:
- `build_ux_audit_prompt(viewport)` - Build prompt with viewport-specific checks (mobile/tablet/desktop)
- `build_aggregation_prompt(desktop, tablet, mobile)` - Combine viewport results into unified assessment

**Constants**:
- `UX_AUDIT_PROMPT` - Base UX evaluation prompt (6 categories)
- `VIEWPORT_CONTEXT` - Dictionary of viewport-specific evaluation criteria
- `AGGREGATION_PROMPT` - Template for combining viewport results with weighted averaging

**Viewport Weighting**:
- Desktop: 40% (primary interaction model)
- Tablet: 30% (hybrid interaction)
- Mobile: 30% (touch-first)

package/docs/codebase-summary.md

@@ -0,0 +1,309 @@

# Codebase Summary

## Overview

Design Clone is a comprehensive design extraction toolkit that captures website designs through multi-viewport screenshots, extracts HTML/CSS, analyzes structure with AI, and enhances semantic HTML for WordPress compatibility.

## Core Architecture

### Directory Structure

```
design-clone/
├── src/
│   ├── core/                         # Core extraction & processing modules
│   │   ├── screenshot.js             # Multi-viewport screenshot capture
│   │   ├── html-extractor.js         # HTML extraction + semantic enhancement
│   │   ├── semantic-enhancer.js      # WordPress semantic HTML injection (Phase 3)
│   │   ├── css-extractor.js          # CSS extraction & property tracking
│   │   ├── filter-css.js             # Unused CSS selector removal
│   │   ├── animation-extractor.js    # @keyframes & transition extraction
│   │   ├── state-capture.js          # Hover state capture
│   │   ├── extract-assets.js         # Image/font/icon downloading
│   │   ├── design-tokens.js          # Design token extraction
│   │   ├── dom-tree-analyzer.js      # DOM hierarchy for structure analysis
│   │   ├── dimension-extractor.js    # Component dimension measurement
│   │   ├── section-cropper.js        # Section extraction for AI analysis
│   │   ├── page-readiness.js         # Page stability detection
│   │   ├── lazy-loader.js            # Lazy loading trigger & wait
│   │   ├── cookie-handler.js         # Cookie banner dismissal
│   │   ├── content-counter.js        # Content statistics
│   │   ├── video-capture.js          # Scroll animation recording
│   │   └── app-state-snapshot.js     # App state persistence
│   ├── ai/                           # AI analysis modules
│   │   ├── ux-audit.js               # UX audit runner
│   │   └── prompts/                  # AI prompts
│   ├── verification/                 # Verification scripts
│   └── utils/                        # Shared utilities
│       ├── browser.js                # Browser abstraction facade
│       ├── env.js                    # Environment resolution
│       └── helpers.js                # CLI utilities
├── tests/                            # Unit tests
│   ├── test-semantic-enhancer.js     # Semantic enhancer tests (59 tests)
│   └── [other test files]
└── package.json
```

## Key Modules

### 1. semantic-enhancer.js (Phase 3)

**Purpose**: Inject WordPress-compatible semantic IDs, classes, and ARIA roles into extracted HTML while preserving original styling.

**Key Exports**:
- `SEMANTIC_MAPPINGS` - Mapping definitions for header, nav, main, sidebar, footer, hero
- `detectSectionType(element)` - Detect section type via semantic tags (priority 1), ARIA roles (priority 2), class patterns (priority 3)
- `applySemanticAttributes(element, sectionType, options)` - Add ID/classes/roles to element
- `handleMultipleNavs(navElements, usedIds)` - Handle multiple nav elements with aria-label
- `enhanceSemanticHTML(html, domHierarchy)` - Browser-context enhancement (uses DOMParser)
- `enhanceSemanticHTMLInPage(page, html)` - Playwright-context enhancement (recommended for Node.js)

**Semantic Mappings**:
```javascript
// Shape of the SEMANTIC_MAPPINGS export: section type -> attributes to inject
const SEMANTIC_MAPPINGS = {
  header: { id: 'site-header', classes: ['site-header'], role: 'banner' },
  nav: { id: 'site-navigation', classes: ['main-navigation', 'nav-menu'], role: 'navigation' },
  main: { id: 'main-content', classes: ['site-main', 'content-area'], role: 'main' },
  sidebar: { id: 'primary-sidebar', classes: ['widget-area', 'sidebar'], role: 'complementary' },
  footer: { id: 'site-footer', classes: ['site-footer'], role: 'contentinfo' },
  hero: { id: 'hero-section', classes: ['hero'], role: null }, // no landmark role for hero
};
```

**Detection Priority**:
1. Semantic HTML tags (header, nav, main, aside, footer)
2. ARIA role attributes (banner, navigation, main, complementary, contentinfo)
3. Class pattern matching (header, nav, main, sidebar, footer, hero)

**Rules**:
- Add ID only if none exists (avoid duplicates)
- Append classes (never replace existing)
- Set role only if not present
- Handle multiple navs with proper aria-label (Primary Menu, Footer Menu, etc.)
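
The first three rules can be sketched on plain objects (`{ id, classList, role }`) instead of DOM elements. This is an illustration, not the module's `applySemanticAttributes()`: the name `applySemantics`, the two-entry mapping subset, and the per-call stats are assumptions; the `usedIds` Set mirrors the ID-uniqueness tracking listed under Validation.

```javascript
// Hypothetical sketch of the ID/class/role rules (subset of SEMANTIC_MAPPINGS).
const MAPPINGS = {
  header: { id: 'site-header', classes: ['site-header'], role: 'banner' },
  nav: { id: 'site-navigation', classes: ['main-navigation', 'nav-menu'], role: 'navigation' },
};

function applySemantics(el, type, usedIds) {
  if (!Object.hasOwn(MAPPINGS, type)) throw new Error(`Unknown section type: ${type}`);
  const { id, classes, role } = MAPPINGS[type];
  const stats = { idsAdded: 0, classesAdded: 0, rolesAdded: 0 };

  // Rule 1: add an ID only if the element has none and no other element took it.
  if (!el.id && !usedIds.has(id)) {
    el.id = id;
    usedIds.add(id);
    stats.idsAdded += 1;
  }
  // Rule 2: append missing classes; existing classes are never removed.
  for (const cls of classes) {
    if (!el.classList.includes(cls)) {
      el.classList.push(cls);
      stats.classesAdded += 1;
    }
  }
  // Rule 3: set the role only if none is present.
  if (role && !el.role) {
    el.role = role;
    stats.rolesAdded += 1;
  }
  return stats;
}

const usedIds = new Set();
const header = { id: '', classList: ['top-bar'], role: null };
console.log(applySemantics(header, 'header', usedIds)); // { idsAdded: 1, classesAdded: 1, rolesAdded: 1 }
console.log(applySemantics(header, 'header', usedIds)); // { idsAdded: 0, classesAdded: 0, rolesAdded: 0 }
```

The second call adding nothing is the idempotency property: attributes that are already present are skipped.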

### 2. html-extractor.js (Modified)

**New Function**: `extractAndEnhanceHtml(page, options)`

Extracts clean HTML and optionally applies semantic enhancement via semantic-enhancer.js.

**Options**:
```javascript
{
  enhanceSemantic: true,     // Enable semantic enhancement (default: true)
  frameworkPatterns: [...]   // Custom framework patterns to remove
}
```

**Returns**:
```javascript
{
  html: string,              // Enhanced HTML
  warnings: string[],        // Processing warnings
  elementCount: number,      // DOM element count
  semanticStats: {           // Only if enhanceSemantic=true
    sectionsEnhanced: number,
    idsAdded: number,
    classesAdded: number,
    rolesAdded: number,
    warnings: string[]
  }
}
```

**Existing Functions**:
- `extractCleanHtml(page, frameworkPatterns)` - Remove scripts, event handlers, framework attributes

### 3. screenshot.js (Modified)

**New Flag**: `--no-semantic`

Disable WordPress semantic HTML enhancement in extracted HTML. By default, semantic enhancement is enabled.

**Usage**:
```bash
node src/core/screenshot.js --url https://example.com --output ./out --extract-html --no-semantic
```

### 4. multi-page-screenshot.js (Modified)

Uses `extractAndEnhanceHtml()` instead of separate extraction steps.

## Processing Pipeline

### Multi-Viewport Screenshot Flow

```
Input URL
  ├─ Desktop (1440x900)
  ├─ Tablet (768x1024)
  └─ Mobile (375x812)
       │
       ├── Wait for page readiness (DOM stable, fonts loaded, styles stable)
       ├── Dismiss cookie banners
       ├── Trigger lazy loading
       ├── Force lazy images visible
       ├── Capture screenshots
       │
       ├── Optional: Extract HTML
       │     ├─ Clean HTML (remove scripts, framework attrs)
       │     └─ Semantic enhance (add WordPress IDs/classes/roles)
       │
       ├── Optional: Extract CSS
       │     ├─ Collect all stylesheet rules
       │     ├─ Extract @keyframes & transitions
       │     └─ Filter unused selectors
       │
       ├── Optional: Capture hover states
       │     ├─ Identify interactive elements
       │     ├─ Screenshot before/during hover
       │     └─ Generate :hover CSS rules
       │
       └── Output: Screenshots + metadata

Output Files
  ├── desktop.png, tablet.png, mobile.png
  ├── source.html (cleaned + optionally semantically enhanced)
  ├── source.css, source-raw.css
  ├── animations.css, animation-tokens.json
  ├── hover.css (if --capture-hover)
  ├── structure.md (if GEMINI_API_KEY set)
  └── tokens.json
```

## Testing

### Test Files

- `tests/test-semantic-enhancer.js` - 59 unit tests covering:
  - SEMANTIC_MAPPINGS exports
  - Section type detection (header, nav, main, sidebar, footer, hero)
  - Semantic attribute application
  - Multiple nav handling with aria-labels
  - HTML enhancement stats
  - Page.evaluate integration

**Run Tests**:
```bash
node tests/test-semantic-enhancer.js
```

## Data Flow

### Semantic Enhancement Data Flow

```
extractAndEnhanceHtml()
  ├─ extractCleanHtml(page)
  │    └─ page.evaluate()
  │         ├─ Clone document
  │         ├─ Remove scripts/noscript
  │         ├─ Remove malicious CSS links
  │         ├─ Remove event handlers
  │         ├─ Remove framework attributes
  │         ├─ Inline critical layout styles
  │         └─ Return cleaned HTML + warnings
  │
  └─ enhanceSemanticHTMLInPage(page, html)
       └─ page.evaluate(enhancementLogic)
            ├─ Parse HTML with DOMParser
            ├─ Detect sections (semantic tags → ARIA roles → class patterns)
            ├─ Apply IDs/classes/roles
            ├─ Handle multiple navs with aria-labels
            ├─ Detect hero sections
            └─ Return enhanced HTML + stats
```

## Configuration & Environment

### CLI Options (screenshot.js)

| Option | Default | Phase | Description |
|--------|---------|-------|-------------|
| --url | required | - | Target URL |
| --output | required | - | Output directory |
| --viewports | all | - | Comma-separated viewport names |
| --full-page | true | - | Capture full page height |
| --max-size | 5 | - | Max file size (MB) before compression |
| --headless | false | - | Run in headless mode |
| --scroll-delay | 1500 | - | Pause time (ms) between scroll steps |
| --extract-html | false | - | Extract cleaned HTML |
| --extract-css | false | - | Extract CSS |
| --filter-unused | true | - | Filter unused CSS selectors |
| --capture-hover | false | 2 | Capture hover states |
| --section-mode | false | - | Enable section-based capture |
| --no-semantic | false | 3 | Disable semantic HTML enhancement |
| --video | false | - | Record scroll animation |

### Environment Variables

```bash
GEMINI_API_KEY=...  # For AI structure analysis
```

## Design Patterns

### Error Handling

All modules use try-catch with warning accumulation. Failed processing steps return partial results rather than throwing.

### Idempotency

Semantic enhancement is idempotent: running it on already-enhanced HTML produces the same result, because IDs, classes, and roles that are already present are skipped.

### Performance

- Combined landmark selector reduces querySelectorAll calls (8 → 1)
- Processed element tracking prevents double-counting from overlapping selectors
- Index-based element matching for reliability during DOM cloning
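
A sketch of the first two performance ideas: one combined selector string (a single `querySelectorAll` call instead of one per landmark) and a `Set` of processed elements. The selector list is an assumption (the exact list behind the documented 8 → 1 reduction is not shown), and plain objects stand in for DOM nodes so the sketch runs outside a browser.

```javascript
// Hypothetical landmark selector list; joined into one selector string.
const LANDMARK_SELECTORS = [
  'header', 'nav', 'main', 'aside', 'footer',
  '[role="banner"]', '[role="navigation"]', '[role="main"]',
  '[role="complementary"]', '[role="contentinfo"]',
];
const COMBINED_SELECTOR = LANDMARK_SELECTORS.join(', ');
// In a page context: document.querySelectorAll(COMBINED_SELECTOR)
// querySelectorAll already returns each element once, so the Set below matters
// when results from separate passes (e.g. landmarks plus class-pattern matches) are merged.

function processOnce(candidateLists, handle) {
  const processed = new Set();
  for (const list of candidateLists) {
    for (const el of list) {
      if (processed.has(el)) continue; // already handled by an earlier pass
      processed.add(el);
      handle(el);
    }
  }
  return processed.size;
}

const headerEl = { tag: 'header' };
const navEl = { tag: 'nav' };
console.log(processOnce([[headerEl, navEl], [headerEl]], (el) => el.tag)); // 2
```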

### Validation

- Input validation on HTML strings (non-empty, valid string type)
- Browser context validation (DOMParser vs page.evaluate)
- ID uniqueness tracking with `usedIds` Set
- DOM size warnings (>50k elements)
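
A sketch of the HTML-string checks and the large-DOM warning. The 50,000 threshold comes from the list; the name `validateHtmlInput` is hypothetical, and counting start tags with a regex is only a rough stand-in for a real DOM count such as `document.querySelectorAll('*').length` inside `page.evaluate`.

```javascript
// Hypothetical sketch of input validation plus a DOM size warning.
const MAX_ELEMENTS = 50000;

function validateHtmlInput(html) {
  // Non-empty, valid string type
  if (typeof html !== 'string' || html.trim() === '') {
    throw new TypeError('html must be a non-empty string');
  }
  const warnings = [];
  // Rough element count: start tags such as "<div" (closing tags and comments are skipped).
  const elementCount = (html.match(/<[a-zA-Z][^\s/>]*/g) || []).length;
  if (elementCount > MAX_ELEMENTS) {
    warnings.push(`Large DOM: ${elementCount} elements (> ${MAX_ELEMENTS})`);
  }
  return { elementCount, warnings };
}

console.log(validateHtmlInput('<div><p>Hi</p><br></div>')); // { elementCount: 3, warnings: [] }
```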

## Version History

### Phase 1
- Multi-viewport screenshots
- HTML/CSS extraction
- Asset extraction

### Phase 2
- Hover state capture
- UX audit with Gemini
- Design token extraction
- DOM tree analysis

### Phase 3
- WordPress semantic HTML enhancement (CURRENT)
- Semantic ID/class/role injection
- ARIA landmark support
- Multiple nav handling

## Integration Points

### With screenshot.js
- New `--no-semantic` flag to disable enhancement
- Automatic semantic enhancement when extracting HTML (unless disabled)

### With html-extractor.js
- New `extractAndEnhanceHtml()` function wraps extraction + enhancement
- `enhanceSemantic` option controls semantic injection

### With multi-page-screenshot.js
- Uses `extractAndEnhanceHtml()` for HTML extraction

## Dependencies

- **playwright** - Browser automation
- **sharp** - Image compression (optional)
- **google-genai** - AI analysis (optional, for Phase 2 features)

## Limitations & Considerations

1. **Browser Context Required**: `enhanceSemanticHTML()` requires DOMParser (browser). Use `enhanceSemanticHTMLInPage()` for Playwright.
2. **Non-Invasive**: Semantic enhancement never removes existing attributes; it only adds or appends.
3. **False Positive Prevention**: Class pattern detection is limited to container elements (div, section, article) to avoid false positives.
4. **Multiple Navigation Menus**: Each nav gets a unique aria-label (Primary Menu, Footer Menu, Navigation 2, etc.)
5. **Hero Section Detection**: Only top-level hero elements (not within header/footer) are detected.
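
A sketch of unique aria-label assignment for multiple navs. The label strings come from the docs, but the selection rules (first non-footer nav is "Primary Menu", navs inside the footer are "Footer Menu", others "Navigation N", numeric suffix on collisions) and the name `labelNavs` are assumptions. Existing labels are kept, matching the non-invasive rule.

```javascript
// Hypothetical sketch: give every nav a unique aria-label.
function labelNavs(navs) {
  // navs: [{ ariaLabel, inFooter }]; labels that already exist are reserved.
  const used = new Set(navs.map((n) => n.ariaLabel).filter(Boolean));
  navs.forEach((nav, index) => {
    if (nav.ariaLabel) return; // never overwrite an existing label
    let label;
    if (nav.inFooter) label = 'Footer Menu';
    else if (index === 0) label = 'Primary Menu';
    else label = `Navigation ${index + 1}`;
    // Keep labels unique, e.g. a second footer nav becomes "Footer Menu 2".
    let candidate = label;
    for (let n = 2; used.has(candidate); n++) candidate = `${label} ${n}`;
    used.add(candidate);
    nav.ariaLabel = candidate;
  });
  return navs;
}

console.log(labelNavs([{}, {}, { inFooter: true }, { inFooter: true }]).map((n) => n.ariaLabel));
```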