design-clone 1.2.0 → 2.1.0

Files changed (66)
  1. package/README.md +26 -12
  2. package/bin/commands/clone-site.js +75 -10
  3. package/bin/commands/init.js +33 -1
  4. package/bin/commands/verify.js +5 -3
  5. package/bin/utils/validate.js +24 -8
  6. package/docs/cli-reference.md +200 -2
  7. package/docs/codebase-summary.md +309 -0
  8. package/docs/design-clone-architecture.md +259 -42
  9. package/docs/pixel-perfect.md +35 -4
  10. package/docs/project-roadmap.md +382 -0
  11. package/docs/troubleshooting.md +5 -4
  12. package/package.json +10 -8
  13. package/src/ai/__pycache__/analyze-structure.cpython-313.pyc +0 -0
  14. package/src/ai/__pycache__/extract-design-tokens.cpython-313.pyc +0 -0
  15. package/src/ai/analyze-structure.py +73 -3
  16. package/src/ai/extract-design-tokens.py +356 -13
  17. package/src/ai/prompts/__pycache__/design_tokens.cpython-313.pyc +0 -0
  18. package/src/ai/prompts/__pycache__/structure_analysis.cpython-313.pyc +0 -0
  19. package/src/ai/prompts/__pycache__/ux_audit.cpython-313.pyc +0 -0
  20. package/src/ai/prompts/design_tokens.py +133 -0
  21. package/src/ai/prompts/structure_analysis.py +329 -10
  22. package/src/ai/prompts/ux_audit.py +198 -0
  23. package/src/ai/ux-audit.js +596 -0
  24. package/src/core/app-state-snapshot.js +511 -0
  25. package/src/core/content-counter.js +342 -0
  26. package/src/core/cookie-handler.js +1 -1
  27. package/src/core/css-extractor.js +4 -4
  28. package/src/core/dimension-extractor.js +93 -21
  29. package/src/core/dimension-output.js +103 -6
  30. package/src/core/discover-pages.js +242 -14
  31. package/src/core/dom-tree-analyzer.js +298 -0
  32. package/src/core/extract-assets.js +1 -1
  33. package/src/core/framework-detector.js +538 -0
  34. package/src/core/html-extractor.js +45 -4
  35. package/src/core/lazy-loader.js +7 -7
  36. package/src/core/multi-page-screenshot.js +9 -6
  37. package/src/core/page-readiness.js +8 -8
  38. package/src/core/screenshot.js +138 -9
  39. package/src/core/section-cropper.js +209 -0
  40. package/src/core/section-detector.js +386 -0
  41. package/src/core/semantic-enhancer.js +492 -0
  42. package/src/core/state-capture.js +18 -22
  43. package/src/core/tests/test-section-cropper.js +177 -0
  44. package/src/core/tests/test-section-detector.js +55 -0
  45. package/src/core/video-capture.js +152 -146
  46. package/src/route-discoverers/angular-discoverer.js +157 -0
  47. package/src/route-discoverers/astro-discoverer.js +123 -0
  48. package/src/route-discoverers/base-discoverer.js +242 -0
  49. package/src/route-discoverers/index.js +106 -0
  50. package/src/route-discoverers/next-discoverer.js +130 -0
  51. package/src/route-discoverers/nuxt-discoverer.js +138 -0
  52. package/src/route-discoverers/react-discoverer.js +139 -0
  53. package/src/route-discoverers/svelte-discoverer.js +109 -0
  54. package/src/route-discoverers/universal-discoverer.js +227 -0
  55. package/src/route-discoverers/vue-discoverer.js +118 -0
  56. package/src/utils/__init__.py +1 -1
  57. package/src/utils/__pycache__/__init__.cpython-313.pyc +0 -0
  58. package/src/utils/browser.js +11 -37
  59. package/src/utils/playwright.js +213 -0
  60. package/src/verification/generate-audit-report.js +398 -0
  61. package/src/verification/verify-footer.js +493 -0
  62. package/src/verification/verify-header.js +486 -0
  63. package/src/verification/verify-layout.js +2 -2
  64. package/src/verification/verify-menu.js +4 -20
  65. package/src/verification/verify-slider.js +533 -0
  66. package/src/utils/puppeteer.js +0 -281
@@ -0,0 +1,309 @@
1
+ # Codebase Summary
2
+
3
+ ## Overview
4
+
5
+ Design Clone is a comprehensive design extraction toolkit that captures website designs through multi-viewport screenshots, extracts HTML/CSS, analyzes structure with AI, and enhances semantic HTML for WordPress compatibility.
6
+
7
+ ## Core Architecture
8
+
9
+ ### Directory Structure
10
+
11
+ ```
12
+ design-clone/
13
+ ├── src/
14
+ │ ├── core/ # Core extraction & processing modules
15
+ │ │ ├── screenshot.js # Multi-viewport screenshot capture
16
+ │ │ ├── html-extractor.js # HTML extraction + semantic enhancement
17
+ │ │ ├── semantic-enhancer.js # WordPress semantic HTML injection (Phase 3)
18
+ │ │ ├── css-extractor.js # CSS extraction & property tracking
19
+ │ │ ├── filter-css.js # Unused CSS selector removal
20
+ │ │ ├── animation-extractor.js # @keyframes & transition extraction
21
+ │ │ ├── state-capture.js # Hover state capture
22
+ │ │ ├── extract-assets.js # Image/font/icon downloading
23
+ │ │ ├── design-tokens.js # Design token extraction
24
+ │ │ ├── dom-tree-analyzer.js # DOM hierarchy for structure analysis
25
+ │ │ ├── dimension-extractor.js # Component dimension measurement
26
+ │ │ ├── section-cropper.js # Section extraction for AI analysis
27
+ │ │ ├── page-readiness.js # Page stability detection
28
+ │ │ ├── lazy-loader.js # Lazy loading trigger & wait
29
+ │ │ ├── cookie-handler.js # Cookie banner dismissal
30
+ │ │ ├── content-counter.js # Content statistics
31
+ │ │ ├── video-capture.js # Scroll animation recording
32
+ │ │ └── app-state-snapshot.js # App state persistence
33
+ │ ├── ai/ # AI analysis modules
34
+ │ │ ├── ux-audit.js # UX audit runner
35
+ │ │ └── prompts/ # AI prompts
36
+ │ ├── verification/ # Verification scripts
37
+ │ └── utils/ # Shared utilities
38
+ │ ├── browser.js # Browser abstraction facade
39
+ │ ├── env.js # Environment resolution
40
+ │ └── helpers.js # CLI utilities
41
+ ├── tests/ # Unit tests
42
+ │ ├── test-semantic-enhancer.js # Semantic enhancer tests (59 tests)
43
+ │ └── [other test files]
44
+ └── package.json
45
+ ```
46
+
47
+ ## Key Modules
48
+
49
+ ### 1. semantic-enhancer.js (Phase 3)
50
+
51
+ **Purpose**: Inject WordPress-compatible semantic IDs, classes, and ARIA roles into extracted HTML while preserving original styling.
52
+
53
+ **Key Exports**:
54
+ - `SEMANTIC_MAPPINGS` - Mapping definitions for header, nav, main, sidebar, footer, hero
55
+ - `detectSectionType(element)` - Detect section type via semantic tags (priority 1), ARIA roles (priority 2), class patterns (priority 3)
56
+ - `applySemanticAttributes(element, sectionType, options)` - Add ID/classes/roles to element
57
+ - `handleMultipleNavs(navElements, usedIds)` - Handle multiple nav elements with aria-label
58
+ - `enhanceSemanticHTML(html, domHierarchy)` - Browser-context enhancement (uses DOMParser)
59
+ - `enhanceSemanticHTMLInPage(page, html)` - Playwright-context enhancement (recommended for Node.js)
60
+
61
+ **Semantic Mappings**:
62
+ ```javascript
63
+ const SEMANTIC_MAPPINGS = {
+   header: { id: 'site-header', classes: ['site-header'], role: 'banner' },
+   nav: { id: 'site-navigation', classes: ['main-navigation', 'nav-menu'], role: 'navigation' },
+   main: { id: 'main-content', classes: ['site-main', 'content-area'], role: 'main' },
+   sidebar: { id: 'primary-sidebar', classes: ['widget-area', 'sidebar'], role: 'complementary' },
+   footer: { id: 'site-footer', classes: ['site-footer'], role: 'contentinfo' },
+   hero: { id: 'hero-section', classes: ['hero'], role: null }, // hero has no ARIA landmark role
+ };
69
+ ```
70
+
71
+ **Detection Priority**:
72
+ 1. Semantic HTML tags (header, nav, main, aside, footer)
73
+ 2. ARIA role attributes (banner, navigation, main, complementary, contentinfo)
74
+ 3. Class pattern matching (header, nav, main, sidebar, footer, hero)
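The priority order above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: it works on a hypothetical plain-object element (`{ tag, role, classes }`) rather than a DOM node, the name `detectSectionTypeSketch` is invented here, and substring matching on class names is an assumption about what "class pattern matching" means. The container-only restriction for class matching comes from the limitations listed at the end of this document.

```javascript
// Hypothetical element model: { tag: string, role?: string, classes?: string[] }.
const TAG_TO_SECTION = new Map([
  ['header', 'header'], ['nav', 'nav'], ['main', 'main'],
  ['aside', 'sidebar'], ['footer', 'footer'],
]);
const ROLE_TO_SECTION = new Map([
  ['banner', 'header'], ['navigation', 'nav'], ['main', 'main'],
  ['complementary', 'sidebar'], ['contentinfo', 'footer'],
]);
const CLASS_PATTERNS = ['header', 'nav', 'main', 'sidebar', 'footer', 'hero'];
const CONTAINER_TAGS = new Set(['div', 'section', 'article']);

function detectSectionTypeSketch(el) {
  const tag = String(el.tag || '').toLowerCase();
  // Priority 1: semantic HTML tag.
  if (TAG_TO_SECTION.has(tag)) return TAG_TO_SECTION.get(tag);
  // Priority 2: ARIA role attribute.
  if (ROLE_TO_SECTION.has(el.role)) return ROLE_TO_SECTION.get(el.role);
  // Priority 3: class pattern, container elements only (assumed substring match).
  if (CONTAINER_TAGS.has(tag)) {
    const classes = (el.classes || []).map((c) => c.toLowerCase());
    for (const pattern of CLASS_PATTERNS) {
      if (classes.some((c) => c.includes(pattern))) return pattern;
    }
  }
  return null; // not a recognized section
}
```

With this ordering, a `<div role="banner" class="footer">` resolves to `header`, because the role check runs before any class is inspected.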
75
+
76
+ **Rules**:
77
+ - Add ID only if none exists (avoid duplicates)
78
+ - Append classes (never replace existing)
79
+ - Set role only if not present
80
+ - Handle multiple navs with proper aria-label (Primary Menu, Footer Menu, etc.)
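As a rough sketch of the first three rules, the function below applies a mapping without overwriting anything that already exists. It assumes a hypothetical plain-object element (`{ id, classes, role }`) and an invented name, `applyAttrsSketch`; the real `applySemanticAttributes(element, sectionType, options)` operates on DOM elements and its options are not documented here.

```javascript
// Two entries copied from the semantic mappings for illustration.
const MAPPINGS_SKETCH = {
  header: { id: 'site-header', classes: ['site-header'], role: 'banner' },
  nav: { id: 'site-navigation', classes: ['main-navigation', 'nav-menu'], role: 'navigation' },
};

function applyAttrsSketch(el, sectionType, usedIds) {
  const stats = { idAdded: false, classesAdded: 0, roleAdded: false };
  if (!Object.prototype.hasOwnProperty.call(MAPPINGS_SKETCH, sectionType)) return stats;
  const mapping = MAPPINGS_SKETCH[sectionType];

  // Rule: add an ID only if the element has none and the ID is not taken.
  if (!el.id && !usedIds.has(mapping.id)) {
    el.id = mapping.id;
    usedIds.add(mapping.id);
    stats.idAdded = true;
  }
  // Rule: append classes, never replace existing ones.
  for (const cls of mapping.classes) {
    if (!el.classes.includes(cls)) {
      el.classes.push(cls);
      stats.classesAdded += 1;
    }
  }
  // Rule: set the role only if none is present.
  if (!el.role && mapping.role) {
    el.role = mapping.role;
    stats.roleAdded = true;
  }
  return stats;
}

const usedIds = new Set();
const headerEl = { id: 'top', classes: ['wrap'], role: null };
applyAttrsSketch(headerEl, 'header', usedIds);
// headerEl keeps id 'top', gains class 'site-header' and role 'banner'.
```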
81
+
82
+ ### 2. html-extractor.js (Modified)
83
+
84
+ **New Function**: `extractAndEnhanceHtml(page, options)`
85
+
86
+ Extracts clean HTML and optionally applies semantic enhancement via semantic-enhancer.js.
87
+
88
+ **Options**:
89
+ ```javascript
90
+ {
91
+ enhanceSemantic: true, // Enable semantic enhancement (default: true)
92
+ frameworkPatterns: [...] // Custom framework patterns to remove
93
+ }
94
+ ```
95
+
96
+ **Returns**:
97
+ ```javascript
98
+ {
99
+ html: string, // Enhanced HTML
100
+ warnings: string[], // Processing warnings
101
+ elementCount: number, // DOM element count
102
+ semanticStats: { // Only if enhanceSemantic=true
103
+ sectionsEnhanced: number,
104
+ idsAdded: number,
105
+ classesAdded: number,
106
+ rolesAdded: number,
107
+ warnings: string[]
108
+ }
109
+ }
110
+ ```
111
+
112
+ **Existing Functions**:
113
+ - `extractCleanHtml(page, frameworkPatterns)` - Remove scripts, event handlers, framework attributes
114
+
115
+ ### 3. screenshot.js (Modified)
116
+
117
+ **New Flag**: `--no-semantic`
118
+
119
+ Disables WordPress semantic enhancement of the extracted HTML. Enhancement is enabled by default whenever HTML is extracted.
120
+
121
+ **Usage**:
122
+ ```bash
123
+ node src/core/screenshot.js --url https://example.com --output ./out --extract-html --no-semantic
124
+ ```
125
+
126
+ ### 4. multi-page-screenshot.js (Modified)
127
+
128
+ Uses `extractAndEnhanceHtml()` instead of separate extraction steps.
129
+
130
+ ## Processing Pipeline
131
+
132
+ ### Multi-Viewport Screenshot Flow
133
+
134
+ ```
135
+ Input URL
136
+ ├─ Desktop (1440x900)
137
+ ├─ Tablet (768x1024)
138
+ └─ Mobile (375x812)
139
+
140
+ ├── Wait for page readiness (DOM stable, fonts loaded, styles stable)
141
+ ├── Dismiss cookie banners
142
+ ├── Trigger lazy loading
143
+ ├── Force lazy images visible
144
+ ├── Capture screenshots
145
+
146
+ ├── Optional: Extract HTML
147
+ │ ├─ Clean HTML (remove scripts, framework attrs)
148
+ │ └─ Semantic enhance (add WordPress IDs/classes/roles)
149
+
150
+ ├── Optional: Extract CSS
151
+ │ ├─ Collect all stylesheet rules
152
+ │ ├─ Extract @keyframes & transitions
153
+ │ └─ Filter unused selectors
154
+
155
+ ├── Optional: Capture hover states
156
+ │ ├─ Identify interactive elements
157
+ │ ├─ Screenshot before/during hover
158
+ │ └─ Generate :hover CSS rules
159
+
160
+ └── Output: Screenshots + metadata
161
+
162
+ Output Files
163
+ ├── desktop.png, tablet.png, mobile.png
164
+ ├── source.html (cleaned + optionally semantically enhanced)
165
+ ├── source.css, source-raw.css
166
+ ├── animations.css, animation-tokens.json
167
+ ├── hover.css (if --capture-hover)
168
+ ├── structure.md (if GEMINI_API_KEY set)
169
+ └── tokens.json
170
+ ```
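The three viewports and their output files in the flow above lend themselves to a small lookup table. The sketch below shows one way a comma-separated `--viewports` value could be resolved against it; `resolveViewports` and the error message are hypothetical, and only the sizes and file names come from this document.

```javascript
// Sizes taken from the flow diagram; everything else is illustrative.
const VIEWPORTS = {
  desktop: { width: 1440, height: 900 },
  tablet: { width: 768, height: 1024 },
  mobile: { width: 375, height: 812 },
};

function resolveViewports(option = 'all') {
  const names = option === 'all'
    ? Object.keys(VIEWPORTS)
    : option.split(',').map((s) => s.trim().toLowerCase()).filter(Boolean);

  const unknown = names.filter(
    (name) => !Object.prototype.hasOwnProperty.call(VIEWPORTS, name),
  );
  if (unknown.length > 0) {
    throw new Error(`Unknown viewport(s): ${unknown.join(', ')}`);
  }
  // One entry per requested viewport, in the order given.
  return names.map((name) => ({ name, ...VIEWPORTS[name], file: `${name}.png` }));
}

const selected = resolveViewports('mobile, desktop');
// selected[0] is { name: 'mobile', width: 375, height: 812, file: 'mobile.png' }
```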
171
+
172
+ ## Testing
173
+
174
+ ### Test Files
175
+
176
+ - `tests/test-semantic-enhancer.js` - 59 unit tests covering:
177
+   - SEMANTIC_MAPPINGS exports
178
+   - Section type detection (header, nav, main, sidebar, footer, hero)
179
+   - Semantic attribute application
180
+   - Multiple nav handling with aria-labels
181
+   - HTML enhancement stats
182
+   - Page.evaluate integration
183
+
184
+ **Run Tests**:
185
+ ```bash
186
+ node tests/test-semantic-enhancer.js
187
+ ```
188
+
189
+ ## Data Flow
190
+
191
+ ### Semantic Enhancement Data Flow
192
+
193
+ ```
194
+ extractAndEnhanceHtml()
195
+ ├─ extractCleanHtml(page)
196
+ │ └─ page.evaluate()
197
+ │ ├─ Clone document
198
+ │ ├─ Remove scripts/noscript
199
+ │ ├─ Remove malicious CSS links
200
+ │ ├─ Remove event handlers
201
+ │ ├─ Remove framework attributes
202
+ │ ├─ Inline critical layout styles
203
+ │ └─ Return cleaned HTML + warnings
204
+
205
+ └─ enhanceSemanticHTMLInPage(page, html)
206
+ └─ page.evaluate(enhancementLogic)
207
+ ├─ Parse HTML with DOMParser
208
+ ├─ Detect sections (semantic tags → ARIA roles → class patterns)
209
+ ├─ Apply IDs/classes/roles
210
+ ├─ Handle multiple navs with aria-labels
211
+ ├─ Detect hero sections
212
+ └─ Return enhanced HTML + stats
213
+ ```
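The two-stage flow above can be illustrated with a small composition function. This is a sketch under stated assumptions: the stages are injected as plain synchronous functions so the example runs without a browser, whereas the real `extractCleanHtml()` and `enhanceSemanticHTMLInPage()` go through `page.evaluate()` and would be awaited. The names `extractAndEnhanceSketch`, `stages.clean`, and `stages.enhance` are hypothetical.

```javascript
function extractAndEnhanceSketch(stages, options = {}) {
  const { enhanceSemantic = true } = options;

  // Stage 1: cleaned HTML plus warnings and element count.
  const cleaned = stages.clean();
  const result = {
    html: cleaned.html,
    warnings: [...cleaned.warnings],
    elementCount: cleaned.elementCount,
  };

  // Stage 2 (optional): semantic enhancement; stats appear only when enabled.
  if (enhanceSemantic) {
    const enhanced = stages.enhance(cleaned.html);
    result.html = enhanced.html;
    result.semanticStats = enhanced.stats;
  }
  return result;
}

// Stub stages standing in for the browser-side work.
const stubStages = {
  clean: () => ({ html: '<div class="header"></div>', warnings: [], elementCount: 1 }),
  enhance: (html) => ({
    html: html.replace('<div', '<div id="site-header"'),
    stats: { sectionsEnhanced: 1, idsAdded: 1, classesAdded: 0, rolesAdded: 0, warnings: [] },
  }),
};

const enhancedResult = extractAndEnhanceSketch(stubStages);
const plainResult = extractAndEnhanceSketch(stubStages, { enhanceSemantic: false });
```

Keeping `semanticStats` off the result when enhancement is disabled mirrors the "only if enhanceSemantic=true" note in the documented return shape.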
214
+
215
+ ## Configuration & Environment
216
+
217
+ ### CLI Options (screenshot.js)
218
+
219
+ | Option | Default | Phase | Description |
220
+ |--------|---------|-------|-------------|
221
+ | --url | required | - | Target URL |
222
+ | --output | required | - | Output directory |
223
+ | --viewports | all | - | Comma-separated viewport names |
224
+ | --full-page | true | - | Capture full page height |
225
+ | --max-size | 5 | - | Max file size (MB) before compression |
226
+ | --headless | false | - | Run in headless mode |
227
+ | --scroll-delay | 1500 | - | Pause time (ms) between scroll steps |
228
+ | --extract-html | false | - | Extract cleaned HTML |
229
+ | --extract-css | false | - | Extract CSS |
230
+ | --filter-unused | true | - | Filter unused CSS selectors |
231
+ | --capture-hover | false | 2 | Capture hover states |
232
+ | --section-mode | false | - | Enable section-based capture |
233
+ | --no-semantic | false | 3 | Disable semantic HTML enhancement |
234
+ | --video | false | - | Record scroll animation |
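To make the defaults in the table concrete, here is a minimal argument parser sketch. The parsing convention is an assumption (valued options as `--name value`, boolean options as bare flags); the source does not show how `screenshot.js` parses its arguments, nor how the options that default to `true` are switched off, so those are left at their defaults. `parseArgsSketch` and the derived `enhanceSemantic` field are hypothetical names.

```javascript
// Defaults copied from the options table.
const DEFAULTS = {
  viewports: 'all', fullPage: true, maxSize: 5, headless: false,
  scrollDelay: 1500, extractHtml: false, extractCss: false, filterUnused: true,
  captureHover: false, sectionMode: false, noSemantic: false, video: false,
};
const VALUE_OPTIONS = new Set(['url', 'output', 'viewports', 'max-size', 'scroll-delay']);
const NUMERIC_OPTIONS = new Set(['max-size', 'scroll-delay']);
const toCamel = (s) => s.replace(/-([a-z])/g, (_, c) => c.toUpperCase());

function parseArgsSketch(argv) {
  const opts = { ...DEFAULTS };
  for (let i = 0; i < argv.length; i += 1) {
    const arg = argv[i];
    if (!arg.startsWith('--')) continue;
    const name = arg.slice(2);
    if (VALUE_OPTIONS.has(name)) {
      i += 1;
      const raw = argv[i];
      if (raw === undefined) throw new Error(`Missing value for --${name}`);
      opts[toCamel(name)] = NUMERIC_OPTIONS.has(name) ? Number(raw) : raw;
    } else {
      opts[toCamel(name)] = true; // bare flag (assumed convention)
    }
  }
  if (!opts.url || !opts.output) throw new Error('--url and --output are required');
  // Enhancement runs only when HTML is extracted and --no-semantic is absent.
  opts.enhanceSemantic = opts.extractHtml && !opts.noSemantic;
  return opts;
}

const parsed = parseArgsSketch([
  '--url', 'https://example.com', '--output', './out', '--extract-html', '--no-semantic',
]);
// parsed.extractHtml is true and parsed.enhanceSemantic is false.
```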
235
+
236
+ ### Environment Variables
237
+
238
+ ```bash
239
+ GEMINI_API_KEY=... # For AI structure analysis
240
+ ```
241
+
242
+ ## Design Patterns
243
+
244
+ ### Error Handling
245
+
246
+ All modules use try-catch with warning accumulation. Failed processing steps return partial results rather than throwing.
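A minimal sketch of that pattern, assuming a hypothetical `runStep` helper (the modules may structure this differently): each step runs inside try-catch, a failure is recorded as a warning, and a fallback value keeps the overall result usable.

```javascript
function runStep(name, fn, fallback, warnings) {
  try {
    return fn();
  } catch (err) {
    // Record the failure and continue with a partial result.
    warnings.push(`${name} failed: ${err.message}`);
    return fallback;
  }
}

const stepWarnings = [];
const stepOutput = {
  html: runStep('extract-html', () => '<main></main>', '', stepWarnings),
  css: runStep('extract-css', () => { throw new Error('stylesheet blocked'); }, '', stepWarnings),
  warnings: stepWarnings,
};
// stepOutput.html is still available even though the CSS step failed.
```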
247
+
248
+ ### Idempotency
249
+
250
+ Semantic enhancement is idempotent: running it on already-enhanced HTML produces the same result, because IDs, classes, and roles that are already present are skipped.
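One way to check this property is to compare `f(f(x))` with `f(x)`. The sketch below does so for a hypothetical footer enhancer on a plain-object element; `enhanceFooterOnce`, `naiveEnhance`, and `isIdempotent` are illustrative names, not part of the package.

```javascript
// Skips anything already present, so a second pass changes nothing.
function enhanceFooterOnce(el) {
  const classes = el.classes.includes('site-footer')
    ? el.classes
    : [...el.classes, 'site-footer'];
  return { ...el, id: el.id || 'site-footer', classes, role: el.role || 'contentinfo' };
}

// Counter-example: always appends, so classes grow on every pass.
function naiveEnhance(el) {
  return { ...el, classes: [...el.classes, 'site-footer'] };
}

function isIdempotent(fn, input) {
  const once = fn(input);
  const twice = fn(once);
  return JSON.stringify(once) === JSON.stringify(twice);
}

const footerEl = { tag: 'footer', classes: ['page-foot'] };
const guarded = isIdempotent(enhanceFooterOnce, footerEl); // true
const unguarded = isIdempotent(naiveEnhance, footerEl);    // false
```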
251
+
252
+ ### Performance
253
+
254
+ - Combined landmark selector reduces querySelectorAll calls (8 → 1)
255
+ - Processed element tracking prevents double-counting from overlapping selectors
256
+ - Index-based element matching for reliability during DOM cloning
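The first two points can be sketched as follows. The selector list is a placeholder (the source does not say which eight selectors were combined), and `processOnce` is a hypothetical helper that uses a `Set` so an element matched by more than one pass is handled a single time.

```javascript
// Placeholder landmark selectors; the actual list is not given in the source.
const LANDMARK_SELECTORS = [
  'header', 'nav', 'main', 'aside', 'footer',
  '[role="banner"]', '[role="navigation"]', '[role="contentinfo"]',
];

// One combined selector means a single querySelectorAll call in the browser,
// e.g. document.querySelectorAll(COMBINED_SELECTOR).
const COMBINED_SELECTOR = LANDMARK_SELECTORS.join(', ');

// Handle each element once, even if several passes matched it.
function processOnce(elements, handler) {
  const processed = new Set();
  let count = 0;
  for (const el of elements) {
    if (processed.has(el)) continue;
    processed.add(el);
    handler(el);
    count += 1;
  }
  return count;
}

const bannerEl = { tag: 'header', role: 'banner' };
const navEl = { tag: 'nav' };
// bannerEl appears twice, as if matched by both a tag pass and a role pass.
const handledCount = processOnce([bannerEl, navEl, bannerEl], () => {});
```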
257
+
258
+ ### Validation
259
+
260
+ - Input validation on HTML strings (non-empty, valid string type)
261
+ - Browser context validation (DOMParser vs page.evaluate)
262
+ - ID uniqueness tracking with usedIds Set
263
+ - DOM size warnings (>50k elements)
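A rough sketch of three of these checks, with hypothetical helper names. Whether the real code throws or only records a warning on invalid input is not stated in the source; throwing here is an assumption. The 50k threshold comes from the list above.

```javascript
const MAX_DOM_ELEMENTS = 50000;

// Input validation: must be a non-empty string (throwing is an assumption).
function validateHtmlInput(html) {
  if (typeof html !== 'string') throw new TypeError('html must be a string');
  if (html.trim() === '') throw new Error('html must not be empty');
  return html;
}

// DOM size check: returns a warning list instead of failing.
function domSizeWarnings(elementCount) {
  return elementCount > MAX_DOM_ELEMENTS
    ? [`Large DOM: ${elementCount} elements (limit ${MAX_DOM_ELEMENTS})`]
    : [];
}

// ID uniqueness: reserve an ID in the shared Set, or report that it is taken.
function reserveId(id, usedIds) {
  if (usedIds.has(id)) return false;
  usedIds.add(id);
  return true;
}

const reserved = new Set();
const firstClaim = reserveId('site-header', reserved);  // true
const secondClaim = reserveId('site-header', reserved); // false
```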
264
+
265
+ ## Version History
266
+
267
+ ### Phase 1
268
+ - Multi-viewport screenshots
269
+ - HTML/CSS extraction
270
+ - Asset extraction
271
+
272
+ ### Phase 2
273
+ - Hover state capture
274
+ - UX audit with Gemini
275
+ - Design token extraction
276
+ - DOM tree analysis
277
+
278
+ ### Phase 3
279
+ - WordPress semantic HTML enhancement (CURRENT)
280
+ - Semantic ID/class/role injection
281
+ - ARIA landmark support
282
+ - Multiple nav handling
283
+
284
+ ## Integration Points
285
+
286
+ ### With screenshot.js
287
+ - New `--no-semantic` flag to disable enhancement
288
+ - Automatic semantic enhancement when extracting HTML (unless disabled)
289
+
290
+ ### With html-extractor.js
291
+ - New `extractAndEnhanceHtml()` function wraps extraction + enhancement
292
+ - `enhanceSemantic` option controls semantic injection
293
+
294
+ ### With multi-page-screenshot.js
295
+ - Uses `extractAndEnhanceHtml()` for HTML extraction
296
+
297
+ ## Dependencies
298
+
299
+ - **playwright** - Browser automation
300
+ - **sharp** - Image compression (optional)
301
+ - **google-genai** - AI analysis (optional, for Phase 2 features)
302
+
303
+ ## Limitations & Considerations
304
+
305
+ 1. **Browser Context Required**: `enhanceSemanticHTML()` relies on DOMParser, which exists only in a browser context. From Node.js, use `enhanceSemanticHTMLInPage()` with a Playwright page.
306
+ 2. **Non-Invasive**: Semantic enhancement never removes existing attributes, only adds/appends.
307
+ 3. **False Positive Prevention**: Class pattern detection limited to container elements (div, section, article) to avoid false positives.
308
+ 4. **Multiple Navs**: Each nav element gets a unique aria-label (Primary Menu, Footer Menu, Navigation 2, etc.)
309
+ 5. **Hero Section Detection**: Only top-level hero elements (not within header/footer) are detected.
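The aria-label scheme in item 4 could look roughly like the sketch below. The assignment rules are assumptions inferred from the example labels: the first nav outside a footer becomes "Primary Menu", the first nav inside a footer becomes "Footer Menu", and any other nav is numbered by position. `labelNavsSketch` and the `insideFooter` field are hypothetical; existing labels are kept as-is, which means uniqueness is only guaranteed among the labels this function generates.

```javascript
// navs: [{ insideFooter: boolean, ariaLabel?: string }] in document order.
function labelNavsSketch(navs) {
  let primaryAssigned = false;
  let footerAssigned = false;
  return navs.map((nav, index) => {
    if (nav.ariaLabel) return nav.ariaLabel; // never replace an existing label
    if (nav.insideFooter && !footerAssigned) {
      footerAssigned = true;
      return 'Footer Menu';
    }
    if (!nav.insideFooter && !primaryAssigned) {
      primaryAssigned = true;
      return 'Primary Menu';
    }
    return `Navigation ${index + 1}`; // fallback for additional navs
  });
}

const navLabels = labelNavsSketch([
  { insideFooter: false },
  { insideFooter: false },
  { insideFooter: true },
]);
// navLabels is ['Primary Menu', 'Navigation 2', 'Footer Menu']
```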