similarbuild 0.3.3 → 0.3.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json
CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "similarbuild",
-   "version": "0.3.
+   "version": "0.3.4",
    "description": "Visual migration framework for Claude Code — clone a live page, get a paste-ready WordPress/Elementor or Shopify section file, validated and auto-corrected.",
    "type": "module",
    "bin": {
@@ -37,47 +37,74 @@ A single `.html` file written to `outputPath`. The file is a fragment — no `<h

  ## On Activation

- 1. **Read the inputs.** Parse `inspection.json` (capture `sectionType`, `tokens`, `dom`, **`domLive`**, `pseudoElements`, `imgUrls`, **`hydratedHeader`**, **`hydratedFooter`**, and
+ 1. **Read the inputs.** Parse `inspection.json` (capture `sectionType`, `tokens`, `dom`, **`domLive`**, `pseudoElements`, `imgUrls`, **`hydratedHeader`**, **`hydratedFooter`**, and **`sectionCrops[]`**) and `assets-map.json` (the URL → localPath / inline-SVG dictionary). If `fixHints` is given, also read `previousHtmlPath`.

-
+ **CANONICAL VISUAL INPUT = `sectionCrops[]`.** Do NOT read `inspection.screenshot` (full-page) — it's downscaled when loaded as image and digits become unreadable. The crops are HD-readable per section. Composer uses one crop per section it composes.

-
+ **§V03-4 — TERNARY section classification + ZERO-FABRICATION HARD ENFORCEMENT (replaces §V03-3).** Before composing ANY section, classify it into ONE of five categories. Pick the FIRST category that matches. Each category has a distinct compose recipe. Default = placeholder is FORBIDDEN; placeholder is only the last-resort for category (E).

  **Workflow when composing a body page** (anything NOT `--target-section=header|footer`):

- 1. **For each section you compose
-
- 2. **
-
- -
- -
- -
- -
-
-
-
- -
-
-
-
-
-
-
-
-
-
- -
-
-
-
- -
- -
-
-
-
-
-
-
+ 1. **Find the section's crop.** For each section you compose, find the matching `inspection.sectionCrops[]` entry by `bbox.y` proximity. Read the crop via `Read({ file_path: cropEntry.path })`.
+
+ 2. **Classify the section** into A / B / C / D / E:
+
+ **(A) IMAGE-BLOCK** — the section is dominated by a single `<img>` element. Detect by either:
+ - `inspection.domLive` (or `dom`) — the section's root node has a child `<img>` whose `bbox` covers ≥80% of the section's `bbox`, OR
+ - Wrapper class match: ancestor has class containing `product-info__image`, `image-with-text`, `hero-image`, `trust-badges`, `features-points`, `mothers-day`, `single-image`, `banner-image`, or similar "single asset" pattern.
+ - In Shopify Dawn/OS 2.0: **`<div class="product-info__image">` is THE canonical flag** — every match is an image-block.
+
+ Recipe: **DO NOT reproduce the image's contents in HTML/CSS.** Emit:
+ ```html
+ <section class="es-{section-slug}">
+ <img src="{assetsMap.assets[url].localPath}" alt="{img.alt || section description}" loading="lazy" width="{img.width}" height="{img.height}" />
+ </section>
+ ```
+ Pull the URL from `inspection.imgUrls` matching the section's bbox / src pattern. If asset wasn't downloaded (failed extract or didn't appear in imgUrls), emit `<img src="" alt="..." data-todo="asset-missing">` + `<!-- TODO: image asset not in assetsMap (URL: …) — download manually -->`. No CSS reproduction. No SVG re-creation. No text "interpretation" of what's inside the image.
+
+ **Examples from real feedback (everstride PDP)**: `mothers-day-new.svg` banner, `trust-badges-shipping.svg`, `features-points.svg`, `60-day-fit.svg` cards — all category (A).
+
+ **(B) PURE MARKUP** — the section is text + structured layout, no dominant image. Examples: FAQ accordion (`<details>/<summary>`), comparison tables (`<table>` with cells), pricing tiers, trust copy blocks, headings with paragraphs.
+
+ Recipe: READ the crop carefully + cross-validate every literal against `inspection.domLive` text nodes. Emit real semantic markup (`<table>`, `<dl>`, `<details>`, `<ul>` of `<li>`, etc.) that mirrors the live DOM structure.
+
+ **Cross-validation is MANDATORY for every literal**:
+ - Numeric (`$NN`, `NN reviews`, percentages, ratings): grep `inspection.domLive` recursively for a text node containing the exact substring. NOT found → emit `<!-- TODO: literal "$29.00" read from crop but not in DOM; verify -->` + placeholder.
+ - Headings/copy/labels: same rule. "What Customers Say", "All rights reserved", "Mix & Match Colors" — if not in DOM verbatim, do NOT emit.
+ - Counts: count items visible in crop AND cross-validate against `domLive` children of the container. If mismatch, prefer DOM count (DOM is authoritative for structure) + log discrepancy.
+
+ **(C) MIXED (image + text)** — the section has BOTH a meaningful image AND text content (heading + body paragraph + image side-by-side). Common Shopify pattern: `<images-with-text-scrolling>` custom element.
+
+ Recipe: emit `<section>` with `<img>` (rule A for the image part — use real asset URL when available) + `<h*>` heading + `<p>` paragraph(s) literal from crop, cross-validated. Do NOT collapse to image-only or text-only — preserve both.
+
+ **(D) WIDGET-RENDERED** — third-party widget (Loox reviews, "you may also like" carousel, Instagram feed) where the crop SHOWS the widget already rendered with content (reviews with names + photos + stars + quotes; product cards with prices + titles).
+
+ Recipe: READ the crop and emit real markup for each visible card/item. Cross-validate against DOM when widget exposes content as JSON-LD or hidden HTML. Mark container with `data-source="widget-name"` so the deploy team knows where to integrate the real widget later. **Do NOT emit empty mount-div when content is visible.** The "deploy time placeholder" excuse is ONLY for (E).
+
+ **(E) WIDGET-EMPTY / OPAQUE** — the crop is blank, only shows "Loading..." text, or the section is completely unreadable (zero text visible, no images, no structure). Last resort.
+
+ Recipe: emit STRUCTURAL placeholder (heading from DOM if available + `<div class="placeholder-grid">` with N visual placeholder-cards) + `<!-- TODO: <widget-name or section-description> — content not visible in crop, integrate at deploy -->`. Mount-div alone (empty `<div>`) is FORBIDDEN — placeholder must have visible structure so the deployed page is not visually broken.
+
+ 3. **Hybridize sources by field type** (applies to B, C, D):
+ - **Texts/structure/counts** → crop, cross-validated against DOM.
+ - **Hrefs** → `inspection.domLive` / `hydratedHeader` / `hydratedFooter` / `imgUrls` by matching visible link text. No match → `href="#"` + TODO.
+ - **Form action/method/hidden inputs** → ALWAYS DOM (never guess; broken submission else).
+ - **Image src** → `assetsMap.assets[url].localPath` by alt/src pattern matching imgUrls.
+ - **Computed style** → `inspection.tokens` + `domLive.computedStyle`.
+
+ 4. **Self-validation BEFORE calling `build-wp.mjs write`** — scan composed HTML for fabrication signals:
+ - Every numeric/price literal must either appear in `inspection.domLive` text nodes OR have a nearby TODO comment.
+ - Every `<h*>` / heading text must appear in DOM verbatim OR have TODO.
+ - Every `<p>` text > 30 chars must appear in DOM verbatim OR have TODO.
+ - Every FAQ `<div class="faq-a">` body MUST be either (a) empty + TODO when accordion was closed, OR (b) verbatim from crop where accordion was open.
+ - Image-block sections (A) MUST NOT contain SVG reproductions or CSS-styled `<div>` mimicking the image — they MUST be `<img>` + nothing else inside the section.
+ If any violation → REWRITE before submit. If can't pass after 1 rewrite, return preflight error `composition-fabrication-detected` for orchestrator Step 4j retry.
+
+ 5. **Hard-fail after 2 attempts.** Orchestrator (Step 4j) caps at 2 compose iterations. 2nd iteration still flagged → page goes to **❌** with TODO-only fragment shipped (not as "ready" or "partial"). NEVER plausible-but-wrong shipped as warning.
+
+ 6. **NEVER complete content from training data.** You may have seen this URL during training. That data is OLD. The crop is NEW. Crop wins. Training data is IRRELEVANT — never fill gaps with "what this site probably has".
+
+ This contract enforces the user's hard requirement: "não pode me entregar algo diferente e se não conseguir falha na 2 vez." ("You can't hand me something different, and if you can't manage it, fail on the second attempt.")

  **§V03-1 — Use `domLive` as the canonical body tree when present.** When `inspection.domLive` is non-null, it holds the live-walker snapshot taken BEFORE Cap A substituted `dom[]` with the shadow-flattened tree. The flattened `dom[]` carries `bbox={0,0,0,0}` and empty `computedStyle` because parseHTMLUnsafe returns a detached doc — it's structurally rich (gallery imgs, custom-element children) but useless for layout. The composer needs real bboxes, real `computedStyle.background`, real heights. Always prefer `inspection.domLive` for body section composition (bbox, computedStyle, hero detection, section ordering). Use `inspection.dom` only when `domLive === null` (page had no shadow roots — flatten didn't fire) or when you specifically need shadow-flattened content like a PDP gallery (consult `dom` for image-rich PDP nodes, but use `domLive` for the surrounding layout).

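Step 1 of the new workflow matches a section to its crop "by `bbox.y` proximity" but does not spell out the rule. Below is a minimal sketch of one reading. It assumes each `sectionCrops[]` entry carries a numeric `bbox.y` and a `path`; the inspector hunks further down write `sectionType`, `bbox`, `path` and `imageBlock` onto each entry. The `findCropForSection` name and the 200 px cut-off are hypothetical and not part of the package:

```javascript
// Hypothetical helper (not part of similarbuild): choose the sectionCrops[]
// entry whose bbox.y is nearest to the section being composed.
function findCropForSection(sectionBbox, sectionCrops, maxDistance = 200) {
  let best = null
  let bestDelta = Infinity
  for (const crop of sectionCrops || []) {
    // Skip malformed entries instead of throwing
    if (!crop || !crop.bbox || typeof crop.bbox.y !== 'number') continue
    const delta = Math.abs(crop.bbox.y - sectionBbox.y)
    if (delta < bestDelta) {
      best = crop
      bestDelta = delta
    }
  }
  // Nothing within maxDistance pixels: report "no crop" rather than guess
  return bestDelta <= maxDistance ? best : null
}

// Made-up crop list for illustration
const crops = [
  { sectionType: 'hero', bbox: { y: 0, w: 1280, h: 600 }, path: 'crops/00-hero.png' },
  { sectionType: 'faq', bbox: { y: 1800, w: 1280, h: 700 }, path: 'crops/05-faq.png' },
]
const match = findCropForSection({ y: 1790 }, crops)
console.log(match ? match.path : 'no crop') // crops/05-faq.png
```

The returned entry's `path` is what the composer would pass to `Read({ file_path: cropEntry.path })`. Returning `null` beyond the cut-off, instead of always taking the nearest crop, prevents a section from being composed against a neighbouring section's screenshot.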
@@ -78,7 +78,7 @@ function findAll(re, str) {
    return out
  }

- function validateHtml(html) {
+ function validateHtml(html, inspection = null) {
    const errors = []
    const warnings = []
    const info = []
@@ -228,6 +228,103 @@ function validateHtml(html) {
      fabricationRisks.push({ section: m[1].trim(), reason: m[2].trim() })
    }

+   // 13. §V03-4 fabrication detector — when inspection JSON is provided,
+   // cross-validate every literal text in the composed HTML against the
+   // inspection's text corpus (domLive + dom + hydratedHeader.html +
+   // hydratedFooter.html). Literals that don't appear ANYWHERE in the
+   // inspection corpus are flagged as fabrications.
+   //
+   // Detected literal types:
+   // - prices/money: $\d+(.\d{2})? or \d+\.\d{2}\b
+   // - countdown phrases: \d+(?:,\d{3})*\s*(reviews|customers|women|sold|pairs|...)
+   // - headings text content (h1-h6 + p.bold-like)
+   // - meaningful paragraph text >30 chars
+   //
+   // Findings emit error `composition-fabrication-detected` so the orchestrator
+   // (Step 4j) can re-trigger composition with feedback.
+   const fabricationFindings = []
+   if (inspection && typeof inspection === 'object') {
+     const corpus = buildInspectionCorpus(inspection)
+     const looseCorpus = corpus
+       .toLowerCase()
+       .replace(/\s+/g, ' ')
+       .replace(/[‘’]/g, "'")
+       .replace(/[“”]/g, '"')
+       .replace(/[—–-]+/g, '-')
+     // strip TODO-commented regions so they don't get re-scanned
+     const htmlBody = html.replace(/<!--[\s\S]*?-->/g, '')
+     // (a) money literals
+     const moneyRe = /\$\d{1,5}(?:\.\d{2})?/g
+     const moneyHits = new Set()
+     for (let m = moneyRe.exec(htmlBody); m; m = moneyRe.exec(htmlBody)) moneyHits.add(m[0])
+     for (const lit of moneyHits) {
+       if (!corpusHas(looseCorpus, lit)) {
+         fabricationFindings.push({
+           kind: 'money',
+           literal: lit,
+           severity: 'high',
+           message: `Money literal "${lit}" not found in inspection corpus (domLive/dom/hydratedHeader/hydratedFooter)`,
+         })
+       }
+     }
+     // (b) count phrases like "19,479 Reviews", "240,000+ Women", "1M+ Sold"
+     const countRe = /\b(\d[\d,]{1,9}\+?)\s+(reviews?|customers?|women|sold|pairs?|orders?|stars?|users?|clinicians?|five-star|verified)\b/gi
+     const countHits = new Set()
+     for (let m = countRe.exec(htmlBody); m; m = countRe.exec(htmlBody)) countHits.add(m[0])
+     for (const lit of countHits) {
+       if (!corpusHas(looseCorpus, lit)) {
+         fabricationFindings.push({
+           kind: 'count',
+           literal: lit,
+           severity: 'high',
+           message: `Count phrase "${lit}" not found in inspection corpus`,
+         })
+       }
+     }
+     // (c) headings — text content of h1..h6
+     const headingRe = /<h[1-6][^>]*>([\s\S]*?)<\/h[1-6]>/gi
+     const headingHits = new Set()
+     for (let m = headingRe.exec(htmlBody); m; m = headingRe.exec(htmlBody)) {
+       const txt = m[1].replace(/<[^>]+>/g, '').trim()
+       if (txt.length >= 3 && txt.length <= 120) headingHits.add(txt)
+     }
+     for (const lit of headingHits) {
+       if (!corpusHas(looseCorpus, lit)) {
+         fabricationFindings.push({
+           kind: 'heading',
+           literal: lit,
+           severity: 'high',
+           message: `Heading "${lit}" not found in inspection corpus`,
+         })
+       }
+     }
+     // (d) long paragraphs (>30 chars) — text content of <p>
+     const paraRe = /<p[^>]*>([\s\S]*?)<\/p>/gi
+     const paraHits = new Set()
+     for (let m = paraRe.exec(htmlBody); m; m = paraRe.exec(htmlBody)) {
+       const txt = m[1].replace(/<[^>]+>/g, '').trim()
+       if (txt.length >= 30 && txt.length <= 400) paraHits.add(txt)
+     }
+     for (const lit of paraHits) {
+       if (!corpusHas(looseCorpus, lit)) {
+         fabricationFindings.push({
+           kind: 'paragraph',
+           literal: lit.slice(0, 80) + (lit.length > 80 ? '…' : ''),
+           severity: 'medium',
+           message: `Paragraph "${lit.slice(0, 80)}…" not found in inspection corpus`,
+         })
+       }
+     }
+     if (fabricationFindings.length > 0) {
+       errors.push({
+         rule: 'composition-fabrication-detected',
+         severity: 'high',
+         message: `§V03-4 fabrication-detector found ${fabricationFindings.length} literal(s) not present in inspection corpus. The composer should re-read the section crops and either remove the fabricated literals OR replace with TODO comments.`,
+         findings: fabricationFindings,
+       })
+     }
+   }
+
    return {
      passed: errors.length === 0,
      errorCount: errors.length,
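The two regex-based extractors in the detector are easiest to judge on a concrete fragment. The sketch below copies `moneyRe` and `countRe` verbatim and collects matches with `String.prototype.match`. For gathering `m[0]`, that is equivalent to the `exec` loops above. `extractLiterals` and the sample HTML are made up for illustration:

```javascript
// Regexes copied from the §V03-4 detector
const moneyRe = /\$\d{1,5}(?:\.\d{2})?/g
const countRe = /\b(\d[\d,]{1,9}\+?)\s+(reviews?|customers?|women|sold|pairs?|orders?|stars?|users?|clinicians?|five-star|verified)\b/gi

// Hypothetical helper: collect the money and count literals the detector
// would go on to look up in the inspection corpus.
function extractLiterals(html) {
  // Drop HTML comments first, so TODO notes are never scanned
  const body = html.replace(/<!--[\s\S]*?-->/g, '')
  const money = new Set(body.match(moneyRe) || [])
  const counts = new Set(body.match(countRe) || [])
  return { money: [...money], counts: [...counts] }
}

const sample = [
  '<h2>What Customers Say</h2>',
  '<p>Now $89.00 (was $120)</p>',
  '<span>19,479 Reviews</span>',
  '<!-- TODO: literal "$29.00" read from crop but not in DOM; verify -->',
].join('\n')

console.log(JSON.stringify(extractLiterals(sample)))
// {"money":["$89.00","$120"],"counts":["19,479 Reviews"]}
```

As written, only `$`-prefixed amounts are extracted, so a bare `29.00` is not checked even though the comment block lists `\d+\.\d{2}\b`. Similarly, `countRe` needs at least two digit-or-comma characters before the noun, so neither "1M+ Sold" (cited in the code's own comment) nor "5 stars" is matched.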
@@ -236,9 +333,62 @@ function validateHtml(html) {
      warnings,
      info,
      fabricationRisks,
+     fabricationFindings,
    }
  }

+ // §V03-4 — Build a single lowercased text corpus from all inspection text
+ // sources for fabrication cross-validation. Includes:
+ // - domLive (preferred) / dom: every node.text + node.attrs.alt
+ // - hydratedHeader.html, hydratedFooter.html (raw outerHTML strings)
+ function buildInspectionCorpus(inspection) {
+   const parts = []
+   function walkText(arr) {
+     if (!arr) return
+     const stack = Array.isArray(arr) ? [...arr] : [arr]
+     while (stack.length) {
+       const n = stack.pop()
+       if (!n || typeof n !== 'object') continue
+       if (typeof n.text === 'string' && n.text.trim()) parts.push(n.text)
+       if (n.attrs && typeof n.attrs === 'object') {
+         if (typeof n.attrs.alt === 'string') parts.push(n.attrs.alt)
+         if (typeof n.attrs['aria-label'] === 'string') parts.push(n.attrs['aria-label'])
+         if (typeof n.attrs.title === 'string') parts.push(n.attrs.title)
+       }
+       if (Array.isArray(n.children)) for (const c of n.children) stack.push(c)
+     }
+   }
+   walkText(inspection.domLive)
+   if (!inspection.domLive) walkText(inspection.dom)
+   if (inspection.hydratedHeader && typeof inspection.hydratedHeader.html === 'string') {
+     parts.push(inspection.hydratedHeader.html.replace(/<[^>]+>/g, ' '))
+   }
+   if (inspection.hydratedFooter && typeof inspection.hydratedFooter.html === 'string') {
+     parts.push(inspection.hydratedFooter.html.replace(/<[^>]+>/g, ' '))
+   }
+   return parts.join(' \n ')
+ }
+
+ // Loose comparison: literal appears in the corpus with whitespace and
+ // punctuation normalization. Handles "$29.00" matching "$ 29.00" etc.
+ function corpusHas(looseCorpusLower, literal) {
+   if (!literal) return true
+   const needle = literal
+     .toLowerCase()
+     .replace(/\s+/g, ' ')
+     .replace(/[‘’]/g, "'")
+     .replace(/[“”]/g, '"')
+     .replace(/[—–-]+/g, '-')
+     .trim()
+   if (!needle) return true
+   // direct substring
+   if (looseCorpusLower.includes(needle)) return true
+   // also try removing all whitespace (handles "$29 .00" vs "$29.00")
+   const compact = needle.replace(/\s+/g, '')
+   const compactCorpus = looseCorpusLower.replace(/\s+/g, '')
+   return compactCorpus.includes(compact)
+ }
+
  // --- preflight ----------------------------------------------------------------
  //
  // Pre-dispatch checklist. The composer is creative — it picks patterns and
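`corpusHas()` first normalizes case, whitespace, curly quotes and em/en dashes on both sides and runs a plain substring test. If that fails, it retries with all whitespace removed. The sketch below factors that chain into a single `normalize()` helper; this is a hypothetical refactor, since the hunk inlines the chain twice. It writes the typographic characters as `\u` escapes, which match the same code points as the literal characters in the source. The corpus string is made up:

```javascript
// Same normalization chain as looseCorpus / corpusHas(); \u2018\u2019 are the
// curly single quotes, \u201C\u201D the curly double quotes, \u2014\u2013 the
// em and en dashes.
function normalize(s) {
  return s
    .toLowerCase()
    .replace(/\s+/g, ' ')
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2014\u2013-]+/g, '-')
}

// Mirrors corpusHas(): substring test, then a whitespace-insensitive retry.
function looseIncludes(corpus, literal) {
  if (!literal) return true
  const hay = normalize(corpus)
  const needle = normalize(literal).trim()
  if (!needle) return true
  if (hay.includes(needle)) return true
  return hay.replace(/\s+/g, '').includes(needle.replace(/\s+/g, ''))
}

// Made-up corpus standing in for buildInspectionCorpus() output
const corpus = ['Mix & Match Colors', 'Now $1,999.00', 'Don\u2019t miss out \u2013 ends soon'].join(' \n ')

console.log(looseIncludes(corpus, "Don't miss out - ends soon")) // true
console.log(looseIncludes(corpus, 'MIX &  match colors')) // true
console.log(looseIncludes(corpus, 'What Customers Say')) // false
console.log(looseIncludes(corpus, '$1')) // true
```

The last call shows a consequence of substring matching. As written, `moneyRe` stops a composed `$1,299.00` at `$1`, because the `\d{1,5}` run cannot cross the comma. `$1` is then found inside any corpus amount that starts with `$1`. One possible tightening would be a money pattern that allows thousands separators, but the package does not do this.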
@@ -425,8 +575,21 @@ async function main() {
    } catch (err) {
      fail(`validate: cannot read ${path}: ${err.message}`, 1)
    }
-
-
+   // §V03-4 fabrication detector — when --inspection-path is passed, load
+   // the inspection JSON and cross-validate composed HTML literals
+   // against the inspection text corpus. Without --inspection-path, the
+   // fabrication check is skipped (back-compat for older callers).
+   let inspection = null
+   if (values['inspection-path']) {
+     try {
+       const insRaw = await readFile(resolve(values['inspection-path']), 'utf8')
+       inspection = JSON.parse(insRaw)
+     } catch (err) {
+       log(values.verbose, `validate: could not load --inspection-path: ${err.message}`)
+     }
+   }
+   log(values.verbose, `validating ${path} (${html.length} bytes${inspection ? ', with fabrication check' : ''})`)
+   const report = validateHtml(html, inspection)
    process.stdout.write(`${JSON.stringify(report, null, 2)}\n`)
    process.exit(report.passed ? 0 : 3)
  }
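With `--inspection-path` set, a failing report now carries `fabricationFindings`, and the process exits with code 3. This diff does not show how the orchestrator's Step 4j turns that into a retry. The sketch below is one hypothetical way to do it, using only `passed` and `fabricationFindings` from the report plus the two-attempt cap from step 5 of the SKILL.md. `fabricationFeedback`, `nextStep` and the action names are invented for illustration:

```javascript
// Hypothetical: format the detector's findings as feedback for a re-compose.
function fabricationFeedback(report) {
  const findings = report.fabricationFindings || []
  if (findings.length === 0) return null
  return findings.map((f) => `- [${f.severity}] ${f.kind}: "${f.literal}"`).join('\n')
}

// Hypothetical retry policy: at most maxAttempts compose iterations, then fail.
function nextStep(report, attempt, maxAttempts = 2) {
  if (report.passed) return { action: 'ship' }
  const feedback = fabricationFeedback(report)
  // Failed for some other validation rule: not a §V03-4 retry case
  if (!feedback) return { action: 'inspect-errors' }
  if (attempt < maxAttempts) return { action: 'recompose', feedback }
  // Second attempt still flagged: ship the TODO-only fragment, mark the page failed
  return { action: 'fail' }
}

// Made-up report, reduced to the fields used above
const report = {
  passed: false,
  fabricationFindings: [
    { kind: 'money', literal: '$29.00', severity: 'high' },
    { kind: 'heading', literal: 'What Customers Say', severity: 'high' },
  ],
}
console.log(nextStep(report, 1).action) // recompose
console.log(nextStep(report, 2).action) // fail
```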
@@ -453,6 +453,84 @@ async function main() {
      const slug = (cls || tag).replace(/[^a-z0-9-]/gi, '-').toLowerCase().slice(0, 32)
      return slug || 'section'
    }
+
+   // §V03-4 image-block detection. A section is an image-block when
+   // its visible content is dominated by a single <img>: either by
+   // bbox coverage (img takes ≥80% of section area) OR by a known
+   // wrapper-class pattern (Shopify Dawn / OS 2.0 themes use specific
+   // wrapper classes like `product-info__image` to delimit a region
+   // whose entire visual content is one CDN-hosted SVG/PNG/JPG asset).
+   // The composer (sb-build-wp §V03-4 rule A) treats image-block
+   // sections as "download asset + emit <img src>" — NEVER tries to
+   // reproduce the visual contents in HTML/CSS.
+   const IMAGE_BLOCK_WRAPPER_PATTERNS = [
+     /product-info__image/i,
+     /image-with-text/i,
+     /\bhero-image\b/i,
+     /\btrust-badges\b/i,
+     /\bfeatures-points\b/i,
+     /\bmothers?-day\b/i,
+     /\bsingle-image\b/i,
+     /\bbanner-image\b/i,
+     /\bguarantee-card\b/i,
+     /\bbest-fit-size-chart\b/i,
+   ]
+   function classifyAsImageBlock(node) {
+     if (!node || typeof node !== 'object') return null
+     const sectionBbox = node.bbox
+     if (!sectionBbox || !sectionBbox.h || !sectionBbox.w) return null
+     const sectionArea = sectionBbox.h * sectionBbox.w
+     if (sectionArea <= 0) return null
+     // Match by wrapper-class pattern on the node itself OR any
+     // descendant (the wrapper might be a child of the section root).
+     function findClassMatch(n) {
+       if (!n || typeof n !== 'object') return null
+       if (Array.isArray(n.classes)) {
+         for (const cls of n.classes) {
+           for (const pat of IMAGE_BLOCK_WRAPPER_PATTERNS) {
+             if (pat.test(cls)) return cls
+           }
+         }
+       }
+       if (Array.isArray(n.children)) {
+         for (const c of n.children) {
+           const m = findClassMatch(c)
+           if (m) return m
+         }
+       }
+       return null
+     }
+     const classMatch = findClassMatch(node)
+     // Find the largest <img> descendant by bbox area.
+     let bestImg = null
+     function findBiggestImg(n) {
+       if (!n || typeof n !== 'object') return
+       if (n.tag === 'img' && n.bbox && n.bbox.h > 0 && n.bbox.w > 0) {
+         const a = n.bbox.h * n.bbox.w
+         if (!bestImg || a > bestImg.area) {
+           const src = (n.attrs && (n.attrs.src || n.attrs.srcset)) || ''
+           bestImg = { area: a, bbox: n.bbox, src }
+         }
+       }
+       if (Array.isArray(n.children)) {
+         for (const c of n.children) findBiggestImg(c)
+       }
+     }
+     findBiggestImg(node)
+     const imgArea = bestImg ? bestImg.area : 0
+     const coverage = sectionArea > 0 ? imgArea / sectionArea : 0
+     if (classMatch || coverage >= 0.8) {
+       return {
+         reason: classMatch
+           ? `wrapper-class:${classMatch}`
+           : `img-coverage:${(coverage * 100).toFixed(0)}%`,
+         imgSrc: bestImg ? bestImg.src : null,
+         imgBbox: bestImg ? bestImg.bbox : null,
+         coverage,
+       }
+     }
+     return null
+   }
    const sourceDom = result.domLive
      ? Array.isArray(result.domLive)
        ? result.domLive
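`classifyAsImageBlock()` flags a section when a known wrapper class is present, or when its largest `<img>` covers at least 80% of the section's bbox area. To see the coverage half on concrete numbers, here is a stripped-down sketch that keeps only that calculation. Class matching is left out, and an explicit stack replaces the recursion for brevity. It runs on made-up nodes in the `{ tag, bbox: { w, h }, children }` shape the function reads:

```javascript
// Reduced version of the coverage rule: area of the largest <img> descendant
// divided by the section's own bbox area.
function largestImgCoverage(section) {
  const { w, h } = section.bbox || {}
  if (!w || !h) return 0
  let best = 0
  const stack = [section]
  while (stack.length) {
    const n = stack.pop()
    if (n.tag === 'img' && n.bbox && n.bbox.w > 0 && n.bbox.h > 0) {
      best = Math.max(best, n.bbox.w * n.bbox.h)
    }
    for (const c of n.children || []) stack.push(c)
  }
  return best / (w * h)
}

// A full-width banner: 1200x450 image inside a 1200x500 section
const banner = {
  tag: 'div',
  bbox: { w: 1200, h: 500 },
  children: [{ tag: 'img', bbox: { w: 1200, h: 450 } }],
}
// Image beside text: 400x300 image inside the same-sized section
const mixed = {
  tag: 'div',
  bbox: { w: 1200, h: 500 },
  children: [
    { tag: 'img', bbox: { w: 400, h: 300 } },
    { tag: 'p', bbox: { w: 700, h: 200 } },
  ],
}

for (const [name, node] of [['banner', banner], ['mixed', mixed]]) {
  const coverage = largestImgCoverage(node)
  console.log(name, `${(coverage * 100).toFixed(0)}%`, coverage >= 0.8 ? 'image-block' : 'not image-block')
}
// banner 90% image-block
// mixed 20% not image-block
```

The `mixed` node would fall through to one of the other categories, typically (C) MIXED, because it has a real image that covers nowhere near enough area to treat the whole section as one asset. In the function as written, a wrapper-class match short-circuits this check, so a `trust-badges` wrapper is classified as an image block even at low coverage.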
@@ -496,6 +574,7 @@ async function main() {
        sectionList.push({
          sectionType: labelFromNode(grand),
          bbox: grand.bbox,
+         imageBlock: classifyAsImageBlock(grand),
        })
      }
    }
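Every section record now carries `imageBlock`, which covers category (A) of the §V03-4 classification. The other four categories depend on signals this diff does not compute: whether there is text, whether there is a meaningful image, whether a third-party widget is present, and whether its content is visible in the crop. The sketch below therefore takes those as hypothetical inputs. It evaluates the categories in the listed A-to-E order and returns the first match, as the SKILL.md requires. The predicates are one reading of the category descriptions, not the package's code:

```javascript
// Hypothetical first-match classifier for the §V03-4 categories.
// Only imageBlock corresponds to a field the inspector emits; the other
// signals are assumed inputs.
function classifySection({
  imageBlock = null,
  hasMeaningfulImage = false,
  hasText = false,
  widget = null,
  widgetContentVisible = false,
} = {}) {
  const rules = [
    ['A', () => Boolean(imageBlock)], // IMAGE-BLOCK: emit <img> only
    ['B', () => hasText && !hasMeaningfulImage && !widget], // PURE MARKUP
    ['C', () => hasText && hasMeaningfulImage && !widget], // MIXED
    ['D', () => Boolean(widget) && widgetContentVisible], // WIDGET-RENDERED
  ]
  for (const [category, matches] of rules) {
    if (matches()) return category
  }
  return 'E' // WIDGET-EMPTY / OPAQUE: structural placeholder, last resort
}

console.log(classifySection({ imageBlock: { reason: 'wrapper-class:trust-badges' } })) // A
console.log(classifySection({ hasText: true })) // B
console.log(classifySection({ hasText: true, hasMeaningfulImage: true })) // C
console.log(classifySection({ widget: 'loox', widgetContentVisible: true, hasText: true })) // D
console.log(classifySection({ widget: 'loox' })) // E
```

Checking `imageBlock` first means a section that also contains some text is still treated as an image asset. This is consistent with the rule that image-block sections must be an `<img>` and nothing else.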
@@ -507,6 +586,7 @@ async function main() {
      sectionList.push({
        sectionType: labelFromNode(child),
        bbox,
+       imageBlock: classifyAsImageBlock(child),
      })
    }

@@ -567,6 +647,7 @@ async function main() {
          sectionType: sec.sectionType,
          bbox: sec.bbox,
          path: cropPath,
+         imageBlock: sec.imageBlock || null,
        })
      } catch (err) {
        log(`section crop ${idx} ${sec.sectionType} failed: ${err?.message || err}`)
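This hunk copies `imageBlock` onto each `sectionCrops[]` entry, so the composer can apply rule (A) straight from the crop list. Below is a minimal sketch of that recipe. It assumes that `assetsMap.assets` is keyed by the exact `imgSrc` URL; in practice a `srcset` value or query string may need normalizing first. It also assumes the caller supplies the slug, alt text and intrinsic dimensions. `composeImageBlock` and `escapeAttr` are hypothetical names, and the CDN URL is an example:

```javascript
// Escape a value for use inside a double-quoted HTML attribute
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

// Hypothetical rule (A) emitter: <img> only, never a CSS/SVG re-creation.
function composeImageBlock(cropEntry, assetsMap, { slug, alt = '', width, height } = {}) {
  const url = cropEntry.imageBlock ? cropEntry.imageBlock.imgSrc : null
  const assets = (assetsMap && assetsMap.assets) || {}
  const asset = url ? assets[url] : null
  const altAttr = escapeAttr(alt)
  if (!asset || !asset.localPath) {
    // Missing asset: empty src plus a TODO for manual download
    return [
      `<section class="es-${slug}">`,
      `<img src="" alt="${altAttr}" data-todo="asset-missing">`,
      `<!-- TODO: image asset not in assetsMap (URL: ${url || 'unknown'}), download manually -->`,
      '</section>',
    ].join('\n')
  }
  const dims = width && height ? ` width="${width}" height="${height}"` : ''
  return [
    `<section class="es-${slug}">`,
    `<img src="${escapeAttr(asset.localPath)}" alt="${altAttr}" loading="lazy"${dims} />`,
    '</section>',
  ].join('\n')
}

// Made-up crop entry and assets map
const cropEntry = {
  sectionType: 'trust-badges',
  bbox: { y: 2400, w: 1280, h: 180 },
  path: 'crops/07-trust-badges.png',
  imageBlock: { reason: 'wrapper-class:trust-badges', imgSrc: 'https://cdn.example.com/trust-badges-shipping.svg' },
}
const assetsMap = {
  assets: { 'https://cdn.example.com/trust-badges-shipping.svg': { localPath: 'assets/trust-badges-shipping.svg' } },
}
console.log(composeImageBlock(cropEntry, assetsMap, { slug: 'trust-badges', alt: 'Free shipping & returns', width: 1280, height: 160 }))
```

When the asset is missing, the sketch emits the empty-`src` image and TODO comment described in rule (A). The section therefore stays an `<img>` plus a comment, which is what step 4 of the SKILL.md expects for image-block sections.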