similarbuild 0.3.3 → 0.3.4

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "similarbuild",
3
- "version": "0.3.3",
3
+ "version": "0.3.4",
4
4
  "description": "Visual migration framework for Claude Code — clone a live page, get a paste-ready WordPress/Elementor or Shopify section file, validated and auto-corrected.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -37,47 +37,74 @@ A single `.html` file written to `outputPath`. The file is a fragment — no `<h
37
37
 
38
38
  ## On Activation
39
39
 
40
- 1. **Read the inputs.** Parse `inspection.json` (capture `sectionType`, `tokens`, `dom`, **`domLive`**, `pseudoElements`, `imgUrls`, **`hydratedHeader`**, **`hydratedFooter`**, and `screenshot` path) and `assets-map.json` (the URL → localPath / inline-SVG dictionary). If `fixHints` is given, also read `previousHtmlPath`.
40
+ 1. **Read the inputs.** Parse `inspection.json` (capture `sectionType`, `tokens`, `dom`, **`domLive`**, `pseudoElements`, `imgUrls`, **`hydratedHeader`**, **`hydratedFooter`**, and **`sectionCrops[]`**) and `assets-map.json` (the URL → localPath / inline-SVG dictionary). If `fixHints` is given, also read `previousHtmlPath`.
41
41
 
42
- **§V03-3 Section-level vision composition with ZERO-FABRICATION enforcement (REPLACES §V03-2).** The previous full-page screenshot approach (§V03-2) failed because page screenshots 24000+ pixels tall get downscaled when read, making digits/labels unreadable; the LLM then filled the gaps with plausible-but-wrong content (fabricated FAQ answers, wrong prices, hallucinated product counts, "old version of the site" pulled from training data).
42
+ **CANONICAL VISUAL INPUT = `sectionCrops[]`.** Do NOT read `inspection.screenshot` (full-page): it is downscaled when loaded as an image and digits become unreadable. The crops are HD-readable per section. The composer uses one crop per section it composes.
43
43
 
44
- v0.3.3 fixes this with **section crops in native resolution**: `inspection.sectionCrops[]` carries one image per visual band (hero, products grid, FAQ, etc.) at the viewport's native width (390px on mobile). Each crop is HD-readable for the section it represents.
44
+ **§V03-4 FIVE-WAY section classification + ZERO-FABRICATION HARD ENFORCEMENT (replaces §V03-3).** Before composing ANY section, classify it into ONE of five categories (A to E). Pick the FIRST category that matches. Each category has a distinct compose recipe. Defaulting to a placeholder is FORBIDDEN; a placeholder is only the last resort, for category (E).
45
45
 
46
46
  **Workflow when composing a body page** (anything NOT `--target-section=header|footer`):
47
47
 
48
- 1. **For each section you compose**, find the matching entry in `inspection.sectionCrops[]` by approximate bbox.y range and `sectionType` keyword. Read the crop image via Read tool — `Read({ file_path: cropEntry.path })`.
49
-
50
- 2. **CRITICAL Zero-fabrication rules. Treat these as hard contract:**
51
- - **Literal text** (prices, review counts, button labels, headings, badge text, product names, percentages, dimensions): emit ONLY what you can read clearly in the crop AT NATIVE RESOLUTION. If you're not 100% sure of the exact characters (one digit looks like another, text is too small even in native, etc.), DO NOT GUESS — emit a `<!-- TODO: <visual description>, unreadable -->` comment + structural placeholder.
52
- - **FAQ answers**: NEVER write FAQ answer bodies based on "plausible content for this brand". If `<details>`/`<summary>` accordion is visible in the crop but answers are collapsed, emit each `<summary>` text verbatim from the crop + empty `<div class="faq-a"><!-- TODO: answer body collapsed in source — open accordion or fetch live --></div>`. Same for reviews: if Loox/Yotpo widget appears as the band, emit empty `<div class="reviews-mount"><!-- TODO: third-party reviews widget, integrate at deploy time --></div>` — DO NOT compose fake reviews.
53
- - **Product counts**: count cards/thumbs in the crop. If there are 4 product cards visible, emit exactly 4. Do not "round" to 3 because typical Shopify themes have 3.
54
- - **Cross-validate every numeric literal against `inspection.domLive` text nodes.** Before emitting `<span>$29.00</span>`: grep `inspection.domLive` recursively for any text node containing `$29` or `29.00`. If found → safe to emit. If NOT found → emit `<!-- TODO: price "$29.00" read from crop but not present in DOM; verify -->` instead.
55
- - **Site version awareness**: if the crop shows a banner like "Mother's Day Sale", "Black Friday Sale", or limited-edition badges → emit them. Do NOT skip "because the site usually doesn't have this". The crop captured the LIVE state.
56
- - **NEVER complete content based on knowledge of the public site.** You may have seen this URL in training data. That data is OLD. The crop is NEW. Crop wins, training data is irrelevant.
57
-
58
- 3. **Emit real markup reflecting what the crop shows:**
59
- - Counted gallery thumbs → emit each `<img>` with src resolved through `assetsMap.assets[url].localPath` matching URLs in `inspection.imgUrls`. If you can see 6 thumbs but `imgUrls` only has 3 matching URLs, emit 3 real `<img>` + 3 placeholders with `<!-- TODO: thumb N visible in crop, no src in inspection.imgUrls -->`.
60
- - Literal price `$29.00` (cross-validated against DOM) → `<span class="price">$29.00</span> <s class="compare">$34.00</s>`.
61
- - Banner with exact visible text → emit verbatim.
62
- - Variant selectors: emit `<select>` with option list MATCHING WHAT'S VISIBLE. Inferred options (XS/XXXL not visible in crop) → emit only what's seen + `<!-- TODO: additional sizes likely available -->`.
63
- - CTA: read the literal label text + observe button color. Emit with the right text + use color from `inspection.tokens.colors` for accent/primary.
64
-
65
- 4. **Hybridize DOM + vision (data source by field type):**
66
- - **Texts/structure/counts** → crop image (truth of current state).
67
- - **Hrefs** → `inspection.domLive` / `hydratedHeader` / `hydratedFooter` / `imgUrls`, matched by visible text. No match → `href="#"` + TODO.
68
- - **Form action, method, hidden inputs** → ALWAYS DOM (never guess; broken submission else).
69
- - **Image src** → `assetsMap.assets[url].localPath` resolved by visible alt or src pattern.
70
- - **Computed style** → `inspection.tokens` + `domLive.computedStyle` for primary/accent colors and font tokens.
71
-
72
- 5. **Self-validation before submit:** before calling `build-wp.mjs write`, scan the composed HTML for fabrication signals:
73
- - Every numeric literal `$NNNN` or `NNN reviews` MUST either (a) appear in a crop you actually read, or (b) be marked with a nearby TODO comment.
74
- - Every FAQ `<div class="faq-a">` body MUST be non-empty ONLY if you literally read it from a crop. Empty bodies + TODO comment is the correct output when accordions are collapsed.
75
- - Reviews widget bands → empty mount-div + TODO. Never fake review text.
76
- If any violation found, REWRITE before submit. If you can't make it pass, return preflight error `composition-fabrication-detected` and let the orchestrator's Step 4j re-try with feedback.
77
-
78
- 6. **Hard-fail after 2 attempts.** The orchestrator (Step 4j) limits to 2 compose iterations. If iteration 2 still has fabrication signals, the orchestrator marks the page `❌` and ships a structural placeholder fragment with TODO comments — NEVER ship plausible-but-wrong content as "ready" or "partial".
79
-
80
- This contract enforces what the user explicitly asked for: "you cannot deliver me something different, and if you can't manage it, fail on the 2nd attempt".
48
+ 1. **Find the section's crop.** For each section you compose, find the matching `inspection.sectionCrops[]` entry by `bbox.y` proximity. Read the crop via `Read({ file_path: cropEntry.path })`.
49
+
50
+ 2. **Classify the section** into A / B / C / D / E:
51
+
52
+ **(A) IMAGE-BLOCK**: the section is dominated by a single `<img>` element. Detect by either:
53
+ - `inspection.domLive` (or `dom`): the section's root node has a child `<img>` whose `bbox` covers ≥80% of the section's `bbox`, OR
54
+ - Wrapper class match: ancestor has class containing `product-info__image`, `image-with-text`, `hero-image`, `trust-badges`, `features-points`, `mothers-day`, `single-image`, `banner-image`, or similar "single asset" pattern.
55
+ - In Shopify Dawn/OS 2.0: **`<div class="product-info__image">` is THE canonical flag**: every match is an image-block.
56
+
57
+ Recipe: **DO NOT reproduce the image's contents in HTML/CSS.** Emit:
58
+ ```html
59
+ <section class="es-{section-slug}">
60
+ <img src="{assetsMap.assets[url].localPath}" alt="{img.alt || section description}" loading="lazy" width="{img.width}" height="{img.height}" />
61
+ </section>
62
+ ```
63
+ Pull the URL from `inspection.imgUrls` matching the section's bbox / src pattern. If asset wasn't downloaded (failed extract or didn't appear in imgUrls), emit `<img src="" alt="..." data-todo="asset-missing">` + `<!-- TODO: image asset not in assetsMap (URL: …) — download manually -->`. No CSS reproduction. No SVG re-creation. No text "interpretation" of what's inside the image.
64
+
65
+ **Examples from real feedback (everstride PDP)**: `mothers-day-new.svg` banner, `trust-badges-shipping.svg`, `features-points.svg`, `60-day-fit.svg` cards — all category (A).
66
+
67
+ **(B) PURE MARKUP**: the section is text + structured layout, no dominant image. Examples: FAQ accordion (`<details>/<summary>`), comparison tables (`<table>` with cells), pricing tiers, trust copy blocks, headings with paragraphs.
68
+
69
+ Recipe: READ the crop carefully + cross-validate every literal against `inspection.domLive` text nodes. Emit real semantic markup (`<table>`, `<dl>`, `<details>`, `<ul>` of `<li>`, etc.) that mirrors the live DOM structure.
70
+
71
+ **Cross-validation is MANDATORY for every literal**:
72
+ - Numeric (`$NN`, `NN reviews`, percentages, ratings): grep `inspection.domLive` recursively for a text node containing the exact substring. NOT found → emit `<!-- TODO: literal "$29.00" read from crop but not in DOM; verify -->` + placeholder.
73
+ - Headings/copy/labels: same rule. "What Customers Say", "All rights reserved", "Mix & Match Colors": if not in DOM verbatim, do NOT emit.
74
+ - Counts: count items visible in crop AND cross-validate against `domLive` children of the container. If mismatch, prefer DOM count (DOM is authoritative for structure) + log discrepancy.
75
+
76
+ **(C) MIXED (image + text)**: the section has BOTH a meaningful image AND text content (heading + body paragraph + image side-by-side). Common Shopify pattern: `<images-with-text-scrolling>` custom element.
77
+
78
+ Recipe: emit `<section>` with `<img>` (rule A for the image part: use the real asset URL when available) + `<h*>` heading + `<p>` paragraph(s) literal from crop, cross-validated. Do NOT collapse to image-only or text-only; preserve both.
79
+
80
+ **(D) WIDGET-RENDERED**: third-party widget (Loox reviews, "you may also like" carousel, Instagram feed) where the crop SHOWS the widget already rendered with content (reviews with names + photos + stars + quotes; product cards with prices + titles).
81
+
82
+ Recipe: READ the crop and emit real markup for each visible card/item. Cross-validate against DOM when widget exposes content as JSON-LD or hidden HTML. Mark container with `data-source="widget-name"` so the deploy team knows where to integrate the real widget later. **Do NOT emit empty mount-div when content is visible.** The "deploy time placeholder" excuse is ONLY for (E).
83
+
84
+ **(E) WIDGET-EMPTY / OPAQUE** — the crop is blank, only shows "Loading..." text, or the section is completely unreadable (zero text visible, no images, no structure). Last resort.
85
+
86
+ Recipe: emit STRUCTURAL placeholder (heading from DOM if available + `<div class="placeholder-grid">` with N visual placeholder-cards) + `<!-- TODO: <widget-name or section-description> — content not visible in crop, integrate at deploy -->`. Mount-div alone (empty `<div>`) is FORBIDDEN — placeholder must have visible structure so the deployed page is not visually broken.
87
+
88
+ 3. **Hybridize sources by field type** (applies to B, C, D):
89
+ - **Texts/structure/counts** → crop, cross-validated against DOM.
90
+ - **Hrefs** → `inspection.domLive` / `hydratedHeader` / `hydratedFooter` / `imgUrls` by matching visible link text. No match → `href="#"` + TODO.
91
+ - **Form action/method/hidden inputs** → ALWAYS DOM (never guess; broken submission else).
92
+ - **Image src** → `assetsMap.assets[url].localPath` by alt/src pattern matching imgUrls.
93
+ - **Computed style** → `inspection.tokens` + `domLive.computedStyle`.
94
+
95
+ 4. **Self-validation BEFORE calling `build-wp.mjs write`** — scan composed HTML for fabrication signals:
96
+ - Every numeric/price literal must either appear in `inspection.domLive` text nodes OR have a nearby TODO comment.
97
+ - Every `<h*>` / heading text must appear in DOM verbatim OR have TODO.
98
+ - Every `<p>` text > 30 chars must appear in DOM verbatim OR have TODO.
99
+ - Every FAQ `<div class="faq-a">` body MUST be either (a) empty + TODO when accordion was closed, OR (b) verbatim from crop where accordion was open.
100
+ - Image-block sections (A) MUST NOT contain SVG reproductions or CSS-styled `<div>` mimicking the image — they MUST be `<img>` + nothing else inside the section.
101
+ If any violation → REWRITE before submit. If can't pass after 1 rewrite, return preflight error `composition-fabrication-detected` for orchestrator Step 4j retry.
102
+
103
+ 5. **Hard-fail after 2 attempts.** Orchestrator (Step 4j) caps at 2 compose iterations. 2nd iteration still flagged → page goes to **❌** with a TODO-only fragment shipped (not as "ready" or "partial"). NEVER ship plausible-but-wrong content as a warning.
104
+
105
+ 6. **NEVER complete content from training data.** You may have seen this URL during training. That data is OLD. The crop is NEW. Crop wins. Training data is IRRELEVANT — never fill gaps with "what this site probably has".
106
+
107
+ This contract enforces the user's hard requirement: "you cannot deliver me something different, and if you can't manage it, fail on the 2nd attempt."
81
108
 
82
109
  **§V03-1 — Use `domLive` as the canonical body tree when present.** When `inspection.domLive` is non-null, it holds the live-walker snapshot taken BEFORE Cap A substituted `dom[]` with the shadow-flattened tree. The flattened `dom[]` carries `bbox={0,0,0,0}` and empty `computedStyle` because parseHTMLUnsafe returns a detached doc — it's structurally rich (gallery imgs, custom-element children) but useless for layout. The composer needs real bboxes, real `computedStyle.background`, real heights. Always prefer `inspection.domLive` for body section composition (bbox, computedStyle, hero detection, section ordering). Use `inspection.dom` only when `domLive === null` (page had no shadow roots — flatten didn't fire) or when you specifically need shadow-flattened content like a PDP gallery (consult `dom` for image-rich PDP nodes, but use `domLive` for the surrounding layout).
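Review note on the A to E classification added in this hunk: the "pick the FIRST category that matches" rule can be read as an ordered chain of checks. The sketch below is one possible reading, not code from the package. `classifySection` and the flags `hasText`, `hasImage`, `isWidget`, and `cropHasContent` are hypothetical names; only `imageBlock` corresponds to a field this diff actually adds to the section entries.

```javascript
// Hypothetical sketch: the function and every flag except `imageBlock`
// are illustrative assumptions, not part of similarbuild.
// `imageBlock` mirrors the object (or null) attached to each section entry.
function classifySection({
  imageBlock = null,      // non-null when the image-block detector matched
  hasText = false,        // assumed: readable text is visible in the crop
  hasImage = false,       // assumed: a meaningful (non-dominant) image is present
  isWidget = false,       // assumed: section is a third-party widget
  cropHasContent = false, // assumed: the widget rendered visible content
} = {}) {
  if (imageBlock) return 'A'                        // single dominant <img>
  if (!isWidget && hasText && !hasImage) return 'B' // pure markup
  if (!isWidget && hasText && hasImage) return 'C'  // image + text
  if (isWidget && cropHasContent) return 'D'        // widget already rendered
  return 'E'                                        // last resort: placeholder
}

console.log(classifySection({ imageBlock: { reason: 'wrapper-class:product-info__image' } }))
console.log(classifySection({ hasText: true }))
console.log(classifySection({ isWidget: true, cropHasContent: false }))
```

Because the checks run in order, a section flagged as an image-block is never re-interpreted as markup, which matches the intent of rule (A) as written.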
83
110
 
@@ -78,7 +78,7 @@ function findAll(re, str) {
78
78
  return out
79
79
  }
80
80
 
81
- function validateHtml(html) {
81
+ function validateHtml(html, inspection = null) {
82
82
  const errors = []
83
83
  const warnings = []
84
84
  const info = []
@@ -228,6 +228,103 @@ function validateHtml(html) {
228
228
  fabricationRisks.push({ section: m[1].trim(), reason: m[2].trim() })
229
229
  }
230
230
 
231
+ // 13. §V03-4 fabrication detector — when inspection JSON is provided,
232
+ // cross-validate every literal text in the composed HTML against the
233
+ // inspection's text corpus (domLive + dom + hydratedHeader.html +
234
+ // hydratedFooter.html). Literals that don't appear ANYWHERE in the
235
+ // inspection corpus are flagged as fabrications.
236
+ //
237
+ // Detected literal types:
238
+ // - prices/money: \$\d{1,5}(\.\d{2})? (bare \d+\.\d{2} amounts without "$" are not scanned)
239
+ // - count phrases: \d[\d,]{1,9}\+?\s+(reviews|customers|women|sold|pairs|...)
240
+ // - heading text content (h1-h6), 3 to 120 chars
241
+ // - meaningful paragraph text, 30 to 400 chars
242
+ //
243
+ // Findings emit error `composition-fabrication-detected` so the orchestrator
244
+ // (Step 4j) can re-trigger composition with feedback.
245
+ const fabricationFindings = []
246
+ if (inspection && typeof inspection === 'object') {
247
+ const corpus = buildInspectionCorpus(inspection)
248
+ const looseCorpus = corpus
249
+ .toLowerCase()
250
+ .replace(/\s+/g, ' ')
251
+ .replace(/[‘’]/g, "'")
252
+ .replace(/[“”]/g, '"')
253
+ .replace(/[—–-]+/g, '-')
254
+ // strip TODO-commented regions so they don't get re-scanned
255
+ const htmlBody = html.replace(/<!--[\s\S]*?-->/g, '')
256
+ // (a) money literals
257
+ const moneyRe = /\$\d{1,5}(?:\.\d{2})?/g
258
+ const moneyHits = new Set()
259
+ for (let m = moneyRe.exec(htmlBody); m; m = moneyRe.exec(htmlBody)) moneyHits.add(m[0])
260
+ for (const lit of moneyHits) {
261
+ if (!corpusHas(looseCorpus, lit)) {
262
+ fabricationFindings.push({
263
+ kind: 'money',
264
+ literal: lit,
265
+ severity: 'high',
266
+ message: `Money literal "${lit}" not found in inspection corpus (domLive/dom/hydratedHeader/hydratedFooter)`,
267
+ })
268
+ }
269
+ }
270
+ // (b) count phrases like "19,479 Reviews", "240,000+ Women" (abbreviated forms such as "1M+ Sold" are not matched by countRe)
271
+ const countRe = /\b(\d[\d,]{1,9}\+?)\s+(reviews?|customers?|women|sold|pairs?|orders?|stars?|users?|clinicians?|five-star|verified)\b/gi
272
+ const countHits = new Set()
273
+ for (let m = countRe.exec(htmlBody); m; m = countRe.exec(htmlBody)) countHits.add(m[0])
274
+ for (const lit of countHits) {
275
+ if (!corpusHas(looseCorpus, lit)) {
276
+ fabricationFindings.push({
277
+ kind: 'count',
278
+ literal: lit,
279
+ severity: 'high',
280
+ message: `Count phrase "${lit}" not found in inspection corpus`,
281
+ })
282
+ }
283
+ }
284
+ // (c) headings — text content of h1..h6
285
+ const headingRe = /<h[1-6][^>]*>([\s\S]*?)<\/h[1-6]>/gi
286
+ const headingHits = new Set()
287
+ for (let m = headingRe.exec(htmlBody); m; m = headingRe.exec(htmlBody)) {
288
+ const txt = m[1].replace(/<[^>]+>/g, '').trim()
289
+ if (txt.length >= 3 && txt.length <= 120) headingHits.add(txt)
290
+ }
291
+ for (const lit of headingHits) {
292
+ if (!corpusHas(looseCorpus, lit)) {
293
+ fabricationFindings.push({
294
+ kind: 'heading',
295
+ literal: lit,
296
+ severity: 'high',
297
+ message: `Heading "${lit}" not found in inspection corpus`,
298
+ })
299
+ }
300
+ }
301
+ // (d) long paragraphs (30 to 400 chars): text content of <p>
302
+ const paraRe = /<p[^>]*>([\s\S]*?)<\/p>/gi
303
+ const paraHits = new Set()
304
+ for (let m = paraRe.exec(htmlBody); m; m = paraRe.exec(htmlBody)) {
305
+ const txt = m[1].replace(/<[^>]+>/g, '').trim()
306
+ if (txt.length >= 30 && txt.length <= 400) paraHits.add(txt)
307
+ }
308
+ for (const lit of paraHits) {
309
+ if (!corpusHas(looseCorpus, lit)) {
310
+ fabricationFindings.push({
311
+ kind: 'paragraph',
312
+ literal: lit.slice(0, 80) + (lit.length > 80 ? '…' : ''),
313
+ severity: 'medium',
314
+ message: `Paragraph "${lit.slice(0, 80)}…" not found in inspection corpus`,
315
+ })
316
+ }
317
+ }
318
+ if (fabricationFindings.length > 0) {
319
+ errors.push({
320
+ rule: 'composition-fabrication-detected',
321
+ severity: 'high',
322
+ message: `§V03-4 fabrication-detector found ${fabricationFindings.length} literal(s) not present in inspection corpus. The composer should re-read the section crops and either remove the fabricated literals OR replace with TODO comments.`,
323
+ findings: fabricationFindings,
324
+ })
325
+ }
326
+ }
327
+
231
328
  return {
232
329
  passed: errors.length === 0,
233
330
  errorCount: errors.length,
@@ -236,9 +333,62 @@ function validateHtml(html) {
236
333
  warnings,
237
334
  info,
238
335
  fabricationRisks,
336
+ fabricationFindings,
239
337
  }
240
338
  }
241
339
 
340
+ // §V03-4: build a single text corpus (lowercased later by the caller) from all inspection text
341
+ // sources for fabrication cross-validation. Includes:
342
+ // - domLive (preferred) / dom: every node.text plus node.attrs.alt, aria-label, and title
343
+ // - hydratedHeader.html, hydratedFooter.html (raw outerHTML strings)
344
+ function buildInspectionCorpus(inspection) {
345
+ const parts = []
346
+ function walkText(arr) {
347
+ if (!arr) return
348
+ const stack = Array.isArray(arr) ? [...arr] : [arr]
349
+ while (stack.length) {
350
+ const n = stack.pop()
351
+ if (!n || typeof n !== 'object') continue
352
+ if (typeof n.text === 'string' && n.text.trim()) parts.push(n.text)
353
+ if (n.attrs && typeof n.attrs === 'object') {
354
+ if (typeof n.attrs.alt === 'string') parts.push(n.attrs.alt)
355
+ if (typeof n.attrs['aria-label'] === 'string') parts.push(n.attrs['aria-label'])
356
+ if (typeof n.attrs.title === 'string') parts.push(n.attrs.title)
357
+ }
358
+ if (Array.isArray(n.children)) for (const c of n.children) stack.push(c)
359
+ }
360
+ }
361
+ walkText(inspection.domLive)
362
+ if (!inspection.domLive) walkText(inspection.dom)
363
+ if (inspection.hydratedHeader && typeof inspection.hydratedHeader.html === 'string') {
364
+ parts.push(inspection.hydratedHeader.html.replace(/<[^>]+>/g, ' '))
365
+ }
366
+ if (inspection.hydratedFooter && typeof inspection.hydratedFooter.html === 'string') {
367
+ parts.push(inspection.hydratedFooter.html.replace(/<[^>]+>/g, ' '))
368
+ }
369
+ return parts.join(' \n ')
370
+ }
371
+
372
+ // Loose comparison: literal appears in the corpus with whitespace and
373
+ // punctuation normalization. Handles "$29.00" matching "$ 29.00" etc.
374
+ function corpusHas(looseCorpusLower, literal) {
375
+ if (!literal) return true
376
+ const needle = literal
377
+ .toLowerCase()
378
+ .replace(/\s+/g, ' ')
379
+ .replace(/[‘’]/g, "'")
380
+ .replace(/[“”]/g, '"')
381
+ .replace(/[—–-]+/g, '-')
382
+ .trim()
383
+ if (!needle) return true
384
+ // direct substring
385
+ if (looseCorpusLower.includes(needle)) return true
386
+ // also try removing all whitespace (handles "$29 .00" vs "$29.00")
387
+ const compact = needle.replace(/\s+/g, '')
388
+ const compactCorpus = looseCorpusLower.replace(/\s+/g, '')
389
+ return compactCorpus.includes(compact)
390
+ }
391
+
242
392
  // --- preflight ----------------------------------------------------------------
243
393
  //
244
394
  // Pre-dispatch checklist. The composer is creative — it picks patterns and
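Review note on the loose matching added in this hunk: the behavior of `corpusHas` is easier to judge with concrete inputs. The sketch below re-implements the same normalization in a standalone form so it can be run in isolation. `normalizeLoose` is a hypothetical helper name (the diff inlines these replacements), and the sample corpus strings are made up for illustration.

```javascript
// Standalone re-implementation of the normalization shown in the diff.
// `normalizeLoose` is a hypothetical name; the package inlines these steps.
function normalizeLoose(s) {
  return s
    .toLowerCase()
    .replace(/\s+/g, ' ')
    .replace(/[\u2018\u2019]/g, "'")   // curly single quotes
    .replace(/[\u201C\u201D]/g, '"')   // curly double quotes
    .replace(/[\u2014\u2013-]+/g, '-') // em/en dashes and hyphen runs
}

function corpusHas(looseCorpus, literal) {
  if (!literal) return true
  const needle = normalizeLoose(literal).trim()
  if (!needle) return true
  if (looseCorpus.includes(needle)) return true // direct substring
  // fallback: compare with all whitespace removed on both sides
  return looseCorpus.replace(/\s+/g, '').includes(needle.replace(/\s+/g, ''))
}

// Made-up sample text standing in for the inspection corpus.
const corpus = normalizeLoose('Now only $ 29.00 \u2013 19,479 Reviews')

console.log(corpusHas(corpus, '$29.00'))         // true (whitespace-stripped fallback)
console.log(corpusHas(corpus, '19,479 reviews')) // true (direct substring)
console.log(corpusHas(corpus, '$34.00'))         // false, would be flagged

// Substring matching is permissive: "$29" passes against a corpus that
// only contains "$299.00".
const bundleCorpus = normalizeLoose('Bundle price: $299.00')
console.log(corpusHas(bundleCorpus, '$29'))      // true
```

The last call shows a limitation worth keeping in mind when reading the detector's results: a literal that is a prefix or fragment of a different corpus value is not reported as a fabrication.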
@@ -425,8 +575,21 @@ async function main() {
425
575
  } catch (err) {
426
576
  fail(`validate: cannot read ${path}: ${err.message}`, 1)
427
577
  }
428
- log(values.verbose, `validating ${path} (${html.length} bytes)`)
429
- const report = validateHtml(html)
578
+ // §V03-4 fabrication detector — when --inspection-path is passed, load
579
+ // the inspection JSON and cross-validate composed HTML literals
580
+ // against the inspection text corpus. Without --inspection-path, the
581
+ // fabrication check is skipped (back-compat for older callers).
582
+ let inspection = null
583
+ if (values['inspection-path']) {
584
+ try {
585
+ const insRaw = await readFile(resolve(values['inspection-path']), 'utf8')
586
+ inspection = JSON.parse(insRaw)
587
+ } catch (err) {
588
+ log(values.verbose, `validate: could not load --inspection-path: ${err.message}`)
589
+ }
590
+ }
591
+ log(values.verbose, `validating ${path} (${html.length} bytes${inspection ? ', with fabrication check' : ''})`)
592
+ const report = validateHtml(html, inspection)
430
593
  process.stdout.write(`${JSON.stringify(report, null, 2)}\n`)
431
594
  process.exit(report.passed ? 0 : 3)
432
595
  }
@@ -453,6 +453,84 @@ async function main() {
453
453
  const slug = (cls || tag).replace(/[^a-z0-9-]/gi, '-').toLowerCase().slice(0, 32)
454
454
  return slug || 'section'
455
455
  }
456
+
457
+ // §V03-4 image-block detection. A section is an image-block when
458
+ // its visible content is dominated by a single <img>: either by
459
+ // bbox coverage (img takes ≥80% of section area) OR by a known
460
+ // wrapper-class pattern (Shopify Dawn / OS 2.0 themes use specific
461
+ // wrapper classes like `product-info__image` to delimit a region
462
+ // whose entire visual content is one CDN-hosted SVG/PNG/JPG asset).
463
+ // The composer (sb-build-wp §V03-4 rule A) treats image-block
464
+ // sections as "download asset + emit <img src>" — NEVER tries to
465
+ // reproduce the visual contents in HTML/CSS.
466
+ const IMAGE_BLOCK_WRAPPER_PATTERNS = [
467
+ /product-info__image/i,
468
+ /image-with-text/i,
469
+ /\bhero-image\b/i,
470
+ /\btrust-badges\b/i,
471
+ /\bfeatures-points\b/i,
472
+ /\bmothers?-day\b/i,
473
+ /\bsingle-image\b/i,
474
+ /\bbanner-image\b/i,
475
+ /\bguarantee-card\b/i,
476
+ /\bbest-fit-size-chart\b/i,
477
+ ]
478
+ function classifyAsImageBlock(node) {
479
+ if (!node || typeof node !== 'object') return null
480
+ const sectionBbox = node.bbox
481
+ if (!sectionBbox || !sectionBbox.h || !sectionBbox.w) return null
482
+ const sectionArea = sectionBbox.h * sectionBbox.w
483
+ if (sectionArea <= 0) return null
484
+ // Match by wrapper-class pattern on the node itself OR any
485
+ // descendant (the wrapper might be a child of the section root).
486
+ function findClassMatch(n) {
487
+ if (!n || typeof n !== 'object') return null
488
+ if (Array.isArray(n.classes)) {
489
+ for (const cls of n.classes) {
490
+ for (const pat of IMAGE_BLOCK_WRAPPER_PATTERNS) {
491
+ if (pat.test(cls)) return cls
492
+ }
493
+ }
494
+ }
495
+ if (Array.isArray(n.children)) {
496
+ for (const c of n.children) {
497
+ const m = findClassMatch(c)
498
+ if (m) return m
499
+ }
500
+ }
501
+ return null
502
+ }
503
+ const classMatch = findClassMatch(node)
504
+ // Find the largest <img> descendant by bbox area.
505
+ let bestImg = null
506
+ function findBiggestImg(n) {
507
+ if (!n || typeof n !== 'object') return
508
+ if (n.tag === 'img' && n.bbox && n.bbox.h > 0 && n.bbox.w > 0) {
509
+ const a = n.bbox.h * n.bbox.w
510
+ if (!bestImg || a > bestImg.area) {
511
+ const src = (n.attrs && (n.attrs.src || n.attrs.srcset)) || ''
512
+ bestImg = { area: a, bbox: n.bbox, src }
513
+ }
514
+ }
515
+ if (Array.isArray(n.children)) {
516
+ for (const c of n.children) findBiggestImg(c)
517
+ }
518
+ }
519
+ findBiggestImg(node)
520
+ const imgArea = bestImg ? bestImg.area : 0
521
+ const coverage = sectionArea > 0 ? imgArea / sectionArea : 0
522
+ if (classMatch || coverage >= 0.8) {
523
+ return {
524
+ reason: classMatch
525
+ ? `wrapper-class:${classMatch}`
526
+ : `img-coverage:${(coverage * 100).toFixed(0)}%`,
527
+ imgSrc: bestImg ? bestImg.src : null,
528
+ imgBbox: bestImg ? bestImg.bbox : null,
529
+ coverage,
530
+ }
531
+ }
532
+ return null
533
+ }
456
534
  const sourceDom = result.domLive
457
535
  ? Array.isArray(result.domLive)
458
536
  ? result.domLive
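Review note on the coverage check in `classifyAsImageBlock`: the ratio is computed from the two areas alone, without intersecting the boxes. The sketch below reproduces that arithmetic and contrasts it with a clipped variant. `rawCoverage` and `clippedCoverage` are hypothetical names, the clipped variant is not part of this diff, and the `x`/`y` fields on `bbox` are an assumption (the diff itself only reads `w` and `h`).

```javascript
// Area ratio as computed in the diff: image area over section area.
function rawCoverage(sectionBbox, imgBbox) {
  const sectionArea = sectionBbox.w * sectionBbox.h
  if (sectionArea <= 0) return 0
  return (imgBbox.w * imgBbox.h) / sectionArea
}

// Hypothetical alternative (NOT in the package): count only the part of
// the image that lies inside the section. Assumes bbox has x and y.
function clippedCoverage(sectionBbox, imgBbox) {
  const sectionArea = sectionBbox.w * sectionBbox.h
  if (sectionArea <= 0) return 0
  const overlapW =
    Math.min(sectionBbox.x + sectionBbox.w, imgBbox.x + imgBbox.w) -
    Math.max(sectionBbox.x, imgBbox.x)
  const overlapH =
    Math.min(sectionBbox.y + sectionBbox.h, imgBbox.y + imgBbox.h) -
    Math.max(sectionBbox.y, imgBbox.y)
  return (Math.max(0, overlapW) * Math.max(0, overlapH)) / sectionArea
}

// Illustrative boxes on a 390px-wide mobile viewport.
const section = { x: 0, y: 1000, w: 390, h: 200 }
const banner = { x: 0, y: 1000, w: 390, h: 180 }  // sits inside the section
const overflow = { x: 0, y: 900, w: 390, h: 400 } // taller than the section

console.log(rawCoverage(section, banner) >= 0.8) // true, classified as image-block
console.log(rawCoverage(section, overflow))      // 2, the ratio can exceed 1
console.log(clippedCoverage(section, overflow))  // 1, clipped to the section
```

Both variants agree on the common case of an image fully inside its section; they differ only when a descendant image extends past the section's box.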
@@ -496,6 +574,7 @@ async function main() {
496
574
  sectionList.push({
497
575
  sectionType: labelFromNode(grand),
498
576
  bbox: grand.bbox,
577
+ imageBlock: classifyAsImageBlock(grand),
499
578
  })
500
579
  }
501
580
  }
@@ -507,6 +586,7 @@ async function main() {
507
586
  sectionList.push({
508
587
  sectionType: labelFromNode(child),
509
588
  bbox,
589
+ imageBlock: classifyAsImageBlock(child),
510
590
  })
511
591
  }
512
592
 
@@ -567,6 +647,7 @@ async function main() {
567
647
  sectionType: sec.sectionType,
568
648
  bbox: sec.bbox,
569
649
  path: cropPath,
650
+ imageBlock: sec.imageBlock || null,
570
651
  })
571
652
  } catch (err) {
572
653
  log(`section crop ${idx} ${sec.sectionType} failed: ${err?.message || err}`)