similarbuild 0.3.2 → 0.3.4
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "similarbuild",
|
|
3
|
-
"version": "0.3.
|
|
3
|
+
"version": "0.3.4",
|
|
4
4
|
"description": "Visual migration framework for Claude Code — clone a live page, get a paste-ready WordPress/Elementor or Shopify section file, validated and auto-corrected.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
|
@@ -402,6 +402,8 @@ node .claude/skills/sb-inspect-live/scripts/inspect-live.mjs \
|
|
|
402
402
|
|
|
403
403
|
**Reuse of the home inspection (V03-0a):** if `page.type === 'home'` AND `globalsExtracted.homeInspectionPath !== null`, SKIP this inspection and reuse `globalsExtracted.homeInspectionPath` as the `{inspection-path}` for this step. Step 3.5a already inspected the home page; re-running it would cost ~5-8s and produce identical content. Log `[build-site] Step 4b: re-using home inspection from Step 3.5 ({path})`.
|
|
404
404
|
|
|
405
|
+
**Organized section crops (V03-3):** after each inspection, copy/link the crops from `{inspection-path}/sections/*.png` to `{output_folder}/{project-slug}/screenshots/{page.slug}/` (with a page prefix such as `home/`, `pdp/originals/`, `pages/privacy-policy/`, etc., derived from `page.type + page.slug`). This gives the user an inspectable structure, `screenshots/<page>/<idx>-<sectionType>.png`, that they can open in a browser or folder to see WHAT the composer received as input. Use `ln -sf` (symlink) or `cp`; do not move the files, because the inspection also needs the original folder for cross-validation.
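The path convention described above can be sketched as follows. This is a minimal illustration, not the package's code: `pagePrefix` and `cropDestination` are hypothetical helpers, and the `'pdp'` page type and the fallback to `pages/<slug>` are assumptions inferred from the three example prefixes (only `page.type === 'home'` appears in the source).

```javascript
// Hypothetical helper: folder prefix derived from page.type + page.slug.
// Only 'home' is a confirmed page.type value; 'pdp' and the 'pages/' fallback
// are assumptions based on the examples home/, pdp/originals/, pages/privacy-policy/.
function pagePrefix(page) {
  if (page.type === 'home') return 'home'
  if (page.type === 'pdp') return `pdp/${page.slug}`
  return `pages/${page.slug}`
}

// Hypothetical helper: builds
// {output_folder}/{project-slug}/screenshots/<page>/<idx>-<sectionType>.png
function cropDestination(outputFolder, projectSlug, page, idx, sectionType) {
  const file = `${idx}-${sectionType}.png`
  return [outputFolder, projectSlug, 'screenshots', pagePrefix(page), file].join('/')
}

const example = cropDestination('out', 'acme', { type: 'pdp', slug: 'originals' }, 3, 'gallery')
console.log(example)
```

Whether the real orchestrator symlinks or copies to that destination is left open by the text (`ln -sf` or `cp`); the sketch only covers the naming.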
|
|
406
|
+
|
|
405
407
|
Capture `inspection`. Batch-specific branches:
|
|
406
408
|
|
|
407
409
|
- `inspection.widgetBlocked === true` → mark the page as `❌` in `pageResults[]`, note the reason, and **continue to the next page** (do not stop the batch). Exception: if this is the FIRST page AND it is the `home`, stop and escalate; the whole site is probably behind a bot wall.
|
|
@@ -574,8 +576,9 @@ Anote em `pageResults[].coverage = { buildHeight, liveMainHeight, ratio, source:
|
|
|
574
576
|
|
|
575
577
|
Additional pre-checks (identical to `/build-page`):
|
|
576
578
|
1. `--no-auto-correct` was passed → escalate the first diff of each page: routes that would go to Step 4j become ⚠️ directly.
|
|
577
|
-
2. `iteration >= auto_correct_max_iterations` (default 2) →
|
|
578
|
-
3. **Same `violations[]` 2 iterations in a row** → the fixHint did not take. Do not run a 3rd.
|
|
579
|
+
2. `iteration >= auto_correct_max_iterations` (default 2) → page goes to **❌** (NOT ⚠️) per V03-3 zero-fabrication policy. The composer had its 2 attempts and couldn't produce a faithful build; shipping the partial output as "warning" would be misleading. Mark `pageResults[].status = ❌`, message: `"compose-failed after 2 attempts — see TODOs in {output-path} for unreadable sections"`. The fragment IS still written so the user can inspect/edit it manually, but the orchestrator's contract with the user (per V03-3 hard-fail rule) is: don't claim something is ready when it's not.
|
|
580
|
+
3. **Same `violations[]` 2 iterations in a row** → the fixHint did not take. Do not run a 3rd. Immediate ❌.
|
|
581
|
+
4. **(V03-3) `review.violations[]` includes `composition-fabrication-detected` or `composition-todo-bloat`** (composer self-flagged unreadable sections) → goes to Step 4j once. If the 2nd attempt also flags, ship as ❌, not as ⚠️.
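The four pre-checks above can be read as a single decision function. The sketch below is an assumption-laden illustration: `preCheck`, its parameter names, and the return labels `'warn'` (⚠️), `'fail'` (❌), and `'retry'` (go to Step 4j) are hypothetical, and violations are modeled as plain arrays of rule-id strings, which the source does not confirm. Only the two rule ids and the default of 2 iterations come from the text.

```javascript
// Rule ids taken from pre-check 4; everything else here is illustrative.
const SELF_FLAG_RULES = ['composition-fabrication-detected', 'composition-todo-bloat']

// True when two violation lists contain the same rule ids, order ignored.
function sameViolations(a, b) {
  const key = (list) => JSON.stringify([...list].sort())
  return key(a) === key(b)
}

// True when the composer self-flagged unreadable sections.
const selfFlagged = (list) => list.some((rule) => SELF_FLAG_RULES.includes(rule))

// Returns 'warn', 'fail', or 'retry' for one page, checking in list order.
function preCheck({ noAutoCorrect = false, iteration, maxIterations = 2, violations, previousViolations = null }) {
  if (noAutoCorrect) return 'warn' // pre-check 1
  if (iteration >= maxIterations) return 'fail' // pre-check 2
  if (previousViolations && violations.length > 0 && sameViolations(violations, previousViolations)) {
    return 'fail' // pre-check 3: the fix hint did not change anything
  }
  if (previousViolations && selfFlagged(violations) && selfFlagged(previousViolations)) {
    return 'fail' // pre-check 4: self-flagged on two attempts
  }
  return 'retry'
}

console.log(preCheck({ iteration: 0, violations: ['composition-todo-bloat'] }))
```

A first attempt that self-flags still gets one pass through Step 4j, which is why the last call above yields `'retry'` rather than `'fail'`.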
|
|
579
582
|
|
|
580
583
|
### Step 4j - Auto-correct iteration (CLOSED loop per page)
|
|
581
584
|
|
|
@@ -37,34 +37,74 @@ A single `.html` file written to `outputPath`. The file is a fragment — no `<h
|
|
|
37
37
|
|
|
38
38
|
## On Activation
|
|
39
39
|
|
|
40
|
-
1. **Read the inputs.** Parse `inspection.json` (capture `sectionType`, `tokens`, `dom`, **`domLive`**, `pseudoElements`, `imgUrls`, **`hydratedHeader`**, **`hydratedFooter`**, and
|
|
40
|
+
1. **Read the inputs.** Parse `inspection.json` (capture `sectionType`, `tokens`, `dom`, **`domLive`**, `pseudoElements`, `imgUrls`, **`hydratedHeader`**, **`hydratedFooter`**, and **`sectionCrops[]`**) and `assets-map.json` (the URL → localPath / inline-SVG dictionary). If `fixHints` is given, also read `previousHtmlPath`.
|
|
41
41
|
|
|
42
|
-
|
|
42
|
+
**CANONICAL VISUAL INPUT = `sectionCrops[]`.** Do NOT read `inspection.screenshot` (full-page) — it's downscaled when loaded as image and digits become unreadable. The crops are HD-readable per section. Composer uses one crop per section it composes.
|
|
43
43
|
|
|
44
|
-
|
|
44
|
+
**§V03-4: FIVE-WAY section classification + ZERO-FABRICATION HARD ENFORCEMENT (replaces §V03-3).** Before composing ANY section, classify it into ONE of five categories (A to E). Pick the FIRST category that matches. Each category has a distinct compose recipe. Defaulting to a placeholder is FORBIDDEN; a placeholder is only the last resort, for category (E).
|
|
45
45
|
|
|
46
|
-
|
|
47
|
-
2. **Identify visible components** in the section bbox you're composing: gallery images (count and approximate aspect), price (current + compare-at if struck through), sale banners, variant selectors (size dropdowns, color swatches), tier pricing offers, CTAs (color, label text), reviews snippets, trust badges, FAQ accordions, etc.
|
|
48
|
-
3. **For each visible component, emit REAL markup** reflecting what you saw. Examples:
|
|
49
|
-
- 6-thumb gallery visible → emit `<ul class="pdp__thumbs"><li><img src="…" alt="…"></li>×6</ul>` with the thumb URLs taken from `inspection.imgUrls` filtered by Shopify-CDN patterns + product-slug match. If a thumb URL isn't in imgUrls, emit a `<!-- TODO: thumb N — visible in screenshot at (x,y), no src in inspection -->` comment and a placeholder `<img>` so the user knows.
|
|
50
|
-
- Price "$29.00" with strikethrough "$34.00" → `<span class="pdp__price">$29.00</span> <s class="pdp__compare">$34.00</s>` — read the digits LITERALLY from the screenshot, do not pull from a different DOM source unless they match.
|
|
51
|
-
- "Mother's Day Sale — Buy 2 Pairs, Get 2 FREE" banner → emit the banner with the EXACT text you read.
|
|
52
|
-
- 3 tier offers (1 Pair $29, 2 Pairs + 2 FREE $58, 3 Pairs + 5 FREE $88) → emit 3 radio cards with the literal pricing and "Best Seller"/"Best Value" badges as you see them.
|
|
53
|
-
- Variant selectors (Size XL dropdown, Color "Classic Black" dropdown) → emit `<select>` elements with option lists matching what's visible (you may not see all options if dropdown is closed — emit the ones visible + a `<!-- TODO: additional options likely below the fold -->` comment).
|
|
54
|
-
- CTA "ADD TO CART" red button → emit `<button class="pdp__cta">ADD TO CART</button>` with red background color sampled from your visual reading (or use a known brand red from `inspection.tokens.colors`).
|
|
46
|
+
**Workflow when composing a body page** (anything NOT `--target-section=header|footer`):
|
|
55
47
|
|
|
56
|
-
|
|
57
|
-
- **Texts and visible structure** → read from screenshot (most reliable for lazy-hydrated content).
|
|
58
|
-
- **`href` for clickable links** → look in `inspection.domLive` / `inspection.hydratedHeader` / `inspection.hydratedFooter` / `inspection.imgUrls` for matching elements (by visible link text or image alt). If no match, emit `href="#"` plus a `<!-- TODO: resolve href for "…" -->` comment.
|
|
59
|
-
- **`form action`, `method`, hidden inputs** → ALWAYS from `inspection.hydratedFooter.forms` / `inspection.dom` / `inspection.domLive` (never guess these — submission will break).
|
|
60
|
-
- **`src` for images** → resolve via `assetsMap.assets[url].localPath` when URL appears in `inspection.imgUrls`. When you see an image in the screenshot but can't find its URL in imgUrls (custom-element-rendered img), emit a placeholder + TODO comment.
|
|
61
|
-
- **Computed style (fonts, colors, spacing)** → from `inspection.tokens` + `domLive[].computedStyle` (canonical for layout-derivable values like body background).
|
|
48
|
+
1. **Find the section's crop.** For each section you compose, find the matching `inspection.sectionCrops[]` entry by `bbox.y` proximity. Read the crop via `Read({ file_path: cropEntry.path })`.
|
|
62
49
|
|
|
63
|
-
|
|
50
|
+
2. **Classify the section** into A / B / C / D / E:
|
|
64
51
|
|
|
65
|
-
|
|
52
|
+
**(A) IMAGE-BLOCK** — the section is dominated by a single `<img>` element. Detect by either:
|
|
53
|
+
- `inspection.domLive` (or `dom`) — the section's root node has a child `<img>` whose `bbox` covers ≥80% of the section's `bbox`, OR
|
|
54
|
+
- Wrapper class match: ancestor has class containing `product-info__image`, `image-with-text`, `hero-image`, `trust-badges`, `features-points`, `mothers-day`, `single-image`, `banner-image`, or similar "single asset" pattern.
|
|
55
|
+
- In Shopify Dawn/OS 2.0: **`<div class="product-info__image">` is THE canonical flag** — every match is an image-block.
|
|
66
56
|
|
|
67
|
-
|
|
57
|
+
Recipe: **DO NOT reproduce the image's contents in HTML/CSS.** Emit:
|
|
58
|
+
```html
|
|
59
|
+
<section class="es-{section-slug}">
|
|
60
|
+
<img src="{assetsMap.assets[url].localPath}" alt="{img.alt || section description}" loading="lazy" width="{img.width}" height="{img.height}" />
|
|
61
|
+
</section>
|
|
62
|
+
```
|
|
63
|
+
Pull the URL from `inspection.imgUrls` matching the section's bbox / src pattern. If asset wasn't downloaded (failed extract or didn't appear in imgUrls), emit `<img src="" alt="..." data-todo="asset-missing">` + `<!-- TODO: image asset not in assetsMap (URL: …) — download manually -->`. No CSS reproduction. No SVG re-creation. No text "interpretation" of what's inside the image.
|
|
64
|
+
|
|
65
|
+
**Examples from real feedback (everstride PDP)**: `mothers-day-new.svg` banner, `trust-badges-shipping.svg`, `features-points.svg`, `60-day-fit.svg` cards — all category (A).
|
|
66
|
+
|
|
67
|
+
**(B) PURE MARKUP** — the section is text + structured layout, no dominant image. Examples: FAQ accordion (`<details>/<summary>`), comparison tables (`<table>` with cells), pricing tiers, trust copy blocks, headings with paragraphs.
|
|
68
|
+
|
|
69
|
+
Recipe: READ the crop carefully + cross-validate every literal against `inspection.domLive` text nodes. Emit real semantic markup (`<table>`, `<dl>`, `<details>`, `<ul>` of `<li>`, etc.) that mirrors the live DOM structure.
|
|
70
|
+
|
|
71
|
+
**Cross-validation is MANDATORY for every literal**:
|
|
72
|
+
- Numeric (`$NN`, `NN reviews`, percentages, ratings): grep `inspection.domLive` recursively for a text node containing the exact substring. NOT found → emit `<!-- TODO: literal "$29.00" read from crop but not in DOM; verify -->` + placeholder.
|
|
73
|
+
- Headings/copy/labels: same rule. "What Customers Say", "All rights reserved", "Mix & Match Colors" — if not in DOM verbatim, do NOT emit.
|
|
74
|
+
- Counts: count items visible in crop AND cross-validate against `domLive` children of the container. If mismatch, prefer DOM count (DOM is authoritative for structure) + log discrepancy.
|
|
75
|
+
|
|
76
|
+
**(C) MIXED (image + text)** — the section has BOTH a meaningful image AND text content (heading + body paragraph + image side-by-side). Common Shopify pattern: `<images-with-text-scrolling>` custom element.
|
|
77
|
+
|
|
78
|
+
Recipe: emit `<section>` with `<img>` (rule A for the image part — use real asset URL when available) + `<h*>` heading + `<p>` paragraph(s) literal from crop, cross-validated. Do NOT collapse to image-only or text-only — preserve both.
|
|
79
|
+
|
|
80
|
+
**(D) WIDGET-RENDERED** — third-party widget (Loox reviews, "you may also like" carousel, Instagram feed) where the crop SHOWS the widget already rendered with content (reviews with names + photos + stars + quotes; product cards with prices + titles).
|
|
81
|
+
|
|
82
|
+
Recipe: READ the crop and emit real markup for each visible card/item. Cross-validate against DOM when widget exposes content as JSON-LD or hidden HTML. Mark container with `data-source="widget-name"` so the deploy team knows where to integrate the real widget later. **Do NOT emit empty mount-div when content is visible.** The "deploy time placeholder" excuse is ONLY for (E).
|
|
83
|
+
|
|
84
|
+
**(E) WIDGET-EMPTY / OPAQUE** — the crop is blank, only shows "Loading..." text, or the section is completely unreadable (zero text visible, no images, no structure). Last resort.
|
|
85
|
+
|
|
86
|
+
Recipe: emit STRUCTURAL placeholder (heading from DOM if available + `<div class="placeholder-grid">` with N visual placeholder-cards) + `<!-- TODO: <widget-name or section-description> — content not visible in crop, integrate at deploy -->`. Mount-div alone (empty `<div>`) is FORBIDDEN — placeholder must have visible structure so the deployed page is not visually broken.
|
|
87
|
+
|
|
88
|
+
3. **Hybridize sources by field type** (applies to B, C, D):
|
|
89
|
+
- **Texts/structure/counts** → crop, cross-validated against DOM.
|
|
90
|
+
- **Hrefs** → `inspection.domLive` / `hydratedHeader` / `hydratedFooter` / `imgUrls` by matching visible link text. No match → `href="#"` + TODO.
|
|
91
|
+
- **Form action/method/hidden inputs** → ALWAYS DOM (never guess; submission will break otherwise).
|
|
92
|
+
- **Image src** → `assetsMap.assets[url].localPath` by alt/src pattern matching imgUrls.
|
|
93
|
+
- **Computed style** → `inspection.tokens` + `domLive.computedStyle`.
|
|
94
|
+
|
|
95
|
+
4. **Self-validation BEFORE calling `build-wp.mjs write`** — scan composed HTML for fabrication signals:
|
|
96
|
+
- Every numeric/price literal must either appear in `inspection.domLive` text nodes OR have a nearby TODO comment.
|
|
97
|
+
- Every `<h*>` / heading text must appear in DOM verbatim OR have TODO.
|
|
98
|
+
- Every `<p>` text > 30 chars must appear in DOM verbatim OR have TODO.
|
|
99
|
+
- Every FAQ `<div class="faq-a">` body MUST be either (a) empty + TODO when accordion was closed, OR (b) verbatim from crop where accordion was open.
|
|
100
|
+
- Image-block sections (A) MUST NOT contain SVG reproductions or CSS-styled `<div>` mimicking the image — they MUST be `<img>` + nothing else inside the section.
|
|
101
|
+
If there is any violation → REWRITE before submitting. If the HTML still cannot pass after 1 rewrite, return preflight error `composition-fabrication-detected` for orchestrator Step 4j retry.
|
|
102
|
+
|
|
103
|
+
5. **Hard-fail after 2 attempts.** Orchestrator (Step 4j) caps at 2 compose iterations. 2nd iteration still flagged → page goes to **❌** with TODO-only fragment shipped (not as "ready" or "partial"). NEVER ship plausible-but-wrong output as a warning.
|
|
104
|
+
|
|
105
|
+
6. **NEVER complete content from training data.** You may have seen this URL during training. That data is OLD. The crop is NEW. Crop wins. Training data is IRRELEVANT — never fill gaps with "what this site probably has".
|
|
106
|
+
|
|
107
|
+
This contract enforces the user's hard requirement: "you cannot deliver me something different, and if you cannot manage it, fail on the 2nd attempt."
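Step 1 of the workflow above (matching a section to its crop by `bbox.y` proximity) could look roughly like this. It is a sketch under assumptions: `findCropForSection` is a hypothetical name, and the entry shape `{ path, bbox: { y } }` is inferred from the mentions of `cropEntry.path` and `bbox.y`, not from a documented schema.

```javascript
// Hypothetical helper: returns the sectionCrops[] entry whose bbox.y is
// closest to the section's bbox.y, or null when no usable entry exists.
// Assumed entry shape: { path: string, bbox: { y: number, ... } }.
function findCropForSection(sectionCrops, sectionBbox) {
  let best = null
  let bestDist = Infinity
  for (const crop of sectionCrops || []) {
    if (!crop || !crop.bbox || typeof crop.bbox.y !== 'number') continue
    const dist = Math.abs(crop.bbox.y - sectionBbox.y)
    if (dist < bestDist) {
      best = crop
      bestDist = dist
    }
  }
  return best
}

const crops = [
  { path: 'sections/0-hero.png', bbox: { y: 0 } },
  { path: 'sections/1-trust.png', bbox: { y: 800 } },
  { path: 'sections/2-faq.png', bbox: { y: 1600 } },
]
const match = findCropForSection(crops, { y: 790 })
console.log(match.path)
```

A real implementation would probably also want a maximum distance, so that a section with no nearby crop is reported instead of silently paired with a distant one; the source does not say whether such a threshold exists.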
|
|
68
108
|
|
|
69
109
|
**§V03-1 — Use `domLive` as the canonical body tree when present.** When `inspection.domLive` is non-null, it holds the live-walker snapshot taken BEFORE Cap A substituted `dom[]` with the shadow-flattened tree. The flattened `dom[]` carries `bbox={0,0,0,0}` and empty `computedStyle` because parseHTMLUnsafe returns a detached doc — it's structurally rich (gallery imgs, custom-element children) but useless for layout. The composer needs real bboxes, real `computedStyle.background`, real heights. Always prefer `inspection.domLive` for body section composition (bbox, computedStyle, hero detection, section ordering). Use `inspection.dom` only when `domLive === null` (page had no shadow roots — flatten didn't fire) or when you specifically need shadow-flattened content like a PDP gallery (consult `dom` for image-rich PDP nodes, but use `domLive` for the surrounding layout).
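The source-selection rule in §V03-1 reduces to a small fallback. In this sketch `pickBodyTree` is a hypothetical name and the returned `{ tree, source }` shape is an assumption made for illustration.

```javascript
// Hypothetical helper: prefer the live-walker snapshot for layout work and
// fall back to the (possibly shadow-flattened) dom only when domLive is absent.
function pickBodyTree(inspection) {
  const live = inspection.domLive
  if (live !== null && live !== undefined) {
    return { tree: live, source: 'domLive' }
  }
  return { tree: inspection.dom, source: 'dom' }
}

const picked = pickBodyTree({ domLive: null, dom: [{ tag: 'body' }] })
console.log(picked.source)
```

This covers only the layout side of the rule; per the paragraph above, a composer may still consult `inspection.dom` separately for shadow-flattened content such as a PDP gallery.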
|
|
70
110
|
|
|
@@ -78,7 +78,7 @@ function findAll(re, str) {
|
|
|
78
78
|
return out
|
|
79
79
|
}
|
|
80
80
|
|
|
81
|
-
function validateHtml(html) {
|
|
81
|
+
function validateHtml(html, inspection = null) {
|
|
82
82
|
const errors = []
|
|
83
83
|
const warnings = []
|
|
84
84
|
const info = []
|
|
@@ -228,6 +228,103 @@ function validateHtml(html) {
|
|
|
228
228
|
fabricationRisks.push({ section: m[1].trim(), reason: m[2].trim() })
|
|
229
229
|
}
|
|
230
230
|
|
|
231
|
+
// 13. §V03-4 fabrication detector — when inspection JSON is provided,
|
|
232
|
+
// cross-validate every literal text in the composed HTML against the
|
|
233
|
+
// inspection's text corpus (domLive + dom + hydratedHeader.html +
|
|
234
|
+
// hydratedFooter.html). Literals that don't appear ANYWHERE in the
|
|
235
|
+
// inspection corpus are flagged as fabrications.
|
|
236
|
+
//
|
|
237
|
+
// Detected literal types:
|
|
238
|
+
// - prices/money: \$\d{1,5}(\.\d{2})? (bare decimals without "$" are not scanned)
|
|
239
|
+
// - count phrases: \d[\d,]{1,9}\+?\s+(reviews|customers|women|sold|pairs|...)
|
|
240
|
+
// - heading text content (h1-h6, 3 to 120 chars)
|
|
241
|
+
// - meaningful paragraph text (30 to 400 chars)
|
|
242
|
+
//
|
|
243
|
+
// Findings emit error `composition-fabrication-detected` so the orchestrator
|
|
244
|
+
// (Step 4j) can re-trigger composition with feedback.
|
|
245
|
+
const fabricationFindings = []
|
|
246
|
+
if (inspection && typeof inspection === 'object') {
|
|
247
|
+
const corpus = buildInspectionCorpus(inspection)
|
|
248
|
+
const looseCorpus = corpus
|
|
249
|
+
.toLowerCase()
|
|
250
|
+
.replace(/\s+/g, ' ')
|
|
251
|
+
.replace(/[‘’]/g, "'")
|
|
252
|
+
.replace(/[“”]/g, '"')
|
|
253
|
+
.replace(/[—–-]+/g, '-')
|
|
254
|
+
// strip TODO-commented regions so they don't get re-scanned
|
|
255
|
+
const htmlBody = html.replace(/<!--[\s\S]*?-->/g, '')
|
|
256
|
+
// (a) money literals
|
|
257
|
+
const moneyRe = /\$\d{1,5}(?:\.\d{2})?/g
|
|
258
|
+
const moneyHits = new Set()
|
|
259
|
+
for (let m = moneyRe.exec(htmlBody); m; m = moneyRe.exec(htmlBody)) moneyHits.add(m[0])
|
|
260
|
+
for (const lit of moneyHits) {
|
|
261
|
+
if (!corpusHas(looseCorpus, lit)) {
|
|
262
|
+
fabricationFindings.push({
|
|
263
|
+
kind: 'money',
|
|
264
|
+
literal: lit,
|
|
265
|
+
severity: 'high',
|
|
266
|
+
message: `Money literal "${lit}" not found in inspection corpus (domLive/dom/hydratedHeader/hydratedFooter)`,
|
|
267
|
+
})
|
|
268
|
+
}
|
|
269
|
+
}
|
|
270
|
+
// (b) count phrases like "19,479 Reviews", "240,000+ Women" (abbreviated forms such as "1M+ Sold" do not match countRe)
|
|
271
|
+
const countRe = /\b(\d[\d,]{1,9}\+?)\s+(reviews?|customers?|women|sold|pairs?|orders?|stars?|users?|clinicians?|five-star|verified)\b/gi
|
|
272
|
+
const countHits = new Set()
|
|
273
|
+
for (let m = countRe.exec(htmlBody); m; m = countRe.exec(htmlBody)) countHits.add(m[0])
|
|
274
|
+
for (const lit of countHits) {
|
|
275
|
+
if (!corpusHas(looseCorpus, lit)) {
|
|
276
|
+
fabricationFindings.push({
|
|
277
|
+
kind: 'count',
|
|
278
|
+
literal: lit,
|
|
279
|
+
severity: 'high',
|
|
280
|
+
message: `Count phrase "${lit}" not found in inspection corpus`,
|
|
281
|
+
})
|
|
282
|
+
}
|
|
283
|
+
}
|
|
284
|
+
// (c) headings — text content of h1..h6
|
|
285
|
+
const headingRe = /<h[1-6][^>]*>([\s\S]*?)<\/h[1-6]>/gi
|
|
286
|
+
const headingHits = new Set()
|
|
287
|
+
for (let m = headingRe.exec(htmlBody); m; m = headingRe.exec(htmlBody)) {
|
|
288
|
+
const txt = m[1].replace(/<[^>]+>/g, '').trim()
|
|
289
|
+
if (txt.length >= 3 && txt.length <= 120) headingHits.add(txt)
|
|
290
|
+
}
|
|
291
|
+
for (const lit of headingHits) {
|
|
292
|
+
if (!corpusHas(looseCorpus, lit)) {
|
|
293
|
+
fabricationFindings.push({
|
|
294
|
+
kind: 'heading',
|
|
295
|
+
literal: lit,
|
|
296
|
+
severity: 'high',
|
|
297
|
+
message: `Heading "${lit}" not found in inspection corpus`,
|
|
298
|
+
})
|
|
299
|
+
}
|
|
300
|
+
}
|
|
301
|
+
// (d) long paragraphs (30 to 400 chars): text content of <p>
|
|
302
|
+
const paraRe = /<p[^>]*>([\s\S]*?)<\/p>/gi
|
|
303
|
+
const paraHits = new Set()
|
|
304
|
+
for (let m = paraRe.exec(htmlBody); m; m = paraRe.exec(htmlBody)) {
|
|
305
|
+
const txt = m[1].replace(/<[^>]+>/g, '').trim()
|
|
306
|
+
if (txt.length >= 30 && txt.length <= 400) paraHits.add(txt)
|
|
307
|
+
}
|
|
308
|
+
for (const lit of paraHits) {
|
|
309
|
+
if (!corpusHas(looseCorpus, lit)) {
|
|
310
|
+
fabricationFindings.push({
|
|
311
|
+
kind: 'paragraph',
|
|
312
|
+
literal: lit.slice(0, 80) + (lit.length > 80 ? '…' : ''),
|
|
313
|
+
severity: 'medium',
|
|
314
|
+
message: `Paragraph "${lit.slice(0, 80)}${lit.length > 80 ? '…' : ''}" not found in inspection corpus`,
|
|
315
|
+
})
|
|
316
|
+
}
|
|
317
|
+
}
|
|
318
|
+
if (fabricationFindings.length > 0) {
|
|
319
|
+
errors.push({
|
|
320
|
+
rule: 'composition-fabrication-detected',
|
|
321
|
+
severity: 'high',
|
|
322
|
+
message: `§V03-4 fabrication-detector found ${fabricationFindings.length} literal(s) not present in inspection corpus. The composer should re-read the section crops and either remove the fabricated literals OR replace with TODO comments.`,
|
|
323
|
+
findings: fabricationFindings,
|
|
324
|
+
})
|
|
325
|
+
}
|
|
326
|
+
}
|
|
327
|
+
|
|
231
328
|
return {
|
|
232
329
|
passed: errors.length === 0,
|
|
233
330
|
errorCount: errors.length,
|
|
@@ -236,9 +333,62 @@ function validateHtml(html) {
|
|
|
236
333
|
warnings,
|
|
237
334
|
info,
|
|
238
335
|
fabricationRisks,
|
|
336
|
+
fabricationFindings,
|
|
239
337
|
}
|
|
240
338
|
}
|
|
241
339
|
|
|
340
|
+
// §V03-4: Build a single text corpus (lowercased later by the caller) from all inspection text
|
|
341
|
+
// sources for fabrication cross-validation. Includes:
|
|
342
|
+
// - domLive (preferred) / dom (fallback): every node.text + node.attrs.alt, aria-label, title
|
|
343
|
+
// - hydratedHeader.html, hydratedFooter.html (raw outerHTML strings)
|
|
344
|
+
function buildInspectionCorpus(inspection) {
|
|
345
|
+
const parts = []
|
|
346
|
+
function walkText(arr) {
|
|
347
|
+
if (!arr) return
|
|
348
|
+
const stack = Array.isArray(arr) ? [...arr] : [arr]
|
|
349
|
+
while (stack.length) {
|
|
350
|
+
const n = stack.pop()
|
|
351
|
+
if (!n || typeof n !== 'object') continue
|
|
352
|
+
if (typeof n.text === 'string' && n.text.trim()) parts.push(n.text)
|
|
353
|
+
if (n.attrs && typeof n.attrs === 'object') {
|
|
354
|
+
if (typeof n.attrs.alt === 'string') parts.push(n.attrs.alt)
|
|
355
|
+
if (typeof n.attrs['aria-label'] === 'string') parts.push(n.attrs['aria-label'])
|
|
356
|
+
if (typeof n.attrs.title === 'string') parts.push(n.attrs.title)
|
|
357
|
+
}
|
|
358
|
+
if (Array.isArray(n.children)) for (const c of n.children) stack.push(c)
|
|
359
|
+
}
|
|
360
|
+
}
|
|
361
|
+
walkText(inspection.domLive)
|
|
362
|
+
if (!inspection.domLive) walkText(inspection.dom)
|
|
363
|
+
if (inspection.hydratedHeader && typeof inspection.hydratedHeader.html === 'string') {
|
|
364
|
+
parts.push(inspection.hydratedHeader.html.replace(/<[^>]+>/g, ' '))
|
|
365
|
+
}
|
|
366
|
+
if (inspection.hydratedFooter && typeof inspection.hydratedFooter.html === 'string') {
|
|
367
|
+
parts.push(inspection.hydratedFooter.html.replace(/<[^>]+>/g, ' '))
|
|
368
|
+
}
|
|
369
|
+
return parts.join(' \n ')
|
|
370
|
+
}
|
|
371
|
+
|
|
372
|
+
// Loose comparison: literal appears in the corpus with whitespace and
|
|
373
|
+
// punctuation normalization. Handles "$29.00" matching "$ 29.00" etc.
|
|
374
|
+
function corpusHas(looseCorpusLower, literal) {
|
|
375
|
+
if (!literal) return true
|
|
376
|
+
const needle = literal
|
|
377
|
+
.toLowerCase()
|
|
378
|
+
.replace(/\s+/g, ' ')
|
|
379
|
+
.replace(/[‘’]/g, "'")
|
|
380
|
+
.replace(/[“”]/g, '"')
|
|
381
|
+
.replace(/[—–-]+/g, '-')
|
|
382
|
+
.trim()
|
|
383
|
+
if (!needle) return true
|
|
384
|
+
// direct substring
|
|
385
|
+
if (looseCorpusLower.includes(needle)) return true
|
|
386
|
+
// also try removing all whitespace (handles "$29 .00" vs "$29.00")
|
|
387
|
+
const compact = needle.replace(/\s+/g, '')
|
|
388
|
+
const compactCorpus = looseCorpusLower.replace(/\s+/g, '')
|
|
389
|
+
return compactCorpus.includes(compact)
|
|
390
|
+
}
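To make the loose comparison concrete, here is a compact, self-contained restatement of the same normalization steps with a few inputs. `normalizeLoose` and `looseIncludes` are names chosen for this sketch (the diff's own functions are `buildInspectionCorpus` and `corpusHas`), and the sample corpus string is invented.

```javascript
// Same normalization steps the diff applies to both corpus and needle,
// written with Unicode escapes for the curly quotes and dashes.
function normalizeLoose(s) {
  return s
    .toLowerCase()
    .replace(/\s+/g, ' ')
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2014\u2013-]+/g, '-')
}

// Substring test after normalization, with a whitespace-stripped fallback.
function looseIncludes(corpus, literal) {
  const hay = normalizeLoose(corpus)
  const needle = normalizeLoose(literal).trim()
  if (!needle) return true
  if (hay.includes(needle)) return true
  return hay.replace(/\s+/g, '').includes(needle.replace(/\s+/g, ''))
}

// Invented sample corpus: spaced price, en dash, curly apostrophe.
const sampleCorpus = 'Now only $ 29.00 \u2013 Women\u2019s Best Seller'

const spacedPrice = looseIncludes(sampleCorpus, '$29.00') // found via the whitespace-stripped fallback
const curlyQuote = looseIncludes(sampleCorpus, "women's best seller") // found after quote normalization
const missing = looseIncludes(sampleCorpus, '$34.00') // not present in the corpus
const prefixHit = looseIncludes('Was $290.00', '$29') // substring match, not a token match
console.log({ spacedPrice, curlyQuote, missing, prefixHit })
```

The last case is worth noting as a design consequence: because the check is a plain substring test, a literal such as `$29` is considered present whenever the corpus contains `$290.00`, so the detector can miss some fabricated prices.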
|
|
391
|
+
|
|
242
392
|
// --- preflight ----------------------------------------------------------------
|
|
243
393
|
//
|
|
244
394
|
// Pre-dispatch checklist. The composer is creative — it picks patterns and
|
|
@@ -425,8 +575,21 @@ async function main() {
|
|
|
425
575
|
} catch (err) {
|
|
426
576
|
fail(`validate: cannot read ${path}: ${err.message}`, 1)
|
|
427
577
|
}
|
|
428
|
-
|
|
429
|
-
|
|
578
|
+
// §V03-4 fabrication detector — when --inspection-path is passed, load
|
|
579
|
+
// the inspection JSON and cross-validate composed HTML literals
|
|
580
|
+
// against the inspection text corpus. Without --inspection-path, the
|
|
581
|
+
// fabrication check is skipped (back-compat for older callers).
|
|
582
|
+
let inspection = null
|
|
583
|
+
if (values['inspection-path']) {
|
|
584
|
+
try {
|
|
585
|
+
const insRaw = await readFile(resolve(values['inspection-path']), 'utf8')
|
|
586
|
+
inspection = JSON.parse(insRaw)
|
|
587
|
+
} catch (err) {
|
|
588
|
+
log(values.verbose, `validate: could not load --inspection-path: ${err.message}`)
|
|
589
|
+
}
|
|
590
|
+
}
|
|
591
|
+
log(values.verbose, `validating ${path} (${html.length} bytes${inspection ? ', with fabrication check' : ''})`)
|
|
592
|
+
const report = validateHtml(html, inspection)
|
|
430
593
|
process.stdout.write(`${JSON.stringify(report, null, 2)}\n`)
|
|
431
594
|
process.exit(report.passed ? 0 : 3)
|
|
432
595
|
}
|
|
@@ -422,6 +422,245 @@ async function main() {
|
|
|
422
422
|
|
|
423
423
|
Object.assign(result, extracted)
|
|
424
424
|
|
|
425
|
+
// §V03-3 — Section-level crops at NATIVE resolution.
|
|
426
|
+
// The full-page screenshot can be 24k+ pixels tall on long pages;
|
|
427
|
+
// when a vision model reads it, the image is downscaled to fit and
|
|
428
|
+
// small details (prices, badges, button labels) become unreadable.
|
|
429
|
+
// Capture each classified section as its own viewport-width crop in
|
|
430
|
+
// native resolution so the composer can read details accurately.
|
|
431
|
+
log('capturing section-level crops')
|
|
432
|
+
try {
|
|
433
|
+
const cropsDir = join(OUTPUT_DIR, 'sections')
|
|
434
|
+
await mkdir(cropsDir, { recursive: true })
|
|
435
|
+
|
|
436
|
+
// Strategy: capture TOP-LEVEL vertical bands. Walk into <body>/<main>
|
|
437
|
+
// and treat each direct child with meaningful height as a section
|
|
438
|
+
// crop. classifySection's sectionType (when present) is preserved
|
|
439
|
+
// as the section label; otherwise we tag with the element's tag +
|
|
440
|
+
// first class. This is more complete than relying on classifySection
|
|
441
|
+
// alone — many real-world pages have hero/trust/pillars/guarantee/
|
|
442
|
+
// banner that classifySection doesn't recognize as a known type but
|
|
443
|
+
// are clearly distinct visual bands.
|
|
444
|
+
const sectionList = []
|
|
445
|
+
const seenBbox = new Set()
|
|
446
|
+
function bboxKey(b) {
|
|
447
|
+
return `${b?.x}|${b?.y}|${b?.w}|${b?.h}`
|
|
448
|
+
}
|
|
449
|
+
function labelFromNode(n) {
|
|
450
|
+
if (n.sectionType) return n.sectionType
|
|
451
|
+
const cls = Array.isArray(n.classes) ? n.classes[0] : null
|
|
452
|
+
const tag = n.tag || 'section'
|
|
453
|
+
const slug = (cls || tag).replace(/[^a-z0-9-]/gi, '-').toLowerCase().slice(0, 32)
|
|
454
|
+
return slug || 'section'
|
|
455
|
+
}
|
|
456
|
+
|
|
457
|
+
// §V03-4 image-block detection. A section is an image-block when
|
|
458
|
+
// its visible content is dominated by a single <img>: either by
|
|
459
|
+
// bbox coverage (img takes ≥80% of section area) OR by a known
|
|
460
|
+
// wrapper-class pattern (Shopify Dawn / OS 2.0 themes use specific
|
|
461
|
+
// wrapper classes like `product-info__image` to delimit a region
|
|
462
|
+
// whose entire visual content is one CDN-hosted SVG/PNG/JPG asset).
|
|
463
|
+
// The composer (sb-build-wp §V03-4 rule A) treats image-block
|
|
464
|
+
// sections as "download asset + emit <img src>" — NEVER tries to
|
|
465
|
+
// reproduce the visual contents in HTML/CSS.
|
|
466
|
+
const IMAGE_BLOCK_WRAPPER_PATTERNS = [
|
|
467
|
+
/product-info__image/i,
|
|
468
|
+
/image-with-text/i,
|
|
469
|
+
/\bhero-image\b/i,
|
|
470
|
+
/\btrust-badges\b/i,
|
|
471
|
+
/\bfeatures-points\b/i,
|
|
472
|
+
/\bmothers?-day\b/i,
|
|
473
|
+
/\bsingle-image\b/i,
|
|
474
|
+
/\bbanner-image\b/i,
|
|
475
|
+
/\bguarantee-card\b/i,
|
|
476
|
+
/\bbest-fit-size-chart\b/i,
|
|
477
|
+
]
|
|
478
|
+
function classifyAsImageBlock(node) {
|
|
479
|
+
if (!node || typeof node !== 'object') return null
|
|
480
|
+
const sectionBbox = node.bbox
|
|
481
|
+
if (!sectionBbox || !sectionBbox.h || !sectionBbox.w) return null
|
|
482
|
+
const sectionArea = sectionBbox.h * sectionBbox.w
|
|
483
|
+
if (sectionArea <= 0) return null
|
|
484
|
+
// Match by wrapper-class pattern on the node itself OR any
|
|
485
|
+
// descendant (the wrapper might be a child of the section root).
|
|
486
|
+
function findClassMatch(n) {
|
|
487
|
+
if (!n || typeof n !== 'object') return null
|
|
488
|
+
if (Array.isArray(n.classes)) {
|
|
489
|
+
for (const cls of n.classes) {
|
|
490
|
+
for (const pat of IMAGE_BLOCK_WRAPPER_PATTERNS) {
|
|
491
|
+
if (pat.test(cls)) return cls
|
|
492
|
+
}
|
|
493
|
+
}
|
|
494
|
+
}
|
|
495
|
+
if (Array.isArray(n.children)) {
|
|
496
|
+
for (const c of n.children) {
|
|
497
|
+
const m = findClassMatch(c)
|
|
498
|
+
if (m) return m
|
|
499
|
+
}
|
|
500
|
+
}
|
|
501
|
+
return null
|
|
502
|
+
}
|
|
503
|
+
const classMatch = findClassMatch(node)
|
|
504
|
+
// Find the largest <img> descendant by bbox area.
|
|
505
|
+
let bestImg = null
|
|
506
|
+
function findBiggestImg(n) {
|
|
507
|
+
if (!n || typeof n !== 'object') return
|
|
508
|
+
if (n.tag === 'img' && n.bbox && n.bbox.h > 0 && n.bbox.w > 0) {
|
|
509
|
+
const a = n.bbox.h * n.bbox.w
|
|
510
|
+
if (!bestImg || a > bestImg.area) {
|
|
511
|
+
const src = (n.attrs && (n.attrs.src || n.attrs.srcset)) || ''
|
|
512
|
+
bestImg = { area: a, bbox: n.bbox, src }
|
|
513
|
+
}
|
|
514
|
+
}
|
|
515
|
+
if (Array.isArray(n.children)) {
|
|
516
|
+
for (const c of n.children) findBiggestImg(c)
|
|
517
|
+
}
|
|
518
|
+
}
|
|
519
|
+
findBiggestImg(node)
|
|
520
|
+
const imgArea = bestImg ? bestImg.area : 0
|
|
521
|
+
const coverage = sectionArea > 0 ? imgArea / sectionArea : 0
|
|
522
|
+
if (classMatch || coverage >= 0.8) {
|
|
523
|
+
return {
|
|
524
|
+
reason: classMatch
|
|
525
|
+
? `wrapper-class:${classMatch}`
|
|
526
|
+
: `img-coverage:${(coverage * 100).toFixed(0)}%`,
|
|
527
|
+
imgSrc: bestImg ? bestImg.src : null,
|
|
528
|
+
imgBbox: bestImg ? bestImg.bbox : null,
|
|
529
|
+
coverage,
|
|
530
|
+
}
|
|
531
|
+
}
|
|
532
|
+
return null
|
|
533
|
+
}
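The coverage half of the image-block test can be illustrated in isolation. This is a simplified restatement, not the function above: `largestImgArea` and `imgCoverage` are hypothetical names, the wrapper-class branch is omitted, and the two sample nodes are invented.

```javascript
// Area of the largest <img> descendant (by bbox), or 0 when there is none.
function largestImgArea(node) {
  if (!node || typeof node !== 'object') return 0
  let best = 0
  if (node.tag === 'img' && node.bbox && node.bbox.w > 0 && node.bbox.h > 0) {
    best = node.bbox.w * node.bbox.h
  }
  const children = Array.isArray(node.children) ? node.children : []
  for (const child of children) best = Math.max(best, largestImgArea(child))
  return best
}

// Fraction of the section's bbox area covered by its largest <img>.
function imgCoverage(section) {
  const area = section.bbox.w * section.bbox.h
  return area > 0 ? largestImgArea(section) / area : 0
}

// Invented sample: a 390x200 band whose only content is a 390x180 image.
const banner = {
  tag: 'div',
  bbox: { x: 0, y: 0, w: 390, h: 200 },
  children: [{ tag: 'img', bbox: { x: 0, y: 10, w: 390, h: 180 }, attrs: { src: 'banner.svg' } }],
}
// Invented sample: a tall text section with a small thumbnail.
const textSection = {
  tag: 'section',
  bbox: { x: 0, y: 200, w: 390, h: 600 },
  children: [{ tag: 'p' }, { tag: 'img', bbox: { x: 0, y: 210, w: 100, h: 100 } }],
}

const bannerIsImageBlock = imgCoverage(banner) >= 0.8
const textIsImageBlock = imgCoverage(textSection) >= 0.8
console.log({ bannerIsImageBlock, textIsImageBlock })
```

Note that in the diff a wrapper-class match alone is enough to classify a section as an image-block, even when coverage is below 0.8 or no `<img>` was found, in which case `imgSrc` is `null`.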
|
|
534
|
+
const sourceDom = result.domLive
|
|
535
|
+
? Array.isArray(result.domLive)
|
|
536
|
+
? result.domLive
|
|
537
|
+
: [result.domLive]
|
|
538
|
+
: result.dom
|
|
539
|
+
|
|
540
|
+
// Find the host container (body or main): depth-first search that
|
|
541
|
+
// returns the first <body> or <main> node found.
|
|
542
|
+
function findHost(arr) {
|
|
543
|
+
for (const node of arr) {
|
|
544
|
+
if (!node || typeof node !== 'object') continue
|
|
545
|
+
if (node.tag === 'body' || node.tag === 'main') return node
|
|
546
|
+
// Recurse into children for the case the root is e.g. <html>
|
|
547
|
+
if (Array.isArray(node.children)) {
|
|
548
|
+
const inner = findHost(node.children)
|
|
549
|
+
if (inner) return inner
|
|
550
|
+
}
|
|
551
|
+
}
|
|
552
|
+
return null
|
|
553
|
+
}
|
|
554
|
+
const host = findHost(sourceDom)
|
|
555
|
+
const directChildren = host && Array.isArray(host.children) ? host.children : sourceDom
|
|
556
|
+
+  for (const child of directChildren) {
+    if (!child || typeof child !== 'object') continue
+    const bbox = child.bbox
+    if (!bbox || typeof bbox.h !== 'number') continue
+    // Skip tiny strips (probably wrappers/spacers) and zero-height.
+    if (bbox.h < 60 || bbox.w < 200) continue
+    // Skip absurdly tall single bands (entire page wrapped in one
+    // div) — fall back to nested crops in that case.
+    if (bbox.h > 6000) {
+      if (Array.isArray(child.children)) {
+        for (const grand of child.children) {
+          if (!grand || typeof grand !== 'object' || !grand.bbox) continue
+          if (grand.bbox.h < 60 || grand.bbox.w < 200) continue
+          if (grand.bbox.h > 6000) continue
+          const key = bboxKey(grand.bbox)
+          if (seenBbox.has(key)) continue
+          seenBbox.add(key)
+          sectionList.push({
+            sectionType: labelFromNode(grand),
+            bbox: grand.bbox,
+            imageBlock: classifyAsImageBlock(grand),
+          })
+        }
+      }
+      continue
+    }
+    const key = bboxKey(bbox)
+    if (seenBbox.has(key)) continue
+    seenBbox.add(key)
+    sectionList.push({
+      sectionType: labelFromNode(child),
+      bbox,
+      imageBlock: classifyAsImageBlock(child),
+    })
+  }
+
+  // Sort by y so crops are numbered top-to-bottom — matches the
+  // user's reading order through the page.
+  sectionList.sort((a, b) => (a.bbox.y || 0) - (b.bbox.y || 0))
+
+  // Use sharp to crop the already-captured full-page screenshot.
+  // This is faster (no extra page.screenshot per section) and avoids
+  // page-level scroll/layout race conditions where clip regions outside
+  // the rendered viewport return "clipped area outside resulting image".
+  // The screenshot was captured at deviceScaleFactor=3 for the iPhone
+  // profile, so pixel coords are bbox-coord * dpr; we read the image
+  // metadata to detect the actual dpr.
+  let sharp = null
+  try {
+    sharp = (await import('sharp')).default
+  } catch (_) {
+    log('sharp not available — section crops skipped (install: npm i sharp)')
+    result.sectionCrops = []
+  }
+  const sectionCrops = []
+  if (sharp) {
+    const fullMeta = await sharp(screenshotPath).metadata()
+    const dpr = Math.max(1, Math.round(fullMeta.width / VIEWPORT_W))
+    let idx = 0
+    for (const sec of sectionList) {
+      idx++
+      const slug =
+        String(idx).padStart(2, '0') +
+        '-' +
+        String(sec.sectionType).replace(/[^a-z0-9-]/gi, '-').toLowerCase()
+      const cropPath = join(cropsDir, `${slug}.png`)
+      // Clamp to image bounds — bbox may extend beyond captured area.
+      const extractW = Math.max(
+        1,
+        Math.min(Math.round((sec.bbox.w || VIEWPORT_W) * dpr), fullMeta.width),
+      )
+      const extractH = Math.max(
+        1,
+        Math.min(Math.round(sec.bbox.h * dpr), fullMeta.height - Math.round(sec.bbox.y * dpr)),
+      )
+      const extractY = Math.max(0, Math.round(sec.bbox.y * dpr))
+      if (extractY + extractH > fullMeta.height || extractH < 30 * dpr) {
+        log(`section crop ${idx} ${sec.sectionType} skipped: outside image bounds`)
+        continue
+      }
+      try {
+        await sharp(screenshotPath)
+          .extract({
+            left: 0,
+            top: extractY,
+            width: extractW,
+            height: extractH,
+          })
+          .toFile(cropPath)
+        sectionCrops.push({
+          idx,
+          sectionType: sec.sectionType,
+          bbox: sec.bbox,
+          path: cropPath,
+          imageBlock: sec.imageBlock || null,
+        })
+      } catch (err) {
+        log(`section crop ${idx} ${sec.sectionType} failed: ${err?.message || err}`)
+      }
+    }
+  }
+  result.sectionCrops = sectionCrops
+  log(`section crops: ${sectionCrops.length} captured`)
+} catch (err) {
+  log(`section crops phase failed: ${err?.message || err}`)
+  result.sectionCrops = []
+}
+
 // §3.2 — Trigger-and-observe header hamburger menu, if present.
 // Composer was guessing "Pattern A drawer-left" from sectionType=header
 // alone, ending up with the wrong animation in example-shop (the live
|