similarbuild 0.2.1 → 0.3.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "similarbuild",
3
- "version": "0.2.1",
3
+ "version": "0.3.0",
4
4
  "description": "Visual migration framework for Claude Code — clone a live page, get a paste-ready WordPress/Elementor or Shopify section file, validated and auto-corrected.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -67,7 +67,7 @@ O texto após `ARGUMENTS:` contém a entrada do usuário. Extraia:
67
67
  | `--no-auto-correct` | no | false | On each page, escalates at the first decision-matrix result that would call for a loop. |
68
68
  | `--max-pages <n>` | no | (keeps the skill default, `200`) | Forwarded to `sb-crawl-and-list` as a hard cap. |
69
69
  | `--sitemap-path <path>` | no | (none) | Forwarded to `sb-crawl-and-list`. Useful for SPAs or sites that block crawlers. |
70
- | `--no-globals` | no | false | Explicit opt-out of Step 4d (globals auto-extract). Without this flag, shared header/footer are extracted to `clean/global/` when `crawl.pageCount >= 3` (default ON). Use it when the site has no chrome shared across pages, or for a debug run that needs the full per-page markup. |
70
+ | `--no-globals` | no | false | Opt-out of Step 3.5 (globals-first phase, V03-0a). Without this flag, shared header/footer are extracted to `clean/global/` BEFORE the page loop, behind hard interactivity and visual gates. Default ON when `crawl.pageCount >= 3` AND at least one page has `type === 'home'`. Use it when the site has no shared chrome, or for a debug run that needs the full per-page markup. |
71
71
  | `--dry-run` | no | false | Runs crawl + checkpoint + plan-summary and **does not build**. Prints the plan (URLs, types, estimated assets) and exits. |
72
72
 
73
73
  Unknown flag → ignore it with a one-line warning. Do not error out.
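
The default-ON rule for `--no-globals` in the table above combines three conditions. A minimal sketch of that gate follows; the function name `shouldRunGlobalsPhase`, its argument shape, and the reason strings are illustrative assumptions, not part of the package.

```javascript
// Sketch only: `shouldRunGlobalsPhase` is a hypothetical helper, and the
// reason strings are placeholders for the `Step 3.5 skipped: <reason>` line.
function shouldRunGlobalsPhase({ pageCount, noGlobals, pagesConfirmed }) {
  // Explicit opt-out wins over everything else.
  if (noGlobals) return { run: false, reason: '--no-globals opt-out' }
  // Small crawls are assumed to have too little shared chrome to justify the phase.
  if (pageCount < 3) return { run: false, reason: 'pageCount<3' }
  // The phase inspects the home page, so at least one must be confirmed.
  const hasHome = pagesConfirmed.some((page) => page.type === 'home')
  if (!hasHome) return { run: false, reason: 'no home page confirmed' }
  return { run: true, reason: null }
}
```

Returning the reason alongside the boolean keeps the skip line for `decisions.md` in one place instead of re-deriving it later.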
@@ -105,7 +105,7 @@ Flag desconhecida → ignore com warning de uma linha. Não erre.
105
105
  │ ├── collections/ ← one per collection
106
106
  │ ├── pages/ ← contact/about/policy/blog/other
107
107
  │ ├── sections/ ← generic sections
108
- │ └── global/ ← shared header/footer (Step 4d)
108
+ │ └── global/ ← shared header/footer (Step 3.5, V03-0a)
109
109
  ├── assets/ ← content-hash images, SHARED across all pages
110
110
  ├── reports/
111
111
  │ ├── diffs/
@@ -227,7 +227,136 @@ Mostre a lista FINAL pós-filtro (mesma tabela formatada, mas com a contagem nov
227
227
 
228
228
  ### Step 3d — Dry-run check
229
229
 
230
- If `--dry-run` was passed, THIS is the exit point: after `confirm`, write `reports/plan.{ts}.md` with the final list plus a per-type estimate, print the summary, and stop. Do not run Step 4.
230
+ If `--dry-run` was passed, THIS is the exit point: after `confirm`, write `reports/plan.{ts}.md` with the final list plus a per-type estimate, print the summary, and stop. Do not run Step 3.5 or Step 4.
231
+
232
+ ---
233
+
234
+ ## Step 3.5 — Globals-first phase (V03-0, hydrated-snapshot powered)
235
+
236
+ **Default ON** when `crawl.pageCount >= 3` AND the `--no-globals` flag was NOT passed AND at least one page in `pagesConfirmed[]` has `type === 'home'`. Otherwise, skip Step 3.5 entirely, write one line to `decisions.md` (`Step 3.5 skipped: <reason>`), and continue to Step 4 with `globalsExtracted = { header: null, footer: null, status: 'skipped' }`.
237
+
238
+ Header/footer are **shared chrome**: they appear on EVERY page, so a bug in either one affects the whole site. This dedicated phase runs BEFORE the page loop to avoid spending time on individual pages when the globals are broken.
239
+
240
+ **Difference from the previous attempt (reverted in v0.2.2)**: `sb-inspect-live` now emits `inspection.hydratedHeader` / `inspection.hydratedFooter`, structured payloads (links, headings, inputs, forms, images) captured WHILE THE CHROME IS IN VIEW and still hydrated. This removes the lazy-light-DOM problem on Shopify (where `<store-footer-menu>` menus populate their `<li>` items via JS post-intersection-out). The composer builds REAL markup, not a static image slice.
241
+
242
+ Orchestrator state:
243
+
244
+ ```
245
+ globalsExtracted = {
246
+ header: null | <output-path>, // e.g. "clean/global/header.html"
247
+ footer: null | <output-path>,
248
+ status: 'extracted' | 'skipped' | 'failed',
249
+ failureReason: null | <string>,
250
+ headerBbox: null | {x,y,w,h},
251
+ footerBbox: null | {x,y,w,h},
252
+ homeInspectionPath: null | <path>,
253
+ }
254
+ ```
255
+
256
+ ### Step 3.5a — Inspect home (Bash, dedicated)
257
+
258
+ ```bash
259
+ node .claude/skills/sb-inspect-live/scripts/inspect-live.mjs \
260
+ --url "{home-page.url}" \
261
+ --viewport-width {default_viewport} \
262
+ --viewport-height 844 \
263
+ --output-dir "{output_folder}/{project-slug}/.sb-memory/inspect-globals-{ts}"
264
+ ```
265
+
266
+ Where `{home-page}` is the first entry in `pagesConfirmed[]` with `type === 'home'`. Set `globalsExtracted.homeInspectionPath` to the resulting path. If the inspect fails (`widgetBlocked`, non-zero exit), set `globalsExtracted.status = 'failed'` and `failureReason = 'home-inspect-failed: <stderr>'`, then go to Step 3.5g.
267
+
268
+ ### Step 3.5b — Detect hydrated chrome
269
+
270
+ Load the `inspection.json` from Step 3.5a:
271
+
272
+ - **Header**: if `inspection.hydratedHeader !== null` AND `inspection.hydratedHeader.links.length >= 2`, set `globalsExtracted.headerBbox = inspection.hydratedHeader.bbox`. Otherwise, set `globalsExtracted.header = null` (skip header compose).
273
+ - **Footer**: if `inspection.hydratedFooter !== null` AND (`inspection.hydratedFooter.links.length >= 3` OR `inspection.hydratedFooter.forms.length >= 1`), set `globalsExtracted.footerBbox = inspection.hydratedFooter.bbox`. Otherwise, set `globalsExtracted.footer = null`.
274
+
275
+ If BOTH were skipped, set `globalsExtracted.status = 'skipped'` and `failureReason = 'no-hydrated-chrome-detected'`. Log `[build-site] Step 3.5: no hydrated header/footer in home inspection — pages keep chrome inline`. Go straight to Step 4 (this is a valid scenario, not a failure).
276
+
277
+ ### Step 3.5c — Compose globals (Skill per target)
278
+
279
+ For each detected one (`header` and/or `footer`), invoke the composer:
280
+
281
+ ```
282
+ Skill(
283
+ skill="sb-build-wp",
284
+ args="inspection={globals-inspection-path} assets-map={assets-map-path} output-path=clean/global/{header|footer}.html preset=wp-elementor target-section={header|footer}"
285
+ )
286
+ ```
287
+
288
+ The composer consumes `inspection.hydratedHeader` / `inspection.hydratedFooter` (per SKILL.md §V03-0a): real markup with links, forms, and images.
289
+
290
+ (Use `sb-build-shopify` when `target === shopify`; same pattern.)
291
+
292
+ Set `globalsExtracted.header` and/or `.footer` to the output paths. If the composer fails, set `globalsExtracted.status = 'failed'` and `failureReason = 'compose-failed: <which> <stderr>'`, then go to Step 3.5g.
293
+
294
+ ### Step 3.5d — Test interactivity (Bash, HARD blocker)
295
+
296
+ Unlike Step 4f-bis (diagnostic only, on individual pages), the gate here is HARD. For each one:
297
+
298
+ ```bash
299
+ node .claude/skills/sb-test-interactivity/scripts/test-interactivity.mjs \
300
+ --file "clean/global/{header|footer}.html" \
301
+ --preset "{preset}" \
302
+ --output-dir "{output_folder}/{project-slug}/reports/validations/global-{ts}"
303
+ ```
304
+
305
+ If `passed === false`, set `globalsExtracted.status = 'failed'` and `failureReason = 'interactivity-failed: <which>'`, then go to Step 3.5g.
306
+
307
+ ### Step 3.5e — Crop-focused visual compare
308
+
309
+ Precondition: each fragment has already been composed. Render each one and compare it against the matching crop of the live screenshot.
310
+
311
+ **3.5e.i — Fragment render**:
312
+
313
+ ```bash
314
+ node .claude/skills/sb-validate-static-render/scripts/validate-static-render.mjs \
315
+ --file "clean/global/{header|footer}.{html|liquid}" \
316
+ --preset "{preset}" \
317
+ --output-dir "{output_folder}/{project-slug}/reports/validations/global-{ts}/{header|footer}/"
318
+ ```
319
+
320
+ **3.5e.ii — Bbox on live**: use `globalsExtracted.headerBbox` (or `footerBbox`). If `w === 0 || h === 0`, log a warning and omit `--crop-live-bbox` (compare-visual falls back to a full-screen comparison).
321
+
322
+ **3.5e.iii — Compare**:
323
+
324
+ ```bash
325
+ node .claude/skills/sb-compare-visual/scripts/compare-visual.mjs \
326
+ --live-screenshot "{globals-inspection-screenshot}" \
327
+ --build-screenshot "{globals-render-screenshot}" \
328
+ --output-dir "{output_folder}/{project-slug}/reports/diffs/global-{ts}/{header|footer}/" \
329
+ --tokens-live "{globals-inspection-path}" \
330
+ --tokens-build "{globals-render-json-path}" \
331
+ --threshold {diff_threshold_percent} \
332
+ [--crop-live-bbox "{x},{y},{w},{h}"] \
333
+ [--crop-build-bbox "0,0,{width},{totalHeight}"]
334
+ ```
335
+
336
+ If `passed === false`, set `globalsExtracted.status = 'failed'` and `failureReason = 'visual-diff: <which> <diffPercent>%'`, then go to Step 3.5g.
337
+
338
+ ### Step 3.5f — Success
339
+
340
+ Set `globalsExtracted.status = 'extracted'`. Log `[build-site] Step 3.5 ✅ globals extracted: header={path}, footer={path}`. Continue to Step 4.
341
+
342
+ ### Step 3.5g — Hard gate (early exit on failure)
343
+
344
+ If `globalsExtracted.status === 'failed'`:
345
+
346
+ 1. Write a minimal `reports/index.html` with the reason plus links to the diagnostic artifacts.
347
+ 2. Append to `decisions.md`: `{ts} | /build-site {root-URL} | Step 3.5 BLOCKED: {failureReason} | pages NOT processed`.
348
+ 3. Console:
349
+
350
+ ```
351
+ ❌ /build-site {root-URL}
352
+
353
+ Step 3.5 (globals pass) bloqueou o batch.
354
+ Motivo: {failureReason}
355
+ Artefatos: {output_folder}/{project-slug}/reports/index.html
356
+
357
+ Páginas individuais NÃO foram processadas — fix os globals e re-rode.
358
+ ```
359
+ 4. Stop.
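
The Step 3.5b thresholds (header needs at least 2 links; footer needs at least 3 links or 1 form) can be sketched as a pure function over the inspection payload. This is a sketch under assumptions: the function name `detectHydratedChrome` and the returned `skipped` field are hypothetical, and only the `hydratedHeader` / `hydratedFooter` fields named in the text are relied on.

```javascript
// Sketch only: `detectHydratedChrome` and the `skipped` flag are illustrative;
// the thresholds mirror the Step 3.5b rules.
function detectHydratedChrome(inspection) {
  const header = inspection.hydratedHeader ?? null
  const footer = inspection.hydratedFooter ?? null
  // Treat a missing array the same as an empty one.
  const count = (list) => (Array.isArray(list) ? list.length : 0)

  const headerOk = header !== null && count(header.links) >= 2
  const footerOk =
    footer !== null && (count(footer.links) >= 3 || count(footer.forms) >= 1)

  const skipped = !headerOk && !footerOk
  return {
    headerBbox: headerOk ? header.bbox : null,
    footerBbox: footerOk ? footer.bbox : null,
    skipped,
    failureReason: skipped ? 'no-hydrated-chrome-detected' : null,
  }
}
```

A footer with no links but a newsletter form still qualifies, which matches the `forms.length >= 1` branch of the rule.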
231
360
 
232
361
  ---
233
362
 
@@ -271,6 +400,8 @@ node .claude/skills/sb-inspect-live/scripts/inspect-live.mjs \
271
400
  --output-dir "{output_folder}/{project-slug}/.sb-memory/inspect-{slug}-{ts}"
272
401
  ```
273
402
 
403
+ **Re-use of the home inspection (V03-0a):** if `page.type === 'home'` AND `globalsExtracted.homeInspectionPath !== null`, SKIP this inspection and re-use `globalsExtracted.homeInspectionPath` as this step's `{inspection-path}`. Step 3.5a already inspected the home page; re-running would cost ~5-8s and produce identical content. Log `[build-site] Step 4b: re-using home inspection from Step 3.5 ({path})`.
404
+
274
405
  Capture `inspection`. Batch-specific branches:
275
406
 
276
407
  - `inspection.widgetBlocked === true` → mark the page as `❌` in `pageResults[]`, note the reason, and **continue to the next page** (do not stop the batch). Exception: if this is the FIRST page AND it is the `home`, stop and escalate; the whole site is probably behind a bot wall.
@@ -291,50 +422,29 @@ node .claude/skills/sb-extract-assets/scripts/extract-assets.mjs \
291
422
 
292
423
  `assetsMap.failed[]` non-empty → forward it to the builder; do **not** stop the page.
293
424
 
294
- ### Step 4d — Extract global sections (shared header/footer)
425
+ ### Step 4d — Strip globals from per-page inspection (V03-0a, post-Step 3.5)
295
426
 
296
- **Default ON** when `crawl.pageCount >= 3` AND the `--no-globals` flag was NOT passed. Otherwise, skip this Step entirely and write one line to `decisions.md` (`Step 4d skipped: pageCount<3` OR `Step 4d skipped: --no-globals opt-out`).
427
+ Header/footer detection and composition **now run in Step 3.5** (Globals-first phase, before the loop). The only job here is to ensure individual pages do NOT include header/footer in the generated markup: the target renders the globals ONCE via `clean/global/{header,footer}.{html|liquid}`.
297
428
 
298
- State kept by the orchestrator during the batch:
429
+ **Skip this Step** if `globalsExtracted.status !== 'extracted'` (Step 3.5 was `skipped` or returned with no chrome detected). Pages keep their chrome inline, which is the pre-v0.3.0 behavior for sites without global chrome.
299
430
 
300
- ```
301
- globalsExtracted = {
302
- header: null | <output-path>, // e.g. "clean/global/header.html"
303
- footer: null | <output-path>,
304
- }
305
- ```
306
-
307
- **Structural detector** (no hashing, no regex over raw HTML):
308
-
309
- - **Header** = the first element in `inspection.dom` (DFS descent) whose `tag === 'header'` AND that has some descendant whose `tag === 'nav'`. If absent: **skip globals (neither extract nor strip) + log a warning** in `decisions.md`: `[build-site] WARN: Step 4d header detector missed — no <header> semântico com <nav>; globals not extracted, pages keep chrome inline`. Continue the pipeline normally. **Do NOT ask the human**: non-negotiable #5 (`single human checkpoint = page list confirmation`) forbids mid-batch queries. Any manual fix is deferred to the next run with `--no-globals` or a crawl refresh.
310
-
311
- - **Footer** = the last element that is a **direct child of body or main** (i.e., `parent.tag === 'body'` OR `parent.tag === 'main'`) whose `tag === 'footer'` OR whose `role` attribute is `contentinfo`. **NOT** the last one in post-order DFS: that picked up a blog-post `<footer>` or article `<footer>` instead of the site footer (an example-shop-class bug on content sites). If absent: skip + warn (same pattern as the header above).
312
-
313
- **Trigger window:** Step 4d runs only when `globalsExtracted.header === null` (and separately for the footer), i.e. only on the first inspection that matches the detector. Later inspections do not re-process: the asset is already on disk and is referenced.
314
-
315
- **Extraction:**
316
- 1. Locate the header and/or footer in `inspection.dom` (in memory).
317
- 2. Serialize each one to an HTML/Liquid fragment through the composer (`sb-build-wp` or `sb-build-shopify`) with the hint `--target-section=header|footer` (the composer treats it as section input, not a whole page). Output: `clean/global/{header,footer}.{html|liquid}`.
318
- 3. Set `globalsExtracted.header` (and/or `.footer`) = output path.
319
-
320
- **Stripper (mandatory when globals were extracted):** BEFORE invoking the composer for individual pages (Step 4e):
431
+ **Stripper (mandatory when `globalsExtracted.status === 'extracted'`):** BEFORE invoking the composer for this page (Step 4e):
321
432
 
322
433
  1. Clone the `inspection` in memory.
323
- 2. In `inspection.dom`, **remove** the subtrees of the semantic `<header>` (if `globalsExtracted.header !== null`) and of the `<footer>`/`[role=contentinfo]` (if `globalsExtracted.footer !== null`).
324
- 3. Pass the edited inspection to the composer. The composer does not see the header/footer and must not fabricate them.
325
- 4. After the composer returns, **prepend** a comment line to the output:
434
+ 2. In `inspection.dom`, **remove** the subtrees of the semantic `<header>` (if `globalsExtracted.header !== null`) and of the `<footer>` / `[role=contentinfo]` (if `globalsExtracted.footer !== null`). Use the "immediate parent" heuristic: a `<footer>` that is a direct child of body or main qualifies; an `<article><footer>` nested in blog layouts does NOT.
435
+ 3. Also clear `inspection.hydratedHeader` and `inspection.hydratedFooter` (set to null) in the edited copy, so the composer does not try to re-render the global chrome on the individual page.
436
+ 4. Pass the edited inspection to the composer.
437
+ 5. After the composer returns, **prepend** a comment line to the output:
326
438
  - HTML: `<!-- sb-build-site: header in clean/global/header.html -->\n` (if header extracted)
327
439
  - HTML: `<!-- sb-build-site: footer in clean/global/footer.html -->` (if footer extracted, at the end)
328
440
  - Liquid: same, using `{% comment %} ... {% endcomment %}`
329
- 5. Write the stripped-and-commented output to `clean/{home,pdp,...}/{slug}.html`.
441
+ 6. Write the stripped-and-commented output to `clean/{home,pdp,...}/{slug}.html`.
330
442
 
331
- **Stripper validation:** after writing the file, run a quick grep:
443
+ **Stripper validation:** after writing, run grep:
332
444
  ```
333
445
  grep -nE "<header[ >]|<footer[ >]" clean/{home,pdp,...}/{slug}.{html,liquid}
334
446
  ```
335
- The pattern has no `^` anchor so it catches pretty-printed output (indented ` <header>`). The `{html,liquid}` glob covers both targets (WP + Shopify). A match is a defect: either the composer ignored the edited inspection and re-fabricated the tag, or the stripper let a malformed tag through. Log `[build-site] WARN: stripper miss in {slug}.{ext}` in `decisions.md`. Non-blocking: the orchestrator continues.
336
-
337
- **Why a structural detector and not a hash:** hashing requires the inspection to emit comparable hashes. The structural detector uses only semantic tags that `sb-inspect-live` already captures in the DOM tree. It works with any inspection that has a serialized DOM, with no upstream change. It covers the common case (WordPress/Shopify sites use semantic `<header>`/`<footer>`). For sites whose chrome lives in a non-semantic `<div class="site-header">`, the Ask First gate forces a conscious decision instead of a silent heuristic.
447
+ On a match, log `[build-site] WARN: stripper miss in {slug}.{ext}` in `decisions.md`. Non-blocking: the orchestrator continues, but the miss becomes an entry under "Stripper misses" in the final report.
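
The stripper steps above (clone, remove header/footer subtrees with the immediate-parent rule, null the hydrated payloads) can be sketched as follows. The node shape `{ tag, attrs, children }` is an assumption about `inspection.dom`, the function name is hypothetical, and stripping `<header>` at any depth is a simplification, since the text gives a nesting rule only for the footer.

```javascript
// Sketch only: `stripGlobalsFromInspection` and the `{ tag, attrs, children }`
// node shape are assumptions made for illustration.
function stripGlobalsFromInspection(inspection, globalsExtracted) {
  const stripHeader = globalsExtracted.header !== null
  const stripFooter = globalsExtracted.footer !== null

  // Immediate-parent rule: only a footer directly under body or main counts
  // as the site footer; <article><footer> is left alone.
  const isSiteFooter = (node, parent) =>
    (node.tag === 'footer' || node.attrs?.role === 'contentinfo') &&
    (parent.tag === 'body' || parent.tag === 'main')

  // Rebuild the tree instead of mutating it, so the original stays intact.
  const prune = (node) => ({
    ...node,
    children: (node.children ?? [])
      .filter((child) => !(stripHeader && child.tag === 'header'))
      .filter((child) => !(stripFooter && isSiteFooter(child, node)))
      .map(prune),
  })

  return {
    ...inspection,
    dom: inspection.dom.map(prune),
    // Cleared so the composer cannot re-render global chrome on this page.
    hydratedHeader: null,
    hydratedFooter: null,
  }
}
```

Because the function returns a new object, the same home inspection can still be re-used elsewhere without carrying the stripped state.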
338
448
 
339
449
  ### Step 4e — Build (Skill)
340
450
 
@@ -492,11 +602,13 @@ Depois que TODAS as páginas confirmadas foram processadas (`pageResults[].lengt
492
602
 
493
603
  1. **Write `{output_folder}/{project-slug}/reports/index.html`** with:
494
604
  - Header with root URL, target, project-slug, run timestamp, totals (`X✅ / Y⚠️ / Z❌`).
605
+ - **Section "Globals extraction" (V03-0a)** right below the header, BEFORE the table: badge ✅/❌/⏭️ (extracted/failed/skipped) + `globalsExtracted.status`, `globalsExtracted.failureReason` when applicable, links to `clean/global/header.html` and `clean/global/footer.html` when extracted, a link to the global phase's `interactivity-report.json`, and a link to the global phase's diff map. If `status === 'failed'`, this section is the only thing rendered before the "batch BLOCKED" notice; there is no page table (Step 3.5g already aborted).
495
606
  - Table with one row per page: status badge (with `+!interactive` suffixes when applicable), type, URL → output path link, diff %, **coverage %**, iterations, side-by-side screenshot (live + build, thumbnails linking to the full image), link to the diff map, violations summary, link to `interactivity-report.json` when `interactivityWarnings.length > 0`.
496
607
  - Section "Auto-correct details" listing pages with iteration > 0 and what changed.
497
608
  - Section "Escalations" with ⚠️/❌ pages: top visual diffs and candidate fixes inline.
498
609
  - Section "Interactivity warnings" listing pages with `interactivityWarnings.length > 0`, grouped by the type of failed test (aria-controls / details-summary / dialog) and showing the trigger/target + check name per failure.
499
610
  - Section "Coverage warnings" listing pages with `coverage.ratio < 0.85`, sorted by ratio ascending, so the first entry is the most critical.
611
+ - Section "Stripper misses" (V03-0a): list the `decisions.md` entries matching `WARN: stripper miss in <slug>` when `globalsExtracted.status === 'extracted'`, with a link to the offending file for manual review.
500
612
  - Footer with: link to `pages-confirmed.json`, link to `crawl/pages-list.json` (raw discovery), config snapshot.
501
613
 
502
614
  2. **Cumulative persistence.** If `report.html` already exists (rerun), preserve earlier runs in a "Previous runs" section (hierarchical by timestamp). The most recent run goes at the top. Do NOT overwrite everything.
@@ -37,7 +37,17 @@ A single `.html` file written to `outputPath`. The file is a fragment — no `<h
37
37
 
38
38
  ## On Activation
39
39
 
40
- 1. **Read the inputs.** Parse `inspection.json` (capture `sectionType`, `tokens`, `dom`, `pseudoElements`, `imgUrls`) and `assets-map.json` (the URL → localPath / inline-SVG dictionary). If `fixHints` is given, also read `previousHtmlPath`.
40
+ 1. **Read the inputs.** Parse `inspection.json` (capture `sectionType`, `tokens`, `dom`, `pseudoElements`, `imgUrls`, **`hydratedHeader`**, **`hydratedFooter`**) and `assets-map.json` (the URL → localPath / inline-SVG dictionary). If `fixHints` is given, also read `previousHtmlPath`.
41
+
42
+ **§V03-0a — `hydratedHeader` / `hydratedFooter` payload.** When `inspect-live` captures the page chrome WHILE THE BOTTOM IS IN VIEW, it snapshots the hydrated HTML before Shopify themes tear menus down on intersection-out. The payload is the canonical source of truth for header/footer composition — prefer it over `dom[]` subtrees, which may be empty `<ul>` shells if the live walker ran after tear-down. Each one has:
43
+ - `html`: outerHTML of the chrome subtree (string).
44
+ - `bbox`: `{x,y,w,h}` of the chrome at snapshot time (may reflect bottom-scroll position; the walker emits its own subtree bbox on `dom[]` if you need the at-top layout).
45
+ - `links[]`: every `<a href>` inside, as `{href, text, label}`. Use these for menu items, policy links, social, etc.
46
+ - `headings[]`: every `<h1>`...`<h6>` and `<p class="bold">`, as `{tag, text}`. Use these for column titles ("More Information", "Collections", "Need Help?").
47
+ - `inputs[]` / `forms[]`: full input + form metadata for newsletter signup or contact widgets — preserve `action`/`method` verbatim so the form actually submits.
48
+ - `images[]`: every `<img>` with src + alt + dimensions.
49
+
50
+ When composing a `--target-section=header` or `--target-section=footer` fragment AND the matching `hydratedHeader`/`hydratedFooter` is non-null, BUILD FROM THE STRUCTURED PAYLOAD, not from `dom[]`. This is mandatory whenever the payload is present — it eliminates the entire class of "shadow-dom-opaque" / lazy-light-DOM fabrication risks that previously forced image-slice fallback.
41
51
 
42
52
  1.5. **Pre-dispatch checklist.** Before composing, run:
43
53
 
@@ -84,6 +94,34 @@ A single `.html` file written to `outputPath`. The file is a fragment — no `<h
84
94
  | Pattern doesn't match any section in `inspection.dom` | Section type detected but markup is ambiguous | Pick the closest pattern from A-H, document the choice in a comment, and surface to the orchestrator. |
85
95
  | Prettier missing | Optional dep not installed | Non-fatal — `write` reports `formatted: false, formatterSkippedReason: "prettier-not-installed"`. Output is still written. |
86
96
 
97
+ ## §V03-0a — Header / Footer composition from hydrated snapshot
98
+
99
+ When `--target-section=header` or `--target-section=footer` is passed AND the corresponding `inspection.hydratedHeader` / `inspection.hydratedFooter` is non-null, follow this composition recipe instead of the generic A-H pattern lookup:
100
+
101
+ **Footer recipe** (when `hydratedFooter` present):
102
+
103
+ 1. **Outer shell.** `<footer class="es-footer">` with reset + `box-sizing: border-box` + the page background-color taken from `inspection.tokens.colors.background` (or fallback `#fafafa`).
104
+ 2. **Grouping.** Split `hydratedFooter.links` into clusters by their adjacent heading in `hydratedFooter.headings`. Order of headings reflects the live DOM order. For each heading: emit a column with `<p class="footer__col-title">{heading.text}</p>` followed by `<ul>` of `<li><a href>` from the links bucketed under that heading.
105
+ - Heuristic for bucketing: walk `hydratedFooter.html` once (in your head, you have the raw outerHTML in `hydratedFooter.html` if needed) — links that appear AFTER a heading and BEFORE the next heading belong to that heading. If you can't disambiguate, fall back to grouping by `href` prefix patterns (`/policies/` → "More Information", `/products/` → "Collections", `/pages/` → split by name).
106
+ 3. **Newsletter.** If `hydratedFooter.forms[]` is non-empty AND `hydratedFooter.inputs[]` contains an `email`-typed input, emit a real `<form action="{form.action}" method="{form.method}">` with the email input, preserving `name`, `placeholder`, `required`, `aria-label`. Hidden inputs (`type=hidden`) are preserved verbatim — they're typically Shopify form-type tokens. Add a submit `<button type="submit">` even if not in the source.
107
+ 4. **Social media.** Detect social links by `href` matching `/facebook|instagram|tiktok|youtube|x\.com|twitter|pinterest/`. Group in a separate `<ul class="footer__social">` with inline SVG icons (you can ship the standard FB/IG icons from the bundled patterns). Skip if no matches.
108
+ 5. **Payment icons.** Detect by `hydratedFooter.images[]` with `alt` matching `/amazon|visa|mastercard|amex|apple pay|google pay|discover|diners|shop pay|paypal/i` — emit those as a `<ul class="footer__payments">` with `<img>` referencing the `assetsMap` (resolve src via the standard pipeline). If `images[]` is empty but you'd expect them, fall back to text labels.
109
+ 6. **Copyright.** If any link or heading text matches `© YEAR, Brand.`, preserve verbatim at the bottom.
110
+ 7. **NO image-slice fallback.** When hydrated data is present, never compose a single `<img>` of the footer. The hydrated payload gives everything needed for real markup.
111
+
112
+ **Header recipe** (when `hydratedHeader` present):
113
+
114
+ 1. **Outer shell.** `<header class="es-header">` with sticky position if the page tokens indicate so.
115
+ 2. **Promo bar.** If any link's text matches a promotional pattern (`/sale|discount|free|% off/i`) OR `hydratedHeader.headings[]` contains a short standalone heading at the top, emit `<div class="es-header__promo">{text}</div>`.
116
+ 3. **Brand.** If `hydratedHeader.images[]` includes one with `alt` matching the brand name (derived from URL or generic `logo`), emit `<a class="es-header__brand" href="/">` with that `<img>`.
117
+ 4. **Nav.** Take `hydratedHeader.links` filtered to category-looking hrefs (`/products/...`, `/collections/...`, top-level pages). Emit as `<nav><ul>{links}</ul></nav>`. On mobile (`@media (max-width: 749px)`), collapse into a `<details><summary aria-label="Menu">` drawer — keep all category links inside.
118
+ 5. **Utility icons.** Links with text "Open search", "Account", "Cart" (or matching hrefs `/search`, `/account`, `/cart`) get rendered as a right-side cluster of icon buttons.
119
+
120
+ **Output contract for header/footer:**
121
+ - Markup includes a top comment `<!-- sb-build-wp: composed from hydrated snapshot (V03-0a) -->`.
122
+ - `fabricationRisks` array in the metadata is empty for these sections (hydrated data IS the truth — no fabrication concern).
123
+ - If `hydratedHeader`/`hydratedFooter` is NULL or empty, fall back to the prior A-H pattern lookup OR — as last resort — image-slice from screenshot bbox.
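
The social-link and payment-icon detection in the footer recipe can be sketched with the two patterns quoted there. The function name `partitionFooter` and the output shape are hypothetical; the link and image fields (`href`, `alt`) follow the payload description above.

```javascript
// Sketch only: `partitionFooter` is an illustrative helper. The two patterns
// are copied from the footer recipe (steps 4 and 5).
const SOCIAL_HREF = /facebook|instagram|tiktok|youtube|x\.com|twitter|pinterest/
const PAYMENT_ALT = /amazon|visa|mastercard|amex|apple pay|google pay|discover|diners|shop pay|paypal/i

function partitionFooter(hydratedFooter) {
  const social = []
  const menu = []
  for (const link of hydratedFooter.links ?? []) {
    // Social links go to their own cluster; everything else stays a menu item.
    if (SOCIAL_HREF.test(link.href)) social.push(link)
    else menu.push(link)
  }
  // Payment icons are recognized by alt text, not by src.
  const payments = (hydratedFooter.images ?? []).filter((img) =>
    PAYMENT_ALT.test(img.alt ?? ''),
  )
  return { social, menu, payments }
}
```

One caveat worth checking in practice: the unanchored `x\.com` alternative also matches hosts that merely end in "x.com" (for example `dropbox.com`), so a stricter hostname comparison may be preferable.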
124
+
87
125
  ## Conventions
88
126
 
89
127
  - Bare paths (e.g. `scripts/build-wp.mjs`) resolve from the skill root.
@@ -29,7 +29,7 @@ Optional:
29
29
  --wait-strategy <name> lazy-load (default) | auto | kaching-bundles | judge-me
30
30
  --max-depth <n> DOM walk max depth (default 8).
31
31
  --max-children <n> Max children kept per node (default 60).
32
- --max-text <n> Max chars of direct text per node (default 240).
32
+ --max-text <n> Max chars of direct text per node (default 0 = no truncation; pass a positive integer to cap).
33
33
  --timeout <ms> Per-step timeout (default 30000).
34
34
  --help Show this message.
35
35
 
@@ -55,7 +55,7 @@ const { values } = parseArgs({
55
55
  'output-dir': { type: 'string' },
56
56
  'max-depth': { type: 'string', default: '8' },
57
57
  'max-children': { type: 'string', default: '60' },
58
- 'max-text': { type: 'string', default: '240' },
58
+ 'max-text': { type: 'string', default: '0' },
59
59
  timeout: { type: 'string', default: '30000' },
60
60
  help: { type: 'boolean', default: false },
61
61
  },
@@ -196,9 +196,107 @@ async function main() {
196
196
  window.scrollTo(0, y)
197
197
  await sleep(150)
198
198
  }
199
- window.scrollTo(0, 0)
200
199
  })
201
200
 
201
+ // §V03-0a — lazy-hydration wait WHILE SCROLLED TO BOTTOM. Many
202
+ // Shopify themes populate footer menus (Contact, Privacy Policy,
203
+ // Collections links) via <store-footer-menu>-style custom elements
204
+ // only AFTER the host intersects the viewport. Wait here, with the
205
+ // footer in view, for any <footer> <ul> to gain children or for
206
+ // <footer> <a href> count to reach >= 3.
207
+ log('waiting for lazy-hydrated footer content (max 5s)')
208
+ try {
209
+ await page.waitForFunction(
210
+ () => {
211
+ const lists = document.querySelectorAll('footer ul, [class*="footer"] ul')
212
+ for (const ul of lists) {
213
+ if (ul.children.length > 0) return true
214
+ }
215
+ const links = document.querySelectorAll('footer a[href], [class*="footer"] a[href]')
216
+ return links.length >= 3
217
+ },
218
+ { timeout: 5000 },
219
+ )
220
+ log('footer content hydrated')
221
+ } catch (_) {
222
+ log('footer hydration timeout — proceeding')
223
+ }
224
+
225
+ // §V03-0a — capture hydrated chrome HTML WHILE STILL IN VIEW.
226
+ // Shopify themes that populate footer menus on viewport intersection
227
+ // sometimes tear the content down when the host leaves the viewport.
228
+ // The walker can't see torn-down menus. Snapshot the relevant
229
+ // subtrees here, before we scroll back to the top for layout reads,
230
+ // and stash them on `window` for extractInPage to read during walk.
231
+ log('snapshotting hydrated header/footer HTML for downstream use')
232
+ await page.evaluate(() => {
233
+ function pickFooter() {
234
+ const candidates = document.querySelectorAll(
235
+ 'footer, [class*="footer"][class*="section"], [role="contentinfo"]',
236
+ )
237
+ let best = null
238
+ for (const el of candidates) {
239
+ const r = el.getBoundingClientRect()
240
+ if (r.height < 50) continue
241
+ const links = el.querySelectorAll('a[href]').length
242
+ const inputs = el.querySelectorAll('input,form').length
243
+ const score = links + inputs * 2 + r.height / 100
244
+ if (!best || score > best.score) best = { el, score }
245
+ }
246
+ return best?.el || null
247
+ }
248
+ function pickHeader() {
249
+ const candidates = document.querySelectorAll(
250
+ 'header, [class*="header"][class*="section"], [role="banner"]',
251
+ )
252
+ let best = null
253
+ for (const el of candidates) {
254
+ const r = el.getBoundingClientRect()
255
+ if (r.height < 30) continue
256
+ const links = el.querySelectorAll('a[href]').length
257
+ const score = links + r.height / 100
258
+ if (!best || score > best.score) best = { el, score }
259
+ }
260
+ return best?.el || null
261
+ }
262
+ const footer = pickFooter()
263
+ const header = pickHeader()
264
+ window.__sbHydratedFooter = footer
265
+ ? {
266
+ html: footer.outerHTML,
267
+ bbox: (() => {
268
+ const r = footer.getBoundingClientRect()
269
+ return {
270
+ x: Math.round(r.x + window.scrollX),
271
+ y: Math.round(r.y + window.scrollY),
272
+ w: Math.round(r.width),
273
+ h: Math.round(r.height),
274
+ }
275
+ })(),
276
+ }
277
+ : null
278
+ window.__sbHydratedHeader = header
279
+ ? {
280
+ html: header.outerHTML,
281
+ bbox: (() => {
282
+ const r = header.getBoundingClientRect()
283
+ return {
284
+ x: Math.round(r.x + window.scrollX),
285
+ y: Math.round(r.y + window.scrollY),
286
+ w: Math.round(r.width),
287
+ h: Math.round(r.height),
288
+ }
289
+ })(),
290
+ }
291
+ : null
292
+ })
293
+
294
+ // Return to the top before layout reads. content-visibility:auto on
295
+ // viewport-distant nodes returns {w:0,h:0} bboxes — walker can't
296
+ // read layout from the bottom. 1.2s settle for any re-hydration.
297
+ await page.evaluate(() => window.scrollTo(0, 0))
298
+ await page.waitForTimeout(1200)
299
+
202
300
  // Force-eager: any <img loading="lazy"> that didn't intersect during scroll
203
301
  // gets rewritten to eager and re-fetched. <picture> sources too.
204
302
  log('forcing eager image fetch')
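The header/footer pickers in this hunk rank candidates with a small additive score: link count, plus (for the footer) twice the number of inputs/forms, plus height divided by 100, after discarding elements below a minimum height. A minimal sketch of that ranking on plain objects rather than DOM elements follows; `scoreCandidate`, `pickBest`, and the sample data are hypothetical names and values used only for illustration, not part of the package.

```javascript
// Hypothetical standalone version of the scoring seen in the diff.
// Inputs are plain objects standing in for measured DOM elements.
function scoreCandidate({ links, inputs = 0, height }) {
  // The footer variant weights inputs/forms double; the header variant in the
  // diff has no inputs term, which is equivalent to inputs = 0 here.
  return links + inputs * 2 + height / 100
}

function pickBest(candidates, minHeight) {
  let best = null
  for (const c of candidates) {
    if (c.height < minHeight) continue // too short to be real chrome
    const score = scoreCandidate(c)
    if (!best || score > best.score) best = { candidate: c, score }
  }
  return best ? best.candidate : null
}

const footers = [
  { id: 'thin-bar', links: 2, inputs: 0, height: 40 },    // rejected: under 50px
  { id: 'newsletter', links: 3, inputs: 2, height: 200 }, // 3 + 4 + 2 = 9
  { id: 'sitemap', links: 8, inputs: 0, height: 300 },    // 8 + 0 + 3 = 11
]

console.log(pickBest(footers, 50).id) // sitemap
```

Because the comparison is a strict `>`, the first candidate wins on a tie, so document order acts as the tiebreaker.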
@@ -960,6 +1058,12 @@ function extractInPage({ selector, maxDepth, maxChildren, maxText }) {
960
1058
  // iframes (Klaviyo embeds, recaptcha) are recorded as opaque rectangles.
961
1059
  let shadowDOMTraversed = false
962
1060
  let shadowRootCount = 0
1061
+ // §V03-C — counts hosts whose shadow tree was successfully re-serialized
1062
+ // via getHTML+parseHTMLUnsafe and re-walked into a flattened light-DOM
1063
+ // representation. 0 means the post-render shadow-flatten phase was either
1064
+ // unavailable, skipped, or hit zero open roots.
1065
+ let shadowSerializedHostCount = 0
1066
+ const warnings = []
963
1067
  const externalIframes = []
964
1068
  function classifyIframePurpose(src) {
965
1069
  if (!src) return null
@@ -1091,7 +1195,10 @@ function extractInPage({ selector, maxDepth, maxChildren, maxText }) {
1091
1195
  if (n.nodeType === 3) t += n.nodeValue
1092
1196
  }
1093
1197
  t = t.replace(/\s+/g, ' ').trim()
1094
- if (t.length > maxText) t = `${t.slice(0, maxText)}…`
1198
+ // §V03-B maxText === 0 means "no truncation" (default since v0.3.0).
1199
+ // Pre-v0.3.0 default was 240, which silently clipped policy paragraphs
1200
+ // (privacy/terms) at ~25-36% coverage. The cap is now opt-in via flag.
1201
+ if (maxText > 0 && t.length > maxText) t = `${t.slice(0, maxText)}…`
1095
1202
  return t
1096
1203
  }
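The §V03-B change above makes `maxText === 0` mean "no truncation" while keeping positive values as an opt-in cap. A small sketch of that behavior in isolation, assuming a hypothetical helper name `clipText` (the package inlines this logic in its text collector):

```javascript
// Hypothetical helper mirroring the whitespace normalization and the
// guarded truncation shown in the diff.
function clipText(raw, maxText) {
  let t = raw.replace(/\s+/g, ' ').trim()
  // maxText === 0 disables the cap; any positive value clips and appends "…".
  if (maxText > 0 && t.length > maxText) t = `${t.slice(0, maxText)}…`
  return t
}

const policy = 'x'.repeat(500)
console.log(clipText(policy, 0).length)   // 500: full text kept
console.log(clipText(policy, 240).length) // 241: 240 chars plus the ellipsis
console.log(clipText('  a   b  ', 0))     // "a b"
```

With the pre-v0.3.0 guard (`t.length > maxText` alone), passing `0` would have clipped every non-empty string to just the ellipsis, which is why the `maxText > 0` check is needed for the new default.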
1097
1204
 
@@ -1406,16 +1513,198 @@ function extractInPage({ selector, maxDepth, maxChildren, maxText }) {
1406
1513
  const dom = [walk(root, 0)].filter(Boolean)
1407
1514
  const { sectionType, sectionBoundingBox } = findSectionAndBox(root, !!selector)
1408
1515
 
1516
+ // §V03-C — Shadow DOM serialization via getHTML + parseHTMLUnsafe.
1517
+ // walk() above only sees light DOM and direct `.shadowRoot.children`.
1518
+ // Custom elements whose shadow tree is populated by JS (Shopify
1519
+ // <x-product-form>, <price-list>, <variant-radios>, <store-footer-menu>)
1520
+ // expose content that the children-only walker can technically read,
1521
+ // but `<slot>`-projected content and content composed from declarative
1522
+ // shadow DOM get fragmented. getHTML({ serializableShadowRoots: true })
1523
+ // emits a single HTML string with `<template shadowrootmode="open">`
1524
+ // declarations inline; parseHTMLUnsafe re-attaches those as live shadow
1525
+ // roots in a parsed document, which the same walk() can then flatten
1526
+ // uniformly. Layout (bbox/computedStyle) on the parsed doc is detached
1527
+ // and returns UA defaults — that's an acknowledged trade-off; the
1528
+ // structural content (tags, classes, attrs, text, src) is what makes
1529
+ // PDP gallery/price/variants visible to the composer downstream.
1530
+ const originalShadowRootCount = shadowRootCount
1531
+ const originalShadowDOMTraversed = shadowDOMTraversed
1532
+ // §V03-C — preserve a snapshot of the live-walker dom[] before any
1533
+ // potential substitution. If the re-walk replaces dom[] with the
1534
+ // shadow-flattened tree (which carries detached parsedDoc bboxes
1535
+ // === {0,0,0,0}), downstream consumers that need real layout (e.g.
1536
+ // /build-site Step 3.5e --crop-live-bbox for header/footer compare)
1537
+ // can fall back to domLive. When substitution doesn't fire, domLive
1538
+ // is left null and consumers use dom[] as-is.
1539
+ let domLive = null
1540
+ try {
1541
+ if (typeof document.documentElement.getHTML === 'function' &&
1542
+ typeof Document.parseHTMLUnsafe === 'function') {
1543
+ const hostsSeen = new Set()
1544
+ const hostsCollected = []
1545
+ function collectHostsFrom(el) {
1546
+ if (!el) return
1547
+ if (el.shadowRoot && !hostsSeen.has(el)) {
1548
+ hostsSeen.add(el)
1549
+ hostsCollected.push(el)
1550
+ for (const child of el.shadowRoot.children) collectHostsFrom(child)
1551
+ }
1552
+ for (const child of el.children) collectHostsFrom(child)
1553
+ }
1554
+ collectHostsFrom(document.documentElement)
1555
+
1556
+ if (hostsCollected.length > 0) {
1557
+ const html = document.documentElement.getHTML({
1558
+ serializableShadowRoots: true,
1559
+ shadowRoots: hostsCollected.map((h) => h.shadowRoot),
1560
+ })
1561
+ const parsedDoc = Document.parseHTMLUnsafe(html)
1562
+ const parsedRoot = selector
1563
+ ? parsedDoc.querySelector(selector)
1564
+ : parsedDoc.body
1565
+ if (parsedRoot) {
1566
+ const flattened = walk(parsedRoot, 0)
1567
+ // §V03-C safety guard: only substitute the live dom[] if the
1568
+ // re-walk did not lose value on EITHER axis (nodes or aggregate
1569
+ // text chars). parseHTMLUnsafe returns a detached doc;
1570
+ // getComputedStyle/getBoundingClientRect degrade to UA defaults
1571
+ // (zero bbox, empty computed). On pages where shadow flattening
1572
+ // adds value (Shopify PDPs with populated custom elements) the
1573
+ // gain is large; on pages without that workload (policies,
1574
+ // plain HTML), the re-walk can lose content because hydrated
1575
+ // shadow content visible to the live walker doesn't reproduce
1576
+ // on the detached tree. When that happens, keep the live
1577
+ // walker result and surface a warning.
1578
+ function measureTree(arr) {
1579
+ let nodes = 0
1580
+ let textChars = 0
1581
+ const stack = Array.isArray(arr) ? [...arr] : [arr]
1582
+ while (stack.length) {
1583
+ const node = stack.pop()
1584
+ if (!node || typeof node !== 'object') continue
1585
+ nodes++
1586
+ if (typeof node.text === 'string') textChars += node.text.length
1587
+ if (Array.isArray(node.children)) {
1588
+ for (const c of node.children) stack.push(c)
1589
+ }
1590
+ }
1591
+ return { nodes, textChars }
1592
+ }
1593
+ if (flattened) {
1594
+ const orig = measureTree(dom)
1595
+ const flat = measureTree([flattened])
1596
+ if (flat.nodes >= orig.nodes && flat.textChars >= orig.textChars) {
1597
+ // Snapshot the live walker result BEFORE substitution so
1598
+ // downstream consumers that depend on real layout (bbox
1599
+ // values are zero on the parsed detached doc) can fall
1600
+ // back when needed — see §V03-C domLive comment above.
1601
+ domLive = dom.length === 1 ? dom[0] : [...dom]
1602
+ dom.length = 0
1603
+ dom.push(flattened)
1604
+ shadowSerializedHostCount = hostsCollected.length
1605
+ } else {
1606
+ warnings.push({
1607
+ code: 'shadow-flatten-skipped-lossy',
1608
+ message: `re-walk lossy: nodes ${flat.nodes} vs ${orig.nodes}, textChars ${flat.textChars} vs ${orig.textChars}; keeping live walker result`,
1609
+ })
1610
+ }
1611
+ }
1612
+ // The re-walk of the parsed (detached) doc re-discovers shadow
1613
+ // roots that parseHTMLUnsafe re-attached, so it would
1614
+ // double-count if we let those increments leak. Restore the
1615
+ // canonical live counts here.
1616
+ shadowRootCount = originalShadowRootCount
1617
+ shadowDOMTraversed = originalShadowDOMTraversed
1618
+ }
1619
+ }
1620
+ } else {
1621
+ warnings.push({
1622
+ code: 'shadow-serialize-unavailable',
1623
+ message:
1624
+ 'getHTML or Document.parseHTMLUnsafe not available; shadow DOM falls back to .shadowRoot.children walk (v0.2.x behavior)',
1625
+ })
1626
+ }
1627
+ } catch (err) {
1628
+ shadowRootCount = originalShadowRootCount
1629
+ shadowDOMTraversed = originalShadowDOMTraversed
1630
+ warnings.push({
1631
+ code: 'shadow-serialize-failed',
1632
+ message: String(err && err.message ? err.message : err),
1633
+ })
1634
+ }
1635
+
1636
+ // §V03-0a — extract structured content from the hydrated-chrome
1637
+ // HTML snapshots taken before scroll-back. Shopify themes that
1638
+ // populate menus on viewport intersection may have torn those down
1639
+ // by the time the walker runs. The snapshots preserve the truth.
1640
+ function extractChrome(snapshot) {
1641
+ if (!snapshot || !snapshot.html) return null
1642
+ let parsed
1643
+ try {
1644
+ parsed = new DOMParser().parseFromString(snapshot.html, 'text/html')
1645
+ } catch (_) {
1646
+ return { ...snapshot, links: [], headings: [], inputs: [], forms: [], images: [] }
1647
+ }
1648
+ const root = parsed.body.firstElementChild || parsed.body
1649
+ const norm = (s) => (s || '').replace(/\s+/g, ' ').trim()
1650
+ const links = Array.from(root.querySelectorAll('a[href]'))
1651
+ .map((a) => ({
1652
+ href: a.getAttribute('href') || '',
1653
+ text: norm(a.textContent),
1654
+ label: a.getAttribute('aria-label') || null,
1655
+ }))
1656
+ .filter((l) => l.href && (l.text || l.label))
1657
+ const headings = Array.from(root.querySelectorAll('h1, h2, h3, h4, h5, h6, p.bold, .footer__block-title, [class*="heading"]'))
1658
+ .map((h) => ({ tag: h.tagName.toLowerCase(), text: norm(h.textContent) }))
1659
+ .filter((h) => h.text)
1660
+ const inputs = Array.from(root.querySelectorAll('input')).map((i) => ({
1661
+ type: i.getAttribute('type') || 'text',
1662
+ name: i.getAttribute('name') || null,
1663
+ placeholder: i.getAttribute('placeholder') || null,
1664
+ required: i.hasAttribute('required'),
1665
+ ariaLabel: i.getAttribute('aria-label') || null,
1666
+ }))
1667
+ const forms = Array.from(root.querySelectorAll('form')).map((f) => ({
1668
+ action: f.getAttribute('action') || null,
1669
+ method: f.getAttribute('method') || 'get',
1670
+ id: f.getAttribute('id') || null,
1671
+ }))
1672
+ const images = Array.from(root.querySelectorAll('img'))
1673
+ .map((img) => ({
1674
+ src: img.getAttribute('src') || '',
1675
+ alt: img.getAttribute('alt') || '',
1676
+ width: img.getAttribute('width') || null,
1677
+ height: img.getAttribute('height') || null,
1678
+ }))
1679
+ .filter((img) => img.src)
1680
+ return {
1681
+ html: snapshot.html,
1682
+ bbox: snapshot.bbox,
1683
+ links,
1684
+ headings,
1685
+ inputs,
1686
+ forms,
1687
+ images,
1688
+ }
1689
+ }
1690
+ const hydratedHeader = extractChrome(window.__sbHydratedHeader)
1691
+ const hydratedFooter = extractChrome(window.__sbHydratedFooter)
1692
+
1409
1693
  return {
1410
1694
  sectionType,
1411
1695
  sectionBoundingBox,
1412
1696
  tokens,
1413
1697
  dom,
1698
+ domLive,
1414
1699
  pseudoElements,
1415
1700
  imgUrls,
1416
1701
  shadowDOMTraversed,
1417
1702
  shadowRootCount,
1703
+ shadowSerializedHostCount,
1704
+ warnings,
1418
1705
  externalIframes,
1706
+ hydratedHeader,
1707
+ hydratedFooter,
1419
1708
  }
1420
1709
  }
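The §V03-C safety guard in this hunk only swaps in the shadow-flattened tree when it is at least as large as the live tree on both axes (node count and total text characters); otherwise it keeps the live result and records a warning. The sketch below reproduces that decision on plain object trees. `measureTree` follows the diff; `chooseTree` and the sample trees are hypothetical and exist only to make the rule testable outside a browser.

```javascript
// Count nodes and aggregate text length with an explicit stack (no recursion).
function measureTree(arr) {
  let nodes = 0
  let textChars = 0
  const stack = Array.isArray(arr) ? [...arr] : [arr]
  while (stack.length) {
    const node = stack.pop()
    if (!node || typeof node !== 'object') continue
    nodes++
    if (typeof node.text === 'string') textChars += node.text.length
    if (Array.isArray(node.children)) stack.push(...node.children)
  }
  return { nodes, textChars }
}

// Hypothetical wrapper around the guard: returns what dom/domLive would hold.
function chooseTree(liveDom, flattened) {
  const orig = measureTree(liveDom)
  const flat = measureTree([flattened])
  if (flat.nodes >= orig.nodes && flat.textChars >= orig.textChars) {
    // Substitute, but keep the live tree around for consumers needing real bboxes.
    const domLive = liveDom.length === 1 ? liveDom[0] : [...liveDom]
    return { dom: [flattened], domLive, warning: null }
  }
  return { dom: liveDom, domLive: null, warning: 'shadow-flatten-skipped-lossy' }
}

const live = [{ tag: 'div', text: 'hello', children: [{ tag: 'span', text: 'hi' }] }]
const richer = {
  tag: 'div',
  text: 'hello',
  children: [{ tag: 'span', text: 'hi' }, { tag: 'p', text: 'price' }],
}
// More nodes than the live tree, but no text at all: lossy on the text axis.
const lossy = { tag: 'div', text: '', children: [{}, {}, {}] }

console.log(chooseTree(live, richer).warning) // null
console.log(chooseTree(live, lossy).warning)  // shadow-flatten-skipped-lossy
```

Requiring both comparisons to pass means a re-walk that gains wrapper nodes but loses hydrated text is still rejected, which matches the policy-page case described in the hunk's comments.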
1421
1710
 
@@ -37,6 +37,24 @@ test('--help exits 0 and prints usage', () => {
37
37
  assert.match(r.stdout, /--wait-strategy/)
38
38
  })
39
39
 
40
+ // §V03-B — V0.3.0 changed --max-text default from 240 to 0 (no truncation).
41
+ // This regression-protects the policy-page coverage fix.
42
+ test('--help documents --max-text default 0 (no truncation)', () => {
43
+ const r = spawnSync('node', [SCRIPT, '--help'], { encoding: 'utf8' })
44
+ assert.equal(r.status, 0, `exit code was ${r.status}`)
45
+ assert.match(r.stdout, /--max-text/, '--max-text flag must appear in help')
46
+ // Scope the "default 0" assertion to the --max-text line specifically.
47
+ // The previous /default\s+0/ would match against any other flag whose
48
+ // description happens to contain "default 0" (e.g. a future flag whose
49
+ // default is "0ms" or similar).
50
+ assert.match(
51
+ r.stdout,
52
+ /--max-text[^\n]*default\s+0/,
53
+ '--max-text line must document default 0',
54
+ )
55
+ assert.match(r.stdout, /no truncation/i, 'help must explain semantics')
56
+ })
57
+
40
58
  test('missing --url exits 2', () => {
41
59
  const r = spawnSync('node', [SCRIPT, '--output-dir', '/tmp/sb-test'], { encoding: 'utf8' })
42
60
  assert.equal(r.status, 2, `exit code was ${r.status}`)