@adia-ai/a2ui-compose 0.2.0 → 0.2.2

package/CHANGELOG.md CHANGED
@@ -10,7 +10,70 @@ generator graph.
10
10
 
11
11
  ## [Unreleased]
12
12
 
13
- _No pending changes._
13
+ _Nothing yet._
14
+
15
+ ---
16
+
17
+ ## [0.2.2] - 2026-05-02
18
+
19
+ **Lockstep cut + reasoning-panel emission fixes.** All 8 published `@adia-ai/*` packages bump 0.2.1 → 0.2.2 per [`docs/specs/package-architecture.md` § 15](../../../docs/specs/package-architecture.md#15-versioning-policy). Patch cut — no breaking changes.
20
+
21
+ ### Changed
22
+
23
+ - `version`: `0.2.1` → `0.2.2`.
24
+ - `dependencies["@adia-ai/a2ui-corpus"]`: `^0.2.0` (covers `0.2.2`).
25
+ - `dependencies["@adia-ai/a2ui-utils"]`: `^0.2.0` (covers `0.2.2`).
26
+ - `dependencies["@adia-ai/a2ui-validator"]`: `^0.2.0` (covers `0.2.2`).
27
+ - `dependencies["@adia-ai/a2ui-retrieval"]`: `^0.2.0` (covers `0.2.2`).
28
+ - `core/generator.js` and `strategies/` are the surfaces that gained the new behavior below; no other compose directories touched.
29
+
30
+ ### Fixed — Reasoning-panel emissions surface confidence + retrieval candidates (2026-05-02)
31
+
32
+ `generator.js` was emitting `status` events whose labels looked truthful but hid the data needed to judge their reliability. Two fixes surface the missing data as `outcomes`:
33
+
34
+ - **Interpret stage** now includes the domain confidence percentage and the matched signals. Previously the label `Domain: data` read identically at 3% and at 95% confidence. Now: `Domain: data (3% confidence)`, plus `Matched signals: pricing` as an outcome. Per memory `feedback_panel_label_reliability.md`.
35
+ - **Analyze stage** runs a fast keyword `searchBlocks(intent)` and emits the top-5 candidates with scores + domains. Previously the stage reported `context.patterns.length` from `getContext(intent, 2)` — but the context-assembler only populates `result.patterns` at tier ≥ 3, so the count was structurally guaranteed to be 0. The label read "nothing relevant" when it actually meant "we didn't even look here." The fix runs a real retrieval at the analyze stage instead of relying on the wrong-tier context.
36
+
37
+ Companion fix in `@adia-ai/a2ui-retrieval` Unreleased (web-research.js EXPLICIT/IMPLICIT pattern split). Both surfaces feed the same reasoning-panel deception class — five distinct bugs caught + fixed in commit `9986c71e`.
38
+
39
+ ---
40
+
41
+ ## [0.2.1] - 2026-05-02
42
+
43
+ **Lockstep cut + scope-drift gate at composer time + skeleton harvest + Tier-1 block filter + high-resolution session ticket trace.** All 8 published `@adia-ai/*` packages bump 0.2.0 → 0.2.1 per [`docs/specs/package-architecture.md` § 15](../../../docs/specs/package-architecture.md#15-versioning-policy). Patch cut — no breaking changes.
44
+
45
+ ### Changed
46
+
47
+ - `version`: `0.2.0` → `0.2.1`.
48
+ - `dependencies["@adia-ai/a2ui-corpus"]`: `^0.2.0` (covers `0.2.1`).
49
+ - `dependencies["@adia-ai/a2ui-utils"]`: `^0.2.0` (covers `0.2.1`).
50
+ - `dependencies["@adia-ai/a2ui-validator"]`: `^0.2.0` (covers `0.2.1`).
51
+ - `dependencies["@adia-ai/a2ui-retrieval"]`: `^0.2.0` (covers `0.2.1`).
52
+ - `strategies/zettel/chunk-synthesizer.js` and `strategies/zettel/issue-reporter.js` are the surfaces that gained the new behavior below; no other directories under `strategies/` were touched.
53
+
54
+ ### Added — Scope-drift gate at composer time (2026-05-02)
55
+
56
+ `composeFromIntent` now computes a scope-drift signal after every successful composition: composed-HTML component count vs. the sum of bound chunks' component counts. When `actual / expected > 1.5×` (with a 20-component floor to suppress small-UI noise), the synthesizer emits a warning with the ratio. The MCP layer auto-fires a `scope-drift` issue on detection, which writes both a JSON ticket and a high-resolution Markdown ticket showing the bound-chunk envelope, ratio, and a callout naming the canvas-drift regression class.
57
+
58
+ - `chunk-synthesizer.js`: `computeScopeDrift(html, boundChunks)`, `SCOPE_DRIFT_RATIO = 1.5`, `SCOPE_DRIFT_MIN_ACTUAL = 20`. Tier-1 (single bound block) and Tier-2 (page chunk + every block resolved into a slot) both instrumented.
59
+ - `issue-reporter.js`: new `'scope-drift'` reason in `AUTO_FIRE_POLICY` (type `bug`, severity `drift`, owner `synthesis`); `scopeDrift` flowed into the trace; ticket renderer extended with envelope + ratio + drift callout.
60
+
61
+ Closes the loop on the canvas-drift regression class — the §37 fix patched the immediate symptom (84 components for a 4-stat retrieval), this trip-wire makes the class self-detecting.
62
+
63
+ ### Added — Skeleton harvest + Tier-1 block filter (2026-05-02)
64
+
65
+ Two coordinated changes to the chunk-corpus pipeline that bound the synthesizer's output envelope.
66
+
67
+ - **Skeleton harvest for nested chunks.** `harvest-chunks.mjs` now collapses each nested `data-chunk` element's inner content to `<!-- nested: <name> -->`, so page/panel chunks become compact skeletons. `dashboard-admin-page` shrank from 26,555c → 1,522c (-94%); `dashboard-overview-panel` from 12,493c → 898c (-93%); xl-bucket chunks (>10K chars) went from 2 → 0.
68
+ - **Tier-1 retrieval `kind: 'block'` filter.** `composeFromIntent` restricts the fast-path retrieval to block-kind chunks; page/panel composition still works via Tier-2 LLM synthesis with slot-binding, where the skeleton is the right input.
69
+
70
+ End-to-end: "trace documentation UI with stat cards" now produces 4 components (was 84). "kpi dashboard with stat cards" → 13 components. The retrieved chunk's component count is now an upper bound on the Tier-1 fast-path output.
71
+
72
+ ### Added — High-resolution session ticket trace (2026-05-02)
73
+
74
+ The synthesizer now captures a `retrievalTrace` per attempt: Tier-1 hits with scores + kinds, Tier-2 catalog summary (page/panel/block counts, top-N block candidates), user-prompt char count, system-prompt hash, attempt-by-attempt LLM raw responses, validation results, plan, and HTML preview. The state cache stores it on every entry. `attachTrace('full')` (now the default when `state_id` is provided) returns the full session-replay payload.
75
+
76
+ `issue-reporter.js` renders this as a sibling Markdown ticket alongside the JSON; sections cover header, description, reproduction, component count, retrieval log table, LLM attempts (raw responses), user prompt, composer plan, generated HTML preview, warnings, ops history, environment. A maintainer can replay any flagged session from the ticket alone.
14
77
 
15
78
  ---
16
79
 
@@ -21,7 +84,7 @@ packages now share one version, governed by
21
84
  [`docs/specs/package-architecture.md` § 15 (Versioning Policy)](../../../docs/specs/package-architecture.md#15-versioning-policy).
22
85
  This release also lands the `engine/` ↔ `engines/` collision fix from
23
86
  T3 of the
24
- [`docs/plans/packages-architecture-fixes-2026-05-02.md`](../../../docs/plans/packages-architecture-fixes-2026-05-02.md)
87
+ [`docs/plans/packages-architecture-fixes-2026-05-02.md`](../../../.brain/archive/2026-Q2/PLAN-packages-architecture-fixes-2026-05-02.md)
25
88
  plan.
26
89
 
27
90
  ### Changed
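Reviewer note on the dependency bullets in the changelog hunk above: they state that `^0.2.0` covers `0.2.1` and `0.2.2`. Under npm's caret semantics the leftmost non-zero component is pinned, so for a 0.x version `^0.2.0` means `>=0.2.0 <0.3.0`. A minimal sketch of that rule follows; `parse` and `caretCovers` are hypothetical helpers written for this note (not part of the package), and they handle plain `MAJOR.MINOR.PATCH` strings only, with no prerelease tags.

```javascript
// Hypothetical helpers, for illustration only.
function parse(v) {
  const parts = v.split('.').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`not a plain x.y.z version: ${v}`);
  }
  return parts;
}

// Does the range `^base` admit `candidate`?
function caretCovers(base, candidate) {
  const [bM, bm, bp] = parse(base);
  const [cM, cm, cp] = parse(candidate);
  // The candidate must not be older than the base.
  const atLeastBase = cM > bM || (cM === bM && (cm > bm || (cm === bm && cp >= bp)));
  if (!atLeastBase) return false;
  if (bM > 0) return cM === bM;               // ^1.2.3 := >=1.2.3 <2.0.0
  if (bm > 0) return cM === 0 && cm === bm;   // ^0.2.0 := >=0.2.0 <0.3.0
  return cM === 0 && cm === 0 && cp === bp;   // ^0.0.3 := exactly 0.0.3
}

const covered = ['0.2.1', '0.2.2', '0.3.0'].map((v) => caretCovers('0.2.0', v));
// covered is [true, true, false]
```

This is why the lockstep patch cuts can leave the sibling ranges at `^0.2.0`; a future `0.3.0` would fall outside the range and need an explicit bump.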
package/core/generator.js CHANGED
@@ -244,9 +244,23 @@ export async function* generateUIStream({ intent, executionId, llmAdapter, model
244
244
  executionId = engine.start({ intent, mode: 'stream', previousExecId });
245
245
 
246
246
  // ── Stage 1: Interpret ──
247
+ // Include confidence + matched signals in the status message so a reader can
248
+ // distinguish a strong domain match from a 3% tie-breaker. The bare label
249
+ // "Domain: data" is indistinguishable between "build a data dashboard" (high
250
+ // confidence, many signals) and "tier pricing page" (0.03 confidence, one
251
+ // weak signal) — same text, very different reliability.
247
252
  const domain = lookupDomain(intent);
248
253
  engine.submitStage(executionId, 'interpret', { domain, intent, confidence: domain.confidence });
249
- yield { type: 'status', stage: 'interpret', message: `Domain: ${domain.domain}` };
254
+ const conf = domain.confidence != null ? Math.round(domain.confidence * 100) : null;
255
+ const signals = Array.isArray(domain.matchedSignals) && domain.matchedSignals.length
256
+ ? domain.matchedSignals.slice(0, 4)
257
+ : null;
258
+ yield {
259
+ type: 'status',
260
+ stage: 'interpret',
261
+ message: `Domain: ${domain.domain}${conf != null ? ` (${conf}% confidence)` : ''}`,
262
+ outcomes: signals ? [`Matched signals: ${signals.join(', ')}`] : [],
263
+ };
250
264
 
251
265
  // ── Clarity gate: yield clarify event if intent is vague ──
252
266
  // Only for fresh intents (not multi-turn iterations which have prior context)
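Reviewer note on the interpret-stage hunk above: the new status event can be exercised in isolation. The sketch below lifts the formatting into a pure function; `interpretStatus` is a hypothetical name, and the shape of the `domain` object (`domain`, `confidence`, `matchedSignals`) is inferred from the diff rather than from any documented contract. The sample inputs are invented.

```javascript
// Hypothetical pure helper mirroring the interpret-stage formatting in the hunk.
function interpretStatus(domain) {
  const conf = domain.confidence != null ? Math.round(domain.confidence * 100) : null;
  const signals = Array.isArray(domain.matchedSignals) && domain.matchedSignals.length
    ? domain.matchedSignals.slice(0, 4) // at most four signals are shown
    : null;
  return {
    type: 'status',
    stage: 'interpret',
    message: `Domain: ${domain.domain}${conf != null ? ` (${conf}% confidence)` : ''}`,
    outcomes: signals ? [`Matched signals: ${signals.join(', ')}`] : [],
  };
}

// Invented inputs: a weak tie-breaker match and a strong match on the same domain.
const weak = interpretStatus({ domain: 'data', confidence: 0.03, matchedSignals: ['pricing'] });
const strong = interpretStatus({
  domain: 'data',
  confidence: 0.95,
  matchedSignals: ['dashboard', 'chart', 'table', 'kpi', 'metric'],
});
// weak.message   is "Domain: data (3% confidence)"
// strong.message is "Domain: data (95% confidence)"
```

The two messages now differ where the old bare label did not, and a missing `confidence` simply drops the parenthetical instead of printing `NaN%`.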
@@ -267,12 +281,34 @@ export async function* generateUIStream({ intent, executionId, llmAdapter, model
267
281
  }
268
282
 
269
283
  // ── Stage 2: Analyze ──
284
+ // Surface the actual top-N retrieval candidates with scores. Two reasons the
285
+ // previous `${...patterns.length} patterns` line was misleading:
286
+ // (a) `getContext(intent, 2)` runs at tier=2, but the context-assembler
287
+ // only populates `result.patterns` at tier≥3 — so `context.patterns`
288
+ // was STRUCTURALLY empty. "0 patterns" wasn't a retrieval result; it
289
+ // was a tier mismatch. Fixed by running a fast keyword search here.
290
+ // (b) The synthesizer composes from below-threshold patterns too; "0
291
+ // patterns" reading as "nothing relevant" actively misleads the user.
270
292
  const context = await getContext(intent, 2);
293
+ const topPatternsRaw = searchBlocks(intent).slice(0, 5);
294
+ const candidateLines = topPatternsRaw.length
295
+ ? topPatternsRaw.map(p => {
296
+ const score = p.score != null ? ` · ${Number(p.score).toFixed(2)}` : '';
297
+ const domainTag = p.domain ? ` (${p.domain})` : '';
298
+ return ` • ${p.name}${domainTag}${score}`;
299
+ })
300
+ : [' (no keyword match — synthesizer will compose from components)'];
271
301
  engine.submitStage(executionId, 'analyze', {
272
- context, componentCount: context.components.length, patternCount: context.patterns.length,
302
+ context, componentCount: context.components.length, patternCount: topPatternsRaw.length,
303
+ topPatterns: topPatternsRaw.map(p => ({ name: p.name, score: p.score, domain: p.domain })),
273
304
  confidence: context.components.length > 0 ? 0.85 : 0.5,
274
305
  });
275
- yield { type: 'status', stage: 'analyze', message: `${context.components.length} components, ${context.patterns.length} patterns` };
306
+ yield {
307
+ type: 'status',
308
+ stage: 'analyze',
309
+ message: `${context.components.length} components inspected · ${topPatternsRaw.length} pattern candidate${topPatternsRaw.length === 1 ? '' : 's'}`,
310
+ outcomes: ['Top retrievals:', ...candidateLines],
311
+ };
276
312
 
277
313
  // ── Research: web search enrichment (optional) ──
278
314
  let researchContext = '';
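Reviewer note on the analyze-stage hunk above: the candidate lines are built from whatever `searchBlocks(intent)` returns. A minimal sketch of that formatting as a pure function follows. `formatCandidates` is a hypothetical name, the field names (`name`, `score`, `domain`) come from the diff, the chunk names in the sample are invented, and the empty-result placeholder is shortened relative to the string in the hunk.

```javascript
// Hypothetical pure helper mirroring the candidate-line formatting in the hunk.
// `hits` stands in for the array returned by searchBlocks(intent).
function formatCandidates(hits, limit = 5) {
  const top = hits.slice(0, limit);
  if (top.length === 0) {
    return ['  (no keyword match)']; // shortened placeholder, see note above
  }
  return top.map((p) => {
    const score = p.score != null ? ` · ${Number(p.score).toFixed(2)}` : '';
    const domainTag = p.domain ? ` (${p.domain})` : '';
    return `  • ${p.name}${domainTag}${score}`;
  });
}

// Invented sample hits.
const lines = formatCandidates([
  { name: 'stat-card-grid', domain: 'data', score: 7.5 },
  { name: 'pricing-table', score: 3 },
]);
// lines[0] is "  • stat-card-grid (data) · 7.50"
// lines[1] is "  • pricing-table · 3.00"
```

Because the list is derived from an actual search, an empty result now means the keyword search found nothing, not that the pattern tier was never consulted.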
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@adia-ai/a2ui-compose",
3
- "version": "0.2.0",
3
+ "version": "0.2.2",
4
4
  "description": "AdiaUI A2UI compose engine — framework-agnostic. Takes natural-language intents + a catalog and produces A2UI protocol messages. Pairs with `@adia-ai/a2ui-retrieval` (intent classification, catalog lookup) and `@adia-ai/a2ui-validator` (schema + semantic checks).",
5
5
  "type": "module",
6
6
  "exports": {
package/strategies/zettel/chunk-synthesizer.js CHANGED
@@ -31,6 +31,12 @@ import { composeFromPlan, validatePlan } from './chunk-composer.js';
31
31
  const STRONG_RETRIEVAL_SCORE = 8; // search-score threshold for fast path
32
32
  const PRE_SEARCH_LIMIT = 30; // chunks shown to the LLM in the prompt
33
33
  const DEFAULT_MAX_ATTEMPTS = 2;
34
+ // Scope-drift gate: composed HTML's component count vs. the sum of bound
35
+ // chunks' component counts. A multiplier > SCOPE_DRIFT_RATIO trips a warning
36
+ // + auto-fires a `scope-drift` issue. Floor prevents false positives on
37
+ // small UIs where slot-wrapper noise dominates.
38
+ const SCOPE_DRIFT_RATIO = 1.5;
39
+ const SCOPE_DRIFT_MIN_ACTUAL = 20;
34
40
 
35
41
  const SYSTEM_PROMPT = `You compose web-app pages by binding training chunks into named slots.
36
42
 
@@ -141,17 +147,30 @@ function extractJSON(raw) {
141
147
  export async function composeFromIntent({ intent, llmAdapter, maxAttempts = DEFAULT_MAX_ATTEMPTS }) {
142
148
  // Tier 1 — retrieval. Try semantic-blended hit first; fall back to keyword
143
149
  // when embeddings are unavailable (no chunk-embeddings.json or no API key).
144
- const hits = await searchChunksAsync(intent, { limit: 5 });
150
+ //
151
+ // Restricted to kind=block: page/panel chunks are SKELETONS that need
152
+ // slot-binding composition (Tier 2 handles them). Returning a skeleton
153
+ // directly from Tier-1 would emit a near-empty page; we only want the
154
+ // fast-path for atomic block patterns.
155
+ const hits = await searchChunksAsync(intent, { kind: 'block', limit: 5 });
145
156
  if (hits.length > 0 && hits[0].score >= STRONG_RETRIEVAL_SCORE) {
146
157
  const top = getChunk(hits[0].name);
147
158
  const html = top.html || top.instances?.[0]?.html || '';
159
+ // Tier-1 fast path: sole bound chunk is the retrieved block. The gate
160
+ // is mostly a no-op here (ratio ≈ 1) but stays for symmetry — and to
161
+ // catch a corner case where retrieval returns a block that, post-render,
162
+ // expands far beyond its source (shouldn't happen, but worth detecting).
163
+ const scopeDrift = computeScopeDrift(html, [top]);
148
164
  return {
149
165
  html,
150
166
  plan: null,
151
167
  source: 'retrieval',
152
168
  score: hits[0].score,
153
169
  cosineScore: hits[0].cosineScore,
154
- warnings: [],
170
+ warnings: scopeDrift.drift
171
+ ? [`scope drift: ${scopeDrift.actual} components in HTML vs ${scopeDrift.expected} in bound chunk (ratio ${scopeDrift.ratio.toFixed(2)}×)`]
172
+ : [],
173
+ scopeDrift,
155
174
  };
156
175
  }
157
176
 
@@ -189,6 +208,29 @@ export async function composeFromIntent({ intent, llmAdapter, maxAttempts = DEFA
189
208
  let lastError = null;
190
209
  const attempts = [];
191
210
 
211
+ // Trace: snapshot of the retrieval log for the issue-reporter to surface
212
+ // verbatim on bug tickets. Recorded once before the retry loop so it
213
+ // describes what the LLM actually saw.
214
+ const retrievalTrace = {
215
+ tier1Hits: hits.slice(0, 5).map((h) => ({
216
+ name: h.name,
217
+ score: Number(h.score.toFixed(3)),
218
+ kind: h.kind,
219
+ cosineScore: h.cosineScore != null ? Number(h.cosineScore.toFixed(3)) : null,
220
+ })),
221
+ tier1Threshold: STRONG_RETRIEVAL_SCORE,
222
+ tier1Pass: hits.length > 0 && hits[0].score >= STRONG_RETRIEVAL_SCORE,
223
+ catalogSize: filtered.length,
224
+ catalogPageNames: pageChunks.map((c) => c.name),
225
+ catalogPanelNames: panelChunks.map((c) => c.name),
226
+ catalogBlockTopN: blockHits.slice(0, 10).map((h) => ({
227
+ name: h.name,
228
+ score: Number(h.score.toFixed(3)),
229
+ })),
230
+ userPromptChars: userPrompt.length,
231
+ systemPromptChars: SYSTEM_PROMPT.length,
232
+ };
233
+
192
234
  for (let i = 0; i < maxAttempts; i++) {
193
235
  const retryNudge = lastError
194
236
  ? `\n\nPREVIOUS ATTEMPT FAILED: ${lastError}. Return ONLY a JSON object shaped as { "page": "...", "slot_bindings": { ... } }. No prose, no questions.`
@@ -216,12 +258,33 @@ export async function composeFromIntent({ intent, llmAdapter, maxAttempts = DEFA
216
258
  continue;
217
259
  }
218
260
 
261
+ // Compute scope-drift signal: composed envelope vs sum of bound chunks.
262
+ // Bound chunks = page chunk + every block/panel resolved into a slot.
263
+ const boundNames = new Set([plan.page]);
264
+ for (const v of Object.values(plan.slot_bindings || {})) {
265
+ const arr = Array.isArray(v) ? v : [v];
266
+ for (const n of arr) boundNames.add(n);
267
+ }
268
+ const boundChunks = [...boundNames].map((n) => getChunk(n)).filter(Boolean);
269
+ const scopeDrift = computeScopeDrift(composed.html, boundChunks);
270
+ const driftWarnings = scopeDrift.drift
271
+ ? [`scope drift: ${scopeDrift.actual} components in composed HTML vs ${scopeDrift.expected} in bound chunks (ratio ${scopeDrift.ratio.toFixed(2)}× exceeds gate ${SCOPE_DRIFT_RATIO}×)`]
272
+ : [];
273
+
219
274
  return {
220
275
  html: composed.html,
221
276
  plan,
222
277
  source: 'synthesis',
223
- warnings: composed.warnings,
224
- synthesis: { attempts: i + 1, attemptsLog: attempts, validation },
278
+ warnings: [...composed.warnings, ...driftWarnings],
279
+ scopeDrift,
280
+ synthesis: {
281
+ attempts: i + 1,
282
+ attemptsLog: attempts,
283
+ validation,
284
+ retrievalTrace,
285
+ userPrompt,
286
+ systemPromptHash: hashString(SYSTEM_PROMPT),
287
+ },
225
288
  };
226
289
  }
227
290
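Reviewer note on the bound-chunk collection in the hunk above: the plan's page and every slot binding are gathered into a `Set`, so each chunk name contributes to the expected envelope once, however many times it is bound. A minimal sketch, with `boundChunkNames` as a hypothetical helper and an invented plan; the `{ page, slot_bindings }` shape follows the diff.

```javascript
// Hypothetical helper: collect the unique chunk names a plan binds.
// Slot values may be a single name or an array of names.
function boundChunkNames(plan) {
  const names = new Set([plan.page]);
  for (const v of Object.values(plan.slot_bindings || {})) {
    const arr = Array.isArray(v) ? v : [v];
    for (const n of arr) names.add(n);
  }
  return [...names]; // Set preserves insertion order
}

// Invented plan: "stat-card" is bound twice in the same slot.
const names = boundChunkNames({
  page: 'dashboard-admin-page',
  slot_bindings: { header: 'page-header', main: ['stat-card', 'stat-card', 'data-table'] },
});
// names is ['dashboard-admin-page', 'page-header', 'stat-card', 'data-table']
```

One consequence worth checking against the composer, which this diff does not show: if a repeated binding is rendered once per occurrence, the composed HTML counts it each time while the expected envelope counts it once, which pushes the ratio upward for plans that reuse a block heavily.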
 
@@ -230,6 +293,59 @@ export async function composeFromIntent({ intent, llmAdapter, maxAttempts = DEFA
230
293
  plan: null,
231
294
  source: 'synthesis',
232
295
  warnings: [`synthesis failed after ${maxAttempts} attempts: ${lastError}`],
233
- synthesis: { attempts: maxAttempts, attemptsLog: attempts },
296
+ synthesis: {
297
+ attempts: maxAttempts,
298
+ attemptsLog: attempts,
299
+ retrievalTrace,
300
+ userPrompt,
301
+ systemPromptHash: hashString(SYSTEM_PROMPT),
302
+ },
234
303
  };
235
304
  }
305
+
306
+ // Cheap non-crypto hash for prompt-version fingerprinting in tickets.
307
+ function hashString(s) {
308
+ let h = 5381;
309
+ for (let i = 0; i < s.length; i++) h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
310
+ return 'h' + h.toString(36);
311
+ }
312
+
313
+ /**
314
+ * Heuristic component count — number of opening tags in the HTML.
315
+ * Loose but stable proxy for A2UI envelope size; used by the scope-drift
316
+ * gate (above) and by issue-reporter's ticket-rendering counter.
317
+ */
318
+ function countComponents(html) {
319
+ if (typeof html !== 'string') return 0;
320
+ return (html.match(/<[a-z][a-z0-9-]*/g) || []).length;
321
+ }
322
+
323
+ /**
324
+ * Compute the scope-drift signal for a composed result.
325
+ *
326
+ * Inputs: the final HTML string + the chunk records that fed into it
327
+ * (page chunk for Tier-2; the single block for Tier-1). Sums the bound
328
+ * chunks' component counts to get an "expected envelope," compares it to
329
+ * the actual composed count, and reports the ratio.
330
+ *
331
+ * Returns { actual, expected, ratio, drift } where `drift` is true when
332
+ * actual exceeds SCOPE_DRIFT_RATIO × expected AND actual ≥ SCOPE_DRIFT_MIN_ACTUAL.
333
+ */
334
+ function computeScopeDrift(html, boundChunks) {
335
+ const actual = countComponents(html);
336
+ let expected = 0;
337
+ for (const c of boundChunks) {
338
+ if (!c) continue;
339
+ const chunkHtml = c.html || c.instances?.[0]?.html;
340
+ expected += countComponents(chunkHtml);
341
+ }
342
+ // Guard against zero-expected: when bound chunks have no HTML accounted for
343
+ // (rare; should only happen in malformed corpora), skip the gate rather
344
+ // than report infinite drift.
345
+ if (expected === 0) {
346
+ return { actual, expected, ratio: null, drift: false };
347
+ }
348
+ const ratio = actual / expected;
349
+ const drift = actual >= SCOPE_DRIFT_MIN_ACTUAL && ratio > SCOPE_DRIFT_RATIO;
350
+ return { actual, expected, ratio, drift };
351
+ }
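Reviewer note on `computeScopeDrift` above: the gate has two conditions, a ratio above 1.5 and at least 20 components in the composed HTML. The worked example below restates the two functions from the hunk in compact form and runs them on invented sample markup, to show one case that exceeds the ratio but stays under the floor and one case that trips the gate.

```javascript
const SCOPE_DRIFT_RATIO = 1.5;
const SCOPE_DRIFT_MIN_ACTUAL = 20;

// Same heuristic as the hunk: count opening tags. The pattern needs a
// lowercase letter right after "<", so closing tags such as "</div>" are skipped.
function countComponents(html) {
  if (typeof html !== 'string') return 0;
  return (html.match(/<[a-z][a-z0-9-]*/g) || []).length;
}

function computeScopeDrift(html, boundChunks) {
  const actual = countComponents(html);
  let expected = 0;
  for (const c of boundChunks) {
    if (!c) continue;
    expected += countComponents(c.html || c.instances?.[0]?.html);
  }
  if (expected === 0) return { actual, expected, ratio: null, drift: false };
  const ratio = actual / expected;
  const drift = actual >= SCOPE_DRIFT_MIN_ACTUAL && ratio > SCOPE_DRIFT_RATIO;
  return { actual, expected, ratio, drift };
}

// Invented sample data: a block chunk with 4 opening tags.
const block = {
  html: '<section><stat-card></stat-card><stat-card></stat-card><stat-card></stat-card></section>',
};
const small = '<div>' + '<span></span>'.repeat(9) + '</div>';  // 10 opening tags
const large = '<div>' + '<span></span>'.repeat(83) + '</div>'; // 84 opening tags

const belowFloor = computeScopeDrift(small, [block]); // ratio 2.5, actual 10: no drift
const drifted = computeScopeDrift(large, [block]);    // ratio 21, actual 84: drift
```

The second case uses the same 84-versus-4 proportion the changelog cites for the canvas-drift regression; the first shows why the 20-component floor exists, since small UIs can exceed the ratio without being a meaningful regression.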
package/strategies/zettel/issue-reporter.js CHANGED
@@ -75,6 +75,16 @@ export const AUTO_FIRE_POLICY = {
75
75
  suggested_owner: 'validator',
76
76
  titleFor: () => `refine_composition ops_failed list non-empty after apply`,
77
77
  },
78
+ 'scope-drift': {
79
+ type: 'bug',
80
+ severity: 'drift',
81
+ suggested_owner: 'synthesis',
82
+ titleFor: (ctx) => {
83
+ const ratio = ctx?.scopeDrift?.ratio != null ? `${ctx.scopeDrift.ratio.toFixed(1)}×` : '';
84
+ const intent = ctx?.intent ? ` for "${truncate(ctx.intent, 40)}"` : '';
85
+ return `Scope drift${ratio ? ' ' + ratio : ''}: composed HTML exceeds bound-chunk envelope${intent}`;
86
+ },
87
+ },
78
88
  };
79
89
 
80
90
  function truncate(s, n = 60) {
@@ -167,9 +177,20 @@ export async function attachTrace(state_id, depth, cache) {
167
177
  };
168
178
  }
169
179
 
170
- // 'full'
180
+ // 'full' — high-resolution session trace. Includes everything from
181
+ // summary plus the synthesis breadcrumbs (retrieval log, LLM prompts,
182
+ // raw responses per attempt, validation, plan, composed HTML preview)
183
+ // so a human reviewer or future agent can replay the full session.
171
184
  return {
172
185
  ...baseTrace,
186
+ intent: entry.intent ?? null,
187
+ source: entry.source ?? null,
188
+ score: entry.score ?? null,
189
+ plan: entry.plan ?? null,
190
+ synthesis: entry.synthesis ?? null, // retrievalTrace, attemptsLog, prompts
191
+ htmlPreview: entry.html ? truncateHtmlForTrace(entry.html) : null,
192
+ componentCount: entry.html ? countComponents(entry.html) : null,
193
+ scopeDrift: entry.scopeDrift ?? null,
173
194
  internal: entry.internal ?? null,
174
195
  output: entry.output ?? {
175
196
  ops: entry.ops_history || [],
@@ -177,9 +198,27 @@ export async function attachTrace(state_id, depth, cache) {
177
198
  },
178
199
  warnings: entry.warnings || [],
179
200
  duration_ms: entry.duration_ms ?? null,
201
+ parent_state_id: entry.parent_state_id ?? null,
202
+ created_at: entry.created_at ?? null,
180
203
  };
181
204
  }
182
205
 
206
+ const HTML_PREVIEW_MAX_BYTES = 8 * 1024;
207
+
208
+ function truncateHtmlForTrace(html) {
209
+ if (typeof html !== 'string') return null;
210
+ if (html.length <= HTML_PREVIEW_MAX_BYTES) return html;
211
+ return html.slice(0, HTML_PREVIEW_MAX_BYTES) + `\n<!-- ... ${html.length - HTML_PREVIEW_MAX_BYTES} bytes truncated -->`;
212
+ }
213
+
214
+ function countComponents(html) {
215
+ if (typeof html !== 'string') return 0;
216
+ // Count opening tags (excluding void/standalone HTML elements that
217
+ // don't contribute to A2UI component count). Loose heuristic; signals
218
+ // scope drift even if it's not a perfect protocol-level count.
219
+ return (html.match(/<[a-z][a-z0-9-]*/g) || []).length;
220
+ }
221
+
183
222
  /**
184
223
  * Write an issue to disk.
185
224
  *
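Reviewer note on `truncateHtmlForTrace` above: the limit is named `HTML_PREVIEW_MAX_BYTES`, but the comparison uses `html.length`, which counts UTF-16 code units rather than bytes. For ASCII markup the two coincide; for markup with non-ASCII text the encoded preview can be larger than 8 KiB. The sketch below only measures the difference and is not a proposed patch; `utf8Bytes` is a hypothetical helper, and `TextEncoder` is the standard global available in current Node.js and browsers.

```javascript
const HTML_PREVIEW_MAX_BYTES = 8 * 1024;

// Hypothetical helper: UTF-8 encoded size of a string.
function utf8Bytes(s) {
  return new TextEncoder().encode(s).length;
}

// Two strings with identical .length, so the length check treats them alike.
const ascii = 'a'.repeat(HTML_PREVIEW_MAX_BYTES);
const accented = '\u00e9'.repeat(HTML_PREVIEW_MAX_BYTES); // U+00E9, 2 bytes in UTF-8

const sizes = { ascii: utf8Bytes(ascii), accented: utf8Bytes(accented) };
// sizes.ascii is 8192, sizes.accented is 16384
```

Whether this matters depends on how strictly the 8 KB figure in the ticket documentation is meant; as a preview cap, a character-based limit may well be the intended behavior.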
@@ -198,7 +237,11 @@ export async function reportIssue(input, ctx = {}) {
198
237
  ? ctx.reporter
199
238
  : 'user';
200
239
 
201
- const traceDepth = input.trace ?? (input.state_id ? 'summary' : 'none');
240
+ // Default to 'full' when a state_id is provided: high-resolution session
241
+ // tickets are the design intent (the previous 'summary' default produced
242
+ // tickets too thin to debug from). Caller can opt down to 'summary' or
243
+ // 'none' explicitly.
244
+ const traceDepth = input.trace ?? (input.state_id ? 'full' : 'none');
202
245
  let trace = null;
203
246
  if (input.state_id && traceDepth !== 'none') {
204
247
  trace = await attachTrace(input.state_id, traceDepth, ctx.cache);
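Reviewer note on the `traceDepth` change above: the default resolution is a one-liner, and its behavior for explicit opt-downs follows from how `??` works (it falls back only on `null` or `undefined`). A minimal sketch, with `resolveTraceDepth` as a hypothetical helper isolating the expression from `reportIssue`; the input objects are invented.

```javascript
// Hypothetical helper isolating the default-resolution rule from the hunk.
function resolveTraceDepth(input) {
  return input.trace ?? (input.state_id ? 'full' : 'none');
}

const depths = [
  resolveTraceDepth({ state_id: 's1' }),                   // no explicit trace
  resolveTraceDepth({}),                                   // no state to trace
  resolveTraceDepth({ state_id: 's1', trace: 'summary' }), // explicit opt-down
  resolveTraceDepth({ state_id: 's1', trace: 'none' }),    // explicit opt-out
];
// depths is ['full', 'none', 'summary', 'none']
```

Callers that relied on the previous `'summary'` default and want to keep ticket size down would need to pass `trace: 'summary'` explicitly after this change.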
@@ -237,7 +280,181 @@ export async function reportIssue(input, ctx = {}) {
237
280
  const path = join(storageRoot, `${issue_id}.json`);
238
281
  await writeFile(path, JSON.stringify(issue, null, 2));
239
282
 
240
- return { issue_id, path, ack: 'logged' };
283
+ // Write a sibling Markdown report for human review. The JSON ticket
284
+ // is the machine-readable source of truth; the .md is the readable
285
+ // surface a maintainer scans first to triage. Always written when
286
+ // we have a non-trivial trace (full or summary), so reviewers can
287
+ // see retrieval log, LLM prompts, attempts, plan, and composed HTML
288
+ // without reading the raw JSON.
289
+ let markdown_path = null;
290
+ if (trace) {
291
+ markdown_path = join(storageRoot, `${issue_id}.md`);
292
+ await writeFile(markdown_path, renderTicketMarkdown(issue, trace));
293
+ }
294
+
295
+ return {
296
+ issue_id,
297
+ path, // .json (machine-readable)
298
+ markdown_path, // .md (human-readable; null when trace='none')
299
+ ack: 'logged',
300
+ severity: issue.severity,
301
+ suggested_owner: issue.suggested_owner,
302
+ };
303
+ }
304
+
305
+ /**
306
+ * Render the high-resolution session ticket as Markdown.
307
+ * Sections (each appears only if relevant data exists):
308
+ * 1. Header (id, severity, type, title, owner, tags)
309
+ * 2. Description (body)
310
+ * 3. Reproduction (intent, state_id, source, score)
311
+ * 4. Component-count drift (heuristic; flagged when > 50 components)
312
+ * 5. Retrieval log (Tier 1 hits + Tier 2 catalog summary)
313
+ * 6. LLM trace (each attempt's raw response)
314
+ * 7. Composer plan (slot bindings)
315
+ * 8. HTML preview (first 8KB of generated output)
316
+ * 9. Warnings + ops history
317
+ * 10. Environment (mcp/engine/model)
318
+ */
319
+ function renderTicketMarkdown(issue, trace) {
320
+ const lines = [];
321
+ lines.push(`# ${issue.title}`);
322
+ lines.push('');
323
+ lines.push(`> **${issue.severity.toUpperCase()}** · ${issue.type} · owner: \`${issue.suggested_owner}\` · ${issue.created_at}`);
324
+ lines.push('> ');
325
+ lines.push(`> Issue ID: \`${issue.issue_id}\`${issue.tags?.length ? ' · tags: ' + issue.tags.map((t) => '`' + t + '`').join(' ') : ''}`);
326
+ lines.push('');
327
+
328
+ if (issue.body) {
329
+ lines.push('## Description');
330
+ lines.push('');
331
+ lines.push(issue.body);
332
+ lines.push('');
333
+ }
334
+
335
+ if (trace.intent || trace.state_id) {
336
+ lines.push('## Reproduction');
337
+ lines.push('');
338
+ if (trace.intent) lines.push(`- **Intent**: \`${trace.intent}\``);
339
+ if (trace.state_id) lines.push(`- **State ID**: \`${trace.state_id}\``);
340
+ if (trace.source) lines.push(`- **Source**: ${trace.source}${trace.score != null ? ` (score: ${trace.score.toFixed(3)})` : ''}`);
341
+ if (trace.parent_state_id) lines.push(`- **Parent state**: \`${trace.parent_state_id}\``);
342
+ if (trace.duration_ms != null) lines.push(`- **Duration**: ${trace.duration_ms}ms`);
343
+ lines.push('');
344
+ }
345
+
346
+ if (trace.componentCount != null) {
347
+ lines.push('## Component count');
348
+ lines.push('');
349
+ lines.push(`- **${trace.componentCount}** components in generated HTML`);
350
+ if (trace.scopeDrift) {
351
+ const sd = trace.scopeDrift;
352
+ const flag = sd.drift ? ' ⚠ **scope drift**' : '';
353
+ const ratio = sd.ratio != null ? `${sd.ratio.toFixed(2)}×` : 'n/a';
354
+ lines.push(`- Bound-chunk envelope: ${sd.expected} components`);
355
+ lines.push(`- Ratio (actual / expected): **${ratio}**${flag}`);
356
+ if (sd.drift) {
357
+ lines.push('');
358
+ lines.push(`> The composed HTML's component count exceeds the bound chunks' envelope by more than the drift gate. This is the canvas-drift regression class — the synthesizer materialized markup beyond what the retrieved chunks justify.`);
359
+ }
360
+ } else if (trace.componentCount > 50) {
361
+ lines.push('- ⚠ over-generation candidate (>50 components, no scope-drift signal available)');
362
+ }
363
+ lines.push('');
364
+ }
365
+
366
+ if (trace.synthesis?.retrievalTrace) {
367
+ const r = trace.synthesis.retrievalTrace;
368
+ lines.push('## Retrieval log');
369
+ lines.push('');
370
+ lines.push(`**Tier 1** (semantic-blended, threshold ${r.tier1Threshold}, ${r.tier1Pass ? '✓ pass' : '✗ fall through to Tier 2'})`);
371
+ lines.push('');
372
+ lines.push('| rank | score | kind | chunk |');
373
+ lines.push('|------|------:|------|-------|');
374
+ r.tier1Hits.forEach((h, i) => lines.push(`| ${i + 1} | ${h.score} | ${h.kind} | \`${h.name}\` |`));
375
+ lines.push('');
376
+ if (r.catalogSize) {
377
+ lines.push(`**Tier 2 catalog**: ${r.catalogSize} chunks (${r.catalogPageNames.length} page · ${r.catalogPanelNames.length} panel · ${r.catalogBlockTopN.length} top-block)`);
378
+ lines.push('');
379
+ if (r.catalogBlockTopN.length) {
380
+ lines.push('Top block candidates:');
381
+ for (const h of r.catalogBlockTopN) lines.push(` - ${h.score} \`${h.name}\``);
382
+ lines.push('');
383
+ }
384
+ }
385
+ }
386
+
387
+ if (trace.synthesis?.attemptsLog?.length) {
388
+ lines.push('## LLM attempts');
389
+ lines.push('');
390
+ trace.synthesis.attemptsLog.forEach((att, i) => {
391
+ lines.push(`### Attempt ${att.attempt ?? i + 1}`);
392
+ lines.push('');
393
+ lines.push('```json');
394
+ lines.push(typeof att.raw === 'string' ? att.raw.slice(0, 2000) : JSON.stringify(att.raw, null, 2).slice(0, 2000));
395
+ lines.push('```');
396
+ lines.push('');
397
+ });
398
+ if (trace.synthesis.userPrompt) {
399
+ lines.push('### User prompt sent to LLM');
400
+ lines.push('');
401
+ lines.push('```');
402
+ lines.push(trace.synthesis.userPrompt.slice(0, 3000));
403
+ if (trace.synthesis.userPrompt.length > 3000) lines.push(`... (${trace.synthesis.userPrompt.length - 3000} more chars)`);
404
+ lines.push('```');
405
+ lines.push('');
406
+ if (trace.synthesis.systemPromptHash) lines.push(`System prompt hash: \`${trace.synthesis.systemPromptHash}\` (matches across attempts)`);
407
+ lines.push('');
408
+ }
409
+ }
410
+
411
+ if (trace.plan) {
412
+ lines.push('## Composer plan');
413
+ lines.push('');
414
+ lines.push('```json');
415
+ lines.push(JSON.stringify(trace.plan, null, 2));
416
+ lines.push('```');
417
+ lines.push('');
418
+ }
419
+
420
+ if (trace.htmlPreview) {
421
+ lines.push('## Generated HTML (preview)');
422
+ lines.push('');
423
+ lines.push('```html');
424
+ lines.push(trace.htmlPreview);
425
+ lines.push('```');
426
+ lines.push('');
427
+ }
428
+
429
+ if (trace.warnings?.length) {
430
+ lines.push('## Warnings');
431
+ lines.push('');
432
+ for (const w of trace.warnings) lines.push(`- ${w}`);
433
+ lines.push('');
434
+ }
435
+
436
+ if (trace.output?.ops?.length) {
437
+ lines.push('## Ops history (refinement chain)');
438
+ lines.push('');
439
+ lines.push(`${trace.output.ops.length} ops applied. ${trace.output.delta_summary ? 'Last delta: ' + trace.output.delta_summary : ''}`);
440
+ lines.push('');
441
+ }
442
+
443
+ if (issue.environment) {
444
+ lines.push('## Environment');
445
+ lines.push('');
446
+ for (const [k, v] of Object.entries(issue.environment)) lines.push(`- **${k}**: ${v}`);
447
+ lines.push('');
448
+ }
449
+
450
+ if (issue.related_issue_ids?.length) {
451
+ lines.push('## Related issues');
452
+ lines.push('');
453
+ for (const id of issue.related_issue_ids) lines.push(`- \`${id}\``);
454
+ lines.push('');
455
+ }
456
+
457
+ return lines.join('\n');
241
458
  }
242
459
 
243
460
  /**
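Reviewer note on the `systemPromptHash` line printed by the ticket renderer: the fingerprint comes from `hashString` in the chunk-synthesizer hunk, which resembles the XOR variant of the djb2 string hash (multiply by 33, XOR the next code unit, keep the result unsigned). The sketch below restates it and checks the two properties the ticket relies on, stability for identical prompts and a change when the prompt changes. The prompt strings are invented.

```javascript
// Same algorithm as the hunk: non-cryptographic, for fingerprinting only.
function hashString(s) {
  let h = 5381;
  for (let i = 0; i < s.length; i++) h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  return 'h' + h.toString(36);
}

// Invented prompt strings, for illustration only.
const a = hashString('You compose pages. v1');
const b = hashString('You compose pages. v1');
const c = hashString('You compose pages. v2');

const stable = a === b;  // identical input, identical fingerprint
const changed = a !== c; // inputs differing only in the last character differ here
```

As the hunk's own comment says, this is a cheap non-crypto hash: collisions between unrelated prompts are possible, so the value is suited to spotting prompt-version changes across tickets, not to verifying prompt integrity.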