@adia-ai/a2ui-compose 0.5.1 → 0.5.3

package/CHANGELOG.md CHANGED
@@ -12,6 +12,64 @@ generator graph.
  
  _No pending changes._
  
+ ## [0.5.3] - 2026-05-14
+
+ ### Fixed — Deterministic chunk-loading order in zettel composition library (§160, v0.5.3)
+
+ `strategies/zettel/composition-library.js`: `walk()` now sorts `fs.readdirSync` output via `localeCompare` before recursing. Pre-§160 the directory walker relied on filesystem-defined entry order, which varies across processes on APFS/ext4. When two chunks tied on retrieval score for a given intent, the first-loaded won the tie-break — so the same intent could match chunk A in one process and chunk B in the next.
+
+ Surfaced as ~40% flake rate on the `admin dashboard with kpi cards` retrieval probe post-§143 (the 8 new UI-primitive chunks compete with the original `dashboard-kpi-grid` template). Sorted walk pins load order; remaining flakiness is now constrained to a corpus-quality bug (Stat chunks strip `label`/`value`/`change` attrs in the `template` field) tracked as v0.5.4 F-S1a.
+
+ No behavior change for callers that don't hit a tie-break — most intents have a clear retrieval winner.
+
+ ### Deprecated — `_debug.attempts` and `_debug.warnings` on free-form-composed results (§146, v0.5.3)
+
+ Finalizes the v0.6.0 deprecation schedule for the last two `_debug.*` fields that consumers might still read on `generateFreeFormAdapter`'s result. Both fields are dialog-recorder-gated (silently `undefined` when not recording) — the same access pattern §109 (v0.5.1) + §107a (v0.5.2) removed for `usedIngredients` / `rationale` / `plan`.
+
+ **v0.6.0 migration path** (folded into the inline `@deprecated` JSDoc on `strategies/registry.js`):
+
+ - `result._debug?.attempts` → `result.attempts: number` (LLM round-trips: 1 normally; 2-3 with hallucination retry or paraphrase-retry)
+ - `result._debug?.warnings` → `result.warnings: string[]` (non-fatal transpile findings: unknown substitution keys, layout-value fall-throughs, chunk-resolution warns)
+
+ **Scheduled removal**: v0.6.0 drops the `_debug` block entirely from the free-form-composed result shape. The dialog-recorder will read first-class fields directly. v0.5.3 is the **migration window** — any external consumers reading `_debug.attempts` / `_debug.warnings` should switch to the first-class fields before v0.6.0 ships.
+
+ **Internal verification**: zero live in-repo consumers (`grep -rn '_debug\?\.attempts\|_debug\.attempts\|_debug\?\.warnings\|_debug\.warnings' apps/ playgrounds/ catalog/ packages/` returns only the deprecation comment itself).
+
+ ## [0.5.2] - 2026-05-13
+
+ ### Changed — `plan` graduates from `_debug.*` to first-class on free-form-composed result (§107a infra, v0.5.2)
+
+
+ `strategies/registry.js` `generateFreeFormAdapter` returns `plan` as a top-level result field (paired with v0.5.1's §109 graduation of `usedIngredients` + `rationale`). The `_debug.plan` field is no longer populated — consumers should read `result.plan` directly. Soft-API change (`_debug` was always documented as volatile).
+
+ Enables substitution-coverage measurement at eval time without dialog-recorder coupling. Companion to `@adia-ai/a2ui-mcp` `--report-substitutions` flag.
+
+ ### Deprecated — individual `_debug.*` field reads on free-form-composed results (§131, v0.5.2)
+
+ Documents the `_debug`-volatility contract that §109 (v0.5.1) + §107a (v0.5.2) established de-facto. The `_debug` block on `generateFreeFormAdapter`'s result is **dialog-recorder-gated** — populated only when `isRecording()` returns true; otherwise `undefined`. Individual field reads (`result._debug?.usedIngredients`, `result._debug?.rationale`, `result._debug?.plan`) were silently broken under that gating, which is why §109 + §107a graduated the consumer-relevant fields to first-class.
+
+ **Migration:** any consumer reading `result._debug?.<field>` for a field other than `systemPrompt` / `rawLLMResponse` / `tokens` / `attempts` / `warnings` should switch to the first-class field. The currently-stable first-class set: `usedIngredients`, `rationale`, `plan`.
+
+ **Scheduled removal:** v0.6.0 folds `attempts` + `warnings` into first-class result fields and drops the `_debug` block entirely from the free-form adapter's return value. The dialog-recorder will read first-class fields directly.
+
+ No live in-repo consumers of the soft-API path remain (verified `grep -rn '_debug?\.\|_debug\.' apps/ playgrounds/ catalog/` → 0 hits). One dead fallback in `@adia-ai/a2ui-mcp`'s `eval-diff.mjs` (`result.plan || result._debug?.plan || null`) removed in the same cut.
+
+ ### Changed — Free-form system prompt: 5 per-shape substitution examples + self-check pattern (§126, v0.5.2)
+
+ `strategies/free-form-composer/system-prompt.js` constraint 3 extends the v0.5.1 §107b ALWAYS-substitute paragraph with 5 new concrete examples (Button, Badge/Tag, Icon, Image, Link) alongside the existing Text/Kbd example, so every substitutable shape has one, plus a self-check sentence at the end ("Before emitting, self-check: re-read the user's intent…").
+
+ Targets the v0.5.1 §107a finding that Haiku 4.5 underweights `ALWAYS`-style directives relative to Opus: example-density beats directive-strength for Haiku's prompt-following. Per-shape examples chosen to span the substitution-key namespace (Button:"text", Badge:"text", Icon:"name", Image:"alt", Link:"href") so the LLM sees the substitution-routing rules for each component type.
+
+ Cost: ~120 additional prompt tokens per call (~0.5% of total). Acceptable.
+
+ Expected impact: lifts substitution ratio 27.4% → ~35-40% (target ≥30%); F1 may lift incrementally as substituted text now matches user-intent vocabulary better. Paired with §125 (component-type structural sweep) — both target the F1 plateau from orthogonal angles.
+
+ ### Changed — `generateFreeFormAdapter` accepts `ctx.model` override + reads `FREE_FORM_MODEL_OVERRIDE` env var (§127 infra, v0.5.2)
+
+ `strategies/registry.js` reads model priority chain at call-time (not module-load): `ctx.model` > `process.env.FREE_FORM_MODEL_OVERRIDE` > `FREE_FORM_MODEL_DEFAULT` (Haiku 4.5). Enables `eval-diff.mjs --model <id>` to run the Haiku-vs-Opus A/B for §127 without env-dance or static-import-ordering hazards. `FREE_FORM_MODEL` constant renamed to `FREE_FORM_MODEL_DEFAULT` to reflect the new priority chain.
+
+ Default behavior unchanged when no override set — the v0.5.1 §108 Haiku pin holds for every consumer that doesn't explicitly override.
+
  ## [0.5.1] - 2026-05-13
  
  ### Added — Free-form composer INTENT-PARAPHRASE block + paraphrase-retry (§106, v0.5.1)
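
The §160 changelog entry above describes a tie-break in which the first-loaded chunk wins. The sketch below illustrates why load order matters in that situation. It is not the package's scorer: `pickBest`, the chunk objects, and the scores are hypothetical, chosen only to show the effect.

```javascript
// Hypothetical scorer: keeps the FIRST chunk seen with the highest score,
// because a later chunk only replaces it on a strictly greater score.
function pickBest(chunks) {
  let best = null;
  for (const chunk of chunks) {
    if (best === null || chunk.score > best.score) best = chunk;
  }
  return best;
}

// Two made-up chunks that tie on score.
const chunkA = { id: 'chunk-a', score: 0.82 };
const chunkB = { id: 'chunk-b', score: 0.82 };

// Same chunks, different load order, different winner.
const winnerAB = pickBest([chunkA, chunkB]).id;
const winnerBA = pickBest([chunkB, chunkA]).id;

// Sorting by a stable key before scoring pins the winner regardless of
// the order in which the chunks arrived.
const byId = (a, b) => a.id.localeCompare(b.id);
const pinned = pickBest([chunkB, chunkA].sort(byId)).id;

console.log({ winnerAB, winnerBA, pinned });
```

With an unsorted input the winner follows arrival order; after sorting, the winner is the same on every run, which is the property the sorted walk is meant to provide.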
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@adia-ai/a2ui-compose",
-   "version": "0.5.1",
+   "version": "0.5.3",
    "description": "AdiaUI A2UI compose engine — framework-agnostic. Takes natural-language intents + a catalog and produces A2UI protocol messages. Pairs with `@adia-ai/a2ui-retrieval` (intent classification, catalog lookup) and `@adia-ai/a2ui-validator` (schema + semantic checks).",
    "type": "module",
    "exports": {
@@ -129,7 +129,18 @@ CONSTRAINTS:
  2. Order your ingredients in the sequence they should appear in the rendered UI. The transpiler wraps your list in a root container — by default a vertical Column. Override via the optional \`layout\` field (see below).
  3. Each ingredient may carry a \`substitutions\` object. KEYS are node IDs from the \`substitutables:\` line in the catalog above (e.g. \`"title"\`, \`"submit"\`, \`"logo"\`). VALUES are the new content strings. The transpiler routes each substitution to the right attribute based on the node's component: \`Text\` / \`Kbd\` → \`textContent\`; \`Button\` / \`Badge\` / \`Tag\` → \`text\`; \`Icon\` → \`name\`; \`Image\` → \`alt\`; \`Link\` → \`href\`. Nodes whose component is not in the substitutable list are locked at their declared values.
  
- **ALWAYS substitute** any title, heading, button label, or badge label that the user's intent specifies. Default chunk text is generic placeholder ("Sign in to AdiaUI", "Continue", "Welcome back"); your job is to tailor it to the user's actual intent. Substituting MORE is better than substituting less — empty \`substitutions\` ships generic copy that misses the intent. Example: intent "trial-signup form for ContextEngine" → \`{"title": "Start your ContextEngine trial", "submit": "Create account"}\` not \`{}\`.
+ **ALWAYS substitute** any title, heading, button label, or badge label that the user's intent specifies. Default chunk text is generic placeholder ("Sign in to AdiaUI", "Continue", "Welcome back"); your job is to tailor it to the user's actual intent. Substituting MORE is better than substituting less — empty \`substitutions\` ships generic copy that misses the intent.
+
+ **Substitution examples per shape** (study these before emitting):
+
+ - **Text / Kbd (textContent)**: intent "trial-signup form for ContextEngine" → \`{"title": "Start your ContextEngine trial", "submit": "Create account"}\` not \`{}\`.
+ - **Button (text)**: intent "submit a support ticket" → \`{"submit-btn": "Open ticket"}\` not \`{"submit-btn": "Submit"}\`.
+ - **Badge / Tag (text)**: intent "show overdue invoices with status" → \`{"status-badge": "Overdue"}\` not \`{"status-badge": "Status"}\`.
+ - **Icon (name)**: intent "delete the project" → \`{"action-icon": "trash"}\` not omitting the icon substitution.
+ - **Image (alt)**: intent "team photo with three engineers" → \`{"hero-image": "Three engineers at a whiteboard"}\` not \`{}\`.
+ - **Link (href)**: intent "footer link to /docs/api" → \`{"docs-link": "/docs/api"}\` not \`{"docs-link": "#"}\`.
+
+ **Before emitting, self-check**: re-read the user's intent. Does it specify any content (text label, badge, icon, alt-text, URL)? If yes, every matching substitutable in your picked ingredients should appear in \`substitutions\`. If the intent is content-agnostic ("a sign-up form"), empty \`substitutions\` is correct.
  4. If you can't satisfy the intent with the available ingredients, return \`{ "ingredients": [] }\` and a rationale explaining what's missing.
  5. Output ONLY the JSON object below, no explanation outside the JSON.
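
The routing rule stated in constraint 3 (component type decides which attribute receives the substitution) can be sketched as a lookup table. This is a minimal illustration under assumptions, not the package's transpiler: `ATTRIBUTE_BY_COMPONENT`, `routeSubstitution`, and the node shape `{ id, component }` are hypothetical names introduced for this example.

```javascript
// Hypothetical lookup mirroring the routing listed in constraint 3.
// A Map is used so that names like "constructor" cannot hit inherited
// object properties by accident.
const ATTRIBUTE_BY_COMPONENT = new Map([
  ['Text', 'textContent'],
  ['Kbd', 'textContent'],
  ['Button', 'text'],
  ['Badge', 'text'],
  ['Tag', 'text'],
  ['Icon', 'name'],
  ['Image', 'alt'],
  ['Link', 'href'],
]);

// Returns the attribute write for one substitution, or null when the
// node's component is not substitutable (the node stays locked at its
// declared value).
function routeSubstitution(node, value) {
  const attribute = ATTRIBUTE_BY_COMPONENT.get(node.component);
  if (attribute === undefined) return null;
  return { id: node.id, attribute, value };
}

// Usage with made-up nodes:
const buttonWrite = routeSubstitution({ id: 'submit-btn', component: 'Button' }, 'Open ticket');
const linkWrite = routeSubstitution({ id: 'docs-link', component: 'Link' }, '/docs/api');
const lockedWrite = routeSubstitution({ id: 'divider', component: 'Divider' }, 'ignored');

console.log(buttonWrite, linkWrite, lockedWrite);
```

`Divider` here is an invented non-substitutable component; any component missing from the table is treated as locked.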
 
@@ -124,7 +124,15 @@ async function generateZettelAdapter(ctx) {
  // than Opus, freeing the Opus budget for monolithic-pro fall-throughs.
  // User-decided pre-A/B per v0.5.1 plan question 2: "Haiku is a good
  // default."
- const FREE_FORM_MODEL = 'claude-haiku-4-5-20251001';
+ //
+ // §127 (v0.5.2): env override for the A/B harness. Set
+ // `FREE_FORM_MODEL_OVERRIDE` env var to e.g. `claude-opus-4-7` to run
+ // the eval against Opus for the §127 Haiku-vs-Opus comparison.
+ // Default behavior unchanged when env var unset — the Haiku-pin holds
+ // for every consumer that doesn't explicitly override. Read at
+ // call-time (not module-load) so eval harnesses can set the env after
+ // static imports complete.
+ const FREE_FORM_MODEL_DEFAULT = 'claude-haiku-4-5-20251001';
  
  async function generateFreeFormAdapter(ctx) {
    // Lazy-load: free-form-composer imports composition-library which
@@ -134,10 +142,17 @@ async function generateFreeFormAdapter(ctx) {
    // Pin Haiku for the picker task even when harness model is Opus.
    // Auto-engine + factory-chat may pass any-model llmAdapter; free-form
    // mints its own to keep the picker cost-bounded.
+   //
+   // §127 (v0.5.2): `ctx.model` overrides the FREE_FORM pin (highest
+   // priority). `FREE_FORM_MODEL_OVERRIDE` env var is the second-priority
+   // override (set by `eval-diff.mjs --model <id>`). Default Haiku-pin
+   // holds otherwise. Empirical result from §127 records the production
+   // default; if Opus wins, FREE_FORM_MODEL_DEFAULT flips.
+   const modelToUse = ctx.model || process.env.FREE_FORM_MODEL_OVERRIDE || FREE_FORM_MODEL_DEFAULT;
    let pickerAdapter = ctx.llmAdapter || null;
    try {
      const { createAdapter } = await import('../../../llm/llm-bridge.js');
-     pickerAdapter = await createAdapter({ model: FREE_FORM_MODEL });
+     pickerAdapter = await createAdapter({ model: modelToUse });
    } catch {
      // Adapter mint failed (proxy down, no key) — fall back to whatever
      // ctx provides. Strategy will emit `free-form-no-llm` if both fail.
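
The priority chain in the hunk above (`ctx.model`, then the env var, then the default) can be isolated into a small function for testing. This is a sketch, not code from the package: `resolveFreeFormModel` and its injectable `env` parameter are hypothetical, while the constant name, the env var name, and the two model ids are taken from the diff.

```javascript
const FREE_FORM_MODEL_DEFAULT = 'claude-haiku-4-5-20251001';

// Hypothetical helper. `env` defaults to process.env but can be injected,
// and it is read on every call rather than captured at module load.
function resolveFreeFormModel(ctx = {}, env = process.env) {
  return ctx.model || env.FREE_FORM_MODEL_OVERRIDE || FREE_FORM_MODEL_DEFAULT;
}

// ctx.model has the highest priority.
const fromCtx = resolveFreeFormModel(
  { model: 'claude-opus-4-7' },
  { FREE_FORM_MODEL_OVERRIDE: 'some-other-model' },
);

// The env var applies when ctx.model is absent.
const fromEnv = resolveFreeFormModel({}, { FREE_FORM_MODEL_OVERRIDE: 'claude-opus-4-7' });

// With neither set, the default pin holds.
const fromDefault = resolveFreeFormModel({}, {});

// Because the chain uses ||, an empty-string override also falls through.
const fromEmpty = resolveFreeFormModel({ model: '' }, { FREE_FORM_MODEL_OVERRIDE: '' });

console.log({ fromCtx, fromEnv, fromDefault, fromEmpty });
```

Using `||` rather than `??` means an empty `FREE_FORM_MODEL_OVERRIDE=` in the environment behaves as if the variable were unset.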
@@ -162,11 +177,51 @@ async function generateFreeFormAdapter(ctx) {
      // copy quotes the LLM's one-line rationale when present.
      usedIngredients: result.usedIngredients || [],
      rationale: result.rationale || null,
+     // §107a (v0.5.2): plan graduates to first-class so eval-diff's
+     // `--report-substitutions` can read it without `_debug` gating.
+     // The plan carries the LLM's emitted substitutions per ingredient
+     // — the substitution-coverage measurement needs this surface even
+     // when dialog-recorder is off (production eval scenarios).
+     plan: result.plan || null,
+     // §131 (v0.5.2): `_debug` is **volatile** — dialog-recorder-gated,
+     // shape may change without semver coordination. Consumers MUST read
+     // first-class fields instead: `usedIngredients` (§109 v0.5.1),
+     // `rationale` (§109 v0.5.1), `plan` (§107a v0.5.2). The old soft-API
+     // paths (`_debug.usedIngredients`, `_debug.rationale`, `_debug.plan`)
+     // were removed by §109 / §107a; this block is now strictly debug-
+     // recorder fodder. Scheduled removal: v0.6.0 will fold `attempts` +
+     // `warnings` into first-class result fields and drop `_debug` here.
+     //
+     // §146 (v0.5.3): finalize the v0.6.0 deprecation schedule for the
+     // last two `_debug.*` fields that any consumer might still read.
+     //
+     // @deprecated `_debug.attempts` and `_debug.warnings` — both fields
+     // are dialog-recorder-gated (silently `undefined` when not
+     // recording), which is exactly the access pattern §109+§107a
+     // removed for `usedIngredients`/`rationale`/`plan`. v0.6.0 folds
+     // both into first-class result fields:
+     //
+     // - `result.attempts: number` — number of LLM round-trips taken
+     //   to produce the final plan (1 normally; 2-3 with hallucination
+     //   retry or paraphrase-retry; same value as §106's loop counter)
+     // - `result.warnings: string[]` — non-fatal transpile findings
+     //   (unknown substitution keys, layout-value fall-throughs,
+     //   chunk-resolution warns); same value as transpilePlan output
+     //
+     // Consumers should switch reads from `result._debug?.attempts` →
+     // `result.attempts` and `result._debug?.warnings` → `result.warnings`
+     // before v0.6.0 ships. The `_debug` block itself disappears from
+     // the free-form-composed result shape entirely in v0.6.0; the
+     // dialog-recorder will read first-class fields directly.
+     //
+     // No in-repo consumers currently read these fields (verified
+     // `grep -rn '_debug\?.attempts\|_debug.attempts\|_debug\?.warnings\|_debug.warnings'
+     // apps/ playgrounds/ catalog/ packages/` → 0 hits at v0.5.3 cut).
+     // External consumers should migrate during the v0.5.3 window,
+     // before v0.6.0 ships.
      _debug: isRecording() ? {
        systemPrompt: null,
        rawLLMResponse: null,
        tokens: null,
-       plan: result.plan,
        attempts: result.attempts,
        warnings: result.warnings,
      } : undefined,
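
For consumers that must work on both sides of the v0.6.0 cut, one option is to prefer the first-class fields and fall back to the gated `_debug` block. This is a hedged sketch rather than a recommended API: `readComposeDiagnostics` is a hypothetical helper, and it assumes only what the comment above states, namely that `result.attempts` and `result.warnings` become first-class in v0.6.0 and that `_debug` is `undefined` when the dialog-recorder is off.

```javascript
// Hypothetical consumer-side helper. Prefers the first-class fields
// (scheduled for v0.6.0) and falls back to the recorder-gated `_debug`
// block, which may be undefined.
function readComposeDiagnostics(result) {
  return {
    attempts: result.attempts ?? result._debug?.attempts ?? null,
    warnings: result.warnings ?? result._debug?.warnings ?? [],
  };
}

// Made-up result shapes for illustration:
const recordingResult = { _debug: { attempts: 2, warnings: ['unknown substitution key'] } };
const notRecordingResult = { _debug: undefined };
const firstClassResult = { attempts: 1, warnings: [] };

const whileRecording = readComposeDiagnostics(recordingResult);
const whileNotRecording = readComposeDiagnostics(notRecordingResult);
const afterMigration = readComposeDiagnostics(firstClassResult);

console.log(whileRecording, whileNotRecording, afterMigration);
```

The `null` and `[]` fallbacks are a choice made for this example; they make the "recorder off" case explicit instead of silently yielding `undefined`.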
@@ -101,7 +101,17 @@ async function _loadAllNode() {
  
    function walk(dir, cb) {
      if (!fs.existsSync(dir)) return;
-     for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
+     // §160 (v0.5.3, F-S1 root cause): sort readdir output. fs.readdirSync
+     // returns entries in filesystem-defined order (variable across
+     // processes on APFS/ext4), which means chunk-loading order varied
+     // run-to-run. When two chunks tied on score for a given intent, the
+     // first-loaded won the tie-break — so the same intent could match
+     // chunk A in one process and chunk B in the next. Surfaced as
+     // ~40% probe-failure rate on admin-dashboard post-§143 (the new
+     // chunks compete with the original 4-row KPI composition).
+     const entries = fs.readdirSync(dir, { withFileTypes: true })
+       .sort((a, b) => a.name.localeCompare(b.name));
+     for (const entry of entries) {
        const p = path.join(dir, entry.name);
        if (entry.isDirectory()) walk(p, cb);
        else if (entry.name.endsWith('.json') && !entry.name.startsWith('_')) cb(p);
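
The sorted walk in the hunk above can be tried in isolation. The sketch below copies the walker's logic into a standalone function and runs it against a throwaway fixture. `walkSorted`, `demoWalk`, and the fixture file names are hypothetical, and CommonJS `require` is used only so the snippet runs as a plain script (the package itself is an ES module).

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Visit *.json files under `dir` in a stable, name-sorted order,
// skipping files whose names start with "_".
function walkSorted(dir, cb) {
  if (!fs.existsSync(dir)) return;
  const entries = fs
    .readdirSync(dir, { withFileTypes: true })
    .sort((a, b) => a.name.localeCompare(b.name));
  for (const entry of entries) {
    const p = path.join(dir, entry.name);
    if (entry.isDirectory()) walkSorted(p, cb);
    else if (entry.name.endsWith('.json') && !entry.name.startsWith('_')) cb(p);
  }
}

// Build a temporary fixture (made-up names, written out of order),
// walk it, clean up, and return the visited paths relative to the root.
function demoWalk() {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), 'walk-demo-'));
  fs.mkdirSync(path.join(root, 'sub'));
  const files = ['b.json', 'a.json', '_skip.json', path.join('sub', 'c.json')];
  for (const name of files) {
    fs.writeFileSync(path.join(root, name), '{}');
  }
  const seen = [];
  walkSorted(root, (p) => seen.push(path.relative(root, p)));
  fs.rmSync(root, { recursive: true, force: true });
  return seen;
}

console.log(demoWalk());
```

The visit order no longer depends on what `readdirSync` happens to return. One caveat worth keeping in mind: `localeCompare` ordering can depend on the runtime's locale data, so a plain code-unit comparison may be preferable where byte-for-byte reproducibility across environments is required.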