@adia-ai/a2ui-mcp 0.3.2 → 0.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -11,6 +11,70 @@ zettel strategies.

  _No pending changes._

+ ## [0.3.3] - 2026-05-07
+
+ **Lockstep cut.** All 9 published `@adia-ai/*` packages now share version `0.3.3`, governed by [`docs/specs/package-architecture.md` § 15](../../../docs/specs/package-architecture.md#15-versioning-policy). Internal `@adia-ai/*` ranges stay at `^0.3.0` (patch-cut asymmetry — caret floats `0.3.x`).
+
+ ### Added
+
+ - **`TOOLS.md`** — single-page reference for all 24 MCP tools
+ exposed by `server.js`. Tools grouped into 7 categories
+ (composition, retrieval, validation, refinement, telemetry,
+ catalog, debug) with descriptions extracted from `server.js`.
+ (closes backlog #96)
+
+ - **Smoke retrieval-quality probe** in
+ `scripts/smoke-engine-registry.mjs`. Runs 3 canonical intents
+ (login form, pricing tiers, sign-up form) through `generateUI()`
+ and verifies output text content overlaps intent keywords.
+ Catches retrieval regressions (wrong-domain top hit) that pure
+ shape-validation gates miss. 3/3 probes pass. (closes backlog #46)
+
+ ### Changed
+
+ - **README** — corrected stale 25-tool count to 24, removed phantom
+ `compose_from_chunks` entry from the table, replaced inline tool
+ table with pointer to `TOOLS.md`. Added "Multi-turn limitations"
+ Gotcha entry documenting per-process state-cache limitation
+ (session-store + state-cache are in-memory, ephemeral; restarting
+ the server drops every multi-turn chain in flight; consumers
+ needing durability should checkpoint to their own store + pass
+ `currentCanvas` on resume). (closes backlog #47)
+
+ - **`tools/synthesis.js`** — issue-reporter prompt vocabulary swept:
+ `'coherence-audit'` → `'ui-audit-coherence'` (matches the renamed
+ skill).
+
+ - **`scripts/smoke-issues.mjs`** — extended `AUTO_FIRE_POLICY exports
+ expected reasons` test to include the new
+ `iteration-synthesis-failure` reason (62/62 pass).
+
+ ## [0.3.2] - 2026-05-06
+
+ **9-package lockstep patch cut to v0.3.2.** All lockstep members share
+ one version per [`docs/specs/package-architecture.md` § 15](../../../docs/specs/package-architecture.md#15-versioning-policy).
+ Internal `@adia-ai/*` dep ranges unchanged at `^0.3.0`.
+
+ ### Added
+
+ - **Render-fidelity scoring** — `scripts/render-fidelity.mjs`: Playwright
+ headless evaluates synthesized HTML on console errors, blank-page
+ detection, and unregistered custom-element count. Integrated into
+ `eval-compose-from-chunks.mjs` at full 30% weight.
+ - **`eval:compose-from-chunks` framework** — 20-intent hold-out set with
+ real-LLM runner (structural 30% + coverage 20% + retrieval 20% +
+ render 30%). Threshold: 80. Result: 20/20 passing, avg 88.
+ - **`chunk-zettel` engine baseline** — comparison vs classic zettel on same
+ hold-out (88 avg / 20 passing vs 77 avg / 11 passing).
+ - **Zettel baseline runner** — `scripts/zettel-baseline.mjs` for cross-engine
+ comparison.
+ - **Nightly eval CI** — `.github/workflows/nightly-eval.yml` with stub
+ eval + zettel-diff + summary jobs.
+
+ ### Changed
+
+ - `version`: `0.3.1` → `0.3.2`.
+
  ## [0.3.1] - 2026-05-06

  **9-package lockstep patch cut.** All 9 published `@adia-ai/*` packages bump 0.3.0 → 0.3.1 per [`docs/specs/package-architecture.md` § 15](../../../docs/specs/package-architecture.md#15-versioning-policy). Internal `@adia-ai/*` dep ranges remain at `^0.3.0` (covers `0.3.1` under semver — patch-cut asymmetry).
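The lockstep entries rely on caret-range semantics: an internal range of `^0.3.0` accepts every `0.3.x` patch cut (0.3.1, 0.3.2, 0.3.3) but not a `0.4.0` minor cut, because for a `0.y.z` base with `y > 0` the caret only floats the patch component. The following is a simplified sketch of that rule, assuming plain `MAJOR.MINOR.PATCH` strings with no prerelease or build tags; `satisfiesCaret` is a hypothetical helper for illustration, not the API of npm's `semver` package:

```javascript
// Hypothetical helper: simplified caret-range check for "^X.Y.Z" ranges.
// Ignores prerelease/build metadata; real tooling should use npm's `semver`.
function parse(version) {
  const parts = version.split('.').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  return parts;
}

// Lexicographic comparison of [major, minor, patch] triples.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

function satisfiesCaret(version, range) {
  if (!range.startsWith('^')) throw new Error(`not a caret range: ${range}`);
  const base = parse(range.slice(1));
  const v = parse(version);
  // Exclusive upper bound: bump the left-most non-zero component of the base.
  const [major, minor, patch] = base;
  const upper =
    major > 0 ? [major + 1, 0, 0]
    : minor > 0 ? [0, minor + 1, 0]
    : [0, 0, patch + 1];
  return compare(v, base) >= 0 && compare(v, upper) < 0;
}

for (const v of ['0.3.0', '0.3.1', '0.3.3', '0.4.0']) {
  console.log(v, satisfiesCaret(v, '^0.3.0'));
}
```

Running it logs `true` for 0.3.0, 0.3.1 and 0.3.3 and `false` for 0.4.0, so a future minor cut would not be picked up by the existing `^0.3.0` internal ranges.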
@@ -396,7 +460,7 @@ ADR: [`0008-multiturn-genui-architecture.md`](../../../.brain/adrs/0008-multitur
  reporter kinds: LLM self-fire (this tool with `reporter: 'llm'`),
  consumer-fire (passed through directly), engine auto-fire (internal,
  per `AUTO_FIRE_POLICY` in the issue-reporter module). Severity vocabulary
- `blocker | drift | nit` matches the existing `coherence-audit`
+ `blocker | drift | nit` matches the existing `ui-audit-coherence`
  discipline. Trace levels: `'full' | 'summary' | 'none'`; oversized
  traces (> 200 KB) spill to a sidecar `.trace.json` file. Tool count
  goes from 25 → 28.
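The 0.3.2 entry describes the `eval:compose-from-chunks` score as a weighted sum (structural 30% + coverage 20% + retrieval 20% + render 30%) gated at 80. A minimal sketch of that arithmetic, assuming each sub-score is already normalized to 0-100 and that passing means a composite of at least 80; the sub-score keys and the `>=` comparison are assumptions, not taken from `eval-compose-from-chunks.mjs`:

```javascript
// Weights as listed in the 0.3.2 changelog entry; they sum to 1.0.
const WEIGHTS = { structural: 0.3, coverage: 0.2, retrieval: 0.2, render: 0.3 };
const THRESHOLD = 80;

// Assumption: every sub-score is on a 0-100 scale.
function compositeScore(subScores) {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const s = subScores[name];
    if (typeof s !== 'number' || s < 0 || s > 100) {
      throw new Error(`missing or out-of-range sub-score: ${name}`);
    }
    total += weight * s;
  }
  return total;
}

function passes(subScores) {
  return compositeScore(subScores) >= THRESHOLD;
}

// Example: strong structure and rendering, weaker retrieval.
const example = { structural: 95, coverage: 80, retrieval: 70, render: 100 };
console.log(compositeScore(example).toFixed(1), passes(example));
```

Under this sketch the example scores 88.5 and passes. Because render carries the same 30% weight as structure, a blank page (render 0) caps the composite at 70 even with perfect scores elsewhere, which falls below the threshold.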
package/README.md CHANGED
@@ -42,35 +42,19 @@ export GEMINI_API_KEY=AIza…

  ## Tools

- The server registers 25 tools. Shape is stable; argument schemas via Zod.
-
- | Tool | What it does |
- |-------------------------|-----------------------------------------------------------|
- | `generate_ui` | Intent → A2UI tree. Engine (`monolithic`/`zettel`) + mode. |
- | `validate_schema` | Run the 15-check validator on an A2UI tree; returns 0-100. |
- | `classify_intent` | Extract concepts, entities, implied components, steelman. |
- | `lookup_component` | Resolve a component name (alias-aware) to its schema. |
- | `get_component_map` | Full tag→class map including alias normalizations. |
- | `search_patterns` | Keyword-rank the monolithic pattern corpus. |
- | `assemble_context` | Build the system prompt context for a given intent. |
- | `check_anti_patterns` | Scan a tree for canonical anti-patterns (chart-legend, …). |
- | `get_traits` | List trait catalog + their host-binding rules. |
- | `convert_html` | Raw HTML → best-effort A2UI tree (import path). |
- | `get_wiring_catalog` | Declarative wiring-engine recipes. |
- | `import_pattern` | Commit a generated result into the pattern library. |
- | `submit_feedback` | Append a user-feedback event to the feedback store. |
- | `get_quality_metrics` | Aggregate pass/fail scores over a window. |
- | `get_training_gaps` | Intents that currently miss coverage. |
- | `run_eval` | Run the held-out benchmark; return pass/fail per intent. |
- | `get_fragment` | Fetch a single zettel fragment by id. |
- | `get_composition` | Fetch a named multi-fragment composition. |
- | `resolve_composition` | Expand a composition reference into its fragments. |
- | `get_graph` | Dump the zettel fragment-dependency graph. |
- | `zettel_stats` | Corpus counts (fragments, compositions, reuse ratio, …). |
- | **`search_chunks`** | Semantic + keyword search over the gen-UI training-chunk corpus (since 0.0.2). |
- | **`get_chunk`** | Full record (HTML + metadata + slots) for one chunk. |
- | **`lookup_chunk`** | List every chunk whose primary element is `<component>`. |
- | **`compose_from_chunks`** | Retrieval-first / LLM-mix-and-match composition. Picks a page chunk + binds block/panel chunks to its slots when retrieval is weak. Validator enforces slot+kind contracts. **Embedding-blended retrieval as of 0.0.3** (was keyword-only in 0.0.2). |
+ The server registers **24 tools**. Argument schemas via Zod; shape is stable across the v0.3.x line.
+
+ **See [`TOOLS.md`](./TOOLS.md) for the full reference** — tool names, descriptions, grouping, and source pointers. Quick map:
+
+ | Group | Tools |
+ |---|---|
+ | **Generation** | `generate_ui` |
+ | **Discovery** | `get_component_map`, `lookup_component`, `lookup_chunk`, `get_traits`, `get_wiring_catalog` |
+ | **Retrieval** | `search_chunks`, `get_chunk`, `search_patterns`, `get_fragment`, `get_composition`, `get_graph`, `resolve_composition`, `zettel_stats` |
+ | **Intent + context** | `classify_intent`, `assemble_context` |
+ | **Validation + conversion** | `validate_schema`, `check_anti_patterns`, `convert_html` |
+ | **Feedback + evaluation** | `submit_feedback`, `get_quality_metrics`, `get_training_gaps`, `run_eval` |
+ | **Authoring** | `import_pattern` |

  ## Layout

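As a quick consistency check on the new README quick map, its groups can be transcribed into a plain object and counted. This is an illustrative sketch: `TOOL_GROUPS` is a hypothetical name, and the authoritative list is `TOOLS.md` / `server.js`, not this transcription. It confirms that the seven rows name 24 distinct tools and that the removed `compose_from_chunks` entry is absent:

```javascript
// Groups transcribed by hand from the README quick map (0.3.3).
const TOOL_GROUPS = {
  'Generation': ['generate_ui'],
  'Discovery': ['get_component_map', 'lookup_component', 'lookup_chunk', 'get_traits', 'get_wiring_catalog'],
  'Retrieval': ['search_chunks', 'get_chunk', 'search_patterns', 'get_fragment',
    'get_composition', 'get_graph', 'resolve_composition', 'zettel_stats'],
  'Intent + context': ['classify_intent', 'assemble_context'],
  'Validation + conversion': ['validate_schema', 'check_anti_patterns', 'convert_html'],
  'Feedback + evaluation': ['submit_feedback', 'get_quality_metrics', 'get_training_gaps', 'run_eval'],
  'Authoring': ['import_pattern'],
};

// Flatten all groups; a Set exposes any tool listed in two groups.
const allTools = Object.values(TOOL_GROUPS).flat();
const unique = new Set(allTools);

console.log(`groups=${Object.keys(TOOL_GROUPS).length} tools=${allTools.length} unique=${unique.size}`);
console.log('compose_from_chunks listed:', unique.has('compose_from_chunks'));
```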
@@ -141,6 +125,20 @@ On start, the server:
  names in the registry guard against accidental shadowing.
  - **Validator score < 70 is still returned.** Consumers should gate on
  the `passed` boolean or raw `score` — the tool doesn't auto-retry.
+ - **Multi-turn state is per-process, in-memory, ephemeral.** Both the
+ zettel `state-cache` (keyed by `state_id` for refinement chains) and
+ the zettel `session-store` (keyed by `sessionId` for follow-up
+ iteration) live in the MCP process's heap. **Restarting the server
+ drops every multi-turn chain in flight.** Consumers that need
+ durability across restarts should checkpoint the canvas to their own
+ store between turns and pass `currentCanvas` on resume rather than
+ relying on `executionId` / `sessionId`. A future phase may add
+ file-backed persistence as opt-in via `A2UI_STATE_CACHE_PATH`; not
+ implemented today.
+ - **`session-store` and `state-cache` overlap intentionally.** Different
+ lifecycles (MCP-call multi-turn vs engine-internal stable-id resolution),
+ different APIs, kept separate by design — see `state-cache.js` and
+ `session-store.js` for the contracts.

  ## Evals + regression floors

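The new multi-turn gotcha recommends checkpointing the canvas on the consumer side and passing `currentCanvas` on resume. A minimal sketch of that pattern, assuming the canvas is JSON-serializable and that the follow-up call takes an `intent` plus an optional `currentCanvas`. The `Map` stands in for a durable store, the canvas shape is made up, and `saveCheckpoint`, `loadCheckpoint`, `buildNextTurnArgs`, and the `conversationId` key are hypothetical names, not part of this package:

```javascript
// Stand-in for a durable consumer-side store (database, file, KV).
// Unlike the MCP server's in-memory session-store/state-cache, the real
// store is assumed to survive a server restart.
const checkpoints = new Map();

// Persist a snapshot of the canvas after each turn. Serializing to JSON
// decouples the checkpoint from later in-place mutations of `canvas`.
function saveCheckpoint(conversationId, canvas) {
  checkpoints.set(conversationId, JSON.stringify(canvas));
}

function loadCheckpoint(conversationId) {
  const raw = checkpoints.get(conversationId);
  return raw === undefined ? undefined : JSON.parse(raw);
}

// Build arguments for the next turn. Instead of relying on a server-held
// sessionId/executionId, resend the last checkpointed canvas.
function buildNextTurnArgs(conversationId, intent) {
  const currentCanvas = loadCheckpoint(conversationId);
  return currentCanvas === undefined ? { intent } : { intent, currentCanvas };
}

// Turn 1 produced a canvas; the server then restarts and loses its cache.
saveCheckpoint('conv-1', { messages: [{ components: [{ label: 'Email' }] }] });
console.log(JSON.stringify(buildNextTurnArgs('conv-1', 'add a password field')));
```

Keeping the checkpoint on the consumer side means a restart only costs the server's cached state, not the conversation: the next call carries everything the engine needs to continue.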
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@adia-ai/a2ui-mcp",
- "version": "0.3.2",
+ "version": "0.3.3",
  "description": "AdiaUI A2UI MCP server. Exposes the compose engine over MCP with an engine selector for monolithic + zettel strategies.",
  "type": "module",
  "bin": {
@@ -33,4 +33,4 @@
  "@adia-ai/llm": "^0.3.0",
  "zod": "^3.24.0"
  }
- }
+ }
@@ -40,4 +40,39 @@ const ok =
  Array.isArray(r2.messages) && r2.validation;
  console.log(`\n[smoke] shape invariants: ${ok ? 'ok' : 'FAIL'}`);

- if (!ok) process.exit(1);
+ // Retrieval-quality probe — for each canonical intent, the generated
+ // component tree's text content must overlap the intent's keywords.
+ // This catches retrieval regressions (wrong-domain top hit) that pure
+ // shape-validation gates miss.
+ const RETRIEVAL_PROBES = [
+   { intent: 'login form with email and password', engine: 'zettel', expectKeywords: ['sign in', 'login', 'email', 'password'] },
+   { intent: 'pricing tiers with three plans', engine: 'zettel', expectKeywords: ['pricing', 'tier', 'plan', 'starter', 'pro', 'enterprise', '$'] },
+   { intent: 'sign up form for a new account', engine: 'zettel', expectKeywords: ['sign up', 'register', 'create account', 'email'] },
+ ];
+
+ function extractText(messages) {
+   const parts = [];
+   for (const msg of messages || []) {
+     for (const c of msg.components || []) {
+       if (c.textContent) parts.push(String(c.textContent));
+       if (c.label) parts.push(String(c.label));
+       if (c.placeholder) parts.push(String(c.placeholder));
+       if (c.text) parts.push(String(c.text));
+     }
+   }
+   return parts.join(' ').toLowerCase();
+ }
+
+ let probeOk = true;
+ for (const probe of RETRIEVAL_PROBES) {
+   const r = await generateUI({ intent: probe.intent, engine: probe.engine });
+   const text = extractText(r.messages);
+   const matched = probe.expectKeywords.some((k) => text.includes(k.toLowerCase()));
+   const tag = matched ? 'ok' : 'FAIL';
+   const preview = text.slice(0, 60).replace(/\s+/g, ' ');
+   console.log(`[smoke/retrieval] "${probe.intent.slice(0, 38)}…" → strategy=${r.strategy} text="${preview}…" ${matched ? '✓' : `✗ expected one of [${probe.expectKeywords.slice(0, 3).join(', ')}]`} [${tag}]`);
+   if (!matched) probeOk = false;
+ }
+ console.log(`\n[smoke] retrieval-quality probes: ${probeOk ? 'ok' : 'FAIL'}`);
+
+ if (!ok || !probeOk) process.exit(1);
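The probe passes when at least one expected keyword occurs as a substring of the lower-cased text collected from `textContent`, `label`, `placeholder`, and `text`. The sketch below reuses the smoke script's `extractText` logic on two hand-built message arrays (the component objects are made-up fixtures, not real `generateUI()` output; `probeMatches` is a hypothetical wrapper) to show what passes and what fails:

```javascript
// Same collection logic as extractText in the smoke script.
function extractText(messages) {
  const parts = [];
  for (const msg of messages || []) {
    for (const c of msg.components || []) {
      if (c.textContent) parts.push(String(c.textContent));
      if (c.label) parts.push(String(c.label));
      if (c.placeholder) parts.push(String(c.placeholder));
      if (c.text) parts.push(String(c.text));
    }
  }
  return parts.join(' ').toLowerCase();
}

// Same matching rule as the probe loop: any keyword, case-insensitive substring.
function probeMatches(messages, expectKeywords) {
  const text = extractText(messages);
  return expectKeywords.some((k) => text.includes(k.toLowerCase()));
}

// Hypothetical fixtures standing in for generateUI() output.
const loginTree = [{ components: [{ textContent: 'Sign In' }, { label: 'Email' }, { placeholder: 'Password' }] }];
const pricingTree = [{ components: [{ textContent: 'Choose a plan' }, { text: '$29 / month' }] }];

const loginKeywords = ['sign in', 'login', 'email', 'password'];
console.log(probeMatches(loginTree, loginKeywords));   // true
console.log(probeMatches(pricingTree, loginKeywords)); // false
```

Because `includes` is a substring test, short keywords are permissive: `'pro'` in the pricing probe also matches words such as `product` or `profile`, so a wrong-domain tree can still pass if it happens to contain one.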
@@ -257,7 +257,7 @@ console.log('\n=== AUTO_FIRE_POLICY exported ===');
  t('AUTO_FIRE_POLICY exports expected reasons',
  ['synthesizer-exhausted', 'validator-exhausted', 'locator-empty-targets',
  'retrieval-zero-then-synthesis-fail', 'cache-miss-on-known-state',
- 'ops-failed-after-apply'].every((r) => AUTO_FIRE_POLICY[r])
+ 'ops-failed-after-apply', 'scope-drift', 'iteration-synthesis-failure'].every((r) => AUTO_FIRE_POLICY[r])
  );

  await rm(TMP, { recursive: true, force: true });
@@ -401,7 +401,7 @@ The tool returns BOTH paths in its response: \`path\` (.json) and \`markdown_pat
  • Trace report: \`{markdown_path}\` ← human-readable, scan this first
  • Raw JSON: \`{path}\` ← machine-readable

- Issue files land under \`.brain/audit-history/issues/\` (immutable; resolution lands in a sidecar file). Severity taxonomy matches the project's coherence-audit vocabulary: blocker = contract violation; drift = quality erosion; nit = cosmetic.`,
+ Issue files land under \`.brain/audit-history/issues/\` (immutable; resolution lands in a sidecar file). Severity taxonomy matches the project's ui-audit-coherence vocabulary: blocker = contract violation; drift = quality erosion; nit = cosmetic.`,
  {
  type: z.enum(['bug', 'training-gap', 'protocol-gap', 'ux-feedback']).describe('Issue category'),
  severity: z.enum(['blocker', 'drift', 'nit']).describe('Severity tier'),
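The changelog context for this issue-reporter tool notes that traces over 200 KB spill to a sidecar `.trace.json` file instead of staying inline. A minimal sketch of that inline-versus-sidecar decision, assuming the limit applies to the UTF-8 byte length of the serialized JSON and that 200 KB means 200 * 1024 bytes; `placeTrace`, its return shape, and the sidecar naming scheme are hypothetical, and the function performs no file I/O:

```javascript
// Assumption: "200 KB" interpreted as 200 * 1024 bytes of UTF-8 JSON.
const TRACE_INLINE_LIMIT_BYTES = 200 * 1024;

// Decide whether a trace stays inline in the issue record or spills to a
// sidecar file next to it. Pure function: the caller does the actual I/O.
function placeTrace(trace, issuePath) {
  const serialized = JSON.stringify(trace);
  const bytes = new TextEncoder().encode(serialized).length;
  if (bytes <= TRACE_INLINE_LIMIT_BYTES) {
    return { mode: 'inline', bytes, trace };
  }
  // Hypothetical naming: replace a trailing ".json" with ".trace.json".
  const sidecarPath = issuePath.replace(/\.json$/, '') + '.trace.json';
  return { mode: 'sidecar', bytes, sidecarPath };
}

const small = placeTrace({ steps: ['retrieve', 'synthesize'] }, 'issue-001.json');
const large = placeTrace({ blob: 'x'.repeat(300 * 1024) }, 'issue-002.json');
console.log(small.mode, large.mode, large.sidecarPath);
```

Measuring encoded bytes rather than `serialized.length` keeps the limit honest for non-ASCII trace content, where one character can take several UTF-8 bytes.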