@adia-ai/a2ui-mcp 0.0.2 → 0.0.3

package/CHANGELOG.md CHANGED
@@ -11,6 +11,59 @@ zettel strategies.
 
  ---
 
+ ## [0.0.3] - 2026-04-28
+
+ Same-day follow-up to `0.0.2` shipped earlier today. Wires the corpus
+ embedding vectors (from `@adia-ai/a2ui-corpus@0.0.4`) into the chunk
+ retrieval path so the synthesizer mixes-and-matches against semantic
+ similarity, not just keyword overlap.
+
+ ### Added
+
+ - **`chunk-embedding-retriever.js`** (`packages/a2ui/retrieval/`) — sibling
+ to the existing pattern-embedding retriever. Lazy-loads
+ `chunk-embeddings.json`, scores cosine similarity against the same
+ provider that built the index. Graceful no-op when index is missing or
+ no API key is available.
+ - **`searchChunksAsync(query, opts)`** in `chunk-library.js` — embedding-
+ blended search. Returns top candidates ranked by `keyword + 5×cosine`
+ when embeddings are available; falls back to keyword-only otherwise.
+ Sync `searchChunks()` is preserved for callers that want the keyword
+ floor without the network round-trip.
+
+ ### Changed
+
+ - `chunk-synthesizer.composeFromIntent()` — both tier-1 retrieval AND
+ tier-2 pre-search now go through `searchChunksAsync` instead of the
+ keyword-only `searchChunks`. The pre-search filters the prompt token
+ budget more accurately; the retrieval-first path catches more direct
+ hits before the LLM is invoked.
+
+ ### Measured impact
+
+ `eval:chunk-synthesis` (10 hold-out intents, real Anthropic LLM):
+
+ | | 0.0.2 (keyword) | **0.0.3 (semantic)** |
+ |---|---|---|
+ | Retrieval-tier hits | 7/10 | **9/10** |
+ | Synthesis-tier attempts | 3 | **1** |
+ | Synthesis plans validated | 3/3 | 1/1 |
+ | Total time | 5.0s | 5.1s |
+
+ Examples that flipped from synthesis → direct retrieval:
+ - "kpi grid with 4 stat cards" → matches `dashboard-kpi-grid` cleanly.
+ - "conversion funnel chart" → matches `dashboard-funnel`.
+
+ The remaining synthesis intent ("recovery page with backup-code form +
+ contact-support link") is genuinely novel — it composes from parts that
+ don't have a 1:1 chunk and exercises the slot-validator path correctly.
+
+ ### Dependencies
+
+ - Bumps `@adia-ai/a2ui-corpus` requirement from `^0.0.3` to `^0.0.4`.
+
+ ---
+
  ## [0.0.2] - 2026-04-28
 
  Adds **gen-UI training-chunk tools** that expose the new chunk corpus
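
The `keyword + 5×cosine` ranking described in the 0.0.3 changelog entry can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's code: the names `cosine` and `blendedRank`, the candidate shape `{ id, keywordScore, vector }`, and the use of a `null` query vector to signal "no embeddings available" are all hypothetical. Only the blend weight of 5 and the keyword-only fallback come from the changelog.

```javascript
// Hypothetical sketch of embedding-blended ranking. Names and data shapes are
// assumptions; only the `keyword + 5 × cosine` formula is from the changelog.
const COSINE_WEIGHT = 5;

// Cosine similarity of two equal-length numeric vectors (0 if either is all zeros).
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank candidates by blended score. When queryVector is null (for example,
// no index or no API key), the score reduces to the keyword score alone.
function blendedRank(candidates, queryVector) {
  return candidates
    .map((c) => {
      const semantic = queryVector && c.vector ? cosine(queryVector, c.vector) : 0;
      return { id: c.id, score: c.keywordScore + COSINE_WEIGHT * semantic };
    })
    .sort((x, y) => y.score - x.score);
}

// Made-up candidates for illustration only.
const candidates = [
  { id: 'chunk-a', keywordScore: 2, vector: [1, 0] },
  { id: 'chunk-b', keywordScore: 1, vector: [0, 1] },
];
console.log(blendedRank(candidates, [0, 1]).map((r) => r.id)); // chunk-b ranks first
console.log(blendedRank(candidates, null).map((r) => r.id));   // chunk-a ranks first
```

With a weight of 5, a strong semantic match can outrank a candidate that has a higher keyword score, which is consistent with the two intents the changelog reports as flipping from synthesis to direct retrieval.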
package/README.md CHANGED
@@ -42,31 +42,35 @@ export GEMINI_API_KEY=AIza…
 
  ## Tools
 
- The server registers 21 tools. Shape is stable; argument schemas via Zod.
-
- | Tool | What it does |
- |------------------------|-----------------------------------------------------------|
- | `generate_ui` | Intent → A2UI tree. Engine (`monolithic`/`zettel`) + mode. |
- | `validate_schema` | Run the 15-check validator on an A2UI tree; returns 0-100. |
- | `classify_intent` | Extract concepts, entities, implied components, steelman. |
- | `lookup_component` | Resolve a component name (alias-aware) to its schema. |
- | `get_component_map` | Full tag→class map including alias normalizations. |
- | `search_patterns` | Keyword-rank the monolithic pattern corpus. |
- | `assemble_context` | Build the system prompt context for a given intent. |
- | `check_anti_patterns` | Scan a tree for canonical anti-patterns (chart-legend, …). |
- | `get_traits` | List trait catalog + their host-binding rules. |
- | `convert_html` | Raw HTML → best-effort A2UI tree (import path). |
- | `get_wiring_catalog` | Declarative wiring-engine recipes. |
- | `import_pattern` | Commit a generated result into the pattern library. |
- | `submit_feedback` | Append a user-feedback event to the feedback store. |
- | `get_quality_metrics` | Aggregate pass/fail scores over a window. |
- | `get_training_gaps` | Intents that currently miss coverage. |
- | `run_eval` | Run the held-out benchmark; return pass/fail per intent. |
- | `get_fragment` | Fetch a single zettel fragment by id. |
- | `get_composition` | Fetch a named multi-fragment composition. |
- | `resolve_composition` | Expand a composition reference into its fragments. |
- | `get_graph` | Dump the zettel fragment-dependency graph. |
- | `zettel_stats` | Corpus counts (fragments, compositions, reuse ratio, …). |
+ The server registers 25 tools. Shape is stable; argument schemas via Zod.
+
+ | Tool | What it does |
+ |-------------------------|-----------------------------------------------------------|
+ | `generate_ui` | Intent → A2UI tree. Engine (`monolithic`/`zettel`) + mode. |
+ | `validate_schema` | Run the 15-check validator on an A2UI tree; returns 0-100. |
+ | `classify_intent` | Extract concepts, entities, implied components, steelman. |
+ | `lookup_component` | Resolve a component name (alias-aware) to its schema. |
+ | `get_component_map` | Full tag→class map including alias normalizations. |
+ | `search_patterns` | Keyword-rank the monolithic pattern corpus. |
+ | `assemble_context` | Build the system prompt context for a given intent. |
+ | `check_anti_patterns` | Scan a tree for canonical anti-patterns (chart-legend, …). |
+ | `get_traits` | List trait catalog + their host-binding rules. |
+ | `convert_html` | Raw HTML → best-effort A2UI tree (import path). |
+ | `get_wiring_catalog` | Declarative wiring-engine recipes. |
+ | `import_pattern` | Commit a generated result into the pattern library. |
+ | `submit_feedback` | Append a user-feedback event to the feedback store. |
+ | `get_quality_metrics` | Aggregate pass/fail scores over a window. |
+ | `get_training_gaps` | Intents that currently miss coverage. |
+ | `run_eval` | Run the held-out benchmark; return pass/fail per intent. |
+ | `get_fragment` | Fetch a single zettel fragment by id. |
+ | `get_composition` | Fetch a named multi-fragment composition. |
+ | `resolve_composition` | Expand a composition reference into its fragments. |
+ | `get_graph` | Dump the zettel fragment-dependency graph. |
+ | `zettel_stats` | Corpus counts (fragments, compositions, reuse ratio, …). |
+ | **`search_chunks`** | Semantic + keyword search over the gen-UI training-chunk corpus (since 0.0.2). |
+ | **`get_chunk`** | Full record (HTML + metadata + slots) for one chunk. |
+ | **`lookup_chunk`** | List every chunk whose primary element is `<component>`. |
+ | **`compose_from_chunks`** | Retrieval-first / LLM-mix-and-match composition. Picks a page chunk + binds block/panel chunks to its slots when retrieval is weak. Validator enforces slot+kind contracts. **Embedding-blended retrieval as of 0.0.3** (was keyword-only in 0.0.2). |
 
  ## Layout
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@adia-ai/a2ui-mcp",
- "version": "0.0.2",
+ "version": "0.0.3",
  "description": "AdiaUI A2UI MCP server. Exposes the compose engine over MCP with an engine selector for monolithic + zettel strategies.",
  "type": "module",
  "bin": {
@@ -29,7 +29,7 @@
  "@adia-ai/a2ui-compose": "^0.0.1",
  "@adia-ai/a2ui-retrieval": "^0.0.1",
  "@adia-ai/a2ui-validator": "^0.0.1",
- "@adia-ai/a2ui-corpus": "^0.0.3",
+ "@adia-ai/a2ui-corpus": "^0.0.4",
  "zod": "^3.24.0"
  }
  }
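
The dependency bump matters because, under npm-style semver, a caret range on a `0.0.x` version pins that exact patch: `^0.0.3` means `>=0.0.3 <0.0.4`, so it would not pick up `@adia-ai/a2ui-corpus@0.0.4`. The sketch below illustrates that rule for plain `x.y.z` versions only (no prerelease or build tags). `parseVersion` and `caretAllows` are hypothetical helpers written for this note, not part of npm or of this package.

```javascript
// Minimal sketch of npm-style caret matching for plain x.y.z versions.
// Hypothetical helpers; prerelease and build metadata are not handled.
function parseVersion(v) {
  return v.split('.').map(Number);
}

function caretAllows(range, version) {
  const [rMaj, rMin, rPat] = parseVersion(range.replace(/^\^/, ''));
  const [maj, min, pat] = parseVersion(version);

  // The version must be at least the base version of the range.
  const atLeastBase =
    maj > rMaj ||
    (maj === rMaj && (min > rMin || (min === rMin && pat >= rPat)));
  if (!atLeastBase) return false;

  // Caret pins the left-most non-zero component of the base version.
  if (rMaj > 0) return maj === rMaj;
  if (rMin > 0) return maj === 0 && min === rMin;
  return maj === 0 && min === 0 && pat === rPat;
}

console.log(caretAllows('^0.0.3', '0.0.4'));   // false: old range excludes the new corpus
console.log(caretAllows('^0.0.4', '0.0.4'));   // true
console.log(caretAllows('^3.24.0', '3.25.1')); // true: for 1.x and above, minor and patch may float
```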
@@ -0,0 +1,100 @@
+ #!/usr/bin/env node
+ /**
+  * Real-LLM eval set for the chunk-aware composition synthesizer.
+  *
+  * Walks a hold-out set of intents that DON'T have a 1:1 chunk match in the
+  * corpus and exercises the full retrieval-first → synthesis-fallback path.
+  * For each intent, records:
+  * - which tier handled it (retrieval direct hit vs LLM synthesis)
+  * - whether HTML was produced
+  * - whether the synthesizer's plan validated against slot/kind contracts
+  * - composed HTML byte length (rough proxy for "not empty")
+  *
+  * Pass criterion: ≥ 80% of intents produce non-empty HTML; all synthesizer
+  * plans pass the slot+kind validator.
+  *
+  * Usage:
+  *   node packages/a2ui/mcp/scripts/eval-chunk-synthesis.mjs
+  *   ANTHROPIC_API_KEY=… node packages/a2ui/mcp/scripts/eval-chunk-synthesis.mjs
+  */
+
+ import '../../../../scripts/load-env.mjs';
+ import { composeFromIntent } from '../../compose/engines/zettel/chunk-synthesizer.js';
+ import { createAdapter } from '../../compose/llm/llm-bridge.js';
+
+ // Hold-out intents — chosen to NOT have a 1:1 chunk match (so synthesis path
+ // is exercised). Mix of dashboard-shape, auth-shape, and novel composites.
+ const INTENTS = [
+   // Dashboard composites that don't exist as a single chunk:
+   'admin dashboard with KPI grid and conversion funnel',
+   'analytics page with audience KPIs and a country list',
+   'reports page with a transactions table and a sparkline grid',
+
+   // Auth-card variations that aren't pre-baked:
+   'sign-in page with email + password + remember-me + magic-link alternative',
+   'recovery page with backup-code form and contact-support link',
+
+   // Generic page-shaped composites:
+   'page with a sticky header and a tabbed body',
+   'data-rich settings page with a members table and an integrations grid',
+
+   // Things that should retrieve directly (control group):
+   'kpi grid with 4 stat cards', // → likely matches dashboard-kpi-grid via search
+   'conversion funnel chart', // → dashboard-funnel
+   'sign in form with email', // → auth-signin-card-email
+ ];
+
+ const startedAt = Date.now();
+ const results = [];
+
+ console.log(`▶ chunk-synthesis eval — ${INTENTS.length} intents\n`);
+
+ const llmAdapter = await createAdapter();
+
+ for (const intent of INTENTS) {
+   const t0 = Date.now();
+   let row = { intent, ms: 0, source: null, hasHtml: false, htmlBytes: 0, planValid: null, error: null };
+   try {
+     const result = await composeFromIntent({ intent, llmAdapter, maxAttempts: 2 });
+     row.ms = Date.now() - t0;
+     row.source = result.source;
+     row.hasHtml = !!result.html;
+     row.htmlBytes = result.html ? result.html.length : 0;
+     row.planValid = result.synthesis ? !!result.synthesis.validation?.ok : null;
+     row.warnings = (result.warnings || []).length;
+   } catch (e) {
+     row.ms = Date.now() - t0;
+     row.error = e.message;
+   }
+   results.push(row);
+   const flag = row.hasHtml ? '✓' : '✗';
+   const tag = row.source === 'retrieval' ? '[ret]' : row.source === 'synthesis' ? '[syn]' : '[err]';
+   console.log(` ${flag} ${tag} ${row.ms.toString().padStart(5)}ms ${intent}`);
+   if (row.error) console.log(` error: ${row.error}`);
+   if (row.warnings) console.log(` ${row.warnings} warning(s)`);
+ }
+
+ const passed = results.filter((r) => r.hasHtml).length;
+ const passRate = passed / results.length;
+ const synthAttempts = results.filter((r) => r.source === 'synthesis');
+ const retrievalAttempts = results.filter((r) => r.source === 'retrieval');
+ const synthValidPlans = synthAttempts.filter((r) => r.planValid).length;
+
+ console.log(`\n── Summary ──`);
+ console.log(` Total intents: ${results.length}`);
+ console.log(` Produced HTML: ${passed} (${(passRate * 100).toFixed(0)}%)`);
+ console.log(` Retrieval-tier hits: ${retrievalAttempts.length}`);
+ console.log(` Synthesis-tier attempts: ${synthAttempts.length}`);
+ if (synthAttempts.length) {
+   console.log(` Synthesis plans validated: ${synthValidPlans}/${synthAttempts.length}`);
+ }
+ console.log(` Total time: ${((Date.now() - startedAt) / 1000).toFixed(1)}s`);
+
+ const passThreshold = 0.8;
+ if (passRate >= passThreshold) {
+   console.log(`\n✓ PASS — ≥${passThreshold * 100}% intents produced HTML`);
+   process.exit(0);
+ } else {
+   console.log(`\n✗ FAIL — pass rate ${(passRate * 100).toFixed(0)}% < ${passThreshold * 100}% threshold`);
+   process.exit(1);
+ }
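
The summary arithmetic at the end of the eval script can be exercised without an API key by factoring it into a pure function. The sketch below is one way to do that, not something the package ships: `summarize` is a hypothetical helper, the row fields (`source`, `hasHtml`, `planValid`) mirror the ones the script records, and the sample rows are made up for illustration rather than taken from a real run.

```javascript
// Sketch: the eval script's summary logic as a pure, testable function.
// `summarize` is a hypothetical helper; row fields mirror the script's rows.
function summarize(rows, passThreshold = 0.8) {
  const passed = rows.filter((r) => r.hasHtml).length;
  const synth = rows.filter((r) => r.source === 'synthesis');
  const passRate = passed / rows.length;
  return {
    total: rows.length,
    passed,
    passRate,
    retrievalHits: rows.filter((r) => r.source === 'retrieval').length,
    synthAttempts: synth.length,
    synthValidPlans: synth.filter((r) => r.planValid).length,
    ok: passRate >= passThreshold, // same comparison the script uses for its exit code
  };
}

// Made-up rows for illustration only (not measured results).
const sampleRows = [
  { source: 'retrieval', hasHtml: true, planValid: null },
  { source: 'retrieval', hasHtml: true, planValid: null },
  { source: 'retrieval', hasHtml: true, planValid: null },
  { source: 'synthesis', hasHtml: true, planValid: true },
  { source: null, hasHtml: false, planValid: null }, // simulated error row
];
console.log(summarize(sampleRows));
```

As written in the diff, the script's exit code depends only on the HTML pass rate. The header comment's second criterion (all synthesizer plans pass the slot+kind validator) is printed in the summary but does not appear to affect the exit status, so a caller that needs it enforced would have to compare `synthValidPlans` with `synthAttempts` separately.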