@adia-ai/a2ui-mcp 0.0.2 → 0.0.4
- package/CHANGELOG.md +66 -0
- package/README.md +29 -25
- package/package.json +2 -2
- package/scripts/eval-chunk-synthesis.mjs +100 -0
package/CHANGELOG.md
CHANGED
@@ -11,6 +11,72 @@ zettel strategies.
 
 ---
 
+## [0.0.4] - 2026-04-28
+
+Dependency-only bump to pull `@adia-ai/a2ui-corpus@^0.0.5` (one fewer
+chunk + re-embedded vectors). No code changes in this package; existing
+consumers using `compose_from_chunks` get a tiny retrieval-quality lift
+from the refreshed embedding index.
+
+### Dependencies
+
+- Bumps `@adia-ai/a2ui-corpus` requirement from `^0.0.4` to `^0.0.5`.
+
+---
+
+## [0.0.3] - 2026-04-28
+
+Same-day follow-up to `0.0.2` shipped earlier today. Wires the corpus
+embedding vectors (from `@adia-ai/a2ui-corpus@0.0.4`) into the chunk
+retrieval path so the synthesizer mixes-and-matches against semantic
+similarity, not just keyword overlap.
+
+### Added
+
+- **`chunk-embedding-retriever.js`** (`packages/a2ui/retrieval/`) — sibling
+  to the existing pattern-embedding retriever. Lazy-loads
+  `chunk-embeddings.json`, scores cosine similarity against the same
+  provider that built the index. Graceful no-op when index is missing or
+  no API key is available.
+- **`searchChunksAsync(query, opts)`** in `chunk-library.js` — embedding-
+  blended search. Returns top candidates ranked by `keyword + 5×cosine`
+  when embeddings are available; falls back to keyword-only otherwise.
+  Sync `searchChunks()` is preserved for callers that want the keyword
+  floor without the network round-trip.
+
+### Changed
+
+- `chunk-synthesizer.composeFromIntent()` — both tier-1 retrieval AND
+  tier-2 pre-search now go through `searchChunksAsync` instead of the
+  keyword-only `searchChunks`. The pre-search filters the prompt token
+  budget more accurately; the retrieval-first path catches more direct
+  hits before the LLM is invoked.
+
+### Measured impact
+
+`eval:chunk-synthesis` (10 hold-out intents, real Anthropic LLM):
+
+| | 0.0.2 (keyword) | **0.0.3 (semantic)** |
+|---|---|---|
+| Retrieval-tier hits | 7/10 | **9/10** |
+| Synthesis-tier attempts | 3 | **1** |
+| Synthesis plans validated | 3/3 | 1/1 |
+| Total time | 5.0s | 5.1s |
+
+Examples that flipped from synthesis → direct retrieval:
+- "kpi grid with 4 stat cards" → matches `dashboard-kpi-grid` cleanly.
+- "conversion funnel chart" → matches `dashboard-funnel`.
+
+The remaining synthesis intent ("recovery page with backup-code form +
+contact-support link") is genuinely novel — it composes from parts that
+don't have a 1:1 chunk and exercises the slot-validator path correctly.
+
+### Dependencies
+
+- Bumps `@adia-ai/a2ui-corpus` requirement from `^0.0.3` to `^0.0.4`.
+
+---
+
 ## [0.0.2] - 2026-04-28
 
 Adds **gen-UI training-chunk tools** that expose the new chunk corpus
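The `keyword + 5×cosine` ranking mentioned in the 0.0.3 changelog entry, together with its keyword-only fallback, can be sketched roughly as follows. This is an illustration only: the function names (`cosine`, `rankChunks`), the candidate shape (`keywordScore`, `vector`), and the way a missing query vector signals "no embeddings" are assumptions, not the package's actual code. Only the weighting formula comes from the changelog.

```javascript
// Hypothetical sketch of embedding-blended ranking.
// Only the formula `keyword + 5 * cosine` is taken from the changelog;
// every name and data shape below is assumed for illustration.

const COSINE_WEIGHT = 5;

// Cosine similarity of two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// candidates: [{ id, keywordScore, vector? }]
// queryVector: number[] or null when no embedding is available.
function rankChunks(candidates, queryVector) {
  return candidates
    .map((c) => {
      const canBlend = Boolean(queryVector && c.vector);
      // Without vectors the score degrades to the keyword floor.
      const semantic = canBlend ? COSINE_WEIGHT * cosine(queryVector, c.vector) : 0;
      return { id: c.id, score: c.keywordScore + semantic };
    })
    .sort((x, y) => y.score - x.score);
}
```

With this weighting, a chunk that trails on keyword overlap can still rank first when its vector is close to the query, which is consistent with the extra retrieval-tier hits reported in the table above.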
package/README.md
CHANGED
@@ -42,31 +42,35 @@ export GEMINI_API_KEY=AIza…
 
 ## Tools
 
-The server registers
-
-| Tool
-
-| `generate_ui`
-| `validate_schema`
-| `classify_intent`
-| `lookup_component`
-| `get_component_map`
-| `search_patterns`
-| `assemble_context`
-| `check_anti_patterns`
-| `get_traits`
-| `convert_html`
-| `get_wiring_catalog`
-| `import_pattern`
-| `submit_feedback`
-| `get_quality_metrics`
-| `get_training_gaps`
-| `run_eval`
-| `get_fragment`
-| `get_composition`
-| `resolve_composition`
-| `get_graph`
-| `zettel_stats`
+The server registers 25 tools. Shape is stable; argument schemas via Zod.
+
+| Tool | What it does |
+|-------------------------|-----------------------------------------------------------|
+| `generate_ui` | Intent → A2UI tree. Engine (`monolithic`/`zettel`) + mode. |
+| `validate_schema` | Run the 15-check validator on an A2UI tree; returns 0-100. |
+| `classify_intent` | Extract concepts, entities, implied components, steelman. |
+| `lookup_component` | Resolve a component name (alias-aware) to its schema. |
+| `get_component_map` | Full tag→class map including alias normalizations. |
+| `search_patterns` | Keyword-rank the monolithic pattern corpus. |
+| `assemble_context` | Build the system prompt context for a given intent. |
+| `check_anti_patterns` | Scan a tree for canonical anti-patterns (chart-legend, …). |
+| `get_traits` | List trait catalog + their host-binding rules. |
+| `convert_html` | Raw HTML → best-effort A2UI tree (import path). |
+| `get_wiring_catalog` | Declarative wiring-engine recipes. |
+| `import_pattern` | Commit a generated result into the pattern library. |
+| `submit_feedback` | Append a user-feedback event to the feedback store. |
+| `get_quality_metrics` | Aggregate pass/fail scores over a window. |
+| `get_training_gaps` | Intents that currently miss coverage. |
+| `run_eval` | Run the held-out benchmark; return pass/fail per intent. |
+| `get_fragment` | Fetch a single zettel fragment by id. |
+| `get_composition` | Fetch a named multi-fragment composition. |
+| `resolve_composition` | Expand a composition reference into its fragments. |
+| `get_graph` | Dump the zettel fragment-dependency graph. |
+| `zettel_stats` | Corpus counts (fragments, compositions, reuse ratio, …). |
+| **`search_chunks`** | Semantic + keyword search over the gen-UI training-chunk corpus (since 0.0.2). |
+| **`get_chunk`** | Full record (HTML + metadata + slots) for one chunk. |
+| **`lookup_chunk`** | List every chunk whose primary element is `<component>`. |
+| **`compose_from_chunks`** | Retrieval-first / LLM-mix-and-match composition. Picks a page chunk + binds block/panel chunks to its slots when retrieval is weak. Validator enforces slot+kind contracts. **Embedding-blended retrieval as of 0.0.3** (was keyword-only in 0.0.2). |
 
 ## Layout
 
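The retrieval-first behaviour described for `compose_from_chunks` (use a direct chunk hit when retrieval is strong, otherwise fall back to LLM synthesis and validate the resulting plan) might look something like the sketch below. The result fields `source`, `html`, and `synthesis.validation.ok` mirror what the eval script in this release reads from `composeFromIntent`; everything else is assumed for illustration, including the function name `composeTiered`, the injected `search` / `synthesize` / `validate` callbacks, and the `0.75` threshold.

```javascript
// Hypothetical two-tier composition flow. The threshold value and the
// injected helpers are assumptions; the package's real logic may differ.

const DIRECT_HIT_THRESHOLD = 0.75; // assumed cutoff for a "strong" match

async function composeTiered(intent, { search, synthesize, validate }) {
  // Tier 1: retrieval. A strong top hit is returned as-is, no LLM call.
  const hits = await search(intent);
  const best = hits[0];
  if (best && best.score >= DIRECT_HIT_THRESHOLD) {
    return { source: 'retrieval', html: best.html };
  }

  // Tier 2: synthesis. The plan must pass the slot+kind validator
  // before its HTML is accepted.
  const plan = await synthesize(intent, hits);
  const validation = validate(plan);
  return {
    source: 'synthesis',
    html: validation.ok ? plan.html : '',
    synthesis: { validation },
  };
}
```

Injecting the three helpers keeps the sketch testable with stubs, so the tier selection can be exercised without a network call or an API key.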
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@adia-ai/a2ui-mcp",
-  "version": "0.0.
+  "version": "0.0.4",
   "description": "AdiaUI A2UI MCP server. Exposes the compose engine over MCP with an engine selector for monolithic + zettel strategies.",
   "type": "module",
   "bin": {
@@ -29,7 +29,7 @@
     "@adia-ai/a2ui-compose": "^0.0.1",
     "@adia-ai/a2ui-retrieval": "^0.0.1",
     "@adia-ai/a2ui-validator": "^0.0.1",
-    "@adia-ai/a2ui-corpus": "^0.0.
+    "@adia-ai/a2ui-corpus": "^0.0.5",
     "zod": "^3.24.0"
   }
 }
package/scripts/eval-chunk-synthesis.mjs
@@ -0,0 +1,100 @@
+#!/usr/bin/env node
+/**
+ * Real-LLM eval set for the chunk-aware composition synthesizer.
+ *
+ * Walks a hold-out set of intents that DON'T have a 1:1 chunk match in the
+ * corpus and exercises the full retrieval-first → synthesis-fallback path.
+ * For each intent, records:
+ *   - which tier handled it (retrieval direct hit vs LLM synthesis)
+ *   - whether HTML was produced
+ *   - whether the synthesizer's plan validated against slot/kind contracts
+ *   - composed HTML byte length (rough proxy for "not empty")
+ *
+ * Pass criterion: ≥ 80% of intents produce non-empty HTML; all synthesizer
+ * plans pass the slot+kind validator.
+ *
+ * Usage:
+ *   node packages/a2ui/mcp/scripts/eval-chunk-synthesis.mjs
+ *   ANTHROPIC_API_KEY=… node packages/a2ui/mcp/scripts/eval-chunk-synthesis.mjs
+ */
+
+import '../../../../scripts/load-env.mjs';
+import { composeFromIntent } from '../../compose/engines/zettel/chunk-synthesizer.js';
+import { createAdapter } from '../../compose/llm/llm-bridge.js';
+
+// Hold-out intents — chosen to NOT have a 1:1 chunk match (so synthesis path
+// is exercised). Mix of dashboard-shape, auth-shape, and novel composites.
+const INTENTS = [
+  // Dashboard composites that don't exist as a single chunk:
+  'admin dashboard with KPI grid and conversion funnel',
+  'analytics page with audience KPIs and a country list',
+  'reports page with a transactions table and a sparkline grid',
+
+  // Auth-card variations that aren't pre-baked:
+  'sign-in page with email + password + remember-me + magic-link alternative',
+  'recovery page with backup-code form and contact-support link',
+
+  // Generic page-shaped composites:
+  'page with a sticky header and a tabbed body',
+  'data-rich settings page with a members table and an integrations grid',
+
+  // Things that should retrieve directly (control group):
+  'kpi grid with 4 stat cards', // → likely matches dashboard-kpi-grid via search
+  'conversion funnel chart', // → dashboard-funnel
+  'sign in form with email', // → auth-signin-card-email
+];
+
+const startedAt = Date.now();
+const results = [];
+
+console.log(`▶ chunk-synthesis eval — ${INTENTS.length} intents\n`);
+
+const llmAdapter = await createAdapter();
+
+for (const intent of INTENTS) {
+  const t0 = Date.now();
+  let row = { intent, ms: 0, source: null, hasHtml: false, htmlBytes: 0, planValid: null, error: null };
+  try {
+    const result = await composeFromIntent({ intent, llmAdapter, maxAttempts: 2 });
+    row.ms = Date.now() - t0;
+    row.source = result.source;
+    row.hasHtml = !!result.html;
+    row.htmlBytes = result.html ? result.html.length : 0;
+    row.planValid = result.synthesis ? !!result.synthesis.validation?.ok : null;
+    row.warnings = (result.warnings || []).length;
+  } catch (e) {
+    row.ms = Date.now() - t0;
+    row.error = e.message;
+  }
+  results.push(row);
+  const flag = row.hasHtml ? '✓' : '✗';
+  const tag = row.source === 'retrieval' ? '[ret]' : row.source === 'synthesis' ? '[syn]' : '[err]';
+  console.log(` ${flag} ${tag} ${row.ms.toString().padStart(5)}ms ${intent}`);
+  if (row.error) console.log(` error: ${row.error}`);
+  if (row.warnings) console.log(` ${row.warnings} warning(s)`);
+}
+
+const passed = results.filter((r) => r.hasHtml).length;
+const passRate = passed / results.length;
+const synthAttempts = results.filter((r) => r.source === 'synthesis');
+const retrievalAttempts = results.filter((r) => r.source === 'retrieval');
+const synthValidPlans = synthAttempts.filter((r) => r.planValid).length;
+
+console.log(`\n── Summary ──`);
+console.log(` Total intents: ${results.length}`);
+console.log(` Produced HTML: ${passed} (${(passRate * 100).toFixed(0)}%)`);
+console.log(` Retrieval-tier hits: ${retrievalAttempts.length}`);
+console.log(` Synthesis-tier attempts: ${synthAttempts.length}`);
+if (synthAttempts.length) {
+  console.log(` Synthesis plans validated: ${synthValidPlans}/${synthAttempts.length}`);
+}
+console.log(` Total time: ${((Date.now() - startedAt) / 1000).toFixed(1)}s`);
+
+const passThreshold = 0.8;
+if (passRate >= passThreshold) {
+  console.log(`\n✓ PASS — ≥${passThreshold * 100}% intents produced HTML`);
+  process.exit(0);
+} else {
+  console.log(`\n✗ FAIL — pass rate ${(passRate * 100).toFixed(0)}% < ${passThreshold * 100}% threshold`);
+  process.exit(1);
+}
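The header comment of `eval-chunk-synthesis.mjs` lists two pass conditions (at least 80% of intents produce non-empty HTML, and all synthesizer plans pass the slot+kind validator), while the exit code in the script as shown depends only on the HTML pass rate. A minimal sketch of a check that covers both conditions, assuming the same row shape the script builds (`hasHtml`, `source`, `planValid`), could look like this. The function name `summarize` is hypothetical and is not part of the package.

```javascript
// Hypothetical helper: evaluates both pass conditions stated in the
// script's header comment. Row shape mirrors the eval script's `row`.
function summarize(results, passThreshold = 0.8) {
  const produced = results.filter((r) => r.hasHtml).length;
  // Guard against an empty result set to avoid dividing by zero.
  const passRate = results.length ? produced / results.length : 0;

  // Only synthesis-tier rows carry a plan to validate.
  const synthRows = results.filter((r) => r.source === 'synthesis');
  const allPlansValid = synthRows.every((r) => r.planValid === true);

  return {
    passRate,
    allPlansValid,
    pass: passRate >= passThreshold && allPlansValid,
  };
}
```

Keeping the check as a pure function over the collected rows means it can be exercised with fixture data, without an LLM adapter or an API key.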