@adia-ai/a2ui-mcp 0.0.4 → 0.1.0

package/CHANGELOG.md CHANGED
@@ -11,6 +11,137 @@ zettel strategies.
 
  ---
 
+ ## [0.1.0] - 2026-04-28
+
+ **Multi-turn gen-UI tool surface (Phase A code-complete).** Adds three new
+ MCP tools that turn the chunk-composition pipeline from single-shot into a
+ multi-turn surface, plus extends `compose_from_chunks` to mint a `state_id`
+ for refinement chains.
+
+ Spec: [`docs/specs/genui-multiturn-architecture.md`](../../../docs/specs/genui-multiturn-architecture.md) (Active v0.1.0).
+ Plan: [`docs/plans/genui-multiturn-rollout-2026-04-28.md`](../../../docs/plans/genui-multiturn-rollout-2026-04-28.md) (Phase A scoped).
+ ADR: [`0008-multiturn-genui-architecture.md`](../../../.brain/adrs/0008-multiturn-genui-architecture.md).
+
+ ### Added (MCP tools)
+
+ - **`refine_composition(state_id, intent | ops, max_attempts?)`** — takes a
+   `state_id` from a prior `compose_from_chunks` (or `refine_composition`)
+   call plus either a natural-language intent OR an explicit op-list, runs
+   the chunk-refiner's two-pass synthesis (locator → modifier; validator-
+   driven retry on op-validation failure), applies the resulting chunk-plan
+   ops, re-materializes HTML, mints a child `state_id` chained back to the
+   parent, and returns A2UI `updateComponents` messages (the wire format).
+   Failed ops surface in `ops_failed` with reasons; the new state is cached
+   for further refinement.
+ - **`get_state(state_id)`** — read-only inspection of a cached composition
+   state. Returns the chunk binding plan, materialized HTML, ops history
+   (chronological list of every refinement applied to this state's lineage),
+   and `parent_state_id` (chain-back). Auto-fires `cache-miss-on-known-state`
+   (severity `nit`) when the id is absent.
+ - **`report_issue(type, severity, title, body, state_id?, trace?, …)`** —
+   first-class telemetry / dev-process feedback tool. Writes a structured
+   JSON ticket to `.brain/audit-history/issues/<issue_id>.json`. Three
+   reporter kinds: LLM self-fire (this tool with `reporter: 'llm'`),
+   consumer-fire (passed through directly), engine auto-fire (internal,
+   per `AUTO_FIRE_POLICY` in the issue-reporter module). Severity vocabulary
+   `blocker | drift | nit` matches the existing `coherence-audit`
+   discipline. Trace levels: `'full' | 'summary' | 'none'`; oversized
+   traces (> 200 KB) spill to a sidecar `.trace.json` file. Tool count
+   goes from 25 → 28.
+
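The `state_id` / `parent_state_id` chaining described above can be pictured with a small in-memory cache. This is only a sketch under stated assumptions: `createStateCache`, its `put` / `get` / `opsHistory` methods, and the `state-N` id format are hypothetical and are not taken from the package; only the field names `state_id`, `parent_state_id`, `plan`, `html`, and the idea of a chronological ops history come from the changelog.

```javascript
// Hypothetical in-memory state cache (illustrative only, not the package's
// getStateCache()). Each state links back to its parent via parent_state_id.
function createStateCache() {
  const states = new Map();
  let counter = 0;

  return {
    // Store a new state, optionally chained to a parent, and return its id.
    put({ plan, html, ops = [], parent_state_id = null }) {
      counter += 1;
      const state_id = `state-${counter}`; // assumed id format
      states.set(state_id, { state_id, parent_state_id, plan, html, ops });
      return state_id;
    },

    // Read-only lookup; returns null on a cache miss.
    get(state_id) {
      return states.get(state_id) ?? null;
    },

    // Walk the parent links root-first and concatenate every op applied
    // along the lineage, giving a chronological history.
    opsHistory(state_id) {
      const chain = [];
      for (let s = states.get(state_id); s; s = states.get(s.parent_state_id)) {
        chain.unshift(s);
      }
      return chain.flatMap((s) => s.ops);
    },
  };
}

const cache = createStateCache();
const root = cache.put({ plan: { page: 'demo' }, html: '<main></main>' });
const child = cache.put({
  plan: { page: 'demo' },
  html: '<main><p>hi</p></main>',
  ops: [{ op: 'appendToSlot' }],
  parent_state_id: root,
});
console.log(cache.get(child).parent_state_id === root); // true
console.log(cache.opsHistory(child).length); // 1
```

Keeping only the ops applied at each step on each state, and deriving the full history by walking parents, is one possible design; the real cache may store the history differently.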
+ ### Changed
+
+ - **`compose_from_chunks`** now mints a `state_id` and caches the result
+   before returning. The response shape gains a `state_id` field; existing
+   fields (`html`, `plan`, `source`, `score`, `warnings`, `synthesis`)
+   are unchanged. Backward-compatible — consumers ignoring `state_id` see
+   no behavior change.
+ - **MCP server boot** instantiates a `getStateCache()` singleton and an
+   `ENGINE_VERSION_INFO` block (mcp 0.1.0, corpus 0.0.6, engine zettel,
+   llm_adapter anthropic) that's threaded through every issue-reporter
+   call so written tickets carry environment metadata.
+
+ ### Auto-fire policy (engine-driven)
+
+ `refine_composition` and `get_state` auto-fire `report_issue` on these
+ failure paths via the per-tool-call `IssueAccumulator`:
+
+ | Path | Type | Severity |
+ |---|---|---|
+ | Synthesizer exhausts retries | bug | drift |
+ | Validator exhausts retries on refinement | bug | blocker |
+ | Locator pass returns empty for targeted intent | bug | drift |
+ | Retrieval 0 + synthesis fallback fails | training-gap | drift |
+ | `get_state` called with absent `state_id` | bug | nit |
+ | `refine_composition` ops_failed list non-empty | bug | drift |
+
+ Multiple auto-fires within one tool call coalesce into a single issue
+ (highest severity wins; reasons listed in body + tags).
+
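The coalescing rule stated above (one issue per tool call, highest severity wins, reasons carried in body and tags) might look roughly like the following. This is a sketch, not the package's `IssueAccumulator`: the `sketchAccumulator` name, the `fire` / `coalesce` methods, the reason slugs other than `cache-miss-on-known-state`, and the ranking `blocker > drift > nit` are all assumptions made for illustration.

```javascript
// Assumed ordering: blocker outranks drift, which outranks nit.
const SEVERITY_RANK = { nit: 0, drift: 1, blocker: 2 };

// Hypothetical per-tool-call accumulator (illustrative only).
function sketchAccumulator() {
  const fires = [];
  return {
    // Record one auto-fire; nothing is written until coalesce() is called.
    fire(reason, type, severity) {
      fires.push({ reason, type, severity });
    },
    reasons() {
      return fires.map((f) => f.reason);
    },
    // Collapse all recorded fires into a single issue, or null if none.
    coalesce() {
      if (fires.length === 0) return null;
      const top = fires.reduce((best, f) =>
        SEVERITY_RANK[f.severity] > SEVERITY_RANK[best.severity] ? f : best);
      return {
        type: top.type,
        severity: top.severity,
        tags: fires.map((f) => f.reason),
        body: fires.map((f) => `- ${f.reason} (${f.severity})`).join('\n'),
      };
    },
  };
}

const acc = sketchAccumulator();
acc.fire('ops-failed-non-empty', 'bug', 'drift');            // hypothetical slug
acc.fire('validator-retries-exhausted', 'bug', 'blocker');   // hypothetical slug
console.log(acc.coalesce().severity); // blocker
```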
+ ### Smoke + eval
+
+ - `smoke:state-cache` — 34/34.
+ - `smoke:issues` — 62/62.
+ - `smoke:refine` — 51/51 (stub LLM).
+ - `test:a2ui` — 25/25 + 1 skipped (was 19/19 + 1; +6 multi-turn assertions).
+ - `mcp:smoke` — server boots clean with 28 tools registered.
+ - **`eval:refine-synthesis`** — 15/15 PASS. Ops 100%, validate 100%,
+   0 auto-fires, 67 s.
+ - **No regression:** `eval:chunk-synthesis` 10/10, `eval:diff zettel`
+   coverage 83 / score 89 / MRR 0.986.
+
+ ### Dependencies
+
+ - Bumps `@adia-ai/a2ui-compose` requirement from `^0.0.1` to `^0.1.0`.
+
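For pre-1.0 versions, npm's caret ranges only allow changes below the left-most non-zero component, so `^0.0.1` does not admit `0.1.0` and the requirement has to be bumped explicitly. The check below is a simplified sketch of that rule for plain `x.y.z` versions (no prerelease tags or partial ranges); `satisfiesCaret` is a hypothetical helper written for illustration, not the `semver` package API.

```javascript
// Parse "x.y.z" into three numbers.
function parseVersion(v) {
  return v.split('.').map(Number);
}

// Compare two parsed versions: -1, 0, or 1.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// True when `version` falls inside the caret range `range` (e.g. "^0.1.0").
function satisfiesCaret(version, range) {
  const base = parseVersion(range.replace(/^\^/, ''));
  const v = parseVersion(version);
  // Upper bound: bump the left-most non-zero component and zero the rest.
  const firstNonZero = base.findIndex((n) => n !== 0);
  const idx = firstNonZero === -1 ? 2 : firstNonZero;
  const upper = base.map((n, j) => (j < idx ? n : j === idx ? n + 1 : 0));
  return compareVersions(v, base) >= 0 && compareVersions(v, upper) < 0;
}

console.log(satisfiesCaret('0.1.0', '^0.0.1')); // false
console.log(satisfiesCaret('0.1.0', '^0.1.0')); // true
```

The same rule explains the `@adia-ai/a2ui-corpus` bump from `^0.0.5` to `^0.0.6`: at `0.0.x`, a caret range pins a single patch version.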
+ ### Migration
+
+ Additive surface; no breaking changes. The existing 25 tools are
+ unchanged behaviorally; `compose_from_chunks` adds a `state_id` field to
+ its response that ignoring consumers can safely drop.
+
+ ### Phase A simplification (documented)
+
+ Refinement ops internally use a chunk-plan vocabulary
+ (`rebindSlot | appendToSlot | removeFromSlot | replacePage`), wrapped
+ on output as standard `updateComponents` A2UI messages with
+ `components[].html` carrying the materialized payload. Strict
+ component-tree shape upgrade is queued for Phase B.
+
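To make the chunk-plan vocabulary concrete, here is a sketch of applying the four op kinds to a plan of the `{ page, slot_bindings }` shape used by the eval script, then wrapping the result. The op names come from the changelog; everything else is an assumption for illustration: the op fields (`slot`, `chunk`, `chunks`, `page`), the `applyPlanOp` and `toUpdateComponents` helpers, and the message envelope beyond `components[].html`.

```javascript
// Apply one chunk-plan op to a plan without mutating the input.
// Op field names (slot, chunk, chunks, page) are hypothetical.
function applyPlanOp(plan, op) {
  const bindings = { ...plan.slot_bindings };
  const current = bindings[op.slot] ?? [];
  switch (op.op) {
    case 'rebindSlot': // replace the slot's whole binding list
      bindings[op.slot] = [...op.chunks];
      break;
    case 'appendToSlot': // add one chunk at the end of the slot
      bindings[op.slot] = [...current, op.chunk];
      break;
    case 'removeFromSlot': // drop a named chunk from the slot
      bindings[op.slot] = current.filter((name) => name !== op.chunk);
      break;
    case 'replacePage': // swap the page chunk, keep the bindings
      return { ...plan, page: op.page };
    default:
      throw new Error(`unknown op: ${op.op}`);
  }
  return { ...plan, slot_bindings: bindings };
}

// Wrap materialized HTML in an updateComponents-style message.
// Only `components[].html` is named in the changelog; the rest is assumed.
function toUpdateComponents(html, componentId = 'root') {
  return { updateComponents: { components: [{ id: componentId, html }] } };
}

const seedPlan = { page: 'page-a', slot_bindings: { header: ['b0'], content: ['b1'] } };
const nextPlan = [
  { op: 'appendToSlot', slot: 'content', chunk: 'b2' },
  { op: 'removeFromSlot', slot: 'content', chunk: 'b1' },
].reduce(applyPlanOp, seedPlan);
console.log(nextPlan.slot_bindings.content); // [ 'b2' ]
```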
112
+
113
+ ## [0.0.5] - 2026-04-28
114
+
115
+ **Retires the legacy exemplar auto-ingest.** Server boot no longer pulls
116
+ 70 patterns from the prose exemplars on every start. The chunk corpus
117
+ + chunk-aware synthesizer is the training surface now (`compose_from_chunks`
118
+ + `search_chunks` + `get_chunk` + `lookup_chunk` from 0.0.2).
119
+
120
+ ### Changed
121
+
122
+ - **`server.js` boot path** — removed the `auto-ingest` block that called
123
+ `ingestAll()` at startup. The runtime pattern library reverts from 225
124
+ (155 hand-authored + 70 ingested) to **155 hand-authored only**. Eval
125
+ metrics confirm no regression: `eval:diff zettel` still reports
126
+ coverage 83 / score 89; `smoke:chunks` still 33/33; `eval:chunk-synthesis`
127
+ still PASS at 100% / 9 retrieval / 1 synthesis on the hold-out set.
128
+ - **`test:a2ui` test 6 updated** — was asserting "post-ingest count ≥ 200
129
+ patterns"; now asserts the two training-corpus surfaces independently:
130
+ pattern library ≥ 100 hand-authored entries (155 today), chunk corpus
131
+ ≥ 500 unique chunks (701 today).
132
+
133
+ ### Dependencies
134
+
135
+ - Bumps `@adia-ai/a2ui-corpus` requirement from `^0.0.5` to `^0.0.6`.
136
+
137
+ ### Migration (none required)
138
+
139
+ - Pure runtime simplification. Existing MCP consumers experience the
140
+ same tool surface; the only observable change is faster boot (no
141
+ ingest round trip) and a smaller in-memory pattern library.
142
+
143
+ ---
144
+
14
145
  ## [0.0.4] - 2026-04-28
15
146
 
16
147
  Dependency-only bump to pull `@adia-ai/a2ui-corpus@^0.0.5` (one fewer
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@adia-ai/a2ui-mcp",
-   "version": "0.0.4",
+   "version": "0.1.0",
    "description": "AdiaUI A2UI MCP server. Exposes the compose engine over MCP with an engine selector for monolithic + zettel strategies.",
    "type": "module",
    "bin": {
@@ -26,10 +26,10 @@
    },
    "dependencies": {
      "@modelcontextprotocol/sdk": "^1.29.0",
-     "@adia-ai/a2ui-compose": "^0.0.1",
+     "@adia-ai/a2ui-compose": "^0.1.0",
      "@adia-ai/a2ui-retrieval": "^0.0.1",
      "@adia-ai/a2ui-validator": "^0.0.1",
-     "@adia-ai/a2ui-corpus": "^0.0.5",
+     "@adia-ai/a2ui-corpus": "^0.0.6",
      "zod": "^3.24.0"
    }
  }
@@ -0,0 +1,270 @@
+ #!/usr/bin/env node
+ /**
+  * Real-LLM eval set for the chunk-refiner — multi-turn refinement engine.
+  *
+  * Walks 5 seed compositions × 3 refinement intents = 15 total. Seeds are
+  * deterministic chunk-binding plans (no LLM cost on the create side); the
+  * refiner exercises the full two-pass synthesis (locator → modifier),
+  * validator-driven retry, and op-application path.
+  *
+  * Pass criteria (per spec §6.2 + plan §1.7):
+  *   - ≥ 80% of refinements produce ops (no all-fail outcome).
+  *   - ≥ 90% of returned ops apply cleanly (validator + applyOps).
+  *   - ≤ 5 auto-fired issues across the full run (plan §1.8 #3).
+  *
+  * Spec: docs/specs/genui-multiturn-architecture.md (Active v0.1.0).
+  * Plan: docs/plans/genui-multiturn-rollout-2026-04-28.md (Phase A).
+  *
+  * Usage:
+  *   ANTHROPIC_API_KEY=… node packages/a2ui/mcp/scripts/eval-refine-synthesis.mjs
+  */
+
+ import '../../../../scripts/load-env.mjs';
+ import {
+   refineFromIntent,
+   applyOps,
+ } from '../../compose/engines/zettel/chunk-refiner.js';
+ import { mintStateId } from '../../compose/engines/zettel/state-cache.js';
+ import { createIssueAccumulator } from '../../compose/engines/zettel/issue-reporter.js';
+ import { composeFromPlan } from '../../compose/engines/zettel/chunk-composer.js';
+ import { listChunksByKind, getChunk } from '../../corpus/scripts/chunk-library.js';
+ import { createAdapter } from '../../compose/llm/llm-bridge.js';
+
+ // ── Discover corpus shape ────────────────────────────────────────────
+ // Pick a page with ≥ 2 slots so refinements have room to target.
+
+ const pages = listChunksByKind('page');
+ const panels = listChunksByKind('panel');
+ const blocks = listChunksByKind('block');
+
+ const slotsOf = (c) => (c.slots || c.instances?.[0]?.slots || []).map((s) => s.name);
+
+ const samplePage =
+   pages.find((p) => slotsOf(p).length >= 2)
+   || panels.find((p) => slotsOf(p).length >= 2)
+   || pages[0]
+   || panels[0];
+
+ if (!samplePage || slotsOf(samplePage).length === 0) {
+   console.error('Corpus has no page/panel chunks with declared slots — aborting eval');
+   process.exit(2);
+ }
+
+ const pageSlots = slotsOf(samplePage);
+ const slotA = pageSlots[0]; // typically the header slot
+ const slotB = pageSlots[1] || pageSlots[0]; // typically the content slot
+
+ // Pick at least 4 distinct block chunks. Filter out anything missing HTML.
+ const usableBlocks = blocks
+   .filter((b) => (b.html || b.instances?.[0]?.html))
+   .slice(0, 8)
+   .map((b) => b.name);
+
+ if (usableBlocks.length < 4) {
+   console.error(`Corpus has only ${usableBlocks.length} usable block chunks (need ≥ 4) — aborting eval`);
+   process.exit(2);
+ }
+
+ const [B0, B1, B2, B3, B4 = B0, B5 = B1] = usableBlocks;
+
+ console.log(`▶ refine-synthesis eval`);
+ console.log(` page: ${samplePage.name} (${pageSlots.join(', ')})`);
+ console.log(` blocks: ${usableBlocks.slice(0, 6).join(', ')}`);
+ console.log('');
+
+ // ── Seeds + refinements ──────────────────────────────────────────────
+ // 5 seeds × 3 refinements = 15 hold-out intents.
+
+ const SEEDS = [
+   {
+     label: 'two-block-content',
+     plan: {
+       page: samplePage.name,
+       slot_bindings: { [slotA]: [B0], [slotB]: [B1, B2] },
+     },
+     refinements: [
+       `add another block to ${slotB}`,
+       `remove one block from ${slotB}`,
+       `swap the ${slotA} for a different option`,
+     ],
+   },
+   {
+     label: 'single-block-content',
+     plan: {
+       page: samplePage.name,
+       slot_bindings: { [slotA]: [B0], [slotB]: [B1] },
+     },
+     refinements: [
+       `add a second block alongside the existing one in ${slotB}`,
+       `replace the ${slotA} with a more concise header`,
+       `preserve the existing block and append another to ${slotB}`,
+     ],
+   },
+   {
+     label: 'three-block-stack',
+     plan: {
+       page: samplePage.name,
+       slot_bindings: { [slotA]: [B0], [slotB]: [B1, B2, B3] },
+     },
+     refinements: [
+       `remove the middle block from ${slotB}`,
+       `drop the last block from ${slotB}`,
+       `make the layout more compact`,
+     ],
+   },
+   {
+     label: 'header-only',
+     plan: {
+       page: samplePage.name,
+       slot_bindings: { [slotA]: [B0], [slotB]: [B1] },
+     },
+     refinements: [
+       `add an additional block to ${slotB}`,
+       `change the ${slotA}`,
+       `preserve everything and add a new block to ${slotB}`,
+     ],
+   },
+   {
+     label: 'mixed-stack',
+     plan: {
+       page: samplePage.name,
+       slot_bindings: { [slotA]: [B0], [slotB]: [B2, B3] },
+     },
+     refinements: [
+       `swap the first block in ${slotB} for a different one`,
+       `add another block at the end of ${slotB}`,
+       `drop the last block from ${slotB}`,
+     ],
+   },
+ ];
+
+ // ── Run ──────────────────────────────────────────────────────────────
+
+ const llmAdapter = await createAdapter();
+ const startedAt = Date.now();
+ const results = [];
+ const autoFires = { total: 0, byReason: {} };
+
+ for (const seed of SEEDS) {
+   // Materialize the seed plan (no LLM call; pure compose-from-plan).
+   const composed = composeFromPlan(seed.plan);
+   if (!composed.html) {
+     console.log(`✗ seed [${seed.label}] failed to materialize — skipping refinements`);
+     for (const intent of seed.refinements) {
+       results.push({
+         seed: seed.label, intent, ms: 0, ops_count: 0, attempts: 0,
+         targeted: null, ops_applied: 0, ops_failed: 0,
+         auto_fires: [], error: 'seed-materialize-failed',
+       });
+     }
+     continue;
+   }
+
+   const priorState = {
+     state_id: mintStateId(seed.label, 1),
+     intent: `[seed] ${seed.label}`,
+     plan: seed.plan,
+     html: composed.html,
+     version: 1,
+   };
+
+   console.log(`── seed [${seed.label}] · ${seed.plan.slot_bindings[slotB].length} block(s) in ${slotB}`);
+
+   for (const intent of seed.refinements) {
+     const acc = createIssueAccumulator();
+     const t0 = Date.now();
+     const row = {
+       seed: seed.label, intent, ms: 0, ops_count: 0, attempts: 0,
+       targeted: null, ops_applied: 0, ops_failed: 0,
+       auto_fires: [], error: null,
+     };
+
+     try {
+       const refined = await refineFromIntent({
+         priorState,
+         intent,
+         llmAdapter,
+         maxAttempts: 2,
+         issueAccumulator: acc,
+       });
+       row.ms = Date.now() - t0;
+       row.ops_count = refined.ops.length;
+       row.attempts = refined.synthesis?.attempts ?? 0;
+       row.targeted = refined.synthesis?.targeted ?? null;
+
+       if (refined.ops.length > 0) {
+         const applied = await applyOps({ priorState, ops: refined.ops });
+         row.ops_applied = applied.ops_applied.length;
+         row.ops_failed = applied.ops_failed.length;
+       }
+     } catch (e) {
+       row.ms = Date.now() - t0;
+       row.error = e.message;
+     }
+
+     row.auto_fires = acc.reasons();
+     autoFires.total += row.auto_fires.length;
+     for (const r of row.auto_fires) {
+       autoFires.byReason[r] = (autoFires.byReason[r] || 0) + 1;
+     }
+
+     results.push(row);
+
+     const flag = row.ops_count > 0 && row.ops_failed === 0 ? '✓' : (row.ops_count > 0 ? '~' : '✗');
+     const tgtTag = row.targeted === true ? 'tgt' : row.targeted === false ? 'unt' : '???';
+     const padMs = row.ms.toString().padStart(5);
+     console.log(` ${flag} [${tgtTag}] ${padMs}ms ops=${row.ops_count} att=${row.attempts} ${intent}`);
+     if (row.error) console.log(` error: ${row.error}`);
+     if (row.ops_failed > 0) console.log(` ops_failed: ${row.ops_failed}`);
+     if (row.auto_fires.length) console.log(` auto-fires: ${row.auto_fires.join(', ')}`);
+   }
+ }
+
+ // ── Summary ──────────────────────────────────────────────────────────
+
+ const total = results.length;
+ const producedOps = results.filter((r) => r.ops_count > 0).length;
+ const totalOpsReturned = results.reduce((s, r) => s + r.ops_count, 0);
+ const totalOpsApplied = results.reduce((s, r) => s + r.ops_applied, 0);
+ const totalOpsFailed = results.reduce((s, r) => s + r.ops_failed, 0);
+
+ const opsRate = total ? producedOps / total : 0;
+ const validateRate = totalOpsReturned ? totalOpsApplied / totalOpsReturned : 0;
+
+ console.log(`\n── Summary ──`);
+ console.log(` Refinements: ${total}`);
+ console.log(` Produced ops: ${producedOps}/${total} (${(opsRate * 100).toFixed(0)}%)`);
+ console.log(` Ops returned: ${totalOpsReturned}; applied: ${totalOpsApplied}; failed: ${totalOpsFailed} (validate ${(validateRate * 100).toFixed(0)}%)`);
+ console.log(` Auto-fires: ${autoFires.total}`);
+ if (autoFires.total > 0) {
+   for (const [reason, n] of Object.entries(autoFires.byReason)) {
+     console.log(` ${reason}: ${n}`);
+   }
+ }
+ const targeted = results.filter((r) => r.targeted === true).length;
+ const untargeted = results.filter((r) => r.targeted === false).length;
+ console.log(` Targeted vs untargeted: ${targeted} / ${untargeted}`);
+ console.log(` Total time: ${((Date.now() - startedAt) / 1000).toFixed(1)}s`);
+
+ const opsThreshold = 0.8;
+ const validateThreshold = 0.9;
+ const autoFireCeiling = 5;
+
+ const opsPass = opsRate >= opsThreshold;
+ const validatePass = totalOpsReturned === 0 || validateRate >= validateThreshold;
+ const autoFirePass = autoFires.total <= autoFireCeiling;
+
+ const allPass = opsPass && validatePass && autoFirePass;
+
+ console.log('');
+ console.log(` ops rate ≥ ${opsThreshold * 100}%: ${opsPass ? '✓' : '✗'} (${(opsRate * 100).toFixed(0)}%)`);
+ console.log(` validate ≥ ${validateThreshold * 100}%: ${validatePass ? '✓' : '✗'} (${(validateRate * 100).toFixed(0)}%)`);
+ console.log(` auto-fires ≤ ${autoFireCeiling}: ${autoFirePass ? '✓' : '✗'} (${autoFires.total})`);
+
+ if (allPass) {
+   console.log(`\n✓ PASS`);
+   process.exit(0);
+ } else {
+   console.log(`\n✗ FAIL`);
+   process.exit(1);
+ }
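The three-part pass gate at the end of the script can be restated as a pure function, which makes the thresholds easy to check against hand-built rows without an LLM call. This is a sketch: `evaluateGate` is a hypothetical helper, not part of the package, and the sample rows are invented for illustration. The thresholds (0.8, 0.9, 5) and the row fields `ops_count` / `ops_applied` are taken from the script above.

```javascript
// Pure restatement of the script's gate (hypothetical helper).
function evaluateGate(rows, autoFireTotal, limits = { ops: 0.8, validate: 0.9, autoFires: 5 }) {
  const producedOps = rows.filter((r) => r.ops_count > 0).length;
  const returned = rows.reduce((sum, r) => sum + r.ops_count, 0);
  const applied = rows.reduce((sum, r) => sum + r.ops_applied, 0);

  const opsRate = rows.length ? producedOps / rows.length : 0;
  const validateRate = returned ? applied / returned : 0;

  return {
    opsRate,
    validateRate,
    pass:
      opsRate >= limits.ops
      // As in the script, the validate check is vacuous when no ops came back.
      && (returned === 0 || validateRate >= limits.validate)
      && autoFireTotal <= limits.autoFires,
  };
}

// Invented sample: 4 of 5 refinements produce one op each, all apply cleanly.
const sampleRows = [
  { ops_count: 1, ops_applied: 1 },
  { ops_count: 1, ops_applied: 1 },
  { ops_count: 1, ops_applied: 1 },
  { ops_count: 1, ops_applied: 1 },
  { ops_count: 0, ops_applied: 0 },
];
console.log(evaluateGate(sampleRows, 0).pass); // true
```

Note that a run producing zero ops everywhere still fails overall, because the ops-rate check fails even though the validate check is skipped.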