@rubytech/create-realagent 1.0.852 → 1.0.854

Files changed (41)
  1. package/dist/__tests__/preflight-port-classifier.test.js +240 -73
  2. package/dist/index.js +59 -11
  3. package/dist/preflight-port-classifier.js +176 -41
  4. package/package.json +1 -1
  5. package/payload/platform/config/brand-registry.json +44 -0
  6. package/payload/platform/lib/persistent-components/dist/index.d.ts +21 -0
  7. package/payload/platform/lib/persistent-components/dist/index.d.ts.map +1 -0
  8. package/payload/platform/lib/persistent-components/dist/index.js +32 -0
  9. package/payload/platform/lib/persistent-components/dist/index.js.map +1 -0
  10. package/payload/platform/lib/persistent-components/src/index.ts +28 -0
  11. package/payload/platform/lib/persistent-components/tsconfig.json +8 -0
  12. package/payload/platform/package.json +2 -2
  13. package/payload/platform/plugins/admin/PLUGIN.md +1 -1
  14. package/payload/platform/plugins/admin/hooks/__tests__/playwright-file-guard.test.sh +278 -0
  15. package/payload/platform/plugins/admin/hooks/playwright-file-guard.sh +204 -20
  16. package/payload/platform/plugins/admin/mcp/dist/index.js +40 -1
  17. package/payload/platform/plugins/admin/mcp/dist/index.js.map +1 -1
  18. package/payload/platform/plugins/docs/references/deployment.md +2 -0
  19. package/payload/platform/plugins/docs/references/getting-started.md +2 -0
  20. package/payload/platform/plugins/docs/references/platform.md +1 -1
  21. package/payload/platform/plugins/docs/references/troubleshooting.md +10 -0
  22. package/payload/platform/scripts/admin-persist-audit.ts +191 -0
  23. package/payload/platform/scripts/component-knowledgedoc-backfill.ts +214 -0
  24. package/payload/platform/scripts/installer-device-verify.sh +249 -0
  25. package/payload/platform/templates/specialists/agents/content-producer.md +2 -2
  26. package/payload/server/chunk-CFNSKDGA.js +667 -0
  27. package/payload/server/chunk-DC6DWYZJ.js +1603 -0
  28. package/payload/server/chunk-LTB5SSQW.js +10889 -0
  29. package/payload/server/chunk-MN2LGNUB.js +2143 -0
  30. package/payload/server/client-pool-AMT2W3II.js +34 -0
  31. package/payload/server/cloudflare-task-tracker-LJ4SMK2D.js +20 -0
  32. package/payload/server/maxy-edge.js +3 -3
  33. package/payload/server/public/assets/admin-DZ8Ke7t3.js +352 -0
  34. package/payload/server/public/assets/public-DApUXgoq.js +5 -0
  35. package/payload/server/public/assets/useVoiceRecorder-CI8GpxfU.js +36 -0
  36. package/payload/server/public/index.html +2 -2
  37. package/payload/server/public/public.html +2 -2
  38. package/payload/server/server.js +535 -351
  39. package/payload/server/public/assets/admin-Dyl8uNxX.js +0 -352
  40. package/payload/server/public/assets/public-B_PNZUph.js +0 -5
  41. package/payload/server/public/assets/useVoiceRecorder-fD0IWzJj.js +0 -36
@@ -60,6 +60,8 @@ When the text field is empty, a microphone button appears in place of the send b
 
  Voice recording requires a secure connection (HTTPS). When accessing {{productName}} over the local network via HTTP, use the tunnel URL for voice notes.
 
+ You can also drop, paste, or pick an audio file (`.opus`, `.ogg`, `.m4a`, `.mp3`, `.wav`, `.webm`) into the chat composer — for example a voice note forwarded from WhatsApp. The file is transcribed the same way the in-browser recording is, and only the transcript reaches {{productName}}; the audio itself is discarded after transcription.
+
  ## What {{productName}} Remembers
 
  {{productName}} maintains a memory graph of everything important: contacts, conversations, preferences, relationships, and context. When you tell {{productName}} something, it stores it. When you ask about something later, it retrieves it.
@@ -65,7 +65,7 @@ There is no dashboard, no settings panel, no menus. Everything is done through c
 
  The chat input auto-grows as you type — it expands to fit your message and shrinks back when you delete text. You can also drag the resize handle above the input to set a custom height.
 
- The admin interface is a three-pane layout: a sidebar on the left with your brand mark, navigation (Chat, People, Agents, Projects, Tasks, Artefacts), and your recent conversations; the chat in the middle; and an artefact pane on the right that opens when you select a document, click a project, or open Browser, Data, or Graph from the menu — holding the surface side-by-side with the conversation so the chat stays live while you work in it. The sidebar's nav rows swap the list view in place — Chat shows recent conversations, Projects shows your active work projects, and Artefacts lists every KnowledgeDocument plus this account's agent templates (your admin agent's IDENTITY, SOUL, and KNOWLEDGE files plus one entry per enabled specialist). The People, Agents, and Tasks rows are graph shortcuts: clicking each opens the artefact-pane Graph filtered to every Person, every public Agent, or every Task in your account respectively, with no side-list — the graph itself is the result. Public agents become first-class graph entities the moment you create them, with edges to their IDENTITY/SOUL/KNOWLEDGE files, edges to every knowledge document they have access to, and edges from every conversation they have handled, so a single Agents click reveals the whole shape of who knows what and who has been talking to whom. Click an artefact row to open the document. KnowledgeDocuments and your admin agent's templates are editable — type in the document and changes save automatically; specialist agent templates are read-only because they ship with Maxy and your edits would be overwritten on the next install. PDF artefacts render inline so you can read them without leaving the pane. If your browser doesn't have a built-in PDF viewer, a Download button appears instead. Artefacts that have no readable file backing them (orphan rows, files removed from disk, unsupported content types) show a one-line banner explaining the skip instead of opening to a blank pane. 
Click a project row to open the Graph view focused on that project's neighbourhood — clicking a second project swaps the focus rather than stacking on top. The chat / artefact divider is drag-resizable — drag the line between the columns to make either side wider; double-click it to reset to half of the available width (viewport minus sidebar), clamped to the chat / artefact min-width floors. Your chosen width is remembered across reloads. On wider screens (>1280px) all three panes are visible. The sidebar narrows at 1280px, the artefact pane hides at 1080px (Browser, Data, and Graph then open as full-window pages instead), and the sidebar collapses to a 56px icon rail at 820px. On phones (<720px) the sidebar slides in as a drawer from the left when you tap the menu icon in the chat header.
+ The admin interface is a three-pane layout: a sidebar on the left with your brand mark, navigation (Chat, People, Agents, Projects, Tasks, Artefacts), and your recent conversations; the chat in the middle; and an artefact pane on the right that opens when you select a document, click a project, or open Browser, Data, or Graph from the menu — holding the surface side-by-side with the conversation so the chat stays live while you work in it. The sidebar's nav rows swap the list view in place — Chat shows recent conversations, Projects shows your active work projects, and Artefacts lists every KnowledgeDocument plus this account's agent templates (your admin agent's IDENTITY, SOUL, and KNOWLEDGE files plus one entry per enabled specialist). The People, Agents, and Tasks rows are graph shortcuts: clicking each opens the artefact-pane Graph filtered to every Person, every public Agent, or every Task in your account respectively, with no side-list — the graph itself is the result. Public agents become first-class graph entities the moment you create them, with edges to their IDENTITY/SOUL/KNOWLEDGE files, edges to every knowledge document they have access to, and edges from every conversation they have handled, so a single Agents click reveals the whole shape of who knows what and who has been talking to whom. Click an artefact row to open the document. KnowledgeDocuments and your admin agent's templates are editable — type in the document and changes save automatically; specialist agent templates are read-only because they ship with Maxy and your edits would be overwritten on the next install. PDF artefacts render inline so you can read them without leaving the pane. If your browser doesn't have a built-in PDF viewer, a Download button appears instead. Artefacts that have no readable file backing them (orphan rows, files removed from disk, unsupported content types) show a one-line banner explaining the skip instead of opening to a blank pane. 
Click a project row to open the Graph view focused on that project's neighbourhood — clicking a second project swaps the focus rather than stacking on top. The chat / artefact divider is drag-resizable — drag the line between the columns to make either side wider; double-click it to reset to half of the available width (viewport minus sidebar), clamped to the chat / artefact min-width floors. Your chosen width is remembered across reloads. On wider screens (>1280px) all three panes are visible. The sidebar narrows at 1280px, the artefact pane hides at 1080px (Browser, Data, and Graph then open as full-window pages instead), and the sidebar collapses to a 56px icon rail at 820px. On phones (<720px) the sidebar slides in as a drawer from the left when you tap the menu icon in the chat header. When the sidebar is collapsed to the 56px icon rail, clicking the Artefacts icon expands the rail back open so the artefact list is visible — the row was previously a silent no-op in collapsed state.
 
  Page titles are brand-aware: the browser tab shows your product name (e.g. `Real Agent` instead of `Maxy`) on every shell — chat, graph, and data — so a non-default brand never leaks the default name in tab strips or browser history.
 
@@ -1,5 +1,13 @@
  # Troubleshooting
 
+ ## Browser navigation to a local file (`file://`) used to time out for two minutes
+
+ **Symptom:** Older versions of the platform's admin agent would attempt `browser_navigate file:///path/to.html`, hit Playwright's silent two-minute timeout, then guess fixed ports (8080 / 3000 / 8000 / 9000) and report `ERR_CONNECTION_REFUSED` for each before someone manually started a local HTTP server.
+
+ **Resolution shipped:** The `playwright-file-guard` PreToolUse hook (admin plugin) intercepts `file://` URLs, picks a free loopback port, backgrounds `python3 -m http.server` rooted at the file's parent directory, connect-verifies the server within one second, and rewrites the tool call's URL to `http://127.0.0.1:<port>/<basename>` before Playwright sees it. The agent never sees the rewrite. Stale server processes are reaped opportunistically on every hook invocation (1 h threshold, gated by a `ps` cmdline check that won't kill a reused PID).
+
+ **Diagnose if it ever recurs:** grep the per-conversation stream log for `[playwright-file-guard] action=`. One `action=rewrite original=file://… port=<n> pid=<m>` line per file:// navigate is the healthy signal. `action=fail reason=<r>` indicates the hook tried to rewrite but failed open (Playwright handled the original URL); the reason field names the cause (`python3-missing`, `port-pick-failed`, `server-not-ready`, `file-not-found`, `spawn-failed`). The `cleanup` argv on the hook script can be invoked manually to sweep `/tmp/playwright-file-guard.*.pid`; the suite at `platform/plugins/admin/hooks/__tests__/playwright-file-guard.test.sh` exercises every path.
+
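The rewrite step described in the resolution above can be modelled as a pure function. The sketch below is an illustration in TypeScript, not the shipped hook (which is a shell script): `planFileUrlRewrite`, `RewritePlan`, and the example port are hypothetical names and values, POSIX-style paths are assumed, and port picking, server start-up, and connect-verification are left out.

```typescript
// Hypothetical model of the file:// -> http://127.0.0.1:<port>/<basename> rewrite.
interface RewritePlan {
  serveRoot: string;    // directory a static server would be rooted at
  rewrittenUrl: string; // URL handed to the browser instead of the file:// URL
}

function planFileUrlRewrite(rawUrl: string, port: number): RewritePlan | null {
  let path: string;
  try {
    const parsed = new URL(rawUrl);
    if (parsed.protocol !== "file:") return null; // only file:// URLs are rewritten
    path = decodeURIComponent(parsed.pathname);
  } catch {
    return null; // unparsable URL or bad percent-escape: leave the call untouched
  }

  const slash = path.lastIndexOf("/");
  const basename = path.slice(slash + 1);
  if (basename === "") return null; // directory URL, nothing specific to serve

  // The server is rooted at the file's parent directory.
  const serveRoot = slash === 0 ? "/" : path.slice(0, slash);
  return {
    serveRoot,
    rewrittenUrl: `http://127.0.0.1:${port}/${encodeURIComponent(basename)}`,
  };
}
```

For `file:///tmp/site/report.html` and port 43210 this yields a serve root of `/tmp/site` and the URL `http://127.0.0.1:43210/report.html`. Any non-`file:` URL returns `null`, which corresponds to leaving the tool call untouched.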
  ## First user-domain write rejected by `[graph-write-gate] reject reason=no-admin-user`
 
  **Symptom:** Admin chat reports "couldn't save that — set up your business profile first" or `[graph-write-gate] reject reason=no-admin-user` appears in `server.log` on the operator's first non-bootstrap write (a website, service, opening hours, etc.). Reproduces on Minimal-onboarded installs from before the seed-stamping fix shipped.
@@ -64,6 +72,8 @@ tail -200 ~/.maxy/logs/maxy-ui.log | rg '\[remote-auth\].*resolvedKind='
 
  **Agent searches the filesystem after uploading a zip.** If you uploaded a zip and the agent burns several turns running `find` / `Glob` instead of unzipping, that is the symptom of the recovery-retry attachment-context regression (now closed by the recovery context preservation contract in `.docs/agents.md`). Greppable confirmation is the `[context-overflow-recovery] retry … attachmentsCarried=<n>` line in the conversation stream log. If you see `[context-overflow-recovery] WARN attachment-context-lost`, the regression has returned — surface to support.
 
+
+ **A turn rendered in chat is missing on next page-refresh.** Before the 2026-05-07 mandate this was a class of silent failure: Neo4j persists were wrapped in a no-op error catch, so a write that threw left the artefact "rendered then disappeared on resume". The 2026-05-07 mandate makes JSONL canonical: the resume route reads the SDK transcript file at `~/.claude/projects/<project-key>/<sessionId>.jsonl` first, supplements from Neo4j, and triggers async heal-on-resume writes for any turn the JSONL has but Neo4j does not. A refreshed conversation therefore always renders what the SDK saw, regardless of write outcome. If a heal write itself fails, the chat shows a top-of-conversation banner naming the count; if every heal succeeds, the resume is silent and the missing rows are quietly restored to Neo4j. Greppable post-deploy invariants in the per-conversation stream log (`logs/claude-agent-stream-<conversationId>.log`): `[admin-resume] reason=<…> source=<jsonl|jsonl-missing|neo4j-only>` (one per resume), `[admin-persist] convId=<8> writer=<…> outcome=<ok|fail|skip>` (per persist site), `[admin-persist-heal] convId=<8> turnIndex=<n> outcome=<ok|fail>` (per heal write). To force-audit a specific conversation against its Neo4j projection without re-executing it, run `tsx platform/scripts/admin-persist-audit.ts --conversation-id=<uuid> --account-id=<uuid> --session-id=<uuid>`. A non-zero exit plus per-divergence `[admin-persist-audit] expected=<message|component> missing reason=neo4j-row-absent` lines name what would have been silently lost pre-mandate.
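The greppable log tags named above use `key=value` fields, so a stream log can also be filtered programmatically. The following is a minimal sketch assuming each log line carries one bracketed tag followed by space-separated `key=value` tokens; `parseTaggedLine` and `failedPersists` are hypothetical helper names, and the exact line layout beyond the fields quoted above is an assumption.

```typescript
// Hypothetical helpers for filtering [admin-persist] / [admin-persist-heal] lines.
interface TaggedLine {
  tag: string;                    // e.g. "admin-persist"
  fields: Record<string, string>; // e.g. { convId: "...", outcome: "fail" }
}

function parseTaggedLine(line: string): TaggedLine | null {
  const m = line.match(/\[([a-z-]+)\]\s+(.*)$/);
  if (!m) return null;
  const [, tag = "", rest = ""] = m;
  const fields: Record<string, string> = {};
  for (const token of rest.split(/\s+/)) {
    const eq = token.indexOf("=");
    // Tokens without "=" (free text) are ignored.
    if (eq > 0) fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return { tag, fields };
}

// Returns every persist or heal line whose outcome is "fail".
function failedPersists(lines: string[]): TaggedLine[] {
  return lines
    .map((line) => parseTaggedLine(line))
    .filter((p): p is TaggedLine => p !== null)
    .filter(
      (p) =>
        (p.tag === "admin-persist" || p.tag === "admin-persist-heal") &&
        p.fields["outcome"] === "fail",
    );
}
```

Feeding the lines of one `logs/claude-agent-stream-<conversationId>.log` file into `failedPersists` would list the persist sites and heal writes that failed for that conversation, which is the same information the top-of-conversation banner summarises as a count.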
  **Wrong Claude account answering on a multi-brand device.** On a host running both Maxy and Real Agent, each brand's admin agent reads its own `~/${brand.configDir}/.claude/.credentials.json`; there is no longer a shared `~/.claude/` thrashing them against one another. If a brand reports auth failures or appears to be operating against the wrong subscription, check three things:
  1. `grep "\[claude-auth\] init" ~/.${brand}/logs/server.log | tail -1` — the resolved path must end with `~/.${brand}/.claude/.credentials.json`. If a `[claude-auth] WARN cross-brand-path-detected` line is present, the runtime is still pointing at `~/.claude/`; the brand main service did not pick up the `Environment=CLAUDE_CONFIG_DIR=` setting (re-run the brand installer to refresh the unit file).
  2. `diff <(jq .claudeAiOauth.accessToken ~/.maxy/.claude/.credentials.json) <(jq .claudeAiOauth.accessToken ~/.realagent/.claude/.credentials.json)` — must be non-empty after each brand's operator has run `claude /login` against distinct Anthropic accounts; if it's empty, both brands are still logged in to the same account (operator action, not a code bug).
@@ -0,0 +1,191 @@
+ #!/usr/bin/env -S node --loader tsx
+ /**
+  * Task 940: admin persist audit harness.
+  *
+  * Compares JSONL canonical state against Neo4j projection for a given
+  * conversationId. Prints one [admin-persist-audit] divergence line per
+  * (sdkTurnUuid, expected) gap; non-zero exit on any mismatch. Designed to
+  * run against an operator-supplied stream log fixture WITHOUT re-executing
+  * the live session; JSONL is the only ground truth this script consults.
+  *
+  * Usage:
+  *   tsx platform/scripts/admin-persist-audit.ts \
+  *     --conversation-id=<uuid> \
+  *     --account-id=<uuid> \
+  *     --jsonl=<path>       # optional override; otherwise resolved from accountId+sessionId
+  *     --session-id=<uuid>  # required if --jsonl not provided
+  *
+  * Exit codes:
+  *   0 = no divergences
+  *   1 = at least one Message, Component, or KnowledgeDocument projection absent from Neo4j
+  *   2 = invocation error (missing args, file unreadable, Neo4j unreachable)
+  *
+  * Why audit-only and not also auto-heal: the heal-on-resume writer at
+  * server/routes/admin/sessions.ts handles the live path. This harness exists
+  * for forensic investigation against operator-supplied JSONLs (e.g. the
+  * 2026-05-07 stream log that motivated this task) where the live session
+  * has long since terminated.
+  */
+
+ import { existsSync, readFileSync } from "node:fs";
+ import { homedir } from "node:os";
+ import { resolve } from "node:path";
+ import process from "node:process";
+
+ import { replayJsonl, resolveJsonlPath } from "../ui/app/lib/claude-agent/jsonl-replay";
+ import { ACCOUNTS_DIR } from "../ui/app/lib/claude-agent/account";
+ import { getRecentMessages, getSession } from "../ui/app/lib/neo4j-store";
+ import { PERSISTENT_COMPONENTS } from "../lib/persistent-components/src/index";
+
+ interface Args {
+   conversationId: string;
+   accountId: string;
+   jsonlPath: string;
+ }
+
+ function parseArgs(argv: string[]): Args | { error: string } {
+   const out: Partial<Args> = {};
+   let sessionId: string | undefined;
+   let jsonlOverride: string | undefined;
+   for (const a of argv) {
+     const m = a.match(/^--([a-z-]+)=(.+)$/);
+     if (!m) continue;
+     const [, key, val] = m;
+     if (key === "conversation-id") out.conversationId = val;
+     else if (key === "account-id") out.accountId = val;
+     else if (key === "session-id") sessionId = val;
+     else if (key === "jsonl") jsonlOverride = val;
+   }
+   if (!out.conversationId) return { error: "--conversation-id required" };
+   if (!out.accountId) return { error: "--account-id required" };
+   if (jsonlOverride) {
+     out.jsonlPath = resolve(jsonlOverride);
+   } else if (sessionId) {
+     const accountDir = resolve(ACCOUNTS_DIR, out.accountId);
+     out.jsonlPath = resolveJsonlPath(homedir(), accountDir, sessionId);
+   } else {
+     return { error: "either --jsonl or --session-id required" };
+   }
+   return out as Args;
+ }
+
+ async function main(): Promise<number> {
+   const parsed = parseArgs(process.argv.slice(2));
+   if ("error" in parsed) {
+     console.error(`[admin-persist-audit] usage error: ${parsed.error}`);
+     return 2;
+   }
+   const { conversationId, jsonlPath } = parsed;
+
+   if (!existsSync(jsonlPath)) {
+     console.error(`[admin-persist-audit] jsonl absent path=${jsonlPath} convId=${conversationId.slice(0, 8)}`);
+     return 2;
+   }
+
+   // JSONL replay: derives the canonical message stream and the expected
+   // component side-effects.
+   const replay = replayJsonl(jsonlPath);
+   if (replay.malformedLines > 0) {
+     console.error(`[admin-persist-audit] jsonl-malformed-lines convId=${conversationId.slice(0, 8)} count=${replay.malformedLines}`);
+   }
+
+   // Neo4j projection: what the resume route would have rendered pre-940.
+   let neo4j: Awaited<ReturnType<typeof getRecentMessages>>;
+   try {
+     neo4j = await getRecentMessages(conversationId, 1000);
+   } catch (err) {
+     console.error(`[admin-persist-audit] neo4j-read-failed convId=${conversationId.slice(0, 8)} reason=${err instanceof Error ? err.message : String(err)}`);
+     return 2;
+   }
+
+   // Build a set of (role, content) keys present in Neo4j for fast lookup.
+   // Same matching key the resume route uses, for parity.
+   const neo4jByKey = new Map<string, typeof neo4j[number]>();
+   for (const n of neo4j) neo4jByKey.set(`${n.role}\x1f${n.content}`, n);
+
+   let divergences = 0;
+   for (const j of replay.messages) {
+     const key = `${j.role}\x1f${j.content}`;
+     const match = neo4jByKey.get(key);
+     if (!match) {
+       // Whole message absent from Neo4j.
+       console.log(`[admin-persist-audit] convId=${conversationId.slice(0, 8)} sdkTurnUuid=${j.messageId.slice(0, 8)} expected=message missing reason=neo4j-row-absent`);
+       divergences += 1;
+       // Each missing message implies its components are also missing, so emit
+       // one divergence line per absent component for forensic completeness.
+       for (const c of j.components) {
+         console.log(`[admin-persist-audit] convId=${conversationId.slice(0, 8)} sdkTurnUuid=${j.messageId.slice(0, 8)} expected=component component_name=${c.name} ordinal=${c.ordinal} missing reason=neo4j-row-absent`);
+         divergences += 1;
+       }
+       continue;
+     }
+     // Message exists; cross-check component count (Neo4j carries components
+     // as siblings of :Message via :HAS_COMPONENT).
+     const neoComps = match.components ?? [];
+     if (neoComps.length < j.components.length) {
+       for (let i = neoComps.length; i < j.components.length; i++) {
+         const c = j.components[i];
+         console.log(`[admin-persist-audit] convId=${conversationId.slice(0, 8)} sdkTurnUuid=${j.messageId.slice(0, 8)} expected=component component_name=${c.name} ordinal=${c.ordinal} missing reason=neo4j-row-absent`);
+         divergences += 1;
+       }
+     }
+   }
+
+   // Task 942: every PERSISTENT_COMPONENTS :Component row must have a
+   // sibling :KnowledgeDocument with a matching attachmentId. Two failure
+   // modes: (a) live writer succeeded the :Component CREATE but the
+   // sibling :KnowledgeDocument MERGE didn't fire (theoretical: they're
+   // in the same tx, so this is only possible if the row is pre-942 or if
+   // the FOREACH ran on a null attachmentId); (b) the disk-write failed
+   // mid-render and `c.attachmentId` is null but the artefact bytes might
+   // be recoverable from `c.data`. Rows with a null attachmentId are
+   // surfaced as `empty-attachment-id`, rows with an attachmentId but no
+   // projection as `kd-row-absent`; in either case the operator runs
+   // component-knowledgedoc-backfill.ts to materialise the projection.
+   const projectionSession = getSession();
+   try {
+     const componentRowsResult = await projectionSession.run(
+       `MATCH (m:Message {conversationId: $conversationId})-[:HAS_COMPONENT]->(c:Component)
+        WHERE c.name IN $names
+        OPTIONAL MATCH (k:KnowledgeDocument {accountId: c.accountId, attachmentId: c.attachmentId})
+        WHERE c.attachmentId IS NOT NULL
+        RETURN c.componentId AS componentId,
+               c.name AS componentName,
+               c.accountId AS accountId,
+               c.attachmentId AS attachmentId,
+               k IS NOT NULL AS hasProjection`,
+       { conversationId, names: Array.from(PERSISTENT_COMPONENTS) },
+     );
+
+     for (const record of componentRowsResult.records) {
+       const componentId = record.get("componentId") as string;
+       const componentName = record.get("componentName") as string;
+       const attachmentId = record.get("attachmentId") as string | null;
+       const hasProjection = record.get("hasProjection") === true;
+       if (!attachmentId) {
+         console.log(`[admin-persist-audit] convId=${conversationId.slice(0, 8)} componentId=${componentId.slice(0, 8)} expected=knowledgedoc component_name=${componentName} missing reason=empty-attachment-id`);
+         divergences += 1;
+         continue;
+       }
+       if (!hasProjection) {
+         console.log(`[admin-persist-audit] convId=${conversationId.slice(0, 8)} componentId=${componentId.slice(0, 8)} expected=knowledgedoc component_name=${componentName} missing reason=kd-row-absent attachmentId=${attachmentId.slice(0, 8)}`);
+         divergences += 1;
+       }
+     }
+   } finally {
+     await projectionSession.close();
+   }
+
+   if (divergences === 0) {
+     console.log(`[admin-persist-audit] convId=${conversationId.slice(0, 8)} jsonlMessages=${replay.messages.length} neo4jMessages=${neo4j.length} divergences=0 status=ok`);
+     return 0;
+   }
+   console.log(`[admin-persist-audit] convId=${conversationId.slice(0, 8)} jsonlMessages=${replay.messages.length} neo4jMessages=${neo4j.length} divergences=${divergences} status=mismatch`);
+   return 1;
+ }
+
+ main()
+   .then((code) => process.exit(code))
+   .catch((err) => {
+     console.error(`[admin-persist-audit] crashed: ${err instanceof Error ? err.stack : String(err)}`);
+     process.exit(2);
+   });
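The message and component comparison in the audit script can be isolated as a pure function, which makes the matching rule easier to reason about. This is a minimal sketch under assumed shapes: `ReplayMessage`, `StoredMessage`, `keyOf`, and `findDivergences` are hypothetical names, and the returned strings stand in for the script's log lines.

```typescript
// Hypothetical, dependency-free model of the audit's divergence detection.
interface ReplayMessage {
  role: string;
  content: string;
  messageId: string;
  components: { name: string; ordinal: number }[];
}
interface StoredMessage {
  role: string;
  content: string;
  components?: unknown[];
}

// Same (role, content) key as the script, joined by the unit-separator character.
const keyOf = (m: { role: string; content: string }): string => `${m.role}\x1f${m.content}`;

function findDivergences(jsonl: ReplayMessage[], stored: StoredMessage[]): string[] {
  const byKey = new Map<string, StoredMessage>();
  for (const s of stored) byKey.set(keyOf(s), s);

  const out: string[] = [];
  for (const j of jsonl) {
    const match = byKey.get(keyOf(j));
    if (!match) out.push(`message:${j.messageId}`);
    // A missing message implies all of its components are missing; a present
    // message is only missing the components beyond the stored count.
    const present = match ? (match.components ?? []).length : 0;
    for (const c of j.components.slice(present)) {
      out.push(`component:${j.messageId}:${c.name}#${c.ordinal}`);
    }
  }
  return out;
}
```

One consequence of keying on role plus content is that two turns with identical role and text share a key, so a missing duplicate turn would still find a match and would not be reported. Whether that matters depends on how often identical turns occur in practice.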
@@ -0,0 +1,214 @@
+ #!/usr/bin/env -S node --loader tsx
+ /**
+  * Task 942: backfill :KnowledgeDocument projections for legacy
+  * :Component rows whose render-component was emitted before this
+  * task landed.
+  *
+  * Walks every `:Component {name ∈ PERSISTENT_COMPONENTS}` row that
+  * lacks a sibling `:KnowledgeDocument` (matched by accountId +
+  * deterministic attachmentId derived from the component's id) and
+  * for each one materialises the file on disk + MERGEs the projection
+  * row in a single Cypher tx. Idempotent: re-running against the
+  * same rows is a no-op (MERGE collapses, file write rewrites the
+  * same bytes).
+  *
+  * Usage:
+  *   tsx platform/scripts/component-knowledgedoc-backfill.ts \
+  *     [--account-id=<uuid>]  # optional filter, default = all accounts
+  *     [--dry-run]            # print what would happen, do not write
+  *
+  * Exit codes:
+  *   0 = no rows needed backfill, OR all rows succeeded (including --dry-run)
+  *   1 = at least one row failed (disk write threw, Cypher tx threw)
+  *   2 = invocation / Neo4j connection error
+  *
+  * Per-row log line:
+  *   [component-kd-backfill] convId=<…> componentId=<…> outcome=created|skipped|failed|dry-run reason=<…>
+  *
+  * The skip cases:
+  *   - component data does not contain `data.content` or `data.html`,
+  *   - both fields empty,
+  *   - the projection row already exists (idempotent re-run).
+  */
+
+ import process from "node:process";
+
+ import { isPersistentComponent, PERSISTENT_COMPONENTS } from "../lib/persistent-components/src/index";
+ import { deriveComponentAttachmentId, deriveComponentTitle, pickComponentBytes } from "../ui/app/lib/claude-agent/component-attachment";
+ import { getSession } from "../ui/app/lib/neo4j-store";
+ import { storeComponentArtefact } from "../ui/app/lib/attachments";
+
+ interface Args {
+   accountIdFilter?: string;
+   dryRun: boolean;
+ }
+
+ function parseArgs(argv: string[]): Args {
+   const out: Args = { dryRun: false };
+   for (const a of argv) {
+     const m = a.match(/^--([a-z-]+)(?:=(.+))?$/);
+     if (!m) continue;
+     const [, key, val] = m;
+     if (key === "account-id") out.accountIdFilter = val;
+     else if (key === "dry-run") out.dryRun = true;
+   }
+   return out;
+ }
+
+ interface ComponentRow {
+   componentId: string;
+   conversationId: string;
+   accountId: string;
+   name: string;
+   data: string;
+   existingAttachmentId: string | null;
+   messageId: string;
+ }
+
+ async function main(): Promise<number> {
+   const args = parseArgs(process.argv.slice(2));
+   const session = getSession();
+   let backfilled = 0;
+   let skipped = 0;
+   let failed = 0;
+
+   try {
+     // Pull every PERSISTENT_COMPONENTS :Component row, optionally
+     // filtered by accountId. The query also returns the existing
+     // c.attachmentId (null on legacy / pre-942 rows). For legacy rows
+     // we derive attachmentId from componentId (the only stable
+     // identifier on the historical data), write the file, MERGE the
+     // projection, AND back-fill c.attachmentId so the audit harness
+     // collapses on the same row on its next run.
+     const componentNames = Array.from(PERSISTENT_COMPONENTS);
+     const filterClause = args.accountIdFilter ? "AND c.accountId = $accountId" : "";
+     const result = await session.run(
+       `MATCH (m:Message)-[:HAS_COMPONENT]->(c:Component)
+        WHERE c.name IN $names ${filterClause}
+        RETURN c.componentId AS componentId,
+               c.conversationId AS conversationId,
+               c.accountId AS accountId,
+               c.name AS name,
+               c.data AS data,
+               c.attachmentId AS existingAttachmentId,
+               m.messageId AS messageId
+        ORDER BY c.createdAt`,
+       args.accountIdFilter
+         ? { names: componentNames, accountId: args.accountIdFilter }
+         : { names: componentNames },
+     );
+
+     for (const record of result.records) {
+       const row: ComponentRow = {
+         componentId: record.get("componentId") as string,
+         conversationId: record.get("conversationId") as string,
+         accountId: record.get("accountId") as string,
+         name: record.get("name") as string,
+         data: record.get("data") as string,
+         existingAttachmentId: (record.get("existingAttachmentId") as string | null) ?? null,
+         messageId: record.get("messageId") as string,
+       };
+
+       if (!isPersistentComponent(row.name)) {
+         // Defensive: the WHERE clause already filters, but a future
+         // schema change might mismatch; log and move on.
+         skipped += 1;
+         console.log(`[component-kd-backfill] convId=${row.conversationId.slice(0, 8)} componentId=${row.componentId.slice(0, 8)} outcome=skipped reason=name-not-persistent`);
+         continue;
+       }
+
+       let dataObj: Record<string, unknown>;
+       try {
+         dataObj = JSON.parse(row.data) as Record<string, unknown>;
+       } catch {
+         skipped += 1;
+         console.log(`[component-kd-backfill] convId=${row.conversationId.slice(0, 8)} componentId=${row.componentId.slice(0, 8)} outcome=skipped reason=data-not-json`);
+         continue;
+       }
+
+       const bytesPick = pickComponentBytes(dataObj);
+       if (!bytesPick) {
+         skipped += 1;
+         console.log(`[component-kd-backfill] convId=${row.conversationId.slice(0, 8)} componentId=${row.componentId.slice(0, 8)} outcome=skipped reason=no-content-or-html`);
+         continue;
+       }
+
+       // Prefer the live-writer-stamped attachmentId when present;
+       // otherwise derive from componentId (legacy / pre-942 rows).
+       // The derived value is then written back onto :Component below
+       // so the audit harness sees a single source of truth on the
+       // next run.
+       const attachmentId = row.existingAttachmentId ?? deriveComponentAttachmentId(row.componentId);
+       const derivedFromComponentId = !row.existingAttachmentId;
+       const title = deriveComponentTitle(row.name, dataObj);
+       const filename = bytesPick.mimeType === "text/html" ? `${title}.html` : `${title}.md`;
+
+       if (args.dryRun) {
+         console.log(`[component-kd-backfill] convId=${row.conversationId.slice(0, 8)} componentId=${row.componentId.slice(0, 8)} outcome=dry-run attachmentId=${attachmentId.slice(0, 8)} mimeType=${bytesPick.mimeType} bytes=${bytesPick.content.length} source=${derivedFromComponentId ? "derived" : "stamped"}`);
+         continue;
+       }
+
+       try {
+         await storeComponentArtefact(row.accountId, attachmentId, bytesPick.mimeType, bytesPick.content, filename);
+       } catch (err) {
+         failed += 1;
+         const reason = err instanceof Error ? err.message : String(err);
+         console.log(`[component-kd-backfill] convId=${row.conversationId.slice(0, 8)} componentId=${row.componentId.slice(0, 8)} outcome=failed reason=disk-write:${JSON.stringify(reason.slice(0, 200))}`);
+         continue;
+       }
+
+       try {
+         // MERGE the projection + the discovery edge from :Message + back-fill
+         // c.attachmentId so the audit harness sees a single attachmentId
+         // source on its next run. The Cypher returns whether the projection
+         // was newly created or already existed, so the per-row log line
+         // distinguishes the two outcomes for forensic purposes.
+         const mergeResult = await session.run(
+           `MATCH (m:Message {messageId: $messageId, accountId: $accountId})
+            MATCH (m)-[:HAS_COMPONENT]->(c:Component {componentId: $componentId})
+            MERGE (k:KnowledgeDocument {accountId: $accountId, attachmentId: $attachmentId})
+            ON CREATE SET k.name = $title,
+                          k.encodingFormat = $mimeType,
+                          k.createdAt = datetime(),
+                          k.updatedAt = datetime()
+            ON MATCH SET k.updatedAt = datetime()
+            MERGE (m)-[:HAS_KNOWLEDGE_DOCUMENT]->(k)
+            SET c.attachmentId = $attachmentId
+            RETURN CASE WHEN k.createdAt = k.updatedAt THEN 'created' ELSE 'projection-existed' END AS state`,
+           {
+             accountId: row.accountId,
+             attachmentId,
+             title,
+             mimeType: bytesPick.mimeType,
+             messageId: row.messageId,
+             componentId: row.componentId,
+           },
+         );
+         const state = mergeResult.records[0]?.get("state") as string | undefined;
+         if (state === "created") {
+           backfilled += 1;
+           console.log(`[component-kd-backfill] convId=${row.conversationId.slice(0, 8)} componentId=${row.componentId.slice(0, 8)} outcome=created attachmentId=${attachmentId.slice(0, 8)} mimeType=${bytesPick.mimeType} bytes=${bytesPick.content.length} source=${derivedFromComponentId ? "derived" : "stamped"}`);
+         } else {
+           skipped += 1;
+           console.log(`[component-kd-backfill] convId=${row.conversationId.slice(0, 8)} componentId=${row.componentId.slice(0, 8)} outcome=skipped reason=already-projected`);
+         }
+       } catch (err) {
+         failed += 1;
+         const reason = err instanceof Error ? err.message : String(err);
+         console.log(`[component-kd-backfill] convId=${row.conversationId.slice(0, 8)} componentId=${row.componentId.slice(0, 8)} outcome=failed reason=cypher:${JSON.stringify(reason.slice(0, 200))}`);
+       }
+     }
+   } finally {
+     await session.close();
+   }
+
+   console.log(`[component-kd-backfill] summary backfilled=${backfilled} skipped=${skipped} failed=${failed}`);
+   return failed === 0 ? 0 : 1;
+ }
+
+ main()
+   .then((code) => process.exit(code))
+   .catch((err) => {
+     console.error(`[component-kd-backfill] crashed: ${err instanceof Error ? err.stack : String(err)}`);
+     process.exit(2);
+   });
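The flag-parsing regex in the backfill script makes the `=<value>` part optional, which has an edge case worth knowing before running without `--dry-run`. The sketch below copies that parsing loop into a standalone function to show the behaviour; `parseBackfillArgs` and `hasEmptyAccountFilter` are hypothetical names, and the guard is an illustration rather than part of the package.

```typescript
interface BackfillArgs {
  accountIdFilter?: string;
  dryRun: boolean;
}

// Same regex and branching as the script's parseArgs, isolated for inspection.
function parseBackfillArgs(argv: string[]): BackfillArgs {
  const out: BackfillArgs = { dryRun: false };
  for (const a of argv) {
    const m = a.match(/^--([a-z-]+)(?:=(.+))?$/);
    if (!m) continue; // unrecognised shapes are silently ignored
    const [, key, val] = m;
    if (key === "account-id") out.accountIdFilter = val;
    else if (key === "dry-run") out.dryRun = true;
  }
  return out;
}

// Hypothetical guard: detects an account filter that was passed without a value.
function hasEmptyAccountFilter(argv: string[]): boolean {
  return argv.some((a) => a === "--account-id" || a === "--account-id=");
}
```

With this parsing, `--account-id=` (empty value) does not match the regex and `--account-id` (no `=`) matches with an undefined value, so in both cases no filter is set and the run covers all accounts. A wrapper script could call something like `hasEmptyAccountFilter` first and refuse to continue, for example when the account id comes from an unset shell variable.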