npm - @rubytech/create-realagent - Versions diffs - 1.0.705 → 1.0.707 - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (107)

package/payload/platform/plugins/admin/skills/onboarding/SKILL.md CHANGED

@@ -110,7 +110,7 @@ Present the admin SOUL via `render-component` with `name: "document-editor"` and
 After the admin SOUL is written and approved, call `onboarding-complete-step` with step 6.
-**Document ingestion.** If the user uploaded any documents during Step 6 (or earlier in the session), dispatch the `content-producer` subagent to ingest them AFTER calling `onboarding-complete-step` — not before. Use the Agent tool with `run_in_background: true`. The critical path (SOUL file, step completion) must not depend on document ingestion succeeding. If no documents were uploaded, skip this step.
+**Document ingestion.** If the user uploaded any documents during Step 6 (or earlier in the session), dispatch the `database-operator` subagent (via the universal `document-ingest` skill) to ingest them AFTER calling `onboarding-complete-step` — not before. Use the Agent tool with `run_in_background: true`. The critical path (SOUL file, step completion) must not depend on document ingestion succeeding. Include the document path, the document subject (typically the account owner's UserProfile or the LocalBusiness depending on the doc), and the scope in the brief. If no documents were uploaded, skip this step.
 **Next steps.** After completing onboarding, let the user know that everything configured during onboarding — plugins, WiFi, output style, thinking view, timezone, and personality — can be changed at any time through conversation. Then suggest three things the user can do next — all optional and available whenever they are ready:
@@ -167,14 +167,55 @@ All retry loops re-evaluate using the `action` returned in the most recent respo
 Do not read any skill files. Do not call any other Anthropic tools except `anthropic-setup`. Do not dispatch specialists. The `anthropic-setup` tool handles the entire flow.
-## Step 9 — Business profile
+## Step 9 — Operator persona and profile bootstrap
 *(skip if `currentStep` >= 9)*
-Populate the business's operational identity in the graph — the admin user node, the `LocalBusiness` node, and its core properties (name, description, address, hours, services). Without this, the graph-write gate refuses any user-domain write, so capturing it now avoids the agent being interrupted mid-task later.
+Pin the operator's persona and bootstrap the graph nodes that satisfy the graph-write gate. The persona choice is the trust-shaping moment for this step — an employee who answers "what's your business?" with their employer's name silently registers a `LocalBusiness` owned by the wrong party, so we must surface the question explicitly before any write.
-Invoke the `business-profile` skill. Follow its first-run path: create the `AdminUser` node (bound to the `userId` from users.json), create the `LocalBusiness` node, collect identity + address + whichever additional domains (hours, services, FAQs, brand assets) the user provides. The skill knows how to adapt — accept partial input and allow skipping sections.
+**Render the persona select first.** Call `render-component` with `name: "single-select"` and data:
-When `business-profile` reports that the `AdminUser` and `LocalBusiness` nodes exist in the graph, call `onboarding-complete-step` with step 9. Do not mark step 9 complete before both nodes exist — the gate's precondition must be real, not just recorded.
+```
+{
+  submitMessage: "Persona: {{value}}",
+  options: [
+    {
+      value: "personal",
+      label: "Just for me",
+      description: "I am not setting this up for a business — Maxy is my personal operations agent."
+    },
+    {
+      value: "business-owner",
+      label: "For my business",
+      description: "I am the owner / operator and Maxy is the operations agent for my company."
+    },
+    {
+      value: "employee",
+      label: "I work somewhere, but not as the owner",
+      description: "I have an employer, but this device is for my personal use — my employer is NOT being registered here."
+    }
+  ]
+}
+```
+**Wait for the user's submission.** If the user picks "Other" or types free text instead of selecting, ask them which of the three personas best describes them and re-render the select. Do not proceed without one of the three documented modes — the agent must not improvise a fourth path. If the user pivots off-topic mid-flow, answer their question briefly and re-render the select; step 9 stays incomplete until they pick a mode.
+**Call `onboarding-step9-mode` with the chosen mode before any graph write or skill invocation.** The tool emits the diagnostic log line and returns the deterministic next-action prose. Branch on the mode:
+### `business-owner`
+Invoke the `business-profile` skill. Follow its first-run path: create the `AdminUser` node, create the `LocalBusiness` node, collect identity + address + whichever additional domains (hours, services, FAQs, brand assets) the user provides. When `business-profile` reports that both nodes exist in the graph, call `onboarding-complete-step` with step 9. Do not mark step 9 complete before both nodes exist — the gate's precondition must be real, not just recorded.
+### `personal` or `employee`
+Personal/employee mode does not register a `LocalBusiness`. Instead, bootstrap the operator's profile so the graph-write gate is satisfied without a business node:
+1. **Ask the user for their email** in one short conversational message — Maxy needs an email or phone number on the personal-profile node, and email is the more useful signal for downstream features.
+2. **Read the admin's `userId` and `name` from `admin-identity` in your system prompt.** Split `name` into `givenName` (first token) and `familyName` (remainder, or empty if a single token).
+3. **Write the `AdminUser` node.** Call `memory-write` with `labels: ["AdminUser"]`, `properties: { userId, name }`, `scope: "admin"`, `relationships: [{ type: "HAS_ACCOUNT_SCOPE", direction: "outgoing", targetNodeId: "<account-anchor>" }]` — or whatever adjacency convention the current schema requires (grep `AdminUser` in the codebase for a live example if unsure).
+4. **Write the personal-profile `Person` node.** Call `memory-write` with `labels: ["Person"]`, `properties: { givenName, familyName, email, role: "admin-personal" }`, `scope: "admin"`, `relationships: [{ type: "OWNS", direction: "incoming", targetNodeId: "<AdminUser-elementId>" }]`. The `role: "admin-personal"` property is what the graph-write gate looks for in lieu of a `LocalBusiness`.
+5. **Mark step 9 complete.** Call `onboarding-complete-step` with step 9.
+After step 9 completes in personal/employee mode, tell the user that Maxy is configured for personal use — their employer (if any) is not registered here. If they later become the operator for a business of their own, they can ask Maxy to set up a business profile, which invokes the `business-profile` skill directly.
-If the user declines to complete business-profile now, leave step 9 incomplete. The next session will resume here, and any attempt to write user-domain data will surface `Write blocked (no-admin-user)` or `Write blocked (no-local-business)` via the gate, pulling the agent back into this step.
+If the user declines to bootstrap during step 9 in any mode, leave step 9 incomplete. The next session will resume here, and any attempt to write user-domain data will surface `Write blocked (no-admin-user)` or `Write blocked (no-local-business)` via the gate, pulling the agent back into this step.
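
The personal/employee branch added in this hunk can be sketched as plain data construction. This is a minimal illustration, not the package's code: `splitName`, `adminUserWrite` and `personWrite` are hypothetical helper names, and `accountAnchorId` / `adminUserElementId` stand in for the `<account-anchor>` and `<AdminUser-elementId>` placeholders in the skill text. The payload fields are copied from steps 2 to 4; the skill itself notes that the relationship convention may differ in the live schema.

```javascript
// Hypothetical helpers; only the payload fields come from the skill text.

// Step 2: first whitespace-separated token is givenName, the rest is familyName.
function splitName(name) {
  const tokens = name.trim().split(/\s+/).filter(Boolean);
  return {
    givenName: tokens[0] ?? "",
    familyName: tokens.slice(1).join(" "), // empty string for a single token
  };
}

// Step 3: arguments for the memory-write call that creates the AdminUser node.
function adminUserWrite({ userId, name, accountAnchorId }) {
  return {
    labels: ["AdminUser"],
    properties: { userId, name },
    scope: "admin",
    relationships: [
      { type: "HAS_ACCOUNT_SCOPE", direction: "outgoing", targetNodeId: accountAnchorId },
    ],
  };
}

// Step 4: arguments for the memory-write call that creates the Person node.
// role "admin-personal" is the property the skill says the gate looks for.
function personWrite({ name, email, adminUserElementId }) {
  const { givenName, familyName } = splitName(name);
  return {
    labels: ["Person"],
    properties: { givenName, familyName, email, role: "admin-personal" },
    scope: "admin",
    relationships: [
      { type: "OWNS", direction: "incoming", targetNodeId: adminUserElementId },
    ],
  };
}
```

The two builders are kept separate because the `Person` write needs the element ID returned by the `AdminUser` write, so the calls cannot be prepared in a single pass.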

package/payload/platform/plugins/docs/references/adherence.md CHANGED

@@ -93,6 +93,6 @@ The constraint is computed once per turn at the top of `invokeAgent` and frozen
 ## Limits and deferrals
-v1 covers the admin agent only. Specialist subagents (`personal-assistant`, `project-manager`, `research-assistant`, `content-producer`) do not receive their own ledger injection yet — their `.md` templates load via `--plugin-dir` and have no TS-side assembly site. Follow-up task filed.
+v1 covers the admin agent only. Specialist subagents (`personal-assistant`, `project-manager`, `research-assistant`, `content-producer`, `database-operator`) do not receive their own ledger injection yet — their `.md` templates load via `--plugin-dir` and have no TS-side assembly site. Follow-up task filed.
 No cross-agent rule inheritance, no user-visible correction-ack signal, no blocking-critic retry loop in v1 — each is a separate follow-up task. See [`.docs/agents.md`](../../../../.docs/agents.md) § Adherence Fidelity for the full deferral list with task numbers.

package/payload/platform/plugins/memory/PLUGIN.md CHANGED

@@ -10,6 +10,7 @@ tools:
   - memory-ingest
   - memory-ingest-extract
   - memory-ingest-web
+  - memory-classify
   - memory-find-candidates
   - memory-delete
   - memory-restore
@@ -32,6 +33,7 @@ hidden:
   - session-compact-status
 skills:
   - skills/conversational-memory/SKILL.md
+  - skills/document-ingest/SKILL.md
 always: true
 embed: false
 ---
@@ -89,32 +91,39 @@ Restricted fields (`accountId`, `embedding`, `profileVersion`) cannot be set via
 ## Schema References
-Before any structured write, load `references/schema-base.md` via `plugin-read`. This defines universal node types, property naming rules, and relationship patterns. If the `LocalBusiness` node has a `businessType` property, also load the matching vertical schema (`references/schema-{businessType}.md`) — it extends the base with vertical-specific types. Confirm which schemas were consulted before writing.
+Before any structured write, load `references/schema-base.md` via `plugin-read`. This defines property naming rules, required-property groups for documented types, and relationship patterns. If the `LocalBusiness` node has a `businessType` property, also load the matching vertical schema (`references/schema-{businessType}.md`) — it extends the base with vertical-specific types. Confirm which schemas were consulted before writing.
+**Validation surface (Task 736).** `memory-write` validates labels against `db.labels()` ∪ `schema.cypher` declarations — not against this markdown. A label is recognised if it appears in either set. The markdown defines property *shape* (required-property groups, naming rules) for documented labels only; recognised-but-undocumented labels (e.g. `LocalBusiness`, `AdminUser`, `KnowledgeDocument`) accept any property shape and emit `[schema-validator] markdown-undocumented label=<X>` so the doc gap is visible to operators. If `memory-write` rejects a label as unknown, the rejection lists both source sets — the agent can call `maxy-graph-get_neo4j_schema` to refresh its view.
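
The recognition rule described in this added paragraph can be sketched as a set-membership check. This is an illustration under stated assumptions, not the validator shipped in the package: `checkLabel` and the three set names are hypothetical, and only the union rule and the `markdown-undocumented` log line format come from the text.

```javascript
// Hypothetical sketch of the label check.
// liveLabels     ~ result of db.labels()
// declaredLabels ~ labels declared in schema.cypher
// markdownLabels ~ labels documented in the markdown schema
function checkLabel(label, { liveLabels, declaredLabels, markdownLabels }) {
  // A label is recognised if it appears in either source set.
  const recognised = liveLabels.has(label) || declaredLabels.has(label);
  if (!recognised) {
    // The rejection lists both source sets so the caller can see what was consulted.
    return {
      ok: false,
      error: `Unknown label "${label}"`,
      sources: {
        live: [...liveLabels].sort(),
        declared: [...declaredLabels].sort(),
      },
    };
  }
  // Recognised but undocumented labels pass with any property shape,
  // and a log line makes the documentation gap visible.
  const documented = markdownLabels.has(label);
  return {
    ok: true,
    documented,
    log: documented ? null : `[schema-validator] markdown-undocumented label=${label}`,
  };
}
```

Property-shape checks (required-property groups, naming rules) would apply only when `documented` is true; they are left out of this sketch.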
 ## Document Ingestion
-### Scope
+Document ingestion of any kind — PDFs, text, transcripts, web pages, single files — routes to the `database-operator` specialist, which loads the universal `document-ingest` skill at `skills/document-ingest/SKILL.md`. The admin agent never calls `memory-ingest` directly; it dispatches with the document path, the document subject (the anchor node), and the visibility scope.
-Before calling `memory-ingest`, ask the user what visibility scope the document should have. Do not assume a default. Present the options:
+### Pipeline (Task 737)
-- **public** — visible to public agents and the admin. Use for business knowledge that customers or visitors should be able to ask about (product info, services, FAQs, policies).
-- **shared** — visible to all agents on the account but not surfaced to unauthenticated public visitors. Use for internal operational knowledge that multiple agents need.
-- **admin** — visible only to the admin agent. Use for sensitive or internal-only content (contracts, credentials, internal processes).
+The skill drives a three-tool pipeline:
-If the user's intent is unambiguous from context — e.g., "save this for the sales agent" implies public, "this is just for my reference" implies admin — confirm the inferred scope rather than asking. When delegating to the content-producer, include the confirmed scope in the task description.
+1. **`memory-ingest-extract`** — pulls text from PDF/markdown/plain-text and caches it under the `attachmentId`. No chunking — the chunker has moved upstream into LLM-driven section classification.
+2. **`memory-classify`** — calls Claude Haiku with the loaded ontology and the cached text; returns typed sections (`{kind, title, body, properties, anchorEdge, related}`). Every returned `kind` is verified against the live ontology label set; invalid labels become `UNMAPPED` so a hallucination never reaches the writer.
+3. **`memory-ingest`** — writes typed graph nodes (Position, Service, Credential, etc.) anchored to `UserProfile` / `LocalBusiness` / `Person` / `Organization` via natural ontology edges, plus `(KnowledgeDocument)-[:REFERENCES]->(typed)` links. UNMAPPED sections become generic `:Section` nodes hanging off the document via `HAS_SECTION` (legacy fallback) so free-form prose retrieval still works.
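
The guard in step 2, where any `kind` outside the ontology becomes `UNMAPPED`, can be sketched as below. The function names are hypothetical and the real check lives server-side in `memory-classify`; this sketch only shows the rule stated in the list, plus the typed/untyped split that step 3 relies on.

```javascript
// Hypothetical sketch: rewrite unrecognised kinds to "UNMAPPED".
function normaliseKinds(sections, ontologyLabels) {
  return sections.map((section) =>
    ontologyLabels.has(section.kind) ? section : { ...section, kind: "UNMAPPED" }
  );
}

// Split sections into those written as typed nodes and those that
// fall back to generic :Section nodes.
function partitionSections(sections) {
  const typed = [];
  const unmapped = [];
  for (const section of sections) {
    (section.kind === "UNMAPPED" ? unmapped : typed).push(section);
  }
  return { typed, unmapped };
}
```

Because the rewrite happens before the writer sees the sections, a hallucinated label can at worst produce a generic section, never a node with an unknown label.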
-### Keywords
+### Scope
+`memory-ingest` requires a `scope` value on every call. The admin agent confirms the scope with the operator before dispatching:
-After a successful `memory-ingest` call, present the ingestion results to the user before moving on:
+- **public** — visible to public agents and the admin. Use for business knowledge that customers or visitors should be able to ask about.
+- **shared** — visible to all agents on the account but not surfaced to unauthenticated public visitors.
+- **admin** — visible only to the admin agent. Use for sensitive content (contracts, credentials, internal processes).
-1. **Display the document summary** from the `documentSummary` field in the tool response.
-2. **Display the extracted keywords** from the `keywords` array in the tool response (omit if absent).
-3. **Prompt for user keywords** — e.g., "Would you like to add any of your own keywords or tags to file this under?"
+If the user's intent is unambiguous from context — e.g., "save this for the sales agent" implies public, "this is just for my reference" implies admin — confirm the inferred scope rather than asking.
-If the user provides keywords, normalize them (lowercase, trim), merge with the existing `keywords` array (deduplicate), and call `memory-update` on the `documentNodeId` with the merged array as the `keywords` property. User-supplied keywords appear first in the merged array.
+### Keywords
-If the user declines or says nothing further, the flow is complete.
+The classifier extracts topic keywords as `documentKeywords`; the user can supply their own as `userKeywords`. Both are merged additively (lowercase, trimmed, deduplicated) and stored on the `KnowledgeDocument.keywords` array. User-supplied keywords appear first.
-Always generate LLM-extracted keywords as `keywords` on the `memory-ingest` call. The two sources are complementary — LLM keywords capture topic signals; user keywords define the user's intended classification.
+After a successful `memory-ingest` call, the dispatching admin agent should:
+1. Display the document summary and the extracted keywords to the user.
+2. Prompt for user keywords if none were supplied with the brief.
+3. If the user provides additional keywords post-ingest, call `memory-update` on the `documentNodeId` with the merged array.
-Keywords support user-defined collections via naming convention (e.g., `["reports", "reports/quarterly", "reports/quarterly/q1-2026"]`). When the user describes a hierarchical filing intent, build the full keyword path.
+Keywords support user-defined collections via naming convention (e.g., `["reports", "reports/quarterly", "reports/quarterly/q1-2026"]`).
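
The merge rule in this section (lowercase, trimmed, deduplicated, user keywords first) and the collection naming convention can be sketched as follows. Both function names are hypothetical; the behaviour follows the text above, and the handling of non-string or empty entries is an assumption.

```javascript
// Hypothetical sketch of the additive keyword merge.
// User-supplied keywords come first; duplicates after normalisation are dropped.
function mergeKeywords(userKeywords = [], documentKeywords = []) {
  const seen = new Set();
  const merged = [];
  for (const raw of [...userKeywords, ...documentKeywords]) {
    const keyword = String(raw).trim().toLowerCase();
    if (keyword && !seen.has(keyword)) {
      seen.add(keyword);
      merged.push(keyword);
    }
  }
  return merged;
}

// Expand a hierarchical collection into the full keyword path,
// e.g. "reports/quarterly/q1-2026" yields one keyword per prefix.
function expandCollectionPath(path) {
  const parts = path.split("/").map((p) => p.trim()).filter(Boolean);
  return parts.map((_, i) => parts.slice(0, i + 1).join("/"));
}

console.log(mergeKeywords([" Reports ", "Q1"], ["reports", "finance"]));
// prints [ 'reports', 'q1', 'finance' ]
console.log(expandCollectionPath("reports/quarterly/q1-2026"));
// prints [ 'reports', 'reports/quarterly', 'reports/quarterly/q1-2026' ]
```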

package/payload/platform/plugins/memory/mcp/dist/index.js CHANGED

@@ -8,10 +8,12 @@ import { memorySearch } from "./tools/memory-search.js";
 import { memoryRank } from "./tools/memory-rank.js";
 import { memoryWrite } from "./tools/memory-write.js";
 import { loadSchema } from "./lib/schema-loader.js";
+import { buildLiveSchemaSource, defaultSchemaCypherPath, } from "./lib/live-schema-source.js";
 import { memoryReindex } from "./tools/memory-reindex.js";
 import { memoryIngestExtract } from "./tools/memory-ingest-extract.js";
 import { memoryIngest } from "./tools/memory-ingest.js";
 import { memoryIngestWeb } from "./tools/memory-ingest-web.js";
+import { memoryClassify } from "./tools/memory-classify.js";
 import { memoryUpdate } from "./tools/memory-update.js";
 import { memoryDelete } from "./tools/memory-delete.js";
 import { memoryFindCandidates } from "./tools/memory-find-candidates.js";
@@ -40,11 +42,59 @@ const accountId = process.env.ACCOUNT_ID;
 if (!accountId) {
     throw new Error("ACCOUNT_ID environment variable is required");
 }
-// Load the schema contract from platform/plugins/memory/references/schema-*.md
-// once at startup. Every memory-write call validates against this. If loading
-// fails we throw — a memory server running with a broken schema is worse than
-// one that fails to start loudly.
+// Load the markdown schema sidecar once at startup. Every memory-write call
+// reads its required-property and synonym maps. If loading fails we throw —
+// a memory server running with a broken schema is worse than one that fails
+// to start loudly.
 const schema = loadSchema();
+// Live label source (Task 736). The memory MCP runs as a separate stdio
+// process from graph-mcp, so it owns its own SchemaCache instance backed by
+// the existing Neo4j driver. The cache refreshes every 60s and tap-fires
+// drift detection from its emit hook. Boot does NOT block on the first
+// refresh — `defaultDeclaredOnly` is sufficient until live joins, since
+// declared labels cover the bootstrap labels (LocalBusiness etc.).
+const schemaCypherPath = defaultSchemaCypherPath();
+const liveSchemaRuntime = buildLiveSchemaSource({
+    schemaCypherPath,
+    markdownLabels: schema.markdownLabels.keys(),
+    fetcher: {
+        async labels() {
+            const session = getSession();
+            try {
+                const result = await session.run("CALL db.labels() YIELD label RETURN label");
+                return result.records
+                    .map((r) => r.get("label"))
+                    .filter((v) => typeof v === "string");
+            }
+            finally {
+                await session.close();
+            }
+        },
+        async relationshipTypes() {
+            const session = getSession();
+            try {
+                const result = await session.run("CALL db.relationshipTypes() YIELD relationshipType RETURN relationshipType");
+                return result.records
+                    .map((r) => r.get("relationshipType"))
+                    .filter((v) => typeof v === "string");
+            }
+            finally {
+                await session.close();
+            }
+        },
+    },
+});
+// Boot drift fires once the first refresh resolves. We do not await — the
+// validator handles boot races by accepting declared labels (which include
+// the bootstrap set) until live joins. Drift logs land in server.log as
+// soon as the snapshot lands, with no impact on serving requests.
+liveSchemaRuntime.ready.then(() => {
+    process.stderr.write(`[schema-validator] using-cache cacheAgeMs=0 ready=${liveSchemaRuntime.source.liveReady()}\n`);
+});
+const validator = {
+    schema,
+    liveSource: liveSchemaRuntime.source,
+};
 const userId = process.env.USER_ID; // Optional — present for admin sessions with users.json auth
 // Scope filtering: comma-separated list of allowed scopes (e.g. "public,shared").
 // When set, memory-search only returns nodes whose scope is in this list.
@@ -382,7 +432,7 @@ if (!readOnly) {
                     session: sessionId,
                     tool: "memory-write",
                 },
-                schema,
+                validator,
             });
             return {
                 content: [
@@ -508,7 +558,7 @@ if (!readOnly) {
                     session: sessionId,
                     tool: "session-compact",
                 },
-                schema,
+                validator,
             });
             return {
                 content: [{ type: "text", text: `Session summary saved (ID: ${result.nodeId})` }],
@@ -546,10 +596,11 @@ if (!readOnly) {
             };
         }
     });
-    server.tool("memory-ingest-extract", "Extract text from an uploaded file and chunk it server-side. Supports PDF (via pdftotext), plain text, and markdown. " +
+    server.tool("memory-ingest-extract", "Extract text from an uploaded file and cache it server-side for memory-classify and memory-ingest (Task 737). " +
+        "Supports PDF (via pdftotext), plain text, and markdown. " +
         "Rejects CSV (structured data — use memory-write instead) and images (metadata-only). " +
-        "Returns section titles and chunk previews. Full chunk content is cached server-side — " +
-        "the calling agent generates summaries from the previews, then calls memory-ingest with only summaries (no raw content needed).", {
+        "Returns file metadata + a short preview; the full text lives in the in-process cache keyed by attachmentId. " +
+        "Chunking has moved upstream into memory-classify (LLM-driven section classification) — this tool no longer chunks.", {
         storagePath: z.string().describe("Absolute path to the stored file on disk"),
         filename: z.string().describe("Original filename as uploaded"),
         mimeType: z.string().describe("MIME type of the file"),
@@ -574,42 +625,96 @@ if (!readOnly) {
             };
         }
     });
-    server.tool("memory-ingest", "Ingest a knowledge document into the graph as a three-level hierarchy: KnowledgeDocument → Section → Chunk. " +
-        "Requires a prior call to memory-ingest-extract (which caches the raw chunk content server-side). " +
-        "The calling agent provides only summaries — no raw content needed. The tool retrieves cached content, " +
-        "pairs it with summaries, embeds everything, and writes to the graph. " +
-        "Re-ingesting with the same attachmentId replaces the existing hierarchy — safe to call on already-ingested documents.", {
+    server.tool("memory-classify", "Classify an unstructured document into typed sections via Claude Haiku (Task 737). " +
+        "Reads the cached text written by memory-ingest-extract (same attachmentId), runs the classifier against " +
+        "the live ontology label set, and returns a JSON structure ready for memory-ingest. Every returned `kind` is " +
+        "verified server-side against the ontology — invalid labels become `UNMAPPED` so the writer falls back to " +
+        "generic Section nodes with a logged ontology gap. When Haiku is unavailable, returns `{kind: \"fallback\"}` " +
+        "with a reason — the skill should treat the whole document as one UNMAPPED Section.", {
+        attachmentId: z.string().describe("UUID of the file attachment — must match a prior memory-ingest-extract call"),
+        anchorDescription: z.string().describe("Short human sentence describing the document subject the classifier should anchor sections to. " +
+            "Examples: 'subject = UserProfile (the account owner)'; 'subject = LocalBusiness (the operator's business)'; " +
+            "'subject = Person {name: \"Jane Smith\"} (a third party)'."),
+    }, async ({ attachmentId, anchorDescription }) => {
+        try {
+            const result = await memoryClassify({
+                accountId,
+                attachmentId,
+                anchorDescription,
+                liveSchemaSource: liveSchemaRuntime.source,
+            });
+            return {
+                content: [{
+                        type: "text",
+                        text: JSON.stringify(result),
+                    }],
+            };
+        }
+        catch (err) {
+            return {
+                content: [{
+                        type: "text",
+                        text: `memory-classify failed: ${err instanceof Error ? err.message : String(err)}`,
+                    }],
+                isError: true,
+            };
+        }
+    });
+    server.tool("memory-ingest", "Write a classified document into the graph as typed nodes anchored to the document subject (Task 737). " +
+        "Requires prior calls to memory-ingest-extract (text cache) and memory-classify (typed structure). " +
+        "Each section becomes either a typed graph node (Position, Service, Credential, etc.) anchored to the named " +
+        "subject via the natural ontology edge plus a (KnowledgeDocument)-[:REFERENCES]->(typed) link, or — for " +
+        "UNMAPPED sections — a generic :Section node hanging off the document via HAS_SECTION (legacy fallback). " +
+        "Re-ingesting with the same attachmentId replaces the document's typed and untyped children — safe to call " +
+        "on already-ingested documents. Shared related entities (Organizations, Persons referenced by typed nodes) " +
+        "are MERGEd by identifying property and never deleted on re-ingest.", {
         attachmentId: z.string().describe("UUID of the file attachment — must match a prior memory-ingest-extract call"),
-        documentSummary: z.string().describe("LLM-generated summary of the entire document (1-3 sentences)"),
+        documentSummary: z.string().describe("Classifier-produced summary of the entire document (1-3 sentences)"),
+        anchorNodeId: z.string().describe("Element ID of the anchor node (UserProfile/LocalBusiness/Person/Organization). The document subject " +
+            "the operator confirmed during the document-ingest skill's first step."),
+        anchorLabel: z.string().describe("Primary label of the anchor node (e.g. 'UserProfile', 'LocalBusiness'). Used in the per-section MATCH " +
+            "for anchor-edge creation."),
         sections: z.array(z.object({
-            title: z.string().describe("Section title (must match the title from memory-ingest-extract)"),
-            summary: z.string().describe("LLM-generated summary of this section (1 sentence)"),
-            chunkSummaries: z.array(z.string().describe("LLM-generated summary of one chunk (1 sentence)"))
-                .describe("One summary per chunk in this section, in order"),
-        })).describe("One entry per section from memory-ingest-extract, in order"),
+            kind: z.string().describe("Ontology label or the literal 'UNMAPPED'"),
+            title: z.string(),
+            body: z.string(),
+            properties: z.record(z.string(), z.unknown()),
+            anchorEdge: z.object({
+                type: z.string(),
+                direction: z.enum(["from-anchor", "to-anchor"]),
+                properties: z.record(z.string(), z.unknown()).optional(),
+            }).nullable(),
+            related: z.array(z.object({
+                kind: z.string(),
+                properties: z.record(z.string(), z.unknown()),
+                edge: z.object({
+                    type: z.string(),
+                    direction: z.enum(["outgoing", "incoming"]),
+                    properties: z.record(z.string(), z.unknown()).optional(),
+                }),
+                merge: z.boolean().optional(),
+            })).optional(),
+        })).describe("Typed sections as returned by memory-classify"),
         scope: z.string().describe("Visibility scope for all created nodes — required. Valid values: 'public', 'shared', 'admin', or 'user:{identifier}'."),
-        entities: z.array(z.object({
-            name: z.string().describe("Entity name (for logging)"),
-            nodeId: z.string().describe("Neo4j element ID of the entity node"),
-        })).optional().describe("Entities referenced in the document to link via REFERENCES relationships"),
         sourceUrl: z.string().optional().describe("Original URL for web-sourced documents. Omit for file uploads."),
-        sourceType: z.string().optional().describe("Provenance: 'upload' or 'web'. Omit for file uploads (defaults to no sourceType)."),
-        keywords: z.array(z.string()).optional().describe("LLM-extracted topic keywords for the document. Normalized (lowercase, trimmed, deduplicated) before storage."),
-        userKeywords: z.array(z.string()).optional().describe("Keywords explicitly provided by the user. Always included verbatim (after normalization). " +
-            "Use for user-defined collections and classification (e.g. 'file this under quarterly reports')."),
-    }, async ({ attachmentId, documentSummary, sections, scope, entities, sourceUrl, sourceType, keywords, userKeywords }) => {
+        sourceType: z.string().optional().describe("Provenance: 'upload' or 'web'."),
+        documentKeywords: z.array(z.string()).optional().describe("Classifier-extracted topic keywords."),
+        userKeywords: z.array(z.string()).optional().describe("Keywords explicitly provided by the user. Merged additively with documentKeywords."),
+    }, async ({ attachmentId, documentSummary, anchorNodeId, anchorLabel, sections, scope, sourceUrl, sourceType, documentKeywords, userKeywords }) => {
         try {
             const result = await memoryIngest({
                 accountId,
                 attachmentId,
                 documentSummary,
+                anchorNodeId,
+                anchorLabel,
                 sections,
                 scope,
-                entities,
                 sourceUrl,
                 sourceType,
-                keywords,
+                documentKeywords,
                 userKeywords,
+                sessionId,
             });
             return {
                 content: [{
@@ -617,6 +722,8 @@ if (!readOnly) {
                         text: JSON.stringify({
                             documentNodeId: result.documentNodeId,
                             sectionCount: result.sectionCount,
+                            typedCount: result.typedCount,
+                            unmappedCount: result.unmappedCount,
                             chunkCount: result.chunkCount,
                             entityLinks: result.entityLinks,
                             documentSummary: result.documentSummary,
@@ -635,10 +742,10 @@ if (!readOnly) {
             };
         }
     });
-    server.tool("memory-ingest-web", "Ingest web content into the knowledge graph. Accepts a URL and its pre-fetched readable content " +
-        "(the agent calls WebFetch first, then passes the text here). " +
-        "Extracts a title, writes content to a temp file, and delegates to the existing extraction + chunking pipeline. " +
-        "Returns section/chunk previews for the agent to generate summaries, then call memory-ingest with sourceUrl and sourceType. " +
+    server.tool("memory-ingest-web", "Adapter for web-content ingestion (Task 737). Accepts a URL and its pre-fetched readable content " +
+        "(the agent calls WebFetch first, then passes the text here), writes content to a temp file, and delegates " +
+        "to memory-ingest-extract — caching the text under a freshly-generated attachmentId. The skill then drives " +
+        "memory-classify and memory-ingest using that attachmentId, exactly as for a file upload. " +
         "If the URL was previously ingested, returns the existing document info so the agent can inform the user.", {
         url: z.string().describe("The web page URL being ingested"),
         content: z.string().describe("The readable text content of the web page (pre-fetched by the agent via WebFetch)"),
@@ -879,12 +986,12 @@ if (!readOnly) {
             const result = await memoryEditAttachment({ accountId, attachmentId, content });
             let text = `Edited: ${result.filename} (${result.oldSizeBytes} → ${result.newSizeBytes} bytes)`;
             if (result.cachePopulated) {
-                text += `\nExtract cache populated: ${result.extractSections} sections, ${result.extractChunks} chunks.`;
-                text += `\nCall memory-ingest with updated summaries to sync the knowledge graph.`;
+                text += `\nExtract cache populated: ${result.extractTextLength} chars.`;
+                text += `\nCall memory-classify and memory-ingest to re-sync the knowledge graph.`;
             }
             else {
                 text += `\nWARNING: File updated on disk but extract cache could not be populated.`;
-                text += `\nCall memory-ingest-extract manually, then memory-ingest.`;
+                text += `\nCall memory-ingest-extract manually, then memory-classify, then memory-ingest.`;
             }
             return { content: [{ type: "text", text }] };
         }
@@ -1466,6 +1573,7 @@ server.tool("conversation-search", "Search conversation history using semantic v
 });
 // Cleanup on exit
 process.on("SIGINT", async () => {
+    liveSchemaRuntime.cache.stop();
     await closeDriver();
     process.exit(0);
 });
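
For readers checking a payload against the new `memory-ingest` `sections` schema in this diff, the shape can be mirrored in dependency-free JavaScript. This is a sketch, not the package's validation: the real tool uses the zod schema shown in the diff, and `sectionProblems` and `isPlainObject` are hypothetical names. Field names and the two direction enums are taken from that schema.

```javascript
const ANCHOR_DIRECTIONS = ["from-anchor", "to-anchor"];
const RELATED_DIRECTIONS = ["outgoing", "incoming"];

function isPlainObject(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a list of human-readable problems; an empty list means the
// section matches the shape in the diff.
function sectionProblems(section) {
  const problems = [];
  for (const field of ["kind", "title", "body"]) {
    if (typeof section[field] !== "string") problems.push(`${field} must be a string`);
  }
  if (!isPlainObject(section.properties)) problems.push("properties must be an object");

  // anchorEdge is nullable but not optional: it must be null or an object.
  if (section.anchorEdge !== null) {
    if (!isPlainObject(section.anchorEdge)) {
      problems.push("anchorEdge must be an object or null");
    } else {
      if (typeof section.anchorEdge.type !== "string") problems.push("anchorEdge.type must be a string");
      if (!ANCHOR_DIRECTIONS.includes(section.anchorEdge.direction)) {
        problems.push("anchorEdge.direction must be from-anchor or to-anchor");
      }
    }
  }

  // related is optional; when present it must be an array of related entities.
  if (section.related !== undefined) {
    if (!Array.isArray(section.related)) {
      problems.push("related must be an array");
    } else {
      section.related.forEach((rel, i) => {
        if (!isPlainObject(rel)) return problems.push(`related[${i}] must be an object`);
        if (typeof rel.kind !== "string") problems.push(`related[${i}].kind must be a string`);
        if (!isPlainObject(rel.properties)) problems.push(`related[${i}].properties must be an object`);
        if (!isPlainObject(rel.edge) || typeof rel.edge.type !== "string") {
          problems.push(`related[${i}].edge.type must be a string`);
        } else if (!RELATED_DIRECTIONS.includes(rel.edge.direction)) {
          problems.push(`related[${i}].edge.direction must be outgoing or incoming`);
        }
        if (rel.merge !== undefined && typeof rel.merge !== "boolean") {
          problems.push(`related[${i}].merge must be a boolean`);
        }
      });
    }
  }
  return problems;
}
```

Note that the anchor edge and the related-entity edge use different direction vocabularies (`from-anchor`/`to-anchor` versus `outgoing`/`incoming`), which is an easy detail to miss when building payloads by hand.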