clawmem 0.7.1 → 0.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/AGENTS.md CHANGED
@@ -368,7 +368,7 @@ Pin, snooze, and forget are **manual MCP tools** — not automated. The agent sh
  - Do NOT pin everything — pin is for persistent high-priority items, not temporary boosting.
  - Do NOT forget memories to "clean up" — let confidence decay and contradiction detection handle it naturally.
  - Do NOT run `build_graphs` after every reindex — A-MEM creates per-doc links automatically. Only after bulk ingestion or when `intent_search` returns weak graph results.
- - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command (same category as `update`/`reindex`). Suggest it to the user when they mention old conversation exports, but let them run it. Bulk import has disk/embedding cost implications that need user consent.
+ - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command (same category as `update`/`reindex`). Suggest it to the user when they mention old conversation exports, but let them run it. Bulk import has disk/embedding cost implications that need user consent. **v0.7.2 adds `--synthesize`** — an opt-in post-import LLM fact extraction pass that turns raw conversation dumps into searchable structured decisions / preferences / milestones / problems with cross-fact relations. Off by default; also requires user consent because it drives additional LLM calls (one per conversation doc). Suggest both together when the user wants to get real value out of old chat exports, not just the raw dumps.
  - Do NOT use `diary_write` in Claude Code — hooks (`decision-extractor`, `handoff-generator`) capture this automatically. Diary is for non-hooked environments only (Hermes, Gemini, plain MCP clients).
  - Do NOT use `kg_query` for causal "why" questions — use `intent_search` or `memory_retrieve`. `kg_query` returns structured entity facts (SPO triples), not reasoning chains.
 
@@ -505,6 +505,7 @@ The `memory_relations` table is populated by multiple independent sources:
  | Entity co-occurrence graph | entity | A-MEM enrichment (indexing) | LLM entity extraction → quality filters (title/length/blocklist/location validation) → type-agnostic canonical resolution within compatibility buckets (person, org, location, tech=project/service/tool/concept) → `entity_mentions` + `entity_cooccurrences` tables. Entity edges use IDF-based specificity scoring. Feeds ENTITY intent queries and MPFP `[entity, semantic]` patterns. |
  | `consolidated_observations` | supporting, contradicts | Consolidation worker (background) | 3-tier consolidation: facts → observations → mental models. Observations track `proof_count`, `trend` (STABLE/STRENGTHENING/WEAKENING/STALE), and source links. **v0.7.1 safety gates:** name-aware merge gate uses entity-anchor comparison + 3-gram cosine similarity (dual-threshold `CLAWMEM_MERGE_SCORE_NORMAL`=0.93 / `_STRICT`=0.98) to prevent cross-entity merges ("Alice decided X" merging into "Bob decided X"). Merge-time contradiction gate runs deterministic heuristic + LLM check; blocked merges route to `CLAWMEM_CONTRADICTION_POLICY`=`link` (new row + `contradicts` edge, default) or `supersede` (old row `status='inactive'`, new row replaces). |
  | Deductive synthesis | supporting, contradicts | Consolidation worker Phase 3 (background, every ~15 min) | Combines 2-3 related recent observations (decision/preference/milestone/problem, last 7 days) into `content_type='deductive'` documents with `source_doc_ids` provenance. First-class searchable docs with ∞ half-life. **v0.7.1 anti-contamination wrapper:** every draft passes through deterministic pre-checks (empty conclusion, invalid source_indices, pool-only entity contamination via `entity_mentions` or lexical fallback) + LLM validator (fail-open with `validatorFallbackAccepts` counter) + dedupe. Per-reason rejection stats exposed via `DeductiveSynthesisStats` (contaminationRejects, invalidIndexRejects, unsupportedRejects, emptyRejects, dedupSkipped, validatorFallbackAccepts). Contradictory dedupe matches are linked via `contradicts` edges. |
+ | Conversation synthesis (`runConversationSynthesis`) | semantic, supporting, contradicts, causal, temporal, entity | `clawmem mine <dir> --synthesize` (opt-in, post-index) | **v0.7.2.** Two-pass LLM pipeline over freshly imported `content_type='conversation'` docs. Pass 1 extracts structured facts (decision/preference/milestone/problem) via `extractFactsFromConversation`, saves each via dedup-aware `saveMemory`, populates a local Set-based alias map. Pass 2 resolves cross-fact links against the local map first (fails closed on ambiguity — multi-candidate titles return unresolved), falls back to collection-scoped SQL lookup with LIMIT 2 ambiguity detection. Relations upsert via `ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight)` — idempotent on equal-weight reruns but monotonically accepts stronger later evidence. Synthesized fact paths are a pure function of `(sourceDocId, slug(title), short sha256(normalizedTitle))`, so reruns update in place instead of creating parallel rows. Counters split into `llmFailures` (null/thrown/invalid-JSON) vs `docsWithNoFacts` (valid-empty extraction). All failures non-fatal — never rolls back the mine import. |
 
  **Edge collision:** Both `generateMemoryLinks()` and `buildSemanticGraph()` insert `relation_type='semantic'`. PK is `(source_id, target_id, relation_type)` — first writer wins.
 
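The weight-monotonic upsert described in the new row is easiest to see in isolation. A minimal standalone sketch with an in-memory bun:sqlite database, assuming a schema reduced to the conflict-relevant columns (the real `memory_relations` also carries `metadata` and `created_at`):

```ts
import { Database } from "bun:sqlite";

const db = new Database(":memory:");
db.run(`CREATE TABLE memory_relations (
  source_id INTEGER, target_id INTEGER, relation_type TEXT, weight REAL,
  PRIMARY KEY (source_id, target_id, relation_type)
)`);

// Same conflict clause the synthesis pipeline uses.
const upsert = db.prepare(`
  INSERT INTO memory_relations (source_id, target_id, relation_type, weight)
  VALUES (?, ?, ?, ?)
  ON CONFLICT(source_id, target_id, relation_type)
  DO UPDATE SET weight = MAX(weight, excluded.weight)`);

upsert.run(1, 2, "causal", 0.6); // first write: weight 0.6
upsert.run(1, 2, "causal", 0.6); // equal-weight rerun: no change (idempotent)
upsert.run(1, 2, "causal", 0.8); // stronger later evidence: weight rises to 0.8
upsert.run(1, 2, "causal", 0.5); // weaker rerun: weight stays 0.8 (no double-counting)

const row = db
  .prepare(`SELECT weight FROM memory_relations WHERE source_id = 1 AND target_id = 2`)
  .get() as { weight: number };
console.log(row.weight); // 0.8
```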
package/CLAUDE.md CHANGED
@@ -368,7 +368,7 @@ Pin, snooze, and forget are **manual MCP tools** — not automated. The agent sh
  - Do NOT pin everything — pin is for persistent high-priority items, not temporary boosting.
  - Do NOT forget memories to "clean up" — let confidence decay and contradiction detection handle it naturally.
  - Do NOT run `build_graphs` after every reindex — A-MEM creates per-doc links automatically. Only after bulk ingestion or when `intent_search` returns weak graph results.
- - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command (same category as `update`/`reindex`). Suggest it to the user when they mention old conversation exports, but let them run it. Bulk import has disk/embedding cost implications that need user consent.
+ - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command (same category as `update`/`reindex`). Suggest it to the user when they mention old conversation exports, but let them run it. Bulk import has disk/embedding cost implications that need user consent. **v0.7.2 adds `--synthesize`** — an opt-in post-import LLM fact extraction pass that turns raw conversation dumps into searchable structured decisions / preferences / milestones / problems with cross-fact relations. Off by default; also requires user consent because it drives additional LLM calls (one per conversation doc). Suggest both together when the user wants to get real value out of old chat exports, not just the raw dumps.
  - Do NOT use `diary_write` in Claude Code — hooks (`decision-extractor`, `handoff-generator`) capture this automatically. Diary is for non-hooked environments only (Hermes, Gemini, plain MCP clients).
  - Do NOT use `kg_query` for causal "why" questions — use `intent_search` or `memory_retrieve`. `kg_query` returns structured entity facts (SPO triples), not reasoning chains.
 
@@ -505,6 +505,7 @@ The `memory_relations` table is populated by multiple independent sources:
  | Entity co-occurrence graph | entity | A-MEM enrichment (indexing) | LLM entity extraction → quality filters (title/length/blocklist/location validation) → type-agnostic canonical resolution within compatibility buckets (person, org, location, tech=project/service/tool/concept) → `entity_mentions` + `entity_cooccurrences` tables. Entity edges use IDF-based specificity scoring. Feeds ENTITY intent queries and MPFP `[entity, semantic]` patterns. |
  | `consolidated_observations` | supporting, contradicts | Consolidation worker (background) | 3-tier consolidation: facts → observations → mental models. Observations track `proof_count`, `trend` (STABLE/STRENGTHENING/WEAKENING/STALE), and source links. **v0.7.1 safety gates:** name-aware merge gate uses entity-anchor comparison + 3-gram cosine similarity (dual-threshold `CLAWMEM_MERGE_SCORE_NORMAL`=0.93 / `_STRICT`=0.98) to prevent cross-entity merges ("Alice decided X" merging into "Bob decided X"). Merge-time contradiction gate runs deterministic heuristic + LLM check; blocked merges route to `CLAWMEM_CONTRADICTION_POLICY`=`link` (new row + `contradicts` edge, default) or `supersede` (old row `status='inactive'`, new row replaces). |
  | Deductive synthesis | supporting, contradicts | Consolidation worker Phase 3 (background, every ~15 min) | Combines 2-3 related recent observations (decision/preference/milestone/problem, last 7 days) into `content_type='deductive'` documents with `source_doc_ids` provenance. First-class searchable docs with ∞ half-life. **v0.7.1 anti-contamination wrapper:** every draft passes through deterministic pre-checks (empty conclusion, invalid source_indices, pool-only entity contamination via `entity_mentions` or lexical fallback) + LLM validator (fail-open with `validatorFallbackAccepts` counter) + dedupe. Per-reason rejection stats exposed via `DeductiveSynthesisStats` (contaminationRejects, invalidIndexRejects, unsupportedRejects, emptyRejects, dedupSkipped, validatorFallbackAccepts). Contradictory dedupe matches are linked via `contradicts` edges. |
+ | Conversation synthesis (`runConversationSynthesis`) | semantic, supporting, contradicts, causal, temporal, entity | `clawmem mine <dir> --synthesize` (opt-in, post-index) | **v0.7.2.** Two-pass LLM pipeline over freshly imported `content_type='conversation'` docs. Pass 1 extracts structured facts (decision/preference/milestone/problem) via `extractFactsFromConversation`, saves each via dedup-aware `saveMemory`, populates a local Set-based alias map. Pass 2 resolves cross-fact links against the local map first (fails closed on ambiguity — multi-candidate titles return unresolved), falls back to collection-scoped SQL lookup with LIMIT 2 ambiguity detection. Relations upsert via `ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight)` — idempotent on equal-weight reruns but monotonically accepts stronger later evidence. Synthesized fact paths are a pure function of `(sourceDocId, slug(title), short sha256(normalizedTitle))`, so reruns update in place instead of creating parallel rows. Counters split into `llmFailures` (null/thrown/invalid-JSON) vs `docsWithNoFacts` (valid-empty extraction). All failures non-fatal — never rolls back the mine import. |
 
  **Edge collision:** Both `generateMemoryLinks()` and `buildSemanticGraph()` insert `relation_type='semantic'`. PK is `(source_id, target_id, relation_type)` — first writer wins.
 
package/README.md CHANGED
@@ -19,7 +19,7 @@ ClawMem turns your markdown notes, project docs, and research dumps into persist
  - **Surfaces relevant context** on every prompt (context-surfacing hook)
  - **Bootstraps sessions** with your profile, latest handoff, recent decisions, and stale notes
  - **Captures decisions, preferences, milestones, and problems** from session transcripts using a local GGUF observer model
- - **Imports conversation exports** from Claude Code, ChatGPT, Claude.ai, Slack, and plain text via `clawmem mine`
+ - **Imports conversation exports** from Claude Code, ChatGPT, Claude.ai, Slack, and plain text via `clawmem mine`, with optional post-import LLM fact extraction (`--synthesize`) that pulls structured decisions / preferences / milestones / problems and cross-fact links out of otherwise full-text conversation dumps (v0.7.2)
  - **Generates handoffs** at session end so the next session can pick up where you left off
  - **Learns what matters** via a feedback loop that boosts referenced notes and decays unused ones
  - **Guards against prompt injection** in surfaced content
@@ -66,6 +66,20 @@ Five independent safety gates around the consolidation pipeline and context surf
  - **Anti-contamination deductive synthesis** — every Phase 3 draft runs through a three-layer validator: deterministic pre-checks (empty conclusion, invalid source_indices, pool-only entity contamination via `entity_mentions`) + LLM validator (fail-open with `validatorFallbackAccepts` counter) + dedupe. Per-reason rejection stats exposed via `DeductiveSynthesisStats` so Phase 3 yield can be diagnosed without enabling extra logging.
  - **Context instruction + relationship snippets** — `context-surfacing` now always prepends an `<instruction>` block framing the surfaced facts as background knowledge the model already holds, and appends an optional `<relationships>` block listing memory-graph edges where BOTH endpoints are in the surfaced doc set. The relationships block is the first thing dropped when the payload would overflow `CLAWMEM_PROFILE`'s token budget, preserving facts-first behaviour while giving the model graph-level reasoning hooks directly in-prompt.
 
+ ### v0.7.2 Post-Import Conversation Synthesis
+
+ Opt-in LLM pass that runs **after** `clawmem mine` finishes indexing an imported collection. Operates on the freshly imported `content_type='conversation'` documents and extracts structured knowledge facts (decisions / preferences / milestones / problems) plus cross-fact relations, writing each fact as a first-class searchable document alongside the raw conversation exchanges. See [post-import synthesis](docs/concepts/architecture.md#post-import-conversation-synthesis-v072) for the architectural walkthrough.
+
+ - **New CLI flag** — `clawmem mine <dir> --synthesize [--synthesis-max-docs N]`. Off by default. When omitted, existing mine behaviour is byte-identical to v0.7.1.
+ - **Two-pass pipeline** — Pass 1 extracts facts per conversation via the existing LLM, saves each via dedup-aware `saveMemory`, and populates a local alias map. Pass 2 resolves cross-fact links against the local map first, falling back to collection-scoped SQL lookup. Forward references (link to a fact extracted later in the same run) are resolved correctly.
+ - **Idempotent reruns** — synthesized fact paths are a pure function of `(sourceDocId, slug(title), short sha256(normalizedTitle))`, so reruns over the same conversation batch hit the `saveMemory` update branch instead of creating parallel rows. Same-slug collisions are disambiguated by the stable hash suffix, not encounter order.
+ - **Fail-closed link resolution** — when two different facts claim the same normalized title or alias, the resolver treats the link as ambiguous and counts it unresolved. Pre-existing docs with duplicate titles in the collection do not silently bind either.
+ - **Weight-monotonic relation upsert** — `memory_relations` insert uses `ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight)`, which is idempotent on equal-weight reruns but still accepts stronger later evidence without double-counting.
+ - **Non-fatal failure model** — any LLM failure, JSON parse error, saveMemory collision, or relation insert error is counted and logged, never re-thrown. Synthesis failure after `indexCollection` commits does not roll back the mine import.
+ - **Split operator counters** — `llmFailures` counts actual LLM path failures (null, thrown, non-array JSON), while `docsWithNoFacts` counts docs where the LLM responded validly but returned zero structured facts. Previously these were conflated as `nullCalls`.
+
+ Adds +63 tests (46 unit + 5 integration + 12 regression) on top of the v0.7.1 baseline.
+
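For embedders who bypass the CLI, a sketch of invoking the same pass directly. `store` and `llm` are assumed handles here (clawmem.ts builds them from its own store and `getDefaultLlamaCpp()`; construction is not shown in this diff), so they are declared rather than constructed:

```ts
import type { Store } from "./store.ts";
import type { LlamaCpp } from "./llm.ts";
import { runConversationSynthesis } from "./conversation-synthesis.ts";

// Assumed to exist in your embedding context — declared, not constructed.
declare const store: Store;
declare const llm: LlamaCpp;

const result = await runConversationSynthesis(store, llm, {
  collection: "old-chats", // hypothetical collection name
  maxDocs: 50,             // mirrors --synthesis-max-docs 50
  dryRun: true,            // count extractable facts without persisting anything
});

console.log(
  `${result.factsExtracted} fact(s) across ${result.docsScanned} doc(s); ` +
    `${result.llmFailures} LLM failure(s), ${result.docsWithNoFacts} with no facts`,
);
```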
  ## Architecture
 
  <p align="center">
@@ -657,7 +671,7 @@ clawmem collection list List collections
  clawmem collection remove <name>          Remove a collection
 
  clawmem update [--pull] [--embed]         Incremental re-scan
- clawmem mine <dir> [-c name] [--embed]    Import conversation exports (Claude, ChatGPT, Slack)
+ clawmem mine <dir> [-c name] [--embed] [--synthesize]  Import conversation exports (Claude, ChatGPT, Slack); --synthesize runs post-import LLM fact extraction (v0.7.2)
  clawmem embed [-f]                        Generate fragment embeddings
  clawmem reindex [--force]                 Full re-index
  clawmem watch                             File watcher daemon
package/SKILL.md CHANGED
@@ -528,6 +528,7 @@ mcp__clawmem__vsearch(query, collection="name", compact=true) # vector
  | `buildSemanticGraph()` | semantic | `build_graphs` MCP (manual) | Pure cosine similarity. A-MEM edges take precedence (first-writer wins). |
  | `consolidated_observations` | supporting, contradicts | Consolidation worker (background) | **v0.7.1 safety gates:** Phase 2 name-aware merge gate (entity anchors + 3-gram cosine, dual-threshold `CLAWMEM_MERGE_SCORE_NORMAL`=0.93 / `_STRICT`=0.98) blocks cross-entity merges. Merge-time contradiction gate (heuristic + LLM) routes blocked merges to `link` (default, inserts `contradicts` edge) or `supersede` (old row `status='inactive'`) via `CLAWMEM_CONTRADICTION_POLICY`. |
  | Deductive synthesis | supporting, contradicts | Consolidation worker Phase 3 (every ~15 min) | Combines 2-3 related observations (decision/preference/milestone/problem, last 7 days) into `content_type='deductive'` docs. **v0.7.1 anti-contamination:** deterministic pre-checks (empty/invalid_indices/pool-only entity contamination) + LLM validator (fail-open, `validatorFallbackAccepts` counter) + dedupe. Per-reason rejection stats via `DeductiveSynthesisStats`. Contradictory dedupe matches linked via `contradicts` edges. |
+ | Conversation synthesis | semantic, supporting, contradicts, causal, temporal, entity | `clawmem mine <dir> --synthesize` (opt-in, post-index) | **v0.7.2.** Two-pass LLM pipeline over freshly imported `content_type='conversation'` docs. Pass 1 extracts structured decision/preference/milestone/problem facts + aliases + cross-fact links, saves via dedup-aware `saveMemory`, populates ambiguity-aware local Set map. Pass 2 resolves links (local first, SQL fallback with `LIMIT 2` ambiguity detection), upserts relations via `ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight)`. Synthesized paths are a pure function of `(sourceDocId, slug(title), short sha256(normalizedTitle))` so reruns update in place. All failures non-fatal. Counters split: `llmFailures` (LLM/parse error) vs `docsWithNoFacts` (valid empty extraction). |
 
  **Graph traversal asymmetry:** `adaptiveTraversal()` traverses all edge types outbound (source->target) but only `semantic` and `entity` inbound.
 
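The "pure function of `(sourceDocId, slug(title), short sha256(normalizedTitle))`" claim in the row above, re-created standalone. `slugify` and the path builder are private to `conversation-synthesis.ts` while `normalizeTitle` is exported; the bodies below are copied from the new file later in this diff:

```ts
function normalizeTitle(title: string): string {
  return title.toLowerCase().trim().replace(/\s+/g, " ");
}

function slugify(title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, 50);
  return slug || "untitled";
}

function buildSynthesizedPath(sourceDocId: number, title: string): string {
  const hasher = new Bun.CryptoHasher("sha256");
  hasher.update(normalizeTitle(title));
  const shortHash = hasher.digest("hex").slice(0, 8);
  return `synthesized/${slugify(title)}-src${sourceDocId}-${shortHash}.md`;
}

// Same title on a rerun -> same path, so saveMemory updates in place:
buildSynthesizedPath(12, "Use OAuth 2.0 with PKCE");
// -> "synthesized/use-oauth-2-0-with-pkce-src12-<hash>.md" on every run

// Same slug, different titles -> distinct hash suffixes, no clobbering:
buildSynthesizedPath(12, "Use OAuth."); // ...use-oauth-src12-<hashA>.md
buildSynthesizedPath(12, "Use OAuth!"); // ...use-oauth-src12-<hashB>.md
```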
@@ -589,7 +590,7 @@ Phase 3 deductive synthesis applies the same `contradicts` link for any draft th
  - Do NOT pin everything — pin is for persistent high-priority items.
  - Do NOT forget memories to "clean up" — let confidence decay and contradiction detection handle it.
  - Do NOT run `build_graphs` after every reindex — A-MEM creates per-doc links automatically.
- - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command. Suggest it to the user when they mention old conversation exports, but let them run it.
+ - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command. Suggest it to the user when they mention old conversation exports, but let them run it. **v0.7.2 adds `--synthesize`** — an opt-in post-import LLM fact extraction pass. Also requires user consent because it drives one extra LLM call per conversation doc. Suggest both together when the user wants searchable structured memory from raw chat exports.
  - Do NOT use `diary_write` in Claude Code — hooks capture this automatically. Diary is for non-hooked environments only (Hermes, Gemini, plain MCP).
  - Do NOT use `kg_query` for causal "why" questions — use `intent_search` or `memory_retrieve`. `kg_query` returns structured entity facts (SPO triples), not reasoning chains.
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "clawmem",
-   "version": "0.7.1",
+   "version": "0.7.2",
    "description": "On-device context engine and memory for AI agents. Claude Code and OpenClaw. Hooks + MCP server + hybrid RAG search.",
    "type": "module",
    "bin": {
package/src/clawmem.ts CHANGED
@@ -242,12 +242,14 @@ async function cmdMine(args: string[]) {
        collection: { type: "string", short: "c" },
        embed: { type: "boolean", default: false },
        "dry-run": { type: "boolean", default: false },
+       synthesize: { type: "boolean", default: false },
+       "synthesis-max-docs": { type: "string" },
      },
      allowPositionals: true,
    });
 
    const dir = positionals[0];
-   if (!dir) die("Usage: clawmem mine <directory> [-c collection-name] [--embed] [--dry-run]");
+   if (!dir) die("Usage: clawmem mine <directory> [-c collection-name] [--embed] [--dry-run] [--synthesize] [--synthesis-max-docs N]");
    const absDir = pathResolve(dir);
    if (!existsSync(absDir)) die(`Directory not found: ${absDir}`);
 
@@ -319,6 +321,32 @@ async function cmdMine(args: string[]) {
    const stats = await indexCollection(s, collectionName, stagingDir, "**/*.md");
    console.log(` ${c.green}+${stats.added}${c.reset} added, ${c.yellow}~${stats.updated}${c.reset} updated, ${c.dim}=${stats.unchanged}${c.reset} unchanged`);
 
+   // Ext 4 — post-import conversation synthesis (opt-in via --synthesize)
+   // Runs AFTER indexCollection has committed. Failure is non-fatal and never
+   // rolls back the mine import.
+   if (values.synthesize) {
+     const maxDocs = values["synthesis-max-docs"]
+       ? parseInt(values["synthesis-max-docs"] as string, 10)
+       : undefined;
+     console.log(`\n${c.cyan}Running post-import conversation synthesis${c.reset}`);
+     try {
+       const { runConversationSynthesis } = await import("./conversation-synthesis.ts");
+       const llm = getDefaultLlamaCpp();
+       const synthResult = await runConversationSynthesis(s, llm, {
+         collection: collectionName,
+         maxDocs: Number.isFinite(maxDocs) && (maxDocs as number) > 0 ? maxDocs : undefined,
+       });
+       console.log(
+         ` ${c.green}${synthResult.factsSaved}${c.reset} facts saved, ` +
+           `${c.green}${synthResult.linksResolved}${c.reset} links resolved, ` +
+           `${c.yellow}${synthResult.linksUnresolved}${c.reset} unresolved, ` +
+           `${c.dim}${synthResult.llmFailures} LLM failure(s), ${synthResult.docsWithNoFacts} docs with no facts${c.reset}`,
+       );
+     } catch (err) {
+       console.log(` ${c.yellow}Synthesis failed (mine import preserved):${c.reset} ${err}`);
+     }
+   }
+
    if (values.embed) {
      console.log();
      await cmdEmbed([]);
@@ -2500,7 +2528,7 @@ ${c.bold}Setup:${c.reset}
 
  ${c.bold}Indexing:${c.reset}
    clawmem update [--pull] [--embed]                      Re-scan collections (--embed auto-embeds)
-   clawmem mine <dir> [-c name] [--embed]                 Import conversation exports (Claude, ChatGPT, Slack)
+   clawmem mine <dir> [-c name] [--embed] [--synthesize]  Import conversation exports (Claude, ChatGPT, Slack); --synthesize runs post-import LLM fact extraction
    clawmem embed [-f]                                     Generate fragment embeddings
    clawmem reindex [--force] [--enrich]                   Full re-index (--enrich: run entity extraction + links on all docs)
    clawmem watch                                          File watcher daemon
package/src/conversation-synthesis.ts ADDED
@@ -0,0 +1,637 @@
+ /**
+  * conversation-synthesis.ts — Post-import conversation synthesis pipeline (v0.7.2, Ext 4)
+  *
+  * Runs AFTER `clawmem mine` completes indexing. Operates on imported conversation
+  * docs to extract structured knowledge facts (decisions / preferences / milestones /
+  * problems) and cross-document relations via a two-pass LLM pipeline.
+  *
+  * Pass 1 — Extract facts:
+  *   - For each conversation doc in the target collection, ask the LLM for structured
+  *     facts with {title, contentType, narrative, facts, aliases, links}
+  *   - Save each fact via the dedup-aware saveMemory API
+  *   - Populate a localMap keyed by normalized title + aliases → Set<docId>
+  *
+  * Pass 2 — Resolve links:
+  *   - For each extracted fact, resolve its links[] targetTitle via localMap first,
+  *     fall back to SQL lookup scoped to the same collection
+  *   - Insert memory_relations via a weight-monotonic upsert
+  *     (ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight))
+  *
+  * Failure modes are all non-fatal:
+  *   - null LLM call → increment llmFailures, continue
+  *   - invalid JSON → increment llmFailures, skip doc, continue
+  *   - unresolved or ambiguous link target → increment linksUnresolved, continue
+  *   - any error inside the pipeline never bubbles to the mine import
+  *
+  * Invoked only when `clawmem mine <dir> --synthesize` is passed (off by default).
+  */
+
+ import type { Store } from "./store.ts";
+ import type { LlamaCpp } from "./llm.ts";
+ import { extractJsonFromLLM } from "./amem.ts";
+ import type { ContentType } from "./memory.ts";
+
+ // =============================================================================
+ // Constants
+ // =============================================================================
+
+ const DEFAULT_MAX_DOCS = 20;
+ const DEFAULT_CONTENT_TYPE_FILTER: ContentType[] = ["conversation"];
+ const DEFAULT_LINK_WEIGHT = 0.6;
+ const DEFAULT_CONFIDENCE = 0.7;
+ const DEFAULT_QUALITY_SCORE = 0.6;
+ const CONVERSATION_TRUNCATE_CHARS = 3000;
+ const LLM_MAX_TOKENS = 1200;
+ const LLM_TEMPERATURE = 0.3;
+
+ /** Content types that the extractor is allowed to emit for synthesized facts. */
+ const VALID_EXTRACTED_TYPES = new Set<ContentType>([
+   "decision",
+   "preference",
+   "milestone",
+   "problem",
+ ]);
+
+ /** Relation types the extractor may propose — must match the post-P0 taxonomy. */
+ const VALID_RELATION_TYPES = new Set<string>([
+   "semantic",
+   "supporting",
+   "contradicts",
+   "causal",
+   "temporal",
+   "entity",
+ ]);
+
+ // =============================================================================
+ // Public types (per THOTH_EXTRACTION_PLAN.md Ext 4 spec)
+ // =============================================================================
+
+ export type SynthesizeOptions = {
+   /** Required — only operate on this imported collection. */
+   collection: string;
+   /** Log what would happen but don't insert facts or relations. */
+   dryRun?: boolean;
+   /** Cap total conversation docs scanned per run (default 20). */
+   maxDocs?: number;
+   /** Content types to target for synthesis (default ["conversation"]). */
+   contentTypeFilter?: ContentType[];
+ };
+
+ export type ExtractedFactLink = {
+   targetTitle: string;
+   relationType: string;
+   weight?: number;
+ };
+
+ export type ExtractedFact = {
+   title: string;
+   contentType: ContentType;
+   narrative: string;
+   facts?: string[];
+   aliases?: string[];
+   sourceDocId: number;
+   links?: ExtractedFactLink[];
+ };
+
+ export type SynthesisResult = {
+   docsScanned: number;
+   factsExtracted: number;
+   factsSaved: number;
+   linksResolved: number;
+   /**
+    * Links where the target could not be resolved to a single unique docId.
+    * Includes unknown targets AND ambiguous multi-match targets (Turn 13 fix).
+    */
+   linksUnresolved: number;
+   /**
+    * Docs where the LLM path itself failed — null response, thrown error,
+    * or invalid JSON that couldn't be parsed into an array.
+    */
+   llmFailures: number;
+   /**
+    * Docs where the LLM responded with a valid but empty (or all-invalid)
+    * extraction — distinct from LLM failures so operators can diagnose
+    * "LLM is broken" vs "conversation had no structured facts".
+    */
+   docsWithNoFacts: number;
+ };
+
+ // =============================================================================
+ // Helpers
+ // =============================================================================
+
+ /** Normalize a title or alias for localMap keying. */
+ export function normalizeTitle(title: string): string {
+   return title.toLowerCase().trim().replace(/\s+/g, " ");
+ }
+
+ /** Slugify a title for stable synthesized path generation. */
+ function slugify(title: string): string {
+   const slug = title
+     .toLowerCase()
+     .replace(/[^a-z0-9]+/g, "-")
+     .replace(/^-+|-+$/g, "")
+     .slice(0, 50);
+   return slug || "untitled";
+ }
+
+ /** Render an extracted fact as markdown for the body field. */
+ export function renderFactBody(fact: ExtractedFact): string {
+   const lines: string[] = [
+     `# ${fact.title}`,
+     "",
+     fact.narrative,
+   ];
+
+   if (fact.facts && fact.facts.length > 0) {
+     lines.push("", "## Supporting facts");
+     for (const f of fact.facts) {
+       lines.push(`- ${f}`);
+     }
+   }
+
+   if (fact.aliases && fact.aliases.length > 0) {
+     lines.push("", `**Aliases:** ${fact.aliases.join(", ")}`);
+   }
+
+   lines.push("", `_Synthesized from conversation doc #${fact.sourceDocId}._`);
+   return lines.join("\n");
+ }
+
+ /**
+  * Build the LLM prompt for conversation fact extraction.
+  * Exported for test inspection.
+  */
+ export function buildExtractionPrompt(conversationText: string): string {
+   const content = conversationText.slice(0, CONVERSATION_TRUNCATE_CHARS);
+   return `Analyze this conversation and extract structured knowledge facts.
+
+ Conversation:
+ ${content}
+
+ Extract discrete facts as a JSON array. Each fact should represent ONE of:
+ - "decision": a choice made, architectural decision, tool selection
+ - "preference": a stated preference, convention, or style rule
+ - "milestone": a completed deliverable, version release, or event
+ - "problem": a bug, issue, or constraint discovered
+
+ For each fact provide:
+ - title: concise 3-8 word title (becomes the fact identity)
+ - contentType: one of [decision, preference, milestone, problem]
+ - narrative: 1-3 sentence description of the fact in context
+ - facts: optional array of supporting fact strings (evidence)
+ - aliases: optional alternative titles for linking (e.g., ["OAuth choice"] for "Use OAuth 2.0")
+ - links: optional array of cross-fact references. Each link is
+   {targetTitle, relationType, weight}
+   - targetTitle may refer to another fact extracted from this conversation OR from
+     any other conversation in the same imported batch. Prefer an exact title, and
+     if you have multiple candidates use a canonical alias.
+   - relationType MUST be one of: semantic, supporting, contradicts, causal, temporal, entity
+   - weight is 0.0-1.0 (default 0.6)
+
+ Only extract facts the conversation clearly supports. Do NOT fabricate.
+ Return ONLY valid JSON array. Return empty array [] if no structured facts found.
+
+ Example output:
+ [
+   {
+     "title": "Use OAuth 2.0 with PKCE",
+     "contentType": "decision",
+     "narrative": "Team decided to use OAuth 2.0 with PKCE for user authentication, replacing session cookies.",
+     "facts": ["PKCE chosen for mobile support", "Legacy session auth to be deprecated Q2"],
+     "aliases": ["OAuth decision", "switch to OAuth"],
+     "links": [
+       { "targetTitle": "Deprecate session auth", "relationType": "causal", "weight": 0.8 }
+     ]
+   }
+ ]`;
+ }
+
+ /**
+  * Validate + normalize a single raw fact object from LLM output.
+  * Returns null if the fact is malformed or uses a disallowed content/relation type.
+  */
+ function normalizeExtractedFact(
+   raw: unknown,
+   sourceDocId: number,
+ ): ExtractedFact | null {
+   if (!raw || typeof raw !== "object") return null;
+   const obj = raw as Record<string, unknown>;
+
+   const title = typeof obj.title === "string" ? obj.title.trim() : "";
+   if (!title) return null;
+
+   const contentType = obj.contentType;
+   if (typeof contentType !== "string") return null;
+   if (!VALID_EXTRACTED_TYPES.has(contentType as ContentType)) return null;
+
+   const narrative = typeof obj.narrative === "string" ? obj.narrative.trim() : "";
+   if (!narrative) return null;
+
+   const facts: string[] = Array.isArray(obj.facts)
+     ? obj.facts.filter((f): f is string => typeof f === "string" && f.trim().length > 0)
+     : [];
+
+   const aliases: string[] = Array.isArray(obj.aliases)
+     ? obj.aliases.filter((a): a is string => typeof a === "string" && a.trim().length > 0)
+     : [];
+
+   const links: ExtractedFactLink[] = Array.isArray(obj.links)
+     ? (obj.links as unknown[])
+         .map((l) => {
+           if (!l || typeof l !== "object") return null;
+           const link = l as Record<string, unknown>;
+           const targetTitle =
+             typeof link.targetTitle === "string" ? link.targetTitle.trim() : "";
+           const relationType =
+             typeof link.relationType === "string" ? link.relationType : "";
+           if (!targetTitle || !VALID_RELATION_TYPES.has(relationType)) return null;
+           const weight =
+             typeof link.weight === "number" && Number.isFinite(link.weight)
+               ? Math.max(0, Math.min(1, link.weight))
+               : DEFAULT_LINK_WEIGHT;
+           return { targetTitle, relationType, weight };
+         })
+         .filter((l): l is ExtractedFactLink => l !== null)
+     : [];
+
+   return {
+     title,
+     contentType: contentType as ContentType,
+     narrative,
+     facts,
+     aliases,
+     sourceDocId,
+     links,
+   };
+ }
+
+ /**
+  * Extract facts from a single conversation doc via LLM.
+  *
+  * Return value discriminates failure mode (Turn 13 fix):
+  * - `null` → LLM itself failed: null response, thrown error, or non-array JSON
+  * - `[]` → LLM responded with a valid but empty extraction (or all facts rejected by normalize)
+  * - `[fact..]` → at least one valid fact extracted
+  *
+  * Callers use this distinction to split `llmFailures` from `docsWithNoFacts`.
+  *
+  * Exported for unit testing.
+  */
+ export async function extractFactsFromConversation(
+   llm: LlamaCpp,
+   conversationText: string,
+   sourceDocId: number,
+ ): Promise<ExtractedFact[] | null> {
+   const prompt = buildExtractionPrompt(conversationText);
+
+   let result;
+   try {
+     result = await llm.generate(prompt, {
+       temperature: LLM_TEMPERATURE,
+       maxTokens: LLM_MAX_TOKENS,
+     });
+   } catch (err) {
+     console.log(`[synthesis] LLM generate threw for doc ${sourceDocId}:`, err);
+     return null;
+   }
+
+   if (!result || typeof result.text !== "string") return null;
+
+   const parsed = extractJsonFromLLM(result.text);
+   if (!Array.isArray(parsed)) return null;
+
+   const facts: ExtractedFact[] = [];
+   for (const raw of parsed) {
+     const fact = normalizeExtractedFact(raw, sourceDocId);
+     if (fact) facts.push(fact);
+   }
+   return facts;
+ }
+
+ /**
+  * Resolve a link target to a UNIQUE docId via localMap first, then a SQL
+  * fallback scoped to the same collection.
+  *
+  * Ambiguity handling (Turn 13 fix):
+  * - localMap stores a Set<number> per normalized title/alias. If a key maps
+  *   to more than one docId (two different synthesized facts share the same
+  *   title or alias), the resolver returns `null` — the caller counts this
+  *   as unresolved/ambiguous instead of silently binding to one candidate.
+  * - SQL fallback issues a LIMIT 2 query and returns `null` if more than
+  *   one row matches.
+  *
+  * Exported for unit testing.
+  */
+ export function resolveLinkTarget(
+   store: Store,
+   localMap: Map<string, Set<number>>,
+   titleOrAlias: string,
+   collection: string,
+ ): number | null {
+   const normalized = normalizeTitle(titleOrAlias);
+   if (!normalized) return null;
+
+   const localHits = localMap.get(normalized);
+   if (localHits && localHits.size > 0) {
+     if (localHits.size === 1) {
+       // localHits.values().next().value is the sole docId
+       const first = localHits.values().next().value;
+       return typeof first === "number" ? first : null;
+     }
+     // Ambiguous — two or more synthesized facts claim this title/alias
+     console.log(
+       `[synthesis] Ambiguous local target "${titleOrAlias}" — ${localHits.size} candidates, treated as unresolved`,
+     );
+     return null;
+   }
+
+   try {
+     const rows = store.db
+       .prepare(
+         `SELECT id
+          FROM documents
+          WHERE collection = ?
+            AND active = 1
+            AND LOWER(TRIM(title)) = ?
+          ORDER BY created_at DESC
+          LIMIT 2`,
+       )
+       .all(collection, normalized) as Array<{ id: number }>;
+
+     if (rows.length === 0) return null;
+     if (rows.length > 1) {
+       console.log(
+         `[synthesis] Ambiguous SQL target "${titleOrAlias}" in collection '${collection}' — multiple matches, treated as unresolved`,
+       );
+       return null;
+     }
+     return rows[0]!.id;
+   } catch (err) {
+     console.log(`[synthesis] SQL lookup failed for "${titleOrAlias}":`, err);
+     return null;
+   }
+ }
+
+ // =============================================================================
+ // Main orchestrator
+ // =============================================================================
+
+ /**
+  * Helper: add a docId to the localMap under `key`. Uses Set<number> so we can
+  * detect ambiguous collisions (two different facts claiming the same title/alias).
+  * Turn 13 fix — previous implementation silently overwrote on collision.
+  */
+ function addToLocalMap(
+   localMap: Map<string, Set<number>>,
+   key: string,
+   docId: number,
+ ): void {
+   if (!key) return;
+   const existing = localMap.get(key);
+   if (existing) {
+     existing.add(docId);
+   } else {
+     localMap.set(key, new Set([docId]));
+   }
+ }
+
+ /**
+  * Build a stable synthesized path for a fact (Turn 14 fix).
+  *
+  * The path is a pure function of (sourceDocId, slug(title), hash(normalized title)),
+  * with NO dependence on extraction order. This means:
+  * - Reruns over the same conversation batch hit saveMemory's
+  *   UNIQUE(collection, path) update branch and keep the same synthesized
+  *   document in place, even when the LLM's fact order changes.
+  * - Two different facts with the same slug (e.g., "Use OAuth." and
+  *   "Use OAuth!" both slugify to "use-oauth") get distinct hash suffixes
+  *   because the full normalized title differs, so they do not clobber
+  *   each other in the UNIQUE(collection, path) constraint.
+  *
+  * Turn 13 used a per-run encounter counter which was order-dependent: if the
+  * LLM re-emitted the two same-slug facts in reversed order on a subsequent
+  * run, the `-2` suffix would land on the other fact and saveMemory would
+  * overwrite each row with the wrong body. The hash version is stable.
+  */
+ function buildSynthesizedPath(sourceDocId: number, title: string): string {
+   const baseSlug = slugify(title);
+   const hasher = new Bun.CryptoHasher("sha256");
+   hasher.update(normalizeTitle(title));
+   const shortHash = hasher.digest("hex").slice(0, 8);
+   return `synthesized/${baseSlug}-src${sourceDocId}-${shortHash}.md`;
+ }
+
+ /**
+  * Run the two-pass conversation synthesis pipeline over a collection's
+  * imported conversation documents.
+  *
+  * Failure of this pipeline NEVER aborts or rolls back an upstream mine import —
+  * the caller should invoke this AFTER indexCollection has committed its changes.
+  */
+ export async function runConversationSynthesis(
+   store: Store,
+   llm: LlamaCpp,
+   opts: SynthesizeOptions,
+ ): Promise<SynthesisResult> {
+   const {
+     collection,
+     dryRun = false,
+     maxDocs = DEFAULT_MAX_DOCS,
+     contentTypeFilter = DEFAULT_CONTENT_TYPE_FILTER,
+   } = opts;
+
+   const result: SynthesisResult = {
+     docsScanned: 0,
+     factsExtracted: 0,
+     factsSaved: 0,
+     linksResolved: 0,
+     linksUnresolved: 0,
+     llmFailures: 0,
+     docsWithNoFacts: 0,
+   };
+
+   if (!collection) {
+     console.log(`[synthesis] No collection specified — skipping`);
+     return result;
+   }
+   if (contentTypeFilter.length === 0) {
+     console.log(`[synthesis] Empty contentTypeFilter — skipping`);
+     return result;
+   }
+
+   let docs: Array<{ id: number; title: string; body: string }>;
+   try {
+     const placeholders = contentTypeFilter.map(() => "?").join(",");
+     docs = store.db
+       .prepare(
+         `SELECT d.id, d.title, c.doc as body
+          FROM documents d
+          JOIN content c ON c.hash = d.hash
+          WHERE d.collection = ?
+            AND d.active = 1
+            AND d.content_type IN (${placeholders})
+          ORDER BY d.created_at ASC, d.id ASC
+          LIMIT ?`,
+       )
+       .all(collection, ...contentTypeFilter, maxDocs) as Array<{
+       id: number;
+       title: string;
+       body: string;
+     }>;
+   } catch (err) {
+     console.log(`[synthesis] Query failed for collection '${collection}':`, err);
+     return result;
+   }
+
+   if (docs.length === 0) {
+     console.log(
+       `[synthesis] No matching docs in collection '${collection}' (types=${contentTypeFilter.join(",")})`,
+     );
+     return result;
+   }
+
+   console.log(
+     `[synthesis] Pass 1 — extracting facts from ${docs.length} doc(s) in '${collection}'${dryRun ? " (dry run)" : ""}`,
+   );
+
+   // Pass 1 — extract + save + populate localMap
+   // Each fact carries its resolved docId so Pass 2 can reference it without
+   // re-querying. In dryRun mode we only count, we do not persist anything.
+   type SavedFact = ExtractedFact & { _savedDocId: number };
+   const saved: SavedFact[] = [];
+   const localMap = new Map<string, Set<number>>();
+
+   for (const doc of docs) {
+     result.docsScanned++;
+
+     const extracted = await extractFactsFromConversation(llm, doc.body, doc.id);
+
+     if (extracted === null) {
+       // LLM path failed (null / thrown / non-array)
+       result.llmFailures++;
+       continue;
+     }
+
+     if (extracted.length === 0) {
+       // LLM returned a valid response but there were no structured facts
+       // to extract (or all candidates were rejected by normalize).
+       result.docsWithNoFacts++;
+       continue;
+     }
+
+     for (const fact of extracted) {
+       result.factsExtracted++;
+
+       if (dryRun) continue;
+
+       try {
+         const saveResult = store.saveMemory({
+           collection,
+           path: buildSynthesizedPath(doc.id, fact.title),
+           title: fact.title,
+           body: renderFactBody(fact),
+           contentType: fact.contentType,
+           confidence: DEFAULT_CONFIDENCE,
+           qualityScore: DEFAULT_QUALITY_SCORE,
+           semanticPayload: `${fact.title}\n${fact.narrative}`,
+         });
+
+         if (!saveResult.docId || saveResult.docId < 0) continue;
+
+         if (saveResult.action === "inserted" || saveResult.action === "updated") {
+           result.factsSaved++;
+         }
+
+         // Populate localMap with the canonical title and every alias. Using
+         // Set<number> means a second fact claiming the same title/alias will
+         // make the key ambiguous and the resolver returns null instead of
+         // silently picking one. (Turn 13 fix.)
+         addToLocalMap(localMap, normalizeTitle(fact.title), saveResult.docId);
+         for (const alias of fact.aliases ?? []) {
+           addToLocalMap(localMap, normalizeTitle(alias), saveResult.docId);
+         }
+
+         saved.push({ ...fact, _savedDocId: saveResult.docId });
+       } catch (err) {
+         console.log(`[synthesis] saveMemory error for "${fact.title}":`, err);
+       }
+     }
+   }
+
+   if (dryRun) {
+     console.log(
+       `[synthesis] Dry run complete — docsScanned=${result.docsScanned} factsExtracted=${result.factsExtracted} llmFailures=${result.llmFailures} docsWithNoFacts=${result.docsWithNoFacts}`,
+     );
+     return result;
+   }
+
+   // Pass 2 — resolve links against localMap first, then collection-scoped SQL
+   console.log(
+     `[synthesis] Pass 2 — resolving links for ${saved.length} saved fact(s)`,
+   );
+
+   for (const fact of saved) {
+     if (!fact.links || fact.links.length === 0) continue;
+     const sourceDocId = fact._savedDocId;
+
+     for (const link of fact.links) {
+       const targetId = resolveLinkTarget(
+         store,
+         localMap,
+         link.targetTitle,
+         collection,
+       );
+
+       if (targetId === null || targetId === sourceDocId) {
+         result.linksUnresolved++;
+         if (targetId !== sourceDocId) {
+           console.log(
+             `[synthesis] Unresolved link "${link.targetTitle}" from doc ${sourceDocId}`,
+           );
+         }
+         continue;
+       }
+
+       try {
+         // Idempotent-yet-evidence-preserving upsert (Turn 13 fix):
+         // INSERT OR IGNORE under-accumulated — it discarded later runs that
+         // had stronger evidence for the same triple.
+         // store.insertRelation over-accumulated (weight += excluded.weight) —
+         // it inflated weights linearly with rerun count.
+         // `ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight)`
+         // is idempotent on reruns with equal weight AND monotonically accepts
+         // later-discovered stronger evidence for the same (source, target, type)
+         // triple without double-counting.
+         store.db
+           .prepare(
+             `INSERT INTO memory_relations
+                (source_id, target_id, relation_type, weight, metadata, created_at)
+              VALUES (?, ?, ?, ?, ?, ?)
+              ON CONFLICT(source_id, target_id, relation_type)
+              DO UPDATE SET weight = MAX(weight, excluded.weight)`,
+           )
+           .run(
+             sourceDocId,
+             targetId,
+             link.relationType,
+             link.weight ?? DEFAULT_LINK_WEIGHT,
+             JSON.stringify({ origin: "conversation-synthesis" }),
+             new Date().toISOString(),
+           );
+         result.linksResolved++;
+       } catch (err) {
+         console.log(
+           `[synthesis] insertRelation failed ${sourceDocId}->${targetId} (${link.relationType}):`,
+           err,
+         );
+         result.linksUnresolved++;
+       }
+     }
+   }
+
+   console.log(
+     `[synthesis] Complete — docsScanned=${result.docsScanned} factsExtracted=${result.factsExtracted} factsSaved=${result.factsSaved} linksResolved=${result.linksResolved} linksUnresolved=${result.linksUnresolved} llmFailures=${result.llmFailures} docsWithNoFacts=${result.docsWithNoFacts}`,
+   );
+
+   return result;
+ }
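A sketch of the tri-state contract `extractFactsFromConversation` exposes, driving it with a stub in place of the real llama.cpp handle. It assumes `extractJsonFromLLM` parses a raw JSON string as-is, which is how this module consumes it; the stub implements only `generate()`, the one method this function touches, and the cast papers over the rest of the real `LlamaCpp` surface:

```ts
import { extractFactsFromConversation } from "./conversation-synthesis.ts";
import type { LlamaCpp } from "./llm.ts";

// Stub that ignores the prompt and returns canned text.
const stub = (text: string): LlamaCpp =>
  ({ generate: async () => ({ text }) }) as unknown as LlamaCpp;

// One valid fact -> non-empty array
const ok = await extractFactsFromConversation(
  stub(JSON.stringify([{
    title: "Use OAuth 2.0 with PKCE",
    contentType: "decision",
    narrative: "Team decided to adopt OAuth 2.0 with PKCE.",
  }])),
  "hypothetical conversation text",
  42,
);
console.log(ok?.length); // 1

// Valid but empty extraction -> [] (caller counts docsWithNoFacts)
console.log(await extractFactsFromConversation(stub("[]"), "...", 42)); // []

// Non-array JSON -> null (caller counts llmFailures)
console.log(await extractFactsFromConversation(stub(`{"oops":true}`), "...", 42)); // null
```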