clawmem 0.7.0 → 0.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/AGENTS.md CHANGED
@@ -97,6 +97,11 @@ curl http://host:8090/v1/models
97
97
  | `CLAWMEM_ENABLE_CONSOLIDATION` | disabled | Background worker backfills unenriched docs. Needs long-lived MCP process. |
98
98
  | `CLAWMEM_CONSOLIDATION_INTERVAL` | 300000 | Worker interval in ms (min 15000). |
99
99
  | `CLAWMEM_NUDGE_INTERVAL` | `15` | Prompts between lifecycle tool use before `<vault-nudge>` injection. 0 to disable. |
100
+ | `CLAWMEM_MERGE_SCORE_NORMAL` | `0.93` | **v0.7.1.** Phase 2 merge-safety score threshold when candidate and existing anchors align. Merges above this normalized 3-gram cosine similarity are allowed. |
101
+ | `CLAWMEM_MERGE_SCORE_STRICT` | `0.98` | **v0.7.1.** Strictest merge-safety score threshold (fallback when anchors are ambiguous). |
102
+ | `CLAWMEM_MERGE_GUARD_DRY_RUN` | `false` | **v0.7.1.** When `true`, Phase 2 merge-safety rejections are logged but not enforced — use for calibration before switching on the gate. |
103
+ | `CLAWMEM_CONTRADICTION_POLICY` | `link` | **v0.7.1.** How the merge-time contradiction gate handles a contradictory merge. `link` keeps both rows active and inserts a `contradicts` edge. `supersede` marks the old row `status='inactive'`. |
104
+ | `CLAWMEM_CONTRADICTION_MIN_CONFIDENCE` | `0.5` | **v0.7.1.** Minimum combined (heuristic + LLM) confidence required before the contradiction gate blocks a merge. Below this, the merge proceeds. |
100
105
 
101
106
  **Note:** The `bin/clawmem` wrapper sets all endpoint defaults. Always use the wrapper — never `bun run src/clawmem.ts` directly. For remote GPU setups, add the same env vars to the watcher service via a systemd drop-in.
102
107
 
@@ -246,7 +251,7 @@ ClawMem hooks handle ~90% of retrieval automatically. Agent-initiated MCP calls
246
251
 
247
252
  | Hook | Trigger | Budget | Content |
248
253
  |------|---------|--------|---------|
249
- | `context-surfacing` | UserPromptSubmit | profile-driven (default 800) | retrieval gate → profile-driven hybrid search (vector if `useVector`, timeout from profile) → FTS supplement → file-aware supplemental search (E13) → snooze filter → noise filter → spreading activation (E11: co-activated doc boost) → memory type diversification (E10) → tiered injection (HOT/WARM/COLD snippets) → `<vault-context>` + optional `<vault-routing>` hint. Budget, max results, vector timeout, and min score all driven by `CLAWMEM_PROFILE`. |
254
+ | `context-surfacing` | UserPromptSubmit | profile-driven (default 800) | retrieval gate → profile-driven hybrid search (vector if `useVector`, timeout from profile) → FTS supplement → file-aware supplemental search (E13) → snooze filter → noise filter → spreading activation (E11: co-activated doc boost) → memory type diversification (E10) → tiered injection (HOT/WARM/COLD snippets) → `<vault-context><instruction>…</instruction><facts>…</facts><relationships>…</relationships></vault-context>` (v0.7.1: instruction always prepended when context is returned; relationships block lists memory-graph edges where BOTH endpoints are in the surfaced set, truncated first when over budget) + optional `<vault-routing>` hint. Budget, max results, vector timeout, and min score all driven by `CLAWMEM_PROFILE`. |
250
255
  | `postcompact-inject` | SessionStart (compact) | 1200 tokens | re-injects authoritative context after compaction: precompact state (600) + recent decisions (400) + antipatterns (150) + vault context (200) → `<vault-postcompact>` |
251
256
  | `curator-nudge` | SessionStart | 200 tokens | surfaces curator report actions, nudges when report is stale (>7 days) |
252
257
  | `precompact-extract` | PreCompact | — | extracts decisions, file paths, open questions → writes `precompact-state.md` to auto-memory. Query-aware decision ranking. Reindexes auto-memory collection. |
@@ -352,7 +357,7 @@ Pin, snooze, and forget are **manual MCP tools** — not automated. The agent sh
352
357
  - **Proactive triggers:** A memory keeps surfacing but isn't relevant to current work. User says "not now" / "later" / "ignore this for now". Seasonal or time-boxed content (e.g., "revisit after launch").
353
358
  - **Forget** (`memory_forget`) — permanently deactivates. Use sparingly.
354
359
  - Only when a memory is genuinely wrong or permanently obsolete. Prefer snooze for temporary suppression.
355
- - **Contradictions auto-resolve:** When `decision-extractor` detects a new decision contradicting an old one, the old decision's confidence is lowered automatically. No manual intervention needed for superseded decisions.
360
+ - **Contradictions auto-resolve:** When `decision-extractor` detects a new decision contradicting an old one, the old decision's confidence is lowered automatically. No manual intervention needed for superseded decisions. **v0.7.1:** the consolidation worker adds a merge-time contradiction gate — before any Phase 2 merge, it runs a deterministic heuristic + LLM check and either links contradictory observations via a `contradicts` edge (default) or marks the prior row `status='inactive'` (when `CLAWMEM_CONTRADICTION_POLICY=supersede`). Phase 3 deductive synthesis applies the same gate to deductive dedupe matches.
356
361
 
357
362
  ### Anti-Patterns
358
363
 
@@ -363,7 +368,7 @@ Pin, snooze, and forget are **manual MCP tools** — not automated. The agent sh
363
368
  - Do NOT pin everything — pin is for persistent high-priority items, not temporary boosting.
364
369
  - Do NOT forget memories to "clean up" — let confidence decay and contradiction detection handle it naturally.
365
370
  - Do NOT run `build_graphs` after every reindex — A-MEM creates per-doc links automatically. Only after bulk ingestion or when `intent_search` returns weak graph results.
366
- - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command (same category as `update`/`reindex`). Suggest it to the user when they mention old conversation exports, but let them run it. Bulk import has disk/embedding cost implications that need user consent.
371
+ - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command (same category as `update`/`reindex`). Suggest it to the user when they mention old conversation exports, but let them run it. Bulk import has disk/embedding cost implications that need user consent. **v0.7.2 adds `--synthesize`** — an opt-in post-import LLM fact extraction pass that turns raw conversation dumps into searchable structured decisions / preferences / milestones / problems with cross-fact relations. Off by default; also requires user consent because it drives additional LLM calls (one per conversation doc). Suggest both together when the user wants to get real value out of old chat exports, not just the raw dumps.
367
372
  - Do NOT use `diary_write` in Claude Code — hooks (`decision-extractor`, `handoff-generator`) capture this automatically. Diary is for non-hooked environments only (Hermes, Gemini, plain MCP clients).
368
373
  - Do NOT use `kg_query` for causal "why" questions — use `intent_search` or `memory_retrieve`. `kg_query` returns structured entity facts (SPO triples), not reasoning chains.
369
374
 
@@ -498,8 +503,9 @@ The `memory_relations` table is populated by multiple independent sources:
498
503
  | `buildTemporalBackbone()` | temporal | `build_graphs` MCP tool (manual) | Creation-order edges between all active docs. |
499
504
  | `buildSemanticGraph()` | semantic | `build_graphs` MCP tool (manual) | Pure cosine similarity. PK collision: `INSERT OR IGNORE` means A-MEM semantic edges take precedence if they exist first. |
500
505
  | Entity co-occurrence graph | entity | A-MEM enrichment (indexing) | LLM entity extraction → quality filters (title/length/blocklist/location validation) → type-agnostic canonical resolution within compatibility buckets (person, org, location, tech=project/service/tool/concept) → `entity_mentions` + `entity_cooccurrences` tables. Entity edges use IDF-based specificity scoring. Feeds ENTITY intent queries and MPFP `[entity, semantic]` patterns. |
501
- | `consolidated_observations` | supporting | Consolidation worker (background) | 3-tier consolidation: facts → observations → mental models. Observations track `proof_count`, `trend` (STABLE/STRENGTHENING/WEAKENING/STALE), and source links. |
502
- | Deductive synthesis | supporting | Consolidation worker Phase 3 (background, every ~15 min) | Combines 2-3 related recent observations (decision/preference/milestone/problem, last 7 days) into `content_type='deductive'` documents with `source_doc_ids` provenance. First-class searchable docs with ∞ half-life. |
506
+ | `consolidated_observations` | supporting, contradicts | Consolidation worker (background) | 3-tier consolidation: facts → observations → mental models. Observations track `proof_count`, `trend` (STABLE/STRENGTHENING/WEAKENING/STALE), and source links. **v0.7.1 safety gates:** name-aware merge gate uses entity-anchor comparison + 3-gram cosine similarity (dual-threshold `CLAWMEM_MERGE_SCORE_NORMAL`=0.93 / `_STRICT`=0.98) to prevent cross-entity merges ("Alice decided X" merging into "Bob decided X"). Merge-time contradiction gate runs deterministic heuristic + LLM check; blocked merges route to `CLAWMEM_CONTRADICTION_POLICY`=`link` (new row + `contradicts` edge, default) or `supersede` (old row `status='inactive'`, new row replaces). |
507
+ | Deductive synthesis | supporting, contradicts | Consolidation worker Phase 3 (background, every ~15 min) | Combines 2-3 related recent observations (decision/preference/milestone/problem, last 7 days) into `content_type='deductive'` documents with `source_doc_ids` provenance. First-class searchable docs with ∞ half-life. **v0.7.1 anti-contamination wrapper:** every draft passes through deterministic pre-checks (empty conclusion, invalid source_indices, pool-only entity contamination via `entity_mentions` or lexical fallback) + LLM validator (fail-open with `validatorFallbackAccepts` counter) + dedupe. Per-reason rejection stats exposed via `DeductiveSynthesisStats` (contaminationRejects, invalidIndexRejects, unsupportedRejects, emptyRejects, dedupSkipped, validatorFallbackAccepts). Contradictory dedupe matches are linked via `contradicts` edges. |
508
+ | Conversation synthesis (`runConversationSynthesis`) | semantic, supporting, contradicts, causal, temporal, entity | `clawmem mine <dir> --synthesize` (opt-in, post-index) | **v0.7.2.** Two-pass LLM pipeline over freshly imported `content_type='conversation'` docs. Pass 1 extracts structured facts (decision/preference/milestone/problem) via `extractFactsFromConversation`, saves each via dedup-aware `saveMemory`, populates a local Set-based alias map. Pass 2 resolves cross-fact links against the local map first (fails closed on ambiguity — multi-candidate titles return unresolved), falls back to collection-scoped SQL lookup with LIMIT 2 ambiguity detection. Relations upsert via `ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight)` — idempotent on equal-weight reruns but monotonically accepts stronger later evidence. Synthesized fact paths are a pure function of `(sourceDocId, slug(title), short sha256(normalizedTitle))`, so reruns update in place instead of creating parallel rows. Counters split into `llmFailures` (null/thrown/invalid-JSON) vs `docsWithNoFacts` (valid-empty extraction). All failures non-fatal — never rolls back the mine import. |
503
509
 
504
510
  **Edge collision:** Both `generateMemoryLinks()` and `buildSemanticGraph()` insert `relation_type='semantic'`. PK is `(source_id, target_id, relation_type)` — first writer wins.
505
511
 
package/CLAUDE.md CHANGED
@@ -97,6 +97,11 @@ curl http://host:8090/v1/models
97
97
  | `CLAWMEM_ENABLE_CONSOLIDATION` | disabled | Background worker backfills unenriched docs. Needs long-lived MCP process. |
98
98
  | `CLAWMEM_CONSOLIDATION_INTERVAL` | 300000 | Worker interval in ms (min 15000). |
99
99
  | `CLAWMEM_NUDGE_INTERVAL` | `15` | Prompts between lifecycle tool use before `<vault-nudge>` injection. 0 to disable. |
100
+ | `CLAWMEM_MERGE_SCORE_NORMAL` | `0.93` | **v0.7.1.** Phase 2 merge-safety score threshold when candidate and existing anchors align. Merges above this normalized 3-gram cosine similarity are allowed. |
101
+ | `CLAWMEM_MERGE_SCORE_STRICT` | `0.98` | **v0.7.1.** Strictest merge-safety score threshold (fallback when anchors are ambiguous). |
102
+ | `CLAWMEM_MERGE_GUARD_DRY_RUN` | `false` | **v0.7.1.** When `true`, Phase 2 merge-safety rejections are logged but not enforced — use for calibration before switching on the gate. |
103
+ | `CLAWMEM_CONTRADICTION_POLICY` | `link` | **v0.7.1.** How the merge-time contradiction gate handles a contradictory merge. `link` keeps both rows active and inserts a `contradicts` edge. `supersede` marks the old row `status='inactive'`. |
104
+ | `CLAWMEM_CONTRADICTION_MIN_CONFIDENCE` | `0.5` | **v0.7.1.** Minimum combined (heuristic + LLM) confidence required before the contradiction gate blocks a merge. Below this, the merge proceeds. |
100
105
 
101
106
  **Note:** The `bin/clawmem` wrapper sets all endpoint defaults. Always use the wrapper — never `bun run src/clawmem.ts` directly. For remote GPU setups, add the same env vars to the watcher service via a systemd drop-in.
102
107
 
@@ -246,7 +251,7 @@ ClawMem hooks handle ~90% of retrieval automatically. Agent-initiated MCP calls
246
251
 
247
252
  | Hook | Trigger | Budget | Content |
248
253
  |------|---------|--------|---------|
249
- | `context-surfacing` | UserPromptSubmit | profile-driven (default 800) | retrieval gate → profile-driven hybrid search (vector if `useVector`, timeout from profile) → FTS supplement → file-aware supplemental search (E13) → snooze filter → noise filter → spreading activation (E11: co-activated doc boost) → memory type diversification (E10) → tiered injection (HOT/WARM/COLD snippets) → `<vault-context>` + optional `<vault-routing>` hint. Budget, max results, vector timeout, and min score all driven by `CLAWMEM_PROFILE`. |
254
+ | `context-surfacing` | UserPromptSubmit | profile-driven (default 800) | retrieval gate → profile-driven hybrid search (vector if `useVector`, timeout from profile) → FTS supplement → file-aware supplemental search (E13) → snooze filter → noise filter → spreading activation (E11: co-activated doc boost) → memory type diversification (E10) → tiered injection (HOT/WARM/COLD snippets) → `<vault-context><instruction>…</instruction><facts>…</facts><relationships>…</relationships></vault-context>` (v0.7.1: instruction always prepended when context is returned; relationships block lists memory-graph edges where BOTH endpoints are in the surfaced set, truncated first when over budget) + optional `<vault-routing>` hint. Budget, max results, vector timeout, and min score all driven by `CLAWMEM_PROFILE`. |
250
255
  | `postcompact-inject` | SessionStart (compact) | 1200 tokens | re-injects authoritative context after compaction: precompact state (600) + recent decisions (400) + antipatterns (150) + vault context (200) → `<vault-postcompact>` |
251
256
  | `curator-nudge` | SessionStart | 200 tokens | surfaces curator report actions, nudges when report is stale (>7 days) |
252
257
  | `precompact-extract` | PreCompact | — | extracts decisions, file paths, open questions → writes `precompact-state.md` to auto-memory. Query-aware decision ranking. Reindexes auto-memory collection. |
@@ -352,7 +357,7 @@ Pin, snooze, and forget are **manual MCP tools** — not automated. The agent sh
352
357
  - **Proactive triggers:** A memory keeps surfacing but isn't relevant to current work. User says "not now" / "later" / "ignore this for now". Seasonal or time-boxed content (e.g., "revisit after launch").
353
358
  - **Forget** (`memory_forget`) — permanently deactivates. Use sparingly.
354
359
  - Only when a memory is genuinely wrong or permanently obsolete. Prefer snooze for temporary suppression.
355
- - **Contradictions auto-resolve:** When `decision-extractor` detects a new decision contradicting an old one, the old decision's confidence is lowered automatically. No manual intervention needed for superseded decisions.
360
+ - **Contradictions auto-resolve:** When `decision-extractor` detects a new decision contradicting an old one, the old decision's confidence is lowered automatically. No manual intervention needed for superseded decisions. **v0.7.1:** the consolidation worker adds a merge-time contradiction gate — before any Phase 2 merge, it runs a deterministic heuristic + LLM check and either links contradictory observations via a `contradicts` edge (default) or marks the prior row `status='inactive'` (when `CLAWMEM_CONTRADICTION_POLICY=supersede`). Phase 3 deductive synthesis applies the same gate to deductive dedupe matches.
356
361
 
357
362
  ### Anti-Patterns
358
363
 
@@ -363,7 +368,7 @@ Pin, snooze, and forget are **manual MCP tools** — not automated. The agent sh
363
368
  - Do NOT pin everything — pin is for persistent high-priority items, not temporary boosting.
364
369
  - Do NOT forget memories to "clean up" — let confidence decay and contradiction detection handle it naturally.
365
370
  - Do NOT run `build_graphs` after every reindex — A-MEM creates per-doc links automatically. Only after bulk ingestion or when `intent_search` returns weak graph results.
366
- - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command (same category as `update`/`reindex`). Suggest it to the user when they mention old conversation exports, but let them run it. Bulk import has disk/embedding cost implications that need user consent.
371
+ - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command (same category as `update`/`reindex`). Suggest it to the user when they mention old conversation exports, but let them run it. Bulk import has disk/embedding cost implications that need user consent. **v0.7.2 adds `--synthesize`** — an opt-in post-import LLM fact extraction pass that turns raw conversation dumps into searchable structured decisions / preferences / milestones / problems with cross-fact relations. Off by default; also requires user consent because it drives additional LLM calls (one per conversation doc). Suggest both together when the user wants to get real value out of old chat exports, not just the raw dumps.
367
372
  - Do NOT use `diary_write` in Claude Code — hooks (`decision-extractor`, `handoff-generator`) capture this automatically. Diary is for non-hooked environments only (Hermes, Gemini, plain MCP clients).
368
373
  - Do NOT use `kg_query` for causal "why" questions — use `intent_search` or `memory_retrieve`. `kg_query` returns structured entity facts (SPO triples), not reasoning chains.
369
374
 
@@ -498,8 +503,9 @@ The `memory_relations` table is populated by multiple independent sources:
498
503
  | `buildTemporalBackbone()` | temporal | `build_graphs` MCP tool (manual) | Creation-order edges between all active docs. |
499
504
  | `buildSemanticGraph()` | semantic | `build_graphs` MCP tool (manual) | Pure cosine similarity. PK collision: `INSERT OR IGNORE` means A-MEM semantic edges take precedence if they exist first. |
500
505
  | Entity co-occurrence graph | entity | A-MEM enrichment (indexing) | LLM entity extraction → quality filters (title/length/blocklist/location validation) → type-agnostic canonical resolution within compatibility buckets (person, org, location, tech=project/service/tool/concept) → `entity_mentions` + `entity_cooccurrences` tables. Entity edges use IDF-based specificity scoring. Feeds ENTITY intent queries and MPFP `[entity, semantic]` patterns. |
501
- | `consolidated_observations` | supporting | Consolidation worker (background) | 3-tier consolidation: facts → observations → mental models. Observations track `proof_count`, `trend` (STABLE/STRENGTHENING/WEAKENING/STALE), and source links. |
502
- | Deductive synthesis | supporting | Consolidation worker Phase 3 (background, every ~15 min) | Combines 2-3 related recent observations (decision/preference/milestone/problem, last 7 days) into `content_type='deductive'` documents with `source_doc_ids` provenance. First-class searchable docs with ∞ half-life. |
506
+ | `consolidated_observations` | supporting, contradicts | Consolidation worker (background) | 3-tier consolidation: facts → observations → mental models. Observations track `proof_count`, `trend` (STABLE/STRENGTHENING/WEAKENING/STALE), and source links. **v0.7.1 safety gates:** name-aware merge gate uses entity-anchor comparison + 3-gram cosine similarity (dual-threshold `CLAWMEM_MERGE_SCORE_NORMAL`=0.93 / `_STRICT`=0.98) to prevent cross-entity merges ("Alice decided X" merging into "Bob decided X"). Merge-time contradiction gate runs deterministic heuristic + LLM check; blocked merges route to `CLAWMEM_CONTRADICTION_POLICY`=`link` (new row + `contradicts` edge, default) or `supersede` (old row `status='inactive'`, new row replaces). |
507
+ | Deductive synthesis | supporting, contradicts | Consolidation worker Phase 3 (background, every ~15 min) | Combines 2-3 related recent observations (decision/preference/milestone/problem, last 7 days) into `content_type='deductive'` documents with `source_doc_ids` provenance. First-class searchable docs with ∞ half-life. **v0.7.1 anti-contamination wrapper:** every draft passes through deterministic pre-checks (empty conclusion, invalid source_indices, pool-only entity contamination via `entity_mentions` or lexical fallback) + LLM validator (fail-open with `validatorFallbackAccepts` counter) + dedupe. Per-reason rejection stats exposed via `DeductiveSynthesisStats` (contaminationRejects, invalidIndexRejects, unsupportedRejects, emptyRejects, dedupSkipped, validatorFallbackAccepts). Contradictory dedupe matches are linked via `contradicts` edges. |
508
+ | Conversation synthesis (`runConversationSynthesis`) | semantic, supporting, contradicts, causal, temporal, entity | `clawmem mine <dir> --synthesize` (opt-in, post-index) | **v0.7.2.** Two-pass LLM pipeline over freshly imported `content_type='conversation'` docs. Pass 1 extracts structured facts (decision/preference/milestone/problem) via `extractFactsFromConversation`, saves each via dedup-aware `saveMemory`, populates a local Set-based alias map. Pass 2 resolves cross-fact links against the local map first (fails closed on ambiguity — multi-candidate titles return unresolved), falls back to collection-scoped SQL lookup with LIMIT 2 ambiguity detection. Relations upsert via `ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight)` — idempotent on equal-weight reruns but monotonically accepts stronger later evidence. Synthesized fact paths are a pure function of `(sourceDocId, slug(title), short sha256(normalizedTitle))`, so reruns update in place instead of creating parallel rows. Counters split into `llmFailures` (null/thrown/invalid-JSON) vs `docsWithNoFacts` (valid-empty extraction). All failures non-fatal — never rolls back the mine import. |
503
509
 
504
510
  **Edge collision:** Both `generateMemoryLinks()` and `buildSemanticGraph()` insert `relation_type='semantic'`. PK is `(source_id, target_id, relation_type)` — first writer wins.
505
511
 
package/README.md CHANGED
@@ -19,7 +19,7 @@ ClawMem turns your markdown notes, project docs, and research dumps into persist
19
19
  - **Surfaces relevant context** on every prompt (context-surfacing hook)
20
20
  - **Bootstraps sessions** with your profile, latest handoff, recent decisions, and stale notes
21
21
  - **Captures decisions, preferences, milestones, and problems** from session transcripts using a local GGUF observer model
22
- - **Imports conversation exports** from Claude Code, ChatGPT, Claude.ai, Slack, and plain text via `clawmem mine`
22
+ - **Imports conversation exports** from Claude Code, ChatGPT, Claude.ai, Slack, and plain text via `clawmem mine`, with optional post-import LLM fact extraction (`--synthesize`) that pulls structured decisions / preferences / milestones / problems and cross-fact links out of otherwise full-text conversation dumps (v0.7.2)
23
23
  - **Generates handoffs** at session end so the next session can pick up where you left off
24
24
  - **Learns what matters** via a feedback loop that boosts referenced notes and decays unused ones
25
25
  - **Guards against prompt injection** in surfaced content
@@ -27,7 +27,10 @@ ClawMem turns your markdown notes, project docs, and research dumps into persist
27
27
  - **Traverses multi-graphs** (semantic, temporal, causal) via adaptive beam search
28
28
  - **Evolves memory metadata** as new documents create or refine connections
29
29
  - **Infers causal relationships** between facts extracted from session observations
30
- - **Detects contradictions** between new and prior decisions, auto-decaying superseded ones
30
+ - **Detects contradictions** between new and prior decisions, auto-decaying superseded ones (with an additional merge-time contradiction gate in the consolidation worker that blocks cross-observation contradictions before they land, v0.7.1)
31
+ - **Guards against cross-entity merges** during consolidation — name-aware dual-threshold merge safety compares entity anchors before merging similar observations, preventing "Alice decided X" from merging into "Bob decided X" (v0.7.1)
32
+ - **Prevents context bleed in derived insights** — the Phase 3 deductive synthesis pipeline validates every draft against an anti-contamination wrapper (deterministic entity contamination check + LLM validator + dedupe) before writing cross-session deductive observations (v0.7.1)
33
+ - **Frames surfaced facts as background knowledge** — `context-surfacing` wraps injected content in `<instruction>` + `<facts>` + `<relationships>` blocks, telling the model to treat facts as already-known and exposing memory-graph edges between surfaced docs directly in-prompt (v0.7.1)
31
34
  - **Scores document quality** using structure, keywords, and metadata richness signals
32
35
  - **Boosts co-accessed documents** — notes frequently surfaced together get retrieval reinforcement
33
36
  - **Decomposes complex queries** into typed retrieval clauses (BM25/vector/graph) for multi-topic questions
@@ -53,6 +56,30 @@ Runs fully local with no API keys and no cloud services. Integrates via Claude C
53
56
  - **Observation invalidation** — soft invalidation (invalidated_at/invalidated_by/superseded_by columns). Observations with confidence ≤ 0.2 after contradiction are filtered from search results.
54
57
  - **Memory nudge** — periodic ephemeral `<vault-nudge>` injection prompting lifecycle tool use after N turns of inactivity. Configurable via `CLAWMEM_NUDGE_INTERVAL`.
55
58
 
59
+ ### v0.7.1 Safety Release
60
+
61
+ Five independent safety gates around the consolidation pipeline and context surfacing, aimed at preventing contamination, cross-entity merges, and unchecked contradictions from landing in the vault. Every extraction ships with full unit + integration test coverage (+158 tests on top of the v0.7.0 baseline). See [consolidation safety](docs/concepts/architecture.md#consolidation-safety-v071) for the architectural walkthrough.
62
+
63
+ - **Taxonomy cleanup** — standardized on the A-MEM `contradicts` (plural) convention across the entire codebase, eliminating silent query misses on the legacy singular form
64
+ - **Name-aware merge safety** — the Phase 2 consolidation worker gate extracts entity anchors (via `entity_mentions`, with lexical proper-noun fallback) and runs dual-threshold normalized 3-gram cosine similarity before merging similar observations. Cross-entity merges are hard-rejected when anchor sets differ materially, preventing context bleed where "Alice decided X" merges into "Bob decided X". Thresholds are env-overridable (`CLAWMEM_MERGE_SCORE_NORMAL`=0.93, `_STRICT`=0.98). Dry-run mode via `CLAWMEM_MERGE_GUARD_DRY_RUN` for calibration.
65
+ - **Contradiction-aware merge gate** — after the name-aware gate passes, a deterministic heuristic (negation asymmetry, number/date mismatch) plus an LLM check detect contradictory merges. Blocked merges route to `link` policy (insert new row + `contradicts` edge, default) or `supersede` policy (mark old row `status='inactive'`). Configurable via `CLAWMEM_CONTRADICTION_POLICY` and `CLAWMEM_CONTRADICTION_MIN_CONFIDENCE`. Phase 3 deductive synthesis applies the same gate to deductive dedupe matches.
66
+ - **Anti-contamination deductive synthesis** — every Phase 3 draft runs through a three-layer validator: deterministic pre-checks (empty conclusion, invalid source_indices, pool-only entity contamination via `entity_mentions`) + LLM validator (fail-open with `validatorFallbackAccepts` counter) + dedupe. Per-reason rejection stats exposed via `DeductiveSynthesisStats` so Phase 3 yield can be diagnosed without enabling extra logging.
67
+ - **Context instruction + relationship snippets** — `context-surfacing` now always prepends an `<instruction>` block framing the surfaced facts as background knowledge the model already holds, and appends an optional `<relationships>` block listing memory-graph edges where BOTH endpoints are in the surfaced doc set. The relationships block is the first thing dropped when the payload would overflow `CLAWMEM_PROFILE`'s token budget, preserving facts-first behaviour while giving the model graph-level reasoning hooks directly in-prompt.
68
+
69
+ ### v0.7.2 Post-Import Conversation Synthesis
70
+
71
+ Opt-in LLM pass that runs **after** `clawmem mine` finishes indexing an imported collection. Operates on the freshly imported `content_type='conversation'` documents and extracts structured knowledge facts (decisions / preferences / milestones / problems) plus cross-fact relations, writing each fact as a first-class searchable document alongside the raw conversation exchanges. See [post-import synthesis](docs/concepts/architecture.md#post-import-conversation-synthesis-v072) for the architectural walkthrough.
72
+
73
+ - **New CLI flag** — `clawmem mine <dir> --synthesize [--synthesis-max-docs N]`. Off by default. When omitted, existing mine behaviour is byte-identical to v0.7.1.
74
+ - **Two-pass pipeline** — Pass 1 extracts facts per conversation via the existing LLM, saves each via dedup-aware `saveMemory`, and populates a local alias map. Pass 2 resolves cross-fact links against the local map first, falling back to collection-scoped SQL lookup. Forward references (link to a fact extracted later in the same run) are resolved correctly.
75
+ - **Idempotent reruns** — synthesized fact paths are a pure function of `(sourceDocId, slug(title), short sha256(normalizedTitle))`, so reruns over the same conversation batch hit the `saveMemory` update branch instead of creating parallel rows. Same-slug collisions are disambiguated by the stable hash suffix, not encounter order.
76
+ - **Fail-closed link resolution** — when two different facts claim the same normalized title or alias, the resolver treats the link as ambiguous and counts it unresolved. Pre-existing docs with duplicate titles in the collection do not silently bind either.
77
+ - **Weight-monotonic relation upsert** — `memory_relations` insert uses `ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight)`, which is idempotent on equal-weight reruns but still accepts stronger later evidence without double-counting.
78
+ - **Non-fatal failure model** — any LLM failure, JSON parse error, saveMemory collision, or relation insert error is counted and logged, never re-thrown. Synthesis failure after `indexCollection` commits does not roll back the mine import.
79
+ - **Split operator counters** — `llmFailures` counts actual LLM path failures (null, thrown, non-array JSON), while `docsWithNoFacts` counts docs where the LLM responded validly but returned zero structured facts. Previously these were conflated as `nullCalls`.
80
+
81
+ Adds +63 tests (46 unit + 5 integration + 12 regression) on top of the v0.7.1 baseline.
82
+
56
83
  ## Architecture
57
84
 
58
85
  <p align="center">
@@ -644,7 +671,7 @@ clawmem collection list List collections
644
671
  clawmem collection remove <name> Remove a collection
645
672
 
646
673
  clawmem update [--pull] [--embed] Incremental re-scan
647
- clawmem mine <dir> [-c name] [--embed] Import conversation exports (Claude, ChatGPT, Slack)
674
+ clawmem mine <dir> [-c name] [--embed] [--synthesize] Import conversation exports (--synthesize runs post-import LLM fact extraction, v0.7.2)
648
675
  clawmem embed [-f] Generate fragment embeddings
649
676
  clawmem reindex [--force] Full re-index
650
677
  clawmem watch File watcher daemon
@@ -816,7 +843,7 @@ For WHY and ENTITY queries, the search pipeline expands results through the memo
816
843
  - **Temporal** — chronological document ordering
817
844
  - **Causal** — LLM-inferred cause→effect from Observer facts + Beads `blocks`/`waits-for` deps
818
845
  - **Supporting** — LLM-analyzed document relationships + Beads `discovered-from` deps
819
- - **Contradicts** — LLM-analyzed document relationships
846
+ - **Contradicts** — LLM-analyzed document relationships. Additional `contradicts` edges are inserted by the consolidation worker's merge-time contradiction gate (v0.7.1) when it blocks a Phase 2 merge under the `link` policy, and by Phase 3 deductive synthesis when a new draft contradicts a prior deductive observation.
820
847
 
821
848
  ### Content Type Scoring
822
849
 
@@ -848,7 +875,7 @@ Content types are inferred from frontmatter or file path patterns. Half-lives ex
848
875
 
849
876
  **Snooze:** Snoozed documents are filtered out of context surfacing until their snooze date. Use `memory_snooze` for temporary suppression.
850
877
 
851
- **Contradiction detection:** When `decision-extractor` identifies a new decision that contradicts a prior one, the old decision's confidence is automatically lowered (−0.25 for contradictions, −0.15 for updates). Superseded decisions naturally fade from context surfacing without manual intervention.
878
+ **Contradiction detection:** When `decision-extractor` identifies a new decision that contradicts a prior one, the old decision's confidence is automatically lowered (−0.25 for contradictions, −0.15 for updates). Superseded decisions naturally fade from context surfacing without manual intervention. **v0.7.1 adds a second layer at the consolidation boundary:** before the background worker merges a new pattern into an existing consolidated observation, a deterministic heuristic (negation asymmetry, number/date mismatch) plus an LLM check detect contradictions. Blocked merges route to either `link` policy (both rows remain active + a `contradicts` edge is inserted, default) or `supersede` policy (old row marked `status='inactive'`). Configurable via `CLAWMEM_CONTRADICTION_POLICY` and `CLAWMEM_CONTRADICTION_MIN_CONFIDENCE`.
852
879
 
853
880
  ## Features
854
881
 
@@ -935,6 +962,11 @@ Notes referenced by the agent during a session get boosted (`access_count++`). U
935
962
  | `CLAWMEM_LLM_URL` | `http://localhost:8089` | LLM server URL for intent/query/A-MEM. Without it, falls to `node-llama-cpp` (if allowed). |
936
963
  | `CLAWMEM_RERANK_URL` | `http://localhost:8090` | Reranker server URL. Without it, falls to `node-llama-cpp` (if allowed). |
937
964
  | `CLAWMEM_NO_LOCAL_MODELS` | `false` | Block `node-llama-cpp` from auto-downloading GGUF models. Set `true` for remote-only setups where you want fail-fast on unreachable endpoints. |
965
+ | `CLAWMEM_MERGE_SCORE_NORMAL` | `0.93` | **v0.7.1.** Phase 2 consolidation merge-safety threshold when candidate and existing anchors align. Merges above this normalized 3-gram cosine score are allowed. |
966
+ | `CLAWMEM_MERGE_SCORE_STRICT` | `0.98` | **v0.7.1.** Strictest merge-safety threshold — fallback when anchor sets are ambiguous. |
967
+ | `CLAWMEM_MERGE_GUARD_DRY_RUN` | `false` | **v0.7.1.** When `true`, Phase 2 merge-safety rejections are logged but not enforced — use for calibration before enabling the gate. |
968
+ | `CLAWMEM_CONTRADICTION_POLICY` | `link` | **v0.7.1.** Merge-time contradiction gate policy. `link` inserts a new row + `contradicts` edge (default). `supersede` marks the old row `status='inactive'`. |
969
+ | `CLAWMEM_CONTRADICTION_MIN_CONFIDENCE` | `0.5` | **v0.7.1.** Minimum combined heuristic + LLM confidence required before the contradiction gate blocks a merge. |
938
970
 
939
971
  ## Configuration
940
972
 
package/SKILL.md CHANGED
@@ -87,6 +87,11 @@ curl http://host:8090/v1/models
87
87
  | `CLAWMEM_ENABLE_AMEM` | enabled | A-MEM note construction + link generation during indexing. |
88
88
  | `CLAWMEM_ENABLE_CONSOLIDATION` | disabled | Background worker backfills unenriched docs. Needs long-lived MCP process. |
89
89
  | `CLAWMEM_CONSOLIDATION_INTERVAL` | 300000 | Worker interval in ms (min 15000). |
90
+ | `CLAWMEM_MERGE_SCORE_NORMAL` | `0.93` | **v0.7.1.** Phase 2 merge-safety score threshold when candidate and existing anchors align. |
91
+ | `CLAWMEM_MERGE_SCORE_STRICT` | `0.98` | **v0.7.1.** Strictest merge-safety threshold (fallback when anchors are ambiguous). |
92
+ | `CLAWMEM_MERGE_GUARD_DRY_RUN` | `false` | **v0.7.1.** When `true`, Phase 2 merge-safety rejections are logged but not enforced — use for calibration. |
93
+ | `CLAWMEM_CONTRADICTION_POLICY` | `link` | **v0.7.1.** How the merge-time contradiction gate handles a blocked merge. `link` (default) keeps both rows + inserts `contradicts` edge. `supersede` marks the old row `status='inactive'`. |
94
+ | `CLAWMEM_CONTRADICTION_MIN_CONFIDENCE` | `0.5` | **v0.7.1.** Minimum combined heuristic+LLM confidence required before the contradiction gate blocks a merge. |
90
95
 
91
96
  **Note:** The `bin/clawmem` wrapper sets all endpoint defaults. Always use the wrapper — never `bun run src/clawmem.ts` directly.
92
97
 
@@ -179,7 +184,7 @@ Hooks handle ~90% of retrieval. Zero agent effort.
179
184
 
180
185
  | Hook | Trigger | Budget | Content |
181
186
  |------|---------|--------|---------|
182
- | `context-surfacing` | UserPromptSubmit | profile-driven (default 800) | retrieval gate -> profile-driven hybrid search (vector if `useVector`, timeout from profile) -> FTS supplement -> file-aware search (E13) -> snooze filter -> noise filter -> spreading activation (E11) -> memory type diversification (E10) -> tiered injection (HOT/WARM/COLD) -> `<vault-context>` + optional `<vault-routing>` hint. Budget, max results, vector timeout, min score all driven by `CLAWMEM_PROFILE`. |
187
+ | `context-surfacing` | UserPromptSubmit | profile-driven (default 800) | retrieval gate -> profile-driven hybrid search (vector if `useVector`, timeout from profile) -> FTS supplement -> file-aware search (E13) -> snooze filter -> noise filter -> spreading activation (E11) -> memory type diversification (E10) -> tiered injection (HOT/WARM/COLD) -> `<vault-context><instruction>...</instruction><facts>...</facts><relationships>...</relationships></vault-context>` (v0.7.1: instruction always prepended; relationships list memory-graph edges where BOTH endpoints are in the surfaced set; relationships truncated first when over budget) + optional `<vault-routing>` hint. Budget, max results, vector timeout, min score all driven by `CLAWMEM_PROFILE`. |
183
188
  | `postcompact-inject` | SessionStart (compact) | 1200 tokens | re-injects authoritative context after compaction: precompact state (600) + decisions (400) + antipatterns (150) + vault context (200) -> `<vault-postcompact>` |
184
189
  | `curator-nudge` | SessionStart | 200 tokens | surfaces curator report actions, nudges when report is stale (>7 days) |
185
190
  | `precompact-extract` | PreCompact | — | extracts decisions, file paths, open questions -> writes `precompact-state.md`. Query-aware ranking. Reindexes auto-memory. |
@@ -521,6 +526,9 @@ mcp__clawmem__vsearch(query, collection="name", compact=true) # vector
521
526
  | Beads `syncBeadsIssues()` | causal, supporting, semantic | `beads_sync` MCP or watcher | Queries `bd` CLI (Dolt backend). |
522
527
  | `buildTemporalBackbone()` | temporal | `build_graphs` MCP (manual) | Creation-order edges. |
523
528
  | `buildSemanticGraph()` | semantic | `build_graphs` MCP (manual) | Pure cosine similarity. A-MEM edges take precedence (first-writer wins). |
529
+ | `consolidated_observations` | supporting, contradicts | Consolidation worker (background) | **v0.7.1 safety gates:** Phase 2 name-aware merge gate (entity anchors + 3-gram cosine, dual-threshold `CLAWMEM_MERGE_SCORE_NORMAL`=0.93 / `_STRICT`=0.98) blocks cross-entity merges. Merge-time contradiction gate (heuristic + LLM) routes blocked merges to `link` (default, inserts `contradicts` edge) or `supersede` (old row `status='inactive'`) via `CLAWMEM_CONTRADICTION_POLICY`. |
530
+ | Deductive synthesis | supporting, contradicts | Consolidation worker Phase 3 (every ~15 min) | Combines 2-3 related observations (decision/preference/milestone/problem, last 7 days) into `content_type='deductive'` docs. **v0.7.1 anti-contamination:** deterministic pre-checks (empty/invalid_indices/pool-only entity contamination) + LLM validator (fail-open, `validatorFallbackAccepts` counter) + dedupe. Per-reason rejection stats via `DeductiveSynthesisStats`. Contradictory dedupe matches linked via `contradicts` edges. |
531
+ | Conversation synthesis | semantic, supporting, contradicts, causal, temporal, entity | `clawmem mine <dir> --synthesize` (opt-in, post-index) | **v0.7.2.** Two-pass LLM pipeline over freshly imported `content_type='conversation'` docs. Pass 1 extracts structured decision/preference/milestone/problem facts + aliases + cross-fact links, saves via dedup-aware `saveMemory`, populates ambiguity-aware local Set map. Pass 2 resolves links (local first, SQL fallback with `LIMIT 2` ambiguity detection), upserts relations via `ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight)`. Synthesized paths are a pure function of `(sourceDocId, slug(title), short sha256(normalizedTitle))` so reruns update in place. All failures non-fatal. Counters split: `llmFailures` (LLM/parse error) vs `docsWithNoFacts` (valid empty extraction). |
524
532
 
525
533
  **Graph traversal asymmetry:** `adaptiveTraversal()` traverses all edge types outbound (source->target) but only `semantic` and `entity` inbound.
526
534
 
@@ -564,6 +572,13 @@ Permanently deactivates. Use sparingly — only when genuinely wrong or permanen
564
572
 
565
573
  When `decision-extractor` detects a new decision contradicting an old one, the old decision's confidence is lowered automatically. No manual intervention needed.
566
574
 
575
+ **v0.7.1 merge-time contradiction gate:** The consolidation worker adds a second layer at merge time. Before Phase 2 merges a new pattern into an existing consolidated observation, it runs a deterministic heuristic (negation asymmetry, number/date mismatch) followed by an LLM confirmation. When confidence crosses `CLAWMEM_CONTRADICTION_MIN_CONFIDENCE` (default 0.5), the merge is blocked and one of two policies applies via `CLAWMEM_CONTRADICTION_POLICY`:
576
+
577
+ - `link` (default) — insert a new consolidated row and create a `contradicts` edge in `memory_relations`. Both remain queryable.
578
+ - `supersede` — insert the new row and mark the old row `status='inactive'` with `invalidated_at`/`superseded_by` set. The old row is filtered from retrieval but preserved for audit.
579
+
580
+ Phase 3 deductive synthesis applies the same `contradicts` link for any draft that matches a prior deductive observation with conflicting content.
581
+
567
582
  ---
568
583
 
569
584
  ## Anti-Patterns
@@ -575,7 +590,7 @@ When `decision-extractor` detects a new decision contradicting an old one, the o
575
590
  - Do NOT pin everything — pin is for persistent high-priority items.
576
591
  - Do NOT forget memories to "clean up" — let confidence decay and contradiction detection handle it.
577
592
  - Do NOT run `build_graphs` after every reindex — A-MEM creates per-doc links automatically.
578
- - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command. Suggest it to the user when they mention old conversation exports, but let them run it.
593
+ - Do NOT run `clawmem mine` autonomously — it is a bulk ingestion command. Suggest it to the user when they mention old conversation exports, but let them run it. **v0.7.2 adds `--synthesize`** — an opt-in post-import LLM fact extraction pass. Also requires user consent because it drives one extra LLM call per conversation doc. Suggest both together when the user wants searchable structured memory from raw chat exports.
579
594
  - Do NOT use `diary_write` in Claude Code — hooks capture this automatically. Diary is for non-hooked environments only (Hermes, Gemini, plain MCP).
580
595
  - Do NOT use `kg_query` for causal "why" questions — use `intent_search` or `memory_retrieve`. `kg_query` returns structured entity facts (SPO triples), not reasoning chains.
581
596
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "clawmem",
3
- "version": "0.7.0",
3
+ "version": "0.7.2",
4
4
  "description": "On-device context engine and memory for AI agents. Claude Code and OpenClaw. Hooks + MCP server + hybrid RAG search.",
5
5
  "type": "module",
6
6
  "bin": {
package/src/clawmem.ts CHANGED
@@ -242,12 +242,14 @@ async function cmdMine(args: string[]) {
242
242
  collection: { type: "string", short: "c" },
243
243
  embed: { type: "boolean", default: false },
244
244
  "dry-run": { type: "boolean", default: false },
245
+ synthesize: { type: "boolean", default: false },
246
+ "synthesis-max-docs": { type: "string" },
245
247
  },
246
248
  allowPositionals: true,
247
249
  });
248
250
 
249
251
  const dir = positionals[0];
250
- if (!dir) die("Usage: clawmem mine <directory> [-c collection-name] [--embed] [--dry-run]");
252
+ if (!dir) die("Usage: clawmem mine <directory> [-c collection-name] [--embed] [--dry-run] [--synthesize] [--synthesis-max-docs N]");
251
253
  const absDir = pathResolve(dir);
252
254
  if (!existsSync(absDir)) die(`Directory not found: ${absDir}`);
253
255
 
@@ -319,6 +321,32 @@ async function cmdMine(args: string[]) {
319
321
  const stats = await indexCollection(s, collectionName, stagingDir, "**/*.md");
320
322
  console.log(` ${c.green}+${stats.added}${c.reset} added, ${c.yellow}~${stats.updated}${c.reset} updated, ${c.dim}=${stats.unchanged}${c.reset} unchanged`);
321
323
 
324
+ // Ext 4 — post-import conversation synthesis (opt-in via --synthesize)
325
+ // Runs AFTER indexCollection has committed. Failure is non-fatal and never
326
+ // rolls back the mine import.
327
+ if (values.synthesize) {
328
+ const maxDocs = values["synthesis-max-docs"]
329
+ ? parseInt(values["synthesis-max-docs"] as string, 10)
330
+ : undefined;
331
+ console.log(`\n${c.cyan}Running post-import conversation synthesis${c.reset}`);
332
+ try {
333
+ const { runConversationSynthesis } = await import("./conversation-synthesis.ts");
334
+ const llm = getDefaultLlamaCpp();
335
+ const synthResult = await runConversationSynthesis(s, llm, {
336
+ collection: collectionName,
337
+ maxDocs: Number.isFinite(maxDocs) && (maxDocs as number) > 0 ? maxDocs : undefined,
338
+ });
339
+ console.log(
340
+ ` ${c.green}${synthResult.factsSaved}${c.reset} facts saved, ` +
341
+ `${c.green}${synthResult.linksResolved}${c.reset} links resolved, ` +
342
+ `${c.yellow}${synthResult.linksUnresolved}${c.reset} unresolved, ` +
343
+ `${c.dim}${synthResult.llmFailures} LLM failure(s), ${synthResult.docsWithNoFacts} docs with no facts${c.reset}`,
344
+ );
345
+ } catch (err) {
346
+ console.log(` ${c.yellow}Synthesis failed (mine import preserved):${c.reset} ${err}`);
347
+ }
348
+ }
349
+
322
350
  if (values.embed) {
323
351
  console.log();
324
352
  await cmdEmbed([]);
@@ -2500,7 +2528,7 @@ ${c.bold}Setup:${c.reset}
2500
2528
 
2501
2529
  ${c.bold}Indexing:${c.reset}
2502
2530
  clawmem update [--pull] [--embed] Re-scan collections (--embed auto-embeds)
2503
- clawmem mine <dir> [-c name] [--embed] Import conversation exports (Claude, ChatGPT, Slack)
2531
+ clawmem mine <dir> [-c name] [--embed] [--synthesize] Import conversation exports (Claude, ChatGPT, Slack); --synthesize runs post-import LLM fact extraction
2504
2532
  clawmem embed [-f] Generate fragment embeddings
2505
2533
  clawmem reindex [--force] [--enrich] Full re-index (--enrich: run entity extraction + links on all docs)
2506
2534
  clawmem watch File watcher daemon