npm - cozo-memory - Versions diffs - 1.1.2 → 1.1.4 - Mend

cozo-memory 1.1.2 → 1.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

package/README.md +356 -5
package/dist/adaptive-retrieval.js +520 -0
package/dist/db-inspect.js +25 -0
package/dist/dynamic-fusion.js +602 -0
package/dist/hybrid-search.js +4 -4
package/dist/index.js +699 -23
package/dist/inference-engine.js +104 -76
package/dist/logical-edges-service.js +316 -0
package/dist/multi-hop-vector-pivot.js +390 -0
package/dist/temporal-embedding-service.js +313 -0
package/dist/test-adaptive-integration.js +84 -0
package/dist/test-adaptive-retrieval.js +135 -0
package/dist/test-compaction.js +91 -0
package/dist/test-dynamic-fusion.js +231 -0
package/dist/test-fact-lifecycle.js +82 -0
package/dist/test-logical-edges.js +282 -0
package/dist/test-manual-compact.js +95 -0
package/dist/test-multi-hop-vector-pivot-v2.js +239 -0
package/dist/test-multi-hop-vector-pivot.js +240 -0
package/dist/test-temporal-embeddings.js +123 -0
package/dist/test-validity-retract.js +45 -0
package/dist/test-validity-rm.js +49 -0
package/package.json +1 -1

package/README.md CHANGED

@@ -59,11 +59,19 @@ Now you can add the server to your MCP client (e.g. Claude Desktop).
 🔍 **Hybrid Search (since v0.7)** - Combines semantic search (HNSW), full-text search (FTS), and graph signals via Reciprocal Rank Fusion (RRF)
-🕸️ **Graph-RAG & Graph-Walking (v1.7/v2.0)** - Hierarchical retrieval with community detection and summarization; recursive traversals using optimized Datalog algorithms
+🔀 **Dynamic Fusion Framework (v2.3)** - Advanced 4-path retrieval system combining Dense Vector, Sparse Vector, FTS, and Graph traversal with configurable weights and fusion strategies (RRF, Weighted Sum, Max, Adaptive)
+🧠 **Logical Edges from Knowledge Graph (v1.0)** - Metadata-driven implicit relationship discovery with 5 patterns: same category, same type, hierarchical, contextual, and transitive logical edges
+🔀 **Multi-Hop Reasoning with Vector Pivots (v2.5)** - Logic-aware Retrieve-Reason-Prune pipeline using vector search as springboard for graph traversal with helpfulness scoring and pivot depth security
+⏳ **Temporal Graph Neural Networks (v2.4)** - Time-aware node embeddings capturing historical context, temporal smoothness, and recency-weighted aggregation using Time2Vec encoding and multi-signal fusion
 🧠 **Agentic Retrieval Layer (v2.0)** - Auto-routing engine that analyzes query intent via local LLM to select the optimal search strategy (Vector, Graph, or Community)
-🧠 **Multi-Level Memory (v2.0)** - Context-aware memory system with built-in session and task management
+**GraphRAG-R1 Adaptive Retrieval (v2.6)** - Intelligent retrieval system with Progressive Retrieval Attenuation (PRA) and Cost-Aware F1 (CAF) scoring that automatically selects optimal strategies based on query complexity and learns from historical performance
+🧠 **Multi-Level Memory (v2.0)** - Context-aware memory system with built-in session and task management
 🎯 **Tiny Learned Reranker (v2.0)** - Integrated Cross-Encoder model (`ms-marco-MiniLM-L-6-v2`) for ultra-precise re-ranking of top search results
@@ -81,6 +89,10 @@ Now you can add the server to your MCP client (e.g. Claude Desktop).
 🧹 **Janitor Service** - LLM-backed automatic cleanup with hierarchical summarization, observation pruning, and **automated session compression**
+🗜️ **Context Compaction & Auto-Summarization (v2.2)** - Automatic and manual memory consolidation with progressive summarization and LLM-backed Executive Summaries
+🧠 **Fact Lifecycle Management (v2.1)** - Native "soft-deletion" via CozoDB Validity retraction; invalidated facts are hidden from current views but preserved in history for audit trails
 👤 **User Preference Profiling** - Persistent user preferences with automatic 50% search boost
 🔍 **Near-Duplicate Detection** - Automatic LSH-based deduplication to avoid redundancy
@@ -129,6 +141,10 @@ Now you can add the server to your MCP client (e.g. Claude Desktop).
 - **Data Integrity (Trigger Concept)**: Prevents invalid states like self-references in relationships (Self-Loops) directly at creation.
 - **Hierarchical Summarization**: The Janitor condenses old fragments into "Executive Summary" nodes to preserve the "Big Picture" long-term.
 - **User Preference Profiling**: A specialized `global_user_profile` entity stores persistent preferences (likes, work style), which receive a **50% score boost** in every search.
+- **Fact Lifecycle Management (v2.1)**: Uses CozoDB's native **Validity** retraction mechanism to manage the lifecycle of information. Instead of destructive deletions, facts are invalidated by asserting a `[timestamp, false]` record. This ensures:
+  1. **Auditability**: You can always "time-travel" back to see what the system knew at any given point.
+  2. **Consistency**: All standard retrieval (Search, Graph-RAG, Inference) uses the `@ "NOW"` filter to automatically exclude retracted facts.
+  3. **Atomic Retraction**: Invalidation can be part of a multi-statement transaction, allowing for clean "update" patterns (invalidate old + insert new).
 - **All Local**: Embeddings via Transformers/ONNX; no external embedding service required.
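The Validity-based retraction described under Fact Lifecycle Management can be illustrated without a database. The sketch below is a plain in-memory model, not CozoDB's API and not this package's code: `FactVersion`, `assertFact`, `invalidateFact`, and `isVisibleAt` are hypothetical names. It assumes each fact keeps an append-only list of `[timestamp, asserted]` records and that a read at time `t` uses the latest record at or before `t`.

```typescript
// In-memory model of validity-style soft deletion (illustrative only).
type FactVersion = { timestamp: number; asserted: boolean };

const factHistory = new Map<string, FactVersion[]>();

function record(factId: string, version: FactVersion): void {
  const versions = factHistory.get(factId) ?? [];
  versions.push(version);
  factHistory.set(factId, versions);
}

// Assert a fact as true from `timestamp` onward.
function assertFact(factId: string, timestamp: number): void {
  record(factId, { timestamp, asserted: true });
}

// Soft-delete: append a [timestamp, false] record instead of removing anything.
function invalidateFact(factId: string, timestamp: number): void {
  record(factId, { timestamp, asserted: false });
}

// A fact is visible at `at` if its latest record at or before `at` is an assertion.
function isVisibleAt(factId: string, at: number): boolean {
  let latest: FactVersion | undefined;
  for (const v of factHistory.get(factId) ?? []) {
    if (v.timestamp <= at && (latest === undefined || v.timestamp >= latest.timestamp)) {
      latest = v;
    }
  }
  return latest !== undefined && latest.asserted;
}

assertFact("obs-1", 100);
invalidateFact("obs-1", 200);
console.log(isVisibleAt("obs-1", 150)); // true: reading "as of" a time before the retraction
console.log(isVisibleAt("obs-1", 250)); // false: hidden from the current view
```

Because nothing is ever removed from the history, the "update" pattern from point 3 amounts to one `invalidateFact` call followed by one `assertFact` call on the replacement fact.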
 ## Positioning & Comparison
@@ -350,6 +366,153 @@ EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 npm run download-model
 **Note:** Changing models requires re-embedding existing data. The model is downloaded once on first use.
+## Framework Adapters
+Official adapters for seamless integration with popular AI frameworks:
+### 🦜 LangChain Adapter
+```bash
+npm install @cozo-memory/langchain @cozo-memory/adapters-core
+```
+```typescript
+import { CozoMemoryChatHistory, CozoMemoryRetriever } from '@cozo-memory/langchain';
+import { BufferMemory } from 'langchain/memory';
+// Chat history with session management
+const chatHistory = new CozoMemoryChatHistory({
+  sessionName: 'user-123'
+});
+const memory = new BufferMemory({ chatHistory });
+// Retriever with hybrid search or Graph-RAG
+const retriever = new CozoMemoryRetriever({
+  useGraphRAG: true,
+  graphRAGDepth: 2
+});
+```
+### 🦙 LlamaIndex Adapter
+```bash
+npm install @cozo-memory/llamaindex @cozo-memory/adapters-core
+```
+```typescript
+import { CozoVectorStore } from '@cozo-memory/llamaindex';
+import { VectorStoreIndex } from 'llamaindex';
+// Vector store with Graph-RAG support
+const vectorStore = new CozoVectorStore({
+  useGraphRAG: true
+});
+const index = await VectorStoreIndex.fromDocuments(
+  documents,
+  { vectorStore }
+);
+```
+**Features:**
+- ✅ Persistent chat history (LangChain)
+- ✅ Hybrid search retrieval (both)
+- ✅ Graph-RAG mode (both)
+- ✅ Session management (LangChain)
+- ✅ Vector store operations (LlamaIndex)
+**Documentation:** See [adapters/README.md](./adapters/README.md) for complete examples and API reference.
+## Temporal Graph Neural Networks (v2.4)
+CozoDB Memory now includes **Temporal Graph Neural Network (TGNN) embeddings** that capture time-aware node representations combining historical context, temporal smoothness, and graph structure.
+### What are Temporal Embeddings?
+Traditional embeddings are static snapshots. Temporal embeddings evolve over time, capturing:
+1. **Historical Context** - Weighted aggregation of past observations with exponential decay
+2. **Temporal Smoothness** - Recency-weighted signals ensure gradual changes, not sudden jumps
+3. **Time Encoding** - Time2Vec-inspired sinusoidal encoding captures periodicity and time differences
+4. **Neighborhood Aggregation** - Related entities influence the embedding through weighted graph signals
+### Architecture
+```
+Entity Embedding = Fuse(
+  content_embedding (0.4),      # Semantic meaning
+  temporal_encoding (0.2),      # Time information (Time2Vec)
+  historical_context (0.2),     # Past observations (exponential decay)
+  neighborhood_agg (0.2)        # Related entities (graph signals)
+)
+```
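The README does not say how `Fuse` combines the four signals. A minimal sketch, assuming it is a weighted sum of equally sized vectors followed by L2 normalization, could look like the following; the `Signals` type and the `fuse` function are hypothetical and not part of the package's API.

```typescript
// Weighted fusion of four equally sized signal vectors (illustrative only).
type Signals = {
  content: number[];
  temporal: number[];
  historical: number[];
  neighborhood: number[];
};

// Weights taken from the architecture diagram above.
const WEIGHTS = { content: 0.4, temporal: 0.2, historical: 0.2, neighborhood: 0.2 };

function fuse(signals: Signals): number[] {
  const dim = signals.content.length;
  const sum = new Array<number>(dim).fill(0);
  for (const key of Object.keys(WEIGHTS) as (keyof Signals)[]) {
    const vec = signals[key];
    if (vec.length !== dim) throw new Error(`dimension mismatch for ${key}`);
    vec.forEach((x, i) => {
      sum[i] = (sum[i] ?? 0) + WEIGHTS[key] * x;
    });
  }
  // L2-normalize so later cosine comparisons do not depend on magnitude.
  const norm = Math.sqrt(sum.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? sum : sum.map((x) => x / norm);
}

const example = fuse({
  content: [1, 0],
  temporal: [0, 1],
  historical: [0, 1],
  neighborhood: [0, 1],
});
console.log(example); // a unit-length 2D vector
```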
+### Key Features
+- **Time2Vec Encoding** - Sinusoidal functions capture temporal patterns without discretization
+- **Exponential Decay Weighting** - Recent observations matter more (30-day half-life)
+- **Multi-Signal Fusion** - Combines content, temporal, historical, and graph signals
+- **Confidence Scoring** - Reflects data freshness and completeness (0-1 scale)
+- **Memory Caching** - Efficient temporal state for multi-hop traversals
+- **Time-Travel Support** - Generate embeddings at any historical timepoint via CozoDB Validity
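Two of the features listed above reduce to small formulas. The sketch below shows a 30-day half-life weight and a Time2Vec-style encoding (one linear component followed by sinusoidal ones). In the original Time2Vec formulation the frequencies and phases are learned; here they are passed in as fixed values purely for illustration. `decayWeight` and `time2vec` are hypothetical names, not the package's API.

```typescript
const HALF_LIFE_DAYS = 30;

// Weight halves every 30 days: 1 at age 0, 0.5 at 30 days, 0.25 at 60 days.
function decayWeight(ageDays: number): number {
  return Math.pow(0.5, Math.max(0, ageDays) / HALF_LIFE_DAYS);
}

// Time2Vec-style encoding: component 0 is linear in t, the rest are sin(w * t + phase).
function time2vec(t: number, frequencies: number[], phases: number[]): number[] {
  if (frequencies.length === 0 || frequencies.length !== phases.length) {
    throw new Error("frequencies and phases must be non-empty and the same length");
  }
  return frequencies.map((w, i) => {
    const arg = w * t + (phases[i] ?? 0);
    return i === 0 ? arg : Math.sin(arg);
  });
}

console.log(decayWeight(30)); // 0.5
console.log(time2vec(2, [1, Math.PI / 4], [0, 0])); // linear term 2, then sin(pi / 2)
```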
+### Usage Example
+```typescript
+import { TemporalEmbeddingService } from 'cozo-memory';
+const temporalService = new TemporalEmbeddingService(
+  embeddingService,
+  dbQuery
+);
+// Generate embedding at current time
+const embedding = await temporalService.generateTemporalEmbedding(
+  entityId,
+  new Date()
+);
+// Or at a historical timepoint
+const pastEmbedding = await temporalService.generateTemporalEmbedding(
+  entityId,
+  new Date('2026-02-01')
+);
+// Compare temporal trajectories
+const similarity = cosineSimilarity(
+  embedding.embedding,
+  pastEmbedding.embedding
+);
+```
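The usage example calls `cosineSimilarity` without importing it, and the README does not say where it comes from. If no helper is available, a minimal standalone version, assuming both embeddings are plain `number[]` arrays of equal length, could be:

```typescript
// Cosine similarity between two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

console.log(cosineSimilarity([1, 0], [0, 1])); // 0: orthogonal vectors
```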
+### Confidence Scoring
+Confidence reflects data quality and freshness:
+```
+Base: 0.5
++ Recent entity (< 7 days): +0.3
++ Many observations (> 5): +0.15
++ Well-connected (> 10 relations): +0.15
+= Max: 1.0
+```
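Read literally, the rules above are an additive score capped at 1.0 (the three bonuses sum to 1.1 on top of the base). A sketch under that reading follows; the `EntityStats` shape is an assumption, and the README does not say whether "recent" refers to creation time or last update.

```typescript
type EntityStats = { ageDays: number; observationCount: number; relationCount: number };

// Additive confidence score following the table above, capped at 1.0.
function confidence(stats: EntityStats): number {
  let score = 0.5; // base
  if (stats.ageDays < 7) score += 0.3; // recent entity
  if (stats.observationCount > 5) score += 0.15; // many observations
  if (stats.relationCount > 10) score += 0.15; // well-connected
  return Math.min(1.0, score);
}

console.log(confidence({ ageDays: 100, observationCount: 0, relationCount: 0 })); // 0.5
console.log(confidence({ ageDays: 1, observationCount: 6, relationCount: 11 })); // 1
```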
+### Research Foundation
+Based on research published between 2021 and 2026:
+- **ACM Temporal Graph Learning Primer** (2025) - Comprehensive TGNN taxonomy
+- **TempGNN** (2023) - Temporal embeddings for dynamic session-based recommendations
+- **Time-Aware Graph Embedding** (2021) - Temporal smoothness and task-oriented approaches
+- **Allan-Poe** (2025) - All-in-One Graph-Based Hybrid Search with dynamic fusion
+### Testing
+```bash
+npx ts-node src/test-temporal-embeddings.ts
+```
 ## Start / Integration
 ### MCP Server (stdio)
@@ -523,10 +686,10 @@ The interface is reduced to **4 consolidated tools**. The concrete operation is
 | Tool | Purpose | Key Actions |
 |------|---------|-------------|
-| `mutate_memory` | Write operations | create_entity, update_entity, delete_entity, add_observation, create_relation, start_session, stop_session, start_task, stop_task, run_transaction, add_inference_rule, ingest_file |
-| `query_memory` | Read operations | search, advancedSearch, context, entity_details, history, graph_rag, graph_walking, agentic_search (Multi-Level Context support) |
+| `mutate_memory` | Write operations | create_entity, update_entity, delete_entity, add_observation, create_relation, start_session, stop_session, start_task, stop_task, run_transaction, add_inference_rule, ingest_file, invalidate_observation, invalidate_relation |
+| `query_memory` | Read operations | search, advancedSearch, context, entity_details, history, graph_rag, graph_walking, agentic_search, dynamic_fusion, adaptive_retrieval (Multi-Level Context support) |
 | `analyze_graph` | Graph analysis | explore, communities, pagerank, betweenness, hits, shortest_path, bridge_discovery, semantic_walk, infer_relations |
-| `manage_system` | Maintenance | health, metrics, export_memory, import_memory, snapshot_create, snapshot_list, snapshot_diff, cleanup, reflect, summarize_communities, clear_memory |
+| `manage_system` | Maintenance | health, metrics, export_memory, import_memory, snapshot_create, snapshot_list, snapshot_diff, cleanup, defrag, reflect, summarize_communities, clear_memory, compact |
 ### mutate_memory (Write)
@@ -543,6 +706,8 @@ Actions:
 - `run_transaction`: `{ operations: Array<{ action, params }> }` **(New v1.2)**: Executes multiple operations atomically.
 - `add_inference_rule`: `{ name, datalog }`
 - `ingest_file`: `{ format, file_path?, content?, entity_id?, entity_name?, entity_type?, chunking?, metadata?, observation_metadata?, deduplicate?, max_observations? }`
+- `invalidate_observation`: `{ observation_id }` **(New v2.1)**: Retracts an observation using Validity `[now, false]`.
+- `invalidate_relation`: `{ from_id, to_id, relation_type }` **(New v2.1)**: Retracts a relationship using Validity `[now, false]`.
   - `format` options: `"markdown"`, `"json"`, `"pdf"` **(New v1.9)**
   - `file_path`: Optional path to file on disk (alternative to `content` parameter)
   - `content`: File content as string (required if `file_path` not provided)
@@ -635,6 +800,8 @@ Actions:
 - `graph_rag`: `{ query, max_depth?, limit?, filters?, rerank? }` Graph-based reasoning. Finds vector seeds (with inline filtering) first and then expands transitive relationships. Uses recursive Datalog for efficient BFS expansion.
 - `graph_walking`: `{ query, start_entity_id?, max_depth?, limit? }` (v1.7) Recursive semantic graph search. Starts at vector seeds or a specific entity and follows relationships to other semantically relevant entities. Ideal for deeper path exploration.
 - `agentic_search`: `{ query, limit?, rerank? }` **(New v2.0)**: **Auto-Routing Search**. Uses a local LLM (Ollama) to analyze query intent and automatically routes it to the most appropriate strategy (`vector_search`, `graph_walk`, or `community_summary`).
+- `adaptive_retrieval`: `{ query, limit? }` **(New v2.6)**: **GraphRAG-R1 Adaptive Retrieval**. Intelligent system inspired by GraphRAG-R1 (Yu et al., WWW 2026) that automatically classifies query complexity (Simple/Moderate/Complex/Exploratory) and selects the optimal retrieval strategy from 5 options (Vector-Only, Graph-Walk, Hybrid-Fusion, Community-Expansion, Semantic-Walk). Features Progressive Retrieval Attenuation (PRA) to prevent over-retrieval and Cost-Aware F1 (CAF) scoring to balance answer quality with computational cost. Learns from usage and adapts strategy selection based on historical performance stored in CozoDB.
+- `dynamic_fusion`: `{ query, config?, limit? }` **(New v2.3)**: **Dynamic Fusion Framework**. Combines 4 retrieval paths (Dense Vector, Sparse Vector, FTS, Graph) with configurable weights and fusion strategies. Inspired by Allan-Poe (arXiv:2511.00855).
 - `get_relation_evolution`: `{ from_id, to_id?, since?, until? }` (in `analyze_graph`) Shows temporal development of relationships including time range filter and diff summary.
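The `adaptive_retrieval` entry above describes cost-aware scoring and learning from past performance but gives no formulas. The sketch below shows one plausible shape for that idea: answer quality discounted exponentially by the number of retrieval calls, and strategy selection by best mean score over stored outcomes. The discount form, the `penalty` default, and every name here are illustrative assumptions; none of it is taken from the GraphRAG-R1 paper or from this package's code, and Progressive Retrieval Attenuation is not modeled.

```typescript
type Strategy =
  | "vector_only"
  | "graph_walk"
  | "hybrid_fusion"
  | "community_expansion"
  | "semantic_walk";

type Outcome = { strategy: Strategy; f1: number; retrievalCalls: number };

// Illustrative cost-aware score: F1 discounted exponentially per retrieval call.
function costAwareScore(f1: number, retrievalCalls: number, penalty = 0.1): number {
  return f1 * Math.exp(-penalty * retrievalCalls);
}

// Pick the strategy with the highest mean cost-aware score; use `fallback` with no data.
function bestStrategy(past: Outcome[], fallback: Strategy): Strategy {
  const totals = new Map<Strategy, { sum: number; n: number }>();
  for (const o of past) {
    const t = totals.get(o.strategy) ?? { sum: 0, n: 0 };
    t.sum += costAwareScore(o.f1, o.retrievalCalls);
    t.n += 1;
    totals.set(o.strategy, t);
  }
  let best = fallback;
  let bestMean = -Infinity;
  totals.forEach((t, strategy) => {
    const mean = t.sum / t.n;
    if (mean > bestMean) {
      best = strategy;
      bestMean = mean;
    }
  });
  return best;
}

const pastOutcomes: Outcome[] = [
  { strategy: "vector_only", f1: 0.8, retrievalCalls: 1 },
  { strategy: "hybrid_fusion", f1: 0.9, retrievalCalls: 5 },
];
console.log(bestStrategy(pastOutcomes, "hybrid_fusion")); // "vector_only": cheaper despite lower F1
```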
 Important Details:
@@ -674,6 +841,81 @@ Examples:
 { "action": "context", "query": "What is Alice working on right now?", "context_window": 20 }
 ```
+#### Dynamic Fusion Framework (v2.3)
+The Dynamic Fusion Framework combines 4 retrieval paths with configurable weights and fusion strategies:
+**Retrieval Paths:**
+1. **Dense Vector Search (HNSW)**: Semantic similarity via embeddings
+2. **Sparse Vector Search**: Keyword-based matching with TF-IDF scoring
+3. **Full-Text Search (FTS)**: BM25 scoring on entity names
+4. **Graph Traversal**: Multi-hop relationship expansion from vector seeds
+**Fusion Strategies:**
+- `rrf` (Reciprocal Rank Fusion): Combines rankings with position-based scoring
+- `weighted_sum`: Direct weighted combination of scores
+- `max`: Takes maximum score across all paths
+- `adaptive`: Query-dependent weighting (future enhancement)
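The first three strategies can be sketched compactly. The code below is an illustration of the general techniques, not the package's implementation: the `Hit` and `RetrievalPath` types and the function names are hypothetical, and applying the per-path weight inside RRF is an assumption (with all weights set to 1 it reduces to standard RRF).

```typescript
type Hit = { id: string; score: number };
type RetrievalPath = { weight: number; hits: Hit[] }; // hits sorted best-first

// Reciprocal Rank Fusion: each path adds weight / (k + rank), with rank starting at 1.
function fuseRrf(paths: RetrievalPath[], k = 60): Map<string, number> {
  const fused = new Map<string, number>();
  for (const path of paths) {
    path.hits.forEach((hit, index) => {
      fused.set(hit.id, (fused.get(hit.id) ?? 0) + path.weight / (k + index + 1));
    });
  }
  return fused;
}

// Weighted sum: add weight * raw score from every path that returned the item.
function fuseWeightedSum(paths: RetrievalPath[]): Map<string, number> {
  const fused = new Map<string, number>();
  for (const path of paths) {
    for (const hit of path.hits) {
      fused.set(hit.id, (fused.get(hit.id) ?? 0) + path.weight * hit.score);
    }
  }
  return fused;
}

// Max: keep the single best raw score seen on any path.
function fuseMax(paths: RetrievalPath[]): Map<string, number> {
  const fused = new Map<string, number>();
  for (const path of paths) {
    for (const hit of path.hits) {
      fused.set(hit.id, Math.max(fused.get(hit.id) ?? -Infinity, hit.score));
    }
  }
  return fused;
}

// Turn fused scores into a list of IDs, best first.
function rank(fused: Map<string, number>): string[] {
  return Array.from(fused.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

RRF uses only rank positions, so it tolerates paths whose raw scores are on different scales; `weighted_sum` and `max` compare raw scores directly and so work best when scores are normalized per path first.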
+**Configuration Example:**
+```json
+{
+  "action": "dynamic_fusion",
+  "query": "database with graph capabilities",
+  "limit": 10,
+  "config": {
+    "vector": {
+      "enabled": true,
+      "weight": 0.4,
+      "topK": 20,
+      "efSearch": 100
+    },
+    "sparse": {
+      "enabled": true,
+      "weight": 0.3,
+      "topK": 20,
+      "minScore": 0.1
+    },
+    "fts": {
+      "enabled": true,
+      "weight": 0.2,
+      "topK": 20,
+      "fuzzy": true
+    },
+    "graph": {
+      "enabled": true,
+      "weight": 0.1,
+      "maxDepth": 2,
+      "maxResults": 20,
+      "relationTypes": ["related_to", "uses"]
+    },
+    "fusion": {
+      "strategy": "rrf",
+      "rrfK": 60,
+      "minScore": 0.0,
+      "deduplication": true
+    }
+  }
+}
+```
+**Response includes:**
+- `results`: Fused and ranked results with path contribution details
+- `stats`: Performance metrics including:
+  - `totalResults`: Number of results after fusion
+  - `pathContributions`: Count of results from each path
+  - `fusionTime`: Total execution time
+  - `pathTimes`: Individual execution times per path
+**Use Cases:**
+- **Broad Exploration**: Enable all paths with balanced weights
+- **Precision Search**: High vector weight, low graph weight
+- **Relationship Discovery**: High graph weight with specific relation types
+- **Keyword Matching**: High sparse/FTS weights for exact term matching
 #### Conflict Detection (Status)
 If there are contradictory statements about the status of an entity, a conflict is marked. The system considers **temporal consistency**:
@@ -728,6 +970,16 @@ Actions:
 - `snapshot_list`: `{}`
 - `snapshot_diff`: `{ snapshot_id_a, snapshot_id_b }`
 - `cleanup`: `{ confirm, older_than_days?, max_observations?, min_entity_degree?, model? }`
+- `defrag`: `{ confirm, similarity_threshold?, min_island_size? }` **(New v2.3)**: Memory defragmentation. Reorganizes memory structure by:
+  - **Duplicate Detection**: Finds and merges near-duplicate observations using cosine similarity (threshold 0.8-1.0, default 0.95)
+  - **Island Connection**: Connects small knowledge islands (≤3 nodes) to main graph via semantic bridges
+  - **Orphan Removal**: Deletes orphaned entities without observations or relations
+  - With `confirm: false`: Dry-run mode showing candidates without making changes
+  - With `confirm: true`: Executes defragmentation and returns statistics
+- `compact`: `{ session_id?, entity_id?, model? }` **(New v2.2)**: Manual context compaction. Supports three modes:
+  - **Session Compaction**: `{ session_id, model? }` - Summarizes session observations into 2-3 bullet points and stores them in the user profile
+  - **Entity Compaction**: `{ entity_id, model? }` - Compacts an entity's observations when the threshold is exceeded and creates an Executive Summary
+  - **Global Compaction**: `{}` (no parameters) - Compacts all entities exceeding threshold (default: 20 observations)
 - `summarize_communities`: `{ model?, min_community_size? }` **(New v2.0)**: Triggers the **Hierarchical GraphRAG** pipeline. Recomputes communities, generates thematic summaries via LLM, and stores them as `CommunitySummary` entities.
 - `reflect`: `{ entity_id?, mode?, model? }` Analyzes memory for contradictions and new insights. Supports `summary` (default) and `discovery` (autonomous link refinement) modes.
 - `clear_memory`: `{ confirm }`
@@ -738,6 +990,17 @@ Janitor Cleanup Details:
   - **Hierarchical Summarization**: Detects isolated or old observations, has them summarized by a local LLM (Ollama), and creates a new `ExecutiveSummary` node. Old fragments are deleted to reduce noise while preserving knowledge.
   - **Automated Session Compression**: Automatically identifies inactive sessions, summarizes their activity into a few bullet points, and stores the summary in the User Profile while marking the session as archived.
+Context Compaction Details **(New v2.2)**:
+- **Automatic Compaction**: Triggered automatically when observations exceed threshold (default: 20)
+  - Runs in background during `addObservation`
+  - Uses lock mechanism to prevent concurrent compaction
+- **Manual Compaction**: Available via `compact` action in `manage_system`
+  - **Session Mode**: Summarizes session observations and stores in `global_user_profile`
+  - **Entity Mode**: Compacts specific entity with custom threshold
+  - **Global Mode**: Compacts all entities exceeding threshold
+- **Progressive Summarization**: New observations are merged with existing Executive Summaries instead of simple append
+- **LLM Integration**: Uses Ollama (default model: `demyagent-4b-i1:Q6_K`) for intelligent summarization
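The automatic trigger described above (a threshold check plus a lock against concurrent runs) can be sketched as follows. This is a minimal illustration, not the package's code: `maybeCompact`, the in-memory `compacting` set, and the injected `compact` callback are hypothetical, and "exceed" is read as strictly more than 20 observations.

```typescript
const COMPACTION_THRESHOLD = 20;
const compacting = new Set<string>(); // entity IDs with a compaction currently in flight

// Runs `compact` only if the entity is over the threshold and no other run holds the lock.
async function maybeCompact(
  entityId: string,
  observationCount: number,
  compact: (entityId: string) => Promise<void>
): Promise<boolean> {
  if (observationCount <= COMPACTION_THRESHOLD) return false;
  if (compacting.has(entityId)) return false; // another run is already compacting this entity
  compacting.add(entityId);
  try {
    await compact(entityId);
    return true;
  } finally {
    compacting.delete(entityId); // release the lock even if compaction throws
  }
}
```

A per-entity lock like this only guards against overlap inside one process; coordinating several server instances would need a lock held in the database itself.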
 **Before Janitor:**
 ```
 Entity: Project X
@@ -857,6 +1120,94 @@ Example:
 Returns deletion statistics showing exactly what was removed.
+## Multi-Hop Reasoning with Vector Pivots (v2.5)
+**Research-backed implementation** based on HopRAG (ACL 2025), Retrieval Pivot Attacks (arXiv:2602.08668), and Neo4j GraphRAG patterns.
+### Retrieve-Reason-Prune Pipeline
+1. **RETRIEVE**: Find semantic pivot points via HNSW vector search
+2. **REASON**: Logic-aware graph traversal with relationship context
+3. **PRUNE**: Helpfulness scoring combining textual similarity + logical importance
+4. **AGGREGATE**: Deduplicate and rank entities by occurrence and confidence
+### Key Features
+- **Logic-Aware Traversal**: Considers relationship types, strengths, and PageRank scores
+- **Helpfulness Scoring**: Combines semantic similarity (60%) + logical importance (40%)
+- **Pivot Depth Security**: Enforces max depth limit to prevent uncontrolled graph expansion
+- **Confidence Decay**: Exponential decay (0.9^depth) so that paths lose confidence with each additional hop
+- **Adaptive Pruning**: Filters paths below confidence threshold
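The scoring rules listed above can be combined into a small pruning step. The sketch below is illustrative only: the `Hop` shape and function names are hypothetical, and multiplying the helpfulness score by the depth decay is an assumption, since the README does not state how the two are combined.

```typescript
type Hop = { entityId: string; similarity: number; logicalImportance: number; depth: number };

// Helpfulness: 60% semantic similarity + 40% logical importance.
function helpfulness(similarity: number, logicalImportance: number): number {
  return 0.6 * similarity + 0.4 * logicalImportance;
}

// Confidence shrinks by a factor of 0.9 for every hop away from the pivot.
function hopConfidence(hop: Hop): number {
  return helpfulness(hop.similarity, hop.logicalImportance) * Math.pow(0.9, hop.depth);
}

// Drop hops beyond the depth limit or below the confidence threshold.
function prune(hops: Hop[], maxDepth: number, minConfidence: number): Hop[] {
  return hops.filter((h) => h.depth <= maxDepth && hopConfidence(h) >= minConfidence);
}

const kept = prune(
  [
    { entityId: "a", similarity: 1.0, logicalImportance: 0.5, depth: 1 },
    { entityId: "b", similarity: 0.2, logicalImportance: 0.1, depth: 2 },
    { entityId: "c", similarity: 1.0, logicalImportance: 1.0, depth: 4 },
  ],
  3,
  0.5
);
console.log(kept.map((h) => h.entityId)); // only "a": "b" scores too low, "c" is too deep
```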
+### Usage Example
+```typescript
+const multiHop = new MultiHopVectorPivot(db, embeddingService);
+const result = await multiHop.multiHopVectorPivot(
+  "how does deep learning relate to NLP",
+  3,  // maxHops
+  10  // limit
+);
+// Returns:
+// - pivots: Initial vector search results
+// - paths: High-quality reasoning paths
+// - aggregated_results: Ranked entities with scores
+// - total_hops: Maximum traversal depth
+// - execution_time_ms: Performance metrics
+```
+### Research Foundation
+- **HopRAG (ACL 2025)**: Logic-aware RAG with pseudo-queries as edges, achieving 76.78% higher answer accuracy
+- **Retrieval Pivot Attacks**: Security patterns for hybrid RAG systems with boundary enforcement
+- **Neo4j GraphRAG**: Multi-hop reasoning patterns for knowledge graphs
+## Logical Edges from Knowledge Graph (v1.0)
+**Research-backed implementation** based on SAGE (ICLR 2026), Metadata Knowledge Graphs (Atlan 2026), and Knowledge Graph Completion research.
+### Five Logical Edge Patterns
+1. **Same Category Edges** - Entities with identical category metadata (confidence: 0.8)
+2. **Same Type Edges** - Entities of the same type (confidence: 0.7)
+3. **Hierarchical Edges** - Parent-child relationships from metadata (confidence: 0.9)
+4. **Contextual Edges** - Entities sharing domain, time period, location, or organization (confidence: 0.7-0.75)
+5. **Transitive Logical Edges** - Derived from explicit relationships + metadata patterns (confidence: 0.55-0.6)
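The first two patterns are simple metadata comparisons. A minimal sketch follows, assuming entities carry a `type` and an optional `metadata.category`; the `KgEntity` shape and `discoverSimpleEdges` are hypothetical and do not reflect the service's actual queries.

```typescript
type KgEntity = { id: string; type: string; metadata: { category?: string } };
type LogicalEdge = {
  from_id: string;
  to_id: string;
  relation_type: "same_category" | "same_type";
  confidence: number;
  reason: string;
};

// Emit same-category (0.8) and same-type (0.7) edges from `source` to every other entity.
function discoverSimpleEdges(source: KgEntity, others: KgEntity[]): LogicalEdge[] {
  const edges: LogicalEdge[] = [];
  const category = source.metadata.category;
  for (const other of others) {
    if (other.id === source.id) continue; // never link an entity to itself
    if (category !== undefined && category === other.metadata.category) {
      edges.push({
        from_id: source.id,
        to_id: other.id,
        relation_type: "same_category",
        confidence: 0.8,
        reason: `both have category "${category}"`,
      });
    }
    if (source.type === other.type) {
      edges.push({
        from_id: source.id,
        to_id: other.id,
        relation_type: "same_type",
        confidence: 0.7,
        reason: `both have type "${source.type}"`,
      });
    }
  }
  return edges;
}
```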
+### Usage Example
+```typescript
+const logicalEdges = new LogicalEdgesService(db);
+// Discover all logical edges for an entity
+const edges = await logicalEdges.discoverLogicalEdges(entityId);
+// Optionally materialize as explicit relationships
+const created = await logicalEdges.materializeLogicalEdges(entityId);
+// Returns:
+// - from_id, to_id: Entity IDs
+// - relation_type: "same_category", "same_type", "hierarchical", "contextual", "transitive_logical"
+// - confidence: 0.55-0.9 based on pattern
+// - reason: Human-readable explanation
+// - pattern: Pattern type for analysis
+```
+### Key Features
+- **Metadata-Driven**: Discovers relationships from entity metadata without explicit encoding
+- **Multi-Pattern**: Combines 5 different logical inference patterns
+- **Deduplication**: Automatically removes duplicate edges, keeping highest confidence
+- **Materialization**: Optional: create explicit relationships for performance optimization
+- **Explainability**: Each edge includes reason and pattern for interpretability
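The deduplication rule can be sketched in a few lines. What counts as a duplicate is not spelled out in the README; the version below assumes two edges are duplicates when they connect the same ordered pair of entities, regardless of pattern. `CandidateEdge` and `dedupeEdges` are hypothetical names.

```typescript
type CandidateEdge = { from_id: string; to_id: string; relation_type: string; confidence: number };

// Keep one edge per (from_id, to_id) pair: the one with the highest confidence.
function dedupeEdges(edges: CandidateEdge[]): CandidateEdge[] {
  const best = new Map<string, CandidateEdge>();
  for (const edge of edges) {
    const key = `${edge.from_id}|${edge.to_id}`;
    const existing = best.get(key);
    if (existing === undefined || edge.confidence > existing.confidence) {
      best.set(key, edge);
    }
  }
  return Array.from(best.values());
}

const unique = dedupeEdges([
  { from_id: "a", to_id: "b", relation_type: "same_type", confidence: 0.7 },
  { from_id: "a", to_id: "b", relation_type: "hierarchical", confidence: 0.9 },
  { from_id: "a", to_id: "c", relation_type: "same_category", confidence: 0.8 },
]);
console.log(unique.length); // 2
```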
+### Research Foundation
+- **SAGE (ICLR 2026)**: Implicit graph exploration with on-demand edge discovery
+- **Metadata Knowledge Graphs (Atlan 2026)**: Metadata-driven relationship inference
+- **Knowledge Graph Completion (Frontiers 2025)**: Predicting implicit relationships using embeddings
 ## Technical Highlights
 ### Dual Timestamp Format (v1.9)