PyPI version diff: quantum-memory-graph 1.2.1 (tar.gz) → 1.2.2 (tar.gz)

Files changed (38)

{quantum_memory_graph-1.2.1 → quantum_memory_graph-1.2.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: quantum-memory-graph
-Version: 1.2.1
+Version: 1.2.2
 Summary: Quantum-optimized knowledge graph memory for AI agents. Relationship-aware subgraph selection via QAOA.
 Home-page: https://github.com/Dustin-a11y/quantum-memory-graph
 Author: Coinkong (Chef's Attraction)
@@ -23,7 +23,6 @@ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: quantum-agent-memory>=0.1.0
 Requires-Dist: sentence-transformers>=2.2.0
 Requires-Dist: networkx>=3.0
 Requires-Dist: numpy>=1.24.0
@@ -45,29 +44,9 @@ Dynamic: license-file
 Every memory system treats memories as independent documents — search, rank, stuff into context. But memories aren't independent. They have *relationships*. "The team chose React" becomes 10x more useful paired with "because of ecosystem maturity" and "FastAPI handles the backend."
-Quantum Memory Graph maps these relationships, then uses QAOA to find the optimal *combination* of memories — not just the most relevant individuals, but the best connected subgraph that gives your agent maximum context.
-## Benchmark: MemCombine (Internal — Memory Combination)
-MemCombine tests what no existing benchmark measures — **memory combination quality**, where QAOA graph selection finds coherent subsets that embedding similarity misses.
-| Method | Coverage | Evidence Recall | F1 | Perfect |
-|--------|----------|----------------|----|---------|
-| Embedding Top-K | 69.9% | 65.6% | 68.1% | 1/5 |
-| **Graph + QAOA** | **96.7%** | **91.0%** | **92.6%** | **4/5** |
-| **Advantage** | **+26.8%** | **+25.4%** | **+24.5%** | |
-When the task is "find memories that work *together*," graph-aware quantum selection crushes pure similarity search.
-> **How to read this table:** The R@5/R@10 numbers are driven by QMG's chunked
-> embedding retrieval pipeline (Stage 1: gte-large, 500-char chunks, mean-of-top-3
-> scoring). QAOA (Stage 2) refines the top-14 candidates for relationship-aware
-> selection — its advantage shows up in MemCombine (combination quality) rather
-> than raw recall rank. The pipeline as a whole achieves #1.
 ## 🏆 #1 on LongMemEval (ICLR 2025 Benchmark)
-Tested on the official [LongMemEval benchmark](https://arxiv.org/abs/2410.10813) for long-term memory in AI agents:
+Tested on the official [LongMemEval benchmark](https://arxiv.org/abs/2410.10813) — [verified submission](https://github.com/xiaowu0162/LongMemEval/issues/46).
 | Method | R@1 | R@5 | R@10 | NDCG@10 |
 |--------|:---:|:---:|:----:|:-------:|
@@ -207,10 +186,7 @@ result = recall(
 )
 ```
-### Run MemCombine Benchmark
 ```python
-from benchmarks.memcombine import run_benchmark
 def my_recall(memories, query, K):
     # Your recall implementation
@@ -243,8 +219,6 @@ Validated on `ibm_fez` and `ibm_kingston` backends.
 MIT License — Copyright 2026 Coinkong (Chef's Attraction)
 ## Links
-- [quantum-agent-memory](https://github.com/Dustin-a11y/quantum-agent-memory) — The QAOA optimization engine
-- [MemCombine Benchmark](benchmarks/memcombine.py) — Test memory combination quality
+- [GitHub](https://github.com/Dustin-a11y/quantum-memory-graph) — Source code and benchmarks

{quantum_memory_graph-1.2.1 → quantum_memory_graph-1.2.2}/README.md RENAMED

@@ -4,29 +4,9 @@
 Every memory system treats memories as independent documents — search, rank, stuff into context. But memories aren't independent. They have *relationships*. "The team chose React" becomes 10x more useful paired with "because of ecosystem maturity" and "FastAPI handles the backend."
-Quantum Memory Graph maps these relationships, then uses QAOA to find the optimal *combination* of memories — not just the most relevant individuals, but the best connected subgraph that gives your agent maximum context.
-## Benchmark: MemCombine (Internal — Memory Combination)
-MemCombine tests what no existing benchmark measures — **memory combination quality**, where QAOA graph selection finds coherent subsets that embedding similarity misses.
-| Method | Coverage | Evidence Recall | F1 | Perfect |
-|--------|----------|----------------|----|---------|
-| Embedding Top-K | 69.9% | 65.6% | 68.1% | 1/5 |
-| **Graph + QAOA** | **96.7%** | **91.0%** | **92.6%** | **4/5** |
-| **Advantage** | **+26.8%** | **+25.4%** | **+24.5%** | |
-When the task is "find memories that work *together*," graph-aware quantum selection crushes pure similarity search.
-> **How to read this table:** The R@5/R@10 numbers are driven by QMG's chunked
-> embedding retrieval pipeline (Stage 1: gte-large, 500-char chunks, mean-of-top-3
-> scoring). QAOA (Stage 2) refines the top-14 candidates for relationship-aware
-> selection — its advantage shows up in MemCombine (combination quality) rather
-> than raw recall rank. The pipeline as a whole achieves #1.
 ## 🏆 #1 on LongMemEval (ICLR 2025 Benchmark)
-Tested on the official [LongMemEval benchmark](https://arxiv.org/abs/2410.10813) for long-term memory in AI agents:
+Tested on the official [LongMemEval benchmark](https://arxiv.org/abs/2410.10813) — [verified submission](https://github.com/xiaowu0162/LongMemEval/issues/46).
 | Method | R@1 | R@5 | R@10 | NDCG@10 |
 |--------|:---:|:---:|:----:|:-------:|
@@ -166,10 +146,7 @@ result = recall(
 )
 ```
-### Run MemCombine Benchmark
 ```python
-from benchmarks.memcombine import run_benchmark
 def my_recall(memories, query, K):
     # Your recall implementation
@@ -202,8 +179,6 @@ Validated on `ibm_fez` and `ibm_kingston` backends.
 MIT License — Copyright 2026 Coinkong (Chef's Attraction)
 ## Links
-- [quantum-agent-memory](https://github.com/Dustin-a11y/quantum-agent-memory) — The QAOA optimization engine
-- [MemCombine Benchmark](benchmarks/memcombine.py) — Test memory combination quality
+- [GitHub](https://github.com/Dustin-a11y/quantum-memory-graph) — Source code and benchmarks

{quantum_memory_graph-1.2.1 → quantum_memory_graph-1.2.2}/quantum_memory_graph/__init__.py RENAMED

@@ -7,7 +7,7 @@ then QAOA to find the optimal subgraph for any query.
 Copyright 2026 Coinkong (Chef's Attraction). MIT License.
 """
-__version__ = "1.2.0"
+__version__ = "1.2.1"
 from .graph import MemoryGraph
 from .subgraph_optimizer import optimize_subgraph

{quantum_memory_graph-1.2.1 → quantum_memory_graph-1.2.2}/quantum_memory_graph/pipeline.py RENAMED

@@ -13,7 +13,7 @@ from typing import List, Dict, Optional
 from datetime import datetime
 from .graph import MemoryGraph
-from .subgraph_optimizer import optimize_subgraph
+from .subgraph_optimizer import optimize_subgraph as _std_optimize_subgraph
 from .recency import ShortTermMemory
@@ -139,6 +139,7 @@ def recall(
     max_candidates: int = 14,
     use_recency: bool = True,
     stm: ShortTermMemory = None,
+    method: str = "qaoa",
 ) -> Dict:
     """
     Recall optimal memories for a query.
@@ -170,6 +171,14 @@ def recall(
     if not g.memories:
         return {"ok": True, "memories": [], "method": "empty"}
+    today_str = datetime.now().strftime("%Y-%m-%d")
+    def _tier(mem):
+        """Determine if a memory is warm (today) or cold (older)."""
+        if mem.timestamp and today_str in str(mem.timestamp):
+            return "warm"
+        return "cold"
     # Phase 1: Graph neighborhood search
     neighborhood = g.get_neighborhood(
         query=query, hops=hops, top_seeds=top_seeds
@@ -198,6 +207,7 @@ def recall(
                 "entities": g.memories[cid].entities,
                 "relevance": float(candidate_scores[i]),
                 "source": g.memories[cid].source,
+                "tier": _tier(g.memories[cid]),
             }
             for i, cid in enumerate(candidate_ids)
         ]
@@ -206,22 +216,66 @@ def recall(
             "memories": memories,
             "method": "all_candidates",
             "candidates": len(candidate_ids),
+            "tier_counts": {
+                "warm": sum(1 for m in memories if m["tier"] == "warm"),
+                "cold": sum(1 for m in memories if m["tier"] == "cold"),
+            },
         }
-    # Phase 2: Extract subgraph data
-    subgraph = g.get_subgraph_data(candidate_ids)
-    adjacency = subgraph["adjacency"]
+    # Phase 2: Synergy rerank or QAOA subgraph
+    if method == "synergy":
+        try:
+            from .synergy_reranker import select as synergy_select
+            # Build texts for candidates
+            cand_texts = [g.memories[cid].text for cid in candidate_ids]
+            selected_synergy = synergy_select(candidate_scores, cand_texts, query, K)
+            selected_memories = []
+            for idx in selected_synergy:
+                cid = candidate_ids[idx]
+                mem = g.memories[cid]
+                selected_memories.append({
+                    "text": mem.text,
+                    "entities": mem.entities,
+                    "relevance": float(candidate_scores[idx]),
+                    "source": mem.source,
+                    "tier": _tier(mem),
+                    "connections": [],
+                })
+            return {
+                "ok": True,
+                "memories": selected_memories,
+                "method": "synergy",
+                "candidates": len(candidate_ids),
+                "K": K,
+            }
+        except Exception as e:
+            print(f"WARNING: Synergy selection failed ({e}), falling back to QAOA")
     # Phase 3: QAOA subgraph optimization
+    subgraph = g.get_subgraph_data(candidate_ids)
+    adjacency = subgraph["adjacency"]
+    # Use PCE for larger candidate sets (14+), standard QAOA for smaller
     try:
-        result = optimize_subgraph(
-            relevance_scores=candidate_scores,
-            adjacency=adjacency,
-            K=K,
-            alpha=alpha,
-            beta_conn=beta_conn,
-            gamma_cov=gamma_cov,
-        )
+        if len(candidate_ids) > 14:
+            from .pce_optimizer import optimize_subgraph_pce
+            result = optimize_subgraph_pce(
+                relevance_scores=candidate_scores,
+                adjacency=adjacency,
+                K=K,
+                alpha=alpha,
+                beta_conn=beta_conn,
+                gamma_cov=gamma_cov,
+            )
+        else:
+            result = _std_optimize_subgraph(
+                relevance_scores=candidate_scores,
+                adjacency=adjacency,
+                K=K,
+                alpha=alpha,
+                beta_conn=beta_conn,
+                gamma_cov=gamma_cov,
+            )
     except Exception as e:
         # Ultimate fallback: use greedy if QAOA fails despite internal try/except
         print(f"WARNING: Subgraph optimization failed ({e}), using greedy fallback")
@@ -239,6 +293,14 @@ def recall(
     selected_idxs = result["selection"]
     selected_memories = []
+    # Compute comparison pcts safely
+    qaoa_score_val = result.get("qaoa", {}).get("score", result["score"])
+    greedy_score_val = result.get("greedy", {}).get("score", result["score"])
+    optimal_score_val = result.get("optimal", {}).get("score", result["score"])
+    qaoa_vs_greedy = (qaoa_score_val / greedy_score_val * 100) if greedy_score_val > 0 else 100
+    qaoa_vs_optimal = (qaoa_score_val / optimal_score_val * 100) if optimal_score_val > 0 else 100
     for idx in selected_idxs:
         cid = candidate_ids[idx]
         mem = g.memories[cid]
@@ -261,6 +323,7 @@ def recall(
             "entities": mem.entities,
             "relevance": float(candidate_scores[idx]),
             "source": mem.source,
+            "tier": _tier(mem),
             "connections": connections,
         })
@@ -273,7 +336,11 @@ def recall(
         "qaoa_score": result["score"],
         "greedy_score": result["greedy"]["score"],
         "optimal_score": result["optimal"]["score"],
-        "qaoa_vs_optimal_pct": result["qaoa_vs_optimal_pct"],
-        "qaoa_vs_greedy_pct": result["qaoa_vs_greedy_pct"],
+        "qaoa_vs_optimal_pct": round(float(qaoa_vs_optimal), 2),
+        "qaoa_vs_greedy_pct": round(float(qaoa_vs_greedy), 2),
+        "tier_counts": {
+            "warm": sum(1 for m in selected_memories if m["tier"] == "warm"),
+            "cold": sum(1 for m in selected_memories if m["tier"] == "cold"),
+        },
         "graph_stats": g.stats(),
     }
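
The `pipeline.py` hunks above add two small behaviors to `recall()`: a warm/cold tier label on each returned memory, and comparison percentages computed with fallbacks instead of read directly from the optimizer result. The sketch below isolates both so they can be run without the package. The names `tier` and `pct_of`, the injectable `today_str` argument, and the sample `result` dictionary are assumptions made for illustration and are not part of the package.

```python
from datetime import datetime


def tier(timestamp, today_str=None):
    """Return "warm" if the timestamp's string form contains today's date, else "cold".

    Mirrors the substring test in the diff. `today_str` is injectable here
    (an assumption of this sketch) so the behavior is reproducible.
    """
    if today_str is None:
        today_str = datetime.now().strftime("%Y-%m-%d")
    if timestamp and today_str in str(timestamp):
        return "warm"
    return "cold"


def pct_of(score, baseline):
    """Score as a percentage of baseline; 100.0 when the baseline is not positive."""
    if baseline > 0:
        return round(float(score / baseline * 100), 2)
    return 100.0


# Tier labels for an ISO string, a datetime object, and a missing timestamp.
labels = [
    tier("2025-01-02T10:00:00", today_str="2025-01-02"),
    tier(datetime(2025, 1, 1, 9, 0), today_str="2025-01-02"),
    tier(None, today_str="2025-01-02"),
]
tier_counts = {
    "warm": sum(1 for t in labels if t == "warm"),
    "cold": sum(1 for t in labels if t == "cold"),
}

# Hypothetical optimizer result that lacks the "greedy" and "optimal" entries.
result = {"score": 1.8}
greedy_score = result.get("greedy", {}).get("score", result["score"])
qaoa_vs_greedy = pct_of(result["score"], greedy_score)

print(tier_counts, qaoa_vs_greedy)
```

Because the tier check is a substring match against the local date, a timestamp stored in another timezone or in a format other than `YYYY-MM-DD` would be labeled cold; whether that matters depends on how timestamps are written elsewhere in the package, which this diff does not show.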

quantum_memory_graph-1.2.2/quantum_memory_graph/synergy_reranker.py ADDED

@@ -0,0 +1,133 @@
+#!/usr/bin/env python3
+"""
+Synergy-aware reranker — word-overlap synergy + diversity selection.
+Uses token-level overlap analysis to select chunks that are
+complementary to each other, not just individually relevant.
+DK 🦍
+"""
+import math
+import numpy as np
+from collections import defaultdict
+STOP_WORDS = frozenset({
+    "the","is","a","an","and","or","but","in","on","at",
+    "to","for","of","with","by","from","was","were","are",
+    "be","been","being","have","has","had","do","does","did",
+    "will","would","could","should","may","might","can","this",
+    "that","these","those","it","its","not","no","he","she",
+    "his","her","my","me","i","you","we","us","they","them",
+    "what","who","how","when","where","which",
+})
+def _tokenize(text):
+    words = set()
+    for w in text.lower().split():
+        w = "".join(c for c in w if c.isalnum())
+        if len(w) > 2 and w not in STOP_WORDS:
+            words.add(w)
+    return words
+def _synergy_matrix(texts, query):
+    """Pairwise synergy between chunks given a query."""
+    n = len(texts)
+    qt = _tokenize(query)
+    mts = [_tokenize(t) for t in texts]
+    mat = np.zeros((n, n))
+    for i in range(n):
+        for j in range(i + 1, n):
+            qi = qt & mts[i]
+            qj = qt & mts[j]
+            combined = qi | qj
+            complementary = (len(combined) - max(len(qi), len(qj))) / len(qt) if qt else 0.0
+            mi, mj = mts[i], mts[j]
+            u = mi | mj
+            jaccard = len(mi & mj) / len(u) if u else 0.0
+            relatedness = math.exp(-((jaccard - 0.2) ** 2) / 0.05)
+            shared = (mi & mj) - qt
+            bridge = min(len(shared) / 5.0, 0.3)
+            synergy = max(0.0, complementary * 0.5 + relatedness * 0.3 + bridge * 0.2)
+            mat[i][j] = mat[j][i] = synergy
+    return mat
+def _diversity_matrix(texts):
+    """1 - Jaccard overlap between chunk token sets."""
+    n = len(texts)
+    mts = [_tokenize(t) for t in texts]
+    mat = np.ones((n, n))
+    np.fill_diagonal(mat, 0.0)
+    for i in range(n):
+        for j in range(i + 1, n):
+            u = mts[i] | mts[j]
+            overlap = len(mts[i] & mts[j]) / len(u) if u else 0.0
+            mat[i][j] = mat[j][i] = 1.0 - overlap
+    return mat
+def select(cosine_scores, chunk_texts, query, K=5):
+    """
+    Select K chunks using synergy-aware greedy selection.
+    Args:
+        cosine_scores: 1D array of cosine scores
+        chunk_texts: list of chunk text strings
+        query: query text
+        K: number of chunks to select
+    Returns:
+        List of selected chunk indices in selection order
+    """
+    n = len(cosine_scores)
+    if n <= K:
+        return list(range(n))
+    synergy = _synergy_matrix(chunk_texts, query)
+    diversity = _diversity_matrix(chunk_texts)
+    selected = []
+    remaining = set(range(n))
+    first = int(np.argmax(cosine_scores))
+    selected.append(first)
+    remaining.remove(first)
+    for _ in range(K - 1):
+        best_idx, best_score = -1, -np.inf
+        for i in remaining:
+            if selected:
+                avg_syn = float(np.mean([synergy[i][j] for j in selected]))
+                avg_div = float(np.mean([diversity[i][j] for j in selected]))
+            else:
+                avg_syn = avg_div = 0.0
+            score = 0.4 * cosine_scores[i] + 0.3 * avg_syn + 0.2 * avg_div + 0.1
+            if score > best_score:
+                best_score = score
+                best_idx = i
+        selected.append(best_idx)
+        remaining.remove(best_idx)
+    return selected
+def rerank(cosine_scores, chunk_texts, chunk_session_ids, query, K=5):
+    """
+    Full synergy rerank: select chunks, rank sessions by contribution.
+    Args:
+        cosine_scores: per-chunk cosine scores
+        chunk_texts: per-chunk text
+        chunk_session_ids: per-chunk session ID
+        query: query text
+        K: number of chunks to select
+    Returns:
+        List of session IDs ranked by synergy contribution
+    """
+    selected = select(cosine_scores, chunk_texts, query, K)
+    counts = defaultdict(int)
+    for idx in selected:
+        counts[chunk_session_ids[idx]] += 1
+    return sorted(counts.keys(), key=lambda s: -counts[s])
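
To make the pairwise score in `_synergy_matrix` concrete, here is a minimal sketch that computes the three terms (complementary query coverage, Jaccard-based relatedness, and the bridge bonus) for a single pair of chunks. The stop-word list and the weights are copied from the added file; the function names `tokenize` and `pair_synergy`, the returned dictionary, and the sample texts are assumptions made for this illustration.

```python
import math

# Copied from the STOP_WORDS set in the added file.
STOP_WORDS = frozenset({
    "the", "is", "a", "an", "and", "or", "but", "in", "on", "at",
    "to", "for", "of", "with", "by", "from", "was", "were", "are",
    "be", "been", "being", "have", "has", "had", "do", "does", "did",
    "will", "would", "could", "should", "may", "might", "can", "this",
    "that", "these", "those", "it", "its", "not", "no", "he", "she",
    "his", "her", "my", "me", "i", "you", "we", "us", "they", "them",
    "what", "who", "how", "when", "where", "which",
})


def tokenize(text):
    """Lowercase, keep alphanumerics, drop stop words and words of length <= 2."""
    words = set()
    for w in text.lower().split():
        w = "".join(c for c in w if c.isalnum())
        if len(w) > 2 and w not in STOP_WORDS:
            words.add(w)
    return words


def pair_synergy(text_i, text_j, query):
    """Return the individual terms and the weighted synergy for one chunk pair."""
    qt, mi, mj = tokenize(query), tokenize(text_i), tokenize(text_j)
    qi, qj = qt & mi, qt & mj
    # Query terms gained by taking both chunks instead of the better single one.
    complementary = (len(qi | qj) - max(len(qi), len(qj))) / len(qt) if qt else 0.0
    union = mi | mj
    jaccard = len(mi & mj) / len(union) if union else 0.0
    # Gaussian bump that peaks when the chunks overlap at Jaccard = 0.2.
    relatedness = math.exp(-((jaccard - 0.2) ** 2) / 0.05)
    # Shared non-query vocabulary, capped at 0.3.
    bridge = min(len((mi & mj) - qt) / 5.0, 0.3)
    synergy = max(0.0, complementary * 0.5 + relatedness * 0.3 + bridge * 0.2)
    return {
        "complementary": complementary,
        "jaccard": jaccard,
        "relatedness": relatedness,
        "bridge": bridge,
        "synergy": synergy,
    }


parts = pair_synergy(
    "team picked react for the frontend",
    "team picked fastapi for the backend",
    "which frontend and backend did the team choose",
)
print(parts)
```

In this pair each chunk matches two of the four query tokens and together they match three, so the complementary term is 0.25. One design detail visible in `select()` itself: the constant `+ 0.1` is added to every candidate's score, so it cannot change which candidate wins the `argmax` in a given round.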

{quantum_memory_graph-1.2.1 → quantum_memory_graph-1.2.2}/quantum_memory_graph.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: quantum-memory-graph
-Version: 1.2.1
+Version: 1.2.2
 Summary: Quantum-optimized knowledge graph memory for AI agents. Relationship-aware subgraph selection via QAOA.
 Home-page: https://github.com/Dustin-a11y/quantum-memory-graph
 Author: Coinkong (Chef's Attraction)
@@ -23,7 +23,6 @@ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: quantum-agent-memory>=0.1.0
 Requires-Dist: sentence-transformers>=2.2.0
 Requires-Dist: networkx>=3.0
 Requires-Dist: numpy>=1.24.0
@@ -45,29 +44,9 @@ Dynamic: license-file
 Every memory system treats memories as independent documents — search, rank, stuff into context. But memories aren't independent. They have *relationships*. "The team chose React" becomes 10x more useful paired with "because of ecosystem maturity" and "FastAPI handles the backend."
-Quantum Memory Graph maps these relationships, then uses QAOA to find the optimal *combination* of memories — not just the most relevant individuals, but the best connected subgraph that gives your agent maximum context.
-## Benchmark: MemCombine (Internal — Memory Combination)
-MemCombine tests what no existing benchmark measures — **memory combination quality**, where QAOA graph selection finds coherent subsets that embedding similarity misses.
-| Method | Coverage | Evidence Recall | F1 | Perfect |
-|--------|----------|----------------|----|---------|
-| Embedding Top-K | 69.9% | 65.6% | 68.1% | 1/5 |
-| **Graph + QAOA** | **96.7%** | **91.0%** | **92.6%** | **4/5** |
-| **Advantage** | **+26.8%** | **+25.4%** | **+24.5%** | |
-When the task is "find memories that work *together*," graph-aware quantum selection crushes pure similarity search.
-> **How to read this table:** The R@5/R@10 numbers are driven by QMG's chunked
-> embedding retrieval pipeline (Stage 1: gte-large, 500-char chunks, mean-of-top-3
-> scoring). QAOA (Stage 2) refines the top-14 candidates for relationship-aware
-> selection — its advantage shows up in MemCombine (combination quality) rather
-> than raw recall rank. The pipeline as a whole achieves #1.
 ## 🏆 #1 on LongMemEval (ICLR 2025 Benchmark)
-Tested on the official [LongMemEval benchmark](https://arxiv.org/abs/2410.10813) for long-term memory in AI agents:
+Tested on the official [LongMemEval benchmark](https://arxiv.org/abs/2410.10813) — [verified submission](https://github.com/xiaowu0162/LongMemEval/issues/46).
 | Method | R@1 | R@5 | R@10 | NDCG@10 |
 |--------|:---:|:---:|:----:|:-------:|
@@ -207,10 +186,7 @@ result = recall(
 )
 ```
-### Run MemCombine Benchmark
 ```python
-from benchmarks.memcombine import run_benchmark
 def my_recall(memories, query, K):
     # Your recall implementation
@@ -243,8 +219,6 @@ Validated on `ibm_fez` and `ibm_kingston` backends.
 MIT License — Copyright 2026 Coinkong (Chef's Attraction)
 ## Links
-- [quantum-agent-memory](https://github.com/Dustin-a11y/quantum-agent-memory) — The QAOA optimization engine
-- [MemCombine Benchmark](benchmarks/memcombine.py) — Test memory combination quality
+- [GitHub](https://github.com/Dustin-a11y/quantum-memory-graph) — Source code and benchmarks

{quantum_memory_graph-1.2.1 → quantum_memory_graph-1.2.2}/quantum_memory_graph.egg-info/SOURCES.txt RENAMED

@@ -13,7 +13,6 @@ benchmarks/longmemeval_bench_v4.py
 benchmarks/longmemeval_bench_v5.py
 benchmarks/longmemeval_bench_v6.py
 benchmarks/longmemeval_bench_v7.py
-benchmarks/memcombine.py
 benchmarks/run_final.py
 benchmarks/run_full_benchmark.py
 benchmarks/run_full_benchmark_v2.py
@@ -28,6 +27,7 @@ quantum_memory_graph/pce_optimizer.py
 quantum_memory_graph/pipeline.py
 quantum_memory_graph/recency.py
 quantum_memory_graph/subgraph_optimizer.py
+quantum_memory_graph/synergy_reranker.py
 quantum_memory_graph.egg-info/PKG-INFO
 quantum_memory_graph.egg-info/SOURCES.txt
 quantum_memory_graph.egg-info/dependency_links.txt

{quantum_memory_graph-1.2.1 → quantum_memory_graph-1.2.2}/quantum_memory_graph.egg-info/requires.txt RENAMED

@@ -1,4 +1,3 @@
-quantum-agent-memory>=0.1.0
 sentence-transformers>=2.2.0
 networkx>=3.0
 numpy>=1.24.0

{quantum_memory_graph-1.2.1 → quantum_memory_graph-1.2.2}/setup.cfg RENAMED

@@ -1,6 +1,6 @@
 [metadata]
 name = quantum-memory-graph
-version = 1.2.1
+version = 1.2.2
 description = Quantum-optimized knowledge graph memory for AI agents. Relationship-aware subgraph selection via QAOA.
 long_description = file: README.md
 long_description_content_type = text/markdown
@@ -29,7 +29,6 @@ classifiers =
 packages = find:
 python_requires = >=3.9
 install_requires =
-	quantum-agent-memory>=0.1.0
 	sentence-transformers>=2.2.0
 	networkx>=3.0
 	numpy>=1.24.0

quantum_memory_graph-1.2.1/benchmarks/memcombine.py DELETED

@@ -1,236 +0,0 @@
-"""
-MemCombine Benchmark — Tests memory COMBINATION quality.
-Unlike LongMemEval (needle-in-haystack retrieval), MemCombine tests whether
-selected memories work TOGETHER to answer complex questions.
-Questions require synthesizing information from multiple memories:
-  - "What was the decision AND its reasoning AND its outcome?"
-  - "How do project X and project Y relate?"
-  - "What changed between meeting A and meeting B?"
-Metrics:
-  - Combination Score: Do selected memories cover all required facets?
-  - Synergy Score: Do memories reference/build on each other?
-  - Completeness: Can the question be fully answered from selected memories?
-Copyright 2026 Coinkong (Chef's Attraction). MIT License.
-"""
-import json
-import numpy as np
-from typing import List, Dict
-from dataclasses import dataclass, field
-@dataclass
-class MemCombineQuestion:
-    """A question requiring multiple related memories."""
-    id: str
-    question: str
-    category: str  # synthesis, temporal, causal, multi-entity
-    memories: List[Dict]  # All available memories
-    evidence_ids: List[int]  # Which memories contain evidence
-    facets: List[str]  # Required information facets
-    facet_memory_map: Dict  # Which facet comes from which memory
-# Built-in benchmark scenarios
-SCENARIOS = [
-    {
-        "id": "synthesis_1",
-        "question": "What technology stack was chosen for the project and why was each component selected?",
-        "category": "synthesis",
-        "memories": [
-            {"id": 0, "text": "Team meeting: Decided to use React for the frontend. Sarah argued it has the best ecosystem for our use case."},
-            {"id": 1, "text": "Architecture review: PostgreSQL chosen for the database. Need JSONB support for flexible schemas."},
-            {"id": 2, "text": "Sprint planning: Set up CI/CD pipeline using GitHub Actions. Two-week sprint cycles."},
-            {"id": 3, "text": "Team lunch at the Italian place. Good pasta. Bob told a funny joke about recursion."},
-            {"id": 4, "text": "Backend discussion: FastAPI selected over Django. Need async support for real-time features."},
-            {"id": 5, "text": "Deployment strategy: Going with Docker + Kubernetes on AWS. Auto-scaling is critical for launch."},
-            {"id": 6, "text": "Budget review: Cloud costs estimated at $2000/month. Within budget allocation."},
-            {"id": 7, "text": "Coffee chat about the new office layout. Open floor plan vs cubicles debate."},
-            {"id": 8, "text": "Performance testing results: FastAPI handles 10K concurrent connections. Meets our requirements."},
-            {"id": 9, "text": "Security audit: Need to add rate limiting and input validation before launch."},
-        ],
-        "evidence_ids": [0, 1, 4, 5],
-        "facets": ["frontend_choice", "frontend_reason", "backend_choice", "backend_reason", "database_choice", "database_reason", "deployment_choice"],
-        "facet_memory_map": {"frontend_choice": 0, "frontend_reason": 0, "backend_choice": 4, "backend_reason": 4, "database_choice": 1, "database_reason": 1, "deployment_choice": 5},
-    },
-    {
-        "id": "temporal_1",
-        "question": "How did the team's stance on remote work change over the three months?",
-        "category": "temporal",
-        "memories": [
-            {"id": 0, "text": "January all-hands: CEO announced mandatory return to office 5 days a week starting February."},
-            {"id": 1, "text": "Q4 revenue report showed 15% growth. Celebrated with team dinner."},
-            {"id": 2, "text": "February survey results: 73% of employees reported decreased satisfaction with RTO policy."},
-            {"id": 3, "text": "New coffee machine installed in the break room. Everyone loves it."},
-            {"id": 4, "text": "February town hall: HR presented data showing 20% increase in turnover since RTO mandate."},
-            {"id": 5, "text": "March policy update: CEO reversed course. Now hybrid 3 days in office, 2 remote. Cited retention data."},
-            {"id": 6, "text": "IT upgraded the conference room AV equipment for better hybrid meetings."},
-            {"id": 7, "text": "Quarterly OKR review. Team hit 4 of 5 objectives."},
-            {"id": 8, "text": "March satisfaction survey: Employee satisfaction recovered to 85% after hybrid policy."},
-            {"id": 9, "text": "Parking garage construction causing noise complaints from third floor."},
-        ],
-        "evidence_ids": [0, 2, 4, 5, 8],
-        "facets": ["initial_policy", "employee_reaction", "turnover_impact", "policy_change", "final_outcome"],
-        "facet_memory_map": {"initial_policy": 0, "employee_reaction": 2, "turnover_impact": 4, "policy_change": 5, "final_outcome": 8},
-    },
-    {
-        "id": "causal_1",
-        "question": "What caused the production outage, what was done to fix it, and what prevention measures were taken?",
-        "category": "causal",
-        "memories": [
-            {"id": 0, "text": "Monday 2am alert: Production database hit 100% disk usage. All writes failing."},
-            {"id": 1, "text": "Sprint retrospective: Team agreed to improve code review process."},
-            {"id": 2, "text": "Root cause analysis: Logging table grew 500GB in 2 weeks due to debug logging left on after feature deploy."},
-            {"id": 3, "text": "Incident response: DevOps team purged old log entries and increased disk from 1TB to 2TB."},
-            {"id": 4, "text": "New hire orientation for three junior developers. HR handled logistics."},
-            {"id": 5, "text": "Post-mortem action item 1: Implement log rotation with 30-day retention policy."},
-            {"id": 6, "text": "Post-mortem action item 2: Add disk usage alerts at 70%, 80%, 90% thresholds."},
-            {"id": 7, "text": "Post-mortem action item 3: Require removing debug logging before merging to main."},
-            {"id": 8, "text": "Team building event at the escape room. Marketing team won."},
-            {"id": 9, "text": "Client demo went well. They want to proceed with Phase 2."},
-        ],
-        "evidence_ids": [0, 2, 3, 5, 6, 7],
-        "facets": ["what_happened", "root_cause", "immediate_fix", "prevention_1", "prevention_2", "prevention_3"],
-        "facet_memory_map": {"what_happened": 0, "root_cause": 2, "immediate_fix": 3, "prevention_1": 5, "prevention_2": 6, "prevention_3": 7},
-    },
-    {
-        "id": "multi_entity_1",
-        "question": "What are each team member's roles and how do their responsibilities interact?",
-        "category": "multi_entity",
-        "memories": [
-            {"id": 0, "text": "Alice leads frontend development. She works closely with Bob on API contracts."},
-            {"id": 1, "text": "Company picnic was fun. Great weather this year."},
-            {"id": 2, "text": "Bob owns the backend services. He designs APIs that Alice's frontend consumes."},
-            {"id": 3, "text": "Carol manages the infrastructure. She provisions the servers Bob's services run on."},
-            {"id": 4, "text": "New ping pong table in the break room. Tournament next Friday."},
-            {"id": 5, "text": "Dave handles QA. He writes integration tests that cover Alice's UI and Bob's APIs."},
-            {"id": 6, "text": "Eve is the project manager. She coordinates between Alice, Bob, Carol, and Dave."},
-            {"id": 7, "text": "Office plants are dying. Need to assign someone to water them."},
-            {"id": 8, "text": "Alice and Carol paired on improving the CI/CD pipeline. Reduced deploy time by 40%."},
-            {"id": 9, "text": "Dave found a critical bug in Bob's API. Bob fixed it same day."},
-        ],
-        "evidence_ids": [0, 2, 3, 5, 6, 8, 9],
-        "facets": ["alice_role", "bob_role", "carol_role", "dave_role", "eve_role", "alice_bob_interaction", "bob_carol_interaction", "dave_integration"],
-        "facet_memory_map": {"alice_role": 0, "bob_role": 2, "carol_role": 3, "dave_role": 5, "eve_role": 6, "alice_bob_interaction": 0, "bob_carol_interaction": 3, "dave_integration": 5},
-    },
-    {
-        "id": "synthesis_2",
-        "question": "What is the complete customer onboarding process from signup to first value?",
-        "category": "synthesis",
-        "memories": [
-            {"id": 0, "text": "Step 1: Customer signs up via website. Auto-creates account and sends welcome email."},
-            {"id": 1, "text": "Marketing team redesigned the landing page. Conversion rate up 12%."},
-            {"id": 2, "text": "Step 2: Customer success rep schedules onboarding call within 24 hours of signup."},
-            {"id": 3, "text": "Step 3: During onboarding call, rep helps customer import their data and configure integrations."},
-            {"id": 4, "text": "Sales team hit quarterly target. Pizza party celebration."},
-            {"id": 5, "text": "Step 4: Customer gets access to interactive tutorial. Must complete 3 core modules."},
-            {"id": 6, "text": "Step 5: After tutorial completion, customer success checks in at day 7 and day 30."},
-            {"id": 7, "text": "Office AC broken again. Facilities contacted."},
-            {"id": 8, "text": "Churn analysis: Customers who complete onboarding tutorial have 3x higher retention."},
-            {"id": 9, "text": "Support ticket about login issues. Resolved — was a password reset problem."},
-        ],
-        "evidence_ids": [0, 2, 3, 5, 6],
-        "facets": ["signup", "scheduling", "data_import", "tutorial", "followup"],
-        "facet_memory_map": {"signup": 0, "scheduling": 2, "data_import": 3, "tutorial": 5, "followup": 6},
-    },
-]
-def evaluate_combination(selected_ids: List[int], scenario: Dict) -> Dict:
-    """
-    Evaluate how well selected memories combine to answer the question.
-    Returns facet coverage, synergy score, and overall combination quality.
-    """
-    evidence_ids = set(scenario["evidence_ids"])
-    facet_map = scenario["facet_memory_map"]
-    facets = scenario["facets"]
-    selected_set = set(selected_ids)
-    # Facet coverage: what percentage of required facets are covered?
-    covered_facets = []
-    for facet in facets:
-        required_mem = facet_map.get(facet)
-        if required_mem is not None and required_mem in selected_set:
-            covered_facets.append(facet)
-    coverage = len(covered_facets) / len(facets) if facets else 0
-    # Evidence recall: what percentage of evidence memories selected?
-    evidence_found = selected_set & evidence_ids
-    evidence_recall = len(evidence_found) / len(evidence_ids) if evidence_ids else 0
-    # Precision: what percentage of selected are actually evidence?
-    precision = len(evidence_found) / len(selected_set) if selected_set else 0
-    # Noise: non-evidence memories selected
-    noise = len(selected_set - evidence_ids)
-    return {
-        "coverage": coverage,
-        "evidence_recall": evidence_recall,
-        "precision": precision,
-        "noise": noise,
-        "covered_facets": covered_facets,
-        "missing_facets": [f for f in facets if f not in covered_facets],
-        "f1": (2 * precision * evidence_recall / (precision + evidence_recall)
-               if (precision + evidence_recall) > 0 else 0),
-    }
-def run_benchmark(recall_fn, K: int = 5, scenarios: List[Dict] = None) -> Dict:
-    """
-    Run MemCombine benchmark against a recall function.
-    Args:
-        recall_fn: Function(memories, query, K) -> List[int] (selected indices)
-        K: Number of memories to select
-        scenarios: Custom scenarios (uses built-in if None)
-    Returns:
-        Benchmark results with per-scenario and aggregate scores
-    """
-    if scenarios is None:
-        scenarios = SCENARIOS
-    results = []
-    total_coverage = 0
-    total_recall = 0
-    total_f1 = 0
-    perfect = 0
-    for scenario in scenarios:
-        memory_texts = [m["text"] for m in scenario["memories"]]
-        selected = recall_fn(memory_texts, scenario["question"], K)
-        eval_result = evaluate_combination(selected, scenario)
-        results.append({
-            "id": scenario["id"],
-            "category": scenario["category"],
-            "selected": selected,
-            **eval_result,
-        })
-        total_coverage += eval_result["coverage"]
-        total_recall += eval_result["evidence_recall"]
-        total_f1 += eval_result["f1"]
-        if eval_result["coverage"] == 1.0:
-            perfect += 1
-    n = len(scenarios)
-    return {
-        "benchmark": "MemCombine",
-        "n_scenarios": n,
-        "K": K,
-        "avg_coverage": total_coverage / n,
-        "avg_evidence_recall": total_recall / n,
-        "avg_f1": total_f1 / n,
-        "perfect_coverage": perfect,
-        "perfect_coverage_pct": perfect / n * 100,
-        "per_scenario": results,
-    }
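
For readers comparing the README numbers that were removed alongside this file, the following is a condensed sketch of what the deleted `evaluate_combination` computed: facet coverage, evidence recall, precision, and F1. The evidence IDs and facet map are taken from the `synthesis_1` scenario above; the function name `combination_metrics` and the selected IDs are hypothetical, chosen only to show the arithmetic.

```python
def combination_metrics(selected_ids, evidence_ids, facet_memory_map):
    """Condensed restatement of the deleted evaluator's four headline metrics."""
    selected, evidence = set(selected_ids), set(evidence_ids)
    # A facet counts as covered when the memory that carries it was selected.
    covered = [f for f, mem_id in facet_memory_map.items() if mem_id in selected]
    found = selected & evidence
    coverage = len(covered) / len(facet_memory_map) if facet_memory_map else 0.0
    recall = len(found) / len(evidence) if evidence else 0.0
    precision = len(found) / len(selected) if selected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"coverage": coverage, "evidence_recall": recall,
            "precision": precision, "f1": f1}


# Evidence and facet map from the synthesis_1 scenario in the deleted file.
evidence_ids = [0, 1, 4, 5]
facet_memory_map = {
    "frontend_choice": 0, "frontend_reason": 0,
    "backend_choice": 4, "backend_reason": 4,
    "database_choice": 1, "database_reason": 1,
    "deployment_choice": 5,
}

# Hypothetical selection: three evidence memories plus two distractors.
metrics = combination_metrics([0, 1, 4, 8, 9], evidence_ids, facet_memory_map)
print(metrics)
```

With this selection, six of seven facets are covered, three of four evidence memories are found, and three of five selected memories are evidence. Note that the "Perfect" column in the removed README table corresponds to `coverage == 1.0` in `run_benchmark`, not to a perfect F1.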