PyPI - agmem - Versions diffs - 0.2.0__tar.gz → 0.2.1__tar.gz - Mend

agmem 0.2.0__tar.gz → 0.2.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (169)

{agmem-0.2.0/agmem.egg-info → agmem-0.2.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agmem
-Version: 0.2.0
+Version: 0.2.1
 Summary: Agentic Memory Version Control System - Git for AI agent memories
 Home-page: https://github.com/vivek-tiwari-vt/agmem
 Author: agmem Team
@@ -137,14 +137,15 @@ agmem solves all of these problems with a familiar Git-like interface.
 - ✅ **Tamper-evident audit trail** — Append-only hash-chained log (init, add, commit, checkout, merge, push, pull, config); `agmem audit` and `agmem audit --verify`
 - ✅ **Multi-agent trust** — Trust store (full / conditional / untrusted) per public key; applied on pull/merge; clone copies remote keys
 - ✅ **Conflict resolution** — `agmem resolve` with ours/theirs/both; conflicts persisted in `.mem/merge/`; path-safe
-- ✅ **Differential privacy** — Epsilon/delta budget in `.mem/privacy_budget.json`; `--private` on `agmem distill` and `agmem garden`; noise applied to counts and frontmatter
+- ✅ **Differential privacy** — Epsilon/delta budget in `.mem/privacy_budget.json`; `--private` on `agmem distill` and `agmem garden`; noise applies to fact-level data only (metadata fields excluded)
 - ✅ **Pack files & GC** — `agmem gc [--repack]` (reachable from refs, prune loose, optional pack file + index); ObjectStore reads from pack when loose missing
 - ✅ **Multi-provider LLM** — OpenAI and Anthropic via `memvcs.core.llm`; config/repo or env; used by gardener, distiller, consistency, merge
 - ✅ **Temporal querying** — Point-in-time and range queries in temporal index; frontmatter timestamps
-- ✅ **Federated collaboration** — `agmem federated push|pull`; real summaries (topic counts, fact hashes); optional DP on outbound; coordinator API in docs/FEDERATED.md
+- ✅ **Federated collaboration** — `agmem federated push|pull`; protocol-compliant summaries (agent_id, timestamp, topic_counts, fact_hashes); optional DP on outbound; coordinator API in docs/FEDERATED.md
 - ✅ **Zero-knowledge proofs** — `agmem prove` (hash/signature-based): keyword containment (Merkle set membership), memory freshness (signed timestamp). **Note:** Current implementation is proof-of-knowledge with known limitations; see docs for migration to true zk-SNARKs.
 - ✅ **Daemon health** — 4-point health monitoring (storage, redundancy, staleness, graph consistency) with periodic checks; visible warnings and JSON reports
-- ✅ **Delta encoding** — 5-10x compression for similar objects using Levenshtein distance and SequenceMatcher; optional feature in pack files
+- ✅ **Delta encoding** — 5-10x compression for similar objects using Levenshtein distance and SequenceMatcher; enabled in GC repack with multi-tier similarity filtering
+- ✅ **Performance safeguards** — Multi-tier similarity filter (length ratio + SimHash) avoids O(n²×m²) worst-case comparisons
 - ✅ **GPU acceleration** — Vector store detects GPU for embedding model when available
 - ✅ **Optional** — `serve`, `daemon` (watch + auto-commit), `garden` (episode archival), MCP server; install extras as needed
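
The differential privacy bullet in this hunk says noise now applies to fact-level data only, with metadata fields excluded. A minimal sketch of that split is below. It is not agmem's `add_noise` implementation: the field set `FACT_LEVEL_FIELDS`, the function names, and the use of a Laplace mechanism (pure epsilon, no delta) are all assumptions made for illustration.

```python
import random

# Hypothetical: which summary fields count as fact-level data.
FACT_LEVEL_FIELDS = {"topic_counts"}


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noise_summary(summary, epsilon, sensitivity=1.0, rng=None):
    """Return a copy of `summary` with noise on fact-level counts only."""
    if rng is None:
        rng = random.Random()  # unseeded, so noise differs between runs
    scale = sensitivity / epsilon
    noised = dict(summary)  # metadata fields are copied through untouched
    for field in FACT_LEVEL_FIELDS:
        if field in summary:
            noised[field] = {
                key: max(0, round(count + laplace_noise(scale, rng)))
                for key, count in summary[field].items()
            }
    return noised


example = {
    "agent_id": "abc",
    "confidence_score": 0.8,  # metadata: left exactly as-is
    "topic_counts": {"episodic": 12, "semantic": 3},
}
print(noise_summary(example, epsilon=1.0))
```

Counts are clamped at zero and rounded so the output still looks like a count; a real implementation would also debit the epsilon spent from the budget file.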

{agmem-0.2.0 → agmem-0.2.1}/README.md RENAMED

@@ -37,14 +37,15 @@ agmem solves all of these problems with a familiar Git-like interface.
 - ✅ **Tamper-evident audit trail** — Append-only hash-chained log (init, add, commit, checkout, merge, push, pull, config); `agmem audit` and `agmem audit --verify`
 - ✅ **Multi-agent trust** — Trust store (full / conditional / untrusted) per public key; applied on pull/merge; clone copies remote keys
 - ✅ **Conflict resolution** — `agmem resolve` with ours/theirs/both; conflicts persisted in `.mem/merge/`; path-safe
-- ✅ **Differential privacy** — Epsilon/delta budget in `.mem/privacy_budget.json`; `--private` on `agmem distill` and `agmem garden`; noise applied to counts and frontmatter
+- ✅ **Differential privacy** — Epsilon/delta budget in `.mem/privacy_budget.json`; `--private` on `agmem distill` and `agmem garden`; noise applies to fact-level data only (metadata fields excluded)
 - ✅ **Pack files & GC** — `agmem gc [--repack]` (reachable from refs, prune loose, optional pack file + index); ObjectStore reads from pack when loose missing
 - ✅ **Multi-provider LLM** — OpenAI and Anthropic via `memvcs.core.llm`; config/repo or env; used by gardener, distiller, consistency, merge
 - ✅ **Temporal querying** — Point-in-time and range queries in temporal index; frontmatter timestamps
-- ✅ **Federated collaboration** — `agmem federated push|pull`; real summaries (topic counts, fact hashes); optional DP on outbound; coordinator API in docs/FEDERATED.md
+- ✅ **Federated collaboration** — `agmem federated push|pull`; protocol-compliant summaries (agent_id, timestamp, topic_counts, fact_hashes); optional DP on outbound; coordinator API in docs/FEDERATED.md
 - ✅ **Zero-knowledge proofs** — `agmem prove` (hash/signature-based): keyword containment (Merkle set membership), memory freshness (signed timestamp). **Note:** Current implementation is proof-of-knowledge with known limitations; see docs for migration to true zk-SNARKs.
 - ✅ **Daemon health** — 4-point health monitoring (storage, redundancy, staleness, graph consistency) with periodic checks; visible warnings and JSON reports
-- ✅ **Delta encoding** — 5-10x compression for similar objects using Levenshtein distance and SequenceMatcher; optional feature in pack files
+- ✅ **Delta encoding** — 5-10x compression for similar objects using Levenshtein distance and SequenceMatcher; enabled in GC repack with multi-tier similarity filtering
+- ✅ **Performance safeguards** — Multi-tier similarity filter (length ratio + SimHash) avoids O(n²×m²) worst-case comparisons
 - ✅ **GPU acceleration** — Vector store detects GPU for embedding model when available
 - ✅ **Optional** — `serve`, `daemon` (watch + auto-commit), `garden` (episode archival), MCP server; install extras as needed
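
The "Performance safeguards" bullet in this hunk describes a multi-tier filter (length ratio, then SimHash) that runs before the expensive comparison. The sketch below shows one way such a filter could be layered. It is not the contents of `memvcs/core/fast_similarity.py`; the function names and the thresholds (0.5 length ratio, 64-bit fingerprint, Hamming distance 16) are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def simhash(text: str, bits: int = 64) -> int:
    """Build a SimHash fingerprint from whitespace-separated tokens."""
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def likely_similar(a: str, b: str, min_len_ratio=0.5, max_hamming=16) -> bool:
    """Cheap pre-checks that reject obviously dissimilar pairs."""
    longer = max(len(a), len(b))
    if longer == 0:
        return True
    # Tier 1: very different lengths cannot delta-compress well.
    if min(len(a), len(b)) / longer < min_len_ratio:
        return False
    # Tier 2: compare fingerprints instead of full contents.
    return hamming(simhash(a), simhash(b)) <= max_hamming


def similarity(a: str, b: str) -> float:
    """Run the expensive comparison only for pairs that pass the filter."""
    if not likely_similar(a, b):
        return 0.0
    return SequenceMatcher(None, a, b).ratio()
```

Both tiers cost far less than an edit-distance computation on the full contents, so most dissimilar pairs are discarded before the quadratic work starts.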

{agmem-0.2.0 → agmem-0.2.1/agmem.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agmem
-Version: 0.2.0
+Version: 0.2.1
 Summary: Agentic Memory Version Control System - Git for AI agent memories
 Home-page: https://github.com/vivek-tiwari-vt/agmem
 Author: agmem Team
@@ -137,14 +137,15 @@ agmem solves all of these problems with a familiar Git-like interface.
 - ✅ **Tamper-evident audit trail** — Append-only hash-chained log (init, add, commit, checkout, merge, push, pull, config); `agmem audit` and `agmem audit --verify`
 - ✅ **Multi-agent trust** — Trust store (full / conditional / untrusted) per public key; applied on pull/merge; clone copies remote keys
 - ✅ **Conflict resolution** — `agmem resolve` with ours/theirs/both; conflicts persisted in `.mem/merge/`; path-safe
-- ✅ **Differential privacy** — Epsilon/delta budget in `.mem/privacy_budget.json`; `--private` on `agmem distill` and `agmem garden`; noise applied to counts and frontmatter
+- ✅ **Differential privacy** — Epsilon/delta budget in `.mem/privacy_budget.json`; `--private` on `agmem distill` and `agmem garden`; noise applies to fact-level data only (metadata fields excluded)
 - ✅ **Pack files & GC** — `agmem gc [--repack]` (reachable from refs, prune loose, optional pack file + index); ObjectStore reads from pack when loose missing
 - ✅ **Multi-provider LLM** — OpenAI and Anthropic via `memvcs.core.llm`; config/repo or env; used by gardener, distiller, consistency, merge
 - ✅ **Temporal querying** — Point-in-time and range queries in temporal index; frontmatter timestamps
-- ✅ **Federated collaboration** — `agmem federated push|pull`; real summaries (topic counts, fact hashes); optional DP on outbound; coordinator API in docs/FEDERATED.md
+- ✅ **Federated collaboration** — `agmem federated push|pull`; protocol-compliant summaries (agent_id, timestamp, topic_counts, fact_hashes); optional DP on outbound; coordinator API in docs/FEDERATED.md
 - ✅ **Zero-knowledge proofs** — `agmem prove` (hash/signature-based): keyword containment (Merkle set membership), memory freshness (signed timestamp). **Note:** Current implementation is proof-of-knowledge with known limitations; see docs for migration to true zk-SNARKs.
 - ✅ **Daemon health** — 4-point health monitoring (storage, redundancy, staleness, graph consistency) with periodic checks; visible warnings and JSON reports
-- ✅ **Delta encoding** — 5-10x compression for similar objects using Levenshtein distance and SequenceMatcher; optional feature in pack files
+- ✅ **Delta encoding** — 5-10x compression for similar objects using Levenshtein distance and SequenceMatcher; enabled in GC repack with multi-tier similarity filtering
+- ✅ **Performance safeguards** — Multi-tier similarity filter (length ratio + SimHash) avoids O(n²×m²) worst-case comparisons
 - ✅ **GPU acceleration** — Vector store detects GPU for embedding model when available
 - ✅ **Optional** — `serve`, `daemon` (watch + auto-commit), `garden` (episode archival), MCP server; install extras as needed

{agmem-0.2.0 → agmem-0.2.1}/agmem.egg-info/SOURCES.txt RENAMED

@@ -12,18 +12,9 @@ agmem.egg-info/top_level.txt
 docs/AGMEM_PUBLISHING_SETUP.md
 docs/CONFIG.md
 docs/FEDERATED.md
-docs/FINAL_COMPLETION_REPORT.md
-docs/FINAL_STATUS_REPORT.md
 docs/GTM.md
-docs/HEALTH_MONITORING.md
-docs/IMPLEMENTATION_COMPLETE_SUMMARY.md
 docs/KNOWLEDGE_GRAPH.md
-docs/PACKAGE_UPDATES_VERIFICATION.md
-docs/QUICK_REFERENCE.md
 docs/SEQUENTIAL_VALIDATION.md
-docs/STEP10_SOLID_REFACTORING_COMPLETION.md
-docs/STEP8_HEALTH_MONITORING_COMPLETION.md
-docs/STEP9_DELTA_ENCODING_COMPLETION.md
 docs/TEST_REPORT.md
 docs/aux/INSTALL.md
 examples/basic_workflow.sh
@@ -79,6 +70,7 @@ memvcs/coordinator/server.py
 memvcs/core/__init__.py
 memvcs/core/access_index.py
 memvcs/core/audit.py
+memvcs/core/compression_metrics.py
 memvcs/core/compression_pipeline.py
 memvcs/core/config_loader.py
 memvcs/core/consistency.py
@@ -89,6 +81,7 @@ memvcs/core/delta.py
 memvcs/core/diff.py
 memvcs/core/distiller.py
 memvcs/core/encryption.py
+memvcs/core/fast_similarity.py
 memvcs/core/federated.py
 memvcs/core/gardener.py
 memvcs/core/hooks.py
@@ -99,6 +92,8 @@ memvcs/core/objects.py
 memvcs/core/pack.py
 memvcs/core/pii_scanner.py
 memvcs/core/privacy_budget.py
+memvcs/core/privacy_validator.py
+memvcs/core/protocol_builder.py
 memvcs/core/refs.py
 memvcs/core/remote.py
 memvcs/core/repository.py
@@ -141,6 +136,7 @@ tests/test_consistency.py
 tests/test_crypto_verify.py
 tests/test_decay.py
 tests/test_delta_encoding.py
+tests/test_distiller_dp.py
 tests/test_edge_cases.py
 tests/test_encryption.py
 tests/test_federated.py
@@ -150,6 +146,7 @@ tests/test_ipfs_remote.py
 tests/test_llm_provider.py
 tests/test_objects.py
 tests/test_pack_gc.py
+tests/test_performance_benchmarks.py
 tests/test_pii.py
 tests/test_plan_features.py
 tests/test_privacy_budget.py

{agmem-0.2.0 → agmem-0.2.1}/docs/FEDERATED.md RENAMED

@@ -23,7 +23,7 @@ In `.mem/config.json` or user config:
 - `coordinator_url`: Base URL of the coordinator (no trailing slash).
 - `memory_types`: Which memory dirs to include in the summary.
-- `differential_privacy.enabled`: If true, numeric fields in the summary are noised before push.
+- `differential_privacy.enabled`: If true, fact-level numeric fields are noised before push (metadata is exempt).
 ## Coordinator API
@@ -33,15 +33,22 @@ The coordinator must expose two endpoints.
 **Request**
-- Body: JSON object (local summary from `produce_local_summary`).
+- Body: JSON object (protocol-compliant summary envelope).
 - `Content-Type: application/json`.
 **Summary shape**
+Top-level envelope:
+- `summary`: object containing the fields below.
+Summary fields:
+- `agent_id`: deterministic client identifier (SHA-256).
+- `timestamp`: ISO-8601 UTC timestamp.
 - `memory_types`: list of strings (e.g. `["episodic", "semantic"]`).
-- `topics`: dict of memory type → integer count (file count per type; may be noised if DP enabled).
-- `topic_hashes`: dict of memory type → list of topic labels (no raw content).
-- `fact_count`: integer (total fact/file count; may be noised if DP enabled).
+- `topic_counts`: dict of memory type → integer count (may be noised if DP enabled).
+- `fact_hashes`: list of strings (hashes; no raw content).
 **Response**
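
The new summary shape documented in this hunk (a top-level `summary` envelope holding `agent_id`, `timestamp`, `memory_types`, `topic_counts`, `fact_hashes`) can be sketched as follows. This is illustrative only and is not `memvcs/core/protocol_builder.py`: the function name, the `agent_seed` input, and the choice to hash each fact string with SHA-256 are assumptions. The diff states only that `agent_id` is a deterministic SHA-256 identifier and that `fact_hashes` carries no raw content.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_summary_envelope(agent_seed: str, facts_by_type: dict) -> dict:
    """Build an envelope in the documented shape.

    `facts_by_type` maps memory type -> list of raw fact strings
    (a hypothetical input format chosen for this example).
    """
    # Deterministic: the same seed always yields the same identifier.
    agent_id = hashlib.sha256(agent_seed.encode("utf-8")).hexdigest()
    # Only hashes leave the client, never the raw fact text.
    fact_hashes = sorted(
        hashlib.sha256(fact.encode("utf-8")).hexdigest()
        for facts in facts_by_type.values()
        for fact in facts
    )
    return {
        "summary": {
            "agent_id": agent_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "memory_types": sorted(facts_by_type),
            "topic_counts": {t: len(f) for t, f in facts_by_type.items()},
            "fact_hashes": fact_hashes,
        }
    }


envelope = build_summary_envelope(
    "agent-1", {"episodic": ["met Alice"], "semantic": ["Paris is in France"]}
)
print(json.dumps(envelope, indent=2))
```

The result serializes directly to the JSON request body described under **Request**.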

{agmem-0.2.0 → agmem-0.2.1}/docs/TEST_REPORT.md RENAMED

@@ -1,6 +1,8 @@
 # agmem Test Report — What Works and What Doesn’t
-This report is based on full-flow tests (`scripts/test_full_flow.py`), manual command runs, security fixes, and the knowledge graph feature.
+This report is based on full-flow tests (`scripts/test_full_flow.py`), automated pytest runs, manual command runs, security fixes, and the knowledge graph feature.
+**Latest automated run (2026-02-01):** 246 passed, 5 skipped in ~45s.
 ---
@@ -82,11 +84,13 @@ This report is based on full-flow tests (`scripts/test_full_flow.py`), manual co
 |------|--------|------|
 | **Crypto** | ✅ Tests | Merkle build/verify, tampered blob fails verification, signature present but no public key. |
 | **Encryption** | ✅ Tests | Round-trip, wrong key fails, corrupted ciphertext fails. |
-| **Privacy budget** | ✅ Tests | load_budget, spend_epsilon, add_noise, Gardener/Distiller DP integration (mocked). |
+| **Privacy budget** | ✅ Tests | load_budget, spend_epsilon, add_noise, Gardener/Distiller DP integration (mocked), DP sampling (no fixed seed), metadata fields exempted from noise. |
 | **Pack/GC** | ✅ Tests | list_loose_objects, run_gc, write_pack, retrieve_from_pack, ObjectStore read from pack, run_repack dry-run. |
 | **ZK proofs** | ✅ Tests | prove_keyword_containment / verify_proof round-trip; keyword not in file returns False; freshness (skipped without signing key). |
-| **Federated** | ✅ Tests | produce_local_summary (topic_hashes, fact_count), DP noising; push/pull with mock coordinator. |
+| **Federated** | ✅ Tests | protocol-compliant summary (agent_id, timestamp, topic_counts, fact_hashes), DP noising; push/pull with mock coordinator. |
 | **IPFS** | ✅ Tests | parse_ipfs_url, bundle/unbundle round-trip; push/pull with mock gateway. |
+| **Protocol & privacy** | ✅ Tests | schema validation, privacy audit (metadata noise rejection), strict mode enforcement. |
+| **Performance** | ✅ Tests | Levenshtein, SimHash, multi-tier similarity filtering regression checks. |
 ### Security (vulnerability check)

{agmem-0.2.0 → agmem-0.2.1}/memvcs/__init__.py RENAMED

@@ -4,6 +4,6 @@ agmem - Agentic Memory Version Control System
 A Git-inspired version control system for AI agent memory artifacts.
 """
-__version__ = "0.1.6"
+__version__ = "0.2.1"
 __author__ = "agmem Team"
 __license__ = "MIT"

{agmem-0.2.0 → agmem-0.2.1}/memvcs/cli.py RENAMED

@@ -141,7 +141,7 @@ For more information: https://github.com/vivek-tiwari-vt/agmem
         """,
     )
-    parser.add_argument("--version", "-v", action="version", version="%(prog)s 0.1.0")
+    parser.add_argument("--version", "-v", action="version", version="%(prog)s 0.2.1")
     parser.add_argument("--verbose", action="store_true", help="Enable verbose output")

{agmem-0.2.0 → agmem-0.2.1}/memvcs/coordinator/server.py RENAMED

@@ -21,6 +21,7 @@ from typing import Dict, List, Optional, Any
 from pathlib import Path
 import json
 import hashlib
+import re
 try:
     from fastapi import FastAPI, HTTPException, Request
@@ -39,10 +40,25 @@ except ImportError:
         return None
+def _get_version() -> str:
+    """Get agmem version from pyproject.toml. Falls back to 0.2.1 if not found."""
+    try:
+        pyproject_path = Path(__file__).parent.parent.parent / "pyproject.toml"
+        if pyproject_path.exists():
+            content = pyproject_path.read_text()
+            match = re.search(r'version\s*=\s*"([^"]+)"', content)
+            if match:
+                return match.group(1)
+    except Exception:
+        pass
+    return "0.2.1"
 # Storage: In-memory for simplicity (use Redis/PostgreSQL for production)
 summaries_store: Dict[str, List[Dict[str, Any]]] = {}
+_version = _get_version()
 metadata_store: Dict[str, Any] = {
-    "coordinator_version": "0.1.6",
+    "coordinator_version": _version,
     "started_at": datetime.now(timezone.utc).isoformat(),
     "total_pushes": 0,
     "total_agents": 0,
@@ -79,7 +95,7 @@ if FASTAPI_AVAILABLE:
     app = FastAPI(
         title="agmem Federated Coordinator",
         description="Minimal coordinator for federated agent memory collaboration",
-        version="0.1.6",
+        version=_version,
     )
     @app.get("/")
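
The `_get_version` helper added in this file searches `pyproject.toml` with an unanchored pattern, so the first `version = "..."` substring in the file wins. The sketch below compares that pattern with a line-anchored variant on a made-up file. The sample content and the `first_match` helper are hypothetical; they are not taken from the agmem repository.

```python
import re

# Hypothetical pyproject.toml where a tool section precedes [project].
SAMPLE = '''\
[tool.mypy]
python_version = "3.8"

[project]
name = "agmem"
version = "0.2.1"
'''

# Same pattern as in _get_version: matches inside "python_version" too.
UNANCHORED = re.compile(r'version\s*=\s*"([^"]+)"')
# Variant that only matches a key literally named "version" at line start.
ANCHORED = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def first_match(pattern: re.Pattern, text: str, default: str = "unknown") -> str:
    """Return the first captured version string, or a default."""
    match = pattern.search(text)
    return match.group(1) if match else default


unanchored_result = first_match(UNANCHORED, SAMPLE)
anchored_result = first_match(ANCHORED, SAMPLE)
print(unanchored_result, anchored_result)
```

Whether this matters depends on the layout of the project's actual `pyproject.toml`. For an installed package, `importlib.metadata.version("agmem")` from the standard library is another option that does not depend on the source tree being present.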

agmem-0.2.1/memvcs/core/compression_metrics.py ADDED

@@ -0,0 +1,248 @@
+"""
+Delta compression metrics and observability.
+Tracks compression effectiveness across object types to enable future
+optimization and auto-tuning of delta encoding parameters.
+Provides:
+- DeltaCompressionMetrics: Tracks compression ratio, object types, benefits
+- CompressionHeatmap: Visualizes which types compress best
+- Statistics reporting for gc --repack operations
+"""
+from dataclasses import dataclass, field
+from typing import Dict, List, Any, Optional, Tuple
+from collections import defaultdict
+@dataclass
+class ObjectCompressionStats:
+    """Statistics for a single object's compression."""
+    object_id: str
+    object_type: str  # "semantic", "episodic", "procedural"
+    original_size: int  # bytes
+    compressed_size: int  # bytes after delta encoding
+    compression_ratio: float  # compressed_size / original_size (0.0 = 100% compression)
+    delta_used: bool  # Whether delta encoding was applied
+    compression_benefit: float  # original_size - compressed_size
+@dataclass
+class TypeCompressionStats:
+    """Aggregated statistics for an object type."""
+    object_type: str
+    count: int = 0
+    total_original_size: int = 0
+    total_compressed_size: int = 0
+    avg_compression_ratio: float = 0.0
+    total_benefit: int = 0  # Total bytes saved
+    objects_with_delta: int = 0  # How many used delta encoding
+    min_ratio: float = 1.0
+    max_ratio: float = 0.0
+    def update_from_object(self, obj_stats: ObjectCompressionStats) -> None:
+        """Update type stats with a single object's stats."""
+        self.count += 1
+        self.total_original_size += obj_stats.original_size
+        self.total_compressed_size += obj_stats.compressed_size
+        self.total_benefit += int(obj_stats.compression_benefit)
+        if obj_stats.delta_used:
+            self.objects_with_delta += 1
+        self.min_ratio = min(self.min_ratio, obj_stats.compression_ratio)
+        self.max_ratio = max(self.max_ratio, obj_stats.compression_ratio)
+        # Recalculate average
+        if self.total_original_size > 0:
+            self.avg_compression_ratio = self.total_compressed_size / self.total_original_size
+    def to_dict(self) -> Dict[str, Any]:
+        """Convert to dict for reporting."""
+        savings_pct = 0.0
+        if self.total_original_size > 0:
+            savings_pct = (self.total_benefit / self.total_original_size) * 100
+        return {
+            "object_type": self.object_type,
+            "count": self.count,
+            "total_original_bytes": self.total_original_size,
+            "total_compressed_bytes": self.total_compressed_size,
+            "avg_compression_ratio": round(self.avg_compression_ratio, 3),
+            "compression_range": f"{self.min_ratio:.1%} - {self.max_ratio:.1%}",
+            "total_bytes_saved": self.total_benefit,
+            "savings_percentage": round(savings_pct, 1),
+            "objects_using_delta": self.objects_with_delta,
+            "delta_adoption_rate": (
+                round((self.objects_with_delta / self.count * 100), 1) if self.count > 0 else 0
+            ),
+        }
+class DeltaCompressionMetrics:
+    """Tracks delta compression statistics across all objects.
+    Usage:
+        metrics = DeltaCompressionMetrics()
+        # ... during packing ...
+        metrics.record_object(ObjectCompressionStats(...))
+        # ... after packing ...
+        report = metrics.get_report()
+    """
+    def __init__(self):
+        self.objects: List[ObjectCompressionStats] = []
+        self.type_stats: Dict[str, TypeCompressionStats] = {}
+        self.total_original_size: int = 0
+        self.total_compressed_size: int = 0
+    def record_object(self, obj_stats: ObjectCompressionStats) -> None:
+        """Record compression stats for a single object."""
+        self.objects.append(obj_stats)
+        self.total_original_size += obj_stats.original_size
+        self.total_compressed_size += obj_stats.compressed_size
+        # Update type-specific stats
+        if obj_stats.object_type not in self.type_stats:
+            self.type_stats[obj_stats.object_type] = TypeCompressionStats(
+                object_type=obj_stats.object_type
+            )
+        self.type_stats[obj_stats.object_type].update_from_object(obj_stats)
+    def get_type_stats(self, object_type: str) -> Optional[TypeCompressionStats]:
+        """Get stats for a specific object type."""
+        return self.type_stats.get(object_type)
+    def get_overall_ratio(self) -> float:
+        """Get overall compression ratio across all objects."""
+        if self.total_original_size == 0:
+            return 0.0
+        return self.total_compressed_size / self.total_original_size
+    def get_overall_savings(self) -> int:
+        """Get total bytes saved across all objects."""
+        return self.total_original_size - self.total_compressed_size
+    def get_report(self) -> Dict[str, Any]:
+        """Generate a comprehensive compression report."""
+        overall_ratio = self.get_overall_ratio()
+        overall_savings = self.get_overall_savings()
+        savings_pct = (
+            (overall_savings / self.total_original_size * 100)
+            if self.total_original_size > 0
+            else 0
+        )
+        return {
+            "timestamp": None,  # Set by caller if needed
+            "total_objects": len(self.objects),
+            "total_original_bytes": self.total_original_size,
+            "total_compressed_bytes": self.total_compressed_size,
+            "overall_compression_ratio": round(overall_ratio, 3),
+            "total_bytes_saved": overall_savings,
+            "compression_percentage": round(savings_pct, 1),
+            "type_statistics": {otype: stats.to_dict() for otype, stats in self.type_stats.items()},
+            "recommendations": self._generate_recommendations(),
+        }
+    def _generate_recommendations(self) -> List[str]:
+        """Generate optimization recommendations based on compression stats."""
+        recommendations = []
+        # Check if delta encoding is worth it
+        objects_with_delta = sum(s.objects_with_delta for s in self.type_stats.values())
+        if objects_with_delta == 0:
+            recommendations.append("No objects used delta encoding. Check similarity thresholds.")
+        # Check for types with poor compression
+        for otype, stats in self.type_stats.items():
+            if stats.count > 0 and stats.avg_compression_ratio > 0.9:
+                recommendations.append(
+                    f"Type '{otype}' compresses poorly (ratio: {stats.avg_compression_ratio:.1%}). "
+                    f"Consider increasing similarity threshold or reducing delta cost."
+                )
+        # Check for types with excellent compression
+        for otype, stats in self.type_stats.items():
+            if stats.count > 0 and stats.avg_compression_ratio < 0.5:
+                recommendations.append(
+                    f"Type '{otype}' compresses very well (ratio: {stats.avg_compression_ratio:.1%}). "
+                    f"Consider aggressive delta encoding or reduced threshold."
+                )
+        if not recommendations:
+            recommendations.append("Compression is operating normally.")
+        return recommendations
+    def get_heatmap(self) -> str:
+        """Generate a text-based compression heatmap."""
+        lines = ["Delta Compression Heatmap", "=" * 50]
+        if not self.type_stats:
+            lines.append("No compression data available")
+            return "\n".join(lines)
+        # Sort by compression ratio
+        sorted_types = sorted(
+            self.type_stats.values(),
+            key=lambda s: s.avg_compression_ratio,
+        )
+        for stats in sorted_types:
+            if stats.count == 0:
+                continue
+            ratio = stats.avg_compression_ratio
+            # Create a simple bar chart
+            bar_width = 30
+            filled = int(bar_width * ratio)
+            bar = "█" * filled + "░" * (bar_width - filled)
+            saved_pct = (
+                (stats.total_benefit / stats.total_original_size * 100)
+                if stats.total_original_size > 0
+                else 0
+            )
+            lines.append(
+                f"{stats.object_type:12} {bar} {saved_pct:5.1f}% saved ({stats.objects_with_delta}/{stats.count} using delta)"
+            )
+        return "\n".join(lines)
+    def log_report(self, logger: Any = None) -> None:
+        """Log the compression report."""
+        report = self.get_report()
+        heatmap = self.get_heatmap()
+        output = [
+            "=" * 70,
+            "Delta Compression Report",
+            "=" * 70,
+            f"Total Objects: {report['total_objects']}",
+            f"Total Original: {report['total_original_bytes']:,} bytes",
+            f"Total Compressed: {report['total_compressed_bytes']:,} bytes",
+            f"Overall Ratio: {report['overall_compression_ratio']:.1%}",
+            f"Bytes Saved: {report['total_bytes_saved']:,} ({report['compression_percentage']:.1f}%)",
+            "",
+            heatmap,
+            "",
+            "Type Breakdown:",
+        ]
+        for otype, stats in sorted(report["type_statistics"].items()):
+            output.append(f"  {otype}:")
+            output.append(f"    Count: {stats['count']}")
+            output.append(f"    Compression: {stats['avg_compression_ratio']:.1%}")
+            output.append(f"    Saved: {stats['total_bytes_saved']:,} bytes")
+            output.append(f"    Delta adoption: {stats['delta_adoption_rate']:.0f}%")
+        output.extend(["", "Recommendations:"])
+        for rec in report["recommendations"]:
+            output.append(f"  - {rec}")
+        output.append("=" * 70)
+        full_output = "\n".join(output)
+        if logger:
+            logger.info(full_output)
+        else:
+            print(full_output)
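
The aggregation in `TypeCompressionStats.update_from_object` computes `avg_compression_ratio` as total compressed bytes over total original bytes, which is a size-weighted ratio rather than a mean of per-object ratios. The standalone sketch below reproduces only that arithmetic on made-up sizes. `Sample` and `summarize` are hypothetical names and do not exist in the module.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    object_type: str
    original_size: int    # bytes before delta encoding
    compressed_size: int  # bytes after delta encoding


def summarize(samples):
    """Aggregate per-type totals and derive ratio and savings."""
    totals = {}
    for s in samples:
        orig, comp = totals.get(s.object_type, (0, 0))
        totals[s.object_type] = (orig + s.original_size, comp + s.compressed_size)
    report = {}
    for otype, (orig, comp) in totals.items():
        ratio = comp / orig if orig > 0 else 0.0
        saved_pct = (orig - comp) / orig * 100 if orig > 0 else 0.0
        report[otype] = {
            "avg_compression_ratio": round(ratio, 3),
            "total_bytes_saved": orig - comp,
            "savings_percentage": round(saved_pct, 1),
        }
    return report


report = summarize([
    Sample("semantic", 1000, 200),   # per-object ratio 0.2
    Sample("semantic", 3000, 1500),  # per-object ratio 0.5
    Sample("episodic", 500, 475),    # per-object ratio 0.95
])
print(report)
```

With these numbers the semantic ratio is 1700 / 4000 = 0.425, not the 0.35 a plain mean of 0.2 and 0.5 would give. Against the thresholds in `_generate_recommendations`, 0.425 falls under 0.5 (compresses very well) and the episodic 0.95 exceeds 0.9 (compresses poorly).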

{agmem-0.2.0 → agmem-0.2.1}/memvcs/core/distiller.py RENAMED

@@ -211,7 +211,6 @@ class Distiller:
         # Sample facts with noise - prevents any single episode from dominating
         import random
-        random.seed(42)  # Deterministic but different per cluster due to content
         sampled = random.sample(facts, min(noisy_count, len(facts)))
         # Optional: Add slight noise to fact embeddings if vector store available
@@ -233,17 +232,9 @@ class Distiller:
             out_path = self.target_dir / f"consolidated-{ts}.md"
         confidence_score = self.config.extraction_confidence_threshold
-        if (
-            self.config.use_dp
-            and self.config.dp_epsilon is not None
-            and self.config.dp_delta is not None
-        ):
-            from .privacy_budget import add_noise
-            confidence_score = add_noise(
-                confidence_score, 0.1, self.config.dp_epsilon, self.config.dp_delta
-            )
-            confidence_score = max(0.0, min(1.0, confidence_score))
+        # Metadata noise removed: confidence_score is a metadata field (threshold setting),
+        # not an individual fact. Adding noise to metadata doesn't provide meaningful
+        # privacy guarantees. See privacy_validator.py for the distinction.
         frontmatter = {
             "schema_version": "1.0",
             "last_updated": datetime.utcnow().isoformat() + "Z",
