prism-mcp-server 4.6.0 → 5.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/README.md +271 -71
package/dist/dashboard/server.js +240 -7
package/dist/dashboard/ui.js +198 -16
package/dist/server.js +22 -3
package/dist/storage/sqlite.js +247 -6
package/dist/storage/supabase.js +58 -0
package/dist/storage/supabaseMigrations.js +86 -1
package/dist/tools/index.js +2 -2
package/dist/tools/sessionMemoryDefinitions.js +63 -0
package/dist/tools/sessionMemoryHandlers.js +99 -5
package/dist/utils/turboquant.js +730 -0
package/package.json +2 -2

package/README.md CHANGED

@@ -8,12 +8,14 @@
 [![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
 [![Node.js](https://img.shields.io/badge/Node.js-18+-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
-> **Your AI agent's memory that survives between sessions.** Prism MCP is a Model Context Protocol server that gives Claude Desktop, Cursor, Windsurf, and any MCP client **persistent memory**, **time travel**, **visual context**, **multi-agent sync**, **GDPR-compliant deletion**, **memory tracing**, and **LangChain integration** — all running locally with zero cloud dependencies.
+> **Your AI agent's memory that survives between sessions.** Prism MCP is a Model Context Protocol server that gives Claude Desktop, Cursor, Windsurf, and any MCP client **persistent memory**, **time travel**, **visual context**, **multi-agent sync**, **GDPR-compliant deletion**, **memory tracing**, **quantized vector compression**, and **LangChain integration** — all running locally with zero cloud dependencies.
 >
-> Built with **SQLite + F32_BLOB vector search**, **optimistic concurrency control**, **MCP Prompts & Resources**, **auto-compaction**, **Gemini-powered Morning Briefings**, **MemoryTrace explainability**, and optional **Supabase cloud sync**.
+> Built with **SQLite + F32_BLOB vector search**, **TurboQuant 10× embedding compression**, **optimistic concurrency control**, **MCP Prompts & Resources**, **auto-compaction**, **Gemini-powered Morning Briefings**, **MemoryTrace explainability**, and optional **Supabase cloud sync**.
 ## Table of Contents
+- [What's New (v5.1.0)](#whats-new-in-v510--deep-storage--knowledge-graph-)
+- [What's New (v5.0.0)](#whats-new-in-v500--quantized-agentic-memory-)
 - [What's New (v4.6.0)](#whats-new-in-v460--opentelemetry-observability-)
 - [Multi-Instance Support](#multi-instance-support)
 - [How Prism Compares](#how-prism-compares)
@@ -23,7 +25,7 @@
 - [Claude Code Integration (Hooks)](#claude-code-integration-hooks)
 - [Gemini / Antigravity Integration](#gemini--antigravity-integration)
 - [Use Cases](#use-cases)
-- [Architecture](#architecture)
+- [Architecture](#architecture) | [Full Architecture Guide](docs/ARCHITECTURE.md) | [Self-Improving Agent Guide](docs/self-improving-agent.md)
 - [Tool Reference](#tool-reference)
 - [Agent Hivemind — Role Usage](#agent-hivemind--role-usage)
 - [LangChain / LangGraph Integration](#langchain--langgraph-integration)
@@ -42,12 +44,98 @@
 ---
-## What's New in v4.6.0 — OpenTelemetry Observability 🔭
+## What's New in v5.1.0 — Deep Storage & Knowledge Graph 🗂️
+> **🗂️ Reclaim 90% of your vector storage and visually edit your agent's knowledge graph.**
+> [CHANGELOG](CHANGELOG.md)
+| Feature | Description |
+|---|---|
+| 🗑️ **Deep Storage Mode** | New `deep_storage_purge` tool NULLs out redundant float32 embeddings for entries with TurboQuant compressed blobs, reclaiming ~90% of vector storage. Safety guards: 7-day minimum age, dry-run preview, multi-tenant isolation. |
+| 🕸️ **Knowledge Graph Editor** | The Mind Palace Neural Graph is now fully interactive — click nodes to rename or delete keywords, filter by project/date/importance, and surgically groom your agent's semantic memory. |
+| 🔧 **Auto-Load Reliability** | Hardened hook-based integration patterns for Claude Code and Gemini/Antigravity to guarantee context loading on the absolute first turn without reasoning hallucinations. |
+| 🧪 **303 Tests** | 8 new deep-storage test cases covering dry run, execute, safety guards, and idempotency — zero regressions across 13 suites. |
+---
+## What's New in v5.0.0 — Quantized Agentic Memory 🧬
+> **🧬 10× embedding compression is here.** Powered by Google's TurboQuant (ICLR 2026), Prism now compresses 768-dim embeddings from **3,072 bytes → ~400 bytes** — enabling decades of session history on a standard laptop.
+> [RFC-001: Quantized Agentic Memory](docs/rfcs/001-turboquant-integration.md) | [CHANGELOG](CHANGELOG.md)
+### Performance Benchmarks
+| Metric | Before v5.0 | After v5.0 |
+|--------|------------|------------|
+| **Storage per embedding** | 3,072 bytes (float32) | ~400 bytes (turbo4) |
+| **Compression ratio** | 1:1 | **~7.7:1** (4-bit) / **~10.1:1** (3-bit) |
+| **Similarity correlation** | Baseline | >0.85 (4-bit) |
+| **Top-1 retrieval accuracy** | Baseline | >90% (N=100) |
+| **Entries per GB** | ~330K | **~2.5M** |
+| **Search without vector DB** | ❌ Empty | ✅ Tier-2 JS fallback |
+### Three-Tier Memory Architecture
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    PRISM v5.0 MEMORY                       │
+├─────────┬───────────────┬───────────────────────────────────┤
+│  TIER   │ STORAGE       │ SEARCH METHOD                    │
+├─────────┼───────────────┼───────────────────────────────────┤
+│  Tier 0 │ FTS5 keywords │ Full-text search (knowledge_search) │
+│  Tier 1 │ float32 3072B │ sqlite-vec cosine (native)       │
+│  Tier 2 │ turbo4  400B  │ JS asymmetricCosineSimilarity    │
+└─────────┴───────────────┴───────────────────────────────────┘
+searchMemory() flow:
+  → Tier 1 (sqlite-vec)  ── success → return results
+                          ── fail    → Tier 2 (TurboQuant JS)
+                                      ── success → return results
+                                      ── fail    → return []
+```
+### Live Usage: How TurboQuant Works in Practice
+**Every `session_save_ledger` call now generates both tiers automatically:**
+```typescript
+// What happens behind the scenes when you save a session:
+await saveLedger({ project: "my-app", summary: "Built auth flow" });
+// 1. Gemini generates float32 embedding (3,072 bytes)
+// 2. TurboQuant compresses to turbo4 blob (~400 bytes)
+// 3. Single atomic patchLedger writes BOTH to the database
+//    → embedding: "[0.0234, -0.0156, ...]"   (float32)
+//    → embedding_compressed: "base64..."       (turbo4)
+//    → embedding_format: "turbo4"
+//    → embedding_turbo_radius: 12.847
+// Searching works seamlessly across both tiers:
+await searchMemory({ query: "auth flow" });
+// → Tier 1 tries native vector search
+// → If unavailable, Tier 2 deserializes compressed blobs
+//   and ranks using asymmetric cosine similarity in JS
+```
+**Backfill existing entries with one command:**
+```
+> Use tool: session_backfill_embeddings
+> Now repairs AND compresses in a single atomic update
+```
+> **💡 Ollama TurboQuant Tip:** If using Ollama for self-hosted inference, set `OLLAMA_KV_CACHE_TYPE=turbo3` for 10× smaller KV caches during generation — the same algorithm powering Prism's memory compression.
+---
+<details>
+<summary><strong>What's in v4.6.0 — OpenTelemetry Observability 🔭</strong></summary>
 > **🔭 Full distributed tracing for every MCP tool call, LLM provider hop, and background AI worker.**
 > Configure in the new **🔭 Observability** tab in Mind Palace — no code changes required.
 > Activates a 4-tier span waterfall: `mcp.call_tool` → `worker.vlm_caption` → `llm.generate_image_description` / `llm.generate_embedding`.
+</details>
 <a name="whats-new-in-v451--gdpr-export-"></a>
 <details>
 <summary><strong>What's in v4.5.1 — GDPR Export & Test Hardening 🔒</strong></summary>
@@ -234,7 +322,7 @@
 | Feature | Description |
 |---|---|
 | 🏠 **Local-First SQLite** | Run Prism entirely locally with zero cloud dependencies. Full vector search (libSQL F32_BLOB) and FTS5 included. |
-| 🔮 **Mind Palace UI** | A beautiful glassmorphism dashboard at `localhost:3000` to inspect your agent's memory, visual vault, and Git drift. |
+| 🔮 **Mind Palace UI** | A beautiful glassmorphism dashboard at `localhost:3000` (configurable via `PRISM_DASHBOARD_PORT`) to inspect your agent's memory, visual vault, and Git drift. |
 | 🕰️ **Time Travel** | `memory_history` and `memory_checkout` act like `git revert` for your agent's brain — full version history with OCC. |
 | 🖼️ **Visual Memory** | Agents can save screenshots to a local media vault. Auto-capture mode snapshots your local dev server on every handoff save. |
 | 📡 **Agent Telepathy** | Multi-client sync: if your agent in Cursor saves state, Claude Desktop gets a live notification instantly. |
@@ -271,9 +359,14 @@
 | **Auto-Compaction** | ✅ Gemini rollups | ❌ | ❌ | ❌ | ❌ |
 | **Morning Briefing** | ✅ Gemini synthesis | ❌ | ❌ | ❌ | ❌ |
 | **OCC (Concurrency)** | ✅ Version-based | ❌ | ❌ | ❌ | ❌ |
-| **GDPR Compliance** | ✅ Soft/hard delete | ❌ | ❌ | ❌ | ❌ |
+| **GDPR Compliance** | ✅ Soft/hard delete + ZIP export | ❌ | ❌ | ❌ | ❌ |
 | **Memory Tracing** | ✅ Latency breakdown | ❌ | ❌ | ❌ | ❌ |
+| **OpenTelemetry** | ✅ OTLP spans (v4.6) | ❌ | ❌ | ❌ | ❌ |
+| **VLM Image Captions** | ✅ Auto-caption vault (v4.5) | ❌ | ❌ | ❌ | ❌ |
+| **Pluggable LLM Adapters** | ✅ OpenAI/Anthropic/Gemini/Ollama | ❌ | ✅ Multi-provider | ❌ | ❌ |
 | **LangChain** | ✅ BaseRetriever | ❌ | ❌ | ❌ | ❌ |
+| **Vector Compression** | ✅ TurboQuant 10× (v5.0) | ❌ | ❌ | ❌ | ❌ |
+| **Three-Tier Search** | ✅ FTS + Vec + Quantized | ❌ | ❌ | ❌ | ❌ |
 | **MCP Native** | ✅ stdio | ✅ stdio | ❌ Python SDK | ✅ HTTP + MCP | ✅ stdio |
 | **Language** | TypeScript | TypeScript | Python | Python | Python |
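The storage figures quoted in the v5.0.0 benchmark table and the "Vector Compression" row can be sanity-checked with simple arithmetic. The sketch below uses only numbers from the README (768 dimensions, 3,072 bytes, ~400 bytes). The helper names are illustrative. The gap between the raw packed payload and the quoted blob size is assumed here to be per-blob metadata, which the README does not spell out.

```python
# Back-of-the-envelope check of the v5.0.0 storage figures.
DIMS = 768                 # embedding dimensionality quoted in the README
FLOAT32_BYTES = DIMS * 4   # 4 bytes per float32 component
TURBO4_BYTES = 400         # approximate turbo4 blob size quoted in the README


def packed_payload_bytes(dims: int, bits: int) -> int:
    """Bytes needed to bit-pack `dims` codes of `bits` bits each (no header)."""
    return (dims * bits + 7) // 8


def entries_per_gb(bytes_per_entry: int, gb_bytes: int = 10**9) -> int:
    """Embeddings that fit in one decimal gigabyte, ignoring other row data."""
    return gb_bytes // bytes_per_entry


print(FLOAT32_BYTES)                            # 3072
print(packed_payload_bytes(DIMS, 4))            # 384 (4-bit codes)
print(packed_payload_bytes(DIMS, 3))            # 288 (3-bit codes)
print(round(FLOAT32_BYTES / TURBO4_BYTES, 2))   # 7.68
print(entries_per_gb(FLOAT32_BYTES))            # 325520
print(entries_per_gb(TURBO4_BYTES))             # 2500000
```

On these numbers, the ~400-byte turbo4 blob corresponds to roughly 7.7:1, and the "10×" headline matches the 3-bit ratio in the same table rather than the 4-bit one.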
@@ -465,11 +558,36 @@ Add to your Continue `config.json` or Cline MCP settings:
 ## Claude Code Integration (Hooks)
-Claude Code supports **lifecycle hooks** in `~/.claude/settings.json` that fire automatically at session start and end. Use these to auto-hydrate and persist Prism memory without manual prompting.
+Claude Code supports custom hooks (`SessionStart`, `Stop`) that can force the agent to load and save Prism context automatically. Because Claude Code requires explicit permission for MCP tools, you must also whitelist the Prism commands.
+### 1. The Auto-Load Hook Script
+Create a Python script (e.g., `~/.claude/mcp_autoload_hook.py`). This script outputs JSON that Claude Code reads during the `SessionStart` event.
-### SessionStart Hook
+```python
+#!/usr/bin/env python3
+import json
+import sys
+def main():
+    # Inject a system message forcing the agent to load memory BEFORE speaking
+    print(json.dumps({
+        "continue": True,
+        "suppressOutput": True,
+        "systemMessage": (
+            "## First Action\n"
+            "Call `mcp__prism-mcp__session_load_context(project='my-project', level='deep')` "
+            "before responding to the user. Do not generate any text before calling this tool."
+        )
+    }))
+if __name__ == "__main__":
+    main()
+```
-Automatically loads context when a new session begins:
+### 2. Configure `settings.json`
+Map the hooks in your `~/.claude/settings.json`:
 ```json
 {
@@ -480,47 +598,45 @@ Automatically loads context when a new session begins:
         "hooks": [
           {
             "type": "command",
-            "command": "python3 -c \"import json; print(json.dumps({'continue': True, 'suppressOutput': False, 'systemMessage': 'You MUST call mcp__prism-mcp__session_load_context twice before responding to the user: first with project=my-project level=standard, then with project=my-other-project level=standard. Do not skip this.'}))\"",
+            "command": "python3 /Users/you/.claude/mcp_autoload_hook.py",
             "timeout": 10
           }
         ]
       }
-    ]
-  }
-}
-```
-### Stop Hook
-Automatically saves session memory when a session ends:
-```json
-{
-  "hooks": {
+    ],
     "Stop": [
       {
         "matcher": "*",
         "hooks": [
           {
             "type": "command",
-            "command": "python3 -c \"import json; print(json.dumps({'continue': True, 'suppressOutput': False, 'systemMessage': 'MANDATORY END WORKFLOW: 1) Call mcp__prism-mcp__session_save_ledger with project and summary. 2) Call mcp__prism-mcp__session_save_handoff with expected_version set to the loaded version.'}))\"",
+            "command": "python3 -c \"import json; print(json.dumps({'continue': True, 'suppressOutput': True, 'systemMessage': 'MANDATORY END WORKFLOW: 1) Call mcp__prism-mcp__session_save_ledger with project and summary. 2) Call mcp__prism-mcp__session_save_handoff with expected_version set to the loaded version.'}))\"",
             "timeout": 10
           }
         ]
       }
     ]
+  },
+  "permissions": {
+    "allow": [
+      "mcp__prism-mcp__session_load_context",
+      "mcp__prism-mcp__session_save_ledger",
+      "mcp__prism-mcp__session_save_handoff",
+      "mcp__prism-mcp__knowledge_search",
+      "mcp__prism-mcp__session_search_memory"
+    ]
   }
 }
 ```
 ### How the Hooks Work
-The hook `command` runs a Python one-liner that returns a JSON object to Claude Code:
+The hook `command` runs a Python script that returns a JSON object to Claude Code:
 | Field | Purpose |
 |---|---|
 | `continue: true` | Tell Claude Code to proceed (don't abort the session) |
-| `suppressOutput: false` | Show the hook result to the agent |
+| `suppressOutput: true` | Silently inject the system message (recommended for Stop hooks) |
 | `systemMessage` | Instruction injected as a system message — the agent follows it |
 The agent receives the `systemMessage` as an instruction and executes the tool calls. The server resolves the agent's **role** and **name** automatically from the dashboard — no need to specify them in the hook.
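For symmetry with the auto-load script, the `Stop` hook's inline one-liner could also live in its own file. The following is a minimal sketch that emits the same three fields described in the table above. The file name (for example `~/.claude/mcp_stop_hook.py`) and the helper `build_hook_payload` are hypothetical and do not appear in the README.

```python
#!/usr/bin/env python3
"""Stop hook sketch for Claude Code: asks the agent to persist Prism memory."""
import json

# Same instruction text as the inline Stop hook in settings.json.
STOP_MESSAGE = (
    "MANDATORY END WORKFLOW: "
    "1) Call mcp__prism-mcp__session_save_ledger with project and summary. "
    "2) Call mcp__prism-mcp__session_save_handoff with expected_version "
    "set to the loaded version."
)


def build_hook_payload(message: str) -> dict:
    """Build the JSON object Claude Code reads from the hook's stdout."""
    return {"continue": True, "suppressOutput": True, "systemMessage": message}


def main() -> None:
    print(json.dumps(build_hook_payload(STOP_MESSAGE)))


if __name__ == "__main__":
    main()
```

With a file like this in place, the `Stop` entry's `command` would point at the script path in the same way the `SessionStart` entry does.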
@@ -539,90 +655,153 @@ explicit tool argument  →  dashboard setting  →  "global" (default)
 Change your role once in the dashboard, and it automatically applies to every session — CLI, extension, and all MCP clients.
-### Verification
-If hydration ran successfully, the agent's output will include:
-- A `[👤 AGENT IDENTITY]` block showing your dashboard-configured role and name
-- `PRISM_CONTEXT_LOADED` marker text
+### Troubleshooting Claude Code
-If the marker is missing, the hook did not fire or the MCP server is not connected.
+- **Hook not firing?** Check the hook `timeout` in Claude Code settings. If your Python script takes too long, Claude ignores it silently.
+- **"Tool not available" hallucination?** If Claude claims it doesn't have the tool, it's usually an adversarial Chain-of-Thought loop. Ensure the `permissions.allow` array exactly matches the double-underscore format (`mcp__prism-mcp__...`).
+- **Missing `PRISM_CONTEXT_LOADED`?** The hook didn't fire or the MCP server isn't connected. Verify `prism-mcp` is listed in your `mcpServers` config.
 ---
 ## Gemini / Antigravity Integration
-Gemini-based clients (like Antigravity) use `GEMINI.md` global rules or user rules for startup behavior. The server resolves the role from the dashboard automatically.
+Antigravity and Gemini-based agents require a radically simplified approach to auto-loading. If you give modern instruction-tuned models a long list of "Banned Behaviors" (e.g., "Do NOT say hello first"), their internal reasoning often over-indexes on the constraints and causes them to hallucinate that the tool doesn't exist.
-### Global Rules (`~/.gemini/GEMINI.md`)
+### The 2-Line "First Action" Rule
-```markdown
-## Prism MCP Memory Auto-Load (CRITICAL)
-At the start of every new session, call `mcp__prism-mcp__session_load_context`
-for these projects:
-- `my-project` (level=standard)
-- `my-other-project` (level=standard)
+Create a `GEMINI.md` file in your project root (or globally at `~/.gemini/GEMINI.md`) or paste this into your Antigravity **User Rules**:
-After both succeed, print PRISM_CONTEXT_LOADED.
+```markdown
+## First Action
+Call `mcp_prism-mcp_session_load_context(project="my-project", level="deep")` before responding.
 ```
-### User Rules (Antigravity Settings)
+> **Note:** Antigravity uses single underscores (`mcp_prism-mcp_...`) compared to Claude Code's double underscores (`mcp__prism-mcp__...`).
-If your Gemini client supports user rules, add the same instructions there. The key points:
+That's it — **two lines**. This approach proved reliable after 13 iterations of increasingly complex prompt engineering. The key insight: shorter instructions avoid triggering the model's adversarial reasoning about tool availability.
-1. **Call `session_load_context` as a tool** — not `read_resource`. Only the tool returns the `[👤 AGENT IDENTITY]` block.
-2. **Verify** — confirm the response includes `version` and `last_summary`.
+### Session End Protocol
-### Session End
+At the end of your conversation, explicitly tell the agent:
+> *"Wrap up the session."*
-At the end of each session, save state:
+The agent will rely on its system prompt to execute:
+1. `session_save_ledger` — immutable work log with summary, TODOs, and decisions
+2. `session_save_handoff` — passing the `expected_version` it received during the load step to ensure Optimistic Concurrency Control
+### Antigravity UI Caveats
+Antigravity's UI currently does **not** visually render the raw output of MCP tool calls. To ensure the agent actually ingested the context, add this to your User Rules:
 ```markdown
-## Session End Protocol
-1) Call `mcp__prism-mcp__session_save_ledger` with project and summary.
-2) Call `mcp__prism-mcp__session_save_handoff` with expected_version from the loaded version.
+## STEP 2: Echo Context in Your Text Response
+After the tool returns, include the following in your greeting text:
+- Agent identity: `🤖 Agent: <role> — <name>`
+- Last session summary
+- Open TODOs
+- Session version number
 ```
+This forces the agent to prove it loaded context by echoing it in visible text.
 ---
 ## Use Cases
-| Scenario | How Prism MCP Helps |
-|----------|-------------------|
-| **Long-running feature work** | Save session state at end of day, restore full context the next morning — no re-explaining |
-| **Multi-agent collaboration** | Telepathy sync lets multiple agents share context in real time |
-| **Consulting / multi-project** | Switch between client projects with progressive context loading |
-| **Research & analysis** | Multi-engine search with 94% context reduction via sandboxed code transforms |
-| **Team onboarding** | New team member's agent loads full project history via `session_load_context("deep")` |
-| **Visual debugging** | Save screenshots of broken UI to visual memory — the agent remembers what it looked like |
-| **Offline / air-gapped** | Full SQLite local mode with no internet dependency for memory features |
+| Scenario | How Prism MCP Helps | Live Sample |
+|----------|---------------------|-------------|
+| **Long-running feature work** | Save session state at end of day, restore full context next morning — no re-explaining | `session_save_handoff(project, last_summary, open_todos)` |
+| **Multi-agent collaboration** | Hivemind Telepathy lets multiple agents share real-time context across clients | `session_load_context(project, role="qa")` |
+| **Consulting / multi-project** | Switch between client projects with progressive context loading | `session_load_context(project, level="quick")` |
+| **Research & analysis** | Multi-engine search with 94% context reduction via sandboxed code transforms | `brave_web_search` + `code_mode_transform(template="api_endpoints")` |
+| **Team onboarding** | New team member's agent loads full project history instantly | `session_load_context(project, level="deep")` |
+| **Visual debugging** | Save UI screenshots to visual memory — searchable by description | `session_save_image(project, path, description)` → `session_view_image(id)` |
+| **Offline / air-gapped** | Full SQLite local mode, Ollama LLM adapter — zero internet dependency | `PRISM_LLM_PROVIDER=ollama` in MCP config env |
+| **Behavior enforcement** | Agent corrections auto-graduate into permanent `.cursorrules` | `session_save_experience(event_type="correction")` → `knowledge_sync_rules(project)` |
+| **Infrastructure observability** | OTel spans to Jaeger/Grafana for every MCP tool call fanout | Enable in Dashboard → Settings → 🔭 Observability |
+| **GDPR / audit export** | ZIP export of all memory as JSON + Markdown, sensitive fields redacted | `session_export_memory(project, format="zip")` |
+---
+## New in v4.6.0 — Feature Setup Guide
+### 🔭 OpenTelemetry Distributed Tracing
+**Why:** Every `session_save_ledger` call can silently fan out into a synchronous DB write, an async VLM caption, and a vector embedding backfill. Without tracing, these are invisible. OTel makes the full call tree visible in Jaeger, Grafana Tempo, or any OTLP-compatible collector.
+**Setup:**
+1. Open Mind Palace Dashboard → ⚙️ Settings → 🔭 Observability
+2. Toggle **Enable OpenTelemetry** → set your OTLP endpoint (default: `http://localhost:4318`)
+3. Restart the MCP server
+4. Run Jaeger locally:
+```bash
+docker run -d --name jaeger \
+  -p 16686:16686 -p 4318:4318 \
+  jaegertracing/all-in-one:latest
+```
+5. Open http://localhost:16686 — select service `prism-mcp` to see span waterfalls.
+**Span hierarchy:**
+```
+mcp.call_tool [session_save_ledger]
+├── storage.write_ledger          ~2ms
+├── llm.generate_embedding        ~180ms
+└── worker.vlm_caption (async)    ~1.2s
+```
+> GDPR note: Span attributes contain only metadata — no prompt content, embeddings, or image data.
+---
+### 🖼️ VLM Multimodal Memory
+**Why:** Agents lose visual context between sessions. UI screenshots, architecture diagrams, and bug states all become searchable memory.
+**Setup:** Requires `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` (vision-capable model).
+**Usage:**
+```
+session_save_image(project="my-app", file_path="/path/to/screenshot.png", description="Login page broken layout after CSS refactor")
+```
+The image is auto-captioned by a VLM and stored in the media vault. Retrieve later:
+```
+session_view_image(project="my-app", image_id="8f2a1b3c")
+```
 ---
 ## Architecture
+> **📖 Deep dive**: [Full Architecture Guide](docs/ARCHITECTURE.md) — TurboQuant math, Three-Tier search, storage optimization flow
+> **🤖 Tutorial**: [How to Build a Self-Improving Agent](docs/self-improving-agent.md) — corrections → behavioral memory → IDE rules
 ```mermaid
 graph TB
     Client["AI Client<br/>(Claude Desktop / Cursor / Windsurf)"]
-    LangChain["LangChain / LangGraph<br/>(Python Retrievers)"]
+    LangChain["LangChain / LangGraph<br/>(Python/TS Retrievers)"]
     MCP["Prism MCP Server<br/>(TypeScript)"]
     Client -- "MCP Protocol (stdio)" --> MCP
     LangChain -- "JSON-RPC via MCP Bridge" --> MCP
-    MCP --> Tracing["MemoryTrace Engine<br/>Latency + Strategy + Scoring"]
-    MCP --> Dashboard["Mind Palace Dashboard<br/>localhost:3000"]
+    MCP --> Tracing["OTel Tracing<br/>v4.6 Observability"]
+    MCP --> Dashboard["Mind Palace Dashboard<br/>localhost:3000<br/>(PRISM_DASHBOARD_PORT)"]
     MCP --> Brave["Brave Search API<br/>Web + Local + AI Answers"]
-    MCP --> Gemini["Google Gemini API<br/>Analysis + Briefings"]
+    MCP --> LLM["LLM Factory<br/>Gemini / OpenAI / Ollama"]
     MCP --> Sandbox["QuickJS Sandbox<br/>Code-Mode Templates"]
     MCP --> SyncBus["SyncBus<br/>Agent Telepathy"]
     MCP --> GDPR["GDPR Engine<br/>Soft/Hard Delete + Audit"]
     MCP --> Storage{"Storage Backend"}
-    Storage --> SQLite["SQLite (Local)<br/>libSQL + F32_BLOB vectors"]
+    Storage --> SQLite["SQLite (Local)<br/>libSQL + sqlite-vec"]
     Storage --> Supabase["Supabase (Cloud)<br/>PostgreSQL + pgvector"]
-    SQLite --> Ledger["session_ledger<br/>(+ deleted_at tombstoning)"]
-    SQLite --> Handoffs["session_handoffs"]
+    SQLite --> Ledger["session_ledger"]
+    Ledger --> T1["Tier 1: float32<br/>3,072B native search"]
+    T1 -- "v5.0 TurboQuant" --> T2["Tier 2: turbo4<br/>400B JS search"]
+    T1 -. "v5.1 Purge" .-> Null["NULL after 30d"]
+    SQLite --> Handoffs["session_handoffs<br/>(OCC versioning)"]
     SQLite --> History["history_snapshots<br/>(Time Travel)"]
     SQLite --> Media["media vault<br/>(Visual Memory)"]
@@ -632,13 +811,16 @@ graph TB
     style Tracing fill:#D69E2E,color:#fff
     style Dashboard fill:#9F7AEA,color:#fff
     style Brave fill:#FB542B,color:#fff
-    style Gemini fill:#4285F4,color:#fff
+    style LLM fill:#4285F4,color:#fff
     style Sandbox fill:#805AD5,color:#fff
     style SyncBus fill:#ED64A6,color:#fff
     style GDPR fill:#E53E3E,color:#fff
     style Storage fill:#2D3748,color:#fff
     style SQLite fill:#38B2AC,color:#fff
     style Supabase fill:#3ECF8E,color:#fff
+    style T1 fill:#48BB78,color:#fff
+    style T2 fill:#E8B004,color:#000
+    style Null fill:#E53E3E,color:#fff
 ```
 ---
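The Tier 1 to Tier 2 fallback shown in the diagram and in the `searchMemory()` flow can be sketched in a few lines. This is an illustration of the control flow only, written in Python for brevity. The two tier functions are hypothetical stand-ins passed in by the caller, and treating a raised exception as "fail" is an assumption, since the README does not say how failure is detected.

```python
from typing import Callable, List, Sequence

SearchFn = Callable[[Sequence[float]], List[dict]]


def search_memory(query_vec: Sequence[float], tier1: SearchFn, tier2: SearchFn) -> List[dict]:
    """Try Tier 1, fall back to Tier 2, and return [] if both fail."""
    for tier in (tier1, tier2):
        try:
            return tier(query_vec)
        except Exception:
            continue  # assumption: any exception counts as a tier failure
    return []


def native_vector_search(query_vec: Sequence[float]) -> List[dict]:
    # Hypothetical Tier 1 stub: simulates the native vector extension being unavailable.
    raise RuntimeError("native vector search unavailable")


def quantized_js_search(query_vec: Sequence[float]) -> List[dict]:
    # Hypothetical Tier 2 stub: pretends to rank compressed blobs.
    return [{"id": 1, "score": 0.9}]


print(search_memory([0.1, 0.2], native_vector_search, quantized_js_search))
print(search_memory([0.1, 0.2], native_vector_search, native_vector_search))
```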
@@ -968,6 +1150,7 @@ The retrievers use `_aget_relevant_documents` as the primary path with `asyncio.
 | `PRISM_AUTO_CAPTURE` | No | Set `"true"` to auto-capture HTML snapshots of dev servers |
 | `PRISM_CAPTURE_PORTS` | No | Comma-separated ports to scan (default: `3000,3001,5173,8080`) |
 | `PRISM_DEBUG_LOGGING` | No | Set `"true"` to enable verbose debug logs (default: quiet) |
+| `PRISM_DASHBOARD_PORT` | No | Configure the dashboard port (default: `3000`) |
 ---
@@ -1420,7 +1603,6 @@ See [`vertex-ai/`](vertex-ai/) for setup and benchmarks.
 │   │   ├── compactionHandler.ts         # Gemini-powered ledger compaction
 │   │   └── index.ts                     # Tool registration & re-exports
 │   └── utils/
-│   └── utils/
 │       ├── telemetry.ts                 # OTel singleton — NodeTracerProvider, BatchSpanProcessor, no-op mode
 │       ├── tracing.ts                   # MemoryTrace types + factory (Phase 1 — LLM explainability)
 │       ├── imageCaptioner.ts            # VLM auto-caption pipeline (v4.5) + worker.vlm_caption OTel span
@@ -1462,6 +1644,24 @@ See [`vertex-ai/`](vertex-ai/) for setup and benchmarks.
 > **[View the full project board →](https://github.com/users/dcostenco/projects/1/views/1)** | **[Full ROADMAP.md →](ROADMAP.md)**
+### ✅ v5.0 — Quantized Agentic Memory (Shipped!)
+| Feature | Description |
+|---|---|
+| 🧮 **TurboQuant Math Core** | Pure TypeScript port of Google's TurboQuant (ICLR 2026) — Lloyd-Max codebook, QR rotation, QJL error correction. Zero dependencies. [RFC-001](docs/rfcs/001-turboquant-integration.md) |
+| 📦 **~7× Embedding Compression** | 768-dim embeddings shrink from 3,072 bytes to ~400 bytes (4-bit) via variable bit-packing. |
+| 🔍 **Asymmetric Similarity** | Unbiased inner product estimator: query as float32 vs compressed blobs. No decompression needed. |
+| 🗄️ **Two-Tier Search** | FTS5 candidate filter → JS-side asymmetric scoring. Bypasses sqlite-vec float32 limitation. |
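To make the "Asymmetric Similarity" row concrete, here is a deliberately simplified sketch of scoring a float query against a bit-packed vector without first materializing the decoded vector. It is not TurboQuant: it uses plain uniform 4-bit scalar quantization and omits the Lloyd-Max codebook, rotation, and QJL correction named in the table. All function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def quantize_4bit(vec: Sequence[float]) -> Tuple[bytes, float]:
    """Uniform 4-bit quantization; packs two codes per byte, returns (blob, scale)."""
    if len(vec) % 2:
        raise ValueError("this sketch expects an even number of dimensions")
    scale = max(abs(x) for x in vec) or 1.0
    # Map [-scale, scale] onto the 16 integer levels 0..15.
    codes = [min(15, max(0, round((x / scale + 1.0) * 7.5))) for x in vec]
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, scale


def asymmetric_cosine(query: Sequence[float], packed: bytes, scale: float) -> float:
    """Cosine between a float query and a packed vector, decoding codes on the fly."""
    dot = 0.0
    norm_sq = 0.0
    for i, byte in enumerate(packed):
        for j, code in enumerate((byte >> 4, byte & 0x0F)):
            value = (code / 7.5 - 1.0) * scale  # reconstruct one component
            dot += query[2 * i + j] * value
            norm_sq += value * value
    q_norm = math.sqrt(sum(x * x for x in query))
    if q_norm == 0.0 or norm_sq == 0.0:
        return 0.0
    return dot / (q_norm * math.sqrt(norm_sq))


stored = [math.sin(i) for i in range(768)]
blob, scale = quantize_4bit(stored)
print(len(blob))                              # 384 bytes of packed codes
print(asymmetric_cosine(stored, blob, scale))  # close to 1.0 for the same vector
```

The query stays in full precision while only the stored side is quantized, which is the sense of "asymmetric" here.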
+### ✅ v5.1 — Deep Storage Mode (Shipped!)
+| Feature | Description |
+|---|---|
+| 🧬 **Deep Storage Purge** | Automated `deep_storage_purge` tool NULLs out redundant float32 embeddings for entries with TurboQuant compressed blobs, reclaiming ~90% of vector storage. |
+| 🛡️ **Safety Guards** | Minimum 7-day age threshold, dry-run preview mode, multi-tenant isolation, and compressed-blob-existence validation ensure zero data loss. |
+| 🗃️ **Supabase RPC** | `prism_purge_embeddings` Postgres function (migration 030) provides full backend parity with SQLite. Auto-applied via the v4.1 migration runner. |
+| 🧪 **303 Tests** | 8 new deep-storage test cases covering dry run, execute, safety guards, and idempotency — zero regressions across the full suite. |
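The safety guards listed for Deep Storage Purge (minimum age, dry-run preview, tenant isolation, compressed-blob check) map naturally onto a single guarded `UPDATE`. The sketch below shows that shape against an in-memory SQLite table. It is not the package's implementation: the function name, the `user_id` and `created_at` columns, and the 30-day default are assumptions. The `session_ledger`, `embedding`, and `embedding_compressed` names and the 7-day floor come from the README.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

MIN_AGE_DAYS = 7  # safety floor quoted in the README


def purge_float32_embeddings(conn, user_id, older_than_days=30, dry_run=True):
    """Return how many rows would be (dry run) or were (execute) purged."""
    if older_than_days < MIN_AGE_DAYS:
        raise ValueError(f"older_than_days must be >= {MIN_AGE_DAYS}")
    cutoff = (datetime.now(timezone.utc) - timedelta(days=older_than_days)).isoformat()
    # Only touch rows that still have a compressed blob to fall back on.
    where = (
        "user_id = ? AND created_at < ? "
        "AND embedding IS NOT NULL AND embedding_compressed IS NOT NULL"
    )
    params = (user_id, cutoff)
    if dry_run:
        sql = f"SELECT COUNT(*) FROM session_ledger WHERE {where}"
        return conn.execute(sql, params).fetchone()[0]
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(f"UPDATE session_ledger SET embedding = NULL WHERE {where}", params)
    return cur.rowcount


def make_demo_db():
    """Build a small in-memory ledger with one eligible row for tenant 'alice'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE session_ledger (id INTEGER PRIMARY KEY, user_id TEXT, "
        "created_at TEXT, embedding TEXT, embedding_compressed TEXT)"
    )
    now = datetime.now(timezone.utc)
    old = (now - timedelta(days=60)).isoformat()
    recent = (now - timedelta(days=1)).isoformat()
    rows = [
        ("alice", old, "[0.1]", "blob"),     # eligible
        ("alice", old, "[0.2]", None),       # no compressed blob: keep
        ("alice", recent, "[0.3]", "blob"),  # too recent: keep
        ("bob", old, "[0.4]", "blob"),       # other tenant: keep
    ]
    conn.executemany(
        "INSERT INTO session_ledger (user_id, created_at, embedding, embedding_compressed) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


conn = make_demo_db()
print(purge_float32_embeddings(conn, "alice", dry_run=True))   # preview: 1
print(purge_float32_embeddings(conn, "alice", dry_run=False))  # executes: 1
print(purge_float32_embeddings(conn, "alice", dry_run=False))  # idempotent: 0
```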
 ### ✅ v4.6 — OpenTelemetry Observability (Shipped!)
 | Feature | Description |
@@ -1516,11 +1716,11 @@ See [v3.1.0](#whats-in-v310--memory-lifecycle-) and [v3.0.0](#whats-in-v300--age
 | Priority | Feature | Description |
 |----------|---------|-------------|
-| 🥇 | **Documentation & Architecture Guide** | Full README overhaul with architecture diagrams, "How to build a self-improving agent" walkthrough, and v4.x feature matrix. |
-| 🥈 | **Knowledge Graph Editor** | Visual graph in Mind Palace showing nodes for projects, agents, sessions, and graduated rules. |
+| ✅ | **Documentation & Architecture Guide** | [Architecture Guide](docs/ARCHITECTURE.md), [Self-Improving Agent Guide](docs/self-improving-agent.md), updated README diagram with v5.x vector tiers. |
+| ✅ | **Knowledge Graph Editor** | Interactive vis.js graph with click-to-filter, node stats, project/keyword/category visualization. |
 | 🥉 | **Autonomous Web Scholar** | Agent-driven learning pipeline using Brave Search + VLM to autonomously build project context while the developer sleeps. |
-| — | **Dashboard Auth** | Optional basic auth for remote Mind Palace access. |
-| — | **TypeScript LangGraph Examples** | Reference implementations alongside the existing Python agent. |
+| ✅ | **Dashboard Auth** | HTTP Basic Auth with session cookies, timing-safe comparison, styled login page. Set `PRISM_DASHBOARD_USER`/`PRISM_DASHBOARD_PASS`. |
+| ✅ | **TypeScript LangGraph Examples** | [Reference agent](examples/langgraph-ts/) with MCP client, memory retriever nodes, and session persistence. |
 | — | **CRDT Conflict Resolution** | Conflict-free types for concurrent multi-agent edits on the same handoff. |
 ---