prism-mcp-server 4.6.0 → 5.1.0

package/README.md CHANGED
@@ -8,12 +8,14 @@
8
8
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-3178C6?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
9
9
  [![Node.js](https://img.shields.io/badge/Node.js-18+-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
10
10
 
11
- > **Your AI agent's memory that survives between sessions.** Prism MCP is a Model Context Protocol server that gives Claude Desktop, Cursor, Windsurf, and any MCP client **persistent memory**, **time travel**, **visual context**, **multi-agent sync**, **GDPR-compliant deletion**, **memory tracing**, and **LangChain integration** — all running locally with zero cloud dependencies.
11
+ > **Your AI agent's memory that survives between sessions.** Prism MCP is a Model Context Protocol server that gives Claude Desktop, Cursor, Windsurf, and any MCP client **persistent memory**, **time travel**, **visual context**, **multi-agent sync**, **GDPR-compliant deletion**, **memory tracing**, **quantized vector compression**, and **LangChain integration** — all running locally with zero cloud dependencies.
12
12
  >
13
- > Built with **SQLite + F32_BLOB vector search**, **optimistic concurrency control**, **MCP Prompts & Resources**, **auto-compaction**, **Gemini-powered Morning Briefings**, **MemoryTrace explainability**, and optional **Supabase cloud sync**.
13
+ > Built with **SQLite + F32_BLOB vector search**, **TurboQuant 10× embedding compression**, **optimistic concurrency control**, **MCP Prompts & Resources**, **auto-compaction**, **Gemini-powered Morning Briefings**, **MemoryTrace explainability**, and optional **Supabase cloud sync**.
14
14
 
15
15
  ## Table of Contents
16
16
 
17
+ - [What's New (v5.1.0)](#whats-new-in-v510--deep-storage--knowledge-graph-)
18
+ - [What's New (v5.0.0)](#whats-new-in-v500--quantized-agentic-memory-)
17
19
  - [What's New (v4.6.0)](#whats-new-in-v460--opentelemetry-observability-)
18
20
  - [Multi-Instance Support](#multi-instance-support)
19
21
  - [How Prism Compares](#how-prism-compares)
@@ -23,7 +25,7 @@
23
25
  - [Claude Code Integration (Hooks)](#claude-code-integration-hooks)
24
26
  - [Gemini / Antigravity Integration](#gemini--antigravity-integration)
25
27
  - [Use Cases](#use-cases)
26
- - [Architecture](#architecture)
28
+ - [Architecture](#architecture) | [Full Architecture Guide](docs/ARCHITECTURE.md) | [Self-Improving Agent Guide](docs/self-improving-agent.md)
27
29
  - [Tool Reference](#tool-reference)
28
30
  - [Agent Hivemind — Role Usage](#agent-hivemind--role-usage)
29
31
  - [LangChain / LangGraph Integration](#langchain--langgraph-integration)
@@ -42,12 +44,98 @@
42
44
 
43
45
  ---
44
46
 
45
- ## What's New in v4.6.0 — OpenTelemetry Observability 🔭
47
+ ## What's New in v5.1.0 — Deep Storage & Knowledge Graph 🗂️
48
+
49
+ > **🗂️ Reclaim 90% of your vector storage and visually edit your agent's knowledge graph.**
50
+ > [CHANGELOG](CHANGELOG.md)
51
+
52
+ | Feature | Description |
53
+ |---|---|
54
+ | 🗑️ **Deep Storage Mode** | New `deep_storage_purge` tool NULLs out redundant float32 embeddings for entries with TurboQuant compressed blobs, reclaiming ~90% of vector storage. Safety guards: 7-day minimum age, dry-run preview, multi-tenant isolation. |
55
+ | 🕸️ **Knowledge Graph Editor** | The Mind Palace Neural Graph is now fully interactive — click nodes to rename or delete keywords, filter by project/date/importance, and surgically groom your agent's semantic memory. |
56
+ | 🔧 **Auto-Load Reliability** | Hardened hook-based integration patterns for Claude Code and Gemini/Antigravity to guarantee context loading on the absolute first turn without reasoning hallucinations. |
57
+ | 🧪 **303 Tests** | 8 new deep-storage test cases covering dry run, execute, safety guards, and idempotency — zero regressions across 13 suites. |
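The safety guards listed for Deep Storage Mode can be pictured with a small in-memory sketch. It is an illustration only, not the shipped `deep_storage_purge` implementation: the `LedgerRow` shape, the `userId` tenant key, and the `deepStoragePurge` function name are all hypothetical.

```typescript
// Hypothetical row shape; the real session_ledger schema may differ.
interface LedgerRow {
  id: string;
  userId: string;                      // assumed tenant key
  createdAt: Date;
  embedding: string | null;            // float32 vector (Tier 1)
  embeddingCompressed: string | null;  // turbo4 blob (Tier 2)
}

interface PurgeReport {
  eligibleIds: string[];
  purged: number;
  dryRun: boolean;
}

const MIN_AGE_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Dry run by default: report what would be purged without touching any row.
function deepStoragePurge(
  rows: LedgerRow[],
  userId: string,
  now: Date,
  dryRun: boolean = true,
): PurgeReport {
  const cutoff = now.getTime() - MIN_AGE_DAYS * MS_PER_DAY;
  const eligible = rows.filter(
    (r) =>
      r.userId === userId &&              // multi-tenant isolation
      r.createdAt.getTime() <= cutoff &&  // 7-day minimum age
      r.embedding !== null &&             // there is something to reclaim
      r.embeddingCompressed !== null,     // a compressed copy must already exist
  );
  if (!dryRun) {
    for (const r of eligible) r.embedding = null; // NULL out the float32 copy only
  }
  return {
    eligibleIds: eligible.map((r) => r.id),
    purged: dryRun ? 0 : eligible.length,
    dryRun,
  };
}
```

Because purged rows no longer have a float32 embedding, a second run finds nothing eligible, which is the idempotency property the test suite is described as covering. Using the per-entry sizes quoted in the v5.0 benchmarks (3,072 bytes float32 plus ~400 bytes compressed), dropping the float32 copy frees 3,072 of roughly 3,472 bytes, about 88%, which is in line with the ~90% figure.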
58
+
59
+ ---
60
+
61
+ ## What's New in v5.0.0 — Quantized Agentic Memory 🧬
62
+
63
+ > **🧬 Up to ~10× embedding compression is here.** Powered by Google's TurboQuant (ICLR 2026), Prism now compresses 768-dim embeddings from **3,072 bytes → ~400 bytes** in the 4-bit `turbo4` format (~7.7×; ~10× at 3-bit), enabling decades of session history on a standard laptop.
64
+ > [RFC-001: Quantized Agentic Memory](docs/rfcs/001-turboquant-integration.md) | [CHANGELOG](CHANGELOG.md)
65
+
66
+ ### Performance Benchmarks
67
+
68
+ | Metric | Before v5.0 | After v5.0 |
69
+ |--------|------------|------------|
70
+ | **Storage per embedding** | 3,072 bytes (float32) | ~400 bytes (turbo4) |
71
+ | **Compression ratio** | 1:1 | **~7.7:1** (4-bit) / **~10.1:1** (3-bit) |
72
+ | **Similarity correlation** | Baseline | >0.85 (4-bit) |
73
+ | **Top-1 retrieval accuracy** | Baseline | >90% (N=100) |
74
+ | **Entries per GB** | ~330K | **~2.5M** |
75
+ | **Search without vector DB** | ❌ Empty | ✅ Tier-2 JS fallback |
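The storage figures in the table follow from simple arithmetic, sketched here. The bit-packed payload sizes are exact; the ~400-byte figure is taken from the table as given, and the gap between it and the 384-byte payload is presumably per-vector metadata (an inference, not something the README states). The helper name `packedBytes` is illustrative.

```typescript
const DIMS = 768;
const FLOAT32_BYTES = DIMS * 4; // 3072 bytes per uncompressed embedding

// Size of the bit-packed codes alone, before any per-vector metadata.
function packedBytes(dims: number, bitsPerDim: number): number {
  return Math.ceil((dims * bitsPerDim) / 8);
}

const payload4Bit = packedBytes(DIMS, 4); // 384 bytes
const payload3Bit = packedBytes(DIMS, 3); // 288 bytes

// Ratio implied by the table's ~400-byte turbo4 figure.
const TURBO4_BYTES = 400;
const ratio4Bit = FLOAT32_BYTES / TURBO4_BYTES; // 7.68

// Entries per decimal gigabyte, counting embedding bytes only.
const entriesPerGbFloat32 = Math.floor(1e9 / FLOAT32_BYTES); // 325520
const entriesPerGbTurbo4 = Math.floor(1e9 / TURBO4_BYTES);   // 2500000

console.log({ payload4Bit, payload3Bit, ratio4Bit, entriesPerGbFloat32, entriesPerGbTurbo4 });
```

These numbers line up with the table's ~330K and ~2.5M entries per GB, and show that the ~10× headline corresponds to the 3-bit setting rather than to `turbo4`.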
76
+
77
+ ### Three-Tier Memory Architecture
78
+
79
+ ```
80
+ ┌─────────────────────────────────────────────────────────────┐
81
+ │ PRISM v5.0 MEMORY │
82
+ ├─────────┬───────────────┬───────────────────────────────────┤
83
+ │ TIER │ STORAGE │ SEARCH METHOD │
84
+ ├─────────┼───────────────┼───────────────────────────────────┤
85
+ │ Tier 0 │ FTS5 keywords │ Full-text search (knowledge_search) │
86
+ │ Tier 1 │ float32 3072B │ sqlite-vec cosine (native) │
87
+ │ Tier 2 │ turbo4 400B │ JS asymmetricCosineSimilarity │
88
+ └─────────┴───────────────┴───────────────────────────────────┘
89
+
90
+ searchMemory() flow:
91
+ → Tier 1 (sqlite-vec) ── success → return results
92
+ ── fail → Tier 2 (TurboQuant JS)
93
+ ── success → return results
94
+ ── fail → return []
95
+ ```
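The `searchMemory()` flow in the diagram is a plain fallback chain. The sketch below shows that control flow under stated assumptions: tiers are synchronous callbacks for brevity (the real search is presumably asynchronous), a tier "fails" by throwing, and `Hit`, `SearchTier`, and `searchWithFallback` are hypothetical names.

```typescript
interface Hit {
  id: string;
  score: number;
}

// A tier either returns ranked hits or throws when it is unavailable.
type SearchTier = (query: string) => Hit[];

// Try each tier in order; the first one that does not throw wins.
function searchWithFallback(query: string, tiers: SearchTier[]): Hit[] {
  for (const tier of tiers) {
    try {
      return tier(query);
    } catch {
      // Tier unavailable (for example, no native vector extension): try the next one.
    }
  }
  return []; // every tier failed
}
```

With a Tier 1 callback that throws and a Tier 2 callback that scores compressed blobs, the caller still gets results; only when both throw does the function return an empty array, matching the last branch of the diagram.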
96
+
97
+ ### Live Usage: How TurboQuant Works in Practice
98
+
99
+ **Every `session_save_ledger` call now generates both tiers automatically:**
100
+
101
+ ```typescript
102
+ // What happens behind the scenes when you save a session:
103
+ await saveLedger({ project: "my-app", summary: "Built auth flow" });
104
+
105
+ // 1. Gemini generates float32 embedding (3,072 bytes)
106
+ // 2. TurboQuant compresses to turbo4 blob (~400 bytes)
107
+ // 3. Single atomic patchLedger writes BOTH to the database
108
+ // → embedding: "[0.0234, -0.0156, ...]" (float32)
109
+ // → embedding_compressed: "base64..." (turbo4)
110
+ // → embedding_format: "turbo4"
111
+ // → embedding_turbo_radius: 12.847
112
+
113
+ // Searching works seamlessly across both tiers:
114
+ await searchMemory({ query: "auth flow" });
115
+ // → Tier 1 tries native vector search
116
+ // → If unavailable, Tier 2 deserializes compressed blobs
117
+ // and ranks using asymmetric cosine similarity in JS
118
+ ```
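To make "asymmetric" similarity concrete, here is a deliberately simplified sketch. It is not TurboQuant: it swaps in uniform 4-bit scalar quantization (no rotation, codebook, or QJL correction) purely to show how a float query can be scored against packed codes without first materializing the stored vector. All names (`Quantized`, `quantize4`, `asymmetricCosine`) are hypothetical.

```typescript
interface Quantized {
  codes: Uint8Array; // two 4-bit codes per byte
  min: number;       // value represented by code 0
  step: number;      // distance between adjacent code values
}

// Uniform 4-bit scalar quantization (a stand-in technique, not TurboQuant).
function quantize4(v: number[]): Quantized {
  const min = Math.min(...v);
  const max = Math.max(...v);
  const step = (max - min) / 15 || 1; // fall back to 1 for constant vectors
  const codes = new Uint8Array(Math.ceil(v.length / 2));
  v.forEach((x, i) => {
    const code = Math.min(15, Math.max(0, Math.round((x - min) / step)));
    const byte = codes[i >> 1] ?? 0;
    // Even indices use the low nibble, odd indices the high nibble.
    codes[i >> 1] = i % 2 === 0 ? byte | code : byte | (code << 4);
  });
  return { codes, min, step };
}

// Full-precision query vs. packed codes: each coordinate is decoded on the fly.
function asymmetricCosine(query: number[], q: Quantized): number {
  let dot = 0;
  let queryNorm = 0;
  let storedNorm = 0;
  query.forEach((x, i) => {
    const byte = q.codes[i >> 1] ?? 0;
    const code = i % 2 === 0 ? byte & 0x0f : byte >> 4;
    const approx = q.min + code * q.step;
    dot += x * approx;
    queryNorm += x * x;
    storedNorm += approx * approx;
  });
  const denom = Math.sqrt(queryNorm) * Math.sqrt(storedNorm);
  return denom === 0 ? 0 : dot / denom;
}
```

For a 768-dimensional vector this packs into 384 bytes of codes, which is where most of the ~400-byte figure comes from; the remaining bytes in the real format would hold per-vector metadata such as the stored radius.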
119
+
120
+ **Backfill existing entries with one command:**
121
+ ```
122
+ > Use tool: session_backfill_embeddings
123
+ > Now repairs AND compresses in a single atomic update
124
+ ```
125
+
126
+ > **💡 Ollama TurboQuant Tip:** If using Ollama for self-hosted inference, set `OLLAMA_KV_CACHE_TYPE=turbo3` for 10× smaller KV caches during generation — the same algorithm powering Prism's memory compression.
127
+
128
+ ---
129
+
130
+ <details>
131
+ <summary><strong>What's in v4.6.0 — OpenTelemetry Observability 🔭</strong></summary>
46
132
 
47
133
  > **🔭 Full distributed tracing for every MCP tool call, LLM provider hop, and background AI worker.**
48
134
  > Configure in the new **🔭 Observability** tab in Mind Palace — no code changes required.
49
135
  > Activates a 4-tier span waterfall: `mcp.call_tool` → `worker.vlm_caption` → `llm.generate_image_description` / `llm.generate_embedding`.
50
136
 
137
+ </details>
138
+
51
139
  <a name="whats-new-in-v451--gdpr-export-"></a>
52
140
  <details>
53
141
  <summary><strong>What's in v4.5.1 — GDPR Export & Test Hardening 🔒</strong></summary>
@@ -234,7 +322,7 @@
234
322
  | Feature | Description |
235
323
  |---|---|
236
324
  | 🏠 **Local-First SQLite** | Run Prism entirely locally with zero cloud dependencies. Full vector search (libSQL F32_BLOB) and FTS5 included. |
237
- | 🔮 **Mind Palace UI** | A beautiful glassmorphism dashboard at `localhost:3000` to inspect your agent's memory, visual vault, and Git drift. |
325
+ | 🔮 **Mind Palace UI** | A beautiful glassmorphism dashboard at `localhost:3000` (configurable via `PRISM_DASHBOARD_PORT`) to inspect your agent's memory, visual vault, and Git drift. |
238
326
  | 🕰️ **Time Travel** | `memory_history` and `memory_checkout` act like `git revert` for your agent's brain — full version history with OCC. |
239
327
  | 🖼️ **Visual Memory** | Agents can save screenshots to a local media vault. Auto-capture mode snapshots your local dev server on every handoff save. |
240
328
  | 📡 **Agent Telepathy** | Multi-client sync: if your agent in Cursor saves state, Claude Desktop gets a live notification instantly. |
@@ -271,9 +359,14 @@
271
359
  | **Auto-Compaction** | ✅ Gemini rollups | ❌ | ❌ | ❌ | ❌ |
272
360
  | **Morning Briefing** | ✅ Gemini synthesis | ❌ | ❌ | ❌ | ❌ |
273
361
  | **OCC (Concurrency)** | ✅ Version-based | ❌ | ❌ | ❌ | ❌ |
274
- | **GDPR Compliance** | ✅ Soft/hard delete | ❌ | ❌ | ❌ | ❌ |
362
+ | **GDPR Compliance** | ✅ Soft/hard delete + ZIP export | ❌ | ❌ | ❌ | ❌ |
275
363
  | **Memory Tracing** | ✅ Latency breakdown | ❌ | ❌ | ❌ | ❌ |
364
+ | **OpenTelemetry** | ✅ OTLP spans (v4.6) | ❌ | ❌ | ❌ | ❌ |
365
+ | **VLM Image Captions** | ✅ Auto-caption vault (v4.5) | ❌ | ❌ | ❌ | ❌ |
366
+ | **Pluggable LLM Adapters** | ✅ OpenAI/Anthropic/Gemini/Ollama | ❌ | ✅ Multi-provider | ❌ | ❌ |
276
367
  | **LangChain** | ✅ BaseRetriever | ❌ | ❌ | ❌ | ❌ |
368
+ | **Vector Compression** | ✅ TurboQuant 10× (v5.0) | ❌ | ❌ | ❌ | ❌ |
369
+ | **Three-Tier Search** | ✅ FTS + Vec + Quantized | ❌ | ❌ | ❌ | ❌ |
277
370
  | **MCP Native** | ✅ stdio | ✅ stdio | ❌ Python SDK | ✅ HTTP + MCP | ✅ stdio |
278
371
  | **Language** | TypeScript | TypeScript | Python | Python | Python |
279
372
 
@@ -465,11 +558,36 @@ Add to your Continue `config.json` or Cline MCP settings:
465
558
 
466
559
  ## Claude Code Integration (Hooks)
467
560
 
468
- Claude Code supports **lifecycle hooks** in `~/.claude/settings.json` that fire automatically at session start and end. Use these to auto-hydrate and persist Prism memory without manual prompting.
561
+ Claude Code supports custom hooks (`SessionStart`, `Stop`) that can force the agent to load and save Prism context automatically. Because Claude Code requires explicit permission for MCP tools, you must also whitelist the Prism commands.
562
+
563
+ ### 1. The Auto-Load Hook Script
564
+
565
+ Create a Python script (e.g., `~/.claude/mcp_autoload_hook.py`). This script outputs JSON that Claude Code reads during the `SessionStart` event.
469
566
 
470
- ### SessionStart Hook
567
+ ```python
568
+ #!/usr/bin/env python3
569
+ import json
570
+ import sys
571
+
572
+ def main():
573
+     # Inject a system message forcing the agent to load memory BEFORE speaking
574
+     print(json.dumps({
575
+         "continue": True,
576
+         "suppressOutput": True,
577
+         "systemMessage": (
578
+             "## First Action\n"
579
+             "Call `mcp__prism-mcp__session_load_context(project='my-project', level='deep')` "
580
+             "before responding to the user. Do not generate any text before calling this tool."
581
+         )
582
+     }))
583
+
584
+ if __name__ == "__main__":
585
+     main()
586
+ ```
471
587
 
472
- Automatically loads context when a new session begins:
588
+ ### 2. Configure `settings.json`
589
+
590
+ Map the hooks in your `~/.claude/settings.json`:
473
591
 
474
592
  ```json
475
593
  {
@@ -480,47 +598,45 @@ Automatically loads context when a new session begins:
480
598
  "hooks": [
481
599
  {
482
600
  "type": "command",
483
- "command": "python3 -c \"import json; print(json.dumps({'continue': True, 'suppressOutput': False, 'systemMessage': 'You MUST call mcp__prism-mcp__session_load_context twice before responding to the user: first with project=my-project level=standard, then with project=my-other-project level=standard. Do not skip this.'}))\"",
601
+ "command": "python3 /Users/you/.claude/mcp_autoload_hook.py",
484
602
  "timeout": 10
485
603
  }
486
604
  ]
487
605
  }
488
- ]
489
- }
490
- }
491
- ```
492
-
493
- ### Stop Hook
494
-
495
- Automatically saves session memory when a session ends:
496
-
497
- ```json
498
- {
499
- "hooks": {
606
+ ],
500
607
  "Stop": [
501
608
  {
502
609
  "matcher": "*",
503
610
  "hooks": [
504
611
  {
505
612
  "type": "command",
506
- "command": "python3 -c \"import json; print(json.dumps({'continue': True, 'suppressOutput': False, 'systemMessage': 'MANDATORY END WORKFLOW: 1) Call mcp__prism-mcp__session_save_ledger with project and summary. 2) Call mcp__prism-mcp__session_save_handoff with expected_version set to the loaded version.'}))\"",
613
+ "command": "python3 -c \"import json; print(json.dumps({'continue': True, 'suppressOutput': True, 'systemMessage': 'MANDATORY END WORKFLOW: 1) Call mcp__prism-mcp__session_save_ledger with project and summary. 2) Call mcp__prism-mcp__session_save_handoff with expected_version set to the loaded version.'}))\"",
507
614
  "timeout": 10
508
615
  }
509
616
  ]
510
617
  }
511
618
  ]
619
+ },
620
+ "permissions": {
621
+ "allow": [
622
+ "mcp__prism-mcp__session_load_context",
623
+ "mcp__prism-mcp__session_save_ledger",
624
+ "mcp__prism-mcp__session_save_handoff",
625
+ "mcp__prism-mcp__knowledge_search",
626
+ "mcp__prism-mcp__session_search_memory"
627
+ ]
512
628
  }
513
629
  }
514
630
  ```
515
631
 
516
632
  ### How the Hooks Work
517
633
 
518
- The hook `command` runs a Python one-liner that returns a JSON object to Claude Code:
634
+ The hook `command` runs a Python script that returns a JSON object to Claude Code:
519
635
 
520
636
  | Field | Purpose |
521
637
  |---|---|
522
638
  | `continue: true` | Tell Claude Code to proceed (don't abort the session) |
523
- | `suppressOutput: false` | Show the hook result to the agent |
639
+ | `suppressOutput: true` | Silently inject the system message (recommended for Stop hooks) |
524
640
  | `systemMessage` | Instruction injected as a system message — the agent follows it |
525
641
 
526
642
  The agent receives the `systemMessage` as an instruction and executes the tool calls. The server resolves the agent's **role** and **name** automatically from the dashboard — no need to specify them in the hook.
@@ -539,90 +655,153 @@ explicit tool argument → dashboard setting → "global" (default)
539
655
 
540
656
  Change your role once in the dashboard, and it automatically applies to every session — CLI, extension, and all MCP clients.
541
657
 
542
- ### Verification
543
-
544
- If hydration ran successfully, the agent's output will include:
545
- - A `[👤 AGENT IDENTITY]` block showing your dashboard-configured role and name
546
- - `PRISM_CONTEXT_LOADED` marker text
658
+ ### Troubleshooting Claude Code
547
659
 
548
- If the marker is missing, the hook did not fire or the MCP server is not connected.
660
+ - **Hook not firing?** Check the hook `timeout` in Claude Code settings. If your Python script takes too long, Claude ignores it silently.
661
+ - **"Tool not available" hallucination?** If Claude claims it doesn't have the tool, the model has usually reasoned itself into believing the tool is unavailable rather than hit a real error. Ensure every entry in the `permissions.allow` array exactly matches the double-underscore format (`mcp__prism-mcp__...`).
662
+ - **Missing `PRISM_CONTEXT_LOADED`?** The hook didn't fire or the MCP server isn't connected. Verify `prism-mcp` is listed in your `mcpServers` config.
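A quick way to catch the naming mistake described above is to check the `permissions.allow` entries mechanically. The sketch below assumes the `mcp__<server>__<tool>` pattern inferred from the examples in this README, with server names limited to letters, digits, and hyphens; the regular expression and the `findMalformedMcpEntries` helper are illustrative, not part of Claude Code or Prism.

```typescript
// Pattern inferred from entries like "mcp__prism-mcp__session_load_context".
const CLAUDE_MCP_TOOL_NAME = /^mcp__[A-Za-z0-9-]+__[A-Za-z0-9_]+$/;

// Return MCP-looking entries that do not follow the double-underscore format.
// Non-MCP entries in the allow list are ignored.
function findMalformedMcpEntries(allow: string[]): string[] {
  return allow.filter(
    (name) => name.startsWith("mcp") && !CLAUDE_MCP_TOOL_NAME.test(name),
  );
}

const suspicious = findMalformedMcpEntries([
  "mcp__prism-mcp__session_load_context",
  "mcp_prism-mcp_session_save_ledger", // single underscores: flagged
]);
console.log(suspicious);
```

Running it against the `allow` array from `~/.claude/settings.json` should print an empty list when every Prism entry is spelled correctly.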
549
663
 
550
664
  ---
551
665
 
552
666
  ## Gemini / Antigravity Integration
553
667
 
554
- Gemini-based clients (like Antigravity) use `GEMINI.md` global rules or user rules for startup behavior. The server resolves the role from the dashboard automatically.
668
+ Antigravity and Gemini-based agents require a radically simplified approach to auto-loading. If you give modern instruction-tuned models a long list of "Banned Behaviors" (e.g., "Do NOT say hello first"), their internal reasoning often over-indexes on the constraints and causes them to hallucinate that the tool doesn't exist.
555
669
 
556
- ### Global Rules (`~/.gemini/GEMINI.md`)
670
+ ### The 2-Line "First Action" Rule
557
671
 
558
- ```markdown
559
- ## Prism MCP Memory Auto-Load (CRITICAL)
560
- At the start of every new session, call `mcp__prism-mcp__session_load_context`
561
- for these projects:
562
- - `my-project` (level=standard)
563
- - `my-other-project` (level=standard)
672
+ Create a `GEMINI.md` file in your project root (or globally at `~/.gemini/GEMINI.md`), or paste this into your Antigravity **User Rules**:
564
673
 
565
- After both succeed, print PRISM_CONTEXT_LOADED.
674
+ ```markdown
675
+ ## First Action
676
+ Call `mcp_prism-mcp_session_load_context(project="my-project", level="deep")` before responding.
566
677
  ```
567
678
 
568
- ### User Rules (Antigravity Settings)
679
+ > **Note:** Antigravity uses single underscores (`mcp_prism-mcp_...`) compared to Claude Code's double underscores (`mcp__prism-mcp__...`).
569
680
 
570
- If your Gemini client supports user rules, add the same instructions there. The key points:
681
+ That's it: **two lines**. This approach proved reliable after 13 iterations of increasingly complex prompt engineering. The key insight: shorter instructions avoid triggering the model's adversarial reasoning about tool availability.
571
682
 
572
- 1. **Call `session_load_context` as a tool** — not `read_resource`. Only the tool returns the `[👤 AGENT IDENTITY]` block.
573
- 2. **Verify** — confirm the response includes `version` and `last_summary`.
683
+ ### Session End Protocol
574
684
 
575
- ### Session End
685
+ At the end of your conversation, explicitly tell the agent:
686
+ > *"Wrap up the session."*
576
687
 
577
- At the end of each session, save state:
688
+ The agent will rely on its system prompt to execute:
689
+ 1. `session_save_ledger` — immutable work log with summary, TODOs, and decisions
690
+ 2. `session_save_handoff` — passing the `expected_version` it received during the load step to ensure Optimistic Concurrency Control
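The role of `expected_version` in the handoff step can be illustrated with a minimal in-memory model of optimistic concurrency control. This is a sketch of the general technique, not Prism's storage code: `HandoffStore`, `Handoff`, and the starting version of 0 are assumptions made for the example.

```typescript
interface Handoff {
  version: number;
  lastSummary: string;
}

class HandoffStore {
  private state = new Map<string, Handoff>();

  // Loading returns the current version, which the caller must echo back on save.
  load(project: string): Handoff {
    return this.state.get(project) ?? { version: 0, lastSummary: "" };
  }

  // The write succeeds only if nobody else saved since the caller loaded.
  save(project: string, lastSummary: string, expectedVersion: number): Handoff {
    const current = this.load(project);
    if (current.version !== expectedVersion) {
      throw new Error(
        `version conflict: expected v${expectedVersion}, found v${current.version}`,
      );
    }
    const next: Handoff = { version: current.version + 1, lastSummary };
    this.state.set(project, next);
    return next;
  }
}
```

If two agents load version 3 and both try to save, the first write bumps the handoff to version 4 and the second is rejected, so the slower agent has to reload and merge instead of silently overwriting.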
691
+
692
+ ### Antigravity UI Caveats
693
+
694
+ Antigravity's UI currently does **not** visually render the raw output of MCP tool calls. To ensure the agent actually ingested the context, add this to your User Rules:
578
695
 
579
696
  ```markdown
580
- ## Session End Protocol
581
- 1) Call `mcp__prism-mcp__session_save_ledger` with project and summary.
582
- 2) Call `mcp__prism-mcp__session_save_handoff` with expected_version from the loaded version.
697
+ ## STEP 2: Echo Context in Your Text Response
698
+ After the tool returns, include the following in your greeting text:
699
+ - Agent identity: `🤖 Agent: <role> <name>`
700
+ - Last session summary
701
+ - Open TODOs
702
+ - Session version number
583
703
  ```
584
704
 
705
+ This forces the agent to prove it loaded context by echoing it in visible text.
706
+
585
707
  ---
586
708
 
587
709
  ## Use Cases
588
710
 
589
- | Scenario | How Prism MCP Helps |
590
- |----------|-------------------|
591
- | **Long-running feature work** | Save session state at end of day, restore full context the next morning — no re-explaining |
592
- | **Multi-agent collaboration** | Telepathy sync lets multiple agents share context in real time |
593
- | **Consulting / multi-project** | Switch between client projects with progressive context loading |
594
- | **Research & analysis** | Multi-engine search with 94% context reduction via sandboxed code transforms |
595
- | **Team onboarding** | New team member's agent loads full project history via `session_load_context("deep")` |
596
- | **Visual debugging** | Save screenshots of broken UI to visual memory — the agent remembers what it looked like |
597
- | **Offline / air-gapped** | Full SQLite local mode with no internet dependency for memory features |
711
+ | Scenario | How Prism MCP Helps | Live Sample |
712
+ |----------|---------------------|-------------|
713
+ | **Long-running feature work** | Save session state at end of day, restore full context next morning — no re-explaining | `session_save_handoff(project, last_summary, open_todos)` |
714
+ | **Multi-agent collaboration** | Hivemind Telepathy lets multiple agents share real-time context across clients | `session_load_context(project, role="qa")` |
715
+ | **Consulting / multi-project** | Switch between client projects with progressive context loading | `session_load_context(project, level="quick")` |
716
+ | **Research & analysis** | Multi-engine search with 94% context reduction via sandboxed code transforms | `brave_web_search` + `code_mode_transform(template="api_endpoints")` |
717
+ | **Team onboarding** | New team member's agent loads full project history instantly | `session_load_context(project, level="deep")` |
718
+ | **Visual debugging** | Save UI screenshots to visual memory — searchable by description | `session_save_image(project, path, description)` → `session_view_image(id)` |
719
+ | **Offline / air-gapped** | Full SQLite local mode, Ollama LLM adapter — zero internet dependency | `PRISM_LLM_PROVIDER=ollama` in MCP config env |
720
+ | **Behavior enforcement** | Agent corrections auto-graduate into permanent `.cursorrules` | `session_save_experience(event_type="correction")` → `knowledge_sync_rules(project)` |
721
+ | **Infrastructure observability** | OTel spans to Jaeger/Grafana for every MCP tool call fanout | Enable in Dashboard → Settings → 🔭 Observability |
722
+ | **GDPR / audit export** | ZIP export of all memory as JSON + Markdown, sensitive fields redacted | `session_export_memory(project, format="zip")` |
723
+
724
+ ---
725
+
726
+ ## New in v4.6.0 — Feature Setup Guide
727
+
728
+ ### 🔭 OpenTelemetry Distributed Tracing
729
+
730
+ **Why:** Every `session_save_ledger` call can silently fan out into a synchronous DB write, an async VLM caption, and a vector embedding backfill. Without tracing, these are invisible. OTel makes the full call tree visible in Jaeger, Grafana Tempo, or any OTLP-compatible collector.
731
+
732
+ **Setup:**
733
+ 1. Open Mind Palace Dashboard → ⚙️ Settings → 🔭 Observability
734
+ 2. Toggle **Enable OpenTelemetry** → set your OTLP endpoint (default: `http://localhost:4318`)
735
+ 3. Restart the MCP server
736
+ 4. Run Jaeger locally:
737
+ ```bash
738
+ docker run -d --name jaeger \
739
+ -p 16686:16686 -p 4318:4318 \
740
+ jaegertracing/all-in-one:latest
741
+ ```
742
+ 5. Open http://localhost:16686 — select service `prism-mcp` to see span waterfalls.
743
+
744
+ **Span hierarchy:**
745
+ ```
746
+ mcp.call_tool [session_save_ledger]
747
+ ├── storage.write_ledger ~2ms
748
+ ├── llm.generate_embedding ~180ms
749
+ └── worker.vlm_caption (async) ~1.2s
750
+ ```
751
+
752
+ > GDPR note: Span attributes contain only metadata — no prompt content, embeddings, or image data.
753
+
754
+ ---
755
+
756
+ ### 🖼️ VLM Multimodal Memory
757
+
758
+ **Why:** Agents lose visual context between sessions. UI screenshots, architecture diagrams, and bug states all become searchable memory.
759
+
760
+ **Setup:** Requires `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` (vision-capable model).
761
+
762
+ **Usage:**
763
+ ```
764
+ session_save_image(project="my-app", file_path="/path/to/screenshot.png", description="Login page broken layout after CSS refactor")
765
+ ```
766
+ The image is auto-captioned by a VLM and stored in the media vault. Retrieve later:
767
+ ```
768
+ session_view_image(project="my-app", image_id="8f2a1b3c")
769
+ ```
598
770
 
599
771
  ---
600
772
 
601
773
  ## Architecture
602
774
 
775
+ > **📖 Deep dive**: [Full Architecture Guide](docs/ARCHITECTURE.md) — TurboQuant math, Three-Tier search, storage optimization flow
776
+ > **🤖 Tutorial**: [How to Build a Self-Improving Agent](docs/self-improving-agent.md) — corrections → behavioral memory → IDE rules
777
+
603
778
  ```mermaid
604
779
  graph TB
605
780
  Client["AI Client<br/>(Claude Desktop / Cursor / Windsurf)"]
606
- LangChain["LangChain / LangGraph<br/>(Python Retrievers)"]
781
+ LangChain["LangChain / LangGraph<br/>(Python/TS Retrievers)"]
607
782
  MCP["Prism MCP Server<br/>(TypeScript)"]
608
783
 
609
784
  Client -- "MCP Protocol (stdio)" --> MCP
610
785
  LangChain -- "JSON-RPC via MCP Bridge" --> MCP
611
786
 
612
- MCP --> Tracing["MemoryTrace Engine<br/>Latency + Strategy + Scoring"]
613
- MCP --> Dashboard["Mind Palace Dashboard<br/>localhost:3000"]
787
+ MCP --> Tracing["OTel Tracing<br/>v4.6 Observability"]
788
+ MCP --> Dashboard["Mind Palace Dashboard<br/>localhost:3000<br/>(PRISM_DASHBOARD_PORT)"]
614
789
  MCP --> Brave["Brave Search API<br/>Web + Local + AI Answers"]
615
- MCP --> Gemini["Google Gemini API<br/>Analysis + Briefings"]
790
+ MCP --> LLM["LLM Factory<br/>Gemini / OpenAI / Ollama"]
616
791
  MCP --> Sandbox["QuickJS Sandbox<br/>Code-Mode Templates"]
617
792
  MCP --> SyncBus["SyncBus<br/>Agent Telepathy"]
618
793
  MCP --> GDPR["GDPR Engine<br/>Soft/Hard Delete + Audit"]
619
794
 
620
795
  MCP --> Storage{"Storage Backend"}
621
- Storage --> SQLite["SQLite (Local)<br/>libSQL + F32_BLOB vectors"]
796
+ Storage --> SQLite["SQLite (Local)<br/>libSQL + sqlite-vec"]
622
797
  Storage --> Supabase["Supabase (Cloud)<br/>PostgreSQL + pgvector"]
623
798
 
624
- SQLite --> Ledger["session_ledger<br/>(+ deleted_at tombstoning)"]
625
- SQLite --> Handoffs["session_handoffs"]
799
+ SQLite --> Ledger["session_ledger"]
800
+ Ledger --> T1["Tier 1: float32<br/>3,072B native search"]
801
+ T1 -- "v5.0 TurboQuant" --> T2["Tier 2: turbo4<br/>400B JS search"]
802
+ T1 -. "v5.1 Purge" .-> Null["NULL after 30d"]
803
+
804
+ SQLite --> Handoffs["session_handoffs<br/>(OCC versioning)"]
626
805
  SQLite --> History["history_snapshots<br/>(Time Travel)"]
627
806
  SQLite --> Media["media vault<br/>(Visual Memory)"]
628
807
 
@@ -632,13 +811,16 @@ graph TB
632
811
  style Tracing fill:#D69E2E,color:#fff
633
812
  style Dashboard fill:#9F7AEA,color:#fff
634
813
  style Brave fill:#FB542B,color:#fff
635
- style Gemini fill:#4285F4,color:#fff
814
+ style LLM fill:#4285F4,color:#fff
636
815
  style Sandbox fill:#805AD5,color:#fff
637
816
  style SyncBus fill:#ED64A6,color:#fff
638
817
  style GDPR fill:#E53E3E,color:#fff
639
818
  style Storage fill:#2D3748,color:#fff
640
819
  style SQLite fill:#38B2AC,color:#fff
641
820
  style Supabase fill:#3ECF8E,color:#fff
821
+ style T1 fill:#48BB78,color:#fff
822
+ style T2 fill:#E8B004,color:#000
823
+ style Null fill:#E53E3E,color:#fff
642
824
  ```
643
825
 
644
826
  ---
@@ -968,6 +1150,7 @@ The retrievers use `_aget_relevant_documents` as the primary path with `asyncio.
968
1150
  | `PRISM_AUTO_CAPTURE` | No | Set `"true"` to auto-capture HTML snapshots of dev servers |
969
1151
  | `PRISM_CAPTURE_PORTS` | No | Comma-separated ports to scan (default: `3000,3001,5173,8080`) |
970
1152
  | `PRISM_DEBUG_LOGGING` | No | Set `"true"` to enable verbose debug logs (default: quiet) |
1153
+ | `PRISM_DASHBOARD_PORT` | No | Configure the dashboard port (default: `3000`) |
971
1154
 
972
1155
  ---
973
1156
 
@@ -1420,7 +1603,6 @@ See [`vertex-ai/`](vertex-ai/) for setup and benchmarks.
1420
1603
  │ │ ├── compactionHandler.ts # Gemini-powered ledger compaction
1421
1604
  │ │ └── index.ts # Tool registration & re-exports
1422
1605
  │ └── utils/
1423
- │ └── utils/
1424
1606
  │ ├── telemetry.ts # OTel singleton — NodeTracerProvider, BatchSpanProcessor, no-op mode
1425
1607
  │ ├── tracing.ts # MemoryTrace types + factory (Phase 1 — LLM explainability)
1426
1608
  │ ├── imageCaptioner.ts # VLM auto-caption pipeline (v4.5) + worker.vlm_caption OTel span
@@ -1462,6 +1644,24 @@ See [`vertex-ai/`](vertex-ai/) for setup and benchmarks.
1462
1644
 
1463
1645
  > **[View the full project board →](https://github.com/users/dcostenco/projects/1/views/1)** | **[Full ROADMAP.md →](ROADMAP.md)**
1464
1646
 
1647
+ ### ✅ v5.0 — Quantized Agentic Memory (Shipped!)
1648
+
1649
+ | Feature | Description |
1650
+ |---|---|
1651
+ | 🧮 **TurboQuant Math Core** | Pure TypeScript port of Google's TurboQuant (ICLR 2026) — Lloyd-Max codebook, QR rotation, QJL error correction. Zero dependencies. [RFC-001](docs/rfcs/001-turboquant-integration.md) |
1652
+ | 📦 **~7× Embedding Compression** | 768-dim embeddings shrink from 3,072 bytes to ~400 bytes (4-bit) via variable bit-packing. |
1653
+ | 🔍 **Asymmetric Similarity** | Unbiased inner product estimator: query as float32 vs compressed blobs. No decompression needed. |
1654
+ | 🗄️ **Two-Tier Search** | FTS5 candidate filter → JS-side asymmetric scoring. Bypasses sqlite-vec float32 limitation. |
1655
+
1656
+ ### ✅ v5.1 — Deep Storage Mode (Shipped!)
1657
+
1658
+ | Feature | Description |
1659
+ |---|---|
1660
+ | 🧬 **Deep Storage Purge** | Automated `deep_storage_purge` tool NULLs out redundant float32 embeddings for entries with TurboQuant compressed blobs, reclaiming ~90% of vector storage. |
1661
+ | 🛡️ **Safety Guards** | Minimum 7-day age threshold, dry-run preview mode, multi-tenant isolation, and compressed-blob-existence validation ensure zero data loss. |
1662
+ | 🗃️ **Supabase RPC** | `prism_purge_embeddings` Postgres function (migration 030) provides full backend parity with SQLite. Auto-applied via the v4.1 migration runner. |
1663
+ | 🧪 **303 Tests** | 8 new deep-storage test cases covering dry run, execute, safety guards, and idempotency — zero regressions across the full suite. |
1664
+
1465
1665
  ### ✅ v4.6 — OpenTelemetry Observability (Shipped!)
1466
1666
 
1467
1667
  | Feature | Description |
@@ -1516,11 +1716,11 @@ See [v3.1.0](#whats-in-v310--memory-lifecycle-) and [v3.0.0](#whats-in-v300--age
1516
1716
 
1517
1717
  | Priority | Feature | Description |
1518
1718
  |----------|---------|-------------|
1519
- | 🥇 | **Documentation & Architecture Guide** | Full README overhaul with architecture diagrams, "How to build a self-improving agent" walkthrough, and v4.x feature matrix. |
1520
- | 🥈 | **Knowledge Graph Editor** | Visual graph in Mind Palace showing nodes for projects, agents, sessions, and graduated rules. |
1719
+ | | **Documentation & Architecture Guide** | [Architecture Guide](docs/ARCHITECTURE.md), [Self-Improving Agent Guide](docs/self-improving-agent.md), updated README diagram with v5.x vector tiers. |
1720
+ | | **Knowledge Graph Editor** | Interactive vis.js graph with click-to-filter, node stats, project/keyword/category visualization. |
1521
1721
  | 🥉 | **Autonomous Web Scholar** | Agent-driven learning pipeline using Brave Search + VLM to autonomously build project context while the developer sleeps. |
1522
- | | **Dashboard Auth** | Optional basic auth for remote Mind Palace access. |
1523
- | | **TypeScript LangGraph Examples** | Reference implementations alongside the existing Python agent. |
1722
+ | | **Dashboard Auth** | HTTP Basic Auth with session cookies, timing-safe comparison, styled login page. Set `PRISM_DASHBOARD_USER`/`PRISM_DASHBOARD_PASS`. |
1723
+ | | **TypeScript LangGraph Examples** | [Reference agent](examples/langgraph-ts/) with MCP client, memory retriever nodes, and session persistence. |
1524
1724
  | — | **CRDT Conflict Resolution** | Conflict-free types for concurrent multi-agent edits on the same handoff. |
1525
1725
 
1526
1726
  ---