prism-mcp-server 16.1.1 → 16.2.0 (npm)

Files changed (4)

package/README.md +122 -0
package/dist/server.js +4 -0
package/dist/utils/llm/adapters/gemini.js +1 -1
package/package.json +1 -1

package/README.md CHANGED

@@ -229,6 +229,51 @@ That's it. Open Claude / Cursor and your AI now has memory.
 More setup details in [`docs/SETUP_GEMINI.md`](docs/SETUP_GEMINI.md).
+### Monitoring & Observability *(new in v16.2)*
+Built-in Datadog integration — every tool call is logged with tool name, project, and latency. Zero config for self-hosted users (logs to stdout); set `DD_API_KEY` to send structured logs to Datadog HTTP intake.
+```bash
+# Enable Datadog logging (optional)
+export DD_API_KEY=your_datadog_api_key
+# Enable OpenTelemetry tracing (optional — works with Jaeger, Zipkin, Datadog, Grafana Tempo)
+export PRISM_OTEL_ENABLED=true
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+```
+**What's tracked automatically:**
+- `mcp.tool.success` — tool name, project, duration (ms) on every successful call
+- `mcp.tool.error` — tool name, error message, stack trace on failures
+- OpenTelemetry spans with `tool.name` and `project` attributes on all 50 tool handlers
+| Dashboard | What it tracks |
+|-----------|---------------|
+| [Prism MCP — Server Analytics](https://app.datadoghq.com/dashboard/tdm-92f-myh/prism-mcp--server-analytics) | Tool call volume, latency per tool (avg/p95), errors by tool, project activity, knowledge search/ingest, session memory ops |
+### In-app analytics for paid users *(new in v16.2)*
+Paid Synalux subscribers get a built-in analytics dashboard at `/app/memory-analytics`:
+```
+┌─────────────────────────────────────────────────────────┐
+│  Analytics                              [standard] plan │
+├─────────────────────────────────────────────────────────┤
+│  📝 Sessions: 147  🔄 Handoffs: 23  📚 Knowledge: 89  │
+│  📁 Projects: 5    💾 Memory: 42 KB                    │
+├─────────────────────────────────────────────────────────┤
+│  Today's Usage    🧠 47/200  🔎 12/50  💬 85/200       │
+├─────────────────────────────────────────────────────────┤
+│  30-Day Trend     ▂▃▅▇▆▄▃▅▆▇█▇▅▃▂▃▅▆▇▅▃▂▁▂▃▅▇▆▅▃    │
+├─────────────────────────────────────────────────────────┤
+│  Top Projects     prism-mcp (45) · portal (32) · ...   │
+│  Compaction       3 entries > 5KB — run compact_ledger  │
+└─────────────────────────────────────────────────────────┘
+```
+- **Free tier**: paywall with upgrade CTA
+- **Standard+**: session counts, handoffs, knowledge entries, daily quotas with tier limits, 30-day activity trend, project breakdown, compaction candidates
 ---
 ## How AI agents use it
@@ -319,6 +364,83 @@ python3 tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py
 |---|---|---|
 | Per-model BFCL | [`tests/benchmarks/prism-routing-100/`](tests/benchmarks/prism-routing-100/) | Solo accuracy per model, 12 categories |
 | Cascade vs Opus | [`tests/benchmarks/cascade-14b-32b-opus/`](tests/benchmarks/cascade-14b-32b-opus/) | Tier distribution, Opus engagement rate, cascade accuracy |
+| LoCoMo-Plus (Cognitive) | `/tmp/Locomo-Plus/` | Long-context dialogue coherence and historical memory retention |
+### Cognitive Dialogue Memory (LoCoMo-Plus Benchmark)
+LoCoMo-Plus is a long-context, multi-day dialogue benchmark designed to test an AI agent's memory retention, context awareness, and ability to coherently reference historical dialogue evidence.
+The **Cognitive** subset (401 multi-day dialogue scenarios) was evaluated head-to-head comparing raw baseline models against the **Prism-MCP** framework (using local SQLite semantic memory). Graded by a neutral `gemini-2.5-flash` model acting as judge (scoring on coherence, continuity, and fact accuracy):
+| Configuration | Total Samples | Total Score | Average Score | Absolute Delta | Relative Error Reduction |
+| :--- | :---: | :---: | :---: | :---: | :---: |
+| **Gemini-2.5-flash (Baseline)** | 401 | 278.0 / 401 | **69.33%** | — | — |
+| **Prism-MCP (Gemini-2.5-flash + Memory)** | 401 | 361.0 / 401 | **90.02%** | **+20.69%** | **67.46%** |
+| **Gemini-3.1-pro-preview (Baseline)** | 401 | 272.0 / 401 | **67.83%** | — | — |
+| **Prism-MCP (Gemini-3.1-pro + Memory)** | 401 | 382.0 / 401 | **95.26%** | **+27.43%** | **85.27%** |
+**Key Takeaways**:
+* **Pure attention limits**: Larger raw frontier models (Gemini 3.1 Pro baseline at **67.83%**) suffer from attention dilution (the "needle in a haystack" problem) when parsing massive multi-day transcripts directly in active context.
+* **Semantic database synergy**: Equipping a model with Prism-MCP's structured semantic memory retrieval yields state-of-the-art performance (**95.26%** for Gemini 3.1 Pro + Memory), proving that structured semantic recall is far more accurate than raw model scaling alone.
+<details>
+<summary>🔍 View Test Case Schema & Sample</summary>
+A representative test sample from the [unified_cognitive_only.json](file:///tmp/Locomo-Plus/data/unified_cognitive_only.json) ([GitHub source](https://github.com/dcostenco/Locomo-Plus/blob/main/data/unified_cognitive_only.json)) dataset contains a multi-turn chat history with a memory "needle" placed days prior, followed by a cued dialogue prompt:
+```json
+{
+  "category": "Cognitive",
+  "input_prompt": "Caroline said, \"...\"\nMelanie said, \"...\"",
+  "trigger": "Melanie said, \"Hey, Caroline! Nice to hear from you! Love the necklace, any special meaning to it?\"",
+  "evidence": "Swedish grandmother's necklace was gifted to Caroline",
+  "answer": "Yes, this necklace was a gift from my grandmother in my home country, Sweden."
+}
+```
+When evaluated:
+* **Baseline models** without memory frequently output a generic guess (e.g., "Thanks, it was a gift from a friend") or fail to reference the Sweden/grandmother relationship.
+* **Prism-MCP** automatically embeds the prior turns, stores them in SQLite, and when cued, retrieves the precise "Swedish grandmother" evidence turn via semantic vectors to inject it into active context.
+</details>
+<details>
+<summary>💻 View How to Reproduce Publicly (Test Source & Guide)</summary>
+To run and review the evaluation suite on your local setup using the benchmark runner scripts ([evaluate_qa.py](file:///tmp/Locomo-Plus/evaluation_framework/task_eval/evaluate_qa.py) and [llm_as_judge.py](file:///tmp/Locomo-Plus/evaluation_framework/task_eval/llm_as_judge.py)):
+```bash
+# 1. Clone the LoCoMo-Plus evaluation codebase
+git clone https://github.com/dcostenco/Locomo-Plus /tmp/Locomo-Plus
+cd /tmp/Locomo-Plus
+# 2. Run Baseline Gemini 3.1 Pro Evaluation (concurrency 5)
+export GOOGLE_API_KEY="your-api-key"
+PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/evaluate_qa.py \
+  --data-file data/unified_cognitive_only.json \
+  --out-file output/gemini_3.1_pro_pred.json \
+  --model gemini-3.1-pro-preview \
+  --backend call_gemini \
+  --concurrency 5
+# 3. Run Prism-MCP powered by Gemini 3.1 Pro Evaluation (concurrency 1 to guard SQLite locks)
+export PRISM_TEXT_MODEL=gemini-3.1-pro-preview
+PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/evaluate_qa.py \
+  --data-file data/unified_cognitive_only.json \
+  --out-file output/prism_gemini_3.1_pro_pred.json \
+  --model gemini-3.1-pro-preview \
+  --backend call_prism \
+  --concurrency 1
+# 4. Grade results using the LLM-as-a-Judge script
+PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/llm_as_judge.py \
+  --input-file output/prism_gemini_3.1_pro_pred.json \
+  --out-file output/prism_gemini_3.1_pro_judged.json \
+  --model gemini-2.5-flash \
+  --backend call_gemini \
+  --concurrency 5 \
+  --summary-file output/prism_gemini_3.1_pro_summary.json
+```
+</details>
 ### Models on HuggingFace
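
The derived columns in the LoCoMo-Plus table added by this hunk can be recomputed from its raw scores. The sketch below assumes that "Absolute Delta" is the difference in percentage points and that "Relative Error Reduction" is the gain divided by the points the baseline missed; the README does not state either formula, and the `derive` helper is hypothetical.

```javascript
// Raw scores copied from the README table; every sample is scored out of 1.
const TOTAL = 401;
const rows = [
  { name: "Gemini-2.5-flash", baseline: 278, withMemory: 361 },
  { name: "Gemini-3.1-pro", baseline: 272, withMemory: 382 },
];

// Hypothetical helper: recompute the table's percentage columns.
function derive({ baseline, withMemory }, total = TOTAL) {
  const pct = (x) => (100 * x) / total;
  return {
    baselinePct: pct(baseline),
    memoryPct: pct(withMemory),
    // Absolute delta, in percentage points.
    deltaPts: pct(withMemory - baseline),
    // Share of the baseline's missed points that the memory run recovers.
    relErrorReductionPct: (100 * (withMemory - baseline)) / (total - baseline),
  };
}

for (const row of rows) {
  const rounded = Object.fromEntries(
    Object.entries(derive(row)).map(([key, value]) => [key, value.toFixed(2)])
  );
  console.log(row.name, rounded);
}
```

Under these assumptions the Gemini 3.1 Pro row reproduces exactly (27.43 points, 85.27%). The Gemini 2.5 Flash row comes out as 20.70 points and 67.48%, against the README's +20.69% and 67.46%. The README figures match what you get by subtracting and dividing the already-rounded percentages.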

package/dist/server.js CHANGED

@@ -77,6 +77,7 @@ import { getSettingSync, initConfigStorage } from "./storage/configStorage.js";
 import { sanitizeMcpOutput } from "./utils/sanitizer.js";
 import { getTracer, initTelemetry } from "./utils/telemetry.js";
 import { context as otelContext, trace, SpanStatusCode } from "@opentelemetry/api";
+import { ddInfo, ddError as ddLogError } from "./utils/ddLogger.js";
 // ─── Import Tool Definitions (schemas) and Handlers (implementations) ─────
 import { WEB_SEARCH_TOOL, BRAVE_WEB_SEARCH_CODE_MODE_TOOL, LOCAL_SEARCH_TOOL, BRAVE_LOCAL_SEARCH_CODE_MODE_TOOL, CODE_MODE_TRANSFORM_TOOL, BRAVE_ANSWERS_TOOL, RESEARCH_PAPER_ANALYSIS_TOOL, webSearchHandler, braveWebSearchCodeModeHandler, localSearchHandler, braveLocalSearchCodeModeHandler, codeModeTransformHandler, braveAnswersHandler, researchPaperAnalysisHandler, } from "./tools/index.js";
 // Session memory tools — only used if Supabase is configured
@@ -672,6 +673,7 @@ export function createServer() {
         // through await chains — including fire-and-forget workers launched
         // within the handler body (e.g. imageCaptioner, embeddings backfill).
         return otelContext.with(trace.setSpan(otelContext.active(), rootSpan), async () => {
+            const _ddStart = Date.now();
             try {
                 if (!args) {
                     throw new Error("No arguments provided");
@@ -945,6 +947,7 @@ export function createServer() {
                         };
                 }
                 rootSpan.setStatus({ code: SpanStatusCode.OK });
+                ddInfo("mcp.tool.success", { tool: name, project: args?.project, durationMs: Date.now() - _ddStart });
                 // ═══ v5.3: Hivemind Watchdog Alert Injection (Telepathy) ═══
                 // CRITICAL: Append alerts DIRECTLY to tool response content
                 // so the LLM actually reads them. sendLoggingMessage goes to
@@ -985,6 +988,7 @@ export function createServer() {
             }
             catch (error) {
                 console.error(`Error in tool handler: ${error instanceof Error ? error.message : String(error)}`);
+                ddLogError("mcp.tool.error", error instanceof Error ? error : undefined, { tool: name, project: args?.project, durationMs: Date.now() - _ddStart });
                 rootSpan.recordException(error instanceof Error ? error : new Error(String(error)));
                 rootSpan.setStatus({
                     code: SpanStatusCode.ERROR,
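
The three `server.js` hunks above add one timing pattern: capture a start timestamp before the `try`, log success with the elapsed time, and log the error with the same elapsed time in the `catch`. A minimal standalone sketch follows. The `ddInfo` and `ddError` functions here are in-memory stand-ins, because `./utils/ddLogger.js` is not part of this diff. The `withToolLogging` wrapper and the rethrow in its `catch` are illustrative assumptions, not the package's code.

```javascript
// Hypothetical stand-ins for ddInfo/ddError; the real ddLogger.js is not shown in the diff.
const logged = [];
const ddInfo = (event, fields) => logged.push({ level: "info", event, ...fields });
const ddError = (event, err, fields) =>
  logged.push({ level: "error", event, message: err?.message, ...fields });

// Illustrative wrapper: both outcomes are measured from the same start timestamp.
async function withToolLogging(name, args, handler) {
  const start = Date.now();
  try {
    const result = await handler(args);
    ddInfo("mcp.tool.success", {
      tool: name,
      project: args?.project,
      durationMs: Date.now() - start,
    });
    return result;
  } catch (error) {
    // As in the diff, a thrown value that is not an Error is passed as undefined.
    ddError("mcp.tool.error", error instanceof Error ? error : undefined, {
      tool: name,
      project: args?.project,
      durationMs: Date.now() - start,
    });
    throw error; // Assumption: the real handler may build an error response instead.
  }
}
```

One consequence of the `error instanceof Error ? error : undefined` guard is that a thrown string or plain object still produces an `mcp.tool.error` record, but the thrown value itself is not passed to the logger.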

package/dist/utils/llm/adapters/gemini.js CHANGED

@@ -37,7 +37,7 @@ import { debugLog } from "../../logger.js";
 // ─── Model Constants ──────────────────────────────────────────────────────────
 // Defined as constants (not hardcoded strings) so external reviewers can see
 // all model choices at a glance, and future changes only need one edit.
-const TEXT_MODEL = "gemini-2.5-flash"; // chat/instruction-following model
+const TEXT_MODEL = process.env.PRISM_TEXT_MODEL || "gemini-2.5-flash"; // chat/instruction-following model
 const EMBEDDING_MODEL = "gemini-embedding-001"; // vector embedding model (MRL-enabled)
 const EMBEDDING_DIMS = 768; // fixed output dims — must match DB schema
 // ─── Embedding Truncation Constants ──────────────────────────────────────────
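
The `TEXT_MODEL` change swaps a fixed string for an environment override with a fallback. The sketch below isolates that expression in a hypothetical `resolveTextModel` function so the fallback behavior can be checked with plain objects instead of the real `process.env`.

```javascript
// Hypothetical helper mirroring the new expression. With `||`, an unset
// variable and an empty string both fall back to the default model.
function resolveTextModel(env = process.env) {
  return env.PRISM_TEXT_MODEL || "gemini-2.5-flash";
}

console.log(resolveTextModel({}));
console.log(resolveTextModel({ PRISM_TEXT_MODEL: "" }));
console.log(resolveTextModel({ PRISM_TEXT_MODEL: "gemini-3.1-pro-preview" }));
```

In the adapter the expression is assigned to a module-level `const`, so it is evaluated once when the module loads. `PRISM_TEXT_MODEL` therefore has to be set before the server process starts, as the README's reproduction steps do with `export`.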

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "prism-mcp-server",
-  "version": "16.1.1",
+  "version": "16.2.0",
   "mcpName": "io.github.dcostenco/prism-coder",
   "description": "Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 54 Agent Skills, Zero-Search HDC/HRR retrieval, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder:7b / 14b open-weights LLM fleet.",
   "module": "index.ts",