prism-mcp-server 16.1.1 → 16.2.0

package/README.md CHANGED
@@ -229,6 +229,51 @@ That's it. Open Claude / Cursor and your AI now has memory.
 
  More setup details in [`docs/SETUP_GEMINI.md`](docs/SETUP_GEMINI.md).
 
+ ### Monitoring & Observability *(new in v16.2)*
+
+ Built-in Datadog integration — every tool call is logged with tool name, project, and latency. Zero config for self-hosted users (logs to stdout); set `DD_API_KEY` to send structured logs to Datadog HTTP intake.
+
+ ```bash
+ # Enable Datadog logging (optional)
+ export DD_API_KEY=your_datadog_api_key
+
+ # Enable OpenTelemetry tracing (optional — works with Jaeger, Zipkin, Datadog, Grafana Tempo)
+ export PRISM_OTEL_ENABLED=true
+ export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+ ```
+
+ **What's tracked automatically:**
+ - `mcp.tool.success` — tool name, project, duration (ms) on every successful call
+ - `mcp.tool.error` — tool name, error message, stack trace on failures
+ - OpenTelemetry spans with `tool.name` and `project` attributes on all 50 tool handlers
+
+ | Dashboard | What it tracks |
+ |-----------|---------------|
+ | [Prism MCP — Server Analytics](https://app.datadoghq.com/dashboard/tdm-92f-myh/prism-mcp--server-analytics) | Tool call volume, latency per tool (avg/p95), errors by tool, project activity, knowledge search/ingest, session memory ops |
+
+ ### In-app analytics for paid users *(new in v16.2)*
+
+ Paid Synalux subscribers get a built-in analytics dashboard at `/app/memory-analytics`:
+
+ ```
+ ┌─────────────────────────────────────────────────────────┐
+ │ Analytics [standard] plan │
+ ├─────────────────────────────────────────────────────────┤
+ │ 📝 Sessions: 147 🔄 Handoffs: 23 📚 Knowledge: 89 │
+ │ 📁 Projects: 5 💾 Memory: 42 KB │
+ ├─────────────────────────────────────────────────────────┤
+ │ Today's Usage 🧠 47/200 🔎 12/50 💬 85/200 │
+ ├─────────────────────────────────────────────────────────┤
+ │ 30-Day Trend ▂▃▅▇▆▄▃▅▆▇█▇▅▃▂▃▅▆▇▅▃▂▁▂▃▅▇▆▅▃ │
+ ├─────────────────────────────────────────────────────────┤
+ │ Top Projects prism-mcp (45) · portal (32) · ... │
+ │ Compaction 3 entries > 5KB — run compact_ledger │
+ └─────────────────────────────────────────────────────────┘
+ ```
+
+ - **Free tier**: paywall with upgrade CTA
+ - **Standard+**: session counts, handoffs, knowledge entries, daily quotas with tier limits, 30-day activity trend, project breakdown, compaction candidates
+
  ---
 
  ## How AI agents use it
@@ -319,6 +364,83 @@ python3 tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py
  |---|---|---|
  | Per-model BFCL | [`tests/benchmarks/prism-routing-100/`](tests/benchmarks/prism-routing-100/) | Solo accuracy per model, 12 categories |
  | Cascade vs Opus | [`tests/benchmarks/cascade-14b-32b-opus/`](tests/benchmarks/cascade-14b-32b-opus/) | Tier distribution, Opus engagement rate, cascade accuracy |
+ | LoCoMo-Plus (Cognitive) | `/tmp/Locomo-Plus/` | Long-context dialogue coherence and historical memory retention |
+
+ ### Cognitive Dialogue Memory (LoCoMo-Plus Benchmark)
+
+ LoCoMo-Plus is a long-context, multi-day dialogue benchmark designed to test an AI agent's memory retention, context awareness, and ability to coherently reference historical dialogue evidence.
+
+ The **Cognitive** subset (401 multi-day dialogue scenarios) was evaluated head-to-head comparing raw baseline models against the **Prism-MCP** framework (using local SQLite semantic memory). Graded by a neutral `gemini-2.5-flash` model acting as judge (scoring on coherence, continuity, and fact accuracy):
+
+ | Configuration | Total Samples | Total Score | Average Score | Absolute Delta | Relative Error Reduction |
+ | :--- | :---: | :---: | :---: | :---: | :---: |
+ | **Gemini-2.5-flash (Baseline)** | 401 | 278.0 / 401 | **69.33%** | — | — |
+ | **Prism-MCP (Gemini-2.5-flash + Memory)** | 401 | 361.0 / 401 | **90.02%** | **+20.69%** | **67.46%** |
+ | **Gemini-3.1-pro-preview (Baseline)** | 401 | 272.0 / 401 | **67.83%** | — | — |
+ | **Prism-MCP (Gemini-3.1-pro + Memory)** | 401 | 382.0 / 401 | **95.26%** | **+27.43%** | **85.27%** |
+
+ **Key Takeaways**:
+ * **Pure attention limits**: Larger raw frontier models (Gemini 3.1 Pro baseline at **67.83%**) suffer from attention dilution (the "needle in a haystack" problem) when parsing massive multi-day transcripts directly in active context.
+ * **Semantic database synergy**: Equipping a model with Prism-MCP's structured semantic memory retrieval yields state-of-the-art performance (**95.26%** for Gemini 3.1 Pro + Memory), proving that structured semantic recall is far more accurate than raw model scaling alone.
+
+ <details>
+ <summary>🔍 View Test Case Schema & Sample</summary>
+
+ A representative test sample from the [unified_cognitive_only.json](file:///tmp/Locomo-Plus/data/unified_cognitive_only.json) ([GitHub source](https://github.com/dcostenco/Locomo-Plus/blob/main/data/unified_cognitive_only.json)) dataset contains a multi-turn chat history with a memory "needle" placed days prior, followed by a cued dialogue prompt:
+
+ ```json
+ {
+ "category": "Cognitive",
+ "input_prompt": "Caroline said, \"...\"\nMelanie said, \"...\"",
+ "trigger": "Melanie said, \"Hey, Caroline! Nice to hear from you! Love the necklace, any special meaning to it?\"",
+ "evidence": "Swedish grandmother's necklace was gifted to Caroline",
+ "answer": "Yes, this necklace was a gift from my grandmother in my home country, Sweden."
+ }
+ ```
+
+ When evaluated:
+ * **Baseline models** without memory frequently output a generic guess (e.g., "Thanks, it was a gift from a friend") or fail to reference the Sweden/grandmother relationship.
+ * **Prism-MCP** automatically embeds the prior turns, stores them in SQLite, and when cued, retrieves the precise "Swedish grandmother" evidence turn via semantic vectors to inject it into active context.
+ </details>
+
+ <details>
+ <summary>💻 View How to Reproduce Publicly (Test Source & Guide)</summary>
+
+ To run and review the evaluation suite on your local setup using the benchmark runner scripts ([evaluate_qa.py](file:///tmp/Locomo-Plus/evaluation_framework/task_eval/evaluate_qa.py) and [llm_as_judge.py](file:///tmp/Locomo-Plus/evaluation_framework/task_eval/llm_as_judge.py)):
+
+ ```bash
+ # 1. Clone the LoCoMo-Plus evaluation codebase
+ git clone https://github.com/dcostenco/Locomo-Plus /tmp/Locomo-Plus
+ cd /tmp/Locomo-Plus
+
+ # 2. Run Baseline Gemini 3.1 Pro Evaluation (concurrency 5)
+ export GOOGLE_API_KEY="your-api-key"
+ PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/evaluate_qa.py \
+ --data-file data/unified_cognitive_only.json \
+ --out-file output/gemini_3.1_pro_pred.json \
+ --model gemini-3.1-pro-preview \
+ --backend call_gemini \
+ --concurrency 5
+
+ # 3. Run Prism-MCP powered by Gemini 3.1 Pro Evaluation (concurrency 1 to guard SQLite locks)
+ export PRISM_TEXT_MODEL=gemini-3.1-pro-preview
+ PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/evaluate_qa.py \
+ --data-file data/unified_cognitive_only.json \
+ --out-file output/prism_gemini_3.1_pro_pred.json \
+ --model gemini-3.1-pro-preview \
+ --backend call_prism \
+ --concurrency 1
+
+ # 4. Grade results using the LLM-as-a-Judge script
+ PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/llm_as_judge.py \
+ --input-file output/prism_gemini_3.1_pro_pred.json \
+ --out-file output/prism_gemini_3.1_pro_judged.json \
+ --model gemini-2.5-flash \
+ --backend call_gemini \
+ --concurrency 5 \
+ --summary-file output/prism_gemini_3.1_pro_summary.json
+ ```
+ </details>
 
  ### Models on HuggingFace
 
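Note on the LoCoMo-Plus table added in the README hunk above: the text does not define its last two columns. A minimal sketch, assuming "Absolute Delta" is the difference in average score and "Relative Error Reduction" is that difference divided by the baseline's error rate (1 minus baseline accuracy). `summarize` is a hypothetical helper, not part of the package or the benchmark scripts.

```javascript
// Hypothetical helper: recomputes the derived columns from the raw judge scores.
// The column definitions are assumptions inferred from the published numbers.
function summarize(baselineScore, memoryScore, samples) {
  const gained = memoryScore - baselineScore;   // extra points earned with memory
  const baselineMissed = samples - baselineScore; // points the baseline failed to earn
  const pct = (x) => Number((x * 100).toFixed(2));
  return {
    baselinePct: pct(baselineScore / samples),
    memoryPct: pct(memoryScore / samples),
    absoluteDeltaPts: pct(gained / samples),              // percentage points
    relativeErrorReductionPct: pct(gained / baselineMissed),
  };
}

console.log(summarize(278, 361, 401)); // Gemini-2.5-flash rows
console.log(summarize(272, 382, 401)); // Gemini-3.1-pro rows
```

Under these assumptions the Gemini-3.1-pro rows reproduce exactly (27.43 points, 85.27%). The Gemini-2.5-flash rows come out as 20.70 points and 67.48% when computed from the raw scores, against the table's 20.69 and 67.46, which is consistent with the table having been derived from the already-rounded percentages (90.02 − 69.33, then 20.69 / 30.67).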
package/dist/server.js CHANGED
@@ -77,6 +77,7 @@ import { getSettingSync, initConfigStorage } from "./storage/configStorage.js";
  import { sanitizeMcpOutput } from "./utils/sanitizer.js";
  import { getTracer, initTelemetry } from "./utils/telemetry.js";
  import { context as otelContext, trace, SpanStatusCode } from "@opentelemetry/api";
+ import { ddInfo, ddError as ddLogError } from "./utils/ddLogger.js";
  // ─── Import Tool Definitions (schemas) and Handlers (implementations) ─────
  import { WEB_SEARCH_TOOL, BRAVE_WEB_SEARCH_CODE_MODE_TOOL, LOCAL_SEARCH_TOOL, BRAVE_LOCAL_SEARCH_CODE_MODE_TOOL, CODE_MODE_TRANSFORM_TOOL, BRAVE_ANSWERS_TOOL, RESEARCH_PAPER_ANALYSIS_TOOL, webSearchHandler, braveWebSearchCodeModeHandler, localSearchHandler, braveLocalSearchCodeModeHandler, codeModeTransformHandler, braveAnswersHandler, researchPaperAnalysisHandler, } from "./tools/index.js";
  // Session memory tools — only used if Supabase is configured
@@ -672,6 +673,7 @@ export function createServer() {
  // through await chains — including fire-and-forget workers launched
  // within the handler body (e.g. imageCaptioner, embeddings backfill).
  return otelContext.with(trace.setSpan(otelContext.active(), rootSpan), async () => {
+ const _ddStart = Date.now();
  try {
  if (!args) {
  throw new Error("No arguments provided");
@@ -945,6 +947,7 @@ export function createServer() {
  };
  }
  rootSpan.setStatus({ code: SpanStatusCode.OK });
+ ddInfo("mcp.tool.success", { tool: name, project: args?.project, durationMs: Date.now() - _ddStart });
  // ═══ v5.3: Hivemind Watchdog Alert Injection (Telepathy) ═══
  // CRITICAL: Append alerts DIRECTLY to tool response content
  // so the LLM actually reads them. sendLoggingMessage goes to
@@ -985,6 +988,7 @@ export function createServer() {
  }
  catch (error) {
  console.error(`Error in tool handler: ${error instanceof Error ? error.message : String(error)}`);
+ ddLogError("mcp.tool.error", error instanceof Error ? error : undefined, { tool: name, project: args?.project, durationMs: Date.now() - _ddStart });
  rootSpan.recordException(error instanceof Error ? error : new Error(String(error)));
  rootSpan.setStatus({
  code: SpanStatusCode.ERROR,
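Note on the three `server.js` hunks above: together they record a start time, emit `mcp.tool.success` with the elapsed milliseconds on the happy path, and emit `mcp.tool.error` from the `catch`. The pattern can be sketched in isolation as follows. `makeLogger` and `timedCall` are hypothetical names, the logger is only a stand-in with the same call shapes as `ddInfo` / `ddLogError` (the real `./utils/ddLogger.js` is not shown in this diff), and rethrowing from the `catch` is a choice made for the sketch, not something the diff confirms.

```javascript
// Hypothetical stand-in for ddInfo / ddError: forwards one structured record per event
// to an injected sink (for example, a function that prints JSON to stdout).
function makeLogger(sink) {
  return {
    info: (event, fields) => sink({ level: "info", event, ...fields }),
    error: (event, err, fields) =>
      sink({ level: "error", event, error: err ? err.message : undefined, ...fields }),
  };
}

// Hypothetical wrapper: times one tool call and logs its outcome.
async function timedCall(logger, name, args, handler) {
  const start = Date.now();
  try {
    const result = await handler(args);
    logger.info("mcp.tool.success", {
      tool: name,
      project: args?.project,
      durationMs: Date.now() - start,
    });
    return result;
  } catch (error) {
    // Non-Error throwables are logged without an error object, as in the diff.
    logger.error("mcp.tool.error", error instanceof Error ? error : undefined, {
      tool: name,
      project: args?.project,
      durationMs: Date.now() - start,
    });
    throw error; // assumption: the caller decides how to surface the failure
  }
}
```

A sink such as `(record) => console.log(JSON.stringify(record))` would give one JSON line per call on stdout, which is in the spirit of the "logs to stdout" default described in the README hunk.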
@@ -37,7 +37,7 @@ import { debugLog } from "../../logger.js";
  // ─── Model Constants ──────────────────────────────────────────────────────────
  // Defined as constants (not hardcoded strings) so external reviewers can see
  // all model choices at a glance, and future changes only need one edit.
- const TEXT_MODEL = "gemini-2.5-flash"; // chat/instruction-following model
+ const TEXT_MODEL = process.env.PRISM_TEXT_MODEL || "gemini-2.5-flash"; // chat/instruction-following model
  const EMBEDDING_MODEL = "gemini-embedding-001"; // vector embedding model (MRL-enabled)
  const EMBEDDING_DIMS = 768; // fixed output dims — must match DB schema
  // ─── Embedding Truncation Constants ──────────────────────────────────────────
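Note on the `TEXT_MODEL` change above: because the fallback uses `||` rather than `??`, an empty `PRISM_TEXT_MODEL` also falls back to the default. A small sketch of that behavior, where `resolveTextModel` is a hypothetical helper that takes the environment as a parameter so it can be exercised without touching `process.env`:

```javascript
// Hypothetical helper mirroring the `process.env.PRISM_TEXT_MODEL || "gemini-2.5-flash"` expression.
function resolveTextModel(env = process.env) {
  return env.PRISM_TEXT_MODEL || "gemini-2.5-flash";
}

console.log(resolveTextModel({}));                                             // gemini-2.5-flash
console.log(resolveTextModel({ PRISM_TEXT_MODEL: "" }));                       // gemini-2.5-flash
console.log(resolveTextModel({ PRISM_TEXT_MODEL: "gemini-3.1-pro-preview" })); // gemini-3.1-pro-preview
```

Since the constant in the diff is evaluated once at module load, the variable presumably has to be exported before the server process starts, as the README's reproduction steps do.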
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "prism-mcp-server",
- "version": "16.1.1",
+ "version": "16.2.0",
  "mcpName": "io.github.dcostenco/prism-coder",
  "description": "Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 54 Agent Skills, Zero-Search HDC/HRR retrieval, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder:7b / 14b open-weights LLM fleet.",
  "module": "index.ts",