prism-mcp-server 16.1.1 → 16.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +122 -0
- package/dist/server.js +4 -0
- package/dist/utils/llm/adapters/gemini.js +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
```diff
@@ -229,6 +229,51 @@ That's it. Open Claude / Cursor and your AI now has memory.
 
 More setup details in [`docs/SETUP_GEMINI.md`](docs/SETUP_GEMINI.md).
 
+### Monitoring & Observability *(new in v16.2)*
+
+Built-in Datadog integration — every tool call is logged with tool name, project, and latency. Zero config for self-hosted users (logs to stdout); set `DD_API_KEY` to send structured logs to Datadog HTTP intake.
+
+```bash
+# Enable Datadog logging (optional)
+export DD_API_KEY=your_datadog_api_key
+
+# Enable OpenTelemetry tracing (optional — works with Jaeger, Zipkin, Datadog, Grafana Tempo)
+export PRISM_OTEL_ENABLED=true
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+```
+
+**What's tracked automatically:**
+
+- `mcp.tool.success` — tool name, project, duration (ms) on every successful call
+- `mcp.tool.error` — tool name, error message, stack trace on failures
+- OpenTelemetry spans with `tool.name` and `project` attributes on all 50 tool handlers
+
+| Dashboard | What it tracks |
+|-----------|---------------|
+| [Prism MCP — Server Analytics](https://app.datadoghq.com/dashboard/tdm-92f-myh/prism-mcp--server-analytics) | Tool call volume, latency per tool (avg/p95), errors by tool, project activity, knowledge search/ingest, session memory ops |
+
+### In-app analytics for paid users *(new in v16.2)*
+
+Paid Synalux subscribers get a built-in analytics dashboard at `/app/memory-analytics`:
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ Analytics [standard] plan │
+├─────────────────────────────────────────────────────────┤
+│ 📝 Sessions: 147 🔄 Handoffs: 23 📚 Knowledge: 89 │
+│ 📁 Projects: 5 💾 Memory: 42 KB │
+├─────────────────────────────────────────────────────────┤
+│ Today's Usage 🧠 47/200 🔎 12/50 💬 85/200 │
+├─────────────────────────────────────────────────────────┤
+│ 30-Day Trend ▂▃▅▇▆▄▃▅▆▇█▇▅▃▂▃▅▆▇▅▃▂▁▂▃▅▇▆▅▃ │
+├─────────────────────────────────────────────────────────┤
+│ Top Projects prism-mcp (45) · portal (32) · ... │
+│ Compaction 3 entries > 5KB — run compact_ledger │
+└─────────────────────────────────────────────────────────┘
+```
+
+- **Free tier**: paywall with upgrade CTA
+- **Standard+**: session counts, handoffs, knowledge entries, daily quotas with tier limits, 30-day activity trend, project breakdown, compaction candidates
+
 ---
 
 ## How AI agents use it
```
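The new README section describes three environment switches. As a rough illustration of how they combine, the sketch below resolves them into a telemetry configuration. It is not the package's `ddLogger.js` or `telemetry.js` (neither file appears in this diff); the function name `resolveTelemetryConfig`, the `"datadog-http"`/`"stdout"` labels, and the rule that only the literal string `true` enables tracing are assumptions. The fallback `http://localhost:4318` is the OpenTelemetry specification's default endpoint for OTLP over HTTP, which matches the README's example value.

```javascript
// Hypothetical sketch: maps the README's environment variables to telemetry settings.
// Not taken from prism-mcp-server; names and the "true"-only rule are assumptions.
function resolveTelemetryConfig(env = process.env) {
  // Any non-empty DD_API_KEY switches log shipping from stdout to Datadog.
  const logSink = env.DD_API_KEY ? "datadog-http" : "stdout";

  // Assumption: tracing is opt-in and only the exact string "true" enables it.
  const otelEnabled = env.PRISM_OTEL_ENABLED === "true";

  return {
    logSink,
    otel: otelEnabled
      ? {
          // Default OTLP/HTTP collector endpoint per the OpenTelemetry spec.
          endpoint: env.OTEL_EXPORTER_OTLP_ENDPOINT || "http://localhost:4318",
        }
      : null,
  };
}

// Example: a self-hosted user with tracing on and no Datadog key.
console.log(resolveTelemetryConfig({ PRISM_OTEL_ENABLED: "true" }));
```

Keeping the decision in one pure function of `env` makes the "zero config" default easy to check: an empty environment yields stdout logging and no tracer.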
```diff
@@ -319,6 +364,83 @@ python3 tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py
 |---|---|---|
 | Per-model BFCL | [`tests/benchmarks/prism-routing-100/`](tests/benchmarks/prism-routing-100/) | Solo accuracy per model, 12 categories |
 | Cascade vs Opus | [`tests/benchmarks/cascade-14b-32b-opus/`](tests/benchmarks/cascade-14b-32b-opus/) | Tier distribution, Opus engagement rate, cascade accuracy |
+| LoCoMo-Plus (Cognitive) | `/tmp/Locomo-Plus/` | Long-context dialogue coherence and historical memory retention |
+
+### Cognitive Dialogue Memory (LoCoMo-Plus Benchmark)
+
+LoCoMo-Plus is a long-context, multi-day dialogue benchmark designed to test an AI agent's memory retention, context awareness, and ability to coherently reference historical dialogue evidence.
+
+The **Cognitive** subset (401 multi-day dialogue scenarios) was evaluated head-to-head comparing raw baseline models against the **Prism-MCP** framework (using local SQLite semantic memory). Graded by a neutral `gemini-2.5-flash` model acting as judge (scoring on coherence, continuity, and fact accuracy):
+
+| Configuration | Total Samples | Total Score | Average Score | Absolute Delta | Relative Error Reduction |
+| :--- | :---: | :---: | :---: | :---: | :---: |
+| **Gemini-2.5-flash (Baseline)** | 401 | 278.0 / 401 | **69.33%** | — | — |
+| **Prism-MCP (Gemini-2.5-flash + Memory)** | 401 | 361.0 / 401 | **90.02%** | **+20.69%** | **67.46%** |
+| **Gemini-3.1-pro-preview (Baseline)** | 401 | 272.0 / 401 | **67.83%** | — | — |
+| **Prism-MCP (Gemini-3.1-pro + Memory)** | 401 | 382.0 / 401 | **95.26%** | **+27.43%** | **85.27%** |
+
+**Key Takeaways**:
+* **Pure attention limits**: Larger raw frontier models (Gemini 3.1 Pro baseline at **67.83%**) suffer from attention dilution (the "needle in a haystack" problem) when parsing massive multi-day transcripts directly in active context.
+* **Semantic database synergy**: Equipping a model with Prism-MCP's structured semantic memory retrieval yields state-of-the-art performance (**95.26%** for Gemini 3.1 Pro + Memory), proving that structured semantic recall is far more accurate than raw model scaling alone.
+
+<details>
+<summary>🔍 View Test Case Schema & Sample</summary>
+
+A representative test sample from the [unified_cognitive_only.json](file:///tmp/Locomo-Plus/data/unified_cognitive_only.json) ([GitHub source](https://github.com/dcostenco/Locomo-Plus/blob/main/data/unified_cognitive_only.json)) dataset contains a multi-turn chat history with a memory "needle" placed days prior, followed by a cued dialogue prompt:
+
+```json
+{
+"category": "Cognitive",
+"input_prompt": "Caroline said, \"...\"\nMelanie said, \"...\"",
+"trigger": "Melanie said, \"Hey, Caroline! Nice to hear from you! Love the necklace, any special meaning to it?\"",
+"evidence": "Swedish grandmother's necklace was gifted to Caroline",
+"answer": "Yes, this necklace was a gift from my grandmother in my home country, Sweden."
+}
+```
+
+When evaluated:
+* **Baseline models** without memory frequently output a generic guess (e.g., "Thanks, it was a gift from a friend") or fail to reference the Sweden/grandmother relationship.
+* **Prism-MCP** automatically embeds the prior turns, stores them in SQLite, and when cued, retrieves the precise "Swedish grandmother" evidence turn via semantic vectors to inject it into active context.
+</details>
+
+<details>
+<summary>💻 View How to Reproduce Publicly (Test Source & Guide)</summary>
+
+To run and review the evaluation suite on your local setup using the benchmark runner scripts ([evaluate_qa.py](file:///tmp/Locomo-Plus/evaluation_framework/task_eval/evaluate_qa.py) and [llm_as_judge.py](file:///tmp/Locomo-Plus/evaluation_framework/task_eval/llm_as_judge.py)):
+
+```bash
+# 1. Clone the LoCoMo-Plus evaluation codebase
+git clone https://github.com/dcostenco/Locomo-Plus /tmp/Locomo-Plus
+cd /tmp/Locomo-Plus
+
+# 2. Run Baseline Gemini 3.1 Pro Evaluation (concurrency 5)
+export GOOGLE_API_KEY="your-api-key"
+PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/evaluate_qa.py \
+--data-file data/unified_cognitive_only.json \
+--out-file output/gemini_3.1_pro_pred.json \
+--model gemini-3.1-pro-preview \
+--backend call_gemini \
+--concurrency 5
+
+# 3. Run Prism-MCP powered by Gemini 3.1 Pro Evaluation (concurrency 1 to guard SQLite locks)
+export PRISM_TEXT_MODEL=gemini-3.1-pro-preview
+PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/evaluate_qa.py \
+--data-file data/unified_cognitive_only.json \
+--out-file output/prism_gemini_3.1_pro_pred.json \
+--model gemini-3.1-pro-preview \
+--backend call_prism \
+--concurrency 1
+
+# 4. Grade results using the LLM-as-a-Judge script
+PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/llm_as_judge.py \
+--input-file output/prism_gemini_3.1_pro_pred.json \
+--out-file output/prism_gemini_3.1_pro_judged.json \
+--model gemini-2.5-flash \
+--backend call_gemini \
+--concurrency 5 \
+--summary-file output/prism_gemini_3.1_pro_summary.json
+```
+</details>
 
 ### Models on HuggingFace
 
```
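The last two columns of the LoCoMo-Plus table follow from the judge totals. A minimal sketch, treating Total Score divided by the 401 samples as the accuracy: the absolute delta is the gain in percentage points, and the relative error reduction is that gain divided by the baseline's error rate. Computed from the raw counts, the Gemini-2.5-flash row gives +20.70 points and 67.48%; the table's +20.69 and 67.46% match a calculation from the already-rounded percentages instead, which the second function reproduces. The Gemini-3.1-pro row comes out the same either way. The function names are illustrative and not part of the benchmark code.

```javascript
// Recomputes the LoCoMo-Plus summary columns (illustrative; not part of the benchmark code).
const round2 = (x) => Math.round(x * 100) / 100;

// From raw judge totals: accuracy = total score / number of samples.
function summarizeFromTotals(baselineTotal, memoryTotal, samples) {
  const base = baselineTotal / samples;
  const mem = memoryTotal / samples;
  return {
    baselinePct: round2(base * 100),
    memoryPct: round2(mem * 100),
    deltaPp: round2((mem - base) * 100), // absolute gain in percentage points
    relErrReductionPct: round2(((mem - base) / (1 - base)) * 100), // share of baseline errors removed
  };
}

// From percentages that were already rounded to two decimals (as printed in the table).
function summarizeFromRoundedPct(baselinePct, memoryPct) {
  const delta = memoryPct - baselinePct;
  return {
    deltaPp: round2(delta),
    relErrReductionPct: round2((delta / (100 - baselinePct)) * 100),
  };
}

console.log(summarizeFromTotals(278, 361, 401)); // Gemini-2.5-flash rows, raw counts
console.log(summarizeFromRoundedPct(69.33, 90.02)); // same rows, rounded inputs
console.log(summarizeFromTotals(272, 382, 401)); // Gemini-3.1-pro rows, raw counts
```

Relative error reduction is the more informative column near the ceiling: the 3.1-pro run removes about 85% of the baseline's misses even though its absolute gain is only about 27 points.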
package/dist/server.js
CHANGED
```diff
@@ -77,6 +77,7 @@ import { getSettingSync, initConfigStorage } from "./storage/configStorage.js";
 import { sanitizeMcpOutput } from "./utils/sanitizer.js";
 import { getTracer, initTelemetry } from "./utils/telemetry.js";
 import { context as otelContext, trace, SpanStatusCode } from "@opentelemetry/api";
+import { ddInfo, ddError as ddLogError } from "./utils/ddLogger.js";
 // ─── Import Tool Definitions (schemas) and Handlers (implementations) ─────
 import { WEB_SEARCH_TOOL, BRAVE_WEB_SEARCH_CODE_MODE_TOOL, LOCAL_SEARCH_TOOL, BRAVE_LOCAL_SEARCH_CODE_MODE_TOOL, CODE_MODE_TRANSFORM_TOOL, BRAVE_ANSWERS_TOOL, RESEARCH_PAPER_ANALYSIS_TOOL, webSearchHandler, braveWebSearchCodeModeHandler, localSearchHandler, braveLocalSearchCodeModeHandler, codeModeTransformHandler, braveAnswersHandler, researchPaperAnalysisHandler, } from "./tools/index.js";
 // Session memory tools — only used if Supabase is configured
@@ -672,6 +673,7 @@ export function createServer() {
 // through await chains — including fire-and-forget workers launched
 // within the handler body (e.g. imageCaptioner, embeddings backfill).
 return otelContext.with(trace.setSpan(otelContext.active(), rootSpan), async () => {
+const _ddStart = Date.now();
 try {
 if (!args) {
 throw new Error("No arguments provided");
```
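The context comment in the second hunk relies on the active span following the handler through `await` chains and into work that is never awaited. In Node.js, OpenTelemetry's `AsyncLocalStorageContextManager` (package `@opentelemetry/context-async-hooks`) provides this on top of `AsyncLocalStorage`; whether this server registers that manager is not visible in the diff. The sketch below uses `AsyncLocalStorage` directly to show the behaviour: a background task started inside `run()` still sees the span it was launched under, even when two tool calls overlap. The span labels and function names are illustrative only.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

// Stand-in for OpenTelemetry's active context: holds a span label per async call chain.
const activeSpan = new AsyncLocalStorage();

// Simulates a fire-and-forget worker (e.g. an embeddings backfill) that finishes later.
function startBackgroundWorker(seen) {
  return new Promise((resolve) => setTimeout(resolve, 5)).then(() => {
    // Runs in the context captured when the worker was scheduled.
    seen.push(`worker:${activeSpan.getStore()}`);
  });
}

// Rough analogue of otelContext.with(trace.setSpan(...), async () => { ... }).
function handleToolCall(name, seen) {
  return activeSpan.run(`span-${name}`, async () => {
    const worker = startBackgroundWorker(seen); // the real server would not await this
    await Promise.resolve(); // an ordinary await chain
    seen.push(`handler:${activeSpan.getStore()}`);
    return worker; // returned only so this demo can wait for it
  });
}

const seen = [];
Promise.all([handleToolCall("a", seen), handleToolCall("b", seen)]).then(() => {
  console.log(seen.sort()); // each entry carries its own call's span label
  console.log(activeSpan.getStore()); // outside run(): undefined
});
```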
```diff
@@ -945,6 +947,7 @@ export function createServer() {
 };
 }
 rootSpan.setStatus({ code: SpanStatusCode.OK });
+ddInfo("mcp.tool.success", { tool: name, project: args?.project, durationMs: Date.now() - _ddStart });
 // ═══ v5.3: Hivemind Watchdog Alert Injection (Telepathy) ═══
 // CRITICAL: Append alerts DIRECTLY to tool response content
 // so the LLM actually reads them. sendLoggingMessage goes to
@@ -985,6 +988,7 @@ export function createServer() {
 }
 catch (error) {
 console.error(`Error in tool handler: ${error instanceof Error ? error.message : String(error)}`);
+ddLogError("mcp.tool.error", error instanceof Error ? error : undefined, { tool: name, project: args?.project, durationMs: Date.now() - _ddStart });
 rootSpan.recordException(error instanceof Error ? error : new Error(String(error)));
 rootSpan.setStatus({
 code: SpanStatusCode.ERROR,
```
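Taken together, the server.js hunks time each tool call from `_ddStart` and emit one structured event on either exit path. A minimal sketch of the same pattern as a standalone wrapper follows; `withToolTelemetry`, `logEvent`, and the tool name `example_tool` are hypothetical, and the JSON-lines output is an assumption since `./utils/ddLogger.js` is not part of this diff. In the diff, `ddLogError` receives `undefined` as its error argument when the thrown value is not an `Error` instance; the sketch instead wraps such values with `new Error(String(error))`, which is what the span's `recordException` call already does.

```javascript
// Hypothetical stand-in for ./utils/ddLogger.js: one JSON object per line on stdout.
function logEvent(level, event, fields) {
  const record = { timestamp: new Date().toISOString(), level, event, ...fields };
  process.stdout.write(JSON.stringify(record) + "\n");
}

// Times a tool handler and emits mcp.tool.success or mcp.tool.error, mirroring the diff.
async function withToolTelemetry(name, args, handler, log = logEvent) {
  const start = Date.now(); // plays the role of _ddStart
  try {
    const result = await handler(args);
    log("info", "mcp.tool.success", {
      tool: name,
      project: args?.project, // undefined when the call had no arguments
      durationMs: Date.now() - start,
    });
    return result;
  } catch (error) {
    // Normalise non-Error throwables so the event always carries a message and stack.
    const err = error instanceof Error ? error : new Error(String(error));
    log("error", "mcp.tool.error", {
      tool: name,
      project: args?.project,
      durationMs: Date.now() - start,
      message: err.message,
      stack: err.stack,
    });
    throw error; // rethrow the original value so logging does not change behaviour
  }
}

// Usage: a successful call writes one JSON line and resolves to the handler's result.
withToolTelemetry("example_tool", { project: "prism-mcp" }, async () => ({ ok: true }));
```

Injecting the `log` function keeps the wrapper testable without a network sink, and rethrowing the original value preserves whatever error handling the caller already has.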
package/dist/utils/llm/adapters/gemini.js
CHANGED
```diff
@@ -37,7 +37,7 @@ import { debugLog } from "../../logger.js";
 // ─── Model Constants ──────────────────────────────────────────────────────────
 // Defined as constants (not hardcoded strings) so external reviewers can see
 // all model choices at a glance, and future changes only need one edit.
-const TEXT_MODEL = "gemini-2.5-flash"; // chat/instruction-following model
+const TEXT_MODEL = process.env.PRISM_TEXT_MODEL || "gemini-2.5-flash"; // chat/instruction-following model
 const EMBEDDING_MODEL = "gemini-embedding-001"; // vector embedding model (MRL-enabled)
 const EMBEDDING_DIMS = 768; // fixed output dims — must match DB schema
 // ─── Embedding Truncation Constants ──────────────────────────────────────────
```
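The new `TEXT_MODEL` line reads `PRISM_TEXT_MODEL` once, when the adapter module is first evaluated, so the variable has to be in the server's environment before it starts; changing it later in a running process does not update the constant. Because the fallback uses `||` rather than `??`, an empty value also falls back to `gemini-2.5-flash`. Only the chat model is affected: `EMBEDDING_MODEL` and `EMBEDDING_DIMS` stay fixed, so embeddings keep matching the 768-dimension DB schema noted in the comment. The sketch isolates that expression; both function names are illustrative.

```javascript
const DEFAULT_TEXT_MODEL = "gemini-2.5-flash";

// Same expression as the new TEXT_MODEL line, parameterised on an env object for testing.
function resolveTextModel(env) {
  return env.PRISM_TEXT_MODEL || DEFAULT_TEXT_MODEL; // "" and undefined both fall back
}

// Contrast: with ?? only a missing (undefined/null) variable would fall back.
function resolveTextModelNullish(env) {
  return env.PRISM_TEXT_MODEL ?? DEFAULT_TEXT_MODEL; // "" would be passed through as-is
}

console.log(resolveTextModel({})); // gemini-2.5-flash
console.log(resolveTextModel({ PRISM_TEXT_MODEL: "gemini-3.1-pro-preview" })); // gemini-3.1-pro-preview
console.log(resolveTextModel({ PRISM_TEXT_MODEL: "" })); // gemini-2.5-flash
console.log(resolveTextModelNullish({ PRISM_TEXT_MODEL: "" }) === ""); // true
```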
package/package.json
CHANGED
```diff
@@ -1,6 +1,6 @@
 {
 "name": "prism-mcp-server",
-"version": "16.
+"version": "16.2.0",
 "mcpName": "io.github.dcostenco/prism-coder",
 "description": "Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 54 Agent Skills, Zero-Search HDC/HRR retrieval, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder:7b / 14b open-weights LLM fleet.",
 "module": "index.ts",
```