npm - prism-mcp-server - Versions diffs - 15.7.0 → 15.7.2 - Mend

prism-mcp-server 15.7.0 → 15.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md +58 -22
package/dist/utils/groundingVerifier.js +4 -4
package/dist/utils/modelPicker.js +23 -23
package/package.json +1 -1

package/README.md CHANGED

@@ -58,10 +58,11 @@ Free tier runs entirely on your machine — SQLite, local embedding model, no AP
 Install in one command — no config, no keys, no vendor agreements:
 ```bash
-ollama pull dcostenco/prism-coder:1b7   # 2.2 GB · ~1.6s · any machine
-ollama pull dcostenco/prism-coder:8b    # 4.7 GB · ~0.8s · Mac M1+ / iPhone 8GB
-ollama pull dcostenco/prism-coder:14b   # 8.4 GB · ~1.1s · Mac M2+ / iPad Pro 16GB
-ollama pull dcostenco/prism-coder:32b   # 16 GB  · ~0.8s · Mac M2 Ultra+ (30B-A3B MoE)
+ollama pull dcostenco/prism-coder:14b   # 9 GB  · default router · Mac M2+ / iPad Pro
+ollama pull dcostenco/prism-coder:4b    # 2.5 GB · verifier · iPhone 15/16 Pro
+ollama pull dcostenco/prism-coder:1b7   # 2.2 GB · ultra-low RAM / Apple Watch
+ollama pull dcostenco/prism-coder:32b   # 19 GB  · complex tasks · Mac M2 Ultra+
+ollama pull dcostenco/prism-coder:8b    # 4.7 GB · balanced · iPhone/iPad 8GB
 ```
 Prism MCP detects both the namespaced (`dcostenco/prism-coder:14b`) and bare (`prism-coder:14b`) Ollama tag forms automatically — nothing else to configure. If you want the bare tags as aliases for direct `ollama run prism-coder:14b` use, run:
@@ -73,33 +74,68 @@ prism register-models --dry-run # preview what would be aliased
 ### Cascade architecture
-Two cascades operate independently depending on the deployment context:
+Three-tier local cascade with cloud fallback:
-**Desktop / server cascade** (quality-first, used in Prism MCP + Synalux portal):
 ```
-prism-coder:14b ─── correct? ──YES──▶  serve  (99% of traffic, ~1.1s)
-  │ NO
-prism-coder:32b ─── correct? ──YES──▶  serve  (~1% of traffic, ~0.8s)
-  │ NO
-Claude Opus 4.7 ──────────────────────▶  serve  (0% in practice, cloud)
+Query arrives
+  │
+  ▼
+prism-coder:14b ── routes (100% eval_300) ──▶  serve  (~3s, 9GB, FREE)
+  │                                              │
+  │                                    knowledge_search (RAG context)
+  │                                              │
+  ▼                                              ▼
+prism-coder:4b ── verifies claims ──────────▶  grounded response
+  │                 (2.5GB, <1s)
+  │
+  ▼  (complex tasks only, explicit ceiling="32b")
+prism-coder:32b ── deep reasoning ──────────▶  serve  (~8s, 19GB, FREE)
+  │
+  ▼  (cloud fallback when local insufficient)
+Claude Sonnet 4 → Claude Opus 4.7 ─────────▶  serve  (cloud, ~$0.01/req)
 ```
-**Mobile / offline cascade** (availability-first, used in Prism AAC iOS):
+| Tier | Model | Role | RAM | Latency | Cost |
+|------|-------|------|-----|---------|------|
+| **Default** | prism-coder:14b | Router + general inference | 9 GB | ~3s | $0 |
+| **Verifier** | prism-coder:4b | Grounding claims check | 2.5 GB | <1s | $0 |
+| **Complex** | prism-coder:32b | Deep reasoning (on-demand) | 19 GB | ~8s | $0 |
+| **Cloud** | Sonnet → Opus | Fallback for max quality | — | ~5-10s | ~$0.01 |
+**Mobile / offline cascade** (Prism AAC iOS):
 ```
-prism-coder:14b (~1.1s) — iPad Pro 16GB  →  prism-coder:8b (~0.8s) — iPhone/iPad 8GB
-  →  prism-coder:1.7b (~1.6s) — any device, always fits
+prism-coder:14b (iPad Pro 16GB) → prism-coder:4b (iPhone 8GB)
+  → prism-coder:1.7b (any device, always fits)
 ```
-**Code generation cascade** (used in Prism Coder IDE + Agent Mode):
-```
-prism-ide:14b ─── quality OK? ──YES──▶  serve  (~1.1s, 22/22 TypeScript eval)
-  │ NO (complex / multi-file)
-prism-ide:32b ─── quality OK? ──YES──▶  serve  (~0.8s MoE, deep reasoning)
-  │ NO
-Claude Sonnet 4 ──────────────────────▶  serve  (cloud fallback)
+### Knowledge ingestion — teach Prism your codebase
+Your code knowledge lives in the knowledge graph, not in model weights. Routing stays at 100%.
+```bash
+bash scripts/knowledge-ingest/setup.sh   # one-time setup
+# Then every git commit auto-indexes changed files into the knowledge graph
 ```
-The routing cascade validates each response against the 6 known tool names and escalates on empty, truncated, or hallucinated tool calls. The code generation cascade escalates on incomplete or syntactically invalid output.
+Three entry points:
+- **MCP tool**: `knowledge_ingest` — AI says "learn this code"
+- **GitHub webhook**: `POST /api/github/webhook` — auto on push
+- **REST API**: `POST /api/v1/prism/ingest` — open interface
+See [KNOWLEDGE_INGESTION.md](docs/KNOWLEDGE_INGESTION.md) for full setup guide.
+### Cost comparison
+Benchmark: 19 queries (routing + code knowledge + clinical), May 2026:
+| Architecture | Routing | Code Knowledge | Clinical | Annual Cost (1K/day) |
+|---|---|---|---|---|
+| **Prism cascade** (14b→RAG→Sonnet) | 100% (local) | RAG-powered | Sonnet | **~$330/yr** |
+| Claude Opus for everything | ~30% (no tools) | Training data | Opus | ~$10,600/yr |
+**84% cost savings.** Routing is free and 100% accurate. Cloud only for the 20% of queries that need deep reasoning.
+The routing cascade validates each response against the known tool names and escalates on empty, truncated, or hallucinated tool calls.
 **Routing accuracy** ([102-case Prism eval](tests/benchmarks/prism-routing-100/README.md), v36/v7 system prompt, 3-seed mean, May 2026):
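
The README hunk above says the routing cascade checks each response against the known tool names and escalates on empty, truncated, or hallucinated tool calls. A minimal sketch of that check follows. It is not taken from the package: the tool list contains only the two names that appear in this diff (`knowledge_search`, `knowledge_ingest`), the JSON response shape with a `name` field is an assumption, and `escalationReason` and `runCascade` are hypothetical helpers. Tiers are modeled as synchronous stubs for brevity; a real model call would be asynchronous.

```javascript
// Illustrative tool list: only these two names appear in the README diff.
// The package's real set of tool names is not shown here.
const KNOWN_TOOLS = new Set(["knowledge_search", "knowledge_ingest"]);

// Returns null when the response looks usable, or a reason string when the
// cascade should escalate to the next tier.
function escalationReason(rawResponse) {
  if (typeof rawResponse !== "string" || rawResponse.trim() === "") {
    return "empty";
  }
  let call;
  try {
    call = JSON.parse(rawResponse); // assumed shape: {"name": ..., "arguments": ...}
  } catch {
    return "truncated"; // unparseable JSON is treated as a cut-off response
  }
  if (call === null || typeof call !== "object" || typeof call.name !== "string") {
    return "malformed"; // not one of the README's three cases; added for completeness
  }
  if (!KNOWN_TOOLS.has(call.name)) {
    return "hallucinated"; // the model named a tool that does not exist
  }
  return null;
}

// Walks the tiers in order and serves the first response that passes the check.
// Each tier is a hypothetical { name, generate(query) } object.
function runCascade(tiers, query) {
  const escalations = [];
  for (const tier of tiers) {
    const reason = escalationReason(tier.generate(query));
    if (reason === null) {
      return { servedBy: tier.name, escalations };
    }
    escalations.push({ tier: tier.name, reason });
  }
  return { servedBy: null, escalations }; // every tier failed the check
}
```

For example, a first tier that returns a cut-off string and a second tier that returns `{"name":"knowledge_search","arguments":{}}` would be served by the second tier, with one recorded escalation whose reason is `"truncated"`.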

package/dist/utils/groundingVerifier.js CHANGED

@@ -9,9 +9,9 @@
  * stateless MCP), pointed at free-form generation instead of tool-call
  * responses.
  *
- * Cascade role: prism-coder:1b7 is the default verifier on every
- * device (server, iPad). Larger tiers (8B/14B/32B) draft; 1b7 verifies.
- * Different model from the drafter — satisfies the Patronus rule.
+ * Cascade role: prism-coder:4b is the default verifier (fast, 2.5GB).
+ * 14b drafts; 4b verifies. Different model = Patronus rule satisfied.
+ * Falls back to 1b7 on devices with <4GB free RAM.
  *
  * Failure modes:
  *   - Verifier model unreachable / timeout → fail-closed refusal
@@ -93,7 +93,7 @@ function refusalText(action, failedClaim) {
     }
 }
 export async function verifyGrounding(opts) {
-    const verifierModel = opts.verifierModel ?? "prism-coder:1b7";
+    const verifierModel = opts.verifierModel ?? "prism-coder:4b";
     const timeoutMs = opts.timeoutMs ?? 2000;
     const ollamaUrl = opts.ollamaUrl ?? PRISM_LOCAL_LLM_URL;
     const fetchImpl = opts.fetchImpl ?? fetch;
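
The header comment in this file now promises two behaviors that the visible hunk only partly shows: a fallback from `prism-coder:4b` to `prism-coder:1b7` below 4 GB of free RAM, and a fail-closed refusal when the verifier is unreachable or times out. The sketch below illustrates both under stated assumptions. `chooseVerifierModel` and `verifyFailClosed` are hypothetical helpers, the `freeBytes` option is invented for illustration, and the returned object shape is not the package's. Only the `??` defaulting and the 2000 ms timeout default come from the diff.

```javascript
const GB = 1024 ** 3; // binary gigabytes, as in modelPicker.js

// Hypothetical helper: an explicit override wins, otherwise pick by free RAM.
// `??` only falls through on null/undefined, matching the diffed default.
function chooseVerifierModel(opts = {}) {
  if ((opts.verifierModel ?? null) !== null) {
    return opts.verifierModel;
  }
  if (Number.isFinite(opts.freeBytes) && opts.freeBytes < 4 * GB) {
    return "prism-coder:1b7"; // fallback described in the header comment
  }
  return "prism-coder:4b"; // new default in 15.7.2
}

// Hypothetical wrapper: any error or timeout is reported as "not grounded",
// so a broken verifier can never approve a claim by accident.
async function verifyFailClosed(callVerifier, timeoutMs = 2000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("verifier timeout")), timeoutMs);
  });
  try {
    const verdict = await Promise.race([callVerifier(), timeout]);
    const grounded = verdict === true; // anything but an explicit true is a refusal
    return { grounded, reason: grounded ? null : "claim not grounded" };
  } catch (err) {
    return { grounded: false, reason: String(err && err.message) };
  } finally {
    clearTimeout(timer); // avoid keeping the process alive after a fast verdict
  }
}
```

Here `callVerifier` stands in for whatever function actually queries the model; passing one that rejects, or that takes longer than `timeoutMs`, yields `{ grounded: false, ... }` rather than an exception.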

package/dist/utils/modelPicker.js CHANGED

@@ -1,25 +1,25 @@
 /**
  * RAM-Gated Local Model Picker
  * ─────────────────────────────────────────────────────────────
- * Pure function. Given free RAM in bytes, return the largest
- * prism-coder tag whose Q4_K_M weights + KV-cache headroom fit.
+ * Cascade: 14b (default) → 4b (verifier) → 32b (complex only).
  *
- * Thresholds reflect observed footprint on Apple Silicon with
- * 8K–32K context windows (Q4_K_M weights + KV cache + activations
- * + OS headroom). They are intentionally conservative so picking
- * a tier never OOMs the machine.
+ * The default ceiling is "14b" — NOT "32b". This means:
+ *   - 14b is the primary model for routing + general inference
+ *   - 4b is used as the grounding verifier (fast, small)
+ *   - 32b is only loaded when caller explicitly passes ceiling="32b"
+ *     or when the task requires maximum quality (complex code gen, etc.)
  *
- *   tag                 weights   need free   ctx
- *   prism-coder:32b     ~19 GB    ≥ 24 GB     32K
- *   prism-coder:14b     ~ 9 GB    ≥ 12 GB     32K
- *   prism-coder:4b      ~ 2.5 GB  ≥  4 GB      8K
- *   prism-coder:8b      ~ 5 GB    ≥  7 GB     32K
- *   prism-coder:1b7     ~ 2 GB    ≥  3 GB      8K
+ * This saves 10GB+ RAM on most devices and keeps response times fast.
+ * The 14b achieves 100% on eval_300 — same as 32b.
  *
- * Below 3 GB free → no local pick (caller must use cloud).
+ *   tag                 weights   need free   ctx     role
+ *   prism-coder:32b     ~19 GB    ≥ 24 GB     32K    complex (on-demand)
+ *   prism-coder:14b     ~ 9 GB    ≥ 12 GB     32K    default router
+ *   prism-coder:8b      ~ 5 GB    ≥  7 GB     32K    fallback
+ *   prism-coder:4b      ~ 2.5 GB  ≥  4 GB      8K    verifier + mobile
+ *   prism-coder:1b7     ~ 2 GB    ≥  3 GB      8K    watch + ultra-low RAM
  *
- * Note: thresholds use BINARY GB (1024^3) — matches what `os.freemem()`
- * reports on macOS/Linux.
+ * Below 3 GB free → no local pick (caller must use cloud).
  */
 const GB = 1024 ** 3;
 /**
@@ -44,21 +44,21 @@ export const MODEL_TIERS = [
 function tagMatches(installed, tierTag) {
     return installed === tierTag || installed.endsWith(`/${tierTag}`);
 }
+/** Default ceiling: 14b. Pass ceiling="32b" explicitly for max quality. */
+export const DEFAULT_CEILING = "14b";
 /**
- * Pick the largest viable tier for the given free RAM.
- * Returns null when no tier fits (caller should go cloud-only).
+ * Pick the best viable tier for the given free RAM.
+ * Default ceiling is 14b — use ceiling="32b" only for complex tasks.
  *
  * @param freeBytes  Result of os.freemem() — binary bytes
- * @param ceiling    Optional cap (e.g. "14b" to forbid 32B even if RAM allows)
- * @param available  Optional whitelist — only consider tags in this set. Accepts
- *                   bare (`prism-coder:32b`) or namespaced (`dcostenco/prism-coder:32b`).
+ * @param ceiling    Cap tier. Default "14b". Pass "32b" for complex tasks.
+ * @param available  Optional whitelist of installed Ollama tags.
  */
 export function pickLocalModel(freeBytes, ceiling, available) {
     if (!Number.isFinite(freeBytes) || freeBytes <= 0)
         return null;
-    const ceilingIdx = ceiling
-        ? MODEL_TIERS.findIndex(t => t.tag.endsWith(ceiling) || t.tag === ceiling)
-        : 0;
+    const effectiveCeiling = ceiling || DEFAULT_CEILING;
+    const ceilingIdx = MODEL_TIERS.findIndex(t => t.tag.endsWith(effectiveCeiling) || t.tag === effectiveCeiling);
     const startIdx = ceilingIdx >= 0 ? ceilingIdx : 0;
     for (let i = startIdx; i < MODEL_TIERS.length; i++) {
         const tier = MODEL_TIERS[i];
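
To see what the new default ceiling changes in practice, the picker can be reconstructed as a small standalone sketch. The `pickLocalModel` prologue and `tagMatches` are copied from the hunk above. Everything else is an assumption: the `MODEL_TIERS` entries and the `minFreeBytes` field are rebuilt from the header comment's table, the loop body is a guess at the part the hunk cuts off, and `available` is taken to be an array of installed tags.

```javascript
const GB = 1024 ** 3;

// Assumed tier table, largest first, rebuilt from the header comment.
// The real MODEL_TIERS entries and field names are not shown in the diff.
const MODEL_TIERS = [
  { tag: "prism-coder:32b", minFreeBytes: 24 * GB },
  { tag: "prism-coder:14b", minFreeBytes: 12 * GB },
  { tag: "prism-coder:8b", minFreeBytes: 7 * GB },
  { tag: "prism-coder:4b", minFreeBytes: 4 * GB },
  { tag: "prism-coder:1b7", minFreeBytes: 3 * GB },
];

const DEFAULT_CEILING = "14b";

// Accepts bare ("prism-coder:14b") and namespaced ("dcostenco/prism-coder:14b") tags.
function tagMatches(installed, tierTag) {
  return installed === tierTag || installed.endsWith(`/${tierTag}`);
}

function pickLocalModel(freeBytes, ceiling, available) {
  if (!Number.isFinite(freeBytes) || freeBytes <= 0) return null;
  const effectiveCeiling = ceiling || DEFAULT_CEILING;
  const ceilingIdx = MODEL_TIERS.findIndex(
    (t) => t.tag.endsWith(effectiveCeiling) || t.tag === effectiveCeiling
  );
  const startIdx = ceilingIdx >= 0 ? ceilingIdx : 0;
  for (let i = startIdx; i < MODEL_TIERS.length; i++) {
    const tier = MODEL_TIERS[i];
    // Assumed loop body: skip tiers that are not installed, then check RAM.
    if (available && !available.some((tag) => tagMatches(tag, tier.tag))) continue;
    if (freeBytes >= tier.minFreeBytes) return tier.tag;
  }
  return null; // nothing fits: caller must use cloud
}

console.log(pickLocalModel(32 * GB)); // capped at the 14b tier by default
console.log(pickLocalModel(32 * GB, "32b")); // explicit opt-in to the 32b tier
console.log(pickLocalModel(5 * GB)); // falls through to the 4b tier
```

Two edge cases follow from the lookup as written, at least under the tier order assumed here. A ceiling of `"4b"` resolves to the 14b tier, because `"prism-coder:14b".endsWith("4b")` is true and that entry comes first. An unrecognized ceiling string makes `findIndex` return -1, so the scan starts at index 0 and the 32b tier becomes eligible again despite the 14b default.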

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "prism-mcp-server",
-  "version": "15.7.0",
+  "version": "15.7.2",
   "mcpName": "io.github.dcostenco/prism-coder",
   "description": "Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 54 Agent Skills, Zero-Search HDC/HRR retrieval, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder:7b / 14b open-weights LLM fleet.",
   "module": "index.ts",