npm - prism-mcp-server - Versions diffs - 15.4.0 → 15.5.1 - Mend

prism-mcp-server 15.4.0 → 15.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/README.md +82 -57
package/dist/cli.js +65 -0
package/dist/storage/index.js +4 -0
package/dist/tools/prismInferHandler.js +9 -4
package/dist/utils/modelPicker.js +39 -3
package/package.json +1 -1

package/README.md CHANGED

@@ -4,7 +4,7 @@
 **Persistent memory + tool-calling intelligence for AI agents.** *(formerly Prism MCP)*
-A Model Context Protocol server that gives Claude, Cursor, and other AI tools a Mind Palace — long-term memory that survives across sessions, with semantic search, cognitive routing, a visual dashboard, and the `prism-coder:1b7` / `prism-coder:14b` / `prism-coder:32b` LLM fleet for offline tool-calling. **[→ prism-mcp.com](https://prism-mcp.com)**
+A Model Context Protocol server that gives Claude, Cursor, and other AI tools a Mind Palace — long-term memory that survives across sessions, with semantic search, cognitive routing, a visual dashboard, and the `prism-coder:1b7` / `prism-coder:8b` / `prism-coder:14b` / `prism-coder:32b` LLM fleet for offline tool-calling. **[→ prism-mcp.com](https://prism-mcp.com)**
 [![npm](https://img.shields.io/npm/v/prism-mcp-server?color=cb0000&label=npm%20%E2%80%94%20prism-mcp-server)](https://www.npmjs.com/package/prism-mcp-server)
 [![VS Marketplace](https://img.shields.io/visual-studio-marketplace/v/synalux-ai.synalux?label=VS%20Code&color=007ACC)](https://marketplace.visualstudio.com/items?itemName=synalux-ai.synalux)
@@ -13,7 +13,7 @@ A Model Context Protocol server that gives Claude, Cursor, and other AI tools a
 [![Smithery](https://img.shields.io/badge/Smithery-listed-6B4FBB)](https://smithery.ai/server/@dcostenco/prism-mcp)
 [![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](LICENSE)
-> **Renamed in v14.0.0:** the project is now **Prism Coder** to cover both the Mind Palace memory server *and* the `prism-coder:1b7` / `prism-coder:14b` / `prism-coder:32b` LLM fleet on HuggingFace + Ollama. The npm package stays `prism-mcp-server` so existing install URLs and `mcp.json` entries keep working — the `prism-coder` binary has been the canonical entry point since v12.
+> **Renamed in v14.0.0:** the project is now **Prism Coder** to cover both the Mind Palace memory server *and* the `prism-coder:1b7` / `prism-coder:8b` / `prism-coder:14b` / `prism-coder:32b` LLM fleet on HuggingFace + Ollama. The npm package stays `prism-mcp-server` so existing install URLs and `mcp.json` entries keep working — the `prism-coder` binary has been the canonical entry point since v12.
 ---
@@ -61,19 +61,27 @@ Install in one command — no config, no keys, no vendor agreements:
 ollama pull dcostenco/prism-coder:1b7   # 2.2 GB · ~1.6s · any machine
 ollama pull dcostenco/prism-coder:8b    # 4.7 GB · ~0.8s · Mac M1+ / iPhone 8GB
 ollama pull dcostenco/prism-coder:14b   # 8.4 GB · ~1.1s · Mac M2+ / iPad Pro 16GB
-ollama pull dcostenco/prism-coder:32b   # 19 GB  · ~2.5s · Mac M2 Ultra+
+ollama pull dcostenco/prism-coder:32b   # 16 GB  · ~0.8s · Mac M2 Ultra+ (30B-A3B MoE)
 ```
+Prism MCP detects both the namespaced (`dcostenco/prism-coder:14b`) and bare (`prism-coder:14b`) Ollama tag forms automatically — nothing else to configure. If you want the bare tags as aliases for direct `ollama run prism-coder:14b` use, run:
+```bash
+prism register-models           # aliases */prism-coder:* → prism-coder:* via `ollama cp`
+prism register-models --dry-run # preview what would be aliased
+```
 ### Cascade architecture
 Two cascades operate independently depending on the deployment context:
 **Desktop / server cascade** (quality-first, used in Prism MCP + Synalux portal):
 ```
-prism-coder:14b ─── correct? ──YES──▶  serve  (97% of traffic, ~1.1s)
+prism-coder:14b ─── correct? ──YES──▶  serve  (99% of traffic, ~1.1s)
   │ NO
-prism-coder:32b ─── correct? ──YES──▶  serve  (2% of traffic, ~2.5s)
+prism-coder:32b ─── correct? ──YES──▶  serve  (~1% of traffic, ~0.8s)
   │ NO
-Claude Opus 4.7 ──────────────────────▶  serve  (1% of traffic, cloud)
+Claude Opus 4.7 ──────────────────────▶  serve  (0% in practice, cloud)
 ```
 **Mobile / offline cascade** (availability-first, used in Prism AAC iOS):
@@ -82,27 +90,36 @@ prism-coder:14b (~1.1s) — iPad Pro 16GB  →  prism-coder:8b (~0.8s) — iPhon
   →  prism-coder:1.7b (~1.6s) — any device, always fits
 ```
-The cascade validates each response against the 6 known tool names and escalates on empty, truncated, or hallucinated tool calls.
+**Code generation cascade** (used in Prism Coder IDE + Agent Mode):
+```
+prism-ide:14b ─── quality OK? ──YES──▶  serve  (~1.1s, 22/22 TypeScript eval)
+  │ NO (complex / multi-file)
+prism-ide:32b ─── quality OK? ──YES──▶  serve  (~0.8s MoE, deep reasoning)
+  │ NO
+Claude Sonnet 4 ──────────────────────▶  serve  (cloud fallback)
+```
+The routing cascade validates each response against the 6 known tool names and escalates on empty, truncated, or hallucinated tool calls. The code generation cascade escalates on incomplete or syntactically invalid output.
-**Routing accuracy** ([102-case Prism eval](tests/benchmarks/prism-routing-100/README.md), v25 system prompt, 3-seed mean, May 2026):
+**Routing accuracy** ([102-case Prism eval](tests/benchmarks/prism-routing-100/README.md), v36/v7 system prompt, 3-seed mean, May 2026):
 | Model | Accuracy | Cost/req | Latency | Runs on | AAC | Edge cases |
 |---|---|---|---|---|---|---|
 | Claude Sonnet 4 | **99%** | ~$0.01 | 3.2s | Cloud | 100% | 83% |
-| **prism-coder:32b** v33 | **99.0%** | **$0** | 2.5s | Mac 48GB+ | **100%** | **100%** |
-| **prism-coder:8b** v35 | **98.0%** | **$0** | **0.8s** | iPhone/iPad 8GB | **100%** | **100%** |
-| **prism-coder:14b** v33 | **97.1%** | **$0** | **1.1s** | Mac 24GB+ / iPad Pro 16GB | **100%** | **100%** |
-| Claude Opus 4.7 | **97.1%** | ~$0.05 | 3.0s | Cloud | 100% | 83% |
-| **prism-coder:1.7b** v41 | **96.1%** | **$0** | 1.6s | Any device | **100%** | 83% |
-| **14B→32B cascade** | **99.0%** | **~$0** | ~1.1s¹ | Mac 24GB+ | **100%** | **100%** |
+| **prism-coder:32b** v7 | **100.0%** | **$0** | 0.8s | Mac 24GB+ (MoE) | **100%** | **100%** |
+| **prism-coder:8b** v36 | **100.0%** | **$0** | **0.8s** | iPhone/iPad 8GB | **100%** | **100%** |
+| **prism-coder:14b** v36 | **100.0%** | **$0** | **1.1s** | Mac 24GB+ / iPad Pro 16GB | **100%** | **100%** |
+| Claude Opus 4.7 | **98.3%** | ~$0.05 | 3.0s | Cloud | 100% | 83% |
+| **prism-coder:1.7b** v42 | **100.0%** | **$0** | 1.6s | Any device | **100%** | **100%** |
+| **14B→32B cascade** | **100.0%** | **~$0** | ~1.1s¹ | Mac 24GB+ | **100%** | **100%** |
-¹ 97% of requests served by 14B at 1.1s; 32B only for the 2% 14B misses; Opus for the 1% both miss.
+¹ ~99% of requests served by 14B at 1.1s; 32B for the ~1% 14B misses.
-**Why this matters for a life-critical AAC app**: a child in a hospital without WiFi, a nonverbal adult on an airplane, or a family on a budget gets Claude-grade routing accuracy (99%) with zero cloud dependency — and the AAC path (expressing pain, asking for help) routes correctly **100% of the time across all tiers and all seeds tested**.
+**Why this matters for a life-critical AAC app**: a child in a hospital without WiFi, a nonverbal adult on an airplane, or a family on a budget gets Claude-grade routing accuracy with zero cloud dependency — and the AAC path (expressing pain, asking for help) routes correctly **100% of the time across all tiers and all seeds tested**.
 **What it does NOT mean**: these scores measure routing precision on a narrow 6-tool taxonomy, not general intelligence. Claude outperforms these models on everything outside this task. The value is **offline reliability at zero cost**, not replacing Claude.
-> **The prompt engineering breakthrough**: Q4_K_M quantized models confuse semantically similar tool names when routing rules use plain keyword lists. Two structural fixes eliminated all confusion: (1) replacing `-> plain text` with `-> respond directly (no tool)`, and (2) adding category labels (`CONVERSATION RECALL:` / `SAVED KNOWLEDGE:`) as semantic anchors stronger than keyword matching. Combined effect: 14B went from 87% → 97% with zero retraining, zero cost.
+> **The prompt engineering breakthrough**: Q4_K_M quantized models confuse semantically similar tool names when routing rules use plain keyword lists. Two structural fixes eliminated all confusion: (1) replacing `-> plain text` with `-> respond directly (no tool)`, and (2) adding category labels (`CONVERSATION RECALL:` / `SAVED KNOWLEDGE:`) as semantic anchors stronger than keyword matching. Combined effect: 14B went from 87% → 100% on the 102-case Prism eval (v36/v7 system prompt, 3-seed mean).
 ### ⚡ Zero-search retrieval
 Holographic Reduced Representations (HRR) for instant similarity lookups without an index. ~5ms over 100K memories.
@@ -180,30 +197,30 @@ Prism Coder inference cascades through fine-tuned models first, with Claude as a
 | Model | Ollama tag | Where | Tier | Latency |
 |---|---|---|---|---|
-| **prism-coder:1.7b** | `prism-coder:1b7-v19-q8` (published) | On-device (Mac/local) · iOS via local network | Free | ~50ms |
-| **prism-coder:14b** | `prism-coder:14b` (published v19) | Cloud (OpenRouter) A100 via Synalux | Standard+ | ~200ms |
-| **prism-coder:32b** | `prism-coder:32b` (published v19) | Cloud (OpenRouter) A100 80GB via Synalux | Pro/Enterprise | ~3–5s |
+| **prism-coder:1.7b** | `prism-coder:1b7` (v42) | On-device (Mac/local) · iOS via llama.cpp | Free | ~1.6s |
+| **prism-coder:8b** | `prism-coder:8b` (v36) | On-device iPhone/iPad 8GB+ · local Mac | Free | ~0.8s |
+| **prism-coder:14b** | `prism-coder:14b` (v36) | On-device Mac 24GB+ · iPad Pro · Cloud A100 | Standard+ | ~1.1s |
+| **prism-coder:32b** | `prism-coder:32b` (v7 MoE) | Cloud (OpenRouter) A100 80GB via Synalux | Pro/Enterprise | ~0.8s |
-Models use the Synalux SFT corpus (AAC + Prism MCP tool taxonomy + clinical workflows). **Internal quality gate: ≥ 90% on the Prism 100-case eval before production promotion.**
+Models use the Synalux SFT corpus (AAC + Prism MCP tool taxonomy + clinical workflows). **Internal quality gate: ≥ 90% on the Prism 102-case eval before production promotion.**
 > **Training note**: Base Qwen3 models are strong tool-routers out of the box. Heavy fine-tuning regresses tool-vs-plain-text decisions; light-touch polish recipes (small corpus, balanced tool/plain-text split) are the published path. Production adapter selection and retrain methodology are managed in the Synalux portal.
-**Per-category breakdown — [Prism 102-case eval](tests/benchmarks/prism-routing-100/README.md) (3-seed mean, v25 system prompt, May 2026):**
+**Per-category breakdown — [Prism 102-case eval](tests/benchmarks/prism-routing-100/README.md) (3-seed mean, v36/v7 system prompt, May 2026):**
 | Model | Overall | Load ctx | Save | Srch mem | Handoff | Compact | Know srch | AAC | Translate | No-tool | Info | Edge | Avg lat | Inv |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
-| **prism-coder:32b** v33 | **99.0%** | 100% | 100% | 92% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 2.5s | 0 |
-| **prism-coder:8b** v35 | **98.0%** | 100% | 100% | 83% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 0.8s | 0 |
-| **prism-coder:14b** v33 | **97.1%** | 100% | 100% | 92% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 1.1s | 0 |
-| **Claude Opus 4.7** | **97.1%** | 100% | 100% | 83% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 83% | 3.0s | 0 |
-| **prism-coder:1.7b** v41 | **96.1%** | 89% | 100% | 100% | 100% | 83% | 100% | 100% | 100% | 90% | 100% | 83% | 1.6s | 0 |
-| **14B→32B cascade** | **99.0%** | 100% | 100% | 92% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | ~1.1s | 0 |
-> **Methodology**: 102-case pool across 12 categories. Scores are 3-seed mean (seeds 2027/2028/2029, zero variance across all seeds). All fine-tuned models use the Qwen3 nothink template. System prompt v25 uses category labels (`CONVERSATION RECALL:` / `SAVED KNOWLEDGE:`) and `-> respond directly (no tool)` to prevent quantization artifacts. Full runner: [`tests/benchmarks/prism-routing-100/benchmark.py`](tests/benchmarks/prism-routing-100/benchmark.py) · Cascade runner: [`tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py`](tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py).
+| **prism-coder:32b** v7 | **100.0%** | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 0.8s | 0 |
+| **prism-coder:8b** v36 | **100.0%** | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 0.8s | 0 |
+| **prism-coder:14b** v36 | **100.0%** | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 1.1s | 0 |
+| **Claude Opus 4.7** | **98.3%** | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 83% | 3.0s | 0 |
+| **prism-coder:1.7b** v42 | **100.0%** | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 1.6s | 0 |
+> **Methodology**: 102-case pool across 12 categories. Scores are 3-seed mean (seeds 2027/2028/2029, zero variance across all seeds). All fine-tuned models use the Qwen3 nothink template with keyword-trigger routing prompts and `-> respond directly (no tool)` for the no-tool class. Full runner: [`tests/benchmarks/prism-routing-100/benchmark.py`](tests/benchmarks/prism-routing-100/benchmark.py) · Cascade runner: [`tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py`](tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py).
 >
 > **These are NOT general-purpose LLM benchmarks.** This eval measures routing precision on 6 specific MCP tools. The prism-coder models are specialists trained on this exact task — they match or exceed Claude on routing while Claude dominates on general reasoning, coding, and open-domain QA. The value is **offline reliability at zero cost**, not replacing cloud AI.
-**iOS deployment:** On-device inference via **llama.cpp Swift SPM**. Auto-selects by device RAM: 14B on iPad Pro 16GB (97.1%), 8B on iPhone/iPad 8GB (98%, OOM fallback to 1.7B at 96.1%). CoreML not viable — coremltools doesn't support Qwen3 attention ops. Integration: `LLMEngine.swift` → `prismNativeBridge.askAI()` → token stream. WiFi fallback: Mac Ollama (`OLLAMA_HOST=0.0.0.0`).
+**iOS deployment:** On-device inference via **llama.cpp Swift SPM**. Auto-selects by device RAM: 14B on iPad Pro 16GB (100% routing), 8B on iPhone/iPad 8GB (100%, OOM fallback to 1.7B at 100%). CoreML not viable — coremltools doesn't support Qwen3 attention ops. Integration: `LLMEngine.swift` → `prismNativeBridge.askAI()` → token stream. WiFi fallback: Mac Ollama (`OLLAMA_HOST=0.0.0.0`).
 ### Benchmarks — run them yourself
@@ -234,10 +251,12 @@ python3 tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py
 | Model | HuggingFace | Solo BFCL | Cascade role | Size |
 |---|---|---|---|---|
-| prism-coder:32b | [dcostenco/prism-coder-32b](https://huggingface.co/dcostenco/prism-coder-32b) | **99.0%** | Tier 2 (catches 2% 14B misses) | 25 GB |
-| prism-coder:8b | [dcostenco/prism-coder-8b](https://huggingface.co/dcostenco/prism-coder-8b) | **98.0%** | Mobile tier 2 | 4.7 GB |
-| prism-coder:14b | [dcostenco/prism-coder-14b](https://huggingface.co/dcostenco/prism-coder-14b) | **97.1%** | Tier 1 (serves 97% of traffic) | 8.4 GB |
-| prism-coder:1.7b | [dcostenco/prism-coder-1.7b](https://huggingface.co/dcostenco/prism-coder-1.7b) | **96.1%** | On-device / always-fits fallback | 1.1 GB |
+| prism-coder:32b | [dcostenco/prism-coder-32b](https://huggingface.co/dcostenco/prism-coder-32b) | **100.0%** routing (v7 MoE) | Tier 2 (catches ~1% 14B misses) | 16 GB |
+| prism-coder:8b | [dcostenco/prism-coder-8b](https://huggingface.co/dcostenco/prism-coder-8b) | **100.0%** routing (v36) | Mobile tier | 4.7 GB |
+| prism-coder:14b | [dcostenco/prism-coder-14b](https://huggingface.co/dcostenco/prism-coder-14b) | **100.0%** routing (v36) | Tier 1 (serves ~99% of traffic) | 8.4 GB |
+| prism-coder:1.7b | [dcostenco/prism-coder-1.7b](https://huggingface.co/dcostenco/prism-coder-1.7b) | **100.0%** routing (v42) | On-device / always-fits fallback | 1.1 GB |
+| prism-ide:14b | [dcostenco/prism-ide](https://huggingface.co/dcostenco/prism-ide) | **22/22** TypeScript eval (v1) | Code generation tier 1 (~1.1s) | 8.4 GB |
+| prism-ide:32b | [dcostenco/prism-ide](https://huggingface.co/dcostenco/prism-ide) | Complex code + multi-file (v3) | Code generation tier 2 (~0.8s MoE) | 16 GB |
 ## Self-hosted / Local AI (Enterprise)
@@ -246,16 +265,16 @@ Run the full Prism model stack on your own hardware — zero cloud, zero latency
 **Requirements:** Mac M2 Pro+ (48GB recommended) or Linux with NVIDIA GPU · [Ollama](https://ollama.com)
 ```bash
-# On-device tier — 2.2 GB (any machine, iPhone)
+# On-device tier — 1.1 GB (any machine, iPhone) — 100% routing
 ollama pull dcostenco/prism-coder:1b7
-# Mobile tier — 4.7 GB (iPhone/iPad 8GB, Mac M1+)
+# Mobile tier — 4.7 GB (iPhone/iPad 8GB, Mac M1+) — 100% routing
 ollama pull dcostenco/prism-coder:8b
-# Standard tier — 8.4 GB (Mac M2 Pro+, iPad Pro 16GB)
+# Standard tier — 8.4 GB (Mac 24GB+, iPad Pro 16GB) — 100% routing
 ollama pull dcostenco/prism-coder:14b
-# Reasoning tier — 19 GB (Mac M2 Ultra+ or A100)
+# Reasoning tier — 16 GB (Mac M2 Ultra+, 30B-A3B MoE) — 100% routing
 ollama pull dcostenco/prism-coder:32b
 ```
@@ -264,8 +283,8 @@ Set `LOCAL_LLM_URL=http://localhost:11434` in your portal config. Routing is aut
 **Desktop/server**: 14B → 32B → Claude Opus fallback · **Mobile/offline**: 14B → 8B → 1.7B
 iOS/mobile on same WiFi: `OLLAMA_HOST=0.0.0.0 ollama serve` on the Mac, then point `LOCAL_LLM_URL` at the Mac's IP.
-Routing accuracy (May 2026, 3-seed mean): **32B = 99.0% · 8B = 98.0% · 14B = 97.1% · 1.7B = 96.1%**
-Cascade (14B→32B): **99.0%** · Opus solo: 97.1% · Opus engaged: **1% of requests** → [Full results](tests/benchmarks/cascade-14b-32b-opus/README.md)
+Routing accuracy (May 2026, v36/v7 system prompt, 3-seed mean): 32B v7 = **100.0%** · 8B v36 = **100.0%** · 14B v36 = **100.0%** · 1.7B v42 = **100.0%**
+Cascade (14B→32B): **100.0%** · Opus solo: 98.3% · Opus engaged: **0% of requests** → [Full results](tests/benchmarks/cascade-14b-32b-opus/README.md)
 ---
@@ -273,7 +292,7 @@ Cascade (14B→32B): **99.0%** · Opus solo: 97.1% · Opus engaged: **1% of requ
 | Plan | Cloud model | Daily limit | On-device |
 |---|---|---|---|
-| **Free** | — | unlimited local | prism-coder:1.7b (96.1%) + 8b (98.0%) + 14b (97.1%) |
+| **Free** | — | unlimited local | prism-coder:1.7b (100%) + 8b (100%) + 14b (100%) |
 | **Standard $19/mo** | Claude Sonnet 4 | 200 req | + cloud fallback |
 | **Pro $49/mo** | prism-coder:32b | 2,000 req | + reasoning tier |
 | **Enterprise $99/mo** | prism-coder:32b priority | unlimited | + HIPAA BAA + custom fine-tuning |
@@ -360,7 +379,7 @@ python3 tests/benchmarks/prism-routing-100/benchmark.py --models 1b7 14b 32b
 - BCBA skill integration
 - Deep storage tier
 - Dashboard rendering
-- Routing benchmarks (100-case Prism eval) — see `tests/benchmarks/prism-routing-100/`
+- Routing benchmarks (102-case Prism eval) — see `tests/benchmarks/prism-routing-100/`
 ## Migration
@@ -413,14 +432,16 @@ node scripts/migrate-local-to-portal.mjs --include-scholar
   └──────────┬───────────┘  └─────────────┬───────────────┘
              │                            │
              ▼                            ▼
-  ┌───────────────────────┐  ┌─────────────────────────────┐
-  │  OPENROUTER / LOCAL   │  │  SUPABASE                   │
-  │                       │  │  session ledgers            │
-  │  Cloud: Claude Sonnet │  │  knowledge graph            │
-  │  Local:  prism-coder  │  │  handoffs & todos           │
-  │   :14b (98%) :8b (96%)│  │                             │
-  │   :32b (97%) :1b7(88%)│  │  source of truth            │
-  └───────────────────────┘  └─────────────────────────────┘
+  ┌──────────────────────────────┐  ┌─────────────────────────────┐
+  │  OPENROUTER / LOCAL          │  │  SUPABASE                   │
+  │                              │  │  session ledgers            │
+  │  Cloud: Claude Sonnet 4      │  │  knowledge graph            │
+  │  Routing: prism-coder        │  │  handoffs & todos           │
+  │   :32b(100%) :14b(100%)      │  │                             │
+  │   :8b(100%)  :1b7(100%)      │  │  source of truth            │
+  │  Code:    prism-ide          │  │                             │
+  │   :14b · :32b                │  │                             │
+  └──────────────────────────────┘  └─────────────────────────────┘
 ```
 ### Service Routing
@@ -439,8 +460,8 @@ node scripts/migrate-local-to-portal.mjs --include-scholar
 | Surface | Primary | Fallback |
 |---|---|---|
 | AI Chat `@search` | Firecrawl | — |
-| Prism MCP agents (cloud) | Firecrawl | Brave Search |
-| Prism MCP server (local) | Brave Search (via MCP tools) | — |
+| Prism MCP agents (cloud) | Firecrawl | — |
+| Prism MCP server (local) | Firecrawl (via MCP tools) | — |
 | Clinical research | PubMed + ERIC + Semantic Scholar | DuckDuckGo |
 **TTS (Text-to-Speech)**
@@ -503,10 +524,14 @@ HuggingFace: dcostenco/prism-coder-{14b,8b,32b,1.7b} (public GGUF weights)
 | Plan | Cloud model | Daily limit | On-device |
 |---|---|---|---|
-| Free | — | unlimited local | prism-coder:1.7b |
-| Standard $19/mo | prism-coder:14b | 200 req | + cloud |
-| Pro $49/mo | prism-coder:32b | 2,000 req | + reasoning |
-| Enterprise $99/mo | prism-coder:32b priority | unlimited | full stack |
+| **Free** | — | unlimited local | prism-coder:1.7b (100%) + 8b (100%) + 14b (100%) |
+| **Standard $19/mo** | Claude Sonnet 4 | 200 req | + cloud fallback |
+| **Pro $49/mo** | prism-coder:32b | 2,000 req | + reasoning tier |
+| **Enterprise $99/mo** | prism-coder:32b priority | unlimited | + HIPAA BAA + custom fine-tuning |
+All on-device models are **free for every tier** — no subscription needed for local inference. Offline translation (1,261 phrases × 20 languages) included in all plans.
+[Subscribe →](https://synalux.ai/pricing)
 See [`docs/WOW_FEATURES.md`](docs/WOW_FEATURES.md) for the algorithm catalogue. Release notes in [`docs/releases/v14.0.0-prism-as-foundation.md`](docs/releases/v14.0.0-prism-as-foundation.md).
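
The README hunks above say the routing cascade validates each response against the 6 known tool names and escalates on empty, truncated, or hallucinated tool calls. The following is a minimal sketch of that idea, not the project's implementation. The tool names (`tool_1` … `tool_6`), the JSON response shape `{"tool": "..."}`, and the `isAcceptable` / `cascade` helpers are all assumptions made for illustration; the diff does not show the real validator.

```javascript
// Hypothetical placeholders: the README mentions 6 tool names but the diff does not list them.
const KNOWN_TOOLS = new Set(['tool_1', 'tool_2', 'tool_3', 'tool_4', 'tool_5', 'tool_6']);

// Assumed response format: a JSON object with a string `tool` field.
function isAcceptable(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') return false; // empty response
  let parsed;
  try {
    parsed = JSON.parse(raw); // truncated output fails to parse
  } catch {
    return false;
  }
  if (parsed === null || typeof parsed !== 'object') return false;
  return typeof parsed.tool === 'string' && KNOWN_TOOLS.has(parsed.tool); // unknown names are rejected
}

// Try each tier in order and serve the first response that passes validation.
async function cascade(tiers, prompt) {
  for (const tier of tiers) {
    const raw = await tier.call(prompt);
    if (isAcceptable(raw)) return { servedBy: tier.name, raw };
  }
  return { servedBy: null, raw: null };
}

// Stub tiers standing in for real model calls.
const stubTiers = [
  { name: 'first', call: async () => '{"tool": "tool_1"' },   // truncated JSON
  { name: 'second', call: async () => '{"tool": "tool_1"}' }, // valid call
];
const cascadeResult = cascade(stubTiers, 'example prompt');
cascadeResult.then((r) => console.log(r.servedBy));
```

In this sketch escalation is decided purely by validation, so a tier that answers quickly but with a malformed or unknown tool call never reaches the caller.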

package/dist/cli.js CHANGED

@@ -519,4 +519,69 @@ scmCmd
         process.exit(1);
     }
 });
+// ─── prism register-models ────────────────────────────────────
+// Convenience: alias namespaced HF-style prism-coder tags
+// (`dcostenco/prism-coder:14b`) to the bare tags (`prism-coder:14b`)
+// some external tooling expects. The MCP picker handles both forms
+// natively as of v15.5, so this command is OPTIONAL — useful only
+// when a user wants to run `ollama run prism-coder:14b` directly,
+// or for tools that pre-date the picker's namespace fallback.
+program
+    .command('register-models')
+    .description('Alias namespaced prism-coder Ollama tags to bare tags (optional convenience)')
+    .option('-u, --url <url>', 'Ollama base URL', process.env.PRISM_LOCAL_LLM_URL || 'http://localhost:11434')
+    .option('--dry-run', 'Print what would be aliased without running ollama cp')
+    .action(async (options) => {
+    let installed = [];
+    try {
+        const res = await fetch(`${options.url}/api/tags`, { signal: AbortSignal.timeout(3_000) });
+        if (!res.ok) {
+            console.error(`Ollama /api/tags returned HTTP ${res.status}. Is Ollama running at ${options.url}?`);
+            process.exit(1);
+        }
+        const data = (await res.json());
+        installed = data.models ?? [];
+    }
+    catch (err) {
+        console.error(`Cannot reach Ollama at ${options.url}: ${err instanceof Error ? err.message : String(err)}`);
+        process.exit(1);
+    }
+    const installedNames = new Set(installed.map(m => m.name));
+    const candidates = installed
+        .map(m => m.name)
+        .filter(n => /\/prism-coder:/.test(n))
+        .map(n => ({ from: n, to: n.replace(/^[^/]+\//, '') }))
+        .filter(({ to }) => !installedNames.has(to));
+    if (candidates.length === 0) {
+        console.log('Nothing to do — no namespaced prism-coder tags need aliasing.');
+        return;
+    }
+    console.log(`Found ${candidates.length} model(s) to alias:`);
+    for (const { from, to } of candidates) {
+        console.log(`  ${from}  →  ${to}`);
+    }
+    if (options.dryRun) {
+        console.log('\n(dry-run — no changes made)');
+        return;
+    }
+    const { execFile } = await import('node:child_process');
+    const { promisify } = await import('node:util');
+    const exec = promisify(execFile);
+    let ok = 0;
+    let fail = 0;
+    for (const { from, to } of candidates) {
+        try {
+            await exec('ollama', ['cp', from, to]);
+            console.log(`  ✓ aliased ${to}`);
+            ok++;
+        }
+        catch (err) {
+            console.error(`  ✗ ${from} → ${to}: ${err instanceof Error ? err.message : String(err)}`);
+            fail++;
+        }
+    }
+    console.log(`\nDone. Aliased ${ok}, failed ${fail}.`);
+    if (fail > 0)
+        process.exit(1);
+});
 program.parse(process.argv);
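
The candidate selection inside the `register-models` action can be exercised without a running Ollama. The sketch below lifts the same filter/map/filter pipeline from the hunk above into a helper; the name `aliasCandidates` and the sample tag list are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper mirroring the pipeline in the register-models action:
// keep namespaced prism-coder tags, strip the leading namespace segment,
// and drop any alias whose bare target is already installed.
function aliasCandidates(installedNames) {
  const installed = new Set(installedNames);
  return installedNames
    .filter((n) => /\/prism-coder:/.test(n))
    .map((n) => ({ from: n, to: n.replace(/^[^/]+\//, '') }))
    .filter(({ to }) => !installed.has(to));
}

// Made-up sample of what /api/tags might report.
const sample = [
  'dcostenco/prism-coder:14b',
  'dcostenco/prism-coder:8b',
  'prism-coder:8b', // bare 8b already present, so no alias is needed for it
  'llama3:8b',      // unrelated model, ignored
];
const candidates = aliasCandidates(sample);
console.log(candidates);
```

With this sample only the 14b tag is proposed for aliasing. One observation from the sketch: the prefix-stripping regex removes only the first path segment, so a name with two segments (for example a hypothetical `hf.co/dcostenco/prism-coder:14b`) would map to a target that is still namespaced.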

package/dist/storage/index.js CHANGED

@@ -131,3 +131,7 @@ export async function closeStorage() {
         storageInstance = null;
     }
 }
+/** Test-only: inject a pre-initialized storage instance into the singleton slot. */
+export function _setStorageForTesting(instance) {
+    storageInstance = instance;
+}

package/dist/tools/prismInferHandler.js CHANGED

@@ -19,7 +19,7 @@
  * directly — all cloud traffic goes via the synalux portal so billing,
  * tier gating, and HIPAA audit are enforced in one place.
  */
-import { pickLocalModel, fmtGb, MODEL_TIERS } from "../utils/modelPicker.js";
+import { pickLocalModel, fmtGb, MODEL_TIERS, resolveOllamaName } from "../utils/modelPicker.js";
 import { getSynaluxJwt, invalidateSynaluxJwt } from "../utils/synaluxJwt.js";
 import { getAvailableMemoryBytes } from "../utils/availableMemory.js";
 import { PRISM_SYNALUX_BASE_URL, PRISM_LOCAL_LLM_URL, } from "../config.js";
@@ -249,20 +249,25 @@ export async function runInfer(args, deps) {
         let anyViable = false;
         for (let i = ceilStart; i < MODEL_TIERS.length; i++) {
             const tier = MODEL_TIERS[i];
-            if (!installed.has(tier.tag)) {
+            // Accept the tier whether Ollama reports it as bare (`prism-coder:32b`)
+            // or namespaced (`dcostenco/prism-coder:32b`, the form `ollama pull`
+            // produces from a HF repo). resolveOllamaName returns the actual
+            // name Ollama knows so /api/generate finds the model.
+            const ollamaName = resolveOllamaName(tier.tag, installed);
+            if (!installed.has(ollamaName)) {
                 attempts.push({ tier: tier.tag, reason: "not_pulled" });
                 continue;
             }
             // RAM gate — but skip the check if the tier is already warm in
             // Ollama. Reused models don't reallocate weight buffers.
-            const isWarm = loaded.has(tier.tag);
+            const isWarm = loaded.has(ollamaName);
             if (!isWarm && freeBytes < tier.minFreeGb * (1024 ** 3)) {
                 attempts.push({ tier: tier.tag, reason: "ram_insufficient" });
                 continue;
             }
             anyViable = true;
             const timeout = args.timeout_ms ?? DEFAULT_TIMEOUTS[tier.tag] ?? 60_000;
-            const result = await deps.callLocal(deps.ollamaUrl, tier.tag, args.prompt, args.system, maxTokens, temperature, timeout);
+            const result = await deps.callLocal(deps.ollamaUrl, ollamaName, args.prompt, args.system, maxTokens, temperature, timeout);
             if (result.ok) {
                 return {
                     output: result.text,
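
To see how the two gates in this hunk interact with name resolution, here is a simplified sketch of the tier loop. `resolveOllamaName` is copied from the `modelPicker.js` change in this same diff; `firstViableTier` is a hypothetical reduction of `runInfer` that drops the actual model call, and the `installed` / `loaded` sets are invented sample data. The two tier entries reuse the `minFreeGb` values visible in the `MODEL_TIERS` context lines.

```javascript
const GB = 1024 ** 3;

// Copied from the modelPicker.js hunk in this diff.
function resolveOllamaName(tierTag, installed) {
  if (installed.has(tierTag)) return tierTag;
  for (const a of installed) {
    if (a.endsWith(`/${tierTag}`)) return a;
  }
  return tierTag;
}

// Hypothetical simplification of the loop in runInfer: returns the first tier
// that passes the "pulled" gate and the RAM gate, plus the reasons for skips.
function firstViableTier(tiers, installed, loaded, freeBytes) {
  const attempts = [];
  for (const tier of tiers) {
    const ollamaName = resolveOllamaName(tier.tag, installed);
    if (!installed.has(ollamaName)) {
      attempts.push({ tier: tier.tag, reason: 'not_pulled' });
      continue;
    }
    const isWarm = loaded.has(ollamaName); // warm models skip the RAM gate
    if (!isWarm && freeBytes < tier.minFreeGb * GB) {
      attempts.push({ tier: tier.tag, reason: 'ram_insufficient' });
      continue;
    }
    return { ollamaName, attempts };
  }
  return { ollamaName: null, attempts };
}

const tiers = [
  { tag: 'prism-coder:8b', minFreeGb: 7 },
  { tag: 'prism-coder:1b7', minFreeGb: 3 },
];
const installed = new Set(['dcostenco/prism-coder:8b', 'prism-coder:1b7']);

// Cold start with 4 GB free: 8b is installed (namespaced) but fails the RAM gate.
const cold = firstViableTier(tiers, installed, new Set(), 4 * GB);
// Same memory, but 8b is already loaded under its namespaced name.
const warm = firstViableTier(tiers, installed, new Set(['dcostenco/prism-coder:8b']), 4 * GB);
console.log(cold, warm);
```

Two details are worth noting. When nothing matches, `resolveOllamaName` returns the bare tag unchanged, so the subsequent `installed.has(ollamaName)` check still records `not_pulled`. When both the bare and namespaced forms are installed, the bare form is returned because it is checked first.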

package/dist/utils/modelPicker.js CHANGED

@@ -31,13 +31,25 @@ export const MODEL_TIERS = [
     { tag: 'prism-coder:8b', weightsGb: 5, minFreeGb: 7, ctxTokens: 32_768 },
     { tag: 'prism-coder:1b7', weightsGb: 2, minFreeGb: 3, ctxTokens: 8_192 },
 ];
+/**
+ * True when `installed` matches `tierTag` either as a bare tag
+ * (`prism-coder:32b`) or as a namespaced HuggingFace-style tag
+ * (`dcostenco/prism-coder:32b`). The README documents `ollama pull
+ * dcostenco/prism-coder:32b`, so Ollama's /api/tags returns the
+ * namespaced form — without this matcher the picker would never
+ * see them and silently fall through to cloud.
+ */
+function tagMatches(installed, tierTag) {
+    return installed === tierTag || installed.endsWith(`/${tierTag}`);
+}
 /**
  * Pick the largest viable tier for the given free RAM.
  * Returns null when no tier fits (caller should go cloud-only).
  *
  * @param freeBytes  Result of os.freemem() — binary bytes
  * @param ceiling    Optional cap (e.g. "14b" to forbid 32B even if RAM allows)
- * @param available  Optional whitelist — only consider tags in this set
+ * @param available  Optional whitelist — only consider tags in this set. Accepts
+ *                   bare (`prism-coder:32b`) or namespaced (`dcostenco/prism-coder:32b`).
  */
 export function pickLocalModel(freeBytes, ceiling, available) {
     if (!Number.isFinite(freeBytes) || freeBytes <= 0)
@@ -50,12 +62,36 @@ export function pickLocalModel(freeBytes, ceiling, available) {
         const tier = MODEL_TIERS[i];
         if (freeBytes < tier.minFreeGb * GB)
             continue;
-        if (available && !available.has(tier.tag))
-            continue;
+        if (available) {
+            let found = false;
+            for (const a of available) {
+                if (tagMatches(a, tier.tag)) {
+                    found = true;
+                    break;
+                }
+            }
+            if (!found)
+                continue;
+        }
         return tier;
     }
     return null;
 }
+/**
+ * Resolve a tier tag to the actual Ollama name installed locally.
+ * If `installed` contains a namespaced match (e.g. `dcostenco/prism-coder:32b`),
+ * the namespaced form is returned so Ollama's /api/generate finds it.
+ * Falls back to the bare tag when only the bare form is present.
+ */
+export function resolveOllamaName(tierTag, installed) {
+    if (installed.has(tierTag))
+        return tierTag;
+    for (const a of installed) {
+        if (a.endsWith(`/${tierTag}`))
+            return a;
+    }
+    return tierTag;
+}
 /**
  * Format a byte count for logging. 12_884_901_888 → "12.0 GB".
  */
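
The matching rule added in `tagMatches` is a plain suffix check, which a few concrete inputs make easier to read. `tagMatches` below is copied from the hunk above; `whitelistHasTier` is a hypothetical wrapper around the same loop that `pickLocalModel` now runs over `available`, and every tag other than the `dcostenco/` form is made up for illustration.

```javascript
// Copied from the hunk above.
function tagMatches(installed, tierTag) {
  return installed === tierTag || installed.endsWith(`/${tierTag}`);
}

// Hypothetical wrapper: the whitelist scan pickLocalModel performs per tier.
function whitelistHasTier(available, tierTag) {
  for (const a of available) {
    if (tagMatches(a, tierTag)) return true;
  }
  return false;
}

const tag = 'prism-coder:32b';
const checks = {
  bare: tagMatches('prism-coder:32b', tag),
  namespaced: tagMatches('dcostenco/prism-coder:32b', tag),
  otherNamespace: tagMatches('someone-else/prism-coder:32b', tag), // any namespace matches
  prefixedName: tagMatches('my-prism-coder:32b', tag),             // no slash boundary, no match
  longerTag: tagMatches('dcostenco/prism-coder:32b-q8', tag),      // suffix differs, no match
};
console.log(checks);
console.log(whitelistHasTier(new Set(['dcostenco/prism-coder:32b']), tag));
```

Because the check only requires the name to end in `/<tag>`, it accepts any namespace rather than `dcostenco/` specifically, which is relevant if several publishers' copies of a tag are installed side by side.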

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "prism-mcp-server",
-  "version": "15.4.0",
+  "version": "15.5.1",
   "mcpName": "io.github.dcostenco/prism-coder",
   "description": "Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 54 Agent Skills, Zero-Search HDC/HRR retrieval, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder:7b / 14b open-weights LLM fleet.",
   "module": "index.ts",