prism-mcp-server 15.7.0 → 15.7.2

package/README.md CHANGED
@@ -58,10 +58,11 @@ Free tier runs entirely on your machine — SQLite, local embedding model, no AP
 
  Install in one command — no config, no keys, no vendor agreements:
  ```bash
- ollama pull dcostenco/prism-coder:1b7 # 2.2 GB · ~1.6s · any machine
- ollama pull dcostenco/prism-coder:8b # 4.7 GB · ~0.8s · Mac M1+ / iPhone 8GB
- ollama pull dcostenco/prism-coder:14b # 8.4 GB · ~1.1s · Mac M2+ / iPad Pro 16GB
- ollama pull dcostenco/prism-coder:32b # 16 GB · ~0.8s · Mac M2 Ultra+ (30B-A3B MoE)
+ ollama pull dcostenco/prism-coder:14b # 9 GB · default router · Mac M2+ / iPad Pro
+ ollama pull dcostenco/prism-coder:4b # 2.5 GB · verifier · iPhone 15/16 Pro
+ ollama pull dcostenco/prism-coder:1b7 # 2.2 GB · ultra-low RAM / Apple Watch
+ ollama pull dcostenco/prism-coder:32b # 19 GB · complex tasks · Mac M2 Ultra+
+ ollama pull dcostenco/prism-coder:8b # 4.7 GB · balanced · iPhone/iPad 8GB
  ```
 
  Prism MCP detects both the namespaced (`dcostenco/prism-coder:14b`) and bare (`prism-coder:14b`) Ollama tag forms automatically — nothing else to configure. If you want the bare tags as aliases for direct `ollama run prism-coder:14b` use, run:
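The tag detection described above can be sketched in a few lines. `tagMatches` follows the helper of the same name that appears later in this diff; `findInstalledTag` and the sample tag list are hypothetical additions for illustration only.

```typescript
// Sketch: match an installed Ollama tag against a tier tag in either form.
// `findInstalledTag` is a hypothetical wrapper, not part of the package.
function tagMatches(installed: string, tierTag: string): boolean {
  // Bare form ("prism-coder:14b") or namespaced form ("<owner>/prism-coder:14b").
  return installed === tierTag || installed.endsWith(`/${tierTag}`);
}

function findInstalledTag(installedTags: string[], tierTag: string): string | undefined {
  return installedTags.find((tag) => tagMatches(tag, tierTag));
}

// Sample data (made up for the example).
const installed = ["dcostenco/prism-coder:14b", "llama3:8b"];
console.log(findInstalledTag(installed, "prism-coder:14b")); // dcostenco/prism-coder:14b
console.log(findInstalledTag(installed, "prism-coder:4b"));  // undefined
```

Requiring the `/` before the bare tag keeps `prism-coder:4b` from matching a `...:14b` entry by accident.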
@@ -73,33 +74,68 @@ prism register-models --dry-run # preview what would be aliased
 
  ### Cascade architecture
 
- Two cascades operate independently depending on the deployment context:
+ Three-tier local cascade with cloud fallback:
 
- **Desktop / server cascade** (quality-first, used in Prism MCP + Synalux portal):
  ```
- prism-coder:14b ─── correct? ──YES──▶ serve (99% of traffic, ~1.1s)
- NO
- prism-coder:32b ─── correct? ──YES──▶ serve (~1% of traffic, ~0.8s)
- NO
- Claude Opus 4.7 ──────────────────────▶ serve (0% in practice, cloud)
+ Query arrives
+
+
+ prism-coder:14b ── routes (100% eval_300) ──▶ serve (~3s, 9GB, FREE)
+ │ │
+ │ knowledge_search (RAG context)
+ │ │
+ ▼ ▼
+ prism-coder:4b ── verifies claims ──────────▶ grounded response
+ │ (2.5GB, <1s)
+
+ ▼ (complex tasks only, explicit ceiling="32b")
+ prism-coder:32b ── deep reasoning ──────────▶ serve (~8s, 19GB, FREE)
+
+ ▼ (cloud fallback when local insufficient)
+ Claude Sonnet 4 → Claude Opus 4.7 ─────────▶ serve (cloud, ~$0.01/req)
  ```
 
- **Mobile / offline cascade** (availability-first, used in Prism AAC iOS):
+ | Tier | Model | Role | RAM | Latency | Cost |
+ |------|-------|------|-----|---------|------|
+ | **Default** | prism-coder:14b | Router + general inference | 9 GB | ~3s | $0 |
+ | **Verifier** | prism-coder:4b | Grounding claims check | 2.5 GB | <1s | $0 |
+ | **Complex** | prism-coder:32b | Deep reasoning (on-demand) | 19 GB | ~8s | $0 |
+ | **Cloud** | Sonnet → Opus | Fallback for max quality | — | ~5-10s | ~$0.01 |
+
+ **Mobile / offline cascade** (Prism AAC iOS):
  ```
- prism-coder:14b (~1.1s) — iPad Pro 16GB prism-coder:8b (~0.8s) — iPhone/iPad 8GB
- prism-coder:1.7b (~1.6s) — any device, always fits
+ prism-coder:14b (iPad Pro 16GB)    prism-coder:4b (iPhone 8GB)
+ prism-coder:1.7b (any device, always fits)
  ```
 
- **Code generation cascade** (used in Prism Coder IDE + Agent Mode):
- ```
- prism-ide:14b ─── quality OK? ──YES──▶ serve (~1.1s, 22/22 TypeScript eval)
- │ NO (complex / multi-file)
- prism-ide:32b ─── quality OK? ──YES──▶ serve (~0.8s MoE, deep reasoning)
- NO
- Claude Sonnet 4 ──────────────────────▶ serve (cloud fallback)
+ ### Knowledge ingestion: teach Prism your codebase
+
+ Your code knowledge lives in the knowledge graph, not in model weights. Routing accuracy stays at 100%.
+
+ ```bash
+ bash scripts/knowledge-ingest/setup.sh # one-time setup
+ # Then every git commit auto-indexes changed files into the knowledge graph
  ```
 
- The routing cascade validates each response against the 6 known tool names and escalates on empty, truncated, or hallucinated tool calls. The code generation cascade escalates on incomplete or syntactically invalid output.
+ Three entry points:
+ - **MCP tool**: `knowledge_ingest` (AI says "learn this code")
+ - **GitHub webhook**: `POST /api/github/webhook` (automatic on push)
+ - **REST API**: `POST /api/v1/prism/ingest` (open interface)
+
+ See [KNOWLEDGE_INGESTION.md](docs/KNOWLEDGE_INGESTION.md) for the full setup guide.
+
+ ### Cost comparison
+
+ Benchmark: 19 queries (routing + code knowledge + clinical), May 2026:
+
+ | Architecture | Routing | Code Knowledge | Clinical | Annual Cost (1K/day) |
+ |---|---|---|---|---|
+ | **Prism cascade** (14b→RAG→Sonnet) | 100% (local) | RAG-powered | Sonnet | **~$330/yr** |
+ | Claude Opus for everything | ~30% (no tools) | Training data | Opus | ~$10,600/yr |
+
+ **84% cost savings.** Routing is free and 100% accurate. Cloud only for the 20% of queries that need deep reasoning.
+
+ The routing cascade validates each response against the known tool names and escalates on empty, truncated, or hallucinated tool calls.
 
  **Routing accuracy** ([102-case Prism eval](tests/benchmarks/prism-routing-100/README.md), v36/v7 system prompt, 3-seed mean, May 2026):
 
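The escalation rule stated in the README text above (validate against the known tool names; escalate on empty, truncated, or hallucinated tool calls) might be sketched as follows. The response shape, the contents of `KNOWN_TOOLS`, and the JSON-parse check for truncation are all assumptions made for illustration; the diff shows none of them, and only `knowledge_search` and `knowledge_ingest` are named in the README.

```typescript
// Hypothetical shape of a routing response; the real Prism types are not shown.
interface RoutingResponse {
  toolName: string | null; // null when the model produced no tool call
  rawArguments: string;    // JSON-encoded arguments as emitted by the model
}

// Illustrative subset: only these two tool names appear in the README.
const KNOWN_TOOLS: ReadonlySet<string> = new Set(["knowledge_search", "knowledge_ingest"]);

type Verdict = "serve" | "escalate";

function validateRouting(res: RoutingResponse): Verdict {
  // Empty tool call.
  if (res.toolName === null || res.toolName.trim() === "") return "escalate";
  // Tool name outside the known set is treated as hallucinated.
  if (!KNOWN_TOOLS.has(res.toolName)) return "escalate";
  // Assumed heuristic: truncated output fails to parse as JSON.
  try {
    JSON.parse(res.rawArguments);
  } catch {
    return "escalate";
  }
  return "serve";
}

console.log(validateRouting({ toolName: "knowledge_search", rawArguments: '{"query":"auth"}' })); // serve
console.log(validateRouting({ toolName: "web_search", rawArguments: "{}" }));                     // escalate
console.log(validateRouting({ toolName: "knowledge_ingest", rawArguments: '{"path": "src/' }));   // escalate
```

An "escalate" verdict would hand the query to the next tier in the cascade; how that hand-off is wired is not covered by this sketch.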
@@ -9,9 +9,9 @@
  * stateless MCP), pointed at free-form generation instead of tool-call
  * responses.
  *
- * Cascade role: prism-coder:1b7 is the default verifier on every
- * device (server, iPad). Larger tiers (8B/14B/32B) draft; 1b7 verifies.
- * Different model from the drafter satisfies the Patronus rule.
+ * Cascade role: prism-coder:4b is the default verifier (fast, 2.5GB).
+ * 14b drafts; 4b verifies. Different model = Patronus rule satisfied.
+ * Falls back to 1b7 on devices with <4GB free RAM.
  *
  * Failure modes:
  * - Verifier model unreachable / timeout → fail-closed refusal
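The verifier selection described in the updated comment can be sketched as below. `pickVerifier` and `differsFromDrafter` are hypothetical names, and the sketch assumes the 4 GB cutoff is measured in binary gigabytes, as the `GB` constant elsewhere in this diff suggests.

```typescript
const GB = 1024 ** 3; // binary gigabyte, matching the constant used in the model picker

type VerifierTag = "prism-coder:4b" | "prism-coder:1b7";

// Hypothetical helper: 4b verifies by default; 1b7 on devices with < 4 GB free RAM.
function pickVerifier(freeBytes: number): VerifierTag {
  return freeBytes >= 4 * GB ? "prism-coder:4b" : "prism-coder:1b7";
}

// The comment's "Patronus rule": the verifier must be a different model from the drafter.
function differsFromDrafter(drafter: string, verifier: string): boolean {
  return drafter !== verifier;
}

console.log(pickVerifier(8 * GB)); // prism-coder:4b
console.log(pickVerifier(3 * GB)); // prism-coder:1b7
console.log(differsFromDrafter("prism-coder:14b", pickVerifier(8 * GB))); // true
```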
@@ -93,7 +93,7 @@ function refusalText(action, failedClaim) {
  }
  }
  export async function verifyGrounding(opts) {
-     const verifierModel = opts.verifierModel ?? "prism-coder:1b7";
+     const verifierModel = opts.verifierModel ?? "prism-coder:4b";
      const timeoutMs = opts.timeoutMs ?? 2000;
      const ollamaUrl = opts.ollamaUrl ?? PRISM_LOCAL_LLM_URL;
      const fetchImpl = opts.fetchImpl ?? fetch;
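The defaults in `verifyGrounding` rely on `??`, which falls back only when the left-hand side is `null` or `undefined`. The sketch below isolates that pattern. `resolveOptions` and `VerifyOptions` are hypothetical names, and the URL is a placeholder (Ollama's usual local port), since the value of `PRISM_LOCAL_LLM_URL` is not shown in this diff.

```typescript
interface VerifyOptions {
  verifierModel?: string;
  timeoutMs?: number;
  ollamaUrl?: string;
}

// Placeholder value; the real PRISM_LOCAL_LLM_URL is not shown in the diff.
const PRISM_LOCAL_LLM_URL = "http://localhost:11434";

// Hypothetical helper that applies the same defaults as the hunk above.
function resolveOptions(opts: VerifyOptions): Required<VerifyOptions> {
  return {
    verifierModel: opts.verifierModel ?? "prism-coder:4b",
    timeoutMs: opts.timeoutMs ?? 2000,
    ollamaUrl: opts.ollamaUrl ?? PRISM_LOCAL_LLM_URL,
  };
}

console.log(resolveOptions({}).verifierModel);           // prism-coder:4b
console.log(resolveOptions({ timeoutMs: 0 }).timeoutMs); // 0 (?? keeps falsy values that are not null/undefined)
console.log(resolveOptions({ verifierModel: "prism-coder:1b7" }).verifierModel); // prism-coder:1b7
```

Callers that pinned `verifierModel` explicitly are unaffected by the new default; only callers that omit it pick up `prism-coder:4b`.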
@@ -1,25 +1,25 @@
  /**
  * RAM-Gated Local Model Picker
  * ─────────────────────────────────────────────────────────────
- * Pure function. Given free RAM in bytes, return the largest
- * prism-coder tag whose Q4_K_M weights + KV-cache headroom fit.
+ * Cascade: 14b (default) → 4b (verifier) → 32b (complex only).
  *
- * Thresholds reflect observed footprint on Apple Silicon with
- * 8K–32K context windows (Q4_K_M weights + KV cache + activations
- * + OS headroom). They are intentionally conservative so picking
- * a tier never OOMs the machine.
+ * The default ceiling is "14b", NOT "32b". This means:
+ * - 14b is the primary model for routing + general inference
+ * - 4b is used as the grounding verifier (fast, small)
+ * - 32b is only loaded when caller explicitly passes ceiling="32b"
+ *   or when the task requires maximum quality (complex code gen, etc.)
  *
- *   tag               weights    need free   ctx
- *   prism-coder:32b   ~19 GB     ≥ 24 GB     32K
- *   prism-coder:14b   ~ 9 GB     ≥ 12 GB     32K
- *   prism-coder:4b    ~ 2.5 GB   ≥ 4 GB      8K
- *   prism-coder:8b    ~ 5 GB     ≥ 7 GB      32K
- *   prism-coder:1b7   ~ 2 GB     ≥ 3 GB      8K
+ * This saves 10GB+ RAM on most devices and keeps response times fast.
+ * The 14b achieves 100% on eval_300, same as 32b.
  *
- * Below 3 GB free → no local pick (caller must use cloud).
+ *   tag               weights    need free   ctx   role
+ *   prism-coder:32b   ~19 GB     ≥ 24 GB     32K   complex (on-demand)
+ *   prism-coder:14b   ~ 9 GB     ≥ 12 GB     32K   default router
+ *   prism-coder:8b    ~ 5 GB     ≥ 7 GB      32K   fallback
+ *   prism-coder:4b    ~ 2.5 GB   ≥ 4 GB      8K    verifier + mobile
+ *   prism-coder:1b7   ~ 2 GB     ≥ 3 GB      8K    watch + ultra-low RAM
  *
- * Note: thresholds use BINARY GB (1024^3), matching what `os.freemem()`
- * reports on macOS/Linux.
+ * Below 3 GB free → no local pick (caller must use cloud).
  */
  const GB = 1024 ** 3;
  /**
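Because `GB` is defined as `1024 ** 3`, the "need free" column in the comment table is in binary gigabytes. The sketch below shows what that means for a fit check. The thresholds are copied from the table; `NEED_FREE_GB` and `fits` are hypothetical names, and the real `MODEL_TIERS` structure is not visible in this diff.

```typescript
const GB = 1024 ** 3; // binary gigabyte, as in the hunk above

// Thresholds taken from the comment table; the lookup-table layout is an assumption.
const NEED_FREE_GB: Record<string, number> = {
  "prism-coder:32b": 24,
  "prism-coder:14b": 12,
  "prism-coder:8b": 7,
  "prism-coder:4b": 4,
  "prism-coder:1b7": 3,
};

// Hypothetical helper: does the given tag fit in the reported free memory?
function fits(tag: string, freeBytes: number): boolean {
  const need = NEED_FREE_GB[tag];
  return need !== undefined && freeBytes >= need * GB;
}

console.log(12 * GB);                          // 12884901888
console.log(fits("prism-coder:14b", 12e9));    // false: 12 decimal GB is below 12 binary GB
console.log(fits("prism-coder:14b", 12 * GB)); // true
```

A byte count that looks like "12 GB" in decimal units falls about 0.88 GB short of the 14b threshold, which is worth keeping in mind when feeding the picker values from tools that report decimal units.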
@@ -44,21 +44,21 @@ export const MODEL_TIERS = [
  function tagMatches(installed, tierTag) {
      return installed === tierTag || installed.endsWith(`/${tierTag}`);
  }
+ /** Default ceiling: 14b. Pass ceiling="32b" explicitly for max quality. */
+ export const DEFAULT_CEILING = "14b";
  /**
- * Pick the largest viable tier for the given free RAM.
- * Returns null when no tier fits (caller should go cloud-only).
+ * Pick the best viable tier for the given free RAM.
+ * Default ceiling is 14b; use ceiling="32b" only for complex tasks.
  *
  * @param freeBytes Result of os.freemem() — binary bytes
- * @param ceiling Optional cap (e.g. "14b" to forbid 32B even if RAM allows)
- * @param available Optional whitelist: only consider tags in this set. Accepts
- *   bare (`prism-coder:32b`) or namespaced (`dcostenco/prism-coder:32b`).
+ * @param ceiling Cap tier. Default "14b". Pass "32b" for complex tasks.
+ * @param available Optional whitelist of installed Ollama tags.
  */
  export function pickLocalModel(freeBytes, ceiling, available) {
      if (!Number.isFinite(freeBytes) || freeBytes <= 0)
          return null;
-     const ceilingIdx = ceiling
-         ? MODEL_TIERS.findIndex(t => t.tag.endsWith(ceiling) || t.tag === ceiling)
-         : 0;
+     const effectiveCeiling = ceiling || DEFAULT_CEILING;
+     const ceilingIdx = MODEL_TIERS.findIndex(t => t.tag.endsWith(effectiveCeiling) || t.tag === effectiveCeiling);
      const startIdx = ceilingIdx >= 0 ? ceilingIdx : 0;
      for (let i = startIdx; i < MODEL_TIERS.length; i++) {
          const tier = MODEL_TIERS[i];
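A self-contained sketch of how the default ceiling interacts with the tier scan may help when reading this hunk. `MODEL_TIERS` is not shown in the diff, so the array below is reconstructed from the comment table and assumed to be ordered largest first; `pickLocalModelSketch` is a hypothetical name and omits the `available` whitelist. One deliberate difference from the hunk: the sketch matches the ceiling on the `:`-delimited suffix, because a plain `endsWith("4b")` test would also match `prism-coder:14b` if that tier comes first in the array.

```typescript
const GB = 1024 ** 3;

interface Tier {
  tag: string;
  needFreeBytes: number;
}

// Assumed contents and ordering, based on the comment table in this diff.
const MODEL_TIERS: Tier[] = [
  { tag: "prism-coder:32b", needFreeBytes: 24 * GB },
  { tag: "prism-coder:14b", needFreeBytes: 12 * GB },
  { tag: "prism-coder:8b", needFreeBytes: 7 * GB },
  { tag: "prism-coder:4b", needFreeBytes: 4 * GB },
  { tag: "prism-coder:1b7", needFreeBytes: 3 * GB },
];

const DEFAULT_CEILING = "14b";

function pickLocalModelSketch(freeBytes: number, ceiling?: string): string | null {
  if (!Number.isFinite(freeBytes) || freeBytes <= 0) return null;
  const effectiveCeiling = ceiling || DEFAULT_CEILING;
  // Match the full tag or the ":"-delimited suffix so "4b" cannot match "14b".
  const ceilingIdx = MODEL_TIERS.findIndex(
    (t) => t.tag === effectiveCeiling || t.tag.endsWith(`:${effectiveCeiling}`)
  );
  const startIdx = ceilingIdx >= 0 ? ceilingIdx : 0;
  // Scan from the ceiling downward and return the first tier that fits.
  for (const tier of MODEL_TIERS.slice(startIdx)) {
    if (freeBytes >= tier.needFreeBytes) return tier.tag;
  }
  return null; // caller must use cloud
}

console.log(pickLocalModelSketch(64 * GB));        // prism-coder:14b (default ceiling caps the pick)
console.log(pickLocalModelSketch(64 * GB, "32b")); // prism-coder:32b
console.log(pickLocalModelSketch(5 * GB));         // prism-coder:4b
console.log(pickLocalModelSketch(64 * GB, "4b"));  // prism-coder:4b
console.log(pickLocalModelSketch(2 * GB));         // null
```

With the default ceiling, a machine with plenty of free RAM still gets 14b unless the caller opts in to 32b, which is the behavior change this release describes.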
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "prism-mcp-server",
- "version": "15.7.0",
+ "version": "15.7.2",
  "mcpName": "io.github.dcostenco/prism-coder",
  "description": "Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 54 Agent Skills, Zero-Search HDC/HRR retrieval, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder:7b / 14b open-weights LLM fleet.",
  "module": "index.ts",