prism-mcp-server 15.7.0 → 15.7.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +58 -22
- package/dist/utils/groundingVerifier.js +4 -4
- package/dist/utils/modelPicker.js +23 -23
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -58,10 +58,11 @@ Free tier runs entirely on your machine — SQLite, local embedding model, no AP
|
|
|
58
58
|
|
|
59
59
|
Install in one command — no config, no keys, no vendor agreements:
|
|
60
60
|
```bash
|
|
61
|
-
ollama pull dcostenco/prism-coder:
|
|
62
|
-
ollama pull dcostenco/prism-coder:
|
|
63
|
-
ollama pull dcostenco/prism-coder:
|
|
64
|
-
ollama pull dcostenco/prism-coder:32b #
|
|
61
|
+
ollama pull dcostenco/prism-coder:14b # 9 GB · default router · Mac M2+ / iPad Pro
|
|
62
|
+
ollama pull dcostenco/prism-coder:4b # 2.5 GB · verifier · iPhone 15/16 Pro
|
|
63
|
+
ollama pull dcostenco/prism-coder:1b7 # 2.2 GB · ultra-low RAM / Apple Watch
|
|
64
|
+
ollama pull dcostenco/prism-coder:32b # 19 GB · complex tasks · Mac M2 Ultra+
|
|
65
|
+
ollama pull dcostenco/prism-coder:8b # 4.7 GB · balanced · iPhone/iPad 8GB
|
|
65
66
|
```
|
|
66
67
|
|
|
67
68
|
Prism MCP detects both the namespaced (`dcostenco/prism-coder:14b`) and bare (`prism-coder:14b`) Ollama tag forms automatically — nothing else to configure. If you want the bare tags as aliases for direct `ollama run prism-coder:14b` use, run:
|
|
@@ -73,33 +74,68 @@ prism register-models --dry-run # preview what would be aliased
|
|
|
73
74
|
|
|
74
75
|
### Cascade architecture
|
|
75
76
|
|
|
76
|
-
|
|
77
|
+
Three-tier local cascade with cloud fallback:
|
|
77
78
|
|
|
78
|
-
**Desktop / server cascade** (quality-first, used in Prism MCP + Synalux portal):
|
|
79
79
|
```
|
|
80
|
-
|
|
81
|
-
│
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
80
|
+
Query arrives
|
|
81
|
+
│
|
|
82
|
+
▼
|
|
83
|
+
prism-coder:14b ── routes (100% eval_300) ──▶ serve (~3s, 9GB, FREE)
|
|
84
|
+
│ │
|
|
85
|
+
│ knowledge_search (RAG context)
|
|
86
|
+
│ │
|
|
87
|
+
▼ ▼
|
|
88
|
+
prism-coder:4b ── verifies claims ──────────▶ grounded response
|
|
89
|
+
│ (2.5GB, <1s)
|
|
90
|
+
│
|
|
91
|
+
▼ (complex tasks only, explicit ceiling="32b")
|
|
92
|
+
prism-coder:32b ── deep reasoning ──────────▶ serve (~8s, 19GB, FREE)
|
|
93
|
+
│
|
|
94
|
+
▼ (cloud fallback when local insufficient)
|
|
95
|
+
Claude Sonnet 4 → Claude Opus 4.7 ─────────▶ serve (cloud, ~$0.01/req)
|
|
85
96
|
```
|
|
86
97
|
|
|
87
|
-
|
|
98
|
+
| Tier | Model | Role | RAM | Latency | Cost |
|
|
99
|
+
|------|-------|------|-----|---------|------|
|
|
100
|
+
| **Default** | prism-coder:14b | Router + general inference | 9 GB | ~3s | $0 |
|
|
101
|
+
| **Verifier** | prism-coder:4b | Grounding claims check | 2.5 GB | <1s | $0 |
|
|
102
|
+
| **Complex** | prism-coder:32b | Deep reasoning (on-demand) | 19 GB | ~8s | $0 |
|
|
103
|
+
| **Cloud** | Sonnet → Opus | Fallback for max quality | — | ~5-10s | ~$0.01 |
|
|
104
|
+
|
|
105
|
+
**Mobile / offline cascade** (Prism AAC iOS):
|
|
88
106
|
```
|
|
89
|
-
prism-coder:14b (
|
|
90
|
-
→
|
|
107
|
+
prism-coder:14b (iPad Pro 16GB) → prism-coder:4b (iPhone 8GB)
|
|
108
|
+
→ prism-coder:1.7b (any device, always fits)
|
|
91
109
|
```
|
|
92
110
|
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
111
|
+
### Knowledge ingestion — teach Prism your codebase
|
|
112
|
+
|
|
113
|
+
Your code knowledge lives in the knowledge graph, not in model weights. Routing stays at 100%.
|
|
114
|
+
|
|
115
|
+
```bash
|
|
116
|
+
bash scripts/knowledge-ingest/setup.sh # one-time setup
|
|
117
|
+
# Then every git commit auto-indexes changed files into the knowledge graph
|
|
100
118
|
```
|
|
101
119
|
|
|
102
|
-
|
|
120
|
+
Three entry points:
|
|
121
|
+
- **MCP tool**: `knowledge_ingest` — AI says "learn this code"
|
|
122
|
+
- **GitHub webhook**: `POST /api/github/webhook` — auto on push
|
|
123
|
+
- **REST API**: `POST /api/v1/prism/ingest` — open interface
|
|
124
|
+
|
|
125
|
+
See [KNOWLEDGE_INGESTION.md](docs/KNOWLEDGE_INGESTION.md) for full setup guide.
|
|
126
|
+
|
|
127
|
+
### Cost comparison
|
|
128
|
+
|
|
129
|
+
Benchmark: 19 queries (routing + code knowledge + clinical), May 2026:
|
|
130
|
+
|
|
131
|
+
| Architecture | Routing | Code Knowledge | Clinical | Annual Cost (1K/day) |
|
|
132
|
+
|---|---|---|---|---|
|
|
133
|
+
| **Prism cascade** (14b→RAG→Sonnet) | 100% (local) | RAG-powered | Sonnet | **~$330/yr** |
|
|
134
|
+
| Claude Opus for everything | ~30% (no tools) | Training data | Opus | ~$10,600/yr |
|
|
135
|
+
|
|
136
|
+
**84% cost savings.** Routing is free and 100% accurate. Cloud only for the 20% of queries that need deep reasoning.
|
|
137
|
+
|
|
138
|
+
The routing cascade validates each response against the known tool names and escalates on empty, truncated, or hallucinated tool calls.
|
|
103
139
|
|
|
104
140
|
**Routing accuracy** ([102-case Prism eval](tests/benchmarks/prism-routing-100/README.md), v36/v7 system prompt, 3-seed mean, May 2026):
|
|
105
141
|
|
|
@@ -9,9 +9,9 @@
|
|
|
9
9
|
* stateless MCP), pointed at free-form generation instead of tool-call
|
|
10
10
|
* responses.
|
|
11
11
|
*
|
|
12
|
-
* Cascade role: prism-coder:
|
|
13
|
-
*
|
|
14
|
-
*
|
|
12
|
+
* Cascade role: prism-coder:4b is the default verifier (fast, 2.5GB).
|
|
13
|
+
* 14b drafts; 4b verifies. Different model = Patronus rule satisfied.
|
|
14
|
+
* Falls back to 1b7 on devices with <4GB free RAM.
|
|
15
15
|
*
|
|
16
16
|
* Failure modes:
|
|
17
17
|
* - Verifier model unreachable / timeout → fail-closed refusal
|
|
@@ -93,7 +93,7 @@ function refusalText(action, failedClaim) {
|
|
|
93
93
|
}
|
|
94
94
|
}
|
|
95
95
|
export async function verifyGrounding(opts) {
|
|
96
|
-
const verifierModel = opts.verifierModel ?? "prism-coder:
|
|
96
|
+
const verifierModel = opts.verifierModel ?? "prism-coder:4b";
|
|
97
97
|
const timeoutMs = opts.timeoutMs ?? 2000;
|
|
98
98
|
const ollamaUrl = opts.ollamaUrl ?? PRISM_LOCAL_LLM_URL;
|
|
99
99
|
const fetchImpl = opts.fetchImpl ?? fetch;
|
|
@@ -1,25 +1,25 @@
|
|
|
1
1
|
/**
|
|
2
2
|
* RAM-Gated Local Model Picker
|
|
3
3
|
* ─────────────────────────────────────────────────────────────
|
|
4
|
-
*
|
|
5
|
-
* prism-coder tag whose Q4_K_M weights + KV-cache headroom fit.
|
|
4
|
+
* Cascade: 14b (default) → 4b (verifier) → 32b (complex only).
|
|
6
5
|
*
|
|
7
|
-
*
|
|
8
|
-
*
|
|
9
|
-
*
|
|
10
|
-
*
|
|
6
|
+
* The default ceiling is "14b" — NOT "32b". This means:
|
|
7
|
+
* - 14b is the primary model for routing + general inference
|
|
8
|
+
* - 4b is used as the grounding verifier (fast, small)
|
|
9
|
+
* - 32b is only loaded when caller explicitly passes ceiling="32b"
|
|
10
|
+
* or when the task requires maximum quality (complex code gen, etc.)
|
|
11
11
|
*
|
|
12
|
-
*
|
|
13
|
-
*
|
|
14
|
-
* prism-coder:14b ~ 9 GB ≥ 12 GB 32K
|
|
15
|
-
* prism-coder:4b ~ 2.5 GB ≥ 4 GB 8K
|
|
16
|
-
* prism-coder:8b ~ 5 GB ≥ 7 GB 32K
|
|
17
|
-
* prism-coder:1b7 ~ 2 GB ≥ 3 GB 8K
|
|
12
|
+
* This saves 10GB+ RAM on most devices and keeps response times fast.
|
|
13
|
+
* The 14b achieves 100% on eval_300 — same as 32b.
|
|
18
14
|
*
|
|
19
|
-
*
|
|
15
|
+
* tag weights need free ctx role
|
|
16
|
+
* prism-coder:32b ~19 GB ≥ 24 GB 32K complex (on-demand)
|
|
17
|
+
* prism-coder:14b ~ 9 GB ≥ 12 GB 32K default router
|
|
18
|
+
* prism-coder:8b ~ 5 GB ≥ 7 GB 32K fallback
|
|
19
|
+
* prism-coder:4b ~ 2.5 GB ≥ 4 GB 8K verifier + mobile
|
|
20
|
+
* prism-coder:1b7 ~ 2 GB ≥ 3 GB 8K watch + ultra-low RAM
|
|
20
21
|
*
|
|
21
|
-
*
|
|
22
|
-
* reports on macOS/Linux.
|
|
22
|
+
* Below 3 GB free → no local pick (caller must use cloud).
|
|
23
23
|
*/
|
|
24
24
|
const GB = 1024 ** 3;
|
|
25
25
|
/**
|
|
@@ -44,21 +44,21 @@ export const MODEL_TIERS = [
|
|
|
44
44
|
function tagMatches(installed, tierTag) {
|
|
45
45
|
return installed === tierTag || installed.endsWith(`/${tierTag}`);
|
|
46
46
|
}
|
|
47
|
+
/** Default ceiling: 14b. Pass ceiling="32b" explicitly for max quality. */
|
|
48
|
+
export const DEFAULT_CEILING = "14b";
|
|
47
49
|
/**
|
|
48
|
-
* Pick the
|
|
49
|
-
*
|
|
50
|
+
* Pick the best viable tier for the given free RAM.
|
|
51
|
+
* Default ceiling is 14b — use ceiling="32b" only for complex tasks.
|
|
50
52
|
*
|
|
51
53
|
* @param freeBytes Result of os.freemem() — binary bytes
|
|
52
|
-
* @param ceiling
|
|
53
|
-
* @param available Optional whitelist
|
|
54
|
-
* bare (`prism-coder:32b`) or namespaced (`dcostenco/prism-coder:32b`).
|
|
54
|
+
* @param ceiling Cap tier. Default "14b". Pass "32b" for complex tasks.
|
|
55
|
+
* @param available Optional whitelist of installed Ollama tags.
|
|
55
56
|
*/
|
|
56
57
|
export function pickLocalModel(freeBytes, ceiling, available) {
|
|
57
58
|
if (!Number.isFinite(freeBytes) || freeBytes <= 0)
|
|
58
59
|
return null;
|
|
59
|
-
const
|
|
60
|
-
|
|
61
|
-
: 0;
|
|
60
|
+
const effectiveCeiling = ceiling || DEFAULT_CEILING;
|
|
61
|
+
const ceilingIdx = MODEL_TIERS.findIndex(t => t.tag.endsWith(effectiveCeiling) || t.tag === effectiveCeiling);
|
|
62
62
|
const startIdx = ceilingIdx >= 0 ? ceilingIdx : 0;
|
|
63
63
|
for (let i = startIdx; i < MODEL_TIERS.length; i++) {
|
|
64
64
|
const tier = MODEL_TIERS[i];
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "prism-mcp-server",
|
|
3
|
-
"version": "15.7.
|
|
3
|
+
"version": "15.7.2",
|
|
4
4
|
"mcpName": "io.github.dcostenco/prism-coder",
|
|
5
5
|
"description": "Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 54 Agent Skills, Zero-Search HDC/HRR retrieval, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder:7b / 14b open-weights LLM fleet.",
|
|
6
6
|
"module": "index.ts",
|