prism-mcp-server 18.0.1 → 19.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +273 -773
- package/dist/dashboard/authUtils.js +10 -7
- package/dist/server.js +9 -0
- package/dist/tools/behavioralVerifierHandler.js +80 -0
- package/dist/tools/index.js +2 -0
- package/dist/tools/ingestHandler.js +5 -1
- package/dist/tools/ledgerHandlers.js +29 -32
- package/dist/tools/prismInferHandler.js +8 -8
- package/dist/tools/sessionMemoryDefinitions.js +40 -0
- package/dist/tools/skillRouting.js +31 -6
- package/dist/utils/entitlements.js +1 -1
- package/dist/utils/groundingVerifier.js +3 -3
- package/dist/utils/modelPicker.js +7 -8
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -1,280 +1,201 @@
|
|
|
1
|
-
#
|
|
1
|
+
# Prism Coder
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
**Persistent memory and reliable tool-routing for AI agents.** *(formerly Prism MCP)*
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
Prism Coder is a [Model Context Protocol](https://modelcontextprotocol.io) server that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions, with semantic search, cognitive routing, and a visual dashboard. It ships alongside the open-weight `prism-coder` model fleet (1.7B-32B) for fast, offline tool-routing when you don't want a cloud round-trip.
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
It runs **fully local and free** on SQLite + Ollama with no API keys. A paid subscription adds cloud sync, higher model tiers, and team features through the Synalux portal.
|
|
8
8
|
|
|
9
|
-
[](https://marketplace.visualstudio.com/items?itemName=synalux-ai.synalux)
|
|
11
|
-
[](https://synalux.ai/prism-mcp)
|
|
9
|
+
[](https://www.npmjs.com/package/prism-mcp-server)
|
|
12
10
|
[](https://github.com/modelcontextprotocol/servers)
|
|
13
|
-
[](https://smithery.ai/server/@dcostenco/prism-mcp)
|
|
14
11
|
[](LICENSE)
|
|
12
|
+
[](https://huggingface.co/dcostenco)
|
|
15
13
|
|
|
16
|
-
>
|
|
14
|
+
<p align="center">
|
|
15
|
+
<img src="docs/v11_hivemind_multi_agent_dashboard.jpg" alt="Prism Coder - Mind Palace Dashboard with Knowledge Graph and Multi-Agent Hivemind" width="700" />
|
|
16
|
+
</p>
|
|
17
17
|
|
|
18
|
-
|
|
18
|
+
> **Renamed in v14:** the project is now **Prism Coder** to cover both the memory server and the model fleet. The npm package stays `prism-mcp-server`, so existing install URLs and `mcp.json` entries keep working.
|
|
19
19
|
|
|
20
|
-
|
|
20
|
+
---
|
|
21
21
|
|
|
22
|
-
|
|
23
|
-
Every conversation feeds the Mind Palace. Next session, your AI agent loads the right context automatically, with no re-explaining.
|
|
22
|
+
## Quickstart
|
|
24
23
|
|
|
25
|
-
|
|
26
|
-
Ask "what did I decide about the auth flow last month?" and get the answer with citations. Vector search + keyword + graph traversal.
|
|
24
|
+
The free tier needs no account, no API key, and no cloud. Add the server to your MCP client:
|
|
27
25
|
|
|
28
|
-
|
|
29
|
-
|
|
26
|
+
```json
|
|
27
|
+
{
|
|
28
|
+
"mcpServers": {
|
|
29
|
+
"prism": {
|
|
30
|
+
"command": "npx",
|
|
31
|
+
"args": ["-y", "prism-mcp-server"]
|
|
32
|
+
}
|
|
33
|
+
}
|
|
34
|
+
}
|
|
35
|
+
```
|
|
30
36
|
|
|
31
|
-
|
|
32
|
-
Your AI agent can now detect when it has drifted from your original goals (mid-session, automatically) and self-correct before you notice the problem.
|
|
37
|
+
Open Claude Desktop or Cursor and your agent now has memory backed by a local SQLite database (`~/.prism-mcp/data.db`).
|
|
33
38
|
|
|
34
|
-
|
|
35
|
-
1. **`session_save_ledger`**: snapshot current state
|
|
36
|
-
2. **`session_detect_drift`**: HRR-powered semantic comparison of current work vs original goals, returns `on_track / minor_drift / major_drift` with domain-specific signals (BCBA/Coding/AAC)
|
|
37
|
-
3. **`session_compact_ledger`**: if drifted, compress and reload only what matters
|
|
39
|
+
**Optional: local model fleet** for offline tool-routing. Pull whichever fits your hardware:
|
|
38
40
|
|
|
39
|
-
|
|
41
|
+
```bash
|
|
42
|
+
ollama pull dcostenco/prism-coder:2b # 2.3 GB · iPhone / mobile first gate (Qwen3.5-4B Q3_K_M, 99.1%)
|
|
43
|
+
ollama pull dcostenco/prism-coder:4b # 3.4 GB · verifier + 8 GB+ devices (Qwen3.5-4B Q4_K_M, 100%)
|
|
44
|
+
ollama pull dcostenco/prism-coder:14b # 8.4 GB · Mac default router (100%)
|
|
45
|
+
ollama pull dcostenco/prism-coder:32b # 16 GB · Mac complex tasks (100%)
|
|
46
|
+
```
|
|
40
47
|
|
|
41
|
-
|
|
48
|
+
Prism detects both the namespaced (`dcostenco/prism-coder:14b`) and bare (`prism-coder:14b`) Ollama tags automatically.
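
The tag matching described above could be sketched as follows. This is a minimal illustration, not the package's implementation: `normalizeTag` and `findInstalledTag` are hypothetical names, and the only assumption taken from the README is that the namespace prefix is `dcostenco/`.

```typescript
// Hypothetical helpers; not taken from the prism-mcp-server source.
const NAMESPACE = "dcostenco/"; // prefix used in the `ollama pull` commands above

/** Strip the optional namespace so both tag spellings compare equal. */
function normalizeTag(tag: string): string {
  return tag.startsWith(NAMESPACE) ? tag.slice(NAMESPACE.length) : tag;
}

/** Return the installed tag that matches `wanted`, or undefined if none does. */
function findInstalledTag(installed: string[], wanted: string): string | undefined {
  const target = normalizeTag(wanted);
  return installed.find((tag) => normalizeTag(tag) === target);
}

// Example: one namespaced and one bare tag are installed locally.
const installedTags = ["dcostenco/prism-coder:14b", "prism-coder:4b"];
console.log(findInstalledTag(installedTags, "prism-coder:14b"));
console.log(findInstalledTag(installedTags, "dcostenco/prism-coder:4b"));
```

Comparing normalized tags rather than raw strings means a model pulled under either name is found without asking the user to re-pull it.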
|
|
42
49
|
|
|
43
|
-
|
|
50
|
+
---
|
|
44
51
|
|
|
45
|
-
|
|
46
|
-
Automatic Protected Health Information detection and redaction in the memory pipeline. Every `session_save_ledger` and `session_save_handoff` call passes through the PHI guard before storage.
|
|
52
|
+
## What it does
|
|
47
53
|
|
|
48
|
-
|
|
54
|
+
### Mind Palace: persistent memory that survives across sessions
|
|
49
55
|
|
|
50
|
-
|
|
56
|
+
Every conversation feeds a persistent store. The next session loads the right context automatically, with no re-explaining.
|
|
51
57
|
|
|
52
|
-
|
|
53
|
-
|
|
58
|
+
<p align="center">
|
|
59
|
+
<img src="docs/mind-palace-dashboard.png" alt="Mind Palace Dashboard - project state, neural graph, pending TODOs" width="700" />
|
|
60
|
+
</p>
|
|
54
61
|
|
|
55
|
-
|
|
56
|
-
`prism_infer` now enforces subscription-tier gates: model ceiling, max tokens, daily limits, and cloud fallback are all gated by your plan. Free users get local-only inference up to 4b; paid tiers unlock higher models, more tokens, and cloud fallback. Flat-rate seat caps via `max_seats` per plan.
|
|
62
|
+
The dashboard shows your current project state, pending TODOs, intent health, and a neural knowledge graph, all built automatically from your agent sessions.
|
|
57
63
|
|
|
58
|
-
###
|
|
59
|
-
Free tier runs entirely on your machine: SQLite, local embedding model, no API keys, no cloud. Paid tier adds cloud sync via Synalux portal.
|
|
64
|
+
### Knowledge Graph: semantic + keyword + graph search
|
|
60
65
|
|
|
61
|
-
|
|
66
|
+
Ask "what did I decide about the auth flow last month?" and get an answer with citations, combining vector similarity, full-text search, and graph traversal.
|
|
62
67
|
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
| API key required | Yes | **No** |
|
|
67
|
-
| Data sent externally | Every prompt | **Nothing** |
|
|
68
|
-
| Works offline | ❌ | ✅ |
|
|
69
|
-
| Cost at scale | $0.002–0.06/call | **$0** |
|
|
70
|
-
| HIPAA | Requires BAA | **On-prem = no BAA** |
|
|
68
|
+
<p align="center">
|
|
69
|
+
<img src="docs/knowledge-graph.jpg" alt="Knowledge Graph - 190 keywords, 47 edges, 12 projects visualized" width="500" />
|
|
70
|
+
</p>
|
|
71
71
|
|
|
72
|
-
|
|
73
|
-
```bash
|
|
74
|
-
ollama pull dcostenco/prism-coder:14b # 9 GB · default router · Mac M2+ / iPad Pro
|
|
75
|
-
ollama pull dcostenco/prism-coder:4b # 2.5 GB · verifier · iPhone 15/16 Pro
|
|
76
|
-
ollama pull dcostenco/prism-coder:1b7 # 2.2 GB · ultra-low RAM / Apple Watch
|
|
77
|
-
ollama pull dcostenco/prism-coder:32b # 19 GB · complex tasks · Mac M2 Ultra+
|
|
78
|
-
ollama pull dcostenco/prism-coder:8b # 4.7 GB · balanced · iPhone/iPad 8GB
|
|
79
|
-
```
|
|
72
|
+
### Session History: immutable audit trail
|
|
80
73
|
|
|
81
|
-
|
|
74
|
+
Every session is logged with files changed, decisions made, and TODOs. Search, filter, and replay any past session.
|
|
82
75
|
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
```
|
|
76
|
+
<p align="center">
|
|
77
|
+
<img src="docs/session-ledger.jpg" alt="Session Ledger - 93 sessions, 847 decisions logged across 12 projects" width="700" />
|
|
78
|
+
</p>
|
|
87
79
|
|
|
88
|
-
###
|
|
80
|
+
### Session Drift Detection
|
|
89
81
|
|
|
90
|
-
|
|
82
|
+
Long agent sessions can wander from their original goal. `session_detect_drift` compares current work against the stated goal and returns `on_track / minor_drift / major_drift` so the agent can self-correct.
|
|
91
83
|
|
|
92
|
-
|
|
93
|
-
Query arrives
|
|
94
|
-
│
|
|
95
|
-
▼
|
|
96
|
-
prism-coder:14b ── routes (100% eval_300) ──▶ serve (~3s, 9GB, FREE)
|
|
97
|
-
│ │
|
|
98
|
-
│ knowledge_search (RAG context)
|
|
99
|
-
│ │
|
|
100
|
-
▼ ▼
|
|
101
|
-
prism-coder:4b ── verifies claims ─────────▶ grounded response
|
|
102
|
-
│ (2.5GB, <1s)
|
|
103
|
-
│
|
|
104
|
-
▼ (complex tasks only, explicit ceiling="32b")
|
|
105
|
-
prism-coder:32b ── deep reasoning ─────────▶ serve (~8s, 19GB, FREE)
|
|
106
|
-
│
|
|
107
|
-
▼ (cloud fallback when local insufficient)
|
|
108
|
-
Claude Sonnet 4 ────────────────────────────▶ serve (cloud, ~$0.01/req)
|
|
109
|
-
```
|
|
84
|
+
### Behavioral Verification: catch bad edits before they happen
|
|
110
85
|
|
|
111
|
-
|
|
112
|
-
|------|-------|------|-----|---------|------|
|
|
113
|
-
| **Default** | prism-coder:14b | Router + general inference | 9 GB | ~3s | $0 |
|
|
114
|
-
| **Verifier** | prism-coder:4b | Grounding claims check | 2.5 GB | <1s | $0 |
|
|
115
|
-
| **Complex** | prism-coder:32b | Deep reasoning (on-demand) | 19 GB | ~8s | $0 |
|
|
116
|
-
| **Cloud** | Claude Sonnet 4 | Fallback for max quality | β | ~5-10s | ~$0.01 |
|
|
86
|
+
AI agents pattern-match on checklists instead of thinking through user impact. The behavioral verifier challenges the agent with a domain-specific scenario **before** editing code, like an ABA antecedent intervention.
|
|
117
87
|
|
|
118
|
-
**Mobile / offline cascade** (Prism AAC iOS):
|
|
119
|
-
```
|
|
120
|
-
prism-coder:14b (iPad Pro 16GB) → prism-coder:4b (iPhone 8GB)
|
|
121
|
-
→ prism-coder:1.7b (any device, always fits)
|
|
122
88
|
```
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
```bash
|
|
129
|
-
bash scripts/knowledge-ingest/setup.sh # one-time setup
|
|
130
|
-
# Then every git commit auto-indexes changed files into the knowledge graph
|
|
89
|
+
Agent: "I'll revert the KDS bump logic"
|
|
90
|
+
Prism: "⚠️ Kitchen worker scenario: A cook has a 3-item ticket.
|
|
91
|
+
One item is voided. What should the cook see on the KDS?"
|
|
92
|
+
Agent: "The ticket should stay visible with the remaining 2 items."
|
|
93
|
+
Prism: "Correct - your revert would remove the ticket entirely. Don't revert."
|
|
131
94
|
```
|
|
132
95
|
|
|
133
|
-
|
|
134
|
-
- **MCP tool**: `knowledge_ingest` - AI says "learn this code"
|
|
135
|
-
- **GitHub webhook**: `POST /api/github/webhook` - auto on push
|
|
136
|
-
- **REST API**: `POST /api/v1/prism/ingest` - open interface
|
|
96
|
+
**17 built-in domains**: KDS, billing, auth, voice ordering, webhooks, migrations, EU routing, clinical (HIPAA/FHIR), HR, accounting, chat, STT, privacy, loyalty, discounts, drawer operations, order lifecycle. Custom domains can be added per workspace.
|
|
137
97
|
|
|
138
|
-
|
|
98
|
+
**How it works**: The `verify_behavior` tool calls the Synalux portal API, which matches the file path against domain scenarios stored in the database. The agent must answer the scenario concretely before editing. No local hooks are required; it works in Claude, Cursor, or any MCP client.
|
|
139
99
|
|
|
140
|
-
|
|
100
|
+
**Why it matters**: In a single audit session, 47 bugs were found across 7 days of AI-generated code. Every bug was introduced by an agent that applied a "correct" pattern without simulating the end-user journey. The behavioral verifier would have caught all of them.
|
|
141
101
|
|
|
142
|
-
|
|
102
|
+
| Tier | Coverage |
|
|
103
|
+
|------|----------|
|
|
104
|
+
| Free | Skill-based advisory (agent prompted to think before editing) |
|
|
105
|
+
| Standard+ | `verify_behavior` tool with 17 domain scenarios via API |
|
|
106
|
+
| Enterprise | Custom per-workspace scenarios |
|
|
143
107
|
|
|
144
|
-
|
|
145
|
-
|---|---|---|
|
|
146
|
-
| **Overall accuracy** | **96% (24/25)** | 88% (22/25) |
|
|
147
|
-
| **Tool routing** (15 tests) | **93% (14/15)** | 80% (12/15) |
|
|
148
|
-
| **Abstention** (10 tests) | **100% (10/10)** | **100% (10/10)** |
|
|
149
|
-
| **Avg latency** | **0.8s** | 5.5s |
|
|
150
|
-
| **Cost per query** | **$0** | ~$0.017 |
|
|
151
|
-
| **Annual @ 1K/day** | **$0** | ~$6,100 |
|
|
108
|
+
### Time Travel
|
|
152
109
|
|
|
153
|
-
|
|
110
|
+
Roll back to any previous session state. Compare diffs between versions. Restore a known-good state with one click.
|
|
154
111
|
|
|
155
|
-
|
|
112
|
+
<p align="center">
|
|
113
|
+
<img src="docs/time-travel-timeline.jpg" alt="Time Travel - version timeline with diff view and one-click restore" width="500" />
|
|
114
|
+
</p>
|
|
156
115
|
|
|
157
|
-
|
|
158
|
-
|---|---|---|---|
|
|
159
|
-
| **prism-coder:32b** | **300/300 (100%)** | 19 GB | ~1.4s |
|
|
160
|
-
| **prism-coder:14b** | **299/300 (99.7%)** | 9 GB | ~0.8s |
|
|
161
|
-
| **prism-coder:4b** | **300/300 (100%)** | 2.5 GB | ~0.5s |
|
|
162
|
-
| **prism-coder:1.7b** | **300/300 (100%)** | 2.2 GB | ~1.6s |
|
|
116
|
+
### Cognitive Routing
|
|
163
117
|
|
|
164
|
-
|
|
118
|
+
Episodic (what happened), semantic (what's true), and procedural (how to do X) memories live in separate stores; a router decides where to write and where to read.
|
|
165
119
|
|
|
166
|
-
|
|
120
|
+
### Multi-Agent Hivemind
|
|
167
121
|
|
|
168
|
-
|
|
122
|
+
Coordinate multiple AI agents working on the same project. Each agent has its own session, but they share memory through the knowledge graph. The Hivemind Radar shows real-time agent status, tasks, and activity.
|
|
169
123
|
|
|
170
|
-
|
|
124
|
+
<p align="center">
|
|
125
|
+
<img src="docs/hivemind-radar.jpg" alt="Hivemind Radar - 5 agents with real-time status, tasks, and activity feed" width="500" />
|
|
126
|
+
</p>
|
|
171
127
|
|
|
172
|
-
|
|
128
|
+
### Neural Search
|
|
173
129
|
|
|
174
|
-
|
|
130
|
+
Search across all memories with highlighted results, knowledge graph editing, and memory density metrics.
|
|
175
131
|
|
|
176
|
-
|
|
177
|
-
|
|
178
|
-
|
|
179
|
-
| **0a - No evidence** | Assertive draft + zero evidence snippets | Refuse (fail-closed) |
|
|
180
|
-
| **2 - NLI** | Assertive draft + evidence provided | Verify each claim against evidence |
|
|
132
|
+
<p align="center">
|
|
133
|
+
<img src="docs/v6_cognitive_load_dashboard.jpg" alt="Neural Search with Knowledge Graph Editor and Memory Density" width="500" />
|
|
134
|
+
</p>
|
|
181
135
|
|
|
182
|
-
|
|
183
|
-
- `ENTAILED`: claim matches evidence (including arithmetic identity, e.g. "3" and "three")
|
|
184
|
-
- `CONTRADICTED`: evidence states a different value for the same fact → **refuse**
|
|
185
|
-
- `NEUTRAL`: claim not covered by evidence → **refuse** (fail-closed default)
|
|
136
|
+
---
|
|
186
137
|
|
|
187
|
-
|
|
138
|
+
## Local-first and privacy
|
|
188
139
|
|
|
189
|
-
|
|
190
|
-
```json
|
|
191
|
-
{
|
|
192
|
-
"prompt": "What was the patient's last A1C?",
|
|
193
|
-
"evidence": [
|
|
194
|
-
{ "source": "lab_2026-05-01", "content": "HbA1c: 6.8% (ref <7.0)" }
|
|
195
|
-
]
|
|
196
|
-
}
|
|
197
|
-
```
|
|
198
|
-
|
|
199
|
-
**Structured output:**
|
|
200
|
-
```json
|
|
201
|
-
{
|
|
202
|
-
"output": "The patient's last A1C was 6.8%.",
|
|
203
|
-
"verification": {
|
|
204
|
-
"action": "served",
|
|
205
|
-
"claims": [{ "text": "A1C was 6.8%", "verdict": "ENTAILED" }],
|
|
206
|
-
"verifierChain": [{ "model": "prism-coder:4b", "verdict": "ENTAILED", "latencyMs": 340 }]
|
|
207
|
-
}
|
|
208
|
-
}
|
|
209
|
-
```
|
|
140
|
+
The free tier runs entirely on your machine. Paid tiers add cloud sync through the Synalux portal, which is what enables cross-device memory and team sharing.
|
|
210
141
|
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
|
|
218
|
-
|
|
219
|
-
}
|
|
220
|
-
```
|
|
142
|
+
| | Local tier (free) | Cloud tier (paid) |
|
|
143
|
+
|---|---|---|
|
|
144
|
+
| Memory storage | Local SQLite | Synalux portal (Supabase-backed) |
|
|
145
|
+
| Inference | Local Ollama models | Local models + cloud fallback |
|
|
146
|
+
| API keys required | None | Synalux subscription key |
|
|
147
|
+
| Web search / scrape | Not included | Routed through the Synalux portal (provider keys stay server-side). Search tools appear as `brave_web_search` in the MCP surface but are proxied through the portal for auth and billing. |
|
|
148
|
+
| What leaves your machine | Nothing | Memory text + file paths + search queries, sent to the portal over TLS (PHI-redacted before transit) |
|
|
149
|
+
| Works offline | Yes | Local features yes; sync/cloud no |
|
|
221
150
|
|
|
222
|
-
|
|
151
|
+
**Handling sensitive data.** Memory text fields (summaries, decisions, handoff context, file paths) pass through a PHI-redaction step (SSN/DOB/MRN/phone/email and common clinical identifiers) before any cloud write. Knowledge ingestion chunks are also redacted before being sent to the LLM for Q&A synthesis. For regulated workloads, run the **local tier** to keep data on-device, or use an **Enterprise** plan, which is the tier that includes a HIPAA Business Associate Agreement. Prism does not claim blanket HIPAA compliance on the free or individual tiers; the on-device path is the air-gapped option.
|
|
223
152
|
|
|
224
|
-
|
|
225
|
-
Detects when long AI agent sessions drift from their original goal, using Holographic Reduced Representations for temporal trajectory encoding and anomaly detection.
|
|
153
|
+
---
|
|
226
154
|
|
|
227
|
-
|
|
228
|
-
| Domain | Signals | Safety |
|
|
229
|
-
|---|---|---|
|
|
230
|
-
| **BCBA/Clinical** | Client specificity decay, function-intervention alignment (4 functions), contraindication detection (epilepsy/pica/dysphagia/diabetes) | PHI-safe, deterministic |
|
|
231
|
-
| **Coding** | File scope entropy, summary vagueness, test coverage ratio, trajectory HRR divergence | Adaptive threshold for refactors |
|
|
232
|
-
| **AAC** | Prediction accuracy, vocabulary stagnation, topic divergence | Emergency phrases always ≥ 0.95 |
|
|
155
|
+
## Models
|
|
233
156
|
|
|
234
|
-
|
|
157
|
+
The `prism-coder` fleet uses Qwen3.5 for MCP tool-routing. The 14B and 32B are fine-tuned from Qwen3; the 2B and 4B slots use stock Qwen3.5-4B with prompt engineering at different quantization levels (99.1% and 100% routing accuracy respectively, without fine-tuning). They are **not** general-purpose chat models: they route reliably and run offline, while Claude and other frontier models remain better at reasoning, coding, and open-domain work. The intended pattern is local routing with an optional cloud fallback for hard cases.
|
|
235
158
|
|
|
236
|
-
|
|
237
|
-
|
|
159
|
+
| Model | Ollama tag | Size | BFCL Accuracy | Role | Tier |
|
|
160
|
+
|---|---|---|---|---|---|
|
|
161
|
+
| Qwen3.5-4B Q3_K_M | `prism-coder:2b` | 2.3 GB | 99.1% × 3 seeds | iPhone / mobile first gate | Free |
|
|
162
|
+
| Qwen3.5-4B Q4_K_M | `prism-coder:4b` | 3.4 GB | 100% × 3 seeds | Verifier + 8 GB+ devices | Free |
|
|
163
|
+
| prism-coder:14b | `prism-coder:14b` | 8.4 GB | 100% × 3 seeds | Default router | Standard+ |
|
|
164
|
+
| prism-coder:32b | `prism-coder:32b` | 16 GB | 100% × 3 seeds | Complex tasks | Advanced+ |
|
|
238
165
|
|
|
239
|
-
|
|
240
|
-
- **GloVe embeddings** (offline, 50K words) β 87% Top-1 accuracy, stable at 200+ concepts
|
|
241
|
-
- **API embeddings** (Gemini/Voyage) β 90%+ accuracy when online
|
|
242
|
-
- **NeurIPS 2021 projection** β unit-modulus normalization for numerical stability
|
|
166
|
+
Weights: [huggingface.co/dcostenco](https://huggingface.co/dcostenco) (public GGUF). Latency depends on model size and hardware; see [Benchmarks](#benchmarks) to measure it on your own machine rather than trusting a printed number.
|
|
243
167
|
|
|
244
|
-
|
|
168
|
+
### Cascade
|
|
245
169
|
|
|
246
|
-
|
|
247
|
-
|
|
248
|
-
|
|
249
|
-
|
|
250
|
-
|
|
251
|
-
|
|
252
|
-
|
|
170
|
+
```
|
|
171
|
+
query → prism-coder:14b (local router, Mac default)
|
|
172
|
+
→ qwen3.5:4b (grounding verifier)
|
|
173
|
+
→ prism-coder:2b (iPhone / mobile, auto-selected by RAM)
|
|
174
|
+
→ prism-coder:32b (complex tasks, on demand)
|
|
175
|
+
→ cloud fallback (paid tiers, for max quality)
|
|
176
|
+
```
|
|
253
177
|
|
|
254
|
-
|
|
178
|
+
---
|
|
255
179
|
|
|
256
|
-
|
|
180
|
+
## Benchmarks
|
|
257
181
|
|
|
258
|
-
|
|
259
|
-
|----------|---------------|------------|-----------|----------|
|
|
260
|
-
| Core AAC phrases | 36.7% | 46.7% | **+27.3%** | +6.0% |
|
|
261
|
-
| Personal vocabulary | 70.4% | 81.5% | **+15.8%** | +9.2% |
|
|
262
|
-
| Mixed (all phrases) | 47.2% | 56.9% | **+20.6%** | +5.7% |
|
|
263
|
-
| Cross-session recall | 80.0% | 80.0% | +0.0% | +0.0% |
|
|
182
|
+
**Reproduce every number yourself.** All evals are open-source and self-contained:
|
|
264
183
|
|
|
265
|
-
|
|
184
|
+
```bash
|
|
185
|
+
git clone https://github.com/dcostenco/prism-coder && cd prism-coder
|
|
186
|
+
pip install anthropic requests
|
|
187
|
+
python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b 14b 32b
|
|
188
|
+
```
|
|
266
189
|
|
|
267
|
-
**
|
|
190
|
+
**Routing eval (115 cases, 12 categories, 3-seed mean).** On this narrow tool-routing task all fleet models achieve near-perfect accuracy. Be honest with yourself about what that means: the eval is **near-saturated** for this taxonomy. It measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is **offline routing reliability at zero cost**, not that a 2.3 GB model rivals a frontier model in general.
|
|
268
191
|
|
|
269
|
-
|
|
|
270
|
-
|
|
271
|
-
|
|
|
272
|
-
|
|
|
273
|
-
|
|
|
274
|
-
| Hermes (NousResearch) | HRR + SQLite | Yes | Free | ~5ms |
|
|
192
|
+
| Model | Routing accuracy | Notes |
|
|
193
|
+
|---|---|---|
|
|
194
|
+
| prism-coder:2b (Q3_K_M) | 99.1% × 3 seeds | 1 failure: regex→knowledge_search |
|
|
195
|
+
| prism-coder:4b / 14b / 32b | 100% × 3 seeds | Perfect on all 115 cases |
|
|
196
|
+
| Claude (frontier, same eval) | ~98% | Stronger everywhere outside this narrow task |
|
|
275
197
|
|
|
276
|
-
|
|
277
|
-
Multiple AI agents share the same Mind Palace. Each agent has a role (dev / qa / pm / etc.) and sees scoped context. Heartbeat + roster for coordination.
|
|
198
|
+
**Memory uplift (LoCoMo-Plus, self-published).** A separate long-context dialogue benchmark ([dcostenco/Locomo-Plus](https://github.com/dcostenco/Locomo-Plus)) measures how much structured memory helps a base model retain multi-day context. Results show large gains when a model is paired with Prism memory versus running raw. Note this benchmark is authored, run, and LLM-judged by this project; treat it as a reproducible demonstration, not an independent third-party result, and run it yourself with the commands in that repo.
|
|
278
199
|
|
|
279
200
|
---
|
|
280
201
|
|
|
@@ -282,658 +203,237 @@ Multiple AI agents share the same Mind Palace. Each agent has a role (dev / qa /
|
|
|
282
203
|
|
|
283
204
|
### vs AI coding assistants
|
|
284
205
|
|
|
285
|
-
|
|
286
|
-
|
|
287
|
-
|
|
|
288
|
-
| Works offline (local-only mode) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
289
|
-
| Open-weight models (HuggingFace) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
290
|
-
| Data stays on machine (local tier) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
291
|
-
| Persistent cross-session memory | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
292
|
-
| Cognitive routing (episodic/semantic) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
293
|
-
| Session drift detection (HRR) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
294
|
-
| L3 grounding verifier | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
295
|
-
| Multi-agent hivemind | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
296
|
-
| MCP server (tools + memory for agents) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
297
|
-
| Cloud fallback (14b → 32b → Sonnet) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
298
|
-
| Web IDE | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ |
|
|
299
|
-
| VS Code extension | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
|
|
300
|
-
| HIPAA / air-gapped ready | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
301
|
-
| Flat-rate pricing (not per-seat) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
302
|
-
|
|
303
|
-
### vs local AI tools
|
|
304
|
-
|
|
305
|
-
| Feature | Prism Coder | Ollama | LM Studio | Jan.ai | Mem0 | Zep |
|
|
206
|
+
These tables are the maintainer's assessment as of June 2026. Verify claims that matter to you β products change fast.
|
|
207
|
+
|
|
208
|
+
| Feature | Prism Coder | GitHub Copilot | Cursor | Windsurf | Amazon Q | Devin |
|
|
306
209
|
|---|:---:|:---:|:---:|:---:|:---:|:---:|
|
|
307
|
-
| Local inference (
|
|
308
|
-
|
|
|
309
|
-
| Persistent cross-session memory |
|
|
310
|
-
|
|
|
311
|
-
|
|
|
312
|
-
|
|
|
313
|
-
|
|
|
314
|
-
|
|
|
315
|
-
|
|
|
316
|
-
|
|
|
210
|
+
| Local inference (open-weight) | Yes | No | No | No | No | No |
|
|
211
|
+
| Works fully offline | Yes (free tier) | No | No | No | No | No |
|
|
212
|
+
| Persistent cross-session memory | Yes | Yes | No | No | No | No |
|
|
213
|
+
| Session drift detection | Yes | No | No | No | No | No |
|
|
214
|
+
| L3 grounding verifier | Yes | No | No | No | No | No |
|
|
215
|
+
| Behavioral verification (pre-edit) | Yes | No | No | No | No | No |
|
|
216
|
+
| MCP server (tools + memory) | Yes | No | No | No | No | No |
|
|
217
|
+
| Web IDE | Yes | Yes | No | No | Yes | Yes |
|
|
218
|
+
| VS Code extension | Yes | Yes | N/A (is VS Code) | N/A | Yes | No |
|
|
219
|
+
| Flat-rate team pricing | Yes | No (per-seat) | No (per-seat) | No | No | No |
|
|
220
|
+
| HIPAA BAA available | Yes (Enterprise) | No | No | No | No | No |
|
|
221
|
+
|
|
222
|
+
### vs local AI / memory tools
|
|
223
|
+
|
|
224
|
+
| Feature | Prism Coder | Ollama | LM Studio | Mem0 | Zep |
|
|
225
|
+
|---|:---:|:---:|:---:|:---:|:---:|
|
|
226
|
+
| Local inference cascade | Yes | Yes | Yes | No | No |
|
|
227
|
+
| Cloud fallback | Yes | No | No | No | No |
|
|
228
|
+
| Persistent cross-session memory | Yes | No | No | Yes | Yes |
|
|
229
|
+
| Knowledge ingestion (MCP + webhook) | Yes | No | No | No | No |
|
|
230
|
+
| Cognitive routing (3-store) | Yes | No | No | No | No |
|
|
231
|
+
| Session drift detection | Yes | No | No | No | No |
|
|
232
|
+
| Native MCP server | Yes | No | No | No | No |
|
|
233
|
+
| Web IDE + VS Code extension | Yes | No | No | No | No |
|
|
317
234
|
|
|
318
235
|
### Pricing: flat-rate, not per-seat
|
|
319
236
|
|
|
320
|
-
| | **Prism Coder** | GitHub Copilot | Cursor |
|
|
321
|
-
|
|
322
|
-
| **Individual** | **$19/mo** | $10/mo | $20/mo | $
|
|
323
|
-
| **Team (5 devs)** | **$49/mo flat** | $95/mo | $200/mo | $
|
|
324
|
-
| **Enterprise (25 devs)** | **$99/mo flat** | $195/mo | $1,000/mo | Custom |
|
|
325
|
-
| **Cost per dev (team)** | **$9.80** | $19 | $40 | $40 | $19 | $59 |
|
|
326
|
-
| **Annual savings (5 devs)** | - | **$552** | **$1,812** | **$1,812** | **$552** | **$2,952** |
|
|
237
|
+
| | **Prism Coder** | GitHub Copilot | Cursor | Amazon Q |
|
|
238
|
+
|---|:---:|:---:|:---:|:---:|
|
|
239
|
+
| **Individual** | **$19/mo** | $10/mo | $20/mo | $19/mo |
|
|
240
|
+
| **Team (5 devs)** | **$49/mo flat** | $95/mo | $200/mo | $95/mo |
|
|
241
|
+
| **Enterprise (25 devs)** | **$99/mo flat** | $195/mo | $1,000/mo | Custom |
|
|
327
242
|
|
|
328
243
|
---
|
|
329
244
|
|
|
330
245
|
## Plans
|
|
331
246
|
|
|
332
|
-
|
|
247
|
+
All on-device models are free to run locally via Ollama on every tier. A subscription gates **cloud** features, higher model ceilings, and increased limits. Local model ceilings are advisory: on-device models run on your Ollama regardless of plan; the ceiling gates cloud inference and `prism_infer` routing.
|
|
248
|
+
|
|
249
|
+
| | **Free** | **Standard** $19/mo | **Advanced** $49/mo | **Enterprise** $99/mo |
|
|
333
250
|
|---|---|---|---|---|
|
|
334
|
-
|
|
|
335
|
-
|
|
|
336
|
-
|
|
|
337
|
-
|
|
|
338
|
-
|
|
|
339
|
-
|
|
|
340
|
-
|
|
|
341
|
-
|
|
|
342
|
-
|
|
|
343
|
-
|
|
|
344
|
-
|
|
345
|
-
|
|
251
|
+
| Seats | 1 | 1 | up to 5 | up to 25 |
|
|
252
|
+
| Local model ceiling | up to 4b | up to 14b | up to 32b | up to 32b |
|
|
253
|
+
| Daily cloud inference | -- | 200 | 2,000 | 100,000 |
|
|
254
|
+
| Cloud Coder (Web IDE) | -- | 100/day | 1,000/day | 100,000/day |
|
|
255
|
+
| Cloud search | -- | 50/day | 500/day | 100,000/day |
|
|
256
|
+
| Max output tokens | 512 | 1,024 | 2,048 | 4,096 |
|
|
257
|
+
| Cloud fallback | -- | Claude Sonnet 4 | Claude Sonnet 4 | Priority + Sonnet 4 |
|
|
258
|
+
| Grounding verifier | -- | Yes | Yes | Yes |
|
|
259
|
+
| Memory sync (cloud) | -- | Yes | Yes | Yes |
|
|
260
|
+
| Knowledge / session memory | limited | unlimited | unlimited | unlimited |
|
|
261
|
+
| Analytics dashboard | -- | Yes | Yes | Yes |
|
|
262
|
+
| HIPAA BAA | -- | -- | -- | Yes |
|
|
263
|
+
|
|
264
|
+
14-day free trial on paid plans. [Pricing](https://synalux.ai/pricing) | 25+ seats: [contact sales](https://synalux.ai/support)
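
As a rough illustration of the "Local model ceiling" row, a plan-to-ceiling lookup might look like the sketch below. The type names, the `CEILING` map, and `clampToCeiling` are hypothetical and not the package's API. The README does not say whether `prism_infer` clamps an over-ceiling request or rejects it, so clamping is an assumption here.

```typescript
// Illustrative only: names and behavior are assumptions, not the package's API.
type Plan = "free" | "standard" | "advanced" | "enterprise";
type ModelSize = "2b" | "4b" | "14b" | "32b";

// Fleet sizes in ascending order, as listed in the Models table.
const SIZES: ModelSize[] = ["2b", "4b", "14b", "32b"];

// Values copied from the "Local model ceiling" row of the Plans table.
const CEILING: Record<Plan, ModelSize> = {
  free: "4b",
  standard: "14b",
  advanced: "32b",
  enterprise: "32b",
};

/** Clamp a requested model size to the plan's ceiling (assumed behavior). */
function clampToCeiling(plan: Plan, requested: ModelSize): ModelSize {
  const maxIndex = SIZES.indexOf(CEILING[plan]);
  const wantIndex = SIZES.indexOf(requested);
  return SIZES[Math.min(wantIndex, maxIndex)];
}

console.log(clampToCeiling("free", "32b"));     // a free plan is capped at its ceiling
console.log(clampToCeiling("standard", "4b"));  // requests under the ceiling pass through
```

Keeping the ceilings in one table-shaped map makes it easy to compare against the published plan matrix when limits change.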
|
|
346
265
|
|
|
347
266
|
---
|
|
348
267
|
|
|
349
|
-
##
|
|
350
|
-
|
|
351
|
-
```bash
|
|
352
|
-
# Install globally
|
|
353
|
-
npm install -g prism-mcp-server
|
|
354
|
-
|
|
355
|
-
# Or use npx (no install)
|
|
356
|
-
npx prism-mcp-server
|
|
357
|
-
```
|
|
358
|
-
|
|
359
|
-
Add to Claude Desktop / Cursor config:
|
|
360
|
-
|
|
361
|
-
```json
|
|
362
|
-
{
|
|
363
|
-
"mcpServers": {
|
|
364
|
-
"prism": {
|
|
365
|
-
"command": "npx",
|
|
366
|
-
"args": ["-y", "prism-mcp-server"]
|
|
367
|
-
}
|
|
368
|
-
}
|
|
369
|
-
}
|
|
370
|
-
```
|
|
371
|
-
|
|
372
|
-
That's it. Open Claude / Cursor and your AI now has memory.
|
|
268
|
+
## How agents use it
|
|
373
269
|
|
|
374
|
-
|
|
375
|
-
|
|
376
|
-
### Monitoring & Observability *(new in v16.2)*
|
|
377
|
-
|
|
378
|
-
Built-in Datadog integration: every tool call is logged with tool name, project, and latency. Zero config for self-hosted users (logs to stdout); set `DD_API_KEY` to send structured logs to Datadog HTTP intake.
|
|
379
|
-
|
|
380
|
-
```bash
|
|
381
|
-
# Enable Datadog logging (optional)
|
|
382
|
-
export DD_API_KEY=your_datadog_api_key
|
|
383
|
-
|
|
384
|
-
# Enable OpenTelemetry tracing (optional; works with Jaeger, Zipkin, Datadog, Grafana Tempo)
|
|
385
|
-
export PRISM_OTEL_ENABLED=true
|
|
386
|
-
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
|
|
387
|
-
```
|
|
388
|
-
|
|
389
|
-
**What's tracked automatically:**
|
|
390
|
-
- `mcp.tool.success`: tool name, project, duration (ms) on every successful call
|
|
391
|
-
- `mcp.tool.error`: tool name, error message, stack trace on failures
|
|
392
|
-
- OpenTelemetry spans with `tool.name` and `project` attributes on all 50 tool handlers
|
|
393
|
-
|
|
394
|
-
| Dashboard | What it tracks |
|
|
395
|
-
|-----------|---------------|
|
|
396
|
-
| [Prism MCP - Server Analytics](https://app.datadoghq.com/dashboard/tdm-92f-myh/prism-mcp--server-analytics) | Tool call volume, latency per tool (avg/p95), errors by tool, project activity, knowledge search/ingest, session memory ops |
|
|
397
|
-
|
|
398
|
-
### In-app analytics for paid users *(new in v16.2)*
|
|
399
|
-
|
|
400
|
-
Paid Synalux subscribers get a built-in analytics dashboard at `/app/memory-analytics`:
|
|
401
|
-
|
|
402
|
-
```
|
|
403
|
-
┌─────────────────────────────────────────────────────────┐
|
|
404
|
-
│ Analytics [standard] plan │
|
|
405
|
-
├─────────────────────────────────────────────────────────┤
|
|
406
|
-
│ Sessions: 147 Handoffs: 23 Knowledge: 89 │
|
|
407
|
-
│ Projects: 5 Memory: 42 KB │
|
|
408
|
-
├─────────────────────────────────────────────────────────┤
|
|
409
|
-
│ Today's Usage 47/200 12/50 85/200 │
|
|
410
|
-
├─────────────────────────────────────────────────────────┤
|
|
411
|
-
│ 30-Day Trend [sparkline] │
|
|
412
|
-
├─────────────────────────────────────────────────────────┤
|
|
413
|
-
β Top Projects prism-mcp (45) Β· portal (32) Β· ... β
|
|
414
|
-
β Compaction 3 entries > 5KB β run compact_ledger β
|
|
415
|
-
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
|
416
|
-
```
|
|
417
|
-
|
|
418
|
-
- **Free tier**: paywall with upgrade CTA
|
|
419
|
-
- **Standard+**: session counts, handoffs, knowledge entries, daily quotas with tier limits, 30-day activity trend, project breakdown, compaction candidates
|
|
420
|
-
|
|
421
|
-
---
|
|
422
|
-
|
|
423
|
-
## How AI agents use it
|
|
270
|
+
Prism exposes 40+ MCP tools. The core memory loop:
|
|
424
271
|
|
|
425
272
|
| Tool | What it does |
|
|
426
273
|
|---|---|
|
|
427
|
-
| `session_load_context` | Recover prior session's state on boot |
|
|
428
|
-
| `session_save_ledger` | Append immutable session log entry |
|
|
274
|
+
| `session_load_context` | Recover the prior session's state on boot |
|
|
275
|
+
| `session_save_ledger` | Append an immutable session log entry |
|
|
429
276
|
| `session_save_handoff` | Save live state for the next session |
|
|
430
277
|
| `knowledge_search` | Semantic + keyword search over all memories |
|
|
431
|
-
| `query_memory_natural` | Natural-language Q&A over
|
|
432
|
-
| `
|
|
433
|
-
| `
|
|
434
|
-
| `
|
|
278
|
+
| `query_memory_natural` | Natural-language Q&A over the memory store |
|
|
279
|
+
| `session_detect_drift` | Detect when a session has drifted from its goal |
|
|
280
|
+
| `verify_behavior` | Pre-edit scenario challenge: catch bad changes before they happen |
|
|
281
|
+
| `knowledge_ingest` | Teach Prism a codebase or document |
|
|
435
282
|
|
|
436
|
-
|
|
283
|
+
Full TypeScript signatures live in [`src/tools/`](src/tools/); architecture in [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md).
|
|
437
284
|
|
|
438
285
|
<details>
|
|
439
|
-
<summary
|
|
440
|
-
|
|
441
|
-
The LLM context window is treated as ephemeral scratch space. All durable state lives in Prism's persistent store (SQLite / Supabase). Context compaction is a non-event.
|
|
442
|
-
|
|
443
|
-
**Boot protocol**: every session (including post-compaction) begins with a mandatory `session_load_context` call, enforced via `CLAUDE.md`. The agent is fully oriented before writing a single byte of response.
|
|
444
|
-
|
|
445
|
-
**Two persistent stores:**
|
|
446
|
-
- `session_save_ledger`: immutable append-only work log (decisions, files changed, summaries)
|
|
447
|
-
- `session_save_handoff`: versioned live-state snapshot (current task, TODOs, open context)
|
|
448
|
-
|
|
449
|
-
**Ledger compaction** (`session_compact_ledger`): when a project exceeds a threshold (default: 50 entries), Prism summarizes old entries via LLM into a rollup row, soft-archives originals, and links them via `spawned_from` graph edges. Runs on a 12-hour background scheduler.
|
|
450
|
-
|
|
451
|
-
→ Full details: [`docs/COMPACTION.md`](docs/COMPACTION.md)
|
|
286
|
+
<summary>How Prism survives context compaction</summary>
|
|
452
287
|
|
|
288
|
+
The LLM context window is treated as ephemeral scratch space; durable state lives in the persistent store (SQLite locally, the portal in the cloud). Every session begins with a mandatory `session_load_context` call, so the agent is oriented before it writes a response. When a project exceeds a threshold (default 50 entries), `session_compact_ledger` summarizes old entries into a rollup, soft-archives the originals, and links them in the graph. See [`docs/COMPACTION.md`](docs/COMPACTION.md)
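The compaction step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real `session_compact_ledger`: the `LedgerEntry` shape, the `keepRecent` parameter, and the `summarize` callback (standing in for the LLM call) are all hypothetical; only the default threshold of 50, the rollup, the soft-archiving, and the link back to the originals come from the text.

```typescript
// Hypothetical entry shape for illustration; not Prism's real schema.
interface LedgerEntry {
  id: number;
  summary: string;
  archived: boolean;
  spawnedFrom?: number[]; // ids of the entries a rollup was built from
}

// Compacts a ledger (assumed ordered oldest first) once it holds more than
// `threshold` live entries. `keepRecent` is an assumed knob that leaves the
// newest entries untouched.
function compactLedger(
  entries: LedgerEntry[],
  summarize: (texts: string[]) => string,
  threshold = 50,
  keepRecent = 10,
): LedgerEntry[] {
  const live = entries.filter((e) => !e.archived);
  if (live.length <= threshold) return entries; // below threshold: no-op

  const old = live.slice(0, live.length - keepRecent);
  const oldIds = old.map((e) => e.id);
  const nextId = Math.max(...entries.map((e) => e.id)) + 1;

  // One rollup row replaces the old entries and points back at them.
  const rollup: LedgerEntry = {
    id: nextId,
    summary: summarize(old.map((e) => e.summary)),
    archived: false,
    spawnedFrom: oldIds,
  };

  // Soft-archive: originals stay in the store, flagged rather than deleted.
  const updated = entries.map((e) =>
    oldIds.indexOf(e.id) !== -1 ? { ...e, archived: true } : e,
  );
  return [...updated, rollup];
}
```

Because originals are only flagged, a rollup can always be traced back to the entries it summarizes.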
|
|
453
289
|
</details>
|
|
454
290
|
|
|
455
291
|
---
|
|
456
292
|
|
|
457
|
-
##
|
|
458
|
-
|
|
459
|
-
Prism Coder inference cascades through fine-tuned models first, with Claude as a quality-gate fallback. Models route through the Synalux router (authentication + subscription required). Cascade: Cloud (OpenRouter) → Ollama local → Claude fallback.
|
|
460
|
-
|
|
461
|
-
| Model | Ollama tag | Where | Tier | Latency |
|
|
462
|
-
|---|---|---|---|---|
|
|
463
|
-
| **prism-coder:1.7b** | `prism-coder:1b7` (v42) | On-device (Mac/local) · iOS via llama.cpp | Free | ~1.6s |
|
|
464
|
-
| **prism-coder:8b** | `prism-coder:8b` (v36) | On-device iPhone/iPad 8GB+ · local Mac | Free | ~0.8s |
|
|
465
|
-
| **prism-coder:14b** | `prism-coder:14b` (v36) | On-device Mac 24GB+ · iPad Pro · Cloud A100 | Standard+ | ~1.1s |
|
|
466
|
-
| **prism-coder:32b** | `prism-coder:32b` (v7 MoE) | Cloud (OpenRouter) A100 80GB via Synalux | Pro/Enterprise | ~0.8s |
|
|
467
|
-
|
|
468
|
-
Models use the Synalux SFT corpus (AAC + Prism MCP tool taxonomy + clinical workflows). **Internal quality gate: ≥ 90% on the Prism 102-case eval before production promotion.**
|
|
469
|
-
|
|
470
|
-
> **Training note**: Base Qwen3 models are strong tool-routers out of the box. Heavy fine-tuning regresses tool-vs-plain-text decisions; light-touch polish recipes (small corpus, balanced tool/plain-text split) are the published path. Production adapter selection and retrain methodology are managed in the Synalux portal.
|
|
471
|
-
|
|
472
|
-
**Per-category breakdown, [Prism 102-case eval](tests/benchmarks/prism-routing-100/README.md) (3-seed mean, v36/v7 system prompt, May 2026):**
|
|
473
|
-
|
|
474
|
-
| Model | Overall | Load ctx | Save | Srch mem | Handoff | Compact | Know srch | AAC | Translate | No-tool | Info | Edge | Avg lat | Inv |
|
|
475
|
-
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
476
|
-
| **prism-coder:32b** v7 | **100.0%** | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 0.8s | 0 |
|
|
477
|
-
| **prism-coder:8b** v36 | **100.0%** | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 0.8s | 0 |
|
|
478
|
-
| **prism-coder:14b** v36 | **100.0%** | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 1.1s | 0 |
|
|
479
|
-
| **Claude Opus 4.7** | **98.3%** | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 83% | 3.0s | 0 |
|
|
480
|
-
| **prism-coder:1.7b** v42 | **100.0%** | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | **100%** | 1.6s | 0 |
|
|
481
|
-
|
|
482
|
-
> **Methodology**: 102-case pool across 12 categories. Scores are 3-seed mean (seeds 2027/2028/2029, zero variance across all seeds). All fine-tuned models use the Qwen3 nothink template with keyword-trigger routing prompts and `-> respond directly (no tool)` for the no-tool class. Full runner: [`tests/benchmarks/prism-routing-100/benchmark.py`](tests/benchmarks/prism-routing-100/benchmark.py) Β· Cascade runner: [`tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py`](tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py).
|
|
483
|
-
>
|
|
484
|
-
> **These are NOT general-purpose LLM benchmarks.** This eval measures routing precision on 6 specific MCP tools. The prism-coder models are specialists trained on this exact task β they match or exceed Claude on routing while Claude dominates on general reasoning, coding, and open-domain QA. The value is **offline reliability at zero cost**, not replacing cloud AI.
|
|
485
|
-
|
|
486
|
-
**iOS deployment:** On-device inference via **llama.cpp Swift SPM**. Auto-selects by device RAM: 14B on iPad Pro 16GB (100% routing), 8B on iPhone/iPad 8GB (100%, OOM fallback to 1.7B at 100%). CoreML is not viable because coremltools doesn't support Qwen3 attention ops. Integration: `LLMEngine.swift` → `prismNativeBridge.askAI()` → token stream. WiFi fallback: Mac Ollama (`OLLAMA_HOST=0.0.0.0`).
|
|
487
|
-
|
|
488
|
-
### Benchmarks: run them yourself
|
|
489
|
-
|
|
490
|
-
All benchmarks are open-source. Reproduce every number in this README:
|
|
293
|
+
## CLI
|
|
491
294
|
|
|
492
295
|
```bash
|
|
493
|
-
|
|
494
|
-
|
|
495
|
-
|
|
496
|
-
|
|
497
|
-
#
|
|
498
|
-
|
|
499
|
-
|
|
500
|
-
# Cascade eval: 14B → 32B → Opus (Claude Opus as the reference)
|
|
501
|
-
export ANTHROPIC_API_KEY=sk-ant-...
|
|
502
|
-
ollama pull dcostenco/prism-coder:14b && ollama pull dcostenco/prism-coder:32b
|
|
503
|
-
python3 tests/benchmarks/cascade-14b-32b-opus/cascade_eval.py
|
|
296
|
+
prism load <project> # load session context
|
|
297
|
+
prism save # save ledger + handoff
|
|
298
|
+
prism search <query> # search code across repos (exact / regex / symbol / semantic)
|
|
299
|
+
prism review <files...> # AI code review: security, performance, style
|
|
300
|
+
prism scan <files...> # security scan: secrets, licenses, Dockerfile
|
|
301
|
+
prism push # push local SQLite to the cloud backend
|
|
302
|
+
prism register-models # alias dcostenco/prism-coder:* -> prism-coder:*
|
|
504
303
|
```
|
|
505
304
|
|
|
506
|
-
|
|
507
|
-
|
|
508
|
-
| Benchmark | Source | What it measures |
|
|
509
|
-
|---|---|---|
|
|
510
|
-
| Per-model BFCL | [`tests/benchmarks/prism-routing-100/`](tests/benchmarks/prism-routing-100/) | Solo accuracy per model, 12 categories |
|
|
511
|
-
| Cascade vs Opus | [`tests/benchmarks/cascade-14b-32b-opus/`](tests/benchmarks/cascade-14b-32b-opus/) | Tier distribution, Opus engagement rate, cascade accuracy |
|
|
512
|
-
| LoCoMo-Plus (Cognitive) | [`dcostenco/Locomo-Plus`](https://github.com/dcostenco/Locomo-Plus) | Long-context dialogue coherence and historical memory retention |
|
|
513
|
-
|
|
514
|
-
### Cognitive Dialogue Memory (LoCoMo-Plus Benchmark)
|
|
515
|
-
|
|
516
|
-
LoCoMo-Plus is a long-context, multi-day dialogue benchmark designed to test an AI agent's memory retention, context awareness, and ability to coherently reference historical dialogue evidence.
|
|
305
|
+
### `prism search`: semantic code search
|
|
517
306
|
|
|
518
|
-
|
|
307
|
+
<p align="center">
|
|
308
|
+
<img src="docs/scm_search_cli.jpg" alt="prism search: semantic code search with relevance scores" width="500" />
|
|
309
|
+
</p>
|
|
519
310
|
|
|
520
|
-
|
|
521
|
-
| :--- | :---: | :---: | :---: | :---: | :---: |
|
|
522
|
-
| **Gemini-2.5-flash (Baseline)** | 401 | 278.0 / 401 | **69.33%** | β | β |
|
|
523
|
-
| **Prism-MCP (Gemini-2.5-flash + Memory)** | 401 | 361.0 / 401 | **90.02%** | **+20.69pp** | **67.5%** |
|
|
524
|
-
| **Gemini-3.1-pro-preview (Baseline)** | 401 | 272.0 / 401 | **67.83%** | β | β |
|
|
525
|
-
| **Prism-MCP (Gemini-3.1-pro + Memory)** | 401 | 382.0 / 401 | **95.26%** | **+27.43pp** | **85.3%** |
|
|
526
|
-
| **Gemini-3.5-flash (Baseline)** | 401 | 237.0 / 401 | **59.10%** | β | β |
|
|
527
|
-
| **Prism-MCP (Gemini-3.5-flash + Memory)** | 401 | 388.0 / 401 | **96.76%** | **+37.66pp** | **92.1%** |
|
|
528
|
-
| **Claude Sonnet 4.6 (Baseline)** | 401 | 290.0 / 401 | **72.32%** | β | β |
|
|
529
|
-
| **Prism-MCP (Claude Sonnet 4.6 + Memory)** | 401 | 357.0 / 401 | **89.03%** | **+16.71pp** | **60.4%** |
|
|
311
|
+
### `prism review`: AI code review with HIPAA checks
|
|
530
312
|
|
|
531
|
-
|
|
532
|
-
|
|
533
|
-
|
|
534
|
-
* **Best overall**: Prism-MCP + Gemini 3.5 Flash achieves the highest score (**96.76%**), eliminating 92.1% of baseline errors. This makes the cheapest model + Prism more accurate than the most expensive model alone.
|
|
535
|
-
* **Claude vs Gemini (raw)**: Claude Sonnet 4.6 outperforms all three Gemini baselines (+13.22pp over Gemini 3.5 Flash, +4.49pp over Gemini 3.1 Pro, +2.99pp over Gemini 2.5 Flash), suggesting stronger native long-context recall.
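The last two columns of the results table can be reproduced from the accuracy figures. The header row is missing in this view, so reading them as "gain in percentage points" and "share of baseline errors eliminated" is an assumption, though the arithmetic below matches all four rows. The function name `errorReduction` is illustrative.

```typescript
// Relative error reduction: the share of the baseline's misses that the
// memory-augmented run recovers. Inputs are accuracies in percent.
function errorReduction(baselinePct: number, withMemoryPct: number): number {
  const baselineError = 100 - baselinePct; // e.g. 100 - 69.33 = 30.67
  const gain = withMemoryPct - baselinePct; // percentage points
  return Math.round((gain / baselineError) * 1000) / 10; // one decimal place
}

// (baseline, with memory) pairs taken from the table above.
const rows: Array<[string, number, number]> = [
  ["Gemini-2.5-flash", 69.33, 90.02], // 67.5
  ["Gemini-3.1-pro-preview", 67.83, 95.26], // 85.3
  ["Gemini-3.5-flash", 59.1, 96.76], // 92.1
  ["Claude Sonnet 4.6", 72.32, 89.03], // 60.4
];

for (const [model, base, withMemory] of rows) {
  console.log(`${model}: ${errorReduction(base, withMemory)}% of errors eliminated`);
}
```

For the first row: the baseline misses 30.67% of cases, memory recovers 20.69 points of that, and 20.69 / 30.67 is about 67.5%.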
|
|
313
|
+
<p align="center">
|
|
314
|
+
<img src="docs/scm_review_cli.jpg" alt="prism review: AI code review with security and HIPAA findings" width="400" />
|
|
315
|
+
</p>
|
|
536
316
|
|
|
537
|
-
|
|
538
|
-
<summary>View Test Case Schema & Sample</summary>
|
|
317
|
+
### `prism scan`: security scanner for secrets, Dockerfiles, licenses
|
|
539
318
|
|
|
540
|
-
|
|
319
|
+
<p align="center">
|
|
320
|
+
<img src="docs/scm_scan_cli.jpg" alt="prism scan: security scan finding secrets and container issues" width="400" />
|
|
321
|
+
</p>
|
|
541
322
|
|
|
542
|
-
|
|
543
|
-
{
|
|
544
|
-
"category": "Cognitive",
|
|
545
|
-
"input_prompt": "Caroline said, \"...\"\nMelanie said, \"...\"",
|
|
546
|
-
"trigger": "Melanie said, \"Hey, Caroline! Nice to hear from you! Love the necklace, any special meaning to it?\"",
|
|
547
|
-
"evidence": "Swedish grandmother's necklace was gifted to Caroline",
|
|
548
|
-
"answer": "Yes, this necklace was a gift from my grandmother in my home country, Sweden."
|
|
549
|
-
}
|
|
550
|
-
```
|
|
323
|
+
---
|
|
551
324
|
|
|
552
|
-
|
|
553
|
-
* **Baseline models** without memory frequently output a generic guess (e.g., "Thanks, it was a gift from a friend") or fail to reference the Sweden/grandmother relationship.
|
|
554
|
-
* **Prism-MCP** automatically embeds the prior turns, stores them in SQLite, and when cued, retrieves the precise "Swedish grandmother" evidence turn via semantic vectors to inject it into active context.
|
|
555
|
-
</details>
|
|
325
|
+
## Companions
|
|
556
326
|
|
|
557
|
-
|
|
558
|
-
<summary>View How to Reproduce Publicly (Test Source & Guide)</summary>
|
|
327
|
+
### Web IDE: Prism Coder
|
|
559
328
|
|
|
560
|
-
|
|
329
|
+
A browser-based IDE at [synalux.ai/coder](https://synalux.ai/coder). Import any GitHub repo and get:
|
|
561
330
|
|
|
562
|
-
|
|
563
|
-
|
|
564
|
-
|
|
565
|
-
|
|
566
|
-
|
|
567
|
-
|
|
568
|
-
|
|
569
|
-
|
|
570
|
-
|
|
571
|
-
--out-file output/gemini_3.1_pro_pred.json \
|
|
572
|
-
--model gemini-3.1-pro-preview \
|
|
573
|
-
--backend call_gemini \
|
|
574
|
-
--concurrency 5
|
|
575
|
-
|
|
576
|
-
# 3. Run Prism-MCP powered by Gemini 3.1 Pro Evaluation (concurrency 1 to guard SQLite locks)
|
|
577
|
-
export PRISM_TEXT_MODEL=gemini-3.1-pro-preview
|
|
578
|
-
PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/evaluate_qa.py \
|
|
579
|
-
--data-file data/unified_cognitive_only.json \
|
|
580
|
-
--out-file output/prism_gemini_3.1_pro_pred.json \
|
|
581
|
-
--model gemini-3.1-pro-preview \
|
|
582
|
-
--backend call_prism \
|
|
583
|
-
--concurrency 1
|
|
584
|
-
|
|
585
|
-
# 4. Run Claude Sonnet 4.6 Baseline Evaluation (concurrency 3, rate-limit safe)
|
|
586
|
-
export ANTHROPIC_API_KEY="your-api-key"
|
|
587
|
-
PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/evaluate_qa.py \
|
|
588
|
-
--data-file data/unified_cognitive_only.json \
|
|
589
|
-
--out-file output/claude_sonnet46_pred.json \
|
|
590
|
-
--model claude-sonnet-4-6 \
|
|
591
|
-
--backend call_claude \
|
|
592
|
-
--concurrency 3
|
|
593
|
-
|
|
594
|
-
# 5. Grade results using the LLM-as-a-Judge script
|
|
595
|
-
PYTHONPATH=/tmp/Locomo-Plus python3 evaluation_framework/task_eval/llm_as_judge.py \
|
|
596
|
-
--input-file output/prism_gemini_3.1_pro_pred.json \
|
|
597
|
-
--out-file output/prism_gemini_3.1_pro_judged.json \
|
|
598
|
-
--model gemini-2.5-flash \
|
|
599
|
-
--backend call_gemini \
|
|
600
|
-
--concurrency 5 \
|
|
601
|
-
--summary-file output/prism_gemini_3.1_pro_summary.json
|
|
602
|
-
```
|
|
603
|
-
</details>
|
|
331
|
+
- **Monaco editor** with multi-tab, split view, syntax highlighting, and VS Code keybindings
|
|
332
|
+
- **In-browser Node.js** via WebContainer (your code runs in the browser sandbox, not on a server)
|
|
333
|
+
- **Integrated terminal**: WebContainer shell in-browser; optional server PTY via WebSocket when connected to a dev server
|
|
334
|
+
- **AI Agent Mode**: describe a task and the agent creates files, runs type-checks, and verifies the result
|
|
335
|
+
- **Source control**: commit, branch, push/pull, stash, blame, tag management
|
|
336
|
+
- **Live Share**: real-time collaborative editing with session links
|
|
337
|
+
- **Node.js debugger** via Chrome DevTools Protocol
|
|
338
|
+
- **Tasks runner** (VS Code `tasks.json` compatible), **Problems panel** (Monaco diagnostics)
|
|
339
|
+
- **12-language i18n**: full UI localization
|
|
604
340
|
|
|
605
|
-
|
|
341
|
+
<p align="center">
|
|
342
|
+
<img src="docs/screenshots/agent-mode.png" alt="Prism Coder IDE: Agent Mode creating a component with auto-fix and type-checking" width="500" />
|
|
343
|
+
</p>
|
|
606
344
|
|
|
607
|
-
|
|
608
|
-
|
|
609
|
-
|
|
610
|
-
| prism-coder:8b | [dcostenco/prism-coder-8b](https://huggingface.co/dcostenco/prism-coder-8b) | **100.0%** routing (v36) | Mobile tier | 4.7 GB |
|
|
611
|
-
| prism-coder:14b | [dcostenco/prism-coder-14b](https://huggingface.co/dcostenco/prism-coder-14b) | **100.0%** routing (v36) | Tier 1 (serves ~99% of traffic) | 8.4 GB |
|
|
612
|
-
| prism-coder:1.7b | [dcostenco/prism-coder-1.7b](https://huggingface.co/dcostenco/prism-coder-1.7b) | **100.0%** routing (v42) | On-device / always-fits fallback | 1.1 GB |
|
|
613
|
-
| prism-ide:14b | [dcostenco/prism-ide](https://huggingface.co/dcostenco/prism-ide) | **22/22** TypeScript eval (v1) | Code generation tier 1 (~1.1s) | 8.4 GB |
|
|
614
|
-
| prism-ide:32b | [dcostenco/prism-ide](https://huggingface.co/dcostenco/prism-ide) | Complex code + multi-file (v3) | Code generation tier 2 (~0.8s MoE) | 16 GB |
|
|
345
|
+
<p align="center">
|
|
346
|
+
<img src="docs/screenshots/collaboration.png" alt="Prism Coder IDE: Live Share with team members and real-time cursor tracking" width="500" />
|
|
347
|
+
</p>
|
|
615
348
|
|
|
616
|
-
|
|
349
|
+
Standard+ plans get cloud AI and higher rate limits. Free tier works with local Ollama. Code execution uses the in-browser WebContainer by default; Live Share and the optional PTY terminal connect to external servers when explicitly enabled.
|
|
617
350
|
|
|
618
|
-
|
|
351
|
+
### VS Code Extension: Synalux
|
|
619
352
|
|
|
620
|
-
|
|
353
|
+
Memory-augmented AI inside VS Code with clinical practice management features. Install from the marketplace:
|
|
621
354
|
|
|
622
355
|
```bash
|
|
623
|
-
|
|
624
|
-
ollama pull dcostenco/prism-coder:1b7
|
|
625
|
-
|
|
626
|
-
# Mobile tier: 4.7 GB (iPhone/iPad 8GB, Mac M1+), 100% routing
|
|
627
|
-
ollama pull dcostenco/prism-coder:8b
|
|
628
|
-
|
|
629
|
-
# Standard tier: 8.4 GB (Mac 24GB+, iPad Pro 16GB), 100% routing
|
|
630
|
-
ollama pull dcostenco/prism-coder:14b
|
|
631
|
-
|
|
632
|
-
# Reasoning tier: 16 GB (Mac M2 Ultra+, 30B-A3B MoE), 100% routing
|
|
633
|
-
ollama pull dcostenco/prism-coder:32b
|
|
356
|
+
code --install-extension synalux-ai.synalux
|
|
634
357
|
```
|
|
635
358
|
|
|
636
|
-
|
|
637
|
-
|
|
638
|
-
**Desktop/server**: 14B → 32B → Claude Sonnet 4 fallback · **Mobile/offline**: 14B → 8B → 1.7B
|
|
639
|
-
|
|
640
|
-
iOS/mobile on same WiFi: `OLLAMA_HOST=0.0.0.0 ollama serve` on the Mac, then point `LOCAL_LLM_URL` at the Mac's IP.
|
|
641
|
-
Routing accuracy (May 2026, v36/v7 system prompt, 3-seed mean): 32B v7 = **100.0%** · 8B v36 = **100.0%** · 14B v36 = **100.0%** · 1.7B v42 = **100.0%**
|
|
642
|
-
Cascade (14B→32B): **100.0%** · Opus solo: 98.3% · Opus engaged: **0% of requests** → [Full results](tests/benchmarks/cascade-14b-32b-opus/README.md)
|
|
643
|
-
|
|
644
|
-
---
|
|
645
|
-
|
|
646
|
-
## What you can build with it
|
|
647
|
-
|
|
648
|
-
- **Persistent coding assistant** that remembers your codebase, your decisions, your team's conventions
|
|
649
|
-
- **Research agent** that builds knowledge over time: the Auto-Scholar pipeline ingests papers and docs and synthesizes them
|
|
650
|
-
- **Clinical scribe** that retains patient context across visits (HIPAA-compliant cloud + local)
|
|
651
|
-
- **Customer support agent** that learns from every ticket
|
|
652
|
-
- **Writing assistant** that knows your voice, your prior drafts, and what you've already published
|
|
653
|
-
|
|
654
|
-
---
|
|
655
|
-
|
|
656
|
-
## Companions
|
|
359
|
+
[](https://marketplace.visualstudio.com/items?itemName=synalux-ai.synalux)
|
|
657
360
|
|
|
658
|
-
|
|
361
|
+
**AI features:** Chat participant (`@synalux`), multi-agent pipeline, voice input with conversation mode, model switching (local Ollama / cloud / Gemini), 10 AI personality tones.
|
|
659
362
|
|
|
660
|
-
**
|
|
363
|
+
**Clinical features (BCBA / healthcare):** SOAP note generator, role-based access, document signing, patient board. Voice recording with AES-256-GCM encryption (consent-gated, off by default, plaintext deleted after encryption).
|
|
661
364
|
|
|
662
|
-
|
|
365
|
+
**Collaboration:** Team chat, direct messages, enterprise video calls (LiveKit), customer board, visual builder, DevContainers, Auth & Database panel.
|
|
663
366
|
|
|
664
|
-
|
|
367
|
+
**Privacy note:** The extension routes AI requests through the `BackendRouter`: local Ollama by default on the free tier, cloud on paid plans (user-configurable via `preferLocal`). Clinical features (SOAP notes, voice) route through the same backend. `preferLocal=true` tries local first but can still fall back to cloud if the local model is unavailable. For regulated workloads where PHI must never leave the machine, use the free tier (no cloud key) or an Enterprise plan with a BAA that covers cloud-bound data. Licensed under [BSL-1.1](https://marketplace.visualstudio.com/items?itemName=synalux-ai.synalux).
|
|
665
368
|
|
|
666
|
-
|
|
369
|
+
### Prism AAC
|
|
667
370
|
|
|
668
|
-
|
|
669
|
-
|---|---|
|
|
670
|
-
| Agent | prism-coder:8b offline · Claude Sonnet 4 (Standard+) |
|
|
671
|
-
| Integrations | GitHub repos · same Prism account, no separate sign-up |
|
|
672
|
-
| Plans | Free (4b) · Standard $19/mo (14b) · Advanced $49/mo (32b) · Enterprise $99/mo |
|
|
371
|
+
Communication app for non-speaking users, powered by the on-device prism-coder fleet for phrase prediction. macOS / iOS / web.
|
|
673
372
|
|
|
674
|
-
|
|
373
|
+
See [github.com/dcostenco/prism-aac](https://github.com/dcostenco/prism-aac)
|
|
675
374
|
|
|
676
|
-
|
|
375
|
+
---
|
|
677
376
|
|
|
678
|
-
|
|
377
|
+
## Self-hosting (Enterprise)
|
|
679
378
|
|
|
680
|
-
|
|
681
|
-
# Install from terminal
|
|
682
|
-
code --install-extension synalux-ai.synalux
|
|
683
|
-
```
|
|
684
|
-
|
|
685
|
-
Or open VS Code → Extensions (⇧⌘X) → search **"Synalux"** → Install.
|
|
379
|
+
Run the full model stack on your own hardware: no cloud, full data sovereignty.
|
|
686
380
|
|
|
687
|
-
|
|
381
|
+
**Requirements:** Mac M2 Pro+ (48 GB recommended) or Linux + NVIDIA GPU, plus [Ollama](https://ollama.com).
|
|
688
382
|
|
|
689
383
|
```bash
|
|
690
|
-
|
|
691
|
-
|
|
692
|
-
|
|
693
|
-
# Or install globally
|
|
694
|
-
npm install -g prism-mcp-server
|
|
695
|
-
prism load my-project
|
|
384
|
+
ollama pull dcostenco/prism-coder:14b # default router
|
|
385
|
+
export LOCAL_LLM_URL=http://localhost:11434
|
|
696
386
|
```
|
|
697
387
|
|
|
698
|
-
|
|
699
|
-
|
|
700
|
-
### PrismAAC
|
|
701
|
-
|
|
702
|
-
AAC communication app for non-speaking users. Powered by Prism's spreading-activation phrase ranking + on-device 7B model. macOS / iOS / Android via web. → [github.com/dcostenco/prism-aac](https://github.com/dcostenco/prism-aac)
|
|
388
|
+
Routing is automatic: `14b → 4b → cloud fallback` on desktop/server, `2b → cloud fallback` on mobile/iPhone. For iOS or another machine on the same network, run `OLLAMA_HOST=0.0.0.0 ollama serve` and point `LOCAL_LLM_URL` at the host's IP.
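A fallback chain of this kind can be sketched as an ordered list of tiers that are tried until one answers. This is a generic illustration, not Prism's router: the `Tier` type and the `cascade` function are hypothetical names, and how the real router decides that a tier has failed is not documented here.

```typescript
// A tier is any async function that answers a prompt or throws.
type Tier = { name: string; infer: (prompt: string) => Promise<string> };

// Tries each tier in order; the first one that resolves wins.
async function cascade(
  tiers: Tier[],
  prompt: string,
): Promise<{ tier: string; text: string }> {
  const failures: string[] = [];
  for (const tier of tiers) {
    try {
      return { tier: tier.name, text: await tier.infer(prompt) };
    } catch (err) {
      // Remember why this tier failed, then fall through to the next one.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${tier.name}: ${reason}`);
    }
  }
  throw new Error(`all tiers failed (${failures.join("; ")})`);
}
```

On desktop the list would hold the 14b tier, then 4b, then a cloud tier; on mobile, 2b followed by cloud. Each `infer` would wrap whatever client talks to `LOCAL_LLM_URL` or the cloud backend.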
|
|
703
389
|
|
|
704
390
|
---
|
|
705
391
|
|
|
706
|
-
##
|
|
707
|
-
|
|
708
|
-
As of v14.0.0, Prism's algorithm exports are a **stable public contract** under SemVer. External systems can port `actrActivation.ts` (ACT-R cognitive decay), `spreadingActivation.ts` (the 0.7 similarity + 0.3 activation hybrid score), `routerExperience.ts` (experience bias with `MIN_SAMPLES=5` cold-start gate), `compactionHandler.ts` (the 25KB prompt-budget cap), and `graphMetrics.ts` (warning ratios) with citations and pin a Prism version.
|
|
392
|
+
## Configuration reference
|
|
709
393
|
|
|
710
|
-
|
|
711
|
-
|
|
712
|
-
|
|
|
713
|
-
|
|
714
|
-
|
|
|
715
|
-
|
|
|
716
|
-
| Synalux portal | Tier-aware model routing using experience bias on prior outcomes per fingerprint. HIPAA-compliant clinical scribe with on-device-first privacy guarantees. |
|
|
717
|
-
|
|
718
|
-
## CLI Reference
|
|
394
|
+
| Variable | Purpose | Default |
|
|
395
|
+
|---|---|---|
|
|
396
|
+
| `PRISM_STORAGE` | `local` / `synalux` / `supabase` / `auto` | `auto` |
|
|
397
|
+
| `PRISM_SYNALUX_API_KEY` | Paid-tier portal key (`synalux_sk_...`) | -- (local if unset) |
|
|
398
|
+
| `LOCAL_LLM_URL` | Ollama endpoint | `http://localhost:11434` |
|
|
399
|
+
| `PRISM_FORCE_LOCAL` | Force local SQLite regardless of credentials | `false` |
|
|
719
400
|
|
|
720
|
-
|
|
401
|
+
With no variables set, Prism runs fully local. Set `PRISM_SYNALUX_API_KEY` (and leave `PRISM_STORAGE=auto`) to use the cloud backend.
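The resolution order implied by the table can be sketched as below. This is an assumption-laden illustration: `resolveStorage` is a hypothetical name, and the real resolver may weigh more inputs (for example Supabase credentials under `auto`, which this sketch ignores).

```typescript
type Backend = "local" | "synalux" | "supabase";

// Picks a storage backend from environment-style settings.
function resolveStorage(env: Record<string, string | undefined>): Backend {
  // PRISM_FORCE_LOCAL wins regardless of credentials.
  if (env.PRISM_FORCE_LOCAL === "true") return "local";

  const mode = env.PRISM_STORAGE || "auto";
  if (mode === "local" || mode === "synalux" || mode === "supabase") {
    return mode; // explicit choice
  }

  // auto (or an unrecognized value): use the portal when a key is present,
  // otherwise stay fully local.
  return env.PRISM_SYNALUX_API_KEY ? "synalux" : "local";
}

console.log(resolveStorage({})); // local
console.log(resolveStorage({ PRISM_SYNALUX_API_KEY: "synalux_sk_example" })); // synalux
```

Treating an unrecognized `PRISM_STORAGE` value as `auto` is a choice made for this sketch; rejecting it with an error would be an equally reasonable design.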
|
|
721
402
|
|
|
722
|
-
|
|
723
|
-
prism load <project> # Load session context (same as session_load_context MCP tool)
|
|
724
|
-
prism save # Save session state (ledger + handoff)
|
|
725
|
-
prism ledger <project> # Save a session log entry (same as session_save_ledger)
|
|
726
|
-
prism handoff <project> # Update live project state for next session
|
|
727
|
-
prism push # Push local SQLite data to Supabase cloud
|
|
728
|
-
prism sync # Cross-backend data synchronization
|
|
729
|
-
prism search <query> # Search code across repos (exact, regex, symbol, semantic)
|
|
730
|
-
prism review <files...> # AI code review: security, performance, style
|
|
731
|
-
prism scan <files...> # Security scan: secrets, licenses, Dockerfile
|
|
732
|
-
prism dora # Show DORA metrics for current project
|
|
733
|
-
prism scm # Source control, AI review, security scanning
|
|
734
|
-
prism verify # Manage the verification harness
|
|
735
|
-
prism status # Check verification state and config drift
|
|
736
|
-
prism generate # Bless current rubric as canonical
|
|
737
|
-
prism register-models # Alias dcostenco/prism-coder:* → prism-coder:*
|
|
738
|
-
```
|
|
403
|
+
---
|
|
739
404
|
|
|
740
405
|
## Testing
|
|
741
406
|
|
|
742
407
|
```bash
|
|
743
|
-
npm test
|
|
744
|
-
npm test -- --coverage
|
|
745
|
-
python3 tests/benchmarks/prism-routing-100/benchmark.py --models 1b7 14b 32b
|
|
408
|
+
npm test # full suite (vitest)
|
|
409
|
+
npm test -- --coverage # coverage report
|
|
746
410
|
```
|
|
747
411
|
|
|
748
|
-
|
|
412
|
+
Coverage spans HRR retrieval, knowledge ingestion, the inference cascade and grounding verifier, compaction, the model picker, and storage round-trips.
|
|
749
413
|
|
|
750
|
-
|
|
751
|
-
- HRR zero-search retrieval (97 tests: 3 embedding strategies, edge cases, persistence, adaptive cascade, API client, chat integration)
|
|
752
|
-
- Knowledge ingestion (32 tests: chunker, Q&A gen, webhook, security, storage round-trip)
|
|
753
|
-
- Prism infer cascade (110 tests: tier selection, cloud fallback, grounding verifier)
|
|
754
|
-
- Compaction handler (rollup creation, concurrency guard, LLM failure)
|
|
755
|
-
- Model picker (20 tests: 14b default ceiling, 4b verifier, RAM gating)
|
|
756
|
-
- Storage round-trip (12 architectural guard tests preventing bypass)
|
|
757
|
-
- BCBA skill integration
|
|
758
|
-
- Deep storage tier
|
|
759
|
-
- Dashboard rendering
|
|
760
|
-
- Routing benchmarks (eval_300: 300 cases, 17 tools)
|
|
761
|
-
|
|
762
|
-
## Migration
|
|
414
|
+
---
|
|
763
415
|
|
|
764
|
-
|
|
416
|
+
## Migration: local to cloud
|
|
765
417
|
|
|
766
|
-
|
|
418
|
+
To move free-tier history into the paid portal:
|
|
767
419
|
|
|
768
420
|
```bash
|
|
769
|
-
|
|
770
|
-
node scripts/migrate-local-to-portal.mjs --dry-run
|
|
771
|
-
|
|
772
|
-
# real run: pushes ledger + handoff entries through POST /api/v1/prism/memory
|
|
421
|
+
node scripts/migrate-local-to-portal.mjs --dry-run # preview, no network
|
|
773
422
|
PRISM_SYNALUX_API_KEY=synalux_sk_... \
|
|
774
|
-
node scripts/migrate-local-to-portal.mjs
|
|
775
|
-
|
|
776
|
-
# scope to one project
|
|
777
|
-
node scripts/migrate-local-to-portal.mjs --project=my-project
|
|
778
|
-
|
|
779
|
-
# include scholar entries (excluded by default; usually large + low-value)
|
|
780
|
-
node scripts/migrate-local-to-portal.mjs --include-scholar
|
|
781
|
-
```
|
|
782
|
-
|
|
783
|
-
**What it does**: reads `~/.prism-mcp/data.db` via `@libsql/client` (already a runtime dep β no extra install), exchanges the refresh token for a JWT (cached + auto-refreshed before expiry), and POSTs each ledger entry and handoff to the portal. Failures are logged with the source row id; successes are counted at the end.
|
|
784
|
-
|
|
785
|
-
**Credentials**: `PRISM_SYNALUX_API_KEY` from env. If unset, the script also checks `~/prism/.env` for `PRISM_SYNALUX_API_KEY=...` as a convenience for dev workflows.
|
|
786
|
-
|
|
787
|
-
**Idempotency**: handoffs are written with the portal's CRDT merge (last-write-wins per project+role); ledger entries are append-only and de-duped server-side by `(project, conversation_id, summary)`. Re-running on the same DB is safe.
|
|
788
|
-
|
|
789
|
-
**One-shot only**: this script is a migration tool, not a sync daemon. Once you've moved, set `PRISM_STORAGE=synalux` (or leave it on `auto` and let the resolver pick synalux when credentials are present) and the MCP server writes directly to the portal going forward.
|
|
790
|
-
|
|
791
|
-
## Production Infrastructure
|
|
792
|
-
|
|
793
|
-
### Architecture
|
|
794
|
-
|
|
423
|
+
node scripts/migrate-local-to-portal.mjs # push ledger + handoffs
|
|
795
424
|
```
|
|
796
|
-
CLIENTS
|
|
797
|
-
βββββββββββββββββββββββ βββββββββββββββββββββββββββββββ
|
|
798
|
-
β prism-aac (iOS/web)β β Claude Code Β· Cursor Β· IDE β
|
|
799
|
-
β Vercel β β MCP config β Railway URL β
|
|
800
|
-
ββββββββββββ¬βββββββββββ βββββββββββββββ¬ββββββββββββββββ
|
|
801
|
-
β inference β memory
|
|
802
|
-
βΌ βΌ
|
|
803
|
-
ββββββββββββββββββββββββ βββββββββββββββββββββββββββββββ
|
|
804
|
-
β SYNALUX ROUTER β β prism-mcp SERVER β
|
|
805
|
-
β Vercel β β β
|
|
806
|
-
β β’ JWT auth β β Primary β Railway β
|
|
807
|
-
β β’ tier enforcement β β Standby β Fly.io β
|
|
808
|
-
β β’ complexity route β β Fallback β Supabase REST β
|
|
809
|
-
β β’ proxy to cloud β β auto-failover chain β
|
|
810
|
-
ββββββββββββ¬ββββββββββββ βββββββββββββββ¬ββββββββββββββββ
|
|
811
|
-
β β
|
|
812
|
-
βΌ βΌ
|
|
813
|
-
ββββββββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββ
|
|
814
|
-
β OPENROUTER / LOCAL β β SUPABASE β
|
|
815
|
-
β β β session ledgers β
|
|
816
|
-
β Cloud: Claude Sonnet 4 β β knowledge graph β
|
|
817
|
-
β Routing: prism-coder β β handoffs & todos β
|
|
818
|
-
β :32b(100%) :14b(100%) β β β
|
|
819
|
-
β :8b(100%) :1b7(100%) β β source of truth β
|
|
820
|
-
β Code: prism-ide β β β
|
|
821
|
-
β :14b Β· :32b β β β
|
|
822
|
-
ββββββββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββ
|
|
823
|
-
```
|
|
824
|
-
|
|
825
|
-
### Service Routing
|
|
826
|
-
|
|
827
|
-
**LLM Backends**
|
|
828
425
|
|
|
829
|
-
|
|
830
|
-
|---|---|---|---|
|
|
831
|
-
| AI Chat (free) | Gemini 2.5 Flash (direct API) | Claude Haiku 3.5 | prism-coder:14b via Ollama |
|
|
832
|
-
| AI Chat (paid) | Claude Sonnet 4 (OpenRouter) | Claude Haiku 3.5 | prism-coder:14b via Ollama |
|
|
833
|
-
| Prism Coder (tool-calling) | Claude Haiku 3.5 (OpenRouter) | β | prism-coder:14b via Ollama |
|
|
834
|
-
| Prism AAC | Local prism-coder:14b | Gemini 2.5 Flash / Claude | prism-coder:8b / :1b7 |
|
|
835
|
-
|
|
836
|
-
**Web Search**
|
|
837
|
-
|
|
838
|
-
| Surface | Primary | Fallback |
|
|
839
|
-
|---|---|---|
|
|
840
|
-
| AI Chat `@search` | Firecrawl | β |
|
|
841
|
-
| Prism MCP agents (cloud) | Firecrawl | β |
|
|
842
|
-
| Prism MCP server (local) | Firecrawl (via MCP tools) | β |
|
|
843
|
-
| Clinical research | PubMed + ERIC + Semantic Scholar | DuckDuckGo |
|
|
844
|
-
|
|
845
|
-
**TTS (Text-to-Speech)**
|
|
846
|
-
|
|
847
|
-
| Tier | Engine | Offline |
|
|
848
|
-
|---|---|---|
|
|
849
|
-
| 1 | Inworld TTS-2 (cloud) | β |
|
|
850
|
-
| 1.5 | Kokoro-82M neural (WASM) | en/es/fr/pt/ja/zh |
|
|
851
|
-
| 2 | OS Web Speech API | all |
|
|
852
|
-
| 3 | WASM espeak-ng | all |
|
|
853
|
-
|
|
854
|
-
**Other Services**
|
|
855
|
-
|
|
856
|
-
| Service | Provider | Purpose |
|
|
857
|
-
|---|---|---|
|
|
858
|
-
| Payments | Stripe | Subscriptions, checkout |
|
|
859
|
-
| Email | Resend | Transactional (invites, shares) |
|
|
860
|
-
| Video | LiveKit | Telehealth, case conferences |
|
|
861
|
-
| SMS | Twilio | Emergency alerts, caregiver notifications |
|
|
862
|
-
| Translation | Offline dictionary (1,261 × 20 langs) | AAC, Watch |
|
|
863
|
-
|
|
864
|
-
## Synalux Inference Router
|
|
865
|
-
|
|
866
|
-
All Prism AAC model inference is protected behind Synalux as a mandatory router. Models are **never accessible directly**; all traffic goes through Synalux for auth, billing, and rate limiting.
|
|
867
|
-
|
|
868
|
-
```
|
|
869
|
-
┌──────────────────────────────────────────────────────────┐
|
|
870
|
-
│ CLIENT LAYER │
|
|
871
|
-
│ prism-aac (iOS/web) │ Synalux Portal │
|
|
872
|
-
└──────────────┬───────────────────────────────────────────┘
|
|
873
|
-
│ POST /api/v1/prism-aac/inference
|
|
874
|
-
│ Authorization: Bearer <user-JWT>
|
|
875
|
-
▼
|
|
876
|
-
┌──────────────────────────────────────────────────────────┐
|
|
877
|
-
│ SYNALUX ROUTER │
|
|
878
|
-
│ 1. Verify JWT (no anonymous access) │
|
|
879
|
-
│ 2. Check subscription tier │
|
|
880
|
-
│ 3. Enforce rate limit (per-tier daily cap) │
|
|
881
|
-
│ 4. Route to model tier by complexity │
|
|
882
|
-
│ 5. Proxy → OpenRouter / Gemini (key never exposed) │
|
|
883
|
-
│ 6. Log → aac_inference_log (audit trail) │
|
|
884
|
-
└──────────┬───────────────────────────────┬───────────────┘
|
|
885
|
-
│ │
|
|
886
|
-
▼ ▼
|
|
887
|
-
┌────────────────────┐ ┌──────────────────────┐
|
|
888
|
-
│ LOCAL (Ollama) │ │ CLOUD (OpenRouter) │
|
|
889
|
-
│ prism-coder:14b │ │ Claude Sonnet 4 │
|
|
890
|
-
│ prism-coder:8b │ │ Claude Haiku 3.5 │
|
|
891
|
-
│ prism-coder:1b7 │ │ Gemini 2.5 Flash │
|
|
892
|
-
│ free, offline │ │ paid tiers │
|
|
893
|
-
└────────────────────┘ └──────────────────────┘
|
|
894
|
-
|
|
895
|
-
On-device (free, offline):
|
|
896
|
-
prism-coder:1b7 GGUF Q4_K_M (1.1 GB) - any Apple device
|
|
897
|
-
prism-coder:8b GGUF Q4_K_M (4.7 GB) - iPhone/iPad 8 GB+
|
|
898
|
-
prism-coder:14b GGUF Q4_K_M (8.4 GB) - Mac/iPad Pro 16 GB+
|
|
899
|
-
|
|
900
|
-
HuggingFace: dcostenco/prism-coder-{14b,8b,32b,1.7b} (public GGUF weights)
|
|
901
|
-
```
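
Steps 1 to 4 of the router diagram can be sketched as a single decision function. This is a hedged illustration, not the Synalux implementation: every name below is hypothetical, token verification and complexity scoring are injected stubs, the 401/429 status codes are the conventional HTTP choices rather than documented behavior, and the rule that free-tier users stay on the local fleet is an assumption drawn from the diagram's "free, offline" label.

```typescript
// Hypothetical sketch of the routing decision; none of these names come
// from the Prism or Synalux source.
type Tier = "free" | "standard" | "pro" | "enterprise";
type User = { id: string; tier: Tier };
type Decision =
  | { ok: true; target: "local" | "cloud" }
  | { ok: false; status: 401 | 429; reason: string };

interface RouterDeps {
  verifyToken(token: string | undefined): User | null; // step 1
  dailyCap(tier: Tier): number; // step 3, Infinity means no cap
  usedToday(userId: string): number;
  isComplex(prompt: string): boolean; // step 4
}

function routeInference(
  deps: RouterDeps,
  token: string | undefined,
  prompt: string,
): Decision {
  // Step 1: no anonymous access.
  const user = deps.verifyToken(token);
  if (!user) {
    return { ok: false, status: 401, reason: "missing or invalid JWT" };
  }
  // Steps 2 and 3: the subscription tier selects the daily cap.
  if (deps.usedToday(user.id) >= deps.dailyCap(user.tier)) {
    return { ok: false, status: 429, reason: "daily cap reached" };
  }
  // Step 4 (assumed rule): only paid tiers escalate complex prompts to cloud.
  const target = user.tier !== "free" && deps.isComplex(prompt) ? "cloud" : "local";
  return { ok: true, target };
}
```

Proxying (step 5) and audit logging (step 6) would wrap the returned decision; they are left out here because the diagram does not describe their interfaces.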
|
|
902
|
-
|
|
903
|
-
| Plan | Cloud model | Daily limit | On-device |
|
|
904
|
-
|---|---|---|---|
|
|
905
|
-
| **Free** | - | unlimited local | prism-coder:1.7b (100%) + 8b (100%) + 14b (100%) |
|
|
906
|
-
| **Standard $19/mo** | Claude Sonnet 4 | 200 req | + cloud fallback |
|
|
907
|
-
| **Pro $49/mo** | prism-coder:32b | 2,000 req | + reasoning tier |
|
|
908
|
-
| **Enterprise $99/mo** | prism-coder:32b priority | unlimited | + HIPAA BAA + custom fine-tuning |
|
|
909
|
-
|
|
910
|
-
All on-device models are **free for every tier**; no subscription is needed for local inference. Offline translation (1,261 phrases × 20 languages) is included in all plans.
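
For readers wiring these plans into their own tooling, the table can be expressed as a small lookup. This is a sketch under stated assumptions: the identifiers are hypothetical, prices and caps are copied from the table, and the free plan is modeled with a cloud cap of 0 because it lists no cloud model (local inference is unlimited and not counted here).

```typescript
// Hypothetical config mirroring the plan table; not an official schema.
type PlanId = "free" | "standard" | "pro" | "enterprise";

interface Plan {
  usdPerMonth: number;
  cloudModel: string | null; // null = no cloud model on this plan
  dailyCloudRequests: number; // Infinity = unlimited
}

const PLANS: Record<PlanId, Plan> = {
  free: { usdPerMonth: 0, cloudModel: null, dailyCloudRequests: 0 },
  standard: { usdPerMonth: 19, cloudModel: "Claude Sonnet 4", dailyCloudRequests: 200 },
  pro: { usdPerMonth: 49, cloudModel: "prism-coder:32b", dailyCloudRequests: 2000 },
  enterprise: {
    usdPerMonth: 99,
    cloudModel: "prism-coder:32b priority",
    dailyCloudRequests: Number.POSITIVE_INFINITY,
  },
};

// Cloud requests still available today; never negative.
function remainingCloudRequests(plan: PlanId, usedToday: number): number {
  return Math.max(0, PLANS[plan].dailyCloudRequests - usedToday);
}
```

A Standard user who has made 150 cloud requests has 50 left for the day; on-device requests do not draw from this budget on any plan.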
|
|
911
|
-
|
|
912
|
-
[Subscribe →](https://synalux.ai/pricing)
|
|
913
|
-
|
|
914
|
-
See [`docs/WOW_FEATURES.md`](docs/WOW_FEATURES.md) for the algorithm catalogue. Release notes in [`docs/releases/v14.0.0-prism-as-foundation.md`](docs/releases/v14.0.0-prism-as-foundation.md).
|
|
915
|
-
|
|
916
|
-
---
|
|
917
|
-
|
|
918
|
-
<details>
|
|
919
|
-
<summary>Architecture, cognitive systems, and full feature catalog</summary>
|
|
920
|
-
|
|
921
|
-
**Detailed docs in this repo:**
|
|
922
|
-
- [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md): system architecture, memory routing, HRR
|
|
923
|
-
- [`docs/COMPACTION.md`](docs/COMPACTION.md): how Prism handles LLM context compaction and ledger compaction
|
|
924
|
-
- [`docs/SETUP_GEMINI.md`](docs/SETUP_GEMINI.md): Gemini configuration
|
|
925
|
-
- [`docs/self-improving-agent.md`](docs/self-improving-agent.md): adversarial eval / anti-sycophancy
|
|
926
|
-
- [`docs/rfcs/`](docs/rfcs/): design RFCs
|
|
927
|
-
- [`docs/releases/`](docs/releases/): per-version release notes
|
|
928
|
-
- [`CHANGELOG.md`](CHANGELOG.md): version history (v12.5 Unified Billing, v11.6 Hivemind, v11.5.1 Auto-Scholar, etc.)
|
|
929
|
-
- [`CONTRIBUTING.md`](CONTRIBUTING.md): contributor guide
|
|
930
|
-
|
|
931
|
-
**The original 1933-line README is preserved in git history.** To browse the prior version (full feature catalog, Cognitive Architecture v7.8, Autonomous Cognitive OS v9.0, HRR Zero-Search, Adversarial Evaluation walkthroughs, Universal Import patterns, competitive analysis vs LangMem/MemGPT/Letta/Zep, v12.5 Unified Billing details, v11.6 Hivemind, v11.5.1 Auto-Scholar): `git show HEAD~1:README.md`.
|
|
932
|
-
|
|
933
|
-
</details>
|
|
426
|
+
It reads `~/.prism-mcp/data.db` and POSTs entries to the portal. Ledger entries are append-only and de-duped server-side; handoffs use last-write-wins per project. Re-running on the same DB is safe. This is a one-shot migration, not a sync daemon; after it completes, set `PRISM_STORAGE=synalux` (or leave it on `auto`).
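
Why re-running is safe follows from the two merge rules. The in-memory model below is a sketch only, not the portal's code: all names are hypothetical, de-duplication is assumed to key on an entry id, and "last write" is assumed to mean the most recently received write for a project.

```typescript
// Hypothetical model of the server-side merge rules described above.
type LedgerEntry = { id: string; project: string; summary: string };
type Handoff = { project: string; state: string };

class PortalStore {
  private ledger = new Map<string, LedgerEntry>(); // keyed by entry id
  private handoffs = new Map<string, Handoff>(); // keyed by project

  // Append-only with de-duplication: a known id is ignored, never overwritten.
  appendLedger(entry: LedgerEntry): boolean {
    if (this.ledger.has(entry.id)) return false;
    this.ledger.set(entry.id, entry);
    return true;
  }

  // Last-write-wins: the newest write replaces the project's handoff.
  putHandoff(handoff: Handoff): void {
    this.handoffs.set(handoff.project, handoff);
  }

  ledgerSize(): number {
    return this.ledger.size;
  }

  handoffFor(project: string): Handoff | undefined {
    return this.handoffs.get(project);
  }
}

// One migration pass; returns how many ledger entries were newly stored.
function migrate(store: PortalStore, entries: LedgerEntry[], handoffs: Handoff[]): number {
  let added = 0;
  for (const entry of entries) {
    if (store.appendLedger(entry)) added++;
  }
  for (const handoff of handoffs) {
    store.putHandoff(handoff);
  }
  return added;
}
```

In this model a second pass over the same data adds no ledger entries and rewrites each handoff with an identical value, so the stored state is unchanged.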
|
|
934
427
|
|
|
935
428
|
---
|
|
936
429
|
|
|
937
430
|
## License
|
|
938
431
|
|
|
939
|
-
|
|
432
|
+
| Product | License |
|
|
433
|
+
|---|---|
|
|
434
|
+
| **prism-mcp-server** (this repo) | [AGPL-3.0](LICENSE) |
|
|
435
|
+
| **VS Code extension** (synalux-ai.synalux) | BSL-1.1 |
|
|
436
|
+
| **Web IDE** (synalux.ai/coder) | Synalux Terms of Service |
|
|
437
|
+
| **Prism AAC** | AGPL-3.0 |
|
|
438
|
+
|
|
439
|
+
The AGPL-3.0 license covers the MCP server and its source code. The VS Code extension and Web IDE are separate products with their own licenses. Commercial hosted/managed deployment of the MCP server is available via the Synalux subscription.
|