prism-mcp-server 19.0.0 → 19.1.0
- package/README.md +129 -75
- package/dist/cli.js +2 -2
- package/dist/storage/sqlite.js +4 -2
- package/dist/tools/behavioralVerifierHandler.js +3 -4
- package/dist/tools/ledgerHandlers.js +7 -5
- package/dist/tools/prismInferHandler.js +87 -28
- package/dist/utils/entitlements.js +27 -7
- package/dist/utils/modelPicker.js +21 -22
- package/dist/utils/qualityGate.js +43 -0
- package/dist/utils/thinkStrip.js +26 -0
- package/dist/verification/gatekeeper.js +2 -1
- package/dist/verification/runner.js +7 -2
- package/dist/verification/schema.js +9 -1
- package/dist/verification/severityPolicy.js +12 -0
- package/package.json +2 -2
package/README.md
CHANGED
|
@@ -1,10 +1,6 @@
|
|
|
1
1
|
# Prism Coder
|
|
2
2
|
|
|
3
|
-
**
|
|
4
|
-
|
|
5
|
-
Prism Coder is a [Model Context Protocol](https://modelcontextprotocol.io) server that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions — semantic search, cognitive routing, and a visual dashboard. It ships alongside the open-weight `prism-coder` model fleet (1.7B-32B) for fast, offline tool-routing when you don't want a cloud round-trip.
|
|
6
|
-
|
|
7
|
-
It runs **fully local and free** on SQLite + Ollama with no API keys. A paid subscription adds cloud sync, higher model tiers, and team features through the Synalux portal.
|
|
3
|
+
**Give your AI agent memory that lasts.** Persistent sessions, knowledge graphs, and offline tool-routing — fully local and free.
|
|
8
4
|
|
|
9
5
|
[](https://www.npmjs.com/package/prism-mcp-server)
|
|
10
6
|
[](https://github.com/modelcontextprotocol/servers)
|
|
@@ -15,7 +11,10 @@ It runs **fully local and free** on SQLite + Ollama with no API keys. A paid sub
|
|
|
15
11
|
<img src="docs/v11_hivemind_multi_agent_dashboard.jpg" alt="Prism Coder — Mind Palace Dashboard with Knowledge Graph and Multi-Agent Hivemind" width="700" />
|
|
16
12
|
</p>
|
|
17
13
|
|
|
18
|
-
|
|
14
|
+
Prism Coder is an [MCP server](https://modelcontextprotocol.io) that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions. It ships with the open-weight `prism-coder` model fleet (2B–27B) for fast, offline tool-routing — no cloud required.
|
|
15
|
+
|
|
16
|
+
**No account needed. No API keys. Runs on your machine.**
|
|
17
|
+
A paid subscription adds cloud sync, higher model tiers, and team features through the [Synalux portal](https://synalux.ai).
|
|
19
18
|
|
|
20
19
|
---
|
|
21
20
|
|
|
@@ -39,18 +38,20 @@ Open Claude Desktop or Cursor and your agent now has memory backed by a local SQ
|
|
|
39
38
|
**Optional — local model fleet** for offline tool-routing. Pull whichever fits your hardware:
|
|
40
39
|
|
|
41
40
|
```bash
|
|
42
|
-
ollama pull dcostenco/prism-coder:2b # 2.3 GB ·
|
|
43
|
-
ollama pull dcostenco/prism-coder:4b # 3.4 GB · verifier
|
|
44
|
-
ollama pull dcostenco/prism-coder:
|
|
45
|
-
ollama pull dcostenco/prism-coder:
|
|
41
|
+
ollama pull dcostenco/prism-coder:2b # 2.3 GB · mobile / lightweight (99.1% routing accuracy)
|
|
42
|
+
ollama pull dcostenco/prism-coder:4b # 3.4 GB · verifier (100% accuracy)
|
|
43
|
+
ollama pull dcostenco/prism-coder:9b # 5.8 GB · default router (100% accuracy, Qwen3.5)
|
|
44
|
+
ollama pull dcostenco/prism-coder:27b # 16 GB · complex tasks (100% accuracy)
|
|
46
45
|
```
|
|
47
46
|
|
|
48
|
-
Prism detects both the namespaced (`dcostenco/prism-coder:
|
|
47
|
+
Prism detects both the namespaced (`dcostenco/prism-coder:9b`) and bare (`prism-coder:9b`) Ollama tags automatically.
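The tag detection described above can be sketched as follows. This is a minimal illustration, not the package's picker: `bareTag` and `findInstalledTag` are hypothetical names, and the only fact taken from the README is that `dcostenco/prism-coder:9b` and `prism-coder:9b` should be treated as the same model.

```javascript
// Namespace used in the `ollama pull` commands above.
const NAMESPACE = "dcostenco/";

// Hypothetical helper: drop the namespace prefix if it is present.
function bareTag(tag) {
  return tag.startsWith(NAMESPACE) ? tag.slice(NAMESPACE.length) : tag;
}

// Hypothetical helper: return the installed tag that refers to the wanted
// model, whichever form either side uses, or null when it is not installed.
function findInstalledTag(installedTags, wanted) {
  const target = bareTag(wanted);
  return installedTags.find((tag) => bareTag(tag) === target) ?? null;
}
```

Comparing on the bare form keeps the lookup symmetric, so a caller may ask for either spelling regardless of how the model was pulled.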
|
|
49
48
|
|
|
50
49
|
---
|
|
51
50
|
|
|
52
51
|
## What it does
|
|
53
52
|
|
|
53
|
+
Your AI agent forgets everything between sessions. Prism fixes that — and adds verification, drift detection, and multi-agent coordination on top.
|
|
54
|
+
|
|
54
55
|
### Mind Palace — persistent memory that survives across sessions
|
|
55
56
|
|
|
56
57
|
Every conversation feeds a persistent store. The next session loads the right context automatically — no re-explaining.
|
|
@@ -83,27 +84,17 @@ Long agent sessions can wander from their original goal. `session_detect_drift`
|
|
|
83
84
|
|
|
84
85
|
### Behavioral Verification — catch bad edits before they happen
|
|
85
86
|
|
|
86
|
-
AI agents
|
|
87
|
+
AI agents apply patterns from checklists without understanding the real-world impact. The `verify_behavior` tool challenges the agent with a scenario it must answer **before** editing — forcing it to think through what the end user will experience.
|
|
87
88
|
|
|
88
89
|
```
|
|
89
|
-
Agent: "I'll revert
|
|
90
|
-
Prism: "⚠️
|
|
91
|
-
|
|
92
|
-
Agent: "The ticket
|
|
93
|
-
Prism: "Correct — your revert would
|
|
90
|
+
Agent: "I'll revert this kitchen display change"
|
|
91
|
+
Prism: "⚠️ Scenario: A cook sees a 3-item ticket. One item is voided.
|
|
92
|
+
What should the cook see after the void?"
|
|
93
|
+
Agent: "The ticket stays visible with the remaining 2 items."
|
|
94
|
+
Prism: "Correct — your revert would hide the ticket entirely."
|
|
94
95
|
```
|
|
95
96
|
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
**How it works**: The `verify_behavior` tool calls the Synalux portal API, which matches the file path against domain scenarios stored in the database. The agent must answer the scenario concretely before editing. No local hooks required — works in Claude, Cursor, or any MCP client.
|
|
99
|
-
|
|
100
|
-
**Why it matters**: In a single audit session, 47 bugs were found across 7 days of AI-generated code. Every bug was introduced by an agent that applied a "correct" pattern without simulating the end-user journey. The behavioral verifier would have caught all of them.
|
|
101
|
-
|
|
102
|
-
| Tier | Coverage |
|
|
103
|
-
|------|----------|
|
|
104
|
-
| Free | Skill-based advisory (agent prompted to think before editing) |
|
|
105
|
-
| Standard+ | `verify_behavior` tool with 17 domain scenarios via API |
|
|
106
|
-
| Enterprise | Custom per-workspace scenarios |
|
|
97
|
+
17 built-in domains (billing, auth, ordering, clinical, HR, and more). Custom domains per workspace on Enterprise. No hooks needed — works in any MCP client.
|
|
107
98
|
|
|
108
99
|
### Time Travel
|
|
109
100
|
|
|
@@ -115,7 +106,7 @@ Roll back to any previous session state. Compare diffs between versions. Restore
|
|
|
115
106
|
|
|
116
107
|
### Cognitive Routing
|
|
117
108
|
|
|
118
|
-
|
|
109
|
+
Three memory types, automatically sorted: **episodic** (what happened — session logs, decisions), **semantic** (what's true — facts, architecture), and **procedural** (how to do X — workflows, patterns). When you search, the router picks the right store instead of dumping everything.
|
|
119
110
|
|
|
120
111
|
### Multi-Agent Hivemind
|
|
121
112
|
|
|
@@ -144,37 +135,53 @@ The free tier runs entirely on your machine. Paid tiers add cloud sync through t
|
|
|
144
135
|
| Memory storage | Local SQLite | Synalux portal (Supabase-backed) |
|
|
145
136
|
| Inference | Local Ollama models | Local models + cloud fallback |
|
|
146
137
|
| API keys required | None | Synalux subscription key |
|
|
147
|
-
| Web search / scrape | Not included |
|
|
138
|
+
| Web search / scrape | Not included | Via Synalux portal (provider keys server-side) |
|
|
148
139
|
| What leaves your machine | Nothing | Memory text + file paths + search queries, sent to the portal over TLS (PHI-redacted before transit) |
|
|
149
|
-
| Works offline |
|
|
140
|
+
| Works offline | ✅ | Local features yes; sync/cloud no |
|
|
150
141
|
|
|
151
|
-
**Handling sensitive data.**
|
|
142
|
+
**Handling sensitive data.** All cloud writes pass through automatic redaction (SSNs, dates of birth, medical record numbers, phone numbers, emails, and clinical identifiers are stripped before transit). For regulated workloads, run the **local tier** for full air-gap, or use **Enterprise** which includes a HIPAA Business Associate Agreement.
|
|
152
143
|
|
|
153
144
|
---
|
|
154
145
|
|
|
155
146
|
## Models
|
|
156
147
|
|
|
157
|
-
The `prism-coder` fleet uses Qwen3.5 for MCP tool-routing. The
|
|
148
|
+
The `prism-coder` fleet uses Qwen3.5 for MCP tool-routing AND general inference. The 9B and 27B are fine-tuned with LoRA (r=128, all 64 layers including DeltaNet); the 2B and 4B use stock Qwen3.5-4B at different quantization levels. The 27B scored 100% on BFCL function-calling and 100% on an internal 15-problem coding eval at $0 inference cost.
|
|
158
149
|
|
|
159
|
-
|
|
150
|
+
`prism_infer` supports three modes: `route` (tool routing, fast, nothink), `chat` (conversation with thinking), and `code` (code generation with thinking). In chat/code modes, the model uses `<think>` blocks for chain-of-thought reasoning, which are stripped before the response is served. If the local model fails a quality gate (empty, think-only, or truncated), paid tiers automatically escalate to Claude via the Synalux portal.
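A rough sketch of the strip-then-gate step described above. The package ships `thinkStrip.js` and `qualityGate.js`, but their contents are not shown in this diff, so `stripThinkBlocks` and `checkQuality` below are hypothetical stand-ins. The `doneReason` parameter is also an assumption about how a caller would signal that generation stopped at the token limit.

```javascript
// Remove complete <think>...</think> blocks. If a block was opened but never
// closed, everything from the opening tag onward is dropped as well.
function stripThinkBlocks(text) {
  const withoutClosed = text.replace(/<think>[\s\S]*?<\/think>/g, "");
  const open = withoutClosed.indexOf("<think>");
  const visible = open === -1 ? withoutClosed : withoutClosed.slice(0, open);
  return visible.trim();
}

// Check the three failure cases the README names: empty, think-only, truncated.
function checkQuality(raw, { doneReason = "stop" } = {}) {
  if (raw.trim() === "") return { ok: false, reason: "empty" };
  if (stripThinkBlocks(raw) === "") return { ok: false, reason: "think-only" };
  if (doneReason === "length") return { ok: false, reason: "truncated" };
  return { ok: true, reason: null };
}
```

A caller would serve `stripThinkBlocks(raw)` when `ok` is true and, on a paid tier, hand the prompt to the cloud path otherwise.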
|
|
151
|
+
|
|
152
|
+
| Model | Ollama tag | Size | [BFCL](https://gorilla.cs.berkeley.edu/blogs/12_bfcl_v3_multi_turn.html) Accuracy | Role | Tier |
|
|
160
153
|
|---|---|---|---|---|---|
|
|
161
154
|
| Qwen3.5-4B Q3_K_M | `prism-coder:2b` | 2.3 GB | 99.1% × 3 seeds | iPhone / mobile first gate | Free |
|
|
162
|
-
| Qwen3.5-4B Q4_K_M | `prism-coder:4b` | 3.4 GB | 100% × 3 seeds | Verifier
|
|
163
|
-
|
|
|
164
|
-
|
|
|
155
|
+
| Qwen3.5-4B Q4_K_M | `prism-coder:4b` | 3.4 GB | 100% × 3 seeds | Verifier | Free |
|
|
156
|
+
| Qwen3.5-9B (LoRA) | `prism-coder:9b` | 5.8 GB | 100% × 3 seeds | Default router | Standard+ |
|
|
157
|
+
| Qwen3.5-27B (LoRA) | `prism-coder:27b` | 16 GB | 100% × 3 seeds | Quality tier (DeltaNet, 28.5 tok/s) | Advanced+ |
|
|
165
158
|
|
|
166
159
|
Weights: [huggingface.co/dcostenco](https://huggingface.co/dcostenco) (public GGUF). Latency depends on model size and hardware — see [Benchmarks](#benchmarks) to measure it on your own machine rather than trusting a printed number.
|
|
167
160
|
|
|
168
161
|
### Cascade
|
|
169
162
|
|
|
170
163
|
```
|
|
171
|
-
query → prism-coder:
|
|
172
|
-
→
|
|
164
|
+
query → prism-coder:9b (local router, default)
|
|
165
|
+
→ prism-coder:4b (grounding verifier)
|
|
173
166
|
→ prism-coder:2b (iPhone / mobile, auto-selected by RAM)
|
|
174
|
-
→ prism-coder:
|
|
167
|
+
→ prism-coder:27b (complex tasks, on demand)
|
|
175
168
|
→ cloud fallback (paid tiers, for max quality)
|
|
176
169
|
```
|
|
177
170
|
|
|
171
|
+
### Multi-Layer Verification
|
|
172
|
+
|
|
173
|
+
Every tool-grounded answer on paid tiers passes through deterministic L3 routing rules and an NLI grounding verifier before reaching the user. Free-tier users get the deterministic gates (L1, L3-Tool, L3-Tier0) without the model-based NLI check.
|
|
174
|
+
|
|
175
|
+
| Layer | What | Model | Cost |
|
|
176
|
+
|---|---|---|---|
|
|
177
|
+
| **L1** | Crisis/medical safety gate | None (regex) | 0 ms |
|
|
178
|
+
| **L3-Tool** | Tool name remap + false-positive rejection | None (deterministic) | 0 ms |
|
|
179
|
+
| **L3-Tier0** | Integer grounding (set membership) | None (deterministic) | 0 ms |
|
|
180
|
+
| **L3-Tier2** | NLI verifier (claim → ENTAILED/NEUTRAL/CONTRADICTED) | prism-coder:2b | ~200 ms |
|
|
181
|
+
| **L4** | Hallucination judge (opt-out for clinical) | prism-coder:4b | ~500 ms |
|
|
182
|
+
|
|
183
|
+
Fail-closed on the verified path: when the grounding verifier runs (Standard tier and up), timeout, ambiguity, or missing evidence yields a refusal, not pass-through. Free-tier users get the deterministic L1/L3-Tool gates but not the NLI verifier.
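The fail-closed rule can be illustrated with a small decision function. This is a sketch under stated assumptions: the shape of the `verification` object (`verdict`, `evidence`, `timedOut`) and the refusal text are invented for illustration, and only the three verdict labels come from the table above.

```javascript
// Hypothetical refusal message; the real wording is not shown in this diff.
const REFUSAL = "Unable to verify this answer against the available evidence.";

// Serve the answer only on an explicit ENTAILED verdict backed by evidence.
// A timeout, NEUTRAL, CONTRADICTED, missing evidence, or a missing result
// all lead to a refusal instead of passing the answer through.
function gateAnswer(answer, verification) {
  const verified =
    verification != null &&
    verification.timedOut !== true &&
    verification.verdict === "ENTAILED" &&
    Array.isArray(verification.evidence) &&
    verification.evidence.length > 0;
  return verified
    ? { served: answer, refused: false }
    : { served: REFUSAL, refused: true };
}
```

Listing the one passing condition, rather than the failing ones, is what makes the gate fail-closed: any state the author did not anticipate ends in a refusal.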
|
|
184
|
+
|
|
178
185
|
---
|
|
179
186
|
|
|
180
187
|
## Benchmarks
|
|
@@ -184,15 +191,15 @@ query → prism-coder:14b (local router, Mac default)
|
|
|
184
191
|
```bash
|
|
185
192
|
git clone https://github.com/dcostenco/prism-coder && cd prism-coder
|
|
186
193
|
pip install anthropic requests
|
|
187
|
-
python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b
|
|
194
|
+
python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b 9b 27b
|
|
188
195
|
```
|
|
189
196
|
|
|
190
|
-
**Routing eval (115 cases, 12 categories, 3-seed mean).** On this narrow tool-routing task all fleet models achieve near-perfect accuracy. Be honest with yourself about what that means: the eval is **near-saturated** for this taxonomy — it measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is **offline routing reliability at zero cost**, not that a 2.3 GB model rivals a frontier model in general.
|
|
197
|
+
**Routing eval (115 cases, 12 categories, 3-seed mean).** Routing accuracy includes the deterministic L3 correction layer — the same rules that run in production. On this narrow tool-routing task all fleet models achieve near-perfect accuracy. Be honest with yourself about what that means: the eval is **near-saturated** for this taxonomy — it measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is **offline routing reliability at zero cost**, not that a 2.3 GB model rivals a frontier model in general.
|
|
191
198
|
|
|
192
199
|
| Model | Routing accuracy | Notes |
|
|
193
200
|
|---|---|---|
|
|
194
201
|
| prism-coder:2b (Q3_K_M) | 99.1% × 3 seeds | 1 failure: regex→knowledge_search |
|
|
195
|
-
| prism-coder:4b /
|
|
202
|
+
| prism-coder:4b / 9b / 27b | 100% × 3 seeds | Perfect on all 115 cases |
|
|
196
203
|
| Claude (frontier, same eval) | ~98% | Stronger everywhere outside this narrow task |
|
|
197
204
|
|
|
198
205
|
**Memory uplift (LoCoMo-Plus, self-published).** A separate long-context dialogue benchmark ([dcostenco/Locomo-Plus](https://github.com/dcostenco/Locomo-Plus)) measures how much structured memory helps a base model retain multi-day context. Results show large gains when a model is paired with Prism memory versus running raw. Note this benchmark is authored, run, and LLM-judged by this project — treat it as a reproducible demonstration, not an independent third-party result, and run it yourself with the commands in that repo.
|
|
@@ -207,30 +214,30 @@ These tables are the maintainer's assessment as of June 2026. Verify claims that
|
|
|
207
214
|
|
|
208
215
|
| Feature | Prism Coder | GitHub Copilot | Cursor | Windsurf | Amazon Q | Devin |
|
|
209
216
|
|---|:---:|:---:|:---:|:---:|:---:|:---:|
|
|
210
|
-
| Local inference (open-weight) |
|
|
211
|
-
| Works fully offline |
|
|
212
|
-
| Persistent cross-session memory |
|
|
213
|
-
| Session drift detection |
|
|
214
|
-
| L3 grounding verifier |
|
|
215
|
-
| Behavioral verification (pre-edit) |
|
|
216
|
-
| MCP server (tools + memory) |
|
|
217
|
-
| Web IDE |
|
|
218
|
-
| VS Code extension |
|
|
219
|
-
| Flat-rate team pricing |
|
|
220
|
-
| HIPAA BAA available |
|
|
217
|
+
| Local inference (open-weight) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
218
|
+
| Works fully offline | ✅ (free tier) | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
219
|
+
| Persistent cross-session memory | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
220
|
+
| Session drift detection | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
221
|
+
| L3 grounding verifier | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
222
|
+
| Behavioral verification (pre-edit) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
223
|
+
| MCP server (tools + memory) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
224
|
+
| Web IDE | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
|
|
225
|
+
| VS Code extension | ✅ | ✅ | — | — | ✅ | ❌ |
|
|
226
|
+
| Flat-rate team pricing | ✅ | ❌ (per-seat) | ❌ (per-seat) | ❌ | ❌ | ❌ |
|
|
227
|
+
| HIPAA BAA available | ✅ (Enterprise) | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
221
228
|
|
|
222
229
|
### vs local AI / memory tools
|
|
223
230
|
|
|
224
231
|
| Feature | Prism Coder | Ollama | LM Studio | Mem0 | Zep |
|
|
225
232
|
|---|:---:|:---:|:---:|:---:|:---:|
|
|
226
|
-
| Local inference cascade |
|
|
227
|
-
| Cloud fallback |
|
|
228
|
-
| Persistent cross-session memory |
|
|
229
|
-
| Knowledge ingestion (MCP + webhook) |
|
|
230
|
-
| Cognitive routing (3-store) |
|
|
231
|
-
| Session drift detection |
|
|
232
|
-
| Native MCP server |
|
|
233
|
-
| Web IDE + VS Code extension |
|
|
233
|
+
| Local inference cascade | ✅ | ✅ | ✅ | ❌ | ❌ |
|
|
234
|
+
| Cloud fallback | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
235
|
+
| Persistent cross-session memory | ✅ | ❌ | ❌ | ✅ | ✅ |
|
|
236
|
+
| Knowledge ingestion (MCP + webhook) | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
237
|
+
| Cognitive routing (3-store) | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
238
|
+
| Session drift detection | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
239
|
+
| Native MCP server | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
240
|
+
| Web IDE + VS Code extension | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
234
241
|
|
|
235
242
|
### Pricing — flat-rate, not per-seat
|
|
236
243
|
|
|
@@ -249,19 +256,19 @@ All on-device models are free to run locally via Ollama on every tier. A subscri
|
|
|
249
256
|
| | **Free** | **Standard** $19/mo | **Advanced** $49/mo | **Enterprise** $99/mo |
|
|
250
257
|
|---|---|---|---|---|
|
|
251
258
|
| Seats | 1 | 1 | up to 5 | up to 25 |
|
|
252
|
-
| Local model ceiling | up to 4b | up to
|
|
259
|
+
| Local model ceiling | up to 4b | up to 9b | up to 27b | up to 27b |
|
|
253
260
|
| Daily cloud inference | -- | 200 | 2,000 | 100,000 |
|
|
254
261
|
| Cloud Coder (Web IDE) | -- | 100/day | 1,000/day | 100,000/day |
|
|
255
262
|
| Cloud search | -- | 50/day | 500/day | 100,000/day |
|
|
256
263
|
| Max output tokens | 512 | 1,024 | 2,048 | 4,096 |
|
|
257
264
|
| Cloud fallback | -- | Claude Sonnet 4 | Claude Sonnet 4 | Priority + Sonnet 4 |
|
|
258
|
-
| Grounding verifier | -- |
|
|
259
|
-
| Memory sync (cloud) | -- |
|
|
265
|
+
| Grounding verifier (fact-check AI output) | -- | ✅ | ✅ | ✅ |
|
|
266
|
+
| Memory sync (cloud) | -- | ✅ | ✅ | ✅ |
|
|
260
267
|
| Knowledge / session memory | limited | unlimited | unlimited | unlimited |
|
|
261
|
-
| Analytics dashboard | -- |
|
|
262
|
-
| HIPAA BAA | -- | -- | -- |
|
|
268
|
+
| Analytics dashboard | -- | ✅ | ✅ | ✅ |
|
|
269
|
+
| HIPAA BAA | -- | -- | -- | ✅ |
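The "Local model ceiling" row above implies a clamp from the requested tier to whatever the plan allows. A minimal sketch follows, assuming lowercase plan names and a fallback to the free ceiling for unknown plans. The package exports a `clampCeiling` helper whose implementation is not shown here, so `clampToPlan` is a hypothetical equivalent and not that function.

```javascript
// Tiers in ascending size, matching the model_ceiling enum in this release.
const TIER_ORDER = ["2b", "4b", "9b", "27b"];

// Ceilings copied from the pricing table; the plan keys are an assumption.
const PLAN_CEILING = new Map([
  ["free", "4b"],
  ["standard", "9b"],
  ["advanced", "27b"],
  ["enterprise", "27b"],
]);

// Return the largest tier that is no bigger than both the request and the plan ceiling.
function clampToPlan(requested, plan) {
  const ceiling = PLAN_CEILING.get(plan) ?? "4b"; // unknown plan: free ceiling
  const wanted = TIER_ORDER.indexOf(requested);
  if (wanted === -1) return ceiling; // unrecognised request: use the plan ceiling
  return TIER_ORDER[Math.min(wanted, TIER_ORDER.indexOf(ceiling))];
}
```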
|
|
263
270
|
|
|
264
|
-
14-day free trial on paid plans.
|
|
271
|
+
14-day free trial on paid plans. 25+ seats: [contact sales](https://synalux.ai/support)
|
|
265
272
|
|
|
266
273
|
---
|
|
267
274
|
|
|
@@ -279,6 +286,26 @@ Prism exposes 40+ MCP tools. The core memory loop:
|
|
|
279
286
|
| `session_detect_drift` | Detect when a session has drifted from its goal |
|
|
280
287
|
| `verify_behavior` | Pre-edit scenario challenge — catch bad changes before they happen |
|
|
281
288
|
| `knowledge_ingest` | Teach Prism a codebase or document |
|
|
289
|
+
| `prism_infer` | Local-first inference (route/chat/code modes, thinking, cloud escalation) |
|
|
290
|
+
|
|
291
|
+
### `prism_infer` — local-first inference with cloud escalation
|
|
292
|
+
|
|
293
|
+
```typescript
|
|
294
|
+
prism_infer({
|
|
295
|
+
prompt: "Write a binary search in Python",
|
|
296
|
+
mode: "code", // "route" | "chat" | "code"
|
|
297
|
+
think: true, // enable <think> reasoning (default: true for chat/code)
|
|
298
|
+
model_ceiling: "27b", // use the quality tier
|
|
299
|
+
})
|
|
300
|
+
// → 27B generates code locally ($0), with thinking for quality
|
|
301
|
+
// → If quality gate fails + paid tier → auto-escalate to Claude
|
|
302
|
+
```
|
|
303
|
+
|
|
304
|
+
| Mode | Think | Model | Use case |
|
|
305
|
+
|------|-------|-------|----------|
|
|
306
|
+
| `route` | Off (fast) | 9B default | MCP tool routing |
|
|
307
|
+
| `chat` | On | 27B preferred | Conversation, reasoning |
|
|
308
|
+
| `code` | On | 27B preferred | Code generation, debugging |
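The defaults in the mode table can be expressed as a small resolver. This is an illustrative sketch, not the handler's code: `resolveInferOptions` and the `preferredModel` field are hypothetical names, while the default values themselves are taken from the table above.

```javascript
// Fill in the per-mode defaults: `route` runs without thinking on the 9B
// tier, while `chat` and `code` enable thinking and prefer the 27B tier.
function resolveInferOptions({ mode = "route", think } = {}) {
  if (!["route", "chat", "code"].includes(mode)) {
    throw new RangeError(`unknown mode: ${mode}`);
  }
  const reasoningMode = mode !== "route";
  return {
    mode,
    think: think ?? reasoningMode, // an explicit true/false from the caller wins
    preferredModel: reasoningMode ? "27b" : "9b",
  };
}
```

Using `??` rather than `||` matters here: a caller passing `think: false` in `chat` mode keeps thinking switched off instead of being overridden by the default.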
|
|
282
309
|
|
|
283
310
|
Full TypeScript signatures live in [`src/tools/`](src/tools/); architecture in [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md).
|
|
284
311
|
|
|
@@ -324,6 +351,8 @@ prism register-models # alias dcostenco/prism-coder:* -> prism-coder:*
|
|
|
324
351
|
|
|
325
352
|
## Companions
|
|
326
353
|
|
|
354
|
+
Prism works alongside these tools — use whichever fits your workflow.
|
|
355
|
+
|
|
327
356
|
### Web IDE — Prism Coder
|
|
328
357
|
|
|
329
358
|
A browser-based IDE at [synalux.ai/coder](https://synalux.ai/coder). Import any GitHub repo and get:
|
|
@@ -358,13 +387,16 @@ code --install-extension synalux-ai.synalux
|
|
|
358
387
|
|
|
359
388
|
[](https://marketplace.visualstudio.com/items?itemName=synalux-ai.synalux)
|
|
360
389
|
|
|
361
|
-
|
|
362
|
-
|
|
363
|
-
**Clinical features (BCBA / healthcare):** SOAP note generator, role-based access, document signing, patient board. Voice recording with AES-256-GCM encryption (consent-gated, off by default, plaintext deleted after encryption).
|
|
390
|
+
AI chat, voice input, SOAP note generator, team collaboration, and video calls — all inside VS Code. Routes through local Ollama by default; cloud on paid tiers.
|
|
364
391
|
|
|
365
|
-
|
|
392
|
+
<details>
|
|
393
|
+
<summary>Feature details</summary>
|
|
366
394
|
|
|
367
|
-
**
|
|
395
|
+
- **AI**: Chat participant (`@synalux`), multi-agent pipeline, voice input, model switching, 10 tones
|
|
396
|
+
- **Clinical**: SOAP note generator, role-based access, document signing, patient board
|
|
397
|
+
- **Collaboration**: Team chat, DMs, video calls, customer board, visual builder, DevContainers
|
|
398
|
+
- **Privacy**: Local Ollama by default. `preferLocal=true` tries local first. Enterprise BAA available.
|
|
399
|
+
</details>
|
|
368
400
|
|
|
369
401
|
### Prism AAC
|
|
370
402
|
|
|
@@ -374,6 +406,28 @@ See [github.com/dcostenco/prism-aac](https://github.com/dcostenco/prism-aac)
|
|
|
374
406
|
|
|
375
407
|
---
|
|
376
408
|
|
|
409
|
+
## Git Hooks (Portable)
|
|
410
|
+
|
|
411
|
+
Pre-commit and pre-push security hooks that work with any editor, any AI tool, and direct CLI. No Claude Code dependency.
|
|
412
|
+
|
|
413
|
+
```bash
|
|
414
|
+
# Install in all repos (one-time)
|
|
415
|
+
bash synalux-private/scripts/install-git-hooks.sh
|
|
416
|
+
|
|
417
|
+
# Or install manually in a single repo
|
|
418
|
+
cp hooks/pre-commit .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
|
|
419
|
+
cp hooks/pre-push .git/hooks/pre-push && chmod +x .git/hooks/pre-push
|
|
420
|
+
```
|
|
421
|
+
|
|
422
|
+
| Hook | What it checks | Mode |
|
|
423
|
+
|------|----------------|------|
|
|
424
|
+
| `pre-commit` | Dead code, orphan services, scaffold code, missing auth | `PRECOMMIT_MODE=advisory\|block\|off` |
|
|
425
|
+
| `pre-push` | 19-rule security audit (SSRF, SQL injection, secrets, IDOR, etc.) | `PREPUSH_MODE=advisory\|block\|off` |
|
|
426
|
+
|
|
427
|
+
Default mode is `advisory` (warn but allow). Set `*_MODE=block` for hard enforcement. Hooks look for full audit scripts in the repo first (`hooks/lib/`), then `~/.claude/hooks/` fallback, then minimal inline checks.
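The three modes map naturally onto an exit code, since Git aborts a commit or push when the hook exits non-zero. The sketch below is written in JavaScript for consistency with the package, although the hooks themselves may well be shell scripts. `hookExitCode` is a hypothetical name, and treating an unrecognised value as `advisory` is an assumption.

```javascript
// Map a *_MODE setting and the number of findings to a hook exit code.
function hookExitCode(mode, findingCount) {
  const known = ["advisory", "block", "off"];
  const effective = known.includes(mode) ? mode : "advisory"; // documented default
  if (effective === "off") return 0; // checks are skipped entirely
  if (effective === "block" && findingCount > 0) return 1; // abort the git operation
  return 0; // advisory: warn but allow
}
```

A hook script could call it as `hookExitCode(process.env.PRECOMMIT_MODE, findings.length)` and pass the result to `process.exit`.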
|
|
428
|
+
|
|
429
|
+
---
|
|
430
|
+
|
|
377
431
|
## Self-hosting (Enterprise)
|
|
378
432
|
|
|
379
433
|
Run the full model stack on your own hardware — no cloud, full data sovereignty.
|
|
@@ -381,11 +435,11 @@ Run the full model stack on your own hardware — no cloud, full data sovereignt
|
|
|
381
435
|
**Requirements:** Mac M2 Pro+ (48 GB recommended) or Linux + NVIDIA GPU, plus [Ollama](https://ollama.com).
|
|
382
436
|
|
|
383
437
|
```bash
|
|
384
|
-
ollama pull dcostenco/prism-coder:
|
|
438
|
+
ollama pull dcostenco/prism-coder:9b # default router
|
|
385
439
|
export LOCAL_LLM_URL=http://localhost:11434
|
|
386
440
|
```
|
|
387
441
|
|
|
388
|
-
Routing is automatic: `
|
|
442
|
+
Routing is automatic: `9b → 4b → cloud fallback` on desktop/server, `2b → cloud fallback` on mobile/iPhone. For iOS or another machine on the same network, run `OLLAMA_HOST=0.0.0.0 ollama serve` and point `LOCAL_LLM_URL` at the host's IP.
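The two cascades above can be sketched as a lookup over the tiers that are actually pulled. This simplification treats "tier not pulled" as the only reason to move to the next step, whereas the real router may also fall through on timeouts or quality failures. `pickBackend` and its return shape are hypothetical.

```javascript
// Walk the documented cascade for the platform and return the first usable step.
function pickBackend(platform, pulledTiers, cloudAllowed) {
  // Desktop/server: 9b, then 4b. Mobile: 2b only.
  const cascade = platform === "mobile" ? ["2b"] : ["9b", "4b"];
  const local = cascade.find((tier) => pulledTiers.includes(tier));
  if (local) return { backend: "local", model: `prism-coder:${local}` };
  // No local tier is available: use the cloud only when the caller allows it.
  return cloudAllowed
    ? { backend: "cloud", model: null }
    : { backend: "none", model: null };
}
```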
|
|
389
443
|
|
|
390
444
|
---
|
|
391
445
|
|
package/dist/cli.js
CHANGED
|
@@ -521,10 +521,10 @@ scmCmd
|
|
|
521
521
|
});
|
|
522
522
|
// ─── prism register-models ────────────────────────────────────
|
|
523
523
|
// Convenience: alias namespaced HF-style prism-coder tags
|
|
524
|
-
// (`dcostenco/prism-coder:
|
|
524
|
+
// (`dcostenco/prism-coder:9b`) to the bare tags (`prism-coder:9b`)
|
|
525
525
|
// some external tooling expects. The MCP picker handles both forms
|
|
526
526
|
// natively as of v15.5, so this command is OPTIONAL — useful only
|
|
527
|
-
// when a user wants to run `ollama run prism-coder:
|
|
527
|
+
// when a user wants to run `ollama run prism-coder:9b` directly,
|
|
528
528
|
// or for tools that pre-date the picker's namespace fallback.
|
|
529
529
|
program
|
|
530
530
|
.command('register-models')
|
package/dist/storage/sqlite.js
CHANGED
|
@@ -1268,7 +1268,7 @@ export class SqliteStorage {
|
|
|
1268
1268
|
FROM session_ledger
|
|
1269
1269
|
WHERE project = ? AND user_id = ? AND role = ?
|
|
1270
1270
|
AND event_type = 'correction'
|
|
1271
|
-
AND importance >=
|
|
1271
|
+
AND importance >= 0
|
|
1272
1272
|
AND deleted_at IS NULL
|
|
1273
1273
|
AND archived_at IS NULL
|
|
1274
1274
|
ORDER BY importance DESC
|
|
@@ -2323,10 +2323,12 @@ export class SqliteStorage {
|
|
|
2323
2323
|
SET importance = MAX(0, importance - 1)
|
|
2324
2324
|
WHERE project = ? AND user_id = ?
|
|
2325
2325
|
AND importance > 0
|
|
2326
|
+
AND importance < 10
|
|
2326
2327
|
AND event_type != 'session'
|
|
2327
2328
|
AND created_at < datetime('now', '-' || ? || ' days')
|
|
2329
|
+
AND (last_accessed_at IS NULL OR last_accessed_at < datetime('now', '-' || ? || ' days'))
|
|
2328
2330
|
AND deleted_at IS NULL`,
|
|
2329
|
-
args: [project, userId, decayDays],
|
|
2331
|
+
args: [project, userId, decayDays, decayDays],
|
|
2330
2332
|
});
|
|
2331
2333
|
const decayed = result.rowsAffected || 0;
|
|
2332
2334
|
if (decayed > 0) {
|
package/dist/tools/behavioralVerifierHandler.js
CHANGED
|
@@ -10,7 +10,6 @@
|
|
|
10
10
|
*/
|
|
11
11
|
import { PRISM_SYNALUX_BASE_URL, SYNALUX_CONFIGURED } from "../config.js";
|
|
12
12
|
import { getSynaluxJwt } from "../utils/synaluxJwt.js";
|
|
13
|
-
import { debugLog } from "../utils/logger.js";
|
|
14
13
|
const FALLBACK_SCENARIO = [
|
|
15
14
|
"⚠️ BEHAVIORAL VERIFICATION (OFFLINE MODE)",
|
|
16
15
|
"",
|
|
@@ -30,7 +29,7 @@ export async function verifyBehaviorHandler(args) {
|
|
|
30
29
|
}
|
|
31
30
|
const jwt = await getSynaluxJwt();
|
|
32
31
|
if (!jwt) {
|
|
33
|
-
|
|
32
|
+
console.error("[verify-behavior] ⚠️ JWT unavailable — fail-closed with generic scenario");
|
|
34
33
|
return FALLBACK_SCENARIO;
|
|
35
34
|
}
|
|
36
35
|
try {
|
|
@@ -49,14 +48,14 @@ export async function verifyBehaviorHandler(args) {
|
|
|
49
48
|
signal: AbortSignal.timeout(5_000),
|
|
50
49
|
});
|
|
51
50
|
if (!res.ok) {
|
|
52
|
-
|
|
51
|
+
console.error(`[verify-behavior] ⚠️ portal returned ${res.status} — fail-closed. URL: ${url}`);
|
|
53
52
|
return FALLBACK_SCENARIO;
|
|
54
53
|
}
|
|
55
54
|
const data = (await res.json());
|
|
56
55
|
return formatResult(data);
|
|
57
56
|
}
|
|
58
57
|
catch (err) {
|
|
59
|
-
|
|
58
|
+
console.error(`[verify-behavior] ⚠️ VERIFICATION FAILED: ${err.message} — using generic fallback`);
|
|
60
59
|
return FALLBACK_SCENARIO;
|
|
61
60
|
}
|
|
62
61
|
}
|
package/dist/tools/ledgerHandlers.js
CHANGED
|
@@ -977,15 +977,17 @@ export async function sessionLoadContextHandler(args) {
|
|
|
977
977
|
// Build the response object before v4.0 augmentations
|
|
978
978
|
// SECURITY: Wrap output in boundary tags to prevent context confusion.
|
|
979
979
|
// The LLM sees <prism_memory context="historical"> and knows this is data, not instructions.
|
|
980
|
-
|
|
981
|
-
//
|
|
982
|
-
//
|
|
983
|
-
// formatted output so the agent sees them prominently.
|
|
980
|
+
// ─── v19.1: Behavioral Warnings — BEFORE skills (protected from truncation) ───
|
|
981
|
+
// Corrections must surface prominently. Placed before skillBlock so the
|
|
982
|
+
// skill budget cannot push them out. Capped at 2,000 chars.
|
|
984
983
|
const behavWarnings = data?.behavioral_warnings;
|
|
984
|
+
let behavBlock = '';
|
|
985
985
|
if (behavWarnings && behavWarnings.length > 0) {
|
|
986
|
-
|
|
986
|
+
const rawBlock = `\n\n[⚠️ BEHAVIORAL WARNINGS — DO NOT IGNORE]\n` +
|
|
987
987
|
behavWarnings.map(w => `- ${w.summary} (importance: ${w.importance})`).join("\n");
|
|
988
|
+
behavBlock = [...rawBlock].slice(0, 2000).join('');
|
|
988
989
|
}
|
|
990
|
+
let responseText = `${MEMORY_BOUNDARY_PREFIX}📋 Session context for "${project}" (${level}):\n\n${formattedContext.trim()}${splitBrainWarning}${driftReport}${briefingBlock}${sdmRecallBlock}${greetingBlock}${visualMemoryBlock}${behavBlock}${skillBlock}${versionNote}`;
|
|
989
991
|
// ─── v9.4.7: ABA Precision Protocol (foundational) ────────
|
|
990
992
|
// Injected into EVERY session load so the agent always operates
|
|
991
993
|
// under these behavioral rules. Never truncated (placed before
|
package/dist/tools/prismInferHandler.js
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
* prism_infer — local-first inference tool
|
|
3
3
|
* ─────────────────────────────────────────────────────────────
|
|
4
4
|
* Save the caller's cloud tokens by routing to a local prism-coder
|
|
5
|
-
* model via Ollama. Tiers (
|
|
5
|
+
* model via Ollama. Tiers (27B/9B/8B/1.7B) auto-selected by free
|
|
6
6
|
* RAM, then capped by `model_ceiling` and the set of tags that are
|
|
7
7
|
* actually pulled into Ollama.
|
|
8
8
|
*
|
|
@@ -12,7 +12,7 @@
|
|
|
12
12
|
* 4. On local fail, if cloud_fallback=true:
|
|
13
13
|
* - exchange synalux_sk_ → JWT (cached)
|
|
14
14
|
* - POST synalux portal /api/v1/prism-aac/inference
|
|
15
|
-
* - portal runs its own cascade (
|
|
15
|
+
* - portal runs its own cascade (9B/27B/Claude by tier)
|
|
16
16
|
* 5. Return { output, backend, model_picked, ram_free_mb, latency_ms, used_cloud }
|
|
17
17
|
*
|
|
18
18
|
* `prism_infer` is a thin client. It never calls Anthropic / OpenRouter
|
|
@@ -24,16 +24,17 @@ import { getSynaluxJwt, invalidateSynaluxJwt } from "../utils/synaluxJwt.js";
|
|
|
24
24
|
import { getAvailableMemoryBytes } from "../utils/availableMemory.js";
|
|
25
25
|
import { PRISM_SYNALUX_BASE_URL, PRISM_LOCAL_LLM_URL, } from "../config.js";
|
|
26
26
|
import { debugLog } from "../utils/logger.js";
|
|
27
|
-
import { verifyGrounding } from "../utils/groundingVerifier.js";
|
|
28
27
|
import { getEntitlements, clampCeiling } from "../utils/entitlements.js";
|
|
29
28
|
import { ddLog } from "../utils/ddLogger.js";
|
|
29
|
+
import { stripThink } from "../utils/thinkStrip.js";
|
|
30
|
+
import { passesQualityGate } from "../utils/qualityGate.js";
|
|
30
31
|
// ─── Tool Definition ────────────────────────────────────────────
|
|
31
32
|
export const PRISM_INFER_TOOL = {
|
|
32
33
|
name: "prism_infer",
|
|
33
34
|
description: "Run an inference on a local prism-coder model (Ollama) to save cloud tokens. " +
|
|
34
|
-
"Picks the largest viable tier —
|
|
35
|
+
"Picks the largest viable tier — 27B / 9B / 8B / 1.7B — based on free RAM at call time, " +
|
|
35
36
|
"clamped by `model_ceiling` and what is actually pulled in Ollama. " +
|
|
36
|
-
"Falls through to the synalux portal cloud cascade (
|
|
37
|
+
"Falls through to the synalux portal cloud cascade (9B → 27B → Claude Opus 4.7) " +
|
|
37
38
|
"only when local is unviable AND `cloud_fallback=true`. " +
|
|
38
39
|
"Use this for code generation, summarisation, classification, or any synth task you would " +
|
|
39
40
|
"otherwise hand to the cloud model — it costs $0 when the local hit succeeds.",
|
|
@@ -60,8 +61,8 @@ export const PRISM_INFER_TOOL = {
|
|
|
60
61
|
},
|
|
61
62
|
model_ceiling: {
|
|
62
63
|
type: "string",
|
|
63
|
-
enum: ["
|
|
64
|
-
description: "Cap the largest tier the picker may select. e.g. '
|
|
64
|
+
enum: ["27b", "9b", "4b", "2b"],
|
|
65
|
+
description: "Cap the largest tier the picker may select. e.g. '9b' forbids 27B even if RAM allows.",
|
|
65
66
|
},
|
|
66
67
|
cloud_fallback: {
|
|
67
68
|
type: "boolean",
|
|
@@ -70,7 +71,7 @@ export const PRISM_INFER_TOOL = {
|
|
|
70
71
|
},
|
|
71
72
|
timeout_ms: {
|
|
72
73
|
type: "number",
|
|
73
|
-
description: "Override per-call timeout. Default scales with model size:
|
|
74
|
+
description: "Override per-call timeout. Default scales with model size: 27B=120s, 9B=60s, 4B=20s, 1.7B=15s.",
|
|
74
75
|
},
|
|
75
76
|
evidence: {
|
|
76
77
|
type: "array",
|
|
@@ -103,6 +104,20 @@ export const PRISM_INFER_TOOL = {
|
|
|
103
104
|
description: "Override the verifier hard timeout. Default 2000 ms.",
|
|
104
105
|
default: 2000,
|
|
105
106
|
},
|
|
107
|
+
mode: {
|
|
108
|
+
type: "string",
|
|
109
|
+
enum: ["route", "chat", "code"],
|
|
110
|
+
description: "Execution mode. 'route' (default) for MCP tool routing — fast, nothink. " +
|
|
111
|
+
"'chat' for general conversation — uses thinking, escalates to cloud on failure. " +
|
|
112
|
+
"'code' for code generation — uses thinking, larger context. " +
|
|
113
|
+
"In chat/code modes, prefers the 27B tier and enables <think> reasoning.",
|
|
114
|
+
default: "route",
|
|
115
|
+
},
|
|
116
|
+
think: {
|
|
117
|
+
type: "boolean",
|
|
118
|
+
description: "Enable thinking mode (<think> blocks). Default: true for chat/code, false for route. " +
|
|
119
|
+
"Thinking improves quality on complex tasks but adds latency (~2-5s).",
|
|
120
|
+
},
|
|
106
121
|
},
|
|
107
122
|
required: ["prompt"],
|
|
108
123
|
},
|
|
@@ -124,7 +139,12 @@ export function isPrismInferArgs(args) {
|
|
|
124
139
|
if (a.timeout_ms !== undefined && typeof a.timeout_ms !== "number")
|
|
125
140
|
return false;
|
|
126
141
|
if (a.model_ceiling !== undefined &&
|
|
127
|
-
!["
|
|
142
|
+
!["27b", "9b", "4b", "2b"].includes(a.model_ceiling))
|
|
143
|
+
return false;
|
|
144
|
+
if (a.mode !== undefined &&
|
|
145
|
+
!["route", "chat", "code"].includes(a.mode))
|
|
146
|
+
return false;
|
|
147
|
+
if (a.think !== undefined && typeof a.think !== "boolean")
|
|
128
148
|
return false;
|
|
129
149
|
if (a.verify !== undefined && typeof a.verify !== "boolean")
|
|
130
150
|
return false;
|
|
@@ -147,9 +167,9 @@ export function isPrismInferArgs(args) {
|
|
|
147
167
|
}
|
|
148
168
|
// ─── Ollama helpers ────────────────────────────────────────────
|
|
149
169
|
const DEFAULT_TIMEOUTS = {
|
|
150
|
-
"prism-coder:
|
|
151
|
-
"prism-coder:
|
|
152
|
-
"
|
|
170
|
+
"prism-coder:27b": 120_000,
|
|
171
|
+
"prism-coder:9b": 60_000,
|
|
172
|
+
"prism-coder:4b": 20_000,
|
|
153
173
|
"prism-coder:2b": 15_000,
|
|
154
174
|
};
|
|
155
175
|
/** List Ollama-installed tags. Returns null if Ollama unreachable. */
|
|
@@ -194,16 +214,20 @@ export async function listOllamaLoaded(url = PRISM_LOCAL_LLM_URL) {
|
|
|
194
214
|
return new Set();
|
|
195
215
|
}
|
|
196
216
|
}
|
|
197
|
-
async function callOllamaGenerate(url, model, prompt, system, maxTokens, temperature, timeoutMs) {
|
|
217
|
+
async function callOllamaGenerate(url, model, prompt, system, maxTokens, temperature, timeoutMs, think) {
|
|
198
218
|
try {
|
|
219
|
+
const messages = [];
|
|
220
|
+
if (system)
|
|
221
|
+
messages.push({ role: "system", content: system });
|
|
222
|
+
messages.push({ role: "user", content: prompt });
|
|
199
223
|
const body = {
|
|
200
224
|
model,
|
|
201
|
-
|
|
202
|
-
...(system ? { system } : {}),
|
|
225
|
+
messages,
|
|
203
226
|
stream: false,
|
|
227
|
+
...(think !== undefined ? { think } : {}),
|
|
204
228
|
options: { num_predict: maxTokens, temperature },
|
|
205
229
|
};
|
|
206
|
-
const res = await fetch(`${url}/api/
|
|
230
|
+
const res = await fetch(`${url}/api/chat`, {
|
|
207
231
|
method: "POST",
|
|
208
232
|
headers: { "Content-Type": "application/json" },
|
|
209
233
|
body: JSON.stringify(body),
|
|
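The switch from a single prompt string to a `messages` array can be sketched as follows. This is a minimal illustration, assuming the field layout shown in the hunk above; `buildChatBody` is a hypothetical helper name, not a function in the package.

```javascript
// Hypothetical helper mirroring the request body assembled in callOllamaGenerate.
function buildChatBody(model, prompt, system, maxTokens, temperature, think) {
  const messages = [];
  // The system prompt becomes its own message instead of a top-level field.
  if (system) messages.push({ role: "system", content: system });
  messages.push({ role: "user", content: prompt });
  return {
    model,
    messages,
    stream: false,
    // `think` is only included when the caller set it explicitly.
    ...(think !== undefined ? { think } : {}),
    options: { num_predict: maxTokens, temperature },
  };
}

const chatBody = buildChatBody("prism-coder:9b", "Summarise this diff.", undefined, 256, 0.2, false);
console.log(JSON.stringify(chatBody, null, 2));
```

Leaving `think` out entirely when it is undefined lets the server apply its own default rather than forcing a value.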
@@ -215,10 +239,10 @@ async function callOllamaGenerate(url, model, prompt, system, maxTokens, tempera
|
|
|
215
239
|
const data = (await res.json());
|
|
216
240
|
if (data.error)
|
|
217
241
|
return { ok: false, reason: `ollama_err:${data.error}` };
|
|
218
|
-
const text = (data.
|
|
242
|
+
const text = (data.message?.content ?? "").trim();
|
|
219
243
|
if (!text)
|
|
220
244
|
return { ok: false, reason: "empty_response" };
|
|
221
|
-
return { ok: true, text };
|
|
245
|
+
return { ok: true, text, doneReason: data.done_reason };
|
|
222
246
|
}
|
|
223
247
|
catch (err) {
|
|
224
248
|
const name = err instanceof Error ? err.name : "Unknown";
|
|
@@ -280,8 +304,11 @@ export async function runInfer(args, deps) {
|
|
|
280
304
|
// Fetch user's plan limits (cached 5 minutes). Free users without auth
|
|
281
305
|
// get 4b ceiling, 50 calls/day, 512 max tokens.
|
|
282
306
|
const ent = deps.entitlements ?? await getEntitlements();
|
|
283
|
-
//
|
|
284
|
-
|
|
307
|
+
// MF2: In chat/code modes, request the 27B tier (subject to plan ceiling + RAM).
|
|
308
|
+
// mode:"code" implies quality → start higher in the cascade.
|
|
309
|
+
const mode = args.mode ?? "route";
|
|
310
|
+
const modeCeiling = (mode === "chat" || mode === "code") ? (args.model_ceiling ?? "27b") : args.model_ceiling;
|
|
311
|
+
const effectiveCeiling = clampCeiling(modeCeiling, ent.model_ceiling);
|
|
285
312
|
// Clamp max_tokens to plan limit
|
|
286
313
|
const maxTokens = Math.min(args.max_tokens ?? 1024, ent.max_tokens, 8192);
|
|
287
314
|
// Cloud fallback only for paid plans
|
|
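The interaction between `mode`, `model_ceiling`, and the plan ceiling can be sketched as below. `clampCeiling` is not shown in this diff, so `clampCeilingSketch` is an assumption: it takes the lower of the requested and plan ceilings and falls back to the plan ceiling when nothing is requested. The real function may behave differently.

```javascript
const TIER_ORDER = ["2b", "4b", "9b", "27b"];

// ASSUMPTION: stand-in for clampCeiling, whose body is not part of this diff.
function clampCeilingSketch(requested, planCeiling) {
  if (requested === undefined) return planCeiling;
  return TIER_ORDER.indexOf(requested) <= TIER_ORDER.indexOf(planCeiling)
    ? requested
    : planCeiling;
}

// Mirrors the mode handling added in runInfer.
function resolveCeilingAndThink(args, planCeiling) {
  const mode = args.mode ?? "route";
  // chat/code ask for 27b unless the caller set an explicit ceiling.
  const modeCeiling = (mode === "chat" || mode === "code")
    ? (args.model_ceiling ?? "27b")
    : args.model_ceiling;
  return {
    ceiling: clampCeilingSketch(modeCeiling, planCeiling),
    think: args.think ?? (mode !== "route"),
  };
}

console.log(resolveCeilingAndThink({ mode: "code" }, "4b"));
console.log(resolveCeilingAndThink({ mode: "chat", model_ceiling: "9b" }, "27b"));
```

Under this sketch a free-plan caller in `code` mode still lands on the plan ceiling, since the mode only raises the request, never the entitlement.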
@@ -327,16 +354,16 @@ export async function runInfer(args, deps) {
|
|
|
327
354
|
// Walk the tier table top → bottom, capped by model_ceiling. Each tier
|
|
328
355
|
// logs its skip reason ("not_pulled" / "ram_insufficient" / fail reason)
|
|
329
356
|
// so the caller can see exactly why each tier was bypassed.
|
|
357
|
+
let localDraft = null;
|
|
330
358
|
if (installed) {
|
|
331
|
-
// Find start index from ceiling — if no ceiling, start at the top (32B).
|
|
332
359
|
const ceilStart = effectiveCeiling
|
|
333
360
|
? Math.max(0, MODEL_TIERS.findIndex(t => t.tag.endsWith(`:${effectiveCeiling}`)))
|
|
334
361
|
: 0;
|
|
335
362
|
let anyViable = false;
|
|
336
363
|
for (let i = ceilStart; i < MODEL_TIERS.length; i++) {
|
|
337
364
|
const tier = MODEL_TIERS[i];
|
|
338
|
-
// Accept the tier whether Ollama reports it as bare (`prism-coder:
|
|
339
|
-
// or namespaced (`dcostenco/prism-coder:
|
|
365
|
+
// Accept the tier whether Ollama reports it as bare (`prism-coder:27b`)
|
|
366
|
+
// or namespaced (`dcostenco/prism-coder:27b`, the form `ollama pull`
|
|
340
367
|
// produces from a HF repo). resolveOllamaName returns the actual
|
|
341
368
|
// name Ollama knows so /api/chat finds the model.
|
|
342
369
|
const ollamaName = resolveOllamaName(tier.tag, installed);
|
|
@@ -353,9 +380,27 @@ export async function runInfer(args, deps) {
|
|
|
353
380
|
}
|
|
354
381
|
anyViable = true;
|
|
355
382
|
const timeout = args.timeout_ms ?? DEFAULT_TIMEOUTS[tier.tag] ?? 60_000;
|
|
356
|
-
const
|
|
383
|
+
const enableThink = args.think ?? (mode !== "route");
|
|
384
|
+
const result = await deps.callLocal(deps.ollamaUrl, ollamaName, args.prompt, args.system, maxTokens, temperature, timeout, enableThink);
|
|
357
385
|
if (result.ok) {
|
|
358
|
-
|
|
386
|
+
const { stripped, thinkOnly } = stripThink(result.text);
|
|
387
|
+
const output = stripped;
|
|
388
|
+
// Quality gate for chat/code modes
|
|
389
|
+
if (mode !== "route") {
|
|
390
|
+
const gate = passesQualityGate(output, thinkOnly, result.doneReason);
|
|
391
|
+
if (!gate.pass && allowCloud) {
|
|
392
|
+
debugLog(`[prism_infer] quality gate FAIL (${gate.reason}) — escalating to cloud`);
|
|
393
|
+
attempts.push({ tier: tier.tag, reason: `quality_gate:${gate.reason}` });
|
|
394
|
+
if (gate.reason === "hard_truncation" || gate.reason === "loop_detected") {
|
|
395
|
+
localDraft = { output, tier: tier.tag };
|
|
396
|
+
}
|
|
397
|
+
break;
|
|
398
|
+
}
|
|
399
|
+
if (!gate.pass) {
|
|
400
|
+
debugLog(`[prism_infer] quality gate FAIL (${gate.reason}) — no cloud, serving local`);
|
|
401
|
+
}
|
|
402
|
+
}
|
|
403
|
+
return await applyVerification(output, gatedArgs, deps, {
|
|
359
404
|
backend: `ollama-${tier.tag.replace("prism-coder:", "")}`,
|
|
360
405
|
model_picked: tier.tag,
|
|
361
406
|
ram_free_mb: ramFreeMb,
|
|
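The branch structure around the quality gate can be condensed into a small decision function. This is a sketch of the control flow shown in the hunk above, not code from the package; `decideAfterGate` and its return shape are hypothetical.

```javascript
// Hypothetical summary of what runInfer does with a gate result in chat/code modes.
function decideAfterGate(gate, allowCloud) {
  if (gate.pass) return { action: "serve_local", keepDraft: false };
  // Without cloud fallback the local output is served even though the gate failed.
  if (!allowCloud) return { action: "serve_local", keepDraft: false };
  // Truncated or looping output is kept as a draft in case the cloud call also fails.
  const keepDraft = gate.reason === "hard_truncation" || gate.reason === "loop_detected";
  return { action: "escalate", keepDraft };
}

console.log(decideAfterGate({ pass: false, reason: "think_only" }, true));
console.log(decideAfterGate({ pass: false, reason: "hard_truncation" }, true));
```

Only partially useful output is retained as a draft; a `think_only` or `empty_response` failure leaves nothing worth serving.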
@@ -393,7 +438,20 @@ export async function runInfer(args, deps) {
|
|
|
393
438
|
else {
|
|
394
439
|
attempts.push({ tier: "synalux", reason: "cloud_fallback_disabled" });
|
|
395
440
|
}
|
|
396
|
-
//
|
|
441
|
+
// Cloud also failed — serve the local draft if we have one
|
|
442
|
+
if (localDraft) {
|
|
443
|
+
debugLog(`[prism_infer] cloud failed, serving gate-failed local draft from ${localDraft.tier}`);
|
|
444
|
+
return await applyVerification(localDraft.output, gatedArgs, deps, {
|
|
445
|
+
backend: `ollama-${localDraft.tier.replace("prism-coder:", "")}`,
|
|
446
|
+
model_picked: localDraft.tier,
|
|
447
|
+
ram_free_mb: ramFreeMb,
|
|
448
|
+
latency_ms: Date.now() - t0,
|
|
449
|
+
used_cloud: false,
|
|
450
|
+
attempts,
|
|
451
|
+
plan: ent.plan,
|
|
452
|
+
quality_gate_failed: true,
|
|
453
|
+
});
|
|
454
|
+
}
|
|
397
455
|
const err = new Error(`prism_infer: no backend produced output. attempts=${JSON.stringify(attempts)}, free=${fmtGb(freeBytes)}`);
|
|
398
456
|
err.attempts = attempts;
|
|
399
457
|
throw err;
|
|
@@ -407,10 +465,10 @@ export async function runInfer(args, deps) {
|
|
|
407
465
|
*/
|
|
408
466
|
async function applyVerification(draft, args, deps, partial) {
|
|
409
467
|
const shouldVerify = args.verify ?? (args.evidence !== undefined && args.evidence.length > 0);
|
|
410
|
-
if (!shouldVerify) {
|
|
468
|
+
if (!shouldVerify || !deps.callVerifier) {
|
|
411
469
|
return { ...partial, output: draft };
|
|
412
470
|
}
|
|
413
|
-
const verifier = deps.callVerifier
|
|
471
|
+
const verifier = deps.callVerifier;
|
|
414
472
|
const outcome = await verifier({
|
|
415
473
|
draft,
|
|
416
474
|
evidence: args.evidence ?? [],
|
|
@@ -451,6 +509,7 @@ export async function prismInferHandler(args) {
|
|
|
451
509
|
` free_ram=${result.ram_free_mb}MB` +
|
|
452
510
|
` latency=${result.latency_ms}ms` +
|
|
453
511
|
` used_cloud=${result.used_cloud}` +
|
|
512
|
+
(result.quality_gate_failed ? ` quality_gate_failed=true` : "") +
|
|
454
513
|
(result.verification ? ` verify=${result.verification.action}` : "") +
|
|
455
514
|
(result.attempts.length ? ` attempts=${JSON.stringify(result.attempts)}` : "");
|
|
456
515
|
return {
|
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
* to enforce model ceiling, max_tokens, and feature gates.
|
|
7
7
|
*
|
|
8
8
|
* Unauthenticated users (no SYNALUX_API_KEY) get free-tier defaults.
|
|
9
|
-
* Authenticated users get their plan from the portal (
|
|
9
|
+
* Authenticated users get their plan from the portal (5-minute cache).
|
|
10
10
|
*/
|
|
11
11
|
import { getSynaluxJwt } from "./synaluxJwt.js";
|
|
12
12
|
import { PRISM_SYNALUX_BASE_URL, SYNALUX_CONFIGURED } from "../config.js";
|
|
@@ -32,10 +32,10 @@ const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
|
|
|
32
32
|
let cache = null;
|
|
33
33
|
let inFlight = null;
|
|
34
34
|
// ── Model tier ordering for ceiling enforcement ───────────────────
|
|
35
|
-
const TIER_ORDER = ["2b", "4b", "
|
|
35
|
+
const TIER_ORDER = ["2b", "4b", "9b", "27b"];
|
|
36
36
|
/**
|
|
37
37
|
* Returns true if `requested` exceeds `ceiling`.
|
|
38
|
-
* e.g. ceilingExceeded("
|
|
38
|
+
* e.g. ceilingExceeded("9b", "4b") → true (9b > 4b ceiling)
|
|
39
39
|
*/
|
|
40
40
|
export function ceilingExceeded(requested, ceiling) {
|
|
41
41
|
const reqIdx = TIER_ORDER.indexOf(requested);
|
|
@@ -79,12 +79,18 @@ async function fetchEntitlements() {
|
|
|
79
79
|
redirect: "error",
|
|
80
80
|
});
|
|
81
81
|
if (!res.ok) {
|
|
82
|
-
debugLog(`[entitlements] portal HTTP ${res.status}
|
|
82
|
+
debugLog(`[entitlements] portal HTTP ${res.status}`);
|
|
83
|
+
if (cache) {
|
|
84
|
+
debugLog("[entitlements] using last-known-good (safety fail-closed)");
|
|
85
|
+
return cache.entitlements;
|
|
86
|
+
}
|
|
83
87
|
return FREE_ENTITLEMENTS;
|
|
84
88
|
}
|
|
85
89
|
const data = (await res.json());
|
|
86
90
|
if (!data.plan || !data.model_ceiling) {
|
|
87
|
-
debugLog("[entitlements] malformed response
|
|
91
|
+
debugLog("[entitlements] malformed response");
|
|
92
|
+
if (cache)
|
|
93
|
+
return cache.entitlements;
|
|
88
94
|
return FREE_ENTITLEMENTS;
|
|
89
95
|
}
|
|
90
96
|
debugLog(`[entitlements] plan=${data.plan} ceiling=${data.model_ceiling} ` +
|
|
@@ -92,7 +98,14 @@ async function fetchEntitlements() {
|
|
|
92
98
|
return data;
|
|
93
99
|
}
|
|
94
100
|
catch (err) {
|
|
95
|
-
debugLog(`[entitlements] fetch error: ${err instanceof Error ? err.message : String(err)}
|
|
101
|
+
debugLog(`[entitlements] fetch error: ${err instanceof Error ? err.message : String(err)}`);
|
|
102
|
+
// F1 fix: fail-closed — keep last-known-good entitlements on fetch error.
|
|
103
|
+
// Safety controls (grounding_verifier) must not degrade on availability failures.
|
|
104
|
+
if (cache) {
|
|
105
|
+
debugLog("[entitlements] using last-known-good (safety fail-closed)");
|
|
106
|
+
return cache.entitlements;
|
|
107
|
+
}
|
|
108
|
+
debugLog("[entitlements] no cached entitlements — free tier fallback (cold start)");
|
|
96
109
|
return FREE_ENTITLEMENTS;
|
|
97
110
|
}
|
|
98
111
|
}
|
|
@@ -111,7 +124,14 @@ export async function getEntitlements() {
|
|
|
111
124
|
inFlight = (async () => {
|
|
112
125
|
try {
|
|
113
126
|
const ent = await fetchEntitlements();
|
|
114
|
-
cache
|
|
127
|
+
// Only update cache if this is a REAL fetch (not a cached fallback).
|
|
128
|
+
// fetchEntitlements returns cache.entitlements on error — detect by
|
|
129
|
+
// checking if the returned object is the exact same reference.
|
|
130
|
+
const isFallback = cache && ent === cache.entitlements;
|
|
131
|
+
if (!isFallback) {
|
|
132
|
+
cache = { entitlements: ent, expiresAt: Date.now() + CACHE_TTL_MS };
|
|
133
|
+
}
|
|
134
|
+
// On fallback: DON'T refresh expiresAt — let it expire so we retry.
|
|
115
135
|
return ent;
|
|
116
136
|
}
|
|
117
137
|
finally {
|
|
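The last-known-good behaviour relies on reference identity to tell a real fetch from a cached fallback. The sketch below is a synchronous simplification of that idea; `refreshSketch` and its `fetcher` argument (which returns `null` on failure) are hypothetical, and the real code is async and talks to the portal.

```javascript
const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes, as in the diff
const FREE_ENTITLEMENTS = { plan: "free", model_ceiling: "4b" };
let entCache = null;

// Hypothetical stand-in for getEntitlements + fetchEntitlements combined.
function refreshSketch(fetcher, now) {
  const fresh = fetcher();
  // On failure, prefer the last-known-good entry over the free tier.
  const ent = fresh ?? (entCache ? entCache.entitlements : FREE_ENTITLEMENTS);
  // Same object reference means nothing new was fetched.
  const isFallback = entCache !== null && ent === entCache.entitlements;
  if (!isFallback) {
    entCache = { entitlements: ent, expiresAt: now + CACHE_TTL_MS };
  }
  // On fallback, expiresAt is left alone so the next call retries.
  return ent;
}

const first = refreshSketch(() => ({ plan: "pro", model_ceiling: "27b" }), 0);
const second = refreshSketch(() => null, 1000);
console.log(first === second, entCache.expiresAt);
```

Not extending `expiresAt` on fallback is the design choice that keeps an outage from pinning stale entitlements indefinitely.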
@@ -1,23 +1,22 @@
|
|
|
1
1
|
/**
|
|
2
2
|
* RAM-Gated Local Model Picker
|
|
3
3
|
* ─────────────────────────────────────────────────────────────
|
|
4
|
-
* Cascade:
|
|
4
|
+
* Cascade: 9b (default) → 4b (verifier) → 2b (mobile) → 27b (quality).
|
|
5
5
|
*
|
|
6
|
-
* The default ceiling is "
|
|
7
|
-
* -
|
|
6
|
+
* The default ceiling is "9b" — NOT "27b". This means:
|
|
7
|
+
* - 9b is the primary model for routing + general inference (Qwen3.5-9B, 100% BFCL)
|
|
8
8
|
* - 4b is used as the grounding verifier (fast, small)
|
|
9
|
-
* - 2b is the mobile/iPhone first gate (Qwen3.5-
|
|
10
|
-
* -
|
|
9
|
+
* - 2b is the mobile/iPhone first gate (Qwen3.5-2B, 99.1% BFCL)
|
|
10
|
+
* - 27b is only loaded when caller explicitly passes ceiling="27b"
|
|
11
11
|
* or when the task requires maximum quality (complex code gen, etc.)
|
|
12
12
|
*
|
|
13
|
-
* This saves
|
|
14
|
-
* The 14b achieves 100% on eval_300 — same as 32b.
|
|
13
|
+
* This saves 11GB+ RAM vs 27b and keeps response times fast.
|
|
15
14
|
*
|
|
16
15
|
* tag weights need free ctx role
|
|
17
|
-
* prism-coder:
|
|
18
|
-
* prism-coder:
|
|
19
|
-
*
|
|
20
|
-
* prism-coder:2b ~ 2.3 GB ≥ 3 GB 8K mobile / iPhone (
|
|
16
|
+
* prism-coder:27b ~16 GB ≥ 20 GB 32K quality (on-demand, Qwen3.5 DeltaNet, 100% BFCL)
|
|
17
|
+
* prism-coder:9b ~ 5.8 GB ≥ 8 GB 32K default router (Qwen3.5, 100% BFCL)
|
|
18
|
+
* prism-coder:4b ~ 3.4 GB ≥ 5 GB 32K verifier (Qwen3.5, 100%)
|
|
19
|
+
* prism-coder:2b ~ 2.3 GB ≥ 3 GB 8K mobile / iPhone (Qwen3.5, 99.1%)
|
|
21
20
|
*
|
|
22
21
|
* Below 3 GB free → no local pick (caller must use cloud).
|
|
23
22
|
*/
|
|
@@ -27,30 +26,30 @@ const GB = 1024 ** 3;
|
|
|
27
26
|
* the first row whose minFreeGb fits within freeBytes.
|
|
28
27
|
*/
|
|
29
28
|
export const MODEL_TIERS = [
|
|
30
|
-
{ tag: 'prism-coder:
|
|
31
|
-
{ tag: 'prism-coder:
|
|
32
|
-
{ tag: '
|
|
29
|
+
{ tag: 'prism-coder:27b', weightsGb: 16, minFreeGb: 20, ctxTokens: 32_768 },
|
|
30
|
+
{ tag: 'prism-coder:9b', weightsGb: 5.8, minFreeGb: 8, ctxTokens: 32_768 },
|
|
31
|
+
{ tag: 'prism-coder:4b', weightsGb: 3.4, minFreeGb: 5, ctxTokens: 32_768 },
|
|
33
32
|
{ tag: 'prism-coder:2b', weightsGb: 2.3, minFreeGb: 3, ctxTokens: 8_192 },
|
|
34
33
|
];
|
|
35
34
|
/**
|
|
36
35
|
* True when `installed` matches `tierTag` either as a bare tag
|
|
37
|
-
* (`prism-coder:
|
|
38
|
-
* (`dcostenco/prism-coder:
|
|
39
|
-
* dcostenco/prism-coder:
|
|
36
|
+
* (`prism-coder:27b`) or as a namespaced HuggingFace-style tag
|
|
37
|
+
* (`dcostenco/prism-coder:27b`). The README documents `ollama pull
|
|
38
|
+
* dcostenco/prism-coder:27b`, so Ollama's /api/tags returns the
|
|
40
39
|
* namespaced form — without this matcher the picker would never
|
|
41
40
|
* see them and silently fall through to cloud.
|
|
42
41
|
*/
|
|
43
42
|
function tagMatches(installed, tierTag) {
|
|
44
43
|
return installed === tierTag || installed.endsWith(`/${tierTag}`);
|
|
45
44
|
}
|
|
46
|
-
/** Default ceiling:
|
|
47
|
-
export const DEFAULT_CEILING = "
|
|
45
|
+
/** Default ceiling: 9b. Pass ceiling="27b" explicitly for max quality. */
|
|
46
|
+
export const DEFAULT_CEILING = "9b";
|
|
48
47
|
/**
|
|
49
48
|
* Pick the best viable tier for the given free RAM.
|
|
50
|
-
* Default ceiling is
|
|
49
|
+
* Default ceiling is 9b — use ceiling="27b" only for complex tasks.
|
|
51
50
|
*
|
|
52
51
|
* @param freeBytes Result of os.freemem() — binary bytes
|
|
53
|
-
* @param ceiling Cap tier. Default "
|
|
52
|
+
* @param ceiling Cap tier. Default "9b". Pass "27b" for complex tasks.
|
|
54
53
|
* @param available Optional whitelist of installed Ollama tags.
|
|
55
54
|
*/
|
|
56
55
|
export function pickLocalModel(freeBytes, ceiling, available) {
|
|
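The body of `pickLocalModel` is not part of this hunk, so the following is only a sketch of the selection rule its comments describe: start at the ceiling, then take the first tier whose `minFreeGb` fits and that is installed. `pickSketch` is a hypothetical name, and passing `available` as an array is an assumption.

```javascript
const GB = 1024 ** 3;
const MODEL_TIERS = [
  { tag: "prism-coder:27b", weightsGb: 16, minFreeGb: 20, ctxTokens: 32_768 },
  { tag: "prism-coder:9b", weightsGb: 5.8, minFreeGb: 8, ctxTokens: 32_768 },
  { tag: "prism-coder:4b", weightsGb: 3.4, minFreeGb: 5, ctxTokens: 32_768 },
  { tag: "prism-coder:2b", weightsGb: 2.3, minFreeGb: 3, ctxTokens: 8_192 },
];

// Bare or namespaced match, as in tagMatches above.
function tagMatches(installed, tierTag) {
  return installed === tierTag || installed.endsWith(`/${tierTag}`);
}

// ASSUMPTION: illustrative re-implementation, not the package's pickLocalModel.
function pickSketch(freeBytes, ceiling = "9b", available) {
  const start = Math.max(0, MODEL_TIERS.findIndex(t => t.tag.endsWith(`:${ceiling}`)));
  for (const tier of MODEL_TIERS.slice(start)) {
    if (freeBytes < tier.minFreeGb * GB) continue; // not enough free RAM
    if (available && !available.some(name => tagMatches(name, tier.tag))) continue; // not pulled
    return tier.tag;
  }
  return null; // caller must use cloud
}

console.log(pickSketch(10 * GB));
console.log(pickSketch(24 * GB, "27b", ["dcostenco/prism-coder:9b"]));
```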
@@ -80,7 +79,7 @@ export function pickLocalModel(freeBytes, ceiling, available) {
|
|
|
80
79
|
}
|
|
81
80
|
/**
|
|
82
81
|
* Resolve a tier tag to the actual Ollama name installed locally.
|
|
83
|
-
* If `installed` contains a namespaced match (e.g. `dcostenco/prism-coder:
|
|
82
|
+
* If `installed` contains a namespaced match (e.g. `dcostenco/prism-coder:27b`),
|
|
84
83
|
* the namespaced form is returned so Ollama's /api/chat finds it.
|
|
85
84
|
* Falls back to the bare tag when only the bare form is present.
|
|
86
85
|
*/
|
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Quality Gate — deterministic check for obvious inference failures.
|
|
3
|
+
*
|
|
4
|
+
* NARROW by design: only high-precision signals that rarely false-positive.
|
|
5
|
+
* Does NOT judge correctness — that's the grounding verifier's job.
|
|
6
|
+
* Does NOT use refusal regex (too many false positives on legitimate output).
|
|
7
|
+
*
|
|
8
|
+
* Returns: { pass: boolean, reason?: string }
|
|
9
|
+
*/
|
|
10
|
+
/**
|
|
11
|
+
* Check if a model response passes the quality gate.
|
|
12
|
+
* @param stripped Response AFTER think-stripping (use stripThink first)
|
|
13
|
+
* @param thinkOnly True if the response was only <think> blocks with no answer
|
|
14
|
+
* @param finishReason Ollama's done_reason if available (e.g. "length" = truncated)
|
|
15
|
+
*/
|
|
16
|
+
export function passesQualityGate(stripped, thinkOnly, finishReason) {
|
|
17
|
+
// Signal 1: Think-only — model reasoned but produced no answer (check before empty)
|
|
18
|
+
if (thinkOnly) {
|
|
19
|
+
return { pass: false, reason: "think_only" };
|
|
20
|
+
}
|
|
21
|
+
// Signal 2: Empty or near-empty after stripping
|
|
22
|
+
if (stripped.trim().length < 5) {
|
|
23
|
+
return { pass: false, reason: "empty_response" };
|
|
24
|
+
}
|
|
25
|
+
// Signal 3: Hard truncation — Ollama reports done_reason="length"
|
|
26
|
+
// meaning the model hit num_predict before finishing
|
|
27
|
+
if (finishReason === "length") {
|
|
28
|
+
return { pass: false, reason: "hard_truncation" };
|
|
29
|
+
}
|
|
30
|
+
// Signal 4: Exact-loop — same sentence repeated 3+ times
|
|
31
|
+
const sentences = stripped.split(/[.!?\n]+/).map(s => s.trim()).filter(s => s.length > 10);
|
|
32
|
+
if (sentences.length >= 6) {
|
|
33
|
+
const counts = new Map();
|
|
34
|
+
for (const s of sentences) {
|
|
35
|
+
const key = s.toLowerCase();
|
|
36
|
+
counts.set(key, (counts.get(key) ?? 0) + 1);
|
|
37
|
+
if ((counts.get(key) ?? 0) >= 3) {
|
|
38
|
+
return { pass: false, reason: "loop_detected" };
|
|
39
|
+
}
|
|
40
|
+
}
|
|
41
|
+
}
|
|
42
|
+
return { pass: true };
|
|
43
|
+
}
|
|
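The exact-loop signal has two thresholds that are easy to miss: only sentences longer than 10 characters count, and the check is skipped unless at least 6 such sentences exist. The sketch below isolates that signal; `hasExactLoop` is a hypothetical name for the logic in Signal 4.

```javascript
// Isolated copy of the Signal 4 logic from passesQualityGate.
function hasExactLoop(text) {
  const sentences = text
    .split(/[.!?\n]+/)
    .map(s => s.trim())
    .filter(s => s.length > 10);
  if (sentences.length < 6) return false; // short outputs are never flagged
  const counts = new Map();
  for (const s of sentences) {
    const key = s.toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
    if (counts.get(key) >= 3) return true;
  }
  return false;
}

const looping =
  "First we load the model. The cache is warm now. Then we send the prompt. " +
  "The cache is warm now. Finally we read the reply. The cache is warm now.";
const shortRepeat =
  "The cache is warm now. The cache is warm now. The cache is warm now. " +
  "First we load the model. Then we send the prompt.";

console.log(hasExactLoop(looping), hasExactLoop(shortRepeat));
```

The second input repeats a sentence three times but has only five qualifying sentences, so it is not flagged; this keeps the gate narrow at the cost of missing loops in very short outputs.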
@@ -0,0 +1,26 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Think-Strip — remove <think>...</think> blocks from model output.
|
|
3
|
+
*
|
|
4
|
+
* Qwen3.5 uses <think> blocks for chain-of-thought reasoning.
|
|
5
|
+
* These must be stripped before serving to the user or passing
|
|
6
|
+
* to the grounding verifier (which would try to ground reasoning text).
|
|
7
|
+
*
|
|
8
|
+
* Returns: { stripped: string, thinkContent: string | null, thinkOnly: boolean }
|
|
9
|
+
*/
|
|
10
|
+
const THINK_RE = /<(?:think|\|synalux_think\|)>[\s\S]*?<\/(?:think|\|synalux_think\|)>\s*/g;
|
|
11
|
+
const UNCLOSED_THINK_RE = /<(?:think|\|synalux_think\|)>[\s\S]*$/;
|
|
12
|
+
export function stripThink(raw) {
|
|
13
|
+
if (!raw.includes("<think>") && !raw.includes("<|synalux_think|>")) {
|
|
14
|
+
return { stripped: raw, thinkContent: null, thinkOnly: false };
|
|
15
|
+
}
|
|
16
|
+
const thinkMatch = raw.match(/<(?:think|\|synalux_think\|)>([\s\S]*?)<\/(?:think|\|synalux_think\|)>/);
|
|
17
|
+
const thinkContent = thinkMatch ? thinkMatch[1].trim() : null;
|
|
18
|
+
let stripped = raw.replace(THINK_RE, "");
|
|
19
|
+
stripped = stripped.replace(UNCLOSED_THINK_RE, "");
|
|
20
|
+
stripped = stripped.trim();
|
|
21
|
+
return {
|
|
22
|
+
stripped,
|
|
23
|
+
thinkContent,
|
|
24
|
+
thinkOnly: stripped.length === 0 && raw.trim().length > 0,
|
|
25
|
+
};
|
|
26
|
+
}
|
|
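A few inputs make the three return fields easier to read. The block below repeats `stripThink` from the hunk above so it runs on its own; the sample strings are made up for illustration.

```javascript
const THINK_RE = /<(?:think|\|synalux_think\|)>[\s\S]*?<\/(?:think|\|synalux_think\|)>\s*/g;
const UNCLOSED_THINK_RE = /<(?:think|\|synalux_think\|)>[\s\S]*$/;

function stripThink(raw) {
  if (!raw.includes("<think>") && !raw.includes("<|synalux_think|>")) {
    return { stripped: raw, thinkContent: null, thinkOnly: false };
  }
  const thinkMatch = raw.match(/<(?:think|\|synalux_think\|)>([\s\S]*?)<\/(?:think|\|synalux_think\|)>/);
  const thinkContent = thinkMatch ? thinkMatch[1].trim() : null;
  // Remove closed blocks first, then anything after an unclosed opening tag.
  let stripped = raw.replace(THINK_RE, "");
  stripped = stripped.replace(UNCLOSED_THINK_RE, "");
  stripped = stripped.trim();
  return { stripped, thinkContent, thinkOnly: stripped.length === 0 && raw.trim().length > 0 };
}

// Closed block followed by an answer: reasoning removed, answer kept.
const answered = stripThink("<think>plan the reply</think>\nAnswer: 42");
// Unclosed block (for example, output cut off mid-reasoning): nothing left to serve.
const cutOff = stripThink("<think>still reasoning");
console.log(answered, cutOff);
```

Note that `thinkContent` captures only the first closed block, so it stays `null` for the unclosed case even though `thinkOnly` is true.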
@@ -15,8 +15,9 @@ export class Gatekeeper {
|
|
|
15
15
|
console.warn(`\n⚠️ [OVERRIDDEN] Verification Gate bypassed via administrator override.`);
|
|
16
16
|
// Enforce immutability and record audit trail context via environment variables
|
|
17
17
|
validatedResult.gate_override = true;
|
|
18
|
+
// F19 fix: process.env.USER is trivially spoofable — log it but note it's unauthenticated.
|
|
18
19
|
const actor = process.env.USER || process.env.USERNAME || 'unknown_user';
|
|
19
|
-
validatedResult.override_reason = validatedResult.override_reason || `CLI --force bypass
|
|
20
|
+
validatedResult.override_reason = validatedResult.override_reason || `CLI --force bypass (unauthenticated env.USER=${actor})`;
|
|
20
21
|
return { canContinue: true, validatedResult };
|
|
21
22
|
}
|
|
22
23
|
switch (validatedResult.gate_action) {
|
|
@@ -196,7 +196,12 @@ export class VerificationRunner {
|
|
|
196
196
|
* Throws an error if the hash does not match, ensuring test integrity.
|
|
197
197
|
*/
|
|
198
198
|
static verifyRubricHash(tests, harness) {
|
|
199
|
-
|
|
199
|
+
// F11 fix: include min_pass_rate in hash verification when harness has it.
|
|
200
|
+
// Try with min_pass_rate first; fall back to without for backward compat.
|
|
201
|
+
const minRate = harness.min_pass_rate;
|
|
202
|
+
const computed = minRate !== undefined
|
|
203
|
+
? computeRubricHash(tests, minRate)
|
|
204
|
+
: computeRubricHash(tests);
|
|
200
205
|
if (computed !== harness.rubric_hash) {
|
|
201
206
|
throw new Error(`Rubric hash mismatch. Expected ${harness.rubric_hash}, but computeRubricHash returned ${computed}. The tests have been modified since the harness was created.`);
|
|
202
207
|
}
|
|
@@ -405,7 +410,7 @@ export class VerificationRunner {
|
|
|
405
410
|
if (!targetCheck.ok) {
|
|
406
411
|
return { passed: false, error: `HTTP target blocked: ${targetCheck.reason}` };
|
|
407
412
|
}
|
|
408
|
-
const res = await fetch(a.target);
|
|
413
|
+
const res = await fetch(a.target, { redirect: "error" });
|
|
409
414
|
return res.status === a.expected
|
|
410
415
|
? { passed: true }
|
|
411
416
|
: { passed: false, error: `Expected status ${a.expected}, got ${res.status} for ${a.target}` };
|
|
@@ -56,8 +56,16 @@ export const TestSuiteSchema = z.object({
|
|
|
56
56
|
* @param tests - The array of TestAssertion to hash
|
|
57
57
|
* @returns Lowercase hex SHA-256 digest
|
|
58
58
|
*/
|
|
59
|
-
export function computeRubricHash(tests) {
|
|
59
|
+
export function computeRubricHash(tests, minPassRate) {
|
|
60
60
|
const sorted = [...tests].sort((a, b) => a.id.localeCompare(b.id));
|
|
61
|
+
// F11 fix: when minPassRate is provided, include it in the hash so the
|
|
62
|
+
// threshold can't be changed without invalidating the rubric.
|
|
63
|
+
// When omitted, hash only tests (backward compatible with existing harnesses).
|
|
64
|
+
if (minPassRate !== undefined) {
|
|
65
|
+
return createHash("sha256")
|
|
66
|
+
.update(JSON.stringify({ tests: sorted, min_pass_rate: minPassRate }))
|
|
67
|
+
.digest("hex");
|
|
68
|
+
}
|
|
61
69
|
return createHash("sha256")
|
|
62
70
|
.update(JSON.stringify(sorted))
|
|
63
71
|
.digest("hex");
|
|
@@ -44,6 +44,18 @@ export function resolveEffectiveSeverity(assertionSeverity, defaultSeverity) {
|
|
|
44
44
|
*/
|
|
45
45
|
export function evaluateSeverityGates(results, config) {
|
|
46
46
|
const failures = results.filter(r => !r.passed && !r.skipped);
|
|
47
|
+
// F10 fix: skipped critical (gate/abort) assertions count as failures.
|
|
48
|
+
// Crafting depends_on to skip critical checks must not neutralize the gate.
|
|
49
|
+
const skippedCritical = results.filter(r => r.skipped && (r.severity === 'gate' || r.severity === 'abort'));
|
|
50
|
+
if (skippedCritical.length > 0) {
|
|
51
|
+
const ids = skippedCritical.map(r => r.id).join(", ");
|
|
52
|
+
const hasAbort = skippedCritical.some(r => r.severity === 'abort');
|
|
53
|
+
return {
|
|
54
|
+
action: hasAbort ? "abort" : "block",
|
|
55
|
+
failed_assertions: skippedCritical,
|
|
56
|
+
summary: `${hasAbort ? 'ABORT' : 'BLOCKED'}: ${skippedCritical.length} critical assertion(s) were skipped [${ids}] — treating as failures.`
|
|
57
|
+
};
|
|
58
|
+
}
|
|
47
59
|
if (failures.length === 0) {
|
|
48
60
|
return {
|
|
49
61
|
action: "continue",
|
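The new early return for skipped critical assertions can be sketched in isolation. `skippedCriticalAction` is a hypothetical reduction of the block added to `evaluateSeverityGates`; it returns only the action and omits the summary text.

```javascript
// Hypothetical reduction: which action, if any, skipped critical assertions trigger.
function skippedCriticalAction(results) {
  const skippedCritical = results.filter(
    r => r.skipped && (r.severity === "gate" || r.severity === "abort")
  );
  if (skippedCritical.length === 0) return null; // fall through to normal failure handling
  return skippedCritical.some(r => r.severity === "abort") ? "abort" : "block";
}

const sampleResults = [
  { id: "t1", passed: true, skipped: false, severity: "gate" },
  { id: "t2", passed: false, skipped: true, severity: "gate" },
  { id: "t3", passed: false, skipped: true, severity: "warn" },
];
console.log(skippedCriticalAction(sampleResults));
```

Because this check runs before the ordinary failure count, a skipped `gate` assertion blocks even when every executed assertion passed.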
package/package.json
CHANGED
|
@@ -1,8 +1,8 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "prism-mcp-server",
|
|
3
|
-
"version": "19.
|
|
3
|
+
"version": "19.1.0",
|
|
4
4
|
"mcpName": "io.github.dcostenco/prism-coder",
|
|
5
|
-
"description": "Prism Coder
|
|
5
|
+
"description": "Prism Coder \u2014 Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 114 Agent Skills, PHI Guard, Tier Enforcement, Prompt-Based Skill Routing, Zero-Search HDC/HRR retrieval, HRR Semantic Drift Detection across BCBA/Coding/AAC domains, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder 1.7B\u201332B open-weights LLM fleet.",
|
|
6
6
|
"module": "index.ts",
|
|
7
7
|
"type": "module",
|
|
8
8
|
"main": "dist/server.js",
|