prism-mcp-server 19.0.0 → 19.0.1

package/README.md CHANGED
@@ -1,10 +1,6 @@
1
1
  # Prism Coder
2
2
 
3
- **Persistent memory and reliable tool-routing for AI agents.** *(formerly Prism MCP)*
4
-
5
- Prism Coder is a [Model Context Protocol](https://modelcontextprotocol.io) server that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions — semantic search, cognitive routing, and a visual dashboard. It ships alongside the open-weight `prism-coder` model fleet (1.7B-32B) for fast, offline tool-routing when you don't want a cloud round-trip.
6
-
7
- It runs **fully local and free** on SQLite + Ollama with no API keys. A paid subscription adds cloud sync, higher model tiers, and team features through the Synalux portal.
3
+ **Give your AI agent memory that lasts.** Persistent sessions, knowledge graphs, and offline tool-routing, fully local and free.
8
4
 
9
5
  [![npm](https://img.shields.io/npm/v/prism-mcp-server?color=cb0000&label=npm)](https://www.npmjs.com/package/prism-mcp-server)
10
6
  [![MCP Registry](https://img.shields.io/badge/MCP_Registry-listed-00ADD8)](https://github.com/modelcontextprotocol/servers)
@@ -15,7 +11,10 @@ It runs **fully local and free** on SQLite + Ollama with no API keys. A paid sub
15
11
  <img src="docs/v11_hivemind_multi_agent_dashboard.jpg" alt="Prism Coder — Mind Palace Dashboard with Knowledge Graph and Multi-Agent Hivemind" width="700" />
16
12
  </p>
17
13
 
18
- > **Renamed in v14:** the project is now **Prism Coder** to cover both the memory server and the model fleet. The npm package stays `prism-mcp-server`, so existing install URLs and `mcp.json` entries keep working.
14
+ Prism Coder is an [MCP server](https://modelcontextprotocol.io) that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions. It ships with the open-weight `prism-coder` model fleet (2B–32B) for fast, offline tool-routing, no cloud required.
15
+
16
+ **No account needed. No API keys. Runs on your machine.**
17
+ A paid subscription adds cloud sync, higher model tiers, and team features through the [Synalux portal](https://synalux.ai).
19
18
 
20
19
  ---
21
20
 
@@ -39,18 +38,20 @@ Open Claude Desktop or Cursor and your agent now has memory backed by a local SQ
39
38
  **Optional — local model fleet** for offline tool-routing. Pull whichever fits your hardware:
40
39
 
41
40
  ```bash
42
- ollama pull dcostenco/prism-coder:2b # 2.3 GB · iPhone / mobile first gate (Qwen3.5-4B Q3_K_M, 99.1%)
43
- ollama pull dcostenco/prism-coder:4b # 3.4 GB · verifier + 8 GB+ devices (Qwen3.5-4B Q4_K_M, 100%)
44
- ollama pull dcostenco/prism-coder:14b # 8.4 GB · Mac default router (100%)
45
- ollama pull dcostenco/prism-coder:32b # 16 GB · Mac complex tasks (100%)
41
+ ollama pull dcostenco/prism-coder:2b # 2.3 GB · mobile / lightweight (99.1% routing accuracy)
42
+ ollama pull dcostenco/prism-coder:4b # 3.4 GB · verifier (100% accuracy)
43
+ ollama pull dcostenco/prism-coder:9b # 5.8 GB · default router (100% accuracy, Qwen3.5)
44
+ ollama pull dcostenco/prism-coder:32b # 19 GB · complex tasks (100% accuracy)
46
45
  ```
47
46
 
48
- Prism detects both the namespaced (`dcostenco/prism-coder:14b`) and bare (`prism-coder:14b`) Ollama tags automatically.
47
+ Prism detects both the namespaced (`dcostenco/prism-coder:9b`) and bare (`prism-coder:9b`) Ollama tags automatically.
49
48
 
50
49
  ---
51
50
 
52
51
  ## What it does
53
52
 
53
+ Your AI agent forgets everything between sessions. Prism fixes that — and adds verification, drift detection, and multi-agent coordination on top.
54
+
54
55
  ### Mind Palace — persistent memory that survives across sessions
55
56
 
56
57
  Every conversation feeds a persistent store. The next session loads the right context automatically — no re-explaining.
@@ -83,27 +84,17 @@ Long agent sessions can wander from their original goal. `session_detect_drift`
83
84
 
84
85
  ### Behavioral Verification — catch bad edits before they happen
85
86
 
86
- AI agents pattern-match on checklists instead of thinking through user impact. The behavioral verifier challenges the agent with a domain-specific scenario **before** editing code like an ABA antecedent intervention.
87
+ AI agents apply patterns from checklists without understanding the real-world impact. The `verify_behavior` tool challenges the agent with a scenario it must answer **before** editing — forcing it to think through what the end user will experience.
87
88
 
88
89
  ```
89
- Agent: "I'll revert the KDS bump logic"
90
- Prism: "⚠️ Kitchen worker scenario: A cook has a 3-item ticket.
91
- One item is voided. What should the cook see on the KDS?"
92
- Agent: "The ticket should stay visible with the remaining 2 items."
93
- Prism: "Correct — your revert would remove the ticket entirely. Don't revert."
90
+ Agent: "I'll revert this kitchen display change"
91
+ Prism: "⚠️ Scenario: A cook sees a 3-item ticket. One item is voided.
92
+ What should the cook see after the void?"
93
+ Agent: "The ticket stays visible with the remaining 2 items."
94
+ Prism: "Correct — your revert would hide the ticket entirely."
94
95
  ```
95
96
 
96
- **17 built-in domains**: KDS, billing, auth, voice ordering, webhooks, migrations, EU routing, clinical (HIPAA/FHIR), HR, accounting, chat, STT, privacy, loyalty, discounts, drawer operations, order lifecycle. Custom domains can be added per workspace.
97
-
98
- **How it works**: The `verify_behavior` tool calls the Synalux portal API, which matches the file path against domain scenarios stored in the database. The agent must answer the scenario concretely before editing. No local hooks required — works in Claude, Cursor, or any MCP client.
99
-
100
- **Why it matters**: In a single audit session, 47 bugs were found across 7 days of AI-generated code. Every bug was introduced by an agent that applied a "correct" pattern without simulating the end-user journey. The behavioral verifier would have caught all of them.
101
-
102
- | Tier | Coverage |
103
- |------|----------|
104
- | Free | Skill-based advisory (agent prompted to think before editing) |
105
- | Standard+ | `verify_behavior` tool with 17 domain scenarios via API |
106
- | Enterprise | Custom per-workspace scenarios |
97
+ 17 built-in domains (billing, auth, ordering, clinical, HR, and more). Custom domains per workspace on Enterprise. No hooks needed; works in any MCP client.
107
98
 
108
99
  ### Time Travel
109
100
 
@@ -115,7 +106,7 @@ Roll back to any previous session state. Compare diffs between versions. Restore
115
106
 
116
107
  ### Cognitive Routing
117
108
 
118
- Episodic (what happened), semantic (what's true), and procedural (how to do X) memories live in separate stores; a router decides where to write and where to read.
109
+ Three memory types, automatically sorted: **episodic** (what happened — session logs, decisions), **semantic** (what's true — facts, architecture), and **procedural** (how to do X: workflows, patterns). When you search, the router picks the right store instead of dumping everything.
119
110
 
120
111
  ### Multi-Agent Hivemind
121
112
 
@@ -144,37 +135,51 @@ The free tier runs entirely on your machine. Paid tiers add cloud sync through t
144
135
  | Memory storage | Local SQLite | Synalux portal (Supabase-backed) |
145
136
  | Inference | Local Ollama models | Local models + cloud fallback |
146
137
  | API keys required | None | Synalux subscription key |
147
- | Web search / scrape | Not included | Routed through the Synalux portal (provider keys stay server-side). Search tools appear as `brave_web_search` in the MCP surface but are proxied through the portal for auth and billing. |
138
+ | Web search / scrape | Not included | Via Synalux portal (provider keys server-side) |
148
139
  | What leaves your machine | Nothing | Memory text + file paths + search queries, sent to the portal over TLS (PHI-redacted before transit) |
149
- | Works offline | Yes | Local features yes; sync/cloud no |
140
+ | Works offline | | Local features yes; sync/cloud no |
150
141
 
151
- **Handling sensitive data.** Memory text fields (summaries, decisions, handoff context, file paths) pass through a PHI-redaction step (SSN/DOB/MRN/phone/email and common clinical identifiers) before any cloud write. Knowledge ingestion chunks are also redacted before being sent to the LLM for Q&A synthesis. For regulated workloads, run the **local tier** to keep data on-device, or use an **Enterprise** plan, which is the tier that includes a HIPAA Business Associate Agreement. Prism does not claim blanket HIPAA compliance on the free or individual tiers — the on-device path is the air-gapped option.
142
+ **Handling sensitive data.** All cloud writes pass through automatic redaction (SSNs, dates of birth, medical record numbers, phone numbers, emails, and clinical identifiers are stripped before transit). For regulated workloads, run the **local tier** for full air-gap, or use **Enterprise** which includes a HIPAA Business Associate Agreement.
152
143
 
153
144
  ---
154
145
 
155
146
  ## Models
156
147
 
157
- The `prism-coder` fleet uses Qwen3.5 for MCP tool-routing. The 14B and 32B are fine-tuned from Qwen3; the 2B and 4B slots use stock Qwen3.5-4B with prompt engineering at different quantization levels (100% routing accuracy without fine-tuning). They are **not** general-purpose chat models — they route reliably and run offline; Claude and other frontier models remain better at reasoning, coding, and open-domain work. The intended pattern is local routing with an optional cloud fallback for hard cases.
148
+ The `prism-coder` fleet uses Qwen3.5 for MCP tool-routing. The 9B is fine-tuned with LoRA (r=128, all 64 layers including DeltaNet); the 2B and 4B use stock Qwen3.5-4B at different quantization levels. They are **not** general-purpose chat models — they route reliably and run offline; Claude and other frontier models remain better at reasoning, coding, and open-domain work. The intended pattern is local routing with an optional cloud fallback for hard cases.
158
149
 
159
- | Model | Ollama tag | Size | BFCL Accuracy | Role | Tier |
150
+ | Model | Ollama tag | Size | [BFCL](https://gorilla.cs.berkeley.edu/blogs/12_bfcl_v3_multi_turn.html) Accuracy | Role | Tier |
160
151
  |---|---|---|---|---|---|
161
152
  | Qwen3.5-4B Q3_K_M | `prism-coder:2b` | 2.3 GB | 99.1% × 3 seeds | iPhone / mobile first gate | Free |
162
- | Qwen3.5-4B Q4_K_M | `prism-coder:4b` | 3.4 GB | 100% × 3 seeds | Verifier + 8 GB+ devices | Free |
163
- | prism-coder:14b | `prism-coder:14b` | 8.4 GB | 100% × 3 seeds | Default router | Standard+ |
164
- | prism-coder:32b | `prism-coder:32b` | 16 GB | 100% × 3 seeds | Complex tasks | Advanced+ |
153
+ | Qwen3.5-4B Q4_K_M | `prism-coder:4b` | 3.4 GB | 100% × 3 seeds | Verifier | Free |
154
+ | Qwen3.5-9B (LoRA) | `prism-coder:9b` | 5.8 GB | 100% × 3 seeds | Default router | Standard+ |
155
+ | prism-coder:32b | `prism-coder:32b` | 19 GB | 100% × 3 seeds | Complex tasks | Advanced+ |
165
156
 
166
157
  Weights: [huggingface.co/dcostenco](https://huggingface.co/dcostenco) (public GGUF). Latency depends on model size and hardware — see [Benchmarks](#benchmarks) to measure it on your own machine rather than trusting a printed number.
167
158
 
168
159
  ### Cascade
169
160
 
170
161
  ```
171
- query → prism-coder:14b (local router, Mac default)
172
- qwen3.5:4b (grounding verifier)
162
+ query → prism-coder:9b (local router, default)
163
+ prism-coder:4b (grounding verifier)
173
164
  → prism-coder:2b (iPhone / mobile, auto-selected by RAM)
174
165
  → prism-coder:32b (complex tasks, on demand)
175
166
  → cloud fallback (paid tiers, for max quality)
176
167
  ```
177
168
 
169
+ ### Multi-Layer Verification
170
+
171
+ Every tool-grounded answer on paid tiers passes through deterministic L3 routing rules and an NLI grounding verifier before reaching the user. Free-tier users get the deterministic gates (L1, L3-Tool, L3-Tier0) without the model-based NLI check.
172
+
173
+ | Layer | What | Model | Cost |
174
+ |---|---|---|---|
175
+ | **L1** | Crisis/medical safety gate | None (regex) | 0 ms |
176
+ | **L3-Tool** | Tool name remap + false-positive rejection | None (deterministic) | 0 ms |
177
+ | **L3-Tier0** | Integer grounding (set membership) | None (deterministic) | 0 ms |
178
+ | **L3-Tier2** | NLI verifier (claim → ENTAILED/NEUTRAL/CONTRADICTED) | prism-coder:2b | ~200 ms |
179
+ | **L4** | Hallucination judge (opt-out for clinical) | prism-coder:4b | ~500 ms |
180
+
181
+ Fail-closed on the verified path: when the grounding verifier runs (Standard tier and up), timeout, ambiguity, or missing evidence yields a refusal, not pass-through. Free-tier users get the deterministic L1/L3-Tool gates but not the NLI verifier.
182
+
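The L3-Tier0 row above describes integer grounding as a set-membership check. A minimal sketch of that idea follows, assuming the check is "every integer in the answer must also appear somewhere in the evidence". The function names `extractIntegers` and `integersGrounded` are hypothetical and do not come from the package; the real rule set is not shown in this diff.

```javascript
// Hypothetical sketch of an integer-grounding gate (set membership).
// Not the package's implementation; names and regex are assumptions.

// Pull every run of digits out of a string as a number.
// Note: "99.1" yields 99 and 1, which is acceptable for a sketch.
function extractIntegers(text) {
  return (text.match(/\d+/g) ?? []).map(Number);
}

// An answer is grounded when each integer it mentions also occurs
// in at least one evidence string. An answer with integers but no
// evidence is rejected, which mirrors the fail-closed behaviour.
function integersGrounded(answer, evidence) {
  const allowed = new Set(evidence.flatMap(extractIntegers));
  return extractIntegers(answer).every((n) => allowed.has(n));
}
```

Because the check is a pure set lookup with no model call, it fits the table's description of a deterministic layer with negligible latency.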
178
183
  ---
179
184
 
180
185
  ## Benchmarks
@@ -184,15 +189,15 @@ query → prism-coder:14b (local router, Mac default)
184
189
  ```bash
185
190
  git clone https://github.com/dcostenco/prism-coder && cd prism-coder
186
191
  pip install anthropic requests
187
- python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b 14b 32b
192
+ python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b 9b 32b
188
193
  ```
189
194
 
190
- **Routing eval (115 cases, 12 categories, 3-seed mean).** On this narrow tool-routing task all fleet models achieve near-perfect accuracy. Be honest with yourself about what that means: the eval is **near-saturated** for this taxonomy — it measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is **offline routing reliability at zero cost**, not that a 2.3 GB model rivals a frontier model in general.
195
+ **Routing eval (115 cases, 12 categories, 3-seed mean).** Routing accuracy includes the deterministic L3 correction layer — the same rules that run in production. On this narrow tool-routing task all fleet models achieve near-perfect accuracy. Be honest with yourself about what that means: the eval is **near-saturated** for this taxonomy — it measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is **offline routing reliability at zero cost**, not that a 2.3 GB model rivals a frontier model in general.
191
196
 
192
197
  | Model | Routing accuracy | Notes |
193
198
  |---|---|---|
194
199
  | prism-coder:2b (Q3_K_M) | 99.1% × 3 seeds | 1 failure: regex→knowledge_search |
195
- | prism-coder:4b / 14b / 32b | 100% × 3 seeds | Perfect on all 115 cases |
200
+ | prism-coder:4b / 9b / 32b | 100% × 3 seeds | Perfect on all 115 cases |
196
201
  | Claude (frontier, same eval) | ~98% | Stronger everywhere outside this narrow task |
197
202
 
198
203
  **Memory uplift (LoCoMo-Plus, self-published).** A separate long-context dialogue benchmark ([dcostenco/Locomo-Plus](https://github.com/dcostenco/Locomo-Plus)) measures how much structured memory helps a base model retain multi-day context. Results show large gains when a model is paired with Prism memory versus running raw. Note this benchmark is authored, run, and LLM-judged by this project — treat it as a reproducible demonstration, not an independent third-party result, and run it yourself with the commands in that repo.
@@ -207,30 +212,30 @@ These tables are the maintainer's assessment as of June 2026. Verify claims that
207
212
 
208
213
  | Feature | Prism Coder | GitHub Copilot | Cursor | Windsurf | Amazon Q | Devin |
209
214
  |---|:---:|:---:|:---:|:---:|:---:|:---:|
210
- | Local inference (open-weight) | Yes | No | No | No | No | No |
211
- | Works fully offline | Yes (free tier) | No | No | No | No | No |
212
- | Persistent cross-session memory | Yes | Yes | No | No | No | No |
213
- | Session drift detection | Yes | No | No | No | No | No |
214
- | L3 grounding verifier | Yes | No | No | No | No | No |
215
- | Behavioral verification (pre-edit) | Yes | No | No | No | No | No |
216
- | MCP server (tools + memory) | Yes | No | No | No | No | No |
217
- | Web IDE | Yes | Yes | No | No | Yes | Yes |
218
- | VS Code extension | Yes | Yes | N/A (is VS Code) | N/A | Yes | No |
219
- | Flat-rate team pricing | Yes | No (per-seat) | No (per-seat) | No | No | No |
220
- | HIPAA BAA available | Yes (Enterprise) | No | No | No | No | No |
215
+ | Local inference (open-weight) | | | | | | |
216
+ | Works fully offline | (free tier) | | | | | |
217
+ | Persistent cross-session memory | | | | | | |
218
+ | Session drift detection | | | | | | |
219
+ | L3 grounding verifier | | | | | | |
220
+ | Behavioral verification (pre-edit) | | | | | | |
221
+ | MCP server (tools + memory) | | | | | | |
222
+ | Web IDE | | | | | | |
223
+ | VS Code extension | | | | | | |
224
+ | Flat-rate team pricing | | (per-seat) | (per-seat) | | | |
225
+ | HIPAA BAA available | (Enterprise) | | | | | |
221
226
 
222
227
  ### vs local AI / memory tools
223
228
 
224
229
  | Feature | Prism Coder | Ollama | LM Studio | Mem0 | Zep |
225
230
  |---|:---:|:---:|:---:|:---:|:---:|
226
- | Local inference cascade | Yes | Yes | Yes | No | No |
227
- | Cloud fallback | Yes | No | No | No | No |
228
- | Persistent cross-session memory | Yes | No | No | Yes | Yes |
229
- | Knowledge ingestion (MCP + webhook) | Yes | No | No | No | No |
230
- | Cognitive routing (3-store) | Yes | No | No | No | No |
231
- | Session drift detection | Yes | No | No | No | No |
232
- | Native MCP server | Yes | No | No | No | No |
233
- | Web IDE + VS Code extension | Yes | No | No | No | No |
231
+ | Local inference cascade | | | | | |
232
+ | Cloud fallback | | | | | |
233
+ | Persistent cross-session memory | | | | | |
234
+ | Knowledge ingestion (MCP + webhook) | | | | | |
235
+ | Cognitive routing (3-store) | | | | | |
236
+ | Session drift detection | | | | | |
237
+ | Native MCP server | | | | | |
238
+ | Web IDE + VS Code extension | | | | | |
234
239
 
235
240
  ### Pricing — flat-rate, not per-seat
236
241
 
@@ -249,19 +254,19 @@ All on-device models are free to run locally via Ollama on every tier. A subscri
249
254
  | | **Free** | **Standard** $19/mo | **Advanced** $49/mo | **Enterprise** $99/mo |
250
255
  |---|---|---|---|---|
251
256
  | Seats | 1 | 1 | up to 5 | up to 25 |
252
- | Local model ceiling | up to 4b | up to 14b | up to 32b | up to 32b |
257
+ | Local model ceiling | up to 4b | up to 9b | up to 32b | up to 32b |
253
258
  | Daily cloud inference | -- | 200 | 2,000 | 100,000 |
254
259
  | Cloud Coder (Web IDE) | -- | 100/day | 1,000/day | 100,000/day |
255
260
  | Cloud search | -- | 50/day | 500/day | 100,000/day |
256
261
  | Max output tokens | 512 | 1,024 | 2,048 | 4,096 |
257
262
  | Cloud fallback | -- | Claude Sonnet 4 | Claude Sonnet 4 | Priority + Sonnet 4 |
258
- | Grounding verifier | -- | Yes | Yes | Yes |
259
- | Memory sync (cloud) | -- | Yes | Yes | Yes |
263
+ | Grounding verifier (fact-check AI output) | -- | | | |
264
+ | Memory sync (cloud) | -- | | | |
260
265
  | Knowledge / session memory | limited | unlimited | unlimited | unlimited |
261
- | Analytics dashboard | -- | Yes | Yes | Yes |
262
- | HIPAA BAA | -- | -- | -- | Yes |
266
+ | Analytics dashboard | -- | | | |
267
+ | HIPAA BAA | -- | -- | -- | |
263
268
 
264
- 14-day free trial on paid plans. [Pricing](https://synalux.ai/pricing) | 25+ seats: [contact sales](https://synalux.ai/support)
269
+ 14-day free trial on paid plans. 25+ seats: [contact sales](https://synalux.ai/support)
265
270
 
266
271
  ---
267
272
 
@@ -324,6 +329,8 @@ prism register-models # alias dcostenco/prism-coder:* -> prism-coder:*
324
329
 
325
330
  ## Companions
326
331
 
332
+ Prism works alongside these tools — use whichever fits your workflow.
333
+
327
334
  ### Web IDE — Prism Coder
328
335
 
329
336
  A browser-based IDE at [synalux.ai/coder](https://synalux.ai/coder). Import any GitHub repo and get:
@@ -358,13 +365,16 @@ code --install-extension synalux-ai.synalux
358
365
 
359
366
  [![VS Marketplace](https://img.shields.io/visual-studio-marketplace/v/synalux-ai.synalux?label=VS%20Marketplace&color=007ACC)](https://marketplace.visualstudio.com/items?itemName=synalux-ai.synalux)
360
367
 
361
- **AI features:** Chat participant (`@synalux`), multi-agent pipeline, voice input with conversation mode, model switching (local Ollama / cloud / Gemini), 10 AI personality tones.
362
-
363
- **Clinical features (BCBA / healthcare):** SOAP note generator, role-based access, document signing, patient board. Voice recording with AES-256-GCM encryption (consent-gated, off by default, plaintext deleted after encryption).
368
+ AI chat, voice input, SOAP note generator, team collaboration, and video calls, all inside VS Code. Routes through local Ollama by default; cloud on paid tiers.
364
369
 
365
- **Collaboration:** Team chat, direct messages, enterprise video calls (LiveKit), customer board, visual builder, DevContainers, Auth & Database panel.
370
+ <details>
371
+ <summary>Feature details</summary>
366
372
 
367
- **Privacy note:** The extension routes AI requests through the `BackendRouter` — local Ollama by default for free tier, cloud for paid (user-configurable via `preferLocal`). Clinical features (SOAP notes, voice) route through the same backend. `preferLocal=true` tries local first but can still fall back to cloud if the local model is unavailable. For regulated workloads where PHI must never leave the machine, use the free tier (no cloud key) or an Enterprise plan with BAA that covers cloud-bound data. Licensed under [BSL-1.1](https://marketplace.visualstudio.com/items?itemName=synalux-ai.synalux).
373
+ - **AI**: Chat participant (`@synalux`), multi-agent pipeline, voice input, model switching, 10 tones
374
+ - **Clinical**: SOAP note generator, role-based access, document signing, patient board
375
+ - **Collaboration**: Team chat, DMs, video calls, customer board, visual builder, DevContainers
376
+ - **Privacy**: Local Ollama by default. `preferLocal=true` tries local first. Enterprise BAA available.
377
+ </details>
368
378
 
369
379
  ### Prism AAC
370
380
 
@@ -374,6 +384,28 @@ See [github.com/dcostenco/prism-aac](https://github.com/dcostenco/prism-aac)
374
384
 
375
385
  ---
376
386
 
387
+ ## Git Hooks (Portable)
388
+
389
+ Pre-commit and pre-push security hooks that work with any editor, any AI tool, and direct CLI. No Claude Code dependency.
390
+
391
+ ```bash
392
+ # Install in all repos (one-time)
393
+ bash synalux-private/scripts/install-git-hooks.sh
394
+
395
+ # Or install manually in a single repo
396
+ cp hooks/pre-commit .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
397
+ cp hooks/pre-push .git/hooks/pre-push && chmod +x .git/hooks/pre-push
398
+ ```
399
+
400
+ | Hook | What it checks | Mode |
401
+ |------|----------------|------|
402
+ | `pre-commit` | Dead code, orphan services, scaffold code, missing auth | `PRECOMMIT_MODE=advisory\|block\|off` |
403
+ | `pre-push` | 19-rule security audit (SSRF, SQL injection, secrets, IDOR, etc.) | `PREPUSH_MODE=advisory\|block\|off` |
404
+
405
+ Default mode is `advisory` (warn but allow). Set `*_MODE=block` for hard enforcement. Hooks look for full audit scripts in the repo first (`hooks/lib/`), then `~/.claude/hooks/` fallback, then minimal inline checks.
406
+
407
+ ---
408
+
377
409
  ## Self-hosting (Enterprise)
378
410
 
379
411
  Run the full model stack on your own hardware — no cloud, full data sovereignty.
@@ -381,11 +413,11 @@ Run the full model stack on your own hardware — no cloud, full data sovereignt
381
413
  **Requirements:** Mac M2 Pro+ (48 GB recommended) or Linux + NVIDIA GPU, plus [Ollama](https://ollama.com).
382
414
 
383
415
  ```bash
384
- ollama pull dcostenco/prism-coder:14b # default router
416
+ ollama pull dcostenco/prism-coder:9b # default router
385
417
  export LOCAL_LLM_URL=http://localhost:11434
386
418
  ```
387
419
 
388
- Routing is automatic: `14b → 4b → cloud fallback` on desktop/server, `2b → cloud fallback` on mobile/iPhone. For iOS or another machine on the same network, run `OLLAMA_HOST=0.0.0.0 ollama serve` and point `LOCAL_LLM_URL` at the host's IP.
420
+ Routing is automatic: `9b → 4b → cloud fallback` on desktop/server, `2b → cloud fallback` on mobile/iPhone. For iOS or another machine on the same network, run `OLLAMA_HOST=0.0.0.0 ollama serve` and point `LOCAL_LLM_URL` at the host's IP.
389
421
 
390
422
  ---
391
423
 
package/dist/cli.js CHANGED
@@ -521,10 +521,10 @@ scmCmd
521
521
  });
522
522
  // ─── prism register-models ────────────────────────────────────
523
523
  // Convenience: alias namespaced HF-style prism-coder tags
524
- // (`dcostenco/prism-coder:14b`) to the bare tags (`prism-coder:14b`)
524
+ // (`dcostenco/prism-coder:9b`) to the bare tags (`prism-coder:9b`)
525
525
  // some external tooling expects. The MCP picker handles both forms
526
526
  // natively as of v15.5, so this command is OPTIONAL — useful only
527
- // when a user wants to run `ollama run prism-coder:14b` directly,
527
+ // when a user wants to run `ollama run prism-coder:9b` directly,
528
528
  // or for tools that pre-date the picker's namespace fallback.
529
529
  program
530
530
  .command('register-models')
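The comment in this hunk says the picker accepts both the namespaced and the bare tag form. One way such matching could work is sketched below. `toBareTag` and `findPulledTag` are hypothetical names, and the `dcostenco/` prefix is taken from the tags shown in the diff; the picker's actual code is not part of this hunk.

```javascript
// Hypothetical sketch: treat "dcostenco/prism-coder:9b" and
// "prism-coder:9b" as the same model when looking up pulled tags.
const NAMESPACE = "dcostenco/";

// Strip the namespace prefix if present, leaving the bare tag.
function toBareTag(tag) {
  return tag.startsWith(NAMESPACE) ? tag.slice(NAMESPACE.length) : tag;
}

// Return the pulled tag (in whatever form Ollama reports it) that
// corresponds to the wanted tag, or null when nothing matches.
function findPulledTag(wanted, pulledTags) {
  const bare = toBareTag(wanted);
  return pulledTags.find((t) => toBareTag(t) === bare) ?? null;
}
```

With matching done this way, an alias step such as `register-models` is only needed for external tools that compare tag strings literally.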
@@ -1268,7 +1268,7 @@ export class SqliteStorage {
1268
1268
  FROM session_ledger
1269
1269
  WHERE project = ? AND user_id = ? AND role = ?
1270
1270
  AND event_type = 'correction'
1271
- AND importance >= 3
1271
+ AND importance >= 0
1272
1272
  AND deleted_at IS NULL
1273
1273
  AND archived_at IS NULL
1274
1274
  ORDER BY importance DESC
@@ -2323,10 +2323,12 @@ export class SqliteStorage {
2323
2323
  SET importance = MAX(0, importance - 1)
2324
2324
  WHERE project = ? AND user_id = ?
2325
2325
  AND importance > 0
2326
+ AND importance < 10
2326
2327
  AND event_type != 'session'
2327
2328
  AND created_at < datetime('now', '-' || ? || ' days')
2329
+ AND (last_accessed_at IS NULL OR last_accessed_at < datetime('now', '-' || ? || ' days'))
2328
2330
  AND deleted_at IS NULL`,
2329
- args: [project, userId, decayDays],
2331
+ args: [project, userId, decayDays, decayDays],
2330
2332
  });
2331
2333
  const decayed = result.rowsAffected || 0;
2332
2334
  if (decayed > 0) {
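The updated `WHERE` clause adds two conditions to the decay pass: rows with importance 10 are exempt, and rows accessed within the decay window are skipped. The sketch below restates that predicate in plain JavaScript so the combined effect is easier to see. It assumes timestamps are epoch milliseconds and uses camelCase field names, both of which are illustrative choices; the project and user filters are left out for brevity.

```javascript
// Sketch of the decay predicate from the SQL above, over in-memory rows.
// Field names and millisecond timestamps are assumptions for illustration.
const DAY_MS = 24 * 60 * 60 * 1000;

function isDecayEligible(entry, nowMs, decayDays) {
  const cutoff = nowMs - decayDays * DAY_MS;
  return (
    entry.importance > 0 &&
    entry.importance < 10 && // new: importance 10 never decays
    entry.eventType !== "session" &&
    entry.createdAt < cutoff &&
    // new: a recent read resets the clock
    (entry.lastAccessedAt === null || entry.lastAccessedAt < cutoff) &&
    entry.deletedAt === null
  );
}

// Mirror of SET importance = MAX(0, importance - 1); returns rows affected.
function applyDecay(entries, nowMs, decayDays) {
  let decayed = 0;
  for (const entry of entries) {
    if (isDecayEligible(entry, nowMs, decayDays)) {
      entry.importance = Math.max(0, entry.importance - 1);
      decayed += 1;
    }
  }
  return decayed;
}
```

Note that the SQL binds `decayDays` twice, once per `datetime('now', ...)` expression, which is why the `args` array grows from three to four values.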
@@ -10,7 +10,6 @@
10
10
  */
11
11
  import { PRISM_SYNALUX_BASE_URL, SYNALUX_CONFIGURED } from "../config.js";
12
12
  import { getSynaluxJwt } from "../utils/synaluxJwt.js";
13
- import { debugLog } from "../utils/logger.js";
14
13
  const FALLBACK_SCENARIO = [
15
14
  "⚠️ BEHAVIORAL VERIFICATION (OFFLINE MODE)",
16
15
  "",
@@ -30,7 +29,7 @@ export async function verifyBehaviorHandler(args) {
30
29
  }
31
30
  const jwt = await getSynaluxJwt();
32
31
  if (!jwt) {
33
- debugLog("[verify-behavior] JWT unavailable — fail-closed with generic scenario");
32
+ console.error("[verify-behavior] ⚠️ JWT unavailable — fail-closed with generic scenario");
34
33
  return FALLBACK_SCENARIO;
35
34
  }
36
35
  try {
@@ -49,14 +48,14 @@ export async function verifyBehaviorHandler(args) {
49
48
  signal: AbortSignal.timeout(5_000),
50
49
  });
51
50
  if (!res.ok) {
52
- debugLog(`[verify-behavior] portal returned ${res.status} — fail-closed`);
51
+ console.error(`[verify-behavior] ⚠️ portal returned ${res.status} — fail-closed. URL: ${url}`);
53
52
  return FALLBACK_SCENARIO;
54
53
  }
55
54
  const data = (await res.json());
56
55
  return formatResult(data);
57
56
  }
58
57
  catch (err) {
59
- debugLog(`[verify-behavior] error: ${err.message} — fail-closed`);
58
+ console.error(`[verify-behavior] ⚠️ VERIFICATION FAILED: ${err.message} — using generic fallback`);
60
59
  return FALLBACK_SCENARIO;
61
60
  }
62
61
  }
@@ -977,15 +977,17 @@ export async function sessionLoadContextHandler(args) {
977
977
  // Build the response object before v4.0 augmentations
978
978
  // SECURITY: Wrap output in boundary tags to prevent context confusion.
979
979
  // The LLM sees <prism_memory context="historical"> and knows this is data, not instructions.
980
- let responseText = `${MEMORY_BOUNDARY_PREFIX}📋 Session context for "${project}" (${level}):\n\n${formattedContext.trim()}${splitBrainWarning}${driftReport}${briefingBlock}${sdmRecallBlock}${greetingBlock}${visualMemoryBlock}${skillBlock}${versionNote}`;
981
- // ─── v4.0: Behavioral Warnings Injection ───────────────────
982
- // If loadContext returned behavioral_warnings, add them to the
983
- // formatted output so the agent sees them prominently.
980
+ // ─── v19.1: Behavioral Warnings BEFORE skills (protected from truncation) ───
981
+ // Corrections must surface prominently. Placed before skillBlock so the
982
+ // skill budget cannot push them out. Capped at 2,000 chars.
984
983
  const behavWarnings = data?.behavioral_warnings;
984
+ let behavBlock = '';
985
985
  if (behavWarnings && behavWarnings.length > 0) {
986
- responseText += `\n\n[⚠️ BEHAVIORAL WARNINGS]\n` +
986
+ const rawBlock = `\n\n[⚠️ BEHAVIORAL WARNINGS — DO NOT IGNORE]\n` +
987
987
  behavWarnings.map(w => `- ${w.summary} (importance: ${w.importance})`).join("\n");
988
+ behavBlock = [...rawBlock].slice(0, 2000).join('');
988
989
  }
990
+ let responseText = `${MEMORY_BOUNDARY_PREFIX}📋 Session context for "${project}" (${level}):\n\n${formattedContext.trim()}${splitBrainWarning}${driftReport}${briefingBlock}${sdmRecallBlock}${greetingBlock}${visualMemoryBlock}${behavBlock}${skillBlock}${versionNote}`;
989
991
  // ─── v9.4.7: ABA Precision Protocol (foundational) ────────
990
992
  // Injected into EVERY session load so the agent always operates
991
993
  // under these behavioral rules. Never truncated (placed before
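The cap on the warnings block uses `[...rawBlock].slice(0, 2000).join('')` rather than `rawBlock.slice(0, 2000)`. The likely reason is that string spreading iterates by Unicode code point, so the cut cannot land in the middle of a surrogate pair. The sketch below shows the difference; `truncateCodePoints` is a hypothetical helper name, not one from the package.

```javascript
// Truncate by Unicode code points instead of UTF-16 code units.
// Spreading a string uses its iterator, which yields whole code points.
function truncateCodePoints(text, maxCodePoints) {
  return [...text].slice(0, maxCodePoints).join("");
}

const sample = "ab😀cd";

// Plain slice counts UTF-16 units and cuts the emoji in half,
// leaving a lone high surrogate at the end.
const byCodeUnits = sample.slice(0, 3);

// Code-point truncation keeps the emoji intact.
const byCodePoints = truncateCodePoints(sample, 3);
```

One caveat: this works at the code-point level, not the grapheme level, so a sequence such as a symbol followed by a variation selector can still be separated at the boundary.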
@@ -2,7 +2,7 @@
2
2
  * prism_infer — local-first inference tool
3
3
  * ─────────────────────────────────────────────────────────────
4
4
  * Save the caller's cloud tokens by routing to a local prism-coder
5
- * model via Ollama. Tiers (32B/14B/8B/1.7B) auto-selected by free
5
+ * model via Ollama. Tiers (32B/9B/8B/1.7B) auto-selected by free
6
6
  * RAM, then capped by `model_ceiling` and the set of tags that are
7
7
  * actually pulled into Ollama.
8
8
  *
@@ -12,7 +12,7 @@
12
12
  * 4. On local fail, if cloud_fallback=true:
13
13
  * - exchange synalux_sk_ → JWT (cached)
14
14
  * - POST synalux portal /api/v1/prism-aac/inference
15
- * - portal runs its own cascade (14B/32B/Claude by tier)
15
+ * - portal runs its own cascade (9B/32B/Claude by tier)
16
16
  * 5. Return { output, backend, model_picked, ram_free_mb, latency_ms, used_cloud }
17
17
  *
18
18
  * `prism_infer` is a thin client. It never calls Anthropic / OpenRouter
@@ -24,16 +24,15 @@ import { getSynaluxJwt, invalidateSynaluxJwt } from "../utils/synaluxJwt.js";
  import { getAvailableMemoryBytes } from "../utils/availableMemory.js";
  import { PRISM_SYNALUX_BASE_URL, PRISM_LOCAL_LLM_URL, } from "../config.js";
  import { debugLog } from "../utils/logger.js";
- import { verifyGrounding } from "../utils/groundingVerifier.js";
  import { getEntitlements, clampCeiling } from "../utils/entitlements.js";
  import { ddLog } from "../utils/ddLogger.js";
  // ─── Tool Definition ────────────────────────────────────────────
  export const PRISM_INFER_TOOL = {
  name: "prism_infer",
  description: "Run an inference on a local prism-coder model (Ollama) to save cloud tokens. " +
- "Picks the largest viable tier — 32B / 14B / 8B / 1.7B — based on free RAM at call time, " +
+ "Picks the largest viable tier — 32B / 9B / 8B / 1.7B — based on free RAM at call time, " +
  "clamped by `model_ceiling` and what is actually pulled in Ollama. " +
- "Falls through to the synalux portal cloud cascade (14B → 32B → Claude Opus 4.7) " +
+ "Falls through to the synalux portal cloud cascade (9B → 32B → Claude Opus 4.7) " +
  "only when local is unviable AND `cloud_fallback=true`. " +
  "Use this for code generation, summarisation, classification, or any synth task you would " +
  "otherwise hand to the cloud model — it costs $0 when the local hit succeeds.",
@@ -60,8 +59,8 @@ export const PRISM_INFER_TOOL = {
  },
  model_ceiling: {
  type: "string",
- enum: ["32b", "14b", "4b", "2b"],
- description: "Cap the largest tier the picker may select. e.g. '14b' forbids 32B even if RAM allows.",
+ enum: ["32b", "9b", "4b", "2b"],
+ description: "Cap the largest tier the picker may select. e.g. '9b' forbids 32B even if RAM allows.",
  },
  cloud_fallback: {
  type: "boolean",
@@ -70,7 +69,7 @@ export const PRISM_INFER_TOOL = {
  },
  timeout_ms: {
  type: "number",
- description: "Override per-call timeout. Default scales with model size: 32B=120s, 14B=60s, 4B=20s, 1.7B=15s.",
+ description: "Override per-call timeout. Default scales with model size: 32B=120s, 9B=60s, 4B=20s, 1.7B=15s.",
  },
  evidence: {
  type: "array",
@@ -124,7 +123,7 @@ export function isPrismInferArgs(args) {
  if (a.timeout_ms !== undefined && typeof a.timeout_ms !== "number")
  return false;
  if (a.model_ceiling !== undefined &&
- !["32b", "14b", "4b", "2b"].includes(a.model_ceiling))
+ !["32b", "9b", "4b", "2b"].includes(a.model_ceiling))
  return false;
  if (a.verify !== undefined && typeof a.verify !== "boolean")
  return false;
@@ -148,8 +147,8 @@ export function isPrismInferArgs(args) {
  // ─── Ollama helpers ────────────────────────────────────────────
  const DEFAULT_TIMEOUTS = {
  "prism-coder:32b": 120_000,
- "prism-coder:14b": 60_000,
- "qwen3.5:4b": 20_000,
+ "prism-coder:9b": 60_000,
+ "prism-coder:4b": 20_000,
  "prism-coder:2b": 15_000,
  };
  /** List Ollama-installed tags. Returns null if Ollama unreachable. */
@@ -407,10 +406,10 @@ export async function runInfer(args, deps) {
  */
  async function applyVerification(draft, args, deps, partial) {
  const shouldVerify = args.verify ?? (args.evidence !== undefined && args.evidence.length > 0);
- if (!shouldVerify) {
+ if (!shouldVerify || !deps.callVerifier) {
  return { ...partial, output: draft };
  }
- const verifier = deps.callVerifier ?? verifyGrounding;
+ const verifier = deps.callVerifier;
  const outcome = await verifier({
  draft,
  evidence: args.evidence ?? [],
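The hunk above turns the verifier into a purely injected dependency: with no `deps.callVerifier`, the draft is returned unverified instead of falling back to the bundled `verifyGrounding`. A minimal sketch of that control flow follows. The function name, the verifier's return shape (`{ output }`), and the `verified` flag are assumptions for illustration; the diff does not show what the real verifier returns.

```javascript
// Sketch only: mirrors the guard from the hunk above.
// The verifier's result shape ({ output }) is a hypothetical stand-in.
async function applyVerificationSketch(draft, args, deps, partial) {
  // Verify when asked explicitly, or by default when evidence was supplied.
  const shouldVerify =
    args.verify ?? (args.evidence !== undefined && args.evidence.length > 0);
  // No injected verifier means there is nothing to run: return the draft as-is.
  if (!shouldVerify || !deps.callVerifier) {
    return { ...partial, output: draft };
  }
  const outcome = await deps.callVerifier({ draft, evidence: args.evidence ?? [] });
  return { ...partial, output: outcome.output, verified: true };
}
```

One consequence worth noting for callers: after this change, passing `evidence` without wiring a verifier into `deps` silently skips verification, so code that relies on grounding checks has to inject `callVerifier` itself.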
@@ -6,7 +6,7 @@
  * to enforce model ceiling, max_tokens, and feature gates.
  *
  * Unauthenticated users (no SYNALUX_API_KEY) get free-tier defaults.
- * Authenticated users get their plan from the portal (1-hour cache).
+ * Authenticated users get their plan from the portal (5-minute cache).
  */
  import { getSynaluxJwt } from "./synaluxJwt.js";
  import { PRISM_SYNALUX_BASE_URL, SYNALUX_CONFIGURED } from "../config.js";
@@ -32,10 +32,10 @@ const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
  let cache = null;
  let inFlight = null;
  // ── Model tier ordering for ceiling enforcement ───────────────────
- const TIER_ORDER = ["2b", "4b", "14b", "32b"];
+ const TIER_ORDER = ["2b", "4b", "9b", "32b"];
  /**
  * Returns true if `requested` exceeds `ceiling`.
- * e.g. ceilingExceeded("14b", "4b") → true (14b > 4b ceiling)
+ * e.g. ceilingExceeded("9b", "4b") → true (9b > 4b ceiling)
  */
  export function ceilingExceeded(requested, ceiling) {
  const reqIdx = TIER_ORDER.indexOf(requested);
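The hunk cuts off after the first line of `ceilingExceeded`, so the following is only a sketch of how an index comparison over `TIER_ORDER` could work. The function name is hypothetical, and treating an unknown tier name as exceeding the ceiling is an assumption; the diff does not show how the real code handles that case.

```javascript
// Tier order copied from the hunk above, smallest to largest.
const TIER_ORDER = ["2b", "4b", "9b", "32b"];

// Sketch: true when `requested` ranks above `ceiling`.
// Assumption: an unrecognized tier name fails closed (counts as exceeding).
function ceilingExceededSketch(requested, ceiling) {
  const reqIdx = TIER_ORDER.indexOf(requested);
  const ceilIdx = TIER_ORDER.indexOf(ceiling);
  if (reqIdx === -1 || ceilIdx === -1) return true;
  return reqIdx > ceilIdx;
}

console.log(ceilingExceededSketch("9b", "4b")); // true
console.log(ceilingExceededSketch("4b", "9b")); // false
```

Under this sketch a stale `"14b"` value left in a config would be rejected rather than silently ranked, which is one reason to fail closed on unknown names.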
@@ -79,12 +79,18 @@ async function fetchEntitlements() {
  redirect: "error",
  });
  if (!res.ok) {
- debugLog(`[entitlements] portal HTTP ${res.status} — free tier fallback`);
+ debugLog(`[entitlements] portal HTTP ${res.status}`);
+ if (cache) {
+ debugLog("[entitlements] using last-known-good (safety fail-closed)");
+ return cache.entitlements;
+ }
  return FREE_ENTITLEMENTS;
  }
  const data = (await res.json());
  if (!data.plan || !data.model_ceiling) {
- debugLog("[entitlements] malformed response — free tier fallback");
+ debugLog("[entitlements] malformed response");
+ if (cache)
+ return cache.entitlements;
  return FREE_ENTITLEMENTS;
  }
  debugLog(`[entitlements] plan=${data.plan} ceiling=${data.model_ceiling} ` +
@@ -92,7 +98,14 @@ async function fetchEntitlements() {
  return data;
  }
  catch (err) {
- debugLog(`[entitlements] fetch error: ${err instanceof Error ? err.message : String(err)} — free tier fallback`);
+ debugLog(`[entitlements] fetch error: ${err instanceof Error ? err.message : String(err)}`);
+ // F1 fix: fail-closed — keep last-known-good entitlements on fetch error.
+ // Safety controls (grounding_verifier) must not degrade on availability failures.
+ if (cache) {
+ debugLog("[entitlements] using last-known-good (safety fail-closed)");
+ return cache.entitlements;
+ }
+ debugLog("[entitlements] no cached entitlements — free tier fallback (cold start)");
  return FREE_ENTITLEMENTS;
  }
  }
@@ -111,7 +124,14 @@ export async function getEntitlements() {
  inFlight = (async () => {
  try {
  const ent = await fetchEntitlements();
- cache = { entitlements: ent, expiresAt: Date.now() + CACHE_TTL_MS };
+ // Only update cache if this is a REAL fetch (not a cached fallback).
+ // fetchEntitlements returns cache.entitlements on error — detect by
+ // checking if the returned object is the exact same reference.
+ const isFallback = cache && ent === cache.entitlements;
+ if (!isFallback) {
+ cache = { entitlements: ent, expiresAt: Date.now() + CACHE_TTL_MS };
+ }
+ // On fallback: DON'T refresh expiresAt — let it expire so we retry.
  return ent;
  }
  finally {
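The three entitlements hunks above combine into one behavior: on a portal failure, serve the last-known-good entitlements if any exist, and do not extend the cache's expiry so the next call retries. The sketch below restructures that into a single place instead of the reference-equality check used in the diff. `makeEntitlementsCache`, `fetchFresh`, the injectable `now` clock, and the fields of `FREE` are all hypothetical stand-ins; the real `FREE_ENTITLEMENTS` is not shown in the diff.

```javascript
// Stand-in for FREE_ENTITLEMENTS; the real fields are not shown in the diff.
const FREE = { plan: "free", model_ceiling: "4b" };
const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes, as in the source

// Sketch of a last-known-good cache. `fetchFresh` is a hypothetical async
// function that resolves to entitlements or throws on any portal failure.
function makeEntitlementsCache(fetchFresh, now = Date.now) {
  let cache = null;
  return async function getEntitlements() {
    if (cache && now() < cache.expiresAt) return cache.entitlements;
    try {
      const ent = await fetchFresh();
      // Real fetch: store it and start a new TTL window.
      cache = { entitlements: ent, expiresAt: now() + CACHE_TTL_MS };
      return ent;
    } catch {
      // Fetch failed: serve the stale entry but leave expiresAt untouched,
      // so the next call tries the portal again. Cold start falls to FREE.
      return cache ? cache.entitlements : FREE;
    }
  };
}
```

The design trade-off is that a paid plan keeps its entitlements for as long as the portal stays unreachable, while an unauthenticated or cold-start process still drops to the free tier.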
@@ -1,23 +1,22 @@
  /**
  * RAM-Gated Local Model Picker
  * ─────────────────────────────────────────────────────────────
- * Cascade: 14b (default) → 4b (verifier) → 2b (mobile) → 32b (complex only).
+ * Cascade: 9b (default) → 4b (verifier) → 2b (mobile) → 32b (complex only).
  *
- * The default ceiling is "14b" — NOT "32b". This means:
- * - 14b is the primary model for routing + general inference
+ * The default ceiling is "9b" — NOT "32b". This means:
+ * - 9b is the primary model for routing + general inference (Qwen3.5-9B, 100% BFCL)
  * - 4b is used as the grounding verifier (fast, small)
- * - 2b is the mobile/iPhone first gate (Qwen3.5-4B Q3_K_M, 99.1% BFCL)
+ * - 2b is the mobile/iPhone first gate (Qwen3.5-2B, 99.1% BFCL)
  * - 32b is only loaded when caller explicitly passes ceiling="32b"
  * or when the task requires maximum quality (complex code gen, etc.)
  *
- * This saves 10GB+ RAM on most devices and keeps response times fast.
- * The 14b achieves 100% on eval_300 — same as 32b.
+ * This saves 13GB+ RAM vs 32b and keeps response times fast.
  *
  * tag              weights   need free  ctx  role
  * prism-coder:32b  ~19 GB    ≥ 24 GB    32K  complex (on-demand)
- * prism-coder:14b  ~ 9 GB    12 GB      32K  default router
- * qwen3.5:4b       ~ 3.4 GB  ≥ 5 GB     32K  verifier (Q4_K_M, 100%)
- * prism-coder:2b   ~ 2.3 GB  ≥ 3 GB     8K   mobile / iPhone (Q3_K_M, 99.1%)
+ * prism-coder:9b   ~ 5.8 GB  8 GB       32K  default router (Qwen3.5, 100% BFCL)
+ * prism-coder:4b   ~ 3.4 GB  ≥ 5 GB     32K  verifier (Qwen3.5, 100%)
+ * prism-coder:2b   ~ 2.3 GB  ≥ 3 GB     8K   mobile / iPhone (Qwen3.5, 99.1%)
  *
  * Below 3 GB free → no local pick (caller must use cloud).
  */
@@ -28,8 +27,8 @@ const GB = 1024 ** 3;
  */
  export const MODEL_TIERS = [
  { tag: 'prism-coder:32b', weightsGb: 19, minFreeGb: 24, ctxTokens: 32_768 },
- { tag: 'prism-coder:14b', weightsGb: 9, minFreeGb: 12, ctxTokens: 32_768 },
- { tag: 'qwen3.5:4b', weightsGb: 3.4, minFreeGb: 5, ctxTokens: 32_768 },
+ { tag: 'prism-coder:9b', weightsGb: 5.8, minFreeGb: 8, ctxTokens: 32_768 },
+ { tag: 'prism-coder:4b', weightsGb: 3.4, minFreeGb: 5, ctxTokens: 32_768 },
  { tag: 'prism-coder:2b', weightsGb: 2.3, minFreeGb: 3, ctxTokens: 8_192 },
  ];
  /**
@@ -43,14 +42,14 @@ export const MODEL_TIERS = [
  function tagMatches(installed, tierTag) {
  return installed === tierTag || installed.endsWith(`/${tierTag}`);
  }
- /** Default ceiling: 14b. Pass ceiling="32b" explicitly for max quality. */
- export const DEFAULT_CEILING = "14b";
+ /** Default ceiling: 9b. Pass ceiling="32b" explicitly for max quality. */
+ export const DEFAULT_CEILING = "9b";
  /**
  * Pick the best viable tier for the given free RAM.
- * Default ceiling is 14b — use ceiling="32b" only for complex tasks.
+ * Default ceiling is 9b — use ceiling="32b" only for complex tasks.
  *
  * @param freeBytes Result of os.freemem() — binary bytes
- * @param ceiling Cap tier. Default "14b". Pass "32b" for complex tasks.
+ * @param ceiling Cap tier. Default "9b". Pass "32b" for complex tasks.
  * @param available Optional whitelist of installed Ollama tags.
  */
  export function pickLocalModel(freeBytes, ceiling, available) {
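The body of `pickLocalModel` is outside this hunk, so the following is only a sketch of a picker consistent with the documented rules: start at the ceiling, then take the first tier that fits in free RAM and is installed. The tier thresholds and `tagMatches` are copied from the hunks above; the function name, the `null` return, and deriving the ceiling from the tag suffix are assumptions.

```javascript
const GB = 1024 ** 3;

// Thresholds copied from the MODEL_TIERS hunk, largest first.
const MODEL_TIERS = [
  { tag: "prism-coder:32b", minFreeGb: 24 },
  { tag: "prism-coder:9b", minFreeGb: 8 },
  { tag: "prism-coder:4b", minFreeGb: 5 },
  { tag: "prism-coder:2b", minFreeGb: 3 },
];

// Same matching rule as tagMatches above: exact tag or a namespaced one.
function tagMatches(installed, tierTag) {
  return installed === tierTag || installed.endsWith(`/${tierTag}`);
}

// Sketch: skip tiers above the ceiling, tiers needing more free RAM than is
// available, and tiers that are not installed. Returns a tag or null.
// Assumption: the ceiling string equals the tag suffix after the colon.
function pickLocalModelSketch(freeBytes, ceiling = "9b", available) {
  const start = MODEL_TIERS.findIndex((t) => t.tag.endsWith(`:${ceiling}`));
  if (start === -1) return null; // unknown ceiling: no local pick
  for (const tier of MODEL_TIERS.slice(start)) {
    if (freeBytes < tier.minFreeGb * GB) continue;
    if (available && !available.some((tag) => tagMatches(tag, tier.tag))) continue;
    return tier.tag;
  }
  return null;
}

console.log(pickLocalModelSketch(10 * GB)); // "prism-coder:9b"
```

With the new 8 GB threshold for the default tier, a machine with 10 GB free now gets the default router, where the previous 12 GB requirement for `prism-coder:14b` would have dropped it to the 4b tier.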
@@ -15,8 +15,9 @@ export class Gatekeeper {
  console.warn(`\n⚠️ [OVERRIDDEN] Verification Gate bypassed via administrator override.`);
  // Enforce immutability and record audit trail context via environment variables
  validatedResult.gate_override = true;
+ // F19 fix: process.env.USER is trivially spoofable — log it but note it's unauthenticated.
  const actor = process.env.USER || process.env.USERNAME || 'unknown_user';
- validatedResult.override_reason = validatedResult.override_reason || `CLI --force bypass by ${actor}`;
+ validatedResult.override_reason = validatedResult.override_reason || `CLI --force bypass (unauthenticated env.USER=${actor})`;
  return { canContinue: true, validatedResult };
  }
  switch (validatedResult.gate_action) {
@@ -196,7 +196,12 @@ export class VerificationRunner {
  * Throws an error if the hash does not match, ensuring test integrity.
  */
  static verifyRubricHash(tests, harness) {
- const computed = computeRubricHash(tests);
+ // F11 fix: include min_pass_rate in hash verification when harness has it.
+ // Try with min_pass_rate first; fall back to without for backward compat.
+ const minRate = harness.min_pass_rate;
+ const computed = minRate !== undefined
+ ? computeRubricHash(tests, minRate)
+ : computeRubricHash(tests);
  if (computed !== harness.rubric_hash) {
  throw new Error(`Rubric hash mismatch. Expected ${harness.rubric_hash}, but computeRubricHash returned ${computed}. The tests have been modified since the harness was created.`);
  }
@@ -405,7 +410,7 @@ export class VerificationRunner {
  if (!targetCheck.ok) {
  return { passed: false, error: `HTTP target blocked: ${targetCheck.reason}` };
  }
- const res = await fetch(a.target);
+ const res = await fetch(a.target, { redirect: "error" });
  return res.status === a.expected
  ? { passed: true }
  : { passed: false, error: `Expected status ${a.expected}, got ${res.status} for ${a.target}` };
@@ -56,8 +56,16 @@ export const TestSuiteSchema = z.object({
  * @param tests - The array of TestAssertion to hash
  * @returns Lowercase hex SHA-256 digest
  */
- export function computeRubricHash(tests) {
+ export function computeRubricHash(tests, minPassRate) {
  const sorted = [...tests].sort((a, b) => a.id.localeCompare(b.id));
+ // F11 fix: when minPassRate is provided, include it in the hash so the
+ // threshold can't be changed without invalidating the rubric.
+ // When omitted, hash only tests (backward compatible with existing harnesses).
+ if (minPassRate !== undefined) {
+ return createHash("sha256")
+ .update(JSON.stringify({ tests: sorted, min_pass_rate: minPassRate }))
+ .digest("hex");
+ }
  return createHash("sha256")
  .update(JSON.stringify(sorted))
  .digest("hex");
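To see why folding `min_pass_rate` into the hash input matters, it helps to look at the serialized payload on its own. The sketch below builds only the string that the source passes to `createHash("sha256").update(...)`; the hashing step is left out so the snippet needs no imports. `rubricHashPayload` is a hypothetical helper name, and the test objects carry only an `id` for brevity.

```javascript
// Sketch: build the string that would be fed to SHA-256 in computeRubricHash.
function rubricHashPayload(tests, minPassRate) {
  // Sort by id so reordering the tests does not change the payload.
  const sorted = [...tests].sort((a, b) => a.id.localeCompare(b.id));
  if (minPassRate !== undefined) {
    // New form: the threshold is part of the hashed content.
    return JSON.stringify({ tests: sorted, min_pass_rate: minPassRate });
  }
  // Legacy form: tests only, matching harnesses created before the fix.
  return JSON.stringify(sorted);
}

const tests = [{ id: "b" }, { id: "a" }];
console.log(rubricHashPayload(tests) === rubricHashPayload([...tests].reverse())); // true
console.log(rubricHashPayload(tests, 0.8) === rubricHashPayload(tests, 0.5)); // false
```

Because the two forms serialize to different shapes (an object versus an array), a harness hashed with a threshold cannot be satisfied by the legacy tests-only payload.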
@@ -44,6 +44,18 @@ export function resolveEffectiveSeverity(assertionSeverity, defaultSeverity) {
  */
  export function evaluateSeverityGates(results, config) {
  const failures = results.filter(r => !r.passed && !r.skipped);
+ // F10 fix: skipped critical (gate/abort) assertions count as failures.
+ // Crafting depends_on to skip critical checks must not neutralize the gate.
+ const skippedCritical = results.filter(r => r.skipped && (r.severity === 'gate' || r.severity === 'abort'));
+ if (skippedCritical.length > 0) {
+ const ids = skippedCritical.map(r => r.id).join(", ");
+ const hasAbort = skippedCritical.some(r => r.severity === 'abort');
+ return {
+ action: hasAbort ? "abort" : "block",
+ failed_assertions: skippedCritical,
+ summary: `${hasAbort ? 'ABORT' : 'BLOCKED'}: ${skippedCritical.length} critical assertion(s) were skipped [${ids}] — treating as failures.`
+ };
+ }
  if (failures.length === 0) {
  return {
  action: "continue",
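The F10 change above can be exercised in isolation. This sketch models only the new early-return branch: a skipped assertion with severity `gate` or `abort` is treated as a failure before the ordinary failure handling runs. `checkSkippedCritical` is a hypothetical name, the `summary` field is omitted, and the result objects contain just the fields the branch reads.

```javascript
// Sketch of the skipped-critical check; returns null when the normal
// gating logic should take over.
function checkSkippedCritical(results) {
  const skippedCritical = results.filter(
    (r) => r.skipped && (r.severity === "gate" || r.severity === "abort")
  );
  if (skippedCritical.length === 0) return null;
  // Any skipped abort-level assertion escalates the whole result to abort.
  const hasAbort = skippedCritical.some((r) => r.severity === "abort");
  return {
    action: hasAbort ? "abort" : "block",
    failed_assertions: skippedCritical,
  };
}

const results = [
  { id: "t1", passed: true, skipped: false, severity: "gate" },
  { id: "t2", passed: false, skipped: true, severity: "gate" },
];
console.log(checkSkippedCritical(results).action); // "block"
```

Before the fix, the same input would have produced `action: "continue"`, because `t2` was excluded from `failures` by the `!r.skipped` filter.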
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "prism-mcp-server",
- "version": "19.0.0",
+ "version": "19.0.1",
  "mcpName": "io.github.dcostenco/prism-coder",
  "description": "Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 114 Agent Skills, PHI Guard, Tier Enforcement, Prompt-Based Skill Routing, Zero-Search HDC/HRR retrieval, HRR Semantic Drift Detection across BCBA/Coding/AAC domains, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder 1.7B–32B open-weights LLM fleet.",
  "module": "index.ts",