prism-mcp-server 19.0.0 → 19.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/README.md +129 -75
package/dist/cli.js +2 -2
package/dist/storage/sqlite.js +4 -2
package/dist/tools/behavioralVerifierHandler.js +3 -4
package/dist/tools/ledgerHandlers.js +7 -5
package/dist/tools/prismInferHandler.js +87 -28
package/dist/utils/entitlements.js +27 -7
package/dist/utils/modelPicker.js +21 -22
package/dist/utils/qualityGate.js +43 -0
package/dist/utils/thinkStrip.js +26 -0
package/dist/verification/gatekeeper.js +2 -1
package/dist/verification/runner.js +7 -2
package/dist/verification/schema.js +9 -1
package/dist/verification/severityPolicy.js +12 -0
package/package.json +2 -2

package/README.md CHANGED

@@ -1,10 +1,6 @@
 # Prism Coder
-**Persistent memory and reliable tool-routing for AI agents.** *(formerly Prism MCP)*
-Prism Coder is a [Model Context Protocol](https://modelcontextprotocol.io) server that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions — semantic search, cognitive routing, and a visual dashboard. It ships alongside the open-weight `prism-coder` model fleet (1.7B-32B) for fast, offline tool-routing when you don't want a cloud round-trip.
-It runs **fully local and free** on SQLite + Ollama with no API keys. A paid subscription adds cloud sync, higher model tiers, and team features through the Synalux portal.
+**Give your AI agent memory that lasts.** Persistent sessions, knowledge graphs, and offline tool-routing — fully local and free.
 [![npm](https://img.shields.io/npm/v/prism-mcp-server?color=cb0000&label=npm)](https://www.npmjs.com/package/prism-mcp-server)
 [![MCP Registry](https://img.shields.io/badge/MCP_Registry-listed-00ADD8)](https://github.com/modelcontextprotocol/servers)
@@ -15,7 +11,10 @@ It runs **fully local and free** on SQLite + Ollama with no API keys. A paid sub
   <img src="docs/v11_hivemind_multi_agent_dashboard.jpg" alt="Prism Coder — Mind Palace Dashboard with Knowledge Graph and Multi-Agent Hivemind" width="700" />
 </p>
-> **Renamed in v14:** the project is now **Prism Coder** to cover both the memory server and the model fleet. The npm package stays `prism-mcp-server`, so existing install URLs and `mcp.json` entries keep working.
+Prism Coder is an [MCP server](https://modelcontextprotocol.io) that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions. It ships with the open-weight `prism-coder` model fleet (2B–27B) for fast, offline tool-routing — no cloud required.
+**No account needed. No API keys. Runs on your machine.**
+A paid subscription adds cloud sync, higher model tiers, and team features through the [Synalux portal](https://synalux.ai).
 ---
@@ -39,18 +38,20 @@ Open Claude Desktop or Cursor and your agent now has memory backed by a local SQ
 **Optional — local model fleet** for offline tool-routing. Pull whichever fits your hardware:
 ```bash
-ollama pull dcostenco/prism-coder:2b    # 2.3 GB · iPhone / mobile first gate (Qwen3.5-4B Q3_K_M, 99.1%)
-ollama pull dcostenco/prism-coder:4b    # 3.4 GB · verifier + 8 GB+ devices (Qwen3.5-4B Q4_K_M, 100%)
-ollama pull dcostenco/prism-coder:14b   # 8.4 GB · Mac default router (100%)
-ollama pull dcostenco/prism-coder:32b   # 16 GB  · Mac complex tasks (100%)
+ollama pull dcostenco/prism-coder:2b    # 2.3 GB · mobile / lightweight (99.1% routing accuracy)
+ollama pull dcostenco/prism-coder:4b    # 3.4 GB · verifier (100% accuracy)
+ollama pull dcostenco/prism-coder:9b    # 5.8 GB · default router (100% accuracy, Qwen3.5)
+ollama pull dcostenco/prism-coder:27b   # 16 GB  · complex tasks (100% accuracy)
 ```
-Prism detects both the namespaced (`dcostenco/prism-coder:14b`) and bare (`prism-coder:14b`) Ollama tags automatically.
+Prism detects both the namespaced (`dcostenco/prism-coder:9b`) and bare (`prism-coder:9b`) Ollama tags automatically.
 ---
 ## What it does
+Your AI agent forgets everything between sessions. Prism fixes that — and adds verification, drift detection, and multi-agent coordination on top.
 ### Mind Palace — persistent memory that survives across sessions
 Every conversation feeds a persistent store. The next session loads the right context automatically — no re-explaining.
@@ -83,27 +84,17 @@ Long agent sessions can wander from their original goal. `session_detect_drift`
 ### Behavioral Verification — catch bad edits before they happen
-AI agents pattern-match on checklists instead of thinking through user impact. The behavioral verifier challenges the agent with a domain-specific scenario **before** editing code — like an ABA antecedent intervention.
+AI agents apply patterns from checklists without understanding the real-world impact. The `verify_behavior` tool challenges the agent with a scenario it must answer **before** editing — forcing it to think through what the end user will experience.
 ```
-Agent: "I'll revert the KDS bump logic"
-Prism: "⚠️ Kitchen worker scenario: A cook has a 3-item ticket.
-        One item is voided. What should the cook see on the KDS?"
-Agent: "The ticket should stay visible with the remaining 2 items."
-Prism: "Correct — your revert would remove the ticket entirely. Don't revert."
+Agent: "I'll revert this kitchen display change"
+Prism: "⚠️ Scenario: A cook sees a 3-item ticket. One item is voided.
+        What should the cook see after the void?"
+Agent: "The ticket stays visible with the remaining 2 items."
+Prism: "Correct — your revert would hide the ticket entirely."
 ```
-**17 built-in domains**: KDS, billing, auth, voice ordering, webhooks, migrations, EU routing, clinical (HIPAA/FHIR), HR, accounting, chat, STT, privacy, loyalty, discounts, drawer operations, order lifecycle. Custom domains can be added per workspace.
-**How it works**: The `verify_behavior` tool calls the Synalux portal API, which matches the file path against domain scenarios stored in the database. The agent must answer the scenario concretely before editing. No local hooks required — works in Claude, Cursor, or any MCP client.
-**Why it matters**: In a single audit session, 47 bugs were found across 7 days of AI-generated code. Every bug was introduced by an agent that applied a "correct" pattern without simulating the end-user journey. The behavioral verifier would have caught all of them.
-| Tier | Coverage |
-|------|----------|
-| Free | Skill-based advisory (agent prompted to think before editing) |
-| Standard+ | `verify_behavior` tool with 17 domain scenarios via API |
-| Enterprise | Custom per-workspace scenarios |
+17 built-in domains (billing, auth, ordering, clinical, HR, and more). Custom domains per workspace on Enterprise. No hooks needed — works in any MCP client.
 ### Time Travel
@@ -115,7 +106,7 @@ Roll back to any previous session state. Compare diffs between versions. Restore
 ### Cognitive Routing
-Episodic (what happened), semantic (what's true), and procedural (how to do X) memories live in separate stores; a router decides where to write and where to read.
+Three memory types, automatically sorted: **episodic** (what happened — session logs, decisions), **semantic** (what's true — facts, architecture), and **procedural** (how to do X — workflows, patterns). When you search, the router picks the right store instead of dumping everything.
 ### Multi-Agent Hivemind
@@ -144,37 +135,53 @@ The free tier runs entirely on your machine. Paid tiers add cloud sync through t
 | Memory storage | Local SQLite | Synalux portal (Supabase-backed) |
 | Inference | Local Ollama models | Local models + cloud fallback |
 | API keys required | None | Synalux subscription key |
-| Web search / scrape | Not included | Routed through the Synalux portal (provider keys stay server-side). Search tools appear as `brave_web_search` in the MCP surface but are proxied through the portal for auth and billing. |
+| Web search / scrape | Not included | Via Synalux portal (provider keys server-side) |
 | What leaves your machine | Nothing | Memory text + file paths + search queries, sent to the portal over TLS (PHI-redacted before transit) |
-| Works offline | Yes | Local features yes; sync/cloud no |
+| Works offline | ✅ | Local features yes; sync/cloud no |
-**Handling sensitive data.** Memory text fields (summaries, decisions, handoff context, file paths) pass through a PHI-redaction step (SSN/DOB/MRN/phone/email and common clinical identifiers) before any cloud write. Knowledge ingestion chunks are also redacted before being sent to the LLM for Q&A synthesis. For regulated workloads, run the **local tier** to keep data on-device, or use an **Enterprise** plan, which is the tier that includes a HIPAA Business Associate Agreement. Prism does not claim blanket HIPAA compliance on the free or individual tiers — the on-device path is the air-gapped option.
+**Handling sensitive data.** All cloud writes pass through automatic redaction (SSNs, dates of birth, medical record numbers, phone numbers, emails, and clinical identifiers are stripped before transit). For regulated workloads, run the **local tier** for full air-gap, or use **Enterprise** which includes a HIPAA Business Associate Agreement.
 ---
 ## Models
-The `prism-coder` fleet uses Qwen3.5 for MCP tool-routing. The 14B and 32B are fine-tuned from Qwen3; the 2B and 4B slots use stock Qwen3.5-4B with prompt engineering at different quantization levels (100% routing accuracy without fine-tuning). They are **not** general-purpose chat models — they route reliably and run offline; Claude and other frontier models remain better at reasoning, coding, and open-domain work. The intended pattern is local routing with an optional cloud fallback for hard cases.
+The `prism-coder` fleet uses Qwen3.5 for MCP tool-routing AND general inference. The 9B and 27B are fine-tuned with LoRA (r=128, all 64 layers including DeltaNet); the 2B and 4B use stock Qwen3.5-4B at different quantization levels. The 27B scored 100% on BFCL function-calling and 100% on an internal 15-problem coding eval at $0 inference cost.
-| Model | Ollama tag | Size | BFCL Accuracy | Role | Tier |
+`prism_infer` supports three modes: `route` (tool routing, fast, nothink), `chat` (conversation with thinking), and `code` (code generation with thinking). In chat/code modes, the model uses `<think>` blocks for chain-of-thought reasoning, which are stripped before the response is served. If the local model fails a quality gate (empty, think-only, or truncated), paid tiers automatically escalate to Claude via the Synalux portal.
+| Model | Ollama tag | Size | [BFCL](https://gorilla.cs.berkeley.edu/blogs/12_bfcl_v3_multi_turn.html) Accuracy | Role | Tier |
 |---|---|---|---|---|---|
 | Qwen3.5-4B Q3_K_M | `prism-coder:2b` | 2.3 GB | 99.1% × 3 seeds | iPhone / mobile first gate | Free |
-| Qwen3.5-4B Q4_K_M | `prism-coder:4b` | 3.4 GB | 100% × 3 seeds | Verifier + 8 GB+ devices | Free |
-| prism-coder:14b | `prism-coder:14b` | 8.4 GB | 100% × 3 seeds | Default router | Standard+ |
-| prism-coder:32b | `prism-coder:32b` | 16 GB | 100% × 3 seeds | Complex tasks | Advanced+ |
+| Qwen3.5-4B Q4_K_M | `prism-coder:4b` | 3.4 GB | 100% × 3 seeds | Verifier | Free |
+| Qwen3.5-9B (LoRA) | `prism-coder:9b` | 5.8 GB | 100% × 3 seeds | Default router | Standard+ |
+| Qwen3.5-27B (LoRA) | `prism-coder:27b` | 16 GB | 100% × 3 seeds | Quality tier (DeltaNet, 28.5 tok/s) | Advanced+ |
 Weights: [huggingface.co/dcostenco](https://huggingface.co/dcostenco) (public GGUF). Latency depends on model size and hardware — see [Benchmarks](#benchmarks) to measure it on your own machine rather than trusting a printed number.
 ### Cascade
 ```
-query → prism-coder:14b (local router, Mac default)
-      → qwen3.5:4b (grounding verifier)
+query → prism-coder:9b (local router, default)
+      → prism-coder:4b (grounding verifier)
       → prism-coder:2b (iPhone / mobile, auto-selected by RAM)
-      → prism-coder:32b (complex tasks, on demand)
+      → prism-coder:27b (complex tasks, on demand)
       → cloud fallback (paid tiers, for max quality)
 ```
+### Multi-Layer Verification
+Every tool-grounded answer on paid tiers passes through deterministic L3 routing rules and an NLI grounding verifier before reaching the user. Free-tier users get the deterministic gates (L1, L3-Tool, L3-Tier0) without the model-based NLI check.
+| Layer | What | Model | Cost |
+|---|---|---|---|
+| **L1** | Crisis/medical safety gate | None (regex) | 0 ms |
+| **L3-Tool** | Tool name remap + false-positive rejection | None (deterministic) | 0 ms |
+| **L3-Tier0** | Integer grounding (set membership) | None (deterministic) | 0 ms |
+| **L3-Tier2** | NLI verifier (claim → ENTAILED/NEUTRAL/CONTRADICTED) | prism-coder:2b | ~200 ms |
+| **L4** | Hallucination judge (opt-out for clinical) | prism-coder:4b | ~500 ms |
+Fail-closed on the verified path: when the grounding verifier runs (Standard tier and up), timeout, ambiguity, or missing evidence yields a refusal, not pass-through. Free-tier users get the deterministic L1/L3-Tool gates but not the NLI verifier.
 ---
 ## Benchmarks
@@ -184,15 +191,15 @@ query → prism-coder:14b (local router, Mac default)
 ```bash
 git clone https://github.com/dcostenco/prism-coder && cd prism-coder
 pip install anthropic requests
-python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b 14b 32b
+python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b 9b 27b
 ```
-**Routing eval (115 cases, 12 categories, 3-seed mean).** On this narrow tool-routing task all fleet models achieve near-perfect accuracy. Be honest with yourself about what that means: the eval is **near-saturated** for this taxonomy — it measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is **offline routing reliability at zero cost**, not that a 2.3 GB model rivals a frontier model in general.
+**Routing eval (115 cases, 12 categories, 3-seed mean).** Routing accuracy includes the deterministic L3 correction layer — the same rules that run in production. On this narrow tool-routing task all fleet models achieve near-perfect accuracy. Be honest with yourself about what that means: the eval is **near-saturated** for this taxonomy — it measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is **offline routing reliability at zero cost**, not that a 2.3 GB model rivals a frontier model in general.
 | Model | Routing accuracy | Notes |
 |---|---|---|
 | prism-coder:2b (Q3_K_M) | 99.1% × 3 seeds | 1 failure: regex→knowledge_search |
-| prism-coder:4b / 14b / 32b | 100% × 3 seeds | Perfect on all 115 cases |
+| prism-coder:4b / 9b / 27b | 100% × 3 seeds | Perfect on all 115 cases |
 | Claude (frontier, same eval) | ~98% | Stronger everywhere outside this narrow task |
 **Memory uplift (LoCoMo-Plus, self-published).** A separate long-context dialogue benchmark ([dcostenco/Locomo-Plus](https://github.com/dcostenco/Locomo-Plus)) measures how much structured memory helps a base model retain multi-day context. Results show large gains when a model is paired with Prism memory versus running raw. Note this benchmark is authored, run, and LLM-judged by this project — treat it as a reproducible demonstration, not an independent third-party result, and run it yourself with the commands in that repo.
@@ -207,30 +214,30 @@ These tables are the maintainer's assessment as of June 2026. Verify claims that
 | Feature | Prism Coder | GitHub Copilot | Cursor | Windsurf | Amazon Q | Devin |
 |---|:---:|:---:|:---:|:---:|:---:|:---:|
-| Local inference (open-weight) | Yes | No | No | No | No | No |
-| Works fully offline | Yes (free tier) | No | No | No | No | No |
-| Persistent cross-session memory | Yes | Yes | No | No | No | No |
-| Session drift detection | Yes | No | No | No | No | No |
-| L3 grounding verifier | Yes | No | No | No | No | No |
-| Behavioral verification (pre-edit) | Yes | No | No | No | No | No |
-| MCP server (tools + memory) | Yes | No | No | No | No | No |
-| Web IDE | Yes | Yes | No | No | Yes | Yes |
-| VS Code extension | Yes | Yes | N/A (is VS Code) | N/A | Yes | No |
-| Flat-rate team pricing | Yes | No (per-seat) | No (per-seat) | No | No | No |
-| HIPAA BAA available | Yes (Enterprise) | No | No | No | No | No |
+| Local inference (open-weight) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| Works fully offline | ✅ (free tier) | ❌ | ❌ | ❌ | ❌ | ❌ |
+| Persistent cross-session memory | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
+| Session drift detection | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| L3 grounding verifier | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| Behavioral verification (pre-edit) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| MCP server (tools + memory) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| Web IDE | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
+| VS Code extension | ✅ | ✅ | — | — | ✅ | ❌ |
+| Flat-rate team pricing | ✅ | ❌ (per-seat) | ❌ (per-seat) | ❌ | ❌ | ❌ |
+| HIPAA BAA available | ✅ (Enterprise) | ❌ | ❌ | ❌ | ❌ | ❌ |
 ### vs local AI / memory tools
 | Feature | Prism Coder | Ollama | LM Studio | Mem0 | Zep |
 |---|:---:|:---:|:---:|:---:|:---:|
-| Local inference cascade | Yes | Yes | Yes | No | No |
-| Cloud fallback | Yes | No | No | No | No |
-| Persistent cross-session memory | Yes | No | No | Yes | Yes |
-| Knowledge ingestion (MCP + webhook) | Yes | No | No | No | No |
-| Cognitive routing (3-store) | Yes | No | No | No | No |
-| Session drift detection | Yes | No | No | No | No |
-| Native MCP server | Yes | No | No | No | No |
-| Web IDE + VS Code extension | Yes | No | No | No | No |
+| Local inference cascade | ✅ | ✅ | ✅ | ❌ | ❌ |
+| Cloud fallback | ✅ | ❌ | ❌ | ❌ | ❌ |
+| Persistent cross-session memory | ✅ | ❌ | ❌ | ✅ | ✅ |
+| Knowledge ingestion (MCP + webhook) | ✅ | ❌ | ❌ | ❌ | ❌ |
+| Cognitive routing (3-store) | ✅ | ❌ | ❌ | ❌ | ❌ |
+| Session drift detection | ✅ | ❌ | ❌ | ❌ | ❌ |
+| Native MCP server | ✅ | ❌ | ❌ | ❌ | ❌ |
+| Web IDE + VS Code extension | ✅ | ❌ | ❌ | ❌ | ❌ |
 ### Pricing — flat-rate, not per-seat
@@ -249,19 +256,19 @@ All on-device models are free to run locally via Ollama on every tier. A subscri
 | | **Free** | **Standard** $19/mo | **Advanced** $49/mo | **Enterprise** $99/mo |
 |---|---|---|---|---|
 | Seats | 1 | 1 | up to 5 | up to 25 |
-| Local model ceiling | up to 4b | up to 14b | up to 32b | up to 32b |
+| Local model ceiling | up to 4b | up to 9b | up to 27b | up to 27b |
 | Daily cloud inference | -- | 200 | 2,000 | 100,000 |
 | Cloud Coder (Web IDE) | -- | 100/day | 1,000/day | 100,000/day |
 | Cloud search | -- | 50/day | 500/day | 100,000/day |
 | Max output tokens | 512 | 1,024 | 2,048 | 4,096 |
 | Cloud fallback | -- | Claude Sonnet 4 | Claude Sonnet 4 | Priority + Sonnet 4 |
-| Grounding verifier | -- | Yes | Yes | Yes |
-| Memory sync (cloud) | -- | Yes | Yes | Yes |
+| Grounding verifier (fact-check AI output) | -- | ✅ | ✅ | ✅ |
+| Memory sync (cloud) | -- | ✅ | ✅ | ✅ |
 | Knowledge / session memory | limited | unlimited | unlimited | unlimited |
-| Analytics dashboard | -- | Yes | Yes | Yes |
-| HIPAA BAA | -- | -- | -- | Yes |
+| Analytics dashboard | -- | ✅ | ✅ | ✅ |
+| HIPAA BAA | -- | -- | -- | ✅ |
-14-day free trial on paid plans. [Pricing](https://synalux.ai/pricing) | 25+ seats: [contact sales](https://synalux.ai/support)
+14-day free trial on paid plans. 25+ seats: [contact sales](https://synalux.ai/support)
 ---
@@ -279,6 +286,26 @@ Prism exposes 40+ MCP tools. The core memory loop:
 | `session_detect_drift` | Detect when a session has drifted from its goal |
 | `verify_behavior` | Pre-edit scenario challenge — catch bad changes before they happen |
 | `knowledge_ingest` | Teach Prism a codebase or document |
+| `prism_infer` | Local-first inference (route/chat/code modes, thinking, cloud escalation) |
+### `prism_infer` — local-first inference with cloud escalation
+```typescript
+prism_infer({
+    prompt: "Write a binary search in Python",
+    mode: "code",        // "route" | "chat" | "code"
+    think: true,          // enable <think> reasoning (default: true for chat/code)
+    model_ceiling: "27b", // use the quality tier
+})
+// → 27B generates code locally ($0), with thinking for quality
+// → If quality gate fails + paid tier → auto-escalate to Claude
+```
+| Mode | Think | Model | Use case |
+|------|-------|-------|----------|
+| `route` | Off (fast) | 9B default | MCP tool routing |
+| `chat` | On | 27B preferred | Conversation, reasoning |
+| `code` | On | 27B preferred | Code generation, debugging |
 Full TypeScript signatures live in [`src/tools/`](src/tools/); architecture in [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md).
@@ -324,6 +351,8 @@ prism register-models     # alias dcostenco/prism-coder:* -> prism-coder:*
 ## Companions
+Prism works alongside these tools — use whichever fits your workflow.
 ### Web IDE — Prism Coder
 A browser-based IDE at [synalux.ai/coder](https://synalux.ai/coder). Import any GitHub repo and get:
@@ -358,13 +387,16 @@ code --install-extension synalux-ai.synalux
 [![VS Marketplace](https://img.shields.io/visual-studio-marketplace/v/synalux-ai.synalux?label=VS%20Marketplace&color=007ACC)](https://marketplace.visualstudio.com/items?itemName=synalux-ai.synalux)
-**AI features:** Chat participant (`@synalux`), multi-agent pipeline, voice input with conversation mode, model switching (local Ollama / cloud / Gemini), 10 AI personality tones.
-**Clinical features (BCBA / healthcare):** SOAP note generator, role-based access, document signing, patient board. Voice recording with AES-256-GCM encryption (consent-gated, off by default, plaintext deleted after encryption).
+AI chat, voice input, SOAP note generator, team collaboration, and video calls — all inside VS Code. Routes through local Ollama by default; cloud on paid tiers.
-**Collaboration:** Team chat, direct messages, enterprise video calls (LiveKit), customer board, visual builder, DevContainers, Auth & Database panel.
+<details>
+<summary>Feature details</summary>
-**Privacy note:** The extension routes AI requests through the `BackendRouter` — local Ollama by default for free tier, cloud for paid (user-configurable via `preferLocal`). Clinical features (SOAP notes, voice) route through the same backend. `preferLocal=true` tries local first but can still fall back to cloud if the local model is unavailable. For regulated workloads where PHI must never leave the machine, use the free tier (no cloud key) or an Enterprise plan with BAA that covers cloud-bound data. Licensed under [BSL-1.1](https://marketplace.visualstudio.com/items?itemName=synalux-ai.synalux).
+- **AI**: Chat participant (`@synalux`), multi-agent pipeline, voice input, model switching, 10 tones
+- **Clinical**: SOAP note generator, role-based access, document signing, patient board
+- **Collaboration**: Team chat, DMs, video calls, customer board, visual builder, DevContainers
+- **Privacy**: Local Ollama by default. `preferLocal=true` tries local first. Enterprise BAA available.
+</details>
 ### Prism AAC
@@ -374,6 +406,28 @@ See [github.com/dcostenco/prism-aac](https://github.com/dcostenco/prism-aac)
 ---
+## Git Hooks (Portable)
+Pre-commit and pre-push security hooks that work with any editor, any AI tool, and direct CLI. No Claude Code dependency.
+```bash
+# Install in all repos (one-time)
+bash synalux-private/scripts/install-git-hooks.sh
+# Or install manually in a single repo
+cp hooks/pre-commit .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
+cp hooks/pre-push .git/hooks/pre-push && chmod +x .git/hooks/pre-push
+```
+| Hook | What it checks | Mode |
+|------|----------------|------|
+| `pre-commit` | Dead code, orphan services, scaffold code, missing auth | `PRECOMMIT_MODE=advisory\|block\|off` |
+| `pre-push` | 19-rule security audit (SSRF, SQL injection, secrets, IDOR, etc.) | `PREPUSH_MODE=advisory\|block\|off` |
+Default mode is `advisory` (warn but allow). Set `*_MODE=block` for hard enforcement. Hooks look for full audit scripts in the repo first (`hooks/lib/`), then `~/.claude/hooks/` fallback, then minimal inline checks.
+---
 ## Self-hosting (Enterprise)
 Run the full model stack on your own hardware — no cloud, full data sovereignty.
@@ -381,11 +435,11 @@ Run the full model stack on your own hardware — no cloud, full data sovereignt
 **Requirements:** Mac M2 Pro+ (48 GB recommended) or Linux + NVIDIA GPU, plus [Ollama](https://ollama.com).
 ```bash
-ollama pull dcostenco/prism-coder:14b      # default router
+ollama pull dcostenco/prism-coder:9b       # default router
 export LOCAL_LLM_URL=http://localhost:11434
 ```
-Routing is automatic: `14b → 4b → cloud fallback` on desktop/server, `2b → cloud fallback` on mobile/iPhone. For iOS or another machine on the same network, run `OLLAMA_HOST=0.0.0.0 ollama serve` and point `LOCAL_LLM_URL` at the host's IP.
+Routing is automatic: `9b → 4b → cloud fallback` on desktop/server, `2b → cloud fallback` on mobile/iPhone. For iOS or another machine on the same network, run `OLLAMA_HOST=0.0.0.0 ollama serve` and point `LOCAL_LLM_URL` at the host's IP.
 ---
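
The README changes above state that in `chat` and `code` modes the `<think>` blocks are stripped before the response is served, and that a quality gate rejects output that is empty, think-only, or truncated. The package's own `thinkStrip.js` and `qualityGate.js` are not shown in this part of the diff, so the following is only a minimal sketch of how such checks could work. The names `stripThinkSketch` and `passesGateSketch`, the `hitTokenLimit` flag, and the truncation heuristic (an unclosed `<think>` tag) are assumptions made for illustration.

```javascript
// Hypothetical sketch: NOT the package's thinkStrip.js / qualityGate.js.

// Remove complete <think>...</think> blocks, then drop a dangling
// unclosed <think> section (e.g. the model stopped mid-thought).
function stripThinkSketch(raw) {
    const withoutClosed = raw.replace(/<think>[\s\S]*?<\/think>/g, "");
    const openIdx = withoutClosed.indexOf("<think>");
    const visible = openIdx === -1 ? withoutClosed : withoutClosed.slice(0, openIdx);
    return visible.trim();
}

// Assumed gate: reject empty output, think-only output, and output that
// looks truncated. `hitTokenLimit` is a hypothetical flag the caller sets
// when the backend reports that generation stopped at the token limit.
function passesGateSketch(raw, hitTokenLimit = false) {
    const visible = stripThinkSketch(raw);
    if (visible.length === 0) {
        const reason = raw.includes("<think>") ? "think-only" : "empty";
        return { ok: false, reason };
    }
    // A <think> opened after the last </think> was never closed.
    const unclosedThink = raw.lastIndexOf("<think>") > raw.lastIndexOf("</think>");
    if (unclosedThink || hitTokenLimit) {
        return { ok: false, reason: "truncated" };
    }
    return { ok: true, reason: null };
}
```

In this sketch a failed gate only reports a reason. Whether to escalate to the cloud would be decided by the caller, which is where the README places the paid-tier check.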

package/dist/cli.js CHANGED

@@ -521,10 +521,10 @@ scmCmd
 });
 // ─── prism register-models ────────────────────────────────────
 // Convenience: alias namespaced HF-style prism-coder tags
-// (`dcostenco/prism-coder:14b`) to the bare tags (`prism-coder:14b`)
+// (`dcostenco/prism-coder:9b`) to the bare tags (`prism-coder:9b`)
 // some external tooling expects. The MCP picker handles both forms
 // natively as of v15.5, so this command is OPTIONAL — useful only
-// when a user wants to run `ollama run prism-coder:14b` directly,
+// when a user wants to run `ollama run prism-coder:9b` directly,
 // or for tools that pre-date the picker's namespace fallback.
 program
     .command('register-models')
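
The comment in this hunk says the picker accepts both the namespaced tag (`dcostenco/prism-coder:9b`) and the bare tag (`prism-coder:9b`). The picker code itself is not part of this hunk, so the following is a hedged sketch of one way such a lookup could behave. `resolveTag` and the `pulledTags` list are hypothetical and do not call Ollama.

```javascript
// Hypothetical helper: not taken from the package's modelPicker.js.
const NAMESPACE = "dcostenco/";

// Given a wanted bare tag (e.g. "prism-coder:9b") and the list of tags
// that are pulled locally, prefer the bare tag, fall back to the
// namespaced one, and return null when neither is present.
function resolveTag(wantedBareTag, pulledTags) {
    const pulled = new Set(pulledTags);
    if (pulled.has(wantedBareTag)) {
        return wantedBareTag;
    }
    const namespaced = NAMESPACE + wantedBareTag;
    return pulled.has(namespaced) ? namespaced : null;
}
```

With a fallback of this kind, `prism register-models` stays optional, as the comment notes: the alias matters only to tools that look up the bare tag themselves.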

package/dist/storage/sqlite.js CHANGED

@@ -1268,7 +1268,7 @@ export class SqliteStorage {
             FROM session_ledger
             WHERE project = ? AND user_id = ? AND role = ?
               AND event_type = 'correction'
-              AND importance >= 3
+              AND importance >= 0
               AND deleted_at IS NULL
               AND archived_at IS NULL
             ORDER BY importance DESC
@@ -2323,10 +2323,12 @@ export class SqliteStorage {
             SET importance = MAX(0, importance - 1)
             WHERE project = ? AND user_id = ?
               AND importance > 0
+              AND importance < 10
               AND event_type != 'session'
               AND created_at < datetime('now', '-' || ? || ' days')
+              AND (last_accessed_at IS NULL OR last_accessed_at < datetime('now', '-' || ? || ' days'))
               AND deleted_at IS NULL`,
-            args: [project, userId, decayDays],
+            args: [project, userId, decayDays, decayDays],
         });
         const decayed = result.rowsAffected || 0;
         if (decayed > 0) {
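
The second hunk narrows which rows lose importance: rows at importance 10 or above are now exempt, and so are rows read within the decay window. As a reading aid, the `WHERE` clause can be restated as a plain JavaScript predicate. This is a sketch only. The camelCase field names and the use of epoch-millisecond timestamps are assumptions, and the real logic runs in SQL.

```javascript
// Illustrative restatement of the UPDATE's WHERE clause; field names
// and millisecond timestamps are assumptions, not the real schema.
const DAY_MS = 24 * 60 * 60 * 1000;

function isDecayEligible(entry, decayDays, nowMs) {
    const cutoff = nowMs - decayDays * DAY_MS;
    // New in 19.1.0: a row read inside the window is not stale.
    const stale = entry.lastAccessedAt === null || entry.lastAccessedAt < cutoff;
    return entry.importance > 0
        && entry.importance < 10 // new in 19.1.0: importance >= 10 never decays
        && entry.eventType !== "session"
        && entry.createdAt < cutoff
        && stale
        && entry.deletedAt === null;
}

// Mirrors SET importance = MAX(0, importance - 1) for eligible rows.
function decayOnce(entry, decayDays, nowMs) {
    if (!isDecayEligible(entry, decayDays, nowMs)) {
        return entry;
    }
    return { ...entry, importance: Math.max(0, entry.importance - 1) };
}
```

The same `decayDays` value feeds both the creation cutoff and the last-access cutoff, which matches the extra `decayDays` entry appended to `args` in the diff.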

package/dist/tools/behavioralVerifierHandler.js CHANGED

@@ -10,7 +10,6 @@
  */
 import { PRISM_SYNALUX_BASE_URL, SYNALUX_CONFIGURED } from "../config.js";
 import { getSynaluxJwt } from "../utils/synaluxJwt.js";
-import { debugLog } from "../utils/logger.js";
 const FALLBACK_SCENARIO = [
     "⚠️ BEHAVIORAL VERIFICATION (OFFLINE MODE)",
     "",
@@ -30,7 +29,7 @@ export async function verifyBehaviorHandler(args) {
     }
     const jwt = await getSynaluxJwt();
     if (!jwt) {
-        debugLog("[verify-behavior] JWT unavailable — fail-closed with generic scenario");
+        console.error("[verify-behavior] ⚠️ JWT unavailable — fail-closed with generic scenario");
         return FALLBACK_SCENARIO;
     }
     try {
@@ -49,14 +48,14 @@ export async function verifyBehaviorHandler(args) {
             signal: AbortSignal.timeout(5_000),
         });
         if (!res.ok) {
-            debugLog(`[verify-behavior] portal returned ${res.status} — fail-closed`);
+            console.error(`[verify-behavior] ⚠️ portal returned ${res.status} — fail-closed. URL: ${url}`);
             return FALLBACK_SCENARIO;
         }
         const data = (await res.json());
         return formatResult(data);
     }
     catch (err) {
-        debugLog(`[verify-behavior] error: ${err.message} — fail-closed`);
+        console.error(`[verify-behavior] ⚠️ VERIFICATION FAILED: ${err.message} — using generic fallback`);
         return FALLBACK_SCENARIO;
     }
 }

package/dist/tools/ledgerHandlers.js CHANGED

@@ -977,15 +977,17 @@ export async function sessionLoadContextHandler(args) {
     // Build the response object before v4.0 augmentations
     // SECURITY: Wrap output in boundary tags to prevent context confusion.
     // The LLM sees <prism_memory context="historical"> and knows this is data, not instructions.
-    let responseText = `${MEMORY_BOUNDARY_PREFIX}📋 Session context for "${project}" (${level}):\n\n${formattedContext.trim()}${splitBrainWarning}${driftReport}${briefingBlock}${sdmRecallBlock}${greetingBlock}${visualMemoryBlock}${skillBlock}${versionNote}`;
-    // ─── v4.0: Behavioral Warnings Injection ───────────────────
-    // If loadContext returned behavioral_warnings, add them to the
-    // formatted output so the agent sees them prominently.
+    // ─── v19.1: Behavioral Warnings — BEFORE skills (protected from truncation) ───
+    // Corrections must surface prominently. Placed before skillBlock so the
+    // skill budget cannot push them out. Capped at 2,000 chars.
     const behavWarnings = data?.behavioral_warnings;
+    let behavBlock = '';
     if (behavWarnings && behavWarnings.length > 0) {
-        responseText += `\n\n[⚠️ BEHAVIORAL WARNINGS]\n` +
+        const rawBlock = `\n\n[⚠️ BEHAVIORAL WARNINGS — DO NOT IGNORE]\n` +
             behavWarnings.map(w => `- ${w.summary} (importance: ${w.importance})`).join("\n");
+        behavBlock = [...rawBlock].slice(0, 2000).join('');
     }
+    let responseText = `${MEMORY_BOUNDARY_PREFIX}📋 Session context for "${project}" (${level}):\n\n${formattedContext.trim()}${splitBrainWarning}${driftReport}${briefingBlock}${sdmRecallBlock}${greetingBlock}${visualMemoryBlock}${behavBlock}${skillBlock}${versionNote}`;
     // ─── v9.4.7: ABA Precision Protocol (foundational) ────────
     // Injected into EVERY session load so the agent always operates
     // under these behavioral rules. Never truncated (placed before
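
The new cap in this hunk uses `[...rawBlock].slice(0, 2000).join('')` rather than `rawBlock.slice(0, 2000)`. Spreading a string iterates by Unicode code point, so the cut cannot land in the middle of a surrogate pair. The short sketch below contrasts the two approaches. The helper names are made up for illustration.

```javascript
// Truncate by Unicode code points: never splits a surrogate pair.
function truncateCodePoints(text, max) {
    return [...text].slice(0, max).join("");
}

// Truncate by UTF-16 code units: can cut an astral character in half,
// leaving a lone surrogate at the end of the string.
function truncateCodeUnits(text, max) {
    return text.slice(0, max);
}

// "ok" followed by U+1F600, which occupies two UTF-16 code units.
const sample = "ok\u{1F600}";
const byPoints = truncateCodePoints(sample, 3); // keeps the whole emoji
const byUnits = truncateCodeUnits(sample, 3);   // ends in a lone high surrogate
```

One consequence is that the 2,000 limit counts code points, so the resulting string can be longer than 2,000 UTF-16 units. Code-point truncation can also still separate a base character from a following variation selector or combining mark.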

package/dist/tools/prismInferHandler.js CHANGED

@@ -2,7 +2,7 @@
  * prism_infer — local-first inference tool
  * ─────────────────────────────────────────────────────────────
  * Save the caller's cloud tokens by routing to a local prism-coder
- * model via Ollama. Tiers (32B/14B/8B/1.7B) auto-selected by free
+ * model via Ollama. Tiers (27B/9B/8B/1.7B) auto-selected by free
  * RAM, then capped by `model_ceiling` and the set of tags that are
  * actually pulled into Ollama.
  *
@@ -12,7 +12,7 @@
  *   4. On local fail, if cloud_fallback=true:
  *        - exchange synalux_sk_ → JWT (cached)
  *        - POST synalux portal /api/v1/prism-aac/inference
- *        - portal runs its own cascade (14B/32B/Claude by tier)
+ *        - portal runs its own cascade (9B/27B/Claude by tier)
  *   5. Return { output, backend, model_picked, ram_free_mb, latency_ms, used_cloud }
  *
  * `prism_infer` is a thin client. It never calls Anthropic / OpenRouter
@@ -24,16 +24,17 @@ import { getSynaluxJwt, invalidateSynaluxJwt } from "../utils/synaluxJwt.js";
 import { getAvailableMemoryBytes } from "../utils/availableMemory.js";
 import { PRISM_SYNALUX_BASE_URL, PRISM_LOCAL_LLM_URL, } from "../config.js";
 import { debugLog } from "../utils/logger.js";
-import { verifyGrounding } from "../utils/groundingVerifier.js";
 import { getEntitlements, clampCeiling } from "../utils/entitlements.js";
 import { ddLog } from "../utils/ddLogger.js";
+import { stripThink } from "../utils/thinkStrip.js";
+import { passesQualityGate } from "../utils/qualityGate.js";
 // ─── Tool Definition ────────────────────────────────────────────
 export const PRISM_INFER_TOOL = {
     name: "prism_infer",
     description: "Run an inference on a local prism-coder model (Ollama) to save cloud tokens. " +
-        "Picks the largest viable tier — 32B / 14B / 8B / 1.7B — based on free RAM at call time, " +
+        "Picks the largest viable tier — 27B / 9B / 8B / 1.7B — based on free RAM at call time, " +
         "clamped by `model_ceiling` and what is actually pulled in Ollama. " +
-        "Falls through to the synalux portal cloud cascade (14B → 32B → Claude Opus 4.7) " +
+        "Falls through to the synalux portal cloud cascade (9B → 27B → Claude Opus 4.7) " +
         "only when local is unviable AND `cloud_fallback=true`. " +
         "Use this for code generation, summarisation, classification, or any synth task you would " +
         "otherwise hand to the cloud model — it costs $0 when the local hit succeeds.",
@@ -60,8 +61,8 @@ export const PRISM_INFER_TOOL = {
             },
             model_ceiling: {
                 type: "string",
-                enum: ["32b", "14b", "4b", "2b"],
-                description: "Cap the largest tier the picker may select. e.g. '14b' forbids 32B even if RAM allows.",
+                enum: ["27b", "9b", "4b", "2b"],
+                description: "Cap the largest tier the picker may select. e.g. '9b' forbids 27B even if RAM allows.",
             },
             cloud_fallback: {
                 type: "boolean",
@@ -70,7 +71,7 @@ export const PRISM_INFER_TOOL = {
             },
             timeout_ms: {
                 type: "number",
-                description: "Override per-call timeout. Default scales with model size: 32B=120s, 14B=60s, 4B=20s, 1.7B=15s.",
+                description: "Override per-call timeout. Default scales with model size: 27B=120s, 9B=60s, 4B=20s, 1.7B=15s.",
             },
             evidence: {
                 type: "array",
@@ -103,6 +104,20 @@ export const PRISM_INFER_TOOL = {
                 description: "Override the verifier hard timeout. Default 2000 ms.",
                 default: 2000,
             },
+            mode: {
+                type: "string",
+                enum: ["route", "chat", "code"],
+                description: "Execution mode. 'route' (default) for MCP tool routing — fast, nothink. " +
+                    "'chat' for general conversation — uses thinking, escalates to cloud on failure. " +
+                    "'code' for code generation — uses thinking, larger context. " +
+                    "In chat/code modes, prefers the 27B tier and enables <think> reasoning.",
+                default: "route",
+            },
+            think: {
+                type: "boolean",
+                description: "Enable thinking mode (<think> blocks). Default: true for chat/code, false for route. " +
+                    "Thinking improves quality on complex tasks but adds latency (~2-5s).",
+            },
         },
         required: ["prompt"],
     },
@@ -124,7 +139,12 @@ export function isPrismInferArgs(args) {
     if (a.timeout_ms !== undefined && typeof a.timeout_ms !== "number")
         return false;
     if (a.model_ceiling !== undefined &&
-        !["32b", "14b", "4b", "2b"].includes(a.model_ceiling))
+        !["27b", "9b", "4b", "2b"].includes(a.model_ceiling))
+        return false;
+    if (a.mode !== undefined &&
+        !["route", "chat", "code"].includes(a.mode))
+        return false;
+    if (a.think !== undefined && typeof a.think !== "boolean")
         return false;
     if (a.verify !== undefined && typeof a.verify !== "boolean")
         return false;
@@ -147,9 +167,9 @@ export function isPrismInferArgs(args) {
 }
 // ─── Ollama helpers ────────────────────────────────────────────
 const DEFAULT_TIMEOUTS = {
-    "prism-coder:32b": 120_000,
-    "prism-coder:14b": 60_000,
-    "qwen3.5:4b": 20_000,
+    "prism-coder:27b": 120_000,
+    "prism-coder:9b": 60_000,
+    "prism-coder:4b": 20_000,
     "prism-coder:2b": 15_000,
 };
 /** List Ollama-installed tags. Returns null if Ollama unreachable. */
@@ -194,16 +214,20 @@ export async function listOllamaLoaded(url = PRISM_LOCAL_LLM_URL) {
         return new Set();
     }
 }
-async function callOllamaGenerate(url, model, prompt, system, maxTokens, temperature, timeoutMs) {
+async function callOllamaGenerate(url, model, prompt, system, maxTokens, temperature, timeoutMs, think) {
     try {
+        const messages = [];
+        if (system)
+            messages.push({ role: "system", content: system });
+        messages.push({ role: "user", content: prompt });
         const body = {
             model,
-            prompt,
-            ...(system ? { system } : {}),
+            messages,
             stream: false,
+            ...(think !== undefined ? { think } : {}),
             options: { num_predict: maxTokens, temperature },
         };
-        const res = await fetch(`${url}/api/generate`, {
+        const res = await fetch(`${url}/api/chat`, {
             method: "POST",
             headers: { "Content-Type": "application/json" },
             body: JSON.stringify(body),
@@ -215,10 +239,10 @@ async function callOllamaGenerate(url, model, prompt, system, maxTokens, tempera
         const data = (await res.json());
         if (data.error)
             return { ok: false, reason: `ollama_err:${data.error}` };
-        const text = (data.response ?? "").trim();
+        const text = (data.message?.content ?? "").trim();
         if (!text)
             return { ok: false, reason: "empty_response" };
-        return { ok: true, text };
+        return { ok: true, text, doneReason: data.done_reason };
     }
     catch (err) {
         const name = err instanceof Error ? err.name : "Unknown";
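
The hunk above moves the local call from `/api/generate` to `/api/chat`. As a rough sketch of what changes on the wire, the request body and the reply parsing can be pulled out into two helpers. The names `buildChatBody` and `readChatReply` are hypothetical and do not exist in the package; the field names are the ones visible in the diff.

```javascript
// Hypothetical helpers mirroring the logic inside callOllamaGenerate.
function buildChatBody(model, prompt, system, maxTokens, temperature, think) {
    const messages = [];
    if (system) messages.push({ role: "system", content: system });
    messages.push({ role: "user", content: prompt });
    return {
        model,
        messages,
        stream: false,
        // `think` is only included when the caller resolved it to a boolean.
        ...(think !== undefined ? { think } : {}),
        options: { num_predict: maxTokens, temperature },
    };
}

function readChatReply(data) {
    if (data.error) return { ok: false, reason: `ollama_err:${data.error}` };
    // The chat endpoint nests the text under message.content
    // instead of the top-level `response` used by /api/generate.
    const text = (data.message?.content ?? "").trim();
    if (!text) return { ok: false, reason: "empty_response" };
    return { ok: true, text, doneReason: data.done_reason };
}

const body = buildChatBody("prism-coder:9b", "Summarise this file", undefined, 256, 0.2, false);
console.log(JSON.stringify(body.messages));
console.log(readChatReply({ message: { content: "  done  " }, done_reason: "stop" }));
```

The function keeps its old name, `callOllamaGenerate`, even though it now targets the chat endpoint, so callers that inject it through `deps.callLocal` only see one extra trailing argument.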
@@ -280,8 +304,11 @@ export async function runInfer(args, deps) {
     // Fetch user's plan limits (cached 1hr). Free users without auth
     // get 4b ceiling, 50 calls/day, 512 max tokens.
     const ent = deps.entitlements ?? await getEntitlements();
-    // Clamp model ceiling to what the plan allows
-    const effectiveCeiling = clampCeiling(args.model_ceiling, ent.model_ceiling);
+    // MF2: In chat/code modes, request the 27B tier (subject to plan ceiling + RAM).
+    // mode:"code" implies quality → start higher in the cascade.
+    const mode = args.mode ?? "route";
+    const modeCeiling = (mode === "chat" || mode === "code") ? (args.model_ceiling ?? "27b") : args.model_ceiling;
+    const effectiveCeiling = clampCeiling(modeCeiling, ent.model_ceiling);
     // Clamp max_tokens to plan limit
     const maxTokens = Math.min(args.max_tokens ?? 1024, ent.max_tokens, 8192);
     // Cloud fallback only for paid plans
@@ -327,16 +354,16 @@ export async function runInfer(args, deps) {
     // Walk the tier table top → bottom, capped by model_ceiling. Each tier
     // logs its skip reason ("not_pulled" / "ram_insufficient" / fail reason)
     // so the caller can see exactly why each tier was bypassed.
+    let localDraft = null;
     if (installed) {
-        // Find start index from ceiling — if no ceiling, start at the top (32B).
         const ceilStart = effectiveCeiling
             ? Math.max(0, MODEL_TIERS.findIndex(t => t.tag.endsWith(`:${effectiveCeiling}`)))
             : 0;
         let anyViable = false;
         for (let i = ceilStart; i < MODEL_TIERS.length; i++) {
             const tier = MODEL_TIERS[i];
-            // Accept the tier whether Ollama reports it as bare (`prism-coder:32b`)
-            // or namespaced (`dcostenco/prism-coder:32b`, the form `ollama pull`
+            // Accept the tier whether Ollama reports it as bare (`prism-coder:27b`)
+            // or namespaced (`dcostenco/prism-coder:27b`, the form `ollama pull`
             // produces from a HF repo). resolveOllamaName returns the actual
             // name Ollama knows so /api/generate finds the model.
             const ollamaName = resolveOllamaName(tier.tag, installed);
@@ -353,9 +380,27 @@ export async function runInfer(args, deps) {
             }
             anyViable = true;
             const timeout = args.timeout_ms ?? DEFAULT_TIMEOUTS[tier.tag] ?? 60_000;
-            const result = await deps.callLocal(deps.ollamaUrl, ollamaName, args.prompt, args.system, maxTokens, temperature, timeout);
+            const enableThink = args.think ?? (mode !== "route");
+            const result = await deps.callLocal(deps.ollamaUrl, ollamaName, args.prompt, args.system, maxTokens, temperature, timeout, enableThink);
             if (result.ok) {
-                return await applyVerification(result.text, gatedArgs, deps, {
+                const { stripped, thinkOnly } = stripThink(result.text);
+                const output = stripped;
+                // Quality gate for chat/code modes
+                if (mode !== "route") {
+                    const gate = passesQualityGate(output, thinkOnly, result.doneReason);
+                    if (!gate.pass && allowCloud) {
+                        debugLog(`[prism_infer] quality gate FAIL (${gate.reason}) — escalating to cloud`);
+                        attempts.push({ tier: tier.tag, reason: `quality_gate:${gate.reason}` });
+                        if (gate.reason === "hard_truncation" || gate.reason === "loop_detected") {
+                            localDraft = { output, tier: tier.tag };
+                        }
+                        break;
+                    }
+                    if (!gate.pass) {
+                        debugLog(`[prism_infer] quality gate FAIL (${gate.reason}) — no cloud, serving local`);
+                    }
+                }
+                return await applyVerification(output, gatedArgs, deps, {
                     backend: `ollama-${tier.tag.replace("prism-coder:", "")}`,
                     model_picked: tier.tag,
                     ram_free_mb: ramFreeMb,
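
The two hunks above add per-mode defaults and a gate-driven escalation path. The sketch below restates that decision logic as two pure functions so the cases are easier to follow. `resolveModeDefaults` and `gateOutcome` are hypothetical names; the plan-level clamp (`clampCeiling`) and the RAM check still apply afterwards and are not modelled here.

```javascript
// Hypothetical restatement of the mode/think defaults in runInfer.
function resolveModeDefaults(args) {
    const mode = args.mode ?? "route";
    const wantsQuality = mode === "chat" || mode === "code";
    return {
        mode,
        // chat/code request 27b unless the caller passed an explicit ceiling.
        requestedCeiling: wantsQuality ? (args.model_ceiling ?? "27b") : args.model_ceiling,
        // An explicit `think` wins; otherwise thinking is on for chat/code only.
        enableThink: args.think ?? (mode !== "route"),
    };
}

// Hypothetical restatement of what happens after the quality gate runs.
function gateOutcome(gate, allowCloud) {
    if (gate.pass || !allowCloud) return { action: "serve_local", keepDraft: false };
    // Only truncated or looping output is kept as a last-resort draft.
    const keepDraft = gate.reason === "hard_truncation" || gate.reason === "loop_detected";
    return { action: "escalate_to_cloud", keepDraft };
}

console.log(resolveModeDefaults({ mode: "code" }));
console.log(gateOutcome({ pass: false, reason: "hard_truncation" }, true));
```

One consequence worth noting: a `think_only` or `empty_response` failure leaves no draft behind, so if the cloud call also fails the handler still throws.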
@@ -393,7 +438,20 @@ export async function runInfer(args, deps) {
     else {
         attempts.push({ tier: "synalux", reason: "cloud_fallback_disabled" });
     }
-    // Everything failed.
+    // Cloud also failed — serve the local draft if we have one
+    if (localDraft) {
+        debugLog(`[prism_infer] cloud failed, serving gate-failed local draft from ${localDraft.tier}`);
+        return await applyVerification(localDraft.output, gatedArgs, deps, {
+            backend: `ollama-${localDraft.tier.replace("prism-coder:", "")}`,
+            model_picked: localDraft.tier,
+            ram_free_mb: ramFreeMb,
+            latency_ms: Date.now() - t0,
+            used_cloud: false,
+            attempts,
+            plan: ent.plan,
+            quality_gate_failed: true,
+        });
+    }
     const err = new Error(`prism_infer: no backend produced output. attempts=${JSON.stringify(attempts)}, free=${fmtGb(freeBytes)}`);
     err.attempts = attempts;
     throw err;
@@ -407,10 +465,10 @@ export async function runInfer(args, deps) {
  */
 async function applyVerification(draft, args, deps, partial) {
     const shouldVerify = args.verify ?? (args.evidence !== undefined && args.evidence.length > 0);
-    if (!shouldVerify) {
+    if (!shouldVerify || !deps.callVerifier) {
         return { ...partial, output: draft };
     }
-    const verifier = deps.callVerifier ?? verifyGrounding;
+    const verifier = deps.callVerifier;
     const outcome = await verifier({
         draft,
         evidence: args.evidence ?? [],
@@ -451,6 +509,7 @@ export async function prismInferHandler(args) {
             ` free_ram=${result.ram_free_mb}MB` +
             ` latency=${result.latency_ms}ms` +
             ` used_cloud=${result.used_cloud}` +
+            (result.quality_gate_failed ? ` quality_gate_failed=true` : "") +
             (result.verification ? ` verify=${result.verification.action}` : "") +
             (result.attempts.length ? ` attempts=${JSON.stringify(result.attempts)}` : "");
         return {

package/dist/utils/entitlements.js CHANGED

@@ -6,7 +6,7 @@
  * to enforce model ceiling, max_tokens, and feature gates.
  *
  * Unauthenticated users (no SYNALUX_API_KEY) get free-tier defaults.
- * Authenticated users get their plan from the portal (1-hour cache).
+ * Authenticated users get their plan from the portal (5-minute cache).
  */
 import { getSynaluxJwt } from "./synaluxJwt.js";
 import { PRISM_SYNALUX_BASE_URL, SYNALUX_CONFIGURED } from "../config.js";
@@ -32,10 +32,10 @@ const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
 let cache = null;
 let inFlight = null;
 // ── Model tier ordering for ceiling enforcement ───────────────────
-const TIER_ORDER = ["2b", "4b", "14b", "32b"];
+const TIER_ORDER = ["2b", "4b", "9b", "27b"];
 /**
  * Returns true if `requested` exceeds `ceiling`.
- * e.g. ceilingExceeded("14b", "4b") → true (14b > 4b ceiling)
+ * e.g. ceilingExceeded("9b", "4b") → true (9b > 4b ceiling)
  */
 export function ceilingExceeded(requested, ceiling) {
     const reqIdx = TIER_ORDER.indexOf(requested);
@@ -79,12 +79,18 @@ async function fetchEntitlements() {
             redirect: "error",
         });
         if (!res.ok) {
-            debugLog(`[entitlements] portal HTTP ${res.status} — free tier fallback`);
+            debugLog(`[entitlements] portal HTTP ${res.status}`);
+            if (cache) {
+                debugLog("[entitlements] using last-known-good (safety fail-closed)");
+                return cache.entitlements;
+            }
             return FREE_ENTITLEMENTS;
         }
         const data = (await res.json());
         if (!data.plan || !data.model_ceiling) {
-            debugLog("[entitlements] malformed response — free tier fallback");
+            debugLog("[entitlements] malformed response");
+            if (cache)
+                return cache.entitlements;
             return FREE_ENTITLEMENTS;
         }
         debugLog(`[entitlements] plan=${data.plan} ceiling=${data.model_ceiling} ` +
@@ -92,7 +98,14 @@ async function fetchEntitlements() {
         return data;
     }
     catch (err) {
-        debugLog(`[entitlements] fetch error: ${err instanceof Error ? err.message : String(err)} — free tier fallback`);
+        debugLog(`[entitlements] fetch error: ${err instanceof Error ? err.message : String(err)}`);
+        // F1 fix: fail-closed — keep last-known-good entitlements on fetch error.
+        // Safety controls (grounding_verifier) must not degrade on availability failures.
+        if (cache) {
+            debugLog("[entitlements] using last-known-good (safety fail-closed)");
+            return cache.entitlements;
+        }
+        debugLog("[entitlements] no cached entitlements — free tier fallback (cold start)");
         return FREE_ENTITLEMENTS;
     }
 }
@@ -111,7 +124,14 @@ export async function getEntitlements() {
     inFlight = (async () => {
         try {
             const ent = await fetchEntitlements();
-            cache = { entitlements: ent, expiresAt: Date.now() + CACHE_TTL_MS };
+            // Only update cache if this is a REAL fetch (not a cached fallback).
+            // fetchEntitlements returns cache.entitlements on error — detect by
+            // checking if the returned object is the exact same reference.
+            const isFallback = cache && ent === cache.entitlements;
+            if (!isFallback) {
+                cache = { entitlements: ent, expiresAt: Date.now() + CACHE_TTL_MS };
+            }
+            // On fallback: DON'T refresh expiresAt — let it expire so we retry.
             return ent;
         }
         finally {
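
The last-known-good behaviour in this file can be illustrated with a simplified, synchronous sketch. It assumes the cache is consulted before any fetch and that `fetchPlan` throws on any portal failure; `makeEntitlementCache`, the injected clock, and the trimmed-down `FREE_ENTITLEMENTS` object are all illustrative and not the package's API.

```javascript
const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes, as in the diff
// Illustrative subset of the free-tier defaults.
const FREE_ENTITLEMENTS = { plan: "free", model_ceiling: "4b" };

function makeEntitlementCache(fetchPlan, now) {
    let cache = null;
    return function getEntitlementsSketch() {
        if (cache && now() < cache.expiresAt) return cache.entitlements;
        let ent;
        try {
            ent = fetchPlan();
        } catch {
            // Keep the last-known-good plan; fall back to free only on cold start.
            ent = cache ? cache.entitlements : FREE_ENTITLEMENTS;
        }
        // Same object reference as the cached one means "this was a fallback".
        const isFallback = cache !== null && ent === cache.entitlements;
        if (!isFallback) {
            cache = { entitlements: ent, expiresAt: now() + CACHE_TTL_MS };
        }
        // On fallback the expiry is left untouched, so the next call retries.
        return ent;
    };
}
```

Because the fallback is detected by reference equality, a fresh fetch that returns an equal but distinct object still refreshes the expiry, which is the intended behaviour.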

package/dist/utils/modelPicker.js CHANGED

@@ -1,23 +1,22 @@
 /**
  * RAM-Gated Local Model Picker
  * ─────────────────────────────────────────────────────────────
- * Cascade: 14b (default) → 4b (verifier) → 2b (mobile) → 32b (complex only).
+ * Cascade: 9b (default) → 4b (verifier) → 2b (mobile) → 27b (quality).
  *
- * The default ceiling is "14b" — NOT "32b". This means:
- *   - 14b is the primary model for routing + general inference
+ * The default ceiling is "9b" — NOT "27b". This means:
+ *   - 9b is the primary model for routing + general inference (Qwen3.5-9B, 100% BFCL)
  *   - 4b is used as the grounding verifier (fast, small)
- *   - 2b is the mobile/iPhone first gate (Qwen3.5-4B Q3_K_M, 99.1% BFCL)
- *   - 32b is only loaded when caller explicitly passes ceiling="32b"
+ *   - 2b is the mobile/iPhone first gate (Qwen3.5-2B, 99.1% BFCL)
+ *   - 27b is only loaded when caller explicitly passes ceiling="27b"
  *     or when the task requires maximum quality (complex code gen, etc.)
  *
- * This saves 10GB+ RAM on most devices and keeps response times fast.
- * The 14b achieves 100% on eval_300 — same as 32b.
+ * This saves 11GB+ RAM vs 27b and keeps response times fast.
  *
  *   tag                 weights   need free   ctx     role
- *   prism-coder:32b     ~19 GB    ≥ 24 GB     32K    complex (on-demand)
- *   prism-coder:14b     ~ 9 GB    ≥ 12 GB     32K    default router
- *   qwen3.5:4b          ~ 3.4 GB  ≥  5 GB     32K    verifier (Q4_K_M, 100%)
- *   prism-coder:2b      ~ 2.3 GB  ≥  3 GB      8K    mobile / iPhone (Q3_K_M, 99.1%)
+ *   prism-coder:27b     ~16 GB    ≥ 20 GB     32K    quality (on-demand, Qwen3.5 DeltaNet, 100% BFCL)
+ *   prism-coder:9b      ~ 5.8 GB  ≥  8 GB     32K    default router (Qwen3.5, 100% BFCL)
+ *   prism-coder:4b      ~ 3.4 GB  ≥  5 GB     32K    verifier (Qwen3.5, 100%)
+ *   prism-coder:2b      ~ 2.3 GB  ≥  3 GB      8K    mobile / iPhone (Qwen3.5, 99.1%)
  *
  * Below 3 GB free → no local pick (caller must use cloud).
  */
@@ -27,30 +26,30 @@ const GB = 1024 ** 3;
  * the first row whose minFreeGb fits within freeBytes.
  */
 export const MODEL_TIERS = [
-    { tag: 'prism-coder:32b', weightsGb: 19, minFreeGb: 24, ctxTokens: 32_768 },
-    { tag: 'prism-coder:14b', weightsGb: 9, minFreeGb: 12, ctxTokens: 32_768 },
-    { tag: 'qwen3.5:4b', weightsGb: 3.4, minFreeGb: 5, ctxTokens: 32_768 },
+    { tag: 'prism-coder:27b', weightsGb: 16, minFreeGb: 20, ctxTokens: 32_768 },
+    { tag: 'prism-coder:9b', weightsGb: 5.8, minFreeGb: 8, ctxTokens: 32_768 },
+    { tag: 'prism-coder:4b', weightsGb: 3.4, minFreeGb: 5, ctxTokens: 32_768 },
     { tag: 'prism-coder:2b', weightsGb: 2.3, minFreeGb: 3, ctxTokens: 8_192 },
 ];
 /**
  * True when `installed` matches `tierTag` either as a bare tag
- * (`prism-coder:32b`) or as a namespaced HuggingFace-style tag
- * (`dcostenco/prism-coder:32b`). The README documents `ollama pull
- * dcostenco/prism-coder:32b`, so Ollama's /api/tags returns the
+ * (`prism-coder:27b`) or as a namespaced HuggingFace-style tag
+ * (`dcostenco/prism-coder:27b`). The README documents `ollama pull
+ * dcostenco/prism-coder:27b`, so Ollama's /api/tags returns the
  * namespaced form — without this matcher the picker would never
  * see them and silently fall through to cloud.
  */
 function tagMatches(installed, tierTag) {
     return installed === tierTag || installed.endsWith(`/${tierTag}`);
 }
-/** Default ceiling: 14b. Pass ceiling="32b" explicitly for max quality. */
-export const DEFAULT_CEILING = "14b";
+/** Default ceiling: 9b. Pass ceiling="27b" explicitly for max quality. */
+export const DEFAULT_CEILING = "9b";
 /**
  * Pick the best viable tier for the given free RAM.
- * Default ceiling is 14b — use ceiling="32b" only for complex tasks.
+ * Default ceiling is 9b — use ceiling="27b" only for complex tasks.
  *
  * @param freeBytes  Result of os.freemem() — binary bytes
- * @param ceiling    Cap tier. Default "14b". Pass "32b" for complex tasks.
+ * @param ceiling    Cap tier. Default "9b". Pass "27b" for complex tasks.
  * @param available  Optional whitelist of installed Ollama tags.
  */
 export function pickLocalModel(freeBytes, ceiling, available) {
@@ -80,7 +79,7 @@ export function pickLocalModel(freeBytes, ceiling, available) {
 }
 /**
  * Resolve a tier tag to the actual Ollama name installed locally.
- * If `installed` contains a namespaced match (e.g. `dcostenco/prism-coder:32b`),
+ * If `installed` contains a namespaced match (e.g. `dcostenco/prism-coder:27b`),
  * the namespaced form is returned so Ollama's /api/generate finds it.
  * Falls back to the bare tag when only the bare form is present.
  */
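
To make the new tier table concrete, here is a hedged sketch of a RAM-gated pick over it. `MODEL_TIERS`, `tagMatches`, and `DEFAULT_CEILING` are copied from the diff; `pickTierSketch` is a hypothetical stand-in, since the body of `pickLocalModel` is not shown in this diff and may differ.

```javascript
const GB = 1024 ** 3;
const DEFAULT_CEILING = "9b";
const MODEL_TIERS = [
    { tag: "prism-coder:27b", weightsGb: 16, minFreeGb: 20, ctxTokens: 32_768 },
    { tag: "prism-coder:9b", weightsGb: 5.8, minFreeGb: 8, ctxTokens: 32_768 },
    { tag: "prism-coder:4b", weightsGb: 3.4, minFreeGb: 5, ctxTokens: 32_768 },
    { tag: "prism-coder:2b", weightsGb: 2.3, minFreeGb: 3, ctxTokens: 8_192 },
];

function tagMatches(installed, tierTag) {
    return installed === tierTag || installed.endsWith(`/${tierTag}`);
}

// Hypothetical picker: walk down from the ceiling and return the first tier
// that fits in free RAM and is actually installed in Ollama.
function pickTierSketch(freeBytes, ceiling = DEFAULT_CEILING, installedTags = []) {
    const start = Math.max(0, MODEL_TIERS.findIndex(t => t.tag.endsWith(`:${ceiling}`)));
    for (const tier of MODEL_TIERS.slice(start)) {
        if (freeBytes < tier.minFreeGb * GB) continue;
        const ollamaName = installedTags.find(name => tagMatches(name, tier.tag));
        if (ollamaName) return { tier: tier.tag, ollamaName };
    }
    return null; // caller must use cloud
}

console.log(pickTierSketch(10 * GB, "27b", ["dcostenco/prism-coder:9b", "prism-coder:2b"]));
```

With 10 GB free and a `27b` ceiling, the 27B row is skipped for RAM and the namespaced 9B install is picked, which is the case the `tagMatches` comment describes.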

package/dist/utils/qualityGate.js ADDED

@@ -0,0 +1,43 @@
+/**
+ * Quality Gate — deterministic check for obvious inference failures.
+ *
+ * NARROW by design: only high-precision signals that rarely false-positive.
+ * Does NOT judge correctness — that's the grounding verifier's job.
+ * Does NOT use refusal regex (too many false positives on legitimate output).
+ *
+ * Returns: { pass: boolean, reason?: string }
+ */
+/**
+ * Check if a model response passes the quality gate.
+ * @param stripped  Response AFTER think-stripping (use stripThink first)
+ * @param thinkOnly  True if the response was only <think> blocks with no answer
+ * @param finishReason  Ollama's finish_reason if available (e.g. "length" = truncated)
+ */
+export function passesQualityGate(stripped, thinkOnly, finishReason) {
+    // Signal 1: Think-only — model reasoned but produced no answer (check before empty)
+    if (thinkOnly) {
+        return { pass: false, reason: "think_only" };
+    }
+    // Signal 2: Empty or near-empty after stripping
+    if (stripped.trim().length < 5) {
+        return { pass: false, reason: "empty_response" };
+    }
+    // Signal 3: Hard truncation — Ollama reports finish_reason="length"
+    // meaning the model hit num_predict before finishing
+    if (finishReason === "length") {
+        return { pass: false, reason: "hard_truncation" };
+    }
+    // Signal 4: Exact-loop — same sentence repeated 3+ times
+    const sentences = stripped.split(/[.!?\n]+/).map(s => s.trim()).filter(s => s.length > 10);
+    if (sentences.length >= 6) {
+        const counts = new Map();
+        for (const s of sentences) {
+            const key = s.toLowerCase();
+            counts.set(key, (counts.get(key) ?? 0) + 1);
+            if ((counts.get(key) ?? 0) >= 3) {
+                return { pass: false, reason: "loop_detected" };
+            }
+        }
+    }
+    return { pass: true };
+}
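
A short standalone run of the gate makes the check order visible. The function below is a copy of the added file with comments shortened so the snippet runs on its own; the sample strings are made up. Note that the JSDoc names the third parameter after `finish_reason`, while the handler in this release passes the `done_reason` field from the Ollama reply.

```javascript
function passesQualityGate(stripped, thinkOnly, finishReason) {
    if (thinkOnly) return { pass: false, reason: "think_only" };
    if (stripped.trim().length < 5) return { pass: false, reason: "empty_response" };
    if (finishReason === "length") return { pass: false, reason: "hard_truncation" };
    // Loop check only runs once there are at least 6 sentences longer than 10 chars.
    const sentences = stripped.split(/[.!?\n]+/).map(s => s.trim()).filter(s => s.length > 10);
    if (sentences.length >= 6) {
        const counts = new Map();
        for (const s of sentences) {
            const key = s.toLowerCase();
            counts.set(key, (counts.get(key) ?? 0) + 1);
            if ((counts.get(key) ?? 0) >= 3) return { pass: false, reason: "loop_detected" };
        }
    }
    return { pass: true };
}

const repeated = "The answer is forty two. ";
console.log(passesQualityGate("", true, "stop"));                    // think_only
console.log(passesQualityGate("ok", false, "stop"));                 // empty_response
console.log(passesQualityGate("A full answer.", false, "length"));   // hard_truncation
console.log(passesQualityGate(repeated.repeat(6), false, "stop"));   // loop_detected
console.log(passesQualityGate(repeated.repeat(3), false, "stop"));   // passes: fewer than 6 sentences
```

The last case shows the deliberate narrowness: three repeats alone do not trip the gate unless the output has at least six qualifying sentences.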

package/dist/utils/thinkStrip.js ADDED

@@ -0,0 +1,26 @@
+/**
+ * Think-Strip — remove <think>...</think> blocks from model output.
+ *
+ * Qwen3.5 uses <think> blocks for chain-of-thought reasoning.
+ * These must be stripped before serving to the user or passing
+ * to the grounding verifier (which would try to ground reasoning text).
+ *
+ * Returns: { stripped: string, thinkContent: string | null, thinkOnly: boolean }
+ */
+const THINK_RE = /<(?:think|\|synalux_think\|)>[\s\S]*?<\/(?:think|\|synalux_think\|)>\s*/g;
+const UNCLOSED_THINK_RE = /<(?:think|\|synalux_think\|)>[\s\S]*$/;
+export function stripThink(raw) {
+    if (!raw.includes("<think>") && !raw.includes("<|synalux_think|>")) {
+        return { stripped: raw, thinkContent: null, thinkOnly: false };
+    }
+    const thinkMatch = raw.match(/<(?:think|\|synalux_think\|)>([\s\S]*?)<\/(?:think|\|synalux_think\|)>/);
+    const thinkContent = thinkMatch ? thinkMatch[1].trim() : null;
+    let stripped = raw.replace(THINK_RE, "");
+    stripped = stripped.replace(UNCLOSED_THINK_RE, "");
+    stripped = stripped.trim();
+    return {
+        stripped,
+        thinkContent,
+        thinkOnly: stripped.length === 0 && raw.trim().length > 0,
+    };
+}
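
The behaviour of `stripThink` on closed, unclosed, and absent think blocks is easiest to see by running it. The function is copied from the added file so the snippet is self-contained; the input strings are illustrative.

```javascript
const THINK_RE = /<(?:think|\|synalux_think\|)>[\s\S]*?<\/(?:think|\|synalux_think\|)>\s*/g;
const UNCLOSED_THINK_RE = /<(?:think|\|synalux_think\|)>[\s\S]*$/;

function stripThink(raw) {
    if (!raw.includes("<think>") && !raw.includes("<|synalux_think|>")) {
        return { stripped: raw, thinkContent: null, thinkOnly: false };
    }
    const thinkMatch = raw.match(/<(?:think|\|synalux_think\|)>([\s\S]*?)<\/(?:think|\|synalux_think\|)>/);
    const thinkContent = thinkMatch ? thinkMatch[1].trim() : null;
    let stripped = raw.replace(THINK_RE, "");
    stripped = stripped.replace(UNCLOSED_THINK_RE, "");
    stripped = stripped.trim();
    return { stripped, thinkContent, thinkOnly: stripped.length === 0 && raw.trim().length > 0 };
}

console.log(stripThink("<think>plan the reply</think>\nHello"));
console.log(stripThink("<think>only reasoning</think>"));
console.log(stripThink("Answer <think>cut off mid-thought"));
console.log(stripThink("No tags here"));
```

Two details follow from the code as written: `thinkContent` holds only the first closed block because the capturing match is not global, and an unclosed block yields `thinkContent: null` even though its text is still removed from `stripped`.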

package/dist/verification/gatekeeper.js CHANGED

@@ -15,8 +15,9 @@ export class Gatekeeper {
             console.warn(`\n⚠️  [OVERRIDDEN] Verification Gate bypassed via administrator override.`);
             // Enforce immutability and record audit trail context via environment variables
             validatedResult.gate_override = true;
+            // F19 fix: process.env.USER is trivially spoofable — log it but note it's unauthenticated.
             const actor = process.env.USER || process.env.USERNAME || 'unknown_user';
-            validatedResult.override_reason = validatedResult.override_reason || `CLI --force bypass by ${actor}`;
+            validatedResult.override_reason = validatedResult.override_reason || `CLI --force bypass (unauthenticated env.USER=${actor})`;
             return { canContinue: true, validatedResult };
         }
         switch (validatedResult.gate_action) {

package/dist/verification/runner.js CHANGED

@@ -196,7 +196,12 @@ export class VerificationRunner {
      * Throws an error if the hash does not match, ensuring test integrity.
      */
     static verifyRubricHash(tests, harness) {
-        const computed = computeRubricHash(tests);
+        // F11 fix: include min_pass_rate in hash verification when harness has it.
+        // Try with min_pass_rate first; fall back to without for backward compat.
+        const minRate = harness.min_pass_rate;
+        const computed = minRate !== undefined
+            ? computeRubricHash(tests, minRate)
+            : computeRubricHash(tests);
         if (computed !== harness.rubric_hash) {
             throw new Error(`Rubric hash mismatch. Expected ${harness.rubric_hash}, but computeRubricHash returned ${computed}. The tests have been modified since the harness was created.`);
         }
@@ -405,7 +410,7 @@ export class VerificationRunner {
                     if (!targetCheck.ok) {
                         return { passed: false, error: `HTTP target blocked: ${targetCheck.reason}` };
                     }
-                    const res = await fetch(a.target);
+                    const res = await fetch(a.target, { redirect: "error" });
                     return res.status === a.expected
                         ? { passed: true }
                         : { passed: false, error: `Expected status ${a.expected}, got ${res.status} for ${a.target}` };

package/dist/verification/schema.js CHANGED

@@ -56,8 +56,16 @@ export const TestSuiteSchema = z.object({
  * @param tests - The array of TestAssertion to hash
  * @returns Lowercase hex SHA-256 digest
  */
-export function computeRubricHash(tests) {
+export function computeRubricHash(tests, minPassRate) {
     const sorted = [...tests].sort((a, b) => a.id.localeCompare(b.id));
+    // F11 fix: when minPassRate is provided, include it in the hash so the
+    // threshold can't be changed without invalidating the rubric.
+    // When omitted, hash only tests (backward compatible with existing harnesses).
+    if (minPassRate !== undefined) {
+        return createHash("sha256")
+            .update(JSON.stringify({ tests: sorted, min_pass_rate: minPassRate }))
+            .digest("hex");
+    }
     return createHash("sha256")
         .update(JSON.stringify(sorted))
         .digest("hex");
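
The effect of the optional `minPassRate` argument can be checked in isolation. The sketch below is a compact equivalent of the two branches in the hunk above, written as CommonJS so it runs standalone with Node; the assertion objects and their `kind` field are made-up sample data.

```javascript
const { createHash } = require("node:crypto");

// Equivalent to the hunk above: hash tests alone, or tests plus min_pass_rate.
function computeRubricHash(tests, minPassRate) {
    const sorted = [...tests].sort((a, b) => a.id.localeCompare(b.id));
    const payload = minPassRate !== undefined
        ? { tests: sorted, min_pass_rate: minPassRate }
        : sorted;
    return createHash("sha256").update(JSON.stringify(payload)).digest("hex");
}

// Sample data only; the real TestAssertion shape is defined elsewhere in schema.js.
const tests = [{ id: "b-status", kind: "http" }, { id: "a-exists", kind: "file" }];

const legacy = computeRubricHash(tests);
const strict = computeRubricHash(tests, 0.8);
const loosened = computeRubricHash(tests, 0.5);

console.log(legacy === computeRubricHash([...tests].reverse())); // order does not matter
console.log(strict !== legacy);    // threshold is part of the hash when supplied
console.log(strict !== loosened);  // changing the threshold changes the hash
```

Harnesses created before this release carry no `min_pass_rate`, so `verifyRubricHash` takes the legacy branch for them and their stored hashes keep validating.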

package/dist/verification/severityPolicy.js CHANGED

@@ -44,6 +44,18 @@ export function resolveEffectiveSeverity(assertionSeverity, defaultSeverity) {
  */
 export function evaluateSeverityGates(results, config) {
     const failures = results.filter(r => !r.passed && !r.skipped);
+    // F10 fix: skipped critical (gate/abort) assertions count as failures.
+    // Crafting depends_on to skip critical checks must not neutralize the gate.
+    const skippedCritical = results.filter(r => r.skipped && (r.severity === 'gate' || r.severity === 'abort'));
+    if (skippedCritical.length > 0) {
+        const ids = skippedCritical.map(r => r.id).join(", ");
+        const hasAbort = skippedCritical.some(r => r.severity === 'abort');
+        return {
+            action: hasAbort ? "abort" : "block",
+            failed_assertions: skippedCritical,
+            summary: `${hasAbort ? 'ABORT' : 'BLOCKED'}: ${skippedCritical.length} critical assertion(s) were skipped [${ids}] — treating as failures.`
+        };
+    }
     if (failures.length === 0) {
         return {
             action: "continue",
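
The new early return can be exercised on its own. `checkSkippedCritical` below is a hypothetical extraction of just that check (the real logic lives inline in `evaluateSeverityGates`), and the `warn` severity in the sample data is an assumed non-critical level used only for contrast.

```javascript
// Hypothetical extraction of the skipped-critical check added in this hunk.
function checkSkippedCritical(results) {
    const skippedCritical = results.filter(
        r => r.skipped && (r.severity === "gate" || r.severity === "abort"));
    if (skippedCritical.length === 0) return null; // fall through to normal evaluation
    const hasAbort = skippedCritical.some(r => r.severity === "abort");
    return {
        action: hasAbort ? "abort" : "block",
        failed_assertions: skippedCritical,
    };
}

// A skipped non-critical check does not trigger the early return.
console.log(checkSkippedCritical([{ id: "lint", severity: "warn", skipped: true, passed: false }]));
// A skipped gate check blocks; a skipped abort check escalates to abort.
console.log(checkSkippedCritical([{ id: "auth", severity: "gate", skipped: true, passed: false }]).action);
console.log(checkSkippedCritical([
    { id: "auth", severity: "gate", skipped: true, passed: false },
    { id: "schema", severity: "abort", skipped: true, passed: false },
]).action);
```

Because the check runs before the `failures.length === 0` branch, a suite in which every executed assertion passed can still be blocked if any critical assertion was skipped.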

package/package.json CHANGED

@@ -1,8 +1,8 @@
 {
   "name": "prism-mcp-server",
-  "version": "19.0.0",
+  "version": "19.1.0",
   "mcpName": "io.github.dcostenco/prism-coder",
-  "description": "Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 114 Agent Skills, PHI Guard, Tier Enforcement, Prompt-Based Skill Routing, Zero-Search HDC/HRR retrieval, HRR Semantic Drift Detection across BCBA/Coding/AAC domains, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder 1.7B–32B open-weights LLM fleet.",
+  "description": "Prism Coder \u2014 Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 114 Agent Skills, PHI Guard, Tier Enforcement, Prompt-Based Skill Routing, Zero-Search HDC/HRR retrieval, HRR Semantic Drift Detection across BCBA/Coding/AAC domains, HIPAA-hardened local-first storage, SLERP-optimized GRPO alignment) plus the prism-coder 1.7B\u201332B open-weights LLM fleet.",
   "module": "index.ts",
   "type": "module",
   "main": "dist/server.js",