npm - prism-mcp-server - Versions diffs - 17.0.0 → 17.1.0 - Mend

prism-mcp-server 17.0.0 → 17.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

package/README.md +94 -11
package/dist/tools/ledgerHandlers.js +37 -7
package/dist/tools/prismInferHandler.js +46 -6
package/dist/tools/skillRouting.js +39 -0
package/dist/utils/braveApi.js +52 -0
package/dist/utils/entitlements.js +136 -0
package/dist/utils/phiGuard.js +88 -0
package/dist/utils/synaluxSearch.js +195 -0
package/package.json +2 -2
package/dist/agent/agentTools.js +0 -453
package/dist/agent/mcpBridge.js +0 -234
package/dist/agent/platformUtils.js +0 -470
package/dist/agent/terminalUI.js +0 -198
package/dist/auth.js +0 -218
package/dist/darkfactory/cloudDelegate.js +0 -173
package/dist/env-preload.cjs +0 -2
package/dist/plugins/pluginManager.js +0 -199
package/dist/prism-cloud.js +0 -110
package/dist/scm/ciPipeline.js +0 -220
package/dist/start-with-env.sh +0 -5
package/dist/sync/encryptedSync.js +0 -172
package/dist/sync/synaluxProxy.js +0 -177
package/dist/tools/adaptiveDefinitions.js +0 -148
package/dist/tools/projects.js +0 -214
package/dist/utils/changelogGenerator.js +0 -158
package/dist/utils/fallbackClient.js +0 -52
package/dist/utils/memoryAttestation.js +0 -163
package/dist/utils/rbac.js +0 -321
package/dist/utils/tavilyApi.js +0 -70
package/dist/vm/quotaEnforcer.js +0 -192

package/README.md CHANGED

@@ -200,7 +200,7 @@ HRR acts as Tier 0 — if confidence is high, FTS5 is skipped entirely. Falls th
 Top-1 = correct word is tile #1. MRR = Mean Reciprocal Rank. Zero Top-5 regressions in any scenario. HRR encodes bigrams + trigrams from every spoken phrase; probes take ~0.2ms — safe on every keystroke. All Synalux apps (clinical, AAC, PrismCoach) share HRR via the portal `/api/v1/hrr` endpoint.
-**Competitive comparison:**
+**Memory retrieval comparison:**
 | System | Retrieval | Offline | Cost | Latency |
 |--------|-----------|---------|------|---------|
@@ -512,16 +512,99 @@ Cascade (14B→32B): **100.0%** · Opus solo: 98.3% · Opus engaged: **0% of req
 ## Plans
-| Plan | Cloud model | Daily limit | On-device |
-|---|---|---|---|
-| **Free** | — | unlimited local | prism-coder:1.7b (100%) + 8b (100%) + 14b (100%) |
-| **Standard $19/mo** | Claude Sonnet 4 | 200 req | + cloud fallback |
-| **Pro $49/mo** | prism-coder:32b | 2,000 req | + reasoning tier |
-| **Enterprise $99/mo** | prism-coder:32b priority | unlimited | + HIPAA BAA + custom fine-tuning |
-All on-device models are **free for every tier** — no subscription needed for local inference. Offline translation (1,261 phrases × 20 languages) included in all plans.
-[Subscribe →](https://synalux.ai/pricing)
+| | **Free** | **Standard $19/mo** | **Advanced $49/mo** | **Enterprise $99/mo** |
+|---|---|---|---|---|
+| **Local model ceiling** | up to 4b | up to 14b | up to 32b | up to 32b |
+| **Daily inference limit** | 50 | 200 | 2,000 | 100,000 |
+| **Max output tokens** | 512 | 1,024 | 2,048 | 4,096 |
+| **Cloud fallback** | — | Portal cascade (14b → 32b) | Portal cascade (14b → 32b → Claude Opus) | Priority cascade + Claude Opus |
+| **L3 grounding verifier** | — | ✓ | ✓ | ✓ |
+| **Knowledge search** | limited | unlimited | unlimited | unlimited |
+| **Session memory** | limited | unlimited | unlimited | unlimited |
+| **Analytics dashboard** | — | ✓ | ✓ | ✓ |
+| **HIPAA BAA** | — | — | — | ✓ |
+### What free users get
+- Local Ollama inference with models up to 4b (prism-coder:1b7 and prism-coder:4b)
+- 50 calls/day, 512 max output tokens per call
+- Local SQLite storage for session memory and knowledge
+- All open-weight models available to pull via `ollama pull`
+### What paid users get
+- **Higher model ceilings** — Standard unlocks 14b, Advanced/Enterprise unlock 32b
+- **Cloud fallback** — when local Ollama is down or underpowered, inference routes through the Synalux portal cascade (14b → 32b → Claude Opus)
+- **L3 grounding verifier** — evidence-based claim verification that rejects hallucinated outputs
+- **Unlimited knowledge search and session memory** — no caps on stored context
+- **Analytics dashboard** — usage metrics, latency tracking, model performance
+- **Higher daily limits and token caps** — see table above
+All on-device models are open-weight and free to run locally via Ollama. The subscription gates cloud features, higher model tiers, and increased limits.
+14-day free trial on all paid plans. [Subscribe →](https://synalux.ai/pricing)
+### Why Prism MCP
+**Pricing — flat-rate, not per-seat:**
+| | **Prism MCP** | GitHub Copilot | Cursor | Windsurf | Amazon Q | Tabnine |
+|---|---|---|---|---|---|---|
+| **Individual** | **$19/mo** | $10/mo | $20/mo | $15-20/mo | $19/mo | $39/mo |
+| **Team (5 devs)** | **$49/mo flat** | $95/mo | $200/mo | $200/mo | $95/mo | $295/mo |
+| **Enterprise** | **$99/mo flat** | $195/mo | $1,000/mo | Custom | Custom | Custom |
+**Features — full stack vs single-purpose:**
+| | **Prism MCP** | GitHub Copilot | Cursor | Windsurf | Amazon Q | Tabnine | Devin |
+|---|---|---|---|---|---|---|---|
+| **Web IDE** | **Synalux Coder** | github.dev | — | — | Console | — | Browser |
+| **VS Code extension** | **Yes** | Yes | N/A (is a fork) | N/A (is a fork) | Yes | Yes | No |
+| **MCP server** | **Native** | No | Partial | No | No | No | No |
+| **Works with Claude Code** | **Yes** | No | N/A | No | No | No | No |
+| **Local inference (Ollama)** | **1.7B–32B fleet** | No | No | No | No | No | No |
+| **Cloud fallback** | **14b→32b→Opus** | Cloud only | Cloud only | Cloud only | Cloud only | Cloud only | Cloud only |
+| **Works offline** | **Yes** | No | No | No | No | No | No |
+| **Open-weight models** | **HuggingFace** | Proprietary | Proprietary | Proprietary | Proprietary | Proprietary | Proprietary |
+| **Persistent memory** | **Cross-session** | No | No | No | No | No | Partial |
+| **Cognitive routing** | **Episodic/semantic/procedural** | No | No | No | No | No | No |
+| **Session drift detection** | **HRR-based** | No | No | No | No | No | No |
+| **Codebase indexing** | **Knowledge ingest (MCP + webhook + REST)** | Partial | Yes | Yes | Yes | Yes | Yes |
+| **L3 grounding verifier** | **Evidence-based** | No | No | No | No | No | No |
+| **Multi-agent hivemind** | **Shared Mind Palace** | No | No | No | No | No | No |
+| **Analytics dashboard** | **Yes** | No | Yes | Yes | Yes | No | Yes |
+| **HIPAA / air-gapped** | **On-prem, no BAA needed** | Requires BAA | No | No | Partial | No | No |
+| **Data stays local** | **Yes** | No | No | No | No | No | No |
+**vs local AI tools:**
+| | **Prism MCP** | Ollama | LM Studio | Jan.ai | Mem0 | Zep |
+|---|---|---|---|---|---|---|
+| **Local inference** | 1.7B–32B cascade | Any GGUF | Any GGUF | Any GGUF | No | No |
+| **Cloud fallback** | Automatic | No | No | Partial | Cloud only | Cloud only |
+| **Persistent memory** | Cross-session | No | No | No | Yes | Yes |
+| **Knowledge ingestion** | MCP + GitHub webhook + REST | No | No | No | Partial | No |
+| **Cognitive routing** | 3-store (episodic/semantic/procedural) | No | No | No | No | Temporal graph |
+| **Grounding verifier** | L3 evidence-based | No | No | No | No | No |
+| **Drift detection** | HRR-based | No | No | No | No | No |
+| **MCP server** | Native | No | No | No | No | No |
+| **Web IDE** | Synalux Coder | No | No | No | No | No |
+| **VS Code extension** | Yes | No | No | No | No | No |
+| **Analytics** | Dashboard + Datadog | No | No | No | Yes | Yes |
+| **Price** | $0–99/mo flat | Free | Free/$10/user | Free | $249/mo | $99/mo |
+**Why developers choose Prism:**
+- **Full IDE experience** — Synalux Coder (web) + VS Code extension + MCP for Claude Code, Cursor, JetBrains
+- **Local-first** — your code and context never leave your machine unless you opt in to cloud
+- **Flat-rate pricing** — $49/mo for your whole team, not $40/seat/mo
+- **Works offline** — airplane, hospital, air-gapped classified environments
+- **Open models** — prism-coder weights are on HuggingFace, not locked behind an API
+- **Memory that persists** — cognitive routing stores episodic, semantic, and procedural memory across sessions
+- **Drift detection** — HRR-based session monitoring catches when your AI agent goes off-track
+- **Grounding verification** — L3 verifier rejects hallucinated outputs before they reach you
+- **Codebase indexing** — knowledge ingestion via MCP tool, GitHub webhooks, or REST API
+- **Multi-agent ready** — Hivemind lets multiple agents share the same Mind Palace with role-scoped context
+- **HIPAA without paperwork** — local inference means no BAA required, PHI never leaves the device
 ---
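The "Local model ceiling" row reads as a cut-off in the tier ordering that `entitlements.js` (added later in this diff) enforces, `["1b7", "4b", "8b", "14b", "32b"]`. Here is a minimal sketch that assumes that ordering and uses a hypothetical `PLAN_CEILINGS` map copied from the table. The plan keys other than `free` are guesses, and in 17.1.0 the real limits come from the portal's `/api/v1/prism/entitlements` response. It lists the tiers each plan can run locally:

```javascript
// Tier ordering as defined by TIER_ORDER in dist/utils/entitlements.js.
const TIER_ORDER = ["1b7", "4b", "8b", "14b", "32b"];

// Hypothetical map mirroring the "Local model ceiling" row of the plan table.
// In the package, only the free-tier values are hard-coded; paid limits come from the portal.
const PLAN_CEILINGS = {
  free: "4b",
  standard: "14b",
  advanced: "32b",
  enterprise: "32b",
};

// Return every local tier at or below the plan's ceiling (unknown plans fall back to free).
function tiersForPlan(plan) {
  const ceiling = PLAN_CEILINGS[plan] ?? PLAN_CEILINGS.free;
  const ceilIdx = TIER_ORDER.indexOf(ceiling);
  return TIER_ORDER.slice(0, ceilIdx + 1);
}

for (const plan of Object.keys(PLAN_CEILINGS)) {
  console.log(plan, tiersForPlan(plan).join(", "));
}
```

Only the `free` column (`4b` ceiling, 50 calls/day, 512 max tokens) is hard-coded in the package itself, as `FREE_ENTITLEMENTS`.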

package/dist/tools/ledgerHandlers.js CHANGED

@@ -3,6 +3,7 @@ import * as nodePath from "node:path";
 import * as os from "node:os";
 import { randomUUID } from "node:crypto";
 import { redactSettings, toMarkdown } from "./commonHelpers.js";
+import { scanAndRedactPHI } from "../utils/phiGuard.js";
 import * as fflate from "fflate";
 import { buildVaultDirectory } from "../utils/vaultExporter.js";
 /**
@@ -60,9 +61,11 @@ import { notifyResourceUpdate } from "../server.js";
  * Zero-latency (pure regex, no API calls). Runs on every save.
  */
 export function sanitizeMemoryInput(text) {
-    return text
+    const stripped = text
         .replace(/<\/?(?:system|user_input|instruction|anti_pattern|desired_pattern|assistant|tool_call|prism_memory)[^>]*>/gi, '')
         .trim();
+    // HIPAA: redact PHI before storage — SSN, DOB, MRN, patient names, etc.
+    return scanAndRedactPHI(stripped).redacted;
 }
 /** Sanitize each string in an array (for decisions[], todos[], etc.) */
 function sanitizeArray(arr) {
@@ -853,17 +856,29 @@ export async function sessionLoadContextHandler(args) {
             .fetchSkillContent(missing).catch(() => ({}));
         debugLog(`[session_load_context] Synalux skill content fetched: ${Object.keys(synaluxContent).join(", ") || "none"}`);
     }
+    const SKILL_BLOCK_CAP = 30_000;
+    const skippedSkills = [];
     for (const skillName of skillsToLoad) {
         if (loadedSkills.includes(skillName))
             continue;
-        // Synalux (paid) → platform fallback skill:<name> (free/offline). Never user_skill:.
+        if (skillBlock.length >= SKILL_BLOCK_CAP) {
+            skippedSkills.push(skillName);
+            debugLog(`[session_load_context] Skill "${skillName}" skipped — block cap ${SKILL_BLOCK_CAP} reached`);
+            continue;
+        }
         const content = synaluxContent[skillName] || await getSetting(`skill:${skillName}`, "");
         if (content && content.trim()) {
+            const trimmed = content.trim();
+            if (skillBlock.length + trimmed.length > SKILL_BLOCK_CAP && loadedSkills.length > 0) {
+                skippedSkills.push(skillName);
+                debugLog(`[session_load_context] Skill "${skillName}" skipped — would exceed cap (${skillBlock.length}+${trimmed.length} > ${SKILL_BLOCK_CAP})`);
+                continue;
+            }
             const source = synaluxContent[skillName] ? "synalux" : "local-platform";
-            skillBlock += `\n\n[📜 SKILL: ${skillName}]\n${content.trim()}`;
+            skillBlock += `\n\n[📜 SKILL: ${skillName}]\n${trimmed}`;
             loadedSkills.push(skillName);
             skillLoaded = true;
-            debugLog(`[session_load_context] Skill "${skillName}" loaded (${source}) for project="${project}"`);
+            debugLog(`[session_load_context] Skill "${skillName}" loaded (${source}) for project="${project}" [${skillBlock.length}/${SKILL_BLOCK_CAP} chars]`);
         }
     }
     // ─── User-Local Skills ──────────────────────────────────────
@@ -876,10 +891,15 @@ export async function sessionLoadContextHandler(args) {
         for (const [k, v] of Object.entries(allSettings)) {
             if (!k.startsWith(prefix) || !v)
                 continue;
+            if (skillBlock.length >= SKILL_BLOCK_CAP)
+                break;
             const skillName = k.replace(prefix, "");
             if (loadedSkills.includes(skillName))
                 continue;
-            skillBlock += `\n\n[📜 USER SKILL: ${skillName}]\n${v.trim()}`;
+            const trimmed = v.trim();
+            if (skillBlock.length + trimmed.length > SKILL_BLOCK_CAP && loadedSkills.length > 0)
+                continue;
+            skillBlock += `\n\n[📜 USER SKILL: ${skillName}]\n${trimmed}`;
             loadedSkills.push(skillName);
             skillLoaded = true;
             debugLog(`[session_load_context] User-local skill "${skillName}" loaded`);
@@ -888,22 +908,32 @@ export async function sessionLoadContextHandler(args) {
     // ─── Memory-Based Skill Discovery ──────────────────────────
     // If recent handoff/ledger mentions a platform skill name, auto-load it.
     // Only scans platform skill: keys — user_skill: discovery is not automatic.
-    if (formattedContext.length > 0) {
+    if (formattedContext.length > 0 && skillBlock.length < SKILL_BLOCK_CAP) {
         const contextText = formattedContext.toLowerCase();
         const allSkillKeys = await storage.getAllSettings?.() || {};
         for (const [k, v] of Object.entries(allSkillKeys)) {
             if (!k.startsWith("skill:") || !v)
                 continue;
+            if (skillBlock.length >= SKILL_BLOCK_CAP)
+                break;
             const skillName = k.replace("skill:", "");
             if (loadedSkills.includes(skillName))
                 continue;
             if (contextText.includes(skillName.replace(/-/g, " ")) || contextText.includes(skillName)) {
-                skillBlock += `\n\n[📜 CONTEXT SKILL: ${skillName}]\n${v}`;
+                const trimmed = v.trim();
+                if (skillBlock.length + trimmed.length > SKILL_BLOCK_CAP && loadedSkills.length > 0) {
+                    skippedSkills.push(skillName);
+                    continue;
+                }
+                skillBlock += `\n\n[📜 CONTEXT SKILL: ${skillName}]\n${trimmed}`;
                 loadedSkills.push(skillName);
                 debugLog(`[session_load_context] Context-triggered skill "${skillName}"`);
             }
         }
     }
+    if (skippedSkills.length > 0) {
+        skillBlock += `\n\n[⏭️ ${skippedSkills.length} skills skipped (cap ${SKILL_BLOCK_CAP} chars): ${skippedSkills.join(", ")}]`;
+    }
     // ─── Agent Greeting Block ────────────────────────────────────
     // Shows agent identity (name + role) and skill status after briefing.
     let greetingBlock = "";
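The new `SKILL_BLOCK_CAP` logic is a greedy packing rule applied across the platform, user-local, and context-triggered loops. The sketch below folds it into one hypothetical `packSkills` helper. The name and the `{ name, content }` input shape are illustrative and are not part of the package. It keeps the two checks from the diff. First, it stops adding skills once the block has reached the cap. Second, it skips a skill that would push the block past the cap, unless nothing has been loaded yet, so a single oversized first skill is still admitted. Like the original, the size check counts only the trimmed skill body, not the `[SKILL: name]` header.

```javascript
// Simplified, single-loop version of the SKILL_BLOCK_CAP rule in
// sessionLoadContextHandler(). Input: [{ name, content }] in load order.
function packSkills(skills, cap) {
  let block = "";
  const loaded = [];
  const skipped = [];
  for (const { name, content } of skills) {
    if (loaded.includes(name)) continue; // already loaded from another source
    if (block.length >= cap) {
      skipped.push(name); // cap already reached
      continue;
    }
    const trimmed = (content ?? "").trim();
    if (!trimmed) continue; // empty skills are ignored, not reported as skipped
    // Only the body length is checked; the header adds a few chars on top.
    if (block.length + trimmed.length > cap && loaded.length > 0) {
      skipped.push(name); // would overflow, and something is already loaded
      continue;
    }
    block += `\n\n[SKILL: ${name}]\n${trimmed}`;
    loaded.push(name);
  }
  if (skipped.length > 0) {
    block += `\n\n[${skipped.length} skills skipped (cap ${cap} chars): ${skipped.join(", ")}]`;
  }
  return { block, loaded, skipped };
}

const demo = packSkills(
  [
    { name: "a", content: "x".repeat(60) },
    { name: "b", content: "y".repeat(50) },
    { name: "c", content: "z".repeat(10) },
  ],
  100,
);
console.log(demo.loaded, demo.skipped); // [ 'a', 'c' ] [ 'b' ]
```

The sketch differs from the diff in one way. In the diff, the user-local loop `break`s at the cap and `continue`s past an oversized skill without adding either one to `skippedSkills`. As a result, only platform and context-triggered skills are listed in the skipped-skills footer.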

package/dist/tools/prismInferHandler.js CHANGED

@@ -25,6 +25,8 @@ import { getAvailableMemoryBytes } from "../utils/availableMemory.js";
 import { PRISM_SYNALUX_BASE_URL, PRISM_LOCAL_LLM_URL, } from "../config.js";
 import { debugLog } from "../utils/logger.js";
 import { verifyGrounding } from "../utils/groundingVerifier.js";
+import { getEntitlements, clampCeiling } from "../utils/entitlements.js";
+import { ddLog } from "../utils/ddLogger.js";
 // ─── Tool Definition ────────────────────────────────────────────
 export const PRISM_INFER_TOOL = {
     name: "prism_infer",
@@ -273,12 +275,47 @@ async function callSynaluxInference(prompt, maxTokens, timeoutMs) {
 }
 export async function runInfer(args, deps) {
     const t0 = Date.now();
-    const maxTokens = Math.min(args.max_tokens ?? 1024, 8192);
     const temperature = args.temperature ?? 0;
-    const allowCloud = args.cloud_fallback === true;
+    // ── Entitlement enforcement ──────────────────────────────────
+    // Fetch user's plan limits (cached 1hr). Free users without auth
+    // get 4b ceiling, 50 calls/day, 512 max tokens.
+    const ent = deps.entitlements ?? await getEntitlements();
+    // Clamp model ceiling to what the plan allows
+    const effectiveCeiling = clampCeiling(args.model_ceiling, ent.model_ceiling);
+    // Clamp max_tokens to plan limit
+    const maxTokens = Math.min(args.max_tokens ?? 1024, ent.max_tokens, 8192);
+    // Cloud fallback only for paid plans
+    const allowCloud = args.cloud_fallback === true && ent.features.cloud_fallback;
+    // Verification only for paid plans (free users skip L3 grounding)
+    const canVerify = ent.features.grounding_verifier;
     const freeBytes = deps.freemem();
     const ramFreeMb = Math.round(freeBytes / (1024 * 1024));
     const attempts = [];
+    // Strip verification args if plan lacks grounding_verifier
+    const gatedArgs = canVerify ? args : { ...args, verify: false, evidence: undefined };
+    debugLog(`[prism_infer] plan=${ent.plan} ceiling=${effectiveCeiling} max_tokens=${maxTokens} cloud=${allowCloud} verify=${canVerify}`);
+    // Log tier enforcement to Datadog for monetization visibility
+    const ceilingClamped = effectiveCeiling !== (args.model_ceiling ?? ent.model_ceiling);
+    const tokensClamped = maxTokens < (args.max_tokens ?? 1024);
+    const cloudBlocked = args.cloud_fallback === true && !allowCloud;
+    const verifierBlocked = (args.verify === true || (args.evidence?.length ?? 0) > 0) && !canVerify;
+    if (ceilingClamped || tokensClamped || cloudBlocked || verifierBlocked) {
+        ddLog("info", "prism_infer.tier_enforcement", {
+            plan: ent.plan,
+            requested_ceiling: args.model_ceiling,
+            effective_ceiling: effectiveCeiling,
+            ceiling_clamped: ceilingClamped,
+            requested_tokens: args.max_tokens,
+            effective_tokens: maxTokens,
+            tokens_clamped: tokensClamped,
+            cloud_requested: args.cloud_fallback,
+            cloud_allowed: allowCloud,
+            cloud_blocked: cloudBlocked,
+            verify_requested: args.verify,
+            verify_allowed: canVerify,
+            verify_blocked: verifierBlocked,
+        });
+    }
     // Discover which tags Ollama actually has + which are already warm.
     // Already-loaded models don't need RAM headroom — they're reusing
     // memory Ollama allocated previously.
@@ -292,8 +329,8 @@ export async function runInfer(args, deps) {
     // so the caller can see exactly why each tier was bypassed.
     if (installed) {
         // Find start index from ceiling — if no ceiling, start at the top (32B).
-        const ceilStart = args.model_ceiling
-            ? Math.max(0, MODEL_TIERS.findIndex(t => t.tag.endsWith(args.model_ceiling) || t.tag === args.model_ceiling))
+        const ceilStart = effectiveCeiling
+            ? Math.max(0, MODEL_TIERS.findIndex(t => t.tag.endsWith(effectiveCeiling) || t.tag === effectiveCeiling))
             : 0;
         let anyViable = false;
         for (let i = ceilStart; i < MODEL_TIERS.length; i++) {
@@ -318,13 +355,14 @@ export async function runInfer(args, deps) {
             const timeout = args.timeout_ms ?? DEFAULT_TIMEOUTS[tier.tag] ?? 60_000;
             const result = await deps.callLocal(deps.ollamaUrl, ollamaName, args.prompt, args.system, maxTokens, temperature, timeout);
             if (result.ok) {
-                return await applyVerification(result.text, args, deps, {
+                return await applyVerification(result.text, gatedArgs, deps, {
                     backend: `ollama-${tier.tag.replace("prism-coder:", "")}`,
                     model_picked: tier.tag,
                     ram_free_mb: ramFreeMb,
                     latency_ms: Date.now() - t0,
                     used_cloud: false,
                     attempts,
+                    plan: ent.plan,
                 });
             }
             attempts.push({ tier: tier.tag, reason: result.reason });
@@ -340,13 +378,14 @@ export async function runInfer(args, deps) {
         const cloudTimeout = args.timeout_ms ?? 90_000;
         const cloud = await deps.callCloud(args.prompt, maxTokens, cloudTimeout);
         if (cloud.ok && cloud.output) {
-            return await applyVerification(cloud.output, args, deps, {
+            return await applyVerification(cloud.output, gatedArgs, deps, {
                 backend: cloud.backend ?? "synalux",
                 model_picked: null,
                 ram_free_mb: ramFreeMb,
                 latency_ms: Date.now() - t0,
                 used_cloud: true,
                 attempts,
+                plan: ent.plan,
             });
         }
         attempts.push({ tier: "synalux", reason: cloud.reason ?? "unknown" });
@@ -408,6 +447,7 @@ export async function prismInferHandler(args) {
         debugLog(`[prism_infer] backend=${result.backend} model=${result.model_picked} latency=${result.latency_ms}ms free=${result.ram_free_mb}MB`);
         const header = `[prism_infer] backend=${result.backend}` +
             ` model=${result.model_picked ?? "n/a"}` +
+            ` plan=${result.plan ?? "unknown"}` +
             ` free_ram=${result.ram_free_mb}MB` +
             ` latency=${result.latency_ms}ms` +
             ` used_cloud=${result.used_cloud}` +
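The gating at the top of `runInfer()` is pure arithmetic on the request and the plan, so it can be checked in isolation. The sketch below copies `clampCeiling()` from `entitlements.js` and wraps the rest of the gating in a hypothetical `resolveInferLimits(args, ent)` helper. The returned flags are the conditions that decide whether a `prism_infer.tier_enforcement` event is sent through `ddLog`.

```javascript
// Tier ordering from dist/utils/entitlements.js.
const TIER_ORDER = ["1b7", "4b", "8b", "14b", "32b"];

// Same rule as clampCeiling(): the lower of the requested and plan ceilings;
// an unknown requested tag falls back to the plan ceiling.
function clampCeiling(requested, planCeiling) {
  if (!requested) return planCeiling;
  const reqIdx = TIER_ORDER.indexOf(requested);
  const planIdx = TIER_ORDER.indexOf(planCeiling);
  if (reqIdx === -1) return planCeiling;
  if (planIdx === -1) return requested;
  return TIER_ORDER[Math.min(reqIdx, planIdx)];
}

// Hypothetical helper mirroring the gating at the top of runInfer(): returns the
// effective settings plus the flags that trigger a tier_enforcement log event.
function resolveInferLimits(args, ent) {
  const effectiveCeiling = clampCeiling(args.model_ceiling, ent.model_ceiling);
  const maxTokens = Math.min(args.max_tokens ?? 1024, ent.max_tokens, 8192);
  const allowCloud = args.cloud_fallback === true && ent.features.cloud_fallback;
  const canVerify = ent.features.grounding_verifier;
  const flags = {
    ceilingClamped: effectiveCeiling !== (args.model_ceiling ?? ent.model_ceiling),
    tokensClamped: maxTokens < (args.max_tokens ?? 1024),
    cloudBlocked: args.cloud_fallback === true && !allowCloud,
    verifierBlocked: (args.verify === true || (args.evidence?.length ?? 0) > 0) && !canVerify,
  };
  return { effectiveCeiling, maxTokens, allowCloud, canVerify, flags };
}

// Free-tier defaults from FREE_ENTITLEMENTS (features trimmed to the two used here).
const FREE = {
  plan: "free",
  model_ceiling: "4b",
  max_tokens: 512,
  features: { cloud_fallback: false, grounding_verifier: false },
};

console.log(resolveInferLimits({ model_ceiling: "32b", cloud_fallback: true }, FREE));
```

`max_tokens` defaults to 1024 before clamping. So a free-tier call that omits `max_tokens` still yields `tokensClamped: true` (512 < 1024) and logs an enforcement event, even though the caller asked for nothing beyond the plan. Similarly, an unrecognized `model_ceiling` tag is replaced by the plan ceiling and counted as clamped.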

package/dist/tools/skillRouting.js CHANGED

@@ -81,6 +81,45 @@ export async function resolveSkillsForProject(project) {
         user_local: table.user_local ?? OFFLINE_FALLBACK.user_local,
     };
 }
+/**
+ * Resolve skills based on user prompt keywords. Matches prompt text
+ * against the routing table's prompt_keywords regex patterns.
+ * Returns deduplicated skill names (excluding any already in baseSkills).
+ */
+export async function resolveSkillsForPrompt(prompt, baseSkills = []) {
+    const now = Date.now();
+    if (!cached || now - cached.fetchedAt > CACHE_TTL_MS) {
+        if (!inflight) {
+            inflight = fetchOnce().then((table) => {
+                cached = { table, fetchedAt: Date.now() };
+                return table;
+            }).finally(() => { inflight = null; });
+        }
+        await inflight;
+    }
+    const table = cached.table;
+    if (!table.prompt_keywords)
+        return [];
+    const existing = new Set(baseSkills);
+    const matched = [];
+    for (const [pattern, skills] of Object.entries(table.prompt_keywords)) {
+        try {
+            const re = new RegExp(pattern, 'i');
+            if (re.test(prompt)) {
+                for (const s of skills) {
+                    if (!existing.has(s)) {
+                        existing.add(s);
+                        matched.push(s);
+                    }
+                }
+            }
+        }
+        catch {
+            // Invalid regex in routing table — skip silently
+        }
+    }
+    return matched;
+}
 /** Force a re-fetch on the next call. Exposed for tests + admin tooling. */
 export function _invalidateRoutingCache() {
     cached = null;
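The matching half of `resolveSkillsForPrompt()` can be read without the routing-table fetch. It works as follows:

- Each `prompt_keywords` key is compiled as a case-insensitive regex.
- Every skill from a matching entry is added once.
- Skills already in `baseSkills` are excluded.
- A key that fails to compile is ignored.

Here is a minimal sketch with a hypothetical `matchPromptSkills` name and made-up table entries. The real table is fetched and cached by `skillRouting.js`.

```javascript
// Matching step of resolveSkillsForPrompt(), without the fetch/cache layer.
// promptKeywords maps a regex source string to an array of skill names.
function matchPromptSkills(promptKeywords, prompt, baseSkills = []) {
  if (!promptKeywords) return [];
  const existing = new Set(baseSkills); // never return skills the caller already has
  const matched = [];
  for (const [pattern, skills] of Object.entries(promptKeywords)) {
    let re;
    try {
      re = new RegExp(pattern, "i"); // case-insensitive, as in the original
    } catch {
      continue; // invalid pattern in the table: skip it silently
    }
    if (!re.test(prompt)) continue;
    for (const s of skills) {
      if (!existing.has(s)) {
        existing.add(s);
        matched.push(s);
      }
    }
  }
  return matched;
}

// Hypothetical routing-table excerpt; the last key is deliberately invalid.
const table = {
  "\\b(hipaa|phi)\\b": ["hipaa-compliance"],
  "\\bdeploy(ment)?\\b": ["ci-pipeline", "hipaa-compliance"],
  "([unclosed": ["never-returned"],
};
console.log(matchPromptSkills(table, "Review the deployment for PHI leaks", ["ci-pipeline"]));
// [ 'hipaa-compliance' ]
```

Results follow the table's key order, because `Object.entries` preserves insertion order for non-numeric string keys.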

package/dist/utils/braveApi.js CHANGED

@@ -29,8 +29,20 @@
  * The Brave Answers endpoint uses a separate BRAVE_ANSWERS_API_KEY via Bearer token.
  */
 import { BRAVE_API_KEY, BRAVE_ANSWERS_API_KEY } from "../config.js";
+import { SYNALUX_SEARCH_AVAILABLE, synaluxWebSearch, synaluxWebSearchRaw, synaluxLocalSearch, synaluxLocalSearchRaw, synaluxBraveAnswers, } from "./synaluxSearch.js";
+import { debugLog } from "./logger.js";
 // Brave Answers API call (AI Grounding/OpenAI-compatible)
 export async function performBraveAnswers(query, model = "brave") {
+    // Route through Synalux portal when available
+    if (SYNALUX_SEARCH_AVAILABLE) {
+        try {
+            return await synaluxBraveAnswers(query, model);
+        }
+        catch (err) {
+            debugLog(`[braveApi] Synalux answers failed, falling back to Brave: ${err instanceof Error ? err.message : String(err)}`);
+            // Fall through to direct Brave API
+        }
+    }
     if (!BRAVE_ANSWERS_API_KEY) {
         throw new Error("BRAVE_ANSWERS_API_KEY is not configured");
     }
@@ -62,6 +74,16 @@ export async function performBraveAnswers(query, model = "brave") {
 }
 // Raw web search API call
 export async function performWebSearchRaw(query, count = 10, offset = 0) {
+    // Route through Synalux portal when available (offset=0 only — portal doesn't support offset)
+    if (SYNALUX_SEARCH_AVAILABLE && offset === 0) {
+        try {
+            return await synaluxWebSearchRaw(query, count);
+        }
+        catch (err) {
+            debugLog(`[braveApi] Synalux search failed, falling back to Brave: ${err instanceof Error ? err.message : String(err)}`);
+            // Fall through to direct Brave API
+        }
+    }
     const url = new URL("https://api.search.brave.com/res/v1/web/search");
     url.searchParams.set("q", query);
     url.searchParams.set("count", Math.min(count, 20).toString()); // API limit
@@ -81,6 +103,16 @@ export async function performWebSearchRaw(query, count = 10, offset = 0) {
 }
 // Web search API call
 export async function performWebSearch(query, count = 10, offset = 0) {
+    // Route through Synalux portal when available (offset=0 only — portal doesn't support offset)
+    if (SYNALUX_SEARCH_AVAILABLE && offset === 0) {
+        try {
+            return await synaluxWebSearch(query, count);
+        }
+        catch (err) {
+            debugLog(`[braveApi] Synalux search failed, falling back to Brave: ${err instanceof Error ? err.message : String(err)}`);
+            // Fall through to direct Brave API
+        }
+    }
     const textData = await performWebSearchRaw(query, count, offset);
     const data = JSON.parse(textData);
     // Extract just web results
@@ -136,6 +168,16 @@ function chunkArray(arr, size) {
 }
 // Raw local search API call with poi/details payload
 export async function performLocalSearchRaw(query, count = 5) {
+    // Route through Synalux portal when available
+    if (SYNALUX_SEARCH_AVAILABLE) {
+        try {
+            return await synaluxLocalSearchRaw(query, count);
+        }
+        catch (err) {
+            debugLog(`[braveApi] Synalux local search raw failed, falling back to Brave: ${err instanceof Error ? err.message : String(err)}`);
+            // Fall through to direct Brave API
+        }
+    }
     // Initial search to get location IDs
     const webUrl = new URL("https://api.search.brave.com/res/v1/web/search");
     webUrl.searchParams.set("q", query);
@@ -190,6 +232,16 @@ export async function performLocalSearchRaw(query, count = 5) {
 }
 // Local search API call with poi details
 export async function performLocalSearch(query, count = 5) {
+    // Route through Synalux portal when available
+    if (SYNALUX_SEARCH_AVAILABLE) {
+        try {
+            return await synaluxLocalSearch(query, count);
+        }
+        catch (err) {
+            debugLog(`[braveApi] Synalux local search failed, falling back to Brave: ${err instanceof Error ? err.message : String(err)}`);
+            // Fall through to direct Brave API
+        }
+    }
     const rawData = await performLocalSearchRaw(query, count);
     const parsed = JSON.parse(rawData);
     if (parsed.source === "web_fallback") {
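Every `perform*` function in `braveApi.js` now follows the same shape:

- Try the Synalux portal when `SYNALUX_SEARCH_AVAILABLE` is set. For the web-search pair, this only happens when `offset === 0`.
- If the portal call fails, log the error.
- Fall through to the direct Brave API.

Here is a minimal sketch of that pattern as a hypothetical `withPortalFallback` helper. Stub functions stand in for the portal and Brave calls, and `SYNALUX_AVAILABLE` is a placeholder for the real config flag.

```javascript
// Pattern used by each perform* function in braveApi.js (17.1.0):
// portal first when available, direct Brave API on failure or when unavailable.
async function withPortalFallback({ available, portal, direct, log = console.debug }) {
  if (available) {
    try {
      return await portal(); // await so a rejected portal call is caught here
    } catch (err) {
      // Mirrors the debugLog call: record the reason, then fall through.
      log(`portal failed, falling back to Brave: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return direct();
}

// Placeholders for SYNALUX_SEARCH_AVAILABLE, synaluxWebSearch() and the Brave request.
const SYNALUX_AVAILABLE = true;
const failingPortal = async () => {
  throw new Error("HTTP 503");
};
const braveDirect = async () => "brave result";

(async () => {
  const offset = 0;
  const result = await withPortalFallback({
    available: SYNALUX_AVAILABLE && offset === 0, // web-search variants also require offset === 0
    portal: failingPortal,
    direct: braveDirect,
    log: (msg) => console.log(msg),
  });
  console.log(result);
})();
```

Because the pattern is applied at both levels, a failing portal can be tried twice for one search. `performWebSearch()` falls through to `performWebSearchRaw()`, which tries the portal again before calling Brave. `performLocalSearch()` and `performLocalSearchRaw()` behave the same way.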

package/dist/utils/entitlements.js ADDED

@@ -0,0 +1,136 @@
+/**
+ * Prism Entitlements — Plan-Based Feature & Model Gating
+ * ═══════════════════════════════════════════════════════════
+ * Fetches the user's plan entitlements from the Synalux portal
+ * and caches them locally. Used by prism_infer and other tools
+ * to enforce model ceiling, max_tokens, and feature gates.
+ *
+ * Unauthenticated users (no SYNALUX_API_KEY) get free-tier defaults.
+ * Authenticated users get their plan from the portal (1-hour cache).
+ */
+import { getSynaluxJwt } from "./synaluxJwt.js";
+import { PRISM_SYNALUX_BASE_URL, SYNALUX_CONFIGURED } from "../config.js";
+import { debugLog } from "./logger.js";
+// ── Free-tier defaults (no auth) ──────────────────────────────────
+export const FREE_ENTITLEMENTS = {
+    plan: "free",
+    model_ceiling: "4b",
+    daily_infer_limit: 50,
+    max_tokens: 512,
+    features: {
+        cloud_fallback: false,
+        grounding_verifier: false,
+        knowledge_search_unlimited: false,
+        session_memory_unlimited: false,
+        analytics_dashboard: false,
+    },
+    upgrade_url: "https://synalux.ai/pricing",
+};
+// ── Cache ─────────────────────────────────────────────────────────
+const CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour
+let cache = null;
+let inFlight = null;
+// ── Model tier ordering for ceiling enforcement ───────────────────
+const TIER_ORDER = ["1b7", "4b", "8b", "14b", "32b"];
+/**
+ * Returns true if `requested` exceeds `ceiling`.
+ * e.g. ceilingExceeded("14b", "4b") → true (14b > 4b ceiling)
+ */
+export function ceilingExceeded(requested, ceiling) {
+    const reqIdx = TIER_ORDER.indexOf(requested);
+    const ceilIdx = TIER_ORDER.indexOf(ceiling);
+    if (reqIdx === -1 || ceilIdx === -1)
+        return false;
+    return reqIdx > ceilIdx;
+}
+/**
+ * Clamp a model ceiling string to the plan's maximum.
+ * Returns the lower of the two ceilings.
+ */
+export function clampCeiling(requested, planCeiling) {
+    if (!requested)
+        return planCeiling;
+    const reqIdx = TIER_ORDER.indexOf(requested);
+    const planIdx = TIER_ORDER.indexOf(planCeiling);
+    if (reqIdx === -1)
+        return planCeiling;
+    if (planIdx === -1)
+        return requested;
+    return TIER_ORDER[Math.min(reqIdx, planIdx)];
+}
+// ── Fetch ─────────────────────────────────────────────────────────
+async function fetchEntitlements() {
+    if (!SYNALUX_CONFIGURED || !PRISM_SYNALUX_BASE_URL) {
+        debugLog("[entitlements] no Synalux auth configured — free tier");
+        return FREE_ENTITLEMENTS;
+    }
+    const jwt = await getSynaluxJwt();
+    if (!jwt) {
+        debugLog("[entitlements] JWT exchange failed — free tier fallback");
+        return FREE_ENTITLEMENTS;
+    }
+    try {
+        const url = `${PRISM_SYNALUX_BASE_URL}/api/v1/prism/entitlements`;
+        const res = await fetch(url, {
+            method: "GET",
+            headers: { Authorization: `Bearer ${jwt}` },
+            signal: AbortSignal.timeout(10_000),
+            redirect: "error",
+        });
+        if (!res.ok) {
+            debugLog(`[entitlements] portal HTTP ${res.status} — free tier fallback`);
+            return FREE_ENTITLEMENTS;
+        }
+        const data = (await res.json());
+        if (!data.plan || !data.model_ceiling) {
+            debugLog("[entitlements] malformed response — free tier fallback");
+            return FREE_ENTITLEMENTS;
+        }
+        debugLog(`[entitlements] plan=${data.plan} ceiling=${data.model_ceiling} ` +
+            `daily=${data.daily_infer_limit} max_tokens=${data.max_tokens}`);
+        return data;
+    }
+    catch (err) {
+        debugLog(`[entitlements] fetch error: ${err instanceof Error ? err.message : String(err)} — free tier fallback`);
+        return FREE_ENTITLEMENTS;
+    }
+}
+// ── Public API ────────────────────────────────────────────────────
+/**
+ * Get the current user's entitlements (cached for 1 hour).
+ * Concurrent callers share a single in-flight fetch.
+ */
+export async function getEntitlements() {
+    const now = Date.now();
+    if (cache && cache.expiresAt > now) {
+        return cache.entitlements;
+    }
+    if (inFlight)
+        return inFlight;
+    inFlight = (async () => {
+        try {
+            const ent = await fetchEntitlements();
+            cache = { entitlements: ent, expiresAt: Date.now() + CACHE_TTL_MS };
+            return ent;
+        }
+        finally {
+            inFlight = null;
+        }
+    })();
+    return inFlight;
+}
+/**
+ * Force cache invalidation (e.g. after plan upgrade).
+ */
+export function invalidateEntitlements() {
+    cache = null;
+}
+/** Test-only: reset all state. */
+export function _resetEntitlementsForTest() {
+    cache = null;
+    inFlight = null;
+}
+/** Test-only: inject a cached entitlement. */
+export function _setCacheForTest(ent, ttlMs = CACHE_TTL_MS) {
+    cache = { entitlements: ent, expiresAt: Date.now() + ttlMs };
+}
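`getEntitlements()` combines a one-hour TTL cache with a shared in-flight promise. This is the same single-flight idea `skillRouting.js` uses for its routing table. Here is a generic sketch of that pattern. It uses a hypothetical `createCachedLoader` name and an injectable clock so the expiry can be demonstrated. It defers `load` through `Promise.resolve().then(load)`, so the in-flight slot is assigned before `load` runs.

```javascript
// Sketch of getEntitlements()-style caching: results live for ttlMs, and
// concurrent callers during a fetch share the same promise (single-flight).
function createCachedLoader(load, ttlMs, now = Date.now) {
  let cache = null; // { value, expiresAt }
  let inFlight = null; // pending promise while a load is running

  function get() {
    if (cache && cache.expiresAt > now()) return Promise.resolve(cache.value);
    if (inFlight) return inFlight;
    // Defer load() so inFlight is assigned before it runs, even if it throws synchronously.
    inFlight = Promise.resolve()
      .then(load)
      .then((value) => {
        cache = { value, expiresAt: now() + ttlMs };
        return value;
      })
      .finally(() => {
        inFlight = null;
      });
    return inFlight;
  }

  // Equivalent of invalidateEntitlements(): the next get() refetches.
  function invalidate() {
    cache = null;
  }

  return { get, invalidate };
}

// Demo with a fake clock (ms) and a counting loader.
let clock = 0;
let fetches = 0;
const loader = createCachedLoader(async () => {
  fetches += 1;
  return { plan: "free" };
}, 1000, () => clock);

(async () => {
  await Promise.all([loader.get(), loader.get(), loader.get()]);
  console.log(fetches); // 1: concurrent callers shared one fetch
  clock = 1500; // past the TTL
  await loader.get();
  console.log(fetches); // 2: the expired entry triggered a refetch
})();
```

In the package, each failure path that `fetchEntitlements()` handles returns `FREE_ENTITLEMENTS`. These paths are: missing Synalux config, no JWT, non-2xx response, malformed body, and fetch error. `getEntitlements()` caches that result for the full `CACHE_TTL_MS`. As a result, a transient portal error leaves a paid user on free-tier limits for up to an hour unless `invalidateEntitlements()` is called.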