local-model-suitability-mcp 1.0.0

package/CHANGELOG.md ADDED
@@ -0,0 +1,13 @@
1
+ # Changelog
2
+
3
+ ## [1.0.0] - 2026-04-13
4
+
5
+ ### Initial release
6
+
7
+ - `evaluate_local_model_suitability` tool — AI-powered evaluation of local model suitability
8
+ - Built-in capability profiles for 25+ popular local models (Llama, Mistral, Qwen, Gemma, Phi, DeepSeek, CodeLlama)
9
+ - Four-dimensional reasoning: cost, privacy, latency, quality
10
+ - Verdict: LOCAL / CLOUD / EITHER / NEITHER with a confidence level (HIGH / MEDIUM / LOW)
11
+ - Free tier: 20 evaluations/month
12
+ - Pro tier: 2,000 evaluations/month ($99/month)
13
+ - Enterprise tier: unlimited ($299/month)
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Kord Agencies
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,176 @@
1
+ # Local Model Suitability MCP
2
+
3
+ **AI-powered evaluation of whether your local model is actually good enough for the task at hand.**
4
+
5
+ ---
6
+
7
+ ## The Problem
8
+
9
+ When you have both a local model (Ollama, LM Studio, etc.) and cloud APIs available, agents face a decision they cannot make intelligently alone:
10
+
11
+ **Should I run this locally or send it to the cloud?**
12
+
13
+ Getting this wrong in either direction is expensive:
14
+
15
+ - **Wrong direction 1 — cloud when local works:** You pay Claude Opus rates for a task a 7B model handles perfectly. At scale, this is thousands of dollars wasted monthly.
16
+ - **Wrong direction 2 — local when cloud is needed:** You run a complex reasoning task through a small model and get silent quality failures. The agent proceeds confidently on bad output.
17
+ - **Wrong direction 3 — cloud when data is sensitive:** You send confidential internal data to an external API that logs it. A privacy or compliance violation you never intended.
18
+
19
+ ## The Solution
20
+
21
+ `evaluate_local_model_suitability` is a single AI-powered tool that reasons across four dimensions simultaneously — **cost, privacy, latency, and quality** — and returns a clear verdict your agent can act on.
22
+
23
+ ```
24
+ Verdict: LOCAL | CLOUD | EITHER | NEITHER
25
+ ```
26
+
27
+ This is not a benchmark lookup. Claude reasons about your specific task, your specific model, and your specific constraints.
28
+
29
+ ---
30
+
31
+ ## Installation
32
+
33
+ ```bash
34
+ npx local-model-suitability-mcp
35
+ ```
36
+
37
+ Or install globally:
38
+
39
+ ```bash
40
+ npm install -g local-model-suitability-mcp
41
+ ```
42
+
43
+ ### Claude Desktop / Claude Code config
44
+
45
+ ```json
46
+ {
47
+ "mcpServers": {
48
+ "local-model-suitability": {
49
+ "command": "npx",
50
+ "args": ["-y", "local-model-suitability-mcp"],
51
+ "env": {
52
+ "ANTHROPIC_API_KEY": "your-key-here"
53
+ }
54
+ }
55
+ }
56
+ }
57
+ ```
58
+
59
+ ### With Pro API key
60
+
61
+ ```json
62
+ {
63
+ "mcpServers": {
64
+ "local-model-suitability": {
65
+ "command": "npx",
66
+ "args": ["-y", "local-model-suitability-mcp"],
67
+ "env": {
68
+ "ANTHROPIC_API_KEY": "your-anthropic-key",
69
+ "LMS_API_KEY": "your-pro-key-from-kordagencies"
70
+ }
71
+ }
72
+ }
73
+ }
74
+ ```
75
+
76
+ ---
77
+
78
+ ## Tool: `evaluate_local_model_suitability`
79
+
80
+ ### Parameters
81
+
82
+ | Parameter | Type | Required | Description |
83
+ |---|---|---|---|
84
+ | `task_description` | string | ✅ | Describe the task specifically. Include output format, accuracy requirements, stakes. |
85
+ | `local_model` | string | ✅ | Model name in Ollama format: `llama3.1:8b`, `mistral:7b`, `qwen2.5:14b`, etc. |
86
+ | `quality_threshold` | enum | ✅ | `draft` / `production` / `critical` |
87
+ | `use_case_type` | enum | ✅ | `classification` / `summarisation` / `code_generation` / `reasoning` / `data_extraction` / `creative_writing` / `question_answering` / `translation` / `sentiment_analysis` / `other` |
88
+ | `data_sensitivity` | enum | ✅ | `public` / `internal` / `confidential` |
89
+ | `latency_requirement` | enum | ✅ | `flexible` / `moderate` / `realtime` |
90
+
91
+ ### Example Request
92
+
93
+ ```json
94
+ {
95
+ "task_description": "Classify customer support emails into 5 categories: billing, technical, returns, complaints, general. Must be accurate enough for production routing — wrong classification means wrong team gets the ticket.",
96
+ "local_model": "llama3.1:8b",
97
+ "quality_threshold": "production",
98
+ "use_case_type": "classification",
99
+ "data_sensitivity": "internal",
100
+ "latency_requirement": "moderate"
101
+ }
102
+ ```
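The parameter table above can be checked on the client side before a call is made. The following is a minimal sketch, not part of the package: the helper name `validateSuitabilityArgs` and the `ALLOWED` table are hypothetical, and the allowed values are copied from the table above.

```javascript
// Allowed enum values, copied from the README parameter table.
const ALLOWED = {
  quality_threshold: ['draft', 'production', 'critical'],
  use_case_type: [
    'classification', 'summarisation', 'code_generation', 'reasoning',
    'data_extraction', 'creative_writing', 'question_answering',
    'translation', 'sentiment_analysis', 'other',
  ],
  data_sensitivity: ['public', 'internal', 'confidential'],
  latency_requirement: ['flexible', 'moderate', 'realtime'],
};

// Hypothetical helper (an assumption, not shipped with the package):
// returns a list of problems, which is empty when the arguments look valid.
function validateSuitabilityArgs(args) {
  const problems = [];
  // The two free-text fields must be non-empty strings.
  for (const field of ['task_description', 'local_model']) {
    if (typeof args[field] !== 'string' || args[field].trim() === '') {
      problems.push(`${field} must be a non-empty string`);
    }
  }
  // The four enum fields must match one of the documented values.
  for (const [field, allowed] of Object.entries(ALLOWED)) {
    if (!allowed.includes(args[field])) {
      problems.push(`${field} must be one of: ${allowed.join(', ')}`);
    }
  }
  return problems;
}
```

Checking locally avoids spending a free-tier evaluation on a request the server would reject for a missing parameter.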
103
+
104
+ ### Example Response
105
+
106
+ ```json
107
+ {
108
+ "verdict": "EITHER",
109
+ "confidence": "HIGH",
110
+ "summary": "Llama 3.1 8B can handle 5-category email classification at production quality if emails are clear — use local to protect customer data and save cost, with cloud fallback for ambiguous cases.",
111
+ "model_evaluated": "llama3.1:8b",
112
+ "model_profile": {
113
+ "parameter_count": "8B",
114
+ "tier": "small",
115
+ "known_strengths": ["simple Q&A", "basic summarisation", "short classification", "data extraction"],
116
+ "known_weaknesses": ["complex multi-step reasoning", "long-context coherence", "nuanced instruction following"]
117
+ },
118
+ "task_complexity": "SIMPLE",
119
+ "reasoning": {
120
+ "quality_assessment": "5-category classification is within 8B capability for well-structured emails. Performance degrades on ambiguous or multi-issue tickets.",
121
+ "cost_impact": "Running locally saves approximately $0.003-0.008 per classification vs cloud. At 10,000 emails/month that is $30-80 saved monthly.",
122
+ "privacy_assessment": "Customer support emails contain personal data. Keeping classification local avoids sending customer PII to external APIs — strong argument for local.",
123
+ "latency_assessment": "Classification on an 8B model completes in 200-800ms depending on hardware. Meets moderate latency requirement.",
124
+ "failure_modes": "Watch for: (1) multi-issue emails being misclassified to only one category, (2) sarcastic or informal language confusing the classifier, (3) very short one-word emails with no context."
125
+ },
126
+ "recommended_cloud_model": null,
127
+ "fallback_advice": "If local classification confidence is low (detectable via logprobs or by asking the model to rate its own confidence), escalate to claude-haiku-3 for a second opinion — cheapest cloud model that handles ambiguous classification reliably.",
129
+ "analysis_type": "AI-powered — NOT a simple benchmark lookup",
130
+ "free_tier_remaining": 17,
131
+ "checked_at": "2026-04-13T10:22:31.000Z"
132
+ }
133
+ ```
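One way an agent might act on the verdict is sketched below. This is an assumption about client-side usage rather than anything the package provides: `chooseRoute` and its return values (`'local'`, `'cloud'`, `'abort'`) are hypothetical names. The rule that confidential data never leaves the device follows the `data_sensitivity` definition used by the tool.

```javascript
// Hypothetical routing helper: maps a parsed tool response to an action.
// result          - parsed JSON returned by evaluate_local_model_suitability
// dataSensitivity - the data_sensitivity value that was sent with the request
function chooseRoute(result, dataSensitivity) {
  switch (result.verdict) {
    case 'LOCAL':
      return 'local';
    case 'CLOUD':
      // confidential data must stay on-device, so a CLOUD verdict cannot be followed
      return dataSensitivity === 'confidential' ? 'abort' : 'cloud';
    case 'EITHER':
      // both work; this sketch prefers local to save cost and keep data in-house
      return 'local';
    case 'NEITHER':
    default:
      // unknown or negative verdicts are treated conservatively
      return 'abort';
  }
}
```

For the example response above, `chooseRoute({ verdict: 'EITHER' }, 'internal')` selects the local model.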
134
+
135
+ ---
136
+
137
+ ## Models With Built-in Knowledge
138
+
139
+ The following models have detailed capability profiles built in. All other models are assessed based on name and parameter patterns.
140
+
141
+ | Model | Params | Tier |
142
+ |---|---|---|
143
+ | llama3.1:8b | 8B | small |
144
+ | llama3.1:70b | 70B | large |
145
+ | llama3.1:405b | 405B | frontier |
146
+ | llama3.2:1b, llama3.2:3b | 1B–3B | tiny |
147
+ | mistral:7b, mistral-nemo:12b | 7B–12B | small |
148
+ | mixtral:8x7b | 47B | medium |
149
+ | qwen2.5:7b–72b | 7B–72B | small–large |
150
+ | gemma2:2b–27b | 2B–27B | tiny–medium |
151
+ | phi3:mini, phi3:medium, phi4 | 3.8B–14B | tiny–medium |
152
+ | deepseek-r1:8b–70b | 8B–70B | small–large |
153
+ | codellama:7b–34b | 7B–34B | small–large |
154
+ | deepseek-coder:6.7b–33b | 6.7B–33B | small–large |
155
+
156
+ ---
157
+
158
+ ## Pricing
159
+
160
+ | Tier | Price | Evaluations |
161
+ |---|---|---|
162
+ | Free | $0 | 20/month |
163
+ | Pro | $99/month | 2,000/month |
164
+ | Enterprise | $299/month | Unlimited |
165
+
166
+ Get your Pro key at [kordagencies.com](https://kordagencies.com)
167
+
168
+ ---
169
+
170
+ ## Privacy
171
+
172
+ We do not log or store your task descriptions, model names, or any query content. Each evaluation is processed and discarded. Full terms: [kordagencies.com/terms.html](https://kordagencies.com/terms.html)
173
+
174
+ ---
175
+
176
+ Built by [Kord Agencies](https://kordagencies.com)
package/glama.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "name": "local-model-suitability-mcp",
3
+ "description": "AI-powered MCP tool that evaluates whether a local model is suitable for a specific task. Helps agents avoid paying cloud rates unnecessarily, sending sensitive data to external APIs, and trusting under-powered local models with tasks beyond their capability.",
4
+ "version": "1.0.0",
5
+ "author": "ojas1",
6
+ "homepage": "https://kordagencies.com",
7
+ "license": "MIT",
8
+ "tools": [
9
+ {
10
+ "name": "evaluate_local_model_suitability",
11
+ "description": "Evaluates whether a local model is suitable for a task. Returns LOCAL / CLOUD / EITHER / NEITHER verdict with cost, privacy, latency, and quality reasoning."
12
+ }
13
+ ]
14
+ }
package/package.json ADDED
@@ -0,0 +1,38 @@
1
+ {
2
+ "name": "local-model-suitability-mcp",
3
+ "version": "1.0.0",
4
+ "description": "AI-powered MCP tool that evaluates whether a local model is suitable for a specific task — helps agents decide between local inference and cloud APIs based on cost, privacy, latency, and quality requirements.",
5
+ "main": "src/server.js",
6
+ "type": "module",
7
+ "bin": {
8
+ "local-model-suitability-mcp": "src/server.js"
9
+ },
10
+ "scripts": {
11
+ "start": "node src/server.js"
12
+ },
13
+ "keywords": [
14
+ "mcp",
15
+ "agent",
16
+ "local-llm",
17
+ "ollama",
18
+ "model-routing",
19
+ "ai-inference",
20
+ "llm-evaluation",
21
+ "on-device-ai",
22
+ "privacy",
23
+ "cost-optimisation"
24
+ ],
25
+ "author": "ojas1",
26
+ "license": "MIT",
27
+ "dependencies": {
28
+ "@anthropic-ai/sdk": "^0.39.0"
29
+ },
30
+ "engines": {
31
+ "node": ">=18.0.0"
32
+ },
33
+ "repository": {
34
+ "type": "git",
35
+ "url": "https://github.com/OjasKord/local-model-suitability-mcp"
36
+ },
37
+ "homepage": "https://kordagencies.com"
38
+ }
package/server.json ADDED
@@ -0,0 +1,27 @@
1
+ {
2
+ "name": "local-model-suitability-mcp",
3
+ "display_name": "Local Model Suitability MCP",
4
+ "description": "AI-powered tool that evaluates whether a local model (Ollama, LM Studio, etc.) is suitable for a specific task — helping agents make intelligent decisions about cost, privacy, latency, and quality without guessing.",
5
+ "version": "1.0.0",
6
+ "author": {
7
+ "name": "ojas1",
8
+ "url": "https://kordagencies.com"
9
+ },
10
+ "repository": {
11
+ "type": "git",
12
+ "url": "https://github.com/OjasKord/local-model-suitability-mcp"
13
+ },
14
+ "homepage": "https://kordagencies.com",
15
+ "license": "MIT",
16
+ "keywords": ["mcp", "agent", "local-llm", "ollama", "model-routing", "privacy", "cost-optimisation"],
17
+ "categories": ["ai", "infrastructure", "developer-tools"],
18
+ "tools": [
19
+ {
20
+ "name": "evaluate_local_model_suitability",
21
+ "description": "AI-powered evaluation of whether a local model is suitable for a specific task. Returns a structured verdict with cost, privacy, latency, and quality reasoning."
22
+ }
23
+ ],
24
+ "installation": {
25
+ "npm": "local-model-suitability-mcp"
26
+ }
27
+ }
package/smithery.yaml ADDED
@@ -0,0 +1,24 @@
1
+ name: local-model-suitability-mcp
2
+ description: AI-powered tool that evaluates whether a local model (Ollama, LM Studio, etc.) is suitable for a specific task. Helps agents make intelligent decisions about cost, privacy, latency, and quality — avoiding expensive mistakes in both directions.
3
+ version: 1.0.0
4
+ author: ojas1
5
+ homepage: https://kordagencies.com
6
+ license: MIT
7
+
8
+ tools:
9
+ - name: evaluate_local_model_suitability
10
+ description: >
11
+ Evaluates whether a specific local model is suitable for a specific task.
12
+ Returns a structured verdict (LOCAL / CLOUD / EITHER / NEITHER) with
13
+ reasoning about cost, privacy, quality risk, and failure modes.
14
+
15
+ config:
16
+ schema:
17
+ type: object
18
+ properties:
19
+ apiKey:
20
+ type: string
21
+ description: "Pro API key from kordagencies.com. Leave blank for free tier (20 evaluations/month)."
22
+ x-from:
23
+ header: x-api-key
24
+ required: []
package/src/server.js ADDED
@@ -0,0 +1,469 @@
1
+ #!/usr/bin/env node
2
+
3
+ import Anthropic from '@anthropic-ai/sdk';
4
+ import { readFileSync, writeFileSync, existsSync } from 'fs';
5
+
6
+ // ─── Constants ───────────────────────────────────────────────────────────────
7
+
8
+ const VERSION = '1.0.0';
9
+ const FREE_TIER_LIMIT = 20;
10
+ const STATS_FILE = '/tmp/lms_stats.json';
11
+
12
+ const LEGAL_DISCLAIMER =
13
+ 'Results are AI-powered assessments based on known model benchmarks and capabilities. ' +
14
+ 'We do not log or store your query content. ' +
15
+ 'Results are for informational purposes only and do not constitute technical guarantees. ' +
16
+ 'Operator must independently validate model output quality for production workloads. ' +
17
+ 'Provider maximum liability is limited to subscription fees paid in the preceding 3 months. ' +
18
+ 'Full terms: kordagencies.com/terms.html';
19
+
20
+ function nowISO() {
21
+ return new Date().toISOString();
22
+ }
23
+
24
+ // ─── Stats persistence ────────────────────────────────────────────────────────
25
+
26
+ function loadStats() {
27
+ try {
28
+ if (existsSync(STATS_FILE)) {
29
+ return JSON.parse(readFileSync(STATS_FILE, 'utf8'));
30
+ }
31
+ } catch (_) {}
32
+ return {
33
+ total_requests: 0,
34
+ tool_usage: { evaluate_local_model_suitability: 0 },
35
+ free_tier_calls_by_ip: {},
36
+ start_time: nowISO(),
37
+ };
38
+ }
39
+
40
+ function saveStats(stats) {
41
+ try {
42
+ writeFileSync(STATS_FILE, JSON.stringify(stats, null, 2));
43
+ } catch (_) {}
44
+ }
45
+
46
+ const stats = loadStats();
47
+
48
+ // ─── Free tier enforcement ────────────────────────────────────────────────────
49
+
50
+ function checkFreeTier(apiKey, clientIp) {
51
+ if (apiKey) return { allowed: true, paid: true };
52
+ const key = clientIp || 'unknown';
53
+ const now = new Date();
54
+ const monthKey = `${now.getFullYear()}-${String(now.getMonth() + 1).padStart(2, '0')}`;
55
+ if (!stats.free_tier_calls_by_ip[key]) stats.free_tier_calls_by_ip[key] = {};
56
+ const monthCalls = stats.free_tier_calls_by_ip[key][monthKey] || 0;
57
+ if (monthCalls >= FREE_TIER_LIMIT) {
58
+ return { allowed: false, paid: false, used: monthCalls };
59
+ }
60
+ stats.free_tier_calls_by_ip[key][monthKey] = monthCalls + 1;
61
+ saveStats(stats);
62
+ return { allowed: true, paid: false, remaining: FREE_TIER_LIMIT - monthCalls - 1 };
63
+ }
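The monthly counter in `checkFreeTier` can be read more easily as a pure function. The sketch below restates the same bookkeeping without the module-level `stats` object or file writes; `monthKeyFor` and `consumeFreeCall` are hypothetical names introduced for illustration.

```javascript
// Builds the "YYYY-MM" bucket key used to reset counters each calendar month.
function monthKeyFor(date) {
  return `${date.getFullYear()}-${String(date.getMonth() + 1).padStart(2, '0')}`;
}

// Hypothetical variant of the free-tier check with explicit inputs.
// counters  - object shaped like stats.free_tier_calls_by_ip
// clientKey - the client identifier (the server falls back to 'unknown')
// now       - Date used to pick the month bucket
// limit     - maximum free calls per month
function consumeFreeCall(counters, clientKey, now, limit) {
  const monthKey = monthKeyFor(now);
  const perClient = counters[clientKey] || (counters[clientKey] = {});
  const used = perClient[monthKey] || 0;
  if (used >= limit) return { allowed: false, used };
  perClient[monthKey] = used + 1; // record this call before reporting the remainder
  return { allowed: true, remaining: limit - used - 1 };
}
```

Because the bucket key changes with the calendar month, a client that exhausted its quota is allowed again on the first call of the next month without any explicit reset.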
64
+
65
+ // ─── Model knowledge base ─────────────────────────────────────────────────────
66
+
67
+ const MODEL_KNOWLEDGE = {
68
+ // Llama family
69
+ 'llama3.1:8b': { params: '8B', tier: 'small', strengths: ['simple Q&A', 'basic summarisation', 'short classification', 'data extraction'], weaknesses: ['complex multi-step reasoning', 'long-context coherence', 'nuanced instruction following', 'code generation beyond simple scripts'], context_window: 128000 },
70
+ 'llama3.1:70b': { params: '70B', tier: 'large', strengths: ['complex reasoning', 'code generation', 'nuanced analysis', 'long-context tasks'], weaknesses: ['frontier-level reasoning', 'very specialised domain knowledge'], context_window: 128000 },
71
+ 'llama3.1:405b': { params: '405B',tier: 'frontier',strengths: ['frontier reasoning', 'complex code', 'deep analysis', 'long-context coherence'], weaknesses: ['hardware requirements are extreme'], context_window: 128000 },
72
+ 'llama3.2:3b': { params: '3B', tier: 'tiny', strengths: ['very simple classification', 'keyword extraction', 'structured data parsing'], weaknesses: ['any reasoning', 'multi-step tasks', 'creative generation', 'code'], context_window: 128000 },
73
+ 'llama3.2:1b': { params: '1B', tier: 'tiny', strengths: ['simple keyword extraction', 'basic yes/no classification'], weaknesses: ['almost everything beyond trivial tasks'], context_window: 128000 },
74
+
75
+ // Mistral family
76
+ 'mistral:7b': { params: '7B', tier: 'small', strengths: ['instruction following', 'simple reasoning', 'structured output', 'European language tasks'], weaknesses: ['complex multi-step reasoning', 'long document analysis'], context_window: 32000 },
77
+ 'mixtral:8x7b': { params: '47B', tier: 'medium', strengths: ['strong reasoning', 'code generation', 'multilingual', 'structured output'], weaknesses: ['frontier reasoning', 'very long contexts'], context_window: 32000 },
78
+ 'mistral-nemo:12b': { params: '12B', tier: 'small', strengths: ['instruction following', 'simple to medium reasoning', 'multilingual'], weaknesses: ['complex reasoning', 'long-context tasks'], context_window: 128000 },
79
+
80
+ // Qwen family
81
+ 'qwen2.5:7b': { params: '7B', tier: 'small', strengths: ['coding', 'maths', 'structured output', 'multilingual'], weaknesses: ['complex reasoning', 'long-context coherence'], context_window: 128000 },
82
+ 'qwen2.5:14b': { params: '14B', tier: 'medium', strengths: ['strong coding', 'maths', 'reasoning', 'multilingual'], weaknesses: ['frontier-level reasoning'], context_window: 128000 },
83
+ 'qwen2.5:32b': { params: '32B', tier: 'large', strengths: ['excellent coding', 'maths', 'complex reasoning', 'multilingual'], weaknesses: ['frontier reasoning on very hard tasks'], context_window: 128000 },
84
+ 'qwen2.5:72b': { params: '72B', tier: 'large', strengths: ['frontier-adjacent reasoning', 'coding', 'maths', 'long context'], weaknesses: ['hardware requirements are high'], context_window: 128000 },
85
+
86
+ // Gemma family
87
+ 'gemma2:2b': { params: '2B', tier: 'tiny', strengths: ['simple classification', 'short Q&A', 'keyword extraction'], weaknesses: ['reasoning', 'multi-step tasks', 'code'], context_window: 8192 },
88
+ 'gemma2:9b': { params: '9B', tier: 'small', strengths: ['instruction following', 'simple reasoning', 'coding basics'], weaknesses: ['complex reasoning', 'long contexts'], context_window: 8192 },
89
+ 'gemma2:27b': { params: '27B', tier: 'medium', strengths: ['solid reasoning', 'code generation', 'analysis'], weaknesses: ['frontier reasoning'], context_window: 8192 },
90
+
91
+ // Phi family
92
+ 'phi3:mini': { params: '3.8B', tier: 'tiny', strengths: ['simple reasoning', 'code snippets', 'structured output'], weaknesses: ['complex multi-step tasks', 'long contexts'], context_window: 128000 },
93
+ 'phi3:medium': { params: '14B', tier: 'medium', strengths: ['reasoning', 'coding', 'maths', 'instruction following'], weaknesses: ['frontier reasoning'], context_window: 128000 },
94
+ 'phi4': { params: '14B', tier: 'medium', strengths: ['strong reasoning', 'coding', 'maths', 'structured output'], weaknesses: ['frontier reasoning on hardest tasks'], context_window: 16000 },
95
+
96
+ // Code-specific
97
+ 'codellama:7b': { params: '7B', tier: 'small', strengths: ['code completion', 'simple code generation', 'code explanation'], weaknesses: ['complex architecture', 'multi-file reasoning', 'debugging complex bugs'], context_window: 16000 },
98
+ 'codellama:34b': { params: '34B', tier: 'large', strengths: ['complex code generation', 'multi-language', 'architecture reasoning'], weaknesses: ['frontier coding tasks'], context_window: 16000 },
99
+ 'deepseek-coder:6.7b': { params: '6.7B', tier: 'small', strengths: ['code generation', 'code explanation', 'simple debugging'], weaknesses: ['complex multi-file reasoning'], context_window: 16000 },
100
+ 'deepseek-coder:33b': { params: '33B', tier: 'large', strengths: ['complex code generation', 'debugging', 'architecture'], weaknesses: ['frontier coding'], context_window: 16000 },
101
+
102
+ // DeepSeek R1 family
103
+ 'deepseek-r1:8b': { params: '8B', tier: 'small', strengths: ['reasoning with chain-of-thought', 'maths', 'simple logic'], weaknesses: ['complex domain knowledge', 'very hard reasoning'], context_window: 128000 },
104
+ 'deepseek-r1:32b': { params: '32B', tier: 'large', strengths: ['strong reasoning', 'maths', 'coding', 'logic'], weaknesses: ['frontier reasoning'], context_window: 128000 },
105
+ 'deepseek-r1:70b': { params: '70B', tier: 'large', strengths: ['frontier-adjacent reasoning', 'maths', 'complex coding'], weaknesses: ['hardware requirements are very high'], context_window: 128000 },
106
+ };
107
+
108
+ function lookupModel(modelName) {
109
+ const normalized = modelName.toLowerCase().trim();
110
+ if (MODEL_KNOWLEDGE[normalized]) return MODEL_KNOWLEDGE[normalized];
111
+ // Fuzzy match — strip version tags like :latest
112
+ const base = normalized.replace(/:latest$/, '');
113
+ for (const key of Object.keys(MODEL_KNOWLEDGE)) {
114
+ if (key.startsWith(base) || base.startsWith(key.split(':')[0])) {
115
+ return MODEL_KNOWLEDGE[key];
116
+ }
117
+ }
118
+ return null;
119
+ }
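Because the loop in `lookupModel` returns the first key whose family prefix matches, a tag such as `llama3.1:70b-instruct` resolves to the `llama3.1:8b` profile. A stricter lookup could accept only keys that the requested name extends and prefer the longest one. The sketch below shows that idea; `lookupModelStrict` and the three-entry `PROFILES` table are hypothetical stand-ins, not the package's code.

```javascript
// Tiny stand-in for MODEL_KNOWLEDGE, for illustration only.
const PROFILES = {
  'llama3.1:8b': { params: '8B', tier: 'small' },
  'llama3.1:70b': { params: '70B', tier: 'large' },
  'phi4': { params: '14B', tier: 'medium' },
};

// Hypothetical stricter lookup: exact match first, then the longest key
// that the requested name extends with a "-suffix" or ":tag".
function lookupModelStrict(modelName, profiles = PROFILES) {
  const name = modelName.toLowerCase().trim().replace(/:latest$/, '');
  if (profiles[name]) return profiles[name];
  let best = null;
  for (const key of Object.keys(profiles)) {
    const extendsKey = name.startsWith(key + '-') || name.startsWith(key + ':');
    if (extendsKey && (best === null || key.length > best.length)) best = key;
  }
  // No size information means no profile, rather than a guess.
  return best ? profiles[best] : null;
}
```

With this variant, a bare family name such as `llama3.1` returns `null`, so the caller falls back to the "unknown model" path instead of silently assuming the smallest size.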
120
+
121
+ // ─── Claude brain ─────────────────────────────────────────────────────────────
122
+
123
+ const anthropic = new Anthropic({
124
+ apiKey: process.env.ANTHROPIC_API_KEY,
125
+ });
126
+
127
+ async function evaluateWithClaude(params) {
128
+ const {
129
+ task_description,
130
+ local_model,
131
+ quality_threshold,
132
+ use_case_type,
133
+ data_sensitivity,
134
+ latency_requirement,
135
+ model_info,
136
+ } = params;
137
+
138
+ const systemPrompt = `You are an expert in LLM capabilities and deployment strategy.
139
+ Your job is to give AI agents a clear, honest verdict on whether a specific local model
140
+ is suitable for a specific task — so agents can make intelligent decisions about
141
+ cost, privacy, latency, and quality without guessing.
142
+
143
+ You understand the real-world capability gaps between model sizes and how they affect
144
+ production workloads. You do not hedge excessively. You give a clear verdict with
145
+ clear reasoning that an agent can act on immediately.
146
+
147
+ Always respond in valid JSON only. No markdown, no preamble.`;
148
+
149
+ const userPrompt = `Evaluate whether this local model is suitable for this task.
150
+
151
+ TASK: ${task_description}
152
+ LOCAL MODEL: ${local_model}
153
+ ${model_info ? `MODEL PROFILE: ${JSON.stringify(model_info)}` : 'MODEL PROFILE: Unknown model — assess based on name/size patterns'}
154
+ QUALITY THRESHOLD: ${quality_threshold} (draft=errors acceptable, production=high accuracy required, critical=near-perfect required)
155
+ USE CASE TYPE: ${use_case_type}
156
+ DATA SENSITIVITY: ${data_sensitivity} (public=safe to send to cloud, internal=prefer local, confidential=must stay local)
157
+ LATENCY REQUIREMENT: ${latency_requirement} (flexible=seconds ok, moderate=under 2s preferred, realtime=under 500ms required)
158
+
159
+ Respond with this exact JSON structure:
160
+ {
161
+ "verdict": "LOCAL" | "CLOUD" | "EITHER" | "NEITHER",
162
+ "confidence": "HIGH" | "MEDIUM" | "LOW",
163
+ "summary": "One sentence verdict an agent can act on immediately",
164
+ "reasoning": {
165
+ "quality_assessment": "Can this model reliably handle this task at the required quality level?",
166
+ "cost_impact": "What is the cost argument for local vs cloud here?",
167
+ "privacy_assessment": "What are the data exposure implications of cloud routing?",
168
+ "latency_assessment": "Will local inference meet the latency requirement?",
169
+ "failure_modes": "What specific failures should the agent watch for if using local?"
170
+ },
171
+ "recommended_model": "If verdict is CLOUD, suggest the most cost-effective cloud model for this task",
172
+ "fallback_advice": "If local model fails, what should the agent do?",
173
+ "task_complexity": "SIMPLE" | "MODERATE" | "COMPLEX" | "FRONTIER"
174
+ }`;
175
+
176
+ const response = await anthropic.messages.create({
177
+ model: 'claude-sonnet-4-20250514',
178
+ max_tokens: 1000,
179
+ system: systemPrompt,
180
+ messages: [{ role: 'user', content: userPrompt }],
181
+ });
182
+
183
+ const text = response.content[0].text.trim();
184
+ return JSON.parse(text);
185
+ }
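`evaluateWithClaude` passes the model text straight to `JSON.parse`, which throws if the reply arrives wrapped in a Markdown code fence despite the "JSON only" instruction. A tolerant parser is one possible safeguard. The sketch below assumes such wrapping can occur; `parseModelJson` is a hypothetical helper, and the fence marker is built with `repeat` only to keep literal backticks out of the example.

```javascript
// Three backticks, built programmatically to avoid a literal fence in this block.
const FENCE = '`'.repeat(3);

// Matches an optional "json" language tag and captures the payload between fences.
const FENCED_JSON = new RegExp(
  '^' + FENCE + '(?:json)?\\s*([\\s\\S]*?)\\s*' + FENCE + '$'
);

// Hypothetical helper: parse model output that may or may not be fenced.
// Still throws SyntaxError when the payload itself is not valid JSON.
function parseModelJson(text) {
  const trimmed = text.trim();
  const match = trimmed.match(FENCED_JSON);
  return JSON.parse(match ? match[1] : trimmed);
}
```

Malformed payloads still raise an error, so the existing `try`/`catch` around the assessment call would continue to handle genuine failures.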
186
+
187
+ // ─── MCP Protocol ─────────────────────────────────────────────────────────────
188
+
189
+ const TOOL_DEFINITION = {
190
+ name: 'evaluate_local_model_suitability',
191
+ description:
192
+ 'Call this tool before executing any task where you have both a local model (Ollama, LM Studio, etc.) and a cloud API available. ' +
193
+ 'Use it to avoid three expensive mistakes: (1) paying cloud rates for tasks a local model handles perfectly, ' +
194
+ '(2) sending sensitive or confidential data to cloud APIs unnecessarily, ' +
195
+ '(3) trusting a local model with tasks beyond its capability and getting silent quality failures. ' +
196
+ 'Also use when you need to meet strict latency requirements and must decide whether local inference is fast enough. ' +
197
+ 'Returns a structured verdict — LOCAL / CLOUD / EITHER / NEITHER — with specific reasoning about cost, privacy, quality risk, and failure modes for your exact task and model. ' +
198
+ 'AI-powered assessment — NOT a simple benchmark lookup. ' +
199
+ 'Free tier: first 20 evaluations/month, no Pro API key needed. ' +
200
+ 'Full terms: kordagencies.com/terms.html',
201
+ inputSchema: {
202
+ type: 'object',
203
+ properties: {
204
+ task_description: {
205
+ type: 'string',
206
+ description:
207
+ 'Describe the task you are about to run. Be specific — include the type of reasoning, expected output format, and any quality constraints. Example: "Classify customer support emails into 5 categories. Must be accurate enough for production routing — wrong classification costs money."',
208
+ },
209
+ local_model: {
210
+ type: 'string',
211
+ description:
212
+ 'The local model name and size. Use Ollama-style naming where possible. Examples: llama3.1:8b, mistral:7b, qwen2.5:14b, phi4, deepseek-r1:32b',
213
+ },
214
+ quality_threshold: {
215
+ type: 'string',
216
+ enum: ['draft', 'production', 'critical'],
217
+ description:
218
+ 'draft = errors acceptable, output will be reviewed by human. production = high accuracy required, output used directly. critical = near-perfect required, failures have significant consequences.',
219
+ },
220
+ use_case_type: {
221
+ type: 'string',
222
+ enum: [
223
+ 'classification',
224
+ 'summarisation',
225
+ 'code_generation',
226
+ 'reasoning',
227
+ 'data_extraction',
228
+ 'creative_writing',
229
+ 'question_answering',
230
+ 'translation',
231
+ 'sentiment_analysis',
232
+ 'other',
233
+ ],
234
+ description: 'The primary type of task the model will perform.',
235
+ },
236
+ data_sensitivity: {
237
+ type: 'string',
238
+ enum: ['public', 'internal', 'confidential'],
239
+ description:
240
+ 'public = safe to send to any cloud API. internal = organisation data, prefer local. confidential = must stay on-device, cannot be sent to external APIs.',
241
+ },
242
+ latency_requirement: {
243
+ type: 'string',
244
+ enum: ['flexible', 'moderate', 'realtime'],
245
+ description:
246
+ 'flexible = response in seconds is fine. moderate = under 2 seconds preferred. realtime = under 500ms required (e.g. streaming UI, voice agent).',
247
+ },
248
+ },
249
+ required: [
250
+ 'task_description',
251
+ 'local_model',
252
+ 'quality_threshold',
253
+ 'use_case_type',
254
+ 'data_sensitivity',
255
+ 'latency_requirement',
256
+ ],
257
+ },
258
+ };
259
+
260
+ // ─── Request handler ──────────────────────────────────────────────────────────
261
+
262
+ async function handleRequest(request) {
263
+ const { method, params } = request;
264
+
265
+ if (method === 'initialize') {
266
+ return {
267
+ protocolVersion: '2024-11-05',
268
+ capabilities: { tools: {} },
269
+ serverInfo: { name: 'local-model-suitability-mcp', version: VERSION },
270
+ };
271
+ }
272
+
273
+ if (method === 'tools/list') {
274
+ return { tools: [TOOL_DEFINITION] };
275
+ }
276
+
277
+ if (method === 'tools/call') {
278
+ const { name, arguments: args } = params;
279
+
280
+ if (name !== 'evaluate_local_model_suitability') {
281
+ throw { code: -32601, message: `Unknown tool: ${name}` };
282
+ }
283
+
284
+ // Stats
285
+ stats.total_requests++;
286
+ stats.tool_usage.evaluate_local_model_suitability =
287
+ (stats.tool_usage.evaluate_local_model_suitability || 0) + 1;
288
+ saveStats(stats);
289
+
290
+ // Free tier check
291
+ const apiKey = request._meta?.apiKey || process.env.LMS_API_KEY || null; // LMS_API_KEY is the Pro key documented in the README
292
+ const clientIp = request._meta?.clientIp || null;
293
+ const tierCheck = checkFreeTier(apiKey, clientIp);
294
+
295
+ if (!tierCheck.allowed) {
296
+ return {
297
+ content: [
298
+ {
299
+ type: 'text',
300
+ text: JSON.stringify({
301
+ error: `Free tier limit of ${FREE_TIER_LIMIT} evaluations/month reached. You have seen it work — upgrade to Pro ($99/month) at kordagencies.com to continue.`,
302
+ upgrade_url: 'https://kordagencies.com',
303
+ _disclaimer: LEGAL_DISCLAIMER,
304
+ }),
305
+ },
306
+ ],
307
+ };
308
+ }
309
+
310
+ // Validate required fields
311
+ const required = [
312
+ 'task_description',
313
+ 'local_model',
314
+ 'quality_threshold',
315
+ 'use_case_type',
316
+ 'data_sensitivity',
317
+ 'latency_requirement',
318
+ ];
319
+ for (const field of required) {
320
+ if (!args[field]) {
321
+ throw {
322
+ code: -32602,
323
+ message: `Missing required parameter: ${field}`,
324
+ };
325
+ }
326
+ }
327
+
328
+ // Look up model knowledge
329
+ const modelInfo = lookupModel(args.local_model);
330
+
331
+ // Claude brain assessment
332
+ let assessment;
333
+ try {
334
+ assessment = await evaluateWithClaude({
335
+ task_description: args.task_description,
336
+ local_model: args.local_model,
337
+ quality_threshold: args.quality_threshold,
338
+ use_case_type: args.use_case_type,
339
+ data_sensitivity: args.data_sensitivity,
340
+ latency_requirement: args.latency_requirement,
341
+ model_info: modelInfo,
342
+ });
343
+ } catch (err) {
344
+ return {
345
+ content: [
346
+ {
347
+ type: 'text',
348
+ text: JSON.stringify({
349
+ error:
350
+ 'Assessment engine temporarily unavailable — this is not a problem with your query. Please retry in 30 seconds.',
351
+ checked_at: nowISO(),
352
+ _disclaimer: LEGAL_DISCLAIMER,
353
+ }),
354
+ },
355
+ ],
356
+ };
357
+ }
358
+
359
+ // Build response
360
+ const response = {
361
+ verdict: assessment.verdict,
362
+ confidence: assessment.confidence,
363
+ summary: assessment.summary,
364
+ model_evaluated: args.local_model,
365
+ model_profile: modelInfo
366
+ ? {
367
+ parameter_count: modelInfo.params,
368
+ tier: modelInfo.tier,
369
+ known_strengths: modelInfo.strengths,
370
+ known_weaknesses: modelInfo.weaknesses,
371
+ }
372
+ : { note: 'Model not in knowledge base — assessment based on name and size patterns' },
373
+ task_complexity: assessment.task_complexity,
374
+ reasoning: assessment.reasoning,
375
+ recommended_cloud_model: assessment.recommended_model || null,
376
+ fallback_advice: assessment.fallback_advice,
377
+ analysis_type: 'AI-powered — NOT a simple benchmark lookup',
378
+ free_tier_remaining: tierCheck.paid ? 'unlimited' : tierCheck.remaining,
379
+ checked_at: nowISO(),
380
+ _disclaimer: LEGAL_DISCLAIMER,
381
+ };
382
+
383
+ return {
384
+ content: [{ type: 'text', text: JSON.stringify(response, null, 2) }],
385
+ };
386
+ }
387
+
388
+ // Health / stats via special methods
389
+ if (method === 'health') {
390
+ return { status: 'ok', version: VERSION, checked_at: nowISO() };
391
+ }
392
+
393
+ throw { code: -32601, message: `Method not found: ${method}` };
394
+ }
395
+
396
+ // ─── HTTP server (for health + stats endpoints) ───────────────────────────────
397
+
398
+ import { createServer } from 'http';
399
+
400
+ const httpServer = createServer((req, res) => {
401
+ res.setHeader('Content-Type', 'application/json');
402
+
403
+ if (req.url === '/health') {
404
+ res.writeHead(200);
405
+ res.end(JSON.stringify({ status: 'ok', version: VERSION, checked_at: nowISO() }));
406
+ return;
407
+ }
408
+
409
+ if (req.url === '/stats') {
410
+ const statsKey = req.headers['x-stats-key'];
411
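+ if (!process.env.STATS_KEY || statsKey !== process.env.STATS_KEY) { // deny when STATS_KEY is unset, otherwise undefined === undefined grants access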
412
+ res.writeHead(401);
413
+ res.end(JSON.stringify({ error: 'Unauthorised' }));
414
+ return;
415
+ }
416
+ res.writeHead(200);
417
+ res.end(JSON.stringify({ ...stats, version: VERSION, checked_at: nowISO() }));
418
+ return;
419
+ }
420
+
421
+ res.writeHead(404);
422
+ res.end(JSON.stringify({ error: 'Not found' }));
423
+ });
424
+
425
+ const HTTP_PORT = process.env.PORT || 3000;
426
+ httpServer.listen(HTTP_PORT, () => {
427
+ process.stderr.write(`[local-model-suitability-mcp] HTTP on port ${HTTP_PORT}\n`);
428
+ });
429
+
430
+ // ─── stdio MCP transport ──────────────────────────────────────────────────────
431
+
432
+ process.stdin.setEncoding('utf8');
433
+ let buffer = '';
434
+
435
+ process.stdin.on('data', async (chunk) => {
436
+ buffer += chunk;
437
+ const lines = buffer.split('\n');
438
+ buffer = lines.pop();
439
+
440
+ for (const line of lines) {
441
+ const trimmed = line.trim();
442
+ if (!trimmed) continue;
443
+
444
+ let request;
445
+ try {
446
+ request = JSON.parse(trimmed);
447
+ } catch {
448
+ continue;
449
+ }
450
+
451
+ const id = request.id;
+ if (id === undefined) continue; // JSON-RPC notification: no response must be sent
452
+ try {
453
+ const result = await handleRequest(request);
454
+ const response = { jsonrpc: '2.0', id, result };
455
+ process.stdout.write(JSON.stringify(response) + '\n');
456
+ } catch (err) {
457
+ const error =
458
+ typeof err === 'object' && err.code
459
+ ? err
460
+ : { code: -32603, message: String(err?.message || err) };
461
+ process.stdout.write(
462
+ JSON.stringify({ jsonrpc: '2.0', id, error }) + '\n'
463
+ );
464
+ }
465
+ }
466
+ });
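The stdio transport above uses newline-delimited JSON: chunks are appended to a buffer, complete lines are parsed, and the trailing partial line is kept for the next chunk. The sketch below isolates that framing step and adds the JSON-RPC 2.0 rule that a message without an `id` is a notification and receives no reply. `takeCompleteLines` and `expectsReply` are hypothetical helper names.

```javascript
// Hypothetical helper: split buffered input into complete lines plus remainder.
function takeCompleteLines(buffer, chunk) {
  const parts = (buffer + chunk).split('\n');
  const rest = parts.pop(); // text after the last newline: incomplete line or ''
  const lines = parts.map((l) => l.trim()).filter((l) => l !== '');
  return { lines, rest };
}

// JSON-RPC 2.0: only messages carrying an "id" member are requests
// that expect a response; the rest are notifications.
function expectsReply(message) {
  return message !== null && typeof message === 'object' && 'id' in message;
}
```

A message split across two chunks is reassembled because the unfinished tail is carried over as `rest` and prepended to the next chunk.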
467
+
468
+ process.on('SIGINT', () => process.exit(0));
469
+ process.on('SIGTERM', () => process.exit(0));