@index9/mcp 6.1.0 → 6.3.0
- package/dist/cli.js +170 -76
- package/manifest.json +1 -1
- package/package.json +4 -4
package/dist/cli.js
CHANGED
|
@@ -48,6 +48,11 @@ var MissingModelDiagnosticSchema = z.object({
|
|
|
48
48
|
provider: z.string().optional(),
|
|
49
49
|
message: z.string()
|
|
50
50
|
});
|
|
51
|
+
var SuggestionEntrySchema = z.object({
|
|
52
|
+
id: z.string(),
|
|
53
|
+
name: z.string(),
|
|
54
|
+
created: z.number().nullable()
|
|
55
|
+
});
|
|
51
56
|
var UserContentTextPartSchema = z.strictObject({
|
|
52
57
|
type: z.literal("text"),
|
|
53
58
|
text: z.string().trim().min(1)
|
|
@@ -108,8 +113,9 @@ Key rules:
|
|
|
108
113
|
- find_models price-asc tends to be dominated by free preview models \u2014 pass \`excludeFree=true\` when you want a paid SLA.
|
|
109
114
|
- find_models always emits \`meta.confidence\` ("high" | "low") on semantic queries. Low means no candidate matched on keyword (BM25); \`meta.lowConfidenceReason\` is "no_keyword_matches" or "no_results" and \`meta.suggestion\` carries an actionable hint. Weak hits are capped at score=30 so they don't masquerade as strong matches. Pass \`requireKeywordMatch: true\` to get an empty page instead of weak vector-only neighbors.
|
|
110
115
|
- find_models with sortBy=price exposes \`pricing.effectivePromptPerMillion\` and \`pageInfo.priceSortBasis\` \u2014 sort order may diverge from displayed promptPerMillion for models with per-request fees.
|
|
111
|
-
- get_models
|
|
116
|
+
- Your training-data model IDs are routinely stale. get_models / compare_models / test_model all accept aliases (display names, short names) and return unknown ids in \`missingIds\` with \`suggestions[id]\` ordered newest-first (each entry: \`{id, name, created}\`, where \`created\` is unix seconds), plus \`missingDiagnostics[id].reason\` \u2208 {"unknown_provider", "no_match", "suggestions_available", "ambiguous_alias"}. **Default recovery: retry with \`suggestions[id][0].id\` \u2014 it's the newest viable replacement.** If suggestions is empty or reason="no_match"/"unknown_provider", fall back to \`find_models sortBy=created\` instead.
|
|
112
117
|
- compare_models accepts the same alias formats as get_models. Use it instead of N parallel get_models calls when the user is comparing finalists.
|
|
118
|
+
- test_model pre-flight resolves and filters unresolvable ids out of the OpenRouter call, so stale ids never cost you credits \u2014 they come back in missingIds with the same suggestions/diagnostics surface as get_models. If every id is unresolvable, the call returns 400 with diagnostics and no inference fires.
|
|
113
119
|
- Use test_model with \`dryRun=true\` to estimate cost before live testing. Pass \`expectedPromptTokens\` for capacity planning at sizes you don't want to paste in full.
|
|
114
120
|
- test_model with \`dryRun=false\` (default) requires OPENROUTER_API_KEY and incurs real usage costs.
|
|
115
121
|
- Reasoning-capable models (capabilities includes "reasoning") burn hidden reasoning tokens against \`maxTokens\` before emitting visible text. Leave \`maxTokens\` unset, or set it to at least 2000, when testing reasoning models \u2014 otherwise results may fail with finish_reason=length.
|
|
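The stale-id rule added in this hunk reads like a small decision procedure, so a sketch may help. This is one possible client-side reading of it, not code from the package: `planStaleIdRecovery`, the `retryWith` / `needsSearch` names, and the sample model ids are hypothetical. Only the field names (`missingIds`, `suggestions`, `missingDiagnostics`) and the reason values come from the diff.

```javascript
// Hypothetical helper, not part of @index9/mcp.
// Splits missing ids into "retry with the newest suggestion" and
// "no usable suggestion, search again" following the rule text.
function planStaleIdRecovery(response) {
  const retryWith = {};
  const needsSearch = [];
  for (const id of response.missingIds ?? []) {
    const reason = response.missingDiagnostics?.[id]?.reason;
    const candidates = response.suggestions?.[id] ?? [];
    const noUsableSuggestion =
      candidates.length === 0 || reason === "no_match" || reason === "unknown_provider";
    if (noUsableSuggestion) {
      needsSearch.push(id);
    } else {
      // Suggestions are described as newest-first, so index 0 is the default pick.
      retryWith[id] = candidates[0].id;
    }
  }
  return { retryWith, needsSearch };
}

// Made-up response for illustration only.
const plan = planStaleIdRecovery({
  missingIds: ["acme/old-model", "nope/whatever"],
  suggestions: {
    "acme/old-model": [
      { id: "acme/new-model-2", name: "New Model 2", created: 1700000200 },
      { id: "acme/new-model-1", name: "New Model 1", created: 1700000100 },
    ],
  },
  missingDiagnostics: {
    "acme/old-model": { reason: "suggestions_available", message: "partial match" },
    "nope/whatever": { reason: "unknown_provider", message: "provider not in catalog" },
  },
});
console.log(plan);
```

Ids that land in `needsSearch` would then go to `find_models sortBy=created`, as the rule text suggests.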
@@ -152,7 +158,7 @@ Pass result.id to get_models for full specs or to test_model for live testing.`,
|
|
|
152
158
|
|
|
153
159
|
Call after find_models to inspect candidates, or directly when the user names a model (format: 'provider/model-name').
|
|
154
160
|
|
|
155
|
-
Response: { results: (Model | null)[], missingIds: string[], resolvedAliases?: Record<alias, canonicalId>, ambiguousAliases?: Record<alias, candidateIds[]>, suggestions?: Record<unknownId,
|
|
161
|
+
Response: { results: (Model | null)[], missingIds: string[], resolvedAliases?: Record<alias, canonicalId>, ambiguousAliases?: Record<alias, candidateIds[]>, suggestions?: Record<unknownId, Array<{id, name, created}>> }. Each non-null result has:
|
|
156
162
|
- id, canonicalSlug, name, description
|
|
157
163
|
- created (unix seconds), createdAt (ISO 8601), knowledgeCutoff (ISO date or null)
|
|
158
164
|
- contextLength (tokens), maxOutputTokens, isModerated
|
|
@@ -161,7 +167,7 @@ Response: { results: (Model | null)[], missingIds: string[], resolvedAliases?: R
|
|
|
161
167
|
- capabilities[]: normalized capability flags (same values as find_models and capabilitiesAll/Any)
|
|
162
168
|
- supportedParameters[]: OpenRouter parameters the model accepts (e.g., "temperature", "tools", "response_format")
|
|
163
169
|
|
|
164
|
-
Entries in results are null when the id is unknown; those ids appear in missingIds. Ambiguous aliases appear in ambiguousAliases with candidate canonical ids \u2014 pass a canonical id to disambiguate. Unknown ids that partially match (e.g. "sonnet" \u2192 all Claude Sonnet variants) appear in suggestions
|
|
170
|
+
Entries in results are null when the id is unknown; those ids appear in missingIds. Ambiguous aliases appear in ambiguousAliases with candidate canonical ids \u2014 pass a canonical id to disambiguate. Unknown ids that partially match (e.g. "sonnet" \u2192 all Claude Sonnet variants) appear in \`suggestions\` as up to 5 \`{id, name, created}\` entries **sorted newest-first** \u2014 pick \`suggestions[id][0].id\` for the most current replacement without a second lookup. When token-overlap finds nothing but the id is shaped like \`provider/<unknown>\` and the provider exists, suggestions falls back to the 5 newest models from that provider (real created timestamps, no hardcoded "popular" list).
|
|
165
171
|
|
|
166
172
|
\`missingDiagnostics\` (when present) gives a machine-readable reason per missing id: \`unknown_provider\` (the prefix before / isn't in the catalog \u2014 fix the provider, not the model name), \`ambiguous_alias\`, \`suggestions_available\` (mirrors suggestions[id]), or \`no_match\`.`,
|
|
167
173
|
requiresKey: false
|
|
@@ -174,13 +180,13 @@ Entries in results are null when the id is unknown; those ids appear in missingI
|
|
|
174
180
|
|
|
175
181
|
Use this when the user asks "which is cheaper / has more context / supports X" across multiple specific models. Faster than calling get_models and diffing yourself.
|
|
176
182
|
|
|
177
|
-
Response: { models: ModelResponse[], diff: { contextLength, maxOutputTokens, promptPricePerMillion, completionPricePerMillion, tokenizer, inputModalities, outputModalities, capabilities, supportedParameters }, cheapestForPromptPerMillion, largestContext, missingIds, resolvedAliases?, ambiguousAliases?, suggestions? }.
|
|
183
|
+
Response: { models: ModelResponse[], diff: { contextLength, maxOutputTokens, promptPricePerMillion, completionPricePerMillion, tokenizer, inputModalities, outputModalities, capabilities, supportedParameters }, cheapestForPromptPerMillion, largestContext, missingIds, resolvedAliases?, ambiguousAliases?, suggestions?: Record<unknownId, Array<{id, name, created}>> (newest-first), missingDiagnostics? }.
|
|
178
184
|
|
|
179
185
|
Each numeric/string diff field has { allEqual: boolean, values: Record<id, value|null> }. Capability/parameter diffs have { commonAll: string[], uniquePerModel: Record<id, string[]> }. cheapestForPromptPerMillion / largestContext are convenience picks across the supplied models \u2014 null when the field is missing on every model.
|
|
180
186
|
|
|
181
187
|
Optional: pass \`expectedPromptTokens\` AND \`expectedCompletionTokens\` to also receive \`workloadCosts\` and \`cheapestForRealisticWorkload\` \u2014 the actual cheapest given the user's expected token mix. Each \`workloadCosts[i]\` carries \`tokenCostUsd\` (token-only), \`requestCostUsd\` (per-request fee), \`totalCostUsd\` (sum, includes request fees), and \`pricingBasis\` ("exact_per_token" | "rounded_per_million" | "unavailable"). This matters when prompt:completion price ratios diverge across models, or when a model has a per-request fee.
|
|
182
188
|
|
|
183
|
-
Accepts the same alias formats as get_models. Unknown ids are returned in missingIds (with suggestions when partial matches exist, plus \`missingDiagnostics\` carrying a machine-readable reason per id).`,
|
|
189
|
+
Accepts the same alias formats as get_models. Unknown ids are returned in missingIds (with \`suggestions[id]\` as newest-first \`{id, name, created}\` entries when partial matches exist, plus \`missingDiagnostics\` carrying a machine-readable reason per id). When fewer than 2 ids resolve, this returns 400 with the diagnostics so you can retry with \`suggestions[id][0].id\` for each missing id.`,
|
|
184
190
|
requiresKey: false
|
|
185
191
|
},
|
|
186
192
|
list_facets: {
|
|
@@ -213,10 +219,18 @@ Parameters:
|
|
|
213
219
|
- expectedPromptTokens: Estimated prompt-token count for dryRun cost estimation; overrides the prompt-string heuristic. Use to model "what would N-token requests cost?" without pasting N tokens.
|
|
214
220
|
- expectedCompletionTokens: Optional completion token estimate used by dryRun
|
|
215
221
|
- maxTokens, systemPrompt, temperature, topP, seed, responseFormat, enforceJson, retries: Live-testing controls (ignored when dryRun=true)
|
|
222
|
+
- stream: Use OpenRouter's SSE streaming so capacity/refusal errors surface in ~1s instead of waiting the full per-model timeout for an empty 200. Defaults to false.
|
|
223
|
+
- firstTokenTimeoutMs: Streaming-only deadline for the first delta. Defaults to 10s. If the upstream sends no token within this window, the request aborts and returns failureReason="timeout". Ignored when stream=false.
|
|
224
|
+
- providerSort: "throughput" | "price" | "latency" \u2014 opt-in OpenRouter provider routing. Defaults to OpenRouter's load-balanced choice.
|
|
225
|
+
- providerOrder: ordered list of provider slugs (up to 8). Try these providers first before falling back. Useful for steering around an overloaded provider for a single model.
|
|
226
|
+
- fallbackModels: ordered list of model ids (up to 5). OpenRouter automatically retries the request against the next id when the primary is unavailable. Use sparingly \u2014 a benchmark should usually test the model you asked for, not a substitute.
|
|
227
|
+
- debug: When true, each result includes a \`debug\` field with the raw upstream finish_reason, error message, provider name, refusal, and usage. Use to diagnose "missing assistant text" without re-running.
|
|
228
|
+
|
|
229
|
+
Results (live): each result carries modelId (the id you passed), resolvedModelId (canonical id, present when the input was an alias), ok, response, latencyMs, tokens { prompt, completion }, cost (USD; live from OpenRouter when available, else estimated from cached pricing), and truncated=true when finish_reason is "length". On failure, results include \`error\` (free-form) plus \`failureReason\` ("insufficient_credits" | "model_unavailable" | "rate_limited" | "capacity" | "timeout" | "invalid_request" | "unknown") so callers can pick a retry strategy without parsing the error string. \`capacity\` indicates the provider is overloaded \u2014 apply a longer backoff or set \`fallbackModels\` and retry. When \`debug: true\` is set, each result also carries a \`debug\` block with the upstream provider's diagnostic fields.
|
|
216
230
|
|
|
217
|
-
Results (
|
|
231
|
+
Results (dryRun): each entry carries \`tokenCostUsd\`, \`requestCostUsd\`, \`totalCostUsd\` (matches \`estimatedCost\`, includes per-request fees), and \`estimatedCostBasis\` (same enum as compare_models.workloadCosts). Use find_models or get_models first to identify model ids.
|
|
218
232
|
|
|
219
|
-
|
|
233
|
+
Stale-id recovery: unresolvable model ids are filtered out **before** any OpenRouter call (so they cost nothing) and returned in \`missingIds\` alongside \`suggestions\` (newest-first \`{id, name, created}\` entries), \`resolvedAliases\`, \`ambiguousAliases\`, and \`missingDiagnostics\` \u2014 same shape as get_models / compare_models. If every id is unresolvable, the call returns 400 with diagnostics and no inference fires. Default recovery: retry with \`suggestions[id][0].id\`.`,
|
|
220
234
|
requiresKey: true
|
|
221
235
|
}
|
|
222
236
|
};
|
|
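The new `failureReason` values are meant to let callers choose a retry strategy without parsing `error`. A minimal sketch of such a policy follows, assuming a caller-side table: `RETRY_POLICY`, `nextStep`, and every delay value are illustrative assumptions, not numbers from @index9/mcp or OpenRouter. The only guidance taken from the description is that `capacity` deserves a longer backoff than other retryable reasons.

```javascript
// Hypothetical policy table. Delay values are assumptions for illustration.
const RETRY_POLICY = new Map([
  ["rate_limited", { retry: true, delayMs: 2000 }],
  ["capacity", { retry: true, delayMs: 15000 }], // longer backoff, per the description
  ["timeout", { retry: true, delayMs: 1000 }],
  ["model_unavailable", { retry: false }],
  ["insufficient_credits", { retry: false }],
  ["invalid_request", { retry: false }],
  ["unknown", { retry: false }],
]);

// Decide what to do with one entry of a live test_model `results` array.
function nextStep(result) {
  if (result.ok) return { action: "done" };
  // failureReason is optional in the schema, so default to "unknown".
  const reason = result.failureReason ?? "unknown";
  const policy = RETRY_POLICY.get(reason) ?? RETRY_POLICY.get("unknown");
  if (!policy.retry) return { action: "give_up", reason };
  return { action: "retry", reason, delayMs: policy.delayMs };
}

console.log(nextStep({ ok: false, failureReason: "capacity", error: "provider overloaded" }));
console.log(nextStep({ ok: false, failureReason: "invalid_request", error: "bad parameter" }));
```

For `capacity`, a caller could also set `fallbackModels` on the retry instead of only waiting, as the description notes.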
@@ -247,24 +261,28 @@ var SITE = {
|
|
|
247
261
|
hero: {
|
|
248
262
|
titleLine1: "Pick the right AI model",
|
|
249
263
|
titleLine2: "from chat",
|
|
250
|
-
subtitle: "
|
|
251
|
-
proof: ["Live OpenRouter data \xB7 300+ models \xB7 refreshed every 30 min"],
|
|
264
|
+
subtitle: "An MCP server your coding assistant uses to search, compare, and live-test 300+ models for the task you're on.",
|
|
252
265
|
pricingNote: "Free. You only pay OpenRouter for live model calls.",
|
|
253
|
-
getStarted: "Add index9 to your editor",
|
|
254
266
|
seeHowItWorks: "See a real session",
|
|
255
|
-
updatedBadge: "OpenRouter data \xB7 refreshed "
|
|
267
|
+
updatedBadge: "OpenRouter data \xB7 refreshed ",
|
|
268
|
+
panel: {
|
|
269
|
+
signalEyebrow: "Just landed",
|
|
270
|
+
signalTitle: "Newest on OpenRouter",
|
|
271
|
+
liveLabel: "live",
|
|
272
|
+
ctaEyebrow: "How your assistant picks",
|
|
273
|
+
body: "Your assistant compares these against your task and live-tests the finalists."
|
|
274
|
+
}
|
|
256
275
|
},
|
|
257
276
|
problem: {
|
|
258
277
|
label: "Why this exists",
|
|
259
278
|
heading: "Your assistant's model knowledge is stale",
|
|
260
279
|
body: [
|
|
261
280
|
'New models ship every week. Pricing changes. "Use GPT-4" or "use Claude 3.5" is usually months behind reality.',
|
|
262
|
-
"Without live data, your assistant defaults to whatever it learned in training
|
|
263
|
-
"Index9 gives it the data and the tools to
|
|
281
|
+
"Without live data, your assistant defaults to whatever it learned in training. Usually a model superseded by something cheaper or better-suited to your task.",
|
|
282
|
+
"Index9 gives it the data, and the tools to compare."
|
|
264
283
|
]
|
|
265
284
|
},
|
|
266
285
|
howItWorks: {
|
|
267
|
-
label: "How it works",
|
|
268
286
|
heading: "How it works",
|
|
269
287
|
subtitle: "Index9 adds 5 tools to your editor. Your assistant calls them when you ask about models.",
|
|
270
288
|
steps: [
|
|
@@ -276,12 +294,12 @@ var SITE = {
|
|
|
276
294
|
{
|
|
277
295
|
number: "2",
|
|
278
296
|
title: "Your assistant calls index9",
|
|
279
|
-
body: "It searches live model data, compares finalists, and runs your prompt against the top
|
|
297
|
+
body: "It searches live model data, compares finalists, and runs your prompt against the top picks."
|
|
280
298
|
},
|
|
281
299
|
{
|
|
282
300
|
number: "3",
|
|
283
301
|
title: "You get a measured pick",
|
|
284
|
-
body: "Backed by real cost numbers and real outputs
|
|
302
|
+
body: "Backed by real cost numbers and real outputs, not training-data memory."
|
|
285
303
|
}
|
|
286
304
|
]
|
|
287
305
|
},
|
|
@@ -291,7 +309,7 @@ var SITE = {
|
|
|
291
309
|
subheading: "A Claude Code session picking a TypeScript code-review model. Real tool calls, real verdict.",
|
|
292
310
|
prompt: {
|
|
293
311
|
title: "The prompt",
|
|
294
|
-
body: "Pick a model for a TypeScript code-review bot that runs on every PR. I want
|
|
312
|
+
body: "Pick a model for a TypeScript code-review bot that runs on every PR. I want quality without paying frontier rates on routine reviews. Test against this diff."
|
|
295
313
|
},
|
|
296
314
|
toolCalls: {
|
|
297
315
|
title: "What the assistant did",
|
|
@@ -319,7 +337,7 @@ var SITE = {
|
|
|
319
337
|
]
|
|
320
338
|
},
|
|
321
339
|
consideredTitle: "Recent models, evaluated",
|
|
322
|
-
consideredSubtitle: "
|
|
340
|
+
consideredSubtitle: "Candidates the assistant ruled in and out, with the reason.",
|
|
323
341
|
consideredRows: [
|
|
324
342
|
{
|
|
325
343
|
id: "openai/gpt-5.5",
|
|
@@ -350,16 +368,14 @@ var SITE = {
|
|
|
350
368
|
title: "The pick",
|
|
351
369
|
model: "z-ai/glm-5.1",
|
|
352
370
|
body: "Open-weight, $1.05 per million input tokens. Caught both bugs in the sample diff at roughly $0.005 per PR, about 5\xD7 cheaper than running gpt-5.5 on every commit."
|
|
353
|
-
},
|
|
354
|
-
quote: {
|
|
355
|
-
body: "The frontier model would have caught both bugs, at 5\xD7 the cost. The cheapest candidate missed them entirely. Only the live test surfaced the model that did both.",
|
|
356
|
-
attribution: "index9 session trace"
|
|
357
371
|
}
|
|
358
372
|
},
|
|
359
373
|
toolsSection: {
|
|
360
374
|
label: "Tools",
|
|
361
375
|
heading: "The 5 tools",
|
|
362
376
|
subheading: "Your assistant chains these together. You don't call them directly.",
|
|
377
|
+
keyNotePrefix: "Only",
|
|
378
|
+
keyNoteSuffix: "needs an OpenRouter key. The rest work out of the box.",
|
|
363
379
|
openRouterKey: "OpenRouter API key",
|
|
364
380
|
noKeyRequired: "No key required",
|
|
365
381
|
requiresLabel: "Requires ",
|
|
@@ -396,7 +412,7 @@ var SITE = {
|
|
|
396
412
|
action: "compare_models",
|
|
397
413
|
displayName: "compare_models",
|
|
398
414
|
fullName: null,
|
|
399
|
-
description: "Diffs 2\u201310 finalists side-by-side. Flags the cheapest
|
|
415
|
+
description: "Diffs 2\u201310 finalists side-by-side. Flags the cheapest for your token mix.",
|
|
400
416
|
badge: null,
|
|
401
417
|
requiresKey: false
|
|
402
418
|
},
|
|
@@ -405,7 +421,7 @@ var SITE = {
|
|
|
405
421
|
action: "test_model",
|
|
406
422
|
displayName: "test_model",
|
|
407
423
|
fullName: null,
|
|
408
|
-
description: "Runs your prompt across models. Returns output, latency,
|
|
424
|
+
description: "Runs your prompt across models. Returns output, latency, cost. Dry-run for cost only.",
|
|
409
425
|
badge: "Live",
|
|
410
426
|
requiresKey: true
|
|
411
427
|
}
|
|
@@ -430,7 +446,7 @@ var SITE = {
|
|
|
430
446
|
},
|
|
431
447
|
{
|
|
432
448
|
question: "Does it pick the model for me?",
|
|
433
|
-
answer: "No
|
|
449
|
+
answer: "No. It gives your assistant the data: search results, specs, cost diffs, live test outputs. Your assistant makes the call.",
|
|
434
450
|
link: null
|
|
435
451
|
},
|
|
436
452
|
{
|
|
@@ -440,7 +456,7 @@ var SITE = {
|
|
|
440
456
|
},
|
|
441
457
|
{
|
|
442
458
|
question: "Which models?",
|
|
443
|
-
answer: `${MODEL_COUNT} from OpenRouter
|
|
459
|
+
answer: `${MODEL_COUNT} from OpenRouter: OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more. Metadata refreshes every 30 minutes.`,
|
|
444
460
|
link: null
|
|
445
461
|
},
|
|
446
462
|
{
|
|
@@ -450,7 +466,7 @@ var SITE = {
|
|
|
450
466
|
},
|
|
451
467
|
{
|
|
452
468
|
question: "What's the project status?",
|
|
453
|
-
answer: "Stable
|
|
469
|
+
answer: "Stable. Issues and feature requests on GitHub.",
|
|
454
470
|
link: null
|
|
455
471
|
}
|
|
456
472
|
]
|
|
@@ -520,32 +536,9 @@ var SITE = {
|
|
|
520
536
|
}
|
|
521
537
|
};
|
|
522
538
|
var README = {
|
|
523
|
-
tagline: `Landing page, API, and MCP server for discovering, shortlisting, comparing, cost-modeling, and live-testing ${MODEL_COUNT} AI models.`,
|
|
524
539
|
mcpDescription: `Discover, shortlist, compare, cost-model, and live-test ${MODEL_COUNT} AI models from your editor`,
|
|
525
|
-
monorepoLayout: {
|
|
526
|
-
appsWeb: "apps/web \u2014 Next.js 16 app (UI + API routes)",
|
|
527
|
-
packagesCore: "packages/core \u2014 Shared Zod schemas, types, constants (@index9/core)",
|
|
528
|
-
packagesMcp: "packages/mcp \u2014 Thin MCP stdio server calling the hosted API (@index9/mcp)"
|
|
529
|
-
},
|
|
530
|
-
quickStart: {
|
|
531
|
-
install: "pnpm install",
|
|
532
|
-
build: "pnpm build",
|
|
533
|
-
test: "pnpm test",
|
|
534
|
-
dev: "pnpm dev # run web app"
|
|
535
|
-
},
|
|
536
|
-
envNote: "Copy apps/web/.env.example to apps/web/.env.local and fill in values for local development.",
|
|
537
540
|
mcpInstall: {
|
|
538
|
-
|
|
539
|
-
envNote: "Optional: set OPENROUTER_API_KEY in your MCP client config for live test_model calls. dryRun=true works without a key.",
|
|
540
|
-
claudeCode: "Claude Code: Run `claude mcp add --transport stdio index9 -- npx -y @index9/mcp` or add the same config to .mcp.json / ~/.claude.json."
|
|
541
|
-
},
|
|
542
|
-
release: {
|
|
543
|
-
step1: "Make changes in packages/mcp (core is internal, bundled into mcp)",
|
|
544
|
-
step2: "Run pnpm changeset \u2014 add a changeset, select packages, choose bump type",
|
|
545
|
-
step3: "Commit and push; open PR to main",
|
|
546
|
-
step4: "Merge the PR; CI creates a Version Packages PR when changesets exist",
|
|
547
|
-
step5: "Merge the version PR; CI publishes to npm and creates a GitHub Release with the .mcpb artifact attached",
|
|
548
|
-
step6: "Users can install via npx @index9/mcp@latest or download .mcpb from Releases"
|
|
541
|
+
envNote: "Optional: set OPENROUTER_API_KEY in your MCP client config for live test_model calls. dryRun=true works without a key."
|
|
549
542
|
}
|
|
550
543
|
};
|
|
551
544
|
|
|
@@ -654,7 +647,7 @@ var BatchModelLookupResponseSchema = z3.object({
|
|
|
654
647
|
missingIds: z3.array(z3.string()),
|
|
655
648
|
resolvedAliases: z3.record(z3.string(), z3.string()).optional(),
|
|
656
649
|
ambiguousAliases: z3.record(z3.string(), z3.array(z3.string())).optional(),
|
|
657
|
-
suggestions: z3.record(z3.string(), z3.array(
|
|
650
|
+
suggestions: z3.record(z3.string(), z3.array(SuggestionEntrySchema)).optional(),
|
|
658
651
|
missingDiagnostics: z3.record(z3.string(), MissingModelDiagnosticSchema).optional()
|
|
659
652
|
}).strict();
|
|
660
653
|
var GetModelsToolResultSchema = z3.object({
|
|
@@ -662,7 +655,7 @@ var GetModelsToolResultSchema = z3.object({
|
|
|
662
655
|
missingIds: z3.array(z3.string()),
|
|
663
656
|
resolvedAliases: z3.record(z3.string(), z3.string()).optional(),
|
|
664
657
|
ambiguousAliases: z3.record(z3.string(), z3.array(z3.string())).optional(),
|
|
665
|
-
suggestions: z3.record(z3.string(), z3.array(
|
|
658
|
+
suggestions: z3.record(z3.string(), z3.array(SuggestionEntrySchema)).optional(),
|
|
666
659
|
missingDiagnostics: z3.record(z3.string(), MissingModelDiagnosticSchema).optional(),
|
|
667
660
|
_index9: Index9MetaSchema
|
|
668
661
|
});
|
|
@@ -720,7 +713,7 @@ var CompareResponseSchema = z4.object({
|
|
|
720
713
|
workloadCosts: z4.array(CompareWorkloadCostSchema).optional(),
|
|
721
714
|
resolvedAliases: z4.record(z4.string(), z4.string()).optional(),
|
|
722
715
|
missingIds: z4.array(z4.string()),
|
|
723
|
-
suggestions: z4.record(z4.string(), z4.array(
|
|
716
|
+
suggestions: z4.record(z4.string(), z4.array(SuggestionEntrySchema)).optional(),
|
|
724
717
|
ambiguousAliases: z4.record(z4.string(), z4.array(z4.string())).optional(),
|
|
725
718
|
missingDiagnostics: z4.record(z4.string(), MissingModelDiagnosticSchema).optional()
|
|
726
719
|
}).strict();
|
|
@@ -754,6 +747,7 @@ import { z as z6 } from "zod";
|
|
|
754
747
|
var ResponseFormatSchema = z6.object({
|
|
755
748
|
type: z6.string().min(1)
|
|
756
749
|
}).catchall(z6.unknown()).optional();
|
|
750
|
+
var ProviderSortSchema = z6.enum(["throughput", "price", "latency"]);
|
|
757
751
|
var TestRequestSchema = z6.object({
|
|
758
752
|
prompt: z6.string().min(1).optional(),
|
|
759
753
|
userContent: z6.array(UserContentPartSchema).min(1).optional(),
|
|
@@ -769,7 +763,30 @@ var TestRequestSchema = z6.object({
|
|
|
769
763
|
seed: z6.number().int().optional(),
|
|
770
764
|
responseFormat: ResponseFormatSchema,
|
|
771
765
|
enforceJson: z6.boolean().optional(),
|
|
772
|
-
retries: z6.number().int().min(0).max(3).optional()
|
|
766
|
+
retries: z6.number().int().min(0).max(3).optional(),
|
|
767
|
+
// Use OpenRouter's SSE streaming endpoint so capacity/refusal errors
|
|
768
|
+
// surface in ~1s instead of waiting the full per-model timeout for an
|
|
769
|
+
// empty 200 OK. Cost/tokens are still returned via stream_options.
|
|
770
|
+
stream: z6.boolean().optional(),
|
|
771
|
+
// First-token deadline (streaming only). If the upstream sends no
|
|
772
|
+
// delta within this window, abort the request. Defaults to 10s when
|
|
773
|
+
// streaming. Ignored when stream=false.
|
|
774
|
+
firstTokenTimeoutMs: z6.number().int().positive().optional(),
|
|
775
|
+
// Forwards as `provider.sort` to OpenRouter — opt into routing toward
|
|
776
|
+
// higher-throughput providers when running benchmarks.
|
|
777
|
+
providerSort: ProviderSortSchema.optional(),
|
|
778
|
+
// Forwards as `provider.order` — try these provider slugs first in the
|
|
779
|
+
// given order before falling back. Capped to stay within reasonable
|
|
780
|
+
// limits and prevent abuse.
|
|
781
|
+
providerOrder: z6.array(z6.string().min(1)).min(1).max(8).optional(),
|
|
782
|
+
// Forwards as the top-level `models` array (NOT `model`). OpenRouter
|
|
783
|
+
// tries each in order if the primary is unavailable. Different intent
|
|
784
|
+
// from providerOrder, which routes within a single model.
|
|
785
|
+
fallbackModels: z6.array(z6.string().min(1)).min(1).max(5).optional(),
|
|
786
|
+
// When true, attach a `debug` field on each result with the raw
|
|
787
|
+
// upstream finish_reason, error message, provider name, refusal, and
|
|
788
|
+
// usage. Used to diagnose "missing assistant text" without re-running.
|
|
789
|
+
debug: z6.boolean().optional()
|
|
773
790
|
}).strict().superRefine((data, ctx) => {
|
|
774
791
|
if (data.dryRun === true) {
|
|
775
792
|
if (!data.prompt && data.expectedPromptTokens === void 0) {
|
|
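The schema comments in this hunk say where each routing option is forwarded: `providerSort` to `provider.sort`, `providerOrder` to `provider.order`, and `fallbackModels` to a top-level `models` array. The sketch below shows that mapping under stated assumptions. `buildRoutingFields` is a hypothetical name, the hosted API's real request builder is not part of this diff, and placing the primary id first in `models` is a guess about ordering rather than something the diff confirms.

```javascript
// Hypothetical mapping from test_model options to OpenRouter-style request fields.
function buildRoutingFields(modelId, opts = {}) {
  const body = {};
  if (opts.fallbackModels?.length) {
    // Assumption: primary id leads the array, fallbacks follow in order.
    body.models = [modelId, ...opts.fallbackModels];
  } else {
    body.model = modelId;
  }
  const provider = {};
  if (opts.providerSort) provider.sort = opts.providerSort;
  if (opts.providerOrder?.length) provider.order = [...opts.providerOrder];
  // Only attach `provider` when at least one routing option was set.
  if (Object.keys(provider).length > 0) body.provider = provider;
  if (opts.stream) body.stream = true;
  return body;
}

// Made-up ids and provider slugs for illustration.
const routed = buildRoutingFields("acme/model-a", {
  providerSort: "throughput",
  providerOrder: ["provider-x"],
  fallbackModels: ["acme/model-b"],
});
console.log(JSON.stringify(routed, null, 2));
```

Keeping `providerOrder` (routing within one model) separate from `fallbackModels` (switching to a different model) matches the distinction the schema comments draw.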
@@ -804,10 +821,27 @@ var TestFailureReasonSchema = z6.enum([
|
|
|
804
821
|
"insufficient_credits",
|
|
805
822
|
"model_unavailable",
|
|
806
823
|
"rate_limited",
|
|
824
|
+
// Provider is overloaded / "at capacity" / "provisioned throughput
|
|
825
|
+
// required". A distinct reason from rate_limited so callers can apply
|
|
826
|
+
// a longer backoff or route to a fallback model.
|
|
827
|
+
"capacity",
|
|
807
828
|
"timeout",
|
|
808
829
|
"invalid_request",
|
|
809
830
|
"unknown"
|
|
810
831
|
]);
|
|
832
|
+
var TestDebugInfoSchema = z6.object({
|
|
833
|
+
upstreamId: z6.string().optional(),
|
|
834
|
+
providerName: z6.string().optional(),
|
|
835
|
+
finishReason: z6.string().optional(),
|
|
836
|
+
upstreamError: z6.string().optional(),
|
|
837
|
+
refusal: z6.string().optional(),
|
|
838
|
+
hasToolCalls: z6.boolean().optional(),
|
|
839
|
+
usage: z6.object({
|
|
840
|
+
promptTokens: z6.number().optional(),
|
|
841
|
+
completionTokens: z6.number().optional(),
|
|
842
|
+
totalTokens: z6.number().optional()
|
|
843
|
+
}).optional()
|
|
844
|
+
});
|
|
811
845
|
var TestModelMetadataSchema = z6.object({
|
|
812
846
|
id: z6.string(),
|
|
813
847
|
name: z6.string(),
|
|
@@ -824,7 +858,8 @@ var TestResultSuccessSchema = z6.object({
|
|
|
824
858
|
latencyMs: z6.number().min(0),
|
|
825
859
|
tokens: UsageTokensSchema,
|
|
826
860
|
cost: z6.number().nullable().optional(),
|
|
827
|
-
truncated: z6.boolean().optional()
|
|
861
|
+
truncated: z6.boolean().optional(),
|
|
862
|
+
debug: TestDebugInfoSchema.optional()
|
|
828
863
|
});
|
|
829
864
|
var TestResultFailureSchema = z6.object({
|
|
830
865
|
modelId: z6.string(),
|
|
@@ -833,7 +868,8 @@ var TestResultFailureSchema = z6.object({
|
|
|
833
868
|
model: TestModelMetadataSchema,
|
|
834
869
|
error: z6.string(),
|
|
835
870
|
failureReason: TestFailureReasonSchema.optional(),
|
|
836
|
-
latencyMs: z6.number().min(0)
|
|
871
|
+
latencyMs: z6.number().min(0),
|
|
872
|
+
debug: TestDebugInfoSchema.optional()
|
|
837
873
|
});
|
|
838
874
|
var TestResultSchema = z6.discriminatedUnion("ok", [
|
|
839
875
|
TestResultSuccessSchema,
|
|
@@ -850,13 +886,22 @@ var TestEstimateResultSchema = z6.object({
|
|
|
850
886
|
totalCostUsd: z6.number().nullable().optional(),
|
|
851
887
|
estimatedCostBasis: PricingBasisSchema.optional()
|
|
852
888
|
});
|
|
889
|
+
var TestResolutionFieldsSchema = {
|
|
890
|
+
missingIds: z6.array(z6.string()).optional(),
|
|
891
|
+
resolvedAliases: z6.record(z6.string(), z6.string()).optional(),
|
|
892
|
+
ambiguousAliases: z6.record(z6.string(), z6.array(z6.string())).optional(),
|
|
893
|
+
suggestions: z6.record(z6.string(), z6.array(SuggestionEntrySchema)).optional(),
|
|
894
|
+
missingDiagnostics: z6.record(z6.string(), MissingModelDiagnosticSchema).optional()
|
|
895
|
+
};
|
|
853
896
|
var TestDryRunResponseSchema = z6.object({
|
|
854
897
|
dryRun: z6.literal(true),
|
|
855
898
|
results: z6.array(TestEstimateResultSchema),
|
|
856
|
-
disclaimer: z6.string()
|
|
899
|
+
disclaimer: z6.string(),
|
|
900
|
+
...TestResolutionFieldsSchema
|
|
857
901
|
});
|
|
858
902
|
var TestLiveResponseSchema = z6.object({
|
|
859
|
-
results: z6.array(TestResultSchema)
|
|
903
|
+
results: z6.array(TestResultSchema),
|
|
904
|
+
...TestResolutionFieldsSchema
|
|
860
905
|
});
|
|
861
906
|
var TestResponseSchema = z6.union([TestDryRunResponseSchema, TestLiveResponseSchema]);
|
|
862
907
|
|
|
@@ -885,8 +930,8 @@ function loadConfig() {
|
|
|
885
930
|
}
|
|
886
931
|
|
|
887
932
|
// src/client.ts
|
|
888
|
-
var
|
|
889
|
-
var
|
|
933
|
+
var DEFAULT_RETRY_DELAYS_MS = [1e3, 2e3, 4e3];
|
|
934
|
+
var DEFAULT_ATTEMPT_TIMEOUT_MS = 3e4;
|
|
890
935
|
function isRetryable(status) {
|
|
891
936
|
return status === 429 || status >= 500;
|
|
892
937
|
}
|
|
@@ -902,14 +947,17 @@ function toErrorMessage(error) {
|
|
|
902
947
|
if (error instanceof Error && error.message.trim()) return error.message;
|
|
903
948
|
return "Unknown error";
|
|
904
949
|
}
|
|
905
|
-
async function fetchWithRetry(url, options) {
|
|
950
|
+
async function fetchWithRetry(url, options, retryOptions) {
|
|
951
|
+
const attemptTimeoutMs = retryOptions?.attemptTimeoutMs ?? DEFAULT_ATTEMPT_TIMEOUT_MS;
|
|
952
|
+
const maxRetries = Math.max(0, retryOptions?.maxRetries ?? DEFAULT_RETRY_DELAYS_MS.length);
|
|
953
|
+
const retryDelaysMs = DEFAULT_RETRY_DELAYS_MS.slice(0, maxRetries);
|
|
906
954
|
let lastResponse = null;
|
|
907
955
|
let lastError;
|
|
908
|
-
for (let i = 0; i <=
|
|
956
|
+
for (let i = 0; i <= maxRetries; i++) {
|
|
909
957
|
const timeoutController = new AbortController();
|
|
910
958
|
const timeoutId = setTimeout(() => {
|
|
911
959
|
timeoutController.abort(new DOMException("Request timed out", "AbortError"));
|
|
912
|
-
},
|
|
960
|
+
}, attemptTimeoutMs);
|
|
913
961
|
const externalSignal = options.signal;
|
|
914
962
|
const onAbort = () => {
|
|
915
963
|
timeoutController.abort(
|
|
@@ -934,14 +982,12 @@ async function fetchWithRetry(url, options) {
|
|
|
934
982
|
clearTimeout(timeoutId);
|
|
935
983
|
externalSignal?.removeEventListener("abort", onAbort);
|
|
936
984
|
}
|
|
937
|
-
if (i <
|
|
938
|
-
await sleep(
|
|
985
|
+
if (i < retryDelaysMs.length) {
|
|
986
|
+
await sleep(retryDelaysMs[i]);
|
|
939
987
|
}
|
|
940
988
|
}
|
|
941
989
|
if (lastResponse) return lastResponse;
|
|
942
|
-
throw new Error(
|
|
943
|
-
`Request failed after ${RETRY_DELAYS_MS.length + 1} attempts: ${toErrorMessage(lastError)}`
|
|
944
|
-
);
|
|
990
|
+
throw new Error(`Request failed after ${maxRetries + 1} attempts: ${toErrorMessage(lastError)}`);
|
|
945
991
|
}
|
|
946
992
|
function buildUrl(baseUrl, path, params) {
|
|
947
993
|
const url = new URL(path, baseUrl);
|
|
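The arithmetic behind the new `retryOptions` parameter is easy to misread, so here it is in isolation. `retrySchedule` is a hypothetical helper that copies only the expressions visible in the diff (`Math.max`, the `??` default, and the `slice`); it performs no network calls.

```javascript
const DEFAULT_RETRY_DELAYS_MS = [1000, 2000, 4000]; // same values as 1e3, 2e3, 4e3 in the diff

// Hypothetical helper mirroring how fetchWithRetry derives its loop bounds.
function retrySchedule(retryOptions) {
  const maxRetries = Math.max(0, retryOptions?.maxRetries ?? DEFAULT_RETRY_DELAYS_MS.length);
  const delaysMs = DEFAULT_RETRY_DELAYS_MS.slice(0, maxRetries);
  // The loop runs for i = 0..maxRetries, so at most maxRetries + 1 attempts.
  return { maxAttempts: maxRetries + 1, delaysMs };
}

console.log(retrySchedule());                   // { maxAttempts: 4, delaysMs: [ 1000, 2000, 4000 ] }
console.log(retrySchedule({ maxRetries: 0 }));  // { maxAttempts: 1, delaysMs: [] }
console.log(retrySchedule({ maxRetries: 5 }));  // { maxAttempts: 6, delaysMs: [ 1000, 2000, 4000 ] }
```

Two consequences follow from the code as shown: `maxRetries: 0` means a single attempt with no sleep, and values above 3 add attempts but no extra delays, because only three delays are defined and the sleep is skipped once `i` reaches `retryDelaysMs.length`.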
@@ -1004,8 +1050,24 @@ function extractError(body) {
|
|
|
1004
1050
|
}
|
|
1005
1051
|
return "Request failed";
|
|
1006
1052
|
}
|
|
1007
|
-
|
|
1008
|
-
|
|
1053
|
+
var RECOVERY_FIELDS = [
|
|
1054
|
+
"missingIds",
|
|
1055
|
+
"resolvedAliases",
|
|
1056
|
+
"ambiguousAliases",
|
|
1057
|
+
"suggestions",
|
|
1058
|
+
"missingDiagnostics"
|
|
1059
|
+
];
|
|
1060
|
+
function extractRecoveryFields(body) {
|
|
1061
|
+
if (typeof body !== "object" || body === null || Array.isArray(body)) return {};
|
|
1062
|
+
const out = {};
|
|
1063
|
+
const b = body;
|
|
1064
|
+
for (const key of RECOVERY_FIELDS) {
|
|
1065
|
+
if (key in b) out[key] = b[key];
|
|
1066
|
+
}
|
|
1067
|
+
return out;
|
|
1068
|
+
}
|
|
1069
|
+
async function callApi(ctx, url, options, responseSchema, retryOptions) {
|
|
1070
|
+
const res = await fetchWithRetry(url, options, retryOptions);
|
|
1009
1071
|
let body;
|
|
1010
1072
|
try {
|
|
1011
1073
|
body = await res.json();
|
|
@@ -1014,7 +1076,12 @@ async function callApi(ctx, url, options, responseSchema) {
|
|
|
1014
1076
|
}
|
|
1015
1077
|
if (!res.ok) {
|
|
1016
1078
|
return toResponse(
|
|
1017
|
-
{
|
|
1079
|
+
{
|
|
1080
|
+
error: extractError(body),
|
|
1081
|
+
status: res.status,
|
|
1082
|
+
...extractRecoveryFields(body),
|
|
1083
|
+
_index9: buildMeta(ctx, res.headers)
|
|
1084
|
+
},
|
|
1018
1085
|
true
|
|
1019
1086
|
);
|
|
1020
1087
|
}
|
|
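The `callApi` hunk above copies a fixed allow-list of recovery fields from a non-OK response body into the error payload. The sketch below shows the effect on a sample body. The body contents, the model ids, and the `404` status are invented for illustration, and the `_index9` metadata from the diff is left out because `buildMeta` is not shown here.

```javascript
const RECOVERY_FIELDS = [
  "missingIds",
  "resolvedAliases",
  "ambiguousAliases",
  "suggestions",
  "missingDiagnostics"
];

// Same logic as the function added in the diff: copy only allow-listed keys
// from a plain-object body, and return {} for anything else.
function extractRecoveryFields(body) {
  if (typeof body !== "object" || body === null || Array.isArray(body)) return {};
  const out = {};
  for (const key of RECOVERY_FIELDS) {
    if (key in body) out[key] = body[key];
  }
  return out;
}

// Hypothetical error body; ids and field values are made up.
const sampleBody = {
  error: "Unknown model id",
  missingIds: ["example/old-model"],
  suggestions: {
    "example/old-model": [{ id: "example/new-model", name: "New Model", created: 1700000000 }]
  },
  internalTrace: "not forwarded"
};

// Assumed status code, for illustration only.
const payload = { error: sampleBody.error, status: 404, ...extractRecoveryFields(sampleBody) };

console.log(Object.keys(payload));
// → [ 'error', 'status', 'missingIds', 'suggestions' ]
```

Because only allow-listed keys are copied, fields such as `internalTrace` never reach the caller, while `missingIds` and `suggestions` stay available for a retry.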
@@ -1062,7 +1129,11 @@ async function handleGetModels(ctx, args) {
|
|
|
1062
1129
|
return callApi(
|
|
1063
1130
|
ctx,
|
|
1064
1131
|
`${ctx.baseUrl}${API_PATHS.model}`,
|
|
1065
|
-
{
|
|
1132
|
+
{
|
|
1133
|
+
method: "POST",
|
|
1134
|
+
headers: baseHeaders(ctx),
|
|
1135
|
+
body: JSON.stringify(parsed.data)
|
|
1136
|
+
},
|
|
1066
1137
|
BatchModelLookupResponseSchema
|
|
1067
1138
|
);
|
|
1068
1139
|
}
|
|
@@ -1074,7 +1145,11 @@ async function handleCompareModels(ctx, args) {
|
|
|
1074
1145
|
return callApi(
|
|
1075
1146
|
ctx,
|
|
1076
1147
|
`${ctx.baseUrl}${API_PATHS.compare}`,
|
|
1077
|
-
{
|
|
1148
|
+
{
|
|
1149
|
+
method: "POST",
|
|
1150
|
+
headers: baseHeaders(ctx),
|
|
1151
|
+
body: JSON.stringify(parsed.data)
|
|
1152
|
+
},
|
|
1078
1153
|
CompareResponseSchema
|
|
1079
1154
|
);
|
|
1080
1155
|
}
|
|
@@ -1107,7 +1182,12 @@ async function handleTestModels(ctx, args) {
|
|
|
1107
1182
|
ctx,
|
|
1108
1183
|
`${ctx.baseUrl}${API_PATHS.test}`,
|
|
1109
1184
|
{ method: "POST", headers: reqHeaders, body: JSON.stringify(parsed.data) },
|
|
1110
|
-
TestResponseSchema
|
|
1185
|
+
TestResponseSchema,
|
|
1186
|
+
// Live inference is non-idempotent and slow: each retry costs real money
|
|
1187
|
+
// and the server-side per-model retry/backoff already handles transient
|
|
1188
|
+
// errors. Give the call enough wall-clock to cover a worst-case 10-model
|
|
1189
|
+
// batch × 60s per model and let the server decide on retries.
|
|
1190
|
+
{ attemptTimeoutMs: 24e4, maxRetries: 0 }
|
|
1111
1191
|
);
|
|
1112
1192
|
}
|
|
1113
1193
|
|
|
@@ -1218,7 +1298,21 @@ async function createServer() {
|
|
|
1218
1298
|
"Structured output shape request forwarded to OpenRouter (e.g., { type: 'json_object' })."
|
|
1219
1299
|
),
|
|
1220
1300
|
enforceJson: z7.boolean().optional().describe("When true, output must parse as JSON."),
|
|
1221
|
-
retries: z7.number().int().min(0).max(3).optional().describe("Retries for transient failures.")
|
|
1301
|
+
retries: z7.number().int().min(0).max(3).optional().describe("Retries for transient failures."),
|
|
1302
|
+
stream: z7.boolean().optional().describe(
|
|
1303
|
+
"Use OpenRouter SSE streaming so capacity/refusal errors surface quickly. Defaults to false."
|
|
1304
|
+
),
|
|
1305
|
+
firstTokenTimeoutMs: z7.number().int().min(1).optional().describe("Streaming-only first-token deadline in ms. Defaults to 10000."),
|
|
1306
|
+
providerSort: ProviderSortSchema.optional().describe(
|
|
1307
|
+
'OpenRouter provider routing sort: "throughput", "price", or "latency".'
|
|
1308
|
+
),
|
|
1309
|
+
providerOrder: z7.array(z7.string().min(1)).min(1).max(8).optional().describe("Provider slugs to try first, in order. Up to 8."),
|
|
1310
|
+
fallbackModels: z7.array(z7.string().min(1)).min(1).max(5).optional().describe(
|
|
1311
|
+
"Fallback model IDs OpenRouter may try if the primary is unavailable. Up to 5."
|
|
1312
|
+
),
|
|
1313
|
+
debug: z7.boolean().optional().describe(
|
|
1314
|
+
"When true, include upstream finish_reason, provider, error, refusal, and usage."
|
|
1315
|
+
)
|
|
1222
1316
|
},
|
|
1223
1317
|
// No outputSchema: test_model returns a z.union of dry-run and live shapes.
|
|
1224
1318
|
// The SDK supports only ZodRawShape | AnySchema for outputSchema; a discriminated-union
|
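The `createServer` hunk above adds six optional `test_model` inputs: `stream`, `firstTokenTimeoutMs`, `providerSort`, `providerOrder`, `fallbackModels`, and `debug`. The sketch below restates their bounds in plain JavaScript as a reading aid. It is a hypothetical check, not the package's Zod schema. The allowed `providerSort` values are taken from the description string, since `ProviderSortSchema` itself is not shown in this section.

```javascript
// Assumed from the description text; ProviderSortSchema is not shown in the diff.
const PROVIDER_SORTS = ["throughput", "price", "latency"];

// True for an array of 1..maxLength non-empty strings.
function isNonEmptyStringArray(value, maxLength) {
  return (
    Array.isArray(value) &&
    value.length >= 1 &&
    value.length <= maxLength &&
    value.every((s) => typeof s === "string" && s.length >= 1)
  );
}

// Hypothetical helper: returns the names of options that violate the bounds
// stated in the schema. Every option is optional, so undefined always passes.
function checkTestModelOptions(opts) {
  const problems = [];
  if (opts.stream !== undefined && typeof opts.stream !== "boolean") problems.push("stream");
  if (
    opts.firstTokenTimeoutMs !== undefined &&
    !(Number.isInteger(opts.firstTokenTimeoutMs) && opts.firstTokenTimeoutMs >= 1)
  ) {
    problems.push("firstTokenTimeoutMs");
  }
  if (opts.providerSort !== undefined && !PROVIDER_SORTS.includes(opts.providerSort)) {
    problems.push("providerSort");
  }
  if (opts.providerOrder !== undefined && !isNonEmptyStringArray(opts.providerOrder, 8)) {
    problems.push("providerOrder");
  }
  if (opts.fallbackModels !== undefined && !isNonEmptyStringArray(opts.fallbackModels, 5)) {
    problems.push("fallbackModels");
  }
  if (opts.debug !== undefined && typeof opts.debug !== "boolean") problems.push("debug");
  return problems;
}
```

For example, `checkTestModelOptions({ providerOrder: [] })` reports `providerOrder`, because the schema requires at least one provider slug whenever the option is present.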
package/manifest.json
CHANGED
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@index9/mcp",
|
|
3
|
-
"version": "6.
|
|
3
|
+
"version": "6.3.0",
|
|
4
4
|
"license": "MIT",
|
|
5
5
|
"repository": {
|
|
6
6
|
"type": "git",
|
|
@@ -24,11 +24,11 @@
|
|
|
24
24
|
"zod": "^4.4.3"
|
|
25
25
|
},
|
|
26
26
|
"devDependencies": {
|
|
27
|
-
"@types/node": "^25.
|
|
27
|
+
"@types/node": "^25.8.0",
|
|
28
28
|
"tsup": "^8.5.1",
|
|
29
29
|
"typescript": "6.0.3",
|
|
30
|
-
"vitest": "^4.1.
|
|
31
|
-
"@index9/core": "2.
|
|
30
|
+
"vitest": "^4.1.6",
|
|
31
|
+
"@index9/core": "2.6.0"
|
|
32
32
|
},
|
|
33
33
|
"engines": {
|
|
34
34
|
"node": ">=20"
|