@index9/mcp 6.2.0 → 6.4.0
- package/dist/cli.js +152 -94
- package/manifest.json +1 -1
- package/package.json +3 -3
package/dist/cli.js
CHANGED
@@ -219,8 +219,14 @@ Parameters:
 - expectedPromptTokens: Estimated prompt-token count for dryRun cost estimation; overrides the prompt-string heuristic. Use to model "what would N-token requests cost?" without pasting N tokens.
 - expectedCompletionTokens: Optional completion token estimate used by dryRun
 - maxTokens, systemPrompt, temperature, topP, seed, responseFormat, enforceJson, retries: Live-testing controls (ignored when dryRun=true)
+- stream: Use OpenRouter's SSE streaming so capacity/refusal errors surface in ~1s instead of waiting the full per-model timeout for an empty 200. Defaults to false.
+- firstTokenTimeoutMs: Streaming-only deadline for the first delta. Defaults to 10s. If the upstream sends no token within this window, the request aborts and returns failureReason="timeout". Ignored when stream=false.
+- providerSort: "throughput" | "price" | "latency" \u2014 opt-in OpenRouter provider routing. Defaults to OpenRouter's load-balanced choice.
+- providerOrder: ordered list of provider slugs (up to 8). Try these providers first before falling back. Useful for steering around an overloaded provider for a single model.
+- fallbackModels: ordered list of model ids (up to 5). OpenRouter automatically retries the request against the next id when the primary is unavailable. Use sparingly \u2014 a benchmark should usually test the model you asked for, not a substitute.
+- debug: When true, each result includes a \`debug\` field with the raw upstream finish_reason, error message, \`providerName\` (OpenRouter routing/fulfillment provider \u2014 can differ from the publisher, e.g. an anthropic/* model served via Google Vertex), \`modelPublisher\` (derived from the canonical id prefix \u2014 e.g. "anthropic", "x-ai"), refusal, and usage. Use to diagnose "missing assistant text" without re-running.
 
-Results (live): each result carries modelId (the id you passed), resolvedModelId (canonical id, present when the input was an alias), ok, response, latencyMs, tokens { prompt, completion }, cost (USD; live from OpenRouter when available, else estimated from cached pricing), and truncated=true when finish_reason is "length". On failure, results include \`error\` (free-form) plus \`failureReason\` ("insufficient_credits" | "model_unavailable" | "rate_limited" | "timeout" | "invalid_request" | "unknown") so callers can pick a retry strategy without parsing the error string.
+Results (live): each result carries modelId (the id you passed), resolvedModelId (canonical id, present when the input was an alias), ok, response, latencyMs, tokens { prompt, completion }, cost (USD; live from OpenRouter when available, else estimated from cached pricing), and truncated=true when finish_reason is "length". On failure, results include \`error\` (free-form) plus \`failureReason\` ("insufficient_credits" | "model_unavailable" | "rate_limited" | "capacity" | "timeout" | "invalid_request" | "unknown") so callers can pick a retry strategy without parsing the error string. \`capacity\` indicates the provider is overloaded \u2014 apply a longer backoff or set \`fallbackModels\` and retry. When \`debug: true\` is set, each result also carries a \`debug\` block with the upstream provider's diagnostic fields.
 
 Results (dryRun): each entry carries \`tokenCostUsd\`, \`requestCostUsd\`, \`totalCostUsd\` (matches \`estimatedCost\`, includes per-request fees), and \`estimatedCostBasis\` (same enum as compare_models.workloadCosts). Use find_models or get_models first to identify model ids.
 
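The updated tool description says `failureReason` lets callers pick a retry strategy without parsing the error string, and that `capacity` calls for a longer backoff or `fallbackModels`. A minimal sketch of such a caller follows. The helper name `nextStep`, the action labels, and every delay value are assumptions made for illustration; only the enum values and the "longer backoff for capacity" guidance come from the diff.

```javascript
// Hypothetical caller-side helper; not part of @index9/mcp.
// Maps one live test_model result to a suggested next step.
function nextStep(result) {
  if (result.ok) return { action: "done" };
  switch (result.failureReason) {
    case "rate_limited":
      // Assumed short backoff for ordinary rate limiting.
      return { action: "retry", delayMs: 2000 };
    case "capacity":
      // Provider overloaded: back off longer, or retry with fallbackModels set.
      return { action: "retry", delayMs: 15000, considerFallbackModels: true };
    case "timeout":
      return { action: "retry", delayMs: 0 };
    case "model_unavailable":
      return { action: "pick_another_model" };
    case "insufficient_credits":
    case "invalid_request":
      // Retrying the same request is unlikely to help.
      return { action: "stop" };
    default:
      // "unknown", or failureReason missing (the field is optional in the schema).
      return { action: "inspect_error", error: result.error };
  }
}

console.log(nextStep({ ok: false, failureReason: "capacity", error: "provider at capacity" }));
```

Keeping `capacity` and `rate_limited` as separate branches mirrors the reason the diff gives for splitting them: the two conditions warrant different waiting times.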
@@ -239,13 +245,16 @@ var PARAM_DESCRIPTIONS = {
 excludeFree: `When true, exclude models with id ending in ':free'. Useful for sortBy=price (which would otherwise be dominated by free-tier preview models) and when you want a paid SLA. Default false.`,
 requireKeywordMatch: `When true, suppress weak vector-only results from semantic queries. If no candidate has a BM25 keyword hit, returns an empty page with meta.confidence='low' and meta.lowConfidenceReason \u2014 instead of returning misleading nearest-neighbor matches. Filter-only queries (sortBy=created or sortBy=price without q) ignore this flag. Default false.`,
 expectedPromptTokens: `Expected number of prompt tokens for dryRun cost estimation. When set, overrides the heuristic that counts characters from the literal \`prompt\` string \u2014 use this for capacity planning ("what would 6000-token reviews cost?") without pasting filler. If both are omitted, the prompt string is tokenized at ~4 chars/token.`,
-expectedCompletionTokens: `Expected number of completion tokens for cost estimation (default: 256). Typical ranges: 100-500 for quick tests, 1000-2000 for code generation, 4000+ for long-form content. This is a heuristic \u2014 actual billed tokens may differ
+expectedCompletionTokens: `Expected number of completion tokens for cost estimation (default: 256). Typical ranges: 100-500 for quick tests, 1000-2000 for code generation, 4000+ for long-form content. This is a heuristic \u2014 actual billed tokens may differ.`,
+minPrice: `Minimum effective prompt price in USD per million tokens. Matches the basis exposed in \`pricing.effectivePromptPerMillion\` and \`pageInfo.priceSortBasis\` \u2014 token price plus per-request fees scaled to per-million, so a model with $0.30/M tokens and a $0.30 per-request fee evaluates as $0.60/M effective. Models with no resolvable price (zero-token, no per-request fee, non-\`:free\` id) are excluded when this bound is set.`,
+maxPrice: `Maximum effective prompt price in USD per million tokens. Matches the basis exposed in \`pricing.effectivePromptPerMillion\` and \`pageInfo.priceSortBasis\` \u2014 token price plus per-request fees scaled to per-million, so a model with $0.30/M tokens and a $0.30 per-request fee evaluates as $0.60/M effective and is excluded by \`maxPrice: 0.55\`. Models with no resolvable price (zero-token, no per-request fee, non-\`:free\` id) are excluded when this bound is set.`
 };
 var SITE = {
 nav: {
 brand: "index9",
 tools: "Tools",
 howItWorks: "How it works",
+caseStudy: "Case study",
 install: "Install",
 faq: "FAQ",
 github: "GitHub",
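The `expectedPromptTokens` and `expectedCompletionTokens` descriptions in this hunk describe how dry-run token counts are chosen: an explicit prompt-token estimate wins, otherwise the prompt string is counted at roughly 4 characters per token, and completions default to 256. The sketch below restates that rule. The function name is hypothetical, and rounding up with `Math.ceil` is an assumption; the diff only says "~4 chars/token".

```javascript
// Hypothetical restatement of the dry-run token heuristic described above.
// Assumption: "~4 chars/token" is modeled as ceil(length / 4).
function dryRunTokenBudget({ prompt, expectedPromptTokens, expectedCompletionTokens }) {
  const promptTokens =
    expectedPromptTokens !== undefined
      ? expectedPromptTokens // explicit estimate overrides the string heuristic
      : Math.ceil((prompt ?? "").length / 4);
  // 256 is the documented default for completion tokens.
  const completionTokens = expectedCompletionTokens ?? 256;
  return { promptTokens, completionTokens };
}

// Capacity planning without pasting filler text:
console.log(dryRunTokenBudget({ expectedPromptTokens: 6000 }));
// Heuristic path: a 400-character prompt counts as 100 tokens here.
console.log(dryRunTokenBudget({ prompt: "x".repeat(400) }));
```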
@@ -253,26 +262,29 @@ var SITE = {
 installCta: "Install"
 },
 hero: {
-
-
-subtitle: "Index9 is an MCP server. Your coding assistant uses it to search, compare, and live-test 300+ models on the task you're working on, so it recommends the best fit.",
-proof: ["Live OpenRouter data \xB7 300+ models \xB7 refreshed every 30 min"],
+title: "Pick AI models from live data, not your assistant's memory.",
+subtitle: "An MCP server with five tools for Claude Code, Cursor, VS Code, and Codex. Search, compare, cost-model, and live-test 300+ models.",
 pricingNote: "Free. You only pay OpenRouter for live model calls.",
-getStarted: "Add index9 to your editor",
 seeHowItWorks: "See a real session",
-updatedBadge: "OpenRouter data \xB7 refreshed "
+updatedBadge: "OpenRouter data \xB7 refreshed ",
+toolList: [
+{ name: "list_facets", role: "vocabulary" },
+{ name: "find_models", role: "search & filter" },
+{ name: "get_models", role: "full specs" },
+{ name: "compare_models", role: "side-by-side diff" },
+{ name: "test_model", role: "live inference" }
+]
 },
 problem: {
 label: "Why this exists",
 heading: "Your assistant's model knowledge is stale",
 body: [
 'New models ship every week. Pricing changes. "Use GPT-4" or "use Claude 3.5" is usually months behind reality.',
-"Without live data, your assistant defaults to whatever it learned in training
-"Index9 gives it the data and the tools to
+"Without live data, your assistant defaults to whatever it learned in training. Usually a model superseded by something cheaper or better-suited to your task.",
+"Index9 gives it the data, and the tools to compare."
 ]
 },
 howItWorks: {
-label: "How it works",
 heading: "How it works",
 subtitle: "Index9 adds 5 tools to your editor. Your assistant calls them when you ask about models.",
 steps: [
@@ -284,90 +296,95 @@ var SITE = {
 {
 number: "2",
 title: "Your assistant calls index9",
-body: "It searches live model data, compares finalists, and runs your prompt against the top
+body: "It searches live model data, compares finalists, and runs your prompt against the top picks."
 },
 {
 number: "3",
 title: "You get a measured pick",
-body: "Backed by real cost numbers and real outputs
+body: "Backed by real cost numbers and real outputs, not training-data memory."
 }
 ]
 },
 caseStudy: {
-label: "
+label: "Real session",
 heading: "A real session, not a mockup",
-subheading: "
+subheading: "Claude Sonnet 4.6 driven through the index9 MCP server, picking a model for a TypeScript code-review bot. Real prompt, real tool calls, real verdict.",
 prompt: {
 title: "The prompt",
-body: "Pick a model for a TypeScript code-review bot that runs on every PR. I want real quality without paying frontier rates
+body: "Pick a model for a TypeScript code-review bot that runs on every PR. I want real quality without paying frontier rates. Quote model ids verbatim from the tool responses."
 },
 toolCalls: {
 title: "What the assistant did",
-subtitle: "
+subtitle: "4 calls, 42s",
 calls: [
-{ tool: "find_models", params: "newest first", note: "skip stale training picks" },
 {
 tool: "find_models",
-params: "
+params: "sortBy=created, function_calling + structured_output",
+note: "skip stale training picks"
+},
+{
+tool: "find_models",
+params: "code review, maxPrice=$5/M, minContext=64K",
 note: "task fit"
 },
-{ tool: "find_models", params: "max $2/M, every-PR budget", note: "rule out frontier" },
-{ tool: "get_models", params: "8 candidates", note: "metadata lookup" },
 {
 tool: "compare_models",
-params: "4 finalists,
-note: "per-PR cost
+params: "4 finalists, 3,000 prompt + 800 completion",
+note: "per-PR cost diff"
 },
-{ tool: "test_model", params: "dry-run \xD7 4", note: "cost estimate" },
 {
 tool: "test_model",
-params: "
-note: "
+params: "dry-run \xD7 4, same token budget",
+note: "cost confirmation"
 }
 ]
 },
-consideredTitle: "
-consideredSubtitle: "
+consideredTitle: "Candidates the session evaluated",
+consideredSubtitle: "Lifted from the tool responses. Per-PR cost is the dry-run test_model output for a 3,000 token diff and an 800 token review.",
 consideredRows: [
 {
-id: "
-age: "
+id: "anthropic/claude-opus-4.7-fast",
+age: "5d ago",
 decision: "skip",
-reason: "
+reason: "$30/M input, filtered out by the maxPrice constraint on the second find_models call"
 },
 {
-id: "
-age: "
+id: "mistralai/devstral-medium",
+age: "10mo ago",
 decision: "tested",
-reason: "
+reason: "$0.40/M input, completion-heavy at $2/M, $0.0028 per PR"
 },
 {
-id: "
-age: "
+id: "qwen/qwen3-coder",
+age: "10mo ago",
 decision: "tested",
-reason: "
+reason: "$0.22/M input, 1M context but no file_input, $0.0021 per PR"
 },
 {
-id: "
-age: "
+id: "google/gemini-3.1-flash-lite",
+age: "1w ago",
+decision: "tested",
+reason: "$0.25/M input, newest of the shortlist, adds reasoning and vision, $0.00195 per PR"
+},
+{
+id: "mistralai/codestral-2508",
+age: "9mo ago",
 decision: "shortlisted",
-reason: "
+reason: "$0.30/M input, Mistral's coding-specific model, $0.00162 per PR"
 }
 ],
 verdict: {
 title: "The pick",
-model: "
-body: "
-},
-quote: {
-body: "The frontier model would have caught both bugs, at 5\xD7 the cost. The cheapest candidate missed them entirely. Only the live test surfaced the model that did both.",
-attribution: "index9 session trace"
+model: "mistralai/codestral-2508",
+body: "Lowest realistic per-PR cost at $0.00162 on a 3,000 token diff and an 800 token review. Mistral's coding-specific model, structured output, file input for diffs, 256K context. About 100\xD7 cheaper on input than running claude-opus-4.7-fast on every commit."
 }
 },
 toolsSection: {
 label: "Tools",
 heading: "The 5 tools",
 subheading: "Your assistant chains these together. You don't call them directly.",
+keyNotePrefix: "Only",
+keyNoteSuffix: "needs an OpenRouter key. The rest work out of the box.",
 openRouterKey: "OpenRouter API key",
 noKeyRequired: "No key required",
 requiresLabel: "Requires ",
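The per-PR figures in the new case-study rows can be checked by hand for the one candidate whose input and completion prices are both quoted, `mistralai/devstral-medium` ($0.40/M input, $2/M completion, 3,000 prompt + 800 completion tokens). The sketch below assumes plain per-token pricing with no per-request fee; the function name is illustrative and is not the package's cost code.

```javascript
// Illustrative arithmetic only; assumes per-token pricing and no per-request fee.
function perRequestCostUsd({ promptTokens, completionTokens, promptPerMillion, completionPerMillion }) {
  return (promptTokens * promptPerMillion + completionTokens * completionPerMillion) / 1_000_000;
}

const devstral = perRequestCostUsd({
  promptTokens: 3000,
  completionTokens: 800,
  promptPerMillion: 0.4, // "$0.40/M input"
  completionPerMillion: 2, // "completion-heavy at $2/M"
});
console.log(devstral.toFixed(4)); // "0.0028", matching "$0.0028 per PR"

// The "about 100x cheaper on input" claim compares the two quoted input prices:
console.log(Math.round(30 / 0.3)); // 100 ($30/M vs $0.30/M)
```

Input accounts for $0.0012 of that total and completion for $0.0016, which is why the row calls the model completion-heavy.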
@@ -404,7 +421,7 @@ var SITE = {
 action: "compare_models",
 displayName: "compare_models",
 fullName: null,
-description: "Diffs 2\u201310 finalists side-by-side. Flags the cheapest
+description: "Diffs 2\u201310 finalists side-by-side. Flags the cheapest for your token mix.",
 badge: null,
 requiresKey: false
 },
@@ -413,7 +430,7 @@ var SITE = {
 action: "test_model",
 displayName: "test_model",
 fullName: null,
-description: "Runs your prompt across models. Returns output, latency,
+description: "Runs your prompt across models. Returns output, latency, cost. Dry-run for cost only.",
 badge: "Live",
 requiresKey: true
 }
@@ -438,7 +455,7 @@ var SITE = {
 },
 {
 question: "Does it pick the model for me?",
-answer: "No
+answer: "No. It gives your assistant the data: search results, specs, cost diffs, live test outputs. Your assistant makes the call.",
 link: null
 },
 {
@@ -448,7 +465,7 @@ var SITE = {
 },
 {
 question: "Which models?",
-answer: `${MODEL_COUNT} from OpenRouter
+answer: `${MODEL_COUNT} from OpenRouter: OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more. Metadata refreshes every 30 minutes.`,
 link: null
 },
 {
@@ -458,7 +475,7 @@ var SITE = {
 },
 {
 question: "What's the project status?",
-answer: "Stable
+answer: "Stable. Issues and feature requests on GitHub.",
 link: null
 }
 ]
@@ -528,32 +545,9 @@ var SITE = {
 }
 };
 var README = {
-tagline: `Landing page, API, and MCP server for discovering, shortlisting, comparing, cost-modeling, and live-testing ${MODEL_COUNT} AI models.`,
 mcpDescription: `Discover, shortlist, compare, cost-model, and live-test ${MODEL_COUNT} AI models from your editor`,
-monorepoLayout: {
-appsWeb: "apps/web \u2014 Next.js 16 app (UI + API routes)",
-packagesCore: "packages/core \u2014 Shared Zod schemas, types, constants (@index9/core)",
-packagesMcp: "packages/mcp \u2014 Thin MCP stdio server calling the hosted API (@index9/mcp)"
-},
-quickStart: {
-install: "pnpm install",
-build: "pnpm build",
-test: "pnpm test",
-dev: "pnpm dev # run web app"
-},
-envNote: "Copy apps/web/.env.example to apps/web/.env.local and fill in values for local development.",
 mcpInstall: {
-
-envNote: "Optional: set OPENROUTER_API_KEY in your MCP client config for live test_model calls. dryRun=true works without a key.",
-claudeCode: "Claude Code: Run `claude mcp add --transport stdio index9 -- npx -y @index9/mcp` or add the same config to .mcp.json / ~/.claude.json."
-},
-release: {
-step1: "Make changes in packages/mcp (core is internal, bundled into mcp)",
-step2: "Run pnpm changeset \u2014 add a changeset, select packages, choose bump type",
-step3: "Commit and push; open PR to main",
-step4: "Merge the PR; CI creates a Version Packages PR when changesets exist",
-step5: "Merge the version PR; CI publishes to npm and creates a GitHub Release with the .mcpb artifact attached",
-step6: "Users can install via npx @index9/mcp@latest or download .mcpb from Releases"
+envNote: "Optional: set OPENROUTER_API_KEY in your MCP client config for live test_model calls. dryRun=true works without a key."
 }
 };
 
@@ -762,6 +756,7 @@ import { z as z6 } from "zod";
 var ResponseFormatSchema = z6.object({
 type: z6.string().min(1)
 }).catchall(z6.unknown()).optional();
+var ProviderSortSchema = z6.enum(["throughput", "price", "latency"]);
 var TestRequestSchema = z6.object({
 prompt: z6.string().min(1).optional(),
 userContent: z6.array(UserContentPartSchema).min(1).optional(),
@@ -777,7 +772,30 @@ var TestRequestSchema = z6.object({
 seed: z6.number().int().optional(),
 responseFormat: ResponseFormatSchema,
 enforceJson: z6.boolean().optional(),
-retries: z6.number().int().min(0).max(3).optional()
+retries: z6.number().int().min(0).max(3).optional(),
+// Use OpenRouter's SSE streaming endpoint so capacity/refusal errors
+// surface in ~1s instead of waiting the full per-model timeout for an
+// empty 200 OK. Cost/tokens are still returned via stream_options.
+stream: z6.boolean().optional(),
+// First-token deadline (streaming only). If the upstream sends no
+// delta within this window, abort the request. Defaults to 10s when
+// streaming. Ignored when stream=false.
+firstTokenTimeoutMs: z6.number().int().positive().optional(),
+// Forwards as `provider.sort` to OpenRouter — opt into routing toward
+// higher-throughput providers when running benchmarks.
+providerSort: ProviderSortSchema.optional(),
+// Forwards as `provider.order` — try these provider slugs first in the
+// given order before falling back. Capped to stay within reasonable
+// limits and prevent abuse.
+providerOrder: z6.array(z6.string().min(1)).min(1).max(8).optional(),
+// Forwards as the top-level `models` array (NOT `model`). OpenRouter
+// tries each in order if the primary is unavailable. Different intent
+// from providerOrder, which routes within a single model.
+fallbackModels: z6.array(z6.string().min(1)).min(1).max(5).optional(),
+// When true, attach a `debug` field on each result with the raw
+// upstream finish_reason, error message, provider name, refusal, and
+// usage. Used to diagnose "missing assistant text" without re-running.
+debug: z6.boolean().optional()
 }).strict().superRefine((data, ctx) => {
 if (data.dryRun === true) {
 if (!data.prompt && data.expectedPromptTokens === void 0) {
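The new `TestRequestSchema` fields carry a few hard limits: `providerSort` must be one of three values, `providerOrder` takes 1 to 8 non-empty strings, `fallbackModels` takes 1 to 5, and `firstTokenTimeoutMs` must be a positive integer. The sketch below restates those limits as a dependency-free pre-flight check. `checkRoutingOptions` and the placeholder slugs are hypothetical; the package itself validates with zod, and this does not reproduce its error messages.

```javascript
// Hypothetical pre-flight check mirroring the limits in the hunk above.
// Not the package's validator, which is the zod schema shown in the diff.
const PROVIDER_SORTS = ["throughput", "price", "latency"];

function isStringList(value, max) {
  return (
    Array.isArray(value) &&
    value.length >= 1 &&
    value.length <= max &&
    value.every((s) => typeof s === "string" && s.length >= 1)
  );
}

function checkRoutingOptions(opts) {
  const problems = [];
  if (opts.providerSort !== undefined && !PROVIDER_SORTS.includes(opts.providerSort)) {
    problems.push("providerSort must be throughput, price, or latency");
  }
  if (opts.providerOrder !== undefined && !isStringList(opts.providerOrder, 8)) {
    problems.push("providerOrder must hold 1 to 8 non-empty strings");
  }
  if (opts.fallbackModels !== undefined && !isStringList(opts.fallbackModels, 5)) {
    problems.push("fallbackModels must hold 1 to 5 non-empty strings");
  }
  const t = opts.firstTokenTimeoutMs;
  if (t !== undefined && !(Number.isInteger(t) && t > 0)) {
    problems.push("firstTokenTimeoutMs must be a positive integer");
  }
  return problems;
}

// "provider-a" and "provider-b" are placeholder slugs, not real provider names.
console.log(checkRoutingOptions({ stream: true, providerOrder: ["provider-a", "provider-b"] }));
console.log(checkRoutingOptions({ fallbackModels: [] }));
```

Note that an empty array is rejected rather than treated as "not set", since both list fields require at least one entry.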
@@ -812,10 +830,28 @@ var TestFailureReasonSchema = z6.enum([
 "insufficient_credits",
 "model_unavailable",
 "rate_limited",
+// Provider is overloaded / "at capacity" / "provisioned throughput
+// required". A distinct reason from rate_limited so callers can apply
+// a longer backoff or route to a fallback model.
+"capacity",
 "timeout",
 "invalid_request",
 "unknown"
 ]);
+var TestDebugInfoSchema = z6.object({
+upstreamId: z6.string().optional(),
+providerName: z6.string().optional(),
+modelPublisher: z6.string().optional(),
+finishReason: z6.string().optional(),
+upstreamError: z6.string().optional(),
+refusal: z6.string().optional(),
+hasToolCalls: z6.boolean().optional(),
+usage: z6.object({
+promptTokens: z6.number().optional(),
+completionTokens: z6.number().optional(),
+totalTokens: z6.number().optional()
+}).optional()
+});
 var TestModelMetadataSchema = z6.object({
 id: z6.string(),
 name: z6.string(),
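Every field in `TestDebugInfoSchema` is optional, so a consumer of the `debug` block has to tolerate gaps. The sketch below shows one way to turn such a block into a single diagnostic line, and how a publisher could be derived from a canonical id. Both helper names are hypothetical, and reading "canonical id prefix" as "the text before the first slash" is an assumption based on the examples in the tool description.

```javascript
// Hypothetical helpers for reading a `debug` block; not part of @index9/mcp.

// Assumption: the publisher is whatever precedes the first "/" in the id.
function publisherFromId(canonicalId) {
  const slash = canonicalId.indexOf("/");
  return slash === -1 ? undefined : canonicalId.slice(0, slash);
}

// Every debug field is optional, so each one is included only when present.
function describeDebug(debug) {
  if (!debug) return "no debug info (was debug: true set?)";
  const parts = [];
  if (debug.modelPublisher) parts.push(`publisher=${debug.modelPublisher}`);
  if (debug.providerName) parts.push(`servedBy=${debug.providerName}`);
  if (debug.finishReason) parts.push(`finish=${debug.finishReason}`);
  if (debug.refusal) parts.push(`refusal=${debug.refusal}`);
  if (debug.upstreamError) parts.push(`error=${debug.upstreamError}`);
  if (debug.usage?.totalTokens !== undefined) parts.push(`tokens=${debug.usage.totalTokens}`);
  return parts.length > 0 ? parts.join(" ") : "debug block present but empty";
}

console.log(publisherFromId("anthropic/claude-opus-4.7-fast")); // "anthropic"
console.log(describeDebug({ modelPublisher: "anthropic", providerName: "Google Vertex", finishReason: "stop" }));
```

Printing publisher and serving provider side by side makes the case from the description visible at a glance: an `anthropic/*` model fulfilled by a different provider.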
@@ -832,7 +868,8 @@ var TestResultSuccessSchema = z6.object({
 latencyMs: z6.number().min(0),
 tokens: UsageTokensSchema,
 cost: z6.number().nullable().optional(),
-truncated: z6.boolean().optional()
+truncated: z6.boolean().optional(),
+debug: TestDebugInfoSchema.optional()
 });
 var TestResultFailureSchema = z6.object({
 modelId: z6.string(),
@@ -841,7 +878,8 @@ var TestResultFailureSchema = z6.object({
 model: TestModelMetadataSchema,
 error: z6.string(),
 failureReason: TestFailureReasonSchema.optional(),
-latencyMs: z6.number().min(0)
+latencyMs: z6.number().min(0),
+debug: TestDebugInfoSchema.optional()
 });
 var TestResultSchema = z6.discriminatedUnion("ok", [
 TestResultSuccessSchema,
@@ -902,8 +940,8 @@ function loadConfig() {
 }
 
 // src/client.ts
-var
-var
+var DEFAULT_RETRY_DELAYS_MS = [1e3, 2e3, 4e3];
+var DEFAULT_ATTEMPT_TIMEOUT_MS = 3e4;
 function isRetryable(status) {
 return status === 429 || status >= 500;
 }
@@ -919,14 +957,17 @@ function toErrorMessage(error) {
 if (error instanceof Error && error.message.trim()) return error.message;
 return "Unknown error";
 }
-async function fetchWithRetry(url, options) {
+async function fetchWithRetry(url, options, retryOptions) {
+const attemptTimeoutMs = retryOptions?.attemptTimeoutMs ?? DEFAULT_ATTEMPT_TIMEOUT_MS;
+const maxRetries = Math.max(0, retryOptions?.maxRetries ?? DEFAULT_RETRY_DELAYS_MS.length);
+const retryDelaysMs = DEFAULT_RETRY_DELAYS_MS.slice(0, maxRetries);
 let lastResponse = null;
 let lastError;
-for (let i = 0; i <=
+for (let i = 0; i <= maxRetries; i++) {
 const timeoutController = new AbortController();
 const timeoutId = setTimeout(() => {
 timeoutController.abort(new DOMException("Request timed out", "AbortError"));
-},
+}, attemptTimeoutMs);
 const externalSignal = options.signal;
 const onAbort = () => {
 timeoutController.abort(
@@ -951,14 +992,12 @@ async function fetchWithRetry(url, options) {
 clearTimeout(timeoutId);
 externalSignal?.removeEventListener("abort", onAbort);
 }
-if (i <
-await sleep(
+if (i < retryDelaysMs.length) {
+await sleep(retryDelaysMs[i]);
 }
 }
 if (lastResponse) return lastResponse;
-throw new Error(
-`Request failed after ${RETRY_DELAYS_MS.length + 1} attempts: ${toErrorMessage(lastError)}`
-);
+throw new Error(`Request failed after ${maxRetries + 1} attempts: ${toErrorMessage(lastError)}`);
 }
 function buildUrl(baseUrl, path, params) {
 const url = new URL(path, baseUrl);
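The reworked `fetchWithRetry` derives its attempt count and sleep schedule from `retryOptions.maxRetries`. The sketch below isolates that arithmetic using the same expressions as the added lines, so the effect of a given setting can be read off without running a request. `retryPlan` is a hypothetical helper, and the attempt count is an upper bound: the lines that return early on success are not part of this diff.

```javascript
// Isolates the attempt/delay arithmetic from the added lines above.
// retryPlan is a hypothetical helper; it performs no network calls.
const DEFAULT_RETRY_DELAYS_MS = [1000, 2000, 4000];

function retryPlan(retryOptions) {
  const maxRetries = Math.max(0, retryOptions?.maxRetries ?? DEFAULT_RETRY_DELAYS_MS.length);
  const delaysMs = DEFAULT_RETRY_DELAYS_MS.slice(0, maxRetries);
  // The loop runs for i = 0..maxRetries, so at most maxRetries + 1 attempts.
  return { maxAttempts: maxRetries + 1, delaysMs };
}

console.log(retryPlan()); // { maxAttempts: 4, delaysMs: [ 1000, 2000, 4000 ] }
console.log(retryPlan({ maxRetries: 0 })); // { maxAttempts: 1, delaysMs: [] }
console.log(retryPlan({ maxRetries: 5 })); // { maxAttempts: 6, delaysMs: [ 1000, 2000, 4000 ] }
```

Two consequences follow from the visible lines. With `maxRetries: 0`, as passed for `test_model`, a request is attempted exactly once. With a value above 3, the delay list is still capped at three entries, so any retries past the fourth attempt would appear to run without a sleep in between.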
@@ -1037,8 +1076,8 @@ function extractRecoveryFields(body) {
 }
 return out;
 }
-async function callApi(ctx, url, options, responseSchema) {
-const res = await fetchWithRetry(url, options);
+async function callApi(ctx, url, options, responseSchema, retryOptions) {
+const res = await fetchWithRetry(url, options, retryOptions);
 let body;
 try {
 body = await res.json();
@@ -1153,7 +1192,12 @@ async function handleTestModels(ctx, args) {
 ctx,
 `${ctx.baseUrl}${API_PATHS.test}`,
 { method: "POST", headers: reqHeaders, body: JSON.stringify(parsed.data) },
-TestResponseSchema
+TestResponseSchema,
+// Live inference is non-idempotent and slow: each retry costs real money
+// and the server-side per-model retry/backoff already handles transient
+// errors. Give the call enough wall-clock to cover a worst-case 10-model
+// batch × 60s per model and let the server decide on retries.
+{ attemptTimeoutMs: 24e4, maxRetries: 0 }
 );
 }
 
@@ -1177,8 +1221,8 @@ async function createServer() {
 sortOrder: z7.enum(["asc", "desc"]).optional().describe("Sort order. Defaults by sortBy."),
 createdAfter: z7.string().optional().describe("Lower bound for model created timestamp."),
 createdBefore: z7.string().optional().describe("Upper bound for model created timestamp."),
-minPrice: z7.number().min(0).optional().describe(
-maxPrice: z7.number().min(0).optional().describe(
+minPrice: z7.number().min(0).optional().describe(PARAM_DESCRIPTIONS.minPrice),
+maxPrice: z7.number().min(0).optional().describe(PARAM_DESCRIPTIONS.maxPrice),
 minContext: z7.number().int().min(1).optional().describe("Minimum context window in tokens."),
 capabilitiesAll: z7.array(z7.enum(CAPABILITIES)).optional().describe(PARAM_DESCRIPTIONS.capabilitiesAll),
 capabilitiesAny: z7.array(z7.enum(CAPABILITIES)).optional().describe(PARAM_DESCRIPTIONS.capabilitiesAny),
@@ -1264,7 +1308,21 @@ async function createServer() {
 "Structured output shape request forwarded to OpenRouter (e.g., { type: 'json_object' })."
 ),
 enforceJson: z7.boolean().optional().describe("When true, output must parse as JSON."),
-retries: z7.number().int().min(0).max(3).optional().describe("Retries for transient failures.")
+retries: z7.number().int().min(0).max(3).optional().describe("Retries for transient failures."),
+stream: z7.boolean().optional().describe(
+"Use OpenRouter SSE streaming so capacity/refusal errors surface quickly. Defaults to false."
+),
+firstTokenTimeoutMs: z7.number().int().min(1).optional().describe("Streaming-only first-token deadline in ms. Defaults to 10000."),
+providerSort: ProviderSortSchema.optional().describe(
+'OpenRouter provider routing sort: "throughput", "price", or "latency".'
+),
+providerOrder: z7.array(z7.string().min(1)).min(1).max(8).optional().describe("Provider slugs to try first, in order. Up to 8."),
+fallbackModels: z7.array(z7.string().min(1)).min(1).max(5).optional().describe(
+"Fallback model IDs OpenRouter may try if the primary is unavailable. Up to 5."
+),
+debug: z7.boolean().optional().describe(
+"When true, include upstream finish_reason, provider, error, refusal, and usage."
+)
 },
 // No outputSchema: test_model returns a z.union of dry-run and live shapes.
 // The SDK supports only ZodRawShape | AnySchema for outputSchema; a discriminated-union
package/manifest.json
CHANGED
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "@index9/mcp",
-"version": "6.
+"version": "6.4.0",
 "license": "MIT",
 "repository": {
 "type": "git",
@@ -24,11 +24,11 @@
 "zod": "^4.4.3"
 },
 "devDependencies": {
-"@types/node": "^25.
+"@types/node": "^25.8.0",
 "tsup": "^8.5.1",
 "typescript": "6.0.3",
 "vitest": "^4.1.6",
-"@index9/core": "2.
+"@index9/core": "2.7.0"
 },
 "engines": {
 "node": ">=20"