npm - @xynogen/pix-data - Versions diffs - 0.1.2 → 0.2.0 - Mend

@xynogen/pix-data 0.1.2 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5) hide show

package/README.md CHANGED Viewed

@@ -1,20 +1,101 @@
 # pix-data
-Pi coding agent extension — shared model data layer. Fetches and caches [models.dev](https://models.dev) metadata and [BenchLM](https://benchlm.ai) leaderboard data to `~/.cache/pi/` on session start, so other extensions can read them synchronously without redundant network calls.
+Pi coding agent extension — shared model data layer. Fetches and caches the
+[modelgrep](https://modelgrep.com) model catalog to `~/.cache/pi/` on session
+start, so other extensions (model picker, footer, subagent resolver) can read
+context window, pricing, and a coding-focused score/rank synchronously without
+redundant network calls.
+## Data source
+All data comes from a **single source**: [modelgrep.com](https://modelgrep.com)
+(`/api/v1/models?benchmarked=1&sort=coding`). Free, no API key, ~190 benchmarked
+models with real model ids. modelgrep aggregates benchmark numbers from
+[Artificial Analysis](https://artificialanalysis.ai).
+- **Context window + pricing** — taken verbatim from modelgrep.
+- **Score** — computed locally from the raw benchmark fields (see below).
+- **Rank** — the model's position once the whole catalog is sorted by that score
+  (best = `#1`). Unscored models sink to the bottom.
+Cached 24h → `~/.cache/pi/modelgrep.json`. On outage the stale cache keeps the
+picker working until it can refresh.
+## Scoring methodology
+**Primary score = [Artificial Analysis Intelligence Index](https://artificialanalysis.ai/methodology/intelligence-benchmarking)**
+when available — AA's authoritative composite of 9 independent evals (agents,
+coding, scientific reasoning, general), already weighted toward agentic work.
+It is rescaled to 0–100 (`intelligence / 65 × 100`; the current leader scores
+~65).
+**Fallback = a coding-and-agentic heuristic** for the ~84% of models AA has not
+index-scored, computed from the raw benchmarks below, then mapped onto the index
+scale by a least-squares line. Both the heuristic weights *and* the line were
+jointly tuned against the index on the models that carry *both* it and the raw
+benches (`index100 ≈ 120.6·heuristic − 10.6`, deduped n=29, R²=0.901,
+leave-one-out RMSE 6.55pt) — a data calibration, not a guessed penalty. The
+picker exists to choose a model *for coding work in an agent*, so the heuristic
+is weighted toward exactly that:
+| bench | range | measures |
+|---|---|---|
+| `coding` | 0–100 | code generation index |
+| `scicode` | 0–1 | scientific coding |
+| `tau2` | 0–1 | agentic tool-use |
+| `agentic` | 0–100 | agentic index |
+| `gpqa` | 0–1 | graduate-level reasoning |
+| `hle` | 0–1 | hard-exam reasoning |
+When the index is absent, three sub-scores combine, each a weighted blend of
+its benches (all normalized to 0–1):
+```
+coding_score    = 0.60·(coding/100) + 0.40·scicode
+agentic_score   = 0.70·tau2         + 0.30·(agentic/100)
+reasoning_score = 0.60·gpqa         + 0.40·hle
+heuristic = 0.30·coding_score + 0.60·agentic_score + 0.10·reasoning_score
+score     = round(clamp₀₁₀₀(120.6·heuristic − 10.6))   // fitted to the index
+```
+**Why a heuristic at all, and why these raw evals only:** the AA Intelligence
+Index *is* the ideal number — but only ~16% of the catalog has it. For the rest
+we rebuild a comparable score from the same family of raw evals. Crucially we
+use each raw eval **once** and never feed `intelligence` *and* its components
+together, nor any `_pct` field (which is just a percentile-rank of a raw field)
+— doing so would double-count the same measurement and silently inflate weights
+you can't see. Independent inputs only → honest weighted average.
+**Why these weights:** an agentic coding model lives or dies on *tool-calling*
+and *code generation*, so `agentic_score` (0.60) and `coding_score` (0.30)
+carry the score; pure reasoning (0.10) is a tiebreaker, not the headline. The
+split is not arbitrary — a grid search over weight combinations, scored by how
+well the heuristic predicts the AA index (leave-one-out cross-validation),
+landed on this agentic-heavy mix. Within each group the dominant bench (`tau2`
+for agentic, raw `coding`, `gpqa`) carries most of the weight and a secondary
+bench refines it.
+**Missing benchmarks:** every blend renormalizes over the fields actually
+present, so a model missing one bench is diluted only *within its own group* —
+it is never zero-penalized or dropped. A model with no benchmarks at all gets a
+`null` score (shown as a bare row) and sorts to the bottom.
+The exact implementation is `codingScore()` in
+[`src/data.ts`](src/data.ts); the weights are intentionally easy to tune in one
+place if your priorities differ.
 ## What's included
 | Export | Description |
 |---|---|
-| `modelsDev` | `DataSource<ModelsDevApi>` — models.dev metadata (context, cost, modalities). TTL 24h → `~/.cache/pi/models.json` |
-| `benchmark` | `DataSource<BenchmarkEntry[]>` — BenchLM leaderboard (rank, score, pricing). TTL 24h → `~/.cache/pi/benchlm.json` |
+| `modelgrep` | `DataSource<ModelGrepModel[]>` — the catalog. TTL 24h → `~/.cache/pi/modelgrep.json` |
 | `DataSource` | Generic cached data source class |
 | `CACHE_DIR` | Resolved cache directory (`~/.cache/pi`) |
-| `buildModelsDevIndex` | Build a lookup `Map` from a `ModelsDevApi` response |
-| `lookupInIndex` | Fuzzy-match a router model id against the index |
+| `buildModelsDevIndex` | Build a lookup `Map` from the catalog (context/cost/modalities) |
+| `lookupInIndex` | Fuzzy-match a router model id against an index |
 | `lookupModelsDev` | Sync lookup by provider + id from in-memory cache |
-| `lookupBenchmark` | Fuzzy lookup a model by name from BenchLM cache |
-| `fetchModelsDevIndex` | Async — fetch models.dev and return built index |
+| `lookupBenchmark` | Sync lookup a model by id — returns score + rank + pricing |
 ## Install
@@ -24,10 +105,15 @@ pi install npm:@xynogen/pix-data
 ## How it works
-On session start the extension fires two parallel background fetches (`modelsDev.get()` + `benchmark.get()`). If the cache is fresh the fetches are skipped. Both cache files live in `~/.cache/pi/` — any Pi extension using the same `DataSource` + cache paths will share data automatically.
+On session start the extension fires a background fetch (`modelgrep.get()`),
+paginating the API until the full benchmarked catalog is retrieved. If the cache
+is fresh the fetch is skipped. The cache file lives in `~/.cache/pi/` — any Pi
+extension using the same `DataSource` shares it automatically.
 ## Full distro
+Source: [github.com/xynogen/pix-mono](https://github.com/xynogen/pix-mono)
 To install the complete pix suite (all packages + Pi itself):
 ```bash

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
 	"name": "@xynogen/pix-data",
-	"version": "0.1.2",
+	"version": "0.2.0",
 	"description": "Pi extension — shared model data layer (models.dev + BenchLM), cached at ~/.cache/pi",
 	"type": "module",
 	"main": "src/index.ts",

package/src/data.test.ts CHANGED Viewed

@@ -1,107 +1,102 @@
 import { afterEach, beforeEach, describe, expect, it } from "bun:test";
 import {
-	type BenchmarkEntry,
-	benchmark,
 	buildModelsDevIndex,
 	lookupBenchmark,
 	lookupInIndex,
 	lookupModelsDev,
-	type ModelsDevApi,
-	type ModelsDevModel,
-	modelsDev,
+	type ModelGrepModel,
+	modelgrep,
 } from "./data.ts";
+// Compact modelgrep-shaped fixture builder.
+function mg(
+	id: string,
+	opts: {
+		name?: string;
+		ctx?: number;
+		in?: number;
+		out?: number;
+		reasoning?: boolean;
+		input?: string[];
+		// Raw benchmark inputs to codingScore. intelligence (~0–65) wins when
+		// present; otherwise coding/agentic (0–100) + rest (0–1) feed the heuristic.
+		bench?: {
+			intelligence?: number;
+			coding?: number;
+			agentic?: number;
+			gpqa?: number;
+			scicode?: number;
+			tau2?: number;
+			hle?: number;
+		};
+	} = {},
+): ModelGrepModel {
+	return {
+		id,
+		name: opts.name ?? id,
+		context_length: opts.ctx,
+		pricing: { input: opts.in, output: opts.out },
+		modality: { input: opts.input },
+		capabilities: { reasoning: opts.reasoning },
+		benchmarks: { artificial_analysis: { ...opts.bench } },
+	};
+}
 // ── buildModelsDevIndex ──────────────────────────────────────────────────────
 describe("buildModelsDevIndex", () => {
-	const api: ModelsDevApi = {
-		anthropic: {
-			models: {
-				"claude-sonnet-4-5": {
-					id: "claude-sonnet-4-5",
-					name: "Claude Sonnet 4.5",
-				},
-				"claude-opus-4": {
-					id: "claude-opus-4",
-					name: "Claude Opus 4",
-					reasoning: true,
-				},
-			},
-		},
-		openai: {
-			models: {
-				"gpt-4o": {
-					id: "gpt-4o",
-					name: "GPT-4o",
-					modalities: { input: ["text", "image"] },
-				},
-			},
-		},
-	};
+	const catalog: ModelGrepModel[] = [
+		mg("anthropic/claude-sonnet-4-5", { name: "Claude Sonnet 4.5" }),
+		mg("anthropic/claude-opus-4", { name: "Claude Opus 4", reasoning: true }),
+		mg("openai/gpt-4o", { name: "GPT-4o", input: ["text", "image"] }),
+	];
-	it("indexes all models by exact id", () => {
-		const idx = buildModelsDevIndex(api);
+	it("indexes all models by slug", () => {
+		const idx = buildModelsDevIndex(catalog);
 		expect(idx.has("claude-sonnet-4-5")).toBe(true);
 		expect(idx.has("claude-opus-4")).toBe(true);
 		expect(idx.has("gpt-4o")).toBe(true);
 	});
-	it("indexes normalized id (strip date suffix)", () => {
-		const a: ModelsDevApi = {
-			anthropic: {
-				models: {
-					"claude-sonnet-4-5-20250514": {
-						id: "claude-sonnet-4-5-20250514",
-						name: "Claude Sonnet 4.5",
-					},
-				},
-			},
-		};
-		const idx = buildModelsDevIndex(a);
+	it("indexes normalized slug (strip date suffix)", () => {
+		const idx = buildModelsDevIndex([
+			mg("anthropic/claude-sonnet-4-5-20250514", { name: "Claude Sonnet 4.5" }),
+		]);
 		expect(idx.has("claude-sonnet-4-5")).toBe(true);
 	});
-	it("handles empty api", () => {
-		expect(buildModelsDevIndex({}).size).toBe(0);
+	it("handles empty catalog", () => {
+		expect(buildModelsDevIndex([]).size).toBe(0);
 	});
-	it("handles provider with no models key", () => {
-		expect(buildModelsDevIndex({ anthropic: {} }).size).toBe(0);
+	it("maps fields onto ModelsDevModel shape", () => {
+		const m = buildModelsDevIndex([
+			mg("openai/gpt-4o", { ctx: 128000, in: 5, out: 15, input: ["text"] }),
+		]).get("gpt-4o");
+		expect(m?.limit?.context).toBe(128000);
+		expect(m?.cost?.input).toBe(5);
+		expect(m?.cost?.output).toBe(15);
+		expect(m?.modalities?.input).toEqual(["text"]);
 	});
-	it("preserves first-seen on id collision", () => {
-		const a: ModelsDevApi = {
-			a: { models: { "gpt-4o": { id: "gpt-4o", name: "First" } } },
-			b: { models: { "gpt-4o": { id: "gpt-4o", name: "Second" } } },
-		};
-		expect(buildModelsDevIndex(a).get("gpt-4o")?.name).toBe("First");
+	it("preserves first-seen on slug collision", () => {
+		const idx = buildModelsDevIndex([
+			mg("a/gpt-4o", { name: "First" }),
+			mg("b/gpt-4o", { name: "Second" }),
+		]);
+		expect(idx.get("gpt-4o")?.name).toBe("First");
 	});
 });
 // ── lookupInIndex ────────────────────────────────────────────────────────────
 describe("lookupInIndex", () => {
-	let index: Map<string, ModelsDevModel>;
-	beforeEach(() => {
-		index = buildModelsDevIndex({
-			anthropic: {
-				models: {
-					"claude-sonnet-4-5": {
-						id: "claude-sonnet-4-5",
-						name: "Claude Sonnet 4.5",
-					},
-					"claude-opus-4": { id: "claude-opus-4", name: "Claude Opus 4" },
-				},
-			},
-			openai: {
-				models: {
-					"gpt-4o": { id: "gpt-4o", name: "GPT-4o" },
-					"o3-mini": { id: "o3-mini", name: "o3 mini" },
-				},
-			},
-		});
-	});
+	const index = buildModelsDevIndex([
+		mg("anthropic/claude-sonnet-4-5", { name: "Claude Sonnet 4.5" }),
+		mg("anthropic/claude-opus-4", { name: "Claude Opus 4" }),
+		mg("openai/gpt-4o", { name: "GPT-4o" }),
+		mg("openai/o3-mini", { name: "o3 mini" }),
+	]);
 	it("finds exact match", () => {
 		expect(lookupInIndex("claude-sonnet-4-5", index)?.name).toBe(
@@ -142,108 +137,93 @@ describe("lookupInIndex", () => {
 	});
 });
-// ── lookupModelsDev ───────────────────────────────────────────────────────────
+// ── modelgrep adapters (lookupModelsDev + lookupBenchmark) ────────────────────
+describe("modelgrep adapters", () => {
+	const catalog: ModelGrepModel[] = [
+		mg("anthropic/claude-haiku-4.5", {
+			name: "Anthropic: Claude Haiku 4.5",
+			ctx: 200000,
+			in: 1,
+			out: 5,
+			input: ["text", "image"],
+			bench: {
+				coding: 43.9,
+				agentic: 16.4,
+				gpqa: 0.672,
+				scicode: 0.433,
+				tau2: 0.547,
+				hle: 0.097,
+			},
+		}),
+		mg("tencent/hy3-preview", {
+			name: "Tencent: hy3 preview",
+			ctx: 256000,
+			in: 0,
+			out: 0,
+			reasoning: true,
+			// coding/agentic absent — only raw benches → renormalized over present
+			bench: { gpqa: 0.732, scicode: 0.394, tau2: 0.675, hle: 0.063 },
+		}),
+		mg("ghost/unbenched", { name: "Ghost" }), // no signal at all
+	];
-describe("lookupModelsDev", () => {
 	beforeEach(() => {
-		// Seed in-memory cache directly
-		(modelsDev as any)._mem = {
-			anthropic: {
-				models: {
-					"claude-sonnet-4-5": {
-						id: "claude-sonnet-4-5",
-						name: "Claude Sonnet 4.5",
-					},
-				},
-			},
-			openai: {
-				models: {
-					"gpt-4o": { id: "gpt-4o", name: "GPT-4o" },
-				},
-			},
-		};
+		(modelgrep as unknown as { _mem: ModelGrepModel[] })._mem = catalog;
 	});
 	afterEach(() => {
-		(modelsDev as any)._mem = null;
+		(modelgrep as unknown as { _mem: ModelGrepModel[] | null })._mem = null;
 	});
-	it("finds by exact provider + id", () => {
-		expect(lookupModelsDev("anthropic", "claude-sonnet-4-5")?.name).toBe(
-			"Claude Sonnet 4.5",
-		);
+	it("lookupModelsDev finds haiku via slug, ignoring routing prefix + date", () => {
+		const m = lookupModelsDev("cc", "claude-haiku-4-5-20251001");
+		expect(m?.limit?.context).toBe(200000);
+		expect(m?.cost?.input).toBe(1);
 	});
-	it("falls back across providers when provider miss", () => {
-		expect(lookupModelsDev("unknown-provider", "gpt-4o")?.name).toBe("GPT-4o");
-	});
-	it("strips path prefix from id", () => {
+	it("lookupModelsDev finds hy3 via prefix + suffix strip", () => {
 		expect(
-			lookupModelsDev("anthropic", "anthropic/claude-sonnet-4-5")?.name,
-		).toBe("Claude Sonnet 4.5");
-	});
-	it("returns undefined for unknown model", () => {
-		expect(lookupModelsDev("anthropic", "nonexistent-xyz")).toBeUndefined();
+			lookupModelsDev("openrouter", "tencent/hy3-preview:nitro")?.limit
+				?.context,
+		).toBe(256000);
 	});
-});
-// ── lookupBenchmark ───────────────────────────────────────────────────────────
-describe("lookupBenchmark", () => {
-	const entries: BenchmarkEntry[] = [
-		{
-			rank: 1,
-			model: "Claude Sonnet 4.5",
-			creator: "Anthropic",
-			overallScore: 95,
-			inputPrice: 3,
-			outputPrice: 15,
-		},
-		{
-			rank: 2,
-			model: "GPT-4o",
-			creator: "OpenAI",
-			overallScore: 90,
-			inputPrice: 5,
-			outputPrice: 15,
-		},
-		{
-			rank: 3,
-			model: "Gemini 1.5 Pro",
-			creator: "Google",
-			overallScore: 88,
-			inputPrice: 3.5,
-			outputPrice: 10.5,
-		},
-	];
-	beforeEach(() => {
-		(benchmark as any)._mem = entries;
+	it("lookupModelsDev returns undefined for unknown model", () => {
+		expect(lookupModelsDev("cc", "nonexistent-xyz")).toBeUndefined();
 	});
-	afterEach(() => {
-		(benchmark as any)._mem = null;
-	});
-	it("finds exact match (case-insensitive, normalized)", () => {
-		expect(lookupBenchmark("claude sonnet 4.5")?.rank).toBe(1);
+	it("lookupBenchmark falls back to fitted heuristic (no intelligence)", () => {
+		const b = lookupBenchmark("claude-haiku-4-5-20251001");
+		// no intelligence → 120.6·heur − 10.6 → 42
+		expect(b?.overallScore).toBe(42);
+		expect(b?.rank).toBe(2); // ranked by score: hy3 (58) > haiku (42)
+		expect(b?.inputPrice).toBe(1);
+		expect(b?.outputPrice).toBe(5);
 	});
-	it("finds with dashes normalized", () => {
-		expect(lookupBenchmark("claude-sonnet-4-5")?.rank).toBe(1);
+	it("lookupBenchmark renormalizes heuristic over present benches", () => {
+		const b = lookupBenchmark("tencent/hy3-preview:nitro");
+		// coding/agentic indices absent → heuristic renormalizes → fitted → 58
+		expect(b?.overallScore).toBe(58);
+		expect(b?.rank).toBe(1);
 	});
-	it("finds partial match (needle in model)", () => {
-		expect(lookupBenchmark("gpt-4o")?.rank).toBe(2);
+	it("lookupBenchmark returns null score when no benches at all", () => {
+		const b = lookupBenchmark("ghost/unbenched");
+		expect(b?.overallScore).toBeNull();
+		expect(b?.rank).toBe(3); // unscored sinks to the bottom
 	});
-	it("finds partial match (model in needle)", () => {
-		expect(lookupBenchmark("gemini 1.5 pro latest")?.rank).toBe(3);
+	it("lookupBenchmark prefers AA intelligence index over heuristic", () => {
+		(modelgrep as unknown as { _mem: ModelGrepModel[] })._mem = [
+			mg("openai/gpt-5", { bench: { intelligence: 52, coding: 10 } }),
+		];
+		const b = lookupBenchmark("gpt-5");
+		// intelligence present → round(52 / 65 * 100) = 80, ignores low coding
+		expect(b?.overallScore).toBe(80);
 	});
-	it("returns undefined for unknown model", () => {
+	it("lookupBenchmark returns undefined for unknown model", () => {
 		expect(lookupBenchmark("nonexistent-model-xyz")).toBeUndefined();
 	});
 });

package/src/data.ts CHANGED Viewed

@@ -1,21 +1,18 @@
 /**
  * data.ts — shared Pi model data layer
  *
- * Single source of truth for:
- *   - models.dev metadata  (context, cost, modalities) → ~/.cache/pi/models.json   TTL 24h
- *   - BenchLM leaderboard  (rank, score, pricing)      → ~/.cache/pi/benchlm.json  TTL 24h
+ * Single source of truth, sourced from modelgrep (coding-sorted), cached at
+ * ~/.cache/pi/modelgrep.json (TTL 24h). Provides context, cost, modalities,
+ * capabilities, coding-percentile score, and rank.
  *
  * Cache files are shared across all Pi extensions — whichever extension loads
  * first populates the cache; subsequent extensions read from disk.
  *
  * Usage:
- *   import { modelsDev, benchmark } from "./data.ts";
+ *   import { modelgrep } from "./data.ts";
  *
- *   const models  = await modelsDev.get();   // async, fetches if stale
- *   const entries = await benchmark.get();
- *
- *   const models  = modelsDev.getCached();   // sync, disk-only, no fetch
- *   const entries = benchmark.getCached();
+ *   const catalog = await modelgrep.get();   // async, fetches if stale
+ *   const catalog = modelgrep.getCached();   // sync, disk-only, no fetch
  *
  *   import { lookupModelsDev, lookupBenchmark } from "./data.ts";
  */
@@ -57,10 +54,29 @@ export interface BenchmarkEntry {
 	outputPrice: number | null;
 }
-interface BenchmarkResponse {
-	lastUpdated?: string;
-	mode?: string;
-	models: BenchmarkEntry[];
+export interface ModelGrepModel {
+	id: string;
+	name?: string;
+	context_length?: number;
+	pricing?: { input?: number; output?: number };
+	modality?: { input?: string[]; output?: string[] };
+	capabilities?: { reasoning?: boolean };
+	benchmarks?: {
+		artificial_analysis?: {
+			// AA Intelligence Index — authoritative 9-eval composite (~0–65 range).
+			intelligence?: number | null;
+			coding?: number | null; // 0–100 index
+			agentic?: number | null; // 0–100 index
+			gpqa?: number | null; // 0–1
+			scicode?: number | null; // 0–1
+			tau2?: number | null; // 0–1
+			hle?: number | null; // 0–1
+		};
+	};
+}
+interface ModelGrepResponse {
+	data: ModelGrepModel[];
 }
 // ── DataSource ───────────────────────────────────────────────────────────────
@@ -76,6 +92,16 @@ interface DataSourceOptions<T> {
 	empty: T;
 	label: string;
 	skip?: () => boolean;
+	/**
+	 * Optional override for sources that need multiple requests (pagination).
+	 * Returns the merged raw payload, which is then handed to `parse`/cached
+	 * exactly as a single response would be.
+	 */
+	fetchRaw?: (
+		url: string,
+		headers: Record<string, string> | undefined,
+		timeoutMs: number,
+	) => Promise<unknown>;
 }
 export class DataSource<T> {
@@ -89,6 +115,7 @@ export class DataSource<T> {
 			timeoutMs: 10_000,
 			headers: () => undefined,
 			skip: () => false,
+			fetchRaw: defaultFetchRaw,
 			...opts,
 		};
 	}
@@ -131,14 +158,11 @@ export class DataSource<T> {
 		try {
 			const url =
 				typeof this.opts.url === "function" ? this.opts.url() : this.opts.url;
-			const response = await fetchWithTimeout(
+			const raw = await this.opts.fetchRaw(
 				url,
-				this.opts.timeoutMs,
 				this.opts.headers(),
+				this.opts.timeoutMs,
 			);
-			if (!response.ok)
-				throw new Error(`${this.opts.label} fetch failed: ${response.status}`);
-			const raw = await response.json();
 			const val = this.opts.parse(raw);
 			this._mem = val;
 			void this._writeCache(raw);
@@ -196,6 +220,52 @@ function fetchWithTimeout(
 	);
 }
+/** Single-request raw fetch — the default DataSource fetch strategy. */
+async function defaultFetchRaw(
+	url: string,
+	headers: Record<string, string> | undefined,
+	timeoutMs: number,
+): Promise<unknown> {
+	const response = await fetchWithTimeout(url, timeoutMs, headers);
+	if (!response.ok) throw new Error(`fetch failed: ${response.status}`);
+	return response.json();
+}
+const MODELGREP_PAGE = 200; // modelgrep hard page-size cap
+const MODELGREP_MAX_PAGES = 10; // safety bound (~2000 models)
+interface ModelGrepPage {
+	data?: ModelGrepModel[];
+	meta?: { has_more?: boolean; next_offset?: number };
+}
+/**
+ * Paginating fetch for modelgrep: walks `meta.has_more`/`next_offset` and
+ * merges every page into one `{ data }` payload so `parse` and the cache see
+ * the full catalog as a single response. `url` already carries the query
+ * (sort/limit); we only append `&offset=`.
+ */
+async function fetchModelGrepAll(
+	url: string,
+	headers: Record<string, string> | undefined,
+	timeoutMs: number,
+): Promise<{ data: ModelGrepModel[] }> {
+	const all: ModelGrepModel[] = [];
+	let offset = 0;
+	for (let page = 0; page < MODELGREP_MAX_PAGES; page++) {
+		const sep = url.includes("?") ? "&" : "?";
+		const res = (await defaultFetchRaw(
+			`${url}${sep}offset=${offset}`,
+			headers,
+			timeoutMs,
+		)) as ModelGrepPage;
+		if (res.data?.length) all.push(...res.data);
+		if (!res.meta?.has_more) break;
+		offset = res.meta.next_offset ?? offset + MODELGREP_PAGE;
+	}
+	return { data: all };
+}
 // ── Cache dir ─────────────────────────────────────────────────────────────────
 export const CACHE_DIR = join(
@@ -205,21 +275,13 @@ export const CACHE_DIR = join(
 // ── Data sources ──────────────────────────────────────────────────────────────
-export const modelsDev = new DataSource<ModelsDevApi>({
-	label: "models.dev",
-	url: "https://models.dev/api.json",
-	cachePath: join(CACHE_DIR, "models.json"),
-	parse: (raw) => raw as ModelsDevApi,
-	parseCache: (data) => (data as ModelsDevApi) ?? {},
-	empty: {},
-});
-export const benchmark = new DataSource<BenchmarkEntry[]>({
-	label: "benchlm",
-	url: "https://benchlm.ai/api/data/leaderboard",
-	cachePath: join(CACHE_DIR, "benchlm.json"),
-	parse: (raw) => (raw as BenchmarkResponse).models ?? [],
-	parseCache: (data) => (data as BenchmarkResponse)?.models ?? [],
+export const modelgrep = new DataSource<ModelGrepModel[]>({
+	label: "modelgrep",
+	url: `https://modelgrep.com/api/v1/models?benchmarked=1&sort=coding&order=desc&limit=${MODELGREP_PAGE}`,
+	cachePath: join(CACHE_DIR, "modelgrep.json"),
+	fetchRaw: fetchModelGrepAll,
+	parse: (raw) => (raw as ModelGrepResponse).data ?? [],
+	parseCache: (data) => (data as ModelGrepResponse)?.data ?? [],
 	empty: [],
 });
@@ -228,8 +290,9 @@ export const benchmark = new DataSource<BenchmarkEntry[]>({
 function normalize(id: string): string {
 	return id
 		.toLowerCase()
-		.replace(/[:@].*$/, "")
-		.replace(/-\d{8}$/, "");
+		.replace(/[:@].*$/, "") // routing suffix (:nitro, @date)
+		.replace(/[._]/g, "-") // fold separators: modelgrep `4.5` ↔ Pi routing `4-5`
+		.replace(/-\d{8}$/, ""); // trailing -YYYYMMDD
 }
 function stripPrefix(id: string): string {
@@ -237,71 +300,172 @@ function stripPrefix(id: string): string {
 	return i >= 0 ? id.slice(i + 1) : id;
 }
-export function buildModelsDevIndex(
-	api: ModelsDevApi,
-): Map<string, ModelsDevModel> {
-	const index = new Map<string, ModelsDevModel>();
-	for (const provider of Object.values(api)) {
-		if (!provider?.models) continue;
-		for (const [modelId, model] of Object.entries(provider.models)) {
-			const m: ModelsDevModel = { ...model, id: modelId };
-			if (!index.has(modelId)) index.set(modelId, m);
-			const norm = normalize(modelId);
-			if (!index.has(norm)) index.set(norm, m);
-		}
-	}
-	return index;
+/** Slug = model id without its maker/provider prefix. */
+function slugOf(id: string): string {
+	return id.includes("/") ? id.slice(id.lastIndexOf("/") + 1) : id;
 }
-export function lookupInIndex(
-	id: string,
-	index: Map<string, ModelsDevModel>,
-): ModelsDevModel | undefined {
+/**
+ * Generic normalized-index lookup: exact slug → normalized slug → fuzzy
+ * prefix overlap. Handles routing suffixes (`:nitro`, `@date`, `-YYYYMMDD`)
+ * and maker prefixes (e.g. `tencent/hy3-preview:nitro` → `hy3-preview`).
+ */
+function findInIndex<T>(id: string, index: Map<string, T>): T | undefined {
 	const stripped = stripPrefix(id);
 	const direct = index.get(stripped) ?? index.get(normalize(stripped));
 	if (direct) return direct;
 	const norm = normalize(stripped);
-	for (const [key, model] of index) {
-		if (key.startsWith(norm) || norm.startsWith(key)) return model;
+	for (const [key, value] of index) {
+		if (key.startsWith(norm) || norm.startsWith(key)) return value;
 	}
 	return undefined;
 }
-export function lookupModelsDev(
-	provider: string,
+export function lookupInIndex(
 	id: string,
+	index: Map<string, ModelsDevModel>,
 ): ModelsDevModel | undefined {
-	const data = modelsDev.getCached();
-	const canonical = id.includes("/") ? id.slice(id.lastIndexOf("/") + 1) : id;
-	const exact = data[provider]?.models?.[canonical];
-	if (exact) return exact;
-	for (const p of Object.keys(data)) {
-		const hit = data[p]?.models?.[canonical];
-		if (hit) return hit;
+	return findInIndex(id, index);
+}
+function toModelsDevModel(g: ModelGrepModel): ModelsDevModel {
+	return {
+		id: slugOf(g.id),
+		name: g.name,
+		reasoning: g.capabilities?.reasoning,
+		modalities: g.modality,
+		limit: { context: g.context_length },
+		cost: { input: g.pricing?.input, output: g.pricing?.output },
+	};
+}
+export function buildModelsDevIndex(
+	source: ModelGrepModel[],
+): Map<string, ModelsDevModel> {
+	const index = new Map<string, ModelsDevModel>();
+	for (const g of source) {
+		const m = toModelsDevModel(g);
+		if (!index.has(m.id)) index.set(m.id, m);
+		const norm = normalize(m.id);
+		if (!index.has(norm)) index.set(norm, m);
 	}
-	return undefined;
+	return index;
+}
+export function lookupModelsDev(
+	_provider: string,
+	id: string,
+): ModelsDevModel | undefined {
+	// Provider prefix differs between Pi routing (cc/ds/openrouter) and modelgrep
+	// (anthropic/tencent), so join on the model slug only via the normalized index.
+	return findInIndex(id, buildModelsDevIndex(modelgrep.getCached()));
 }
 export async function fetchModelsDevIndex(): Promise<
 	Map<string, ModelsDevModel>
 > {
-	return buildModelsDevIndex(await modelsDev.get());
+	return buildModelsDevIndex(await modelgrep.get());
 }
-function normBench(s: string): string {
-	return s
-		.toLowerCase()
-		.replace(/[-_.]+/g, " ")
-		.replace(/\s+/g, " ")
-		.trim();
+// Weighted blend, renormalized over present fields — a missing input dilutes
+// only its own group, never zero-penalizes the whole score.
+function blend(parts: [number, number | null | undefined][]): number | null {
+	let weighted = 0;
+	let present = 0;
+	for (const [w, v] of parts) {
+		if (v == null) continue;
+		weighted += w * v;
+		present += w;
+	}
+	return present === 0 ? null : weighted / present;
+}
+const frac = (v: number | null | undefined) => (v == null ? null : v / 100);
+// AA Intelligence Index ceiling — current leader (Claude Fable 5) scores ~65,
+// so /65 maps the index to ~0–100 with headroom and no clipping.
+const INTELLIGENCE_MAX = 65;
+// Fallback calibration. For the models that carry the index AND the raw benches
+// (deduped overlap, n=29), we fit our heuristic (0–1) to the rescaled index via
+// least-squares:
+//   index100 ≈ SLOPE·heuristic + INTERCEPT
+// Heuristic weights below + this line were jointly tuned against the index
+// (R²=0.901, LOOCV-RMSE 6.55pt). Applying it to index-less models maps their
+// heuristic onto the SAME scale as real index scores — a data-fit, not a
+// guessed penalty. Refit if the catalog or weights change.
+const FALLBACK_SLOPE = 120.6;
+const FALLBACK_INTERCEPT = -10.6;
+const clamp01to100 = (x: number) => Math.max(0, Math.min(100, x));
+// Our coding/agentic-weighted heuristic from the raw evals (each used once —
+// no double-counting with the index). Weights tuned against the AA index:
+// agentic-heavy (.60) since tool-call matters most, coding (.30), reasoning a
+// .10 tiebreaker. Sub-weights likewise fit — tau2 dominates the agentic group.
+function heuristicScore(
+	aa: NonNullable<
+		NonNullable<ModelGrepModel["benchmarks"]>["artificial_analysis"]
+	>,
+): number | null {
+	const coding = blend([
+		[0.6, frac(aa.coding)],
+		[0.4, aa.scicode],
+	]);
+	const agentic = blend([
+		[0.7, aa.tau2],
+		[0.3, frac(aa.agentic)],
+	]);
+	const reasoning = blend([
+		[0.6, aa.gpqa],
+		[0.4, aa.hle],
+	]);
+	return blend([
+		[0.3, coding],
+		[0.6, agentic],
+		[0.1, reasoning],
+	]);
+}
+// Model score 0–100. Prefer AA's Intelligence Index (authoritative 9-eval
+// composite); when absent, map our heuristic onto the index scale via the
+// fitted line. Null only when nothing is benchmarked.
+function codingScore(
+	bench: NonNullable<ModelGrepModel["benchmarks"]>,
+): number | null {
+	const aa = bench.artificial_analysis ?? {};
+	if (aa.intelligence != null) {
+		return Math.round((aa.intelligence / INTELLIGENCE_MAX) * 100);
+	}
+	const h = heuristicScore(aa);
+	return h == null
+		? null
+		: Math.round(clamp01to100(FALLBACK_SLOPE * h + FALLBACK_INTERCEPT));
+}
+function buildBenchIndex(): Map<string, BenchmarkEntry> {
+	const index = new Map<string, BenchmarkEntry>();
+	// Rank by our computed score (desc); unscored sink to the bottom, holding
+	// source order among themselves.
+	const scored = modelgrep.getCached().map((g) => ({
+		g,
+		score: g.benchmarks ? codingScore(g.benchmarks) : null,
+	}));
+	scored.sort((a, b) => (b.score ?? -1) - (a.score ?? -1));
+	scored.forEach(({ g, score }, i) => {
+		const slug = slugOf(g.id);
+		const entry: BenchmarkEntry = {
+			rank: i + 1,
+			model: g.name ?? g.id,
+			creator: g.id.split("/")[0] ?? "",
+			overallScore: score,
+			inputPrice: g.pricing?.input ?? null,
+			outputPrice: g.pricing?.output ?? null,
+		};
+		for (const k of [slug, normalize(slug)])
+			if (!index.has(k)) index.set(k, entry);
+	});
+	return index;
 }
 export function lookupBenchmark(modelName: string): BenchmarkEntry | undefined {
-	const entries = benchmark.getCached();
-	const needle = normBench(modelName);
-	return (
-		entries.find((e) => normBench(e.model) === needle) ??
-		entries.find((e) => normBench(e.model).includes(needle)) ??
-		entries.find((e) => needle.includes(normBench(e.model)))
-	);
+	return findInIndex(modelName, buildBenchIndex());
 }

package/src/index.ts CHANGED Viewed

@@ -4,14 +4,15 @@
  * Warms the shared model data cache on session start so other extensions
  * (pix-9router, models picker, footer) can read from ~/.cache/pi/* synchronously.
  *
- * Fetches in parallel, non-blocking — Pi session starts immediately.
+ * Single non-blocking fetch — Pi session starts immediately.
  */
 import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
-import { benchmark, modelsDev } from "./data.ts";
+import { modelgrep } from "./data.ts";
 export type {
 	BenchmarkEntry,
+	ModelGrepModel,
 	ModelsDevApi,
 	ModelsDevModel,
 } from "./data.ts";
@@ -19,7 +20,6 @@ export type {
 // Consumers (pix-core, pix-9router, …) import these instead of duplicating
 // the DataSource implementation and models.dev/BenchLM lookups.
 export {
-	benchmark,
 	buildModelsDevIndex,
 	CACHE_DIR,
 	DataSource,
@@ -27,10 +27,9 @@ export {
 	lookupBenchmark,
 	lookupInIndex,
 	lookupModelsDev,
-	modelsDev,
+	modelgrep,
 } from "./data.ts";
 export default function (_pi: ExtensionAPI): void {
-	void modelsDev.get();
-	void benchmark.get();
+	void modelgrep.get();
 }