@suluk/models 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/REFRESH.md ADDED
@@ -0,0 +1,42 @@
1
+ # @suluk/models — the weekly refresh spec
2
+
3
+ The catalog is a **generated, content-addressed artifact**, not a live call. This package ships the schema + selector + a small SEED catalog; the ~200-row catalog is produced by the fetcher below. Council `wf_729cde52-cc7`.
4
+
5
+ ## The honest split — two derivability classes (the verifier's correction)
6
+
7
+ The naive "everything weekly from public data" is false. Derivability splits in two:
8
+
9
+ **(A) WEEKLY-DERIVABLE from a live API — the automatable spine (build first).** Pure facts from the OpenRouter `/models` + `/endpoints` API: per-token `input`/`output`/`cached` price, `context_length`, `max_completion_tokens`, `supported_parameters` (→ `tool_calling` / `forced_tool_choice` / `parallel_tool_calls` / `structured_output` / `json_schema_strict` — **declared, not verified**), `architecture.input/output_modalities`, provider fan-out, usage rankings, region/data-policy. These refresh cleanly every week. Our **own** signal: a week-over-week snapshot diff → `priceVolatile` and the `status` deprecation delta.
10
+
11
+ **(B) PERIODIC, lower-cadence, human-reviewable — the benchmark TIERS (second pass).** BFCL / τ-bench (agentic-tool-use), IFEval + LMArena (instruction-following), GPQA/AIME (reasoning), SWE-bench-Verified (coding), RULER (long-context), MMLU-Pro (knowledge), LMArena Elo (human-preference). These are **periodic publications, not APIs** — they cannot be claimed "weekly." They are mapped to coarse tiers by a committed **bucketing rules file** (which leaderboard snapshot → which tier boundary) so the tiers are auditable and reproducible. Coverage is **sparse** — most rows are `unknown` on the agentic + long-context axes; we **surface the gap, never impute**.
12
+
13
+ ## Honesty rules (council-unanimous)
14
+
15
+ 1. **Coarse tiers over scores** — `frontier|strong|mid|basic|unknown`; the raw cited number lives in `source`, never as the sortable value.
16
+ 2. **Cite per metric** — every `Cell` carries `{source, asOf}`; an unsourced cell is MISSING (fail-closed in hard filters, soft-penalty in ranks), never a confident value. A tier is an **adopted public prior at a low ceiling**, never our measured fact — we do **not** self-test.
17
+ 3. **Name contamination/saturation** — MMLU + HumanEval are demoted (MMLU-Pro gate; HumanEval secondary to SWE-bench-Verified); cross-witness a benchmark tier against LMArena (≥2 sources to agree on `frontier`).
18
+ 4. **Staleness visible** — per-cell `asOf`; a not-seen-this-week row is STALE; the catalog is pinned by `snapshotHash` so a selection is reproducible (a re-pick with no author edit is otherwise un-auditable).
19
+ 5. **Disclosed blindspots we cannot close without self-testing** — `supported_parameters` is decidable-as-DECLARED not as-true (a model can advertise tools and emit malformed calls → the CAP filter and the agentic-RELIABILITY tier are SEPARATE fields); AA latency/throughput are single-vendor + route/load-dependent (tag the provider, never a guarantee); provider quantization can drop quality with no version bump — disclosed, not solved; popularity is selection-bias (tiebreak only).
20
+
21
+ ## Open sub-question (deferred to a micro-panel)
22
+
23
+ Key a row by **model** or by **(model, provider-endpoint)**? Governance/price/region attach to the *endpoint actually served* (structurally sounder), but that 3–5×'s the row count and the author UX. To resolve with a receipt + ceiling before hardening. The seed catalog currently keys by model with a single `provider`.
24
+
25
+ ## Pipeline
26
+
27
+ ```
28
+ weekly: OpenRouter /models ──▶ normalizeOpenRouter (A) facts cells ─┐ [BUILT: normalize.ts + fetch.ts]
29
+ periodic: leaderboard snapshots + BUCKETING_RULES ─▶ applyBucketing (B) tier cells ─┤ [BUILT: bucketing.ts; the live overlay is TODO]
30
+
31
+ catalogFrom ▶ ModelRecord rows ▶ snapshotHash ▶ ModelCatalog (committed, versioned)
32
+ ```
33
+
34
+ **BUILT (no-network spine, unit-tested):** `bucketing.ts` (`BUCKETING_RULES` + `applyBucketing` — the committed,
35
+ cited tier-boundary rules per axis, the red-line), `normalize.ts` (`normalizeOpenRouterModel`/`normalizeOpenRouter` —
36
+ the OpenRouter `/models` → fact-cells transform; `snapshotHash`/`catalogFrom` — content-addressed catalog), and
37
+ `fetch.ts` (`fetchOpenRouterCatalog` — the thin live wrapper). The selector (`selectModel`) and the agent seam
38
+ (`deriveRequirements`/`resolveSkillModels`, having replaced `SulukSkillRef.model[]` + the analyzer's `DEFAULT_WINDOWS`)
39
+ are wired against these.
40
+
41
+ **TODO:** run `fetchOpenRouterCatalog` weekly (CI) to commit the ~200-row fact catalog; build the Class-B overlay that
42
+ maps leaderboard snapshots through `applyBucketing` onto `intel.*` (human-reviewed, lower cadence); then retire the seed.
package/package.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "name": "@suluk/models",
3
+ "version": "0.1.0",
4
+ "description": "A weekly, PUBLIC-DATA-ONLY catalog of OpenRouter models + a selector: a suluk skill declares NEEDS (hard filters) + a small PREFERENCE (a named profile), and selectModel picks the best CURRENT model — never a hard-coded id. Decidable OpenRouter facts as numbers; noisy benchmarks as COARSE TIERS with {source, asOf}; no cross-axis composite (blending is the selector's job). CANDIDATE tooling — NOT official OAS.",
5
+ "publishConfig": {
6
+ "access": "public"
7
+ },
8
+ "license": "Apache-2.0",
9
+ "repository": {
10
+ "type": "git",
11
+ "url": "git+https://github.com/MahmoodKhalil57/suluk.git",
12
+ "directory": "tooling/ts/packages/models"
13
+ },
14
+ "homepage": "https://github.com/MahmoodKhalil57/suluk#readme",
15
+ "bugs": "https://github.com/MahmoodKhalil57/suluk/issues",
16
+ "type": "module",
17
+ "main": "src/index.ts",
18
+ "exports": {
19
+ ".": "./src/index.ts"
20
+ },
21
+ "devDependencies": {
22
+ "@types/bun": "latest"
23
+ },
24
+ "scripts": {
25
+ "test": "bun test",
26
+ "typecheck": "tsc --noEmit -p ."
27
+ }
28
+ }
@@ -0,0 +1,39 @@
1
+ /**
2
+ * The TIER BUCKETING RULES (metrics-council red-line `wf_729cde52-cc7`) — a DOCUMENTED, COMMITTED mapping from a
3
+ * published leaderboard metric to a coarse tier, per INTEL axis. Without this the tiers are un-auditable and not
4
+ * weekly-maintainable (the minimalist red-line). The VALUE is that the mapping is EXPLICIT + CITED + reproducible —
5
+ * not that a boundary is exact (tune at review). `applyBucketing` is a pure function the Class-B tier pass calls.
6
+ *
7
+ * Scores are the published metric normalized to [0,1] (accuracy/pass-rate), EXCEPT human-preference which is raw
8
+ * LMArena Elo. A null/absent score ⇒ `unknown` (NEVER imputed to worst — that would kill new models).
9
+ */
10
+ import type { Tier } from "./types";
11
+
12
+ export interface AxisRule {
13
+ /** the public leaderboard(s) this axis is bucketed from (cited in every cell's `source`). */
14
+ source: string;
15
+ /** what the score means (so a reviewer can reproduce the bucketing). */
16
+ metric: string;
17
+ /** score >= frontier ⇒ frontier; >= strong ⇒ strong; >= mid ⇒ mid; else basic. */
18
+ boundaries: { frontier: number; strong: number; mid: number };
19
+ }
20
+
21
+ export const BUCKETING_RULES: Record<string, AxisRule> = {
22
+ agenticToolUse: { source: "bfcl-v3 + tau-bench", metric: "BFCL overall accuracy (tau-bench corroborates)", boundaries: { frontier: 0.85, strong: 0.70, mid: 0.50 } },
23
+ instructionFollowing: { source: "ifeval", metric: "IFEval strict prompt+instruction accuracy", boundaries: { frontier: 0.88, strong: 0.78, mid: 0.65 } },
24
+ reasoning: { source: "gpqa-diamond + aime", metric: "GPQA-Diamond accuracy (AIME corroborates)", boundaries: { frontier: 0.70, strong: 0.55, mid: 0.40 } },
25
+ coding: { source: "swe-bench-verified", metric: "SWE-bench-Verified resolved % (HumanEval secondary)", boundaries: { frontier: 0.60, strong: 0.45, mid: 0.30 } },
26
+ longCtxComprehension: { source: "ruler-128k", metric: "RULER avg accuracy @128k (needle corroborates)", boundaries: { frontier: 0.90, strong: 0.80, mid: 0.65 } },
27
+ knowledge: { source: "mmlu-pro", metric: "MMLU-Pro accuracy (saturation-flagged)", boundaries: { frontier: 0.80, strong: 0.70, mid: 0.55 } },
28
+ humanPreference: { source: "lmarena", metric: "LMArena overall Elo (style-confounded — cross-witness only)", boundaries: { frontier: 1350, strong: 1250, mid: 1150 } },
29
+ };
30
+
31
+ /** Bucket a raw leaderboard score into a coarse tier per the committed rule. Null/absent/unknown-axis ⇒ `unknown`. */
32
+ export function applyBucketing(axis: string, score: number | null | undefined): Tier {
33
+ const rule = BUCKETING_RULES[axis];
34
+ if (!rule || score === null || score === undefined || Number.isNaN(score)) return "unknown";
35
+ if (score >= rule.boundaries.frontier) return "frontier";
36
+ if (score >= rule.boundaries.strong) return "strong";
37
+ if (score >= rule.boundaries.mid) return "mid";
38
+ return "basic";
39
+ }
package/src/catalog.ts ADDED
@@ -0,0 +1,89 @@
1
+ /**
2
+ * SEED catalog — a small, hand-curated set of headline models so the selector + the agent seam can be built and
3
+ * tested BEFORE the weekly fetcher exists (see REFRESH.md). These cells are illustrative public-knowledge values
4
+ * stamped `asOf`; the real catalog is a generated, content-addressed artifact from OpenRouter (facts) + periodic
5
+ * benchmark tiers (see REFRESH.md). Tiers are coarse and source-cited; UNKNOWN is honest, never imputed.
6
+ */
7
+ import type { Cell, ModelCatalog, ModelRecord, Tier, DataRetention } from "./types";
8
+
9
+ const ASOF = "2026-06-13";
10
+ const OR = "openrouter.api"; // decidable facts (price, context, caps, modalities, ops)
11
+ const AA = "artificialanalysis.ai"; // single-vendor speed + composite (provider/load-dependent)
12
+ const num = (value: number | null, source = OR): Cell<number> => ({ value, source: value === null ? "" : source, asOf: ASOF });
13
+ const flag = (value: boolean | null, source = OR): Cell<boolean> => ({ value, source: value === null ? "" : source, asOf: ASOF });
14
+ const tier = (value: Tier | null, source: string): Cell<Tier> => ({ value, source: value === null ? "" : source, asOf: ASOF });
15
+ const mods = (value: string[] | null): Cell<string[]> => ({ value, source: value === null ? "" : OR, asOf: ASOF });
16
+ const str = <T extends string>(value: T | null, source: string): Cell<T> => ({ value, source: value === null ? "" : source, asOf: ASOF });
17
+ const U = null; // UNKNOWN shorthand
18
+
19
+ interface Spec {
20
+ id: string; provider: string; family: string; status?: ModelRecord["status"];
21
+ inP: number; outP: number; cachedP?: number | null;
22
+ win: number; out?: number | null; fidelity?: Tier | null;
23
+ ttft?: Tier | null; tput?: Tier | null;
24
+ tool?: boolean; forced?: boolean; parallel?: boolean; structured?: boolean; strict?: boolean;
25
+ inMod?: string[]; outMod?: string[];
26
+ agentic?: Tier | null; instruct?: Tier | null; reasoning?: Tier | null; coding?: Tier | null; longctx?: Tier | null; knowledge?: Tier | null; humanpref?: Tier | null;
27
+ retention?: DataRetention; region?: string | null; license?: string | null;
28
+ fanout?: number; popularity?: number | null; released?: string | null; volatile?: boolean;
29
+ }
30
+ const mk = (s: Spec): ModelRecord => ({
31
+ id: s.id, provider: s.provider, family: s.family, status: s.status ?? "active",
32
+ cost: { inputPerMtok: num(s.inP), outputPerMtok: num(s.outP), cachedInputPerMtok: num(s.cachedP ?? U), perRequest: flag(false) },
33
+ context: { maxWindow: num(s.win), maxOutput: num(s.out ?? U), longCtxFidelity: tier(s.fidelity ?? U, "ruler.public") },
34
+ speed: { ttft: tier(s.ttft ?? U, AA), throughput: tier(s.tput ?? U, AA) },
35
+ caps: {
36
+ toolCalling: flag(s.tool ?? false), forcedToolChoice: flag(s.forced ?? false), parallelToolCalls: flag(s.parallel ?? false),
37
+ structuredOutput: flag(s.structured ?? false), jsonSchemaStrict: flag(s.strict ?? false),
38
+ inputModalities: mods(s.inMod ?? ["text"]), outputModalities: mods(s.outMod ?? ["text"]),
39
+ },
40
+ intel: {
41
+ agenticToolUse: tier(s.agentic ?? U, "bfcl+tau-bench"), instructionFollowing: tier(s.instruct ?? U, "ifeval+lmarena-if"),
42
+ reasoning: tier(s.reasoning ?? U, "gpqa+aime"), coding: tier(s.coding ?? U, "swe-bench-verified"),
43
+ longCtxComprehension: tier(s.longctx ?? U, "ruler.public"), knowledge: tier(s.knowledge ?? U, "mmlu-pro"),
44
+ humanPreference: tier(s.humanpref ?? U, "lmarena"),
45
+ },
46
+ gov: { dataRetention: str(s.retention ?? "unknown", s.retention ? "provider.tos" : ""), region: str(s.region ?? U, "openrouter.provider"), license: str(s.license ?? U, "openrouter.license") },
47
+ ops: { providerFanOut: num(s.fanout ?? 1), popularityRank: num(s.popularity ?? U, "openrouter.rankings"), releaseDate: str(s.released ?? U, "provider"), priceVolatile: flag(s.volatile ?? false, "suluk.snapshot-diff") },
48
+ });
49
+
50
+ /** Illustrative seed — NOT the live catalog. Tiers reflect coarse public standing as of asOf; UNKNOWN is honest. */
51
+ export const SEED_CATALOG: ModelCatalog = {
52
+ schemaVersion: "0.1.0",
53
+ generatedAt: ASOF,
54
+ snapshotHash: "sha256-seed-0001",
55
+ rows: [
56
+ mk({ id: "anthropic/claude-opus-4", provider: "anthropic", family: "claude", inP: 15, outP: 75, cachedP: 1.5, win: 200000, out: 64000, fidelity: "strong",
57
+ ttft: "mid", tput: "mid", tool: true, forced: true, parallel: true, structured: true, strict: true, inMod: ["text", "image"],
58
+ agentic: "frontier", instruct: "frontier", reasoning: "frontier", coding: "frontier", longctx: "strong", knowledge: "frontier", humanpref: "frontier",
59
+ retention: "ephemeral", region: "us", license: "proprietary", fanout: 2, popularity: 3, released: "2026-05" }),
60
+ mk({ id: "anthropic/claude-sonnet-4-6", provider: "anthropic", family: "claude", inP: 3, outP: 15, cachedP: 0.3, win: 200000, out: 64000, fidelity: "strong",
61
+ ttft: "strong", tput: "strong", tool: true, forced: true, parallel: true, structured: true, strict: true, inMod: ["text", "image"],
62
+ agentic: "frontier", instruct: "frontier", reasoning: "strong", coding: "frontier", longctx: "strong", knowledge: "strong", humanpref: "strong",
63
+ retention: "ephemeral", region: "us", license: "proprietary", fanout: 2, popularity: 1, released: "2026-04" }),
64
+ mk({ id: "google/gemini-2.5-flash", provider: "google", family: "gemini", inP: 0.3, outP: 2.5, cachedP: 0.075, win: 1000000, out: 65000, fidelity: "mid",
65
+ ttft: "frontier", tput: "frontier", tool: true, forced: true, parallel: true, structured: true, strict: true, inMod: ["text", "image", "audio"],
66
+ agentic: "strong", instruct: "strong", reasoning: "mid", coding: "strong", longctx: "mid", knowledge: "strong", humanpref: "strong",
67
+ retention: "logged", region: "us", license: "proprietary", fanout: 1, popularity: 2, released: "2026-03" }),
68
+ mk({ id: "google/gemini-2.5-pro", provider: "google", family: "gemini", inP: 1.25, outP: 10, cachedP: 0.31, win: 1000000, out: 65000, fidelity: "strong",
69
+ ttft: "strong", tput: "strong", tool: true, forced: true, parallel: true, structured: true, strict: true, inMod: ["text", "image", "audio", "pdf"],
70
+ agentic: "strong", instruct: "frontier", reasoning: "frontier", coding: "strong", longctx: "frontier", knowledge: "frontier", humanpref: "frontier",
71
+ retention: "logged", region: "us", license: "proprietary", fanout: 1, popularity: 4, released: "2026-03" }),
72
+ mk({ id: "openai/gpt-5", provider: "openai", family: "gpt", inP: 10, outP: 30, cachedP: 1.25, win: 400000, out: 128000, fidelity: "strong",
73
+ ttft: "mid", tput: "mid", tool: true, forced: true, parallel: true, structured: true, strict: true, inMod: ["text", "image"],
74
+ agentic: "frontier", instruct: "frontier", reasoning: "frontier", coding: "frontier", longctx: "strong", knowledge: "frontier", humanpref: "frontier",
75
+ retention: "logged", region: "us", license: "proprietary", fanout: 2, popularity: 5, released: "2026-02" }),
76
+ mk({ id: "openai/gpt-4o-mini", provider: "openai", family: "gpt", inP: 0.15, outP: 0.6, cachedP: 0.075, win: 128000, out: 16000,
77
+ ttft: "frontier", tput: "frontier", tool: true, forced: true, parallel: true, structured: true, strict: true, inMod: ["text", "image"],
78
+ agentic: "mid", instruct: "strong", reasoning: "mid", coding: "mid", knowledge: "mid", humanpref: "mid",
79
+ retention: "logged", region: "us", license: "proprietary", fanout: 2, popularity: 6, released: "2025-07" }),
80
+ mk({ id: "deepseek/deepseek-v3", provider: "deepseek", family: "deepseek", inP: 0.27, outP: 1.1, cachedP: 0.07, win: 128000, out: 8000,
81
+ ttft: "mid", tput: "strong", tool: true, structured: true, inMod: ["text"],
82
+ agentic: "mid", instruct: "strong", reasoning: "strong", coding: "strong", knowledge: "strong", humanpref: "strong",
83
+ retention: "trains", region: "cn", license: "mit", fanout: 3, popularity: 7, released: "2025-12", volatile: true }),
84
+ mk({ id: "meta-llama/llama-4-maverick", provider: "meta", family: "llama", inP: 0.2, outP: 0.6, win: 1000000, out: 16000, fidelity: U,
85
+ ttft: "strong", tput: "strong", tool: true, structured: true, inMod: ["text", "image"],
86
+ agentic: U /* thin public agentic data */, instruct: "mid", reasoning: "mid", coding: "mid", knowledge: "strong", humanpref: "mid",
87
+ retention: "zero", region: "us", license: "llama-4-community", fanout: 5, popularity: 8, released: "2025-04" }),
88
+ ],
89
+ };
package/src/fetch.ts ADDED
@@ -0,0 +1,20 @@
1
+ /**
2
+ * The LIVE weekly fetcher (Class A) — a THIN wrapper over the pure `normalizeOpenRouter` transform. This is the only
3
+ * part that needs network; it is deliberately tiny so the transform stays unit-tested. The Class-B benchmark TIER
4
+ * pass (BFCL/IFEval/SWE-bench/RULER/MMLU-Pro/LMArena → `applyBucketing`) is a separate, lower-cadence, human-reviewed
5
+ * step that overlays `intel.*` tiers onto these rows — see REFRESH.md. `asOf` must be passed in (scripts stamp time;
6
+ * the package never calls `new Date()` implicitly so a run is reproducible).
7
+ */
8
+ import type { ModelCatalog } from "./types";
9
+ import { type ORModel, normalizeOpenRouter, catalogFrom } from "./normalize";
10
+
11
+ /** Fetch OpenRouter `/models` and normalize to the fact-cell catalog. NETWORK — run from a weekly script/CI, not tests. */
12
+ export async function fetchOpenRouterCatalog(asOf: string, opts: { baseUrl?: string; fetchImpl?: typeof fetch } = {}): Promise<ModelCatalog> {
13
+ const base = opts.baseUrl ?? "https://openrouter.ai/api/v1";
14
+ const f = opts.fetchImpl ?? fetch;
15
+ const res = await f(`${base}/models`);
16
+ if (!res.ok) throw new Error(`@suluk/models: OpenRouter /models returned ${res.status}`);
17
+ const body = (await res.json()) as { data?: ORModel[] };
18
+ if (!Array.isArray(body.data)) throw new Error("@suluk/models: unexpected OpenRouter /models payload (no data[])");
19
+ return catalogFrom(normalizeOpenRouter(body.data, asOf), asOf);
20
+ }
package/src/index.ts ADDED
@@ -0,0 +1,18 @@
1
+ /**
2
+ * @suluk/models — a weekly, PUBLIC-DATA-ONLY OpenRouter model catalog + a selector. A suluk skill declares NEEDS
3
+ * (hard filters) + a small PREFERENCE (a named profile), and selectModel picks the best CURRENT model — never a
4
+ * hard-coded id. Decidable OpenRouter facts are numbers; noisy benchmarks are coarse TIERS with {source, asOf};
5
+ * UNKNOWN is honest (never imputed to worst); no cross-axis composite is stored (blending is the selector's job).
6
+ * Council wf_729cde52-cc7. CANDIDATE tooling — NOT official OAS. The live weekly fetcher is specified in REFRESH.md
7
+ * (this package ships the schema + selector + a SEED catalog; the 200-row generated catalog is the data-eng spine).
8
+ */
9
+ export type {
10
+ Tier, Cell, DataRetention, ModelRecord, ModelCatalog, HardFilters, Profile, Preferences, RankedModel, SelectResult,
11
+ } from "./types";
12
+ export { SEED_CATALOG } from "./catalog";
13
+ export { PROFILES, type ResolvedProfile } from "./profiles";
14
+ export { selectModel, deriveRequirements } from "./select";
15
+ // the weekly fetcher spine: documented tier bucketing rules (red-line) + the pure OpenRouter facts transform + a thin live fetch.
16
+ export { BUCKETING_RULES, applyBucketing, type AxisRule } from "./bucketing";
17
+ export { normalizeOpenRouter, normalizeOpenRouterModel, catalogFrom, snapshotHash, type ORModel } from "./normalize";
18
+ export { fetchOpenRouterCatalog } from "./fetch";
@@ -0,0 +1,79 @@
1
+ /**
2
+ * normalizeOpenRouter — the WEEKLY-DERIVABLE spine (Class A in REFRESH.md): the OpenRouter `/models` API → the
3
+ * DECIDABLE fact cells of a ModelRecord (cost, context, capabilities-as-DECLARED, modalities, recency). PURE +
4
+ * unit-tested; the live `fetch` is a thin wrapper (fetch.ts). The noisy benchmark TIER cells (intel.*) are left
5
+ * `unknown` here — they come from the periodic Class-B pass via `applyBucketing`, not this API. We do NOT impute.
6
+ */
7
+ import { createHash } from "node:crypto";
8
+ import type { Cell, DataRetention, ModelCatalog, ModelRecord } from "./types";
9
+
10
+ /** The subset of an OpenRouter `/models` row we rely on (all public facts). */
11
+ export interface ORModel {
12
+ id: string;
13
+ name?: string;
14
+ created?: number; // unix seconds
15
+ context_length?: number;
16
+ pricing?: { prompt?: string; completion?: string; request?: string; input_cache_read?: string };
17
+ top_provider?: { max_completion_tokens?: number | null };
18
+ architecture?: { input_modalities?: string[]; output_modalities?: string[] };
19
+ supported_parameters?: string[];
20
+ }
21
+
22
+ const c = <T,>(value: T | null, source: string, asOf: string): Cell<T> => ({ value, source: value === null ? "" : source, asOf });
23
+ const perMtok = (s?: string): number | null => (s === undefined ? null : Math.round(parseFloat(s) * 1_000_000 * 1000) / 1000);
24
+ const has = (params: string[] | undefined, k: string) => !!params?.includes(k);
25
+ const OR = "openrouter.api";
26
+
27
+ /** One OpenRouter model → its decidable fact cells (intel/gov tiers stay UNKNOWN; filled by the Class-B pass). */
28
+ export function normalizeOpenRouterModel(m: ORModel, asOf: string): ModelRecord {
29
+ const provider = m.id.split("/")[0] ?? "unknown";
30
+ const family = (m.id.split("/")[1] ?? "").replace(/[:@-].*$/, "") || provider;
31
+ const sp = m.supported_parameters;
32
+ const U = <T,>(): Cell<T> => ({ value: null, source: "", asOf });
33
+ return {
34
+ id: m.id, provider, family, status: "active",
35
+ cost: {
36
+ inputPerMtok: c(perMtok(m.pricing?.prompt), OR, asOf),
37
+ outputPerMtok: c(perMtok(m.pricing?.completion), OR, asOf),
38
+ cachedInputPerMtok: c(perMtok(m.pricing?.input_cache_read), OR, asOf),
39
+ perRequest: c(m.pricing?.request !== undefined ? parseFloat(m.pricing.request) > 0 : null, OR, asOf),
40
+ },
41
+ context: {
42
+ maxWindow: c(m.context_length ?? null, OR, asOf),
43
+ maxOutput: c(m.top_provider?.max_completion_tokens ?? null, OR, asOf),
44
+ longCtxFidelity: U(), // RULER — not in this API; Class-B pass
45
+ },
46
+ speed: { ttft: U(), throughput: U() }, // Artificial Analysis — not in this API
47
+ caps: {
48
+ toolCalling: c(sp ? has(sp, "tools") : null, OR, asOf),
49
+ forcedToolChoice: c(sp ? has(sp, "tool_choice") : null, OR, asOf),
50
+ parallelToolCalls: c(sp ? has(sp, "parallel_tool_calls") : null, OR, asOf),
51
+ structuredOutput: c(sp ? has(sp, "structured_outputs") || has(sp, "response_format") : null, OR, asOf),
52
+ jsonSchemaStrict: c(sp ? has(sp, "structured_outputs") : null, OR, asOf),
53
+ inputModalities: c(m.architecture?.input_modalities ?? null, OR, asOf),
54
+ outputModalities: c(m.architecture?.output_modalities ?? null, OR, asOf),
55
+ },
56
+ intel: { agenticToolUse: U(), instructionFollowing: U(), reasoning: U(), coding: U(), longCtxComprehension: U(), knowledge: U(), humanPreference: U() },
57
+ gov: { dataRetention: U<DataRetention>(), region: U(), license: U() },
58
+ ops: {
59
+ providerFanOut: c(1, OR, asOf),
60
+ popularityRank: U(),
61
+ releaseDate: c(m.created ? new Date(m.created * 1000).toISOString().slice(0, 10) : null, "openrouter.created", asOf),
62
+ priceVolatile: c(false, "suluk.snapshot-diff", asOf),
63
+ },
64
+ };
65
+ }
66
+
67
+ export function normalizeOpenRouter(models: ORModel[], asOf: string): ModelRecord[] {
68
+ return models.map((m) => normalizeOpenRouterModel(m, asOf)).sort((a, b) => a.id.localeCompare(b.id));
69
+ }
70
+
71
+ /** A content-addressed hash over the rows' load-bearing FACT cells (reproducible pin; ties C027 contentHash). */
72
+ export function snapshotHash(rows: ModelRecord[]): string {
73
+ const facts = rows.map((r) => [r.id, r.cost.inputPerMtok.value, r.cost.outputPerMtok.value, r.context.maxWindow.value, r.caps.toolCalling.value]);
74
+ return "sha256-" + createHash("sha256").update(JSON.stringify(facts), "utf8").digest("hex").slice(0, 16);
75
+ }
76
+
77
+ export function catalogFrom(rows: ModelRecord[], asOf: string): ModelCatalog {
78
+ return { schemaVersion: "0.1.0", generatedAt: asOf, snapshotHash: snapshotHash(rows), rows };
79
+ }
@@ -0,0 +1,23 @@
1
+ /**
2
+ * The 6 named PROFILES — the 90%-case author UX (a profile = preset preference weights + AUTO-WIRED implied hard
3
+ * filters so the author can't foot-gun). Same SHAPE as today's skill.model[] (a preference, not an id), so the
4
+ * migration is mechanical. The escape hatch (≤4 small int weights + taskShape) is for power users; the catalog
5
+ * tracks ~30 fields but the author touches ~4.
6
+ */
7
+ import type { HardFilters, Preferences, Profile } from "./types";
8
+
9
+ export interface ResolvedProfile {
10
+ prefer: { intelligence: 0 | 1 | 2 | 3; cost: 0 | 1 | 2 | 3; speed: 0 | 1 | 2 | 3; context: 0 | 1 | 2 | 3 };
11
+ taskShape?: Preferences["taskShape"];
12
+ /** filters the profile auto-wires (an author choosing "tool-reliable" implicitly requires tool-calling). */
13
+ impliedFilters: Partial<HardFilters>;
14
+ }
15
+
16
+ export const PROFILES: Record<Profile, ResolvedProfile> = {
17
+ "tool-reliable": { prefer: { intelligence: 3, cost: 1, speed: 1, context: 1 }, taskShape: "agentic", impliedFilters: { needsTools: true } },
18
+ "cheap-fast": { prefer: { intelligence: 1, cost: 3, speed: 3, context: 0 }, impliedFilters: {} },
19
+ "balanced": { prefer: { intelligence: 2, cost: 2, speed: 2, context: 1 }, impliedFilters: {} },
20
+ "max-reasoning": { prefer: { intelligence: 3, cost: 0, speed: 0, context: 1 }, taskShape: "reasoning", impliedFilters: {} },
21
+ "long-context": { prefer: { intelligence: 2, cost: 1, speed: 0, context: 3 }, impliedFilters: { fidelityFloor: "mid" } },
22
+ "vision": { prefer: { intelligence: 2, cost: 1, speed: 1, context: 1 }, impliedFilters: { inputModalities: ["image"] } },
23
+ };
package/src/select.ts ADDED
@@ -0,0 +1,136 @@
1
+ /**
2
+ * selectModel — requirements FILTER first (capabilities + context + governance + the C028 allowlist MEET; can empty
3
+ * the set ⇒ FAIL LOUD naming the unsatisfiable filter), then preferences RANK the survivors over coarse tiers.
4
+ * A preference can NEVER widen a hard filter (the operator/C028 set is terminal). UNKNOWN is a soft penalty, never
5
+ * worst. Output carries a "why this model" explainer — adoption dies if selection is a black box.
6
+ */
7
+ import type { Cell, HardFilters, ModelCatalog, ModelRecord, Preferences, RankedModel, SelectResult, Tier } from "./types";
8
+ import { PROFILES } from "./profiles";
9
+
10
+ const TIER_SCORE: Record<Tier, number> = { frontier: 4, strong: 3, mid: 2, basic: 1, unknown: 1.5 };
11
+ const TIER_RANK: Record<Exclude<Tier, "unknown">, number> = { frontier: 4, strong: 3, mid: 2, basic: 1 };
12
+ const scoreTier = (c: Cell<Tier>): number => TIER_SCORE[c.value ?? "unknown"];
13
+ const isTrue = (c: Cell<boolean>): boolean => c.value === true; // fail-closed: unknown/false does not satisfy
14
+ const supersetOf = (have: Cell<string[]>, need: string[]): boolean => !!have.value && need.every((m) => have.value!.includes(m));
15
+
16
+ /** Resolve a profile + escape-hatch into concrete weights + implied filters + taskShape. */
17
+ function resolvePrefs(prefs: Preferences): { w: { intelligence: number; cost: number; speed: number; context: number }; taskShape?: Preferences["taskShape"]; implied: Partial<HardFilters> } {
18
+ const base = prefs.profile ? PROFILES[prefs.profile] : { prefer: { intelligence: 2, cost: 2, speed: 1, context: 1 }, impliedFilters: {}, taskShape: undefined as Preferences["taskShape"] };
19
+ const w = { ...base.prefer, ...(prefs.prefer ?? {}) };
20
+ return { w, taskShape: prefs.taskShape ?? base.taskShape, implied: base.impliedFilters };
21
+ }
22
+
23
+ /** Merge author requirements with a profile's implied filters (author can ADD but the operator policy is terminal). */
24
+ function mergeFilters(reqs: HardFilters, implied: Partial<HardFilters>): HardFilters {
25
+ return { ...implied, ...reqs, policy: reqs.policy ?? implied.policy };
26
+ }
27
+
28
+ interface FilterTrace { passed: string[]; failed: string | null }
29
+ function checkFilters(m: ModelRecord, f: HardFilters): FilterTrace {
30
+ const passed: string[] = [];
31
+ const need = (ok: boolean, label: string): string | null => { if (ok) { passed.push(label); return null; } return label; };
32
+ // capabilities (declared-not-verified; fail-closed on unknown)
33
+ const checks: Array<[boolean | undefined, () => boolean, string]> = [
34
+ [f.needsTools, () => isTrue(m.caps.toolCalling), "tool-calling"],
35
+ [f.needsForcedToolChoice, () => isTrue(m.caps.forcedToolChoice), "forced-tool-choice"],
36
+ [f.needsStructured, () => isTrue(m.caps.structuredOutput), "structured-output"],
37
+ [f.strictSchema, () => isTrue(m.caps.jsonSchemaStrict), "strict-schema"],
38
+ ];
39
+ for (const [req, test, label] of checks) if (req) { const fail = need(test(), label); if (fail) return { passed, failed: fail }; }
40
+ if (f.inputModalities?.length) { const fail = need(supersetOf(m.caps.inputModalities, f.inputModalities), `input-modalities[${f.inputModalities.join(",")}]`); if (fail) return { passed, failed: fail }; }
41
+ if (f.outputModalities?.length) { const fail = need(supersetOf(m.caps.outputModalities, f.outputModalities), `output-modalities[${f.outputModalities.join(",")}]`); if (fail) return { passed, failed: fail }; }
42
+ // context — minWindow fail-closed on unknown (don't claim a fit we can't prove)
43
+ if (f.minWindowRequired !== undefined) { const fail = need(m.context.maxWindow.value !== null && m.context.maxWindow.value >= f.minWindowRequired, `min-window>=${f.minWindowRequired}`); if (fail) return { passed, failed: fail }; }
44
+ if (f.minOutputTokens !== undefined && m.context.maxOutput.value !== null) { const fail = need(m.context.maxOutput.value >= f.minOutputTokens, `min-output>=${f.minOutputTokens}`); if (fail) return { passed, failed: fail }; }
45
+ if (f.fidelityFloor && m.context.longCtxFidelity.value && m.context.longCtxFidelity.value !== "unknown") { const fail = need(TIER_RANK[m.context.longCtxFidelity.value] >= TIER_RANK[f.fidelityFloor as Exclude<Tier, "unknown">], `fidelity>=${f.fidelityFloor}`); if (fail) return { passed, failed: fail }; }
46
+ // price caps (price is always known from OpenRouter)
47
+ if (f.maxInputPrice !== undefined) { const fail = need(m.cost.inputPerMtok.value !== null && m.cost.inputPerMtok.value <= f.maxInputPrice, `input-price<=${f.maxInputPrice}`); if (fail) return { passed, failed: fail }; }
48
+ if (f.maxOutputPrice !== undefined) { const fail = need(m.cost.outputPerMtok.value !== null && m.cost.outputPerMtok.value <= f.maxOutputPrice, `output-price<=${f.maxOutputPrice}`); if (fail) return { passed, failed: fail }; }
49
+ // governance — FAIL-CLOSED (unknown excluded), C028
50
+ const p = f.policy;
51
+ if (p?.allowedRegions?.length) { const fail = need(!!m.gov.region.value && p.allowedRegions.includes(m.gov.region.value), "region"); if (fail) return { passed, failed: fail }; }
52
+ if (p?.allowedLicenses?.length) { const fail = need(!!m.gov.license.value && p.allowedLicenses.includes(m.gov.license.value), "license"); if (fail) return { passed, failed: fail }; }
53
+ if (p?.allowedRetention?.length) { const fail = need(m.gov.dataRetention.value !== null && p.allowedRetention.includes(m.gov.dataRetention.value), "data-retention"); if (fail) return { passed, failed: fail }; }
54
+ // the TERMINAL allowlist MEET (a model outside a non-empty allowlist is excluded on ANY grounds)
55
+ if (p?.modelAllowlist?.length) { const fail = need(p.modelAllowlist.includes(m.id), "policy-allowlist"); if (fail) return { passed, failed: fail }; }
56
+ // liveness
57
+ { const fail = need(m.status === "active", "status-active"); if (fail) return { passed, failed: fail }; }
58
+ return { passed, failed: null };
59
+ }
60
+
61
+ /** The intelligence sub-tier the preference points at (taskShape routes the single knob; default = agentic for agents). */
62
+ function intelCell(m: ModelRecord, taskShape?: Preferences["taskShape"]): Cell<Tier> {
63
+ if (taskShape === "coding") return m.intel.coding;
64
+ if (taskShape === "reasoning") return m.intel.reasoning;
65
+ if (taskShape === "agentic") return m.intel.agenticToolUse;
66
+ // default: prefer the agentic tier, fall back to instruction-following then reasoning
67
+ return m.intel.agenticToolUse.value !== null ? m.intel.agenticToolUse : m.intel.instructionFollowing.value !== null ? m.intel.instructionFollowing : m.intel.reasoning;
68
+ }
69
+
70
+ const blendedPrice = (m: ModelRecord): number => (m.cost.inputPerMtok.value ?? 0) * 0.75 + (m.cost.outputPerMtok.value ?? 0) * 0.25;
71
+ /** Normalize a raw numeric across the candidate pool into [1,4]; `higherBetter=false` inverts (cheaper→higher). */
72
+ function norm(values: number[], v: number, higherBetter: boolean): number {
73
+ const min = Math.min(...values), max = Math.max(...values);
74
+ if (max === min) return 2.5;
75
+ const t = (v - min) / (max - min);
76
+ return 1 + 3 * (higherBetter ? t : 1 - t);
77
+ }
78
+
79
+ export function selectModel(reqs: HardFilters, prefs: Preferences, catalog: ModelCatalog): SelectResult {
80
+ const { w, taskShape, implied } = resolvePrefs(prefs);
81
+ const filters = mergeFilters(reqs, implied);
82
+
83
+ const survivors: { m: ModelRecord; passed: string[] }[] = [];
84
+ const excludedBy = new Map<string, number>();
85
+ for (const m of catalog.rows) {
86
+ const t = checkFilters(m, filters);
87
+ if (t.failed) excludedBy.set(t.failed, (excludedBy.get(t.failed) ?? 0) + 1);
88
+ else survivors.push({ m, passed: t.passed });
89
+ }
90
+
91
+ if (survivors.length === 0) {
92
+ const unsatisfiable = [...excludedBy.entries()].sort((a, b) => b[1] - a[1]).map(([k, n]) => `${k} (excluded ${n})`);
93
+ return { ranked: [], candidateCount: 0, unsatisfiable, coverageGaps: [] };
94
+ }
95
+
96
+ const prices = survivors.map((s) => blendedPrice(s.m));
97
+ const headrooms = survivors.map((s) => (s.m.context.maxWindow.value ?? 0) - (filters.minWindowRequired ?? 0));
98
+
99
+ const ranked: RankedModel[] = survivors.map((s, i) => {
100
+ const intel = intelCell(s.m, taskShape);
101
+ const g = {
102
+ intelligence: scoreTier(intel),
103
+ cost: norm(prices, prices[i], false),
104
+ speed: scoreTier(s.m.speed.ttft),
105
+ context: norm(headrooms, headrooms[i], true),
106
+ };
107
+ const score = w.intelligence * g.intelligence + w.cost * g.cost + w.speed * g.speed + w.context * g.context;
108
+ const tierByAxis: RankedModel["why"]["tierByAxis"] = {
109
+ intelligence: { tier: intel.value ?? "unknown", source: intel.source, asOf: intel.asOf },
110
+ latency: { tier: s.m.speed.ttft.value ?? "unknown", source: s.m.speed.ttft.source, asOf: s.m.speed.ttft.asOf },
111
+ cost: { tier: `${blendedPrice(s.m).toFixed(2)} $/Mtok blended`, source: s.m.cost.inputPerMtok.source, asOf: s.m.cost.inputPerMtok.asOf },
112
+ };
113
+ const decidingPreference = (["intelligence", "cost", "speed", "context"] as Array<keyof typeof w>).sort((a, b) => w[b] - w[a])[0];
114
+ return { id: s.m.id, provider: s.m.provider, score, why: { passedFilters: s.passed, decidingPreference: `${decidingPreference} (weight ${w[decidingPreference]})`, tierByAxis } };
115
+ }).sort((a, b) => b.score - a.score);
116
+
117
+ // coverage gaps on the winner — soft axes with no data (honesty surface)
118
+ const winner = survivors.find((s) => s.m.id === ranked[0].id)!.m;
119
+ const coverageGaps: string[] = [];
120
+ if (intelCell(winner, taskShape).value === null) coverageGaps.push("intelligence (no public tier for this taskShape)");
121
+ if (winner.speed.ttft.value === null) coverageGaps.push("latency");
122
+ if (winner.context.longCtxFidelity.value === null && (filters.minWindowRequired ?? 0) > 200000) coverageGaps.push("long-context-fidelity (large window, unverified)");
123
+
124
+ return { ranked, candidateCount: survivors.length, coverageGaps };
125
+ }
126
+
127
+ /** Derive HardFilters from an agent/skill's declared needs + the analyzer's load (the C027 seam). */
128
+ export function deriveRequirements(input: { minWindowRequired?: number; hasRoutes?: boolean; needsStructured?: boolean; inputModalities?: string[]; policy?: HardFilters["policy"] }): HardFilters {
129
+ return {
130
+ ...(input.hasRoutes ? { needsTools: true } : {}),
131
+ ...(input.minWindowRequired !== undefined ? { minWindowRequired: input.minWindowRequired } : {}),
132
+ ...(input.needsStructured ? { needsStructured: true } : {}),
133
+ ...(input.inputModalities?.length ? { inputModalities: input.inputModalities } : {}),
134
+ ...(input.policy ? { policy: input.policy } : {}),
135
+ };
136
+ }
package/src/types.ts ADDED
@@ -0,0 +1,117 @@
1
+ /**
2
+ * @suluk/models — the catalog schema (council wf_729cde52-cc7). A row is one model+provider endpoint. Decidable
3
+ * OpenRouter facts are NUMBERS/BOOLS; noisy third-party benchmarks are COARSE TIERS (frontier/strong/mid/basic/
4
+ * unknown) — never a 2-decimal score (that launders noisy/contaminated public data as precision). Every cell carries
5
+ * {source, asOf}; an unsourced cell is MISSING, never a confident value, and a missing tier is NEVER imputed to
6
+ * worst (that would kill new models). The catalog stores NO cross-axis composite — blending is the selector's job
7
+ * at query time under explicit operator weights (storing a blend launders preference as fact).
8
+ */
9
+
10
+ export type Tier = "frontier" | "strong" | "mid" | "basic" | "unknown";
11
+
12
+ /** One catalog value with provenance. `value: null` ⇒ UNKNOWN (and `source` is ""); never imputed. */
13
+ export interface Cell<T> { value: T | null; source: string; asOf: string }
14
+
15
+ export type DataRetention = "zero" | "ephemeral" | "logged" | "trains" | "unknown";
16
+
17
+ export interface ModelRecord {
18
+ /** the OpenRouter id the selector compiles against (stable wire id). */
19
+ id: string;
20
+ provider: string;
21
+ family: string;
22
+ status: "active" | "deprecated" | "sunset" | "preview";
23
+ cost: {
24
+ inputPerMtok: Cell<number>;
25
+ outputPerMtok: Cell<number>;
26
+ cachedInputPerMtok: Cell<number>;
27
+ perRequest: Cell<boolean>;
28
+ };
29
+ context: {
30
+ maxWindow: Cell<number>;
31
+ maxOutput: Cell<number>;
32
+ /** RULER/needle — does the big window actually hold quality? sparse public data ⇒ mostly unknown; NEVER inferred from size. */
33
+ longCtxFidelity: Cell<Tier>;
34
+ };
35
+ /** Artificial-Analysis single-vendor, provider/route/load-dependent — their measurement, not a guarantee. */
36
+ speed: { ttft: Cell<Tier>; throughput: Cell<Tier> };
37
+ /** capabilities are DECLARED-not-verified (provider self-report; we do not self-test). */
38
+ caps: {
39
+ toolCalling: Cell<boolean>;
40
+ forcedToolChoice: Cell<boolean>;
41
+ parallelToolCalls: Cell<boolean>;
42
+ structuredOutput: Cell<boolean>;
43
+ jsonSchemaStrict: Cell<boolean>;
44
+ inputModalities: Cell<string[]>;
45
+ outputModalities: Cell<string[]>;
46
+ };
47
+ /** "intelligence" split into 6 orthogonal-ish, source-separated dimensions (ranked by relevance to tool-using agents). */
48
+ intel: {
49
+ agenticToolUse: Cell<Tier>; // BFCL + tau-bench — rank 1 for suluk agents; thinnest coverage
50
+ instructionFollowing: Cell<Tier>; // IFEval
51
+ reasoning: Cell<Tier>; // GPQA-Diamond + AIME
52
+ coding: Cell<Tier>; // SWE-bench-Verified (HumanEval secondary)
53
+ longCtxComprehension: Cell<Tier>; // RULER (same datum as context.longCtxFidelity)
54
+ knowledge: Cell<Tier>; // MMLU-Pro (saturation-flagged)
55
+ humanPreference: Cell<Tier>; // LMArena — a SEPARATE cross-witness axis, never summed into a capability tier
56
+ };
57
+ gov: { dataRetention: Cell<DataRetention>; region: Cell<string>; license: Cell<string> };
58
+ ops: { providerFanOut: Cell<number>; popularityRank: Cell<number>; releaseDate: Cell<string>; priceVolatile: Cell<boolean> };
59
+ }
60
+
61
+ export interface ModelCatalog {
62
+ schemaVersion: string;
63
+ generatedAt: string;
64
+ /** content-addressed so a selection is reproducible week-over-week (ties C027 contentHash). */
65
+ snapshotHash: string;
66
+ rows: ModelRecord[];
67
+ }
68
+
69
+ /** Hard requirements — these FILTER (can empty the set ⇒ fail-loud), never rank. */
70
+ export interface HardFilters {
71
+ needsTools?: boolean;
72
+ needsForcedToolChoice?: boolean;
73
+ needsStructured?: boolean;
74
+ strictSchema?: boolean;
75
+ inputModalities?: string[];
76
+ outputModalities?: string[];
77
+ /** the analyzer's per-agent minWindowRequired (context.ts) becomes the hard min-context gate. */
78
+ minWindowRequired?: number;
79
+ minOutputTokens?: number;
80
+ fidelityFloor?: Tier;
81
+ maxInputPrice?: number;
82
+ maxOutputPrice?: number;
83
+ /** C028 governance/allowlist — the TERMINAL, non-overridable MEET (a preference can NEVER widen these). */
84
+ policy?: { modelAllowlist?: string[]; allowedRegions?: string[]; allowedLicenses?: string[]; allowedRetention?: DataRetention[] };
85
+ }
86
+
87
+ export type Profile = "tool-reliable" | "cheap-fast" | "balanced" | "max-reasoning" | "long-context" | "vision";
88
+
89
+ /** Preference — RANKS the survivors. A named profile is the 90% case; the escape hatch is ≤4 small int weights. */
90
+ export interface Preferences {
91
+ profile?: Profile;
92
+ prefer?: { intelligence?: 0 | 1 | 2 | 3; cost?: 0 | 1 | 2 | 3; speed?: 0 | 1 | 2 | 3; context?: 0 | 1 | 2 | 3 };
93
+ /** routes the single "intelligence" knob to the ONE relevant INTEL sub-tier. */
94
+ taskShape?: "agentic" | "coding" | "reasoning";
95
+ }
96
+
97
+ export interface RankedModel {
98
+ id: string;
99
+ provider: string;
100
+ score: number;
101
+ why: {
102
+ passedFilters: string[];
103
+ decidingPreference: string;
104
+ tierByAxis: Record<string, { tier: Tier | string; source: string; asOf: string }>;
105
+ };
106
+ }
107
+
108
+ export interface SelectResult {
109
+ /** ranked best-first; empty when no model satisfies the hard filters. */
110
+ ranked: RankedModel[];
111
+ /** the count after hard filtering. */
112
+ candidateCount: number;
113
+ /** present when the requirements emptied the set — names the unsatisfiable filter(s). */
114
+ unsatisfiable?: string[];
115
+ /** UNKNOWN-coverage warning: soft axes with no data on the winner (honesty surface). */
116
+ coverageGaps: string[];
117
+ }
@@ -0,0 +1,51 @@
1
+ import { test, expect, describe } from "bun:test";
2
+ import { applyBucketing, normalizeOpenRouterModel, normalizeOpenRouter, catalogFrom, snapshotHash, type ORModel } from "../src/index";
3
+
4
+ describe("@suluk/models — tier bucketing rules (the red-line)", () => {
5
+ test("scores bucket per the committed boundaries; null / unknown-axis ⇒ unknown", () => {
6
+ expect(applyBucketing("agenticToolUse", 0.9)).toBe("frontier");
7
+ expect(applyBucketing("agenticToolUse", 0.72)).toBe("strong");
8
+ expect(applyBucketing("agenticToolUse", 0.55)).toBe("mid");
9
+ expect(applyBucketing("agenticToolUse", 0.2)).toBe("basic");
10
+ expect(applyBucketing("agenticToolUse", null)).toBe("unknown");
11
+ expect(applyBucketing("not-an-axis", 0.9)).toBe("unknown");
12
+ expect(applyBucketing("humanPreference", 1400)).toBe("frontier"); // Elo scale, not [0,1]
13
+ });
14
+ });
15
+
16
+ describe("@suluk/models — normalizeOpenRouter (the weekly facts spine)", () => {
17
+ const m: ORModel = {
18
+ id: "anthropic/claude-opus-4", created: 1746057600, context_length: 200000,
19
+ pricing: { prompt: "0.000015", completion: "0.000075", input_cache_read: "0.0000015" },
20
+ top_provider: { max_completion_tokens: 64000 },
21
+ architecture: { input_modalities: ["text", "image"], output_modalities: ["text"] },
22
+ supported_parameters: ["tools", "tool_choice", "structured_outputs", "response_format"],
23
+ };
24
+ const rec = normalizeOpenRouterModel(m, "2026-06-13");
25
+
26
+ test("prices convert per-token → per-Mtok; context + modalities + caps from supported_parameters", () => {
27
+ expect(rec.cost.inputPerMtok.value).toBe(15);
28
+ expect(rec.cost.outputPerMtok.value).toBe(75);
29
+ expect(rec.cost.cachedInputPerMtok.value).toBe(1.5);
30
+ expect(rec.context.maxWindow.value).toBe(200000);
31
+ expect(rec.context.maxOutput.value).toBe(64000);
32
+ expect(rec.caps.toolCalling.value).toBe(true);
33
+ expect(rec.caps.structuredOutput.value).toBe(true);
34
+ expect(rec.caps.inputModalities.value).toEqual(["text", "image"]);
35
+ expect(rec.provider).toBe("anthropic");
36
+ expect(rec.cost.inputPerMtok.source).toBe("openrouter.api");
37
+ });
38
+
39
+ test("benchmark TIER cells are UNKNOWN here (filled by the Class-B pass), never imputed", () => {
40
+ expect(rec.intel.agenticToolUse.value).toBeNull();
41
+ expect(rec.intel.reasoning.value).toBeNull();
42
+ expect(rec.context.longCtxFidelity.value).toBeNull();
43
+ expect(rec.speed.ttft.value).toBeNull();
44
+ });
45
+
46
+ test("catalogFrom is content-addressed + deterministic", () => {
47
+ const rows = normalizeOpenRouter([m], "2026-06-13");
48
+ expect(catalogFrom(rows, "2026-06-13").snapshotHash).toBe(catalogFrom(rows, "2026-06-13").snapshotHash);
49
+ expect(snapshotHash(rows)).toStartWith("sha256-");
50
+ });
51
+ });
@@ -0,0 +1,67 @@
1
+ import { test, expect, describe } from "bun:test";
2
+ import { selectModel, deriveRequirements, SEED_CATALOG } from "../src/index";
3
+
4
+ describe("@suluk/models selectModel — filter then rank", () => {
5
+ test("tool-reliable profile requires tool-calling and ranks agentic-tool-use highest", () => {
6
+ const r = selectModel({}, { profile: "tool-reliable" }, SEED_CATALOG);
7
+ expect(r.ranked.length).toBeGreaterThan(0);
8
+ const top = r.ranked[0];
9
+ expect(top.why.passedFilters).toContain("tool-calling");
10
+ expect(["frontier", "strong"]).toContain(top.why.tierByAxis.intelligence.tier);
11
+ expect(top.why.decidingPreference).toContain("intelligence");
12
+ });
13
+
14
+ test("cheap-fast picks a cheap, fast model (never the premium ones)", () => {
15
+ const r = selectModel({}, { profile: "cheap-fast" }, SEED_CATALOG);
16
+ expect(["openai/gpt-4o-mini", "google/gemini-2.5-flash", "deepseek/deepseek-v3", "meta-llama/llama-4-maverick"]).toContain(r.ranked[0].id);
17
+ expect(["anthropic/claude-opus-4", "openai/gpt-5"]).not.toContain(r.ranked[0].id);
18
+ });
19
+
20
+ test("the analyzer's minWindowRequired is a HARD min-context gate (fail-closed)", () => {
21
+ const r = selectModel({ minWindowRequired: 500000 }, { profile: "balanced" }, SEED_CATALOG);
22
+ const ids = r.ranked.map((x) => x.id).sort();
23
+ expect(ids).toEqual(["google/gemini-2.5-flash", "google/gemini-2.5-pro", "meta-llama/llama-4-maverick"]);
24
+ });
25
+
26
+ test("C028 modelAllowlist is the TERMINAL MEET — a model outside it is excluded on any grounds", () => {
27
+ const r = selectModel({ policy: { modelAllowlist: ["google/gemini-2.5-flash"] } }, { profile: "max-reasoning" }, SEED_CATALOG);
28
+ expect(r.candidateCount).toBe(1);
29
+ expect(r.ranked[0].id).toBe("google/gemini-2.5-flash"); // even though max-reasoning would prefer a frontier reasoner
30
+ });
31
+
32
+ test("governance filters FAIL-CLOSED (unknown/disallowed retention excluded)", () => {
33
+ const r = selectModel({ policy: { allowedRetention: ["zero", "ephemeral"] } }, { profile: "balanced" }, SEED_CATALOG);
34
+ const ids = r.ranked.map((x) => x.id).sort();
35
+ expect(ids).toEqual(["anthropic/claude-opus-4", "anthropic/claude-sonnet-4-6", "meta-llama/llama-4-maverick"]);
36
+ });
37
+
38
+ test("vision profile requires image input (text-only models excluded)", () => {
39
+ const r = selectModel({}, { profile: "vision" }, SEED_CATALOG);
40
+ expect(r.ranked.map((x) => x.id)).not.toContain("deepseek/deepseek-v3");
41
+ });
42
+
43
+ test("FAIL LOUD when requirements empty the set — names the unsatisfiable filter", () => {
44
+ const r = selectModel({ minWindowRequired: 2_000_000 }, { profile: "balanced" }, SEED_CATALOG);
45
+ expect(r.ranked).toEqual([]);
46
+ expect(r.unsatisfiable?.some((u) => u.includes("min-window"))).toBe(true);
47
+ });
48
+
49
+ test("UNKNOWN is surfaced as a coverage gap, never imputed to worst", () => {
50
+ // isolate llama (agentic tool-use = unknown) via the allowlist
51
+ const r = selectModel({ policy: { modelAllowlist: ["meta-llama/llama-4-maverick"] } }, { profile: "tool-reliable" }, SEED_CATALOG);
52
+ expect(r.ranked[0].id).toBe("meta-llama/llama-4-maverick"); // not excluded for unknown agentic
53
+ expect(r.coverageGaps.some((g) => g.startsWith("intelligence"))).toBe(true);
54
+ });
55
+
56
+ test("every ranked result carries a 'why this model' explainer", () => {
57
+ const r = selectModel({}, { profile: "balanced" }, SEED_CATALOG);
58
+ const why = r.ranked[0].why;
59
+ expect(Array.isArray(why.passedFilters)).toBe(true);
60
+ expect(why.decidingPreference).toBeTruthy();
61
+ expect(why.tierByAxis.cost.source).toBe("openrouter.api");
62
+ });
63
+
64
+ test("deriveRequirements maps an agent's structure to hard filters (the C027 seam)", () => {
65
+ expect(deriveRequirements({ hasRoutes: true, minWindowRequired: 120000 })).toEqual({ needsTools: true, minWindowRequired: 120000 });
66
+ });
67
+ });
package/tsconfig.json ADDED
@@ -0,0 +1 @@
1
+ { "extends": "../../tsconfig.base.json", "compilerOptions": { "types": ["bun"] }, "include": ["src", "test"] }