npm - @warmdrift/kgauto-compiler - Versions diffs - 2.0.0-alpha.3 - Mend

@warmdrift/kgauto-compiler 2.0.0-alpha.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18) hide show

package/README.md +198 -0
package/dist/chunk-5TI6PNSK.mjs +95 -0
package/dist/chunk-MBEI5UOM.mjs +409 -0
package/dist/dialect.d.mts +99 -0
package/dist/dialect.d.ts +99 -0
package/dist/dialect.js +127 -0
package/dist/dialect.mjs +22 -0
package/dist/index.d.mts +238 -0
package/dist/index.d.ts +238 -0
package/dist/index.js +1804 -0
package/dist/index.mjs +1286 -0
package/dist/profiles-BiyrF36f.d.mts +489 -0
package/dist/profiles-C5lVqF8_.d.ts +489 -0
package/dist/profiles.d.mts +2 -0
package/dist/profiles.d.ts +2 -0
package/dist/profiles.js +437 -0
package/dist/profiles.mjs +14 -0
package/package.json +59 -0

package/README.md ADDED Viewed

@@ -0,0 +1,198 @@
+# @warmdrift/kgauto-compiler — v2.0.0-alpha.3
+> Prompt compiler + central learning brain for multi-model AI apps.
+> **Swap models without rewriting prompts.**
+Greenfield rewrite of `@warmdrift/kgauto` v1. v1 was a behavioral patcher
+with telemetry; v2 is a real prompt compiler with a self-improving learning
+layer designed for cross-app pollination.
+The "compiler" name is deliberate — every optimization is a pass on a
+structured Intermediate Representation (IR), not string surgery on a
+rendered prompt. This unlocks slicing, dedupe, intent-aware tool relevance,
+target-correct lowering with cache markers, and (in v2.1) outcome-driven
+mutations.
+## Status
+- **Package:** alpha — coexists with v1 (`@warmdrift/kgauto@1.2.0`) under
+  the temporary name `@warmdrift/kgauto-compiler`. Renames to v2 final once
+  v1 is fully retired from production.
+- **Tests:** 132/132 passing
+- **Build:** clean (43KB ESM, 60KB CJS)
+- **Brain:** schema ready (see `brain/migrations/001_initial_schema.sql`);
+  awaiting dedicated Supabase provisioning.
+- **Mutation engine:** v2.1 (after enough outcome data accumulates).
+## Quickstart
+Two entry points. Use whichever fits your call path.
+### `call()` — kgauto owns the network round-trip (alpha.3)
+For plain-fetch consumers who don't already drive the wire themselves. One async call → compiled, executed, normalized, recorded.
+```ts
+import { call, configureBrain } from '@warmdrift/kgauto-compiler';
+configureBrain({ endpoint: 'https://your-app.com/api/kgauto/v2', apiKey: '...' });
+const result = await call({
+  appId: 'my-app',
+  intent: { name: 'search', archetype: 'ask' },
+  sections: [
+    { id: 'role', text: 'You are an assistant.', cacheable: true },
+    { id: 'task', text: userQuestion },
+  ],
+  models: ['claude-sonnet-4-6', 'gemini-2.5-flash'],   // first = primary; rest = fallback chain
+});
+// result.actualModel       → what served (post-fallback)
+// result.requestedModel    → what kgauto initially picked
+// result.response.text     → normalized across providers
+// result.response.tokens   → { input, output, total, cached?, cacheCreated? }
+// result.response.toolCalls → ToolCall[] in normalized shape
+// result.attempts          → retry observability
+```
+API keys default to `process.env.{ANTHROPIC,GOOGLE,OPENAI,DEEPSEEK}_API_KEY`. Override per-call via `apiKeys: { anthropic, google, ... }`. Reach provider-specific fields (Gemini `safetySettings`, Anthropic `tool_choice`, OpenAI `seed`) via `providerOverrides: { google: {...}, anthropic: {...} }` shallow-merged into the lowered request.
+### `compile()` — drive the wire yourself (existing path)
+For consumers who already own provider plumbing (AI SDK adapters, custom retry logic, streaming).
+```ts
+import { compile, configureBrain, record } from '@warmdrift/kgauto-compiler';
+configureBrain({ endpoint: 'https://your-app.com/api/kgauto/v2', apiKey: '...' });
+const result = compile({
+  appId: 'my-app',
+  intent: { name: 'search', archetype: 'ask' },
+  sections: [
+    { id: 'role',  text: 'You are an assistant.', cacheable: true },
+    { id: 'task',  text: userQuestion },
+  ],
+  models: ['claude-sonnet-4-6', 'gemini-2.5-flash'],
+});
+const start = Date.now();
+const response = await callProvider(result.target, result.request);
+await record({
+  handle:    result.handle,
+  tokensIn:  response.usage.input,
+  tokensOut: response.usage.output,
+  latencyMs: Date.now() - start,
+  success:   true,
+  oracleScore: { score: 0.85 },
+});
+```
+## Architecture
+```
+APP (any consumer)
+  ├── kg.compile(IR) ── runs LOCALLY, no network
+  │     ├─ pass: slice (drop sections not for this intent)
+  │     ├─ pass: dedupe (collapse identical sections by hash)
+  │     ├─ pass: tool_relevance (drop tools below intent threshold)
+  │     ├─ pass: compress_history (summarize old turns)
+  │     ├─ pass: score_targets (rank allowed models)
+  │     ├─ pass: apply_cliffs (executable known_failures from profile)
+  │     ├─ pass: lower (target-specific wire format + cache markers)
+  │     └─ pass: validate (fits hard constraints)
+  │
+  ├── app calls provider with the wire request
+  │
+  └── kg.record(handle, outcome) ── async POST to brain
+BRAIN (centralized Supabase)
+  ├── compile_outcomes (multi-tenant from day 1)
+  ├── mutations (active rules — empty in v2.0; engine in v2.1)
+  ├── apps (consumer registry)
+  └── digest_runs (weekly summary audit trail)
+```
+## Dialect-v1 (cross-app vocabulary)
+Apps tag every call with an **intent archetype** (`ask`, `hunt`, `classify`,
+`summarize`, `generate`, `extract`, `plan`, `critique`, `transform`) and the
+compiler computes a **shape signature** (context bucket × tool count × history
+depth × output mode × examples flag).
+The `(archetype, model, shape)` tuple is the **learning key**. Apps that
+declare the same tuple inherit each other's mutations — even apps that have
+never seen each other's data.
+That's how *"what works for the dashboard, should be insights for the next
+dashboard"* becomes mechanical instead of aspirational.
+## Profiles — executable model knowledge
+`profiles.ts` carries every model's capabilities, cliffs, lowering rules, and
+recovery handlers as **executable code** — not prose:
+```ts
+gemini-2.5-flash:
+  cliffs: [
+    { metric: 'input_tokens', threshold: 8_000, action: 'downgrade_quality_warning' },
+    { metric: 'tool_count',   threshold: 20,    action: 'drop_to_top_relevant' },
+    { metric: 'thinking_with_short_output', threshold: 1, action: 'force_thinking_budget_zero' },
+  ],
+  recovery: [
+    { signal: 'empty_response_after_tool', action: 'retry_with_params',
+      retryParams: { 'generationConfig.thinkingConfig.thinkingBudget': 0 } },
+  ],
+  lowering: {
+    cache: { strategy: 'cachedContent', minTokens: 4096, discount: 0.25 },
+    thinking: { field: 'generationConfig.thinkingConfig.thinkingBudget', default: 'auto' },
+  },
+```
+The 5 prod empty-responses in tt-intelligence's `gemini-2.5-flash` dashboard
+calls? v2 catches those automatically — `expectedShortOutput` constraint plus
+the `force_thinking_budget_zero` cliff guard.
+## Brain provisioning
+1. Create a NEW Supabase project (suggested name: `kgauto-brain`)
+2. Apply `brain/migrations/001_initial_schema.sql`
+3. Insert your apps:
+   ```sql
+   insert into apps (id, display_name, api_key_hash)
+     values ('my-app', 'My App', crypt('<bearer>', gen_salt('bf')));
+   ```
+4. Configure each consumer with `configureBrain({ endpoint, apiKey })`
+For staging without a dedicated brain, point consumers at the same Supabase
+they already use — the schema is identical and migration to a dedicated brain
+is a `pg_dump` away.
+## What's next
+- **v2.0.x:** real-app integrations (tt-intelligence, inspire-central,
+  playbacksam, inspirato/incantato). Brain accumulates outcome data.
+- **v2.1:** mutation engine. Shadow-test → statistical gate → promote → auto-rollback.
+- **v2.2:** weekly digest reporting back to the operator.
+- **v2.x:** dialect-v2 expanded with archetypes that emerge from real usage.
+## Why this exists
+The previous version (v1) treated prompts as opaque strings and could only
+*append* behavioral patches. It also tried to learn quality from structural
+signals (token counts) — but quality is semantic, not structural.
+v2 treats prompts as structured IR, makes every model-specific quirk
+*executable* (cliffs, lowering, recovery), and makes oracle scoring a
+first-class contract — so the brain learns from quality data, not its proxies.
+The whole point: every multi-model AI app needs a compiler. Building it
+inline ships one app's value. Building it portable with a shared brain
+ships every app's value to every other app.
+Communicating vessels — finally accurate to the name.
+## License
+MIT. © Warmdrift.

package/dist/chunk-5TI6PNSK.mjs ADDED Viewed

@@ -0,0 +1,95 @@
+// src/dialect.ts
+var DIALECT_VERSION = "v1";
+var INTENT_ARCHETYPES = {
+  ask: {
+    name: "ask",
+    description: "Filter, search, or interrogate existing data",
+    examples: ["filter creators by criteria", "find docs matching query", "lookup a record"]
+  },
+  hunt: {
+    name: "hunt",
+    description: "Discover new entities not in the current dataset",
+    examples: ["find new prospects", "crawl for unindexed sources", "expand a seed list"]
+  },
+  classify: {
+    name: "classify",
+    description: "Assign a category from a finite set",
+    examples: ["intent detection", "sentiment", "route-to-team"]
+  },
+  summarize: {
+    name: "summarize",
+    description: "Compress text or data while preserving meaning",
+    examples: ["dashboard insight", "meeting notes", "briefing"]
+  },
+  generate: {
+    name: "generate",
+    description: "Produce new content from a prompt or template",
+    examples: ["draft email", "create marketing copy", "co-founder conversation"]
+  },
+  extract: {
+    name: "extract",
+    description: "Pull structured data from unstructured input",
+    examples: ["parse invoice", "extract entities", "transcript \u2192 action items"]
+  },
+  plan: {
+    name: "plan",
+    description: "Multi-step decomposition of a goal",
+    examples: ["build a roadmap", "sequence tasks", "break a feature into steps"]
+  },
+  critique: {
+    name: "critique",
+    description: "Quality assessment, review, or scoring",
+    examples: ["code review", "design feedback", "oracle judgment"]
+  },
+  transform: {
+    name: "transform",
+    description: "Change format or style while preserving content",
+    examples: ["markdown \u2192 html", "formal \u2192 casual", "translate"]
+  }
+};
+var ALL_ARCHETYPES = Object.keys(INTENT_ARCHETYPES);
+function isArchetype(name) {
+  return name in INTENT_ARCHETYPES;
+}
+function bucketContext(tokens) {
+  if (tokens < 1e3) return "tiny";
+  if (tokens < 4e3) return "small";
+  if (tokens < 16e3) return "medium";
+  if (tokens < 64e3) return "large";
+  return "huge";
+}
+function bucketToolCount(count) {
+  if (count === 0) return "none";
+  if (count <= 5) return "few";
+  if (count <= 20) return "many";
+  return "massive";
+}
+function bucketHistory(turnCount) {
+  if (turnCount <= 1) return "single_turn";
+  if (turnCount <= 6) return "short";
+  return "long";
+}
+function hashShape(s) {
+  return [
+    s.contextBucket,
+    s.toolCountBucket,
+    s.historyDepth,
+    s.outputMode,
+    s.hasExamples ? "ex" : "no_ex"
+  ].join("-");
+}
+function learningKey(archetype, model, shape) {
+  return `${DIALECT_VERSION}::${archetype}::${model}::${hashShape(shape)}`;
+}
+export {
+  DIALECT_VERSION,
+  INTENT_ARCHETYPES,
+  ALL_ARCHETYPES,
+  isArchetype,
+  bucketContext,
+  bucketToolCount,
+  bucketHistory,
+  hashShape,
+  learningKey
+};

package/dist/chunk-MBEI5UOM.mjs ADDED Viewed

@@ -0,0 +1,409 @@
+// src/profiles.ts
+var ANTHROPIC_LOWERING_BASE = {
+  system: { mode: "inline" },
+  cache: {
+    strategy: "cache_control",
+    minTokens: 1024,
+    discount: 0.1,
+    ttlSeconds: 300
+  },
+  tools: { format: "anthropic" }
+};
+var GOOGLE_LOWERING_BASE = {
+  system: { mode: "separate", field: "systemInstruction" },
+  cache: {
+    strategy: "cachedContent",
+    minTokens: 4096,
+    discount: 0.25,
+    ttlSeconds: 3600
+  },
+  tools: { format: "google" }
+};
+var PROFILES_RAW = [
+  // ── Anthropic ──
+  {
+    id: "claude-opus-4-7",
+    verifiedAgainstDocs: "2026-05-08",
+    provider: "anthropic",
+    status: "current",
+    maxContextTokens: 1e6,
+    maxOutputTokens: 128e3,
+    maxTools: 64,
+    parallelToolCalls: true,
+    structuredOutput: "grammar",
+    systemPromptMode: "inline",
+    streaming: true,
+    cliffs: [],
+    costInputPer1m: 5,
+    costOutputPer1m: 25,
+    lowering: ANTHROPIC_LOWERING_BASE,
+    recovery: [
+      {
+        signal: "rate_limit",
+        action: "escalate",
+        reason: "429 from Anthropic \u2014 escalate to fallback chain"
+      },
+      {
+        signal: "model_not_found",
+        action: "escalate",
+        reason: "Model deprecated/renamed \u2014 escalate (L-061)"
+      }
+    ],
+    strengths: ["reasoning", "agentic_coding", "long_context", "reliable_tool_use", "structured_output"],
+    weaknesses: ["cost", "latency"],
+    notes: "Frontier (2026-05). Step-change improvement over 4.6 in agentic coding. Adaptive thinking only \u2014 no extended-thinking toggle. 1M context, 128k max output."
+  },
+  {
+    id: "claude-opus-4-6",
+    verifiedAgainstDocs: "2026-05-08",
+    provider: "anthropic",
+    status: "legacy",
+    maxContextTokens: 1e6,
+    maxOutputTokens: 128e3,
+    maxTools: 64,
+    parallelToolCalls: true,
+    structuredOutput: "grammar",
+    systemPromptMode: "inline",
+    streaming: true,
+    cliffs: [],
+    costInputPer1m: 5,
+    costOutputPer1m: 25,
+    lowering: ANTHROPIC_LOWERING_BASE,
+    recovery: [
+      {
+        signal: "rate_limit",
+        action: "escalate",
+        reason: "429 from Anthropic \u2014 escalate to fallback chain"
+      },
+      {
+        signal: "model_not_found",
+        action: "escalate",
+        reason: "Model deprecated/renamed \u2014 escalate (L-061)"
+      }
+    ],
+    strengths: ["reasoning", "long_context", "reliable_tool_use", "structured_output", "extended_thinking"],
+    weaknesses: ["cost", "latency"],
+    notes: "Predecessor to 4.7. Still current in Anthropic legacy table. Same pricing as 4.7 \u2014 choose 4.7 unless you need extended-thinking budget control (4.7 is adaptive-only)."
+  },
+  {
+    id: "claude-sonnet-4-6",
+    verifiedAgainstDocs: "2026-05-08",
+    provider: "anthropic",
+    status: "current",
+    maxContextTokens: 1e6,
+    maxOutputTokens: 64e3,
+    maxTools: 64,
+    parallelToolCalls: true,
+    structuredOutput: "grammar",
+    systemPromptMode: "inline",
+    streaming: true,
+    cliffs: [],
+    costInputPer1m: 3,
+    costOutputPer1m: 15,
+    lowering: ANTHROPIC_LOWERING_BASE,
+    recovery: [
+      { signal: "rate_limit", action: "escalate", reason: "429 \u2014 escalate" },
+      { signal: "model_not_found", action: "escalate", reason: "Deprecated \u2014 escalate (L-061)" }
+    ],
+    strengths: ["quality", "tool_use", "long_context", "cache_friendly", "extended_thinking"],
+    weaknesses: [],
+    notes: "Workhorse. Best price/quality for most multi-turn agentic work. 1M context, 64k max output."
+  },
+  {
+    id: "claude-haiku-4-5",
+    verifiedAgainstDocs: "2026-05-08",
+    provider: "anthropic",
+    status: "current",
+    maxContextTokens: 2e5,
+    maxOutputTokens: 64e3,
+    maxTools: 32,
+    parallelToolCalls: true,
+    structuredOutput: "grammar",
+    systemPromptMode: "inline",
+    streaming: true,
+    cliffs: [
+      {
+        metric: "tool_count",
+        threshold: 16,
+        action: "drop_to_top_relevant",
+        reason: "Haiku reliability degrades above ~16 tools"
+      }
+    ],
+    costInputPer1m: 1,
+    costOutputPer1m: 5,
+    lowering: ANTHROPIC_LOWERING_BASE,
+    recovery: [
+      { signal: "rate_limit", action: "escalate", reason: "429 \u2014 escalate to Sonnet" }
+    ],
+    strengths: ["speed", "cost", "classification", "cache_friendly", "extended_thinking"],
+    weaknesses: ["complex_reasoning", "large_tool_sets"],
+    notes: "Cheapest Anthropic. Great for classify, summarize, ask shapes. 200k context, 64k max output. API alias `claude-haiku-4-5` resolves to dated snapshot `claude-haiku-4-5-20251001`."
+  },
+  // ── Google ──
+  {
+    id: "gemini-2.5-flash",
+    verifiedAgainstDocs: "2026-05-08",
+    provider: "google",
+    status: "current",
+    maxContextTokens: 1048576,
+    maxOutputTokens: 65535,
+    maxTools: 128,
+    parallelToolCalls: true,
+    structuredOutput: "native",
+    systemPromptMode: "separate",
+    streaming: true,
+    cliffs: [
+      {
+        metric: "input_tokens",
+        threshold: 8e3,
+        action: "downgrade_quality_warning",
+        reason: "Quality degrades significantly above ~8K context tokens"
+      },
+      {
+        metric: "tool_count",
+        threshold: 20,
+        action: "drop_to_top_relevant",
+        reason: "Tool reliability drops above ~20 tools (despite 128 hard limit)"
+      },
+      {
+        metric: "thinking_with_short_output",
+        threshold: 1,
+        action: "force_thinking_budget_zero",
+        reason: "Thinking tokens consume maxOutputTokens \u2014 empty response if drained"
+      },
+      {
+        // s11 trust artifact (2026-05-02): brain showed 5/5 empty rate on
+        // tt-intelligence/summarize/gemini-2.5-flash with tools offered.
+        // v1's disable_thinking_for_short_output already fired and didn't
+        // help — disabling thinking is necessary but not sufficient. Tools
+        // present + summarize intent confuses Flash into a no-output state
+        // (likely tool-decision purgatory). Strip tools entirely for this
+        // archetype on this model.
+        metric: "tool_count",
+        threshold: 1,
+        whenIntent: "summarize",
+        action: "strip_tools",
+        reason: "Gemini Flash returns empty when summarize intent has tools offered (5/5 empty rate observed in v1 prod 2026-04-19, replayed into v2 brain 2026-04-29)"
+      }
+    ],
+    costInputPer1m: 0.3,
+    costOutputPer1m: 2.5,
+    lowering: {
+      ...GOOGLE_LOWERING_BASE,
+      thinking: { field: "generationConfig.thinkingConfig.thinkingBudget", default: "auto" }
+    },
+    recovery: [
+      {
+        signal: "empty_response_after_tool",
+        action: "retry_with_params",
+        retryParams: { "generationConfig.thinkingConfig.thinkingBudget": 0 },
+        maxRetries: 1,
+        reason: "Known: empty after tool result \u2014 retry with thinking off"
+      },
+      {
+        signal: "empty_response",
+        action: "retry_with_params",
+        retryParams: { "generationConfig.thinkingConfig.thinkingBudget": 0 },
+        maxRetries: 1,
+        reason: "Empty response \u2014 try with thinking off"
+      },
+      {
+        signal: "malformed_function_call",
+        action: "escalate",
+        reason: "MALFORMED_FUNCTION_CALL maps to stop \u2014 escalate to next target"
+      }
+    ],
+    strengths: ["speed", "volume", "classification", "1m_context", "cost"],
+    weaknesses: ["complex_schemas", "large_tool_sets", "high_context_quality"],
+    notes: "Fast and cheap with 1M context. Quality cliffs at 8K context and 20 tools \u2014 guard with cliffs."
+  },
+  {
+    id: "gemini-2.5-pro",
+    verifiedAgainstDocs: "2026-05-08",
+    provider: "google",
+    status: "current",
+    maxContextTokens: 1048576,
+    maxOutputTokens: 65535,
+    maxTools: 128,
+    parallelToolCalls: true,
+    structuredOutput: "native",
+    systemPromptMode: "separate",
+    streaming: true,
+    cliffs: [
+      {
+        metric: "input_tokens",
+        threshold: 2e5,
+        action: "downgrade_quality_warning",
+        reason: "Pricing doubles above 200K: input $1.25\u2192$2.50/M, output $10\u2192$15/M"
+      }
+    ],
+    costInputPer1m: 1.25,
+    costOutputPer1m: 10,
+    lowering: {
+      ...GOOGLE_LOWERING_BASE,
+      thinking: { field: "generationConfig.thinkingConfig.thinkingBudget", default: "auto" }
+    },
+    recovery: [
+      {
+        signal: "malformed_function_call",
+        action: "escalate",
+        reason: "MALFORMED_FUNCTION_CALL \u2014 escalate"
+      }
+    ],
+    strengths: ["reasoning", "1m_context", "structured_output", "tool_use"],
+    weaknesses: ["pricing_above_200k"]
+  },
+  {
+    id: "gemini-3.1-pro-preview",
+    verifiedAgainstDocs: "2026-05-08",
+    provider: "google",
+    status: "preview",
+    maxContextTokens: 1048576,
+    maxOutputTokens: 65535,
+    maxTools: 128,
+    parallelToolCalls: true,
+    structuredOutput: "native",
+    systemPromptMode: "separate",
+    streaming: true,
+    cliffs: [
+      {
+        metric: "input_tokens",
+        threshold: 2e5,
+        action: "downgrade_quality_warning",
+        reason: "Pricing doubles above 200K: input $2\u2192$4/M, output $12\u2192$18/M"
+      }
+    ],
+    costInputPer1m: 2,
+    costOutputPer1m: 12,
+    lowering: {
+      ...GOOGLE_LOWERING_BASE,
+      cache: { ...GOOGLE_LOWERING_BASE.cache, discount: 0.1 },
+      thinking: { field: "generationConfig.thinkingConfig.thinkingBudget", default: "auto" }
+    },
+    recovery: [
+      {
+        signal: "malformed_function_call",
+        action: "escalate",
+        reason: "MALFORMED_FUNCTION_CALL \u2014 escalate"
+      }
+    ],
+    strengths: ["reasoning", "1m_context", "agentic_coding", "structured_output", "tool_use"],
+    weaknesses: ["cost", "preview_status", "pricing_above_200k"],
+    notes: "Frontier Gemini (preview, 2026-Q2). Step-change agentic coding per Google. Cache discount 10\xD7 (vs 4\xD7 for 2.5 Pro). Use status=preview to flag rollback path until GA."
+  },
+  // ── DeepSeek ──
+  // 2026-05-08 audit (L-073): DeepSeek's `deepseek-chat` was silently aliased
+  // to `deepseek-v4-flash` non-thinking mode. Old kgauto profile claimed 64k
+  // context + $0.27/$1.10 — actual is 1M context + $0.14/$0.28. Now modeled
+  // as: V4-Flash + V4-Pro as canonical profiles; deepseek-chat and
+  // deepseek-reasoner registered as aliases (see ALIASES below).
+  {
+    id: "deepseek-v4-flash",
+    verifiedAgainstDocs: "2026-05-08",
+    provider: "deepseek",
+    status: "current",
+    maxContextTokens: 1e6,
+    maxOutputTokens: 384e3,
+    maxTools: 16,
+    parallelToolCalls: false,
+    structuredOutput: "native",
+    systemPromptMode: "inline",
+    streaming: true,
+    cliffs: [
+      {
+        metric: "tool_count",
+        threshold: 1,
+        action: "drop_to_top_relevant",
+        reason: "Sequential tool calls only \u2014 L-040"
+      }
+    ],
+    costInputPer1m: 0.14,
+    costOutputPer1m: 0.28,
+    lowering: {
+      system: { mode: "inline" },
+      cache: { strategy: "unsupported" },
+      tools: { format: "deepseek" }
+    },
+    recovery: [
+      { signal: "rate_limit", action: "escalate", reason: "429 \u2014 escalate" }
+    ],
+    strengths: ["cost", "1m_context", "json_output", "code", "reasoning"],
+    weaknesses: ["parallel_tools", "large_tool_sets"],
+    notes: "Cheap workhorse. 1M context, 384k max output. Cache-hit input $0.0028/M (1/50\xD7 of miss). Aliased as `deepseek-chat` (non-thinking) and `deepseek-reasoner` (thinking) \u2014 see ALIASES."
+  },
+  {
+    id: "deepseek-v4-pro",
+    verifiedAgainstDocs: "2026-05-08",
+    provider: "deepseek",
+    status: "current",
+    maxContextTokens: 1e6,
+    maxOutputTokens: 384e3,
+    maxTools: 16,
+    parallelToolCalls: false,
+    structuredOutput: "native",
+    systemPromptMode: "inline",
+    streaming: true,
+    cliffs: [
+      {
+        metric: "tool_count",
+        threshold: 1,
+        action: "drop_to_top_relevant",
+        reason: "Sequential tool calls only \u2014 L-040"
+      }
+    ],
+    // Profile carries REGULAR pricing, not the 75%-off promo (ends 2026-05-31).
+    // Under-estimating cost is worse than over-estimating for budget caps.
+    costInputPer1m: 1.74,
+    costOutputPer1m: 3.48,
+    lowering: {
+      system: { mode: "inline" },
+      cache: { strategy: "unsupported" },
+      tools: { format: "deepseek" }
+    },
+    recovery: [
+      { signal: "rate_limit", action: "escalate", reason: "429 \u2014 escalate" }
+    ],
+    strengths: ["quality", "reasoning", "1m_context", "json_output", "code", "extended_thinking"],
+    weaknesses: ["parallel_tools", "large_tool_sets"],
+    notes: "Pro tier. 1M context, 384k max output. Regular pricing $1.74/$3.48; 75% promo through 2026-05-31 ($0.435/$0.87). Default mode = thinking."
+  }
+];
+var ALIASES = {
+  // DeepSeek's own model routing — both names served by V4-Flash.
+  "deepseek-chat": "deepseek-v4-flash",
+  "deepseek-reasoner": "deepseek-v4-flash",
+  // Legacy kgauto typo — actual API alias is dash-form (alpha.1 had dot).
+  "claude-haiku-4.5": "claude-haiku-4-5"
+};
+function canonicalId(id) {
+  return ALIASES[id] ?? id;
+}
+var PROFILE_INDEX = new Map(
+  PROFILES_RAW.map((p) => [p.id, p])
+);
+function getProfile(id) {
+  const canonical = canonicalId(id);
+  const p = PROFILE_INDEX.get(canonical);
+  if (!p) {
+    const known = [...PROFILE_INDEX.keys(), ...Object.keys(ALIASES)].join(", ");
+    throw new Error(`Unknown model id: "${id}". Known: ${known}`);
+  }
+  return p;
+}
+function tryGetProfile(id) {
+  return PROFILE_INDEX.get(canonicalId(id));
+}
+function allProfiles() {
+  return PROFILES_RAW;
+}
+function profilesByProvider(provider) {
+  return PROFILES_RAW.filter((p) => p.provider === provider);
+}
+export {
+  ALIASES,
+  getProfile,
+  tryGetProfile,
+  allProfiles,
+  profilesByProvider
+};