@warmdrift/kgauto-compiler 2.0.0-alpha.3

package/README.md ADDED
@@ -0,0 +1,198 @@
1
+ # @warmdrift/kgauto-compiler — v2.0.0-alpha.3
2
+
3
+ > Prompt compiler + central learning brain for multi-model AI apps.
4
+ > **Swap models without rewriting prompts.**
5
+
6
+ Greenfield rewrite of `@warmdrift/kgauto` v1. v1 was a behavioral patcher
7
+ with telemetry; v2 is a real prompt compiler with a self-improving learning
8
+ layer designed for cross-app pollination.
9
+
10
+ The "compiler" name is deliberate — every optimization is a pass on a
11
+ structured Intermediate Representation (IR), not string surgery on a
12
+ rendered prompt. This unlocks slicing, dedupe, intent-aware tool relevance,
13
+ target-correct lowering with cache markers, and (in v2.1) outcome-driven
14
+ mutations.
15
+
16
+ ## Status
17
+
18
+ - **Package:** alpha — coexists with v1 (`@warmdrift/kgauto@1.2.0`) under
19
+ the temporary name `@warmdrift/kgauto-compiler`. Renames to v2 final once
20
+ v1 is fully retired from production.
21
+ - **Tests:** 132/132 passing
22
+ - **Build:** clean (43KB ESM, 60KB CJS)
23
+ - **Brain:** schema ready (see `brain/migrations/001_initial_schema.sql`);
24
+ awaiting dedicated Supabase provisioning.
25
+ - **Mutation engine:** v2.1 (after enough outcome data accumulates).
26
+
27
+ ## Quickstart
28
+
29
+ Two entry points. Use whichever fits your call path.
30
+
31
+ ### `call()` — kgauto owns the network round-trip (alpha.3)
32
+
33
+ For plain-fetch consumers who don't already drive the wire themselves. One async call → compiled, executed, normalized, recorded.
34
+
35
+ ```ts
36
+ import { call, configureBrain } from '@warmdrift/kgauto-compiler';
37
+
38
+ configureBrain({ endpoint: 'https://your-app.com/api/kgauto/v2', apiKey: '...' });
39
+
40
+ const result = await call({
41
+ appId: 'my-app',
42
+ intent: { name: 'search', archetype: 'ask' },
43
+ sections: [
44
+ { id: 'role', text: 'You are an assistant.', cacheable: true },
45
+ { id: 'task', text: userQuestion },
46
+ ],
47
+ models: ['claude-sonnet-4-6', 'gemini-2.5-flash'], // first = primary; rest = fallback chain
48
+ });
49
+
50
+ // result.actualModel → what served (post-fallback)
51
+ // result.requestedModel → what kgauto initially picked
52
+ // result.response.text → normalized across providers
53
+ // result.response.tokens → { input, output, total, cached?, cacheCreated? }
54
+ // result.response.toolCalls → ToolCall[] in normalized shape
55
+ // result.attempts → retry observability
56
+ ```
57
+
58
+ API keys default to `process.env.{ANTHROPIC,GOOGLE,OPENAI,DEEPSEEK}_API_KEY`. Override per-call via `apiKeys: { anthropic, google, ... }`. Reach provider-specific fields (Gemini `safetySettings`, Anthropic `tool_choice`, OpenAI `seed`) via `providerOverrides: { google: {...}, anthropic: {...} }` shallow-merged into the lowered request.
59
+
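"Shallow-merged" matters when an override touches a nested object. The sketch below assumes the merge happens at the top level of the lowered request, which is the usual meaning of the term; `WireRequest` and `shallowMergeOverrides` are hypothetical names for illustration, not part of the package API.

```typescript
// Hypothetical type and helper: not part of the kgauto API.
type WireRequest = Record<string, unknown>;

function shallowMergeOverrides(
  lowered: WireRequest,
  overrides: WireRequest = {},
): WireRequest {
  // Top-level keys from `overrides` replace those in `lowered`.
  // Nested objects are swapped wholesale, not merged key by key.
  return { ...lowered, ...overrides };
}

const loweredRequest: WireRequest = {
  model: 'gemini-2.5-flash',
  generationConfig: { temperature: 0.2, maxOutputTokens: 512 },
};

const mergedRequest = shallowMergeOverrides(loweredRequest, {
  generationConfig: { temperature: 0 },
  safetySettings: [],
});

console.log(JSON.stringify(mergedRequest));
// {"model":"gemini-2.5-flash","generationConfig":{"temperature":0},"safetySettings":[]}
```

If the merge does behave this way, an override for a nested field should restate every sibling field it wants to keep (here, `maxOutputTokens`).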
60
+ ### `compile()` — drive the wire yourself (existing path)
61
+
62
+ For consumers who already own provider plumbing (AI SDK adapters, custom retry logic, streaming).
63
+
64
+ ```ts
65
+ import { compile, configureBrain, record } from '@warmdrift/kgauto-compiler';
66
+
67
+ configureBrain({ endpoint: 'https://your-app.com/api/kgauto/v2', apiKey: '...' });
68
+
69
+ const result = compile({
70
+ appId: 'my-app',
71
+ intent: { name: 'search', archetype: 'ask' },
72
+ sections: [
73
+ { id: 'role', text: 'You are an assistant.', cacheable: true },
74
+ { id: 'task', text: userQuestion },
75
+ ],
76
+ models: ['claude-sonnet-4-6', 'gemini-2.5-flash'],
77
+ });
78
+
79
+ const start = Date.now();
80
+ const response = await callProvider(result.target, result.request);
81
+
82
+ await record({
83
+ handle: result.handle,
84
+ tokensIn: response.usage.input,
85
+ tokensOut: response.usage.output,
86
+ latencyMs: Date.now() - start,
87
+ success: true,
88
+ oracleScore: { score: 0.85 },
89
+ });
90
+ ```
91
+
92
+ ## Architecture
93
+
94
+ ```
95
+ APP (any consumer)
96
+ ├── kg.compile(IR) ── runs LOCALLY, no network
97
+ │ ├─ pass: slice (drop sections not for this intent)
98
+ │ ├─ pass: dedupe (collapse identical sections by hash)
99
+ │ ├─ pass: tool_relevance (drop tools below intent threshold)
100
+ │ ├─ pass: compress_history (summarize old turns)
101
+ │ ├─ pass: score_targets (rank allowed models)
102
+ │ ├─ pass: apply_cliffs (executable known_failures from profile)
103
+ │ ├─ pass: lower (target-specific wire format + cache markers)
104
+ │ └─ pass: validate (fits hard constraints)
105
+
106
+ ├── app calls provider with the wire request
107
+
108
+ └── kg.record(handle, outcome) ── async POST to brain
109
+
110
+ BRAIN (centralized Supabase)
111
+ ├── compile_outcomes (multi-tenant from day 1)
112
+ ├── mutations (active rules — empty in v2.0; engine in v2.1)
113
+ ├── apps (consumer registry)
114
+ └── digest_runs (weekly summary audit trail)
115
+ ```
116
+
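The pass list in the diagram can be read as a chain of pure functions over the IR. The sketch below shows only `slice` and `dedupe`, under stated assumptions: the `intents` field, the `IR` and `Pass` types, and `runPasses` are hypothetical, and dedupe keys on the raw section text as a stand-in for the hash the diagram mentions.

```typescript
// Hypothetical IR shape: only id/text/cacheable appear in the Quickstart.
interface Section {
  id: string;
  text: string;
  cacheable?: boolean;
  intents?: string[]; // assumed: when present, the section applies only to these intents
}

interface IR {
  intent: string;
  sections: Section[];
}

type Pass = (ir: IR) => IR;

// slice: keep unrestricted sections and sections that list the current intent.
const slice: Pass = (ir) => ({
  ...ir,
  sections: ir.sections.filter(
    (s) => s.intents === undefined || s.intents.indexOf(ir.intent) !== -1,
  ),
});

// dedupe: keep the first occurrence of each distinct section text.
const dedupe: Pass = (ir) => {
  const seen: Record<string, boolean> = Object.create(null);
  return {
    ...ir,
    sections: ir.sections.filter((s) => {
      if (seen[s.text]) return false;
      seen[s.text] = true;
      return true;
    }),
  };
};

// Run passes left to right, feeding each result into the next pass.
const runPasses = (ir: IR, passes: Pass[]): IR =>
  passes.reduce((acc, pass) => pass(acc), ir);

const compiledIr = runPasses(
  {
    intent: 'search',
    sections: [
      { id: 'role', text: 'You are an assistant.', cacheable: true },
      { id: 'role-copy', text: 'You are an assistant.' },
      { id: 'export-rules', text: 'Format as CSV.', intents: ['export'] },
      { id: 'task', text: 'Find docs about caching.' },
    ],
  },
  [slice, dedupe],
);

console.log(compiledIr.sections.map((s) => s.id).join(','));
// role,task
```

Because each pass returns a new IR rather than editing a rendered string, passes can be reordered or tested in isolation.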
117
+ ## Dialect-v1 (cross-app vocabulary)
118
+
119
+ Apps tag every call with an **intent archetype** (`ask`, `hunt`, `classify`,
120
+ `summarize`, `generate`, `extract`, `plan`, `critique`, `transform`) and the
121
+ compiler computes a **shape signature** (context bucket × tool count × history
122
+ depth × output mode × examples flag).
123
+
124
+ The `(archetype, model, shape)` tuple is the **learning key**. Apps that
125
+ declare the same tuple inherit each other's mutations — even apps that have
126
+ never seen each other's data.
127
+
128
+ That's how *"what works for the dashboard, should be insights for the next
129
+ dashboard"* becomes mechanical instead of aspirational.
130
+
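As a concrete illustration, the key format below mirrors `hashShape` and `learningKey` from `src/dialect.ts` in this release, with types added. The `outputMode` value `'text'` is an assumption, since the set of output modes is not shown here.

```typescript
const DIALECT_VERSION = 'v1';

interface Shape {
  contextBucket: string;
  toolCountBucket: string;
  historyDepth: string;
  outputMode: string; // assumed value below; the allowed set is not shown in this release diff
  hasExamples: boolean;
}

// Joins the five shape dimensions into one string.
function hashShape(s: Shape): string {
  return [
    s.contextBucket,
    s.toolCountBucket,
    s.historyDepth,
    s.outputMode,
    s.hasExamples ? 'ex' : 'no_ex',
  ].join('-');
}

function learningKey(archetype: string, model: string, shape: Shape): string {
  return `${DIALECT_VERSION}::${archetype}::${model}::${hashShape(shape)}`;
}

const dashboardShape: Shape = {
  contextBucket: 'small',
  toolCountBucket: 'none',
  historyDepth: 'single_turn',
  outputMode: 'text',
  hasExamples: false,
};

// Two unrelated apps that declare the same archetype, model, and shape.
const keyFromAppA = learningKey('summarize', 'gemini-2.5-flash', dashboardShape);
const keyFromAppB = learningKey('summarize', 'gemini-2.5-flash', { ...dashboardShape });

console.log(keyFromAppA);
// v1::summarize::gemini-2.5-flash::small-none-single_turn-text-no_ex
console.log(keyFromAppA === keyFromAppB);
// true
```

The app id is not part of the key, which is what lets outcomes recorded by one app apply to another.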
131
+ ## Profiles — executable model knowledge
132
+
133
+ `profiles.ts` carries every model's capabilities, cliffs, lowering rules, and
134
+ recovery handlers as **executable code** — not prose:
135
+
136
+ ```ts
137
+ gemini-2.5-flash:
138
+ cliffs: [
139
+ { metric: 'input_tokens', threshold: 8_000, action: 'downgrade_quality_warning' },
140
+ { metric: 'tool_count', threshold: 20, action: 'drop_to_top_relevant' },
141
+ { metric: 'thinking_with_short_output', threshold: 1, action: 'force_thinking_budget_zero' },
142
+ ],
143
+ recovery: [
144
+ { signal: 'empty_response_after_tool', action: 'retry_with_params',
145
+ retryParams: { 'generationConfig.thinkingConfig.thinkingBudget': 0 } },
146
+ ],
147
+ lowering: {
148
+ cache: { strategy: 'cachedContent', minTokens: 4096, discount: 0.25 },
149
+ thinking: { field: 'generationConfig.thinkingConfig.thinkingBudget', default: 'auto' },
150
+ },
151
+ ```
152
+
153
+ The 5 prod empty-responses in tt-intelligence's `gemini-2.5-flash` dashboard
154
+ calls? v2 catches those automatically — `expectedShortOutput` constraint plus
155
+ the `force_thinking_budget_zero` cliff guard.
156
+
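A minimal sketch of how cliff entries like the ones above could be evaluated. The `Cliff` fields come from the profile snippet, but `triggeredActions`, the `Metrics` type, and the "at or above threshold" comparison are assumptions; the actual `apply_cliffs` pass may differ.

```typescript
interface Cliff {
  metric: string;
  threshold: number;
  action: string;
  whenIntent?: string; // optional intent filter, as used in profiles.ts
}

type Metrics = Record<string, number>;

// Assumption: a cliff fires when its metric is at or above the threshold.
function triggeredActions(cliffs: Cliff[], metrics: Metrics, intent: string): string[] {
  return cliffs
    .filter((c) => c.whenIntent === undefined || c.whenIntent === intent)
    .filter((c) => (metrics[c.metric] ?? 0) >= c.threshold)
    .map((c) => c.action);
}

const flashCliffs: Cliff[] = [
  { metric: 'input_tokens', threshold: 8_000, action: 'downgrade_quality_warning' },
  { metric: 'tool_count', threshold: 20, action: 'drop_to_top_relevant' },
  { metric: 'thinking_with_short_output', threshold: 1, action: 'force_thinking_budget_zero' },
];

const firedActions = triggeredActions(
  flashCliffs,
  { input_tokens: 12_000, tool_count: 6, thinking_with_short_output: 1 },
  'summarize',
);

console.log(firedActions.join(','));
// downgrade_quality_warning,force_thinking_budget_zero
```

Keeping cliffs as data means a new model quirk becomes one more array entry rather than a new code path.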
157
+ ## Brain provisioning
158
+
159
+ 1. Create a NEW Supabase project (suggested name: `kgauto-brain`)
160
+ 2. Apply `brain/migrations/001_initial_schema.sql`
161
+ 3. Insert your apps:
162
+ ```sql
163
+ insert into apps (id, display_name, api_key_hash)
164
+ values ('my-app', 'My App', crypt('<bearer>', gen_salt('bf')));
165
+ ```
166
+ 4. Configure each consumer with `configureBrain({ endpoint, apiKey })`
167
+
168
+ For staging without a dedicated brain, point consumers at the same Supabase
169
+ they already use — the schema is identical and migration to a dedicated brain
170
+ is a `pg_dump` away.
171
+
172
+ ## What's next
173
+
174
+ - **v2.0.x:** real-app integrations (tt-intelligence, inspire-central,
175
+ playbacksam, inspirato/incantato). Brain accumulates outcome data.
176
+ - **v2.1:** mutation engine. Shadow-test → statistical gate → promote → auto-rollback.
177
+ - **v2.2:** weekly digest reporting back to the operator.
178
+ - **v2.x:** dialect-v2 expanded with archetypes that emerge from real usage.
179
+
180
+ ## Why this exists
181
+
182
+ The previous version (v1) treated prompts as opaque strings and could only
183
+ *append* behavioral patches. It also tried to learn quality from structural
184
+ signals (token counts) — but quality is semantic, not structural.
185
+
186
+ v2 treats prompts as structured IR, makes every model-specific quirk
187
+ *executable* (cliffs, lowering, recovery), and makes oracle scoring a
188
+ first-class contract — so the brain learns from quality data, not its proxies.
189
+
190
+ The whole point: every multi-model AI app needs a compiler. Building it
191
+ inline ships one app's value. Building it portable with a shared brain
192
+ ships every app's value to every other app.
193
+
194
+ Communicating vessels — finally accurate to the name.
195
+
196
+ ## License
197
+
198
+ MIT. © Warmdrift.
@@ -0,0 +1,95 @@
1
+ // src/dialect.ts
2
+ var DIALECT_VERSION = "v1";
3
+ var INTENT_ARCHETYPES = {
4
+ ask: {
5
+ name: "ask",
6
+ description: "Filter, search, or interrogate existing data",
7
+ examples: ["filter creators by criteria", "find docs matching query", "lookup a record"]
8
+ },
9
+ hunt: {
10
+ name: "hunt",
11
+ description: "Discover new entities not in the current dataset",
12
+ examples: ["find new prospects", "crawl for unindexed sources", "expand a seed list"]
13
+ },
14
+ classify: {
15
+ name: "classify",
16
+ description: "Assign a category from a finite set",
17
+ examples: ["intent detection", "sentiment", "route-to-team"]
18
+ },
19
+ summarize: {
20
+ name: "summarize",
21
+ description: "Compress text or data while preserving meaning",
22
+ examples: ["dashboard insight", "meeting notes", "briefing"]
23
+ },
24
+ generate: {
25
+ name: "generate",
26
+ description: "Produce new content from a prompt or template",
27
+ examples: ["draft email", "create marketing copy", "co-founder conversation"]
28
+ },
29
+ extract: {
30
+ name: "extract",
31
+ description: "Pull structured data from unstructured input",
32
+ examples: ["parse invoice", "extract entities", "transcript \u2192 action items"]
33
+ },
34
+ plan: {
35
+ name: "plan",
36
+ description: "Multi-step decomposition of a goal",
37
+ examples: ["build a roadmap", "sequence tasks", "break a feature into steps"]
38
+ },
39
+ critique: {
40
+ name: "critique",
41
+ description: "Quality assessment, review, or scoring",
42
+ examples: ["code review", "design feedback", "oracle judgment"]
43
+ },
44
+ transform: {
45
+ name: "transform",
46
+ description: "Change format or style while preserving content",
47
+ examples: ["markdown \u2192 html", "formal \u2192 casual", "translate"]
48
+ }
49
+ };
50
+ var ALL_ARCHETYPES = Object.keys(INTENT_ARCHETYPES);
51
+ function isArchetype(name) {
52
+ return Object.prototype.hasOwnProperty.call(INTENT_ARCHETYPES, name);
53
+ }
54
+ function bucketContext(tokens) {
55
+ if (tokens < 1e3) return "tiny";
56
+ if (tokens < 4e3) return "small";
57
+ if (tokens < 16e3) return "medium";
58
+ if (tokens < 64e3) return "large";
59
+ return "huge";
60
+ }
61
+ function bucketToolCount(count) {
62
+ if (count === 0) return "none";
63
+ if (count <= 5) return "few";
64
+ if (count <= 20) return "many";
65
+ return "massive";
66
+ }
67
+ function bucketHistory(turnCount) {
68
+ if (turnCount <= 1) return "single_turn";
69
+ if (turnCount <= 6) return "short";
70
+ return "long";
71
+ }
72
+ function hashShape(s) {
73
+ return [
74
+ s.contextBucket,
75
+ s.toolCountBucket,
76
+ s.historyDepth,
77
+ s.outputMode,
78
+ s.hasExamples ? "ex" : "no_ex"
79
+ ].join("-");
80
+ }
81
+ function learningKey(archetype, model, shape) {
82
+ return `${DIALECT_VERSION}::${archetype}::${model}::${hashShape(shape)}`;
83
+ }
84
+
85
+ export {
86
+ DIALECT_VERSION,
87
+ INTENT_ARCHETYPES,
88
+ ALL_ARCHETYPES,
89
+ isArchetype,
90
+ bucketContext,
91
+ bucketToolCount,
92
+ bucketHistory,
93
+ hashShape,
94
+ learningKey
95
+ };
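The bucket helpers in `src/dialect.ts` use two boundary conventions: context buckets have exclusive upper bounds (`<`), while tool-count and history buckets are inclusive (`<=`). The sketch below restates the three functions with type annotations added and probes each boundary.

```typescript
function bucketContext(tokens: number): string {
  if (tokens < 1e3) return 'tiny';
  if (tokens < 4e3) return 'small';
  if (tokens < 16e3) return 'medium';
  if (tokens < 64e3) return 'large';
  return 'huge';
}

function bucketToolCount(count: number): string {
  if (count === 0) return 'none';
  if (count <= 5) return 'few';
  if (count <= 20) return 'many';
  return 'massive';
}

function bucketHistory(turnCount: number): string {
  if (turnCount <= 1) return 'single_turn';
  if (turnCount <= 6) return 'short';
  return 'long';
}

// Exactly 1000 tokens is already "small"; exactly 20 tools is still "many".
console.log(bucketContext(999), bucketContext(1000)); // tiny small
console.log(bucketToolCount(20), bucketToolCount(21)); // many massive
console.log(bucketHistory(6), bucketHistory(7)); // short long
```

Two requests that differ by a single token or tool at a boundary land in different shape signatures, and therefore under different learning keys.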
@@ -0,0 +1,409 @@
1
+ // src/profiles.ts
2
+ var ANTHROPIC_LOWERING_BASE = {
3
+ system: { mode: "inline" },
4
+ cache: {
5
+ strategy: "cache_control",
6
+ minTokens: 1024,
7
+ discount: 0.1,
8
+ ttlSeconds: 300
9
+ },
10
+ tools: { format: "anthropic" }
11
+ };
12
+ var GOOGLE_LOWERING_BASE = {
13
+ system: { mode: "separate", field: "systemInstruction" },
14
+ cache: {
15
+ strategy: "cachedContent",
16
+ minTokens: 4096,
17
+ discount: 0.25,
18
+ ttlSeconds: 3600
19
+ },
20
+ tools: { format: "google" }
21
+ };
22
+ var PROFILES_RAW = [
23
+ // ── Anthropic ──
24
+ {
25
+ id: "claude-opus-4-7",
26
+ verifiedAgainstDocs: "2026-05-08",
27
+ provider: "anthropic",
28
+ status: "current",
29
+ maxContextTokens: 1e6,
30
+ maxOutputTokens: 128e3,
31
+ maxTools: 64,
32
+ parallelToolCalls: true,
33
+ structuredOutput: "grammar",
34
+ systemPromptMode: "inline",
35
+ streaming: true,
36
+ cliffs: [],
37
+ costInputPer1m: 5,
38
+ costOutputPer1m: 25,
39
+ lowering: ANTHROPIC_LOWERING_BASE,
40
+ recovery: [
41
+ {
42
+ signal: "rate_limit",
43
+ action: "escalate",
44
+ reason: "429 from Anthropic \u2014 escalate to fallback chain"
45
+ },
46
+ {
47
+ signal: "model_not_found",
48
+ action: "escalate",
49
+ reason: "Model deprecated/renamed \u2014 escalate (L-061)"
50
+ }
51
+ ],
52
+ strengths: ["reasoning", "agentic_coding", "long_context", "reliable_tool_use", "structured_output"],
53
+ weaknesses: ["cost", "latency"],
54
+ notes: "Frontier (2026-05). Step-change improvement over 4.6 in agentic coding. Adaptive thinking only \u2014 no extended-thinking toggle. 1M context, 128k max output."
55
+ },
56
+ {
57
+ id: "claude-opus-4-6",
58
+ verifiedAgainstDocs: "2026-05-08",
59
+ provider: "anthropic",
60
+ status: "legacy",
61
+ maxContextTokens: 1e6,
62
+ maxOutputTokens: 128e3,
63
+ maxTools: 64,
64
+ parallelToolCalls: true,
65
+ structuredOutput: "grammar",
66
+ systemPromptMode: "inline",
67
+ streaming: true,
68
+ cliffs: [],
69
+ costInputPer1m: 5,
70
+ costOutputPer1m: 25,
71
+ lowering: ANTHROPIC_LOWERING_BASE,
72
+ recovery: [
73
+ {
74
+ signal: "rate_limit",
75
+ action: "escalate",
76
+ reason: "429 from Anthropic \u2014 escalate to fallback chain"
77
+ },
78
+ {
79
+ signal: "model_not_found",
80
+ action: "escalate",
81
+ reason: "Model deprecated/renamed \u2014 escalate (L-061)"
82
+ }
83
+ ],
84
+ strengths: ["reasoning", "long_context", "reliable_tool_use", "structured_output", "extended_thinking"],
85
+ weaknesses: ["cost", "latency"],
86
+ notes: "Predecessor to 4.7. Still current in Anthropic legacy table. Same pricing as 4.7 \u2014 choose 4.7 unless you need extended-thinking budget control (4.7 is adaptive-only)."
87
+ },
88
+ {
89
+ id: "claude-sonnet-4-6",
90
+ verifiedAgainstDocs: "2026-05-08",
91
+ provider: "anthropic",
92
+ status: "current",
93
+ maxContextTokens: 1e6,
94
+ maxOutputTokens: 64e3,
95
+ maxTools: 64,
96
+ parallelToolCalls: true,
97
+ structuredOutput: "grammar",
98
+ systemPromptMode: "inline",
99
+ streaming: true,
100
+ cliffs: [],
101
+ costInputPer1m: 3,
102
+ costOutputPer1m: 15,
103
+ lowering: ANTHROPIC_LOWERING_BASE,
104
+ recovery: [
105
+ { signal: "rate_limit", action: "escalate", reason: "429 \u2014 escalate" },
106
+ { signal: "model_not_found", action: "escalate", reason: "Deprecated \u2014 escalate (L-061)" }
107
+ ],
108
+ strengths: ["quality", "tool_use", "long_context", "cache_friendly", "extended_thinking"],
109
+ weaknesses: [],
110
+ notes: "Workhorse. Best price/quality for most multi-turn agentic work. 1M context, 64k max output."
111
+ },
112
+ {
113
+ id: "claude-haiku-4-5",
114
+ verifiedAgainstDocs: "2026-05-08",
115
+ provider: "anthropic",
116
+ status: "current",
117
+ maxContextTokens: 2e5,
118
+ maxOutputTokens: 64e3,
119
+ maxTools: 32,
120
+ parallelToolCalls: true,
121
+ structuredOutput: "grammar",
122
+ systemPromptMode: "inline",
123
+ streaming: true,
124
+ cliffs: [
125
+ {
126
+ metric: "tool_count",
127
+ threshold: 16,
128
+ action: "drop_to_top_relevant",
129
+ reason: "Haiku reliability degrades above ~16 tools"
130
+ }
131
+ ],
132
+ costInputPer1m: 1,
133
+ costOutputPer1m: 5,
134
+ lowering: ANTHROPIC_LOWERING_BASE,
135
+ recovery: [
136
+ { signal: "rate_limit", action: "escalate", reason: "429 \u2014 escalate to Sonnet" }
137
+ ],
138
+ strengths: ["speed", "cost", "classification", "cache_friendly", "extended_thinking"],
139
+ weaknesses: ["complex_reasoning", "large_tool_sets"],
140
+ notes: "Cheapest Anthropic. Great for classify, summarize, ask shapes. 200k context, 64k max output. API alias `claude-haiku-4-5` resolves to dated snapshot `claude-haiku-4-5-20251001`."
141
+ },
142
+ // ── Google ──
143
+ {
144
+ id: "gemini-2.5-flash",
145
+ verifiedAgainstDocs: "2026-05-08",
146
+ provider: "google",
147
+ status: "current",
148
+ maxContextTokens: 1048576,
149
+ maxOutputTokens: 65535,
150
+ maxTools: 128,
151
+ parallelToolCalls: true,
152
+ structuredOutput: "native",
153
+ systemPromptMode: "separate",
154
+ streaming: true,
155
+ cliffs: [
156
+ {
157
+ metric: "input_tokens",
158
+ threshold: 8e3,
159
+ action: "downgrade_quality_warning",
160
+ reason: "Quality degrades significantly above ~8K context tokens"
161
+ },
162
+ {
163
+ metric: "tool_count",
164
+ threshold: 20,
165
+ action: "drop_to_top_relevant",
166
+ reason: "Tool reliability drops above ~20 tools (despite 128 hard limit)"
167
+ },
168
+ {
169
+ metric: "thinking_with_short_output",
170
+ threshold: 1,
171
+ action: "force_thinking_budget_zero",
172
+ reason: "Thinking tokens consume maxOutputTokens \u2014 empty response if drained"
173
+ },
174
+ {
175
+ // s11 trust artifact (2026-05-02): brain showed 5/5 empty rate on
176
+ // tt-intelligence/summarize/gemini-2.5-flash with tools offered.
177
+ // v1's disable_thinking_for_short_output already fired and didn't
178
+ // help — disabling thinking is necessary but not sufficient. Tools
179
+ // present + summarize intent confuses Flash into a no-output state
180
+ // (likely tool-decision purgatory). Strip tools entirely for this
181
+ // archetype on this model.
182
+ metric: "tool_count",
183
+ threshold: 1,
184
+ whenIntent: "summarize",
185
+ action: "strip_tools",
186
+ reason: "Gemini Flash returns empty when summarize intent has tools offered (5/5 empty rate observed in v1 prod 2026-04-19, replayed into v2 brain 2026-04-29)"
187
+ }
188
+ ],
189
+ costInputPer1m: 0.3,
190
+ costOutputPer1m: 2.5,
191
+ lowering: {
192
+ ...GOOGLE_LOWERING_BASE,
193
+ thinking: { field: "generationConfig.thinkingConfig.thinkingBudget", default: "auto" }
194
+ },
195
+ recovery: [
196
+ {
197
+ signal: "empty_response_after_tool",
198
+ action: "retry_with_params",
199
+ retryParams: { "generationConfig.thinkingConfig.thinkingBudget": 0 },
200
+ maxRetries: 1,
201
+ reason: "Known: empty after tool result \u2014 retry with thinking off"
202
+ },
203
+ {
204
+ signal: "empty_response",
205
+ action: "retry_with_params",
206
+ retryParams: { "generationConfig.thinkingConfig.thinkingBudget": 0 },
207
+ maxRetries: 1,
208
+ reason: "Empty response \u2014 try with thinking off"
209
+ },
210
+ {
211
+ signal: "malformed_function_call",
212
+ action: "escalate",
213
+ reason: "MALFORMED_FUNCTION_CALL maps to stop \u2014 escalate to next target"
214
+ }
215
+ ],
216
+ strengths: ["speed", "volume", "classification", "1m_context", "cost"],
217
+ weaknesses: ["complex_schemas", "large_tool_sets", "high_context_quality"],
218
+ notes: "Fast and cheap with 1M context. Quality cliffs at 8K context and 20 tools \u2014 guard with cliffs."
219
+ },
220
+ {
221
+ id: "gemini-2.5-pro",
222
+ verifiedAgainstDocs: "2026-05-08",
223
+ provider: "google",
224
+ status: "current",
225
+ maxContextTokens: 1048576,
226
+ maxOutputTokens: 65535,
227
+ maxTools: 128,
228
+ parallelToolCalls: true,
229
+ structuredOutput: "native",
230
+ systemPromptMode: "separate",
231
+ streaming: true,
232
+ cliffs: [
233
+ {
234
+ metric: "input_tokens",
235
+ threshold: 2e5,
236
+ action: "downgrade_quality_warning",
237
+ reason: "Pricing steps up above 200K: input $1.25\u2192$2.50/M (2x), output $10\u2192$15/M (1.5x)"
238
+ }
239
+ ],
240
+ costInputPer1m: 1.25,
241
+ costOutputPer1m: 10,
242
+ lowering: {
243
+ ...GOOGLE_LOWERING_BASE,
244
+ thinking: { field: "generationConfig.thinkingConfig.thinkingBudget", default: "auto" }
245
+ },
246
+ recovery: [
247
+ {
248
+ signal: "malformed_function_call",
249
+ action: "escalate",
250
+ reason: "MALFORMED_FUNCTION_CALL \u2014 escalate"
251
+ }
252
+ ],
253
+ strengths: ["reasoning", "1m_context", "structured_output", "tool_use"],
254
+ weaknesses: ["pricing_above_200k"]
255
+ },
256
+ {
257
+ id: "gemini-3.1-pro-preview",
258
+ verifiedAgainstDocs: "2026-05-08",
259
+ provider: "google",
260
+ status: "preview",
261
+ maxContextTokens: 1048576,
262
+ maxOutputTokens: 65535,
263
+ maxTools: 128,
264
+ parallelToolCalls: true,
265
+ structuredOutput: "native",
266
+ systemPromptMode: "separate",
267
+ streaming: true,
268
+ cliffs: [
269
+ {
270
+ metric: "input_tokens",
271
+ threshold: 2e5,
272
+ action: "downgrade_quality_warning",
273
+ reason: "Pricing steps up above 200K: input $2\u2192$4/M (2x), output $12\u2192$18/M (1.5x)"
274
+ }
275
+ ],
276
+ costInputPer1m: 2,
277
+ costOutputPer1m: 12,
278
+ lowering: {
279
+ ...GOOGLE_LOWERING_BASE,
280
+ cache: { ...GOOGLE_LOWERING_BASE.cache, discount: 0.1 },
281
+ thinking: { field: "generationConfig.thinkingConfig.thinkingBudget", default: "auto" }
282
+ },
283
+ recovery: [
284
+ {
285
+ signal: "malformed_function_call",
286
+ action: "escalate",
287
+ reason: "MALFORMED_FUNCTION_CALL \u2014 escalate"
288
+ }
289
+ ],
290
+ strengths: ["reasoning", "1m_context", "agentic_coding", "structured_output", "tool_use"],
291
+ weaknesses: ["cost", "preview_status", "pricing_above_200k"],
292
+ notes: "Frontier Gemini (preview, 2026-Q2). Step-change agentic coding per Google. Cache discount 10\xD7 (vs 4\xD7 for 2.5 Pro). Use status=preview to flag rollback path until GA."
293
+ },
294
+ // ── DeepSeek ──
295
+ // 2026-05-08 audit (L-073): DeepSeek's `deepseek-chat` was silently aliased
296
+ // to `deepseek-v4-flash` non-thinking mode. Old kgauto profile claimed 64k
297
+ // context + $0.27/$1.10 — actual is 1M context + $0.14/$0.28. Now modeled
298
+ // as: V4-Flash + V4-Pro as canonical profiles; deepseek-chat and
299
+ // deepseek-reasoner registered as aliases (see ALIASES below).
300
+ {
301
+ id: "deepseek-v4-flash",
302
+ verifiedAgainstDocs: "2026-05-08",
303
+ provider: "deepseek",
304
+ status: "current",
305
+ maxContextTokens: 1e6,
306
+ maxOutputTokens: 384e3,
307
+ maxTools: 16,
308
+ parallelToolCalls: false,
309
+ structuredOutput: "native",
310
+ systemPromptMode: "inline",
311
+ streaming: true,
312
+ cliffs: [
313
+ {
314
+ metric: "tool_count",
315
+ threshold: 1,
316
+ action: "drop_to_top_relevant",
317
+ reason: "Sequential tool calls only \u2014 L-040"
318
+ }
319
+ ],
320
+ costInputPer1m: 0.14,
321
+ costOutputPer1m: 0.28,
322
+ lowering: {
323
+ system: { mode: "inline" },
324
+ cache: { strategy: "unsupported" },
325
+ tools: { format: "deepseek" }
326
+ },
327
+ recovery: [
328
+ { signal: "rate_limit", action: "escalate", reason: "429 \u2014 escalate" }
329
+ ],
330
+ strengths: ["cost", "1m_context", "json_output", "code", "reasoning"],
331
+ weaknesses: ["parallel_tools", "large_tool_sets"],
332
+ notes: "Cheap workhorse. 1M context, 384k max output. Cache-hit input $0.0028/M (1/50 of the cache-miss price). Aliased as `deepseek-chat` (non-thinking) and `deepseek-reasoner` (thinking) \u2014 see ALIASES."
333
+ },
334
+ {
335
+ id: "deepseek-v4-pro",
336
+ verifiedAgainstDocs: "2026-05-08",
337
+ provider: "deepseek",
338
+ status: "current",
339
+ maxContextTokens: 1e6,
340
+ maxOutputTokens: 384e3,
341
+ maxTools: 16,
342
+ parallelToolCalls: false,
343
+ structuredOutput: "native",
344
+ systemPromptMode: "inline",
345
+ streaming: true,
346
+ cliffs: [
347
+ {
348
+ metric: "tool_count",
349
+ threshold: 1,
350
+ action: "drop_to_top_relevant",
351
+ reason: "Sequential tool calls only \u2014 L-040"
352
+ }
353
+ ],
354
+ // Profile carries REGULAR pricing, not the 75%-off promo (ends 2026-05-31).
355
+ // Under-estimating cost is worse than over-estimating for budget caps.
356
+ costInputPer1m: 1.74,
357
+ costOutputPer1m: 3.48,
358
+ lowering: {
359
+ system: { mode: "inline" },
360
+ cache: { strategy: "unsupported" },
361
+ tools: { format: "deepseek" }
362
+ },
363
+ recovery: [
364
+ { signal: "rate_limit", action: "escalate", reason: "429 \u2014 escalate" }
365
+ ],
366
+ strengths: ["quality", "reasoning", "1m_context", "json_output", "code", "extended_thinking"],
367
+ weaknesses: ["parallel_tools", "large_tool_sets"],
368
+ notes: "Pro tier. 1M context, 384k max output. Regular pricing $1.74/$3.48; 75% promo through 2026-05-31 ($0.435/$0.87). Default mode = thinking."
369
+ }
370
+ ];
371
+ var ALIASES = {
372
+ // DeepSeek's own model routing — both names served by V4-Flash.
373
+ "deepseek-chat": "deepseek-v4-flash",
374
+ "deepseek-reasoner": "deepseek-v4-flash",
375
+ // Legacy kgauto typo — actual API alias is dash-form (alpha.1 had dot).
376
+ "claude-haiku-4.5": "claude-haiku-4-5"
377
+ };
378
+ function canonicalId(id) {
379
+ return ALIASES[id] ?? id;
380
+ }
381
+ var PROFILE_INDEX = new Map(
382
+ PROFILES_RAW.map((p) => [p.id, p])
383
+ );
384
+ function getProfile(id) {
385
+ const canonical = canonicalId(id);
386
+ const p = PROFILE_INDEX.get(canonical);
387
+ if (!p) {
388
+ const known = [...PROFILE_INDEX.keys(), ...Object.keys(ALIASES)].join(", ");
389
+ throw new Error(`Unknown model id: "${id}". Known: ${known}`);
390
+ }
391
+ return p;
392
+ }
393
+ function tryGetProfile(id) {
394
+ return PROFILE_INDEX.get(canonicalId(id));
395
+ }
396
+ function allProfiles() {
397
+ return PROFILES_RAW;
398
+ }
399
+ function profilesByProvider(provider) {
400
+ return PROFILES_RAW.filter((p) => p.provider === provider);
401
+ }
402
+
403
+ export {
404
+ ALIASES,
405
+ getProfile,
406
+ tryGetProfile,
407
+ allProfiles,
408
+ profilesByProvider
409
+ };
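The `retryParams` entries in the Gemini profiles use dotted keys such as `generationConfig.thinkingConfig.thinkingBudget`. This release diff does not show how those keys are applied, so the sketch below assumes each key is a nested path into the wire request; `Json`, `setPath`, and `applyRetryParams` are hypothetical helpers, not package exports.

```typescript
// Hypothetical helper types: not part of the package's exported API.
type Json = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Json {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Returns a copy of `target` with `value` written at `dottedPath`,
// cloning each object along the path so the input is not mutated.
function setPath(target: Json, dottedPath: string, value: unknown): Json {
  const keys = dottedPath.split('.');
  const last = keys.pop() as string; // split() always yields at least one element
  const root: Json = { ...target };
  let node = root;
  keys.forEach((key) => {
    const child = node[key];
    const copy: Json = isPlainObject(child) ? { ...child } : {};
    node[key] = copy;
    node = copy;
  });
  node[last] = value;
  return root;
}

// Assumption: every retryParams key is a dotted path into the request.
function applyRetryParams(request: Json, retryParams: Json): Json {
  return Object.keys(retryParams).reduce<Json>(
    (acc, path) => setPath(acc, path, retryParams[path]),
    request,
  );
}

const firstAttempt: Json = {
  generationConfig: { maxOutputTokens: 256 },
};

const retryAttempt = applyRetryParams(firstAttempt, {
  'generationConfig.thinkingConfig.thinkingBudget': 0,
});

console.log(JSON.stringify(retryAttempt));
// {"generationConfig":{"maxOutputTokens":256,"thinkingConfig":{"thinkingBudget":0}}}
```

Copying along the path keeps the first attempt's request intact, which is useful if the original and retried requests are both recorded.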