@warmdrift/kgauto-compiler 2.0.0-alpha.10
- package/README.md +240 -0
- package/dist/chunk-3KVKELZN.mjs +657 -0
- package/dist/chunk-5TI6PNSK.mjs +95 -0
- package/dist/dialect.d.mts +99 -0
- package/dist/dialect.d.ts +99 -0
- package/dist/dialect.js +127 -0
- package/dist/dialect.mjs +22 -0
- package/dist/index.d.mts +509 -0
- package/dist/index.d.ts +509 -0
- package/dist/index.js +2559 -0
- package/dist/index.mjs +1784 -0
- package/dist/profiles-Bgri1pe7.d.ts +728 -0
- package/dist/profiles-DO6R9moS.d.mts +728 -0
- package/dist/profiles.d.mts +2 -0
- package/dist/profiles.d.ts +2 -0
- package/dist/profiles.js +685 -0
- package/dist/profiles.mjs +14 -0
- package/package.json +59 -0
package/README.md
ADDED
@@ -0,0 +1,240 @@

# @warmdrift/kgauto-compiler (v2.0.0-alpha.6)

> Prompt compiler + central learning brain for multi-model AI apps.
> **Swap models without rewriting prompts.**

Greenfield rewrite of `@warmdrift/kgauto` v1. v1 was a behavioral patcher
with telemetry; v2 is a real prompt compiler with a self-improving learning
layer designed for cross-app pollination.

The "compiler" name is deliberate: every optimization is a pass on a
structured Intermediate Representation (IR), not string surgery on a
rendered prompt. This unlocks slicing, dedupe, intent-aware tool relevance,
target-correct lowering with cache markers, and (in v2.1) outcome-driven
mutations.

## Status

- **Package:** alpha. Coexists with v1 (`@warmdrift/kgauto@1.2.0`) under
  the temporary name `@warmdrift/kgauto-compiler`. Renames to v2 final once
  v1 is fully retired from production.
- **Tests:** 201/201 passing
- **Build:** clean (47KB ESM, 68KB CJS)
- **Brain:** schema ready (see `brain/migrations/001_initial_schema.sql`);
  awaiting dedicated Supabase provisioning.
- **Mutation engine:** v2.1 (after enough outcome data accumulates).

## Quickstart

Two entry points. Use whichever fits your call path.

### `call()`: kgauto owns the network round-trip (alpha.3)

For plain-fetch consumers who don't already drive the wire themselves. One async call → compiled, executed, normalized, recorded.

```ts
import { call, configureBrain } from '@warmdrift/kgauto-compiler';

configureBrain({ endpoint: 'https://your-app.com/api/kgauto/v2', apiKey: '...' });

const result = await call({
  appId: 'my-app',
  intent: { name: 'search', archetype: 'ask' },
  sections: [
    { id: 'role', text: 'You are an assistant.', cacheable: true },
    { id: 'task', text: userQuestion },
  ],
  models: ['claude-sonnet-4-6', 'gemini-2.5-flash'], // first = primary; rest = fallback chain
});

// result.actualModel        → what served (post-fallback)
// result.requestedModel     → what kgauto initially picked
// result.response.text      → normalized across providers
// result.response.tokens    → { input, output, total, cached?, cacheCreated? }
// result.response.toolCalls → ToolCall[] in normalized shape
// result.attempts           → retry observability
```

API keys default to `process.env.{ANTHROPIC,GOOGLE,OPENAI,DEEPSEEK}_API_KEY`. Override per-call via `apiKeys: { anthropic, google, ... }`. Reach provider-specific fields (Gemini `safetySettings`, Anthropic `tool_choice`, OpenAI `seed`) via `providerOverrides: { google: {...}, anthropic: {...} }`, which is shallow-merged into the lowered request.

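To make "shallow-merged" concrete, here is a minimal sketch of what a shallow merge implies for nested provider fields. It assumes the merge happens at the top level of the request object; `WireRequest` and `applyOverrides` are hypothetical names used for illustration, not part of the package.

```ts
// Hypothetical stand-in for a lowered provider request; the real wire shape
// depends on the target provider and is not shown in this README.
type WireRequest = Record<string, unknown>;

// Shallow merge: top-level keys from `overrides` replace those in `request`.
// Nested objects are replaced wholesale, not merged field by field.
function applyOverrides(request: WireRequest, overrides: WireRequest = {}): WireRequest {
  return { ...request, ...overrides };
}

const lowered: WireRequest = {
  model: 'gemini-2.5-flash',
  generationConfig: { temperature: 0.2, maxOutputTokens: 512 },
};

const merged = applyOverrides(lowered, {
  generationConfig: { temperature: 0 }, // replaces the whole nested object
  safetySettings: [],
});

console.log(merged.generationConfig); // { temperature: 0 }
```

Under this assumption, an override that touches one nested field should restate its siblings (here `maxOutputTokens`), otherwise they are dropped.
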
### `compile()`: drive the wire yourself (existing path)

For consumers who already own provider plumbing (AI SDK adapters, custom retry logic, streaming).

```ts
import { compile, configureBrain, record } from '@warmdrift/kgauto-compiler';

configureBrain({ endpoint: 'https://your-app.com/api/kgauto/v2', apiKey: '...' });

// compile() runs locally; the result carries the target and the lowered request.
const result = compile({
  appId: 'my-app',
  intent: { name: 'search', archetype: 'ask' },
  sections: [
    { id: 'role', text: 'You are an assistant.', cacheable: true },
    { id: 'task', text: userQuestion }, // userQuestion: the end user's input string
  ],
  models: ['claude-sonnet-4-6', 'gemini-2.5-flash'],
});

// callProvider is a placeholder for your own provider plumbing.
const start = Date.now();
const response = await callProvider(result.target, result.request);

// Report the outcome to the brain, keyed by the compile handle.
await record({
  handle: result.handle,
  tokensIn: response.usage.input,
  tokensOut: response.usage.output,
  latencyMs: Date.now() - start,
  success: true,
  oracleScore: { score: 0.85 }, // example value; supply your own quality score
});
```

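The example above records `success: true` unconditionally. A small wrapper can time the provider call and build an outcome on both the success and the failure path. This is only a sketch: `Outcome`, `ProviderResponse`, and `measure` are hypothetical names, and the field names simply mirror those passed to `record()` above.

```ts
// Hypothetical outcome shape, mirroring the fields passed to record() above.
interface Outcome {
  tokensIn: number;
  tokensOut: number;
  latencyMs: number;
  success: boolean;
}

// Hypothetical usage shape returned by your own provider plumbing.
interface ProviderResponse {
  usage: { input: number; output: number };
}

// Times a provider call and builds an outcome whether it resolves or throws.
async function measure(
  send: () => Promise<ProviderResponse>,
): Promise<{ response?: ProviderResponse; outcome: Outcome }> {
  const start = Date.now();
  try {
    const response = await send();
    return {
      response,
      outcome: {
        tokensIn: response.usage.input,
        tokensOut: response.usage.output,
        latencyMs: Date.now() - start,
        success: true,
      },
    };
  } catch {
    // Failed calls still produce an outcome, with zero token counts.
    return {
      outcome: { tokensIn: 0, tokensOut: 0, latencyMs: Date.now() - start, success: false },
    };
  }
}
```

With this in place, the call site could become `const { outcome } = await measure(() => callProvider(result.target, result.request))` followed by `await record({ handle: result.handle, ...outcome })`, assuming `record()` accepts a failed outcome in the same shape.
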
## Architecture

```
APP (any consumer)
  ├── kg.compile(IR)  ── runs LOCALLY, no network
  │     ├─ pass: slice             (drop sections not for this intent)
  │     ├─ pass: dedupe            (collapse identical sections by hash)
  │     ├─ pass: tool_relevance    (drop tools below intent threshold)
  │     ├─ pass: compress_history  (summarize old turns)
  │     ├─ pass: score_targets     (rank allowed models)
  │     ├─ pass: apply_cliffs      (executable known_failures from profile)
  │     ├─ pass: lower             (target-specific wire format + cache markers)
  │     └─ pass: validate          (fits hard constraints)
  │
  ├── app calls provider with the wire request
  │
  └── kg.record(handle, outcome)  ── async POST to brain

BRAIN (centralized Supabase)
  ├── compile_outcomes  (multi-tenant from day 1)
  ├── mutations         (active rules; empty in v2.0, engine in v2.1)
  ├── apps              (consumer registry)
  └── digest_runs       (weekly summary audit trail)
```

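The idea that each optimization is a function over the IR can be sketched with two of the passes listed above. The types are hypothetical and heavily simplified; in particular, the `intents` field on a section is an assumption, and `dedupe` keys on the trimmed text directly as a stand-in for a content hash.

```ts
// Hypothetical, simplified IR: the real IR carries more fields (tools, history, ...).
interface Section {
  id: string;
  text: string;
  intents?: string[]; // assumption: sections may list the archetypes they apply to
}
interface IR {
  archetype: string;
  sections: Section[];
}

type Pass = (ir: IR) => IR;

// slice: drop sections that declare intents but not the current one.
const slice: Pass = (ir) => ({
  ...ir,
  sections: ir.sections.filter((s) => !s.intents || s.intents.includes(ir.archetype)),
});

// dedupe: keep the first section for each distinct text.
// The trimmed text is the key here, standing in for a content hash.
const dedupe: Pass = (ir) => {
  const seen = new Set<string>();
  const sections = ir.sections.filter((s) => {
    const key = s.text.trim();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
  return { ...ir, sections };
};

// Run passes left to right; each one returns a new IR.
const runPasses = (ir: IR, passes: Pass[]): IR => passes.reduce((acc, pass) => pass(acc), ir);

const compiled = runPasses(
  {
    archetype: 'ask',
    sections: [
      { id: 'role', text: 'You are an assistant.' },
      { id: 'role-copy', text: 'You are an assistant.' },
      { id: 'labels', text: 'Allowed labels: a, b, c.', intents: ['classify'] },
      { id: 'task', text: 'What changed this week?' },
    ],
  },
  [slice, dedupe],
);

console.log(compiled.sections.map((s) => s.id)); // [ 'role', 'task' ]
```

Because every pass has the same signature, adding or reordering passes is a change to the array rather than to the passes themselves.
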
## Dialect-v1 (cross-app vocabulary)

Apps tag every call with an **intent archetype** (`ask`, `hunt`, `classify`,
`summarize`, `generate`, `extract`, `plan`, `critique`, `transform`) and the
compiler computes a **shape signature** (context bucket × tool count × history
depth × output mode × examples flag).

The `(archetype, model, shape)` tuple is the **learning key**. Apps that
declare the same tuple inherit each other's mutations, even if they have
never seen each other's data.

That's how *"what works for the dashboard should be insights for the next
dashboard"* becomes mechanical instead of aspirational.

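As a rough illustration of how a learning key could be derived, the sketch below buckets the five shape dimensions named above and joins them with the archetype and model. The bucket boundaries, labels, and string encoding are all assumptions made for this example; the README does not specify them.

```ts
// All bucket boundaries and the string encoding below are assumptions for
// illustration; the README does not specify them.
interface ShapeInput {
  inputTokens: number;
  toolCount: number;
  historyTurns: number;
  outputMode: 'text' | 'json' | 'tool';
  hasExamples: boolean;
}

// Maps a raw count onto a coarse label so that similar calls share a key.
// `labels` must have exactly one more entry than `edges`.
function bucket(value: number, edges: number[], labels: string[]): string {
  const i = edges.findIndex((edge) => value < edge);
  return labels[i === -1 ? labels.length - 1 : i];
}

function shapeSignature(s: ShapeInput): string {
  return [
    bucket(s.inputTokens, [2_000, 8_000, 32_000], ['xs', 's', 'm', 'l']),
    bucket(s.toolCount, [1, 4, 11], ['t0', 't1-3', 't4-10', 't11+']),
    bucket(s.historyTurns, [1, 6], ['h0', 'h1-5', 'h6+']),
    s.outputMode,
    s.hasExamples ? 'ex' : 'noex',
  ].join('|');
}

function learningKey(archetype: string, model: string, shape: ShapeInput): string {
  return `${archetype}:${model}:${shapeSignature(shape)}`;
}

const key = learningKey('ask', 'gemini-2.5-flash', {
  inputTokens: 5_400,
  toolCount: 3,
  historyTurns: 0,
  outputMode: 'text',
  hasExamples: false,
});
console.log(key); // ask:gemini-2.5-flash:s|t1-3|h0|text|noex
```

Two apps whose calls land in the same buckets produce the same key, which is what lets outcomes from one inform the other.
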
## Profiles: executable model knowledge

`profiles.ts` carries every model's capabilities, cliffs, lowering rules, and
recovery handlers as **executable code**, not prose:

```ts
// Excerpt; the enclosing object and its name are illustrative.
const profiles = {
  'gemini-2.5-flash': {
    cliffs: [
      { metric: 'input_tokens', threshold: 8_000, action: 'downgrade_quality_warning' },
      { metric: 'tool_count', threshold: 20, action: 'drop_to_top_relevant' },
      { metric: 'thinking_with_short_output', threshold: 1, action: 'force_thinking_budget_zero' },
    ],
    recovery: [
      { signal: 'empty_response_after_tool', action: 'retry_with_params',
        retryParams: { 'generationConfig.thinkingConfig.thinkingBudget': 0 } },
    ],
    lowering: {
      cache: { strategy: 'cachedContent', minTokens: 4096, discount: 0.25 },
      thinking: { field: 'generationConfig.thinkingConfig.thinkingBudget', default: 'auto' },
    },
  },
};
```

The 5 production empty responses seen in tt-intelligence's `gemini-2.5-flash`
dashboard calls are the kind of failure v2 catches automatically, through the
`expectedShortOutput` constraint plus the `force_thinking_budget_zero` cliff guard.

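To show what "executable" can mean for cliffs, here is a minimal sketch that evaluates the cliff entries from the excerpt above against measured metrics. The `Cliff` type and `triggeredActions` are hypothetical, and the comparison (a cliff triggers when the value reaches its threshold) is an assumption.

```ts
// Hypothetical types modeled on the profile excerpt above.
interface Cliff {
  metric: string;
  threshold: number;
  action: string;
}

// Assumption: a cliff triggers when the measured value reaches its threshold.
// Metrics that were not measured count as 0.
function triggeredActions(cliffs: Cliff[], metrics: Record<string, number>): string[] {
  return cliffs
    .filter((c) => (metrics[c.metric] ?? 0) >= c.threshold)
    .map((c) => c.action);
}

const geminiFlashCliffs: Cliff[] = [
  { metric: 'input_tokens', threshold: 8_000, action: 'downgrade_quality_warning' },
  { metric: 'tool_count', threshold: 20, action: 'drop_to_top_relevant' },
  { metric: 'thinking_with_short_output', threshold: 1, action: 'force_thinking_budget_zero' },
];

const actions = triggeredActions(geminiFlashCliffs, {
  input_tokens: 3_200,
  tool_count: 4,
  thinking_with_short_output: 1,
});
console.log(actions); // [ 'force_thinking_budget_zero' ]
```
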
## Tools

Tools are first-class IR fields. The compiler's tool-relevance pass drops
tools that don't apply to the current intent before lowering, which saves
context budget on every call.

```ts
const tools: ToolDefinition[] = [
  {
    name: 'web_search',
    description: 'Search the public web',
    parameters: { type: 'object', properties: { q: { type: 'string' } } },
    relevanceByIntent: {
      ask: 0.9,        // primary tool for ask
      hunt: 0.9,
      classify: 0.0,   // never useful for classification
      summarize: 0.0,
      extract: 0.1,
    },
  },
  // ...
];
```

Each tool declares per-intent relevance scores 0..1. The pass keeps tools
where `relevanceByIntent[currentIntent] >= toolRelevanceThreshold` (default
`0.2`). Missing entries default to neutral (`0.5`), so they are kept by default. Set
explicit `0.0` to hard-exclude.

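The keep rule described above is small enough to sketch directly. The `Tool` type here is a trimmed, hypothetical version of `ToolDefinition`, and `relevantTools` is an illustrative name rather than the package's own function.

```ts
// Hypothetical minimal tool shape; parameters and description are omitted.
interface Tool {
  name: string;
  relevanceByIntent?: Record<string, number>;
}

const NEUTRAL_SCORE = 0.5; // used when a tool has no entry for the intent
const DEFAULT_THRESHOLD = 0.2;

// Keeps tools whose score for the intent is at or above the threshold.
function relevantTools(tools: Tool[], intent: string, threshold = DEFAULT_THRESHOLD): Tool[] {
  return tools.filter((t) => (t.relevanceByIntent?.[intent] ?? NEUTRAL_SCORE) >= threshold);
}

const declared: Tool[] = [
  { name: 'web_search', relevanceByIntent: { ask: 0.9, classify: 0.0, extract: 0.1 } },
  { name: 'calculator' }, // no scores: neutral 0.5, kept
];

console.log(relevantTools(declared, 'classify').map((t) => t.name)); // [ 'calculator' ]
console.log(relevantTools(declared, 'ask').map((t) => t.name)); // [ 'web_search', 'calculator' ]
```

Note the use of `??` rather than `||`: an explicit `0.0` must stay `0.0` so that it hard-excludes the tool instead of falling back to the neutral score.
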
Tool definitions eat ~350 tokens of context per tool (L-051), so trimming
matters: 12 declared tools, only 3 relevant → 9 × 350 = 3150 tokens
recovered per call.

The `tool-bloat` advisory (alpha.6) fires when more than 10 tools survive
the relevance pass on a short-output archetype (`classify`, `extract`,
`summarize`, `transform`, `critique`). Those archetypes typically use
≤3 tools, so a kept-count >10 indicates either missing `relevanceByIntent`
or scores set too generously.

DeepSeek profiles cap tools to 1 (sequential-only). Other providers
inherit the count from the IR after the relevance pass.

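The token arithmetic and the `tool-bloat` condition above can be written down as two small functions. The constants come from the text; the function names are hypothetical and the real advisory may consider more than these two inputs.

```ts
// Constants taken from the text above; the function names are hypothetical.
const TOKENS_PER_TOOL = 350;
const BLOAT_LIMIT = 10;
const SHORT_OUTPUT_ARCHETYPES = new Set(['classify', 'extract', 'summarize', 'transform', 'critique']);

// Estimated context recovered by dropping tools before lowering.
function tokensRecovered(declaredCount: number, keptCount: number): number {
  return (declaredCount - keptCount) * TOKENS_PER_TOOL;
}

// True when more than BLOAT_LIMIT tools survive on a short-output archetype.
function isToolBloat(archetype: string, keptCount: number): boolean {
  return SHORT_OUTPUT_ARCHETYPES.has(archetype) && keptCount > BLOAT_LIMIT;
}

console.log(tokensRecovered(12, 3)); // 3150
console.log(isToolBloat('classify', 11)); // true
console.log(isToolBloat('ask', 11)); // false
```
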
## Brain provisioning

1. Create a NEW Supabase project (suggested name: `kgauto-brain`)
2. Apply `brain/migrations/001_initial_schema.sql`
3. Insert your apps:
   ```sql
   insert into apps (id, display_name, api_key_hash)
   values ('my-app', 'My App', crypt('<bearer>', gen_salt('bf')));
   ```
4. Configure each consumer with `configureBrain({ endpoint, apiKey })`

For staging without a dedicated brain, point consumers at the same Supabase
they already use. The schema is identical, and migration to a dedicated brain
is a `pg_dump` away.

## What's next

- **v2.0.x:** real-app integrations (tt-intelligence, inspire-central,
  playbacksam, inspirato/incantato). Brain accumulates outcome data.
- **v2.1:** mutation engine. Shadow-test → statistical gate → promote → auto-rollback.
- **v2.2:** weekly digest reporting back to the operator.
- **v2.x:** dialect-v2 expanded with archetypes that emerge from real usage.

## Why this exists

The previous version (v1) treated prompts as opaque strings and could only
*append* behavioral patches. It also tried to learn quality from structural
signals (token counts), but quality is semantic, not structural.

v2 treats prompts as structured IR, makes every model-specific quirk
*executable* (cliffs, lowering, recovery), and makes oracle scoring a
first-class contract, so the brain learns from quality data, not its proxies.

The whole point: every multi-model AI app needs a compiler. Building it
inline ships one app's value. Building it portable with a shared brain
ships every app's value to every other app.

Communicating vessels: finally accurate to the name.

## License

MIT. © Warmdrift.