@warmdrift/kgauto-compiler 2.0.0-alpha.10

package/README.md ADDED
@@ -0,0 +1,240 @@
+ # @warmdrift/kgauto-compiler v2.0.0-alpha.6
+
+ > Prompt compiler + central learning brain for multi-model AI apps.
+ > **Swap models without rewriting prompts.**
+
+ Greenfield rewrite of `@warmdrift/kgauto` v1. v1 was a behavioral patcher
+ with telemetry; v2 is a real prompt compiler with a self-improving learning
+ layer designed for cross-app pollination.
+
+ The "compiler" name is deliberate: every optimization is a pass on a
+ structured Intermediate Representation (IR), not string surgery on a
+ rendered prompt. This unlocks slicing, dedupe, intent-aware tool relevance,
+ target-correct lowering with cache markers, and (in v2.1) outcome-driven
+ mutations.
+
+ ## Status
+
+ - **Package:** alpha. It coexists with v1 (`@warmdrift/kgauto@1.2.0`) under
+   the temporary name `@warmdrift/kgauto-compiler` and will be renamed to the
+   final v2 name once v1 is fully retired from production.
+ - **Tests:** 201/201 passing
+ - **Build:** clean (47KB ESM, 68KB CJS)
+ - **Brain:** schema ready (see `brain/migrations/001_initial_schema.sql`);
+   awaiting dedicated Supabase provisioning.
+ - **Mutation engine:** v2.1 (after enough outcome data accumulates).
+
+ ## Quickstart
+
+ Two entry points. Use whichever fits your call path.
+
+ ### `call()`: kgauto owns the network round-trip (alpha.3)
+
+ For plain-fetch consumers who don't already drive the wire themselves. One async call compiles, executes, normalizes, and records.
+
+ ```ts
+ import { call, configureBrain } from '@warmdrift/kgauto-compiler';
+
+ configureBrain({ endpoint: 'https://your-app.com/api/kgauto/v2', apiKey: '...' });
+
+ const userQuestion = '...'; // placeholder: the end user's input
+
+ const result = await call({
+   appId: 'my-app',
+   intent: { name: 'search', archetype: 'ask' },
+   sections: [
+     { id: 'role', text: 'You are an assistant.', cacheable: true },
+     { id: 'task', text: userQuestion },
+   ],
+   models: ['claude-sonnet-4-6', 'gemini-2.5-flash'], // first = primary; rest = fallback chain
+ });
+
+ // result.actualModel → what served (post-fallback)
+ // result.requestedModel → what kgauto initially picked
+ // result.response.text → normalized across providers
+ // result.response.tokens → { input, output, total, cached?, cacheCreated? }
+ // result.response.toolCalls → ToolCall[] in normalized shape
+ // result.attempts → retry observability
+ ```
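The fallback chain behind `models` can be pictured with a minimal sketch. It is not the package's implementation: `callWithFallback`, `Attempt`, `FallbackResult`, and `stubSend` are hypothetical names, and the sketch assumes the first model in the list is the one requested, whereas the real `call()` may rank models before picking.

```ts
// Hypothetical shapes for illustration; not the package's real types.
interface Attempt {
  model: string;
  ok: boolean;
  error?: string;
}

interface FallbackResult<T> {
  requestedModel: string; // first model in the chain (assumption)
  actualModel: string;    // model that actually served the response
  response: T;
  attempts: Attempt[];    // one entry per model tried, in order
}

// Try each model in order and return the first successful response.
async function callWithFallback<T>(
  models: string[],
  send: (model: string) => Promise<T>,
): Promise<FallbackResult<T>> {
  const primary = models[0];
  if (primary === undefined) throw new Error('models must not be empty');

  const attempts: Attempt[] = [];
  for (const model of models) {
    try {
      const response = await send(model);
      attempts.push({ model, ok: true });
      return { requestedModel: primary, actualModel: model, response, attempts };
    } catch (err) {
      // Record the failure and move on to the next model in the chain.
      const error = err instanceof Error ? err.message : String(err);
      attempts.push({ model, ok: false, error });
    }
  }
  throw new Error(`all models failed: ${attempts.map((a) => a.model).join(', ')}`);
}

// Stub transport: the primary always fails, the fallback succeeds.
const stubSend = async (model: string): Promise<string> => {
  if (model === 'claude-sonnet-4-6') throw new Error('overloaded');
  return `answer from ${model}`;
};

callWithFallback(['claude-sonnet-4-6', 'gemini-2.5-flash'], stubSend).then((r) => {
  console.log(r.actualModel, r.attempts.length); // gemini-2.5-flash 2
});
```

Keeping every attempt in the result, including failures, is what makes a field like `result.attempts` useful for retry observability.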
+
+ API keys default to `process.env.{ANTHROPIC,GOOGLE,OPENAI,DEEPSEEK}_API_KEY`. Override per-call via `apiKeys: { anthropic, google, ... }`. Reach provider-specific fields (Gemini `safetySettings`, Anthropic `tool_choice`, OpenAI `seed`) via `providerOverrides: { google: {...}, anthropic: {...} }`, which is shallow-merged into the lowered request.
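A shallow merge has one consequence worth seeing in code: an override replaces a nested object wholesale instead of merging into it. The sketch below assumes plain object-spread semantics. `applyOverrides`, `WireRequest`, and the `openai` / `deepseek` keys are illustrative assumptions, not the package's API.

```ts
// Hypothetical types for illustration only.
type WireRequest = Record<string, unknown>;
type Provider = 'anthropic' | 'google' | 'openai' | 'deepseek';
type ProviderOverrides = Partial<Record<Provider, WireRequest>>;

// Shallow merge: top-level keys from the override win; nested objects are replaced.
function applyOverrides(
  request: WireRequest,
  provider: Provider,
  overrides: ProviderOverrides = {},
): WireRequest {
  return { ...request, ...(overrides[provider] ?? {}) };
}

const lowered: WireRequest = {
  model: 'gemini-2.5-flash',
  generationConfig: { temperature: 0.2, maxOutputTokens: 256 },
};

const merged = applyOverrides(lowered, 'google', {
  google: { generationConfig: { temperature: 0 }, safetySettings: [] },
});

// generationConfig is now { temperature: 0 }: maxOutputTokens is gone because
// the override's nested object replaced the original one.
console.log(JSON.stringify(merged['generationConfig'])); // {"temperature":0}
```

If the package really does use a shallow merge as described, an override that touches a nested object should restate every nested field it wants to keep.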
+
+ ### `compile()`: drive the wire yourself (existing path)
+
+ For consumers who already own provider plumbing (AI SDK adapters, custom retry logic, streaming).
+
+ ```ts
+ import { compile, configureBrain, record } from '@warmdrift/kgauto-compiler';
+
+ configureBrain({ endpoint: 'https://your-app.com/api/kgauto/v2', apiKey: '...' });
+
+ const userQuestion = '...'; // placeholder: the end user's input
+
+ const result = compile({
+   appId: 'my-app',
+   intent: { name: 'search', archetype: 'ask' },
+   sections: [
+     { id: 'role', text: 'You are an assistant.', cacheable: true },
+     { id: 'task', text: userQuestion },
+   ],
+   models: ['claude-sonnet-4-6', 'gemini-2.5-flash'],
+ });
+
+ const start = Date.now();
+ // callProvider stands for your own provider plumbing; it is not imported from kgauto here.
+ const response = await callProvider(result.target, result.request);
+
+ await record({
+   handle: result.handle,
+   tokensIn: response.usage.input,
+   tokensOut: response.usage.output,
+   latencyMs: Date.now() - start,
+   success: true,
+   oracleScore: { score: 0.85 },
+ });
+ ```
+
+ ## Architecture
+
+ ```
+ APP (any consumer)
+   ├── kg.compile(IR) ── runs LOCALLY, no network
+   │     ├─ pass: slice (drop sections not for this intent)
+   │     ├─ pass: dedupe (collapse identical sections by hash)
+   │     ├─ pass: tool_relevance (drop tools below intent threshold)
+   │     ├─ pass: compress_history (summarize old turns)
+   │     ├─ pass: score_targets (rank allowed models)
+   │     ├─ pass: apply_cliffs (executable known_failures from profile)
+   │     ├─ pass: lower (target-specific wire format + cache markers)
+   │     └─ pass: validate (fits hard constraints)
+
+   ├── app calls provider with the wire request
+
+   └── kg.record(handle, outcome) ── async POST to brain
+
+ BRAIN (centralized Supabase)
+   ├── compile_outcomes (multi-tenant from day 1)
+   ├── mutations (active rules — empty in v2.0; engine in v2.1)
+   ├── apps (consumer registry)
+   └── digest_runs (weekly summary audit trail)
+ ```
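To make "a pass on a structured IR" concrete, here is a minimal sketch of one pass, `dedupe`, written as a pure function from IR to IR and run through a small pass pipeline. The `IR`, `Section`, and `Pass` shapes are simplified assumptions, and the FNV-1a-style hash is a stand-in for whatever hash the compiler actually uses.

```ts
// Simplified, hypothetical IR shapes; the real IR carries more fields.
interface Section {
  id: string;
  text: string;
  cacheable?: boolean;
}
interface IR {
  sections: Section[];
}
type Pass = (ir: IR) => IR;

// Small non-cryptographic hash (FNV-1a style over UTF-16 code units).
function fnv1a(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Keep the first occurrence of each distinct section text.
// Texts are compared inside each hash bucket, so a hash collision
// can never collapse two different sections.
const dedupe: Pass = (ir) => {
  const seen = new Map<number, string[]>();
  const sections = ir.sections.filter((s) => {
    const key = fnv1a(s.text);
    const bucket = seen.get(key) ?? [];
    if (bucket.includes(s.text)) return false;
    bucket.push(s.text);
    seen.set(key, bucket);
    return true;
  });
  return { ...ir, sections };
};

// A compile run is then just a left-to-right fold over passes.
const runPasses = (ir: IR, passes: Pass[]): IR =>
  passes.reduce((acc, pass) => pass(acc), ir);

const compiled = runPasses(
  {
    sections: [
      { id: 'role', text: 'You are an assistant.', cacheable: true },
      { id: 'role-copy', text: 'You are an assistant.' },
      { id: 'task', text: 'Summarize the report.' },
    ],
  },
  [dedupe],
);

console.log(compiled.sections.map((s) => s.id).join(',')); // role,task
```

Because each pass returns a new IR rather than editing a rendered string, passes can be reordered, tested in isolation, or skipped per target.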
+
+ ## Dialect-v1 (cross-app vocabulary)
+
+ Apps tag every call with an **intent archetype** (`ask`, `hunt`, `classify`,
+ `summarize`, `generate`, `extract`, `plan`, `critique`, `transform`) and the
+ compiler computes a **shape signature** (context bucket × tool count × history
+ depth × output mode × examples flag).
+
+ The `(archetype, model, shape)` tuple is the **learning key**. Apps that
+ declare the same tuple inherit each other's mutations, even apps that have
+ never seen each other's data.
+
+ That's how *"what works for the dashboard, should be insights for the next
+ dashboard"* becomes mechanical instead of aspirational.
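A sketch of how a learning key might be derived follows. The five shape dimensions come from the description above; everything else (the bucket edges, the `'text' | 'json'` output modes, the string encoding, and the names `CallShape`, `shapeSignature`, `learningKey`) is an assumption made for illustration.

```ts
type Archetype =
  | 'ask' | 'hunt' | 'classify' | 'summarize' | 'generate'
  | 'extract' | 'plan' | 'critique' | 'transform';

// Hypothetical raw measurements of one call.
interface CallShape {
  contextTokens: number;
  toolCount: number;
  historyTurns: number;
  outputMode: 'text' | 'json'; // assumed modes
  hasExamples: boolean;
}

// Map a number to a coarse bucket label; edges are upper bounds (exclusive).
function bucket(value: number, edges: number[]): string {
  const idx = edges.findIndex((edge) => value < edge);
  return idx === -1 ? `b${edges.length}` : `b${idx}`;
}

// Assumed bucket edges; the real compiler's edges are not documented here.
function shapeSignature(s: CallShape): string {
  return [
    `ctx:${bucket(s.contextTokens, [1_000, 8_000, 32_000])}`,
    `tools:${bucket(s.toolCount, [1, 4, 11])}`,
    `hist:${bucket(s.historyTurns, [1, 6])}`,
    `out:${s.outputMode}`,
    `ex:${s.hasExamples ? 1 : 0}`,
  ].join('|');
}

function learningKey(archetype: Archetype, model: string, shape: CallShape): string {
  return `${archetype}::${model}::${shapeSignature(shape)}`;
}

// Two apps with different raw numbers land in the same buckets, so they share a key.
const keyA = learningKey('ask', 'gemini-2.5-flash', {
  contextTokens: 2_400, toolCount: 2, historyTurns: 0, outputMode: 'text', hasExamples: false,
});
const keyB = learningKey('ask', 'gemini-2.5-flash', {
  contextTokens: 5_100, toolCount: 3, historyTurns: 0, outputMode: 'text', hasExamples: false,
});

console.log(keyA === keyB); // true
console.log(keyA); // ask::gemini-2.5-flash::ctx:b1|tools:b1|hist:b0|out:text|ex:0
```

Bucketing is what lets unrelated apps collide on the same key on purpose: exact token counts would almost never match, coarse buckets often do.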
+
+ ## Profiles: executable model knowledge
+
+ `profiles.ts` carries every model's capabilities, cliffs, lowering rules, and
+ recovery handlers as **executable code**, not prose:
+
+ ```ts
+ // Excerpt; the surrounding export shape in profiles.ts may differ.
+ const profiles = {
+   'gemini-2.5-flash': {
+     cliffs: [
+       { metric: 'input_tokens', threshold: 8_000, action: 'downgrade_quality_warning' },
+       { metric: 'tool_count', threshold: 20, action: 'drop_to_top_relevant' },
+       { metric: 'thinking_with_short_output', threshold: 1, action: 'force_thinking_budget_zero' },
+     ],
+     recovery: [
+       { signal: 'empty_response_after_tool', action: 'retry_with_params',
+         retryParams: { 'generationConfig.thinkingConfig.thinkingBudget': 0 } },
+     ],
+     lowering: {
+       cache: { strategy: 'cachedContent', minTokens: 4096, discount: 0.25 },
+       thinking: { field: 'generationConfig.thinkingConfig.thinkingBudget', default: 'auto' },
+     },
+   },
+ };
+ ```
+
+ The 5 production empty responses in tt-intelligence's `gemini-2.5-flash` dashboard
+ calls? v2 catches those automatically via the `expectedShortOutput` constraint plus
+ the `force_thinking_budget_zero` cliff guard.
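A minimal sketch of what "executable" means for cliffs: the profile rows are data, and a small evaluator turns measured metrics into actions. The three cliff rows are copied from the profile excerpt; the evaluator `triggeredActions`, the `Cliff` type, and the rule that a cliff fires when the metric is greater than or equal to its threshold are assumptions.

```ts
interface Cliff {
  metric: string;
  threshold: number;
  action: string;
}

// Cliff rows as listed for gemini-2.5-flash in the profile excerpt.
const geminiFlashCliffs: Cliff[] = [
  { metric: 'input_tokens', threshold: 8_000, action: 'downgrade_quality_warning' },
  { metric: 'tool_count', threshold: 20, action: 'drop_to_top_relevant' },
  { metric: 'thinking_with_short_output', threshold: 1, action: 'force_thinking_budget_zero' },
];

// Assumption: a cliff triggers when the measured metric reaches its threshold.
// Metrics that were not measured count as 0 and never trigger.
function triggeredActions(cliffs: Cliff[], metrics: Record<string, number>): string[] {
  return cliffs
    .filter((c) => (metrics[c.metric] ?? 0) >= c.threshold)
    .map((c) => c.action);
}

// A short-output dashboard call with thinking enabled: small prompt, few tools.
const actions = triggeredActions(geminiFlashCliffs, {
  input_tokens: 2_500,
  tool_count: 4,
  thinking_with_short_output: 1,
});

console.log(actions); // [ 'force_thinking_budget_zero' ]
```

Under these assumptions, the dashboard scenario above trips only the thinking guard, which is the action that would zero the thinking budget before the request is lowered.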
+
+ ## Tools
+
+ Tools are first-class IR fields. The compiler's tool-relevance pass drops
+ tools that don't apply to the current intent before lowering, which saves
+ context budget on every call.
+
+ ```ts
+ const tools: ToolDefinition[] = [
+   {
+     name: 'web_search',
+     description: 'Search the public web',
+     parameters: { type: 'object', properties: { q: { type: 'string' } } },
+     relevanceByIntent: {
+       ask: 0.9, // primary tool for ask
+       hunt: 0.9,
+       classify: 0.0, // never useful for classification
+       summarize: 0.0,
+       extract: 0.1,
+     },
+   },
+   // ...
+ ];
+ ```
+
+ Each tool declares per-intent relevance scores from 0 to 1. The pass keeps tools
+ where `relevanceByIntent[currentIntent] >= toolRelevanceThreshold` (default
+ `0.2`). Missing entries default to neutral (`0.5`), so they are kept by default.
+ Set an explicit `0.0` to hard-exclude a tool.
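The keep rule can be sketched in a few lines. This is an illustration of the rule as stated (threshold `0.2`, neutral default `0.5`), not the package's code; `ToolSketch`, `keepRelevantTools`, and the two extra sample tools are hypothetical.

```ts
// Trimmed-down tool shape for illustration; the real ToolDefinition has more fields.
interface ToolSketch {
  name: string;
  relevanceByIntent?: Record<string, number>;
}

const NEUTRAL_RELEVANCE = 0.5; // used when a tool declares no score for the intent

function keepRelevantTools(
  tools: ToolSketch[],
  intent: string,
  threshold = 0.2,
): ToolSketch[] {
  return tools.filter(
    (t) => (t.relevanceByIntent?.[intent] ?? NEUTRAL_RELEVANCE) >= threshold,
  );
}

const sampleTools: ToolSketch[] = [
  { name: 'web_search', relevanceByIntent: { ask: 0.9, classify: 0.0, extract: 0.1 } },
  { name: 'calculator' }, // no scores at all: neutral for every intent
  { name: 'label_lookup', relevanceByIntent: { classify: 0.8 } },
];

const names = (ts: ToolSketch[]) => ts.map((t) => t.name).join(',');

console.log(names(keepRelevantTools(sampleTools, 'classify'))); // calculator,label_lookup
console.log(names(keepRelevantTools(sampleTools, 'extract')));  // calculator,label_lookup
console.log(names(keepRelevantTools(sampleTools, 'ask')));      // web_search,calculator,label_lookup
```

Note that `calculator` survives every intent: with a neutral default, a tool with no scores is never trimmed, so unscored tools are the first place to look when too many tools are kept.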
+
+ Tool definitions eat ~350 tokens of context per tool (L-051), so trimming
+ matters: 12 declared tools, only 3 relevant → 9 × 350 = 3150 tokens
+ recovered per call.
+
+ The `tool-bloat` advisory (alpha.6) fires when more than 10 tools survive
+ the relevance pass on a short-output archetype (`classify`, `extract`,
+ `summarize`, `transform`, `critique`). Those archetypes typically use
+ ≤3 tools, so a kept-count >10 indicates either missing `relevanceByIntent`
+ or scores set too generously.
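Both the savings arithmetic and the advisory condition are simple enough to sketch. The 350-token estimate, the limit of 10, and the archetype list come from the text above; the function names and the advisory message format are hypothetical.

```ts
const TOKENS_PER_TOOL = 350; // per-tool estimate quoted above
const SHORT_OUTPUT_ARCHETYPES = new Set([
  'classify', 'extract', 'summarize', 'transform', 'critique',
]);

// Context tokens saved by trimming declared tools down to the kept ones.
function tokensRecovered(declared: number, kept: number): number {
  return Math.max(0, declared - kept) * TOKENS_PER_TOOL;
}

// Returns an advisory message when too many tools survive on a
// short-output archetype, otherwise null. Message wording is illustrative.
function toolBloatAdvisory(archetype: string, keptCount: number, limit = 10): string | null {
  if (!SHORT_OUTPUT_ARCHETYPES.has(archetype) || keptCount <= limit) return null;
  return `tool-bloat: ${keptCount} tools kept for '${archetype}'; check relevanceByIntent scores`;
}

console.log(tokensRecovered(12, 3));            // 3150
console.log(toolBloatAdvisory('classify', 12)); // tool-bloat: 12 tools kept for 'classify'; check relevanceByIntent scores
console.log(toolBloatAdvisory('classify', 10)); // null
console.log(toolBloatAdvisory('ask', 12));      // null
```

The advisory is deliberately silent for archetypes such as `ask`, where a large tool set can be legitimate.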
+
+ DeepSeek profiles cap tools to 1 (sequential-only). Other providers
+ inherit the count from the IR after the relevance pass.
+
+ ## Brain provisioning
+
+ 1. Create a NEW Supabase project (suggested name: `kgauto-brain`)
+ 2. Apply `brain/migrations/001_initial_schema.sql`
+ 3. Insert your apps (`crypt` and `gen_salt` come from the `pgcrypto` extension):
+    ```sql
+    insert into apps (id, display_name, api_key_hash)
+    values ('my-app', 'My App', crypt('<bearer>', gen_salt('bf')));
+    ```
+ 4. Configure each consumer with `configureBrain({ endpoint, apiKey })`
+
+ For staging without a dedicated brain, point consumers at the same Supabase
+ they already use. The schema is identical, and migration to a dedicated brain
+ is a `pg_dump` away.
+
+ ## What's next
+
+ - **v2.0.x:** real-app integrations (tt-intelligence, inspire-central,
+   playbacksam, inspirato/incantato). The brain accumulates outcome data.
+ - **v2.1:** mutation engine. Shadow-test → statistical gate → promote → auto-rollback.
+ - **v2.2:** weekly digest reporting back to the operator.
+ - **v2.x:** dialect-v2 expanded with archetypes that emerge from real usage.
+
+ ## Why this exists
+
+ The previous version (v1) treated prompts as opaque strings and could only
+ *append* behavioral patches. It also tried to learn quality from structural
+ signals (token counts), but quality is semantic, not structural.
+
+ v2 treats prompts as structured IR, makes every model-specific quirk
+ *executable* (cliffs, lowering, recovery), and makes oracle scoring a
+ first-class contract, so the brain learns from quality data, not its proxies.
+
+ The whole point: every multi-model AI app needs a compiler. Building it
+ inline ships one app's value. Building it portable with a shared brain
+ ships every app's value to every other app.
+
+ Communicating vessels, finally accurate to the name.
+
+ ## License
+
+ MIT. © Warmdrift.