@ax-llm/ax 19.0.16 → 19.0.18

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,323 @@
+ ---
+ name: ax-gen
+ description: This skill helps an LLM generate correct AxGen code using @ax-llm/ax. Use when the user asks about ax(), AxGen, generators, forward(), streamingForward(), assertions, field processors, step hooks, self-tuning, or structured outputs.
+ version: "19.0.18"
+ ---
+
+ # AxGen Codegen Rules (@ax-llm/ax)
+
+ Use this skill to generate `AxGen` code. Prefer short, modern, copyable patterns. Do not write tutorial prose unless the user explicitly asks for explanation.
+
+ ## Use These Defaults
+
+ - Use `ax(...)` factory, not `new AxGen(...)`.
+ - Always pass an AI instance from `ai(...)` as the first argument to `forward()`.
+ - Streaming uses `streamingForward()`, not `forward()` with a stream option.
+ - Assertions auto-retry with error feedback on failure.
+ - Step hook mutations are applied at the next step boundary (pending pattern).
+ - `stopFunction` accepts a string or string[] for multiple stop functions.
+ - Multi-step continues until: all outputs filled, stop function called, or `maxSteps` reached.
+
+ ## Canonical Pattern
+
+ ```typescript
+ import { ai, ax, s } from '@ax-llm/ax';
+
+ const llm = ai({
+   name: 'openai',
+   apiKey: process.env.OPENAI_APIKEY!,
+ });
+
+ // Inline signature
+ const gen = ax('input:string -> output:string, reasoning:string');
+
+ // Reusable signature
+ const sig = s('question:string, context:string[] -> answer:string');
+ const gen2 = ax(sig);
+
+ // With options
+ const gen3 = ax('input -> output', {
+   description: 'A helpful assistant',
+   maxRetries: 3,
+   maxSteps: 10,
+   temperature: 0.7,
+ });
+
+ const result = await gen.forward(llm, { input: 'Hello world' });
+ console.log(result.output);
+ ```
+
+ ## Running AxGen
+
+ ### `forward()`
+
+ ```typescript
+ const result = await gen.forward(llm, { input: '...' });
+
+ // With options
+ const result2 = await gen.forward(llm, { input: '...' }, {
+   maxRetries: 5,
+   model: 'gpt-4.1',
+   modelConfig: { temperature: 0.9, maxTokens: 1000 },
+   debug: true,
+ });
+ ```
+
+ ### `streamingForward()`
+
+ ```typescript
+ const stream = gen.streamingForward(llm, { input: 'Write a long story' });
+ for await (const chunk of stream) {
+   if (chunk.delta.output) process.stdout.write(chunk.delta.output);
+ }
+ ```
+
+ ## Stopping And Cancellation
+
+ ```typescript
+ import { AxAIServiceAbortedError } from '@ax-llm/ax';
+
+ const timer = setTimeout(() => gen.stop(), 3_000);
+
+ try {
+   const result = await gen.forward(llm, { topic: 'Long document' }, {
+     abortSignal: AbortSignal.timeout(10_000),
+   });
+ } catch (err) {
+   if (err instanceof AxAIServiceAbortedError) console.log('Aborted');
+ } finally {
+   clearTimeout(timer);
+ }
+ ```
+
+ Rules:
+
+ - `gen.stop()` gracefully stops multi-step execution at the next step boundary.
+ - `abortSignal` cancels the underlying AI service call immediately.
+ - Catch `AxAIServiceAbortedError` when using either mechanism.
+
+ ## Assertions And Validation
+
+ ```typescript
+ // Standard assertion (checked after forward completes)
+ gen.addAssert(
+   (args) => args.output.length > 50,
+   'Output must be at least 50 characters'
+ );
+
+ // Streaming assertion (checked during streaming)
+ gen.addStreamingAssert(
+   'output',
+   (text) => !text.includes('forbidden'),
+   'Output contains forbidden text'
+ );
+ ```
+
+ Rules:
+
+ - Failed assertions cause an automatic retry with the error message fed back to the LLM.
+ - `addAssert` receives the full output object.
+ - `addStreamingAssert` targets a specific field and receives the partial text so far.
119
+
120
+ ## Field Processors
121
+
122
+ ```typescript
123
+ // Post-processing after generation
124
+ gen.addFieldProcessor('summary', (value, context) => value.toUpperCase());
125
+
126
+ // Streaming field processor (called on each chunk)
127
+ gen.addStreamingFieldProcessor('content', (partialValue, context) => {
128
+ console.log(`Received ${partialValue.length} chars`);
129
+ return partialValue;
130
+ });
131
+ ```
132
+
133
+ Rules:
134
+
135
+ - `addFieldProcessor` runs once after the field is fully generated.
136
+ - `addStreamingFieldProcessor` runs on each streaming chunk for the target field.
137
+ - Both must return the (possibly transformed) value.
138
+
139
+ ## Function Calling
140
+
141
+ ```typescript
142
+ const result = await gen.forward(llm, { question: '...' }, {
143
+ functions: tools,
144
+ functionCallMode: 'auto',
145
+ stopFunction: 'finalAnswer',
146
+ });
147
+ ```
148
+
149
+ Rules:
150
+
151
+ - `functionCallMode` can be `'auto'`, `'none'`, or a specific function name to force.
152
+ - `stopFunction` accepts a string or string[] to halt multi-step on specific function calls.
153
+ - Multi-step continues until all outputs filled, stop function called, or `maxSteps` reached.
154
+
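+ The snippet above assumes a `tools` array is already defined. A minimal sketch of one entry, assuming the common function-definition shape (name, description, JSON-schema parameters, and a handler); verify the exact interface against the package's function types:
+
+ ```typescript
+ // Hypothetical tool definition for illustration only.
+ const tools = [
+   {
+     name: 'finalAnswer',
+     description: 'Return the final answer to the user',
+     parameters: {
+       type: 'object',
+       properties: {
+         answer: { type: 'string', description: 'The final answer text' },
+       },
+       required: ['answer'],
+     },
+     func: async ({ answer }: { answer: string }) => answer,
+   },
+ ];
+ ```
+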
+ ## Caching
+
+ ### Response Caching
+
+ ```typescript
+ // Any get/set store works; a simple in-memory Map is used here for illustration.
+ const cache = new Map<string, any>();
+
+ const gen = ax('question:string -> answer:string', {
+   cachingFunction: async (key, value?) => {
+     if (value !== undefined) {
+       await cache.set(key, value);
+       return;
+     }
+     return await cache.get(key);
+   },
+ });
+ ```
+
+ ### Context Caching
+
+ ```typescript
+ const result = await gen.forward(llm, { question: '...' }, {
+   contextCache: { cacheBreakpoint: 'after-examples' },
+ });
+ ```
+
+ Rules:
+
+ - `cachingFunction` acts as a get/set: called with `(key)` to read, `(key, value)` to write.
+ - `contextCache` enables AI provider-level prompt caching for long context.
+
+ ## Sampling And Result Picker
+
+ ```typescript
+ const result = await gen.forward(llm, { question: '...' }, {
+   sampleCount: 3,
+   resultPicker: async (samples) => {
+     // Evaluate each sample and return the index of the best one
+     const bestIndex = 0; // placeholder: replace with real scoring over `samples`
+     return bestIndex;
+   },
+ });
+ ```
+
+ Rules:
+
+ - `sampleCount` generates multiple completions in parallel.
+ - `resultPicker` receives all samples and must return the index of the chosen result.
+
+ ## Extended Thinking
+
+ ```typescript
+ const result = await gen.forward(llm, { question: '...' }, {
+   thinkingTokenBudget: 'medium',
+   showThoughts: true,
+ });
+ console.log(result.thought);
+ ```
+
+ Rules:
+
+ - `thinkingTokenBudget` can be `'low'`, `'medium'`, `'high'`, or a number.
+ - Set `showThoughts: true` to include the model's reasoning in `result.thought`.
+
+ ## Step Hooks
+
+ ```typescript
+ const result = await gen.forward(llm, values, {
+   stepHooks: {
+     beforeStep: (ctx) => {
+       if (ctx.functionsExecuted.has('complexanalysis')) {
+         ctx.setModel('smart');
+         ctx.setThinkingBudget('high');
+       }
+     },
+     afterStep: (ctx) => {
+       console.log(`Usage: ${ctx.usage.totalTokens} tokens`);
+     },
+   },
+ });
+ ```
+
+ ### AxStepContext Read-Only Properties
+
+ - `stepIndex` - current step number
+ - `maxSteps` - configured maximum steps
+ - `isFirstStep` - whether this is the first step
+ - `functionsExecuted` - `Set<string>` of function names called so far
+ - `lastFunctionCalls` - array of the most recent function call results
+ - `usage` - token usage statistics
+ - `state` - current step state
+
+ ### AxStepContext Mutators
+
+ - `setModel(model)` - change the model for the next step
+ - `setThinkingBudget(budget)` - adjust thinking budget
+ - `setTemperature(temp)` - adjust temperature
+ - `setMaxTokens(max)` - adjust max output tokens
+ - `setOptions(opts)` - set arbitrary forward options
+ - `addFunctions(fns)` - add functions for the next step
+ - `removeFunctions(names)` - remove functions by name
+ - `stop()` - stop multi-step execution
+
+ Rules:
+
+ - All mutations are pending and applied at the next step boundary.
+ - `beforeStep` runs before each LLM call; `afterStep` runs after.
+ - Use `afterFunctionExecution` to react to specific function results (see the sketch below).
+
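+ A minimal sketch of the `afterFunctionExecution` hook named in the rule above. The exact context it receives is an assumption here (the read-only properties and mutators listed earlier), so verify against the package types:
+
+ ```typescript
+ const result2 = await gen.forward(llm, values, {
+   stepHooks: {
+     // Hypothetical usage: react once a specific function has run.
+     afterFunctionExecution: (ctx) => {
+       if (ctx.functionsExecuted.has('searchWeb')) {
+         ctx.setMaxTokens(2000); // pending; applied at the next step boundary
+       }
+     },
+   },
+ });
+ ```
+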
+ ## Self-Tuning
+
+ ```typescript
+ // Simple: enable all self-tuning
+ const result = await gen.forward(llm, values, { selfTuning: true });
+
+ // Granular: pick what to tune
+ const result2 = await gen.forward(llm, values, {
+   selfTuning: {
+     model: true,
+     thinkingBudget: true,
+     functions: [searchWeb, calculate],
+   },
+ });
+ ```
+
+ Rules:
+
+ - `selfTuning: true` enables automatic model and parameter selection.
+ - Granular config allows tuning specific aspects independently.
+ - `selfTuning.functions` provides a pool of functions the tuner may add or remove per step.
+
+ ## Error Handling
+
+ ```typescript
+ import { AxGenerateError } from '@ax-llm/ax';
+
+ try {
+   const result = await gen.forward(llm, { input: '...' });
+ } catch (error) {
+   if (error instanceof AxGenerateError) {
+     console.log(error.details.model, error.details.signature);
+   }
+ }
+ ```
+
+ Rules:
+
+ - `AxGenerateError` includes `details` with `model` and `signature` for debugging.
+ - `AxAIServiceAbortedError` is thrown on cancellation via `stop()` or `abortSignal`.
+
+ ## Examples
+
+ Fetch these for full working code:
+
+ - [Streaming](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/streaming.ts) — streaming with assertions
+ - [Assertions](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/asserts.ts) — output validation
+ - [Streaming Assertions](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/streaming-asserts.ts) — streaming with assertion checks
+ - [Structured Output](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/structured_output.ts) — fluent API with validation
+ - [Debug Logging](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/debug-logging.ts) — debug mode and step hooks
+ - [Stop Function](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/stop-function.ts) — stop functions
+ - [Fibonacci](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/fibonacci.ts) — streaming with thinking
+ - [Extraction](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/extract.ts) — information extraction
+ - [Multi-Sampling](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/sample-count.ts) — sample count usage
+
+ ## Do Not Generate
+
+ - Do not use `new AxGen(...)` for new code unless explicitly required.
+ - Do not pass raw API keys or config objects where an `ai(...)` instance is expected.
+ - Do not use `forward()` for streaming; use `streamingForward()`.
+ - Do not forget that assertions auto-retry; avoid manual retry loops around assertion logic.
+ - Do not mutate step hook context expecting immediate effect; mutations are pending until the next step.
+ - Do not assume multi-step stops after one LLM call; it continues until outputs are filled, a stop function fires, or `maxSteps` is reached.
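+
+ A compact contrast of the anti-patterns above against the preferred calls, restated from this skill's own rules:
+
+ ```typescript
+ // ✗ Avoid: direct construction and stream options on forward()
+ // const gen = new AxGen('input:string -> output:string');
+ // const stream = await gen.forward(llm, values, { stream: true });
+
+ // ✓ Prefer: the ax() factory and streamingForward()
+ const gen4 = ax('input:string -> output:string');
+ const stream2 = gen4.streamingForward(llm, { input: 'Hello' });
+ ```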
@@ -0,0 +1,244 @@
+ ---
+ name: ax-gepa
+ description: This skill helps an LLM generate correct AxGEPA optimization code using @ax-llm/ax. Use when the user asks about AxGEPA, GEPA, Pareto optimization, multi-objective prompt tuning, reflective prompt evolution, validationExamples, maxMetricCalls, or optimizing a generator, flow, or agent tree.
+ version: "19.0.18"
+ ---
+
+ # AxGEPA Codegen Rules (@ax-llm/ax)
+
+ Use this skill to generate direct `AxGEPA` optimization code. Prefer short, modern, copyable patterns over long explanation.
+
+ ## Use These Defaults
+
+ - Use `new AxGEPA({ studentAI, teacherAI, ... })`.
+ - Prefer `ai()`, `ax()`, and `flow()` for new code.
+ - Use a strong `teacherAI` and a cheaper `studentAI`.
+ - Always pass `validationExamples` to `compile()`.
+ - Always set `maxMetricCalls` to bound optimizer cost.
+ - Use scalar metrics for one objective and object metrics for Pareto optimization.
+ - Apply results with `program.applyOptimization(result.optimizedProgram!)`.
+ - For tree-wide runs, expect `optimizedProgram.instructionMap`.
+
+ ## Critical Rules
+
+ - `AxGEPA.compile()` works for a single generator and for tree-aware roots such as flows or agents with registered instruction-bearing descendants.
+ - There is no separate flow-only GEPA optimizer. Use `AxGEPA` for flows too.
+ - The metric may return either `number` or `Record<string, number>`.
+ - Keep metrics deterministic and cheap. Avoid extra LLM calls inside the metric unless the user explicitly wants judge-based evaluation.
+ - `maxMetricCalls` must be large enough to cover the initial validation pass over `validationExamples`.
+ - GEPA optimizes instructions. If a tree has no instruction-bearing nodes, optimization will fail.
+ - Use held-out validation examples for selection. Do not reuse the training set as `validationExamples`.
+ - `result.optimizedProgram` is the easy-to-apply best candidate. `result.paretoFront` is the full trade-off set for multi-objective runs.
+
+ ## Canonical Scalar Pattern
+
+ ```typescript
+ import { ai, ax, AxAIOpenAIModel, AxGEPA } from '@ax-llm/ax';
+
+ const student = ai({
+   name: 'openai',
+   apiKey: process.env.OPENAI_APIKEY!,
+   config: { model: AxAIOpenAIModel.GPT4OMini },
+ });
+
+ const teacher = ai({
+   name: 'openai',
+   apiKey: process.env.OPENAI_APIKEY!,
+   config: { model: AxAIOpenAIModel.GPT4O },
+ });
+
+ const classifier = ax(
+   'emailText:string -> priority:class "high, normal, low", rationale:string'
+ );
+
+ const train = [
+   { emailText: 'URGENT: Server down!', priority: 'high' },
+   { emailText: 'Weekly newsletter', priority: 'low' },
+ ];
+
+ const validation = [
+   { emailText: 'Invoice overdue', priority: 'high' },
+   { emailText: 'Lunch plans?', priority: 'low' },
+ ];
+
+ const metric = ({ prediction, example }: { prediction: any; example: any }) =>
+   prediction?.priority === example?.priority ? 1 : 0;
+
+ const optimizer = new AxGEPA({
+   studentAI: student,
+   teacherAI: teacher,
+   numTrials: 12,
+   minibatch: true,
+   minibatchSize: 4,
+   earlyStoppingTrials: 4,
+   sampleCount: 1,
+ });
+
+ const result = await optimizer.compile(classifier, train, metric, {
+   validationExamples: validation,
+   maxMetricCalls: 120,
+ });
+
+ classifier.applyOptimization(result.optimizedProgram!);
+ console.log(result.bestScore);
+ ```
+
+ ## Canonical Pareto Pattern
+
+ ```typescript
+ import { ai, flow, AxAIOpenAIModel, AxGEPA } from '@ax-llm/ax';
+
+ const student = ai({
+   name: 'openai',
+   apiKey: process.env.OPENAI_APIKEY!,
+   config: { model: AxAIOpenAIModel.GPT4OMini },
+ });
+
+ const teacher = ai({
+   name: 'openai',
+   apiKey: process.env.OPENAI_APIKEY!,
+   config: { model: AxAIOpenAIModel.GPT4O },
+ });
+
+ const wf = flow<{ emailText: string }>()
+   .n('classifier', 'emailText:string -> priority:class "high, normal, low"')
+   .n(
+     'rationale',
+     'emailText:string, priority:string -> rationale:string "One concise sentence"'
+   )
+   .e('classifier', (state) => ({ emailText: state.emailText }))
+   .e('rationale', (state) => ({
+     emailText: state.emailText,
+     priority: state.classifierResult.priority,
+   }))
+   .r((state) => ({
+     priority: state.classifierResult.priority,
+     rationale: state.rationaleResult.rationale,
+   }));
+
+ const train = [
+   { emailText: 'URGENT: Server down!', priority: 'high' },
+   { emailText: 'Weekly newsletter', priority: 'low' },
+ ];
+
+ const validation = [
+   { emailText: 'Invoice overdue', priority: 'high' },
+   { emailText: 'Lunch plans?', priority: 'low' },
+ ];
+
+ const metric = ({ prediction, example }: { prediction: any; example: any }) => {
+   const accuracy = prediction?.priority === example?.priority ? 1 : 0;
+   const rationale = typeof prediction?.rationale === 'string'
+     ? prediction.rationale
+     : '';
+   const brevity = rationale.length <= 40 ? 1 : rationale.length <= 80 ? 0.5 : 0.1;
+   return { accuracy, brevity };
+ };
+
+ const result = await new AxGEPA({
+   studentAI: student,
+   teacherAI: teacher,
+   numTrials: 16,
+   minibatch: true,
+   minibatchSize: 6,
+   earlyStoppingTrials: 5,
+   sampleCount: 1,
+ }).compile(wf, train, metric, {
+   validationExamples: validation,
+   maxMetricCalls: 240,
+ });
+
+ for (const point of result.paretoFront) {
+   console.log(point.scores, point.configuration);
+ }
+
+ wf.applyOptimization(result.optimizedProgram!);
+ console.log(result.optimizedProgram?.instructionMap);
+ ```
+
+ ## Metric Patterns
+
+ ```typescript
+ // Scalar objective
+ const scalarMetric = ({ prediction, example }) =>
+   prediction.answer === example.answer ? 1 : 0;
+
+ // Multi-objective
+ const multiMetric = ({ prediction, example }) => ({
+   accuracy: prediction.answer === example.answer ? 1 : 0,
+   brevity:
+     typeof prediction?.reasoning === 'string' &&
+     prediction.reasoning.length < 120
+       ? 1
+       : 0.2,
+ });
+ ```
+
+ - Return plain numbers or plain object literals.
+ - Keep objective names stable across calls.
+ - Prefer normalized scores such as `0..1` so trade-offs are easy to reason about.
+
+ ## Result Handling
+
+ ```typescript
+ const { optimizedProgram, paretoFront } = result;
+
+ program.applyOptimization(optimizedProgram!);
+
+ // Save for later
+ const saved = JSON.stringify(optimizedProgram);
+
+ // Load later and re-apply
+ const loaded = JSON.parse(saved);
+ program.applyOptimization(loaded);
+ ```
+
+ - Single-target runs usually populate both `optimizedProgram.instruction` and `optimizedProgram.instructionMap`.
+ - Tree-wide runs rely on `instructionMap`, keyed by full program ID.
+ - Pareto points expose candidate configs under `point.configuration.instructionMap`.
+
+ ## Useful Options
+
+ ```typescript
+ const optimizer = new AxGEPA({
+   studentAI,
+   teacherAI,
+   numTrials: 20,
+   minibatch: true,
+   minibatchSize: 5,
+   minibatchFullEvalSteps: 5,
+   earlyStoppingTrials: 5,
+   minImprovementThreshold: 0,
+   sampleCount: 1,
+   seed: 42,
+   verbose: true,
+ });
+ ```
+
+ - `numTrials`: number of reflection/evolution rounds.
+ - `minibatch`: reduce per-round evaluation cost.
+ - `minibatchSize`: examples per minibatch.
+ - `earlyStoppingTrials`: stop after repeated non-improvement.
+ - `minImprovementThreshold`: reject tiny gains below this threshold.
+ - `seed`: stabilize sampling during demos and tests.
+
+ ## Budgeting and Validation
+
+ - Always create distinct `train` and `validationExamples` arrays.
+ - Size `maxMetricCalls` for at least one full validation pass plus several rounds (see the sketch below).
+ - If the user wants a strict budget, say so explicitly and set `maxMetricCalls`.
+ - For expensive trees, start with `auto: 'light'` or fewer `numTrials`, then scale up.
+
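+ A rough budgeting sketch based on the rule above (one full validation pass plus per-trial minibatch evaluations). This is a heuristic, not an official formula:
+
+ ```typescript
+ // Assumed cost model: validation.length metric calls for the initial pass,
+ // plus roughly minibatchSize calls per trial. Pad with some headroom.
+ const numTrials = 12;
+ const minibatchSize = 4;
+ const budget = validation.length + numTrials * minibatchSize;
+ const maxMetricCalls = Math.ceil(budget * 1.25);
+ ```
+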
+ ## Troubleshooting
+
+ - Error about `maxMetricCalls` being too small: increase it until the initial validation pass fits.
+ - Empty or poor Pareto front: verify the metric returns numbers for every example.
+ - No tree optimization effect: ensure child programs are registered under the root and have instructions to mutate.
+ - Saved optimization applies only partly: use `program.applyOptimization(...)`, not just `setInstruction(...)`, so `instructionMap` reaches the full tree.
+
+ ## Good Example Targets
+
+ - [GEPA](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/gepa.ts)
+ - [GEPA Flow](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/gepa-flow.ts)
+ - [GEPA Train/Inference](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/gepa-train-inference.ts)
+ - [GEPA Quality vs Speed](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/gepa-quality-vs-speed-optimization.ts)