@ax-llm/ax 21.0.14 → 22.0.0

@@ -0,0 +1,81 @@
1
+ ---
2
+ name: ax-refine
3
+ description: Use this skill when writing or reviewing Ax bestOfN/refine code, reward functions, thresholds, native sample selection, serial attempts, generated advice, and attempt diagnostics.
4
+ version: "22.0.0"
5
+ ---
6
+
7
+ # Ax Refine And BestOfN
8
+
9
+ Use `bestOfN(...)` when you can score complete outputs independently. Use `refine(...)` when failed rounds should produce feedback that changes the next attempt.
10
+
11
+ ## Breaking Migration
12
+
13
+ Treat this as a breaking API change:
14
+
15
+ - Do not generate `addAssert(...)` or `addStreamingAssert(...)`; they are removed.
16
+ - Use schema validation for shape and field validity.
17
+ - Use `bestOfN(...)` for complete-candidate selection.
18
+ - Use `refine(...)` for retry rounds with generated feedback.
19
+ - Use `addStreamingGuard(...)` only for fail-fast streaming safety.
20
+
21
+ ## APIs
22
+
23
+ ```typescript
24
+ import { bestOfN, refine } from '@ax-llm/ax';
25
+
26
+ const selected = bestOfN(program, {
27
+   n: 4,
28
+   threshold: 0.8,
29
+   rewardFn: ({ input, prediction, traces, chatLog }) => score(prediction),
30
+ });
31
+
32
+ const improved = refine(program, {
33
+   rounds: 3,
34
+   samplesPerRound: 2,
35
+   threshold: 0.85,
36
+   rewardDescription: 'Prefer complete, grounded, concise answers.',
37
+   rewardFn: ({ prediction }) => score(prediction),
38
+ });
39
+ ```
40
+
41
+ Rules:
42
+
43
+ - `forward(...)` returns the selected prediction.
44
+ - `streamingForward(...)` is unsupported; score complete outputs instead.
45
+ - `getUsage()` aggregates usage across attempts.
46
+ - `getTraces()` and `getChatLog()` return the selected attempt's diagnostics.
47
+ - `getAttempts()` returns all attempt metadata, including reward, errors, and advice application.
48
+
49
+ ## Reward Functions
50
+
51
+ Reward functions return a number. Higher is better. A `threshold` marks a good-enough candidate and can stop serial attempts early.
52
+
53
+ ```typescript
54
+ const rewardFn = ({ prediction }) => {
55
+   const exact = prediction.answer === 'Paris' ? 1 : 0;
56
+   const concise = prediction.answer.length < 80 ? 0.2 : 0;
57
+   return exact + concise;
58
+ };
59
+ ```
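To make the threshold rule concrete, here is a minimal standalone sketch of how a "good enough" score can end a serial run early. It does not call `@ax-llm/ax`: `Prediction`, `RewardFn`, and `pickSerial` are hypothetical names invented for illustration, and the library's actual selection logic may differ.

```typescript
// Hypothetical stand-ins for illustration; these are not Ax types.
type Prediction = { answer: string };
type RewardFn = (args: { prediction: Prediction }) => number;

// Same scoring rule as the example above: exact match plus a small conciseness bonus.
const rewardFn: RewardFn = ({ prediction }) => {
  const exact = prediction.answer === 'Paris' ? 1 : 0;
  const concise = prediction.answer.length < 80 ? 0.2 : 0;
  return exact + concise;
};

// Score candidates in order. Stop at the first one that reaches the threshold;
// otherwise fall back to the best-scoring candidate seen so far.
function pickSerial(
  candidates: Prediction[],
  reward: RewardFn,
  threshold: number
): { best: Prediction; score: number; attempts: number } {
  let best: Prediction | undefined;
  let bestScore = -Infinity;
  let attempts = 0;
  for (const prediction of candidates) {
    attempts += 1;
    const score = reward({ prediction });
    if (score > bestScore) {
      best = prediction;
      bestScore = score;
    }
    if (score >= threshold) break; // good enough, skip the remaining attempts
  }
  if (best === undefined) throw new Error('no candidates');
  return { best, score: bestScore, attempts };
}

const result = pickSerial(
  [{ answer: 'Lyon' }, { answer: 'Paris' }, { answer: 'Paris, France' }],
  rewardFn,
  0.8
);
```

In this sketch the third candidate is never scored, because the second one already clears the threshold.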
60
+
61
+ Use serial strategy when the reward needs traces, chat logs, tools, or full flow behavior.
62
+
63
+ ## Strategies
64
+
65
+ - `strategy: "auto"` uses native samples for `AxGen` and serial attempts for composite programs.
66
+ - `strategy: "native-samples"` uses `sampleCount` and a reward-backed `resultPicker`; candidate context includes outputs, not full per-candidate traces.
67
+ - `strategy: "serial"` runs isolated full-program attempts with fresh memory/session IDs.
68
+
69
+ ## Refine Advice
70
+
71
+ `refine(...)` generates advice after a below-threshold round. Advice is appended temporarily to matching `kind: "instruction"` components exposed by `getOptimizableComponents()` and applied through `applyOptimizedComponents()`.
72
+
73
+ Rules:
74
+
75
+ - Original instruction values are restored in `finally`, on success and error.
76
+ - Programs without instruction components continue as best-of-N rounds and mark `adviceApplied: false`.
77
+ - Do not add DSPy-style `hint_` signature fields; Ax uses instruction-component advice.
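The restore-in-`finally` rule above can be pictured with a small standalone sketch. It is only an illustration of the pattern, not Ax's implementation: `InstructionComponent`, its `key` and `current` fields, and `withTemporaryAdvice` are hypothetical, and the helper is synchronous for brevity (an async version would `await run()` inside the `try`).

```typescript
// Hypothetical component shape for illustration; not the Ax type definitions.
type InstructionComponent = { kind: 'instruction'; key: string; current: string };

function withTemporaryAdvice<T>(
  components: InstructionComponent[],
  advice: string,
  run: () => T
): T {
  // Snapshot the original instruction text before changing anything.
  const snapshot = components.map((component) => ({
    component,
    original: component.current,
  }));
  try {
    for (const component of components) {
      component.current = `${component.current}\n\nAdvice: ${advice}`;
    }
    return run();
  } finally {
    // Runs on both success and error, so advice never outlives the attempt.
    for (const { component, original } of snapshot) {
      component.current = original;
    }
  }
}

const components: InstructionComponent[] = [
  { kind: 'instruction', key: 'root', current: 'Answer briefly.' },
];

// During the run the instruction carries the advice...
const seenDuringRun = withTemporaryAdvice(
  components,
  'Cite the source.',
  () => components[0]?.current ?? ''
);

// ...and it is restored even when the attempt throws.
let threw = false;
try {
  withTemporaryAdvice(components, 'Cite the source.', () => {
    throw new Error('attempt failed');
  });
} catch {
  threw = true;
}
```

Taking the snapshot before the `try` block means a failure while appending advice still restores every component.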
78
+
79
+ ## Streaming
80
+
81
+ Do not use `refine(...)` for streaming. For partial-output safety, use `addStreamingGuard(fieldName, fn, message?)` on `AxGen`. Guards fail fast with `AxStreamingGuardError`; they do not retry, refine, or feed correction text back to the model.
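As a rough picture of what "fail fast" means here, the standalone sketch below checks accumulated partial output after every chunk and throws on the first violation, with no retry. It does not use the Ax API: `GuardError`, `Guard`, and `consumeWithGuard` are hypothetical names, and the real guard callback signature of `addStreamingGuard` is not shown in this diff.

```typescript
// Hypothetical stand-in for illustration; the real AxStreamingGuardError lives in @ax-llm/ax.
class GuardError extends Error {}

// Returns true while the partial output is still acceptable.
type Guard = (partial: string) => boolean;

function consumeWithGuard(
  chunks: string[],
  guard: Guard,
  message = 'guard failed'
): string {
  let text = '';
  for (const chunk of chunks) {
    text += chunk;
    // Check after every chunk and stop immediately on failure: no retry, no feedback.
    if (!guard(text)) throw new GuardError(message);
  }
  return text;
}

let checks = 0;
const noSecrets: Guard = (partial) => {
  checks += 1;
  return !partial.includes('SECRET');
};

const ok = consumeWithGuard(['Hello, ', 'world'], noSecrets);
const checksForOk = checks;

checks = 0;
let failedFast = false;
try {
  consumeWithGuard(['The key is ', 'SECRET', ' and more'], noSecrets, 'secret leaked');
} catch (err) {
  failedFast = err instanceof Error && err.message === 'secret leaked';
}
const checksForLeak = checks;
```

The third chunk of the leaking stream is never inspected, which is the behavior the paragraph above contrasts with `refine(...)`.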
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  name: ax-signature
3
3
  description: This skill helps an LLM generate correct DSPy signature code using @ax-llm/ax. Use when the user asks about signatures, s(), f(), field types, string syntax, fluent builder API, validation constraints, or type-safe inputs/outputs.
4
- version: "21.0.14"
4
+ version: "22.0.0"
5
5
  ---
6
6
 
7
7
  # Ax Signature Reference
@@ -1,268 +0,0 @@
1
- ---
2
- name: ax-learn
3
- description: This skill helps an LLM generate correct AxLearn code using @ax-llm/ax. Use when the user asks about self-improving agents, trace-backed learning, feedback-aware updates, or AxLearn modes.
4
- version: "21.0.14"
5
- ---
6
-
7
- # AxLearn Codegen Rules (@ax-llm/ax)
8
-
9
- Use this skill to generate `AxLearn` code that matches the current API.
10
-
11
- ## Core Model
12
-
13
- - `AxLearn` wraps an `AxGen`.
14
- - `teacher` is for judging, synthesis, and reflection.
15
- - `runtimeAI` is the model being improved.
16
- - `forward()` and `streamingForward()` are inference-time APIs and auto-log traces when tracing is enabled.
17
- - `optimize()` is offline learning.
18
- - `applyUpdate()` is a bounded update API for `continuous` and `playbook` modes.
19
- - `ready()` should be awaited before assuming checkpoints have been restored.
20
- - `improvement` is the score delta from the previous/restored state.
21
-
22
- ## Required Inputs
23
-
24
- - Always provide `name`.
25
- - Always provide `storage`.
26
- - Always provide `teacher`.
27
- - Always provide `runtimeAI` if you call `optimize()` or `applyUpdate()`.
28
-
29
- ## Modes
30
-
31
- - `batch`: offline prompt learning only.
32
- - `continuous`: offline optimization plus bounded feedback-aware `applyUpdate(...)`.
33
- - `playbook`: structured context/playbook learning plus `applyUpdate(...)`.
34
-
35
- ## Preferred Construction
36
-
37
- ```typescript
38
- import {
39
- AxLearn,
40
- ax,
41
- ai,
42
- type AxCheckpoint,
43
- type AxStorage,
44
- type AxTrace,
45
- } from '@ax-llm/ax';
46
-
47
- const storage: AxStorage = {
48
- save: async (_name, _item) => {
49
- // persist trace/checkpoint
50
- },
51
- load: async (_name, _query) => {
52
- // return traces/checkpoints
53
- return [];
54
- },
55
- };
56
-
57
- const teacher = ai({
58
- name: 'openai',
59
- apiKey: process.env.OPENAI_APIKEY!,
60
- });
61
-
62
- const runtimeAI = ai({
63
- name: 'openai',
64
- apiKey: process.env.OPENAI_APIKEY!,
65
- });
66
-
67
- const gen = ax(`
68
- customerQuery:string "User message" ->
69
- supportReply:string "Agent reply"
70
- `);
71
-
72
- const agent = new AxLearn(gen, {
73
- name: 'support-bot-v1',
74
- storage,
75
- teacher,
76
- runtimeAI,
77
- mode: 'continuous',
78
- budget: 12,
79
- examples: [
80
- {
81
- customerQuery: 'Where is my order?',
82
- supportReply: 'Your order is in transit and should arrive in 2 days.',
83
- },
84
- {
85
- customerQuery: 'I need a refund.',
86
- supportReply: 'I can help with that. Please share your order number.',
87
- },
88
- ],
89
- generateExamples: false,
90
- });
91
-
92
- await agent.ready();
93
- ```
94
-
95
- ## Runtime Pattern
96
-
97
- ```typescript
98
- const prediction = await agent.forward(runtimeAI, {
99
- customerQuery: 'My package is late.',
100
- });
101
-
102
- const traces = await agent.getTraces({ limit: 1 });
103
- if (traces[0]) {
104
- await agent.addFeedback(traces[0].id, {
105
- score: 0,
106
- label: 'needs-empathy',
107
- comment: 'Acknowledge the frustration more directly.',
108
- });
109
- }
110
- ```
111
-
112
- ## Offline Optimization
113
-
114
- ```typescript
115
- const result = await agent.optimize({
116
- // Optional overrides
117
- budget: 20,
118
- });
119
-
120
- console.log(result.mode);
121
- console.log(result.score);
122
- console.log(result.improvement);
123
- console.log(result.checkpointVersion);
124
- ```
125
-
126
- `result.improvement` is the gain relative to the prior/restored score.
127
-
128
- ## Continuous Update
129
-
130
- Use `applyUpdate(...)` only in `continuous` or `playbook` mode.
131
-
132
- - In `continuous` mode, `example` may be input-only.
133
- - `prediction` is the observed runtime output being critiqued.
134
- - If `example` includes expected output fields, that expected-output row stays eligible for scored optimization.
135
- - The observed `prediction` row is feedback/reflection context, not a scored train/validation row by itself.
136
- - Feedback-bearing scored examples should stay in the training pool when non-feedback rows can fill validation.
137
- - In `playbook` mode, `getInstruction()` returns the active composed prompt.
138
-
139
- ```typescript
140
- const update = await agent.applyUpdate({
141
- example: {
142
- customerQuery: 'My package is late.',
143
- },
144
- prediction,
145
- feedback: {
146
- score: 0,
147
- label: 'needs-empathy',
148
- comment: 'Acknowledge the frustration more directly.',
149
- },
150
- });
151
- ```
152
-
153
- ## Playbook Mode
154
-
155
- - Use `mode: 'playbook'` when the learned artifact should be structured guidance, not just an instruction tweak.
156
- - Playbook checkpoints restore through `ready()`.
157
- - `applyUpdate(...)` in playbook mode performs an online structured update.
158
- - `getInstruction()` should be treated as the active composed runtime prompt, even before optimization if the base prompt lives in the signature description.
159
- - `artifact.playbookSummary` should match the persisted checkpoint `state.artifactSummary`.
160
-
161
- ## How Learning Data Is Used
162
-
163
- - `examples` and usable traces become scored optimization rows.
164
- - Feedback stored with `addFeedback(...)` becomes reflection feedback for later optimization.
165
- - In continuous updates, `example + prediction + feedback` is used as an observed feedback event.
166
- - Input-only update examples are useful for reflection, but they are not promoted into scored examples unless expected outputs are present.
167
-
168
- ## Important Options
169
-
170
- ```typescript
171
- const agent = new AxLearn(gen, {
172
- name: 'agent-id',
173
- storage,
174
- teacher,
175
- runtimeAI,
176
- mode: 'batch', // 'batch' | 'continuous' | 'playbook'
177
- budget: 20,
178
- metric: async ({ prediction, example }) => {
179
- return prediction.supportReply === example.supportReply ? 1 : 0;
180
- },
181
- criteria: 'accuracy and tone',
182
- judgeOptions: {},
183
- examples: [],
184
- useTraces: true,
185
- generateExamples: false,
186
- synthCount: 20,
187
- validationSplit: 0.2,
188
- continuousOptions: {
189
- feedbackWindowSize: 25,
190
- maxRecentTraces: 100,
191
- updateBudget: 4,
192
- },
193
- playbookOptions: {
194
- maxEpochs: 2,
195
- },
196
- onTrace: (trace) => {
197
- console.log(trace.id);
198
- },
199
- onProgress: (progress) => {
200
- console.log(progress.round, progress.score);
201
- },
202
- });
203
- ```
204
-
205
- ## Result Shape
206
-
207
- ```typescript
208
- type AxLearnResult = {
209
- mode: 'batch' | 'continuous' | 'playbook';
210
- score: number;
211
- improvement: number;
212
- checkpointVersion: number;
213
- stats: {
214
- trainingExamples: number;
215
- validationExamples: number;
216
- feedbackExamples: number;
217
- durationMs: number;
218
- mode: 'batch' | 'continuous' | 'playbook';
219
- };
220
- state?: {
221
- mode: 'batch' | 'continuous' | 'playbook';
222
- instruction?: string;
223
- baseInstruction?: string;
224
- score?: number;
225
- continuous?: {
226
- feedbackTraceCount?: number;
227
- lastUpdateAt?: string;
228
- };
229
- playbook?: Record<string, unknown>;
230
- artifactSummary?: Record<string, unknown>;
231
- };
232
- artifact?: {
233
- playbook?: Record<string, unknown>;
234
- playbookSummary?: {
235
- feedbackEvents: number;
236
- historyBatches: number;
237
- bulletCount: number;
238
- updatedAt?: string;
239
- };
240
- lastUpdateAt?: string;
241
- feedbackExamples?: number;
242
- };
243
- };
244
- ```
245
-
246
- ## Storage Notes
247
-
248
- - `AxStorage.save(name, item)` receives either a trace or checkpoint.
249
- - `AxStorage.load(name, query)` should return arrays of traces or checkpoints.
250
- - Checkpoints may be returned unsorted. `AxLearn` restores the newest one client-side.
251
-
252
- ## Do This
253
-
254
- - Use `runtimeAI` explicitly.
255
- - Await `ready()` before relying on restored state.
256
- - Run `optimize()` off the hot path.
257
- - Use `continuous` mode when you want bounded feedback-aware updates.
258
- - Use `playbook` mode when you want persistent structured guidance.
259
- - Pass the real observed model output as `prediction` in `applyUpdate(...)`.
260
- - Treat `getInstruction()` in playbook mode as the live composed prompt, not just the raw base instruction.
261
-
262
- ## Avoid This
263
-
264
- - Do not assume `teacher` is the optimized runtime model.
265
- - Do not call `applyUpdate()` in `batch` mode.
266
- - Do not claim feedback affects learning unless you are storing it with `addFeedback(...)` or passing it to `applyUpdate(...)`.
267
- - Do not assume checkpoints load synchronously in the constructor.
268
- - Do not treat `prediction` as the gold answer in continuous updates.