@ax-llm/ax 21.0.12 → 21.0.14
- package/index.cjs +358 -223
- package/index.cjs.map +1 -1
- package/index.d.cts +4484 -4282
- package/index.d.ts +4484 -4282
- package/index.global.js +358 -223
- package/index.global.js.map +1 -1
- package/index.js +358 -223
- package/index.js.map +1 -1
- package/package.json +1 -1
- package/skills/ax-agent-memory-skills.md +52 -3
- package/skills/ax-agent-observability.md +2 -2
- package/skills/ax-agent-optimize.md +22 -27
- package/skills/ax-agent-rlm.md +31 -44
- package/skills/ax-agent.md +46 -11
- package/skills/ax-ai.md +38 -7
- package/skills/ax-audio.md +155 -33
- package/skills/ax-flow.md +1 -1
- package/skills/ax-gen.md +1 -1
- package/skills/ax-gepa.md +1 -1
- package/skills/ax-learn.md +1 -1
- package/skills/ax-llm.md +1 -1
- package/skills/ax-signature.md +13 -8
package/skills/ax-ai.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-ai
-description: This skill helps an LLM generate correct AI provider setup and configuration code using @ax-llm/ax. Use when the user asks about ai(), providers, models, presets, embeddings, extended thinking, context caching, or mentions OpenAI/Anthropic/Google/Azure/Groq/DeepSeek/Mistral/Cohere/Together/Ollama/HuggingFace/Reka/OpenRouter with @ax-llm/ax.
-version: "21.0.
+description: This skill helps an LLM generate correct AI provider setup and configuration code using @ax-llm/ax. Use when the user asks about ai(), providers, models, presets, embeddings, batch audio with ai.transcribe() or ai.speak(), extended thinking, context caching, or mentions OpenAI/Anthropic/Google/Azure/Groq/DeepSeek/Mistral/Cohere/Together/Ollama/HuggingFace/Reka/OpenRouter with @ax-llm/ax.
+version: "21.0.14"
 ---
 
 # AI Provider Codegen Rules (@ax-llm/ax)
@@ -78,6 +78,30 @@ const res = await llm.chat({
 console.log(res.results[0]?.content);
 ```
 
+## Batch Audio
+
+Use `ai.transcribe(...)` for batch speech-to-text and `ai.speak(...)` for batch text-to-speech. These are separate from conversational `.chat()` audio config.
+
+```typescript
+const transcript = await llm.transcribe({
+  audio: { data: base64Wav, format: 'wav' },
+  model: 'gpt-4o-mini-transcribe',
+  language: 'en',
+});
+
+const speech = await llm.speak({
+  text: transcript.text,
+  model: 'gpt-4o-mini-tts',
+  voice: 'alloy',
+  format: 'mp3',
+});
+
+console.log(transcript.text);
+console.log(speech.data);
+```
+
+Providers without the requested audio endpoint throw `AxMediaNotSupportedError`. Use `speech` forward options for signature audio artifacts and `modelConfig.audio` for conversational chat audio.
+
 ## Common Options
 
 - `stream` (boolean): enable SSE; true by default
@@ -117,14 +141,21 @@ import { ai, AxAIDeepSeekModel } from '@ax-llm/ax';
 const deepseek = ai({
   name: 'deepseek',
   apiKey: process.env.DEEPSEEK_APIKEY!,
-  config: { model: AxAIDeepSeekModel.
+  config: { model: AxAIDeepSeekModel.DeepSeekV4Flash },
 });
 ```
 
-DeepSeek
-
-
-
+DeepSeek's current API models are `deepseek-v4-flash` and `deepseek-v4-pro`.
+The deprecated `deepseek-chat` and `deepseek-reasoner` aliases are retained for
+compatibility until DeepSeek removes them on 2026-07-24.
+
+DeepSeek V4 supports thinking mode. Ax sends `thinking: { type: "disabled" }`
+by default to preserve non-thinking behavior, and enables it when
+`thinkingTokenBudget` is set. Ax maps lower budget levels to DeepSeek's `high`
+effort and maps `highest` to `max`. DeepSeek V4 thinking models support tools,
+but reject the `tool_choice` request parameter, so Ax omits forced/auto tool
+choice for `deepseek-v4-pro`, `deepseek-v4-flash`, and `deepseek-reasoner`
+while still sending tool definitions.
 
 ## Extended Thinking
 
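Reader's sketch, not part of the package diff: the DeepSeek V4 paragraph added to ax-ai.md describes three decisions (thinking off unless a budget is set, budget level mapped to effort, `tool_choice` omitted for certain models). The snippet below restates those rules as a small pure function. The names `BudgetLevel`, `DeepSeekPlan`, and `planDeepSeekRequest` are hypothetical, the list of budget level names is an assumption, and the real request shape Ax sends is not shown in the diff.

```typescript
// Hypothetical illustration only: none of these names are exports of @ax-llm/ax.
// Assumed budget level names; the diff only mentions "lower" levels and `highest`.
type BudgetLevel = 'minimal' | 'low' | 'medium' | 'high' | 'highest';

interface DeepSeekPlan {
  thinkingEnabled: boolean;
  effort?: 'high' | 'max';
  sendToolChoice: boolean;
}

// Models for which the diff says Ax omits the tool_choice parameter.
const MODELS_WITHOUT_TOOL_CHOICE = new Set<string>([
  'deepseek-v4-pro',
  'deepseek-v4-flash',
  'deepseek-reasoner',
]);

function planDeepSeekRequest(model: string, budget?: BudgetLevel): DeepSeekPlan {
  // Tool definitions are still sent; only the tool_choice parameter is dropped.
  const sendToolChoice = !MODELS_WITHOUT_TOOL_CHOICE.has(model);

  // No budget set: thinking stays disabled, matching the documented default.
  if (budget === undefined) {
    return { thinkingEnabled: false, sendToolChoice };
  }

  // `highest` maps to `max`; every lower level maps to `high`.
  return {
    thinkingEnabled: true,
    effort: budget === 'highest' ? 'max' : 'high',
    sendToolChoice,
  };
}
```

Keeping the rules in one pure function like this makes them easy to check against the skill text when the model list or effort mapping changes in a later release.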
package/skills/ax-audio.md
CHANGED
@@ -1,57 +1,175 @@
 ---
 name: ax-audio
-description: This skill helps an LLM generate correct
-version: "21.0.
+description: This skill helps an LLM generate correct audio code with @ax-llm/ax. Use when the user asks about ai.transcribe(), ai.speak(), signature audio inputs or outputs, agent audio behavior, .chat() conversational audio, OpenAI audio or realtime models, Gemini Live native audio, Grok Voice Agent models, voices, formats, transcripts, or how audio fits with structured outputs.
+version: "21.0.14"
 ---
 
 # Audio I/O Codegen Rules (@ax-llm/ax)
 
-Use this skill for
+Use this skill for audio in Ax. Pick the smallest audio surface that matches the job:
 
-
+- Use `ai.transcribe(...)` for batch speech-to-text.
+- Use `ai.speak(...)` for batch text-to-speech.
+- Use `speech:audio` signature outputs for structured programs that should return synthesized audio artifacts.
+- Use `.chat()` audio config for conversational or realtime audio turns.
 
-
+## Core Rules
 
-
+- Input `:audio` is an audio input value: `{ data, format?, mimeType?, sampleRate?, channels? }`.
+- Output `:audio` is a scripted audio artifact. The model returns plain text for that field; Ax synthesizes it after structured output parsing.
+- Output audio JSON schema is model-facing `string`, not a binary object.
+- Agents transcribe input audio fields before planner/executor/responder stages by default, so agent stages see text instead of base64 audio.
+- Realtime and conversational audio still use `.chat()` and `modelConfig.audio`.
+- Batch signature audio artifacts use forward-time `speech` options, not `modelConfig.audio`.
+
+## Direct Batch APIs
+
+```typescript
+import { ai } from '@ax-llm/ax';
+
+const llm = ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY! });
+
+const transcript = await llm.transcribe({
+  audio: { data: base64Wav, format: 'wav' },
+  model: 'gpt-4o-mini-transcribe',
+  language: 'en',
+  prompt: 'Product support call',
+});
+
+const speech = await llm.speak({
+  text: transcript.text,
+  model: 'gpt-4o-mini-tts',
+  voice: 'alloy',
+  format: 'mp3',
+});
+
+console.log(transcript.text);
+console.log(speech.data);
+console.log(speech.transcript);
+```
+
+Providers without the requested batch audio capability throw `AxMediaNotSupportedError`.
+
+## Signature Audio Artifacts
+
+```typescript
+import { ai, ax } from '@ax-llm/ax';
+
+const llm = ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY! });
+const say = ax('question:string -> speech:audio, summary:string');
+
+const result = await say.forward(
+  llm,
+  { question: 'Explain retries in one sentence.' },
+  {
+    speech: {
+      speak: { voice: 'alloy', format: 'mp3' },
+      fields: {
+        speech: { voice: 'alloy' },
+      },
+    },
+  }
+);
+
+console.log(result.summary);
+console.log(result.speech.data);
+console.log(result.speech.mimeType);
+console.log(result.speech.transcript);
+```
+
+The model emits a text script for `speech`; Ax replaces it with `AxChatAudioOutput` after result selection. If the field already contains an audio artifact with `{ data }` or `{ id }`, Ax leaves it alone.
+
+## Agent Audio Inputs
+
+```typescript
+import { agent, ai } from '@ax-llm/ax';
+
+const llm = ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY! });
+
+const voiceAgent = agent(
+  'recording:audio, question:string -> speech:audio, summary:string',
+  {
+    agentIdentity: {
+      name: 'Voice Assistant',
+      description: 'Answers spoken requests with spoken and written output',
+    },
+    contextFields: [],
+  }
+);
+
+const result = await voiceAgent.forward(
+  llm,
+  {
+    recording: { data: base64Wav, format: 'wav' },
+    question: 'What should I do next?',
+  },
+  {
+    speech: {
+      transcribe: { model: 'gpt-4o-mini-transcribe' },
+      speak: { voice: 'alloy', format: 'mp3' },
+    },
+  }
+);
+
+console.log(result.summary);
+console.log(result.speech.data);
+```
+
+The agent runtime transcribes `recording` first and passes the transcript through the internal agent stages. Use direct `ax(...)` or `.chat()` when you specifically want native audio understanding in the model call.
+
+## Conversational `.chat()` Audio
+
+Use `modelConfig.audio` for conversational audio turns where audio is part of the chat response instead of a structured signature field.
 
 ```typescript
-const
+const res = await llm.chat({
   chatPrompt: [{ role: 'user', content: 'Say hello out loud.' }],
   modelConfig: {
-    audio: { output: { enabled: true } },
+    audio: { output: { enabled: true, voice: 'alloy', format: 'wav' } },
   },
 });
 
-console.log(
-console.log(
-console.log(
+console.log(res.results[0]?.content);
+console.log(res.results[0]?.audio?.data);
+console.log(res.results[0]?.audio?.transcript);
 ```
 
-Do not write signatures like `question:string -> audio:audio`. Use `.chat()` for conversational audio and use `audio.data` for the generated bytes.
-
 ## Config Shape
 
 ```typescript
-type
-
-
-
-
-
-
-
-
-
-
-
-
-
+type AxAudioFormat =
+  | 'wav'
+  | 'mp3'
+  | 'flac'
+  | 'opus'
+  | 'aac'
+  | 'pcm16'
+  | 'pcm'
+  | 'ogg'
+  | 'raw'
+  | 'mulaw'
+  | 'ulaw'
+  | 'alaw';
+
+type AxSpeechConfig = {
+  transcribe?: {
+    model?: string;
+    language?: string;
+    prompt?: string;
   };
-
-
-
-
+  speak?: {
+    model?: string;
+    voice?: string;
+    format?: AxAudioFormat;
   };
+  fields?: Record<
+    string,
+    {
+      model?: string;
+      voice?: string;
+      format?: AxAudioFormat;
+    }
+  >;
 };
 ```
 
@@ -246,6 +364,10 @@ for await (const chunk of stream) {
 
 ## Structured Outputs
 
-
+Use signature audio outputs for structured speech artifacts:
+
+```typescript
+const gen = ax('question:string -> answer:string, speech:audio');
+```
 
-
+Use `.chat()` audio when the response itself is a conversational audio turn. Do not combine `.chat()` audio output with provider-native structured response formats unless that provider explicitly supports the combination.
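Reader's sketch, not part of the package diff: the new ax-audio.md text says the model emits a text script for an output `:audio` field, Ax synthesizes it afterwards, and a value that already looks like an audio artifact with `{ data }` or `{ id }` is left alone. The snippet below shows one way that decision could be expressed. `AudioArtifactLike`, `isAudioArtifact`, and `classifySpeechField` are hypothetical names, and checking that `data` or `id` is a string is an assumption about how "already an artifact" would be detected.

```typescript
// Hypothetical stand-in for the synthesized artifact; the real
// AxChatAudioOutput type in @ax-llm/ax may carry other fields.
interface AudioArtifactLike {
  data?: string;
  id?: string;
}

type SpeechFieldKind = 'artifact' | 'script' | 'empty';

// Assumed detection rule: an object with a string `data` or a string `id`.
function isAudioArtifact(value: unknown): value is AudioArtifactLike {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record['data'] === 'string' || typeof record['id'] === 'string';
}

// Decide what a post-processing step would do with an output `:audio` field.
function classifySpeechField(value: unknown): SpeechFieldKind {
  if (isAudioArtifact(value)) return 'artifact'; // already audio: leave it alone
  if (typeof value === 'string' && value.trim().length > 0) return 'script'; // text to synthesize
  return 'empty'; // nothing to synthesize
}
```

A caller would synthesize speech only for the `'script'` case, which is why the field's model-facing JSON schema can stay a plain `string`.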
package/skills/ax-flow.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-flow
 description: This skill helps an LLM generate correct AxFlow workflow code using @ax-llm/ax. Use when the user asks about flow(), AxFlow, workflow orchestration, parallel execution, DAG workflows, conditional routing, map/reduce patterns, or multi-node AI pipelines.
-version: "21.0.
+version: "21.0.14"
 ---
 
 # AxFlow Codegen Rules (@ax-llm/ax)
package/skills/ax-gen.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-gen
 description: This skill helps an LLM generate correct AxGen code using @ax-llm/ax. Use when the user asks about ax(), AxGen, generators, forward(), streamingForward(), assertions, field processors, step hooks, self-tuning, or structured outputs.
-version: "21.0.
+version: "21.0.14"
 ---
 
 # AxGen Codegen Rules (@ax-llm/ax)
package/skills/ax-gepa.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-gepa
 description: This skill helps an LLM generate correct AxGEPA optimization code using @ax-llm/ax. Use when the user asks about AxGEPA, GEPA, Pareto optimization, multi-objective prompt tuning, reflective prompt evolution, validationExamples, maxMetricCalls, or optimizing a generator, flow, or agent tree.
-version: "21.0.
+version: "21.0.14"
 ---
 
 # AxGEPA Codegen Rules (@ax-llm/ax)
package/skills/ax-learn.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-learn
 description: This skill helps an LLM generate correct AxLearn code using @ax-llm/ax. Use when the user asks about self-improving agents, trace-backed learning, feedback-aware updates, or AxLearn modes.
-version: "21.0.
+version: "21.0.14"
 ---
 
 # AxLearn Codegen Rules (@ax-llm/ax)
package/skills/ax-llm.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-llm
 description: This skill helps with using the @ax-llm/ax TypeScript library for building LLM applications. Use when the user asks about ax(), ai(), f(), s(), agent(), flow(), AxGen, AxAgent, AxFlow, signatures, streaming, or mentions @ax-llm/ax.
-version: "21.0.
+version: "21.0.14"
 ---
 
 # Ax Library (@ax-llm/ax) Quick Reference
package/skills/ax-signature.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-signature
 description: This skill helps an LLM generate correct DSPy signature code using @ax-llm/ax. Use when the user asks about signatures, s(), f(), field types, string syntax, fluent builder API, validation constraints, or type-safe inputs/outputs.
-version: "21.0.
+version: "21.0.14"
 ---
 
 # Ax Signature Reference
@@ -25,7 +25,7 @@ version: "21.0.12"
 | DateRange | `:dateRange` | `{ start: Date; end: Date }` | `travelDates:dateRange` |
 | DateTimeRange | `:datetimeRange` | `{ start: Date; end: Date }` | `meetingWindow:datetimeRange` |
 | Image | `:image` | `{mimeType, data}` | `photo:image` (input only) |
-| Audio | `:audio` | `
+| Audio | `:audio` | input: `AxAudioInput`; output: `AxChatAudioOutput` | `recording:audio`, `speech:audio` |
 | File | `:file` | `{mimeType, data}` | `document:file` (input only) |
 | URL | `:url` | `string` | `website:url` |
 | Code | `:code` | `string` | `pythonScript:code` |
@@ -256,9 +256,11 @@ Bad: `text`, `data`, `input`, `output`, `a`, `x`, `val` (too generic), `1field`
 
 ## Media Type Restrictions
 
--
--
--
+- Image and file fields are top-level input fields only.
+- Audio fields can be top-level inputs or single top-level outputs.
+- Audio output fields are scripted speech artifacts: the model returns plain text, then Ax synthesizes `AxChatAudioOutput`.
+- Media fields cannot be nested in objects.
+- Media arrays are supported for inputs only; output `audio[]` is not supported.
 
 ## Common Patterns
 
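Reader's sketch, not part of the package diff: the revised "Media Type Restrictions" list can be read as a small set of checks on each signature field. The snippet below encodes them under stated assumptions. `FieldSpec` and `checkMediaField` are hypothetical names, not Ax APIs, and "single top-level output" is interpreted here as "a non-array output field", which the diff does not spell out.

```typescript
// Hypothetical field description; Ax's real signature model is not shown in the diff.
type FieldType = 'image' | 'audio' | 'file' | 'string' | 'number' | 'object';

interface FieldSpec {
  name: string;
  type: FieldType;
  side: 'input' | 'output';
  isArray?: boolean;
  nested?: boolean; // true when the field sits inside an object field
}

const MEDIA_TYPES = new Set<string>(['image', 'audio', 'file']);

// Returns an error message when a field breaks a media rule, or null when it is allowed.
function checkMediaField(field: FieldSpec): string | null {
  if (!MEDIA_TYPES.has(field.type)) return null; // rules only apply to media fields

  if (field.nested) {
    return `${field.name}: media fields cannot be nested in objects`;
  }
  if (field.side === 'output') {
    if (field.type !== 'audio') {
      return `${field.name}: ${field.type} fields are input-only`;
    }
    if (field.isArray) {
      return `${field.name}: output audio[] is not supported`;
    }
  }
  return null; // top-level media input (array or not) or single audio output
}
```

For example, `speech:audio` as an output passes, while `photo:image` as an output or `clips:audio[]` as an output would each produce a message.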
@@ -269,9 +271,12 @@ Bad: `text`, `data`, `input`, `output`, `a`, `x`, `val` (too generic), `1field`
 // Classification
 'email:string -> priority:class "urgent, normal, low"'
 
-// Multi-modal
+// Multi-modal input
 'imageData:image, question?:string -> description:string, objects:string[]'
 
+// Scripted speech output
+'question:string -> speech:audio, summary:string'
+
 // Data Extraction
 'invoiceText:string -> invoiceNumber:string, totalAmount:number, lineItems:json[]'
 
@@ -283,13 +288,13 @@ Bad: `text`, `data`, `input`, `output`, `a`, `x`, `val` (too generic), `1field`
 
 - Use `f()` fluent builder, NOT nested `f.array(f.string())` -- those are removed.
 - Field names must be descriptive (not generic like `text`, `data`, `input`).
--
+- Image/file media types are input-only, top-level only; audio may also be a single top-level output.
 - `.internal()` / `{ internal: true }` is output-only (for chain-of-thought reasoning).
 - `.cache()` / `{ cache: true }` is input-only (for prompt caching).
 - Validation errors trigger auto-retry with correction feedback.
 - `f.email()`, `f.url()`, `f.date()`, `f.datetime()` are shorthand for `f.string().email()` etc.; `f.dateRange()` and `f.datetimeRange()` return `{ start: Date; end: Date }`.
 - `z.enum()` maps to ax's `class` type — only valid on **output** fields.
-- For multimodal inputs (images, audio, files) use `f.image()` / `f.audio()` / `f.file()` — zod has no equivalent.
+- For multimodal inputs (images, audio, files) and scripted audio outputs, use `f.image()` / `f.audio()` / `f.file()` — zod has no equivalent.
 
 ## Examples
 