npm - @ax-llm/ax - Versions diffs - 21.0.12 → 21.0.13 - Mend

@ax-llm/ax 21.0.12 → 21.0.13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (22)

package/index.cjs +358 -223
package/index.cjs.map +1 -1
package/index.d.cts +4484 -4282
package/index.d.ts +4484 -4282
package/index.global.js +358 -223
package/index.global.js.map +1 -1
package/index.js +358 -223
package/index.js.map +1 -1
package/package.json +1 -1
package/skills/ax-agent-memory-skills.md +52 -3
package/skills/ax-agent-observability.md +2 -2
package/skills/ax-agent-optimize.md +22 -27
package/skills/ax-agent-rlm.md +30 -43
package/skills/ax-agent.md +46 -11
package/skills/ax-ai.md +38 -7
package/skills/ax-audio.md +155 -33
package/skills/ax-flow.md +1 -1
package/skills/ax-gen.md +1 -1
package/skills/ax-gepa.md +1 -1
package/skills/ax-learn.md +1 -1
package/skills/ax-llm.md +1 -1
package/skills/ax-signature.md +13 -8

package/skills/ax-audio.md CHANGED

@@ -1,57 +1,175 @@
 ---
 name: ax-audio
-description: This skill helps an LLM generate correct conversational audio I/O code with @ax-llm/ax. Use when the user asks about .chat() audio input, audio output, OpenAI gpt-audio or realtime models, Gemini Live native audio, Grok Voice Agent models, voices, formats, transcripts, or how audio fits with signatures and structured outputs.
-version: "21.0.12"
+description: This skill helps an LLM generate correct audio code with @ax-llm/ax. Use when the user asks about ai.transcribe(), ai.speak(), signature audio inputs or outputs, agent audio behavior, .chat() conversational audio, OpenAI audio or realtime models, Gemini Live native audio, Grok Voice Agent models, voices, formats, transcripts, or how audio fits with structured outputs.
+version: "21.0.13"
 ---
 # Audio I/O Codegen Rules (@ax-llm/ax)
-Use this skill for bounded-turn conversational audio through `.chat()`. Prefer short, modern, copyable examples. Do not model generated audio as a DSPy signature output field.
+Use this skill for audio in Ax. Pick the smallest audio surface that matches the job:
-## Core Rule
+- Use `ai.transcribe(...)` for batch speech-to-text.
+- Use `ai.speak(...)` for batch text-to-speech.
+- Use `speech:audio` signature outputs for structured programs that should return synthesized audio artifacts.
+- Use `.chat()` audio config for conversational or realtime audio turns.
-Audio output is returned on `AxChatResponseResult.audio`, not in signature fields.
+## Core Rules
-Signatures should keep text fields text-shaped:
+- Input `:audio` is an audio input value: `{ data, format?, mimeType?, sampleRate?, channels? }`.
+- Output `:audio` is a scripted audio artifact. The model returns plain text for that field; Ax synthesizes it after structured output parsing.
+- Output audio JSON schema is model-facing `string`, not a binary object.
+- Agents transcribe input audio fields before planner/executor/responder stages by default, so agent stages see text instead of base64 audio.
+- Realtime and conversational audio still use `.chat()` and `modelConfig.audio`.
+- Batch signature audio artifacts use forward-time `speech` options, not `modelConfig.audio`.
+## Direct Batch APIs
+```typescript
+import { ai } from '@ax-llm/ax';
+const llm = ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY! });
+const transcript = await llm.transcribe({
+  audio: { data: base64Wav, format: 'wav' },
+  model: 'gpt-4o-mini-transcribe',
+  language: 'en',
+  prompt: 'Product support call',
+});
+const speech = await llm.speak({
+  text: transcript.text,
+  model: 'gpt-4o-mini-tts',
+  voice: 'alloy',
+  format: 'mp3',
+});
+console.log(transcript.text);
+console.log(speech.data);
+console.log(speech.transcript);
+```
+Providers without the requested batch audio capability throw `AxMediaNotSupportedError`.
+## Signature Audio Artifacts
+```typescript
+import { ai, ax } from '@ax-llm/ax';
+const llm = ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY! });
+const say = ax('question:string -> speech:audio, summary:string');
+const result = await say.forward(
+  llm,
+  { question: 'Explain retries in one sentence.' },
+  {
+    speech: {
+      speak: { voice: 'alloy', format: 'mp3' },
+      fields: {
+        speech: { voice: 'alloy' },
+      },
+    },
+  }
+);
+console.log(result.summary);
+console.log(result.speech.data);
+console.log(result.speech.mimeType);
+console.log(result.speech.transcript);
+```
+The model emits a text script for `speech`; Ax replaces it with `AxChatAudioOutput` after result selection. If the field already contains an audio artifact with `{ data }` or `{ id }`, Ax leaves it alone.
+## Agent Audio Inputs
+```typescript
+import { agent, ai } from '@ax-llm/ax';
+const llm = ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY! });
+const voiceAgent = agent(
+  'recording:audio, question:string -> speech:audio, summary:string',
+  {
+    agentIdentity: {
+      name: 'Voice Assistant',
+      description: 'Answers spoken requests with spoken and written output',
+    },
+    contextFields: [],
+  }
+);
+const result = await voiceAgent.forward(
+  llm,
+  {
+    recording: { data: base64Wav, format: 'wav' },
+    question: 'What should I do next?',
+  },
+  {
+    speech: {
+      transcribe: { model: 'gpt-4o-mini-transcribe' },
+      speak: { voice: 'alloy', format: 'mp3' },
+    },
+  }
+);
+console.log(result.summary);
+console.log(result.speech.data);
+```
+The agent runtime transcribes `recording` first and passes the transcript through the internal agent stages. Use direct `ax(...)` or `.chat()` when you specifically want native audio understanding in the model call.
+## Conversational `.chat()` Audio
+Use `modelConfig.audio` for conversational audio turns where audio is part of the chat response instead of a structured signature field.
 ```typescript
-const result = await llm.chat({
+const res = await llm.chat({
   chatPrompt: [{ role: 'user', content: 'Say hello out loud.' }],
   modelConfig: {
-    audio: { output: { enabled: true } },
+    audio: { output: { enabled: true, voice: 'alloy', format: 'wav' } },
   },
 });
-console.log(result.results[0]?.content);
-console.log(result.results[0]?.audio?.data);
-console.log(result.results[0]?.audio?.transcript);
+console.log(res.results[0]?.content);
+console.log(res.results[0]?.audio?.data);
+console.log(res.results[0]?.audio?.transcript);
 ```
-Do not write signatures like `question:string -> audio:audio`. Use `.chat()` for conversational audio and use `audio.data` for the generated bytes.
 ## Config Shape
 ```typescript
-type AxChatAudioConfig = {
-  input?: {
-    format?: 'wav' | 'mp3' | 'flac' | 'opus' | 'aac' | 'pcm16' | 'pcm' | 'ogg';
-    mimeType?: string;
-    sampleRate?: number;
-    channels?: number;
-  };
-  output?: {
-    enabled?: boolean;
-    voice?: string | { id: string };
-    format?: 'wav' | 'mp3' | 'flac' | 'opus' | 'aac' | 'pcm16' | 'pcm' | 'ogg';
-    sampleRate?: number;
-    channels?: number;
-    includeTranscript?: boolean;
+type AxAudioFormat =
+  | 'wav'
+  | 'mp3'
+  | 'flac'
+  | 'opus'
+  | 'aac'
+  | 'pcm16'
+  | 'pcm'
+  | 'ogg'
+  | 'raw'
+  | 'mulaw'
+  | 'ulaw'
+  | 'alaw';
+type AxSpeechConfig = {
+  transcribe?: {
+    model?: string;
+    language?: string;
+    prompt?: string;
   };
-  live?: {
-    turnTimeoutMs?: number;
-    enableAffectiveDialog?: boolean;
-    proactiveAudio?: boolean;
+  speak?: {
+    model?: string;
+    voice?: string;
+    format?: AxAudioFormat;
   };
+  fields?: Record<
+    string,
+    {
+      model?: string;
+      voice?: string;
+      format?: AxAudioFormat;
+    }
+  >;
 };
 ```
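The config shape above pairs a top-level `speak` block with per-field `fields` entries, and the signature example passes both. Below is a minimal sketch of how the two might be merged for one output field. It assumes a per-field value takes precedence over the shared `speak` default for the same key. The skill text does not state a precedence rule. `resolveSpeakOptions` is a hypothetical helper, not an Ax export, and the voice names are only illustrative strings.

```typescript
// Local copy of the AxSpeechConfig shape from the skill file (not imported from Ax).
type AxAudioFormat =
  | 'wav' | 'mp3' | 'flac' | 'opus' | 'aac' | 'pcm16' | 'pcm'
  | 'ogg' | 'raw' | 'mulaw' | 'ulaw' | 'alaw';

type SpeakOptions = { model?: string; voice?: string; format?: AxAudioFormat };

type SpeechConfig = {
  speak?: SpeakOptions;
  fields?: Record<string, SpeakOptions>;
};

// Hypothetical helper. Assumption: a per-field value wins over the shared
// `speak` default for the same key; unset per-field keys fall back to `speak`.
function resolveSpeakOptions(config: SpeechConfig, fieldName: string): SpeakOptions {
  const defaults = config.speak ?? {};
  const override = config.fields?.[fieldName] ?? {};
  return {
    model: override.model ?? defaults.model,
    voice: override.voice ?? defaults.voice,
    format: override.format ?? defaults.format,
  };
}

// Usage: the field entry changes the voice but inherits the mp3 format.
const config: SpeechConfig = {
  speak: { voice: 'alloy', format: 'mp3' },
  fields: { speech: { voice: 'echo' } },
};
console.log(resolveSpeakOptions(config, 'speech'));
```

The merge is written key by key rather than as an object spread. A spread would let an explicit `undefined` in a field entry erase a shared default.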
@@ -246,6 +364,10 @@ for await (const chunk of stream) {
 ## Structured Outputs
-Do not combine audio output with structured response formats. Audio chat may return a text transcript in `content`, but generated audio bytes live at `result.results[0].audio`.
+Use signature audio outputs for structured speech artifacts:
+```typescript
+const gen = ax('question:string -> answer:string, speech:audio');
+```
-For structured extraction from speech, use a text-only or transcription step first, then pass the transcript into `ax(...)` or `flow(...)`.
+Use `.chat()` audio when the response itself is a conversational audio turn. Do not combine `.chat()` audio output with provider-native structured response formats unless that provider explicitly supports the combination.
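The revised ax-audio skill describes how output `:audio` fields work. The model returns a plain-text script, and Ax synthesizes audio after structured output parsing. A field that already holds an artifact with `{ data }` or `{ id }` is left alone. Below is a minimal sketch of that replacement step, with the synthesizer injected as a function so the example runs without a provider. `synthesizeAudioFields`, `SynthesizeFn`, and the simplified `AudioArtifact` type are hypothetical stand-ins, not Ax's internal implementation.

```typescript
// Simplified local stand-in for AxChatAudioOutput; only the fields used here.
type AudioArtifact = {
  id?: string;
  data?: string;
  mimeType?: string;
  transcript?: string;
};

// Injected text-to-speech call (hypothetical). A real program would call a
// provider here, for example through ai.speak(...).
type SynthesizeFn = (script: string, fieldName: string) => Promise<AudioArtifact>;

// Per the skill text: a value carrying `data` or `id` is already audio.
function isAudioArtifact(value: unknown): value is AudioArtifact {
  return typeof value === 'object' && value !== null && ('data' in value || 'id' in value);
}

// Replaces the text script in each audio output field with a synthesized artifact.
async function synthesizeAudioFields(
  parsed: Record<string, unknown>,
  audioFields: readonly string[],
  synthesize: SynthesizeFn
): Promise<Record<string, unknown>> {
  const result: Record<string, unknown> = { ...parsed };
  for (const name of audioFields) {
    const value = result[name];
    if (isAudioArtifact(value)) continue; // already audio: leave it alone
    if (typeof value === 'string') {
      result[name] = await synthesize(value, name);
    }
  }
  return result;
}

// Fake synthesizer so the sketch runs offline; the data value is a placeholder.
const fakeSpeak: SynthesizeFn = async (script) => ({
  data: `placeholder-audio-for-${script.length}-chars`,
  mimeType: 'audio/mpeg',
  transcript: script,
});

synthesizeAudioFields(
  { summary: 'Retry with backoff.', speech: 'Retry failed calls with exponential backoff.' },
  ['speech'],
  fakeSpeak
).then((out) => console.log(out));
```

Injecting the synthesizer keeps the parse-then-synthesize ordering visible and testable. Non-audio fields such as `summary` pass through unchanged.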

package/skills/ax-flow.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: ax-flow
 description: This skill helps an LLM generate correct AxFlow workflow code using @ax-llm/ax. Use when the user asks about flow(), AxFlow, workflow orchestration, parallel execution, DAG workflows, conditional routing, map/reduce patterns, or multi-node AI pipelines.
-version: "21.0.12"
+version: "21.0.13"
 ---
 # AxFlow Codegen Rules (@ax-llm/ax)

package/skills/ax-gen.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: ax-gen
 description: This skill helps an LLM generate correct AxGen code using @ax-llm/ax. Use when the user asks about ax(), AxGen, generators, forward(), streamingForward(), assertions, field processors, step hooks, self-tuning, or structured outputs.
-version: "21.0.12"
+version: "21.0.13"
 ---
 # AxGen Codegen Rules (@ax-llm/ax)

package/skills/ax-gepa.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: ax-gepa
 description: This skill helps an LLM generate correct AxGEPA optimization code using @ax-llm/ax. Use when the user asks about AxGEPA, GEPA, Pareto optimization, multi-objective prompt tuning, reflective prompt evolution, validationExamples, maxMetricCalls, or optimizing a generator, flow, or agent tree.
-version: "21.0.12"
+version: "21.0.13"
 ---
 # AxGEPA Codegen Rules (@ax-llm/ax)

package/skills/ax-learn.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: ax-learn
 description: This skill helps an LLM generate correct AxLearn code using @ax-llm/ax. Use when the user asks about self-improving agents, trace-backed learning, feedback-aware updates, or AxLearn modes.
-version: "21.0.12"
+version: "21.0.13"
 ---
 # AxLearn Codegen Rules (@ax-llm/ax)

package/skills/ax-llm.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: ax-llm
 description: This skill helps with using the @ax-llm/ax TypeScript library for building LLM applications. Use when the user asks about ax(), ai(), f(), s(), agent(), flow(), AxGen, AxAgent, AxFlow, signatures, streaming, or mentions @ax-llm/ax.
-version: "21.0.12"
+version: "21.0.13"
 ---
 # Ax Library (@ax-llm/ax) Quick Reference

package/skills/ax-signature.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: ax-signature
 description: This skill helps an LLM generate correct DSPy signature code using @ax-llm/ax. Use when the user asks about signatures, s(), f(), field types, string syntax, fluent builder API, validation constraints, or type-safe inputs/outputs.
-version: "21.0.12"
+version: "21.0.13"
 ---
 # Ax Signature Reference
@@ -25,7 +25,7 @@ version: "21.0.12"
 | DateRange | `:dateRange` | `{ start: Date; end: Date }` | `travelDates:dateRange` |
 | DateTimeRange | `:datetimeRange` | `{ start: Date; end: Date }` | `meetingWindow:datetimeRange` |
 | Image | `:image` | `{mimeType, data}` | `photo:image` (input only) |
-| Audio | `:audio` | `{format?, data}` | `recording:audio` (input only) |
+| Audio | `:audio` | input: `AxAudioInput`; output: `AxChatAudioOutput` | `recording:audio`, `speech:audio` |
 | File | `:file` | `{mimeType, data}` | `document:file` (input only) |
 | URL | `:url` | `string` | `website:url` |
 | Code | `:code` | `string` | `pythonScript:code` |
@@ -256,9 +256,11 @@ Bad: `text`, `data`, `input`, `output`, `a`, `x`, `val` (too generic), `1field`
 ## Media Type Restrictions
-- Media types (image, audio, file) are **top-level input fields only**
-- Cannot be nested in objects
-- Cannot be output fields
+- Image and file fields are top-level input fields only.
+- Audio fields can be top-level inputs or single top-level outputs.
+- Audio output fields are scripted speech artifacts: the model returns plain text, then Ax synthesizes `AxChatAudioOutput`.
+- Media fields cannot be nested in objects.
+- Media arrays are supported for inputs only; output `audio[]` is not supported.
 ## Common Patterns
@@ -269,9 +271,12 @@ Bad: `text`, `data`, `input`, `output`, `a`, `x`, `val` (too generic), `1field`
 // Classification
 'email:string -> priority:class "urgent, normal, low"'
-// Multi-modal
+// Multi-modal input
 'imageData:image, question?:string -> description:string, objects:string[]'
+// Scripted speech output
+'question:string -> speech:audio, summary:string'
 // Data Extraction
 'invoiceText:string -> invoiceNumber:string, totalAmount:number, lineItems:json[]'
@@ -283,13 +288,13 @@ Bad: `text`, `data`, `input`, `output`, `a`, `x`, `val` (too generic), `1field`
 - Use `f()` fluent builder, NOT nested `f.array(f.string())` -- those are removed.
 - Field names must be descriptive (not generic like `text`, `data`, `input`).
-- Media types are input-only, top-level only.
+- Image/file media types are input-only, top-level only; audio may also be a single top-level output.
 - `.internal()` / `{ internal: true }` is output-only (for chain-of-thought reasoning).
 - `.cache()` / `{ cache: true }` is input-only (for prompt caching).
 - Validation errors trigger auto-retry with correction feedback.
 - `f.email()`, `f.url()`, `f.date()`, `f.datetime()` are shorthand for `f.string().email()` etc.; `f.dateRange()` and `f.datetimeRange()` return `{ start: Date; end: Date }`.
 - `z.enum()` maps to ax's `class` type — only valid on **output** fields.
-- For multimodal inputs (images, audio, files) use `f.image()` / `f.audio()` / `f.file()` — zod has no equivalent.
+- For multimodal inputs (images, audio, files) and scripted audio outputs, use `f.image()` / `f.audio()` / `f.file()` — zod has no equivalent.
 ## Examples
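The revised media restrictions in ax-signature.md can be read as a small set of checks over parsed signature fields:

- Image and file fields are input-only.
- Audio may also be a top-level output.
- Media fields cannot be nested in objects.
- Output `audio[]` is rejected.

Below is a minimal sketch of those checks over a hypothetical field descriptor (`FieldSpec`, `checkMediaRules`). Ax's real signature parser has its own representation and error messages. The sketch does not limit how many audio outputs one signature may have, because the diff does not spell that out.

```typescript
// Hypothetical field descriptor; Ax's parser uses its own internal representation.
type FieldSpec = {
  name: string;
  type: string; // e.g. 'string', 'image', 'audio', 'file'
  isOutput: boolean;
  isArray?: boolean;
  nested?: boolean; // true when the field appears inside an object type
};

const MEDIA_TYPES = new Set(['image', 'audio', 'file']);

// Returns one message per violated media rule (an empty array means the fields pass).
function checkMediaRules(fields: readonly FieldSpec[]): string[] {
  const errors: string[] = [];
  for (const field of fields) {
    if (!MEDIA_TYPES.has(field.type)) continue;
    if (field.nested) {
      errors.push(`${field.name}: media fields cannot be nested in objects`);
    }
    if (!field.isOutput) continue; // top-level inputs, including media arrays, are allowed
    if (field.type !== 'audio') {
      errors.push(`${field.name}: ${field.type} fields are input-only`);
    } else if (field.isArray) {
      errors.push(`${field.name}: output audio[] is not supported`);
    }
  }
  return errors;
}

// Usage: 'question:string -> speech:audio, summary:string' passes the checks.
console.log(
  checkMediaRules([
    { name: 'question', type: 'string', isOutput: false },
    { name: 'speech', type: 'audio', isOutput: true },
    { name: 'summary', type: 'string', isOutput: true },
  ])
);
```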