@ax-llm/ax 19.0.21 → 19.0.23

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,26 +1,56 @@
  ---
  name: ax-agent
- description: This skill helps an LLM generate correct AxAgent code using @ax-llm/ax. Use when the user asks about agent(), child agents, namespaced functions, discovery mode, shared fields, llmQuery(...), RLM code execution, or offline tuning with agent.optimize(...).
- version: "19.0.21"
+ description: This skill helps an LLM generate correct AxAgent code using @ax-llm/ax. Use when the user asks about agent(), child agents, namespaced functions, discovery mode, shared fields, llmQuery(...), RLM code execution, promptLevel, recursionOptions, or agent runtime behavior. For tuning and eval with agent.optimize(...), use ax-agent-optimize.
+ version: "19.0.23"
  ---

  # AxAgent Codegen Rules (@ax-llm/ax)

  Use this skill to generate `AxAgent` code. Prefer short, modern, copyable patterns. Do not write tutorial prose unless the user explicitly asks for explanation.

+ Your job is not just to write valid code. Your job is to choose the smallest correct `AxAgent` shape for the user's needs:
+
+ - If the user wants a normal tool-using assistant, keep the config minimal.
+ - If the user wants long-running code execution, use RLM features deliberately.
+ - If the user wants delegated subtasks, decide whether they need plain `llmQuery(...)` or recursive advanced mode.
+ - If the user wants observability, add only the specific hooks or debug options that support that need.
+ - If the user is unsure, choose conservative defaults and avoid exotic options.
+
  ## Use These Defaults

  - Use `agent(...)`, not `new AxAgent(...)`.
  - Prefer `fn(...)` for host-side function definitions instead of hand-writing JSON Schema objects.
  - Prefer namespaced functions such as `utils.search(...)` or `kb.find(...)`.
  - Assume the child-agent module is `agents` unless `agentIdentity.namespace` is set.
- - Use `agent.optimize(...)` when the user wants to tune a fully configured agent against task datasets.
  - If `functions.discovery` is `true`, discover callables from modules before using them.
  - In stdout-mode RLM, use one observable `console.log(...)` step per non-final actor turn.
  - For long RLM tasks, prefer `contextPolicy: { preset: 'adaptive' }` so older successful turns collapse into checkpoint summaries while live runtime state stays visible.
- - Default `actorOptions.promptLevel` to `'detailed'` and opt down to `'basic'` only when the user wants a shorter actor prompt.
+ - Prefer `contextPolicy: { preset: 'checkpointed' }` when you want debugging-friendly full replay first and only want summaries after prompt pressure becomes real.
+ - Default top-level `promptLevel` to `'detailed'` and opt down to `'basic'` only when the user wants a shorter root actor prompt.
+ - Prefer `actorModelPolicy` when the actor may need to upgrade under whole-prompt pressure or repeated error turns without also upgrading the responder.
  - Use `actorTurnCallback` when the user needs per-turn observability into generated code, raw runtime result, formatted output, or provider thoughts.

+ ## Decision Guide
+
+ Map user intent to agent shape before writing code:
+
+ - "Use tools and answer" -> plain `agent(...)` with local functions, no recursion, no extra observability.
+ - "Inspect large context with code" -> add `runtime`, `contextFields`, and usually `contextPolicy: { preset: 'adaptive' }`.
+ - "Delegate focused semantic subtasks" -> use `llmQuery(...)`; add `mode: 'advanced'` only when child tasks need their own runtime, tools, or discovery loop.
+ - "Need child agents with distinct responsibilities" -> use `agents.local`, and add `fields.shared` only when parent inputs truly need to flow into children.
+ - "Need tool discovery because names/schemas are not stable" -> use `functions.discovery: true` and generate discovery-first code.
+ - "Need a stronger actor only when the run gets noisy or large" -> use `actorModelPolicy` and keep the responder model separate.
+ - "Need debugging or traceability" -> start with `debug: true` or `actorTurnCallback`; do not add both unless the user clearly wants both prompt/runtime visibility and structured telemetry.
+
+ Choose options based on user needs, not feature completeness:
+
+ - Prefer `mode: 'simple'` unless recursive child agents materially improve the task.
+ - Prefer `promptLevel: 'detailed'` when reliability matters more than prompt size.
+ - Prefer `promptLevel: 'basic'` when the user wants a leaner prompt and the workflow is already well constrained.
+ - Prefer `recursionOptions.promptLevel: 'basic'` for narrow delegated children unless the child is also discovery-heavy or schema-uncertain.
+ - Prefer `maxSubAgentCalls` only when advanced recursion is enabled or the user needs explicit delegation limits.
+ - Prefer `contextPolicy.preset: 'adaptive'` for long RLM tasks, `checkpointed` when you want "full first, summarize later", `full` for debugging, and `lean` only under real token pressure.
+
  ## Mental Model

  Treat `AxAgent` as a long-running JavaScript REPL that the actor steers over multiple turns, not as a fresh script generator on every turn.
@@ -36,11 +66,13 @@ Use these meanings consistently when writing or explaining `contextPolicy.preset

  - `full`: Keep prior actions fully replayed. Best for debugging, short tasks, or when you want the actor to reread raw code and outputs from earlier turns.
  - `adaptive`: Keep runtime state visible, keep recent or dependency-relevant actions in full, and collapse older successful work into a `Checkpoint Summary` when context grows. This is the default recommendation for long multi-turn tasks.
+ - `checkpointed`: Keep full replay until the rendered actor prompt crosses the checkpoint threshold, then replace older successful history with a `Checkpoint Summary` while keeping recent actions and unresolved errors fully visible.
  - `lean`: Most aggressive compression. Keep `Live Runtime State`, checkpoint older successful work, and summarize replay-pruned successful turns instead of showing their full code blocks. Use when token pressure matters more than raw replay detail.

  Practical rule:

  - Start with `adaptive` for most long RLM tasks.
+ - Use `checkpointed` when you want conservative replay until there is actual pressure to summarize.
  - Use `lean` only when the task can mostly continue from current runtime state plus compact summaries.
  - Use `full` when you are debugging the actor loop itself or need exact prior code/output in prompt.

@@ -49,31 +81,38 @@ Important:
  - `contextPolicy` controls prompt replay and compression, not runtime persistence.
  - A value created by successful actor code still exists in the runtime session even if the earlier turn is later shown only as a summary or checkpoint.
  - Used discovery docs are replay artifacts too: `lean` hides old `listModuleFunctions(...)` / `getFunctionDefinitions(...)` output by default after the actor successfully uses the discovered callable, while `adaptive` keeps them unless you opt into pruning.
+ - `checkpointed` keeps used discovery docs by default and avoids destructive cleanup unless you explicitly opt into it.
  - Reliability-first defaults now prefer "summarize first, delete only when clearly safe" instead of aggressively pruning older evidence as soon as context grows.

  ## Choosing Presets, Prompt Level, And Model Size

- Treat these three knobs as a bundle:
+ Treat these knobs as a bundle:

  - `contextPolicy.preset` decides how much raw history the actor keeps seeing.
- - `actorOptions.promptLevel` decides how prescriptive the actor prompt is.
+ - Top-level `promptLevel` decides how prescriptive the root actor prompt is.
+ - `recursionOptions.promptLevel` overrides prompt detail for recursive `llmQuery(...)` child agents.
+ - `actorModelPolicy` decides when the actor switches to an override model without changing the responder.
  - Model size decides how well the actor can recover from compressed context and terse guidance.

  Recommended combinations:

  - Short task, debugging, or weaker/cheaper model: `preset: 'full'` with `promptLevel: 'detailed'`.
  - Long multi-turn task, general default, medium-to-strong model: `preset: 'adaptive'` with `promptLevel: 'detailed'`.
+ - Long task where you want raw replay until the log is actually large: `preset: 'checkpointed'` with `promptLevel: 'detailed'`.
  - Long task where the actor keeps making avoidable exploration mistakes: `preset: 'adaptive'` with `promptLevel: 'detailed'`.
  - Very long task under token pressure, stronger model only: `preset: 'lean'` with `promptLevel: 'basic'`.
  - Discovery-heavy or schema-uncertain work with a capable model: `preset: 'adaptive'` with `promptLevel: 'detailed'`.
+ - Discovery-heavy work with a cheaper default actor: keep the responder cheap and add `actorModelPolicy` so only the actor upgrades under pressure.

  Practical rule:

  - The leaner the replay policy, the stronger the model should usually be.
  - `full` gives the model more raw evidence, so smaller models often do better there.
  - `adaptive` is the default middle ground for real agent work.
+ - `checkpointed` is the conservative middle ground when you want full replay first and summarization only after a threshold.
  - `lean` should be reserved for models that can reason well from runtime state plus summaries instead of exact old code/output.
  - `detailed` is not automatically "better"; it is more controlling. Use it when the actor needs tighter exploration rhythm, not just because the task is hard.
+ - `actorModelPolicy` is usually better than globally upgrading the whole agent when the bottleneck is actor exploration rather than responder synthesis.

  Prompt-level guidance:

@@ -104,10 +143,10 @@ Use `promptLevel: 'basic'` when:
  - If a child agent needs parent inputs such as `audience`, use `fields.shared` or `fields.globallyShared`.
  - `llmQuery(...)` failures may come back as `[ERROR] ...`; do not assume success.
  - If `contextPolicy.state.summary` is on, rely on the `Live Runtime State` block for current variables instead of re-reading old action log code.
- - If `contextPolicy.preset` is `'adaptive'` or `'lean'`, assume older successful turns may be replaced by a `Checkpoint Summary` and that replay-pruned successful turns may appear as compact summaries instead of full code blocks.
+ - If `contextPolicy.preset` is `'adaptive'`, `'checkpointed'`, or `'lean'`, assume older successful turns may be replaced by a `Checkpoint Summary` and that replay-pruned successful turns may appear as compact summaries instead of full code blocks.
  - In public `forward()` and `streamingForward()` flows, `ask_clarification(...)` does not go through the responder; it throws `AxAgentClarificationError`.
  - When resuming after clarification, prefer `error.getState()` from the thrown `AxAgentClarificationError`, then call `agent.setState(savedState)` before the next `forward(...)`.
- - For offline tuning, prefer eval-safe tools or in-memory mocks because `agent.optimize(...)` will replay tasks many times.
+ - For offline tuning, hand off to the `ax-agent-optimize` skill and prefer eval-safe tools or in-memory mocks because `agent.optimize(...)` will replay tasks many times.

  ## Canonical Pattern

@@ -570,13 +609,18 @@ Rules:

  - Use `preset: 'full'` when the actor should keep seeing raw prior code and outputs with minimal compression.
  - Use `preset: 'adaptive'` when the task needs runtime state across many turns but older successful work should collapse into checkpoint summaries while important recent steps can still stay fully replayed.
+ - Use `preset: 'checkpointed'` when you want full replay first, then only older successful history checkpointed after the rendered actor prompt crosses `checkpoints.triggerChars`.
  - Use `preset: 'lean'` when you want more aggressive compression and can rely mostly on current runtime state plus checkpoint summaries and compact action summaries.
  - `adaptive` now keeps used discovery docs by default and uses slightly richer live-state/checkpoint settings than `lean`; it should be the first choice unless you have a strong reason to prefer `full` or `lean`.
+ - `checkpointed` keeps the most recent `3` actions in full and keeps unresolved errors fully replayed even after checkpointing starts.
  - Use `state.summary` to inject a compact `Live Runtime State` block into the actor prompt. The block is structured and provenance-aware: variables are rendered with compact type/size/preview metadata, and when Ax can infer it, a short source suffix like `from t3 via db.search` is included. Combine `maxEntries` with `maxChars` so large runtime objects do not dominate the prompt.
- - Use `state.inspect` with `inspectThresholdChars` so the actor is reminded to call `inspect_runtime()` when replayed action history starts getting large.
+ - Use `state.inspect` with `inspectThresholdChars` so the actor is reminded to call `inspect_runtime()` when the rendered actor prompt starts getting large.
  - `adaptive` keeps used discovery docs by default; set `contextPolicy.pruneUsedDocs: true` only when you want more aggressive cleanup.
+ - `checkpointed` keeps used discovery docs by default; set `contextPolicy.pruneUsedDocs: true` only when you want the same cleanup there.
  - `lean` hides used discovery docs by default; set `contextPolicy.pruneUsedDocs: false` if you want to keep replaying them.
  - `full` keeps used discovery docs by default; set `contextPolicy.pruneUsedDocs: true` if you want the same cleanup there.
+ - `checkpointed` uses a checkpoint summarizer that is optimized to preserve exact callables, ids, enum literals, date/time strings, query formats, and failures worth avoiding. Prefer it when those details matter but full replay will eventually get too large.
+ - Lower `checkpoints.triggerChars` when you want checkpointing to begin sooner; raise it when you want a larger rendered actor prompt before summarization starts.
  - Use `summarizerOptions` to tune the internal checkpoint-summary AxGen program.
  - If you configure `expert.tombstones`, treat the object form as options for the internal tombstone-summary AxGen program.
  - Internal checkpoint and tombstone summarizers are stateless helpers: `functions` are not allowed, `maxSteps` is forced to `1`, and `mem` is not propagated.
@@ -654,15 +698,83 @@ const supportAgent = agent('query:string -> answer:string', {
  });
  ```

+ ## Option Layout
+
+ Use these top-level controls consistently:
+
+ - `mode`: controls whether `llmQuery(...)` stays simple or delegates to recursive child agents in advanced mode
+ - `promptLevel`: controls the root actor prompt guidance
+ - `recursionOptions.maxDepth`: limits recursive `llmQuery(...)` depth
+ - `recursionOptions.promptLevel`: overrides prompt guidance for recursive `llmQuery(...)` child agents
+ - `maxSubAgentCalls`: shared delegated-call budget across the whole run, including recursive children
+ - `actorOptions`: actor-only forward options such as `description`, `model`, `modelConfig`, `thinkingTokenBudget`, and `showThoughts`
+ - `actorModelPolicy`: actor-only model override rules based on full rendered prompt size or consecutive error turns
+ - `responderOptions`: responder-only forward options
+ - `judgeOptions`: built-in judge options for `agent.optimize(...)`; for tuning workflows use the `ax-agent-optimize` skill
+
+ Canonical shape:
+
+ ```typescript
+ const researchAgent = agent('query:string -> answer:string', {
+ contextFields: ['query'],
+ runtime,
+ mode: 'advanced',
+ promptLevel: 'detailed',
+ recursionOptions: {
+ maxDepth: 2,
+ promptLevel: 'basic',
+ },
+ contextPolicy: {
+ preset: 'checkpointed',
+ },
+ actorOptions: {
+ description: 'Use tools first and keep JS steps small.',
+ model: 'gpt-5.4-mini',
+ },
+ actorModelPolicy: [
+ {
+ model: 'gpt-5.4',
+ abovePromptChars: 16_000,
+ aboveErrorTurns: 2,
+ },
+ ],
+ responderOptions: {
+ model: 'gpt-5.4-mini',
+ },
+ });
+ ```
+
+ Semantics:
+
+ - Top-level `promptLevel` applies to the root actor.
+ - `recursionOptions.promptLevel` applies only to recursive `llmQuery(...)` child agents in advanced mode.
+ - If `recursionOptions.promptLevel` is omitted, recursive children inherit the root top-level `promptLevel`.
+ - `mode` stays top-level; there is no `recursionOptions.mode`.
+ - The current merged actor model stays the default base model. `actorModelPolicy` only overrides it when a rule matches.
+ - `actorModelPolicy` only switches the actor model. It does not change `responderOptions.model`.
+ - Recursive child agents can inherit `actorModelPolicy`; use a child override only when that child needs different routing behavior.
+ - `actorModelPolicy` entries are ordered from weaker to stronger. If multiple rules match, the last matching entry wins.
+ - If one entry defines both `abovePromptChars` and `aboveErrorTurns`, it matches when either threshold is crossed.
+
+ When choosing these options for a user:
+
+ - Do not add `mode: 'advanced'` just because recursion exists as a feature. Add it only when delegated children need their own tool/discovery/runtime loop.
+ - Do not add `recursionOptions` at all if the user does not need recursive delegation.
+ - Do not add `judgeOptions` in normal agent examples; reserve that for optimize/eval workflows.
+ - Keep `actorOptions` focused on actor-only forward concerns such as `description`, `model`, `modelConfig`, `thinkingTokenBudget`, and `showThoughts`.
+ - Use `actorModelPolicy` when the actor is the bottleneck and you want the responder to stay fixed.
+
  ## Actor Prompt Controls

- Use `actorOptions` for actor-only model/prompt tuning and `responderOptions` for responder-only tuning.
+ Use top-level `promptLevel` for root actor guidance, `recursionOptions.promptLevel` for recursive child guidance, `actorOptions` for actor-only forward options, and `responderOptions` for responder-only tuning.

  Key fields:

+ - `promptLevel`: choose `'basic'` or `'detailed'` guidance for the root actor template
+ - `recursionOptions.promptLevel`: choose `'basic'` or `'detailed'` guidance for recursive child agents
  - `actorOptions.description`: append extra actor-specific instructions without changing the responder prompt
- - `actorOptions.promptLevel`: choose `'basic'` or `'detailed'` guidance for the actor template
  - `actorOptions.model` / `responderOptions.model`: split model choice across actor and responder when needed
+ - `actorModelPolicy`: auto-switch only the actor when the rendered actor prompt is large or the run is on a consecutive error streak

  Good split-model pattern:

@@ -671,9 +783,9 @@ const researchAgent = agent('query:string -> answer:string', {
  contextFields: ['query'],
  runtime,
  contextPolicy: { preset: 'adaptive' },
+ promptLevel: 'detailed',
  actorOptions: {
  model: 'gpt-5.4',
- promptLevel: 'detailed',
  },
  responderOptions: {
  model: 'gpt-5.4-mini',
@@ -686,6 +798,9 @@ Model guidance:
  - Put the stronger model on the actor when the task depends on multi-turn exploration, discovery, runtime state reuse, or compressed replay.
  - Put the stronger model on the responder only when the hard part is final synthesis/formatting rather than exploration.
  - For cost-sensitive setups, a common pattern is stronger actor + cheaper responder, not the other way around.
+ - Prefer `actorModelPolicy` over globally upgrading the whole agent when the actor only needs help after context grows or the run starts thrashing.
+ - `actorModelPolicy` uses full rendered actor prompt chars, not raw `actionLog.length`. That prompt includes the actor definition, user inputs, context metadata, replayed actions, live runtime state, delegated context summaries, and checkpoint summaries.
+ - Pair `contextPolicy: { preset: 'checkpointed' }` with `actorModelPolicy` when you want "full first, then summarize and upgrade the actor only if needed."

  Invalid pattern:

@@ -762,111 +877,16 @@ Rules:
  - `agents.globallyShared` and `functions.globallyShared` propagate to all descendants.
  - Use `excluded` when a child should not receive a propagated field, agent, or function.

- ## Offline Tuning With `agent.optimize(...)`
-
- Use `agent.optimize(...)` when the user already has a configured `AxAgent` and wants to tune it against focused tasks such as emailing, scheduling, or office-assistant workflows.
-
- Canonical pattern:
+ ## Tuning Hand-off

- ```typescript
- import {
- AxAIGoogleGeminiModel,
- AxJSRuntime,
- AxOptimizedProgramImpl,
- axDefaultOptimizerLogger,
- agent,
- ai,
- f,
- fn,
- } from '@ax-llm/ax';
+ When the user wants `agent.optimize(...)`, judge configuration, eval datasets, saved optimization artifacts, or recursive optimization guidance, use the `ax-agent-optimize` skill.

- const tools = [
- fn('sendEmail')
- .namespace('email')
- .description('Send an email message')
- .arg('to', f.string('Recipient email address'))
- .arg('body', f.string('Email body text'))
- .returns(
- f.object({
- sent: f.boolean('Whether the email was sent'),
- to: f.string('Recipient email address'),
- })
- )
- .handler(async ({ to }) => ({ sent: true, to }))
- .build(),
- ];
-
- const studentAI = ai({
- name: 'google-gemini',
- apiKey: process.env.GOOGLE_APIKEY!,
- config: { model: AxAIGoogleGeminiModel.Gemini25FlashLite, temperature: 0.2 },
- });
+ Keep this skill focused on building and running agents. For tuning work:

- const judgeAI = ai({
- name: 'google-gemini',
- apiKey: process.env.GOOGLE_APIKEY!,
- config: { model: AxAIGoogleGeminiModel.Gemini3Pro, temperature: 1.0 },
- });
-
- const assistant = agent('query:string -> answer:string', {
- ai: studentAI,
- judgeAI,
- judgeOptions: {
- description: 'Prefer correct tool use over polished wording.',
- model: 'judge-model',
- },
- contextFields: [],
- runtime: new AxJSRuntime(),
- functions: { local: tools },
- contextPolicy: { preset: 'adaptive' },
- });
-
- const tasks = [
- {
- input: { query: 'Send an email to Jim saying good morning.' },
- criteria: 'Use the email tool and send the message to Jim.',
- expectedActions: ['email.sendEmail'],
- },
- ];
-
- const result = await assistant.optimize(tasks, {
- target: 'actor',
- maxMetricCalls: 12,
- verbose: true,
- optimizerLogger: axDefaultOptimizerLogger,
- onProgress: (progress) => {
- console.log(
- `round ${progress.round}/${progress.totalRounds} current=${progress.currentScore} best=${progress.bestScore}`
- );
- },
- });
-
- const saved = JSON.stringify(result.optimizedProgram, null, 2);
- const restored = new AxOptimizedProgramImpl(JSON.parse(saved));
- assistant.applyOptimization(restored);
- ```
-
- Rules:
-
- - Pass already-loaded tasks. Do not invent a benchmark loader unless the user asks for one.
- - Default optimize target is `root.actor`; use `target: 'responder'` or explicit program IDs only when the user clearly wants that.
- - Prefer the built-in judge path. Use `judgeAI` plus `judgeOptions` instead of forcing the user to author a metric for open-ended assistant tasks.
- - `judgeOptions` mirrors normal forward options and supports extra judge guidance through `description`.
- - The built-in judge scores from the full agent run, not just the final reply. It can see the completion type, clarification payload when present, final output when present, action log, normalized function calls, tool errors, and turn count.
- - `agent.optimize(...)` runs each evaluation rollout from a clean continuation state. Saved runtime state from `getState()` / `setState(...)` is not used during evaluation rollouts, and optimization does not overwrite the caller's existing saved state.
- - During optimize/eval, `ask_clarification(...)` is treated as a scored evaluation outcome instead of going through the responder. Custom metrics and the built-in judge should branch on `prediction.completionType`.
- - For clarification outcomes in custom metrics, expect `prediction.completionType === 'ask_clarification'`, `prediction.clarification` to be populated, and `prediction.output` to be absent.
- - For final outcomes in custom metrics, expect `prediction.completionType === 'final'` and `prediction.output` to be populated.
- - Use `expectedActions` and `forbiddenActions` in tasks when tool correctness matters.
- - Use `verbose`, `optimizerLogger`, and `onProgress` when the user wants live optimization status.
- - Treat `debugOptimizer` as an advanced override that forces logging even when normal verbosity is off.
- - If the user provides a custom `metric`, that overrides the built-in judge path.
- - `target: 'responder'` still works, but clarification-heavy tasks are low-signal for responder optimization because clarification rollouts do not invoke the responder.
- - Save `result.optimizedProgram` and later restore it with `new AxOptimizedProgramImpl(...)` plus `agent.applyOptimization(...)`.
- - For real examples, use fresh eval-safe tool state for the baseline run, the optimization run, and the restored replay so side effects do not leak across phases.
- - If the user wants to demonstrate improvement, run a held-out task before optimization, save and reload the artifact, then replay the same task on a fresh restored agent and print the concrete side effects.
- - A good office-assistant optimization example should push the weaker model into multi-step tool use it may barely handle zero-shot, such as relative-date scheduling plus correct recipient selection and “draft only” constraints.
- - Remind the user that `agent.optimize(...)` replays tasks many times, so real side-effecting tools should be replaced with eval-safe mocks or in-memory state during tuning.
+ - use eval-safe tools or in-memory mocks
+ - treat `judgeOptions` as part of the optimize workflow
+ - choose a deterministic `metric` when scoring is objective; use the built-in judge only when run quality needs qualitative review
+ - keep runtime authoring guidance here and optimization guidance in `ax-agent-optimize`

  ## `llmQuery(...)` Rules

@@ -1034,17 +1054,48 @@ agentIdentity?: {
  }) => void | Promise<void>;
  inputUpdateCallback?: (currentInputs: Record<string, unknown>) => Promise<Record<string, unknown> | undefined> | Record<string, unknown> | undefined;
  mode?: 'simple' | 'advanced';
+ promptLevel?: 'detailed' | 'basic';
+ actorModelPolicy?: readonly [
+ | {
+ model: string;
+ abovePromptChars: number;
+ aboveErrorTurns?: number;
+ }
+ | {
+ model: string;
+ abovePromptChars?: number;
+ aboveErrorTurns: number;
+ },
+ ...Array<
+ | {
+ model: string;
+ abovePromptChars: number;
+ aboveErrorTurns?: number;
+ }
+ | {
+ model: string;
+ abovePromptChars?: number;
+ aboveErrorTurns: number;
+ }
+ >,
+ ];
  recursionOptions?: Partial<Omit<AxProgramForwardOptions, 'functions'>> & {
  maxDepth?: number;
  promptLevel?: 'detailed' | 'basic';
  };
- actorOptions?: Partial<AxProgramForwardOptions & { description?: string; promptLevel?: 'detailed' | 'basic' }>;
+ actorOptions?: Partial<AxProgramForwardOptions & { description?: string }>;
  responderOptions?: Partial<AxProgramForwardOptions & { description?: string }>;
  judgeOptions?: Partial<AxJudgeOptions>;
  }
  ```

  - `actorTurnCallback` fires for the root agent and for recursive child agents that run actor turns.
+ - `promptLevel` controls the root actor prompt.
+ - `actorModelPolicy` applies to the actor loop and can be inherited by recursive child agents unless you override it there.
+ - `abovePromptChars` is measured from the full rendered actor prompt, not just replayed action log text.
+ - Consecutive error turns reset after a successful non-error turn and when checkpoint summarization refreshes to a new fingerprint.
+ - `recursionOptions.promptLevel` controls recursive child prompt guidance and falls back to the top-level `promptLevel`.
+ - `maxSubAgentCalls` is a shared delegated-call budget across the entire run.

  ## Examples

@@ -1061,7 +1112,6 @@ Fetch these for full working code:
  - [RLM Adaptive Replay](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm-adaptive-replay.ts) — adaptive replay
  - [RLM Live Runtime State](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm-live-runtime-state.ts) — structured runtime-state rendering
  - [RLM Clarification Resume](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm-clarification-resume.ts) — clarification exception plus `getState()` / `setState(...)`
- - [RLM Agent Optimize](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm-agent-optimize.ts) — Gemini office-assistant tuning with save/load
  - [Customer Support](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/customer-support.ts) — classification agent
  - [Abort Patterns](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/abort-patterns.ts) — abort handling

@@ -1073,4 +1123,4 @@ Fetch these for full working code:
  - Do not write a full multi-step RLM actor program in one turn.
  - Do not combine `console.log(...)` with `final(...)`.
  - Do not forget `fields.shared` when child agents depend on parent inputs.
- - Do not run `agent.optimize(...)` against production tools with real side effects unless the user explicitly wants that.
+ - Do not put `promptLevel` under `actorOptions`; use top-level `promptLevel` for the root actor and `recursionOptions.promptLevel` for recursive child agents.
package/skills/ax-ai.md CHANGED
@@ -1,7 +1,7 @@
  ---
  name: ax-ai
  description: This skill helps an LLM generate correct AI provider setup and configuration code using @ax-llm/ax. Use when the user asks about ai(), providers, models, presets, embeddings, extended thinking, context caching, or mentions OpenAI/Anthropic/Google/Azure/Groq/DeepSeek/Mistral/Cohere/Together/Ollama/HuggingFace/Reka/OpenRouter with @ax-llm/ax.
- version: "19.0.21"
+ version: "19.0.23"
  ---

  # AI Provider Codegen Rules (@ax-llm/ax)
package/skills/ax-flow.md CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-flow
 description: This skill helps an LLM generate correct AxFlow workflow code using @ax-llm/ax. Use when the user asks about flow(), AxFlow, workflow orchestration, parallel execution, DAG workflows, conditional routing, map/reduce patterns, or multi-node AI pipelines.
- version: "19.0.21"
+ version: "19.0.23"
 ---
 
  # AxFlow Codegen Rules (@ax-llm/ax)
package/skills/ax-gen.md CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-gen
 description: This skill helps an LLM generate correct AxGen code using @ax-llm/ax. Use when the user asks about ax(), AxGen, generators, forward(), streamingForward(), assertions, field processors, step hooks, self-tuning, or structured outputs.
- version: "19.0.21"
+ version: "19.0.23"
 ---
 
  # AxGen Codegen Rules (@ax-llm/ax)
package/skills/ax-gepa.md CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-gepa
 description: This skill helps an LLM generate correct AxGEPA optimization code using @ax-llm/ax. Use when the user asks about AxGEPA, GEPA, Pareto optimization, multi-objective prompt tuning, reflective prompt evolution, validationExamples, maxMetricCalls, or optimizing a generator, flow, or agent tree.
- version: "19.0.21"
+ version: "19.0.23"
 ---
 
  # AxGEPA Codegen Rules (@ax-llm/ax)
@@ -24,12 +24,28 @@ Use this skill to generate direct `AxGEPA` optimization code. Prefer short, mode
 - `AxGEPA.compile()` works for a single generator and for tree-aware roots such as flows or agents with registered instruction-bearing descendants.
 - There is no separate flow-only GEPA optimizer. Use `AxGEPA` for flows too.
 - The metric may return either `number` or `Record<string, number>`.
- - Keep metrics deterministic and cheap. Avoid extra LLM calls inside the metric unless the user explicitly wants judge-based evaluation.
+ - Keep metrics deterministic and cheap by default.
+ - Avoid extra LLM calls inside the metric unless the user explicitly wants judge-based evaluation.
+ - If the user needs LLM-as-judge scoring for a non-agent GEPA run, prefer a plain typed `AxGen` evaluator instead of writing a custom judge abstraction.
 - `maxMetricCalls` must be large enough to cover the initial validation pass over `validationExamples`.
 - GEPA optimizes instructions. If a tree has no instruction-bearing nodes, optimization will fail.
 - Use held-out validation examples for selection. Do not reuse the training set as `validationExamples`.
 - `result.optimizedProgram` is the easy-to-apply best candidate. `result.paretoFront` is the full trade-off set for multi-objective runs.
 
+ ## Metric Selection
+
+ Choose the evaluation path deliberately:
+
+ - Prefer a deterministic metric when correctness can be read directly from `prediction` and `example`.
+ - Prefer a deterministic metric when cost, latency, recursion depth, or tool count matters.
+ - Use a plain typed `AxGen` evaluator only when the task is genuinely qualitative and hard to score exactly.
+ - For `agent.optimize(...)`, prefer the built-in judge path instead of manually wrapping a judge metric.
+
+ Rule of thumb:
+
+ - `AxGEPA` on `AxGen` or flow: use a metric first, optionally a plain typed `AxGen` evaluator if needed.
+ - `agent.optimize(...)`: use custom `metric` for crisp scoring, otherwise `judgeAI` plus `judgeOptions`.
+
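+ To make the deterministic branch concrete, a minimal exact-match metric can be sketched as follows. The `{ prediction, example }` argument shape follows the wording above; the `answer` field name is a hypothetical example, not a required name.

```typescript
// Deterministic scalar metric sketch for AxGEPA-style tuning:
// exact match on a hypothetical `answer` field, 1 for a hit and 0 otherwise.
type Row = { answer: string };

const metric = ({ prediction, example }: { prediction: Row; example: Row }): number =>
  prediction.answer.trim().toLowerCase() === example.answer.trim().toLowerCase() ? 1 : 0;

console.log(metric({ prediction: { answer: ' Paris ' }, example: { answer: 'paris' } }));
```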
  ## Canonical Scalar Pattern
 
  ```typescript
package/skills/ax-learn.md CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-learn
 description: This skill helps an LLM generate correct AxLearn code using @ax-llm/ax. Use when the user asks about self-improving agents, trace-backed learning, feedback-aware updates, or AxLearn modes.
- version: "19.0.21"
+ version: "19.0.23"
 ---
 
  # AxLearn Codegen Rules (@ax-llm/ax)
package/skills/ax-llm.md CHANGED
@@ -1,14 +1,14 @@
 ---
 name: ax
 description: This skill helps with using the @ax-llm/ax TypeScript library for building LLM applications. Use when the user asks about ax(), ai(), f(), s(), agent(), flow(), AxGen, AxAgent, AxFlow, signatures, streaming, or mentions @ax-llm/ax.
- version: "19.0.21"
+ version: "19.0.23"
 ---
 
 # Ax Library (@ax-llm/ax) Quick Reference
 
 Ax is a TypeScript library for building LLM-powered applications with type-safe signatures, streaming support, and multi-provider compatibility.
 
- > **Detailed skills available:** ax-ai (providers), ax-signature (signatures/types), ax-gen (generators), ax-agent (agents), ax-flow (workflows), ax-gepa (Pareto optimization), ax-learn (self-improving agents).
+ > **Detailed skills available:** ax-ai (providers), ax-signature (signatures/types), ax-gen (generators), ax-agent (agents/runtime), ax-agent-optimize (agent tuning/eval), ax-flow (workflows), ax-gepa (Pareto optimization), ax-learn (self-improving agents).
 
 ## Imports & Factories
 
package/skills/ax-signature.md CHANGED
@@ -1,7 +1,7 @@
 ---
 name: ax-signature
 description: This skill helps an LLM generate correct DSPy signature code using @ax-llm/ax. Use when the user asks about signatures, s(), f(), field types, string syntax, fluent builder API, validation constraints, or type-safe inputs/outputs.
- version: "19.0.21"
+ version: "19.0.23"
 ---
 
  # Ax Signature Reference