@ax-llm/ax 19.0.15 → 19.0.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@ax-llm/ax",
3
- "version": "19.0.15",
3
+ "version": "19.0.17",
4
4
  "type": "module",
5
5
  "description": "The best library to work with LLMs",
6
6
  "repository": {
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  name: ax-agent
3
3
  description: This skill helps an LLM generate correct AxAgent code using @ax-llm/ax. Use when the user asks about agent(), child agents, namespaced functions, discovery mode, shared fields, llmQuery(...), or RLM code execution.
4
- version: "19.0.15"
4
+ version: "19.0.17"
5
5
  ---
6
6
 
7
7
  # AxAgent Codegen Rules (@ax-llm/ax)
@@ -11,11 +11,42 @@ Use this skill to generate `AxAgent` code. Prefer short, modern, copyable patter
11
11
  ## Use These Defaults
12
12
 
13
13
  - Use `agent(...)`, not `new AxAgent(...)`.
14
+ - Prefer `fn(...)` for host-side function definitions instead of hand-writing JSON Schema objects.
14
15
  - Prefer namespaced functions such as `utils.search(...)` or `kb.find(...)`.
15
16
  - Assume the child-agent module is `agents` unless `agentIdentity.namespace` is set.
16
17
  - If `functions.discovery` is `true`, discover callables from modules before using them.
17
18
  - In stdout-mode RLM, use one observable `console.log(...)` step per non-final actor turn.
18
- - For long RLM tasks, prefer `contextManagement.actionReplay: 'adaptive'` plus `stateSummary` so prior exploratory turns are summarized and live runtime state stays visible.
19
+ - For long RLM tasks, prefer `contextPolicy: { preset: 'adaptive' }` so older successful turns collapse into checkpoint summaries while live runtime state stays visible.
20
+
21
+ ## Mental Model
22
+
23
+ Treat `AxAgent` as a long-running JavaScript REPL that the actor steers over multiple turns, not as a fresh script generator on every turn.
24
+
25
+ - Successful code leaves variables, functions, imports, and computed values available in the runtime session.
26
+ - The actor should continue from existing runtime state instead of recreating prior work.
27
+ - `Action Log`, `Live Runtime State`, and checkpoint summaries only control what the actor can see again in the prompt.
28
+ - Rebuild state only after an explicit runtime restart notice or when you intentionally need to overwrite a value.
29
+
30
+ ## Context Policy Presets
31
+
32
+ Use these meanings consistently when writing or explaining `contextPolicy.preset`:
33
+
34
+ - `full`: Keep prior actions fully replayed. Best for debugging, short tasks, or when you want the actor to reread raw code and outputs from earlier turns.
35
+ - `adaptive`: Keep runtime state visible, keep recent or dependency-relevant actions in full, and collapse older successful work into a `Checkpoint Summary` when context grows. This is the default recommendation for long multi-turn tasks.
36
+ - `lean`: Most aggressive compression. Keep `Live Runtime State`, checkpoint older successful work, and summarize replay-pruned successful turns instead of showing their full code blocks. Use when token pressure matters more than raw replay detail.
37
+
38
+ Practical rules:
39
+
40
+ - Start with `adaptive` for most long RLM tasks.
41
+ - Use `lean` only when the task can mostly continue from current runtime state plus compact summaries.
42
+ - Use `full` when you are debugging the actor loop itself or need the exact prior code and output in the prompt.
43
+
44
+ Important:
45
+
46
+ - `contextPolicy` controls prompt replay and compression, not runtime persistence.
47
+ - A value created by successful actor code still exists in the runtime session even if the earlier turn is later shown only as a summary or checkpoint.
48
+ - Used discovery docs are replay artifacts too: `adaptive` and `lean` can hide old `listModuleFunctions(...)` / `getFunctionDefinitions(...)` output after the actor successfully uses the discovered callable.
49
+ - Reliability-first defaults now prefer "summarize first, delete only when clearly safe" instead of aggressively pruning older evidence as soon as context grows.
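As a quick illustration, here is a minimal sketch of picking a preset per task profile. The signature string, `contextFields`, and the omitted `functions` config are placeholders; the full adaptive configuration appears later in this skill.

```typescript
import { AxJSRuntime, agent } from '@ax-llm/ax';

// Debugging the actor loop: keep raw prior code and outputs fully replayed.
const debugAgent = agent('query:string -> answer:string', {
  contextFields: ['query'],
  runtime: new AxJSRuntime(),
  contextPolicy: { preset: 'full' },
});

// Long multi-turn task: collapse older successful work into checkpoint summaries.
const longTaskAgent = agent('query:string -> answer:string', {
  contextFields: ['query'],
  runtime: new AxJSRuntime(),
  contextPolicy: { preset: 'adaptive' },
});
```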
19
50
 
20
51
  ## Critical Rules
21
52
 
@@ -24,10 +55,11 @@ Use this skill to generate `AxAgent` code. Prefer short, modern, copyable patter
24
55
  - If `functions.discovery` is `true`, call `listModuleFunctions(...)` first, then `getFunctionDefinitions(...)`, then call only discovered functions.
25
56
  - In stdout-mode RLM, non-final turns must emit exactly one `console.log(...)` and stop immediately after it.
26
57
  - Never combine `console.log(...)` with `final(...)` or `ask_clarification(...)` in the same actor turn.
58
+ - If a host-side `AxAgentFunction` needs to end the current actor turn, use `extra.protocol.final(...)` or `extra.protocol.askClarification(...)`.
27
59
  - If a child agent needs parent inputs such as `audience`, use `fields.shared` or `fields.globallyShared`.
28
60
  - `llmQuery(...)` failures may come back as `[ERROR] ...`; do not assume success.
29
- - If `contextManagement.stateSummary.enabled` is on, rely on the `Live Runtime State` block for current variables instead of re-reading old action log code.
30
- - If `contextManagement.actionReplay` is `'adaptive'` or `'minimal'`, assume older successful turns may be summarized or omitted.
61
+ - If `contextPolicy.state.summary` is on, rely on the `Live Runtime State` block for current variables instead of re-reading old action log code.
62
+ - If `contextPolicy.preset` is `'adaptive'` or `'lean'`, assume older successful turns may be replaced by a `Checkpoint Summary` and that replay-pruned successful turns may appear as compact summaries instead of full code blocks.
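For the stdout-mode turn rules above, here is a minimal sketch of a non-final turn followed by a final turn. The `utils.search(...)` call and its `query` argument are placeholders for any namespaced function, and passing a plain string to `final(...)` is an assumption about the output contract.

```javascript
// Non-final turn: exactly one observable console.log(...), then stop.
const results = await utils.search({ query: 'refund policy' });
console.log(results.length);
```

```javascript
// Final turn: call final(...) without console.log(...), reusing runtime state from the prior turn.
final(`Found ${results.length} matching policies.`);
```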
31
63
 
32
64
  ## Canonical Pattern
33
65
 
@@ -119,43 +151,34 @@ Rules:
119
151
  ## Tool Functions And Namespaces
120
152
 
121
153
  ```typescript
122
- import type { AxAgentFunction } from '@ax-llm/ax';
154
+ import { f, fn } from '@ax-llm/ax';
155
+
156
+ const tools = [
157
+ fn('findSnippets')
158
+ .description('Find handbook snippets by topic')
159
+ .namespace('kb')
160
+ .arg('topic', f.string('Topic keyword'))
161
+ .returns(f.string('Matching snippet').array())
162
+ .example({
163
+ title: 'Find severity guidance',
164
+ code: 'await kb.findSnippets({ topic: "severity" });',
165
+ })
166
+ .handler(async ({ topic }) => [])
167
+ .build(),
168
+ ];
123
169
 
124
- const tools: AxAgentFunction[] = [
125
- {
126
- name: 'findSnippets',
127
- namespace: 'kb',
128
- description: 'Find handbook snippets by topic',
129
- parameters: {
130
- type: 'object',
131
- properties: {
132
- topic: { type: 'string', description: 'Topic keyword' },
133
- },
134
- required: ['topic'],
135
- },
136
- returns: {
137
- type: 'array',
138
- items: { type: 'string' },
139
- },
140
- examples: [
170
+ const analyst = agent('query:string -> answer:string', {
171
+ functions: {
172
+ local: [
141
173
  {
142
- title: 'Find severity guidance',
143
- code: 'await kb.findSnippets({ topic: "severity" });',
174
+ namespace: 'kb',
175
+ title: 'Knowledge Base',
176
+ selectionCriteria: 'Use for handbook and documentation lookups.',
177
+ description: 'Handbook and documentation search helpers.',
178
+ functions: tools.map(({ namespace: _namespace, ...tool }) => tool),
144
179
  },
145
180
  ],
146
- func: async ({ topic }) => [],
147
181
  },
148
- ];
149
-
150
- const analyst = agent('query:string -> answer:string', {
151
- namespaces: [
152
- {
153
- name: 'kb',
154
- title: 'Knowledge Base',
155
- description: 'Handbook and documentation search helpers.',
156
- },
157
- ],
158
- functions: { local: tools },
159
182
  contextFields: [],
160
183
  });
161
184
  ```
@@ -172,6 +195,44 @@ Rules:
172
195
  - Default function namespace is `utils` when no namespace is set.
173
196
  - Use the runtime call shape `await <namespace>.<name>({...})`.
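For example, with the `kb.findSnippets` tool defined above, a single non-final actor turn using this call shape might look like the following sketch (the topic value is illustrative):

```javascript
// Runtime call shape: await <namespace>.<name>({ ... }), one observable step per turn.
const snippets = await kb.findSnippets({ topic: 'severity' });
console.log(snippets);
```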
174
197
 
198
+ ## Host-Side Completion From Functions
199
+
200
+ Use this pattern when the actor should call a namespaced function, but the host-side function implementation should decide to end the turn:
201
+
202
+ ```typescript
203
+ import { f, fn } from '@ax-llm/ax';
204
+
205
+ const workflowTools = [
206
+ fn('finishReply')
207
+ .description('Complete the actor turn with the final reply text')
208
+ .namespace('workflow')
209
+ .arg('reply', f.string('Final reply text'))
210
+ .returns(f.string('Final reply text'))
211
+ .handler(async ({ reply }, extra) => {
212
+ extra?.protocol?.final(reply);
213
+ return reply;
214
+ })
215
+ .build(),
216
+ fn('askForOrderId')
217
+ .description('Complete the actor turn by requesting clarification')
218
+ .namespace('workflow')
219
+ .arg('question', f.string('Clarification question'))
220
+ .returns(f.string('Clarification question'))
221
+ .handler(async ({ question }, extra) => {
222
+ extra?.protocol?.askClarification(question);
223
+ return question;
224
+ })
225
+ .build(),
226
+ ];
227
+ ```
228
+
229
+ Rules:
230
+
231
+ - `extra.protocol` is only available when the function call comes from an active AxAgent actor runtime session.
232
+ - Use `extra.protocol.final(...)` or `extra.protocol.askClarification(...)` only inside host-side function handlers.
233
+ - Inside actor-authored JavaScript, keep using the runtime globals `final(...)` and `ask_clarification(...)`.
234
+ - Do not model these protocol completions as normal registered tool functions or discovery entries.
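To wire this up, a sketch of registering the `workflowTools` above on an agent; the signature string and `contextFields` are placeholders, and `functions: { local: ... }` follows the registration shape shown elsewhere in this skill.

```typescript
import { AxJSRuntime, agent } from '@ax-llm/ax';

// Sketch: host-side completion tools are registered like any other local functions.
const supportAgent = agent('ticket:string -> reply:string', {
  contextFields: ['ticket'],
  runtime: new AxJSRuntime(),
  functions: { local: workflowTools },
});
```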
235
+
175
236
  ## Discovery Mode
176
237
 
177
238
  Enable discovery mode when you want the actor to discover modules and fetch callable definitions on demand:
@@ -199,7 +260,8 @@ Discovery APIs:
199
260
 
200
261
  Both return Markdown.
201
262
 
202
- - `listModuleFunctions(...)` only lists modules that actually have callable entries. Namespace metadata from `namespaces` only enriches those callable-backed modules.
263
+ - `listModuleFunctions(...)` only lists modules that actually have callable entries.
264
+ - Grouped modules render in the Actor prompt as `<namespace> - <selection criteria>` when `selectionCriteria` is provided.
203
265
  - If a requested module does not exist, `listModuleFunctions(...)` returns a per-module markdown error without failing the whole call.
204
266
  - `getFunctionDefinitions(...)` may include argument comments from schema descriptions and fenced code examples from `AxAgentFunction.examples`.
205
267
 
@@ -234,20 +296,88 @@ Do not:
234
296
  - Do not dump large pre-known tool definitions into actor code when discovery mode is enabled.
235
297
  - Do not use `Promise.all(...)` to fan out discovery calls across modules or definitions.
236
298
  - Do not convert discovery markdown into JSON before logging or using it.
299
+ - Do not assume used discovery docs stay visible in later prompts under `adaptive` or `lean`; call `listModuleFunctions(...)` or `getFunctionDefinitions(...)` again when you need to re-open them.
237
300
 
238
301
  ## RLM Actor Code Rules
239
302
 
240
303
  Use these rules when generating actor JavaScript for RLM in stdout mode:
241
304
 
242
305
  - Treat each actor turn as exactly one observable step.
243
- - If you need to inspect a value, compute it, `console.log(...)` it, and stop immediately after that `console.log(...)`.
244
- - On the next turn, read the logged result from `Action Log` before writing more code that depends on it.
306
+ - Inspect what already exists before recomputing it. If a prior turn successfully created a value, prefer reusing that runtime value.
307
+ - If you need to inspect a value, compute it or read it, `console.log(...)` it, and stop immediately after that `console.log(...)`.
308
+ - On the next turn, continue from the existing runtime state and use the logged result from `Action Log` only as evidence for what happened.
245
309
  - If the prompt contains `Live Runtime State`, treat it as the canonical view of current variables.
246
310
  - Errors from child-agent or tool calls appear in `Action Log`; inspect them and fix the code on the next turn.
247
311
  - Non-final turns should contain exactly one `console.log(...)`.
248
312
  - Final turns should call `final(...)` or `ask_clarification(...)` without `console.log(...)`.
249
313
  - Do not write a complete multi-step program in one actor turn.
250
- - Do not assume older successful turns remain fully replayed; adaptive replay may compress them to `[SUMMARY]: ...`.
314
+ - Do not re-declare or recompute values just because older turns are summarized; only rebuild after an explicit runtime restart or when you intentionally want a new value.
315
+ - Do not assume older successful turns remain fully replayed; adaptive or lean policies may collapse them into a `Checkpoint Summary` block or compact action summaries.
316
+
317
+ Small reuse example:
318
+
319
+ Turn 1:
320
+
321
+ ```javascript
322
+ const customers = await kb.findCustomers({ segment: 'active' });
323
+ console.log(customers.length);
324
+ ```
325
+
326
+ Turn 2:
327
+
328
+ ```javascript
329
+ const topCustomers = customers.slice(0, 3);
330
+ console.log(topCustomers);
331
+ ```
332
+
333
+ Reason: turn 2 reuses `customers` from the persistent runtime. `Live Runtime State` or summaries may change how turn 1 is shown in the prompt, but they do not remove the value from the runtime session.
334
+
335
+ ## RLM Test Harness
336
+
337
+ Use `agent.test(code, contextFieldValues?, options?)` when the user wants to validate JavaScript snippets against the actual AxAgent runtime environment without running the full Actor/Responder loop.
338
+
339
+ ```typescript
340
+ import { AxJSRuntime, agent, f, fn } from '@ax-llm/ax';
341
+
342
+ const runtime = new AxJSRuntime();
343
+
344
+ const tools = [
345
+ fn('sum')
346
+ .description('Return the sum of the provided numeric values')
347
+ .namespace('math')
348
+ .arg('values', f.number('Value to add').array())
349
+ .returns(f.number('Sum of all values'))
350
+ .handler(async ({ values }) =>
351
+ values.reduce((total, value) => total + value, 0)
352
+ )
353
+ .build(),
354
+ ];
355
+
356
+ const harness = agent('query:string -> answer:string', {
357
+ contextFields: ['query'],
358
+ runtime,
359
+ functions: { local: tools },
360
+ contextPolicy: { preset: 'adaptive' },
361
+ });
362
+
363
+ const output = await harness.test(
364
+ 'console.log(await math.sum({ values: [3, 5, 8] }))',
365
+ { query: 'sum the values' }
366
+ );
367
+
368
+ console.log(output);
369
+ ```
370
+
371
+ Rules:
372
+
373
+ - `test(...)` creates a fresh runtime session per call.
374
+ - It exposes the same runtime globals the actor would see for configured `contextFields`: `inputs`, non-colliding top-level aliases, namespaced functions, child agents, and `llmQuery`.
375
+ - In `AxJSRuntime`, do not rely on calling `inspect_runtime()` from inside `test(...)` snippets yet; prefer checking runtime globals directly inside the snippet.
376
+ - It returns the formatted runtime output string.
377
+ - It throws on runtime failures instead of returning LLM-style error strings.
378
+ - Do not call `final(...)` or `ask_clarification(...)` inside `test(...)` snippets.
379
+ - Pass only `contextFields` values to `test(...)`; it is not a general way to inject arbitrary non-context inputs.
380
+ - If the snippet uses `llmQuery(...)`, provide an AI service through the agent config or `options.ai`.
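Following the rule about checking runtime globals directly, a small sketch of a `test(...)` snippet that inspects what the runtime exposes, reusing the `harness` from the example above (the snippet body is illustrative):

```typescript
// Log which context-field globals the snippet can see, instead of calling inspect_runtime().
const globalsReport = await harness.test(
  'console.log(Object.keys(inputs))',
  { query: 'sum the values' }
);
console.log(globalsReport);
```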
251
381
 
252
382
  ## RLM Adaptive Replay
253
383
 
@@ -260,15 +390,31 @@ const analyst = agent(
260
390
  contextFields: ['context'],
261
391
  runtime: new AxJSRuntime(),
262
392
  maxTurns: 10,
263
- contextManagement: {
264
- actionReplay: 'adaptive',
265
- recentFullActions: 1,
266
- successSummarization: true,
267
- stateSummary: { enabled: true, maxEntries: 6 },
268
- stateInspection: { contextThreshold: 2_000 },
269
- errorPruning: true,
270
- hindsightEvaluation: true,
271
- pruneRank: 2,
393
+ contextPolicy: {
394
+ preset: 'adaptive',
395
+ summarizerOptions: {
396
+ model: 'summary-model',
397
+ modelConfig: { temperature: 0.2, maxTokens: 180 },
398
+ },
399
+ state: {
400
+ summary: true,
401
+ inspect: true,
402
+ inspectThresholdChars: 8_000,
403
+ maxEntries: 6,
404
+ maxChars: 1_200,
405
+ },
406
+ checkpoints: {
407
+ enabled: true,
408
+ triggerChars: 12_000,
409
+ },
410
+ expert: {
411
+ pruneErrors: true,
412
+ rankPruning: { enabled: true, minRank: 2 },
413
+ tombstones: {
414
+ model: 'summary-model',
415
+ modelConfig: { maxTokens: 80 },
416
+ },
417
+ },
272
418
  },
273
419
  }
274
420
  );
@@ -276,12 +422,18 @@ const analyst = agent(
276
422
 
277
423
  Rules:
278
424
 
279
- - Use `actionReplay: 'adaptive'` when the task needs runtime state across many turns but old exploratory code should not keep bloating the prompt.
280
- - Use `recentFullActions` to keep the newest one or two turns verbatim while older successful turns collapse to summaries.
281
- - Use `successSummarization: true` for explicit, compact summaries of older successful turns.
282
- - Use `stateSummary.enabled` to inject a compact `Live Runtime State` block into the actor prompt.
283
- - Use `actionReplay: 'minimal'` only when you want aggressively compressed history and can rely mostly on current runtime state.
284
- - Keep `stateInspection.contextThreshold` on so the actor is reminded to call `inspect_runtime()` when context grows.
425
+ - Use `preset: 'full'` when the actor should keep seeing raw prior code and outputs with minimal compression.
426
+ - Use `preset: 'adaptive'` when the task needs runtime state across many turns but older successful work should collapse into checkpoint summaries while important recent steps can still stay fully replayed.
427
+ - Use `preset: 'lean'` when you want more aggressive compression and can rely mostly on current runtime state plus checkpoint summaries and compact action summaries.
428
+ - Use `state.summary` to inject a compact `Live Runtime State` block into the actor prompt. The block is structured and provenance-aware: variables are rendered with compact type/size/preview metadata, and when Ax can infer it, a short source suffix like `from t3 via db.search` is included. Combine `maxEntries` with `maxChars` so large runtime objects do not dominate the prompt.
429
+ - Use `state.inspect` with `inspectThresholdChars` so the actor is reminded to call `inspect_runtime()` when replayed action history starts getting large.
430
+ - `adaptive` and `lean` hide used discovery docs by default; set `contextPolicy.pruneUsedDocs: false` if you want to keep replaying them.
431
+ - `full` keeps used discovery docs by default; set `contextPolicy.pruneUsedDocs: true` if you want the same cleanup there.
432
+ - Use `summarizerOptions` to tune the internal checkpoint-summary AxGen program.
433
+ - If you configure `expert.tombstones`, treat the object form as options for the internal tombstone-summary AxGen program.
434
+ - Internal checkpoint and tombstone summarizers are stateless helpers: `functions` are not allowed, `maxSteps` is forced to `1`, and `mem` is not propagated.
435
+ - Built-in `adaptive` and `lean` presets no longer enable destructive rank pruning by default. Opt in with `expert.rankPruning` only when you want lower-value successful turns deleted instead of summarized.
436
+ - If you want a quick local demo of the rendered `Live Runtime State` block, run [`src/examples/rlm-live-runtime-state.ts`](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm-live-runtime-state.ts).
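As a small sketch of the discovery-doc bullets above, here is how overriding the default `pruneUsedDocs` behavior in either direction could look; the signature string and `contextFields` are placeholders, and runtime and functions config are omitted.

```typescript
import { agent } from '@ax-llm/ax';

// Keep replaying used discovery docs even under the adaptive preset.
const keepDocs = agent('query:string -> answer:string', {
  contextFields: ['query'],
  contextPolicy: { preset: 'adaptive', pruneUsedDocs: false },
});

// Apply the same cleanup while otherwise keeping full replay.
const cleanDocs = agent('query:string -> answer:string', {
  contextFields: ['query'],
  contextPolicy: { preset: 'full', pruneUsedDocs: true },
});
```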
285
437
 
286
438
  Good pattern:
287
439
 
@@ -455,7 +607,7 @@ agentIdentity?: {
455
607
  maxRuntimeChars?: number;
456
608
  maxBatchedLlmQueryConcurrency?: number;
457
609
  maxTurns?: number;
458
- contextManagement?: AxContextManagementConfig;
610
+ contextPolicy?: AxContextPolicyConfig;
459
611
  actorFields?: string[];
460
612
  actorCallback?: (result: Record<string, unknown>) => void | Promise<void>;
461
613
  inputUpdateCallback?: (currentInputs: Record<string, unknown>) => Promise<Record<string, unknown> | undefined> | Record<string, unknown> | undefined;
@@ -468,6 +620,23 @@ agentIdentity?: {
468
620
  }
469
621
  ```
470
622
 
623
+ ## Examples
624
+
625
+ Fetch these for full working code:
626
+
627
+ - [Agent](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/agent.ts) — basic agent
628
+ - [Functions](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/function.ts) — function validation
629
+ - [Food Search](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/food-search.ts) — API tools
630
+ - [Smart Home](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/smart-home.ts) — state management
631
+ - [RLM](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm.ts) — RLM basic
632
+ - [RLM Long Task](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm-long-task.ts) — RLM context policy
633
+ - [RLM Discovery](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm-discovery.ts) — discovery mode
634
+ - [RLM Shared Fields](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm-shared-fields.ts) — shared fields
635
+ - [RLM Adaptive Replay](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm-adaptive-replay.ts) — adaptive replay
636
+ - [RLM Live Runtime State](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/rlm-live-runtime-state.ts) — structured runtime-state rendering
637
+ - [Customer Support](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/customer-support.ts) — classification agent
638
+ - [Abort Patterns](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/abort-patterns.ts) — abort handling
639
+
471
640
  ## Do Not Generate
472
641
 
473
642
  - Do not use `new AxAgent(...)` for new code unless explicitly required.
@@ -0,0 +1,245 @@
1
+ ---
2
+ name: ax-ai
3
+ description: This skill helps an LLM generate correct AI provider setup and configuration code using @ax-llm/ax. Use when the user asks about ai(), providers, models, presets, embeddings, extended thinking, context caching, or mentions OpenAI/Anthropic/Google/Azure/Groq/DeepSeek/Mistral/Cohere/Together/Ollama/HuggingFace/Reka/OpenRouter with @ax-llm/ax.
4
+ version: "19.0.17"
5
+ ---
6
+
7
+ # AI Provider Codegen Rules (@ax-llm/ax)
8
+
9
+ Use this skill to generate AI provider setup, configuration, and chat code. Prefer short, modern, copyable patterns. Do not write tutorial prose unless the user explicitly asks for explanation.
10
+
11
+ ## Quick Setup
12
+
13
+ ```typescript
14
+ import { ai } from '@ax-llm/ax';
15
+
16
+ const openai = ai({ name: 'openai', apiKey: 'sk-...' });
17
+ const claude = ai({ name: 'anthropic', apiKey: 'sk-ant-...' });
18
+ const gemini = ai({ name: 'google-gemini', apiKey: 'AIza...' });
19
+ const azure = ai({ name: 'azure-openai', apiKey: 'your-key', resourceName: 'your-resource', deploymentName: 'gpt-4' });
20
+ const groq = ai({ name: 'groq', apiKey: 'gsk_...' });
21
+ const deepseek = ai({ name: 'deepseek', apiKey: 'sk-...' });
22
+ const mistral = ai({ name: 'mistral', apiKey: 'your-key' });
23
+ const cohere = ai({ name: 'cohere', apiKey: 'your-key' });
24
+ const together = ai({ name: 'together', apiKey: 'your-key' });
25
+ const openrouter = ai({ name: 'openrouter', apiKey: 'your-key' });
26
+ const ollama = ai({ name: 'ollama', url: 'http://localhost:11434' });
27
+ const hf = ai({ name: 'huggingface', apiKey: 'hf_...' });
28
+ const reka = ai({ name: 'reka', apiKey: 'your-key' });
29
+ const grok = ai({ name: 'x-grok', apiKey: 'your-key' });
30
+ ```
31
+
32
+ ## Model Presets
33
+
34
+ ```typescript
35
+ import { ai, AxAIGoogleGeminiModel } from '@ax-llm/ax';
36
+
37
+ const gemini = ai({
38
+ name: 'google-gemini',
39
+ apiKey: process.env.GOOGLE_APIKEY!,
40
+ config: { model: 'simple' },
41
+ models: [
42
+ { key: 'tiny', model: AxAIGoogleGeminiModel.Gemini20FlashLite, description: 'Fast + cheap', config: { maxTokens: 1024, temperature: 0.3 } },
43
+ { key: 'simple', model: AxAIGoogleGeminiModel.Gemini20Flash, description: 'Balanced', config: { temperature: 0.6 } },
44
+ ],
45
+ });
46
+
47
+ await gemini.chat({ model: 'tiny', chatPrompt: [{ role: 'user', content: 'Hi' }] });
48
+ ```
49
+
50
+ ## Chat
51
+
52
+ ```typescript
53
+ const res = await llm.chat({
54
+ chatPrompt: [
55
+ { role: 'system', content: 'You are concise.' },
56
+ { role: 'user', content: 'Write a haiku about the ocean.' },
57
+ ],
58
+ });
59
+ console.log(res.results[0]?.content);
60
+ ```
61
+
62
+ ## Common Options
63
+
64
+ - `stream` (boolean): enable SSE; true by default
65
+ - `thinkingTokenBudget`: `'minimal'` | `'low'` | `'medium'` | `'high'` | `'highest'` | `'none'`
66
+ - `showThoughts`: include thoughts in output
67
+ - `functionCallMode`: `'auto'` | `'native'` | `'prompt'`
68
+ - `debug`, `logger`, `tracer`, `rateLimiter`, `timeout`
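A short sketch of passing these as per-request options in the second argument to `chat(...)`, mirroring the extended-thinking example below; treating `stream` and `functionCallMode` as per-request options here is an assumption.

```typescript
const res = await llm.chat(
  { chatPrompt: [{ role: 'user', content: 'List three sorting algorithms.' }] },
  // Per-request options: disable streaming, force prompt-based function calling, keep thinking low.
  { stream: false, functionCallMode: 'prompt', thinkingTokenBudget: 'low' }
);
console.log(res.results[0]?.content);
```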
69
+
70
+ ## Extended Thinking
71
+
72
+ ```typescript
73
+ import { ai, AxAIAnthropicModel } from '@ax-llm/ax';
74
+
75
+ const claude = ai({
76
+ name: 'anthropic',
77
+ apiKey: process.env.ANTHROPIC_APIKEY!,
78
+ config: { model: AxAIAnthropicModel.Claude46Opus },
79
+ });
80
+
81
+ const res = await claude.chat(
82
+ { chatPrompt: [{ role: 'user', content: 'Solve step by step...' }] },
83
+ { thinkingTokenBudget: 'medium', showThoughts: true },
84
+ );
85
+ console.log(res.results[0]?.thought);
86
+ console.log(res.results[0]?.content);
87
+ ```
88
+
89
+ ### Budget Levels
90
+
91
+ | Level | Anthropic (tokens) | Gemini (tokens) |
92
+ |---|---|---|
93
+ | `'none'` | disabled | minimal |
94
+ | `'minimal'` | 1,024 | 200 |
95
+ | `'low'` | 5,000 | 800 |
96
+ | `'medium'` | 10,000 | 5,000 |
97
+ | `'high'` | 20,000 | 10,000 |
98
+ | `'highest'` | 32,000 | 24,500 |
99
+
100
+ ### Anthropic Model-Specific Behavior
101
+
102
+ - Opus 4.6: adaptive thinking, effort levels
103
+ - Opus 4.5: budget_tokens + effort levels (capped at `'high'`)
104
+ - Other thinking models: budget tokens only
105
+
106
+ ### Custom Thinking Levels
107
+
108
+ ```typescript
109
+ const claude = ai({
110
+ name: 'anthropic',
111
+ apiKey: '...',
112
+ config: {
113
+ model: AxAIAnthropicModel.Claude46Opus,
114
+ thinkingTokenBudgetLevels: {
115
+ minimal: 2048,
116
+ low: 8000,
117
+ medium: 16000,
118
+ high: 25000,
119
+ highest: 40000,
120
+ },
121
+ effortLevelMapping: {
122
+ minimal: 'low',
123
+ low: 'medium',
124
+ medium: 'high',
125
+ high: 'high',
126
+ highest: 'max',
127
+ },
128
+ },
129
+ });
130
+ ```
131
+
132
+ ## Embeddings
133
+
134
+ ```typescript
135
+ const { embeddings } = await llm.embed({
136
+ texts: ['hello', 'world'],
137
+ embedModel: 'text-embedding-005',
138
+ });
139
+ ```
140
+
141
+ ## Context Caching
142
+
143
+ ```typescript
144
+ const result = await gen.forward(llm, { code, language }, {
145
+ mem,
146
+ sessionId: 'code-review-session',
147
+ contextCache: {
148
+ ttlSeconds: 3600,
149
+ cacheBreakpoint: 'after-examples',
150
+ },
151
+ });
152
+ ```
153
+
154
+ Breakpoint values: `'system'` | `'after-functions'` | `'after-examples'`
155
+
156
+ Provider behavior:
157
+
158
+ - Google Gemini: explicit caching with cache resource ID, auto TTL refresh
159
+ - Anthropic: implicit via `cache_control` markers
160
+
161
+ ### External Registry (serverless)
162
+
163
+ ```typescript
164
+ const registry: AxContextCacheRegistry = {
165
+ get: async (key) => { /* redis.get */ },
166
+ set: async (key, entry) => { /* redis.set */ },
167
+ };
168
+ ```
169
+
170
+ ## AWS Bedrock
171
+
172
+ ```typescript
173
+ import { AxAIBedrock, AxAIBedrockModel } from '@ax-llm/ax-ai-aws-bedrock';
174
+
175
+ const bedrock = new AxAIBedrock({
176
+ region: 'us-east-2',
177
+ fallbackRegions: ['us-west-2'],
178
+ config: { model: AxAIBedrockModel.ClaudeSonnet4 },
179
+ });
180
+ ```
181
+
182
+ ## Vercel AI SDK Integration
183
+
184
+ ```typescript
185
+ import { ai } from '@ax-llm/ax';
186
+ import { AxAIProvider } from '@ax-llm/ax-ai-sdk-provider';
187
+ import { generateText } from 'ai';
188
+
189
+ const axAI = ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY! });
190
+ const model = new AxAIProvider(axAI);
191
+ const result = await generateText({
192
+ model,
193
+ messages: [{ role: 'user', content: 'Hello!' }],
194
+ });
195
+ ```
196
+
197
+ ## MCP + AxJSRuntime
198
+
199
+ ```typescript
200
+ import { AxMCPClient } from '@ax-llm/ax';
201
+ import { axCreateMCPStdioTransport } from '@ax-llm/ax-tools';
202
+
203
+ const transport = axCreateMCPStdioTransport({
204
+ command: 'npx',
205
+ args: ['-y', '@anthropic/mcp-server-filesystem'],
206
+ });
207
+ const client = new AxMCPClient(transport);
208
+ ```
209
+
210
+ ## Critical Rules
211
+
212
+ - Use the `ai()` factory for all providers.
213
+ - Provider names: `'openai'`, `'anthropic'`, `'google-gemini'`, `'azure-openai'`, `'mistral'`, `'groq'`, `'cohere'`, `'together'`, `'deepseek'`, `'ollama'`, `'huggingface'`, `'openrouter'`, `'reka'`, `'x-grok'`
214
+ - Thinking constraints on Anthropic: `temperature` and `topK` are ignored; `topP` is only sent if it is >= 0.95.
215
+ - Bedrock uses `new AxAIBedrock()`, not `ai()`.
216
+ - Vercel AI SDK uses `AxAIProvider` wrapper.
217
+
218
+ ## Examples
219
+
220
+ Fetch these for full working code:
221
+
222
+ - [Embeddings](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/embed.ts) — embedding generation
223
+ - [Anthropic Thinking](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/anthropic-thinking-function.ts) — extended thinking with functions
224
+ - [Anthropic Thinking Separation](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/anthropic-thinking-separation.ts) — thinking separation
225
+ - [Anthropic Web Search](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/anthropic-web-search.ts) — Anthropic web search
226
+ - [OpenAI Web Search](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/openai-web-search.ts) — OpenAI web search
227
+ - [OpenAI Responses](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/openai-responses.ts) — OpenAI responses API
228
+ - [o3 Reasoning](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/reasoning-o3-example.ts) — o3 reasoning
229
+ - [Gemini Context Cache](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/gemini-context-cache.ts) — Gemini context caching
230
+ - [Gemini Files](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/gemini-file-support.ts) — Gemini file handling
231
+ - [Grok Live Search](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/grok-live-search.ts) — Grok live search
232
+ - [OpenRouter](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/openrouter.ts) — OpenRouter provider
233
+ - [Vertex AI Auth](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/vertex-auth-example.ts) — Vertex AI authentication
234
+ - [MCP Stdio](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/mcp-client-memory.ts) — MCP stdio transport
235
+ - [MCP HTTP](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/mcp-client-pipedream.ts) — MCP HTTP transport
236
+ - [Telemetry](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/telemetry.ts) — OpenTelemetry tracing
237
+ - [Multi-Modal](https://raw.githubusercontent.com/ax-llm/ax/refs/heads/main/src/examples/multi-modal.ts) — image handling
238
+
239
+ ## Do Not Generate
240
+
241
+ - Do not use `new AxAIOpenAI(...)` or similar class constructors for standard providers; use `ai()`.
242
+ - Do not hardcode provider class names when `ai({ name: ... })` covers the provider.
243
+ - Do not mix `thinkingTokenBudget` with explicit `temperature` on Anthropic thinking models.
244
+ - Do not use `ai()` for AWS Bedrock; use `new AxAIBedrock()`.
245
+ - Do not omit `resourceName` and `deploymentName` for Azure OpenAI.