@octavus/docs 4.1.0 → 5.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -194,6 +194,7 @@ interface TriggerRequest {
    triggerName: string;
    input?: Record<string, unknown>;
    rollbackAfterMessageId?: string | null; // For retry: truncate messages after this ID
+   sender?: UIMessageSender; // Author of this turn, for multi-user attribution
  }
 
  // Continue after client-side tool handling
@@ -223,6 +224,40 @@ export async function POST(request: Request) {
  }
  ```
 
+ ### Attributing Messages in Multi-User Chats
+
+ When several people share one conversation, set `sender` on the trigger so each user message is attributed to its author. Set it **server-side from your authenticated user** - never trust a client-supplied identity:
+
+ ```typescript
+ interface UIMessageSender {
+   id?: string;
+   name?: string;
+   image?: string; // Avatar URL
+ }
+
+ export async function POST(request: Request) {
+   const user = await authenticate(request); // your auth
+   const { sessionId, ...payload } = await request.json();
+
+   const session = client.agentSessions.attach(sessionId, {
+     tools: {
+       /* ... */
+     },
+   });
+   const events = session.execute(
+     {
+       ...payload,
+       sender: { id: user.id, name: user.name, image: user.avatarUrl },
+     },
+     { signal: request.signal },
+   );
+
+   return new Response(toSSEStream(events));
+ }
+ ```
+
+ The runtime stamps the sender onto the user message it creates, so it comes back on `UIMessage.sender` from `getMessages()` and survives restore. `sender` is turn metadata - it is never added to your protocol's trigger `input`, and agent-initiated turns (no `sender`) stay unattributed. For instant optimistic display in the browser, also pass it on the client `send()` (see [Client SDK Messages](/docs/client-sdk/messages)).
+
  ### Stop Support
 
  Pass an abort signal to allow clients to stop generation:
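The server-side rule above (ignore whatever identity the browser sends and stamp the authenticated user instead) can be isolated into a small, testable helper. This is a sketch, not part of the SDK: `withAuthoritativeSender`, `AuthUser`, and its `avatarUrl` field are hypothetical stand-ins for your own auth layer, and `TriggerBody` models only the fields this example touches.

```typescript
// Mirrors the UIMessageSender shape documented above.
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Hypothetical: whatever your auth layer returns for the signed-in user.
interface AuthUser {
  id: string;
  name: string;
  avatarUrl?: string;
}

// Hypothetical request body: a trigger plus whatever else the client sent.
interface TriggerBody {
  triggerName: string;
  input?: Record<string, unknown>;
  sender?: UIMessageSender;
  [key: string]: unknown;
}

// Discards any client-supplied `sender` and replaces it with the authenticated
// user, so one participant cannot post as another.
function withAuthoritativeSender(body: TriggerBody, user: AuthUser): TriggerBody {
  const { sender: _clientSender, ...rest } = body;
  return {
    ...rest,
    sender: { id: user.id, name: user.name, image: user.avatarUrl },
  };
}
```

Keeping the override in one function makes a security-relevant rule easy to unit-test, instead of relying on the order of properties in an inline object spread.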
@@ -16,6 +16,13 @@ interface UIMessage {
    parts: UIMessagePart[];
    status: 'streaming' | 'done';
    createdAt: Date;
+   sender?: UIMessageSender; // Author of a user message, in multi-user chats
+ }
+
+ interface UIMessageSender {
+   id?: string;
+   name?: string;
+   image?: string; // Avatar URL
  }
  ```
 
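Every field on `UIMessageSender` is optional, and messages without a `sender` (for example, agent-initiated turns) are unattributed, so a renderer needs a fallback chain. A minimal sketch; the `senderLabel` helper, its `currentUserId` parameter, and the "You" and "Unknown participant" labels are hypothetical UI choices, not SDK behavior.

```typescript
// Mirrors the UIMessageSender shape documented above.
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Minimal slice of UIMessage for this sketch; the real type has more fields.
interface MessageLike {
  sender?: UIMessageSender;
}

// Chooses a label for a message bubble. Each step falls back to the next;
// a message with no sender gets no label at all.
function senderLabel(message: MessageLike, currentUserId?: string): string | undefined {
  const sender = message.sender;
  if (!sender) return undefined;
  // Guard so that two missing ids are never treated as a match.
  if (currentUserId !== undefined && sender.id === currentUserId) return "You";
  return sender.name ?? sender.id ?? "Unknown participant";
}
```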
@@ -133,7 +140,7 @@ interface UIWorkerPart {
    parts: UIMessagePart[]; // Nested parts from the worker (excluding nested workers)
    output?: unknown;
    error?: string;
-   status: 'running' | 'done' | 'error';
+   status: 'running' | 'done' | 'error' | 'cancelled';
  }
 
  // Step boundary marker (structural, not rendered visually)
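Adding `'cancelled'` to the `UIWorkerPart` status union affects any code that switches over it exhaustively. A sketch of a TypeScript exhaustiveness check that surfaces a missing case at compile time; `workerStatusLabel` and its label strings are hypothetical.

```typescript
// Status union for UIWorkerPart as of this version.
type WorkerStatus = 'running' | 'done' | 'error' | 'cancelled';

// Maps each worker status to a display label. The `never` assignment in the
// default branch stops compiling if a status is added but not handled here.
function workerStatusLabel(status: WorkerStatus): string {
  switch (status) {
    case 'running':
      return 'Running';
    case 'done':
      return 'Done';
    case 'error':
      return 'Failed';
    case 'cancelled':
      return 'Cancelled';
    default: {
      const unhandled: never = status;
      throw new Error(`Unhandled worker status: ${String(unhandled)}`);
    }
  }
}
```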
@@ -225,6 +232,22 @@ async function handleSend(text: string, files?: FileReference[]) {
 
  See [File Uploads](/docs/client-sdk/file-uploads) for complete upload flow.
 
+ ### Attributing the Sender (Multi-User Chats)
+
+ In conversations shared by several people, pass `sender` so the optimistic bubble shows who sent the message immediately:
+
+ ```tsx
+ await send(
+   'user-message',
+   { USER_MESSAGE: text },
+   {
+     userMessage: { content: text, sender: { id: user.id, name: user.name, image: user.avatarUrl } },
+   },
+ );
+ ```
+
+ This `sender` is for instant local display only. For attribution that persists and is visible to other participants, set the authoritative sender server-side on the trigger (see [Server SDK Sessions](/docs/server-sdk/sessions)). The persisted value comes back on `message.sender` from `getMessages()`, so render from `message.sender` and treat the value you passed to `send()` as the optimistic placeholder.
+
  ## Rendering Messages
 
  ### Basic Rendering
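The split between the optimistic `sender` passed to `send()` and the persisted `message.sender` can be captured in two small helpers. A sketch under assumptions: `AppUser` and its `avatarUrl` field stand in for your own user object, and `optimisticSendOptions` and `displaySender` are hypothetical names, not Client SDK exports.

```typescript
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Hypothetical: the signed-in user as your app already knows it.
interface AppUser {
  id: string;
  name: string;
  avatarUrl?: string;
}

// Builds the third argument passed to send() in the example above.
function optimisticSendOptions(text: string, user: AppUser) {
  const sender: UIMessageSender = { id: user.id, name: user.name, image: user.avatarUrl };
  return { userMessage: { content: text, sender } };
}

// Prefer the server-stamped sender; fall back to the optimistic one only until
// the persisted message arrives from getMessages().
function displaySender(
  message: { sender?: UIMessageSender },
  optimistic?: UIMessageSender,
): UIMessageSender | undefined {
  return message.sender ?? optimistic;
}
```

If the two values ever disagree, the persisted one wins, which matches the rule above: the server, not the browser, decides who sent a message.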
@@ -126,7 +126,7 @@ Skills that have [secrets](#skill-secrets) configured run in **secure mode**, wh
 
  ## Device Execution
 
- By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer (VM or desktop) instead.
+ By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer instead.
 
  ```yaml
  skills:
@@ -154,7 +154,7 @@ The generic workspace tools (`octavus_code_run`, `octavus_file_write`, `octavus_
 
  | Aspect | Sandbox (default) | Device |
  | ------------------- | ---------------------------------- | ------------------------------------------------------ |
- | **Environment** | Isolated sandbox | Agent's computer (VM or desktop) |
+ | **Environment** | Isolated sandbox | The agent's computer |
  | **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
  | **File access** | Via `octavus_file_read/write` | Via device filesystem MCP |
  | **Code execution** | Via `octavus_code_run` | Via device shell MCP |
@@ -236,6 +236,7 @@ description: >
  version: 1.0.0
  license: MIT
  author: Octavus Team
+ category: Productivity
  ---
 
  # QR Code Generator
@@ -273,6 +274,7 @@ Main script for generating QR codes...
  | `version` | No | Semantic version string |
  | `license` | No | License identifier |
  | `author` | No | Skill author |
+ | `category` | No | Display category used to group and filter skills in the UI |
  | `secrets` | No | Array of secret declarations (enables secure mode) |
 
  ## Best Practices
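The new optional `category` field lends itself to grouping in a skill picker. The docs do not say how the Octavus UI does this; the sketch below only shows one way a consumer might group and filter parsed frontmatter. `SkillMeta`, the assumed `name` field, and the "Uncategorized" fallback bucket are hypothetical.

```typescript
// Minimal slice of SKILL.md frontmatter for this sketch. `name` is assumed;
// `category` is the optional field added above.
interface SkillMeta {
  name: string;
  category?: string;
}

// Groups skill names by category; skills without one land in a fallback bucket.
function groupByCategory(skills: SkillMeta[], fallback = "Uncategorized"): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const skill of skills) {
    const key = skill.category?.trim() || fallback;
    const names = groups.get(key) ?? [];
    names.push(skill.name);
    groups.set(key, names);
  }
  return groups;
}

// Filtering is an exact match on the same field.
function filterByCategory(skills: SkillMeta[], category: string): SkillMeta[] {
  return skills.filter((skill) => skill.category === category);
}
```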
@@ -21,25 +21,28 @@ agent:
 
  ## Configuration Options
 
- | Field | Required | Description |
- | ---------------- | -------- | ---------------------------------------------------------------------------------------- |
- | `model` | Yes | Model identifier or variable reference |
- | `backupModel` | No | Backup model for automatic failover on provider errors |
- | `system` | Yes | System prompt filename (without .md) |
- | `input` | No | Variables to pass to the system prompt |
- | `tools` | No | List of tools the LLM can call |
- | `mcpServers` | No | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers)) |
- | `skills` | No | List of Octavus skills the LLM can use |
- | `references` | No | List of references the LLM can fetch on demand |
- | `sandboxTimeout` | No | Skill sandbox timeout in ms (default: 5 min, max: 1 hour) |
- | `imageModel` | No | Image generation model (enables agentic image generation) |
- | `webSearch` | No | Enable built-in web search tool (provider-agnostic) |
- | `agentic` | No | Allow multiple tool call cycles |
- | `maxSteps` | No | Maximum agentic steps (default: 10) - literal or variable reference |
- | `temperature` | No | Model temperature (0-2), `"off"`, or a variable reference |
- | `thinking` | No | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |
- | `cache` | No | Prompt caching mode: `auto` (default), `extended`, or `off` |
- | `anthropic` | No | Anthropic-specific options (tools, skills) |
+ | Field | Required | Description |
+ | --------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------ |
+ | `model` | Yes | Model identifier or variable reference |
+ | `backupModel` | No | Backup model for automatic failover on provider errors |
+ | `system` | Yes | System prompt filename (without .md) |
+ | `input` | No | Variables to pass to the system prompt |
+ | `tools` | No | List of tools the LLM can call |
+ | `mcpServers` | No | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers)) |
+ | `skills` | No | List of Octavus skills the LLM can use |
+ | `references` | No | List of references the LLM can fetch on demand |
+ | `sandboxTimeout` | No | Skill sandbox timeout in ms (default: 5 min, max: 1 hour) |
+ | `imageModel` | No | Image generation model (enables agentic image generation) |
+ | `webSearch` | No | Enable built-in web search tool (provider-agnostic) |
+ | `agentic` | No | Allow multiple tool call cycles |
+ | `maxSteps` | No | Maximum agentic steps (default: 10) - literal or variable reference |
+ | `temperature` | No | Model temperature (0-2), `"off"`, or a variable reference |
+ | `thinking` | No | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |
+ | `speed` | No | Inference speed for supported Opus models: `fast`/`standard` (see [Fast Mode](/docs/protocol/fast-mode)) |
+ | `cache` | No | Prompt caching mode: `auto` (default), `extended`, or `off` |
+ | `maxToolOutputTokens` | No | Cap a single tool result at this many tokens in the model view (head+tail preview + note). Omit to leave tool output unbounded |
+ | `contextManagement` | No | Automatic context-window compaction (see [Context Management](/docs/protocol/context-management)) |
+ | `anthropic` | No | Anthropic-specific options (tools, skills) |
 
  ## Models
 
@@ -50,7 +53,7 @@ Specify models in `provider/model-id` format. Any model supported by the provide
  | Provider | Format | Examples |
  | --------- | ---------------------- | -------------------------------------------------------------------------------------------------- |
  | Anthropic | `anthropic/{model-id}` | `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-sonnet-4-5`, `claude-haiku-4-5` |
- | Google | `google/{model-id}` | `gemini-3-pro-preview`, `gemini-3-flash-preview`, `gemini-2.5-flash` |
+ | Google | `google/{model-id}` | `gemini-3.5-flash`, `gemini-3-flash-preview`, `gemini-2.5-flash` |
  | OpenAI | `openai/{model-id}` | `gpt-5`, `gpt-4o`, `o4-mini`, `o3`, `o3-mini`, `o1` |
 
  ### Examples
@@ -456,7 +459,7 @@ agent:
 
  ## Dynamic Configuration
 
- Like `model`, the `temperature`, `thinking`, and `maxSteps` fields can also reference an input variable. Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:
+ Like `model`, the `temperature`, `thinking`, `speed`, and `maxSteps` fields can also reference an input variable. Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:
 
  ```yaml
  input:
@@ -548,9 +551,10 @@ handlers:
  Start summary thread:
  block: start-thread
  thread: summary
- model: anthropic/claude-sonnet-4-5 # Different model
+ model: anthropic/claude-opus-4-8 # Different model
  backupModel: openai/gpt-4o # Failover model
  thinking: low # Different thinking
+ speed: fast # Fast mode for this thread (supported Opus models only)
  cache: off # Different cache mode (does not inherit from agent)
  maxSteps: 1 # Limit tool calls
  system: escalation-summary # Different prompt
@@ -562,7 +566,7 @@ handlers:
  todoList: true # Thread-specific task list
  ```
 
- Each thread can have its own model, backup model, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol's `skills:` section. References must exist in the agent's `references/` directory. Workers use this same pattern since they don't have a global `agent:` section.
+ Each thread can have its own model, backup model, thinking level, speed, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol's `skills:` section. References must exist in the agent's `references/` directory. Workers use this same pattern because they don't have a global `agent:` section; this is also how a worker enables fast mode.
 
  ## Full Example
 
@@ -333,7 +333,7 @@ When a skill declares secrets and an organization configures them, the skill run
 
  | Aspect | Standard Skills | Secure Skills | Device Skills |
  | ------------------- | ------------------------ | --------------------------------------------------- | ------------------------------------------------------ |
- | **Environment** | Shared sandbox | Isolated sandbox (one per skill) | Agent's computer (VM or desktop) |
+ | **Environment** | Shared sandbox | Isolated sandbox (one per skill) | The agent's computer |
  | **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run` only | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
  | **Script input** | CLI arguments via `args` | JSON via stdin (use `input` parameter) | CLI arguments via `args` |
  | **Secrets** | No secrets | Secrets as env vars | No secrets |
@@ -219,21 +219,22 @@ steps:
 
  All LLM configuration goes here:
 
- | Field | Description |
- | ------------- | -------------------------------------------------------------------------------------- |
- | `thread` | Thread name (defaults to block name) |
- | `model` | LLM model to use |
- | `system` | System prompt filename (required) |
- | `input` | Variables for system prompt |
- | `tools` | Tools available in this thread |
- | `skills` | Octavus skills available in this thread |
- | `mcpServers` | MCP servers available in this thread |
- | `imageModel` | Image generation model |
- | `webSearch` | Enable built-in web search tool |
- | `thinking` | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |
- | `cache` | Prompt caching mode: `auto` (default), `extended`, or `off` |
- | `temperature` | Model temperature (0-2), `"off"`, or variable reference |
- | `maxSteps` | Maximum tool call cycles (enables agentic if > 1), or variable reference |
+ | Field | Description |
+ | --------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
+ | `thread` | Thread name (defaults to block name) |
+ | `model` | LLM model to use |
+ | `system` | System prompt filename (required) |
+ | `input` | Variables for system prompt |
+ | `tools` | Tools available in this thread |
+ | `skills` | Octavus skills available in this thread |
+ | `mcpServers` | MCP servers available in this thread |
+ | `imageModel` | Image generation model |
+ | `webSearch` | Enable built-in web search tool |
+ | `thinking` | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |
+ | `cache` | Prompt caching mode: `auto` (default), `extended`, or `off` |
+ | `temperature` | Model temperature (0-2), `"off"`, or variable reference |
+ | `maxSteps` | Maximum tool call cycles (enables agentic if > 1), or variable reference |
+ | `maxToolOutputTokens` | Cap a single tool result at this many tokens in the thread's model view (head+tail preview + note). Omit to leave tool output unbounded |
 
  ## Simple Example
 
@@ -468,10 +469,11 @@ All standard events (text-delta, tool calls, etc.) are also emitted.
 
  ## Calling Workers from Interactive Agents
 
- Interactive agents can call workers in two ways:
+ Interactive agents can call workers in three ways:
 
  1. **Deterministically** - Using the `run-worker` block
  2. **Agentically** - LLM calls worker as a tool
+ 3. **Automatically** - Octavus, rather than the model, invokes the worker as part of a built-in capability. Context management's `summarizerWorker` (see [Context Management](/docs/protocol/context-management)) works this way: declare it in `workers:` but leave it out of `agent.workers` so the model never sees it as a tool.
 
  ### Worker Declaration
 
@@ -0,0 +1,68 @@
+ ---
+ title: Context Management
+ description: Automatic context-window compaction so long sessions keep running past the model's limit.
+ ---
+
+ # Context Management
+
+ Long-running sessions accumulate history - messages, tool results, screenshots, file reads. Once that history outgrows the model's context window, the provider rejects the request and, without intervention, the session fails. Two [agent config](/docs/protocol/agent-config) knobs make the agent robust to this: `maxToolOutputTokens` caps how much any single tool result puts into context, and `contextManagement` automatically compacts older history as it fills up. Together they keep a long task, a long conversation, or one oversized tool output from ending the session.
+
+ Compaction and bounding transform only what the **model sees** on each request. The stored conversation is never changed - the complete history is always preserved.
+
+ ## Configuration
+
+ ```yaml
+ workers:
+   context-summarizer: # the worker that produces the running summary
+     description: Summarizes earlier conversation to free up context
+     display: description
+
+ agent:
+   model: anthropic/claude-sonnet-4-5
+   system: system
+   maxToolOutputTokens: 300000 # safety cap on a single tool result (no default)
+   # context-summarizer is intentionally NOT listed in agent.workers,
+   # so the model never sees it as a callable tool.
+   contextManagement:
+     summarizerWorker: context-summarizer
+     thresholdPercent: 0.8 # proactive trigger (no default; omit = reactive only)
+     recentPercent: 0.3 # recent window kept verbatim (no default; omit = no summarization)
+ ```
+
+ `maxToolOutputTokens` is a top-level `agent` field (a sibling of `model` and `system`), because bounding a single tool result is independent of history compaction. Workers set the same cap per thread on their [`start-thread`](/docs/protocol/workers) block. `contextManagement` groups the compaction knobs:
+
+ | Field | Required | Description |
+ | ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------- |
+ | `summarizerWorker` | No | Slug of a worker (declared in `workers:`) that produces the running summary. Enables summarization-based compaction. |
+ | `thresholdPercent` | No | Fraction of the model's context window at which compaction starts. No default; omit to disable proactive compaction. |
+ | `recentPercent` | No | Fraction of the context window kept verbatim as the recent window. No default; omit to disable summarization. |
+ | `recentWindow` | No | Deprecated and ignored. Superseded by `recentPercent` (a context-window fraction). |
+
+ ## How it works
+
+ - When `maxToolOutputTokens` is set, every tool result is **bounded** before it enters the model's view: anything over the budget is replaced with a head-and-tail preview plus a note saying how much was omitted and how to fetch the rest. The full result is still preserved in the stored conversation, so nothing is lost - the model just sees a bounded copy and can narrow, page, or search for more.
+ - When `thresholdPercent` is set and the prompt crosses that fraction of the context window, the oldest turns are folded into a **running summary** while the original task and the most-recent turns (`recentPercent` of the context window, a token budget) are kept verbatim - so the agent keeps the goal and full fidelity on what it is doing now. Both are opt-in with no default: omit them and the agent does no proactive compaction, relying on the automatic recovery below.
+ - Compaction is **incremental**: each cycle only summarizes the newly-expired turns and folds them into the existing summary, so cost stays bounded no matter how long the session runs.
+ - If the model rejects a request for being too long anyway, the agent recovers automatically (it reduces context and retries) rather than failing the session.
+
+ ## Bounded tool output
+
+ Some tool calls return very large output - a big file read, a full-page extract, a large MCP or skill result. Left unbounded, one such call can blow past the context window in a single step. Set `maxToolOutputTokens` on the agent (or, for a worker, on its `start-thread` block) to cap how much of any single result reaches the model, while the full result stays in the stored conversation and the trace.
+
+ There is no default: bounding only happens when you set `maxToolOutputTokens`, so the runtime never silently truncates output you did not ask it to. When a result is truncated, the model is always told what was omitted and how to retrieve it, so it can decide to narrow the request, paginate, or read a specific range.
+
+ Bounding is never hidden: each time a tool result first crosses the budget, a `tool-output-bounded` entry is recorded in the session's execution logs with the tool name, the original size, and the cap. The full, untruncated result stays in the corresponding `tool-result` entry, so you can always see both what the model saw and the complete output.
+
+ ## The summarizer worker
+
+ `summarizerWorker` points at a worker you define and ship like any other (see [Workers](/docs/protocol/workers)). It takes two inputs - `PREVIOUS_SUMMARY` (the running summary so far) and `CONVERSATION` (the older turns to fold in) - and returns the updated summary.
+
+ Summarization is gated on its sizing knobs: a worker only runs if you also set `recentPercent` (the recent window it folds around), and it only runs **proactively** if you also set `thresholdPercent`. Set a worker without `recentPercent` and it never runs - validation warns you about this.
+
+ Declare it in the top-level `workers:` section so it can be resolved, but keep it **out** of `agent.workers`: that list is what the model can call as a tool, and the summarizer is invoked automatically, never chosen by the model.
+
+ Without a `summarizerWorker`, the agent still recovers from a context overflow by reducing older tool results, but it won't produce a summary of earlier turns.
+
+ ## What users see
+
+ Because the summarizer is a worker, it surfaces like any other worker, following its `display` mode (a subtle `description` indicator by default). Compaction is otherwise seamless - the conversation reads as one continuous thread and the complete history is preserved.
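The bounding and compaction rules described on this page can be modeled in a few lines. This is a simplified sketch, not the Octavus runtime: it approximates tokens as characters divided by 4, treats turns as plain strings, and uses hypothetical names (`boundToolOutput`, `compactIfNeeded`, `ModelView`). The summarizer worker is replaced by a plain function that receives the documented `PREVIOUS_SUMMARY` and `CONVERSATION` inputs.

```typescript
// Rough token estimate (an assumption of this sketch, not a real tokenizer).
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Bounds one tool result: over the cap, keep a head and a tail and note how much
// was omitted. With no cap configured, output passes through untouched.
function boundToolOutput(output: string, maxToolOutputTokens?: number): string {
  if (maxToolOutputTokens === undefined) return output;
  if (approxTokens(output) <= maxToolOutputTokens) return output;
  const keepChars = Math.floor(maxToolOutputTokens) * 4;
  const headChars = Math.floor(keepChars / 2);
  const tailChars = keepChars - headChars;
  const omitted = output.length - keepChars;
  const head = output.slice(0, headChars);
  const tail = output.slice(output.length - tailChars);
  return `${head}\n[${omitted} characters omitted; narrow or page the request to see more]\n${tail}`;
}

// Stand-in for the summarizer worker and its two documented inputs.
type Summarizer = (input: { PREVIOUS_SUMMARY: string; CONVERSATION: string }) => string;

interface CompactionConfig {
  contextWindow: number; // model context window, in tokens
  thresholdPercent?: number; // e.g. 0.8; unset = no proactive compaction
  recentPercent?: number; // e.g. 0.3; unset = no summarization
}

// What the model sees on the next request. The stored history is never changed.
interface ModelView {
  task: string; // original task, always kept verbatim
  summary: string; // running summary ("" before the first compaction)
  turns: string[]; // turns not yet folded into the summary, oldest first
}

function compactIfNeeded(view: ModelView, cfg: CompactionConfig, summarize: Summarizer): ModelView {
  const { thresholdPercent, recentPercent, contextWindow } = cfg;
  if (thresholdPercent === undefined || recentPercent === undefined) return view;

  const used =
    approxTokens(view.task) +
    approxTokens(view.summary) +
    view.turns.reduce((sum, turn) => sum + approxTokens(turn), 0);
  if (used < thresholdPercent * contextWindow) return view;

  // Walk back from the newest turn, keeping turns until the recent budget is full.
  const recentBudget = recentPercent * contextWindow;
  let cut = view.turns.length;
  let recentTokens = 0;
  while (cut > 0 && recentTokens + approxTokens(view.turns[cut - 1]) <= recentBudget) {
    cut -= 1;
    recentTokens += approxTokens(view.turns[cut]);
  }

  const expired = view.turns.slice(0, cut);
  if (expired.length === 0) return view;

  // Incremental: only the newly expired turns are sent, with the previous summary.
  const summary = summarize({ PREVIOUS_SUMMARY: view.summary, CONVERSATION: expired.join("\n") });
  return { task: view.task, summary, turns: view.turns.slice(cut) };
}
```

The reactive path (reducing context and retrying after the provider rejects an over-long request) is not modeled here; the sketch only covers the proactive threshold and the incremental fold.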
@@ -0,0 +1,77 @@
+ ---
+ title: Fast Mode
+ description: Run supported Anthropic Opus models at higher output speed for latency-sensitive agents.
+ ---
+
+ # Fast Mode
+
+ Fast mode runs a supported Anthropic Opus model with a faster inference configuration - higher output tokens per second, same weights and behavior - at premium pricing. Enable it with the `speed` field in the [agent config](/docs/protocol/agent-config):
+
+ ```yaml
+ agent:
+   model: anthropic/claude-opus-4-8
+   speed: fast # fast | standard (default)
+ ```
+
+ | Mode | Behavior | When to use |
+ | ---------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------- |
+ | `standard` | Default speed and pricing. Used whenever `speed` is omitted. | Most agents. |
+ | `fast` | Higher output speed at a premium per-token rate. | Latency-sensitive, interactive agents where faster responses are worth the premium. |
+
+ Fast mode is orthogonal to thinking - it's a speed/price knob, not an intelligence one, and keeps full reasoning.
+
+ ## Supported models
+
+ Fast mode only applies to **Anthropic Opus 4.8, 4.7, and 4.6**. On any other model or provider it is a **no-op**: the request runs at standard speed and price, and never errors. This makes it safe to leave `speed: fast` set when using a dynamic model (resolved from input) that might turn out not to support it.
+
+ When you set `speed: fast` on a literal model that does not support it, the protocol validator surfaces a non-fatal warning in the dashboard.
+
+ ## Premium pricing
+
+ Fast mode applies a per-model multiplier over the model's standard rates, to both input and output across the full context window:
+
+ | Model | Fast-mode cost |
+ | -------------- | -------------- |
+ | Opus 4.8 | ~2x standard |
+ | Opus 4.7 / 4.6 | ~6x standard |
+
+ Prompt-caching costs continue to apply on top of the fast-mode base rates. Billing always reflects the speed a request **actually** ran at: a request that falls back to standard speed (see below) is billed at standard rates, so requesting fast never by itself triggers premium billing.
+
+ ## Rate limits and fallback
+
+ Fast mode has a dedicated rate limit, separate from standard Opus limits. When it is exhausted the agent degrades gracefully instead of failing: the request automatically retries at standard speed on the same model, then falls back to your configured [backup model](/docs/protocol/agent-config) if needed, before surfacing an error.
+
+ Falling back to standard speed is a prompt-cache miss, since fast and standard requests do not share cached prefixes. The fallback is recorded in the session trace, so it is clear when a request that asked for fast ran at standard (or on the backup model) and why.
+
+ ## Routing
+
+ A supported Opus model can be reached through more than one provider, and each route expresses fast mode differently. On direct Anthropic and the Vercel AI Gateway, the `speed` field handles the translation; on OpenRouter, you select a dedicated `-fast` model slug instead:
+
+ | Route | Example model | How fast mode is enabled |
+ | ----------------- | ------------------------------------------- | ----------------------------------------------------------------- |
+ | Direct Anthropic | `anthropic/claude-opus-4-8` | `speed: fast` |
+ | Vercel AI Gateway | `vercel/anthropic/claude-opus-4.7` | `speed: fast` |
+ | OpenRouter | `openrouter/anthropic/claude-opus-4.8-fast` | Select the dedicated `-fast` model slug (`speed` is ignored here) |
+
+ ## Passing speed as input
+
+ Like `thinking`, `speed` accepts a variable reference so consumers choose it per session:
+
+ ```yaml
+ input:
+   SPEED:
+     type: string
+     description: Inference speed (fast/standard)
+     optional: true
+
+ agent:
+   model: anthropic/claude-opus-4-8
+   speed: SPEED # Resolved from session input; unset -> standard
+   system: system
+ ```
+
+ An unset optional variable resolves to `standard`, so existing agents are never silently upgraded to premium pricing.
+
+ ## Scope
+
+ `speed` follows the same scoping as `thinking`: set it at agent scope (the main thread default) or per named thread in a `start-thread` block (see [Thread-Specific Config](/docs/protocol/agent-config)). Because worker agents configure everything through their thread, that is also how a worker enables fast mode. Thread settings take precedence over the agent default.
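The speed rules on this page compose in a fixed order: resolve the configured value, check whether the model supports fast mode, then fall back if the fast-mode rate limit is hit. A minimal sketch of that order, under assumptions: all function names are hypothetical, the model check matches only the slug shapes from the routing table, treating unrecognized values as `standard` is this sketch's choice, and the price multiplier is passed in rather than hard-coded because the documented figures are approximate.

```typescript
type Speed = "fast" | "standard";

// Resolves the `speed` field: a literal passes through; anything else is treated
// as an input variable name. Unset (or, in this sketch, unrecognized) -> standard.
function resolveSpeed(value: string | undefined, input: Record<string, string | undefined>): Speed {
  if (value === undefined) return "standard";
  if (value === "fast" || value === "standard") return value;
  return input[value] === "fast" ? "fast" : "standard";
}

// Assumption: recognizes only the direct Anthropic and Vercel AI Gateway slug
// shapes for Opus 4.8, 4.7, and 4.6. OpenRouter `-fast` slugs ignore `speed`.
const FAST_CAPABLE = /^(anthropic\/claude-opus-4-[678]|vercel\/anthropic\/claude-opus-4\.[678])$/;

// Fast mode is a no-op on unsupported models: the request runs at standard.
function effectiveSpeed(model: string, requested: Speed): Speed {
  return requested === "fast" && FAST_CAPABLE.test(model) ? "fast" : "standard";
}

interface Attempt {
  model: string;
  speed: Speed;
}

// Attempts in order if the fast-mode rate limit is exhausted: same model at
// standard speed, then the backup model (run at standard here, an assumption).
function attemptPlan(model: string, requested: Speed, backupModel?: string): Attempt[] {
  const plan: Attempt[] = [{ model, speed: effectiveSpeed(model, requested) }];
  if (plan[0].speed === "fast") plan.push({ model, speed: "standard" });
  if (backupModel !== undefined) plan.push({ model: backupModel, speed: "standard" });
  return plan;
}

// Billing follows the speed a request actually ran at, not the one requested.
function priceMultiplier(ran: Attempt, fastMultiplier: number): number {
  return ran.speed === "fast" ? fastMultiplier : 1;
}
```

For example, `attemptPlan("anthropic/claude-opus-4-8", resolveSpeed("SPEED", { SPEED: "fast" }), "openai/gpt-4o")` yields three attempts (Opus 4.8 at fast, Opus 4.8 at standard, then `openai/gpt-4o` at standard), and only the first would bill at the fast-mode rate.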