npm - @octavus/docs - Versions diffs - 4.1.0 → 5.0.0 - Mend

@octavus/docs 4.1.0 → 5.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/content/02-server-sdk/02-sessions.md +35 -0
package/content/03-client-sdk/02-messages.md +24 -1
package/content/04-protocol/05-skills.md +4 -2
package/content/04-protocol/07-agent-config.md +27 -23
package/content/04-protocol/09-skills-advanced.md +1 -1
package/content/04-protocol/11-workers.md +18 -16
package/content/04-protocol/14-context-management.md +68 -0
package/content/04-protocol/15-fast-mode.md +77 -0
package/dist/{chunk-V4C4VGHD.js → chunk-Z2OPVMHI.js} +59 -23
package/dist/chunk-Z2OPVMHI.js.map +1 -0
package/dist/content.js +1 -1
package/dist/docs.json +29 -11
package/dist/index.js +1 -1
package/dist/search-index.json +1 -1
package/dist/search.js +1 -1
package/dist/search.js.map +1 -1
package/dist/sections.json +29 -11
package/package.json +1 -1
package/dist/chunk-V4C4VGHD.js.map +0 -1

package/content/02-server-sdk/02-sessions.md CHANGED

@@ -194,6 +194,7 @@ interface TriggerRequest {
   triggerName: string;
   input?: Record<string, unknown>;
   rollbackAfterMessageId?: string | null; // For retry: truncate messages after this ID
+  sender?: UIMessageSender; // Author of this turn, for multi-user attribution
 }
 // Continue after client-side tool handling
@@ -223,6 +224,40 @@ export async function POST(request: Request) {
 }
 ```
+### Attributing Messages in Multi-User Chats
+When several people share one conversation, set `sender` on the trigger so each user message is attributed to its author. Set it **server-side from your authenticated user** - never trust a client-supplied identity:
+```typescript
+interface UIMessageSender {
+  id?: string;
+  name?: string;
+  image?: string; // Avatar URL
+}
+export async function POST(request: Request) {
+  const user = await authenticate(request); // your auth
+  const { sessionId, ...payload } = await request.json();
+  const session = client.agentSessions.attach(sessionId, {
+    tools: {
+      /* ... */
+    },
+  });
+  const events = session.execute(
+    {
+      ...payload,
+      sender: { id: user.id, name: user.name, image: user.avatarUrl },
+    },
+    { signal: request.signal },
+  );
+  return new Response(toSSEStream(events));
+}
+```
+The runtime stamps the sender onto the user message it creates, so it comes back on `UIMessage.sender` from `getMessages()` and survives restore. `sender` is turn metadata - it is never added to your protocol's trigger `input`, and agent-initiated turns (no `sender`) stay unattributed. For instant optimistic display in the browser, also pass it on the client `send()` (see [Client SDK Messages](/docs/client-sdk/messages)).
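The server-side rule above can be sketched as a small helper. This is a minimal illustration, not part of the Octavus SDK: `AuthenticatedUser` and `withAuthoritativeSender` are hypothetical names, and the only assumption is that the trigger payload arrives as a plain object parsed from the request body.

```typescript
// Shape as documented in the diff above.
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Hypothetical: whatever your own auth layer returns.
interface AuthenticatedUser {
  id: string;
  name: string;
  avatarUrl?: string;
}

// Hypothetical helper: stamp the sender from the authenticated user.
function withAuthoritativeSender(
  payload: Record<string, unknown>,
  user: AuthenticatedUser,
): Record<string, unknown> & { sender: UIMessageSender } {
  const sender: UIMessageSender = {
    id: user.id,
    name: user.name,
    image: user.avatarUrl,
  };
  // Spread the client payload first, then override `sender`,
  // so a client-supplied identity can never win.
  return { ...payload, sender };
}
```

The ordering inside the returned object is the whole point: because `sender` is written after the spread, a spoofed `sender` in the request body is silently replaced.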
 ### Stop Support
 Pass an abort signal to allow clients to stop generation:

package/content/03-client-sdk/02-messages.md CHANGED

@@ -16,6 +16,13 @@ interface UIMessage {
   parts: UIMessagePart[];
   status: 'streaming' | 'done';
   createdAt: Date;
+  sender?: UIMessageSender; // Author of a user message, in multi-user chats
+}
+interface UIMessageSender {
+  id?: string;
+  name?: string;
+  image?: string; // Avatar URL
 }
 ```
@@ -133,7 +140,7 @@ interface UIWorkerPart {
   parts: UIMessagePart[]; // Nested parts from the worker (excluding nested workers)
   output?: unknown;
   error?: string;
-  status: 'running' | 'done' | 'error';
+  status: 'running' | 'done' | 'error' | 'cancelled';
 }
 // Step boundary marker (structural, not rendered visually)
@@ -225,6 +232,22 @@ async function handleSend(text: string, files?: FileReference[]) {
 See [File Uploads](/docs/client-sdk/file-uploads) for complete upload flow.
+### Attributing the Sender (Multi-User Chats)
+In conversations shared by several people, pass `sender` so the optimistic bubble shows who sent the message immediately:
+```tsx
+await send(
+  'user-message',
+  { USER_MESSAGE: text },
+  {
+    userMessage: { content: text, sender: { id: user.id, name: user.name, image: user.avatarUrl } },
+  },
+);
+```
+This `sender` is for instant local display only. For attribution that persists and is visible to other participants, set the authoritative sender server-side on the trigger (see [Server SDK Sessions](/docs/server-sdk/sessions)). The persisted value comes back on `message.sender` from `getMessages()`, so render from `message.sender` and treat the value you passed to `send()` as the optimistic placeholder.
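One way to follow the "render from `message.sender`, treat the `send()` value as a placeholder" advice is sketched below. The helper names `displaySender` and `senderLabel` are hypothetical, `MessageWithSender` is a trimmed stand-in for `UIMessage`, and the `'Unknown sender'` fallback label is an assumption made for illustration.

```typescript
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Only the fields this sketch needs from UIMessage.
interface MessageWithSender {
  id: string;
  sender?: UIMessageSender;
}

// Prefer the persisted sender; fall back to the optimistic value
// that was passed to send() while the server copy is not there yet.
function displaySender(
  message: MessageWithSender,
  optimistic?: UIMessageSender,
): UIMessageSender | undefined {
  return message.sender ?? optimistic;
}

// Pick something printable for the message bubble.
function senderLabel(sender: UIMessageSender | undefined): string {
  return sender?.name ?? sender?.id ?? 'Unknown sender';
}
```

Once the persisted message arrives with its own `sender`, the optimistic value is ignored without any extra bookkeeping.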
 ## Rendering Messages
 ### Basic Rendering

package/content/04-protocol/05-skills.md CHANGED

@@ -126,7 +126,7 @@ Skills that have [secrets](#skill-secrets) configured run in **secure mode**, wh
 ## Device Execution
-By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer (VM or desktop) instead.
+By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer instead.
 ```yaml
 skills:
@@ -154,7 +154,7 @@ The generic workspace tools (`octavus_code_run`, `octavus_file_write`, `octavus_
 | Aspect              | Sandbox (default)                  | Device                                                 |
 | ------------------- | ---------------------------------- | ------------------------------------------------------ |
-| **Environment**     | Isolated sandbox                   | Agent's computer (VM or desktop)                       |
+| **Environment**     | Isolated sandbox                   | The agent's computer                                   |
 | **Available tools** | All 6 skill tools                  | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
 | **File access**     | Via `octavus_file_read/write`      | Via device filesystem MCP                              |
 | **Code execution**  | Via `octavus_code_run`             | Via device shell MCP                                   |
@@ -236,6 +236,7 @@ description: >
 version: 1.0.0
 license: MIT
 author: Octavus Team
+category: Productivity
 ---
 # QR Code Generator
@@ -273,6 +274,7 @@ Main script for generating QR codes...
 | `version`     | No       | Semantic version string                                |
 | `license`     | No       | License identifier                                     |
 | `author`      | No       | Skill author                                           |
+| `category`    | No       | Display category used to group and filter skills in the UI |
 | `secrets`     | No       | Array of secret declarations (enables secure mode)     |
 ## Best Practices

package/content/04-protocol/07-agent-config.md CHANGED

@@ -21,25 +21,28 @@ agent:
 ## Configuration Options
-| Field            | Required | Description                                                                              |
-| ---------------- | -------- | ---------------------------------------------------------------------------------------- |
-| `model`          | Yes      | Model identifier or variable reference                                                   |
-| `backupModel`    | No       | Backup model for automatic failover on provider errors                                   |
-| `system`         | Yes      | System prompt filename (without .md)                                                     |
-| `input`          | No       | Variables to pass to the system prompt                                                   |
-| `tools`          | No       | List of tools the LLM can call                                                           |
-| `mcpServers`     | No       | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers))           |
-| `skills`         | No       | List of Octavus skills the LLM can use                                                   |
-| `references`     | No       | List of references the LLM can fetch on demand                                           |
-| `sandboxTimeout` | No       | Skill sandbox timeout in ms (default: 5 min, max: 1 hour)                                |
-| `imageModel`     | No       | Image generation model (enables agentic image generation)                                |
-| `webSearch`      | No       | Enable built-in web search tool (provider-agnostic)                                      |
-| `agentic`        | No       | Allow multiple tool call cycles                                                          |
-| `maxSteps`       | No       | Maximum agentic steps (default: 10) - literal or variable reference                      |
-| `temperature`    | No       | Model temperature (0-2), `"off"`, or a variable reference                                |
-| `thinking`       | No       | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |
-| `cache`          | No       | Prompt caching mode: `auto` (default), `extended`, or `off`                              |
-| `anthropic`      | No       | Anthropic-specific options (tools, skills)                                               |
+| Field                 | Required | Description                                                                                                                    |
+| --------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| `model`               | Yes      | Model identifier or variable reference                                                                                         |
+| `backupModel`         | No       | Backup model for automatic failover on provider errors                                                                         |
+| `system`              | Yes      | System prompt filename (without .md)                                                                                           |
+| `input`               | No       | Variables to pass to the system prompt                                                                                         |
+| `tools`               | No       | List of tools the LLM can call                                                                                                 |
+| `mcpServers`          | No       | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers))                                                 |
+| `skills`              | No       | List of Octavus skills the LLM can use                                                                                         |
+| `references`          | No       | List of references the LLM can fetch on demand                                                                                 |
+| `sandboxTimeout`      | No       | Skill sandbox timeout in ms (default: 5 min, max: 1 hour)                                                                      |
+| `imageModel`          | No       | Image generation model (enables agentic image generation)                                                                      |
+| `webSearch`           | No       | Enable built-in web search tool (provider-agnostic)                                                                            |
+| `agentic`             | No       | Allow multiple tool call cycles                                                                                                |
+| `maxSteps`            | No       | Maximum agentic steps (default: 10) - literal or variable reference                                                            |
+| `temperature`         | No       | Model temperature (0-2), `"off"`, or a variable reference                                                                      |
+| `thinking`            | No       | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference                                       |
+| `speed`               | No       | Inference speed for supported Opus models: `fast`/`standard` (see [Fast Mode](/docs/protocol/fast-mode))                       |
+| `cache`               | No       | Prompt caching mode: `auto` (default), `extended`, or `off`                                                                    |
+| `maxToolOutputTokens` | No       | Cap a single tool result at this many tokens in the model view (head+tail preview + note). Omit to leave tool output unbounded |
+| `contextManagement`   | No       | Automatic context-window compaction (see [Context Management](/docs/protocol/context-management))                              |
+| `anthropic`           | No       | Anthropic-specific options (tools, skills)                                                                                     |
 ## Models
@@ -50,7 +53,7 @@ Specify models in `provider/model-id` format. Any model supported by the provide
 | Provider  | Format                 | Examples                                                                                           |
 | --------- | ---------------------- | -------------------------------------------------------------------------------------------------- |
 | Anthropic | `anthropic/{model-id}` | `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-sonnet-4-5`, `claude-haiku-4-5` |
-| Google    | `google/{model-id}`    | `gemini-3-pro-preview`, `gemini-3-flash-preview`, `gemini-2.5-flash`                               |
+| Google    | `google/{model-id}`    | `gemini-3.5-flash`, `gemini-3-flash-preview`, `gemini-2.5-flash`                                   |
 | OpenAI    | `openai/{model-id}`    | `gpt-5`, `gpt-4o`, `o4-mini`, `o3`, `o3-mini`, `o1`                                                |
 ### Examples
@@ -456,7 +459,7 @@ agent:
 ## Dynamic Configuration
-Like `model`, the `temperature`, `thinking`, and `maxSteps` fields can also reference an input variable. Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:
+Like `model`, the `temperature`, `thinking`, `speed`, and `maxSteps` fields can also reference an input variable. Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:
 ```yaml
 input:
@@ -548,9 +551,10 @@ handlers:
     Start summary thread:
       block: start-thread
       thread: summary
-      model: anthropic/claude-sonnet-4-5 # Different model
+      model: anthropic/claude-opus-4-8 # Different model
       backupModel: openai/gpt-4o # Failover model
       thinking: low # Different thinking
+      speed: fast # Fast mode for this thread (supported Opus models only)
       cache: off # Different cache mode (does not inherit from agent)
       maxSteps: 1 # Limit tool calls
       system: escalation-summary # Different prompt
@@ -562,7 +566,7 @@ handlers:
       todoList: true # Thread-specific task list
 ```
-Each thread can have its own model, backup model, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol's `skills:` section. References must exist in the agent's `references/` directory. Workers use this same pattern since they don't have a global `agent:` section.
+Each thread can have its own model, backup model, thinking level, speed, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol's `skills:` section. References must exist in the agent's `references/` directory. Workers use this same pattern since they don't have a global `agent:` section - which is how a worker enables fast mode.
 ## Full Example

package/content/04-protocol/09-skills-advanced.md CHANGED

@@ -333,7 +333,7 @@ When a skill declares secrets and an organization configures them, the skill run
 | Aspect              | Standard Skills          | Secure Skills                                       | Device Skills                                          |
 | ------------------- | ------------------------ | --------------------------------------------------- | ------------------------------------------------------ |
-| **Environment**     | Shared sandbox           | Isolated sandbox (one per skill)                    | Agent's computer (VM or desktop)                       |
+| **Environment**     | Shared sandbox           | Isolated sandbox (one per skill)                    | The agent's computer                                   |
 | **Available tools** | All 6 skill tools        | `skill_read`, `skill_list`, `skill_run` only        | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
 | **Script input**    | CLI arguments via `args` | JSON via stdin (use `input` parameter)              | CLI arguments via `args`                               |
 | **Secrets**         | No secrets               | Secrets as env vars                                 | No secrets                                             |

package/content/04-protocol/11-workers.md CHANGED

@@ -219,21 +219,22 @@ steps:
 All LLM configuration goes here:
-| Field         | Description                                                                            |
-| ------------- | -------------------------------------------------------------------------------------- |
-| `thread`      | Thread name (defaults to block name)                                                   |
-| `model`       | LLM model to use                                                                       |
-| `system`      | System prompt filename (required)                                                      |
-| `input`       | Variables for system prompt                                                            |
-| `tools`       | Tools available in this thread                                                         |
-| `skills`      | Octavus skills available in this thread                                                |
-| `mcpServers`  | MCP servers available in this thread                                                   |
-| `imageModel`  | Image generation model                                                                 |
-| `webSearch`   | Enable built-in web search tool                                                        |
-| `thinking`    | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |
-| `cache`       | Prompt caching mode: `auto` (default), `extended`, or `off`                            |
-| `temperature` | Model temperature (0-2), `"off"`, or variable reference                                |
-| `maxSteps`    | Maximum tool call cycles (enables agentic if > 1), or variable reference               |
+| Field                 | Description                                                                                                                             |
+| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
+| `thread`              | Thread name (defaults to block name)                                                                                                    |
+| `model`               | LLM model to use                                                                                                                        |
+| `system`              | System prompt filename (required)                                                                                                       |
+| `input`               | Variables for system prompt                                                                                                             |
+| `tools`               | Tools available in this thread                                                                                                          |
+| `skills`              | Octavus skills available in this thread                                                                                                 |
+| `mcpServers`          | MCP servers available in this thread                                                                                                    |
+| `imageModel`          | Image generation model                                                                                                                  |
+| `webSearch`           | Enable built-in web search tool                                                                                                         |
+| `thinking`            | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference                                                  |
+| `cache`               | Prompt caching mode: `auto` (default), `extended`, or `off`                                                                             |
+| `temperature`         | Model temperature (0-2), `"off"`, or variable reference                                                                                 |
+| `maxSteps`            | Maximum tool call cycles (enables agentic if > 1), or variable reference                                                                |
+| `maxToolOutputTokens` | Cap a single tool result at this many tokens in the thread's model view (head+tail preview + note). Omit to leave tool output unbounded |
 ## Simple Example
@@ -468,10 +469,11 @@ All standard events (text-delta, tool calls, etc.) are also emitted.
 ## Calling Workers from Interactive Agents
-Interactive agents can call workers in two ways:
+Interactive agents can call workers in three ways:
 1. **Deterministically** - Using the `run-worker` block
 2. **Agentically** - LLM calls worker as a tool
+3. **Automatically** - Octavus, not the model, invokes the worker as part of a built-in capability. Context management's `summarizerWorker` (see [Context Management](/docs/protocol/context-management)) works this way: declare it in `workers:` but leave it out of `agent.workers` so the model never sees it as a tool.
 ### Worker Declaration

package/content/04-protocol/14-context-management.md ADDED

@@ -0,0 +1,68 @@
+---
+title: Context Management
+description: Automatic context-window compaction so long sessions keep running past the model's limit.
+---
+# Context Management
+Long-running sessions accumulate history - messages, tool results, screenshots, file reads. Once that history approaches the model's context window, the provider rejects the request and, without intervention, the session fails. Two [agent config](/docs/protocol/agent-config) knobs make the agent robust to this: `maxToolOutputTokens` caps how much any single tool result puts into context, and `contextManagement` automatically compacts older history as it fills up. Together they keep a long task, a long conversation, or one oversized tool output from ending the session.
+Compaction and bounding transform only what the **model sees** on each request. The stored conversation is never changed - the complete history is always preserved.
+## Configuration
+```yaml
+workers:
+  context-summarizer: # the worker that produces the running summary
+    description: Summarizes earlier conversation to free up context
+    display: description
+agent:
+  model: anthropic/claude-sonnet-4-5
+  system: system
+  maxToolOutputTokens: 300000 # safety cap on a single tool result (no default)
+  # context-summarizer is intentionally NOT listed in agent.workers,
+  # so the model never sees it as a callable tool.
+  contextManagement:
+    summarizerWorker: context-summarizer
+    thresholdPercent: 0.8 # proactive trigger (no default; omit = reactive only)
+    recentPercent: 0.3 # recent window kept verbatim (no default; omit = no summarization)
+```
+`maxToolOutputTokens` is a top-level `agent` field (a sibling of `model` and `system`), because bounding a single tool result is independent of history compaction. Workers set the same cap per thread on their [`start-thread`](/docs/protocol/workers) block. `contextManagement` groups the compaction knobs:
+| Field              | Required | Description                                                                                                          |
+| ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------- |
+| `summarizerWorker` | No       | Slug of a worker (declared in `workers:`) that produces the running summary. Enables summarization-based compaction. |
+| `thresholdPercent` | No       | Fraction of the model's context window at which compaction starts. No default; omit to disable proactive compaction. |
+| `recentPercent`    | No       | Fraction of the context window kept verbatim as the recent window. No default; omit to disable summarization.        |
+| `recentWindow`     | No       | Deprecated and ignored. Superseded by `recentPercent` (a context-window fraction).                                   |
+## How it works
+- When `maxToolOutputTokens` is set, every tool result is **bounded** before it enters the model's view: anything over the budget is replaced with a head-and-tail preview plus a note saying how much was omitted and how to fetch the rest. The full result is still preserved in the stored conversation, so nothing is lost - the model just sees a bounded copy and can narrow, page, or search for more.
+- When `thresholdPercent` is set and the prompt crosses that fraction of the context window, the oldest turns are folded into a **running summary** while the original task and the most-recent turns (`recentPercent` of the context window, a token budget) are kept verbatim - so the agent keeps the goal and full fidelity on what it is doing now. Both are opt-in with no default: omit them and the agent does no proactive compaction, relying on the automatic recovery below.
+- Compaction is **incremental**: each cycle only summarizes the newly-expired turns and folds them into the existing summary, so cost stays bounded no matter how long the session runs.
+- If the model rejects a request for being too long anyway, the agent recovers automatically (it reduces context and retries) rather than failing the session.
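The head-and-tail bounding described in the first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the runtime's implementation: the 4-characters-per-token estimate, the even head/tail split, the wording of the note, and the name `boundToolOutput` are all hypothetical.

```typescript
// Assumption: a crude 4-characters-per-token estimate.
// The tokenizer the runtime actually uses is not described in the docs.
const CHARS_PER_TOKEN = 4;

interface BoundedOutput {
  text: string; // what the model would see
  bounded: boolean; // true when the result was cut down
  omittedChars: number; // how much was left out of the model view
}

function boundToolOutput(full: string, maxToolOutputTokens?: number): BoundedOutput {
  // No cap configured: nothing is ever truncated.
  if (maxToolOutputTokens === undefined) {
    return { text: full, bounded: false, omittedChars: 0 };
  }
  const budgetChars = maxToolOutputTokens * CHARS_PER_TOKEN;
  if (full.length <= budgetChars) {
    return { text: full, bounded: false, omittedChars: 0 };
  }
  // Keep the start and the end, drop the middle.
  const half = Math.floor(budgetChars / 2);
  const head = full.slice(0, half);
  const tail = full.slice(full.length - half);
  const omittedChars = full.length - head.length - tail.length;
  const note = `\n[... ${omittedChars} characters omitted; narrow, page, or search to see more ...]\n`;
  return { text: head + note + tail, bounded: true, omittedChars };
}
```

In this sketch the note is added on top of the budget, so the bounded text is slightly longer than the cap. The stored conversation would keep `full` unchanged; only the returned `text` stands in for the model view.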
+## Bounded tool output
+Some tool calls return very large output - a big file read, a full-page extract, a large MCP or skill result. Left unbounded, one such call can blow past the context window in a single step. Set `maxToolOutputTokens` on the agent (or, for a worker, on its `start-thread` block) to cap how much of any single result reaches the model, while the full result stays in the stored conversation and the trace.
+There is no default: bounding only happens when you set `maxToolOutputTokens`, so the runtime never silently truncates output you did not ask it to. When a result is truncated, the model is always told what was omitted and how to retrieve it, so it can decide to narrow the request, paginate, or read a specific range.
+Bounding is never hidden: each time a tool result first crosses the budget, a `tool-output-bounded` entry is recorded in the session's execution logs with the tool name, the original size, and the cap. The full, untruncated result stays in the corresponding `tool-result` entry, so you can always see both what the model saw and the complete output.
+## The summarizer worker
+`summarizerWorker` points at a worker you define and ship like any other (see [Workers](/docs/protocol/workers)). It takes two inputs - `PREVIOUS_SUMMARY` (the running summary so far) and `CONVERSATION` (the older turns to fold in) - and returns the updated summary.
+Summarization is gated on its sizing knobs: a worker only runs if you also set `recentPercent` (the recent window it folds around), and it only runs **proactively** if you also set `thresholdPercent`. Set a worker without `recentPercent` and it never runs - validation warns you about this.
+Declare it in the top-level `workers:` section so it can be resolved, but keep it **out** of `agent.workers`: that list is what the model can call as a tool, and the summarizer is invoked automatically, never chosen by the model.
+Without a `summarizerWorker`, the agent still recovers from a context overflow by reducing older tool results, but it won't produce a summary of earlier turns.
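The gating rules in this section can be summarized as a small decision function. This is one reading of the text above, not the validator's actual logic: the mode names and `summarizationMode` are hypothetical, and the case of `thresholdPercent` without a worker is treated here as "no summary" by assumption.

```typescript
// Field names as documented in the configuration table above.
interface ContextManagementConfig {
  summarizerWorker?: string;
  thresholdPercent?: number;
  recentPercent?: number;
}

// Hypothetical labels for the three outcomes the docs describe.
type SummarizationMode =
  | 'proactive-summary' // worker runs when the threshold is crossed
  | 'reactive-summary' // worker runs only while recovering from an overflow
  | 'no-summary'; // overflow recovery reduces older tool results only

function summarizationMode(config: ContextManagementConfig): SummarizationMode {
  // A worker without `recentPercent` never runs.
  const canSummarize =
    config.summarizerWorker !== undefined && config.recentPercent !== undefined;
  if (!canSummarize) return 'no-summary';
  // `thresholdPercent` is what makes compaction proactive.
  return config.thresholdPercent !== undefined ? 'proactive-summary' : 'reactive-summary';
}
```

For the YAML example earlier in this file (worker set, `thresholdPercent: 0.8`, `recentPercent: 0.3`), the function returns `'proactive-summary'`.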
+## What users see
+Because the summarizer is a worker, it surfaces like any other worker, following its `display` mode (a subtle `description` indicator by default). Compaction is otherwise seamless - the conversation reads as one continuous thread and the complete history is preserved.

package/content/04-protocol/15-fast-mode.md ADDED

@@ -0,0 +1,77 @@
+---
+title: Fast Mode
+description: Run supported Anthropic Opus models at higher output speed for latency-sensitive agents.
+---
+# Fast Mode
+Fast mode runs a supported Anthropic Opus model with a faster inference configuration - higher output tokens per second, same weights and behavior - at premium pricing. Enable it with the `speed` field in the [agent config](/docs/protocol/agent-config):
+```yaml
+agent:
+  model: anthropic/claude-opus-4-8
+  speed: fast # fast | standard (default)
+```
+| Mode       | Behavior                                                     | When to use                                                                         |
+| ---------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------- |
+| `standard` | Default speed and pricing. Used whenever `speed` is omitted. | Most agents.                                                                        |
+| `fast`     | Higher output speed at a premium per-token rate.             | Latency-sensitive, interactive agents where faster responses are worth the premium. |
+Fast mode is orthogonal to thinking - it's a speed/price knob, not an intelligence one, and keeps full reasoning.
+## Supported models
+Fast mode only applies to **Anthropic Opus 4.8, 4.7, and 4.6**. On any other model or provider it is a **no-op**: the request runs at standard speed and price, and never errors. This makes it safe to leave `speed: fast` set when using a dynamic model (resolved from input) that might turn out not to support it.
+When you set `speed: fast` on a literal model that does not support it, the protocol validator surfaces a non-fatal warning in the dashboard.
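The no-op behavior can be pictured as a lookup that downgrades `fast` to `standard` for anything outside the supported set. A minimal sketch, assuming direct Anthropic ids in the `anthropic/{model-id}` format; gateway routes use different slugs and are not covered, and `effectiveSpeed` is a hypothetical name.

```typescript
type Speed = 'fast' | 'standard';

// Assumption: direct Anthropic ids only, in the format used by the docs.
const FAST_MODE_MODELS: ReadonlySet<string> = new Set([
  'anthropic/claude-opus-4-8',
  'anthropic/claude-opus-4-7',
  'anthropic/claude-opus-4-6',
]);

// Never throws: an unsupported model simply runs at standard speed.
function effectiveSpeed(model: string, requested?: Speed): Speed {
  if (requested === 'fast' && FAST_MODE_MODELS.has(model)) {
    return 'fast';
  }
  return 'standard';
}
```

Because the function has no error path, a dynamic model resolved from input can be passed straight through with `speed: fast` left on.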
+## Premium pricing
+Fast mode applies a per-model multiplier over the model's standard rates, to both input and output across the full context window:
+| Model          | Fast-mode cost |
+| -------------- | -------------- |
+| Opus 4.8       | ~2x standard   |
+| Opus 4.7 / 4.6 | ~6x standard   |
+Prompt-caching costs continue to apply on top of the fast-mode base rates. Billing always reflects the speed a request **actually** ran at: a request that falls back to standard speed (see below) is billed at standard rates, so requesting fast never by itself triggers premium billing.
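As a rough illustration of "billing reflects the speed a request actually ran at", the sketch below maps a model and its actual speed to an approximate multiplier. The figures come from the table above and are approximate (the docs give "~2x" and "~6x"); the function name and the bare model-id keys are assumptions, and prompt-caching costs are ignored.

```typescript
type Speed = 'fast' | 'standard';

// Approximate multipliers from the pricing table; not exact rates.
const APPROX_FAST_MULTIPLIER: Record<string, number> = {
  'claude-opus-4-8': 2,
  'claude-opus-4-7': 6,
  'claude-opus-4-6': 6,
};

// `actualSpeed` is the speed the request ran at, not the one requested.
function approxCostMultiplier(modelId: string, actualSpeed: Speed): number {
  if (actualSpeed !== 'fast') return 1;
  return APPROX_FAST_MULTIPLIER[modelId] ?? 1;
}
```

A request that asked for fast but fell back to standard would be passed in with `actualSpeed: 'standard'` and so gets a multiplier of 1.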
+## Rate limits and fallback
+Fast mode has a dedicated rate limit, separate from standard Opus limits. When it is exhausted the agent degrades gracefully instead of failing: the request automatically retries at standard speed on the same model, then falls back to your configured [backup model](/docs/protocol/agent-config) if needed, before surfacing an error.
+Falling back to standard speed is a prompt-cache miss, since fast and standard requests do not share cached prefixes. The fallback is recorded in the session trace, so it is clear when a request that asked for fast ran at standard (or on the backup model) and why.
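The degradation order described above can be written down as an ordered list of attempts. This is only a sketch of the sequence: `Attempt` and `fallbackPlan` are hypothetical names, and running the backup model at standard speed is an assumption, since the docs do not say which speed applies there.

```typescript
type Speed = 'fast' | 'standard';

interface Attempt {
  model: string;
  speed: Speed;
}

// Ordered attempts for a request that asked for fast mode.
function fallbackPlan(model: string, backupModel?: string): Attempt[] {
  const plan: Attempt[] = [
    { model, speed: 'fast' }, // first try: what was requested
    { model, speed: 'standard' }, // fast-mode rate limit exhausted: same model, standard speed
  ];
  if (backupModel !== undefined) {
    // Assumption: the backup model is tried at standard speed.
    plan.push({ model: backupModel, speed: 'standard' });
  }
  return plan; // an error surfaces only after every attempt has failed
}
```

Each step after the first starts from a cold prompt cache, since fast and standard requests do not share cached prefixes.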
+## Routing
+A supported Opus model can be reached through more than one provider, and fast mode is expressed differently on each - the `speed` field handles the translation:
+| Route             | Example model                               | How fast mode is enabled                                          |
+| ----------------- | ------------------------------------------- | ----------------------------------------------------------------- |
+| Direct Anthropic  | `anthropic/claude-opus-4-8`                 | `speed: fast`                                                     |
+| Vercel AI Gateway | `vercel/anthropic/claude-opus-4.7`          | `speed: fast`                                                     |
+| OpenRouter        | `openrouter/anthropic/claude-opus-4.8-fast` | Select the dedicated `-fast` model slug (`speed` is ignored here) |
+## Passing speed as input
+Like `thinking`, `speed` accepts a variable reference so consumers choose it per session:
+```yaml
+input:
+  SPEED:
+    type: string
+    description: Inference speed (fast/standard)
+    optional: true
+agent:
+  model: anthropic/claude-opus-4-8
+  speed: SPEED # Resolved from session input; unset -> standard
+  system: system
+```
+An unset optional variable resolves to `standard`, so existing agents are never silently upgraded to premium pricing.
+## Scope
+`speed` follows the same scoping as `thinking`: set it at agent scope (the main thread default) or per named thread in a `start-thread` block (see [Thread-Specific Config](/docs/protocol/agent-config)). Because worker agents configure everything through their thread, that is also how a worker enables fast mode. Thread settings take precedence over the agent default.
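Putting the scoping and variable rules together, speed resolution might look like the sketch below. It is an interpretation, not the runtime's code: treating any value other than `fast` or `standard` as a variable name, mapping unrecognized input values to `standard`, and the function names are all assumptions.

```typescript
type Speed = 'fast' | 'standard';

// Resolve one `speed` field: a literal, or a variable looked up in session input.
function resolveSpeedValue(
  value: string | undefined,
  input: Record<string, unknown>,
): Speed | undefined {
  if (value === undefined) return undefined; // field omitted at this scope
  if (value === 'fast' || value === 'standard') return value; // literal
  // Otherwise treat it as a variable reference such as SPEED.
  // An unset (or unrecognized) variable resolves to standard.
  if (input[value] === 'fast') return 'fast';
  return 'standard';
}

// Thread settings take precedence over the agent default.
function resolveSpeed(
  agentSpeed: string | undefined,
  threadSpeed: string | undefined,
  input: Record<string, unknown>,
): Speed {
  return (
    resolveSpeedValue(threadSpeed, input) ??
    resolveSpeedValue(agentSpeed, input) ??
    'standard'
  );
}
```

With `speed: SPEED` on the agent and no `SPEED` in the session input, the result is `standard`, which matches the guarantee that existing agents are never silently moved to premium pricing.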