@octavus/docs 4.1.0 → 5.0.0
- package/content/02-server-sdk/02-sessions.md +35 -0
- package/content/03-client-sdk/02-messages.md +24 -1
- package/content/04-protocol/05-skills.md +4 -2
- package/content/04-protocol/07-agent-config.md +27 -23
- package/content/04-protocol/09-skills-advanced.md +1 -1
- package/content/04-protocol/11-workers.md +18 -16
- package/content/04-protocol/14-context-management.md +68 -0
- package/content/04-protocol/15-fast-mode.md +77 -0
- package/dist/{chunk-V4C4VGHD.js → chunk-Z2OPVMHI.js} +59 -23
- package/dist/chunk-Z2OPVMHI.js.map +1 -0
- package/dist/content.js +1 -1
- package/dist/docs.json +29 -11
- package/dist/index.js +1 -1
- package/dist/search-index.json +1 -1
- package/dist/search.js +1 -1
- package/dist/search.js.map +1 -1
- package/dist/sections.json +29 -11
- package/package.json +1 -1
- package/dist/chunk-V4C4VGHD.js.map +0 -1
|
@@ -194,6 +194,7 @@ interface TriggerRequest {
|
|
|
194
194
|
triggerName: string;
|
|
195
195
|
input?: Record<string, unknown>;
|
|
196
196
|
rollbackAfterMessageId?: string | null; // For retry: truncate messages after this ID
|
|
197
|
+
sender?: UIMessageSender; // Author of this turn, for multi-user attribution
|
|
197
198
|
}
|
|
198
199
|
|
|
199
200
|
// Continue after client-side tool handling
|
|
@@ -223,6 +224,40 @@ export async function POST(request: Request) {
|
|
|
223
224
|
}
|
|
224
225
|
```
|
|
225
226
|
|
|
227
|
+
### Attributing Messages in Multi-User Chats
|
|
228
|
+
|
|
229
|
+
When several people share one conversation, set `sender` on the trigger so each user message is attributed to its author. Set it **server-side from your authenticated user** - never trust a client-supplied identity:
|
|
230
|
+
|
|
231
|
+
```typescript
|
|
232
|
+
interface UIMessageSender {
|
|
233
|
+
id?: string;
|
|
234
|
+
name?: string;
|
|
235
|
+
image?: string; // Avatar URL
|
|
236
|
+
}
|
|
237
|
+
|
|
238
|
+
export async function POST(request: Request) {
|
|
239
|
+
const user = await authenticate(request); // your auth
|
|
240
|
+
const { sessionId, ...payload } = await request.json();
|
|
241
|
+
|
|
242
|
+
const session = client.agentSessions.attach(sessionId, {
|
|
243
|
+
tools: {
|
|
244
|
+
/* ... */
|
|
245
|
+
},
|
|
246
|
+
});
|
|
247
|
+
const events = session.execute(
|
|
248
|
+
{
|
|
249
|
+
...payload,
|
|
250
|
+
sender: { id: user.id, name: user.name, image: user.avatarUrl },
|
|
251
|
+
},
|
|
252
|
+
{ signal: request.signal },
|
|
253
|
+
);
|
|
254
|
+
|
|
255
|
+
return new Response(toSSEStream(events));
|
|
256
|
+
}
|
|
257
|
+
```
|
|
258
|
+
|
|
259
|
+
The runtime stamps the sender onto the user message it creates, so it comes back on `UIMessage.sender` from `getMessages()` and survives restore. `sender` is turn metadata - it is never added to your protocol's trigger `input`, and agent-initiated turns (no `sender`) stay unattributed. For instant optimistic display in the browser, also pass it on the client `send()` (see [Client SDK Messages](/docs/client-sdk/messages)).
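The "never trust a client-supplied identity" rule above can be sketched as a small helper that overwrites whatever `sender` arrived in the request body. This is a minimal sketch, not part of the SDK: `withAuthoritativeSender` and `AuthenticatedUser` are hypothetical names, and the user record's shape is an assumption about your own auth layer.

```typescript
// Mirrors the UIMessageSender shape documented above.
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Hypothetical shape of your own authenticated user record (an assumption).
interface AuthenticatedUser {
  id: string;
  name: string;
  avatarUrl?: string;
}

// Hypothetical helper: returns the trigger payload with the server-side
// identity stamped on it, discarding any `sender` the client supplied.
function withAuthoritativeSender(
  payload: Record<string, unknown>,
  user: AuthenticatedUser,
): Record<string, unknown> & { sender: UIMessageSender } {
  const sender: UIMessageSender = { id: user.id, name: user.name };
  if (user.avatarUrl !== undefined) {
    sender.image = user.avatarUrl;
  }
  // Spread order matters: `sender` is written last, so it replaces a spoofed value.
  return { ...payload, sender };
}
```

The result could be passed as the first argument to `session.execute(...)` in place of the inline object literal shown in the handler above.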
|
|
260
|
+
|
|
226
261
|
### Stop Support
|
|
227
262
|
|
|
228
263
|
Pass an abort signal to allow clients to stop generation:
|
|
@@ -16,6 +16,13 @@ interface UIMessage {
|
|
|
16
16
|
parts: UIMessagePart[];
|
|
17
17
|
status: 'streaming' | 'done';
|
|
18
18
|
createdAt: Date;
|
|
19
|
+
sender?: UIMessageSender; // Author of a user message, in multi-user chats
|
|
20
|
+
}
|
|
21
|
+
|
|
22
|
+
interface UIMessageSender {
|
|
23
|
+
id?: string;
|
|
24
|
+
name?: string;
|
|
25
|
+
image?: string; // Avatar URL
|
|
19
26
|
}
|
|
20
27
|
```
|
|
21
28
|
|
|
@@ -133,7 +140,7 @@ interface UIWorkerPart {
|
|
|
133
140
|
parts: UIMessagePart[]; // Nested parts from the worker (excluding nested workers)
|
|
134
141
|
output?: unknown;
|
|
135
142
|
error?: string;
|
|
136
|
-
status: 'running' | 'done' | 'error';
|
|
143
|
+
status: 'running' | 'done' | 'error' | 'cancelled';
|
|
137
144
|
}
|
|
138
145
|
|
|
139
146
|
// Step boundary marker (structural, not rendered visually)
|
|
@@ -225,6 +232,22 @@ async function handleSend(text: string, files?: FileReference[]) {
|
|
|
225
232
|
|
|
226
233
|
See [File Uploads](/docs/client-sdk/file-uploads) for complete upload flow.
|
|
227
234
|
|
|
235
|
+
### Attributing the Sender (Multi-User Chats)
|
|
236
|
+
|
|
237
|
+
In conversations shared by several people, pass `sender` so the optimistic bubble shows who sent the message immediately:
|
|
238
|
+
|
|
239
|
+
```tsx
|
|
240
|
+
await send(
|
|
241
|
+
'user-message',
|
|
242
|
+
{ USER_MESSAGE: text },
|
|
243
|
+
{
|
|
244
|
+
userMessage: { content: text, sender: { id: user.id, name: user.name, image: user.avatarUrl } },
|
|
245
|
+
},
|
|
246
|
+
);
|
|
247
|
+
```
|
|
248
|
+
|
|
249
|
+
This `sender` is for instant local display only. For attribution that persists and is visible to other participants, set the authoritative sender server-side on the trigger (see [Server SDK Sessions](/docs/server-sdk/sessions)). The persisted value comes back on `message.sender` from `getMessages()`, so render from `message.sender` and treat the value you passed to `send()` as the optimistic placeholder.
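One way to follow the "render from `message.sender`" advice is a small selector that prefers the persisted value and falls back to the optimistic one. This is an illustrative sketch only: `displaySender` and `MessageWithSender` are hypothetical names, not SDK exports.

```typescript
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Minimal stand-in for the part of UIMessage this helper needs.
interface MessageWithSender {
  sender?: UIMessageSender;
}

// Hypothetical helper: the persisted sender wins; the value passed to
// `send()` is used only while no persisted sender is available.
function displaySender(
  message: MessageWithSender,
  optimistic?: UIMessageSender,
): UIMessageSender | undefined {
  return message.sender !== undefined ? message.sender : optimistic;
}
```

Agent-initiated turns carry no sender, so the helper returns `undefined` for them unless an optimistic value is supplied.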
|
|
250
|
+
|
|
228
251
|
## Rendering Messages
|
|
229
252
|
|
|
230
253
|
### Basic Rendering
|
|
@@ -126,7 +126,7 @@ Skills that have [secrets](#skill-secrets) configured run in **secure mode**, wh
|
|
|
126
126
|
|
|
127
127
|
## Device Execution
|
|
128
128
|
|
|
129
|
-
By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer
|
|
129
|
+
By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer instead.
|
|
130
130
|
|
|
131
131
|
```yaml
|
|
132
132
|
skills:
|
|
@@ -154,7 +154,7 @@ The generic workspace tools (`octavus_code_run`, `octavus_file_write`, `octavus_
|
|
|
154
154
|
|
|
155
155
|
| Aspect | Sandbox (default) | Device |
|
|
156
156
|
| ------------------- | ---------------------------------- | ------------------------------------------------------ |
|
|
157
|
-
| **Environment** | Isolated sandbox |
|
|
157
|
+
| **Environment** | Isolated sandbox | The agent's computer |
|
|
158
158
|
| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
|
|
159
159
|
| **File access** | Via `octavus_file_read/write` | Via device filesystem MCP |
|
|
160
160
|
| **Code execution** | Via `octavus_code_run` | Via device shell MCP |
|
|
@@ -236,6 +236,7 @@ description: >
|
|
|
236
236
|
version: 1.0.0
|
|
237
237
|
license: MIT
|
|
238
238
|
author: Octavus Team
|
|
239
|
+
category: Productivity
|
|
239
240
|
---
|
|
240
241
|
|
|
241
242
|
# QR Code Generator
|
|
@@ -273,6 +274,7 @@ Main script for generating QR codes...
|
|
|
273
274
|
| `version` | No | Semantic version string |
|
|
274
275
|
| `license` | No | License identifier |
|
|
275
276
|
| `author` | No | Skill author |
|
|
277
|
+
| `category` | No | Display category used to group and filter skills in the UI |
|
|
276
278
|
| `secrets` | No | Array of secret declarations (enables secure mode) |
|
|
277
279
|
|
|
278
280
|
## Best Practices
|
|
@@ -21,25 +21,28 @@ agent:
|
|
|
21
21
|
|
|
22
22
|
## Configuration Options
|
|
23
23
|
|
|
24
|
-
| Field
|
|
25
|
-
|
|
|
26
|
-
| `model`
|
|
27
|
-
| `backupModel`
|
|
28
|
-
| `system`
|
|
29
|
-
| `input`
|
|
30
|
-
| `tools`
|
|
31
|
-
| `mcpServers`
|
|
32
|
-
| `skills`
|
|
33
|
-
| `references`
|
|
34
|
-
| `sandboxTimeout`
|
|
35
|
-
| `imageModel`
|
|
36
|
-
| `webSearch`
|
|
37
|
-
| `agentic`
|
|
38
|
-
| `maxSteps`
|
|
39
|
-
| `temperature`
|
|
40
|
-
| `thinking`
|
|
41
|
-
| `
|
|
42
|
-
| `
|
|
24
|
+
| Field | Required | Description |
|
|
25
|
+
| --------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------ |
|
|
26
|
+
| `model` | Yes | Model identifier or variable reference |
|
|
27
|
+
| `backupModel` | No | Backup model for automatic failover on provider errors |
|
|
28
|
+
| `system` | Yes | System prompt filename (without .md) |
|
|
29
|
+
| `input` | No | Variables to pass to the system prompt |
|
|
30
|
+
| `tools` | No | List of tools the LLM can call |
|
|
31
|
+
| `mcpServers` | No | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers)) |
|
|
32
|
+
| `skills` | No | List of Octavus skills the LLM can use |
|
|
33
|
+
| `references` | No | List of references the LLM can fetch on demand |
|
|
34
|
+
| `sandboxTimeout` | No | Skill sandbox timeout in ms (default: 5 min, max: 1 hour) |
|
|
35
|
+
| `imageModel` | No | Image generation model (enables agentic image generation) |
|
|
36
|
+
| `webSearch` | No | Enable built-in web search tool (provider-agnostic) |
|
|
37
|
+
| `agentic` | No | Allow multiple tool call cycles |
|
|
38
|
+
| `maxSteps` | No | Maximum agentic steps (default: 10) - literal or variable reference |
|
|
39
|
+
| `temperature` | No | Model temperature (0-2), `"off"`, or a variable reference |
|
|
40
|
+
| `thinking` | No | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |
|
|
41
|
+
| `speed` | No | Inference speed for supported Opus models: `fast`/`standard` (see [Fast Mode](/docs/protocol/fast-mode)) |
|
|
42
|
+
| `cache` | No | Prompt caching mode: `auto` (default), `extended`, or `off` |
|
|
43
|
+
| `maxToolOutputTokens` | No | Cap a single tool result at this many tokens in the model view (head+tail preview + note). Omit to leave tool output unbounded |
|
|
44
|
+
| `contextManagement` | No | Automatic context-window compaction (see [Context Management](/docs/protocol/context-management)) |
|
|
45
|
+
| `anthropic` | No | Anthropic-specific options (tools, skills) |
|
|
43
46
|
|
|
44
47
|
## Models
|
|
45
48
|
|
|
@@ -50,7 +53,7 @@ Specify models in `provider/model-id` format. Any model supported by the provide
|
|
|
50
53
|
| Provider | Format | Examples |
|
|
51
54
|
| --------- | ---------------------- | -------------------------------------------------------------------------------------------------- |
|
|
52
55
|
| Anthropic | `anthropic/{model-id}` | `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-sonnet-4-5`, `claude-haiku-4-5` |
|
|
53
|
-
| Google | `google/{model-id}` | `gemini-3-
|
|
56
|
+
| Google | `google/{model-id}` | `gemini-3.5-flash`, `gemini-3-flash-preview`, `gemini-2.5-flash` |
|
|
54
57
|
| OpenAI | `openai/{model-id}` | `gpt-5`, `gpt-4o`, `o4-mini`, `o3`, `o3-mini`, `o1` |
|
|
55
58
|
|
|
56
59
|
### Examples
|
|
@@ -456,7 +459,7 @@ agent:
|
|
|
456
459
|
|
|
457
460
|
## Dynamic Configuration
|
|
458
461
|
|
|
459
|
-
Like `model`, the `temperature`, `thinking`, and `maxSteps` fields can also reference an input variable. Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:
|
|
462
|
+
Like `model`, the `temperature`, `thinking`, `speed`, and `maxSteps` fields can also reference an input variable. Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:
|
|
460
463
|
|
|
461
464
|
```yaml
|
|
462
465
|
input:
|
|
@@ -548,9 +551,10 @@ handlers:
|
|
|
548
551
|
Start summary thread:
|
|
549
552
|
block: start-thread
|
|
550
553
|
thread: summary
|
|
551
|
-
model: anthropic/claude-
|
|
554
|
+
model: anthropic/claude-opus-4-8 # Different model
|
|
552
555
|
backupModel: openai/gpt-4o # Failover model
|
|
553
556
|
thinking: low # Different thinking
|
|
557
|
+
speed: fast # Fast mode for this thread (supported Opus models only)
|
|
554
558
|
cache: off # Different cache mode (does not inherit from agent)
|
|
555
559
|
maxSteps: 1 # Limit tool calls
|
|
556
560
|
system: escalation-summary # Different prompt
|
|
@@ -562,7 +566,7 @@ handlers:
|
|
|
562
566
|
todoList: true # Thread-specific task list
|
|
563
567
|
```
|
|
564
568
|
|
|
565
|
-
Each thread can have its own model, backup model, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol's `skills:` section. References must exist in the agent's `references/` directory. Workers use this same pattern since they don't have a global `agent:` section.
|
|
569
|
+
Each thread can have its own model, backup model, thinking level, speed, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol's `skills:` section. References must exist in the agent's `references/` directory. Workers use this same pattern since they don't have a global `agent:` section - which is how a worker enables fast mode.
|
|
566
570
|
|
|
567
571
|
## Full Example
|
|
568
572
|
|
|
@@ -333,7 +333,7 @@ When a skill declares secrets and an organization configures them, the skill run
|
|
|
333
333
|
|
|
334
334
|
| Aspect | Standard Skills | Secure Skills | Device Skills |
|
|
335
335
|
| ------------------- | ------------------------ | --------------------------------------------------- | ------------------------------------------------------ |
|
|
336
|
-
| **Environment** | Shared sandbox | Isolated sandbox (one per skill) |
|
|
336
|
+
| **Environment** | Shared sandbox | Isolated sandbox (one per skill) | The agent's computer |
|
|
337
337
|
| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run` only | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
|
|
338
338
|
| **Script input** | CLI arguments via `args` | JSON via stdin (use `input` parameter) | CLI arguments via `args` |
|
|
339
339
|
| **Secrets** | No secrets | Secrets as env vars | No secrets |
|
|
@@ -219,21 +219,22 @@ steps:
|
|
|
219
219
|
|
|
220
220
|
All LLM configuration goes here:
|
|
221
221
|
|
|
222
|
-
| Field
|
|
223
|
-
|
|
|
224
|
-
| `thread`
|
|
225
|
-
| `model`
|
|
226
|
-
| `system`
|
|
227
|
-
| `input`
|
|
228
|
-
| `tools`
|
|
229
|
-
| `skills`
|
|
230
|
-
| `mcpServers`
|
|
231
|
-
| `imageModel`
|
|
232
|
-
| `webSearch`
|
|
233
|
-
| `thinking`
|
|
234
|
-
| `cache`
|
|
235
|
-
| `temperature`
|
|
236
|
-
| `maxSteps`
|
|
222
|
+
| Field | Description |
|
|
223
|
+
| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
|
|
224
|
+
| `thread` | Thread name (defaults to block name) |
|
|
225
|
+
| `model` | LLM model to use |
|
|
226
|
+
| `system` | System prompt filename (required) |
|
|
227
|
+
| `input` | Variables for system prompt |
|
|
228
|
+
| `tools` | Tools available in this thread |
|
|
229
|
+
| `skills` | Octavus skills available in this thread |
|
|
230
|
+
| `mcpServers` | MCP servers available in this thread |
|
|
231
|
+
| `imageModel` | Image generation model |
|
|
232
|
+
| `webSearch` | Enable built-in web search tool |
|
|
233
|
+
| `thinking` | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |
|
|
234
|
+
| `cache` | Prompt caching mode: `auto` (default), `extended`, or `off` |
|
|
235
|
+
| `temperature` | Model temperature (0-2), `"off"`, or variable reference |
|
|
236
|
+
| `maxSteps` | Maximum tool call cycles (enables agentic if > 1), or variable reference |
|
|
237
|
+
| `maxToolOutputTokens` | Cap a single tool result at this many tokens in the thread's model view (head+tail preview + note). Omit to leave tool output unbounded |
|
|
237
238
|
|
|
238
239
|
## Simple Example
|
|
239
240
|
|
|
@@ -468,10 +469,11 @@ All standard events (text-delta, tool calls, etc.) are also emitted.
|
|
|
468
469
|
|
|
469
470
|
## Calling Workers from Interactive Agents
|
|
470
471
|
|
|
471
|
-
Interactive agents can call workers in
|
|
472
|
+
Interactive agents can call workers in three ways:
|
|
472
473
|
|
|
473
474
|
1. **Deterministically** - Using the `run-worker` block
|
|
474
475
|
2. **Agentically** - LLM calls worker as a tool
|
|
476
|
+
3. **Automatically** - Octavus itself, rather than the model, invokes the worker as part of a built-in capability. Context management's `summarizerWorker` (see [Context Management](/docs/protocol/context-management)) works this way: declare it in `workers:` but leave it out of `agent.workers` so the model never sees it as a tool.
|
|
475
477
|
|
|
476
478
|
### Worker Declaration
|
|
477
479
|
|
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: Context Management
|
|
3
|
+
description: Automatic context-window compaction so long sessions keep running past the model's limit.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Context Management
|
|
7
|
+
|
|
8
|
+
Long-running sessions accumulate history - messages, tool results, screenshots, file reads. Once that history approaches the model's context window, the provider rejects the request and, left unhandled, the session fails. Two [agent config](/docs/protocol/agent-config) knobs make the agent robust to this: `maxToolOutputTokens` caps how much any single tool result puts into context, and `contextManagement` automatically compacts older history as it fills up. Together they keep a long task, a long conversation, or one oversized tool output from ending the session.
|
|
9
|
+
|
|
10
|
+
Compaction and bounding transform only what the **model sees** on each request. The stored conversation is never changed - the complete history is always preserved.
|
|
11
|
+
|
|
12
|
+
## Configuration
|
|
13
|
+
|
|
14
|
+
```yaml
|
|
15
|
+
workers:
|
|
16
|
+
context-summarizer: # the worker that produces the running summary
|
|
17
|
+
description: Summarizes earlier conversation to free up context
|
|
18
|
+
display: description
|
|
19
|
+
|
|
20
|
+
agent:
|
|
21
|
+
model: anthropic/claude-sonnet-4-5
|
|
22
|
+
system: system
|
|
23
|
+
maxToolOutputTokens: 300000 # safety cap on a single tool result (no default)
|
|
24
|
+
# context-summarizer is intentionally NOT listed in agent.workers,
|
|
25
|
+
# so the model never sees it as a callable tool.
|
|
26
|
+
contextManagement:
|
|
27
|
+
summarizerWorker: context-summarizer
|
|
28
|
+
thresholdPercent: 0.8 # proactive trigger (no default; omit = reactive only)
|
|
29
|
+
recentPercent: 0.3 # recent window kept verbatim (no default; omit = no summarization)
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
`maxToolOutputTokens` is a top-level `agent` field (a sibling of `model` and `system`), because bounding a single tool result is independent of history compaction. Workers set the same cap per thread on their [`start-thread`](/docs/protocol/workers) block. `contextManagement` groups the compaction knobs:
|
|
33
|
+
|
|
34
|
+
| Field | Required | Description |
|
|
35
|
+
| ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------- |
|
|
36
|
+
| `summarizerWorker` | No | Slug of a worker (declared in `workers:`) that produces the running summary. Enables summarization-based compaction. |
|
|
37
|
+
| `thresholdPercent` | No | Fraction of the model's context window at which compaction starts. No default; omit to disable proactive compaction. |
|
|
38
|
+
| `recentPercent` | No | Fraction of the context window kept verbatim as the recent window. No default; omit to disable summarization. |
|
|
39
|
+
| `recentWindow` | No | Deprecated and ignored. Superseded by `recentPercent` (a context-window fraction). |
|
|
40
|
+
|
|
41
|
+
## How it works
|
|
42
|
+
|
|
43
|
+
- When `maxToolOutputTokens` is set, every tool result is **bounded** before it enters the model's view: anything over the budget is replaced with a head-and-tail preview plus a note saying how much was omitted and how to fetch the rest. The full result is still preserved in the stored conversation, so nothing is lost - the model just sees a bounded copy and can narrow, page, or search for more.
|
|
44
|
+
- When `thresholdPercent` is set and the prompt crosses that fraction of the context window, the oldest turns are folded into a **running summary** while the original task and the most-recent turns (`recentPercent` of the context window, a token budget) are kept verbatim - so the agent keeps the goal and full fidelity on what it is doing now. Both `thresholdPercent` and `recentPercent` are opt-in with no default: omit them and the agent does no proactive compaction, relying on the automatic recovery below.
|
|
45
|
+
- Compaction is **incremental**: each cycle only summarizes the newly-expired turns and folds them into the existing summary, so cost stays bounded no matter how long the session runs.
|
|
46
|
+
- If the model rejects a request for being too long anyway, the agent recovers automatically (it reduces context and retries) rather than failing the session.
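The head-and-tail bounding described in the first bullet above can be sketched roughly as follows. This is not the runtime's implementation: the tokenizer is an assumption (whitespace-separated words stand in for model tokens), the even head/tail split is a guess, and `boundToolOutput` and the wording of the note are hypothetical.

```typescript
// Hypothetical sketch of head-and-tail bounding.
// Assumption: "tokens" are approximated by whitespace-separated words.
function boundToolOutput(output: string, maxTokens: number): string {
  const tokens = output.split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length <= maxTokens) {
    return output; // Within budget: the model sees the result unchanged.
  }
  // Split the budget between the start and the end of the result.
  const headCount = Math.ceil(maxTokens / 2);
  const tailCount = Math.floor(maxTokens / 2);
  const head = tokens.slice(0, headCount).join(' ');
  const tail = tailCount > 0 ? tokens.slice(tokens.length - tailCount).join(' ') : '';
  const omitted = tokens.length - headCount - tailCount;
  // The note tells the model how much is missing and how to get it.
  const note = `[... ${omitted} tokens omitted; narrow, page, or search to fetch the rest ...]`;
  return [head, note, tail].filter((s) => s.length > 0).join('\n');
}
```

Only the copy sent to the model is shortened in this sketch; the caller is expected to keep the original string for storage, matching the guarantee that the stored conversation is never changed.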
|
|
47
|
+
|
|
48
|
+
## Bounded tool output
|
|
49
|
+
|
|
50
|
+
Some tool calls return very large output - a big file read, a full-page extract, a large MCP or skill result. Left unbounded, one such call can blow past the context window in a single step. Set `maxToolOutputTokens` on the agent (or, for a worker, on its `start-thread` block) to cap how much of any single result reaches the model, while the full result stays in the stored conversation and the trace.
|
|
51
|
+
|
|
52
|
+
There is no default: bounding only happens when you set `maxToolOutputTokens`, so the runtime never truncates output unless you asked it to. When a result is truncated, the model is always told what was omitted and how to retrieve it, so it can decide to narrow the request, paginate, or read a specific range.
|
|
53
|
+
|
|
54
|
+
Bounding is never hidden: each time a tool result first crosses the budget, a `tool-output-bounded` entry is recorded in the session's execution logs with the tool name, the original size, and the cap. The full, untruncated result stays in the corresponding `tool-result` entry, so you can always see both what the model saw and the complete output.
|
|
55
|
+
|
|
56
|
+
## The summarizer worker
|
|
57
|
+
|
|
58
|
+
`summarizerWorker` points at a worker you define and ship like any other (see [Workers](/docs/protocol/workers)). It takes two inputs - `PREVIOUS_SUMMARY` (the running summary so far) and `CONVERSATION` (the older turns to fold in) - and returns the updated summary.
|
|
59
|
+
|
|
60
|
+
Summarization is gated on its sizing knobs: a worker only runs if you also set `recentPercent` (the recent window it folds around), and it only runs **proactively** if you also set `thresholdPercent`. Set a worker without `recentPercent` and it never runs - validation warns you about this.
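The gating rules in this paragraph can be read as a three-way decision. The sketch below encodes that reading. `summarizationMode` and the mode names are hypothetical labels, not values the protocol exposes.

```typescript
// Fields as documented in the contextManagement table above.
interface ContextManagementConfig {
  summarizerWorker?: string;
  thresholdPercent?: number;
  recentPercent?: number;
}

// Hypothetical labels for the three outcomes described in the text.
type SummarizationMode = 'none' | 'reactive' | 'proactive';

function summarizationMode(config: ContextManagementConfig): SummarizationMode {
  // The worker never runs unless both it and `recentPercent` are set.
  if (config.summarizerWorker === undefined || config.recentPercent === undefined) {
    return 'none';
  }
  // Without `thresholdPercent` there is no proactive trigger.
  return config.thresholdPercent === undefined ? 'reactive' : 'proactive';
}
```

A result of `'none'` here only means no summary is produced; per the text, the agent still recovers from an overflow by reducing older tool results.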
|
|
61
|
+
|
|
62
|
+
Declare it in the top-level `workers:` section so it can be resolved, but keep it **out** of `agent.workers`: that list is what the model can call as a tool, and the summarizer is invoked automatically, never chosen by the model.
|
|
63
|
+
|
|
64
|
+
Without a `summarizerWorker`, the agent still recovers from a context overflow by reducing older tool results, but it won't produce a summary of earlier turns.
|
|
65
|
+
|
|
66
|
+
## What users see
|
|
67
|
+
|
|
68
|
+
Because the summarizer is a worker, it surfaces like any other worker, following its `display` mode (a subtle `description` indicator by default). Compaction is otherwise seamless - the conversation reads as one continuous thread and the complete history is preserved.
|
|
@@ -0,0 +1,77 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: Fast Mode
|
|
3
|
+
description: Run supported Anthropic Opus models at higher output speed for latency-sensitive agents.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Fast Mode
|
|
7
|
+
|
|
8
|
+
Fast mode runs a supported Anthropic Opus model with a faster inference configuration - higher output tokens per second, same weights and behavior - at premium pricing. Enable it with the `speed` field in the [agent config](/docs/protocol/agent-config):
|
|
9
|
+
|
|
10
|
+
```yaml
|
|
11
|
+
agent:
|
|
12
|
+
model: anthropic/claude-opus-4-8
|
|
13
|
+
speed: fast # fast | standard (default)
|
|
14
|
+
```
|
|
15
|
+
|
|
16
|
+
| Mode | Behavior | When to use |
|
|
17
|
+
| ---------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------- |
|
|
18
|
+
| `standard` | Default speed and pricing. Used whenever `speed` is omitted. | Most agents. |
|
|
19
|
+
| `fast` | Higher output speed at a premium per-token rate. | Latency-sensitive, interactive agents where faster responses are worth the premium. |
|
|
20
|
+
|
|
21
|
+
Fast mode is orthogonal to thinking - it's a speed/price knob, not an intelligence one, and keeps full reasoning.
|
|
22
|
+
|
|
23
|
+
## Supported models
|
|
24
|
+
|
|
25
|
+
Fast mode only applies to **Anthropic Opus 4.8, 4.7, and 4.6**. On any other model or provider it is a **no-op**: the request runs at standard speed and price, and never errors. This makes it safe to leave `speed: fast` set when using a dynamic model (resolved from input) that might turn out not to support it.
|
|
26
|
+
|
|
27
|
+
When you set `speed: fast` on a literal model that does not support it, the protocol validator surfaces a non-fatal warning in the dashboard.
|
|
28
|
+
|
|
29
|
+
## Premium pricing
|
|
30
|
+
|
|
31
|
+
Fast mode applies a per-model multiplier over the model's standard rates, to both input and output across the full context window:
|
|
32
|
+
|
|
33
|
+
| Model | Fast-mode cost |
|
|
34
|
+
| -------------- | -------------- |
|
|
35
|
+
| Opus 4.8 | ~2x standard |
|
|
36
|
+
| Opus 4.7 / 4.6 | ~6x standard |
|
|
37
|
+
|
|
38
|
+
Prompt-caching costs continue to apply on top of the fast-mode base rates. Billing always reflects the speed a request **actually** ran at: a request that falls back to standard speed (see below) is billed at standard rates, so requesting fast never by itself triggers premium billing.
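As a rough illustration of the billing rule, the sketch below applies the approximate multipliers from the table to a standard-rate cost, keyed on the speed a request actually ran at. The figures are the table's "~2x" and "~6x" rounded to integers, the model IDs follow the direct-Anthropic naming used in this document, and `estimateCost` is a hypothetical helper that ignores prompt-caching costs.

```typescript
type BilledSpeed = 'fast' | 'standard';

// Approximate multipliers from the pricing table (assumed to be exact here).
const FAST_MODE_MULTIPLIERS: Record<string, number> = {
  'claude-opus-4-8': 2,
  'claude-opus-4-7': 6,
  'claude-opus-4-6': 6,
};

// Hypothetical estimator: `standardCost` is what the request would cost at
// standard rates; `actualSpeed` is the speed it really ran at.
function estimateCost(standardCost: number, modelId: string, actualSpeed: BilledSpeed): number {
  if (actualSpeed !== 'fast') {
    return standardCost; // Fell back to (or requested) standard speed.
  }
  if (!Object.prototype.hasOwnProperty.call(FAST_MODE_MULTIPLIERS, modelId)) {
    return standardCost; // Unsupported model: fast mode is a no-op.
  }
  return standardCost * FAST_MODE_MULTIPLIERS[modelId];
}
```

Keying on the actual speed rather than the requested one is what keeps a fallback request at standard cost.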
|
|
39
|
+
|
|
40
|
+
## Rate limits and fallback
|
|
41
|
+
|
|
42
|
+
Fast mode has a dedicated rate limit, separate from standard Opus limits. When it is exhausted the agent degrades gracefully instead of failing: the request automatically retries at standard speed on the same model, then falls back to your configured [backup model](/docs/protocol/agent-config) if needed, before surfacing an error.
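The degradation order described above can be written out as an ordered list of attempts. This sketch shows the sequence only, not the runtime's retry logic. `fallbackPlan` is a hypothetical name, and running the backup model at standard speed is an assumption the text does not confirm.

```typescript
interface Attempt {
  model: string;
  speed: 'fast' | 'standard';
}

// Hypothetical helper: the attempts made, in order, before an error surfaces.
function fallbackPlan(
  model: string,
  requestedSpeed: 'fast' | 'standard',
  backupModel?: string,
): Attempt[] {
  const plan: Attempt[] = [];
  if (requestedSpeed === 'fast') {
    plan.push({ model, speed: 'fast' });
  }
  // Same model at standard speed comes before the backup model.
  plan.push({ model, speed: 'standard' });
  if (backupModel !== undefined) {
    // Assumption: the backup model runs at standard speed.
    plan.push({ model: backupModel, speed: 'standard' });
  }
  return plan;
}
```

For `anthropic/claude-opus-4-8` with `speed: fast` and a backup of `openai/gpt-4o`, the plan has three entries: fast, then standard on the same model, then the backup.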
|
|
43
|
+
|
|
44
|
+
Falling back to standard speed incurs a prompt-cache miss, since fast and standard requests do not share cached prefixes. The fallback is recorded in the session trace, so it is clear when a request that asked for fast ran at standard (or on the backup model) and why.
|
|
45
|
+
|
|
46
|
+
## Routing
|
|
47
|
+
|
|
48
|
+
A supported Opus model can be reached through more than one provider, and fast mode is expressed differently on each - the `speed` field handles the translation:
|
|
49
|
+
|
|
50
|
+
| Route | Example model | How fast mode is enabled |
|
|
51
|
+
| ----------------- | ------------------------------------------- | ----------------------------------------------------------------- |
|
|
52
|
+
| Direct Anthropic | `anthropic/claude-opus-4-8` | `speed: fast` |
|
|
53
|
+
| Vercel AI Gateway | `vercel/anthropic/claude-opus-4.7` | `speed: fast` |
|
|
54
|
+
| OpenRouter | `openrouter/anthropic/claude-opus-4.8-fast` | Select the dedicated `-fast` model slug (`speed` is ignored here) |
|
|
55
|
+
|
|
56
|
+
## Passing speed as input
|
|
57
|
+
|
|
58
|
+
Like `thinking`, `speed` accepts a variable reference so consumers choose it per session:
|
|
59
|
+
|
|
60
|
+
```yaml
|
|
61
|
+
input:
|
|
62
|
+
SPEED:
|
|
63
|
+
type: string
|
|
64
|
+
description: Inference speed (fast/standard)
|
|
65
|
+
optional: true
|
|
66
|
+
|
|
67
|
+
agent:
|
|
68
|
+
model: anthropic/claude-opus-4-8
|
|
69
|
+
speed: SPEED # Resolved from session input; unset -> standard
|
|
70
|
+
system: system
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
An unset optional variable resolves to `standard`, so existing agents are never silently upgraded to premium pricing.
|
|
74
|
+
|
|
75
|
+
## Scope
|
|
76
|
+
|
|
77
|
+
`speed` follows the same scoping as `thinking`: set it at agent scope (the main thread default) or per named thread in a `start-thread` block (see [Thread-Specific Config](/docs/protocol/agent-config)). Because worker agents configure everything through their thread, that is also how a worker enables fast mode. Thread settings take precedence over the agent default.
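The precedence rule can be summarized in a few lines. This is a sketch of the documented behavior, with `resolveSpeed` as a hypothetical name. An unset value at both scopes yields `standard`, matching the default stated earlier.

```typescript
type SpeedSetting = 'fast' | 'standard';

// Hypothetical resolver: thread setting first, then the agent default,
// then `standard` when neither is set.
function resolveSpeed(threadSpeed?: SpeedSetting, agentSpeed?: SpeedSetting): SpeedSetting {
  if (threadSpeed !== undefined) {
    return threadSpeed;
  }
  if (agentSpeed !== undefined) {
    return agentSpeed;
  }
  return 'standard';
}
```

A thread can therefore opt out of an agent-wide `fast` default by setting `speed: standard` on its `start-thread` block.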
|