@octavus/docs 4.2.0 → 5.1.0

@@ -194,6 +194,7 @@ interface TriggerRequest {
194
194
  triggerName: string;
195
195
  input?: Record<string, unknown>;
196
196
  rollbackAfterMessageId?: string | null; // For retry: truncate messages after this ID
197
+ sender?: UIMessageSender; // Author of this turn, for multi-user attribution
197
198
  }
198
199
 
199
200
  // Continue after client-side tool handling
@@ -223,6 +224,40 @@ export async function POST(request: Request) {
223
224
  }
224
225
  ```
225
226
 
227
+ ### Attributing Messages in Multi-User Chats
228
+
229
+ When several people share one conversation, set `sender` on the trigger so each user message is attributed to its author. Set it **server-side from your authenticated user** - never trust a client-supplied identity:
230
+
231
+ ```typescript
232
+ interface UIMessageSender {
233
+ id?: string;
234
+ name?: string;
235
+ image?: string; // Avatar URL
236
+ }
237
+
238
+ export async function POST(request: Request) {
239
+ const user = await authenticate(request); // your auth
240
+ const { sessionId, ...payload } = await request.json();
241
+
242
+ const session = client.agentSessions.attach(sessionId, {
243
+ tools: {
244
+ /* ... */
245
+ },
246
+ });
247
+ const events = session.execute(
248
+ {
249
+ ...payload,
250
+ sender: { id: user.id, name: user.name, image: user.avatarUrl },
251
+ },
252
+ { signal: request.signal },
253
+ );
254
+
255
+ return new Response(toSSEStream(events));
256
+ }
257
+ ```
258
+
259
+ The runtime stamps the sender onto the user message it creates, so it comes back on `UIMessage.sender` from `getMessages()` and survives restore. `sender` is turn metadata - it is never added to your protocol's trigger `input`, and agent-initiated turns (no `sender`) stay unattributed. For instant optimistic display in the browser, also pass it on the client `send()` (see [Client SDK Messages](/docs/client-sdk/messages)).
260
+
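The "never trust a client-supplied identity" rule above can be sketched as a small helper. This is a minimal illustration, assuming the request payload is a plain object and that your auth layer returns a user with `id`, `name`, and an optional `avatarUrl`. `AuthenticatedUser` and `withTrustedSender` are hypothetical names, not part of the Octavus SDK.

```typescript
// Shape copied from the docs above.
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Hypothetical shape of your own authenticated user record (an assumption).
interface AuthenticatedUser {
  id: string;
  name: string;
  avatarUrl?: string;
}

// Returns a copy of the payload whose `sender` always comes from the
// authenticated user. The payload is spread first, so any `sender` the
// client put in the request body is overwritten rather than trusted.
function withTrustedSender(
  payload: Record<string, unknown>,
  user: AuthenticatedUser,
): Record<string, unknown> & { sender: UIMessageSender } {
  return {
    ...payload,
    sender: { id: user.id, name: user.name, image: user.avatarUrl },
  };
}
```

The result could then be passed as the first argument to `session.execute(...)` in place of the inline object shown in the handler above.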
226
261
  ### Stop Support
227
262
 
228
263
  Pass an abort signal to allow clients to stop generation:
@@ -16,6 +16,13 @@ interface UIMessage {
16
16
  parts: UIMessagePart[];
17
17
  status: 'streaming' | 'done';
18
18
  createdAt: Date;
19
+ sender?: UIMessageSender; // Author of a user message, in multi-user chats
20
+ }
21
+
22
+ interface UIMessageSender {
23
+ id?: string;
24
+ name?: string;
25
+ image?: string; // Avatar URL
19
26
  }
20
27
  ```
21
28
 
@@ -225,6 +232,22 @@ async function handleSend(text: string, files?: FileReference[]) {
225
232
 
226
233
  See [File Uploads](/docs/client-sdk/file-uploads) for complete upload flow.
227
234
 
235
+ ### Attributing the Sender (Multi-User Chats)
236
+
237
+ In conversations shared by several people, pass `sender` so the optimistic bubble shows who sent the message immediately:
238
+
239
+ ```tsx
240
+ await send(
241
+ 'user-message',
242
+ { USER_MESSAGE: text },
243
+ {
244
+ userMessage: { content: text, sender: { id: user.id, name: user.name, image: user.avatarUrl } },
245
+ },
246
+ );
247
+ ```
248
+
249
+ This `sender` is for instant local display only. For attribution that persists and is visible to other participants, set the authoritative sender server-side on the trigger (see [Server SDK Sessions](/docs/server-sdk/sessions)). The persisted value comes back on `message.sender` from `getMessages()`, so render from `message.sender` and treat the value you passed to `send()` as the optimistic placeholder.
250
+
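One way to apply "render from `message.sender`, treat the `send()` value as a placeholder" is a small resolver. This is a sketch under assumptions: `MessageLike` is a hypothetical minimal subset of `UIMessage`, and the `'Unknown'` fallback label is illustrative only.

```typescript
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Hypothetical minimal subset of UIMessage, enough for this sketch.
interface MessageLike {
  id: string;
  sender?: UIMessageSender;
}

// Prefer the persisted, server-stamped sender. Fall back to the optimistic
// value passed to send() only while the persisted one is not available yet.
function resolveSender(
  message: MessageLike,
  optimistic?: UIMessageSender,
): UIMessageSender | undefined {
  return message.sender ?? optimistic;
}

// Pick a display label: name first, then id, then an illustrative fallback.
function senderLabel(sender: UIMessageSender | undefined): string {
  return sender?.name ?? sender?.id ?? 'Unknown';
}
```

Keeping the fallback in one place means the bubble switches to the authoritative value as soon as `message.sender` is populated.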
228
251
  ## Rendering Messages
229
252
 
230
253
  ### Basic Rendering
@@ -34,22 +34,24 @@ tools:
34
34
 
35
35
  ### Tool Fields
36
36
 
37
- | Field | Required | Description |
38
- | ------------- | -------- | ------------------------------------------------------------ |
39
- | `description` | Yes | What the tool does (shown to LLM and optionally user) |
40
- | `display` | No | How to show in UI: `hidden`, `name`, `description`, `stream` |
41
- | `parameters` | No | Input parameters the tool accepts |
37
+ | Field | Required | Description |
38
+ | ------------- | -------- | -------------------------------------------------------------------------- |
39
+ | `description` | Yes | What the tool does (shown to LLM and optionally user) |
40
+ | `display` | No | How to show in UI: `hidden`, `name`, `description`, `stream`, `title` |
41
+ | `title` | No | UI label shown when `display: title` (hides the description and arguments) |
42
+ | `parameters` | No | Input parameters the tool accepts |
42
43
 
43
44
  ### Display Modes
44
45
 
45
46
  Controls what the client sees about tool execution. The default is `description`.
46
47
 
47
- | Mode | Behavior |
48
- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
49
- | `hidden` | No UI events emitted. The tool executes silently and the user has no awareness it was called. Use for internal plumbing tools (title setting, context management). |
50
- | `name` | Shows the raw tool name while executing. Arguments and result are not displayed. |
51
- | `description` | Shows the tool's description while executing (default). Arguments are visible during live streaming but the result is not preserved after page refresh. |
52
- | `stream` | Full visibility. Arguments stream progressively as the LLM generates them, and the result is shown after execution. The result is preserved after page refresh. |
48
+ | Mode | Behavior |
49
+ | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
50
+ | `hidden` | No UI events emitted. The tool executes silently and the user has no awareness it was called. Use for internal plumbing tools (title setting, context management). |
51
+ | `name` | Shows the raw tool name while executing. Arguments and result are not displayed. |
52
+ | `description` | Shows the tool's description while executing (default). Arguments are visible during live streaming but the result is not preserved after page refresh. |
53
+ | `stream` | Full visibility. Arguments stream progressively as the LLM generates them, and the result is shown after execution. The result is preserved after page refresh. |
54
+ | `title` | Shows the tool's `title` plus the tool name only. The description, arguments, and result are hidden in the UI; the `description` is still sent to the LLM. Use for server-side tools that should appear as a labeled step without exposing their inputs/outputs. |
53
55
 
54
56
  **When to use `stream`:**
55
57
 
@@ -64,6 +66,14 @@ Controls what the client sees about tool execution. The default is `description`
64
66
  - Context-setting tools that would clutter the UI
65
67
  - Tools that are implementation details of the agent's protocol
66
68
 
69
+ **When to use `title`:**
70
+
71
+ - Server-side tools that should appear in the UI as a clean, labeled step (e.g. "Looking up your account") without exposing their arguments or result
72
+ - Tools whose `description` is written for the LLM and shouldn't be shown verbatim to the user
73
+ - Server-executed tools that should be visible in the UI: give them a human-readable `title` and keep `description` focused on instructing the model
74
+
75
+ `title` is the preferred choice for server-side tools that should surface in the UI. `name` and `description` remain supported for backward compatibility, but for new server-side tools prefer `title` (a clean UI label) - or `stream` for client tools where the user benefits from seeing the arguments and result.
76
+
67
77
  **Refresh and restore behavior:**
68
78
 
69
79
  `stream` is the only mode that preserves the tool result after a page refresh. For all other modes, the result is available during the live session but stripped on refresh. On session restore (when the session expires and is rebuilt from stored `UIMessage[]`), `stream` tools retain their original result while other modes receive a placeholder.
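The display-mode table and the refresh rule can be summarized as a lookup. This is an illustrative sketch derived only from the descriptions in this section. `ModeVisibility`, `MODE_VISIBILITY`, and `visibilityFor` are hypothetical names, not SDK exports.

```typescript
type DisplayMode = 'hidden' | 'name' | 'description' | 'stream' | 'title';

interface ModeVisibility {
  emitsUiEvents: boolean; // does the client see the tool call at all?
  argumentsVisibleLive: boolean; // are arguments visible during live streaming?
  resultSurvivesRefresh: boolean; // is the result preserved after a page refresh?
}

// One row per mode, following the table above.
const MODE_VISIBILITY: Record<DisplayMode, ModeVisibility> = {
  hidden: { emitsUiEvents: false, argumentsVisibleLive: false, resultSurvivesRefresh: false },
  name: { emitsUiEvents: true, argumentsVisibleLive: false, resultSurvivesRefresh: false },
  description: { emitsUiEvents: true, argumentsVisibleLive: true, resultSurvivesRefresh: false },
  stream: { emitsUiEvents: true, argumentsVisibleLive: true, resultSurvivesRefresh: true },
  title: { emitsUiEvents: true, argumentsVisibleLive: false, resultSurvivesRefresh: false },
};

// Tools default to `description` when `display` is omitted.
function visibilityFor(display?: DisplayMode): ModeVisibility {
  return MODE_VISIBILITY[display ?? 'description'];
}
```

A client could use such a table to decide whether to render an arguments panel or to expect a placeholder result after restore.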
@@ -45,11 +45,12 @@ skills:
45
45
 
46
46
  ### Skill Fields
47
47
 
48
- | Field | Required | Description |
49
- | ------------- | -------- | ------------------------------------------------------------------------------------- |
50
- | `display` | No | How to show in UI: `hidden`, `name`, `description`, `stream` (default: `description`) |
51
- | `description` | No | Custom description shown to users (overrides skill's built-in description) |
52
- | `execution` | No | Where the skill runs: `sandbox` (default) or `device` |
48
+ | Field | Required | Description |
49
+ | ------------- | -------- | ---------------------------------------------------------------------------------------------- |
50
+ | `display` | No | How to show in UI: `hidden`, `name`, `description`, `stream`, `title` (default: `description`) |
51
+ | `title` | No | UI label shown when `display: title` (hides the description and arguments) |
52
+ | `description` | No | Custom description shown to users (overrides skill's built-in description) |
53
+ | `execution` | No | Where the skill runs: `sandbox` (default) or `device` |
53
54
 
54
55
  ### Display Modes
55
56
 
@@ -126,7 +127,7 @@ Skills that have [secrets](#skill-secrets) configured run in **secure mode**, wh
126
127
 
127
128
  ## Device Execution
128
129
 
129
- By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer (VM or desktop) instead.
130
+ By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer instead.
130
131
 
131
132
  ```yaml
132
133
  skills:
@@ -154,7 +155,7 @@ The generic workspace tools (`octavus_code_run`, `octavus_file_write`, `octavus_
154
155
 
155
156
  | Aspect | Sandbox (default) | Device |
156
157
  | ------------------- | ---------------------------------- | ------------------------------------------------------ |
157
- | **Environment** | Isolated sandbox | Agent's computer (VM or desktop) |
158
+ | **Environment** | Isolated sandbox | The agent's computer |
158
159
  | **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
159
160
  | **File access** | Via `octavus_file_read/write` | Via device filesystem MCP |
160
161
  | **Code execution** | Via `octavus_code_run` | Via device shell MCP |
@@ -224,12 +224,13 @@ For agentic image generation where the LLM decides when to generate, configure `
224
224
 
225
225
  Every block has a `display` property:
226
226
 
227
- | Mode | Default For | Behavior |
228
- | ------------- | ------------------------- | ----------------- |
229
- | `hidden` | add-message | Not shown to user |
230
- | `name` | set-resource | Shows block name |
231
- | `description` | tool-call, generate-image | Shows description |
232
- | `stream` | next-message | Streams content |
227
+ | Mode | Default For | Behavior |
228
+ | ------------- | ------------------------- | ------------------------------- |
229
+ | `hidden` | add-message | Not shown to user |
230
+ | `name` | set-resource | Shows block name |
231
+ | `description` | tool-call, generate-image | Shows description |
232
+ | `stream` | next-message | Streams content |
233
+ | `title` | - | Shows the block's `title` field |
233
234
 
234
235
  ## Complete Example
235
236
 
@@ -21,25 +21,28 @@ agent:
21
21
 
22
22
  ## Configuration Options
23
23
 
24
- | Field | Required | Description |
25
- | ---------------- | -------- | ---------------------------------------------------------------------------------------- |
26
- | `model` | Yes | Model identifier or variable reference |
27
- | `backupModel` | No | Backup model for automatic failover on provider errors |
28
- | `system` | Yes | System prompt filename (without .md) |
29
- | `input` | No | Variables to pass to the system prompt |
30
- | `tools` | No | List of tools the LLM can call |
31
- | `mcpServers` | No | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers)) |
32
- | `skills` | No | List of Octavus skills the LLM can use |
33
- | `references` | No | List of references the LLM can fetch on demand |
34
- | `sandboxTimeout` | No | Skill sandbox timeout in ms (default: 5 min, max: 1 hour) |
35
- | `imageModel` | No | Image generation model (enables agentic image generation) |
36
- | `webSearch` | No | Enable built-in web search tool (provider-agnostic) |
37
- | `agentic` | No | Allow multiple tool call cycles |
38
- | `maxSteps` | No | Maximum agentic steps (default: 10) - literal or variable reference |
39
- | `temperature` | No | Model temperature (0-2), `"off"`, or a variable reference |
40
- | `thinking` | No | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |
41
- | `cache` | No | Prompt caching mode: `auto` (default), `extended`, or `off` |
42
- | `anthropic` | No | Anthropic-specific options (tools, skills) |
24
+ | Field | Required | Description |
25
+ | --------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------ |
26
+ | `model` | Yes | Model identifier or variable reference |
27
+ | `backupModel` | No | Backup model for automatic failover on provider errors |
28
+ | `system` | Yes | System prompt filename (without .md) |
29
+ | `input` | No | Variables to pass to the system prompt |
30
+ | `tools` | No | List of tools the LLM can call |
31
+ | `mcpServers` | No | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers)) |
32
+ | `skills` | No | List of Octavus skills the LLM can use |
33
+ | `references` | No | List of references the LLM can fetch on demand |
34
+ | `sandboxTimeout` | No | Skill sandbox timeout in ms (default: 5 min, max: 1 hour) |
35
+ | `imageModel` | No | Image generation model (enables agentic image generation) |
36
+ | `webSearch` | No | Enable built-in web search tool (provider-agnostic) |
37
+ | `agentic` | No | Allow multiple tool call cycles |
38
+ | `maxSteps` | No | Maximum agentic steps (default: 10) - literal or variable reference |
39
+ | `temperature` | No | Model temperature (0-2), `"off"`, or a variable reference |
40
+ | `thinking` | No | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |
41
+ | `speed` | No | Inference speed for supported Opus models: `fast`/`standard` (see [Fast Mode](/docs/protocol/fast-mode)) |
42
+ | `cache` | No | Prompt caching mode: `auto` (default), `extended`, or `off` |
43
+ | `maxToolOutputTokens` | No | Cap a single tool result at this many tokens in the model view (head+tail preview + note). Omit to leave tool output unbounded |
44
+ | `contextManagement` | No | Automatic context-window compaction (see [Context Management](/docs/protocol/context-management)) |
45
+ | `anthropic` | No | Anthropic-specific options (tools, skills) |
43
46
 
44
47
  ## Models
45
48
 
@@ -456,7 +459,7 @@ agent:
456
459
 
457
460
  ## Dynamic Configuration
458
461
 
459
- Like `model`, the `temperature`, `thinking`, and `maxSteps` fields can also reference an input variable. Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:
462
+ Like `model`, the `temperature`, `thinking`, `speed`, and `maxSteps` fields can also reference an input variable. Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:
460
463
 
461
464
  ```yaml
462
465
  input:
@@ -548,9 +551,10 @@ handlers:
548
551
  Start summary thread:
549
552
  block: start-thread
550
553
  thread: summary
551
- model: anthropic/claude-sonnet-4-5 # Different model
554
+ model: anthropic/claude-opus-4-8 # Different model
552
555
  backupModel: openai/gpt-4o # Failover model
553
556
  thinking: low # Different thinking
557
+ speed: fast # Fast mode for this thread (supported Opus models only)
554
558
  cache: off # Different cache mode (does not inherit from agent)
555
559
  maxSteps: 1 # Limit tool calls
556
560
  system: escalation-summary # Different prompt
@@ -562,7 +566,7 @@ handlers:
562
566
  todoList: true # Thread-specific task list
563
567
  ```
564
568
 
565
- Each thread can have its own model, backup model, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol's `skills:` section. References must exist in the agent's `references/` directory. Workers use this same pattern since they don't have a global `agent:` section.
569
+ Each thread can have its own model, backup model, thinking level, speed, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol's `skills:` section. References must exist in the agent's `references/` directory. Workers use this same pattern since they don't have a global `agent:` section - which is how a worker enables fast mode.
566
570
 
567
571
  ## Full Example
568
572
 
@@ -58,10 +58,11 @@ anthropic:
58
58
  description: Searching... # Custom display text
59
59
  ```
60
60
 
61
- | Field | Required | Description |
62
- | ------------- | -------- | --------------------------------------------------------------------- |
63
- | `display` | No | `hidden`, `name`, `description`, or `stream` (default: `description`) |
64
- | `description` | No | Custom text shown to users during execution |
61
+ | Field | Required | Description |
62
+ | ------------- | -------- | ------------------------------------------------------------------------------ |
63
+ | `display` | No | `hidden`, `name`, `description`, `stream`, or `title` (default: `description`) |
64
+ | `title` | No | UI label shown when `display: title` (hides description and arguments) |
65
+ | `description` | No | Custom text shown to users during execution |
65
66
 
66
67
  ### Web Search
67
68
 
@@ -120,12 +121,13 @@ anthropic:
120
121
  description: Processing PDF
121
122
  ```
122
123
 
123
- | Field | Required | Description |
124
- | ------------- | -------- | --------------------------------------------------------------------- |
125
- | `type` | Yes | `anthropic` (built-in) or `custom` (uploaded) |
126
- | `version` | No | Skill version (default: `latest`) |
127
- | `display` | No | `hidden`, `name`, `description`, or `stream` (default: `description`) |
128
- | `description` | No | Custom text shown to users |
124
+ | Field | Required | Description |
125
+ | ------------- | -------- | ------------------------------------------------------------------------------ |
126
+ | `type` | Yes | `anthropic` (built-in) or `custom` (uploaded) |
127
+ | `version` | No | Skill version (default: `latest`) |
128
+ | `display` | No | `hidden`, `name`, `description`, `stream`, or `title` (default: `description`) |
129
+ | `title` | No | UI label shown when `display: title` (hides description and arguments) |
130
+ | `description` | No | Custom text shown to users |
129
131
 
130
132
  ### Built-in Skills
131
133
 
@@ -333,7 +333,7 @@ When a skill declares secrets and an organization configures them, the skill run
333
333
 
334
334
  | Aspect | Standard Skills | Secure Skills | Device Skills |
335
335
  | ------------------- | ------------------------ | --------------------------------------------------- | ------------------------------------------------------ |
336
- | **Environment** | Shared sandbox | Isolated sandbox (one per skill) | Agent's computer (VM or desktop) |
336
+ | **Environment** | Shared sandbox | Isolated sandbox (one per skill) | The agent's computer |
337
337
  | **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run` only | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
338
338
  | **Script input** | CLI arguments via `args` | JSON via stdin (use `input` parameter) | CLI arguments via `args` |
339
339
  | **Secrets** | No secrets | Secrets as env vars | No secrets |
@@ -219,21 +219,22 @@ steps:
219
219
 
220
220
  All LLM configuration goes here:
221
221
 
222
- | Field | Description |
223
- | ------------- | -------------------------------------------------------------------------------------- |
224
- | `thread` | Thread name (defaults to block name) |
225
- | `model` | LLM model to use |
226
- | `system` | System prompt filename (required) |
227
- | `input` | Variables for system prompt |
228
- | `tools` | Tools available in this thread |
229
- | `skills` | Octavus skills available in this thread |
230
- | `mcpServers` | MCP servers available in this thread |
231
- | `imageModel` | Image generation model |
232
- | `webSearch` | Enable built-in web search tool |
233
- | `thinking` | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |
234
- | `cache` | Prompt caching mode: `auto` (default), `extended`, or `off` |
235
- | `temperature` | Model temperature (0-2), `"off"`, or variable reference |
236
- | `maxSteps` | Maximum tool call cycles (enables agentic if > 1), or variable reference |
222
+ | Field | Description |
223
+ | --------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
224
+ | `thread` | Thread name (defaults to block name) |
225
+ | `model` | LLM model to use |
226
+ | `system` | System prompt filename (required) |
227
+ | `input` | Variables for system prompt |
228
+ | `tools` | Tools available in this thread |
229
+ | `skills` | Octavus skills available in this thread |
230
+ | `mcpServers` | MCP servers available in this thread |
231
+ | `imageModel` | Image generation model |
232
+ | `webSearch` | Enable built-in web search tool |
233
+ | `thinking` | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |
234
+ | `cache` | Prompt caching mode: `auto` (default), `extended`, or `off` |
235
+ | `temperature` | Model temperature (0-2), `"off"`, or variable reference |
236
+ | `maxSteps` | Maximum tool call cycles (enables agentic if > 1), or variable reference |
237
+ | `maxToolOutputTokens` | Cap a single tool result at this many tokens in the thread's model view (head+tail preview + note). Omit to leave tool output unbounded |
237
238
 
238
239
  ## Simple Example
239
240
 
@@ -468,10 +469,11 @@ All standard events (text-delta, tool calls, etc.) are also emitted.
468
469
 
469
470
  ## Calling Workers from Interactive Agents
470
471
 
471
- Interactive agents can call workers in two ways:
472
+ Interactive agents can call workers in three ways:
472
473
 
473
474
  1. **Deterministically** - Using the `run-worker` block
474
475
  2. **Agentically** - LLM calls worker as a tool
476
+ 3. **Automatically** - Octavus itself, not the model, invokes the worker as part of a built-in capability. Context management's `summarizerWorker` (see [Context Management](/docs/protocol/context-management)) works this way: declare it in `workers:` but leave it out of `agent.workers` so the model never sees it as a tool.
475
477
 
476
478
  ### Worker Declaration
477
479
 
@@ -528,6 +530,7 @@ Controls how worker execution appears to users. The default for workers is `stre
528
530
  | `name` | Shows a running/done indicator with the worker name. No nested content (text, tool calls, reasoning) is forwarded. |
529
531
  | `description` | Shows a running/done indicator with the worker description. No nested content is forwarded. |
530
532
  | `stream` | Full visibility. All nested events are forwarded - text, reasoning, tool calls, sources, files. Worker input is included on start. |
533
+ | `title` | Like `description`, but shows the worker's `title` field instead of its description. No nested content or input is forwarded. |
531
534
 
532
535
  **Progressive input streaming:** When a worker with `display: stream` is invoked agentically (LLM calls it as a tool), the `UIWorkerPart` appears in the UI immediately as the LLM starts generating the worker's arguments. The worker input streams progressively into the worker part, the same way text tokens stream into a text part. Once input finishes, worker execution begins and nested content flows into the same worker part. There is no intermediate tool card.
533
536
 
@@ -43,7 +43,8 @@ mcpServers:
43
43
  | ------------- | -------- | ---------------------------------------------------------------------------------------------------------------------- |
44
44
  | `description` | Yes | What the MCP server provides |
45
45
  | `source` | Yes | `remote`, `device`, or `consumer` (see source types above) |
46
- | `display` | No | How tool calls appear in UI: `hidden`, `name`, `description` (default: `description`) |
46
+ | `display` | No | How tool calls appear in UI: `hidden`, `name`, `description`, `stream`, `title` (default: `description`) |
47
+ | `title` | No | UI label shown when `display: title`; applies to every tool in this namespace (hides description and arguments) |
47
48
  | `connection` | No | When to connect: `eager` or `lazy` (default: `lazy`). `remote` only. |
48
49
  | `execution` | No | Where the MCP process runs: `sandbox` (default) or `device`. `remote` only. See [Device Execution](#device-execution). |
49
50
 
@@ -0,0 +1,68 @@
1
+ ---
2
+ title: Context Management
3
+ description: Automatic context-window compaction so long sessions keep running past the model's limit.
4
+ ---
5
+
6
+ # Context Management
7
+
8
+ Long-running sessions accumulate history - messages, tool results, screenshots, file reads. Once that history approaches the model's context window, the provider rejects the request and the session would otherwise fail. Two [agent config](/docs/protocol/agent-config) knobs make the agent robust to this: `maxToolOutputTokens` caps how much any single tool result puts into context, and `contextManagement` automatically compacts older history as it fills up. Together they keep a long task, a long conversation, or one oversized tool output from ending the session.
9
+
10
+ Compaction and bounding transform only what the **model sees** on each request. The stored conversation is never changed - the complete history is always preserved.
11
+
12
+ ## Configuration
13
+
14
+ ```yaml
15
+ workers:
16
+ context-summarizer: # the worker that produces the running summary
17
+ description: Summarizes earlier conversation to free up context
18
+ display: description
19
+
20
+ agent:
21
+ model: anthropic/claude-sonnet-4-5
22
+ system: system
23
+ maxToolOutputTokens: 300000 # safety cap on a single tool result (no default)
24
+ # context-summarizer is intentionally NOT listed in agent.workers,
25
+ # so the model never sees it as a callable tool.
26
+ contextManagement:
27
+ summarizerWorker: context-summarizer
28
+ thresholdPercent: 0.8 # proactive trigger (no default; omit = reactive only)
29
+ recentPercent: 0.3 # recent window kept verbatim (no default; omit = no summarization)
30
+ ```
31
+
32
+ `maxToolOutputTokens` is a top-level `agent` field (a sibling of `model` and `system`), because bounding a single tool result is independent of history compaction. Workers set the same cap per thread on their [`start-thread`](/docs/protocol/workers) block. `contextManagement` groups the compaction knobs:
33
+
34
+ | Field | Required | Description |
35
+ | ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------- |
36
+ | `summarizerWorker` | No | Slug of a worker (declared in `workers:`) that produces the running summary. Enables summarization-based compaction. |
37
+ | `thresholdPercent` | No | Fraction of the model's context window at which compaction starts. No default; omit to disable proactive compaction. |
38
+ | `recentPercent` | No | Fraction of the context window kept verbatim as the recent window. No default; omit to disable summarization. |
39
+ | `recentWindow` | No | Deprecated and ignored. Superseded by `recentPercent` (a context-window fraction). |
40
+
41
+ ## How it works
42
+
43
+ - When `maxToolOutputTokens` is set, every tool result is **bounded** before it enters the model's view: anything over the budget is replaced with a head-and-tail preview plus a note saying how much was omitted and how to fetch the rest. The full result is still preserved in the stored conversation, so nothing is lost - the model just sees a bounded copy and can narrow, page, or search for more.
44
+ - When `thresholdPercent` is set and the prompt crosses that fraction of the context window, the oldest turns are folded into a **running summary** while the original task and the most-recent turns (`recentPercent` of the context window, a token budget) are kept verbatim - so the agent keeps the goal and full fidelity on what it is doing now. Both are opt-in with no default: omit them and the agent does no proactive compaction, relying on the automatic recovery below.
45
+ - Compaction is **incremental**: each cycle only summarizes the newly-expired turns and folds them into the existing summary, so cost stays bounded no matter how long the session runs.
46
+ - If the model rejects a request for being too long anyway, the agent recovers automatically (it reduces context and retries) rather than failing the session.
47
+
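The opt-in rules in the list above can be sketched as a small planning function. This is not the runtime's implementation. It assumes token counts are already known, and it treats "crosses that fraction" as greater-than-or-equal. `CompactionPlan` and `planCompaction` are hypothetical names.

```typescript
// Field names copied from the `contextManagement` table above.
interface ContextManagementConfig {
  summarizerWorker?: string;
  thresholdPercent?: number;
  recentPercent?: number;
}

interface CompactionPlan {
  proactiveNow: boolean; // prompt has crossed thresholdPercent of the window
  summarizerCanRun: boolean; // a worker is set AND recentPercent is set
  recentBudgetTokens?: number; // token budget kept verbatim, if configured
}

function planCompaction(
  promptTokens: number,
  contextWindowTokens: number,
  config: ContextManagementConfig,
): CompactionPlan {
  const { summarizerWorker, thresholdPercent, recentPercent } = config;
  return {
    // No thresholdPercent means no proactive compaction at all.
    proactiveNow:
      thresholdPercent !== undefined &&
      promptTokens >= thresholdPercent * contextWindowTokens,
    // A worker without recentPercent never runs.
    summarizerCanRun: summarizerWorker !== undefined && recentPercent !== undefined,
    recentBudgetTokens:
      recentPercent !== undefined
        ? Math.round(recentPercent * contextWindowTokens)
        : undefined,
  };
}
```

For example, with a 200,000-token window, `thresholdPercent: 0.8`, and `recentPercent: 0.3`, a 170,000-token prompt is past the 160,000-token trigger and the recent window is a 60,000-token budget.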
48
+ ## Bounded tool output
49
+
50
+ Some tool calls return very large output - a big file read, a full-page extract, a large MCP or skill result. Left unbounded, one such call can blow past the context window in a single step. Set `maxToolOutputTokens` on the agent (or, for a worker, on its `start-thread` block) to cap how much of any single result reaches the model, while the full result stays in the stored conversation and the trace.
51
+
52
+ There is no default: bounding only happens when you set `maxToolOutputTokens`, so the runtime never silently truncates output you did not ask it to. When a result is truncated, the model is always told what was omitted and how to retrieve it, so it can decide to narrow the request, paginate, or read a specific range.
53
+
54
+ Bounding is never hidden: each time a tool result first crosses the budget, a `tool-output-bounded` entry is recorded in the session's execution logs with the tool name, the original size, and the cap. The full, untruncated result stays in the corresponding `tool-result` entry, so you can always see both what the model saw and the complete output.
55
+
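To make the head-and-tail preview concrete, here is a rough sketch of bounding a single result. It is only an approximation of the idea: whitespace-separated words stand in for real model tokens, the even head/tail split is an assumption, and the note wording is invented. `boundToolOutput` is a hypothetical name.

```typescript
interface BoundedOutput {
  text: string; // what the model would see
  bounded: boolean; // whether anything was omitted
  originalTokens: number; // size before bounding
}

function boundToolOutput(text: string, maxTokens: number): BoundedOutput {
  // Assumption: whitespace-separated words approximate tokens.
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length <= maxTokens) {
    return { text, bounded: false, originalTokens: tokens.length };
  }

  // Split the budget between the start and the end of the result.
  const headCount = Math.ceil(maxTokens / 2);
  const tailCount = maxTokens - headCount;
  const omitted = tokens.length - maxTokens;

  const head = tokens.slice(0, headCount).join(' ');
  // Guard tailCount === 0: slice(length - 0) would be empty anyway,
  // but being explicit keeps the intent clear.
  const tail = tailCount > 0 ? tokens.slice(tokens.length - tailCount).join(' ') : '';
  const note = `[... ${omitted} tokens omitted; narrow, page, or search to see more ...]`;

  return {
    text: [head, note, tail].filter((part) => part.length > 0).join('\n'),
    bounded: true,
    originalTokens: tokens.length,
  };
}
```

Returning `originalTokens` alongside the preview mirrors the information the docs say is logged in a `tool-output-bounded` entry (original size and cap).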
56
+ ## The summarizer worker
57
+
58
+ `summarizerWorker` points at a worker you define and ship like any other (see [Workers](/docs/protocol/workers)). It takes two inputs - `PREVIOUS_SUMMARY` (the running summary so far) and `CONVERSATION` (the older turns to fold in) - and returns the updated summary.
59
+
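The two-input contract lends itself to an incremental fold. The sketch below is illustrative and assumes both inputs are plain strings (the docs do not specify their exact types). The `Summarizer` type and `compactIncrementally` are hypothetical names, with a local function standing in for the worker call.

```typescript
// Input names copied from the docs above; the string types are an assumption.
interface SummarizerInput {
  PREVIOUS_SUMMARY: string;
  CONVERSATION: string;
}

// Stand-in for invoking the summarizer worker.
type Summarizer = (input: SummarizerInput) => string;

// Each cycle passes only the newly expired turns plus the summary so far,
// so the work per cycle does not grow with total session length.
function compactIncrementally(expiredBatches: string[], summarize: Summarizer): string {
  let summary = '';
  for (const batch of expiredBatches) {
    summary = summarize({ PREVIOUS_SUMMARY: summary, CONVERSATION: batch });
  }
  return summary;
}
```

In a real agent the worker's own prompt decides how the two inputs are merged. The point here is only the shape of the loop.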
60
+ Summarization is gated on its sizing knobs: a worker only runs if you also set `recentPercent` (the recent window it folds around), and it only runs **proactively** if you also set `thresholdPercent`. Set a worker without `recentPercent` and it never runs - validation warns you about this.
61
+
62
+ Declare it in the top-level `workers:` section so it can be resolved, but keep it **out** of `agent.workers`: that list is what the model can call as a tool, and the summarizer is invoked automatically, never chosen by the model.
63
+
64
+ Without a `summarizerWorker`, the agent still recovers from a context overflow by reducing older tool results, but it won't produce a summary of earlier turns.
65
+
66
+ ## What users see
67
+
68
+ Because the summarizer is a worker, it surfaces like any other worker, following its `display` mode (a subtle `description` indicator by default). Compaction is otherwise seamless - the conversation reads as one continuous thread and the complete history is preserved.
@@ -0,0 +1,77 @@
1
+ ---
2
+ title: Fast Mode
3
+ description: Run supported Anthropic Opus models at higher output speed for latency-sensitive agents.
4
+ ---
5
+
6
+ # Fast Mode
7
+
8
+ Fast mode runs a supported Anthropic Opus model with a faster inference configuration - higher output tokens per second, same weights and behavior - at premium pricing. Enable it with the `speed` field in the [agent config](/docs/protocol/agent-config):
9
+
10
+ ```yaml
+ agent:
+   model: anthropic/claude-opus-4-8
+   speed: fast # fast | standard (default)
+ ```
15
+
16
+ | Mode | Behavior | When to use |
17
+ | ---------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------- |
18
+ | `standard` | Default speed and pricing. Used whenever `speed` is omitted. | Most agents. |
19
+ | `fast` | Higher output speed at a premium per-token rate. | Latency-sensitive, interactive agents where faster responses are worth the premium. |
20
+
21
+ Fast mode is orthogonal to thinking - it's a speed/price knob, not an intelligence one, and keeps full reasoning.
22
+
23
+ ## Supported models
24
+
25
+ Fast mode only applies to **Anthropic Opus 4.8, 4.7, and 4.6**. On any other model or provider it is a **no-op**: the request runs at standard speed and price, and never errors. This makes it safe to leave `speed: fast` set when using a dynamic model (resolved from input) that might turn out not to support it.
26
+
27
+ When you set `speed: fast` on a literal model that does not support it, the protocol validator surfaces a non-fatal warning in the dashboard.
28
+
29
+ ## Premium pricing
30
+
31
+ Fast mode applies a per-model multiplier over the model's standard rates, to both input and output across the full context window:
32
+
33
+ | Model | Fast-mode cost |
34
+ | -------------- | -------------- |
35
+ | Opus 4.8 | ~2x standard |
36
+ | Opus 4.7 / 4.6 | ~6x standard |
37
+
38
+ Prompt-caching costs continue to apply on top of the fast-mode base rates. Billing always reflects the speed a request **actually** ran at: a request that falls back to standard speed (see below) is billed at standard rates, so requesting fast never triggers premium billing on its own.
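As a rough worked example of the billing rule, the sketch below estimates a request's cost from its standard-rate cost and the speed it actually ran at. The multipliers are the approximate values from the table, and the model keys (`opus-4.8` and so on) are illustrative labels rather than real model identifiers.

```typescript
type Speed = "fast" | "standard";

// Approximate multipliers from the table above; keys are illustrative labels.
const FAST_MULTIPLIER: Record<string, number> = {
  "opus-4.8": 2,
  "opus-4.7": 6,
  "opus-4.6": 6,
};

// Billing follows the speed the request actually ran at, not the one requested.
function billingMultiplier(model: string, actualSpeed: Speed): number {
  if (actualSpeed === "standard") return 1;
  // Unsupported models run at standard speed, so they are never multiplied.
  return FAST_MULTIPLIER[model] ?? 1;
}

function estimateCost(standardCost: number, model: string, actualSpeed: Speed): number {
  return standardCost * billingMultiplier(model, actualSpeed);
}
```

For instance, a request that would cost 10 units at standard rates comes to roughly 20 units if it ran fast on the `opus-4.8` entry, and stays at 10 if it fell back to standard speed.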
39
+
40
+ ## Rate limits and fallback
41
+
42
+ Fast mode has a dedicated rate limit, separate from standard Opus limits. When it is exhausted the agent degrades gracefully instead of failing: the request automatically retries at standard speed on the same model, then falls back to your configured [backup model](/docs/protocol/agent-config) if needed, before surfacing an error.
43
+
44
+ Falling back to standard speed is a prompt-cache miss, since fast and standard requests do not share cached prefixes. The fallback is recorded in the session trace, so it is clear when a request that asked for fast ran at standard (or on the backup model) and why.
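The fallback order described in this section can be written down as an ordered list of attempts. This is a sketch of the documented sequence, not the runtime's implementation; the `Attempt` type and `fallbackPlan` function are hypothetical names.

```typescript
type Speed = "fast" | "standard";

// The docs do not say which speed the backup model runs at, so it carries none here.
type Attempt = { target: "primary"; speed: Speed } | { target: "backup" };

// Returns the attempts in order; an error surfaces only after the last one fails.
function fallbackPlan(requested: Speed, hasBackupModel: boolean): Attempt[] {
  const plan: Attempt[] = [{ target: "primary", speed: requested }];
  if (requested === "fast") {
    // Fast-mode rate limit exhausted: retry at standard speed on the same model.
    plan.push({ target: "primary", speed: "standard" });
  }
  if (hasBackupModel) {
    plan.push({ target: "backup" });
  }
  return plan;
}
```

With `requested` set to `"fast"` and a backup model configured, the plan has three steps: fast on the primary model, standard on the primary model, then the backup model.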
45
+
46
+ ## Routing
47
+
48
+ A supported Opus model can be reached through more than one provider, and fast mode is expressed differently on each - the `speed` field handles the translation:
49
+
50
+ | Route | Example model | How fast mode is enabled |
51
+ | ----------------- | ------------------------------------------- | ----------------------------------------------------------------- |
52
+ | Direct Anthropic | `anthropic/claude-opus-4-8` | `speed: fast` |
53
+ | Vercel AI Gateway | `vercel/anthropic/claude-opus-4.7` | `speed: fast` |
54
+ | OpenRouter | `openrouter/anthropic/claude-opus-4.8-fast` | Select the dedicated `-fast` model slug (`speed` is ignored here) |
55
+
56
+ ## Passing speed as input
57
+
58
+ Like `thinking`, `speed` accepts a variable reference so consumers can choose it per session:
59
+
60
+ ```yaml
+ input:
+   SPEED:
+     type: string
+     description: Inference speed (fast/standard)
+     optional: true
+
+ agent:
+   model: anthropic/claude-opus-4-8
+   speed: SPEED # Resolved from session input; unset -> standard
+   system: system
+ ```
72
+
73
+ An unset optional variable resolves to `standard`, so existing agents are never silently upgraded to premium pricing.
74
+
75
+ ## Scope
76
+
77
+ `speed` follows the same scoping as `thinking`: set it at agent scope (the main thread default) or per named thread in a `start-thread` block (see [Thread-Specific Config](/docs/protocol/agent-config)). Because worker agents configure everything through their thread, that is also how a worker enables fast mode. Thread settings take precedence over the agent default.
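The precedence rule fits in one line. The sketch below assumes both values have already been resolved from the protocol; `resolveSpeed` is a hypothetical helper, not part of the SDK.

```typescript
type Speed = "fast" | "standard";

// Thread setting wins over the agent default; with neither set, speed is standard.
function resolveSpeed(agentSpeed?: Speed, threadSpeed?: Speed): Speed {
  return threadSpeed ?? agentSpeed ?? "standard";
}
```

So an agent with `speed: fast` and a thread that sets `speed: standard` runs that thread at standard speed.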
@@ -0,0 +1,60 @@
1
+ ---
2
+ title: Overview
3
+ description: What Workforce Agents are and how to drive them programmatically.
4
+ ---
5
+
6
+ # Workforce Agents
7
+
8
+ Workforce Agents are Octavus's autonomous AI teammates - specialized agents you hire and configure in the dashboard, each with its own computer, skills, and tools. Browse and hire them at [octavus.ai/agents](https://octavus.ai/agents), and see the full roster at [octavus.ai/agents/discover](https://octavus.ai/agents/discover).
9
+
10
+ The **Workforce Agents API** lets you drive one of your agents without a browser: give it a task, wait for it to finish, and read what it did. It is built for automation - CI pipelines, scheduled jobs, and backend integrations that hand work to an agent and collect the result.
11
+
12
+ > Workforce Agents are distinct from agents you build yourself with the SDK. The [Sessions API](/docs/api-reference/sessions) drives agents you define and host; the Workforce Agents API drives the managed agents you hire and configure in the dashboard.
13
+
14
+ ## How it works
15
+
16
+ A run follows a simple dispatch-then-poll model - there is no stream or webhook to manage:
17
+
18
+ 1. **Dispatch** a message to an agent. This starts a **thread** (a conversation) and returns a `threadId` right away.
19
+ 2. **Poll** the thread until its `status` is terminal.
20
+ 3. **Read** the thread's messages to get the agent's work.
21
+
22
+ ```mermaid
+ sequenceDiagram
+   participant C as Your code
+   participant A as Workforce Agent
+   C->>A: dispatch a task
+   A-->>C: threadId and status
+   loop until terminal status
+     C->>A: get thread
+     A-->>C: status and messages
+   end
+ ```
33
+
34
+ ### Thread status
35
+
36
+ | Status | Meaning |
37
+ | ----------- | ----------------------------------------------------------------------------------- |
38
+ | `pending` | Dispatched, waiting to start |
39
+ | `queued` | Waiting for the agent to free up - an agent runs one task at a time on its computer |
40
+ | `running` | The agent is working |
41
+ | `completed` | Finished successfully (terminal) |
42
+ | `failed` | Ended with an error - see `failureReason` (terminal) |
43
+ | `cancelled` | Stopped (terminal) |
44
+
45
+ Poll while the status is `pending`, `queued`, or `running`, and stop once it is `completed`, `failed`, or `cancelled`.
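A minimal sketch of that polling loop follows. The status values come from the table above; `getThread` is a hypothetical callback you would supply (for example, a wrapper around the SDK or REST call), and the interval and attempt cap are arbitrary defaults.

```typescript
type ThreadStatus = "pending" | "queued" | "running" | "completed" | "failed" | "cancelled";

interface ThreadSnapshot {
  status: ThreadStatus;
  failureReason?: string;
}

const TERMINAL_STATUSES: ReadonlySet<ThreadStatus> = new Set<ThreadStatus>([
  "completed",
  "failed",
  "cancelled",
]);

function isTerminal(status: ThreadStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// `getThread` is supplied by the caller; this helper only decides when to stop.
async function pollUntilTerminal(
  getThread: () => Promise<ThreadSnapshot>,
  intervalMs = 2000,
  maxAttempts = 300,
): Promise<ThreadSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const thread = await getThread();
    if (isTerminal(thread.status)) return thread;
    // Wait before the next poll so the API is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Thread not terminal after ${maxAttempts} polls`);
}
```

Capping the number of attempts keeps a stuck `queued` thread from blocking a CI job indefinitely.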
46
+
47
+ ## Authentication
48
+
49
+ Each agent has its own **API key**, created from the agent's settings in the dashboard (Settings -> API). The key:
50
+
51
+ - Drives only that one agent - a key for one agent can never call another.
52
+ - Is a secret. Use it from a backend, script, or CI, never in a browser or client app.
53
+ - Is sent as a bearer token: `Authorization: Bearer oct_agt_...`.
54
+
55
+ Create a key, copy it once (it is shown only at creation), and store it securely.
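As an illustration, the bearer header can be built with a small helper. `workforceAuthHeaders` is a hypothetical name, and the prefix check simply mirrors the `oct_agt_` prefix shown above; treat it as a sanity check against pasting the wrong kind of key, not as validation.

```typescript
// Builds the Authorization header for a Workforce Agent API key.
function workforceAuthHeaders(apiKey: string): Record<string, string> {
  if (!apiKey.startsWith("oct_agt_")) {
    throw new Error("Expected a Workforce Agent API key starting with oct_agt_");
  }
  return { Authorization: `Bearer ${apiKey}` };
}
```

Load the key from your secret store or an environment variable on the server, and pass the returned headers to whatever HTTP client you use.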
56
+
57
+ ## Next steps
58
+
59
+ - [Using the SDK](/docs/workforce-agents/sdk) - the `@octavus/server-sdk` `workforce` client, including a run-and-wait helper.
60
+ - [API reference](/docs/workforce-agents/api-reference) - the REST endpoints, for any language.