npm - @octavus/docs - Versions diffs - 4.2.0 → 5.1.0 - Mend

@octavus/docs 4.2.0 → 5.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

package/content/02-server-sdk/02-sessions.md +35 -0
package/content/03-client-sdk/02-messages.md +23 -0
package/content/04-protocol/04-tools.md +21 -11
package/content/04-protocol/05-skills.md +8 -7
package/content/04-protocol/06-handlers.md +7 -6
package/content/04-protocol/07-agent-config.md +26 -22
package/content/04-protocol/08-provider-options.md +12 -10
package/content/04-protocol/09-skills-advanced.md +1 -1
package/content/04-protocol/11-workers.md +19 -16
package/content/04-protocol/13-mcp-servers.md +2 -1
package/content/04-protocol/14-context-management.md +68 -0
package/content/04-protocol/15-fast-mode.md +77 -0
package/content/07-workforce-agents/01-overview.md +60 -0
package/content/07-workforce-agents/02-sdk.md +105 -0
package/content/07-workforce-agents/03-api-reference.md +146 -0
package/content/07-workforce-agents/_meta.md +4 -0
package/dist/{chunk-DOHOCPKJ.js → chunk-RW7WOZKG.js} +129 -31
package/dist/chunk-RW7WOZKG.js.map +1 -0
package/dist/content.js +1 -1
package/dist/docs.json +60 -15
package/dist/index.js +1 -1
package/dist/search-index.json +1 -1
package/dist/search.js +1 -1
package/dist/search.js.map +1 -1
package/dist/sections.json +68 -15
package/package.json +1 -1
package/dist/chunk-DOHOCPKJ.js.map +0 -1

package/content/02-server-sdk/02-sessions.md CHANGED

@@ -194,6 +194,7 @@ interface TriggerRequest {
   triggerName: string;
   input?: Record<string, unknown>;
   rollbackAfterMessageId?: string | null; // For retry: truncate messages after this ID
+  sender?: UIMessageSender; // Author of this turn, for multi-user attribution
 }
 // Continue after client-side tool handling
@@ -223,6 +224,40 @@ export async function POST(request: Request) {
 }
 ```
+### Attributing Messages in Multi-User Chats
+When several people share one conversation, set `sender` on the trigger so each user message is attributed to its author. Set it **server-side from your authenticated user** - never trust a client-supplied identity:
+```typescript
+interface UIMessageSender {
+  id?: string;
+  name?: string;
+  image?: string; // Avatar URL
+}
+export async function POST(request: Request) {
+  const user = await authenticate(request); // your auth
+  const { sessionId, ...payload } = await request.json();
+  const session = client.agentSessions.attach(sessionId, {
+    tools: {
+      /* ... */
+    },
+  });
+  const events = session.execute(
+    {
+      ...payload,
+      sender: { id: user.id, name: user.name, image: user.avatarUrl },
+    },
+    { signal: request.signal },
+  );
+  return new Response(toSSEStream(events));
+}
+```
+The runtime stamps the sender onto the user message it creates, so it comes back on `UIMessage.sender` from `getMessages()` and survives restore. `sender` is turn metadata - it is never added to your protocol's trigger `input`, and agent-initiated turns (no `sender`) stay unattributed. For instant optimistic display in the browser, also pass it on the client `send()` (see [Client SDK Messages](/docs/client-sdk/messages)).
 ### Stop Support
 Pass an abort signal to allow clients to stop generation:

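As a rough illustration of the server-side rule in the `02-sessions.md` change above, the sketch below builds the trigger payload so that a client-supplied `sender` can never win. `AuthenticatedUser` and `withAuthoritativeSender` are hypothetical names invented for this example and are not part of the Octavus SDK; only the `UIMessageSender` shape is taken from the diff.

```typescript
// Shape copied from the diff above.
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Hypothetical shape of whatever your own auth layer returns.
interface AuthenticatedUser {
  id: string;
  name: string;
  avatarUrl?: string;
}

// Hypothetical helper: discards any `sender` found in the request body and
// replaces it with one derived from the authenticated user.
function withAuthoritativeSender<T extends Record<string, unknown>>(
  payload: T,
  user: AuthenticatedUser,
): Omit<T, 'sender'> & { sender: UIMessageSender } {
  const { sender: _untrusted, ...rest } = payload;
  return {
    ...rest,
    sender: { id: user.id, name: user.name, image: user.avatarUrl },
  };
}

// A request body in which the client tries to claim another identity.
const spoofed = {
  triggerName: 'user-message',
  input: { USER_MESSAGE: 'hi' },
  sender: { id: 'someone-else', name: 'Mallory' },
};

const trusted = withAuthoritativeSender(spoofed, { id: 'u1', name: 'Ada' });
console.log(trusted.sender.id); // prints "u1"
```

The documented example gets the same result by spreading `payload` first and setting `sender` afterwards; stripping the field explicitly just makes the intent harder to break if the spread order changes later.
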
package/content/03-client-sdk/02-messages.md CHANGED

@@ -16,6 +16,13 @@ interface UIMessage {
   parts: UIMessagePart[];
   status: 'streaming' | 'done';
   createdAt: Date;
+  sender?: UIMessageSender; // Author of a user message, in multi-user chats
+}
+interface UIMessageSender {
+  id?: string;
+  name?: string;
+  image?: string; // Avatar URL
 }
 ```
@@ -225,6 +232,22 @@ async function handleSend(text: string, files?: FileReference[]) {
 See [File Uploads](/docs/client-sdk/file-uploads) for complete upload flow.
+### Attributing the Sender (Multi-User Chats)
+In conversations shared by several people, pass `sender` so the optimistic bubble shows who sent the message immediately:
+```tsx
+await send(
+  'user-message',
+  { USER_MESSAGE: text },
+  {
+    userMessage: { content: text, sender: { id: user.id, name: user.name, image: user.avatarUrl } },
+  },
+);
+```
+This `sender` is for instant local display only. For attribution that persists and is visible to other participants, set the authoritative sender server-side on the trigger (see [Server SDK Sessions](/docs/server-sdk/sessions)). The persisted value comes back on `message.sender` from `getMessages()`, so render from `message.sender` and treat the value you passed to `send()` as the optimistic placeholder.
 ## Rendering Messages
 ### Basic Rendering

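The rendering advice in the `02-messages.md` change above (render from the persisted `message.sender`, treat the value passed to `send()` as a placeholder) could be sketched as follows. `senderForDisplay` is a hypothetical helper for illustration, not a Client SDK function.

```typescript
// Shape copied from the diff above.
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

// Hypothetical helper: prefer the persisted sender that came back from
// getMessages(); fall back to the optimistic value only while the
// persisted one is not available yet.
function senderForDisplay(
  persisted: UIMessageSender | undefined,
  optimistic: UIMessageSender | undefined,
): UIMessageSender | undefined {
  return persisted ?? optimistic;
}

const optimistic: UIMessageSender = { id: 'u1', name: 'Ada (local)' };
const persisted: UIMessageSender = { id: 'u1', name: 'Ada' };

console.log(senderForDisplay(undefined, optimistic)?.name); // prints "Ada (local)"
console.log(senderForDisplay(persisted, optimistic)?.name); // prints "Ada"
```

When both arguments are `undefined` the helper returns `undefined`, which matches the note that agent-initiated turns stay unattributed.
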
package/content/04-protocol/04-tools.md CHANGED

@@ -34,22 +34,24 @@ tools:
 ### Tool Fields
-| Field         | Required | Description                                                  |
-| ------------- | -------- | ------------------------------------------------------------ |
-| `description` | Yes      | What the tool does (shown to LLM and optionally user)        |
-| `display`     | No       | How to show in UI: `hidden`, `name`, `description`, `stream` |
-| `parameters`  | No       | Input parameters the tool accepts                            |
+| Field         | Required | Description                                                                |
+| ------------- | -------- | -------------------------------------------------------------------------- |
+| `description` | Yes      | What the tool does (shown to LLM and optionally user)                      |
+| `display`     | No       | How to show in UI: `hidden`, `name`, `description`, `stream`, `title`      |
+| `title`       | No       | UI label shown when `display: title` (hides the description and arguments) |
+| `parameters`  | No       | Input parameters the tool accepts                                          |
 ### Display Modes
 Controls what the client sees about tool execution. The default is `description`.
-| Mode          | Behavior                                                                                                                                                           |
-| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `hidden`      | No UI events emitted. The tool executes silently and the user has no awareness it was called. Use for internal plumbing tools (title setting, context management). |
-| `name`        | Shows the raw tool name while executing. Arguments and result are not displayed.                                                                                   |
-| `description` | Shows the tool's description while executing (default). Arguments are visible during live streaming but the result is not preserved after page refresh.            |
-| `stream`      | Full visibility. Arguments stream progressively as the LLM generates them, and the result is shown after execution. The result is preserved after page refresh.    |
+| Mode          | Behavior                                                                                                                                                                                                                                                         |
+| ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `hidden`      | No UI events emitted. The tool executes silently and the user has no awareness it was called. Use for internal plumbing tools (title setting, context management).                                                                                               |
+| `name`        | Shows the raw tool name while executing. Arguments and result are not displayed.                                                                                                                                                                                 |
+| `description` | Shows the tool's description while executing (default). Arguments are visible during live streaming but the result is not preserved after page refresh.                                                                                                          |
+| `stream`      | Full visibility. Arguments stream progressively as the LLM generates them, and the result is shown after execution. The result is preserved after page refresh.                                                                                                  |
+| `title`       | Shows the tool's `title` plus the tool name only. The description, arguments, and result are hidden in the UI; the `description` is still sent to the LLM. Use for server-side tools that should appear as a labeled step without exposing their inputs/outputs. |
 **When to use `stream`:**
@@ -64,6 +66,14 @@ Controls what the client sees about tool execution. The default is `description`
 - Context-setting tools that would clutter the UI
 - Tools that are implementation details of the agent's protocol
+**When to use `title`:**
+- Server-side tools that should appear in the UI as a clean, labeled step (e.g. "Looking up your account") without exposing their arguments or result
+- Tools whose `description` is written for the LLM and shouldn't be shown verbatim to the user
+- This is the recommended mode for server-executed tools that should be visible: give them a human-readable `title` for the UI and keep `description` focused on instructing the model
+`title` is the preferred choice for server-side tools that should surface in the UI. `name` and `description` remain supported for backward compatibility, but for new server-side tools prefer `title` (a clean UI label) - or `stream` for client tools where the user benefits from seeing the arguments and result.
 **Refresh and restore behavior:**
 `stream` is the only mode that preserves the tool result after a page refresh. For all other modes, the result is available during the live session but stripped on refresh. On session restore (when the session expires and is rebuilt from stored `UIMessage[]`), `stream` tools retain their original result while other modes receive a placeholder.

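To make the updated Display Modes table in `04-tools.md` easier to compare at a glance, here is a small model of it in TypeScript. This is only a reading aid under stated assumptions: `ToolDef`, `ToolCardView`, and `describeToolCard` are hypothetical names, the label used for `stream` (the description) is an assumption because the table only says "full visibility", and falling back to the tool name when `title` is missing is also an assumption.

```typescript
type DisplayMode = 'hidden' | 'name' | 'description' | 'stream' | 'title';

// Hypothetical, simplified tool definition based on the Tool Fields table.
interface ToolDef {
  description: string;
  display?: DisplayMode;
  title?: string;
}

// Hypothetical summary of what the user can see for one tool call.
interface ToolCardView {
  label: string | null; // text shown while executing; null means nothing is shown
  showsArguments: boolean; // arguments visible during live streaming
  resultSurvivesRefresh: boolean; // result still present after a page refresh
}

function describeToolCard(name: string, tool: ToolDef): ToolCardView {
  const mode: DisplayMode = tool.display ?? 'description'; // documented default
  switch (mode) {
    case 'hidden':
      return { label: null, showsArguments: false, resultSurvivesRefresh: false };
    case 'name':
      return { label: name, showsArguments: false, resultSurvivesRefresh: false };
    case 'description':
      return { label: tool.description, showsArguments: true, resultSurvivesRefresh: false };
    case 'stream':
      // Assumption: the description is used as the label in stream mode.
      return { label: tool.description, showsArguments: true, resultSurvivesRefresh: true };
    case 'title':
      // Assumption: fall back to the tool name if no title is configured.
      return { label: tool.title ?? name, showsArguments: false, resultSurvivesRefresh: false };
  }
}

const lookup: ToolDef = {
  description: 'Fetch the account record for the given customer id',
  display: 'title',
  title: 'Looking up your account',
};

console.log(describeToolCard('get-account', lookup).label); // prints "Looking up your account"
```
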
package/content/04-protocol/05-skills.md CHANGED

@@ -45,11 +45,12 @@ skills:
 ### Skill Fields
-| Field         | Required | Description                                                                           |
-| ------------- | -------- | ------------------------------------------------------------------------------------- |
-| `display`     | No       | How to show in UI: `hidden`, `name`, `description`, `stream` (default: `description`) |
-| `description` | No       | Custom description shown to users (overrides skill's built-in description)            |
-| `execution`   | No       | Where the skill runs: `sandbox` (default) or `device`                                 |
+| Field         | Required | Description                                                                                    |
+| ------------- | -------- | ---------------------------------------------------------------------------------------------- |
+| `display`     | No       | How to show in UI: `hidden`, `name`, `description`, `stream`, `title` (default: `description`) |
+| `title`       | No       | UI label shown when `display: title` (hides the description and arguments)                     |
+| `description` | No       | Custom description shown to users (overrides skill's built-in description)                     |
+| `execution`   | No       | Where the skill runs: `sandbox` (default) or `device`                                          |
 ### Display Modes
@@ -126,7 +127,7 @@ Skills that have [secrets](#skill-secrets) configured run in **secure mode**, wh
 ## Device Execution
-By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer (VM or desktop) instead.
+By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer instead.
 ```yaml
 skills:
@@ -154,7 +155,7 @@ The generic workspace tools (`octavus_code_run`, `octavus_file_write`, `octavus_
 | Aspect              | Sandbox (default)                  | Device                                                 |
 | ------------------- | ---------------------------------- | ------------------------------------------------------ |
-| **Environment**     | Isolated sandbox                   | Agent's computer (VM or desktop)                       |
+| **Environment**     | Isolated sandbox                   | The agent's computer                                   |
 | **Available tools** | All 6 skill tools                  | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
 | **File access**     | Via `octavus_file_read/write`      | Via device filesystem MCP                              |
 | **Code execution**  | Via `octavus_code_run`             | Via device shell MCP                                   |

package/content/04-protocol/06-handlers.md CHANGED

@@ -224,12 +224,13 @@ For agentic image generation where the LLM decides when to generate, configure `
 Every block has a `display` property:
-| Mode          | Default For               | Behavior          |
-| ------------- | ------------------------- | ----------------- |
-| `hidden`      | add-message               | Not shown to user |
-| `name`        | set-resource              | Shows block name  |
-| `description` | tool-call, generate-image | Shows description |
-| `stream`      | next-message              | Streams content   |
+| Mode          | Default For               | Behavior                        |
+| ------------- | ------------------------- | ------------------------------- |
+| `hidden`      | add-message               | Not shown to user               |
+| `name`        | set-resource              | Shows block name                |
+| `description` | tool-call, generate-image | Shows description               |
+| `stream`      | next-message              | Streams content                 |
+| `title`       | -                         | Shows the block's `title` field |
 ## Complete Example

package/content/04-protocol/07-agent-config.md CHANGED

@@ -21,25 +21,28 @@ agent:
 ## Configuration Options
-| Field            | Required | Description                                                                              |
-| ---------------- | -------- | ---------------------------------------------------------------------------------------- |
-| `model`          | Yes      | Model identifier or variable reference                                                   |
-| `backupModel`    | No       | Backup model for automatic failover on provider errors                                   |
-| `system`         | Yes      | System prompt filename (without .md)                                                     |
-| `input`          | No       | Variables to pass to the system prompt                                                   |
-| `tools`          | No       | List of tools the LLM can call                                                           |
-| `mcpServers`     | No       | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers))           |
-| `skills`         | No       | List of Octavus skills the LLM can use                                                   |
-| `references`     | No       | List of references the LLM can fetch on demand                                           |
-| `sandboxTimeout` | No       | Skill sandbox timeout in ms (default: 5 min, max: 1 hour)                                |
-| `imageModel`     | No       | Image generation model (enables agentic image generation)                                |
-| `webSearch`      | No       | Enable built-in web search tool (provider-agnostic)                                      |
-| `agentic`        | No       | Allow multiple tool call cycles                                                          |
-| `maxSteps`       | No       | Maximum agentic steps (default: 10) - literal or variable reference                      |
-| `temperature`    | No       | Model temperature (0-2), `"off"`, or a variable reference                                |
-| `thinking`       | No       | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |
-| `cache`          | No       | Prompt caching mode: `auto` (default), `extended`, or `off`                              |
-| `anthropic`      | No       | Anthropic-specific options (tools, skills)                                               |
+| Field                 | Required | Description                                                                                                                    |
+| --------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| `model`               | Yes      | Model identifier or variable reference                                                                                         |
+| `backupModel`         | No       | Backup model for automatic failover on provider errors                                                                         |
+| `system`              | Yes      | System prompt filename (without .md)                                                                                           |
+| `input`               | No       | Variables to pass to the system prompt                                                                                         |
+| `tools`               | No       | List of tools the LLM can call                                                                                                 |
+| `mcpServers`          | No       | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers))                                                 |
+| `skills`              | No       | List of Octavus skills the LLM can use                                                                                         |
+| `references`          | No       | List of references the LLM can fetch on demand                                                                                 |
+| `sandboxTimeout`      | No       | Skill sandbox timeout in ms (default: 5 min, max: 1 hour)                                                                      |
+| `imageModel`          | No       | Image generation model (enables agentic image generation)                                                                      |
+| `webSearch`           | No       | Enable built-in web search tool (provider-agnostic)                                                                            |
+| `agentic`             | No       | Allow multiple tool call cycles                                                                                                |
+| `maxSteps`            | No       | Maximum agentic steps (default: 10) - literal or variable reference                                                            |
+| `temperature`         | No       | Model temperature (0-2), `"off"`, or a variable reference                                                                      |
+| `thinking`            | No       | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference                                       |
+| `speed`               | No       | Inference speed for supported Opus models: `fast`/`standard` (see [Fast Mode](/docs/protocol/fast-mode))                       |
+| `cache`               | No       | Prompt caching mode: `auto` (default), `extended`, or `off`                                                                    |
+| `maxToolOutputTokens` | No       | Cap a single tool result at this many tokens in the model view (head+tail preview + note). Omit to leave tool output unbounded |
+| `contextManagement`   | No       | Automatic context-window compaction (see [Context Management](/docs/protocol/context-management))                              |
+| `anthropic`           | No       | Anthropic-specific options (tools, skills)                                                                                     |
 ## Models
@@ -456,7 +459,7 @@ agent:
 ## Dynamic Configuration
-Like `model`, the `temperature`, `thinking`, and `maxSteps` fields can also reference an input variable. Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:
+Like `model`, the `temperature`, `thinking`, `speed`, and `maxSteps` fields can also reference an input variable. Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:
 ```yaml
 input:
@@ -548,9 +551,10 @@ handlers:
     Start summary thread:
       block: start-thread
       thread: summary
-      model: anthropic/claude-sonnet-4-5 # Different model
+      model: anthropic/claude-opus-4-8 # Different model
       backupModel: openai/gpt-4o # Failover model
       thinking: low # Different thinking
+      speed: fast # Fast mode for this thread (supported Opus models only)
       cache: off # Different cache mode (does not inherit from agent)
       maxSteps: 1 # Limit tool calls
       system: escalation-summary # Different prompt
@@ -562,7 +566,7 @@ handlers:
       todoList: true # Thread-specific task list
 ```
-Each thread can have its own model, backup model, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol's `skills:` section. References must exist in the agent's `references/` directory. Workers use this same pattern since they don't have a global `agent:` section.
+Each thread can have its own model, backup model, thinking level, speed, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol's `skills:` section. References must exist in the agent's `references/` directory. Workers use this same pattern since they don't have a global `agent:` section - which is how a worker enables fast mode.
 ## Full Example

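The new `maxToolOutputTokens` row in `07-agent-config.md` describes a "head+tail preview + note". The diff does not show how the runtime implements this, so the following is only a guess at the general idea: whitespace-separated words stand in for real model tokens, and the function name `capToolOutput`, the even head/tail split, and the wording of the note are all assumptions.

```typescript
// Hypothetical sketch of a head+tail cap. The real tokenizer, split ratio,
// and note format used by the runtime are not shown in the diff.
function capToolOutput(output: string, maxTokens: number): string {
  const cap = Math.max(0, Math.floor(maxTokens));
  // Assumption: approximate tokens with whitespace-separated words.
  const tokens = output.split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length <= cap) return output; // under the cap: leave unchanged

  const headCount = Math.ceil(cap / 2);
  const tailCount = Math.floor(cap / 2);
  const head = tokens.slice(0, headCount);
  const tail = tokens.slice(tokens.length - tailCount);
  const omitted = tokens.length - headCount - tailCount;

  // Keep the start and the end, with a note in place of the omitted middle.
  return [head.join(' '), `[... ${omitted} tokens omitted ...]`, tail.join(' ')]
    .filter((part) => part.length > 0)
    .join('\n');
}

console.log(capToolOutput('a b c d e f g h', 4));
// prints:
// a b
// [... 4 tokens omitted ...]
// g h
```

Keeping both ends is a common choice for tool output because the beginning often carries headers or the command echo while the end often carries the final status or error.
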
package/content/04-protocol/08-provider-options.md CHANGED

@@ -58,10 +58,11 @@ anthropic:
       description: Searching... # Custom display text
 ```
-| Field         | Required | Description                                                           |
-| ------------- | -------- | --------------------------------------------------------------------- |
-| `display`     | No       | `hidden`, `name`, `description`, or `stream` (default: `description`) |
-| `description` | No       | Custom text shown to users during execution                           |
+| Field         | Required | Description                                                                    |
+| ------------- | -------- | ------------------------------------------------------------------------------ |
+| `display`     | No       | `hidden`, `name`, `description`, `stream`, or `title` (default: `description`) |
+| `title`       | No       | UI label shown when `display: title` (hides description and arguments)         |
+| `description` | No       | Custom text shown to users during execution                                    |
 ### Web Search
@@ -120,12 +121,13 @@ anthropic:
       description: Processing PDF
 ```
-| Field         | Required | Description                                                           |
-| ------------- | -------- | --------------------------------------------------------------------- |
-| `type`        | Yes      | `anthropic` (built-in) or `custom` (uploaded)                         |
-| `version`     | No       | Skill version (default: `latest`)                                     |
-| `display`     | No       | `hidden`, `name`, `description`, or `stream` (default: `description`) |
-| `description` | No       | Custom text shown to users                                            |
+| Field         | Required | Description                                                                    |
+| ------------- | -------- | ------------------------------------------------------------------------------ |
+| `type`        | Yes      | `anthropic` (built-in) or `custom` (uploaded)                                  |
+| `version`     | No       | Skill version (default: `latest`)                                              |
+| `display`     | No       | `hidden`, `name`, `description`, `stream`, or `title` (default: `description`) |
+| `title`       | No       | UI label shown when `display: title` (hides description and arguments)         |
+| `description` | No       | Custom text shown to users                                                     |
 ### Built-in Skills

package/content/04-protocol/09-skills-advanced.md CHANGED

@@ -333,7 +333,7 @@ When a skill declares secrets and an organization configures them, the skill run
 | Aspect              | Standard Skills          | Secure Skills                                       | Device Skills                                          |
 | ------------------- | ------------------------ | --------------------------------------------------- | ------------------------------------------------------ |
-| **Environment**     | Shared sandbox           | Isolated sandbox (one per skill)                    | Agent's computer (VM or desktop)                       |
+| **Environment**     | Shared sandbox           | Isolated sandbox (one per skill)                    | The agent's computer                                   |
 | **Available tools** | All 6 skill tools        | `skill_read`, `skill_list`, `skill_run` only        | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
 | **Script input**    | CLI arguments via `args` | JSON via stdin (use `input` parameter)              | CLI arguments via `args`                               |
 | **Secrets**         | No secrets               | Secrets as env vars                                 | No secrets                                             |

package/content/04-protocol/11-workers.md CHANGED

@@ -219,21 +219,22 @@ steps:
 All LLM configuration goes here:
-| Field         | Description                                                                            |
-| ------------- | -------------------------------------------------------------------------------------- |
-| `thread`      | Thread name (defaults to block name)                                                   |
-| `model`       | LLM model to use                                                                       |
-| `system`      | System prompt filename (required)                                                      |
-| `input`       | Variables for system prompt                                                            |
-| `tools`       | Tools available in this thread                                                         |
-| `skills`      | Octavus skills available in this thread                                                |
-| `mcpServers`  | MCP servers available in this thread                                                   |
-| `imageModel`  | Image generation model                                                                 |
-| `webSearch`   | Enable built-in web search tool                                                        |
-| `thinking`    | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |
-| `cache`       | Prompt caching mode: `auto` (default), `extended`, or `off`                            |
-| `temperature` | Model temperature (0-2), `"off"`, or variable reference                                |
-| `maxSteps`    | Maximum tool call cycles (enables agentic if > 1), or variable reference               |
+| Field                 | Description                                                                                                                             |
+| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
+| `thread`              | Thread name (defaults to block name)                                                                                                    |
+| `model`               | LLM model to use                                                                                                                        |
+| `system`              | System prompt filename (required)                                                                                                       |
+| `input`               | Variables for system prompt                                                                                                             |
+| `tools`               | Tools available in this thread                                                                                                          |
+| `skills`              | Octavus skills available in this thread                                                                                                 |
+| `mcpServers`          | MCP servers available in this thread                                                                                                    |
+| `imageModel`          | Image generation model                                                                                                                  |
+| `webSearch`           | Enable built-in web search tool                                                                                                         |
+| `thinking`            | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference                                                  |
+| `cache`               | Prompt caching mode: `auto` (default), `extended`, or `off`                                                                             |
+| `temperature`         | Model temperature (0-2), `"off"`, or variable reference                                                                                 |
+| `maxSteps`            | Maximum tool call cycles (enables agentic if > 1), or variable reference                                                                |
+| `maxToolOutputTokens` | Cap a single tool result at this many tokens in the thread's model view (head+tail preview + note). Omit to leave tool output unbounded |
 ## Simple Example
@@ -468,10 +469,11 @@ All standard events (text-delta, tool calls, etc.) are also emitted.
 ## Calling Workers from Interactive Agents
-Interactive agents can call workers in two ways:
+Interactive agents can call workers in three ways:
 1. **Deterministically** - Using the `run-worker` block
 2. **Agentically** - LLM calls worker as a tool
+3. **Automatically** - Octavus invokes the worker as part of a built-in capability, not the model. Context management's `summarizerWorker` (see [Context Management](/docs/protocol/context-management)) works this way: declare it in `workers:` but leave it out of `agent.workers` so the model never sees it as a tool.
 ### Worker Declaration
@@ -528,6 +530,7 @@ Controls how worker execution appears to users. The default for workers is `stre
 | `name`        | Shows a running/done indicator with the worker name. No nested content (text, tool calls, reasoning) is forwarded.                 |
 | `description` | Shows a running/done indicator with the worker description. No nested content is forwarded.                                        |
 | `stream`      | Full visibility. All nested events are forwarded - text, reasoning, tool calls, sources, files. Worker input is included on start. |
+| `title`       | Like `description`, but shows the worker's `title` field instead of its description. No nested content or input is forwarded.      |
 **Progressive input streaming:** When a worker with `display: stream` is invoked agentically (LLM calls it as a tool), the `UIWorkerPart` appears in the UI immediately as the LLM starts generating the worker's arguments. The worker input streams progressively into the worker part, the same way text tokens stream into a text part. Once input finishes, worker execution begins and nested content flows into the same worker part. There is no intermediate tool card.

package/content/04-protocol/13-mcp-servers.md CHANGED

@@ -43,7 +43,8 @@ mcpServers:
 | ------------- | -------- | ---------------------------------------------------------------------------------------------------------------------- |
 | `description` | Yes      | What the MCP server provides                                                                                           |
 | `source`      | Yes      | `remote`, `device`, or `consumer` (see source types above)                                                             |
-| `display`     | No       | How tool calls appear in UI: `hidden`, `name`, `description` (default: `description`)                                  |
+| `display`     | No       | How tool calls appear in UI: `hidden`, `name`, `description`, `stream`, `title` (default: `description`)               |
+| `title`       | No       | UI label shown when `display: title`; applies to every tool in this namespace (hides description and arguments)        |
 | `connection`  | No       | When to connect: `eager` or `lazy` (default: `lazy`). `remote` only.                                                   |
 | `execution`   | No       | Where the MCP process runs: `sandbox` (default) or `device`. `remote` only. See [Device Execution](#device-execution). |

package/content/04-protocol/14-context-management.md ADDED

@@ -0,0 +1,68 @@
+---
+title: Context Management
+description: Automatic context-window compaction so long sessions keep running past the model's limit.
+---
+# Context Management
+Long-running sessions accumulate history - messages, tool results, screenshots, file reads. Once that history exceeds the model's context window, the provider rejects the request and, left unhandled, the session fails. Two [agent config](/docs/protocol/agent-config) knobs make the agent robust to this: `maxToolOutputTokens` caps how much any single tool result puts into context, and `contextManagement` automatically compacts older history as it fills up. Together they keep a long task, a long conversation, or one oversized tool output from ending the session.
+Compaction and bounding transform only what the **model sees** on each request. The stored conversation is never changed - the complete history is always preserved.
+## Configuration
+```yaml
+workers:
+  context-summarizer: # the worker that produces the running summary
+    description: Summarizes earlier conversation to free up context
+    display: description
+agent:
+  model: anthropic/claude-sonnet-4-5
+  system: system
+  maxToolOutputTokens: 300000 # safety cap on a single tool result (no default)
+  # context-summarizer is intentionally NOT listed in agent.workers,
+  # so the model never sees it as a callable tool.
+  contextManagement:
+    summarizerWorker: context-summarizer
+    thresholdPercent: 0.8 # proactive trigger (no default; omit = reactive only)
+    recentPercent: 0.3 # recent window kept verbatim (no default; omit = no summarization)
+```
+`maxToolOutputTokens` is a top-level `agent` field (a sibling of `model` and `system`), because bounding a single tool result is independent of history compaction. Workers set the same cap per thread on their [`start-thread`](/docs/protocol/workers) block. `contextManagement` groups the compaction knobs:
+| Field              | Required | Description                                                                                                          |
+| ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------- |
+| `summarizerWorker` | No       | Slug of a worker (declared in `workers:`) that produces the running summary. Enables summarization-based compaction. |
+| `thresholdPercent` | No       | Fraction of the model's context window at which compaction starts. No default; omit to disable proactive compaction. |
+| `recentPercent`    | No       | Fraction of the context window kept verbatim as the recent window. No default; omit to disable summarization.        |
+| `recentWindow`     | No       | Deprecated and ignored. Superseded by `recentPercent` (a context-window fraction).                                   |
+## How it works
+- When `maxToolOutputTokens` is set, every tool result is **bounded** before it enters the model's view: anything over the budget is replaced with a head-and-tail preview plus a note saying how much was omitted and how to fetch the rest. The full result is still preserved in the stored conversation, so nothing is lost - the model just sees a bounded copy and can narrow, page, or search for more.
+- When `thresholdPercent` is set and the prompt crosses that fraction of the context window, the oldest turns are folded into a **running summary** while the original task and the most-recent turns (`recentPercent` of the context window, a token budget) are kept verbatim - so the agent keeps the goal and full fidelity on what it is doing now. Both are opt-in with no default: omit them and the agent does no proactive compaction, relying on the automatic recovery below.
+- Compaction is **incremental**: each cycle only summarizes the newly-expired turns and folds them into the existing summary, so cost stays bounded no matter how long the session runs.
+- If the model rejects a request for being too long anyway, the agent recovers automatically (it reduces context and retries) rather than failing the session.
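The trigger-and-split behavior in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Octavus runtime: the `Turn` shape, the `planCompaction` helper, and the per-turn token counts are hypothetical, and the first turn is assumed to carry the original task.

```typescript
// Hypothetical turn record; the runtime's real data model is not documented here.
interface Turn {
  id: string;
  tokens: number;
}

interface CompactionPlan {
  task: Turn; // original task, kept verbatim
  toSummarize: Turn[]; // oldest turns, to be folded into the running summary
  recent: Turn[]; // most-recent turns, kept verbatim
}

// Returns null when no proactive compaction is due.
function planCompaction(
  turns: Turn[],
  contextWindow: number,
  thresholdPercent?: number,
  recentPercent?: number,
): CompactionPlan | null {
  // Both knobs are opt-in: without them nothing happens proactively.
  if (thresholdPercent === undefined || recentPercent === undefined) return null;
  if (turns.length < 2) return null;

  // Assumption: "crossing" the threshold means reaching or exceeding it.
  const total = turns.reduce((sum, t) => sum + t.tokens, 0);
  if (total < contextWindow * thresholdPercent) return null;

  const [task, ...rest] = turns;
  const recentBudget = contextWindow * recentPercent;

  // Walk backwards from the newest turn until the recent budget is spent.
  let used = 0;
  let splitAt = rest.length;
  for (let i = rest.length - 1; i >= 0; i--) {
    if (used + rest[i].tokens > recentBudget) break;
    used += rest[i].tokens;
    splitAt = i;
  }

  const toSummarize = rest.slice(0, splitAt);
  if (toSummarize.length === 0) return null;
  return { task, toSummarize, recent: rest.slice(splitAt) };
}
```

In this sketch, only `toSummarize` would be handed to the summarizer on each cycle, together with the existing summary, which is what keeps the cost of each compaction bounded.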
+## Bounded tool output
+Some tool calls return very large output - a big file read, a full-page extract, a large MCP or skill result. Left unbounded, one such call can blow past the context window in a single step. Set `maxToolOutputTokens` on the agent (or, for a worker, on its `start-thread` block) to cap how much of any single result reaches the model, while the full result stays in the stored conversation and the trace.
+There is no default: bounding only happens when you set `maxToolOutputTokens`, so the runtime never silently truncates output you did not ask it to. When a result is truncated, the model is always told what was omitted and how to retrieve it, so it can decide to narrow the request, paginate, or read a specific range.
+Bounding is never hidden: each time a tool result first crosses the budget, a `tool-output-bounded` entry is recorded in the session's execution logs with the tool name, the original size, and the cap. The full, untruncated result stays in the corresponding `tool-result` entry, so you can always see both what the model saw and the complete output.
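As an illustration of the head-and-tail preview described in this section, the sketch below bounds a single tool result. It is a simplified model, not the runtime's implementation: `boundToolOutput` is a hypothetical name, tokens are approximated as four characters each, the budget is split evenly between head and tail, and the wording of the note is invented.

```typescript
// Assumption: roughly 4 characters per token; the real runtime uses its own tokenizer.
const CHARS_PER_TOKEN = 4;

interface BoundedOutput {
  text: string; // what the model would see
  bounded: boolean; // true when the result was truncated
  omittedChars: number; // how much of the original was left out
}

function boundToolOutput(full: string, maxToolOutputTokens?: number): BoundedOutput {
  // No cap configured: output passes through untouched (there is no default).
  if (maxToolOutputTokens === undefined) {
    return { text: full, bounded: false, omittedChars: 0 };
  }
  const budgetChars = maxToolOutputTokens * CHARS_PER_TOKEN;
  if (full.length <= budgetChars) {
    return { text: full, bounded: false, omittedChars: 0 };
  }
  // Split the budget evenly between the start and the end of the result.
  const headLen = Math.ceil(budgetChars / 2);
  const tailLen = Math.floor(budgetChars / 2);
  const omittedChars = full.length - headLen - tailLen;
  const note = `\n[... ${omittedChars} characters omitted; narrow, page, or search to see more ...]\n`;
  const tail = tailLen > 0 ? full.slice(full.length - tailLen) : '';
  return { text: full.slice(0, headLen) + note + tail, bounded: true, omittedChars };
}
```

The caller keeps the original `full` string alongside the bounded copy, mirroring how the complete result stays in the stored conversation while the model only sees the preview.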
+## The summarizer worker
+`summarizerWorker` points at a worker you define and ship like any other (see [Workers](/docs/protocol/workers)). It takes two inputs - `PREVIOUS_SUMMARY` (the running summary so far) and `CONVERSATION` (the older turns to fold in) - and returns the updated summary.
+Summarization is gated on its sizing knobs: the summarizer worker only runs if you also set `recentPercent` (the recent window it folds around), and it only runs **proactively** if you also set `thresholdPercent`. If you set a worker without `recentPercent`, it never runs - validation warns you about this.
+Declare it in the top-level `workers:` section so it can be resolved, but keep it **out** of `agent.workers`: that list is what the model can call as a tool, and the summarizer is invoked automatically, never chosen by the model.
+Without a `summarizerWorker`, the agent still recovers from a context overflow by reducing older tool results, but it won't produce a summary of earlier turns.
+## What users see
+Because the summarizer is a worker, it surfaces like any other worker, following its `display` mode (a subtle `description` indicator by default). Compaction is otherwise seamless - the conversation reads as one continuous thread and the complete history is preserved.

package/content/04-protocol/15-fast-mode.md ADDED

@@ -0,0 +1,77 @@
+---
+title: Fast Mode
+description: Run supported Anthropic Opus models at higher output speed for latency-sensitive agents.
+---
+# Fast Mode
+Fast mode runs a supported Anthropic Opus model with a faster inference configuration - higher output tokens per second, same weights and behavior - at premium pricing. Enable it with the `speed` field in the [agent config](/docs/protocol/agent-config):
+```yaml
+agent:
+  model: anthropic/claude-opus-4-8
+  speed: fast # fast | standard (default)
+```
+| Mode       | Behavior                                                     | When to use                                                                         |
+| ---------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------- |
+| `standard` | Default speed and pricing. Used whenever `speed` is omitted. | Most agents.                                                                        |
+| `fast`     | Higher output speed at a premium per-token rate.             | Latency-sensitive, interactive agents where faster responses are worth the premium. |
+Fast mode is orthogonal to thinking - it's a speed/price knob, not an intelligence one, and keeps full reasoning.
+## Supported models
+Fast mode only applies to **Anthropic Opus 4.8, 4.7, and 4.6**. On any other model or provider it is a **no-op**: the request runs at standard speed and price, and never errors. This makes it safe to leave `speed: fast` set when using a dynamic model (resolved from input) that might turn out not to support it.
+When you set `speed: fast` on a literal model that does not support it, the protocol validator surfaces a non-fatal warning in the dashboard.
+## Premium pricing
+Fast mode applies a per-model multiplier over the model's standard rates, to both input and output across the full context window:
+| Model          | Fast-mode cost |
+| -------------- | -------------- |
+| Opus 4.8       | ~2x standard   |
+| Opus 4.7 / 4.6 | ~6x standard   |
+Prompt-caching costs continue to apply on top of the fast-mode base rates. Billing always reflects the speed a request **actually** ran at: a request that falls back to standard speed (see below) is billed at standard rates, so requesting fast never triggers premium billing by itself.
+## Rate limits and fallback
+Fast mode has a dedicated rate limit, separate from standard Opus limits. When it is exhausted the agent degrades gracefully instead of failing: the request automatically retries at standard speed on the same model, then falls back to your configured [backup model](/docs/protocol/agent-config) if needed, before surfacing an error.
+Falling back to standard speed is a prompt-cache miss, since fast and standard requests do not share cached prefixes. The fallback is recorded in the session trace, so it is clear when a request that asked for fast ran at standard (or on the backup model) and why.
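The degradation order described above (fast, then standard on the same model, then the backup model) could be modeled as in the sketch below. Everything here is hypothetical: `runWithFallback`, the `Attempt` and `CallResult` shapes, and the trace strings are illustrative names, and the sketch assumes every failure is a rate limit, which the real runtime need not do.

```typescript
type Speed = 'fast' | 'standard';

interface Attempt {
  model: string;
  speed: Speed;
}

// Hypothetical result shape: rate limiting is reported as a value, not an exception.
type CallResult<T> = { ok: true; value: T } | { ok: false; reason: 'rate-limited' };

async function runWithFallback<T>(
  call: (attempt: Attempt) => Promise<CallResult<T>>,
  model: string,
  backupModel?: string,
): Promise<{ value: T; ranAs: Attempt; trace: string[] }> {
  // Order of attempts: fast, standard on the same model, then the backup model.
  const attempts: Attempt[] = [
    { model, speed: 'fast' },
    { model, speed: 'standard' },
  ];
  if (backupModel !== undefined) {
    attempts.push({ model: backupModel, speed: 'standard' });
  }

  const trace: string[] = [];
  for (const attempt of attempts) {
    const result = await call(attempt);
    if (result.ok) return { value: result.value, ranAs: attempt, trace };
    // Record why this attempt was skipped, so the fallback is visible afterwards.
    trace.push(`${attempt.model} at ${attempt.speed} speed: ${result.reason}`);
  }
  throw new Error(`All attempts were rate limited: ${trace.join('; ')}`);
}
```

Returning `ranAs` alongside the value reflects the billing rule in this document: cost follows the speed the request actually ran at, not the speed that was requested.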
+## Routing
+A supported Opus model can be reached through more than one provider, and fast mode is expressed differently on each - the `speed` field handles the translation:
+| Route             | Example model                               | How fast mode is enabled                                          |
+| ----------------- | ------------------------------------------- | ----------------------------------------------------------------- |
+| Direct Anthropic  | `anthropic/claude-opus-4-8`                 | `speed: fast`                                                     |
+| Vercel AI Gateway | `vercel/anthropic/claude-opus-4.7`          | `speed: fast`                                                     |
+| OpenRouter        | `openrouter/anthropic/claude-opus-4.8-fast` | Select the dedicated `-fast` model slug (`speed` is ignored here) |
+## Passing speed as input
+Like `thinking`, `speed` accepts a variable reference so consumers choose it per session:
+```yaml
+input:
+  SPEED:
+    type: string
+    description: Inference speed (fast/standard)
+    optional: true
+agent:
+  model: anthropic/claude-opus-4-8
+  speed: SPEED # Resolved from session input; unset -> standard
+  system: system
+```
+An unset optional variable resolves to `standard`, so existing agents are never silently upgraded to premium pricing.
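A rough sketch of how these resolution rules fit together with the no-op behavior on unsupported models is shown below. The helper names and the slug pattern are assumptions made for illustration; the pattern only covers the direct Anthropic and Vercel AI Gateway slugs shown in this document, and treating any value other than `fast` as `standard` is also an assumption.

```typescript
type Speed = 'fast' | 'standard';

// Assumption: fast-capable models are recognized by slug suffix (Opus 4.6, 4.7, 4.8).
// The pattern is illustrative only and is not the validator's real rule.
const FAST_CAPABLE = /claude-opus-4[-.][678]$/;

// Resolve a speed variable from session input; unset resolves to standard.
function resolveSpeed(sessionInput: Record<string, unknown>, variable: string): Speed {
  return sessionInput[variable] === 'fast' ? 'fast' : 'standard';
}

// Fast mode is a no-op on unsupported models: the request runs at standard speed.
function effectiveSpeed(model: string, requested: Speed): Speed {
  return requested === 'fast' && FAST_CAPABLE.test(model) ? 'fast' : 'standard';
}
```

For example, `effectiveSpeed('anthropic/claude-sonnet-4-5', resolveSpeed({ SPEED: 'fast' }, 'SPEED'))` yields `standard` in this sketch, matching the rule that requesting fast on an unsupported model never errors.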
+## Scope
+`speed` follows the same scoping as `thinking`: set it at agent scope (the main thread default) or per named thread in a `start-thread` block (see [Thread-Specific Config](/docs/protocol/agent-config)). Because worker agents configure everything through their thread, that is also how a worker enables fast mode. Thread settings take precedence over the agent default.

package/content/07-workforce-agents/01-overview.md ADDED

@@ -0,0 +1,60 @@
+---
+title: Overview
+description: What Workforce Agents are and how to drive them programmatically.
+---
+# Workforce Agents
+Workforce Agents are Octavus's autonomous AI teammates - specialized agents you hire and configure in the dashboard, each with its own computer, skills, and tools. Browse and hire them at [octavus.ai/agents](https://octavus.ai/agents), and see the full roster at [octavus.ai/agents/discover](https://octavus.ai/agents/discover).
+The **Workforce Agents API** lets you drive one of your agents without a browser: give it a task, wait for it to finish, and read what it did. It is built for automation - CI pipelines, scheduled jobs, and backend integrations that hand work to an agent and collect the result.
+> Workforce Agents are distinct from agents you build yourself with the SDK. The [Sessions API](/docs/api-reference/sessions) drives agents you define and host; the Workforce Agents API drives the managed agents you hire and configure in the dashboard.
+## How it works
+A run follows a simple dispatch-then-poll model - there is no stream or webhook to manage:
+1. **Dispatch** a message to an agent. This starts a **thread** (a conversation) and returns a `threadId` right away.
+2. **Poll** the thread until its `status` is terminal.
+3. **Read** the thread's messages to get the agent's work.
+```mermaid
+sequenceDiagram
+  participant C as Your code
+  participant A as Workforce Agent
+  C->>A: dispatch a task
+  A-->>C: threadId and status
+  loop until terminal status
+    C->>A: get thread
+    A-->>C: status and messages
+  end
+```
+### Thread status
+| Status      | Meaning                                                                             |
+| ----------- | ----------------------------------------------------------------------------------- |
+| `pending`   | Dispatched, waiting to start                                                        |
+| `queued`    | Waiting for the agent to free up - an agent runs one task at a time on its computer |
+| `running`   | The agent is working                                                                |
+| `completed` | Finished successfully (terminal)                                                    |
+| `failed`    | Ended with an error - see `failureReason` (terminal)                                |
+| `cancelled` | Stopped (terminal)                                                                  |
+Poll while the status is `pending`, `queued`, or `running`, and stop once it is `completed`, `failed`, or `cancelled`.
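The polling rule above can be sketched as a small loop. This is a hedged example, not the SDK's run-and-wait helper: `waitForThread`, `ThreadSnapshot`, and the injected `getThread` function are hypothetical stand-ins for whatever "get thread" call you use (SDK or REST), and the interval and poll limit are arbitrary defaults.

```typescript
type ThreadStatus = 'pending' | 'queued' | 'running' | 'completed' | 'failed' | 'cancelled';

// Hypothetical snapshot shape; the real response carries more fields (such as messages).
interface ThreadSnapshot {
  status: ThreadStatus;
  failureReason?: string;
}

const TERMINAL: ThreadStatus[] = ['completed', 'failed', 'cancelled'];

function isTerminal(status: ThreadStatus): boolean {
  return TERMINAL.indexOf(status) !== -1;
}

// Polls until the thread reaches a terminal status or the poll limit is hit.
async function waitForThread(
  getThread: (threadId: string) => Promise<ThreadSnapshot>,
  threadId: string,
  intervalMs = 2000,
  maxPolls = 300,
): Promise<ThreadSnapshot> {
  for (let i = 0; i < maxPolls; i++) {
    const thread = await getThread(threadId);
    if (isTerminal(thread.status)) return thread;
    // Wait before the next poll so the API is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Thread ${threadId} did not finish after ${maxPolls} polls`);
}
```

A caller would check `status` on the returned snapshot and, for `failed`, inspect `failureReason` before reading the thread's messages.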
+## Authentication
+Each agent has its own **API key**, created from the agent's settings in the dashboard (Settings -> API). The key:
+- Drives only that one agent - a key for one agent can never call another.
+- Is a secret. Use it from a backend, script, or CI, never in a browser or client app.
+- Is sent as a bearer token: `Authorization: Bearer oct_agt_...`.
+Create a key, copy it once (it is shown only at creation), and store it securely.
+## Next steps
+- [Using the SDK](/docs/workforce-agents/sdk) - the `@octavus/server-sdk` `workforce` client, including a run-and-wait helper.
+- [API reference](/docs/workforce-agents/api-reference) - the REST endpoints, for any language.