@octavus/docs 4.1.0 → 5.0.0
- package/content/02-server-sdk/02-sessions.md +35 -0
- package/content/03-client-sdk/02-messages.md +24 -1
- package/content/04-protocol/05-skills.md +4 -2
- package/content/04-protocol/07-agent-config.md +27 -23
- package/content/04-protocol/09-skills-advanced.md +1 -1
- package/content/04-protocol/11-workers.md +18 -16
- package/content/04-protocol/14-context-management.md +68 -0
- package/content/04-protocol/15-fast-mode.md +77 -0
- package/dist/{chunk-V4C4VGHD.js → chunk-Z2OPVMHI.js} +59 -23
- package/dist/chunk-Z2OPVMHI.js.map +1 -0
- package/dist/content.js +1 -1
- package/dist/docs.json +29 -11
- package/dist/index.js +1 -1
- package/dist/search-index.json +1 -1
- package/dist/search.js +1 -1
- package/dist/search.js.map +1 -1
- package/dist/sections.json +29 -11
- package/package.json +1 -1
- package/dist/chunk-V4C4VGHD.js.map +0 -1
@@ -23,7 +23,7 @@ var docs_default = [
 section: "server-sdk",
 title: "Overview",
 description: "Introduction to the Octavus Server SDK for backend integration.",
-
content: "\n# Server SDK Overview\n\nThe `@octavus/server-sdk` package provides a Node.js SDK for integrating Octavus agents into your backend application. It handles session management, streaming, and the tool execution continuation loop.\n\n**Current version:** `
+
content: "\n# Server SDK Overview\n\nThe `@octavus/server-sdk` package provides a Node.js SDK for integrating Octavus agents into your backend application. It handles session management, streaming, and the tool execution continuation loop.\n\n**Current version:** `5.0.0`\n\n## Installation\n\n```bash\nnpm install @octavus/server-sdk\n```\n\nFor agent management (sync, validate), install the CLI as a dev dependency:\n\n```bash\nnpm install --save-dev @octavus/cli\n```\n\n## Basic Usage\n\n```typescript\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst client = new OctavusClient({\n baseUrl: 'https://octavus.ai',\n apiKey: 'your-api-key',\n});\n```\n\n## Key Features\n\n### Agent Management\n\nAgent definitions are managed via the CLI. See the [CLI documentation](/docs/server-sdk/cli) for details.\n\n```bash\n# Sync agent from local files\noctavus sync ./agents/support-chat\n\n# Output: Created: support-chat\n# Agent ID: clxyz123abc456\n```\n\n### Session Management\n\nCreate and manage agent sessions using the agent ID:\n\n```typescript\n// Create a new session (use agent ID from CLI sync)\nconst sessionId = await client.agentSessions.create('clxyz123abc456', {\n COMPANY_NAME: 'Acme Corp',\n PRODUCT_NAME: 'Widget Pro',\n});\n\n// Get UI-ready session messages (for session restore)\nconst session = await client.agentSessions.getMessages(sessionId);\n```\n\n### Tool Handlers\n\nTools run on your server with your data:\n\n```typescript\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n 'get-user-account': async (args) => {\n // Access your database, APIs, etc.\n return await db.users.findById(args.userId);\n },\n },\n});\n```\n\n### Streaming\n\nAll responses stream in real-time:\n\n```typescript\nimport { toSSEStream } from '@octavus/server-sdk';\n\n// execute() returns an async generator of events\nconst events = session.execute({\n type: 'trigger',\n triggerName: 'user-message',\n input: { USER_MESSAGE: 'Hello!' 
},\n});\n\n// Convert to SSE stream for HTTP responses\nreturn new Response(toSSEStream(events), {\n headers: { 'Content-Type': 'text/event-stream' },\n});\n```\n\n### Computer Capabilities\n\nGive agents access to browser, filesystem, and shell via MCP:\n\n```typescript\nimport { Computer } from '@octavus/computer';\n\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', ['--browser-url=...']),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [dir]),\n shell: Computer.shell({ cwd: dir, mode: 'unrestricted' }),\n },\n});\n\nawait computer.start();\n\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n 'set-chat-title': async (args) => ({ title: args.title }),\n },\n});\n\nsession.setDynamicTools(computer);\n```\n\n### Workers\n\nExecute worker agents for task-based processing:\n\n```typescript\n// Non-streaming: get the output directly\nconst { output } = await client.workers.generate(agentId, {\n TOPIC: 'AI safety',\n});\n\n// Streaming: observe events in real-time\nfor await (const event of client.workers.execute(agentId, input)) {\n // Handle stream events\n}\n```\n\n## API Reference\n\n### OctavusClient\n\nThe main entry point for interacting with Octavus.\n\n```typescript\ninterface OctavusClientConfig {\n baseUrl: string; // Octavus API URL\n apiKey?: string; // Your API key\n traceModelRequests?: boolean; // Enable model request tracing (default: false)\n maxRetries?: number; // Retries for transient network failures during streaming (default: 2, set to 0 to disable)\n}\n\nclass OctavusClient {\n readonly agents: AgentsApi;\n readonly agentSessions: AgentSessionsApi;\n readonly workers: WorkersApi;\n readonly files: FilesApi;\n\n constructor(config: OctavusClientConfig);\n}\n```\n\n### AgentSessionsApi\n\nManages agent sessions.\n\n```typescript\nclass AgentSessionsApi {\n // Create a new session\n async create(agentId: string, input?: Record<string, unknown>): 
Promise<string>;\n\n // Get full session state (for debugging/internal use)\n async get(sessionId: string): Promise<SessionState>;\n\n // Get UI-ready messages (for client display)\n async getMessages(sessionId: string): Promise<UISessionState>;\n\n // Attach to a session for triggering\n attach(sessionId: string, options?: SessionAttachOptions): AgentSession;\n}\n\n// Full session state (internal format)\ninterface SessionState {\n id: string;\n agentId: string;\n input: Record<string, unknown>;\n variables: Record<string, unknown>;\n resources: Record<string, unknown>;\n messages: ChatMessage[]; // Internal message format\n createdAt: string;\n updatedAt: string;\n}\n\n// UI-ready session state\ninterface UISessionState {\n sessionId: string;\n agentId: string;\n messages: UIMessage[]; // UI-ready messages for frontend\n}\n```\n\n### AgentSession\n\nHandles request execution and streaming for a specific session.\n\n```typescript\nclass AgentSession {\n // Execute a request and stream parsed events\n execute(request: SessionRequest, options?: TriggerOptions): AsyncGenerator<StreamEvent>;\n\n // Get the session ID\n getSessionId(): string;\n\n // Register dynamic tools (e.g., pass a Computer or explicit DynamicTool[])\n setDynamicTools(source: ToolProvider | DynamicTool[]): void;\n}\n\ntype SessionRequest = TriggerRequest | ContinueRequest;\n\ninterface TriggerRequest {\n type: 'trigger';\n triggerName: string;\n input?: Record<string, unknown>;\n}\n\ninterface ContinueRequest {\n type: 'continue';\n executionId: string;\n toolResults: ToolResult[];\n}\n\n// Helper to convert events to SSE stream\nfunction toSSEStream(events: AsyncIterable<StreamEvent>): ReadableStream<Uint8Array>;\n```\n\n### FilesApi\n\nHandles file uploads for sessions.\n\n```typescript\nclass FilesApi {\n // Get presigned URLs for file uploads\n async getUploadUrls(sessionId: string, files: FileUploadRequest[]): Promise<UploadUrlsResponse>;\n}\n\ninterface FileUploadRequest {\n filename: 
string;\n mediaType: string;\n size: number;\n}\n\ninterface UploadUrlsResponse {\n files: {\n id: string; // File ID for references\n uploadUrl: string; // PUT to this URL\n downloadUrl: string; // GET URL after upload\n }[];\n}\n```\n\nThe client uploads files directly to S3 using the presigned upload URL. See [File Uploads](/docs/client-sdk/file-uploads) for the full integration pattern.\n\n## Next Steps\n\n- [Sessions](/docs/server-sdk/sessions) - Deep dive into session management\n- [Tools](/docs/server-sdk/tools) - Implementing tool handlers\n- [Streaming](/docs/server-sdk/streaming) - Understanding stream events\n- [Workers](/docs/server-sdk/workers) - Executing worker agents\n- [Debugging](/docs/server-sdk/debugging) - Model request tracing and debugging\n- [Computer](/docs/server-sdk/computer) - Browser, filesystem, and shell via MCP\n",
 excerpt: "Server SDK Overview The package provides a Node.js SDK for integrating Octavus agents into your backend application. It handles session management, streaming, and the tool execution continuation...",
 order: 1
 },
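The `SessionRequest` union in the Overview content above is what `session.execute()` accepts, and the Sessions page in this package passes a parsed HTTP body straight through to it. A runtime check before that passthrough could look like the sketch below. `isSessionRequest` and `isRecord` are hypothetical helpers, not SDK exports, and the interfaces are local copies of the shapes shown in this diff, with `ToolResult` left opaque because its fields are not shown here.

```typescript
// Local copies of the request shapes from the Overview API reference.
// ToolResult is opaque here because its fields are not shown in this diff.
type ToolResult = unknown;

interface TriggerRequest {
  type: 'trigger';
  triggerName: string;
  input?: Record<string, unknown>;
}

interface ContinueRequest {
  type: 'continue';
  executionId: string;
  toolResults: ToolResult[];
}

type SessionRequest = TriggerRequest | ContinueRequest;

// True for plain objects; rejects null and arrays.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical helper (not part of @octavus/server-sdk): returns true only
// when `body` matches one of the two documented request variants.
function isSessionRequest(body: unknown): body is SessionRequest {
  if (!isRecord(body)) return false;

  switch (body.type) {
    case 'trigger':
      return (
        typeof body.triggerName === 'string' &&
        (body.input === undefined || isRecord(body.input))
      );
    case 'continue':
      return typeof body.executionId === 'string' && Array.isArray(body.toolResults);
    default:
      // Unknown or missing `type`, e.g. a socket-only message such as 'stop'.
      return false;
  }
}
```

In a route handler, a failed check would typically map to a 400 response before `attach()` is called. Fields outside the shapes listed above are not inspected by this sketch.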
@@ -32,7 +32,7 @@ var docs_default = [
 section: "server-sdk",
 title: "Sessions",
 description: "Managing agent sessions with the Server SDK.",
-
content: "\n# Sessions\n\nSessions represent conversations with an agent. They store conversation history, track resources and variables, and enable stateful interactions.\n\n## Creating Sessions\n\nCreate a session by specifying the agent ID and initial input variables:\n\n```typescript\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n});\n\n// Create a session with the support-chat agent\nconst sessionId = await client.agentSessions.create('support-chat', {\n COMPANY_NAME: 'Acme Corp',\n PRODUCT_NAME: 'Widget Pro',\n USER_ID: 'user-123', // Optional inputs\n});\n\nconsole.log('Session created:', sessionId);\n```\n\n## Getting Session Messages\n\nTo restore a conversation on page load, use `getMessages()` to retrieve UI-ready messages:\n\n```typescript\nconst session = await client.agentSessions.getMessages(sessionId);\n\nconsole.log({\n sessionId: session.sessionId,\n agentId: session.agentId,\n messages: session.messages.length, // UIMessage[] ready for frontend\n});\n```\n\nThe returned messages can be passed directly to the client SDK's `initialMessages` option.\n\n### UISessionState Interface\n\n```typescript\ninterface UISessionState {\n sessionId: string;\n agentId: string;\n messages: UIMessage[]; // UI-ready conversation history\n}\n```\n\n## Full Session State (Debug)\n\nFor debugging or internal use, you can retrieve the complete session state including all variables and internal message format:\n\n```typescript\nconst state = await client.agentSessions.get(sessionId);\n\nconsole.log({\n id: state.id,\n agentId: state.agentId,\n messages: state.messages.length, // ChatMessage[] (internal format)\n resources: state.resources,\n variables: state.variables,\n createdAt: state.createdAt,\n updatedAt: state.updatedAt,\n});\n```\n\n> **Note**: Use `getMessages()` for client-facing code. 
The `get()` method returns internal message format that includes hidden content not intended for end users.\n\n## Getting Execution Logs\n\n`getLogs()` returns the chronological execution trace for a session - triggers, messages, tool calls, LLM responses, errors, and other events emitted while the agent ran. Useful for debugging, observability, and building custom timeline views.\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status === 'expired') {\n console.log('Session expired:', result.sessionId);\n} else {\n for (const entry of result.entries) {\n console.log(entry.type, entry.timestamp);\n }\n}\n```\n\nEach entry is a typed variant of `ExecutionLogEntry` (a discriminated union) so consumers can narrow on `entry.type`:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status !== 'expired') {\n const toolCalls = result.entries.filter((e) => e.type === 'tool-call');\n for (const call of toolCalls) {\n // call.toolName, call.toolArguments are typed without optional chaining\n console.log(call.toolName, call.toolArguments);\n }\n}\n```\n\n### Excluding Model Request Payloads\n\nModel-request entries include the full provider request body and can be large. Pass `excludeModelRequests: true` to skip them:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId, {\n excludeModelRequests: true,\n});\n```\n\n### Truncation\n\nResponses are capped at 1000 entries (most recent). 
When the log exceeds that cap, the response includes `total` and `truncated` so consumers can detect this:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status !== 'expired' && result.truncated) {\n console.warn(`Showing latest 1000 of ${result.total} entries`);\n}\n```\n\n### Response Types\n\n| Status | Type | Description |\n| --------- | --------------------- | -------------------------------------------------------------------------------------------- |\n| `active` | `ExecutionLogsResult` | `{ sessionId, entries, total?, truncated? }`. `total` and `truncated` are present when known |\n| `expired` | `ExpiredSessionState` | `{ sessionId, agentId, status: 'expired', createdAt }` |\n\n> **Forward-compatible types**: `ExecutionLogEntry` may gain new variants over time. Include a `default` case when switching on `entry.type` so unknown variants are handled gracefully.\n\n## Attaching to Sessions\n\nTo trigger actions on a session, you need to attach to it first:\n\n```typescript\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n // Tool handlers (see Tools documentation)\n },\n resources: [\n // Resource watchers (optional)\n ],\n});\n```\n\n### Attach Options\n\n| Option | Type | Description |\n| ----------------------- | --------------------------------- | ------------------------------------------------------------------------------- |\n| `tools` | `ToolHandlers` | Server-side tool handler functions |\n| `resources` | `Resource[]` | Resource watchers for real-time updates |\n| `onToolResults` | `(results: ToolResult[]) => void` | Callback invoked after server-side tool results are produced |\n| `rejectClientToolCalls` | `boolean` | If `true`, reject tool calls that have no server handler (no client forwarding) |\n\nFor MCP tool integration (browser, filesystem, shell via `@octavus/computer`), register dynamic tools after attaching with `session.setDynamicTools()`. 
See [Computer](/docs/server-sdk/computer) for details.\n\n## Executing Requests\n\nOnce attached, execute requests on the session using `execute()`:\n\n```typescript\nimport { toSSEStream } from '@octavus/server-sdk';\n\n// execute() handles both triggers and client tool continuations\nconst events = session.execute(\n { type: 'trigger', triggerName: 'user-message', input: { USER_MESSAGE: 'Hello!' } },\n { signal: request.signal },\n);\n\n// Convert to SSE stream for HTTP responses\nreturn new Response(toSSEStream(events), {\n headers: { 'Content-Type': 'text/event-stream' },\n});\n```\n\n### Request Types\n\nThe `execute()` method accepts a discriminated union:\n\n```typescript\ntype SessionRequest = TriggerRequest | ContinueRequest;\n\n// Start a new conversation turn\ninterface TriggerRequest {\n type: 'trigger';\n triggerName: string;\n input?: Record<string, unknown>;\n rollbackAfterMessageId?: string | null; // For retry: truncate messages after this ID\n}\n\n// Continue after client-side tool handling\ninterface ContinueRequest {\n type: 'continue';\n executionId: string;\n toolResults: ToolResult[];\n}\n```\n\nThis makes it easy to pass requests through from the client:\n\n```typescript\n// Simple passthrough from HTTP request body\nexport async function POST(request: Request) {\n const body = await request.json();\n const { sessionId, ...payload } = body;\n\n const session = client.agentSessions.attach(sessionId, {\n tools: {\n /* ... */\n },\n });\n const events = session.execute(payload, { signal: request.signal });\n\n return new Response(toSSEStream(events));\n}\n```\n\n### Stop Support\n\nPass an abort signal to allow clients to stop generation:\n\n```typescript\nconst events = session.execute(request, {\n signal: request.signal, // Forward the client's abort signal\n});\n```\n\nWhen the client aborts the request, the signal propagates through to the LLM provider, stopping generation immediately. 
Any partial content is preserved.\n\n## WebSocket Handling\n\nFor WebSocket integrations, use `handleSocketMessage()` which manages abort controller lifecycle internally:\n\n```typescript\nimport type { SocketMessage } from '@octavus/server-sdk';\n\n// In your socket handler\nconn.on('data', async (rawData: string) => {\n const msg = JSON.parse(rawData);\n\n if (msg.type === 'trigger' || msg.type === 'continue' || msg.type === 'stop') {\n await session.handleSocketMessage(msg as SocketMessage, {\n onEvent: (event) => conn.write(JSON.stringify(event)),\n onFinish: async () => {\n // Fetch and persist messages to your database for restoration\n },\n });\n }\n});\n```\n\nThe `handleSocketMessage()` method:\n\n- Handles `trigger`, `continue`, and `stop` messages\n- Automatically aborts previous requests when a new one arrives\n- Streams events via the `onEvent` callback\n- Calls `onFinish` after streaming completes (not called if aborted)\n\nSee [Socket Chat Example](/docs/examples/socket-chat) for a complete implementation.\n\n## Session Lifecycle\n\n```mermaid\nflowchart TD\n A[1. CREATE] --> B[2. ATTACH]\n B --> C[3. TRIGGER]\n C --> C\n C --> D[4. RETRIEVE]\n D --> C\n C --> E[5. EXPIRE]\n C --> G[5b. CLEAR]\n G --> F\n E --> F{6. 
RESTORE?}\n F -->|Yes| C\n F -->|No| A\n\n A -.- A1[\"`**client.agentSessions.create()**\n Returns sessionId\n Initializes state`\"]\n\n B -.- B1[\"`**client.agentSessions.attach()**\n Configure tool handlers\n Configure resource watchers`\"]\n\n C -.- C1[\"`**session.execute()**\n Execute request\n Stream events\n Update state`\"]\n\n D -.- D1[\"`**client.agentSessions.getMessages()**\n Get UI-ready messages\n Check session status`\"]\n\n E -.- E1[\"`Sessions expire after\n 24 hours (configurable)`\"]\n\n G -.- G1[\"`**client.agentSessions.clear()**\n Programmatically clear state\n Session becomes expired`\"]\n\n F -.- F1[\"`**client.agentSessions.restore()**\n Restore from stored messages\n Or create new session`\"]\n```\n\n## Session Expiration\n\nSessions expire after a period of inactivity (default: 24 hours). When you call `getMessages()` or `get()`, the response includes a `status` field:\n\n```typescript\nconst result = await client.agentSessions.getMessages(sessionId);\n\nif (result.status === 'expired') {\n // Session has expired - restore or create new\n console.log('Session expired:', result.sessionId);\n} else {\n // Session is active\n console.log('Messages:', result.messages.length);\n}\n```\n\n### Response Types\n\n| Status | Type | Description |\n| --------- | --------------------- | ------------------------------------------------------------- |\n| `active` | `UISessionState` | Session is active, includes `messages` array |\n| `expired` | `ExpiredSessionState` | Session expired, includes `sessionId`, `agentId`, `createdAt` |\n\n## Persisting Chat History\n\nTo enable session restoration, store the chat messages in your own database after each interaction:\n\n```typescript\n// After each trigger completes, save messages\nconst result = await client.agentSessions.getMessages(sessionId);\n\nif (result.status === 'active') {\n // Store in your database\n await db.chats.update({\n where: { id: chatId },\n data: {\n sessionId: result.sessionId,\n 
messages: result.messages, // Store UIMessage[] as JSON\n },\n });\n}\n```\n\n> **Best Practice**: Store the full `UIMessage[]` array. This preserves all message parts (text, tool calls, files, etc.) needed for accurate restoration.\n\n## Restoring Sessions\n\nWhen a user returns to your app:\n\n```typescript\n// 1. Load stored data from your database\nconst chat = await db.chats.findUnique({ where: { id: chatId } });\n\n// 2. Check if session is still active\nconst result = await client.agentSessions.getMessages(chat.sessionId);\n\nif (result.status === 'active') {\n // Session is active - use it directly\n return {\n sessionId: result.sessionId,\n messages: result.messages,\n };\n}\n\n// 3. Session expired - restore from stored messages\nif (chat.messages && chat.messages.length > 0) {\n const restored = await client.agentSessions.restore(\n chat.sessionId,\n chat.messages,\n { COMPANY_NAME: 'Acme Corp' }, // Optional: same input as create()\n );\n\n if (restored.restored) {\n // Session restored successfully\n return {\n sessionId: restored.sessionId,\n messages: chat.messages,\n };\n }\n}\n\n// 4. 
Cannot restore - create new session\nconst newSessionId = await client.agentSessions.create('support-chat', {\n COMPANY_NAME: 'Acme Corp',\n});\n\nreturn {\n sessionId: newSessionId,\n messages: [],\n};\n```\n\n### Restore Response\n\n```typescript\ninterface RestoreSessionResult {\n sessionId: string;\n restored: boolean; // true if restored, false if session was already active\n}\n```\n\n## Complete Example\n\nHere's a complete session management flow:\n\n```typescript\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n});\n\nasync function getOrCreateSession(chatId: string, agentId: string, input: Record<string, unknown>) {\n // Load existing chat data\n const chat = await db.chats.findUnique({ where: { id: chatId } });\n\n if (chat?.sessionId) {\n // Check session status\n const result = await client.agentSessions.getMessages(chat.sessionId);\n\n if (result.status === 'active') {\n return { sessionId: result.sessionId, messages: result.messages };\n }\n\n // Try to restore expired session\n if (chat.messages?.length > 0) {\n const restored = await client.agentSessions.restore(chat.sessionId, chat.messages, input);\n if (restored.restored) {\n return { sessionId: restored.sessionId, messages: chat.messages };\n }\n }\n }\n\n // Create new session\n const sessionId = await client.agentSessions.create(agentId, input);\n\n // Save to database\n await db.chats.upsert({\n where: { id: chatId },\n create: { id: chatId, sessionId, messages: [] },\n update: { sessionId, messages: [] },\n });\n\n return { sessionId, messages: [] };\n}\n```\n\n## Clearing Sessions\n\nTo programmatically clear a session's state (e.g., for testing reset/restore flows), use `clear()`:\n\n```typescript\nconst result = await client.agentSessions.clear(sessionId);\nconsole.log(result.cleared); // true\n```\n\nAfter clearing, the session transitions to `expired` status. 
You can then restore it with `restore()` or create a new session.\n\n```typescript\ninterface ClearSessionResult {\n sessionId: string;\n cleared: boolean;\n}\n```\n\nThis is idempotent - calling `clear()` on an already expired session succeeds without error.\n\n## Error Handling\n\n```typescript\nimport { ApiError } from '@octavus/server-sdk';\n\ntry {\n const session = await client.agentSessions.getMessages(sessionId);\n} catch (error) {\n if (error instanceof ApiError) {\n if (error.status === 404) {\n // Session not found or expired\n console.log('Session expired, create a new one');\n } else {\n console.error('API Error:', error.message);\n }\n }\n throw error;\n}\n```\n",
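The forward-compatibility note in the Sessions content above recommends a `default` case when switching on `entry.type`. A minimal sketch of that pattern follows. The entry interfaces are trimmed, local stand-ins: `tool-call`, `toolName`, `toolArguments`, and `timestamp` appear in the docs, while the `error` variant, its `message` field, and all field types are assumptions made for illustration.

```typescript
// Trimmed, local stand-ins for the SDK's entry types (see assumptions above).
interface ToolCallEntry {
  type: 'tool-call';
  timestamp: string; // assumed to be a string
  toolName: string;
  toolArguments: Record<string, unknown>; // assumed shape
}

// Hypothetical variant: the docs mention errors but do not show this shape.
interface ErrorEntry {
  type: 'error';
  timestamp: string;
  message: string;
}

type ExecutionLogEntry = ToolCallEntry | ErrorEntry;

// Produces a one-line summary and tolerates variants this code predates.
function summarizeEntry(entry: ExecutionLogEntry): string {
  switch (entry.type) {
    case 'tool-call':
      return `${entry.timestamp} tool-call ${entry.toolName}`;
    case 'error':
      return `${entry.timestamp} error ${entry.message}`;
    default: {
      // At compile time `entry` is `never` here, but at runtime a newer
      // server may send variants that these local types do not list.
      const unknownEntry = entry as { type: string };
      return `unhandled entry type: ${unknownEntry.type}`;
    }
  }
}
```

Because the `default` branch returns a value instead of throwing, a timeline view built on this would keep rendering when new entry variants appear.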
+
content: "\n# Sessions\n\nSessions represent conversations with an agent. They store conversation history, track resources and variables, and enable stateful interactions.\n\n## Creating Sessions\n\nCreate a session by specifying the agent ID and initial input variables:\n\n```typescript\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n});\n\n// Create a session with the support-chat agent\nconst sessionId = await client.agentSessions.create('support-chat', {\n COMPANY_NAME: 'Acme Corp',\n PRODUCT_NAME: 'Widget Pro',\n USER_ID: 'user-123', // Optional inputs\n});\n\nconsole.log('Session created:', sessionId);\n```\n\n## Getting Session Messages\n\nTo restore a conversation on page load, use `getMessages()` to retrieve UI-ready messages:\n\n```typescript\nconst session = await client.agentSessions.getMessages(sessionId);\n\nconsole.log({\n sessionId: session.sessionId,\n agentId: session.agentId,\n messages: session.messages.length, // UIMessage[] ready for frontend\n});\n```\n\nThe returned messages can be passed directly to the client SDK's `initialMessages` option.\n\n### UISessionState Interface\n\n```typescript\ninterface UISessionState {\n sessionId: string;\n agentId: string;\n messages: UIMessage[]; // UI-ready conversation history\n}\n```\n\n## Full Session State (Debug)\n\nFor debugging or internal use, you can retrieve the complete session state including all variables and internal message format:\n\n```typescript\nconst state = await client.agentSessions.get(sessionId);\n\nconsole.log({\n id: state.id,\n agentId: state.agentId,\n messages: state.messages.length, // ChatMessage[] (internal format)\n resources: state.resources,\n variables: state.variables,\n createdAt: state.createdAt,\n updatedAt: state.updatedAt,\n});\n```\n\n> **Note**: Use `getMessages()` for client-facing code. 
The `get()` method returns internal message format that includes hidden content not intended for end users.\n\n## Getting Execution Logs\n\n`getLogs()` returns the chronological execution trace for a session - triggers, messages, tool calls, LLM responses, errors, and other events emitted while the agent ran. Useful for debugging, observability, and building custom timeline views.\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status === 'expired') {\n console.log('Session expired:', result.sessionId);\n} else {\n for (const entry of result.entries) {\n console.log(entry.type, entry.timestamp);\n }\n}\n```\n\nEach entry is a typed variant of `ExecutionLogEntry` (a discriminated union) so consumers can narrow on `entry.type`:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status !== 'expired') {\n const toolCalls = result.entries.filter((e) => e.type === 'tool-call');\n for (const call of toolCalls) {\n // call.toolName, call.toolArguments are typed without optional chaining\n console.log(call.toolName, call.toolArguments);\n }\n}\n```\n\n### Excluding Model Request Payloads\n\nModel-request entries include the full provider request body and can be large. Pass `excludeModelRequests: true` to skip them:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId, {\n excludeModelRequests: true,\n});\n```\n\n### Truncation\n\nResponses are capped at 1000 entries (most recent). 
When the log exceeds that cap, the response includes `total` and `truncated` so consumers can detect this:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status !== 'expired' && result.truncated) {\n console.warn(`Showing latest 1000 of ${result.total} entries`);\n}\n```\n\n### Response Types\n\n| Status | Type | Description |\n| --------- | --------------------- | -------------------------------------------------------------------------------------------- |\n| `active` | `ExecutionLogsResult` | `{ sessionId, entries, total?, truncated? }`. `total` and `truncated` are present when known |\n| `expired` | `ExpiredSessionState` | `{ sessionId, agentId, status: 'expired', createdAt }` |\n\n> **Forward-compatible types**: `ExecutionLogEntry` may gain new variants over time. Include a `default` case when switching on `entry.type` so unknown variants are handled gracefully.\n\n## Attaching to Sessions\n\nTo trigger actions on a session, you need to attach to it first:\n\n```typescript\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n // Tool handlers (see Tools documentation)\n },\n resources: [\n // Resource watchers (optional)\n ],\n});\n```\n\n### Attach Options\n\n| Option | Type | Description |\n| ----------------------- | --------------------------------- | ------------------------------------------------------------------------------- |\n| `tools` | `ToolHandlers` | Server-side tool handler functions |\n| `resources` | `Resource[]` | Resource watchers for real-time updates |\n| `onToolResults` | `(results: ToolResult[]) => void` | Callback invoked after server-side tool results are produced |\n| `rejectClientToolCalls` | `boolean` | If `true`, reject tool calls that have no server handler (no client forwarding) |\n\nFor MCP tool integration (browser, filesystem, shell via `@octavus/computer`), register dynamic tools after attaching with `session.setDynamicTools()`. 
See [Computer](/docs/server-sdk/computer) for details.\n\n## Executing Requests\n\nOnce attached, execute requests on the session using `execute()`:\n\n```typescript\nimport { toSSEStream } from '@octavus/server-sdk';\n\n// execute() handles both triggers and client tool continuations\nconst events = session.execute(\n { type: 'trigger', triggerName: 'user-message', input: { USER_MESSAGE: 'Hello!' } },\n { signal: request.signal },\n);\n\n// Convert to SSE stream for HTTP responses\nreturn new Response(toSSEStream(events), {\n headers: { 'Content-Type': 'text/event-stream' },\n});\n```\n\n### Request Types\n\nThe `execute()` method accepts a discriminated union:\n\n```typescript\ntype SessionRequest = TriggerRequest | ContinueRequest;\n\n// Start a new conversation turn\ninterface TriggerRequest {\n type: 'trigger';\n triggerName: string;\n input?: Record<string, unknown>;\n rollbackAfterMessageId?: string | null; // For retry: truncate messages after this ID\n sender?: UIMessageSender; // Author of this turn, for multi-user attribution\n}\n\n// Continue after client-side tool handling\ninterface ContinueRequest {\n type: 'continue';\n executionId: string;\n toolResults: ToolResult[];\n}\n```\n\nThis makes it easy to pass requests through from the client:\n\n```typescript\n// Simple passthrough from HTTP request body\nexport async function POST(request: Request) {\n const body = await request.json();\n const { sessionId, ...payload } = body;\n\n const session = client.agentSessions.attach(sessionId, {\n tools: {\n /* ... */\n },\n });\n const events = session.execute(payload, { signal: request.signal });\n\n return new Response(toSSEStream(events));\n}\n```\n\n### Attributing Messages in Multi-User Chats\n\nWhen several people share one conversation, set `sender` on the trigger so each user message is attributed to its author. 
Set it **server-side from your authenticated user** - never trust a client-supplied identity:\n\n```typescript\ninterface UIMessageSender {\n id?: string;\n name?: string;\n image?: string; // Avatar URL\n}\n\nexport async function POST(request: Request) {\n const user = await authenticate(request); // your auth\n const { sessionId, ...payload } = await request.json();\n\n const session = client.agentSessions.attach(sessionId, {\n tools: {\n /* ... */\n },\n });\n const events = session.execute(\n {\n ...payload,\n sender: { id: user.id, name: user.name, image: user.avatarUrl },\n },\n { signal: request.signal },\n );\n\n return new Response(toSSEStream(events));\n}\n```\n\nThe runtime stamps the sender onto the user message it creates, so it comes back on `UIMessage.sender` from `getMessages()` and survives restore. `sender` is turn metadata - it is never added to your protocol's trigger `input`, and agent-initiated turns (no `sender`) stay unattributed. For instant optimistic display in the browser, also pass it on the client `send()` (see [Client SDK Messages](/docs/client-sdk/messages)).\n\n### Stop Support\n\nPass an abort signal to allow clients to stop generation:\n\n```typescript\nconst events = session.execute(request, {\n signal: request.signal, // Forward the client's abort signal\n});\n```\n\nWhen the client aborts the request, the signal propagates through to the LLM provider, stopping generation immediately. 
Any partial content is preserved.\n\n## WebSocket Handling\n\nFor WebSocket integrations, use `handleSocketMessage()` which manages abort controller lifecycle internally:\n\n```typescript\nimport type { SocketMessage } from '@octavus/server-sdk';\n\n// In your socket handler\nconn.on('data', async (rawData: string) => {\n const msg = JSON.parse(rawData);\n\n if (msg.type === 'trigger' || msg.type === 'continue' || msg.type === 'stop') {\n await session.handleSocketMessage(msg as SocketMessage, {\n onEvent: (event) => conn.write(JSON.stringify(event)),\n onFinish: async () => {\n // Fetch and persist messages to your database for restoration\n },\n });\n }\n});\n```\n\nThe `handleSocketMessage()` method:\n\n- Handles `trigger`, `continue`, and `stop` messages\n- Automatically aborts previous requests when a new one arrives\n- Streams events via the `onEvent` callback\n- Calls `onFinish` after streaming completes (not called if aborted)\n\nSee [Socket Chat Example](/docs/examples/socket-chat) for a complete implementation.\n\n## Session Lifecycle\n\n```mermaid\nflowchart TD\n A[1. CREATE] --> B[2. ATTACH]\n B --> C[3. TRIGGER]\n C --> C\n C --> D[4. RETRIEVE]\n D --> C\n C --> E[5. EXPIRE]\n C --> G[5b. CLEAR]\n G --> F\n E --> F{6. 
RESTORE?}\n F -->|Yes| C\n F -->|No| A\n\n A -.- A1[\"`**client.agentSessions.create()**\n Returns sessionId\n Initializes state`\"]\n\n B -.- B1[\"`**client.agentSessions.attach()**\n Configure tool handlers\n Configure resource watchers`\"]\n\n C -.- C1[\"`**session.execute()**\n Execute request\n Stream events\n Update state`\"]\n\n D -.- D1[\"`**client.agentSessions.getMessages()**\n Get UI-ready messages\n Check session status`\"]\n\n E -.- E1[\"`Sessions expire after\n 24 hours (configurable)`\"]\n\n G -.- G1[\"`**client.agentSessions.clear()**\n Programmatically clear state\n Session becomes expired`\"]\n\n F -.- F1[\"`**client.agentSessions.restore()**\n Restore from stored messages\n Or create new session`\"]\n```\n\n## Session Expiration\n\nSessions expire after a period of inactivity (default: 24 hours). When you call `getMessages()` or `get()`, the response includes a `status` field:\n\n```typescript\nconst result = await client.agentSessions.getMessages(sessionId);\n\nif (result.status === 'expired') {\n // Session has expired - restore or create new\n console.log('Session expired:', result.sessionId);\n} else {\n // Session is active\n console.log('Messages:', result.messages.length);\n}\n```\n\n### Response Types\n\n| Status | Type | Description |\n| --------- | --------------------- | ------------------------------------------------------------- |\n| `active` | `UISessionState` | Session is active, includes `messages` array |\n| `expired` | `ExpiredSessionState` | Session expired, includes `sessionId`, `agentId`, `createdAt` |\n\n## Persisting Chat History\n\nTo enable session restoration, store the chat messages in your own database after each interaction:\n\n```typescript\n// After each trigger completes, save messages\nconst result = await client.agentSessions.getMessages(sessionId);\n\nif (result.status === 'active') {\n // Store in your database\n await db.chats.update({\n where: { id: chatId },\n data: {\n sessionId: result.sessionId,\n 
messages: result.messages, // Store UIMessage[] as JSON\n },\n });\n}\n```\n\n> **Best Practice**: Store the full `UIMessage[]` array. This preserves all message parts (text, tool calls, files, etc.) needed for accurate restoration.\n\n## Restoring Sessions\n\nWhen a user returns to your app:\n\n```typescript\n// 1. Load stored data from your database\nconst chat = await db.chats.findUnique({ where: { id: chatId } });\n\n// 2. Check if session is still active\nconst result = await client.agentSessions.getMessages(chat.sessionId);\n\nif (result.status === 'active') {\n // Session is active - use it directly\n return {\n sessionId: result.sessionId,\n messages: result.messages,\n };\n}\n\n// 3. Session expired - restore from stored messages\nif (chat.messages && chat.messages.length > 0) {\n const restored = await client.agentSessions.restore(\n chat.sessionId,\n chat.messages,\n { COMPANY_NAME: 'Acme Corp' }, // Optional: same input as create()\n );\n\n if (restored.restored) {\n // Session restored successfully\n return {\n sessionId: restored.sessionId,\n messages: chat.messages,\n };\n }\n}\n\n// 4. 
Cannot restore - create new session\nconst newSessionId = await client.agentSessions.create('support-chat', {\n COMPANY_NAME: 'Acme Corp',\n});\n\nreturn {\n sessionId: newSessionId,\n messages: [],\n};\n```\n\n### Restore Response\n\n```typescript\ninterface RestoreSessionResult {\n sessionId: string;\n restored: boolean; // true if restored, false if session was already active\n}\n```\n\n## Complete Example\n\nHere's a complete session management flow:\n\n```typescript\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n});\n\nasync function getOrCreateSession(chatId: string, agentId: string, input: Record<string, unknown>) {\n // Load existing chat data\n const chat = await db.chats.findUnique({ where: { id: chatId } });\n\n if (chat?.sessionId) {\n // Check session status\n const result = await client.agentSessions.getMessages(chat.sessionId);\n\n if (result.status === 'active') {\n return { sessionId: result.sessionId, messages: result.messages };\n }\n\n // Try to restore expired session\n if (chat.messages?.length > 0) {\n const restored = await client.agentSessions.restore(chat.sessionId, chat.messages, input);\n if (restored.restored) {\n return { sessionId: restored.sessionId, messages: chat.messages };\n }\n }\n }\n\n // Create new session\n const sessionId = await client.agentSessions.create(agentId, input);\n\n // Save to database\n await db.chats.upsert({\n where: { id: chatId },\n create: { id: chatId, sessionId, messages: [] },\n update: { sessionId, messages: [] },\n });\n\n return { sessionId, messages: [] };\n}\n```\n\n## Clearing Sessions\n\nTo programmatically clear a session's state (e.g., for testing reset/restore flows), use `clear()`:\n\n```typescript\nconst result = await client.agentSessions.clear(sessionId);\nconsole.log(result.cleared); // true\n```\n\nAfter clearing, the session transitions to `expired` status. 
You can then restore it with `restore()` or create a new session.\n\n```typescript\ninterface ClearSessionResult {\n sessionId: string;\n cleared: boolean;\n}\n```\n\nThis is idempotent - calling `clear()` on an already expired session succeeds without error.\n\n## Error Handling\n\n```typescript\nimport { ApiError } from '@octavus/server-sdk';\n\ntry {\n const session = await client.agentSessions.getMessages(sessionId);\n} catch (error) {\n if (error instanceof ApiError) {\n if (error.status === 404) {\n // Session not found or expired\n console.log('Session expired, create a new one');\n } else {\n console.error('API Error:', error.message);\n }\n }\n throw error;\n}\n```\n",
excerpt: "Sessions Sessions represent conversations with an agent. They store conversation history, track resources and variables, and enable stateful interactions. Creating Sessions Create a session by...",
order: 2
},
@@ -59,7 +59,7 @@ var docs_default = [
section: "server-sdk",
title: "CLI",
description: "Command-line interface for validating and syncing agent definitions.",
-
content: '\n# Octavus CLI\n\nThe `@octavus/cli` package provides a command-line interface for validating and syncing agent definitions from your local filesystem to the Octavus platform.\n\n**Current version:** `
+
content: '\n# Octavus CLI\n\nThe `@octavus/cli` package provides a command-line interface for validating and syncing agent definitions from your local filesystem to the Octavus platform.\n\n**Current version:** `5.0.0`\n\n## Installation\n\n```bash\nnpm install --save-dev @octavus/cli\n```\n\n## Configuration\n\nThe CLI requires an API key with the **Agents** permission.\n\n### Environment Variables\n\n| Variable | Description |\n| --------------------- | ---------------------------------------------- |\n| `OCTAVUS_CLI_API_KEY` | API key with "Agents" permission (recommended) |\n| `OCTAVUS_API_KEY` | Fallback if `OCTAVUS_CLI_API_KEY` not set |\n| `OCTAVUS_API_URL` | Optional, defaults to `https://octavus.ai` |\n\n### Two-Key Strategy (Recommended)\n\nFor production deployments, use separate API keys with minimal permissions:\n\n```bash\n# CI/CD or .env.local (not committed)\nOCTAVUS_CLI_API_KEY=oct_sk_... # "Agents" permission only\n\n# Production .env\nOCTAVUS_API_KEY=oct_sk_... # "Sessions" permission only\n```\n\nThis ensures production servers only have session permissions (smaller blast radius if leaked), while agent management is restricted to development/CI environments.\n\n### Multiple Environments\n\nUse separate Octavus projects for staging and production, each with their own API keys. 
The `--env` flag lets you load different environment files:\n\n```bash\n# Local development (default: .env)\noctavus sync ./agents/my-agent\n\n# Staging project\noctavus --env .env.staging sync ./agents/my-agent\n\n# Production project\noctavus --env .env.production sync ./agents/my-agent\n```\n\nExample environment files:\n\n```bash\n# .env.staging (syncs to your staging project)\nOCTAVUS_CLI_API_KEY=oct_sk_staging_project_key...\n\n# .env.production (syncs to your production project)\nOCTAVUS_CLI_API_KEY=oct_sk_production_project_key...\n```\n\nEach project has its own agents, so you\'ll get different agent IDs per environment.\n\n## Global Options\n\n| Option | Description |\n| -------------- | ------------------------------------------------------- |\n| `--env <file>` | Load environment from a specific file (default: `.env`) |\n| `--help` | Show help |\n| `--version` | Show version |\n\n## Commands\n\n### `octavus sync <path>`\n\nSync an agent definition to the platform. Creates the agent if it doesn\'t exist, or updates it if it does.\n\n```bash\noctavus sync ./agents/my-agent\n```\n\n**Options:**\n\n- `--json` - Output as JSON (for CI/CD parsing)\n- `--quiet` - Suppress non-essential output\n\n**Example output:**\n\n```\n\u2139 Reading agent from ./agents/my-agent...\n\u2139 Syncing support-chat...\n\u2713 Created: support-chat\n Agent ID: clxyz123abc456\n```\n\n### `octavus validate <path>`\n\nValidate an agent definition without saving. 
Useful for CI/CD pipelines.\n\n```bash\noctavus validate ./agents/my-agent\n```\n\n**Exit codes:**\n\n- `0` - Validation passed\n- `1` - Validation errors\n- `2` - Configuration errors (missing API key, etc.)\n\n### `octavus list`\n\nList all agents in your project.\n\n```bash\noctavus list\n```\n\n**Example output:**\n\n```\nSLUG NAME FORMAT ID\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nsupport-chat Support Chat Agent interactive clxyz123abc456\n\n1 agent(s)\n```\n\n### `octavus get <slug>`\n\nGet details about a specific agent by its slug.\n\n```bash\noctavus get support-chat\n```\n\n### `octavus archive <slug>`\n\nArchive an agent by slug (soft delete). Archived agents are removed from the active agent list and their slug is freed for reuse.\n\n```bash\noctavus archive support-chat\n```\n\n**Options:**\n\n- `--json` - Output as JSON (for CI/CD parsing)\n- `--quiet` - Suppress non-essential output\n\n**Example output:**\n\n```\n\u2139 Archiving support-chat...\n\u2713 Archived: support-chat\n Agent ID: clxyz123abc456\n```\n\n### `octavus skills sync <path>`\n\nSync a skill to the platform. 
Packages the skill directory into a bundle (excluding `.env` files, `.git`, and `node_modules`), uploads it, and optionally pushes secrets from the skill\'s `.env` file.\n\n```bash\noctavus skills sync ./skills/github\n```\n\n**Options:**\n\n- `--json` - Output as JSON (for CI/CD parsing)\n- `--quiet` - Suppress non-essential output\n\n**Example output:**\n\n```\n\u2139 Reading skill from ./skills/github...\n\u2139 Packaging github...\n\u2713 Created: github\n Skill ID: clxyz789def012\n\u2139 Pushing 2 secret(s)...\n\u2713 2 secret(s) updated\n```\n\n**Secret handling:**\n\nIf the skill directory contains a `.env` file, secrets are pushed alongside the bundle. Secrets are cross-validated against the `secrets` declarations in `SKILL.md` - warnings are shown for undeclared or missing required secrets.\n\n```\nmy-skill/\n\u251C\u2500\u2500 SKILL.md\n\u251C\u2500\u2500 scripts/\n\u2502 \u2514\u2500\u2500 run.py\n\u2514\u2500\u2500 .env # Secrets (not included in bundle)\n```\n\nSee [Skills](/docs/protocol/skills) for details on skill format, secrets, and secure mode.\n\n## Agent Directory Structure\n\nThe CLI expects agent definitions in a specific directory structure:\n\n```\nmy-agent/\n\u251C\u2500\u2500 settings.json # Required: Agent metadata\n\u251C\u2500\u2500 protocol.yaml # Required: Agent protocol\n\u251C\u2500\u2500 prompts/ # Optional: Prompt templates\n\u2502 \u251C\u2500\u2500 system.md\n\u2502 \u2514\u2500\u2500 user-message.md\n\u2514\u2500\u2500 references/ # Optional: Reference documents\n \u2514\u2500\u2500 api-guidelines.md\n```\n\n### references/\n\nReference files are markdown documents with YAML frontmatter containing a `description`. The agent can fetch these on demand during execution. 
See [References](/docs/protocol/references) for details.\n\n### settings.json\n\n```json\n{\n "slug": "my-agent",\n "name": "My Agent",\n "description": "A helpful assistant",\n "format": "interactive"\n}\n```\n\n### protocol.yaml\n\nSee the [Protocol documentation](/docs/protocol/overview) for details on protocol syntax.\n\n## CI/CD Integration\n\n### GitHub Actions\n\n```yaml\nname: Validate and Sync Agents\n\non:\n push:\n branches: [main]\n paths:\n - \'agents/**\'\n\njobs:\n sync:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v4\n\n - uses: actions/setup-node@v4\n with:\n node-version: \'22\'\n\n - run: npm install\n\n - name: Validate agent\n run: npx octavus validate ./agents/support-chat\n env:\n OCTAVUS_CLI_API_KEY: ${{ secrets.OCTAVUS_CLI_API_KEY }}\n\n - name: Sync agent\n run: npx octavus sync ./agents/support-chat\n env:\n OCTAVUS_CLI_API_KEY: ${{ secrets.OCTAVUS_CLI_API_KEY }}\n```\n\n### Package.json Scripts\n\nAdd sync scripts to your `package.json`:\n\n```json\n{\n "scripts": {\n "agents:validate": "octavus validate ./agents/my-agent",\n "agents:sync": "octavus sync ./agents/my-agent"\n },\n "devDependencies": {\n "@octavus/cli": "^0.1.0"\n }\n}\n```\n\n## Workflow\n\nThe recommended workflow for managing agents:\n\n1. **Define agent locally** - Create `settings.json`, `protocol.yaml`, and prompts\n2. **Validate** - Run `octavus validate ./my-agent` to check for errors\n3. **Sync** - Run `octavus sync ./my-agent` to push to platform\n4. **Store agent ID** - Save the output ID in an environment variable\n5. 
**Use in app** - Read the ID from env and pass to `client.agentSessions.create()`\n\n```bash\n# After syncing: octavus sync ./agents/support-chat\n# Output: Agent ID: clxyz123abc456\n\n# Add to your .env file\nOCTAVUS_SUPPORT_AGENT_ID=clxyz123abc456\n```\n\n```typescript\nconst agentId = process.env.OCTAVUS_SUPPORT_AGENT_ID;\n\nconst sessionId = await client.agentSessions.create(agentId, {\n COMPANY_NAME: \'Acme Corp\',\n});\n```\n',
excerpt: "Octavus CLI The package provides a command-line interface for validating and syncing agent definitions from your local filesystem to the Octavus platform. Current version: Installation ...",
order: 5
},
@@ -86,7 +86,7 @@ var docs_default = [
section: "server-sdk",
title: "Computer",
description: "Adding browser, filesystem, and shell capabilities to agents with @octavus/computer.",
-
content: "\n# Computer\n\nThe `@octavus/computer` package gives agents access to a physical or virtual machine's browser, filesystem, and shell. It connects to [MCP](https://modelcontextprotocol.io) servers, discovers their tools, and provides them to the server-sdk.\n\n**Current version:** `4.1.0`\n\n## Installation\n\n```bash\nnpm install @octavus/computer\n```\n\n## Quick Start\n\n```typescript\nimport { Computer } from '@octavus/computer';\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', ['--browser-url=http://127.0.0.1:9222']),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', ['/path/to/workspace']),\n shell: Computer.shell({ cwd: '/path/to/workspace', mode: 'unrestricted' }),\n },\n});\n\nawait computer.start();\n\nconst client = new OctavusClient({\n baseUrl: 'https://octavus.ai',\n apiKey: 'your-api-key',\n});\n\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n 'set-chat-title': async (args) => ({ title: args.title }),\n },\n});\n\nsession.setDynamicTools(computer);\n```\n\nDynamic tools are registered after attaching via `session.setDynamicTools()`. Pass the `computer` directly - the session extracts schemas and handlers from the `ToolProvider`. Tool schemas are sent to the platform on the next `execute()` call, and tool calls flow back through the existing execution loop.\n\n## How It Works\n\n1. You configure MCP servers with namespaces (e.g., `browser`, `filesystem`, `shell`)\n2. `computer.start()` connects to all servers in parallel and discovers their tools\n3. Each tool is namespaced with `__` (e.g., `browser__navigate_page`, `filesystem__read_file`)\n4. 
The server-sdk sends tool schemas to the platform and handles tool call execution\n\nThe agent's protocol must declare matching `mcpServers` with `source: device` - see [MCP Servers](/docs/protocol/mcp-servers).\n\n## Entry Types\n\nThe `Computer` class supports three types of MCP entries:\n\n### Stdio (MCP Subprocess)\n\nSpawns an MCP server as a child process, communicating via stdin/stdout:\n\n```typescript\nComputer.stdio(command: string, args?: string[], options?: {\n env?: Record<string, string>;\n cwd?: string;\n})\n```\n\nUse this for local MCP servers installed as npm packages or standalone executables:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n '--browser-url=http://127.0.0.1:9222',\n '--no-usage-statistics',\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [\n '/Users/me/projects/my-app',\n ]),\n },\n});\n```\n\n### HTTP (Remote MCP Endpoint)\n\nConnects to an MCP server over Streamable HTTP:\n\n```typescript\nComputer.http(url: string, options?: {\n headers?: Record<string, string>;\n})\n```\n\nUse this for MCP servers running as HTTP services:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n docs: Computer.http('http://localhost:3001/mcp', {\n headers: { Authorization: 'Bearer token' },\n }),\n },\n});\n```\n\n### Shell (Built-in)\n\nProvides shell command execution without spawning an MCP subprocess:\n\n```typescript\nComputer.shell(options: {\n cwd?: string;\n mode: ShellMode;\n timeout?: number; // Default: 300,000ms (5 minutes)\n})\n```\n\nThis exposes a `run_command` tool (namespaced as `shell__run_command` when the key is `shell`). 
Commands execute in a login shell with the user's full environment.\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n shell: Computer.shell({\n cwd: '/Users/me/projects/my-app',\n mode: 'unrestricted',\n timeout: 300_000,\n }),\n },\n});\n```\n\n#### Shell Safety Modes\n\n| Mode | Description |\n| -------------------------------------- | --------------------------------------------- |\n| `'unrestricted'` | All commands allowed (for dedicated machines) |\n| `{ allowedPatterns, blockedPatterns }` | Pattern-based command filtering |\n\nPattern-based filtering:\n\n```typescript\nComputer.shell({\n cwd: workspaceDir,\n mode: {\n blockedPatterns: [/rm\\s+-rf/, /sudo/],\n allowedPatterns: [/^git\\s/, /^npm\\s/, /^ls\\s/],\n },\n});\n```\n\nWhen `allowedPatterns` is set, only matching commands are permitted. When `blockedPatterns` is set, matching commands are rejected. Blocked patterns are checked first.\n\n## Lifecycle\n\n### Starting\n\n`computer.start()` connects to all configured MCP servers in parallel. If some servers fail to connect, the computer still starts with the remaining servers - only if _all_ connections fail does it throw an error.\n\n```typescript\nconst { errors } = await computer.start();\n\nif (errors.length > 0) {\n console.warn('Some MCP servers failed to connect:', errors);\n}\n```\n\n### Stopping\n\n`computer.stop()` closes all MCP connections and kills managed processes:\n\n```typescript\nawait computer.stop();\n```\n\nAlways call `stop()` when the session ends to clean up MCP subprocesses. For managed processes (like Chrome), pass them in the config for automatic cleanup.\n\n## Dynamic Entries\n\nYou can add or remove MCP entries on a running `Computer` after `start()` has returned. 
This is useful when MCP configurations arrive after construction - for example, when a session-manager receives per-session entries from a dispatch payload and wants to wire them into the existing computer instead of rebuilding it.\n\n### `addEntry(namespace, entry, options?)`\n\nRegisters a new MCP entry under `namespace`. By default, connects immediately:\n\n```typescript\nawait computer.addEntry(\n 'github',\n Computer.stdio('@modelcontextprotocol/server-github', [], {\n env: { GITHUB_PERSONAL_ACCESS_TOKEN: process.env.GH_TOKEN! },\n }),\n);\n```\n\nPass `{ deferred: true }` to register the entry without connecting. The entry starts in a degraded state and connects on the next `restartEntry(namespace)` call - useful for lazy MCPs the agent activates on demand:\n\n```typescript\nawait computer.addEntry('github', githubEntry, { deferred: true });\n\n// Later, when the agent decides it needs GitHub:\nawait computer.restartEntry('github');\n```\n\n`addEntry` throws if the namespace already exists. To replace an entry, call `removeEntry` first.\n\nIf the immediate connection fails, `addEntry` does not throw - the entry is registered as degraded with the error message attached. Inspect via `getHealth()` or `restartEntry()` to retry.\n\n### `removeEntry(namespace)`\n\nCloses the entry's connection (if any) and drops it from the configuration. 
No-op when the namespace doesn't exist:\n\n```typescript\nawait computer.removeEntry('github');\n```\n\n### `restartEntry(namespace)`\n\nCloses the existing connection (if any) and reconnects with the current configuration:\n\n```typescript\nawait computer.restartEntry('github');\n```\n\nUse this to bring a deferred entry online for the first time, or to recover an entry that became degraded mid-session.\n\n### Detecting dynamic-entry support\n\nConsumers that work with arbitrary `ToolProvider` implementations can detect dynamic-entry capability with `isDynamicMcpProvider`:\n\n```typescript\nimport { isDynamicMcpProvider } from '@octavus/server-sdk';\n\nif (isDynamicMcpProvider(provider)) {\n await provider.addEntry('github', githubEntry);\n}\n```\n\n`Computer` always passes this check.\n\n## Chrome Launch Helper\n\nFor desktop applications that need to control a browser, `Computer.launchChrome()` launches Chrome with remote debugging enabled:\n\n```typescript\nconst browser = await Computer.launchChrome({\n profileDir: '/Users/me/.my-app/chrome-profiles/agent-1',\n debuggingPort: 9222, // Optional, auto-allocated if omitted\n flags: ['--window-size=1280,800'],\n});\n\nconsole.log(`Chrome running on port ${browser.port}, PID ${browser.pid}`);\n```\n\nPass the browser to `managedProcesses` for automatic cleanup when the computer stops:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n `--browser-url=http://127.0.0.1:${browser.port}`,\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [workspaceDir]),\n shell: Computer.shell({ cwd: workspaceDir, mode: 'unrestricted' }),\n },\n managedProcesses: [{ process: browser.process }],\n});\n```\n\n### ChromeLaunchOptions\n\n| Field | Required | Description |\n| --------------- | -------- | ----------------------------------------------------- |\n| `profileDir` | Yes | Directory for Chrome's user data (profile isolation) |\n| 
`debuggingPort` | No | Port for remote debugging (auto-allocated if omitted) |\n| `flags` | No | Additional Chrome launch flags |\n\n## ToolProvider Interface\n\n`Computer` implements the `ToolProvider` interface from `@octavus/core`:\n\n```typescript\ninterface ToolProvider {\n toolHandlers(): Record<string, ToolHandler>;\n toolSchemas(): ToolSchema[];\n}\n```\n\n`setDynamicTools()` accepts any `ToolProvider` directly - the session extracts schemas and handlers automatically:\n\n```typescript\nsession.setDynamicTools(computer);\n```\n\nYou can also pass a custom `ToolProvider`:\n\n```typescript\nconst customProvider: ToolProvider = {\n toolHandlers() {\n return {\n custom__my_tool: async (args) => {\n return { result: 'done' };\n },\n };\n },\n toolSchemas() {\n return [\n {\n name: 'custom__my_tool',\n description: 'A custom tool',\n inputSchema: {\n type: 'object',\n properties: {\n input: { type: 'string', description: 'Tool input' },\n },\n required: ['input'],\n },\n },\n ];\n },\n};\n\nconst session = client.agentSessions.attach(sessionId, {\n tools: { 'set-chat-title': titleHandler },\n});\n\nsession.setDynamicTools(customProvider);\n```\n\nFor cases where you need explicit control, `setDynamicTools()` also accepts a `DynamicTool[]` array:\n\n```typescript\ninterface DynamicTool {\n schema: ToolSchema;\n handler: ToolHandler;\n}\n```\n\n## Complete Example\n\nA desktop application with browser, filesystem, and shell capabilities:\n\n```typescript\nimport { Computer } from '@octavus/computer';\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst WORKSPACE_DIR = '/Users/me/projects/my-app';\nconst PROFILE_DIR = '/Users/me/.my-app/chrome-profiles/agent';\n\nasync function startSession(sessionId: string) {\n // 1. Launch Chrome with remote debugging\n const browser = await Computer.launchChrome({\n profileDir: PROFILE_DIR,\n });\n\n // 2. 
Create computer with all capabilities\n const computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n `--browser-url=http://127.0.0.1:${browser.port}`,\n '--no-usage-statistics',\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [WORKSPACE_DIR]),\n shell: Computer.shell({\n cwd: WORKSPACE_DIR,\n mode: 'unrestricted',\n }),\n },\n managedProcesses: [{ process: browser.process }],\n });\n\n // 3. Connect to all MCP servers\n const { errors } = await computer.start();\n if (errors.length > 0) {\n console.warn('Failed to connect:', errors);\n }\n\n // 4. Attach to session and register dynamic tools\n const client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n });\n\n const session = client.agentSessions.attach(sessionId, {\n tools: {\n 'set-chat-title': async (args) => {\n console.log('Chat title:', args.title);\n return { success: true };\n },\n },\n });\n\n session.setDynamicTools(computer);\n\n // 5. Execute and stream\n const events = session.execute({\n type: 'trigger',\n triggerName: 'user-message',\n input: { USER_MESSAGE: 'Navigate to github.com and take a screenshot' },\n });\n\n for await (const event of events) {\n // Handle stream events\n }\n\n // 6. 
Clean up\n await computer.stop();\n}\n```\n\n## API Reference\n\n### Computer\n\n```typescript\nclass Computer implements ToolProvider {\n constructor(config: ComputerConfig);\n\n // Static factories for MCP entries\n static stdio(\n command: string,\n args?: string[],\n options?: {\n env?: Record<string, string>;\n cwd?: string;\n },\n ): StdioConfig;\n\n static http(\n url: string,\n options?: {\n headers?: Record<string, string>;\n },\n ): HttpConfig;\n\n static shell(options: { cwd?: string; mode: ShellMode; timeout?: number }): ShellConfig;\n\n // Chrome launch helper\n static launchChrome(options: ChromeLaunchOptions): Promise<ChromeInstance>;\n\n // Lifecycle\n start(): Promise<{ errors: string[] }>;\n stop(): Promise<void>;\n\n // Dynamic entries\n addEntry(namespace: string, entry: McpEntry, options?: { deferred?: boolean }): Promise<void>;\n removeEntry(namespace: string): Promise<void>;\n restartEntry(namespace: string): Promise<void>;\n stopEntry(namespace: string): Promise<void>;\n\n // Health\n getHealth(): Promise<ComputerHealth>;\n ensureReady(): Promise<EnsureReadyResult>;\n retryDegraded(): Promise<{ recovered: string[]; stillDegraded: string[] }>;\n\n // ToolProvider implementation\n toolHandlers(): Record<string, ToolHandler>;\n toolSchemas(): ToolSchema[];\n}\n\ninterface ComputerHealth {\n healthy: boolean;\n entries: EntryHealth[];\n totalTools: number;\n}\n\ninterface EntryHealth {\n name: string;\n healthy: boolean;\n error?: string;\n}\n\ninterface EnsureReadyResult extends ComputerHealth {\n recovered?: string[];\n failedEntries?: string[];\n}\n```\n\n### ComputerConfig\n\n```typescript\ninterface ComputerConfig {\n mcpServers: Record<string, McpEntry>;\n managedProcesses?: { process: ChildProcess }[];\n /** Namespaces to skip during start() - they begin as degraded and can be connected on demand via restartEntry(). 
*/\n deferredEntries?: string[];\n}\n\ntype McpEntry = StdioConfig | HttpConfig | ShellConfig;\ntype ShellMode =\n | 'unrestricted'\n | {\n allowedPatterns?: RegExp[];\n blockedPatterns?: RegExp[];\n };\n```\n\n### ChromeInstance\n\n```typescript\ninterface ChromeInstance {\n port: number;\n process: ChildProcess;\n pid: number;\n}\n```\n",
+
content: "\n# Computer\n\nThe `@octavus/computer` package gives agents access to a physical or virtual machine's browser, filesystem, and shell. It connects to [MCP](https://modelcontextprotocol.io) servers, discovers their tools, and provides them to the server-sdk.\n\n**Current version:** `5.0.0`\n\n## Installation\n\n```bash\nnpm install @octavus/computer\n```\n\n## Quick Start\n\n```typescript\nimport { Computer } from '@octavus/computer';\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', ['--browser-url=http://127.0.0.1:9222']),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', ['/path/to/workspace']),\n shell: Computer.shell({ cwd: '/path/to/workspace', mode: 'unrestricted' }),\n },\n});\n\nawait computer.start();\n\nconst client = new OctavusClient({\n baseUrl: 'https://octavus.ai',\n apiKey: 'your-api-key',\n});\n\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n 'set-chat-title': async (args) => ({ title: args.title }),\n },\n});\n\nsession.setDynamicTools(computer);\n```\n\nDynamic tools are registered after attaching via `session.setDynamicTools()`. Pass the `computer` directly - the session extracts schemas and handlers from the `ToolProvider`. Tool schemas are sent to the platform on the next `execute()` call, and tool calls flow back through the existing execution loop.\n\n## How It Works\n\n1. You configure MCP servers with namespaces (e.g., `browser`, `filesystem`, `shell`)\n2. `computer.start()` connects to all servers in parallel and discovers their tools\n3. Each tool is namespaced with `__` (e.g., `browser__navigate_page`, `filesystem__read_file`)\n4. 
The server-sdk sends tool schemas to the platform and handles tool call execution\n\nThe agent's protocol must declare matching `mcpServers` with `source: device` - see [MCP Servers](/docs/protocol/mcp-servers).\n\n## Entry Types\n\nThe `Computer` class supports three types of MCP entries:\n\n### Stdio (MCP Subprocess)\n\nSpawns an MCP server as a child process, communicating via stdin/stdout:\n\n```typescript\nComputer.stdio(command: string, args?: string[], options?: {\n env?: Record<string, string>;\n cwd?: string;\n})\n```\n\nUse this for local MCP servers installed as npm packages or standalone executables:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n '--browser-url=http://127.0.0.1:9222',\n '--no-usage-statistics',\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [\n '/Users/me/projects/my-app',\n ]),\n },\n});\n```\n\n### HTTP (Remote MCP Endpoint)\n\nConnects to an MCP server over Streamable HTTP:\n\n```typescript\nComputer.http(url: string, options?: {\n headers?: Record<string, string>;\n})\n```\n\nUse this for MCP servers running as HTTP services:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n docs: Computer.http('http://localhost:3001/mcp', {\n headers: { Authorization: 'Bearer token' },\n }),\n },\n});\n```\n\n### Shell (Built-in)\n\nProvides shell command execution without spawning an MCP subprocess:\n\n```typescript\nComputer.shell(options: {\n cwd?: string;\n mode: ShellMode;\n timeout?: number; // Default: 300,000ms (5 minutes)\n})\n```\n\nThis exposes a `run_command` tool (namespaced as `shell__run_command` when the key is `shell`). 
Commands execute in a login shell with the user's full environment.\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n shell: Computer.shell({\n cwd: '/Users/me/projects/my-app',\n mode: 'unrestricted',\n timeout: 300_000,\n }),\n },\n});\n```\n\n#### Shell Safety Modes\n\n| Mode | Description |\n| -------------------------------------- | --------------------------------------------- |\n| `'unrestricted'` | All commands allowed (for dedicated machines) |\n| `{ allowedPatterns, blockedPatterns }` | Pattern-based command filtering |\n\nPattern-based filtering:\n\n```typescript\nComputer.shell({\n cwd: workspaceDir,\n mode: {\n blockedPatterns: [/rm\\s+-rf/, /sudo/],\n allowedPatterns: [/^git\\s/, /^npm\\s/, /^ls\\s/],\n },\n});\n```\n\nWhen `allowedPatterns` is set, only matching commands are permitted. When `blockedPatterns` is set, matching commands are rejected. Blocked patterns are checked first.\n\n## Lifecycle\n\n### Starting\n\n`computer.start()` connects to all configured MCP servers in parallel. If some servers fail to connect, the computer still starts with the remaining servers - only if _all_ connections fail does it throw an error.\n\n```typescript\nconst { errors } = await computer.start();\n\nif (errors.length > 0) {\n console.warn('Some MCP servers failed to connect:', errors);\n}\n```\n\n### Stopping\n\n`computer.stop()` closes all MCP connections and kills managed processes:\n\n```typescript\nawait computer.stop();\n```\n\nAlways call `stop()` when the session ends to clean up MCP subprocesses. For managed processes (like Chrome), pass them in the config for automatic cleanup.\n\n## Dynamic Entries\n\nYou can add or remove MCP entries on a running `Computer` after `start()` has returned. 
This is useful when MCP configurations arrive after construction - for example, when a session-manager receives per-session entries from a dispatch payload and wants to wire them into the existing computer instead of rebuilding it.\n\n### `addEntry(namespace, entry, options?)`\n\nRegisters a new MCP entry under `namespace`. By default, connects immediately:\n\n```typescript\nawait computer.addEntry(\n 'github',\n Computer.stdio('@modelcontextprotocol/server-github', [], {\n env: { GITHUB_PERSONAL_ACCESS_TOKEN: process.env.GH_TOKEN! },\n }),\n);\n```\n\nPass `{ deferred: true }` to register the entry without connecting. The entry starts in a degraded state and connects on the next `restartEntry(namespace)` call - useful for lazy MCPs the agent activates on demand:\n\n```typescript\nawait computer.addEntry('github', githubEntry, { deferred: true });\n\n// Later, when the agent decides it needs GitHub:\nawait computer.restartEntry('github');\n```\n\n`addEntry` throws if the namespace already exists. To replace an entry, call `removeEntry` first.\n\nIf the immediate connection fails, `addEntry` does not throw - the entry is registered as degraded with the error message attached. Inspect via `getHealth()` or `restartEntry()` to retry.\n\n### `removeEntry(namespace)`\n\nCloses the entry's connection (if any) and drops it from the configuration. 
No-op when the namespace doesn't exist:\n\n```typescript\nawait computer.removeEntry('github');\n```\n\n### `restartEntry(namespace)`\n\nCloses the existing connection (if any) and reconnects with the current configuration:\n\n```typescript\nawait computer.restartEntry('github');\n```\n\nUse this to bring a deferred entry online for the first time, or to recover an entry that became degraded mid-session.\n\n### Detecting dynamic-entry support\n\nConsumers that work with arbitrary `ToolProvider` implementations can detect dynamic-entry capability with `isDynamicMcpProvider`:\n\n```typescript\nimport { isDynamicMcpProvider } from '@octavus/server-sdk';\n\nif (isDynamicMcpProvider(provider)) {\n await provider.addEntry('github', githubEntry);\n}\n```\n\n`Computer` always passes this check.\n\n## Chrome Launch Helper\n\nFor desktop applications that need to control a browser, `Computer.launchChrome()` launches Chrome with remote debugging enabled:\n\n```typescript\nconst browser = await Computer.launchChrome({\n profileDir: '/Users/me/.my-app/chrome-profiles/agent-1',\n debuggingPort: 9222, // Optional, auto-allocated if omitted\n flags: ['--window-size=1280,800'],\n});\n\nconsole.log(`Chrome running on port ${browser.port}, PID ${browser.pid}`);\n```\n\nPass the browser to `managedProcesses` for automatic cleanup when the computer stops:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n `--browser-url=http://127.0.0.1:${browser.port}`,\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [workspaceDir]),\n shell: Computer.shell({ cwd: workspaceDir, mode: 'unrestricted' }),\n },\n managedProcesses: [{ process: browser.process }],\n});\n```\n\n### ChromeLaunchOptions\n\n| Field | Required | Description |\n| --------------- | -------- | ----------------------------------------------------- |\n| `profileDir` | Yes | Directory for Chrome's user data (profile isolation) |\n| 
`debuggingPort` | No | Port for remote debugging (auto-allocated if omitted) |\n| `flags` | No | Additional Chrome launch flags |\n\n## ToolProvider Interface\n\n`Computer` implements the `ToolProvider` interface from `@octavus/core`:\n\n```typescript\ninterface ToolProvider {\n toolHandlers(): Record<string, ToolHandler>;\n toolSchemas(): ToolSchema[];\n}\n```\n\n`setDynamicTools()` accepts any `ToolProvider` directly - the session extracts schemas and handlers automatically:\n\n```typescript\nsession.setDynamicTools(computer);\n```\n\nYou can also pass a custom `ToolProvider`:\n\n```typescript\nconst customProvider: ToolProvider = {\n toolHandlers() {\n return {\n custom__my_tool: async (args) => {\n return { result: 'done' };\n },\n };\n },\n toolSchemas() {\n return [\n {\n name: 'custom__my_tool',\n description: 'A custom tool',\n inputSchema: {\n type: 'object',\n properties: {\n input: { type: 'string', description: 'Tool input' },\n },\n required: ['input'],\n },\n },\n ];\n },\n};\n\nconst session = client.agentSessions.attach(sessionId, {\n tools: { 'set-chat-title': titleHandler },\n});\n\nsession.setDynamicTools(customProvider);\n```\n\nFor cases where you need explicit control, `setDynamicTools()` also accepts a `DynamicTool[]` array:\n\n```typescript\ninterface DynamicTool {\n schema: ToolSchema;\n handler: ToolHandler;\n}\n```\n\n## Complete Example\n\nA desktop application with browser, filesystem, and shell capabilities:\n\n```typescript\nimport { Computer } from '@octavus/computer';\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst WORKSPACE_DIR = '/Users/me/projects/my-app';\nconst PROFILE_DIR = '/Users/me/.my-app/chrome-profiles/agent';\n\nasync function startSession(sessionId: string) {\n // 1. Launch Chrome with remote debugging\n const browser = await Computer.launchChrome({\n profileDir: PROFILE_DIR,\n });\n\n // 2. 
Create computer with all capabilities\n const computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n `--browser-url=http://127.0.0.1:${browser.port}`,\n '--no-usage-statistics',\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [WORKSPACE_DIR]),\n shell: Computer.shell({\n cwd: WORKSPACE_DIR,\n mode: 'unrestricted',\n }),\n },\n managedProcesses: [{ process: browser.process }],\n });\n\n // 3. Connect to all MCP servers\n const { errors } = await computer.start();\n if (errors.length > 0) {\n console.warn('Failed to connect:', errors);\n }\n\n // 4. Attach to session and register dynamic tools\n const client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n });\n\n const session = client.agentSessions.attach(sessionId, {\n tools: {\n 'set-chat-title': async (args) => {\n console.log('Chat title:', args.title);\n return { success: true };\n },\n },\n });\n\n session.setDynamicTools(computer);\n\n // 5. Execute and stream\n const events = session.execute({\n type: 'trigger',\n triggerName: 'user-message',\n input: { USER_MESSAGE: 'Navigate to github.com and take a screenshot' },\n });\n\n for await (const event of events) {\n // Handle stream events\n }\n\n // 6. 
Clean up\n await computer.stop();\n}\n```\n\n## API Reference\n\n### Computer\n\n```typescript\nclass Computer implements ToolProvider {\n constructor(config: ComputerConfig);\n\n // Static factories for MCP entries\n static stdio(\n command: string,\n args?: string[],\n options?: {\n env?: Record<string, string>;\n cwd?: string;\n },\n ): StdioConfig;\n\n static http(\n url: string,\n options?: {\n headers?: Record<string, string>;\n },\n ): HttpConfig;\n\n static shell(options: { cwd?: string; mode: ShellMode; timeout?: number }): ShellConfig;\n\n // Chrome launch helper\n static launchChrome(options: ChromeLaunchOptions): Promise<ChromeInstance>;\n\n // Lifecycle\n start(): Promise<{ errors: string[] }>;\n stop(): Promise<void>;\n\n // Dynamic entries\n addEntry(namespace: string, entry: McpEntry, options?: { deferred?: boolean }): Promise<void>;\n removeEntry(namespace: string): Promise<void>;\n restartEntry(namespace: string): Promise<void>;\n stopEntry(namespace: string): Promise<void>;\n\n // Health\n getHealth(): Promise<ComputerHealth>;\n ensureReady(): Promise<EnsureReadyResult>;\n retryDegraded(): Promise<{ recovered: string[]; stillDegraded: string[] }>;\n\n // ToolProvider implementation\n toolHandlers(): Record<string, ToolHandler>;\n toolSchemas(): ToolSchema[];\n}\n\ninterface ComputerHealth {\n healthy: boolean;\n entries: EntryHealth[];\n totalTools: number;\n}\n\ninterface EntryHealth {\n name: string;\n healthy: boolean;\n error?: string;\n}\n\ninterface EnsureReadyResult extends ComputerHealth {\n recovered?: string[];\n failedEntries?: string[];\n}\n```\n\n### ComputerConfig\n\n```typescript\ninterface ComputerConfig {\n mcpServers: Record<string, McpEntry>;\n managedProcesses?: { process: ChildProcess }[];\n /** Namespaces to skip during start() - they begin as degraded and can be connected on demand via restartEntry(). 
*/\n deferredEntries?: string[];\n}\n\ntype McpEntry = StdioConfig | HttpConfig | ShellConfig;\ntype ShellMode =\n | 'unrestricted'\n | {\n allowedPatterns?: RegExp[];\n blockedPatterns?: RegExp[];\n };\n```\n\n### ChromeInstance\n\n```typescript\ninterface ChromeInstance {\n port: number;\n process: ChildProcess;\n pid: number;\n}\n```\n",
|
|
90
90
|
excerpt: "Computer The package gives agents access to a physical or virtual machine's browser, filesystem, and shell. It connects to MCP servers, discovers their tools, and provides them to the server-sdk....",
|
|
91
91
|
order: 8
|
|
92
92
|
},
|
|
@@ -104,7 +104,7 @@ var docs_default = [
|
|
|
104
104
|
section: "client-sdk",
|
|
105
105
|
title: "Overview",
|
|
106
106
|
description: "Introduction to the Octavus Client SDKs for building chat interfaces.",
|
|
107
|
-
content: "\n# Client SDK Overview\n\nOctavus provides two packages for frontend integration:\n\n| Package | Purpose | Use When |\n| --------------------- | ------------------------ | ----------------------------------------------------- |\n| `@octavus/react` | React hooks and bindings | Building React applications |\n| `@octavus/client-sdk` | Framework-agnostic core | Using Vue, Svelte, vanilla JS, or custom integrations |\n\n**Most users should install `@octavus/react`** - it includes everything from `@octavus/client-sdk` plus React-specific hooks.\n\n## Installation\n\n### React Applications\n\n```bash\nnpm install @octavus/react\n```\n\n**Current version:** `4.1.0`\n\n### Other Frameworks\n\n```bash\nnpm install @octavus/client-sdk\n```\n\n**Current version:** `4.1.0`\n\n## Transport Pattern\n\nThe Client SDK uses a **transport abstraction** to handle communication with your backend. This gives you flexibility in how events are delivered:\n\n| Transport | Use Case | Docs |\n| ----------------------- | -------------------------------------------- | ----------------------------------------------------- |\n| `createHttpTransport` | HTTP/SSE (Next.js, Express, etc.) 
| [HTTP Transport](/docs/client-sdk/http-transport) |\n| `createSocketTransport` | WebSocket, SockJS, or other socket protocols | [Socket Transport](/docs/client-sdk/socket-transport) |\n\nWhen the transport changes (e.g., when `sessionId` changes), the `useOctavusChat` hook automatically reinitializes with the new transport.\n\n> **Recommendation**: Use HTTP transport unless you specifically need WebSocket features (custom real-time events, Meteor/Phoenix, etc.).\n\n## React Usage\n\nThe `useOctavusChat` hook provides state management and streaming for React applications:\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport, type UIMessage } from '@octavus/react';\n\nfunction Chat({ sessionId }: { sessionId: string }) {\n // Create a stable transport instance (memoized on sessionId)\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n }),\n [sessionId],\n );\n\n const { messages, status, send } = useOctavusChat({ transport });\n\n const sendMessage = async (text: string) => {\n await send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n };\n\n return (\n <div>\n {messages.map((msg) => (\n <MessageBubble key={msg.id} message={msg} />\n ))}\n </div>\n );\n}\n\nfunction MessageBubble({ message }: { message: UIMessage }) {\n return (\n <div>\n {message.parts.map((part, i) => {\n if (part.type === 'text') {\n return <p key={i}>{part.text}</p>;\n }\n return null;\n })}\n </div>\n );\n}\n```\n\n## Framework-Agnostic Usage\n\nThe `OctavusChat` class can be used with any framework or vanilla JavaScript:\n\n```typescript\nimport { OctavusChat, createHttpTransport } from '@octavus/client-sdk';\n\nconst transport = createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', 
{\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n});\n\nconst chat = new OctavusChat({ transport });\n\n// Subscribe to state changes\nconst unsubscribe = chat.subscribe(() => {\n console.log('Messages:', chat.messages);\n console.log('Status:', chat.status);\n // Update your UI here\n});\n\n// Send a message\nawait chat.send('user-message', { USER_MESSAGE: 'Hello' }, { userMessage: { content: 'Hello' } });\n\n// Cleanup when done\nunsubscribe();\n```\n\n## Key Features\n\n### Unified Send Function\n\nThe `send` function handles both user message display and agent triggering in one call:\n\n```tsx\nconst { send } = useOctavusChat({ transport });\n\n// Add user message to UI and trigger agent\nawait send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n\n// Trigger without adding a user message (e.g., button click)\nawait send('request-human');\n```\n\n### Message Parts\n\nMessages contain ordered `parts` for rich content:\n\n```tsx\nconst { messages } = useOctavusChat({ transport });\n\n// Each message has typed parts\nmessage.parts.map((part) => {\n switch (part.type) {\n case 'text': // Text content\n case 'reasoning': // Extended reasoning/thinking\n case 'tool-call': // Tool execution\n case 'operation': // Internal operations (set-resource, etc.)\n }\n});\n```\n\n### Status Tracking\n\n```tsx\nconst { status } = useOctavusChat({ transport });\n\n// status: 'idle' | 'streaming' | 'error' | 'awaiting-input'\n// 'awaiting-input' occurs when interactive client tools need user action\n```\n\n### Stop Streaming\n\n```tsx\nconst { stop } = useOctavusChat({ transport });\n\n// Stop current stream and finalize message\nstop();\n```\n\n### Retry Last Trigger\n\nRe-execute the last trigger from the same starting point. 
Messages are rolled back to the state before the trigger, the user message is re-added (if any), and the agent re-executes. Already-uploaded files are reused without re-uploading.\n\n```tsx\nconst { retry, canRetry } = useOctavusChat({ transport });\n\n// Retry after an error, cancellation, or unsatisfactory result\nif (canRetry) {\n await retry();\n}\n```\n\n`canRetry` is `true` when a trigger has been sent and the chat is not currently streaming or awaiting input.\n\n## Hook Reference (React)\n\n### useOctavusChat\n\n```typescript\nfunction useOctavusChat(options: OctavusChatOptions): UseOctavusChatReturn;\n\ninterface OctavusChatOptions {\n // Required: Transport for streaming events\n transport: Transport;\n\n // Optional: Function to request upload URLs for file uploads\n requestUploadUrls?: (\n files: { filename: string; mediaType: string; size: number }[],\n ) => Promise<UploadUrlsResponse>;\n\n // Optional: Client-side tool handlers\n // - Function: executes automatically and returns result\n // - 'interactive': appears in pendingClientTools for user input\n clientTools?: Record<string, ClientToolHandler>;\n\n // Optional: Pre-populate with existing messages (session restore)\n initialMessages?: UIMessage[];\n\n // Optional: Callbacks\n onError?: (error: OctavusError) => void; // Structured error with type, source, retryable\n onFinish?: () => void;\n onStop?: () => void; // Called when user stops generation\n onResourceUpdate?: (name: string, value: unknown) => void;\n}\n\ninterface UseOctavusChatReturn {\n // State\n messages: UIMessage[];\n status: ChatStatus; // 'idle' | 'streaming' | 'error' | 'awaiting-input'\n error: OctavusError | null; // Structured error with type, source, retryable\n\n // Connection (socket transport only - undefined for HTTP)\n connectionState: ConnectionState | undefined; // 'disconnected' | 'connecting' | 'connected' | 'error'\n connectionError: Error | undefined;\n\n // Client tools (interactive tools awaiting user input)\n 
pendingClientTools: Record<string, InteractiveTool[]>; // Keyed by tool name\n\n // Actions\n send: (\n triggerName: string,\n input?: Record<string, unknown>,\n options?: { userMessage?: UserMessageInput },\n ) => Promise<void>;\n stop: () => void;\n retry: () => Promise<void>; // Retry last trigger from same starting point\n canRetry: boolean; // Whether retry() can be called\n\n // Connection management (socket transport only - undefined for HTTP)\n connect: (() => Promise<void>) | undefined;\n disconnect: (() => void) | undefined;\n\n // File uploads (requires requestUploadUrls)\n uploadFiles: (\n files: FileList | File[],\n onProgress?: (fileIndex: number, progress: number) => void,\n ) => Promise<FileReference[]>;\n}\n\ninterface UserMessageInput {\n content?: string;\n files?: FileList | File[] | FileReference[];\n}\n```\n\n### useAutoScroll\n\nSmart auto-scroll for chat containers. Scrolls to bottom when content updates, but pauses if the user has scrolled up. See [Streaming - Auto-Scroll](/docs/client-sdk/streaming#auto-scroll) for full usage.\n\n```typescript\nfunction useAutoScroll(options?: UseAutoScrollOptions): {\n scrollRef: RefObject<HTMLDivElement | null>;\n handleScroll: () => void;\n scrollOnUpdate: () => void;\n resetAutoScroll: () => void;\n};\n\ninterface UseAutoScrollOptions {\n scrollRef?: RefObject<HTMLDivElement | null>;\n threshold?: number; // Distance from bottom in px (default: 80)\n}\n```\n\n## Transport Reference\n\n### createHttpTransport\n\nCreates an HTTP/SSE transport using native `fetch()`:\n\n```typescript\nimport { createHttpTransport } from '@octavus/react';\n\nconst transport = createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n});\n```\n\n### createSocketTransport\n\nCreates a WebSocket/SockJS transport for real-time 
connections:\n\n```typescript\nimport { createSocketTransport } from '@octavus/react';\n\nconst transport = createSocketTransport({\n connect: () =>\n new Promise((resolve, reject) => {\n const ws = new WebSocket(`wss://api.example.com/stream?sessionId=${sessionId}`);\n ws.onopen = () => resolve(ws);\n ws.onerror = () => reject(new Error('Connection failed'));\n }),\n});\n```\n\nSocket transport provides additional connection management:\n\n```typescript\n// Access connection state directly\ntransport.connectionState; // 'disconnected' | 'connecting' | 'connected' | 'error'\n\n// Subscribe to state changes\ntransport.onConnectionStateChange((state, error) => {\n /* ... */\n});\n\n// Eager connection (instead of lazy on first send)\nawait transport.connect();\n\n// Manual disconnect\ntransport.disconnect();\n```\n\nFor detailed WebSocket/SockJS usage including custom events, reconnection patterns, and server-side implementation, see [Socket Transport](/docs/client-sdk/socket-transport).\n\n## Class Reference (Framework-Agnostic)\n\n### OctavusChat\n\n```typescript\nclass OctavusChat {\n constructor(options: OctavusChatOptions);\n\n // State (read-only)\n readonly messages: UIMessage[];\n readonly status: ChatStatus; // 'idle' | 'streaming' | 'error' | 'awaiting-input'\n readonly error: OctavusError | null; // Structured error\n readonly pendingClientTools: Record<string, InteractiveTool[]>; // Interactive tools\n\n // Actions\n send(\n triggerName: string,\n input?: Record<string, unknown>,\n options?: { userMessage?: UserMessageInput },\n ): Promise<void>;\n stop(): void;\n\n // Subscription\n subscribe(callback: () => void): () => void; // Returns unsubscribe function\n}\n```\n\n## Next Steps\n\n- [HTTP Transport](/docs/client-sdk/http-transport) - HTTP/SSE integration (recommended)\n- [Socket Transport](/docs/client-sdk/socket-transport) - WebSocket and SockJS integration\n- [Messages](/docs/client-sdk/messages) - Working with message state\n- 
[Streaming](/docs/client-sdk/streaming) - Building streaming UIs\n- [Client Tools](/docs/client-sdk/client-tools) - Interactive browser-side tool handling\n- [Operations](/docs/client-sdk/execution-blocks) - Showing agent progress\n- [Error Handling](/docs/client-sdk/error-handling) - Handling errors with type guards\n- [File Uploads](/docs/client-sdk/file-uploads) - Uploading images and documents\n- [Examples](/docs/examples/overview) - Complete working examples\n",
|
|
107
|
+
content: "\n# Client SDK Overview\n\nOctavus provides two packages for frontend integration:\n\n| Package | Purpose | Use When |\n| --------------------- | ------------------------ | ----------------------------------------------------- |\n| `@octavus/react` | React hooks and bindings | Building React applications |\n| `@octavus/client-sdk` | Framework-agnostic core | Using Vue, Svelte, vanilla JS, or custom integrations |\n\n**Most users should install `@octavus/react`** - it includes everything from `@octavus/client-sdk` plus React-specific hooks.\n\n## Installation\n\n### React Applications\n\n```bash\nnpm install @octavus/react\n```\n\n**Current version:** `5.0.0`\n\n### Other Frameworks\n\n```bash\nnpm install @octavus/client-sdk\n```\n\n**Current version:** `5.0.0`\n\n## Transport Pattern\n\nThe Client SDK uses a **transport abstraction** to handle communication with your backend. This gives you flexibility in how events are delivered:\n\n| Transport | Use Case | Docs |\n| ----------------------- | -------------------------------------------- | ----------------------------------------------------- |\n| `createHttpTransport` | HTTP/SSE (Next.js, Express, etc.) 
| [HTTP Transport](/docs/client-sdk/http-transport) |\n| `createSocketTransport` | WebSocket, SockJS, or other socket protocols | [Socket Transport](/docs/client-sdk/socket-transport) |\n\nWhen the transport changes (e.g., when `sessionId` changes), the `useOctavusChat` hook automatically reinitializes with the new transport.\n\n> **Recommendation**: Use HTTP transport unless you specifically need WebSocket features (custom real-time events, Meteor/Phoenix, etc.).\n\n## React Usage\n\nThe `useOctavusChat` hook provides state management and streaming for React applications:\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport, type UIMessage } from '@octavus/react';\n\nfunction Chat({ sessionId }: { sessionId: string }) {\n // Create a stable transport instance (memoized on sessionId)\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n }),\n [sessionId],\n );\n\n const { messages, status, send } = useOctavusChat({ transport });\n\n const sendMessage = async (text: string) => {\n await send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n };\n\n return (\n <div>\n {messages.map((msg) => (\n <MessageBubble key={msg.id} message={msg} />\n ))}\n </div>\n );\n}\n\nfunction MessageBubble({ message }: { message: UIMessage }) {\n return (\n <div>\n {message.parts.map((part, i) => {\n if (part.type === 'text') {\n return <p key={i}>{part.text}</p>;\n }\n return null;\n })}\n </div>\n );\n}\n```\n\n## Framework-Agnostic Usage\n\nThe `OctavusChat` class can be used with any framework or vanilla JavaScript:\n\n```typescript\nimport { OctavusChat, createHttpTransport } from '@octavus/client-sdk';\n\nconst transport = createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', 
{\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n});\n\nconst chat = new OctavusChat({ transport });\n\n// Subscribe to state changes\nconst unsubscribe = chat.subscribe(() => {\n console.log('Messages:', chat.messages);\n console.log('Status:', chat.status);\n // Update your UI here\n});\n\n// Send a message\nawait chat.send('user-message', { USER_MESSAGE: 'Hello' }, { userMessage: { content: 'Hello' } });\n\n// Cleanup when done\nunsubscribe();\n```\n\n## Key Features\n\n### Unified Send Function\n\nThe `send` function handles both user message display and agent triggering in one call:\n\n```tsx\nconst { send } = useOctavusChat({ transport });\n\n// Add user message to UI and trigger agent\nawait send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n\n// Trigger without adding a user message (e.g., button click)\nawait send('request-human');\n```\n\n### Message Parts\n\nMessages contain ordered `parts` for rich content:\n\n```tsx\nconst { messages } = useOctavusChat({ transport });\n\n// Each message has typed parts\nmessage.parts.map((part) => {\n switch (part.type) {\n case 'text': // Text content\n case 'reasoning': // Extended reasoning/thinking\n case 'tool-call': // Tool execution\n case 'operation': // Internal operations (set-resource, etc.)\n }\n});\n```\n\n### Status Tracking\n\n```tsx\nconst { status } = useOctavusChat({ transport });\n\n// status: 'idle' | 'streaming' | 'error' | 'awaiting-input'\n// 'awaiting-input' occurs when interactive client tools need user action\n```\n\n### Stop Streaming\n\n```tsx\nconst { stop } = useOctavusChat({ transport });\n\n// Stop current stream and finalize message\nstop();\n```\n\n### Retry Last Trigger\n\nRe-execute the last trigger from the same starting point. 
Messages are rolled back to the state before the trigger, the user message is re-added (if any), and the agent re-executes. Already-uploaded files are reused without re-uploading.\n\n```tsx\nconst { retry, canRetry } = useOctavusChat({ transport });\n\n// Retry after an error, cancellation, or unsatisfactory result\nif (canRetry) {\n await retry();\n}\n```\n\n`canRetry` is `true` when a trigger has been sent and the chat is not currently streaming or awaiting input.\n\n## Hook Reference (React)\n\n### useOctavusChat\n\n```typescript\nfunction useOctavusChat(options: OctavusChatOptions): UseOctavusChatReturn;\n\ninterface OctavusChatOptions {\n // Required: Transport for streaming events\n transport: Transport;\n\n // Optional: Function to request upload URLs for file uploads\n requestUploadUrls?: (\n files: { filename: string; mediaType: string; size: number }[],\n ) => Promise<UploadUrlsResponse>;\n\n // Optional: Client-side tool handlers\n // - Function: executes automatically and returns result\n // - 'interactive': appears in pendingClientTools for user input\n clientTools?: Record<string, ClientToolHandler>;\n\n // Optional: Pre-populate with existing messages (session restore)\n initialMessages?: UIMessage[];\n\n // Optional: Callbacks\n onError?: (error: OctavusError) => void; // Structured error with type, source, retryable\n onFinish?: () => void;\n onStop?: () => void; // Called when user stops generation\n onResourceUpdate?: (name: string, value: unknown) => void;\n}\n\ninterface UseOctavusChatReturn {\n // State\n messages: UIMessage[];\n status: ChatStatus; // 'idle' | 'streaming' | 'error' | 'awaiting-input'\n error: OctavusError | null; // Structured error with type, source, retryable\n\n // Connection (socket transport only - undefined for HTTP)\n connectionState: ConnectionState | undefined; // 'disconnected' | 'connecting' | 'connected' | 'error'\n connectionError: Error | undefined;\n\n // Client tools (interactive tools awaiting user input)\n 
pendingClientTools: Record<string, InteractiveTool[]>; // Keyed by tool name\n\n // Actions\n send: (\n triggerName: string,\n input?: Record<string, unknown>,\n options?: { userMessage?: UserMessageInput },\n ) => Promise<void>;\n stop: () => void;\n retry: () => Promise<void>; // Retry last trigger from same starting point\n canRetry: boolean; // Whether retry() can be called\n\n // Connection management (socket transport only - undefined for HTTP)\n connect: (() => Promise<void>) | undefined;\n disconnect: (() => void) | undefined;\n\n // File uploads (requires requestUploadUrls)\n uploadFiles: (\n files: FileList | File[],\n onProgress?: (fileIndex: number, progress: number) => void,\n ) => Promise<FileReference[]>;\n}\n\ninterface UserMessageInput {\n content?: string;\n files?: FileList | File[] | FileReference[];\n}\n```\n\n### useAutoScroll\n\nSmart auto-scroll for chat containers. Scrolls to bottom when content updates, but pauses if the user has scrolled up. See [Streaming - Auto-Scroll](/docs/client-sdk/streaming#auto-scroll) for full usage.\n\n```typescript\nfunction useAutoScroll(options?: UseAutoScrollOptions): {\n scrollRef: RefObject<HTMLDivElement | null>;\n handleScroll: () => void;\n scrollOnUpdate: () => void;\n resetAutoScroll: () => void;\n};\n\ninterface UseAutoScrollOptions {\n scrollRef?: RefObject<HTMLDivElement | null>;\n threshold?: number; // Distance from bottom in px (default: 80)\n}\n```\n\n## Transport Reference\n\n### createHttpTransport\n\nCreates an HTTP/SSE transport using native `fetch()`:\n\n```typescript\nimport { createHttpTransport } from '@octavus/react';\n\nconst transport = createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n});\n```\n\n### createSocketTransport\n\nCreates a WebSocket/SockJS transport for real-time 
connections:\n\n```typescript\nimport { createSocketTransport } from '@octavus/react';\n\nconst transport = createSocketTransport({\n connect: () =>\n new Promise((resolve, reject) => {\n const ws = new WebSocket(`wss://api.example.com/stream?sessionId=${sessionId}`);\n ws.onopen = () => resolve(ws);\n ws.onerror = () => reject(new Error('Connection failed'));\n }),\n});\n```\n\nSocket transport provides additional connection management:\n\n```typescript\n// Access connection state directly\ntransport.connectionState; // 'disconnected' | 'connecting' | 'connected' | 'error'\n\n// Subscribe to state changes\ntransport.onConnectionStateChange((state, error) => {\n /* ... */\n});\n\n// Eager connection (instead of lazy on first send)\nawait transport.connect();\n\n// Manual disconnect\ntransport.disconnect();\n```\n\nFor detailed WebSocket/SockJS usage including custom events, reconnection patterns, and server-side implementation, see [Socket Transport](/docs/client-sdk/socket-transport).\n\n## Class Reference (Framework-Agnostic)\n\n### OctavusChat\n\n```typescript\nclass OctavusChat {\n constructor(options: OctavusChatOptions);\n\n // State (read-only)\n readonly messages: UIMessage[];\n readonly status: ChatStatus; // 'idle' | 'streaming' | 'error' | 'awaiting-input'\n readonly error: OctavusError | null; // Structured error\n readonly pendingClientTools: Record<string, InteractiveTool[]>; // Interactive tools\n\n // Actions\n send(\n triggerName: string,\n input?: Record<string, unknown>,\n options?: { userMessage?: UserMessageInput },\n ): Promise<void>;\n stop(): void;\n\n // Subscription\n subscribe(callback: () => void): () => void; // Returns unsubscribe function\n}\n```\n\n## Next Steps\n\n- [HTTP Transport](/docs/client-sdk/http-transport) - HTTP/SSE integration (recommended)\n- [Socket Transport](/docs/client-sdk/socket-transport) - WebSocket and SockJS integration\n- [Messages](/docs/client-sdk/messages) - Working with message state\n- 
[Streaming](/docs/client-sdk/streaming) - Building streaming UIs\n- [Client Tools](/docs/client-sdk/client-tools) - Interactive browser-side tool handling\n- [Operations](/docs/client-sdk/execution-blocks) - Showing agent progress\n- [Error Handling](/docs/client-sdk/error-handling) - Handling errors with type guards\n- [File Uploads](/docs/client-sdk/file-uploads) - Uploading images and documents\n- [Examples](/docs/examples/overview) - Complete working examples\n",
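The `canRetry` rule quoted in the overview content above ("`true` when a trigger has been sent and the chat is not currently streaming or awaiting input") can be restated as a small predicate. This is only a sketch of the documented behavior, not the SDK's internal code: `canRetryFrom` and its `hasSentTrigger` parameter are hypothetical names introduced here, and the `ChatStatus` union is copied from the hook reference.

```typescript
// Status values as listed in the hook reference above.
type ChatStatus = 'idle' | 'streaming' | 'error' | 'awaiting-input';

// Hypothetical helper (not an SDK export): restates the documented canRetry rule.
// Retry is allowed once a trigger has been sent and the chat is neither
// streaming nor waiting on interactive tool input.
function canRetryFrom(hasSentTrigger: boolean, status: ChatStatus): boolean {
  return hasSentTrigger && status !== 'streaming' && status !== 'awaiting-input';
}

console.log(canRetryFrom(true, 'error')); // true
console.log(canRetryFrom(true, 'streaming')); // false
console.log(canRetryFrom(false, 'idle')); // false
```

In application code the hook's own `canRetry` value should be used directly; the predicate is only meant to make the rule easy to read at a glance.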
|
|
108
108
|
excerpt: "Client SDK Overview Octavus provides two packages for frontend integration: | Package | Purpose | Use When | |...",
|
|
109
109
|
order: 1
|
|
110
110
|
},
|
|
@@ -113,7 +113,7 @@ var docs_default = [
|
|
|
113
113
|
section: "client-sdk",
|
|
114
114
|
title: "Messages",
|
|
115
115
|
description: "Working with message state in the Client SDK.",
|
|
116
|
-
content: "\n# Messages\n\nMessages represent the conversation history. The Client SDK tracks messages automatically and provides structured access to their content through typed parts.\n\n## Message Structure\n\n```typescript\ninterface UIMessage {\n id: string;\n role: 'user' | 'assistant';\n parts: UIMessagePart[];\n status: 'streaming' | 'done';\n createdAt: Date;\n}\n```\n\n### Message Parts\n\nMessages contain ordered `parts` that preserve content ordering:\n\n```typescript\ntype UIMessagePart =\n | UITextPart\n | UIReasoningPart\n | UIToolCallPart\n | UIOperationPart\n | UISourcePart\n | UIFilePart\n | UIObjectPart\n | UITodoPart\n | UIWorkerPart\n | UIStepStartPart;\n\n// Text content\ninterface UITextPart {\n type: 'text';\n text: string;\n status: 'streaming' | 'done';\n thread?: string; // For named threads (e.g., \"summary\")\n}\n\n// Extended reasoning/thinking\ninterface UIReasoningPart {\n type: 'reasoning';\n text: string;\n status: 'streaming' | 'done';\n thread?: string;\n}\n\n// Tool execution\ninterface UIToolCallPart {\n type: 'tool-call';\n toolCallId: string;\n toolName: string;\n displayName?: string; // Human-readable name\n args: Record<string, unknown>;\n result?: unknown;\n error?: string;\n status: 'pending' | 'running' | 'done' | 'error' | 'cancelled';\n thread?: string;\n}\n\n// Internal operations (set-resource, serialize-thread)\ninterface UIOperationPart {\n type: 'operation';\n operationId: string;\n name: string;\n operationType: string;\n status: 'running' | 'done';\n thread?: string;\n}\n\n// Source references (from web search, document processing)\ninterface UISourcePart {\n type: 'source';\n sourceType: 'url' | 'document';\n id: string;\n url?: string; // For URL sources\n title?: string;\n mediaType?: string; // For document sources\n filename?: string;\n thread?: string;\n}\n\n// Generated files (from image generation, skills, code execution)\ninterface UIFilePart {\n type: 'file';\n id: string;\n mediaType: string; // MIME 
type (e.g., 'image/png', 'image/webp')\n url: string; // Download/display URL (presigned S3 URL)\n filename?: string;\n size?: number;\n toolCallId?: string; // Present if from a tool call\n thread?: string;\n}\n\n// Structured output (when responseType is used)\ninterface UIObjectPart {\n type: 'object';\n id: string;\n typeName: string; // Type name from protocol (e.g., \"ChatResponse\")\n partial?: unknown; // Partial object while streaming\n object?: unknown; // Final object when done\n status: 'streaming' | 'done' | 'error';\n error?: string;\n thread?: string;\n}\n\n// Structured task list (when the agent uses octavus_todo_write)\ninterface UITodoPart {\n type: 'todo';\n todos: {\n id: string;\n content: string;\n status: 'pending' | 'in_progress' | 'completed' | 'cancelled';\n }[];\n status: 'streaming' | 'done';\n thread?: string;\n}\n\n// Sub-agent execution container (when an agent invokes a worker)\ninterface UIWorkerPart {\n type: 'worker';\n workerId: string;\n workerSlug: string;\n description?: string;\n input?: Record<string, unknown>;\n parts: UIMessagePart[]; // Nested parts from the worker (excluding nested workers)\n output?: unknown;\n error?: string;\n status: 'running' | 'done' | 'error';\n}\n\n// Step boundary marker (structural, not rendered visually)\ninterface UIStepStartPart {\n type: 'step-start';\n}\n```\n\n## Sending Messages\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport } from '@octavus/react';\n\nfunction Chat({ sessionId }: { sessionId: string }) {\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n }),\n [sessionId],\n );\n\n const { send } = useOctavusChat({ transport });\n\n async function handleSend(text: string) {\n // Add user message to UI and trigger agent\n 
await send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n }\n\n // ...\n}\n```\n\nThe `send` function:\n\n1. Adds the user message to the UI immediately (if `userMessage` is provided)\n2. Triggers the agent with the specified trigger name and input\n3. Streams the assistant's response back\n\n### Message Content Types\n\nThe `content` field in `userMessage` accepts both strings and objects:\n\n```tsx\n// Text content \u2192 creates a text part\nawait send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n\n// Object content \u2192 creates an object part (uses `type` field as typeName)\nconst selection = { type: 'product_selection', productId: 'abc123', action: 'select' };\nawait send('user-message', { USER_INPUT: selection }, { userMessage: { content: selection } });\n```\n\nWhen passing an object as `content`:\n\n- The SDK creates a `UIObjectPart` instead of a `UITextPart`\n- The object's `type` field is used as the `typeName` (defaults to `'object'` if not present)\n- This is useful for rich UI interactions like product selections, quick replies, etc.\n\n### Sending with Files\n\nInclude file attachments with messages:\n\n```tsx\nimport type { FileReference } from '@octavus/react';\n\nasync function handleSend(text: string, files?: FileReference[]) {\n await send(\n 'user-message',\n {\n USER_MESSAGE: text,\n FILES: files, // Array of FileReference\n },\n {\n userMessage: {\n content: text,\n files: files, // Shows files in user message bubble\n },\n },\n );\n}\n```\n\nSee [File Uploads](/docs/client-sdk/file-uploads) for complete upload flow.\n\n## Rendering Messages\n\n### Basic Rendering\n\n```tsx\nfunction MessageList({ messages }: { messages: UIMessage[] }) {\n return (\n <div className=\"space-y-4\">\n {messages.map((msg) => (\n <MessageBubble key={msg.id} message={msg} />\n ))}\n </div>\n );\n}\n\nfunction MessageBubble({ message }: { message: UIMessage }) {\n const isUser = message.role === 
'user';\n\n return (\n <div className={isUser ? 'text-right' : 'text-left'}>\n <div className=\"inline-block p-3 rounded-lg\">\n {message.parts.map((part, i) => (\n <PartRenderer key={i} part={part} />\n ))}\n </div>\n </div>\n );\n}\n```\n\n### Rendering Parts\n\n```tsx\nimport { isOtherThread, type UIMessagePart } from '@octavus/react';\n\nfunction PartRenderer({ part }: { part: UIMessagePart }) {\n // Check if part belongs to a named thread (e.g., \"summary\")\n if (isOtherThread(part)) {\n return <OtherThreadPart part={part} />;\n }\n\n switch (part.type) {\n case 'text':\n return <TextPart part={part} />;\n\n case 'reasoning':\n return (\n <details className=\"text-gray-500\">\n <summary>Thinking...</summary>\n <pre className=\"text-sm\">{part.text}</pre>\n </details>\n );\n\n case 'tool-call':\n return (\n <div className=\"bg-gray-100 p-2 rounded text-sm\">\n \u{1F527} {part.displayName || part.toolName}\n {part.status === 'done' && ' \u2713'}\n {part.status === 'error' && ` \u2717 ${part.error}`}\n </div>\n );\n\n case 'operation':\n return (\n <div className=\"text-gray-500 text-sm\">\n {part.name}\n {part.status === 'done' && ' \u2713'}\n </div>\n );\n\n case 'source':\n return (\n <div className=\"text-blue-500 text-sm\">\u{1F4CE} {part.title || part.url || part.filename}</div>\n );\n\n case 'file':\n // Render images inline, other files as download links\n if (part.mediaType.startsWith('image/')) {\n return (\n <img\n src={part.url}\n alt={part.filename || 'Generated image'}\n className=\"max-w-full rounded-lg\"\n />\n );\n }\n return (\n <a href={part.url} className=\"text-blue-500 text-sm underline\">\n \u{1F4C4} {part.filename || 'Download file'}\n </a>\n );\n\n case 'object':\n // For structured output, render custom UI based on typeName\n // See Structured Output guide for more details\n return <ObjectPartRenderer part={part} />;\n\n case 'step-start':\n return null;\n\n default:\n return null;\n }\n}\n\nfunction TextPart({ part }: { part: 
UITextPart }) {\n return (\n <p>\n {part.text}\n {part.status === 'streaming' && (\n <span className=\"inline-block w-2 h-4 bg-gray-400 animate-pulse ml-1\" />\n )}\n </p>\n );\n}\n```\n\n## Named Threads\n\nContent from named threads (like \"summary\") is identified by the `thread` property. Use the `isOtherThread` helper:\n\n```tsx\nimport { isOtherThread } from '@octavus/react';\n\nfunction PartRenderer({ part }: { part: UIMessagePart }) {\n if (isOtherThread(part)) {\n // Render differently for named threads\n return (\n <div className=\"bg-amber-50 p-2 rounded border border-amber-200\">\n <span className=\"text-amber-600 text-sm\">\n {part.thread}: {part.type === 'text' && part.text}\n </span>\n </div>\n );\n }\n\n // Regular rendering for main thread\n // ...\n}\n```\n\n## Session Restore\n\nWhen restoring a session, fetch messages from your backend and pass them to the hook:\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport, type UIMessage } from '@octavus/react';\n\ninterface ChatProps {\n sessionId: string;\n initialMessages: UIMessage[];\n}\n\nfunction Chat({ sessionId, initialMessages }: ChatProps) {\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n }),\n [sessionId],\n );\n\n // Pass existing messages to restore the conversation\n const { messages } = useOctavusChat({\n transport,\n initialMessages,\n });\n\n // ...\n}\n```\n\nOn your backend, use `agentSessions.getMessages()` to fetch UI-ready messages:\n\n```typescript\n// Server-side\nconst session = await client.agentSessions.getMessages(sessionId);\n// session.messages is UIMessage[] ready for the client\n```\n\n## Callbacks\n\n```tsx\nuseOctavusChat({\n transport,\n onFinish: () => {\n console.log('Stream completed');\n // Scroll to 
bottom, play sound, etc.\n },\n onError: (error) => {\n console.error('Error:', error);\n toast.error('Failed to get response');\n },\n onResourceUpdate: (name, value) => {\n console.log('Resource updated:', name, value);\n },\n});\n```\n",
|
|
116
|
+
content: "\n# Messages\n\nMessages represent the conversation history. The Client SDK tracks messages automatically and provides structured access to their content through typed parts.\n\n## Message Structure\n\n```typescript\ninterface UIMessage {\n id: string;\n role: 'user' | 'assistant';\n parts: UIMessagePart[];\n status: 'streaming' | 'done';\n createdAt: Date;\n sender?: UIMessageSender; // Author of a user message, in multi-user chats\n}\n\ninterface UIMessageSender {\n id?: string;\n name?: string;\n image?: string; // Avatar URL\n}\n```\n\n### Message Parts\n\nMessages contain ordered `parts` that preserve content ordering:\n\n```typescript\ntype UIMessagePart =\n | UITextPart\n | UIReasoningPart\n | UIToolCallPart\n | UIOperationPart\n | UISourcePart\n | UIFilePart\n | UIObjectPart\n | UITodoPart\n | UIWorkerPart\n | UIStepStartPart;\n\n// Text content\ninterface UITextPart {\n type: 'text';\n text: string;\n status: 'streaming' | 'done';\n thread?: string; // For named threads (e.g., \"summary\")\n}\n\n// Extended reasoning/thinking\ninterface UIReasoningPart {\n type: 'reasoning';\n text: string;\n status: 'streaming' | 'done';\n thread?: string;\n}\n\n// Tool execution\ninterface UIToolCallPart {\n type: 'tool-call';\n toolCallId: string;\n toolName: string;\n displayName?: string; // Human-readable name\n args: Record<string, unknown>;\n result?: unknown;\n error?: string;\n status: 'pending' | 'running' | 'done' | 'error' | 'cancelled';\n thread?: string;\n}\n\n// Internal operations (set-resource, serialize-thread)\ninterface UIOperationPart {\n type: 'operation';\n operationId: string;\n name: string;\n operationType: string;\n status: 'running' | 'done';\n thread?: string;\n}\n\n// Source references (from web search, document processing)\ninterface UISourcePart {\n type: 'source';\n sourceType: 'url' | 'document';\n id: string;\n url?: string; // For URL sources\n title?: string;\n mediaType?: string; // For document sources\n filename?: 
string;\n thread?: string;\n}\n\n// Generated files (from image generation, skills, code execution)\ninterface UIFilePart {\n type: 'file';\n id: string;\n mediaType: string; // MIME type (e.g., 'image/png', 'image/webp')\n url: string; // Download/display URL (presigned S3 URL)\n filename?: string;\n size?: number;\n toolCallId?: string; // Present if from a tool call\n thread?: string;\n}\n\n// Structured output (when responseType is used)\ninterface UIObjectPart {\n type: 'object';\n id: string;\n typeName: string; // Type name from protocol (e.g., \"ChatResponse\")\n partial?: unknown; // Partial object while streaming\n object?: unknown; // Final object when done\n status: 'streaming' | 'done' | 'error';\n error?: string;\n thread?: string;\n}\n\n// Structured task list (when the agent uses octavus_todo_write)\ninterface UITodoPart {\n type: 'todo';\n todos: {\n id: string;\n content: string;\n status: 'pending' | 'in_progress' | 'completed' | 'cancelled';\n }[];\n status: 'streaming' | 'done';\n thread?: string;\n}\n\n// Sub-agent execution container (when an agent invokes a worker)\ninterface UIWorkerPart {\n type: 'worker';\n workerId: string;\n workerSlug: string;\n description?: string;\n input?: Record<string, unknown>;\n parts: UIMessagePart[]; // Nested parts from the worker (excluding nested workers)\n output?: unknown;\n error?: string;\n status: 'running' | 'done' | 'error' | 'cancelled';\n}\n\n// Step boundary marker (structural, not rendered visually)\ninterface UIStepStartPart {\n type: 'step-start';\n}\n```\n\n## Sending Messages\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport } from '@octavus/react';\n\nfunction Chat({ sessionId }: { sessionId: string }) {\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: 
options?.signal,\n }),\n }),\n [sessionId],\n );\n\n const { send } = useOctavusChat({ transport });\n\n async function handleSend(text: string) {\n // Add user message to UI and trigger agent\n await send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n }\n\n // ...\n}\n```\n\nThe `send` function:\n\n1. Adds the user message to the UI immediately (if `userMessage` is provided)\n2. Triggers the agent with the specified trigger name and input\n3. Streams the assistant's response back\n\n### Message Content Types\n\nThe `content` field in `userMessage` accepts both strings and objects:\n\n```tsx\n// Text content \u2192 creates a text part\nawait send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n\n// Object content \u2192 creates an object part (uses `type` field as typeName)\nconst selection = { type: 'product_selection', productId: 'abc123', action: 'select' };\nawait send('user-message', { USER_INPUT: selection }, { userMessage: { content: selection } });\n```\n\nWhen passing an object as `content`:\n\n- The SDK creates a `UIObjectPart` instead of a `UITextPart`\n- The object's `type` field is used as the `typeName` (defaults to `'object'` if not present)\n- This is useful for rich UI interactions like product selections, quick replies, etc.\n\n### Sending with Files\n\nInclude file attachments with messages:\n\n```tsx\nimport type { FileReference } from '@octavus/react';\n\nasync function handleSend(text: string, files?: FileReference[]) {\n await send(\n 'user-message',\n {\n USER_MESSAGE: text,\n FILES: files, // Array of FileReference\n },\n {\n userMessage: {\n content: text,\n files: files, // Shows files in user message bubble\n },\n },\n );\n}\n```\n\nSee [File Uploads](/docs/client-sdk/file-uploads) for complete upload flow.\n\n### Attributing the Sender (Multi-User Chats)\n\nIn conversations shared by several people, pass `sender` so the optimistic bubble shows who sent the message 
immediately:\n\n```tsx\nawait send(\n 'user-message',\n { USER_MESSAGE: text },\n {\n userMessage: { content: text, sender: { id: user.id, name: user.name, image: user.avatarUrl } },\n },\n);\n```\n\nThis `sender` is for instant local display only. For attribution that persists and is visible to other participants, set the authoritative sender server-side on the trigger (see [Server SDK Sessions](/docs/server-sdk/sessions)). The persisted value comes back on `message.sender` from `getMessages()`, so render from `message.sender` and treat the value you passed to `send()` as the optimistic placeholder.\n\n## Rendering Messages\n\n### Basic Rendering\n\n```tsx\nfunction MessageList({ messages }: { messages: UIMessage[] }) {\n return (\n <div className=\"space-y-4\">\n {messages.map((msg) => (\n <MessageBubble key={msg.id} message={msg} />\n ))}\n </div>\n );\n}\n\nfunction MessageBubble({ message }: { message: UIMessage }) {\n const isUser = message.role === 'user';\n\n return (\n <div className={isUser ? 
'text-right' : 'text-left'}>\n <div className=\"inline-block p-3 rounded-lg\">\n {message.parts.map((part, i) => (\n <PartRenderer key={i} part={part} />\n ))}\n </div>\n </div>\n );\n}\n```\n\n### Rendering Parts\n\n```tsx\nimport { isOtherThread, type UIMessagePart } from '@octavus/react';\n\nfunction PartRenderer({ part }: { part: UIMessagePart }) {\n // Check if part belongs to a named thread (e.g., \"summary\")\n if (isOtherThread(part)) {\n return <OtherThreadPart part={part} />;\n }\n\n switch (part.type) {\n case 'text':\n return <TextPart part={part} />;\n\n case 'reasoning':\n return (\n <details className=\"text-gray-500\">\n <summary>Thinking...</summary>\n <pre className=\"text-sm\">{part.text}</pre>\n </details>\n );\n\n case 'tool-call':\n return (\n <div className=\"bg-gray-100 p-2 rounded text-sm\">\n \u{1F527} {part.displayName || part.toolName}\n {part.status === 'done' && ' \u2713'}\n {part.status === 'error' && ` \u2717 ${part.error}`}\n </div>\n );\n\n case 'operation':\n return (\n <div className=\"text-gray-500 text-sm\">\n {part.name}\n {part.status === 'done' && ' \u2713'}\n </div>\n );\n\n case 'source':\n return (\n <div className=\"text-blue-500 text-sm\">\u{1F4CE} {part.title || part.url || part.filename}</div>\n );\n\n case 'file':\n // Render images inline, other files as download links\n if (part.mediaType.startsWith('image/')) {\n return (\n <img\n src={part.url}\n alt={part.filename || 'Generated image'}\n className=\"max-w-full rounded-lg\"\n />\n );\n }\n return (\n <a href={part.url} className=\"text-blue-500 text-sm underline\">\n \u{1F4C4} {part.filename || 'Download file'}\n </a>\n );\n\n case 'object':\n // For structured output, render custom UI based on typeName\n // See Structured Output guide for more details\n return <ObjectPartRenderer part={part} />;\n\n case 'step-start':\n return null;\n\n default:\n return null;\n }\n}\n\nfunction TextPart({ part }: { part: UITextPart }) {\n return (\n <p>\n {part.text}\n 
{part.status === 'streaming' && (\n <span className=\"inline-block w-2 h-4 bg-gray-400 animate-pulse ml-1\" />\n )}\n </p>\n );\n}\n```\n\n## Named Threads\n\nContent from named threads (like \"summary\") is identified by the `thread` property. Use the `isOtherThread` helper:\n\n```tsx\nimport { isOtherThread } from '@octavus/react';\n\nfunction PartRenderer({ part }: { part: UIMessagePart }) {\n if (isOtherThread(part)) {\n // Render differently for named threads\n return (\n <div className=\"bg-amber-50 p-2 rounded border border-amber-200\">\n <span className=\"text-amber-600 text-sm\">\n {part.thread}: {part.type === 'text' && part.text}\n </span>\n </div>\n );\n }\n\n // Regular rendering for main thread\n // ...\n}\n```\n\n## Session Restore\n\nWhen restoring a session, fetch messages from your backend and pass them to the hook:\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport, type UIMessage } from '@octavus/react';\n\ninterface ChatProps {\n sessionId: string;\n initialMessages: UIMessage[];\n}\n\nfunction Chat({ sessionId, initialMessages }: ChatProps) {\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n }),\n [sessionId],\n );\n\n // Pass existing messages to restore the conversation\n const { messages } = useOctavusChat({\n transport,\n initialMessages,\n });\n\n // ...\n}\n```\n\nOn your backend, use `agentSessions.getMessages()` to fetch UI-ready messages:\n\n```typescript\n// Server-side\nconst session = await client.agentSessions.getMessages(sessionId);\n// session.messages is UIMessage[] ready for the client\n```\n\n## Callbacks\n\n```tsx\nuseOctavusChat({\n transport,\n onFinish: () => {\n console.log('Stream completed');\n // Scroll to bottom, play sound, etc.\n },\n onError: (error) => 
{\n console.error('Error:', error);\n toast.error('Failed to get response');\n },\n onResourceUpdate: (name, value) => {\n console.log('Resource updated:', name, value);\n },\n});\n```\n",
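The added "Attributing the Sender" section says to render from `message.sender` and to treat the value passed to `send()` as an optimistic placeholder. A minimal sketch of that precedence follows. The interfaces are local copies of the shapes shown in the new content, and `resolveSender` and `senderLabel` are hypothetical helpers introduced for illustration, not part of `@octavus/react`.

```typescript
// Local copies of the shapes from the 5.0.0 Messages docs (assumed to match the SDK types).
interface UIMessageSender {
  id?: string;
  name?: string;
  image?: string; // Avatar URL
}

interface MessageLike {
  role: 'user' | 'assistant';
  sender?: UIMessageSender;
}

// Hypothetical helper: prefer the sender stored on the message, and fall back
// to the optimistic value only when the message carries none.
function resolveSender(
  message: MessageLike,
  optimistic?: UIMessageSender,
): UIMessageSender | undefined {
  if (message.role !== 'user') return undefined; // sender applies to user messages only
  return message.sender ?? optimistic;
}

// Hypothetical helper: pick a display label, tolerating a missing or blank name.
function senderLabel(sender: UIMessageSender | undefined, fallback = 'Unknown user'): string {
  const name = sender?.name?.trim();
  return name ? name : fallback;
}

const persisted: MessageLike = { role: 'user', sender: { id: 'u1', name: 'Ada' } };
console.log(senderLabel(resolveSender(persisted, { name: 'Local draft' }))); // "Ada"
```

Keeping the fallback in one place means the bubble does not change shape when the server-side value replaces the optimistic one.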
|
|
117
117
|
excerpt: "Messages Messages represent the conversation history. The Client SDK tracks messages automatically and provides structured access to their content through typed parts. Message Structure Message...",
|
|
118
118
|
order: 2
|
|
119
119
|
},
|
|
@@ -597,7 +597,7 @@ See [Streaming Events](/docs/server-sdk/streaming#event-types) for the full list
|
|
|
597
597
|
section: "protocol",
|
|
598
598
|
title: "Skills",
|
|
599
599
|
description: "Using Octavus skills for code execution and specialized capabilities.",
|
|
600
|
-
content: "\n# Skills\n\nSkills are knowledge packages that enable agents to execute code and generate files. Unlike external tools (which you implement in your backend), skills are self-contained packages with documentation and scripts. By default, skills run in isolated sandbox environments, but they can also run directly on the agent's computer.\n\n## Overview\n\nOctavus Skills provide **provider-agnostic** code execution. They work with any LLM provider (Anthropic, OpenAI, Google) by using explicit tool calls and system prompt injection.\n\n### How Skills Work\n\n1. **Skill Definition**: Skills are defined in the protocol's `skills:` section\n2. **Skill Resolution**: Skills are resolved from available sources (see below)\n3. **Execution**: Code runs in an isolated sandbox (default) or on the agent's computer\n4. **File Generation**: Files saved to `/output/` are automatically captured and made available for download (sandbox skills)\n\n### Skill Sources\n\nSkills come from two sources, visible in the Skills tab of your organization:\n\n| Source | Badge in UI | Visibility | Example |\n| ----------- | ----------- | ------------------------------ | ------------------ |\n| **Octavus** | `Octavus` | Available to all organizations | `qr-code` |\n| **Custom** | None | Private to your organization | `my-company-skill` |\n\nWhen you reference a skill in your protocol, Octavus resolves it from your available skills. 
If you create a custom skill with the same name as an Octavus skill, your custom skill takes precedence.\n\n## Defining Skills\n\nDefine skills in the protocol's `skills:` section:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data and generating reports\n```\n\n### Skill Fields\n\n| Field | Required | Description |\n| ------------- | -------- | ------------------------------------------------------------------------------------- |\n| `display` | No | How to show in UI: `hidden`, `name`, `description`, `stream` (default: `description`) |\n| `description` | No | Custom description shown to users (overrides skill's built-in description) |\n| `execution` | No | Where the skill runs: `sandbox` (default) or `device` |\n\n### Display Modes\n\nThe `display` setting on a skill applies to all tools under that skill namespace. See [Tool Display Modes](/docs/protocol/tools#display-modes) for full details on each mode.\n\n| Mode | Behavior |\n| ------------- | -------------------------------------------------------------------------------------------------------------------- |\n| `hidden` | Skill tools run silently, no UI events emitted |\n| `name` | Shows skill name while executing |\n| `description` | Shows description while executing (default). Result not preserved after page refresh. |\n| `stream` | Full visibility - arguments stream progressively, result shown after execution, result preserved after page refresh. |\n\n## Enabling Skills\n\nAfter defining skills in the `skills:` section, specify which skills are available. 
Skills work in both interactive agents and workers.\n\n### Interactive Agents\n\nReference skills in `agent.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [get-user-account]\n skills: [qr-code]\n agentic: true\n```\n\n### Workers and Named Threads\n\nReference skills per-thread in `start-thread.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nsteps:\n Start thread:\n block: start-thread\n thread: worker\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n maxSteps: 10\n```\n\nThis also works for named threads in interactive agents, allowing different threads to have different skills.\n\n## Skill Tools\n\nWhen skills are enabled, the LLM has access to these tools:\n\n| Tool | Purpose | Availability |\n| --------------------- | ----------------------------------------------- | ------------------------------ |\n| `octavus_skill_read` | Read skill documentation (SKILL.md) | All skills |\n| `octavus_skill_list` | List available scripts in a skill | All skills |\n| `octavus_skill_run` | Execute a pre-built script from a skill | All skills |\n| `octavus_skill_setup` | Install a skill on the device for file browsing | Device skills only |\n| `octavus_code_run` | Execute arbitrary Python/Bash code | Sandbox skills (standard) only |\n| `octavus_file_write` | Create files in the sandbox | Sandbox skills (standard) only |\n| `octavus_file_read` | Read files from the sandbox | Sandbox skills (standard) only |\n\nThe LLM learns about available skills through system prompt injection and can use these tools to interact with skills.\n\nSkills that have [secrets](#skill-secrets) configured run in **secure mode**, where only `octavus_skill_read`, `octavus_skill_list`, and `octavus_skill_run` are available. 
See [Skill Secrets](#skill-secrets) below.\n\n## Device Execution\n\nBy default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer (VM or desktop) instead.\n\n```yaml\nskills:\n deploy-tool:\n display: description\n description: Deploy applications to production\n execution: device\n qr-code:\n display: description\n description: Generating QR codes\n # execution defaults to sandbox\n```\n\n### How Device Skills Work\n\nDevice skills are installed on the agent's computer so the agent can browse their files and run their scripts directly. After attaching a skill via integrations, the agent uses `octavus_skill_setup` to install it on the device. Once installed, the agent can:\n\n- Read the skill's documentation with `octavus_skill_read`\n- List available scripts with `octavus_skill_list`\n- Run pre-built scripts with `octavus_skill_run`\n\nThe generic workspace tools (`octavus_code_run`, `octavus_file_write`, `octavus_file_read`) are **not available** for device skills. 
Instead, the agent uses the device's own shell and filesystem MCP servers to interact with files and run commands.\n\n### Sandbox vs Device Skills\n\n| Aspect | Sandbox (default) | Device |\n| ------------------- | ---------------------------------- | ------------------------------------------------------ |\n| **Environment** | Isolated sandbox | Agent's computer (VM or desktop) |\n| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |\n| **File access** | Via `octavus_file_read/write` | Via device filesystem MCP |\n| **Code execution** | Via `octavus_code_run` | Via device shell MCP |\n| **Isolation** | Fully sandboxed | Runs alongside other device processes |\n| **File output** | `/output/` directory auto-captured | Files written to device filesystem |\n\n### When to Use Device Execution\n\nUse `execution: device` when the skill needs to:\n\n- Access the agent's local filesystem or running processes\n- Use tools or CLIs installed on the device\n- Interact with services running on the device\n- Persist files beyond a single execution cycle\n\n## Example: QR Code Generation\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n agentic: true\n\nhandlers:\n user-message:\n Add message:\n block: add-message\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n\n Respond:\n block: next-message\n```\n\nWhen a user asks \"Create a QR code for octavus.ai\", the LLM will:\n\n1. Recognize the task matches the `qr-code` skill\n2. Call `octavus_skill_read` to learn how to use the skill\n3. Execute code (via `octavus_code_run` or `octavus_skill_run`) to generate the QR code\n4. Save the image to `/output/` in the sandbox\n5. The file is automatically captured and made available for download\n\n## File Output\n\nFiles saved to `/output/` in the sandbox are automatically:\n\n1. 
**Captured** after code execution\n2. **Uploaded** to S3 storage\n3. **Made available** via presigned URLs\n4. **Included** in the message as file parts\n\nFiles persist across page refreshes and are stored in the session's message history.\n\n## Skill Format\n\nSkills follow the [Agent Skills](https://agentskills.io) open standard:\n\n- `SKILL.md` - Required skill documentation with YAML frontmatter\n- `scripts/` - Optional executable code (Python/Bash)\n- `references/` - Optional documentation loaded as needed\n- `assets/` - Optional files used in outputs (templates, images)\n\n### SKILL.md Format\n\n````yaml\n---\nname: qr-code\ndescription: >\n Generate QR codes from text, URLs, or data. Use when the user needs to create\n a QR code for any purpose - sharing links, contact information, WiFi credentials,\n or any text data that should be scannable.\nversion: 1.0.0\nlicense: MIT\nauthor: Octavus Team\n---\n\n# QR Code Generator\n\n## Overview\n\nThis skill creates QR codes from text data using Python...\n\n## Quick Start\n\nGenerate a QR code with Python:\n\n```python\nimport qrcode\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n# ... code to generate QR code ...\n````\n\n## Scripts Reference\n\n### scripts/generate.py\n\nMain script for generating QR codes...\n\n````\n\n### Frontmatter Fields\n\n| Field | Required | Description |\n| ------------- | -------- | ------------------------------------------------------ |\n| `name` | Yes | Skill slug (lowercase, hyphens) |\n| `description` | Yes | What the skill does (shown to the LLM) |\n| `version` | No | Semantic version string |\n| `license` | No | License identifier |\n| `author` | No | Skill author |\n| `secrets` | No | Array of secret declarations (enables secure mode) |\n\n## Best Practices\n\n### 1. 
Clear Descriptions\n\nProvide clear, purpose-driven descriptions:\n\n```yaml\nskills:\n # Good - clear purpose\n qr-code:\n description: Generating QR codes for URLs, contact info, or any text data\n\n # Avoid - vague\n utility:\n description: Does stuff\n````\n\n### 2. When to Use Skills vs Tools\n\n| Use Skills When | Use Tools When |\n| ------------------------ | ---------------------------- |\n| Code execution needed | Simple API calls |\n| File generation | Database queries |\n| Complex calculations | External service integration |\n| Data processing | Authentication required |\n| Provider-agnostic needed | Backend-specific logic |\n\n### 3. Skill Selection\n\nDefine all skills available to this agent in the `skills:` section. Then specify which skills are available for the chat thread in `agent.skills`:\n\n```yaml\n# All skills available to this agent (defined once at protocol level)\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data\n pdf-processor:\n display: description\n description: Processing PDFs\n\n# Skills available for this chat thread\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code, data-analysis] # Skills available for this thread\n```\n\n### 4. 
Display Modes\n\nChoose appropriate display modes based on user experience:\n\n```yaml\nskills:\n # Background processing - hide from user\n data-analysis:\n display: hidden\n\n # User-facing generation - show description\n qr-code:\n display: description\n\n # Interactive progress - stream updates\n report-generation:\n display: stream\n```\n\n## Comparison: Skills vs Tools vs Provider Options\n\n| Feature | Octavus Skills | External Tools | Provider Tools/Skills |\n| ------------------ | --------------------------- | ------------------- | --------------------- |\n| **Execution** | Sandbox or agent's computer | Your backend | Provider servers |\n| **Provider** | Any (agnostic) | N/A | Provider-specific |\n| **Code Execution** | Yes | No | Yes (provider tools) |\n| **File Output** | Yes | No | Yes (provider skills) |\n| **Implementation** | Skill packages | Your code | Built-in |\n| **Cost** | Sandbox + LLM API | Your infrastructure | Included in API |\n\n## Uploading Custom Skills\n\nYou can upload custom skills to your organization using the CLI or the platform UI.\n\n### Via CLI (Recommended)\n\nUse [`octavus skills sync`](/docs/server-sdk/cli#octavus-skills-sync-path) to package and upload a skill directory. 
If the skill has a `.env` file, secrets are pushed alongside the bundle:\n\n```bash\noctavus skills sync ./skills/my-skill\n```\n\n### Skill Directory Structure\n\n```\nmy-skill/\n\u251C\u2500\u2500 SKILL.md # Required: Skill documentation with frontmatter\n\u251C\u2500\u2500 scripts/ # Optional: Executable scripts\n\u2502 \u251C\u2500\u2500 run.py\n\u2502 \u2514\u2500\u2500 requirements.txt\n\u251C\u2500\u2500 references/ # Optional: Additional documentation\n\u251C\u2500\u2500 assets/ # Optional: Templates, images\n\u2514\u2500\u2500 .env # Optional: Secrets (not included in bundle)\n```\n\nOnce uploaded, reference the skill by slug in your protocol:\n\n```yaml\nskills:\n my-skill:\n display: description\n description: Custom analysis tool\n\nagent:\n skills: [my-skill]\n```\n\n## On-Demand Skills\n\nOn-demand skills (`onDemandSkills`) also support the `execution` field:\n\n```yaml\nonDemandSkills:\n display: description\n execution: device\n```\n\nWhen `execution: device` is set on the on-demand skills declaration, any skill attached at runtime via integrations runs on the agent's computer instead of in a sandbox.\n\n## Sandbox Timeout\n\nThe default sandbox timeout is 5 minutes (applies to sandbox skills only). You can configure a custom timeout using `sandboxTimeout` in the agent config or on individual `start-thread` blocks:\n\n```yaml\n# Agent-level timeout (applies to main thread)\nagent:\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 1800000 # 30 minutes (in milliseconds)\n```\n\n```yaml\n# Thread-level timeout (overrides agent-level for this thread)\nsteps:\n Start thread:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 3600000 # 1 hour\n```\n\nThread-level `sandboxTimeout` takes priority over agent-level. Maximum: 1 hour (3,600,000 ms).\n\n## Skill Secrets\n\nSkills can declare secrets they need to function. 
When an organization configures those secrets, the skill runs in **secure mode** with additional isolation.\n\n### Declaring Secrets\n\nAdd a `secrets` array to your SKILL.md frontmatter:\n\n```yaml\n---\nname: github\ndescription: >\n Run GitHub CLI (gh) commands to manage repos, issues, PRs, and more.\nsecrets:\n - name: GITHUB_TOKEN\n description: GitHub personal access token with repo access\n required: true\n - name: GITHUB_ORG\n description: Default GitHub organization\n required: false\n---\n```\n\nEach secret declaration has:\n\n| Field | Required | Description |\n| ------------- | -------- | ----------------------------------------------------------- |\n| `name` | Yes | Environment variable name (uppercase, e.g., `GITHUB_TOKEN`) |\n| `description` | No | Explains what this secret is for (shown in the UI) |\n| `required` | No | Whether the secret is required (defaults to `true`) |\n\nSecret names must match the pattern `^[A-Z_][A-Z0-9_]*$` (uppercase letters, digits, and underscores).\n\n### Configuring Secrets\n\nOrganization admins configure secret values through the skill editor in the platform UI. 
Each organization maintains its own independent set of secrets for each skill.\n\nSecrets are encrypted at rest and only decrypted at execution time.\n\n### Secure Mode\n\nWhen a skill has secrets configured for the organization, it automatically runs in **secure mode**:\n\n- The skill gets its own **isolated sandbox** (separate from other skills)\n- Secrets are injected as **environment variables** available to all scripts\n- Only `octavus_skill_read`, `octavus_skill_list`, and `octavus_skill_run` are available - `octavus_code_run`, `octavus_file_write`, and `octavus_file_read` are blocked\n- Scripts receive input as **JSON via stdin** (using the `input` parameter on `octavus_skill_run`) instead of CLI args\n- All output (stdout/stderr) is **automatically redacted** for secret values before being returned to the LLM\n\n### Writing Scripts for Secure Skills\n\nScripts in secure skills read input from stdin as JSON and access secrets from environment variables:\n\n```python\nimport json\nimport os\nimport sys\n\ninput_data = json.load(sys.stdin)\ntoken = os.environ.get('GITHUB_TOKEN')\n\n# Use the token and input_data to perform the task\n```\n\nFor standard skills (without secrets), scripts receive input as CLI arguments. For secure skills, always use stdin JSON.\n\n## Security\n\nSandbox skills run in isolated environments:\n\n- **No network access** (unless explicitly configured)\n- **No persistent storage** (sandbox destroyed after each `next-message` execution)\n- **File output only** via `/output/` directory\n- **Time limits** enforced (5-minute default, configurable via `sandboxTimeout`)\n- **Secret redaction** - output from secure skills is automatically scanned for secret values\n\nDevice skills run on the agent's computer and share its environment. 
They do not have sandbox isolation but benefit from restricted tool access (only slug-bearing tools are available).\n\n## Next Steps\n\n- [Agent Config](/docs/protocol/agent-config) - Configuring skills in agent settings\n- [Provider Options](/docs/protocol/provider-options) - Anthropic's built-in skills\n- [Skills Advanced Guide](/docs/protocol/skills-advanced) - Best practices and advanced patterns\n",
|
|
600
|
+
content: "\n# Skills\n\nSkills are knowledge packages that enable agents to execute code and generate files. Unlike external tools (which you implement in your backend), skills are self-contained packages with documentation and scripts. By default, skills run in isolated sandbox environments, but they can also run directly on the agent's computer.\n\n## Overview\n\nOctavus Skills provide **provider-agnostic** code execution. They work with any LLM provider (Anthropic, OpenAI, Google) by using explicit tool calls and system prompt injection.\n\n### How Skills Work\n\n1. **Skill Definition**: Skills are defined in the protocol's `skills:` section\n2. **Skill Resolution**: Skills are resolved from available sources (see below)\n3. **Execution**: Code runs in an isolated sandbox (default) or on the agent's computer\n4. **File Generation**: Files saved to `/output/` are automatically captured and made available for download (sandbox skills)\n\n### Skill Sources\n\nSkills come from two sources, visible in the Skills tab of your organization:\n\n| Source | Badge in UI | Visibility | Example |\n| ----------- | ----------- | ------------------------------ | ------------------ |\n| **Octavus** | `Octavus` | Available to all organizations | `qr-code` |\n| **Custom** | None | Private to your organization | `my-company-skill` |\n\nWhen you reference a skill in your protocol, Octavus resolves it from your available skills. 
If you create a custom skill with the same name as an Octavus skill, your custom skill takes precedence.\n\n## Defining Skills\n\nDefine skills in the protocol's `skills:` section:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data and generating reports\n```\n\n### Skill Fields\n\n| Field | Required | Description |\n| ------------- | -------- | ------------------------------------------------------------------------------------- |\n| `display` | No | How to show in UI: `hidden`, `name`, `description`, `stream` (default: `description`) |\n| `description` | No | Custom description shown to users (overrides skill's built-in description) |\n| `execution` | No | Where the skill runs: `sandbox` (default) or `device` |\n\n### Display Modes\n\nThe `display` setting on a skill applies to all tools under that skill namespace. See [Tool Display Modes](/docs/protocol/tools#display-modes) for full details on each mode.\n\n| Mode | Behavior |\n| ------------- | -------------------------------------------------------------------------------------------------------------------- |\n| `hidden` | Skill tools run silently, no UI events emitted |\n| `name` | Shows skill name while executing |\n| `description` | Shows description while executing (default). Result not preserved after page refresh. |\n| `stream` | Full visibility - arguments stream progressively, result shown after execution, result preserved after page refresh. |\n\n## Enabling Skills\n\nAfter defining skills in the `skills:` section, specify which skills are available. 
Skills work in both interactive agents and workers.\n\n### Interactive Agents\n\nReference skills in `agent.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [get-user-account]\n skills: [qr-code]\n agentic: true\n```\n\n### Workers and Named Threads\n\nReference skills per-thread in `start-thread.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nsteps:\n Start thread:\n block: start-thread\n thread: worker\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n maxSteps: 10\n```\n\nThis also works for named threads in interactive agents, allowing different threads to have different skills.\n\n## Skill Tools\n\nWhen skills are enabled, the LLM has access to these tools:\n\n| Tool | Purpose | Availability |\n| --------------------- | ----------------------------------------------- | ------------------------------ |\n| `octavus_skill_read` | Read skill documentation (SKILL.md) | All skills |\n| `octavus_skill_list` | List available scripts in a skill | All skills |\n| `octavus_skill_run` | Execute a pre-built script from a skill | All skills |\n| `octavus_skill_setup` | Install a skill on the device for file browsing | Device skills only |\n| `octavus_code_run` | Execute arbitrary Python/Bash code | Sandbox skills (standard) only |\n| `octavus_file_write` | Create files in the sandbox | Sandbox skills (standard) only |\n| `octavus_file_read` | Read files from the sandbox | Sandbox skills (standard) only |\n\nThe LLM learns about available skills through system prompt injection and can use these tools to interact with skills.\n\nSkills that have [secrets](#skill-secrets) configured run in **secure mode**, where only `octavus_skill_read`, `octavus_skill_list`, and `octavus_skill_run` are available. 
See [Skill Secrets](#skill-secrets) below.\n\n## Device Execution\n\nBy default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer instead.\n\n```yaml\nskills:\n deploy-tool:\n display: description\n description: Deploy applications to production\n execution: device\n qr-code:\n display: description\n description: Generating QR codes\n # execution defaults to sandbox\n```\n\n### How Device Skills Work\n\nDevice skills are installed on the agent's computer so the agent can browse their files and run their scripts directly. After attaching a skill via integrations, the agent uses `octavus_skill_setup` to install it on the device. Once installed, the agent can:\n\n- Read the skill's documentation with `octavus_skill_read`\n- List available scripts with `octavus_skill_list`\n- Run pre-built scripts with `octavus_skill_run`\n\nThe generic workspace tools (`octavus_code_run`, `octavus_file_write`, `octavus_file_read`) are **not available** for device skills. 
Instead, the agent uses the device's own shell and filesystem MCP servers to interact with files and run commands.\n\n### Sandbox vs Device Skills\n\n| Aspect | Sandbox (default) | Device |\n| ------------------- | ---------------------------------- | ------------------------------------------------------ |\n| **Environment** | Isolated sandbox | The agent's computer |\n| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |\n| **File access** | Via `octavus_file_read/write` | Via device filesystem MCP |\n| **Code execution** | Via `octavus_code_run` | Via device shell MCP |\n| **Isolation** | Fully sandboxed | Runs alongside other device processes |\n| **File output** | `/output/` directory auto-captured | Files written to device filesystem |\n\n### When to Use Device Execution\n\nUse `execution: device` when the skill needs to:\n\n- Access the agent's local filesystem or running processes\n- Use tools or CLIs installed on the device\n- Interact with services running on the device\n- Persist files beyond a single execution cycle\n\n## Example: QR Code Generation\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n agentic: true\n\nhandlers:\n user-message:\n Add message:\n block: add-message\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n\n Respond:\n block: next-message\n```\n\nWhen a user asks \"Create a QR code for octavus.ai\", the LLM will:\n\n1. Recognize the task matches the `qr-code` skill\n2. Call `octavus_skill_read` to learn how to use the skill\n3. Execute code (via `octavus_code_run` or `octavus_skill_run`) to generate the QR code\n4. Save the image to `/output/` in the sandbox\n5. The file is automatically captured and made available for download\n\n## File Output\n\nFiles saved to `/output/` in the sandbox are automatically:\n\n1. **Captured** after code execution\n2. 
**Uploaded** to S3 storage\n3. **Made available** via presigned URLs\n4. **Included** in the message as file parts\n\nFiles persist across page refreshes and are stored in the session's message history.\n\n## Skill Format\n\nSkills follow the [Agent Skills](https://agentskills.io) open standard:\n\n- `SKILL.md` - Required skill documentation with YAML frontmatter\n- `scripts/` - Optional executable code (Python/Bash)\n- `references/` - Optional documentation loaded as needed\n- `assets/` - Optional files used in outputs (templates, images)\n\n### SKILL.md Format\n\n````yaml\n---\nname: qr-code\ndescription: >\n Generate QR codes from text, URLs, or data. Use when the user needs to create\n a QR code for any purpose - sharing links, contact information, WiFi credentials,\n or any text data that should be scannable.\nversion: 1.0.0\nlicense: MIT\nauthor: Octavus Team\ncategory: Productivity\n---\n\n# QR Code Generator\n\n## Overview\n\nThis skill creates QR codes from text data using Python...\n\n## Quick Start\n\nGenerate a QR code with Python:\n\n```python\nimport qrcode\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n# ... code to generate QR code ...\n````\n\n## Scripts Reference\n\n### scripts/generate.py\n\nMain script for generating QR codes...\n\n````\n\n### Frontmatter Fields\n\n| Field | Required | Description |\n| ------------- | -------- | ------------------------------------------------------ |\n| `name` | Yes | Skill slug (lowercase, hyphens) |\n| `description` | Yes | What the skill does (shown to the LLM) |\n| `version` | No | Semantic version string |\n| `license` | No | License identifier |\n| `author` | No | Skill author |\n| `category` | No | Display category used to group and filter skills in the UI |\n| `secrets` | No | Array of secret declarations (enables secure mode) |\n\n## Best Practices\n\n### 1. 
Clear Descriptions\n\nProvide clear, purpose-driven descriptions:\n\n```yaml\nskills:\n # Good - clear purpose\n qr-code:\n description: Generating QR codes for URLs, contact info, or any text data\n\n # Avoid - vague\n utility:\n description: Does stuff\n````\n\n### 2. When to Use Skills vs Tools\n\n| Use Skills When | Use Tools When |\n| ------------------------ | ---------------------------- |\n| Code execution needed | Simple API calls |\n| File generation | Database queries |\n| Complex calculations | External service integration |\n| Data processing | Authentication required |\n| Provider-agnostic needed | Backend-specific logic |\n\n### 3. Skill Selection\n\nDefine all skills available to this agent in the `skills:` section. Then specify which skills are available for the chat thread in `agent.skills`:\n\n```yaml\n# All skills available to this agent (defined once at protocol level)\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data\n pdf-processor:\n display: description\n description: Processing PDFs\n\n# Skills available for this chat thread\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code, data-analysis] # Skills available for this thread\n```\n\n### 4. 
Display Modes\n\nChoose appropriate display modes based on user experience:\n\n```yaml\nskills:\n # Background processing - hide from user\n data-analysis:\n display: hidden\n\n # User-facing generation - show description\n qr-code:\n display: description\n\n # Interactive progress - stream updates\n report-generation:\n display: stream\n```\n\n## Comparison: Skills vs Tools vs Provider Options\n\n| Feature | Octavus Skills | External Tools | Provider Tools/Skills |\n| ------------------ | --------------------------- | ------------------- | --------------------- |\n| **Execution** | Sandbox or agent's computer | Your backend | Provider servers |\n| **Provider** | Any (agnostic) | N/A | Provider-specific |\n| **Code Execution** | Yes | No | Yes (provider tools) |\n| **File Output** | Yes | No | Yes (provider skills) |\n| **Implementation** | Skill packages | Your code | Built-in |\n| **Cost** | Sandbox + LLM API | Your infrastructure | Included in API |\n\n## Uploading Custom Skills\n\nYou can upload custom skills to your organization using the CLI or the platform UI.\n\n### Via CLI (Recommended)\n\nUse [`octavus skills sync`](/docs/server-sdk/cli#octavus-skills-sync-path) to package and upload a skill directory. 
If the skill has a `.env` file, secrets are pushed alongside the bundle:\n\n```bash\noctavus skills sync ./skills/my-skill\n```\n\n### Skill Directory Structure\n\n```\nmy-skill/\n\u251C\u2500\u2500 SKILL.md # Required: Skill documentation with frontmatter\n\u251C\u2500\u2500 scripts/ # Optional: Executable scripts\n\u2502 \u251C\u2500\u2500 run.py\n\u2502 \u2514\u2500\u2500 requirements.txt\n\u251C\u2500\u2500 references/ # Optional: Additional documentation\n\u251C\u2500\u2500 assets/ # Optional: Templates, images\n\u2514\u2500\u2500 .env # Optional: Secrets (not included in bundle)\n```\n\nOnce uploaded, reference the skill by slug in your protocol:\n\n```yaml\nskills:\n my-skill:\n display: description\n description: Custom analysis tool\n\nagent:\n skills: [my-skill]\n```\n\n## On-Demand Skills\n\nOn-demand skills (`onDemandSkills`) also support the `execution` field:\n\n```yaml\nonDemandSkills:\n display: description\n execution: device\n```\n\nWhen `execution: device` is set on the on-demand skills declaration, any skill attached at runtime via integrations runs on the agent's computer instead of in a sandbox.\n\n## Sandbox Timeout\n\nThe default sandbox timeout is 5 minutes (applies to sandbox skills only). You can configure a custom timeout using `sandboxTimeout` in the agent config or on individual `start-thread` blocks:\n\n```yaml\n# Agent-level timeout (applies to main thread)\nagent:\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 1800000 # 30 minutes (in milliseconds)\n```\n\n```yaml\n# Thread-level timeout (overrides agent-level for this thread)\nsteps:\n Start thread:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 3600000 # 1 hour\n```\n\nThread-level `sandboxTimeout` takes priority over agent-level. Maximum: 1 hour (3,600,000 ms).\n\n## Skill Secrets\n\nSkills can declare secrets they need to function. 
When an organization configures those secrets, the skill runs in **secure mode** with additional isolation.\n\n### Declaring Secrets\n\nAdd a `secrets` array to your SKILL.md frontmatter:\n\n```yaml\n---\nname: github\ndescription: >\n Run GitHub CLI (gh) commands to manage repos, issues, PRs, and more.\nsecrets:\n - name: GITHUB_TOKEN\n description: GitHub personal access token with repo access\n required: true\n - name: GITHUB_ORG\n description: Default GitHub organization\n required: false\n---\n```\n\nEach secret declaration has:\n\n| Field | Required | Description |\n| ------------- | -------- | ----------------------------------------------------------- |\n| `name` | Yes | Environment variable name (uppercase, e.g., `GITHUB_TOKEN`) |\n| `description` | No | Explains what this secret is for (shown in the UI) |\n| `required` | No | Whether the secret is required (defaults to `true`) |\n\nSecret names must match the pattern `^[A-Z_][A-Z0-9_]*$` (uppercase letters, digits, and underscores).\n\n### Configuring Secrets\n\nOrganization admins configure secret values through the skill editor in the platform UI. 
Each organization maintains its own independent set of secrets for each skill.\n\nSecrets are encrypted at rest and only decrypted at execution time.\n\n### Secure Mode\n\nWhen a skill has secrets configured for the organization, it automatically runs in **secure mode**:\n\n- The skill gets its own **isolated sandbox** (separate from other skills)\n- Secrets are injected as **environment variables** available to all scripts\n- Only `octavus_skill_read`, `octavus_skill_list`, and `octavus_skill_run` are available - `octavus_code_run`, `octavus_file_write`, and `octavus_file_read` are blocked\n- Scripts receive input as **JSON via stdin** (using the `input` parameter on `octavus_skill_run`) instead of CLI args\n- All output (stdout/stderr) is **automatically redacted** for secret values before being returned to the LLM\n\n### Writing Scripts for Secure Skills\n\nScripts in secure skills read input from stdin as JSON and access secrets from environment variables:\n\n```python\nimport json\nimport os\nimport sys\n\ninput_data = json.load(sys.stdin)\ntoken = os.environ.get('GITHUB_TOKEN')\n\n# Use the token and input_data to perform the task\n```\n\nFor standard skills (without secrets), scripts receive input as CLI arguments. For secure skills, always use stdin JSON.\n\n## Security\n\nSandbox skills run in isolated environments:\n\n- **No network access** (unless explicitly configured)\n- **No persistent storage** (sandbox destroyed after each `next-message` execution)\n- **File output only** via `/output/` directory\n- **Time limits** enforced (5-minute default, configurable via `sandboxTimeout`)\n- **Secret redaction** - output from secure skills is automatically scanned for secret values\n\nDevice skills run on the agent's computer and share its environment. 
They do not have sandbox isolation but benefit from restricted tool access (only slug-bearing tools are available).\n\n## Next Steps\n\n- [Agent Config](/docs/protocol/agent-config) - Configuring skills in agent settings\n- [Provider Options](/docs/protocol/provider-options) - Anthropic's built-in skills\n- [Skills Advanced Guide](/docs/protocol/skills-advanced) - Best practices and advanced patterns\n",
|
|
601
601
|
excerpt: "Skills Skills are knowledge packages that enable agents to execute code and generate files. Unlike external tools (which you implement in your backend), skills are self-contained packages with...",
|
|
602
602
|
order: 5
|
|
603
603
|
},
|
|
@@ -615,8 +615,8 @@ See [Streaming Events](/docs/server-sdk/streaming#event-types) for the full list
|
|
|
615
615
|
section: "protocol",
|
|
616
616
|
title: "Agent Config",
|
|
617
617
|
description: "Configuring the agent model and behavior.",
|
|
618
|
-
content: '\n# Agent Config\n\nThe `agent` section configures the LLM model, system prompt, tools, and behavior.\n\n## Basic Configuration\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system # References prompts/system.md\n tools: [get-user-account] # Available tools\n mcpServers: [figma, browser] # MCP server connections\n skills: [qr-code] # Available skills\n references: [api-guidelines] # On-demand context documents\n```\n\n## Configuration Options\n\n| Field | Required | Description |\n| ---------------- | -------- | ---------------------------------------------------------------------------------------- |\n| `model` | Yes | Model identifier or variable reference |\n| `backupModel` | No | Backup model for automatic failover on provider errors |\n| `system` | Yes | System prompt filename (without .md) |\n| `input` | No | Variables to pass to the system prompt |\n| `tools` | No | List of tools the LLM can call |\n| `mcpServers` | No | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers)) |\n| `skills` | No | List of Octavus skills the LLM can use |\n| `references` | No | List of references the LLM can fetch on demand |\n| `sandboxTimeout` | No | Skill sandbox timeout in ms (default: 5 min, max: 1 hour) |\n| `imageModel` | No | Image generation model (enables agentic image generation) |\n| `webSearch` | No | Enable built-in web search tool (provider-agnostic) |\n| `agentic` | No | Allow multiple tool call cycles |\n| `maxSteps` | No | Maximum agentic steps (default: 10) - literal or variable reference |\n| `temperature` | No | Model temperature (0-2), `"off"`, or a variable reference |\n| `thinking` | No | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |\n| `cache` | No | Prompt caching mode: `auto` (default), `extended`, or `off` |\n| `anthropic` | No | Anthropic-specific options (tools, skills) |\n\n## Models\n\nSpecify models in `provider/model-id` format. 
Any model supported by the provider\'s SDK will work.\n\n### Supported Providers\n\n| Provider | Format | Examples |\n| --------- | ---------------------- | -------------------------------------------------------------------------------------------------- |\n| Anthropic | `anthropic/{model-id}` | `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-sonnet-4-5`, `claude-haiku-4-5` |\n| Google | `google/{model-id}` | `gemini-3-pro-preview`, `gemini-3-flash-preview`, `gemini-2.5-flash` |\n| OpenAI | `openai/{model-id}` | `gpt-5`, `gpt-4o`, `o4-mini`, `o3`, `o3-mini`, `o1` |\n\n### Examples\n\n```yaml\n# Anthropic Claude 4.5\nagent:\n model: anthropic/claude-sonnet-4-5\n\n# Google Gemini 3\nagent:\n model: google/gemini-3-flash-preview\n\n# OpenAI GPT-5\nagent:\n model: openai/gpt-5\n\n# OpenAI reasoning models\nagent:\n model: openai/o3-mini\n```\n\n> **Note**: Model IDs are passed directly to the provider SDK. Check the provider\'s documentation for the latest available models.\n\n### Dynamic Model Selection\n\nThe model field can also reference an input variable, allowing consumers to choose the model when creating a session:\n\n```yaml\ninput:\n MODEL:\n type: string\n description: The LLM model to use\n\nagent:\n model: MODEL # Resolved from session input\n system: system\n```\n\nWhen creating a session, pass the model:\n\n```typescript\nconst sessionId = await client.agentSessions.create(\'my-agent\', {\n MODEL: \'anthropic/claude-sonnet-4-5\',\n});\n```\n\nThis enables:\n\n- **Multi-provider support** - Same agent works with different providers\n- **A/B testing** - Test different models without protocol changes\n- **User preferences** - Let users choose their preferred model\n\nThe model value is validated at runtime to ensure it\'s in the correct `provider/model-id` format.\n\n> **Note**: When using dynamic models, provider-specific options (like `anthropic:`) may not apply if the model resolves to a different provider.\n\n## Backup 
Model\n\nConfigure a fallback model that activates automatically when the primary model encounters a transient provider error (rate limits, outages, timeouts):\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: openai/gpt-4o\n system: system\n```\n\nWhen a provider error occurs, the system retries once with the backup model. If the backup also fails, the original error is returned.\n\n**Key behaviors:**\n\n- Only transient provider errors trigger fallback - authentication and validation errors are not retried\n- Provider-specific options (like `anthropic:`) are only forwarded to the backup model if it uses the same provider\n- For streaming responses, fallback only occurs if no content has been sent to the client yet\n\nLike `model`, `backupModel` supports variable references:\n\n```yaml\ninput:\n BACKUP_MODEL:\n type: string\n description: Fallback model for provider errors\n\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: BACKUP_MODEL\n system: system\n```\n\n> **Tip**: Use a different provider for your backup model (e.g., primary on Anthropic, backup on OpenAI) to maximize resilience against single-provider outages.\n\n## System Prompt\n\nThe system prompt sets the agent\'s persona and instructions. The `input` field controls which variables are available to the prompt - only variables listed in `input` are interpolated.\n\n```yaml\nagent:\n system: system # Uses prompts/system.md\n input:\n - COMPANY_NAME\n - PRODUCT_NAME\n```\n\nVariables in `input` can come from `protocol.input`, `protocol.resources`, or `protocol.variables`.\n\n### Input Mapping Formats\n\n```yaml\n# Array format (same name)\ninput:\n - COMPANY_NAME\n - PRODUCT_NAME\n\n# Array format (rename)\ninput:\n - CONTEXT: CONVERSATION_SUMMARY # Prompt sees CONTEXT, value comes from CONVERSATION_SUMMARY\n\n# Object format (rename)\ninput:\n CONTEXT: CONVERSATION_SUMMARY\n```\n\nThe left side (label) is what the prompt sees. 
The right side (source) is where the value comes from.\n\n### Example\n\n`prompts/system.md`:\n\n```markdown\nYou are a friendly support agent for {{COMPANY_NAME}}.\n\n## Your Role\n\nHelp users with questions about {{PRODUCT_NAME}}.\n\n## Guidelines\n\n- Be helpful and professional\n- If you can\'t help, offer to escalate\n- Never share internal information\n```\n\n## Agentic Mode\n\nEnable multi-step tool calling:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [get-user-account, search-docs, create-ticket]\n agentic: true # LLM can call multiple tools\n maxSteps: 10 # Limit cycles to prevent runaway\n```\n\n**How it works:**\n\n1. LLM receives user message\n2. LLM decides to call a tool\n3. Tool executes, result returned to LLM\n4. LLM decides if more tools needed\n5. Repeat until LLM responds or maxSteps reached\n\n## Extended Thinking\n\nEnable extended reasoning for complex tasks:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n thinking: medium # low | medium | high | max\n```\n\n| Level | Use Case |\n| -------- | ---------------------------------- |\n| `low` | Simple reasoning |\n| `medium` | Moderate complexity |\n| `high` | Complex analysis |\n| `max` | Maximum reasoning budget available |\n\nThinking content streams to the UI and can be displayed to users.\n\n### How levels are applied\n\nEach provider translates `thinking` into its own reasoning controls:\n\n| Provider | Level mapping |\n| -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |\n| Anthropic 4.6+ (`claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`) | Adaptive thinking - the model decides how much to reason, guided by `effort: low / medium / high / max` |\n| Anthropic older (4.5 and earlier) | Fixed token budgets: `low` ~5,000, `medium` ~10,000, `high` ~20,000, `max` ~40,000 |\n| OpenAI (GPT-5.x, o-series) | 
`reasoningEffort: low / medium / high` (`max` maps to `high`) |\n| Google (Gemini 3.x) | `thinkingLevel: low / high` (`medium` rounds up to `high`) |\n| Google (Gemini 1.x / 2.x) | Token budgets: `low` 1,024, `medium` 8,192, `high` 24,576, `max` 65,536 |\n| OpenRouter | Unified `reasoning.max_tokens` (translated upstream) |\n| Vercel AI Gateway | Forwards the underlying provider\'s options |\n\n## Prompt Caching\n\nProviders charge less for tokens served from their prompt cache (often 10% of the uncached rate). Octavus exposes a single `cache` field that picks the right retention policy per provider, so the stable prefix of your agent - tools, system prompt, and historical messages - gets billed at the cache-read rate on repeat requests.\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n cache: auto # auto (default) | extended | off\n```\n\n| Mode | Behavior | When to use |\n| ---------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |\n| `auto` | Short-TTL caching. Default when omitted. | Most agents. Free on all supported providers and pays for itself within the same session. |\n| `extended` | Long-TTL caching. Trades a higher cache-write cost for much longer residency. | Agents triggered with gaps (daily reports, on-call assistants) where the prefix is reused across hours. |\n| `off` | No opt-in caching emitted. | When you explicitly want to skip caching - e.g. debugging a non-deterministic prefix. 
|\n\n### Per-provider behavior\n\nThe `cache` field is provider-agnostic at the protocol level - each provider translates it into its own cache retention policy:\n\n| Provider | `auto` TTL | `extended` TTL |\n| --------- | ------------------------- | -------------- |\n| Anthropic | 5 minutes | 1 hour |\n| OpenAI | in-memory (~5\u201310 minutes) | 24 hours |\n| Google | Implicit (Gemini 2.5+) | Implicit |\n\nOn `off`, Octavus emits no explicit cache options. Providers that auto-cache (OpenAI on prefixes \u2265 1,024 tokens, Gemini 2.5+) may still cache transparently - `off` just disables Octavus\'s opt-in behavior.\n\n### Threads don\'t inherit\n\nNamed threads (created with `start-thread`) read their own `cache` field independently - they **do not** inherit the agent\'s cache value:\n\n```yaml\nagent:\n cache: extended # 1-hour TTL on the main thread\n\nhandlers:\n summarize:\n Start summary:\n block: start-thread\n thread: summary\n # No cache field \u2192 defaults to \'auto\' (5-minute TTL), NOT \'extended\'\n system: summary-system\n```\n\nThis is intentional: named threads are often used for short, one-shot work (summarization, classification) where the long TTL would be wasted. Set `cache` explicitly on `start-thread` when you do want it.\n\n### Cost trade-offs\n\n- **Cache reads** are always much cheaper than uncached input on any provider - caching is effectively free if your prefix is stable.\n- **Cache writes** on Anthropic cost ~1.25\xD7 input for `auto` and 2\xD7 input for `extended`. 
OpenAI and Google don\'t charge separately for cache writes.\n- Use `extended` only when the same prefix is genuinely reused across sessions that span hours; otherwise the higher write cost dominates the savings.\n\n## Skills\n\nEnable Octavus skills for code execution and file generation:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code] # Enable skills\n agentic: true\n```\n\nSkills provide provider-agnostic code execution in isolated sandboxes. When enabled, the LLM can execute Python/Bash code, run skill scripts, and generate files.\n\nSee [Skills](/docs/protocol/skills) for full documentation.\n\n## References\n\nEnable on-demand context loading via reference documents:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n references: [api-guidelines, error-codes]\n agentic: true\n```\n\nReferences are markdown files stored in the agent\'s `references/` directory. When enabled, the LLM can list available references and read their content using `octavus_reference_list` and `octavus_reference_read` tools.\n\nSee [References](/docs/protocol/references) for full documentation.\n\n## Image Generation\n\nEnable the LLM to generate images autonomously:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n imageModel: google/gemini-2.5-flash-image\n agentic: true\n```\n\nWhen `imageModel` is configured, the `octavus_generate_image` tool becomes available. The LLM can decide when to generate images based on user requests. 
The tool supports both text-to-image generation and image editing/transformation using reference images.\n\n### Supported Image Providers\n\n| Provider | Model Types | Examples |\n| -------- | --------------------------------------- | --------------------------------------------------------- |\n| OpenAI | Dedicated image models | `gpt-image-1` |\n| Google | Gemini native (contains "image") | `gemini-2.5-flash-image`, `gemini-3-flash-image-generate` |\n| Google | Imagen dedicated (starts with "imagen") | `imagen-4.0-generate-001` |\n\n> **Note**: Google has two image generation approaches. Gemini "native" models (containing "image" in the ID) generate images using the language model API with `responseModalities`. Imagen models (starting with "imagen") use a dedicated image generation API.\n\n### Image Sizes\n\nThe tool supports three image sizes:\n\n- `1024x1024` (default) - Square\n- `1792x1024` - Landscape (16:9)\n- `1024x1792` - Portrait (9:16)\n\n### Image Editing with Reference Images\n\nBoth the agentic tool and the `generate-image` block support reference images for editing and transformation. 
When reference images are provided, the prompt describes how to modify or use those images.\n\n| Provider | Models | Reference Image Support |\n| -------- | -------------------------------- | ----------------------- |\n| OpenAI | `gpt-image-1` | Yes |\n| Google | Gemini native (`gemini-*-image`) | Yes |\n| Google | Imagen (`imagen-*`) | No |\n\n### Agentic vs Deterministic\n\nUse `imageModel` in agent config when:\n\n- The LLM should decide when to generate or edit images\n- Users ask for images in natural language\n\nUse `generate-image` block (see [Handlers](/docs/protocol/handlers#generate-image)) when:\n\n- You want explicit control over image generation or editing\n- Building prompt engineering pipelines\n- Images are generated at specific handler steps\n\n## Web Search\n\nEnable the LLM to search the web for current information:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n webSearch: true\n agentic: true\n```\n\nWhen `webSearch` is enabled, the `octavus_web_search` tool becomes available. The LLM can decide when to search the web based on the conversation. Search results include source URLs that are emitted as citations in the UI.\n\nThis is a **provider-agnostic** built-in tool - it works with any LLM provider (Anthropic, Google, OpenAI, etc.). For Anthropic\'s own web search implementation, see [Provider Options](/docs/protocol/provider-options).\n\nUse cases:\n\n- Current events and real-time data\n- Fact verification and documentation lookups\n- Any information that may have changed since the model\'s training\n\n## TODO List\n\nEnable the LLM to maintain a structured task list while it works:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n todoList: true\n agentic: true\n```\n\nWhen `todoList` is enabled, the `octavus_todo_write` tool becomes available. 
The LLM creates and updates a list of items - each with `id`, `content`, and `status` (`pending`, `in_progress`, `completed`, `cancelled`) - and the platform emits a `todo-update` stream event with the resolved snapshot. The Client SDK accumulates updates into a single `UITodoPart` per assistant message, so consumers render an evolving "Plan" card without managing state themselves.\n\nThe list persists across messages: the LLM can use `merge=true` to update items by id (sending only the changed fields), or `merge=false` to replace the list entirely.\n\nUse cases:\n\n- Multi-step tasks where the user benefits from seeing progress\n- Long-running agentic loops that should communicate intent\n- Workflows where the agent plans before acting\n\n## Temperature\n\nControl response randomness:\n\n```yaml\nagent:\n model: openai/gpt-4o\n temperature: 0.7 # 0 = deterministic, 2 = creative\n```\n\n**Guidelines:**\n\n- `0 - 0.3`: Factual, consistent responses\n- `0.4 - 0.7`: Balanced (good default)\n- `0.8 - 1.2`: Creative, varied responses\n- `> 1.2`: Very creative (may be inconsistent)\n\n## Dynamic Configuration\n\nLike `model`, the `temperature`, `thinking`, and `maxSteps` fields can also reference an input variable. 
Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:\n\n```yaml\ninput:\n TEMPERATURE:\n type: number\n description: Override temperature (0-2)\n optional: true\n THINKING:\n type: string\n description: Override thinking effort (low/medium/high/max, or "off")\n optional: true\n MAX_STEPS:\n type: integer\n description: Override max agentic steps\n optional: true\n\nagent:\n model: anthropic/claude-sonnet-4-5\n temperature: TEMPERATURE\n thinking: THINKING\n maxSteps: MAX_STEPS\n system: system\n```\n\nWhen creating a session, pass the values in their natural type:\n\n```typescript\nconst sessionId = await client.agentSessions.create(\'my-agent\', {\n TEMPERATURE: 0.7,\n THINKING: \'medium\',\n MAX_STEPS: 5,\n});\n```\n\n### Accepted values\n\nThe resolver accepts the natural type for each field, plus a string fallback so consumers can pass values from form inputs without coercing first.\n\n| Field | Suggested input type | Value at session creation |\n| ------------- | ------------------------------------------ | -------------------------------------------------- |\n| `temperature` | `number` (or `string` for `"off"` support) | A number `0`-`2`, a numeric string, or `"off"` |\n| `thinking` | `string` | `"low"`, `"medium"`, `"high"`, `"max"`, or `"off"` |\n| `maxSteps` | `integer` (or `string`) | A positive integer or a positive integer string |\n\nThe protocol\'s `input:` declaration enforces what the consumer can pass. Pick `type: number` / `type: integer` if you want native numeric overrides; pick `type: string` (or `type: unknown`) if you also need to pass the `"off"` sentinel for `temperature`.\n\n### Explicit "off" vs not set\n\n`temperature` and `thinking` accept an explicit `"off"` value to disable the field at session creation. 
This is different from omitting the variable:\n\n- **Variable not provided** -> the field is unset; the provider uses its default behavior\n- **Variable provided as `"off"`** -> the field is explicitly disabled (no temperature emitted, reasoning disabled)\n\nThe distinction matters because `temperature` and `thinking` are mutually exclusive at the provider level - several providers ignore temperature when reasoning is enabled. Use `"off"` to opt one out so the other takes effect.\n\n### Validation\n\nVariable references are caught at protocol validation time. If `temperature: TEMPERATURE` is declared but `TEMPERATURE` is missing from `input:` or `variables:`, the validator surfaces the error in the dashboard before the agent runs.\n\n## Provider Options\n\nEnable provider-specific features like Anthropic\'s built-in tools and skills:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n anthropic:\n tools:\n web-search:\n display: description\n description: Searching the web\n skills:\n pdf:\n type: anthropic\n description: Processing PDF\n```\n\nProvider options are validated against the model - using `anthropic:` with a non-Anthropic model will fail validation.\n\nSee [Provider Options](/docs/protocol/provider-options) for full documentation.\n\n## Thread-Specific Config\n\nOverride config for named threads:\n\n```yaml\nhandlers:\n request-human:\n Start summary thread:\n block: start-thread\n thread: summary\n model: anthropic/claude-sonnet-4-5 # Different model\n backupModel: openai/gpt-4o # Failover model\n thinking: low # Different thinking\n cache: off # Different cache mode (does not inherit from agent)\n maxSteps: 1 # Limit tool calls\n system: escalation-summary # Different prompt\n mcpServers: [figma, browser] # Thread-specific MCP servers\n skills: [data-analysis] # Thread-specific skills\n references: [escalation-policy] # Thread-specific references\n imageModel: google/gemini-2.5-flash-image # Thread-specific image model\n webSearch: true # 
Thread-specific web search\n todoList: true # Thread-specific task list\n```\n\nEach thread can have its own model, backup model, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol\'s `skills:` section. References must exist in the agent\'s `references/` directory. Workers use this same pattern since they don\'t have a global `agent:` section.\n\n## Full Example\n\n```yaml\ninput:\n COMPANY_NAME: { type: string }\n PRODUCT_NAME: { type: string }\n USER_ID: { type: string, optional: true }\n\nresources:\n CONVERSATION_SUMMARY:\n type: string\n default: \'\'\n\ntools:\n get-user-account:\n description: Look up user account\n parameters:\n userId: { type: string }\n\n search-docs:\n description: Search help documentation\n parameters:\n query: { type: string }\n\n create-support-ticket:\n description: Create a support ticket\n parameters:\n summary: { type: string }\n priority: { type: string } # low, medium, high\n\nmcpServers:\n figma:\n description: Figma design tool integration\n source: remote\n display: description\n\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: openai/gpt-4o\n system: system\n input:\n - COMPANY_NAME\n - PRODUCT_NAME\n tools:\n - get-user-account\n - search-docs\n - create-support-ticket\n mcpServers: [figma] # MCP server connections\n skills: [qr-code] # Octavus skills\n references: [support-policies] # On-demand context\n webSearch: true # Built-in web search\n todoList: true # Structured task tracking\n agentic: true\n maxSteps: 10\n thinking: medium\n # Anthropic-specific options\n anthropic:\n tools:\n web-search:\n display: description\n description: Searching the web\n skills:\n pdf:\n type: anthropic\n description: Processing PDF\n\ntriggers:\n user-message:\n input:\n USER_MESSAGE: { type: string }\n\nhandlers:\n user-message:\n Add message:\n block: 
add-message\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n display: hidden\n\n Respond:\n block: next-message\n```\n',
|
|
619
|
-
excerpt: "Agent Config The section configures the LLM model, system prompt, tools, and behavior. Basic Configuration Configuration Options | Field
|
|
618
|
+
content: '\n# Agent Config\n\nThe `agent` section configures the LLM model, system prompt, tools, and behavior.\n\n## Basic Configuration\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system # References prompts/system.md\n tools: [get-user-account] # Available tools\n mcpServers: [figma, browser] # MCP server connections\n skills: [qr-code] # Available skills\n references: [api-guidelines] # On-demand context documents\n```\n\n## Configuration Options\n\n| Field | Required | Description |\n| --------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------ |\n| `model` | Yes | Model identifier or variable reference |\n| `backupModel` | No | Backup model for automatic failover on provider errors |\n| `system` | Yes | System prompt filename (without .md) |\n| `input` | No | Variables to pass to the system prompt |\n| `tools` | No | List of tools the LLM can call |\n| `mcpServers` | No | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers)) |\n| `skills` | No | List of Octavus skills the LLM can use |\n| `references` | No | List of references the LLM can fetch on demand |\n| `sandboxTimeout` | No | Skill sandbox timeout in ms (default: 5 min, max: 1 hour) |\n| `imageModel` | No | Image generation model (enables agentic image generation) |\n| `webSearch` | No | Enable built-in web search tool (provider-agnostic) |\n| `agentic` | No | Allow multiple tool call cycles |\n| `maxSteps` | No | Maximum agentic steps (default: 10) - literal or variable reference |\n| `temperature` | No | Model temperature (0-2), `"off"`, or a variable reference |\n| `thinking` | No | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |\n| `speed` | No | Inference speed for supported Opus models: `fast`/`standard` (see [Fast Mode](/docs/protocol/fast-mode)) |\n| `cache` | No | Prompt caching mode: `auto` (default), 
`extended`, or `off` |\n| `maxToolOutputTokens` | No | Cap a single tool result at this many tokens in the model view (head+tail preview + note). Omit to leave tool output unbounded |\n| `contextManagement` | No | Automatic context-window compaction (see [Context Management](/docs/protocol/context-management)) |\n| `anthropic` | No | Anthropic-specific options (tools, skills) |\n\n## Models\n\nSpecify models in `provider/model-id` format. Any model supported by the provider\'s SDK will work.\n\n### Supported Providers\n\n| Provider | Format | Examples |\n| --------- | ---------------------- | -------------------------------------------------------------------------------------------------- |\n| Anthropic | `anthropic/{model-id}` | `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-sonnet-4-5`, `claude-haiku-4-5` |\n| Google | `google/{model-id}` | `gemini-3.5-flash`, `gemini-3-flash-preview`, `gemini-2.5-flash` |\n| OpenAI | `openai/{model-id}` | `gpt-5`, `gpt-4o`, `o4-mini`, `o3`, `o3-mini`, `o1` |\n\n### Examples\n\n```yaml\n# Anthropic Claude 4.5\nagent:\n model: anthropic/claude-sonnet-4-5\n\n# Google Gemini 3\nagent:\n model: google/gemini-3-flash-preview\n\n# OpenAI GPT-5\nagent:\n model: openai/gpt-5\n\n# OpenAI reasoning models\nagent:\n model: openai/o3-mini\n```\n\n> **Note**: Model IDs are passed directly to the provider SDK. 
Check the provider\'s documentation for the latest available models.\n\n### Dynamic Model Selection\n\nThe model field can also reference an input variable, allowing consumers to choose the model when creating a session:\n\n```yaml\ninput:\n MODEL:\n type: string\n description: The LLM model to use\n\nagent:\n model: MODEL # Resolved from session input\n system: system\n```\n\nWhen creating a session, pass the model:\n\n```typescript\nconst sessionId = await client.agentSessions.create(\'my-agent\', {\n MODEL: \'anthropic/claude-sonnet-4-5\',\n});\n```\n\nThis enables:\n\n- **Multi-provider support** - Same agent works with different providers\n- **A/B testing** - Test different models without protocol changes\n- **User preferences** - Let users choose their preferred model\n\nThe model value is validated at runtime to ensure it\'s in the correct `provider/model-id` format.\n\n> **Note**: When using dynamic models, provider-specific options (like `anthropic:`) may not apply if the model resolves to a different provider.\n\n## Backup Model\n\nConfigure a fallback model that activates automatically when the primary model encounters a transient provider error (rate limits, outages, timeouts):\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: openai/gpt-4o\n system: system\n```\n\nWhen a provider error occurs, the system retries once with the backup model. 
If the backup also fails, the original error is returned.\n\n**Key behaviors:**\n\n- Only transient provider errors trigger fallback - authentication and validation errors are not retried\n- Provider-specific options (like `anthropic:`) are only forwarded to the backup model if it uses the same provider\n- For streaming responses, fallback only occurs if no content has been sent to the client yet\n\nLike `model`, `backupModel` supports variable references:\n\n```yaml\ninput:\n BACKUP_MODEL:\n type: string\n description: Fallback model for provider errors\n\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: BACKUP_MODEL\n system: system\n```\n\n> **Tip**: Use a different provider for your backup model (e.g., primary on Anthropic, backup on OpenAI) to maximize resilience against single-provider outages.\n\n## System Prompt\n\nThe system prompt sets the agent\'s persona and instructions. The `input` field controls which variables are available to the prompt - only variables listed in `input` are interpolated.\n\n```yaml\nagent:\n system: system # Uses prompts/system.md\n input:\n - COMPANY_NAME\n - PRODUCT_NAME\n```\n\nVariables in `input` can come from `protocol.input`, `protocol.resources`, or `protocol.variables`.\n\n### Input Mapping Formats\n\n```yaml\n# Array format (same name)\ninput:\n - COMPANY_NAME\n - PRODUCT_NAME\n\n# Array format (rename)\ninput:\n - CONTEXT: CONVERSATION_SUMMARY # Prompt sees CONTEXT, value comes from CONVERSATION_SUMMARY\n\n# Object format (rename)\ninput:\n CONTEXT: CONVERSATION_SUMMARY\n```\n\nThe left side (label) is what the prompt sees. 
The right side (source) is where the value comes from.\n\n### Example\n\n`prompts/system.md`:\n\n```markdown\nYou are a friendly support agent for {{COMPANY_NAME}}.\n\n## Your Role\n\nHelp users with questions about {{PRODUCT_NAME}}.\n\n## Guidelines\n\n- Be helpful and professional\n- If you can\'t help, offer to escalate\n- Never share internal information\n```\n\n## Agentic Mode\n\nEnable multi-step tool calling:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [get-user-account, search-docs, create-ticket]\n agentic: true # LLM can call multiple tools\n maxSteps: 10 # Limit cycles to prevent runaway\n```\n\n**How it works:**\n\n1. LLM receives user message\n2. LLM decides to call a tool\n3. Tool executes, result returned to LLM\n4. LLM decides if more tools needed\n5. Repeat until LLM responds or maxSteps reached\n\n## Extended Thinking\n\nEnable extended reasoning for complex tasks:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n thinking: medium # low | medium | high | max\n```\n\n| Level | Use Case |\n| -------- | ---------------------------------- |\n| `low` | Simple reasoning |\n| `medium` | Moderate complexity |\n| `high` | Complex analysis |\n| `max` | Maximum reasoning budget available |\n\nThinking content streams to the UI and can be displayed to users.\n\n### How levels are applied\n\nEach provider translates `thinking` into its own reasoning controls:\n\n| Provider | Level mapping |\n| -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |\n| Anthropic 4.6+ (`claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`) | Adaptive thinking - the model decides how much to reason, guided by `effort: low / medium / high / max` |\n| Anthropic older (4.5 and earlier) | Fixed token budgets: `low` ~5,000, `medium` ~10,000, `high` ~20,000, `max` ~40,000 |\n| OpenAI (GPT-5.x, o-series) | 
`reasoningEffort: low / medium / high` (`max` maps to `high`) |\n| Google (Gemini 3.x) | `thinkingLevel: low / high` (`medium` rounds up to `high`) |\n| Google (Gemini 1.x / 2.x) | Token budgets: `low` 1,024, `medium` 8,192, `high` 24,576, `max` 65,536 |\n| OpenRouter | Unified `reasoning.max_tokens` (translated upstream) |\n| Vercel AI Gateway | Forwards the underlying provider\'s options |\n\n## Prompt Caching\n\nProviders charge less for tokens served from their prompt cache (often 10% of the uncached rate). Octavus exposes a single `cache` field that picks the right retention policy per provider, so the stable prefix of your agent - tools, system prompt, and historical messages - gets billed at the cache-read rate on repeat requests.\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n cache: auto # auto (default) | extended | off\n```\n\n| Mode | Behavior | When to use |\n| ---------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |\n| `auto` | Short-TTL caching. Default when omitted. | Most agents. Free on all supported providers and pays for itself within the same session. |\n| `extended` | Long-TTL caching. Trades a higher cache-write cost for much longer residency. | Agents triggered with gaps (daily reports, on-call assistants) where the prefix is reused across hours. |\n| `off` | No opt-in caching emitted. | When you explicitly want to skip caching - e.g. debugging a non-deterministic prefix. 
|\n\n### Per-provider behavior\n\nThe `cache` field is provider-agnostic at the protocol level - each provider translates it into its own cache retention policy:\n\n| Provider | `auto` TTL | `extended` TTL |\n| --------- | ------------------------- | -------------- |\n| Anthropic | 5 minutes | 1 hour |\n| OpenAI | in-memory (~5\u201310 minutes) | 24 hours |\n| Google | Implicit (Gemini 2.5+) | Implicit |\n\nOn `off`, Octavus emits no explicit cache options. Providers that auto-cache (OpenAI on prefixes \u2265 1,024 tokens, Gemini 2.5+) may still cache transparently - `off` just disables Octavus\'s opt-in behavior.\n\n### Threads don\'t inherit\n\nNamed threads (created with `start-thread`) read their own `cache` field independently - they **do not** inherit the agent\'s cache value:\n\n```yaml\nagent:\n cache: extended # 1-hour TTL on the main thread\n\nhandlers:\n summarize:\n Start summary:\n block: start-thread\n thread: summary\n # No cache field \u2192 defaults to \'auto\' (5-minute TTL), NOT \'extended\'\n system: summary-system\n```\n\nThis is intentional: named threads are often used for short, one-shot work (summarization, classification) where the long TTL would be wasted. Set `cache` explicitly on `start-thread` when you do want it.\n\n### Cost trade-offs\n\n- **Cache reads** are always much cheaper than uncached input on any provider - caching is effectively free if your prefix is stable.\n- **Cache writes** on Anthropic cost ~1.25\xD7 input for `auto` and 2\xD7 input for `extended`. 
OpenAI and Google don\'t charge separately for cache writes.\n- Use `extended` only when the same prefix is genuinely reused across sessions that span hours; otherwise the higher write cost dominates the savings.\n\n## Skills\n\nEnable Octavus skills for code execution and file generation:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code] # Enable skills\n agentic: true\n```\n\nSkills provide provider-agnostic code execution in isolated sandboxes. When enabled, the LLM can execute Python/Bash code, run skill scripts, and generate files.\n\nSee [Skills](/docs/protocol/skills) for full documentation.\n\n## References\n\nEnable on-demand context loading via reference documents:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n references: [api-guidelines, error-codes]\n agentic: true\n```\n\nReferences are markdown files stored in the agent\'s `references/` directory. When enabled, the LLM can list available references and read their content using `octavus_reference_list` and `octavus_reference_read` tools.\n\nSee [References](/docs/protocol/references) for full documentation.\n\n## Image Generation\n\nEnable the LLM to generate images autonomously:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n imageModel: google/gemini-2.5-flash-image\n agentic: true\n```\n\nWhen `imageModel` is configured, the `octavus_generate_image` tool becomes available. The LLM can decide when to generate images based on user requests. 
The tool supports both text-to-image generation and image editing/transformation using reference images.\n\n### Supported Image Providers\n\n| Provider | Model Types | Examples |\n| -------- | --------------------------------------- | --------------------------------------------------------- |\n| OpenAI | Dedicated image models | `gpt-image-1` |\n| Google | Gemini native (contains "image") | `gemini-2.5-flash-image`, `gemini-3-flash-image-generate` |\n| Google | Imagen dedicated (starts with "imagen") | `imagen-4.0-generate-001` |\n\n> **Note**: Google has two image generation approaches. Gemini "native" models (containing "image" in the ID) generate images using the language model API with `responseModalities`. Imagen models (starting with "imagen") use a dedicated image generation API.\n\n### Image Sizes\n\nThe tool supports three image sizes:\n\n- `1024x1024` (default) - Square\n- `1792x1024` - Landscape (16:9)\n- `1024x1792` - Portrait (9:16)\n\n### Image Editing with Reference Images\n\nBoth the agentic tool and the `generate-image` block support reference images for editing and transformation. 
When reference images are provided, the prompt describes how to modify or use those images.\n\n| Provider | Models | Reference Image Support |\n| -------- | -------------------------------- | ----------------------- |\n| OpenAI | `gpt-image-1` | Yes |\n| Google | Gemini native (`gemini-*-image`) | Yes |\n| Google | Imagen (`imagen-*`) | No |\n\n### Agentic vs Deterministic\n\nUse `imageModel` in agent config when:\n\n- The LLM should decide when to generate or edit images\n- Users ask for images in natural language\n\nUse `generate-image` block (see [Handlers](/docs/protocol/handlers#generate-image)) when:\n\n- You want explicit control over image generation or editing\n- Building prompt engineering pipelines\n- Images are generated at specific handler steps\n\n## Web Search\n\nEnable the LLM to search the web for current information:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n webSearch: true\n agentic: true\n```\n\nWhen `webSearch` is enabled, the `octavus_web_search` tool becomes available. The LLM can decide when to search the web based on the conversation. Search results include source URLs that are emitted as citations in the UI.\n\nThis is a **provider-agnostic** built-in tool - it works with any LLM provider (Anthropic, Google, OpenAI, etc.). For Anthropic\'s own web search implementation, see [Provider Options](/docs/protocol/provider-options).\n\nUse cases:\n\n- Current events and real-time data\n- Fact verification and documentation lookups\n- Any information that may have changed since the model\'s training\n\n## TODO List\n\nEnable the LLM to maintain a structured task list while it works:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n todoList: true\n agentic: true\n```\n\nWhen `todoList` is enabled, the `octavus_todo_write` tool becomes available. 
The LLM creates and updates a list of items - each with `id`, `content`, and `status` (`pending`, `in_progress`, `completed`, `cancelled`) - and the platform emits a `todo-update` stream event with the resolved snapshot. The Client SDK accumulates updates into a single `UITodoPart` per assistant message, so consumers render an evolving "Plan" card without managing state themselves.\n\nThe list persists across messages: the LLM can use `merge=true` to update items by id (sending only the changed fields), or `merge=false` to replace the list entirely.\n\nUse cases:\n\n- Multi-step tasks where the user benefits from seeing progress\n- Long-running agentic loops that should communicate intent\n- Workflows where the agent plans before acting\n\n## Temperature\n\nControl response randomness:\n\n```yaml\nagent:\n model: openai/gpt-4o\n temperature: 0.7 # 0 = deterministic, 2 = creative\n```\n\n**Guidelines:**\n\n- `0 - 0.3`: Factual, consistent responses\n- `0.4 - 0.7`: Balanced (good default)\n- `0.8 - 1.2`: Creative, varied responses\n- `> 1.2`: Very creative (may be inconsistent)\n\n## Dynamic Configuration\n\nLike `model`, the `temperature`, `thinking`, `speed`, and `maxSteps` fields can also reference an input variable. 
Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:\n\n```yaml\ninput:\n TEMPERATURE:\n type: number\n description: Override temperature (0-2)\n optional: true\n THINKING:\n type: string\n description: Override thinking effort (low/medium/high/max, or "off")\n optional: true\n MAX_STEPS:\n type: integer\n description: Override max agentic steps\n optional: true\n\nagent:\n model: anthropic/claude-sonnet-4-5\n temperature: TEMPERATURE\n thinking: THINKING\n maxSteps: MAX_STEPS\n system: system\n```\n\nWhen creating a session, pass the values in their natural type:\n\n```typescript\nconst sessionId = await client.agentSessions.create(\'my-agent\', {\n TEMPERATURE: 0.7,\n THINKING: \'medium\',\n MAX_STEPS: 5,\n});\n```\n\n### Accepted values\n\nThe resolver accepts the natural type for each field, plus a string fallback so consumers can pass values from form inputs without coercing first.\n\n| Field | Suggested input type | Value at session creation |\n| ------------- | ------------------------------------------ | -------------------------------------------------- |\n| `temperature` | `number` (or `string` for `"off"` support) | A number `0`-`2`, a numeric string, or `"off"` |\n| `thinking` | `string` | `"low"`, `"medium"`, `"high"`, `"max"`, or `"off"` |\n| `maxSteps` | `integer` (or `string`) | A positive integer or a positive integer string |\n\nThe protocol\'s `input:` declaration enforces what the consumer can pass. Pick `type: number` / `type: integer` if you want native numeric overrides; pick `type: string` (or `type: unknown`) if you also need to pass the `"off"` sentinel for `temperature`.\n\n### Explicit "off" vs not set\n\n`temperature` and `thinking` accept an explicit `"off"` value to disable the field at session creation. 
This is different from omitting the variable:\n\n- **Variable not provided** -> the field is unset; the provider uses its default behavior\n- **Variable provided as `"off"`** -> the field is explicitly disabled (no temperature emitted, reasoning disabled)\n\nThe distinction matters because `temperature` and `thinking` are mutually exclusive at the provider level - several providers ignore temperature when reasoning is enabled. Use `"off"` to opt one out so the other takes effect.\n\n### Validation\n\nVariable references are caught at protocol validation time. If `temperature: TEMPERATURE` is declared but `TEMPERATURE` is missing from `input:` or `variables:`, the validator surfaces the error in the dashboard before the agent runs.\n\n## Provider Options\n\nEnable provider-specific features like Anthropic\'s built-in tools and skills:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n anthropic:\n tools:\n web-search:\n display: description\n description: Searching the web\n skills:\n pdf:\n type: anthropic\n description: Processing PDF\n```\n\nProvider options are validated against the model - using `anthropic:` with a non-Anthropic model will fail validation.\n\nSee [Provider Options](/docs/protocol/provider-options) for full documentation.\n\n## Thread-Specific Config\n\nOverride config for named threads:\n\n```yaml\nhandlers:\n request-human:\n Start summary thread:\n block: start-thread\n thread: summary\n model: anthropic/claude-opus-4-8 # Different model\n backupModel: openai/gpt-4o # Failover model\n thinking: low # Different thinking\n speed: fast # Fast mode for this thread (supported Opus models only)\n cache: off # Different cache mode (does not inherit from agent)\n maxSteps: 1 # Limit tool calls\n system: escalation-summary # Different prompt\n mcpServers: [figma, browser] # Thread-specific MCP servers\n skills: [data-analysis] # Thread-specific skills\n references: [escalation-policy] # Thread-specific references\n imageModel: 
google/gemini-2.5-flash-image # Thread-specific image model\n webSearch: true # Thread-specific web search\n todoList: true # Thread-specific task list\n```\n\nEach thread can have its own model, backup model, thinking level, speed, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol\'s `skills:` section. References must exist in the agent\'s `references/` directory. Workers use this same pattern since they don\'t have a global `agent:` section - which is how a worker enables fast mode.\n\n## Full Example\n\n```yaml\ninput:\n COMPANY_NAME: { type: string }\n PRODUCT_NAME: { type: string }\n USER_ID: { type: string, optional: true }\n\nresources:\n CONVERSATION_SUMMARY:\n type: string\n default: \'\'\n\ntools:\n get-user-account:\n description: Look up user account\n parameters:\n userId: { type: string }\n\n search-docs:\n description: Search help documentation\n parameters:\n query: { type: string }\n\n create-support-ticket:\n description: Create a support ticket\n parameters:\n summary: { type: string }\n priority: { type: string } # low, medium, high\n\nmcpServers:\n figma:\n description: Figma design tool integration\n source: remote\n display: description\n\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: openai/gpt-4o\n system: system\n input:\n - COMPANY_NAME\n - PRODUCT_NAME\n tools:\n - get-user-account\n - search-docs\n - create-support-ticket\n mcpServers: [figma] # MCP server connections\n skills: [qr-code] # Octavus skills\n references: [support-policies] # On-demand context\n webSearch: true # Built-in web search\n todoList: true # Structured task tracking\n agentic: true\n maxSteps: 10\n thinking: medium\n # Anthropic-specific options\n anthropic:\n tools:\n web-search:\n display: description\n description: Searching the web\n skills:\n pdf:\n type: anthropic\n description: 
Processing PDF\n\ntriggers:\n user-message:\n input:\n USER_MESSAGE: { type: string }\n\nhandlers:\n user-message:\n Add message:\n block: add-message\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n display: hidden\n\n Respond:\n block: next-message\n```\n',
619 +
excerpt: "Agent Config The section configures the LLM model, system prompt, tools, and behavior. Basic Configuration Configuration Options | Field | Required | Description ...",
620 620
order: 7
621 621
},
622 622
{
@@ -633,7 +633,7 @@ See [Streaming Events](/docs/server-sdk/streaming#event-types) for the full list
633 633
section: "protocol",
634 634
title: "Skills Advanced Guide",
635 635
description: "Best practices and advanced patterns for using Octavus skills.",
636 -
content: "\n# Skills Advanced Guide\n\nThis guide covers advanced patterns and best practices for using Octavus skills in your agents.\n\n## When to Use Skills\n\nSkills are ideal for:\n\n- **Code execution** - Running Python/Bash scripts\n- **File generation** - Creating images, PDFs, reports\n- **Data processing** - Analyzing, transforming, or visualizing data\n- **Provider-agnostic needs** - Features that should work with any LLM\n\nUse external tools instead when:\n\n- **Simple API calls** - Database queries, external services\n- **Authentication required** - Accessing user-specific resources\n- **Backend integration** - Tight coupling with your infrastructure\n\n## Skill Selection Strategy\n\n### Defining Available Skills\n\nDefine all skills in the `skills:` section, then reference which skills are available where they're used:\n\n**Interactive agents** - reference in `agent.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n pdf-processor:\n display: description\n description: Processing PDFs\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n```\n\n**Workers and named threads** - reference per-thread in `start-thread.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data\n\nsteps:\n Start analysis:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code, data-analysis]\n maxSteps: 10\n```\n\n### Execution Mode\n\nThe `execution` field is set at the skill definition level and applies to all threads that use the skill:\n\n```yaml\nskills:\n deploy-tool:\n display: description\n description: Deploy applications\n execution: device # All threads using this skill run it on the device\n qr-code:\n display: description\n description: Generating QR codes\n # Defaults to sandbox execution\n```\n\nYou don't set 
`execution` per-thread - a skill's execution mode is consistent wherever it's used.\n\n### Match Skills to Use Cases\n\nDifferent threads can have different skills. Define all skills at the protocol level, then scope them to each thread:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data and generating reports\n visualization:\n display: description\n description: Creating charts and visualizations\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n```\n\nFor a data analysis thread, you would specify `[data-analysis, visualization]` in `agent.skills` or in a `start-thread` block's `skills` field.\n\n## Display Mode Strategy\n\nChoose display modes based on user experience:\n\n```yaml\nskills:\n # Background processing - hide from user\n data-analysis:\n display: hidden\n\n # User-facing generation - show description\n qr-code:\n display: description\n\n # Interactive progress - stream updates\n report-generation:\n display: stream\n```\n\n### Guidelines\n\n- **`hidden`**: Background work that doesn't need user awareness\n- **`description`**: User-facing operations (default)\n- **`name`**: Quick operations where name is sufficient\n- **`stream`**: Long-running operations where progress matters\n\n## System Prompt Integration\n\nSkills are automatically injected into the system prompt. The LLM learns:\n\n1. **Available skills** - List of enabled skills with descriptions\n2. **How to use skills** - Instructions for using skill tools\n3. **Tool reference** - Available skill tools (`octavus_skill_read`, `octavus_code_run`, etc.)\n\nYou don't need to manually document skills in your system prompt. 
However, you can guide the LLM:\n\n```markdown\n<!-- prompts/system.md -->\n\nYou are a helpful assistant that can generate QR codes.\n\n## When to Generate QR Codes\n\nGenerate QR codes when users want to:\n\n- Share URLs easily\n- Provide contact information\n- Share WiFi credentials\n- Create scannable data\n\nUse the qr-code skill for all QR code generation tasks.\n```\n\n## Error Handling\n\nSkills handle errors gracefully:\n\n```yaml\n# Skill execution errors are returned to the LLM\n# The LLM can retry or explain the error to the user\n```\n\nCommon error scenarios:\n\n1. **Invalid skill slug** - Skill not found in organization\n2. **Code execution errors** - Syntax errors, runtime exceptions\n3. **Missing dependencies** - Required packages not installed\n4. **File I/O errors** - Permission issues, invalid paths\n\nThe LLM receives error messages and can:\n\n- Retry with corrected code\n- Explain errors to users\n- Suggest alternatives\n\n## File Output Patterns\n\n### Single File Output\n\n```python\n# Save single file to /output/\nimport qrcode\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\nqr = qrcode.QRCode()\nqr.add_data('https://example.com')\nimg = qr.make_image()\nimg.save(f'{output_dir}/qrcode.png')\n```\n\n### Multiple Files\n\n```python\n# Save multiple files\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n\n# Generate multiple outputs\nfor i in range(3):\n filename = f'{output_dir}/output_{i}.png'\n # ... generate file ...\n```\n\n### Structured Output\n\n```python\n# Save structured data + files\nimport json\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n\n# Save metadata\nmetadata = {\n 'files': ['chart.png', 'data.csv'],\n 'summary': 'Analysis complete'\n}\nwith open(f'{output_dir}/metadata.json', 'w') as f:\n json.dump(metadata, f)\n\n# Save actual files\n# ... 
generate chart.png and data.csv ...\n```\n\n## Performance Considerations\n\n### Lazy Initialization\n\nSandboxes are created only when a skill tool is first called:\n\n```yaml\nagent:\n skills: [qr-code] # Sandbox created on first skill tool call\n```\n\nThis means:\n\n- No cost if skills aren't used\n- Fast startup (no sandbox creation delay)\n- Each `next-message` execution gets its own sandbox with only the skills it needs\n\n### Timeout Limits\n\nSandboxes default to a 5-minute timeout. Configure `sandboxTimeout` on the agent config or per thread:\n\n```yaml\n# Agent-level\nagent:\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 1800000 # 30 minutes\n```\n\n```yaml\n# Thread-level (overrides agent-level)\nsteps:\n Start thread:\n block: start-thread\n thread: analysis\n skills: [data-analysis]\n sandboxTimeout: 3600000 # 1 hour for long-running analysis\n```\n\nThread-level `sandboxTimeout` takes priority. Maximum: 1 hour (3,600,000 ms).\n\n### Sandbox Lifecycle\n\nEach `next-message` execution gets its own sandbox:\n\n- **Scoped** - Only contains the skills available to that thread\n- **Isolated** - Interactive agents and workers don't share sandboxes\n- **Resilient** - If a sandbox expires, it's transparently recreated\n- **Cleaned up** - Sandbox destroyed when the LLM call completes\n\n## Combining Skills with Tools\n\nSkills and tools can work together:\n\n```yaml\ntools:\n get-user-data:\n description: Fetch user data from database\n parameters:\n userId: { type: string }\n\nskills:\n data-analysis:\n display: description\n description: Analyzing data\n\nagent:\n tools: [get-user-data]\n skills: [data-analysis]\n agentic: true\n\nhandlers:\n analyze-user:\n Get user data:\n block: tool-call\n tool: get-user-data\n input:\n userId: USER_ID\n output: USER_DATA\n\n Analyze:\n block: next-message\n # LLM can use data-analysis skill with USER_DATA\n```\n\nPattern:\n\n1. Fetch data via tool (from your backend)\n2. 
LLM uses skill to analyze/process the data\n3. Generate outputs (files, reports)\n\n## Secure Skills\n\nWhen a skill declares secrets and an organization configures them, the skill runs in secure mode with its own isolated sandbox.\n\n### Standard vs Secure vs Device Skills\n\n| Aspect | Standard Skills | Secure Skills | Device Skills |\n| ------------------- | ------------------------ | --------------------------------------------------- | ------------------------------------------------------ |\n| **Environment** | Shared sandbox | Isolated sandbox (one per skill) | Agent's computer (VM or desktop) |\n| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run` only | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |\n| **Script input** | CLI arguments via `args` | JSON via stdin (use `input` parameter) | CLI arguments via `args` |\n| **Secrets** | No secrets | Secrets as env vars | No secrets |\n| **Output** | Raw stdout/stderr | Redacted (secret values replaced with `[REDACTED]`) | Raw stdout/stderr |\n\n### Writing Scripts for Secure Skills\n\nSecure skill scripts receive structured input via stdin (JSON) and access secrets from environment variables:\n\n```python\n#!/usr/bin/env python3\nimport json\nimport os\nimport sys\nimport subprocess\n\ninput_data = json.load(sys.stdin)\ntoken = os.environ[\"GITHUB_TOKEN\"]\n\nrepo = input_data.get(\"repo\", \"\")\nresult = subprocess.run(\n [\"gh\", \"repo\", \"view\", repo, \"--json\", \"name,description\"],\n capture_output=True, text=True,\n env={**os.environ, \"GH_TOKEN\": token}\n)\n\nprint(result.stdout)\n```\n\nKey patterns:\n\n- **Read stdin**: `json.load(sys.stdin)` to get the `input` object from the `octavus_skill_run` call\n- **Access secrets**: `os.environ[\"SECRET_NAME\"]` - secrets are injected as env vars\n- **Print output**: Write results to stdout - the LLM sees the (redacted) stdout\n- **Error handling**: Write errors to stderr and exit with non-zero code\n\n### 
Declaring Secrets in SKILL.md\n\n```yaml\n---\nname: github\ndescription: >\n Run GitHub CLI (gh) commands to manage repos, issues, PRs, and more.\nsecrets:\n - name: GITHUB_TOKEN\n description: GitHub personal access token with repo access\n required: true\n - name: GITHUB_ORG\n description: Default GitHub organization\n required: false\n---\n```\n\n### Testing Secure Skills Locally\n\nYou can test scripts locally by piping JSON to stdin:\n\n```bash\necho '{\"repo\": \"octavus-ai/agent-sdk\"}' | GITHUB_TOKEN=ghp_xxx python scripts/list-issues.py\n```\n\n## Skill Development Tips\n\n### Writing SKILL.md\n\nFocus on **when** and **how** to use the skill:\n\n```markdown\n---\nname: qr-code\ndescription: >\n Generate QR codes from text, URLs, or data. Use when the user needs to create\n a QR code for any purpose - sharing links, contact information, WiFi credentials,\n or any text data that should be scannable.\n---\n\n# QR Code Generator\n\n## When to Use\n\nUse this skill when users want to:\n\n- Share URLs easily\n- Provide contact information\n- Create scannable data\n\n## Quick Start\n\n[Clear examples of how to use the skill]\n```\n\n### Script Organization\n\nOrganize scripts logically:\n\n```\nskill-name/\n\u251C\u2500\u2500 SKILL.md\n\u2514\u2500\u2500 scripts/\n \u251C\u2500\u2500 generate.py # Main script\n \u251C\u2500\u2500 utils.py # Helper functions\n \u2514\u2500\u2500 requirements.txt # Dependencies\n```\n\n### Error Messages\n\nProvide helpful error messages:\n\n```python\ntry:\n # ... 
code ...\nexcept ValueError as e:\n print(f\"Error: Invalid input - {e}\")\n sys.exit(1)\n```\n\nThe LLM sees these errors and can retry or explain to users.\n\n## Security Considerations\n\n### Sandbox Isolation\n\n- **No network access** (unless explicitly configured)\n- **No persistent storage** (sandbox destroyed after each `next-message` execution)\n- **File output only** via `/output/` directory\n- **Time limits** enforced (5-minute default, configurable via `sandboxTimeout`)\n\n### Secret Protection\n\nFor skills with configured secrets:\n\n- **Isolated sandbox** - each secure skill gets its own sandbox, preventing cross-skill secret leakage\n- **No arbitrary code** - `octavus_code_run`, `octavus_file_write`, and `octavus_file_read` are blocked for secure skills, so only pre-built scripts can execute\n- **Output redaction** - all stdout and stderr are scanned for secret values before being returned to the LLM\n- **Encrypted at rest** - secrets are encrypted using AES-256-GCM and only decrypted at execution time\n\n### Input Validation\n\nSkills should validate inputs:\n\n```python\nimport sys\n\nif not data:\n print(\"Error: Data is required\")\n sys.exit(1)\n\nif len(data) > 1000:\n print(\"Error: Data too long (max 1000 characters)\")\n sys.exit(1)\n```\n\n### Resource Limits\n\nBe aware of:\n\n- **File size limits** - Large files may fail to upload\n- **Execution time** - Sandbox timeout (5-minute default, 1-hour maximum)\n- **Memory limits** - Sandbox environment constraints\n\n## Debugging Skills\n\n### Check Skill Documentation\n\nThe LLM can read skill docs:\n\n```python\n# LLM calls octavus_skill_read to see skill instructions\n```\n\n### Test Locally\n\nTest skills before uploading:\n\n```bash\n# Test skill locally\npython scripts/generate.py --data \"test\"\n```\n\n### Monitor Execution\n\nCheck execution logs in the platform debug view:\n\n- Tool calls and arguments\n- Code execution results\n- File outputs\n- Error messages\n\n## Common 
Patterns\n\n### Pattern 1: Generate and Return\n\n```yaml\n# User asks for QR code\n# LLM generates QR code\n# File automatically available for download\n```\n\n### Pattern 2: Analyze and Report\n\n```yaml\n# User provides data\n# LLM analyzes with skill\n# Generates report file\n# Returns summary + file link\n```\n\n### Pattern 3: Transform and Save\n\n```yaml\n# User uploads file (via tool)\n# LLM processes with skill\n# Generates transformed file\n# Returns new file link\n```\n\n## Best Practices Summary\n\n1. **Enable only needed skills** - Don't overwhelm the LLM\n2. **Choose appropriate display modes** - Match user experience needs\n3. **Write clear skill descriptions** - Help LLM understand when to use\n4. **Handle errors gracefully** - Provide helpful error messages\n5. **Test skills locally** - Verify before uploading\n6. **Monitor execution** - Check logs for issues\n7. **Combine with tools** - Use tools for data, skills for processing\n8. **Consider performance** - Be aware of timeouts and limits\n9. **Use secrets for credentials** - Declare secrets in frontmatter instead of hardcoding tokens\n10. **Design scripts for stdin input** - Secure skills receive JSON via stdin, so plan for both input methods if the skill might be used in either mode\n\n## Next Steps\n\n- [Skills](/docs/protocol/skills) - Basic skills documentation\n- [Agent Config](/docs/protocol/agent-config) - Configuring skills\n- [Tools](/docs/protocol/tools) - External tools integration\n",
636 +
content: "\n# Skills Advanced Guide\n\nThis guide covers advanced patterns and best practices for using Octavus skills in your agents.\n\n## When to Use Skills\n\nSkills are ideal for:\n\n- **Code execution** - Running Python/Bash scripts\n- **File generation** - Creating images, PDFs, reports\n- **Data processing** - Analyzing, transforming, or visualizing data\n- **Provider-agnostic needs** - Features that should work with any LLM\n\nUse external tools instead when:\n\n- **Simple API calls** - Database queries, external services\n- **Authentication required** - Accessing user-specific resources\n- **Backend integration** - Tight coupling with your infrastructure\n\n## Skill Selection Strategy\n\n### Defining Available Skills\n\nDefine all skills in the `skills:` section, then reference which skills are available where they're used:\n\n**Interactive agents** - reference in `agent.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n pdf-processor:\n display: description\n description: Processing PDFs\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n```\n\n**Workers and named threads** - reference per-thread in `start-thread.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data\n\nsteps:\n Start analysis:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code, data-analysis]\n maxSteps: 10\n```\n\n### Execution Mode\n\nThe `execution` field is set at the skill definition level and applies to all threads that use the skill:\n\n```yaml\nskills:\n deploy-tool:\n display: description\n description: Deploy applications\n execution: device # All threads using this skill run it on the device\n qr-code:\n display: description\n description: Generating QR codes\n # Defaults to sandbox execution\n```\n\nYou don't set 
`execution` per-thread - a skill's execution mode is consistent wherever it's used.\n\n### Match Skills to Use Cases\n\nDifferent threads can have different skills. Define all skills at the protocol level, then scope them to each thread:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data and generating reports\n visualization:\n display: description\n description: Creating charts and visualizations\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n```\n\nFor a data analysis thread, you would specify `[data-analysis, visualization]` in `agent.skills` or in a `start-thread` block's `skills` field.\n\n## Display Mode Strategy\n\nChoose display modes based on user experience:\n\n```yaml\nskills:\n # Background processing - hide from user\n data-analysis:\n display: hidden\n\n # User-facing generation - show description\n qr-code:\n display: description\n\n # Interactive progress - stream updates\n report-generation:\n display: stream\n```\n\n### Guidelines\n\n- **`hidden`**: Background work that doesn't need user awareness\n- **`description`**: User-facing operations (default)\n- **`name`**: Quick operations where name is sufficient\n- **`stream`**: Long-running operations where progress matters\n\n## System Prompt Integration\n\nSkills are automatically injected into the system prompt. The LLM learns:\n\n1. **Available skills** - List of enabled skills with descriptions\n2. **How to use skills** - Instructions for using skill tools\n3. **Tool reference** - Available skill tools (`octavus_skill_read`, `octavus_code_run`, etc.)\n\nYou don't need to manually document skills in your system prompt. 
However, you can guide the LLM:\n\n```markdown\n<!-- prompts/system.md -->\n\nYou are a helpful assistant that can generate QR codes.\n\n## When to Generate QR Codes\n\nGenerate QR codes when users want to:\n\n- Share URLs easily\n- Provide contact information\n- Share WiFi credentials\n- Create scannable data\n\nUse the qr-code skill for all QR code generation tasks.\n```\n\n## Error Handling\n\nSkills handle errors gracefully:\n\n```yaml\n# Skill execution errors are returned to the LLM\n# The LLM can retry or explain the error to the user\n```\n\nCommon error scenarios:\n\n1. **Invalid skill slug** - Skill not found in organization\n2. **Code execution errors** - Syntax errors, runtime exceptions\n3. **Missing dependencies** - Required packages not installed\n4. **File I/O errors** - Permission issues, invalid paths\n\nThe LLM receives error messages and can:\n\n- Retry with corrected code\n- Explain errors to users\n- Suggest alternatives\n\n## File Output Patterns\n\n### Single File Output\n\n```python\n# Save single file to /output/\nimport qrcode\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\nqr = qrcode.QRCode()\nqr.add_data('https://example.com')\nimg = qr.make_image()\nimg.save(f'{output_dir}/qrcode.png')\n```\n\n### Multiple Files\n\n```python\n# Save multiple files\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n\n# Generate multiple outputs\nfor i in range(3):\n filename = f'{output_dir}/output_{i}.png'\n # ... generate file ...\n```\n\n### Structured Output\n\n```python\n# Save structured data + files\nimport json\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n\n# Save metadata\nmetadata = {\n 'files': ['chart.png', 'data.csv'],\n 'summary': 'Analysis complete'\n}\nwith open(f'{output_dir}/metadata.json', 'w') as f:\n json.dump(metadata, f)\n\n# Save actual files\n# ... 
generate chart.png and data.csv ...\n```\n\n## Performance Considerations\n\n### Lazy Initialization\n\nSandboxes are created only when a skill tool is first called:\n\n```yaml\nagent:\n skills: [qr-code] # Sandbox created on first skill tool call\n```\n\nThis means:\n\n- No cost if skills aren't used\n- Fast startup (no sandbox creation delay)\n- Each `next-message` execution gets its own sandbox with only the skills it needs\n\n### Timeout Limits\n\nSandboxes default to a 5-minute timeout. Configure `sandboxTimeout` on the agent config or per thread:\n\n```yaml\n# Agent-level\nagent:\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 1800000 # 30 minutes\n```\n\n```yaml\n# Thread-level (overrides agent-level)\nsteps:\n Start thread:\n block: start-thread\n thread: analysis\n skills: [data-analysis]\n sandboxTimeout: 3600000 # 1 hour for long-running analysis\n```\n\nThread-level `sandboxTimeout` takes priority. Maximum: 1 hour (3,600,000 ms).\n\n### Sandbox Lifecycle\n\nEach `next-message` execution gets its own sandbox:\n\n- **Scoped** - Only contains the skills available to that thread\n- **Isolated** - Interactive agents and workers don't share sandboxes\n- **Resilient** - If a sandbox expires, it's transparently recreated\n- **Cleaned up** - Sandbox destroyed when the LLM call completes\n\n## Combining Skills with Tools\n\nSkills and tools can work together:\n\n```yaml\ntools:\n get-user-data:\n description: Fetch user data from database\n parameters:\n userId: { type: string }\n\nskills:\n data-analysis:\n display: description\n description: Analyzing data\n\nagent:\n tools: [get-user-data]\n skills: [data-analysis]\n agentic: true\n\nhandlers:\n analyze-user:\n Get user data:\n block: tool-call\n tool: get-user-data\n input:\n userId: USER_ID\n output: USER_DATA\n\n Analyze:\n block: next-message\n # LLM can use data-analysis skill with USER_DATA\n```\n\nPattern:\n\n1. Fetch data via tool (from your backend)\n2. 
LLM uses skill to analyze/process the data\n3. Generate outputs (files, reports)\n\n## Secure Skills\n\nWhen a skill declares secrets and an organization configures them, the skill runs in secure mode with its own isolated sandbox.\n\n### Standard vs Secure vs Device Skills\n\n| Aspect | Standard Skills | Secure Skills | Device Skills |\n| ------------------- | ------------------------ | --------------------------------------------------- | ------------------------------------------------------ |\n| **Environment** | Shared sandbox | Isolated sandbox (one per skill) | The agent's computer |\n| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run` only | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |\n| **Script input** | CLI arguments via `args` | JSON via stdin (use `input` parameter) | CLI arguments via `args` |\n| **Secrets** | No secrets | Secrets as env vars | No secrets |\n| **Output** | Raw stdout/stderr | Redacted (secret values replaced with `[REDACTED]`) | Raw stdout/stderr |\n\n### Writing Scripts for Secure Skills\n\nSecure skill scripts receive structured input via stdin (JSON) and access secrets from environment variables:\n\n```python\n#!/usr/bin/env python3\nimport json\nimport os\nimport sys\nimport subprocess\n\ninput_data = json.load(sys.stdin)\ntoken = os.environ[\"GITHUB_TOKEN\"]\n\nrepo = input_data.get(\"repo\", \"\")\nresult = subprocess.run(\n [\"gh\", \"repo\", \"view\", repo, \"--json\", \"name,description\"],\n capture_output=True, text=True,\n env={**os.environ, \"GH_TOKEN\": token}\n)\n\nprint(result.stdout)\n```\n\nKey patterns:\n\n- **Read stdin**: `json.load(sys.stdin)` to get the `input` object from the `octavus_skill_run` call\n- **Access secrets**: `os.environ[\"SECRET_NAME\"]` - secrets are injected as env vars\n- **Print output**: Write results to stdout - the LLM sees the (redacted) stdout\n- **Error handling**: Write errors to stderr and exit with non-zero code\n\n### Declaring Secrets in 
SKILL.md\n\n```yaml\n---\nname: github\ndescription: >\n Run GitHub CLI (gh) commands to manage repos, issues, PRs, and more.\nsecrets:\n - name: GITHUB_TOKEN\n description: GitHub personal access token with repo access\n required: true\n - name: GITHUB_ORG\n description: Default GitHub organization\n required: false\n---\n```\n\n### Testing Secure Skills Locally\n\nYou can test scripts locally by piping JSON to stdin:\n\n```bash\necho '{\"repo\": \"octavus-ai/agent-sdk\"}' | GITHUB_TOKEN=ghp_xxx python scripts/list-issues.py\n```\n\n## Skill Development Tips\n\n### Writing SKILL.md\n\nFocus on **when** and **how** to use the skill:\n\n```markdown\n---\nname: qr-code\ndescription: >\n Generate QR codes from text, URLs, or data. Use when the user needs to create\n a QR code for any purpose - sharing links, contact information, WiFi credentials,\n or any text data that should be scannable.\n---\n\n# QR Code Generator\n\n## When to Use\n\nUse this skill when users want to:\n\n- Share URLs easily\n- Provide contact information\n- Create scannable data\n\n## Quick Start\n\n[Clear examples of how to use the skill]\n```\n\n### Script Organization\n\nOrganize scripts logically:\n\n```\nskill-name/\n\u251C\u2500\u2500 SKILL.md\n\u2514\u2500\u2500 scripts/\n \u251C\u2500\u2500 generate.py # Main script\n \u251C\u2500\u2500 utils.py # Helper functions\n \u2514\u2500\u2500 requirements.txt # Dependencies\n```\n\n### Error Messages\n\nProvide helpful error messages:\n\n```python\ntry:\n # ... 
code ...\nexcept ValueError as e:\n print(f\"Error: Invalid input - {e}\")\n sys.exit(1)\n```\n\nThe LLM sees these errors and can retry or explain to users.\n\n## Security Considerations\n\n### Sandbox Isolation\n\n- **No network access** (unless explicitly configured)\n- **No persistent storage** (sandbox destroyed after each `next-message` execution)\n- **File output only** via `/output/` directory\n- **Time limits** enforced (5-minute default, configurable via `sandboxTimeout`)\n\n### Secret Protection\n\nFor skills with configured secrets:\n\n- **Isolated sandbox** - each secure skill gets its own sandbox, preventing cross-skill secret leakage\n- **No arbitrary code** - `octavus_code_run`, `octavus_file_write`, and `octavus_file_read` are blocked for secure skills, so only pre-built scripts can execute\n- **Output redaction** - all stdout and stderr are scanned for secret values before being returned to the LLM\n- **Encrypted at rest** - secrets are encrypted using AES-256-GCM and only decrypted at execution time\n\n### Input Validation\n\nSkills should validate inputs:\n\n```python\nimport sys\n\nif not data:\n print(\"Error: Data is required\")\n sys.exit(1)\n\nif len(data) > 1000:\n print(\"Error: Data too long (max 1000 characters)\")\n sys.exit(1)\n```\n\n### Resource Limits\n\nBe aware of:\n\n- **File size limits** - Large files may fail to upload\n- **Execution time** - Sandbox timeout (5-minute default, 1-hour maximum)\n- **Memory limits** - Sandbox environment constraints\n\n## Debugging Skills\n\n### Check Skill Documentation\n\nThe LLM can read skill docs:\n\n```python\n# LLM calls octavus_skill_read to see skill instructions\n```\n\n### Test Locally\n\nTest skills before uploading:\n\n```bash\n# Test skill locally\npython scripts/generate.py --data \"test\"\n```\n\n### Monitor Execution\n\nCheck execution logs in the platform debug view:\n\n- Tool calls and arguments\n- Code execution results\n- File outputs\n- Error messages\n\n## Common 
Patterns\n\n### Pattern 1: Generate and Return\n\n```yaml\n# User asks for QR code\n# LLM generates QR code\n# File automatically available for download\n```\n\n### Pattern 2: Analyze and Report\n\n```yaml\n# User provides data\n# LLM analyzes with skill\n# Generates report file\n# Returns summary + file link\n```\n\n### Pattern 3: Transform and Save\n\n```yaml\n# User uploads file (via tool)\n# LLM processes with skill\n# Generates transformed file\n# Returns new file link\n```\n\n## Best Practices Summary\n\n1. **Enable only needed skills** - Don't overwhelm the LLM\n2. **Choose appropriate display modes** - Match user experience needs\n3. **Write clear skill descriptions** - Help LLM understand when to use\n4. **Handle errors gracefully** - Provide helpful error messages\n5. **Test skills locally** - Verify before uploading\n6. **Monitor execution** - Check logs for issues\n7. **Combine with tools** - Use tools for data, skills for processing\n8. **Consider performance** - Be aware of timeouts and limits\n9. **Use secrets for credentials** - Declare secrets in frontmatter instead of hardcoding tokens\n10. **Design scripts for stdin input** - Secure skills receive JSON via stdin, so plan for both input methods if the skill might be used in either mode\n\n## Next Steps\n\n- [Skills](/docs/protocol/skills) - Basic skills documentation\n- [Agent Config](/docs/protocol/agent-config) - Configuring skills\n- [Tools](/docs/protocol/tools) - External tools integration\n",
|
|
637
637
|
excerpt: "Skills Advanced Guide This guide covers advanced patterns and best practices for using Octavus skills in your agents. When to Use Skills Skills are ideal for: - Code execution - Running Python/Bash...",
|
|
638
638
|
order: 9
|
|
639
639
|
},
|
|
@@ -651,7 +651,7 @@ See [Streaming Events](/docs/server-sdk/streaming#event-types) for the full list
|
|
|
651
651
|
section: "protocol",
|
|
652
652
|
title: "Workers",
|
|
653
653
|
description: "Defining worker agents for background and task-based execution.",
|
|
654
|
-
content: '\n# Workers\n\nWorkers are agents designed for task-based execution. Unlike interactive agents that handle multi-turn conversations, workers execute a sequence of steps and return an output value.\n\n## When to Use Workers\n\nWorkers are ideal for:\n\n- **Background processing** - Long-running tasks that don\'t need conversation\n- **Composable tasks** - Reusable units of work called by other agents\n- **Pipelines** - Multi-step processing with structured output\n- **Parallel execution** - Tasks that can run independently\n\nUse interactive agents instead when:\n\n- **Conversation is needed** - Multi-turn dialogue with users\n- **Persistence matters** - State should survive across interactions\n- **Session context** - User context needs to persist\n\n## Worker vs Interactive\n\n| Aspect | Interactive | Worker |\n| ---------- | ---------------------------------- | ----------------------------- |\n| Structure | `triggers` + `handlers` + `agent` | `steps` + `output` |\n| LLM Config | Global `agent:` section | Per-thread via `start-thread` |\n| Invocation | Fire a named trigger | Direct execution with input |\n| Session | Persists across triggers (24h TTL) | Single execution |\n| Result | Streaming chat | Streaming + output value |\n\n## Protocol Structure\n\nWorkers use a simpler protocol structure than interactive agents:\n\n```yaml\n# Input schema - provided when worker is executed\ninput:\n TOPIC:\n type: string\n description: Topic to research\n DEPTH:\n type: string\n optional: true\n default: medium\n\n# Variables for intermediate results\nvariables:\n RESEARCH_DATA:\n type: string\n ANALYSIS:\n type: string\n description: Final analysis result\n\n# Tools available to the worker\ntools:\n web-search:\n description: Search the web\n parameters:\n query: { type: string }\n\n# Sequential execution steps\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: research-system\n input: [TOPIC, 
DEPTH]\n tools: [web-search]\n maxSteps: 5\n\n Add research request:\n block: add-message\n thread: research\n role: user\n prompt: research-prompt\n input: [TOPIC, DEPTH]\n\n Generate research:\n block: next-message\n thread: research\n output: RESEARCH_DATA\n\n Start analysis:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n system: analysis-system\n\n Add analysis request:\n block: add-message\n thread: analysis\n role: user\n prompt: analysis-prompt\n input: [RESEARCH_DATA]\n\n Generate analysis:\n block: next-message\n thread: analysis\n output: ANALYSIS\n\n# Output variable - the worker\'s return value\noutput: ANALYSIS\n```\n\n## settings.json\n\nWorkers are identified by the `format` field:\n\n```json\n{\n "slug": "research-assistant",\n "name": "Research Assistant",\n "description": "Researches topics and returns structured analysis",\n "format": "worker"\n}\n```\n\n## Key Differences\n\n### No Global Agent Config\n\nInteractive agents have a global `agent:` section that configures a main thread. Workers don\'t have this - every thread must be explicitly created via `start-thread`:\n\n```yaml\n# Interactive agent: Global config\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [tool-a, tool-b]\n\n# Worker: Each thread configured independently\nsteps:\n Start thread A:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n tools: [tool-a]\n\n Start thread B:\n block: start-thread\n thread: analysis\n model: openai/gpt-4o\n tools: [tool-b]\n```\n\nThis gives workers flexibility to use different models, tools, skills, and settings at different stages.\n\n### Steps Instead of Handlers\n\nWorkers use `steps:` instead of `handlers:`. 
Steps execute sequentially, like handler blocks:\n\n```yaml\n# Interactive: Handlers respond to triggers\nhandlers:\n user-message:\n Add message:\n block: add-message\n # ...\n\n# Worker: Steps execute in sequence\nsteps:\n Add message:\n block: add-message\n # ...\n```\n\n### Output Value\n\nWorkers can return an output value to the caller:\n\n```yaml\nvariables:\n RESULT:\n type: string\n\nsteps:\n # ... steps that populate RESULT ...\n\noutput: RESULT # Return this variable\'s value\n```\n\nThe `output` field references a variable declared in `variables:`. If omitted, the worker completes without returning a value.\n\n## Available Blocks\n\nWorkers support the same blocks as handlers:\n\n| Block | Purpose |\n| ------------------ | -------------------------------------------- |\n| `start-thread` | Create a named thread with LLM configuration |\n| `add-message` | Add a message to a thread |\n| `next-message` | Generate LLM response |\n| `tool-call` | Call a tool deterministically |\n| `set-resource` | Update a resource value |\n| `serialize-thread` | Convert thread to text |\n| `generate-image` | Generate an image from a prompt variable |\n\n### start-thread (Required for LLM)\n\nEvery thread must be initialized with `start-thread` before using `next-message`:\n\n```yaml\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: research-system\n input: [TOPIC]\n tools: [web-search]\n thinking: medium\n maxSteps: 5\n```\n\nAll LLM configuration goes here:\n\n| Field | Description |\n| ------------- | -------------------------------------------------------------------------------------- |\n| `thread` | Thread name (defaults to block name) |\n| `model` | LLM model to use |\n| `system` | System prompt filename (required) |\n| `input` | Variables for system prompt |\n| `tools` | Tools available in this thread |\n| `skills` | Octavus skills available in this thread |\n| `mcpServers` | MCP servers available in this 
thread |\n| `imageModel` | Image generation model |\n| `webSearch` | Enable built-in web search tool |\n| `thinking` | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |\n| `cache` | Prompt caching mode: `auto` (default), `extended`, or `off` |\n| `temperature` | Model temperature (0-2), `"off"`, or variable reference |\n| `maxSteps` | Maximum tool call cycles (enables agentic if > 1), or variable reference |\n\n## Simple Example\n\nA worker that generates a title from a summary:\n\n```yaml\n# Input\ninput:\n CONVERSATION_SUMMARY:\n type: string\n description: Summary to generate a title for\n\n# Variables\nvariables:\n TITLE:\n type: string\n description: The generated title\n\n# Steps\nsteps:\n Start title thread:\n block: start-thread\n thread: title-gen\n model: anthropic/claude-sonnet-4-5\n system: title-system\n\n Add title request:\n block: add-message\n thread: title-gen\n role: user\n prompt: title-request\n input: [CONVERSATION_SUMMARY]\n\n Generate title:\n block: next-message\n thread: title-gen\n output: TITLE\n display: stream\n\n# Output\noutput: TITLE\n```\n\n## Advanced Example\n\nA worker with multiple threads, tools, and agentic behavior:\n\n```yaml\ninput:\n USER_MESSAGE:\n type: string\n description: The user\'s message to respond to\n USER_ID:\n type: string\n description: User ID for account lookups\n optional: true\n\ntools:\n get-user-account:\n description: Looking up account information\n parameters:\n userId: { type: string }\n create-support-ticket:\n description: Creating a support ticket\n parameters:\n summary: { type: string }\n priority: { type: string }\n\nvariables:\n ASSISTANT_RESPONSE:\n type: string\n CHAT_TRANSCRIPT:\n type: string\n CONVERSATION_SUMMARY:\n type: string\n\nsteps:\n # Thread 1: Chat with agentic tool calling\n Start chat thread:\n block: start-thread\n thread: chat\n model: anthropic/claude-sonnet-4-5\n system: chat-system\n input: [USER_ID]\n tools: [get-user-account, 
create-support-ticket]\n thinking: medium\n maxSteps: 5\n\n Add user message:\n block: add-message\n thread: chat\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n\n Generate response:\n block: next-message\n thread: chat\n output: ASSISTANT_RESPONSE\n display: stream\n\n # Serialize for summary\n Save conversation:\n block: serialize-thread\n thread: chat\n output: CHAT_TRANSCRIPT\n\n # Thread 2: Summary generation\n Start summary thread:\n block: start-thread\n thread: summary\n model: anthropic/claude-sonnet-4-5\n system: summary-system\n thinking: low\n\n Add summary request:\n block: add-message\n thread: summary\n role: user\n prompt: summary-request\n input: [CHAT_TRANSCRIPT]\n\n Generate summary:\n block: next-message\n thread: summary\n output: CONVERSATION_SUMMARY\n display: stream\n\noutput: CONVERSATION_SUMMARY\n```\n\n## MCP Servers\n\nWorkers can declare and use MCP servers, just like interactive agents. Define them in `mcpServers:` and reference them in `start-thread`:\n\n```yaml\nmcpServers:\n sentry:\n description: Error tracking and debugging\n source: remote\n display: name\n browser:\n description: Chrome DevTools browser automation\n source: device\n display: name\n\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: system\n mcpServers: [sentry, browser]\n maxSteps: 10\n```\n\nWorkers resolve their own MCP connections independently - they don\'t inherit MCP servers from a parent interactive agent. 
Remote MCP connections are project-scoped, so a worker in the same project automatically has access to the same OAuth connections.\n\nSee [MCP Servers](/docs/protocol/mcp-servers) for full documentation.\n\n## Skills, Image Generation, and Web Search\n\nWorkers can use Octavus skills, image generation, and web search, configured per-thread via `start-thread`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generate QR codes\n\nsteps:\n Start thread:\n block: start-thread\n thread: worker\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n imageModel: google/gemini-2.5-flash-image\n webSearch: true\n maxSteps: 10\n```\n\nWorkers define their own skills independently - they don\'t inherit skills from a parent interactive agent. Each thread gets its own sandbox scoped to only its listed skills.\n\nSkills with `execution: device` work the same way in workers as in interactive agents - the skill runs on the agent\'s computer. Workers resolve their device execution independently, so a worker can use device skills even if the parent agent does not.\n\nSee [Skills](/docs/protocol/skills) for full documentation.\n\n## Tool Handling\n\nWorkers support the same tool handling as interactive agents:\n\n- **Server tools** - Handled by tool handlers you provide\n- **Client tools** - Pause execution, return tool request to caller\n\n```typescript\n// Non-streaming: get the output directly\nconst { output } = await client.workers.generate(\n agentId,\n { TOPIC: \'AI safety\' },\n {\n tools: {\n \'web-search\': async (args) => await searchWeb(args.query),\n },\n },\n);\n\n// Streaming: observe events in real-time\nconst events = client.workers.execute(\n agentId,\n { TOPIC: \'AI safety\' },\n {\n tools: {\n \'web-search\': async (args) => await searchWeb(args.query),\n },\n },\n);\n```\n\nSee [Server SDK Workers](/docs/server-sdk/workers) for tool handling details.\n\n## Stream Events\n\nWorkers emit the same events as interactive agents, 
plus worker-specific events:\n\n| Event | Description |\n| --------------- | ---------------------------------- |\n| `worker-start` | Worker execution begins |\n| `worker-result` | Worker completes (includes output) |\n\nAll standard events (text-delta, tool calls, etc.) are also emitted.\n\n## Calling Workers from Interactive Agents\n\nInteractive agents can call workers in two ways:\n\n1. **Deterministically** - Using the `run-worker` block\n2. **Agentically** - LLM calls worker as a tool\n\n### Worker Declaration\n\nFirst, declare workers in your interactive agent\'s protocol:\n\n```yaml\nworkers:\n generate-title:\n description: Generating conversation title\n display: description\n research-assistant:\n description: Researching topic\n display: stream\n tools:\n search: web-search # Map worker tool \u2192 parent tool\n```\n\n### run-worker Block\n\nCall a worker deterministically from a handler:\n\n```yaml\nhandlers:\n request-human:\n Generate title:\n block: run-worker\n worker: generate-title\n input:\n CONVERSATION_SUMMARY: SUMMARY\n output: CONVERSATION_TITLE\n```\n\n### LLM Tool Invocation\n\nMake workers available to the LLM:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n workers: [generate-title, research-assistant]\n agentic: true\n```\n\nThe LLM can then call workers as tools during conversation.\n\n### Display Modes\n\nControls how worker execution appears to users. The default for workers is `stream`.\n\n| Mode | Behavior |\n| ------------- | ---------------------------------------------------------------------------------------------------------------------------------- |\n| `hidden` | Worker runs silently. No events reach the client - no `UIWorkerPart` is created. |\n| `name` | Shows a running/done indicator with the worker name. No nested content (text, tool calls, reasoning) is forwarded. |\n| `description` | Shows a running/done indicator with the worker description. No nested content is forwarded. 
|\n| `stream` | Full visibility. All nested events are forwarded - text, reasoning, tool calls, sources, files. Worker input is included on start. |\n\n**Progressive input streaming:** When a worker with `display: stream` is invoked agentically (LLM calls it as a tool), the `UIWorkerPart` appears in the UI immediately as the LLM starts generating the worker\'s arguments. The worker input streams progressively into the worker part, the same way text tokens stream into a text part. Once input finishes, worker execution begins and nested content flows into the same worker part. There is no intermediate tool card.\n\n**`name` and `description` modes:** Worker input is stripped from the `worker-start` event (it may contain sensitive data). Only the running/done status and the final `worker-result` are forwarded to the parent stream. Use these for workers where the user only needs to know the worker ran, not what it did internally.\n\n**`hidden` mode:** The worker executes normally but produces no UI presence at all. Use for internal workers that are implementation details.\n\n### Tool Mapping\n\nMap parent tools to worker tools when the worker needs access to your tool handlers:\n\n```yaml\nworkers:\n research-assistant:\n description: Research topics\n tools:\n search: web-search # Worker\'s "search" \u2192 parent\'s "web-search"\n```\n\nWhen the worker calls its `search` tool, your `web-search` handler executes.\n\n## Next Steps\n\n- [Server SDK Workers](/docs/server-sdk/workers) - Executing workers from code\n- [Handlers](/docs/protocol/handlers) - Block reference for steps\n- [Agent Config](/docs/protocol/agent-config) - Model and settings\n',
|
|
654
|
+
content: '\n# Workers\n\nWorkers are agents designed for task-based execution. Unlike interactive agents that handle multi-turn conversations, workers execute a sequence of steps and return an output value.\n\n## When to Use Workers\n\nWorkers are ideal for:\n\n- **Background processing** - Long-running tasks that don\'t need conversation\n- **Composable tasks** - Reusable units of work called by other agents\n- **Pipelines** - Multi-step processing with structured output\n- **Parallel execution** - Tasks that can run independently\n\nUse interactive agents instead when:\n\n- **Conversation is needed** - Multi-turn dialogue with users\n- **Persistence matters** - State should survive across interactions\n- **Session context** - User context needs to persist\n\n## Worker vs Interactive\n\n| Aspect | Interactive | Worker |\n| ---------- | ---------------------------------- | ----------------------------- |\n| Structure | `triggers` + `handlers` + `agent` | `steps` + `output` |\n| LLM Config | Global `agent:` section | Per-thread via `start-thread` |\n| Invocation | Fire a named trigger | Direct execution with input |\n| Session | Persists across triggers (24h TTL) | Single execution |\n| Result | Streaming chat | Streaming + output value |\n\n## Protocol Structure\n\nWorkers use a simpler protocol structure than interactive agents:\n\n```yaml\n# Input schema - provided when worker is executed\ninput:\n TOPIC:\n type: string\n description: Topic to research\n DEPTH:\n type: string\n optional: true\n default: medium\n\n# Variables for intermediate results\nvariables:\n RESEARCH_DATA:\n type: string\n ANALYSIS:\n type: string\n description: Final analysis result\n\n# Tools available to the worker\ntools:\n web-search:\n description: Search the web\n parameters:\n query: { type: string }\n\n# Sequential execution steps\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: research-system\n input: [TOPIC, 
DEPTH]\n tools: [web-search]\n maxSteps: 5\n\n Add research request:\n block: add-message\n thread: research\n role: user\n prompt: research-prompt\n input: [TOPIC, DEPTH]\n\n Generate research:\n block: next-message\n thread: research\n output: RESEARCH_DATA\n\n Start analysis:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n system: analysis-system\n\n Add analysis request:\n block: add-message\n thread: analysis\n role: user\n prompt: analysis-prompt\n input: [RESEARCH_DATA]\n\n Generate analysis:\n block: next-message\n thread: analysis\n output: ANALYSIS\n\n# Output variable - the worker\'s return value\noutput: ANALYSIS\n```\n\n## settings.json\n\nWorkers are identified by the `format` field:\n\n```json\n{\n "slug": "research-assistant",\n "name": "Research Assistant",\n "description": "Researches topics and returns structured analysis",\n "format": "worker"\n}\n```\n\n## Key Differences\n\n### No Global Agent Config\n\nInteractive agents have a global `agent:` section that configures a main thread. Workers don\'t have this - every thread must be explicitly created via `start-thread`:\n\n```yaml\n# Interactive agent: Global config\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [tool-a, tool-b]\n\n# Worker: Each thread configured independently\nsteps:\n Start thread A:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n tools: [tool-a]\n\n Start thread B:\n block: start-thread\n thread: analysis\n model: openai/gpt-4o\n tools: [tool-b]\n```\n\nThis gives workers flexibility to use different models, tools, skills, and settings at different stages.\n\n### Steps Instead of Handlers\n\nWorkers use `steps:` instead of `handlers:`. 
Steps execute sequentially, like handler blocks:\n\n```yaml\n# Interactive: Handlers respond to triggers\nhandlers:\n user-message:\n Add message:\n block: add-message\n # ...\n\n# Worker: Steps execute in sequence\nsteps:\n Add message:\n block: add-message\n # ...\n```\n\n### Output Value\n\nWorkers can return an output value to the caller:\n\n```yaml\nvariables:\n RESULT:\n type: string\n\nsteps:\n # ... steps that populate RESULT ...\n\noutput: RESULT # Return this variable\'s value\n```\n\nThe `output` field references a variable declared in `variables:`. If omitted, the worker completes without returning a value.\n\n## Available Blocks\n\nWorkers support the same blocks as handlers:\n\n| Block | Purpose |\n| ------------------ | -------------------------------------------- |\n| `start-thread` | Create a named thread with LLM configuration |\n| `add-message` | Add a message to a thread |\n| `next-message` | Generate LLM response |\n| `tool-call` | Call a tool deterministically |\n| `set-resource` | Update a resource value |\n| `serialize-thread` | Convert thread to text |\n| `generate-image` | Generate an image from a prompt variable |\n\n### start-thread (Required for LLM)\n\nEvery thread must be initialized with `start-thread` before using `next-message`:\n\n```yaml\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: research-system\n input: [TOPIC]\n tools: [web-search]\n thinking: medium\n maxSteps: 5\n```\n\nAll LLM configuration goes here:\n\n| Field | Description |\n| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |\n| `thread` | Thread name (defaults to block name) |\n| `model` | LLM model to use |\n| `system` | System prompt filename (required) |\n| `input` | Variables for system prompt |\n| `tools` | Tools available in this thread |\n| `skills` | Octavus skills available in this 
thread |\n| `mcpServers` | MCP servers available in this thread |\n| `imageModel` | Image generation model |\n| `webSearch` | Enable built-in web search tool |\n| `thinking` | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |\n| `cache` | Prompt caching mode: `auto` (default), `extended`, or `off` |\n| `temperature` | Model temperature (0-2), `"off"`, or variable reference |\n| `maxSteps` | Maximum tool call cycles (enables agentic if > 1), or variable reference |\n| `maxToolOutputTokens` | Cap a single tool result at this many tokens in the thread\'s model view (head+tail preview + note). Omit to leave tool output unbounded |\n\n## Simple Example\n\nA worker that generates a title from a summary:\n\n```yaml\n# Input\ninput:\n CONVERSATION_SUMMARY:\n type: string\n description: Summary to generate a title for\n\n# Variables\nvariables:\n TITLE:\n type: string\n description: The generated title\n\n# Steps\nsteps:\n Start title thread:\n block: start-thread\n thread: title-gen\n model: anthropic/claude-sonnet-4-5\n system: title-system\n\n Add title request:\n block: add-message\n thread: title-gen\n role: user\n prompt: title-request\n input: [CONVERSATION_SUMMARY]\n\n Generate title:\n block: next-message\n thread: title-gen\n output: TITLE\n display: stream\n\n# Output\noutput: TITLE\n```\n\n## Advanced Example\n\nA worker with multiple threads, tools, and agentic behavior:\n\n```yaml\ninput:\n USER_MESSAGE:\n type: string\n description: The user\'s message to respond to\n USER_ID:\n type: string\n description: User ID for account lookups\n optional: true\n\ntools:\n get-user-account:\n description: Looking up account information\n parameters:\n userId: { type: string }\n create-support-ticket:\n description: Creating a support ticket\n parameters:\n summary: { type: string }\n priority: { type: string }\n\nvariables:\n ASSISTANT_RESPONSE:\n type: string\n CHAT_TRANSCRIPT:\n type: string\n CONVERSATION_SUMMARY:\n type: 
string\n\nsteps:\n # Thread 1: Chat with agentic tool calling\n Start chat thread:\n block: start-thread\n thread: chat\n model: anthropic/claude-sonnet-4-5\n system: chat-system\n input: [USER_ID]\n tools: [get-user-account, create-support-ticket]\n thinking: medium\n maxSteps: 5\n\n Add user message:\n block: add-message\n thread: chat\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n\n Generate response:\n block: next-message\n thread: chat\n output: ASSISTANT_RESPONSE\n display: stream\n\n # Serialize for summary\n Save conversation:\n block: serialize-thread\n thread: chat\n output: CHAT_TRANSCRIPT\n\n # Thread 2: Summary generation\n Start summary thread:\n block: start-thread\n thread: summary\n model: anthropic/claude-sonnet-4-5\n system: summary-system\n thinking: low\n\n Add summary request:\n block: add-message\n thread: summary\n role: user\n prompt: summary-request\n input: [CHAT_TRANSCRIPT]\n\n Generate summary:\n block: next-message\n thread: summary\n output: CONVERSATION_SUMMARY\n display: stream\n\noutput: CONVERSATION_SUMMARY\n```\n\n## MCP Servers\n\nWorkers can declare and use MCP servers, just like interactive agents. Define them in `mcpServers:` and reference them in `start-thread`:\n\n```yaml\nmcpServers:\n sentry:\n description: Error tracking and debugging\n source: remote\n display: name\n browser:\n description: Chrome DevTools browser automation\n source: device\n display: name\n\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: system\n mcpServers: [sentry, browser]\n maxSteps: 10\n```\n\nWorkers resolve their own MCP connections independently - they don\'t inherit MCP servers from a parent interactive agent. 
Remote MCP connections are project-scoped, so a worker in the same project automatically has access to the same OAuth connections.\n\nSee [MCP Servers](/docs/protocol/mcp-servers) for full documentation.\n\n## Skills, Image Generation, and Web Search\n\nWorkers can use Octavus skills, image generation, and web search, configured per-thread via `start-thread`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generate QR codes\n\nsteps:\n Start thread:\n block: start-thread\n thread: worker\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n imageModel: google/gemini-2.5-flash-image\n webSearch: true\n maxSteps: 10\n```\n\nWorkers define their own skills independently - they don\'t inherit skills from a parent interactive agent. Each thread gets its own sandbox scoped to only its listed skills.\n\nSkills with `execution: device` work the same way in workers as in interactive agents - the skill runs on the agent\'s computer. Workers resolve their device execution independently, so a worker can use device skills even if the parent agent does not.\n\nSee [Skills](/docs/protocol/skills) for full documentation.\n\n## Tool Handling\n\nWorkers support the same tool handling as interactive agents:\n\n- **Server tools** - Handled by tool handlers you provide\n- **Client tools** - Pause execution, return tool request to caller\n\n```typescript\n// Non-streaming: get the output directly\nconst { output } = await client.workers.generate(\n agentId,\n { TOPIC: \'AI safety\' },\n {\n tools: {\n \'web-search\': async (args) => await searchWeb(args.query),\n },\n },\n);\n\n// Streaming: observe events in real-time\nconst events = client.workers.execute(\n agentId,\n { TOPIC: \'AI safety\' },\n {\n tools: {\n \'web-search\': async (args) => await searchWeb(args.query),\n },\n },\n);\n```\n\nSee [Server SDK Workers](/docs/server-sdk/workers) for tool handling details.\n\n## Stream Events\n\nWorkers emit the same events as interactive agents, 
plus worker-specific events:\n\n| Event | Description |\n| --------------- | ---------------------------------- |\n| `worker-start` | Worker execution begins |\n| `worker-result` | Worker completes (includes output) |\n\nAll standard events (text-delta, tool calls, etc.) are also emitted.\n\n## Calling Workers from Interactive Agents\n\nInteractive agents can call workers in three ways:\n\n1. **Deterministically** - Using the `run-worker` block\n2. **Agentically** - LLM calls worker as a tool\n3. **Automatically** - Octavus invokes the worker as part of a built-in capability, not the model. Context management\'s `summarizerWorker` (see [Context Management](/docs/protocol/context-management)) works this way: declare it in `workers:` but leave it out of `agent.workers` so the model never sees it as a tool.\n\n### Worker Declaration\n\nFirst, declare workers in your interactive agent\'s protocol:\n\n```yaml\nworkers:\n generate-title:\n description: Generating conversation title\n display: description\n research-assistant:\n description: Researching topic\n display: stream\n tools:\n search: web-search # Map worker tool \u2192 parent tool\n```\n\n### run-worker Block\n\nCall a worker deterministically from a handler:\n\n```yaml\nhandlers:\n request-human:\n Generate title:\n block: run-worker\n worker: generate-title\n input:\n CONVERSATION_SUMMARY: SUMMARY\n output: CONVERSATION_TITLE\n```\n\n### LLM Tool Invocation\n\nMake workers available to the LLM:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n workers: [generate-title, research-assistant]\n agentic: true\n```\n\nThe LLM can then call workers as tools during conversation.\n\n### Display Modes\n\nControls how worker execution appears to users. 
The default for workers is `stream`.\n\n| Mode | Behavior |\n| ------------- | ---------------------------------------------------------------------------------------------------------------------------------- |\n| `hidden` | Worker runs silently. No events reach the client - no `UIWorkerPart` is created. |\n| `name` | Shows a running/done indicator with the worker name. No nested content (text, tool calls, reasoning) is forwarded. |\n| `description` | Shows a running/done indicator with the worker description. No nested content is forwarded. |\n| `stream` | Full visibility. All nested events are forwarded - text, reasoning, tool calls, sources, files. Worker input is included on start. |\n\n**Progressive input streaming:** When a worker with `display: stream` is invoked agentically (LLM calls it as a tool), the `UIWorkerPart` appears in the UI immediately as the LLM starts generating the worker\'s arguments. The worker input streams progressively into the worker part, the same way text tokens stream into a text part. Once input finishes, worker execution begins and nested content flows into the same worker part. There is no intermediate tool card.\n\n**`name` and `description` modes:** Worker input is stripped from the `worker-start` event (it may contain sensitive data). Only the running/done status and the final `worker-result` are forwarded to the parent stream. Use these for workers where the user only needs to know the worker ran, not what it did internally.\n\n**`hidden` mode:** The worker executes normally but produces no UI presence at all. 
Use for internal workers that are implementation details.\n\n### Tool Mapping\n\nMap parent tools to worker tools when the worker needs access to your tool handlers:\n\n```yaml\nworkers:\n research-assistant:\n description: Research topics\n tools:\n search: web-search # Worker\'s "search" \u2192 parent\'s "web-search"\n```\n\nWhen the worker calls its `search` tool, your `web-search` handler executes.\n\n## Next Steps\n\n- [Server SDK Workers](/docs/server-sdk/workers) - Executing workers from code\n- [Handlers](/docs/protocol/handlers) - Block reference for steps\n- [Agent Config](/docs/protocol/agent-config) - Model and settings\n',
excerpt: "Workers Workers are agents designed for task-based execution. Unlike interactive agents that handle multi-turn conversations, workers execute a sequence of steps and return an output value. When to...",
order: 11
},
@@ -673,6 +673,24 @@ See [Streaming Events](/docs/server-sdk/streaming#event-types) for the full list
excerpt: "MCP Servers MCP servers extend your agent with tools from external services. Define them in your protocol, and agents automatically discover and use their tools at runtime. There are three types of...",
order: 13
},
+
{
+
slug: "protocol/context-management",
+
section: "protocol",
+
title: "Context Management",
+
description: "Automatic context-window compaction so long sessions keep running past the model's limit.",
+
content: "\n# Context Management\n\nLong-running sessions accumulate history - messages, tool results, screenshots, file reads. Once that history approaches the model's context window, the provider rejects the request and the session would otherwise fail. Two [agent config](/docs/protocol/agent-config) knobs make the agent robust to this: `maxToolOutputTokens` caps how much any single tool result puts into context, and `contextManagement` automatically compacts older history as it fills up. Together they keep a long task, a long conversation, or one oversized tool output from ending the session.\n\nCompaction and bounding transform only what the **model sees** on each request. The stored conversation is never changed - the complete history is always preserved.\n\n## Configuration\n\n```yaml\nworkers:\n context-summarizer: # the worker that produces the running summary\n description: Summarizes earlier conversation to free up context\n display: description\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n maxToolOutputTokens: 300000 # safety cap on a single tool result (no default)\n # context-summarizer is intentionally NOT listed in agent.workers,\n # so the model never sees it as a callable tool.\n contextManagement:\n summarizerWorker: context-summarizer\n thresholdPercent: 0.8 # proactive trigger (no default; omit = reactive only)\n recentPercent: 0.3 # recent window kept verbatim (no default; omit = no summarization)\n```\n\n`maxToolOutputTokens` is a top-level `agent` field (a sibling of `model` and `system`), because bounding a single tool result is independent of history compaction. Workers set the same cap per thread on their [`start-thread`](/docs/protocol/workers) block. 
`contextManagement` groups the compaction knobs:\n\n| Field | Required | Description |\n| ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------- |\n| `summarizerWorker` | No | Slug of a worker (declared in `workers:`) that produces the running summary. Enables summarization-based compaction. |\n| `thresholdPercent` | No | Fraction of the model's context window at which compaction starts. No default; omit to disable proactive compaction. |\n| `recentPercent` | No | Fraction of the context window kept verbatim as the recent window. No default; omit to disable summarization. |\n| `recentWindow` | No | Deprecated and ignored. Superseded by `recentPercent` (a context-window fraction). |\n\n## How it works\n\n- When `maxToolOutputTokens` is set, every tool result is **bounded** before it enters the model's view: anything over the budget is replaced with a head-and-tail preview plus a note saying how much was omitted and how to fetch the rest. The full result is still preserved in the stored conversation, so nothing is lost - the model just sees a bounded copy and can narrow, page, or search for more.\n- When `thresholdPercent` is set and the prompt crosses that fraction of the context window, the oldest turns are folded into a **running summary** while the original task and the most-recent turns (`recentPercent` of the context window, a token budget) are kept verbatim - so the agent keeps the goal and full fidelity on what it is doing now. 
Both are opt-in with no default: omit them and the agent does no proactive compaction, relying on the automatic recovery below.\n- Compaction is **incremental**: each cycle only summarizes the newly-expired turns and folds them into the existing summary, so cost stays bounded no matter how long the session runs.\n- If the model rejects a request for being too long anyway, the agent recovers automatically (it reduces context and retries) rather than failing the session.\n\n## Bounded tool output\n\nSome tool calls return very large output - a big file read, a full-page extract, a large MCP or skill result. Left unbounded, one such call can blow past the context window in a single step. Set `maxToolOutputTokens` on the agent (or, for a worker, on its `start-thread` block) to cap how much of any single result reaches the model, while the full result stays in the stored conversation and the trace.\n\nThere is no default: bounding only happens when you set `maxToolOutputTokens`, so the runtime never silently truncates output you did not ask it to. When a result is truncated, the model is always told what was omitted and how to retrieve it, so it can decide to narrow the request, paginate, or read a specific range.\n\nBounding is never hidden: each time a tool result first crosses the budget, a `tool-output-bounded` entry is recorded in the session's execution logs with the tool name, the original size, and the cap. The full, untruncated result stays in the corresponding `tool-result` entry, so you can always see both what the model saw and the complete output.\n\n## The summarizer worker\n\n`summarizerWorker` points at a worker you define and ship like any other (see [Workers](/docs/protocol/workers)). 
It takes two inputs - `PREVIOUS_SUMMARY` (the running summary so far) and `CONVERSATION` (the older turns to fold in) - and returns the updated summary.\n\nSummarization is gated on its sizing knobs: a worker only runs if you also set `recentPercent` (the recent window it folds around), and it only runs **proactively** if you also set `thresholdPercent`. Set a worker without `recentPercent` and it never runs - validation warns you about this.\n\nDeclare it in the top-level `workers:` section so it can be resolved, but keep it **out** of `agent.workers`: that list is what the model can call as a tool, and the summarizer is invoked automatically, never chosen by the model.\n\nWithout a `summarizerWorker`, the agent still recovers from a context overflow by reducing older tool results, but it won't produce a summary of earlier turns.\n\n## What users see\n\nBecause the summarizer is a worker, it surfaces like any other worker, following its `display` mode (a subtle `description` indicator by default). Compaction is otherwise seamless - the conversation reads as one continuous thread and the complete history is preserved.\n",
+
excerpt: "Context Management Long-running sessions accumulate history - messages, tool results, screenshots, file reads. Once that history approaches the model's context window, the provider rejects the...",
+
order: 14
+
},
+
{
+
slug: "protocol/fast-mode",
+
section: "protocol",
+
title: "Fast Mode",
+
description: "Run supported Anthropic Opus models at higher output speed for latency-sensitive agents.",
+
content: "\n# Fast Mode\n\nFast mode runs a supported Anthropic Opus model with a faster inference configuration - higher output tokens per second, same weights and behavior - at premium pricing. Enable it with the `speed` field in the [agent config](/docs/protocol/agent-config):\n\n```yaml\nagent:\n model: anthropic/claude-opus-4-8\n speed: fast # fast | standard (default)\n```\n\n| Mode | Behavior | When to use |\n| ---------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------- |\n| `standard` | Default speed and pricing. Used whenever `speed` is omitted. | Most agents. |\n| `fast` | Higher output speed at a premium per-token rate. | Latency-sensitive, interactive agents where faster responses are worth the premium. |\n\nFast mode is orthogonal to thinking - it's a speed/price knob, not an intelligence one, and keeps full reasoning.\n\n## Supported models\n\nFast mode only applies to **Anthropic Opus 4.8, 4.7, and 4.6**. On any other model or provider it is a **no-op**: the request runs at standard speed and price, and never errors. This makes it safe to leave `speed: fast` set when using a dynamic model (resolved from input) that might turn out not to support it.\n\nWhen you set `speed: fast` on a literal model that does not support it, the protocol validator surfaces a non-fatal warning in the dashboard.\n\n## Premium pricing\n\nFast mode applies a per-model multiplier over the model's standard rates, to both input and output across the full context window:\n\n| Model | Fast-mode cost |\n| -------------- | -------------- |\n| Opus 4.8 | ~2x standard |\n| Opus 4.7 / 4.6 | ~6x standard |\n\nPrompt-caching costs continue to apply on top of the fast-mode base rates. 
Billing always reflects the speed a request **actually** ran at: a request that falls back to standard speed (see below) is billed at standard rates, so requesting fast never by itself triggers premium billing.\n\n## Rate limits and fallback\n\nFast mode has a dedicated rate limit, separate from standard Opus limits. When it is exhausted the agent degrades gracefully instead of failing: the request automatically retries at standard speed on the same model, then falls back to your configured [backup model](/docs/protocol/agent-config) if needed, before surfacing an error.\n\nFalling back to standard speed is a prompt-cache miss, since fast and standard requests do not share cached prefixes. The fallback is recorded in the session trace, so it is clear when a request that asked for fast ran at standard (or on the backup model) and why.\n\n## Routing\n\nA supported Opus model can be reached through more than one provider, and fast mode is expressed differently on each - the `speed` field handles the translation:\n\n| Route | Example model | How fast mode is enabled |\n| ----------------- | ------------------------------------------- | ----------------------------------------------------------------- |\n| Direct Anthropic | `anthropic/claude-opus-4-8` | `speed: fast` |\n| Vercel AI Gateway | `vercel/anthropic/claude-opus-4.7` | `speed: fast` |\n| OpenRouter | `openrouter/anthropic/claude-opus-4.8-fast` | Select the dedicated `-fast` model slug (`speed` is ignored here) |\n\n## Passing speed as input\n\nLike `thinking`, `speed` accepts a variable reference so consumers choose it per session:\n\n```yaml\ninput:\n SPEED:\n type: string\n description: Inference speed (fast/standard)\n optional: true\n\nagent:\n model: anthropic/claude-opus-4-8\n speed: SPEED # Resolved from session input; unset -> standard\n system: system\n```\n\nAn unset optional variable resolves to `standard`, so existing agents are never silently upgraded to premium pricing.\n\n## Scope\n\n`speed` 
follows the same scoping as `thinking`: set it at agent scope (the main thread default) or per named thread in a `start-thread` block (see [Thread-Specific Config](/docs/protocol/agent-config)). Because worker agents configure everything through their thread, that is also how a worker enables fast mode. Thread settings take precedence over the agent default.\n",
+
excerpt: "Fast Mode Fast mode runs a supported Anthropic Opus model with a faster inference configuration - higher output tokens per second, same weights and behavior - at premium pricing. Enable it with the ...",
+
order: 15
+
},
{
slug: "api-reference/overview",
section: "api-reference",
@@ -768,7 +786,7 @@ var sections_default = [
section: "server-sdk",
title: "Overview",
description: "Introduction to the Octavus Server SDK for backend integration.",
-
content: "\n# Server SDK Overview\n\nThe `@octavus/server-sdk` package provides a Node.js SDK for integrating Octavus agents into your backend application. It handles session management, streaming, and the tool execution continuation loop.\n\n**Current version:** `
+
content: "\n# Server SDK Overview\n\nThe `@octavus/server-sdk` package provides a Node.js SDK for integrating Octavus agents into your backend application. It handles session management, streaming, and the tool execution continuation loop.\n\n**Current version:** `5.0.0`\n\n## Installation\n\n```bash\nnpm install @octavus/server-sdk\n```\n\nFor agent management (sync, validate), install the CLI as a dev dependency:\n\n```bash\nnpm install --save-dev @octavus/cli\n```\n\n## Basic Usage\n\n```typescript\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst client = new OctavusClient({\n baseUrl: 'https://octavus.ai',\n apiKey: 'your-api-key',\n});\n```\n\n## Key Features\n\n### Agent Management\n\nAgent definitions are managed via the CLI. See the [CLI documentation](/docs/server-sdk/cli) for details.\n\n```bash\n# Sync agent from local files\noctavus sync ./agents/support-chat\n\n# Output: Created: support-chat\n# Agent ID: clxyz123abc456\n```\n\n### Session Management\n\nCreate and manage agent sessions using the agent ID:\n\n```typescript\n// Create a new session (use agent ID from CLI sync)\nconst sessionId = await client.agentSessions.create('clxyz123abc456', {\n COMPANY_NAME: 'Acme Corp',\n PRODUCT_NAME: 'Widget Pro',\n});\n\n// Get UI-ready session messages (for session restore)\nconst session = await client.agentSessions.getMessages(sessionId);\n```\n\n### Tool Handlers\n\nTools run on your server with your data:\n\n```typescript\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n 'get-user-account': async (args) => {\n // Access your database, APIs, etc.\n return await db.users.findById(args.userId);\n },\n },\n});\n```\n\n### Streaming\n\nAll responses stream in real-time:\n\n```typescript\nimport { toSSEStream } from '@octavus/server-sdk';\n\n// execute() returns an async generator of events\nconst events = session.execute({\n type: 'trigger',\n triggerName: 'user-message',\n input: { USER_MESSAGE: 'Hello!' 
},\n});\n\n// Convert to SSE stream for HTTP responses\nreturn new Response(toSSEStream(events), {\n headers: { 'Content-Type': 'text/event-stream' },\n});\n```\n\n### Computer Capabilities\n\nGive agents access to browser, filesystem, and shell via MCP:\n\n```typescript\nimport { Computer } from '@octavus/computer';\n\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', ['--browser-url=...']),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [dir]),\n shell: Computer.shell({ cwd: dir, mode: 'unrestricted' }),\n },\n});\n\nawait computer.start();\n\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n 'set-chat-title': async (args) => ({ title: args.title }),\n },\n});\n\nsession.setDynamicTools(computer);\n```\n\n### Workers\n\nExecute worker agents for task-based processing:\n\n```typescript\n// Non-streaming: get the output directly\nconst { output } = await client.workers.generate(agentId, {\n TOPIC: 'AI safety',\n});\n\n// Streaming: observe events in real-time\nfor await (const event of client.workers.execute(agentId, input)) {\n // Handle stream events\n}\n```\n\n## API Reference\n\n### OctavusClient\n\nThe main entry point for interacting with Octavus.\n\n```typescript\ninterface OctavusClientConfig {\n baseUrl: string; // Octavus API URL\n apiKey?: string; // Your API key\n traceModelRequests?: boolean; // Enable model request tracing (default: false)\n maxRetries?: number; // Retries for transient network failures during streaming (default: 2, set to 0 to disable)\n}\n\nclass OctavusClient {\n readonly agents: AgentsApi;\n readonly agentSessions: AgentSessionsApi;\n readonly workers: WorkersApi;\n readonly files: FilesApi;\n\n constructor(config: OctavusClientConfig);\n}\n```\n\n### AgentSessionsApi\n\nManages agent sessions.\n\n```typescript\nclass AgentSessionsApi {\n // Create a new session\n async create(agentId: string, input?: Record<string, unknown>): 
Promise<string>;\n\n // Get full session state (for debugging/internal use)\n async get(sessionId: string): Promise<SessionState>;\n\n // Get UI-ready messages (for client display)\n async getMessages(sessionId: string): Promise<UISessionState>;\n\n // Attach to a session for triggering\n attach(sessionId: string, options?: SessionAttachOptions): AgentSession;\n}\n\n// Full session state (internal format)\ninterface SessionState {\n id: string;\n agentId: string;\n input: Record<string, unknown>;\n variables: Record<string, unknown>;\n resources: Record<string, unknown>;\n messages: ChatMessage[]; // Internal message format\n createdAt: string;\n updatedAt: string;\n}\n\n// UI-ready session state\ninterface UISessionState {\n sessionId: string;\n agentId: string;\n messages: UIMessage[]; // UI-ready messages for frontend\n}\n```\n\n### AgentSession\n\nHandles request execution and streaming for a specific session.\n\n```typescript\nclass AgentSession {\n // Execute a request and stream parsed events\n execute(request: SessionRequest, options?: TriggerOptions): AsyncGenerator<StreamEvent>;\n\n // Get the session ID\n getSessionId(): string;\n\n // Register dynamic tools (e.g., pass a Computer or explicit DynamicTool[])\n setDynamicTools(source: ToolProvider | DynamicTool[]): void;\n}\n\ntype SessionRequest = TriggerRequest | ContinueRequest;\n\ninterface TriggerRequest {\n type: 'trigger';\n triggerName: string;\n input?: Record<string, unknown>;\n}\n\ninterface ContinueRequest {\n type: 'continue';\n executionId: string;\n toolResults: ToolResult[];\n}\n\n// Helper to convert events to SSE stream\nfunction toSSEStream(events: AsyncIterable<StreamEvent>): ReadableStream<Uint8Array>;\n```\n\n### FilesApi\n\nHandles file uploads for sessions.\n\n```typescript\nclass FilesApi {\n // Get presigned URLs for file uploads\n async getUploadUrls(sessionId: string, files: FileUploadRequest[]): Promise<UploadUrlsResponse>;\n}\n\ninterface FileUploadRequest {\n filename: 
string;\n mediaType: string;\n size: number;\n}\n\ninterface UploadUrlsResponse {\n files: {\n id: string; // File ID for references\n uploadUrl: string; // PUT to this URL\n downloadUrl: string; // GET URL after upload\n }[];\n}\n```\n\nThe client uploads files directly to S3 using the presigned upload URL. See [File Uploads](/docs/client-sdk/file-uploads) for the full integration pattern.\n\n## Next Steps\n\n- [Sessions](/docs/server-sdk/sessions) - Deep dive into session management\n- [Tools](/docs/server-sdk/tools) - Implementing tool handlers\n- [Streaming](/docs/server-sdk/streaming) - Understanding stream events\n- [Workers](/docs/server-sdk/workers) - Executing worker agents\n- [Debugging](/docs/server-sdk/debugging) - Model request tracing and debugging\n- [Computer](/docs/server-sdk/computer) - Browser, filesystem, and shell via MCP\n",
excerpt: "Server SDK Overview The package provides a Node.js SDK for integrating Octavus agents into your backend application. It handles session management, streaming, and the tool execution continuation...",
order: 1
},
@@ -777,7 +795,7 @@ var sections_default = [
section: "server-sdk",
title: "Sessions",
description: "Managing agent sessions with the Server SDK.",
-
content: "\n# Sessions\n\nSessions represent conversations with an agent. They store conversation history, track resources and variables, and enable stateful interactions.\n\n## Creating Sessions\n\nCreate a session by specifying the agent ID and initial input variables:\n\n```typescript\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n});\n\n// Create a session with the support-chat agent\nconst sessionId = await client.agentSessions.create('support-chat', {\n COMPANY_NAME: 'Acme Corp',\n PRODUCT_NAME: 'Widget Pro',\n USER_ID: 'user-123', // Optional inputs\n});\n\nconsole.log('Session created:', sessionId);\n```\n\n## Getting Session Messages\n\nTo restore a conversation on page load, use `getMessages()` to retrieve UI-ready messages:\n\n```typescript\nconst session = await client.agentSessions.getMessages(sessionId);\n\nconsole.log({\n sessionId: session.sessionId,\n agentId: session.agentId,\n messages: session.messages.length, // UIMessage[] ready for frontend\n});\n```\n\nThe returned messages can be passed directly to the client SDK's `initialMessages` option.\n\n### UISessionState Interface\n\n```typescript\ninterface UISessionState {\n sessionId: string;\n agentId: string;\n messages: UIMessage[]; // UI-ready conversation history\n}\n```\n\n## Full Session State (Debug)\n\nFor debugging or internal use, you can retrieve the complete session state including all variables and internal message format:\n\n```typescript\nconst state = await client.agentSessions.get(sessionId);\n\nconsole.log({\n id: state.id,\n agentId: state.agentId,\n messages: state.messages.length, // ChatMessage[] (internal format)\n resources: state.resources,\n variables: state.variables,\n createdAt: state.createdAt,\n updatedAt: state.updatedAt,\n});\n```\n\n> **Note**: Use `getMessages()` for client-facing code. 
The `get()` method returns internal message format that includes hidden content not intended for end users.\n\n## Getting Execution Logs\n\n`getLogs()` returns the chronological execution trace for a session - triggers, messages, tool calls, LLM responses, errors, and other events emitted while the agent ran. Useful for debugging, observability, and building custom timeline views.\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status === 'expired') {\n console.log('Session expired:', result.sessionId);\n} else {\n for (const entry of result.entries) {\n console.log(entry.type, entry.timestamp);\n }\n}\n```\n\nEach entry is a typed variant of `ExecutionLogEntry` (a discriminated union) so consumers can narrow on `entry.type`:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status !== 'expired') {\n const toolCalls = result.entries.filter((e) => e.type === 'tool-call');\n for (const call of toolCalls) {\n // call.toolName, call.toolArguments are typed without optional chaining\n console.log(call.toolName, call.toolArguments);\n }\n}\n```\n\n### Excluding Model Request Payloads\n\nModel-request entries include the full provider request body and can be large. Pass `excludeModelRequests: true` to skip them:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId, {\n excludeModelRequests: true,\n});\n```\n\n### Truncation\n\nResponses are capped at 1000 entries (most recent). 
When the log exceeds that cap, the response includes `total` and `truncated` so consumers can detect this:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status !== 'expired' && result.truncated) {\n console.warn(`Showing latest 1000 of ${result.total} entries`);\n}\n```\n\n### Response Types\n\n| Status | Type | Description |\n| --------- | --------------------- | -------------------------------------------------------------------------------------------- |\n| `active` | `ExecutionLogsResult` | `{ sessionId, entries, total?, truncated? }`. `total` and `truncated` are present when known |\n| `expired` | `ExpiredSessionState` | `{ sessionId, agentId, status: 'expired', createdAt }` |\n\n> **Forward-compatible types**: `ExecutionLogEntry` may gain new variants over time. Include a `default` case when switching on `entry.type` so unknown variants are handled gracefully.\n\n## Attaching to Sessions\n\nTo trigger actions on a session, you need to attach to it first:\n\n```typescript\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n // Tool handlers (see Tools documentation)\n },\n resources: [\n // Resource watchers (optional)\n ],\n});\n```\n\n### Attach Options\n\n| Option | Type | Description |\n| ----------------------- | --------------------------------- | ------------------------------------------------------------------------------- |\n| `tools` | `ToolHandlers` | Server-side tool handler functions |\n| `resources` | `Resource[]` | Resource watchers for real-time updates |\n| `onToolResults` | `(results: ToolResult[]) => void` | Callback invoked after server-side tool results are produced |\n| `rejectClientToolCalls` | `boolean` | If `true`, reject tool calls that have no server handler (no client forwarding) |\n\nFor MCP tool integration (browser, filesystem, shell via `@octavus/computer`), register dynamic tools after attaching with `session.setDynamicTools()`. 
See [Computer](/docs/server-sdk/computer) for details.\n\n## Executing Requests\n\nOnce attached, execute requests on the session using `execute()`:\n\n```typescript\nimport { toSSEStream } from '@octavus/server-sdk';\n\n// execute() handles both triggers and client tool continuations\nconst events = session.execute(\n { type: 'trigger', triggerName: 'user-message', input: { USER_MESSAGE: 'Hello!' } },\n { signal: request.signal },\n);\n\n// Convert to SSE stream for HTTP responses\nreturn new Response(toSSEStream(events), {\n headers: { 'Content-Type': 'text/event-stream' },\n});\n```\n\n### Request Types\n\nThe `execute()` method accepts a discriminated union:\n\n```typescript\ntype SessionRequest = TriggerRequest | ContinueRequest;\n\n// Start a new conversation turn\ninterface TriggerRequest {\n type: 'trigger';\n triggerName: string;\n input?: Record<string, unknown>;\n rollbackAfterMessageId?: string | null; // For retry: truncate messages after this ID\n}\n\n// Continue after client-side tool handling\ninterface ContinueRequest {\n type: 'continue';\n executionId: string;\n toolResults: ToolResult[];\n}\n```\n\nThis makes it easy to pass requests through from the client:\n\n```typescript\n// Simple passthrough from HTTP request body\nexport async function POST(request: Request) {\n const body = await request.json();\n const { sessionId, ...payload } = body;\n\n const session = client.agentSessions.attach(sessionId, {\n tools: {\n /* ... */\n },\n });\n const events = session.execute(payload, { signal: request.signal });\n\n return new Response(toSSEStream(events));\n}\n```\n\n### Stop Support\n\nPass an abort signal to allow clients to stop generation:\n\n```typescript\nconst events = session.execute(request, {\n signal: request.signal, // Forward the client's abort signal\n});\n```\n\nWhen the client aborts the request, the signal propagates through to the LLM provider, stopping generation immediately. 
Any partial content is preserved.\n\n## WebSocket Handling\n\nFor WebSocket integrations, use `handleSocketMessage()` which manages abort controller lifecycle internally:\n\n```typescript\nimport type { SocketMessage } from '@octavus/server-sdk';\n\n// In your socket handler\nconn.on('data', async (rawData: string) => {\n const msg = JSON.parse(rawData);\n\n if (msg.type === 'trigger' || msg.type === 'continue' || msg.type === 'stop') {\n await session.handleSocketMessage(msg as SocketMessage, {\n onEvent: (event) => conn.write(JSON.stringify(event)),\n onFinish: async () => {\n // Fetch and persist messages to your database for restoration\n },\n });\n }\n});\n```\n\nThe `handleSocketMessage()` method:\n\n- Handles `trigger`, `continue`, and `stop` messages\n- Automatically aborts previous requests when a new one arrives\n- Streams events via the `onEvent` callback\n- Calls `onFinish` after streaming completes (not called if aborted)\n\nSee [Socket Chat Example](/docs/examples/socket-chat) for a complete implementation.\n\n## Session Lifecycle\n\n```mermaid\nflowchart TD\n A[1. CREATE] --> B[2. ATTACH]\n B --> C[3. TRIGGER]\n C --> C\n C --> D[4. RETRIEVE]\n D --> C\n C --> E[5. EXPIRE]\n C --> G[5b. CLEAR]\n G --> F\n E --> F{6. 
RESTORE?}\n F -->|Yes| C\n F -->|No| A\n\n A -.- A1[\"`**client.agentSessions.create()**\n Returns sessionId\n Initializes state`\"]\n\n B -.- B1[\"`**client.agentSessions.attach()**\n Configure tool handlers\n Configure resource watchers`\"]\n\n C -.- C1[\"`**session.execute()**\n Execute request\n Stream events\n Update state`\"]\n\n D -.- D1[\"`**client.agentSessions.getMessages()**\n Get UI-ready messages\n Check session status`\"]\n\n E -.- E1[\"`Sessions expire after\n 24 hours (configurable)`\"]\n\n G -.- G1[\"`**client.agentSessions.clear()**\n Programmatically clear state\n Session becomes expired`\"]\n\n F -.- F1[\"`**client.agentSessions.restore()**\n Restore from stored messages\n Or create new session`\"]\n```\n\n## Session Expiration\n\nSessions expire after a period of inactivity (default: 24 hours). When you call `getMessages()` or `get()`, the response includes a `status` field:\n\n```typescript\nconst result = await client.agentSessions.getMessages(sessionId);\n\nif (result.status === 'expired') {\n // Session has expired - restore or create new\n console.log('Session expired:', result.sessionId);\n} else {\n // Session is active\n console.log('Messages:', result.messages.length);\n}\n```\n\n### Response Types\n\n| Status | Type | Description |\n| --------- | --------------------- | ------------------------------------------------------------- |\n| `active` | `UISessionState` | Session is active, includes `messages` array |\n| `expired` | `ExpiredSessionState` | Session expired, includes `sessionId`, `agentId`, `createdAt` |\n\n## Persisting Chat History\n\nTo enable session restoration, store the chat messages in your own database after each interaction:\n\n```typescript\n// After each trigger completes, save messages\nconst result = await client.agentSessions.getMessages(sessionId);\n\nif (result.status === 'active') {\n // Store in your database\n await db.chats.update({\n where: { id: chatId },\n data: {\n sessionId: result.sessionId,\n 
messages: result.messages, // Store UIMessage[] as JSON\n },\n });\n}\n```\n\n> **Best Practice**: Store the full `UIMessage[]` array. This preserves all message parts (text, tool calls, files, etc.) needed for accurate restoration.\n\n## Restoring Sessions\n\nWhen a user returns to your app:\n\n```typescript\n// 1. Load stored data from your database\nconst chat = await db.chats.findUnique({ where: { id: chatId } });\n\n// 2. Check if session is still active\nconst result = await client.agentSessions.getMessages(chat.sessionId);\n\nif (result.status === 'active') {\n // Session is active - use it directly\n return {\n sessionId: result.sessionId,\n messages: result.messages,\n };\n}\n\n// 3. Session expired - restore from stored messages\nif (chat.messages && chat.messages.length > 0) {\n const restored = await client.agentSessions.restore(\n chat.sessionId,\n chat.messages,\n { COMPANY_NAME: 'Acme Corp' }, // Optional: same input as create()\n );\n\n if (restored.restored) {\n // Session restored successfully\n return {\n sessionId: restored.sessionId,\n messages: chat.messages,\n };\n }\n}\n\n// 4. 
Cannot restore - create new session\nconst newSessionId = await client.agentSessions.create('support-chat', {\n COMPANY_NAME: 'Acme Corp',\n});\n\nreturn {\n sessionId: newSessionId,\n messages: [],\n};\n```\n\n### Restore Response\n\n```typescript\ninterface RestoreSessionResult {\n sessionId: string;\n restored: boolean; // true if restored, false if session was already active\n}\n```\n\n## Complete Example\n\nHere's a complete session management flow:\n\n```typescript\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n});\n\nasync function getOrCreateSession(chatId: string, agentId: string, input: Record<string, unknown>) {\n // Load existing chat data\n const chat = await db.chats.findUnique({ where: { id: chatId } });\n\n if (chat?.sessionId) {\n // Check session status\n const result = await client.agentSessions.getMessages(chat.sessionId);\n\n if (result.status === 'active') {\n return { sessionId: result.sessionId, messages: result.messages };\n }\n\n // Try to restore expired session\n if (chat.messages?.length > 0) {\n const restored = await client.agentSessions.restore(chat.sessionId, chat.messages, input);\n if (restored.restored) {\n return { sessionId: restored.sessionId, messages: chat.messages };\n }\n }\n }\n\n // Create new session\n const sessionId = await client.agentSessions.create(agentId, input);\n\n // Save to database\n await db.chats.upsert({\n where: { id: chatId },\n create: { id: chatId, sessionId, messages: [] },\n update: { sessionId, messages: [] },\n });\n\n return { sessionId, messages: [] };\n}\n```\n\n## Clearing Sessions\n\nTo programmatically clear a session's state (e.g., for testing reset/restore flows), use `clear()`:\n\n```typescript\nconst result = await client.agentSessions.clear(sessionId);\nconsole.log(result.cleared); // true\n```\n\nAfter clearing, the session transitions to `expired` status. 
You can then restore it with `restore()` or create a new session.\n\n```typescript\ninterface ClearSessionResult {\n sessionId: string;\n cleared: boolean;\n}\n```\n\nThis is idempotent - calling `clear()` on an already expired session succeeds without error.\n\n## Error Handling\n\n```typescript\nimport { ApiError } from '@octavus/server-sdk';\n\ntry {\n const session = await client.agentSessions.getMessages(sessionId);\n} catch (error) {\n if (error instanceof ApiError) {\n if (error.status === 404) {\n // Session not found or expired\n console.log('Session expired, create a new one');\n } else {\n console.error('API Error:', error.message);\n }\n }\n throw error;\n}\n```\n",
|
|
798
|
+
content: "\n# Sessions\n\nSessions represent conversations with an agent. They store conversation history, track resources and variables, and enable stateful interactions.\n\n## Creating Sessions\n\nCreate a session by specifying the agent ID and initial input variables:\n\n```typescript\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n});\n\n// Create a session with the support-chat agent\nconst sessionId = await client.agentSessions.create('support-chat', {\n COMPANY_NAME: 'Acme Corp',\n PRODUCT_NAME: 'Widget Pro',\n USER_ID: 'user-123', // Optional inputs\n});\n\nconsole.log('Session created:', sessionId);\n```\n\n## Getting Session Messages\n\nTo restore a conversation on page load, use `getMessages()` to retrieve UI-ready messages:\n\n```typescript\nconst session = await client.agentSessions.getMessages(sessionId);\n\nconsole.log({\n sessionId: session.sessionId,\n agentId: session.agentId,\n messages: session.messages.length, // UIMessage[] ready for frontend\n});\n```\n\nThe returned messages can be passed directly to the client SDK's `initialMessages` option.\n\n### UISessionState Interface\n\n```typescript\ninterface UISessionState {\n sessionId: string;\n agentId: string;\n messages: UIMessage[]; // UI-ready conversation history\n}\n```\n\n## Full Session State (Debug)\n\nFor debugging or internal use, you can retrieve the complete session state including all variables and internal message format:\n\n```typescript\nconst state = await client.agentSessions.get(sessionId);\n\nconsole.log({\n id: state.id,\n agentId: state.agentId,\n messages: state.messages.length, // ChatMessage[] (internal format)\n resources: state.resources,\n variables: state.variables,\n createdAt: state.createdAt,\n updatedAt: state.updatedAt,\n});\n```\n\n> **Note**: Use `getMessages()` for client-facing code. 
The `get()` method returns internal message format that includes hidden content not intended for end users.\n\n## Getting Execution Logs\n\n`getLogs()` returns the chronological execution trace for a session - triggers, messages, tool calls, LLM responses, errors, and other events emitted while the agent ran. Useful for debugging, observability, and building custom timeline views.\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status === 'expired') {\n console.log('Session expired:', result.sessionId);\n} else {\n for (const entry of result.entries) {\n console.log(entry.type, entry.timestamp);\n }\n}\n```\n\nEach entry is a typed variant of `ExecutionLogEntry` (a discriminated union) so consumers can narrow on `entry.type`:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status !== 'expired') {\n const toolCalls = result.entries.filter((e) => e.type === 'tool-call');\n for (const call of toolCalls) {\n // call.toolName, call.toolArguments are typed without optional chaining\n console.log(call.toolName, call.toolArguments);\n }\n}\n```\n\n### Excluding Model Request Payloads\n\nModel-request entries include the full provider request body and can be large. Pass `excludeModelRequests: true` to skip them:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId, {\n excludeModelRequests: true,\n});\n```\n\n### Truncation\n\nResponses are capped at 1000 entries (most recent). 
When the log exceeds that cap, the response includes `total` and `truncated` so consumers can detect this:\n\n```typescript\nconst result = await client.agentSessions.getLogs(sessionId);\n\nif (result.status !== 'expired' && result.truncated) {\n console.warn(`Showing latest 1000 of ${result.total} entries`);\n}\n```\n\n### Response Types\n\n| Status | Type | Description |\n| --------- | --------------------- | -------------------------------------------------------------------------------------------- |\n| `active` | `ExecutionLogsResult` | `{ sessionId, entries, total?, truncated? }`. `total` and `truncated` are present when known |\n| `expired` | `ExpiredSessionState` | `{ sessionId, agentId, status: 'expired', createdAt }` |\n\n> **Forward-compatible types**: `ExecutionLogEntry` may gain new variants over time. Include a `default` case when switching on `entry.type` so unknown variants are handled gracefully.\n\n## Attaching to Sessions\n\nTo trigger actions on a session, you need to attach to it first:\n\n```typescript\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n // Tool handlers (see Tools documentation)\n },\n resources: [\n // Resource watchers (optional)\n ],\n});\n```\n\n### Attach Options\n\n| Option | Type | Description |\n| ----------------------- | --------------------------------- | ------------------------------------------------------------------------------- |\n| `tools` | `ToolHandlers` | Server-side tool handler functions |\n| `resources` | `Resource[]` | Resource watchers for real-time updates |\n| `onToolResults` | `(results: ToolResult[]) => void` | Callback invoked after server-side tool results are produced |\n| `rejectClientToolCalls` | `boolean` | If `true`, reject tool calls that have no server handler (no client forwarding) |\n\nFor MCP tool integration (browser, filesystem, shell via `@octavus/computer`), register dynamic tools after attaching with `session.setDynamicTools()`. 
See [Computer](/docs/server-sdk/computer) for details.\n\n## Executing Requests\n\nOnce attached, execute requests on the session using `execute()`:\n\n```typescript\nimport { toSSEStream } from '@octavus/server-sdk';\n\n// execute() handles both triggers and client tool continuations\nconst events = session.execute(\n { type: 'trigger', triggerName: 'user-message', input: { USER_MESSAGE: 'Hello!' } },\n { signal: request.signal },\n);\n\n// Convert to SSE stream for HTTP responses\nreturn new Response(toSSEStream(events), {\n headers: { 'Content-Type': 'text/event-stream' },\n});\n```\n\n### Request Types\n\nThe `execute()` method accepts a discriminated union:\n\n```typescript\ntype SessionRequest = TriggerRequest | ContinueRequest;\n\n// Start a new conversation turn\ninterface TriggerRequest {\n type: 'trigger';\n triggerName: string;\n input?: Record<string, unknown>;\n rollbackAfterMessageId?: string | null; // For retry: truncate messages after this ID\n sender?: UIMessageSender; // Author of this turn, for multi-user attribution\n}\n\n// Continue after client-side tool handling\ninterface ContinueRequest {\n type: 'continue';\n executionId: string;\n toolResults: ToolResult[];\n}\n```\n\nThis makes it easy to pass requests through from the client:\n\n```typescript\n// Simple passthrough from HTTP request body\nexport async function POST(request: Request) {\n const body = await request.json();\n const { sessionId, ...payload } = body;\n\n const session = client.agentSessions.attach(sessionId, {\n tools: {\n /* ... */\n },\n });\n const events = session.execute(payload, { signal: request.signal });\n\n return new Response(toSSEStream(events));\n}\n```\n\n### Attributing Messages in Multi-User Chats\n\nWhen several people share one conversation, set `sender` on the trigger so each user message is attributed to its author. 
Set it **server-side from your authenticated user** - never trust a client-supplied identity:\n\n```typescript\ninterface UIMessageSender {\n id?: string;\n name?: string;\n image?: string; // Avatar URL\n}\n\nexport async function POST(request: Request) {\n const user = await authenticate(request); // your auth\n const { sessionId, ...payload } = await request.json();\n\n const session = client.agentSessions.attach(sessionId, {\n tools: {\n /* ... */\n },\n });\n const events = session.execute(\n {\n ...payload,\n sender: { id: user.id, name: user.name, image: user.avatarUrl },\n },\n { signal: request.signal },\n );\n\n return new Response(toSSEStream(events));\n}\n```\n\nThe runtime stamps the sender onto the user message it creates, so it comes back on `UIMessage.sender` from `getMessages()` and survives restore. `sender` is turn metadata - it is never added to your protocol's trigger `input`, and agent-initiated turns (no `sender`) stay unattributed. For instant optimistic display in the browser, also pass it on the client `send()` (see [Client SDK Messages](/docs/client-sdk/messages)).\n\n### Stop Support\n\nPass an abort signal to allow clients to stop generation:\n\n```typescript\nconst events = session.execute(request, {\n signal: request.signal, // Forward the client's abort signal\n});\n```\n\nWhen the client aborts the request, the signal propagates through to the LLM provider, stopping generation immediately. 
Any partial content is preserved.\n\n## WebSocket Handling\n\nFor WebSocket integrations, use `handleSocketMessage()` which manages abort controller lifecycle internally:\n\n```typescript\nimport type { SocketMessage } from '@octavus/server-sdk';\n\n// In your socket handler\nconn.on('data', async (rawData: string) => {\n const msg = JSON.parse(rawData);\n\n if (msg.type === 'trigger' || msg.type === 'continue' || msg.type === 'stop') {\n await session.handleSocketMessage(msg as SocketMessage, {\n onEvent: (event) => conn.write(JSON.stringify(event)),\n onFinish: async () => {\n // Fetch and persist messages to your database for restoration\n },\n });\n }\n});\n```\n\nThe `handleSocketMessage()` method:\n\n- Handles `trigger`, `continue`, and `stop` messages\n- Automatically aborts previous requests when a new one arrives\n- Streams events via the `onEvent` callback\n- Calls `onFinish` after streaming completes (not called if aborted)\n\nSee [Socket Chat Example](/docs/examples/socket-chat) for a complete implementation.\n\n## Session Lifecycle\n\n```mermaid\nflowchart TD\n A[1. CREATE] --> B[2. ATTACH]\n B --> C[3. TRIGGER]\n C --> C\n C --> D[4. RETRIEVE]\n D --> C\n C --> E[5. EXPIRE]\n C --> G[5b. CLEAR]\n G --> F\n E --> F{6. 
RESTORE?}\n F -->|Yes| C\n F -->|No| A\n\n A -.- A1[\"`**client.agentSessions.create()**\n Returns sessionId\n Initializes state`\"]\n\n B -.- B1[\"`**client.agentSessions.attach()**\n Configure tool handlers\n Configure resource watchers`\"]\n\n C -.- C1[\"`**session.execute()**\n Execute request\n Stream events\n Update state`\"]\n\n D -.- D1[\"`**client.agentSessions.getMessages()**\n Get UI-ready messages\n Check session status`\"]\n\n E -.- E1[\"`Sessions expire after\n 24 hours (configurable)`\"]\n\n G -.- G1[\"`**client.agentSessions.clear()**\n Programmatically clear state\n Session becomes expired`\"]\n\n F -.- F1[\"`**client.agentSessions.restore()**\n Restore from stored messages\n Or create new session`\"]\n```\n\n## Session Expiration\n\nSessions expire after a period of inactivity (default: 24 hours). When you call `getMessages()` or `get()`, the response includes a `status` field:\n\n```typescript\nconst result = await client.agentSessions.getMessages(sessionId);\n\nif (result.status === 'expired') {\n // Session has expired - restore or create new\n console.log('Session expired:', result.sessionId);\n} else {\n // Session is active\n console.log('Messages:', result.messages.length);\n}\n```\n\n### Response Types\n\n| Status | Type | Description |\n| --------- | --------------------- | ------------------------------------------------------------- |\n| `active` | `UISessionState` | Session is active, includes `messages` array |\n| `expired` | `ExpiredSessionState` | Session expired, includes `sessionId`, `agentId`, `createdAt` |\n\n## Persisting Chat History\n\nTo enable session restoration, store the chat messages in your own database after each interaction:\n\n```typescript\n// After each trigger completes, save messages\nconst result = await client.agentSessions.getMessages(sessionId);\n\nif (result.status === 'active') {\n // Store in your database\n await db.chats.update({\n where: { id: chatId },\n data: {\n sessionId: result.sessionId,\n 
messages: result.messages, // Store UIMessage[] as JSON\n },\n });\n}\n```\n\n> **Best Practice**: Store the full `UIMessage[]` array. This preserves all message parts (text, tool calls, files, etc.) needed for accurate restoration.\n\n## Restoring Sessions\n\nWhen a user returns to your app:\n\n```typescript\n// 1. Load stored data from your database\nconst chat = await db.chats.findUnique({ where: { id: chatId } });\n\n// 2. Check if session is still active\nconst result = await client.agentSessions.getMessages(chat.sessionId);\n\nif (result.status === 'active') {\n // Session is active - use it directly\n return {\n sessionId: result.sessionId,\n messages: result.messages,\n };\n}\n\n// 3. Session expired - restore from stored messages\nif (chat.messages && chat.messages.length > 0) {\n const restored = await client.agentSessions.restore(\n chat.sessionId,\n chat.messages,\n { COMPANY_NAME: 'Acme Corp' }, // Optional: same input as create()\n );\n\n if (restored.restored) {\n // Session restored successfully\n return {\n sessionId: restored.sessionId,\n messages: chat.messages,\n };\n }\n}\n\n// 4. 
Cannot restore - create new session\nconst newSessionId = await client.agentSessions.create('support-chat', {\n COMPANY_NAME: 'Acme Corp',\n});\n\nreturn {\n sessionId: newSessionId,\n messages: [],\n};\n```\n\n### Restore Response\n\n```typescript\ninterface RestoreSessionResult {\n sessionId: string;\n restored: boolean; // true if restored, false if session was already active\n}\n```\n\n## Complete Example\n\nHere's a complete session management flow:\n\n```typescript\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n});\n\nasync function getOrCreateSession(chatId: string, agentId: string, input: Record<string, unknown>) {\n // Load existing chat data\n const chat = await db.chats.findUnique({ where: { id: chatId } });\n\n if (chat?.sessionId) {\n // Check session status\n const result = await client.agentSessions.getMessages(chat.sessionId);\n\n if (result.status === 'active') {\n return { sessionId: result.sessionId, messages: result.messages };\n }\n\n // Try to restore expired session\n if (chat.messages?.length > 0) {\n const restored = await client.agentSessions.restore(chat.sessionId, chat.messages, input);\n if (restored.restored) {\n return { sessionId: restored.sessionId, messages: chat.messages };\n }\n }\n }\n\n // Create new session\n const sessionId = await client.agentSessions.create(agentId, input);\n\n // Save to database\n await db.chats.upsert({\n where: { id: chatId },\n create: { id: chatId, sessionId, messages: [] },\n update: { sessionId, messages: [] },\n });\n\n return { sessionId, messages: [] };\n}\n```\n\n## Clearing Sessions\n\nTo programmatically clear a session's state (e.g., for testing reset/restore flows), use `clear()`:\n\n```typescript\nconst result = await client.agentSessions.clear(sessionId);\nconsole.log(result.cleared); // true\n```\n\nAfter clearing, the session transitions to `expired` status. 
You can then restore it with `restore()` or create a new session.\n\n```typescript\ninterface ClearSessionResult {\n sessionId: string;\n cleared: boolean;\n}\n```\n\nThis is idempotent - calling `clear()` on an already expired session succeeds without error.\n\n## Error Handling\n\n```typescript\nimport { ApiError } from '@octavus/server-sdk';\n\ntry {\n const session = await client.agentSessions.getMessages(sessionId);\n} catch (error) {\n if (error instanceof ApiError) {\n if (error.status === 404) {\n // Session not found or expired\n console.log('Session expired, create a new one');\n } else {\n console.error('API Error:', error.message);\n }\n }\n throw error;\n}\n```\n",
|
|
781
799
|
excerpt: "Sessions Sessions represent conversations with an agent. They store conversation history, track resources and variables, and enable stateful interactions. Creating Sessions Create a session by...",
|
|
782
800
|
order: 2
|
|
783
801
|
},
|
|
@@ -804,7 +822,7 @@ var sections_default = [
|
|
|
804
822
|
section: "server-sdk",
|
|
805
823
|
title: "CLI",
|
|
806
824
|
description: "Command-line interface for validating and syncing agent definitions.",
|
|
807
|
-
content: '\n# Octavus CLI\n\nThe `@octavus/cli` package provides a command-line interface for validating and syncing agent definitions from your local filesystem to the Octavus platform.\n\n**Current version:** `
|
|
825
|
+
content: '\n# Octavus CLI\n\nThe `@octavus/cli` package provides a command-line interface for validating and syncing agent definitions from your local filesystem to the Octavus platform.\n\n**Current version:** `5.0.0`\n\n## Installation\n\n```bash\nnpm install --save-dev @octavus/cli\n```\n\n## Configuration\n\nThe CLI requires an API key with the **Agents** permission.\n\n### Environment Variables\n\n| Variable | Description |\n| --------------------- | ---------------------------------------------- |\n| `OCTAVUS_CLI_API_KEY` | API key with "Agents" permission (recommended) |\n| `OCTAVUS_API_KEY` | Fallback if `OCTAVUS_CLI_API_KEY` not set |\n| `OCTAVUS_API_URL` | Optional, defaults to `https://octavus.ai` |\n\n### Two-Key Strategy (Recommended)\n\nFor production deployments, use separate API keys with minimal permissions:\n\n```bash\n# CI/CD or .env.local (not committed)\nOCTAVUS_CLI_API_KEY=oct_sk_... # "Agents" permission only\n\n# Production .env\nOCTAVUS_API_KEY=oct_sk_... # "Sessions" permission only\n```\n\nThis ensures production servers only have session permissions (smaller blast radius if leaked), while agent management is restricted to development/CI environments.\n\n### Multiple Environments\n\nUse separate Octavus projects for staging and production, each with their own API keys. 
The `--env` flag lets you load different environment files:\n\n```bash\n# Local development (default: .env)\noctavus sync ./agents/my-agent\n\n# Staging project\noctavus --env .env.staging sync ./agents/my-agent\n\n# Production project\noctavus --env .env.production sync ./agents/my-agent\n```\n\nExample environment files:\n\n```bash\n# .env.staging (syncs to your staging project)\nOCTAVUS_CLI_API_KEY=oct_sk_staging_project_key...\n\n# .env.production (syncs to your production project)\nOCTAVUS_CLI_API_KEY=oct_sk_production_project_key...\n```\n\nEach project has its own agents, so you\'ll get different agent IDs per environment.\n\n## Global Options\n\n| Option | Description |\n| -------------- | ------------------------------------------------------- |\n| `--env <file>` | Load environment from a specific file (default: `.env`) |\n| `--help` | Show help |\n| `--version` | Show version |\n\n## Commands\n\n### `octavus sync <path>`\n\nSync an agent definition to the platform. Creates the agent if it doesn\'t exist, or updates it if it does.\n\n```bash\noctavus sync ./agents/my-agent\n```\n\n**Options:**\n\n- `--json` - Output as JSON (for CI/CD parsing)\n- `--quiet` - Suppress non-essential output\n\n**Example output:**\n\n```\n\u2139 Reading agent from ./agents/my-agent...\n\u2139 Syncing support-chat...\n\u2713 Created: support-chat\n Agent ID: clxyz123abc456\n```\n\n### `octavus validate <path>`\n\nValidate an agent definition without saving. 
Useful for CI/CD pipelines.\n\n```bash\noctavus validate ./agents/my-agent\n```\n\n**Exit codes:**\n\n- `0` - Validation passed\n- `1` - Validation errors\n- `2` - Configuration errors (missing API key, etc.)\n\n### `octavus list`\n\nList all agents in your project.\n\n```bash\noctavus list\n```\n\n**Example output:**\n\n```\nSLUG NAME FORMAT ID\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nsupport-chat Support Chat Agent interactive clxyz123abc456\n\n1 agent(s)\n```\n\n### `octavus get <slug>`\n\nGet details about a specific agent by its slug.\n\n```bash\noctavus get support-chat\n```\n\n### `octavus archive <slug>`\n\nArchive an agent by slug (soft delete). Archived agents are removed from the active agent list and their slug is freed for reuse.\n\n```bash\noctavus archive support-chat\n```\n\n**Options:**\n\n- `--json` - Output as JSON (for CI/CD parsing)\n- `--quiet` - Suppress non-essential output\n\n**Example output:**\n\n```\n\u2139 Archiving support-chat...\n\u2713 Archived: support-chat\n Agent ID: clxyz123abc456\n```\n\n### `octavus skills sync <path>`\n\nSync a skill to the platform. 
Packages the skill directory into a bundle (excluding `.env` files, `.git`, and `node_modules`), uploads it, and optionally pushes secrets from the skill\'s `.env` file.\n\n```bash\noctavus skills sync ./skills/github\n```\n\n**Options:**\n\n- `--json` - Output as JSON (for CI/CD parsing)\n- `--quiet` - Suppress non-essential output\n\n**Example output:**\n\n```\n\u2139 Reading skill from ./skills/github...\n\u2139 Packaging github...\n\u2713 Created: github\n Skill ID: clxyz789def012\n\u2139 Pushing 2 secret(s)...\n\u2713 2 secret(s) updated\n```\n\n**Secret handling:**\n\nIf the skill directory contains a `.env` file, secrets are pushed alongside the bundle. Secrets are cross-validated against the `secrets` declarations in `SKILL.md` - warnings are shown for undeclared or missing required secrets.\n\n```\nmy-skill/\n\u251C\u2500\u2500 SKILL.md\n\u251C\u2500\u2500 scripts/\n\u2502 \u2514\u2500\u2500 run.py\n\u2514\u2500\u2500 .env # Secrets (not included in bundle)\n```\n\nSee [Skills](/docs/protocol/skills) for details on skill format, secrets, and secure mode.\n\n## Agent Directory Structure\n\nThe CLI expects agent definitions in a specific directory structure:\n\n```\nmy-agent/\n\u251C\u2500\u2500 settings.json # Required: Agent metadata\n\u251C\u2500\u2500 protocol.yaml # Required: Agent protocol\n\u251C\u2500\u2500 prompts/ # Optional: Prompt templates\n\u2502 \u251C\u2500\u2500 system.md\n\u2502 \u2514\u2500\u2500 user-message.md\n\u2514\u2500\u2500 references/ # Optional: Reference documents\n \u2514\u2500\u2500 api-guidelines.md\n```\n\n### references/\n\nReference files are markdown documents with YAML frontmatter containing a `description`. The agent can fetch these on demand during execution. 
See [References](/docs/protocol/references) for details.\n\n### settings.json\n\n```json\n{\n "slug": "my-agent",\n "name": "My Agent",\n "description": "A helpful assistant",\n "format": "interactive"\n}\n```\n\n### protocol.yaml\n\nSee the [Protocol documentation](/docs/protocol/overview) for details on protocol syntax.\n\n## CI/CD Integration\n\n### GitHub Actions\n\n```yaml\nname: Validate and Sync Agents\n\non:\n push:\n branches: [main]\n paths:\n - \'agents/**\'\n\njobs:\n sync:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v4\n\n - uses: actions/setup-node@v4\n with:\n node-version: \'22\'\n\n - run: npm install\n\n - name: Validate agent\n run: npx octavus validate ./agents/support-chat\n env:\n OCTAVUS_CLI_API_KEY: ${{ secrets.OCTAVUS_CLI_API_KEY }}\n\n - name: Sync agent\n run: npx octavus sync ./agents/support-chat\n env:\n OCTAVUS_CLI_API_KEY: ${{ secrets.OCTAVUS_CLI_API_KEY }}\n```\n\n### Package.json Scripts\n\nAdd sync scripts to your `package.json`:\n\n```json\n{\n "scripts": {\n "agents:validate": "octavus validate ./agents/my-agent",\n "agents:sync": "octavus sync ./agents/my-agent"\n },\n "devDependencies": {\n "@octavus/cli": "^0.1.0"\n }\n}\n```\n\n## Workflow\n\nThe recommended workflow for managing agents:\n\n1. **Define agent locally** - Create `settings.json`, `protocol.yaml`, and prompts\n2. **Validate** - Run `octavus validate ./my-agent` to check for errors\n3. **Sync** - Run `octavus sync ./my-agent` to push to platform\n4. **Store agent ID** - Save the output ID in an environment variable\n5. 
**Use in app** - Read the ID from env and pass to `client.agentSessions.create()`\n\n```bash\n# After syncing: octavus sync ./agents/support-chat\n# Output: Agent ID: clxyz123abc456\n\n# Add to your .env file\nOCTAVUS_SUPPORT_AGENT_ID=clxyz123abc456\n```\n\n```typescript\nconst agentId = process.env.OCTAVUS_SUPPORT_AGENT_ID;\n\nconst sessionId = await client.agentSessions.create(agentId, {\n COMPANY_NAME: \'Acme Corp\',\n});\n```\n',
|
|
808
826
|
excerpt: "Octavus CLI The package provides a command-line interface for validating and syncing agent definitions from your local filesystem to the Octavus platform. Current version: Installation ...",
|
|
809
827
|
order: 5
|
|
810
828
|
},
|
|
@@ -831,7 +849,7 @@ var sections_default = [
|
|
|
831
849
|
section: "server-sdk",
|
|
832
850
|
title: "Computer",
|
|
833
851
|
description: "Adding browser, filesystem, and shell capabilities to agents with @octavus/computer.",
|
|
834
|
-
content: "\n# Computer\n\nThe `@octavus/computer` package gives agents access to a physical or virtual machine's browser, filesystem, and shell. It connects to [MCP](https://modelcontextprotocol.io) servers, discovers their tools, and provides them to the server-sdk.\n\n**Current version:** `4.1.0`\n\n## Installation\n\n```bash\nnpm install @octavus/computer\n```\n\n## Quick Start\n\n```typescript\nimport { Computer } from '@octavus/computer';\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', ['--browser-url=http://127.0.0.1:9222']),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', ['/path/to/workspace']),\n shell: Computer.shell({ cwd: '/path/to/workspace', mode: 'unrestricted' }),\n },\n});\n\nawait computer.start();\n\nconst client = new OctavusClient({\n baseUrl: 'https://octavus.ai',\n apiKey: 'your-api-key',\n});\n\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n 'set-chat-title': async (args) => ({ title: args.title }),\n },\n});\n\nsession.setDynamicTools(computer);\n```\n\nDynamic tools are registered after attaching via `session.setDynamicTools()`. Pass the `computer` directly - the session extracts schemas and handlers from the `ToolProvider`. Tool schemas are sent to the platform on the next `execute()` call, and tool calls flow back through the existing execution loop.\n\n## How It Works\n\n1. You configure MCP servers with namespaces (e.g., `browser`, `filesystem`, `shell`)\n2. `computer.start()` connects to all servers in parallel and discovers their tools\n3. Each tool is namespaced with `__` (e.g., `browser__navigate_page`, `filesystem__read_file`)\n4. 
The server-sdk sends tool schemas to the platform and handles tool call execution\n\nThe agent's protocol must declare matching `mcpServers` with `source: device` - see [MCP Servers](/docs/protocol/mcp-servers).\n\n## Entry Types\n\nThe `Computer` class supports three types of MCP entries:\n\n### Stdio (MCP Subprocess)\n\nSpawns an MCP server as a child process, communicating via stdin/stdout:\n\n```typescript\nComputer.stdio(command: string, args?: string[], options?: {\n env?: Record<string, string>;\n cwd?: string;\n})\n```\n\nUse this for local MCP servers installed as npm packages or standalone executables:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n '--browser-url=http://127.0.0.1:9222',\n '--no-usage-statistics',\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [\n '/Users/me/projects/my-app',\n ]),\n },\n});\n```\n\n### HTTP (Remote MCP Endpoint)\n\nConnects to an MCP server over Streamable HTTP:\n\n```typescript\nComputer.http(url: string, options?: {\n headers?: Record<string, string>;\n})\n```\n\nUse this for MCP servers running as HTTP services:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n docs: Computer.http('http://localhost:3001/mcp', {\n headers: { Authorization: 'Bearer token' },\n }),\n },\n});\n```\n\n### Shell (Built-in)\n\nProvides shell command execution without spawning an MCP subprocess:\n\n```typescript\nComputer.shell(options: {\n cwd?: string;\n mode: ShellMode;\n timeout?: number; // Default: 300,000ms (5 minutes)\n})\n```\n\nThis exposes a `run_command` tool (namespaced as `shell__run_command` when the key is `shell`). 
Commands execute in a login shell with the user's full environment.\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n shell: Computer.shell({\n cwd: '/Users/me/projects/my-app',\n mode: 'unrestricted',\n timeout: 300_000,\n }),\n },\n});\n```\n\n#### Shell Safety Modes\n\n| Mode | Description |\n| -------------------------------------- | --------------------------------------------- |\n| `'unrestricted'` | All commands allowed (for dedicated machines) |\n| `{ allowedPatterns, blockedPatterns }` | Pattern-based command filtering |\n\nPattern-based filtering:\n\n```typescript\nComputer.shell({\n cwd: workspaceDir,\n mode: {\n blockedPatterns: [/rm\\s+-rf/, /sudo/],\n allowedPatterns: [/^git\\s/, /^npm\\s/, /^ls\\s/],\n },\n});\n```\n\nWhen `allowedPatterns` is set, only matching commands are permitted. When `blockedPatterns` is set, matching commands are rejected. Blocked patterns are checked first.\n\n## Lifecycle\n\n### Starting\n\n`computer.start()` connects to all configured MCP servers in parallel. If some servers fail to connect, the computer still starts with the remaining servers - only if _all_ connections fail does it throw an error.\n\n```typescript\nconst { errors } = await computer.start();\n\nif (errors.length > 0) {\n console.warn('Some MCP servers failed to connect:', errors);\n}\n```\n\n### Stopping\n\n`computer.stop()` closes all MCP connections and kills managed processes:\n\n```typescript\nawait computer.stop();\n```\n\nAlways call `stop()` when the session ends to clean up MCP subprocesses. For managed processes (like Chrome), pass them in the config for automatic cleanup.\n\n## Dynamic Entries\n\nYou can add or remove MCP entries on a running `Computer` after `start()` has returned. 
This is useful when MCP configurations arrive after construction - for example, when a session-manager receives per-session entries from a dispatch payload and wants to wire them into the existing computer instead of rebuilding it.\n\n### `addEntry(namespace, entry, options?)`\n\nRegisters a new MCP entry under `namespace`. By default, connects immediately:\n\n```typescript\nawait computer.addEntry(\n 'github',\n Computer.stdio('@modelcontextprotocol/server-github', [], {\n env: { GITHUB_PERSONAL_ACCESS_TOKEN: process.env.GH_TOKEN! },\n }),\n);\n```\n\nPass `{ deferred: true }` to register the entry without connecting. The entry starts in a degraded state and connects on the next `restartEntry(namespace)` call - useful for lazy MCPs the agent activates on demand:\n\n```typescript\nawait computer.addEntry('github', githubEntry, { deferred: true });\n\n// Later, when the agent decides it needs GitHub:\nawait computer.restartEntry('github');\n```\n\n`addEntry` throws if the namespace already exists. To replace an entry, call `removeEntry` first.\n\nIf the immediate connection fails, `addEntry` does not throw - the entry is registered as degraded with the error message attached. Inspect via `getHealth()` or `restartEntry()` to retry.\n\n### `removeEntry(namespace)`\n\nCloses the entry's connection (if any) and drops it from the configuration. 
No-op when the namespace doesn't exist:\n\n```typescript\nawait computer.removeEntry('github');\n```\n\n### `restartEntry(namespace)`\n\nCloses the existing connection (if any) and reconnects with the current configuration:\n\n```typescript\nawait computer.restartEntry('github');\n```\n\nUse this to bring a deferred entry online for the first time, or to recover an entry that became degraded mid-session.\n\n### Detecting dynamic-entry support\n\nConsumers that work with arbitrary `ToolProvider` implementations can detect dynamic-entry capability with `isDynamicMcpProvider`:\n\n```typescript\nimport { isDynamicMcpProvider } from '@octavus/server-sdk';\n\nif (isDynamicMcpProvider(provider)) {\n await provider.addEntry('github', githubEntry);\n}\n```\n\n`Computer` always passes this check.\n\n## Chrome Launch Helper\n\nFor desktop applications that need to control a browser, `Computer.launchChrome()` launches Chrome with remote debugging enabled:\n\n```typescript\nconst browser = await Computer.launchChrome({\n profileDir: '/Users/me/.my-app/chrome-profiles/agent-1',\n debuggingPort: 9222, // Optional, auto-allocated if omitted\n flags: ['--window-size=1280,800'],\n});\n\nconsole.log(`Chrome running on port ${browser.port}, PID ${browser.pid}`);\n```\n\nPass the browser to `managedProcesses` for automatic cleanup when the computer stops:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n `--browser-url=http://127.0.0.1:${browser.port}`,\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [workspaceDir]),\n shell: Computer.shell({ cwd: workspaceDir, mode: 'unrestricted' }),\n },\n managedProcesses: [{ process: browser.process }],\n});\n```\n\n### ChromeLaunchOptions\n\n| Field | Required | Description |\n| --------------- | -------- | ----------------------------------------------------- |\n| `profileDir` | Yes | Directory for Chrome's user data (profile isolation) |\n| 
`debuggingPort` | No | Port for remote debugging (auto-allocated if omitted) |\n| `flags` | No | Additional Chrome launch flags |\n\n## ToolProvider Interface\n\n`Computer` implements the `ToolProvider` interface from `@octavus/core`:\n\n```typescript\ninterface ToolProvider {\n toolHandlers(): Record<string, ToolHandler>;\n toolSchemas(): ToolSchema[];\n}\n```\n\n`setDynamicTools()` accepts any `ToolProvider` directly - the session extracts schemas and handlers automatically:\n\n```typescript\nsession.setDynamicTools(computer);\n```\n\nYou can also pass a custom `ToolProvider`:\n\n```typescript\nconst customProvider: ToolProvider = {\n toolHandlers() {\n return {\n custom__my_tool: async (args) => {\n return { result: 'done' };\n },\n };\n },\n toolSchemas() {\n return [\n {\n name: 'custom__my_tool',\n description: 'A custom tool',\n inputSchema: {\n type: 'object',\n properties: {\n input: { type: 'string', description: 'Tool input' },\n },\n required: ['input'],\n },\n },\n ];\n },\n};\n\nconst session = client.agentSessions.attach(sessionId, {\n tools: { 'set-chat-title': titleHandler },\n});\n\nsession.setDynamicTools(customProvider);\n```\n\nFor cases where you need explicit control, `setDynamicTools()` also accepts a `DynamicTool[]` array:\n\n```typescript\ninterface DynamicTool {\n schema: ToolSchema;\n handler: ToolHandler;\n}\n```\n\n## Complete Example\n\nA desktop application with browser, filesystem, and shell capabilities:\n\n```typescript\nimport { Computer } from '@octavus/computer';\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst WORKSPACE_DIR = '/Users/me/projects/my-app';\nconst PROFILE_DIR = '/Users/me/.my-app/chrome-profiles/agent';\n\nasync function startSession(sessionId: string) {\n // 1. Launch Chrome with remote debugging\n const browser = await Computer.launchChrome({\n profileDir: PROFILE_DIR,\n });\n\n // 2. 
Create computer with all capabilities\n const computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n `--browser-url=http://127.0.0.1:${browser.port}`,\n '--no-usage-statistics',\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [WORKSPACE_DIR]),\n shell: Computer.shell({\n cwd: WORKSPACE_DIR,\n mode: 'unrestricted',\n }),\n },\n managedProcesses: [{ process: browser.process }],\n });\n\n // 3. Connect to all MCP servers\n const { errors } = await computer.start();\n if (errors.length > 0) {\n console.warn('Failed to connect:', errors);\n }\n\n // 4. Attach to session and register dynamic tools\n const client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n });\n\n const session = client.agentSessions.attach(sessionId, {\n tools: {\n 'set-chat-title': async (args) => {\n console.log('Chat title:', args.title);\n return { success: true };\n },\n },\n });\n\n session.setDynamicTools(computer);\n\n // 5. Execute and stream\n const events = session.execute({\n type: 'trigger',\n triggerName: 'user-message',\n input: { USER_MESSAGE: 'Navigate to github.com and take a screenshot' },\n });\n\n for await (const event of events) {\n // Handle stream events\n }\n\n // 6. 
Clean up\n await computer.stop();\n}\n```\n\n## API Reference\n\n### Computer\n\n```typescript\nclass Computer implements ToolProvider {\n constructor(config: ComputerConfig);\n\n // Static factories for MCP entries\n static stdio(\n command: string,\n args?: string[],\n options?: {\n env?: Record<string, string>;\n cwd?: string;\n },\n ): StdioConfig;\n\n static http(\n url: string,\n options?: {\n headers?: Record<string, string>;\n },\n ): HttpConfig;\n\n static shell(options: { cwd?: string; mode: ShellMode; timeout?: number }): ShellConfig;\n\n // Chrome launch helper\n static launchChrome(options: ChromeLaunchOptions): Promise<ChromeInstance>;\n\n // Lifecycle\n start(): Promise<{ errors: string[] }>;\n stop(): Promise<void>;\n\n // Dynamic entries\n addEntry(namespace: string, entry: McpEntry, options?: { deferred?: boolean }): Promise<void>;\n removeEntry(namespace: string): Promise<void>;\n restartEntry(namespace: string): Promise<void>;\n stopEntry(namespace: string): Promise<void>;\n\n // Health\n getHealth(): Promise<ComputerHealth>;\n ensureReady(): Promise<EnsureReadyResult>;\n retryDegraded(): Promise<{ recovered: string[]; stillDegraded: string[] }>;\n\n // ToolProvider implementation\n toolHandlers(): Record<string, ToolHandler>;\n toolSchemas(): ToolSchema[];\n}\n\ninterface ComputerHealth {\n healthy: boolean;\n entries: EntryHealth[];\n totalTools: number;\n}\n\ninterface EntryHealth {\n name: string;\n healthy: boolean;\n error?: string;\n}\n\ninterface EnsureReadyResult extends ComputerHealth {\n recovered?: string[];\n failedEntries?: string[];\n}\n```\n\n### ComputerConfig\n\n```typescript\ninterface ComputerConfig {\n mcpServers: Record<string, McpEntry>;\n managedProcesses?: { process: ChildProcess }[];\n /** Namespaces to skip during start() - they begin as degraded and can be connected on demand via restartEntry(). 
*/\n deferredEntries?: string[];\n}\n\ntype McpEntry = StdioConfig | HttpConfig | ShellConfig;\ntype ShellMode =\n | 'unrestricted'\n | {\n allowedPatterns?: RegExp[];\n blockedPatterns?: RegExp[];\n };\n```\n\n### ChromeInstance\n\n```typescript\ninterface ChromeInstance {\n port: number;\n process: ChildProcess;\n pid: number;\n}\n```\n",
|
|
852
|
+
content: "\n# Computer\n\nThe `@octavus/computer` package gives agents access to a physical or virtual machine's browser, filesystem, and shell. It connects to [MCP](https://modelcontextprotocol.io) servers, discovers their tools, and provides them to the server-sdk.\n\n**Current version:** `5.0.0`\n\n## Installation\n\n```bash\nnpm install @octavus/computer\n```\n\n## Quick Start\n\n```typescript\nimport { Computer } from '@octavus/computer';\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', ['--browser-url=http://127.0.0.1:9222']),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', ['/path/to/workspace']),\n shell: Computer.shell({ cwd: '/path/to/workspace', mode: 'unrestricted' }),\n },\n});\n\nawait computer.start();\n\nconst client = new OctavusClient({\n baseUrl: 'https://octavus.ai',\n apiKey: 'your-api-key',\n});\n\nconst session = client.agentSessions.attach(sessionId, {\n tools: {\n 'set-chat-title': async (args) => ({ title: args.title }),\n },\n});\n\nsession.setDynamicTools(computer);\n```\n\nDynamic tools are registered after attaching via `session.setDynamicTools()`. Pass the `computer` directly - the session extracts schemas and handlers from the `ToolProvider`. Tool schemas are sent to the platform on the next `execute()` call, and tool calls flow back through the existing execution loop.\n\n## How It Works\n\n1. You configure MCP servers with namespaces (e.g., `browser`, `filesystem`, `shell`)\n2. `computer.start()` connects to all servers in parallel and discovers their tools\n3. Each tool is namespaced with `__` (e.g., `browser__navigate_page`, `filesystem__read_file`)\n4. 
The server-sdk sends tool schemas to the platform and handles tool call execution\n\nThe agent's protocol must declare matching `mcpServers` with `source: device` - see [MCP Servers](/docs/protocol/mcp-servers).\n\n## Entry Types\n\nThe `Computer` class supports three types of MCP entries:\n\n### Stdio (MCP Subprocess)\n\nSpawns an MCP server as a child process, communicating via stdin/stdout:\n\n```typescript\nComputer.stdio(command: string, args?: string[], options?: {\n env?: Record<string, string>;\n cwd?: string;\n})\n```\n\nUse this for local MCP servers installed as npm packages or standalone executables:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n '--browser-url=http://127.0.0.1:9222',\n '--no-usage-statistics',\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [\n '/Users/me/projects/my-app',\n ]),\n },\n});\n```\n\n### HTTP (Remote MCP Endpoint)\n\nConnects to an MCP server over Streamable HTTP:\n\n```typescript\nComputer.http(url: string, options?: {\n headers?: Record<string, string>;\n})\n```\n\nUse this for MCP servers running as HTTP services:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n docs: Computer.http('http://localhost:3001/mcp', {\n headers: { Authorization: 'Bearer token' },\n }),\n },\n});\n```\n\n### Shell (Built-in)\n\nProvides shell command execution without spawning an MCP subprocess:\n\n```typescript\nComputer.shell(options: {\n cwd?: string;\n mode: ShellMode;\n timeout?: number; // Default: 300,000ms (5 minutes)\n})\n```\n\nThis exposes a `run_command` tool (namespaced as `shell__run_command` when the key is `shell`). 
Commands execute in a login shell with the user's full environment.\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n shell: Computer.shell({\n cwd: '/Users/me/projects/my-app',\n mode: 'unrestricted',\n timeout: 300_000,\n }),\n },\n});\n```\n\n#### Shell Safety Modes\n\n| Mode | Description |\n| -------------------------------------- | --------------------------------------------- |\n| `'unrestricted'` | All commands allowed (for dedicated machines) |\n| `{ allowedPatterns, blockedPatterns }` | Pattern-based command filtering |\n\nPattern-based filtering:\n\n```typescript\nComputer.shell({\n cwd: workspaceDir,\n mode: {\n blockedPatterns: [/rm\\s+-rf/, /sudo/],\n allowedPatterns: [/^git\\s/, /^npm\\s/, /^ls\\s/],\n },\n});\n```\n\nWhen `allowedPatterns` is set, only matching commands are permitted. When `blockedPatterns` is set, matching commands are rejected. Blocked patterns are checked first.\n\n## Lifecycle\n\n### Starting\n\n`computer.start()` connects to all configured MCP servers in parallel. If some servers fail to connect, the computer still starts with the remaining servers - only if _all_ connections fail does it throw an error.\n\n```typescript\nconst { errors } = await computer.start();\n\nif (errors.length > 0) {\n console.warn('Some MCP servers failed to connect:', errors);\n}\n```\n\n### Stopping\n\n`computer.stop()` closes all MCP connections and kills managed processes:\n\n```typescript\nawait computer.stop();\n```\n\nAlways call `stop()` when the session ends to clean up MCP subprocesses. For managed processes (like Chrome), pass them in the config for automatic cleanup.\n\n## Dynamic Entries\n\nYou can add or remove MCP entries on a running `Computer` after `start()` has returned. 
This is useful when MCP configurations arrive after construction - for example, when a session-manager receives per-session entries from a dispatch payload and wants to wire them into the existing computer instead of rebuilding it.\n\n### `addEntry(namespace, entry, options?)`\n\nRegisters a new MCP entry under `namespace`. By default, connects immediately:\n\n```typescript\nawait computer.addEntry(\n 'github',\n Computer.stdio('@modelcontextprotocol/server-github', [], {\n env: { GITHUB_PERSONAL_ACCESS_TOKEN: process.env.GH_TOKEN! },\n }),\n);\n```\n\nPass `{ deferred: true }` to register the entry without connecting. The entry starts in a degraded state and connects on the next `restartEntry(namespace)` call - useful for lazy MCPs the agent activates on demand:\n\n```typescript\nawait computer.addEntry('github', githubEntry, { deferred: true });\n\n// Later, when the agent decides it needs GitHub:\nawait computer.restartEntry('github');\n```\n\n`addEntry` throws if the namespace already exists. To replace an entry, call `removeEntry` first.\n\nIf the immediate connection fails, `addEntry` does not throw - the entry is registered as degraded with the error message attached. Inspect via `getHealth()` or `restartEntry()` to retry.\n\n### `removeEntry(namespace)`\n\nCloses the entry's connection (if any) and drops it from the configuration. 
No-op when the namespace doesn't exist:\n\n```typescript\nawait computer.removeEntry('github');\n```\n\n### `restartEntry(namespace)`\n\nCloses the existing connection (if any) and reconnects with the current configuration:\n\n```typescript\nawait computer.restartEntry('github');\n```\n\nUse this to bring a deferred entry online for the first time, or to recover an entry that became degraded mid-session.\n\n### Detecting dynamic-entry support\n\nConsumers that work with arbitrary `ToolProvider` implementations can detect dynamic-entry capability with `isDynamicMcpProvider`:\n\n```typescript\nimport { isDynamicMcpProvider } from '@octavus/server-sdk';\n\nif (isDynamicMcpProvider(provider)) {\n await provider.addEntry('github', githubEntry);\n}\n```\n\n`Computer` always passes this check.\n\n## Chrome Launch Helper\n\nFor desktop applications that need to control a browser, `Computer.launchChrome()` launches Chrome with remote debugging enabled:\n\n```typescript\nconst browser = await Computer.launchChrome({\n profileDir: '/Users/me/.my-app/chrome-profiles/agent-1',\n debuggingPort: 9222, // Optional, auto-allocated if omitted\n flags: ['--window-size=1280,800'],\n});\n\nconsole.log(`Chrome running on port ${browser.port}, PID ${browser.pid}`);\n```\n\nPass the browser to `managedProcesses` for automatic cleanup when the computer stops:\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n `--browser-url=http://127.0.0.1:${browser.port}`,\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [workspaceDir]),\n shell: Computer.shell({ cwd: workspaceDir, mode: 'unrestricted' }),\n },\n managedProcesses: [{ process: browser.process }],\n});\n```\n\n### ChromeLaunchOptions\n\n| Field | Required | Description |\n| --------------- | -------- | ----------------------------------------------------- |\n| `profileDir` | Yes | Directory for Chrome's user data (profile isolation) |\n| 
`debuggingPort` | No | Port for remote debugging (auto-allocated if omitted) |\n| `flags` | No | Additional Chrome launch flags |\n\n## ToolProvider Interface\n\n`Computer` implements the `ToolProvider` interface from `@octavus/core`:\n\n```typescript\ninterface ToolProvider {\n toolHandlers(): Record<string, ToolHandler>;\n toolSchemas(): ToolSchema[];\n}\n```\n\n`setDynamicTools()` accepts any `ToolProvider` directly - the session extracts schemas and handlers automatically:\n\n```typescript\nsession.setDynamicTools(computer);\n```\n\nYou can also pass a custom `ToolProvider`:\n\n```typescript\nconst customProvider: ToolProvider = {\n toolHandlers() {\n return {\n custom__my_tool: async (args) => {\n return { result: 'done' };\n },\n };\n },\n toolSchemas() {\n return [\n {\n name: 'custom__my_tool',\n description: 'A custom tool',\n inputSchema: {\n type: 'object',\n properties: {\n input: { type: 'string', description: 'Tool input' },\n },\n required: ['input'],\n },\n },\n ];\n },\n};\n\nconst session = client.agentSessions.attach(sessionId, {\n tools: { 'set-chat-title': titleHandler },\n});\n\nsession.setDynamicTools(customProvider);\n```\n\nFor cases where you need explicit control, `setDynamicTools()` also accepts a `DynamicTool[]` array:\n\n```typescript\ninterface DynamicTool {\n schema: ToolSchema;\n handler: ToolHandler;\n}\n```\n\n## Complete Example\n\nA desktop application with browser, filesystem, and shell capabilities:\n\n```typescript\nimport { Computer } from '@octavus/computer';\nimport { OctavusClient } from '@octavus/server-sdk';\n\nconst WORKSPACE_DIR = '/Users/me/projects/my-app';\nconst PROFILE_DIR = '/Users/me/.my-app/chrome-profiles/agent';\n\nasync function startSession(sessionId: string) {\n // 1. Launch Chrome with remote debugging\n const browser = await Computer.launchChrome({\n profileDir: PROFILE_DIR,\n });\n\n // 2. 
Create computer with all capabilities\n const computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', [\n `--browser-url=http://127.0.0.1:${browser.port}`,\n '--no-usage-statistics',\n ]),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [WORKSPACE_DIR]),\n shell: Computer.shell({\n cwd: WORKSPACE_DIR,\n mode: 'unrestricted',\n }),\n },\n managedProcesses: [{ process: browser.process }],\n });\n\n // 3. Connect to all MCP servers\n const { errors } = await computer.start();\n if (errors.length > 0) {\n console.warn('Failed to connect:', errors);\n }\n\n // 4. Attach to session and register dynamic tools\n const client = new OctavusClient({\n baseUrl: process.env.OCTAVUS_API_URL!,\n apiKey: process.env.OCTAVUS_API_KEY!,\n });\n\n const session = client.agentSessions.attach(sessionId, {\n tools: {\n 'set-chat-title': async (args) => {\n console.log('Chat title:', args.title);\n return { success: true };\n },\n },\n });\n\n session.setDynamicTools(computer);\n\n // 5. Execute and stream\n const events = session.execute({\n type: 'trigger',\n triggerName: 'user-message',\n input: { USER_MESSAGE: 'Navigate to github.com and take a screenshot' },\n });\n\n for await (const event of events) {\n // Handle stream events\n }\n\n // 6. 
Clean up\n await computer.stop();\n}\n```\n\n## API Reference\n\n### Computer\n\n```typescript\nclass Computer implements ToolProvider {\n constructor(config: ComputerConfig);\n\n // Static factories for MCP entries\n static stdio(\n command: string,\n args?: string[],\n options?: {\n env?: Record<string, string>;\n cwd?: string;\n },\n ): StdioConfig;\n\n static http(\n url: string,\n options?: {\n headers?: Record<string, string>;\n },\n ): HttpConfig;\n\n static shell(options: { cwd?: string; mode: ShellMode; timeout?: number }): ShellConfig;\n\n // Chrome launch helper\n static launchChrome(options: ChromeLaunchOptions): Promise<ChromeInstance>;\n\n // Lifecycle\n start(): Promise<{ errors: string[] }>;\n stop(): Promise<void>;\n\n // Dynamic entries\n addEntry(namespace: string, entry: McpEntry, options?: { deferred?: boolean }): Promise<void>;\n removeEntry(namespace: string): Promise<void>;\n restartEntry(namespace: string): Promise<void>;\n stopEntry(namespace: string): Promise<void>;\n\n // Health\n getHealth(): Promise<ComputerHealth>;\n ensureReady(): Promise<EnsureReadyResult>;\n retryDegraded(): Promise<{ recovered: string[]; stillDegraded: string[] }>;\n\n // ToolProvider implementation\n toolHandlers(): Record<string, ToolHandler>;\n toolSchemas(): ToolSchema[];\n}\n\ninterface ComputerHealth {\n healthy: boolean;\n entries: EntryHealth[];\n totalTools: number;\n}\n\ninterface EntryHealth {\n name: string;\n healthy: boolean;\n error?: string;\n}\n\ninterface EnsureReadyResult extends ComputerHealth {\n recovered?: string[];\n failedEntries?: string[];\n}\n```\n\n### ComputerConfig\n\n```typescript\ninterface ComputerConfig {\n mcpServers: Record<string, McpEntry>;\n managedProcesses?: { process: ChildProcess }[];\n /** Namespaces to skip during start() - they begin as degraded and can be connected on demand via restartEntry(). 
*/\n deferredEntries?: string[];\n}\n\ntype McpEntry = StdioConfig | HttpConfig | ShellConfig;\ntype ShellMode =\n | 'unrestricted'\n | {\n allowedPatterns?: RegExp[];\n blockedPatterns?: RegExp[];\n };\n```\n\n### ChromeInstance\n\n```typescript\ninterface ChromeInstance {\n port: number;\n process: ChildProcess;\n pid: number;\n}\n```\n",
|
|
835
853
|
excerpt: "Computer The package gives agents access to a physical or virtual machine's browser, filesystem, and shell. It connects to MCP servers, discovers their tools, and provides them to the server-sdk....",
|
|
836
854
|
order: 8
|
|
837
855
|
},
|
|
@@ -857,7 +875,7 @@ var sections_default = [
|
|
|
857
875
|
section: "client-sdk",
|
|
858
876
|
title: "Overview",
|
|
859
877
|
description: "Introduction to the Octavus Client SDKs for building chat interfaces.",
|
|
860
|
-
content: "\n# Client SDK Overview\n\nOctavus provides two packages for frontend integration:\n\n| Package | Purpose | Use When |\n| --------------------- | ------------------------ | ----------------------------------------------------- |\n| `@octavus/react` | React hooks and bindings | Building React applications |\n| `@octavus/client-sdk` | Framework-agnostic core | Using Vue, Svelte, vanilla JS, or custom integrations |\n\n**Most users should install `@octavus/react`** - it includes everything from `@octavus/client-sdk` plus React-specific hooks.\n\n## Installation\n\n### React Applications\n\n```bash\nnpm install @octavus/react\n```\n\n**Current version:** `4.1.0`\n\n### Other Frameworks\n\n```bash\nnpm install @octavus/client-sdk\n```\n\n**Current version:** `4.1.0`\n\n## Transport Pattern\n\nThe Client SDK uses a **transport abstraction** to handle communication with your backend. This gives you flexibility in how events are delivered:\n\n| Transport | Use Case | Docs |\n| ----------------------- | -------------------------------------------- | ----------------------------------------------------- |\n| `createHttpTransport` | HTTP/SSE (Next.js, Express, etc.) 
| [HTTP Transport](/docs/client-sdk/http-transport) |\n| `createSocketTransport` | WebSocket, SockJS, or other socket protocols | [Socket Transport](/docs/client-sdk/socket-transport) |\n\nWhen the transport changes (e.g., when `sessionId` changes), the `useOctavusChat` hook automatically reinitializes with the new transport.\n\n> **Recommendation**: Use HTTP transport unless you specifically need WebSocket features (custom real-time events, Meteor/Phoenix, etc.).\n\n## React Usage\n\nThe `useOctavusChat` hook provides state management and streaming for React applications:\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport, type UIMessage } from '@octavus/react';\n\nfunction Chat({ sessionId }: { sessionId: string }) {\n // Create a stable transport instance (memoized on sessionId)\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n }),\n [sessionId],\n );\n\n const { messages, status, send } = useOctavusChat({ transport });\n\n const sendMessage = async (text: string) => {\n await send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n };\n\n return (\n <div>\n {messages.map((msg) => (\n <MessageBubble key={msg.id} message={msg} />\n ))}\n </div>\n );\n}\n\nfunction MessageBubble({ message }: { message: UIMessage }) {\n return (\n <div>\n {message.parts.map((part, i) => {\n if (part.type === 'text') {\n return <p key={i}>{part.text}</p>;\n }\n return null;\n })}\n </div>\n );\n}\n```\n\n## Framework-Agnostic Usage\n\nThe `OctavusChat` class can be used with any framework or vanilla JavaScript:\n\n```typescript\nimport { OctavusChat, createHttpTransport } from '@octavus/client-sdk';\n\nconst transport = createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', 
{\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n});\n\nconst chat = new OctavusChat({ transport });\n\n// Subscribe to state changes\nconst unsubscribe = chat.subscribe(() => {\n console.log('Messages:', chat.messages);\n console.log('Status:', chat.status);\n // Update your UI here\n});\n\n// Send a message\nawait chat.send('user-message', { USER_MESSAGE: 'Hello' }, { userMessage: { content: 'Hello' } });\n\n// Cleanup when done\nunsubscribe();\n```\n\n## Key Features\n\n### Unified Send Function\n\nThe `send` function handles both user message display and agent triggering in one call:\n\n```tsx\nconst { send } = useOctavusChat({ transport });\n\n// Add user message to UI and trigger agent\nawait send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n\n// Trigger without adding a user message (e.g., button click)\nawait send('request-human');\n```\n\n### Message Parts\n\nMessages contain ordered `parts` for rich content:\n\n```tsx\nconst { messages } = useOctavusChat({ transport });\n\n// Each message has typed parts\nmessage.parts.map((part) => {\n switch (part.type) {\n case 'text': // Text content\n case 'reasoning': // Extended reasoning/thinking\n case 'tool-call': // Tool execution\n case 'operation': // Internal operations (set-resource, etc.)\n }\n});\n```\n\n### Status Tracking\n\n```tsx\nconst { status } = useOctavusChat({ transport });\n\n// status: 'idle' | 'streaming' | 'error' | 'awaiting-input'\n// 'awaiting-input' occurs when interactive client tools need user action\n```\n\n### Stop Streaming\n\n```tsx\nconst { stop } = useOctavusChat({ transport });\n\n// Stop current stream and finalize message\nstop();\n```\n\n### Retry Last Trigger\n\nRe-execute the last trigger from the same starting point. 
Messages are rolled back to the state before the trigger, the user message is re-added (if any), and the agent re-executes. Already-uploaded files are reused without re-uploading.\n\n```tsx\nconst { retry, canRetry } = useOctavusChat({ transport });\n\n// Retry after an error, cancellation, or unsatisfactory result\nif (canRetry) {\n await retry();\n}\n```\n\n`canRetry` is `true` when a trigger has been sent and the chat is not currently streaming or awaiting input.\n\n## Hook Reference (React)\n\n### useOctavusChat\n\n```typescript\nfunction useOctavusChat(options: OctavusChatOptions): UseOctavusChatReturn;\n\ninterface OctavusChatOptions {\n // Required: Transport for streaming events\n transport: Transport;\n\n // Optional: Function to request upload URLs for file uploads\n requestUploadUrls?: (\n files: { filename: string; mediaType: string; size: number }[],\n ) => Promise<UploadUrlsResponse>;\n\n // Optional: Client-side tool handlers\n // - Function: executes automatically and returns result\n // - 'interactive': appears in pendingClientTools for user input\n clientTools?: Record<string, ClientToolHandler>;\n\n // Optional: Pre-populate with existing messages (session restore)\n initialMessages?: UIMessage[];\n\n // Optional: Callbacks\n onError?: (error: OctavusError) => void; // Structured error with type, source, retryable\n onFinish?: () => void;\n onStop?: () => void; // Called when user stops generation\n onResourceUpdate?: (name: string, value: unknown) => void;\n}\n\ninterface UseOctavusChatReturn {\n // State\n messages: UIMessage[];\n status: ChatStatus; // 'idle' | 'streaming' | 'error' | 'awaiting-input'\n error: OctavusError | null; // Structured error with type, source, retryable\n\n // Connection (socket transport only - undefined for HTTP)\n connectionState: ConnectionState | undefined; // 'disconnected' | 'connecting' | 'connected' | 'error'\n connectionError: Error | undefined;\n\n // Client tools (interactive tools awaiting user input)\n 
pendingClientTools: Record<string, InteractiveTool[]>; // Keyed by tool name\n\n // Actions\n send: (\n triggerName: string,\n input?: Record<string, unknown>,\n options?: { userMessage?: UserMessageInput },\n ) => Promise<void>;\n stop: () => void;\n retry: () => Promise<void>; // Retry last trigger from same starting point\n canRetry: boolean; // Whether retry() can be called\n\n // Connection management (socket transport only - undefined for HTTP)\n connect: (() => Promise<void>) | undefined;\n disconnect: (() => void) | undefined;\n\n // File uploads (requires requestUploadUrls)\n uploadFiles: (\n files: FileList | File[],\n onProgress?: (fileIndex: number, progress: number) => void,\n ) => Promise<FileReference[]>;\n}\n\ninterface UserMessageInput {\n content?: string;\n files?: FileList | File[] | FileReference[];\n}\n```\n\n### useAutoScroll\n\nSmart auto-scroll for chat containers. Scrolls to bottom when content updates, but pauses if the user has scrolled up. See [Streaming - Auto-Scroll](/docs/client-sdk/streaming#auto-scroll) for full usage.\n\n```typescript\nfunction useAutoScroll(options?: UseAutoScrollOptions): {\n scrollRef: RefObject<HTMLDivElement | null>;\n handleScroll: () => void;\n scrollOnUpdate: () => void;\n resetAutoScroll: () => void;\n};\n\ninterface UseAutoScrollOptions {\n scrollRef?: RefObject<HTMLDivElement | null>;\n threshold?: number; // Distance from bottom in px (default: 80)\n}\n```\n\n## Transport Reference\n\n### createHttpTransport\n\nCreates an HTTP/SSE transport using native `fetch()`:\n\n```typescript\nimport { createHttpTransport } from '@octavus/react';\n\nconst transport = createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n});\n```\n\n### createSocketTransport\n\nCreates a WebSocket/SockJS transport for real-time 
connections:\n\n```typescript\nimport { createSocketTransport } from '@octavus/react';\n\nconst transport = createSocketTransport({\n connect: () =>\n new Promise((resolve, reject) => {\n const ws = new WebSocket(`wss://api.example.com/stream?sessionId=${sessionId}`);\n ws.onopen = () => resolve(ws);\n ws.onerror = () => reject(new Error('Connection failed'));\n }),\n});\n```\n\nSocket transport provides additional connection management:\n\n```typescript\n// Access connection state directly\ntransport.connectionState; // 'disconnected' | 'connecting' | 'connected' | 'error'\n\n// Subscribe to state changes\ntransport.onConnectionStateChange((state, error) => {\n /* ... */\n});\n\n// Eager connection (instead of lazy on first send)\nawait transport.connect();\n\n// Manual disconnect\ntransport.disconnect();\n```\n\nFor detailed WebSocket/SockJS usage including custom events, reconnection patterns, and server-side implementation, see [Socket Transport](/docs/client-sdk/socket-transport).\n\n## Class Reference (Framework-Agnostic)\n\n### OctavusChat\n\n```typescript\nclass OctavusChat {\n constructor(options: OctavusChatOptions);\n\n // State (read-only)\n readonly messages: UIMessage[];\n readonly status: ChatStatus; // 'idle' | 'streaming' | 'error' | 'awaiting-input'\n readonly error: OctavusError | null; // Structured error\n readonly pendingClientTools: Record<string, InteractiveTool[]>; // Interactive tools\n\n // Actions\n send(\n triggerName: string,\n input?: Record<string, unknown>,\n options?: { userMessage?: UserMessageInput },\n ): Promise<void>;\n stop(): void;\n\n // Subscription\n subscribe(callback: () => void): () => void; // Returns unsubscribe function\n}\n```\n\n## Next Steps\n\n- [HTTP Transport](/docs/client-sdk/http-transport) - HTTP/SSE integration (recommended)\n- [Socket Transport](/docs/client-sdk/socket-transport) - WebSocket and SockJS integration\n- [Messages](/docs/client-sdk/messages) - Working with message state\n- 
[Streaming](/docs/client-sdk/streaming) - Building streaming UIs\n- [Client Tools](/docs/client-sdk/client-tools) - Interactive browser-side tool handling\n- [Operations](/docs/client-sdk/execution-blocks) - Showing agent progress\n- [Error Handling](/docs/client-sdk/error-handling) - Handling errors with type guards\n- [File Uploads](/docs/client-sdk/file-uploads) - Uploading images and documents\n- [Examples](/docs/examples/overview) - Complete working examples\n",
|
|
878
|
+
content: "\n# Client SDK Overview\n\nOctavus provides two packages for frontend integration:\n\n| Package | Purpose | Use When |\n| --------------------- | ------------------------ | ----------------------------------------------------- |\n| `@octavus/react` | React hooks and bindings | Building React applications |\n| `@octavus/client-sdk` | Framework-agnostic core | Using Vue, Svelte, vanilla JS, or custom integrations |\n\n**Most users should install `@octavus/react`** - it includes everything from `@octavus/client-sdk` plus React-specific hooks.\n\n## Installation\n\n### React Applications\n\n```bash\nnpm install @octavus/react\n```\n\n**Current version:** `5.0.0`\n\n### Other Frameworks\n\n```bash\nnpm install @octavus/client-sdk\n```\n\n**Current version:** `5.0.0`\n\n## Transport Pattern\n\nThe Client SDK uses a **transport abstraction** to handle communication with your backend. This gives you flexibility in how events are delivered:\n\n| Transport | Use Case | Docs |\n| ----------------------- | -------------------------------------------- | ----------------------------------------------------- |\n| `createHttpTransport` | HTTP/SSE (Next.js, Express, etc.) 
| [HTTP Transport](/docs/client-sdk/http-transport) |\n| `createSocketTransport` | WebSocket, SockJS, or other socket protocols | [Socket Transport](/docs/client-sdk/socket-transport) |\n\nWhen the transport changes (e.g., when `sessionId` changes), the `useOctavusChat` hook automatically reinitializes with the new transport.\n\n> **Recommendation**: Use HTTP transport unless you specifically need WebSocket features (custom real-time events, Meteor/Phoenix, etc.).\n\n## React Usage\n\nThe `useOctavusChat` hook provides state management and streaming for React applications:\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport, type UIMessage } from '@octavus/react';\n\nfunction Chat({ sessionId }: { sessionId: string }) {\n // Create a stable transport instance (memoized on sessionId)\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n }),\n [sessionId],\n );\n\n const { messages, status, send } = useOctavusChat({ transport });\n\n const sendMessage = async (text: string) => {\n await send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n };\n\n return (\n <div>\n {messages.map((msg) => (\n <MessageBubble key={msg.id} message={msg} />\n ))}\n </div>\n );\n}\n\nfunction MessageBubble({ message }: { message: UIMessage }) {\n return (\n <div>\n {message.parts.map((part, i) => {\n if (part.type === 'text') {\n return <p key={i}>{part.text}</p>;\n }\n return null;\n })}\n </div>\n );\n}\n```\n\n## Framework-Agnostic Usage\n\nThe `OctavusChat` class can be used with any framework or vanilla JavaScript:\n\n```typescript\nimport { OctavusChat, createHttpTransport } from '@octavus/client-sdk';\n\nconst transport = createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', 
{\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n});\n\nconst chat = new OctavusChat({ transport });\n\n// Subscribe to state changes\nconst unsubscribe = chat.subscribe(() => {\n console.log('Messages:', chat.messages);\n console.log('Status:', chat.status);\n // Update your UI here\n});\n\n// Send a message\nawait chat.send('user-message', { USER_MESSAGE: 'Hello' }, { userMessage: { content: 'Hello' } });\n\n// Cleanup when done\nunsubscribe();\n```\n\n## Key Features\n\n### Unified Send Function\n\nThe `send` function handles both user message display and agent triggering in one call:\n\n```tsx\nconst { send } = useOctavusChat({ transport });\n\n// Add user message to UI and trigger agent\nawait send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n\n// Trigger without adding a user message (e.g., button click)\nawait send('request-human');\n```\n\n### Message Parts\n\nMessages contain ordered `parts` for rich content:\n\n```tsx\nconst { messages } = useOctavusChat({ transport });\n\n// Each message has typed parts\nmessage.parts.map((part) => {\n switch (part.type) {\n case 'text': // Text content\n case 'reasoning': // Extended reasoning/thinking\n case 'tool-call': // Tool execution\n case 'operation': // Internal operations (set-resource, etc.)\n }\n});\n```\n\n### Status Tracking\n\n```tsx\nconst { status } = useOctavusChat({ transport });\n\n// status: 'idle' | 'streaming' | 'error' | 'awaiting-input'\n// 'awaiting-input' occurs when interactive client tools need user action\n```\n\n### Stop Streaming\n\n```tsx\nconst { stop } = useOctavusChat({ transport });\n\n// Stop current stream and finalize message\nstop();\n```\n\n### Retry Last Trigger\n\nRe-execute the last trigger from the same starting point. 
Messages are rolled back to the state before the trigger, the user message is re-added (if any), and the agent re-executes. Already-uploaded files are reused without re-uploading.\n\n```tsx\nconst { retry, canRetry } = useOctavusChat({ transport });\n\n// Retry after an error, cancellation, or unsatisfactory result\nif (canRetry) {\n await retry();\n}\n```\n\n`canRetry` is `true` when a trigger has been sent and the chat is not currently streaming or awaiting input.\n\n## Hook Reference (React)\n\n### useOctavusChat\n\n```typescript\nfunction useOctavusChat(options: OctavusChatOptions): UseOctavusChatReturn;\n\ninterface OctavusChatOptions {\n // Required: Transport for streaming events\n transport: Transport;\n\n // Optional: Function to request upload URLs for file uploads\n requestUploadUrls?: (\n files: { filename: string; mediaType: string; size: number }[],\n ) => Promise<UploadUrlsResponse>;\n\n // Optional: Client-side tool handlers\n // - Function: executes automatically and returns result\n // - 'interactive': appears in pendingClientTools for user input\n clientTools?: Record<string, ClientToolHandler>;\n\n // Optional: Pre-populate with existing messages (session restore)\n initialMessages?: UIMessage[];\n\n // Optional: Callbacks\n onError?: (error: OctavusError) => void; // Structured error with type, source, retryable\n onFinish?: () => void;\n onStop?: () => void; // Called when user stops generation\n onResourceUpdate?: (name: string, value: unknown) => void;\n}\n\ninterface UseOctavusChatReturn {\n // State\n messages: UIMessage[];\n status: ChatStatus; // 'idle' | 'streaming' | 'error' | 'awaiting-input'\n error: OctavusError | null; // Structured error with type, source, retryable\n\n // Connection (socket transport only - undefined for HTTP)\n connectionState: ConnectionState | undefined; // 'disconnected' | 'connecting' | 'connected' | 'error'\n connectionError: Error | undefined;\n\n // Client tools (interactive tools awaiting user input)\n 
pendingClientTools: Record<string, InteractiveTool[]>; // Keyed by tool name\n\n // Actions\n send: (\n triggerName: string,\n input?: Record<string, unknown>,\n options?: { userMessage?: UserMessageInput },\n ) => Promise<void>;\n stop: () => void;\n retry: () => Promise<void>; // Retry last trigger from same starting point\n canRetry: boolean; // Whether retry() can be called\n\n // Connection management (socket transport only - undefined for HTTP)\n connect: (() => Promise<void>) | undefined;\n disconnect: (() => void) | undefined;\n\n // File uploads (requires requestUploadUrls)\n uploadFiles: (\n files: FileList | File[],\n onProgress?: (fileIndex: number, progress: number) => void,\n ) => Promise<FileReference[]>;\n}\n\ninterface UserMessageInput {\n content?: string;\n files?: FileList | File[] | FileReference[];\n}\n```\n\n### useAutoScroll\n\nSmart auto-scroll for chat containers. Scrolls to bottom when content updates, but pauses if the user has scrolled up. See [Streaming - Auto-Scroll](/docs/client-sdk/streaming#auto-scroll) for full usage.\n\n```typescript\nfunction useAutoScroll(options?: UseAutoScrollOptions): {\n scrollRef: RefObject<HTMLDivElement | null>;\n handleScroll: () => void;\n scrollOnUpdate: () => void;\n resetAutoScroll: () => void;\n};\n\ninterface UseAutoScrollOptions {\n scrollRef?: RefObject<HTMLDivElement | null>;\n threshold?: number; // Distance from bottom in px (default: 80)\n}\n```\n\n## Transport Reference\n\n### createHttpTransport\n\nCreates an HTTP/SSE transport using native `fetch()`:\n\n```typescript\nimport { createHttpTransport } from '@octavus/react';\n\nconst transport = createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n});\n```\n\n### createSocketTransport\n\nCreates a WebSocket/SockJS transport for real-time 
connections:\n\n```typescript\nimport { createSocketTransport } from '@octavus/react';\n\nconst transport = createSocketTransport({\n connect: () =>\n new Promise((resolve, reject) => {\n const ws = new WebSocket(`wss://api.example.com/stream?sessionId=${sessionId}`);\n ws.onopen = () => resolve(ws);\n ws.onerror = () => reject(new Error('Connection failed'));\n }),\n});\n```\n\nSocket transport provides additional connection management:\n\n```typescript\n// Access connection state directly\ntransport.connectionState; // 'disconnected' | 'connecting' | 'connected' | 'error'\n\n// Subscribe to state changes\ntransport.onConnectionStateChange((state, error) => {\n /* ... */\n});\n\n// Eager connection (instead of lazy on first send)\nawait transport.connect();\n\n// Manual disconnect\ntransport.disconnect();\n```\n\nFor detailed WebSocket/SockJS usage including custom events, reconnection patterns, and server-side implementation, see [Socket Transport](/docs/client-sdk/socket-transport).\n\n## Class Reference (Framework-Agnostic)\n\n### OctavusChat\n\n```typescript\nclass OctavusChat {\n constructor(options: OctavusChatOptions);\n\n // State (read-only)\n readonly messages: UIMessage[];\n readonly status: ChatStatus; // 'idle' | 'streaming' | 'error' | 'awaiting-input'\n readonly error: OctavusError | null; // Structured error\n readonly pendingClientTools: Record<string, InteractiveTool[]>; // Interactive tools\n\n // Actions\n send(\n triggerName: string,\n input?: Record<string, unknown>,\n options?: { userMessage?: UserMessageInput },\n ): Promise<void>;\n stop(): void;\n\n // Subscription\n subscribe(callback: () => void): () => void; // Returns unsubscribe function\n}\n```\n\n## Next Steps\n\n- [HTTP Transport](/docs/client-sdk/http-transport) - HTTP/SSE integration (recommended)\n- [Socket Transport](/docs/client-sdk/socket-transport) - WebSocket and SockJS integration\n- [Messages](/docs/client-sdk/messages) - Working with message state\n- 
[Streaming](/docs/client-sdk/streaming) - Building streaming UIs\n- [Client Tools](/docs/client-sdk/client-tools) - Interactive browser-side tool handling\n- [Operations](/docs/client-sdk/execution-blocks) - Showing agent progress\n- [Error Handling](/docs/client-sdk/error-handling) - Handling errors with type guards\n- [File Uploads](/docs/client-sdk/file-uploads) - Uploading images and documents\n- [Examples](/docs/examples/overview) - Complete working examples\n",
|
|
861
879
|
excerpt: "Client SDK Overview Octavus provides two packages for frontend integration: | Package | Purpose | Use When | |...",
|
|
862
880
|
order: 1
|
|
863
881
|
},
|
|
@@ -866,7 +884,7 @@ var sections_default = [
|
|
|
866
884
|
section: "client-sdk",
|
|
867
885
|
title: "Messages",
|
|
868
886
|
description: "Working with message state in the Client SDK.",
|
|
869
|
-
content: "\n# Messages\n\nMessages represent the conversation history. The Client SDK tracks messages automatically and provides structured access to their content through typed parts.\n\n## Message Structure\n\n```typescript\ninterface UIMessage {\n id: string;\n role: 'user' | 'assistant';\n parts: UIMessagePart[];\n status: 'streaming' | 'done';\n createdAt: Date;\n}\n```\n\n### Message Parts\n\nMessages contain ordered `parts` that preserve content ordering:\n\n```typescript\ntype UIMessagePart =\n | UITextPart\n | UIReasoningPart\n | UIToolCallPart\n | UIOperationPart\n | UISourcePart\n | UIFilePart\n | UIObjectPart\n | UITodoPart\n | UIWorkerPart\n | UIStepStartPart;\n\n// Text content\ninterface UITextPart {\n type: 'text';\n text: string;\n status: 'streaming' | 'done';\n thread?: string; // For named threads (e.g., \"summary\")\n}\n\n// Extended reasoning/thinking\ninterface UIReasoningPart {\n type: 'reasoning';\n text: string;\n status: 'streaming' | 'done';\n thread?: string;\n}\n\n// Tool execution\ninterface UIToolCallPart {\n type: 'tool-call';\n toolCallId: string;\n toolName: string;\n displayName?: string; // Human-readable name\n args: Record<string, unknown>;\n result?: unknown;\n error?: string;\n status: 'pending' | 'running' | 'done' | 'error' | 'cancelled';\n thread?: string;\n}\n\n// Internal operations (set-resource, serialize-thread)\ninterface UIOperationPart {\n type: 'operation';\n operationId: string;\n name: string;\n operationType: string;\n status: 'running' | 'done';\n thread?: string;\n}\n\n// Source references (from web search, document processing)\ninterface UISourcePart {\n type: 'source';\n sourceType: 'url' | 'document';\n id: string;\n url?: string; // For URL sources\n title?: string;\n mediaType?: string; // For document sources\n filename?: string;\n thread?: string;\n}\n\n// Generated files (from image generation, skills, code execution)\ninterface UIFilePart {\n type: 'file';\n id: string;\n mediaType: string; // MIME 
type (e.g., 'image/png', 'image/webp')\n url: string; // Download/display URL (presigned S3 URL)\n filename?: string;\n size?: number;\n toolCallId?: string; // Present if from a tool call\n thread?: string;\n}\n\n// Structured output (when responseType is used)\ninterface UIObjectPart {\n type: 'object';\n id: string;\n typeName: string; // Type name from protocol (e.g., \"ChatResponse\")\n partial?: unknown; // Partial object while streaming\n object?: unknown; // Final object when done\n status: 'streaming' | 'done' | 'error';\n error?: string;\n thread?: string;\n}\n\n// Structured task list (when the agent uses octavus_todo_write)\ninterface UITodoPart {\n type: 'todo';\n todos: {\n id: string;\n content: string;\n status: 'pending' | 'in_progress' | 'completed' | 'cancelled';\n }[];\n status: 'streaming' | 'done';\n thread?: string;\n}\n\n// Sub-agent execution container (when an agent invokes a worker)\ninterface UIWorkerPart {\n type: 'worker';\n workerId: string;\n workerSlug: string;\n description?: string;\n input?: Record<string, unknown>;\n parts: UIMessagePart[]; // Nested parts from the worker (excluding nested workers)\n output?: unknown;\n error?: string;\n status: 'running' | 'done' | 'error';\n}\n\n// Step boundary marker (structural, not rendered visually)\ninterface UIStepStartPart {\n type: 'step-start';\n}\n```\n\n## Sending Messages\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport } from '@octavus/react';\n\nfunction Chat({ sessionId }: { sessionId: string }) {\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n }),\n [sessionId],\n );\n\n const { send } = useOctavusChat({ transport });\n\n async function handleSend(text: string) {\n // Add user message to UI and trigger agent\n 
await send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n }\n\n // ...\n}\n```\n\nThe `send` function:\n\n1. Adds the user message to the UI immediately (if `userMessage` is provided)\n2. Triggers the agent with the specified trigger name and input\n3. Streams the assistant's response back\n\n### Message Content Types\n\nThe `content` field in `userMessage` accepts both strings and objects:\n\n```tsx\n// Text content \u2192 creates a text part\nawait send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n\n// Object content \u2192 creates an object part (uses `type` field as typeName)\nconst selection = { type: 'product_selection', productId: 'abc123', action: 'select' };\nawait send('user-message', { USER_INPUT: selection }, { userMessage: { content: selection } });\n```\n\nWhen passing an object as `content`:\n\n- The SDK creates a `UIObjectPart` instead of a `UITextPart`\n- The object's `type` field is used as the `typeName` (defaults to `'object'` if not present)\n- This is useful for rich UI interactions like product selections, quick replies, etc.\n\n### Sending with Files\n\nInclude file attachments with messages:\n\n```tsx\nimport type { FileReference } from '@octavus/react';\n\nasync function handleSend(text: string, files?: FileReference[]) {\n await send(\n 'user-message',\n {\n USER_MESSAGE: text,\n FILES: files, // Array of FileReference\n },\n {\n userMessage: {\n content: text,\n files: files, // Shows files in user message bubble\n },\n },\n );\n}\n```\n\nSee [File Uploads](/docs/client-sdk/file-uploads) for complete upload flow.\n\n## Rendering Messages\n\n### Basic Rendering\n\n```tsx\nfunction MessageList({ messages }: { messages: UIMessage[] }) {\n return (\n <div className=\"space-y-4\">\n {messages.map((msg) => (\n <MessageBubble key={msg.id} message={msg} />\n ))}\n </div>\n );\n}\n\nfunction MessageBubble({ message }: { message: UIMessage }) {\n const isUser = message.role === 
'user';\n\n return (\n <div className={isUser ? 'text-right' : 'text-left'}>\n <div className=\"inline-block p-3 rounded-lg\">\n {message.parts.map((part, i) => (\n <PartRenderer key={i} part={part} />\n ))}\n </div>\n </div>\n );\n}\n```\n\n### Rendering Parts\n\n```tsx\nimport { isOtherThread, type UIMessagePart } from '@octavus/react';\n\nfunction PartRenderer({ part }: { part: UIMessagePart }) {\n // Check if part belongs to a named thread (e.g., \"summary\")\n if (isOtherThread(part)) {\n return <OtherThreadPart part={part} />;\n }\n\n switch (part.type) {\n case 'text':\n return <TextPart part={part} />;\n\n case 'reasoning':\n return (\n <details className=\"text-gray-500\">\n <summary>Thinking...</summary>\n <pre className=\"text-sm\">{part.text}</pre>\n </details>\n );\n\n case 'tool-call':\n return (\n <div className=\"bg-gray-100 p-2 rounded text-sm\">\n \u{1F527} {part.displayName || part.toolName}\n {part.status === 'done' && ' \u2713'}\n {part.status === 'error' && ` \u2717 ${part.error}`}\n </div>\n );\n\n case 'operation':\n return (\n <div className=\"text-gray-500 text-sm\">\n {part.name}\n {part.status === 'done' && ' \u2713'}\n </div>\n );\n\n case 'source':\n return (\n <div className=\"text-blue-500 text-sm\">\u{1F4CE} {part.title || part.url || part.filename}</div>\n );\n\n case 'file':\n // Render images inline, other files as download links\n if (part.mediaType.startsWith('image/')) {\n return (\n <img\n src={part.url}\n alt={part.filename || 'Generated image'}\n className=\"max-w-full rounded-lg\"\n />\n );\n }\n return (\n <a href={part.url} className=\"text-blue-500 text-sm underline\">\n \u{1F4C4} {part.filename || 'Download file'}\n </a>\n );\n\n case 'object':\n // For structured output, render custom UI based on typeName\n // See Structured Output guide for more details\n return <ObjectPartRenderer part={part} />;\n\n case 'step-start':\n return null;\n\n default:\n return null;\n }\n}\n\nfunction TextPart({ part }: { part: 
UITextPart }) {\n return (\n <p>\n {part.text}\n {part.status === 'streaming' && (\n <span className=\"inline-block w-2 h-4 bg-gray-400 animate-pulse ml-1\" />\n )}\n </p>\n );\n}\n```\n\n## Named Threads\n\nContent from named threads (like \"summary\") is identified by the `thread` property. Use the `isOtherThread` helper:\n\n```tsx\nimport { isOtherThread } from '@octavus/react';\n\nfunction PartRenderer({ part }: { part: UIMessagePart }) {\n if (isOtherThread(part)) {\n // Render differently for named threads\n return (\n <div className=\"bg-amber-50 p-2 rounded border border-amber-200\">\n <span className=\"text-amber-600 text-sm\">\n {part.thread}: {part.type === 'text' && part.text}\n </span>\n </div>\n );\n }\n\n // Regular rendering for main thread\n // ...\n}\n```\n\n## Session Restore\n\nWhen restoring a session, fetch messages from your backend and pass them to the hook:\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport, type UIMessage } from '@octavus/react';\n\ninterface ChatProps {\n sessionId: string;\n initialMessages: UIMessage[];\n}\n\nfunction Chat({ sessionId, initialMessages }: ChatProps) {\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n }),\n [sessionId],\n );\n\n // Pass existing messages to restore the conversation\n const { messages } = useOctavusChat({\n transport,\n initialMessages,\n });\n\n // ...\n}\n```\n\nOn your backend, use `agentSessions.getMessages()` to fetch UI-ready messages:\n\n```typescript\n// Server-side\nconst session = await client.agentSessions.getMessages(sessionId);\n// session.messages is UIMessage[] ready for the client\n```\n\n## Callbacks\n\n```tsx\nuseOctavusChat({\n transport,\n onFinish: () => {\n console.log('Stream completed');\n // Scroll to 
bottom, play sound, etc.\n },\n onError: (error) => {\n console.error('Error:', error);\n toast.error('Failed to get response');\n },\n onResourceUpdate: (name, value) => {\n console.log('Resource updated:', name, value);\n },\n});\n```\n",
|
|
887
|
+
content: "\n# Messages\n\nMessages represent the conversation history. The Client SDK tracks messages automatically and provides structured access to their content through typed parts.\n\n## Message Structure\n\n```typescript\ninterface UIMessage {\n id: string;\n role: 'user' | 'assistant';\n parts: UIMessagePart[];\n status: 'streaming' | 'done';\n createdAt: Date;\n sender?: UIMessageSender; // Author of a user message, in multi-user chats\n}\n\ninterface UIMessageSender {\n id?: string;\n name?: string;\n image?: string; // Avatar URL\n}\n```\n\n### Message Parts\n\nMessages contain ordered `parts` that preserve content ordering:\n\n```typescript\ntype UIMessagePart =\n | UITextPart\n | UIReasoningPart\n | UIToolCallPart\n | UIOperationPart\n | UISourcePart\n | UIFilePart\n | UIObjectPart\n | UITodoPart\n | UIWorkerPart\n | UIStepStartPart;\n\n// Text content\ninterface UITextPart {\n type: 'text';\n text: string;\n status: 'streaming' | 'done';\n thread?: string; // For named threads (e.g., \"summary\")\n}\n\n// Extended reasoning/thinking\ninterface UIReasoningPart {\n type: 'reasoning';\n text: string;\n status: 'streaming' | 'done';\n thread?: string;\n}\n\n// Tool execution\ninterface UIToolCallPart {\n type: 'tool-call';\n toolCallId: string;\n toolName: string;\n displayName?: string; // Human-readable name\n args: Record<string, unknown>;\n result?: unknown;\n error?: string;\n status: 'pending' | 'running' | 'done' | 'error' | 'cancelled';\n thread?: string;\n}\n\n// Internal operations (set-resource, serialize-thread)\ninterface UIOperationPart {\n type: 'operation';\n operationId: string;\n name: string;\n operationType: string;\n status: 'running' | 'done';\n thread?: string;\n}\n\n// Source references (from web search, document processing)\ninterface UISourcePart {\n type: 'source';\n sourceType: 'url' | 'document';\n id: string;\n url?: string; // For URL sources\n title?: string;\n mediaType?: string; // For document sources\n filename?: 
string;\n thread?: string;\n}\n\n// Generated files (from image generation, skills, code execution)\ninterface UIFilePart {\n type: 'file';\n id: string;\n mediaType: string; // MIME type (e.g., 'image/png', 'image/webp')\n url: string; // Download/display URL (presigned S3 URL)\n filename?: string;\n size?: number;\n toolCallId?: string; // Present if from a tool call\n thread?: string;\n}\n\n// Structured output (when responseType is used)\ninterface UIObjectPart {\n type: 'object';\n id: string;\n typeName: string; // Type name from protocol (e.g., \"ChatResponse\")\n partial?: unknown; // Partial object while streaming\n object?: unknown; // Final object when done\n status: 'streaming' | 'done' | 'error';\n error?: string;\n thread?: string;\n}\n\n// Structured task list (when the agent uses octavus_todo_write)\ninterface UITodoPart {\n type: 'todo';\n todos: {\n id: string;\n content: string;\n status: 'pending' | 'in_progress' | 'completed' | 'cancelled';\n }[];\n status: 'streaming' | 'done';\n thread?: string;\n}\n\n// Sub-agent execution container (when an agent invokes a worker)\ninterface UIWorkerPart {\n type: 'worker';\n workerId: string;\n workerSlug: string;\n description?: string;\n input?: Record<string, unknown>;\n parts: UIMessagePart[]; // Nested parts from the worker (excluding nested workers)\n output?: unknown;\n error?: string;\n status: 'running' | 'done' | 'error' | 'cancelled';\n}\n\n// Step boundary marker (structural, not rendered visually)\ninterface UIStepStartPart {\n type: 'step-start';\n}\n```\n\n## Sending Messages\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport } from '@octavus/react';\n\nfunction Chat({ sessionId }: { sessionId: string }) {\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: 
options?.signal,\n }),\n }),\n [sessionId],\n );\n\n const { send } = useOctavusChat({ transport });\n\n async function handleSend(text: string) {\n // Add user message to UI and trigger agent\n await send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n }\n\n // ...\n}\n```\n\nThe `send` function:\n\n1. Adds the user message to the UI immediately (if `userMessage` is provided)\n2. Triggers the agent with the specified trigger name and input\n3. Streams the assistant's response back\n\n### Message Content Types\n\nThe `content` field in `userMessage` accepts both strings and objects:\n\n```tsx\n// Text content \u2192 creates a text part\nawait send('user-message', { USER_MESSAGE: text }, { userMessage: { content: text } });\n\n// Object content \u2192 creates an object part (uses `type` field as typeName)\nconst selection = { type: 'product_selection', productId: 'abc123', action: 'select' };\nawait send('user-message', { USER_INPUT: selection }, { userMessage: { content: selection } });\n```\n\nWhen passing an object as `content`:\n\n- The SDK creates a `UIObjectPart` instead of a `UITextPart`\n- The object's `type` field is used as the `typeName` (defaults to `'object'` if not present)\n- This is useful for rich UI interactions like product selections, quick replies, etc.\n\n### Sending with Files\n\nInclude file attachments with messages:\n\n```tsx\nimport type { FileReference } from '@octavus/react';\n\nasync function handleSend(text: string, files?: FileReference[]) {\n await send(\n 'user-message',\n {\n USER_MESSAGE: text,\n FILES: files, // Array of FileReference\n },\n {\n userMessage: {\n content: text,\n files: files, // Shows files in user message bubble\n },\n },\n );\n}\n```\n\nSee [File Uploads](/docs/client-sdk/file-uploads) for complete upload flow.\n\n### Attributing the Sender (Multi-User Chats)\n\nIn conversations shared by several people, pass `sender` so the optimistic bubble shows who sent the message 
immediately:\n\n```tsx\nawait send(\n 'user-message',\n { USER_MESSAGE: text },\n {\n userMessage: { content: text, sender: { id: user.id, name: user.name, image: user.avatarUrl } },\n },\n);\n```\n\nThis `sender` is for instant local display only. For attribution that persists and is visible to other participants, set the authoritative sender server-side on the trigger (see [Server SDK Sessions](/docs/server-sdk/sessions)). The persisted value comes back on `message.sender` from `getMessages()`, so render from `message.sender` and treat the value you passed to `send()` as the optimistic placeholder.\n\n## Rendering Messages\n\n### Basic Rendering\n\n```tsx\nfunction MessageList({ messages }: { messages: UIMessage[] }) {\n return (\n <div className=\"space-y-4\">\n {messages.map((msg) => (\n <MessageBubble key={msg.id} message={msg} />\n ))}\n </div>\n );\n}\n\nfunction MessageBubble({ message }: { message: UIMessage }) {\n const isUser = message.role === 'user';\n\n return (\n <div className={isUser ? 
'text-right' : 'text-left'}>\n <div className=\"inline-block p-3 rounded-lg\">\n {message.parts.map((part, i) => (\n <PartRenderer key={i} part={part} />\n ))}\n </div>\n </div>\n );\n}\n```\n\n### Rendering Parts\n\n```tsx\nimport { isOtherThread, type UIMessagePart } from '@octavus/react';\n\nfunction PartRenderer({ part }: { part: UIMessagePart }) {\n // Check if part belongs to a named thread (e.g., \"summary\")\n if (isOtherThread(part)) {\n return <OtherThreadPart part={part} />;\n }\n\n switch (part.type) {\n case 'text':\n return <TextPart part={part} />;\n\n case 'reasoning':\n return (\n <details className=\"text-gray-500\">\n <summary>Thinking...</summary>\n <pre className=\"text-sm\">{part.text}</pre>\n </details>\n );\n\n case 'tool-call':\n return (\n <div className=\"bg-gray-100 p-2 rounded text-sm\">\n \u{1F527} {part.displayName || part.toolName}\n {part.status === 'done' && ' \u2713'}\n {part.status === 'error' && ` \u2717 ${part.error}`}\n </div>\n );\n\n case 'operation':\n return (\n <div className=\"text-gray-500 text-sm\">\n {part.name}\n {part.status === 'done' && ' \u2713'}\n </div>\n );\n\n case 'source':\n return (\n <div className=\"text-blue-500 text-sm\">\u{1F4CE} {part.title || part.url || part.filename}</div>\n );\n\n case 'file':\n // Render images inline, other files as download links\n if (part.mediaType.startsWith('image/')) {\n return (\n <img\n src={part.url}\n alt={part.filename || 'Generated image'}\n className=\"max-w-full rounded-lg\"\n />\n );\n }\n return (\n <a href={part.url} className=\"text-blue-500 text-sm underline\">\n \u{1F4C4} {part.filename || 'Download file'}\n </a>\n );\n\n case 'object':\n // For structured output, render custom UI based on typeName\n // See Structured Output guide for more details\n return <ObjectPartRenderer part={part} />;\n\n case 'step-start':\n return null;\n\n default:\n return null;\n }\n}\n\nfunction TextPart({ part }: { part: UITextPart }) {\n return (\n <p>\n {part.text}\n 
{part.status === 'streaming' && (\n <span className=\"inline-block w-2 h-4 bg-gray-400 animate-pulse ml-1\" />\n )}\n </p>\n );\n}\n```\n\n## Named Threads\n\nContent from named threads (like \"summary\") is identified by the `thread` property. Use the `isOtherThread` helper:\n\n```tsx\nimport { isOtherThread } from '@octavus/react';\n\nfunction PartRenderer({ part }: { part: UIMessagePart }) {\n if (isOtherThread(part)) {\n // Render differently for named threads\n return (\n <div className=\"bg-amber-50 p-2 rounded border border-amber-200\">\n <span className=\"text-amber-600 text-sm\">\n {part.thread}: {part.type === 'text' && part.text}\n </span>\n </div>\n );\n }\n\n // Regular rendering for main thread\n // ...\n}\n```\n\n## Session Restore\n\nWhen restoring a session, fetch messages from your backend and pass them to the hook:\n\n```tsx\nimport { useMemo } from 'react';\nimport { useOctavusChat, createHttpTransport, type UIMessage } from '@octavus/react';\n\ninterface ChatProps {\n sessionId: string;\n initialMessages: UIMessage[];\n}\n\nfunction Chat({ sessionId, initialMessages }: ChatProps) {\n const transport = useMemo(\n () =>\n createHttpTransport({\n request: (payload, options) =>\n fetch('/api/trigger', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({ sessionId, ...payload }),\n signal: options?.signal,\n }),\n }),\n [sessionId],\n );\n\n // Pass existing messages to restore the conversation\n const { messages } = useOctavusChat({\n transport,\n initialMessages,\n });\n\n // ...\n}\n```\n\nOn your backend, use `agentSessions.getMessages()` to fetch UI-ready messages:\n\n```typescript\n// Server-side\nconst session = await client.agentSessions.getMessages(sessionId);\n// session.messages is UIMessage[] ready for the client\n```\n\n## Callbacks\n\n```tsx\nuseOctavusChat({\n transport,\n onFinish: () => {\n console.log('Stream completed');\n // Scroll to bottom, play sound, etc.\n },\n onError: (error) => 
{\n console.error('Error:', error);\n toast.error('Failed to get response');\n },\n onResourceUpdate: (name, value) => {\n console.log('Resource updated:', name, value);\n },\n});\n```\n",
870 888   excerpt: "Messages Messages represent the conversation history. The Client SDK tracks messages automatically and provides structured access to their content through typed parts. Message Structure Message...",
871 889   order: 2
872 890   },
@@ -1358,7 +1376,7 @@ See [Streaming Events](/docs/server-sdk/streaming#event-types) for the full list
1358 1376   section: "protocol",
1359 1377   title: "Skills",
1360 1378   description: "Using Octavus skills for code execution and specialized capabilities.",
1361      -
content: "\n# Skills\n\nSkills are knowledge packages that enable agents to execute code and generate files. Unlike external tools (which you implement in your backend), skills are self-contained packages with documentation and scripts. By default, skills run in isolated sandbox environments, but they can also run directly on the agent's computer.\n\n## Overview\n\nOctavus Skills provide **provider-agnostic** code execution. They work with any LLM provider (Anthropic, OpenAI, Google) by using explicit tool calls and system prompt injection.\n\n### How Skills Work\n\n1. **Skill Definition**: Skills are defined in the protocol's `skills:` section\n2. **Skill Resolution**: Skills are resolved from available sources (see below)\n3. **Execution**: Code runs in an isolated sandbox (default) or on the agent's computer\n4. **File Generation**: Files saved to `/output/` are automatically captured and made available for download (sandbox skills)\n\n### Skill Sources\n\nSkills come from two sources, visible in the Skills tab of your organization:\n\n| Source | Badge in UI | Visibility | Example |\n| ----------- | ----------- | ------------------------------ | ------------------ |\n| **Octavus** | `Octavus` | Available to all organizations | `qr-code` |\n| **Custom** | None | Private to your organization | `my-company-skill` |\n\nWhen you reference a skill in your protocol, Octavus resolves it from your available skills. 
If you create a custom skill with the same name as an Octavus skill, your custom skill takes precedence.\n\n## Defining Skills\n\nDefine skills in the protocol's `skills:` section:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data and generating reports\n```\n\n### Skill Fields\n\n| Field | Required | Description |\n| ------------- | -------- | ------------------------------------------------------------------------------------- |\n| `display` | No | How to show in UI: `hidden`, `name`, `description`, `stream` (default: `description`) |\n| `description` | No | Custom description shown to users (overrides skill's built-in description) |\n| `execution` | No | Where the skill runs: `sandbox` (default) or `device` |\n\n### Display Modes\n\nThe `display` setting on a skill applies to all tools under that skill namespace. See [Tool Display Modes](/docs/protocol/tools#display-modes) for full details on each mode.\n\n| Mode | Behavior |\n| ------------- | -------------------------------------------------------------------------------------------------------------------- |\n| `hidden` | Skill tools run silently, no UI events emitted |\n| `name` | Shows skill name while executing |\n| `description` | Shows description while executing (default). Result not preserved after page refresh. |\n| `stream` | Full visibility - arguments stream progressively, result shown after execution, result preserved after page refresh. |\n\n## Enabling Skills\n\nAfter defining skills in the `skills:` section, specify which skills are available. 
Skills work in both interactive agents and workers.\n\n### Interactive Agents\n\nReference skills in `agent.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [get-user-account]\n skills: [qr-code]\n agentic: true\n```\n\n### Workers and Named Threads\n\nReference skills per-thread in `start-thread.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nsteps:\n Start thread:\n block: start-thread\n thread: worker\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n maxSteps: 10\n```\n\nThis also works for named threads in interactive agents, allowing different threads to have different skills.\n\n## Skill Tools\n\nWhen skills are enabled, the LLM has access to these tools:\n\n| Tool | Purpose | Availability |\n| --------------------- | ----------------------------------------------- | ------------------------------ |\n| `octavus_skill_read` | Read skill documentation (SKILL.md) | All skills |\n| `octavus_skill_list` | List available scripts in a skill | All skills |\n| `octavus_skill_run` | Execute a pre-built script from a skill | All skills |\n| `octavus_skill_setup` | Install a skill on the device for file browsing | Device skills only |\n| `octavus_code_run` | Execute arbitrary Python/Bash code | Sandbox skills (standard) only |\n| `octavus_file_write` | Create files in the sandbox | Sandbox skills (standard) only |\n| `octavus_file_read` | Read files from the sandbox | Sandbox skills (standard) only |\n\nThe LLM learns about available skills through system prompt injection and can use these tools to interact with skills.\n\nSkills that have [secrets](#skill-secrets) configured run in **secure mode**, where only `octavus_skill_read`, `octavus_skill_list`, and `octavus_skill_run` are available. 
See [Skill Secrets](#skill-secrets) below.\n\n## Device Execution\n\nBy default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer (VM or desktop) instead.\n\n```yaml\nskills:\n deploy-tool:\n display: description\n description: Deploy applications to production\n execution: device\n qr-code:\n display: description\n description: Generating QR codes\n # execution defaults to sandbox\n```\n\n### How Device Skills Work\n\nDevice skills are installed on the agent's computer so the agent can browse their files and run their scripts directly. After attaching a skill via integrations, the agent uses `octavus_skill_setup` to install it on the device. Once installed, the agent can:\n\n- Read the skill's documentation with `octavus_skill_read`\n- List available scripts with `octavus_skill_list`\n- Run pre-built scripts with `octavus_skill_run`\n\nThe generic workspace tools (`octavus_code_run`, `octavus_file_write`, `octavus_file_read`) are **not available** for device skills. 
Instead, the agent uses the device's own shell and filesystem MCP servers to interact with files and run commands.\n\n### Sandbox vs Device Skills\n\n| Aspect | Sandbox (default) | Device |\n| ------------------- | ---------------------------------- | ------------------------------------------------------ |\n| **Environment** | Isolated sandbox | Agent's computer (VM or desktop) |\n| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |\n| **File access** | Via `octavus_file_read/write` | Via device filesystem MCP |\n| **Code execution** | Via `octavus_code_run` | Via device shell MCP |\n| **Isolation** | Fully sandboxed | Runs alongside other device processes |\n| **File output** | `/output/` directory auto-captured | Files written to device filesystem |\n\n### When to Use Device Execution\n\nUse `execution: device` when the skill needs to:\n\n- Access the agent's local filesystem or running processes\n- Use tools or CLIs installed on the device\n- Interact with services running on the device\n- Persist files beyond a single execution cycle\n\n## Example: QR Code Generation\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n agentic: true\n\nhandlers:\n user-message:\n Add message:\n block: add-message\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n\n Respond:\n block: next-message\n```\n\nWhen a user asks \"Create a QR code for octavus.ai\", the LLM will:\n\n1. Recognize the task matches the `qr-code` skill\n2. Call `octavus_skill_read` to learn how to use the skill\n3. Execute code (via `octavus_code_run` or `octavus_skill_run`) to generate the QR code\n4. Save the image to `/output/` in the sandbox\n5. The file is automatically captured and made available for download\n\n## File Output\n\nFiles saved to `/output/` in the sandbox are automatically:\n\n1. 
**Captured** after code execution\n2. **Uploaded** to S3 storage\n3. **Made available** via presigned URLs\n4. **Included** in the message as file parts\n\nFiles persist across page refreshes and are stored in the session's message history.\n\n## Skill Format\n\nSkills follow the [Agent Skills](https://agentskills.io) open standard:\n\n- `SKILL.md` - Required skill documentation with YAML frontmatter\n- `scripts/` - Optional executable code (Python/Bash)\n- `references/` - Optional documentation loaded as needed\n- `assets/` - Optional files used in outputs (templates, images)\n\n### SKILL.md Format\n\n````yaml\n---\nname: qr-code\ndescription: >\n Generate QR codes from text, URLs, or data. Use when the user needs to create\n a QR code for any purpose - sharing links, contact information, WiFi credentials,\n or any text data that should be scannable.\nversion: 1.0.0\nlicense: MIT\nauthor: Octavus Team\n---\n\n# QR Code Generator\n\n## Overview\n\nThis skill creates QR codes from text data using Python...\n\n## Quick Start\n\nGenerate a QR code with Python:\n\n```python\nimport qrcode\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n# ... code to generate QR code ...\n````\n\n## Scripts Reference\n\n### scripts/generate.py\n\nMain script for generating QR codes...\n\n````\n\n### Frontmatter Fields\n\n| Field | Required | Description |\n| ------------- | -------- | ------------------------------------------------------ |\n| `name` | Yes | Skill slug (lowercase, hyphens) |\n| `description` | Yes | What the skill does (shown to the LLM) |\n| `version` | No | Semantic version string |\n| `license` | No | License identifier |\n| `author` | No | Skill author |\n| `secrets` | No | Array of secret declarations (enables secure mode) |\n\n## Best Practices\n\n### 1. 
Clear Descriptions\n\nProvide clear, purpose-driven descriptions:\n\n```yaml\nskills:\n # Good - clear purpose\n qr-code:\n description: Generating QR codes for URLs, contact info, or any text data\n\n # Avoid - vague\n utility:\n description: Does stuff\n````\n\n### 2. When to Use Skills vs Tools\n\n| Use Skills When | Use Tools When |\n| ------------------------ | ---------------------------- |\n| Code execution needed | Simple API calls |\n| File generation | Database queries |\n| Complex calculations | External service integration |\n| Data processing | Authentication required |\n| Provider-agnostic needed | Backend-specific logic |\n\n### 3. Skill Selection\n\nDefine all skills available to this agent in the `skills:` section. Then specify which skills are available for the chat thread in `agent.skills`:\n\n```yaml\n# All skills available to this agent (defined once at protocol level)\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data\n pdf-processor:\n display: description\n description: Processing PDFs\n\n# Skills available for this chat thread\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code, data-analysis] # Skills available for this thread\n```\n\n### 4. 
Display Modes\n\nChoose appropriate display modes based on user experience:\n\n```yaml\nskills:\n # Background processing - hide from user\n data-analysis:\n display: hidden\n\n # User-facing generation - show description\n qr-code:\n display: description\n\n # Interactive progress - stream updates\n report-generation:\n display: stream\n```\n\n## Comparison: Skills vs Tools vs Provider Options\n\n| Feature | Octavus Skills | External Tools | Provider Tools/Skills |\n| ------------------ | --------------------------- | ------------------- | --------------------- |\n| **Execution** | Sandbox or agent's computer | Your backend | Provider servers |\n| **Provider** | Any (agnostic) | N/A | Provider-specific |\n| **Code Execution** | Yes | No | Yes (provider tools) |\n| **File Output** | Yes | No | Yes (provider skills) |\n| **Implementation** | Skill packages | Your code | Built-in |\n| **Cost** | Sandbox + LLM API | Your infrastructure | Included in API |\n\n## Uploading Custom Skills\n\nYou can upload custom skills to your organization using the CLI or the platform UI.\n\n### Via CLI (Recommended)\n\nUse [`octavus skills sync`](/docs/server-sdk/cli#octavus-skills-sync-path) to package and upload a skill directory. 
If the skill has a `.env` file, secrets are pushed alongside the bundle:\n\n```bash\noctavus skills sync ./skills/my-skill\n```\n\n### Skill Directory Structure\n\n```\nmy-skill/\n\u251C\u2500\u2500 SKILL.md # Required: Skill documentation with frontmatter\n\u251C\u2500\u2500 scripts/ # Optional: Executable scripts\n\u2502 \u251C\u2500\u2500 run.py\n\u2502 \u2514\u2500\u2500 requirements.txt\n\u251C\u2500\u2500 references/ # Optional: Additional documentation\n\u251C\u2500\u2500 assets/ # Optional: Templates, images\n\u2514\u2500\u2500 .env # Optional: Secrets (not included in bundle)\n```\n\nOnce uploaded, reference the skill by slug in your protocol:\n\n```yaml\nskills:\n my-skill:\n display: description\n description: Custom analysis tool\n\nagent:\n skills: [my-skill]\n```\n\n## On-Demand Skills\n\nOn-demand skills (`onDemandSkills`) also support the `execution` field:\n\n```yaml\nonDemandSkills:\n display: description\n execution: device\n```\n\nWhen `execution: device` is set on the on-demand skills declaration, any skill attached at runtime via integrations runs on the agent's computer instead of in a sandbox.\n\n## Sandbox Timeout\n\nThe default sandbox timeout is 5 minutes (applies to sandbox skills only). You can configure a custom timeout using `sandboxTimeout` in the agent config or on individual `start-thread` blocks:\n\n```yaml\n# Agent-level timeout (applies to main thread)\nagent:\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 1800000 # 30 minutes (in milliseconds)\n```\n\n```yaml\n# Thread-level timeout (overrides agent-level for this thread)\nsteps:\n Start thread:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 3600000 # 1 hour\n```\n\nThread-level `sandboxTimeout` takes priority over agent-level. Maximum: 1 hour (3,600,000 ms).\n\n## Skill Secrets\n\nSkills can declare secrets they need to function. 
When an organization configures those secrets, the skill runs in **secure mode** with additional isolation.\n\n### Declaring Secrets\n\nAdd a `secrets` array to your SKILL.md frontmatter:\n\n```yaml\n---\nname: github\ndescription: >\n Run GitHub CLI (gh) commands to manage repos, issues, PRs, and more.\nsecrets:\n - name: GITHUB_TOKEN\n description: GitHub personal access token with repo access\n required: true\n - name: GITHUB_ORG\n description: Default GitHub organization\n required: false\n---\n```\n\nEach secret declaration has:\n\n| Field | Required | Description |\n| ------------- | -------- | ----------------------------------------------------------- |\n| `name` | Yes | Environment variable name (uppercase, e.g., `GITHUB_TOKEN`) |\n| `description` | No | Explains what this secret is for (shown in the UI) |\n| `required` | No | Whether the secret is required (defaults to `true`) |\n\nSecret names must match the pattern `^[A-Z_][A-Z0-9_]*$` (uppercase letters, digits, and underscores).\n\n### Configuring Secrets\n\nOrganization admins configure secret values through the skill editor in the platform UI. 
Each organization maintains its own independent set of secrets for each skill.\n\nSecrets are encrypted at rest and only decrypted at execution time.\n\n### Secure Mode\n\nWhen a skill has secrets configured for the organization, it automatically runs in **secure mode**:\n\n- The skill gets its own **isolated sandbox** (separate from other skills)\n- Secrets are injected as **environment variables** available to all scripts\n- Only `octavus_skill_read`, `octavus_skill_list`, and `octavus_skill_run` are available - `octavus_code_run`, `octavus_file_write`, and `octavus_file_read` are blocked\n- Scripts receive input as **JSON via stdin** (using the `input` parameter on `octavus_skill_run`) instead of CLI args\n- All output (stdout/stderr) is **automatically redacted** for secret values before being returned to the LLM\n\n### Writing Scripts for Secure Skills\n\nScripts in secure skills read input from stdin as JSON and access secrets from environment variables:\n\n```python\nimport json\nimport os\nimport sys\n\ninput_data = json.load(sys.stdin)\ntoken = os.environ.get('GITHUB_TOKEN')\n\n# Use the token and input_data to perform the task\n```\n\nFor standard skills (without secrets), scripts receive input as CLI arguments. For secure skills, always use stdin JSON.\n\n## Security\n\nSandbox skills run in isolated environments:\n\n- **No network access** (unless explicitly configured)\n- **No persistent storage** (sandbox destroyed after each `next-message` execution)\n- **File output only** via `/output/` directory\n- **Time limits** enforced (5-minute default, configurable via `sandboxTimeout`)\n- **Secret redaction** - output from secure skills is automatically scanned for secret values\n\nDevice skills run on the agent's computer and share its environment. 
They do not have sandbox isolation but benefit from restricted tool access (only slug-bearing tools are available).\n\n## Next Steps\n\n- [Agent Config](/docs/protocol/agent-config) - Configuring skills in agent settings\n- [Provider Options](/docs/protocol/provider-options) - Anthropic's built-in skills\n- [Skills Advanced Guide](/docs/protocol/skills-advanced) - Best practices and advanced patterns\n",
     1379 +
content: "\n# Skills\n\nSkills are knowledge packages that enable agents to execute code and generate files. Unlike external tools (which you implement in your backend), skills are self-contained packages with documentation and scripts. By default, skills run in isolated sandbox environments, but they can also run directly on the agent's computer.\n\n## Overview\n\nOctavus Skills provide **provider-agnostic** code execution. They work with any LLM provider (Anthropic, OpenAI, Google) by using explicit tool calls and system prompt injection.\n\n### How Skills Work\n\n1. **Skill Definition**: Skills are defined in the protocol's `skills:` section\n2. **Skill Resolution**: Skills are resolved from available sources (see below)\n3. **Execution**: Code runs in an isolated sandbox (default) or on the agent's computer\n4. **File Generation**: Files saved to `/output/` are automatically captured and made available for download (sandbox skills)\n\n### Skill Sources\n\nSkills come from two sources, visible in the Skills tab of your organization:\n\n| Source | Badge in UI | Visibility | Example |\n| ----------- | ----------- | ------------------------------ | ------------------ |\n| **Octavus** | `Octavus` | Available to all organizations | `qr-code` |\n| **Custom** | None | Private to your organization | `my-company-skill` |\n\nWhen you reference a skill in your protocol, Octavus resolves it from your available skills. 
If you create a custom skill with the same name as an Octavus skill, your custom skill takes precedence.\n\n## Defining Skills\n\nDefine skills in the protocol's `skills:` section:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data and generating reports\n```\n\n### Skill Fields\n\n| Field | Required | Description |\n| ------------- | -------- | ------------------------------------------------------------------------------------- |\n| `display` | No | How to show in UI: `hidden`, `name`, `description`, `stream` (default: `description`) |\n| `description` | No | Custom description shown to users (overrides skill's built-in description) |\n| `execution` | No | Where the skill runs: `sandbox` (default) or `device` |\n\n### Display Modes\n\nThe `display` setting on a skill applies to all tools under that skill namespace. See [Tool Display Modes](/docs/protocol/tools#display-modes) for full details on each mode.\n\n| Mode | Behavior |\n| ------------- | -------------------------------------------------------------------------------------------------------------------- |\n| `hidden` | Skill tools run silently, no UI events emitted |\n| `name` | Shows skill name while executing |\n| `description` | Shows description while executing (default). Result not preserved after page refresh. |\n| `stream` | Full visibility - arguments stream progressively, result shown after execution, result preserved after page refresh. |\n\n## Enabling Skills\n\nAfter defining skills in the `skills:` section, specify which skills are available. 
Skills work in both interactive agents and workers.\n\n### Interactive Agents\n\nReference skills in `agent.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [get-user-account]\n skills: [qr-code]\n agentic: true\n```\n\n### Workers and Named Threads\n\nReference skills per-thread in `start-thread.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nsteps:\n Start thread:\n block: start-thread\n thread: worker\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n maxSteps: 10\n```\n\nThis also works for named threads in interactive agents, allowing different threads to have different skills.\n\n## Skill Tools\n\nWhen skills are enabled, the LLM has access to these tools:\n\n| Tool | Purpose | Availability |\n| --------------------- | ----------------------------------------------- | ------------------------------ |\n| `octavus_skill_read` | Read skill documentation (SKILL.md) | All skills |\n| `octavus_skill_list` | List available scripts in a skill | All skills |\n| `octavus_skill_run` | Execute a pre-built script from a skill | All skills |\n| `octavus_skill_setup` | Install a skill on the device for file browsing | Device skills only |\n| `octavus_code_run` | Execute arbitrary Python/Bash code | Sandbox skills (standard) only |\n| `octavus_file_write` | Create files in the sandbox | Sandbox skills (standard) only |\n| `octavus_file_read` | Read files from the sandbox | Sandbox skills (standard) only |\n\nThe LLM learns about available skills through system prompt injection and can use these tools to interact with skills.\n\nSkills that have [secrets](#skill-secrets) configured run in **secure mode**, where only `octavus_skill_read`, `octavus_skill_list`, and `octavus_skill_run` are available. 
See [Skill Secrets](#skill-secrets) below.\n\n## Device Execution\n\nBy default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer instead.\n\n```yaml\nskills:\n deploy-tool:\n display: description\n description: Deploy applications to production\n execution: device\n qr-code:\n display: description\n description: Generating QR codes\n # execution defaults to sandbox\n```\n\n### How Device Skills Work\n\nDevice skills are installed on the agent's computer so the agent can browse their files and run their scripts directly. After attaching a skill via integrations, the agent uses `octavus_skill_setup` to install it on the device. Once installed, the agent can:\n\n- Read the skill's documentation with `octavus_skill_read`\n- List available scripts with `octavus_skill_list`\n- Run pre-built scripts with `octavus_skill_run`\n\nThe generic workspace tools (`octavus_code_run`, `octavus_file_write`, `octavus_file_read`) are **not available** for device skills. 
Instead, the agent uses the device's own shell and filesystem MCP servers to interact with files and run commands.\n\n### Sandbox vs Device Skills\n\n| Aspect | Sandbox (default) | Device |\n| ------------------- | ---------------------------------- | ------------------------------------------------------ |\n| **Environment** | Isolated sandbox | The agent's computer |\n| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |\n| **File access** | Via `octavus_file_read/write` | Via device filesystem MCP |\n| **Code execution** | Via `octavus_code_run` | Via device shell MCP |\n| **Isolation** | Fully sandboxed | Runs alongside other device processes |\n| **File output** | `/output/` directory auto-captured | Files written to device filesystem |\n\n### When to Use Device Execution\n\nUse `execution: device` when the skill needs to:\n\n- Access the agent's local filesystem or running processes\n- Use tools or CLIs installed on the device\n- Interact with services running on the device\n- Persist files beyond a single execution cycle\n\n## Example: QR Code Generation\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n agentic: true\n\nhandlers:\n user-message:\n Add message:\n block: add-message\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n\n Respond:\n block: next-message\n```\n\nWhen a user asks \"Create a QR code for octavus.ai\", the LLM will:\n\n1. Recognize the task matches the `qr-code` skill\n2. Call `octavus_skill_read` to learn how to use the skill\n3. Execute code (via `octavus_code_run` or `octavus_skill_run`) to generate the QR code\n4. Save the image to `/output/` in the sandbox\n5. The file is automatically captured and made available for download\n\n## File Output\n\nFiles saved to `/output/` in the sandbox are automatically:\n\n1. **Captured** after code execution\n2. 
**Uploaded** to S3 storage\n3. **Made available** via presigned URLs\n4. **Included** in the message as file parts\n\nFiles persist across page refreshes and are stored in the session's message history.\n\n## Skill Format\n\nSkills follow the [Agent Skills](https://agentskills.io) open standard:\n\n- `SKILL.md` - Required skill documentation with YAML frontmatter\n- `scripts/` - Optional executable code (Python/Bash)\n- `references/` - Optional documentation loaded as needed\n- `assets/` - Optional files used in outputs (templates, images)\n\n### SKILL.md Format\n\n````yaml\n---\nname: qr-code\ndescription: >\n Generate QR codes from text, URLs, or data. Use when the user needs to create\n a QR code for any purpose - sharing links, contact information, WiFi credentials,\n or any text data that should be scannable.\nversion: 1.0.0\nlicense: MIT\nauthor: Octavus Team\ncategory: Productivity\n---\n\n# QR Code Generator\n\n## Overview\n\nThis skill creates QR codes from text data using Python...\n\n## Quick Start\n\nGenerate a QR code with Python:\n\n```python\nimport qrcode\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n# ... code to generate QR code ...\n````\n\n## Scripts Reference\n\n### scripts/generate.py\n\nMain script for generating QR codes...\n\n````\n\n### Frontmatter Fields\n\n| Field | Required | Description |\n| ------------- | -------- | ------------------------------------------------------ |\n| `name` | Yes | Skill slug (lowercase, hyphens) |\n| `description` | Yes | What the skill does (shown to the LLM) |\n| `version` | No | Semantic version string |\n| `license` | No | License identifier |\n| `author` | No | Skill author |\n| `category` | No | Display category used to group and filter skills in the UI |\n| `secrets` | No | Array of secret declarations (enables secure mode) |\n\n## Best Practices\n\n### 1. 
Clear Descriptions\n\nProvide clear, purpose-driven descriptions:\n\n```yaml\nskills:\n # Good - clear purpose\n qr-code:\n description: Generating QR codes for URLs, contact info, or any text data\n\n # Avoid - vague\n utility:\n description: Does stuff\n````\n\n### 2. When to Use Skills vs Tools\n\n| Use Skills When | Use Tools When |\n| ------------------------ | ---------------------------- |\n| Code execution needed | Simple API calls |\n| File generation | Database queries |\n| Complex calculations | External service integration |\n| Data processing | Authentication required |\n| Provider-agnostic needed | Backend-specific logic |\n\n### 3. Skill Selection\n\nDefine all skills available to this agent in the `skills:` section. Then specify which skills are available for the chat thread in `agent.skills`:\n\n```yaml\n# All skills available to this agent (defined once at protocol level)\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data\n pdf-processor:\n display: description\n description: Processing PDFs\n\n# Skills available for this chat thread\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code, data-analysis] # Skills available for this thread\n```\n\n### 4. 
Display Modes\n\nChoose appropriate display modes based on user experience:\n\n```yaml\nskills:\n # Background processing - hide from user\n data-analysis:\n display: hidden\n\n # User-facing generation - show description\n qr-code:\n display: description\n\n # Interactive progress - stream updates\n report-generation:\n display: stream\n```\n\n## Comparison: Skills vs Tools vs Provider Options\n\n| Feature | Octavus Skills | External Tools | Provider Tools/Skills |\n| ------------------ | --------------------------- | ------------------- | --------------------- |\n| **Execution** | Sandbox or agent's computer | Your backend | Provider servers |\n| **Provider** | Any (agnostic) | N/A | Provider-specific |\n| **Code Execution** | Yes | No | Yes (provider tools) |\n| **File Output** | Yes | No | Yes (provider skills) |\n| **Implementation** | Skill packages | Your code | Built-in |\n| **Cost** | Sandbox + LLM API | Your infrastructure | Included in API |\n\n## Uploading Custom Skills\n\nYou can upload custom skills to your organization using the CLI or the platform UI.\n\n### Via CLI (Recommended)\n\nUse [`octavus skills sync`](/docs/server-sdk/cli#octavus-skills-sync-path) to package and upload a skill directory. 
If the skill has a `.env` file, secrets are pushed alongside the bundle:\n\n```bash\noctavus skills sync ./skills/my-skill\n```\n\n### Skill Directory Structure\n\n```\nmy-skill/\n\u251C\u2500\u2500 SKILL.md # Required: Skill documentation with frontmatter\n\u251C\u2500\u2500 scripts/ # Optional: Executable scripts\n\u2502 \u251C\u2500\u2500 run.py\n\u2502 \u2514\u2500\u2500 requirements.txt\n\u251C\u2500\u2500 references/ # Optional: Additional documentation\n\u251C\u2500\u2500 assets/ # Optional: Templates, images\n\u2514\u2500\u2500 .env # Optional: Secrets (not included in bundle)\n```\n\nOnce uploaded, reference the skill by slug in your protocol:\n\n```yaml\nskills:\n my-skill:\n display: description\n description: Custom analysis tool\n\nagent:\n skills: [my-skill]\n```\n\n## On-Demand Skills\n\nOn-demand skills (`onDemandSkills`) also support the `execution` field:\n\n```yaml\nonDemandSkills:\n display: description\n execution: device\n```\n\nWhen `execution: device` is set on the on-demand skills declaration, any skill attached at runtime via integrations runs on the agent's computer instead of in a sandbox.\n\n## Sandbox Timeout\n\nThe default sandbox timeout is 5 minutes (applies to sandbox skills only). You can configure a custom timeout using `sandboxTimeout` in the agent config or on individual `start-thread` blocks:\n\n```yaml\n# Agent-level timeout (applies to main thread)\nagent:\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 1800000 # 30 minutes (in milliseconds)\n```\n\n```yaml\n# Thread-level timeout (overrides agent-level for this thread)\nsteps:\n Start thread:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 3600000 # 1 hour\n```\n\nThread-level `sandboxTimeout` takes priority over agent-level. Maximum: 1 hour (3,600,000 ms).\n\n## Skill Secrets\n\nSkills can declare secrets they need to function. 
When an organization configures those secrets, the skill runs in **secure mode** with additional isolation.\n\n### Declaring Secrets\n\nAdd a `secrets` array to your SKILL.md frontmatter:\n\n```yaml\n---\nname: github\ndescription: >\n Run GitHub CLI (gh) commands to manage repos, issues, PRs, and more.\nsecrets:\n - name: GITHUB_TOKEN\n description: GitHub personal access token with repo access\n required: true\n - name: GITHUB_ORG\n description: Default GitHub organization\n required: false\n---\n```\n\nEach secret declaration has:\n\n| Field | Required | Description |\n| ------------- | -------- | ----------------------------------------------------------- |\n| `name` | Yes | Environment variable name (uppercase, e.g., `GITHUB_TOKEN`) |\n| `description` | No | Explains what this secret is for (shown in the UI) |\n| `required` | No | Whether the secret is required (defaults to `true`) |\n\nSecret names must match the pattern `^[A-Z_][A-Z0-9_]*$` (uppercase letters, digits, and underscores).\n\n### Configuring Secrets\n\nOrganization admins configure secret values through the skill editor in the platform UI. 
Each organization maintains its own independent set of secrets for each skill.\n\nSecrets are encrypted at rest and only decrypted at execution time.\n\n### Secure Mode\n\nWhen a skill has secrets configured for the organization, it automatically runs in **secure mode**:\n\n- The skill gets its own **isolated sandbox** (separate from other skills)\n- Secrets are injected as **environment variables** available to all scripts\n- Only `octavus_skill_read`, `octavus_skill_list`, and `octavus_skill_run` are available - `octavus_code_run`, `octavus_file_write`, and `octavus_file_read` are blocked\n- Scripts receive input as **JSON via stdin** (using the `input` parameter on `octavus_skill_run`) instead of CLI args\n- All output (stdout/stderr) is **automatically redacted** for secret values before being returned to the LLM\n\n### Writing Scripts for Secure Skills\n\nScripts in secure skills read input from stdin as JSON and access secrets from environment variables:\n\n```python\nimport json\nimport os\nimport sys\n\ninput_data = json.load(sys.stdin)\ntoken = os.environ.get('GITHUB_TOKEN')\n\n# Use the token and input_data to perform the task\n```\n\nFor standard skills (without secrets), scripts receive input as CLI arguments. For secure skills, always use stdin JSON.\n\n## Security\n\nSandbox skills run in isolated environments:\n\n- **No network access** (unless explicitly configured)\n- **No persistent storage** (sandbox destroyed after each `next-message` execution)\n- **File output only** via `/output/` directory\n- **Time limits** enforced (5-minute default, configurable via `sandboxTimeout`)\n- **Secret redaction** - output from secure skills is automatically scanned for secret values\n\nDevice skills run on the agent's computer and share its environment. 
They do not have sandbox isolation but benefit from restricted tool access (only slug-bearing tools are available).\n\n## Next Steps\n\n- [Agent Config](/docs/protocol/agent-config) - Configuring skills in agent settings\n- [Provider Options](/docs/protocol/provider-options) - Anthropic's built-in skills\n- [Skills Advanced Guide](/docs/protocol/skills-advanced) - Best practices and advanced patterns\n",
|
|
1362
1380
|
excerpt: "Skills Skills are knowledge packages that enable agents to execute code and generate files. Unlike external tools (which you implement in your backend), skills are self-contained packages with...",
|
|
1363
1381
|
order: 5
|
|
1364
1382
|
},
|
|
@@ -1376,8 +1394,8 @@ See [Streaming Events](/docs/server-sdk/streaming#event-types) for the full list
|
|
|
1376
1394
|
section: "protocol",
|
|
1377
1395
|
title: "Agent Config",
|
|
1378
1396
|
description: "Configuring the agent model and behavior.",
|
|
1379
|
-
content: '\n# Agent Config\n\nThe `agent` section configures the LLM model, system prompt, tools, and behavior.\n\n## Basic Configuration\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system # References prompts/system.md\n tools: [get-user-account] # Available tools\n mcpServers: [figma, browser] # MCP server connections\n skills: [qr-code] # Available skills\n references: [api-guidelines] # On-demand context documents\n```\n\n## Configuration Options\n\n| Field | Required | Description |\n| ---------------- | -------- | ---------------------------------------------------------------------------------------- |\n| `model` | Yes | Model identifier or variable reference |\n| `backupModel` | No | Backup model for automatic failover on provider errors |\n| `system` | Yes | System prompt filename (without .md) |\n| `input` | No | Variables to pass to the system prompt |\n| `tools` | No | List of tools the LLM can call |\n| `mcpServers` | No | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers)) |\n| `skills` | No | List of Octavus skills the LLM can use |\n| `references` | No | List of references the LLM can fetch on demand |\n| `sandboxTimeout` | No | Skill sandbox timeout in ms (default: 5 min, max: 1 hour) |\n| `imageModel` | No | Image generation model (enables agentic image generation) |\n| `webSearch` | No | Enable built-in web search tool (provider-agnostic) |\n| `agentic` | No | Allow multiple tool call cycles |\n| `maxSteps` | No | Maximum agentic steps (default: 10) - literal or variable reference |\n| `temperature` | No | Model temperature (0-2), `"off"`, or a variable reference |\n| `thinking` | No | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |\n| `cache` | No | Prompt caching mode: `auto` (default), `extended`, or `off` |\n| `anthropic` | No | Anthropic-specific options (tools, skills) |\n\n## Models\n\nSpecify models in `provider/model-id` format. 
Any model supported by the provider\'s SDK will work.\n\n### Supported Providers\n\n| Provider | Format | Examples |\n| --------- | ---------------------- | -------------------------------------------------------------------------------------------------- |\n| Anthropic | `anthropic/{model-id}` | `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-sonnet-4-5`, `claude-haiku-4-5` |\n| Google | `google/{model-id}` | `gemini-3-pro-preview`, `gemini-3-flash-preview`, `gemini-2.5-flash` |\n| OpenAI | `openai/{model-id}` | `gpt-5`, `gpt-4o`, `o4-mini`, `o3`, `o3-mini`, `o1` |\n\n### Examples\n\n```yaml\n# Anthropic Claude 4.5\nagent:\n model: anthropic/claude-sonnet-4-5\n\n# Google Gemini 3\nagent:\n model: google/gemini-3-flash-preview\n\n# OpenAI GPT-5\nagent:\n model: openai/gpt-5\n\n# OpenAI reasoning models\nagent:\n model: openai/o3-mini\n```\n\n> **Note**: Model IDs are passed directly to the provider SDK. Check the provider\'s documentation for the latest available models.\n\n### Dynamic Model Selection\n\nThe model field can also reference an input variable, allowing consumers to choose the model when creating a session:\n\n```yaml\ninput:\n MODEL:\n type: string\n description: The LLM model to use\n\nagent:\n model: MODEL # Resolved from session input\n system: system\n```\n\nWhen creating a session, pass the model:\n\n```typescript\nconst sessionId = await client.agentSessions.create(\'my-agent\', {\n MODEL: \'anthropic/claude-sonnet-4-5\',\n});\n```\n\nThis enables:\n\n- **Multi-provider support** - Same agent works with different providers\n- **A/B testing** - Test different models without protocol changes\n- **User preferences** - Let users choose their preferred model\n\nThe model value is validated at runtime to ensure it\'s in the correct `provider/model-id` format.\n\n> **Note**: When using dynamic models, provider-specific options (like `anthropic:`) may not apply if the model resolves to a different provider.\n\n## Backup 
Model\n\nConfigure a fallback model that activates automatically when the primary model encounters a transient provider error (rate limits, outages, timeouts):\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: openai/gpt-4o\n system: system\n```\n\nWhen a provider error occurs, the system retries once with the backup model. If the backup also fails, the original error is returned.\n\n**Key behaviors:**\n\n- Only transient provider errors trigger fallback - authentication and validation errors are not retried\n- Provider-specific options (like `anthropic:`) are only forwarded to the backup model if it uses the same provider\n- For streaming responses, fallback only occurs if no content has been sent to the client yet\n\nLike `model`, `backupModel` supports variable references:\n\n```yaml\ninput:\n BACKUP_MODEL:\n type: string\n description: Fallback model for provider errors\n\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: BACKUP_MODEL\n system: system\n```\n\n> **Tip**: Use a different provider for your backup model (e.g., primary on Anthropic, backup on OpenAI) to maximize resilience against single-provider outages.\n\n## System Prompt\n\nThe system prompt sets the agent\'s persona and instructions. The `input` field controls which variables are available to the prompt - only variables listed in `input` are interpolated.\n\n```yaml\nagent:\n system: system # Uses prompts/system.md\n input:\n - COMPANY_NAME\n - PRODUCT_NAME\n```\n\nVariables in `input` can come from `protocol.input`, `protocol.resources`, or `protocol.variables`.\n\n### Input Mapping Formats\n\n```yaml\n# Array format (same name)\ninput:\n - COMPANY_NAME\n - PRODUCT_NAME\n\n# Array format (rename)\ninput:\n - CONTEXT: CONVERSATION_SUMMARY # Prompt sees CONTEXT, value comes from CONVERSATION_SUMMARY\n\n# Object format (rename)\ninput:\n CONTEXT: CONVERSATION_SUMMARY\n```\n\nThe left side (label) is what the prompt sees. 
The right side (source) is where the value comes from.\n\n### Example\n\n`prompts/system.md`:\n\n```markdown\nYou are a friendly support agent for {{COMPANY_NAME}}.\n\n## Your Role\n\nHelp users with questions about {{PRODUCT_NAME}}.\n\n## Guidelines\n\n- Be helpful and professional\n- If you can\'t help, offer to escalate\n- Never share internal information\n```\n\n## Agentic Mode\n\nEnable multi-step tool calling:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [get-user-account, search-docs, create-ticket]\n agentic: true # LLM can call multiple tools\n maxSteps: 10 # Limit cycles to prevent runaway\n```\n\n**How it works:**\n\n1. LLM receives user message\n2. LLM decides to call a tool\n3. Tool executes, result returned to LLM\n4. LLM decides if more tools needed\n5. Repeat until LLM responds or maxSteps reached\n\n## Extended Thinking\n\nEnable extended reasoning for complex tasks:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n thinking: medium # low | medium | high | max\n```\n\n| Level | Use Case |\n| -------- | ---------------------------------- |\n| `low` | Simple reasoning |\n| `medium` | Moderate complexity |\n| `high` | Complex analysis |\n| `max` | Maximum reasoning budget available |\n\nThinking content streams to the UI and can be displayed to users.\n\n### How levels are applied\n\nEach provider translates `thinking` into its own reasoning controls:\n\n| Provider | Level mapping |\n| -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |\n| Anthropic 4.6+ (`claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`) | Adaptive thinking - the model decides how much to reason, guided by `effort: low / medium / high / max` |\n| Anthropic older (4.5 and earlier) | Fixed token budgets: `low` ~5,000, `medium` ~10,000, `high` ~20,000, `max` ~40,000 |\n| OpenAI (GPT-5.x, o-series) | 
`reasoningEffort: low / medium / high` (`max` maps to `high`) |\n| Google (Gemini 3.x) | `thinkingLevel: low / high` (`medium` rounds up to `high`) |\n| Google (Gemini 1.x / 2.x) | Token budgets: `low` 1,024, `medium` 8,192, `high` 24,576, `max` 65,536 |\n| OpenRouter | Unified `reasoning.max_tokens` (translated upstream) |\n| Vercel AI Gateway | Forwards the underlying provider\'s options |\n\n## Prompt Caching\n\nProviders charge less for tokens served from their prompt cache (often 10% of the uncached rate). Octavus exposes a single `cache` field that picks the right retention policy per provider, so the stable prefix of your agent - tools, system prompt, and historical messages - gets billed at the cache-read rate on repeat requests.\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n cache: auto # auto (default) | extended | off\n```\n\n| Mode | Behavior | When to use |\n| ---------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |\n| `auto` | Short-TTL caching. Default when omitted. | Most agents. Free on all supported providers and pays for itself within the same session. |\n| `extended` | Long-TTL caching. Trades a higher cache-write cost for much longer residency. | Agents triggered with gaps (daily reports, on-call assistants) where the prefix is reused across hours. |\n| `off` | No opt-in caching emitted. | When you explicitly want to skip caching - e.g. debugging a non-deterministic prefix. 
|\n\n### Per-provider behavior\n\nThe `cache` field is provider-agnostic at the protocol level - each provider translates it into its own cache retention policy:\n\n| Provider | `auto` TTL | `extended` TTL |\n| --------- | ------------------------- | -------------- |\n| Anthropic | 5 minutes | 1 hour |\n| OpenAI | in-memory (~5\u201310 minutes) | 24 hours |\n| Google | Implicit (Gemini 2.5+) | Implicit |\n\nOn `off`, Octavus emits no explicit cache options. Providers that auto-cache (OpenAI on prefixes \u2265 1,024 tokens, Gemini 2.5+) may still cache transparently - `off` just disables Octavus\'s opt-in behavior.\n\n### Threads don\'t inherit\n\nNamed threads (created with `start-thread`) read their own `cache` field independently - they **do not** inherit the agent\'s cache value:\n\n```yaml\nagent:\n cache: extended # 1-hour TTL on the main thread\n\nhandlers:\n summarize:\n Start summary:\n block: start-thread\n thread: summary\n # No cache field \u2192 defaults to \'auto\' (5-minute TTL), NOT \'extended\'\n system: summary-system\n```\n\nThis is intentional: named threads are often used for short, one-shot work (summarization, classification) where the long TTL would be wasted. Set `cache` explicitly on `start-thread` when you do want it.\n\n### Cost trade-offs\n\n- **Cache reads** are always much cheaper than uncached input on any provider - caching is effectively free if your prefix is stable.\n- **Cache writes** on Anthropic cost ~1.25\xD7 input for `auto` and 2\xD7 input for `extended`. 
OpenAI and Google don\'t charge separately for cache writes.\n- Use `extended` only when the same prefix is genuinely reused across sessions that span hours; otherwise the higher write cost dominates the savings.\n\n## Skills\n\nEnable Octavus skills for code execution and file generation:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code] # Enable skills\n agentic: true\n```\n\nSkills provide provider-agnostic code execution in isolated sandboxes. When enabled, the LLM can execute Python/Bash code, run skill scripts, and generate files.\n\nSee [Skills](/docs/protocol/skills) for full documentation.\n\n## References\n\nEnable on-demand context loading via reference documents:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n references: [api-guidelines, error-codes]\n agentic: true\n```\n\nReferences are markdown files stored in the agent\'s `references/` directory. When enabled, the LLM can list available references and read their content using `octavus_reference_list` and `octavus_reference_read` tools.\n\nSee [References](/docs/protocol/references) for full documentation.\n\n## Image Generation\n\nEnable the LLM to generate images autonomously:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n imageModel: google/gemini-2.5-flash-image\n agentic: true\n```\n\nWhen `imageModel` is configured, the `octavus_generate_image` tool becomes available. The LLM can decide when to generate images based on user requests. 
The tool supports both text-to-image generation and image editing/transformation using reference images.\n\n### Supported Image Providers\n\n| Provider | Model Types | Examples |\n| -------- | --------------------------------------- | --------------------------------------------------------- |\n| OpenAI | Dedicated image models | `gpt-image-1` |\n| Google | Gemini native (contains "image") | `gemini-2.5-flash-image`, `gemini-3-flash-image-generate` |\n| Google | Imagen dedicated (starts with "imagen") | `imagen-4.0-generate-001` |\n\n> **Note**: Google has two image generation approaches. Gemini "native" models (containing "image" in the ID) generate images using the language model API with `responseModalities`. Imagen models (starting with "imagen") use a dedicated image generation API.\n\n### Image Sizes\n\nThe tool supports three image sizes:\n\n- `1024x1024` (default) - Square\n- `1792x1024` - Landscape (16:9)\n- `1024x1792` - Portrait (9:16)\n\n### Image Editing with Reference Images\n\nBoth the agentic tool and the `generate-image` block support reference images for editing and transformation. 
When reference images are provided, the prompt describes how to modify or use those images.\n\n| Provider | Models | Reference Image Support |\n| -------- | -------------------------------- | ----------------------- |\n| OpenAI | `gpt-image-1` | Yes |\n| Google | Gemini native (`gemini-*-image`) | Yes |\n| Google | Imagen (`imagen-*`) | No |\n\n### Agentic vs Deterministic\n\nUse `imageModel` in agent config when:\n\n- The LLM should decide when to generate or edit images\n- Users ask for images in natural language\n\nUse `generate-image` block (see [Handlers](/docs/protocol/handlers#generate-image)) when:\n\n- You want explicit control over image generation or editing\n- Building prompt engineering pipelines\n- Images are generated at specific handler steps\n\n## Web Search\n\nEnable the LLM to search the web for current information:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n webSearch: true\n agentic: true\n```\n\nWhen `webSearch` is enabled, the `octavus_web_search` tool becomes available. The LLM can decide when to search the web based on the conversation. Search results include source URLs that are emitted as citations in the UI.\n\nThis is a **provider-agnostic** built-in tool - it works with any LLM provider (Anthropic, Google, OpenAI, etc.). For Anthropic\'s own web search implementation, see [Provider Options](/docs/protocol/provider-options).\n\nUse cases:\n\n- Current events and real-time data\n- Fact verification and documentation lookups\n- Any information that may have changed since the model\'s training\n\n## TODO List\n\nEnable the LLM to maintain a structured task list while it works:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n todoList: true\n agentic: true\n```\n\nWhen `todoList` is enabled, the `octavus_todo_write` tool becomes available. 
The LLM creates and updates a list of items - each with `id`, `content`, and `status` (`pending`, `in_progress`, `completed`, `cancelled`) - and the platform emits a `todo-update` stream event with the resolved snapshot. The Client SDK accumulates updates into a single `UITodoPart` per assistant message, so consumers render an evolving "Plan" card without managing state themselves.\n\nThe list persists across messages: the LLM can use `merge=true` to update items by id (sending only the changed fields), or `merge=false` to replace the list entirely.\n\nUse cases:\n\n- Multi-step tasks where the user benefits from seeing progress\n- Long-running agentic loops that should communicate intent\n- Workflows where the agent plans before acting\n\n## Temperature\n\nControl response randomness:\n\n```yaml\nagent:\n model: openai/gpt-4o\n temperature: 0.7 # 0 = deterministic, 2 = creative\n```\n\n**Guidelines:**\n\n- `0 - 0.3`: Factual, consistent responses\n- `0.4 - 0.7`: Balanced (good default)\n- `0.8 - 1.2`: Creative, varied responses\n- `> 1.2`: Very creative (may be inconsistent)\n\n## Dynamic Configuration\n\nLike `model`, the `temperature`, `thinking`, and `maxSteps` fields can also reference an input variable. 
Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:\n\n```yaml\ninput:\n TEMPERATURE:\n type: number\n description: Override temperature (0-2)\n optional: true\n THINKING:\n type: string\n description: Override thinking effort (low/medium/high/max, or "off")\n optional: true\n MAX_STEPS:\n type: integer\n description: Override max agentic steps\n optional: true\n\nagent:\n model: anthropic/claude-sonnet-4-5\n temperature: TEMPERATURE\n thinking: THINKING\n maxSteps: MAX_STEPS\n system: system\n```\n\nWhen creating a session, pass the values in their natural type:\n\n```typescript\nconst sessionId = await client.agentSessions.create(\'my-agent\', {\n TEMPERATURE: 0.7,\n THINKING: \'medium\',\n MAX_STEPS: 5,\n});\n```\n\n### Accepted values\n\nThe resolver accepts the natural type for each field, plus a string fallback so consumers can pass values from form inputs without coercing first.\n\n| Field | Suggested input type | Value at session creation |\n| ------------- | ------------------------------------------ | -------------------------------------------------- |\n| `temperature` | `number` (or `string` for `"off"` support) | A number `0`-`2`, a numeric string, or `"off"` |\n| `thinking` | `string` | `"low"`, `"medium"`, `"high"`, `"max"`, or `"off"` |\n| `maxSteps` | `integer` (or `string`) | A positive integer or a positive integer string |\n\nThe protocol\'s `input:` declaration enforces what the consumer can pass. Pick `type: number` / `type: integer` if you want native numeric overrides; pick `type: string` (or `type: unknown`) if you also need to pass the `"off"` sentinel for `temperature`.\n\n### Explicit "off" vs not set\n\n`temperature` and `thinking` accept an explicit `"off"` value to disable the field at session creation. 
This is different from omitting the variable:\n\n- **Variable not provided** -> the field is unset; the provider uses its default behavior\n- **Variable provided as `"off"`** -> the field is explicitly disabled (no temperature emitted, reasoning disabled)\n\nThe distinction matters because `temperature` and `thinking` are mutually exclusive at the provider level - several providers ignore temperature when reasoning is enabled. Use `"off"` to opt one out so the other takes effect.\n\n### Validation\n\nVariable references are caught at protocol validation time. If `temperature: TEMPERATURE` is declared but `TEMPERATURE` is missing from `input:` or `variables:`, the validator surfaces the error in the dashboard before the agent runs.\n\n## Provider Options\n\nEnable provider-specific features like Anthropic\'s built-in tools and skills:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n anthropic:\n tools:\n web-search:\n display: description\n description: Searching the web\n skills:\n pdf:\n type: anthropic\n description: Processing PDF\n```\n\nProvider options are validated against the model - using `anthropic:` with a non-Anthropic model will fail validation.\n\nSee [Provider Options](/docs/protocol/provider-options) for full documentation.\n\n## Thread-Specific Config\n\nOverride config for named threads:\n\n```yaml\nhandlers:\n request-human:\n Start summary thread:\n block: start-thread\n thread: summary\n model: anthropic/claude-sonnet-4-5 # Different model\n backupModel: openai/gpt-4o # Failover model\n thinking: low # Different thinking\n cache: off # Different cache mode (does not inherit from agent)\n maxSteps: 1 # Limit tool calls\n system: escalation-summary # Different prompt\n mcpServers: [figma, browser] # Thread-specific MCP servers\n skills: [data-analysis] # Thread-specific skills\n references: [escalation-policy] # Thread-specific references\n imageModel: google/gemini-2.5-flash-image # Thread-specific image model\n webSearch: true # 
Thread-specific web search\n todoList: true # Thread-specific task list\n```\n\nEach thread can have its own model, backup model, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol\'s `skills:` section. References must exist in the agent\'s `references/` directory. Workers use this same pattern since they don\'t have a global `agent:` section.\n\n## Full Example\n\n```yaml\ninput:\n COMPANY_NAME: { type: string }\n PRODUCT_NAME: { type: string }\n USER_ID: { type: string, optional: true }\n\nresources:\n CONVERSATION_SUMMARY:\n type: string\n default: \'\'\n\ntools:\n get-user-account:\n description: Look up user account\n parameters:\n userId: { type: string }\n\n search-docs:\n description: Search help documentation\n parameters:\n query: { type: string }\n\n create-support-ticket:\n description: Create a support ticket\n parameters:\n summary: { type: string }\n priority: { type: string } # low, medium, high\n\nmcpServers:\n figma:\n description: Figma design tool integration\n source: remote\n display: description\n\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: openai/gpt-4o\n system: system\n input:\n - COMPANY_NAME\n - PRODUCT_NAME\n tools:\n - get-user-account\n - search-docs\n - create-support-ticket\n mcpServers: [figma] # MCP server connections\n skills: [qr-code] # Octavus skills\n references: [support-policies] # On-demand context\n webSearch: true # Built-in web search\n todoList: true # Structured task tracking\n agentic: true\n maxSteps: 10\n thinking: medium\n # Anthropic-specific options\n anthropic:\n tools:\n web-search:\n display: description\n description: Searching the web\n skills:\n pdf:\n type: anthropic\n description: Processing PDF\n\ntriggers:\n user-message:\n input:\n USER_MESSAGE: { type: string }\n\nhandlers:\n user-message:\n Add message:\n block: 
add-message\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n display: hidden\n\n Respond:\n block: next-message\n```\n',
|
|
1380
|
-
excerpt: "Agent Config The section configures the LLM model, system prompt, tools, and behavior. Basic Configuration Configuration Options | Field
|
|
1397
|
+
content: '\n# Agent Config\n\nThe `agent` section configures the LLM model, system prompt, tools, and behavior.\n\n## Basic Configuration\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system # References prompts/system.md\n tools: [get-user-account] # Available tools\n mcpServers: [figma, browser] # MCP server connections\n skills: [qr-code] # Available skills\n references: [api-guidelines] # On-demand context documents\n```\n\n## Configuration Options\n\n| Field | Required | Description |\n| --------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------ |\n| `model` | Yes | Model identifier or variable reference |\n| `backupModel` | No | Backup model for automatic failover on provider errors |\n| `system` | Yes | System prompt filename (without .md) |\n| `input` | No | Variables to pass to the system prompt |\n| `tools` | No | List of tools the LLM can call |\n| `mcpServers` | No | List of MCP servers to connect (see [MCP Servers](/docs/protocol/mcp-servers)) |\n| `skills` | No | List of Octavus skills the LLM can use |\n| `references` | No | List of references the LLM can fetch on demand |\n| `sandboxTimeout` | No | Skill sandbox timeout in ms (default: 5 min, max: 1 hour) |\n| `imageModel` | No | Image generation model (enables agentic image generation) |\n| `webSearch` | No | Enable built-in web search tool (provider-agnostic) |\n| `agentic` | No | Allow multiple tool call cycles |\n| `maxSteps` | No | Maximum agentic steps (default: 10) - literal or variable reference |\n| `temperature` | No | Model temperature (0-2), `"off"`, or a variable reference |\n| `thinking` | No | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or a variable reference |\n| `speed` | No | Inference speed for supported Opus models: `fast`/`standard` (see [Fast Mode](/docs/protocol/fast-mode)) |\n| `cache` | No | Prompt caching mode: `auto` (default), 
`extended`, or `off` |\n| `maxToolOutputTokens` | No | Cap a single tool result at this many tokens in the model view (head+tail preview + note). Omit to leave tool output unbounded |\n| `contextManagement` | No | Automatic context-window compaction (see [Context Management](/docs/protocol/context-management)) |\n| `anthropic` | No | Anthropic-specific options (tools, skills) |\n\n## Models\n\nSpecify models in `provider/model-id` format. Any model supported by the provider\'s SDK will work.\n\n### Supported Providers\n\n| Provider | Format | Examples |\n| --------- | ---------------------- | -------------------------------------------------------------------------------------------------- |\n| Anthropic | `anthropic/{model-id}` | `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-sonnet-4-5`, `claude-haiku-4-5` |\n| Google | `google/{model-id}` | `gemini-3.5-flash`, `gemini-3-flash-preview`, `gemini-2.5-flash` |\n| OpenAI | `openai/{model-id}` | `gpt-5`, `gpt-4o`, `o4-mini`, `o3`, `o3-mini`, `o1` |\n\n### Examples\n\n```yaml\n# Anthropic Claude 4.5\nagent:\n model: anthropic/claude-sonnet-4-5\n\n# Google Gemini 3\nagent:\n model: google/gemini-3-flash-preview\n\n# OpenAI GPT-5\nagent:\n model: openai/gpt-5\n\n# OpenAI reasoning models\nagent:\n model: openai/o3-mini\n```\n\n> **Note**: Model IDs are passed directly to the provider SDK. 
Check the provider\'s documentation for the latest available models.\n\n### Dynamic Model Selection\n\nThe model field can also reference an input variable, allowing consumers to choose the model when creating a session:\n\n```yaml\ninput:\n MODEL:\n type: string\n description: The LLM model to use\n\nagent:\n model: MODEL # Resolved from session input\n system: system\n```\n\nWhen creating a session, pass the model:\n\n```typescript\nconst sessionId = await client.agentSessions.create(\'my-agent\', {\n MODEL: \'anthropic/claude-sonnet-4-5\',\n});\n```\n\nThis enables:\n\n- **Multi-provider support** - Same agent works with different providers\n- **A/B testing** - Test different models without protocol changes\n- **User preferences** - Let users choose their preferred model\n\nThe model value is validated at runtime to ensure it\'s in the correct `provider/model-id` format.\n\n> **Note**: When using dynamic models, provider-specific options (like `anthropic:`) may not apply if the model resolves to a different provider.\n\n## Backup Model\n\nConfigure a fallback model that activates automatically when the primary model encounters a transient provider error (rate limits, outages, timeouts):\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: openai/gpt-4o\n system: system\n```\n\nWhen a provider error occurs, the system retries once with the backup model. 
If the backup also fails, the original error is returned.\n\n**Key behaviors:**\n\n- Only transient provider errors trigger fallback - authentication and validation errors are not retried\n- Provider-specific options (like `anthropic:`) are only forwarded to the backup model if it uses the same provider\n- For streaming responses, fallback only occurs if no content has been sent to the client yet\n\nLike `model`, `backupModel` supports variable references:\n\n```yaml\ninput:\n BACKUP_MODEL:\n type: string\n description: Fallback model for provider errors\n\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: BACKUP_MODEL\n system: system\n```\n\n> **Tip**: Use a different provider for your backup model (e.g., primary on Anthropic, backup on OpenAI) to maximize resilience against single-provider outages.\n\n## System Prompt\n\nThe system prompt sets the agent\'s persona and instructions. The `input` field controls which variables are available to the prompt - only variables listed in `input` are interpolated.\n\n```yaml\nagent:\n system: system # Uses prompts/system.md\n input:\n - COMPANY_NAME\n - PRODUCT_NAME\n```\n\nVariables in `input` can come from `protocol.input`, `protocol.resources`, or `protocol.variables`.\n\n### Input Mapping Formats\n\n```yaml\n# Array format (same name)\ninput:\n - COMPANY_NAME\n - PRODUCT_NAME\n\n# Array format (rename)\ninput:\n - CONTEXT: CONVERSATION_SUMMARY # Prompt sees CONTEXT, value comes from CONVERSATION_SUMMARY\n\n# Object format (rename)\ninput:\n CONTEXT: CONVERSATION_SUMMARY\n```\n\nThe left side (label) is what the prompt sees. 
The right side (source) is where the value comes from.\n\n### Example\n\n`prompts/system.md`:\n\n```markdown\nYou are a friendly support agent for {{COMPANY_NAME}}.\n\n## Your Role\n\nHelp users with questions about {{PRODUCT_NAME}}.\n\n## Guidelines\n\n- Be helpful and professional\n- If you can\'t help, offer to escalate\n- Never share internal information\n```\n\n## Agentic Mode\n\nEnable multi-step tool calling:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [get-user-account, search-docs, create-ticket]\n agentic: true # LLM can call multiple tools\n maxSteps: 10 # Limit cycles to prevent runaway\n```\n\n**How it works:**\n\n1. LLM receives user message\n2. LLM decides to call a tool\n3. Tool executes, result returned to LLM\n4. LLM decides if more tools needed\n5. Repeat until LLM responds or maxSteps reached\n\n## Extended Thinking\n\nEnable extended reasoning for complex tasks:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n thinking: medium # low | medium | high | max\n```\n\n| Level | Use Case |\n| -------- | ---------------------------------- |\n| `low` | Simple reasoning |\n| `medium` | Moderate complexity |\n| `high` | Complex analysis |\n| `max` | Maximum reasoning budget available |\n\nThinking content streams to the UI and can be displayed to users.\n\n### How levels are applied\n\nEach provider translates `thinking` into its own reasoning controls:\n\n| Provider | Level mapping |\n| -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |\n| Anthropic 4.6+ (`claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`) | Adaptive thinking - the model decides how much to reason, guided by `effort: low / medium / high / max` |\n| Anthropic older (4.5 and earlier) | Fixed token budgets: `low` ~5,000, `medium` ~10,000, `high` ~20,000, `max` ~40,000 |\n| OpenAI (GPT-5.x, o-series) | 
`reasoningEffort: low / medium / high` (`max` maps to `high`) |\n| Google (Gemini 3.x) | `thinkingLevel: low / high` (`medium` rounds up to `high`) |\n| Google (Gemini 1.x / 2.x) | Token budgets: `low` 1,024, `medium` 8,192, `high` 24,576, `max` 65,536 |\n| OpenRouter | Unified `reasoning.max_tokens` (translated upstream) |\n| Vercel AI Gateway | Forwards the underlying provider\'s options |\n\n## Prompt Caching\n\nProviders charge less for tokens served from their prompt cache (often 10% of the uncached rate). Octavus exposes a single `cache` field that picks the right retention policy per provider, so the stable prefix of your agent - tools, system prompt, and historical messages - gets billed at the cache-read rate on repeat requests.\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n cache: auto # auto (default) | extended | off\n```\n\n| Mode | Behavior | When to use |\n| ---------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |\n| `auto` | Short-TTL caching. Default when omitted. | Most agents. Free on all supported providers and pays for itself within the same session. |\n| `extended` | Long-TTL caching. Trades a higher cache-write cost for much longer residency. | Agents triggered with gaps (daily reports, on-call assistants) where the prefix is reused across hours. |\n| `off` | No opt-in caching emitted. | When you explicitly want to skip caching - e.g. debugging a non-deterministic prefix. 
|\n\n### Per-provider behavior\n\nThe `cache` field is provider-agnostic at the protocol level - each provider translates it into its own cache retention policy:\n\n| Provider | `auto` TTL | `extended` TTL |\n| --------- | ------------------------- | -------------- |\n| Anthropic | 5 minutes | 1 hour |\n| OpenAI | in-memory (~5\u201310 minutes) | 24 hours |\n| Google | Implicit (Gemini 2.5+) | Implicit |\n\nOn `off`, Octavus emits no explicit cache options. Providers that auto-cache (OpenAI on prefixes \u2265 1,024 tokens, Gemini 2.5+) may still cache transparently - `off` just disables Octavus\'s opt-in behavior.\n\n### Threads don\'t inherit\n\nNamed threads (created with `start-thread`) read their own `cache` field independently - they **do not** inherit the agent\'s cache value:\n\n```yaml\nagent:\n cache: extended # 1-hour TTL on the main thread\n\nhandlers:\n summarize:\n Start summary:\n block: start-thread\n thread: summary\n # No cache field \u2192 defaults to \'auto\' (5-minute TTL), NOT \'extended\'\n system: summary-system\n```\n\nThis is intentional: named threads are often used for short, one-shot work (summarization, classification) where the long TTL would be wasted. Set `cache` explicitly on `start-thread` when you do want it.\n\n### Cost trade-offs\n\n- **Cache reads** are always much cheaper than uncached input on any provider - caching is effectively free if your prefix is stable.\n- **Cache writes** on Anthropic cost ~1.25\xD7 input for `auto` and 2\xD7 input for `extended`. 
OpenAI and Google don\'t charge separately for cache writes.\n- Use `extended` only when the same prefix is genuinely reused across sessions that span hours; otherwise the higher write cost dominates the savings.\n\n## Skills\n\nEnable Octavus skills for code execution and file generation:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code] # Enable skills\n agentic: true\n```\n\nSkills provide provider-agnostic code execution in isolated sandboxes. When enabled, the LLM can execute Python/Bash code, run skill scripts, and generate files.\n\nSee [Skills](/docs/protocol/skills) for full documentation.\n\n## References\n\nEnable on-demand context loading via reference documents:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n references: [api-guidelines, error-codes]\n agentic: true\n```\n\nReferences are markdown files stored in the agent\'s `references/` directory. When enabled, the LLM can list available references and read their content using `octavus_reference_list` and `octavus_reference_read` tools.\n\nSee [References](/docs/protocol/references) for full documentation.\n\n## Image Generation\n\nEnable the LLM to generate images autonomously:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n imageModel: google/gemini-2.5-flash-image\n agentic: true\n```\n\nWhen `imageModel` is configured, the `octavus_generate_image` tool becomes available. The LLM can decide when to generate images based on user requests. 
The tool supports both text-to-image generation and image editing/transformation using reference images.\n\n### Supported Image Providers\n\n| Provider | Model Types | Examples |\n| -------- | --------------------------------------- | --------------------------------------------------------- |\n| OpenAI | Dedicated image models | `gpt-image-1` |\n| Google | Gemini native (contains "image") | `gemini-2.5-flash-image`, `gemini-3-flash-image-generate` |\n| Google | Imagen dedicated (starts with "imagen") | `imagen-4.0-generate-001` |\n\n> **Note**: Google has two image generation approaches. Gemini "native" models (containing "image" in the ID) generate images using the language model API with `responseModalities`. Imagen models (starting with "imagen") use a dedicated image generation API.\n\n### Image Sizes\n\nThe tool supports three image sizes:\n\n- `1024x1024` (default) - Square\n- `1792x1024` - Landscape (16:9)\n- `1024x1792` - Portrait (9:16)\n\n### Image Editing with Reference Images\n\nBoth the agentic tool and the `generate-image` block support reference images for editing and transformation. 
When reference images are provided, the prompt describes how to modify or use those images.\n\n| Provider | Models | Reference Image Support |\n| -------- | -------------------------------- | ----------------------- |\n| OpenAI | `gpt-image-1` | Yes |\n| Google | Gemini native (`gemini-*-image`) | Yes |\n| Google | Imagen (`imagen-*`) | No |\n\n### Agentic vs Deterministic\n\nUse `imageModel` in agent config when:\n\n- The LLM should decide when to generate or edit images\n- Users ask for images in natural language\n\nUse `generate-image` block (see [Handlers](/docs/protocol/handlers#generate-image)) when:\n\n- You want explicit control over image generation or editing\n- Building prompt engineering pipelines\n- Images are generated at specific handler steps\n\n## Web Search\n\nEnable the LLM to search the web for current information:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n webSearch: true\n agentic: true\n```\n\nWhen `webSearch` is enabled, the `octavus_web_search` tool becomes available. The LLM can decide when to search the web based on the conversation. Search results include source URLs that are emitted as citations in the UI.\n\nThis is a **provider-agnostic** built-in tool - it works with any LLM provider (Anthropic, Google, OpenAI, etc.). For Anthropic\'s own web search implementation, see [Provider Options](/docs/protocol/provider-options).\n\nUse cases:\n\n- Current events and real-time data\n- Fact verification and documentation lookups\n- Any information that may have changed since the model\'s training\n\n## TODO List\n\nEnable the LLM to maintain a structured task list while it works:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n todoList: true\n agentic: true\n```\n\nWhen `todoList` is enabled, the `octavus_todo_write` tool becomes available. 
The LLM creates and updates a list of items - each with `id`, `content`, and `status` (`pending`, `in_progress`, `completed`, `cancelled`) - and the platform emits a `todo-update` stream event with the resolved snapshot. The Client SDK accumulates updates into a single `UITodoPart` per assistant message, so consumers render an evolving "Plan" card without managing state themselves.\n\nThe list persists across messages: the LLM can use `merge=true` to update items by id (sending only the changed fields), or `merge=false` to replace the list entirely.\n\nUse cases:\n\n- Multi-step tasks where the user benefits from seeing progress\n- Long-running agentic loops that should communicate intent\n- Workflows where the agent plans before acting\n\n## Temperature\n\nControl response randomness:\n\n```yaml\nagent:\n model: openai/gpt-4o\n temperature: 0.7 # 0 = deterministic, 2 = creative\n```\n\n**Guidelines:**\n\n- `0 - 0.3`: Factual, consistent responses\n- `0.4 - 0.7`: Balanced (good default)\n- `0.8 - 1.2`: Creative, varied responses\n- `> 1.2`: Very creative (may be inconsistent)\n\n## Dynamic Configuration\n\nLike `model`, the `temperature`, `thinking`, `speed`, and `maxSteps` fields can also reference an input variable. 
Consumers choose values at session creation, so the same agent can be tuned per call without protocol changes:\n\n```yaml\ninput:\n TEMPERATURE:\n type: number\n description: Override temperature (0-2)\n optional: true\n THINKING:\n type: string\n description: Override thinking effort (low/medium/high/max, or "off")\n optional: true\n MAX_STEPS:\n type: integer\n description: Override max agentic steps\n optional: true\n\nagent:\n model: anthropic/claude-sonnet-4-5\n temperature: TEMPERATURE\n thinking: THINKING\n maxSteps: MAX_STEPS\n system: system\n```\n\nWhen creating a session, pass the values in their natural type:\n\n```typescript\nconst sessionId = await client.agentSessions.create(\'my-agent\', {\n TEMPERATURE: 0.7,\n THINKING: \'medium\',\n MAX_STEPS: 5,\n});\n```\n\n### Accepted values\n\nThe resolver accepts the natural type for each field, plus a string fallback so consumers can pass values from form inputs without coercing first.\n\n| Field | Suggested input type | Value at session creation |\n| ------------- | ------------------------------------------ | -------------------------------------------------- |\n| `temperature` | `number` (or `string` for `"off"` support) | A number `0`-`2`, a numeric string, or `"off"` |\n| `thinking` | `string` | `"low"`, `"medium"`, `"high"`, `"max"`, or `"off"` |\n| `maxSteps` | `integer` (or `string`) | A positive integer or a positive integer string |\n\nThe protocol\'s `input:` declaration enforces what the consumer can pass. Pick `type: number` / `type: integer` if you want native numeric overrides; pick `type: string` (or `type: unknown`) if you also need to pass the `"off"` sentinel for `temperature`.\n\n### Explicit "off" vs not set\n\n`temperature` and `thinking` accept an explicit `"off"` value to disable the field at session creation. 
This is different from omitting the variable:\n\n- **Variable not provided** -> the field is unset; the provider uses its default behavior\n- **Variable provided as `"off"`** -> the field is explicitly disabled (no temperature emitted, reasoning disabled)\n\nThe distinction matters because `temperature` and `thinking` are mutually exclusive at the provider level - several providers ignore temperature when reasoning is enabled. Use `"off"` to opt one out so the other takes effect.\n\n### Validation\n\nVariable references are caught at protocol validation time. If `temperature: TEMPERATURE` is declared but `TEMPERATURE` is missing from `input:` or `variables:`, the validator surfaces the error in the dashboard before the agent runs.\n\n## Provider Options\n\nEnable provider-specific features like Anthropic\'s built-in tools and skills:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n anthropic:\n tools:\n web-search:\n display: description\n description: Searching the web\n skills:\n pdf:\n type: anthropic\n description: Processing PDF\n```\n\nProvider options are validated against the model - using `anthropic:` with a non-Anthropic model will fail validation.\n\nSee [Provider Options](/docs/protocol/provider-options) for full documentation.\n\n## Thread-Specific Config\n\nOverride config for named threads:\n\n```yaml\nhandlers:\n request-human:\n Start summary thread:\n block: start-thread\n thread: summary\n model: anthropic/claude-opus-4-8 # Different model\n backupModel: openai/gpt-4o # Failover model\n thinking: low # Different thinking\n speed: fast # Fast mode for this thread (supported Opus models only)\n cache: off # Different cache mode (does not inherit from agent)\n maxSteps: 1 # Limit tool calls\n system: escalation-summary # Different prompt\n mcpServers: [figma, browser] # Thread-specific MCP servers\n skills: [data-analysis] # Thread-specific skills\n references: [escalation-policy] # Thread-specific references\n imageModel: 
google/gemini-2.5-flash-image # Thread-specific image model\n webSearch: true # Thread-specific web search\n todoList: true # Thread-specific task list\n```\n\nEach thread can have its own model, backup model, thinking level, speed, cache mode, MCP servers, skills, references, image model, web search setting, and task list setting. Skills must be defined in the protocol\'s `skills:` section. References must exist in the agent\'s `references/` directory. Workers use this same pattern since they don\'t have a global `agent:` section - which is how a worker enables fast mode.\n\n## Full Example\n\n```yaml\ninput:\n COMPANY_NAME: { type: string }\n PRODUCT_NAME: { type: string }\n USER_ID: { type: string, optional: true }\n\nresources:\n CONVERSATION_SUMMARY:\n type: string\n default: \'\'\n\ntools:\n get-user-account:\n description: Look up user account\n parameters:\n userId: { type: string }\n\n search-docs:\n description: Search help documentation\n parameters:\n query: { type: string }\n\n create-support-ticket:\n description: Create a support ticket\n parameters:\n summary: { type: string }\n priority: { type: string } # low, medium, high\n\nmcpServers:\n figma:\n description: Figma design tool integration\n source: remote\n display: description\n\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n\nagent:\n model: anthropic/claude-sonnet-4-5\n backupModel: openai/gpt-4o\n system: system\n input:\n - COMPANY_NAME\n - PRODUCT_NAME\n tools:\n - get-user-account\n - search-docs\n - create-support-ticket\n mcpServers: [figma] # MCP server connections\n skills: [qr-code] # Octavus skills\n references: [support-policies] # On-demand context\n webSearch: true # Built-in web search\n todoList: true # Structured task tracking\n agentic: true\n maxSteps: 10\n thinking: medium\n # Anthropic-specific options\n anthropic:\n tools:\n web-search:\n display: description\n description: Searching the web\n skills:\n pdf:\n type: anthropic\n description: 
Processing PDF\n\ntriggers:\n user-message:\n input:\n USER_MESSAGE: { type: string }\n\nhandlers:\n user-message:\n Add message:\n block: add-message\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n display: hidden\n\n Respond:\n block: next-message\n```\n',
|
|
1398
|
+
excerpt: "Agent Config The section configures the LLM model, system prompt, tools, and behavior. Basic Configuration Configuration Options | Field | Required | Description ...",
|
|
1381
1399
|
order: 7
|
|
1382
1400
|
},
|
|
1383
1401
|
{
|
|
@@ -1394,7 +1412,7 @@ See [Streaming Events](/docs/server-sdk/streaming#event-types) for the full list
|
|
|
1394
1412
|
section: "protocol",
|
|
1395
1413
|
title: "Skills Advanced Guide",
|
|
1396
1414
|
description: "Best practices and advanced patterns for using Octavus skills.",
|
|
1397
|
-
content: "\n# Skills Advanced Guide\n\nThis guide covers advanced patterns and best practices for using Octavus skills in your agents.\n\n## When to Use Skills\n\nSkills are ideal for:\n\n- **Code execution** - Running Python/Bash scripts\n- **File generation** - Creating images, PDFs, reports\n- **Data processing** - Analyzing, transforming, or visualizing data\n- **Provider-agnostic needs** - Features that should work with any LLM\n\nUse external tools instead when:\n\n- **Simple API calls** - Database queries, external services\n- **Authentication required** - Accessing user-specific resources\n- **Backend integration** - Tight coupling with your infrastructure\n\n## Skill Selection Strategy\n\n### Defining Available Skills\n\nDefine all skills in the `skills:` section, then reference which skills are available where they're used:\n\n**Interactive agents** - reference in `agent.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n pdf-processor:\n display: description\n description: Processing PDFs\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n```\n\n**Workers and named threads** - reference per-thread in `start-thread.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data\n\nsteps:\n Start analysis:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code, data-analysis]\n maxSteps: 10\n```\n\n### Execution Mode\n\nThe `execution` field is set at the skill definition level and applies to all threads that use the skill:\n\n```yaml\nskills:\n deploy-tool:\n display: description\n description: Deploy applications\n execution: device # All threads using this skill run it on the device\n qr-code:\n display: description\n description: Generating QR codes\n # Defaults to sandbox execution\n```\n\nYou don't set 
`execution` per-thread - a skill's execution mode is consistent wherever it's used.\n\n### Match Skills to Use Cases\n\nDifferent threads can have different skills. Define all skills at the protocol level, then scope them to each thread:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data and generating reports\n visualization:\n display: description\n description: Creating charts and visualizations\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n```\n\nFor a data analysis thread, you would specify `[data-analysis, visualization]` in `agent.skills` or in a `start-thread` block's `skills` field.\n\n## Display Mode Strategy\n\nChoose display modes based on user experience:\n\n```yaml\nskills:\n # Background processing - hide from user\n data-analysis:\n display: hidden\n\n # User-facing generation - show description\n qr-code:\n display: description\n\n # Interactive progress - stream updates\n report-generation:\n display: stream\n```\n\n### Guidelines\n\n- **`hidden`**: Background work that doesn't need user awareness\n- **`description`**: User-facing operations (default)\n- **`name`**: Quick operations where name is sufficient\n- **`stream`**: Long-running operations where progress matters\n\n## System Prompt Integration\n\nSkills are automatically injected into the system prompt. The LLM learns:\n\n1. **Available skills** - List of enabled skills with descriptions\n2. **How to use skills** - Instructions for using skill tools\n3. **Tool reference** - Available skill tools (`octavus_skill_read`, `octavus_code_run`, etc.)\n\nYou don't need to manually document skills in your system prompt. 
However, you can guide the LLM:\n\n```markdown\n<!-- prompts/system.md -->\n\nYou are a helpful assistant that can generate QR codes.\n\n## When to Generate QR Codes\n\nGenerate QR codes when users want to:\n\n- Share URLs easily\n- Provide contact information\n- Share WiFi credentials\n- Create scannable data\n\nUse the qr-code skill for all QR code generation tasks.\n```\n\n## Error Handling\n\nSkills handle errors gracefully:\n\n```yaml\n# Skill execution errors are returned to the LLM\n# The LLM can retry or explain the error to the user\n```\n\nCommon error scenarios:\n\n1. **Invalid skill slug** - Skill not found in organization\n2. **Code execution errors** - Syntax errors, runtime exceptions\n3. **Missing dependencies** - Required packages not installed\n4. **File I/O errors** - Permission issues, invalid paths\n\nThe LLM receives error messages and can:\n\n- Retry with corrected code\n- Explain errors to users\n- Suggest alternatives\n\n## File Output Patterns\n\n### Single File Output\n\n```python\n# Save single file to /output/\nimport qrcode\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\nqr = qrcode.QRCode()\nqr.add_data('https://example.com')\nimg = qr.make_image()\nimg.save(f'{output_dir}/qrcode.png')\n```\n\n### Multiple Files\n\n```python\n# Save multiple files\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n\n# Generate multiple outputs\nfor i in range(3):\n filename = f'{output_dir}/output_{i}.png'\n # ... generate file ...\n```\n\n### Structured Output\n\n```python\n# Save structured data + files\nimport json\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n\n# Save metadata\nmetadata = {\n 'files': ['chart.png', 'data.csv'],\n 'summary': 'Analysis complete'\n}\nwith open(f'{output_dir}/metadata.json', 'w') as f:\n json.dump(metadata, f)\n\n# Save actual files\n# ... 
generate chart.png and data.csv ...\n```\n\n## Performance Considerations\n\n### Lazy Initialization\n\nSandboxes are created only when a skill tool is first called:\n\n```yaml\nagent:\n skills: [qr-code] # Sandbox created on first skill tool call\n```\n\nThis means:\n\n- No cost if skills aren't used\n- Fast startup (no sandbox creation delay)\n- Each `next-message` execution gets its own sandbox with only the skills it needs\n\n### Timeout Limits\n\nSandboxes default to a 5-minute timeout. Configure `sandboxTimeout` on the agent config or per thread:\n\n```yaml\n# Agent-level\nagent:\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 1800000 # 30 minutes\n```\n\n```yaml\n# Thread-level (overrides agent-level)\nsteps:\n Start thread:\n block: start-thread\n thread: analysis\n skills: [data-analysis]\n sandboxTimeout: 3600000 # 1 hour for long-running analysis\n```\n\nThread-level `sandboxTimeout` takes priority. Maximum: 1 hour (3,600,000 ms).\n\n### Sandbox Lifecycle\n\nEach `next-message` execution gets its own sandbox:\n\n- **Scoped** - Only contains the skills available to that thread\n- **Isolated** - Interactive agents and workers don't share sandboxes\n- **Resilient** - If a sandbox expires, it's transparently recreated\n- **Cleaned up** - Sandbox destroyed when the LLM call completes\n\n## Combining Skills with Tools\n\nSkills and tools can work together:\n\n```yaml\ntools:\n get-user-data:\n description: Fetch user data from database\n parameters:\n userId: { type: string }\n\nskills:\n data-analysis:\n display: description\n description: Analyzing data\n\nagent:\n tools: [get-user-data]\n skills: [data-analysis]\n agentic: true\n\nhandlers:\n analyze-user:\n Get user data:\n block: tool-call\n tool: get-user-data\n input:\n userId: USER_ID\n output: USER_DATA\n\n Analyze:\n block: next-message\n # LLM can use data-analysis skill with USER_DATA\n```\n\nPattern:\n\n1. Fetch data via tool (from your backend)\n2. 
LLM uses skill to analyze/process the data\n3. Generate outputs (files, reports)\n\n## Secure Skills\n\nWhen a skill declares secrets and an organization configures them, the skill runs in secure mode with its own isolated sandbox.\n\n### Standard vs Secure vs Device Skills\n\n| Aspect | Standard Skills | Secure Skills | Device Skills |\n| ------------------- | ------------------------ | --------------------------------------------------- | ------------------------------------------------------ |\n| **Environment** | Shared sandbox | Isolated sandbox (one per skill) | Agent's computer (VM or desktop) |\n| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run` only | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |\n| **Script input** | CLI arguments via `args` | JSON via stdin (use `input` parameter) | CLI arguments via `args` |\n| **Secrets** | No secrets | Secrets as env vars | No secrets |\n| **Output** | Raw stdout/stderr | Redacted (secret values replaced with `[REDACTED]`) | Raw stdout/stderr |\n\n### Writing Scripts for Secure Skills\n\nSecure skill scripts receive structured input via stdin (JSON) and access secrets from environment variables:\n\n```python\n#!/usr/bin/env python3\nimport json\nimport os\nimport sys\nimport subprocess\n\ninput_data = json.load(sys.stdin)\ntoken = os.environ[\"GITHUB_TOKEN\"]\n\nrepo = input_data.get(\"repo\", \"\")\nresult = subprocess.run(\n [\"gh\", \"repo\", \"view\", repo, \"--json\", \"name,description\"],\n capture_output=True, text=True,\n env={**os.environ, \"GH_TOKEN\": token}\n)\n\nprint(result.stdout)\n```\n\nKey patterns:\n\n- **Read stdin**: `json.load(sys.stdin)` to get the `input` object from the `octavus_skill_run` call\n- **Access secrets**: `os.environ[\"SECRET_NAME\"]` - secrets are injected as env vars\n- **Print output**: Write results to stdout - the LLM sees the (redacted) stdout\n- **Error handling**: Write errors to stderr and exit with non-zero code\n\n### 
Declaring Secrets in SKILL.md\n\n```yaml\n---\nname: github\ndescription: >\n Run GitHub CLI (gh) commands to manage repos, issues, PRs, and more.\nsecrets:\n - name: GITHUB_TOKEN\n description: GitHub personal access token with repo access\n required: true\n - name: GITHUB_ORG\n description: Default GitHub organization\n required: false\n---\n```\n\n### Testing Secure Skills Locally\n\nYou can test scripts locally by piping JSON to stdin:\n\n```bash\necho '{\"repo\": \"octavus-ai/agent-sdk\"}' | GITHUB_TOKEN=ghp_xxx python scripts/list-issues.py\n```\n\n## Skill Development Tips\n\n### Writing SKILL.md\n\nFocus on **when** and **how** to use the skill:\n\n```markdown\n---\nname: qr-code\ndescription: >\n Generate QR codes from text, URLs, or data. Use when the user needs to create\n a QR code for any purpose - sharing links, contact information, WiFi credentials,\n or any text data that should be scannable.\n---\n\n# QR Code Generator\n\n## When to Use\n\nUse this skill when users want to:\n\n- Share URLs easily\n- Provide contact information\n- Create scannable data\n\n## Quick Start\n\n[Clear examples of how to use the skill]\n```\n\n### Script Organization\n\nOrganize scripts logically:\n\n```\nskill-name/\n\u251C\u2500\u2500 SKILL.md\n\u2514\u2500\u2500 scripts/\n \u251C\u2500\u2500 generate.py # Main script\n \u251C\u2500\u2500 utils.py # Helper functions\n \u2514\u2500\u2500 requirements.txt # Dependencies\n```\n\n### Error Messages\n\nProvide helpful error messages:\n\n```python\ntry:\n # ... 
code ...\nexcept ValueError as e:\n print(f\"Error: Invalid input - {e}\")\n sys.exit(1)\n```\n\nThe LLM sees these errors and can retry or explain to users.\n\n## Security Considerations\n\n### Sandbox Isolation\n\n- **No network access** (unless explicitly configured)\n- **No persistent storage** (sandbox destroyed after each `next-message` execution)\n- **File output only** via `/output/` directory\n- **Time limits** enforced (5-minute default, configurable via `sandboxTimeout`)\n\n### Secret Protection\n\nFor skills with configured secrets:\n\n- **Isolated sandbox** - each secure skill gets its own sandbox, preventing cross-skill secret leakage\n- **No arbitrary code** - `octavus_code_run`, `octavus_file_write`, and `octavus_file_read` are blocked for secure skills, so only pre-built scripts can execute\n- **Output redaction** - all stdout and stderr are scanned for secret values before being returned to the LLM\n- **Encrypted at rest** - secrets are encrypted using AES-256-GCM and only decrypted at execution time\n\n### Input Validation\n\nSkills should validate inputs:\n\n```python\nimport sys\n\nif not data:\n print(\"Error: Data is required\")\n sys.exit(1)\n\nif len(data) > 1000:\n print(\"Error: Data too long (max 1000 characters)\")\n sys.exit(1)\n```\n\n### Resource Limits\n\nBe aware of:\n\n- **File size limits** - Large files may fail to upload\n- **Execution time** - Sandbox timeout (5-minute default, 1-hour maximum)\n- **Memory limits** - Sandbox environment constraints\n\n## Debugging Skills\n\n### Check Skill Documentation\n\nThe LLM can read skill docs:\n\n```python\n# LLM calls octavus_skill_read to see skill instructions\n```\n\n### Test Locally\n\nTest skills before uploading:\n\n```bash\n# Test skill locally\npython scripts/generate.py --data \"test\"\n```\n\n### Monitor Execution\n\nCheck execution logs in the platform debug view:\n\n- Tool calls and arguments\n- Code execution results\n- File outputs\n- Error messages\n\n## Common 
Patterns\n\n### Pattern 1: Generate and Return\n\n```yaml\n# User asks for QR code\n# LLM generates QR code\n# File automatically available for download\n```\n\n### Pattern 2: Analyze and Report\n\n```yaml\n# User provides data\n# LLM analyzes with skill\n# Generates report file\n# Returns summary + file link\n```\n\n### Pattern 3: Transform and Save\n\n```yaml\n# User uploads file (via tool)\n# LLM processes with skill\n# Generates transformed file\n# Returns new file link\n```\n\n## Best Practices Summary\n\n1. **Enable only needed skills** - Don't overwhelm the LLM\n2. **Choose appropriate display modes** - Match user experience needs\n3. **Write clear skill descriptions** - Help LLM understand when to use\n4. **Handle errors gracefully** - Provide helpful error messages\n5. **Test skills locally** - Verify before uploading\n6. **Monitor execution** - Check logs for issues\n7. **Combine with tools** - Use tools for data, skills for processing\n8. **Consider performance** - Be aware of timeouts and limits\n9. **Use secrets for credentials** - Declare secrets in frontmatter instead of hardcoding tokens\n10. **Design scripts for stdin input** - Secure skills receive JSON via stdin, so plan for both input methods if the skill might be used in either mode\n\n## Next Steps\n\n- [Skills](/docs/protocol/skills) - Basic skills documentation\n- [Agent Config](/docs/protocol/agent-config) - Configuring skills\n- [Tools](/docs/protocol/tools) - External tools integration\n",
|
|
1415
|
+
content: "\n# Skills Advanced Guide\n\nThis guide covers advanced patterns and best practices for using Octavus skills in your agents.\n\n## When to Use Skills\n\nSkills are ideal for:\n\n- **Code execution** - Running Python/Bash scripts\n- **File generation** - Creating images, PDFs, reports\n- **Data processing** - Analyzing, transforming, or visualizing data\n- **Provider-agnostic needs** - Features that should work with any LLM\n\nUse external tools instead when:\n\n- **Simple API calls** - Database queries, external services\n- **Authentication required** - Accessing user-specific resources\n- **Backend integration** - Tight coupling with your infrastructure\n\n## Skill Selection Strategy\n\n### Defining Available Skills\n\nDefine all skills in the `skills:` section, then reference which skills are available where they're used:\n\n**Interactive agents** - reference in `agent.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n pdf-processor:\n display: description\n description: Processing PDFs\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n```\n\n**Workers and named threads** - reference per-thread in `start-thread.skills`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data\n\nsteps:\n Start analysis:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code, data-analysis]\n maxSteps: 10\n```\n\n### Execution Mode\n\nThe `execution` field is set at the skill definition level and applies to all threads that use the skill:\n\n```yaml\nskills:\n deploy-tool:\n display: description\n description: Deploy applications\n execution: device # All threads using this skill run it on the device\n qr-code:\n display: description\n description: Generating QR codes\n # Defaults to sandbox execution\n```\n\nYou don't set 
`execution` per-thread - a skill's execution mode is consistent wherever it's used.\n\n### Match Skills to Use Cases\n\nDifferent threads can have different skills. Define all skills at the protocol level, then scope them to each thread:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generating QR codes\n data-analysis:\n display: description\n description: Analyzing data and generating reports\n visualization:\n display: description\n description: Creating charts and visualizations\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n```\n\nFor a data analysis thread, you would specify `[data-analysis, visualization]` in `agent.skills` or in a `start-thread` block's `skills` field.\n\n## Display Mode Strategy\n\nChoose display modes based on user experience:\n\n```yaml\nskills:\n # Background processing - hide from user\n data-analysis:\n display: hidden\n\n # User-facing generation - show description\n qr-code:\n display: description\n\n # Interactive progress - stream updates\n report-generation:\n display: stream\n```\n\n### Guidelines\n\n- **`hidden`**: Background work that doesn't need user awareness\n- **`description`**: User-facing operations (default)\n- **`name`**: Quick operations where name is sufficient\n- **`stream`**: Long-running operations where progress matters\n\n## System Prompt Integration\n\nSkills are automatically injected into the system prompt. The LLM learns:\n\n1. **Available skills** - List of enabled skills with descriptions\n2. **How to use skills** - Instructions for using skill tools\n3. **Tool reference** - Available skill tools (`octavus_skill_read`, `octavus_code_run`, etc.)\n\nYou don't need to manually document skills in your system prompt. 
However, you can guide the LLM:\n\n```markdown\n<!-- prompts/system.md -->\n\nYou are a helpful assistant that can generate QR codes.\n\n## When to Generate QR Codes\n\nGenerate QR codes when users want to:\n\n- Share URLs easily\n- Provide contact information\n- Share WiFi credentials\n- Create scannable data\n\nUse the qr-code skill for all QR code generation tasks.\n```\n\n## Error Handling\n\nSkills handle errors gracefully:\n\n```yaml\n# Skill execution errors are returned to the LLM\n# The LLM can retry or explain the error to the user\n```\n\nCommon error scenarios:\n\n1. **Invalid skill slug** - Skill not found in organization\n2. **Code execution errors** - Syntax errors, runtime exceptions\n3. **Missing dependencies** - Required packages not installed\n4. **File I/O errors** - Permission issues, invalid paths\n\nThe LLM receives error messages and can:\n\n- Retry with corrected code\n- Explain errors to users\n- Suggest alternatives\n\n## File Output Patterns\n\n### Single File Output\n\n```python\n# Save single file to /output/\nimport qrcode\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\nqr = qrcode.QRCode()\nqr.add_data('https://example.com')\nimg = qr.make_image()\nimg.save(f'{output_dir}/qrcode.png')\n```\n\n### Multiple Files\n\n```python\n# Save multiple files\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n\n# Generate multiple outputs\nfor i in range(3):\n filename = f'{output_dir}/output_{i}.png'\n # ... generate file ...\n```\n\n### Structured Output\n\n```python\n# Save structured data + files\nimport json\nimport os\n\noutput_dir = os.environ.get('OUTPUT_DIR', '/output')\n\n# Save metadata\nmetadata = {\n 'files': ['chart.png', 'data.csv'],\n 'summary': 'Analysis complete'\n}\nwith open(f'{output_dir}/metadata.json', 'w') as f:\n json.dump(metadata, f)\n\n# Save actual files\n# ... 
generate chart.png and data.csv ...\n```\n\n## Performance Considerations\n\n### Lazy Initialization\n\nSandboxes are created only when a skill tool is first called:\n\n```yaml\nagent:\n skills: [qr-code] # Sandbox created on first skill tool call\n```\n\nThis means:\n\n- No cost if skills aren't used\n- Fast startup (no sandbox creation delay)\n- Each `next-message` execution gets its own sandbox with only the skills it needs\n\n### Timeout Limits\n\nSandboxes default to a 5-minute timeout. Configure `sandboxTimeout` on the agent config or per thread:\n\n```yaml\n# Agent-level\nagent:\n model: anthropic/claude-sonnet-4-5\n skills: [data-analysis]\n sandboxTimeout: 1800000 # 30 minutes\n```\n\n```yaml\n# Thread-level (overrides agent-level)\nsteps:\n Start thread:\n block: start-thread\n thread: analysis\n skills: [data-analysis]\n sandboxTimeout: 3600000 # 1 hour for long-running analysis\n```\n\nThread-level `sandboxTimeout` takes priority. Maximum: 1 hour (3,600,000 ms).\n\n### Sandbox Lifecycle\n\nEach `next-message` execution gets its own sandbox:\n\n- **Scoped** - Only contains the skills available to that thread\n- **Isolated** - Interactive agents and workers don't share sandboxes\n- **Resilient** - If a sandbox expires, it's transparently recreated\n- **Cleaned up** - Sandbox destroyed when the LLM call completes\n\n## Combining Skills with Tools\n\nSkills and tools can work together:\n\n```yaml\ntools:\n get-user-data:\n description: Fetch user data from database\n parameters:\n userId: { type: string }\n\nskills:\n data-analysis:\n display: description\n description: Analyzing data\n\nagent:\n tools: [get-user-data]\n skills: [data-analysis]\n agentic: true\n\nhandlers:\n analyze-user:\n Get user data:\n block: tool-call\n tool: get-user-data\n input:\n userId: USER_ID\n output: USER_DATA\n\n Analyze:\n block: next-message\n # LLM can use data-analysis skill with USER_DATA\n```\n\nPattern:\n\n1. Fetch data via tool (from your backend)\n2. 
LLM uses skill to analyze/process the data\n3. Generate outputs (files, reports)\n\n## Secure Skills\n\nWhen a skill declares secrets and an organization configures them, the skill runs in secure mode with its own isolated sandbox.\n\n### Standard vs Secure vs Device Skills\n\n| Aspect | Standard Skills | Secure Skills | Device Skills |\n| ------------------- | ------------------------ | --------------------------------------------------- | ------------------------------------------------------ |\n| **Environment** | Shared sandbox | Isolated sandbox (one per skill) | The agent's computer |\n| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run` only | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |\n| **Script input** | CLI arguments via `args` | JSON via stdin (use `input` parameter) | CLI arguments via `args` |\n| **Secrets** | No secrets | Secrets as env vars | No secrets |\n| **Output** | Raw stdout/stderr | Redacted (secret values replaced with `[REDACTED]`) | Raw stdout/stderr |\n\n### Writing Scripts for Secure Skills\n\nSecure skill scripts receive structured input via stdin (JSON) and access secrets from environment variables:\n\n```python\n#!/usr/bin/env python3\nimport json\nimport os\nimport sys\nimport subprocess\n\ninput_data = json.load(sys.stdin)\ntoken = os.environ[\"GITHUB_TOKEN\"]\n\nrepo = input_data.get(\"repo\", \"\")\nresult = subprocess.run(\n [\"gh\", \"repo\", \"view\", repo, \"--json\", \"name,description\"],\n capture_output=True, text=True,\n env={**os.environ, \"GH_TOKEN\": token}\n)\n\nprint(result.stdout)\n```\n\nKey patterns:\n\n- **Read stdin**: `json.load(sys.stdin)` to get the `input` object from the `octavus_skill_run` call\n- **Access secrets**: `os.environ[\"SECRET_NAME\"]` - secrets are injected as env vars\n- **Print output**: Write results to stdout - the LLM sees the (redacted) stdout\n- **Error handling**: Write errors to stderr and exit with non-zero code\n\n### Declaring Secrets in 
SKILL.md\n\n```yaml\n---\nname: github\ndescription: >\n Run GitHub CLI (gh) commands to manage repos, issues, PRs, and more.\nsecrets:\n - name: GITHUB_TOKEN\n description: GitHub personal access token with repo access\n required: true\n - name: GITHUB_ORG\n description: Default GitHub organization\n required: false\n---\n```\n\n### Testing Secure Skills Locally\n\nYou can test scripts locally by piping JSON to stdin:\n\n```bash\necho '{\"repo\": \"octavus-ai/agent-sdk\"}' | GITHUB_TOKEN=ghp_xxx python scripts/list-issues.py\n```\n\n## Skill Development Tips\n\n### Writing SKILL.md\n\nFocus on **when** and **how** to use the skill:\n\n```markdown\n---\nname: qr-code\ndescription: >\n Generate QR codes from text, URLs, or data. Use when the user needs to create\n a QR code for any purpose - sharing links, contact information, WiFi credentials,\n or any text data that should be scannable.\n---\n\n# QR Code Generator\n\n## When to Use\n\nUse this skill when users want to:\n\n- Share URLs easily\n- Provide contact information\n- Create scannable data\n\n## Quick Start\n\n[Clear examples of how to use the skill]\n```\n\n### Script Organization\n\nOrganize scripts logically:\n\n```\nskill-name/\n\u251C\u2500\u2500 SKILL.md\n\u2514\u2500\u2500 scripts/\n \u251C\u2500\u2500 generate.py # Main script\n \u251C\u2500\u2500 utils.py # Helper functions\n \u2514\u2500\u2500 requirements.txt # Dependencies\n```\n\n### Error Messages\n\nProvide helpful error messages:\n\n```python\ntry:\n # ... 
code ...\nexcept ValueError as e:\n print(f\"Error: Invalid input - {e}\")\n sys.exit(1)\n```\n\nThe LLM sees these errors and can retry or explain to users.\n\n## Security Considerations\n\n### Sandbox Isolation\n\n- **No network access** (unless explicitly configured)\n- **No persistent storage** (sandbox destroyed after each `next-message` execution)\n- **File output only** via `/output/` directory\n- **Time limits** enforced (5-minute default, configurable via `sandboxTimeout`)\n\n### Secret Protection\n\nFor skills with configured secrets:\n\n- **Isolated sandbox** - each secure skill gets its own sandbox, preventing cross-skill secret leakage\n- **No arbitrary code** - `octavus_code_run`, `octavus_file_write`, and `octavus_file_read` are blocked for secure skills, so only pre-built scripts can execute\n- **Output redaction** - all stdout and stderr are scanned for secret values before being returned to the LLM\n- **Encrypted at rest** - secrets are encrypted using AES-256-GCM and only decrypted at execution time\n\n### Input Validation\n\nSkills should validate inputs:\n\n```python\nimport sys\n\nif not data:\n print(\"Error: Data is required\")\n sys.exit(1)\n\nif len(data) > 1000:\n print(\"Error: Data too long (max 1000 characters)\")\n sys.exit(1)\n```\n\n### Resource Limits\n\nBe aware of:\n\n- **File size limits** - Large files may fail to upload\n- **Execution time** - Sandbox timeout (5-minute default, 1-hour maximum)\n- **Memory limits** - Sandbox environment constraints\n\n## Debugging Skills\n\n### Check Skill Documentation\n\nThe LLM can read skill docs:\n\n```python\n# LLM calls octavus_skill_read to see skill instructions\n```\n\n### Test Locally\n\nTest skills before uploading:\n\n```bash\n# Test skill locally\npython scripts/generate.py --data \"test\"\n```\n\n### Monitor Execution\n\nCheck execution logs in the platform debug view:\n\n- Tool calls and arguments\n- Code execution results\n- File outputs\n- Error messages\n\n## Common 
Patterns\n\n### Pattern 1: Generate and Return\n\n```yaml\n# User asks for QR code\n# LLM generates QR code\n# File automatically available for download\n```\n\n### Pattern 2: Analyze and Report\n\n```yaml\n# User provides data\n# LLM analyzes with skill\n# Generates report file\n# Returns summary + file link\n```\n\n### Pattern 3: Transform and Save\n\n```yaml\n# User uploads file (via tool)\n# LLM processes with skill\n# Generates transformed file\n# Returns new file link\n```\n\n## Best Practices Summary\n\n1. **Enable only needed skills** - Don't overwhelm the LLM\n2. **Choose appropriate display modes** - Match user experience needs\n3. **Write clear skill descriptions** - Help LLM understand when to use\n4. **Handle errors gracefully** - Provide helpful error messages\n5. **Test skills locally** - Verify before uploading\n6. **Monitor execution** - Check logs for issues\n7. **Combine with tools** - Use tools for data, skills for processing\n8. **Consider performance** - Be aware of timeouts and limits\n9. **Use secrets for credentials** - Declare secrets in frontmatter instead of hardcoding tokens\n10. **Design scripts for stdin input** - Secure skills receive JSON via stdin, so plan for both input methods if the skill might be used in either mode\n\n## Next Steps\n\n- [Skills](/docs/protocol/skills) - Basic skills documentation\n- [Agent Config](/docs/protocol/agent-config) - Configuring skills\n- [Tools](/docs/protocol/tools) - External tools integration\n",
|
|
1398
1416
|
excerpt: "Skills Advanced Guide This guide covers advanced patterns and best practices for using Octavus skills in your agents. When to Use Skills Skills are ideal for: - Code execution - Running Python/Bash...",
|
|
1399
1417
|
order: 9
|
|
1400
1418
|
},
|
|
@@ -1412,7 +1430,7 @@ See [Streaming Events](/docs/server-sdk/streaming#event-types) for the full list
|
|
|
1412
1430
|
section: "protocol",
|
|
1413
1431
|
title: "Workers",
|
|
1414
1432
|
description: "Defining worker agents for background and task-based execution.",
|
|
1415
|
-
content: '\n# Workers\n\nWorkers are agents designed for task-based execution. Unlike interactive agents that handle multi-turn conversations, workers execute a sequence of steps and return an output value.\n\n## When to Use Workers\n\nWorkers are ideal for:\n\n- **Background processing** - Long-running tasks that don\'t need conversation\n- **Composable tasks** - Reusable units of work called by other agents\n- **Pipelines** - Multi-step processing with structured output\n- **Parallel execution** - Tasks that can run independently\n\nUse interactive agents instead when:\n\n- **Conversation is needed** - Multi-turn dialogue with users\n- **Persistence matters** - State should survive across interactions\n- **Session context** - User context needs to persist\n\n## Worker vs Interactive\n\n| Aspect | Interactive | Worker |\n| ---------- | ---------------------------------- | ----------------------------- |\n| Structure | `triggers` + `handlers` + `agent` | `steps` + `output` |\n| LLM Config | Global `agent:` section | Per-thread via `start-thread` |\n| Invocation | Fire a named trigger | Direct execution with input |\n| Session | Persists across triggers (24h TTL) | Single execution |\n| Result | Streaming chat | Streaming + output value |\n\n## Protocol Structure\n\nWorkers use a simpler protocol structure than interactive agents:\n\n```yaml\n# Input schema - provided when worker is executed\ninput:\n TOPIC:\n type: string\n description: Topic to research\n DEPTH:\n type: string\n optional: true\n default: medium\n\n# Variables for intermediate results\nvariables:\n RESEARCH_DATA:\n type: string\n ANALYSIS:\n type: string\n description: Final analysis result\n\n# Tools available to the worker\ntools:\n web-search:\n description: Search the web\n parameters:\n query: { type: string }\n\n# Sequential execution steps\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: research-system\n input: [TOPIC, 
DEPTH]\n tools: [web-search]\n maxSteps: 5\n\n Add research request:\n block: add-message\n thread: research\n role: user\n prompt: research-prompt\n input: [TOPIC, DEPTH]\n\n Generate research:\n block: next-message\n thread: research\n output: RESEARCH_DATA\n\n Start analysis:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n system: analysis-system\n\n Add analysis request:\n block: add-message\n thread: analysis\n role: user\n prompt: analysis-prompt\n input: [RESEARCH_DATA]\n\n Generate analysis:\n block: next-message\n thread: analysis\n output: ANALYSIS\n\n# Output variable - the worker\'s return value\noutput: ANALYSIS\n```\n\n## settings.json\n\nWorkers are identified by the `format` field:\n\n```json\n{\n "slug": "research-assistant",\n "name": "Research Assistant",\n "description": "Researches topics and returns structured analysis",\n "format": "worker"\n}\n```\n\n## Key Differences\n\n### No Global Agent Config\n\nInteractive agents have a global `agent:` section that configures a main thread. Workers don\'t have this - every thread must be explicitly created via `start-thread`:\n\n```yaml\n# Interactive agent: Global config\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [tool-a, tool-b]\n\n# Worker: Each thread configured independently\nsteps:\n Start thread A:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n tools: [tool-a]\n\n Start thread B:\n block: start-thread\n thread: analysis\n model: openai/gpt-4o\n tools: [tool-b]\n```\n\nThis gives workers flexibility to use different models, tools, skills, and settings at different stages.\n\n### Steps Instead of Handlers\n\nWorkers use `steps:` instead of `handlers:`. 
Steps execute sequentially, like handler blocks:\n\n```yaml\n# Interactive: Handlers respond to triggers\nhandlers:\n user-message:\n Add message:\n block: add-message\n # ...\n\n# Worker: Steps execute in sequence\nsteps:\n Add message:\n block: add-message\n # ...\n```\n\n### Output Value\n\nWorkers can return an output value to the caller:\n\n```yaml\nvariables:\n RESULT:\n type: string\n\nsteps:\n # ... steps that populate RESULT ...\n\noutput: RESULT # Return this variable\'s value\n```\n\nThe `output` field references a variable declared in `variables:`. If omitted, the worker completes without returning a value.\n\n## Available Blocks\n\nWorkers support the same blocks as handlers:\n\n| Block | Purpose |\n| ------------------ | -------------------------------------------- |\n| `start-thread` | Create a named thread with LLM configuration |\n| `add-message` | Add a message to a thread |\n| `next-message` | Generate LLM response |\n| `tool-call` | Call a tool deterministically |\n| `set-resource` | Update a resource value |\n| `serialize-thread` | Convert thread to text |\n| `generate-image` | Generate an image from a prompt variable |\n\n### start-thread (Required for LLM)\n\nEvery thread must be initialized with `start-thread` before using `next-message`:\n\n```yaml\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: research-system\n input: [TOPIC]\n tools: [web-search]\n thinking: medium\n maxSteps: 5\n```\n\nAll LLM configuration goes here:\n\n| Field | Description |\n| ------------- | -------------------------------------------------------------------------------------- |\n| `thread` | Thread name (defaults to block name) |\n| `model` | LLM model to use |\n| `system` | System prompt filename (required) |\n| `input` | Variables for system prompt |\n| `tools` | Tools available in this thread |\n| `skills` | Octavus skills available in this thread |\n| `mcpServers` | MCP servers available in this 
thread |\n| `imageModel` | Image generation model |\n| `webSearch` | Enable built-in web search tool |\n| `thinking` | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |\n| `cache` | Prompt caching mode: `auto` (default), `extended`, or `off` |\n| `temperature` | Model temperature (0-2), `"off"`, or variable reference |\n| `maxSteps` | Maximum tool call cycles (enables agentic if > 1), or variable reference |\n\n## Simple Example\n\nA worker that generates a title from a summary:\n\n```yaml\n# Input\ninput:\n CONVERSATION_SUMMARY:\n type: string\n description: Summary to generate a title for\n\n# Variables\nvariables:\n TITLE:\n type: string\n description: The generated title\n\n# Steps\nsteps:\n Start title thread:\n block: start-thread\n thread: title-gen\n model: anthropic/claude-sonnet-4-5\n system: title-system\n\n Add title request:\n block: add-message\n thread: title-gen\n role: user\n prompt: title-request\n input: [CONVERSATION_SUMMARY]\n\n Generate title:\n block: next-message\n thread: title-gen\n output: TITLE\n display: stream\n\n# Output\noutput: TITLE\n```\n\n## Advanced Example\n\nA worker with multiple threads, tools, and agentic behavior:\n\n```yaml\ninput:\n USER_MESSAGE:\n type: string\n description: The user\'s message to respond to\n USER_ID:\n type: string\n description: User ID for account lookups\n optional: true\n\ntools:\n get-user-account:\n description: Looking up account information\n parameters:\n userId: { type: string }\n create-support-ticket:\n description: Creating a support ticket\n parameters:\n summary: { type: string }\n priority: { type: string }\n\nvariables:\n ASSISTANT_RESPONSE:\n type: string\n CHAT_TRANSCRIPT:\n type: string\n CONVERSATION_SUMMARY:\n type: string\n\nsteps:\n # Thread 1: Chat with agentic tool calling\n Start chat thread:\n block: start-thread\n thread: chat\n model: anthropic/claude-sonnet-4-5\n system: chat-system\n input: [USER_ID]\n tools: [get-user-account, 
create-support-ticket]\n thinking: medium\n maxSteps: 5\n\n Add user message:\n block: add-message\n thread: chat\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n\n Generate response:\n block: next-message\n thread: chat\n output: ASSISTANT_RESPONSE\n display: stream\n\n # Serialize for summary\n Save conversation:\n block: serialize-thread\n thread: chat\n output: CHAT_TRANSCRIPT\n\n # Thread 2: Summary generation\n Start summary thread:\n block: start-thread\n thread: summary\n model: anthropic/claude-sonnet-4-5\n system: summary-system\n thinking: low\n\n Add summary request:\n block: add-message\n thread: summary\n role: user\n prompt: summary-request\n input: [CHAT_TRANSCRIPT]\n\n Generate summary:\n block: next-message\n thread: summary\n output: CONVERSATION_SUMMARY\n display: stream\n\noutput: CONVERSATION_SUMMARY\n```\n\n## MCP Servers\n\nWorkers can declare and use MCP servers, just like interactive agents. Define them in `mcpServers:` and reference them in `start-thread`:\n\n```yaml\nmcpServers:\n sentry:\n description: Error tracking and debugging\n source: remote\n display: name\n browser:\n description: Chrome DevTools browser automation\n source: device\n display: name\n\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: system\n mcpServers: [sentry, browser]\n maxSteps: 10\n```\n\nWorkers resolve their own MCP connections independently - they don\'t inherit MCP servers from a parent interactive agent. 
Remote MCP connections are project-scoped, so a worker in the same project automatically has access to the same OAuth connections.\n\nSee [MCP Servers](/docs/protocol/mcp-servers) for full documentation.\n\n## Skills, Image Generation, and Web Search\n\nWorkers can use Octavus skills, image generation, and web search, configured per-thread via `start-thread`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generate QR codes\n\nsteps:\n Start thread:\n block: start-thread\n thread: worker\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n imageModel: google/gemini-2.5-flash-image\n webSearch: true\n maxSteps: 10\n```\n\nWorkers define their own skills independently - they don\'t inherit skills from a parent interactive agent. Each thread gets its own sandbox scoped to only its listed skills.\n\nSkills with `execution: device` work the same way in workers as in interactive agents - the skill runs on the agent\'s computer. Workers resolve their device execution independently, so a worker can use device skills even if the parent agent does not.\n\nSee [Skills](/docs/protocol/skills) for full documentation.\n\n## Tool Handling\n\nWorkers support the same tool handling as interactive agents:\n\n- **Server tools** - Handled by tool handlers you provide\n- **Client tools** - Pause execution, return tool request to caller\n\n```typescript\n// Non-streaming: get the output directly\nconst { output } = await client.workers.generate(\n agentId,\n { TOPIC: \'AI safety\' },\n {\n tools: {\n \'web-search\': async (args) => await searchWeb(args.query),\n },\n },\n);\n\n// Streaming: observe events in real-time\nconst events = client.workers.execute(\n agentId,\n { TOPIC: \'AI safety\' },\n {\n tools: {\n \'web-search\': async (args) => await searchWeb(args.query),\n },\n },\n);\n```\n\nSee [Server SDK Workers](/docs/server-sdk/workers) for tool handling details.\n\n## Stream Events\n\nWorkers emit the same events as interactive agents, 
plus worker-specific events:\n\n| Event | Description |\n| --------------- | ---------------------------------- |\n| `worker-start` | Worker execution begins |\n| `worker-result` | Worker completes (includes output) |\n\nAll standard events (text-delta, tool calls, etc.) are also emitted.\n\n## Calling Workers from Interactive Agents\n\nInteractive agents can call workers in two ways:\n\n1. **Deterministically** - Using the `run-worker` block\n2. **Agentically** - LLM calls worker as a tool\n\n### Worker Declaration\n\nFirst, declare workers in your interactive agent\'s protocol:\n\n```yaml\nworkers:\n generate-title:\n description: Generating conversation title\n display: description\n research-assistant:\n description: Researching topic\n display: stream\n tools:\n search: web-search # Map worker tool \u2192 parent tool\n```\n\n### run-worker Block\n\nCall a worker deterministically from a handler:\n\n```yaml\nhandlers:\n request-human:\n Generate title:\n block: run-worker\n worker: generate-title\n input:\n CONVERSATION_SUMMARY: SUMMARY\n output: CONVERSATION_TITLE\n```\n\n### LLM Tool Invocation\n\nMake workers available to the LLM:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n workers: [generate-title, research-assistant]\n agentic: true\n```\n\nThe LLM can then call workers as tools during conversation.\n\n### Display Modes\n\nControls how worker execution appears to users. The default for workers is `stream`.\n\n| Mode | Behavior |\n| ------------- | ---------------------------------------------------------------------------------------------------------------------------------- |\n| `hidden` | Worker runs silently. No events reach the client - no `UIWorkerPart` is created. |\n| `name` | Shows a running/done indicator with the worker name. No nested content (text, tool calls, reasoning) is forwarded. |\n| `description` | Shows a running/done indicator with the worker description. No nested content is forwarded. 
|\n| `stream` | Full visibility. All nested events are forwarded - text, reasoning, tool calls, sources, files. Worker input is included on start. |\n\n**Progressive input streaming:** When a worker with `display: stream` is invoked agentically (LLM calls it as a tool), the `UIWorkerPart` appears in the UI immediately as the LLM starts generating the worker\'s arguments. The worker input streams progressively into the worker part, the same way text tokens stream into a text part. Once input finishes, worker execution begins and nested content flows into the same worker part. There is no intermediate tool card.\n\n**`name` and `description` modes:** Worker input is stripped from the `worker-start` event (it may contain sensitive data). Only the running/done status and the final `worker-result` are forwarded to the parent stream. Use these for workers where the user only needs to know the worker ran, not what it did internally.\n\n**`hidden` mode:** The worker executes normally but produces no UI presence at all. Use for internal workers that are implementation details.\n\n### Tool Mapping\n\nMap parent tools to worker tools when the worker needs access to your tool handlers:\n\n```yaml\nworkers:\n research-assistant:\n description: Research topics\n tools:\n search: web-search # Worker\'s "search" \u2192 parent\'s "web-search"\n```\n\nWhen the worker calls its `search` tool, your `web-search` handler executes.\n\n## Next Steps\n\n- [Server SDK Workers](/docs/server-sdk/workers) - Executing workers from code\n- [Handlers](/docs/protocol/handlers) - Block reference for steps\n- [Agent Config](/docs/protocol/agent-config) - Model and settings\n',
|
|
1433
|
+
content: '\n# Workers\n\nWorkers are agents designed for task-based execution. Unlike interactive agents that handle multi-turn conversations, workers execute a sequence of steps and return an output value.\n\n## When to Use Workers\n\nWorkers are ideal for:\n\n- **Background processing** - Long-running tasks that don\'t need conversation\n- **Composable tasks** - Reusable units of work called by other agents\n- **Pipelines** - Multi-step processing with structured output\n- **Parallel execution** - Tasks that can run independently\n\nUse interactive agents instead when:\n\n- **Conversation is needed** - Multi-turn dialogue with users\n- **Persistence matters** - State should survive across interactions\n- **Session context** - User context needs to persist\n\n## Worker vs Interactive\n\n| Aspect | Interactive | Worker |\n| ---------- | ---------------------------------- | ----------------------------- |\n| Structure | `triggers` + `handlers` + `agent` | `steps` + `output` |\n| LLM Config | Global `agent:` section | Per-thread via `start-thread` |\n| Invocation | Fire a named trigger | Direct execution with input |\n| Session | Persists across triggers (24h TTL) | Single execution |\n| Result | Streaming chat | Streaming + output value |\n\n## Protocol Structure\n\nWorkers use a simpler protocol structure than interactive agents:\n\n```yaml\n# Input schema - provided when worker is executed\ninput:\n TOPIC:\n type: string\n description: Topic to research\n DEPTH:\n type: string\n optional: true\n default: medium\n\n# Variables for intermediate results\nvariables:\n RESEARCH_DATA:\n type: string\n ANALYSIS:\n type: string\n description: Final analysis result\n\n# Tools available to the worker\ntools:\n web-search:\n description: Search the web\n parameters:\n query: { type: string }\n\n# Sequential execution steps\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: research-system\n input: [TOPIC, 
DEPTH]\n tools: [web-search]\n maxSteps: 5\n\n Add research request:\n block: add-message\n thread: research\n role: user\n prompt: research-prompt\n input: [TOPIC, DEPTH]\n\n Generate research:\n block: next-message\n thread: research\n output: RESEARCH_DATA\n\n Start analysis:\n block: start-thread\n thread: analysis\n model: anthropic/claude-sonnet-4-5\n system: analysis-system\n\n Add analysis request:\n block: add-message\n thread: analysis\n role: user\n prompt: analysis-prompt\n input: [RESEARCH_DATA]\n\n Generate analysis:\n block: next-message\n thread: analysis\n output: ANALYSIS\n\n# Output variable - the worker\'s return value\noutput: ANALYSIS\n```\n\n## settings.json\n\nWorkers are identified by the `format` field:\n\n```json\n{\n "slug": "research-assistant",\n "name": "Research Assistant",\n "description": "Researches topics and returns structured analysis",\n "format": "worker"\n}\n```\n\n## Key Differences\n\n### No Global Agent Config\n\nInteractive agents have a global `agent:` section that configures a main thread. Workers don\'t have this - every thread must be explicitly created via `start-thread`:\n\n```yaml\n# Interactive agent: Global config\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n tools: [tool-a, tool-b]\n\n# Worker: Each thread configured independently\nsteps:\n Start thread A:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n tools: [tool-a]\n\n Start thread B:\n block: start-thread\n thread: analysis\n model: openai/gpt-4o\n tools: [tool-b]\n```\n\nThis gives workers flexibility to use different models, tools, skills, and settings at different stages.\n\n### Steps Instead of Handlers\n\nWorkers use `steps:` instead of `handlers:`. 
Steps execute sequentially, like handler blocks:\n\n```yaml\n# Interactive: Handlers respond to triggers\nhandlers:\n user-message:\n Add message:\n block: add-message\n # ...\n\n# Worker: Steps execute in sequence\nsteps:\n Add message:\n block: add-message\n # ...\n```\n\n### Output Value\n\nWorkers can return an output value to the caller:\n\n```yaml\nvariables:\n RESULT:\n type: string\n\nsteps:\n # ... steps that populate RESULT ...\n\noutput: RESULT # Return this variable\'s value\n```\n\nThe `output` field references a variable declared in `variables:`. If omitted, the worker completes without returning a value.\n\n## Available Blocks\n\nWorkers support the same blocks as handlers:\n\n| Block | Purpose |\n| ------------------ | -------------------------------------------- |\n| `start-thread` | Create a named thread with LLM configuration |\n| `add-message` | Add a message to a thread |\n| `next-message` | Generate LLM response |\n| `tool-call` | Call a tool deterministically |\n| `set-resource` | Update a resource value |\n| `serialize-thread` | Convert thread to text |\n| `generate-image` | Generate an image from a prompt variable |\n\n### start-thread (Required for LLM)\n\nEvery thread must be initialized with `start-thread` before using `next-message`:\n\n```yaml\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: research-system\n input: [TOPIC]\n tools: [web-search]\n thinking: medium\n maxSteps: 5\n```\n\nAll LLM configuration goes here:\n\n| Field | Description |\n| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |\n| `thread` | Thread name (defaults to block name) |\n| `model` | LLM model to use |\n| `system` | System prompt filename (required) |\n| `input` | Variables for system prompt |\n| `tools` | Tools available in this thread |\n| `skills` | Octavus skills available in this 
thread |\n| `mcpServers` | MCP servers available in this thread |\n| `imageModel` | Image generation model |\n| `webSearch` | Enable built-in web search tool |\n| `thinking` | Extended reasoning level (`low`/`medium`/`high`/`max`), `"off"`, or variable reference |\n| `cache` | Prompt caching mode: `auto` (default), `extended`, or `off` |\n| `temperature` | Model temperature (0-2), `"off"`, or variable reference |\n| `maxSteps` | Maximum tool call cycles (enables agentic if > 1), or variable reference |\n| `maxToolOutputTokens` | Cap a single tool result at this many tokens in the thread\'s model view (head+tail preview + note). Omit to leave tool output unbounded |\n\n## Simple Example\n\nA worker that generates a title from a summary:\n\n```yaml\n# Input\ninput:\n CONVERSATION_SUMMARY:\n type: string\n description: Summary to generate a title for\n\n# Variables\nvariables:\n TITLE:\n type: string\n description: The generated title\n\n# Steps\nsteps:\n Start title thread:\n block: start-thread\n thread: title-gen\n model: anthropic/claude-sonnet-4-5\n system: title-system\n\n Add title request:\n block: add-message\n thread: title-gen\n role: user\n prompt: title-request\n input: [CONVERSATION_SUMMARY]\n\n Generate title:\n block: next-message\n thread: title-gen\n output: TITLE\n display: stream\n\n# Output\noutput: TITLE\n```\n\n## Advanced Example\n\nA worker with multiple threads, tools, and agentic behavior:\n\n```yaml\ninput:\n USER_MESSAGE:\n type: string\n description: The user\'s message to respond to\n USER_ID:\n type: string\n description: User ID for account lookups\n optional: true\n\ntools:\n get-user-account:\n description: Looking up account information\n parameters:\n userId: { type: string }\n create-support-ticket:\n description: Creating a support ticket\n parameters:\n summary: { type: string }\n priority: { type: string }\n\nvariables:\n ASSISTANT_RESPONSE:\n type: string\n CHAT_TRANSCRIPT:\n type: string\n CONVERSATION_SUMMARY:\n type: 
string\n\nsteps:\n # Thread 1: Chat with agentic tool calling\n Start chat thread:\n block: start-thread\n thread: chat\n model: anthropic/claude-sonnet-4-5\n system: chat-system\n input: [USER_ID]\n tools: [get-user-account, create-support-ticket]\n thinking: medium\n maxSteps: 5\n\n Add user message:\n block: add-message\n thread: chat\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n\n Generate response:\n block: next-message\n thread: chat\n output: ASSISTANT_RESPONSE\n display: stream\n\n # Serialize for summary\n Save conversation:\n block: serialize-thread\n thread: chat\n output: CHAT_TRANSCRIPT\n\n # Thread 2: Summary generation\n Start summary thread:\n block: start-thread\n thread: summary\n model: anthropic/claude-sonnet-4-5\n system: summary-system\n thinking: low\n\n Add summary request:\n block: add-message\n thread: summary\n role: user\n prompt: summary-request\n input: [CHAT_TRANSCRIPT]\n\n Generate summary:\n block: next-message\n thread: summary\n output: CONVERSATION_SUMMARY\n display: stream\n\noutput: CONVERSATION_SUMMARY\n```\n\n## MCP Servers\n\nWorkers can declare and use MCP servers, just like interactive agents. Define them in `mcpServers:` and reference them in `start-thread`:\n\n```yaml\nmcpServers:\n sentry:\n description: Error tracking and debugging\n source: remote\n display: name\n browser:\n description: Chrome DevTools browser automation\n source: device\n display: name\n\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: system\n mcpServers: [sentry, browser]\n maxSteps: 10\n```\n\nWorkers resolve their own MCP connections independently - they don\'t inherit MCP servers from a parent interactive agent. 
Remote MCP connections are project-scoped, so a worker in the same project automatically has access to the same OAuth connections.\n\nSee [MCP Servers](/docs/protocol/mcp-servers) for full documentation.\n\n## Skills, Image Generation, and Web Search\n\nWorkers can use Octavus skills, image generation, and web search, configured per-thread via `start-thread`:\n\n```yaml\nskills:\n qr-code:\n display: description\n description: Generate QR codes\n\nsteps:\n Start thread:\n block: start-thread\n thread: worker\n model: anthropic/claude-sonnet-4-5\n system: system\n skills: [qr-code]\n imageModel: google/gemini-2.5-flash-image\n webSearch: true\n maxSteps: 10\n```\n\nWorkers define their own skills independently - they don\'t inherit skills from a parent interactive agent. Each thread gets its own sandbox scoped to only its listed skills.\n\nSkills with `execution: device` work the same way in workers as in interactive agents - the skill runs on the agent\'s computer. Workers resolve their device execution independently, so a worker can use device skills even if the parent agent does not.\n\nSee [Skills](/docs/protocol/skills) for full documentation.\n\n## Tool Handling\n\nWorkers support the same tool handling as interactive agents:\n\n- **Server tools** - Handled by tool handlers you provide\n- **Client tools** - Pause execution, return tool request to caller\n\n```typescript\n// Non-streaming: get the output directly\nconst { output } = await client.workers.generate(\n agentId,\n { TOPIC: \'AI safety\' },\n {\n tools: {\n \'web-search\': async (args) => await searchWeb(args.query),\n },\n },\n);\n\n// Streaming: observe events in real-time\nconst events = client.workers.execute(\n agentId,\n { TOPIC: \'AI safety\' },\n {\n tools: {\n \'web-search\': async (args) => await searchWeb(args.query),\n },\n },\n);\n```\n\nSee [Server SDK Workers](/docs/server-sdk/workers) for tool handling details.\n\n## Stream Events\n\nWorkers emit the same events as interactive agents, 
plus worker-specific events:\n\n| Event | Description |\n| --------------- | ---------------------------------- |\n| `worker-start` | Worker execution begins |\n| `worker-result` | Worker completes (includes output) |\n\nAll standard events (text-delta, tool calls, etc.) are also emitted.\n\n## Calling Workers from Interactive Agents\n\nInteractive agents can call workers in three ways:\n\n1. **Deterministically** - Using the `run-worker` block\n2. **Agentically** - LLM calls worker as a tool\n3. **Automatically** - Octavus, not the model, invokes the worker as part of a built-in capability. Context management\'s `summarizerWorker` (see [Context Management](/docs/protocol/context-management)) works this way: declare it in `workers:` but leave it out of `agent.workers` so the model never sees it as a tool.\n\n### Worker Declaration\n\nFirst, declare workers in your interactive agent\'s protocol:\n\n```yaml\nworkers:\n generate-title:\n description: Generating conversation title\n display: description\n research-assistant:\n description: Researching topic\n display: stream\n tools:\n search: web-search # Map worker tool \u2192 parent tool\n```\n\n### run-worker Block\n\nCall a worker deterministically from a handler:\n\n```yaml\nhandlers:\n request-human:\n Generate title:\n block: run-worker\n worker: generate-title\n input:\n CONVERSATION_SUMMARY: SUMMARY\n output: CONVERSATION_TITLE\n```\n\n### LLM Tool Invocation\n\nMake workers available to the LLM:\n\n```yaml\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n workers: [generate-title, research-assistant]\n agentic: true\n```\n\nThe LLM can then call workers as tools during conversation.\n\n### Display Modes\n\nControls how worker execution appears to users. 
The default for workers is `stream`.\n\n| Mode | Behavior |\n| ------------- | ---------------------------------------------------------------------------------------------------------------------------------- |\n| `hidden` | Worker runs silently. No events reach the client - no `UIWorkerPart` is created. |\n| `name` | Shows a running/done indicator with the worker name. No nested content (text, tool calls, reasoning) is forwarded. |\n| `description` | Shows a running/done indicator with the worker description. No nested content is forwarded. |\n| `stream` | Full visibility. All nested events are forwarded - text, reasoning, tool calls, sources, files. Worker input is included on start. |\n\n**Progressive input streaming:** When a worker with `display: stream` is invoked agentically (LLM calls it as a tool), the `UIWorkerPart` appears in the UI immediately as the LLM starts generating the worker\'s arguments. The worker input streams progressively into the worker part, the same way text tokens stream into a text part. Once input finishes, worker execution begins and nested content flows into the same worker part. There is no intermediate tool card.\n\n**`name` and `description` modes:** Worker input is stripped from the `worker-start` event (it may contain sensitive data). Only the running/done status and the final `worker-result` are forwarded to the parent stream. Use these for workers where the user only needs to know the worker ran, not what it did internally.\n\n**`hidden` mode:** The worker executes normally but produces no UI presence at all. 
Use for internal workers that are implementation details.\n\n### Tool Mapping\n\nMap parent tools to worker tools when the worker needs access to your tool handlers:\n\n```yaml\nworkers:\n research-assistant:\n description: Research topics\n tools:\n search: web-search # Worker\'s "search" \u2192 parent\'s "web-search"\n```\n\nWhen the worker calls its `search` tool, your `web-search` handler executes.\n\n## Next Steps\n\n- [Server SDK Workers](/docs/server-sdk/workers) - Executing workers from code\n- [Handlers](/docs/protocol/handlers) - Block reference for steps\n- [Agent Config](/docs/protocol/agent-config) - Model and settings\n',
|
|
1416
1434
|
excerpt: "Workers Workers are agents designed for task-based execution. Unlike interactive agents that handle multi-turn conversations, workers execute a sequence of steps and return an output value. When to...",
|
|
1417
1435
|
order: 11
|
|
1418
1436
|
},
|
|
@@ -1433,6 +1451,24 @@ See [Streaming Events](/docs/server-sdk/streaming#event-types) for the full list
|
|
|
1433
1451
|
content: "\n# MCP Servers\n\nMCP servers extend your agent with tools from external services. Define them in your protocol, and agents automatically discover and use their tools at runtime.\n\nThere are three types of MCP servers:\n\n| Source | Description | Example |\n| ---------- | ----------------------------------------------------------------------------------- | ------------------------------------- |\n| `remote` | HTTP-based MCP servers, managed by the platform | Figma, Sentry, GitHub |\n| `device` | Local MCP servers running on the agent's machine via `@octavus/computer` | Browser automation, filesystem |\n| `consumer` | Inline MCP servers defined in your server-sdk process via `createInlineMcpServer()` | Custom integrations, third-party APIs |\n\n## Defining MCP Servers\n\nMCP servers are defined in the `mcpServers:` section. The key becomes the **namespace** for all tools from that server.\n\n```yaml\nmcpServers:\n figma:\n description: Figma design tool integration\n source: remote\n display: description\n\n browser:\n description: Chrome DevTools browser automation\n source: device\n display: name\n\n github:\n description: Repository management - issues, pull requests, code\n source: consumer\n display: name\n```\n\n### Fields\n\n| Field | Required | Description |\n| ------------- | -------- | ---------------------------------------------------------------------------------------------------------------------- |\n| `description` | Yes | What the MCP server provides |\n| `source` | Yes | `remote`, `device`, or `consumer` (see source types above) |\n| `display` | No | How tool calls appear in UI: `hidden`, `name`, `description` (default: `description`) |\n| `connection` | No | When to connect: `eager` or `lazy` (default: `lazy`). `remote` only. |\n| `execution` | No | Where the MCP process runs: `sandbox` (default) or `device`. `remote` only. See [Device Execution](#device-execution). 
|\n\n### Display Modes\n\nDisplay modes control visibility of all tool calls from the MCP server, using the same modes as [regular tools](/docs/protocol/tools#display-modes):\n\n| Mode | Behavior |\n| ------------- | -------------------------------------- |\n| `hidden` | Tool calls run silently |\n| `name` | Shows tool name while executing |\n| `description` | Shows tool description while executing |\n\n## Making MCP Servers Available\n\nLike tools, MCP servers defined in `mcpServers:` must be referenced in `agent.mcpServers` to be available:\n\n```yaml\nmcpServers:\n figma:\n description: Figma design tool integration\n source: remote\n display: description\n\n sentry:\n description: Error tracking and debugging\n source: remote\n display: name\n\n browser:\n description: Chrome DevTools browser automation\n source: device\n display: name\n\n filesystem:\n description: Filesystem access for reading and writing files\n source: device\n display: hidden\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n mcpServers: [figma, sentry, browser, filesystem]\n tools: [set-chat-title]\n agentic: true\n maxSteps: 100\n```\n\n## Tool Namespacing\n\nAll MCP tools are automatically namespaced using `__` (double underscore) as a separator. The namespace comes from the `mcpServers` key.\n\nFor example, a server defined as `browser:` that exposes `navigate_page` and `click` produces:\n\n- `browser__navigate_page`\n- `browser__click`\n\nA server defined as `figma:` that exposes `get_design_context` produces:\n\n- `figma__get_design_context`\n\nThe namespace is stripped before calling the MCP server - the server receives the original tool name. 
This convention matches Anthropic's MCP integration in Claude Desktop and ensures tool names stay unique across servers.\n\n### What the LLM Sees\n\nWhen an agent has both regular tools and MCP servers configured, the LLM sees all tools combined:\n\n```\nProtocol tools:\n set-chat-title\n\nRemote MCP tools (auto-discovered):\n figma__get_design_context\n figma__get_screenshot\n sentry__get_issues\n sentry__get_issue_details\n\nDevice MCP tools (auto-discovered):\n browser__navigate_page\n browser__click\n browser__take_snapshot\n filesystem__read_file\n filesystem__write_file\n filesystem__list_directory\n\nConsumer MCP tools (provided by the server-sdk):\n github__get-pr-overview\n github__list-issues\n```\n\nYou don't define individual MCP tool schemas in the protocol - remote and device tools are auto-discovered from each MCP server at runtime, and consumer tools are supplied by the server-sdk.\n\n## Remote MCP Servers\n\nRemote MCP servers (`source: remote`) connect to HTTP-based MCP endpoints. The platform manages the connection, authentication, and tool discovery.\n\nConfiguration happens in the Octavus platform UI:\n\n1. Add an MCP server to your project (URL + authentication)\n2. The server's slug must match the namespace in your protocol\n3. The platform connects, discovers tools, and makes them available to the agent\n\n### Connection Modes\n\nThe `connection` field controls when the platform connects to a remote MCP server:\n\n| Mode | Behavior |\n| ------- | ---------------------------------------------------------------------------------------------------------------------- |\n| `lazy` | (default) The agent activates integrations on demand at runtime. The agent starts responding immediately. |\n| `eager` | The platform connects and discovers tools before the first LLM request. Tools are guaranteed available from message 1. 
|\n\n```yaml\nmcpServers:\n sentry:\n source: remote\n connection: eager # Always connected upfront\n display: name\n\n notion:\n source: remote\n # connection defaults to lazy - agent activates when needed\n display: description\n```\n\nWith **lazy connection** (the default), the agent receives two built-in tools - one for listing available integrations and one for activating them. The agent decides which integrations it needs based on the conversation and activates them on demand. This avoids paying connection latency for integrations the agent doesn't end up using.\n\nWith **eager connection**, the platform connects to the MCP server before the first LLM request, exactly like a declared tool. Use this when the agent needs the MCP's tools from the very first message.\n\nThe `connection` field is only valid on `source: remote` - device MCPs (`source: device`) have their own connection mechanism through the server-sdk. The `connection` field is respected for remote MCPs with `execution: device` the same way as sandbox MCPs.\n\n### Authentication\n\nRemote MCP servers support multiple authentication methods:\n\n| Auth Type | Description |\n| --------- | ------------------------------- |\n| MCP OAuth | Standard MCP OAuth flow |\n| API Key | Static API key sent as a header |\n| Bearer | Bearer token authentication |\n| None | No authentication required |\n\nAuthentication is configured per-project - different projects can connect to the same MCP server with different credentials.\n\n## Device Execution\n\nThe `execution` field controls where a remote MCP server's STDIO process runs. By default (`execution: sandbox`), the process runs in the platform's sandbox. 
When set to `execution: device`, the STDIO process runs on the agent's computer (VM or desktop) instead.\n\n```yaml\nmcpServers:\n code-tools:\n description: Code analysis and refactoring tools\n source: remote\n execution: device # STDIO process runs on the agent's computer\n display: name\n\n sentry:\n description: Error tracking\n source: remote\n # execution defaults to sandbox - runs in the platform\n display: name\n```\n\n### When to Use\n\nUse `execution: device` when the MCP server needs access to the agent's local environment - for example, tools that read from the local filesystem, interact with running processes, or need CLIs installed on the device.\n\n### Rules\n\n- `execution` is only meaningful for `source: remote` MCPs that use STDIO transport. HTTP-transport remote MCPs always connect from the platform regardless of the `execution` setting.\n- `execution` is **invalid** on `source: device` and `source: consumer` MCPs - they already run outside the platform. Using it produces a validation error.\n- The `connection` field (`eager` or `lazy`) is respected for device-executed MCPs the same way as sandbox-executed MCPs.\n\n## Device MCP Servers\n\nDevice MCP servers (`source: device`) run on the consumer's machine. The consumer provides the MCP tools via the `@octavus/computer` package (or any `ToolProvider` implementation) through the server-sdk.\n\nWhen an agent has device MCP servers:\n\n1. The consumer creates a `Computer` with matching namespaces\n2. `@octavus/computer` discovers tools from each MCP server\n3. Tool schemas are sent to the platform via the server-sdk\n4. 
Tool calls flow back to the consumer for execution\n\nSee [`@octavus/computer`](/docs/server-sdk/computer) for the full integration guide.\n\n### Namespace Matching\n\nThe `mcpServers` keys in the protocol must match the keys in the consumer's `Computer` configuration:\n\n```yaml\n# protocol.yaml\nmcpServers:\n browser: # \u2190 must match\n source: device\n filesystem: # \u2190 must match\n source: device\n```\n\n```typescript\nconst computer = new Computer({\n mcpServers: {\n browser: Computer.stdio('chrome-devtools-mcp', ['--browser-url=...']),\n filesystem: Computer.stdio('@modelcontextprotocol/server-filesystem', [dir]),\n },\n});\n```\n\nIf the consumer provides a namespace not declared in the protocol, the platform rejects it.\n\n## Consumer MCP Servers\n\nConsumer MCP servers (`source: consumer`) are defined inline in your server-sdk process. Tool schemas and handlers live in your code; the platform learns the namespace from the protocol and routes tool calls to your process via the same `dynamicToolSchemas` channel that device MCPs use.\n\n```yaml\nmcpServers:\n github:\n description: Repository management - issues, pull requests, code\n source: consumer\n display: name\n\nagent:\n mcpServers:\n - github\n```\n\nThe protocol declaration is intentionally minimal - the SDK supplies tool names and JSON schemas at runtime, so adding or evolving tools doesn't require a protocol change.\n\nUse consumer MCPs when:\n\n- The integration's credentials should never reach the platform (OAuth tokens, customer API keys).\n- You want to group an integration's tools (`github__list-prs`, `github__get-issue`) without enumerating each one in YAML.\n- You want type-safe handler arguments via Zod schemas.\n\nSee [`createInlineMcpServer`](/docs/server-sdk/inline-mcp) in the server-sdk reference for the full implementation guide.\n\n### Namespace Matching\n\nThe protocol namespace must match the namespace passed to `createInlineMcpServer()`:\n\n```yaml\n# 
protocol.yaml\nmcpServers:\n github: # \u2190 must match\n source: consumer\n```\n\n```typescript\nconst github = createInlineMcpServer('github', {\n /* tools... */\n});\n\nsession = client.agentSessions.attach(sessionId, {\n mcpServers: [github],\n});\n```\n\nIf the SDK provides a namespace not declared in the protocol, those tools are filtered out at the runtime boundary and the LLM never sees them.\n\n## Thread-Level Scoping\n\nThreads can scope which MCP servers are available, the same way they scope [tools](/docs/protocol/handlers#start-thread):\n\n```yaml\nhandlers:\n user-message:\n Start research:\n block: start-thread\n thread: research\n mcpServers: [figma, browser]\n tools: [set-chat-title]\n system: research-prompt\n```\n\nThis thread can use Figma and browser tools, but not sentry or filesystem - even if those are available on the main agent.\n\n## On-Demand MCP Servers\n\nBy default, an agent can only call MCP tools whose namespace is listed in `mcpServers`. With `onDemandMcpServers`, a scope can opt into **every connected MCP of a given source** at runtime, without enumerating each one in the protocol.\n\nRemote MCPs are connected at the project level from the Octavus dashboard. Normally, each connected MCP that the agent should be able to use has to be declared in the protocol - connecting a new MCP means editing the protocol and redeploying. `onDemandMcpServers` removes that round-trip: once a source is opted in, any MCP connected to the project under that source becomes available to the agent immediately.\n\nCurrently supported for `source: remote`.\n\n### Protocol-level declaration\n\nAdd an `onDemandMcpServers:` section alongside `mcpServers:`, keyed by source. 
Each entry configures how the matched MCPs appear in tool lists:\n\n```yaml\nmcpServers:\n figma:\n description: Figma design tool integration\n source: remote\n display: description\n\nonDemandMcpServers:\n remote:\n description: Additional connected integrations\n display: name\n execution: device # on-demand MCPs run on the agent's computer\n contextRetention:\n toolResults: { retainLast: 5 }\n```\n\nOn-demand MCP definitions also support the `execution` field. When set, all MCPs matched by that on-demand source inherit the execution mode.\n\n### Scope-level opt-in\n\nThe agent and individual `start-thread` blocks each choose whether to pick up on-demand MCPs, by listing the sources they want:\n\n```yaml\nagent:\n mcpServers: [figma]\n onDemandMcpServers: [remote]\n\nhandlers:\n user-message:\n focused:\n block: start-thread\n mcpServers: [figma]\n # no onDemandMcpServers - this thread does NOT see on-demand MCPs\n broad:\n block: start-thread\n mcpServers: [figma]\n onDemandMcpServers: [remote]\n```\n\n### Rules\n\n- A scope's tool list includes every **connected** MCP of any referenced source, whether or not any protocol declares that slug.\n- Undeclared namespaces inherit `description`, `display`, and `contextRetention` from the per-source entry in `onDemandMcpServers`.\n- Scopes decide independently - threads do not inherit `onDemandMcpServers` from their parent, the same rule as `mcpServers:`.\n- Tool namespaces are always the connector's slug (for example `notion__search`, `linear__create_issue`). Source keys are never namespaces.\n\nWorkers opt into on-demand MCPs the same way: through `start-thread` blocks inside `steps`. A worker without a `start-thread` that lists a source won't see on-demand MCPs of that source.\n\n## Workers\n\nWorkers can declare and use MCP servers using the same `mcpServers:` syntax. 
Workers resolve their own MCP connections independently - they don't inherit from a parent interactive agent.\n\n```yaml\n# Worker protocol\nmcpServers:\n sentry:\n description: Error tracking and debugging\n source: remote\n display: name\n browser:\n description: Chrome DevTools browser automation\n source: device\n display: name\n\nsteps:\n Start research:\n block: start-thread\n thread: research\n model: anthropic/claude-sonnet-4-5\n system: system\n mcpServers: [sentry, browser]\n maxSteps: 10\n```\n\nSince workers don't have a global `agent:` section, MCP servers are scoped per-thread via `start-thread` - the same way tools and skills work in workers. Remote MCP connections are project-scoped, so workers in the same project share the same OAuth connections.\n\nSee [Workers](/docs/protocol/workers) for the full worker protocol reference.\n\n## Full Example\n\n```yaml\nmcpServers:\n figma:\n description: Figma design tool integration\n source: remote\n connection: eager\n display: description\n sentry:\n description: Error tracking and debugging\n source: remote\n display: name\n browser:\n description: Chrome DevTools browser automation\n source: device\n display: name\n filesystem:\n description: Filesystem access for reading and writing files\n source: device\n display: hidden\n shell:\n description: Shell command execution\n source: device\n display: name\n\ntools:\n set-chat-title:\n description: Set the title of the current chat.\n parameters:\n title: { type: string, description: The title to set }\n\nagent:\n model: anthropic/claude-opus-4-6\n system: system\n mcpServers: [figma, sentry, browser, filesystem, shell]\n tools: [set-chat-title]\n thinking: medium\n maxSteps: 300\n agentic: true\n\ntriggers:\n user-message:\n input:\n USER_MESSAGE: { type: string }\n\nhandlers:\n user-message:\n Add message:\n block: add-message\n role: user\n prompt: user-message\n input: [USER_MESSAGE]\n display: hidden\n\n Respond:\n block: next-message\n```\n\n### 
Cloud-Only Agent\n\nAgents that only use remote MCP servers don't need `@octavus/computer`:\n\n```yaml\nmcpServers:\n figma:\n description: Figma design tool integration\n source: remote\n connection: eager # Need design tools from message 1\n display: description\n sentry:\n description: Error tracking and debugging\n source: remote\n # Lazy (default) - agent activates when debugging is needed\n display: name\n\ntools:\n submit-code:\n description: Submit code to the user.\n parameters:\n code: { type: string }\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n mcpServers: [figma, sentry]\n tools: [submit-code]\n agentic: true\n```\n",
|
|
1434
1452
|
excerpt: "MCP Servers MCP servers extend your agent with tools from external services. Define them in your protocol, and agents automatically discover and use their tools at runtime. There are three types of...",
|
|
1435
1453
|
order: 13
|
|
1454
|
+
},
|
|
1455
|
+
{
|
|
1456
|
+
slug: "protocol/context-management",
|
|
1457
|
+
section: "protocol",
|
|
1458
|
+
title: "Context Management",
|
|
1459
|
+
description: "Automatic context-window compaction so long sessions keep running past the model's limit.",
|
|
1460
|
+
content: "\n# Context Management\n\nLong-running sessions accumulate history - messages, tool results, screenshots, file reads. Once that history approaches the model's context window, the provider rejects the request and the session would otherwise fail. Two [agent config](/docs/protocol/agent-config) knobs make the agent robust to this: `maxToolOutputTokens` caps how much any single tool result puts into context, and `contextManagement` automatically compacts older history as it fills up. Together they keep a long task, a long conversation, or one oversized tool output from ending the session.\n\nCompaction and bounding transform only what the **model sees** on each request. The stored conversation is never changed - the complete history is always preserved.\n\n## Configuration\n\n```yaml\nworkers:\n context-summarizer: # the worker that produces the running summary\n description: Summarizes earlier conversation to free up context\n display: description\n\nagent:\n model: anthropic/claude-sonnet-4-5\n system: system\n maxToolOutputTokens: 300000 # safety cap on a single tool result (no default)\n # context-summarizer is intentionally NOT listed in agent.workers,\n # so the model never sees it as a callable tool.\n contextManagement:\n summarizerWorker: context-summarizer\n thresholdPercent: 0.8 # proactive trigger (no default; omit = reactive only)\n recentPercent: 0.3 # recent window kept verbatim (no default; omit = no summarization)\n```\n\n`maxToolOutputTokens` is a top-level `agent` field (a sibling of `model` and `system`), because bounding a single tool result is independent of history compaction. Workers set the same cap per thread on their [`start-thread`](/docs/protocol/workers) block. 
`contextManagement` groups the compaction knobs:\n\n| Field | Required | Description |\n| ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------- |\n| `summarizerWorker` | No | Slug of a worker (declared in `workers:`) that produces the running summary. Enables summarization-based compaction. |\n| `thresholdPercent` | No | Fraction of the model's context window at which compaction starts. No default; omit to disable proactive compaction. |\n| `recentPercent` | No | Fraction of the context window kept verbatim as the recent window. No default; omit to disable summarization. |\n| `recentWindow` | No | Deprecated and ignored. Superseded by `recentPercent` (a context-window fraction). |\n\n## How it works\n\n- When `maxToolOutputTokens` is set, every tool result is **bounded** before it enters the model's view: anything over the budget is replaced with a head-and-tail preview plus a note saying how much was omitted and how to fetch the rest. The full result is still preserved in the stored conversation, so nothing is lost - the model just sees a bounded copy and can narrow, page, or search for more.\n- When `thresholdPercent` is set and the prompt crosses that fraction of the context window, the oldest turns are folded into a **running summary** while the original task and the most-recent turns (`recentPercent` of the context window, a token budget) are kept verbatim - so the agent keeps the goal and full fidelity on what it is doing now. 
Both are opt-in with no default: omit them and the agent does no proactive compaction, relying on the automatic recovery below.\n- Compaction is **incremental**: each cycle only summarizes the newly-expired turns and folds them into the existing summary, so cost stays bounded no matter how long the session runs.\n- If the model rejects a request for being too long anyway, the agent recovers automatically (it reduces context and retries) rather than failing the session.\n\n## Bounded tool output\n\nSome tool calls return very large output - a big file read, a full-page extract, a large MCP or skill result. Left unbounded, one such call can blow past the context window in a single step. Set `maxToolOutputTokens` on the agent (or, for a worker, on its `start-thread` block) to cap how much of any single result reaches the model, while the full result stays in the stored conversation and the trace.\n\nThere is no default: bounding only happens when you set `maxToolOutputTokens`, so the runtime never silently truncates output you did not ask it to. When a result is truncated, the model is always told what was omitted and how to retrieve it, so it can decide to narrow the request, paginate, or read a specific range.\n\nBounding is never hidden: each time a tool result first crosses the budget, a `tool-output-bounded` entry is recorded in the session's execution logs with the tool name, the original size, and the cap. The full, untruncated result stays in the corresponding `tool-result` entry, so you can always see both what the model saw and the complete output.\n\n## The summarizer worker\n\n`summarizerWorker` points at a worker you define and ship like any other (see [Workers](/docs/protocol/workers)). 
It takes two inputs - `PREVIOUS_SUMMARY` (the running summary so far) and `CONVERSATION` (the older turns to fold in) - and returns the updated summary.\n\nSummarization is gated on the sizing knobs: the summarizer worker runs only if you also set `recentPercent` (the recent window it folds around), and it runs **proactively** only if you also set `thresholdPercent`. If you set `summarizerWorker` without `recentPercent`, the worker never runs - validation warns you about this.\n\nDeclare it in the top-level `workers:` section so it can be resolved, but keep it **out** of `agent.workers`: that list is what the model can call as a tool, and the summarizer is invoked automatically, never chosen by the model.\n\nWithout a `summarizerWorker`, the agent still recovers from a context overflow by reducing older tool results, but it won't produce a summary of earlier turns.\n\n## What users see\n\nBecause the summarizer is a worker, it surfaces like any other worker, following its `display` mode (a subtle `description` indicator by default). Compaction is otherwise seamless - the conversation reads as one continuous thread and the complete history is preserved.\n",
|
|
1461
|
+
excerpt: "Context Management Long-running sessions accumulate history - messages, tool results, screenshots, file reads. Once that history approaches the model's context window, the provider rejects the...",
|
|
1462
|
+
order: 14
|
|
1463
|
+
},
|
|
1464
|
+
{
|
|
1465
|
+
slug: "protocol/fast-mode",
|
|
1466
|
+
section: "protocol",
|
|
1467
|
+
title: "Fast Mode",
|
|
1468
|
+
description: "Run supported Anthropic Opus models at higher output speed for latency-sensitive agents.",
|
|
1469
|
+
content: "\n# Fast Mode\n\nFast mode runs a supported Anthropic Opus model with a faster inference configuration - higher output tokens per second, same weights and behavior - at premium pricing. Enable it with the `speed` field in the [agent config](/docs/protocol/agent-config):\n\n```yaml\nagent:\n model: anthropic/claude-opus-4-8\n speed: fast # fast | standard (default)\n```\n\n| Mode | Behavior | When to use |\n| ---------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------- |\n| `standard` | Default speed and pricing. Used whenever `speed` is omitted. | Most agents. |\n| `fast` | Higher output speed at a premium per-token rate. | Latency-sensitive, interactive agents where faster responses are worth the premium. |\n\nFast mode is orthogonal to thinking - it's a speed/price knob, not an intelligence one, and keeps full reasoning.\n\n## Supported models\n\nFast mode only applies to **Anthropic Opus 4.8, 4.7, and 4.6**. On any other model or provider it is a **no-op**: the request runs at standard speed and price, and never errors. This makes it safe to leave `speed: fast` set when using a dynamic model (resolved from input) that might turn out not to support it.\n\nWhen you set `speed: fast` on a literal model that does not support it, the protocol validator surfaces a non-fatal warning in the dashboard.\n\n## Premium pricing\n\nFast mode applies a per-model multiplier over the model's standard rates, to both input and output across the full context window:\n\n| Model | Fast-mode cost |\n| -------------- | -------------- |\n| Opus 4.8 | ~2x standard |\n| Opus 4.7 / 4.6 | ~6x standard |\n\nPrompt-caching costs continue to apply on top of the fast-mode base rates. 
Billing always reflects the speed a request **actually** ran at: a request that falls back to standard speed (see below) is billed at standard rates, so requesting fast never by itself triggers premium billing.\n\n## Rate limits and fallback\n\nFast mode has a dedicated rate limit, separate from standard Opus limits. When it is exhausted the agent degrades gracefully instead of failing: the request automatically retries at standard speed on the same model, then falls back to your configured [backup model](/docs/protocol/agent-config) if needed, before surfacing an error.\n\nFalling back to standard speed is a prompt-cache miss, since fast and standard requests do not share cached prefixes. The fallback is recorded in the session trace, so it is clear when a request that asked for fast ran at standard (or on the backup model) and why.\n\n## Routing\n\nA supported Opus model can be reached through more than one provider, and fast mode is expressed differently on each - the `speed` field handles the translation:\n\n| Route | Example model | How fast mode is enabled |\n| ----------------- | ------------------------------------------- | ----------------------------------------------------------------- |\n| Direct Anthropic | `anthropic/claude-opus-4-8` | `speed: fast` |\n| Vercel AI Gateway | `vercel/anthropic/claude-opus-4.7` | `speed: fast` |\n| OpenRouter | `openrouter/anthropic/claude-opus-4.8-fast` | Select the dedicated `-fast` model slug (`speed` is ignored here) |\n\n## Passing speed as input\n\nLike `thinking`, `speed` accepts a variable reference so consumers choose it per session:\n\n```yaml\ninput:\n SPEED:\n type: string\n description: Inference speed (fast/standard)\n optional: true\n\nagent:\n model: anthropic/claude-opus-4-8\n speed: SPEED # Resolved from session input; unset -> standard\n system: system\n```\n\nAn unset optional variable resolves to `standard`, so existing agents are never silently upgraded to premium pricing.\n\n## Scope\n\n`speed` 
follows the same scoping as `thinking`: set it at agent scope (the default for the main thread) or per named thread in a `start-thread` block (see [Thread-Specific Config](/docs/protocol/agent-config)). Worker agents configure everything through their threads, so a worker enables fast mode the same way, on its `start-thread` block. Thread settings take precedence over the agent default.\n",
|
|
1470
|
+
excerpt: "Fast Mode Fast mode runs a supported Anthropic Opus model with a faster inference configuration - higher output tokens per second, same weights and behavior - at premium pricing. Enable it with the ...",
|
|
1471
|
+
order: 15
|
|
1436
1472
|
}
|
|
1437
1473
|
]
|
|
1438
1474
|
},
|
|
@@ -1538,4 +1574,4 @@ export {
|
|
|
1538
1574
|
getDocSlugs,
|
|
1539
1575
|
getSectionBySlug
|
|
1540
1576
|
};
|
|
1541
|
-
//# sourceMappingURL=chunk-
|
|
1577
|
+
//# sourceMappingURL=chunk-Z2OPVMHI.js.map
|