@octavus/docs 3.2.0 → 3.4.0
- package/content/02-server-sdk/03-tools.md +8 -7
- package/content/02-server-sdk/08-computer.md +92 -0
- package/content/02-server-sdk/09-inline-mcp.md +204 -0
- package/content/03-client-sdk/08-file-uploads.md +11 -5
- package/content/04-protocol/05-skills.md +83 -21
- package/content/04-protocol/09-skills-advanced.md +26 -8
- package/content/04-protocol/11-workers.md +3 -1
- package/content/04-protocol/13-mcp-servers.md +105 -13
- package/dist/{chunk-R4UMXGAC.js → chunk-5L4PRXYU.js} +45 -27
- package/dist/chunk-5L4PRXYU.js.map +1 -0
- package/dist/content.js +1 -1
- package/dist/docs.json +22 -13
- package/dist/index.js +1 -1
- package/dist/search-index.json +1 -1
- package/dist/search.js +1 -1
- package/dist/search.js.map +1 -1
- package/dist/sections.json +22 -13
- package/package.json +1 -1
- package/dist/chunk-R4UMXGAC.js.map +0 -1
|
@@ -9,18 +9,19 @@ Tools extend what agents can do. In Octavus, tools can execute either on your se
|
|
|
9
9
|
|
|
10
10
|
## Server Tools vs Client Tools
|
|
11
11
|
|
|
12
|
-
| Location
|
|
13
|
-
|
|
|
14
|
-
| **Server**
|
|
15
|
-
| **MCP**
|
|
16
|
-
| **
|
|
12
|
+
| Location | Use Case | Registration |
|
|
13
|
+
| -------------- | ------------------------------------------------- | -------------------------------------------------------- |
|
|
14
|
+
| **Server** | Database queries, API calls, sensitive operations | Register handler in `attach()` |
|
|
15
|
+
| **Inline MCP** | Group an integration's tools (GitHub, Salesforce) | [`createInlineMcpServer()`](/docs/server-sdk/inline-mcp) |
|
|
16
|
+
| **Computer** | Browser, filesystem, shell, external services | [`session.setDynamicTools()`](/docs/server-sdk/computer) |
|
|
17
|
+
| **Client** | Browser APIs, interactive UIs, confirmations | No server handler (forwarded to client) |
|
|
17
18
|
|
|
18
19
|
When the Server SDK encounters a tool call:
|
|
19
20
|
|
|
20
|
-
1. **Handler exists** (server or dynamic) → Execute on server, continue automatically
|
|
21
|
+
1. **Handler exists** (server, inline MCP, or dynamic) → Execute on server, continue automatically
|
|
21
22
|
2. **No handler** → Forward to client via `client-tool-request` event
|
|
22
23
|
|
|
23
|
-
MCP
|
|
24
|
+
Inline MCP tools and dynamic tools registered via `session.setDynamicTools()` (e.g., from `@octavus/computer`) work identically to manual handlers from the platform's perspective. See [Inline MCP Servers](/docs/server-sdk/inline-mcp) for namespaced consumer-defined tool groups, and [Computer](/docs/server-sdk/computer) for device-side MCPs.
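The two-step routing rule above can be sketched as a single lookup. This is a minimal illustration under stated assumptions, not the SDK's implementation: `routeToolCall`, `ToolCall`, and the `Routed` result shape are hypothetical names. Only the `client-tool-request` event name comes from the text.

```typescript
// Hypothetical types for illustration; not exported by @octavus/server-sdk.
type ToolHandler = (args: Record<string, unknown>) => unknown;

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type Routed =
  | { kind: 'executed'; result: unknown }
  | { kind: 'forwarded'; event: 'client-tool-request'; call: ToolCall };

// Manual, inline MCP, and dynamic handlers are assumed to share one lookup table.
// Handlers are synchronous here only to keep the sketch short.
function routeToolCall(handlers: Map<string, ToolHandler>, call: ToolCall): Routed {
  const handler = handlers.get(call.name);
  if (handler) {
    // 1. Handler exists: execute on the server.
    return { kind: 'executed', result: handler(call.args) };
  }
  // 2. No handler: forward the call to the client.
  return { kind: 'forwarded', event: 'client-tool-request', call };
}
```

Because the lookup is by name only, the source of a handler (manual, inline MCP, or dynamic) does not affect routing in this sketch.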
|
|
24
25
|
|
|
25
26
|
For client-side tool handling, see [Client Tools](/docs/client-sdk/client-tools).
|
|
26
27
|
|
|
@@ -180,6 +180,68 @@ await computer.stop();
|
|
|
180
180
|
|
|
181
181
|
Always call `stop()` when the session ends to clean up MCP subprocesses. For managed processes (like Chrome), pass them in the config for automatic cleanup.
|
|
182
182
|
|
|
183
|
+
## Dynamic Entries
|
|
184
|
+
|
|
185
|
+
You can add or remove MCP entries on a running `Computer` after `start()` has returned. This is useful when MCP configurations arrive after construction, for example when a session manager receives per-session entries from a dispatch payload and wants to wire them into the existing computer instead of rebuilding it.
|
|
186
|
+
|
|
187
|
+
### `addEntry(namespace, entry, options?)`
|
|
188
|
+
|
|
189
|
+
Registers a new MCP entry under `namespace`. By default, connects immediately:
|
|
190
|
+
|
|
191
|
+
```typescript
|
|
192
|
+
await computer.addEntry(
|
|
193
|
+
'github',
|
|
194
|
+
Computer.stdio('@modelcontextprotocol/server-github', [], {
|
|
195
|
+
env: { GITHUB_PERSONAL_ACCESS_TOKEN: process.env.GH_TOKEN! },
|
|
196
|
+
}),
|
|
197
|
+
);
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
Pass `{ deferred: true }` to register the entry without connecting. The entry starts in a degraded state and connects on the next `restartEntry(namespace)` call - useful for lazy MCPs the agent activates on demand:
|
|
201
|
+
|
|
202
|
+
```typescript
|
|
203
|
+
await computer.addEntry('github', githubEntry, { deferred: true });
|
|
204
|
+
|
|
205
|
+
// Later, when the agent decides it needs GitHub:
|
|
206
|
+
await computer.restartEntry('github');
|
|
207
|
+
```
|
|
208
|
+
|
|
209
|
+
`addEntry` throws if the namespace already exists. To replace an entry, call `removeEntry` first.
|
|
210
|
+
|
|
211
|
+
If the immediate connection fails, `addEntry` does not throw - the entry is registered as degraded with the error message attached. Inspect the entry via `getHealth()`, or call `restartEntry()` to retry.
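The inspect-then-retry flow can be sketched as follows. This assumes only the `getHealth()` and `restartEntry()` signatures and the `ComputerHealth` / `EntryHealth` shapes from this document's class reference. `retryUnhealthyEntries` and `HealthAwareComputer` are hypothetical names. The text does not say whether `restartEntry()` rejects on failure, so the sketch tolerates both cases.

```typescript
// Structural subset of the Computer API shown in this document's class reference.
interface EntryHealth {
  name: string;
  healthy: boolean;
  error?: string;
}

interface ComputerHealth {
  healthy: boolean;
  entries: EntryHealth[];
  totalTools: number;
}

interface HealthAwareComputer {
  getHealth(): Promise<ComputerHealth>;
  restartEntry(namespace: string): Promise<void>;
}

// Hypothetical helper: restart each unhealthy entry once, then return the
// names that are still unhealthy afterwards.
async function retryUnhealthyEntries(computer: HealthAwareComputer): Promise<string[]> {
  const before = await computer.getHealth();
  for (const entry of before.entries) {
    if (entry.healthy) continue;
    try {
      await computer.restartEntry(entry.name);
    } catch {
      // Assumption: restartEntry may reject; the follow-up health check reports the outcome.
    }
  }
  const after = await computer.getHealth();
  return after.entries.filter((entry) => !entry.healthy).map((entry) => entry.name);
}
```

Note that deferred entries also start out degraded, so a helper like this would connect them as well. The class reference also lists `retryDegraded()`, which may already cover this case. The sketch only illustrates the flow.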
|
|
212
|
+
|
|
213
|
+
### `removeEntry(namespace)`
|
|
214
|
+
|
|
215
|
+
Closes the entry's connection (if any) and drops it from the configuration. No-op when the namespace doesn't exist:
|
|
216
|
+
|
|
217
|
+
```typescript
|
|
218
|
+
await computer.removeEntry('github');
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
### `restartEntry(namespace)`
|
|
222
|
+
|
|
223
|
+
Closes the existing connection (if any) and reconnects with the current configuration:
|
|
224
|
+
|
|
225
|
+
```typescript
|
|
226
|
+
await computer.restartEntry('github');
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
Use this to bring a deferred entry online for the first time, or to recover an entry that became degraded mid-session.
|
|
230
|
+
|
|
231
|
+
### Detecting dynamic-entry support
|
|
232
|
+
|
|
233
|
+
Consumers that work with arbitrary `ToolProvider` implementations can detect dynamic-entry capability with `isDynamicMcpProvider`:
|
|
234
|
+
|
|
235
|
+
```typescript
|
|
236
|
+
import { isDynamicMcpProvider } from '@octavus/server-sdk';
|
|
237
|
+
|
|
238
|
+
if (isDynamicMcpProvider(provider)) {
|
|
239
|
+
await provider.addEntry('github', githubEntry);
|
|
240
|
+
}
|
|
241
|
+
```
|
|
242
|
+
|
|
243
|
+
`Computer` always passes this check.
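For intuition, a capability check of this kind is usually a TypeScript type predicate that probes for the dynamic-entry methods. The sketch below is an assumption about how such a guard could look, not the SDK's actual `isDynamicMcpProvider`. The `ToolProviderLike` and `DynamicMcpProviderLike` shapes and the `looksLikeDynamicMcpProvider` name are hypothetical.

```typescript
// Hypothetical shapes; the real ToolProvider type lives in @octavus/server-sdk.
interface ToolProviderLike {
  toolHandlers(): Record<string, unknown>;
  toolSchemas(): unknown[];
}

interface DynamicMcpProviderLike extends ToolProviderLike {
  addEntry(namespace: string, entry: unknown, options?: { deferred?: boolean }): Promise<void>;
  removeEntry(namespace: string): Promise<void>;
  restartEntry(namespace: string): Promise<void>;
}

// Type predicate: narrows the provider when all three dynamic-entry methods are present.
function looksLikeDynamicMcpProvider(
  provider: ToolProviderLike,
): provider is DynamicMcpProviderLike {
  const candidate = provider as Partial<DynamicMcpProviderLike>;
  return (
    typeof candidate.addEntry === 'function' &&
    typeof candidate.removeEntry === 'function' &&
    typeof candidate.restartEntry === 'function'
  );
}
```

Inside an `if` guarded by such a predicate, calls like `provider.addEntry(...)` type-check without casts.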
|
|
244
|
+
|
|
183
245
|
## Chrome Launch Helper
|
|
184
246
|
|
|
185
247
|
For desktop applications that need to control a browser, `Computer.launchChrome()` launches Chrome with remote debugging enabled:
|
|
@@ -384,10 +446,38 @@ class Computer implements ToolProvider {
|
|
|
384
446
|
start(): Promise<{ errors: string[] }>;
|
|
385
447
|
stop(): Promise<void>;
|
|
386
448
|
|
|
449
|
+
// Dynamic entries
|
|
450
|
+
addEntry(namespace: string, entry: McpEntry, options?: { deferred?: boolean }): Promise<void>;
|
|
451
|
+
removeEntry(namespace: string): Promise<void>;
|
|
452
|
+
restartEntry(namespace: string): Promise<void>;
|
|
453
|
+
stopEntry(namespace: string): Promise<void>;
|
|
454
|
+
|
|
455
|
+
// Health
|
|
456
|
+
getHealth(): Promise<ComputerHealth>;
|
|
457
|
+
ensureReady(): Promise<EnsureReadyResult>;
|
|
458
|
+
retryDegraded(): Promise<{ recovered: string[]; stillDegraded: string[] }>;
|
|
459
|
+
|
|
387
460
|
// ToolProvider implementation
|
|
388
461
|
toolHandlers(): Record<string, ToolHandler>;
|
|
389
462
|
toolSchemas(): ToolSchema[];
|
|
390
463
|
}
|
|
464
|
+
|
|
465
|
+
interface ComputerHealth {
|
|
466
|
+
healthy: boolean;
|
|
467
|
+
entries: EntryHealth[];
|
|
468
|
+
totalTools: number;
|
|
469
|
+
}
|
|
470
|
+
|
|
471
|
+
interface EntryHealth {
|
|
472
|
+
name: string;
|
|
473
|
+
healthy: boolean;
|
|
474
|
+
error?: string;
|
|
475
|
+
}
|
|
476
|
+
|
|
477
|
+
interface EnsureReadyResult extends ComputerHealth {
|
|
478
|
+
recovered?: string[];
|
|
479
|
+
failedEntries?: string[];
|
|
480
|
+
}
|
|
391
481
|
```
|
|
392
482
|
|
|
393
483
|
### ComputerConfig
|
|
@@ -396,6 +486,8 @@ class Computer implements ToolProvider {
|
|
|
396
486
|
interface ComputerConfig {
|
|
397
487
|
mcpServers: Record<string, McpEntry>;
|
|
398
488
|
managedProcesses?: { process: ChildProcess }[];
|
|
489
|
+
/** Namespaces to skip during start() - they begin as degraded and can be connected on demand via restartEntry(). */
|
|
490
|
+
deferredEntries?: string[];
|
|
399
491
|
}
|
|
400
492
|
|
|
401
493
|
type McpEntry = StdioConfig | HttpConfig | ShellConfig;
|
|
@@ -0,0 +1,204 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: Inline MCP Servers
|
|
3
|
+
description: Group an integration's tools into a Zod-typed bundle that runs in your server process.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Inline MCP Servers
|
|
7
|
+
|
|
8
|
+
Inline MCP servers let you group an integration's tools (e.g., GitHub, Salesforce, an internal microservice) into a namespaced bundle with Zod-typed handler arguments. The tools execute in your server process via the same tool-request/continue path as ordinary [server tools](/docs/server-sdk/tools), so authentication and credentials stay in your process.
|
|
9
|
+
|
|
10
|
+
## When to Use
|
|
11
|
+
|
|
12
|
+
Use an inline MCP server when:
|
|
13
|
+
|
|
14
|
+
- You're integrating a third-party API and want a logical grouping (`github__list-prs`, `github__get-issue`).
|
|
15
|
+
- You want type-safe handler arguments instead of casting `args` from `unknown`.
|
|
16
|
+
- You want to evolve the toolset without protocol-yaml round trips - tool names and schemas are sent at runtime.
|
|
17
|
+
- Tool calls need credentials your platform should never see (OAuth tokens, customer API keys).
|
|
18
|
+
|
|
19
|
+
For comparison with the other tool registration paths, see the [tools overview](/docs/server-sdk/tools#server-tools-vs-client-tools).
|
|
20
|
+
|
|
21
|
+
## Protocol Declaration
|
|
22
|
+
|
|
23
|
+
Declare the namespace in `protocol.yaml` with `source: consumer`. The platform learns the namespace and routes tool calls to your process; tool names and JSON schemas are provided by the SDK at runtime.
|
|
24
|
+
|
|
25
|
+
```yaml
|
|
26
|
+
mcpServers:
|
|
27
|
+
github:
|
|
28
|
+
description: Repository management - issues, pull requests, code
|
|
29
|
+
source: consumer
|
|
30
|
+
display: name
|
|
31
|
+
|
|
32
|
+
agent:
|
|
33
|
+
mcpServers:
|
|
34
|
+
- github
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
See [MCP Servers in the protocol reference](/docs/protocol/mcp-servers) for the full set of MCP source types and field semantics.
|
|
38
|
+
|
|
39
|
+
## Defining the Server
|
|
40
|
+
|
|
41
|
+
```typescript
|
|
42
|
+
import { z } from 'zod';
|
|
43
|
+
import { createInlineMcpServer, defineInlineMcpTool } from '@octavus/server-sdk';
|
|
44
|
+
|
|
45
|
+
const github = createInlineMcpServer('github', {
|
|
46
|
+
tools: {
|
|
47
|
+
'get-pr-overview': defineInlineMcpTool({
|
|
48
|
+
description: 'Get pull request metadata and file changes',
|
|
49
|
+
parameters: z.object({
|
|
50
|
+
owner: z.string(),
|
|
51
|
+
repo: z.string(),
|
|
52
|
+
pullNumber: z.number(),
|
|
53
|
+
}),
|
|
54
|
+
handler: async (args) => {
|
|
55
|
+
// args is { owner: string; repo: string; pullNumber: number }
|
|
56
|
+
return await githubService.getPrOverview(args.owner, args.repo, args.pullNumber);
|
|
57
|
+
},
|
|
58
|
+
}),
|
|
59
|
+
|
|
60
|
+
'list-issues': defineInlineMcpTool({
|
|
61
|
+
description: 'List open issues for a repository',
|
|
62
|
+
parameters: z.object({
|
|
63
|
+
owner: z.string(),
|
|
64
|
+
repo: z.string(),
|
|
65
|
+
state: z.enum(['open', 'closed', 'all']).default('open'),
|
|
66
|
+
}),
|
|
67
|
+
handler: async (args) => {
|
|
68
|
+
return await githubService.listIssues(args.owner, args.repo, args.state);
|
|
69
|
+
},
|
|
70
|
+
}),
|
|
71
|
+
},
|
|
72
|
+
});
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
The factory:
|
|
76
|
+
|
|
77
|
+
1. Validates the namespace and each tool name (see [naming rules](#naming-rules)).
|
|
78
|
+
2. Converts each Zod schema to JSON Schema once at creation time.
|
|
79
|
+
3. Returns an `InlineMcpServer` exposing `toolSchemas()` and `toolHandlers()`.
|
|
80
|
+
|
|
81
|
+
The resulting tool names are namespaced: `github__get-pr-overview`, `github__list-issues`.
|
|
82
|
+
|
|
83
|
+
## Why `defineInlineMcpTool`
|
|
84
|
+
|
|
85
|
+
`defineInlineMcpTool()` is a no-op at runtime, but it preserves Zod type inference. Without the wrapper, TypeScript collapses the per-tool generic when the tools are placed in a record literal, leaving `args` typed as `unknown`:
|
|
86
|
+
|
|
87
|
+
```typescript
|
|
88
|
+
// Without defineInlineMcpTool - args ends up as `unknown`
|
|
89
|
+
tools: {
|
|
90
|
+
'get-pr-overview': {
|
|
91
|
+
description: '...',
|
|
92
|
+
parameters: z.object({ owner: z.string() }),
|
|
93
|
+
handler: async (args) => args.owner, // ❌ TS error: args is 'unknown'
|
|
94
|
+
},
|
|
95
|
+
}
|
|
96
|
+
|
|
97
|
+
// With defineInlineMcpTool - args inferred from the schema
|
|
98
|
+
tools: {
|
|
99
|
+
'get-pr-overview': defineInlineMcpTool({
|
|
100
|
+
description: '...',
|
|
101
|
+
parameters: z.object({ owner: z.string() }),
|
|
102
|
+
handler: async (args) => args.owner, // ✓ args is { owner: string }
|
|
103
|
+
}),
|
|
104
|
+
}
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
The handler also receives Zod-validated arguments. Invalid inputs throw before reaching your code, with the failed paths and messages joined into the error.
|
|
108
|
+
|
|
109
|
+
## Attaching to a Session
|
|
110
|
+
|
|
111
|
+
Pass inline MCP servers via `mcpServers` on `attach()`. They merge with `tools` and survive across `setDynamicTools()` calls:
|
|
112
|
+
|
|
113
|
+
```typescript
|
|
114
|
+
const session = client.agentSessions.attach(sessionId, {
|
|
115
|
+
tools: {
|
|
116
|
+
'get-user-account': async (args) => db.users.findById(args.userId as string),
|
|
117
|
+
},
|
|
118
|
+
mcpServers: [github, salesforce],
|
|
119
|
+
});
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
Workers accept the same option:
|
|
123
|
+
|
|
124
|
+
```typescript
|
|
125
|
+
const { output } = await client.workers.generate(agentId, input, {
|
|
126
|
+
mcpServers: [github],
|
|
127
|
+
});
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
## Authentication and Credentials
|
|
131
|
+
|
|
132
|
+
Handlers close over your server's auth context, so credentials never leave your process. The platform receives the namespaced schema list and the tool call name; it never sees the keys you use to fulfill the call.
|
|
133
|
+
|
|
134
|
+
A common pattern is to build the MCP server per-request when auth depends on the user:
|
|
135
|
+
|
|
136
|
+
```typescript
|
|
137
|
+
function buildGithubMcp(token: string) {
|
|
138
|
+
const client = new GithubClient({ token });
|
|
139
|
+
return createInlineMcpServer('github', {
|
|
140
|
+
tools: {
|
|
141
|
+
'get-pr-overview': defineInlineMcpTool({
|
|
142
|
+
description: 'Get pull request metadata and file changes',
|
|
143
|
+
parameters: z.object({
|
|
144
|
+
owner: z.string(),
|
|
145
|
+
repo: z.string(),
|
|
146
|
+
pullNumber: z.number(),
|
|
147
|
+
}),
|
|
148
|
+
handler: async (args) => client.pulls.get(args),
|
|
149
|
+
}),
|
|
150
|
+
},
|
|
151
|
+
});
|
|
152
|
+
}
|
|
153
|
+
|
|
154
|
+
export async function POST(request: Request) {
|
|
155
|
+
const user = await authenticate(request);
|
|
156
|
+
const session = client.agentSessions.attach(sessionId, {
|
|
157
|
+
mcpServers: [buildGithubMcp(user.githubToken)],
|
|
158
|
+
});
|
|
159
|
+
|
|
160
|
+
const events = session.execute(payload, { signal: request.signal });
|
|
161
|
+
return new Response(toSSEStream(events));
|
|
162
|
+
}
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
For static credentials (one tenant per deployment), build the server once at module scope.
|
|
166
|
+
|
|
167
|
+
## How Tool Calls Flow
|
|
168
|
+
|
|
169
|
+
1. The agent emits a `tool-request` event for `github__get-pr-overview` (or another inline-MCP-namespaced tool).
|
|
170
|
+
2. The Server SDK looks up the handler registered by `createInlineMcpServer()` and runs it. Zod validates `args` against the tool's schema; a validation failure becomes a tool error sent back to the LLM.
|
|
171
|
+
3. The handler returns; the SDK posts the result back to the platform via the same continuation request used for ordinary server tools.
|
|
172
|
+
4. The platform feeds the result to the LLM and streams the next response chunk.
|
|
173
|
+
|
|
174
|
+
There is no separate transport - inline MCP tools ride on the same `dynamicToolSchemas` channel that device MCPs use, so no additional infrastructure is required.
|
|
175
|
+
|
|
176
|
+
## Naming Rules
|
|
177
|
+
|
|
178
|
+
`createInlineMcpServer()` validates the namespace and each tool name at construction time. Invalid values throw immediately:
|
|
179
|
+
|
|
180
|
+
- **Namespace:** lowercase letters, digits, and hyphens; must start with a letter (`/^[a-z][a-z0-9-]*$/`).
|
|
181
|
+
- **Tool name:** lowercase letters, digits, underscores, and hyphens; must start with a letter (`/^[a-z][a-z0-9_-]*$/`).
|
|
182
|
+
|
|
183
|
+
The resulting `${namespace}__${toolName}` is what the LLM sees and what flows through the platform's MCP routing.
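The two patterns above can be exercised directly. The sketch below reuses the regular expressions quoted in the rules. `namespacedToolName` is a hypothetical helper written for illustration, and the error messages are not the SDK's.

```typescript
// Patterns copied from the naming rules above.
const NAMESPACE_PATTERN = /^[a-z][a-z0-9-]*$/;
const TOOL_NAME_PATTERN = /^[a-z][a-z0-9_-]*$/;

// Hypothetical helper: validate both parts, then join them with a double underscore.
function namespacedToolName(namespace: string, toolName: string): string {
  if (!NAMESPACE_PATTERN.test(namespace)) {
    throw new Error(`Invalid namespace: "${namespace}"`);
  }
  if (!TOOL_NAME_PATTERN.test(toolName)) {
    throw new Error(`Invalid tool name: "${toolName}"`);
  }
  return `${namespace}__${toolName}`;
}

namespacedToolName('github', 'get-pr-overview'); // "github__get-pr-overview"
namespacedToolName('github', 'list_issues'); // "github__list_issues" (underscores are allowed in tool names)
// namespacedToolName('git_hub', 'list-issues') would throw: underscores are not allowed in namespaces.
```

Since namespaces cannot contain underscores, the first `__` in a namespaced name always marks the boundary between namespace and tool name.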
|
|
184
|
+
|
|
185
|
+
## Collision Rules
|
|
186
|
+
|
|
187
|
+
The resolver throws on the following conflicts so problems surface at attach time, not mid-stream:
|
|
188
|
+
|
|
189
|
+
- Each inline MCP server's `namespace` must be unique across the array passed to `attach()` or the workers API.
|
|
190
|
+
- A namespaced tool name (`namespace__tool`) cannot collide with a static tool handler key passed via `tools`.
|
|
191
|
+
- A namespaced tool name cannot collide with a `dynamicToolSchemas` entry passed to the workers API.
|
|
192
|
+
|
|
193
|
+
If a tool registered via `setDynamicTools()` later collides with an inline MCP tool name, the dynamic handler wins for the duration of that dynamic-tool set; the inline MCP handler is restored on the next `setDynamicTools()` call that doesn't re-register the same name.
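The attach-time checks in the list above amount to two set-membership tests. The following is a minimal sketch under stated assumptions: `InlineServerLike` is a hypothetical shape (the real `InlineMcpServer` exposes `toolSchemas()` and `toolHandlers()`), and `assertNoCollisions` is not an SDK function. It covers the first two bullets. The `dynamicToolSchemas` check would follow the same pattern.

```typescript
// Hypothetical shape for illustration only.
interface InlineServerLike {
  namespace: string;
  toolNames: string[];
}

// Throws on duplicate namespaces or on a namespaced name that matches a static tool key.
function assertNoCollisions(servers: InlineServerLike[], staticToolKeys: string[]): void {
  const seenNamespaces = new Set<string>();
  const staticKeys = new Set(staticToolKeys);

  for (const server of servers) {
    if (seenNamespaces.has(server.namespace)) {
      throw new Error(`Duplicate inline MCP namespace: ${server.namespace}`);
    }
    seenNamespaces.add(server.namespace);

    for (const toolName of server.toolNames) {
      const namespaced = `${server.namespace}__${toolName}`;
      if (staticKeys.has(namespaced)) {
        throw new Error(`Tool name collides with a static handler: ${namespaced}`);
      }
    }
  }
}
```

Running checks like these once, before any events stream, is what lets conflicts surface at attach time rather than mid-stream.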
|
|
194
|
+
|
|
195
|
+
## Inline vs Computer
|
|
196
|
+
|
|
197
|
+
Both inline MCP and the `Computer` integration register tools that flow through `dynamicToolSchemas`. Pick based on where the tool process runs:
|
|
198
|
+
|
|
199
|
+
| Tool surface | Process location | Best for |
|
|
200
|
+
| ------------ | -------------------------------- | ------------------------------------------------------------------------------------------------------------- |
|
|
201
|
+
| Inline MCP | Your server (in-process closure) | Third-party APIs, internal microservices, anything where credentials live in your backend. |
|
|
202
|
+
| Computer | The agent's machine (STDIO MCP) | Browser automation, filesystem, shell - device-local capabilities. See [Computer](/docs/server-sdk/computer). |
|
|
203
|
+
|
|
204
|
+
The two can coexist on the same session.
|
|
@@ -285,11 +285,17 @@ The `file` type is a built-in type representing uploaded files. Use `file[]` for
|
|
|
285
285
|
|
|
286
286
|
## Supported File Types
|
|
287
287
|
|
|
288
|
-
| Type
|
|
289
|
-
|
|
|
290
|
-
| Images
|
|
291
|
-
| Video
|
|
292
|
-
| Documents
|
|
288
|
+
| Type | Media Types |
|
|
289
|
+
| ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
290
|
+
| Images | `image/jpeg`, `image/png`, `image/gif`, `image/webp` |
|
|
291
|
+
| Video | `video/mp4`, `video/webm`, `video/quicktime`, `video/mpeg` |
|
|
292
|
+
| Documents | `application/pdf`, `text/plain`, `text/markdown`, `text/csv`, `application/json` |
|
|
293
|
+
| Office documents | `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (`.docx`), `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (`.xlsx`), `application/vnd.openxmlformats-officedocument.presentationml.presentation` (`.pptx`), `application/msword` (`.doc`), `application/vnd.ms-excel` (`.xls`), `application/vnd.ms-powerpoint` (`.ppt`) |
|
|
294
|
+
|
|
295
|
+
Images, video, PDFs, and text-based formats are sent directly to the model as
|
|
296
|
+
file parts. Office documents are not natively readable by LLM providers, so
|
|
297
|
+
they are surfaced to the agent as presigned download URLs - the agent fetches
|
|
298
|
+
and parses them with code or skills (e.g. via a sandboxed computer).
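The split described above can be sketched as a lookup on the media type. The media types are copied from the table. `deliveryFor` and the `Delivery` labels are hypothetical names, and treating `application/json` as a text-based format is an assumption based on its place in the Documents row.

```typescript
type Delivery = 'file-part' | 'presigned-url' | 'unsupported';

// Sent directly to the model as file parts.
const DIRECT_MEDIA_TYPES = new Set<string>([
  'image/jpeg', 'image/png', 'image/gif', 'image/webp',
  'video/mp4', 'video/webm', 'video/quicktime', 'video/mpeg',
  'application/pdf', 'text/plain', 'text/markdown', 'text/csv',
  'application/json', // assumption: counted as text-based
]);

// Surfaced to the agent as presigned download URLs.
const OFFICE_MEDIA_TYPES = new Set<string>([
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  'application/vnd.openxmlformats-officedocument.presentationml.presentation',
  'application/msword',
  'application/vnd.ms-excel',
  'application/vnd.ms-powerpoint',
]);

// Hypothetical helper: ignores parameters such as "; charset=utf-8" and letter case.
function deliveryFor(mediaType: string): Delivery {
  const base = (mediaType.split(';')[0] ?? '').trim().toLowerCase();
  if (DIRECT_MEDIA_TYPES.has(base)) return 'file-part';
  if (OFFICE_MEDIA_TYPES.has(base)) return 'presigned-url';
  return 'unsupported';
}
```

A client could use a check like this before upload to decide whether a file will be visible to the model directly or only through agent-side parsing.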
|
|
293
299
|
|
|
294
300
|
## File Limits
|
|
295
301
|
|
|
@@ -5,7 +5,7 @@ description: Using Octavus skills for code execution and specialized capabilitie
|
|
|
5
5
|
|
|
6
6
|
# Skills
|
|
7
7
|
|
|
8
|
-
Skills are knowledge packages that enable agents to execute code and generate files
|
|
8
|
+
Skills are knowledge packages that enable agents to execute code and generate files. Unlike external tools (which you implement in your backend), skills are self-contained packages with documentation and scripts. By default, skills run in isolated sandbox environments, but they can also run directly on the agent's computer.
|
|
9
9
|
|
|
10
10
|
## Overview
|
|
11
11
|
|
|
@@ -15,8 +15,8 @@ Octavus Skills provide **provider-agnostic** code execution. They work with any
|
|
|
15
15
|
|
|
16
16
|
1. **Skill Definition**: Skills are defined in the protocol's `skills:` section
|
|
17
17
|
2. **Skill Resolution**: Skills are resolved from available sources (see below)
|
|
18
|
-
3. **
|
|
19
|
-
4. **File Generation**: Files saved to `/output/` are automatically captured and made available for download
|
|
18
|
+
3. **Execution**: Code runs in an isolated sandbox (default) or on the agent's computer
|
|
19
|
+
4. **File Generation**: Files saved to `/output/` are automatically captured and made available for download (sandbox skills only)
|
|
20
20
|
|
|
21
21
|
### Skill Sources
|
|
22
22
|
|
|
@@ -49,6 +49,7 @@ skills:
|
|
|
49
49
|
| ------------- | -------- | ------------------------------------------------------------------------------------- |
|
|
50
50
|
| `display` | No | How to show in UI: `hidden`, `name`, `description`, `stream` (default: `description`) |
|
|
51
51
|
| `description` | No | Custom description shown to users (overrides skill's built-in description) |
|
|
52
|
+
| `execution` | No | Where the skill runs: `sandbox` (default) or `device` |
|
|
52
53
|
|
|
53
54
|
### Display Modes
|
|
54
55
|
|
|
@@ -107,19 +108,66 @@ This also works for named threads in interactive agents, allowing different thre
|
|
|
107
108
|
|
|
108
109
|
When skills are enabled, the LLM has access to these tools:
|
|
109
110
|
|
|
110
|
-
| Tool
|
|
111
|
-
|
|
|
112
|
-
| `octavus_skill_read`
|
|
113
|
-
| `octavus_skill_list`
|
|
114
|
-
| `octavus_skill_run`
|
|
115
|
-
| `
|
|
116
|
-
| `
|
|
117
|
-
| `
|
|
111
|
+
| Tool | Purpose | Availability |
|
|
112
|
+
| --------------------- | ----------------------------------------------- | ------------------------------ |
|
|
113
|
+
| `octavus_skill_read` | Read skill documentation (SKILL.md) | All skills |
|
|
114
|
+
| `octavus_skill_list` | List available scripts in a skill | All skills |
|
|
115
|
+
| `octavus_skill_run` | Execute a pre-built script from a skill | All skills |
|
|
116
|
+
| `octavus_skill_setup` | Install a skill on the device for file browsing | Device skills only |
|
|
117
|
+
| `octavus_code_run` | Execute arbitrary Python/Bash code | Sandbox skills (standard) only |
|
|
118
|
+
| `octavus_file_write` | Create files in the sandbox | Sandbox skills (standard) only |
|
|
119
|
+
| `octavus_file_read` | Read files from the sandbox | Sandbox skills (standard) only |
|
|
118
120
|
|
|
119
121
|
The LLM learns about available skills through system prompt injection and can use these tools to interact with skills.
|
|
120
122
|
|
|
121
123
|
Skills that have [secrets](#skill-secrets) configured run in **secure mode**, where only `octavus_skill_read`, `octavus_skill_list`, and `octavus_skill_run` are available. See [Skill Secrets](#skill-secrets) below.
|
|
122
124
|
|
|
125
|
+
## Device Execution
|
|
126
|
+
|
|
127
|
+
By default, skills run in an isolated sandbox. When `execution: device` is set, the skill runs on the agent's computer (VM or desktop) instead.
|
|
128
|
+
|
|
129
|
+
```yaml
|
|
130
|
+
skills:
|
|
131
|
+
deploy-tool:
|
|
132
|
+
display: description
|
|
133
|
+
description: Deploy applications to production
|
|
134
|
+
execution: device
|
|
135
|
+
qr-code:
|
|
136
|
+
display: description
|
|
137
|
+
description: Generating QR codes
|
|
138
|
+
# execution defaults to sandbox
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
### How Device Skills Work
|
|
142
|
+
|
|
143
|
+
Device skills are installed on the agent's computer so the agent can browse their files and run their scripts directly. After attaching a skill via integrations, the agent uses `octavus_skill_setup` to install it on the device. Once installed, the agent can:
|
|
144
|
+
|
|
145
|
+
- Read the skill's documentation with `octavus_skill_read`
|
|
146
|
+
- List available scripts with `octavus_skill_list`
|
|
147
|
+
- Run pre-built scripts with `octavus_skill_run`
|
|
148
|
+
|
|
149
|
+
The generic workspace tools (`octavus_code_run`, `octavus_file_write`, `octavus_file_read`) are **not available** for device skills. Instead, the agent uses the device's own shell and filesystem MCP servers to interact with files and run commands.
|
|
150
|
+
|
|
151
|
+
### Sandbox vs Device Skills
|
|
152
|
+
|
|
153
|
+
| Aspect | Sandbox (default) | Device |
|
|
154
|
+
| ------------------- | ---------------------------------- | ------------------------------------------------------ |
|
|
155
|
+
| **Environment** | Isolated sandbox | Agent's computer (VM or desktop) |
|
|
156
|
+
| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
|
|
157
|
+
| **File access** | Via `octavus_file_read/write` | Via device filesystem MCP |
|
|
158
|
+
| **Code execution** | Via `octavus_code_run` | Via device shell MCP |
|
|
159
|
+
| **Isolation** | Fully sandboxed | Runs alongside other device processes |
|
|
160
|
+
| **File output** | `/output/` directory auto-captured | Files written to device filesystem |
|
|
161
|
+
|
|
162
|
+
### When to Use Device Execution
|
|
163
|
+
|
|
164
|
+
Use `execution: device` when the skill needs to:
|
|
165
|
+
|
|
166
|
+
- Access the agent's local filesystem or running processes
|
|
167
|
+
- Use tools or CLIs installed on the device
|
|
168
|
+
- Interact with services running on the device
|
|
169
|
+
- Persist files beyond a single execution cycle
|
|
170
|
+
|
|
123
171
|
## Example: QR Code Generation
|
|
124
172
|
|
|
125
173
|
```yaml
|
|
@@ -297,14 +345,14 @@ skills:
|
|
|
297
345
|
|
|
298
346
|
## Comparison: Skills vs Tools vs Provider Options
|
|
299
347
|
|
|
300
|
-
| Feature | Octavus Skills
|
|
301
|
-
| ------------------ |
|
|
302
|
-
| **Execution** |
|
|
303
|
-
| **Provider** | Any (agnostic)
|
|
304
|
-
| **Code Execution** | Yes
|
|
305
|
-
| **File Output** | Yes
|
|
306
|
-
| **Implementation** | Skill packages
|
|
307
|
-
| **Cost** | Sandbox + LLM API
|
|
348
|
+
| Feature | Octavus Skills | External Tools | Provider Tools/Skills |
|
|
349
|
+
| ------------------ | --------------------------- | ------------------- | --------------------- |
|
|
350
|
+
| **Execution** | Sandbox or agent's computer | Your backend | Provider servers |
|
|
351
|
+
| **Provider** | Any (agnostic) | N/A | Provider-specific |
|
|
352
|
+
| **Code Execution** | Yes | No | Yes (provider tools) |
|
|
353
|
+
| **File Output** | Yes | No | Yes (provider skills) |
|
|
354
|
+
| **Implementation** | Skill packages | Your code | Built-in |
|
|
355
|
+
| **Cost** | Sandbox + LLM API | Your infrastructure | Included in API |
|
|
308
356
|
|
|
309
357
|
## Uploading Custom Skills
|
|
310
358
|
|
|
@@ -343,9 +391,21 @@ agent:
|
|
|
343
391
|
skills: [my-skill]
|
|
344
392
|
```
|
|
345
393
|
|
|
394
|
+
## On-Demand Skills
|
|
395
|
+
|
|
396
|
+
On-demand skills (`onDemandSkills`) also support the `execution` field:
|
|
397
|
+
|
|
398
|
+
```yaml
|
|
399
|
+
onDemandSkills:
|
|
400
|
+
display: description
|
|
401
|
+
execution: device
|
|
402
|
+
```
|
|
403
|
+
|
|
404
|
+
When `execution: device` is set on the on-demand skills declaration, any skill attached at runtime via integrations runs on the agent's computer instead of in a sandbox.
|
|
405
|
+
|
|
346
406
|
## Sandbox Timeout
|
|
347
407
|
|
|
348
|
-
The default sandbox timeout is 5 minutes. You can configure a custom timeout using `sandboxTimeout` in the agent config or on individual `start-thread` blocks:
|
|
408
|
+
The default sandbox timeout is 5 minutes (applies to sandbox skills only). You can configure a custom timeout using `sandboxTimeout` in the agent config or on individual `start-thread` blocks:
|
|
349
409
|
|
|
350
410
|
```yaml
|
|
351
411
|
# Agent-level timeout (applies to main thread)
|
|
@@ -436,7 +496,7 @@ For standard skills (without secrets), scripts receive input as CLI arguments. F
|
|
|
436
496
|
|
|
437
497
|
## Security
|
|
438
498
|
|
|
439
|
-
|
|
499
|
+
Sandbox skills run in isolated environments:
|
|
440
500
|
|
|
441
501
|
- **No network access** (unless explicitly configured)
|
|
442
502
|
- **No persistent storage** (sandbox destroyed after each `next-message` execution)
|
|
@@ -444,6 +504,8 @@ Skills run in isolated sandbox environments:
|
|
|
444
504
|
- **Time limits** enforced (5-minute default, configurable via `sandboxTimeout`)
|
|
445
505
|
- **Secret redaction** - output from secure skills is automatically scanned for secret values
|
|
446
506
|
|
|
507
|
+
Device skills run on the agent's computer and share its environment. They do not have sandbox isolation, but tool access is restricted: only the slug-bearing skill tools (`octavus_skill_read`, `octavus_skill_list`, `octavus_skill_run`, `octavus_skill_setup`) are available.
|
|
508
|
+
|
|
447
509
|
## Next Steps
|
|
448
510
|
|
|
449
511
|
- [Agent Config](/docs/protocol/agent-config) - Configuring skills in agent settings
|
|
@@ -66,6 +66,24 @@ steps:
|
|
|
66
66
|
maxSteps: 10
|
|
67
67
|
```
|
|
68
68
|
|
|
69
|
+
### Execution Mode
|
|
70
|
+
|
|
71
|
+
The `execution` field is set at the skill definition level and applies to all threads that use the skill:
|
|
72
|
+
|
|
73
|
+
```yaml
|
|
74
|
+
skills:
|
|
75
|
+
deploy-tool:
|
|
76
|
+
display: description
|
|
77
|
+
description: Deploy applications
|
|
78
|
+
execution: device # All threads using this skill run it on the device
|
|
79
|
+
qr-code:
|
|
80
|
+
display: description
|
|
81
|
+
description: Generating QR codes
|
|
82
|
+
# Defaults to sandbox execution
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
You don't set `execution` per-thread - a skill's execution mode is consistent wherever it's used.
|
|
86
|
+
|
|
69
87
|
### Match Skills to Use Cases
|
|
70
88
|
|
|
71
89
|
Different threads can have different skills. Define all skills at the protocol level, then scope them to each thread:
|
|
@@ -311,15 +329,15 @@ Pattern:
|
|
|
311
329
|
|
|
312
330
|
When a skill declares secrets and an organization configures them, the skill runs in secure mode with its own isolated sandbox.
|
|
313
331
|
|
|
314
|
-
### Standard vs Secure Skills
|
|
332
|
+
### Standard vs Secure vs Device Skills
|
|
315
333
|
|
|
316
|
-
| Aspect | Standard Skills
|
|
317
|
-
| ------------------- |
|
|
318
|
-
| **
|
|
319
|
-
| **Available tools** | All 6 skill tools
|
|
320
|
-
| **Script input** | CLI arguments via `args`
|
|
321
|
-
| **
|
|
322
|
-
| **Output** | Raw stdout/stderr
|
|
334
|
+
| Aspect | Standard Skills | Secure Skills | Device Skills |
|
|
335
|
+
| ------------------- | ------------------------ | --------------------------------------------------- | ------------------------------------------------------ |
|
|
336
|
+
| **Environment** | Shared sandbox | Isolated sandbox (one per skill) | Agent's computer (VM or desktop) |
|
|
337
|
+
| **Available tools** | All 6 skill tools | `skill_read`, `skill_list`, `skill_run` only | `skill_read`, `skill_list`, `skill_run`, `skill_setup` |
|
|
338
|
+
| **Script input** | CLI arguments via `args` | JSON via stdin (use `input` parameter) | CLI arguments via `args` |
|
|
339
|
+
| **Secrets** | No secrets | Secrets as env vars | No secrets |
|
|
340
|
+
| **Output** | Raw stdout/stderr | Redacted (secret values replaced with `[REDACTED]`) | Raw stdout/stderr |
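
The "Available tools" row can be restated as a small lookup. Tool names come from the skills documentation in this diff. `SkillMode` and `availableSkillTools` are hypothetical names used only for illustration.

```typescript
type SkillMode = 'standard' | 'secure' | 'device';

// Tools that every skill mode exposes.
const BASE_SKILL_TOOLS = ['octavus_skill_read', 'octavus_skill_list', 'octavus_skill_run'];

// Hypothetical helper mirroring the "Available tools" row of the table above.
function availableSkillTools(mode: SkillMode): string[] {
  switch (mode) {
    case 'standard':
      // Sandbox skills without secrets: the base tools plus the workspace tools.
      return [...BASE_SKILL_TOOLS, 'octavus_code_run', 'octavus_file_write', 'octavus_file_read'];
    case 'secure':
      // Skills with secrets configured: base tools only.
      return [...BASE_SKILL_TOOLS];
    case 'device':
      // Device skills: base tools plus the setup tool.
      return [...BASE_SKILL_TOOLS, 'octavus_skill_setup'];
  }
}
```

The "All 6 skill tools" wording for standard skills corresponds to the six names returned for `'standard'`. `octavus_skill_setup` is the seventh tool and is exclusive to device skills.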
|
|
323
341
|
|
|
324
342
|
### Writing Scripts for Secure Skills
|
|
325
343
|
|
|
@@ -416,7 +416,9 @@ steps:
|
|
|
416
416
|
maxSteps: 10
|
|
417
417
|
```
|
|
418
418
|
|
|
419
|
-
Workers define their own skills independently
|
|
419
|
+
Workers define their own skills independently - they don't inherit skills from a parent interactive agent. Each thread gets its own sandbox scoped to only its listed skills.
|
|
420
|
+
|
|
421
|
+
Skills with `execution: device` work the same way in workers as in interactive agents - the skill runs on the agent's computer. Workers resolve their device execution independently, so a worker can use device skills even if the parent agent does not.
|
|
420
422
|
|
|
421
423
|
See [Skills](/docs/protocol/skills) for full documentation.
|
|
422
424
|
|