demian-cli 1.0.3
- package/README.md +374 -0
- package/architecture-tui.md +468 -0
- package/architecture.md +2773 -0
- package/bin/demian +12 -0
- package/bin/demian-cli.js +12 -0
- package/bin/demian-plain.js +12 -0
- package/bin/demian.js +12 -0
- package/dist/cli.mjs +30346 -0
- package/dist/index.mjs +28826 -0
- package/dist/tui.mjs +33873 -0
- package/dist/vscode-worker.mjs +32479 -0
- package/docs/ko/README.md +158 -0
- package/media/demian.svg +9 -0
- package/package.json +67 -0
- package/tsconfig.json +17 -0
package/architecture.md
ADDED
|
@@ -0,0 +1,2773 @@
# demian Architecture

> Status: canonical integrated architecture with implemented v1 multi-agent core
> Package name: `demian`
> Package path: `demian`
> Inputs: `nodejs/architecture-by-claude.md`, `nodejs/architecture-by-codex.md`, `claude/demian`, `codex/demian`, `.documents/multi-agent-architecture.md`, `.documents/add-codex-provider-by-codex.md`, `.documents/add-claudecode-provider-by-claude.md`, `.documents/add-claudecode-provider-by-codex.md`, `.documents/efficient-context.md`

`demian` is a small local coding-agent runtime. The model is replaceable; the runtime owns local execution, policy, observability, and safety boundaries.

The integrated design takes the strongest parts of both source designs:

- From the Claude design: philosophy, decision history, provider rationale, tradeoff tables, command hook protocol, detailed tool safety notes.
- From the Codex design: executable runtime shape, Node-first CLI, streaming, multimodal input, sandbox modes, persistent grants, network-free tests.

Where the earlier designs disagree, this document chooses the path that best fits the new integrated package. In particular, `streaming`, `multimodal`, `sandbox`, and `persistentGrants` are included in the integrated v0 because they are already represented in `codex/demian`.

This revision also folds in the target multi-agent design. The important design boundary is that demian does not become an autonomous swarm framework. It becomes an opt-in, root-owned delegation runtime: the root session owns authority, the main agent talks to the user, and sub agents run as bounded child invocations under the same permission, transcript, sandbox, and cancellation services.

---

## 1. Identity

`demian` is not:

- an IDE
- an LLM gateway
- a generic agent framework
- a general autonomous multi-agent orchestration platform
- a LangChain or Vercel AI SDK wrapper

`demian` is:

- a local-first coding-agent runtime
- a provider-neutral session loop
- a tool execution boundary
- a hook and permission policy layer
- an opt-in root-owned delegation runtime
- an observable CLI package

Canonical single-agent runtime flow:

```text
user prompt
  -> config + agent resolution
  -> provider request
  -> model response
  -> optional tool call
      -> hooks
      -> permission
      -> local tool execution
      -> tool result
  -> provider request
  -> final answer
```
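
The single-agent flow above can be read as a loop. The following is a minimal sketch under stated assumptions, not demian's `SessionRunner`: `LoopDeps`, `runTurn`, and the synchronous signatures are hypothetical stand-ins (the real `Provider.chat()` returns a Promise), and hook and permission outcomes are reduced to two values each.

```ts
// Simplified stand-ins for the message and tool-call types defined in section 6.
type ToolCall = { id: string; name: string; input: unknown }
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; name: string; content: string; isError?: boolean }

// Hypothetical dependency bundle; synchronous for brevity.
interface LoopDeps {
  chat(messages: Message[]): { content: string | null; toolCalls: ToolCall[] }
  preToolUse(call: ToolCall): "allow" | "block" // hooks
  permit(call: ToolCall): "allow" | "deny" // permission engine
  execute(call: ToolCall): string // local tool execution
}

function runTurn(prompt: string, deps: LoopDeps, maxTurns = 8): Message[] {
  const history: Message[] = [{ role: "user", content: prompt }]
  for (let turn = 0; turn < maxTurns; turn++) {
    const res = deps.chat(history) // provider request -> model response
    history.push({ role: "assistant", content: res.content, toolCalls: res.toolCalls })
    if (res.toolCalls.length === 0) return history // final answer
    for (const call of res.toolCalls) {
      // Gate order follows the flow: hooks first, then permission, then execution.
      const blocked = deps.preToolUse(call) === "block" || deps.permit(call) === "deny"
      history.push({
        role: "tool",
        toolCallId: call.id,
        name: call.name,
        content: blocked ? "blocked by policy" : deps.execute(call),
        isError: blocked,
      })
    }
  }
  throw new Error("turn budget exhausted")
}
```

A blocked call still produces a `tool` message, so the model sees the refusal on the next provider request instead of a silent gap in the history.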

Canonical multi-agent runtime flow:

```text
root session owns authority
  -> main agent talks to the user
  -> main agent may call delegate_agent
      -> child AgentInvocation runs a bounded AgentSession
      -> child tool and agent permissions route to the root session
      -> compact child result returns to main as a tool result
  -> final main answer
```

The two technical horns remain:

- **Hooks**: lifecycle observers and policy gates.
- **Tools**: the only capabilities that can touch the workspace or process execution.

Agents add a third runtime concept, but not a third side-effect boundary. An agent is a **policy capsule**: provider profile reference, system prompt, visible tools, permission defaults, delegation policy, and catalog metadata. Sub agents are invoked through a stable `delegate_agent` tool, but the invocation itself is session-backed because it has model turns, tool calls, permissions, transcript events, and bounded memory.

The model requests work. Hooks and permissions decide whether local work may happen. Tools execute it. Agents organize model behavior. Events and transcripts remember the full chronology.

---

## 2. Design Principles

| Principle | Meaning |
|-----------|---------|
| Local-first execution | Provider calls may be remote, but file and process execution are local and bounded to `cwd`. |
| OpenAI-compatible first | OpenAI, Gemini, Ollama, LM Studio, vLLM, llama.cpp, OpenRouter, Together, Groq, and Azure OpenAI share one provider path. Provider-native exceptions such as Anthropic, Codex, and Claude Code stay isolated adapters. |
| OpenAI-shaped internal messages | Internal history uses `system`, `user`, `assistant`, and `tool` messages. Anthropic converts at the adapter boundary. |
| Explicit side-effect boundary | Every side-effecting action passes through hook dispatch and permission evaluation. |
| Hard/global deny dominates | Built-in hard deny and global explicit deny override session grants, persistent grants, and `--yes`. Agent-local deny can be overridden only by explicit root-user approval. |
| Root owns authority | Permissions, grants, prompts, transcript, cancellation, cwd, and sandbox policy belong to the root session, not to individual agents. |
| Agents are policy capsules | An agent definition includes prompt, provider profile reference, visible tools, defaults, delegation rules, and catalog metadata. |
| Visibility is not permission | A tool can be installed without being visible to an agent, and a grant can exist without making a hidden tool callable by that agent. |
| Delegation is tool-entry, session-backed | The main model sees a `delegate_agent` tool, while the runtime creates a child AgentSession with its own bounded context and memory. |
| Observable runtime | Events, transcript JSONL, tool previews, retry events, and permission events are first-class. |
| Single-agent remains first-class | Multi-agent mode is opt-in and must not change the default single-agent UX or safety model. |
| Small core | Native Gemini, Vertex AI, MCP, plugins, background agents, and marketplace features are extensions, not core dependencies. |

Measurable architecture property:

```text
Adding a new OpenAI-compatible provider must require zero changes to Session Runner,
tools, hooks, permissions, messages, and transcript code.
```

Gemini is the first proof of this property.
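
One way to picture this property is to treat each OpenAI-compatible provider as a data entry consumed by a single request builder. This is only a sketch: `OpenAICompatibleProfile`, `profiles`, `buildChatRequest`, the field names, the environment variable names, and the model names are illustrative assumptions, not demian's config schema. The Gemini entry uses Google's OpenAI-compatibility base URL and the Ollama entry uses its default local `/v1` endpoint.

```ts
// Hypothetical profile shape; field names are assumptions for illustration.
interface OpenAICompatibleProfile {
  baseURL: string
  apiKeyEnv?: string // name of the env var holding the key, if any
  model: string
}

const profiles: Record<string, OpenAICompatibleProfile> = {
  openai: { baseURL: "https://api.openai.com/v1", apiKeyEnv: "OPENAI_API_KEY", model: "gpt-4o-mini" },
  gemini: {
    baseURL: "https://generativelanguage.googleapis.com/v1beta/openai",
    apiKeyEnv: "GEMINI_API_KEY",
    model: "gemini-2.0-flash",
  },
  ollama: { baseURL: "http://localhost:11434/v1", model: "llama3.1" },
}

// One code path for every profile: only the data differs between providers.
function buildChatRequest(
  profileName: string,
  env: Record<string, string | undefined>,
  body: object,
): { url: string; headers: Record<string, string>; body: string } {
  const profile = profiles[profileName]
  if (!profile) throw new Error(`unknown provider profile: ${profileName}`)
  const headers: Record<string, string> = { "content-type": "application/json" }
  const key = profile.apiKeyEnv ? env[profile.apiKeyEnv] : undefined
  if (key) headers["authorization"] = `Bearer ${key}`
  return {
    url: `${profile.baseURL.replace(/\/+$/, "")}/chat/completions`,
    headers,
    body: JSON.stringify({ model: profile.model, ...body }),
  }
}
```

Under this sketch, adding another compatible provider means adding one entry to `profiles`; the builder and everything downstream of it stay untouched.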

---

## 3. Scope

### v0 Includes

- CLI package named `demian`
- Node 22+ runtime with TypeScript type-stripping support
- Bun-compatible package shape where practical
- OpenAI-compatible provider with `chat()` and optional `stream()`
- Codex provider using local Codex ChatGPT login and Responses API
- Claude Code external runtime using Claude Agent SDK by default, with explicit CLI fallback
- Gemini through the OpenAI-compatible endpoint
- Anthropic adapter
- OpenAI-shaped message history
- Six built-in tools
- Hook lifecycle plus command hooks
- Four built-in safety hooks
- Agent registry with `build` and `plan`
- Permission engine with session grants
- Persistent grants with TTL
- Bash sandbox modes: `off`, `read-only`, `workspace-write`
- Multimodal user input through repeated `--image`
- Runtime events and JSONL transcripts
- Network-free tests

### Implemented v1 Multi-Agent Core

- Mode switch: `single-agent` by default, `multi-agent` by config or flag
- Extensible `ToolRegistry` and `AgentRegistry`
- `AgentDefinition` as a policy capsule
- Config and programmatic agent registration
- Provider profile references per agent, with provider configs still root-owned
- RootSession above SessionRunner in multi-agent mode
- Shared root grants for main and child invocations
- Stable `delegate_agent` tool exposed only in multi-agent mode
- Child AgentSessionRunner with bounded context, compressed memory, and synchronous v1 execution
- Structured handoff packets instead of full main-history sharing
- Compact child results returned to main plus full transcript/artifact references
- Parent-child invocation events in one root transcript
- Context budget knobs for main and sub agents

### Deferred

| Deferred item | Reason | Future extension |
|---------------|--------|------------------|
| Native Gemini SDK | OpenAI-compatible path is sufficient for coding-agent v0 | `providers/gemini.ts` |
| Vertex AI | Different auth and enterprise flow | `providers/vertex.ts` |
| MCP | Core loop should stabilize first | MCP-backed Tool wrapper |
| Plugin marketplace | Requires trust and manifest design | Plugin loader |
| Filesystem agent discovery | Markdown loader and trust UX should land with the trust store | `agents/*.md` loader |
| External tool module loading | Loading JS/MJS is arbitrary local code execution | Trust-gated tool loader |
| Trust persistence | Should be introduced with filesystem discovery | `.demian/trust.json` |
| Background subagents | Permission ordering, cancellation, and transcript UX need the synchronous model first | Background invocation queue |
| Autonomous swarm routing | demian is a bounded local coding runtime, not a general multi-agent framework | Explicit router feature if proven necessary |
| Parallel tool calls | Deterministic ordering matters first | Read-only parallel lane |
| Full secret redaction | Cannot be guaranteed with regex only | Redaction policy registry |
| Provider raw advanced features | Can weaken provider portability | Explicit `ProviderCapabilities` |
| Per-agent secrets | Provider credentials should remain config/provider-resolver owned | Secret-scoped provider profiles |

---

## 4. System Architecture

```text
┌──────────────────────────────────────────────────────────────┐
│                             CLI                              │
│   flags / config / provider resolution / permission prompt   │
└──────────────────────────────┬───────────────────────────────┘
                               v
┌──────────────────────────────────────────────────────────────┐
│                        Session Runner                        │
│  OpenAI-shaped history / turns / streaming / tool lifecycle  │
└──────┬──────────────┬───────────────┬───────────────┬────────┘
       v              v               v               v
 ┌────────────┐ ┌────────────┐ ┌─────────────┐ ┌──────────────┐
 │  Provider  │ │    Tool    │ │    Hook     │ │  Permission  │
 │  + retry   │ │  Registry  │ │ Dispatcher  │ │    Engine    │
 └─────┬──────┘ └─────┬──────┘ └──────┬──────┘ └──────┬───────┘
       v              v               v               v
  [LLM APIs]     [Local OS]    [Config/hooks]  [Agent policy]

OpenAIProvider
  -> OpenAI
  -> Gemini OpenAI-compatible
  -> Ollama / LM Studio / vLLM / llama.cpp
  -> OpenRouter / Together / Groq / Azure OpenAI

AnthropicProvider
  -> Anthropic API

CodexProvider
  -> local Codex auth store
  -> ChatGPT-backed Codex Responses API

Claude Code external runtime
  -> Claude Agent SDK by default
  -> explicit claude -p CLI fallback
  -> local Claude Code login / Agent SDK plan auth
```

The Session Runner is the kernel. Provider, tools, hooks, permissions, sandbox, and transcript are replaceable services around it.

In multi-agent mode, RootSession becomes the outer lifetime:

```text
UI process
  -> RootSession
       owns provider registry, agent registry, tool registry
       owns permission coordinator, grants, event bus, transcript writer
       owns root abort signal, cwd, sandbox policy, context budgets

       -> AgentInvocation(main)
            -> SessionRunner(main)
                 -> direct tool call
                 -> delegate_agent
                      -> AgentInvocation(child)
                           -> AgentSessionRunner(child)
                                -> bounded child model context
                                -> child tool calls through same hooks/permissions
                                -> child memory compression
                                -> child final answer
                      -> compact child result as tool message
                 -> final main answer
```

This chooses a child session model instead of a simple inner loop inside the main SessionRunner. The gain is that a reviewer, builder, or researcher can preserve bounded working memory across repeated delegations, can use a different provider profile, and can have its own prompt and turn budget. The cost is extra lifecycle, transcript, and compression machinery. That cost is accepted because agent memory and provider identity are real runtime state, not just prompt text.

---

## 5. Directory Layout

```text
nodejs/
  package.json
  tsconfig.json
  README.md
  architecture.md
  architecture-by-claude.md
  architecture-by-codex.md
  bin/
    demian.js
    demian-cli.js
    demian-plain.js
  src/
    index.ts
    cli.ts
    tui.ts
    config.ts
    session.ts
    root-session.ts
    events.ts
    transcript.ts
    messages.ts
    multimodal.ts
    id.ts
    util.ts

    providers/
      types.ts
      retry.ts
      openai.ts
      anthropic.ts
      codex.ts
      codex-auth.ts
      codex-state.ts
      codex-stream.ts
      claudecode.ts
      claudecode-auth.ts
      claudecode-stream.ts

    tools/
      types.ts
      validation.ts
      registry.ts
      output.ts
      read-file.ts
      write-file.ts
      edit-file.ts
      bash.ts
      grep.ts
      glob.ts
      delegate-agent.ts

    hooks/
      types.ts
      dispatcher.ts
      command.ts
      builtin/
        index.ts
        block-dangerous-bash.ts
        protect-env-files.ts
        mask-secrets.ts
        inject-env-info.ts

    permissions/
      types.ts
      engine.ts
      grants.ts
      persistent-grants.ts
      prompt.ts

    agents/
      types.ts
      registry.ts
      build.ts
      plan.ts
      prompts/
        build.txt
        plan.txt

    workspace/
      paths.ts
      diff.ts

    sandbox/
      types.ts
      index.ts
      macos.ts
      linux.ts
      env-only.ts

    ui/
      settings.ts
      markdown/
        render.ts
      plain/
        interactive.ts
      tui/
        app.ts
        controller.ts
        store.ts

  test/
    provider.test.ts
    session.test.ts
    tools.test.ts
    permission.test.ts
    persistent-grants.test.ts
    multimodal.test.ts
    hooks.test.ts
    sandbox.test.ts
    multi-agent.test.ts
```

### Registry Openness

Addable units:

```text
provider config
  -> only from demian config

tool definition
  -> built-in
  -> programmatic registration
  -> user/project filesystem scopes later behind trust

agent definition
  -> built-in
  -> config.agents
  -> programmatic registration
  -> user/project markdown scopes later behind trust
```

Provider is intentionally not an agent-provided addable unit. Endpoints, credentials, auth headers, and quirks stay in config. This choice gives up the convenience of self-contained agent files, but it keeps data routing and cost visible to the user.

Registry APIs:

```ts
// Signatures only; `declare` keeps the body-less methods valid TypeScript.
export declare class ToolRegistry {
  register(tool: ToolDefinition, source?: RegistrySource): void
  get(name: string): ToolDefinition | undefined
  list(): ToolDefinition[]
  filter(names: string[]): ToolRegistry
}

export declare class AgentRegistry {
  register(agent: AgentDefinition, source?: RegistrySource): void
  get(name: string): AgentDefinition
  list(filter?: AgentListFilter): AgentDefinition[]
  primary(): AgentDefinition[]
  callable(): AgentDefinition[]
  catalog(): AgentCatalogEntry[]
}

export interface RegistrySource {
  scope: "builtin" | "user" | "project" | "programmatic"
  path?: string
  trusted: boolean
}
```

Collision policy:

```text
built-in name wins
  -> user/project/programmatic contribution with same name is rejected
  -> warning event is emitted
```

Automatic discovery must not silently change built-in behavior. Explicit override can be added later as a separate config field if needed.
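
The collision policy can be sketched as a small generic registry. This is a minimal illustration, not demian's `ToolRegistry` or `AgentRegistry`: the `Registry` class, its `warnings` array (a stand-in for emitting a warning event), and the boolean return value are assumptions. The text does not say how two non-built-in contributions with the same name are resolved, so the sketch simply lets the later one replace the earlier one.

```ts
type Scope = "builtin" | "user" | "project" | "programmatic"

interface Named {
  name: string
}

class Registry<T extends Named> {
  private items = new Map<string, { value: T; scope: Scope }>()
  // Stand-in for the runtime's warning event.
  readonly warnings: string[] = []

  /** Returns false when the contribution was rejected. */
  register(value: T, scope: Scope): boolean {
    const existing = this.items.get(value.name)
    if (existing?.scope === "builtin") {
      // Built-in name wins: reject the contribution and surface a warning.
      this.warnings.push(`rejected ${scope} "${value.name}": name collides with a built-in`)
      return false
    }
    // Assumption: collisions between non-built-in scopes are last-write-wins.
    this.items.set(value.name, { value, scope })
    return true
  }

  get(name: string): T | undefined {
    return this.items.get(name)?.value
  }
}
```

Returning a flag and recording a warning, instead of throwing, keeps a bad project-level contribution from aborting startup while still making the rejection observable.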

Migration guidance:

- Use `codex/demian/src/messages.ts`, `session.ts`, `cli.ts`, `config.ts`, and `providers/openai.ts` as primary implementation sources.
- Use `claude/demian/src/sandbox/` as the shape for a sandbox adapter directory.
- Preserve the Claude document's decision rationale and safety tables in this canonical architecture.
- Do not copy `node_modules`.

---

## 6. Core Types

### Provider

```ts
export interface Provider {
  id: "openai-compatible" | "anthropic" | string
  chat(req: ChatRequest): Promise<ChatResponse>
  stream?(req: ChatRequest): AsyncIterable<ChatStreamEvent>
}

export interface ChatRequest {
  messages: Message[]
  tools: Tool[]
  model: string
  maxTokens?: number
  temperature?: number
  signal: AbortSignal
}

export interface ChatResponse {
  message: AssistantMessage
  toolCalls: ToolCall[]
  stopReason: "end_turn" | "tool_use" | "max_tokens" | "content_filter" | "error"
  usage?: TokenUsage
  raw: unknown
}

export type ChatStreamEvent =
  | { type: "text_delta"; text: string; raw?: unknown }
  | { type: "tool_call_delta"; index: number; id?: string; name?: string; arguments?: string; raw?: unknown }
  | { type: "done"; response: ChatResponse }
```

Invariant:

```text
stopReason === "tool_use" iff toolCalls.length > 0
```

Provider safety blocks become `content_filter` when detectable. Unknown malformed responses become `error`.
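
The invariant and the two fallback rules can be expressed as one normalization step at the adapter boundary. The sketch below assumes an OpenAI-style `finish_reason` string (`stop`, `length`, `tool_calls`, `content_filter`); `normalizeStopReason` is a hypothetical helper, and the choice to report `tool_use` whenever at least one tool call was parsed, even if the provider reported truncation, is an assumption made to keep the invariant simple.

```ts
type StopReason = "end_turn" | "tool_use" | "max_tokens" | "content_filter" | "error"

// Maps a provider finish reason onto the internal union while keeping
// `stopReason === "tool_use"` exactly when tool calls were parsed.
function normalizeStopReason(finishReason: string | null | undefined, toolCallCount: number): StopReason {
  if (toolCallCount > 0) return "tool_use"
  switch (finishReason) {
    case "stop":
      return "end_turn"
    case "length":
      return "max_tokens"
    case "content_filter":
      return "content_filter"
    default:
      // Covers unknown values and "tool_calls" with no parsed calls.
      return "error"
  }
}
```

A response that claims `tool_calls` but carries no usable calls falls into `error`, which matches the rule that unknown malformed responses become `error`.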

### Message

```ts
export type Message =
  | SystemMessage
  | UserMessage
  | AssistantMessage
  | ToolMessage

export interface SystemMessage {
  role: "system"
  content: string
}

export interface UserMessage {
  role: "user"
  content: string | UserContentPart[]
}

export interface AssistantMessage {
  role: "assistant"
  content?: string | null
  toolCalls?: ToolCall[]
}

export interface ToolMessage {
  role: "tool"
  toolCallId: string
  name: string
  content: string
  isError?: boolean
}

export interface ToolCall {
  id: string
  name: string
  input: unknown
}

export type UserContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail?: "auto" | "low" | "high" } }
```

OpenAI-compatible providers receive this shape almost directly. Anthropic converts system messages, tool calls, tool results, and image parts inside `AnthropicProvider`.
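
To illustrate the adapter-boundary conversion, here is a rough sketch of mapping the internal history onto the Anthropic Messages API shape: system text moves to a top-level `system` field, tool calls become `tool_use` blocks, and tool results become `tool_result` blocks inside a user message. It is not the `AnthropicProvider` implementation. `toAnthropic` is a hypothetical helper, user content is limited to plain strings, and image parts are left out for brevity.

```ts
type ToolCall = { id: string; name: string; input: unknown }
type Message =
  | { role: "system"; content: string }
  | { role: "user"; content: string }
  | { role: "assistant"; content?: string | null; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; name: string; content: string; isError?: boolean }

type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean }
type AnthropicMessage = { role: "user" | "assistant"; content: Block[] }

function toAnthropic(history: Message[]): { system: string; messages: AnthropicMessage[] } {
  const system: string[] = []
  const messages: AnthropicMessage[] = []
  for (const m of history) {
    if (m.role === "system") {
      system.push(m.content) // system text is a top-level field, not a message
    } else if (m.role === "user") {
      messages.push({ role: "user", content: [{ type: "text", text: m.content }] })
    } else if (m.role === "assistant") {
      const blocks: Block[] = []
      if (m.content) blocks.push({ type: "text", text: m.content })
      for (const c of m.toolCalls ?? []) {
        blocks.push({ type: "tool_use", id: c.id, name: c.name, input: c.input })
      }
      messages.push({ role: "assistant", content: blocks })
    } else {
      const block: Block = {
        type: "tool_result",
        tool_use_id: m.toolCallId,
        content: m.content,
        is_error: m.isError ?? false,
      }
      // Group consecutive tool results into one user message.
      const last = messages[messages.length - 1]
      const canMerge = last?.role === "user" && last.content.every((b) => b.type === "tool_result")
      if (last && canMerge) last.content.push(block)
      else messages.push({ role: "user", content: [block] })
    }
  }
  return { system: system.join("\n\n"), messages }
}
```

Keeping the conversion in one function is what lets the rest of the runtime stay OpenAI-shaped: nothing outside the adapter needs to know that Anthropic has no `tool` role.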

### Tool

```ts
export interface Tool {
  name: string
  description: string
  inputSchema: JsonSchema
  execute(input: unknown, ctx: ToolContext): Promise<ToolResult>
}

export interface ToolDefinition extends Tool {
  metadata?: {
    sideEffect?: "none" | "workspace" | "process" | "network" | "external"
    defaultDecision?: "allow" | "ask" | "deny"
    trust?: "builtin" | "user" | "project" | "programmatic"
  }
}

export interface AgentInvocationInput {
  agent: string
  task: string
  context?: string
  contextRefs?: string[]
  relevantFiles?: string[]
  constraints?: string[]
  expectedOutput?: string
  maxTurns?: number
  returnMode?: "brief" | "normal"
}

export interface ToolContext {
  rootSessionId?: string
  sessionId: string
  invocationId?: string
  parentInvocationId?: string
  agent?: string
  callId: string
  cwd: string
  signal: AbortSignal
  emit(event: RuntimeEvent): void
  ask(req: PermissionRequest): Promise<PermissionAnswer>
  sandbox?: SandboxConfig
  dryRun?: boolean
  runner?: {
    runAgentSession(input: AgentInvocationInput): Promise<ToolResult>
  }
}

export interface ToolResult {
  ok: boolean
  content: string
  metadata?: Record<string, unknown>
}
```

Inputs are `unknown` until validated at runtime. JSON Schema sent to the model is guidance, not trust.

External tools are addable capabilities, not permission authorities. Even if an external tool calls `ctx.ask()`, the request routes to the root PermissionCoordinator. External tools also do not get to start sub agents directly in v1; the supported path is for the main model to call `delegate_agent`, which uses the narrow `runner.runAgentSession()` bridge owned by the runtime.
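
The rule that inputs stay `unknown` until validated can be shown with a hand-written guard. This is a sketch only: `ReadFileInput`, its `maxBytes` field, and `validateReadFileInput` are hypothetical and do not describe the actual `read_file` schema or the contents of `tools/validation.ts`.

```ts
type Validated<T> = { ok: true; value: T } | { ok: false; error: string }

// Hypothetical input for a read-style tool; field names are illustrative.
interface ReadFileInput {
  path: string
  maxBytes?: number
}

function validateReadFileInput(input: unknown): Validated<ReadFileInput> {
  // The model's arguments are untrusted: check the container first.
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, error: "input must be an object" }
  }
  const { path, maxBytes } = input as Record<string, unknown>
  if (typeof path !== "string" || path.length === 0) {
    return { ok: false, error: "path must be a non-empty string" }
  }
  let limit: number | undefined
  if (maxBytes !== undefined) {
    if (typeof maxBytes !== "number" || !Number.isInteger(maxBytes) || maxBytes <= 0) {
      return { ok: false, error: "maxBytes must be a positive integer" }
    }
    limit = maxBytes
  }
  return { ok: true, value: limit === undefined ? { path } : { path, maxBytes: limit } }
}
```

A tool's `execute()` would call a guard like this before touching the filesystem and return `{ ok: false, content: error }` on failure, so a malformed model request becomes a normal tool error and not an exception.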

### Hook

```ts
export type HookEvent =
  | "SessionStart"
  | "SessionEnd"
  | "UserPromptSubmit"
  | "BeforeModelRequest"
  | "AfterModelResponse"
  | "PreToolUse"
  | "PostToolUse"
  | "ToolError"
  | "PermissionRequest"
  | "Stop"

export type HookKind = "builtin" | "command" | "script"

export interface Hook {
  name: string
  event: HookEvent
  kind: HookKind
  match?: HookMatch
  command?: string
  modulePath?: string
  timeoutMs?: number
}

export interface HookResult {
  decision?: "allow" | "warn" | "block"
  message?: string
  patch?: unknown
  metadata?: Record<string, unknown>
}
```

Failure policy:

| Hook kind | Failure behavior |
|-----------|------------------|
| `builtin` | fail-closed |
| `command` | fail-open with warning |
| `script` | fail-open with warning |
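
The failure policy in the table can be sketched as a dispatcher loop. Everything here is an illustrative assumption: `RunnableHook`, the synchronous `run()` method (real command hooks would be asynchronous and subject to `timeoutMs`), and the way warnings are aggregated into the final decision are not taken from demian's `hooks/dispatcher.ts`.

```ts
type HookKind = "builtin" | "command" | "script"
type HookDecision = "allow" | "warn" | "block"

interface HookResult {
  decision?: HookDecision
  message?: string
}

// Hypothetical runnable form of a hook; synchronous for brevity.
interface RunnableHook {
  name: string
  kind: HookKind
  run(): HookResult
}

function dispatch(hooks: RunnableHook[]): { decision: HookDecision; warnings: string[] } {
  const warnings: string[] = []
  for (const hook of hooks) {
    let result: HookResult
    try {
      result = hook.run()
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err)
      if (hook.kind === "builtin") {
        // Fail-closed: a broken built-in safety hook blocks the action.
        return { decision: "block", warnings: [...warnings, `${hook.name} failed: ${reason}`] }
      }
      // Fail-open with warning: command and script hooks must not wedge the session.
      warnings.push(`${hook.name} failed (ignored): ${reason}`)
      continue
    }
    if (result.decision === "block") return { decision: "block", warnings }
    if (result.decision === "warn") warnings.push(result.message ?? `${hook.name} warned`)
  }
  return { decision: warnings.length > 0 ? "warn" : "allow", warnings }
}
```

The asymmetry is the point of the table: built-in hooks are part of the safety boundary, while user-supplied hooks are treated as best-effort observers whose failures are reported but not enforced.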

### Permission

```ts
export type Decision = "allow" | "ask" | "deny"

export interface PermissionRule {
  tool: string
  match?: {
    pathGlob?: string
    commandPrefix?: string
  }
  decision: Decision
  reason?: string
}

export interface PermissionAnswer {
  decision: Decision
  always?: boolean
  reason?: string
}

export interface PermissionCoordinator {
  evaluateTool(req: ToolPermissionRequest): Promise<PermissionAnswer>
  evaluateAgent(req: AgentPermissionRequest): Promise<PermissionAnswer>
  ask(req: PermissionRequest): Promise<PermissionAnswer>
}

export interface ToolPermissionRequest {
  rootSessionId?: string
  invocationId?: string
  agent: string
  agentPath?: string[]
  tool: string
  input: unknown
  effectiveRules: PermissionRule[]
}

export interface AgentPermissionRequest {
  rootSessionId: string
  parentInvocationId: string
  agentPath: string[]
  targetAgent: string
  providerProfile: string
  handoffPreview: string
}
```

The current implementation realizes this coordinator contract through `RootSession` plus shared `SessionGrants`, shared `permissionPrompt`, and a shared root `TranscriptWriter`. A standalone `permissions/coordinator.ts` class is still a valid extraction point, but the first implementation keeps the coordination in the root runtime to avoid an extra abstraction before the behavior is stable.

`always` is represented as `{ decision: "allow", always: true }`, matching the Codex implementation shape.

Single-agent permission evaluation order:

```text
1. built-in hard deny
2. explicit deny rules
3. session grants and unexpired persistent grants
4. most specific allow/ask rule
5. agent default decision
6. built-in default ask
```
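
The six-step order above can be sketched as a single function. This is a simplified illustration, not the logic in `permissions/engine.ts`: the `Rule`, `Request`, and `Policy` shapes are hypothetical, matching is limited to a command prefix, and "most specific" is approximated by the longest matching prefix, which is an assumption.

```ts
type Decision = "allow" | "ask" | "deny"

interface Rule {
  tool: string
  commandPrefix?: string
  decision: Decision
}

interface Request {
  tool: string
  command?: string
}

// Hypothetical policy bundle; names are assumptions for illustration.
interface Policy {
  hardDeny(req: Request): boolean
  rules: Rule[]
  grants: Set<string> // session grants plus unexpired persistent grants
  grantKey(req: Request): string
  agentDefault?: Decision
}

const matches = (rule: Rule, req: Request): boolean =>
  rule.tool === req.tool &&
  (rule.commandPrefix === undefined || (req.command ?? "").startsWith(rule.commandPrefix))

function evaluate(req: Request, policy: Policy): Decision {
  if (policy.hardDeny(req)) return "deny" // 1. built-in hard deny
  const hits = policy.rules.filter((r) => matches(r, req))
  if (hits.some((r) => r.decision === "deny")) return "deny" // 2. explicit deny rules
  if (policy.grants.has(policy.grantKey(req))) return "allow" // 3. grants
  // 4. most specific allow/ask rule (assumed: longest commandPrefix wins)
  const best = hits.sort((a, b) => (b.commandPrefix?.length ?? -1) - (a.commandPrefix?.length ?? -1))[0]
  if (best) return best.decision
  return policy.agentDefault ?? "ask" // 5. agent default, 6. built-in default ask
}
```

Because steps 1 and 2 run before the grant lookup, a stored grant can never resurrect a command that a deny rule covers, which is the "deny dominates" principle from section 2.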

Multi-agent permission evaluation separates safety deny from agent-local policy:

```text
1. built-in hard deny
2. global explicit deny rules
3. root session grants and unexpired persistent grants
4. most specific rule from the current caller agent
5. if an agent-local deny matched, ask the user whether to expand global authority
6. root default decision
7. built-in default ask
```

The invariant is narrower and stronger than "all deny wins":

```text
hard deny or global explicit deny cannot be expanded by any agent or grant
```

Agent-local deny is a behavioral default. It can be overridden only by a user-approved root grant, never by another agent.

Grant key policy:

| Tool | Grant key |
|------|-----------|
| `bash` | first two command tokens |
| `write_file` | parent directory |
| `edit_file` | parent directory |
| other tools | stable JSON stringification of relevant input |
| agent invocation | `agent:<agentName>:<providerProfile>` |
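
A grant key function following the table might look like the sketch below. Only the `agent:<agentName>:<providerProfile>` format comes from the table; the `tool:` prefixes on the other keys, the `command` and `path` input field names, the POSIX-style `parentDir` helper, and the `stableStringify` helper are assumptions made for illustration.

```ts
type GrantSubject =
  | { kind: "tool"; tool: string; input: Record<string, unknown> }
  | { kind: "agent"; agentName: string; providerProfile: string }

// Deterministic JSON: object keys are sorted so equal inputs give equal keys.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => stableStringify(v)).join(",")}]`
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>
    const fields = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
    return `{${fields.join(",")}}`
  }
  return JSON.stringify(value) ?? "null"
}

// Parent directory of a "/"-separated path ("." when there is none).
function parentDir(path: string): string {
  const i = path.lastIndexOf("/")
  if (i < 0) return "."
  return i === 0 ? "/" : path.slice(0, i)
}

function grantKey(s: GrantSubject): string {
  if (s.kind === "agent") return `agent:${s.agentName}:${s.providerProfile}`
  switch (s.tool) {
    case "bash": {
      // First two command tokens, e.g. "git status".
      const tokens = String(s.input["command"] ?? "").trim().split(/\s+/).filter(Boolean)
      return `bash:${tokens.slice(0, 2).join(" ")}`
    }
    case "write_file":
    case "edit_file":
      return `${s.tool}:${parentDir(String(s.input["path"] ?? ""))}`
    default:
      return `${s.tool}:${stableStringify(s.input)}`
  }
}
```

Coarser keys for `bash` and file writes mean one approval covers a family of similar calls, while the stable stringification for other tools keeps approvals exact and independent of property order.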
|
|
698
|
+
|
|
699
|
+
### AgentDefinition
|
|
700
|
+
|
|
701
|
+
The v0 `Agent` shape is enough for one runner. The multi-agent design normalizes agents into `AgentDefinition` and adapts them back to the current runner shape when needed.
|
|
702
|
+
|
|
703
|
+
```ts
|
|
704
|
+
export interface AgentDefinition {
|
|
705
|
+
name: string
|
|
706
|
+
displayName?: string
|
|
707
|
+
description: string
|
|
708
|
+
mode?: "primary" | "subagent" | "all"
|
|
709
|
+
hidden?: boolean
|
|
710
|
+
|
|
711
|
+
provider?: {
|
|
712
|
+
profile?: string
|
|
713
|
+
inheritRoot?: boolean
|
|
714
|
+
}
|
|
715
|
+
|
|
716
|
+
prompt: {
|
|
717
|
+
system: string
|
|
718
|
+
append?: string
|
|
719
|
+
}
|
|
720
|
+
|
|
721
|
+
tools: {
|
|
722
|
+
visible: string[]
|
|
723
|
+
}
|
|
724
|
+
|
|
725
|
+
authority?: {
|
|
726
|
+
tools?: string[]
|
|
727
|
+
permissions?: PermissionRule[]
|
|
728
|
+
defaultDecision?: "allow" | "ask" | "deny"
|
|
729
|
+
}
|
|
730
|
+
|
|
731
|
+
permissions: PermissionRule[]
|
|
732
|
+
defaultDecision?: "allow" | "ask" | "deny"
|
|
733
|
+
|
|
734
|
+
delegation?: {
|
|
735
|
+
callable?: boolean
|
|
736
|
+
canDelegate?: boolean
|
|
737
|
+
allowedAgents?: string[]
|
|
738
|
+
deniedAgents?: string[]
|
|
739
|
+
maxDepth?: number
|
|
740
|
+
invocationDecision?: "allow" | "ask" | "deny"
|
|
741
|
+
}
|
|
742
|
+
|
|
743
|
+
catalog?: {
|
|
744
|
+
category?: "builder" | "planner" | "reviewer" | "researcher" | "utility" | string
|
|
745
|
+
cost?: "free" | "cheap" | "standard" | "expensive"
|
|
746
|
+
useWhen?: string[]
|
|
747
|
+
avoidWhen?: string[]
|
|
748
|
+
triggers?: string[]
|
|
749
|
+
}
|
|
750
|
+
}
|
|
751
|
+
```
|
|
752
|
+
|
|
753
|
+
Mode semantics:
|
|
754
|
+
|
|
755
|
+
| Mode | Main agent selection | Sub agent invocation | Intent |
|
|
756
|
+
|------|----------------------|----------------------|--------|
|
|
757
|
+
| `primary` | yes | no | Talks directly to the user |
|
|
758
|
+
| `subagent` | no | yes | Delegation-only specialist |
|
|
759
|
+
| `all` | yes | yes | Can run directly or be delegated to |
|
|
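The mode table reduces to two predicates. A minimal sketch, assuming hypothetical helper names `canBeMainAgent` and `canBeSubAgent` and an already-resolved mode (this section does not say what an omitted `mode` defaults to):

```ts
// Hypothetical helpers; only the three mode values come from AgentDefinition.
export type AgentMode = "primary" | "subagent" | "all"

// True when the agent may be selected as the main agent.
export function canBeMainAgent(mode: AgentMode): boolean {
  return mode === "primary" || mode === "all"
}

// True when the agent may be invoked through delegation.
export function canBeSubAgent(mode: AgentMode): boolean {
  return mode === "subagent" || mode === "all"
}
```

A `subagent`-mode agent passes only the second predicate, which is what makes it delegation-only.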
760
|
+
|
|
761
|
+
Provider fields are references only. An agent may point at `config.providers["gemini-fast"]`, but it may not define `baseURL`, credentials, custom headers, auth, quirks, or a direct model override. If an agent needs a different model, config should define a separate provider profile for that model. This prevents agent files from becoming hidden network or credential entrypoints.
|
|
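One way a loader could enforce the reference-only rule is to reject any key outside the two declared on `AgentDefinition.provider`. This is a sketch under that assumption; `findForbiddenProviderFields` is a hypothetical name, not a documented demian function.

```ts
// Hypothetical validation helper: the only keys an agent's `provider` block
// may carry are the two declared on AgentDefinition.
const ALLOWED_PROVIDER_KEYS = new Set(["profile", "inheritRoot"])

// Returns the keys an agent file must not define (baseURL, apiKey, model, ...).
export function findForbiddenProviderFields(provider: Record<string, unknown>): string[] {
  return Object.keys(provider).filter((key) => !ALLOWED_PROVIDER_KEYS.has(key))
}
```

For example, `findForbiddenProviderFields({ profile: "gemini-fast", baseURL: "https://example.invalid" })` returns `["baseURL"]`, which a loader could turn into a load-time error.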
762
|
+
|
|
763
|
+
### RootSession And AgentSession
|
|
764
|
+
|
|
765
|
+
```ts
|
|
766
|
+
export interface RootSession {
|
|
767
|
+
id: string
|
|
768
|
+
cwd: string
|
|
769
|
+
mode: "single-agent" | "multi-agent"
|
|
770
|
+
agents: AgentRegistry
|
|
771
|
+
tools: ToolRegistry
|
|
772
|
+
providers: ProviderRegistry
|
|
773
|
+
permissions: PermissionCoordinator
|
|
774
|
+
events: EventBus
|
|
775
|
+
transcript: RootTranscriptWriter
|
|
776
|
+
signal: AbortSignal
|
|
777
|
+
agentSessions: AgentSessionStore
|
|
778
|
+
}
|
|
779
|
+
|
|
780
|
+
export interface AgentInvocation {
|
|
781
|
+
id: string
|
|
782
|
+
rootSessionId: string
|
|
783
|
+
agentSessionId: string
|
|
784
|
+
parentInvocationId?: string
|
|
785
|
+
depth: number
|
|
786
|
+
agentName: string
|
|
787
|
+
providerProfile: string
|
|
788
|
+
model: string
|
|
789
|
+
prompt: string
|
|
790
|
+
status: "created" | "running" | "completed" | "failed" | "cancelled"
|
|
791
|
+
}
|
|
792
|
+
|
|
793
|
+
export interface AgentSession {
|
|
794
|
+
id: string
|
|
795
|
+
rootSessionId: string
|
|
796
|
+
agentName: string
|
|
797
|
+
providerProfile: string
|
|
798
|
+
model: string
|
|
799
|
+
messages: Message[]
|
|
800
|
+
memory: AgentSessionMemory
|
|
801
|
+
contextPolicy: AgentContextPolicy
|
|
802
|
+
updatedAt: number
|
|
803
|
+
}
|
|
804
|
+
|
|
805
|
+
export interface AgentSessionMemory {
|
|
806
|
+
summary: string
|
|
807
|
+
findings: string[]
|
|
808
|
+
decisions: string[]
|
|
809
|
+
openQuestions: string[]
|
|
810
|
+
relevantFiles: string[]
|
|
811
|
+
lastResults: Array<{ task: string; resultPreview: string; ts: number }>
|
|
812
|
+
}
|
|
813
|
+
|
|
814
|
+
export interface AgentContextPolicy {
|
|
815
|
+
maxTurns: number
|
|
816
|
+
maxContextTokens?: number
|
|
817
|
+
maxInputTokens?: number // legacy alias
|
|
818
|
+
maxMessages?: number
|
|
819
|
+
summaryTargetTokens?: number
|
|
820
|
+
recentTurns?: number
|
|
821
|
+
compactAtRatio?: number
|
|
822
|
+
compressAtRatio?: number // legacy alias
|
|
823
|
+
compression: "compact-summary-and-recent" | "summary-and-recent" | "none"
|
|
824
|
+
}
|
|
825
|
+
```
|
|
826
|
+
|
|
827
|
+
Child session reuse key:
|
|
828
|
+
|
|
829
|
+
```text
|
|
830
|
+
rootSessionId + agentName + providerProfile + cwd
|
|
831
|
+
```
|
|
832
|
+
|
|
833
|
+
The first implementation keeps child memory in memory only. Disk persistence is deferred until explicit resume support exists. This chooses predictable current-session continuity over surprising long-lived agent memory.
|
|
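The reuse key and the in-memory store can be sketched together. The class and function names below are hypothetical; only the four key parts and the memory-only behavior come from this section.

```ts
// Hypothetical shapes; the four key parts come from the reuse key above.
export interface ReuseKeyParts {
  rootSessionId: string
  agentName: string
  providerProfile: string
  cwd: string
}

// JSON-encode the parts so separators inside a path or name cannot collide.
export function childSessionKey(parts: ReuseKeyParts): string {
  return JSON.stringify([parts.rootSessionId, parts.agentName, parts.providerProfile, parts.cwd])
}

// In-memory only: entries vanish when the process exits.
export class InMemoryChildSessions<T> {
  private readonly sessions = new Map<string, T>()

  // Return the existing session for these parts, or create and remember a new one.
  loadOrCreate(parts: ReuseKeyParts, create: () => T): T {
    const key = childSessionKey(parts)
    const existing = this.sessions.get(key)
    if (existing !== undefined) return existing
    const created = create()
    this.sessions.set(key, created)
    return created
  }
}
```

Changing any one part, such as `cwd`, yields a different key and therefore a fresh child session.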
834
|
+
|
|
835
|
+
In the current CLI and TUI wiring, a RootSession is created for each submitted task. Programmatic callers can keep a RootSession object and reuse it across runs, but interactive long-lived RootSession reuse is left as a follow-up. This intentionally lands the safer synchronous delegation behavior first; cross-prompt subagent memory can be added once the UI can clearly show that memory is being retained.
|
|
836
|
+
|
|
837
|
+
---
|
|
838
|
+
|
|
839
|
+
## 7. Session Lifecycle
|
|
840
|
+
|
|
841
|
+
Startup:
|
|
842
|
+
|
|
843
|
+
```text
|
|
844
|
+
CLI
|
|
845
|
+
-> parse flags
|
|
846
|
+
-> resolve cwd
|
|
847
|
+
-> load config
|
|
848
|
+
-> merge defaults, user config, project config, explicit config, flags
|
|
849
|
+
-> resolve agent mode
|
|
850
|
+
-> resolve agent
|
|
851
|
+
-> resolve provider + model
|
|
852
|
+
-> create RootSession services when multi-agent mode is active
|
|
853
|
+
-> create EventBus
|
|
854
|
+
-> create TranscriptWriter
|
|
855
|
+
-> create SessionRunner
|
|
856
|
+
-> run
|
|
857
|
+
```
|
|
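The merge step can be read as a left-to-right layering. The sketch below assumes a shallow merge in which later layers win and undefined values are skipped; the real merge may be deeper, and `mergeConfigLayers` is a hypothetical name.

```ts
// Hypothetical, untyped config layer; the real config has a richer schema.
export type ConfigLayer = Record<string, unknown>

// Merge layers left to right; a later layer overrides an earlier one only for
// the keys it actually defines (undefined values are skipped).
export function mergeConfigLayers(...layers: ConfigLayer[]): ConfigLayer {
  const merged: ConfigLayer = {}
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      if (layer[key] !== undefined) merged[key] = layer[key]
    }
  }
  return merged
}
```

Called as `mergeConfigLayers(defaults, userConfig, projectConfig, explicitConfig, flags)`, a flag set on the command line overrides the same key from every config file.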
858
|
+
|
|
859
|
+
Mode-specific startup:
|
|
860
|
+
|
|
861
|
+
```text
|
|
862
|
+
single-agent:
|
|
863
|
+
one selected agent
|
|
864
|
+
one provider/model for the run
|
|
865
|
+
delegate_agent is not registered or visible
|
|
866
|
+
callable agent catalog is not injected
|
|
867
|
+
|
|
868
|
+
multi-agent:
|
|
869
|
+
selected agent becomes the main agent
|
|
870
|
+
RootSession spans the interactive conversation
|
|
871
|
+
provider/model flags apply to the main invocation only
|
|
872
|
+
callable subagent catalog is injected into the main system prompt
|
|
873
|
+
delegate_agent is visible only if the main agent can delegate
|
|
874
|
+
```
|
|
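The `delegate_agent` visibility rule can be expressed as a small filter. This is a sketch with hypothetical names (`MainAgentSketch`, `toolsForMainAgent`), not the actual registry code.

```ts
// Hypothetical, reduced view of the selected main agent.
export interface MainAgentSketch {
  visibleTools: string[]
  canDelegate: boolean
}

// delegate_agent is exposed only in multi-agent mode and only to a main agent
// that is allowed to delegate; in single-agent mode it is never visible.
export function toolsForMainAgent(
  sessionMode: "single-agent" | "multi-agent",
  agent: MainAgentSketch,
): string[] {
  const tools = agent.visibleTools.filter((tool) => tool !== "delegate_agent")
  if (sessionMode === "multi-agent" && agent.canDelegate) tools.push("delegate_agent")
  return tools
}
```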
875
|
+
|
|
876
|
+
Initial messages:
|
|
877
|
+
|
|
878
|
+
```ts
|
|
879
|
+
const messages: Message[] = [
|
|
880
|
+
{ role: "system", content: buildSystemPrompt(agent, envInfo) },
|
|
881
|
+
{ role: "user", content: await buildUserContent(prompt, images, multimodalConfig) },
|
|
882
|
+
]
|
|
883
|
+
```
|
|
884
|
+
|
|
885
|
+
One model turn:
|
|
886
|
+
|
|
887
|
+
```text
|
|
888
|
+
BeforeModelRequest hooks
|
|
889
|
+
-> provider.stream() when enabled and supported
|
|
890
|
+
     otherwise provider.chat()
|
|
891
|
+
-> AfterModelResponse hooks
|
|
892
|
+
-> append assistant message
|
|
893
|
+
-> if content_filter: emit and stop
|
|
894
|
+
-> if no tool calls: final answer
|
|
895
|
+
-> run tool calls sequentially
|
|
896
|
+
-> append tool messages
|
|
897
|
+
-> next turn
|
|
898
|
+
```
|
|
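The turn loop above can be sketched as follows. The types and the `runTurns` name are hypothetical, the sketch is synchronous for brevity (the real runner is async), and hooks and events are omitted.

```ts
// Hypothetical, simplified shapes for illustration only.
export interface TurnToolCall { id: string; name: string; input: unknown }
export interface TurnReply {
  content: string
  finishReason: "stop" | "tool_calls" | "content_filter"
  toolCalls: TurnToolCall[]
}
export type TurnMessage =
  | { role: "system" | "user" | "assistant"; content: string }
  | { role: "tool"; toolCallId: string; content: string }

export function runTurns(
  messages: TurnMessage[],
  callModel: (messages: TurnMessage[]) => TurnReply,
  runTool: (call: TurnToolCall) => string,
  maxTurns: number,
): "final" | "content_filter" | "max_turns" {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = callModel(messages)
    messages.push({ role: "assistant", content: reply.content })
    if (reply.finishReason === "content_filter") return "content_filter"
    if (reply.toolCalls.length === 0) return "final"
    // Sequential on purpose: one tool call finishes before the next starts.
    for (const call of reply.toolCalls) {
      messages.push({ role: "tool", toolCallId: call.id, content: runTool(call) })
    }
  }
  return "max_turns"
}
```

The loop ends on a content filter, on a reply without tool calls, or when the turn budget is exhausted.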
899
|
+
|
|
900
|
+
Tool lifecycle:
|
|
901
|
+
|
|
902
|
+
```text
|
|
903
|
+
emit tool.requested
|
|
904
|
+
-> validate tool exists and is allowed for agent
|
|
905
|
+
-> validate input schema
|
|
906
|
+
-> PreToolUse hooks
|
|
907
|
+
     block -> append tool error message
|
|
908
|
+
     patch -> patch input, then revalidate
|
|
909
|
+
     warn -> continue with warning metadata
|
|
910
|
+
-> PermissionEngine.evaluate()
|
|
911
|
+
     deny -> append tool error message
|
|
912
|
+
     ask -> prompt user or apply --yes
|
|
913
|
+
     allow -> continue
|
|
914
|
+
-> emit tool.started
|
|
915
|
+
-> dry-run check for write_file, edit_file, bash
|
|
916
|
+
-> tool.execute()
|
|
917
|
+
-> output cap
|
|
918
|
+
-> PostToolUse hooks
|
|
919
|
+
-> emit tool.completed or tool.failed
|
|
920
|
+
-> append role="tool" message
|
|
921
|
+
```
|
|
922
|
+
|
|
923
|
+
Tool calls are sequential in v0. This keeps permission prompts, transcript order, and model continuation deterministic.
|
|
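The gating part of the tool lifecycle (validation, `PreToolUse` outcomes, permission decision) can be sketched as one function. All type and function names here are hypothetical; the real hook and permission types are richer.

```ts
// Hypothetical shapes for illustration.
export type HookOutcome =
  | { kind: "continue" }
  | { kind: "block"; reason: string }
  | { kind: "patch"; input: unknown }
  | { kind: "warn"; message: string }

export type GateResult =
  | { ok: true; input: unknown; warnings: string[] }
  | { ok: false; error: string }

export function gateToolCall(
  input: unknown,
  hooks: Array<(input: unknown) => HookOutcome>,
  isValid: (input: unknown) => boolean,
  decide: (input: unknown) => "allow" | "ask" | "deny",
  askUser: () => boolean,
): GateResult {
  if (!isValid(input)) return { ok: false, error: "invalid input" }
  let current = input
  const warnings: string[] = []
  for (const hook of hooks) {
    const outcome = hook(current)
    if (outcome.kind === "block") return { ok: false, error: outcome.reason }
    if (outcome.kind === "warn") warnings.push(outcome.message)
    if (outcome.kind === "patch") {
      current = outcome.input
      // A patched input is revalidated before it can reach the permission step.
      if (!isValid(current)) return { ok: false, error: "patched input is invalid" }
    }
  }
  const decision = decide(current)
  if (decision === "deny") return { ok: false, error: "permission denied" }
  if (decision === "ask" && !askUser()) return { ok: false, error: "user declined" }
  return { ok: true, input: current, warnings }
}
```

A failed gate maps to the "append tool error message" branches above; a successful one proceeds to `tool.started` and execution.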
924
|
+
|
|
925
|
+
### Delegation Lifecycle
|
|
926
|
+
|
|
927
|
+
Synchronous v1 delegation flow:
|
|
928
|
+
|
|
929
|
+
```text
|
|
930
|
+
main model requests delegate_agent
|
|
931
|
+
-> emit agent.invocation.requested
|
|
932
|
+
-> validate multi-agent mode
|
|
933
|
+
-> validate parent can delegate
|
|
934
|
+
-> resolve target agent
|
|
935
|
+
-> reject hidden, non-callable, self, cycle, or max-depth violations
|
|
936
|
+
-> evaluate agent invocation permission through root PermissionCoordinator
|
|
937
|
+
-> resolve child provider profile
|
|
938
|
+
-> load or create child AgentSession
|
|
939
|
+
-> build structured handoff packet
|
|
940
|
+
-> assemble bounded child context from memory + recent child turns + handoff
|
|
941
|
+
-> run child AgentSessionRunner with root-owned services
|
|
942
|
+
-> update compressed child memory
|
|
943
|
+
-> emit agent.invocation.completed | failed | cancelled
|
|
944
|
+
-> return compact child result as the delegate_agent tool message
|
|
945
|
+
```
|
|
946
|
+
|
|
947
|
+
Delegation is intentionally synchronous in v1. The gain is deterministic permission ordering, simple cancellation, and a main model that can reason over the child result immediately. The cost is that delegated work cannot run in the background yet; that is deferred until the transcript and permission UX prove stable.
|
|
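The rejection step in the flow above can be sketched as a pure check. The names are hypothetical, and the depth convention is an assumption: the main agent sits at depth 0, so a child's depth equals the length of the parent chain.

```ts
// Hypothetical, reduced view of a delegation target.
export interface DelegationTarget {
  name: string
  hidden: boolean
  callable: boolean
}

// `chain` lists agent names from the main agent down to the current parent.
// Returns the rejection reason, or null when delegation may proceed.
export function delegationRejection(
  chain: string[],
  target: DelegationTarget,
  maxDepth: number,
): string | null {
  const parentName = chain[chain.length - 1]
  if (target.hidden) return "hidden"
  if (!target.callable) return "non-callable"
  if (target.name === parentName) return "self"
  if (chain.indexOf(target.name) !== -1) return "cycle"
  // Assumption: the child would run at depth chain.length.
  if (chain.length > maxDepth) return "max-depth"
  return null
}
```

Only a `null` result continues to the permission evaluation through the root `PermissionCoordinator`.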
948
|
+
|
|
949
|
+
### Handoff Context
|
|
950
|
+
|
|
951
|
+
The child does not receive the full main conversation by default, because that would be costly and could send more data than the user expected to a different provider. It also does not start from a blank prompt, because that would make specialist agents brittle. The compromise is a structured handoff packet.
|
|
952
|
+
|
|
953
|
+
```ts
|
|
954
|
+
export interface HandoffContext {
|
|
955
|
+
runtime: {
|
|
956
|
+
rootSessionId: string
|
|
957
|
+
agentSessionId: string
|
|
958
|
+
parentInvocationId: string
|
|
959
|
+
parentAgent: string
|
|
960
|
+
childAgent: string
|
|
961
|
+
cwd: string
|
|
962
|
+
currentUserRequest: string
|
|
963
|
+
effectiveTools: string[]
|
|
964
|
+
}
|
|
965
|
+
delegation: AgentInvocationInput
|
|
966
|
+
summary?: {
|
|
967
|
+
rootConversation?: string
|
|
968
|
+
childMemory?: string
|
|
969
|
+
parentFindings?: string[]
|
|
970
|
+
decisions?: string[]
|
|
971
|
+
}
|
|
972
|
+
}
|
|
973
|
+
```
|
|
974
|
+
|
|
975
|
+
Only human-useful fields become model-visible: current user request, parent/child agent names, cwd, task, context, relevant files, constraints, expected output, and compact summaries. IDs remain transcript/debug metadata unless useful to the model.
|
|
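A sketch of how the model-visible part of a handoff might be rendered while ids stay out of the prompt. `HandoffSketch` and `renderHandoff` are hypothetical, and the exact wording of the rendered text is an assumption.

```ts
// Hypothetical input shape using fields named in the paragraph above.
export interface HandoffSketch {
  currentUserRequest: string
  parentAgent: string
  childAgent: string
  cwd: string
  task: string
  relevantFiles: string[]
  rootSessionId: string // transcript/debug metadata, deliberately not rendered
}

// Render only the human-useful fields into the text the child model sees.
export function renderHandoff(handoff: HandoffSketch): string {
  const lines = [
    `User request: ${handoff.currentUserRequest}`,
    `Delegated by: ${handoff.parentAgent} -> ${handoff.childAgent}`,
    `Working directory: ${handoff.cwd}`,
    `Task: ${handoff.task}`,
  ]
  if (handoff.relevantFiles.length > 0) {
    lines.push(`Relevant files: ${handoff.relevantFiles.join(", ")}`)
  }
  return lines.join("\n")
}
```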
976
|
+
|
|
977
|
+
### Context Budgets And Request-Time Compaction
|
|
978
|
+
|
|
979
|
+
Main and sub agent context budgets are runtime variables, not hard-coded constants.
|
|
980
|
+
|
|
981
|
+
```ts
|
|
982
|
+
export interface ContextBudgetConfig {
|
|
983
|
+
main: {
|
|
984
|
+
maxContextTokens?: number
|
|
985
|
+
maxInputTokens?: number // legacy alias
|
|
986
|
+
compactAtRatio?: number
|
|
987
|
+
compressAtRatio?: number // legacy alias
|
|
988
|
+
summaryTargetTokens: number
|
|
989
|
+
recentTurns: number
|
|
990
|
+
minRecentTurns: number
|
|
991
|
+
maxRawMessages?: number
|
|
992
|
+
keepRawDelegateResults: number
|
|
993
|
+
maxDelegateResultTokens: number
|
|
994
|
+
ledgerTargetTokens: number
|
|
995
|
+
}
|
|
996
|
+
subAgent: {
|
|
997
|
+
maxContextTokens: number
|
|
998
|
+
maxInputTokens: number // legacy alias
|
|
999
|
+
compactAtRatio?: number
|
|
1000
|
+
compressAtRatio?: number // legacy alias
|
|
1001
|
+
summaryTargetTokens: number
|
|
1002
|
+
recentTurns: number
|
|
1003
|
+
minRecentTurns?: number
|
|
1004
|
+
maxHandoffTokens: number
|
|
1005
|
+
maxDelegateContextTokens: number
|
|
1006
|
+
maxResultTokensToMain: number
|
|
1007
|
+
compression: "compact-summary-and-recent" | "summary-and-recent" | "none"
|
|
1008
|
+
}
|
|
1009
|
+
}
|
|
1010
|
+
```
|
|
1011
|
+
|
|
1012
|
+
Recommended defaults:
|
|
1013
|
+
|
|
1014
|
+
```text
|
|
1015
|
+
main.maxContextTokens = 48000
|
|
1016
|
+
subAgent.maxContextTokens = 12000
|
|
1017
|
+
main.compactAtRatio = 0.7
|
|
1018
|
+
subAgent.compactAtRatio = 0.7
|
|
1019
|
+
```
|
|
1020
|
+
|
|
1021
|
+
`maxContextTokens` is the maximum model-visible input context that demian should send on one provider request. `compactAtRatio` controls when history compaction starts; by default, `compactAtTokens = floor(maxContextTokens * 0.7)`. When the estimated request context reaches that threshold, `SessionRunner` compacts older history before calling the provider.
|
|
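As a worked example of the threshold formula: with the recommended defaults, the main agent compacts at `floor(48000 * 0.7) = 33600` estimated tokens and a sub agent at `floor(12000 * 0.7) = 8400`. A minimal sketch, with hypothetical function names:

```ts
// compactAtTokens = floor(maxContextTokens * compactAtRatio), as stated above.
export function compactAtTokens(maxContextTokens: number, compactAtRatio = 0.7): number {
  return Math.floor(maxContextTokens * compactAtRatio)
}

// Compaction starts once the estimated request context reaches the threshold.
export function shouldCompact(
  estimatedTokens: number,
  maxContextTokens: number,
  compactAtRatio = 0.7,
): boolean {
  return estimatedTokens >= compactAtTokens(maxContextTokens, compactAtRatio)
}
```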
1022
|
+
|
|
1023
|
+
The first implementation performs deterministic in-memory request-time compaction. It preserves the current system prompt, a compact system summary of older turns, recent raw turns, and tool-call adjacency for the retained tail. It emits `context.compiled` for each model request and `context.compacted` when older messages are folded into a summary. Full JSONL-backed `ContextStore` persistence remains the next step; transcripts still preserve the audit chronology.
|
|
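The compaction shape described above can be illustrated with a small function. This is a sketch, not the actual `SessionRunner` code: the message shape and `compactHistory` are hypothetical, the first message is assumed to be the system prompt, and the summary text is produced by a caller-supplied function.

```ts
// Hypothetical message shape for illustration.
export interface CompactMessage {
  role: "system" | "user" | "assistant" | "tool"
  content: string
}

export function compactHistory(
  messages: CompactMessage[],
  keepRecent: number,
  summarize: (older: CompactMessage[]) => string,
): CompactMessage[] {
  const [systemPrompt, ...rest] = messages
  if (systemPrompt === undefined || rest.length <= keepRecent) return messages
  let start = rest.length - keepRecent
  // Never start the retained tail on a tool result: walk back to the assistant
  // message that requested it so tool-call adjacency survives.
  while (start > 0 && rest[start]?.role === "tool") start--
  if (start === 0) return messages
  const summary: CompactMessage = {
    role: "system",
    content: `Summary of earlier turns: ${summarize(rest.slice(0, start))}`,
  }
  return [systemPrompt, summary, ...rest.slice(start)]
}
```

The result keeps the system prompt, one compact summary of older turns, and a raw tail that never begins with an orphaned tool message.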
1024
|
+
|
|
1025
|
+
Interactive UIs also expose manual history compaction with `/compact`. Unlike automatic request-time compaction, `/compact` rewrites the interactive in-memory `history` immediately to the same compact-summary-plus-recent shape that would be used for a model request. The command is handled by the UI controller, is not sent to the model as a user message, and is not stored in prompt history. If there is not enough conversation history to compact, the UI reports that compaction was skipped. In non-interactive one-shot mode there is no retained history, so `/compact` is a no-op with a user-facing notice.
|
|
1026
|
+
|
|
1027
|
+
---
|
|
1028
|
+
|
|
1029
|
+
## 8. Provider Strategy
|
|
1030
|
+
|
|
1031
|
+
### OpenAIProvider
|
|
1032
|
+
|
|
1033
|
+
`OpenAIProvider` is the default provider implementation. It uses the common Chat Completions subset and a vendor quirk map.
|
|
1034
|
+
|
|
1035
|
+
Authentication is configurable per provider:
|
|
1036
|
+
|
|
1037
|
+
- default: `Authorization: Bearer <apiKey>` for OpenAI, Gemini, OpenRouter, local gateways, and most OpenAI-compatible services
|
|
1038
|
+
- Azure OpenAI API-key auth: `api-key: <apiKey>`, inferred automatically when the `baseURL` host ends in `.openai.azure.com` or `.services.ai.azure.com`
|
|
1039
|
+
- custom compatible gateways may set a custom auth header while keeping the same provider path
|
|
1040
|
+
|
|
1041
|
+
This split is required because Azure OpenAI's v1 endpoint is OpenAI-compatible at the API shape level, but its REST API-key authentication uses the `api-key` header. Users should not need to set `auth.type` for normal Azure OpenAI endpoints; explicit `auth` config is only an override for custom gateways or future Microsoft Entra ID bearer-token flows.
|
|
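The header inference can be sketched as below. `hostOf` and `authHeaderFor` are hypothetical helpers, the check is applied to the URL host, and explicit `auth` overrides for custom gateways are left out.

```ts
// Host suffixes and header names come from the list above.
const AZURE_HOST_SUFFIXES = [".openai.azure.com", ".services.ai.azure.com"]

// Extract the host part of a URL with a small regex (userinfo is not handled).
function hostOf(baseURL: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^\/:?#]+)/i.exec(baseURL)
  return match?.[1]?.toLowerCase() ?? ""
}

// Azure OpenAI endpoints use the `api-key` header; everything else uses Bearer.
export function authHeaderFor(baseURL: string, apiKey: string): Record<string, string> {
  const host = hostOf(baseURL)
  const isAzure = AZURE_HOST_SUFFIXES.some((suffix) => host.endsWith(suffix))
  if (isAzure) return { "api-key": apiKey }
  return { Authorization: `Bearer ${apiKey}` }
}
```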
1042
|
+
|
|
1043
|
+
Payload subset:
|
|
1044
|
+
|
|
1045
|
+
- `model`
|
|
1046
|
+
- `messages`
|
|
1047
|
+
- `tools`
|
|
1048
|
+
- `tool_choice: "auto"` unless omitted by quirk
|
|
1049
|
+
- `temperature` unless omitted by quirk
|
|
1050
|
+
- `max_tokens` unless omitted by quirk
|
|
1051
|
+
- `stream` and `stream_options` only for streaming
|
|
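A sketch of how the payload subset and the quirk map could interact. The quirk flag names are hypothetical (the real quirk map is not shown in this section), and using `include_usage` as the `stream_options` content is an assumption.

```ts
// Hypothetical quirk flags for illustration.
export interface QuirkSketch {
  omitToolChoice?: boolean
  omitTemperature?: boolean
  omitMaxTokens?: boolean
}

export interface PayloadOptions {
  model: string
  messages: unknown[]
  tools: unknown[]
  temperature: number
  maxTokens: number
  stream: boolean
}

export function buildChatPayload(
  options: PayloadOptions,
  quirks: QuirkSketch = {},
): Record<string, unknown> {
  const payload: Record<string, unknown> = {
    model: options.model,
    messages: options.messages,
    tools: options.tools,
  }
  if (!quirks.omitToolChoice) payload["tool_choice"] = "auto"
  if (!quirks.omitTemperature) payload["temperature"] = options.temperature
  if (!quirks.omitMaxTokens) payload["max_tokens"] = options.maxTokens
  if (options.stream) {
    payload["stream"] = true
    // Assumption: usage reporting is the stream option being requested.
    payload["stream_options"] = { include_usage: true }
  }
  return payload
}
```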
1052
|
+
|
|
1053
|
+
Supported presets:
|
|
1054
|
+
|
|
1055
|
+
| Key | Type | baseURL | API key |
|
|
1056
|
+
|-----|------|---------|---------|
|
|
1057
|
+
| `openai` | openai-compatible | `https://api.openai.com/v1` | `OPENAI_API_KEY` |
|
|
1058
|
+
| `gemini` | openai-compatible | `https://generativelanguage.googleapis.com/v1beta/openai/` | `GOOGLE_API_KEY` |
|
|
1059
|
+
| `ollama` | openai-compatible | `http://localhost:11434/v1` | placeholder |
|
|
1060
|
+
| `lmstudio` | openai-compatible | `http://localhost:1234/v1` | placeholder |
|
|
1061
|
+
| `vllm` | openai-compatible | `http://localhost:8000/v1` | placeholder |
|
|
1062
|
+
| `llamacpp` | openai-compatible | `http://localhost:8080/v1` | placeholder |
|
|
1063
|
+
| `openrouter` | openai-compatible | `https://openrouter.ai/api/v1` | `OPENROUTER_API_KEY` |
|
|
1064
|
+
| `together` | openai-compatible | `https://api.together.xyz/v1` | `TOGETHER_API_KEY` |
|
|
1065
|
+
| `groq` | openai-compatible | `https://api.groq.com/openai/v1` | `GROQ_API_KEY` |
|
|
1066
|
+
| `azure` | openai-compatible | configured Azure endpoint | `AZURE_OPENAI_API_KEY` |
|
|
1067
|
+
| `anthropic` | anthropic | Anthropic API | `ANTHROPIC_API_KEY` |
|
|
1068
|
+
| `codex` | codex | `https://chatgpt.com/backend-api/codex` | local Codex ChatGPT login |
|
|
1069
|
+
| `claudecode` | claudecode external runtime | local Claude Agent SDK / Claude Code CLI | local Claude Code login or Agent SDK plan auth |
|
|
1070
|
+
|
|
1071
|
+
Provider adapters are stateless from the upstream API's point of view. `SessionRunner` sends the current compiled message context and the agent-visible tool schemas on each model request; adapters translate that shape but do not own history truncation. Context compression and tool-output compaction belong in the session/root-session context layer so provider-specific adapters cannot silently change what the model sees.
|
|
1072
|
+
|
|
1073
|
+
### CodexProvider
|
|
1074
|
+
|
|
1075
|
+
`CodexProvider` is a first-class provider, not an OpenAI-compatible preset. It uses the Responses API over the ChatGPT-backed Codex endpoint and reuses the user's local Codex login.
|
|
1076
|
+
|
|
1077
|
+
The runtime boundary remains unchanged:
|
|
1078
|
+
|
|
1079
|
+
- demian owns hooks, permissions, transcripts, sandboxing, cancellation, tool execution, and delegation
|
|
1080
|
+
- Codex is used only as the model provider
|
|
1081
|
+
- demian must not spawn `codex exec` as a nested agent runtime
|
|
1082
|
+
|
|
1083
|
+
Implementation split:
|
|
1084
|
+
|
|
1085
|
+
```text
|
|
1086
|
+
src/providers/codex.ts
|
|
1087
|
+
provider orchestration and Responses request mapping
|
|
1088
|
+
|
|
1089
|
+
src/providers/codex-auth.ts
|
|
1090
|
+
Codex auth file/keyring loading, ChatGPT token headers, refresh
|
|
1091
|
+
|
|
1092
|
+
src/providers/codex-state.ts
|
|
1093
|
+
Codex-local metadata such as installation_id
|
|
1094
|
+
|
|
1095
|
+
src/providers/codex-stream.ts
|
|
1096
|
+
Responses SSE parsing and ChatStreamEvent accumulation
|
|
1097
|
+
```
|
|
1098
|
+
|
|
1099
|
+
Default config:
|
|
1100
|
+
|
|
1101
|
+
```json
|
|
1102
|
+
{
|
|
1103
|
+
"codex": {
|
|
1104
|
+
"type": "codex",
|
|
1105
|
+
"model": "gpt-5.1-codex",
|
|
1106
|
+
"baseURL": "https://chatgpt.com/backend-api/codex",
|
|
1107
|
+
"authStore": "auto",
|
|
1108
|
+
"allowApiKeyFallback": false,
|
|
1109
|
+
"promptCacheKey": "root-session",
|
|
1110
|
+
"responses": {
|
|
1111
|
+
"store": false,
|
|
1112
|
+
"include": ["reasoning.encrypted_content"],
|
|
1113
|
+
"reasoning": {
|
|
1114
|
+
"effort": "medium",
|
|
1115
|
+
"summary": "auto"
|
|
1116
|
+
}
|
|
1117
|
+
}
|
|
1118
|
+
}
|
|
1119
|
+
}
|
|
1120
|
+
```
|
|
1121
|
+
|
|
1122
|
+
Auth rules:
|
|
1123
|
+
|
|
1124
|
+
- `authStore: "auto"` tries the Codex keyring location first, then `{codexHome}/auth.json`.
|
|
1125
|
+
- File auth is fully readable and refreshable; refreshed tokens are written atomically with `0600` permissions.
|
|
1126
|
+
- Native keyring access is adapter-backed. The built-in macOS adapter can read Codex's `Codex Auth` entry. Keyring writes require a safe adapter implementation; if refreshed keyring tokens cannot be written, the provider asks the user to refresh with `codex login`.
|
|
1127
|
+
- Keyring identity must match Codex CLI: `service = "Codex Auth"`, `account = "cli|" + first 16 hex chars of sha256(canonical codexHome)`.
|
|
1128
|
+
- API-key fallback is disabled by default because OpenAI Platform API-key usage is not ChatGPT/Codex subscription usage.
|
|
1129
|
+
|
|
1130
|
+
Codex request headers:
|
|
1131
|
+
|
|
1132
|
+
```text
|
|
1133
|
+
Authorization: Bearer <access_token>
|
|
1134
|
+
ChatGPT-Account-ID: <account_id>
|
|
1135
|
+
X-OpenAI-Fedramp: true # only when token claims require it
|
|
1136
|
+
x-codex-installation-id: <installation_id>
|
|
1137
|
+
originator: demian
|
|
1138
|
+
user-agent: demian
|
|
1139
|
+
```
|
|
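The header set above maps directly onto a small builder. A sketch with hypothetical names; in the real provider the FedRAMP flag is derived from token claims rather than passed in.

```ts
// Hypothetical input shape for illustration.
export interface CodexHeaderInput {
  accessToken: string
  accountId: string
  installationId: string
  fedramp: boolean
}

export function codexRequestHeaders(input: CodexHeaderInput): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${input.accessToken}`,
    "ChatGPT-Account-ID": input.accountId,
    "x-codex-installation-id": input.installationId,
    originator: "demian",
    "user-agent": "demian",
  }
  // Sent only when the token claims require it.
  if (input.fedramp) headers["X-OpenAI-Fedramp"] = "true"
  return headers
}
```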
1140
|
+
|
|
1141
|
+
Responses payload rules:
|
|
1142
|
+
|
|
1143
|
+
- input is converted from demian's OpenAI-shaped history to Responses `message`, `function_call`, and `function_call_output` items
|
|
1144
|
+
- tools become Responses function tools with `strict: false`
|
|
1145
|
+
- `parallel_tool_calls` is `false`
|
|
1146
|
+
- `store` is `false`
|
|
1147
|
+
- `previous_response_id` is omitted
|
|
1148
|
+
- `prompt_cache_key` defaults to the root session id
|
|
1149
|
+
- `client_metadata["x-codex-installation-id"]` mirrors the installation id header
|
|
1150
|
+
- when reasoning is configured, include `reasoning.encrypted_content`
|
|
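The history conversion in the first rule can be sketched as follows. The history shape is a hypothetical, trimmed-down stand-in for demian's message type, and the item fields follow the Responses item types named above in simplified form; whether the system prompt travels as an input item or separately is not specified here.

```ts
// Hypothetical, trimmed-down OpenAI-shaped history entry.
export type HistoryEntry =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string; toolCalls?: Array<{ id: string; name: string; arguments: string }> }
  | { role: "tool"; toolCallId: string; content: string }

// Simplified Responses input items.
export type ResponsesItem =
  | { type: "message"; role: "system" | "user" | "assistant"; content: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string }
  | { type: "function_call_output"; call_id: string; output: string }

export function toResponsesInput(entries: HistoryEntry[]): ResponsesItem[] {
  const items: ResponsesItem[] = []
  for (const entry of entries) {
    if (entry.role === "tool") {
      items.push({ type: "function_call_output", call_id: entry.toolCallId, output: entry.content })
      continue
    }
    if (entry.content !== "") items.push({ type: "message", role: entry.role, content: entry.content })
    if (entry.role === "assistant") {
      // Each tool call becomes its own item, keyed by call_id.
      for (const call of entry.toolCalls ?? []) {
        items.push({ type: "function_call", call_id: call.id, name: call.name, arguments: call.arguments })
      }
    }
  }
  return items
}
```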
1151
|
+
|
|
1152
|
+
Refresh rules:
|
|
1153
|
+
|
|
1154
|
+
- decode JWT claims with base64url padding restoration
|
|
1155
|
+
- refresh proactively near access-token expiry and reactively once after `/responses` 401
|
|
1156
|
+
- share one refresh promise per `CodexAuthStore`
|
|
1157
|
+
- reuse `CodexAuthStore` instances across normal provider resolutions for the same `codexHome`, `authStore`, and refresh settings
|
|
1158
|
+
- classify refresh 401s as `refresh_token_expired`, `refresh_token_invalidated`, `refresh_token_reused`, or `other`
|
|
1159
|
+
- for `refresh_token_reused`, reload the selected auth store once and retry with newer tokens if another process already rotated them
|
|
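The "one refresh promise per store" rule is a single-flight pattern. A minimal sketch, assuming a hypothetical `SingleFlightRefresher` class rather than the real `CodexAuthStore`:

```ts
// Hypothetical class illustrating a shared in-flight refresh.
export class SingleFlightRefresher<T> {
  private inFlight: Promise<T> | null = null
  private readonly doRefresh: () => Promise<T>

  constructor(doRefresh: () => Promise<T>) {
    this.doRefresh = doRefresh
  }

  // Concurrent callers share one in-flight refresh instead of each rotating the token.
  refresh(): Promise<T> {
    if (this.inFlight !== null) return this.inFlight
    const pending = this.doRefresh()
    const clear = () => {
      this.inFlight = null
    }
    // Clear the slot once settled so a later expiry triggers a new refresh.
    pending.then(clear, clear)
    this.inFlight = pending
    return pending
  }
}
```

Sharing the promise matters because refresh tokens rotate: two parallel refreshes with the same token would make one of them fail as reused.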
1160
|
+
|
|
1161
|
+
Streaming rules:
|
|
1162
|
+
|
|
1163
|
+
- `response.output_text.delta` emits `text_delta`
|
|
1164
|
+
- `response.function_call_arguments.delta` is accumulated as raw argument chunks
|
|
1165
|
+
- function-call JSON is parsed only after the matching completed item is available
|
|
1166
|
+
- `response.completed` emits the final `ChatResponse`
|
|
1167
|
+
- `response.failed` and `response.incomplete` become normalized provider errors
|
|
1168
|
+
- `AbortSignal` is passed through `fetch()` and SSE consumption
|
|
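The accumulation rules can be sketched as a reducer over stream events. The event shapes below are hypothetical simplifications (field names such as `itemId` are assumptions), and `response.output_item.done` stands in for "the matching completed item".

```ts
// Hypothetical, simplified event shapes.
export type CodexStreamEvent =
  | { type: "response.output_text.delta"; delta: string }
  | { type: "response.function_call_arguments.delta"; itemId: string; delta: string }
  | { type: "response.output_item.done"; itemId: string; name: string }

export interface StreamState {
  text: string
  rawArguments: Map<string, string>
  toolCalls: Array<{ name: string; input: unknown }>
}

export function newStreamState(): StreamState {
  return { text: "", rawArguments: new Map<string, string>(), toolCalls: [] }
}

export function applyStreamEvent(state: StreamState, event: CodexStreamEvent): void {
  switch (event.type) {
    case "response.output_text.delta":
      state.text += event.delta
      break
    case "response.function_call_arguments.delta":
      // Keep raw chunks; a partial chunk is usually not valid JSON on its own.
      state.rawArguments.set(event.itemId, (state.rawArguments.get(event.itemId) ?? "") + event.delta)
      break
    case "response.output_item.done":
      // Parse only now that the whole argument string has arrived.
      state.toolCalls.push({
        name: event.name,
        input: JSON.parse(state.rawArguments.get(event.itemId) ?? "{}"),
      })
      break
  }
}
```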
1169
|
+
|
|
1170
|
+
### Claude Code External Runtime
|
|
1171
|
+
|
|
1172
|
+
`claudecode` is selectable like a provider, but it resolves to an
|
|
1173
|
+
`ExternalAgentRuntime` instead of the native `Provider.chat()` path. Claude Code
|
|
1174
|
+
owns its own agent loop, built-in tool execution, context, and session ids, so
|
|
1175
|
+
demian dispatches it through `ExternalAgentSessionRunner`.
|
|
1176
|
+
|
|
1177
|
+
Runtime boundary:
|
|
1178
|
+
|
|
1179
|
+
- `anthropic` remains the API-key Claude provider and uses Anthropic API
|
|
1180
|
+
billing.
|
|
1181
|
+
- `claudecode` uses local Claude Code login or Agent SDK plan auth. demian
|
|
1182
|
+
removes `ANTHROPIC_API_KEY` and `ANTHROPIC_AUTH_TOKEN` from the Claude Code
|
|
1183
|
+
child env by default and keeps `useBareMode: false` so subscription/Agent SDK
|
|
1184
|
+
auth is not silently replaced by API-key billing.
|
|
1185
|
+
- Claude Code built-in tools execute inside Claude Code. demian keeps the
|
|
1186
|
+
user-facing approval boundary through SDK `canUseTool`, but demian
|
|
1187
|
+
`PreToolUse`/`PostToolUse` hooks apply only to demian-native tools.
|
|
1188
|
+
- Runtime events for Claude Code tools carry `executor: "claudecode"` and
|
|
1189
|
+
`qualifiedName: "claudecode.<tool>"`, and the UI surfaces a CC badge.
|
|
1190
|
+
- demian custom tools are not automatically exposed to Claude Code before the
|
|
1191
|
+
later tool-bridge phase.
|
|
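The environment sanitization in the second bullet can be sketched as below. `claudeCodeChildEnv` is a hypothetical helper, and tying the behavior to a `sanitizeApiKeyEnv` flag mirrors the config key of that name as an assumption.

```ts
// The two variables named in the runtime boundary above.
const STRIPPED_ENV_KEYS = ["ANTHROPIC_API_KEY", "ANTHROPIC_AUTH_TOKEN"]

// Build the child environment without mutating the parent's.
export function claudeCodeChildEnv(
  parentEnv: Record<string, string | undefined>,
  sanitizeApiKeyEnv = true,
): Record<string, string | undefined> {
  const childEnv = { ...parentEnv }
  if (sanitizeApiKeyEnv) {
    // Dropping these keeps subscription auth from being replaced by API-key billing.
    for (const key of STRIPPED_ENV_KEYS) delete childEnv[key]
  }
  return childEnv
}
```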
1192
|
+
|
|
1193
|
+
Implementation split:
|
|
1194
|
+
|
|
1195
|
+
```text
|
|
1196
|
+
src/execution.ts
|
|
1197
|
+
dispatches native providers to SessionRunner and claudecode to ExternalAgentSessionRunner
|
|
1198
|
+
|
|
1199
|
+
src/external-runtime/claudecode-sdk.ts
|
|
1200
|
+
Claude Agent SDK adapter, SDK option mapping, auth preflight, usage ledger, budget guard
|
|
1201
|
+
|
|
1202
|
+
src/external-runtime/claudecode-cli.ts
|
|
1203
|
+
explicit CLI fallback using spawn args/stdin, stream-json parsing, capability-gated argv
|
|
1204
|
+
|
|
1205
|
+
src/external-runtime/claudecode-permissions.ts
|
|
1206
|
+
SDK canUseTool bridge into demian PermissionEngine and Asker
|
|
1207
|
+
|
|
1208
|
+
src/external-runtime/session-map.ts
|
|
1209
|
+
Claude Code session keys, instruction hash, permission policy hash, split reasons
|
|
1210
|
+
|
|
1211
|
+
src/external-runtime/session-lock.ts
|
|
1212
|
+
advisory resume lock with stale timeout and PID liveness checks
|
|
1213
|
+
|
|
1214
|
+
src/external-runtime/usage-ledger.ts
|
|
1215
|
+
demian-attributable process/daily/monthly cost ledger
|
|
1216
|
+
```
|
|
1217
|
+
|
|
1218
|
+
Default config:
|
|
1219
|
+
|
|
1220
|
+
```json
|
|
1221
|
+
{
|
|
1222
|
+
"claudecode": {
|
|
1223
|
+
"type": "claudecode",
|
|
1224
|
+
"runtime": "agent-sdk",
|
|
1225
|
+
"model": "sonnet",
|
|
1226
|
+
"cliPath": "~/.local/bin/claude",
|
|
1227
|
+
"cwdMode": "session",
|
|
1228
|
+
"permissionMode": "default",
|
|
1229
|
+
"defaultDecision": "by-category",
|
|
1230
|
+
"historyPolicy": "passthrough-resume",
|
|
1231
|
+
"onInvalidResume": "fresh",
|
|
1232
|
+
"attachmentFallback": "block",
|
|
1233
|
+
"allowSubagents": false,
|
|
1234
|
+
"sanitizeApiKeyEnv": true,
|
|
1235
|
+
"authPreflight": true,
|
|
1236
|
+
"useBareMode": false,
|
|
1237
|
+
"usageLedgerScope": "process",
|
|
1238
|
+
"sessionLock": true,
|
|
1239
|
+
"abortPolicy": "record-only"
|
|
1240
|
+
}
|
|
1241
|
+
}
|
|
1242
|
+
```
|
|
1243
|
+
|
|
1244
|
+
Session and history rules:
|
|
1245
|
+
|
|
1246
|
+
- demian stores returned Claude Code session ids under a key made from root
|
|
1247
|
+
session, agent, provider profile, cwd, model, instruction hash, and Claude
|
|
1248
|
+
Code-relevant permission policy hash.
|
|
1249
|
+
- Matching turns resume with the stored Claude Code session id.
|
|
1250
|
+
- Provider, cwd, model, instruction, agent, profile, or relevant permission
|
|
1251
|
+
changes start a fresh Claude Code session and emit `session.context.split`.
|
|
1252
|
+
- Invalid resume errors follow `onInvalidResume`: `fresh`, `auto-recover`, or
|
|
1253
|
+
interactive `prompt` with a 30-second timeout.
|
|
1254
|
+
- `historyPolicy: "stateless"` suppresses resume and starts a fresh Claude Code
|
|
1255
|
+
session for every turn.
|
|
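Resume-versus-split can be sketched as a field-by-field comparison of the session key. The interface and `splitReasons` are hypothetical; the seven fields come from the first rule above.

```ts
// Hypothetical key shape built from the fields listed above.
export interface ClaudeSessionKey {
  rootSessionId: string
  agent: string
  providerProfile: string
  cwd: string
  model: string
  instructionHash: string
  permissionPolicyHash: string
}

// Returns the fields that differ; an empty list means the stored
// Claude Code session id can be resumed.
export function splitReasons(
  previous: ClaudeSessionKey,
  next: ClaudeSessionKey,
): Array<keyof ClaudeSessionKey> {
  const fields = Object.keys(next) as Array<keyof ClaudeSessionKey>
  return fields.filter((field) => previous[field] !== next[field])
}
```

A non-empty result corresponds to starting a fresh session and emitting `session.context.split` with those reasons.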
1256
|
+
|
|
1257
|
+
Permission and tool rules:
|
|
1258
|
+
|
|
1259
|
+
- Permission lookup is executor-aware: `claudecode.Tool`, `*.Tool`,
|
|
1260
|
+
`claudecode.*`, `*`, then `defaultDecision`.
|
|
1261
|
+
- `defaultDecision: "by-category"` allows read-only `Read`/`Glob`/`Grep` and
|
|
1262
|
+
asks for mutating or unknown tools.
|
|
1263
|
+
- Prefix-less rules remain backward compatible for demian-native tools. Use
|
|
1264
|
+
`demian doctor policies --upgrade-namespaces` to migrate policy files to
|
|
1265
|
+
explicit `demian.<tool>` names.
|
|
1266
|
+
- `claudecode-plan` is advisory. It displays Claude Code's plan as the answer
|
|
1267
|
+
and exposes an opt-in use-plan action that prefixes the next user turn with
|
|
1268
|
+
the plan text instead of executing automatically.
|
|
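The lookup order and the `by-category` default can be sketched as follows. Storing rules as a flat pattern-to-decision map is an assumption made for brevity; `decideTool` is a hypothetical name.

```ts
export type Decision = "allow" | "ask" | "deny"

// Read-only tools named in the by-category rule above.
const READ_ONLY_TOOLS = new Set(["Read", "Glob", "Grep"])

export function decideTool(
  rules: Record<string, Decision>,
  executor: string,
  tool: string,
  defaultDecision: Decision | "by-category",
): Decision {
  // Most specific pattern first, following the lookup order above.
  const candidates = [`${executor}.${tool}`, `*.${tool}`, `${executor}.*`, "*"]
  for (const pattern of candidates) {
    const decision = rules[pattern]
    if (decision !== undefined) return decision
  }
  if (defaultDecision !== "by-category") return defaultDecision
  // by-category: allow known read-only tools, ask for mutating or unknown ones.
  return READ_ONLY_TOOLS.has(tool) ? "allow" : "ask"
}
```

Note that `*.Bash` is consulted before `claudecode.*`, so a tool-specific rule beats an executor-wide one.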
1269
|
+
|
|
1270
|
+
### Legacy ClaudeCodeProvider
|
|
1271
|
+
|
|
1272
|
+
`ClaudeCodeProvider` is the old direct `/v1/messages` implementation. This
|
|
1273
|
+
path is unsupported because it reconstructs Claude Code OAuth behavior outside
|
|
1274
|
+
the public Claude Code Agent SDK/CLI surface. It is retained only behind
|
|
1275
|
+
`type: "claudecode-api-legacy"` for staged migration, and startup is blocked
|
|
1276
|
+
unless `DEMIAN_ENABLE_UNSUPPORTED_CLAUDECODE_API=1` is set.
|
|
1277
|
+
|
|
1278
|
+
Legacy runtime boundary:
|
|
1279
|
+
|
|
1280
|
+
- demian owns hooks, permissions, transcripts, sandboxing, cancellation, tool execution, and delegation
|
|
1281
|
+
- Claude Code is used only as the model provider
|
|
1282
|
+
- demian does not spawn `claude` in this legacy path
|
|
1283
|
+
- Anthropic API-key fallback is explicit because it uses normal Anthropic API billing, not Claude Code subscription auth
|
|
1284
|
+
|
|
1285
|
+
Implementation split:
|
|
1286
|
+
|
|
1287
|
+
```text
|
|
1288
|
+
src/providers/claudecode.ts
|
|
1289
|
+
provider orchestration, Messages request mapping, OAuth/API-key header selection, 401 retry
|
|
1290
|
+
|
|
1291
|
+
src/providers/claudecode-auth.ts
|
|
1292
|
+
Claude Code credential discovery, keyring/file/env loading, OAuth refresh, API-key fallback guard
|
|
1293
|
+
|
|
1294
|
+
src/providers/claudecode-stream.ts
|
|
1295
|
+
Anthropic Messages SSE parsing and ChatStreamEvent accumulation
|
|
1296
|
+
```
|
|
1297
|
+
|
|
1298
|
+
Legacy config shape:
|
|
1299
|
+
|
|
1300
|
+
```json
|
|
1301
|
+
{
|
|
1302
|
+
"claudecode": {
|
|
1303
|
+
"type": "claudecode-api-legacy",
|
|
1304
|
+
"model": "claude-sonnet-4-6",
|
|
1305
|
+
"maxTokens": 8192,
|
|
1306
|
+
"authStore": "auto",
|
|
1307
|
+
"allowApiKeyFallback": false,
|
|
1308
|
+
"allowEnvOAuthToken": true,
|
|
1309
|
+
"refresh": {
|
|
1310
|
+
"proactiveRefreshMinutes": 30,
|
|
1311
|
+
"cache": "claude-store"
|
|
1312
|
+
}
|
|
1313
|
+
}
|
|
1314
|
+
}
|
|
1315
|
+
```
|
|
1316
|
+
|
|
1317
|
+
Auth rules:
|
|
1318
|
+
|
|
1319
|
+
- `CLAUDE_CODE_OAUTH_TOKEN` is accepted first when `allowEnvOAuthToken` is true. This is useful for explicit token setup, but it cannot be refreshed after a 401 because it has no local refresh token.
|
|
1320
|
+
- `authStore: "auto"` tries the Claude Code keyring location first, then `${CLAUDE_CONFIG_DIR:-~/.claude}/.credentials.json`.
|
|
1321
|
+
- File auth is fully readable and refreshable; refreshed tokens are written atomically with `0600` permissions.
|
|
1322
|
+
- Native keyring access is adapter-backed. The built-in macOS adapter reads `service = "Claude Code-credentials"` and `account = <OS username>`.
|
|
1323
|
+
- Keyring writes require a safe adapter implementation; if refreshed keyring tokens cannot be written, the provider asks the user to refresh through Claude Code login.
|
|
1324
|
+
- OAuth credentials must contain `claudeAiOauth.accessToken`; when scopes are present, they must include `user:inference`.
|
|
1325
|
+
- Refresh uses Claude Code's OAuth client id `9d1c250a-e61b-44d9-88ed-5944d1962f5e` and defaults to `https://platform.claude.com/v1/oauth/token`.
|
|
1326
|
+
- `refresh.cache: "demian-cache"` is reserved but not implemented; the current provider supports only `refresh.cache: "claude-store"`.
|
|
1327
|
+
- API-key fallback is disabled by default because `ANTHROPIC_API_KEY` does not use Claude Code subscription auth.
|
|
1328
|
+
|
|
1329
|
+
Claude Code OAuth request headers:
|
|
1330
|
+
|
|
1331
|
+
```text
|
|
1332
|
+
Authorization: Bearer <access_token>
|
|
1333
|
+
Accept: application/json
|
|
1334
|
+
Content-Type: application/json
|
|
1335
|
+
anthropic-version: 2023-06-01
|
|
1336
|
+
anthropic-beta: claude-code-20250219,oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05
|
|
1337
|
+
User-Agent: claude-cli/2.1.133 (external, claude-vscode)
|
|
1338
|
+
x-app: cli
|
|
1339
|
+
X-Claude-Code-Session-Id: <uuid>
|
|
1340
|
+
```
|
|
1341
|
+
|
|
1342
|
+
API-key fallback request headers:
|
|
1343
|
+
|
|
1344
|
+
```text
|
|
1345
|
+
x-api-key: <api_key>
|
|
1346
|
+
Accept: application/json
|
|
1347
|
+
Content-Type: application/json
|
|
1348
|
+
anthropic-version: 2023-06-01
|
|
1349
|
+
User-Agent: claude-cli/2.1.133 (external, claude-vscode)
|
|
1350
|
+
x-app: cli
|
|
1351
|
+
X-Claude-Code-Session-Id: <uuid>
|
|
1352
|
+
```
|
|
1353
|
+
|
|
1354
|
+
The default Claude Code user-agent and beta headers are aligned with the locally inspected Claude Code VS Code extension `anthropic.claude-code` version `2.1.133`. That extension launches Claude Code with `CLAUDE_CODE_ENTRYPOINT=claude-vscode`, which yields API user-agent `claude-cli/2.1.133 (external, claude-vscode)`. Its first-party OAuth Messages requests include the Claude Code, OAuth, interleaved-thinking, context-management, and prompt-caching-scope beta headers.
|
|
1355
|
+
|
|
1356
|
+
Messages payload rules:
|
|
1357
|
+
|
|
1358
|
+
- input is converted with the Anthropic adapter mapping: top-level `system`, `messages`, `tool_use`, `tool_result`, image blocks, and normalized stop reasons
|
|
1359
|
+
- tools become Anthropic tool schemas
|
|
1360
|
+
- `max_tokens` defaults to `8192` unless `maxTokens` is configured
|
|
1361
|
+
- `stream` is set per request
|
|
1362
|
+
- OAuth requests prepend the Claude Code system prefix: `You are Claude Code, Anthropic's official CLI for Claude.`
|
|
1363
|
+
- API-key fallback requests do not inject the Claude Code system prefix or OAuth beta header
|
|
1364
|
+
|
|
1365
|
+
Refresh and error rules:
|
|
1366
|
+
|
|
1367
|
+
- refresh proactively near access-token expiry and reactively once after `/messages` 401 when a refresh token exists
|
|
1368
|
+
- share one refresh promise per `ClaudeCodeAuthStore`
|
|
1369
|
+
- reuse `ClaudeCodeAuthStore` instances across normal provider resolutions for the same Claude config directory, auth store, and refresh settings
|
|
1370
|
+
- classify refresh failures as expired, invalidated, reused, invalid grant, or other
|
|
1371
|
+
- initial streaming 429/5xx responses are retried through the provider-independent retry path before SSE consumption starts
|
|
1372
|
+
- 429 responses are surfaced with Claude Code-specific guidance, request id when present, and `Retry-After` when available
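
The "one refresh promise per `ClaudeCodeAuthStore`" rule can be sketched as a single-flight wrapper. This is a minimal illustration, not the store's actual implementation: the `Tokens` shape, the `SingleFlightRefresher` class, `shouldRefreshProactively`, and the 60 second skew are all assumptions made for the example.

```typescript
// Hypothetical token shape; the real ClaudeCodeAuthStore fields are not shown here.
interface Tokens {
  accessToken: string
  expiresAtMs: number
}

// Concurrent callers share one in-flight refresh instead of each hitting the token endpoint.
class SingleFlightRefresher {
  private inFlight: Promise<Tokens> | undefined = undefined
  private readonly doRefresh: () => Promise<Tokens>

  constructor(doRefresh: () => Promise<Tokens>) {
    this.doRefresh = doRefresh
  }

  refresh(): Promise<Tokens> {
    if (this.inFlight) return this.inFlight
    const pending = this.doRefresh().then(
      (tokens) => {
        this.inFlight = undefined // allow a later refresh after success
        return tokens
      },
      (error) => {
        this.inFlight = undefined // allow a retry after failure
        throw error
      },
    )
    this.inFlight = pending
    return pending
  }
}

// Proactive check: refresh when the access token expires within `skewMs` (assumed 60s).
function shouldRefreshProactively(tokens: Tokens, nowMs: number, skewMs = 60000): boolean {
  return tokens.expiresAtMs - nowMs <= skewMs
}
```

Sharing the promise means a burst of parallel requests that all see a 401 triggers one token exchange, which matters when refresh tokens are single-use and a second exchange would be classified as "reused".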
|
|
1373
|
+
|
|
1374
|
+
Streaming rules:
|
|
1375
|
+
|
|
1376
|
+
- `content_block_delta` text emits `text_delta`
|
|
1377
|
+
- tool input JSON deltas are accumulated as raw argument chunks
|
|
1378
|
+
- `message_delta` accumulates usage and stop metadata
|
|
1379
|
+
- `message_stop` emits the final normalized `ChatResponse`
|
|
1380
|
+
- Anthropic stream `error` events become normalized provider errors, with `rate_limit_error` mapped to HTTP 429
|
|
1381
|
+
- `AbortSignal` is passed through `fetch()` and SSE consumption
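
The streaming rules above can be sketched as a small reducer over already-parsed SSE events. The event shapes follow the Anthropic Messages streaming format but are trimmed to the fields used here; `ToolCall`, `ChatResponse`, and `consumeStream` are hypothetical names for this sketch, and error events and abort handling are left out.

```typescript
// Simplified Anthropic stream events; only the fields this sketch reads are modeled.
type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: { type: "text" } | { type: "tool_use"; id: string; name: string } }
  | { type: "content_block_delta"; index: number; delta: { type: "text_delta"; text: string } | { type: "input_json_delta"; partial_json: string } }
  | { type: "message_delta"; delta: { stop_reason: string | null }; usage?: { output_tokens: number } }
  | { type: "message_stop" }

// Hypothetical normalized shapes; the real ChatResponse type is not shown in this section.
interface ToolCall { id: string; name: string; rawArguments: string }
interface ChatResponse { text: string; toolCalls: ToolCall[]; stopReason: string | null; outputTokens: number }

function consumeStream(events: StreamEvent[], onTextDelta: (text: string) => void): ChatResponse | undefined {
  let text = ""
  let stopReason: string | null = null
  let outputTokens = 0
  const toolCalls = new Map<number, ToolCall>()
  for (const event of events) {
    switch (event.type) {
      case "content_block_start":
        if (event.content_block.type === "tool_use") {
          toolCalls.set(event.index, { id: event.content_block.id, name: event.content_block.name, rawArguments: "" })
        }
        break
      case "content_block_delta":
        if (event.delta.type === "text_delta") {
          text += event.delta.text
          onTextDelta(event.delta.text) // surfaces as text_delta to the caller
        } else {
          // Tool input arrives as JSON fragments; keep them raw until the block is complete.
          const call = toolCalls.get(event.index)
          if (call) call.rawArguments += event.delta.partial_json
        }
        break
      case "message_delta":
        stopReason = event.delta.stop_reason
        if (event.usage) outputTokens = event.usage.output_tokens
        break
      case "message_stop":
        return { text, toolCalls: Array.from(toolCalls.values()), stopReason, outputTokens }
    }
  }
  return undefined // stream ended without message_stop
}
```

Keeping tool arguments as raw chunks avoids parsing partial JSON; the caller can parse `rawArguments` once after `message_stop`.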
|
|
1382
|
+
|
|
1383
|
+
### Agent Provider Resolution
|
|
1384
|
+
|
|
1385
|
+
In multi-agent mode, an invocation resolves provider identity from profile references only:
|
|
1386
|
+
|
|
1387
|
+
```text
|
|
1388
|
+
if agent.provider.profile exists:
|
|
1389
|
+
use config.providers[profile]
|
|
1390
|
+
else if agent.provider.inheritRoot !== false:
|
|
1391
|
+
use root provider profile
|
|
1392
|
+
else:
|
|
1393
|
+
use config.defaultProvider
|
|
1394
|
+
```
|
|
1395
|
+
|
|
1396
|
+
Rules:
|
|
1397
|
+
|
|
1398
|
+
- profile key must exist in `config.providers`
|
|
1399
|
+
- agent files cannot define provider endpoint, credential, auth header, quirks, or model override
|
|
1400
|
+
- provider/model is fixed at invocation start
|
|
1401
|
+
- root CLI/TUI `--provider` and `--model` override the main invocation only
|
|
1402
|
+
- sub agent model specialization is represented as another configured provider profile
|
|
1403
|
+
- resolved profile and model are recorded in events and transcript
|
|
1404
|
+
|
|
1405
|
+
This is a deliberate trust boundary. It gives agents model specialization without letting an agent definition route data to a new endpoint or introduce a secret dependency.
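
A minimal sketch of the resolution order and the "profile key must exist" rule, assuming simplified shapes. `ProviderProfile`, `AgentProviderRef`, `ResolutionConfig`, and `resolveAgentProviderKey` are illustrative names, not demian's actual types.

```typescript
// Illustrative shapes; the real config types carry more fields.
interface ProviderProfile { type: string; model: string }
interface AgentProviderRef { profile?: string; inheritRoot?: boolean }
interface ResolutionConfig {
  defaultProvider: string
  providers: Record<string, ProviderProfile>
}

// Returns the profile key an invocation should use; agents can only reference keys,
// never define endpoints or credentials.
function resolveAgentProviderKey(
  agentProvider: AgentProviderRef | undefined,
  rootProfileKey: string,
  config: ResolutionConfig,
): string {
  let key: string
  if (agentProvider && agentProvider.profile) {
    key = agentProvider.profile
  } else if (!agentProvider || agentProvider.inheritRoot !== false) {
    key = rootProfileKey
  } else {
    key = config.defaultProvider
  }
  if (!Object.prototype.hasOwnProperty.call(config.providers, key)) {
    throw new Error(`Unknown provider profile: ${key}`)
  }
  return key
}
```

Because the function only ever returns a key that already exists in `config.providers`, an agent definition cannot introduce a new endpoint or secret; it can only choose among profiles the user configured.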
|
|
1406
|
+
|
|
1407
|
+
### Gemini
|
|
1408
|
+
|
|
1409
|
+
Gemini is first-class but not a separate adapter.
|
|
1410
|
+
|
|
1411
|
+
```json
|
|
1412
|
+
{
|
|
1413
|
+
"gemini": {
|
|
1414
|
+
"type": "openai-compatible",
|
|
1415
|
+
"model": "gemini-model-name",
|
|
1416
|
+
"baseURL": "https://generativelanguage.googleapis.com/v1beta/openai/",
|
|
1417
|
+
"apiKeyEnv": "GOOGLE_API_KEY"
|
|
1418
|
+
}
|
|
1419
|
+
}
|
|
1420
|
+
```
|
|
1421
|
+
|
|
1422
|
+
Decision:
|
|
1423
|
+
|
|
1424
|
+
- Use Google's OpenAI-compatible endpoint.
|
|
1425
|
+
- Do not add `@google/generative-ai`.
|
|
1426
|
+
- Do not add `GeminiProvider` in v0.
|
|
1427
|
+
- Do not auto-fallback on provider safety blocks.
|
|
1428
|
+
|
|
1429
|
+
Gemini quirks:
|
|
1430
|
+
|
|
1431
|
+
| Quirk | Policy |
|
|
1432
|
+
|-------|--------|
|
|
1433
|
+
| Some parameters may be ignored or unsupported | Omit unsupported parameters through quirks |
|
|
1434
|
+
| Safety filters may block output | Map to `content_filter` when detectable |
|
|
1435
|
+
| Tool calling quality varies by model | Surface clear model/provider errors |
|
|
1436
|
+
| Streaming chunk boundaries can differ | Normalize in provider stream parser |
|
|
1437
|
+
|
|
1438
|
+
Safety block policy:
|
|
1439
|
+
|
|
1440
|
+
```text
|
|
1441
|
+
Gemini content filter / safety block
|
|
1442
|
+
-> stopReason: "content_filter" or "error"
|
|
1443
|
+
-> emit model.content_filter
|
|
1444
|
+
-> explain provider safety filter to user
|
|
1445
|
+
-> do not auto-fallback
|
|
1446
|
+
```
|
|
1447
|
+
|
|
1448
|
+
### AnthropicProvider
|
|
1449
|
+
|
|
1450
|
+
Anthropic is a separate adapter because its native message model differs.
|
|
1451
|
+
|
|
1452
|
+
`AnthropicProvider` should use the official `@anthropic-ai/sdk`, scoped to this adapter only.
|
|
1453
|
+
|
|
1454
|
+
Reasoning:
|
|
1455
|
+
|
|
1456
|
+
- Anthropic is not on the OpenAI-compatible path, so using its SDK does not weaken the single-path strategy for OpenAI-compatible vendors.
|
|
1457
|
+
- Anthropic's native protocol has distinct concepts: top-level `system`, `tool_use`, `tool_result`, content blocks, and vendor-specific streaming events.
|
|
1458
|
+
- The SDK gives stronger type coverage for these native shapes and absorbs provider API changes better than a hand-written `fetch` client.
|
|
1459
|
+
- Keeping SDK usage inside `src/providers/anthropic.ts` prevents dependency concerns from leaking into Session Runner, tools, hooks, permissions, or config.
|
|
1460
|
+
- Implementation should keep SDK construction adapter-local and test-injectable: `AnthropicProvider` accepts a client for fixtures and loads/constructs the SDK client only when the Anthropic provider is resolved or used. This preserves network-free tests and keeps the OpenAI-compatible path independent from Anthropic initialization.
|
|
1461
|
+
|
|
1462
|
+
The adapter converts:
|
|
1463
|
+
|
|
1464
|
+
- OpenAI-shaped `system` messages to Anthropic top-level `system`
|
|
1465
|
+
- assistant `toolCalls` to `tool_use` blocks
|
|
1466
|
+
- `role: "tool"` messages to `tool_result` blocks
|
|
1467
|
+
- OpenAI-style image content to Anthropic image blocks when supported
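
The first three conversions can be sketched as follows. The `ChatMessage` union is a hypothetical stand-in for demian's internal message type, image conversion is omitted, and merging consecutive tool results into one `user` turn is an assumption of this sketch based on how the Anthropic API expects tool results to be returned.

```typescript
// Hypothetical OpenAI-shaped messages; demian's internal types are not shown in this section.
type ChatMessage =
  | { role: "system"; content: string }
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls?: { id: string; name: string; arguments: string }[] }
  | { role: "tool"; toolCallId: string; content: string }

type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string }

interface AnthropicRequest {
  system: string
  messages: { role: "user" | "assistant"; content: AnthropicBlock[] }[]
}

function toAnthropic(messages: ChatMessage[]): AnthropicRequest {
  const system: string[] = []
  const out: AnthropicRequest["messages"] = []
  for (const m of messages) {
    if (m.role === "system") {
      system.push(m.content) // hoisted to the top-level `system` field
    } else if (m.role === "user") {
      out.push({ role: "user", content: [{ type: "text", text: m.content }] })
    } else if (m.role === "assistant") {
      const blocks: AnthropicBlock[] = m.content ? [{ type: "text", text: m.content }] : []
      for (const call of m.toolCalls || []) {
        blocks.push({ type: "tool_use", id: call.id, name: call.name, input: JSON.parse(call.arguments) })
      }
      out.push({ role: "assistant", content: blocks })
    } else {
      // Tool results travel in a user turn; consecutive results share one turn.
      const block: AnthropicBlock = { type: "tool_result", tool_use_id: m.toolCallId, content: m.content }
      const last = out[out.length - 1]
      if (last && last.role === "user" && last.content.every((b) => b.type === "tool_result")) {
        last.content.push(block)
      } else {
        out.push({ role: "user", content: [block] })
      }
    }
  }
  return { system: system.join("\n\n"), messages: out }
}
```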
|
|
1468
|
+
|
|
1469
|
+
Fixture coverage is required before treating Anthropic as production-ready. Tests should cover plain text, tool use, tool result continuation, image content conversion, stop reason mapping, and streaming event normalization.
|
|
1470
|
+
|
|
1471
|
+
### Retry
|
|
1472
|
+
|
|
1473
|
+
Retry is provider-independent and lives in `providers/retry.ts`.
|
|
1474
|
+
|
|
1475
|
+
```text
|
|
1476
|
+
retry: 429, 5xx, ETIMEDOUT, ECONNRESET, ENOTFOUND
|
|
1477
|
+
do not retry: 4xx except 429
|
|
1478
|
+
backoff: 1s, 2s, 4s, 8s, 16s + jitter
|
|
1479
|
+
Retry-After: respected, capped at 30s
|
|
1480
|
+
AbortSignal: respected immediately
|
|
1481
|
+
```
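
The retry policy above can be sketched as two pure functions. `FailureInfo`, `isRetryable`, and `retryDelayMs` are illustrative names rather than the contents of `providers/retry.ts`, and the jitter range (up to 250 ms) is an assumption because the document does not specify it. Abort handling is not shown; a real wait would also need to end as soon as the `AbortSignal` fires.

```typescript
// Illustrative failure description; a real implementation would derive this from the error.
interface FailureInfo {
  status?: number
  code?: string
  retryAfterSeconds?: number
}

const RETRYABLE_CODES = ["ETIMEDOUT", "ECONNRESET", "ENOTFOUND"]
const BASE_DELAYS_MS = [1000, 2000, 4000, 8000, 16000]
const RETRY_AFTER_CAP_MS = 30000

function isRetryable(failure: FailureInfo): boolean {
  if (failure.status !== undefined) {
    // 429 and 5xx retry; every other 4xx does not.
    return failure.status === 429 || (failure.status >= 500 && failure.status <= 599)
  }
  return failure.code !== undefined && RETRYABLE_CODES.indexOf(failure.code) !== -1
}

// Wait before retry number `attempt` (0-based), or undefined when no retry should happen.
function retryDelayMs(
  failure: FailureInfo,
  attempt: number,
  random: () => number = Math.random,
): number | undefined {
  if (!isRetryable(failure) || attempt < 0 || attempt >= BASE_DELAYS_MS.length) return undefined
  if (failure.retryAfterSeconds !== undefined) {
    return Math.min(failure.retryAfterSeconds * 1000, RETRY_AFTER_CAP_MS)
  }
  const base = BASE_DELAYS_MS[attempt] as number
  return base + Math.floor(random() * 250) // jitter size is an assumption
}
```

Passing `random` as a parameter keeps the function deterministic in network-free tests.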
|
|
1482
|
+
|
|
1483
|
+
---
|
|
1484
|
+
|
|
1485
|
+
## 9. Tool Catalog
|
|
1486
|
+
|
|
1487
|
+
| Tool | Input | Output/limit | `build` | `plan` |
|
|
1488
|
+
|------|-------|--------------|---------|--------|
|
|
1489
|
+
| `read_file` | `path`, `offset?`, `limit?` | 1MB / 2000 lines | allow | allow |
|
|
1490
|
+
| `grep` | `pattern`, `path?`, `glob?` | 200 matches | allow | allow |
|
|
1491
|
+
| `glob` | `pattern`, `path?` | 1000 paths | allow | allow |
|
|
1492
|
+
| `write_file` | `path`, `content` | summary | ask | deny |
|
|
1493
|
+
| `edit_file` | `path`, `oldString`, `newString`, `replaceAll?` | summary + diff | ask | deny |
|
|
1494
|
+
| `bash` | `command`, `workdir?`, `timeoutMs?` | 32KB output | ask | deny |
|
|
1495
|
+
| `delegate_agent` | `agent`, `task`, context fields | compact child result + transcript refs | multi only | deny |
|
|
1496
|
+
|
|
1497
|
+
Output cap:
|
|
1498
|
+
|
|
1499
|
+
```text
|
|
1500
|
+
if output.size <= 32KB:
|
|
1501
|
+
return output
|
|
1502
|
+
else:
|
|
1503
|
+
write full output to .demian/tmp/output-<callId>.txt
|
|
1504
|
+
return head 16KB + marker + tail 16KB
|
|
1505
|
+
```
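
A minimal sketch of the cap, under two stated simplifications: it counts string length (UTF-16 code units) rather than bytes, and it takes a hypothetical `writeFull` callback instead of touching the filesystem. The marker text is also invented for the example.

```typescript
const CAP = 32 * 1024
const HALF = 16 * 1024

interface CappedOutput {
  content: string
  truncated: boolean
  spillPath?: string
}

// `writeFull` is a hypothetical callback that persists the complete output.
function capOutput(
  output: string,
  callId: string,
  writeFull: (path: string, data: string) => void,
): CappedOutput {
  if (output.length <= CAP) return { content: output, truncated: false }
  const spillPath = `.demian/tmp/output-${callId}.txt`
  writeFull(spillPath, output)
  const omitted = output.length - 2 * HALF
  const marker = `\n... [${omitted} characters omitted; full output: ${spillPath}] ...\n`
  const head = output.slice(0, HALF)
  const tail = output.slice(output.length - HALF)
  return { content: head + marker + tail, truncated: true, spillPath }
}
```

Keeping both head and tail preserves the command banner and the final error lines, which are usually the most useful parts of long build or test output.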
|
|
1506
|
+
|
|
1507
|
+
Tool details:
|
|
1508
|
+
|
|
1509
|
+
- `read_file`: rejects cwd escape, binary files, and files over 1MB; returns line-numbered output; supports paging.
|
|
1510
|
+
- `write_file`: rejects cwd escape; creates parent directory; replaces existing file; `.env` edits are blocked by hook.
|
|
1511
|
+
- `edit_file`: exact string replace; rejects no-op edits; errors on 0 matches; requires `replaceAll` for multiple matches; emits diff metadata.
|
|
1512
|
+
- `bash`: rejects workdir outside cwd; default timeout 30s; max timeout 600s; captures stdout/stderr; uses sandbox launch policy.
|
|
1513
|
+
- `grep`: uses `rg` first; falls back to a walker; excludes `.git`, `node_modules`, and build output.
|
|
1514
|
+
- `glob`: searches inside cwd; excludes `.git` and `node_modules`; caps at 1000 paths; prefers recent files first.
|
|
1515
|
+
- `delegate_agent`: appears only in multi-agent mode, validates callable agents through registry and delegation policy, gates invocation permission, runs the child AgentSessionRunner, and returns a compact result.
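
The `edit_file` matching rules are easy to get subtly wrong, so here is a minimal sketch of them as a pure function. `applyEdit` and `EditResult` are illustrative names, path checks and diff metadata are omitted, and rejecting an empty `oldString` is an assumption not stated above.

```typescript
type EditResult =
  | { ok: true; content: string; replacements: number }
  | { ok: false; error: string }

function applyEdit(source: string, oldString: string, newString: string, replaceAll = false): EditResult {
  if (oldString === "") return { ok: false, error: "oldString must not be empty" } // assumption
  if (oldString === newString) return { ok: false, error: "no-op edit" }
  // split/join performs exact, non-regex, non-overlapping replacement.
  const parts = source.split(oldString)
  const count = parts.length - 1
  if (count === 0) return { ok: false, error: "oldString not found" }
  if (count > 1 && !replaceAll) {
    return { ok: false, error: `${count} matches found; set replaceAll to replace all of them` }
  }
  return { ok: true, content: parts.join(newString), replacements: count }
}
```

Requiring `replaceAll` for multiple matches forces the model to either widen `oldString` until it is unique or state explicitly that a bulk change is intended.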
|
|
1516
|
+
|
|
1517
|
+
`delegate_agent` uses one stable provider tool rather than one generated tool per sub agent:
|
|
1518
|
+
|
|
1519
|
+
```ts
|
|
1520
|
+
const delegateAgentTool: Tool = {
|
|
1521
|
+
name: "delegate_agent",
|
|
1522
|
+
description: "Delegate a bounded task to a configured demian agent and return its result.",
|
|
1523
|
+
inputSchema: {
|
|
1524
|
+
type: "object",
|
|
1525
|
+
properties: {
|
|
1526
|
+
agent: { type: "string", enum: ["<callable-agent-name>"] },
|
|
1527
|
+
task: { type: "string" },
|
|
1528
|
+
context: { type: "string" },
|
|
1529
|
+
contextRefs: { type: "array", items: { type: "string" } },
|
|
1530
|
+
relevantFiles: { type: "array", items: { type: "string" } },
|
|
1531
|
+
constraints: { type: "array", items: { type: "string" } },
|
|
1532
|
+
expectedOutput: { type: "string" },
|
|
1533
|
+
maxTurns: { type: "integer", minimum: 1 },
|
|
1534
|
+
returnMode: { type: "string", enum: ["brief", "normal"] }
|
|
1535
|
+
},
|
|
1536
|
+
required: ["agent", "task"]
|
|
1537
|
+
},
|
|
1538
|
+
execute: ...
|
|
1539
|
+
}
|
|
1540
|
+
```
|
|
1541
|
+
|
|
1542
|
+
The alternative was generated virtual tools such as `reviewer(task)` and `builder(task)`. That might give some models more obvious routing hints, but it increases provider payload size and scatters permission targets across many dynamic tool names. The stable tool is chosen for a smaller core, a central permission target, and schema-enum validation of callable agent names. Generated virtual tools can be added later if model routing quality proves weak.
|
|
1543
|
+
|
|
1544
|
+
Dangerous bash patterns blocked by built-in hook include:
|
|
1545
|
+
|
|
1546
|
+
- `rm -rf /`
|
|
1547
|
+
- `rm -rf ~`
|
|
1548
|
+
- `rm -rf $HOME`
|
|
1549
|
+
- fork-bomb patterns
|
|
1550
|
+
- `dd` writes to device paths
|
|
1551
|
+
- destructive `curl` or `wget` pipe patterns
|
|
1552
|
+
|
|
1553
|
+
This list is not a complete security boundary; it is a hardening layer before permission prompts.
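
To make the idea concrete, here is a sketch of pattern-based screening. The regular expressions are illustrative guesses, not the rules shipped in the built-in hook, and "download piped to shell" is one interpretation of the `curl`/`wget` item. As noted above, such patterns are easy to evade and only complement the permission prompt.

```typescript
// Illustrative patterns only; the real block-dangerous-bash rules are not listed in this document.
const DANGEROUS_PATTERNS: { name: string; pattern: RegExp }[] = [
  {
    name: "recursive force delete of / or home",
    pattern: /\brm\s+-(?:[a-z]*r[a-z]*f|[a-z]*f[a-z]*r)[a-z]*\s+(?:\/|~|\$HOME)\/?(?:\s|$)/i,
  },
  { name: "fork bomb", pattern: /:\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:/ },
  { name: "dd write to device", pattern: /\bdd\b[^|;&]*\bof=\/dev\// },
  { name: "download piped to shell", pattern: /\b(?:curl|wget)\b[^|;&]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b/ },
]

// Returns the name of the first matching pattern, or undefined when nothing matches.
function findDangerousPattern(command: string): string | undefined {
  for (const entry of DANGEROUS_PATTERNS) {
    if (entry.pattern.test(command)) return entry.name
  }
  return undefined
}
```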
|
|
1554
|
+
|
|
1555
|
+
---
|
|
1556
|
+
|
|
1557
|
+
## 10. Hooks
|
|
1558
|
+
|
|
1559
|
+
Built-in hooks:
|
|
1560
|
+
|
|
1561
|
+
| Hook | Event | Purpose | Failure |
|
|
1562
|
+
|------|-------|---------|---------|
|
|
1563
|
+
| `block-dangerous-bash` | `PreToolUse(bash)` | Block destructive command patterns | fail-closed |
|
|
1564
|
+
| `protect-env-files` | `PreToolUse(write_file/edit_file)` | Block secret file modification | fail-closed |
|
|
1565
|
+
| `mask-secrets` | `AfterModelResponse` | Mask obvious secret-looking model text | fail-closed |
|
|
1566
|
+
| `inject-env-info` | `SessionStart` | Inject cwd, OS, provider, agent, tools, sandbox context | fail-closed |
|
|
1567
|
+
|
|
1568
|
+
Command hook stdin:
|
|
1569
|
+
|
|
1570
|
+
```json
|
|
1571
|
+
{
|
|
1572
|
+
"event": "PreToolUse",
|
|
1573
|
+
"rootSessionId": "root_123",
|
|
1574
|
+
"sessionId": "ses_123",
|
|
1575
|
+
"invocationId": "inv_456",
|
|
1576
|
+
"agentPath": ["orchestrator", "builder"],
|
|
1577
|
+
"callId": "call_456",
|
|
1578
|
+
"agent": "builder",
|
|
1579
|
+
"cwd": "/repo",
|
|
1580
|
+
"toolName": "bash",
|
|
1581
|
+
"toolInput": {
|
|
1582
|
+
"command": "npm test"
|
|
1583
|
+
}
|
|
1584
|
+
}
|
|
1585
|
+
```
|
|
1586
|
+
|
|
1587
|
+
Command hook stdout:
|
|
1588
|
+
|
|
1589
|
+
```json
|
|
1590
|
+
{ "decision": "warn", "message": "Tests may take a while." }
|
|
1591
|
+
```
|
|
1592
|
+
|
|
1593
|
+
Empty stdout means pass-through. A non-zero exit from a command hook fails open: the tool call proceeds and a warning is emitted.
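
A sketch of how a runner might interpret a command hook's result. Only `"warn"` appears in the example above; the `"pass"` and `"block"` decision names, the `HookOutcome` shape, and treating invalid or unknown output as a warning are assumptions made for illustration.

```typescript
// "warn" appears in the example above; "pass" and "block" are hypothetical names for this sketch.
type HookDecision = "pass" | "warn" | "block"

interface HookOutcome {
  decision: HookDecision
  message?: string
}

function interpretCommandHook(exitCode: number, stdout: string): HookOutcome {
  if (exitCode !== 0) {
    // Command hooks fail open: report the failure but do not stop the tool call.
    return { decision: "warn", message: `command hook exited with code ${exitCode}; continuing` }
  }
  const trimmed = stdout.trim()
  if (trimmed === "") return { decision: "pass" }
  try {
    const parsed = JSON.parse(trimmed) as { decision?: unknown; message?: unknown }
    const message = typeof parsed.message === "string" ? parsed.message : undefined
    if (parsed.decision === "pass" || parsed.decision === "warn" || parsed.decision === "block") {
      return { decision: parsed.decision, message }
    }
    return { decision: "warn", message: "command hook returned an unknown decision; continuing" }
  } catch (_error) {
    return { decision: "warn", message: "command hook returned invalid JSON; continuing" }
  }
}
```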
|
|
1594
|
+
|
|
1595
|
+
---
|
|
1596
|
+
|
|
1597
|
+
## 11. Permissions, Agents, And Delegation
|
|
1598
|
+
|
|
1599
|
+
Ask UI:
|
|
1600
|
+
|
|
1601
|
+
```text
|
|
1602
|
+
Allow bash: npm test?
|
|
1603
|
+
[y]es once / [N]o / [a]lways globally using configured grant scope:
|
|
1604
|
+
```
|
|
1605
|
+
|
|
1606
|
+
Multi-agent prompt examples:
|
|
1607
|
+
|
|
1608
|
+
```text
|
|
1609
|
+
Allow agent reviewer using provider gemini-fast?
|
|
1610
|
+
It will receive: current user request, compact handoff context, context refs, relevant file paths.
|
|
1611
|
+
[y]es once / [N]o / [a]lways globally using configured grant scope:
|
|
1612
|
+
```
|
|
1613
|
+
|
|
1614
|
+
```text
|
|
1615
|
+
Allow builder -> edit_file: src/parser.ts?
|
|
1616
|
+
[y]es once / [N]o / [a]lways globally using configured grant scope:
|
|
1617
|
+
```
|
|
1618
|
+
|
|
1619
|
+
`--yes` auto-allows `ask`, but built-in hard deny, global explicit deny, and hook blocks still apply. `--yes` does not create persistent grants unless the user explicitly chose an `always` path.
|
|
1620
|
+
|
|
1621
|
+
### Visibility vs Authority
|
|
1622
|
+
|
|
1623
|
+
`tools.visible` controls what the provider sees in the tool list. Permission rules and grants control whether a requested tool call can execute.
|
|
1624
|
+
|
|
1625
|
+
```text
|
|
1626
|
+
single-agent:
|
|
1627
|
+
visible tools == authority tools
|
|
1628
|
+
|
|
1629
|
+
multi-agent orchestrator:
|
|
1630
|
+
visible tools: [read_file, grep, glob, delegate_agent]
|
|
1631
|
+
authority tools: [read_file, grep, glob, delegate_agent]
|
|
1632
|
+
|
|
1633
|
+
builder subagent:
|
|
1634
|
+
visible tools: [read_file, write_file, edit_file, bash, grep, glob]
|
|
1635
|
+
effective tools:
|
|
1636
|
+
builder.visible ∩ registered tools ∩ global safety policy
|
|
1637
|
+
```
|
|
1638
|
+
|
|
1639
|
+
The gain is that a main orchestrator can stay narrow while a builder sub agent can still perform implementation work after root-user approval. The cost is conceptual complexity: a grant may exist but still be unusable by an agent that cannot see the tool. The design accepts that complexity because it keeps model capability exposure separate from user authority.
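
The separation can be sketched with two small functions. The names `effectiveTools` and `decideToolCall` are illustrative, global safety policy is reduced to a set of denied tool names, and agent-local rules are left out; the point is only that a grant never helps an agent that cannot see the tool.

```typescript
type Decision = "allow" | "ask" | "deny"

// visible ∩ registered ∩ global safety policy
function effectiveTools(visible: string[], registered: Set<string>, deniedByGlobalPolicy: Set<string>): string[] {
  return visible.filter((tool) => registered.has(tool) && !deniedByGlobalPolicy.has(tool))
}

// A root/global grant only applies when the tool is in the agent's effective tool list.
function decideToolCall(tool: string, effective: string[], globalGrants: Set<string>): Decision {
  if (effective.indexOf(tool) === -1) return "deny"
  if (globalGrants.has(tool)) return "allow"
  return "ask"
}
```

For example, after the user approves `edit_file` with `always`, a builder that lists `edit_file` as visible gets `allow`, while an orchestrator limited to read tools and `delegate_agent` still gets `deny` for the same call.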
|
|
1640
|
+
|
|
1641
|
+
### Effective Authority
|
|
1642
|
+
|
|
1643
|
+
Sub agent effective authority is:
|
|
1644
|
+
|
|
1645
|
+
```text
|
|
1646
|
+
root global safety policy
|
|
1647
|
+
∩ child visible tool list
|
|
1648
|
+
∩ child agent-local policy defaults
|
|
1649
|
+
+ user-approved global grants
|
|
1650
|
+
```
|
|
1651
|
+
|
|
1652
|
+
No child invocation can bypass cwd boundaries, sandbox policy, built-in hard deny, global explicit deny, provider profile validation, or hook blocks. If the child requests a tool that is visible but not yet allowed, the root UI asks the user. If the user chooses `always`, the grant becomes a root/global grant reusable by main and sibling agents when the tool is visible to them.
|
|
1653
|
+
|
|
1654
|
+
This deliberately gives up per-agent grant isolation. The advantage is a simpler user model: "I approved this operation scope for this root conversation/project." The safety invariant remains that hard/global deny cannot be expanded.
|
|
1655
|
+
|
|
1656
|
+
### Agent Invocation Permission
|
|
1657
|
+
|
|
1658
|
+
Starting a child agent is itself a permission target:
|
|
1659
|
+
|
|
1660
|
+
```text
|
|
1661
|
+
targetType: "agent"
|
|
1662
|
+
targetName: "reviewer"
|
|
1663
|
+
operation: "invoke"
|
|
1664
|
+
grant key: "agent:reviewer:gemini-fast"
|
|
1665
|
+
```
|
|
1666
|
+
|
|
1667
|
+
Allowing an agent invocation does not pre-allow child tool calls. It only allows sending the bounded handoff context to the child provider and starting the child model loop.
|
|
1668
|
+
|
|
1669
|
+
Default policy:
|
|
1670
|
+
|
|
1671
|
+
- built-in cheap read-only subagents may default allow
|
|
1672
|
+
- external project subagents default ask
|
|
1673
|
+
- expensive provider profiles default ask unless configured otherwise
|
|
1674
|
+
- hidden or non-callable agents are denied
|
|
1675
|
+
|
|
1676
|
+
### Delegation Validation
|
|
1677
|
+
|
|
1678
|
+
```ts
|
|
1679
|
+
function validateDelegateInvocation(input: {
|
|
1680
|
+
targetAgentName: string
|
|
1681
|
+
callerChain: string[]
|
|
1682
|
+
maxDepth: number
|
|
1683
|
+
registry: AgentRegistry
|
|
1684
|
+
}): void {
|
|
1685
|
+
const target = input.registry.get(input.targetAgentName)
|
|
1686
|
+
if (!isCallable(target)) throw new Error(`Agent ${input.targetAgentName} is not callable`)
|
|
1687
|
+
if (input.callerChain.includes(input.targetAgentName)) {
|
|
1688
|
+
throw new Error(`Cycle detected: ${[...input.callerChain, input.targetAgentName].join(" -> ")}`)
|
|
1689
|
+
}
|
|
1690
|
+
if (input.callerChain.length >= input.maxDepth) {
|
|
1691
|
+
throw new Error(`Max delegation depth (${input.maxDepth}) exceeded`)
|
|
1692
|
+
}
|
|
1693
|
+
}
|
|
1694
|
+
```
|
|
1695
|
+
|
|
1696
|
+
Default max depth is `1`. Child agents default to `delegation.canDelegate: false`. This gives demian specialist help without opening recursive orchestration as the default behavior.
|
|
1697
|
+
|
|
1698
|
+
### Callable Catalog
|
|
1699
|
+
|
|
1700
|
+
In multi-agent mode, the main system prompt receives a compact catalog:
|
|
1701
|
+
|
|
1702
|
+
```text
|
|
1703
|
+
Available sub agents (use delegate_agent to invoke):
|
|
1704
|
+
|
|
1705
|
+
- reviewer (reviewer, cheap): Read-only code reviewer.
|
|
1706
|
+
Use when: find bugs and missing tests after implementation.
|
|
1707
|
+
Avoid when: editing files directly.
|
|
1708
|
+
|
|
1709
|
+
- builder (builder, standard): Implement scoped code changes.
|
|
1710
|
+
Use when: a focused implementation task is ready.
|
|
1711
|
+
```
|
|
1712
|
+
|
|
1713
|
+
Catalog budget defaults to about `2000` tokens. If over budget, demian drops `useWhen` and `avoidWhen` first, keeping agent names and descriptions. `triggers` are registry metadata for future routing and are not included in v1 prompt text.
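
A sketch of the budget fallback. `CatalogEntry`, `renderCatalog`, and the four-characters-per-token estimate are assumptions for illustration; the actual renderer and tokenizer are not shown in this document.

```typescript
interface CatalogEntry {
  name: string
  category: string
  cost: string
  description: string
  useWhen?: string[]
  avoidWhen?: string[]
}

// Rough token estimate (about 4 characters per token); the real tokenizer is an open assumption.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

function renderCatalog(entries: CatalogEntry[], budgetTokens = 2000): string {
  const render = (withHints: boolean): string => {
    const lines = ["Available sub agents (use delegate_agent to invoke):"]
    for (const entry of entries) {
      lines.push("", `- ${entry.name} (${entry.category}, ${entry.cost}): ${entry.description}`)
      if (withHints && entry.useWhen && entry.useWhen.length > 0) {
        lines.push(`  Use when: ${entry.useWhen.join("; ")}`)
      }
      if (withHints && entry.avoidWhen && entry.avoidWhen.length > 0) {
        lines.push(`  Avoid when: ${entry.avoidWhen.join("; ")}`)
      }
    }
    return lines.join("\n")
  }
  const full = render(true)
  // Over budget: drop the hints but keep every agent name and description.
  return estimateTokens(full) <= budgetTokens ? full : render(false)
}
```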
|
|
1714
|
+
|
|
1715
|
+
### `build`
|
|
1716
|
+
|
|
1717
|
+
Normalized v1 shape:
|
|
1718
|
+
|
|
1719
|
+
```ts
|
|
1720
|
+
export const build: AgentDefinition = {
|
|
1721
|
+
name: "build",
|
|
1722
|
+
description: "General coding agent.",
|
|
1723
|
+
mode: "primary",
|
|
1724
|
+
prompt: { system: buildPrompt },
|
|
1725
|
+
tools: { visible: ["read_file", "write_file", "edit_file", "bash", "grep", "glob"] },
|
|
1726
|
+
permissions: [
|
|
1727
|
+
{ tool: "read_file", decision: "allow" },
|
|
1728
|
+
{ tool: "grep", decision: "allow" },
|
|
1729
|
+
{ tool: "glob", decision: "allow" },
|
|
1730
|
+
{ tool: "write_file", decision: "ask" },
|
|
1731
|
+
{ tool: "edit_file", decision: "ask" },
|
|
1732
|
+
{ tool: "bash", decision: "ask" },
|
|
1733
|
+
{ tool: "*", match: { pathGlob: "**/.env" }, decision: "deny", reason: "secrets" },
|
|
1734
|
+
{ tool: "*", match: { pathGlob: "**/.env.*" }, decision: "deny", reason: "secrets" },
|
|
1735
|
+
{ tool: "*", match: { pathGlob: "node_modules/**" }, decision: "deny", reason: "vendored" }
|
|
1736
|
+
]
|
|
1737
|
+
}
|
|
1738
|
+
```
|
|
1739
|
+
|
|
1740
|
+
### `plan`
|
|
1741
|
+
|
|
1742
|
+
```ts
|
|
1743
|
+
export const plan: AgentDefinition = {
|
|
1744
|
+
name: "plan",
|
|
1745
|
+
description: "Read-only planning agent.",
|
|
1746
|
+
mode: "primary",
|
|
1747
|
+
prompt: { system: planPrompt },
|
|
1748
|
+
tools: { visible: ["read_file", "grep", "glob"] },
|
|
1749
|
+
permissions: [
|
|
1750
|
+
{ tool: "read_file", decision: "allow" },
|
|
1751
|
+
{ tool: "grep", decision: "allow" },
|
|
1752
|
+
{ tool: "glob", decision: "allow" },
|
|
1753
|
+
{ tool: "*", decision: "deny", reason: "plan agent is read-only" }
|
|
1754
|
+
]
|
|
1755
|
+
}
|
|
1756
|
+
```
|
|
1757
|
+
|
|
1758
|
+
Existing `build` and `plan` behavior must remain unchanged in single-agent mode. The new shape is a normalization layer, not a behavioral migration.
|
|
1759
|
+
|
|
1760
|
+
---
|
|
1761
|
+
|
|
1762
|
+
## 12. Config
|
|
1763
|
+
|
|
1764
|
+
Config precedence (lowest to highest; later sources override earlier ones):
|
|
1765
|
+
|
|
1766
|
+
```text
|
|
1767
|
+
built-in defaults
|
|
1768
|
+
-> ~/.demian/config.json
|
|
1769
|
+
-> <cwd>/.demian/config.json
|
|
1770
|
+
-> --config <path>
|
|
1771
|
+
-> CLI flags
|
|
1772
|
+
```
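
One way to apply such a chain is to merge the layers in order so that later layers win. The sketch below assumes a recursive merge for nested objects and whole-value replacement for arrays and scalars; the document does not say whether demian merges deeply, so treat `mergeLayers` as an illustration rather than the loader's behavior.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }
type JsonObject = { [key: string]: Json }

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value)
}

// Layers are ordered lowest precedence first; later layers override earlier ones.
function mergeLayers(layers: JsonObject[]): JsonObject {
  const result: JsonObject = {}
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      if (key === "__proto__") continue // avoid prototype pollution from untrusted JSON
      const previous = result[key]
      const next = layer[key] as Json
      result[key] = isPlainObject(previous) && isPlainObject(next) ? mergeLayers([previous, next]) : next
    }
  }
  return result
}
```

With this shape, a project config that sets only one nested `sandbox` field leaves the other built-in `sandbox` defaults in place.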
|
|
1773
|
+
|
|
1774
|
+
Default shape:
|
|
1775
|
+
|
|
1776
|
+
```json
|
|
1777
|
+
{
|
|
1778
|
+
"agentMode": "single-agent",
|
|
1779
|
+
"defaultAgent": "build",
|
|
1780
|
+
"defaultProvider": "openai",
|
|
1781
|
+
"maxTurns": 25,
|
|
1782
|
+
"streaming": {
|
|
1783
|
+
"enabled": true
|
|
1784
|
+
},
|
|
1785
|
+
"sandbox": {
|
|
1786
|
+
"mode": "workspace-write",
|
|
1787
|
+
"network": "inherit"
|
|
1788
|
+
},
|
|
1789
|
+
"persistentGrants": {
|
|
1790
|
+
"enabled": true,
|
|
1791
|
+
"scope": "project",
|
|
1792
|
+
"ttlMs": 604800000
|
|
1793
|
+
},
|
|
1794
|
+
"multimodal": {
|
|
1795
|
+
"maxImageBytes": 8388608,
|
|
1796
|
+
"detail": "auto"
|
|
1797
|
+
},
|
|
1798
|
+
"delegation": {
|
|
1799
|
+
"maxDepth": 1,
|
|
1800
|
+
"defaultInvocationDecision": "ask",
|
|
1801
|
+
"subInvocationMaxTurns": 10
|
|
1802
|
+
},
|
|
1803
|
+
"context": {
|
|
1804
|
+
"main": {
|
|
1805
|
+
"maxContextTokens": 48000,
|
|
1806
|
+
"compactAtRatio": 0.7,
|
|
1807
|
+
"summaryTargetTokens": 3000,
|
|
1808
|
+
"recentTurns": 6,
|
|
1809
|
+
"minRecentTurns": 2,
|
|
1810
|
+
"keepRawDelegateResults": 2,
|
|
1811
|
+
"maxDelegateResultTokens": 1000,
|
|
1812
|
+
"ledgerTargetTokens": 1200,
|
|
1813
|
+
"maxInlineToolResultTokens": 2000,
|
|
1814
|
+
"retrievalMaxTokens": 3000,
|
|
1815
|
+
"reserveForImagesTokens": 4000,
|
|
1816
|
+
"compression": "deterministic"
|
|
1817
|
+
},
|
|
1818
|
+
"subAgent": {
|
|
1819
|
+
"memoryScope": "root-session",
|
|
1820
|
+
"maxContextTokens": 12000,
|
|
1821
|
+
"maxInputTokens": 12000,
|
|
1822
|
+
"summaryTargetTokens": 1200,
|
|
1823
|
+
"recentTurns": 3,
|
|
1824
|
+
"minRecentTurns": 1,
|
|
1825
|
+
"compactAtRatio": 0.7,
|
|
1826
|
+
"maxHandoffTokens": 2000,
|
|
1827
|
+
"maxDelegateContextTokens": 1000,
|
|
1828
|
+
"maxResultTokensToMain": 1000,
|
|
1829
|
+
"compression": "compact-summary-and-recent"
|
|
1830
|
+
}
|
|
1831
|
+
},
|
|
1832
|
+
"trust": {
|
|
1833
|
+
"userTools": false,
|
|
1834
|
+
"project": false
|
|
1835
|
+
},
|
|
1836
|
+
"providers": {
|
|
1837
|
+
"openai": {
|
|
1838
|
+
"type": "openai-compatible",
|
|
1839
|
+
"model": "openai-model-name",
|
|
1840
|
+
"baseURL": "https://api.openai.com/v1",
|
|
1841
|
+
"apiKeyEnv": "OPENAI_API_KEY"
|
|
1842
|
+
},
|
|
1843
|
+
"gemini": {
|
|
1844
|
+
"type": "openai-compatible",
|
|
1845
|
+
"model": "gemini-model-name",
|
|
1846
|
+
"baseURL": "https://generativelanguage.googleapis.com/v1beta/openai/",
|
|
1847
|
+
"apiKeyEnv": "GOOGLE_API_KEY"
|
|
1848
|
+
},
|
|
1849
|
+
"ollama": {
|
|
1850
|
+
"type": "openai-compatible",
|
|
1851
|
+
"model": "local-coder-model",
|
|
1852
|
+
"baseURL": "http://localhost:11434/v1",
|
|
1853
|
+
"apiKey": "ollama",
|
|
1854
|
+
"quirks": {
|
|
1855
|
+
"omitTemperature": true
|
|
1856
|
+
}
|
|
1857
|
+
},
|
|
1858
|
+
"lmstudio": {
|
|
1859
|
+
"type": "openai-compatible",
|
|
1860
|
+
"model": "local-coder-model",
|
|
1861
|
+
"baseURL": "http://localhost:1234/v1",
|
|
1862
|
+
"apiKey": "lm-studio"
|
|
1863
|
+
},
|
|
1864
|
+
"openrouter": {
|
|
1865
|
+
"type": "openai-compatible",
|
|
1866
|
+
"model": "provider/model-name",
|
|
1867
|
+
"baseURL": "https://openrouter.ai/api/v1",
|
|
1868
|
+
"apiKeyEnv": "OPENROUTER_API_KEY"
|
|
1869
|
+
},
|
|
1870
|
+
"azure": {
|
|
1871
|
+
"type": "openai-compatible",
|
|
1872
|
+
"model": "azure-deployment-name",
|
|
1873
|
+
"baseURL": "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1",
|
|
1874
|
+
"apiKeyEnv": "AZURE_OPENAI_API_KEY"
|
|
1875
|
+
},
|
|
1876
|
+
"anthropic": {
|
|
1877
|
+
"type": "anthropic",
|
|
1878
|
+
"model": "anthropic-model-name",
|
|
1879
|
+
"apiKeyEnv": "ANTHROPIC_API_KEY"
|
|
1880
|
+
},
|
|
1881
|
+
"codex": {
|
|
1882
|
+
"type": "codex",
|
|
1883
|
+
"model": "gpt-5.1-codex",
|
|
1884
|
+
"baseURL": "https://chatgpt.com/backend-api/codex",
|
|
1885
|
+
"authStore": "auto",
|
|
1886
|
+
"allowApiKeyFallback": false,
|
|
1887
|
+
"promptCacheKey": "root-session",
|
|
1888
|
+
"responses": {
|
|
1889
|
+
"store": false,
|
|
1890
|
+
"include": ["reasoning.encrypted_content"],
|
|
1891
|
+
"reasoning": {
|
|
1892
|
+
"effort": "medium",
|
|
1893
|
+
"summary": "auto"
|
|
1894
|
+
}
|
|
1895
|
+
}
|
|
1896
|
+
},
|
|
1897
|
+
"claudecode": {
|
|
1898
|
+
"type": "claudecode",
|
|
1899
|
+
"runtime": "agent-sdk",
|
|
1900
|
+
"model": "sonnet",
|
|
1901
|
+
"cliPath": "~/.local/bin/claude",
|
|
1902
|
+
"cwdMode": "session",
|
|
1903
|
+
"permissionMode": "default",
|
|
1904
|
+
"defaultDecision": "by-category",
|
|
1905
|
+
"historyPolicy": "passthrough-resume",
|
|
1906
|
+
"onInvalidResume": "fresh",
|
|
1907
|
+
"attachmentFallback": "block",
|
|
1908
|
+
"allowSubagents": false,
|
|
1909
|
+
"sanitizeApiKeyEnv": true,
|
|
1910
|
+
"authPreflight": true,
|
|
1911
|
+
"useBareMode": false,
|
|
1912
|
+
"usageLedgerScope": "process",
|
|
1913
|
+
"sessionLock": true,
|
|
1914
|
+
"abortPolicy": "record-only"
|
|
1915
|
+
}
|
|
1916
|
+
}
|
|
1917
|
+
}
|
|
1918
|
+
```
|
|
1919
|
+
|
|
1920
|
+
Most model names are placeholders in this architecture. Config owns fast-changing model selection; the Codex default should track a concrete Codex model slug present in local fixtures or provider metadata, and the Claude Code default should track a Claude Code-compatible Agent SDK model alias.
|
|
1921
|
+
|
|
1922
|
+
Mode resolution precedence:
|
|
1923
|
+
|
|
1924
|
+
```text
|
|
1925
|
+
1. CLI flag: --mode / --single-agent / --multi-agent
|
|
1926
|
+
2. config.agentMode
|
|
1927
|
+
3. config.mode, accepted as a compatibility alias
|
|
1928
|
+
4. config.multiAgent.enabled === true -> multi-agent
|
|
1929
|
+
5. built-in default -> single-agent
|
|
1930
|
+
```
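
The same order written as a function, as a minimal sketch. `ModeInputs` and `resolveAgentMode` are illustrative names, and the three CLI flags are assumed to have already been collapsed into a single optional `cliMode` value.

```typescript
type AgentMode = "single-agent" | "multi-agent"

interface ModeInputs {
  cliMode?: AgentMode // from --mode / --single-agent / --multi-agent
  config: {
    agentMode?: AgentMode
    mode?: AgentMode // compatibility alias
    multiAgent?: { enabled?: boolean }
  }
}

function resolveAgentMode(input: ModeInputs): AgentMode {
  if (input.cliMode) return input.cliMode
  if (input.config.agentMode) return input.config.agentMode
  if (input.config.mode) return input.config.mode
  if (input.config.multiAgent && input.config.multiAgent.enabled === true) return "multi-agent"
  return "single-agent"
}
```

Note that `multiAgent.enabled: false` does not force single-agent mode by itself; it simply falls through to the built-in default, which happens to be the same value.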
|
|
1931
|
+
|
|
1932
|
+
Context budget precedence (lowest to highest; later sources override earlier ones):
|
|
1933
|
+
|
|
1934
|
+
```text
|
|
1935
|
+
built-in defaults
|
|
1936
|
+
-> config file
|
|
1937
|
+
-> environment/application variables
|
|
1938
|
+
-> CLI flags
|
|
1939
|
+
-> programmatic embedding options
|
|
1940
|
+
```
|
|
1941
|
+
|
|
1942
|
+
Recommended environment variable names:
|
|
1943
|
+
|
|
1944
|
+
| Variable | Meaning |
|
|
1945
|
+
|----------|---------|
|
|
1946
|
+
| `DEMIAN_MAIN_MAX_CONTEXT_TOKENS` | main agent model-visible context budget |
|
|
1947
|
+
| `DEMIAN_MAIN_COMPACT_AT_RATIO` | main context compaction trigger; default `0.7` |
|
|
1948
|
+
| `DEMIAN_MAIN_SUMMARY_TARGET_TOKENS` | target size for main rolling summary |
|
|
1949
|
+
| `DEMIAN_MAIN_KEEP_RAW_DELEGATE_RESULTS` | number of raw delegate results kept in main history |
|
|
1950
|
+
| `DEMIAN_MAIN_MAX_DELEGATE_RESULT_TOKENS` | max child result tokens returned to main |
|
|
1951
|
+
| `DEMIAN_SUB_MAX_CONTEXT_TOKENS` | sub agent model-visible context budget |
|
|
1952
|
+
| `DEMIAN_SUB_COMPACT_AT_RATIO` | sub agent compaction trigger; default `0.7` |
|
|
1953
|
+
| `DEMIAN_SUB_SUMMARY_TARGET_TOKENS` | target size for sub memory summary |
|
|
1954
|
+
| `DEMIAN_SUB_RECENT_TURNS` | raw recent child turns kept |
|
|
1955
|
+
| `DEMIAN_SUB_MAX_HANDOFF_TOKENS` | max child-visible handoff packet size |
|
|
1956
|
+
| `DEMIAN_SUB_MAX_DELEGATE_CONTEXT_TOKENS` | max main-provided delegate context size |
|
|
1957
|
+
| `DEMIAN_SUB_MAX_RESULT_TOKENS_TO_MAIN` | max compact result returned to main |
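
A sketch of how a few of these variables could be applied on top of config values. `ContextBudget` covers only three of the settings, `applyEnvOverrides` is a hypothetical helper, and silently ignoring invalid or non-positive values is an assumption. The environment is passed in as a plain object so the function stays testable without touching the real process environment.

```typescript
// Subset of the context settings, for illustration.
interface ContextBudget {
  maxContextTokens: number
  compactAtRatio: number
  summaryTargetTokens: number
}

function applyEnvOverrides(
  scope: "MAIN" | "SUB",
  base: ContextBudget,
  env: Record<string, string | undefined>,
): ContextBudget {
  const read = (suffix: string, fallback: number): number => {
    const raw = env[`DEMIAN_${scope}_${suffix}`]
    if (raw === undefined || raw.trim() === "") return fallback
    const value = Number(raw)
    // Invalid or non-positive values fall back to the config value (an assumption).
    return Number.isFinite(value) && value > 0 ? value : fallback
  }
  return {
    maxContextTokens: Math.floor(read("MAX_CONTEXT_TOKENS", base.maxContextTokens)),
    compactAtRatio: read("COMPACT_AT_RATIO", base.compactAtRatio),
    summaryTargetTokens: Math.floor(read("SUMMARY_TARGET_TOKENS", base.summaryTargetTokens)),
  }
}
```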
|
|
1958
|
+
|
|
1959
|
+
### Agent Registration, Discovery, And Trust
|
|
1960
|
+
|
|
1961
|
+
Current implementation supports external agents through `config.agents` and programmatic `AgentRegistry.register()`. This gives users and tests an extension point without loading arbitrary project code.
|
|
1962
|
+
|
|
1963
|
+
Config agent example:
|
|
1964
|
+
|
|
1965
|
+
```json
|
|
1966
|
+
{
|
|
1967
|
+
"agents": {
|
|
1968
|
+
"reviewer": {
|
|
1969
|
+
"name": "reviewer",
|
|
1970
|
+
"description": "Read-only code reviewer.",
|
|
1971
|
+
"mode": "subagent",
|
|
1972
|
+
"provider": {
|
|
1973
|
+
"profile": "gemini-fast"
|
|
1974
|
+
},
|
|
1975
|
+
"prompt": {
|
|
1976
|
+
"system": "You are a read-only reviewer. Report concrete issues with file references."
|
|
1977
|
+
},
|
|
1978
|
+
"tools": {
|
|
1979
|
+
"visible": ["read_file", "grep", "glob"]
|
|
1980
|
+
},
|
|
1981
|
+
"permissions": [
|
|
1982
|
+
{ "tool": "read_file", "decision": "allow" },
|
|
1983
|
+
{ "tool": "grep", "decision": "allow" },
|
|
1984
|
+
{ "tool": "glob", "decision": "allow" },
|
|
1985
|
+
{ "tool": "*", "decision": "deny", "reason": "reviewer is read-only" }
|
|
1986
|
+
],
|
|
1987
|
+
"delegation": {
|
|
1988
|
+
"callable": true,
|
|
1989
|
+
"canDelegate": false
|
|
1990
|
+
},
|
|
1991
|
+
"catalog": {
|
|
1992
|
+
"category": "reviewer",
|
|
1993
|
+
"cost": "cheap",
|
|
1994
|
+
"useWhen": ["Find bugs and risks without editing files."]
|
|
1995
|
+
}
|
|
1996
|
+
}
|
|
1997
|
+
}
|
|
1998
|
+
}
|
|
1999
|
+
```
|
|
2000
|
+
|
|
2001
|
+
Filesystem discovery remains a target design rather than current code.
|
|
2002
|
+
|
|
2003
|
+
Recommended filesystem layout:
|
|
2004
|
+
|
|
2005
|
+
```text
|
|
2006
|
+
~/.demian/
|
|
2007
|
+
agents/<name>.md
|
|
2008
|
+
tools/<name>.mjs
|
|
2009
|
+
|
|
2010
|
+
<cwd>/.demian/
|
|
2011
|
+
agents/<name>.md
|
|
2012
|
+
tools/<name>.mjs
|
|
2013
|
+
```
|
|
2014
|
+
|
|
2015
|
+
Agent markdown is data. Tool modules are code. They therefore use different trust gates:
|
|
2016
|
+
|
|
2017
|
+
| Source | Agent markdown | Tool module |
|
|
2018
|
+
|--------|----------------|-------------|
|
|
2019
|
+
| built-in | trusted | trusted |
|
|
2020
|
+
| user scope | load by default or warn once | prompt or config opt-in |
|
|
2021
|
+
| project scope | load data with warning | require `--trust-project` or prompt |
|
|
2022
|
+
| programmatic | caller responsibility | caller responsibility |
|
|
2023
|
+
|
|
2024
|
+
Trust decisions are separate from permission grants:
|
|
2025
|
+
|
|
2026
|
+
```text
|
|
2027
|
+
~/.demian/trust.json
|
|
2028
|
+
<cwd>/.demian/trust.json
|
|
2029
|
+
```
|
|
2030
|
+
|
|
2031
|
+
Trust records should include absolute path, trusted decision, decision time, and `mtimeMs`. If the file changes, prompt again. mtime validation is chosen for the first filesystem loader because it is simple; hash validation is a possible later hardening step.
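
The mtime check can be sketched as a pure lookup. Apart from `mtimeMs`, the `TrustRecord` field names are hypothetical, and the caller is assumed to supply the file's current modification time (for example from a `stat` call).

```typescript
// Field names other than mtimeMs are assumptions for this sketch.
interface TrustRecord {
  path: string // absolute path
  trusted: boolean
  decidedAt: string // ISO timestamp of the decision
  mtimeMs: number // file mtime when the decision was made
}

type TrustCheck = "trusted" | "denied" | "prompt"

function checkTrust(records: TrustRecord[], absolutePath: string, currentMtimeMs: number): TrustCheck {
  for (const record of records) {
    if (record.path !== absolutePath) continue
    // Any change to the file invalidates the earlier decision, whichever way it went.
    if (record.mtimeMs !== currentMtimeMs) return "prompt"
    return record.trusted ? "trusted" : "denied"
  }
  return "prompt" // no record yet
}
```

Comparing mtimes is cheap but can be fooled by tools that restore timestamps, which is why hash validation is mentioned as a later hardening step.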

Future agent markdown example:

```markdown
---
name: reviewer
description: Read-only code reviewer.
mode: all
provider:
  profile: gemini-fast
tools:
  visible: [read_file, grep, glob]
permissions:
  - { tool: read_file, decision: allow }
  - { tool: grep, decision: allow }
  - { tool: glob, decision: allow }
  - { tool: "*", decision: deny, reason: "reviewer is read-only" }
delegation:
  callable: true
  canDelegate: false
catalog:
  category: reviewer
  cost: cheap
  useWhen:
    - Find bugs and risks without editing files.
---
You are a read-only reviewer. Report concrete issues with file references.
Do not modify files.
```

Future tool module example:

```ts
import type { Tool } from "demian-cli"

const tool: Tool = {
  name: "lsp_references",
  description: "Find references for a symbol through a local LSP bridge.",
  inputSchema: {
    type: "object",
    properties: {
      file: { type: "string" },
      symbol: { type: "string" }
    },
    required: ["file", "symbol"]
  },
  async execute(input, ctx) {
    return { ok: true, content: "..." }
  }
}

export default tool
```

Loader validation:

- names must be stable tool-safe identifiers
- built-in name collisions are rejected
- duplicate external contribution names are rejected by precedence order with a warning
- every `tools.visible` entry must exist in the registered tool catalog
- provider profile references must exist in `config.providers`
- inline provider fields are rejected
- `mode: "subagent"` cannot be selected as the main agent
- default tool export must expose `name`, `description`, `inputSchema`, and `execute`
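
A few of the loader rules above (identifier shape, built-in collisions, required export fields) can be sketched as one validation function. This is a hedged illustration: `validateToolExport`, the error strings, and the regular expression used for "tool-safe identifier" are assumptions, and the built-in list is taken from the tool names in the decision log.

```ts
const BUILT_IN_TOOLS = ["read_file", "write_file", "edit_file", "bash", "grep", "glob"]

// Assumed definition of a "tool-safe identifier": lowercase snake_case.
const TOOL_NAME_PATTERN = /^[a-z][a-z0-9_]*$/

// Returns a list of problems; an empty list means the export passed these checks.
function validateToolExport(candidate: unknown): string[] {
  if (typeof candidate !== "object" || candidate === null) {
    return ["default export must be an object"]
  }
  const tool = candidate as Record<string, unknown>
  const errors: string[] = []
  if (typeof tool.description !== "string") errors.push("missing description")
  if (typeof tool.inputSchema !== "object" || tool.inputSchema === null) {
    errors.push("missing inputSchema")
  }
  if (typeof tool.execute !== "function") errors.push("missing execute")
  if (typeof tool.name !== "string") {
    errors.push("missing name")
  } else {
    if (!TOOL_NAME_PATTERN.test(tool.name)) errors.push("name is not a tool-safe identifier")
    // Built-in names are reserved and cannot be replaced by external modules.
    if (BUILT_IN_TOOLS.indexOf(tool.name) !== -1) errors.push("name collides with a built-in tool")
  }
  return errors
}
```

Duplicate names between external contributions are left out here because the text resolves them by precedence with a warning rather than by outright rejection.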

---

## 13. CLI

Usage:

```text
demian [flags]
demian-cli [flags]
demian-plain [flags]
```

All human-facing commands are interface-first:

- `demian` and `demian-cli` launch the Ink terminal UI described in `architecture-tui.md`.
- `demian-plain` launches an interactive plain terminal flow with raw Markdown output.
- The user enters the first message after entering the CLI or TUI, not as a required positional argument.

Prompt arguments may remain as a compatibility shortcut, but they are not the primary UX. The canonical interactive path is command first, settings confirmation second, message input third.

The UI process does not exit after one assistant answer. After each `SessionRunner` run completes, the same CLI or TUI returns to standby and accepts the next message until the user exits.

Interactive mode keeps conversational history in memory for the life of the UI process. Each submitted message starts a new `SessionRunner` run with the previous non-system messages passed as `history`; the current system prompt is rebuilt for the run.

Examples:

```sh
demian
demian-cli --agent plan
demian-plain
demian-plain --provider gemini --model gemini-model-name
demian --stream --image screenshot.png --sandbox workspace-write
```

Cost optimization scenario:

```sh
demian --agent plan --provider gemini --model gemini-model-name
demian --agent build --provider openai --model openai-model-name
```

### Interactive Startup Flow

All commands share the same resolution order before the first message:

```text
built-in defaults
  -> user config
  -> workspace config
  -> --config
  -> saved UI preferences
  -> CLI flags
  -> interactive provider/model override
  -> message input
  -> SessionRunner
```

Provider and model are selected before `SessionRunner` starts. This keeps the runtime provider-neutral and avoids changing model identity in the middle of a tool loop.

Saved UI preferences live in project-local `.demian/preferences.json`. They store only the selected provider key and model name, never API keys or copied provider config. Preferences are written after an explicit interactive or flag-sourced provider/model selection and are reused by `demian`, `demian-cli`, and `demian-plain`.
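
The resolution order can be read as "later layers win, and the winner remembers where it came from". The sketch below illustrates that idea only; `Layer`, `resolveSetting`, and the choice to label all config-file layers as `config` are assumptions, while the four source labels come from the decision log.

```ts
type SettingSource = "config" | "saved" | "flag" | "interactive"

// One layer of the resolution order; a layer may leave either field unset.
interface Layer {
  source: SettingSource
  provider?: string
  model?: string
}

interface ResolvedValue {
  value: string
  source: SettingSource
}

// Layers are passed lowest precedence first; the last defined value wins.
function resolveSetting(
  layers: Layer[],
  key: "provider" | "model"
): ResolvedValue | undefined {
  let resolved: ResolvedValue | undefined
  for (const layer of layers) {
    const value = layer[key]
    if (value !== undefined) resolved = { value, source: layer.source }
  }
  return resolved
}
```

The sketch does not model the rule that selecting a new provider resets the model to that provider's configured default; a fuller version would resolve the provider first and then filter model layers accordingly.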

### Plain CLI Flow

`demian-plain` is interactive when stdin is a TTY:

```text
demian-plain
  -> load config
  -> show resolved default provider/model/agent/cwd
  -> ask whether to use defaults, select provider, or edit model
  -> collect message prompt
  -> run SessionRunner
  -> print final raw Markdown answer
  -> return to message prompt
  -> repeat until /exit or /quit
```

Plain CLI provider/model UX:

- Show the currently resolved provider and model before asking for the message.
- `Enter` accepts each default.
- `?` lists configured provider keys.
- Typing a provider key selects that provider and resets the model to that provider's configured default.
- Typing a model value overrides the selected provider's configured model for this run.
- CLI flags preselect values and mark them as flag-sourced in the prompt.

Plain CLI message commands:

- `/compact`: compact the current interactive conversation history immediately. The command is handled locally, is not sent to the model, and is not stored as a recalled prompt.
- `/stop`: stop the currently running task by aborting its active run.
- `/exit` or `/quit`: exit the interactive plain CLI. If a task is running, stop it first and then exit.
- Any non-command message while standby starts a new task run.
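
The message commands above amount to a small local dispatcher that runs before anything is sent to the model. A minimal sketch follows; the `PlainCommand` type and `parsePlainInput` function are hypothetical names, and whitespace trimming is an assumption.

```ts
type PlainCommand =
  | { kind: "compact" }
  | { kind: "stop" }
  | { kind: "exit" }
  | { kind: "message"; text: string }

// Classify one line of user input; only exact command words are treated as commands.
function parsePlainInput(line: string): PlainCommand {
  const trimmed = line.trim()
  switch (trimmed) {
    case "/compact":
      return { kind: "compact" }
    case "/stop":
      return { kind: "stop" }
    case "/exit":
    case "/quit":
      return { kind: "exit" }
    default:
      // Everything else is a normal message for the next task run.
      return { kind: "message", text: trimmed }
  }
}
```

A caller would handle `exit` during a running task by aborting the active run first, as the list describes, and would likely ignore a `message` whose text is empty.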

Plain CLI I/O contract:

- Settings prompts, retries, streaming deltas, tool progress, permission prompts, and hook warnings go to stderr.
- Final assistant raw Markdown goes to stdout.
- In non-interactive stdin/stdout contexts, `demian-plain` must not hang waiting for provider/model input. It should require an explicit automation input path such as a prompt argument, `--prompt`, or stdin message support.

### TUI Flow

`demian` and `demian-cli` show provider/model context inside the UI before the first message:

```text
demian
  -> render TUI
  -> show default provider/model/agent/cwd in status area
  -> allow provider/model changes through shortcuts
  -> collect message in prompt composer
  -> run SessionRunner
  -> return to composer
  -> repeat until /exit or /quit
```

TUI provider/model UX is defined in `architecture-tui.md`.

The TUI also exposes main agent selection before a message starts. Pressing `a` in an empty prompt opens the main agent selector; `Up`/`Down` moves through primary agents, `Enter` applies the selection, and `Esc` cancels. This mirrors the provider selector instead of adding a separate startup wizard. The gain is a small, repeatable settings loop that works between tasks. The tradeoff is that prompt input beginning with bare `a`, `p`, or `m` in an empty composer is reserved for settings shortcuts; users can type any other character first or change settings before composing.

The TUI composer keeps an in-memory prompt history for the lifetime of the UI process. `Up` recalls older submitted prompts; `Down` moves toward newer prompts and eventually restores the current draft. Control commands such as `/compact`, `/stop`, `/exit`, and `/quit` are not stored in prompt history, and prompt recall does not change `SessionRunner.history` or transcript behavior.
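
The recall behavior can be modeled with a list of submitted prompts, a cursor, and a saved draft. The sketch below is one possible shape under stated assumptions: `PromptHistory`, `submitPrompt`, `recallUp`, and `recallDown` are hypothetical names, not the composer's real API.

```ts
const CONTROL_COMMANDS = ["/compact", "/stop", "/exit", "/quit"]

interface PromptHistory {
  entries: string[] // submitted prompts, oldest first
  cursor: number    // index into entries; entries.length means "editing the draft"
  draft: string     // text the user was typing before recall started
}

function createPromptHistory(): PromptHistory {
  return { entries: [], cursor: 0, draft: "" }
}

// Record a submitted prompt unless it is a control command, then reset recall state.
function submitPrompt(history: PromptHistory, text: string): void {
  if (CONTROL_COMMANDS.indexOf(text.trim()) === -1) history.entries.push(text)
  history.cursor = history.entries.length
  history.draft = ""
}

// Up: move toward older prompts, remembering the current draft on the first step.
function recallUp(history: PromptHistory, currentInput: string): string {
  if (history.cursor === history.entries.length) history.draft = currentInput
  if (history.cursor > 0) history.cursor -= 1
  return history.entries[history.cursor] ?? history.draft
}

// Down: move toward newer prompts and finally restore the saved draft.
function recallDown(history: PromptHistory): string {
  if (history.cursor < history.entries.length) history.cursor += 1
  return history.entries[history.cursor] ?? history.draft
}
```

Because the structure only holds UI strings, recalling a prompt never touches the conversation history passed to `SessionRunner`.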

The TUI message composer accepts `/compact` while standby. The command forces deterministic compaction of the current interactive conversation history, then returns to the prompt composer. The transcript view records a local system block summarizing the token estimate before and after compaction, the number of preserved messages, the number of dropped messages, and the summary token estimate. `/compact` is disabled while a task is running; during running state, the command bar accepts only `/stop` and `/exit` because the active task owns the current `AbortController`.

TUI permission prompts are modal inside the running state. If a tool requires approval, the bottom bar renders the permission prompt before the generic running command bar, because the active keybindings are `y`, `a`, `n`, and `Enter` until the permission request is resolved. Tool input is summarized in human-readable form; `bash` shows the exact command, file tools show paths, and text-heavy fields show character counts plus a short preview.

In multi-agent mode, the TUI should render child activity as nested progress without making it modal unless a permission request is pending:

```text
Agent reviewer started
  tool read_file completed
  tool grep completed
Agent reviewer completed
```

The same permission bar priority applies to child requests. A pending `builder -> edit_file` prompt must override generic running controls so the active keybindings are unambiguous.

Flags:

| Flag | Meaning |
|------|---------|
| `--mode <single\|multi>` | explicit agent mode |
| `--single-agent` | alias for `--mode single` |
| `--multi-agent` | alias for `--mode multi` |
| `--agent <name>` | selected agent in single mode; main agent in multi mode |
| `--provider <name>` | provider key |
| `--model <name>` | model override |
| `--max-turns <n>` | loop limit |
| `--main-max-context-tokens <n>` | override main model-visible context budget |
| `--main-compact-at <ratio>` | override main compaction trigger |
| `--sub-max-context-tokens <n>` | override sub agent model-visible context budget |
| `--sub-compact-at <ratio>` | override sub agent compaction trigger |
| `--cwd <path>` | workspace directory |
| `--yes`, `-y` | auto-allow ask permissions |
| `--dry-run` | stop before `write_file`, `edit_file`, or `bash` execution |
| `--stream` | use provider streaming when available |
| `--no-stream` | force non-streaming chat |
| `--image <path-or-url>` | attach image; repeatable |
| `--sandbox <mode>` | `off`, `read-only`, or `workspace-write` |
| `--persistent-grants` | enable persisted `always` grants |
| `--no-persistent-grants` | keep grants session-local |
| `--no-transcript` | disable transcript writes |
| `--config <path>` | load extra config file |
| `--trust-user-tools` | allow loading user-scope external tool modules |
| `--trust-project` | allow loading trusted project `.demian` contributions |

Recommended discovery commands:

| Command | Meaning |
|---------|---------|
| `demian list-agents` | show registered agents, source, mode, provider profile, cost/category |
| `demian list-tools` | show registered tools, source, side-effect metadata |

Multi-agent mode is opt-in. Users should not encounter extra model calls, handoff data sharing, or child permission prompts unless they explicitly selected multi-agent mode through config or flags.

Plain CLI stdout:

- final assistant answer only

Plain CLI stderr:

- retry status
- streaming text deltas
- tool progress
- permission prompts
- hook warnings

---

## 14. Streaming

Streaming is a provider capability, not a separate runtime mode.

Rules:

- If `streaming.enabled` is true and `provider.stream` exists, use streaming.
- Emit `model.text.delta` events for text deltas.
- Accumulate deltas into the final `AssistantMessage`.
- Accumulate tool-call deltas until a complete tool call is available.
- Emit a final `done` event carrying a normalized `ChatResponse`.
- `--no-stream` forces `chat()`.

The transcript records normalized events, not raw SSE lines.
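
The accumulation rules can be illustrated with a reducer over already-parsed stream chunks. This is a sketch under assumptions: the `StreamChunk` shape, `ToolCallDraft`, and `accumulateStream` are hypothetical and do not reflect the actual `provider.stream` types; only the `model.text.delta` event name comes from the text.

```ts
// Hypothetical normalized chunk shapes produced by a provider stream parser.
type StreamChunk =
  | { type: "text"; delta: string }
  | { type: "tool_call"; index: number; id?: string; name?: string; argsDelta: string }

interface ToolCallDraft {
  id: string
  name: string
  arguments: string // JSON text, complete only after the last delta
}

interface Accumulated {
  text: string
  toolCalls: ToolCallDraft[]
  events: { type: "model.text.delta"; text: string }[]
}

function accumulateStream(chunks: StreamChunk[]): Accumulated {
  const acc: Accumulated = { text: "", toolCalls: [], events: [] }
  const drafts = new Map<number, ToolCallDraft>()
  for (const chunk of chunks) {
    if (chunk.type === "text") {
      // Text deltas are emitted as events and appended to the final message text.
      acc.text += chunk.delta
      acc.events.push({ type: "model.text.delta", text: chunk.delta })
    } else {
      // Tool-call deltas are merged by index until the arguments are complete.
      const draft = drafts.get(chunk.index) ?? { id: "", name: "", arguments: "" }
      if (chunk.id !== undefined) draft.id = chunk.id
      if (chunk.name !== undefined) draft.name = chunk.name
      draft.arguments += chunk.argsDelta
      drafts.set(chunk.index, draft)
    }
  }
  drafts.forEach((draft) => acc.toolCalls.push(draft))
  return acc
}
```

The returned object plays the role of the payload for a final `done` event; the transcript would then record these normalized values rather than the raw SSE lines.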

---

## 15. Multimodal Input

Multimodal support is user-message input only in v0.

Rules:

- `--image <url>` passes through as OpenAI-compatible `image_url`.
- `--image <path>` reads a local file, validates size, and converts to a data URL.
- Repeated image flags preserve order after the text prompt.
- `maxImageBytes` defaults to 8MB.
- Unsupported providers should fail clearly or degrade only when explicitly configured.

Native Gemini multimodal APIs remain deferred. OpenAI-compatible image input is enough for the integrated v0.
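
The image rules can be sketched without touching the filesystem by separating classification, size validation, and message assembly. The helper names below are hypothetical, file reading and base64 encoding are left to the caller, and interpreting "8MB" as 8 MiB is an assumption. The content-part shape follows the OpenAI-compatible chat format.

```ts
// Assumes "8MB" means 8 * 1024 * 1024 bytes; the source does not specify.
const DEFAULT_MAX_IMAGE_BYTES = 8 * 1024 * 1024

type ImageInput = { kind: "url"; url: string } | { kind: "path"; path: string }

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }

// http(s) arguments pass through; anything else is treated as a local path.
function classifyImageArg(arg: string): ImageInput {
  return /^https?:\/\//i.test(arg) ? { kind: "url", url: arg } : { kind: "path", path: arg }
}

// Build a data URL from already-encoded bytes after checking the size limit.
function toDataUrl(
  base64: string,
  mimeType: string,
  byteLength: number,
  maxImageBytes: number = DEFAULT_MAX_IMAGE_BYTES
): string {
  if (byteLength > maxImageBytes) {
    throw new Error(`image is ${byteLength} bytes; limit is ${maxImageBytes}`)
  }
  return `data:${mimeType};base64,${base64}`
}

// Text prompt first, then images in the order the flags were given.
function buildUserContent(prompt: string, imageUrls: string[]): ContentPart[] {
  const parts: ContentPart[] = [{ type: "text", text: prompt }]
  for (const url of imageUrls) parts.push({ type: "image_url", image_url: { url } })
  return parts
}
```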

---

## 16. Sandbox

Sandbox applies to `bash` execution.

```ts
export type SandboxMode = "off" | "read-only" | "workspace-write"
export type SandboxNetwork = "inherit" | "deny"

export interface SandboxConfig {
  mode: SandboxMode
  network?: SandboxNetwork
}

export interface SandboxLaunch {
  command: string
  args: string[]
  adapter: "off" | "macos-sandbox-exec" | "linux-bwrap" | "env-only"
  env: Record<string, string>
}
```

Behavior:

- `off`: run through shell directly.
- `read-only`: deny writes where platform support exists.
- `workspace-write`: allow writes in cwd and temp directories.
- `network: "deny"`: best effort when supported.
- `env-only`: fallback that records intent but cannot enforce OS isolation.

The runtime must emit sandbox adapter metadata so transcripts show whether a command was truly sandboxed.
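
Adapter selection can be sketched as a pure function of the requested mode, the platform, and which helper binaries were found. The mapping below is an assumption for illustration: `chooseAdapter`, `isEnforced`, and the `availableBinaries` parameter are hypothetical, and the platform strings follow Node's `process.platform` values.

```ts
type Mode = "off" | "read-only" | "workspace-write"
type Adapter = "off" | "macos-sandbox-exec" | "linux-bwrap" | "env-only"

// Pick the strongest adapter the host can support for the requested mode.
function chooseAdapter(mode: Mode, platform: string, availableBinaries: string[]): Adapter {
  if (mode === "off") return "off"
  const has = (name: string): boolean => availableBinaries.indexOf(name) !== -1
  if (platform === "darwin" && has("sandbox-exec")) return "macos-sandbox-exec"
  if (platform === "linux" && has("bwrap")) return "linux-bwrap"
  // Fallback: record the intent, but no OS-level isolation is applied.
  return "env-only"
}

// Whether the adapter actually enforces isolation; useful as transcript metadata.
function isEnforced(adapter: Adapter): boolean {
  return adapter === "macos-sandbox-exec" || adapter === "linux-bwrap"
}
```

Recording both the adapter name and the `isEnforced` result next to each `bash` event would let a transcript reader tell a real sandbox from the `env-only` fallback.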

---

## 17. Persistent Grants

Persistent grants are a convenience cache, not a security override.

Storage:

```text
.demian/grants.json
```

Rules:

- TTL is enforced on load.
- Grants are scoped to project by default.
- Grants use the same grant key policy as session grants.
- Built-in hard deny and global explicit deny still win.
- `--no-persistent-grants` disables disk-backed grants.
- `--yes` does not create persistent grants unless the answer is explicitly `always`.
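
Two of the rules above, TTL enforcement on load and the condition for writing a grant, can be sketched as pure functions. The on-disk schema of `.demian/grants.json` is not shown in this section, so `PersistedGrant`, `loadActiveGrants`, and `shouldPersistGrant` are hypothetical.

```ts
// Hypothetical stored grant; the real grants.json schema may differ.
interface PersistedGrant {
  key: string       // grant key, same policy as session grants
  createdAt: number // epoch milliseconds
  ttlMs: number     // lifetime in milliseconds
}

// TTL is enforced on load: expired grants are dropped before they can be used.
function loadActiveGrants(stored: PersistedGrant[], now: number): PersistedGrant[] {
  return stored.filter((grant) => now - grant.createdAt < grant.ttlMs)
}

type PromptAnswer = "yes" | "always" | "no"

// Only an explicit `always` answer is persisted, and only when persistence is enabled.
function shouldPersistGrant(answer: PromptAnswer, persistentGrantsEnabled: boolean): boolean {
  return persistentGrantsEnabled && answer === "always"
}
```

Under this reading, an auto-allow from `--yes` is treated like a plain `yes` and never reaches disk.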

---

## 18. Events And Transcript

Runtime events:

- `session.started`
- `session.ended`
- `user.message`
- `model.request`
- `model.text`
- `model.text.delta`
- `model.usage`
- `model.content_filter`
- `context.compiled`
- `context.compacted`
- `provider.retry`
- `tool.requested`
- `tool.started`
- `tool.completed`
- `tool.failed`
- `hook.fired`
- `hook.failed`
- `permission.requested`
- `permission.granted`
- `permission.denied`
- `agent.invocation.requested`
- `agent.invocation.started`
- `agent.invocation.tool_policy.applied`
- `agent.session.context_compacted`
- `agent.session.memory_updated`
- `agent.invocation.completed`
- `agent.invocation.failed`
- `agent.invocation.cancelled`

Most runtime events should accept optional invocation fields:

```ts
export interface InvocationEventFields {
  rootSessionId?: string
  agentSessionId?: string
  invocationId?: string
  parentInvocationId?: string
  agent?: string
  agentPath?: string[]
}
```

Permission events include target type:

```ts
{
  type: "permission.requested"
  rootSessionId: string
  invocationId: string
  agent: string
  targetType: "tool" | "agent"
  targetName: string
  decision: Decision
  ts: number
}
```

Single-agent transcript path remains compatible:

```text
.demian/transcripts/<sessionId>/session.jsonl
```

Preferred multi-agent transcript path:

```text
.demian/transcripts/<rootSessionId>/session.jsonl
```

All main and child events are written in chronological order to one root transcript. Events carry `agentSessionId`, `invocationId`, and `parentInvocationId`, so a UI or debug tool can reconstruct the invocation tree and per-agent memory timeline.
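
Reconstructing the invocation tree from a flat root transcript might look like the sketch below. It relies only on `invocationId`, `parentInvocationId`, `agent`, and `type`; the `InvocationNode` shape and `buildInvocationTree` function are hypothetical, and events without an `invocationId` are simply skipped.

```ts
interface TranscriptEvent {
  type: string
  invocationId?: string
  parentInvocationId?: string
  agent?: string
}

interface InvocationNode {
  invocationId: string
  agent?: string
  eventTypes: string[] // event types in transcript order
  children: InvocationNode[]
}

function buildInvocationTree(events: TranscriptEvent[]): InvocationNode[] {
  const nodes = new Map<string, InvocationNode>()
  const parents = new Map<string, string>()
  const order: string[] = []
  for (const event of events) {
    const id = event.invocationId
    if (id === undefined) continue // root-level events carry no invocation
    let node = nodes.get(id)
    if (node === undefined) {
      node = { invocationId: id, eventTypes: [], children: [] }
      nodes.set(id, node)
      order.push(id)
    }
    node.eventTypes.push(event.type)
    if (event.agent !== undefined) node.agent = event.agent
    if (event.parentInvocationId !== undefined) parents.set(id, event.parentInvocationId)
  }
  // Second pass: attach each node to its parent, or treat it as a top-level invocation.
  const roots: InvocationNode[] = []
  for (const id of order) {
    const node = nodes.get(id)
    if (node === undefined) continue
    const parentId = parents.get(id)
    const parent = parentId !== undefined ? nodes.get(parentId) : undefined
    if (parent !== undefined) parent.children.push(node)
    else roots.push(node)
  }
  return roots
}
```

Because the transcript is a single chronological file, `eventTypes` in each node preserves the order in which that invocation's events were written.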

Alternative considered:

```text
.demian/transcripts/<rootSessionId>/main.jsonl
.demian/transcripts/<rootSessionId>/agents/<invocationId>.jsonl
```

That would physically separate child logs, but it introduces cross-file ordering problems for permissions and tool execution. One root JSONL is chosen because audit order matters more than file separation. Per-agent views can be generated later from the canonical root log.

Example:

```jsonl
{"type":"session.started","sessionId":"ses_123","rootSessionId":"root_1","agent":"build","provider":"gemini","cwd":"/repo","ts":1730000000000}
{"type":"user.message","sessionId":"ses_123","rootSessionId":"root_1","text":"refactor parser","ts":1730000000100}
{"type":"agent.invocation.started","rootSessionId":"root_1","agentSessionId":"as_1","invocationId":"inv_1","agent":"reviewer","provider":"gemini-fast","model":"model","depth":1,"ts":1730000000150}
{"type":"tool.requested","rootSessionId":"root_1","invocationId":"inv_1","agent":"reviewer","callId":"call_1","name":"read_file","input":{"path":"package.json"},"ts":1730000000200}
{"type":"tool.completed","rootSessionId":"root_1","invocationId":"inv_1","agent":"reviewer","callId":"call_1","name":"read_file","ok":true,"preview":"...","ts":1730000000300}
{"type":"agent.invocation.completed","rootSessionId":"root_1","agentSessionId":"as_1","invocationId":"inv_1","agent":"reviewer","preview":"no issues found","ts":1730000000400}
{"type":"session.ended","sessionId":"ses_123","rootSessionId":"root_1","reason":"completed","ts":1730000002000}
```

Transcripts may contain sensitive information. `mask-secrets` is a useful guard, not a complete redaction guarantee.

---

## 19. Safety Model

Guaranteed:

- No local side effects without tool calls.
- No side-effecting tool execution without hooks and permission evaluation.
- Cwd escape is rejected by built-in file tools.
- `.env` edits are blocked by default.
- Dangerous bash patterns are blocked before permission prompts.
- `plan` agent is read-only.
- Built-in hard deny and global explicit deny beat grants and `--yes`.
- Single-agent mode does not expose delegation.
- Sub agent tool calls route permission prompts to the root/main UI.
- User-approved `always` grants from child requests become root/global grants.
- Sub agents cannot bypass provider profile validation, cwd, sandbox, hook, hard deny, or global deny policy.
- Child agent memory is bounded and compressed before it grows without limit.
- Root abort cancels the active child invocation.
- Provider content filters are surfaced.
- Transcript records runtime decisions.
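
The precedence guarantee ("hard deny and global explicit deny beat grants and `--yes`") can be written as a short decision function. This is a sketch of the ordering only; `PermissionInput`, `evaluatePermission`, and the three-valued `SketchDecision` are assumptions and may not match the runtime's `Decision` type.

```ts
type SketchDecision = "allow" | "ask" | "deny"

interface PermissionInput {
  hardDeny: boolean            // built-in hard deny matched
  globalDeny: boolean          // global explicit deny rule matched
  ruleDecision: SketchDecision // result of ordinary agent/tool rules
  hasGrant: boolean            // session or persistent grant exists
  yesFlag: boolean             // --yes was passed
}

function evaluatePermission(input: PermissionInput): SketchDecision {
  // Deny layers are checked first, so nothing below can widen them.
  if (input.hardDeny || input.globalDeny) return "deny"
  // Grants and --yes only resolve "ask"; they never turn a rule-level deny into allow.
  if (input.ruleDecision === "ask" && (input.hasGrant || input.yesFlag)) return "allow"
  return input.ruleDecision
}
```

Checking the deny layers before consulting grants is what keeps a stale or overly broad grant from acting as a security override.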

Not guaranteed:

- Perfect OS isolation on every platform.
- Complete network exfiltration prevention.
- Complete destructive command detection.
- Complete secret redaction.
- Provider-native safety policy control.
- Correctness of model-generated code.
- Correctness of model-selected sub agents.
- Fairness or optimal scheduling for background agents, because background execution is deferred.
- Perfect network exfiltration prevention by arbitrary external tool modules.

Important trust distinction:

```text
external agent markdown can change model behavior
external tool module can execute local code at load/run time
```

Agent markdown and tool modules therefore need different trust gates. Permission grants are about runtime actions; trust decisions are about loading contributions.

---

## 20. Testing Strategy

Default test suite must be network-free.

Priority:

| # | Area |
|---|------|
| 1 | OpenAI-shaped message history |
| 2 | OpenAIProvider payload and response normalization |
| 3 | SSE streaming parser |
| 4 | Gemini config preset path |
| 5 | content filter mapping |
| 6 | retry behavior and `Retry-After` |
| 7 | Anthropic conversion fixtures |
| 8 | Permission deny greater than grant |
| 9 | Persistent grant TTL and scope |
| 10 | Tool input validation |
| 11 | Tool path boundary checks |
| 12 | Hook dispatcher block/warn/patch |
| 13 | Session loop with mock provider |
| 14 | Transcript event ordering |
| 15 | Multimodal data URL conversion |
| 16 | Sandbox launch construction |
| 17 | Single-agent regression with delegation absent |
| 18 | External agent/tool registration and built-in collision rejection |
| 19 | Agent provider profile validation |
| 20 | Callable catalog prompt and `delegate_agent.agent` schema enum |
| 21 | Hidden/non-callable/self/cycle/depth delegation rejection |
| 22 | Child invocation through mock providers |
| 23 | Child tool permission routed through root coordinator |
| 24 | User-approved child `always` grant reusable by main |
| 25 | Hard/global deny cannot be expanded by grants |
| 26 | Structured handoff excludes full main history |
| 27 | Delegate context and result caps |
| 28 | Child memory compression and repeated invocation reuse |
| 29 | Main delegate-result ledger compaction |
| 30 | Cross-provider handoff permission prompt |
| 31 | Root transcript preserves parent-child event order |
| 32 | Codex config preset, Responses payload mapping, and stream parser |
| 33 | Codex local auth discovery, refresh classification, and installation metadata |
| 34 | Codex API-key fallback guard and keyring account derivation |
| 35 | Claude Code external runtime config, Agent SDK adapter, and legacy direct API gate |
| 36 | Claude Code permission bridge, session mapping, invalid resume recovery, and usage ledger |
| 37 | Claude Code CLI fallback capability detection, stream-json parser, and cancellation diagnostics |
| 38 | Trust persistence and mtime invalidation |
| 39 | Mode precedence across flags and config aliases |
| 40 | Root abort cancels active child run |

Optional integration tests:

- real OpenAI-compatible endpoint smoke test
- real Gemini OpenAI-compatible endpoint smoke test
- real local Ollama smoke test
- real Codex provider smoke test with an explicit local Codex login opt-in
- real Claude Code provider smoke test with an explicit local Claude Code login opt-in

Optional integration tests must require explicit environment variables and must not run in default CI.

Multi-agent mock fixture:

```text
main provider:
  turn 1 -> delegate_agent(reviewer, task)
  turn 2 -> final answer using child result

child provider:
  turn 1 -> read_file(...)
  turn 2 -> final child answer
```

This fixture exercises delegation, child tool use, permission routing, compact result return, and transcript chronology without network access.
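
A scripted mock provider is enough to drive this fixture without a network. The sketch below is deliberately simplified and makes several assumptions: `MockTurn`, `createScriptedProvider`, and `runAgent` are hypothetical, the provider exposes a synchronous `next()` instead of the real `chat()` interface, and permission routing is omitted.

```ts
type MockTurn =
  | { kind: "tool_call"; name: string; input: Record<string, unknown> }
  | { kind: "final"; text: string }

interface ScriptedProvider {
  next(): MockTurn
}

// Replays a fixed list of turns, failing loudly if the loop asks for more.
function createScriptedProvider(turns: MockTurn[]): ScriptedProvider {
  let index = 0
  return {
    next(): MockTurn {
      const turn = turns[index]
      if (turn === undefined) throw new Error("mock provider script exhausted")
      index += 1
      return turn
    }
  }
}

// Minimal agent loop: log each step and run the child synchronously on delegation.
function runAgent(agent: string, providers: Record<string, ScriptedProvider>, log: string[]): string {
  const provider = providers[agent]
  if (provider === undefined) throw new Error(`no provider scripted for ${agent}`)
  for (;;) {
    const turn = provider.next()
    if (turn.kind === "final") {
      log.push(`${agent}: final`)
      return turn.text
    }
    log.push(`${agent}: ${turn.name}`)
    if (turn.name === "delegate_agent") runAgent(String(turn.input.agent), providers, log)
  }
}

const fixtureLog: string[] = []
const fixtureAnswer = runAgent("main", {
  main: createScriptedProvider([
    { kind: "tool_call", name: "delegate_agent", input: { agent: "reviewer", task: "review parser" } },
    { kind: "final", text: "done, reviewer found no issues" }
  ]),
  reviewer: createScriptedProvider([
    { kind: "tool_call", name: "read_file", input: { path: "package.json" } },
    { kind: "final", text: "no issues found" }
  ])
}, fixtureLog)
```

After the run, `fixtureLog` holds the parent and child steps in the order they happened, which is the property a transcript chronology test would assert on.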
|
|
2586
|
+
|
|
2587
|
+
---
|
|
2588
|
+
|
|
2589
|
+
## 21. Implementation Sequence
|
|
2590
|
+
|
|
2591
|
+
| Step | Work | Verification |
|
|
2592
|
+
|------|------|--------------|
|
|
2593
|
+
| 1 | Create package skeleton | `demian --help` smoke |
|
|
2594
|
+
| 2 | Port message, event, transcript, util modules | transcript golden test |
|
|
2595
|
+
| 3 | Port config loader and provider resolver | config merge tests |
|
|
2596
|
+
| 4 | Port OpenAIProvider, retry, streaming parser | provider fixture tests |
|
|
2597
|
+
| 5 | Add Gemini and local provider presets | config fixture test |
|
|
2598
|
+
| 6 | Port tool registry and read/search tools | temp workspace tests |
|
|
2599
|
+
| 7 | Port write/edit/bash with output cap | temp workspace and timeout tests |
|
|
2600
|
+
| 8 | Integrate sandbox adapter directory | launch construction tests |
|
|
2601
|
+
| 9 | Port hook dispatcher and built-ins | hook decision tests |
|
|
2602
|
+
| 10 | Port permission engine, session grants, persistent grants | deny/grant tests |
|
|
2603
|
+
| 11 | Port agents and prompts | agent policy tests |
|
|
2604
|
+
| 12 | Port Session Runner with streaming and multimodal | mock session tests |
|
|
2605
|
+
| 13 | Add Anthropic adapter fixtures | conversion tests |
|
|
2606
|
+
| 14 | Add interactive CLI/TUI startup, provider/model selection, and prompt UX | pseudo-TTY smoke tests |
|
|
2607
|
+
| 15 | Add Codex provider | Codex auth, payload, stream, and refresh fixture tests |
|
|
2608
|
+
| 16 | Update README | manual review |
|
|
2609
|
+
|
|
2610
|
+
Multi-agent follow-up sequence:
|
|
2611
|
+
|
|
2612
|
+
| Step | Work | Verification |
|
|
2613
|
+
|------|------|--------------|
|
|
2614
|
+
| 16 | Add `register()` to tool and agent registries | collision and duplicate tests |
|
|
2615
|
+
| 17 | Introduce `AgentDefinition` normalization | `build` and `plan` snapshots unchanged |
|
|
2616
|
+
| 18 | Validate agent provider profile references | unknown profile rejects or disables agent |
|
|
2617
|
+
| 19 | Add context budget config/env parsing | precedence tests |
|
|
2618
|
+
| 20 | Move grants into root PermissionCoordinator | existing grant tests pass |
|
|
2619
|
+
| 21 | Add RootSession and invocation IDs | events include root and invocation context |
|
|
2620
|
+
| 22 | Add in-memory AgentSessionStore | repeated child invocation reuses memory |
|
|
2621
|
+
| 23 | Implement synchronous child AgentSessionRunner | mock child returns answer to main |
|
|
2622
|
+
| 24 | Implement `delegate_agent` tool and catalog injection | absent in single-agent mode, present only when allowed |
|
|
2623
|
+
| 25 | Add cycle, depth, hidden, and callable validation | rejected before child model call |
|
|
2624
|
+
| 26 | Add structured handoff and result caps | large context/result handled by policy |
|
|
2625
|
+
| 27 | Add child memory compression | compaction event and memory update emitted |
|
|
2626
|
+
| 28 | Add root transcript writer fields | parent-child chronology golden test |
|
|
2627
|
+
| 29 | Add CLI/TUI mode flags and nested activity | single-agent smoke unchanged |
|
|
2628
|
+
| 30 | Add discovery and trust persistence | mtime change prompts again |
|
|
2629
|
+
|
|
2630
|
+
---
|
|
2631
|
+
|
|
2632
|
+
## 22. Dependency Policy
|
|
2633
|
+
|
|
2634
|
+
Core should be dependency-light:
|
|
2635
|
+
|
|
2636
|
+
```json
|
|
2637
|
+
{
|
|
2638
|
+
"dependencies": {
|
|
2639
|
+
"@anthropic-ai/sdk": "version-from-package-manager"
|
|
2640
|
+
},
|
|
2641
|
+
"engines": {
|
|
2642
|
+
"node": ">=22.18"
|
|
2643
|
+
}
|
|
2644
|
+
}
|
|
2645
|
+
```
|
|
2646
|
+
|
|
2647
|
+
Rationale:
|
|
2648
|
+
|
|
2649
|
+
- Native `fetch` is enough for OpenAI-compatible providers.
|
|
2650
|
+
- Gemini support does not require a Google SDK.
|
|
2651
|
+
- Anthropic is intentionally different: it uses `@anthropic-ai/sdk` because the native protocol is not OpenAI-compatible and SDK types help manage API drift.
|
|
2652
|
+
- Tests can run with Node's built-in test runner and TypeScript stripping.
|
|
2653
|
+
- Vendor SDK usage must stay adapter-local.
|
|
2654
|
+
|
|
2655
|
+
Avoid in core:
|
|
2656
|
+
|
|
2657
|
+
- `@google/generative-ai`
|
|
2658
|
+
- `@google-cloud/vertexai`
|
|
2659
|
+
- LangChain
|
|
2660
|
+
- Vercel AI SDK
|
|
2661
|
+
- LlamaIndex
|
|
2662
|
+
- LiteLLM SDK
|
|
2663
|
+
|
|
2664
|
+
Acceptable later:
|
|
2665
|
+
|
|
2666
|
+
- A small JSON schema validator if handwritten validation becomes too large.
|
|
2667
|
+
- Optional platform sandbox helpers behind adapter boundaries.
|
|
2668
|
+
|
|
2669
|
+
---
|
|
2670
|
+
|
|
2671
|
+
## 23. Decision Log

| Decision | Final choice | Reason |
|----------|--------------|--------|
| Package name | `demian` | One integrated package, no lineage suffix |
| Canonical doc | `nodejs/architecture.md` | Replaces separate design lineages for new work |
| Default provider path | OpenAI-compatible | Broad vendor coverage with one adapter |
| OpenAI-compatible auth | Default bearer, Azure `api-key` inferred from endpoint | API shape is shared, but auth headers differ by provider; normal Azure users should not need extra config |
| Gemini | Config-only via OpenAIProvider | Core code change should be zero |
| Internal messages | OpenAI-shaped | Matches default provider path |
| Anthropic | Separate SDK-backed adapter | Native shape differs enough to isolate conversion; SDK helps manage protocol changes |
| Tool names | `read_file`, `write_file`, `edit_file`, `bash`, `grep`, `glob` | Stable and explicit side-effect boundary |
| Runtime | Node-first, Bun-compatible where practical | Current Codex implementation is dependency-light |
| Streaming | Included in v0 | Already implemented and important for CLI UX |
| Multimodal | Included for user input | Already represented and useful for UI/code review |
| Sandbox | Included for bash launch | Codex modes plus Claude adapter shape |
| Persistent grants | Included with TTL | Usability without weakening deny |
| Interactive command entry | Commands enter UI/CLI before message input | Avoids forcing long prompts into shell arguments and creates room for provider/model selection |
| Provider/model selection timing | Before `SessionRunner` starts | Keeps provider identity stable during one tool loop |
| Provider/model selection helper | Shared `src/ui/settings.ts` | Keeps plain CLI and TUI selection behavior identical |
| Provider/model source labels | `config`, `saved`, `flag`, `interactive` | Shows config, remembered preference, flag, and current UI provenance clearly |
| TUI permission bar priority | Permission prompt overrides the running command bar | Prevents hidden approval shortcuts when a tool waits for `y/a/n/Enter` |
| TUI tool input summary | `src/ui/tool-summary.ts` | Shows bash commands and paths clearly instead of raw JSON-first output |
| TUI prompt recall | `Up`/`Down` over in-memory submitted prompts | Reuse previous inputs without persisting UI history or changing model conversation history |
| TUI main agent selector | `a` opens a primary-agent selector in the empty prompt state | Lets users change the main agent inside TUI without restarting |
| Provider/model preferences | `.demian/preferences.json` | Reuse the last explicit selection without mutating config files or storing secrets |
| Multi-agent mode | opt-in `single-agent` / `multi-agent` mode | Preserves current UX and avoids surprise extra model calls |
| Multi-agent authority | RootSession owns permissions, grants, transcript, cwd, sandbox, and abort | Keeps user authority centralized across main and sub agents |
| Sub execution | child AgentSessionRunner with bounded memory | Preserves specialist continuity without copying full main history |
| Delegation exposure | one `delegate_agent` tool | Stable provider payload and centralized invocation permission |
| Agent shape | `AgentDefinition` policy capsule | Puts prompt, provider profile, tools, defaults, delegation, and catalog in one validated unit |
| Visibility and authority | Split `tools.visible` from grants/rules | Keeps model tool exposure separate from user-approved authority |
| Agent provider config | profile references only | Prevents agent files from introducing endpoints or credentials |
| Child context | structured handoff packet plus compressed child memory | Gives sub agents useful context without full main-history sharing |
| Child result | compact result to main, full details in transcript/artifacts | Controls main context growth while preserving audit detail |
| Permission grants | root/global grants, no per-agent grants in v1 | User approvals behave consistently across main and child invocations |
| Transcript shape | one root JSONL with invocation fields | Preserves chronological audit order |
| External trust | trust decisions separate from permission grants | Loading code/data and executing actions stay separate concerns |
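
The "OpenAI-compatible auth" row above can be illustrated with a minimal sketch. It assumes that an Azure OpenAI endpoint can be recognized by the `.openai.azure.com` host suffix; the helper names `inferAuthStyle` and `authHeaders` are hypothetical and are not taken from the demian source.

```typescript
// Hypothetical helpers for illustration; not the actual demian implementation.
type AuthStyle = "bearer" | "azure-api-key";

// Assumption: Azure OpenAI endpoints are identified by their host suffix.
function inferAuthStyle(baseUrl: string): AuthStyle {
  const host = new URL(baseUrl).hostname.toLowerCase();
  return host.endsWith(".openai.azure.com") ? "azure-api-key" : "bearer";
}

// Build the auth header for one request. The request body shape is shared
// across vendors; only the header name and value format differ.
function authHeaders(baseUrl: string, apiKey: string): Record<string, string> {
  return inferAuthStyle(baseUrl) === "azure-api-key"
    ? { "api-key": apiKey }
    : { Authorization: `Bearer ${apiKey}` };
}
```

Inferring the header style from the endpoint is one way a typical Azure profile could work without an explicit auth setting. An explicit override could still be layered on top for endpoints that do not match the suffix rule.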

---

## 24. Key Tradeoffs

| Decision | Gained | Given up |
|----------|--------|----------|
| OpenAI-compatible default | 10+ vendors through one path | Internal shape resembles OpenAI API |
| OpenAI-shaped internal history | Minimal conversion for default path | Anthropic adapter does more work |
| Gemini config-only | No new dependency, no new adapter | Native Gemini features are flattened |
| Anthropic SDK adapter | Better native protocol typing and API-change handling | One adapter-local runtime dependency |
| Six snake_case tools | Clear action boundaries | Slightly longer names |
| Hard/global deny dominates grants | Strong safety guarantee for real safety boundaries | Agent-local deny is no longer an absolute cap after user approval |
| Streaming in v0 | Better CLI feedback | More provider parser complexity |
| Sandbox in v0 | Safer bash defaults and clear metadata | Platform enforcement varies |
| Persistent grants in v0 | Less repeated prompting | Requires TTL and scope discipline |
| Multimodal in v0 | Screenshots and UI tasks are possible | Provider support varies |
| Interactive provider/model selection | Clearer default visibility and fewer config mistakes | Adds a pre-session state machine to CLI and TUI |
| Root-owned multi-agent runtime | Clear authority, cancellation, and audit ownership | Adds a RootSession layer above SessionRunner |
| Child AgentSessionRunner | Specialist continuity and provider/model isolation | More lifecycle and memory compression logic |
| One `delegate_agent` tool | Small, stable core and centralized permission target | Model must choose target through an `agent` argument |
| Policy-capsule agents | Agent behavior and safety are validated together | Larger agent definition shape |
| Visibility vs permission split | Narrow orchestrators can delegate to capable specialists | Users and implementers must understand grants do not expose hidden tools |
| Global root grants | Fewer repeated prompts and consistent user authority | No per-agent grant isolation in v1 |
| Provider profile references | No hidden endpoints or credentials in agent files | Per-agent models require named config profiles |
| Structured handoff | Bounded data sharing and clearer provider prompts | Child may need tools or refs for details not copied into context |
| Compact child results | Main context stays small | Main may need transcript/artifact refs for full detail |
| Synchronous v1 delegation | Deterministic permission and transcript order | No background child tasks yet |
| Trust file mtime validation | Simple trust invalidation in v1 | Weaker than content-hash validation |
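
The "Hard/global deny dominates grants" and "Persistent grants in v0" rows can be read together as an evaluation order. The sketch below is an assumption about that order, not the actual policy engine: `Grant`, `Decision`, `decide`, and the `expiresAt` field are hypothetical names chosen for illustration.

```typescript
// Hypothetical shapes; field names are assumptions, not demian's real types.
interface Grant {
  tool: string;      // tool name the grant covers, e.g. "bash"
  expiresAt: number; // epoch milliseconds; the grant is ignored after this
}

type Decision = "deny" | "allow" | "ask";

function decide(
  tool: string,
  hardDeny: ReadonlySet<string>,
  grants: readonly Grant[],
  now: number,
): Decision {
  // Hard/global deny is checked first, so no grant can override it.
  if (hardDeny.has(tool)) return "deny";
  // A persistent grant only counts while its TTL has not elapsed.
  const live = grants.some((g) => g.tool === tool && g.expiresAt > now);
  // Without a live grant, fall back to prompting the user.
  return live ? "allow" : "ask";
}
```

Checking deny before grants is what makes the guarantee independent of grant state: an expired, missing, or overly broad grant can only change the outcome between `allow` and `ask`.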

---

## 25. Summary

`demian` integrates the two prior designs into one canonical package architecture.

The final shape is:

- OpenAI-compatible protocol is the default provider path.
- Internal history is OpenAI-shaped.
- Gemini is a config entry, not a new adapter.
- Anthropic is isolated in its adapter.
- Hooks and permissions guard every local side effect.
- Tools are small, explicit, and workspace-bound.
- Streaming, multimodal input, sandbox modes, and persistent grants are part of the integrated v0.
- Single-agent mode remains the default.
- Multi-agent mode is an opt-in root-owned delegation extension.
- Agents are policy capsules, not arbitrary provider or credential entrypoints.
- `delegate_agent` is the stable model-facing delegation tool.
- Sub agents run as bounded child sessions with compressed memory and structured handoff context.
- Child tool calls, child invocation prompts, grants, events, transcripts, sandbox, and abort all route through the root session.
- Runtime decisions are visible through events and transcripts.
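
To illustrate why a single `delegate_agent` tool keeps the provider payload stable, here is a sketch of an OpenAI-shaped tool definition whose only varying part is the list of allowed agent names. Only the tool name and the `agent` argument come from this document; the `task` argument, the description text, and the `delegateAgentTool` helper are assumptions.

```typescript
// OpenAI-style function tool envelope. The parameter schema is a sketch.
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

// Hypothetical builder: the agent catalog only changes the enum values,
// so the model always sees exactly one delegation tool.
function delegateAgentTool(agentNames: readonly string[]): ToolDefinition {
  return {
    type: "function",
    function: {
      name: "delegate_agent",
      description: "Delegate a bounded task to a named sub agent.",
      parameters: {
        type: "object",
        properties: {
          agent: { type: "string", enum: [...agentNames] },
          task: { type: "string" }, // assumed argument, not from the source
        },
        required: ["agent", "task"],
        additionalProperties: false,
      },
    },
  };
}
```

With this shape, adding or removing a sub agent changes one enum list rather than the set of tools, which also gives the permission layer a single invocation target to check.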

The package should stay small enough to understand, strict enough to trust, and open enough to add new OpenAI-compatible vendors without touching the core loop.
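
As an illustration of adding a vendor through configuration alone, the sketch below describes Gemini as an OpenAI-compatible profile. The base URL is the one given in Google's OpenAI-compatibility documentation; the `ProviderProfile` shape, the `apiKeyEnv` field, the example model name, and the `chatCompletionsUrl` helper are hypothetical and may not match demian's real config schema.

```typescript
// Hypothetical profile shape; demian's actual config keys may differ.
interface ProviderProfile {
  kind: "openai-compatible";
  baseUrl: string;
  apiKeyEnv: string; // name of the environment variable holding the key
  model: string;
}

// Gemini expressed purely as data: no new adapter code is involved.
const gemini: ProviderProfile = {
  kind: "openai-compatible",
  baseUrl: "https://generativelanguage.googleapis.com/v1beta/openai/",
  apiKeyEnv: "GEMINI_API_KEY",
  model: "gemini-2.0-flash", // example model name
};

// Resolve the chat completions endpoint relative to the profile's base URL.
// A trailing slash is added first so the last path segment is not dropped.
function chatCompletionsUrl(profile: ProviderProfile): string {
  const base = profile.baseUrl.endsWith("/")
    ? profile.baseUrl
    : `${profile.baseUrl}/`;
  return new URL("chat/completions", base).toString();
}
```

Because the profile is plain data, the same resolution logic would serve any other OpenAI-compatible vendor by swapping `baseUrl`, `apiKeyEnv`, and `model`.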

---

## References

- `nodejs/architecture-by-claude.md`
- `nodejs/architecture-by-codex.md`
- `claude/architecture-demian.md`
- `codex/architecture-demian.md`
- `.documents/multi-agent-architecture.md`
- Google AI for Developers, Gemini API OpenAI compatibility: `https://ai.google.dev/gemini-api/docs/openai`