@yemi33/minions 0.1.1588 → 0.1.1590
- package/CHANGELOG.md +8 -0
- package/bin/minions.js +5 -3
- package/dashboard/js/settings.js +216 -22
- package/dashboard.js +135 -9
- package/docs/copilot-cli-schema.md +637 -0
- package/docs/copilot-output-sample-claude.jsonl +72 -0
- package/docs/copilot-output-sample-default.jsonl +26 -0
- package/docs/copilot-output-sample-gpt4o.jsonl +23 -0
- package/engine/cli.js +250 -18
- package/engine/lifecycle.js +14 -9
- package/engine/llm.js +346 -94
- package/engine/model-discovery.js +167 -0
- package/engine/preflight.js +247 -19
- package/engine/runtimes/claude.js +413 -0
- package/engine/runtimes/copilot.js +566 -0
- package/engine/runtimes/index.js +61 -0
- package/engine/shared.js +299 -63
- package/engine/spawn-agent.js +265 -181
- package/engine.js +118 -31
- package/package.json +1 -1
@@ -0,0 +1,637 @@
# GitHub Copilot CLI — Behavior & Schema Reference

> **Spike output for plan item P-8f2c4d9b.** Authoritative reference for the
> Copilot adapter implementation in P-1d4a8e7c. Every claim in this document is
> grounded in real CLI invocations against `copilot.exe` v1.0.36 on Windows
> (WinGet install, `%LOCALAPPDATA%\Microsoft\WinGet\Links\copilot.exe`).
> Captured samples live alongside this file as
> `copilot-output-sample-{default,claude,gpt4o}.jsonl`.

---

## TL;DR — Adapter Decisions

| Decision | Value | Why |
|---|---|---|
| `capabilities.promptViaArg` | **`false`** | Stdin works in non-interactive mode and dodges the Windows ARG_MAX (~32 KB) limit that breaks `-p "<long-prompt>"` outright (`CreateProcess` rejects it before Copilot even starts). |
| `capabilities.modelDiscovery` | **`true`** | `GET https://api.githubcopilot.com/models` with a `gh auth token` Bearer returns HTTP 200 + a 24-model JSON catalog. |
| `capabilities.streaming` | **`true`** | `--stream on` (default) emits `assistant.message_delta` events incrementally; `--stream off` suppresses deltas but the final `assistant.message` always arrives. |
| `capabilities.sessionResume` | **`true`** | `--resume <session-id>` is documented, and every `result` event emits `sessionId`. |
| `capabilities.systemPromptFile` | **`false`** | No `--system-prompt-file` flag exists. Inject system prompt via a `<system>` block prepended to stdin. |
| `capabilities.effortLevels` | **`true`** | `--effort` accepts `low`, `medium`, `high`, or `xhigh` (no `max`). Adapter must map `'max' → 'xhigh'`. |
| `capabilities.costTracking` | **`false`** | `result.usage` contains `premiumRequests` (count, not USD), no token counts, no cost. |
| `capabilities.modelShorthands` | **`false`** | Models are full IDs (`claude-sonnet-4.5`, `gpt-5.4`). No `sonnet`/`opus`/`haiku` aliasing on the Copilot side. |
| `capabilities.budgetCap` | **`false`** | No `--max-budget-usd` flag. |
| `capabilities.bareMode` | **`false`** | No `--bare`. Closest equivalent is `--no-custom-instructions` (suppresses AGENTS.md only, not all auto-discovery). |
| `capabilities.fallbackModel` | **`false`** | No `--fallback-model` flag. |
| `capabilities.sessionPersistenceControl` | **`false`** | Copilot manages session state internally in `~/.copilot/session-state/`. Engine cannot opt out without `--config-dir`. |

| Default | Value |
|---|---|
| `copilotStreamMode` (default config field) | `'on'` — preserves incremental UX; the adapter parser tolerates either mode. |
| `copilotDisableBuiltinMcps` | `true` — github-mcp-server bypasses Minions' `pull-requests.json` tracking, so the server is off by default. |
| `copilotSuppressAgentsMd` | `true` — Minions injects its own playbook prompt; AGENTS.md auto-load conflicts with that. |
| `copilotReasoningSummaries` | `false` — opt-in; only some models honor it. |

---

## 1. Binary Resolution

### Standalone `copilot` (Windows / WinGet)

```text
PS> where.exe copilot
C:\Users\yemishin\AppData\Local\Microsoft\WinGet\Links\copilot.exe

PS> copilot --version
GitHub Copilot CLI 1.0.36.
```

WinGet installs a shim into `%LOCALAPPDATA%\Microsoft\WinGet\Links\` and adds
that dir to PATH. The adapter's `resolveBinary()` should:

1. Check PATH (`where copilot` / `which copilot`) — the standalone path.
2. If not found, fall back to the `gh copilot` extension (see the
   "`gh copilot` extension" subsection below).
3. Cache the resolved path to `engine/copilot-caps.json` (mirrors
   `engine/claude-caps.json` shape).
4. Never attempt npm-style resolution — Copilot is **not** an npm package.

### `gh copilot` extension (fallback / unconfirmed on this host)

The `gh-copilot` extension is documented at
<https://docs.github.com/en/copilot/github-copilot-in-the-cli>. On this test
machine `gh extension list` returned empty, so this path was **not exercised
empirically**. The adapter contract still needs to support it for hosts without
the WinGet standalone install:

```text
gh extension install github/gh-copilot
gh copilot --help # subcommand of gh
```

When falling back to the extension form, the adapter must return:

```js
{ bin: '<path-to-gh.exe>', native: true, leadingArgs: ['copilot'] }
```

so that `engine/spawn-agent.js` invokes `gh copilot <flags>` rather than
`copilot <flags>`. **Important caveat**: the `gh copilot` extension is the
older "explain/suggest" UX and may not support the same flag set as the
standalone `copilot` v1.0.36 (especially `--output-format json`, `--autopilot`,
`--allow-all`). Until empirically validated, treat the `gh copilot` path as
**best-effort** — the adapter should detect missing flags via stderr probe and
warn at preflight.

### Recommended `resolveBinary()` cache shape

```json
{
  "copilotBin": "C:\\Users\\yemishin\\AppData\\Local\\Microsoft\\WinGet\\Links\\copilot.exe",
  "copilotIsNative": true,
  "leadingArgs": [],
  "version": "1.0.36",
  "resolvedAt": "2026-04-28T04:00:00Z"
}
```
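
The resolution order above can be sketched as follows. This is a minimal illustration, not the adapter's actual code: `findOnPath` and `listGhExtensions` are hypothetical injected helpers (so the order can be exercised without a real PATH or a real `gh`), and the `version` field is left out because filling it requires running the binary.

```javascript
'use strict';

// Hypothetical sketch of the resolution order. Both lookups are injected:
//   findOnPath(name)         -> absolute path or null
//   listGhExtensions(ghPath) -> array of installed extension names
function resolveBinary({ findOnPath, listGhExtensions, now = () => new Date() }) {
  const record = (bin, leadingArgs) => ({
    copilotBin: bin,
    copilotIsNative: true,
    leadingArgs,
    resolvedAt: now().toISOString(),
  });

  // 1. Standalone binary on PATH wins.
  const standalone = findOnPath('copilot');
  if (standalone) return record(standalone, []);

  // 2. Fall back to `gh copilot` only when gh exists and lists the extension.
  const gh = findOnPath('gh');
  if (gh && listGhExtensions(gh).some((name) => name.includes('gh-copilot'))) {
    return record(gh, ['copilot']);
  }

  // 3. Nothing usable. No npm-style lookup is attempted; preflight reports it.
  return null;
}
```

Returning `null` rather than throwing keeps the decision about how loudly to warn in the caller, which matches the best-effort status of the `gh copilot` path.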

---

## 2. Prompt Delivery — `promptViaArg: false` (stdin)

### Empirical results

```powershell
# Test A: stdin without -p — works.
"Say only the word: pong" |
  copilot --output-format json -s --allow-all --no-ask-user --autopilot --log-level error
# EXIT=0; user.message.data.content = "Say only the word: pong\r\n"; assistant replied "pong".

# Test B: -p "<40_000-char string>" — Windows OS rejects spawn.
$big = "x" * 40000
copilot -p $big --output-format json -s --allow-all --no-ask-user --autopilot --log-level error
# Program 'copilot.exe' failed to run:
# The filename or extension is too long.
# (CreateProcess ARG_MAX limit, ~32 KB on Windows)
```

### Decision

Set `capabilities.promptViaArg = false`. The adapter **does not** emit
`--prompt <text>` in args; instead, `engine/spawn-agent.js` pipes the final
prompt (system block prepended) via stdin. This:

- Sidesteps the Windows 32 KB ARG_MAX cliff for any prompt that bundles
  `pinned.md`, `notes.md`, knowledge-base entries, and a playbook (Minions
  prompts routinely run 20–60 KB).
- Mirrors the proven Claude path (also `promptViaArg: false`).
- Eliminates the need to investigate `--prompt @tmpfile` syntax (open question
  in the PRD) — that syntax does not appear in `copilot --help` output for v1.0.36.
  The `@<path>` prefix syntax is only documented for `--additional-mcp-config`,
  not `--prompt`.

### `buildPrompt(promptText, sysPromptText)` — recommended impl

Copilot has no `--system-prompt-file`. Inject the system prompt as a `<system>`
block prepended to the user prompt, mirroring the convention used by
[Anthropic's tool-use docs](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use):

```js
function buildPrompt(promptText, sysPromptText) {
  const user = promptText == null ? '' : String(promptText);
  if (!sysPromptText) return user;
  return `<system>\n${sysPromptText}\n</system>\n\n${user}`;
}
```

Combined with `--no-custom-instructions` (default-on per
`copilotSuppressAgentsMd`), this keeps AGENTS.md content out of the prompt, so
the agent sees what Minions sent plus only the datetime and reminder block that
Copilot itself prepends (visible as `user.message.transformedContent`, §5.1).
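
To make the stdin path concrete, here is a minimal sketch of delivering a prompt through the child's stdin with Node's `child_process.spawnSync`. The helper name `runWithStdinPrompt` is hypothetical, and the synchronous call is used only for brevity; an engine that tails live output would presumably use the asynchronous `spawn` and write to `child.stdin` instead.

```javascript
'use strict';
const { spawnSync } = require('node:child_process');

// Hypothetical helper: run a CLI once with the whole prompt on stdin, so the
// prompt size never counts against the command-line length limit.
function runWithStdinPrompt(bin, args, prompt) {
  const res = spawnSync(bin, args, {
    input: prompt,               // written to the child's stdin, then stdin is closed
    encoding: 'utf8',
    maxBuffer: 64 * 1024 * 1024, // leave room for a long JSONL stream
    windowsHide: true,
  });
  if (res.error) throw res.error; // spawn-level failure, e.g. binary not found
  return { status: res.status, stdout: res.stdout, stderr: res.stderr };
}
```

A call would look like `runWithStdinPrompt(resolved.copilotBin, flags, buildPrompt(user, sys))`, where `resolved` and `flags` stand for whatever the binary resolution and flag-building steps produced.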

---

## 3. Required Headless Flag Set

Empirically confirmed flags for non-interactive Copilot invocations:

| Flag | Required? | Effect |
|---|---|---|
| `--output-format json` | **required** | Switches stdout to JSONL (one event per line). Default is `text`. |
| `-s` / `--silent` | recommended | Suppresses chatty stats lines; only the agent JSONL stream remains. |
| `--allow-all` | **required** | Equivalent to `--allow-all-tools --allow-all-paths --allow-all-urls`. Without this the CLI prompts for every tool/path use, which deadlocks in stdin/stdout mode. |
| `--no-ask-user` | **required** | Removes the `ask_user` tool. Without it the agent can stall waiting for human input. |
| `--autopilot` | for multi-turn agency | Enables `task_complete`-driven multi-turn loop. **Without it** the session ends after one assistant response (see §3.1). |
| `--log-level error` | recommended | Suppresses INFO/DEBUG diagnostics that aren't part of the JSONL stream. |
| `--no-custom-instructions` | gated by config | Disables AGENTS.md auto-load. Default-on for Minions (`copilotSuppressAgentsMd: true`). |
| `--disable-builtin-mcps` | gated by config | Disables `github-mcp-server`. Default-on for Minions (`copilotDisableBuiltinMcps: true`) to prevent split-brain PR creation. |
| `--no-color` | optional | Cosmetic; safe to omit when `--output-format json`. |
| `--plain-diff` | optional | Cosmetic; the agent's diff rendering doesn't appear in JSONL stream anyway. |
| `--max-autopilot-continues N` | optional | Maps from `opts.maxTurns`. |
| `--effort <level>` | optional | Choices: `low`, `medium`, `high`, `xhigh`. **No `max`** — adapter must map `'max' → 'xhigh'`. |
| `--model <id>` | optional | Full model ID (see §6 for the catalog). |
| `--resume=<session-id>` | optional | Maps from `opts.sessionId`. Note the `=` syntax — `--resume <id>` is also accepted but `--resume` standalone enters interactive picker. |
| `--stream on` / `--stream off` | optional | Default is `on`. See §4. |
| `--enable-reasoning-summaries` | optional | Maps from `opts.reasoningSummaries`; only Anthropic models populate `assistant.reasoning_delta`. |
| `--add-dir <path>` | injected by spawn-agent | Same role as on the Claude path — registers extra read-allowed dirs (skill discovery). |
| `-v` / `--verbose` | **never emit** | Does not exist on Copilot. The Claude adapter emits `--verbose`; the Copilot adapter MUST NOT. |
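
As an illustration of how the table could translate into an argument builder, the sketch below assembles the flag list from two plain objects. The function name `buildArgs`, the `opts.addDirs` and `opts.effort` fields, and the exact shape of `cfg` are assumptions made for this example; only the flags themselves come from the table.

```javascript
'use strict';

// Hypothetical sketch: opts = per-run options, cfg = Minions config fields.
function buildArgs(opts = {}, cfg = {}) {
  // Always-on headless set (required + recommended rows of the table).
  const args = [
    '--output-format', 'json', '-s', '--allow-all', '--no-ask-user',
    '--autopilot', '--log-level', 'error',
  ];

  // Config-gated flags; both default to on.
  if (cfg.copilotSuppressAgentsMd !== false) args.push('--no-custom-instructions');
  if (cfg.copilotDisableBuiltinMcps !== false) args.push('--disable-builtin-mcps');
  if (cfg.copilotStreamMode === 'off') args.push('--stream', 'off');

  // Optional per-run flags.
  if (opts.model) args.push('--model', opts.model);
  if (opts.effort) args.push('--effort', opts.effort === 'max' ? 'xhigh' : opts.effort);
  if (Number.isInteger(opts.maxTurns)) {
    args.push('--max-autopilot-continues', String(opts.maxTurns));
  }
  if (opts.sessionId) args.push(`--resume=${opts.sessionId}`);
  if (opts.reasoningSummaries) args.push('--enable-reasoning-summaries');
  for (const dir of opts.addDirs || []) args.push('--add-dir', dir);

  // Deliberately absent: --verbose (does not exist) and --prompt (stdin is used).
  return args;
}
```

For example, `buildArgs({ effort: 'max', sessionId: 'abc' })` yields the headless set followed by `--effort xhigh` and `--resume=abc`.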

### 3.1 `--autopilot` vs single-shot

| Mode | Terminal event | When to use |
|---|---|---|
| `--autopilot` | `session.task_complete` → `result` | Multi-turn agent work (implement / fix / review). The agent calls the `task_complete` tool with a summary and Copilot ends the session. |
| no `--autopilot` | `assistant.turn_end` → `result` | One-shot Q&A. Fewer events; no `session.task_complete`, no `session.info`. Closer match for CC / doc-chat use cases that don't need multi-turn. |

The Minions agent path (engine.js dispatch) uses autopilot. CC and doc-chat in
`engine/llm.js` should also use autopilot — they need tool use even when only
one assistant turn is expected — but the parser must tolerate the absence of
`session.task_complete` because some early-exit paths skip it.

---

## 4. Streaming — `--stream on` vs `--stream off`

Empirical comparison (single 4-character `pong` reply):

| Property | `--stream on` (default) | `--stream off` |
|---|---|---|
| `assistant.message_delta` events | **1+** (delta-coded; chunks the response as tokens arrive) | **0** (suppressed) |
| `assistant.message` (final) | **1** | **1** |
| Other events | identical | identical |
| Stdout shape | one JSON object per line | one JSON object per line |
| Time-to-first-token | low | high (waits for full response) |

### Parser implications

```js
// Pseudocode — accumulate deltas, but ALWAYS use assistant.message as truth.
let buffered = '';
for (const ev of events) {
  if (ev.type === 'assistant.message_delta') {
    buffered += ev.data.deltaContent;
    emit({ kind: 'partial-text', text: buffered });
  } else if (ev.type === 'assistant.message') {
    // Authoritative final content. Replace buffered text — never concat.
    emit({ kind: 'final-text', text: ev.data.content, messageId: ev.data.messageId });
    buffered = '';
  }
}
```

The parser must handle three cases:
1. `--stream on`, response < 1 chunk: zero deltas, one message. (Common for short replies — see the gpt-4.1 sample.)
2. `--stream on`, response with deltas: N deltas + 1 message (treat message as truth).
3. `--stream off`: zero deltas, one message.

### Recommendation

Default `copilotStreamMode = 'on'` so the engine's streaming UI (live-output.log
tailing, dashboard progress feed) gets incremental updates. The parser tolerates
both, so users who want bandwidth-efficient batch responses can flip to `off`.

---

## 5. JSONL Event Schema

Captured against three model invocations:
- `copilot-output-sample-default.jsonl` — `gpt-5.4` (Copilot's default; OpenAI Codex variant)
- `copilot-output-sample-claude.jsonl` — `claude-sonnet-4.5`
- `copilot-output-sample-gpt4o.jsonl` — `gpt-4.1` (note: `gpt-4o` itself is no longer in the API catalog; closest enabled OpenAI model is `gpt-4.1`)

### 5.1 Event Type Inventory

| Event type | Default (gpt-5.4) | Claude Sonnet 4.5 | GPT-4.1 | Stream-on only? | Notes |
|---|:-:|:-:|:-:|:-:|---|
| `session.mcp_server_status_changed` | ✓ | ✓ | ✓ | no | Per-server connect/disconnect transitions. `data.status` is one of `connecting`/`connected`/`disabled`/`error`. |
| `session.mcp_servers_loaded` | ✓ | ✓ | ✓ | no | Snapshot of all MCP servers + their final status. |
| `session.skills_loaded` | ✓ | ✓ | ✓ | no | List of discovered skills (`source` is `builtin`, `project`, or `plugin`). |
| `session.tools_updated` | ✓ | ✓ | ✓ | no | Diagnostic; `data` only has `{ model }` — does **not** list available tools. |
| `session.info` | ✓ | ✓ | – | no (autopilot only) | `infoType: autopilot_continuation` — emitted between turns when autopilot continues. |
| `session.task_complete` | ✓ | ✓ | ✓ | no (autopilot only) | Terminal-of-session signal in autopilot mode. `data.success: bool`, `data.summary: string`. |
| `user.message` | ✓ | ✓ | ✓ | no | Echo of the user prompt, with `transformedContent` showing what the agent actually saw (datetime + reminder block prepended). |
| `assistant.turn_start` | ✓ | ✓ | ✓ | no | Per-turn delimiter. |
| `assistant.turn_end` | ✓ | ✓ | ✓ | no | Per-turn delimiter; pairs with `turn_start`. |
| `assistant.reasoning` | ✓ | ✓ | – | no | Encrypted reasoning blob (`reasoningOpaque`). Absent for non-reasoning models like GPT-4.1. |
| `assistant.reasoning_delta` | – | ✓ | – | yes | **Anthropic-only.** Streamed reasoning text — Claude exposes plain `reasoningText`, OpenAI does not. |
| `assistant.message_delta` | ✓ | ✓ | ✓ | **yes** | Per-token streamed delta. Only emitted with `--stream on`. |
| `assistant.message` | ✓ | ✓ | ✓ | no | Authoritative final assistant content for the turn. |
| `tool.execution_start` | ✓ | ✓ | ✓ | no | Tool call begin. `data.toolName`, `data.arguments`. |
| `tool.execution_complete` | ✓ | ✓ | ✓ | no | Tool call end. `data.success: bool`, `data.result.{content, detailedContent}`. |
| `result` | ✓ | ✓ | ✓ | no | Final aggregate. Carries `usage`, `sessionId`, and `exitCode` as top-level fields (see the sample in §5.4). |
| `function` | (in stdin-no-`-p` test) | – | – | no | Observed once in an early stdin test; appears to be a meta event for tool invocation. **Treat as ignorable** unless future spike re-confirms its semantics. |

All events share the envelope `{ type, data, id, timestamp, parentId, ephemeral? }`.
`ephemeral: true` marks events that the Copilot UI hides from the persistent
session log (e.g., deltas, MCP loading noise). The parser should ignore the
`ephemeral` flag — it's a UI hint, not a parser hint.

### 5.2 Provider-Driven Schema Variation

This is the **biggest gotcha for the parser** — `assistant.message.data` carries
provider-specific fields:

| Field on `assistant.message.data` | Default (gpt-5.4 / Codex) | Claude Sonnet 4.5 | GPT-4.1 |
|---|:-:|:-:|:-:|
| `messageId` | ✓ | ✓ | ✓ |
| `content` | ✓ | ✓ | ✓ |
| `interactionId` | ✓ | ✓ | ✓ |
| `requestId` | ✓ | ✓ | ✓ |
| `outputTokens` | ✓ (52) | ✓ | ✓ |
| `toolRequests` | ✓ | ✓ | ✓ |
| `reasoningOpaque` | ✓ | ✓ | – |
| `reasoningText` | – | ✓ | – |
| `encryptedContent` | ✓ | – | – |
| `phase` | ✓ (`final_answer`) | – | – |

### 5.3 Defensive Parser Rules

1. **Whitelist the events you care about**, route everything else to a
   `type: 'ignore'` bucket. The schema clearly has provider-specific extensions,
   and Copilot's release cadence means new event types will appear without
   warning.
2. **Never assume optional fields exist.** `outputTokens` is consistently
   present, but `reasoningText`/`reasoningOpaque`/`encryptedContent`/`phase`
   are provider-dependent. Prefer `?.` access; default missing numerics to
   `null`, not `0`.
3. **Use `assistant.message.data.content` as the authoritative response.**
   Do not concatenate `assistant.message_delta` deltas into your final result —
   they're a streaming hint, not the source of truth.
4. **The terminal signal differs by mode.** In autopilot, watch for
   `session.task_complete` (and then `result`); in single-shot, watch for
   `result` directly (no `task_complete`).
5. **`exitCode` lives on the `result` event**, not on the process. The CLI
   process always returns 0 even when the agent failed mid-turn — surface
   `result.exitCode !== 0` as the actual failure signal.
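
The rules above can be sketched as a small line-oriented parser. This is an illustrative reduction, not the adapter's parser: the names `classifyLine` and `summarize` are hypothetical, the whitelist is trimmed to four event types, and treating a missing `result` event as a failure is an assumption of this sketch.

```javascript
'use strict';

// Rule 1: a whitelist; anything else is routed to the 'ignore' bucket.
const KNOWN = new Set([
  'assistant.message', 'assistant.message_delta', 'session.task_complete', 'result',
]);

// Parse one stdout line. Non-JSON noise and unknown event types are ignored.
function classifyLine(line) {
  let ev;
  try { ev = JSON.parse(line); } catch { return { type: 'ignore' }; }
  if (!ev || typeof ev !== 'object' || !KNOWN.has(ev.type)) return { type: 'ignore' };
  return ev;
}

function summarize(lines) {
  const out = { text: null, taskComplete: null, exitCode: null, sessionId: null, failed: true };
  for (const line of lines) {
    const ev = classifyLine(line);
    if (ev.type === 'assistant.message') {
      out.text = ev.data?.content ?? out.text;          // rules 2 and 3
    } else if (ev.type === 'session.task_complete') {
      out.taskComplete = ev.data?.success ?? null;      // rule 4 (autopilot only)
    } else if (ev.type === 'result') {
      out.exitCode = ev.exitCode ?? null;               // rule 5
      out.sessionId = ev.sessionId ?? null;
    }
    // Deltas are whitelisted for live display but never feed the final text.
  }
  out.failed = out.exitCode !== 0; // assumption: no result event counts as failure
  return out;
}
```

Because `taskComplete` stays `null` when no `session.task_complete` arrives, callers can tell single-shot runs and early exits apart from an explicit `success: false`.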

### 5.4 Result / Usage Shape (no cost tracking)

```json
{
  "type": "result",
  "timestamp": "2026-04-28T04:11:36.109Z",
  "sessionId": "8a216c49-e51c-4eef-9405-bf83298fced2",
  "exitCode": 0,
  "usage": {
    "premiumRequests": 2,
    "totalApiDurationMs": 5485,
    "sessionDurationMs": 9103,
    "codeChanges": {
      "linesAdded": 0,
      "linesRemoved": 0,
      "filesModified": []
    }
  }
}
```

**Critical**: Copilot does **not** emit `total_cost_usd` or per-token counts
(input/output/cache_*). The closest proxy is `premiumRequests` — a unitless
count of premium-tier requests consumed in the session. The adapter's
`parseOutput()` must map this onto the engine's usage shape with NULLs (not
zeros) for fields Copilot doesn't expose, so dashboard cost telemetry can
distinguish "Copilot didn't tell us" from "this turn cost $0":

```js
{
  costUsd: null, // ← not 0; Copilot doesn't report this
  inputTokens: null,
  outputTokens: <sum of assistant.message.data.outputTokens>, // recovered from per-turn events
  cacheRead: null,
  cacheCreation: null,
  durationMs: result.usage.totalApiDurationMs ?? 0,
  numTurns: <count of assistant.turn_end events>,
  // Copilot-specific extension:
  premiumRequests: result.usage.premiumRequests ?? 0,
}
```
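
A runnable version of that mapping might look like the sketch below. The function name `toEngineUsage` is hypothetical, and it assumes the events have already been parsed from JSONL into objects; the field names and the `?? 0` defaults follow the shape shown above.

```javascript
'use strict';

// Hypothetical sketch: fold parsed events into the engine usage shape.
function toEngineUsage(events) {
  let outputTokens = null; // stays null if no message reported a count
  let numTurns = 0;
  let result = null;

  for (const ev of events) {
    if (ev.type === 'assistant.message' && typeof ev.data?.outputTokens === 'number') {
      outputTokens = (outputTokens ?? 0) + ev.data.outputTokens;
    } else if (ev.type === 'assistant.turn_end') {
      numTurns += 1;
    } else if (ev.type === 'result') {
      result = ev;
    }
  }

  return {
    costUsd: null,        // Copilot does not report cost
    inputTokens: null,    // nor input or cache token counts
    outputTokens,
    cacheRead: null,
    cacheCreation: null,
    durationMs: result?.usage?.totalApiDurationMs ?? 0,
    numTurns,
    premiumRequests: result?.usage?.premiumRequests ?? 0,
  };
}
```

Keeping `outputTokens` at `null` until at least one message reports a count preserves the "didn't tell us" versus "zero" distinction for that field as well.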

---

## 6. Model Discovery

`GET https://api.githubcopilot.com/models` with a Bearer token works.
Empirical result on this host:

```http
GET https://api.githubcopilot.com/models
Authorization: Bearer <gh-cli-token>

200 OK
{ "data": [ <24 model objects> ] }
```

### Token resolution

The adapter should resolve the bearer in this priority:

1. `process.env.GH_TOKEN`
2. `process.env.COPILOT_GITHUB_TOKEN`
3. (Optional best-effort) shell out to `gh auth token` — already works on
   this host since `gh auth status` shows an active session.

`gh auth token` is the authoritative path on developer machines, but spawning
`gh` adds an extra dependency. The adapter should try the env vars first and
fall back to `gh auth token` only if both are unset. A token must **never be
required at `listModels()` time**: when none resolves, return `null` and let
the dashboard fall back to free-text.
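
The priority order can be sketched as below. The names `resolveToken` and `defaultGhAuthToken` are hypothetical; the `gh` call is injectable so the fallback can be exercised without `gh` installed, and the 5-second timeout is an arbitrary choice for this example.

```javascript
'use strict';
const { execFileSync } = require('node:child_process');

// Best-effort shell-out; throws if gh is missing or not logged in.
function defaultGhAuthToken() {
  return execFileSync('gh', ['auth', 'token'], {
    encoding: 'utf8',
    timeout: 5000,
    stdio: ['ignore', 'pipe', 'ignore'],
  });
}

// Hypothetical sketch: env vars first, then `gh auth token`, else null.
function resolveToken(env = process.env, runGhAuthToken = defaultGhAuthToken) {
  const fromEnv = env.GH_TOKEN || env.COPILOT_GITHUB_TOKEN;
  if (fromEnv) return fromEnv.trim();
  try {
    const out = runGhAuthToken();
    return out && out.trim() ? out.trim() : null;
  } catch {
    return null; // non-fatal: the caller falls back to free-text model entry
  }
}
```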

### Response shape (24 models on the test account)

```json
{
  "data": [
    {
      "id": "claude-sonnet-4.5",
      "name": "Claude Sonnet 4.5",
      "vendor": "Anthropic",
      "object": "model",
      "version": "claude-sonnet-4.5",
      "preview": false,
      "model_picker_enabled": true,
      "model_picker_category": "powerful",
      "policy": { "state": "enabled", "terms": "..." },
      "supported_endpoints": ["/v1/messages", "/chat/completions"],
      "capabilities": {
        "type": "chat",
        "tokenizer": "o200k_base",
        "family": "claude-sonnet-4.5",
        "limits": { "max_context_window_tokens": 200000, "max_output_tokens": 16000, ... },
        "supports": {
          "streaming": true,
          "tool_calls": true,
          "vision": true,
          "structured_outputs": true,
          "parallel_tool_calls": true,
          "reasoning_effort": ["low", "medium", "high"],
          "adaptive_thinking": true,
          "max_thinking_budget": 32000,
          "min_thinking_budget": 1024
        }
      }
    }
  ]
}
```

### Models seen on this account (snapshot)

```text
claude-haiku-4.5                  Claude Haiku 4.5          Anthropic     enabled
claude-opus-4.5                   Claude Opus 4.5           Anthropic     enabled
claude-opus-4.6                   Claude Opus 4.6           Anthropic     enabled
claude-opus-4.6-1m                Claude Opus 4.6 (1M ctx)  Anthropic     enabled
claude-opus-4.7                   Claude Opus 4.7           Anthropic     enabled
claude-sonnet-4                   Claude Sonnet 4           Anthropic     enabled
claude-sonnet-4.5                 Claude Sonnet 4.5         Anthropic     enabled
claude-sonnet-4.6                 Claude Sonnet 4.6         Anthropic     enabled
gpt-3.5-turbo                     GPT 3.5 Turbo             Azure OpenAI  (no policy)
gpt-3.5-turbo-0613                GPT 3.5 Turbo             Azure OpenAI  (no policy)
gpt-4.1                           GPT-4.1                   Azure OpenAI  enabled
gpt-4.1-2025-04-14                GPT-4.1                   Azure OpenAI  enabled
gpt-4o-mini                       GPT-4o mini               Azure OpenAI  (no policy)
gpt-4o-mini-2024-07-18            GPT-4o mini               Azure OpenAI  (no policy)
gpt-5-mini                        GPT-5 mini                Azure OpenAI  enabled
gpt-5.2                           GPT-5.2                   OpenAI        enabled
gpt-5.2-codex                     GPT-5.2-Codex             OpenAI        enabled
gpt-5.3-codex                     GPT-5.3-Codex             OpenAI        enabled
gpt-5.4                           GPT-5.4                   OpenAI        enabled (account default)
gpt-5.4-mini                      GPT-5.4 mini              OpenAI        (no policy)
gpt-5.5                           GPT-5.5                   OpenAI        enabled
text-embedding-3-small            Embedding V3 small        Azure OpenAI  (no streaming)
text-embedding-3-small-inference                            Azure OpenAI  (no streaming)
text-embedding-ada-002            Embedding V2 Ada          Azure OpenAI  (no streaming)
```

### Adapter mapping (for `listModels()`)

```js
function listModels() {
  // ... HTTP GET as above, on any error return null (non-fatal)
  return data
    .filter(m => m.capabilities?.type === 'chat') // drop embeddings
    .filter(m => m.policy?.state === 'enabled' || m.preview) // drop disabled
    .map(m => ({
      id: m.id,
      name: m.name,
      provider: m.vendor,
    }));
}
```
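
For completeness, here is one way the elided HTTP step could be filled in. It is a sketch under assumptions: it relies on the global `fetch` available in Node 18+, the split into `toPickerModels` and an async `listModels(token, fetchImpl)` is a choice made for testability, and the endpoint may expect additional headers that this document does not record.

```javascript
'use strict';

// Pure mapping step: the same filters as the adapter mapping, on a parsed catalog.
function toPickerModels(catalog) {
  const data = Array.isArray(catalog?.data) ? catalog.data : [];
  return data
    .filter((m) => m.capabilities?.type === 'chat')            // drop embeddings
    .filter((m) => m.policy?.state === 'enabled' || m.preview) // drop models without an enabled policy
    .map((m) => ({ id: m.id, name: m.name, provider: m.vendor }));
}

// Hypothetical async wrapper: any failure resolves to null (non-fatal).
async function listModels(token, fetchImpl = globalThis.fetch) {
  if (!token || typeof fetchImpl !== 'function') return null;
  try {
    const res = await fetchImpl('https://api.githubcopilot.com/models', {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!res.ok) return null;
    return toPickerModels(await res.json());
  } catch {
    return null;
  }
}
```

Separating the mapping from the network call means the filter rules can be checked against a captured catalog without a token.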
|
|
459
|
+
|
|
460
|
+
### Subscription-tier note
|
|
461
|
+
|
|
462
|
+
`policy.state` is `"enabled"` or **absent** (no key) — never explicitly
|
|
463
|
+
`"disabled"` in this snapshot. Models the user lacks entitlement for simply
|
|
464
|
+
omit the policy block. The adapter should treat missing `policy.state` as
|
|
465
|
+
"hide from default picker" but still expose them via free-text override —
|
|
466
|
+
matches the `model_picker_enabled` field semantics.
|
|
467
|
+
|
|
468
|
+
`gpt-4o` is no longer present as a top-level model — only `gpt-4o-mini` remains
|
|
469
|
+
(and is unlisted in the picker). The plan's `copilot-output-sample-gpt4o.jsonl`
|
|
470
|
+
is named for the spec but actually captures `gpt-4.1`, the closest enabled
|
|
471
|
+
OpenAI model. **The adapter implementer (P-1d4a8e7c) should reference
|
|
472
|
+
`gpt-4.1` as the canonical OpenAI test model** — if `gpt-4o` returns to the
|
|
473
|
+
catalog, treat it as a future regression.
|
|
474
|
+
|
|
475
|
+
---
|
|
476
|
+
|
|
477
|
+
## 7. Verifying `--no-custom-instructions` and `--disable-builtin-mcps`
|
|
478
|
+
|
|
479
|
+
### `--no-custom-instructions` (AGENTS.md auto-load)
|
|
480
|
+
|
|
481
|
+
Constructed test: created `AGENTS.md` in cwd with content
|
|
482
|
+
`Always end every response with the marker: __AGENTS_LOADED__`, then ran:
|
|
483
|
+
|
|
484
|
+
```text
|
|
485
|
+
# A) Default behavior — AGENTS.md is loaded
|
|
486
|
+
PS> "Just say hello." | copilot --output-format json -s --allow-all --no-ask-user --autopilot --log-level error
|
|
487
|
+
{"type":"assistant.message", ..., "content": "Hello. __AGENTS_LOADED__"} ← marker present
|
|
488
|
+
|
|
489
|
+
# B) With --no-custom-instructions
|
|
490
|
+
PS> "Just say hello." | copilot --output-format json -s --allow-all --no-ask-user --autopilot --log-level error --no-custom-instructions
|
|
491
|
+
{"type":"assistant.message", ..., "content": ""} ← no marker; AGENTS.md ignored
|
|
492
|
+
```
|
|
493
|
+
|
|
494
|
+
**Confirmed**: `--no-custom-instructions` suppresses AGENTS.md auto-load. The
|
|
495
|
+
flag does **not** affect skills loading (project skills under `.claude/skills/`
|
|
496
|
+
still appear in `session.skills_loaded`) — it's narrowly scoped to AGENTS-style
|
|
497
|
+
custom instruction files.
|
|
498
|
+
|
|
499
|
+
### `--disable-builtin-mcps` (github-mcp-server)
|
|
500
|
+
|
|
501
|
+
```text
|
|
502
|
+
# Default — server connects, status: "connected"
|
|
503
|
+
{"type":"session.mcp_servers_loaded","data":{"servers":[{"name":"github-mcp-server","status":"connected","source":"builtin"}]}}
|
|
504
|
+
|
|
505
|
+
# With --disable-builtin-mcps — server appears, status: "disabled"
|
|
506
|
+
{"type":"session.mcp_servers_loaded","data":{"servers":[{"name":"github-mcp-server","status":"disabled","source":"builtin"}]}}
|
|
507
|
+
```
|
|
508
|
+
|
|
509
|
+
**Confirmed**: the flag flips `status` from `"connected"` to `"disabled"`. The
|
|
510
|
+
server is still *registered* (Copilot inventories it for diagnostics), but the
|
|
511
|
+
agent cannot use its tools. This is the desired behavior — Minions wants the
|
|
512
|
+
server invisible to the agent so all GitHub mutations route through the
|
|
513
|
+
project's `pull-requests.json` tracker rather than spawning ghost PRs.
|
|
514
|
+
|
|
515
|
+
> **Tooltip text for the dashboard `copilotDisableBuiltinMcps` toggle**
|
|
516
|
+
> (per P-7a5c1f8e):
|
|
517
|
+
>
|
|
518
|
+
> > When OFF, agents can autonomously create PRs / labels / comments via the
|
|
519
|
+
> > github-mcp-server, bypassing Minions' `pull-requests.json` tracking. Leave
|
|
520
|
+
> > this ON unless you have a specific reason to expose the server.
|
|
521
|
+
|
|
522
|
+
---
|
|
523
|
+
|
|
524
|
+
## 8. Effort Level Mapping
|
|
525
|
+
|
|
526
|
+
```text
|
|
527
|
+
PS> copilot --help | findstr /C:"reasoning-effort"
|
|
528
|
+
--effort, --reasoning-effort <level> Set the reasoning effort level (choices:
|
|
529
|
+
"low", "medium", "high", "xhigh")
|
|
530
|
+
```
|
|
531
|
+
|
|
532
|
+
Only four valid values: `low`, `medium`, `high`, `xhigh`. The Claude adapter
|
|
533
|
+
accepts `'max'` (verbatim) — Copilot does **not**. The adapter must map
|
|
534
|
+
`'max' → 'xhigh'` and pass everything else verbatim:
|
|
535
|
+
|
|
536
|
+
```js
|
|
537
|
+
function _mapEffort(level) {
|
|
538
|
+
if (level === 'max') return 'xhigh';
|
|
539
|
+
return level;
|
|
540
|
+
}
|
|
541
|
+
```
|
|
542
|
+
|
|
543
|
+
Per the model catalog (§6), Anthropic models advertise
|
|
544
|
+
`reasoning_effort: ["low", "medium", "high"]` — note the absence of `xhigh`.
|
|
545
|
+
OpenAI Codex variants advertise the full four-level set. Passing
|
|
546
|
+
`--effort xhigh --model claude-sonnet-4.5` is unverified; the safe behavior is
|
|
547
|
+
to honor whatever the user requests and let Copilot reject it at the API layer if
|
|
548
|
+
unsupported (the parser will surface that as an error event).
|
|
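The pass-through policy above could be paired with a non-blocking warning, as in the sketch below. This is an illustration under assumptions: `effortArgs`, `mapEffort`, and the `advertised` parameter (the model's `reasoning_effort` list from the catalog) are hypothetical names, and the returned `{ args, warning }` shape is not taken from the adapter.

```js
// The four choices listed by `copilot --help`.
const COPILOT_EFFORTS = ['low', 'medium', 'high', 'xhigh'];

// Same mapping as _mapEffort above: only 'max' is rewritten.
function mapEffort(level) {
  return level === 'max' ? 'xhigh' : level;
}

// Hypothetical helper: build the effort flag and, optionally, a warning.
// The flag is always emitted when a level is given, so Copilot itself
// remains the authority on whether the combination is accepted.
function effortArgs(level, advertised) {
  if (level == null || level === '') return { args: [], warning: null };

  const mapped = mapEffort(level);
  let warning = null;

  if (!COPILOT_EFFORTS.includes(mapped)) {
    warning = `effort "${mapped}" is not one of: ${COPILOT_EFFORTS.join(', ')}`;
  } else if (Array.isArray(advertised) && !advertised.includes(mapped)) {
    warning = `model does not advertise effort "${mapped}"`;
  }

  return { args: ['--effort', mapped], warning };
}
```

For example, `effortArgs('max', ['low', 'medium', 'high'])` still returns `['--effort', 'xhigh']` but carries a warning the caller can log before the run.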
549
|
+
|
|
550
|
+
---
|
|
551
|
+
|
|
552
|
+
## 9. ARG_MAX on Windows — confirmed cliff at 32 KB
|
|
553
|
+
|
|
554
|
+
```text
|
|
555
|
+
PS> $big = "x" * 40000
|
|
556
|
+
PS> copilot -p $big --output-format json ...
|
|
557
|
+
|
|
558
|
+
Program 'copilot.exe' failed to run:
|
|
559
|
+
An error occurred trying to start process 'copilot.exe' ...
|
|
560
|
+
The filename or extension is too long.
|
|
561
|
+
```
|
|
562
|
+
|
|
563
|
+
Windows' `CreateProcess` caps the command line at 32,767 characters (`lpCommandLine`
|
|
564
|
+
limit; the count includes the terminating null and covers all argv concatenated).
|
|
565
|
+
A 40 KB `--prompt` arg is rejected before the binary even starts.
|
|
566
|
+
|
|
567
|
+
**Mitigation** (already adopted): pipe via stdin (§2). Stdin is unaffected by
|
|
568
|
+
ARG_MAX; tested with the same 40 KB string via PowerShell `|` and Copilot
|
|
569
|
+
processed it without complaint (full prompt arrived in
|
|
570
|
+
`user.message.data.content`).
|
|
571
|
+
|
|
572
|
+
Linux/macOS ARG_MAX is far higher (typically 128 KB to 2 MB), but stdin is
|
|
573
|
+
still preferred — keeps the adapter cross-platform and avoids surprise on
|
|
574
|
+
edge cases like `xargs`-style chaining.
|
|
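The stdin mitigation can be sketched in Node as follows. This is a minimal illustration, not the adapter's implementation: `runWithStdinPrompt` and `commandLineLength` are hypothetical names, and `spawnSync` is used only to keep the example short (a real adapter would more likely stream via the asynchronous `spawn`).

```js
const { spawnSync } = require('node:child_process');

// Rough length of the command line the OS would see if everything went via argv.
// Useful as a guard before choosing between `-p <prompt>` and stdin.
function commandLineLength(bin, args) {
  return [bin, ...args].join(' ').length;
}

// Hypothetical helper: run a CLI with the prompt delivered on stdin, so the
// prompt size never counts against the Windows command-line limit.
function runWithStdinPrompt(bin, args, prompt) {
  const result = spawnSync(bin, args, {
    input: prompt,               // written to the child's stdin, then closed
    encoding: 'utf8',
    maxBuffer: 64 * 1024 * 1024, // allow large JSONL output
  });
  if (result.error) throw result.error; // e.g. binary not found
  return { status: result.status, stdout: result.stdout, stderr: result.stderr };
}
```

With this shape the argv stays short and constant regardless of prompt size, which is the property the 40 KB test above relies on.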
575
|
+
|
|
576
|
+
---
|
|
577
|
+
|
|
578
|
+
## 10. Summary — Adapter Wire-Up Checklist for P-1d4a8e7c
|
|
579
|
+
|
|
580
|
+
When implementing `engine/runtimes/copilot.js`:
|
|
581
|
+
|
|
582
|
+
1. `capabilities` block exactly matches the table at the top of this doc.
|
|
583
|
+
2. `resolveBinary()`:
|
|
584
|
+
- PATH → standalone first; cache to `engine/copilot-caps.json` with
|
|
585
|
+
`{ copilotBin, copilotIsNative, leadingArgs: [] }`.
|
|
586
|
+
- `gh extension list | grep gh-copilot` → fallback with
|
|
587
|
+
`leadingArgs: ['copilot']`. Mark the result as `bestEffort: true` so
|
|
588
|
+
preflight can warn.
|
|
589
|
+
- **Never** probe npm. Document this in the file header.
|
|
590
|
+
3. `buildArgs(opts)` always emits:
|
|
591
|
+
`--output-format json -s --allow-all --no-ask-user --autopilot --log-level error`
|
|
592
|
+
plus the conditional flags from §3, plus `--no-custom-instructions` /
|
|
593
|
+
`--disable-builtin-mcps` per `opts.suppressAgentsMd` / `opts.disableBuiltinMcps`.
|
|
594
|
+
**Never** emit `--verbose`.
|
|
595
|
+
4. `buildPrompt()` injects a `<system>...</system>\n\n` block when the system prompt is
|
|
596
|
+
non-empty; passthrough otherwise (§2).
|
|
597
|
+
5. `resolveModel()` is verbatim passthrough; emit a one-time `console.warn`
|
|
598
|
+
when input is `'sonnet' | 'opus' | 'haiku'` (Claude shorthand the user
|
|
599
|
+
probably meant to set on the Claude adapter).
|
|
600
|
+
6. `_mapEffort()` private helper does `'max' → 'xhigh'`; pass through otherwise.
|
|
601
|
+
7. `parseOutput(raw)` produces:
|
|
602
|
+
- `text`: concatenation of all `assistant.message.data.content` (multi-turn
|
|
603
|
+
autopilot).
|
|
604
|
+
- `usage`: shape per §5.4 — `costUsd: null`, `outputTokens: <sum>`,
|
|
605
|
+
`premiumRequests: <result.usage.premiumRequests>`, durations from
|
|
606
|
+
`result.usage`.
|
|
607
|
+
- `sessionId`: from the `result` event.
|
|
608
|
+
- `model`: from any `session.tools_updated` event (`data.model`).
|
|
609
|
+
8. `parseStreamChunk(line)` returns the parsed JSON or `null` if the line is empty
|
|
610
|
+
/ non-JSON. **Defensive**: any event whose `type` is not in the §5.1 inventory
|
|
611
|
+
should still parse cleanly — let the consumer decide to ignore.
|
|
612
|
+
9. `parseError(rawOutput)` patterns:
|
|
613
|
+
- `auth-failure`: `/not authenticated|copilot login|401|403/i`
|
|
614
|
+
- `rate-limit`: `/rate limit|too many requests|429/i`
|
|
615
|
+
- `unknown-model`: `/unknown model|model not found|model.*invalid/i`
|
|
616
|
+
- `crash`: `/internal error|panic|uncaught/i`
|
|
617
|
+
10. `listModels()` per §6 — return `null` on any failure (network, parse, auth).
|
|
618
|
+
`modelsCache` path: `engine/copilot-models.json`.
|
|
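Items 8 and 9 of the checklist can be sketched as below. The regular expressions are copied from item 9; everything else is an assumption made for illustration, including the first-match-wins ordering, the return of a plain string (or `null`) from `parseError`, and the `ERROR_PATTERNS` name.

```js
// Patterns from checklist item 9, checked in the order listed (assumed order).
const ERROR_PATTERNS = [
  ['auth-failure', /not authenticated|copilot login|401|403/i],
  ['rate-limit', /rate limit|too many requests|429/i],
  ['unknown-model', /unknown model|model not found|model.*invalid/i],
  ['crash', /internal error|panic|uncaught/i],
];

// Returns the parsed JSON value, or null for empty / non-JSON lines.
// Unknown event types are returned untouched so the consumer can ignore them.
function parseStreamChunk(line) {
  const trimmed = typeof line === 'string' ? line.trim() : '';
  if (!trimmed) return null;
  try {
    return JSON.parse(trimmed);
  } catch {
    return null;
  }
}

// Returns the first matching error kind, or null when nothing matches.
function parseError(rawOutput) {
  const text = String(rawOutput || '');
  for (const [kind, pattern] of ERROR_PATTERNS) {
    if (pattern.test(text)) return kind;
  }
  return null;
}
```

Because the bare status codes (`401`, `403`, `429`) match anywhere in the text, a caller may want to run `parseError` only on stderr or on error events rather than on the full assistant output.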
619
|
+
|
|
620
|
+
When the spike's findings disagree with the plan text, **this document wins**
|
|
621
|
+
(the plan was written before empirical confirmation). The notable deltas:
|
|
622
|
+
- `gpt-4o` is no longer in the catalog → use `gpt-4.1` for OpenAI tests.
|
|
623
|
+
- `--prompt @tmpfile` syntax is **not** supported on `copilot --prompt` (only
|
|
624
|
+
on `--additional-mcp-config`). Open question #3 in the PRD is closed: stdin
|
|
625
|
+
is the answer.
|
|
626
|
+
- `--verbose` does not exist; do not port the Claude adapter's `verbose: true`
|
|
627
|
+
default into the Copilot adapter.
|
|
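Checklist items 3 and 4, together with the `--verbose` delta above, can be sketched as follows. This is a partial illustration under assumptions: the conditional flags from §3 are omitted, the `buildPrompt(prompt, sysPrompt)` argument order is hypothetical, and only the option names `suppressAgentsMd` and `disableBuiltinMcps` come from the checklist.

```js
// Flags the checklist says are always emitted. Note: no --verbose.
const ALWAYS_ARGS = [
  '--output-format', 'json',
  '-s',
  '--allow-all',
  '--no-ask-user',
  '--autopilot',
  '--log-level', 'error',
];

// Builds the argv tail. Conditional flags from §3 are intentionally left out.
function buildArgs(opts = {}) {
  const args = [...ALWAYS_ARGS];
  if (opts.suppressAgentsMd) args.push('--no-custom-instructions');
  if (opts.disableBuiltinMcps) args.push('--disable-builtin-mcps');
  return args;
}

// Prepends a <system> block only when a system prompt is provided.
function buildPrompt(prompt, sysPrompt) {
  if (!sysPrompt) return prompt;
  return `<system>${sysPrompt}</system>\n\n${prompt}`;
}
```

The result of `buildPrompt` is what would be piped to stdin, while `buildArgs` stays small enough to remain well under the Windows command-line limit.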
628
|
+
|
|
629
|
+
---
|
|
630
|
+
|
|
631
|
+
## Provenance
|
|
632
|
+
|
|
633
|
+
- Test host: Windows 11, PowerShell 7+, `copilot.exe` 1.0.36 from WinGet.
|
|
634
|
+
- GitHub account: `yemi33` (active `gh auth` session, scopes:
|
|
635
|
+
`gist read:org repo workflow`).
|
|
636
|
+
- All JSONL samples reproducible via the commands documented in each section.
|
|
637
|
+
- Spike completed: 2026-04-28.
|