@yemi33/minions 0.1.1588 → 0.1.1589

@@ -0,0 +1,637 @@
1
+ # GitHub Copilot CLI — Behavior & Schema Reference
2
+
3
+ > **Spike output for plan item P-8f2c4d9b.** Authoritative reference for the
4
+ > Copilot adapter implementation in P-1d4a8e7c. Every claim in this document is
5
+ > grounded in real CLI invocations against `copilot.exe` v1.0.36 on Windows
6
+ > (WinGet install, `%LOCALAPPDATA%\Microsoft\WinGet\Links\copilot.exe`).
7
+ > Captured samples live alongside this file as
8
+ > `copilot-output-sample-{default,claude,gpt4o}.jsonl`.
9
+
10
+ ---
11
+
12
+ ## TL;DR — Adapter Decisions
13
+
14
+ | Decision | Value | Why |
15
+ |---|---|---|
16
+ | `capabilities.promptViaArg` | **`false`** | Stdin works in non-interactive mode and dodges the Windows ARG_MAX (~32 KB) limit that breaks `-p "<long-prompt>"` outright (`CreateProcess` rejects it before Copilot even starts). |
17
+ | `capabilities.modelDiscovery` | **`true`** | `GET https://api.githubcopilot.com/models` with a `gh auth token` Bearer returns HTTP 200 + a 24-model JSON catalog. |
18
+ | `capabilities.streaming` | **`true`** | `--stream on` (default) emits `assistant.message_delta` events incrementally; `--stream off` suppresses deltas but the final `assistant.message` always arrives. |
19
+ | `capabilities.sessionResume` | **`true`** | `--resume <session-id>` is documented, and every `result` event emits `sessionId`. |
20
+ | `capabilities.systemPromptFile` | **`false`** | No `--system-prompt-file` flag exists. Inject system prompt via a `<system>` block prepended to stdin. |
21
+ | `capabilities.effortLevels` | **`true`** | `--effort` accepts `low|medium|high|xhigh` (no `max`). Adapter must map `'max' → 'xhigh'`. |
22
+ | `capabilities.costTracking` | **`false`** | `result.usage` contains `premiumRequests` (count, not USD), no token counts, no cost. |
23
+ | `capabilities.modelShorthands` | **`false`** | Models are full IDs (`claude-sonnet-4.5`, `gpt-5.4`). No `sonnet`/`opus`/`haiku` aliasing on the Copilot side. |
24
+ | `capabilities.budgetCap` | **`false`** | No `--max-budget-usd` flag. |
25
+ | `capabilities.bareMode` | **`false`** | No `--bare`. Closest equivalent is `--no-custom-instructions` (suppresses AGENTS.md only, not all auto-discovery). |
26
+ | `capabilities.fallbackModel` | **`false`** | No `--fallback-model` flag. |
27
+ | `capabilities.sessionPersistenceControl` | **`false`** | Copilot manages session state internally in `~/.copilot/session-state/`. Engine cannot opt out without `--config-dir`. |
28
+
29
+ | Default | Value |
30
+ |---|---|
31
+ | `copilotStreamMode` (default config field) | `'on'` — preserves incremental UX; the adapter parser tolerates either mode. |
32
+ | `copilotDisableBuiltinMcps` | `true`; github-mcp-server bypasses Minions' `pull-requests.json` tracking, so the built-in server is disabled by default. |
33
+ | `copilotSuppressAgentsMd` | `true` — Minions injects its own playbook prompt; AGENTS.md auto-load conflicts with that. |
34
+ | `copilotReasoningSummaries` | `false` — opt-in; only some models honor it. |
35
+
36
+ ---
37
+
38
+ ## 1. Binary Resolution
39
+
40
+ ### Standalone `copilot` (Windows / WinGet)
41
+
42
+ ```text
43
+ PS> where.exe copilot
44
+ C:\Users\yemishin\AppData\Local\Microsoft\WinGet\Links\copilot.exe
45
+
46
+ PS> copilot --version
47
+ GitHub Copilot CLI 1.0.36.
48
+ ```
49
+
50
+ WinGet installs a shim into `%LOCALAPPDATA%\Microsoft\WinGet\Links\` and adds
51
+ that dir to PATH. The adapter's `resolveBinary()` should:
52
+
53
+ 1. Check PATH (`where copilot` / `which copilot`) — the standalone path.
54
+ 2. If not found, fall back to the `gh copilot` extension (see the `gh copilot` extension section below).
55
+ 3. Cache the resolved path to `engine/copilot-caps.json` (mirrors
56
+ `engine/claude-caps.json` shape).
57
+ 4. Never attempt npm-style resolution — Copilot is **not** an npm package.
58
+
59
+ ### `gh copilot` extension (fallback / unconfirmed on this host)
60
+
61
+ The `gh-copilot` extension is documented at
62
+ <https://docs.github.com/en/copilot/github-copilot-in-the-cli>. On this test
63
+ machine `gh extension list` returned empty, so this path was **not exercised
64
+ empirically**. The adapter contract still needs to support it for hosts without
65
+ the WinGet standalone install:
66
+
67
+ ```text
68
+ gh extension install github/gh-copilot
69
+ gh copilot --help # subcommand of gh
70
+ ```
71
+
72
+ When falling back to the extension form, the adapter must return:
73
+
74
+ ```js
75
+ { bin: '<path-to-gh.exe>', native: true, leadingArgs: ['copilot'] }
76
+ ```
77
+
78
+ so that `engine/spawn-agent.js` invokes `gh copilot <flags>` rather than
79
+ `copilot <flags>`. **Important caveat**: the `gh copilot` extension is the
80
+ older "explain/suggest" UX and may not support the same flag set as the
81
+ standalone `copilot` v1.0.36 (especially `--output-format json`, `--autopilot`,
82
+ `--allow-all`). Until empirically validated, treat the `gh copilot` path as
83
+ **best-effort** — the adapter should detect missing flags via stderr probe and
84
+ warn at preflight.
85
+
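The resolution order above can be sketched as a small pure function. This is a minimal sketch, not the adapter's actual code: `findOnPath` and `hasGhExtension` are hypothetical injected probes (standing in for `where`/`which` and `gh extension list`), and the `bestEffort` field follows the wire-up checklist later in this document.

```javascript
// Sketch only: `findOnPath` and `hasGhExtension` are hypothetical injected
// probes, not real engine helpers.
function resolveBinary({ findOnPath, hasGhExtension }) {
  // 1. A standalone `copilot` on PATH wins.
  const standalone = findOnPath('copilot');
  if (standalone) {
    return { bin: standalone, native: true, leadingArgs: [], bestEffort: false };
  }
  // 2. Fall back to `gh copilot` only when gh exists and the extension is installed.
  const gh = findOnPath('gh');
  if (gh && hasGhExtension(gh, 'gh-copilot')) {
    return { bin: gh, native: true, leadingArgs: ['copilot'], bestEffort: true };
  }
  // 3. Nothing found. No npm-style resolution is attempted.
  return null;
}

// Example: a host with only `gh` plus the extension installed.
const resolved = resolveBinary({
  findOnPath: (name) => (name === 'gh' ? '/usr/bin/gh' : null),
  hasGhExtension: () => true,
});
console.log(resolved);
```

Injecting the probes keeps the decision logic testable without touching the real PATH; the caller would persist the result to the cache file described next.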
86
+ ### Recommended `resolveBinary()` cache shape
87
+
88
+ ```json
89
+ {
90
+ "copilotBin": "C:\\Users\\yemishin\\AppData\\Local\\Microsoft\\WinGet\\Links\\copilot.exe",
91
+ "copilotIsNative": true,
92
+ "leadingArgs": [],
93
+ "version": "1.0.36",
94
+ "resolvedAt": "2026-04-28T04:00:00Z"
95
+ }
96
+ ```
97
+
98
+ ---
99
+
100
+ ## 2. Prompt Delivery — `promptViaArg: false` (stdin)
101
+
102
+ ### Empirical results
103
+
104
+ ```powershell
105
+ # Test A: stdin without -p — works.
106
+ "Say only the word: pong" |
107
+ copilot --output-format json -s --allow-all --no-ask-user --autopilot --log-level error
108
+ # EXIT=0; user.message.data.content = "Say only the word: pong\r\n"; assistant replied "pong".
109
+
110
+ # Test B: -p "<40_000-char string>" — Windows OS rejects spawn.
111
+ $big = "x" * 40000
112
+ copilot -p $big --output-format json -s --allow-all --no-ask-user --autopilot --log-level error
113
+ # Program 'copilot.exe' failed to run:
114
+ # The filename or extension is too long.
115
+ # (CreateProcess ARG_MAX limit, ~32 KB on Windows)
116
+ ```
117
+
118
+ ### Decision
119
+
120
+ Set `capabilities.promptViaArg = false`. The adapter **does not** emit
121
+ `--prompt <text>` in args; instead, `engine/spawn-agent.js` pipes the final
122
+ prompt (system block prepended) via stdin. This:
123
+
124
+ - Sidesteps the Windows 32 KB ARG_MAX cliff for any prompt that bundles
125
+ `pinned.md`, `notes.md`, knowledge-base entries, and a playbook (Minions
126
+ prompts routinely run 20–60 KB).
127
+ - Mirrors the proven Claude path (also `promptViaArg: false`).
128
+ - Eliminates the need to investigate `--prompt @tmpfile` syntax (open question
129
+ in the PRD) — that flag does not appear in `copilot --help` output for v1.0.36.
130
+ The `@<path>` prefix syntax is only documented for `--additional-mcp-config`,
131
+ not `--prompt`.
132
+
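To illustrate the stdin decision, here is a minimal sketch of passing a large prompt over stdin with Node's `child_process.spawnSync`. The helper name `runWithStdinPrompt` is hypothetical, and the child process is Node itself echoing stdin back, so the example runs without Copilot installed; it says nothing about how `engine/spawn-agent.js` is actually written.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: run a CLI with the prompt on stdin instead of argv.
function runWithStdinPrompt(bin, args, prompt) {
  const res = spawnSync(bin, args, {
    input: prompt, // written to the child's stdin, which is then closed
    encoding: 'utf8',
    maxBuffer: 64 * 1024 * 1024,
  });
  if (res.error) throw res.error;
  return { status: res.status, stdout: res.stdout, stderr: res.stderr };
}

// Stand-in child: Node echoing stdin to stdout. A real call would pass the
// resolved Copilot binary and the headless flag set instead.
const big = 'x'.repeat(40000);
const echoed = runWithStdinPrompt(
  process.execPath,
  ['-e', 'process.stdin.pipe(process.stdout)'],
  big
);
console.log(echoed.status, echoed.stdout.length);
```

Because the prompt never appears in argv, its size does not count against the command-line limit that caused Test B to fail.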
133
+ ### `buildPrompt(promptText, sysPromptText)` — recommended impl
134
+
135
+ Copilot has no `--system-prompt-file`. Inject the system prompt as a `<system>`
136
+ block prepended to the user prompt, mirroring the convention used by
137
+ [Anthropic's tool-use docs](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use):
138
+
139
+ ```js
140
+ function buildPrompt(promptText, sysPromptText) {
141
+ const user = promptText == null ? '' : String(promptText);
142
+ if (!sysPromptText) return user;
143
+ return `<system>\n${sysPromptText}\n</system>\n\n${user}`;
144
+ }
145
+ ```
146
+
147
+ Combined with `--no-custom-instructions` (default-on per
148
+ `copilotSuppressAgentsMd`), this keeps AGENTS.md out of the prompt. Copilot still
149
+ prepends its own datetime and reminder block (see `transformedContent` in §5.1).
150
+
151
+ ---
152
+
153
+ ## 3. Required Headless Flag Set
154
+
155
+ Empirically confirmed flags for non-interactive Copilot invocations:
156
+
157
+ | Flag | Required? | Effect |
158
+ |---|---|---|
159
+ | `--output-format json` | **required** | Switches stdout to JSONL (one event per line). Default is `text`. |
160
+ | `-s` / `--silent` | recommended | Suppresses chatty stats lines; only the agent JSONL stream remains. |
161
+ | `--allow-all` | **required** | Equivalent to `--allow-all-tools --allow-all-paths --allow-all-urls`. Without this the CLI prompts for every tool/path use, which deadlocks in stdin/stdout mode. |
162
+ | `--no-ask-user` | **required** | Removes the `ask_user` tool. Without it the agent can stall waiting for human input. |
163
+ | `--autopilot` | for multi-turn agency | Enables `task_complete`-driven multi-turn loop. **Without it** the session ends after one assistant response (see §3.1). |
164
+ | `--log-level error` | recommended | Suppresses INFO/DEBUG diagnostics that aren't part of the JSONL stream. |
165
+ | `--no-custom-instructions` | gated by config | Disables AGENTS.md auto-load. Default-on for Minions (`copilotSuppressAgentsMd: true`). |
166
+ | `--disable-builtin-mcps` | gated by config | Disables `github-mcp-server`. Default-on for Minions (`copilotDisableBuiltinMcps: true`) to prevent split-brain PR creation. |
167
+ | `--no-color` | optional | Cosmetic; safe to omit when `--output-format json`. |
168
+ | `--plain-diff` | optional | Cosmetic; the agent's diff rendering doesn't appear in JSONL stream anyway. |
169
+ | `--max-autopilot-continues N` | optional | Maps from `opts.maxTurns`. |
170
+ | `--effort <level>` | optional | Choices: `low|medium|high|xhigh`. **No `max`** — adapter must map `'max' → 'xhigh'`. |
171
+ | `--model <id>` | optional | Full model ID (see §6 for the catalog). |
172
+ | `--resume=<session-id>` | optional | Maps from `opts.sessionId`. Note the `=` syntax — `--resume <id>` is also accepted but `--resume` standalone enters interactive picker. |
173
+ | `--stream on` / `--stream off` | optional | Default is `on`. See §4. |
174
+ | `--enable-reasoning-summaries` | optional | Maps from `opts.reasoningSummaries`; only Anthropic models populate `assistant.reasoning_delta`. |
175
+ | `--add-dir <path>` | injected by spawn-agent | Same role as on the Claude path — registers extra read-allowed dirs (skill discovery). |
176
+ | `-v` / `--verbose` | **never emit** | Does not exist on Copilot. The Claude adapter emits `--verbose`; the Copilot adapter MUST NOT. |
177
+
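The flag table above could translate into a `buildArgs()` along these lines. This is a sketch under assumptions: the option names `model`, `effort`, and `streamMode` are invented for illustration, while `maxTurns`, `sessionId`, `reasoningSummaries`, `suppressAgentsMd`, and `disableBuiltinMcps` come from this document.

```javascript
// Sketch of buildArgs(). Option names `model`, `effort`, and `streamMode` are
// assumptions; the rest are named in the flag table.
function buildArgs(opts = {}) {
  const args = [
    '--output-format', 'json',
    '-s',
    '--allow-all',
    '--no-ask-user',
    '--autopilot',
    '--log-level', 'error',
  ];
  // Both suppression flags are default-on for Minions.
  if (opts.suppressAgentsMd !== false) args.push('--no-custom-instructions');
  if (opts.disableBuiltinMcps !== false) args.push('--disable-builtin-mcps');
  if (opts.model) args.push('--model', String(opts.model));
  // Copilot has no 'max' level, so it is mapped to 'xhigh'.
  if (opts.effort) args.push('--effort', opts.effort === 'max' ? 'xhigh' : opts.effort);
  if (Number.isInteger(opts.maxTurns)) {
    args.push('--max-autopilot-continues', String(opts.maxTurns));
  }
  if (opts.sessionId) args.push(`--resume=${opts.sessionId}`);
  if (opts.streamMode === 'off') args.push('--stream', 'off');
  if (opts.reasoningSummaries) args.push('--enable-reasoning-summaries');
  // Deliberately no --verbose and no --prompt: the prompt travels over stdin.
  return args;
}

console.log(buildArgs({ effort: 'max', sessionId: 'abc', maxTurns: 5 }).join(' '));
```

Using the `--resume=<id>` form avoids any chance of a bare `--resume` falling into the interactive picker.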
178
+ ### 3.1 `--autopilot` vs single-shot
179
+
180
+ | Mode | Terminal event | When to use |
181
+ |---|---|---|
182
+ | `--autopilot` | `session.task_complete` → `result` | Multi-turn agent work (implement / fix / review). The agent calls the `task_complete` tool with a summary and Copilot ends the session. |
183
+ | no `--autopilot` | `assistant.turn_end` → `result` | One-shot Q&A. Fewer events; no `session.task_complete`, no `session.info`. Closer match for CC / doc-chat use cases that don't need multi-turn. |
184
+
185
+ The Minions agent path (engine.js dispatch) uses autopilot. CC and doc-chat in
186
+ `engine/llm.js` should also use autopilot — they need tool use even when only
187
+ one assistant turn is expected — but the parser must tolerate the absence of
188
+ `session.task_complete` because some early-exit paths skip it.
189
+
190
+ ---
191
+
192
+ ## 4. Streaming — `--stream on` vs `--stream off`
193
+
194
+ Empirical comparison (single 4-character `pong` reply):
195
+
196
+ | Property | `--stream on` (default) | `--stream off` |
197
+ |---|---|---|
198
+ | `assistant.message_delta` events | **1+** (delta-coded; chunks the response as tokens arrive) | **0** (suppressed) |
199
+ | `assistant.message` (final) | **1** | **1** |
200
+ | Other events | identical | identical |
201
+ | Stdout shape | one JSON object per line | one JSON object per line |
202
+ | Time-to-first-token | low | high (waits for full response) |
203
+
204
+ ### Parser implications
205
+
206
+ ```js
207
+ // Pseudocode — accumulate deltas, but ALWAYS use assistant.message as truth.
208
+ let buffered = '';
209
+ for (const ev of events) {
210
+ if (ev.type === 'assistant.message_delta') {
211
+ buffered += ev.data.deltaContent;
212
+ emit({ kind: 'partial-text', text: buffered });
213
+ } else if (ev.type === 'assistant.message') {
214
+ // Authoritative final content. Replace buffered text — never concat.
215
+ emit({ kind: 'final-text', text: ev.data.content, messageId: ev.data.messageId });
216
+ buffered = '';
217
+ }
218
+ }
219
+ ```
220
+
221
+ The parser must handle three cases:
222
+ 1. `--stream on`, response shorter than one chunk: zero deltas, one message. (Common for short replies; see the gpt-4.1 sample.)
223
+ 2. `--stream on`, response with deltas: N deltas + 1 message (treat message as truth).
224
+ 3. `--stream off`: zero deltas, one message.
225
+
226
+ ### Recommendation
227
+
228
+ Default `copilotStreamMode = 'on'` so the engine's streaming UI (live-output.log
229
+ tailing, dashboard progress feed) gets incremental updates. The parser tolerates
230
+ both, so users who want bandwidth-efficient batch responses can flip to `off`.
231
+
232
+ ---
233
+
234
+ ## 5. JSONL Event Schema
235
+
236
+ Captured against three model invocations:
237
+ - `copilot-output-sample-default.jsonl` — `gpt-5.4` (Copilot's default; OpenAI Codex variant)
238
+ - `copilot-output-sample-claude.jsonl` — `claude-sonnet-4.5`
239
+ - `copilot-output-sample-gpt4o.jsonl` — `gpt-4.1` (note: `gpt-4o` itself is no longer in the API catalog; closest enabled OpenAI model is `gpt-4.1`)
240
+
241
+ ### 5.1 Event Type Inventory
242
+
243
+ | Event type | Default (gpt-5.4) | Claude Sonnet 4.5 | GPT-4.1 | Stream-on only? | Notes |
244
+ |---|:-:|:-:|:-:|:-:|---|
245
+ | `session.mcp_server_status_changed` | ✓ | ✓ | ✓ | no | Per-server connect/disconnect transitions. `data.status` is one of `connecting`/`connected`/`disabled`/`error`. |
246
+ | `session.mcp_servers_loaded` | ✓ | ✓ | ✓ | no | Snapshot of all MCP servers + their final status. |
247
+ | `session.skills_loaded` | ✓ | ✓ | ✓ | no | List of discovered skills (`source: builtin|project|plugin`). |
248
+ | `session.tools_updated` | ✓ | ✓ | ✓ | no | Diagnostic; `data` only has `{ model }` — does **not** list available tools. |
249
+ | `session.info` | ✓ | ✓ | – | no (autopilot only) | `infoType: autopilot_continuation` — emitted between turns when autopilot continues. |
250
+ | `session.task_complete` | ✓ | ✓ | ✓ | no (autopilot only) | Terminal-of-session signal in autopilot mode. `data.success: bool`, `data.summary: string`. |
251
+ | `user.message` | ✓ | ✓ | ✓ | no | Echo of the user prompt, with `transformedContent` showing what the agent actually saw (datetime + reminder block prepended). |
252
+ | `assistant.turn_start` | ✓ | ✓ | ✓ | no | Per-turn delimiter. |
253
+ | `assistant.turn_end` | ✓ | ✓ | ✓ | no | Per-turn delimiter; pairs with `turn_start`. |
254
+ | `assistant.reasoning` | ✓ | ✓ | – | no | Encrypted reasoning blob (`reasoningOpaque`). Absent for non-reasoning models like GPT-4.1. |
255
+ | `assistant.reasoning_delta` | – | ✓ | – | yes | **Anthropic-only.** Streamed reasoning text — Claude exposes plain `reasoningText`, OpenAI does not. |
256
+ | `assistant.message_delta` | ✓ | ✓ | ✓ | **yes** | Per-token streamed delta. Only emitted with `--stream on`. |
257
+ | `assistant.message` | ✓ | ✓ | ✓ | no | Authoritative final assistant content for the turn. |
258
+ | `tool.execution_start` | ✓ | ✓ | ✓ | no | Tool call begin. `data.toolName`, `data.arguments`. |
259
+ | `tool.execution_complete` | ✓ | ✓ | ✓ | no | Tool call end. `data.success: bool`, `data.result.{content, detailedContent}`. |
260
+ | `result` | ✓ | ✓ | ✓ | no | Final aggregate. Carries `usage`, `sessionId`, and `exitCode` at the top level of the event (see §5.4). |
261
+ | `function` | (in stdin-no-`-p` test) | – | – | no | Observed once in an early stdin test; appears to be a meta event for tool invocation. **Treat as ignorable** unless future spike re-confirms its semantics. |
262
+
263
+ All events share the envelope `{ type, data, id, timestamp, parentId, ephemeral? }`.
264
+ `ephemeral: true` marks events that the Copilot UI hides from the persistent
265
+ session log (e.g., deltas, MCP loading noise). The parser should ignore the
266
+ `ephemeral` flag — it's a UI hint, not a parser hint.
267
+
268
+ ### 5.2 Provider-Driven Schema Variation
269
+
270
+ This is the **biggest gotcha for the parser** — `assistant.message.data` carries
271
+ provider-specific fields:
272
+
273
+ | Field on `assistant.message.data` | Default (gpt-5.4 / Codex) | Claude Sonnet 4.5 | GPT-4.1 |
274
+ |---|:-:|:-:|:-:|
275
+ | `messageId` | ✓ | ✓ | ✓ |
276
+ | `content` | ✓ | ✓ | ✓ |
277
+ | `interactionId` | ✓ | ✓ | ✓ |
278
+ | `requestId` | ✓ | ✓ | ✓ |
279
+ | `outputTokens` | ✓ (52) | ✓ | ✓ |
280
+ | `toolRequests` | ✓ | ✓ | ✓ |
281
+ | `reasoningOpaque` | ✓ | ✓ | – |
282
+ | `reasoningText` | – | ✓ | – |
283
+ | `encryptedContent` | ✓ | – | – |
284
+ | `phase` | ✓ (`final_answer`) | – | – |
285
+
286
+ ### 5.3 Defensive Parser Rules
287
+
288
+ 1. **Whitelist the events you care about**, route everything else to a
289
+ `type: 'ignore'` bucket. The schema clearly has provider-specific extensions,
290
+ and Copilot's release cadence means new event types will appear without
291
+ warning.
292
+ 2. **Never assume optional fields exist.** `outputTokens` is consistently
293
+ present, but `reasoningText`/`reasoningOpaque`/`encryptedContent`/`phase`
294
+ are provider-dependent. Prefer `?.` access; default missing numerics to
295
+ `null`, not `0`.
296
+ 3. **Use `assistant.message.data.content` as the authoritative response.**
297
+ Do not concatenate `assistant.message_delta` deltas into your final result —
298
+ they're a streaming hint, not the source of truth.
299
+ 4. **The terminal signal differs by mode.** In autopilot, watch for
300
+ `session.task_complete` (and then `result`); in single-shot, watch for
301
+ `result` directly (no `task_complete`).
302
+ 5. **`exitCode` lives on the `result` event**, not on the process. The CLI
303
+ process always returns 0 even when the agent failed mid-turn — surface
304
+ `result.exitCode !== 0` as the actual failure signal.
305
+
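The defensive rules above might look like this in code. It is a minimal sketch: `routeEvent` and the `HANDLED` set are hypothetical names, and the normalized return shapes are illustrative rather than the engine's real contract.

```javascript
// Event types the consumer acts on; everything else is routed to "ignore".
const HANDLED = new Set([
  'assistant.message_delta',
  'assistant.message',
  'session.task_complete',
  'result',
]);

// Returns the parsed event, or null for blank / non-JSON lines.
function parseStreamChunk(line) {
  const trimmed = (line ?? '').trim();
  if (!trimmed) return null;
  try {
    const ev = JSON.parse(trimmed);
    return ev !== null && typeof ev === 'object' ? ev : null;
  } catch {
    return null;
  }
}

// Hypothetical router applying rules 1, 2, 3, and 5.
function routeEvent(ev) {
  if (!HANDLED.has(ev.type)) return { type: 'ignore' };
  if (ev.type === 'result') {
    // exitCode lives on the event itself, not on the process.
    return { type: 'result', failed: ev.exitCode !== 0, sessionId: ev.sessionId ?? null };
  }
  if (ev.type === 'assistant.message') {
    return {
      type: 'final-text',
      text: ev.data?.content ?? '',
      outputTokens: ev.data?.outputTokens ?? null, // null, not 0, when absent
    };
  }
  return { type: ev.type, data: ev.data ?? {} };
}

const line = '{"type":"session.skills_loaded","data":{}}';
console.log(routeEvent(parseStreamChunk(line)));
```

Unknown event types still parse cleanly; only the routing step decides to drop them, so new Copilot events do not break the stream.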
306
+ ### 5.4 Result / Usage Shape (no cost tracking)
307
+
308
+ ```json
309
+ {
310
+ "type": "result",
311
+ "timestamp": "2026-04-28T04:11:36.109Z",
312
+ "sessionId": "8a216c49-e51c-4eef-9405-bf83298fced2",
313
+ "exitCode": 0,
314
+ "usage": {
315
+ "premiumRequests": 2,
316
+ "totalApiDurationMs": 5485,
317
+ "sessionDurationMs": 9103,
318
+ "codeChanges": {
319
+ "linesAdded": 0,
320
+ "linesRemoved": 0,
321
+ "filesModified": []
322
+ }
323
+ }
324
+ }
325
+ ```
326
+
327
+ **Critical**: Copilot does **not** emit `total_cost_usd` or per-token counts
328
+ (input/output/cache_*). The closest proxy is `premiumRequests` — a unitless
329
+ count of premium-tier requests consumed in the session. The adapter's
330
+ `parseOutput()` must map this onto the engine's usage shape with NULLs (not
331
+ zeros) for fields Copilot doesn't expose, so dashboard cost telemetry can
332
+ distinguish "Copilot didn't tell us" from "this turn cost $0":
333
+
334
+ ```js
335
+ {
336
+ costUsd: null, // ← not 0; Copilot doesn't report this
337
+ inputTokens: null,
338
+ outputTokens: <sum of assistant.message.data.outputTokens>, // recovered from per-turn events
339
+ cacheRead: null,
340
+ cacheCreation: null,
341
+ durationMs: result.usage.totalApiDurationMs ?? null,
342
+ numTurns: <count of assistant.turn_end events>,
343
+ // Copilot-specific extension:
344
+ premiumRequests: result.usage.premiumRequests ?? null,
345
+ }
346
+ ```
347
+
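As a rough illustration of the mapping above, the following sketch folds an already-parsed event list into that usage shape. `buildUsage` is a hypothetical helper name. One choice specific to this sketch: when the `result` event is missing, it reports `null` for the duration and premium-request fields, following the null-over-zero rule in §5.3.

```javascript
// Sketch: fold parsed JSONL events into the engine usage shape.
function buildUsage(events) {
  let outputTokens = null;
  let numTurns = 0;
  let result = null;
  for (const ev of events) {
    if (ev.type === 'assistant.message' && typeof ev.data?.outputTokens === 'number') {
      // Sum per-turn output tokens; stays null if no turn reported any.
      outputTokens = (outputTokens ?? 0) + ev.data.outputTokens;
    } else if (ev.type === 'assistant.turn_end') {
      numTurns += 1;
    } else if (ev.type === 'result') {
      result = ev;
    }
  }
  return {
    costUsd: null, // Copilot does not report cost
    inputTokens: null,
    outputTokens,
    cacheRead: null,
    cacheCreation: null,
    durationMs: result?.usage?.totalApiDurationMs ?? null,
    numTurns,
    premiumRequests: result?.usage?.premiumRequests ?? null,
  };
}

const usage = buildUsage([
  { type: 'assistant.message', data: { content: 'a', outputTokens: 52 } },
  { type: 'assistant.turn_end', data: {} },
  { type: 'assistant.message', data: { content: 'b', outputTokens: 8 } },
  { type: 'assistant.turn_end', data: {} },
  { type: 'result', exitCode: 0, usage: { premiumRequests: 2, totalApiDurationMs: 5485 } },
]);
console.log(usage);
```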
348
+ ---
349
+
350
+ ## 6. Model Discovery
351
+
352
+ `GET https://api.githubcopilot.com/models` with a Bearer token works.
353
+ Empirical result on this host:
354
+
355
+ ```http
356
+ GET https://api.githubcopilot.com/models
357
+ Authorization: Bearer <gh-cli-token>
358
+
359
+ 200 OK
360
+ { "data": [ <24 model objects> ] }
361
+ ```
362
+
363
+ ### Token resolution
364
+
365
+ The adapter should resolve the bearer in this priority:
366
+
367
+ 1. `process.env.GH_TOKEN`
368
+ 2. `process.env.COPILOT_GITHUB_TOKEN`
369
+ 3. (Optional best-effort) shell out to `gh auth token` — already works on
370
+ this host since `gh auth status` shows an active session.
371
+
372
+ `gh auth token` is the authoritative path on developer machines, but spawning
373
+ `gh` adds an extra dependency. The adapter should try the env vars first and
374
+ fall back to `gh auth token` only if both are unset. A token is **never required
375
+ at listModels-time**: if none resolves, return `null` and let the dashboard fall back to free-text.
376
+
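The priority order above can be sketched as follows. `resolveToken` and the injected `ghAuthToken` callback are hypothetical; the callback stands in for shelling out to `gh auth token`, so the sketch stays free of process spawning.

```javascript
// Sketch: `ghAuthToken` is a hypothetical injected function that shells out to
// `gh auth token` and returns its stdout (or throws).
function resolveToken(env, ghAuthToken) {
  // 1 and 2: environment variables, in priority order.
  const fromEnv = env.GH_TOKEN || env.COPILOT_GITHUB_TOKEN;
  if (fromEnv) return fromEnv.trim();
  // 3: optional best-effort fallback; any failure is non-fatal.
  try {
    const out = ghAuthToken ? ghAuthToken() : '';
    return out && out.trim() ? out.trim() : null;
  } catch {
    return null; // gh missing or not logged in
  }
}

console.log(resolveToken({ COPILOT_GITHUB_TOKEN: 'tok-b' }, () => 'tok-c\n'));
```

Returning `null` instead of throwing lets the caller skip model discovery and fall back to free-text model entry.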
377
+ ### Response shape (24 models on the test account)
378
+
379
+ ```json
380
+ {
381
+ "data": [
382
+ {
383
+ "id": "claude-sonnet-4.5",
384
+ "name": "Claude Sonnet 4.5",
385
+ "vendor": "Anthropic",
386
+ "object": "model",
387
+ "version": "claude-sonnet-4.5",
388
+ "preview": false,
389
+ "model_picker_enabled": true,
390
+ "model_picker_category": "powerful",
391
+ "policy": { "state": "enabled", "terms": "..." },
392
+ "supported_endpoints": ["/v1/messages", "/chat/completions"],
393
+ "capabilities": {
394
+ "type": "chat",
395
+ "tokenizer": "o200k_base",
396
+ "family": "claude-sonnet-4.5",
397
+ "limits": { "max_context_window_tokens": 200000, "max_output_tokens": 16000, ... },
398
+ "supports": {
399
+ "streaming": true,
400
+ "tool_calls": true,
401
+ "vision": true,
402
+ "structured_outputs": true,
403
+ "parallel_tool_calls": true,
404
+ "reasoning_effort": ["low", "medium", "high"],
405
+ "adaptive_thinking": true,
406
+ "max_thinking_budget": 32000,
407
+ "min_thinking_budget": 1024
408
+ }
409
+ }
410
+ }
411
+ ]
412
+ }
413
+ ```
414
+
415
+ ### Models seen on this account (snapshot)
416
+
417
+ ```text
418
+ claude-haiku-4.5 Claude Haiku 4.5 Anthropic enabled
419
+ claude-opus-4.5 Claude Opus 4.5 Anthropic enabled
420
+ claude-opus-4.6 Claude Opus 4.6 Anthropic enabled
421
+ claude-opus-4.6-1m Claude Opus 4.6 (1M ctx) Anthropic enabled
422
+ claude-opus-4.7 Claude Opus 4.7 Anthropic enabled
423
+ claude-sonnet-4 Claude Sonnet 4 Anthropic enabled
424
+ claude-sonnet-4.5 Claude Sonnet 4.5 Anthropic enabled
425
+ claude-sonnet-4.6 Claude Sonnet 4.6 Anthropic enabled
426
+ gpt-3.5-turbo GPT 3.5 Turbo Azure OpenAI (no policy)
427
+ gpt-3.5-turbo-0613 GPT 3.5 Turbo Azure OpenAI (no policy)
428
+ gpt-4.1 GPT-4.1 Azure OpenAI enabled
429
+ gpt-4.1-2025-04-14 GPT-4.1 Azure OpenAI enabled
430
+ gpt-4o-mini GPT-4o mini Azure OpenAI (no policy)
431
+ gpt-4o-mini-2024-07-18 GPT-4o mini Azure OpenAI (no policy)
432
+ gpt-5-mini GPT-5 mini Azure OpenAI enabled
433
+ gpt-5.2 GPT-5.2 OpenAI enabled
434
+ gpt-5.2-codex GPT-5.2-Codex OpenAI enabled
435
+ gpt-5.3-codex GPT-5.3-Codex OpenAI enabled
436
+ gpt-5.4 GPT-5.4 OpenAI enabled (account default)
437
+ gpt-5.4-mini GPT-5.4 mini OpenAI (no policy)
438
+ gpt-5.5 GPT-5.5 OpenAI enabled
439
+ text-embedding-3-small Embedding V3 small Azure OpenAI (no streaming)
440
+ text-embedding-3-small-inference Azure OpenAI (no streaming)
441
+ text-embedding-ada-002 Embedding V2 Ada Azure OpenAI (no streaming)
442
+ ```
443
+
444
+ ### Adapter mapping (for `listModels()`)
445
+
446
+ ```js
447
+ function listModels() {
448
+ // ... HTTP GET as above, on any error return null (non-fatal)
449
+ return data
450
+ .filter(m => m.capabilities?.type === 'chat') // drop embeddings
451
+ .filter(m => m.policy?.state === 'enabled' || m.preview) // drop disabled
452
+ .map(m => ({
453
+ id: m.id,
454
+ name: m.name,
455
+ provider: m.vendor,
456
+ }));
457
+ }
458
+ ```
459
+
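A fuller version of the mapping above might separate the pure filter from the HTTP call. This is a sketch, assuming Node 18+ for the global `fetch`; `toModelList` and the injectable `fetchImpl` parameter are names invented here for illustration.

```javascript
// Pure mapping step, mirroring the filter shown above.
function toModelList(payload) {
  const data = Array.isArray(payload?.data) ? payload.data : [];
  return data
    .filter((m) => m.capabilities?.type === 'chat') // drop embeddings
    .filter((m) => m.policy?.state === 'enabled' || m.preview) // drop unlisted
    .map((m) => ({ id: m.id, name: m.name, provider: m.vendor }));
}

// Assumes Node 18+ (global fetch). `fetchImpl` is injectable for tests.
async function listModels(token, fetchImpl = globalThis.fetch) {
  if (!token) return null;
  try {
    const res = await fetchImpl('https://api.githubcopilot.com/models', {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!res.ok) return null; // auth or server failure: non-fatal
    return toModelList(await res.json());
  } catch {
    return null; // network or parse failure: non-fatal
  }
}

const sample = {
  data: [
    { id: 'claude-sonnet-4.5', name: 'Claude Sonnet 4.5', vendor: 'Anthropic',
      capabilities: { type: 'chat' }, policy: { state: 'enabled' } },
    { id: 'text-embedding-3-small', name: 'Embedding V3 small', vendor: 'Azure OpenAI',
      capabilities: { type: 'embeddings' } },
    { id: 'gpt-4o-mini', name: 'GPT-4o mini', vendor: 'Azure OpenAI',
      capabilities: { type: 'chat' } },
  ],
};
console.log(toModelList(sample));
```

Keeping the filter pure means it can be exercised against a saved catalog snapshot without network access.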
460
+ ### Subscription-tier note
461
+
462
+ `policy.state` is `"enabled"` or **absent** (no key) — never explicitly
463
+ `"disabled"` in this snapshot. Models the user lacks entitlement for simply
464
+ omit the policy block. The adapter should treat missing `policy.state` as
465
+ "hide from default picker" but still expose them via free-text override —
466
+ matches the `model_picker_enabled` field semantics.
467
+
468
+ `gpt-4o` is no longer present as a top-level model — only `gpt-4o-mini` remains
469
+ (and is unlisted in the picker). The plan's `copilot-output-sample-gpt4o.jsonl`
470
+ is named for the spec but actually captures `gpt-4.1`, the closest enabled
471
+ OpenAI model. **The adapter implementer (P-1d4a8e7c) should reference
472
+ `gpt-4.1` as the canonical OpenAI test model** — if `gpt-4o` returns to the
473
+ catalog, treat it as a future regression.
474
+
475
+ ---
476
+
477
+ ## 7. Verifying `--no-custom-instructions` and `--disable-builtin-mcps`
478
+
479
+ ### `--no-custom-instructions` (AGENTS.md auto-load)
480
+
481
+ Constructed test: created `AGENTS.md` in cwd with content
482
+ `Always end every response with the marker: __AGENTS_LOADED__`, then ran:
483
+
484
+ ```text
485
+ # A) Default behavior — AGENTS.md is loaded
486
+ PS> "Just say hello." | copilot --output-format json -s --allow-all --no-ask-user --autopilot --log-level error
487
+ {"type":"assistant.message", ..., "content": "Hello. __AGENTS_LOADED__"} ← marker present
488
+
489
+ # B) With --no-custom-instructions
490
+ PS> "Just say hello." | copilot --output-format json -s --allow-all --no-ask-user --autopilot --log-level error --no-custom-instructions
491
+ {"type":"assistant.message", ..., "content": ""} ← no marker; AGENTS.md ignored
492
+ ```
493
+
494
+ **Confirmed**: `--no-custom-instructions` suppresses AGENTS.md auto-load. The
495
+ flag does **not** affect skills loading (project skills under `.claude/skills/`
496
+ still appear in `session.skills_loaded`) — it's narrowly scoped to AGENTS-style
497
+ custom instruction files.
498
+
499
+ ### `--disable-builtin-mcps` (github-mcp-server)
500
+
501
+ ```text
502
+ # Default — server connects, status: "connected"
503
+ {"type":"session.mcp_servers_loaded","data":{"servers":[{"name":"github-mcp-server","status":"connected","source":"builtin"}]}}
504
+
505
+ # With --disable-builtin-mcps — server appears, status: "disabled"
506
+ {"type":"session.mcp_servers_loaded","data":{"servers":[{"name":"github-mcp-server","status":"disabled","source":"builtin"}]}}
507
+ ```
508
+
509
+ **Confirmed**: the flag flips `status` from `"connected"` to `"disabled"`. The
510
+ server is still *registered* (Copilot inventories it for diagnostics), but the
511
+ agent cannot use its tools. This is the desired behavior — Minions wants the
512
+ server invisible to the agent so all GitHub mutations route through the
513
+ project's `pull-requests.json` tracker rather than spawning ghost PRs.
514
+
515
+ > **Tooltip text for the dashboard `copilotDisableBuiltinMcps` toggle**
516
+ > (per P-7a5c1f8e):
517
+ >
518
+ > > When OFF, agents can autonomously create PRs / labels / comments via the
519
+ > > github-mcp-server, bypassing Minions' `pull-requests.json` tracking. Leave
520
+ > > this ON unless you have a specific reason to expose the server.
521
+
522
+ ---
523
+
524
+ ## 8. Effort Level Mapping
525
+
526
+ ```text
527
+ PS> copilot --help | findstr /C:"reasoning-effort"
528
+ --effort, --reasoning-effort <level> Set the reasoning effort level (choices:
529
+ "low", "medium", "high", "xhigh")
530
+ ```
531
+
532
+ Only four valid values: `low`, `medium`, `high`, `xhigh`. The Claude adapter
533
+ accepts `'max'` (verbatim) — Copilot does **not**. The adapter must map
534
+ `'max' → 'xhigh'` and pass everything else verbatim:
535
+
536
+ ```js
537
+ function _mapEffort(level) {
538
+ if (level === 'max') return 'xhigh';
539
+ return level;
540
+ }
541
+ ```
542
+
543
+ Per the model catalog (§6), Anthropic models advertise
544
+ `reasoning_effort: ["low", "medium", "high"]` — note the absence of `xhigh`.
545
+ OpenAI Codex variants advertise the full four-level set. Passing
546
+ `--effort xhigh --model claude-sonnet-4.5` is unverified; the safe behavior is
547
+ to honor whatever the user requests and let Copilot reject it at the API layer if
548
+ unsupported (the parser will surface that as an error event).
549
+
550
+ ---
551
+
552
+ ## 9. ARG_MAX on Windows — confirmed cliff at 32 KB
553
+
554
+ ```text
555
+ PS> $big = "x" * 40000
556
+ PS> copilot -p $big --output-format json ...
557
+
558
+ Program 'copilot.exe' failed to run:
559
+ An error occurred trying to start process 'copilot.exe' ...
560
+ The filename or extension is too long.
561
+ ```
562
+
563
+ Windows' `CreateProcess` caps the `lpCommandLine` string at 32,767 characters,
564
+ including the terminating null and every argument concatenated into it.
565
+ A 40,000-character `--prompt` arg is rejected before the binary even starts.
566
+
567
+ **Mitigation** (already adopted): pipe via stdin (§2). Stdin is unaffected by
568
+ ARG_MAX; tested with the same 40 KB string via PowerShell `|` and Copilot
569
+ processed it without complaint (full prompt arrived in
570
+ `user.message.data.content`).
571
+
572
+ Linux/macOS ARG_MAX is far higher (typically 128 KB to 2 MB), but stdin is
573
+ still preferred — keeps the adapter cross-platform and avoids surprise on
574
+ edge cases like `xargs`-style chaining.
575
+
576
+ ---
577
+
578
+ ## 10. Summary — Adapter Wire-Up Checklist for P-1d4a8e7c
579
+
580
+ When implementing `engine/runtimes/copilot.js`:
581
+
582
+ 1. `capabilities` block exactly matches the table at the top of this doc.
583
+ 2. `resolveBinary()`:
584
+ - PATH → standalone first; cache to `engine/copilot-caps.json` with
585
+ `{ copilotBin, copilotIsNative, leadingArgs: [] }`.
586
+ - `gh extension list | grep gh-copilot` → fallback with
587
+ `leadingArgs: ['copilot']`. Mark the result as `bestEffort: true` so
588
+ preflight can warn.
589
+ - **Never** probe npm. Document this in the file header.
590
+ 3. `buildArgs(opts)` always emits:
591
+ `--output-format json -s --allow-all --no-ask-user --autopilot --log-level error`
592
+ plus the conditional flags from §3, plus `--no-custom-instructions` /
593
+ `--disable-builtin-mcps` per `opts.suppressAgentsMd` / `opts.disableBuiltinMcps`.
594
+ **Never** emit `--verbose`.
595
+ 4. `buildPrompt()` injects `<system>...</system>\n\n` block when sysprompt is
596
+ non-empty; passthrough otherwise (§2).
597
+ 5. `resolveModel()` is verbatim passthrough; emit a one-time `console.warn`
598
+ when input is `'sonnet' | 'opus' | 'haiku'` (Claude shorthand the user
599
+ probably meant to set on the Claude adapter).
600
+ 6. `_mapEffort()` private helper does `'max' → 'xhigh'`; pass through otherwise.
601
+ 7. `parseOutput(raw)` produces:
602
+ - `text`: concatenation of all `assistant.message.data.content` (multi-turn
603
+ autopilot).
604
+ - `usage`: shape per §5.4 — `costUsd: null`, `outputTokens: <sum>`,
605
+ `premiumRequests: <result.usage.premiumRequests>`, durations from
606
+ `result.usage`.
607
+ - `sessionId`: from the `result` event.
608
+ - `model`: from any `session.tools_updated` event (`data.model`).
609
+ 8. `parseStreamChunk(line)` returns the parsed JSON or `null` if line is empty
610
+ / non-JSON. **Defensive**: any event whose `type` is not in the §5.1 inventory
611
+ should still parse cleanly — let the consumer decide to ignore.
612
+ 9. `parseError(rawOutput)` patterns:
613
+ - `auth-failure`: `/not authenticated|copilot login|401|403/i`
614
+ - `rate-limit`: `/rate limit|too many requests|429/i`
615
+ - `unknown-model`: `/unknown model|model not found|model.*invalid/i`
616
+ - `crash`: `/internal error|panic|uncaught/i`
617
+ 10. `listModels()` per §6 — return `null` on any failure (network, parse, auth).
618
+ `modelsCache` path: `engine/copilot-models.json`.
619
+
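The `parseError` patterns in item 9 could be wired up as an ordered table. This is a minimal sketch: the regexes are copied from the checklist, while the `{ kind, message }` return shape and first-match-wins ordering are assumptions made for illustration.

```javascript
// Ordered so the first matching pattern wins. Regexes are taken from item 9.
const ERROR_PATTERNS = [
  ['auth-failure', /not authenticated|copilot login|401|403/i],
  ['rate-limit', /rate limit|too many requests|429/i],
  ['unknown-model', /unknown model|model not found|model.*invalid/i],
  ['crash', /internal error|panic|uncaught/i],
];

// Returns { kind, message } for a recognized failure, or null otherwise.
// The return shape is an assumption, not the engine's documented contract.
function parseError(rawOutput) {
  const text = String(rawOutput ?? '');
  for (const [kind, pattern] of ERROR_PATTERNS) {
    if (pattern.test(text)) return { kind, message: text.trim() };
  }
  return null;
}

console.log(parseError('HTTP 429 Too Many Requests'));
```

Note that bare numeric patterns such as `401` or `429` can match unrelated digits in output, so a real implementation may want to anchor them more tightly.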
620
+ When the spike's findings disagree with the plan text, **this document wins**
621
+ (the plan was written before empirical confirmation). The notable deltas:
622
+ - `gpt-4o` is no longer in the catalog → use `gpt-4.1` for OpenAI tests.
623
+ - `--prompt @tmpfile` syntax is **not** supported on `copilot --prompt` (only
624
+ on `--additional-mcp-config`). Open question #3 in the PRD is closed: stdin
625
+ is the answer.
626
+ - `--verbose` does not exist; do not port the Claude adapter's `verbose: true`
627
+ default into the Copilot adapter.
628
+
629
+ ---
630
+
631
+ ## Provenance
632
+
633
+ - Test host: Windows 11, PowerShell 7+, `copilot.exe` 1.0.36 from WinGet.
634
+ - GitHub account: `yemi33` (active `gh auth` session, scopes:
635
+ `gist read:org repo workflow`).
636
+ - All JSONL samples reproducible via the commands documented in each section.
637
+ - Spike completed: 2026-04-28.