@bradygaster/squad-cli 0.9.6-insider.2 → 0.9.6-insider.3

Files changed (94)
  1. package/dist/cli/commands/doctor.d.ts.map +1 -1
  2. package/dist/cli/commands/doctor.js +29 -0
  3. package/dist/cli/commands/doctor.js.map +1 -1
  4. package/dist/cli/commands/export.d.ts +7 -3
  5. package/dist/cli/commands/export.d.ts.map +1 -1
  6. package/dist/cli/commands/export.js +68 -16
  7. package/dist/cli/commands/export.js.map +1 -1
  8. package/dist/cli/commands/import.d.ts +7 -3
  9. package/dist/cli/commands/import.d.ts.map +1 -1
  10. package/dist/cli/commands/import.js +140 -42
  11. package/dist/cli/commands/import.js.map +1 -1
  12. package/dist/cli/commands/link.d.ts.map +1 -1
  13. package/dist/cli/commands/link.js +7 -1
  14. package/dist/cli/commands/link.js.map +1 -1
  15. package/dist/cli/commands/memory.d.ts +2 -0
  16. package/dist/cli/commands/memory.d.ts.map +1 -0
  17. package/dist/cli/commands/memory.js +304 -0
  18. package/dist/cli/commands/memory.js.map +1 -0
  19. package/dist/cli/commands/plugin.d.ts.map +1 -1
  20. package/dist/cli/commands/plugin.js +420 -5
  21. package/dist/cli/commands/plugin.js.map +1 -1
  22. package/dist/cli/commands/state-mcp.d.ts +25 -0
  23. package/dist/cli/commands/state-mcp.d.ts.map +1 -0
  24. package/dist/cli/commands/state-mcp.js +168 -0
  25. package/dist/cli/commands/state-mcp.js.map +1 -0
  26. package/dist/cli/commands/watch/capabilities/board.d.ts.map +1 -1
  27. package/dist/cli/commands/watch/capabilities/board.js +2 -1
  28. package/dist/cli/commands/watch/capabilities/board.js.map +1 -1
  29. package/dist/cli/commands/watch/capabilities/decision-hygiene.js +1 -1
  30. package/dist/cli/commands/watch/capabilities/decision-hygiene.js.map +1 -1
  31. package/dist/cli/commands/watch/capabilities/execute.d.ts.map +1 -1
  32. package/dist/cli/commands/watch/capabilities/execute.js +12 -1
  33. package/dist/cli/commands/watch/capabilities/execute.js.map +1 -1
  34. package/dist/cli/commands/watch/capabilities/monitor-email.d.ts +1 -1
  35. package/dist/cli/commands/watch/capabilities/monitor-email.d.ts.map +1 -1
  36. package/dist/cli/commands/watch/capabilities/monitor-email.js +19 -3
  37. package/dist/cli/commands/watch/capabilities/monitor-email.js.map +1 -1
  38. package/dist/cli/commands/watch/capabilities/monitor-teams.d.ts +1 -1
  39. package/dist/cli/commands/watch/capabilities/monitor-teams.d.ts.map +1 -1
  40. package/dist/cli/commands/watch/capabilities/monitor-teams.js +19 -4
  41. package/dist/cli/commands/watch/capabilities/monitor-teams.js.map +1 -1
  42. package/dist/cli/commands/watch/capabilities/retro.js +1 -1
  43. package/dist/cli/commands/watch/capabilities/retro.js.map +1 -1
  44. package/dist/cli/commands/watch/index.d.ts.map +1 -1
  45. package/dist/cli/commands/watch/index.js +9 -6
  46. package/dist/cli/commands/watch/index.js.map +1 -1
  47. package/dist/cli/core/cast.d.ts.map +1 -1
  48. package/dist/cli/core/cast.js +132 -1
  49. package/dist/cli/core/cast.js.map +1 -1
  50. package/dist/cli/core/init.d.ts +2 -0
  51. package/dist/cli/core/init.d.ts.map +1 -1
  52. package/dist/cli/core/init.js +13 -1
  53. package/dist/cli/core/init.js.map +1 -1
  54. package/dist/cli/core/templates.d.ts.map +1 -1
  55. package/dist/cli/core/templates.js +31 -0
  56. package/dist/cli/core/templates.js.map +1 -1
  57. package/dist/cli/core/upgrade.d.ts +1 -0
  58. package/dist/cli/core/upgrade.d.ts.map +1 -1
  59. package/dist/cli/core/upgrade.js +171 -4
  60. package/dist/cli/core/upgrade.js.map +1 -1
  61. package/dist/cli/index.d.ts +1 -0
  62. package/dist/cli/index.d.ts.map +1 -1
  63. package/dist/cli/index.js +1 -0
  64. package/dist/cli/index.js.map +1 -1
  65. package/dist/cli/shell/components/App.js +1 -1
  66. package/dist/cli/shell/components/App.js.map +1 -1
  67. package/dist/cli/shell/components/MessageStream.js +2 -2
  68. package/dist/cli/shell/components/MessageStream.js.map +1 -1
  69. package/dist/cli/shell/coordinator.js +2 -2
  70. package/dist/cli/shell/index.d.ts.map +1 -1
  71. package/dist/cli/shell/index.js +2 -1
  72. package/dist/cli/shell/index.js.map +1 -1
  73. package/dist/cli-entry.js +51 -9
  74. package/dist/cli-entry.js.map +1 -1
  75. package/package.json +7 -3
  76. package/templates/after-agent-reference.md +64 -0
  77. package/templates/ceremony-reference.md +82 -0
  78. package/templates/client-compatibility-reference.md +46 -0
  79. package/templates/copilot-agent.md +96 -0
  80. package/templates/copilot-instructions.md +14 -0
  81. package/templates/model-selection-reference.md +101 -0
  82. package/templates/prd-intake.md +105 -0
  83. package/templates/rai-charter.md +110 -0
  84. package/templates/rai-policy.md +103 -0
  85. package/templates/ralph-reference.md +141 -0
  86. package/templates/routing.md +1 -0
  87. package/templates/scribe-charter.md +18 -151
  88. package/templates/session-init-reference.md +199 -0
  89. package/templates/skills/e2e-template-testing/SKILL.md +557 -0
  90. package/templates/skills/squad-commands/SKILL.md +303 -0
  91. package/templates/skills/squad-version-check/SKILL.md +160 -0
  92. package/templates/spawn-reference.md +132 -0
  93. package/templates/squad.agent.md.template +187 -622
  94. package/templates/worktree-reference.md +126 -0
@@ -0,0 +1,557 @@
1
+ ---
2
+ name: "e2e-template-testing"
3
+ description: "End-to-end validation of coordinator and agent template changes"
4
+ domain: "development"
5
+ confidence: "high"
6
+ source: "manual"
7
+ ---
8
+
9
+ ## Context
10
+
11
+ Squad's coordinator prompt (`squad.agent.md`) and agent charters (e.g.
12
+ `scribe-charter.md`) are shipped as templates in `.squad-templates/`. Changes to
13
+ these files affect how every squad session behaves — but unit tests can't catch
14
+ prompt-level regressions because the prompts are interpreted by an LLM at
15
+ runtime.
16
+
17
+ This skill describes how to validate template changes end-to-end by running real
18
+ squad sessions against a locally-built CLI that includes your modified templates.
19
+
20
+ ## When To Use
21
+
22
+ - You changed `.squad-templates/squad.agent.md` (coordinator prompt)
23
+ - You changed `.squad-templates/scribe-charter.md` or other agent charters
24
+ - You changed `.squad-templates/notes-protocol.md` or helper scripts
25
+ - You added new conditional blocks (e.g. state-backend-aware spawn templates)
26
+ - You modified the init scaffolding that writes templates to target repos
27
+
28
+ ## Prerequisites
29
+
30
+ - **Node.js** ≥20, **npm** ≥10
31
+ - **Git** CLI
32
+ - **GitHub Copilot CLI** (`copilot` or `ghcs`) installed
33
+ - A local clone of the squad repo on your feature branch
34
+
35
+ ## Workflow
36
+
37
+ ### Step 0 — Post initial tracking comment (**FIRST action — before anything else**)
38
+
39
+ If `PR_NUMBER` and `REPO` are both set, **the absolute first thing you do** — before
40
+ fast-fail checks, before building, before creating any repos — is post the initial
41
+ tracking comment with all steps marked as `:hourglass_flowing_sand: Pending`.
42
+
43
+ This gives reviewers immediate visibility that a run is in progress and what to expect.
44
+
45
+ ```powershell
46
+ $runStart = Get-Date
47
+ $body = @"
48
+ ## E2E Progress - PR $env:PR_NUMBER
49
+
50
+ | Step | Status | Started | Duration |
51
+ |---|---|---|---|
52
+ | 1. Fast-fail checks (build · link · ``squad version``) | :hourglass_flowing_sand: Pending | --:-- | -- |
53
+ | 2. Create test repo(s) | :hourglass_flowing_sand: Pending | --:-- | -- |
54
+ | 3. ``squad init`` + file verification | :hourglass_flowing_sand: Pending | --:-- | -- |
55
+ | 4. Run sessions | :hourglass_flowing_sand: Pending | --:-- | -- |
56
+ | 5. Verify outcomes | :hourglass_flowing_sand: Pending | --:-- | -- |
57
+ | 6. Record verdicts + post final comment | :hourglass_flowing_sand: Pending | --:-- | -- |
58
+
59
+ | Symbol | Meaning |
60
+ |---|---|
61
+ | :hourglass_flowing_sand: | Not started |
62
+ | :arrows_counterclockwise: | Running |
63
+ | :white_check_mark: | Passed |
64
+ | :x: | Failed |
65
+ | :warning: | Passed with caveats |
66
+
67
+ *Run started: $($runStart.ToString('HH:mm')) — all steps pending*
68
+ "@
69
+
70
+ $tmpFile = [System.IO.Path]::GetTempFileName()
71
+ $utf8NoBom = New-Object System.Text.UTF8Encoding $false
72
+ [System.IO.File]::WriteAllText($tmpFile, $body, $utf8NoBom)
73
+ $response = gh api "repos/$env:REPO/issues/$env:PR_NUMBER/comments" --method POST --field "body=@$tmpFile" | ConvertFrom-Json
74
+ $env:COMMENT_ID = $response.id
75
+ Remove-Item $tmpFile -Force
76
+ Write-Host "Progress comment posted — ID: $($response.id)"
77
+ ```
78
+
79
+ After posting, immediately update the comment to mark Step 1 as `:arrows_counterclockwise: Running` (do NOT wait — this is a two-step sequence: post all-pending, then immediately update to Step 1 running). Then proceed to Step 1.
80
+
81
+ See **Progress Reporting** for the full comment lifecycle and update patterns.
82
+
83
+ **⚠️ STOP before continuing:** If posting the comment fails (network error, auth error), abort the run and report the failure. Do not proceed silently without a tracking comment.
84
+
85
+ ### Step 1 — Build the CLI from your branch
86
+
87
+ ```bash
88
+ cd /path/to/squad # your feature branch
89
+ npm install
90
+ npm run build -w packages/squad-sdk && npm run build -w packages/squad-cli
91
+
92
+ # Link so `squad` command uses your local build (workspace flag — no cd required)
93
+ npm link -w packages/squad-cli
94
+ ```
95
+
96
+ Verify: `squad version` output includes the `-preview` suffix (e.g., `x.y.z-preview`),
97
+ confirming the local dev build is active. If the output shows a plain semver without
98
+ `-preview`, the globally-installed npm package is still in use — re-check the link step.
99
+ See [CONTRIBUTING.md — Making the `squad` Command Use Your Local Build](../../../CONTRIBUTING.md)
100
+ for the full guidance on local dev versioning.
101
+
102
+ ### Step 2 — Create a disposable test repo
103
+
104
+ ```bash
105
+ mkdir /tmp/sq-test-1 && cd /tmp/sq-test-1
106
+ git init
107
+ echo "# Test Project" > README.md
108
+ echo '{"name":"test-project","version":"1.0.0"}' > package.json
109
+ mkdir src
110
+ echo "export function hello() { return 'world' }" > src/index.ts
111
+ git add -A && git commit -m "init: test project"
112
+ ```
113
+
114
+ Keep the project small — you only need enough for the coordinator to recognize a
115
+ codebase and hire a team.
116
+
117
+ ### Step 3 — Init a squad with your modified templates
118
+
119
+ ```bash
120
+ squad init
121
+ # If testing a specific feature (e.g. state backends):
122
+ # squad init --state-backend git-notes
123
+ ```
124
+
125
+ Verify the init produced the expected files:
126
+ ```bash
127
+ ls -la .squad/
128
+ cat .squad/team.md # should have ## Members with 3+ agents
129
+ cat .squad/config.json # should reflect any CLI flags you passed
130
+ ```
131
+
132
+ ### Step 4 — Run a real session and capture output
133
+
134
+ Use the Copilot CLI's `-p` flag with `--allow-all-tools` for non-interactive sessions.
135
+ `--allow-all-tools` is **required** for automated/non-interactive runs — without it,
136
+ tool calls (including file writes) prompt for confirmation and block.
137
+
138
+ ```powershell
139
+ # PowerShell (Windows)
140
+ New-Item -ItemType Directory -Force evidence | Out-Null; copilot --agent squad --allow-all-tools -p "Picard, decide what testing framework to use. Write your decision." `
141
+ 2>&1 | Tee-Object evidence/session-task.log
142
+ ```
143
+
144
+ ```bash
145
+ # Bash (macOS/Linux)
146
+ mkdir -p evidence && copilot --agent squad --allow-all-tools -p "Picard, decide what testing framework to use. Write your decision." \
147
+ 2>&1 | tee evidence/session-task.log
148
+ ```
149
+
150
+ Alternatively, set the `COPILOT_ALLOW_ALL=1` environment variable instead of the flag.
151
+
152
+ For multi-turn workflows, run sequential sessions:
153
+ ```powershell
154
+ # Session A: give the team a task
155
+ copilot --agent squad --allow-all-tools -p "prompt A" 2>&1 | Tee-Object evidence/session-A.log
156
+
157
+ # Session B: verify state persisted
158
+ copilot --agent squad --allow-all-tools -p "What decisions has the team made?" 2>&1 | Tee-Object evidence/session-B.log
159
+ ```
160
+
161
+ ### Step 5 — Verify the outcome
162
+
163
+ Check that your template change had the expected effect. Common checks:
164
+
165
+ ```bash
166
+ # State location (for state-backend changes)
167
+ git notes --ref=squad list # git-notes backend
168
+ git ls-tree -r squad-state # orphan backend
169
+ ls .squad/agents/*/history.md # worktree backend
170
+
171
+ # Coordinator behavior (grep session log)
172
+ grep "STATE_BACKEND" evidence/session-task.log
173
+ grep "spawn" evidence/session-task.log
174
+
175
+ # File tree diff
176
+ git diff --stat HEAD~1 # what changed on working branch
177
+ git log --all --oneline # commits across all branches
178
+ ```
179
+
180
+ ### Step 6 — Record the verdict
181
+
182
+ Create an `evidence/verdict.md` in each test repo:
183
+
184
+ ```markdown
185
+ ## Test: [scenario name]
186
+ **Backend:** worktree | git-notes | orphan | two-layer
187
+ **Branch:** [your feature branch]
188
+ **Result:** PASS | PARTIAL | FAIL
189
+ **Duration:** Xm Ys
190
+
191
+ ### What was verified
192
+ - [ ] Coordinator identified feature correctly (from session log)
193
+ - [ ] Agent was spawned via `task` tool (not simulated)
194
+ - [ ] team.md has ## Members with 3+ agents
195
+ - [ ] State landed in correct location
196
+ - [ ] No unexpected side effects
197
+
198
+ ### Evidence files
199
+ - session-task.log — full session output
200
+ - git-log.txt — `git log --all --oneline`
201
+
202
+ ### Notes
203
+ [anything unusual or noteworthy]
204
+ ```
205
+
206
+ Record the wall-clock time from the start of Step 1 (fast-fail checks) to the end
207
+ of Step 6 (verdict posted). This is the full E2E run duration for this scenario.
208
+
209
+ ## Progress Reporting
210
+
211
+ Use this section only when you are running E2E validation for an open PR. If
212
+ `PR_NUMBER` and `REPO` are both set, post and maintain a live tracking comment
213
+ in the PR thread. If either value is missing (for example, a local-only run),
214
+ skip progress reporting silently.
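
The gate described above could be sketched in bash roughly as follows. The helper name `should_report_progress` is hypothetical; only the `PR_NUMBER` and `REPO` variables come from this skill.

```bash
# Hypothetical helper (not part of squad): succeeds only when both
# PR_NUMBER and REPO are set to non-empty values.
should_report_progress() {
  [ -n "${PR_NUMBER:-}" ] && [ -n "${REPO:-}" ]
}

# Local-only runs fall through silently, as the skill requires.
if should_report_progress; then
  echo "progress reporting enabled for $REPO PR $PR_NUMBER"
fi
```

Wrapping the check in a function keeps every later step from repeating the two-variable test.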
215
+
216
+ ### Start the tracking comment (Step 0 — see Workflow above)
217
+
218
+ The initial comment must be posted as **Step 0** — the absolute first action before
219
+ anything else. See the Step 0 block in the Workflow section for the exact code.
220
+
221
+ The subsections below describe how to **update** the comment at each step boundary.
222
+ For reference, here is a bash equivalent of the Step 0 post-then-update sequence:
223
+
224
+ 1. Post a PR comment before Step 1 begins:
225
+
226
+ ```bash
227
+ gh pr comment "$PR_NUMBER" --repo "$REPO" --body $'## E2E Progress\n\n| Step | Status | Started | Duration |
228
+ |---|---|---|---|
229
+ | 1. Fast-fail checks (build · link · `squad version`) | ⏳ Pending | --:-- | -- |
230
+ | 2. Create test repo(s) | ⏳ Pending | --:-- | -- |
231
+ | 3. `squad init` + file verification | ⏳ Pending | --:-- | -- |
232
+ | 4. Run sessions | ⏳ Pending | --:-- | -- |
233
+ | 5. Verify outcomes | ⏳ Pending | --:-- | -- |
234
+ | 6. Record verdicts + post final comment | ⏳ Pending | --:-- | -- |
235
+ \n| Symbol | Meaning |
236
+ |---|---|
237
+ | ⏳ | Not started |
238
+ | 🔄 | Running |
239
+ | ✅ | Passed |
240
+ | ❌ | Failed |
241
+ | ⚠️ | Passed with caveats |'
242
+ ```
243
+
244
+ 2. Capture the comment ID immediately after posting it:
245
+
246
+ ```bash
247
+ COMMENT_ID=$(gh api "repos/$REPO/issues/$PR_NUMBER/comments" --jq '.[-1].id')
248
+ ```
249
+
250
+ 3. Treat Step 1 as in progress as soon as the comment exists. Update the body so
251
+ Step 1 shows `🔄 Running` and every later step remains `⏳ Pending`.
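
Reading `.[-1].id` from the comment list only sees the first page of results (30 comments by default), so on a busy PR it may not return the comment just posted. One alternative, sketched here on the assumption that `gh pr comment` prints the new comment's URL ending in `#issuecomment-<id>` (worth confirming against your `gh` version), is to parse the ID out of that URL. The helper name `comment_id_from_url` is hypothetical.

```bash
# Hypothetical helper: extract the numeric ID from a comment URL such as
# https://github.com/OWNER/REPO/pull/12#issuecomment-3456
comment_id_from_url() {
  case "$1" in
    *'#issuecomment-'*)
      # Drop everything up to and including "#issuecomment-".
      printf '%s\n' "${1##*'#issuecomment-'}"
      ;;
    *)
      # Not a comment URL: signal failure so the caller can abort.
      return 1
      ;;
  esac
}

# Intended use (commented out so the sketch makes no network calls):
# url=$(gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file body.md)
# COMMENT_ID=$(comment_id_from_url "$url") || exit 1
```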
252
+
253
+ ### Update the tracking comment after every step boundary
254
+
255
+ 1. When marking a step `🔄 Running`, record `$startTime = Get-Date` and store the
256
+ `HH:MM` start time in that row's **Started** column.
257
+ 2. Edit the existing comment in place; do not post a new progress comment:
258
+
259
+ ```bash
260
+ gh api --method PATCH "repos/$REPO/issues/comments/$COMMENT_ID" --field body="..."
261
+ ```
262
+
263
+ 3. When marking a step `✅`, `❌`, or `⚠️`, compute
264
+ `$duration = (Get-Date) - $startTime` and format it as
265
+ `"{0}m {1}s" -f [math]::Floor($duration.TotalMinutes), $duration.Seconds`.
266
+ 4. Update the completed step row to `✅`, `❌`, or `⚠️`, keep its original
267
+ `HH:MM` value in **Started**, and write the formatted duration in **Duration**.
268
+ 5. Keep all previously completed rows unchanged.
269
+ 6. Mark the next step as `🔄 Running` and set its **Started** value.
270
+ 7. Leave later steps as `⏳ Pending` with `--:--` for **Started** and `--` for
271
+ **Duration**.
272
+ 8. If a step fails and you stop early, still update the comment so the failed step
273
+ shows `❌` with its original start time and computed duration, and Step 6
274
+ becomes `🔄 Running` while you prepare the final verdict.
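
For runs driven from bash rather than PowerShell, the same `Xm Ys` formatting could be sketched as below. `format_duration` is a hypothetical helper name; it takes elapsed whole seconds and truncates the minutes so the seconds part always stays in the 0 to 59 range.

```bash
# Hypothetical helper: format elapsed seconds as "Xm Ys".
# Shell integer division truncates, so 105 seconds becomes 1m 45s.
format_duration() {
  printf '%dm %ds\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
}

start=$(date +%s)
# ... run the step here ...
end=$(date +%s)
format_duration "$(( end - start ))"
```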
275
+
276
+ ### Use this status legend in the comment
277
+
278
+ | Symbol | Meaning |
279
+ |---|---|
280
+ | ⏳ | Not started |
281
+ | 🔄 | Running |
282
+ | ✅ | Passed |
283
+ | ❌ | Failed |
284
+ | ⚠️ | Passed with caveats |
285
+
286
+ ### Use exact step names and order
287
+
288
+ Keep these six rows in this exact order every time you update the comment:
289
+
290
+ 1. Fast-fail checks (build · link · `squad version`)
291
+ 2. Create test repo(s)
292
+ 3. `squad init` + file verification
293
+ 4. Run sessions
294
+ 5. Verify outcomes
295
+ 6. Record verdicts + post final comment
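
One way to keep the row names and order fixed is to generate each row from a single lookup instead of retyping it. The sketch below assumes nothing beyond the six names listed above; `step_name` and `render_row` are hypothetical helper names.

```bash
# Hypothetical lookup: the six step names, exactly as listed in this skill.
step_name() {
  case "$1" in
    1) echo 'Fast-fail checks (build · link · `squad version`)' ;;
    2) echo 'Create test repo(s)' ;;
    3) echo '`squad init` + file verification' ;;
    4) echo 'Run sessions' ;;
    5) echo 'Verify outcomes' ;;
    6) echo 'Record verdicts + post final comment' ;;
    *) return 1 ;;
  esac
}

# Hypothetical renderer: $1 = step number, $2 = status text,
# $3 = started time (defaults to --:--), $4 = duration (defaults to --).
render_row() {
  started="${3:---:--}"
  duration="${4:---}"
  printf '| %s. %s | %s | %s | %s |\n' "$1" "$(step_name "$1")" "$2" "$started" "$duration"
}

# Example: step 2 running since 14:05, no duration yet.
render_row 2 'Running' '14:05'
```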
296
+
297
+ ### Handle Windows comment bodies safely
298
+
299
+ On Windows PowerShell 5.1, use the **`--field body=@file`** pattern to post comment
300
+ bodies. Write the content to a temp file using UTF-8 **without BOM**, then pass
301
+ `--field "body=@$tmpFile"` to `gh api`. This is more reliable than piping JSON
302
+ through `--input -` on PS 5.1, which can silently corrupt multi-byte characters
303
+ even with `[Console]::OutputEncoding = UTF8`.
304
+
305
+ Key rules:
306
+ - Use `New-Object System.Text.UTF8Encoding $false` (the `$false` disables the BOM).
307
+ `[System.Text.Encoding]::UTF8` writes a BOM which GitHub renders as a stray
308
+ character (U+FEFF) at the start of the comment.
309
+ - Use `--field "body=@$tmpFile"`, NOT `--input -` or `--input filename`, for
310
+ comment body updates. The `@` prefix tells `gh` to read the field value from
311
+ the file rather than treating the path as a literal string.
312
+ - Clean up the temp file after posting.
313
+ - Scrub any local absolute paths from the body before posting (see PII Protection
314
+ section).
315
+
316
+ ```powershell
317
+ $step1StartTime = Get-Date
318
+ $step1Started = $step1StartTime.ToString('HH:mm')
319
+ $step1Duration = (Get-Date) - $step1StartTime
320
+ $step1DurationText = "{0}m {1}s" -f [math]::Floor($step1Duration.TotalMinutes), $step1Duration.Seconds
321
+ $step2StartTime = Get-Date
322
+ $step2Started = $step2StartTime.ToString('HH:mm')
323
+ $body = @"
324
+ ## E2E Progress
325
+
326
+ | Step | Status | Started | Duration |
327
+ |---|---|---|---|
328
+ | 1. Fast-fail checks (build · link · ``squad version``) | :white_check_mark: Passed | $step1Started | $step1DurationText |
329
+ | 2. Create test repo(s) | :arrows_counterclockwise: Running | $step2Started | -- |
330
+ | 3. ``squad init`` + file verification | :hourglass_flowing_sand: Pending | --:-- | -- |
331
+ | 4. Run sessions | :hourglass_flowing_sand: Pending | --:-- | -- |
332
+ | 5. Verify outcomes | :hourglass_flowing_sand: Pending | --:-- | -- |
333
+ | 6. Record verdicts + post final comment | :hourglass_flowing_sand: Pending | --:-- | -- |
334
+
335
+ | Symbol | Meaning |
336
+ |---|---|
337
+ | :hourglass_flowing_sand: | Not started |
338
+ | :arrows_counterclockwise: | Running |
339
+ | :white_check_mark: | Passed |
340
+ | :x: | Failed |
341
+ | :warning: | Passed with caveats |
342
+ "@
343
+
344
+ $tmpFile = "$env:TEMP\e2e-comment-body.md"
345
+ $utf8NoBom = New-Object System.Text.UTF8Encoding $false
346
+ [System.IO.File]::WriteAllText($tmpFile, $body, $utf8NoBom)
347
+ gh api --method PATCH "repos/$env:REPO/issues/comments/$env:COMMENT_ID" --field "body=@$tmpFile"
348
+ Remove-Item $tmpFile -Force
349
+ ```
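
To double-check a body file before posting, a byte-level BOM test can be sketched in a POSIX-style shell (for example Git Bash on Windows). `has_utf8_bom` is a hypothetical helper name, and the sketch assumes `head -c` and `od` are available.

```bash
# Hypothetical helper: succeeds when the file starts with the UTF-8 BOM
# bytes EF BB BF, fails otherwise (including for an empty file).
has_utf8_bom() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Intended use (commented out; the file name is only an example):
# if has_utf8_bom e2e-comment-body.md; then
#   echo "body file has a BOM; rewrite it without one" >&2
# fi
```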
350
+
351
+ ### Progressive Verdicting — Post After Each Scenario (Critical)
352
+
353
+ **Do NOT batch all scenario results to the end.** This is the most common cause
354
+ of lost verdicts. After each scenario completes, **immediately** PATCH the
355
+ tracking comment with that scenario's result before moving to the next one.
356
+
357
+ The pattern for each scenario:
358
+
359
+ ```powershell
360
+ # After scenario N completes — PATCH immediately, before starting scenario N+1
361
+ $elapsed = (Get-Date) - $scenarioNStartTime; $scenarioNDuration = "{0}m {1}s" -f [math]::Floor($elapsed.TotalMinutes), $elapsed.Seconds
362
+ # ...rebuild the full comment body with this scenario updated to PASS/FAIL/PARTIAL...
363
+ $tmpFile = [System.IO.Path]::GetTempFileName()
364
+ $utf8NoBom = New-Object System.Text.UTF8Encoding $false
365
+ [System.IO.File]::WriteAllText($tmpFile, $body, $utf8NoBom)
366
+ gh api --method PATCH "repos/$env:REPO/issues/comments/$env:COMMENT_ID" --field "body=@$tmpFile"
367
+ Remove-Item $tmpFile -Force
368
+ Write-Host "Scenario N verdict posted"
369
+ ```
370
+
371
+ This guarantees that even if the AI model connection drops mid-run, the last
372
+ successfully PATCHed state is always visible in the PR.
373
+
374
+ ### Agent Run Time Budget
375
+
376
+ **⚠️ Critical: Background agents lose their AI model connection after ~15 minutes
377
+ of continuous execution.** This is a platform limit, not a bug in your code.
378
+ The verdict stage appears to "hang" because the connection drops right at the end
379
+ when the agent has been running too long.
380
+
381
+ **Per-agent scenario budget:**
382
+
383
+ | Scenario type | Estimated time | Budget |
384
+ |---|---|---|
385
+ | Static checks only (file existence, grep, size) | 1-3 min | 4 per agent |
386
+ | `squad init` + file verification (no copilot session) | 3-5 min | 3 per agent |
387
+ | `squad init` + one `copilot --agent squad` session | 8-15 min | **1 per agent** |
388
+ | Build + link + one copilot session | 12-20 min | **1 per agent** |
389
+
390
+ **Rule: Limit yourself to 1 scenario that includes a `copilot --agent squad` session
391
+ per agent run.** For a plan with multiple copilot-session scenarios, run them in
392
+ separate agents — not in sequence within a single agent.
393
+
394
+ If your scenario plan has N copilot-session scenarios, request N separate sims
395
+ agents to run them in parallel (one scenario each). Static scenarios may be
396
+ batched up to 4 per agent.
397
+
398
+ **If you are running a scenario with a `copilot --agent squad` session:**
399
+ - Run the build and link ONCE at the start (shared across all static scenarios)
400
+ - Run the copilot session immediately after the repo is set up
401
+ - PATCH the comment with the result immediately after the session ends
402
+ - Then proceed to static scenarios while you still have connection budget
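
The budget rule can be turned into a quick planning estimate. The sketch below treats it as an upper bound: one agent per copilot-session scenario plus one agent per batch of four static scenarios. It ignores the option of running static scenarios after a copilot session in the same agent, and `agents_needed` is a hypothetical helper name.

```bash
# Hypothetical helper: $1 = copilot-session scenarios, $2 = static scenarios.
# Static scenarios are batched four per agent, rounding up.
agents_needed() {
  echo $(( $1 + ($2 + 3) / 4 ))
}

# Example plan: 2 copilot-session scenarios and 5 static scenarios.
agents_needed 2 5
```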
403
+
404
+ ### Replace the tracking comment with the final verdict
405
+
406
+ When you reach Step 6, replace the tracking comment body entirely with the final
407
+ structured verdict table. Do not post a separate final comment. The tracking
408
+ comment is the final verdict comment.
409
+
410
+ Include a summary row at the bottom of the final table showing the total elapsed
411
+ time for the full run:
412
+
413
+ ```text
414
+ | **Total** | — | HH:MM | Xm Ys |
415
+ ```
416
+
417
+ **If the connection drops before Step 6:** The last progressive verdict PATCH
418
+ already shows the partial state. The next agent run should read the existing
419
+ comment, pick up where it left off, and add remaining scenario rows rather than
420
+ starting fresh.
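
A resuming run needs to know where the previous one stopped. As a rough sketch, assuming the comment body keeps the four-column table used throughout this skill, the first row still marked Pending or Running can be found with `awk`. `first_unfinished_step` is a hypothetical helper name.

```bash
# Hypothetical helper: reads a comment body on stdin and prints the number
# of the first step row whose Status column says Pending or Running.
# Exits non-zero when every step row is already finished.
first_unfinished_step() {
  awk -F'|' '
    $2 ~ /^ *[0-9]+\. / && $3 ~ /Pending|Running/ {
      sub(/^ */, "", $2)    # trim leading spaces from the Step column
      sub(/\..*/, "", $2)   # keep only the step number
      print $2
      found = 1
      exit
    }
    END { if (!found) exit 1 }
  '
}

# Intended use (commented out so the sketch makes no network calls):
# gh api "repos/$REPO/issues/comments/$COMMENT_ID" --jq '.body' | first_unfinished_step
```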
421
+
422
+ ## Test Matrix Template
423
+
424
+ Use this matrix when planning validation for a template change. Not every change
425
+ needs every row — pick the scenarios relevant to your modification.
426
+
427
+ | # | Scenario | What to verify | Duration |
428
+ |---|----------|----------------|----------|
429
+ | 1 | Basic init + task | Templates applied, agent spawned, work produced | — |
430
+ | 2 | Cross-branch persistence | State survives `git checkout` (if state-backend) | — |
431
+ | 3 | Scribe behavior | Scribe commits to correct target | — |
432
+ | 4 | PR cleanliness | Feature branch PR has no leaked state files | — |
433
+ | 5 | Migration path | Existing squad picks up new template behavior | — |
434
+ | 6 | Edge case: empty repo | Init works in repo with single commit | — |
435
+ | 7 | Edge case: monorepo | Init works in subdirectory of monorepo | — |
436
+
437
+ Note: Keep `—` during planning, then replace it with the actual elapsed time when
438
+ recording the verdict for each scenario.
439
+
440
+ ## Tips
441
+
442
+ - **Name test repos descriptively:** `sq-test-notes-crossbranch`, not `test1`.
443
+ - **Always capture session logs.** Without logs, you can't debug failures.
444
+ - **One scenario per repo.** Don't reuse repos across unrelated tests — state
445
+ leaks between tests make results unreliable.
446
+ - **Clean up after.** Delete test repos when done. They accumulate fast.
447
+ - **Windows users:** Use PowerShell. `Tee-Object` replaces `tee`. Paths use `\`.
448
+
449
+ ## Fast-Fail Rules
450
+
451
+ These checks must pass before running any scenario. If any fail, stop
452
+ immediately and report the failure — do **not** attempt workarounds or mark
453
+ scenarios as SKIPPED.
454
+
455
+ 0. **Clean stale SDK before building.** Before running `npm run build`, remove any
456
+ stale published copy of `@bradygaster/squad-sdk` that may have been installed
457
+ into `packages/squad-cli/node_modules/`. This local copy shadows the workspace
458
+ symlink in the root `node_modules/` and causes TypeScript to see the published
459
+ version instead of the local source. Run from the repo root:
460
+ ```powershell
461
+ $stale = "packages\squad-cli\node_modules\@bradygaster\squad-sdk"
462
+ if (Test-Path $stale) { Remove-Item -Recurse -Force $stale; Write-Host "Cleaned stale SDK" }
463
+ ```
464
+ This is safe to run unconditionally — if the path doesn't exist, the command is
465
+ a no-op. The root `node_modules\@bradygaster\squad-sdk` workspace symlink remains
466
+ intact and npm will use it automatically.
467
+ 1. **Build must succeed.** Run `npm run build` from the repo root. A build
468
+ failure blocks all scenarios; report `BUILD_FAILED` and stop.
469
+ - If the error is `tsc: not found` or similar missing-binary errors, run
470
+ `npm install` first to reconcile `node_modules` with the lock file, then
471
+ retry. This can happen after `git checkout HEAD -- package-lock.json`
472
+ restores the lock file without reinstalling.
473
+ 2. **CLI must link successfully.** `cd packages/squad-cli && npm link` must exit
474
+ 0. If it fails, report `LINK_FAILED` and stop.
475
+ 3. **`squad version` must run.** After linking, `squad version` must output a
476
+ version string. If not, report `CLI_NOT_FOUND` and stop.
477
+
478
+ Do **not** mark scenarios as SKIPPED due to build or environment errors —
479
+ that obscures real failures from reviewers. SKIPPED is only acceptable when the
480
+ user explicitly requests it.
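
The stop-and-report behavior could be wrapped in a small bash helper, sketched here. `fast_fail` is a hypothetical name; the failure codes are the ones named in the rules above.

```bash
# Hypothetical helper: $1 = failure code to report, remaining args = command.
# Prints the code and returns non-zero when the command fails.
fast_fail() {
  code="$1"
  shift
  if ! "$@"; then
    echo "$code"
    return 1
  fi
}

# Intended sequence (commented out so the sketch has no side effects):
# fast_fail BUILD_FAILED  npm run build                  || exit 1
# fast_fail LINK_FAILED   npm link -w packages/squad-cli || exit 1
# fast_fail CLI_NOT_FOUND squad version                  || exit 1
```

Chaining each call with `|| exit 1` stops the run at the first failed check instead of letting later scenarios start.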
481
+
482
+ ## PII Protection — Mandatory
483
+
484
+ When posting evidence to PR comments, issues, or any shared document:
485
+
486
+ - **Never include absolute paths** that contain a local username (e.g.,
487
+ `C:\Users\username\...` or `/home/username/...`).
488
+ - **Use `~` notation** for home-relative paths: `~/AppData/Local/Temp/...`
489
+ or `~/tmp/sq-test-1`.
490
+ - **Repo-internal paths use `<repo-root>` as the prefix.** If evidence files live
491
+ inside the repository (e.g. `.e2e/`, `tmp/`, or any subdirectory of the repo),
492
+ write them as `<repo-root>\.e2e\pr-1035\evidence` — not as `~\..\..\...` backward
493
+ navigation. `<repo-root>` is a clear, portable placeholder for the repository root
494
+ that does not expose the machine's directory layout.
495
+ - **Scrub before posting.** Replace any occurrence of the local machine path
496
+ prefix (everything up to and including the username segment) with `~`.
497
+
498
+ Example — ❌ wrong: `C:\Users\johndoe\AppData\Local\Temp\sq-e2e-pr1035\evidence`
499
+ Example — ✅ right: `~/AppData/Local/Temp/sq-e2e-pr1035/evidence`
500
+ Example — ❌ wrong: `~\..\..\repos\squad\.e2e\pr-1035\evidence`
501
+ Example — ✅ right: `<repo-root>\.e2e\pr-1035\evidence`
502
+
503
+ This applies to all evidence tables, verdict files, and PR comments.
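
The prefix scrub could be automated with a filter like the sketch below. It is only an approximation: it assumes usernames consist of letters, digits, dots, underscores, and hyphens, it leaves the remaining path separators as they are, and it does not handle the `<repo-root>` rule. `scrub_paths` is a hypothetical helper name.

```bash
# Hypothetical filter: reads text on stdin and replaces home-directory
# prefixes (Windows C:\Users\name, Unix /home/name or /Users/name) with "~".
scrub_paths() {
  sed -E \
    -e 's#[A-Za-z]:\\Users\\[A-Za-z0-9._-]+#~#g' \
    -e 's#/(home|Users)/[A-Za-z0-9._-]+#~#g'
}

# Example: scrub a path before it goes into a PR comment.
printf '%s\n' '/home/johndoe/tmp/sq-test-1' | scrub_paths
```

Review the scrubbed text before posting, because usernames containing spaces or other characters fall outside the pattern.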
504
+
505
+ ## Anti-Patterns
506
+
507
+ - **Skipping the local build.** If you test with the published CLI, you're
508
+ testing the old templates, not your changes.
509
+ - **Posting absolute paths in PR comments.** Always scrub to `~`-relative paths
510
+ before sharing. See PII Protection above.
511
+ - **Marking scenarios SKIPPED due to environment issues.** Fix the environment
512
+ (use fast-fail rules above) or report BUILD_FAILED — never silently skip.
513
+ - **Testing only the happy path.** Template changes often break edge cases (empty
514
+ repos, monorepos, cross-branch). Test at least 2-3 scenarios.
515
+ - **Trusting session output alone.** Always verify git state independently —
516
+ agents can claim they wrote something without actually doing it.
517
+ - **Reusing test repos.** Prior state bleeds into later tests. Start fresh.
518
+ - **Batching all scenario verdicts to the end.** The AI model connection drops
519
+ after ~15 minutes. Always PATCH the comment after each scenario so partial
520
+ results are never lost. See Progressive Verdicting above.
521
+ - **Running multiple `copilot --agent squad` sessions in one agent.** Each
522
+ session takes 5-15 minutes; combined with build time, you'll hit the ~15-minute
523
+ connection budget. One copilot session per agent — split into parallel agents
524
+ if your plan has more.
525
+
526
+ ## Sandbox / Permission Notes
527
+
528
+ ### Always pass `--allow-all-tools` in non-interactive mode
529
+
530
+ The Copilot CLI requires explicit permission to run tools automatically. In
531
+ interactive mode the user approves each tool call; in non-interactive mode (`-p`)
532
+ those prompts cannot be displayed and writes fail silently or with a "Permission
533
+ denied and could not request permission from user" error.
534
+
535
+ Fix: always include `--allow-all-tools` (or `--yolo` / `--allow-all`) in Step 4
536
+ commands, or export `COPILOT_ALLOW_ALL=1` before running E2E sessions.
537
+
538
+ This also applies when `copilot --agent squad` is launched as a subprocess from
539
+ inside a Copilot CLI background agent (e.g. Sims running via the `task` tool) —
540
+ the flag is still needed.
541
+
542
+ ### `--allow-all-paths` for repos outside the CWD
543
+
544
+ By default the CLI restricts file access to the current directory tree. If the
545
+ coordinator needs to read files from a parent repo while running in a disposable
546
+ test repo, add `--allow-all-paths`:
547
+
548
+ ```powershell
549
+ copilot --agent squad --allow-all-tools --allow-all-paths -p "..."
550
+ ```
551
+
552
+ Or use the combined shorthand: `--allow-all` / `--yolo`.
553
+
554
+ ## Confidence
555
+
556
+ high — Validated through 12 real E2E test sessions during state-backend
557
+ development (PR #1004). `--allow-all-tools` requirement confirmed in PR #1035.