@bradygaster/squad-cli 0.9.6-insider.2 → 0.9.6-insider.3
- package/dist/cli/commands/doctor.d.ts.map +1 -1
- package/dist/cli/commands/doctor.js +29 -0
- package/dist/cli/commands/doctor.js.map +1 -1
- package/dist/cli/commands/export.d.ts +7 -3
- package/dist/cli/commands/export.d.ts.map +1 -1
- package/dist/cli/commands/export.js +68 -16
- package/dist/cli/commands/export.js.map +1 -1
- package/dist/cli/commands/import.d.ts +7 -3
- package/dist/cli/commands/import.d.ts.map +1 -1
- package/dist/cli/commands/import.js +140 -42
- package/dist/cli/commands/import.js.map +1 -1
- package/dist/cli/commands/link.d.ts.map +1 -1
- package/dist/cli/commands/link.js +7 -1
- package/dist/cli/commands/link.js.map +1 -1
- package/dist/cli/commands/memory.d.ts +2 -0
- package/dist/cli/commands/memory.d.ts.map +1 -0
- package/dist/cli/commands/memory.js +304 -0
- package/dist/cli/commands/memory.js.map +1 -0
- package/dist/cli/commands/plugin.d.ts.map +1 -1
- package/dist/cli/commands/plugin.js +420 -5
- package/dist/cli/commands/plugin.js.map +1 -1
- package/dist/cli/commands/state-mcp.d.ts +25 -0
- package/dist/cli/commands/state-mcp.d.ts.map +1 -0
- package/dist/cli/commands/state-mcp.js +168 -0
- package/dist/cli/commands/state-mcp.js.map +1 -0
- package/dist/cli/commands/watch/capabilities/board.d.ts.map +1 -1
- package/dist/cli/commands/watch/capabilities/board.js +2 -1
- package/dist/cli/commands/watch/capabilities/board.js.map +1 -1
- package/dist/cli/commands/watch/capabilities/decision-hygiene.js +1 -1
- package/dist/cli/commands/watch/capabilities/decision-hygiene.js.map +1 -1
- package/dist/cli/commands/watch/capabilities/execute.d.ts.map +1 -1
- package/dist/cli/commands/watch/capabilities/execute.js +12 -1
- package/dist/cli/commands/watch/capabilities/execute.js.map +1 -1
- package/dist/cli/commands/watch/capabilities/monitor-email.d.ts +1 -1
- package/dist/cli/commands/watch/capabilities/monitor-email.d.ts.map +1 -1
- package/dist/cli/commands/watch/capabilities/monitor-email.js +19 -3
- package/dist/cli/commands/watch/capabilities/monitor-email.js.map +1 -1
- package/dist/cli/commands/watch/capabilities/monitor-teams.d.ts +1 -1
- package/dist/cli/commands/watch/capabilities/monitor-teams.d.ts.map +1 -1
- package/dist/cli/commands/watch/capabilities/monitor-teams.js +19 -4
- package/dist/cli/commands/watch/capabilities/monitor-teams.js.map +1 -1
- package/dist/cli/commands/watch/capabilities/retro.js +1 -1
- package/dist/cli/commands/watch/capabilities/retro.js.map +1 -1
- package/dist/cli/commands/watch/index.d.ts.map +1 -1
- package/dist/cli/commands/watch/index.js +9 -6
- package/dist/cli/commands/watch/index.js.map +1 -1
- package/dist/cli/core/cast.d.ts.map +1 -1
- package/dist/cli/core/cast.js +132 -1
- package/dist/cli/core/cast.js.map +1 -1
- package/dist/cli/core/init.d.ts +2 -0
- package/dist/cli/core/init.d.ts.map +1 -1
- package/dist/cli/core/init.js +13 -1
- package/dist/cli/core/init.js.map +1 -1
- package/dist/cli/core/templates.d.ts.map +1 -1
- package/dist/cli/core/templates.js +31 -0
- package/dist/cli/core/templates.js.map +1 -1
- package/dist/cli/core/upgrade.d.ts +1 -0
- package/dist/cli/core/upgrade.d.ts.map +1 -1
- package/dist/cli/core/upgrade.js +171 -4
- package/dist/cli/core/upgrade.js.map +1 -1
- package/dist/cli/index.d.ts +1 -0
- package/dist/cli/index.d.ts.map +1 -1
- package/dist/cli/index.js +1 -0
- package/dist/cli/index.js.map +1 -1
- package/dist/cli/shell/components/App.js +1 -1
- package/dist/cli/shell/components/App.js.map +1 -1
- package/dist/cli/shell/components/MessageStream.js +2 -2
- package/dist/cli/shell/components/MessageStream.js.map +1 -1
- package/dist/cli/shell/coordinator.js +2 -2
- package/dist/cli/shell/index.d.ts.map +1 -1
- package/dist/cli/shell/index.js +2 -1
- package/dist/cli/shell/index.js.map +1 -1
- package/dist/cli-entry.js +51 -9
- package/dist/cli-entry.js.map +1 -1
- package/package.json +7 -3
- package/templates/after-agent-reference.md +64 -0
- package/templates/ceremony-reference.md +82 -0
- package/templates/client-compatibility-reference.md +46 -0
- package/templates/copilot-agent.md +96 -0
- package/templates/copilot-instructions.md +14 -0
- package/templates/model-selection-reference.md +101 -0
- package/templates/prd-intake.md +105 -0
- package/templates/rai-charter.md +110 -0
- package/templates/rai-policy.md +103 -0
- package/templates/ralph-reference.md +141 -0
- package/templates/routing.md +1 -0
- package/templates/scribe-charter.md +18 -151
- package/templates/session-init-reference.md +199 -0
- package/templates/skills/e2e-template-testing/SKILL.md +557 -0
- package/templates/skills/squad-commands/SKILL.md +303 -0
- package/templates/skills/squad-version-check/SKILL.md +160 -0
- package/templates/spawn-reference.md +132 -0
- package/templates/squad.agent.md.template +187 -622
- package/templates/worktree-reference.md +126 -0
@@ -0,0 +1,557 @@
---
name: "e2e-template-testing"
description: "End-to-end validation of coordinator and agent template changes"
domain: "development"
confidence: "high"
source: "manual"
---

## Context

Squad's coordinator prompt (`squad.agent.md`) and agent charters (e.g.
`scribe-charter.md`) are shipped as templates in `.squad-templates/`. Changes to
these files affect how every squad session behaves — but unit tests can't catch
prompt-level regressions because the prompts are interpreted by an LLM at
runtime.

This skill describes how to validate template changes end-to-end by running real
squad sessions against a locally-built CLI that includes your modified templates.

## When To Use

- You changed `.squad-templates/squad.agent.md` (coordinator prompt)
- You changed `.squad-templates/scribe-charter.md` or other agent charters
- You changed `.squad-templates/notes-protocol.md` or helper scripts
- You added new conditional blocks (e.g. state-backend-aware spawn templates)
- You modified the init scaffolding that writes templates to target repos

## Prerequisites

- **Node.js** ≥20, **npm** ≥10
- **Git** CLI
- **GitHub Copilot CLI** (`copilot` or `ghcs`) installed
- **GitHub CLI** (`gh`), authenticated, if you post progress comments to a PR
- A local clone of the squad repo on your feature branch

## Workflow

### Step 0 — Post initial tracking comment (**FIRST action — before anything else**)

If `PR_NUMBER` and `REPO` are both set, **the absolute first thing you do** — before
fast-fail checks, before building, before creating any repos — is post the initial
tracking comment with all steps marked as `:hourglass_flowing_sand: Pending`.

This gives reviewers immediate visibility that a run is in progress and what to expect.

```powershell
$runStart = Get-Date
# Inside an expandable here-string a single backtick is PowerShell's escape
# character, so ``double backticks`` are needed to emit a Markdown code span.
$body = @"
## E2E Progress - PR $env:PR_NUMBER

| Step | Status | Started | Duration |
|---|---|---|---|
| 1. Fast-fail checks (build · link · ``squad version``) | :hourglass_flowing_sand: Pending | --:-- | -- |
| 2. Create test repo(s) | :hourglass_flowing_sand: Pending | --:-- | -- |
| 3. ``squad init`` + file verification | :hourglass_flowing_sand: Pending | --:-- | -- |
| 4. Run sessions | :hourglass_flowing_sand: Pending | --:-- | -- |
| 5. Verify outcomes | :hourglass_flowing_sand: Pending | --:-- | -- |
| 6. Record verdicts + post final comment | :hourglass_flowing_sand: Pending | --:-- | -- |

| Symbol | Meaning |
|---|---|
| :hourglass_flowing_sand: | Not started |
| :arrows_counterclockwise: | Running |
| :white_check_mark: | Passed |
| :x: | Failed |
| :warning: | Passed with caveats |

*Run started: $($runStart.ToString('HH:mm')) — all steps pending*
"@

$tmpFile = [System.IO.Path]::GetTempFileName()
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText($tmpFile, $body, $utf8NoBom)
$response = gh api "repos/$env:REPO/issues/$env:PR_NUMBER/comments" --method POST --field "body=@$tmpFile" | ConvertFrom-Json
$postExitCode = $LASTEXITCODE
Remove-Item $tmpFile -Force
# Abort instead of continuing without a tracking comment (see the STOP note below).
if ($postExitCode -ne 0 -or -not $response.id) { throw "Posting the E2E tracking comment failed; aborting the run." }
$env:COMMENT_ID = $response.id
Write-Host "Progress comment posted — ID: $($response.id)"
```

After posting, immediately update the comment to mark Step 1 as `:arrows_counterclockwise: Running` (do NOT wait — this is a two-step sequence: post all-pending, then immediately update to Step 1 running). Then proceed to Step 1.

See **Progress Reporting** for the full comment lifecycle and update patterns.

**⚠️ STOP before continuing:** If posting the comment fails (network error, auth error), abort the run and report the failure. Do not proceed silently without a tracking comment.

### Step 1 — Build the CLI from your branch

```bash
cd /path/to/squad   # your feature branch
npm install
npm run build -w packages/squad-sdk && npm run build -w packages/squad-cli

# Link so `squad` command uses your local build (workspace flag — no cd required)
npm link -w packages/squad-cli
```

Verify: `squad version` output includes the `-preview` suffix (e.g., `x.y.z-preview`),
confirming the local dev build is active. If the output shows a plain semver without
`-preview`, the globally-installed npm package is still in use — re-check the link step.
See [CONTRIBUTING.md — Making the `squad` Command Use Your Local Build](../../../CONTRIBUTING.md)
for the full guidance on local dev versioning.
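
To make this check scriptable, a small bash helper can test the version string. This is a sketch: `is_preview_build` is a hypothetical name, and it relies only on the `-preview` suffix convention described above.

```bash
# Hypothetical helper: succeed only when the given `squad version` output
# contains the "-preview" suffix that marks a local dev build.
is_preview_build() {
  local version_output="$1"
  case "$version_output" in
    *-preview*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, right after the link step:
# if ! is_preview_build "$(squad version 2>&1)"; then
#   echo "Global squad package still active; re-check the link step" >&2
# fi
```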

### Step 2 — Create a disposable test repo

```bash
mkdir /tmp/sq-test-1 && cd /tmp/sq-test-1
git init
echo "# Test Project" > README.md
echo '{"name":"test-project","version":"1.0.0"}' > package.json
mkdir src
echo "export function hello() { return 'world' }" > src/index.ts
git add -A && git commit -m "init: test project"
```

Keep the project small — you only need enough for the coordinator to recognize a
codebase and hire a team.
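
When you need several scenarios, wrapping the same scaffold in a function keeps each repo identical and fresh. A sketch only: `create_test_repo` is a hypothetical name, and the throwaway commit identity is an assumption so the commit succeeds on machines without a global git user.

```bash
# Hypothetical helper: scaffold the minimal project above in a new directory.
# Refuses to reuse an existing path, since state from earlier tests leaks.
create_test_repo() {
  local dir="$1"
  if [ -e "$dir" ]; then
    echo "refusing to reuse existing path: $dir" >&2
    return 1
  fi
  mkdir -p "$dir/src" || return 1
  (
    cd "$dir" || exit 1
    git init --quiet
    echo "# Test Project" > README.md
    echo '{"name":"test-project","version":"1.0.0"}' > package.json
    echo "export function hello() { return 'world' }" > src/index.ts
    git add -A
    # Assumed throwaway identity, used only for this one commit.
    git -c user.name="squad-e2e" -c user.email="squad-e2e@example.invalid" \
      commit --quiet -m "init: test project"
  )
}

# Usage: create_test_repo /tmp/sq-test-notes-crossbranch && cd /tmp/sq-test-notes-crossbranch
```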

### Step 3 — Init a squad with your modified templates

```bash
squad init
# If testing a specific feature (e.g. state backends):
# squad init --state-backend git-notes
```

Verify the init produced the expected files:

```bash
ls -la .squad/
cat .squad/team.md       # should have ## Members with 3+ agents
cat .squad/config.json   # should reflect any CLI flags you passed
```
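
The existence part of this check can fail fast before you spend time on a session. This sketch (`verify_init_files` is a hypothetical name) checks only that both files exist and that `team.md` has a `## Members` heading; counting agents depends on the layout of `team.md`, which is not specified here, so that stays a manual check.

```bash
# Hypothetical helper: fail if `squad init` did not produce the files this
# skill inspects. Does not count agents under "## Members".
verify_init_files() {
  local root="${1:-.}" missing=0 f
  for f in .squad/team.md .squad/config.json; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  if ! grep -q '^## Members' "$root/.squad/team.md"; then
    echo "team.md has no '## Members' heading" >&2
    return 1
  fi
}
```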

### Step 4 — Run a real session and capture output

Use the Copilot CLI's `-p` flag with `--allow-all-tools` for non-interactive sessions.
`--allow-all-tools` is **required** for automated/non-interactive runs — without it,
tool calls (including file writes) need a confirmation that `-p` mode cannot display,
so they are denied (see Sandbox / Permission Notes).

```powershell
# PowerShell (Windows)
New-Item -ItemType Directory -Force evidence | Out-Null   # Tee-Object does not create missing folders
copilot --agent squad --allow-all-tools -p "Picard, decide what testing framework to use. Write your decision." `
  2>&1 | Tee-Object evidence/session-task.log
```

```bash
# Bash (macOS/Linux)
mkdir -p evidence   # tee does not create missing directories
copilot --agent squad --allow-all-tools -p "Picard, decide what testing framework to use. Write your decision." \
  2>&1 | tee evidence/session-task.log
```

Alternatively, set the `COPILOT_ALLOW_ALL=1` environment variable instead of the flag.

For multi-turn workflows, run sequential sessions:

```powershell
# Session A: give the team a task
copilot --agent squad --allow-all-tools -p "prompt A" 2>&1 | Tee-Object evidence/session-A.log

# Session B: verify state persisted
copilot --agent squad --allow-all-tools -p "What decisions has the team made?" 2>&1 | Tee-Object evidence/session-B.log
```
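
On macOS/Linux the same sequential sessions can go through one wrapper so the flag and the log location never drift between runs. A sketch: `run_squad_session` is a hypothetical name, and `COPILOT_BIN` is an assumed override (default `copilot`) that lets you dry-run the wrapper with a stub.

```bash
# Hypothetical helper: run one non-interactive squad session and keep its
# full output in evidence/session-<name>.log.
run_squad_session() {
  local name="$1" prompt="$2"
  mkdir -p evidence
  "${COPILOT_BIN:-copilot}" --agent squad --allow-all-tools -p "$prompt" \
    2>&1 | tee "evidence/session-$name.log"
  # Return copilot's exit status rather than tee's.
  return "${PIPESTATUS[0]}"
}

# Usage, mirroring the two sessions above:
# run_squad_session A "prompt A"
# run_squad_session B "What decisions has the team made?"
```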

### Step 5 — Verify the outcome

Check that your template change had the expected effect. Common checks:

```bash
# State location (for state-backend changes)
git notes --ref=squad list          # git-notes backend
git ls-tree -r squad-state          # orphan backend
ls .squad/agents/*/history.md       # worktree backend

# Coordinator behavior (grep session log)
grep "STATE_BACKEND" evidence/session-task.log
grep "spawn" evidence/session-task.log

# File tree diff
git diff --stat HEAD~1                               # what changed on working branch
git log --all --oneline | tee evidence/git-log.txt   # commits across all branches, saved as evidence
```
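
Because agents can claim writes they never made, the state-location check is worth turning into a pass/fail test that looks only at git and the file tree. This sketch reuses the commands above; `assert_state_backend` is a hypothetical name, and the `two-layer` backend from the verdict template is deliberately not handled because its expected layout isn't described here.

```bash
# Hypothetical helper: succeed only if state landed where the given backend
# should put it. Checks the repo directly, never the session log.
assert_state_backend() {
  local backend="$1"
  case "$backend" in
    git-notes)
      [ -n "$(git notes --ref=squad list 2>/dev/null)" ] ;;
    orphan)
      git rev-parse --verify --quiet squad-state > /dev/null ;;
    worktree)
      compgen -G '.squad/agents/*/history.md' > /dev/null ;;
    *)
      echo "unsupported backend: $backend" >&2
      return 2 ;;
  esac
}
```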

### Step 6 — Record the verdict

Create an `evidence/verdict.md` in each test repo:

```markdown
## Test: [scenario name]
**Backend:** worktree | git-notes | orphan | two-layer
**Branch:** [your feature branch]
**Result:** PASS | PARTIAL | FAIL
**Duration:** Xm Ys

### What was verified
- [ ] Coordinator identified feature correctly (from session log)
- [ ] Agent was spawned via `task` tool (not simulated)
- [ ] team.md has ## Members with 3+ agents
- [ ] State landed in correct location
- [ ] No unexpected side effects

### Evidence files
- session-task.log — full session output
- git-log.txt — `git log --all --oneline`

### Notes
[anything unusual or noteworthy]
```

Record the wall-clock time from the start of Step 1 (fast-fail checks) to the end
of Step 6 (verdict posted). This is the full E2E run duration for this scenario.
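
In bash, the `Xm Ys` duration can come from two epoch timestamps. A minimal sketch; `format_duration` is a hypothetical name. Integer division truncates, so 105 seconds becomes `1m 45s` and is never rounded up to 2 minutes.

```bash
# Hypothetical helper: format elapsed seconds as "Xm Ys".
format_duration() {
  local total="$1"
  printf '%dm %ds\n' "$(( total / 60 ))" "$(( total % 60 ))"
}

# Usage: take epoch seconds at the start of Step 1 and again at the end of Step 6.
# run_start=$(date +%s)
# ...
# format_duration "$(( $(date +%s) - run_start ))"
```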

## Progress Reporting

Use this section only when you are running E2E validation for an open PR. If
`PR_NUMBER` and `REPO` are both set, post and maintain a live tracking comment
in the PR thread. If either value is missing (for example, a local-only run),
skip progress reporting silently.

### Start the tracking comment (Step 0 — see Workflow above)

The initial comment must be posted as **Step 0** — the absolute first action before
anything else. See the Step 0 block in the Workflow section for the PowerShell code.

The subsections below describe how to **update** the comment at each step boundary.
On macOS/Linux, the bash equivalent of Step 0 is:

1. Post the all-pending comment and capture its ID from the same API response:

   ```bash
   tmpFile=$(mktemp)
   cat > "$tmpFile" <<'EOF'
   ## E2E Progress

   | Step | Status | Started | Duration |
   |---|---|---|---|
   | 1. Fast-fail checks (build · link · `squad version`) | ⏳ Pending | --:-- | -- |
   | 2. Create test repo(s) | ⏳ Pending | --:-- | -- |
   | 3. `squad init` + file verification | ⏳ Pending | --:-- | -- |
   | 4. Run sessions | ⏳ Pending | --:-- | -- |
   | 5. Verify outcomes | ⏳ Pending | --:-- | -- |
   | 6. Record verdicts + post final comment | ⏳ Pending | --:-- | -- |

   | Symbol | Meaning |
   |---|---|
   | ⏳ | Not started |
   | 🔄 | Running |
   | ✅ | Passed |
   | ❌ | Failed |
   | ⚠️ | Passed with caveats |
   EOF
   COMMENT_ID=$(gh api --method POST "repos/$REPO/issues/$PR_NUMBER/comments" \
     --field "body=@$tmpFile" --jq '.id')
   rm -f "$tmpFile"
   [ -n "$COMMENT_ID" ] || echo "Posting the tracking comment failed; abort the run" >&2
   ```

   A quoted heredoc keeps the backticks and line breaks literal, which a
   double-quoted `--body "...\n..."` string does not. Reading the ID from the POST
   response is also safer than listing comments afterwards with `--jq '.[-1].id'`:
   that call returns only the first page (30 comments by default), so on a busy
   PR it picks up the wrong comment.

2. Treat Step 1 as in progress as soon as the comment exists. Update the body so
   Step 1 shows `🔄 Running` and every later step remains `⏳ Pending`.

### Update the tracking comment after every step boundary

1. When marking a step `🔄 Running`, record `$startTime = Get-Date` and store the
   `HH:mm` start time in that row's **Started** column.
2. Edit the existing comment in place; do not post a new progress comment:

   ```bash
   gh api --method PATCH "repos/$REPO/issues/comments/$COMMENT_ID" --field body="..."
   ```

3. When marking a step `✅`, `❌`, or `⚠️`, compute
   `$duration = (Get-Date) - $startTime` and format it as
   `"{0}m {1}s" -f [math]::Floor($duration.TotalMinutes), $duration.Seconds`.
   Use `[math]::Floor` rather than an `[int]` cast: the cast rounds, so 1 minute
   45 seconds would be reported as `2m 45s`.
4. Update the completed step row to `✅`, `❌`, or `⚠️`, keep its original
   `HH:mm` value in **Started**, and write the formatted duration in **Duration**.
5. Keep all previously completed rows unchanged.
6. Mark the next step as `🔄 Running` and set its **Started** value.
7. Leave later steps as `⏳ Pending` with `--:--` for **Started** and `--` for
   **Duration**.
8. If a step fails and you stop early, still update the comment so the failed step
   shows `❌` with its original start time and computed duration, and Step 6
   becomes `🔄 Running` while you prepare the final verdict.
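
Each update above comes down to "rebuild the whole body, then PATCH it". Here is a bash sketch of the PATCH half, assuming `REPO` and `COMMENT_ID` are already set as in Step 0. `patch_progress_comment` is a hypothetical name; it uses the same `--field "body=@file"` pattern as the PowerShell examples.

```bash
# Hypothetical helper: replace the tracking comment's body in place.
# Assumes REPO and COMMENT_ID are set (see Step 0).
patch_progress_comment() {
  local body="$1" tmp rc
  tmp="$(mktemp)" || return 1
  printf '%s\n' "$body" > "$tmp"
  # "@file" makes gh read the field value from the file, so the Markdown
  # table is sent verbatim with no shell-quoting problems.
  gh api --method PATCH "repos/$REPO/issues/comments/$COMMENT_ID" \
    --field "body=@$tmp" > /dev/null
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```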

### Use this status legend in the comment

| Symbol | Meaning |
|---|---|
| ⏳ | Not started |
| 🔄 | Running |
| ✅ | Passed |
| ❌ | Failed |
| ⚠️ | Passed with caveats |

The PowerShell examples write the same symbols as GitHub emoji shortcodes
(`:hourglass_flowing_sand:`, `:arrows_counterclockwise:`, `:white_check_mark:`,
`:x:`, `:warning:`); GitHub renders both forms as the same emoji.

### Use exact step names and order

Keep these six rows in this exact order every time you update the comment:

1. Fast-fail checks (build · link · `squad version`)
2. Create test repo(s)
3. `squad init` + file verification
4. Run sessions
5. Verify outcomes
6. Record verdicts + post final comment
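
Hard-coding the six names in one place is an easy way to keep them exact and in order across every update. A bash sketch; `render_progress_table` is a hypothetical name, and each argument is an assumed `Status|Started|Duration` triple for one row.

```bash
# Hypothetical helper: print the progress table with the six step names in
# the required order. Pass exactly six "Status|Started|Duration" triples.
render_progress_table() {
  local steps=(
    '1. Fast-fail checks (build · link · `squad version`)'
    '2. Create test repo(s)'
    '3. `squad init` + file verification'
    '4. Run sessions'
    '5. Verify outcomes'
    '6. Record verdicts + post final comment'
  )
  if [ "$#" -ne "${#steps[@]}" ]; then
    echo "expected ${#steps[@]} rows, got $#" >&2
    return 1
  fi
  local i status started duration
  printf '| Step | Status | Started | Duration |\n|---|---|---|---|\n'
  for i in "${!steps[@]}"; do
    # Split this row's triple on "|" (the last field keeps any remainder).
    IFS='|' read -r status started duration <<< "${@:i+1:1}"
    printf '| %s | %s | %s | %s |\n' "${steps[i]}" "$status" "$started" "$duration"
  done
}

# Usage:
# render_progress_table '✅ Passed|10:02|1m 45s' '🔄 Running|10:04|--' \
#   '⏳ Pending|--:--|--' '⏳ Pending|--:--|--' '⏳ Pending|--:--|--' '⏳ Pending|--:--|--'
```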

### Handle Windows comment bodies safely

On Windows PowerShell 5.1, use the **`--field body=@file`** pattern to post comment
bodies. Write the content to a temp file using UTF-8 **without BOM**, then pass
`--field "body=@$tmpFile"` to `gh api`. This is more reliable than piping JSON
through `--input -` on PS 5.1, which can silently corrupt multi-byte characters
even with `[Console]::OutputEncoding = UTF8`.

Key rules:

- Use `New-Object System.Text.UTF8Encoding $false` (the `$false` disables the BOM).
  `[System.Text.Encoding]::UTF8` writes a BOM (U+FEFF), which GitHub renders as a
  stray character at the start of the comment.
- Use `--field "body=@$tmpFile"`, NOT `--input -` or `--input filename`, for
  comment body updates. The `@` prefix tells `gh` to read the field value from
  the file rather than treating the path as a literal string.
- Clean up the temp file after posting.
- Scrub any local absolute paths from the body before posting (see PII Protection
  section).
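
If a body file was produced on Windows and you are inspecting it from Git Bash or WSL, you can confirm the BOM is gone before sending it. A sketch; `has_utf8_bom` is a hypothetical name that tests for the three UTF-8 BOM bytes `EF BB BF`.

```bash
# Hypothetical helper: succeed if a file starts with the UTF-8 BOM (EF BB BF).
has_utf8_bom() {
  local file="$1"
  [ "$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Usage: has_utf8_bom "$tmpFile" && echo "BOM found; rewrite the file without it" >&2
```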

```powershell
$step1StartTime = Get-Date
$step1Started = $step1StartTime.ToString('HH:mm')
# ... Step 1 work runs here ...
$step1Duration = (Get-Date) - $step1StartTime
# [math]::Floor truncates; an [int] cast would round 1m 45s up to "2m 45s".
$step1DurationText = "{0}m {1}s" -f [math]::Floor($step1Duration.TotalMinutes), $step1Duration.Seconds
$step2StartTime = Get-Date
$step2Started = $step2StartTime.ToString('HH:mm')
# A single backtick is an escape character in this here-string; ``double`` it
# to get a literal Markdown code span.
$body = @"
## E2E Progress

| Step | Status | Started | Duration |
|---|---|---|---|
| 1. Fast-fail checks (build · link · ``squad version``) | :white_check_mark: Passed | $step1Started | $step1DurationText |
| 2. Create test repo(s) | :arrows_counterclockwise: Running | $step2Started | -- |
| 3. ``squad init`` + file verification | :hourglass_flowing_sand: Pending | --:-- | -- |
| 4. Run sessions | :hourglass_flowing_sand: Pending | --:-- | -- |
| 5. Verify outcomes | :hourglass_flowing_sand: Pending | --:-- | -- |
| 6. Record verdicts + post final comment | :hourglass_flowing_sand: Pending | --:-- | -- |

| Symbol | Meaning |
|---|---|
| :hourglass_flowing_sand: | Not started |
| :arrows_counterclockwise: | Running |
| :white_check_mark: | Passed |
| :x: | Failed |
| :warning: | Passed with caveats |
"@

$tmpFile = "$env:TEMP\e2e-comment-body.md"
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText($tmpFile, $body, $utf8NoBom)
gh api --method PATCH "repos/$env:REPO/issues/comments/$env:COMMENT_ID" --field "body=@$tmpFile"
Remove-Item $tmpFile -Force
```

### Progressive Verdicting — Post After Each Scenario (Critical)

**Do NOT batch all scenario results to the end.** This is the most common cause
of lost verdicts. After each scenario completes, **immediately** PATCH the
tracking comment with that scenario's result before moving to the next one.

The pattern for each scenario:

```powershell
# After scenario N completes — PATCH immediately, before starting scenario N+1
# Read the clock once so minutes and seconds describe the same instant.
$scenarioNElapsed = (Get-Date) - $scenarioNStartTime
$scenarioNDuration = "{0}m {1}s" -f [math]::Floor($scenarioNElapsed.TotalMinutes), $scenarioNElapsed.Seconds
# ...rebuild the full comment body with this scenario updated to PASS/FAIL/PARTIAL...
$tmpFile = [System.IO.Path]::GetTempFileName()
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText($tmpFile, $body, $utf8NoBom)
gh api --method PATCH "repos/$env:REPO/issues/comments/$env:COMMENT_ID" --field "body=@$tmpFile"
Remove-Item $tmpFile -Force
Write-Host "Scenario N verdict posted"
```

This guarantees that even if the AI model connection drops mid-run, the last
successfully PATCHed state is always visible in the PR.
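
The same ordering in bash, as a loop that reports each verdict before the next scenario starts. This is only a sketch. `run_scenarios_progressively` is a hypothetical name, and the runner and reporter are caller-supplied command names, not squad CLI commands: the runner exits 0 on pass, and the reporter would rebuild and PATCH the comment. Exit status maps only to PASS or FAIL, so reporting PARTIAL would need a richer signal.

```bash
# Hypothetical sketch: run scenarios one at a time, reporting each verdict
# immediately. Usage: run_scenarios_progressively RUNNER REPORTER scenario...
#   RUNNER <scenario>                    exits 0 on pass
#   REPORTER <scenario> <verdict> <secs> e.g. rebuilds and PATCHes the comment
run_scenarios_progressively() {
  local runner="$1" reporter="$2" name start verdict
  shift 2
  for name in "$@"; do
    start=$(date +%s)
    if "$runner" "$name"; then verdict=PASS; else verdict=FAIL; fi
    # Report now, so a connection drop later cannot lose this verdict.
    "$reporter" "$name" "$verdict" "$(( $(date +%s) - start ))" || return 1
  done
}
```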

### Agent Run Time Budget

**⚠️ Critical: Background agents lose their AI model connection after ~15 minutes
of continuous execution.** This is a platform limit, not a bug in your code.
The verdict stage appears to "hang" because the connection drops right at the end
when the agent has been running too long.

**Per-agent scenario budget:**

| Scenario type | Estimated time | Budget |
|---|---|---|
| Static checks only (file existence, grep, size) | 1-3 min | 4 per agent |
| `squad init` + file verification (no copilot session) | 3-5 min | 3 per agent |
| `squad init` + one `copilot --agent squad` session | 8-15 min | **1 per agent** |
| Build + link + one copilot session | 12-20 min | **1 per agent** |

**Rule: Limit yourself to 1 scenario that includes a `copilot --agent squad` session
per agent run.** For a plan with multiple copilot-session scenarios, run them in
separate agents — not in sequence within a single agent.

If your scenario plan has N copilot-session scenarios, request N separate Sims
agents to run them in parallel (one scenario each). Static scenarios may be
batched up to 4 per agent.
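
Planning how many agents to request is simple arithmetic on the budget table. The sketch below is conservative: each copilot-session scenario gets its own agent, init-only scenarios are packed 3 per agent, and static ones 4 per agent. It assumes no init or static work is folded into the copilot-session agents. `agents_needed` is a hypothetical name.

```bash
# Hypothetical helper: conservative agent count for a scenario plan.
# Usage: agents_needed <copilot-session> <init-only> <static>
agents_needed() {
  local copilot="$1" init_only="$2" static="$3"
  # (n + k - 1) / k is integer ceiling division by k.
  echo $(( copilot + (init_only + 2) / 3 + (static + 3) / 4 ))
}
```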

**If you are running a scenario with a `copilot --agent squad` session:**

- Run the build and link ONCE at the start (shared across all static scenarios)
- Run the copilot session immediately after the repo is set up
- PATCH the comment with the result immediately after the session ends
- Then proceed to static scenarios while you still have connection budget

### Replace the tracking comment with the final verdict

When you reach Step 6, replace the tracking comment body entirely with the final
structured verdict table. Do not post a separate final comment. The tracking
comment is the final verdict comment.

Include a summary row at the bottom of the final table showing the total elapsed
time for the full run:

```text
| **Total** | — | HH:MM | Xm Ys |
```

**If the connection drops before Step 6:** The last progressive verdict PATCH
already shows the partial state. The next agent run should read the existing
comment, pick up where it left off, and add remaining scenario rows rather than
starting fresh.

## Test Matrix Template

Use this matrix when planning validation for a template change. Not every change
needs every row — pick the scenarios relevant to your modification.

| # | Scenario | What to verify | Duration |
|---|----------|----------------|----------|
| 1 | Basic init + task | Templates applied, agent spawned, work produced | — |
| 2 | Cross-branch persistence | State survives `git checkout` (if state-backend) | — |
| 3 | Scribe behavior | Scribe commits to correct target | — |
| 4 | PR cleanliness | Feature branch PR has no leaked state files | — |
| 5 | Migration path | Existing squad picks up new template behavior | — |
| 6 | Edge case: empty repo | Init works in repo with single commit | — |
| 7 | Edge case: monorepo | Init works in subdirectory of monorepo | — |

Note: Keep `—` during planning, then replace it with the actual elapsed time when
recording the verdict for each scenario.

## Tips

- **Name test repos descriptively:** `sq-test-notes-crossbranch`, not `test1`.
- **Always capture session logs.** Without logs, you can't debug failures.
- **One scenario per repo.** Don't reuse repos across unrelated tests — state
  leaks between tests make results unreliable.
- **Clean up after.** Delete test repos when done. They accumulate fast.
- **Windows users:** Use PowerShell. `Tee-Object` replaces `tee`. Paths use `\`.

## Fast-Fail Rules

These checks must pass before running any scenario. If any fail, stop
immediately and report the failure — do **not** attempt workarounds or mark
scenarios as SKIPPED.

0. **Clean stale SDK before building.** Before running `npm run build`, remove any
   stale published copy of `@bradygaster/squad-sdk` that may have been installed
   into `packages/squad-cli/node_modules/`. This local copy shadows the workspace
   symlink in the root `node_modules/` and causes TypeScript to see the published
   version instead of the local source. Run from the repo root:

   ```powershell
   $stale = "packages\squad-cli\node_modules\@bradygaster\squad-sdk"
   if (Test-Path $stale) { Remove-Item -Recurse -Force $stale; Write-Host "Cleaned stale SDK" }
   ```

   This is safe to run unconditionally — if the path doesn't exist, the command is
   a no-op. The root `node_modules\@bradygaster\squad-sdk` workspace symlink remains
   intact and npm will use it automatically.
1. **Build must succeed.** Run `npm run build` from the repo root. A build
   failure blocks all scenarios; report `BUILD_FAILED` and stop.
   - If the error is `tsc: not found` or similar missing-binary errors, run
     `npm install` first to reconcile `node_modules` with the lock file, then
     retry. This can happen after `git checkout HEAD -- package-lock.json`
     restores the lock file without reinstalling.
2. **CLI must link successfully.** `npm link -w packages/squad-cli` (as in Step 1),
   or equivalently `cd packages/squad-cli && npm link`, must exit 0. If it fails,
   report `LINK_FAILED` and stop.
3. **`squad version` must run.** After linking, `squad version` must output a
   version string. If not, report `CLI_NOT_FOUND` and stop.

Do **not** mark scenarios as SKIPPED due to build or environment errors —
that obscures real failures from reviewers. SKIPPED is only acceptable when the
user explicitly requests it.
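
On macOS/Linux, rules 0 to 3 can run as one sequence that stops at the first failure and prints the matching code. A sketch, run from the repo root. `fast_fail_checks` is a hypothetical name, and the version check assumes the output contains an `x.y.z` number. The `npm install` retry for `tsc: not found` is left as a manual step.

```bash
# Hypothetical sketch of the fast-fail sequence. Prints BUILD_FAILED,
# LINK_FAILED or CLI_NOT_FOUND and returns 1 at the first failing rule.
fast_fail_checks() {
  # Rule 0: remove a stale published SDK that would shadow the workspace copy.
  local stale="packages/squad-cli/node_modules/@bradygaster/squad-sdk"
  if [ -e "$stale" ]; then
    rm -rf "$stale" && echo "Cleaned stale SDK"
  fi
  # Rule 1: the build must succeed.
  if ! npm run build; then
    echo "BUILD_FAILED"; return 1
  fi
  # Rule 2: the CLI must link.
  if ! npm link -w packages/squad-cli; then
    echo "LINK_FAILED"; return 1
  fi
  # Rule 3: `squad version` must print a version string.
  local version
  if ! version="$(squad version 2>&1)" \
     || ! printf '%s\n' "$version" | grep -Eq '[0-9]+\.[0-9]+\.[0-9]+'; then
    echo "CLI_NOT_FOUND"; return 1
  fi
  echo "Fast-fail checks passed: $version"
}
```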

## PII Protection — Mandatory

When posting evidence to PR comments, issues, or any shared document:

- **Never include absolute paths** that contain a local username (e.g.,
  `C:\Users\username\...` or `/home/username/...`).
- **Use `~` notation** for home-relative paths: `~/AppData/Local/Temp/...`
  or `~/tmp/sq-test-1`.
- **Repo-internal paths use `<repo-root>` as the prefix.** If evidence files live
  inside the repository (e.g. `.e2e/`, `tmp/`, or any subdirectory of the repo),
  write them as `<repo-root>\.e2e\pr-1035\evidence` — not as `~\..\..\...` backward
  navigation. `<repo-root>` is a clear, portable placeholder for the repository root
  that does not expose the machine's directory layout.
- **Scrub before posting.** Replace any occurrence of the local machine path
  prefix (everything up to and including the username segment) with `~`.

Examples:

- ❌ wrong: `C:\Users\johndoe\AppData\Local\Temp\sq-e2e-pr1035\evidence`
- ✅ right: `~/AppData/Local/Temp/sq-e2e-pr1035/evidence`
- ❌ wrong: `~\..\..\repos\squad\.e2e\pr-1035\evidence`
- ✅ right: `<repo-root>\.e2e\pr-1035\evidence`

This applies to all evidence tables, verdict files, and PR comments.
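
On macOS or Linux the scrub can be automated with bash string replacement. This is a sketch. `scrub_paths` is a hypothetical helper, and it assumes paths in the text are spelled exactly like `$HOME` and the repo root on the machine that produced them. Windows-style `C:\Users\...` paths are not rewritten, and path separators are left as they are.

```bash
# Hypothetical helper: read text on stdin, replace the repo root with
# <repo-root> and the home directory prefix with ~.
# Usage: scrub_paths <repo-root-path> [home-dir, default $HOME]
scrub_paths() {
  local repo_root="$1" home="${2:-$HOME}" text
  local repo_placeholder='<repo-root>' tilde='~'
  text="$(cat)"
  # Repo root first: it usually sits under $HOME and needs its own placeholder.
  # Quoting the pattern makes it a literal match, not a glob.
  [ -n "$repo_root" ] && text=${text//"$repo_root"/$repo_placeholder}
  [ -n "$home" ] && text=${text//"$home"/$tilde}
  printf '%s\n' "$text"
}

# Usage: printf '%s\n' "$body" | scrub_paths "$(git rev-parse --show-toplevel)"
```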

## Anti-Patterns

- **Skipping the local build.** If you test with the published CLI, you're
  testing the old templates, not your changes.
- **Posting absolute paths in PR comments.** Always scrub to `~`-relative paths
  before sharing. See PII Protection above.
- **Marking scenarios SKIPPED due to environment issues.** Fix the environment
  (use fast-fail rules above) or report BUILD_FAILED — never silently skip.
- **Testing only the happy path.** Template changes often break edge cases (empty
  repos, monorepos, cross-branch). Test at least 2-3 scenarios.
- **Trusting session output alone.** Always verify git state independently —
  agents can claim they wrote something without actually doing it.
- **Reusing test repos.** Prior state bleeds into later tests. Start fresh.
- **Batching all scenario verdicts to the end.** The AI model connection drops
  after ~15 minutes. Always PATCH the comment after each scenario so partial
  results are never lost. See Progressive Verdicting above.
- **Running multiple `copilot --agent squad` sessions in one agent.** Each
  session takes 5-15 minutes; combined with build time, you'll hit the ~15-minute
  connection budget. One copilot session per agent — split into parallel agents
  if your plan has more.

## Sandbox / Permission Notes

### Always pass `--allow-all-tools` in non-interactive mode

The Copilot CLI requires explicit permission to run tools automatically. In
interactive mode the user approves each tool call; in non-interactive mode (`-p`)
those prompts cannot be displayed and writes fail silently or with a "Permission
denied and could not request permission from user" error.

Fix: always include `--allow-all-tools` (or `--yolo` / `--allow-all`) in Step 4
commands, or export `COPILOT_ALLOW_ALL=1` before running E2E sessions.

This also applies when `copilot --agent squad` is launched as a subprocess from
inside a Copilot CLI background agent (e.g. Sims running via the `task` tool) —
the flag is still needed.

### `--allow-all-paths` for repos outside the CWD

By default the CLI restricts file access to the current directory tree. If the
coordinator needs to read files from a parent repo while running in a disposable
test repo, add `--allow-all-paths`:

```powershell
copilot --agent squad --allow-all-tools --allow-all-paths -p "..."
```

Or use the combined shorthand: `--allow-all` / `--yolo`.

## Confidence

high — Validated through 12 real E2E test sessions during state-backend
development (PR #1004). `--allow-all-tools` requirement confirmed in PR #1035.