maestro-flow 0.3.37 → 0.3.39

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (186)
  1. package/.claude/agents/workflow-analyzer.md +2 -0
  2. package/.claude/agents/workflow-debugger.md +2 -0
  3. package/.claude/agents/workflow-executor.md +2 -0
  4. package/.claude/agents/workflow-integration-checker.md +1 -0
  5. package/.claude/agents/workflow-nyquist-auditor.md +1 -0
  6. package/.claude/agents/workflow-planner.md +2 -0
  7. package/.claude/agents/workflow-reviewer.md +2 -0
  8. package/.claude/agents/workflow-verifier.md +2 -0
  9. package/.claude/commands/learn-decompose.md +176 -176
  10. package/.claude/commands/learn-follow.md +167 -167
  11. package/.claude/commands/learn-investigate.md +221 -221
  12. package/.claude/commands/learn-retro.md +303 -303
  13. package/.claude/commands/learn-second-opinion.md +167 -167
  14. package/.claude/commands/maestro-amend.md +300 -300
  15. package/.claude/commands/maestro-analyze.md +126 -126
  16. package/.claude/commands/maestro-brainstorm.md +100 -100
  17. package/.claude/commands/maestro-composer.md +354 -354
  18. package/.claude/commands/maestro-execute.md +120 -114
  19. package/.claude/commands/maestro-fork.md +86 -86
  20. package/.claude/commands/maestro-init.md +78 -78
  21. package/.claude/commands/maestro-learn.md +140 -140
  22. package/.claude/commands/maestro-link-coordinate.md +1 -1
  23. package/.claude/commands/maestro-merge.md +61 -61
  24. package/.claude/commands/maestro-milestone-release.md +96 -96
  25. package/.claude/commands/maestro-overlay.md +178 -178
  26. package/.claude/commands/maestro-plan.md +150 -138
  27. package/.claude/commands/maestro-player.md +404 -404
  28. package/.claude/commands/maestro-quick.md +56 -56
  29. package/.claude/commands/maestro-ralph-execute.md +7 -18
  30. package/.claude/commands/maestro-ralph.md +9 -3
  31. package/.claude/commands/maestro-roadmap.md +1 -1
  32. package/.claude/commands/maestro-ui-design.md +93 -93
  33. package/.claude/commands/maestro-update.md +176 -176
  34. package/.claude/commands/maestro-verify.md +96 -90
  35. package/.claude/commands/maestro.md +121 -121
  36. package/.claude/commands/manage-codebase-rebuild.md +75 -75
  37. package/.claude/commands/manage-codebase-refresh.md +57 -57
  38. package/.claude/commands/manage-harvest.md +94 -94
  39. package/.claude/commands/manage-issue-discover.md +77 -77
  40. package/.claude/commands/manage-issue.md +73 -73
  41. package/.claude/commands/manage-knowhow-capture.md +193 -193
  42. package/.claude/commands/manage-knowhow.md +77 -77
  43. package/.claude/commands/manage-learn.md +67 -67
  44. package/.claude/commands/manage-status.md +51 -51
  45. package/.claude/commands/manage-wiki.md +62 -62
  46. package/.claude/commands/quality-auto-test.md +1 -1
  47. package/.claude/commands/quality-debug.md +121 -115
  48. package/.claude/commands/quality-refactor.md +55 -55
  49. package/.claude/commands/quality-retrospective.md +78 -78
  50. package/.claude/commands/quality-review.md +114 -108
  51. package/.claude/commands/quality-sync.md +51 -51
  52. package/.claude/commands/quality-test.md +103 -103
  53. package/.claude/commands/spec-add.md +49 -49
  54. package/.claude/commands/spec-load.md +51 -51
  55. package/.claude/commands/spec-remove.md +51 -51
  56. package/.claude/commands/spec-setup.md +51 -51
  57. package/.claude/commands/wiki-connect.md +62 -62
  58. package/.claude/commands/wiki-digest.md +69 -69
  59. package/.codex/skills/learn-decompose/SKILL.md +113 -113
  60. package/.codex/skills/learn-follow/SKILL.md +1 -1
  61. package/.codex/skills/learn-investigate/SKILL.md +83 -83
  62. package/.codex/skills/learn-retro/SKILL.md +83 -83
  63. package/.codex/skills/learn-second-opinion/SKILL.md +86 -86
  64. package/.codex/skills/maestro/SKILL.md +304 -304
  65. package/.codex/skills/maestro-analyze/SKILL.md +9 -8
  66. package/.codex/skills/maestro-brainstorm/SKILL.md +442 -397
  67. package/.codex/skills/maestro-composer/SKILL.md +213 -213
  68. package/.codex/skills/maestro-execute/SKILL.md +346 -318
  69. package/.codex/skills/maestro-fork/SKILL.md +56 -2
  70. package/.codex/skills/maestro-init/SKILL.md +40 -16
  71. package/.codex/skills/maestro-learn/SKILL.md +80 -80
  72. package/.codex/skills/maestro-link-coordinate/SKILL.md +257 -257
  73. package/.codex/skills/maestro-merge/SKILL.md +1 -1
  74. package/.codex/skills/maestro-milestone-audit/SKILL.md +1 -1
  75. package/.codex/skills/maestro-milestone-complete/SKILL.md +40 -9
  76. package/.codex/skills/maestro-milestone-release/SKILL.md +70 -70
  77. package/.codex/skills/maestro-overlay/SKILL.md +1 -1
  78. package/.codex/skills/maestro-plan/SKILL.md +19 -4
  79. package/.codex/skills/maestro-player/SKILL.md +323 -323
  80. package/.codex/skills/maestro-quick/SKILL.md +1 -1
  81. package/.codex/skills/maestro-ralph/SKILL.md +681 -578
  82. package/.codex/skills/maestro-roadmap/SKILL.md +518 -468
  83. package/.codex/skills/maestro-ui-design/SKILL.md +109 -12
  84. package/.codex/skills/maestro-verify/SKILL.md +27 -9
  85. package/.codex/skills/manage-codebase-rebuild/SKILL.md +3 -2
  86. package/.codex/skills/manage-codebase-refresh/SKILL.md +1 -1
  87. package/.codex/skills/manage-harvest/SKILL.md +91 -91
  88. package/.codex/skills/manage-issue/SKILL.md +19 -6
  89. package/.codex/skills/manage-issue-discover/SKILL.md +1 -1
  90. package/.codex/skills/manage-knowhow/SKILL.md +95 -95
  91. package/.codex/skills/manage-knowhow-capture/SKILL.md +110 -110
  92. package/.codex/skills/manage-learn/SKILL.md +1 -1
  93. package/.codex/skills/manage-status/SKILL.md +1 -1
  94. package/.codex/skills/manage-wiki/SKILL.md +55 -55
  95. package/.codex/skills/quality-auto-test/SKILL.md +547 -547
  96. package/.codex/skills/quality-debug/SKILL.md +339 -334
  97. package/.codex/skills/quality-refactor/SKILL.md +1 -1
  98. package/.codex/skills/quality-retrospective/SKILL.md +292 -292
  99. package/.codex/skills/quality-review/SKILL.md +365 -364
  100. package/.codex/skills/quality-sync/SKILL.md +1 -1
  101. package/.codex/skills/quality-test/SKILL.md +498 -498
  102. package/.codex/skills/spec-add/SKILL.md +101 -101
  103. package/.codex/skills/spec-load/SKILL.md +77 -77
  104. package/.codex/skills/spec-map/SKILL.md +1 -1
  105. package/.codex/skills/spec-remove/SKILL.md +69 -69
  106. package/.codex/skills/spec-setup/SKILL.md +1 -1
  107. package/.codex/skills/team-coordinate/SKILL.md +2 -1
  108. package/.codex/skills/team-executor/SKILL.md +116 -115
  109. package/.codex/skills/team-lifecycle-v4/SKILL.md +2 -1
  110. package/.codex/skills/team-lifecycle-v4/instructions/agent-instruction.md +14 -6
  111. package/.codex/skills/team-lifecycle-v4/roles/analyst/role.md +16 -4
  112. package/.codex/skills/team-lifecycle-v4/roles/executor/commands/implement.md +7 -1
  113. package/.codex/skills/team-lifecycle-v4/roles/planner/role.md +16 -4
  114. package/.codex/skills/team-lifecycle-v4/roles/writer/role.md +8 -2
  115. package/.codex/skills/team-quality-assurance/SKILL.md +2 -1
  116. package/.codex/skills/team-quality-assurance/roles/scout/role.md +9 -2
  117. package/.codex/skills/team-review/SKILL.md +2 -1
  118. package/.codex/skills/team-review/roles/reviewer/role.md +10 -1
  119. package/.codex/skills/team-review/roles/scanner/role.md +10 -1
  120. package/.codex/skills/team-tech-debt/SKILL.md +144 -143
  121. package/.codex/skills/team-tech-debt/roles/executor/role.md +9 -5
  122. package/.codex/skills/team-tech-debt/roles/scanner/role.md +10 -0
  123. package/.codex/skills/team-tech-debt/roles/validator/role.md +8 -2
  124. package/.codex/skills/team-testing/SKILL.md +2 -1
  125. package/.codex/skills/team-testing/roles/executor/role.md +8 -2
  126. package/.codex/skills/team-testing/roles/generator/role.md +8 -2
  127. package/.codex/skills/wiki-connect/SKILL.md +73 -73
  128. package/.codex/skills/wiki-digest/SKILL.md +87 -87
  129. package/README.md +6 -0
  130. package/README.zh-CN.md +6 -0
  131. package/dashboard/dist-server/dashboard/src/server/agents/claude-code-adapter.js +4 -0
  132. package/dashboard/dist-server/dashboard/src/server/agents/claude-code-adapter.js.map +1 -1
  133. package/dashboard/dist-server/dashboard/src/server/agents/codex-cli-adapter.js +118 -7
  134. package/dashboard/dist-server/dashboard/src/server/agents/codex-cli-adapter.js.map +1 -1
  135. package/dashboard/dist-server/shared/agent-types.d.ts +2 -0
  136. package/dashboard/dist-server/src/agents/cli-agent-runner.d.ts +2 -0
  137. package/dashboard/dist-server/src/agents/cli-agent-runner.js +4 -0
  138. package/dashboard/dist-server/src/agents/cli-agent-runner.js.map +1 -1
  139. package/dashboard/dist-server/src/commands/delegate.d.ts +2 -0
  140. package/dashboard/dist-server/src/commands/delegate.js +18 -0
  141. package/dashboard/dist-server/src/commands/delegate.js.map +1 -1
  142. package/dashboard/dist-server/src/config/cli-tools-config.d.ts +6 -0
  143. package/dashboard/dist-server/src/config/cli-tools-config.js +2 -0
  144. package/dashboard/dist-server/src/config/cli-tools-config.js.map +1 -1
  145. package/dist/shared/agent-types.d.ts +2 -0
  146. package/dist/shared/agent-types.d.ts.map +1 -1
  147. package/dist/src/agents/cli-agent-runner.d.ts +2 -0
  148. package/dist/src/agents/cli-agent-runner.d.ts.map +1 -1
  149. package/dist/src/agents/cli-agent-runner.js +4 -0
  150. package/dist/src/agents/cli-agent-runner.js.map +1 -1
  151. package/dist/src/commands/config.d.ts.map +1 -1
  152. package/dist/src/commands/config.js +29 -1
  153. package/dist/src/commands/config.js.map +1 -1
  154. package/dist/src/commands/delegate.d.ts +2 -0
  155. package/dist/src/commands/delegate.d.ts.map +1 -1
  156. package/dist/src/commands/delegate.js +18 -0
  157. package/dist/src/commands/delegate.js.map +1 -1
  158. package/dist/src/commands/launcher.d.ts.map +1 -1
  159. package/dist/src/commands/launcher.js +27 -4
  160. package/dist/src/commands/launcher.js.map +1 -1
  161. package/dist/src/config/cli-tools-config.d.ts +6 -0
  162. package/dist/src/config/cli-tools-config.d.ts.map +1 -1
  163. package/dist/src/config/cli-tools-config.js +2 -0
  164. package/dist/src/config/cli-tools-config.js.map +1 -1
  165. package/dist/src/core/overlay/applier.d.ts.map +1 -1
  166. package/dist/src/core/overlay/applier.js +65 -5
  167. package/dist/src/core/overlay/applier.js.map +1 -1
  168. package/dist/src/core/overlay/loader.d.ts.map +1 -1
  169. package/dist/src/core/overlay/loader.js +9 -4
  170. package/dist/src/core/overlay/loader.js.map +1 -1
  171. package/dist/src/core/overlay/types.d.ts +2 -0
  172. package/dist/src/core/overlay/types.d.ts.map +1 -1
  173. package/dist/src/core/overlay/types.js +2 -0
  174. package/dist/src/core/overlay/types.js.map +1 -1
  175. package/dist/src/tui/tools-ui/ToolsDashboard.d.ts.map +1 -1
  176. package/dist/src/tui/tools-ui/ToolsDashboard.js +1 -1
  177. package/dist/src/tui/tools-ui/ToolsDashboard.js.map +1 -1
  178. package/dist/src/tui/tools-ui/ToolsOverview.d.ts.map +1 -1
  179. package/dist/src/tui/tools-ui/ToolsOverview.js +51 -4
  180. package/dist/src/tui/tools-ui/ToolsOverview.js.map +1 -1
  181. package/package.json +1 -1
  182. package/shared/agent-types.ts +2 -0
  183. package/workflows/delegate-protocol.codex.md +65 -0
  184. package/workflows/issue-analyze.md +2 -3
  185. package/workflows/issue-gaps-analyze.codex.md +260 -0
  186. package/workflows/issue-gaps-analyze.md +214 -0
package/.codex/skills/quality-test/SKILL.md
@@ -1,498 +1,498 @@
1
- ---
2
- name: quality-test
3
- description: Conversational UAT with session persistence, CSV-parallel debug diagnosis via spawn_agents_on_csv, severity inference, and gap-plan closure loop.
4
- argument-hint: "<phase> [-y] [--smoke] [--auto-fix] [--session ID]"
5
- allowed-tools: spawn_agents_on_csv, Read, Write, Edit, Bash, Glob, Grep, AskUserQuestion
6
- ---
7
-
8
- <purpose>
9
- Conversational UAT: present expected behavior one test at a time, user confirms or describes issues. Severity inferred from natural language (never asked). Session persists in `uat.md` across context resets. Failed tests trigger CSV-parallel diagnosis via `spawn_agents_on_csv` and optional gap-fix closure.
10
-
11
- **Philosophy**: Show expected, ask if reality matches.
12
-
13
- ```
14
- +---------------------------------------------------------------------------+
15
- | UAT CSV DIAGNOSIS PIPELINE |
16
- +---------------------------------------------------------------------------+
17
- | |
18
- | Phase 1: Setup & Scenario Design |
19
- | +-- Resolve target (phase / scratch) |
20
- | +-- Check active sessions (resume or new) |
21
- | +-- Smoke tests (if --smoke) |
22
- | +-- Load verification context + quality artifacts |
23
- | +-- Design test scenarios from user-observable outcomes |
24
- | +-- Create uat.md with all tests pending |
25
- | |
26
- | Phase 2: Interactive Testing (one at a time) |
27
- | +-- Present test: show expected behavior |
28
- | +-- User responds: pass / skip / describe issue |
29
- | +-- Severity inferred (never asked) |
30
- | +-- Issues auto-created in issues.jsonl |
31
- | +-- Batched writes to uat.md |
32
- | |
33
- | Phase 3: Diagnosis (if issues found) |
34
- | +-- Cluster gaps by component/module |
35
- | +-- Build diagnosis.csv from gap clusters |
36
- | +-- Diagnose in parallel via spawn_agents_on_csv |
37
- | +-- Each agent: find root cause, fix direction, affected files |
38
- | +-- Merge results into uat.md gaps |
39
- | |
40
- | Phase 4: Gap Closure & Report |
41
- | +-- If --auto-fix: plan --gaps -> execute -> re-verify (max 2) |
42
- | +-- Otherwise: present options (auto-fix / debug / plan / manual) |
43
- | +-- Issue lifecycle sync throughout |
44
- | +-- Report with pass/fail counts and next steps |
45
- | |
46
- +---------------------------------------------------------------------------+
47
- ```
48
- </purpose>
49
-
50
- <context>
51
- ```bash
52
- $quality-test "3" # test phase 3
53
- $quality-test "3 --smoke" # smoke tests first, then UAT
54
- $quality-test "3 --auto-fix" # auto-trigger gap-fix loop on failures
55
- $quality-test "-y 3" # implies --auto-fix, skip gap closure prompt
56
- $quality-test "--session 04-comments" # resume specific session
57
- ```
58
-
59
- **Flags**:
60
- - `<phase>`: Phase number or scratch task ID
61
- - `--smoke`: Run cold-start smoke tests before UAT
62
- - `--auto-fix`: Auto-trigger gap-fix loop (plan --gaps -> execute -> re-verify) on failures
63
- - `--session ID`: Resume a specific UAT session
64
-
65
- `-y` implies `--auto-fix`. UAT itself remains interactive (present expected → user confirms). `-y` only automates the gap closure loop.
66
-
67
- **Output**:
68
- - `{target_dir}/uat.md` — session file (persistent)
69
- - `{target_dir}/.tests/test-plan.json` — scenario definitions
70
- - `{target_dir}/.tests/test-results.json` — pass/fail results
71
- - `{target_dir}/.tests/coverage-report.json` — requirement coverage
72
- - `.tests/.csv-session/diagnosis.csv` + `diagnosis-results.csv` — diagnosis artifacts
73
- </context>
74
-
75
- <csv_schema>
76
-
77
- ### diagnosis.csv (Gap Diagnosis Phase)
78
-
79
- ```csv
80
- id,test_id,cluster,test_name,expected,reported,severity,target_files,issue_id,source_context,root_cause,fix_direction,affected_files,evidence,error
81
- "DX-001","T-003","auth","Login validation","Valid login returns dashboard","Clicking login does nothing, no error","major","src/auth/login.ts;src/routes/auth.ts","ISS-20260503-001","login.ts calls authService.verify, auth.ts exports POST /login","","","","",""
82
- "DX-002","T-005","events","Event cleanup on logout","Events unsubscribed after logout","Memory leak warning in console after logout","blocker","src/events/manager.ts","ISS-20260503-002","manager.ts has subscribe() but no unsubscribe in logout flow","","","","",""
83
- ```
84
-
85
- **Columns**:
86
-
87
- | Column | Phase | Description |
88
- |--------|-------|-------------|
89
- | `id` | Input | Diagnosis ID (DX-NNN) |
90
- | `test_id` | Input | Reference to T-NNN test |
91
- | `cluster` | Input | Gap cluster name (component/area) |
92
- | `test_name` | Input | Human-readable test name |
93
- | `expected` | Input | Expected behavior from test scenario |
94
- | `reported` | Input | User's issue description (verbatim) |
95
- | `severity` | Input | Inferred severity (blocker/major/minor/cosmetic) |
96
- | `target_files` | Input | Semicolon-separated source files to investigate |
97
- | `issue_id` | Input | Back-reference to issues.jsonl entry |
98
- | `source_context` | Input | Relevant code context (imports, exports, call chains) |
99
- | `root_cause` | Output | Diagnosed root cause |
100
- | `fix_direction` | Output | Suggested fix approach |
101
- | `affected_files` | Output | Semicolon-separated files that need changes |
102
- | `evidence` | Output | file:line references supporting diagnosis |
103
- | `error` | Output | Agent error if diagnosis failed |
104
-
105
- ### Session Structure
106
-
107
- ```
108
- {target_dir}/.tests/.csv-session/
109
- +-- diagnosis.csv (diagnosis input)
110
- +-- diagnosis-results.csv (diagnosis output)
111
- ```
112
- </csv_schema>
113
-
114
- <invariants>
115
- 1. **One test at a time** — never batch-present tests
116
- 2. **Never ask severity** — always infer from natural language
117
- 3. **Session persistence** — uat.md survives context resets, resume from any point
118
- 4. **Batched writes** — minimize file I/O (on issue, every 5 passes, completion)
119
- 5. **Gap-fix loop max 2 iterations** — prevent infinite loops
120
- 6. **CSV parallel diagnosis** — spawn_agents_on_csv for gap clusters, not sequential
121
- 7. **Auto-create issues** — every failed test creates entry in `.workflow/issues/issues.jsonl`
122
- 8. **Issue lifecycle sync** — track issues through registered → planning → executing → completed/failed
123
- </invariants>
124
-
125
- <execution>
126
-
127
- ### Step 1: Resolve Target
128
-
129
- 1. Parse `$ARGUMENTS` for phase number, scratch task ID, or flags
130
- 2. **Phase mode**: resolve `PHASE_DIR` via artifact registry in `state.json` (`type='execute'`, matching phase)
131
- 3. **Scratch mode**: set `SCRATCH_DIR = .workflow/scratch/{id}/`
132
- 4. Validate target exists and has `verification.json` — if missing: **E002**
133
-
134
- ### Step 2: Check Active Sessions
135
-
136
- Scan `.workflow/scratch` for existing `uat.md` files with `status: testing` in frontmatter.
137
-
138
- - If active sessions exist and no target specified: display session table, ask user to resume or start new:
139
- ```
140
- ## Active UAT Sessions
141
- | # | Target | Status | Current Test | Progress |
142
- |---|--------|--------|--------------|----------|
143
- | 1 | 04-comments | testing | 3. Reply to Comment | 2/6 |
144
- Reply with a number to resume, or provide a phase/task to start new.
145
- ```
146
- - If `--session ID` specified: resume that session directly (skip to Step 9)
147
- - If session exists for target: offer resume or restart
148
-
149
- ### Step 3: Smoke Tests (if --smoke)
150
-
151
- Skip if `--smoke` not set.
152
-
153
- | Smoke Test | Check | Method |
154
- |------------|-------|--------|
155
- | App starts | Process runs without crash | Run start command, check exit code |
156
- | Routes respond | Key endpoints return non-error | curl/fetch main routes |
157
- | Build clean | No build errors | Build command succeeds |
158
- | Dependencies | No missing deps | Install check |
159
-
160
- Record in `uat.md` under `## Smoke Tests`. If any fails: **E003** — abort, suggest `$quality-debug`.
161
-
162
- ### Step 4: Load Verification Context
163
-
164
- Read from target directory: `verification.json`, `validation.json`, `index.json`, `plan.json`, `.summaries/TASK-*.md`. Build testable list from user-observable outcomes.
165
-
166
- ### Step 4.5: Load Quality Context (Cross-Artifact Integration)
167
-
168
- Query `state.json.artifacts[]` for all artifacts matching current phase and milestone:
169
-
170
- **Review findings integration**:
171
- - For `type: "review"` artifacts: read `review.json`, extract critical/high findings
172
- - Generate additional test scenarios marked `source: "review_finding"`
173
- - If review verdict is "BLOCK" and review-finding tests fail → auto-enter gap-fix loop
174
-
175
- **Debug root cause integration**:
176
- - For `type: "debug"` artifacts: read `understanding.md`, extract confirmed root causes
177
- - Generate regression test scenarios marked `source: "debug_root_cause"`
178
-
179
- ### Step 5: Design Test Scenarios
180
-
181
- Create scenarios from testables:
182
- - `id`: T-001, T-002, ...
183
- - `name`: Brief test name
184
- - `category`: "e2e" | "integration" | "unit"
185
- - `expected`: Specific observable behavior
186
- - `requirement_ref`: Which success criterion this covers
187
- - `source`: "verification" | "review_finding" | "debug_root_cause"
188
-
189
- Write `{target_dir}/.tests/test-plan.json`:
190
- ```json
191
- {
192
- "target": "{phase or scratch ID}",
193
- "generated_at": "{ISO}",
194
- "tests": [...],
195
- "coverage": {
196
- "requirements_mapped": ["SC-001"],
197
- "requirements_unmapped": ["SC-003"]
198
- }
199
- }
200
- ```
201
-
202
- Focus on USER-OBSERVABLE outcomes. Skip internal/non-observable items.
203
-
204
- ### Step 6: Create UAT File
205
-
206
- Archive previous `uat.md` to `.history/` if exists.
207
-
208
- Write `{target_dir}/uat.md`:
209
- ```markdown
210
- ---
211
- status: testing
212
- target: {phase slug or scratch ID}
213
- source: [list of summary files]
214
- started: {ISO}
215
- updated: {ISO}
216
- ---
217
-
218
- ## Current Test
219
- number: 1
220
- name: {first test name}
221
- expected: |
222
- {what user should observe}
223
- awaiting: user response
224
-
225
- ## Smoke Tests
226
- {results if ran, otherwise omitted}
227
-
228
- ## Tests
229
- ### 1. {Test Name}
230
- expected: {observable behavior}
231
- result: [pending]
232
-
233
- ## Summary
234
- total: {N} passed: 0 issues: 0 pending: {N} skipped: 0
235
-
236
- ## Gaps
237
- [none yet]
238
- ```
239
-
240
- ### Step 7: Present Test (Interactive Loop)
241
-
242
- Present one test at a time:
243
- ```
244
- ------------------------------------------------------------
245
- TEST {number}/{total}: {name}
246
- ------------------------------------------------------------
247
-
248
- Expected behavior:
249
- {expected}
250
-
251
- ------------------------------------------------------------
252
- > Type "pass" or describe what's wrong
253
- ------------------------------------------------------------
254
- ```
255
-
256
- Wait for user response (plain text).
257
-
258
- ### Step 8: Process Response
259
-
260
- | Response | Action |
261
- |----------|--------|
262
- | empty, "yes", "y", "ok", "pass", "next" | Mark as pass |
263
- | "skip", "can't test", "n/a" | Mark as skipped |
264
- | Anything else | Log as issue, infer severity |
265
-
266
- **Severity inference** (never ask):
267
-
268
- | User says | Infer |
269
- |-----------|-------|
270
- | "crashes", "error", "exception", "fails completely", "can't use" | blocker |
271
- | "doesn't work", "nothing happens", "wrong behavior", "broken" | major |
272
- | "works but...", "slow", "weird", "minor issue", "inconsistent" | minor |
273
- | "color", "spacing", "alignment", "looks off", "typo" | cosmetic |
274
-
275
- Default: **major** if unclear.
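
The two Step 8 tables can be approximated with plain keyword matching. The sketch below is only an illustration: the skill describes inference from natural language, so an agent would likely do better than substring checks. The names `processResponse` and `inferSeverity` are hypothetical; the keyword lists are copied from the tables.

```javascript
// Hypothetical helpers; keyword lists are taken from the Step 8 tables.
const PASS_WORDS = new Set(["", "yes", "y", "ok", "pass", "next"]);
const SKIP_WORDS = new Set(["skip", "can't test", "n/a"]);

// Checked top to bottom, so the most severe matching tier wins.
const SEVERITY_KEYWORDS = [
  ["blocker", ["crashes", "error", "exception", "fails completely", "can't use"]],
  ["major", ["doesn't work", "nothing happens", "wrong behavior", "broken"]],
  ["minor", ["works but", "slow", "weird", "minor issue", "inconsistent"]],
  ["cosmetic", ["color", "spacing", "alignment", "looks off", "typo"]],
];

function inferSeverity(text) {
  const lower = text.toLowerCase();
  for (const [severity, keywords] of SEVERITY_KEYWORDS) {
    if (keywords.some((keyword) => lower.includes(keyword))) return severity;
  }
  return "major"; // documented default when nothing matches
}

function processResponse(raw) {
  const text = raw.trim();
  const lower = text.toLowerCase();
  if (PASS_WORDS.has(lower)) return { result: "pass" };
  if (SKIP_WORDS.has(lower)) return { result: "skipped" };
  // Anything else is logged as an issue; severity is inferred, never asked.
  return { result: "issue", severity: inferSeverity(text), reported: text };
}
```

One limitation of substring matching: a negated phrase such as "no error" still contains "error" and would be classified as a blocker, which is one reason to treat this as a fallback and not as the inference itself.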
276
-
277
- **On issue**: auto-create issue in `.workflow/issues/issues.jsonl`:
278
- ```json
279
- {
280
- "id": "ISS-{YYYYMMDD}-{NNN}",
281
- "title": "UAT: {test.name} - {response truncated 100 chars}",
282
- "status": "registered",
283
- "priority": "{from severity}",
284
- "severity": "{inferred}",
285
- "source": "uat",
286
- "phase_ref": "{phase}",
287
- "gap_ref": "{test.id}",
288
- "description": "Expected: {expected}. Reported: {verbatim}",
289
- "tags": ["uat"]
290
- }
291
- ```
292
-
293
- Back-reference: set `gap.issue_id = issue_id` in uat.md gap entry.
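
A minimal sketch of assembling the issue record and its JSONL line, assuming the field layout shown in the JSON template above. `formatIssueId`, `buildIssue`, and `toJsonlLine` are hypothetical names, and copying `severity` into `priority` is an assumption: the template only says the priority is derived "from severity" without giving a mapping.

```javascript
// Hypothetical helpers for the issue template in Step 8.
function formatIssueId(date, seq) {
  // YYYYMMDD from the UTC date, followed by a zero-padded sequence number.
  const ymd = date.toISOString().slice(0, 10).replace(/-/g, "");
  return `ISS-${ymd}-${String(seq).padStart(3, "0")}`;
}

function buildIssue({ test, reported, severity, phase, seq, date }) {
  return {
    id: formatIssueId(date, seq),
    // Title carries at most the first 100 characters of the user's response.
    title: `UAT: ${test.name} - ${reported.slice(0, 100)}`,
    status: "registered",
    priority: severity, // assumption: the real severity-to-priority mapping is not shown
    severity,
    source: "uat",
    phase_ref: phase,
    gap_ref: test.id,
    description: `Expected: ${test.expected}. Reported: ${reported}`,
    tags: ["uat"],
  };
}

// JSONL stores one JSON object per line.
function toJsonlLine(issue) {
  return JSON.stringify(issue) + "\n";
}
```

The resulting line could be appended to `.workflow/issues/issues.jsonl` with Node's `fs.appendFileSync`, and the returned `id` is the value to store as `gap.issue_id` in the matching `uat.md` gap entry.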
294
-
295
- **Batched writes**: write to file on issue, every 5 passes, or completion.
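
The batching rule (write on issue, every 5 passes, or on completion) could be sketched as a small buffer like the one below. `UatWriteBatcher` and its `writeFn` callback are hypothetical; how skipped tests count is not stated in the skill, so this sketch simply buffers them until the next flush.

```javascript
// Hypothetical buffer implementing "flush on issue, every 5 passes, or completion".
class UatWriteBatcher {
  constructor(writeFn, passesPerFlush = 5) {
    this.writeFn = writeFn; // called with the array of buffered entries
    this.passesPerFlush = passesPerFlush;
    this.pending = [];
    this.passesSinceFlush = 0;
  }

  record(entry) {
    this.pending.push(entry);
    if (entry.result === "issue") {
      this.flush(); // issues are persisted immediately
      return;
    }
    if (entry.result === "pass") {
      this.passesSinceFlush += 1;
      if (this.passesSinceFlush >= this.passesPerFlush) this.flush();
    }
    // Assumption: skipped entries wait for the next flush.
  }

  complete() {
    this.flush(); // final write at session end
  }

  flush() {
    if (this.pending.length > 0) this.writeFn(this.pending);
    this.pending = [];
    this.passesSinceFlush = 0;
  }
}
```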
296
-
297
- If more tests → update Current Test, loop to Step 7.
298
- If done → go to Step 10.
299
-
300
- ### Step 9: Resume From File
301
-
302
- Read `uat.md`, find first `result: [pending]` test, announce progress, continue from there (go to Step 7).
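
Resuming can be sketched as a scan for the first test whose result is still `[pending]`, assuming the `### N. Name` headings and `result:` lines of the `uat.md` template in Step 6. `findFirstPending` is a hypothetical helper name.

```javascript
// Hypothetical helper: locate the first test still marked "result: [pending]".
function findFirstPending(uatText) {
  let current = null;
  for (const line of uatText.split(/\r?\n/)) {
    // Test headings look like "### 3. Reply to Comment".
    const heading = line.match(/^###\s+(\d+)\.\s+(.*)$/);
    if (heading) {
      current = { number: Number(heading[1]), name: heading[2].trim() };
      continue;
    }
    if (current && /^result:\s*\[pending\]$/.test(line.trim())) {
      return current;
    }
  }
  return null; // nothing pending: the session can be completed
}
```

A `null` return corresponds to the case where every test already has a result, so the flow would move to Step 10 instead of looping back to Step 7.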
303
-
304
- ### Step 10: Complete Session
305
-
306
- 1. Update `uat.md` frontmatter: `status → "complete"`, update timestamp
307
- 2. Archive previous result artifacts to `.history/`
308
- 3. Write `.tests/test-results.json`:
309
- ```json
310
- { "target": "...", "completed_at": "...", "results": [...], "summary": { "total": N, "passed": N, "issues": N, "skipped": N } }
311
- ```
312
- 4. Write `.tests/coverage-report.json`:
313
- ```json
314
- { "target": "...", "requirements_covered": [...], "requirements_uncovered": [...], "coverage_percentage": 66.7 }
315
- ```
316
- 5. Update `index.json` with UAT results
317
- 6. **Register artifact** in `state.json.artifacts[]`:
318
- ```json
319
- { "id": "TST-NNN", "type": "test", "milestone": "current", "phase": "target_phase", "scope": "phase",
320
- "path": "scratch/{YYYYMMDD}-test-P{N}-{slug}", "status": "completed|failed", "depends_on": "exec_art.id" }
321
- ```
322
- 7. If no issues → go to Step 13
323
- 8. If issues found → go to Step 11
324
-
325
- ### Step 11: Auto-Diagnose via CSV Parallel
326
-
327
- **Cluster gaps and diagnose in parallel via `spawn_agents_on_csv`.**
328
-
329
- #### 11a. Cluster Gaps
330
-
331
- Group issues by affected component/area:
332
- - Same file/module → one cluster
333
- - Same feature/flow → one cluster
334
- - Unrelated → separate clusters
335
-
336
- #### 11b. Build diagnosis.csv
337
-
338
- ```
339
- mkdir -p {target_dir}/.tests/.csv-session
340
-
341
- For each gap in uat.md:
342
- Resolve target_files from gap context (test expected behavior → source files)
343
- Gather source_context (imports, exports, call chains from target files)
344
- Create one diagnosis.csv row with: id, test_id, cluster, test_name, expected, reported, severity, target_files, issue_id, source_context
345
- ```
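
The pseudocode above can be sketched as a function that turns gap objects into `diagnosis.csv` text. The column order follows the `diagnosis.csv` header in the schema section; `buildDiagnosisCsv` and `csvField` are hypothetical names, and assigning `DX-NNN` ids by position is an assumption. Every field is quoted, as in the schema example, with embedded quotes doubled.

```javascript
// Column order taken from the diagnosis.csv header in the schema section.
const DIAGNOSIS_COLUMNS = [
  "id", "test_id", "cluster", "test_name", "expected", "reported", "severity",
  "target_files", "issue_id", "source_context",
  "root_cause", "fix_direction", "affected_files", "evidence", "error",
];

// Quote every field and double any embedded quotes.
function csvField(value) {
  return `"${String(value ?? "").replace(/"/g, '""')}"`;
}

function buildDiagnosisCsv(gaps) {
  const rows = gaps.map((gap, index) => {
    const row = {
      ...gap,
      id: `DX-${String(index + 1).padStart(3, "0")}`, // assumption: ids assigned by position
      target_files: (gap.target_files ?? []).join(";"), // semicolon-separated per schema
    };
    // Output columns (root_cause ... error) are absent on input and stay empty.
    return DIAGNOSIS_COLUMNS.map((column) => csvField(row[column])).join(",");
  });
  return [DIAGNOSIS_COLUMNS.join(","), ...rows].join("\n") + "\n";
}
```

The returned string would be written to `{target_dir}/.tests/.csv-session/diagnosis.csv` after the directory is created.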
346
-
347
- #### 11c. Parallel Diagnosis via spawn_agents_on_csv
348
-
349
- ```javascript
350
- spawn_agents_on_csv({
351
- csv_path: `${targetDir}/.tests/.csv-session/diagnosis.csv`,
352
- id_column: "id",
353
- instruction: `
354
- You are a UAT failure diagnostician. Investigate ONE gap cluster.
355
-
356
- ## Task
357
- - Read all target_files to understand the relevant code
358
- - Analyze: why does the expected behavior not match what user reported?
359
- - Find the root cause (not the symptom)
360
- - Suggest a fix direction (what needs to change, not exact code)
361
- - List all files that would need modification
362
-
363
- ## Output
364
- - root_cause: Concise explanation of why the issue occurs
365
- - fix_direction: Suggested approach to fix (e.g., "Add null check before accessing user.email")
366
- - affected_files: Semicolon-separated list of files needing changes
367
- - evidence: file:line references supporting your diagnosis
368
-
369
- ## Rules
370
- - Do NOT modify any files — diagnosis only
371
- - Focus on root cause, not symptoms
372
- - Reference issue_id in your findings for traceability
373
- - If multiple gaps in same cluster share a root cause, note the shared cause
374
- `,
375
- max_concurrency: 5,
376
- max_runtime_seconds: 1200,
377
- output_csv_path: `${targetDir}/.tests/.csv-session/diagnosis-results.csv`,
378
- output_schema: { id, root_cause, fix_direction, affected_files, evidence, error }
379
- })
380
- ```
381
-
382
- #### 11d. Merge Results
383
-
384
- Update `uat.md` gaps with diagnosis:
385
- ```yaml
386
- - test: {N}
387
- truth: "..."
388
- status: failed
389
- reason: "..."
390
- severity: {inferred}
391
- issue_id: ISS-YYYYMMDD-NNN
392
- root_cause: "{diagnosed cause}"
393
- fix_direction: "{suggested approach}"
394
- affected_files: ["{file1}", "{file2}"]
395
- ```
396
-
397
- ### Step 12: Gap Closure Decision
398
-
399
- **If `--auto-fix` or `-y`**: execute gap-fix loop directly.
400
-
401
- **Otherwise**: present diagnosis summary and offer options:
402
- ```
403
- ### Diagnosis Complete
404
-
405
- | Gap | Severity | Root Cause | Fix Direction |
406
- |-----|----------|------------|---------------|
407
- | T-3 | major | Missing null check | Add guard clause |
408
- | T-5 | blocker | Event not cleaned | Add cleanup logic |
409
-
410
- Options:
411
- 1. Auto-fix — Plan and execute fixes, then re-verify
412
- 2. Debug deep — $quality-debug per issue
413
- 3. Plan fixes — $maestro-plan "--gaps"
414
- 4. Manual fix — Address issues yourself
415
- ```
416
-
417
- | Choice | Action |
418
- |--------|--------|
419
- | 1 / "auto-fix" | Execute gap-fix loop |
420
- | 2 / "debug" | Suggest `$quality-debug "--from-uat {phase}"` |
421
- | 3 / "plan" | Suggest `$maestro-plan "{phase} --gaps"` |
422
- | 4 / "manual" | Done, report results |
423
-
424
- **Gap-fix closure loop** (max 2 iterations):
425
- 1. `$maestro-plan "{phase} --gaps"` — generate fix tasks from gaps
426
- 2. `$maestro-execute "{phase}"` — execute fix tasks
427
- 3. `$maestro-verify "{phase}"` — re-verify
428
-
429
- **Issue lifecycle sync during loop:**
430
- - Before plan: `registered` → `planning`
431
- - Before execute: `planning` → `executing`
432
- - After re-verify: resolved gaps → `completed` (resolution: "auto-fixed via gap-fix loop"), unresolved → `failed`
433
-
434
- If re-verify passes: update uat.md gaps as resolved, report success.
435
- If gaps remain after 2 iterations: report remaining, suggest manual intervention.
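
The bounded closure loop and its issue status transitions could look roughly like the sketch below. `runGapFixLoop` and the injected `steps.plan`, `steps.execute`, and `steps.verify` callbacks are hypothetical stand-ins for the `$maestro-plan`, `$maestro-execute`, and `$maestro-verify` invocations; `verify` is assumed to return the ids of resolved issues. Marking unresolved issues `failed` only after the last iteration is also an assumption, since the skill does not say what status they hold between iterations.

```javascript
// Hypothetical driver for the gap-fix loop (max 2 iterations by default).
function runGapFixLoop(issues, steps, maxIterations = 2) {
  let open = [...issues];
  let iterations = 0;

  while (open.length > 0 && iterations < maxIterations) {
    iterations += 1;

    for (const issue of open) issue.status = "planning"; // before plan
    steps.plan(open);

    for (const issue of open) issue.status = "executing"; // before execute
    steps.execute(open);

    // verify() is assumed to return the ids of issues that now pass.
    const resolved = new Set(steps.verify(open));
    for (const issue of open) {
      if (resolved.has(issue.id)) {
        issue.status = "completed";
        issue.resolution = "auto-fixed via gap-fix loop";
      }
    }
    open = open.filter((issue) => !resolved.has(issue.id));
  }

  // Whatever is still open after the iteration cap needs manual intervention.
  for (const issue of open) issue.status = "failed";
  return { iterations, remaining: open };
}
```

Capping the loop in the driver, and not inside the individual steps, keeps the "max 2 iterations" invariant in one place regardless of how each step behaves.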
436
-
437
- ### Step 13: Report
438
-
439
- ```
440
- === UAT RESULTS ===
441
- Target: {target}
442
-
443
- Smoke Tests: {smoke_count} run, {smoke_pass} passed (if ran)
444
- UAT Tests: {total} total
445
- Passed: {passed}
446
- Issues: {issues} ({blocker_count} blockers, {major_count} major)
447
- Skipped: {skipped}
448
-
449
- Diagnosis: {diagnosed_count}/{issues} gaps diagnosed
450
- Auto-fix: {fixed_count} gaps resolved (if ran)
451
-
452
- Files:
453
- {target_dir}/uat.md
454
- {target_dir}/.tests/test-results.json
455
- {target_dir}/.tests/coverage-report.json
456
- {target_dir}/.tests/.csv-session/diagnosis-results.csv (if diagnosed)
457
- ```
458
-
459
- **Next-step routing:**
460
-
461
- | Result | Next Step |
462
- |--------|-----------|
463
- | All passed, no gaps | `$maestro-milestone-audit` |
464
- | Auto-fix ran and succeeded | `$maestro-verify "{phase}"` |
465
- | Auto-fix ran but gaps remain | `$quality-debug "--from-uat {phase}"` |
466
- | Issues found, manual fix needed | `$quality-debug "--from-uat {phase}"` |
467
- | Coverage below threshold | `$quality-auto-test "{phase}"` |
468
- | Need integration tests | `$quality-auto-test "{phase}"` |
469
-
470
- </execution>
471
-
472
- <error_codes>
473
- | Code | Severity | Condition | Recovery |
474
- |------|----------|-----------|----------|
475
- | E001 | error | Phase or task target required (no active sessions) | Prompt user for phase number |
476
- | E002 | error | Phase not verified (no verification.json) | Suggest `$maestro-verify` |
477
- | E003 | error | Smoke test failed (app won't start) | Suggest `$quality-debug` |
478
- | W001 | warning | Test scenarios failed | Auto-diagnose, suggest fix options |
479
- | W002 | warning | Coverage below threshold | Suggest `$quality-auto-test` |
480
- </error_codes>
481
-
482
- <success_criteria>
483
- - [ ] Target resolved and verification context loaded
484
- - [ ] Quality artifacts loaded (review findings → extra tests, debug root causes → regression tests)
485
- - [ ] Test scenarios designed from user-observable outcomes
486
- - [ ] UAT file created with session persistence
487
- - [ ] Tests presented one at a time, severity inferred (never asked)
488
- - [ ] Issues auto-created in issues.jsonl for all failures
489
- - [ ] Batched writes: on issue, every 5 passes, or completion
490
- - [ ] test-results.json and coverage-report.json written
491
- - [ ] index.json uat fields updated
492
- - [ ] Artifact registered in state.json
493
- - [ ] If issues: diagnosis.csv built, spawn_agents_on_csv executed per gap cluster
494
- - [ ] Gaps updated with root_cause, fix_direction, affected_files
495
- - [ ] Gap-fix loop triggered if --auto-fix (max 2 iterations)
496
- - [ ] Issue lifecycle synced through gap-fix loop
497
- - [ ] Next step routed based on result
498
- </success_criteria>
1
+ ---
2
+ name: quality-test
3
+ description: Conversational UAT with auto-diagnosis and gap closure
4
+ argument-hint: "<phase> [-y] [--smoke] [--auto-fix] [--session ID]"
5
+ allowed-tools: spawn_agents_on_csv, Read, Write, Edit, Bash, Glob, Grep, AskUserQuestion
6
+ ---
7
+
8
+ <purpose>
9
+ Conversational UAT: present the expected behavior one test at a time, and the user either confirms it or describes an issue. Severity is inferred from the user's wording (never asked). The session persists in `uat.md` across context resets. Failed tests trigger CSV-parallel diagnosis via `spawn_agents_on_csv` and an optional gap-fix closure loop.
10
+
11
+ **Philosophy**: Show expected, ask if reality matches.
12
+
13
+ ```
14
+ +---------------------------------------------------------------------------+
15
+ | UAT CSV DIAGNOSIS PIPELINE |
16
+ +---------------------------------------------------------------------------+
17
+ | |
18
+ | Phase 1: Setup & Scenario Design |
19
+ | +-- Resolve target (phase / scratch) |
20
+ | +-- Check active sessions (resume or new) |
21
+ | +-- Smoke tests (if --smoke) |
22
+ | +-- Load verification context + quality artifacts |
23
+ | +-- Design test scenarios from user-observable outcomes |
24
+ | +-- Create uat.md with all tests pending |
25
+ | |
26
+ | Phase 2: Interactive Testing (one at a time) |
27
+ | +-- Present test: show expected behavior |
28
+ | +-- User responds: pass / skip / describe issue |
29
+ | +-- Severity inferred (never asked) |
30
+ | +-- Issues auto-created in issues.jsonl |
31
+ | +-- Batched writes to uat.md |
32
+ | |
33
+ | Phase 3: Diagnosis (if issues found) |
34
+ | +-- Cluster gaps by component/module |
35
+ | +-- Build diagnosis.csv from gap clusters |
36
+ | +-- Diagnose in parallel via spawn_agents_on_csv |
37
+ | +-- Each agent: find root cause, fix direction, affected files |
38
+ | +-- Merge results into uat.md gaps |
39
+ | |
40
+ | Phase 4: Gap Closure & Report |
41
+ | +-- If --auto-fix: plan --gaps -> execute -> re-verify (max 2) |
42
+ | +-- Otherwise: present options (auto-fix / debug / plan / manual) |
43
+ | +-- Issue lifecycle sync throughout |
44
+ | +-- Report with pass/fail counts and next steps |
45
+ | |
46
+ +---------------------------------------------------------------------------+
47
+ ```
48
+ </purpose>
49
+
50
+ <context>
51
+ ```bash
52
+ $quality-test "3" # test phase 3
53
+ $quality-test "3 --smoke" # smoke tests first, then UAT
54
+ $quality-test "3 --auto-fix" # auto-trigger gap-fix loop on failures
55
+ $quality-test "-y 3" # implies --auto-fix, skip gap closure prompt
56
+ $quality-test "--session 04-comments" # resume specific session
57
+ ```
58
+
59
+ **Flags**:
60
+ - `<phase>`: Phase number or scratch task ID
61
+ - `--smoke`: Run cold-start smoke tests before UAT
62
+ - `--auto-fix`: Auto-trigger gap-fix loop (plan --gaps -> execute -> re-verify) on failures
63
+ - `--session ID`: Resume a specific UAT session
64
+
65
+ `-y` implies `--auto-fix`. UAT itself remains interactive (present expected → user confirms). `-y` only automates the gap closure loop.
66
+
67
+ **Output**:
68
+ - `{target_dir}/uat.md` — session file (persistent)
69
+ - `{target_dir}/.tests/test-plan.json` — scenario definitions
70
+ - `{target_dir}/.tests/test-results.json` — pass/fail results
71
+ - `{target_dir}/.tests/coverage-report.json` — requirement coverage
72
+ - `.tests/.csv-session/diagnosis.csv` + `diagnosis-results.csv` — diagnosis artifacts
73
+ </context>
74
+
75
+ <csv_schema>
76
+
77
+ ### diagnosis.csv (Gap Diagnosis Phase)
78
+
79
+ ```csv
80
+ id,test_id,cluster,test_name,expected,reported,severity,target_files,issue_id,source_context,root_cause,fix_direction,affected_files,evidence,error
81
+ "DX-001","T-003","auth","Login validation","Valid login returns dashboard","Clicking login does nothing, no error","major","src/auth/login.ts;src/routes/auth.ts","ISS-20260503-001","login.ts calls authService.verify, auth.ts exports POST /login","","","","",""
82
+ "DX-002","T-005","events","Event cleanup on logout","Events unsubscribed after logout","Memory leak warning in console after logout","blocker","src/events/manager.ts","ISS-20260503-002","manager.ts has subscribe() but no unsubscribe in logout flow","","","","",""
83
+ ```
84
+
85
+ **Columns**:
86
+
87
+ | Column | Phase | Description |
88
+ |--------|-------|-------------|
89
+ | `id` | Input | Diagnosis ID (DX-NNN) |
90
+ | `test_id` | Input | Reference to T-NNN test |
91
+ | `cluster` | Input | Gap cluster name (component/area) |
92
+ | `test_name` | Input | Human-readable test name |
93
+ | `expected` | Input | Expected behavior from test scenario |
94
+ | `reported` | Input | User's issue description (verbatim) |
95
+ | `severity` | Input | Inferred severity (blocker/major/minor/cosmetic) |
96
+ | `target_files` | Input | Semicolon-separated source files to investigate |
97
+ | `issue_id` | Input | Back-reference to issues.jsonl entry |
98
+ | `source_context` | Input | Relevant code context (imports, exports, call chains) |
99
+ | `root_cause` | Output | Diagnosed root cause |
100
+ | `fix_direction` | Output | Suggested fix approach |
101
+ | `affected_files` | Output | Semicolon-separated files that need changes |
102
+ | `evidence` | Output | file:line references supporting diagnosis |
103
+ | `error` | Output | Agent error if diagnosis failed |
104
+
105
+ ### Session Structure
106
+
107
+ ```
108
+ {target_dir}/.tests/.csv-session/
109
+ +-- diagnosis.csv (diagnosis input)
110
+ +-- diagnosis-results.csv (diagnosis output)
111
+ ```
112
+ </csv_schema>
113
+
114
+ <invariants>
115
+ 1. **One test at a time** — never batch-present tests
116
+ 2. **Never ask severity** — always infer from natural language
117
+ 3. **Session persistence** — uat.md survives context resets, resume from any point
118
+ 4. **Batched writes** — minimize file I/O (on issue, every 5 passes, completion)
119
+ 5. **Gap-fix loop max 2 iterations** — prevent infinite loops
120
+ 6. **CSV parallel diagnosis** — spawn_agents_on_csv for gap clusters, not sequential
121
+ 7. **Auto-create issues** — every failed test creates entry in `.workflow/issues/issues.jsonl`
122
+ 8. **Issue lifecycle sync** — track issues through registered → planning → executing → completed/failed
123
+ </invariants>
124
+
125
+ <execution>
126
+
127
+ ### Step 1: Resolve Target
128
+
129
+ 1. Parse `$ARGUMENTS` for phase number, scratch task ID, or flags
130
+ 2. **Phase mode**: resolve `PHASE_DIR` via artifact registry in `state.json` (`type='execute'`, matching phase)
131
+ 3. **Scratch mode**: set `SCRATCH_DIR = .workflow/scratch/{id}/`
132
+ 4. Validate target exists and has `verification.json` — if missing: **E002**
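The argument forms listed in the usage examples can be parsed with a small wrapper. This is a minimal sketch, assuming the quoted string reaches the command as one shell-style argument string; `parse_quality_test_args` is a hypothetical helper name, not part of maestro-flow.

```python
import argparse
import shlex


def parse_quality_test_args(raw: str) -> argparse.Namespace:
    """Parse the quoted argument string passed to $quality-test (illustrative only)."""
    parser = argparse.ArgumentParser(prog="quality-test", add_help=False)
    # Phase number or scratch task ID; optional because --session can stand alone.
    parser.add_argument("target", nargs="?", default=None)
    parser.add_argument("-y", dest="yes", action="store_true")
    parser.add_argument("--smoke", action="store_true")
    parser.add_argument("--auto-fix", dest="auto_fix", action="store_true")
    parser.add_argument("--session", default=None)
    args = parser.parse_args(shlex.split(raw))
    # Per the flag notes, -y implies --auto-fix.
    if args.yes:
        args.auto_fix = True
    return args
```

With this sketch, `parse_quality_test_args("-y 3")` yields a target of `"3"` with `auto_fix` set, and a missing target with no `--session` is the case the error table maps to E001.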
133
+
134
+ ### Step 2: Check Active Sessions
135
+
136
+ Scan `.workflow/scratch` for existing `uat.md` files with `status: testing` in frontmatter.
137
+
138
+ - If active sessions exist and no target specified: display session table, ask user to resume or start new:
139
+ ```
140
+ ## Active UAT Sessions
141
+ | # | Target | Status | Current Test | Progress |
142
+ |---|--------|--------|--------------|----------|
143
+ | 1 | 04-comments | testing | 3. Reply to Comment | 2/6 |
144
+ Reply with a number to resume, or provide a phase/task to start new.
145
+ ```
146
+ - If `--session ID` specified: resume that session directly (skip to Step 9)
147
+ - If session exists for target: offer resume or restart
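The session scan in this step could look roughly like the following. It is a sketch under the assumption that `uat.md` frontmatter is a flat block of `key: value` lines between two `---` markers, as in the template in Step 6; `read_frontmatter` and `find_active_sessions` are hypothetical names.

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict:
    """Return the flat `key: value` pairs between the leading '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing marker: treat as having no frontmatter


def find_active_sessions(root: Path) -> list:
    """List uat.md files under root whose frontmatter says `status: testing`."""
    active = []
    for path in sorted(root.rglob("uat.md")):
        meta = read_frontmatter(path.read_text(encoding="utf-8"))
        if meta.get("status") == "testing":
            active.append(path)
    return active
```

Calling `find_active_sessions(Path(".workflow/scratch"))` would return the candidate sessions for the resume table.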
148
+
149
+ ### Step 3: Smoke Tests (if --smoke)
150
+
151
+ Skip this step if `--smoke` is not set.
152
+
153
+ | Smoke Test | Check | Method |
154
+ |------------|-------|--------|
155
+ | App starts | Process runs without crash | Run start command, check exit code |
156
+ | Routes respond | Key endpoints return non-error | curl/fetch main routes |
157
+ | Build clean | No build errors | Build command succeeds |
158
+ | Dependencies | No missing deps | Install check |
159
+
160
+ Record in `uat.md` under `## Smoke Tests`. If any fails: **E003** — abort, suggest `$quality-debug`.
161
+
162
+ ### Step 4: Load Verification Context
163
+
164
+ Read from target directory: `verification.json`, `validation.json`, `index.json`, `plan.json`, `.summaries/TASK-*.md`. Build testable list from user-observable outcomes.
165
+
166
+ ### Step 4.5: Load Quality Context (Cross-Artifact Integration)
167
+
168
+ Query `state.json.artifacts[]` for all artifacts matching current phase and milestone:
169
+
170
+ **Review findings integration**:
171
+ - For `type: "review"` artifacts: read `review.json`, extract critical/high findings
172
+ - Generate additional test scenarios marked `source: "review_finding"`
173
+ - If review verdict is "BLOCK" and review-finding tests fail → auto-enter gap-fix loop
174
+
175
+ **Debug root cause integration**:
176
+ - For `type: "debug"` artifacts: read `understanding.md`, extract confirmed root causes
177
+ - Generate regression test scenarios marked `source: "debug_root_cause"`
178
+
179
+ ### Step 5: Design Test Scenarios
180
+
181
+ Create scenarios from testables:
182
+ - `id`: T-001, T-002, ...
183
+ - `name`: Brief test name
184
+ - `category`: "e2e" | "integration" | "unit"
185
+ - `expected`: Specific observable behavior
186
+ - `requirement_ref`: Which success criterion this covers
187
+ - `source`: "verification" | "review_finding" | "debug_root_cause"
188
+
189
+ Write `{target_dir}/.tests/test-plan.json`:
190
+ ```json
191
+ {
192
+ "target": "{phase or scratch ID}",
193
+ "generated_at": "{ISO}",
194
+ "tests": [...],
195
+ "coverage": {
196
+ "requirements_mapped": ["SC-001"],
197
+ "requirements_unmapped": ["SC-003"]
198
+ }
199
+ }
200
+ ```
201
+
202
+ Focus on USER-OBSERVABLE outcomes. Skip internal/non-observable items.
203
+
204
+ ### Step 6: Create UAT File
205
+
206
+ Archive the previous `uat.md` to `.history/` if one exists.
207
+
208
+ Write `{target_dir}/uat.md`:
209
+ ```markdown
210
+ ---
211
+ status: testing
212
+ target: {phase slug or scratch ID}
213
+ source: [list of summary files]
214
+ started: {ISO}
215
+ updated: {ISO}
216
+ ---
217
+
218
+ ## Current Test
219
+ number: 1
220
+ name: {first test name}
221
+ expected: |
222
+ {what user should observe}
223
+ awaiting: user response
224
+
225
+ ## Smoke Tests
226
+ {results if ran, otherwise omitted}
227
+
228
+ ## Tests
229
+ ### 1. {Test Name}
230
+ expected: {observable behavior}
231
+ result: [pending]
232
+
233
+ ## Summary
234
+ total: {N} passed: 0 issues: 0 pending: {N} skipped: 0
235
+
236
+ ## Gaps
237
+ [none yet]
238
+ ```
239
+
240
+ ### Step 7: Present Test (Interactive Loop)
241
+
242
+ Present one test at a time:
243
+ ```
244
+ ------------------------------------------------------------
245
+ TEST {number}/{total}: {name}
246
+ ------------------------------------------------------------
247
+
248
+ Expected behavior:
249
+ {expected}
250
+
251
+ ------------------------------------------------------------
252
+ > Type "pass" or describe what's wrong
253
+ ------------------------------------------------------------
254
+ ```
255
+
256
+ Wait for user response (plain text).
257
+
258
+ ### Step 8: Process Response
259
+
260
+ | Response | Action |
261
+ |----------|--------|
262
+ | empty, "yes", "y", "ok", "pass", "next" | Mark as pass |
263
+ | "skip", "can't test", "n/a" | Mark as skipped |
264
+ | Anything else | Log as issue, infer severity |
265
+
266
+ **Severity inference** (never ask):
267
+
268
+ | User says | Infer |
269
+ |-----------|-------|
270
+ | "crashes", "error", "exception", "fails completely", "can't use" | blocker |
271
+ | "doesn't work", "nothing happens", "wrong behavior", "broken" | major |
272
+ | "works but...", "slow", "weird", "minor issue", "inconsistent" | minor |
273
+ | "color", "spacing", "alignment", "looks off", "typo" | cosmetic |
274
+
275
+ Default: **major** if unclear.
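The two tables above can be approximated with a plain keyword scan. This is only a rough sketch: the command asks for inference from natural language, and substring matching is much cruder (for example, "no error" would still match "error"). The function names and the most-severe-first ordering are assumptions.

```python
from typing import Optional

PASS_WORDS = {"", "yes", "y", "ok", "pass", "next"}
SKIP_WORDS = {"skip", "can't test", "n/a"}

# Checked in order, most severe first, so a mixed report takes the higher severity.
SEVERITY_PHRASES = [
    ("blocker", ["crashes", "error", "exception", "fails completely", "can't use"]),
    ("major", ["doesn't work", "nothing happens", "wrong behavior", "broken"]),
    ("minor", ["works but", "slow", "weird", "minor issue", "inconsistent"]),
    ("cosmetic", ["color", "spacing", "alignment", "looks off", "typo"]),
]


def infer_severity(response: str) -> str:
    """Map a free-text report to a severity; default to major when unclear."""
    text = response.lower()
    for severity, phrases in SEVERITY_PHRASES:
        if any(phrase in text for phrase in phrases):
            return severity
    return "major"


def classify_response(response: str) -> tuple:
    """Return (outcome, severity); severity is None unless the outcome is an issue."""
    normalized = response.strip().lower()
    if normalized in PASS_WORDS:
        return ("pass", None)
    if normalized in SKIP_WORDS:
        return ("skipped", None)
    severity: Optional[str] = infer_severity(normalized)
    return ("issue", severity)
```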
276
+
277
+ **On issue**: auto-create issue in `.workflow/issues/issues.jsonl`:
278
+ ```json
279
+ {
280
+ "id": "ISS-{YYYYMMDD}-{NNN}",
281
+ "title": "UAT: {test.name} - {response truncated 100 chars}",
282
+ "status": "registered",
283
+ "priority": "{from severity}",
284
+ "severity": "{inferred}",
285
+ "source": "uat",
286
+ "phase_ref": "{phase}",
287
+ "gap_ref": "{test.id}",
288
+ "description": "Expected: {expected}. Reported: {verbatim}",
289
+ "tags": ["uat"]
290
+ }
291
+ ```
292
+
293
+ Back-reference: set `gap.issue_id = issue_id` in uat.md gap entry.
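Appending the issue record might be sketched as follows. The field names come from the JSON template above; the per-day sequence numbering in `next_issue_id` is an assumption, and `priority` is passed in by the caller because the severity-to-priority mapping is not spelled out here.

```python
import json
from datetime import date
from pathlib import Path


def next_issue_id(issues_path: Path, today: date) -> str:
    """Return ISS-{YYYYMMDD}-{NNN}, numbering after issues already logged that day."""
    prefix = f"ISS-{today:%Y%m%d}-"
    count = 0
    if issues_path.exists():
        for line in issues_path.read_text(encoding="utf-8").splitlines():
            if line.strip() and json.loads(line).get("id", "").startswith(prefix):
                count += 1
    return f"{prefix}{count + 1:03d}"


def append_uat_issue(issues_path, test, reported, severity, priority, phase, today):
    """Append one UAT issue as a JSON line and return the record."""
    issue = {
        "id": next_issue_id(issues_path, today),
        "title": f"UAT: {test['name']} - {reported[:100]}",  # response truncated to 100 chars
        "status": "registered",
        "priority": priority,
        "severity": severity,
        "source": "uat",
        "phase_ref": phase,
        "gap_ref": test["id"],
        "description": f"Expected: {test['expected']}. Reported: {reported}",
        "tags": ["uat"],
    }
    issues_path.parent.mkdir(parents=True, exist_ok=True)
    with issues_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(issue) + "\n")
    return issue
```

The returned `issue["id"]` is the value that would be written back to the gap entry as `issue_id`.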
294
+
295
+ **Batched writes**: write to file on issue, every 5 passes, or completion.
296
+
297
+ If more tests → update Current Test, loop to Step 7.
298
+ If done → go to Step 10.
299
+
300
+ ### Step 9: Resume From File
301
+
302
+ Read `uat.md`, find first `result: [pending]` test, announce progress, continue from there (go to Step 7).
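Locating the resume point could be sketched like this, assuming each test in `uat.md` keeps the `### N. Name` heading followed by a `result:` line from the Step 6 template, and that finished tests carry any value other than `[pending]`. The helper name is hypothetical.

```python
import re


def first_pending_test(uat_text: str):
    """Return (number, name) of the first test still marked [pending], or None."""
    current = None
    for line in uat_text.splitlines():
        stripped = line.strip()
        heading = re.match(r"^### (\d+)\. (.+)$", stripped)
        if heading:
            # Remember which test the following result line belongs to.
            current = (int(heading.group(1)), heading.group(2).strip())
        elif stripped == "result: [pending]" and current is not None:
            return current
    return None
```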
303
+
304
+ ### Step 10: Complete Session
305
+
306
+ 1. Update `uat.md` frontmatter: `status → "complete"`, update timestamp
307
+ 2. Archive previous result artifacts to `.history/`
308
+ 3. Write `.tests/test-results.json`:
309
+ ```json
310
+ { "target": "...", "completed_at": "...", "results": [...], "summary": { "total": N, "passed": N, "issues": N, "skipped": N } }
311
+ ```
312
+ 4. Write `.tests/coverage-report.json`:
313
+ ```json
314
+ { "target": "...", "requirements_covered": [...], "requirements_uncovered": [...], "coverage_percentage": 66.7 }
315
+ ```
316
+ 5. Update `index.json` with UAT results
317
+ 6. **Register artifact** in `state.json.artifacts[]`:
318
+ ```json
319
+ { "id": "TST-NNN", "type": "test", "milestone": "current", "phase": "target_phase", "scope": "phase",
320
+ "path": "scratch/{YYYYMMDD}-test-P{N}-{slug}", "status": "completed|failed", "depends_on": "exec_art.id" }
321
+ ```
322
+ 7. If no issues → go to Step 13
323
+ 8. If issues found → go to Step 11
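The `coverage-report.json` fields can be derived from the requirement list and the test plan. This is a sketch under one assumption that the text does not settle: a requirement counts as covered when at least one test references it through `requirement_ref`, regardless of the test's outcome.

```python
def build_coverage_report(target, requirements, tests):
    """Build the coverage-report.json payload from requirement IDs and test scenarios."""
    referenced = {t["requirement_ref"] for t in tests if t.get("requirement_ref")}
    covered = [r for r in requirements if r in referenced]
    uncovered = [r for r in requirements if r not in referenced]
    # Guard against an empty requirement list to avoid division by zero.
    percentage = round(100 * len(covered) / len(requirements), 1) if requirements else 0.0
    return {
        "target": target,
        "requirements_covered": covered,
        "requirements_uncovered": uncovered,
        "coverage_percentage": percentage,
    }
```

Two covered requirements out of three gives the 66.7 shown in the template above.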
324
+
325
+ ### Step 11: Auto-Diagnose via CSV Parallel
326
+
327
+ **Cluster gaps and diagnose in parallel via `spawn_agents_on_csv`.**
328
+
329
+ #### 11a. Cluster Gaps
330
+
331
+ Group issues by affected component/area:
332
+ - Same file/module → one cluster
333
+ - Same feature/flow → one cluster
334
+ - Unrelated → separate clusters
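One mechanical way to approximate the "same file/module" rule is to merge gaps whose target files overlap. This sketch covers only that rule; grouping by feature or flow needs judgment that a file comparison cannot supply. The gap shape (a dict with a `target_files` list) is an assumption.

```python
def cluster_gaps_by_shared_files(gaps):
    """Group gaps so that any two gaps sharing a target file end up in one cluster."""
    clusters = []  # each entry: {"files": set of paths, "gaps": list of gap dicts}
    for gap in gaps:
        files = set(gap["target_files"])
        overlapping = [c for c in clusters if c["files"] & files]
        clusters = [c for c in clusters if not (c["files"] & files)]
        # Fold every overlapping cluster and the new gap into a single cluster.
        merged_files = files.union(*(c["files"] for c in overlapping))
        merged_gaps = [g for c in overlapping for g in c["gaps"]] + [gap]
        clusters.append({"files": merged_files, "gaps": merged_gaps})
    return [c["gaps"] for c in clusters]
```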
335
+
336
+ #### 11b. Build diagnosis.csv
337
+
338
+ ```
339
+ mkdir -p {target_dir}/.tests/.csv-session
340
+
341
+ For each gap in uat.md:
342
+ Resolve target_files from gap context (test expected behavior → source files)
343
+ Gather source_context (imports, exports, call chains from target files)
344
+ Create one diagnosis.csv row with: id, test_id, cluster, test_name, expected, reported, severity, target_files, issue_id, source_context
345
+ ```
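The row-building loop above could be written with the standard `csv` module. This is a sketch: the column names follow the schema table, while the gap dict shape and sequential `DX-NNN` numbering are assumptions. Output columns are written empty for the agents to fill.

```python
import csv
from pathlib import Path

INPUT_COLUMNS = ["id", "test_id", "cluster", "test_name", "expected", "reported",
                 "severity", "target_files", "issue_id", "source_context"]
OUTPUT_COLUMNS = ["root_cause", "fix_direction", "affected_files", "evidence", "error"]


def write_diagnosis_csv(target_dir, gaps):
    """Write one diagnosis.csv row per gap and return the file path."""
    session_dir = Path(target_dir) / ".tests" / ".csv-session"
    session_dir.mkdir(parents=True, exist_ok=True)
    csv_path = session_dir / "diagnosis.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=INPUT_COLUMNS + OUTPUT_COLUMNS)
        writer.writeheader()
        for index, gap in enumerate(gaps, start=1):
            row = {c: gap[c] for c in INPUT_COLUMNS if c not in ("id", "target_files")}
            row["id"] = f"DX-{index:03d}"
            # The schema stores multiple files as one semicolon-separated field.
            row["target_files"] = ";".join(gap["target_files"])
            row.update({c: "" for c in OUTPUT_COLUMNS})
            writer.writerow(row)
    return csv_path
```

The `csv` writer quotes fields containing commas, so verbatim user reports survive the round trip.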
346
+
347
+ #### 11c. Parallel Diagnosis via spawn_agents_on_csv
348
+
349
+ ```javascript
350
+ spawn_agents_on_csv({
351
+ csv_path: `${targetDir}/.tests/.csv-session/diagnosis.csv`,
352
+ id_column: "id",
353
+ instruction: `
354
+ You are a UAT failure diagnostician. Investigate ONE gap cluster.
355
+
356
+ ## Task
357
+ - Read all target_files to understand the relevant code
358
+ - Analyze: why does the expected behavior not match what the user reported?
359
+ - Find the root cause (not the symptom)
360
+ - Suggest a fix direction (what needs to change, not exact code)
361
+ - List all files that would need modification
362
+
363
+ ## Output
364
+ - root_cause: Concise explanation of why the issue occurs
365
+ - fix_direction: Suggested approach to fix (e.g., "Add null check before accessing user.email")
366
+ - affected_files: Semicolon-separated list of files needing changes
367
+ - evidence: file:line references supporting your diagnosis
368
+
369
+ ## Rules
370
+ - Do NOT modify any files — diagnosis only
371
+ - Focus on root cause, not symptoms
372
+ - Reference issue_id in your findings for traceability
373
+ - If multiple gaps in same cluster share a root cause, note the shared cause
374
+ `,
375
+ max_concurrency: 5,
376
+ max_runtime_seconds: 1200,
377
+ output_csv_path: `${targetDir}/.tests/.csv-session/diagnosis-results.csv`,
378
+ output_schema: { id, root_cause, fix_direction, affected_files, evidence, error }
379
+ })
380
+ ```
381
+
382
+ #### 11d. Merge Results
383
+
384
+ Update `uat.md` gaps with diagnosis:
385
+ ```yaml
386
+ - test: {N}
387
+ truth: "..."
388
+ status: failed
389
+ reason: "..."
390
+ severity: {inferred}
391
+ issue_id: ISS-YYYYMMDD-NNN
392
+ root_cause: "{diagnosed cause}"
393
+ fix_direction: "{suggested approach}"
394
+ affected_files: ["{file1}", "{file2}"]
395
+ ```
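Merging the agent output back into the gap entries might look like the sketch below. It assumes gaps are held as dicts carrying `issue_id`, joins results to gaps through the `id` to `issue_id` mapping in `diagnosis.csv`, and leaves a gap untouched when its row reports an `error`. `merge_diagnosis` is a hypothetical helper.

```python
import csv


def merge_diagnosis(gaps, diagnosis_csv, results_csv):
    """Return copies of gaps enriched with root_cause, fix_direction, affected_files."""
    with open(diagnosis_csv, newline="", encoding="utf-8") as handle:
        issue_for_dx = {row["id"]: row["issue_id"] for row in csv.DictReader(handle)}
    with open(results_csv, newline="", encoding="utf-8") as handle:
        results = {
            issue_for_dx[row["id"]]: row
            for row in csv.DictReader(handle)
            if row["id"] in issue_for_dx
        }
    merged = []
    for gap in gaps:
        updated = dict(gap)  # do not mutate the caller's gap entries
        result = results.get(gap.get("issue_id"))
        if result and not result.get("error"):
            updated["root_cause"] = result["root_cause"]
            updated["fix_direction"] = result["fix_direction"]
            # Split the semicolon-separated field back into a list.
            updated["affected_files"] = [f for f in result["affected_files"].split(";") if f]
        merged.append(updated)
    return merged
```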
396
+
397
+ ### Step 12: Gap Closure Decision
398
+
399
+ **If `--auto-fix` or `-y`**: execute gap-fix loop directly.
400
+
401
+ **Otherwise**: present diagnosis summary and offer options:
402
+ ```
403
+ ### Diagnosis Complete
404
+
405
+ | Gap | Severity | Root Cause | Fix Direction |
406
+ |-----|----------|------------|---------------|
407
+ | T-003 | major | Missing null check | Add guard clause |
408
+ | T-005 | blocker | Event not cleaned | Add cleanup logic |
409
+
410
+ Options:
411
+ 1. Auto-fix — Plan and execute fixes, then re-verify
412
+ 2. Debug deep — $quality-debug per issue
413
+ 3. Plan fixes — $maestro-plan "--gaps"
414
+ 4. Manual fix — Address issues yourself
415
+ ```
416
+
417
+ | Choice | Action |
418
+ |--------|--------|
419
+ | 1 / "auto-fix" | Execute gap-fix loop |
420
+ | 2 / "debug" | Suggest `$quality-debug "--from-uat {phase}"` |
421
+ | 3 / "plan" | Suggest `$maestro-plan "{phase} --gaps"` |
422
+ | 4 / "manual" | Done, report results |
423
+
424
+ **Gap-fix closure loop** (max 2 iterations):
425
+ 1. `$maestro-plan "{phase} --gaps"` — generate fix tasks from gaps
426
+ 2. `$maestro-execute "{phase}"` — execute fix tasks
427
+ 3. `$maestro-verify "{phase}"` — re-verify
428
+
429
+ **Issue lifecycle sync during loop:**
430
+ - Before plan: `registered` → `planning`
431
+ - Before execute: `planning` → `executing`
432
+ - After re-verify: resolved gaps → `completed` (resolution: "auto-fixed via gap-fix loop"), unresolved → `failed`
433
+
434
+ If re-verify passes: update uat.md gaps as resolved, report success.
435
+ If gaps remain after 2 iterations: report remaining, suggest manual intervention.
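The bounded loop and its status transitions can be sketched with the three commands injected as callables. The assumptions here are that `verify` returns the IDs of issues still unresolved, and that an issue marked `failed` after the first pass re-enters `planning` on the second; the text does not say how a retry is labeled.

```python
def run_gap_fix_loop(issues, plan, execute, verify, max_iterations=2):
    """Run plan -> execute -> verify up to max_iterations times.

    issues maps issue_id -> status and is updated in place.
    Returns the set of issue IDs still unresolved when the loop stops.
    """
    open_ids = set(issues)
    for _ in range(max_iterations):
        if not open_ids:
            break
        for issue_id in open_ids:
            issues[issue_id] = "planning"   # before plan
        plan()
        for issue_id in open_ids:
            issues[issue_id] = "executing"  # before execute
        execute()
        still_open = set(verify()) & open_ids
        for issue_id in open_ids - still_open:
            issues[issue_id] = "completed"  # resolved by this iteration
        for issue_id in still_open:
            issues[issue_id] = "failed"     # unresolved after re-verify
        open_ids = still_open
    return open_ids
```

A non-empty return value corresponds to the "gaps remain after 2 iterations" case, where the remaining issues are reported for manual intervention.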
436
+
437
+ ### Step 13: Report
438
+
439
+ ```
440
+ === UAT RESULTS ===
441
+ Target: {target}
442
+
443
+ Smoke Tests: {smoke_count} run, {smoke_pass} passed (if ran)
444
+ UAT Tests: {total} total
445
+ Passed: {passed}
446
+ Issues: {issues} ({blocker_count} blockers, {major_count} major)
447
+ Skipped: {skipped}
448
+
449
+ Diagnosis: {diagnosed_count}/{issues} gaps diagnosed
450
+ Auto-fix: {fixed_count} gaps resolved (if ran)
451
+
452
+ Files:
453
+ {target_dir}/uat.md
454
+ {target_dir}/.tests/test-results.json
455
+ {target_dir}/.tests/coverage-report.json
456
+ {target_dir}/.tests/.csv-session/diagnosis-results.csv (if diagnosed)
457
+ ```
458
+
459
+ **Next-step routing:**
460
+
461
+ | Result | Next Step |
462
+ |--------|-----------|
463
+ | All passed, no gaps | `$maestro-milestone-audit` |
464
+ | Auto-fix ran and succeeded | `$maestro-verify "{phase}"` |
465
+ | Auto-fix ran but gaps remain | `$quality-debug "--from-uat {phase}"` |
466
+ | Issues found, manual fix needed | `$quality-debug "--from-uat {phase}"` |
467
+ | Coverage below threshold | `$quality-auto-test "{phase}"` |
468
+ | Need integration tests | `$quality-auto-test "{phase}"` |
469
+
470
+ </execution>
471
+
472
+ <error_codes>
473
+ | Code | Severity | Condition | Recovery |
474
+ |------|----------|-----------|----------|
475
+ | E001 | error | Phase or task target required (no active sessions) | Prompt user for phase number |
476
+ | E002 | error | Phase not verified (no verification.json) | Suggest `$maestro-verify` |
477
+ | E003 | error | Smoke test failed (app won't start) | Suggest `$quality-debug` |
478
+ | W001 | warning | Test scenarios failed | Auto-diagnose, suggest fix options |
479
+ | W002 | warning | Coverage below threshold | Suggest `$quality-auto-test` |
480
+ </error_codes>
481
+
482
+ <success_criteria>
483
+ - [ ] Target resolved and verification context loaded
484
+ - [ ] Quality artifacts loaded (review findings → extra tests, debug root causes → regression tests)
485
+ - [ ] Test scenarios designed from user-observable outcomes
486
+ - [ ] UAT file created with session persistence
487
+ - [ ] Tests presented one at a time, severity inferred (never asked)
488
+ - [ ] Issues auto-created in issues.jsonl for all failures
489
+ - [ ] Batched writes: on issue, every 5 passes, or completion
490
+ - [ ] test-results.json and coverage-report.json written
491
+ - [ ] index.json uat fields updated
492
+ - [ ] Artifact registered in state.json
493
+ - [ ] If issues: diagnosis.csv built, spawn_agents_on_csv executed per gap cluster
494
+ - [ ] Gaps updated with root_cause, fix_direction, affected_files
495
+ - [ ] Gap-fix loop triggered if --auto-fix (max 2 iterations)
496
+ - [ ] Issue lifecycle synced through gap-fix loop
497
+ - [ ] Next step routed based on result
498
+ </success_criteria>