@exodus/xqa 3.0.1 → 5.0.0

package/README.md CHANGED
@@ -47,7 +47,7 @@ xqa explore --udid ABCD1234 # target a specific booted simulator
47
47
  Flags:
48
48
 
49
49
  - `-v, --verbose [categories]` — Log categories (prompt, tools, screen, memory). Default: all if flag is present without value.
50
- - `-t, --timeout <seconds>` — Explorer timeout in seconds (overrides `QA_EXPLORE_TIMEOUT_SECONDS`).
50
+ - `-t, --timeout <seconds>` — Explorer timeout in seconds (overrides `agents.explorer.timeoutSeconds` in `.xqa/config.yaml`).
51
51
  - `--debug` — Log timing and event details to stderr.
52
52
  - `--udid <id>` — Target simulator UDID. Overrides auto-detect of first booted; exits with code 2 if the UDID is not booted.
53
53
 
@@ -104,11 +104,40 @@ Flags:
104
104
  - `--debug` — Log timing and event details to stderr.
105
105
  - `--udid <id>` — Target simulator UDID. When supplied, the suite is constrained to that one simulator; exits with code 2 if the UDID is not booted.
106
106
 
107
+ ### plan
108
+
109
+ Generate or evolve the manual test plan for the current branch.
110
+
111
+ Inspects the git diff between the current branch and its upstream, asks the planner agent to emit Markdown scenario specs, and writes them to `.xqa/test-plan/default/` (or a custom directory). Subcommands let you refine individual scenarios, append new scenarios after fresh commits, and correlate findings from a run against the plan.
112
+
113
+ ```bash
114
+ xqa plan # generate scenarios from current diff
115
+ xqa plan --intent "login changes" --out .xqa/test-plan/my-slug
116
+ xqa plan edit .xqa/test-plan/my-slug/scenario-1.test.md --feedback "rename step 2"
117
+ xqa plan extend # append scenarios for fresh commits
118
+ xqa plan report --findings .xqa/output/.../findings.json --specs .xqa/test-plan/my-slug
119
+ ```
120
+
121
+ Flags:
122
+
123
+ - `--intent <text>` — Optional focus hint passed to the planner.
124
+ - `--out <dir>` — Output directory for the generated scenarios (default: `<xqa>/test-plan/default`).
125
+
126
+ Subcommands:
127
+
128
+ - `xqa plan edit <file> --feedback <text>` — apply user-requested edits to an existing scenario spec.
129
+ - `xqa plan extend [--intent <text>] [--out <dir>]` — append new scenarios for commits since the last plan was generated.
130
+ - `xqa plan report --findings <path> [--specs <dir>]` — correlate findings with scenarios and write `report.json` next to the plan.
131
+
132
+ What does it do?
133
+
134
+ `xqa plan` reads the branch diff, summarizes it, and feeds the context to the planner agent, which emits one Markdown scenario spec per suggested flow. The specs are written to the plan directory so you can review them or hand them to `xqa run`. After running the scenarios, `xqa plan report` correlates the resulting findings back to each scenario so you can see which flows surfaced issues and which had no correlated findings. `xqa plan edit` lets you nudge a single scenario with natural-language feedback; `xqa plan extend` picks up commits added after the initial generation and appends new scenarios without touching the existing ones.
135
+
107
136
  ### review [findings-path]
108
137
 
109
138
  Review findings and mark false positives.
110
139
 
111
- Interactive session for triaging findings generated by explore or spec runs. Mark findings as dismissed (with optional reason) or undo previous dismissals. Dismissals are written to `dismissals.json` next to the `.xqa` directory (override with `QA_DISMISSALS_PATH`). Defaults to the last findings path if omitted.
140
+ Interactive session for triaging findings generated by explore or spec runs. Mark findings as dismissed (with optional reason) or undo previous dismissals. Dismissals are written to `dismissals.json` next to the `.xqa` directory (override with `run.dismissalsPath` in `.xqa/config.yaml`). Defaults to the last findings path if omitted.
112
141
 
113
142
  ```bash
114
143
  xqa review # use last findings file
@@ -178,14 +207,111 @@ Contract:
178
207
 
179
208
  ## Configuration
180
209
 
181
- Configuration is loaded from environment variables and `.env.local`:
210
+ Configuration splits in two: non-sensitive runtime settings in `.xqa/config.yaml`, secrets in the environment.
211
+
212
+ ### `.xqa/config.yaml`
213
+
214
+ `xqa init` writes this file with sensible defaults. It's the canonical home for agent toggles and tunables:
215
+
216
+ ```yaml
217
+ version: 1
218
+
219
+ run:
220
+   # id: my-run
221
+   # dismissalsPath: .xqa/dismissals.json
222
+
223
+ suites:
224
+   directory: .xqa/suites
225
+
226
+ agents:
227
+   explorer:
228
+     enabled: true
229
+     timeoutSeconds: 600
230
+     buildEnv: dev
231
+     capabilities:
232
+       videoRecording: false
233
+       viewUiServer: true
234
+       findingScreenshots: true
235
+
236
+   analyser:
237
+     enabled: true
238
+
239
+   inspector:
240
+     enabled: true
241
+     designsDirectory: .xqa/designs
242
+
243
+   consolidator:
244
+     enabled: true
245
+
246
+   triager:
247
+     enabled: false
248
+ ```
249
+
250
+ Field reference:
251
+
252
+ | Field | Default | Description |
253
+ | ------------------------------------------------- | ---------------------- | --------------------------------------------------------------------- |
254
+ | `version` | `1` | Config schema version. |
255
+ | `run.id` | _(auto)_ | Fixed run ID. Omit for sequential per-run IDs. |
256
+ | `run.dismissalsPath` | `.xqa/dismissals.json` | Where `xqa review` persists dismissals. |
257
+ | `suites.directory` | `.xqa/suites` | Directory containing `*.suite.json` files. |
258
+ | `agents.explorer.enabled` | `true` | Runs the explorer agent. |
259
+ | `agents.explorer.timeoutSeconds` | `600` | Wall-clock limit per explore/spec run. |
260
+ | `agents.explorer.buildEnv` | `dev` | `dev` or `prod`. `dev` ignores debug overlays as findings. |
261
+ | `agents.explorer.capabilities.videoRecording` | `false` | Records the simulator screen to MP4. |
262
+ | `agents.explorer.capabilities.viewUiServer` | `true` | Registers the `view_ui` MCP tool for reading the UI tree. |
263
+ | `agents.explorer.capabilities.findingScreenshots` | `true` | Writes per-finding PNGs. |
264
+ | `agents.analyser.enabled` | `true` | Runs the Gemini video analyser. Needs `GOOGLE_GENERATIVE_AI_API_KEY`. |
265
+ | `agents.inspector.enabled` | `true` | Runs the visual diff inspector. |
266
+ | `agents.inspector.designsDirectory` | `.xqa/designs` | Reference artboards for visual diffing. |
267
+ | `agents.consolidator.enabled` | `true` | Merges and deduplicates findings from every agent. |
268
+ | `agents.triager.enabled` | `false` | Runs the PR suite matcher. Needs `GITHUB_TOKEN`. |
269
+
270
+ ### Capabilities
271
+
272
+ Each agent has a `capabilities` block of opt-in feature flags. Enabling a capability doesn't enable the agent — both `enabled: true` and `capabilities.<name>: true` are required.
273
+
274
+ The explorer's `videoRecording` capability used to turn on automatically whenever `GOOGLE_GENERATIVE_AI_API_KEY` was set. It's now independent: you can record video without running the analyser, and vice versa.
275
+
276
+ ### Environment variables
277
+
278
+ Secrets stay in `.env.local` (loaded by dotenv) or your shell. Lock the file down:
279
+
280
+ ```bash
281
+ chmod 600 .env.local
282
+ ```
182
283
 
183
284
  - `ANTHROPIC_API_KEY` (required) — Anthropic Claude API key for agent reasoning
184
- - `GOOGLE_GENERATIVE_AI_API_KEY` (optional) — Google Generative AI key for video analysis
185
- - `QA_RUN_ID` (optional) — Custom run identifier; defaults to auto-generated
186
- - `QA_EXPLORE_TIMEOUT_SECONDS` (optional) — Exploration timeout in seconds
187
- - `QA_BUILD_ENV` (optional) — Build environment: `dev` or `prod` (default: prod)
188
- - `QA_DISMISSALS_PATH` (optional) — Override the dismissals file path used by `xqa review`
285
+ - `GOOGLE_GENERATIVE_AI_API_KEY` (optional) — Gemini key for the analyser and `xqa analyse`
286
+ - `GITHUB_TOKEN` (optional) — required for `xqa triage`
287
+
288
+ ### Example: explorer only, with video recording
289
+
290
+ No Gemini key? Record video without the analyser:
291
+
292
+ ```yaml
293
+ agents:
294
+   explorer:
295
+     enabled: true
296
+     capabilities:
297
+       videoRecording: true
298
+   analyser:
299
+     enabled: false
300
+ ```
301
+
302
+ `videoRecording` is decoupled from the analyser, so this works without `GOOGLE_GENERATIVE_AI_API_KEY`.
303
+
304
+ ### Migration from legacy env vars
305
+
306
+ Legacy `QA_*` and `XQA_*` environment variables are rejected at startup with a `LEGACY_ENV_DETECTED` error. Move their values into `.xqa/config.yaml`:
307
+
308
+ | Legacy env var | New config path |
309
+ | ---------------------------- | -------------------------------- |
310
+ | `QA_RUN_ID` | `run.id` |
311
+ | `QA_EXPLORE_TIMEOUT_SECONDS` | `agents.explorer.timeoutSeconds` |
312
+ | `QA_BUILD_ENV` | `agents.explorer.buildEnv` |
313
+ | `QA_DISMISSALS_PATH` | `run.dismissalsPath` |
314
+ | `XQA_SUITES_DIR` | `suites.directory` |
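
Before upgrading, it may help to check which legacy variables are still exported in your shell. The sketch below is only an illustration: `list_legacy_env` is a hypothetical helper, not part of `xqa`, and it assumes that any variable whose name starts with `QA_` or `XQA_` counts as legacy (the table above names only five).

```bash
# Hypothetical pre-flight check, not an xqa command.
# Prints the names of exported variables that start with QA_ or XQA_.
# Assumption: every variable with one of these prefixes is treated as legacy.
list_legacy_env() {
  env | grep -E '^X?QA_[A-Za-z0-9_]*=' | cut -d= -f1
}
```

Running `list_legacy_env` in the shell that launches `xqa` lists the names to move into `.xqa/config.yaml`; empty output means nothing was found.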
189
315
 
190
316
  ## Architecture
191
317
 
@@ -0,0 +1,211 @@
1
+ ---
2
+ name: xqa-test-plan
3
+ description: Generate a manual test plan from the current branch diff, approve it, run it via xqa, and see findings inline. Triggers on /xqa-test-plan or implied intent ("what should I test before pushing?", "generate a test plan for my changes", "QA my branch").
4
+ license: MIT
5
+ ---
6
+
7
+ # xqa-test-plan
8
+
9
+ ## When to use
10
+
11
+ - User runs `/xqa-test-plan`
12
+ - User says "what should I test", "generate a test plan", "QA my branch", "test my changes before pushing", "what should I QA?"
13
+ - Self-activate on implied intent when the user is asking for pre-push manual verification on the current branch
14
+
15
+ ## Process
16
+
17
+ ```
18
+ Detect state → Generate → Approve → Run → Report → Re-run / Regenerate / Extend
19
+ ```
20
+
21
+ IMPORTANT: The skill orchestrates the CLI; it never writes `.test.md` files directly. Every spec mutation goes through `xqa plan` or `xqa plan edit`.
22
+
23
+ ## Detect state
24
+
25
+ Resolve the current branch and its plan directory before anything else.
26
+
27
+ ```bash
28
+ git symbolic-ref --short HEAD 2>/dev/null || git rev-parse --short HEAD
29
+ ```
30
+
31
+ - Named branch → use the name as the slug source.
32
+ - Detached HEAD → slug becomes `sha-<first-7-chars>`.
33
+
34
+ Slug rules (mirrors the planner's `branchToSlug`):
35
+
36
+ | Input fragment | Slug output |
37
+ | ------------------- | ------------- |
38
+ | `a-z`, `A-Z`, `0-9` | preserved |
39
+ | `.`, `_`, `-` | preserved |
40
+ | any other char | `-` |
41
+ | consecutive `-` | collapsed `-` |
42
+
43
+ Plan directory: `.xqa/test-plan/<slug>/`.
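
The slug rules in the table can be sketched as a small shell function. This is an illustration only: `branch_to_slug` is a hypothetical name, not the planner's actual `branchToSlug` implementation, and it covers just the character rules listed above. The `sha-` prefix for a detached HEAD is left to the caller.

```bash
# Hypothetical helper mirroring the slug table; not the planner's real code.
branch_to_slug() {
  # LC_ALL=C keeps the bracket ranges limited to ASCII letters and digits.
  # First expression: replace every character outside a-z, A-Z, 0-9, '.', '_', '-'.
  # Second expression: collapse runs of '-' into a single '-'.
  printf '%s' "$1" | LC_ALL=C sed -e 's/[^A-Za-z0-9._-]/-/g' -e 's/--*/-/g'
}

branch_to_slug 'feature/login flow'   # prints feature-login-flow
```

Because distinct branch names can map to the same slug, the mapping is not reversible, which is why the pruning step below treats decoded names as approximations.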
44
+
45
+ ### Auto-prune stale siblings
46
+
47
+ List every child of `.xqa/test-plan/`. For each sibling directory:
48
+
49
+ 1. Decode the slug back to an approximate branch name. WARNING: slug decoding is ambiguous (dashes could have been slashes or other chars). Treat the approximation as best-effort.
50
+ 2. Probe the local branch list:
51
+ ```bash
52
+ git branch --list '<approximate-name>'
53
+ ```
54
+ 3. If the probe is empty, the directory is stale — `rm -rf` it.
55
+
56
+ IMPORTANT: Never prune the current branch's slug directory, even if the approximate-name probe looks empty.
57
+ IMPORTANT: Auto-prune only removes directories whose approximation has NO matching local branch. When in doubt, leave it.
58
+
59
+ ### Existing plan detection
60
+
61
+ Check `.xqa/test-plan/<slug>/` for `*.test.md` files.
62
+
63
+ | State | Next action |
64
+ | --------------------- | ---------------------------------------------------------- |
65
+ | No specs | Proceed to Generate flow |
66
+ | Specs already present | Offer three choices: **Rerun**, **Regenerate**, **Extend** |
67
+
68
+ ## Generate flow
69
+
70
+ ### 1. Summarize intent
71
+
72
+ Scan the last ~5 turns of chat history. Compress the user's stated goal into a single sentence suitable for `--intent`. If no intent emerges, pass an empty string.
73
+
74
+ ### 2. Detect booted simulators
75
+
76
+ ```bash
77
+ xcrun simctl list devices booted --json
78
+ ```
79
+
80
+ Count booted devices:
81
+
82
+ | Count | Behavior |
83
+ | ----- | -------------------------------------------------------------------------- |
84
+ | 0 | STOP — tell user to boot a simulator (`xcrun simctl boot <udid>`) and wait |
85
+ | 1 | Auto-select its UDID |
86
+ | >1 | Ask user to pick one by name + UDID |
87
+
88
+ Remember the chosen UDID for the Run step.
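
The count-to-behavior table could be sketched as follows. `simulator_action` is a hypothetical helper, and the counting approach rests on an assumption: each booted device object in the `simctl` JSON carries exactly one `"udid"` key, so counting those keys gives the number of booted devices.

```bash
# Hypothetical helper: reads `simctl ... --json` output on stdin and prints
# which row of the table applies (stop / auto-select / ask).
# Assumption: one "udid" key per booted device in the JSON.
simulator_action() {
  count=$(grep -o '"udid"' | wc -l | tr -d '[:space:]')
  case "$count" in
    0) echo "stop" ;;
    1) echo "auto-select" ;;
    *) echo "ask" ;;
  esac
}

# Usage (macOS with the Xcode command-line tools installed):
#   xcrun simctl list devices booted --json | simulator_action
```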
89
+
90
+ ### 3. Invoke planner
91
+
92
+ ```bash
93
+ xqa plan --intent "<one-sentence summary>" --out .xqa/test-plan/<slug>
94
+ ```
95
+
96
+ ### 4. Render approval checklist
97
+
98
+ - Parse planner stdout as JSON; extract the `specs[]` array.
99
+ - Read each `<path>.test.md` that the planner wrote.
100
+ - Extract the step intents (first line of each numbered step, before `→` or `[hint:`).
101
+ - Render a numbered checklist, one line per scenario, showing scenario title and 1-3 key steps.
102
+
103
+ Ask: "Does this cover what you want to QA? Reply **approve** / **run it** / **looks good** to proceed, or describe edits."
104
+
105
+ ## Approval loop
106
+
107
+ | User reply | Skill action |
108
+ | ------------------------------------------ | --------------------------------------------------------- |
109
+ | "approve" / "run it" / "looks good" / "go" | Exit loop → proceed to Run |
110
+ | Edit request targeting one scenario | `xqa plan edit <path> --feedback "<verbatim user words>"` |
111
+ | Edit request spanning multiple scenarios | Issue one `xqa plan edit` call per affected file |
112
+ | Ambiguous request | Ask one clarifying question; do not invoke edit |
113
+
114
+ After every edit call, re-read the updated spec and re-render the checklist. Loop until approval.
115
+
116
+ IMPORTANT: Never hand-edit `.test.md` files. Every change flows through `xqa plan edit`.
117
+
118
+ ## Run
119
+
120
+ ### 1. Re-verify simulator
121
+
122
+ Confirm the UDID chosen during Generate is still booted:
123
+
124
+ ```bash
125
+ xcrun simctl list devices booted --json
126
+ ```
127
+
128
+ If it is no longer booted, go back to the simulator selection step.
129
+
130
+ ### 2. Preflight screenshot
131
+
132
+ ```bash
133
+ xcrun simctl io <udid> screenshot /tmp/preflight.png
134
+ ```
135
+
136
+ | Outcome | Next |
137
+ | ------- | --------------------------------------------------------------- |
138
+ | Success | Proceed to Run dispatch |
139
+ | Error | Pick a different booted simulator or abort with a clear message |
140
+
141
+ ### 3. Dispatch
142
+
143
+ ```bash
144
+ xqa run --spec '.xqa/test-plan/<slug>/*.test.md' --udid <udid>
145
+ ```
146
+
147
+ Capture the findings path from the `Run complete. Findings: <path>` line in stdout.
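
One way to capture that path is to filter the command's stdout. The helper name `findings_path_from_output` is hypothetical, and the sketch assumes the `Run complete. Findings: ` text starts at the beginning of its line with nothing after the path.

```bash
# Hypothetical helper: reads `xqa run` stdout on stdin and prints the path
# from the last "Run complete. Findings: <path>" line, if any.
# Assumption: the marker text starts at column 0 and the path ends the line.
findings_path_from_output() {
  sed -n 's/^Run complete\. Findings: //p' | tail -n 1
}
```

Piping the dispatch command's output through `findings_path_from_output` yields the value to use as `<findings-path>` in the next step; empty output means the line was not found and the run should be treated as failed.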
148
+
149
+ ## Move artifacts
150
+
151
+ `xqa run` writes findings + screenshots into a timestamped output directory under its own roots. Relocate them alongside the plan:
152
+
153
+ ```bash
154
+ mkdir -p .xqa/test-plan/<slug>/runs/<iso-timestamp> && mv <findings-path> .xqa/test-plan/<slug>/runs/<iso-timestamp>/findings.json
155
+ mv "$(dirname <findings-path>)/shots" .xqa/test-plan/<slug>/runs/<iso-timestamp>/shots
156
+ ```
157
+
158
+ Name the directory with a UTC timestamp in ISO-8601 form, with the colons replaced by hyphens so it is filesystem-safe (e.g. `2026-04-22T15-30-00Z`). Use `mv`, not `cp`: the plan directory owns the canonical copy.
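
A timestamp in the same shape as the example can be produced with `date`. The function name `run_timestamp` is hypothetical and only illustrates the format.

```bash
# Hypothetical helper: prints a UTC timestamp shaped like 2026-04-22T15-30-00Z
# (hyphens instead of colons in the time part, literal Z suffix).
run_timestamp() {
  date -u +%Y-%m-%dT%H-%M-%SZ
}
```

Calling it once and storing the result in a variable keeps the `findings.json` and `shots` moves pointed at the same `runs/` subdirectory.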
159
+
160
+ WARNING: After moving, all downstream steps reference the new paths under `.xqa/test-plan/<slug>/runs/`.
161
+
162
+ ## Report
163
+
164
+ ```bash
165
+ xqa plan report \
166
+ --findings .xqa/test-plan/<slug>/runs/<iso-timestamp>/findings.json \
167
+ --specs .xqa/test-plan/<slug>
168
+ ```
169
+
170
+ Parse stdout JSON as `CorrelatedReport`. Render grouped output:
171
+
172
+ | Group | Rendering |
173
+ | ------------------------ | ------------------------------------------------------------------------------------------ |
174
+ | `has_findings` scenarios | Strikethrough title + fail marker. Inline each finding description. Link screenshot paths. |
175
+ | `not_run` scenarios | Muted/grey. Prefix with "(no findings referenced)" — do not mark as passed. |
176
+ | Unmatched findings | Separate "Findings without a scenario" section at the end. |
177
+
178
+ IMPORTANT: `not_run` does NOT mean "passed". It means the agent did not emit a finding that correlated to this scenario. We literally don't know the outcome. Do not print a green check.
179
+
180
+ ## Rerun / Regenerate / Extend
181
+
182
+ | Mode | Action |
183
+ | ---------- | ----------------------------------------------------------------------------------------------- |
184
+ | Rerun | Reuse existing `.test.md` files. Go straight to Run → Move artifacts → Report. |
185
+ | Regenerate | `rm .xqa/test-plan/<slug>/*.test.md` (specs only — preserve `runs/`). Re-run Generate flow. |
186
+ | Extend | `xqa plan extend --out .xqa/test-plan/<slug>`. Planner appends new scenarios for fresh commits. |
187
+
188
+ IMPORTANT: Regenerate never deletes `runs/`. Run history is preserved across regenerations.
189
+
190
+ ## Guardrails
191
+
192
+ - IMPORTANT: The skill NEVER writes `.test.md` files directly. Only `xqa plan` and `xqa plan edit` produce or modify them. `scenarioId` and meta injection live in the planner — hand-authoring breaks correlation.
193
+ - IMPORTANT: The skill NEVER computes finding correlation. Only `xqa plan report` does. Do not write ad-hoc string matchers.
194
+ - WARNING: The skill NEVER modifies the slug directory of another branch that still exists locally. Auto-prune only removes directories whose local branch is gone AND which are not the current branch.
195
+ - WARNING: The skill NEVER edits artifacts under `.xqa/test-plan/<slug>/runs/`. Those are immutable run records.
196
+ - IMPORTANT: The skill NEVER claims `not_run` scenarios passed. Report them as unknown, not green.
197
+
198
+ ## Manual test plan
199
+
200
+ - [ ] Skill activates on `/xqa-test-plan`
201
+ - [ ] Skill activates on implied intent ("what should I QA?")
202
+ - [ ] Detect state correctly computes slug for current branch
203
+ - [ ] Detect state auto-prunes stale sibling dirs but never current
204
+ - [ ] Generate flow handles 0 booted simulators (error), 1 (auto), >1 (prompt)
205
+ - [ ] Generate flow passes `--intent` correctly
206
+ - [ ] Approval loop calls `xqa plan edit` per file
207
+ - [ ] Run flow preflights simulator before dispatching
208
+ - [ ] Report flow renders two-state status (has_findings / not_run) correctly
209
+ - [ ] Rerun doesn't regenerate specs
210
+ - [ ] Regenerate wipes specs before re-invoking xqa plan
211
+ - [ ] Extend appends scenario-N+1 without re-writing existing scenarios