ace-test-runner-e2e 0.29.0

Files changed (76)
  1. checksums.yaml +7 -0
  2. data/.ace-defaults/e2e-runner/config.yml +70 -0
  3. data/.ace-defaults/nav/protocols/guide-sources/ace-test-runner-e2e.yml +11 -0
  4. data/.ace-defaults/nav/protocols/skill-sources/ace-test-runner-e2e.yml +19 -0
  5. data/.ace-defaults/nav/protocols/tmpl-sources/ace-test-runner-e2e.yml +12 -0
  6. data/.ace-defaults/nav/protocols/wfi-sources/ace-test-runner-e2e.yml +11 -0
  7. data/CHANGELOG.md +1166 -0
  8. data/LICENSE +21 -0
  9. data/README.md +42 -0
  10. data/Rakefile +15 -0
  11. data/exe/ace-test-e2e +15 -0
  12. data/exe/ace-test-e2e-sh +67 -0
  13. data/exe/ace-test-e2e-suite +13 -0
  14. data/handbook/guides/e2e-testing.g.md +124 -0
  15. data/handbook/guides/scenario-yml-reference.g.md +182 -0
  16. data/handbook/guides/tc-authoring.g.md +131 -0
  17. data/handbook/skills/as-e2e-create/SKILL.md +30 -0
  18. data/handbook/skills/as-e2e-fix/SKILL.md +35 -0
  19. data/handbook/skills/as-e2e-manage/SKILL.md +31 -0
  20. data/handbook/skills/as-e2e-plan-changes/SKILL.md +30 -0
  21. data/handbook/skills/as-e2e-review/SKILL.md +35 -0
  22. data/handbook/skills/as-e2e-rewrite/SKILL.md +31 -0
  23. data/handbook/skills/as-e2e-run/SKILL.md +48 -0
  24. data/handbook/skills/as-e2e-setup-sandbox/SKILL.md +34 -0
  25. data/handbook/templates/ace-taskflow-fixture.template.md +322 -0
  26. data/handbook/templates/agent-experience-report.template.md +89 -0
  27. data/handbook/templates/metadata.template.yml +49 -0
  28. data/handbook/templates/scenario.yml.template.yml +60 -0
  29. data/handbook/templates/tc-file.template.md +45 -0
  30. data/handbook/templates/test-report.template.md +94 -0
  31. data/handbook/workflow-instructions/e2e/analyze-failures.wf.md +126 -0
  32. data/handbook/workflow-instructions/e2e/create.wf.md +395 -0
  33. data/handbook/workflow-instructions/e2e/execute.wf.md +253 -0
  34. data/handbook/workflow-instructions/e2e/fix.wf.md +166 -0
  35. data/handbook/workflow-instructions/e2e/manage.wf.md +179 -0
  36. data/handbook/workflow-instructions/e2e/plan-changes.wf.md +255 -0
  37. data/handbook/workflow-instructions/e2e/review.wf.md +286 -0
  38. data/handbook/workflow-instructions/e2e/rewrite.wf.md +281 -0
  39. data/handbook/workflow-instructions/e2e/run.wf.md +355 -0
  40. data/handbook/workflow-instructions/e2e/setup-sandbox.wf.md +461 -0
  41. data/lib/ace/test/end_to_end_runner/atoms/display_helpers.rb +234 -0
  42. data/lib/ace/test/end_to_end_runner/atoms/prompt_builder.rb +199 -0
  43. data/lib/ace/test/end_to_end_runner/atoms/result_parser.rb +166 -0
  44. data/lib/ace/test/end_to_end_runner/atoms/skill_prompt_builder.rb +166 -0
  45. data/lib/ace/test/end_to_end_runner/atoms/skill_result_parser.rb +244 -0
  46. data/lib/ace/test/end_to_end_runner/atoms/suite_report_prompt_builder.rb +103 -0
  47. data/lib/ace/test/end_to_end_runner/atoms/tc_fidelity_validator.rb +39 -0
  48. data/lib/ace/test/end_to_end_runner/atoms/test_case_parser.rb +108 -0
  49. data/lib/ace/test/end_to_end_runner/cli/commands/run_suite.rb +130 -0
  50. data/lib/ace/test/end_to_end_runner/cli/commands/run_test.rb +156 -0
  51. data/lib/ace/test/end_to_end_runner/models/test_case.rb +47 -0
  52. data/lib/ace/test/end_to_end_runner/models/test_result.rb +115 -0
  53. data/lib/ace/test/end_to_end_runner/models/test_scenario.rb +90 -0
  54. data/lib/ace/test/end_to_end_runner/molecules/affected_detector.rb +92 -0
  55. data/lib/ace/test/end_to_end_runner/molecules/config_loader.rb +75 -0
  56. data/lib/ace/test/end_to_end_runner/molecules/failure_finder.rb +203 -0
  57. data/lib/ace/test/end_to_end_runner/molecules/fixture_copier.rb +35 -0
  58. data/lib/ace/test/end_to_end_runner/molecules/pipeline_executor.rb +121 -0
  59. data/lib/ace/test/end_to_end_runner/molecules/pipeline_prompt_bundler.rb +182 -0
  60. data/lib/ace/test/end_to_end_runner/molecules/pipeline_report_generator.rb +321 -0
  61. data/lib/ace/test/end_to_end_runner/molecules/pipeline_sandbox_builder.rb +131 -0
  62. data/lib/ace/test/end_to_end_runner/molecules/progress_display_manager.rb +172 -0
  63. data/lib/ace/test/end_to_end_runner/molecules/report_writer.rb +259 -0
  64. data/lib/ace/test/end_to_end_runner/molecules/scenario_loader.rb +254 -0
  65. data/lib/ace/test/end_to_end_runner/molecules/setup_executor.rb +181 -0
  66. data/lib/ace/test/end_to_end_runner/molecules/simple_display_manager.rb +72 -0
  67. data/lib/ace/test/end_to_end_runner/molecules/suite_progress_display_manager.rb +223 -0
  68. data/lib/ace/test/end_to_end_runner/molecules/suite_report_writer.rb +277 -0
  69. data/lib/ace/test/end_to_end_runner/molecules/suite_simple_display_manager.rb +116 -0
  70. data/lib/ace/test/end_to_end_runner/molecules/test_discoverer.rb +136 -0
  71. data/lib/ace/test/end_to_end_runner/molecules/test_executor.rb +332 -0
  72. data/lib/ace/test/end_to_end_runner/organisms/suite_orchestrator.rb +830 -0
  73. data/lib/ace/test/end_to_end_runner/organisms/test_orchestrator.rb +442 -0
  74. data/lib/ace/test/end_to_end_runner/version.rb +9 -0
  75. data/lib/ace/test/end_to_end_runner.rb +71 -0
  76. metadata +220 -0
@@ -0,0 +1,355 @@
1
+ ---
2
+ doc-type: workflow
3
+ title: Run E2E Test Workflow
4
+ purpose: Execute an E2E test scenario with full agent guidance
5
+ ace-docs:
6
+ last-updated: 2026-03-12
7
+ last-checked: 2026-03-21
8
+ ---
9
+
10
+ # Run E2E Test Workflow
11
+
12
+ This workflow guides an agent through executing an E2E test scenario. It supports two execution modes: **standard mode** (agent manages sandbox setup and full execution) and **TC-level mode** (sandbox pre-populated by `SetupExecutor`, single TC execution).
13
+
14
+ ## Arguments
15
+
16
+ - `PACKAGE` (optional) - Package containing the test (e.g., `ace-lint`). If omitted, looks for `test/e2e/` in project root.
17
+ - `TEST_ID` (optional) - Test identifier (e.g., `TS-LINT-001`). If omitted, runs all tests.
18
+ - `--run-id RUN_ID` (optional) - Pre-generated timestamp ID for deterministic report paths.
19
+ - `--report-dir PATH` (optional) - Explicit report directory path (skips computed `${TEST_DIR}-reports`).
20
+ - `--tags TAG,...` (optional) - Include only scenarios matching any of the specified tags (OR semantics).
21
+ - `--exclude-tags TAG,...` (optional) - Exclude scenarios matching any of the specified tags (OR semantics).
22
+ - `TEST_CASES` (optional) - Comma-separated TC IDs to execute (e.g., `TC-001,tc-003,002`). Normalized to `TC-NNN` format automatically.
23
+
24
+ **TC ID normalization:** `TC-001` (unchanged), `tc-001` → `TC-001`, `001` → `TC-001`, `1` → `TC-001`, `TC-1` → `TC-001`
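The normalization rule above can be illustrated with a short Ruby sketch. The helper name `normalize_tc_id` is hypothetical and not taken from the package; the sketch only assumes that the five input forms listed above are the accepted ones and that anything else is rejected.

```ruby
# Hypothetical helper (not the package's API): normalize a TC ID to TC-NNN.
# Accepts the documented forms: TC-001, tc-001, 001, 1, TC-1.
def normalize_tc_id(raw)
  match = raw.to_s.strip.match(/\A(?:tc-)?(\d+)\z/i)
  raise ArgumentError, "unrecognized TC ID: #{raw.inspect}" unless match

  # Uppercase prefix, number zero-padded to three digits.
  format("TC-%03d", match[1].to_i)
end

%w[TC-001 tc-001 001 1 TC-1].each do |id|
  puts "#{id} -> #{normalize_tc_id(id)}"
end
```

Every input in the loop maps to `TC-001`, matching the table of examples in the workflow text.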
25
+
26
+ ## Command Context
27
+
28
+ - Load this workflow with `ace-bundle wfi://e2e/run`.
29
+ - Use `ace-test-e2e` or `ace-test-e2e-suite` for CLI execution.
30
+
31
+ ## Canonical Conventions
32
+
33
+ - `ace-test-e2e` runs single-package scenarios; `ace-test-e2e-suite` runs suite-level execution
34
+ - Scenario IDs: `TS-<PACKAGE_SHORT>-<NNN>[-slug]`
35
+ - Standalone TC pairs: `TC-*.runner.md` + `TC-*.verify.md`
36
+ - TC artifacts: `results/tc/{NN}/`
37
+ - Summary counters: `tcs-passed`, `tcs-failed`, `tcs-total`, `failed[].tc`
38
+ - Tag filtering happens at discovery time (before sandbox setup)
39
+
40
+ ## Execution Contract
41
+
42
+ - Runner instructions are execution-only: perform actions and write evidence.
43
+ - Verifier instructions are verification-only: assign verdicts using impact-first checks:
44
+ 1. sandbox/project state impact
45
+ 2. explicit artifacts
46
+ 3. debug captures as fallback
47
+ - Do not place ad-hoc setup logic in TC runner files; sandbox setup belongs to `scenario.yml` and fixtures.
48
+
49
+ ## Execution Environment Guardrail
50
+
51
+ - Do **not** run `ace-test-e2e` / `ace-test-e2e-suite` autonomously in constrained or uncertain environments.
52
+ - Provide exact run commands for the user instead, unless the user has explicitly requested execution and environment fidelity is confirmed.
53
+
54
+ ## Pipeline Context
55
+
56
+ For CLI providers (`ace-test-e2e`), the deterministic 6-phase pipeline handles execution automatically:
57
+
58
+ 1. **Setup** — `SetupExecutor` creates sandbox (git init, mise.toml, .ace symlinks, `results/tc/{NN}/` dirs)
59
+ 2. **Runner prompt** — `SkillPromptBuilder` assembles context from `runner.yml.md` + `TC-*.runner.md`
60
+ 3. **Runner LLM** — Agent executes TC steps in sandbox, produces artifacts
61
+ 4. **Verifier prompt** — `SkillPromptBuilder` assembles context from `verifier.yml.md` + `TC-*.verify.md`
62
+ 5. **Verifier LLM** — Independent agent evaluates artifacts against expectations
63
+ 6. **Report** — `PipelineReportGenerator` produces deterministic summary
64
+
65
+ When this workflow is invoked directly (not via the CLI pipeline), the agent performs the equivalent of pipeline phases 1-6 manually, following the workflow steps below.
66
+
67
+ ---
68
+
69
+ ## Subagent Mode
70
+
71
+ When invoked as a subagent (via a batch orchestrator such as an assignment fan-out workflow):
72
+
73
+ - Each subagent runs in a clean context with no shared state
74
+ - Timestamp IDs ensure unique report paths (no collisions)
75
+ - All reports are written to disk, not returned inline
76
+
77
+ **Return contract:**
78
+
79
+ ```markdown
80
+ - **Test ID**: {test-id}
81
+ - **Status**: pass | fail | partial
82
+ - **Passed**: {count}
83
+ - **Failed**: {count}
84
+ - **Total**: {count}
85
+ - **Report Paths**: {timestamp}-{short-pkg}-{short-id}.*
86
+ - **Issues**: Brief description or "None"
87
+ ```
88
+
89
+ Do NOT return full report contents, detailed TC output, or setup logs.
90
+
91
+ ---
92
+
93
+ ## TC-Level Execution Mode
94
+
95
+ When invoked with `--tc-mode`, the sandbox is pre-populated by `SetupExecutor` and only a single TC is executed. Steps 1-5 of standard mode are skipped.
96
+
97
+ **TC-Level Arguments:**
98
+ - `PACKAGE` (required), `TEST_ID` (required), `TC_ID` (required)
99
+ - `--tc-mode` (required), `--sandbox SANDBOX_PATH` (required)
100
+ - `--run-id RUN_ID` (optional), `--env KEY=VALUE,...` (optional)
101
+
102
+ **TC-Level Steps:**
103
+ 1. Verify `SANDBOX_PATH` exists
104
+ 2. `cd SANDBOX_PATH`
105
+ 3. Export `--env` variables if provided
106
+ 4. Execute TC steps from the runner file
107
+ 5. Write per-TC reports to `{RUN_ID}-{pkg}-{scenario}-{tc}-reports/`
108
+ 6. Return TC-level contract
109
+
110
+ **TC-Level Rules:**
111
+ - Do NOT create or modify sandbox — `SetupExecutor` already prepared it
112
+ - Always export `--env` variables before executing test steps
113
+ - Report actual results even if they differ from expected
114
+
115
+ ---
116
+
117
+ ## Workflow Steps (Standard Mode)
118
+
119
+ ### 1. Locate Test Scenarios
120
+
121
+ Discover scenarios based on arguments:
122
+
123
+ ```bash
124
+ # No arguments — project root
125
+ find test/e2e -name "scenario.yml" -path "*/TS-*" 2>/dev/null | sort
126
+
127
+ # PACKAGE only — all tests in package
128
+ find {PACKAGE}/test/e2e -name "scenario.yml" -path "*/TS-*" 2>/dev/null | sort
129
+
130
+ # PACKAGE + TEST_ID — specific test
131
+ find {PACKAGE}/test/e2e -path "*{TEST_ID}*/scenario.yml" 2>/dev/null | head -1
132
+ ```
133
+
134
+ If `--tags` or `--exclude-tags` provided, filter discovered scenarios by reading each `scenario.yml` and checking the `tags` array. Tags use OR semantics: a scenario matches `--tags` if it has **any** listed tag, and is excluded by `--exclude-tags` if it has **any** listed tag.
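The OR semantics can be sketched as a small Ruby predicate. The name `scenario_selected?` and the sample scenario data are hypothetical. The sketch also makes two assumptions the text does not state: an empty `--tags` list means no include filter, and exclusion wins when a scenario matches both lists.

```ruby
# Hypothetical predicate (not the package's API) for tag filtering.
def scenario_selected?(scenario_tags, include_tags: [], exclude_tags: [])
  tags = Array(scenario_tags).map(&:to_s)

  # Excluded if the scenario has ANY of the excluded tags.
  return false if (tags & exclude_tags).any?

  # Assumption: an empty include list selects everything not excluded.
  include_tags.empty? || (tags & include_tags).any?
end

# Illustrative scenario IDs and tags.
scenarios = {
  "TS-LINT-001" => %w[smoke],
  "TS-LINT-002" => %w[smoke deep],
  "TS-LINT-003" => []
}

selected = scenarios.select do |_id, tags|
  scenario_selected?(tags, include_tags: %w[smoke], exclude_tags: %w[deep])
end
puts selected.keys.inspect
```

With `--tags smoke --exclude-tags deep`, only `TS-LINT-001` survives in this sample: `TS-LINT-002` is excluded by `deep`, and `TS-LINT-003` has no matching tag.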
135
+
136
+ If no tests are found after filtering, report an error and exit.
137
+
138
+ ### 2. Read Test Scenario
139
+
140
+ For each scenario file, read and parse:
141
+ - `test-id`, `title`, `priority`, `duration`, `requires`, `tags`
142
+
143
+ **Multiple tests:** Execute steps 2-7 for each scenario sequentially, then generate a combined summary.
144
+
145
+ ### 2.5. Parse and Filter Test Cases
146
+
147
+ If `TEST_CASES` argument was provided:
148
+
149
+ 1. Split comma-separated list
150
+ 2. Normalize each to `TC-NNN` format (uppercase, zero-padded to 3 digits)
151
+ 3. Deduplicate
152
+ 4. Validate each exists as a `TC-*.runner.md` file in the scenario directory
153
+
154
+ When `TEST_CASES` is not provided, all TCs execute (default).
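The four steps above (split, normalize, deduplicate, validate) might look like the following Ruby sketch. `resolve_test_cases` is a hypothetical name, and the file check assumes runner files are named either `TC-NNN.runner.md` or `TC-NNN-<slug>.runner.md`, which is an inference from the naming patterns in this workflow rather than a confirmed rule.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not the package's API) for the TEST_CASES argument.
def resolve_test_cases(raw_list, scenario_dir)
  # Steps 1-2: split the comma-separated list and normalize each entry.
  normalized = raw_list.split(",").map(&:strip).reject(&:empty?).map do |raw|
    match = raw.match(/\A(?:tc-)?(\d+)\z/i)
    raise ArgumentError, "unrecognized TC ID: #{raw.inspect}" unless match

    format("TC-%03d", match[1].to_i)
  end

  # Step 3: deduplicate while keeping first-seen order.
  ids = normalized.uniq

  # Step 4: each ID must have a runner file in the scenario directory.
  missing = ids.reject do |id|
    File.exist?(File.join(scenario_dir, "#{id}.runner.md")) ||
      Dir.glob(File.join(scenario_dir, "#{id}-*.runner.md")).any?
  end
  raise ArgumentError, "no runner file for: #{missing.join(', ')}" unless missing.empty?

  ids
end

Dir.mktmpdir do |dir|
  # Illustrative runner files in a throwaway directory.
  FileUtils.touch(File.join(dir, "TC-001-basic.runner.md"))
  FileUtils.touch(File.join(dir, "TC-003-edge.runner.md"))
  puts resolve_test_cases("TC-001,tc-003,1", dir).inspect
end
```

Raising on an unknown ID mirrors the "TC filter mismatch" row in the error handling table, where the run stops instead of silently executing a different set of TCs.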
155
+
156
+ ### 3. Verify Prerequisites
157
+
158
+ 1. Check each tool in `requires.tools` is available
159
+ 2. Verify `ruby --version` meets requirement
160
+ 3. Ensure package is installed
161
+
162
+ Report missing prerequisites before proceeding.
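One way to check tool availability is to scan `PATH` for an executable with the tool's name. The Ruby sketch below does that; `missing_tools` is a hypothetical helper and the `required` list is a stand-in for whatever `requires.tools` contains in a given scenario. It does not cover the Ruby version or package installation checks.

```ruby
# Hypothetical helper (not the package's API): tools that are not on PATH.
def missing_tools(tools, path: ENV.fetch("PATH", ""))
  dirs = path.split(File::PATH_SEPARATOR).reject(&:empty?)

  tools.reject do |tool|
    dirs.any? do |dir|
      candidate = File.join(dir, tool)
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

required = %w[ruby git] # stand-in for the scenario's requires.tools list
missing = missing_tools(required)

if missing.empty?
  puts "All prerequisites available"
else
  puts "Missing prerequisites: #{missing.join(', ')}"
end
```

Collecting every missing tool before reporting lets the user fix all gaps in one pass instead of discovering them one at a time.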
163
+
164
+ ### 4. Execute Environment Setup
165
+
166
+ > **CRITICAL: SANDBOX REQUIRED**
167
+ > All E2E tests MUST run in an isolated sandbox under `.ace-local/test-e2e/`.
168
+ > NEVER execute test commands in the main repository.
169
+
170
+ **Reference:** `wfi://e2e/setup-sandbox` for the authoritative sandbox setup pattern.
171
+
172
+ **Pre-generated Run ID:** If `--run-id` was provided, set `TIMESTAMP_ID=$RUN_ID` instead of generating a new one.
173
+
174
+ **Directory naming convention:**
175
+ - `{timestamp}` — 6-char base36 timestamp
176
+ - `{short-pkg}` — package without `ace-` prefix (e.g., `lint`)
177
+ - `{short-id}` — lowercase prefix + number (e.g., `ts001`)
178
+
179
+ ```
180
+ .ace-local/test-e2e/
181
+ ├── 8osvnh-lint-ts001/ # Sandbox
182
+ ├── 8osvnh-lint-ts001-reports/ # Reports (summary.r.md, experience.r.md, metadata.yml)
183
+ └── 8osynv-final-report.md # Suite report (sibling)
184
+ ```
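The naming convention can be sketched in Ruby as follows. `sandbox_paths` is a hypothetical helper; it takes the timestamp as an input because the text does not say how the 6-char base36 value is generated, and it assumes scenario IDs follow the `TS-<PACKAGE_SHORT>-<NNN>[-slug]` pattern from the conventions section.

```ruby
# Hypothetical helper (not the package's API): derive sandbox/report paths.
def sandbox_paths(timestamp, package, test_id, base: ".ace-local/test-e2e")
  # {short-pkg}: package name without the "ace-" prefix.
  short_pkg = package.sub(/\Aace-/, "")

  # {short-id}: lowercase ID prefix plus the number, e.g. TS-LINT-001 -> ts001.
  match = test_id.match(/\A([A-Za-z]+)-[A-Za-z0-9]+-(\d+)/)
  raise ArgumentError, "unexpected scenario ID: #{test_id.inspect}" unless match

  short_id = "#{match[1].downcase}#{match[2]}"
  sandbox = File.join(base, "#{timestamp}-#{short_pkg}-#{short_id}")
  { sandbox: sandbox, reports: "#{sandbox}-reports" }
end

paths = sandbox_paths("8osvnh", "ace-lint", "TS-LINT-001")
puts paths[:sandbox]
puts paths[:reports]
```

For the inputs shown, the two paths correspond to the `8osvnh-lint-ts001/` and `8osvnh-lint-ts001-reports/` entries in the directory tree above.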
185
+
186
+ **Expected variables after setup:**
187
+ - `PROJECT_ROOT` — Original project directory
188
+ - `TEST_DIR` — Sandbox directory (cwd after setup)
189
+ - `REPORTS_DIR` — Reports directory
190
+ - `TIMESTAMP_ID` — Unique run identifier
191
+
192
+ ### 4.1 Sandbox Isolation Checkpoint (MANDATORY)
193
+
194
+ Before proceeding, verify sandbox isolation:
195
+
196
+ ```bash
197
+ echo "=== SANDBOX ISOLATION CHECK ==="
198
+ CURRENT_DIR="$(pwd)"
199
+ [[ "$CURRENT_DIR" == *".ace-local/test-e2e/"* ]] && echo "PASS: In sandbox" || echo "FAIL: NOT in sandbox"
200
+ git rev-parse --git-dir >/dev/null 2>&1 && { [ -z "$(git remote -v 2>/dev/null)" ] && echo "PASS: No remotes" || echo "FAIL: Remotes found"; } || echo "PASS: No git"
201
+ [ -f "CLAUDE.md" ] || [ -f "Gemfile" ] || [ -d ".ace-taskflow" ] && echo "FAIL: Project markers found" || echo "PASS: No markers"
202
+ echo "=== END CHECK ==="
203
+ ```
204
+
205
+ - **All PASS**: Continue to step 5
206
+ - **Any FAIL**: STOP, return to `$PROJECT_ROOT`, re-run setup, re-check
207
+
208
+ ### 5. Create Test Data
209
+
210
+ > **Use `ace-test-e2e-sh "$TEST_DIR"` for ALL commands after setup.**
211
+ > Each bash block runs in a fresh shell — the wrapper ensures sandbox isolation.
212
+
213
+ Execute test data creation commands from the scenario, writing files inside `$TEST_DIR/`.
214
+
215
+ ### 6. Execute Test Cases
216
+
217
+ > **Use `ace-test-e2e-sh "$TEST_DIR"` for ALL TC commands.**
218
+
219
+ If `FILTERED_CASES` (the normalized `TEST_CASES` list from step 2.5) is set, execute only matching TCs. Otherwise, execute all TCs.
220
+
221
+ For each TC (TC-NNN):
222
+ 1. **Check filter** — skip if not in `FILTERED_CASES`
223
+ 2. **Read** the runner file (`TC-NNN-*.runner.md`)
224
+ 3. **Execute** runner steps, save artifacts to `results/tc/{NN}/`
225
+ 4. **Verify** against paired `.verify.md` expectations
226
+ 5. **Record** status (Pass/Fail) with evidence
227
+
228
+ Track friction points during execution for the experience report.
229
+
230
+ ### 7. Write Reports
231
+
232
+ Write three report files to the reports directory.
233
+
234
+ **Report path setup:**
235
+ ```bash
236
+ REPORT_DIR="${PROVIDED_REPORT_DIR:-${TEST_DIR}-reports}"
237
+ mkdir -p "$REPORT_DIR"
238
+ ```
239
+
240
+ Replace all `{placeholder}` values with actual data.
241
+
242
+ #### 7.1 summary.r.md
243
+
244
+ ```yaml
245
+ ---
246
+ test-id: {test-id}
247
+ package: {package}
248
+ agent: {agent-name}
249
+ executed: {timestamp}
250
+ status: pass|fail|partial|incomplete
251
+ tcs-passed: {count}
252
+ tcs-failed: {count}
253
+ tcs-total: {count}
254
+ score: "{passed}/{total}"
255
+ verdict: pass|fail|partial|incomplete
256
+ filtered: true|false
257
+ failed:
258
+ - tc: TC-NNN
259
+ category: tool-bug|runner-error|test-spec-error|infrastructure-error
260
+ evidence: "brief evidence"
261
+ ---
262
+ ```
263
+
264
+ Followed by test information table, results summary table, and TC evaluation details.
265
+
266
+ #### 7.2 experience.r.md
267
+
268
+ Agent experience report with friction points, root cause analysis, improvement suggestions, and positive observations.
269
+
270
+ #### 7.3 metadata.yml
271
+
272
+ ```yaml
273
+ run-id: "{TIMESTAMP_ID}"
274
+ test-id: "{test-id}"
275
+ package: "{package}"
276
+ status: "{status}"
277
+ score: {0.0-1.0}
278
+ verdict: pass|partial|fail
279
+ tcs-passed: {count}
280
+ tcs-failed: {count}
281
+ tcs-total: {count}
282
+ failed:
283
+ - tc: TC-NNN
284
+ category: tool-bug|runner-error|test-spec-error|infrastructure-error
285
+ evidence: "brief evidence"
286
+ test_cases:
287
+ filtered: true|false
288
+ executed: [TC-001, TC-003]
289
+ git:
290
+ branch: "{branch}"
291
+ commit: "{short-sha}"
292
+ ```
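The two report files use different score formats: a `"{passed}/{total}"` string in `summary.r.md` and a `0.0-1.0` number in `metadata.yml`. The Ruby sketch below derives both from the counters. The helper name and the `summary-score` / `metadata-score` keys are hypothetical labels for illustration, and the sketch assumes that the total is passed plus failed and that the numeric score is the pass ratio; neither is stated explicitly in the workflow.

```ruby
# Hypothetical helper (not the package's API): counters and score formats.
def score_fields(passed, failed)
  total = passed + failed # assumption: no other TC outcomes are counted

  # Assumption: the metadata score is the pass ratio, 0.0 when nothing ran.
  ratio = total.zero? ? 0.0 : (passed.to_f / total).round(2)

  {
    "tcs-passed" => passed,
    "tcs-failed" => failed,
    "tcs-total" => total,
    "summary-score" => "#{passed}/#{total}", # string form for summary.r.md
    "metadata-score" => ratio                # numeric form for metadata.yml
  }
end

fields = score_fields(3, 1)
puts fields["summary-score"]
puts fields["metadata-score"]
```

Computing both values from the same counters keeps the two report files consistent with each other.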
293
+
294
+ ### 8. Cleanup (Optional)
295
+
296
+ Controlled by `cleanup.enabled` in `.ace-defaults/e2e-runner/config.yml` (default: disabled).
297
+
298
+ Sandbox directories in `.ace-local/test-e2e/` are gitignored.
299
+
300
+ ### 9. Generate Summary
301
+
302
+ Summarize execution in the response. Reports are persisted to disk.
303
+
304
+ **Single test:**
305
+ ```markdown
306
+ ## E2E Test Execution Report
307
+ **Test ID:** {test-id} | **Package:** {package} | **Status:** {PASS/FAIL}
308
+
309
+ | Test Case | Description | Status |
310
+ |-----------|-------------|--------|
311
+ | TC-001 | ... | Pass |
312
+
313
+ Reports: `.ace-local/test-e2e/{timestamp}-{short-pkg}-{short-id}-reports/`
314
+ ```
315
+
316
+ ### 10. Update Test Scenario
317
+
318
+ If all tests pass, update `scenario.yml`:
319
+ ```yaml
320
+ last-verified: {today's date}
321
+ verified-by: claude-{model}
322
+ ```
323
+
324
+ ## Error Handling
325
+
326
+ | Failure | Action |
327
+ |---------|--------|
328
+ | Prerequisites not met | Report which failed, provide resolution steps, stop |
329
+ | TC fails | Record details, continue remaining TCs, include in report |
330
+ | Environment setup fails | Report error, attempt cleanup, suggest troubleshooting |
331
+ | Sandbox isolation fails | STOP immediately, return to `$PROJECT_ROOT`, re-run setup |
332
+ | TC filter mismatch | STOP, do not write reports, offer re-run |
333
+
334
+ ## Example Invocations
335
+
336
+ ```bash
337
+ # Specific test
338
+ ace-test-e2e ace-lint TS-LINT-001
339
+
340
+ # Single TC within a test
341
+ ace-test-e2e ace-lint TS-LINT-003 TC-002
342
+
343
+ # Multiple specific TCs
344
+ ace-test-e2e ace-lint TS-LINT-001 TC-001,tc-003,002
345
+
346
+ # All tests in a package
347
+ ace-test-e2e ace-lint
348
+
349
+ # Filter by tags
350
+ ace-test-e2e ace-lint --tags smoke
351
+ ace-test-e2e ace-lint --exclude-tags deep
352
+
353
+ # All tests in project root
354
+ ace-test-e2e
355
+ ```