ace-test-runner-e2e 0.29.0

Files changed (76)
  1. checksums.yaml +7 -0
  2. data/.ace-defaults/e2e-runner/config.yml +70 -0
  3. data/.ace-defaults/nav/protocols/guide-sources/ace-test-runner-e2e.yml +11 -0
  4. data/.ace-defaults/nav/protocols/skill-sources/ace-test-runner-e2e.yml +19 -0
  5. data/.ace-defaults/nav/protocols/tmpl-sources/ace-test-runner-e2e.yml +12 -0
  6. data/.ace-defaults/nav/protocols/wfi-sources/ace-test-runner-e2e.yml +11 -0
  7. data/CHANGELOG.md +1166 -0
  8. data/LICENSE +21 -0
  9. data/README.md +42 -0
  10. data/Rakefile +15 -0
  11. data/exe/ace-test-e2e +15 -0
  12. data/exe/ace-test-e2e-sh +67 -0
  13. data/exe/ace-test-e2e-suite +13 -0
  14. data/handbook/guides/e2e-testing.g.md +124 -0
  15. data/handbook/guides/scenario-yml-reference.g.md +182 -0
  16. data/handbook/guides/tc-authoring.g.md +131 -0
  17. data/handbook/skills/as-e2e-create/SKILL.md +30 -0
  18. data/handbook/skills/as-e2e-fix/SKILL.md +35 -0
  19. data/handbook/skills/as-e2e-manage/SKILL.md +31 -0
  20. data/handbook/skills/as-e2e-plan-changes/SKILL.md +30 -0
  21. data/handbook/skills/as-e2e-review/SKILL.md +35 -0
  22. data/handbook/skills/as-e2e-rewrite/SKILL.md +31 -0
  23. data/handbook/skills/as-e2e-run/SKILL.md +48 -0
  24. data/handbook/skills/as-e2e-setup-sandbox/SKILL.md +34 -0
  25. data/handbook/templates/ace-taskflow-fixture.template.md +322 -0
  26. data/handbook/templates/agent-experience-report.template.md +89 -0
  27. data/handbook/templates/metadata.template.yml +49 -0
  28. data/handbook/templates/scenario.yml.template.yml +60 -0
  29. data/handbook/templates/tc-file.template.md +45 -0
  30. data/handbook/templates/test-report.template.md +94 -0
  31. data/handbook/workflow-instructions/e2e/analyze-failures.wf.md +126 -0
  32. data/handbook/workflow-instructions/e2e/create.wf.md +395 -0
  33. data/handbook/workflow-instructions/e2e/execute.wf.md +253 -0
  34. data/handbook/workflow-instructions/e2e/fix.wf.md +166 -0
  35. data/handbook/workflow-instructions/e2e/manage.wf.md +179 -0
  36. data/handbook/workflow-instructions/e2e/plan-changes.wf.md +255 -0
  37. data/handbook/workflow-instructions/e2e/review.wf.md +286 -0
  38. data/handbook/workflow-instructions/e2e/rewrite.wf.md +281 -0
  39. data/handbook/workflow-instructions/e2e/run.wf.md +355 -0
  40. data/handbook/workflow-instructions/e2e/setup-sandbox.wf.md +461 -0
  41. data/lib/ace/test/end_to_end_runner/atoms/display_helpers.rb +234 -0
  42. data/lib/ace/test/end_to_end_runner/atoms/prompt_builder.rb +199 -0
  43. data/lib/ace/test/end_to_end_runner/atoms/result_parser.rb +166 -0
  44. data/lib/ace/test/end_to_end_runner/atoms/skill_prompt_builder.rb +166 -0
  45. data/lib/ace/test/end_to_end_runner/atoms/skill_result_parser.rb +244 -0
  46. data/lib/ace/test/end_to_end_runner/atoms/suite_report_prompt_builder.rb +103 -0
  47. data/lib/ace/test/end_to_end_runner/atoms/tc_fidelity_validator.rb +39 -0
  48. data/lib/ace/test/end_to_end_runner/atoms/test_case_parser.rb +108 -0
  49. data/lib/ace/test/end_to_end_runner/cli/commands/run_suite.rb +130 -0
  50. data/lib/ace/test/end_to_end_runner/cli/commands/run_test.rb +156 -0
  51. data/lib/ace/test/end_to_end_runner/models/test_case.rb +47 -0
  52. data/lib/ace/test/end_to_end_runner/models/test_result.rb +115 -0
  53. data/lib/ace/test/end_to_end_runner/models/test_scenario.rb +90 -0
  54. data/lib/ace/test/end_to_end_runner/molecules/affected_detector.rb +92 -0
  55. data/lib/ace/test/end_to_end_runner/molecules/config_loader.rb +75 -0
  56. data/lib/ace/test/end_to_end_runner/molecules/failure_finder.rb +203 -0
  57. data/lib/ace/test/end_to_end_runner/molecules/fixture_copier.rb +35 -0
  58. data/lib/ace/test/end_to_end_runner/molecules/pipeline_executor.rb +121 -0
  59. data/lib/ace/test/end_to_end_runner/molecules/pipeline_prompt_bundler.rb +182 -0
  60. data/lib/ace/test/end_to_end_runner/molecules/pipeline_report_generator.rb +321 -0
  61. data/lib/ace/test/end_to_end_runner/molecules/pipeline_sandbox_builder.rb +131 -0
  62. data/lib/ace/test/end_to_end_runner/molecules/progress_display_manager.rb +172 -0
  63. data/lib/ace/test/end_to_end_runner/molecules/report_writer.rb +259 -0
  64. data/lib/ace/test/end_to_end_runner/molecules/scenario_loader.rb +254 -0
  65. data/lib/ace/test/end_to_end_runner/molecules/setup_executor.rb +181 -0
  66. data/lib/ace/test/end_to_end_runner/molecules/simple_display_manager.rb +72 -0
  67. data/lib/ace/test/end_to_end_runner/molecules/suite_progress_display_manager.rb +223 -0
  68. data/lib/ace/test/end_to_end_runner/molecules/suite_report_writer.rb +277 -0
  69. data/lib/ace/test/end_to_end_runner/molecules/suite_simple_display_manager.rb +116 -0
  70. data/lib/ace/test/end_to_end_runner/molecules/test_discoverer.rb +136 -0
  71. data/lib/ace/test/end_to_end_runner/molecules/test_executor.rb +332 -0
  72. data/lib/ace/test/end_to_end_runner/organisms/suite_orchestrator.rb +830 -0
  73. data/lib/ace/test/end_to_end_runner/organisms/test_orchestrator.rb +442 -0
  74. data/lib/ace/test/end_to_end_runner/version.rb +9 -0
  75. data/lib/ace/test/end_to_end_runner.rb +71 -0
  76. metadata +220 -0
data/CHANGELOG.md ADDED
@@ -0,0 +1,1166 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [0.29.0] - 2026-03-24
11
+
12
+ ### Added
13
+ - Documented `ace-test-e2e-sh` sandbox shell command in usage reference (previously undocumented executable).
14
+
15
+ ### Changed
16
+ - Re-recorded getting-started demo with real `ace-test-e2e-suite --progress` execution at 8x playback speed.
17
+ - Fixed demo tape YAML structure (duplicate `commands:` key caused `cd` to be silently dropped).
18
+ - Normalized gemspec homepage and changelog URIs to use consistent interpolation pattern.
19
+
20
+ ## [0.28.0] - 2026-03-23
21
+
22
+ ### Changed
23
+ - Refreshed README layout, navigation links, and section flow to align with the current package README pattern.
24
+
25
+ ## [0.27.0] - 2026-03-22
26
+
27
+ ### Changed
28
+ - Reworked package documentation with a landing-page README, tutorial getting-started guide, full usage reference, handbook catalog, demo tape/GIF assets, and refreshed gem metadata messaging.
29
+
30
+ ## [0.26.1] - 2026-03-18
31
+
32
+ ### Changed
33
+ - Migrated CLI namespace from `Ace::Core::CLI::*` to `Ace::Support::Cli::*` (ace-support-cli is now the canonical home for CLI infrastructure).
34
+
35
+
36
+ ## [0.26.0] - 2026-03-18
37
+
38
+ ### Changed
39
+ - Removed legacy backward-compatibility behavior as part of the 0.10 cleanup release.
40
+
41
+
42
+ ## [0.25.0] - 2026-03-17
43
+
44
+ ### Added
45
+ - Added optional per-scenario `timeout` support (in seconds) in `scenario.yml`, with scenario timeout taking precedence over suite/global timeout.
46
+
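The timeout precedence described in 0.25.0 can be sketched roughly as follows. This is a minimal illustration, not the gem's code: the method name `effective_timeout` and its keyword arguments are hypothetical, and the validation step is an assumption.

```ruby
# Hypothetical helper; the method and parameter names are illustrative only.
def effective_timeout(scenario_timeout:, suite_timeout:)
  # A timeout declared in scenario.yml wins over the suite/global value.
  timeout = scenario_timeout || suite_timeout
  # Assumption: timeouts are whole seconds and must be positive.
  unless timeout.is_a?(Integer) && timeout.positive?
    raise ArgumentError, "timeout must be a positive number of seconds"
  end
  timeout
end

effective_timeout(scenario_timeout: 120, suite_timeout: 600) # scenario value is used
effective_timeout(scenario_timeout: nil, suite_timeout: 600) # falls back to suite value
```

In this sketch a missing value at both levels raises instead of silently picking a default, which is one possible design choice rather than documented behavior.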
47
+ ## [0.24.13] - 2026-03-17
48
+
49
+ ### Fixed
50
+ - Ensure CLI E2E scenarios keep package-root references inside sandbox by provisioning package contents during pipeline setup, preventing `$PROJECT_ROOT_PATH/<package>` path failures.
51
+
52
+ ## [0.24.12] - 2026-03-15
53
+
54
+ ### Changed
55
+ - Migrated CLI framework from dry-cli to ace-support-cli
56
+
57
+ ## [0.24.11] - 2026-03-13
58
+
59
+ ### Technical
60
+ - Updated canonical E2E workflow skills for workspace-based execution flow.
61
+
62
+ ## [0.24.10] - 2026-03-13
63
+
64
+ ### Changed
65
+ - Updated canonical E2E workflow skills to explicitly run bundled workflows in the current project and execute them end-to-end.
66
+
67
+ ## [0.24.9] - 2026-03-13
68
+
69
+ ### Fixed
70
+ - Corrected experience report status output so only pass results are marked `complete`; partial or error states now consistently remain `incomplete`.
71
+
72
+ ### Changed
73
+ - Updated `e2e/fix` workflow guidance to remove suite-level `--only-failures` checkpoints and require explicit scenario-level reruns for fix iteration.
74
+
75
+ ### Technical
76
+ - Added regression coverage for experience report status behavior for pass, partial, and error results.
77
+
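The status rule from the 0.24.9 fix can be illustrated with a small sketch. The method name `experience_report_status` is hypothetical, and the case-insensitive comparison is an assumption; only the mapping (pass is `complete`, everything else is `incomplete`) comes from the entry above.

```ruby
# Hypothetical helper illustrating the mapping described in 0.24.9.
def experience_report_status(result_status)
  # Only a passing result marks the experience report as complete;
  # partial, error, and any other state stay incomplete.
  result_status.to_s.strip.downcase == "pass" ? "complete" : "incomplete"
end

%w[pass partial error].each do |status|
  puts "#{status} -> #{experience_report_status(status)}"
end
```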
78
+ ## [0.24.8] - 2026-03-13
79
+
80
+ ### Changed
81
+ - Removed the stale fork-context comment from the canonical `as-e2e-run` skill so only the selected Claude-fork skills retain provider-specific fork metadata.
82
+
83
+ ## [0.24.7] - 2026-03-13
84
+
85
+ ### Changed
86
+ - Increased the default E2E suite parallelism setting in project config from `6` to `8`.
87
+
88
+ ## [0.24.6] - 2026-03-13
89
+
90
+ ### Changed
91
+ - Updated the `e2e/fix` workflow and canonical `as-e2e-fix` skill to require rerunning the selected failing scope after each fix and a final `ace-test-e2e-suite --only-failures` checkpoint before concluding a fix session.
92
+
93
+ ## [0.24.5] - 2026-03-13
94
+
95
+ ### Changed
96
+ - Updated the project E2E suite report-generation default model override to `claude:sonnet@ro`.
97
+
98
+ ## [0.24.4] - 2026-03-12
99
+
100
+ ### Fixed
101
+ - Threaded the sandbox path into CLI-provider E2E runner and verifier invocations as an explicit working directory so `results/tc/...` stays sandbox-local during deterministic pipeline execution.
102
+
103
+ ## [0.24.3] - 2026-03-12
104
+
105
+ ### Changed
106
+ - Switched the project E2E suite report-generation default model to `codex:spark`.
107
+
108
+ ## [0.24.2] - 2026-03-12
109
+
110
+ ### Fixed
111
+ - Restored the canonical `as-e2e-run` `--sandbox` path to `wfi://e2e/execute` so CLI-provider E2E runs use the pre-populated sandbox execution workflow again.
112
+ - Hardened verifier/result parsing so prose containing paths like `results/tc/{NN}` is no longer misclassified as raw JSON.
113
+ - Convert unstructured verifier responses into deterministic `error` reports and mark error metadata verdicts as `fail` instead of `pass`.
114
+
115
+ ### Technical
116
+ - Added regression coverage for canonical E2E skill routing, brace-fragment parser handling, and unstructured verifier report generation.
117
+
118
+ ## [0.24.1] - 2026-03-12
119
+
120
+ ### Changed
121
+ - Updated README and E2E workflow documentation to use `ace-bundle` and `ace-test-e2e` examples instead of slash-command orchestration.
122
+
123
+ ## [0.24.0] - 2026-03-10
124
+
125
+ ### Added
126
+ - Added canonical handbook-owned E2E lifecycle skills for create, manage, review, planning, rewrite, fix, and sandbox setup.
127
+
128
+ ### Changed
129
+ - Aligned canonical E2E skill tool declarations and metadata with the stricter handbook skill schema.
130
+
131
+
132
+ ## [0.23.0] - 2026-03-09
133
+
134
+ ### Added
135
+ - Added `skill-sources` gem defaults registration at `.ace-defaults/nav/protocols/skill-sources/ace-test-runner-e2e.yml` so `skill://` can discover canonical `handbook/skills` entries from `ace-test-runner-e2e`.
136
+
137
+ ## [0.22.1] - 2026-03-09
138
+
139
+ ### Changed
140
+ - Updated canonical `as-e2e-run` skill metadata with explicit workflow typing (`skill.kind`, `skill.execution.workflow`) and agent-context comments for schema-aligned projections.
141
+
142
+ ## [0.22.0] - 2026-03-08
143
+
144
+ ### Changed
145
+ - Remove hardcoded `providers.cli_args` config; use ace-llm `@preset` suffixes for provider permission flags
146
+
147
+ ## [0.21.2] - 2026-03-04
148
+
149
+ ### Changed
150
+ - E2E runtime, wrappers, tests, and workflows now use `.ace-local/test-e2e` and sandbox-local `.ace-local/e2e` paths.
151
+
152
+
153
+ ## [0.21.1] - 2026-03-04
154
+
155
+ ### Changed
156
+ - Preserved `required_cli_args` string compatibility for external callers and added array-normalized internal usage via `required_cli_args_list`.
157
+
158
+ ## [0.21.0] - 2026-03-04
159
+
160
+ ### Changed
161
+ - Normalize `providers.cli_args` config values to arrays and support merged string/array CLI args in adapter and executor.
162
+
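A rough sketch of what normalizing `providers.cli_args` to arrays could look like is shown below. The helper names `normalize_cli_args` and `merge_cli_args` are hypothetical, and splitting strings with `Shellwords.split` is an assumption about how string values might be tokenized; the gem's adapter and executor may do this differently.

```ruby
require "shellwords"

# Hypothetical helper: accept nil, a String, or an Array and return an Array.
def normalize_cli_args(value)
  case value
  when nil    then []
  when String then Shellwords.split(value) # assumption: shell-style splitting
  when Array  then value.map(&:to_s)
  else raise ArgumentError, "unsupported cli_args type: #{value.class}"
  end
end

# Hypothetical helper: merge any mix of string and array sources in order.
def merge_cli_args(*sources)
  sources.flat_map { |source| normalize_cli_args(source) }
end

merge_cli_args("--verbose --model x", ["--flag"])
# => ["--verbose", "--model", "x", "--flag"]
```

Array elements are kept as-is so that an argument containing spaces can still be passed as a single token.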
163
+ ## [0.20.5] - 2026-02-25
164
+
165
+ ### Technical
166
+ - Update taskflow fixture template task lookup examples to use `ace-task show`.
167
+
168
+ ## [0.20.4] - 2026-02-25
169
+
170
+ ### Changed
171
+ - Standardize handbook runner/verifier contract across E2E guides, templates, and workflows: runner is execution-only, verifier is impact-first (sandbox impact → artifacts → debug fallback).
172
+ - Add explicit setup ownership guidance (`scenario.yml` + fixtures) and remove runner-side setup anti-patterns from handbook instructions.
173
+ - Extend E2E workflow guardrails to avoid autonomous `ace-test-e2e` / `ace-test-e2e-suite` execution in constrained or uncertain environments.
174
+
175
+ ## [0.20.3] - 2026-02-24
176
+
177
+ ### Added
178
+ - Support run-ID-driven tmux session naming via `tmux-session: { name-source: run-id }` in scenario `setup` directives.
179
+
180
+ ### Changed
181
+ - Pass the orchestrator run ID into setup execution so tmux setup can use deterministic per-run session names.
182
+ - Document run-ID tmux session setup and teardown behavior in scenario reference/template guidance.
183
+
184
+ ### Technical
185
+ - Add regression coverage for run-ID tmux session naming in setup executor and orchestrator setup integration.
186
+
187
+ ## [0.20.2] - 2026-02-24
188
+
189
+ ### Changed
190
+ - Strengthen `e2e/analyze-failures` output contract with autonomous fix decisions, concrete candidate file targets, and explicit no-touch boundaries.
191
+ - Update `e2e/fix` to consume autonomous analysis decisions directly and proceed without user clarification for normal targeting/scope choices.
192
+
193
+ ## [0.20.1] - 2026-02-24
194
+
195
+ ### Changed
196
+ - Consolidate mise.toml handling into `setup:` as `run:` steps; remove `sandbox-setup:` mechanism from `PipelineSandboxBuilder`, `TestScenario`, and `ScenarioLoader`
197
+ - Rename `env:` → `agent-env:` in scenario.yml `setup:` to clarify these are environment variables passed to the runner/verifier agent subprocess, not setup commands
198
+ - Re-export env vars (including `PROJECT_ROOT_PATH`, `ACE_TASKFLOW_PATH`) after login shell profile sourcing in `SetupExecutor#handle_run` to protect against mise's shell hook clobbering
199
+
200
+ ## [0.20.0] - 2026-02-24
201
+
202
+ ### Added
203
+ - Generic `sandbox-setup:` field in scenario.yml for declaring shell commands that run inside the pipeline sandbox after infrastructure setup, with `$SANDBOX_PATH` and `$PROJECT_ROOT_PATH` environment variables
204
+ - `sandbox_setup` and `sandbox_teardown` attributes on `TestScenario` model
205
+ - `parse_sandbox_commands` method in `ScenarioLoader` for parsing sandbox command fields from YAML
206
+ - `execute_sandbox_setup` method in `PipelineSandboxBuilder` replacing hardcoded `trust_mise_config`
207
+
208
+ ### Changed
209
+ - Replace hardcoded `mise trust` call in `PipelineSandboxBuilder` with generic `execute_sandbox_setup` mechanism driven by scenario configuration
210
+
211
+ ## [0.19.3] - 2026-02-24
212
+
213
+ ### Fixed
214
+ - Stop copying `TC-*.runner.md` / `TC-*.verify.md` scenario definitions into the sandbox root during setup; pipeline execution now relies on prompt bundling from scenario source files only.
215
+ - Clarify pipeline runner system prompt to treat initial cwd as `SANDBOX_ROOT` and keep artifact writes under `SANDBOX_ROOT/results` even when commands must run inside created worktrees.
216
+
217
+ ## [0.19.2] - 2026-02-24
218
+
219
+ ### Fixed
220
+ - Improve pipeline verifier evidence extraction to support multiline evidence blocks and `Evidence of failure` headings in report parsing.
221
+
222
+ ### Technical
223
+ - Add parser regression coverage for multiline failure evidence propagation into metadata and report outputs.
224
+
225
+ ## [0.19.1] - 2026-02-24
226
+
227
+ ### Changed
228
+ - Increase default CLI pipeline timeout from 300s to 600s in default E2E runner configuration.
229
+
230
+ ## [0.19.0] - 2026-02-24
231
+
232
+ ### Added
233
+ - Add `e2e/analyze-failures` workflow to classify failed scenarios/TCs before any fix is applied.
234
+
235
+ ### Changed
236
+ - Rewrite `e2e/fix` as an execution-only workflow with a required analysis gate and explicit rerun-scope discipline.
237
+
238
+ ## [0.18.2] - 2026-02-24
239
+
240
+ ### Changed
241
+ - Rewrite `run.wf.md` (v2.0): restructure dual-mode execution, add `--tags`/`--exclude-tags`, add pipeline context section, standardize report fields to TC-first schema
242
+ - Rewrite `execute.wf.md` (v2.0): document SetupExecutor contract, add dual-agent verifier documentation, clarify tag filtering at discovery time
243
+
244
+ ### Added
245
+ - Add `tags` field to scenario-yml-reference guide with naming conventions and OR filtering semantics
246
+ - Add `## Execution Pipeline` section to e2e-testing guide documenting 6-phase deterministic pipeline
247
+ - Add `## Scenario-Level Configuration` section to tc-authoring guide explaining tags, runner/verifier roles, and sequential context model
248
+ - Add `tags` field to scenario.yml template
249
+ - Add `score`, `verdict`, and `failed[]` (TC-first schema) to test-report template
250
+ - Add `--tags`/`--exclude-tags` arguments to manage, run-batch workflow instructions
251
+ - Add tag-related guidance to create, fix, rewrite, review, plan-changes, setup-sandbox workflows
252
+ - Add essential E2E test suite plan covering 10 new scenarios across 7 packages
253
+
254
+ ### Fixed
255
+ - Fix `cost-tier` default from `standard` to `smoke` and values to `smoke|happy-path|deep` across all guides and templates
256
+ - Rename legacy `passed`/`failed`/`total` frontmatter fields to `tcs-passed`/`tcs-failed`/`tcs-total` in test-report template
257
+
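The OR filtering semantics for `--tags` mentioned in 0.18.2 can be sketched as below. This is an illustration under assumptions: scenarios are modeled as plain hashes with a `:tags` array, the method name `filter_by_tags` is hypothetical, and the rule that `--exclude-tags` drops a scenario when any excluded tag matches is an assumption rather than something the changelog states.

```ruby
# Hypothetical discovery-time filter; scenarios are plain hashes here.
def filter_by_tags(scenarios, include_tags: [], exclude_tags: [])
  scenarios.select do |scenario|
    tags = scenario.fetch(:tags, [])
    # OR semantics: a scenario is kept if it carries at least one requested tag.
    included = include_tags.empty? || (tags & include_tags).any?
    # Assumption: any matching excluded tag removes the scenario.
    excluded = (tags & exclude_tags).any?
    included && !excluded
  end
end

scenarios = [
  { id: "TS-LINT-001", tags: %w[smoke lint] },
  { id: "TS-OVERSEER-001", tags: %w[deep tmux] },
  { id: "TS-B36TS-001", tags: %w[smoke] }
]

filter_by_tags(scenarios, include_tags: %w[smoke deep], exclude_tags: %w[tmux]).map { |s| s[:id] }
# => ["TS-LINT-001", "TS-B36TS-001"]
```

Filtering before execution, as the entries above describe, means excluded scenarios never reach sandbox setup.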
258
+ ## [0.18.1] - 2026-02-24
259
+
260
+ ### Fixed
261
+ - Harden verifier goal parsing for standalone pipeline reports: accept both `##` and `###` goal headings, normalize emphasized verdict tokens, and extract failure categories from mixed category text
262
+ - Ensure deterministic `*-reports` output on pipeline failures by writing structured error reports (summary, experience, metadata, and goal report) instead of leaving missing report directories
263
+ - Improve suite subprocess parsing to reliably extract the final `Report:` and `Error:` lines and preserve error summaries from metadata reconciliation
264
+
265
+ ### Changed
266
+ - Rename CLI provider helper to `CliProviderAdapter` and keep `SkillPromptBuilder` as a backward-compatible alias
267
+ - Route executor/orchestrator CLI-provider detection and required-args lookup through the new adapter name
268
+ - Standardize `--only-failures` behavior around scenario-level reruns and update suite messaging accordingly
269
+ - Prefer workspace-local `bin/ace-test-e2e` when suite orchestration spawns subprocesses
270
+
271
+ ### Technical
272
+ - Expand parser and pipeline tests for h2/h3 goal headings, category normalization, deterministic failure report generation, and adapter alias compatibility
273
+ - Update workflow/config wording to reflect deterministic pipeline execution terminology
274
+
275
+ ## [0.18.0] - 2026-02-24
276
+
277
+ ### Changed
278
+ - Simplify `ace-test-e2e` to single-command CLI (no `run` subcommand needed)
279
+ - Simplify `ace-test-e2e-suite` to single-command CLI (clean `--help` output)
280
+
281
+ ### Removed
282
+ - Multi-command Registry (`CLI` module with `run`/`suite`/`setup` subcommands)
283
+ - `ace-test-e2e setup` command (setup runs automatically during test execution)
284
+
285
+ ## [0.17.6] - 2026-02-24
286
+
287
+ ### Changed
288
+ - Simplify E2E execution to a single standalone pipeline model for CLI providers
289
+ - Rename internal pipeline components to neutral names:
290
+ - `PipelineSandboxBuilder`
291
+ - `PipelinePromptBundler`
292
+ - `PipelineReportGenerator`
293
+ - `PipelineExecutor`
294
+ - Update handbook guides, templates, and workflow instructions to standalone runner/verifier pair format
295
+
296
+ ### Removed
297
+ - Scenario-level `mode` and `execution-model` support in `scenario.yml` parsing
298
+ - Inline `.tc.md` test-case format support in `ScenarioLoader`
299
+ - `ace-test-e2e suite --mode` option and mode-based scenario discovery filtering
300
+ - Goal-mode-specific verify forcing in `TestOrchestrator` (verify now respects CLI flag only)
301
+
302
+ ## [0.17.5] - 2026-02-24
303
+
304
+ ### Added
305
+ - Add standalone goal-mode execution pipeline components:
306
+ - `GoalModeSandboxBuilder` (sandbox bootstrapping and tool validation)
307
+ - `GoalModePromptBundler` (runner/verifier prompt preparation with artifact embedding)
308
+ - `GoalModeReportGenerator` (TC-first report synthesis from verifier output)
309
+ - `GoalModeExecutor` (Phase A-F orchestration)
310
+
311
+ ### Changed
312
+ - Route standalone goal-mode scenarios away from slash-command skill invocation to deterministic dual-agent execution via `ace-llm` prompts
313
+ - Force verifier execution for standalone goal-mode scenarios regardless of CLI `--verify` flag (`--verify` is effectively always-on for this mode)
314
+ - Keep procedural and inline-goal execution behavior unchanged
315
+
316
+ ## [0.17.4] - 2026-02-24
317
+
318
+ ### Fixed
319
+ - Prevent synthetic TC ID collision with real failed TC IDs in verifier results
320
+ - Add test coverage for no-category TC parsing and unique TC ID guarantee
321
+
322
+ ## [0.17.3] - 2026-02-24
323
+
324
+ ### Fixed
325
+ - Fix NoMethodError in `parse_failed_tcs` when TC entry lacks category suffix
326
+
327
+ ## [0.17.2] - 2026-02-24
328
+
329
+ ### Added
330
+ - Optional independent verifier execution mode via `--verify` for `ace-test-e2e run` and `ace-test-e2e suite`
331
+ - Verifier parsing support for TC-first contracts including failure categorization (`test-spec-error`, `tool-bug`, `runner-error`, `infrastructure-error`)
332
+
333
+ ### Changed
334
+ - `TestExecutor`/`TestOrchestrator` execution path now supports runner + verifier dual invocation while keeping default single-agent behavior unchanged
335
+ - Report metadata and summary frontmatter now emit TC-first fields (`tcs-passed`, `tcs-failed`, `tcs-total`, `score`, `verdict`) and structured `failed[].tc` entries
336
+ - Failure discovery and metadata reconciliation now read TC-first schema fields in addition to existing result counters
337
+ - Execute workflow documentation updated to reflect verify mode and TC-first report structure
338
+
339
+ ## [0.17.1] - 2026-02-24
340
+
341
+ ### Added
342
+ - Test case frontmatter `mode` support (`procedural` default, `goal` explicit) in `ScenarioLoader`/`TestCase`
343
+ - Inline goal-mode TC validation: required `Objective`/`Available Tools`/`Success Criteria` and rejection of `## Steps`
344
+
345
+ ### Changed
346
+ - E2E execution workflow docs (`run.wf.md`, `execute.wf.md`) now define procedural, inline-goal, and standalone-goal execution paths
347
+ - Goal-mode report template examples now document `passed`/`failed` arrays plus `score` and `verdict` frontmatter fields
348
+ - TC/scenario authoring guides and templates updated for goal-mode conventions
349
+
350
+ ## [0.17.0] - 2026-02-24
351
+
352
+ ### Added
353
+ - Scenario-level metadata support for `tags`, `mode`, `execution-model`, `tool-under-test`, and `sandbox-layout`
354
+ - Goal-mode standalone discovery for `TC-*.runner.md` and `TC-*.verify.md` pairs with required `runner.yml.md` and `verifier.yml.md`
355
+ - CLI filtering options: `ace-test-e2e suite --tags/--exclude-tags/--mode` and `ace-test-e2e run --tags`
356
+
357
+ ### Changed
358
+ - Apply tag and mode filtering at scenario discovery time so excluded scenarios never enter execution
359
+ - Extend sandbox definition copying to include goal-mode standalone files alongside procedural `.tc.md` files
360
+
361
+ ### Technical
362
+ - Expanded model, loader, discoverer, orchestrator, and command test coverage for metadata parsing, filter semantics, and option wiring
363
+
364
+ ## [0.16.22] - 2026-02-23
365
+
366
+ ### Technical
367
+ - Updated internal dependency version constraints to current releases
368
+
369
+ ## [0.16.21] - 2026-02-22
370
+
371
+ ### Changed
372
+ - Migrate CLI to standard help pattern with explicit subcommands
373
+ - Remove DWIM default routing - users must now use `run` subcommand explicitly
374
+ - Empty args now show help instead of requiring a command
375
+
376
+ ### Technical
377
+ - Add `HELP_EXAMPLES` constant with usage examples
378
+ - Update tests to match new CLI pattern (remove `known_command?` tests)
379
+
380
+ ### Technical
381
+ - Update e2e-testing guide to use `ace-search "pattern"` single-command syntax (drop `search` subcommand)
382
+
383
+ ## [0.16.18] - 2026-02-22
384
+
385
+ ### Changed
386
+ - Migrate skill naming and invocation references to hyphenated `ace-*` format (no underscores).
387
+
388
+ ## [0.16.17] - 2026-02-21
389
+
390
+ ### Fixed
391
+ - `TestExecutor` now passes setup `env_vars` as `subprocess_env` to `QueryInterface.query`, ensuring environment variables (e.g., `ACE_TMUX_SESSION`) are set on the `claude -p` subprocess rather than only serialized as prompt text
392
+
393
+ ### Added
394
+ - Unit tests verifying `env_vars` propagation as `subprocess_env` for both scenario and test-case execution paths
395
+
396
+ ## [0.16.16] - 2026-02-21
397
+
398
+ ### Fixed
399
+ - Tmux sessions created by E2E test setup (`tmux-session` step) are now cleaned up after test execution via `ensure` blocks in `TestOrchestrator`
400
+ - `SetupExecutor` instance returned from `setup_sandbox_if_ts` so `teardown` is called reliably in both single-test and parallel-test paths
401
+
402
+ ### Changed
403
+ - Tmux session naming uses scenario test ID (`{test_id}-e2e`, e.g., `TS-OVERSEER-001-e2e`) instead of generic `ace-e2e-{timestamp}` for easier identification
404
+ - `SetupExecutor#execute` accepts `scenario_name:` parameter for descriptive session naming
405
+
406
+ ### Added
407
+ - Unit tests for tmux session naming (scenario-based and fallback) and teardown cleanup
408
+
409
+ ## [0.16.15] - 2026-02-21
410
+
411
+ ### Added
412
+ - Debug-only post-suite diagnostics in `SuiteOrchestrator` to detect and report lingering `claude -p` processes when `ACE_LLM_DEBUG_SUBPROCESS=1`
413
+ - Unit tests for lingering-process diagnostics behavior in debug-enabled and debug-disabled modes
414
+
415
+ ## [0.16.14] - 2026-02-21
416
+
417
+ ### Changed
418
+ - Update skill invocation to colon-free convention (`ace_e2e_run` format)
419
+ - Update skill prompt builder and tests for new skill naming convention
420
+
421
+ ## [0.16.13] - 2026-02-21
422
+
423
+ ### Added
424
+ - "Refactoring Resilience" section in E2E testing guide: pre-refactoring checklist, refactoring-proof patterns (variables not literals, flexible regex, runtime path discovery), post-refactoring smoke run requirement
425
+
426
+ ## [0.16.12] - 2026-02-21
427
+
428
+ ### Fixed
429
+ - Pass `--report-dir` explicitly from suite orchestrator to inner subprocesses, eliminating directory name mismatch between Ruby `short_id` computation and LLM agent interpretation
430
+ - Thread `report_dir` parameter through the full execution chain: `SuiteOrchestrator` → CLI `--report-dir` option → `TestOrchestrator` → `TestExecutor` → `SkillPromptBuilder` → workflow
431
+
432
+ ### Added
433
+ - `--report-dir` CLI option for `ace-test-e2e` to override computed report directory path
434
+ - `REPORT_DIR` parameter in `run.wf.md` workflow instructions for agent-side report path override
435
+
436
+ ## [0.16.11] - 2026-02-21
437
+
438
+ ### Fixed
439
+ - Add `-b main` to `git init` in `SetupExecutor` to ensure consistent default branch name regardless of system git configuration
440
+
441
+ ## [0.16.10] - 2026-02-21
442
+
443
+ ### Added
444
+ - Save subprocess raw output (`subprocess_output.log`) for all test results (pass, fail, error) in report directories for diagnostic context
445
+ - Write `subprocess_output.log` alongside failure stub `metadata.yml` when subprocess has no report directory
446
+
447
+ ### Technical
448
+ - Add `save_subprocess_output` method to persist subprocess stdout+stderr to report directories
449
+ - Attach `:raw_output` to result hashes in both parallel and sequential execution paths
450
+ - Add tests for `parse_subprocess_result` raw output inclusion, `save_subprocess_output` behavior, and failure stub output logging
451
+
452
+ ## [0.16.9] - 2026-02-21
453
+
454
+ ### Fixed
455
+ - Downcase status in `SkillResultParser.normalize_status` so `"Pass"` and `"PASS"` are correctly recognized as `"pass"`
456
+ - Downcase status in `ResultParser.normalize_result` for JSON/API path consistency
457
+ - Reconcile scenario status with case counts in `SuiteOrchestrator.override_from_metadata` — override to `"pass"` when all cases passed but metadata status is incorrect
458
+ - Reconcile scenario status with case counts in `TestOrchestrator.read_agent_result` — same safety net for CLI provider metadata path
459
+
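The two fixes in 0.16.9 (case-insensitive status handling and reconciling status with case counts) might look roughly like this. The free-standing methods below are hypothetical stand-ins for the parser and orchestrator methods named above; their real signatures are not shown in this diff.

```ruby
# Hypothetical stand-in for status normalization: "Pass" and "PASS" become "pass".
def normalize_status(raw)
  raw.to_s.strip.downcase
end

# Hypothetical stand-in for the reconciliation safety net:
# if every case passed, treat the scenario as passing even when
# the reported metadata status disagrees.
def reconcile_status(reported, passed:, total:)
  status = normalize_status(reported)
  return "pass" if total.positive? && passed == total
  status
end

reconcile_status("PASS", passed: 3, total: 3)    # => "pass"
reconcile_status("fail", passed: 3, total: 3)    # => "pass" (overridden by counts)
reconcile_status("Partial", passed: 2, total: 3) # => "partial"
```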
460
+ ## [0.16.8] - 2026-02-21
461
+
462
+ ### Fixed
463
+ - Fix `short_id` regex to support digits in test area names (e.g., `TS-B36TS-001` now correctly yields `ts001`)
464
+ - Copy test definition files (`.tc.md`) to sandbox before execution so the test runner can locate them during E2E runs
465
+
466
+ ### Technical
467
+ - Add test for skill name coupling in `SkillPromptBuilder` to catch invocation name drift
468
+ - Add tests for `short_id` with digit-containing area names (`B36TS`, `ASSIGN`)
469
+
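To illustrate the `short_id` fix in 0.16.8, here is a sketch of a pattern that accepts digits in the area segment. The regex and method body are assumptions reconstructed from the examples in this changelog (`TS-B36TS-001` and `TS-LINT-001` both mapping to `ts001`); the actual implementation is not shown in this diff.

```ruby
# Hypothetical reconstruction: prefix letters, an area that may contain
# digits, and a numeric sequence, e.g. "TS-B36TS-001".
SHORT_ID_PATTERN = /\A([A-Z]+)-([A-Z0-9]+)-(\d+)/

def short_id(test_id)
  match = SHORT_ID_PATTERN.match(test_id)
  return nil unless match

  # Lowercased prefix followed by the numeric sequence.
  "#{match[1].downcase}#{match[3]}"
end

short_id("TS-B36TS-001") # => "ts001"
short_id("TS-LINT-001")  # => "ts001"
```

An area class of `[A-Z]+` alone would fail to match `B36TS`, which is the kind of mismatch the entry above describes.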
470
+ ## [0.16.7] - 2026-02-20
471
+
472
+ ### Fixed
473
+ - Correct skill invocation name from `/ace:run-e2e-test` to `/ace:e2e-run` in `SkillPromptBuilder` (was causing 100% E2E test failure rate)
474
+ - Fix broken `.claude/skills/ace_e2e-run` symlink target from non-existent `ace_run-e2e-test` to `ace_e2e-run`
475
+
476
+ ### Added
477
+ - `tmux-session` setup step in `SetupExecutor` — creates an isolated detached tmux session, stores name as `ACE_TMUX_SESSION` env var, and cleans up via new `teardown` method
478
+
479
+ ## [0.16.6] - 2026-02-19
480
+
481
+ ### Technical
482
+ - Namespace workflow instructions into e2e/ subdirectory with updated wfi:// URIs
483
+ - Update skill name references to use namespaced ace:e2e-action format
484
+
485
+ ## [0.16.5] - 2026-02-19
486
+
487
+ ### Fixed
488
+
489
+ - Detect CLI-provider skill mis-invocation patterns (`/ace:...` in shell, invalid `ace-test e2e`, missing tests context) and return explicit infrastructure errors
490
+ - Require deterministic report-directory matching in `TestOrchestrator` for CLI-provider runs to prevent stale report reuse across run IDs
491
+
492
+ ### Changed
493
+
494
+ - Harden `SkillPromptBuilder` prompts and handbook guidance to explicitly require slash-command execution in chat context (not bash)
495
+
496
+ ## [0.16.4] - 2026-02-18
497
+
498
+ ### Changed
499
+
500
+ - **Balanced E2E decision evidence across handbook/workflows** — `create-e2e-test.wf.md` (v1.3), `review-e2e-tests.wf.md` (v2.1), and `plan-e2e-changes.wf.md` (v1.1) now require explicit E2E-vs-unit justification with unit coverage references and replacement evidence for overlap-based removals
501
+ - **Scenario metadata expanded for manual, cost-aware runs** — `scenario.yml` reference/template and authoring guidance now include `cost-tier`, `e2e-justification`, and `unit-coverage-reviewed` fields
502
+ - **E2E guide refined to avoid duplicate layer testing** — `e2e-testing.g.md` (v1.6) now documents manual run order (`smoke` → `standard` → `deep`) and clarifies that negative/error TCs are required when they add E2E-only value or close a documented unit gap
503
+ - **TC authoring guidance updated** — `tc-authoring.g.md` (v1.1) now ties each TC back to scenario-level Value Gate evidence instead of requiring blanket error-TC duplication
504
+
505
+ ## [0.16.3] - 2026-02-18
506
+
507
+ ### Fixed
508
+
509
+ - Suite runner now correctly detects partial test failures when subprocess exits with code 0 but has fewer passed cases than total cases
510
+ - "partial" status now counted as failed in both sequential and parallel suite execution paths
511
+ - Suite summary now displays test-case-level counts (passed/failed/percentage) alongside test-level counts
512
+
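The 0.16.3 fixes above amount to a small classification rule. The following is a minimal Ruby sketch of that rule, not the package's code: the helper names `classify_result` and `counts_as_failed?` and their keyword arguments are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper (not the package's API): classify one subprocess result.
# A zero exit code is not trusted on its own; the case counts are checked too.
def classify_result(exit_code:, passed:, total:)
  return "fail" unless exit_code.zero?
  return "partial" if passed < total

  "pass"
end

# Hypothetical helper: "partial" is tallied as a failure, like "fail".
def counts_as_failed?(status)
  %w[fail partial].include?(status)
end
```

Under these assumptions, a run that exits with code 0 but reports 7 of 8 passed cases is classified as `partial` and counted as failed.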
513
+ ## [0.16.2] - 2026-02-18
514
+
515
+ ### Changed
516
+
517
+ - Remove all MT-format references from e2e-testing guide — TS-format is now the only documented convention
518
+ - Remove `--format mt` parameter from create-e2e-test workflow — TS-format is the only option
519
+ - Remove MT-format discovery commands (`find ... -name "*.mt.md"`) from run-e2e-test, run-e2e-tests, review-e2e-tests, and rewrite-e2e-tests workflows
520
+ - Update setup-e2e-sandbox workflow to use `TS-` prefix in examples and sed patterns
521
+ - Update fix-e2e-tests workflow to remove MT-format file references
522
+ - Update all example test IDs and cache paths from `MT-LINT-001`/`mt001` to `TS-LINT-001`/`ts001`
523
+
524
+ ## [0.16.1] - 2026-02-18
525
+
526
+ ### Fixed
527
+
528
+ - `ace-test-e2e-suite` now reads `execution.parallel` from config instead of hardcoding `0` (sequential), matching `ace-test-e2e` behavior
529
+
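Reading `execution.parallel` with a sequential fallback could look roughly like the sketch below. It assumes the config has already been parsed into a nested Hash with string keys; the helper name `parallel_setting` is hypothetical and is not taken from the package.

```ruby
# Hypothetical sketch: resolve the worker count from a parsed config hash.
# Falls back to 0 (sequential) when the key is missing or not a usable integer.
def parallel_setting(config, fallback: 0)
  value = config.dig("execution", "parallel")
  value.is_a?(Integer) && value >= 0 ? value : fallback
end
```

For example, `parallel_setting({ "execution" => { "parallel" => 4 } })` yields `4`, while an empty hash yields the fallback.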
530
+ ### Changed
531
+
532
+ - Package renamed from `ace-test-e2e-runner` to `ace-test-runner-e2e` for naming consistency with `ace-test-runner` base package
533
+ - Binary renamed from `ace-test-suite-e2e` to `ace-test-e2e-suite` to place `-e2e` qualifier as infix after `test`
534
+
535
+ ## [0.16.0] - 2026-02-12
536
+
537
+ ### Added
538
+
539
+ - **TS-format E2E test structure** — complete infrastructure for per-TC test scenarios in `TS-*/scenario.yml` directories with separate test case files
540
+ - **TC-level execution pipeline** — independent test case execution enabling targeted re-runs of failed TCs only
541
+ - **Setup CLI subcommand** — `ace-test-e2e setup <package> <test-id>` for deterministic Ruby-based sandbox setup before LLM handoff
542
+ - **ScenarioLoader molecule** — loads TS-format scenario directories with scenario.yml, test cases, and fixtures
543
+ - **TestCase model** — data model for individual test cases with tc_id, content, and file metadata
544
+
545
+ ### Changed
546
+
547
+ - **Remove legacy .mt.md support** — deleted ScenarioParser molecule; all test discovery and execution now uses TS-format directory structure only
548
+ - **Dual-mode → Single-mode discovery** — TestDiscoverer simplified to find only `TS-*/scenario.yml` patterns (no more `.mt.md` files)
549
+ - **Simplified extract methods** — `extract_test_name`, `extract_test_id`, `file_matches_test_id?` now work with directory names only
550
+ - **Config updated** — `discovery` pattern changed from `**/*.mt.md` to `TS-*/scenario.yml`, `test_id.pattern` from `MT-*` to `TS-*`
551
+
552
+ ### Fixed
553
+
554
+ - **ScenarioParser TS-format fallback** — fixed delegation to ScenarioLoader for scenario.yml files
555
+ - **Display managers** — suite progress/simple display managers correctly extract test names from directory paths
556
+
557
+ ## [0.15.1] - 2026-02-11
558
+
559
+ ### Fixed
560
+
561
+ - Expand relative PROJECT_ROOT_PATH to absolute sandbox path in test orchestrator,
562
+ ensuring agents running from monorepo root can find sandbox resources correctly
563
+
564
+ ## [0.15.0] - 2026-02-11
565
+
566
+ ### Added
567
+
568
+ - **fix-e2e-tests workflow** (v1.0) — new workflow for systematically diagnosing and fixing failing E2E tests with three-way root cause classification: application code issue, test definition issue, or runner/infrastructure issue
569
+ - **fix-e2e-tests skill** — `/ace:fix-e2e-tests` skill wrapping the new workflow, with cost-conscious re-run strategy and iterative fix loop
570
+
571
+ ### Fixed
572
+
573
+ - Apply code review feedback from PR #197
574
+
575
+ ## [0.14.0] - 2026-02-11
576
+
577
+ ### Added
578
+
579
+ - **3-stage E2E pipeline** — redesigned E2E test lifecycle as explicit review → plan → rewrite pipeline, replacing the monolithic manage workflow
580
+ - **plan-e2e-changes workflow** (v1.0) — new Stage 2 workflow that analyzes coverage matrix and produces concrete change plans with REMOVE/KEEP/MODIFY/CONSOLIDATE/ADD classifications
581
+ - **rewrite-e2e-tests workflow** (v1.0) — new Stage 3 workflow that executes change plans: deletes, creates, modifies, and consolidates E2E test scenarios
582
+ - **TS-format display support** — `SuiteProgressDisplayManager` and `SuiteSimpleDisplayManager` now extract test names from TS-format `scenario.yml` paths (directory name) in addition to MT-format `.mt.md` paths
583
+ - **Metadata-based result override** — `SuiteOrchestrator` reads agent-written `metadata.yml` to correct subprocess exit code mismatches, matching `TestOrchestrator#read_agent_result` behavior
584
+
585
+ ### Changed
586
+
587
+ - **review-e2e-tests workflow** (v1.2 → v2.0) — rewritten from health report generator to deep exploration producing a coverage matrix (functionality × unit tests × E2E), with overlap analysis, gap analysis, and consolidation opportunities
588
+ - **manage-e2e-tests workflow** (v1.2 → v2.0) — rewritten from 370-line monolithic flow to ~170-line lightweight orchestrator chaining the 3 pipeline stages with user confirmation gate
589
+ - **TC classifications** — replaced old ARCHIVE/CREATE/UPDATE/KEEP categories with REMOVE/KEEP/MODIFY/CONSOLIDATE/ADD for clearer intent
590
+
591
+ ## [0.13.0] - 2026-02-11
592
+
593
+ ### Added
594
+
595
+ - **E2E Value Gate** — embedded decision framework across all E2E testing documentation: guide, template, and workflows now require justification that each TC tests behavior needing real binary + real tools + real filesystem (not coverable by unit tests)
596
+ - **Coverage overlap analysis** — `review-e2e-tests.wf.md` (v1.2) includes new Step 5 to compare E2E TC coverage against unit test assertions, classifying overlap as none/partial/full with archival recommendations
597
+ - **CONSOLIDATE management action** — `manage-e2e-tests.wf.md` (v1.2) adds a new category for merging TCs that share CLI invocations, alongside archive/create/update/keep
598
+
599
+ ### Changed
600
+
601
+ - **E2E testing guide** (v1.5) — replaced vague "When to Use" criteria with concrete Value Gate question, added Cost and Scope section (cost per TC, healthy 2-5 TCs/scenario, consolidation rule), added Coverage Overlap Review to Maintenance
602
+ - **Create workflow** (v1.2) — inserted E2E Value Gate Check as Step 7 (unit test overlap check before TC generation), added COST-AWARE rules to TC generation guidelines
603
+ - **Review workflow** (v1.2) — added overlap metrics to health report summary table and new Coverage Overlap section in report template
604
+ - **Manage workflow** (v1.2) — expanded ARCHIVE criteria to include unit test overlap and presentation-only TCs
605
+ - **E2E test template** — added E2E Justification section with unit test coverage checklist, TC consolidation guidance comment, and cost/value reminders
606
+
607
+ ## [0.12.4] - 2026-02-11
608
+
609
+ ### Added
610
+
611
+ - **TC fidelity validator** — new `TcFidelityValidator` atom detects when agents invent test cases instead of executing defined `.tc.md` files, flagging results as error when reported TC count doesn't match expected
612
+ - **Suite report post-validation** — `SuiteReportWriter` now validates LLM-generated "Overall" line against deterministic totals and replaces hallucinated aggregates with correct values
613
+
614
+ ### Changed
615
+
616
+ - **Workflow TC discovery guardrails** — `execute-e2e-test.wf.md` now requires explicit TC listing before execution, includes a TC fidelity rule forbidding invented test cases, and adds a self-check step to verify result count matches discovery
617
+
618
+ ## [0.12.3] - 2026-02-11
619
+
620
+ ### Changed
621
+
622
+ - **Handbook TS-format support** — updated `run-e2e-test.wf.md` (v1.6), `run-e2e-tests.wf.md` (v1.1), `review-e2e-tests.wf.md` (v1.1), `create-e2e-test.wf.md` (v1.1), and `manage-e2e-tests.wf.md` (v1.1) to discover and reference both MT-format (`.mt.md`) and TS-format (`scenario.yml` / `.tc.md`) test scenarios
623
+ - **`create-e2e-test.wf.md`** — added `--format mt|ts` argument for creating TS-format scenario directories with `scenario.yml` and individual TC files
624
+ - **README and e2e-testing guide** — updated documentation to cover dual-format architecture, TS-format directory structure, and per-TC execution
625
+
626
+ ## [0.12.2] - 2026-02-11
627
+
628
+ ### Added
629
+
630
+ - **`execute-e2e-test.wf.md` workflow** — focused execution-only workflow for pre-populated sandboxes, handling test case discovery, execution, and reporting without setup steps
631
+
632
+ ### Changed
633
+
634
+ - **SKILL.md conditional routing** — skill now routes to `wfi://execute-e2e-test` when `--sandbox` is present, `wfi://run-e2e-test` otherwise
635
+ - **Unified skill invocation for all CLI providers** — removed `skill_aware?` distinction; all CLI providers (claude, gemini, codex, etc.) now use `/ace:run-e2e-test` skill invocation instead of embedded workflow prompts
636
+ - **Simplified `SkillPromptBuilder`** (273 → 113 lines) — removed `build_workflow_prompt`, `build_tc_workflow_prompt`, `system_prompt_for`, and `skill_aware?` methods
637
+ - **Simplified `TestExecutor`** (347 → 296 lines) — removed `skill_aware?` branching, `load_workflow_content`, and `find_project_root` dead methods
638
+ - **Cleaned `run-e2e-test.wf.md`** (v1.5) — removed sandbox mode section and skip guards (now handled by `execute-e2e-test.wf.md`)
639
+
640
+ ### Removed
641
+
642
+ - `skill_aware` config key from `config.yml` and `ConfigLoader#skill_aware_providers`
643
+
644
+ ## [0.12.1] - 2026-02-11
645
+
646
+ ### Added
647
+
648
+ - **Scenario-level sandbox pre-setup** — `TestOrchestrator` runs `SetupExecutor` in Ruby before LLM invocation for TS-format scenarios, passing `sandbox_path` and `env_vars` to skip deterministic setup steps in the LLM
649
+ - **Sandbox/env params in prompt builders** — `SkillPromptBuilder#build_skill_prompt` and `#build_workflow_prompt` accept `sandbox_path:` and `env_vars:` kwargs, appending `--sandbox` and `--env` flags
650
+ - **Workflow sandbox mode documentation** — `run-e2e-test.wf.md` documents scenario-level sandbox mode with `--sandbox` and `--env` arguments, skip guards on steps 4-5
651
+
652
+ ### Changed
653
+
654
+ - `SetupExecutor#execute` now returns `env:` key in result hash containing accumulated environment variables
655
+ - `TestExecutor#execute` and `#execute_via_skill` accept and forward `sandbox_path:` and `env_vars:` kwargs
656
+
657
+ ## [0.12.0] - 2026-02-11
658
+
659
+ ### Added
660
+
661
+ - **TestCase model** — `Models::TestCase` with title, steps, expected results, and setup/teardown support; `TestScenario` extended with `test_cases` collection for TS-format scenarios
662
+ - **ScenarioLoader** molecule — loads TS-format `scenario.yml` files with test case directory discovery, fixture path resolution, and setup script detection
663
+ - **FixtureCopier** molecule — copies fixture directories into sandbox with collision detection and path mapping
664
+ - **SetupExecutor** molecule — runs `setup.sh` scripts in sandbox context with timeout, output capture, and error reporting
665
+ - **TestCaseParser** atom — parses individual test case directories into `TestCase` models
666
+ - **Dual-mode test discovery** — `TestDiscoverer` supports both legacy `.mt.md` files and new TS-format `scenario.yml` directory structures
667
+ - **TC-level execution pipeline** — per-test-case independence with individual setup, execution, and reporting
668
+ - **`setup` CLI subcommand** — `ace-test-e2e setup PACKAGE [TEST_ID]` prepares sandbox without running tests
669
+
670
+ ### Fixed
671
+
672
+ - `ScenarioParser#parse` now handles TS-format `scenario.yml` files — delegates to `ScenarioLoader` instead of crashing with `ArgumentError: No frontmatter found`
673
+ - Review cycle 1 feedback items (medium+ severity)
674
+ - Review cycle 2 feedback items (critical + high severity)
675
+
676
+ ### Changed
677
+
678
+ - ace-lint E2E tests migrated from `.mt.md` to per-TC directory format
679
+ - E2E test configurations added for linting
680
+
681
+ ## [0.11.2] - 2026-02-10
682
+
683
+ ### Fixed
684
+
685
+ - `--only-failures` no longer re-runs passing scenarios in multi-scenario packages — SuiteOrchestrator now uses per-scenario failure data (`find_failures_by_scenario`) instead of flat per-package aggregation, so only scenarios with actual failures are launched
686
+ - `--only-failures` now correctly matches test files with descriptive filename suffixes (e.g. `MT-COMMIT-002-specific-file-commit.mt.md`) against metadata test-ids (`MT-COMMIT-002`) via prefix matching
687
+ - Per-scenario `--test-cases` filtering — each scenario now receives only its own failed TC IDs instead of the same flat list applied to every scenario in the package
688
+ - `SuiteProgressDisplayManager` no longer crashes with `NoMethodError` on nil `@footer_line` when the test queue is empty (defensive guard)
689
+
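The prefix matching described for `--only-failures` can be sketched as follows. This is an illustration under stated assumptions rather than the package's implementation: the helper name `matches_test_id_prefix?` is hypothetical, and the requirement that the ID be followed by `-` (so that one ID does not match a longer, unrelated ID) is an assumption of this sketch.

```ruby
# Hypothetical helper: does a test file name belong to a metadata test-id?
# Matches either the exact ID or the ID followed by a "-descriptive-suffix".
def matches_test_id_prefix?(path, test_id)
  base = File.basename(path).sub(/\.mt\.md\z/, "")
  base == test_id || base.start_with?("#{test_id}-")
end
```

With this rule, `MT-COMMIT-002-specific-file-commit.mt.md` matches the test-id `MT-COMMIT-002`.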
690
+ ### Technical
691
+
692
+ - 74 tests across changed files, 262 assertions, 0 failures
693
+
694
+ ## [0.11.1] - 2026-02-10
695
+
696
+ ### Fixed
697
+
698
+ - `--only-failures` now detects tests that errored without writing metadata — `write_failure_stubs` in SuiteOrchestrator backfills stub `metadata.yml` for any test that failed/errored but has no metadata on disk (e.g., provider 503, timeout before report generation)
699
+ - FailureFinder wildcard fallback now recognizes `status: "error"` and `status: "incomplete"` in addition to `fail` and `partial`, ensuring error stubs trigger full test re-runs
700
+
701
+ ### Technical
702
+
703
+ - 63 tests across changed files, 226 assertions, 0 failures
704
+
705
+ ## [0.11.0] - 2026-02-08
706
+
707
+ ### Added
708
+
709
+ - `ace-test-e2e-sh` sandbox wrapper script — enforces working directory and `PROJECT_ROOT_PATH` isolation for every bash command in E2E tests, preventing test artifacts from escaping the sandbox across separate shell invocations
710
+ - Wrapper validates sandbox path (must contain `.cache/ace-test-e2e/`), supports both args mode and stdin heredoc mode, and uses `exec` for transparent exit-code passthrough
711
+
712
+ ### Changed
713
+
714
+ - Updated all 43 E2E test files across 10 packages to use `ace-test-e2e-sh` wrapper for Test Data and Test Cases bash blocks
715
+ - Updated `run-e2e-test.wf.md` sections 5 and 6 with wrapper usage instructions
716
+ - Updated `setup-e2e-sandbox.wf.md` with wrapper documentation and usage examples
717
+
718
+ ## [0.10.10] - 2026-02-08
719
+
720
+ ### Added
721
+
722
+ - Batch timestamp generation for `ace-test-suite-e2e` — `SuiteOrchestrator` pre-generates unique 50ms-offset run IDs and passes them to subprocesses via `--run-id`, giving coordinated sandbox/report paths across suite runs
723
+ - `--run-id` CLI option on `ace-test-e2e` for deterministic report paths when invoked by suite orchestrator
724
+ - `TestOrchestrator#run` accepts external `run_id:` keyword, using it instead of generating a timestamp when provided
725
+
726
+ ### Technical
727
+
728
+ - 263 tests, 797 assertions, 0 failures
729
+
730
+ ## [0.10.9] - 2026-02-08
731
+
732
+ ### Fixed
733
+
734
+ - Surface silent failures in `SuiteOrchestrator#generate_suite_report` — replace blanket `rescue => _e; nil` with `warn` that prints error class and message; backtrace available via `DEBUG=1`
735
+ - Add DEBUG-gated warning when suite report is skipped due to no results matching test files
736
+ - Strip whitespace from `report_dir` regex captures in `parse_subprocess_result` and `run_single_test` to prevent path mismatches
737
+
738
+ ### Technical
739
+
740
+ - 257 tests, 783 assertions, 0 failures
741
+
742
+ ## [0.10.8] - 2026-02-08
743
+
744
+ ### Added
745
+
746
+ - Package filtering for `ace-test-suite-e2e` — optional comma-separated `packages` positional argument filters suite execution to specific packages (e.g., `ace-test-suite-e2e ace-bundle,ace-lint`)
747
+ - Package filter composes with `--affected` via intersection — both filters narrow the package set independently
748
+
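Composing the two filters by intersection can be pictured with plain array operations. The sketch below is hypothetical (the helper name `select_packages` and its keyword arguments are not from the package) and assumes each filter is either `nil` (not given) or a list of package names.

```ruby
# Hypothetical sketch: each filter that is present narrows the set independently.
# Array#& keeps the order of the left-hand operand.
def select_packages(all_packages, requested: nil, affected: nil)
  selected = all_packages
  selected &= requested if requested
  selected &= affected if affected
  selected
end
```

A package is run only if it survives every filter that was supplied, so the order in which the filters are applied does not change the resulting set.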
749
+ ### Technical
750
+
751
+ - 256 tests, 769 assertions, 0 failures
752
+
753
+ ## [0.10.7] - 2026-02-08
754
+
755
+ ### Added
756
+
757
+ - Suite-level final report generation in `SuiteOrchestrator` — wires `SuiteReportWriter` into multi-package runs to produce LLM-synthesized reports after all tests complete
758
+ - `finalize_run` helper extracts duplicated summary + return pattern from `run_sequential` and `run_parallel`
759
+ - `generate_suite_report` coordinates data conversion (result hashes → `TestResult` models, test files → `TestScenario` models) and report writing
760
+ - `build_test_result` converts raw subprocess result hashes into `Models::TestResult` with synthesized test case arrays
761
+ - `parse_scenario` parses `.mt.md` files via `ScenarioParser` with fallback to stub `TestScenario`
762
+ - Report path printed to output and included in return hash as `:report_path`
763
+ - Constructor accepts injectable `suite_report_writer`, `scenario_parser`, `timestamp_generator` for testability
764
+
765
+ ### Technical
766
+
767
+ - 251 tests, 743 assertions, 0 failures
768
+
769
+ ## [0.10.6] - 2026-02-08
770
+
771
+ ### Fixed
772
+
773
+ - Unify timestamp precision to 7-char (`:"50ms"`) across all E2E paths — `default_timestamp` now uses `Timestamp.encode(Time.now.utc, format: :"50ms")` instead of `Timestamp.now`
774
+ - Remove `count <= 1` early return in `generate_timestamps` that fell back to 6-char path, causing mixed-length timestamps within the same method
775
+
776
+ ### Technical
777
+
778
+ - 247 tests, 723 assertions, 0 failures
779
+
780
+ ## [0.10.5] - 2026-02-08
781
+
782
+ ### Changed
783
+
784
+ - Extract `REFRESH_INTERVAL = 0.25` constant for 4Hz refresh rate — replaces magic number across both orchestrators and both progress display managers
785
+
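A throttle built around a `REFRESH_INTERVAL = 0.25` constant might look like the sketch below. Only the constant name and value come from the entry above; the class name `ThrottledRefresher`, the injectable clock, and the `force:` option are assumptions made for illustration.

```ruby
# Hypothetical sketch of a ~4Hz refresh throttle (not the package's class).
class ThrottledRefresher
  REFRESH_INTERVAL = 0.25 # seconds between redraws, i.e. about 4Hz

  # clock is injectable so the throttle can be exercised without sleeping.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &render)
    @clock = clock
    @render = render
    @last_refresh = nil
  end

  # Redraws only if the interval has elapsed (or when forced).
  # Returns true when a redraw happened, false when it was skipped.
  def refresh(force: false)
    now = @clock.call
    return false if !force && @last_refresh && (now - @last_refresh) < REFRESH_INTERVAL

    @last_refresh = now
    @render.call
    true
  end
end
```

Using a monotonic clock rather than wall-clock time keeps the throttle stable if the system time changes during a run.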
786
+ ### Technical
787
+
788
+ - 247 tests, 723 assertions, 0 failures
789
+
790
+ ## [0.10.4] - 2026-02-08
791
+
792
+ ### Added
793
+
794
+ - Live timer refresh for single-package `--progress` display — dedicated 4Hz refresh thread in `TestOrchestrator` updates running timers while tests execute
795
+ - `ProgressDisplayManager` test coverage (header rendering, state transitions, throttle behavior)
796
+
797
+ ### Changed
798
+
799
+ - Throttle `ProgressDisplayManager#refresh` to ~4Hz (250ms) — matches `SuiteProgressDisplayManager` pattern
800
+
801
+ ### Technical
802
+
803
+ - 247 tests, 723 assertions, 0 failures
804
+
805
+ ## [0.10.3] - 2026-02-08
806
+
807
+ ### Changed
808
+
809
+ - Throttle `SuiteProgressDisplayManager#refresh` to ~4Hz (250ms) — reduces terminal I/O while maintaining responsive process completion detection
810
+
811
+ ### Technical
812
+
813
+ - 242 tests, 693 assertions, 0 failures
814
+
815
+ ## [0.10.2] - 2026-02-08
816
+
817
+ ### Added
818
+
819
+ - `--progress` CLI option for live animated display in `ace-test-suite-e2e`
820
+ - `SuiteProgressDisplayManager` molecule — animated ANSI table with in-place row updates, running timers, and live footer (Active/Completed/Waiting)
821
+ - `SuiteSimpleDisplayManager` molecule — extracted default line-by-line display from SuiteOrchestrator
822
+
823
+ ### Changed
824
+
825
+ - SuiteOrchestrator delegates display to pluggable display managers (same pattern as TestOrchestrator)
826
+ - SuiteOrchestrator accepts `progress:` parameter for display mode selection
827
+
828
+ ### Technical
829
+
830
+ - 241 tests, 689 assertions, 0 failures
831
+
832
+ ## [0.10.1] - 2026-02-08
833
+
834
+ ### Changed
835
+
836
+ - Polished suite output with columnar alignment, double-line separators, and structured summary
837
+ - Suite header shows test and package counts with `═` separator borders
838
+ - Per-test progress lines now display icon, duration, package, test name, and case counts in aligned columns
839
+ - Suite summary shows failed test details, duration, pass/fail stats, and colored status message
840
+ - Added `use_color:` parameter to SuiteOrchestrator for ANSI color control (auto-detects TTY)
841
+
842
+ ### Added
843
+
844
+ - `DisplayHelpers.double_separator` — 65-char `═` double-line separator for suite display
845
+ - `DisplayHelpers.format_suite_duration` — minute-range formatting (`4m 25s`)
846
+ - `DisplayHelpers.format_suite_elapsed` — right-aligned 7-char column for suite times
847
+ - `DisplayHelpers.format_suite_test_line` — columnar test result line builder
848
+ - `DisplayHelpers.format_suite_summary` — complete summary block formatter
849
+ - `SuiteOrchestrator.extract_test_name` — human-readable test name from file path
850
+ - Test case count extraction from subprocess output in `parse_subprocess_result`
851
+
852
+ ### Technical
853
+
854
+ - 224 tests, 607 assertions, 0 failures
855
+
856
+ ## [0.10.0] - 2026-02-08
857
+
858
+ ### Added
859
+
860
+ - `ace-test-suite-e2e` command for running E2E tests across all packages
861
+ - `SuiteOrchestrator` organism for managing multi-package test execution
862
+ - `AffectedDetector` molecule for detecting packages affected by recent changes
863
+ - Parallel execution support with `--parallel N` option
864
+ - `--affected` filter to test only changed packages
865
+
866
+ ### Fixed
867
+
868
+ - Prevent FrozenError in parallel execution output buffering
869
+ - Prevent shell injection vulnerability by using array-based command execution (`Open3.popen3`)
870
+ - Fix `--affected` edge case handling
871
+
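The difference that array-based execution makes can be shown with Ruby's standard `Open3` module. The sketch below uses `Open3.capture3`, a convenience method from the same module as the `Open3.popen3` call named above; the helper name `run_without_shell` and the sample argument are hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: the command is passed as separate argv elements, so no
# shell parses the arguments and metacharacters such as ";" stay literal.
def run_without_shell(*argv)
  stdout, stderr, status = Open3.capture3(*argv)
  { stdout: stdout, stderr: stderr, success: status.success? }
end

# The child Ruby process simply echoes its first argument back.
result = run_without_shell(RbConfig.ruby, "-e", "print ARGV.first", "ace-lint; echo injected")
```

Because the argument reaches the child process as a single string, the `; echo injected` part is never interpreted as a second command.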
872
+ ## [0.9.0] - 2026-02-07
873
+
874
+ ### Changed
875
+
876
+ - Rename CLI binary from `ace-e2e-test` to `ace-test-e2e` to align with package name `ace-test-e2e-runner`
877
+ - Update all CLI display strings, help text, and usage examples to use `ace-test-e2e`
878
+ - Update report metadata agent identifier from `ace-e2e-test` to `ace-test-e2e`
879
+ - Update gemspec executable declaration and bin wrapper paths
880
+
881
+ ### Technical
882
+
883
+ - 189 tests, 509 assertions, 0 failures
884
+
885
+ ## [0.8.2] - 2026-02-07
886
+
887
+ ### Added
888
+
889
+ - Support comma-separated test IDs in `ace-e2e-test` — e.g. `ace-e2e-test ace-lint 002,007` now discovers and runs multiple tests
890
+ - Multi-ID runs use parallel execution with progress display and suite reports (same as package-wide runs)
891
+
892
+ ### Technical
893
+
894
+ - 189 tests, 509 assertions, 0 failures
895
+
896
+ ## [0.8.1] - 2026-02-07
897
+
898
+ ### Fixed
899
+
900
+ - Informative error message on test failure — was empty `Error.new("")`, now shows count and IDs of failed tests
901
+ - CLI help text for the `--provider` default is now read dynamically from ConfigLoader instead of showing a stale hardcoded value
902
+ - Added `pi` to fallback CLI provider arrays in ConfigLoader and SkillPromptBuilder
903
+ - Narrowed bare `rescue => _e` to `rescue StandardError => e` with debug logging in SuiteReportWriter
904
+
905
+ ### Technical
906
+
907
+ - 185 tests, 501 assertions, 0 failures
908
+
909
+ ## [0.8.0] - 2026-02-07
910
+
911
+ ### Added
912
+
913
+ - **DisplayHelpers** atom — pure formatting module with `status_icon`, `format_elapsed`, `format_duration`, `tc_count_display`, `separator`, and `color` methods
914
+ - **SimpleDisplayManager** molecule — default line-by-line display with structured summary block (Duration/Tests/Test cases/Report)
915
+ - **ProgressDisplayManager** molecule — ANSI animated table with in-place row updates, running timers, and live footer
916
+ - `--progress` CLI option for live animated display mode
917
+ - Display helpers test suite (14 tests)
918
+
919
+ ### Changed
920
+
921
+ - Extracted display logic from TestOrchestrator into pluggable display managers
922
+ - Improved output format: status icons (✓/✗), aligned elapsed times, structured summary block with separator
923
+ - TestOrchestrator accepts `progress:` parameter for display mode selection
924
+ - Test suite updated to 145 tests (from 131)
925
+
926
+ ## [0.7.4] - 2026-02-07
927
+
928
+ ### Fixed
929
+
930
+ - Remove redundant failure count line from CLI output — enhanced summary already communicates failure counts
931
+ - Add a `rescue` clause in TestExecutor to handle unexpected execution errors gracefully
932
+ - Remove hardcoded version string from CLI tests — now matches semver pattern
933
+
934
+ ## [0.7.3] - 2026-02-07
935
+
936
+ ### Added
937
+
938
+ - **SuiteReportPromptBuilder** atom — pure prompt builder with `SYSTEM_PROMPT` and `build()` for LLM-synthesized suite reports (root causes, friction analysis, suggestions)
939
+ - LLM-synthesized suite reports via `Ace::LLM::QueryInterface` in SuiteReportWriter, producing rich analysis instead of mechanical tables
940
+ - Configurable `reporting.model` and `reporting.timeout` in `.ace-defaults/e2e-runner/config.yml`
941
+ - Report file reading — SuiteReportWriter reads `summary.r.md` and `experience.r.md` from each test's report directory for LLM context
942
+
943
+ ### Changed
944
+
945
+ - SuiteReportWriter accepts `config:` parameter for model/timeout configuration
946
+ - Static template report retained as automatic fallback on LLM failure
947
+ - Removed workflow Step 7.5 (agent-written suite report) — now handled by orchestrator's LLM synthesis
948
+ - Test suite expanded to 169 tests (from 156)
949
+
950
+ ## [0.7.2] - 2026-02-07
951
+
952
+ ### Added
953
+
954
+ - Deterministic report paths via `run_id` — orchestrator passes pre-generated timestamp IDs to executors and CLI providers (`--run-id` flag)
955
+ - Batch timestamp generation using `ace-timestamp encode --format 50ms --count N` for unique per-test IDs in parallel runs
956
+ - Agent metadata reading — `read_agent_result` parses `metadata.yml` from agent-written report directories for authoritative test status and TC counts
957
+ - SkillPromptBuilder `--run-id` support in both skill and workflow prompts
958
+ - Workflow instruction update documenting `RUN_ID` parameter for deterministic sandbox paths
959
+
960
+ ### Changed
961
+
962
+ - CLI provider report discovery: uses expected path first (`report_dir_for` with run_id), falls back to glob pattern
963
+ - Test suite expanded to 156 tests (from 147)
964
+
965
+ ## [0.7.1] - 2026-02-07
966
+
967
+ ### Added
968
+
969
+ - Enhanced CLI progress output with `[started]` messages showing test titles
970
+ - Test case counts in `[done]` lines: `[N/M done] TEST-ID: PASS 7/8 (duration)`
971
+ - Test case counts in single-test `Result:` line: `Result: PASS 7/8`
972
+ - Summary line now includes TC stats: `Summary: 2/5 passed | 28/35 test cases (80%)`
973
+ - **SuiteReportWriter** molecule: generates `{timestamp}-final-report.md` with frontmatter, summary table, failed test details, and report directory links
974
+ - Suite report path printed after package runs: `Report: .cache/ace-test-e2e/{ts}-final-report.md`
975
+
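The summary line format quoted above can be reproduced with a short formatter. This is a sketch only: the method name `format_summary` and its parameters are hypothetical, and rounding the percentage to a whole number is an assumption based on the `(80%)` shown in the example.

```ruby
# Hypothetical formatter for the summary line (not the package's code).
def format_summary(passed_tests, total_tests, passed_cases, total_cases)
  # Guard against division by zero when no test cases were reported.
  percent = total_cases.zero? ? 0 : (passed_cases * 100.0 / total_cases).round
  "Summary: #{passed_tests}/#{total_tests} passed | " \
    "#{passed_cases}/#{total_cases} test cases (#{percent}%)"
end
```

Calling `format_summary(2, 5, 28, 35)` produces the line shown in the entry above.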
976
+ ### Changed
977
+
978
+ - Scenarios parsed upfront (before threading) for title display and report generation
979
+ - Test suite expanded to 147 tests (from 130)
980
+
981
+ ## [0.7.0] - 2026-02-06
982
+
983
+ ### Added
984
+
985
+ - Parallel E2E test execution with `--parallel N` CLI option (default: 3)
986
+ - Thread pool with Queue + Mutex for concurrent I/O-bound LLM calls
987
+ - Progress output: `[N/total done] TEST-ID: STATUS (duration)`
988
+ - Results preserve original file order regardless of completion order
989
+ - ConfigLoader molecule for centralized configuration access via Ace::Support::Config
990
+ - Class methods: `default_provider`, `default_timeout`, `default_parallel`, `cli_providers`, `skill_aware_providers`, `cli_args_for`
991
+ - Follows ADR-022 configuration cascade (gem defaults → project config → CLI options)
992
+ - `ace-support-config` gem dependency for configuration resolution
993
+
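A thread pool built from a Queue and a Mutex that still returns results in input order can be sketched as below. This is an illustration of the general technique, not the package's implementation: the method name `run_in_parallel` is hypothetical, and the default of 3 workers simply mirrors the `--parallel` default mentioned above.

```ruby
# Hypothetical sketch: run a block over items with a fixed number of threads.
def run_in_parallel(items, workers: 3, &work)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }

  results = Array.new(items.size) # one slot per item, in input order
  lock = Mutex.new

  threads = Array.new([workers, items.size].min) do
    Thread.new do
      loop do
        job = begin
          queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          nil
        end
        break if job.nil?

        item, index = job
        value = work.call(item)
        # Each result goes into the slot of its original index.
        lock.synchronize { results[index] = value }
      end
    end
  end

  threads.each(&:join)
  results
end
```

Storing each result at the index of its input is what keeps the output in original order even when shorter jobs finish first.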
994
+ ### Changed
995
+
996
+ - SkillPromptBuilder refactored from hardcoded constants to config-driven instance pattern
997
+ - Provider lists and CLI args now loaded from config.yml
998
+ - Class methods delegate to lazily-loaded default instance
999
+ - CLI options `--provider` and `--timeout` now source defaults from ConfigLoader
1000
+ - Test suite expanded to 130 tests (from 109)
1001
+
1002
+ ## [0.6.1] - 2026-02-06
1003
+
1004
+ ### Added
1005
+
1006
+ - Skill-based execution for CLI providers (claude, gemini, codex, opencode)
1007
+ - **SkillPromptBuilder** atom: CLI provider detection, skill/workflow prompt building, required CLI args
1008
+ - **SkillResultParser** atom: Parses subagent return contract markdown, falls back to JSON
1009
+ - CLI provider report delegation — agents write reports via workflow, orchestrator skips ReportWriter
1010
+ - Agent-written report directory discovery in TestOrchestrator
1011
+
1012
+ ### Changed
1013
+
1014
+ - Default provider changed from `google:gemini-2.5-flash` to `claude:sonnet` (skill-aware execution)
1015
+ - TestExecutor split into `execute_via_skill` (CLI providers) and `execute_via_prompt` (API providers)
1016
+ - TestOrchestrator skips ReportWriter for CLI providers, looks for agent-written reports on disk
1017
+ - Test suite expanded to 109 tests (from 71)
1018
+
1019
+ ## [0.6.0] - 2026-02-06
1020
+
1021
+ ### Added
1022
+
1023
+ - `ace-e2e-test` CLI command for executing E2E tests via LLM providers
1024
+ - ATOM architecture components:
1025
+ - **Atoms:** PromptBuilder, ResultParser
1026
+ - **Molecules:** TestDiscoverer, ScenarioParser, TestExecutor (LLM-based), ReportWriter (summary, experience, metadata)
1027
+ - **Organisms:** TestOrchestrator (single and package-wide test execution)
1028
+ - **Models:** TestScenario, TestResult
1029
+ - CLI with dry-cli: `ace-e2e-test PACKAGE [TEST_ID] [OPTIONS]`
1030
+ - `--provider` option for LLM provider selection (default: google:gemini-2.5-flash)
1031
+ - `--cli-args` passthrough for CLI-based LLM providers
1032
+ - `--timeout` option for per-test timeout configuration
1033
+ - Report generation following existing report path contract
1034
+ - `exe/ace-e2e-test` executable and `bin/ace-e2e-test` monorepo wrapper
1035
+ - Comprehensive test suite (71 tests) covering atoms, molecules, organisms, models, and CLI
1036
+ - Injectable executor in TestOrchestrator for testability
1037
+ - `TestResult#with_report_dir` for immutable copy-with-modification
1038
+ - Injectable timestamp generator in TestOrchestrator for testability
1039
+
1040
+ ### Changed
1041
+
1042
+ - Added `dry-cli`, `ace-support-core`, and `ace-llm` as gem dependencies
1043
+
1044
+ ## [0.5.1] - 2026-02-04
1045
+
1046
+ ### Added
1047
+
1048
+ - CLI-Based Testing Requirement section to create-e2e-test workflow documenting that E2E tests must test through CLI interface, not library imports
1049
+
1050
+ ## [0.5.0] - 2026-02-01
1051
+
1052
+ ### Added
1053
+
1054
+ - Sandbox isolation checkpoint with 3-check verification (path, git remotes, project markers)
1055
+ - Standard Setup Script section as authoritative copy-executable source for sandbox setup
1056
+ - Expected Variables documentation (PROJECT_ROOT, TEST_DIR, REPORTS_DIR, TIMESTAMP_ID)
1057
+
1058
+ ### Changed
1059
+
1060
+ - Consolidated sandbox setup by moving `e2e-sandbox-setup.wf.md` into this package
1061
+ - Renamed workflow to `setup-e2e-sandbox.wf.md` following verb-first convention
1062
+ - Updated `run-e2e-test.wf.md` to delegate sandbox setup to `wfi://setup-e2e-sandbox`
1063
+ - Removed ~30 lines of duplicated inline sandbox logic from run-e2e-test workflow
1064
+ - Renamed skill from `ace_e2e-sandbox-setup` to `ace_setup-e2e-sandbox`
1065
+
1066
+ ## [0.4.1] - 2026-01-30
+
+ ### Fixed
+
+ - Updated report path documentation from sibling pattern to subfolder pattern (`-reports/`)
+ - Removed incorrect `artifacts/` subdirectory from test data path examples
+
+ ### Technical
+
+ - Added pre-creation sandbox verification gate to workflow instructions
+ - Enhanced directory structure diagrams for consistency across guides and templates
+
+ ## [0.4.0] - 2026-01-29
+
+ ### Added
+
+ - Parallel E2E test execution with subagents via `/ace:run-e2e-tests` orchestrator skill
+ - Suite-level report aggregation for multi-test runs
+ - Subagent return contract for structured result passing between orchestrator and workers
+
+ ### Changed
+
+ - Enhanced sandbox naming with test ID inclusion (`{timestamp}-{package}-{test-id}/`)
+ - Moved reports outside sandbox as sibling files (`.summary.r.md`, `.experience.r.md`, `.metadata.yml`)
+
+ ### Breaking Changes
+
+ - **Cache directory renamed**: `.cache/test-e2e/` → `.cache/ace-test-e2e/`. External scripts referencing the old path will need updating.
+
+ ## [0.3.0] - 2026-01-29
+
+ ### Added
+
+ - Persistent test reports (`test-report.md`) capturing pass/fail status, test case details, and environment information
+ - Agent experience reports (`agent-experience-report.md`) documenting friction points, root cause analysis, and improvement suggestions
+ - Test execution metadata (`metadata.yml`) storing run-specific details like duration, Git context, and tool versions
+ - ace-taskflow fixture template for standardized taskflow structure creation in E2E tests
+
+ ### Changed
+
+ - Updated test environment structure to use `artifacts/` subdirectory for test data organization
+ - Enhanced E2E testing guidelines with emphasis on error path coverage and negative test cases
+ - Improved test templates with error testing best practices and reviewer checklist
+ - Updated test execution workflow to automatically generate and persist reports at end of each run
+
+ ## [0.2.1] - 2026-01-22
+
+ ### Added
+
+ - Container-based E2E test isolation guide for macOS (Lima, OrbStack support)
+ - Template updates for containerized test scenarios
+
+ ## [0.2.0] - 2026-01-19
+
+ ### Added
+
+ - E2E test management skills for lifecycle orchestration:
+   - `/ace:review-e2e-tests` - Analyze test health, coverage gaps, and outdated scenarios
+   - `/ace:create-e2e-test` - Create new test scenarios from template
+   - `/ace:manage-e2e-tests` - Orchestrate full lifecycle (review, create, run)
+ - Workflow instructions for all three new skills
+ - Protocol source registrations (wfi://, guide://, tmpl://)
+ - PROJECT_ROOT detection in workflow and template
+ - Gem entry point for programmatic access
+ - Expanded best practices section with learnings:
+   - Environment setup guidance (PROJECT_ROOT capture)
+   - Tool version manager workarounds (mise shim handling)
+   - Test data and cleanup patterns
+
+ ### Changed
+
+ - Renamed package from `ace-support-test-manual` to `ace-test-e2e-runner`
+ - Renamed workflow from `run-manual-test` to `run-e2e-test`
+ - Renamed test directory convention from `test/scenarios/` to `test/e2e/`
+ - Renamed cache directory from `.cache/test-manual/` to `.cache/test-e2e/`
+ - Made `PACKAGE` argument optional (defaults to current directory detection)
+ - Made `TEST_ID` argument optional (runs all tests in package when omitted)
+ - Cleanup is now optional and configurable via `cleanup.enabled` setting
+
+ ### Improved
+
+ - Documentation for mise shim workarounds in TC-003
+ - README clarity on package purpose and usage
+
+ ## [0.1.0] - 2026-01-18
+
+ ### Added
+
+ - Initial package structure for E2E test support
+ - Test scenario template (`test-e2e.template.md`)
+ - Workflow for executing E2E tests (`run-e2e-test.wf.md`)
+ - Guide documenting E2E testing conventions (`e2e-testing.g.md`)
+ - Default configuration for test paths and patterns
+ - Skill for invoking E2E tests (`/ace:run-e2e-test`)
+
+
+ ## [0.16.19] - 2026-02-22
+
+ ### Fixed
+ - Added `--help`/`-h` and `--version` flag handling to `ace-test-e2e-sh` (these flags previously caused a FATAL error)
+ - Standardized quiet, verbose, debug option descriptions to canonical strings