ace-test-runner-e2e 0.29.0

Files changed (76)
  1. checksums.yaml +7 -0
  2. data/.ace-defaults/e2e-runner/config.yml +70 -0
  3. data/.ace-defaults/nav/protocols/guide-sources/ace-test-runner-e2e.yml +11 -0
  4. data/.ace-defaults/nav/protocols/skill-sources/ace-test-runner-e2e.yml +19 -0
  5. data/.ace-defaults/nav/protocols/tmpl-sources/ace-test-runner-e2e.yml +12 -0
  6. data/.ace-defaults/nav/protocols/wfi-sources/ace-test-runner-e2e.yml +11 -0
  7. data/CHANGELOG.md +1166 -0
  8. data/LICENSE +21 -0
  9. data/README.md +42 -0
  10. data/Rakefile +15 -0
  11. data/exe/ace-test-e2e +15 -0
  12. data/exe/ace-test-e2e-sh +67 -0
  13. data/exe/ace-test-e2e-suite +13 -0
  14. data/handbook/guides/e2e-testing.g.md +124 -0
  15. data/handbook/guides/scenario-yml-reference.g.md +182 -0
  16. data/handbook/guides/tc-authoring.g.md +131 -0
  17. data/handbook/skills/as-e2e-create/SKILL.md +30 -0
  18. data/handbook/skills/as-e2e-fix/SKILL.md +35 -0
  19. data/handbook/skills/as-e2e-manage/SKILL.md +31 -0
  20. data/handbook/skills/as-e2e-plan-changes/SKILL.md +30 -0
  21. data/handbook/skills/as-e2e-review/SKILL.md +35 -0
  22. data/handbook/skills/as-e2e-rewrite/SKILL.md +31 -0
  23. data/handbook/skills/as-e2e-run/SKILL.md +48 -0
  24. data/handbook/skills/as-e2e-setup-sandbox/SKILL.md +34 -0
  25. data/handbook/templates/ace-taskflow-fixture.template.md +322 -0
  26. data/handbook/templates/agent-experience-report.template.md +89 -0
  27. data/handbook/templates/metadata.template.yml +49 -0
  28. data/handbook/templates/scenario.yml.template.yml +60 -0
  29. data/handbook/templates/tc-file.template.md +45 -0
  30. data/handbook/templates/test-report.template.md +94 -0
  31. data/handbook/workflow-instructions/e2e/analyze-failures.wf.md +126 -0
  32. data/handbook/workflow-instructions/e2e/create.wf.md +395 -0
  33. data/handbook/workflow-instructions/e2e/execute.wf.md +253 -0
  34. data/handbook/workflow-instructions/e2e/fix.wf.md +166 -0
  35. data/handbook/workflow-instructions/e2e/manage.wf.md +179 -0
  36. data/handbook/workflow-instructions/e2e/plan-changes.wf.md +255 -0
  37. data/handbook/workflow-instructions/e2e/review.wf.md +286 -0
  38. data/handbook/workflow-instructions/e2e/rewrite.wf.md +281 -0
  39. data/handbook/workflow-instructions/e2e/run.wf.md +355 -0
  40. data/handbook/workflow-instructions/e2e/setup-sandbox.wf.md +461 -0
  41. data/lib/ace/test/end_to_end_runner/atoms/display_helpers.rb +234 -0
  42. data/lib/ace/test/end_to_end_runner/atoms/prompt_builder.rb +199 -0
  43. data/lib/ace/test/end_to_end_runner/atoms/result_parser.rb +166 -0
  44. data/lib/ace/test/end_to_end_runner/atoms/skill_prompt_builder.rb +166 -0
  45. data/lib/ace/test/end_to_end_runner/atoms/skill_result_parser.rb +244 -0
  46. data/lib/ace/test/end_to_end_runner/atoms/suite_report_prompt_builder.rb +103 -0
  47. data/lib/ace/test/end_to_end_runner/atoms/tc_fidelity_validator.rb +39 -0
  48. data/lib/ace/test/end_to_end_runner/atoms/test_case_parser.rb +108 -0
  49. data/lib/ace/test/end_to_end_runner/cli/commands/run_suite.rb +130 -0
  50. data/lib/ace/test/end_to_end_runner/cli/commands/run_test.rb +156 -0
  51. data/lib/ace/test/end_to_end_runner/models/test_case.rb +47 -0
  52. data/lib/ace/test/end_to_end_runner/models/test_result.rb +115 -0
  53. data/lib/ace/test/end_to_end_runner/models/test_scenario.rb +90 -0
  54. data/lib/ace/test/end_to_end_runner/molecules/affected_detector.rb +92 -0
  55. data/lib/ace/test/end_to_end_runner/molecules/config_loader.rb +75 -0
  56. data/lib/ace/test/end_to_end_runner/molecules/failure_finder.rb +203 -0
  57. data/lib/ace/test/end_to_end_runner/molecules/fixture_copier.rb +35 -0
  58. data/lib/ace/test/end_to_end_runner/molecules/pipeline_executor.rb +121 -0
  59. data/lib/ace/test/end_to_end_runner/molecules/pipeline_prompt_bundler.rb +182 -0
  60. data/lib/ace/test/end_to_end_runner/molecules/pipeline_report_generator.rb +321 -0
  61. data/lib/ace/test/end_to_end_runner/molecules/pipeline_sandbox_builder.rb +131 -0
  62. data/lib/ace/test/end_to_end_runner/molecules/progress_display_manager.rb +172 -0
  63. data/lib/ace/test/end_to_end_runner/molecules/report_writer.rb +259 -0
  64. data/lib/ace/test/end_to_end_runner/molecules/scenario_loader.rb +254 -0
  65. data/lib/ace/test/end_to_end_runner/molecules/setup_executor.rb +181 -0
  66. data/lib/ace/test/end_to_end_runner/molecules/simple_display_manager.rb +72 -0
  67. data/lib/ace/test/end_to_end_runner/molecules/suite_progress_display_manager.rb +223 -0
  68. data/lib/ace/test/end_to_end_runner/molecules/suite_report_writer.rb +277 -0
  69. data/lib/ace/test/end_to_end_runner/molecules/suite_simple_display_manager.rb +116 -0
  70. data/lib/ace/test/end_to_end_runner/molecules/test_discoverer.rb +136 -0
  71. data/lib/ace/test/end_to_end_runner/molecules/test_executor.rb +332 -0
  72. data/lib/ace/test/end_to_end_runner/organisms/suite_orchestrator.rb +830 -0
  73. data/lib/ace/test/end_to_end_runner/organisms/test_orchestrator.rb +442 -0
  74. data/lib/ace/test/end_to_end_runner/version.rb +9 -0
  75. data/lib/ace/test/end_to_end_runner.rb +71 -0
  76. metadata +220 -0
@@ -0,0 +1,286 @@
1
+ ---
2
+ doc-type: workflow
3
+ title: Review E2E Tests Workflow
4
+ purpose: Deep exploration producing a coverage matrix of functionality, unit tests, and E2E tests
5
+ ace-docs:
6
+ last-updated: 2026-03-12
7
+ last-checked: 2026-03-21
8
+ ---
9
+
10
+ # Review E2E Tests Workflow
11
+
12
+ This workflow performs deep exploration of a package to produce a **coverage matrix** mapping functionality to unit test and E2E test coverage. The matrix is the primary input for Stage 2 (planning changes).
13
+
14
+ During review, treat the runner/verifier split as a first-class quality check:
15
+ - Runner must be execution-only (no verdict language).
16
+ - Verifier must be impact-first (sandbox impact before artifacts/debug).
17
+
18
+ **Pipeline position:** Stage 1 of 3 (Explore)
19
+
20
+ ```text
21
+ ace-bundle wfi://e2e/review → ace-bundle wfi://e2e/plan-changes → ace-bundle wfi://e2e/rewrite
22
+ ▶ (explore) ◀                 (decide)                            (execute)
23
+ ```
24
+
25
+ ## Arguments
26
+
27
+ - `PACKAGE` (required) - The package to review (e.g., `ace-lint`)
28
+ - `--scope <scenario-id>` (optional) - Limit review to a single scenario and its related features (e.g., `TS-LINT-001`)
29
+
30
+ ## Workflow Steps
31
+
32
+ ### 1. Identify Scope
33
+
34
+ Validate the package exists and determine review scope:
35
+
36
+ ```bash
37
+ test -d "{PACKAGE}" && echo "Package exists" || echo "Package not found"
38
+ ```
39
+
40
+ If package not found, list available packages:
41
+ ```bash
42
+ ls -d */ | grep -E "^ace-" | sed 's/\/$//'
43
+ ```
44
+
45
+ **Scope determination:**
46
+ - If `--scope <scenario-id>` provided: focus on that scenario and the features it tests
47
+ - Otherwise: full package review
48
+
49
+ ### 2. Inventory Package Functionality
50
+
51
+ Map all user-facing features of the package:
52
+
53
+ **List CLI commands:**
54
+ ```bash
55
+ ls {PACKAGE}/bin/ 2>/dev/null
56
+ ```
57
+
58
+ **List command implementations:**
59
+ ```bash
60
+ find {PACKAGE}/lib -path "*/commands/*.rb" -o -path "*/commands.rb" 2>/dev/null | sort
61
+ ```
62
+
63
+ **For each command, identify key features:**
64
+ - Read the command file to find subcommands, flags, and modes
65
+ - List distinct behaviors (e.g., "lint with autofix", "lint dry-run", "lint specific file")
66
+ - Note external tool dependencies (e.g., StandardRB, Rubocop)
67
+
68
+ **Get unit test baseline:**
69
+ ```bash
70
+ cd {PACKAGE} && ace-test --dry-run 2>/dev/null || echo "No dry-run available"
71
+ ```
72
+
73
+ ```bash
74
+ find {PACKAGE}/test -name "*_test.rb" 2>/dev/null | wc -l
75
+ ```
76
+
77
+ Build a feature inventory:
78
+
79
+ | Feature | Command | External Tools | Description |
80
+ |---------|---------|----------------|-------------|
81
+ | {name} | {CLI command} | {tools or "none"} | {what it does} |
82
+
83
+ ### 3. Inventory Unit Test Coverage
84
+
85
+ Map what unit tests cover at each layer:
86
+
87
+ **List all test files by layer:**
88
+ ```bash
89
+ find {PACKAGE}/test/atoms -name "*_test.rb" 2>/dev/null | sort
90
+ find {PACKAGE}/test/molecules -name "*_test.rb" 2>/dev/null | sort
91
+ find {PACKAGE}/test/organisms -name "*_test.rb" 2>/dev/null | sort
92
+ ```
93
+
94
+ **For each test file:**
95
+ - Read the file to extract test method names (`def test_*` or `it "..."` blocks)
96
+ - Count assertions (`assert_*` calls)
97
+ - Identify which feature/behavior each test covers
98
+
99
+ Build a unit test map:
100
+
101
+ | Test File | Layer | Feature Covered | Test Count | Assertion Count |
102
+ |-----------|-------|-----------------|------------|-----------------|
103
+ | {path} | atom | {feature} | {n} | {n} |
104
+
105
+ ### 4. Inventory Existing E2E Coverage
106
+
107
+ Discover all E2E tests for the package:
108
+
109
+ ```bash
110
+ find {PACKAGE}/test/e2e -name "scenario.yml" -path "*/TS-*" 2>/dev/null | sort
111
+ ```
112
+
113
+ **For each scenario/TC:**
114
+ - Read the file and extract frontmatter metadata:
115
+ - `test-id`, `title`, `area`, `priority`
116
+ - `tags`, `cost-tier`, `e2e-justification`, `unit-coverage-reviewed`
117
+ - `last-verified`, `verified-by`
118
+ - Extract the objective (what the TC verifies)
119
+ - Identify which CLI commands the TC runs
120
+ - Count verification steps (PASS/FAIL checks)
121
+ - Map to the feature it tests
122
+ - Mark TC evidence status:
123
+ - `complete` when `e2e-justification` is present and `unit-coverage-reviewed` has at least one path
124
+ - `missing` otherwise
125
+
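The evidence rule above can be sketched in Ruby. This is a minimal illustration, assuming the TC metadata has already been parsed into a plain hash keyed by the field names listed above; the helper names `missing_evidence_fields` and `evidence_status` are hypothetical and not part of the package.

```ruby
require "yaml"

# Hypothetical helper: lists which decision-evidence fields are absent or empty.
def missing_evidence_fields(meta)
  fields = []
  fields << "e2e-justification" if meta["e2e-justification"].to_s.strip.empty?
  reviewed = Array(meta["unit-coverage-reviewed"]).map { |p| p.to_s.strip }.reject(&:empty?)
  fields << "unit-coverage-reviewed" if reviewed.empty?
  fields
end

# "complete" only when the justification is present and at least one path is listed.
def evidence_status(meta)
  missing_evidence_fields(meta).empty? ? "complete" : "missing"
end

# Illustrative metadata; the values are made up for this example.
frontmatter = YAML.safe_load(<<~DOC)
  test-id: TC-001
  e2e-justification: requires a real subprocess
  unit-coverage-reviewed:
    - test/atoms/example_test.rb
DOC

puts evidence_status(frontmatter) # prints "complete"
```

The list returned by `missing_evidence_fields` can also feed a "Missing Fields" column when the evidence status is reported per TC.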
126
+ If `--scope` was provided, filter to only the specified scenario.
127
+
128
+ Build an E2E test map:
129
+
130
+ | TC ID | Title | CLI Command | Feature Tested | Verifications | Tags | Cost Tier | E2E Justification | Unit Coverage Reviewed | Evidence |
131
+ |-------|-------|-------------|----------------|---------------|------|-----------|-------------------|------------------------|----------|
132
+ | {id} | {title} | {command} | {feature} | {n} | {tags} | {tier} | {reason or "(missing)"} | {files or "(missing)"} | {complete/missing} |
133
+
134
+ ### 5. Build Coverage Matrix
135
+
136
+ Combine the three inventories into a single coverage matrix:
137
+
138
+ **Matrix structure:**
139
+ - **Rows:** Features/behaviors from step 2
140
+ - **Columns:** Unit Tests (atoms/molecules/organisms) | E2E Tests
141
+ - **Cells:** Test file references + counts, or "none"
142
+
143
+ ```markdown
144
+ ### Coverage Matrix
145
+
146
+ | Feature | Unit Tests | E2E Tests | Status |
147
+ |---------|-----------|-----------|--------|
148
+ | {feature} | {test files} ({n} assertions) | {TC IDs} ({n} verifications) | Covered |
149
+ | {feature} | {test files} ({n} assertions) | none | Unit-only |
150
+ | {feature} | none | {TC IDs} ({n} verifications) | E2E-only |
151
+ | {feature} | {test files} ({n} assertions) | {TC IDs} ({n} verifications) | Overlap |
152
+ | {feature} | none | none | Gap |
153
+ ```
154
+
155
+ **Classify each row:**
156
+ - **Covered** — Both unit and E2E tests exist, and they test different aspects (unit tests logic, E2E tests CLI pipeline)
157
+ - **Unit-only** — Unit tests cover this but no E2E test exists. May or may not need E2E depending on Value Gate.
158
+ - **E2E-only** — E2E test exists but no unit test. Valid if the behavior is inherently E2E (subprocess execution, filesystem discovery).
159
+ - **Overlap** — Both unit and E2E test the same assertions. E2E TC is a candidate for removal.
160
+ - **Gap** — Neither unit nor E2E test covers this feature. Needs investigation.
161
+
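The row classification above reduces to a small decision function. The sketch below is illustrative only: `coverage_status` and its keyword arguments are hypothetical names, and the `same_assertions` flag stands in for the reviewer's judgement on whether unit and E2E tests check the same thing, which cannot be derived mechanically from file lists.

```ruby
# Hypothetical classifier for one coverage-matrix row.
# unit_tests / e2e_tcs: arrays of test file or TC references (may be empty).
# same_assertions: reviewer's judgement that both layers assert the same behavior.
def coverage_status(unit_tests:, e2e_tcs:, same_assertions: false)
  has_unit = !unit_tests.empty?
  has_e2e = !e2e_tcs.empty?

  return "Gap" unless has_unit || has_e2e
  return "Unit-only" unless has_e2e
  return "E2E-only" unless has_unit

  same_assertions ? "Overlap" : "Covered"
end

puts coverage_status(unit_tests: ["test/atoms/example_test.rb"], e2e_tcs: []) # prints "Unit-only"
```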
162
+ ### 6. Generate Review Report
163
+
164
+ Produce the full review report with actionable findings:
165
+
166
+ ```markdown
167
+ ## E2E Coverage Review: {package}
168
+
169
+ **Reviewed:** {timestamp}
170
+ **Scope:** {package-wide or scenario-id}
171
+ **Workflow version:** 2.1
172
+
173
+ ### Summary
174
+
175
+ | Metric | Count |
176
+ |--------|-------|
177
+ | Package features | {n} |
178
+ | Unit test files | {n} |
179
+ | Unit assertions | {n} |
180
+ | E2E scenarios | {n} |
181
+ | E2E test cases | {n} |
182
+ | TCs with decision evidence | {n}/{total} |
183
+
184
+ ### Coverage Matrix
185
+
186
+ {full matrix table from step 5}
187
+
188
+ ### Overlap Analysis
189
+
190
+ TCs that may fail the E2E Value Gate (unit tests cover the same behavior):
191
+
192
+ | TC ID | Feature | Overlapping Unit Tests | Recommendation |
193
+ |-------|---------|----------------------|----------------|
194
+ | {id} | {feature} | {test files} | Remove — unit tests cover this fully |
195
+ | {id} | {feature} | {test files} | Keep — TC tests CLI pipeline, units test logic |
196
+
197
+ **Candidates for removal:** {n} TCs have full overlap with unit tests
198
+
199
+ ### E2E Decision Record Coverage
200
+
201
+ | TC ID | Evidence Status | Missing Fields |
202
+ |-------|------------------|----------------|
203
+ | {id} | complete | none |
204
+ | {id} | missing | e2e-justification, unit-coverage-reviewed |
205
+
206
+ **Action:** Any TC with missing evidence should be updated in `scenario.yml` during the next rewrite cycle.
207
+
208
+ ### Gap Analysis
209
+
210
+ Features with no E2E coverage that may need it:
211
+
212
+ | Feature | External Tools | Unit Coverage | E2E Needed? |
213
+ |---------|---------------|---------------|-------------|
214
+ | {feature} | {tools} | {yes/no} | {yes — requires real subprocess / no — unit tests sufficient} |
215
+
216
+ ### Health Status
217
+
218
+ | TC ID | Last Verified | Status |
219
+ |-------|---------------|--------|
220
+ | {id} | {date} | Healthy / Outdated / Never verified |
221
+
222
+ **Outdated (> 30 days):** {n} TCs
223
+ **Never verified:** {n} TCs
224
+
225
+ ### Consolidation Opportunities
226
+
227
+ TCs sharing the same CLI invocation that could be merged:
228
+
229
+ | CLI Command | TCs | Merged Assertions |
230
+ |-------------|-----|-------------------|
231
+ | {command} | {tc-a}, {tc-b} | {n} total verifications → {n} consolidated |
232
+
233
+ ### Recommendations
234
+
235
+ 1. {Priority recommendation based on overlap analysis}
236
+ 2. {Recommendation based on gap analysis}
237
+ 3. {Recommendation based on health status}
238
+
239
+ ### Next Step
240
+
241
+ Run `ace-bundle wfi://e2e/plan-changes` to generate a concrete change plan.
242
+ ```
243
+
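The Health Status values in the report template above can be derived from each TC's `last-verified` field. The following is a rough sketch, assuming the field is either absent, an ISO date string, or an already-parsed `Date`; `health_status` and `OUTDATED_AFTER_DAYS` are hypothetical names, and the 30-day threshold is taken from the template.

```ruby
require "date"

# Threshold from the report template: a TC is outdated after more than 30 days.
OUTDATED_AFTER_DAYS = 30

# Hypothetical helper: maps a last-verified value to a Health Status label.
def health_status(last_verified, today: Date.today)
  return "Never verified" if last_verified.nil? || last_verified.to_s.strip.empty?

  verified_on = last_verified.is_a?(Date) ? last_verified : Date.parse(last_verified.to_s)
  (today - verified_on).to_i > OUTDATED_AFTER_DAYS ? "Outdated" : "Healthy"
end

puts health_status("2026-03-12", today: Date.new(2026, 3, 21)) # prints "Healthy"
```

Passing `today:` explicitly keeps the check reproducible when a report is regenerated later.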
244
+ ## Example Invocations
245
+
246
+ **Review a package:**
247
+ ```bash
248
+ ace-bundle wfi://e2e/review
249
+ ```
250
+
251
+ **Review a single scenario:**
252
+ ```bash
253
+ ace-bundle wfi://e2e/review
254
+ ```
255
+
256
+ ## Error Handling
257
+
258
+ ### No Tests Found
259
+
260
+ If the package has no E2E tests:
261
+ ```
262
+ No E2E tests found for {package}.
263
+
264
+ Unit test inventory was still performed. The package has {n} unit test files
265
+ with {n} assertions covering {n} features.
266
+
267
+ To create the first E2E test: `ace-bundle wfi://e2e/create`
268
+ ```
269
+
270
+ ### No Unit Tests Found
271
+
272
+ If the package has no unit tests:
273
+ ```
274
+ Warning: No unit tests found for {package}. Coverage matrix will only show E2E coverage.
275
+ Consider adding unit tests before expanding E2E coverage.
276
+ ```
277
+
278
+ ### Package Not Found
279
+
280
+ If the package directory doesn't exist:
281
+ ```
282
+ Package '{package}' not found.
283
+
284
+ Available packages:
285
+ {list of ace-* directories}
286
+ ```
@@ -0,0 +1,281 @@
1
+ ---
2
+ doc-type: workflow
3
+ title: Rewrite E2E Tests Workflow
4
+ purpose: Execute a change plan — delete, create, modify, and consolidate E2E test scenarios
5
+ ace-docs:
6
+ last-updated: 2026-03-12
7
+ last-checked: 2026-03-21
8
+ ---
9
+
10
+ # Rewrite E2E Tests Workflow
11
+
12
+ This workflow executes an approved change plan by deleting old scenarios, creating new ones, modifying existing ones, and consolidating overlapping TCs.
13
+
14
+ **Pipeline position:** Stage 3 of 3 (Execute)
15
+
16
+ ```text
17
+ ace-bundle wfi://e2e/review → ace-bundle wfi://e2e/plan-changes → ace-bundle wfi://e2e/rewrite
18
+ (explore)                     (decide)                            ▶ (execute) ◀
19
+ ```
20
+
21
+ **Difference from `ace-bundle wfi://e2e/create`:** The create workflow is for standalone creation ("I need a new E2E test for feature X") with no prior analysis. The rewrite workflow is plan-driven: it operates from a structured change plan, handles deletions and modifications, and can replace entire suites.
22
+
23
+ ## Arguments
24
+
25
+ - `PACKAGE` (required) - The package to rewrite tests for (e.g., `ace-lint`)
26
+ - `--plan <path>` (optional) - Path to change plan from Stage 2. If omitted, runs Stages 1+2 first.
27
+ - `--dry-run` (optional) - Show what would change without writing files.
28
+
29
+ ## Canonical Conventions
30
+
31
+ - Keep scenario IDs in `TS-<PACKAGE_SHORT>-<NNN>[-slug]`
32
+ - Keep standalone pairs as `TC-*.runner.md` + `TC-*.verify.md`
33
+ - Keep TC artifact outputs under `results/tc/{NN}/`
34
+ - Keep summary report fields as `tcs-passed`, `tcs-failed`, `tcs-total`, `failed[].tc`
35
+ - CLI split reminder:
36
+ - `ace-test-e2e` runs single-package tests
37
+ - `ace-test-e2e-suite` runs suite-level tests
38
+
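The scenario ID convention above can be checked with a regular expression. This is a sketch under assumptions the source does not state: the package short name is taken to be uppercase alphanumerics, `NNN` exactly three digits, and the optional slug lowercase words joined by hyphens. `SCENARIO_ID` and `parse_scenario_id` are hypothetical names.

```ruby
# Assumed shape of TS-<PACKAGE_SHORT>-<NNN>[-slug]; character classes are a guess.
SCENARIO_ID = /\ATS-(?<package>[A-Z][A-Z0-9]*)-(?<number>\d{3})(?:-(?<slug>[a-z0-9]+(?:-[a-z0-9]+)*))?\z/

# Returns the parts of a scenario ID, or nil when the name does not fit the convention.
def parse_scenario_id(name)
  match = SCENARIO_ID.match(name)
  return nil unless match

  { package: match[:package], number: match[:number], slug: match[:slug] }
end

p parse_scenario_id("TS-LINT-001")
```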
39
+ ## Rewrite Contract
40
+
41
+ - Normalize runner files to execution-only language.
42
+ - Normalize verifier files to verdict-only, impact-first validation.
43
+ - Keep setup concerns in `scenario.yml` and fixtures, not in TC runner setup sections.
44
+
45
+ ## Workflow Steps
46
+
47
+ ### 1. Load Change Plan
48
+
49
+ **If `--plan` provided:**
50
+ Read the file at the given path. Verify it contains the expected sections: REMOVE, KEEP, MODIFY, CONSOLIDATE, ADD, and Proposed Scenario Structure.
51
+
52
+ **If no plan:**
53
+ Run the full pipeline:
54
+ 1. Load `ace-bundle wfi://e2e/review` → capture review report
55
+ 2. Load `ace-bundle wfi://e2e/plan-changes` → capture change plan
56
+ 3. Present the plan to the user for confirmation before proceeding
57
+
58
+ Parse the plan into structured actions:
59
+ - List of TCs to REMOVE (with file paths)
60
+ - List of TCs to KEEP (no action needed)
61
+ - List of TCs to MODIFY (with change descriptions)
62
+ - List of CONSOLIDATE groups (source TCs → target TC)
63
+ - List of new TCs to ADD (with scenario assignments)
64
+ - Proposed scenario structure
65
+
66
+ **If `--dry-run`:** After loading the plan, skip to step 6 (Verify Result) and report what would change without writing files.
67
+
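The section check in this step could look like the sketch below. The on-disk plan format is not shown here, so the sketch assumes each required section appears as a markdown heading whose text starts with the section name; `REQUIRED_SECTIONS` and `missing_plan_sections` are hypothetical names.

```ruby
# Section names required by this workflow.
REQUIRED_SECTIONS = [
  "REMOVE", "KEEP", "MODIFY", "CONSOLIDATE", "ADD", "Proposed Scenario Structure"
].freeze

# Hypothetical check: returns the required sections with no matching heading.
# Assumes sections are markdown headings that begin with the section name.
def missing_plan_sections(plan_text)
  headings = plan_text.each_line
                      .select { |line| line.start_with?("#") }
                      .map { |line| line.sub(/\A#+\s*/, "").strip.downcase }
  REQUIRED_SECTIONS.reject do |section|
    headings.any? { |heading| heading.start_with?(section.downcase) }
  end
end

plan = "## REMOVE\n- TC-003\n## KEEP\n## MODIFY\n## ADD\n"
missing = missing_plan_sections(plan)
puts "Plan file is missing required sections: #{missing.join(', ')}" unless missing.empty?
```

An empty result means the plan can be parsed into actions; a non-empty one maps onto the "Invalid Plan Format" error.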
68
+ ### 2. Delete Removed Scenarios/TCs
69
+
70
+ For each TC classified as REMOVE:
71
+
72
+ **Entire scenario removal** (all TCs in a scenario are REMOVE):
73
+ ```bash
74
+ rm -rf {PACKAGE}/test/e2e/{scenario-dir}/
75
+ ```
76
+
77
+ **Individual TC removal** (some TCs in a scenario survive):
78
+ ```bash
79
+ rm {PACKAGE}/test/e2e/{scenario-dir}/{tc-file}.runner.md
80
+ rm {PACKAGE}/test/e2e/{scenario-dir}/{tc-file}.verify.md
81
+ ```
82
+
83
+ After all deletions, check if any scenario directories are now empty:
84
+ ```bash
85
+ # List the TC runner files that remain; a scenario directory missing from this output has no TCs left
86
+ find {PACKAGE}/test/e2e/TS-* -maxdepth 1 -name "TC-*.runner.md" 2>/dev/null | sort
87
+ ```
88
+
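To get the empty scenario directories directly rather than inferring them from a file listing, a short Ruby helper can be used. This is a minimal sketch, assuming scenario directories are named `TS-*` and TC runner files `TC-*.runner.md` as in the commands above; `scenarios_without_tcs` is a hypothetical name, and the helper only reports, it deletes nothing.

```ruby
# Hypothetical helper: returns TS-* directories under e2e_root that contain
# no TC-*.runner.md file. Read-only; removal stays a separate, explicit step.
def scenarios_without_tcs(e2e_root)
  Dir.glob(File.join(e2e_root, "TS-*"))
     .select { |path| File.directory?(path) }
     .select { |dir| Dir.glob(File.join(dir, "TC-*.runner.md")).empty? }
     .sort
end
```

For example, `scenarios_without_tcs("ace-lint/test/e2e")` would return the directories that are candidates for the `rm -rf` step that follows.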
89
+ Remove empty scenario directories (no TCs left):
90
+ ```bash
91
+ rm -rf {PACKAGE}/test/e2e/{empty-scenario-dir}/
92
+ ```
93
+
94
+ Stage the deletions:
95
+ ```bash
96
+ git add {PACKAGE}/test/e2e/
97
+ ```
98
+
99
+ ### 3. Create New Scenarios and TCs
100
+
101
+ For each ADD group in the proposed scenario structure:
102
+
103
+ **Create scenario directory:**
104
+ ```bash
105
+ mkdir -p {PACKAGE}/test/e2e/TS-{AREA}-{NNN}-{slug}
106
+ ```
107
+
108
+ **Write `scenario.yml`:**
109
+ Include metadata, setup directives, tags, and fixture requirements. Follow the existing scenario.yml format from other scenarios in the package. Preserve existing `tags` when modifying scenarios; add `tags: [{cost-tier}, "use-case:{area}"]` to new scenarios.
110
+
111
+ **Write fixture files** (if needed):
112
+ Create test data files in the scenario's `fixtures/` directory.
113
+
114
+ **Write TC files:**
115
+ For each TC in the scenario, create paired files:
116
+ - `TC-{NNN}-{slug}.runner.md`
117
+ - `TC-{NNN}-{slug}.verify.md`
118
+
119
+ Follow the E2E test writing rules:
120
+
121
+ - **Run the tool first** to verify actual behavior before writing assertions
122
+ - Apply the E2E Value Gate — every TC must require real CLI binary + external tools + filesystem I/O
123
+ - Use `&& echo "PASS" || echo "FAIL"` patterns for every verification step
124
+ - Follow TC ordering: error paths first, happy path, structure verification, lifecycle, end state
125
+ - Consolidate assertions sharing the same CLI invocation into a single TC
126
+ - Target 2-5 TCs per scenario
127
+ - Test through the CLI interface, not library imports
128
+
129
+ **Load the TC template for reference:**
130
+ ```bash
131
+ ace-bundle tmpl://test-e2e
132
+ ```
133
+
134
+ ### 4. Modify Existing TCs
135
+
136
+ For each TC classified as MODIFY:
137
+
138
+ 1. Read the current TC runner/verifier pair
139
+ 2. Apply the changes specified in the plan:
140
+ - **Update assertions** — if source code changed, run the tool to observe new behavior, then update expected output
141
+ - **Narrow scope** — remove assertions that unit tests cover, keep only E2E-exclusive checks
142
+ - **Broaden scope** — add assertions for related behavior tested by the same CLI invocation
143
+ - **Fix structure** — add missing sections, fix formatting issues
144
+ 3. Update the `last-verified` field if the TC was re-run during modification
145
+ 4. Write the updated TC runner/verifier files
146
+
147
+ ### 5. Consolidate TCs
148
+
149
+ For each CONSOLIDATE group:
150
+
151
+ 1. Read all source TC runner/verifier pairs in the group
152
+ 2. Identify the target TC (the one that survives)
153
+ 3. Merge assertion steps from source TCs into the target TC:
154
+ - Combine verification steps under the shared CLI invocation
155
+ - Preserve all unique assertions
156
+ - Remove duplicate assertions
157
+ - Maintain the PASS/FAIL pattern for each verification
158
+ 4. Write the updated target TC
159
+ 5. Delete source TC runner/verifier files (except the target pair)
160
+ 6. Verify the consolidated TC count stays within 2-5 per scenario
161
+
162
+ ### 6. Verify Result
163
+
164
+ After all changes are applied (or in `--dry-run` mode, report what would happen):
165
+
166
+ **List all remaining E2E test files:**
167
+ ```bash
168
+ find {PACKAGE}/test/e2e -name "scenario.yml" -o -name "runner.yml.md" -o -name "verifier.yml.md" -o -name "TC-*.runner.md" -o -name "TC-*.verify.md" 2>/dev/null | sort
169
+ ```
170
+
171
+ **Verify counts match the plan:**
172
+ - Scenario count matches proposed scenario structure
173
+ - TC count per scenario matches plan
174
+ - Total TC count matches plan's "Proposed" column
175
+
176
+ **Check for stale references:**
177
+ - Grep for references to deleted TC IDs in remaining files
178
+ - Verify no broken cross-references between TCs
179
+
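The stale-reference check can be sketched as a scan over the remaining files. The example assumes TC IDs look like `TC-002` (a guess at the ID format) and treats any textual occurrence not followed by another digit as a reference; `stale_references` is a hypothetical name.

```ruby
# Hypothetical scan: maps each deleted TC ID to the remaining files that mention it.
def stale_references(e2e_root, deleted_ids)
  hits = {}
  Dir.glob(File.join(e2e_root, "**", "*")).sort.each do |path|
    next unless File.file?(path)

    text = File.read(path).scrub # tolerate invalid byte sequences
    deleted_ids.each do |id|
      # (?!\d) keeps "TC-001" from matching inside "TC-0010"
      next unless text.match?(/#{Regexp.escape(id)}(?!\d)/)

      (hits[id] ||= []) << path
    end
  end
  hits
end
```

An empty hash corresponds to "No stale references: yes" in the verification checklist.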
180
+ **Check scenario health:**
181
+ - Each scenario has 2-5 TCs
182
+ - Each scenario has a valid `scenario.yml`
183
+ - No empty scenario directories
184
+
185
+ ### 7. Report Summary
186
+
187
+ Present the execution summary:
188
+
189
+ ```markdown
190
+ ## E2E Rewrite Summary: {package}
191
+
192
+ **Executed:** {timestamp}
193
+ **Plan:** {plan path or "inline"}
194
+ **Mode:** {execute or dry-run}
195
+
196
+ ### Changes Applied
197
+
198
+ | Action | Count | Details |
199
+ |--------|-------|---------|
200
+ | Deleted | {n} TCs | {list of removed TC IDs} |
201
+ | Created | {n} TCs | {list of new TC IDs} |
202
+ | Modified | {n} TCs | {list of modified TC IDs} |
203
+ | Consolidated | {n} → {n} TCs | {consolidation summary} |
204
+ | Kept | {n} TCs | (unchanged) |
205
+
206
+ ### Files Changed
207
+
208
+ **Created:**
209
+ - {file path}
210
+ - {file path}
211
+
212
+ **Modified:**
213
+ - {file path}
214
+
215
+ **Deleted:**
216
+ - {file path}
217
+
218
+ ### Final State
219
+
220
+ | Metric | Before | After |
221
+ |--------|--------|-------|
222
+ | Scenarios | {n} | {n} |
223
+ | Test Cases | {n} | {n} |
224
+
225
+ ### Verification
226
+
227
+ - [ ] Scenario count matches plan: {yes/no}
228
+ - [ ] TC count matches plan: {yes/no}
229
+ - [ ] No stale references: {yes/no}
230
+ - [ ] All scenarios have 2-5 TCs: {yes/no}
231
+
232
+ ### Next Steps
233
+
234
+ 1. Review the created/modified TC files
235
+ 2. Run `ace-test-e2e {PACKAGE} {TEST_ID}` for the scenarios you want to verify
236
+ 3. Commit changes with `ace-git-commit`
237
+ ```
238
+
239
+ ## Example Invocations
240
+
241
+ **Execute a pre-approved plan:**
242
+ ```bash
243
+ ace-bundle wfi://e2e/rewrite
244
+ ```
245
+
246
+ **Run full pipeline (review → plan → rewrite):**
247
+ ```bash
248
+ ace-bundle wfi://e2e/rewrite
249
+ ```
250
+
251
+ **Dry-run to preview changes:**
252
+ ```bash
253
+ ace-bundle wfi://e2e/rewrite
254
+ ```
255
+
256
+ ## Error Handling
257
+
258
+ ### Invalid Plan Format
259
+
260
+ If the plan file is missing required sections:
261
+ ```
262
+ Plan file is missing required sections: {missing sections}
263
+
264
+ Expected sections: REMOVE, KEEP, MODIFY, CONSOLIDATE, ADD, Proposed Scenario Structure
265
+ Re-run `ace-bundle wfi://e2e/plan-changes` to generate a valid plan.
266
+ ```
267
+
268
+ ### File Conflicts
269
+
270
+ If a file to be created already exists:
271
+ 1. Compare the existing file with the planned content
272
+ 2. If different: warn the user and ask whether to overwrite
273
+ 3. If identical: skip (already applied)
274
+
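The conflict rule above amounts to a three-way decision per planned file. A minimal sketch follows; `planned_write_action` and the returned symbols are hypothetical, and prompting the user is left to the caller.

```ruby
# Hypothetical decision helper for one planned file.
# :create            -> file does not exist yet, safe to write
# :skip              -> existing content is identical (already applied)
# :confirm_overwrite -> content differs, warn the user and ask before overwriting
def planned_write_action(path, planned_content)
  return :create unless File.exist?(path)

  File.read(path) == planned_content ? :skip : :confirm_overwrite
end
```

Because identical files are skipped, re-running after a partial execution does not rewrite actions that already completed.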
275
+ ### Partial Execution
276
+
277
+ If execution fails partway through:
278
+ 1. Report which actions completed and which failed
279
+ 2. Do not attempt to roll back completed actions
280
+ 3. Show the state of `{PACKAGE}/test/e2e/` after partial execution
281
+ 4. Suggest re-running with the remaining actions