ace-test-runner-e2e 0.29.6 → 0.29.8

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4175ed580aa89e48e44da80f69a528dce356d694ea98a9f7a37f3ca1df6795ab
- data.tar.gz: 1400f526eecf489e0dbbcf018a4ca54cb45266bae8832a358fe24be7774d5fa3
+ metadata.gz: 6206e4d6f65fe1ab5c27d1b5479e37af079b3554bd6289786aa71ce62e4ecf50
+ data.tar.gz: bedb5fa2830bc1f2818e2246acbbab15ec6a7227898bf4dba2b23b6521eb8d5b
  SHA512:
- metadata.gz: ec6ae0219f17b02f9a8613e063a4214b1b71f8668845a90a34f57662e0a580cdd0bd1264b6f86a12fba4dced58b6e56e1ceb3e0a77beef895a23369f5c1c0a93
- data.tar.gz: 7031095393272358f64d89460af7c93b8585012227bbaba0ca1c512c07a9f40da7d2f9ef1fba491ab63626e07ebf38b6eb40a9304e47cdb4b6b9f1a42c8ec14f
+ metadata.gz: 9d06bc8d9447debe2b48128b7c45ea0a357d01677b9b8b48fa508c5f8a078e8b4ed0396a4d4d38c06be25573701d4bd75b6f481438d116e3499b4b1890a9edd5
+ data.tar.gz: b148663600b83ffde9821761a1ef4a7b43d11a2efc10d97433bb090d4ae2bfc8f7dd997756190cdafc894c497f1eada99faf3b78dcf7df38ed7eec9d57cedec3
data/CHANGELOG.md CHANGED
@@ -7,6 +7,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

  ## [Unreleased]

+ ## [0.29.8] - 2026-04-01
+
+ ### Fixed
+ - Replaced process-global `Dir.chdir` in pipeline LLM execution with explicit `working_dir` threading to avoid parallel scenario crashes (`RuntimeError: conflicting chdir during another chdir block`).
+
+ ### Changed
+ - **ace-monorepo-e2e**: Added stronger command/output evidence gates to `TS-MONO-001-rubygems-install` and `TS-MONO-002-quickstart-local` so local sandbox installs and quick-start workflow checks validate real CLI behavior, output, and exit status rather than directory/file presence alone.
+ - **ace-monorepo-e2e**: Updated `ace-test-runner-e2e` workflow instructions and scenario template defaults to reduce false-positive E2E tests through command-level evidence, false-positive risk tagging, and duplicate-command consolidation rules.
+
  ## [0.29.6] - 2026-04-01

  ### Fixed
@@ -23,6 +23,12 @@ tags: [{cost-tier}, "use-case:{area}"]
  # Optional: Why this scenario must be E2E (not unit-only)
  e2e-justification: "{Requires real CLI/tools/filesystem behavior}"

+ # Optional: Evidence quality target for review coverage (`command-output`, `state+content`, `existence-only`)
+ e2e-evidence-strength: command-output
+
+ # Optional: False-positive risk estimate (`low`, `medium`, `high`)
+ e2e-false-positive-risk: low
+
  # Optional: Unit test files reviewed during Value Gate analysis
  unit-coverage-reviewed:
  - test/{layer}/{file}_test.rb
@@ -163,7 +163,28 @@ All proposed behaviors are already covered by unit tests in {PACKAGE}/test/.
  No E2E test needed. Consider adding unit tests instead if coverage gaps exist.
  ```

- ### 7a. E2E Decision Record (Required)
+ ### 7a. Evidence-Gate Review Before Writing Files
+
+ Before finalizing the test plan, block weak coverage patterns:
+ - **Existence-only TC**:
+   - only checks directory/file existence
+   - no command output/content assertion
+   - missing `*.exit` capture for the executed command
+ - **Duplicate-invocation TC**:
+   - same command invocation, same purpose, split across multiple TCs
+
+ | TC ID | Decision (KEEP/ADD/SKIP) | Evidence Strength | E2E-only reason | Unit tests reviewed |
+ |-------|---------------------------|------------------|-----------------|--------------------|
+ | {tc-id} | {decision} | `command-output` | {why this needs real CLI/tools/fs} | {path1,path2} |
+
+ Rules:
+ - `existence-only` is never valid for KEEP/ADD. Use it only for SKIP rows with explicit unit-test replacement.
+ - `SKIP` rows must include replacement unit-test evidence.
+ - Non-skipped rows must include command-level artifacts (`stdout`, `stderr`, `exit`, and/or explicit proof files).
+ - At least one `unit tests reviewed` path is required for every row.
+ - The scenario-level `unit-coverage-reviewed` field must include the union of all referenced unit test files.
+
+ ### 7b. E2E Decision Record (Required)

  Before writing files, produce a decision record table for every candidate TC:

@@ -205,11 +226,13 @@ If a context description was provided, enhance the test with:
  - Verify actual file paths by running the tool first — never hardcode paths from documentation or assumptions
  - Use explicit `&& echo "PASS" || echo "FAIL"` patterns for every verification step
  - Check specific exit codes for error commands (not just "non-zero")
+ - Add at least one output-content assertion for each command being verified

  **SHOULD (strongly recommended):**
  - Test the real user journey — structure TCs as a sequential workflow, not isolated commands
  - Verify exit codes for all commands, not just error cases
  - Include negative assertions (files/directories that should NOT exist)
+ - Capture and retain command output for all assertions (`stdout`, `stderr`, and `*.exit`)
  - Capture and check CLI output content, not just exit codes
  - Verify that status values match actual implementation (e.g., `done` vs `completed`)

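The evidence rules in this hunk (retain `stdout`, `stderr`, and `*.exit`, then assert on content and on a specific exit code) can be sketched in Ruby roughly as follows. This is a minimal illustration, not the gem's runner: `capture_evidence`, `check`, and the `<name>.stdout` / `<name>.stderr` / `<name>.exit` file layout are hypothetical names chosen to mirror the artifact suffixes mentioned in the diff.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

# Hypothetical helper: run a command and persist its output and exit status
# as <name>.stdout, <name>.stderr and <name>.exit inside results_dir.
def capture_evidence(results_dir, name, *cmd)
  stdout, stderr, status = Open3.capture3(*cmd)
  File.write(File.join(results_dir, "#{name}.stdout"), stdout)
  File.write(File.join(results_dir, "#{name}.stderr"), stderr)
  File.write(File.join(results_dir, "#{name}.exit"), status.exitstatus.to_s)
  { stdout: stdout, stderr: stderr, exit: status.exitstatus }
end

# Hypothetical helper: print an explicit PASS/FAIL line for one check.
def check(label, passed)
  puts "#{passed ? 'PASS' : 'FAIL'}: #{label}"
  passed
end

evidence_dir = Dir.mktmpdir("evidence")

# Stand-in commands: the current Ruby interpreter plays the CLI under test.
ok  = capture_evidence(evidence_dir, "ok", RbConfig.ruby, "-e", "puts 'hello'")
bad = capture_evidence(evidence_dir, "bad", RbConfig.ruby, "-e", "warn 'boom'; exit 3")

check("ok exits 0", ok[:exit] == 0)
check("ok prints expected content", ok[:stdout].include?("hello"))
check("bad exits with the specific code 3", bad[:exit] == 3)
check("bad reports the error on stderr", bad[:stderr].include?("boom"))
```

Writing the artifacts before asserting means a later verifier can re-read the files instead of re-running the command, which is one way to keep several assertions tied to a single invocation.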
@@ -392,4 +415,4 @@ Area codes must be:
  - 2-10 characters
  - Alphanumeric only
  - Will be converted to uppercase
- ```
+ ```
@@ -117,19 +117,21 @@ find {PACKAGE}/test/e2e -name "scenario.yml" -path "*/TS-*" 2>/dev/null | sort
  - `last-verified`, `verified-by`
  - Extract the objective (what the TC verifies)
  - Identify which CLI commands the TC runs
+ - Record command fingerprint (`command + key flags`) for each command assertion
  - Count verification steps (PASS/FAIL checks)
  - Map to the feature it tests
  - Mark TC evidence status:
-   - `complete` when `e2e-justification` is present and `unit-coverage-reviewed` has at least one path
+   - `complete` when `e2e-justification` is present, command artifacts are present, and `unit-coverage-reviewed` has at least one path
    - `missing` otherwise
+   - `at-risk` when evidence is existence-only or duplicate command invocations are detected

  If `--scope` was provided, filter to only the specified scenario.

  Build an E2E test map:

- | TC ID | Title | CLI Command | Feature Tested | Verifications | Tags | Cost Tier | E2E Justification | Unit Coverage Reviewed | Evidence |
+ | TC ID | Title | Command Invocations | Feature Tested | Verifications | Tags | Cost Tier | E2E Justification | Unit Coverage Reviewed | Evidence | False-Positive Risk |
  |-------|-------|-------------|----------------|---------------|------|-----------|-------------------|------------------------|----------|
- | {id} | {title} | {command} | {feature} | {n} | {tags} | {tier} | {reason or "(missing)"} | {files or "(missing)"} | {complete/missing} |
+ | {id} | {title} | {command list} | {feature} | {n} | {tags} | {tier} | {reason or "(missing)"} | {files or "(missing)"} | {complete/missing/at-risk} | {low/medium/high} |

  ### 5. Build Coverage Matrix

@@ -143,13 +145,13 @@ Combine the three inventories into a single coverage matrix:
  ```markdown
  ### Coverage Matrix

- | Feature | Unit Tests | E2E Tests | Status |
- |---------|-----------|-----------|--------|
- | {feature} | {test files} ({n} assertions) | {TC IDs} ({n} verifications) | Covered |
- | {feature} | {test files} ({n} assertions) | none | Unit-only |
- | {feature} | none | {TC IDs} ({n} verifications) | E2E-only |
- | {feature} | {test files} ({n} assertions) | {TC IDs} ({n} verifications) | Overlap |
- | {feature} | none | none | Gap |
+ | Feature | Unit Tests | E2E Tests | Evidence Strength | False-Positive Risk | Status |
+ |---------|-----------|-----------|------------------|----------------------|--------|
+ | {feature} | {test files} ({n} assertions) | {TC IDs} ({n} verifications) | command-output/state+content | low | Covered |
+ | {feature} | {test files} ({n} assertions) | none | none | n/a | Unit-only |
+ | {feature} | none | {TC IDs} ({n} verifications) | command-output | low | E2E-only |
+ | {feature} | {test files} ({n} assertions) | {TC IDs} ({n} verifications) | command-output or existence-only | medium/high | Overlap |
+ | {feature} | none | none | none | high | Gap |
  ```

  **Classify each row:**
@@ -158,6 +160,7 @@ Combine the three inventories into a single coverage matrix:
  - **E2E-only** — E2E test exists but no unit test. Valid if the behavior is inherently E2E (subprocess execution, filesystem discovery).
  - **Overlap** — Both unit and E2E test the same assertions. E2E TC is a candidate for removal.
  - **Gap** — Neither unit nor E2E test covers this feature. Needs investigation.
+ - If a row has `false-positive risk` `high`, downgrade Covered/Overlap to **manual-review** until evidence is corrected.

  ### 6. Generate Review Report

@@ -180,6 +183,7 @@ Produce the full review report with actionable findings:
  | E2E scenarios | {n} |
  | E2E test cases | {n} |
  | TCs with decision evidence | {n}/{total} |
+ | High-risk false-positive TCs | {n}/{total} |

  ### Coverage Matrix

@@ -187,12 +191,13 @@ Produce the full review report with actionable findings:

  ### Overlap Analysis

- TCs that may fail the E2E Value Gate (unit tests cover the same behavior):
+ TCs that may fail the E2E Value Gate (unit tests cover the same behavior or high false-positive risk):

  | TC ID | Feature | Overlapping Unit Tests | Recommendation |
  |-------|---------|----------------------|----------------|
  | {id} | {feature} | {test files} | Remove — unit tests cover this fully |
  | {id} | {feature} | {test files} | Keep — TC tests CLI pipeline, units test logic |
+ | {id} | {feature} | {test files} | Strengthen — currently existence-only or duplicate command assertions |

  **Candidates for removal:** {n} TCs have full overlap with unit tests

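The review workflow in this file asks for a command fingerprint per TC and an evidence status of `complete`, `missing`, or `at-risk`. The Ruby sketch below shows one possible way to compute both. Everything in it is an assumption made for illustration: the `TcRecord` fields, the fingerprint definition (positional words plus sorted flag names, flag values dropped), and the precedence between `missing` and `at-risk`, which the workflow text does not spell out.

```ruby
require "shellwords"

# Hypothetical TC record; the field names are illustrative, not the gem's schema.
TcRecord = Struct.new(:id, :commands, :justification, :unit_paths, :artifacts,
                      keyword_init: true)

# Assumed fingerprint: positional words followed by sorted flag names.
def fingerprint(command)
  flags, words = Shellwords.split(command).partition { |w| w.start_with?("-") }
  (words + flags.map { |f| f.split("=", 2).first }.sort.uniq).join(" ")
end

# IDs of TCs that share a fingerprint with at least one other TC.
def duplicate_tc_ids(records)
  by_print = Hash.new { |hash, key| hash[key] = [] }
  records.each { |tc| tc.commands.each { |cmd| by_print[fingerprint(cmd)] << tc.id } }
  by_print.values.map(&:uniq).select { |ids| ids.size > 1 }.flatten.uniq
end

# Assumed precedence: missing metadata first, then weak evidence, then complete.
def evidence_status(tc, duplicates)
  return "missing" if tc.justification.to_s.empty? || tc.unit_paths.empty?

  existence_only = (tc.artifacts & %w[stdout stderr exit]).empty?
  return "at-risk" if existence_only || duplicates.include?(tc.id)

  "complete"
end

records = [
  TcRecord.new(id: "TC-001", commands: ["tool run --verbose"], justification: "real CLI",
               unit_paths: ["test/a_test.rb"], artifacts: %w[stdout exit]),
  TcRecord.new(id: "TC-002", commands: ["tool run --verbose=1"], justification: "real CLI",
               unit_paths: ["test/a_test.rb"], artifacts: %w[stdout exit]),
  TcRecord.new(id: "TC-003", commands: ["tool list"], justification: "real CLI",
               unit_paths: ["test/b_test.rb"], artifacts: []),
  TcRecord.new(id: "TC-004", commands: ["tool show"], justification: "",
               unit_paths: [], artifacts: %w[stdout exit]),
  TcRecord.new(id: "TC-005", commands: ["tool init"], justification: "real CLI",
               unit_paths: ["test/c_test.rb"], artifacts: %w[stdout stderr exit])
]

duplicates = duplicate_tc_ids(records)
statuses = records.to_h { |tc| [tc.id, evidence_status(tc, duplicates)] }
statuses.each { |id, status| puts "#{id}: #{status}" }
```

In this sample, `TC-001` and `TC-002` share a fingerprint and would be candidates for the "Strengthen" recommendation, while `TC-003` is flagged because it retains no command artifacts.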
@@ -283,4 +288,4 @@ Package '{package}' not found.

  Available packages:
  {list of ace-* directories}
- ```
+ ```
@@ -125,6 +125,11 @@ Follow the E2E test writing rules:
  - Consolidate assertions sharing the same CLI invocation into a single TC
  - Target 2-5 TCs per scenario
  - Test through the CLI interface, not library imports
+ - Add command-level evidence in every runner:
+   - command output (`*.stdout`/`*.stderr`)
+   - command exit status (`*.exit`)
+ - Add at least one behavioral/content assertion per command assertion set
+ - Remove duplicate command-only TCs; fold related assertions into one TC where possible

  **Load the TC template for reference:**
  ```bash
@@ -141,6 +146,7 @@ For each TC classified as MODIFY:
  - **Narrow scope** — remove assertions that unit tests cover, keep only E2E-exclusive checks
  - **Broaden scope** — add assertions for related behavior tested by the same CLI invocation
  - **Fix structure** — add missing sections, fix formatting issues
+ - **Add evidence gates** — if the existing TC relies on existence-only or missing exit/status checks, add explicit command output assertions and `.exit` captures
  3. Update the `last-verified` field if the TC was re-run during modification
  4. Write the updated TC runner/verifier files

@@ -228,6 +234,7 @@ Present the execution summary:
  - [ ] TC count matches plan: {yes/no}
  - [ ] No stale references: {yes/no}
  - [ ] All scenarios have 2-5 TCs: {yes/no}
+ - [ ] All modified/created TCs include command output + exit artifacts: {yes/no}

  ### Next Steps

@@ -278,4 +285,4 @@ If execution fails partway through:
  1. Report which actions completed and which failed
  2. Do not attempt to roll back completed actions
  3. Show the state of `{PACKAGE}/test/e2e/` after partial execution
- 4. Suggest re-running with the remaining actions
+ 4. Suggest re-running with the remaining actions
@@ -102,18 +102,17 @@ module Ace
  system = File.read(system_path)
  sandbox_dir = env_vars["PROJECT_ROOT_PATH"] || env_vars[:PROJECT_ROOT_PATH]

- Dir.chdir(sandbox_dir) do
-   Ace::LLM::QueryInterface.query(
-     @provider,
-     prompt,
-     system: system,
-     cli_args: cli_args,
-     timeout: @timeout,
-     fallback: false,
-     output: output_path,
-     subprocess_env: env_vars
-   )
- end
+ Ace::LLM::QueryInterface.query(
+   @provider,
+   prompt,
+   system: system,
+   cli_args: cli_args,
+   timeout: @timeout,
+   fallback: false,
+   output: output_path,
+   subprocess_env: env_vars,
+   working_dir: sandbox_dir
+ )
  end
  end
  end
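The hunk above drops the block form of `Dir.chdir`, which changes the working directory of the whole process and raises `RuntimeError: conflicting chdir during another chdir block` when a second thread calls it while the first block is still open. How `Ace::LLM::QueryInterface.query` consumes `working_dir:` is not shown in this diff. The sketch below only illustrates the general technique, assuming the value ends up as the standard `chdir:` spawn option so each child process gets its own directory; `run_in` is a hypothetical helper.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

# Hypothetical helper: run a command inside working_dir without changing the
# parent's current directory. `chdir:` is a standard Process.spawn option
# that Open3 passes through to the child process.
def run_in(working_dir, *cmd, env: {})
  stdout, _stderr, status = Open3.capture3(env, *cmd, chdir: working_dir)
  [stdout, status.exitstatus]
end

sandboxes = Array.new(4) { Dir.mktmpdir("sandbox") }
parent_dir = Dir.pwd

# One thread per sandbox, as parallel scenarios would do. No Dir.chdir block
# is entered, so the threads cannot conflict with each other.
results = sandboxes.map do |dir|
  Thread.new { run_in(dir, RbConfig.ruby, "-e", "print Dir.pwd") }
end.map(&:value)

results.each { |child_dir, code| puts "#{child_dir} (exit #{code})" }
puts "parent still in #{Dir.pwd}"
```

Each child reports its own sandbox path while the parent's directory stays unchanged, which is the property the changelog entry relies on for parallel scenarios.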
@@ -3,7 +3,7 @@
  module Ace
    module Test
      module EndToEndRunner
-       VERSION = '0.29.6'
+       VERSION = '0.29.8'
      end
    end
  end
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: ace-test-runner-e2e
  version: !ruby/object:Gem::Version
- version: 0.29.6
+ version: 0.29.8
  platform: ruby
  authors:
  - Michal Czyz
  bindir: exe
  cert_chain: []
- date: 2026-04-01 00:00:00.000000000 Z
+ date: 2026-04-05 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: ace-support-cli