RubyGems - ace-test-runner-e2e - Versions diffs - 0.38.11 → 0.40.2 - Mend

ace-test-runner-e2e 0.38.11 → 0.40.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (28)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 1a1e81b2b077a6bca7e75e1572743a31f20abe1ef5ebcb69ea82c7f55e95fd4b
-  data.tar.gz: e1791696e6cbbb58decab800387e005ee22f134734be2736d5c00f109273dd57
+  metadata.gz: d66fec48d22d05660c8851a50dca74a05d78314f0a0185e9fe71cbed378b624f
+  data.tar.gz: 974e1a357c134b270624df6c99de78d351da80c43b51623770a775ff96d10715
 SHA512:
-  metadata.gz: 143efde4ad09db543ff0865da3de1a94343c278a64c12f55f1786718725846b5cec565e144687b8c9d16bbc15736bf73abb34b30b49fa746a56ed522978e6434
-  data.tar.gz: b166bec29e9f10d0eff3b692d6526c22e0960252fb1a9d6d125c646b47a14137472775f73a9120365ebb608d7f2ccc38f88b4694e1b144165cf1a2994512ed2e
+  metadata.gz: '0866fe9a27cfb959f199a20f1057191ee806d201a8ae024b0bc0180dedea8c9b1984d759ae914f5f7ece9db0730d57a91d105272e8837eaca50a14e0fdafb4f8'
+  data.tar.gz: c3676f29eb9bcbcbcb343518f871a1636ff9fb2d57f2af09916ea9e991b1a944c58ed7f92b9f07f09280539843627afb90441843809250c493912857084c861a
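The digests above can be checked locally against an unpacked copy of the gem. The following is a minimal sketch, assuming the `.gem` archive has already been extracted into a directory containing `metadata.gz` and `data.tar.gz`; `verify_checksums` is a hypothetical helper name for illustration, not a RubyGems API.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not part of RubyGems): compare every digest listed in a
# checksums.yaml document against the files found in an unpacked gem directory.
# Returns a hash such as { "SHA256:metadata.gz" => true, ... }.
def verify_checksums(checksums_yaml, dir)
  algorithms = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }
  expected = YAML.safe_load(checksums_yaml)

  algorithms.each_with_object({}) do |(name, digest_class), results|
    (expected[name] || {}).each do |file, hex|
      path = File.join(dir, file)
      # A missing file counts as a mismatch rather than raising.
      actual = File.exist?(path) ? digest_class.file(path).hexdigest : nil
      results["#{name}:#{file}"] = (actual == hex.to_s)
    end
  end
end
```

A result of `true` for all four entries would indicate that the local files match the digests recorded for 0.40.2; any `false` entry points at a file that differs or is missing.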

data/CHANGELOG.md CHANGED

@@ -7,6 +7,66 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [0.40.2] - 2026-06-30
+### Technical
+- Stabilized fast tests with sandbox-local fake backends and fixture setup executors so path handling remains independent of host temporary directories.
+## [0.40.1] - 2026-04-24
+### Fixed
+- Removed suite-specific wording from the single-command `ace-test-e2e` help/output path so the `RunTest` CLI stays scoped to the single-command surface while preserving prune-artifact guidance.
+## [0.40.0] - 2026-04-24
+### Changed
+- Added `--[no-]prune-artifacts` to `ace-test-e2e` and `ace-test-e2e-suite` so operators can clear stale `.ace-local/test-e2e` run artifacts before execution while preserving suite reports and the shared `runtime-cache/`.
+## [0.39.1] - 2026-04-24
+### Fixed
+- Resolved suite-shared runtime reuse in child `ace-test-e2e` subprocesses by honoring inherited `ACE_E2E_SHARED_RUNTIME_ROOT` from the process environment instead of rebuilding sandbox-local runtimes after prewarming.
+## [0.39.0] - 2026-04-24
+### Changed
+- Added `--[no-]retry-failures-once` for full-suite reruns, including flaky-recovery reporting when a failed first pass succeeds on the retry pass.
+- Reused a suite-shared E2E runtime cache under `.ace-local/test-e2e/runtime-cache/` so parallel sandbox workers stop rebuilding the same Bundler environment and native extensions for every scenario.
+## [0.38.17] - 2026-04-24
+### Fixed
+- Detected fixture-commit setup flows across the full setup sequence instead of only single-step `git add && git commit` commands, restoring support-path git excludes for split-step fixture repositories.
+## [0.38.16] - 2026-04-23
+### Fixed
+- Enforced runner-owned verifier artifact contracts in scenario loading, expanded grouped `.stdout` / `.stderr` / `.exit` shorthand, and rejected verifier-only or wildcard artifact declarations that previously let retained E2E drift slip through.
+### Technical
+- Updated E2E guides, templates, and create/review/plan/rewrite/fix workflows to distinguish `public-surface` versus `retained-contract` TCs and require explicit downstream retained-E2E sweeps after public contract changes.
+## [0.38.15] - 2026-04-23
+### Fixed
+- Passed declared artifact contracts directly into runner prompts, added one bounded runner repair pass when required captures are still missing, and persisted repair metadata so missing-artifact E2E failures can recover before verifier judgment.
+## [0.38.14] - 2026-04-23
+### Fixed
+- Limited deterministic sandbox git excludes to setup-commit scenarios so copied package trees remain visible to ignore-aware tools while fixture-repo support paths stay unstaged.
+## [0.38.13] - 2026-04-23
+### Fixed
+- Enabled role-based verifier fallback in pipeline execution so successful runner phases still produce verifier results when the first verifier provider is unavailable.
+- Seeded deterministic sandbox git excludes for copied package trees and fixture-commit support paths so setup-time `git add -A` no longer stages runner support files or copied package content into fixture repositories.
+## [0.38.12] - 2026-04-23
+### Changed
+- Updated default ACE sandbox bootstrap to use `ace-config sync ace-llm-providers-cli` before `ace-handbook sync`, matching the renamed config sync command and minimal quick-start config requirement.
 ## [0.38.11] - 2026-04-20
 ### Fixed

data/handbook/guides/e2e-testing.g.md CHANGED

@@ -14,6 +14,10 @@ ace-docs:
 E2E tests are executed by an AI agent and reserved for behaviors that require real CLI execution, real tools, and real filesystem side effects.
 They must also answer a user-journey question: can a user do the job from the tool's public surface, and how much friction does that journey have?
+In practice, ACE uses two valid TC styles:
+- **Public-surface TCs** — prove a user job from docs/usage/`--help` and the CLI itself.
+- **Retained-contract TCs** — pin a previously fragile integrated behavior with deterministic, explicitly declared evidence.
 ## Canonical Conventions
 - CLI split:
@@ -33,7 +37,7 @@ They must also answer a user-journey question: can a user do the job from the to
 - Runner is **execution-only**:
   - perform user-like CLI actions in sandbox
-  - produce only final outcome evidence under `results/tc/{NN}/`
+  - produce only declared outcome evidence under `results/tc/{NN}/`
   - return final runner observations through the harness contract
   - do not issue PASS/FAIL verdicts
   - do not perform verifier-style assertion/classification
@@ -46,6 +50,11 @@ They must also answer a user-journey question: can a user do the job from the to
     2. runner observations
     3. explicit TC artifacts that are true product outcomes
     4. debug captures (`stdout`, `stderr`, `*.exit`, metadata) only as fallback
+- Artifact contract ownership:
+  - runner instructions and `scenario.yml` setup/layout declare verifier-visible artifact paths
+  - verifier consumes that contract; it does not create new required artifact paths
+  - grouped shorthand such as ``results/tc/02/help.stdout`, `.stderr`, `.exit`` counts as an exact declaration of all three files
+  - wildcard artifact paths such as `results/tc/02/output.*` are not valid declarations
 - Setup ownership:
   - sandbox preparation belongs to `scenario.yml` `setup:` + `fixtures/`
   - TC runner files must not define independent environment setup procedures
@@ -77,6 +86,21 @@ When an E2E failure shows that a valid user job is not discoverable from docs, u
 docs/help drift. Failure analysis must record the stale or missing public surface and the exact docs/help target to
 update instead of teaching the runner a workaround.
+## TC Style Selection
+Use **public-surface** style when the goal is a real user journey and the primary oracle should stay on user-visible behavior.
+Use **retained-contract** style when the integrated behavior matters but final sandbox state alone is not enough. In that case, small declared supporting captures are valid, for example:
+- `.stdout`, `.stderr`, `.exit`
+- `command.txt`
+- `path-check.txt`
+- `artifact-check.txt`
+Even retained-contract TCs must not rely on:
+- verifier-only artifact declarations
+- wildcard artifact paths
+- reflections, PASS/FAIL summaries, or verifier-facing manifests under `results/`
 ## Cost and Scope
 - Keep scenarios small and coherent.
@@ -128,12 +152,16 @@ This prevents duplicate assertions across test layers.
 - Keep runner goals aligned with the public user path; if the runner needs a workaround, surface that as friction rather than teaching the workaround.
 - Keep verifier expectations impact-first, then artifacts, then debug fallback.
 - Preserve strict TC pairing (`runner` + `verify`).
-- Keep `results/tc/{NN}/` for outcome artifacts only.
-- Do not instruct runners to create helper YAML, path files, command files, or reflections in `results/`.
+- Keep `results/tc/{NN}/` for declared verifier-dependent evidence only.
+- Declare every verifier-dependent file path in runner instructions or scenario setup. Do not rely on verifier-only path references.
+- Allow small supporting captures only when they are explicitly declared and materially improve confidence.
+- Do not use wildcard artifact paths.
+- Do not instruct runners to create reflections, PASS/FAIL summaries, verifier-facing manifests, or ad hoc temp inputs in `results/`.
 - Do not judge success from runner-authored summaries when final sandbox state can prove the goal directly.
 - Use runner observations only to explain ambiguity or missing side effects, not to replace missing end-state evidence.
 - Treat any workaround noted in runner observations as a product/docs/help or scenario-design smell that must be fixed, not preserved.
 - Avoid hidden dependencies between TCs unless explicitly intended.
+- For `--watch` or other live-output commands, use a bounded-session pattern with explicit termination behavior and captured exit codes.
 ## Execution Artifacts
@@ -150,9 +178,13 @@ Before approving new/updated E2E tests:
 - [ ] `runner.yml.md` and `verifier.yml.md` exist
 - [ ] Every TC has both `.runner.md` and `.verify.md`
 - [ ] Artifacts are scoped to `results/tc/{NN}/`
+- [ ] Every verifier-dependent artifact path is declared by runner/setup
+- [ ] No verifier depends on wildcard or verifier-only artifact paths
 - [ ] Verifier primary oracle is final sandbox state or real product output, not helper artifacts
 - [ ] Runner observations are the only non-filesystem secondary evidence source
+- [ ] TC style is explicit in the review (`public-surface` or `retained-contract`)
 - [ ] Scenario can be completed from docs/usage/`--help` without hidden recipes or workaround instructions
+- [ ] Any internal-detail assertion is part of the public contract or justified as retained-contract evidence
 - [ ] Any friction/workaround found during review is treated as a gap, not as a runner script opportunity
 - [ ] Failure analysis records docs/help drift from failed public user paths, or explicitly records `None`
 - [ ] Value-gate metadata is present (`e2e-justification`, `unit-coverage-reviewed`, `cost-tier`)
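The artifact contract rules added to this guide (grouped shorthand counts as an exact declaration of sibling files, wildcards are invalid, and verifier-only paths are drift) can be illustrated with a short sketch. This is not the gem's scenario loader: `expand_artifact_declaration` and `undeclared_artifacts` are hypothetical names, and the sketch assumes the declaration has already been reduced to a plain comma-separated string without Markdown backticks.

```ruby
# Hypothetical helper: expand a grouped declaration such as
#   "results/tc/02/help.stdout, .stderr, .exit"
# into exact sibling paths, and reject wildcard declarations.
def expand_artifact_declaration(declaration)
  parts = declaration.split(",").map(&:strip).reject(&:empty?)
  raise ArgumentError, "empty artifact declaration" if parts.empty?
  if parts.any? { |part| part.match?(/[*?\[]/) }
    raise ArgumentError, "wildcard artifact paths are not valid: #{declaration}"
  end

  first, *suffixes = parts
  unless suffixes.all? { |suffix| suffix.start_with?(".") }
    raise ArgumentError, "grouped shorthand expects suffixes like .stderr: #{declaration}"
  end

  # Strip the first file's extension, then attach each sibling suffix.
  base = first.sub(/\.[^.\/]+\z/, "")
  [first] + suffixes.map { |suffix| base + suffix }
end

# Hypothetical helper: paths the verifier depends on that the runner or
# scenario setup never declared.
def undeclared_artifacts(declared_paths, verifier_paths)
  verifier_paths - declared_paths
end
```

For example, expanding `"results/tc/02/help.stdout, .stderr, .exit"` yields the three sibling paths, while `"results/tc/02/output.*"` raises an `ArgumentError`. A non-empty result from `undeclared_artifacts` would correspond to the "verifier-only artifact path" case in the checklist.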

data/handbook/guides/scenario-yml-reference.g.md CHANGED

@@ -46,7 +46,7 @@ Example: `ace-lint/test/e2e/TS-LINT-001-lint-pipeline/scenario.yml`
 |-------|------|---------|-------------|
 | `priority` | string | `medium` | Test priority: `high`, `medium`, `low` |
 | `tool-under-test` | string | — | Primary command/tool validated |
-| `sandbox-layout` | object | `{}` | Outcome-path hints used to precreate directories and guide verification |
+| `sandbox-layout` | object | `{}` | Directory-level outcome hints used to precreate `results/tc/*` paths and guide verification |
 | `duration` | string | — | Estimated duration (e.g., `~15min`) |
 | `timeout` | integer | — | Optional per-scenario execution timeout in seconds |
 | `automation-candidate` | boolean | `false` | Whether test is automatable |
@@ -73,7 +73,10 @@ Pairing rule:
 Artifact layout conventions:
 - canonical: `results/tc/{NN}/`
 - avoid non-TC-scoped result folders
-- keep only real outcome artifacts under `results/tc/{NN}/`; runner observations live in harness reports, not sandbox helper files
+- keep only declared verifier-dependent evidence under `results/tc/{NN}/`; runner observations live in harness reports, not sandbox helper files
+- file-level verifier checks must be declared by the runner; `sandbox-layout` does not replace exact file declarations
+- grouped shorthand such as ``results/tc/01/help.stdout`, `.stderr`, `.exit`` is valid for exact sibling captures
+- wildcard artifact paths are not supported
 - absence of a declared path is debug context, not a standalone failure reason
 Canonical summary report fields:
@@ -85,7 +88,8 @@ Canonical summary report fields:
 Role contract:
 - `runner.yml.md` + `TC-*.runner.md` are execution-only.
 - `verifier.yml.md` + `TC-*.verify.md` are verification-only with impact-first checks.
-- Goal-style scenarios should be solvable from the public surface (docs/usage/`--help` + tool under test) without hidden recipes or workaround instructions.
+- Public-surface TCs should be solvable from the public surface (docs/usage/`--help` + tool under test) without hidden recipes or workaround instructions.
+- Retained-contract TCs may keep small declared supporting captures when they materially improve confidence.
 ## `requires` Object
@@ -130,6 +134,7 @@ setup:
 Setup rules:
 - Setup is fail-fast. Do not hide setup failures with `|| true`.
 - Setup belongs in `scenario.yml` and fixtures, not in TC runner instructions.
+- Use setup to create prerequisite state, not verifier-facing helper files under `results/`.
 - If setup fails (for example, missing `mise trust` support), stop scenario execution and report infrastructure failure.
 ## Complete Example

data/handbook/guides/tc-authoring.g.md CHANGED

@@ -32,6 +32,11 @@ Inline `.tc.md` and frontmatter `mode` values are no longer supported.
 - TC outcome artifacts write to `results/tc/{NN}/`
 - Summary counters use `tcs-passed`, `tcs-failed`, and `tcs-total`
+## TC Styles
+- **Public-surface**: prove a documented user job from docs/usage/`--help` and the CLI.
+- **Retained-contract**: pin an integrated behavior with deterministic, explicitly declared supporting evidence when end-state checks alone are insufficient.
 ## File Naming
 - `TC-{NNN}` — test case number (e.g., TC-001)
@@ -82,8 +87,9 @@ Run `ace-lint` and produce report artifacts for a valid file.
 ## Constraints
 - Use only sandbox paths
-- Keep only final outcome evidence under `results/tc/01/`
-- Do not place helper inputs, manifests, command transcripts, or reflections under `results/tc/01/`
+- Keep only declared verifier-dependent evidence under `results/tc/01/`
+- Declare exact paths for any verifier-dependent captures, for example ``results/tc/01/help.stdout`, `.stderr`, `.exit``
+- Do not place helper inputs, manifests, PASS/FAIL summaries, or reflections under `results/tc/01/`
 - Execute actions only; do not assign PASS/FAIL or final verdicts
 ```
@@ -122,14 +128,19 @@ Pass only when all expectations are satisfied by on-disk evidence.
 - Keep each TC focused on one coherent behavior path.
 - Ensure goal numbers and TC numbers remain aligned (`TC-001` -> Goal 1).
+- Choose the TC style up front: `public-surface` or `retained-contract`.
 - Keep runner files execution-only and verifier files verdict-only.
 - Make verifier expectations deterministic with impact-first ordering.
-- Keep `results/tc/{NN}/` for outcome artifacts only.
+- Keep `results/tc/{NN}/` for declared verifier-dependent evidence only.
+- Declare every verifier-dependent path in the runner or setup. Do not rely on verifier-only references.
+- Grouped capture shorthand is valid only for exact sibling files, for example ``foo.stdout`, `.stderr`, `.exit``.
+- Do not use wildcard artifact paths.
 - Use harness-provided runner observations as the only non-filesystem secondary evidence source.
 - Prefer final sandbox state and real product output over raw debug captures.
-- Do not ask the runner to write setup inputs, audit manifests, or final reflections for the verifier.
+- Do not ask the runner to write setup inputs, audit manifests, verifier-facing summaries, or final reflections for the verifier.
 - Do not teach the runner hidden recipes or workaround sequences; if the path is not discoverable from docs/usage/`--help`, the TC is wrong or the public surface needs improvement.
 - Use runner observations to record friction and workaround pressure, not to normalize it.
+- For watch/live-output flows, use a bounded-session pattern with explicit shutdown and captured exit code.
 - Record why each scenario remains E2E via `e2e-justification` and `unit-coverage-reviewed` in `scenario.yml`.
 ## Related

data/handbook/templates/tc-file.template.md CHANGED

@@ -22,7 +22,9 @@ ace-docs:
 - Use only declared scenario tools (`ace-*` and explicit exceptions)
 - Keep only product outcomes or essential command captures under `results/tc/{NN}/`
-- Do not write helper inputs, reflections, manifests, or temp files under `results/tc/{NN}/`
+- Declare every verifier-dependent path explicitly in the runner or scenario setup
+- Grouped capture shorthand such as ``results/tc/{NN}/cmd.stdout`, `.stderr`, `.exit`` is allowed for exact sibling files
+- Do not write helper inputs, reflections, PASS/FAIL summaries, manifests, or temp files under `results/tc/{NN}/`
 - Do not write outside sandbox
 - Execute actions only; do not assign PASS/FAIL in runner file
 - Follow the public user path from docs/usage/`--help`; do not embed hidden recipes or workaround branches in the TC
@@ -51,5 +53,5 @@ Companion verifier file (`TC-{NNN}-{slug}.verify.md`) example:
 ## Verdict
-- Pass when the public path works from sandbox evidence. Missing helper artifacts alone should not fail the goal.
+- Pass when the public path or retained contract is satisfied from sandbox evidence. Undeclared helper artifacts alone should not fail the goal.
 -->

data/handbook/workflow-instructions/e2e/create.wf.md CHANGED

@@ -41,7 +41,10 @@ This workflow guides an agent through creating a new E2E test scenario.
 ## Authoring Contract
 - Runner files (`runner.yml.md`, `TC-*.runner.md`) are execution-only.
-- Goal-style TCs must prove two things:
+- Every TC must be authored as one of:
+  - **public-surface** — a user job from docs/usage/`--help` and the CLI
+  - **retained-contract** — a deterministic integrated regression check with declared supporting evidence
+- Goal-style/public-surface TCs must prove two things:
   - the tool works
   - a user can do the job from the public surface (`README`, usage docs, `--help`, and the CLI itself) without hidden recipes or workarounds
 - Verifier files (`verifier.yml.md`, `TC-*.verify.md`) are verdict-only with impact-first evidence order:
@@ -52,7 +55,10 @@ This workflow guides an agent through creating a new E2E test scenario.
   4. debug captures as fallback
 - Setup belongs to `scenario.yml` `setup:` and fixtures; do not duplicate setup in runner TC instructions.
-- Keep `results/tc/{NN}/` for real outcome artifacts only; do not ask the runner to write helper YAML, path files, command files, reflections, or verifier-facing manifests there.
+- Keep `results/tc/{NN}/` for declared verifier-dependent evidence only.
+- Declare every verifier-dependent path in the runner or setup. Grouped shorthand such as ``foo.stdout`, `.stderr`, `.exit`` is allowed for exact sibling captures.
+- Do not use wildcard artifact paths.
+- Do not ask the runner to write reflections, verifier-facing manifests, or undeclared helper files there.
 - Do not encode hidden command recipes, fallback detours, or workaround sequences in runner TC files. If the job cannot be done from the public surface, treat that as a product/docs/help gap or remove/narrow the TC.
 ## Workflow Steps
@@ -248,9 +254,12 @@ Rules:
 - `existence-only` is never valid for KEEP/ADD. Use it only for SKIP rows with explicit unit-test replacement.
 - `helper-artifact-driven` is never valid for KEEP/ADD when final sandbox state could prove the goal directly.
 - `hidden-recipe-driven` and `workaround-driven` are never valid for KEEP/ADD.
+- Every verifier-dependent artifact must be declared by runner/setup; verifier-only references are invalid.
+- Wildcard artifact paths are never valid for KEEP/ADD.
 - `SKIP` rows must include replacement unit-test evidence.
 - Non-skipped rows must identify the primary oracle for the TC: final sandbox state, real product output, or debug fallback.
 - Non-skipped rows must state why the job is achievable from the public surface without hidden recipes.
+- Non-skipped rows must identify TC style: `public-surface` or `retained-contract`.
 - At least one `unit tests reviewed` path is required for every row.
 - The scenario-level `unit-coverage-reviewed` field must include the union of all referenced unit test files.
@@ -267,6 +276,7 @@ Rules:
 - No TC may be created without a row in this table.
 - If decision is `SKIP`, include the unit-test evidence that replaces it.
 - If the public-surface path is missing or workaround-driven, the TC must be `SKIP` or explicitly planned as a product/docs/help improvement before creation.
+- If the TC uses live refresh or watch behavior, include a bounded-session capture plan with explicit shutdown behavior and exit-code expectations.
 - At least one `unit tests reviewed` path is required for each row.
 - The scenario-level `unit-coverage-reviewed` field must include the union of all referenced unit test files.
@@ -301,7 +311,7 @@ If a context description was provided, enhance the test with:
 - Write runner goals as user outcomes, not “create a report” chores for the verifier
 - Check specific exit codes for error commands (not just "non-zero")
 - Make final sandbox state or real product output the primary oracle whenever possible
-- Do not require runner-authored helper files under `results/tc/{NN}/`
+- Do not require undeclared or verifier-facing helper files under `results/tc/{NN}/`
 - Add at least one behavioral/content assertion when CLI output itself is part of the outcome being tested
 **SHOULD (strongly recommended):**

data/handbook/workflow-instructions/e2e/fix.wf.md CHANGED

@@ -110,6 +110,7 @@ Apply fixes in this order:
 - Preserve role split: runner is execution-only, verifier is impact-first verdict
 - Keep implementation unchanged unless analysis is revised
 - Remove hidden recipes, workaround branches, and unsupported internal-detail checks from goal-style TCs
+- Repair undeclared or wildcard artifact contracts before weakening product assertions
 4. Rerun the selected failing scope after each fix
@@ -150,6 +151,12 @@ ace-test-e2e ace-bundle TS-BUNDLE-001
 - Keep one active scenario/TC at a time
 - Preserve cost-conscious rerun discipline
+6a. If the fix changes a public contract, run a downstream retained-E2E sweep
+- Trigger this sweep when the fix changes status words, JSON keys, command shapes, lifecycle semantics, or ownership/state semantics
+- Grep impacted scenarios and downstream consumers before concluding the fix
+- Update retained runner/verifier contracts in the same change set whenever feasible
 7. Run a final explicit failing-scenario checkpoint before concluding the fix session
 After the currently targeted failures are addressed, require one final:
@@ -179,6 +186,18 @@ Analysis Source: reused existing analysis | generated via `wfi://e2e/analyze-fai
 | ... | ... | ... | ... | pass/fail |
 ```
+Also include:
+```markdown
+## Fix Classification Totals
+| Bucket | Count |
+|---|---|
+| Product bug | {n} |
+| Harness bug | {n} |
+| Retained test/spec drift | {n} |
+```
 If the analysis reported docs/help drift, include:
 ```markdown

data/handbook/workflow-instructions/e2e/plan-changes.wf.md CHANGED

@@ -56,6 +56,15 @@ Build a change inventory:
 - **Removed features** — deleted files or deprecated modules
 - **Unchanged features** — stable code with no recent modifications
+Before classifying TCs, also check whether the package change alters a public contract that downstream retained E2E tests commonly pin:
+- status words
+- JSON keys or output schema
+- CLI command/flag shapes
+- lifecycle semantics
+- ownership/state semantics
+If yes, add an explicit downstream retained-E2E sweep list to the plan instead of limiting scope to the package under edit.
 ### 3. Classify Each Existing TC
 For each TC listed in the coverage matrix, assign exactly one classification:
@@ -80,6 +89,7 @@ For REMOVE due to overlap, replacement evidence is mandatory:
 - TC scope is too broad (should be narrowed to only E2E-exclusive aspects)
 - TC scope is too narrow (missing assertions for related behavior in same CLI invocation)
 - TC has structure issues flagged in the review
+- TC has undeclared or wildcard verifier-dependent artifact paths
 - TC is hidden-recipe-driven or workaround-driven but the underlying user job should still be supported by the public surface after scenario/docs/help correction
 **CONSOLIDATE** — The TC should merge with another TC. Criteria (any one is sufficient):
@@ -193,6 +203,12 @@ Format the complete change plan:
 |--------|--------|-----|
 | Update docs/help/CLI | {package/path} | {job is valid but current public surface is too weak for the E2E path} |
+### Downstream Retained-E2E Sweep ({n} actions)
+| Scenario | Trigger | Change Needed |
+|----------|---------|---------------|
+| {scenario-id} | {renamed key / lifecycle shift / command-shape change} | {update retained verifier/runner contract} |
 ### CONSOLIDATE ({n} TCs → {n} TCs)
 | Source TCs | Target TC | Merged Assertions |

data/handbook/workflow-instructions/e2e/review.wf.md CHANGED

@@ -14,7 +14,8 @@ This workflow performs deep exploration of a package to produce a **coverage mat
 During review, treat the runner/verifier split as a first-class quality check:
 - Runner must be execution-only (no verdict language).
 - Verifier must be impact-first (sandbox impact before runner observations and debug).
-- `results/tc/{NN}/` must not be used for helper inputs or verifier-feeding helper reports.
+- `results/tc/{NN}/` must contain only declared verifier-dependent evidence.
+- Every verifier-dependent artifact path must be declared by runner/setup; verifier-only or wildcard references are contract drift.
 - Goal-style TCs must also pass the public-surface check: the runner should be able to do the job from docs/usage/`--help` and the tool under test, without hidden recipes or workarounds.
 **Pipeline position:** Stage 1 of 3 (Explore)
@@ -117,6 +118,9 @@ find {PACKAGE}/test/e2e -name "scenario.yml" -path "*/TS-*" 2>/dev/null | sort
   - `tags`, `cost-tier`, `e2e-justification`, `unit-coverage-reviewed`
   - `last-verified`, `verified-by`
 - Extract the objective (what the TC verifies)
+- Record TC style:
+  - `public-surface`
+  - `retained-contract`
 - Record the TC's primary oracle:
   - final sandbox state / real product output
   - runner observations as supporting context
@@ -134,15 +138,15 @@ find {PACKAGE}/test/e2e -name "scenario.yml" -path "*/TS-*" 2>/dev/null | sort
 - Mark TC evidence status:
   - `complete` when `e2e-justification` is present, the verifier is end-state-first, and `unit-coverage-reviewed` has at least one path
   - `missing` otherwise
-  - `at-risk` when evidence is existence-only, helper-artifact-driven, duplicate command invocations are detected, or the TC is hidden-recipe/workaround-driven
+  - `at-risk` when evidence is existence-only, helper-artifact-driven, duplicate command invocations are detected, the TC is hidden-recipe/workaround-driven, or verifier-dependent artifacts are undeclared
 If `--scope` was provided, filter to only the specified scenario.
 Build an E2E test map:
-| TC ID | Title | Command Invocations | Feature Tested | Primary Oracle | Public Surface Fit | Friction | Tags | Cost Tier | E2E Justification | Unit Coverage Reviewed | Evidence | False-Positive Risk |
-|-------|-------|-------------|----------------|----------------|--------------------|----------|------|-----------|-------------------|------------------------|----------|---------------------|
-| {id} | {title} | {command list} | {feature} | {state / output / observations+fallback} | {valid/hidden-recipe/workaround/unsupported-detail} | {low/medium/high} | {tags} | {tier} | {reason or "(missing)"} | {files or "(missing)"} | {complete/missing/at-risk} | {low/medium/high} |
+| TC ID | Style | Title | Command Invocations | Feature Tested | Primary Oracle | Public Surface Fit | Artifact Contract | Friction | Tags | Cost Tier | E2E Justification | Unit Coverage Reviewed | Evidence | False-Positive Risk |
+|-------|-------|-------|-------------|----------------|----------------|--------------------|-------------------|----------|------|-----------|-------------------|------------------------|----------|---------------------|
+| {id} | {public-surface/retained-contract} | {title} | {command list} | {feature} | {state / output / observations+fallback} | {valid/hidden-recipe/workaround/unsupported-detail} | {declared/undeclared/wildcard} | {low/medium/high} | {tags} | {tier} | {reason or "(missing)"} | {files or "(missing)"} | {complete/missing/at-risk} | {low/medium/high} |
 ### 5. Build Coverage Matrix
@@ -214,12 +218,12 @@ TCs that may fail the E2E Value Gate (unit tests cover the same behavior or high
 ### E2E Decision Record Coverage
-| TC ID | Evidence Status | Public Surface Fit | Friction | Missing Fields / Contract Drift |
-|-------|------------------|--------------------|----------|-------------------------------|
-| {id} | complete | valid | low | none |
-| {id} | missing | hidden-recipe-driven | high | e2e-justification, unit-coverage-reviewed, end-state oracle |
+| TC ID | Style | Evidence Status | Public Surface Fit | Artifact Contract | Friction | Missing Fields / Contract Drift |
+|-------|-------|------------------|--------------------|-------------------|----------|-------------------------------|
+| {id} | public-surface | complete | valid | declared | low | none |
+| {id} | retained-contract | missing | hidden-recipe-driven | undeclared | high | e2e-justification, unit-coverage-reviewed, end-state oracle |
-**Action:** Any TC with missing evidence, helper-artifact drift, hidden recipes, workaround dependence, or unsupported internal-detail checks should be updated during the next rewrite cycle.
+**Action:** Any TC with missing evidence, undeclared/wildcard artifact drift, hidden recipes, workaround dependence, or unsupported internal-detail checks should be updated during the next rewrite cycle.
 ### Gap Analysis

data/handbook/workflow-instructions/e2e/rewrite.wf.md CHANGED

@@ -42,7 +42,9 @@ ace-bundle wfi://e2e/review  →  ace-bundle wfi://e2e/plan-changes  →  ace-bu
 - Normalize runner files to execution-only language.
 - Normalize verifier files to verdict-only, impact-first validation.
 - Keep setup concerns in `scenario.yml` and fixtures, not in TC runner setup sections.
-- Remove helper artifact requirements from `results/tc/{NN}/`; use runner observations instead.
+- Keep only declared verifier-dependent evidence under `results/tc/{NN}/`.
+- Move verifier-only artifact references into explicit runner/setup declarations.
+- Replace wildcard artifact paths with exact declared files.
 - Rewrite goal-style TCs around the public user path. Do not preserve hidden recipes, workaround branches, or supporting-tool probes as the way the runner reaches the goal.
 ## Workflow Steps
@@ -128,12 +130,14 @@ Follow the E2E test writing rules:
 - Target 2-5 TCs per scenario
 - Test through the CLI interface, not library imports
 - Write runner goals as “do the job” outcomes, not “write a report for the verifier” chores
-- Keep `results/tc/{NN}/` for real outcomes only; avoid helper YAML, path files, command files, and reflections
+- Keep `results/tc/{NN}/` for declared verifier-dependent evidence only; avoid undeclared helper YAML, reflections, and verifier-facing manifests
+- Keep only declared verifier-dependent evidence under `results/tc/{NN}/`; small supporting captures are acceptable when they are explicit and necessary
 - Use runner observations as the only non-filesystem secondary evidence source
 - Make final sandbox state or real product output the primary oracle whenever possible
 - Add behavioral/content assertions only when CLI output itself is part of the user-visible outcome
 - Remove duplicate command-only TCs; fold related assertions into one TC where possible
 - Do not encode exact workaround procedures, hidden command recipes, or internal debugging tricks the user would not infer from docs/usage/`--help`
+- For watch/live-output flows, rewrite to a bounded-session pattern with explicit shutdown evidence
 - If the job is valid but the public surface is too weak, plan a product/docs/help fix instead of hardcoding the workaround into the TC
 **Load the TC template for reference:**
@@ -152,6 +156,8 @@ For each TC classified as MODIFY:
    - **Broaden scope** — add assertions for related behavior tested by the same CLI invocation
    - **Fix structure** — add missing sections, fix formatting issues
    - **Replace helper-artifact oracles** — if the existing TC relies on runner-written helper files, rewrite it around final sandbox state plus runner observations
+   - **Declare verifier-dependent artifacts** — if the verifier names a `results/tc/...` file, ensure the runner or setup declares the exact same path
+   - **Remove wildcard declarations** — replace `results/tc/.../*` or `results/tc/.../foo.*` with exact paths
    - **Add evidence gates** — if the existing TC relies on existence-only or missing end-state checks, strengthen the primary oracle before falling back to debug captures
    - **Remove hidden recipes/workarounds** — if the existing TC teaches the runner how to bypass the public surface, rewrite it around the supported user path or narrow/remove the TC
 3. Update the `last-verified` field if the TC was re-run during modification
@@ -241,7 +247,8 @@ Present the execution summary:
 - [ ] TC count matches plan: {yes/no}
 - [ ] No stale references: {yes/no}
 - [ ] All scenarios have 2-5 TCs: {yes/no}
-- [ ] Modified/created TCs avoid helper files in `results/tc/{NN}/`: {yes/no}
+- [ ] Modified/created TCs avoid undeclared helper files in `results/tc/{NN}/`: {yes/no}
+- [ ] Modified/created TCs declare every verifier-dependent artifact path: {yes/no}
 ### Next Steps
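
The two checklist items added in this hunk (no undeclared helper files, every verifier-dependent artifact path declared) can be illustrated with a minimal sketch. The helper name `contract_problems` and the sample paths are hypothetical and do not come from the gem. The wildcard character class mirrors the `WILDCARD_PATTERN` shown in the validator added later in this diff.

```ruby
# frozen_string_literal: true

# Hypothetical helper, not part of ace-test-runner-e2e: it only illustrates
# the two rewrite-checklist rules using plain arrays of path strings.
WILDCARD = /[*?\[]/

def contract_problems(declared:, verifier:)
  problems = []

  # Rule 1: wildcard artifact paths are rejected wherever they appear.
  (declared + verifier).uniq.each do |path|
    problems << "wildcard: #{path}" if path.match?(WILDCARD)
  end

  # Rule 2: every path the verifier names must be declared exactly.
  (verifier - declared).each do |path|
    problems << "undeclared: #{path}"
  end

  problems
end

declared = ["results/tc/01/build.stdout", "results/tc/01/build.exit"]
verifier = ["results/tc/01/build.stdout", "results/tc/01/*.log"]

puts contract_problems(declared: declared, verifier: verifier)
# wildcard: results/tc/01/*.log
# undeclared: results/tc/01/*.log
```

An empty result would correspond to answering "yes" to both checklist items for that TC.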

data/lib/ace/test/end_to_end_runner/atoms/artifact_contract_validator.rb ADDED

@@ -0,0 +1,138 @@
+# frozen_string_literal: true
+module Ace
+  module Test
+    module EndToEndRunner
+      module Atoms
+        # Validates that verifier-visible artifact paths are explicitly declared by
+        # runner instructions or scenario setup, and normalizes grouped capture
+        # shorthand such as `foo.stdout`, `.stderr`, `.exit`.
+        class ArtifactContractValidator
+          Reference = Struct.new(:path, :optional, :source, :line, keyword_init: true)
+          FULL_PATH_PATTERN = /
+            (?:`|"|')?
+            (results\/tc\/\d{2}\/[^\s`)"']+|results\/tc\/\d{2}\/)
+            (?:`|"|')?
+            (\s*\(optional\))?
+          /ix
+          SUFFIX_PATTERN = /,\s*(?:`|"|')?(\.[A-Za-z0-9*._-]+)(?:`|"|')?(\s*\(optional\))?/i
+          WILDCARD_PATTERN = /[*?\[]/.freeze
+          class << self
+            def extract(markdown, source:)
+              markdown.to_s.each_line.with_index(1).flat_map do |line, line_number|
+                extract_from_line(line, source: source, line_number: line_number)
+              end
+            end
+            def references_from_paths(paths, source:)
+              Array(paths).filter_map do |path|
+                normalized = normalize(path)
+                next if normalized.nil?
+                Reference.new(path: normalized, optional: false, source: source, line: nil)
+              end
+            end
+            def validate!(tc_id:, scenario_dir:, runner_references:, verifier_references:, scenario_references:)
+              invalid_wildcards = (runner_references + verifier_references + scenario_references).select do |reference|
+                wildcard?(reference.path)
+              end
+              unless invalid_wildcards.empty?
+                raise ArgumentError,
+                  "Wildcard artifact path(s) are not supported for #{tc_id} in #{scenario_dir}: " \
+                  "#{format_references(invalid_wildcards)}"
+              end
+              declared_paths = normalized_paths(scenario_references + runner_references)
+              undeclared = verifier_references.reject do |reference|
+                declared_paths.include?(normalize(reference.path))
+              end
+              return if undeclared.empty?
+              raise ArgumentError,
+                "Verifier references undeclared artifact(s) for #{tc_id} in #{scenario_dir}: " \
+                "#{format_references(undeclared)}. " \
+                "Declare exact artifact paths in the runner file or scenario.yml sandbox-layout."
+            end
+            private
+            def extract_from_line(line, source:, line_number:)
+              matches = []
+              line.to_enum(:scan, FULL_PATH_PATTERN).each do
+                matches << {
+                  start: Regexp.last_match.begin(0),
+                  end: Regexp.last_match.end(0),
+                  path: normalize(Regexp.last_match[1]),
+                  optional: !Regexp.last_match[2].to_s.empty?
+                }
+              end
+              matches.each_with_index.flat_map do |match, index|
+                refs = [
+                  Reference.new(
+                    path: match[:path],
+                    optional: match[:optional],
+                    source: source,
+                    line: line_number
+                  )
+                ]
+                next_match = matches[index + 1]
+                suffix_region = line[match[:end]...(next_match ? next_match[:start] : line.length)].to_s
+                suffix_base = suffix_base_for(match[:path])
+                next refs if suffix_base.nil?
+                suffix_region.to_enum(:scan, SUFFIX_PATTERN).each do
+                  refs << Reference.new(
+                    path: "#{suffix_base}#{Regexp.last_match[1]}",
+                    optional: !Regexp.last_match[2].to_s.empty?,
+                    source: source,
+                    line: line_number
+                  )
+                end
+                refs
+              end
+            end
+            def suffix_base_for(path)
+              return nil if path.nil?
+              return nil if path.match?(%r{\Aresults/tc/\d{2}\z})
+              path.sub(/\.[^.\/]+\z/, "").tap do |value|
+                return nil if value == path
+              end
+            end
+            def normalized_paths(references)
+              references.map { |reference| normalize(reference.path) }.compact.uniq
+            end
+            def normalize(path)
+              value = path.to_s.strip
+              return nil unless value.start_with?("results/tc/")
+              value.sub(%r{/+\z}, "")
+            end
+            def wildcard?(path)
+              path.to_s.match?(WILDCARD_PATTERN)
+            end
+            def format_references(references)
+              references.uniq { |reference| [reference.path, reference.source, reference.line] }.map do |reference|
+                if reference.line
+                  "#{reference.path} (#{reference.source}:#{reference.line})"
+                else
+                  "#{reference.path} (#{reference.source})"
+                end
+              end.join(", ")
+            end
+          end
+        end
+      end
+    end
+  end
+end
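
The grouped capture shorthand mentioned in the class comment (`foo.stdout`, `.stderr`, `.exit`) can be easier to follow with a standalone sketch. The two patterns below are copied from the diff above; the function `expand_grouped_paths` is a hypothetical simplification written for illustration. It only handles the first path on a line and returns plain strings instead of `Reference` structs, so it is not a substitute for the gem's `extract`.

```ruby
# frozen_string_literal: true

# Patterns copied from ArtifactContractValidator as shown in the diff.
FULL_PATH_PATTERN = /
  (?:`|"|')?
  (results\/tc\/\d{2}\/[^\s`)"']+|results\/tc\/\d{2}\/)
  (?:`|"|')?
  (\s*\(optional\))?
/ix
SUFFIX_PATTERN = /,\s*(?:`|"|')?(\.[A-Za-z0-9*._-]+)(?:`|"|')?(\s*\(optional\))?/i

# Hypothetical, simplified expansion: first path on the line only.
def expand_grouped_paths(line)
  match = FULL_PATH_PATTERN.match(line)
  return [] if match.nil?

  # Drop trailing slashes, as the validator's normalize step does.
  path = match[1].sub(%r{/+\z}, "")

  # Strip the final extension to get the shared base for suffix shorthand.
  base = path.sub(/\.[^.\/]+\z/, "")
  return [path] if base == path # no extension, so nothing to expand

  suffixes = match.post_match.scan(SUFFIX_PATTERN).map(&:first)
  [path] + suffixes.map { |suffix| "#{base}#{suffix}" }
end

p expand_grouped_paths("Capture `results/tc/01/build.stdout`, `.stderr`, `.exit` (optional)")
# => ["results/tc/01/build.stdout", "results/tc/01/build.stderr", "results/tc/01/build.exit"]
```

A bare directory reference such as `results/tc/01/` has no extension to strip, so it yields only the normalized directory path and no suffix expansion, which matches the `suffix_base_for` guard in the diff.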