RubyGems - ace-test-runner-e2e - Versions diffs - 0.29.8 → 0.38.11 - Mend

ace-test-runner-e2e 0.29.8 → 0.38.11


Files changed (49)

data/handbook/workflow-instructions/e2e/create.wf.md CHANGED

@@ -1,4 +1,12 @@
 ---
+name: e2e-create
+description: Create a new E2E test scenario from template
+allowed-tools:
+- Bash(ace-bundle:*)
+- Read
+- Write
+- Glob
+- Grep
 doc-type: workflow
 title: Create E2E Test Workflow
 purpose: Create a new E2E test scenario from template
@@ -23,35 +31,48 @@ This workflow guides an agent through creating a new E2E test scenario.
 - Scenario ID format: `TS-<PACKAGE_SHORT>-<NNN>[-slug]`
 - Standalone files: `TC-*.runner.md` and `TC-*.verify.md`
 - TC artifact layout: `results/tc/{NN}/`
+- Runner observations are harness-managed report data, not sandbox helper files
 - Summary counters: `tcs-passed`, `tcs-failed`, `tcs-total`, `failed[].tc`
 - CLI split reminder:
   - `ace-test-e2e` for single-package execution
   - `ace-test-e2e-suite` for suite-level execution
 ## Authoring Contract
 - Runner files (`runner.yml.md`, `TC-*.runner.md`) are execution-only.
+- Goal-style TCs must prove two things:
+  - the tool works
+  - a user can do the job from the public surface (`README`, usage docs, `--help`, and the CLI itself) without hidden recipes or workarounds
 - Verifier files (`verifier.yml.md`, `TC-*.verify.md`) are verdict-only with impact-first evidence order:
   1. sandbox/project state impact
-  2. explicit artifacts
-  3. debug captures as fallback
+  2. runner observations
+  3. explicit product outcomes
+  4. debug captures as fallback
 - Setup belongs to `scenario.yml` `setup:` and fixtures; do not duplicate setup in runner TC instructions.
+- Keep `results/tc/{NN}/` for real outcome artifacts only; do not ask the runner to write helper YAML, path files, command files, reflections, or verifier-facing manifests there.
+- Do not encode hidden command recipes, fallback detours, or workaround sequences in runner TC files. If the job cannot be done from the public surface, treat that as a product/docs/help gap or remove/narrow the TC.
 ## Workflow Steps
 ### 1. Validate Inputs
 **Check package exists:**
 ```bash
 test -d "{PACKAGE}" && echo "Package exists" || echo "Package not found"
 ```
 If package doesn't exist, list available packages:
 ```bash
 ls -d */ | grep -E "^ace-" | sed 's/\/$//'
 ```
 **Normalize area code:**
 - Convert to uppercase (e.g., `lint` -> `LINT`)
 - Verify it's a valid area name (2-10 alphanumeric characters)
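
The area-code normalization described above could be sketched as a small shell function. This is only an illustration: `normalize_area` is a hypothetical helper name, not something shipped by the package, and the 2-10 alphanumeric rule is taken directly from the bullet list.

```bash
# Hypothetical helper (not part of ace-test-runner-e2e): normalize_area <area>
normalize_area() {
  local area
  # Uppercase the input (e.g., lint -> LINT).
  area=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  # Accept only 2-10 alphanumeric characters.
  if printf '%s' "$area" | grep -Eq '^[A-Z0-9]{2,10}$'; then
    printf '%s\n' "$area"
  else
    echo "Invalid area code: $1" >&2
    return 1
  fi
}
```

Calling `normalize_area lint` prints `LINT`; a value such as `bad-area` is rejected with a non-zero status.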
@@ -66,6 +87,7 @@ find {PACKAGE}/test/e2e -maxdepth 1 -type d -name "TS-{AREA}-*" 2>/dev/null | \
 ```
 Sort and take the highest number:
 - If no existing tests: use `001`
 - Otherwise: increment the highest number by 1
 - Format as three digits (e.g., `001`, `002`, `015`)
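
A minimal sketch of the "highest number plus one" rule, assuming scenario directories follow the `TS-{AREA}-{NNN}[-slug]` layout shown in this workflow. `next_test_number` is a hypothetical name, and the pipeline is one possible completion of the truncated `find` command in the hunk header, not necessarily the one the workflow uses.

```bash
# Hypothetical helper: next_test_number <package-dir> <AREA>
next_test_number() {
  local package="$1" area="$2" highest
  # Extract the numeric part of each TS-<AREA>-<NNN>... directory name,
  # strip leading zeros (so 008 is not read as octal), keep the largest.
  highest=$(find "$package/test/e2e" -maxdepth 1 -type d -name "TS-$area-*" 2>/dev/null \
    | sed -n "s|.*/TS-$area-\([0-9][0-9]*\).*|\1|p" \
    | sed 's/^0*//' \
    | sort -n | tail -n 1)
  # No existing tests -> 001; otherwise increment and zero-pad to three digits.
  printf '%03d\n' "$(( ${highest:-0} + 1 ))"
}
```

With existing `TS-LINT-001-*` and `TS-LINT-008-*` directories, `next_test_number ace-lint LINT` would print `009`.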
@@ -85,12 +107,14 @@ mkdir -p {PACKAGE}/test/e2e
 Create a kebab-case slug:
 **If --context provided:**
 - Extract key words from the context description
 - Convert to lowercase
 - Replace spaces with hyphens
 - Limit to 5-6 words
 **If no context:**
 - Use a placeholder: `new-test-scenario`
 Example: "Test config file validation" -> `config-file-validation`
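
The mechanical part of slug creation (lowercase, hyphens, word limit, placeholder fallback) could look roughly like the sketch below. `make_slug` is a hypothetical helper. It does not attempt the "extract key words" step, which is a judgment call, so it keeps a leading word like "Test" that the example above drops.

```bash
# Hypothetical helper: make_slug "<context description>"
make_slug() {
  local slug
  slug=$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed 's/^-*//; s/-*$//' \
    | cut -d- -f1-6)   # keep at most six hyphen-separated words
  # Fall back to the placeholder when no context was given.
  printf '%s\n' "${slug:-new-test-scenario}"
}
```

For instance, `make_slug "Config file validation"` prints `config-file-validation`, and `make_slug ""` prints `new-test-scenario`.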
@@ -100,11 +124,13 @@ The slug is the directory name suffix: `TS-LINT-003-config-file-validation/`
 ### 5. Load Template
 Load the test template:
 ```bash
 ace-bundle tmpl://test-e2e
 ```
 Or read directly:
 ```
 ace-test-runner-e2e/handbook/templates/test-e2e.template.md
 ```
@@ -123,6 +149,7 @@ Replace template placeholders with actual values:
 | `{area-name}` | Area code (lowercase) |
 Initial values for optional fields:
 - `priority: medium`
 - `duration: ~10min`
 - `automation-candidate: false`
@@ -138,9 +165,10 @@ Initial values for optional fields:
 Before generating test cases, verify the proposed test has genuine E2E value.
 **Check unit test coverage:**
 ```bash
 # Search for existing unit tests covering this area
-find {PACKAGE}/test/atoms {PACKAGE}/test/molecules {PACKAGE}/test/organisms \
+find {PACKAGE}/test/fast {PACKAGE}/test/feat \
   -name "*_test.rb" 2>/dev/null | head -20
 ```
@@ -154,47 +182,91 @@ For each proposed TC, answer: **"Does this require the full CLI binary + real ex
 - If **PARTIAL**: create the TC but scope it to only the E2E-exclusive aspects
 **Example decisions:**
-- "Test that invalid YAML config produces error" — check if `atoms/config_parser_test.rb` already asserts this. If so, **skip** (unit test covers it). If unit test checks parsing but not the full CLI exit code path, **create** a TC scoped to just the exit code.
-- "Test that StandardRB subprocess executes and returns results" — unit tests stub the subprocess. **Create** this as E2E because it requires the real tool.
+- "Test that invalid YAML config produces error" -- check if `atoms/config_parser_test.rb` already asserts this. If so, **skip** (unit test covers it). If unit test checks parsing but not the full CLI exit code path, **create** a TC scoped to just the exit code.
+- "Test that StandardRB subprocess executes and returns results" -- unit tests stub the subprocess. **Create** this as E2E because it requires the real tool.
 If all proposed TCs fail the gate, report to the user:
 ```
 All proposed behaviors are already covered by unit tests in {PACKAGE}/test/.
 No E2E test needed. Consider adding unit tests instead if coverage gaps exist.
 ```
-### 7a. Evidence-Gate Review Before Writing Files
+### 7a. Public-Surface Gate
+Before generating or keeping a goal-style TC, answer:
+**"Can a normal user complete this job from the package's public surface, without hidden recipes or workarounds?"**
+Public surface means:
+- package README / usage docs
+- `--help`
+- declared fixtures and `scenario.yml` setup
+- the tool under test itself
+Reject or narrow the TC if it depends on:
+- step-by-step runner procedures a user would not infer from docs/help
+- workaround branches to compensate for CLI/docs/help gaps
+- direct supporting-tool probes as the primary oracle for an ACE CLI scenario
+- internal-state checks that the public surface does not expose and that do not matter to the user job
+### 7b. Evidence-Gate Review Before Writing Files
 Before finalizing the test plan, block weak coverage patterns:
 - **Existence-only TC**:
   - only checks directory/file existence
   - no command output/content assertion
   - missing `*.exit` capture for the executed command
 - **Duplicate-invocation TC**:
   - same command invocation, same purpose, split across multiple TCs
+- **Helper-artifact-driven TC**:
+  - runner is instructed to create YAML/TXT/MD helper files in `results/tc/{NN}/`
+  - verifier depends on those helper files instead of final sandbox state or real product output
+- **Hidden-recipe-driven TC**:
+  - the runner must follow a command sequence not discoverable from docs/usage/`--help`
+  - the TC succeeds only because the scenario teaches an internal or non-obvious workaround
+- **Workaround-driven TC**:
+  - the runner is told how to bypass a docs/help/CLI gap instead of surfacing it
+  - the verifier would pass a scenario that a normal user could not complete cleanly
 | TC ID | Decision (KEEP/ADD/SKIP) | Evidence Strength | E2E-only reason | Unit tests reviewed |
 |-------|---------------------------|------------------|-----------------|--------------------|
 | {tc-id} | {decision} | `command-output` | {why this needs real CLI/tools/fs} | {path1,path2} |
 Rules:
 - `existence-only` is never valid for KEEP/ADD. Use it only for SKIP rows with explicit unit-test replacement.
+- `helper-artifact-driven` is never valid for KEEP/ADD when final sandbox state could prove the goal directly.
+- `hidden-recipe-driven` and `workaround-driven` are never valid for KEEP/ADD.
 - `SKIP` rows must include replacement unit-test evidence.
-- Non-skipped rows must include command-level artifacts (`stdout`, `stderr`, `exit`, and/or explicit proof files).
+- Non-skipped rows must identify the primary oracle for the TC: final sandbox state, real product output, or debug fallback.
+- Non-skipped rows must state why the job is achievable from the public surface without hidden recipes.
 - At least one `unit tests reviewed` path is required for every row.
 - The scenario-level `unit-coverage-reviewed` field must include the union of all referenced unit test files.
-### 7b. E2E Decision Record (Required)
+### 7c. E2E Decision Record (Required)
 Before writing files, produce a decision record table for every candidate TC:
-| TC ID | Decision (KEEP/ADD/SKIP) | E2E-only reason | Unit tests reviewed |
-|-------|---------------------------|-----------------|---------------------|
-| {tc-id} | {decision} | {why this needs real CLI/tools/fs} | {path1,path2} |
+| TC ID | Decision (KEEP/ADD/SKIP) | E2E-only reason | Public-surface path | Unit tests reviewed |
+|-------|---------------------------|-----------------|---------------------|---------------------|
+| {tc-id} | {decision} | {why this needs real CLI/tools/fs} | {docs/help/CLI path or "not valid"} | {path1,path2} |
 Rules:
 - No TC may be created without a row in this table.
 - If decision is `SKIP`, include the unit-test evidence that replaces it.
+- If the public-surface path is missing or workaround-driven, the TC must be `SKIP` or explicitly planned as a product/docs/help improvement before creation.
 - At least one `unit tests reviewed` path is required for each row.
 - The scenario-level `unit-coverage-reviewed` field must include the union of all referenced unit test files.
@@ -203,12 +275,13 @@ Rules:
 If a context description was provided, enhance the test with:
 **Research the package:**
-1. **Run unit tests first** (`ace-test` in the package) — they are the ground truth for implemented behavior
+1. **Run unit tests first** (`ace-test` in the package) -- they are the ground truth for implemented behavior
 2. Examine the relevant code in `{PACKAGE}/lib/`
 3. Check existing unit tests for expected behavior patterns
 4. Understand the feature being tested
 5. **Run the tool** to observe actual behavior, output format, file paths, and exit codes
-6. **Verify config/input formats** by reading the actual parsing code — never assume formats from design specs or task descriptions
+6. **Verify config/input formats** by reading the actual parsing code -- never assume formats from design specs or task descriptions
+7. **Compare with the public surface** -- verify the intended user path is actually supported by docs/help, and do not compensate for gaps with hidden runner instructions
 **Generate test content:**
 1. Write a clear objective based on the context
@@ -220,16 +293,20 @@ If a context description was provided, enhance the test with:
 #### Test Case Generation Rules
 **MUST (required for all E2E tests):**
-- **Verify the feature is implemented** before writing the test — read the actual implementation code, not just task specs or design documents
-- **Verify config/input formats** by reading the parsing code — never assume formats from BDD specs, task descriptions, or documentation
+- **Verify the feature is implemented** before writing the test -- read the actual implementation code, not just task specs or design documents
+- **Verify config/input formats** by reading the parsing code -- never assume formats from BDD specs, task descriptions, or documentation
 - Include an error/negative TC only when it validates E2E-exclusive behavior (real CLI parser/runtime/tooling/filesystem) or when unit coverage has a documented gap
-- Verify actual file paths by running the tool first — never hardcode paths from documentation or assumptions
-- Use explicit `&& echo "PASS" || echo "FAIL"` patterns for every verification step
+- Verify actual file paths by running the tool first -- never hardcode paths from documentation or assumptions
+- Write runner goals as user outcomes, not “create a report” chores for the verifier
 - Check specific exit codes for error commands (not just "non-zero")
-- Add at least one output-content assertion for each command being verified
+- Make final sandbox state or real product output the primary oracle whenever possible
+- Do not require runner-authored helper files under `results/tc/{NN}/`
+- Add at least one behavioral/content assertion when CLI output itself is part of the outcome being tested
 **SHOULD (strongly recommended):**
-- Test the real user journey — structure TCs as a sequential workflow, not isolated commands
+- Test the real user journey -- structure TCs as a sequential workflow, not isolated commands
 - Verify exit codes for all commands, not just error cases
 - Include negative assertions (files/directories that should NOT exist)
 - Capture and retain command output for all assertions (`stdout`, `stderr`, and `*.exit`)
@@ -237,17 +314,18 @@ If a context description was provided, enhance the test with:
 - Verify that status values match actual implementation (e.g., `done` vs `completed`)
 **COST-AWARE (reduce LLM invocations):**
-- Consolidate assertions that share the same CLI invocation into a single TC. For example, after running `ace-lint file.rb`, check exit code, report.json structure, and ok.md existence in ONE TC — not three.
+- Consolidate assertions that share the same CLI invocation into a single TC. For example, after running `ace-lint file.rb`, check exit code, report.json structure, and ok.md existence in ONE TC -- not three.
 - Target 2-5 TCs per scenario. More than 5 suggests the scenario is too broad; split into focused scenarios. Fewer than 2 suggests merging with a related scenario.
 - Never create a TC for a single assertion when that assertion could be appended to an existing TC that runs the same command.
 #### Recommended TC Ordering
-1. **Error paths first** — wrong args, missing files, no prior state (run from clean state)
-2. **Happy path start** — create/init with correct args, verify output
-3. **Structure verification** — check actual on-disk file structure with negative assertions
-4. **Lifecycle operations** — status, advance, fail, retry in workflow order
-5. **End state** — verify completion message, all steps terminal
+1. **Error paths first** -- wrong args, missing files, no prior state (run from clean state)
+2. **Happy path start** -- create/init with correct args, verify output
+3. **Structure verification** -- check actual on-disk file structure with negative assertions
+4. **Lifecycle operations** -- status, advance, fail, retry in workflow order
+5. **End state** -- verify completion message, all steps terminal
 This ordering ensures error TCs run before any state is created (clean environment), and happy-path TCs build on each other sequentially.
@@ -258,6 +336,7 @@ See: **e2e-testing.g.md § "Avoiding False Positive Tests"** for the full list o
 **E2E tests MUST test through the CLI interface, not library imports.**
 **Valid approach:**
 ```bash
 OUTPUT=$(ace-review --preset code --subject "diff:HEAD~1" --auto-execute 2>&1)
 EXIT_CODE=$?
@@ -265,6 +344,7 @@ EXIT_CODE=$?
 ```
 **Invalid approach (this is integration/unit testing, not E2E):**
 ```bash
 bundle exec ruby -e '
   require_relative "lib/ace/review"
@@ -273,6 +353,7 @@ bundle exec ruby -e '
 ```
 **For execution tests (LLM, API calls):**
 - Use `--auto-execute` to make real API calls
 - Using only `--dry-run` cannot verify actual execution behavior
 - Keep costs minimal: cheap models, tiny prompts, small diffs
@@ -280,22 +361,26 @@ bundle exec ruby -e '
 #### Common Anti-Patterns to Avoid
 **Writing tests from design specs before implementation:**
 - Task descriptions and BDD specs often describe *intended* behavior with *proposed* config formats
 - The actual implementation may use different formats, different commands, or different workflows
 - Example: A spec might describe `jobs:` with explicit `number:` and `parent:` fields, but implementation uses `steps:` with auto-generated numbers and dynamic hierarchy via `add --after --child`
 - **Fix:** Always read the actual implementation code (especially config parsing) before writing test data
 **Assuming static vs dynamic behavior:**
 - Tests may assume features work at config-time (static) when they actually work at runtime (dynamic)
 - Example: Assuming hierarchy is defined in config when it's actually built dynamically via commands
 - **Fix:** Trace the actual code path for the feature being tested
 **Splitting one command into many redundant TCs:**
 - Multiple TCs each validate one assertion after the same CLI invocation, creating overlap with unit tests and increasing run cost
 - Example: TC-A checks exit code, TC-B checks report file, TC-C checks summary text for the same command run
 - **Fix:** Consolidate those assertions into one TC and move formatter/parser details to unit tests
 **Example for "Test config file validation":**
 ```markdown
 ## Test Cases
@@ -315,28 +400,33 @@ bundle exec ruby -e '
 ### 9. Write Test Files
 Create the scenario directory with separate files:
 ```bash
 mkdir -p {PACKAGE}/test/e2e/TS-{AREA}-{NNN}-{slug}
 ```
 Write `scenario.yml` (metadata and setup):
 ```
 {PACKAGE}/test/e2e/TS-{AREA}-{NNN}-{slug}/scenario.yml
 ```
 Write scenario pair configs:
 ```
 {PACKAGE}/test/e2e/TS-{AREA}-{NNN}-{slug}/runner.yml.md
 {PACKAGE}/test/e2e/TS-{AREA}-{NNN}-{slug}/verifier.yml.md
 ```
 Write individual TC runner/verifier files for each test case:
 ```
 {PACKAGE}/test/e2e/TS-{AREA}-{NNN}-{slug}/TC-001-{tc-slug}.runner.md
 {PACKAGE}/test/e2e/TS-{AREA}-{NNN}-{slug}/TC-001-{tc-slug}.verify.md
 ```
 Optionally create a fixtures directory if test data is needed:
 ```bash
 mkdir -p {PACKAGE}/test/e2e/TS-{AREA}-{NNN}-{slug}/fixtures
 ```
@@ -373,6 +463,7 @@ Output a summary:
 ## Example Invocations
 **Create a test:**
 ```bash
 ace-bundle wfi://e2e/create
 ```
@@ -380,6 +471,7 @@ ace-bundle wfi://e2e/create
 Creates: `ace-lint/test/e2e/TS-LINT-003-new-test-scenario/` with `scenario.yml` and TC files.
 **Create a contextual test:**
 ```bash
 ace-bundle wfi://e2e/create
 ```
@@ -387,6 +479,7 @@ ace-bundle wfi://e2e/create
 Creates: `ace-lint/test/e2e/TS-LINT-003-config-file-validation/` with `scenario.yml` and TC files for config validation.
 **Create test for new area:**
 ```bash
 ace-bundle wfi://e2e/create
 ```

data/handbook/workflow-instructions/e2e/execute.wf.md CHANGED

@@ -46,12 +46,15 @@ Tag filtering happens at discovery time (before `SetupExecutor` runs). By the ti
 ## Execution Contract
-- Runner is execution-only: execute declared TC actions and capture evidence.
+- Runner is execution-only: execute declared TC actions, leave only real outcome evidence under `results/tc/{NN}/`, and return final observations through the harness.
+- Runner follows the public user path. Do not turn missing docs/help/CLI affordances into embedded workaround instructions.
 - Verifier is verification-only: determine PASS/FAIL using impact-first ordering:
   1. sandbox/project state impact
-  2. explicit artifacts
-  3. debug captures (`stdout`/`stderr`/exit) as fallback
+  2. runner observations
+  3. explicit artifacts that are true product outcomes
+  4. debug captures (`stdout`/`stderr`/exit) as fallback
 - Do not interpret setup ownership in runner TC files; setup is owned by `scenario.yml` + fixtures.
+- Treat workaround pressure recorded in runner observations as a gap to fix, not as permission to strengthen the runner script.
 ## Dual-Agent Verifier
@@ -61,7 +64,7 @@ When `--verify` is passed (or always-on for CLI pipeline runs), execution follow
 2. **Verifier agent** independently inspects the sandbox and artifacts against `TC-*.verify.md` expectations
 3. **Report generator** (`PipelineReportGenerator`) produces deterministic summary from verifier output
-The verifier has no access to the runner's conversation — it evaluates purely from on-disk evidence. This prevents self-confirmation bias.
+The verifier has no access to the runner's conversation — it evaluates from sandbox evidence plus the structured runner observations persisted by the harness. This prevents self-confirmation bias while still surfacing execution context.
 ## Subagent Mode
@@ -75,6 +78,7 @@ When invoked as a subagent (via Task tool from orchestrator):
 - **Failed**: {count}
 - **Total**: {count}
 - **Report Paths**: {timestamp}-{short-pkg}-{short-id}.*
+- **Observations**: Brief factual summary or "None"
 - **Issues**: Brief description or "None"
 ```
@@ -149,8 +153,8 @@ For each TC (TC-NNN):
 1. **Check filter** — skip if `FILTERED_CASES` is set and TC not in list
 2. **Read** the runner file objective
-3. **Execute** runner steps, save artifacts to `results/tc/{NN}/`
-4. **Capture** exit codes, output, error messages
+3. **Execute** runner steps, save only real outcome artifacts to `results/tc/{NN}/`
+4. **Return** factual runner observations through the harness
 5. **Evaluate** against verifier expectations
 6. **Record** Pass/Fail with per-TC evidence
@@ -250,4 +254,4 @@ Reports: `.ace-local/test-e2e/{timestamp}-{short-pkg}-{short-id}-reports/`
 | TC fails | Record details, continue remaining TCs, include in report |
 | Sandbox missing/corrupted | Report error, do NOT recreate, return error summary |
 | TC filter mismatch | STOP, do not write reports, offer re-run |
-| Missing TC pair file | Report error for that TC, skip it, continue others |
+| Missing TC pair file | Report error for that TC, skip it, continue others |

data/handbook/workflow-instructions/e2e/fix.wf.md CHANGED

@@ -1,23 +1,33 @@
 ---
+name: e2e-fix
+description: Diagnose, fix, and rerun failing E2E scenarios with a self-bootstrapping analysis loop.
+allowed-tools:
+- Bash(ace-bundle:*)
+- Bash(ace-test:*)
+- Read
+- Write
+- Edit
+- Skill
 doc-type: workflow
 title: Fix E2E Tests Workflow
 purpose: fix-e2e-tests workflow instruction
 ace-docs:
-  last-updated: 2026-03-13
-  last-checked: 2026-03-21
+  last-updated: 2026-04-19
+  last-checked: 2026-04-19
 ---
 # Fix E2E Tests Workflow
 ## Goal
-Apply targeted fixes for failing E2E scenarios based on an existing E2E failure analysis report.
+Diagnose, fix, and rerun failing E2E scenarios with a single workflow entrypoint.
-This workflow is execution-only. Root cause classification is handled by `wfi://e2e/analyze-failures`.
+This workflow owns analysis readiness before any fix is applied. Reuse an existing analysis report when it is complete; otherwise generate or complete it via `wfi://e2e/analyze-failures`, then continue directly into the fix loop.
-## Hard Gate (Required Before Any Fix)
+## Analysis Readiness Gate
+Before any fix, ensure an analysis report exists with:
-Do not apply any fix until an analysis report exists with:
 - scenario / TC identifier
 - category (`code-issue`, `test-issue`, `runner-infrastructure-issue`)
 - evidence from reports/artifacts
@@ -26,22 +36,29 @@ Do not apply any fix until an analysis report exists with:
 - primary candidate files
 - do-not-touch boundaries
 - rerun scope recommendation
+- `Docs / Help Drift From E2E Failures` section with `Public Surface Checked`, `Drift Found`, and `Update Targets`
+If analysis is missing or incomplete, generate or refresh it first:
-If analysis is missing or incomplete, stop and run:
 ```bash
 ace-bundle wfi://e2e/analyze-failures
 ```
+Then continue this workflow using the resulting `E2E Failure Analysis Report`, `Fix Decisions`, and `Execution Plan Input` as the source of truth. Do not stop merely because analysis had to be generated.
 ## Required Input
-Use the output section from `e2e/analyze-failures`:
+Use the output section from `e2e/analyze-failures` when present, whether it was provided up front or generated by this workflow:
 - `## E2E Failure Analysis Report`
+- `## Docs / Help Drift From E2E Failures`
 - `## Fix Decisions`
 - `### Execution Plan Input`
 ## Autonomy Rule
 - Do not ask the user to choose fix target, category, or rerun scope.
+- If analysis is missing, run `wfi://e2e/analyze-failures` yourself before fixing.
 - If analysis is incomplete, auto-complete missing decision fields via local evidence (reports, artifacts, scenario files, implementation), then proceed.
 - Only stop for hard blockers (missing files/tools/permissions).
@@ -61,27 +78,40 @@ Apply fixes in this order:
 ## Fix Procedure
-1. Pick the first prioritized item from analysis
+1. Establish or refresh analysis
+- Check for a current analysis report that satisfies the Analysis Readiness Gate.
+- If none exists, or if required fields are missing, including the docs/help drift section, run `ace-bundle wfi://e2e/analyze-failures`.
+- Reuse the most recent valid analysis output as the source of truth for fix selection.
+- Treat full-suite/package reruns and targeted scenario reruns as different scopes. Do not label a broader suite failure set as a regression in a previously fixed targeted scenario unless the same scenario fails again on a clean rerun.
+2. Pick the first prioritized item from analysis
 - Use the selected "First item to fix"
 - Confirm category, fix target, and rerun scope
 - Apply the "Chosen fix decision" and primary candidate files directly
-2. Apply category-specific fix
+3. Apply category-specific fix
 ### Category: runner-infrastructure-issue
 - Fix runner/sandbox/provider/reporting/orchestration behavior
 - Verify with runner tests when applicable: `ace-test ace-test-runner-e2e`
 ### Category: code-issue
 - Fix package/tool behavior in implementation code
 - Add/update unit tests if needed
+- When the user job is valid but not achievable from docs/help/public CLI, apply the documented docs/help update target instead of codifying the workaround in the scenario
 ### Category: test-issue
 - Fix scenario definition, runner/verifier criteria, fixtures, or setup steps
 - Preserve role split: runner is execution-only, verifier is impact-first verdict
 - Keep implementation unchanged unless analysis is revised
+- Remove hidden recipes, workaround branches, and unsupported internal-detail checks from goal-style TCs
-3. Rerun the selected failing scope after each fix
+4. Rerun the selected failing scope after each fix
 After every implemented fix, rerun the analysis-selected failing scope before moving to the next item or recommending release.
@@ -96,25 +126,31 @@ ace-test-e2e {package}
 ```
 Rules:
 - Scenario rerun is the default after each fix iteration.
 - Use package rerun only when analysis explicitly selected package scope.
 - For multiple failing scenarios, rerun each scenario explicitly.
 ```text
 ace-test-e2e ace-assign TS-ASSIGN-001
 ace-test-e2e ace-assign TS-ASSIGN-002
 ace-test-e2e ace-bundle TS-BUNDLE-001
 ```
 - Record the rerun command and result in the execution summary for every fix item.
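
The per-scenario rerun rule above could be scripted roughly as follows. This is a sketch under assumptions: `rerun_scenarios` and the `RERUN_CMD` override are hypothetical (the override exists only so the loop can be dry-run without the real CLI), and the only invocation form taken from the workflow is `ace-test-e2e {package} {test-id}`.

```bash
# Hypothetical helper: rerun_scenarios "<package> <test-id>" ...
# RERUN_CMD defaults to ace-test-e2e; overriding it is an assumption for dry runs.
rerun_scenarios() {
  local cmd="${RERUN_CMD:-ace-test-e2e}" pair result failures=0
  for pair in "$@"; do
    # $pair is intentionally unquoted: it expands to "<package> <test-id>".
    if $cmd $pair >/dev/null 2>&1; then
      result=pass
    else
      result=fail
      failures=$((failures + 1))
    fi
    # One row per scenario: test id, rerun command, result.
    printf '| %s | `%s %s` | %s |\n' "${pair#* }" "$cmd" "$pair" "$result"
  done
  # Non-zero status when any scenario still fails, so the fix loop can continue.
  [ "$failures" -eq 0 ]
}
```

A call such as `rerun_scenarios "ace-assign TS-ASSIGN-001" "ace-bundle TS-BUNDLE-001"` runs one explicit command per scenario and emits rows that can be adapted into the execution summary.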
-4. Re-check classification when evidence conflicts
+5. Re-check classification when evidence conflicts
 - If outcome contradicts analysis, return to `e2e/analyze-failures`
 - Update analysis report and re-select a new autonomous chosen fix decision before continuing
+- If a suite/package report conflicts with a scenario report, the scenario report wins and the aggregate mismatch must be fixed or explicitly tracked before relying on suite-level TC mappings.
+6. Iterate until all targeted failures are resolved
-5. Iterate until all targeted failures are resolved
 - Keep one active scenario/TC at a time
 - Preserve cost-conscious rerun discipline
-6. Run a final explicit failing-scenario checkpoint before concluding the fix session
+7. Run a final explicit failing-scenario checkpoint before concluding the fix session
 After the currently targeted failures are addressed, require one final:
@@ -136,12 +172,25 @@ Use one explicit command per previously failing scenario to confirm no targeted
 ```markdown
 ## E2E Fix Execution Summary
+Analysis Source: reused existing analysis | generated via `wfi://e2e/analyze-failures` | refreshed incomplete analysis
 | Scenario / TC | Category | Change Applied | Verification Command | Result |
 |---|---|---|---|---|
 | ... | ... | ... | ... | pass/fail |
 ```
+If the analysis reported docs/help drift, include:
+```markdown
+## Docs / Help Updates
+| Scenario / TC | Public Surface Updated | Why |
+|---|---|---|
+| ... | docs/usage.md, CLI --help | E2E failure showed the valid user job was not discoverable |
+```
 Include one final row for the batch checkpoint:
 - Verification Command: one explicit rerun command per remaining failed scenario (`ace-test-e2e {package} {test-id}`)
 - Result: `pass` or remaining failing scenarios
 - If failures remain, continue the fix loop instead of treating the session as complete
@@ -160,7 +209,8 @@ If unresolved:
 - Fixes are traceable to analyzed failures
 - Verification scope matches analysis recommendation, including mandatory reruns after each fix
+- Any docs/help drift from analysis is fixed or explicitly carried as an unresolved blocker
 - Cost-conscious rerun strategy was followed
 - Final explicit per-scenario rerun checkpoint for all targeted failures was completed before concluding the fix session
 - No user clarification was required for fix targeting/scope in normal flow
-- Targeted failures pass, or blockers are explicitly documented
+- Targeted failures pass, or blockers are explicitly documented