@allurereport/plugin-agent 3.5.0

package/README.md ADDED
@@ -0,0 +1,304 @@
1
+ # Agent Plugin
2
+
3
+ [<img src="https://allurereport.org/public/img/allure-report.svg" height="85px" alt="Allure Report logo" align="right" />](https://allurereport.org "Allure Report")
4
+
5
+ - Learn more about Allure Report at https://allurereport.org
6
+ - 📚 [Documentation](https://allurereport.org/docs/) – discover official documentation for Allure Report
7
+ - ❓ [Questions and Support](https://github.com/orgs/allure-framework/discussions/categories/questions-support) – get help from the team and community
8
+ - 📢 [Official announcements](https://github.com/orgs/allure-framework/discussions/categories/announcements) – be in touch with the latest updates
9
+ - 💬 [General Discussion](https://github.com/orgs/allure-framework/discussions/categories/general-discussion) – engage in casual conversations, share insights and ideas with the community
10
+
11
+ ---
12
+
13
+ ## Overview
14
+
15
+ This plugin writes AI-friendly markdown summaries for a test run. It is designed for
16
+ flows like:
17
+
18
+ ```shell
19
+ ALLURE_AGENT_OUTPUT=./out/agent-report npx allure run -- npm test
20
+ ```
21
+
22
+ When enabled, the plugin writes:
23
+
24
+ - `index.md` with run summary, advisory findings, expected-scope overview, and links to every logical test
25
+ - `manifest/test-events.jsonl` as the append-only live event stream while the run is active
26
+ - one markdown file per logical test under `tests/<environment>/<historyId-or-trId>.md`
27
+ - `manifest/run.json`, `manifest/tests.jsonl`, and `manifest/findings.jsonl` for machine-readable review
28
+ - copied run logs and other artifacts under `artifacts/`
29
+ - `AGENTS.md` with guidance for consuming the directory
30
+ - `manifest/expected.json` when `ALLURE_AGENT_EXPECTATIONS` is provided
31
+ - `project/docs/allure-agent-mode.md` when the project has a guide at `docs/allure-agent-mode.md`
32
+
33
+ If no output directory is configured, the plugin does nothing.
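The per-test file layout listed above can be sketched as follows. This is a minimal illustration, not the plugin's implementation: `agentTestFilePath` is a hypothetical helper, and the character replacement is an assumed stand-in for whatever slug rules the plugin actually applies. It only encodes the documented fallback from history ID to test result ID.

```ts
// Hypothetical helper: builds the per-test markdown path `tests/<environment>/<id>.md`.
// The history ID is preferred; the test result ID is used when no history ID exists.
function agentTestFilePath(environment: string, testResultId: string, historyId?: string): string {
  const id = historyId ? historyId : testResultId;
  // Assumed sanitization: replace anything outside a conservative file-name alphabet.
  const safe = (part: string): string => part.replace(/[^A-Za-z0-9._-]/g, "_");
  return `tests/${safe(environment)}/${safe(id)}.md`;
}
```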
34
+
35
+ The plugin is read-only by design: it reports on the run and never modifies tests. A separate
36
+ harness layer can consume the generated manifests, plan enrichment work, and decide whether
37
+ a rerun is ready to accept. See [the enrichment loop guide](../../docs/agent_enrichment_loop.md).
38
+
39
+ ## Verification Standard
40
+
41
+ - If a command executes tests and its result will be used for smoke checking, reasoning, review, coverage analysis, debugging, or any user-facing conclusion, run it through `allure run`. It preserves the original console logs and adds agent-mode artifacts when you need them.
42
+ - Use `allure run` for smoke checks too, even when the change is small or mechanical.
43
+ - Only skip agent mode when it is impossible or when you are debugging agent mode itself.
44
+
45
+ ## Skills-First Workflow
46
+
47
+ The downstream workflow is intended to be skills-first:
48
+
49
+ 1. install the Allure skills bundle
50
+ 2. run the setup skill in a project
51
+ 3. let the setup skill create or update root `AGENTS.md`
52
+ 4. let the setup skill create `docs/allure-agent-mode.md`
53
+ 5. use Allure agent mode for future test work, guided by the project guide and the per-run manifests
54
+
55
+ Every generated run includes an `AGENTS.md` playbook. When the project has
56
+ `docs/allure-agent-mode.md`, the run output also copies that guide and tells agents
57
+ to read it first.
58
+
59
+ ## Install
60
+
61
+ Use your favorite package manager to install the package:
62
+
63
+ ```shell
64
+ npm add @allurereport/plugin-agent
65
+ yarn add @allurereport/plugin-agent
66
+ pnpm add @allurereport/plugin-agent
67
+ ```
68
+
69
+ Then, add the plugin to the Allure configuration file:
70
+
71
+ ```diff
72
+ import { defineConfig } from "allure";
73
+
74
+ export default defineConfig({
75
+   name: "Allure Report",
75
+   output: "./allure-report",
76
+   plugins: {
77
+ +   agent: {
78
+ +     options: {
79
+ +       outputDir: "./out/agent-report",
80
+ +     },
81
+ +   },
82
+   },
84
+ });
85
+ ```
86
+
87
+ You can also enable it through an environment variable:
88
+
89
+ ```shell
90
+ ALLURE_AGENT_OUTPUT=./out/agent-report npx allure run -- npm test
91
+ ```
92
+
93
+ To compare the run against an intended scope, provide an expectations file:
94
+
95
+ ```shell
96
+ ALLURE_AGENT_OUTPUT=./out/agent-report \
97
+ ALLURE_AGENT_EXPECTATIONS=./out/agent-expected.yaml \
98
+ npx allure run -- npm test
99
+ ```
100
+
101
+ ## Options
102
+
103
+ The plugin accepts the following options:
104
+
105
+ | Option | Description | Type | Default |
106
+ |--------|-------------|------|---------|
107
+ | `outputDir` | Directory where the agent output (markdown and manifests) is written. Relative paths are resolved from the working directory of the `allure` process | `string` | value of `ALLURE_AGENT_OUTPUT` |
108
+
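The precedence between the option and the environment variable can be summarized in a small sketch. `resolveAgentOutputDir` and `AgentPluginOptions` are hypothetical names. The behavior encoded is only what this README states: `outputDir` wins, `ALLURE_AGENT_OUTPUT` is the fallback, relative paths resolve against the working directory of the `allure` process, and no directory means the plugin stays inactive. Treating an empty string as unset is an additional assumption.

```ts
import { resolve } from "node:path";

// Hypothetical options shape; only `outputDir` is documented above.
interface AgentPluginOptions {
  outputDir?: string;
}

// Returns the absolute output directory, or undefined when agent output is disabled.
function resolveAgentOutputDir(
  options: AgentPluginOptions,
  env: Record<string, string | undefined>,
  cwd: string,
): string | undefined {
  // The option takes precedence; the environment variable is the fallback.
  // An empty string is treated as unset (an assumption of this sketch).
  const configured = options.outputDir || env.ALLURE_AGENT_OUTPUT;
  if (!configured) {
    return undefined; // no directory configured: the plugin does nothing
  }
  // resolve() keeps absolute paths as they are and anchors relative ones at `cwd`.
  return resolve(cwd, configured);
}
```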
109
+ ## Environment Variables
110
+
111
+ | Variable | Description |
112
+ |----------|-------------|
113
+ | `ALLURE_AGENT_OUTPUT` | Directory where the agent output should be written when `outputDir` is not set |
114
+ | `ALLURE_AGENT_EXPECTATIONS` | Optional path to a YAML or JSON file describing expected and forbidden test scope |
115
+ | `ALLURE_AGENT_COMMAND` | The executed command string recorded in `manifest/run.json` and `index.md` |
116
+ | `ALLURE_AGENT_NAME` | Optional agent identifier recorded in `manifest/run.json` |
117
+ | `ALLURE_AGENT_LOOP_ID` | Optional loop identifier recorded in `manifest/run.json` |
118
+ | `ALLURE_AGENT_TASK_ID` | Optional task identifier recorded in `manifest/run.json` |
119
+ | `ALLURE_AGENT_CONVERSATION_ID` | Optional conversation identifier recorded in `manifest/run.json` |
120
+
121
+ ## Manifest Contract
122
+
123
+ The plugin emits two kinds of output:
124
+
125
+ - Markdown for direct review:
126
+   - `index.md`
127
+   - `tests/<environment>/<slug>.md`
128
+   - `AGENTS.md`
129
+ - Machine-readable manifests for agents and tooling:
130
+   - `manifest/run.json`
131
+   - `manifest/test-events.jsonl`
132
+   - `manifest/tests.jsonl`
133
+   - `manifest/findings.jsonl`
134
+   - `manifest/expected.json` when an expectations file is provided
135
+   - `project/docs/allure-agent-mode.md` when the project guide is available
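The `.jsonl` manifests hold one JSON document per line. A minimal reader for the completed files (`manifest/tests.jsonl`, `manifest/findings.jsonl`) might look like this. `parseJsonl` is a hypothetical helper, and records are left as `unknown` because their fields are not specified in this README.

```ts
// Minimal JSON Lines reader: one JSON value per non-blank line.
function parseJsonl(text: string): unknown[] {
  const records: unknown[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // skip blank lines such as the trailing newline
    try {
      records.push(JSON.parse(line));
    } catch (error) {
      // Report the 1-based line number so a malformed record is easy to locate.
      throw new Error(`Invalid JSON on line ${index + 1}: ${(error as Error).message}`);
    }
  });
  return records;
}
```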
136
+
137
+ `index.md` is the landing page for the run. It includes run identity, expected scope,
138
+ advisory check summary, process logs, and grouped test links.
139
+
140
+ Each test markdown file includes:
141
+
142
+ - test identity and metadata
143
+ - expectation comparison
144
+ - copied attachment links
145
+ - retry history
146
+ - advisory findings and rerun guidance when evidence is weak
147
+
148
+ ## Expectations File
149
+
150
+ When `ALLURE_AGENT_EXPECTATIONS` is set, the plugin accepts YAML or JSON, normalizes
151
+ it into `manifest/expected.json`, and compares the run against it.
152
+
153
+ Supported top-level fields:
154
+
155
+ ```yaml
156
+ goal: Validate feature A
157
+ task_id: feature-a
158
+ expected:
159
+   environments:
160
+     - default
161
+   full_names:
162
+     - suite feature A should work
163
+   full_name_prefixes:
164
+     - suite feature A
165
+   label_values:
166
+     feature: feature-a
167
+ forbidden:
168
+   full_names:
169
+     - suite feature B should not run
170
+   full_name_prefixes:
171
+     - suite feature B
172
+   label_values:
173
+     feature:
174
+       - feature-b
175
+       - legacy-feature
176
+ notes:
177
+   - Only feature A tests should run.
178
+ ```
179
+
180
+ Selectors are advisory. The plugin does not fail the run; it records findings in
181
+ markdown and `manifest/findings.jsonl`.
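To make the selector semantics concrete, here is a sketch of how a test could be classified against the example file. The types mirror the YAML fields above, but the matching rules are assumptions: any matching selector counts (OR semantics), `forbidden` takes priority over `expected`, and environments are ignored. The plugin's real comparison may differ, and in either case the result is advisory.

```ts
// Shapes mirrored from the YAML example; the normalized manifest/expected.json schema may differ.
type LabelSelector = Record<string, string | string[]>;

interface ScopeSelectors {
  environments?: string[];
  full_names?: string[];
  full_name_prefixes?: string[];
  label_values?: LabelSelector;
}

interface AgentExpectations {
  goal?: string;
  task_id?: string;
  expected?: ScopeSelectors;
  forbidden?: ScopeSelectors;
  notes?: string[];
}

interface TestIdentity {
  fullName: string;
  labels: { name: string; value: string }[];
}

// Assumed OR semantics: the group matches if any single selector matches.
function matchesSelectors(test: TestIdentity, selectors: ScopeSelectors = {}): boolean {
  if (selectors.full_names?.includes(test.fullName)) return true;
  if (selectors.full_name_prefixes?.some((prefix) => test.fullName.startsWith(prefix))) return true;
  return Object.entries(selectors.label_values ?? {}).some(([name, wanted]) => {
    const values = Array.isArray(wanted) ? wanted : [wanted];
    return test.labels.some((label) => label.name === name && values.includes(label.value));
  });
}

// Advisory classification only: nothing here fails the run.
function classifyTest(test: TestIdentity, expectations: AgentExpectations): "forbidden" | "expected" | "unexpected" {
  if (matchesSelectors(test, expectations.forbidden)) return "forbidden";
  if (matchesSelectors(test, expectations.expected)) return "expected";
  return "unexpected";
}
```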
182
+
183
+ ## Review Loop
184
+
185
+ The intended usage pattern is:
186
+
187
+ 1. Run tests with `allure run -- <command>`.
188
+ 2. Watch `manifest/run.json` and `manifest/test-events.jsonl` while the run is active.
189
+ 3. Review `index.md` plus the manifest files.
190
+ 4. If evidence is weak, add steps, attachments, labels, or parameters.
191
+ 5. Rerun the same scope with the same expectations file.
192
+ 6. Accept the run or iterate based on advisory findings.
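Step 2 involves reading an append-only stream while it is still being written. One way to poll it safely, sketched under the assumption that the caller re-reads the file as text on each poll, is to consume only newline-terminated lines and remember the offset. `readNewEvents` is a hypothetical helper, and the event fields are not specified here.

```ts
// Incremental tail of an append-only JSONL stream. Only complete (newline-terminated)
// lines are consumed, so a half-written last line is left for the next poll.
function readNewEvents(text: string, consumed: number): { events: unknown[]; consumed: number } {
  const end = text.lastIndexOf("\n") + 1; // index just after the last complete line
  if (end <= consumed) return { events: [], consumed };
  const events = text
    .slice(consumed, end)
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
  return { events, consumed: end };
}
```

Between polls, keep the returned `consumed` value and pass it back in with the freshly read file contents.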
193
+
194
+ For small mechanical test changes, use a scoped agent-mode run for the smoke check
195
+ too. Reserve plain runner commands for cases where agent mode is impossible
196
+ or where you are debugging agent mode itself.
197
+
198
+ For grouped coverage reviews, prefer one temp output directory and one expectations
199
+ file per scope instead of trying to review a whole command matrix from a single run.
200
+
201
+ ## Test Enrichment Best Practices
202
+
203
+ Use agent mode to improve evidence quality, not to decorate tests with generic noise.
204
+
205
+ - Steps must wrap real actions, state transitions, or assertions.
206
+ - Attachments must contain real runtime artifacts from that execution.
207
+ - Metadata should stay minimal and purposeful. Add labels or severity only when
208
+ expectations, debugging, or downstream quality policy actually uses them.
209
+ - Instrument stable helpers when several call sites need the same evidence.
210
+ For example, teach `runCommand` to emit a step instead of wrapping every
211
+ `runCommand(...)` call site with identical step blocks.
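The `runCommand` advice can be sketched as a factory that instruments the helper once. `StepApi`, `Exec`, and `createRunCommand` are hypothetical names introduced for this example. In a real suite, the step and attachment calls would come from your Allure test-framework integration; they are injected here so the sketch runs on its own. Empty stdout or stderr is not attached, in line with the no-placeholder rule.

```ts
// Hypothetical step/attachment API, standing in for your Allure integration.
interface StepApi {
  step<T>(name: string, body: () => Promise<T>): Promise<T>;
  attach(name: string, content: string, contentType: string): Promise<void>;
}

interface CommandResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Hypothetical process runner, e.g. a wrapper around child_process.
type Exec = (command: string, args: string[]) => Promise<CommandResult>;

// Instrument the shared helper once: every caller gets a step named after the real
// command, plus attachments holding that execution's actual output.
function createRunCommand(api: StepApi, exec: Exec) {
  return (command: string, args: string[] = []): Promise<CommandResult> =>
    api.step(`run ${[command, ...args].join(" ")}`, async () => {
      const result = await exec(command, args);
      // Attach only real, non-empty output; never placeholder text.
      if (result.stdout) await api.attach("stdout", result.stdout, "text/plain");
      if (result.stderr) await api.attach("stderr", result.stderr, "text/plain");
      return result;
    });
}
```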
212
+
213
+ Avoid dummy enrichment:
214
+
215
+ - no empty wrapper steps
216
+ - no placeholder `"passed"` or `"success"` attachments
217
+ - no labels or taxonomy that never participates in scope review or policy
218
+
219
+ Acceptance should stay strict even though the plugin itself is advisory:
220
+
221
+ - regenerate expectations before each targeted rerun
222
+ - rerun only the intended tests when possible
223
+ - reject the rerun when scope drifts or high-confidence noop-style findings remain
224
+ - iterate again when evidence is still too weak to explain what happened
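The strict acceptance rules can be expressed as a small decision function over findings. The scope-drift set below is exactly the six checks that the package's remediation table (`ENRICHMENT_ACTIONS_BY_CHECK_NAME`) files under `narrow-test-scope`, and the low-signal set comes from the package's acceptance checklist. The `confidence` field and the `decideAcceptance` name are assumptions; only `check_name` is documented. Whether evidence is "too weak" still needs human or agent judgment and is not modeled.

```ts
// Hypothetical finding record: `check_name` is documented, `confidence` is assumed.
interface Finding {
  check_name: string;
  confidence?: "low" | "medium" | "high";
}

// Checks categorized as `narrow-test-scope` in the remediation table.
const SCOPE_DRIFT_CHECKS = new Set([
  "missing-expected-test",
  "missing-expected-prefix",
  "missing-expected-environment",
  "unexpected-environment",
  "forbidden-selector-match",
  "unexpected-test",
]);

// Low-signal trace checks that block acceptance when reported with high confidence.
const NOOP_STYLE_CHECKS = new Set(["noop-dominated-steps", "step-spam"]);

type Decision = { accept: true } | { accept: false; reasons: string[] };

function decideAcceptance(findings: Finding[]): Decision {
  const reasons: string[] = [];
  for (const finding of findings) {
    if (SCOPE_DRIFT_CHECKS.has(finding.check_name)) {
      reasons.push(`scope drift: ${finding.check_name}`);
    } else if (NOOP_STYLE_CHECKS.has(finding.check_name) && finding.confidence === "high") {
      reasons.push(`high-confidence low-signal trace: ${finding.check_name}`);
    }
  }
  return reasons.length === 0 ? { accept: true } : { accept: false, reasons };
}
```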
225
+
226
+ When agent output does not fully model runner-visible failures:
227
+
228
+ - inspect `artifacts/global/stderr.txt` and global errors before concluding the run is complete
229
+ - treat the review as partial when suite-load, import, or setup failures are visible outside logical test files
230
+ - keep console-only conclusions provisional until the missing modeling is understood
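These rules can be turned into an explicit status before a review is reported. In this sketch, `classifyRuntimeReview` and its input fields are hypothetical: the caller is assumed to have already read global stderr, the global errors, the runner's result count from the run statistics, and the number of records in `manifest/tests.jsonl`. Non-empty stderr only adds a note, because stderr can also carry harmless warnings; global errors, or runner results missing from the logical test files, mark the review as partial.

```ts
// Hypothetical inputs gathered by the reviewer before classifying the run.
interface RuntimeModelingInput {
  globalStderr: string; // contents of artifacts/global/stderr.txt
  globalErrors: string[]; // run-level errors not attached to any logical test
  runnerVisibleResults: number; // result count from the run statistics
  logicalTests: number; // number of records in manifest/tests.jsonl
}

interface RuntimeReview {
  status: "complete" | "partial";
  notes: string[];
}

function classifyRuntimeReview(input: RuntimeModelingInput): RuntimeReview {
  const notes: string[] = [];
  let partial = false;
  if (input.globalStderr.trim() !== "") {
    // Stderr may hold harmless warnings, so on its own it only prompts a manual look.
    notes.push("inspect artifacts/global/stderr.txt before concluding");
  }
  if (input.globalErrors.length > 0) {
    partial = true;
    notes.push(`${input.globalErrors.length} global error(s) outside logical tests`);
  }
  if (input.runnerVisibleResults > input.logicalTests) {
    partial = true;
    const missing = input.runnerVisibleResults - input.logicalTests;
    notes.push(`${missing} runner-visible result(s) not rendered as logical tests`);
  }
  return { status: partial ? "partial" : "complete", notes };
}
```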
231
+
232
+ ## Project Guide
233
+
234
+ Projects using the skills flow should keep a short root `AGENTS.md` and a deeper
235
+ `docs/allure-agent-mode.md`.
236
+
237
+ `AGENTS.md` should route all test-related work to the deeper guide. The deeper guide
238
+ should explain:
239
+
240
+ - the feature-delivery loop
241
+ - the metadata-enrichment loop
242
+ - per-run temp expectations and output paths
243
+ - meaningful evidence rules
244
+ - minimal metadata rules
245
+ - future loops like flaky, known-issue, mute, and quality gates
246
+
247
+ ## Copyable Agent Instructions
248
+
249
+ The generated `AGENTS.md` already contains this guidance for each run. If you want
250
+ the same policy in a project-level skill or agent prompt, you can start with:
251
+
252
+ ```md
253
+ ## Allure Agent Mode Instructions
254
+
255
+ - If a command executes tests and its result will be used for smoke checking, reasoning, review, coverage analysis, debugging, or any user-facing conclusion, run it through `allure run`. It preserves the original console logs and adds agent-mode artifacts when you need them.
256
+ - Use `allure run` for smoke checks too, even when the change is small or mechanical.
257
+ - Only skip agent mode when it is impossible or when you are debugging agent mode itself.
258
+ - Generate or refresh `ALLURE_AGENT_EXPECTATIONS` before each targeted rerun.
259
+ - Run tests with `ALLURE_AGENT_OUTPUT` and review `manifest/run.json`, `manifest/test-events.jsonl`, `index.md`, `manifest/tests.jsonl`, and `manifest/findings.jsonl`.
260
+ - Enrich only the intended tests. Add real steps for real setup, actions, and assertions.
261
+ - Attach only real runtime evidence such as payloads, responses, screenshots, DOM snapshots, diffs, logs, or traces.
262
+ - Keep metadata minimal. Add labels or severity only when scope review, debugging, or quality policy uses them.
263
+ - Instrument stable helpers when several call sites need the same evidence. For example, teach `runCommand` to emit a step instead of wrapping every caller.
264
+ - Reject the rerun if scope drifts, evidence stays weak, or high-confidence noop-style findings remain.
265
+ ```
266
+
267
+ ## Harness API
268
+
269
+ The package also exports a small read-only harness API for agent workflows:
270
+
271
+ ```ts
272
+ import {
273
+ buildAgentExpectations,
274
+ loadAgentOutput,
275
+ planAgentEnrichmentReview,
276
+ reviewAgentOutput,
277
+ } from "@allurereport/plugin-agent";
278
+ ```
279
+
280
+ - `buildAgentExpectations(...)` converts a goal plus target/forbidden selectors into
281
+ the JSON shape expected by `ALLURE_AGENT_EXPECTATIONS`.
282
+ - `loadAgentOutput(...)` reads `manifest/run.json`, `manifest/tests.jsonl`, and
283
+ `manifest/findings.jsonl`.
284
+ - `planAgentEnrichmentReview(...)` maps `check_name` values to enrichment actions
285
+ and returns an acceptance decision.
286
+ - `reviewAgentOutput(...)` is the convenience wrapper that loads and reviews in one call.
287
+
288
+ The harness does not mutate tests. It tells an agent what to fix next and rejects
289
+ acceptance when scope drifts or high-confidence noop-style evidence remains.
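As an illustration of the `check_name` mapping that `planAgentEnrichmentReview(...)` performs, the sketch below groups findings into per-category to-do lists. It does not call the package or reproduce its signature: the category union is copied from the package's `EnrichmentActionCategory` type, the table holds only a few entries copied from `ENRICHMENT_ACTIONS_BY_CHECK_NAME`, and the `review-manually` fallback for unknown check names is an assumption.

```ts
type ActionCategory =
  | "bootstrap-allure"
  | "narrow-test-scope"
  | "repair-test-metadata"
  | "add-meaningful-steps"
  | "add-test-attachments"
  | "add-retry-diagnostics"
  | "collapse-low-signal-trace"
  | "review-manually";

// A subset of ENRICHMENT_ACTIONS_BY_CHECK_NAME, reduced to check name -> category.
const CATEGORY_BY_CHECK: Record<string, ActionCategory> = {
  "no-visible-tests": "bootstrap-allure",
  "unexpected-test": "narrow-test-scope",
  "failed-without-useful-steps": "add-meaningful-steps",
  "failed-without-attachments": "add-test-attachments",
  "retries-without-new-evidence": "add-retry-diagnostics",
  "step-spam": "collapse-low-signal-trace",
};

// Group check names by enrichment category, keeping one entry per distinct check.
function planEnrichment(checkNames: string[]): Map<ActionCategory, string[]> {
  const plan = new Map<ActionCategory, string[]>();
  for (const checkName of checkNames) {
    const category = CATEGORY_BY_CHECK[checkName] ?? "review-manually"; // assumed fallback
    const bucket = plan.get(category) ?? [];
    if (!bucket.includes(checkName)) bucket.push(checkName);
    plan.set(category, bucket);
  }
  return plan;
}
```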
290
+
291
+ ## Enrichment Policy
292
+
293
+ The enrichment loop should add only real runtime evidence:
294
+
295
+ - Steps must wrap real actions, state transitions, or assertions.
296
+ - Attachments must contain runtime data produced by that execution.
297
+ - Feature/task labels are required only when they are used for scope review.
298
+ - Severity should be added only when it matters for review or quality-gate policy.
299
+
300
+ Avoid dummy enrichment such as empty wrapper steps, placeholder `"passed"` text
301
+ attachments, or labels that are never used downstream.
302
+
303
+ For a fuller policy, remediation mapping, and JS/Vitest examples based on the
304
+ existing sandbox tests, see [the enrichment loop guide](../../docs/agent_enrichment_loop.md).
@@ -0,0 +1,18 @@
1
+ export type EnrichmentActionCategory = "bootstrap-allure" | "narrow-test-scope" | "repair-test-metadata" | "add-meaningful-steps" | "add-test-attachments" | "add-retry-diagnostics" | "collapse-low-signal-trace" | "review-manually";
2
+ export type EnrichmentActionDefinition = {
3
+ category: EnrichmentActionCategory;
4
+ title: string;
5
+ guidance: string;
6
+ };
7
+ export declare const ENRICHMENT_ACTIONS_BY_CHECK_NAME: Record<string, EnrichmentActionDefinition>;
8
+ export declare const AGENT_ENRICHMENT_WORKFLOW: readonly ["Generate or refresh `ALLURE_AGENT_EXPECTATIONS` before each targeted enrichment iteration.", "Run tests with `allure run -- <command>` and `ALLURE_AGENT_OUTPUT` enabled.", "Inspect `manifest/run.json`, tail `manifest/test-events.jsonl`, then review `index.md`, `manifest/tests.jsonl`, and `manifest/findings.jsonl` before editing tests.", "Enrich only the intended tests, rerun the same scope, and compare the rerun against `manifest/expected.json` when present.", "Accept the rerun only when scope is clean, evidence is strong enough to review, and no high-confidence dummy findings remain."];
9
+ export declare const AGENT_VERIFICATION_RULES: readonly ["If a command executes tests and its result will be used for smoke checking, reasoning, review, coverage analysis, debugging, or any user-facing conclusion, run it through `allure run`. It preserves the original console logs and adds agent-mode artifacts when you need them.", "Use `allure run` for smoke checks too, even when the change is small or mechanical.", "Only skip agent mode when it is impossible or when you are debugging agent mode itself."];
10
+ export declare const AGENT_SMALL_TEST_CHANGE_WORKFLOW: readonly ["Create a fresh temp `ALLURE_AGENT_OUTPUT` and `ALLURE_AGENT_EXPECTATIONS` for the touched scope before closing the task.", "Run the touched scope with `allure run`, even if the goal is only a smoke check after a mechanical change such as typing cleanup, mock refactors, or helper extraction.", "Review `manifest/run.json`, `manifest/test-events.jsonl`, `index.md`, `manifest/tests.jsonl`, and `manifest/findings.jsonl` before making any final claim."];
11
+ export declare const AGENT_COVERAGE_REVIEW_WORKFLOW: readonly ["Split package or business-logic audits into scoped groups and give each group its own temp output directory and expectations file.", "Review agent-mode artifacts first for each group, then inspect source code only after the runtime evidence shows what actually ran.", "Treat grouped coverage review as incomplete until each scoped run has matching expectations or an explicit note that the audit is intentionally broad."];
12
+ export declare const AGENT_TEST_ENRICHMENT_BEST_PRACTICES: readonly ["Steps must wrap real actions, state transitions, or assertions. Prefer a small setup/action/assertion narrative over event-by-event step spam.", "Attachments must capture real runtime evidence from that execution: payloads, responses, screenshots, DOM snapshots, diffs, logs, or traces.", "Add metadata only when it improves scope review, debugging, or downstream policy. Keep labels and parameters intentionally minimal.", "If multiple call sites need the same evidence, instrument the helper once. Example: teach `runCommand` to emit a step instead of wrapping every `runCommand(...)` call site with identical step blocks."];
13
+ export declare const AGENT_ANTI_DUMMY_POLICY: readonly ["Do not add empty wrapper steps, placeholder attachments, or generic strings such as `passed`, `success`, or static boilerplate.", "Do not attach evidence that is not grounded in the current test execution. Synthetic summaries are fine only when they are derived from real runtime artifacts.", "Do not inflate metadata with labels, stories, or severities that no expectation, review step, or policy actually uses."];
14
+ export declare const AGENT_ACCEPTANCE_CHECKLIST: readonly ["The rerun matches the intended scope and does not trigger forbidden or unexpected-test findings.", "Each touched test shows enough evidence to explain what happened and what was verified.", "Retries include per-attempt diagnostics when the same test reruns.", "No high-confidence anti-dummy findings remain, especially `noop-dominated-steps` or low-signal `step-spam` traces."];
15
+ export declare const AGENT_REVIEW_COMPLETENESS_CHECKLIST: readonly ["The relevant scope was run with agent mode unless that was impossible.", "Scoped reviews created expectations before the final rerun, unless the task was an intentionally broad package-health audit.", "The final conclusion was based on `manifest/run.json`, `manifest/test-events.jsonl`, `index.md`, `manifest/tests.jsonl`, and `manifest/findings.jsonl` rather than console output alone.", "Any missing or partial runtime modeling was called out explicitly before the review was considered complete."];
16
+ export declare const AGENT_PARTIAL_RUNTIME_REVIEW_RULES: readonly ["Suite-load, import, or setup failures may appear only in `artifacts/global/stderr.txt` or global errors instead of `manifest/tests.jsonl`.", "If runner-visible failures are not represented as logical tests, inspect global stderr and treat the run as a partial runtime review, not a clean result set.", "Console-only conclusions stay provisional when agent output is missing, incomplete, or partially modeled."];
17
+ export declare const AGENT_INSTRUCTIONS_TEMPLATE = "## Allure Agent Mode Instructions\n\n- If a command executes tests and its result will be used for smoke checking, reasoning, review, coverage analysis, debugging, or any user-facing conclusion, run it through `allure run`. It preserves the original console logs and adds agent-mode artifacts when you need them.\n- Use `allure run` for smoke checks too, even when the change is small or mechanical.\n- Only skip agent mode when it is impossible or when you are debugging agent mode itself.\n- Generate or refresh `ALLURE_AGENT_EXPECTATIONS` before each targeted rerun.\n- Run tests with `ALLURE_AGENT_OUTPUT` and review `manifest/run.json`, `manifest/test-events.jsonl`, `index.md`, `manifest/tests.jsonl`, and `manifest/findings.jsonl`.\n- Enrich only the intended tests. Add real steps for real setup, actions, and assertions.\n- Attach only real runtime evidence such as payloads, responses, screenshots, DOM snapshots, diffs, logs, or traces.\n- Keep metadata minimal. Add labels or severity only when scope review, debugging, or quality policy uses them.\n- Instrument stable helpers when several call sites need the same evidence. For example, teach `runCommand` to emit a step instead of wrapping every caller.\n- Reject the rerun if scope drifts, evidence stays weak, or high-confidence noop-style findings remain.";
18
+ export declare const renderAgentsGuide: (projectGuidePath?: string) => string;
@@ -0,0 +1,250 @@
1
+ export const ENRICHMENT_ACTIONS_BY_CHECK_NAME = {
2
+ "invalid-expectations-file": {
3
+ category: "bootstrap-allure",
4
+ title: "Repair the expectations file",
5
+ guidance: "Regenerate a valid YAML or JSON expectations file before the next enrichment iteration.",
6
+ },
7
+ "no-visible-tests": {
8
+ category: "bootstrap-allure",
9
+ title: "Restore Allure result generation",
10
+ guidance: "Make sure the test command emits Allure results before rerunning the enrichment loop.",
11
+ },
12
+ "missing-global-logs": {
13
+ category: "bootstrap-allure",
14
+ title: "Capture bootstrap logs when needed",
15
+ guidance: "Keep stdout and stderr capture enabled when you need run-level debugging context.",
16
+ },
17
+ "runner-failures-outside-logical-results": {
18
+ category: "bootstrap-allure",
19
+ title: "Inspect partial runtime failures before accepting the review",
20
+ guidance: "Check global stderr and global errors for suite-load, import, or setup failures that were not rendered as logical tests.",
21
+ },
22
+ "unmodeled-visible-results": {
23
+ category: "review-manually",
24
+ title: "Call out partial runtime modeling",
25
+ guidance: "Compare run statistics with the logical test files and document any skipped or non-passed results that were not rendered.",
26
+ },
27
+ "missing-expected-test": {
28
+ category: "narrow-test-scope",
29
+ title: "Bring the intended test back into scope",
30
+ guidance: "Regenerate expectations and rerun only the planned tests or selectors.",
31
+ },
32
+ "missing-expected-prefix": {
33
+ category: "narrow-test-scope",
34
+ title: "Restore the intended name-prefix scope",
35
+ guidance: "Check the selector and rerun only the feature slice that should have matched it.",
36
+ },
37
+ "missing-expected-environment": {
38
+ category: "narrow-test-scope",
39
+ title: "Rerun the intended environment",
40
+ guidance: "Constrain the rerun to the expected environment before accepting the result.",
41
+ },
42
+ "missing-expected-label-selector": {
43
+ category: "repair-test-metadata",
44
+ title: "Add the minimal missing scope label",
45
+ guidance: "Only add the labels required by the expectations selector; do not inflate metadata.",
46
+ },
47
+ "unexpected-environment": {
48
+ category: "narrow-test-scope",
49
+ title: "Remove unrelated environments from the rerun",
50
+ guidance: "Tighten the rerun selector so unrelated environments do not appear in agent output.",
51
+ },
52
+ "forbidden-selector-match": {
53
+ category: "narrow-test-scope",
54
+ title: "Stop forbidden tests from running",
55
+ guidance: "Reject the run, narrow the rerun scope, and keep the forbidden selectors in expectations.",
56
+ },
57
+ "unexpected-test": {
58
+ category: "narrow-test-scope",
59
+ title: "Remove unexpected tests from the rerun",
60
+ guidance: "Rerun only the intended tests or broaden expectations only if the extra test is truly in scope.",
61
+ },
62
+ "metadata-mismatch": {
63
+ category: "repair-test-metadata",
64
+ title: "Repair scope metadata with the minimum required labels",
65
+ guidance: "Add only the labels or parameters needed for expectations, review, or quality gates.",
66
+ },
67
+ "history-id-collision": {
68
+ category: "repair-test-metadata",
69
+ title: "Repair logical test identity",
70
+ guidance: "Use stable, unique history IDs so distinct logical tests do not collapse into one file.",
71
+ },
72
+ "failed-without-useful-steps": {
73
+ category: "add-meaningful-steps",
74
+ title: "Add meaningful setup, action, and assertion steps",
75
+ guidance: "Wrap only real actions, state transitions, and checks in Allure steps before rerunning.",
76
+ },
77
+ "failed-without-attachments": {
78
+ category: "add-test-attachments",
79
+ title: "Attach focused runtime evidence near the failure",
80
+ guidance: "Add real payloads, responses, screenshots, DOM snapshots, diffs, or logs near the failing point.",
81
+ },
82
+ "nontrivial-run-with-empty-trace": {
83
+ category: "add-meaningful-steps",
84
+ title: "Make the execution path observable",
85
+ guidance: "Expose the real setup, action, and verification path with steps or attachments on the next run.",
86
+ },
87
+ "retries-without-new-evidence": {
88
+ category: "add-retry-diagnostics",
89
+ title: "Capture what changes between retries",
90
+ guidance: "Add per-attempt diagnostics so retries show new evidence instead of repeating the same trace.",
91
+ },
92
+ "noop-dominated-steps": {
93
+ category: "collapse-low-signal-trace",
94
+ title: "Replace noop-style steps with real evidence",
95
+ guidance: "Keep only steps tied to real actions or checks, and replace bulk event spam with a compact artifact when needed.",
96
+ },
97
+ "step-spam": {
98
+ category: "collapse-low-signal-trace",
99
+ title: "Reduce low-signal step spam",
100
+ guidance: "Prefer a smaller set of meaningful steps plus one compact text attachment when the trace is mostly event logs.",
101
+ },
102
+ "global-only-artifacts": {
103
+ category: "add-test-attachments",
104
+ title: "Move evidence closer to the failing test",
105
+ guidance: "Use step-scoped or test-scoped attachments near the relevant failing action instead of relying only on global logs.",
106
+ },
107
+ "passed-without-observable-evidence": {
108
+ category: "add-meaningful-steps",
109
+ title: "Make the success path reviewable",
110
+ guidance: "Add a few real verification steps or attachments so the passing test shows what it proved.",
111
+ },
112
+ };
113
+ export const AGENT_ENRICHMENT_WORKFLOW = [
114
+ "Generate or refresh `ALLURE_AGENT_EXPECTATIONS` before each targeted enrichment iteration.",
115
+ "Run tests with `allure run -- <command>` and `ALLURE_AGENT_OUTPUT` enabled.",
116
+ "Inspect `manifest/run.json`, tail `manifest/test-events.jsonl`, then review `index.md`, `manifest/tests.jsonl`, and `manifest/findings.jsonl` before editing tests.",
117
+ "Enrich only the intended tests, rerun the same scope, and compare the rerun against `manifest/expected.json` when present.",
118
+ "Accept the rerun only when scope is clean, evidence is strong enough to review, and no high-confidence dummy findings remain.",
119
+ ];
120
+ export const AGENT_VERIFICATION_RULES = [
121
+ "If a command executes tests and its result will be used for smoke checking, reasoning, review, coverage analysis, debugging, or any user-facing conclusion, run it through `allure run`. It preserves the original console logs and adds agent-mode artifacts when you need them.",
122
+ "Use `allure run` for smoke checks too, even when the change is small or mechanical.",
123
+ "Only skip agent mode when it is impossible or when you are debugging agent mode itself.",
124
+ ];
125
+ export const AGENT_SMALL_TEST_CHANGE_WORKFLOW = [
126
+ "Create a fresh temp `ALLURE_AGENT_OUTPUT` and `ALLURE_AGENT_EXPECTATIONS` for the touched scope before closing the task.",
127
+ "Run the touched scope with `allure run`, even if the goal is only a smoke check after a mechanical change such as typing cleanup, mock refactors, or helper extraction.",
128
+ "Review `manifest/run.json`, `manifest/test-events.jsonl`, `index.md`, `manifest/tests.jsonl`, and `manifest/findings.jsonl` before making any final claim.",
129
+ ];
130
+ export const AGENT_COVERAGE_REVIEW_WORKFLOW = [
131
+ "Split package or business-logic audits into scoped groups and give each group its own temp output directory and expectations file.",
132
+ "Review agent-mode artifacts first for each group, then inspect source code only after the runtime evidence shows what actually ran.",
133
+ "Treat grouped coverage review as incomplete until each scoped run has matching expectations or an explicit note that the audit is intentionally broad.",
134
+ ];
135
+ export const AGENT_TEST_ENRICHMENT_BEST_PRACTICES = [
136
+ "Steps must wrap real actions, state transitions, or assertions. Prefer a small setup/action/assertion narrative over event-by-event step spam.",
137
+ "Attachments must capture real runtime evidence from that execution: payloads, responses, screenshots, DOM snapshots, diffs, logs, or traces.",
138
+ "Add metadata only when it improves scope review, debugging, or downstream policy. Keep labels and parameters intentionally minimal.",
139
+ "If multiple call sites need the same evidence, instrument the helper once. Example: teach `runCommand` to emit a step instead of wrapping every `runCommand(...)` call site with identical step blocks.",
140
+ ];
141
+ export const AGENT_ANTI_DUMMY_POLICY = [
142
+ "Do not add empty wrapper steps, placeholder attachments, or generic strings such as `passed`, `success`, or static boilerplate.",
143
+ "Do not attach evidence that is not grounded in the current test execution. Synthetic summaries are fine only when they are derived from real runtime artifacts.",
144
+ "Do not inflate metadata with labels, stories, or severities that no expectation, review step, or policy actually uses.",
145
+ ];
146
+ export const AGENT_ACCEPTANCE_CHECKLIST = [
147
+ "The rerun matches the intended scope and does not trigger forbidden or unexpected-test findings.",
148
+ "Each touched test shows enough evidence to explain what happened and what was verified.",
149
+ "Retries include per-attempt diagnostics when the same test reruns.",
150
+ "No high-confidence anti-dummy findings remain, especially `noop-dominated-steps` or low-signal `step-spam` traces.",
151
+ ];
152
+ export const AGENT_REVIEW_COMPLETENESS_CHECKLIST = [
153
+ "The relevant scope was run with agent mode unless that was impossible.",
154
+ "Scoped reviews created expectations before the final rerun, unless the task was an intentionally broad package-health audit.",
155
+ "The final conclusion was based on `manifest/run.json`, `manifest/test-events.jsonl`, `index.md`, `manifest/tests.jsonl`, and `manifest/findings.jsonl` rather than console output alone.",
156
+ "Any missing or partial runtime modeling was called out explicitly before the review was considered complete.",
157
+ ];
158
+ export const AGENT_PARTIAL_RUNTIME_REVIEW_RULES = [
159
+ "Suite-load, import, or setup failures may appear only in `artifacts/global/stderr.txt` or global errors instead of `manifest/tests.jsonl`.",
160
+ "If runner-visible failures are not represented as logical tests, inspect global stderr and treat the run as a partial runtime review, not a clean result set.",
161
+ "Console-only conclusions stay provisional when agent output is missing, incomplete, or partially modeled.",
162
+ ];
163
+ export const AGENT_INSTRUCTIONS_TEMPLATE = `## Allure Agent Mode Instructions
164
+
165
+ - If a command executes tests and its result will be used for smoke checking, reasoning, review, coverage analysis, debugging, or any user-facing conclusion, run it through \`allure run\`. It preserves the original console logs and adds agent-mode artifacts when you need them.
166
+ - Use \`allure run\` for smoke checks too, even when the change is small or mechanical.
167
+ - Only skip agent mode when it is impossible or when you are debugging agent mode itself.
168
+ - Generate or refresh \`ALLURE_AGENT_EXPECTATIONS\` before each targeted rerun.
169
+ - Run tests with \`ALLURE_AGENT_OUTPUT\` and review \`manifest/run.json\`, \`manifest/test-events.jsonl\`, \`index.md\`, \`manifest/tests.jsonl\`, and \`manifest/findings.jsonl\`.
170
+ - Enrich only the intended tests. Add real steps for real setup, actions, and assertions.
171
+ - Attach only real runtime evidence such as payloads, responses, screenshots, DOM snapshots, diffs, logs, or traces.
172
+ - Keep metadata minimal. Add labels or severity only when scope review, debugging, or quality policy uses them.
173
+ - Instrument stable helpers when several call sites need the same evidence. For example, teach \`runCommand\` to emit a step instead of wrapping every caller.
174
+ - Reject the rerun if scope drifts, evidence stays weak, or high-confidence noop-style findings remain.`;
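The instructions ask for every test-executing command to go through `allure run` with `ALLURE_AGENT_OUTPUT` set, and for expectations to be refreshed before each targeted rerun. A minimal sketch of assembling such an invocation, without running it, follows. Treating `ALLURE_AGENT_EXPECTATIONS` as a file path is an assumption based on `manifest/expected.json` being "copied from" it; `buildAgentRun` and the example paths are hypothetical.

```javascript
// Hypothetical helper: describe an agent-mode `allure run` invocation as command, args, and env.
const buildAgentRun = ({ outputDir, expectationsPath, testCommand }) => {
  const env = { ...process.env, ALLURE_AGENT_OUTPUT: outputDir };
  if (expectationsPath) {
    // Assumed to be a path to the expectations file generated for this rerun.
    env.ALLURE_AGENT_EXPECTATIONS = expectationsPath;
  } else {
    // Design choice: do not silently inherit expectations left over from an earlier rerun.
    delete env.ALLURE_AGENT_EXPECTATIONS;
  }
  // Everything after `--` is the project's own test command, passed through unchanged.
  return { command: "npx", args: ["allure", "run", "--", ...testCommand], env };
};

const run = buildAgentRun({
  outputDir: "./out/agent-report",
  expectationsPath: "./out/expected.json",
  testCommand: ["npm", "test"],
});
console.log(run.command, run.args.join(" ")); // npx allure run -- npm test
```

The result can be handed to Node's `child_process.spawn(run.command, run.args, { env: run.env, stdio: "inherit" })`, which keeps the original console output visible while the agent artifacts are written.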
175
+ const renderBullets = (items) => items.map((item) => `- ${item}`).join("\n");
176
+ const renderNumbered = (items) => items.map((item, index) => `${index + 1}. ${item}`).join("\n");
177
+ const renderRemediationGuide = () => Object.entries(ENRICHMENT_ACTIONS_BY_CHECK_NAME)
178
+ .map(([checkName, action]) => `- \`${checkName}\`: ${action.title}. ${action.guidance}`)
179
+ .join("\n");
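The three render helpers above turn arrays and the remediation map into markdown. The sketch below copies them with the map passed as a parameter, so they can be exercised in isolation. The single remediation entry is hypothetical; only its `{ title, guidance }` shape is taken from `renderRemediationGuide`.

```javascript
// Copies of the helpers above; the remediation map is a parameter here instead of a module constant.
const renderBullets = (items) => items.map((item) => `- ${item}`).join("\n");
const renderNumbered = (items) => items.map((item, index) => `${index + 1}. ${item}`).join("\n");
const renderRemediationGuide = (actionsByCheckName) =>
  Object.entries(actionsByCheckName)
    .map(([checkName, action]) => `- \`${checkName}\`: ${action.title}. ${action.guidance}`)
    .join("\n");

// Hypothetical entry; the real ENRICHMENT_ACTIONS_BY_CHECK_NAME is defined elsewhere in the module.
const actions = {
  "step-spam": {
    title: "Collapse low-signal steps",
    guidance: "Keep steps for real setup, actions, and assertions.",
  },
};

console.log(renderNumbered(["Run", "Review"]));
// 1. Run
// 2. Review
console.log(renderRemediationGuide(actions));
// - `step-spam`: Collapse low-signal steps. Keep steps for real setup, actions, and assertions.
```

Because the template appends `". "` after `action.title`, titles in the remediation map should not end with their own period, or the guide would render a doubled `..`.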
180
+ export const renderAgentsGuide = (projectGuidePath) => `# AGENTS Guide
181
+
182
+ ## Reading Order
183
+
184
+ ${projectGuidePath
185
+ ? `1. Read [project guidance](${projectGuidePath}) first for repo-specific testing conventions and loop expectations.
186
+ 2. Read \`manifest/run.json\` for the current phase, counts, and modeling summary.
187
+ 3. Tail \`manifest/test-events.jsonl\` for the newest structured updates while the run is active.
188
+ 4. Open \`index.md\` for run-level status, scope summary, and the highest-priority findings.
189
+ 5. Open the relevant file under \`tests/<environment>/<historyId-or-trId>.md\` for evidence review.
190
+ 6. Follow links into \`.assets/\` for test-scoped artifacts and into \`artifacts/global/\` for process logs such as stdout and stderr.`
191
+ : `1. Read \`manifest/run.json\` for the current phase, counts, and modeling summary.
192
+ 2. Tail \`manifest/test-events.jsonl\` for the newest structured updates while the run is active.
193
+ 3. Open \`index.md\` for run-level status, scope summary, and the highest-priority findings.
194
+ 4. Open the relevant file under \`tests/<environment>/<historyId-or-trId>.md\` for evidence review.
195
+ 5. Follow links into \`.assets/\` for test-scoped artifacts and into \`artifacts/global/\` for process logs such as stdout and stderr.`}
196
+
197
+ ## Directory Contract
198
+
199
+ - \`index.md\` contains the triage-oriented run overview.
200
+ - \`manifest/run.json\` is the canonical machine-readable run summary.
201
+ - \`manifest/test-events.jsonl\` is the append-only live event stream for machine consumers during the run.
202
+ - \`manifest/tests.jsonl\` contains one logical test summary per line.
203
+ - \`manifest/findings.jsonl\` contains one advisory finding per line.
204
+ - \`manifest/expected.json\` is copied from \`ALLURE_AGENT_EXPECTATIONS\` when provided.
205
+ - \`project/docs/allure-agent-mode.md\` is copied from the project when available so each run keeps the guide used for that execution.
206
+ - \`tests/<environment>/<slug>.md\` contains one logical test per file.
207
+ - Retries from the same run are nested inside the same logical test file.
208
+ - \`tests/<environment>/<slug>.assets/\` contains copied attachments for that logical test.
209
+ - \`artifacts/global/\` contains copied global artifacts for the whole run.
210
+
211
+ ## Enrichment Loop Workflow
212
+
213
+ ${renderNumbered(AGENT_ENRICHMENT_WORKFLOW)}
214
+
215
+ ## Verification Standard
216
+
217
+ ${renderBullets(AGENT_VERIFICATION_RULES)}
218
+
219
+ ## Small Test Change Workflow
220
+
221
+ ${renderNumbered(AGENT_SMALL_TEST_CHANGE_WORKFLOW)}
222
+
223
+ ## Coverage Review Workflow
224
+
225
+ ${renderNumbered(AGENT_COVERAGE_REVIEW_WORKFLOW)}
226
+
227
+ ## Test Enrichment Best Practices
228
+
229
+ ${renderBullets(AGENT_TEST_ENRICHMENT_BEST_PRACTICES)}
230
+
231
+ ## Anti-Dummy Policy
232
+
233
+ ${renderBullets(AGENT_ANTI_DUMMY_POLICY)}
234
+
235
+ ## Acceptance Checklist
236
+
237
+ ${renderBullets(AGENT_ACCEPTANCE_CHECKLIST)}
238
+
239
+ ## Review Completeness
240
+
241
+ ${renderBullets(AGENT_REVIEW_COMPLETENESS_CHECKLIST)}
242
+
243
+ ## Partial Runtime Review
244
+
245
+ ${renderBullets(AGENT_PARTIAL_RUNTIME_REVIEW_RULES)}
246
+
247
+ ## Remediation Guide
248
+
249
+ ${renderRemediationGuide()}
250
+ `;
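The two "Reading Order" branches in `renderAgentsGuide` differ only by the leading project-guidance step, so the remaining five steps are written out twice with different numbers. One possible refactor, sketched here as a suggestion rather than the plugin's code, keeps a single step list and numbers it with `renderNumbered`. `BASE_READING_ORDER` and `renderReadingOrder` are hypothetical names; the step text is copied from the template and should produce the same lines.

```javascript
const renderNumbered = (items) => items.map((item, index) => `${index + 1}. ${item}`).join("\n");

// Steps shared by both branches, copied from the template above.
const BASE_READING_ORDER = [
  "Read `manifest/run.json` for the current phase, counts, and modeling summary.",
  "Tail `manifest/test-events.jsonl` for the newest structured updates while the run is active.",
  "Open `index.md` for run-level status, scope summary, and the highest-priority findings.",
  "Open the relevant file under `tests/<environment>/<historyId-or-trId>.md` for evidence review.",
  "Follow links into `.assets/` for test-scoped artifacts and into `artifacts/global/` for process logs such as stdout and stderr.",
];

// Prepend the project-guidance step only when a path is provided; numbering follows automatically.
const renderReadingOrder = (projectGuidePath) =>
  renderNumbered(
    projectGuidePath
      ? [
          `Read [project guidance](${projectGuidePath}) first for repo-specific testing conventions and loop expectations.`,
          ...BASE_READING_ORDER,
        ]
      : BASE_READING_ORDER,
  );

console.log(renderReadingOrder("docs/allure-agent-mode.md").split("\n")[1]);
// 2. Read `manifest/run.json` for the current phase, counts, and modeling summary.
```

With this shape, adding or reordering a step touches one array instead of two hand-numbered blocks.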