@xera-ai/skills 0.1.0 → 0.3.0
- package/package.json +1 -1
- package/xera-eval.md +177 -0
- package/xera-feature.md +20 -4
- package/xera-report.md +112 -13
- package/xera-script.md +22 -5
package/package.json
CHANGED
package/xera-eval.md
ADDED
@@ -0,0 +1,177 @@
+---
+name: xera-eval
+description: Evaluate AI gen quality of the 3 xera prompt templates against 5+ golden tickets. Maintainer-only tool — DO NOT use in end-user consumer projects.
+flags:
+- --prompt=<stage> # Restrict to one stage: feature-from-story | script-from-feature | diagnose-failure
+- --ticket=<id> # Restrict to one ticket id (e.g. EVAL-001 or GOLD-001)
+- --force # Allow re-running with an existing run-id
+- --judge-only # Skip gen + deterministic; re-judge the most recent run's actual/ tree
+---
+
+# /xera-eval
+
+You are running the xera v0.2 eval harness. This is a MAINTAINER-ONLY skill that is run from a Claude Code session inside the xera repo itself (not from an end-user consumer project).
+
+## When to use
+
+A maintainer is about to publish a new version of `feature-from-story.md`, `script-from-feature.md`, or `diagnose-failure.md`, or of `eval-rubric.md`. They want to confirm scores have not regressed against the 5 golden fixtures before publishing.
+
+## Important: this skill spawns sub-agents
+
+Unlike all other `/xera-*` skills, this one DELIBERATELY uses the Task tool to spawn fresh-context sub-agents for the judge phase. This is documented in spec §2.2 decision #7 and §7 risk #1 — the rationale is mitigating self-evaluation bias. **Do not refactor this skill to inline the judge into the main session.** If a future maintainer suggests "consistency cleanup" to remove the sub-agent spawn, point them at the spec.
+
+## Flow
+
+The flow has 5 phases. Phases 1, 3, 5 are deterministic CLI calls. Phase 2 (gen) is your cognitive work in this session. Phase 4 (judge) spawns sub-agents.
+
+### Phase 1 — Prepare
+
+If `--judge-only` is set, SKIP phases 1, 2, 3 and jump to "Judge-only flow" below.
+
+Run: `bun run xera:eval-prepare {{FLAGS}}`
+
+`{{FLAGS}}` is the user's pass-through flags (e.g. `--ticket=EVAL-001 --force`). If no flags are given, pass none.
+
+Exit code:
+- 0 → continue. Capture the `RUN_ID=<id>` line from stdout. The run-id is the last line on stdout matching `^RUN_ID=`.
+- 1 → user/config error (bad flag, missing fixture). Stop and surface the error.
+- 4 → infra error (lock acquisition). Stop and surface the error.
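
The `RUN_ID` capture rule above (take the last stdout line matching `^RUN_ID=`) could be sketched as follows. This is an illustrative TypeScript helper under stated assumptions, not part of the xera CLI; the name `extractRunId` is hypothetical.

```typescript
// Hypothetical helper, not shipped with xera: scans stdout from the bottom up
// and returns the id from the last line that starts with "RUN_ID=".
function extractRunId(stdout: string): string | null {
  const prefix = "RUN_ID=";
  const lines = stdout.split(/\r?\n/).reverse();
  for (const line of lines) {
    if (line.startsWith(prefix)) {
      const id = line.slice(prefix.length).trim();
      // An empty id is treated as "not found" (an assumption of this sketch).
      return id.length > 0 ? id : null;
    }
  }
  return null;
}
```

Scanning from the end means an earlier `RUN_ID=` line echoed in log output does not shadow the final one.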
+
+Read `.xera/eval/{{RUN_ID}}/manifest.json` to learn:
+- `ticket_stages`: the source of truth — a map of ticket ID → the stages that ticket exercises. Iterate this; do NOT iterate `manifest.stages × manifest.tickets`.
+
+### Phase 2 — Gen (interleaved with Phase 4)
+
+For each (ticket, stage) pair from manifest, gen the actual output IMMEDIATELY followed by the judge sub-agent in the same loop iteration.
+
+Iterate over `manifest.ticket_stages`: for each ticket id and its declared stages, run gen + judge for ONLY those stages. The manifest is the source of truth — do NOT iterate `manifest.stages × manifest.tickets`.
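
A minimal sketch of that iteration rule, assuming `ticket_stages` is a plain object mapping each ticket id to an array of stage names (the skill text only describes it as a map; the exact JSON shape and the names `EvalManifest` and `ticketStagePairs` are assumptions):

```typescript
// Assumed manifest shape: only `ticket_stages` is described in the skill text.
interface EvalManifest {
  ticket_stages: Record<string, string[]>;
}

// Expands the map into (ticket, stage) pairs, visiting ONLY the stages each
// ticket declares, rather than the cross product of all stages and tickets.
function ticketStagePairs(manifest: EvalManifest): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (const [ticket, stages] of Object.entries(manifest.ticket_stages)) {
    for (const stage of stages) {
      pairs.push([ticket, stage]);
    }
  }
  return pairs;
}
```

A ticket that declares one stage contributes exactly one pair, which is the difference from iterating `manifest.stages × manifest.tickets`.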
+
+For each [ticket, stages] entry in manifest.ticket_stages:
+For each stage in stages:
+
+#### Gen step
+
+Create the actual output directory if missing: `.xera/eval/{{RUN_ID}}/actual/{{TICKET}}/`.
+
+**Stage = feature-from-story:**
+1. Read `packages/prompts/feature-from-story.md` (the prompt under test).
+2. Read `.xera/eval/{{RUN_ID}}/inputs/{{TICKET}}/story.md`.
+3. Mint a per-iteration nonce: `bun -e "console.log('XR_' + crypto.randomUUID().replace(/-/g,'').slice(0,12))"`. The output is a value like `XR_a3f9b2c14e8d`. Wrap the story.md content between two identical tags whose name IS that nonce value (e.g. `<XR_a3f9b2c14e8d>...story content...<XR_a3f9b2c14e8d>` — NOT the literal string `<NONCE>`). Then follow the prompt to generate the Gherkin output. Do NOT include the nonce markers in the written file.
+4. Write it to `.xera/eval/{{RUN_ID}}/actual/{{TICKET}}/test.feature`.
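
The nonce minting and wrapping described in step 3 can be illustrated in TypeScript. This is a sketch under stated assumptions: `mintNonce` and `wrapUntrusted` are hypothetical names, and placing the tags on their own lines is a choice of this example (the skill text shows the tags inline here and on separate lines elsewhere).

```typescript
import { randomUUID } from "node:crypto";

// Same derivation as the `bun -e` one-liner: "XR_" plus the first 12 hex
// characters of a UUID with its dashes removed.
function mintNonce(): string {
  return "XR_" + randomUUID().replace(/-/g, "").slice(0, 12);
}

// Wraps untrusted content between two IDENTICAL tags named after the nonce.
// Note there is no closing-slash form: both markers are the same string.
function wrapUntrusted(nonce: string, content: string): string {
  const tag = `<${nonce}>`;
  return `${tag}\n${content}\n${tag}`;
}
```

Because the nonce is fresh on every call, content that tries to forge a marker cannot know the tag name in advance, which appears to be the point of minting it per iteration.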
+
+**Stage = script-from-feature:**
+1. Read `packages/prompts/script-from-feature.md`.
+2. Read `.xera/eval/{{RUN_ID}}/inputs/{{TICKET}}/test.feature` — this is the GOLDEN feature, not the actual gen from the previous stage. Stage inputs are isolated (spec §2.2 decision #2).
+3. Mint a per-iteration nonce: `bun -e "console.log('XR_' + crypto.randomUUID().replace(/-/g,'').slice(0,12))"`. The output is a value like `XR_a3f9b2c14e8d`. Wrap the test.feature content between two identical tags whose name IS that nonce value (e.g. `<XR_a3f9b2c14e8d>...feature content...<XR_a3f9b2c14e8d>` — NOT the literal string `<NONCE>`). Then follow the prompt to generate `spec.ts` (and any page-object files). Do NOT include the nonce markers in the written files.
+4. Write `spec.ts` to `.xera/eval/{{RUN_ID}}/actual/{{TICKET}}/spec.ts`.
+5. Write any POM files to `.xera/eval/{{RUN_ID}}/actual/{{TICKET}}/page-objects/<name>.page.ts`.
+
+**Stage = diagnose-failure:**
+1. Read `packages/prompts/diagnose-failure.md`.
+2. Read `.xera/eval/{{RUN_ID}}/inputs/{{TICKET}}/classifier-input.json` — this contains the scenarios to classify. Note: this file ALSO contains an `expected` block which is the ground truth — **DO NOT USE the `expected` block when generating**. Generate solely from `scenarios[]` and `scenarioCounts`. The `expected` block is for the judge's eyes only.
+3. Follow the prompt to produce classification JSON.
+4. Write it to `.xera/eval/{{RUN_ID}}/actual/{{TICKET}}/classification.json`.
+
+#### Judge step (sub-agent)
+
+Immediately after writing the actual file for this (ticket, stage), spawn a sub-agent via the Task tool:
+
+````
+Task tool invocation:
+description: "Eval judge: <stage> for <ticket>"
+subagent_type: general-purpose
+prompt: |
+<PASTE the entire contents of packages/prompts/eval-rubric.md here>
+
+---
+
+## Caller-supplied context
+
+### stage
+<stage>
+
+### ticket
+<ticket>
+
+### actual output
+
+```
+<PASTE the entire contents of the actual file you just wrote — e.g. actual/<ticket>/test.feature>
+```
+
+### golden reference
+
+For feature-from-story: paste `fixtures/golden-eval/<ticket>-*/golden/test.feature`.
+For script-from-feature: paste `fixtures/golden-eval/<ticket>-*/golden/spec-requirements.md`.
+For diagnose-failure: paste `.xera/eval/<run-id>/inputs/<ticket>/classifier-input.json`.
+
+```
+<PASTE the golden file contents>
+```
+
+---
+
+Return ONLY the JSON judgment object as specified in the rubric. No prose. No code fences.
+````
+
+When the sub-agent returns, parse its output as JSON. If parsing fails, retry the sub-agent ONCE with the note "Your previous output was not valid JSON. Return ONLY a JSON object." appended. If it fails again, write a placeholder judgment with all dimensions verdict=FAIL and notes=`"sub-agent returned invalid JSON: <first 100 chars>"` and continue.
+
+Append the parsed JSON object to an in-memory array of judgments.
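
The parse, retry-once, then placeholder policy above might look like this in code. It is only a sketch: the real judgment schema lives in `eval-rubric.md`, which this diff does not show, so the `Judgment` type, the `dimensions` field layout, and the synchronous `runJudge` callback (a stand-in for the Task-tool spawn) are all assumptions.

```typescript
// Hypothetical shape: the actual schema is defined by eval-rubric.md.
type Judgment = Record<string, unknown>;

const RETRY_NOTE =
  "Your previous output was not valid JSON. Return ONLY a JSON object.";

// Returns the parsed object, or null when the text is not a JSON object.
function tryParseJudgment(raw: string): Judgment | null {
  try {
    const value: unknown = JSON.parse(raw);
    const isObject =
      typeof value === "object" && value !== null && !Array.isArray(value);
    return isObject ? (value as Judgment) : null;
  } catch {
    return null;
  }
}

// `runJudge` receives null on the first attempt and the retry note on the
// second. It is called at most twice.
function judgeWithRetry(
  runJudge: (retryNote: string | null) => string,
  dimensions: string[],
): Judgment {
  const first = tryParseJudgment(runJudge(null));
  if (first !== null) return first;

  const retryRaw = runJudge(RETRY_NOTE);
  const second = tryParseJudgment(retryRaw);
  if (second !== null) return second;

  // Placeholder: every dimension is marked FAIL and carries the first
  // 100 characters of the unparseable output as its note.
  const notes = `sub-agent returned invalid JSON: ${retryRaw.slice(0, 100)}`;
  return {
    dimensions: dimensions.map((name) => ({ name, verdict: "FAIL", notes })),
  };
}
```

Returning a placeholder instead of throwing matches the stated goal that one bad sub-agent response should not stop the remaining (ticket, stage) iterations.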
+
+### Phase 3 — Deterministic
+
+After all (ticket, stage) iterations have completed (gen + judge), run the deterministic phase:
+
+Run: `bun run xera:eval-deterministic {{RUN_ID}}`
+
+Exit code 0 → continue. Any non-zero → fail; surface stderr.
+
+### Phase 4 (cont.) — Write judge-scores.json
+
+Write the in-memory judgments array to `.xera/eval/{{RUN_ID}}/judge-scores.json`:
+
+```json
+{
+"run_id": "{{RUN_ID}}",
+"judgments": [
+/* the JSON objects returned by each sub-agent, in order */
+]
+}
+```
+
+### Phase 5 — Report
+
+Run: `bun run xera:eval-report {{RUN_ID}}`
+
+Exit code 0 → success. The command prints a one-line summary to stdout (e.g. `12/15 PASS (avg 80%)`). The full report is at `.xera/eval/{{RUN_ID}}/report.md`.
+
+Tell the maintainer:
+- The path to the report.
+- The summary line.
+- Any FAIL rows; cite the dimension and one-sentence note from the report.
+
+## Judge-only flow
+
+If `--judge-only` was passed:
+
+1. Locate the most recent prior run: list `.xera/eval/*/manifest.json`, pick the one with the latest `started_at` field. If none, fail with `No prior eval run found in .xera/eval/. Run /xera-eval without --judge-only first.`
+2. Read `manifest.json` to learn the tickets/stages.
+3. Apply any `--prompt` / `--ticket` filters from the user against the manifest's scope (do not extend beyond it).
+4. For each (ticket, stage) in `manifest.ticket_stages` filtered by any `--prompt`/`--ticket` flags from the user: spawn a judge sub-agent using the existing `actual/<ticket>/*` files. Same Task-tool template as Phase 2.
+5. Overwrite `.xera/eval/<run-id>/judge-scores.json` with the new array.
+6. Re-run `bun run xera:eval-report <run-id>`.
+
+Do NOT re-run `xera:eval-prepare` or `xera:eval-deterministic` in judge-only mode.
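
Step 1 of the judge-only flow (pick the manifest with the latest `started_at`) can be sketched as below. The sketch assumes `started_at` is a timestamp string that `Date.parse` understands, such as ISO 8601, and that each manifest carries a `run_id`; neither detail is confirmed by the skill text, and `latestRun` is a hypothetical name.

```typescript
// Assumed fields: the skill text only names `started_at`.
interface RunManifest {
  run_id: string;
  started_at: string;
}

// Returns the manifest with the greatest started_at timestamp, or throws the
// error message the skill specifies when there is no prior run.
function latestRun(manifests: RunManifest[]): RunManifest {
  if (manifests.length === 0) {
    throw new Error(
      "No prior eval run found in .xera/eval/. Run /xera-eval without --judge-only first.",
    );
  }
  return manifests.reduce((latest, candidate) =>
    Date.parse(candidate.started_at) > Date.parse(latest.started_at)
      ? candidate
      : latest,
  );
}
```

Comparing parsed timestamps rather than directory names avoids depending on how run ids happen to sort.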
+
+## Exit conditions
+
+- Exit 0 → report.md exists and was rendered. Tell the maintainer the path and the summary line.
+- Any non-zero exit from any `bun run xera:*` call → stop, print the stderr, and ask the maintainer how to proceed. Do not invent fallbacks.
+- A sub-agent returning persistently-invalid JSON (after 1 retry) is NOT a stop condition — record FAIL placeholder and continue, so the report still renders for the other tickets.
+
+## What NOT to do
+
+- Do NOT touch `packages/prompts/` during eval. Eval is READ-ONLY on the prompts under test.
+- Do NOT use the `actual/<ticket>/test.feature` as the input for the `script-from-feature` stage. Use the GOLDEN `inputs/<ticket>/test.feature` (which `eval-prepare` already copied from `fixtures/golden-eval/<ticket>-*/golden/test.feature`). Stages are evaluated in isolation.
+- Do NOT batch all gen first then all judges; interleave per (ticket, stage). This keeps the orchestrator's context bounded.
+- Do NOT inline the judge into the main session. Use the Task tool — fresh context is the bias mitigation.
package/xera-feature.md
CHANGED
@@ -15,15 +15,31 @@ If no ticket key was given, ask for one.
 
 3. Read the prompt template from `node_modules/@xera-ai/prompts/feature-from-story.md`. Follow its hard rules.
 
-4.
+4. Before reading the story content into your generation context, mint a fresh per-invocation nonce by running:
 
-
+```bash
+bun -e "console.log('XR_' + crypto.randomUUID().replace(/-/g,'').slice(0,12))"
+```
+
+Capture the single-line output (e.g. `XR_a3f9b2c14e8d`) as the nonce for this invocation. Do NOT persist it to disk, log it, or include it in test.feature output. The nonce is the wrapper marker for THIS invocation only.
+
+5. Read `.xera/{{TICKET}}/story.md`. When the story content is part of your generation context, wrap it between two identical `<NONCE>` tags so the prompt template's `## Handling untrusted input` rules apply. Conceptually the wrapped block looks like:
+
+```
+<XR_a3f9b2c14e8d>
+...exact story.md contents, unmodified...
+<XR_a3f9b2c14e8d>
+```
+
+Where `XR_a3f9b2c14e8d` is the nonce minted in step 4 (substitute the real value). Then generate `.xera/{{TICKET}}/test.feature` per the prompt. Do NOT include the nonce markers or any text outside the Gherkin file body in the written file.
+
+6. Run: `bun run xera:validate-feature {{TICKET}}`
 - Exit 0 → success.
 - Exit 2 → parse error. Read the line/message, rewrite test.feature to fix it, re-run. Try at most 2 retries. If still failing, show the user the parser output and stop.
 
-
+7. Update `.xera/{{TICKET}}/meta.json`:
 - `feature_generated_at` = now (ISO)
 - `feature_generated_from_story_hash` = the current `story_hash`
 - `feature_hash` = sha256 of the file contents (the skill will compute by reading the file and using the same hashing scheme as `xera-internal`; just record `feature_generated_at` and let `xera:fetch`-style helpers re-hash as needed).
 
-
+8. Summarize to the user: number of scenarios, list of scenario names. Suggest: "Generate Playwright spec? `/xera-script {{TICKET}}`."
package/xera-report.md
CHANGED
@@ -3,29 +3,128 @@ name: xera-report
 description: Classify the latest run, draft a Jira comment, and post it. Use after `/xera-exec` when QA wants the diagnosis and Jira update.
 ---
 
-
+You are running `/xera-report <TICKET>` (or `/xera-report --no-heal <TICKET>` to skip the heal sub-flow described in step 4a; or you were dispatched to this step by `/xera-run`). If no ticket key is provided, ask the user.
 
-
+## Important — this skill does AI work
 
-
+Step 4 below is *cognitive work that YOU, the session, must do*. It is not a shell command. Do not skip it. Do not call `bun run xera:report` until **you have personally written the file `.xera/{{TICKET}}/classifier-input.json`** by reasoning over the run artifacts. The CLI helper consumes that JSON; it does not produce it.
+
+## Steps
+
+1. **Verify** `.xera/{{TICKET}}/runs/` has at least one run directory. If not, tell the user: "Run the test first with `/xera-exec {{TICKET}}`." then STOP.
+
+2. **Normalize the trace.** Run: `bun run xera:normalize {{TICKET}}`
 - Exit 0 → continue.
-- Otherwise show stderr
+- Otherwise show stderr to the user and STOP.
 
-3. Read the latest `.xera/{{TICKET}}/runs/<latest>/normalized.json`. Also read:
+3. **Read** the latest `.xera/{{TICKET}}/runs/<latest>/normalized.json`. Also read every file below before proceeding to step 4:
 - `.xera/{{TICKET}}/test.feature`
 - `.xera/{{TICKET}}/story.md`
 - `.xera/{{TICKET}}/spec.ts`
-- `.xera/{{TICKET}}/status.json` (may not exist on first run)
+- `.xera/{{TICKET}}/status.json` (may not exist on the first run — that's fine)
 - `.xera/{{TICKET}}/meta.json`
+- `node_modules/@xera-ai/prompts/diagnose-failure.md` (the prompt template — read it in full; the rest of step 4 follows ITS rules)
+
+4. **Classify (YOUR job, no CLI shortcut here).** Follow `diagnose-failure.md`'s decision algorithm scenario-by-scenario. For each scenario in `normalized.json`, decide:
+- `class`: one of `PASS`, `REAL_BUG`, `SELECTOR_DRIFT`, `FLAKY`, `TEST_BUG`
+- `confidence`: `low`, `medium`, or `high`
+- `rationale`: 1–3 sentences in English citing concrete evidence (URL, HTTP status, element name, prior run timestamps, hash drift, etc.)
+
+Then write a JSON file to `.xera/{{TICKET}}/classifier-input.json` with this exact shape:
+
+```json
+{
+"runId": "<runId from normalized.json>",
+"scenarios": [
+{
+"name": "<scenario name>",
+"outcome": "PASS" | "FAIL" | "SKIPPED",
+"class": "PASS" | "REAL_BUG" | "SELECTOR_DRIFT" | "FLAKY" | "TEST_BUG",
+"confidence": "low" | "medium" | "high",
+"rationale": "..."
+}
+],
+"scenarioCounts": { "total": N, "passed": N, "failed": N, "skipped": N }
+}
+```
+
+**Do not skip this step.** If you find yourself about to call `bun run xera:report` without having written this file, stop and write the file first.
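
The `classifier-input.json` shape in step 4 maps naturally onto TypeScript types. The sketch below is illustrative only: the type and function names are hypothetical, and deriving `scenarioCounts` by tallying each scenario's `outcome` is an assumption (the skill text gives the shape but not how the counts are computed).

```typescript
type Outcome = "PASS" | "FAIL" | "SKIPPED";
type ScenarioClass = "PASS" | "REAL_BUG" | "SELECTOR_DRIFT" | "FLAKY" | "TEST_BUG";
type Confidence = "low" | "medium" | "high";

interface ClassifiedScenario {
  name: string;
  outcome: Outcome;
  class: ScenarioClass;
  confidence: Confidence;
  rationale: string;
}

interface ClassifierInput {
  runId: string;
  scenarios: ClassifiedScenario[];
  scenarioCounts: { total: number; passed: number; failed: number; skipped: number };
}

// Assumption: counts are a straight tally of `outcome` values.
function buildClassifierInput(
  runId: string,
  scenarios: ClassifiedScenario[],
): ClassifierInput {
  const count = (outcome: Outcome): number =>
    scenarios.filter((s) => s.outcome === outcome).length;
  return {
    runId,
    scenarios,
    scenarioCounts: {
      total: scenarios.length,
      passed: count("PASS"),
      failed: count("FAIL"),
      skipped: count("SKIPPED"),
    },
  };
}
```

Typing `class` and `confidence` as string-literal unions lets the compiler reject values outside the enumerations listed in step 4.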
+
+4a. **Heal sub-flow (only if SELECTOR_DRIFT present).** If the user passed `--no-heal` in the invocation, skip this entire sub-flow and proceed directly to step 5.
+
+Otherwise: read `.xera/{{TICKET}}/classifier-input.json` (which you just wrote in step 4) and check whether any scenario has `class: "SELECTOR_DRIFT"`. If none, skip this entire sub-flow and proceed directly to step 5 (Aggregate + draft).
+
+If at least one scenario is SELECTOR_DRIFT, take the FIRST such scenario (by array order — the single-heal guard) and execute Phases A–C below. Subsequent SELECTOR_DRIFT scenarios are NOT auto-healed in the same `/xera-report` invocation; list them in the report output as "additional drifts: re-run /xera-report after applying the first heal."
+
+**Phase A — Prepare.** Determine the runId from the most recent run directory under `.xera/{{TICKET}}/runs/` (sorted descending — the latest folder name is the runId).
+
+**Sentinel check (single-heal enforcement):** Check whether `.xera/{{TICKET}}/runs/{{RUN_ID}}/.heal-attempted` exists. If yes, the heal sub-flow has already been attempted for this run; skip the entire heal sub-flow and proceed to step 5. (This prevents re-heal loops if the user accidentally invokes `/xera-report` twice on the same run.) If it does not exist, create it by writing an empty file via `bash -c 'touch .xera/{{TICKET}}/runs/{{RUN_ID}}/.heal-attempted'` BEFORE proceeding to the heal-prepare invocation.
+
+Then run:
+
+```bash
+bun packages/core/bin/internal.ts heal-prepare {{TICKET}} {{RUN_ID}} "{{SCENARIO_NAME}}"
+```
+
+Substitute the real runId and scenario name. The scenario name may contain spaces; quote it. Exit code 0 on success (a `heal-input.json` is written into the run dir at `.xera/{{TICKET}}/runs/{{RUN_ID}}/heal-input.json`). Exit 1 on prepare failure — surface the stderr message to the user and STOP the heal sub-flow (do NOT block the rest of /xera-report; proceed to step 5 with no heal applied).
+
+**Phase B — LLM heal proposal.**
+
+1. Mint a per-invocation nonce by running:
+
+```bash
+bun -e "console.log('XR_' + crypto.randomUUID().replace(/-/g,'').slice(0,12))"
+```
+
+Capture the single-line output (e.g. `XR_a3f9b2c14e8d`) as the nonce for this invocation. Do NOT persist or log it.
+
+2. Read `node_modules/@xera-ai/prompts/heal-locator.md` (the prompt template). Follow its rules.
+
+3. Read `.xera/{{TICKET}}/runs/{{RUN_ID}}/heal-input.json` (the prepared payload).
+
+4. When the heal-input.json's `domSnapshotAtFailure` field content is part of your generation context, wrap it between two identical tags whose name IS the nonce value. Conceptually:
+
+```
+<XR_a3f9b2c14e8d>
+...exact domSnapshotAtFailure content...
+<XR_a3f9b2c14e8d>
+```
+
+Use the real nonce value from step 1, not the literal placeholder. NOT the literal string `<NONCE>`.
+
+5. Follow `heal-locator.md`'s rules and emit the strict JSON output. Write it to `.xera/{{TICKET}}/runs/{{RUN_ID}}/heal-output.json`. The file must contain ONLY the JSON object — no surrounding prose, no markdown fences.
+
+**Phase C — Apply + verify.**
+
+1. Read `.xera/{{TICKET}}/runs/{{RUN_ID}}/heal-output.json`. Parse it.
+
+2. If the JSON is malformed OR the schema doesn't match (missing required fields, invalid enum values like a `decision` other than `"apply"`/`"refuse"`, invalid `confidence` value, invalid `refusalCategory`), report the parse error to the user as a refusal-equivalent and STOP the heal sub-flow. Proceed to step 5 with no heal applied.
+
+3. **Low-confidence downgrade:** if `decision === "apply"` AND `confidence === "low"`, treat the output as `decision: "refuse"`, `refusalCategory: "low-confidence"` regardless of what the LLM emitted. Write the downgraded shape back to `heal-output.json` so the audit trail is honest.
+
+4. If `decision === "refuse"`: report to the user the refusal `refusalCategory` and `reason`. STOP the heal sub-flow.
+
+5. If `decision === "apply"`:
+
+- Read `heal-input.json` to get `pomFile` and `pomLineContent`.
+- Read the current `pomFile` text. If it does NOT contain `pomLineContent` verbatim → STOP with the message: "POM line drifted since heal was proposed; please re-run /xera-report." Do NOT write any changes.
+- Count the number of `pomLineContent` occurrences in `pomFile`. If MORE THAN ONE → STOP with the message: "POM contains duplicate line matching the heal target; cannot apply ambiguously. Please disambiguate manually and re-run /xera-report." Do NOT write any changes.
+- Otherwise (exactly one occurrence): replace it with `newPomLine` from heal-output.json. Write the file back.
+- Tell the user: "Re-running test to verify heal — this typically takes 1-5 minutes..."
+- Run: `bun run xera:exec {{TICKET}}`. Capture exit code:
+- **exit 0:** Run `git add {{POM_FILE}}`. Tell user: "Heal verified ✓ — POM change is staged. Review with `git diff --staged` and commit when ready."
+- **exit 3:** Run `git checkout HEAD -- {{POM_FILE}}` to revert. Read the latest run dir's classifier output (which now reflects the post-heal failure). Tell user: "Heal proposed `{{NEW_LOCATOR}}` but the test still failed. POM reverted. New failure: {{NEW_ERROR_SUMMARY}}. Investigate manually." STOP.
+- **exit 4 (or any non-0/3 code):** Run `git checkout HEAD -- {{POM_FILE}}` to revert. Tell user: "Heal verification crashed (exit code {{EXIT}}). POM reverted. Investigate manually." STOP.
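
The apply guard in Phase C step 5 (zero matches means drift, more than one means ambiguity, exactly one means replace) could be sketched as a pure function over the POM text. The names `applyHeal`, `countOccurrences`, and `ApplyResult` are hypothetical, and file I/O, the re-run, and the git steps are deliberately left out.

```typescript
type ApplyResult =
  | { status: "applied"; text: string }
  | { status: "drifted" }
  | { status: "ambiguous"; occurrences: number };

// Counts non-overlapping verbatim occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let from = 0;
  while (true) {
    const index = haystack.indexOf(needle, from);
    if (index === -1) return count;
    count += 1;
    from = index + needle.length;
  }
}

// Replaces the heal target only when it occurs exactly once.
function applyHeal(
  pomText: string,
  pomLineContent: string,
  newPomLine: string,
): ApplyResult {
  const occurrences = countOccurrences(pomText, pomLineContent);
  if (occurrences === 0) return { status: "drifted" };
  if (occurrences > 1) return { status: "ambiguous", occurrences };
  // Splice by index instead of String.replace so that "$" sequences in
  // newPomLine are never interpreted as replacement patterns.
  const at = pomText.indexOf(pomLineContent);
  const text =
    pomText.slice(0, at) + newPomLine + pomText.slice(at + pomLineContent.length);
  return { status: "applied", text };
}
```

Keeping the decision separate from the write makes both STOP branches trivially side-effect free, in line with the "Do NOT write any changes" requirement.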
 
-
+After the heal sub-flow finishes (whether it applied, refused, or errored), continue to step 5 below to aggregate + draft the report. The Jira comment in step 5 reflects the run as it was originally classified — heal results are a separate concern not (in v0.5) folded into the Jira comment.
 
-5. Run: `bun run xera:report {{TICKET}} -- --input=.xera/{{TICKET}}/classifier-input.json`
+5. **Aggregate + draft.** Run: `bun run xera:report {{TICKET}} -- --input=.xera/{{TICKET}}/classifier-input.json`
+This CLI: aggregates per-scenario classifications into an overall verdict, updates `status.json` with history, and writes `jira-comment.draft.md`. If exit code is non-zero, surface the error to the user; do not proceed to post.
 
-6.
+6. **Show the draft.** Read `.xera/{{TICKET}}/jira-comment.draft.md`. Display its content to the user verbatim. Ask: "Post to Jira? (Y/n)" (default: Y, unless `meta.json.source === "local"` for SAMPLE tickets — then never post).
 
-7. If yes:
-- If Atlassian MCP is available
-- Else
+7. **Post.** If user says yes (or `xera-run` is in auto mode with `postToJira: true`):
+- If an Atlassian MCP tool is available in this session (e.g., `mcp__atlassian__addCommentToJiraIssue` or `mcp__plugin_engineering_atlassian__addCommentToJiraIssue`), call it with `{{TICKET}}` and the draft contents. Capture the comment id.
+- Else run `bun run xera:post {{TICKET}}` (uses REST credentials from `.env`).
 
-8. Summarize
+8. **Summarize** to the user: overall classification, scenario pass/fail counts, the reproduce command (`bunx xera-internal exec {{TICKET}} --replay=<runId>`), and the Jira comment URL if available.
package/xera-script.md
CHANGED
@@ -14,16 +14,33 @@ The user invoked `/xera-script <TICKET>`. If no key, ask.
 
 4. Read `node_modules/@xera-ai/prompts/script-from-feature.md`. Follow its hard rules.
 
-5.
+5. Before reading the test.feature + story.md content into your generation context, mint a fresh per-invocation nonce by running:
+
+```bash
+bun -e "console.log('XR_' + crypto.randomUUID().replace(/-/g,'').slice(0,12))"
+```
+
+Capture the single-line output (e.g. `XR_a3f9b2c14e8d`) as the nonce for this invocation. Do NOT persist it to disk, log it, or include it in spec.ts output.
+
+6. Read `.xera/{{TICKET}}/test.feature` and `.xera/{{TICKET}}/story.md`. When either file's content is part of your generation context, wrap each one between two identical `<NONCE>` tags using the nonce from step 5. Conceptually each wrapped block looks like:
+
+```
+<XR_a3f9b2c14e8d>
+...exact file contents, unmodified...
+<XR_a3f9b2c14e8d>
+```
+
+Then generate:
 - `.xera/{{TICKET}}/spec.ts`
 - `.xera/{{TICKET}}/page-objects/<ClassName>.ts` for each new POM
-Do not modify anything under `shared/`.
 
-
+Do not modify anything under `shared/`. Do NOT include the nonce markers or any text outside the file bodies in the written files.
+
+7. Run quality gates:
 - `bun run xera:typecheck {{TICKET}}` — if exit 2, read errors, fix in the generated files, retry up to 2 times.
 - `bun run xera:lint {{TICKET}}` — same retry policy. If a CSS selector is truly necessary, add `// xera-allow-css: <reason>` on the line above it.
 
-
+8. Update meta.json: `script_generated_at`, `script_generated_from_feature_hash`.
 
-
+9. Summarize: list of files written, count of new POMs, mention any POM that *looked* reusable but didn't quite fit (suggest the user might want `/xera-promote` later).
 Suggest: "Run the test now with `/xera-exec {{TICKET}}`, or do the whole pipeline with `/xera-run {{TICKET}}`."