@metasession.co/devaudit-cli 0.1.49 → 0.1.53
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/index.js +1 -1
- package/dist/index.js.map +1 -1
- package/package.json +2 -2
- package/sdlc/files/_common/1-plan-requirement.md +48 -5
- package/sdlc/files/_common/2-implement-and-test.md +3 -1
- package/sdlc/files/_common/3-compile-evidence.md +18 -0
- package/sdlc/files/_common/Implementation_Plan_TEMPLATE.md +37 -1
- package/sdlc/files/_common/Test_Architecture.md +1 -1
- package/sdlc/files/_common/Test_Policy.md +12 -0
- package/sdlc/files/_common/Test_Strategy.md +18 -0
- package/sdlc/files/_common/skills/e2e-test-engineer/SKILL.md +52 -0
- package/sdlc/files/_common/skills/e2e-test-engineer/references/e2e-regression-3-tier.yml +178 -0
package/package.json
CHANGED
````diff
@@ -1,6 +1,6 @@
 {
   "name": "@metasession.co/devaudit-cli",
-  "version": "0.1.
+  "version": "0.1.53",
   "description": "DevAudit CLI — installs, syncs, and operates the Metasession SDLC across consumer projects.",
   "type": "module",
   "bin": {
````
````diff
@@ -33,7 +33,7 @@
   },
   "dependencies": {
     "@clack/prompts": "^0.8.2",
-    "@metasession.co/devaudit-plugin-sdk": "^0.1.
+    "@metasession.co/devaudit-plugin-sdk": "^0.1.53",
     "commander": "^12.1.0",
     "consola": "^3.2.3",
     "env-paths": "^3.0.0",
````
package/sdlc/files/_common/1-plan-requirement.md
CHANGED

````diff
@@ -131,6 +131,8 @@ Create `compliance/evidence/REQ-XXX/implementation-plan.md`:
 - Files to create/modify
 - Architecture decisions
 - Risks and dependencies
+- **Surface inventory completeness** (MEDIUM/HIGH risk) — every user-touchable surface listed in Section 2's surface-inventory table is either `In scope`, `Already works`, or explicitly `Out of scope (waived)` with a follow-up issue. No surface is silently absent. _(devaudit#152)_
+- **AC form** — the test-scope ACs (drafted in Step 7) can each be phrased in Given/When/Then against the surfaces in scope. If any AC reduces to _"the schema accepts X"_ or _"the resolver returns Y"_, the plan is incomplete — return to Section 2 and expand the surface inventory until every AC has a UI surface that delivers it. _(devaudit#152)_

 **Do NOT proceed** until the developer explicitly approves the plan. If the developer requests changes, update `implementation-plan.md` and re-present. For HIGH risk, this review is especially important — it's cheaper to change the plan than to refactor the code.

````
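The two new review checks can be sketched as a small completeness pass over the surface inventory. This is a minimal sketch, not DevAudit code: the `SurfaceStatus` shape, the `journey` list (the surfaces a reviewer enumerates for the user journey) and `reviewSurfaceInventory` are hypothetical names chosen for illustration.

```typescript
// Hypothetical status model mirroring the three allowed inventory states.
type SurfaceStatus =
  | { kind: "in-scope" }
  | { kind: "already-works"; evidence: string } // file or route showing it already works
  | { kind: "waived"; justification: string; followUp: string }; // follow-up issue link

// Returns one message per problem; an empty array means the check passes.
export function reviewSurfaceInventory(
  journey: string[],
  inventory: Map<string, SurfaceStatus>,
): string[] {
  const problems: string[] = [];
  for (const surface of journey) {
    const status = inventory.get(surface);
    if (status === undefined) {
      problems.push(`${surface}: silently absent from the inventory`);
    } else if (status.kind === "already-works" && status.evidence.trim() === "") {
      problems.push(`${surface}: "Already works" without a linked file or route`);
    } else if (
      status.kind === "waived" &&
      (status.justification.trim() === "" || status.followUp.trim() === "")
    ) {
      problems.push(`${surface}: waiver lacks a justification or follow-up issue`);
    }
  }
  return problems;
}
```

The journey list is the part a human still has to supply; the function only makes "silently absent" an explicit, reportable result instead of something a reviewer has to notice.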
````diff
@@ -177,9 +179,22 @@ Standard gates apply. No additional testing beyond universal exit criteria.
 - CI independent verification: all PR checks pass
 - Human code review via PR

+### How to write acceptance criteria (devaudit#152)
+
+Phrase each AC as a **user-observable journey**, not a technical-layer assertion. Use the Given/When/Then form:
+
+> **Given** [pre-state + which UI surface the user is on], **When** [named user action with a named control], **Then** [observable change in a named UI surface].
+
+If you can't phrase an AC in Given/When/Then because no UI surface delivers the change to a user, the scope is incomplete — return to the implementation plan's surface inventory (Section 2). LOW risk REQs may keep ACs shorter when the change is genuinely surface-free (refactor / dep bump / infra-only), but the journey form is still preferred when a user surface exists.
+
+Examples:
+
+- ✅ "Given the dependency is updated, When CI runs the universal gates, Then 0 high/critical findings."
+- ❌ "Schema accepts optional `inventoryId` field" — internal mechanic, belongs in `test-plan.md` (this matters even for LOW when the change is user-facing).
+
 ## Acceptance Criteria

-- [x] [Criterion 1 — what "done" looks like]
+- [x] [Criterion 1 — what "done" looks like, phrased Given/When/Then where applicable]
 - [x] [Criterion 2]
 EOF
 ```
````
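The Given/When/Then rule lends itself to a rough lint. The sketch below is hypothetical (neither `lintAc` nor its phrase list is part of the package) and only pattern-matches wording: it cannot tell whether a named surface actually delivers the change, and surface-free LOW-risk REQs would need an explicit exemption.

```typescript
// Illustrative (not exhaustive) phrasings that signal a technical-layer assertion.
const TECHNICAL_LAYER: RegExp[] = [/\bschema accepts\b/i, /\bresolver (returns|maps)\b/i];

// Given ... When ... Then, in that order, anywhere in the text.
const JOURNEY = /\bGiven\b[\s\S]*\bWhen\b[\s\S]*\bThen\b/;

export type AcVerdict = "journey" | "technical-layer" | "not-given-when-then";

export function lintAc(ac: string): AcVerdict {
  // Technical-layer wording wins even if the AC is dressed up as Given/When/Then.
  if (TECHNICAL_LAYER.some((re) => re.test(ac))) return "technical-layer";
  return JOURNEY.test(ac) ? "journey" : "not-given-when-then";
}
```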
````diff
@@ -218,10 +233,24 @@ How we confirm this meets the business requirement:
 - [e.g., "Verify public page displays new content correctly"]
 - [e.g., "Confirm edits are visible to users within expected time"]

+### How to write acceptance criteria (devaudit#152)
+
+Phrase each AC as a **user-observable journey**, not a technical-layer assertion. Use the Given/When/Then form:
+
+> **Given** [pre-state + which UI surface the user is on], **When** [named user action with a named control], **Then** [observable change in a named UI surface] _(plus any audit / downstream UI changes)_.
+
+If you can't phrase an AC in Given/When/Then because no UI surface delivers the change to a user, the scope is incomplete — return to the implementation plan's surface inventory (Section 2).
+
+Examples:
+
+- ✅ "Given Poundo has Ogbono linked, When a staff member opens `/dashboard/orders/express/create-order`, picks Ogbono from the Soup group, and marks the order Complete, Then `/dashboard/inventory/{ogbono}` shows stock decreased by 1 and a new Sale movement row tied to the order ID."
+- ❌ "Schema accepts optional `inventoryId` field (persistence round-trip)" — unit-test contract, belongs in `test-plan.md`, not here.
+- ❌ "Resolver maps selected pairs to inventory link" — internal mechanic, not user value.
+
 ## Acceptance Criteria

-- [ ] [Criterion 1]
-- [ ] [Criterion 2]
+- [ ] [Criterion 1 — Given/When/Then]
+- [ ] [Criterion 2 — Given/When/Then]
 - [ ] All additional testing items above pass
 EOF
 ```
````
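For MEDIUM risk the template adds an optional tail for audit rows and downstream UI changes. One way to keep ACs in this shape is to draft them as structured data and render the sentence, so that omitting the starting or observing surface becomes a type error. `JourneyAc` and `renderAc` are illustrative assumptions, not framework API.

```typescript
// Every field is required except the optional tail, so an AC that omits
// its starting surface or its observing surface fails to type-check.
interface JourneyAc {
  given: string; // pre-state
  startSurface: string; // UI surface the user is on
  action: string; // named user action, including the named control
  thenSurface: string; // named UI surface where the change is observed
  observable: string; // the observable change itself
  alsoObservable?: string[]; // audit rows, downstream UI updates, etc.
}

export function renderAc(ac: JourneyAc): string {
  const tail = (ac.alsoObservable ?? []).map((extra) => `, and ${extra}`).join("");
  return `Given ${ac.given} on ${ac.startSurface}, When ${ac.action}, Then ${ac.thenSurface} shows ${ac.observable}${tail}.`;
}
```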
````diff
@@ -274,10 +303,24 @@ How we confirm this meets the business requirement:
 - Elevated review required for: [security-sensitive files]
 - Regeneration protocol: [will any components be regenerated?]

+### How to write acceptance criteria (devaudit#152)
+
+Phrase each AC as a **user-observable journey**, not a technical-layer assertion. Use the Given/When/Then form:
+
+> **Given** [pre-state + which UI surface the user is on], **When** [named user action with a named control], **Then** [observable change in a named UI surface] _(plus any audit / downstream UI changes)_.
+
+HIGH risk especially: every AC must pin to a named UI surface from the implementation plan's surface inventory (Section 2). If you can't phrase an AC in Given/When/Then because no UI surface delivers the change to a user, the scope is incomplete — expand the surface inventory before approving the plan. This is the gap that produced REQ-030 on a consumer project (feature shipped through every gate green, but no order-creation surface let a user select a customisation at order time).
+
+Examples:
+
+- ✅ "Given an admin has linked Poundo to Ogbono in `/dashboard/inventory/links`, When a staff member opens `/dashboard/orders/express/create-order`, picks Ogbono from the Soup group, and marks the order Complete, Then `/dashboard/inventory/{ogbono}` shows stock decreased by 1, a new Sale movement row appears tied to the order ID, and the activity timeline records the link-driven deduction."
+- ❌ "Schema accepts optional `inventoryId` field (persistence round-trip)" — unit-test contract, belongs in `test-plan.md`, not here.
+- ❌ "Resolver maps selected pairs to inventory link" — internal mechanic, not user value.
+
 ## Acceptance Criteria

-- [ ] [Criterion 1]
-- [ ] [Criterion 2]
+- [ ] [Criterion 1 — Given/When/Then against a named UI surface]
+- [ ] [Criterion 2 — Given/When/Then against a named UI surface]
 - [ ] All security testing items pass
 - [ ] All validation items confirmed
 - [ ] Independent review completed (if required)
````
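The HIGH-risk rule (every AC pins to a named surface from the inventory) can be approximated mechanically by extracting backtick-quoted routes from each AC and checking them against the inventory's routes. This sketch assumes routes are written as backtick-quoted paths starting with `/`, as in the examples above, and uses a simple prefix match; `unpinnedAcs` is a hypothetical helper.

```typescript
// Backtick-quoted paths such as `/dashboard/inventory/{ogbono}`.
function routesIn(ac: string): string[] {
  return Array.from(ac.matchAll(/`(\/[^`]*)`/g), (m) => m[1]);
}

// Returns the ids of ACs that mention no inventory route.
// Prefix matching is deliberately loose: "/dashboard/inventory" covers
// "/dashboard/inventory/{ogbono}".
export function unpinnedAcs(acs: Record<string, string>, inventoryRoutes: string[]): string[] {
  return Object.entries(acs)
    .filter(([, text]) => !routesIn(text).some((route) => inventoryRoutes.some((inv) => route.startsWith(inv))))
    .map(([id]) => id);
}
```

An unpinned AC is exactly the REQ-030 failure mode described above: the AC may be true, but no listed surface lets a user reach it.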
package/sdlc/files/_common/2-implement-and-test.md
CHANGED

````diff
@@ -126,7 +126,9 @@ Write or update E2E tests **after** implementation. E2E tests need working UI/AP

 > **Skill available:** invoke the **`e2e-test-engineer`** skill for this step (at `.claude/skills/e2e-test-engineer/SKILL.md`). It derives scenarios from the requirement's acceptance criteria, reconciles with the existing test pack (flags obsoletes — but never deletes without confirmation), runs the suite, and files defects for failures or missed ACs. Framework-agnostic (Playwright, Cypress, pytest-playwright, etc.) and tracker-agnostic (GitHub, Linear, Jira, etc.). For projects with no e2e suite yet, the skill also covers bootstrapping one. See [`sdlc/SKILLS.md`](../sdlc/SKILLS.md) for the full list of available skills.

-> **Run authenticated flows in CI.** Tests that need a logged-in session (admin forms, role-gated flows) belong in their own Playwright project that depends on `auth-setup`. Register that project name in `sdlc-config.json` `e2e_projects` and set `e2e_seed_command` / `e2e_env` so CI seeds fixtures and runs it as a **report-only** gate (continue-on-error — it surfaces failures as evidence without blocking the merge until proven stable). Prove each AC with an `evidenceShot(page, 'REQ-XXX', '
+> **Run authenticated flows in CI.** Tests that need a logged-in session (admin forms, role-gated flows) belong in their own Playwright project that depends on `auth-setup`. Register that project name in `sdlc-config.json` `e2e_projects` and set `e2e_seed_command` / `e2e_env` so CI seeds fixtures and runs it as a **report-only** gate (continue-on-error — it surfaces failures as evidence without blocking the merge until proven stable). Prove each UI-driven AC with an `evidenceShot(page, 'REQ-XXX', acN, 'slug')` so the PNG lands in `compliance/evidence/REQ-XXX/screenshots/`. This is what lets Stage 3 Step 10 reduce manual UAT to a light smoke instead of a full re-click.
+
+> **Transport-layer specs have no page** (devaudit#127). Specs that exercise the system at the transport boundary — Node `fetch` against webhooks, `MongoClient` queries, `socket.io-client` assertions — cannot call `evidenceShot`. Their evidence form is the per-spec row in `test-execution-summary.md` describing the asserted behaviour in operator terms. The portal's release-detail "screenshots" panel will show zero entries for purely-transport REQs; that's correct. Reviewers cross-reference `test-execution-summary.md` instead. See `e2e-test-engineer/SKILL.md` § *Specs with no page object*.

 **4a. Review the test plan for E2E items:**
 ```bash
````
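The revised note says each UI-driven AC's screenshot should land in `compliance/evidence/REQ-XXX/screenshots/`. The real `evidenceShot` helper's file naming is not shown in this diff, so the `AC<n>-<slug>.png` scheme below is an assumption, and `evidenceShotSketch` is a hypothetical stand-in. The sketch types `page` structurally (only the `screenshot({ path, fullPage })` method that Playwright's `Page` provides) so it compiles without Playwright installed.

```typescript
import { mkdir } from "node:fs/promises";
import { dirname, posix } from "node:path";

// Structural stand-in for Playwright's Page: only the method this sketch uses.
interface ScreenshotCapable {
  screenshot(options: { path: string; fullPage?: boolean }): Promise<unknown>;
}

// Assumed naming scheme: AC<n>-<slug>.png inside the REQ's screenshots folder.
export function evidenceShotPath(reqId: string, ac: number, slug: string): string {
  const safeSlug = slug.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return posix.join("compliance", "evidence", reqId, "screenshots", `AC${ac}-${safeSlug}.png`);
}

// Hypothetical stand-in for evidenceShot(page, reqId, acN, slug).
export async function evidenceShotSketch(
  page: ScreenshotCapable,
  reqId: string,
  ac: number,
  slug: string,
): Promise<string> {
  const path = evidenceShotPath(reqId, ac, slug);
  await mkdir(dirname(path), { recursive: true }); // create screenshots/ on first use
  await page.screenshot({ path, fullPage: true });
  return path;
}
```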
package/sdlc/files/_common/3-compile-evidence.md
CHANGED

````diff
@@ -135,6 +135,24 @@ cat > compliance/evidence/REQ-XXX/test-execution-summary.md << 'EOF'
 **Git SHA:** [short SHA]
 **CI Run:** [run ID or "local"]

+## Test design (devaudit#50)
+
+Records the design-time decisions before listing run results — what was tested, what was deliberately deferred, who/what decided. Auditors (and future maintainers) can see the scope decision was *made*, not implicit.
+
+**Layers planned:** [unit | integration | e2e | visual | manual — pick the ones that apply to this REQ]
+
+**Layers covered:** [same list, marked ✓ for shipped layers / `deferred` for skipped ones]
+
+**Deferrals (if any):**
+
+- [e.g. "e2e N/A — schema-only change, no UI surface reads the new fields yet; deferred to REQ-NNN when the admin form lands"]
+- [e.g. "visual regression N/A — backend service change, no UI affected"]
+- A deferral without a stated rationale is a gap, not a deferral. Either name *why* it was skipped or do the work.
+
+**Skill invocation:** [`e2e-test-engineer` invoked on turn N during Phase 2 — verifiable from the chat transcript] / [`manual scope decision` — operator chose layers directly because <reason>]
+
+**Surface inventory (MEDIUM/HIGH risk REQs):** see `implementation-plan.md` Section 2. Each `In scope` surface here should map to at least one passing test below; each `Already works` surface should map to a regression-pack spec; each `Out of scope (waived)` surface should have a follow-up issue referenced.
+
 ## Gate Results

 | Gate | Result | Details |
````
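The `## Test design` block is easy to check before the summary is finalised: every planned layer is either covered or deferred with a non-empty rationale. The types and function names below are hypothetical; only the layer names and the ✓ / `deferred` notation come from the template.

```typescript
type Layer = "unit" | "integration" | "e2e" | "visual" | "manual";

interface TestDesign {
  planned: Layer[];
  covered: Layer[]; // shipped layers, rendered as ✓
  deferrals: Partial<Record<Layer, string>>; // layer -> one-line rationale
}

// A planned layer that is neither covered nor deferred with a rationale is a gap.
export function designGaps(d: TestDesign): string[] {
  return d.planned
    .filter((layer) => !d.covered.includes(layer) && !(d.deferrals[layer] ?? "").trim())
    .map((layer) => `${layer}: neither covered nor deferred with a rationale`);
}

// Renders the "Layers covered" line in the template's ✓ / `deferred` notation.
export function renderLayersCovered(d: TestDesign): string {
  return d.planned
    .map((layer) => `${layer} ${d.covered.includes(layer) ? "✓" : "`deferred`"}`)
    .join(" | ");
}
```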
package/sdlc/files/_common/Implementation_Plan_TEMPLATE.md
CHANGED

````diff
@@ -35,11 +35,29 @@ Each section below maps to one (or more) of these clauses. Don't delete sections
 > _Closes ISO 29119 §3.4 — test plan_

 - **Goal:** REPLACE — one sentence describing what this REQ delivers, no jargon.
+
+### How to write acceptance criteria (devaudit#152)
+
+Phrase each AC as a **user-observable journey**, not a technical-layer assertion. Use the Given/When/Then form:
+
+> **Given** [the relevant pre-state, including which UI surface the user is on],
+> **When** [the user takes a specific, named action with a specific, named control],
+> **Then** [the user can observe a specific, named change in a specific, named UI surface]
+> _(And any additional observable changes — audit rows, downstream UI updates, etc.)_
+
+Concrete examples:
+
+- ✅ "Given Poundo has Ogbono linked, When a staff member opens `/dashboard/orders/express/create-order` and picks Ogbono from the Soup group and marks the order Complete, Then `/dashboard/inventory/{ogbono}` shows stock decreased by 1 and one new Sale movement row tied to the order ID."
+- ❌ "Schema accepts optional `inventoryId` field (persistence round-trip)" — this is a unit-test contract, not a user-observable AC. It belongs in `test-plan.md`, not here.
+- ❌ "Resolver maps selected pairs to inventory link" — same problem. Internal mechanics, not user value.
+
+If you can't phrase an AC in Given/When/Then because no UI surface delivers the change to a user, the scope is incomplete — expand the **Surface inventory** (Section 2) until every AC has a UI surface that delivers it.
+
 - **Acceptance criteria:**

 | AC | Description | SRS item it traces to |
 | --- | ------------------------------------------ | --------------------------------------------------------------------------------------- |
-| AC1 | REPLACE — one-line
+| AC1 | REPLACE — one-line Given/When/Then journey | REQ-AREA-NNN (existing) / REQ-AREA-NNN (new — propose stub) / `@srs-deferred: <reason>` |
 | AC2 | REPLACE | REPLACE |
 | … | | |

````
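The new `SRS item it traces to` column accepts three forms. A hypothetical parser for that cell might look like the following; the `REQ-<AREA>-<digits>` id shape is an assumption inferred from the `REQ-AREA-NNN` placeholder, and a `null` result marks a row that traces nowhere.

```typescript
type SrsTrace =
  | { kind: "existing"; id: string }
  | { kind: "new"; id: string }
  | { kind: "deferred"; reason: string };

// Assumed id shape, e.g. REQ-ORDERS-012 (a hypothetical example id).
const ID_FORM = /^(REQ-[A-Z]+-\d+)\s*\((existing|new)\b[^)]*\)$/;
// The deferral may or may not be wrapped in backticks.
const DEFERRED_FORM = /^`?@srs-deferred:\s*(.+?)`?$/;

export function parseSrsTrace(cell: string): SrsTrace | null {
  const text = cell.trim();
  const deferred = DEFERRED_FORM.exec(text);
  if (deferred) return { kind: "deferred", reason: deferred[1].trim() };
  const id = ID_FORM.exec(text);
  if (id) return id[2] === "existing" ? { kind: "existing", id: id[1] } : { kind: "new", id: id[1] };
  return null; // unparseable: the AC does not trace to any SRS item
}
```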
````diff
@@ -50,6 +68,24 @@ Each section below maps to one (or more) of these clauses. Don't delete sections
 - **In scope:** REPLACE — list every file / module / surface the change touches.
 - **Out of scope:** REPLACE — adjacent areas the change deliberately leaves alone.

+### Surface inventory (MEDIUM/HIGH risk — required) (devaudit#152)
+
+List every UI, API, background job, and report **that a real user touches** in the journey this REQ enables. For each surface, mark one of:
+
+- **In scope** — this REQ adds or modifies it
+- **Already works** — existing code already handles it correctly (link the file / route as evidence)
+- **Out of scope (waived)** — explicitly deferred, with one-sentence justification and a follow-up issue link
+
+| Surface | URL / file | Status |
+| ----------------------- | ----------------------------------------------------- | ---------------------------------------------------------------------------- |
+| [e.g. Customer cart] | `/menu` modal — `components/features/menu/…` | In scope |
+| [e.g. Staff POS] | `/dashboard/orders/express/…` | Out of scope (waived) — front-of-house flow not used yet, follow-up #NN |
+| [e.g. Admin Edit Order] | `/dashboard/orders/[id]/edit` — `app/admin/orders/…` | Already works — existing customisation picker handles the new fields as-is |
+
+**Rule of thumb:** if the AC list reads _"the schema accepts X"_ or _"the resolver returns Y"_ but never _"the user can do Z in the UI and see the result in the UI"_, the surface inventory is incomplete and the plan is not ready for approval. The matching test-scope ACs in Step 7 must each pin to a surface listed here.
+
+LOW risk REQs may skip the table when the change is genuinely surface-free (refactor, dependency bump, infra-only). State `Surface inventory: N/A — <reason>` instead.
+
 ## 3. Architecture decisions

 > _Populated by the [`adr-author` skill](../skills/adr-author/SKILL.md) at Stage 1 plan APPROVAL._
````
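A surface inventory written in the table format above can be parsed and checked for waivers that lack a follow-up issue. This sketch assumes the template's three-column layout, one row per line, no `|` inside cells, and that a follow-up is cited as `#<number>` or a URL; `parseSurfaceInventory` and `waivedWithoutFollowUp` are illustrative names.

```typescript
const STATUSES = ["In scope", "Already works", "Out of scope (waived)"] as const;
type Status = (typeof STATUSES)[number];

interface InventoryRow {
  surface: string;
  location: string;
  status: Status;
  note: string; // text after the status, e.g. the waiver justification
}

export function parseSurfaceInventory(markdown: string): InventoryRow[] {
  const rows: InventoryRow[] = [];
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    if (!line.startsWith("|")) continue;
    const cells = line.split("|").slice(1, -1).map((c) => c.trim());
    // Skip the header row, the separator row and anything that isn't three columns.
    if (cells.length !== 3 || cells[0] === "Surface" || /^:?-+:?$/.test(cells[0])) continue;
    const [surface, location, statusCell] = cells;
    const status = STATUSES.find((s) => statusCell.startsWith(s));
    if (status === undefined) throw new Error(`Unknown status for "${surface}": ${statusCell}`);
    const note = statusCell.slice(status.length).replace(/^\s*—\s*/, "");
    rows.push({ surface, location, status, note });
  }
  return rows;
}

// Waived surfaces whose note cites neither an issue number nor a link.
export function waivedWithoutFollowUp(rows: InventoryRow[]): string[] {
  return rows
    .filter((r) => r.status === "Out of scope (waived)" && !/#\d+|https?:\/\//.test(r.note))
    .map((r) => r.surface);
}
```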
package/sdlc/files/_common/Test_Architecture.md
CHANGED

````diff
@@ -39,7 +39,7 @@ These standards apply to all Metasession products, client engagements, and inter
 ### Speed over Exhaustiveness
 - Fast feedback prioritized (unit tests < 30 seconds)
 - Parallelization and sharding for E2E suites
-- Strategic test selection based on code changes
+- Strategic test selection based on code changes — first concrete implementation is the three-tier E2E gating model (smoke / critical / regression), see Test_Strategy.md § *E2E gating model* (v0.1.53+)
 - Regression suites optimized for execution time

 ### Traceability
````
package/sdlc/files/_common/Test_Policy.md
CHANGED

````diff
@@ -150,6 +150,18 @@ Testing effort is prioritized by risk level, determined at planning time:

 AI involvement in Medium or High categories raises risk by one level. The Test Strategy defines specific testing depth requirements per level.

+### E2E gate enforcement (v0.1.53+)
+
+The MoSCoW prioritisation of acceptance criteria maps onto three E2E gates, each enforced at a different point in the workflow:
+
+- **Must-tier ACs in the smoke subset** must pass on every push to the integration branch. Blocking — a red smoke gate stops the integration hop.
+- **Must-tier ACs in the critical subset** must pass on every PR to the release branch. Blocking — a red critical gate stops the release.
+- **Should/Could-tier ACs (full regression)** must pass on the next post-merge run to the release branch OR a hotfix issue is auto-filed. Not pre-merge blocking — the safety net is post-hoc triage by the operator within working hours.
+
+Operator override on a hotfix issue (accept-with-rationale) is logged on the issue itself + carried in the next release's `test-execution-summary.md` design record (devaudit#50). The framework does not permit silently shipping a failing test — every red regression spec ends as either fixed, reverted, or accepted-with-recorded-rationale.
+
+See Test_Strategy.md § *System Testing (E2E)* — *E2E gating model* for the tier definitions + cost philosophy.
+
 ---

 ## Roles & Responsibilities
````
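The enforcement points reduce to a mapping from CI trigger to tier. The sketch below encodes that mapping, assuming the integration and release branch names are passed in (the reference workflow elsewhere in this diff uses `develop` and `main`). `gateFor` is a hypothetical helper; the policy text only specifies auto-filing a hotfix issue for the post-merge run, so nightly and dispatch runs are modelled here as report-only.

```typescript
type Tier = "smoke" | "critical" | "regression";

type Trigger =
  | { event: "push"; branch: string }
  | { event: "pull_request"; baseBranch: string }
  | { event: "schedule" }
  | { event: "workflow_dispatch" };

interface Gate {
  tier: Tier;
  blocking: boolean; // a red result stops the integration hop or the release
  autoFilesHotfixIssue: boolean; // specified only for the post-merge run
}

export function gateFor(t: Trigger, integrationBranch: string, releaseBranch: string): Gate | null {
  switch (t.event) {
    case "push":
      if (t.branch === integrationBranch) return { tier: "smoke", blocking: true, autoFilesHotfixIssue: false };
      if (t.branch === releaseBranch) return { tier: "regression", blocking: false, autoFilesHotfixIssue: true };
      return null;
    case "pull_request":
      return t.baseBranch === releaseBranch ? { tier: "critical", blocking: true, autoFilesHotfixIssue: false } : null;
    case "schedule":
    case "workflow_dispatch":
      // Full regression, report-only in this sketch.
      return { tier: "regression", blocking: false, autoFilesHotfixIssue: false };
  }
}
```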
package/sdlc/files/_common/Test_Strategy.md
CHANGED

````diff
@@ -38,6 +38,24 @@ Validates interactions between system components — API contracts, service inte

 End-to-end validation of complete user workflows from UI to database. Primary responsibility of the QA team. Automated using BDD frameworks that map acceptance criteria to executable specifications. Covers 100% of critical user paths.

+#### E2E gating model — three tiers (devaudit#152 follow-up, v0.1.53)
+
+Full E2E regression on every PR is expensive — a 30+ minute wait per release-PR blocks velocity for diminishing marginal safety once smoke covers the headline flows. The framework's gating model maps the existing MoSCoW prioritisation onto three tiers, each gated at a different point in the workflow:
+
+| Tier | Location | When it runs | Wall-clock target | Audit role |
+|---|---|---|---|---|
+| **smoke** | `e2e/smoke/*.spec.ts` | every push to `$INTEGRATION_BRANCH` (via `ci.yml`) | ~3–5 min | fast feedback on every change |
+| **critical** | `e2e/smoke/` + `e2e/critical/*.spec.ts` | PR-to-`$RELEASE_BRANCH` (via `e2e-regression.yml`) | ~10–15 min | release-readiness Must gate |
+| **regression** | all `e2e/**/*.spec.ts` | nightly + push-to-`$RELEASE_BRANCH` + `workflow_dispatch` | ~35 min (or your project's full pack) | full audit trail + drift catch |
+
+The mapping to MoSCoW: **Must-priority SRS items live in `e2e/smoke/` (fast feedback) and `e2e/critical/` (release gate); Should/Could items live in `e2e/` and are covered by the regression tier.** The classifier is the developer authoring the spec — see `skills/e2e-test-engineer/SKILL.md` Phase 3 for the decision tree.
+
+**Cost philosophy.** Smoke protects every push from breaking the headline flow. Critical protects every release from a Must-tier regression. Full regression protects the audit trail + catches drift overnight. We accept that a Should/Could-tier regression *can* slip past the PR gate; we catch it on the next post-merge run + auto-file a hotfix issue. The framework prefers this over a 35-min wait on every release because operator velocity matters and the safety net stays intact.
+
+**Post-merge safety net.** Every push to `$RELEASE_BRANCH` re-runs the full regression. On failure, `e2e-regression.yml` auto-files a `bug, priority:high` issue tagging the merge commit + the failing specs. The operator triages within working hours — hotfix forward, revert the commit, or accept-with-rationale if the failure is environmental. No automated revert (false positives + flakes + UAT-data drift are real classes; an operator triages each individually).
+
+**Reference workflow.** A copy-pasteable `e2e-regression.yml` shape lives at `skills/e2e-test-engineer/references/e2e-regression-3-tier.yml`. Adoption is opt-in per consumer (the framework doesn't currently sync this workflow; consumers own their own `e2e-regression.yml`).
+
 ### Acceptance Testing

 Validates that requirements and acceptance criteria are met from a business perspective. Conducted in staging environments mirroring production. Requires sign-off from Product Managers. May include formal UAT with stakeholders for regulated features.
````
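The tier table's file locations translate into a selection rule. In Playwright this would typically be expressed as separate `projects` entries with their own `testMatch` patterns; the dependency-free sketch below shows the same selection logic as a plain function so it can be checked in isolation. `specsForTier` is hypothetical, and it treats `*.spec.ts` under `e2e/smoke/` and `e2e/critical/` as non-recursive, matching the single `*` in the table.

```typescript
type Tier = "smoke" | "critical" | "regression";

const isSpec = (f: string): boolean => f.startsWith("e2e/") && f.endsWith(".spec.ts");

// True when f sits directly in e2e/<dir>/ (no deeper subfolder).
function directlyIn(f: string, dir: string): boolean {
  const prefix = `e2e/${dir}/`;
  return f.startsWith(prefix) && !f.slice(prefix.length).includes("/");
}

export function specsForTier(files: string[], tier: Tier): string[] {
  return files.filter((f) => {
    if (!isSpec(f)) return false;
    if (tier === "regression") return true; // all e2e/**/*.spec.ts
    if (directlyIn(f, "smoke")) return true; // smoke and critical both include e2e/smoke/
    return tier === "critical" && directlyIn(f, "critical");
  });
}
```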
package/sdlc/files/_common/skills/e2e-test-engineer/SKILL.md
CHANGED

````diff
@@ -19,6 +19,8 @@ Maintain or bootstrap a project's e2e and visual regression test suite. Given an
 - Unit, component, or API-only tests.
 - Performance, load, or accessibility audits, unless the project's e2e pack already includes them — in which case follow its lead.

+**Transport-layer specs that live in `e2e/`** (Node `fetch` against webhooks, `MongoClient` queries, `socket.io-client` assertions) ARE in scope — they exercise the deployed system end-to-end, just at the transport boundary rather than the UI. Their evidence form is `test-execution-summary.md`, not `evidenceShot` (see *Specs with no page object* below). The "API-only tests" exclusion above means **unit-level** API contract tests that exercise a route handler in isolation, not transport-boundary integration tests against the running system.
+
 ## The workflow

 Six phases. Don't skip them and don't reorder — each one feeds the next. Communicate progress as you go; long silent phases feel like the skill has stalled.
````
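For a concrete picture of a transport-layer spec that still belongs in `e2e/`, here is a minimal sketch of a webhook signature check. It is not the wawagardenbar-app spec: the `x-signature` header name, the hex-encoded HMAC-SHA512 scheme, the `charge.success` payload and the `url` parameter are all assumptions. The spec body uses only Node's global `fetch` (Node 18+) and asserts on the HTTP status, so there is no `page` to screenshot.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical header name; each webhook provider defines its own.
export const SIGNATURE_HEADER = "x-signature";

export function signBody(rawBody: string, secret: string): string {
  return createHmac("sha512", secret).update(rawBody).digest("hex");
}

// What the deployed endpoint is expected to do: a constant-time comparison of
// the received hex digest against one recomputed from the raw body.
export function signatureMatches(rawBody: string, secret: string, received: string): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  const actual = Buffer.from(received, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

// "[REQ-XXX][AC1] webhook rejects a bad signature": the spec body itself.
// A digest-length string of zeros exercises the comparison path, not just the length check.
export async function webhookRejectsBadSignature(url: string): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json", [SIGNATURE_HEADER]: "00".repeat(64) },
    body: JSON.stringify({ event: "charge.success" }),
  });
  if (res.status !== 401) throw new Error(`expected 401 for a bad signature, got ${res.status}`);
}
```

In a real pack `webhookRejectsBadSignature` would be called from the test runner against the deployed URL; the two pure helpers are shown so the signing scheme being asserted is explicit.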
````diff
@@ -129,6 +131,26 @@ Resist padding. A new endpoint doesn't need a test that re-verifies login if log

 For each scenario, write a one-line description. Present the full grouped list to the user before writing any code: *"Here's the coverage I'd propose — anything to add or drop?"*

+#### Classify each spec into a tier (devaudit#152 follow-up, v0.1.53)
+
+When designing each scenario, also pick the tier it'll live in. Three tiers map to MoSCoW priority + gating point (see `Test_Strategy.md` § *E2E gating model*):
+
+| Tier | File location | Picks this when… |
+|---|---|---|
+| **smoke** | `e2e/smoke/*.spec.ts` | Cross-cutting sanity that proves the app is up: login, basic nav, one canonical CRUD per main domain. Runs on every push to the integration branch. Keep small — total smoke wall-clock target is ~3–5 min. |
+| **critical** | `e2e/critical/*.spec.ts` | Must-priority SRS item that breaks a headline flow if it regresses. Examples: payment authorisation, order completion, admin permission editing, RBAC enforcement on financial surfaces. Runs on PR-to-release-branch. Total critical wall-clock target ~10–15 min (includes smoke). |
+| **regression** | `e2e/<area>/*.spec.ts` | Should/Could-priority SRS item, edge cases, less-load-bearing flows. Runs nightly + post-merge + dispatch. Total full pack can be 30+ min; that's the point of the tier. |
+
+Decision tree, applied per scenario:
+
+1. **Does the spec prove a Must-priority SRS AC (or a baseline "app is up" sanity check)?** → smoke or critical.
+2. **Within Must: would a regression here break a headline business flow visible to a paying customer or stop a release from shipping?** → critical. Otherwise → smoke.
+3. **Should/Could priority, edge case, advanced flow?** → regression (file under `e2e/<area>/`, not under `e2e/smoke/` or `e2e/critical/`).
+
+When you can't decide between critical and regression, default to **regression** — promoting a spec from regression → critical later is cheap (move the file); demoting in the other direction is rarely needed but equally cheap. The cost of putting a Should spec in critical is everyone waiting longer on every PR-to-main for a low-value signal.
+
+Record the tier choice in the eventual `test-execution-summary.md` § *Test design* (devaudit#50) — Layers covered should name which tier each new spec landed in. Reviewers verify the tier choice is defensible during the WAIT CHECKPOINT.
+
 ### Phase 4 — Reconcile with existing tests

 For the area touched by the change, look at what's already there.
````
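The three-step decision tree plus the "default to regression when unsure" rule can be written as a small function. The `Scenario` fields are hypothetical inputs the spec author would fill in; the function only encodes the ordering of the rules in the skill text.

```typescript
type Priority = "Must" | "Should" | "Could";
type Tier = "smoke" | "critical" | "regression";

interface Scenario {
  priority?: Priority;
  appIsUpSanity?: boolean; // login, basic nav, one canonical CRUD per main domain
  breaksHeadlineFlow?: boolean; // paying-customer-visible or release-stopping if it regresses
  unsure?: boolean; // author can't decide between critical and regression
}

export function classifyTier(s: Scenario): Tier {
  // Step 1 (and 3): only Must-priority ACs and baseline sanity checks qualify for smoke/critical.
  if (s.priority !== "Must" && !s.appIsUpSanity) return "regression";
  // Step 2: a headline-flow regression means critical, unless the author is
  // unsure, in which case the stated default is regression.
  if (s.breaksHeadlineFlow) return s.unsure ? "regression" : "critical";
  return "smoke";
}

// Where the spec file goes for each tier.
export function tierDirectory(tier: Tier, area: string): string {
  return tier === "regression" ? `e2e/${area}/` : `e2e/${tier}/`;
}
```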
````diff
@@ -303,6 +325,15 @@ Wrap up with a summary the user can drop into the PR or ticket:
 - Defects filed — count, with links.
 - Missed requirements — count, with links.

+**Then feed the test-design record (devaudit#50).** The Stage 3 `test-execution-summary.md` (generated per `3-compile-evidence.md` Step 1a) carries a `## Test design` section at the top. Before Stage 3 finalises the file, populate that section with the design-time decisions this skill made, so the SDLC has a recorded trace that scope was *decided*, not implicit:
+
+- **Layers planned** — which of `unit | integration | e2e | visual | manual` applied to this REQ
+- **Layers covered** — same list with ✓ or `deferred`
+- **Deferrals** — explicit one-line rationale per deferred layer (`e2e N/A — schema-only, no UI yet` rather than silent absence)
+- **Skill invocation** — _"`e2e-test-engineer` invoked on turn N during Phase 2"_, with a turn pointer the reviewer can verify against the chat transcript
+
+If you authored or modified `e2e/**/*.spec.ts` directly without invoking this skill, that's a delegation gap — the `sdlc-implementer` Phase 2 audit (devaudit#132) will catch it before Phase 3. The honest record is: the skill ran (or didn't), the layers were chosen for stated reasons, and the test-execution-summary attribution points back at the chat turn where the decision happened.
+
 ---

 ## Evidence vs failure forensics
````
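The delegation-gap rule can be approximated as a check over the files touched in Phase 2. How the `sdlc-implementer` audit (devaudit#132) actually detects this is not shown in this diff; `Phase2Record`, `delegationGap` and `skillInvocationLine` below are hypothetical and only illustrate the rule as stated.

```typescript
interface Phase2Record {
  changedFiles: string[];
  skillInvokedOnTurn?: number; // chat turn where e2e-test-engineer ran, if it ran
}

// Spec files written or edited without the skill having run.
export function delegationGap(r: Phase2Record): string[] {
  if (r.skillInvokedOnTurn !== undefined) return [];
  return r.changedFiles.filter((f) => f.startsWith("e2e/") && f.endsWith(".spec.ts"));
}

// The attribution line for the Test design section when the skill did run.
export function skillInvocationLine(turn: number): string {
  return `\`e2e-test-engineer\` invoked on turn ${turn} during Phase 2`;
}
```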
@@ -373,6 +404,27 @@ When to deviate:
|
|
|
373
404
|
- **Long flows** (>3 meaningful transitions) keep all stages tier `'feature'`. The post-merge regression run still has the canonical anchor to corroborate the AC; the dense journey is on the feature PR for reviewers and in the audit-pack download for that release forever.
|
|
374
405
|
- **Reviewer pushback that evidence feels thin** (single-shot per AC across a HIGH-risk REQ) almost always means tier `'feature'` stages are missing — add them on the feature branch where they actually fire, not after.
|
|
375
406
|
|
|
407
|
+
### Specs with no page object — transport-layer evidence (devaudit#127)
|
|
408
|
+
|
|
409
|
+
`evidenceShot` requires a Playwright `page` object. Specs that exercise behaviour at the transport layer — Node `fetch` against HTTP / webhook endpoints, `socket.io-client` connections, direct `MongoClient` queries, gRPC clients — have no `page` and **cannot call `evidenceShot`**. Examples from the wawagardenbar-app regression pack:
|
|
410
|
+
|
|
411
|
+
- `e2e/payments/webhook-signature-rejection.spec.ts` — HMAC-SHA512 verification via Node `fetch`
|
|
412
|
+
- `e2e/realtime/order-status-broadcast.spec.ts` — `socket.io-client` event assertion
|
|
413
|
+
- `e2e/admin/menu-item-delete.spec.ts` — direct `MongoClient` + service-layer call
|
|
414
|
+
|
|
415
|
+
These specs are still E2E (they exercise the deployed system end-to-end at the transport boundary), they belong in `e2e/`, and they run alongside UI specs. **Their evidence form is the per-spec entry in `test-execution-summary.md`** — the table of spec → pass/fail → asserted behaviour (signature rejected with HTTP 401, idempotent replay suppressed, broadcast received within Nms, soft-delete cascaded) is the load-bearing proof. The screenshot check is **N/A** for them; the release-completeness "behavioural proof" check is satisfied by the test-execution-summary upload alone.
|
|
416
|
+
|
|
417
|
+
Discipline for transport specs:
|
|
418
|
+
|
|
419
|
+
- Name the asserted behaviour in the test title using the same `[REQ-XXX][ACn]` bracket convention UI specs use. Reviewers grep on that.
|
|
420
|
+
- The `test-execution-summary.md` table row should describe what the spec verified in operator-facing terms ("signature mismatch returns 401; payment row unchanged"), not in TypeScript-spec terms ("`expect(response.status).toBe(401)`").
|
|
421
|
+
- If a transport spec *can* be paired with a thin UI shim that screenshots the user-visible outcome (e.g. an admin dashboard surface that shows the rejected payment as "Failed — signature mismatch"), pair them — that buys back the screenshot evidence at the surface level. Otherwise: transport spec stands alone.
|
|
422
|
+
- The portal's release-detail "screenshots" panel will show zero entries for purely-transport REQs; that's correct. Reviewers cross-reference `test-execution-summary.md` instead.
|
|
423
|
+
|
|
424
|
+
This is **observation**, not gate-relaxation — these specs satisfy the SDLC evidence requirement; the screenshot mechanism doesn't apply.
|
|
425
|
+
|
|
426
|
+
A `evidenceTrace(reqId, ac, slug, payload)` helper that writes a JSON sidecar (request/response/ledger shape) was considered as a Phase B; deferred until the portal grows a non-PNG evidence type. Today the test-execution-summary already carries the equivalent information at the table level.

---

## Principles

@@ -0,0 +1,178 @@
```yaml
# Reference: three-tier E2E gating workflow (devaudit#152 follow-up, v0.1.53)
#
# Copy this into your consumer-owned .github/workflows/e2e-regression.yml
# to adopt the 3-tier model: smoke (every develop push, fast) / critical
# (PR-to-main, ~10-15 min target) / regression (nightly + push-to-main +
# dispatch, full audit trail with auto-issue on failure).
#
# The framework does NOT sync this file automatically — your consumer
# owns its e2e-regression.yml. Apply the patterns below to your own
# file; keep any consumer-specific env / matrix / runner customisations.
#
# Tier definitions:
#   - smoke      — runs on develop push via ci.yml (no change here)
#   - critical   — Playwright project that selects e2e/smoke/ + e2e/critical/
#   - regression — Playwright project that selects all e2e/**/*.spec.ts
#
# playwright.config.ts must define the `critical` project for this to
# fire; if it doesn't, the gate falls back to the existing `smoke`
# project so PR-to-main stays green during migration.

name: E2E Regression

on:
  pull_request:
    branches: [main]      # critical-tier gate before merge
  push:
    branches: [main]      # full regression after merge; auto-issues on failure
  schedule:
    - cron: '0 2 * * *'   # nightly full regression
  workflow_dispatch:
    inputs:
      specs:
        description: 'Optional: space-separated spec paths or --grep pattern for a scoped run. Empty = full regression.'
        required: false

permissions:
  contents: read
  issues: write           # post-merge auto-issue on regression failure

concurrency:
  group: e2e-regression-${{ github.ref }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
  e2e:
    name: E2E Regression Tests
    runs-on: ubuntu-latest   # adapt to your runner; e.g. self-hosted, ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0     # for E2E_NEW_SPECS computation

      - uses: actions/setup-node@v4
        with:
          node-version: '22' # match your project
          cache: 'npm'

      - name: Install dependencies
        run: npm ci --legacy-peer-deps

      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium

      # Decide which Playwright project to run based on the trigger.
      # PR-to-main uses critical with smoke fall-back; push-to-main and
      # schedule run the full regression project; workflow_dispatch
      # accepts an optional spec filter.
      - name: Determine E2E project + spec selector
        id: select
        env:
          # Pass the free-text dispatch input via env rather than
          # interpolating it into the script, so it cannot inject shell.
          SPECS_INPUT: ${{ github.event.inputs.specs }}
        run: |
          set -euo pipefail
          EVENT="${{ github.event_name }}"
          case "$EVENT" in
            pull_request)
              if grep -qE "name:\s*['\"]critical['\"]" playwright.config.ts 2>/dev/null; then
                echo "project=critical" >> "$GITHUB_OUTPUT"
                echo "Using critical-tier project (smoke + e2e/critical/)"
              else
                echo "project=smoke" >> "$GITHUB_OUTPUT"
                echo "::warning::No 'critical' Playwright project defined; falling back to smoke. See e2e-test-engineer/references/e2e-regression-3-tier.yml + the Phase 3 tier-classification guide."
              fi
              echo "specs=" >> "$GITHUB_OUTPUT"
              ;;
            push|schedule)
              echo "project=regression" >> "$GITHUB_OUTPUT"
              echo "specs=" >> "$GITHUB_OUTPUT"
              echo "Running full regression project"
              ;;
            workflow_dispatch)
              echo "project=regression" >> "$GITHUB_OUTPUT"
              echo "specs=${SPECS_INPUT}" >> "$GITHUB_OUTPUT"
              if [ -n "$SPECS_INPUT" ]; then
                echo "Scoped dispatch: ${SPECS_INPUT}"
              fi
              ;;
          esac

      - name: Run E2E suite
        id: run
        env:
          PLAYWRIGHT_HTML_REPORTER_OPEN: never
          PLAYWRIGHT_JSON_OUTPUT_NAME: e2e-regression-results.json
          PROJECT: ${{ steps.select.outputs.project }}
          SPECS: ${{ steps.select.outputs.specs }}
          # Add your e2e_env values here as needed (DEVAUDIT_BASE_URL etc.)
        run: |
          set -euo pipefail
          if [ -n "$SPECS" ]; then
            # $SPECS is left unquoted on purpose so it word-splits into paths/flags.
            npx playwright test --project="$PROJECT" --reporter=json,html $SPECS
          else
            npx playwright test --project="$PROJECT" --reporter=json,html
          fi

      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: e2e-regression-report
          path: |
            e2e-regression-results.json
            playwright-report/
            test-results/

      # ─────────────────────────────────────────────────────────────
      # Post-merge auto-issue on regression failure (push:branches:[main])
      #
      # Catches regressions that slipped past the critical-tier PR gate.
      # Opens a high-priority issue tagging the merge commit + the
      # failing specs so the operator can triage within working hours.
      # No auto-revert — that's intentionally an operator decision.
      # ─────────────────────────────────────────────────────────────
      - name: Open hotfix issue on post-merge regression
        if: failure() && github.event_name == 'push' && github.ref == 'refs/heads/main'
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          set -euo pipefail
          MERGE_SHA="${{ github.sha }}"
          MERGE_SHA_SHORT=$(echo "$MERGE_SHA" | cut -c1-7)
          RUN_URL="${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"

          # Extract failing spec names from the JSON reporter output if available.
          # The report nests specs under (possibly nested) suites, and a spec's
          # `ok` is false when one of its tests ended unexpectedly. Filter on
          # specs: result objects carry `status` but no `title`.
          FAILING=""
          if [ -f e2e-regression-results.json ]; then
            FAILING=$(jq -r '
              [.. | objects | select(has("specs")) | .specs[]?
                | select(.ok == false) | "\(.file): \(.title)"]
              | unique | .[]
            ' e2e-regression-results.json 2>/dev/null | head -20 || true)
          fi
          if [ -z "$FAILING" ]; then
            FAILING="(see the failing run logs — could not parse spec titles from reporter output)"
          fi

          BODY=$(cat <<EOF
          ## Post-merge regression caught on \`main\`

          The full regression suite failed on the post-merge run for commit \`${MERGE_SHA_SHORT}\`. The critical-tier PR gate let this slip through.

          **Failing specs (best-effort extracted from the JSON reporter):**

          \`\`\`
          ${FAILING}
          \`\`\`

          **Triage actions:**

          - [ ] Read the run log: ${RUN_URL}
          - [ ] Pull \`e2e-regression-report\` artifact from the run; inspect \`test-results/<spec>/error-context.md\` for page state at failure
          - [ ] Decide: hotfix on \`main\`, revert \`${MERGE_SHA_SHORT}\`, or accept-with-rationale if the failure is environmental
          - [ ] If the failing spec is a Must-tier candidate that should have caught this pre-merge, move it from \`e2e/\` to \`e2e/critical/\` so the next PR-to-main runs it

          **Auto-filed by:** \`e2e-regression.yml\` (devaudit#152 3-tier gating, v0.1.53+)
          EOF
          )

          # Both labels must already exist in the repository, or gh issue create fails.
          gh issue create \
            --title "[hotfix] Post-merge regression on \`${MERGE_SHA_SHORT}\` — full E2E failed" \
            --body "$BODY" \
            --label "bug,priority:high"
```