@metasession.co/devaudit-cli 0.1.47 → 0.1.52

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@metasession.co/devaudit-cli",
3
- "version": "0.1.47",
3
+ "version": "0.1.52",
4
4
  "description": "DevAudit CLI — installs, syncs, and operates the Metasession SDLC across consumer projects.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -33,7 +33,7 @@
33
33
  },
34
34
  "dependencies": {
35
35
  "@clack/prompts": "^0.8.2",
36
- "@metasession.co/devaudit-plugin-sdk": "^0.1.47",
36
+ "@metasession.co/devaudit-plugin-sdk": "^0.1.52",
37
37
  "commander": "^12.1.0",
38
38
  "consola": "^3.2.3",
39
39
  "env-paths": "^3.0.0",
@@ -131,6 +131,8 @@ Create `compliance/evidence/REQ-XXX/implementation-plan.md`:
131
131
  - Files to create/modify
132
132
  - Architecture decisions
133
133
  - Risks and dependencies
134
+ - **Surface inventory completeness** (MEDIUM/HIGH risk) — every user-touchable surface listed in Section 2's surface-inventory table is either `In scope`, `Already works`, or explicitly `Out of scope (waived)` with a follow-up issue. No surface is silently absent. _(devaudit#152)_
135
+ - **AC form** — the test-scope ACs (drafted in Step 7) can each be phrased in Given/When/Then against the surfaces in scope. If any AC reduces to _"the schema accepts X"_ or _"the resolver returns Y"_, the plan is incomplete — return to Section 2 and expand the surface inventory until every AC has a UI surface that delivers it. _(devaudit#152)_
134
136
 
135
137
  **Do NOT proceed** until the developer explicitly approves the plan. If the developer requests changes, update `implementation-plan.md` and re-present. For HIGH risk, this review is especially important — it's cheaper to change the plan than to refactor the code.
136
138
 
@@ -177,9 +179,22 @@ Standard gates apply. No additional testing beyond universal exit criteria.
177
179
  - CI independent verification: all PR checks pass
178
180
  - Human code review via PR
179
181
 
182
+ ### How to write acceptance criteria (devaudit#152)
183
+
184
+ Phrase each AC as a **user-observable journey**, not a technical-layer assertion. Use the Given/When/Then form:
185
+
186
+ > **Given** [pre-state + which UI surface the user is on], **When** [named user action with a named control], **Then** [observable change in a named UI surface].
187
+
188
+ If you can't phrase an AC in Given/When/Then because no UI surface delivers the change to a user, the scope is incomplete — return to the implementation plan's surface inventory (Section 2). LOW risk REQs may keep ACs shorter when the change is genuinely surface-free (refactor / dep bump / infra-only), but the journey form is still preferred when a user surface exists.
189
+
190
+ Examples:
191
+
192
+ - ✅ "Given the dependency is updated, When CI runs the universal gates, Then 0 high/critical findings."
193
+ - ❌ "Schema accepts optional `inventoryId` field" — internal mechanic, belongs in `test-plan.md` (this matters even for LOW when the change is user-facing).
194
+
180
195
  ## Acceptance Criteria
181
196
 
182
- - [x] [Criterion 1 — what "done" looks like]
197
+ - [x] [Criterion 1 — what "done" looks like, phrased Given/When/Then where applicable]
183
198
  - [x] [Criterion 2]
184
199
  EOF
185
200
  ```
@@ -218,10 +233,24 @@ How we confirm this meets the business requirement:
218
233
  - [e.g., "Verify public page displays new content correctly"]
219
234
  - [e.g., "Confirm edits are visible to users within expected time"]
220
235
 
236
+ ### How to write acceptance criteria (devaudit#152)
237
+
238
+ Phrase each AC as a **user-observable journey**, not a technical-layer assertion. Use the Given/When/Then form:
239
+
240
+ > **Given** [pre-state + which UI surface the user is on], **When** [named user action with a named control], **Then** [observable change in a named UI surface] _(plus any audit / downstream UI changes)_.
241
+
242
+ If you can't phrase an AC in Given/When/Then because no UI surface delivers the change to a user, the scope is incomplete — return to the implementation plan's surface inventory (Section 2).
243
+
244
+ Examples:
245
+
246
+ - ✅ "Given Poundo has Ogbono linked, When a staff member opens `/dashboard/orders/express/create-order`, picks Ogbono from the Soup group, and marks the order Complete, Then `/dashboard/inventory/{ogbono}` shows stock decreased by 1 and a new Sale movement row tied to the order ID."
247
+ - ❌ "Schema accepts optional `inventoryId` field (persistence round-trip)" — unit-test contract, belongs in `test-plan.md`, not here.
248
+ - ❌ "Resolver maps selected pairs to inventory link" — internal mechanic, not user value.
249
+
221
250
  ## Acceptance Criteria
222
251
 
223
- - [ ] [Criterion 1]
224
- - [ ] [Criterion 2]
252
+ - [ ] [Criterion 1 — Given/When/Then]
253
+ - [ ] [Criterion 2 — Given/When/Then]
225
254
  - [ ] All additional testing items above pass
226
255
  EOF
227
256
  ```
@@ -274,10 +303,24 @@ How we confirm this meets the business requirement:
274
303
  - Elevated review required for: [security-sensitive files]
275
304
  - Regeneration protocol: [will any components be regenerated?]
276
305
 
306
+ ### How to write acceptance criteria (devaudit#152)
307
+
308
+ Phrase each AC as a **user-observable journey**, not a technical-layer assertion. Use the Given/When/Then form:
309
+
310
+ > **Given** [pre-state + which UI surface the user is on], **When** [named user action with a named control], **Then** [observable change in a named UI surface] _(plus any audit / downstream UI changes)_.
311
+
312
+ HIGH risk especially: every AC must pin to a named UI surface from the implementation plan's surface inventory (Section 2). If you can't phrase an AC in Given/When/Then because no UI surface delivers the change to a user, the scope is incomplete — expand the surface inventory before approving the plan. This is the gap that produced REQ-030 on a consumer project (feature shipped through every gate green, but no order-creation surface let a user select a customisation at order time).
313
+
314
+ Examples:
315
+
316
+ - ✅ "Given an admin has linked Poundo to Ogbono in `/dashboard/inventory/links`, When a staff member opens `/dashboard/orders/express/create-order`, picks Ogbono from the Soup group, and marks the order Complete, Then `/dashboard/inventory/{ogbono}` shows stock decreased by 1, a new Sale movement row appears tied to the order ID, and the activity timeline records the link-driven deduction."
317
+ - ❌ "Schema accepts optional `inventoryId` field (persistence round-trip)" — unit-test contract, belongs in `test-plan.md`, not here.
318
+ - ❌ "Resolver maps selected pairs to inventory link" — internal mechanic, not user value.
319
+
277
320
  ## Acceptance Criteria
278
321
 
279
- - [ ] [Criterion 1]
280
- - [ ] [Criterion 2]
322
+ - [ ] [Criterion 1 — Given/When/Then against a named UI surface]
323
+ - [ ] [Criterion 2 — Given/When/Then against a named UI surface]
281
324
  - [ ] All security testing items pass
282
325
  - [ ] All validation items confirmed
283
326
  - [ ] Independent review completed (if required)
@@ -126,7 +126,9 @@ Write or update E2E tests **after** implementation. E2E tests need working UI/AP
126
126
 
127
127
  > **Skill available:** invoke the **`e2e-test-engineer`** skill for this step (at `.claude/skills/e2e-test-engineer/SKILL.md`). It derives scenarios from the requirement's acceptance criteria, reconciles with the existing test pack (flags obsoletes — but never deletes without confirmation), runs the suite, and files defects for failures or missed ACs. Framework-agnostic (Playwright, Cypress, pytest-playwright, etc.) and tracker-agnostic (GitHub, Linear, Jira, etc.). For projects with no e2e suite yet, the skill also covers bootstrapping one. See [`sdlc/SKILLS.md`](../sdlc/SKILLS.md) for the full list of available skills.
128
128
 
129
- > **Run authenticated flows in CI.** Tests that need a logged-in session (admin forms, role-gated flows) belong in their own Playwright project that depends on `auth-setup`. Register that project name in `sdlc-config.json` `e2e_projects` and set `e2e_seed_command` / `e2e_env` so CI seeds fixtures and runs it as a **report-only** gate (continue-on-error — it surfaces failures as evidence without blocking the merge until proven stable). Prove each AC with an `evidenceShot(page, 'REQ-XXX', 'ACn-…')` so the PNG lands in `compliance/evidence/REQ-XXX/screenshots/`. This is what lets Stage 3 Step 10 reduce manual UAT to a light smoke instead of a full re-click.
129
+ > **Run authenticated flows in CI.** Tests that need a logged-in session (admin forms, role-gated flows) belong in their own Playwright project that depends on `auth-setup`. Register that project name in `sdlc-config.json` `e2e_projects` and set `e2e_seed_command` / `e2e_env` so CI seeds fixtures and runs it as a **report-only** gate (continue-on-error — it surfaces failures as evidence without blocking the merge until proven stable). Prove each UI-driven AC with an `evidenceShot(page, 'REQ-XXX', acN, 'slug')` so the PNG lands in `compliance/evidence/REQ-XXX/screenshots/`. This is what lets Stage 3 Step 10 reduce manual UAT to a light smoke instead of a full re-click.
130
+
131
+ > **Transport-layer specs have no page** (devaudit#127). Specs that exercise the system at the transport boundary — Node `fetch` against webhooks, `MongoClient` queries, `socket.io-client` assertions — cannot call `evidenceShot`. Their evidence form is the per-spec row in `test-execution-summary.md` describing the asserted behaviour in operator terms. The portal's release-detail "screenshots" panel will show zero entries for purely-transport REQs; that's correct. Reviewers cross-reference `test-execution-summary.md` instead. See `e2e-test-engineer/SKILL.md` § *Specs with no page object*.
130
132
 
131
133
  **4a. Review the test plan for E2E items:**
132
134
  ```bash
@@ -135,6 +135,24 @@ cat > compliance/evidence/REQ-XXX/test-execution-summary.md << 'EOF'
135
135
  **Git SHA:** [short SHA]
136
136
  **CI Run:** [run ID or "local"]
137
137
 
138
+ ## Test design (devaudit#50)
139
+
140
+ Records the design-time decisions before listing run results — what was tested, what was deliberately deferred, who/what decided. Auditors (and future maintainers) can see the scope decision was *made*, not implicit.
141
+
142
+ **Layers planned:** [unit | integration | e2e | visual | manual — pick the ones that apply to this REQ]
143
+
144
+ **Layers covered:** [same list, marked ✓ for shipped layers / `deferred` for skipped ones]
145
+
146
+ **Deferrals (if any):**
147
+
148
+ - [e.g. "e2e N/A — schema-only change, no UI surface reads the new fields yet; deferred to REQ-NNN when the admin form lands"]
149
+ - [e.g. "visual regression N/A — backend service change, no UI affected"]
150
+ - A deferral without a stated rationale is a gap, not a deferral. Either name *why* it was skipped or do the work.
151
+
152
+ **Skill invocation:** [`e2e-test-engineer` invoked on turn N during Phase 2 — verifiable from the chat transcript] / [`manual scope decision` — operator chose layers directly because <reason>]
153
+
154
+ **Surface inventory (MEDIUM/HIGH risk REQs):** see `implementation-plan.md` Section 2. Each `In scope` surface here should map to at least one passing test below; each `Already works` surface should map to a regression-pack spec; each `Out of scope (waived)` surface should have a follow-up issue referenced.
155
+
138
156
  ## Gate Results
139
157
 
140
158
  | Gate | Result | Details |
@@ -35,11 +35,29 @@ Each section below maps to one (or more) of these clauses. Don't delete sections
35
35
  > _Closes ISO 29119 §3.4 — test plan_
36
36
 
37
37
  - **Goal:** REPLACE — one sentence describing what this REQ delivers, no jargon.
38
+
39
+ ### How to write acceptance criteria (devaudit#152)
40
+
41
+ Phrase each AC as a **user-observable journey**, not a technical-layer assertion. Use the Given/When/Then form:
42
+
43
+ > **Given** [the relevant pre-state, including which UI surface the user is on],
44
+ > **When** [the user takes a specific, named action with a specific, named control],
45
+ > **Then** [the user can observe a specific, named change in a specific, named UI surface]
46
+ > _(And any additional observable changes — audit rows, downstream UI updates, etc.)_
47
+
48
+ Concrete examples:
49
+
50
+ - ✅ "Given Poundo has Ogbono linked, When a staff member opens `/dashboard/orders/express/create-order`, picks Ogbono from the Soup group, and marks the order Complete, Then `/dashboard/inventory/{ogbono}` shows stock decreased by 1 and one new Sale movement row tied to the order ID."
51
+ - ❌ "Schema accepts optional `inventoryId` field (persistence round-trip)" — this is a unit-test contract, not a user-observable AC. It belongs in `test-plan.md`, not here.
52
+ - ❌ "Resolver maps selected pairs to inventory link" — same problem. Internal mechanics, not user value.
53
+
54
+ If you can't phrase an AC in Given/When/Then because no UI surface delivers the change to a user, the scope is incomplete — expand the **Surface inventory** (Section 2) until every AC has a UI surface that delivers it.
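
The Given/When/Then rule above lends itself to a quick mechanical check. The following is a minimal sketch, not part of the package: `lint_ac_form` is a hypothetical helper that assumes the AC table keeps the three-column layout shown in this template (`| ACn | Description | SRS item |`) and that descriptions contain no literal `|`. It only checks that the three keywords appear in order; it cannot judge whether the named surfaces are real.

```bash
#!/usr/bin/env bash
# Hypothetical lint (assumption: not shipped with devaudit-cli).
# Reads a markdown AC table on stdin and flags rows whose description
# does not contain Given ... When ... Then in that order.
lint_ac_form() {
  awk -F'|' '
    # Only rows whose first cell looks like AC1, AC2, ... are checked;
    # the header and separator rows fall through untouched.
    $2 ~ /^[ \t]*AC[0-9]+[ \t]*$/ {
      id = $2
      gsub(/[ \t]/, "", id)
      if ($3 !~ /Given.*When.*Then/) {
        print id ": not in Given/When/Then form"
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

A possible invocation would be `lint_ac_form < compliance/evidence/REQ-XXX/implementation-plan.md`; a non-zero exit means at least one AC row needs rewording or the surface inventory needs expanding.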
55
+
38
56
  - **Acceptance criteria:**
39
57
 
40
58
  | AC | Description | SRS item it traces to |
41
59
  | --- | ------------------------------------------ | --------------------------------------------------------------------------------------- |
42
- | AC1 | REPLACE — one-line behavioural description | REQ-AREA-NNN (existing) / REQ-AREA-NNN (new — propose stub) / `@srs-deferred: <reason>` |
60
+ | AC1 | REPLACE — one-line Given/When/Then journey | REQ-AREA-NNN (existing) / REQ-AREA-NNN (new — propose stub) / `@srs-deferred: <reason>` |
43
61
  | AC2 | REPLACE | REPLACE |
44
62
  | … | | |
45
63
 
@@ -50,6 +68,24 @@ Each section below maps to one (or more) of these clauses. Don't delete sections
50
68
  - **In scope:** REPLACE — list every file / module / surface the change touches.
51
69
  - **Out of scope:** REPLACE — adjacent areas the change deliberately leaves alone.
52
70
 
71
+ ### Surface inventory (MEDIUM/HIGH risk — required) (devaudit#152)
72
+
73
+ List every UI, API, background job, and report **that a real user touches** in the journey this REQ enables. For each surface, mark one of:
74
+
75
+ - **In scope** — this REQ adds or modifies it
76
+ - **Already works** — existing code already handles it correctly (link the file / route as evidence)
77
+ - **Out of scope (waived)** — explicitly deferred, with one-sentence justification and a follow-up issue link
78
+
79
+ | Surface | URL / file | Status |
80
+ | ----------------------- | ----------------------------------------------------- | ---------------------------------------------------------------------------- |
81
+ | [e.g. Customer cart] | `/menu` modal — `components/features/menu/…` | In scope |
82
+ | [e.g. Staff POS] | `/dashboard/orders/express/…` | Out of scope (waived) — front-of-house flow not used yet, follow-up #NN |
83
+ | [e.g. Admin Edit Order] | `/dashboard/orders/[id]/edit` — `app/admin/orders/…` | Already works — existing customisation picker handles the new fields as-is |
84
+
85
+ **Rule of thumb:** if the AC list reads _"the schema accepts X"_ or _"the resolver returns Y"_ but never _"the user can do Z in the UI and see the result in the UI"_, the surface inventory is incomplete and the plan is not ready for approval. The matching test-scope ACs in Step 7 must each pin to a surface listed here.
86
+
87
+ LOW risk REQs may skip the table when the change is genuinely surface-free (refactor, dependency bump, infra-only). State `Surface inventory: N/A — <reason>` instead.
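
The three-status rule for the surface inventory can be sketched as a small table lint. This is an illustration under stated assumptions, not something the package ships: `lint_surface_inventory` is a hypothetical name, the table is assumed to use the `| Surface | URL / file | Status |` columns from the template above, and a follow-up issue is assumed to be written as `#` followed by an identifier.

```bash
#!/usr/bin/env bash
# Hypothetical lint (assumption: not shipped with devaudit-cli).
# Reads a markdown surface-inventory table on stdin. Every data row must be
# "In scope", "Already works", or "Out of scope (waived)" with a #issue.
lint_surface_inventory() {
  awk -F'|' '
    # Ignore non-table lines, the header row, and the --- separator row.
    !/^[ \t]*\|/ { next }
    $2 ~ /^[ \t]*Surface[ \t]*$/ { next }
    $2 ~ /^[ \t]*-+[ \t]*$/ { next }
    {
      surface = $2
      status = $4
      gsub(/^[ \t]+|[ \t]+$/, "", surface)
      gsub(/^[ \t]+|[ \t]+$/, "", status)
      if (status ~ /^In scope/ || status ~ /^Already works/) next
      if (status ~ /^Out of scope \(waived\)/) {
        if (status !~ /#[0-9A-Za-z]+/) {
          print "waiver without follow-up issue: " surface
          bad = 1
        }
        next
      }
      print "missing or unknown status: " surface
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

The sketch deliberately does not verify that an `Already works` row links a file or route as evidence; that judgment stays with the reviewer.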
88
+
53
89
  ## 3. Architecture decisions
54
90
 
55
91
  > _Populated by the [`adr-author` skill](../skills/adr-author/SKILL.md) at Stage 1 plan APPROVAL._
@@ -0,0 +1,144 @@
1
+ #!/usr/bin/env bash
2
+ # update-sdlc-status.sh — Post or update the canonical SDLC status
3
+ # sticky comment on a REQ tracking issue (devaudit#131).
4
+ #
5
+ # Purpose: long-running SDLC issues accumulate dozens of comments.
6
+ # The operator scrolling the thread can't find "where are we right
7
+ # now" without re-reading. This helper writes a marker-tagged comment
8
+ # at a predictable shape; subsequent calls find + edit the existing
9
+ # comment instead of stacking new ones, so the latest status always
10
+ # lives in exactly one place on the issue.
11
+ #
12
+ # Idempotent — find-or-create. The marker is HTML-commented so it
13
+ # doesn't show up in the rendered issue UI but is greppable via the
14
+ # API. Subsequent invocations on the same issue replace the body
15
+ # without dropping the marker.
16
+ #
17
+ # Usage:
18
+ # ./scripts/update-sdlc-status.sh <issue-number> "<last-step>" "<next-step>" [--repo owner/name] [--dry-run]
19
+ #
20
+ # Examples:
21
+ # ./scripts/update-sdlc-status.sh 322 \
22
+ # "Phase 2 complete — feat branch landed on develop" \
23
+ # "Phase 3 — sdlc-implementer auto-continuing"
24
+ #
25
+ # ./scripts/update-sdlc-status.sh 322 \
26
+ # "Phase 4 — release PR #455 opened" \
27
+ # "Operator action — review + merge develop→main when ready" \
28
+ # --repo metasession-dev/wawagardenbar-app
29
+ #
30
+ # Required:
31
+ # - `gh` CLI authenticated (uses GITHUB_TOKEN or the current `gh auth` session)
32
+ # - The issue must exist
33
+ #
34
+ # Optional flags:
35
+ # --repo owner/name Override repo (defaults to the cwd's git remote)
36
+ # --dry-run Print the body + the gh command that would run,
37
+ # without making any API calls. Used by the test
38
+ # suite + safe for operator inspection.
39
+
40
+ set -euo pipefail
41
+
42
+ if [ "$#" -lt 3 ]; then
43
+ cat <<'USAGE' >&2
44
+ Usage: update-sdlc-status.sh <issue-number> "<last-step>" "<next-step>" [--repo owner/name] [--dry-run]
45
+ USAGE
46
+ exit 1
47
+ fi
48
+
49
+ ISSUE_NUM="$1"
50
+ LAST_STEP="$2"
51
+ NEXT_STEP="$3"
52
+ shift 3
53
+
54
+ REPO=""
55
+ DRY_RUN=false
56
+ while [ "$#" -gt 0 ]; do
57
+ case "$1" in
58
+ --repo)
59
+ REPO="$2"
60
+ shift 2
61
+ ;;
62
+ --dry-run)
63
+ DRY_RUN=true
64
+ shift
65
+ ;;
66
+ *)
67
+ echo "Unknown flag: $1" >&2
68
+ exit 1
69
+ ;;
70
+ esac
71
+ done
72
+
73
+ # Validate issue number is numeric early so we don't make bogus API
74
+ # calls when the caller fat-fingers the arg order.
75
+ if ! [[ "$ISSUE_NUM" =~ ^[1-9][0-9]*$ ]]; then
76
+ echo "Error: issue number must be a positive integer, got: $ISSUE_NUM" >&2
77
+ exit 1
78
+ fi
79
+
80
+ MARKER='<!-- sdlc-implementer:status -->'
81
+
82
+ # Body shape — keep this compact and load-bearing. The marker MUST be
83
+ # the first line so the find-existing pass can use startswith() in
84
+ # the gh JSON filter without false positives.
85
+ BODY=$(cat <<EOF
86
+ $MARKER
87
+
88
+ **🟢 LAST STEP** — $LAST_STEP
89
+
90
+ **🔵 NEXT STEP** — $NEXT_STEP
91
+
92
+ ---
93
+
94
+ _Updated by \`sdlc-implementer\` on every stage transition. The full SDLC trail lives in the comments below; this comment is the always-current pointer._
95
+ EOF
96
+ )
97
+
98
+ REPO_FLAG=""
99
+ if [ -n "$REPO" ]; then
100
+ REPO_FLAG="--repo $REPO"
101
+ fi
102
+
103
+ if [ "$DRY_RUN" = "true" ]; then
104
+ echo "[dry-run] would update sticky on issue #$ISSUE_NUM${REPO:+ in $REPO}"
105
+ echo "----- body -----"
106
+ echo "$BODY"
107
+ echo "----- end body -----"
108
+ exit 0
109
+ fi
110
+
111
+ # Find an existing status sticky on this issue. We grep through the
112
+ # comments looking for the canonical marker; if found, edit it; if
113
+ # not, create a fresh one.
114
+ #
115
+ # gh's --jq filter runs locally on each paginated response, so only
116
+ # the matching comment id is captured. `startswith` is the right matcher
117
+ # because the marker is always the first line.
118
+ EXISTING_ID=""
119
+ # Build the api endpoint. Without --repo, gh resolves from the current
120
+ # git remote — same as `gh issue …` does elsewhere in the framework.
121
+ if [ -n "$REPO" ]; then
122
+ EXISTING_ID=$(gh api "repos/$REPO/issues/$ISSUE_NUM/comments" --paginate \
123
+ --jq '.[] | select(.body | startswith("'"$MARKER"'")) | .id' | head -1)
124
+ else
125
+ EXISTING_ID=$(gh api "repos/{owner}/{repo}/issues/$ISSUE_NUM/comments" --paginate \
126
+ --jq '.[] | select(.body | startswith("'"$MARKER"'")) | .id' | head -1)
127
+ fi
128
+
129
+ if [ -n "$EXISTING_ID" ]; then
130
+ echo "Updating existing SDLC status sticky (comment id: $EXISTING_ID)"
131
+ if [ -n "$REPO" ]; then
132
+ gh api "repos/$REPO/issues/comments/$EXISTING_ID" -X PATCH \
133
+ --field "body=$BODY" >/dev/null
134
+ else
135
+ gh api "repos/{owner}/{repo}/issues/comments/$EXISTING_ID" -X PATCH \
136
+ --field "body=$BODY" >/dev/null
137
+ fi
138
+ else
139
+ echo "Posting new SDLC status sticky on issue #$ISSUE_NUM"
140
+ # shellcheck disable=SC2086 # REPO_FLAG must split on space
141
+ gh issue comment "$ISSUE_NUM" $REPO_FLAG --body "$BODY" >/dev/null
142
+ fi
143
+
144
+ echo "SDLC status updated."
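
One edge case in the flag loop of this script: with `set -u` active, passing `--repo` as the final argument leaves `$2` unset, so the shell aborts with an unbound-variable error rather than a readable message. A minimal sketch of a guarded loop follows. `parse_flags` is a hypothetical function introduced only for illustration; the published script parses flags inline and exits instead of returning.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (assumption: parse_flags is not in the published script).
# Mirrors the flag loop but confirms --repo has a value before reading "$2".
parse_flags() {
  REPO=""
  DRY_RUN=false
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --repo)
        if [ "$#" -lt 2 ]; then
          echo "Error: --repo requires an owner/name value" >&2
          return 1
        fi
        REPO="$2"
        shift 2
        ;;
      --dry-run)
        DRY_RUN=true
        shift
        ;;
      *)
        echo "Unknown flag: $1" >&2
        return 1
        ;;
    esac
  done
}
```

Called as `parse_flags "$@"` after the three positional arguments are shifted off, this would set `REPO` and `DRY_RUN` the same way the inline loop does.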
@@ -0,0 +1,131 @@
1
+ #!/usr/bin/env bash
2
+ # update-sdlc-status.test.sh — Tests for the SDLC status sticky helper
3
+ # (devaudit#131). Exercises --dry-run so no real API call is needed.
4
+ #
5
+ # Usage:
6
+ # ./scripts/update-sdlc-status.test.sh
7
+
8
+ set -euo pipefail
9
+
10
+ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
11
+ HELPER="$SCRIPT_DIR/update-sdlc-status.sh"
12
+ [ -x "$HELPER" ] || chmod +x "$HELPER"
13
+
14
+ PASS=0
15
+ FAIL=0
16
+ ok() { echo " ✓ $1"; PASS=$((PASS + 1)); }
17
+ no() { echo " ✗ $1"; FAIL=$((FAIL + 1)); }
18
+
19
+ case_missing_args() {
20
+ echo "case: missing args exits non-zero with a usage line"
21
+ local out exit_code
22
+ out=$("$HELPER" 2>&1) && exit_code=0 || exit_code=$?
23
+ if [ "$exit_code" -ne 0 ]; then
24
+ ok "exit code non-zero ($exit_code)"
25
+ else
26
+ no "expected non-zero exit on missing args"
27
+ fi
28
+ if printf '%s\n' "$out" | grep -q "Usage:"; then
29
+ ok "stderr includes Usage line"
30
+ else
31
+ no "stderr missing Usage; got:\n$out"
32
+ fi
33
+ }
34
+
35
+ case_non_numeric_issue() {
36
+ echo "case: non-numeric issue number fails fast"
37
+ local out exit_code
38
+ out=$("$HELPER" "abc" "last" "next" --dry-run 2>&1) && exit_code=0 || exit_code=$?
39
+ if [ "$exit_code" -ne 0 ]; then
40
+ ok "exit code non-zero"
41
+ else
42
+ no "expected failure on non-numeric issue number"
43
+ fi
44
+ if printf '%s\n' "$out" | grep -q "must be a positive integer"; then
45
+ ok "error message names the problem"
46
+ else
47
+ no "wrong error message:\n$out"
48
+ fi
49
+ }
50
+
51
+ case_dry_run_emits_body() {
52
+ echo "case: --dry-run prints the body without invoking gh"
53
+ local out exit_code
54
+ out=$("$HELPER" 42 "Phase 1 complete — plan written" "Phase 2 — implement" --dry-run 2>&1) && exit_code=0 || exit_code=$?
55
+ if [ "$exit_code" -eq 0 ]; then
56
+ ok "exit code 0"
57
+ else
58
+ no "expected exit 0, got $exit_code"
59
+ return
60
+ fi
61
+ if printf '%s\n' "$out" | grep -q '<!-- sdlc-implementer:status -->'; then
62
+ ok "body includes marker comment"
63
+ else
64
+ no "body missing marker; got:\n$out"
65
+ fi
66
+ if printf '%s\n' "$out" | grep -qE '\*\*🟢 LAST STEP\*\* — Phase 1 complete'; then
67
+ ok "body includes LAST STEP line"
68
+ else
69
+ no "LAST STEP line missing or wrong format; got:\n$out"
70
+ fi
71
+ if printf '%s\n' "$out" | grep -qE '\*\*🔵 NEXT STEP\*\* — Phase 2 — implement'; then
72
+ ok "body includes NEXT STEP line"
73
+ else
74
+ no "NEXT STEP line missing or wrong format; got:\n$out"
75
+ fi
76
+ if printf '%s\n' "$out" | grep -q 'would update sticky on issue #42'; then
77
+ ok "dry-run header names the issue"
78
+ else
79
+ no "dry-run header missing issue number; got:\n$out"
80
+ fi
81
+ }
82
+
83
+ case_dry_run_repo_flag() {
84
+ echo "case: --repo flag is reflected in the dry-run header"
85
+ local out
86
+ out=$("$HELPER" 5 "a" "b" --repo metasession-dev/example --dry-run 2>&1)
87
+ if printf '%s\n' "$out" | grep -q 'in metasession-dev/example'; then
88
+ ok "dry-run header includes repo"
89
+ else
90
+ no "dry-run header missing repo; got:\n$out"
91
+ fi
92
+ }
93
+
94
+ case_unknown_flag_rejected() {
95
+ echo "case: unknown flag rejected"
96
+ local out exit_code
97
+ out=$("$HELPER" 1 "a" "b" --bogus 2>&1) && exit_code=0 || exit_code=$?
98
+ if [ "$exit_code" -ne 0 ] && printf '%s\n' "$out" | grep -q 'Unknown flag'; then
99
+ ok "unknown flag rejected with message"
100
+ else
101
+ no "expected unknown-flag rejection; got exit $exit_code, output:\n$out"
102
+ fi
103
+ }
104
+
105
+ case_marker_is_first_line() {
106
+ echo "case: marker is the FIRST line of the body (find-existing relies on startswith)"
107
+ local out
108
+ out=$("$HELPER" 1 "a" "b" --dry-run 2>&1)
109
+ # Extract just the body between the markers we print
110
+ local body
111
+ body=$(printf '%s\n' "$out" | awk '/^----- body -----$/,/^----- end body -----$/')
112
+ local first
113
+ first=$(printf '%s\n' "$body" | sed -n '2p') # line 1 is the "----- body -----" header; line 2 is the body's first line
114
+ if printf '%s\n' "$first" | grep -q '<!-- sdlc-implementer:status -->'; then
115
+ ok "marker is the body's first line"
116
+ else
117
+ no "marker not on first line; first line was: '$first'"
118
+ fi
119
+ }
120
+
121
+ case_missing_args
122
+ case_non_numeric_issue
123
+ case_dry_run_emits_body
124
+ case_dry_run_repo_flag
125
+ case_unknown_flag_rejected
126
+ case_marker_is_first_line
127
+
128
+ echo ""
129
+ echo "=== update-sdlc-status.test.sh ==="
130
+ echo "PASS: $PASS FAIL: $FAIL"
131
+ [ "$FAIL" -eq 0 ]
@@ -19,6 +19,8 @@ Maintain or bootstrap a project's e2e and visual regression test suite. Given an
19
19
  - Unit, component, or API-only tests.
20
20
  - Performance, load, or accessibility audits, unless the project's e2e pack already includes them — in which case follow its lead.
21
21
 
22
+ **Transport-layer specs that live in `e2e/`** (Node `fetch` against webhooks, `MongoClient` queries, `socket.io-client` assertions) ARE in scope — they exercise the deployed system end-to-end, just at the transport boundary rather than the UI. Their evidence form is `test-execution-summary.md`, not `evidenceShot` (see *Specs with no page object* below). The "API-only tests" exclusion above means **unit-level** API contract tests that exercise a route handler in isolation, not transport-boundary integration tests against the running system.
23
+
22
24
  ## The workflow
23
25
 
24
26
  Six phases. Don't skip them and don't reorder — each one feeds the next. Communicate progress as you go; long silent phases feel like the skill has stalled.
@@ -303,6 +305,15 @@ Wrap up with a summary the user can drop into the PR or ticket:
303
305
  - Defects filed — count, with links.
304
306
  - Missed requirements — count, with links.
305
307
 
308
+ **Then feed the test-design record (devaudit#50).** The Stage 3 `test-execution-summary.md` (generated per `3-compile-evidence.md` Step 1a) carries a `## Test design` section at the top. Before Stage 3 finalises the file, populate that section with the design-time decisions this skill made, so the SDLC has a recorded trace that scope was *decided*, not implicit:
309
+
310
+ - **Layers planned** — which of `unit | integration | e2e | visual | manual` applied to this REQ
311
+ - **Layers covered** — same list with ✓ or `deferred`
312
+ - **Deferrals** — explicit one-line rationale per deferred layer (`e2e N/A — schema-only, no UI yet` rather than silent absence)
313
+ - **Skill invocation** — _"`e2e-test-engineer` invoked on turn N during Phase 2"_, with a turn pointer the reviewer can verify against the chat transcript
314
+
315
+ If you authored or modified `e2e/**/*.spec.ts` directly without invoking this skill, that's a delegation gap — the `sdlc-implementer` Phase 2 audit (devaudit#132) will catch it before Phase 3. The honest record is: the skill ran (or didn't), the layers were chosen for stated reasons, and the test-execution-summary attribution points back at the chat turn where the decision happened.
316
+
306
317
  ---
307
318
 
308
319
  ## Evidence vs failure forensics
@@ -373,6 +384,27 @@ When to deviate:
373
384
  - **Long flows** (>3 meaningful transitions) keep all stages tier `'feature'`. The post-merge regression run still has the canonical anchor to corroborate the AC; the dense journey is on the feature PR for reviewers and in the audit-pack download for that release forever.
374
385
  - **Reviewer pushback that evidence feels thin** (single-shot per AC across a HIGH-risk REQ) almost always means tier `'feature'` stages are missing — add them on the feature branch where they actually fire, not after.
375
386
 
387
+ ### Specs with no page object — transport-layer evidence (devaudit#127)
388
+
389
+ `evidenceShot` requires a Playwright `page` object. Specs that exercise behaviour at the transport layer — Node `fetch` against HTTP / webhook endpoints, `socket.io-client` connections, direct `MongoClient` queries, gRPC clients — have no `page` and **cannot call `evidenceShot`**. Examples from the wawagardenbar-app regression pack:
390
+
391
+ - `e2e/payments/webhook-signature-rejection.spec.ts` — HMAC-SHA512 verification via Node `fetch`
392
+ - `e2e/realtime/order-status-broadcast.spec.ts` — `socket.io-client` event assertion
393
+ - `e2e/admin/menu-item-delete.spec.ts` — direct `MongoClient` + service-layer call
394
+
395
+ These specs are still E2E (they exercise the deployed system end-to-end at the transport boundary), they belong in `e2e/`, and they run alongside UI specs. **Their evidence form is the per-spec entry in `test-execution-summary.md`** — the table of spec → pass/fail → asserted behaviour (signature rejected with HTTP 401, idempotent replay suppressed, broadcast received within Nms, soft-delete cascaded) is the load-bearing proof. The screenshot check is **N/A** for them; the release-completeness "behavioural proof" check is satisfied by the test-execution-summary upload alone.
396
+
397
+ Discipline for transport specs:
398
+
399
+ - Name the asserted behaviour in the test title using the same `[REQ-XXX][ACn]` bracket convention UI specs use. Reviewers grep on that.
400
+ - The `test-execution-summary.md` table row should describe what the spec verified in operator-facing terms ("signature mismatch returns 401; payment row unchanged"), not in TypeScript-spec terms ("`expect(response.status).toBe(401)`").
401
+ - If a transport spec *can* be paired with a thin UI shim that screenshots the user-visible outcome (e.g. an admin dashboard surface that shows the rejected payment as "Failed — signature mismatch"), pair them — that buys back the screenshot evidence at the surface level. Otherwise: transport spec stands alone.
402
+ - The portal's release-detail "screenshots" panel will show zero entries for purely-transport REQs; that's correct. Reviewers cross-reference `test-execution-summary.md` instead.
403
+
404
+ This is **observation**, not gate-relaxation — these specs satisfy the SDLC evidence requirement; the screenshot mechanism doesn't apply.
405
+
406
+ A `evidenceTrace(reqId, ac, slug, payload)` helper that writes a JSON sidecar (request/response/ledger shape) was considered as a Phase B; deferred until the portal grows a non-PNG evidence type. Today the test-execution-summary already carries the equivalent information at the table level.
407
+
376
408
  ---
377
409
 
378
410
  ## Principles