@windyroad/itil 0.30.0 → 0.30.1-preview.315

@@ -1,5 +1,5 @@
1
1
  {
2
2
  "name": "wr-itil",
3
- "version": "0.30.0",
3
+ "version": "0.30.1",
4
4
  "description": "ITIL-aligned IT service management for Claude Code"
5
5
  }
@@ -103,8 +103,12 @@ esac
103
103
  # Trap detected — emit deny with terse recovery.
104
104
  # Voice-tone budget per ADR-045 deny-band ≤300 bytes total. Names the
105
105
  # offending ticket ID, the literal recovery command, the BYPASS env
106
- # var escape, and the P165 cite.
107
- REASON="BLOCKED: P165. ${TICKET_ID} needs docs/problems/README.md refresh. Run: git add docs/problems/README.md. Bypass: BYPASS_README_REFRESH_GATE=1."
106
+ # var escape with correct propagation syntax (P231 / P173), and the
107
+ # P165 cite. Inline-prefix `VAR=1 git commit ...` does NOT propagate
108
+ # from a Bash subshell to PreToolUse hooks; the env field of
109
+ # `.claude/settings.json` (or shell `export` before `claude` launch)
110
+ # is the working path.
111
+ REASON="BLOCKED: P165. ${TICKET_ID} needs README refresh: git add docs/problems/README.md. Bypass: BYPASS_README_REFRESH_GATE=1 via .claude/settings.json env (P173)."
108
112
 
109
113
  cat <<EOF
110
114
  {
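The propagation point made in the comment above can be illustrated with a minimal sketch. It assumes the hook behaves like a separate process started by the host rather than by the command being run; `run_hook` is a hypothetical stand-in, not the real PreToolUse mechanism.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a hook: a separate process that inherits only
# the environment of the shell that starts it.
run_hook() {
  bash -c 'printf "%s\n" "${BYPASS_README_REFRESH_GATE:-unset}"'
}

unset BYPASS_README_REFRESH_GATE

# Inline prefix: the variable exists only in the environment of this one command.
BYPASS_README_REFRESH_GATE=1 bash -c 'true'
inline_result=$(run_hook)

# Exported: the variable is inherited by every process started afterwards.
export BYPASS_README_REFRESH_GATE=1
export_result=$(run_hook)

printf 'inline=%s export=%s\n' "$inline_result" "$export_result"
```

Under these assumptions the stand-in hook sees the variable only in the exported case, which is consistent with the deny message pointing at the `.claude/settings.json` env field or a shell `export` before launch.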
@@ -37,8 +37,25 @@
37
37
  #
38
38
  # Bypass:
39
39
  # - `BYPASS_README_REFRESH_GATE=1` env var → return 0 (allow). For
40
- # legitimate narrative-only ticket-body edits that don't change
41
- # ranking-bearing fields. Audit-traceable via shell history.
40
+ # legitimate one-off escape (e.g. force-amend after rebase rewrote
41
+ # history). Audit-traceable via shell history. Set in
42
+ # `.claude/settings.json` env field or shell `export` before
43
+ # launching `claude` — inline-prefix syntax (`VAR=1 git commit ...`)
44
+ # does NOT propagate from a Bash subshell to PreToolUse hooks (P173).
45
+ #
46
+ # Narrative-only short-circuit (P230):
47
+ # - When all staged ticket edits are purely narrative — no
48
+ # ranking-bearing field change (Priority / Effort / Status / WSJF /
49
+ # Type field-lines), no title change, no rename between state
50
+ # subdirs, no creation/deletion — AND
51
+ # `packages/itil/scripts/reconcile-readme.sh` reports exit=0 against
52
+ # the current README, return 0 (allow). Reconcile-readme is the
53
+ # authoritative drift oracle for narrative-only edits.
54
+ # - Ranking-bearing edits still fall through to existing detection
55
+ # regardless of reconcile state, preserving ADR-014 single-commit
56
+ # grain for the change-set surface (architect verdict: reconcile is
57
+ # a robustness layer on top of per-operation refresh, not a
58
+ # supersession of either).
42
59
  #
43
60
  # Fail-open contract:
44
61
  # - Outside a git working tree, or when `git diff` fails for any
@@ -111,6 +128,7 @@ detect_readme_refresh_required() {
111
128
  local has_readme=0
112
129
  local offending_ticket=""
113
130
  local path basename
131
+ local staged_tickets=()
114
132
 
115
133
  while IFS= read -r path; do
116
134
  [ -n "$path" ] || continue
@@ -135,6 +153,7 @@ detect_readme_refresh_required() {
135
153
  case "$basename" in
136
154
  [0-9]*.md)
137
155
  [ -z "$offending_ticket" ] && offending_ticket="$path"
156
+ staged_tickets+=("$path")
138
157
  ;;
139
158
  esac
140
159
  ;;
@@ -143,6 +162,7 @@ detect_readme_refresh_required() {
143
162
  # Excludes README.md and README-history.md (already cased
144
163
  # above; both start with `R`, not a digit).
145
164
  [ -z "$offending_ticket" ] && offending_ticket="$path"
165
+ staged_tickets+=("$path")
146
166
  ;;
147
167
  *)
148
168
  # Non-ticket surface: ignored.
@@ -152,10 +172,84 @@ detect_readme_refresh_required() {
152
172
  $staged
153
173
  EOF
154
174
 
155
- if [ -n "$offending_ticket" ] && [ "$has_readme" -eq 0 ]; then
156
- printf '%s\n' "$offending_ticket"
157
- return 1
175
+ # No staged ticket: nothing to gate.
176
+ [ -n "$offending_ticket" ] || return 0
177
+
178
+ # README staged alongside — clean.
179
+ [ "$has_readme" -eq 1 ] && return 0
180
+
181
+ # P230 narrative-only short-circuit. Detect whether the staged ticket
182
+ # set is purely narrative (no ranking-bearing field change, no rename
183
+ # between state subdirs, no creation/deletion). If so, consult
184
+ # reconcile-readme.sh as the authoritative drift oracle; exit=0 means
185
+ # the README is in sync with filesystem truth and narrative-only
186
+ # ticket edits are safe to allow silently.
187
+ if ! _readme_refresh_staged_is_ranking_bearing "${staged_tickets[@]}"; then
188
+ if _readme_refresh_reconcile_clean; then
189
+ return 0
190
+ fi
191
+ fi
192
+
193
+ # Either ranking-bearing, or narrative-only with reconcile drift —
194
+ # fall through to deny.
195
+ printf '%s\n' "$offending_ticket"
196
+ return 1
197
+ }
198
+
199
+ # Returns 0 if any staged ticket exhibits a ranking-bearing change:
200
+ # - field-line diff matching ^[+-]**(Priority|Effort|Status|WSJF|Type)**:
201
+ # - title-line diff matching ^[+-]# Problem
202
+ # - new ticket file added (A entry on a ticket path)
203
+ # - ticket file deleted (D entry on a ticket path)
204
+ # - rename between state subdirs (R<NN> entry where either path is a
205
+ # ticket path)
206
+ # Returns 1 if narrative-only.
207
+ _readme_refresh_staged_is_ranking_bearing() {
208
+ local tickets=("$@")
209
+ [ "${#tickets[@]}" -gt 0 ] || return 1
210
+
211
+ # (i) Field-line / title-line diff
212
+ if git diff --staged -- "${tickets[@]}" 2>/dev/null \
213
+ | grep -qE '^[+-](\*\*(Priority|Effort|Status|WSJF|Type)\*\*:|# Problem )'; then
214
+ return 0
158
215
  fi
159
216
 
160
- return 0
217
+ # (ii) Creation / deletion / rename via --name-status -M
218
+ local namestatus
219
+ namestatus=$(git diff --staged --name-status -M 2>/dev/null) || return 1
220
+
221
+ local ticket_re='^docs/problems/(open|verifying|closed|known-error|parked)/[0-9].*\.md$'
222
+ local legacy_re='^docs/problems/[0-9].*\.md$'
223
+
224
+ while IFS=$'\t' read -r status p1 p2; do
225
+ [ -n "$status" ] || continue
226
+ case "$status" in
227
+ A|D)
228
+ if [[ "$p1" =~ $ticket_re ]] || [[ "$p1" =~ $legacy_re ]]; then
229
+ return 0
230
+ fi
231
+ ;;
232
+ R*)
233
+ if [[ "$p1" =~ $ticket_re ]] || [[ "$p1" =~ $legacy_re ]] \
234
+ || [[ "$p2" =~ $ticket_re ]] || [[ "$p2" =~ $legacy_re ]]; then
235
+ return 0
236
+ fi
237
+ ;;
238
+ esac
239
+ done <<EOF
240
+ $namestatus
241
+ EOF
242
+
243
+ return 1
244
+ }
245
+
246
+ # Returns 0 if reconcile-readme.sh reports the README is in sync with
247
+ # filesystem truth (exit=0), 1 otherwise (drift, parse error, or script
248
+ # not located).
249
+ _readme_refresh_reconcile_clean() {
250
+ local lib_dir
251
+ lib_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" || return 1
252
+ local reconcile="$lib_dir/../../scripts/reconcile-readme.sh"
253
+ [ -f "$reconcile" ] || return 1
254
+ bash "$reconcile" "docs/problems" >/dev/null 2>&1
161
255
  }
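The two detection passes in the hunk above can be tried in isolation. The sketch below copies the field-line regex and the state-subdir path regex from the diff, but the helper names `classify_diff_line` and `classify_name_status` are hypothetical and the legacy path regex is left out for brevity.

```shell
#!/usr/bin/env bash
# Regexes copied from the hunk above; helper names are illustrative only.
field_re='^[+-](\*\*(Priority|Effort|Status|WSJF|Type)\*\*:|# Problem )'
ticket_re='^docs/problems/(open|verifying|closed|known-error|parked)/[0-9].*\.md$'

# Classify a single line of `git diff --staged` output.
classify_diff_line() {
  if printf '%s\n' "$1" | grep -qE "$field_re"; then
    echo ranking-bearing
  else
    echo narrative
  fi
}

# Classify one tab-separated record of `git diff --staged --name-status -M`.
classify_name_status() {
  local status p1 p2
  IFS=$'\t' read -r status p1 p2 <<<"$1"
  case "$status" in
    A|D)
      [[ "$p1" =~ $ticket_re ]] && { echo ranking-bearing; return; }
      ;;
    R*)
      { [[ "$p1" =~ $ticket_re ]] || [[ "$p2" =~ $ticket_re ]]; } \
        && { echo ranking-bearing; return; }
      ;;
  esac
  echo narrative
}

classify_diff_line '+**Status**: Verifying'
classify_diff_line '+- [x] First task'
classify_name_status $'R100\tdocs/problems/open/999-rename.md\tdocs/problems/verifying/999-rename.md'
```

A modified (`M`) record falls through to `narrative` here because content changes are judged by the field-line pass, not by the name-status pass.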
@@ -259,3 +259,163 @@ run_bash_hook() {
259
259
  [ "$status" -eq 0 ]
260
260
  [[ "$output" != *"\"permissionDecision\": \"deny\""* ]]
261
261
  }
262
+
263
+ # --- P230: narrative-only short-circuit (reconcile-readme is authority) ---
264
+ #
265
+ # When all staged ticket edits are purely narrative (Change Log entries,
266
+ # Investigation Task checkbox ticks, prose edits — no ranking-bearing
267
+ # field change, no rename between state subdirs, no creation/deletion)
268
+ # AND `packages/itil/scripts/reconcile-readme.sh` reports exit=0 against
269
+ # the current README, the hook silently passes. Ranking-bearing edits
270
+ # still fall through to existing detection per ADR-014 single-commit
271
+ # grain (architect verdict: reconcile is robustness layer, not
272
+ # supersession of per-operation refresh).
273
+
274
+ seed_valid_readme_p999_open() {
275
+ cat > docs/problems/README.md <<EOF
276
+ # Problem Backlog
277
+
278
+ ## WSJF Rankings
279
+
280
+ | ID | Title | WSJF |
281
+ |---|---|---|
282
+ | P999 | Test ticket | 1.0 |
283
+
284
+ ## Verification Queue
285
+
286
+ (none)
287
+
288
+ ## Closed
289
+
290
+ (none)
291
+ EOF
292
+ }
293
+
294
+ @test "P230 allow: narrative-only edit + reconcile-readme exit=0 → allow silently" {
295
+ cat > docs/problems/open/999-narrative.md <<EOF
296
+ # Problem 999: Test ticket
297
+ **Status**: Open
298
+ **Priority**: 1
299
+ EOF
300
+ seed_valid_readme_p999_open
301
+ git add docs/problems/open/999-narrative.md docs/problems/README.md
302
+ git -c commit.gpgsign=false commit --quiet -m "seed p999"
303
+ # Narrative-only edit: append a Change Log line
304
+ echo "- 2026-05-16 — narrative tweak" >> docs/problems/open/999-narrative.md
305
+ git add docs/problems/open/999-narrative.md
306
+ run run_bash_hook "git commit -m 'docs(problems): narrative tweak'"
307
+ [ "$status" -eq 0 ]
308
+ [[ "$output" != *"\"permissionDecision\": \"deny\""* ]]
309
+ }
310
+
311
+ @test "P230 allow: Investigation Task checkbox tick (narrative-only) + reconcile=0 → allow silently" {
312
+ cat > docs/problems/open/999-checkbox.md <<EOF
313
+ # Problem 999: Test ticket
314
+ **Status**: Open
315
+ **Priority**: 1
316
+
317
+ ## Investigation Tasks
318
+
319
+ - [ ] First task
320
+ EOF
321
+ seed_valid_readme_p999_open
322
+ git add docs/problems/open/999-checkbox.md docs/problems/README.md
323
+ git -c commit.gpgsign=false commit --quiet -m "seed p999"
324
+ # Narrative-only edit: tick a checkbox
325
+ sed -i.bak 's/- \[ \] First task/- [x] First task/' docs/problems/open/999-checkbox.md
326
+ rm docs/problems/open/999-checkbox.md.bak
327
+ git add docs/problems/open/999-checkbox.md
328
+ run run_bash_hook "git commit -m 'docs(problems): tick task'"
329
+ [ "$status" -eq 0 ]
330
+ [[ "$output" != *"\"permissionDecision\": \"deny\""* ]]
331
+ }
332
+
333
+ @test "P230 deny: ranking-bearing Status field change + reconcile=0 → still deny per ADR-014 single-commit grain" {
334
+ cat > docs/problems/open/999-ranking.md <<EOF
335
+ # Problem 999: Test ticket
336
+ **Status**: Open
337
+ **Priority**: 1
338
+ EOF
339
+ seed_valid_readme_p999_open
340
+ git add docs/problems/open/999-ranking.md docs/problems/README.md
341
+ git -c commit.gpgsign=false commit --quiet -m "seed p999"
342
+ # Ranking-bearing edit: change Status
343
+ sed -i.bak 's/\*\*Status\*\*: Open/\*\*Status\*\*: Verifying/' docs/problems/open/999-ranking.md
344
+ rm docs/problems/open/999-ranking.md.bak
345
+ git add docs/problems/open/999-ranking.md
346
+ run run_bash_hook "git commit -m 'transition'"
347
+ [ "$status" -eq 0 ]
348
+ [[ "$output" == *"\"permissionDecision\": \"deny\""* ]]
349
+ }
350
+
351
+ @test "P230 deny: ranking-bearing Priority field change + reconcile=0 → still deny" {
352
+ cat > docs/problems/open/999-priority.md <<EOF
353
+ # Problem 999: Test ticket
354
+ **Status**: Open
355
+ **Priority**: 1
356
+ EOF
357
+ seed_valid_readme_p999_open
358
+ git add docs/problems/open/999-priority.md docs/problems/README.md
359
+ git -c commit.gpgsign=false commit --quiet -m "seed p999"
360
+ # Ranking-bearing edit: change Priority
361
+ sed -i.bak 's/\*\*Priority\*\*: 1/\*\*Priority\*\*: 5/' docs/problems/open/999-priority.md
362
+ rm docs/problems/open/999-priority.md.bak
363
+ git add docs/problems/open/999-priority.md
364
+ run run_bash_hook "git commit -m 're-rate'"
365
+ [ "$status" -eq 0 ]
366
+ [[ "$output" == *"\"permissionDecision\": \"deny\""* ]]
367
+ }
368
+
369
+ @test "P230 deny: git mv between state subdirs (open→verifying) + no README refresh → deny (canonical iter-subprocess case)" {
370
+ cat > docs/problems/open/999-rename.md <<EOF
371
+ # Problem 999: Test ticket
372
+ **Status**: Open
373
+ EOF
374
+ seed_valid_readme_p999_open
375
+ git add docs/problems/open/999-rename.md docs/problems/README.md
376
+ git -c commit.gpgsign=false commit --quiet -m "seed p999"
377
+ # Rename to verifying state subdir
378
+ git mv docs/problems/open/999-rename.md docs/problems/verifying/999-rename.md
379
+ run run_bash_hook "git commit -m 'transition'"
380
+ [ "$status" -eq 0 ]
381
+ [[ "$output" == *"\"permissionDecision\": \"deny\""* ]]
382
+ }
383
+
384
+ @test "P230 deny: narrative-only edit + reconcile-readme drift (README missing ticket) → still deny per existing logic" {
385
+ cat > docs/problems/open/999-narrative-drift.md <<EOF
386
+ # Problem 999: Test ticket
387
+ **Status**: Open
388
+ EOF
389
+ # README does NOT list P999 → reconcile detects MISSING drift → exit=1
390
+ cat > docs/problems/README.md <<EOF
391
+ # Problem Backlog
392
+
393
+ ## WSJF Rankings
394
+
395
+ | ID | Title | WSJF |
396
+ |---|---|---|
397
+ EOF
398
+ git add docs/problems/open/999-narrative-drift.md docs/problems/README.md
399
+ git -c commit.gpgsign=false commit --quiet -m "seed p999"
400
+ # Narrative-only edit
401
+ echo "- 2026-05-16 — narrative line" >> docs/problems/open/999-narrative-drift.md
402
+ git add docs/problems/open/999-narrative-drift.md
403
+ run run_bash_hook "git commit -m 'docs(problems): narrative'"
404
+ [ "$status" -eq 0 ]
405
+ [[ "$output" == *"\"permissionDecision\": \"deny\""* ]]
406
+ }
407
+
408
+ # --- P231: deny message advertises correct bypass syntax (Option A) ---
409
+ #
410
+ # Deny message advertises `.claude/settings.json env` rather than inline
411
+ # prefix (which doesn't propagate to PreToolUse hooks per P173).
412
+
413
+ @test "P231 deny message advertises .claude/settings.json bypass path + P173 reference" {
414
+ echo "# Problem 999" > docs/problems/open/999-bypass-msg.md
415
+ git add docs/problems/open/999-bypass-msg.md
416
+ run run_bash_hook "git commit -m 'feat'"
417
+ [ "$status" -eq 0 ]
418
+ [[ "$output" == *"\"permissionDecision\": \"deny\""* ]]
419
+ [[ "$output" == *".claude/settings.json"* ]]
420
+ [[ "$output" == *"P173"* ]]
421
+ }
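The allow/deny cases exercised by the P230 tests above can be summarised as a small decision function. This is a sketch of the decision order only, assuming the four inputs have already been computed as 0/1 flags; `gate_decision` is a hypothetical name and does not appear in the package.

```shell
#!/usr/bin/env bash
# gate_decision HAS_TICKET HAS_README RANKING_BEARING RECONCILE_CLEAN
# Each argument is 0 or 1. Prints "allow" or "deny".
gate_decision() {
  local has_ticket="$1" has_readme="$2" ranking_bearing="$3" reconcile_clean="$4"

  # No staged ticket: nothing to gate.
  [ "$has_ticket" -eq 1 ] || { echo allow; return; }

  # README staged alongside the ticket: clean.
  [ "$has_readme" -eq 1 ] && { echo allow; return; }

  # Narrative-only edit with a README that reconciles cleanly: allow.
  if [ "$ranking_bearing" -eq 0 ] && [ "$reconcile_clean" -eq 1 ]; then
    echo allow
    return
  fi

  # Ranking-bearing edit, or narrative-only edit with README drift.
  echo deny
}

gate_decision 1 0 0 1   # narrative-only, reconcile clean
gate_decision 1 0 1 1   # Status or Priority change, reconcile clean
gate_decision 1 0 0 0   # narrative-only, README drift
```

The second call shows the point the tests stress: a clean reconcile does not rescue a ranking-bearing edit.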
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@windyroad/itil",
3
- "version": "0.30.0",
3
+ "version": "0.30.1-preview.315",
4
4
  "description": "ITIL-aligned IT service management for Claude Code (problem, and future incident/change skills)",
5
5
  "bin": {
6
6
  "windyroad-itil": "./bin/install.mjs"
@@ -56,7 +56,7 @@ ls docs/problems/*.parked.md docs/problems/parked/*.md 2>/dev/null # for
56
56
 
57
57
  For each `.open.md` and `.known-error.md` file, read the `**Status**`, `**Priority**`, `**Effort**`, and `**WSJF**` lines from the frontmatter section. Compute WSJF if missing: `WSJF = (Severity × StatusMultiplier) / EffortDivisor` per `/wr-itil:manage-problem` WSJF Prioritisation. Default to M (divisor 2) when Effort is absent; flag missing scores so the user knows a review is overdue.
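As a worked illustration of the formula above, the following sketch computes WSJF from values supplied by the caller. Only the default divisor of 2 for Effort M comes from the text; the `wsjf` helper name and the sample severity, multiplier, and divisor values are assumptions made for the example.

```shell
#!/usr/bin/env bash
# wsjf SEVERITY STATUS_MULTIPLIER [EFFORT_DIVISOR]
# The divisor defaults to 2 (Effort M) when Effort is absent.
wsjf() {
  local severity="$1" multiplier="$2" divisor="${3:-2}"
  LC_ALL=C awk -v s="$severity" -v m="$multiplier" -v d="$divisor" \
    'BEGIN { printf "%.1f\n", (s * m) / d }'
}

wsjf 12 1     # Effort absent, so the divisor falls back to 2
wsjf 12 1 4   # illustrative larger divisor
```

With these sample inputs the first call yields 6.0 and the second 3.0; the real multiplier and divisor tables live in `/wr-itil:manage-problem` and are not reproduced here.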
58
58
 
59
- For each `.verifying.md` file, read the `## Fix Released` marker and extract the release age for the `Likely verified?` column per P048 Candidate 4 (within-skill default: 14 days = `yes`).
59
+ For each `.verifying.md` file, read the `## Fix Released` marker. The `Likely verified?` column carries an **evidence-first** cell per P186 (supersedes the original P048 Candidate 4 14-day heuristic). <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> When this skill runs against a stale cache, the live-scan path reads the cell value from the `.verifying.md` ticket's `## Fix Released` section (or carries forward the prior cell value from the cached README when present); it does NOT compute the cell from age — that responsibility moved to `/wr-itil:review-problems` Step 4 (user confirmation populates `yes — observed: …`) and `run-retro` Step 4a close-on-evidence citations.
60
60
 
61
61
  ### 3. Display
62
62
 
@@ -70,12 +70,12 @@ Render three sections matching the README.md format so cached and live output lo
70
70
  | <score> | P<NNN> | <title> | <severity> | <status> | <effort> |
71
71
  ```
72
72
 
73
- **Verification Queue** — `.verifying.md` tickets, sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Drift here re-opens P150.
73
+ **Verification Queue** — `.verifying.md` tickets, sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Drift here re-opens P150. The `Likely verified?` column carries an **evidence-first** cell per P186 — three canonical values: `yes — observed: <evidence>`, `no — not observed` (default for newly-released tickets), `no — observed regression`. <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> Drift on the cell shape re-opens P186.
74
74
 
75
75
  ```
76
76
  | ID | Title | Released | Likely verified? |
77
77
  |----|-------|----------|------------------|
78
- | P<NNN> | <title> | <release marker> | yes (N days) / no (N days) |
78
+ | P<NNN> | <title> | <release marker> | <yes — observed: / no — not observed / no — observed regression> |
79
79
  ```
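The Verification Queue ordering described above (release date ascending, ties broken by ID ascending) can be sketched with `sort`. The tab-separated `ID<TAB>date` input shape and the `sort_verification_queue` name are assumptions for the example; the skill itself does not prescribe this format.

```shell
#!/usr/bin/env bash
# Sort "ID<TAB>release-date" records oldest-first, then by ID.
# ISO 8601 dates and zero-padded IDs both sort correctly as plain text.
sort_verification_queue() {
  LC_ALL=C sort -t "$(printf '\t')" -k2,2 -k1,1
}

queue=$(printf 'P150\t2026-05-10\nP048\t2026-05-10\nP186\t2026-04-30\n' \
  | sort_verification_queue)
printf '%s\n' "$queue"
```

Here P186 lands at row 1 as the oldest release, and the two same-day releases follow in ID order (P048, then P150).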
80
80
 
81
81
  **Parked** — `.parked.md` tickets:
@@ -106,7 +106,8 @@ After the tables, print one of two short pointers depending on what the output s
106
106
  - **ADR-022** (`docs/decisions/022-verification-pending-status.proposed.md`) — Verification Pending status conventions; `.verifying.md` exclusion from WSJF ranking.
107
107
  - **ADR-037** (`docs/decisions/037-skill-testing-strategy.proposed.md`) — contract-assertion bats pattern applied to this skill.
108
108
  - **P031** — git-history freshness check rationale (mtime unreliable in worktrees).
109
- - **P048** Candidate 4 — the 14-day `Likely verified?` heuristic.
109
+ - **P048** Candidate 4 — original `Likely verified?` column (14-day age-heuristic). Superseded by P186.
110
+ - **P186** — evidence-first cell shape (`yes — observed: <evidence>` / `no — not observed` / `no — observed regression`) replaces the age-based heuristic; `<!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 -->` drives cross-skill drift detection.
110
111
  - **JTBD-001** (`docs/jtbd/solo-developer/JTBD-001-enforce-governance.proposed.md`) — discoverable surface via `/wr-itil:` autocomplete.
111
112
  - **JTBD-101** (`docs/jtbd/plugin-developer/JTBD-101-extend-suite.proposed.md`) — one skill per distinct user intent.
112
113
  - `packages/itil/skills/manage-problem/SKILL.md` — hosts the thin-router forwarder for the deprecated `manage-problem list` form.
@@ -152,21 +152,33 @@ next=$(printf 'I%03d' $((10#${last:-0} + 1)))
152
152
  echo "$next"
153
153
  ```
154
154
 
155
- ### 4. For new incidents: Gather information (ADR-044 category-1 direction-setting)
155
+ ### 4. For new incidents: Gather information (P132 derive-first; ADR-044 category-4 silent-framework on derivable fields; category-1 direction-setting fallback only on Scope)
156
156
 
157
- Use `AskUserQuestion` for anything not in `$ARGUMENTS`. Incident-declaration inputs are user-knowledge that the framework cannot infer (only the user observed the symptoms / knows the scope / can rate live business impact); this is the canonical ADR-044 **category-1 (direction-setting)** surface — *"only the user knows the goals that haven't been written down yet."*
157
+ **Derive-first dispatch.** Incident declarations carry observable evidence in the user's prose, the working tree, `RISK-POLICY.md`, and the wall-clock, so the framework can resolve most fields without firing `AskUserQuestion`. Only **Scope** is genuinely user-judgment (semantic blast-radius the framework cannot infer), so only **Scope** retains the `AskUserQuestion` gate.
158
158
 
159
- - **Title**: short kebab-case-friendly description
160
- - **Symptoms**: what is observable (errors, latency, missing data)?
161
- - **Scope**: who/what is affected (users, endpoints, regions)?
162
- - **Start time**: when did symptoms begin? (UTC, as precise as known)
163
- - **Severity**: Impact (1-5) × Likelihood (1-5) per `RISK-POLICY.md`, interpreted as live impact
159
+ The P132 inverse-P078 trap (`docs/problems/known-error/132-...md`) is the load-bearing motivation: the I001 declaration regression fired a 4-question AskUserQuestion with 3 of 4 sub-questions being lazy classifications (Title kebab-derivable, Severity matrix-derivable, Start time git-log-derivable). This dispatch closes that regression on the manage-incident surface and mirrors `/wr-itil:capture-problem` Step 1.5's worked-example pattern (P185 derive-first refactor).
164
160
 
165
- Do not ask for fields that can be inferred:
161
+ Resolve each field via the following dispatch. **The order is load-bearing** — every field except Scope resolves silently with a stderr advisory citing the source; Scope alone fires `AskUserQuestion` as the genuine category-1 surface.
162
+
163
+ | Field | Dispatch | ADR-044 category |
164
+ |-------|----------|------------------|
165
+ | **Title** | Derive silently. Kebab-case the first 8-10 non-stopword tokens of the user's prose description (same slug derivation as `/wr-itil:capture-problem` Step 1.4 and `/wr-itil:manage-problem` Step 4). Emit stderr advisory: `manage-incident: derived title='<slug>' from description; re-invoke or rename the file if the slug is wrong`. Do NOT fire AskUserQuestion. | category-4 silent-framework |
166
+ | **Symptoms** | Pull from user prose verbatim — the description text IS the symptoms surface for declaration. Place into the `## Observations` section template at Step 5. Do NOT fire AskUserQuestion. | category-4 silent-framework |
167
+ | **Start time** | Derive silently, three sources in priority order: (a) explicit timestamp in description (regex `\b\d{4}-\d{2}-\d{2}([ T]\d{2}:\d{2})?\b`, or relative form `"<N> (minutes|hours|days) ago"` resolved against current wall-clock); (b) if the description cites a specific file/dir/changeset-holding-area, run `git log --diff-filter=A --follow -- <path> \| tail -1` for first-touch evidence (the I001 regression's "first hold at 2026-04-24" was this exact shape — `git log --diff-filter=A --follow -- docs/changesets-holding/`); (c) otherwise default to current wall-clock UTC. Emit stderr advisory: `manage-incident: start-time derived as <ts> from <source>; cite an additional evidence anchor in the Timeline section if symptoms began earlier`. Do NOT fire AskUserQuestion. | category-4 silent-framework |
168
+ | **Severity** | Derive silently when evidence maps to a clear `RISK-POLICY.md` Impact × Likelihood cell. Cross-reference description signals against the matrix: (a) impact signals (service disruption keywords like `down` / `degraded` / `unavailable` → high; latency / throughput keywords → moderate; cosmetic / typo keywords → low); (b) likelihood signals (`reproducible` / `every request` → high; `intermittent` / `flaky` → medium; `one-off` / `single user` → low); (c) named anchors (held-cluster age cited → use that age to map cell; scorer state cited → use the cited band). When the cross-reference produces a single clear cell, set it silently and emit stderr advisory: `manage-incident: severity derived as <score> (<label>) from RISK-POLICY matrix + evidence: <evidence list>; re-invoke or update if mis-rated`. **Ambiguous-evidence fallback** (no mappable signal in description, or signals point to conflicting cells): fire AskUserQuestion with the Impact (1-5) × Likelihood (1-5) options as the genuine ADR-044 **category-5 (taste)** fallback surface. The fallback is genuine ambiguity, NOT defaults. | category-4 silent-framework (derivable); category-5 fallback (ambiguous) |
169
+ | **Scope** | Retain AskUserQuestion. Scope is the user-judgment surface — only the user knows whether downstream-adopter-risk is in scope, whether mobile is affected, whether the blast radius extends past the cited symptoms. The framework cannot resolve semantic scope deterministically (same reasoning as Step 2 duplicate-check). Construct the call with `header: "Incident scope"`, `multiSelect: false` if a closed enum applies or free-text capture otherwise. This is the canonical ADR-044 **category-1 (direction-setting)** surface — *"only the user knows the goals that haven't been written down yet."* | category-1 direction-setting |
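The Title row of the dispatch table can be sketched as a small shell function. This is one possible reading, not the skill's implementation: the `derive_title_slug` name, the stopword list, and the default cap of 8 tokens are all assumptions chosen for illustration. The advisory goes to stderr and the slug to stdout, matching the advisory text quoted in the table.

```shell
#!/usr/bin/env bash
# derive_title_slug DESCRIPTION [MAX_TOKENS]
# Hypothetical helper: the stopword list below is illustrative only.
derive_title_slug() {
  local description="$1" max_tokens="${2:-8}"
  local stopwords=" a an and the of to in on for is are was with "
  local slug="" count=0 token

  # Lowercase, turn every non-alphanumeric run into a space, then split.
  for token in $(printf '%s' "$description" \
      | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' ' '); do
    case "$stopwords" in *" $token "*) continue ;; esac
    slug="${slug:+$slug-}$token"
    count=$((count + 1))
    [ "$count" -ge "$max_tokens" ] && break
  done

  # Advisory on stderr only; stdout carries just the slug.
  printf "manage-incident: derived title='%s' from description; re-invoke or rename the file if the slug is wrong\n" "$slug" >&2
  printf '%s\n' "$slug"
}

derive_title_slug 'The checkout API is returning 500 errors for all users'
```

Keeping the advisory off stdout means a caller can capture the slug with command substitution without having to filter the message out.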
170
+
171
+ **Inferred fields (no ask, no advisory needed)**:
166
172
 
167
173
  - **Reported**: today's date (UTC)
168
174
  - **Status**: always "Investigating" for new incidents
169
175
 
176
+ **Stderr advisory contract**: each derived field emits a SINGLE line to stderr (NOT stdout, NOT in the ticket body) per the capture-problem Step 1.5 pattern. The advisory text shape is I2-isomorphic — identical sentence structure across fields beyond substituted values + source names. Embedding the advisory in stdout would risk machine-readers parsing it as a ticket-body line; embedding it in the ticket body would violate ADR-011's required-section schema. Stderr is the correct channel — visible to interactive maintainers in the terminal; invisible to ticket consumers; loggable by orchestrators that capture subprocess stderr.
177
+
178
+ **ADR-026 cost-source grounding**: each derived field cites its source in the advisory (description token sequence for Title; explicit-regex / `git log` / wall-clock for Start time; RISK-POLICY matrix cell + named evidence for Severity). The `re-invoke or update if mis-rated` clause carries the reversibility marker ADR-026 mandates for ungrounded outputs.
179
+
180
+ **AFK fail-safe (ADR-013 Rule 6)**: under AFK orchestration, all derivable fields resolve without interactive input; only Scope's AskUserQuestion can block. The orchestrator should halt-with-stderr citing which field needed input rather than guess (Scope is genuinely user-judgment per JTBD-006's "Problems requiring my judgment ... are queued for my return, not guessed at"). manage-incident is rarely AFK-invoked because incidents are interactive by design (JTBD-201), so the halt-on-Scope path is the expected behaviour, not a regression.
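The three-source priority order for Start time in the dispatch table can be sketched as follows. Several details are assumptions: the `derive_start_time` name and its arguments are hypothetical, the timestamp regex is rewritten in POSIX ERE because `grep -E` does not support `\d`, the `--format=%aI` flag is added so that `tail` returns a date rather than a commit-message line, and the relative form (`<N> hours ago`) is omitted.

```shell
#!/usr/bin/env bash
# derive_start_time DESCRIPTION [PATH]
# Prints the derived timestamp on stdout and a one-line advisory on stderr.
derive_start_time() {
  local description="$1" path="${2:-}" ts="" source=""

  # (a) Explicit timestamp in the description.
  ts=$(printf '%s\n' "$description" \
    | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}([ T][0-9]{2}:[0-9]{2})?' \
    | head -n 1)
  [ -n "$ts" ] && source="description"

  # (b) First-touch evidence from git history, when a path is cited.
  if [ -z "$ts" ] && [ -n "$path" ]; then
    ts=$(git log --diff-filter=A --follow --format=%aI -- "$path" 2>/dev/null \
      | tail -n 1)
    [ -n "$ts" ] && source="git log"
  fi

  # (c) Fall back to the current wall-clock time in UTC.
  if [ -z "$ts" ]; then
    ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    source="wall-clock"
  fi

  printf 'manage-incident: start-time derived as %s from %s; cite an additional evidence anchor in the Timeline section if symptoms began earlier\n' \
    "$ts" "$source" >&2
  printf '%s\n' "$ts"
}

derive_start_time 'Checkout degraded since 2026-04-24 09:15 UTC'
```

Each source is tried only when the previous one produced nothing, so an explicit timestamp in the prose always wins over git history and the wall-clock.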
181
+
170
182
  ### 5. For new incidents: Write the incident file
171
183
 
172
184
  **File path**: `docs/incidents/<I###>-<kebab-case-title>.investigating.md`
@@ -331,7 +343,9 @@ Otherwise, after the commit in step 14 lands, drain the release queue so the fix
331
343
  ## Related
332
344
 
333
345
  - **P136** (`docs/problems/136-adr-044-alignment-audit-master.open.md`) — ADR-044 alignment audit master. This skill is the third high-ask SKILL audited under Phase 2 (after work-problem singular and mitigate-incident).
334
- - **ADR-044** (`docs/decisions/044-decision-delegation-contract.proposed.md`) — Decision-Delegation Contract. All four AskUserQuestion surfaces in this skill align with the 6-class authority taxonomy: Step 2 duplicate-check is **category-1 (direction-setting)**; Step 4 gather-info is **category-1 (direction-setting)**; Step 6 evidence-gate is **category-2 (deviation-approval)**; Step 14 risk-above-appetite is **category-3 (one-time-override)**.
346
+ - **ADR-044** (`docs/decisions/044-decision-delegation-contract.proposed.md`) — Decision-Delegation Contract. The skill's AskUserQuestion surfaces align with the 6-class authority taxonomy: Step 2 duplicate-check is **category-1 (direction-setting)**; Step 4 is **category-4 (silent-framework)** on Title / Symptoms / Start time / Severity-when-evidence-present + **category-1 (direction-setting)** on Scope + **category-5 (taste)** fallback on Severity-on-ambiguity (P132 derive-first refactor 2026-05-15 re-classified Step 4 from "single cat-1 declaration" to "derive-first dispatch with cat-1 / cat-5 fallback only"); Step 6 evidence-gate is **category-2 (deviation-approval)**; Step 14 risk-above-appetite is **category-3 (one-time-override)**.
347
+ - **P132** (`docs/problems/known-error/132-agents-over-ask-in-interactive-sessions-conflating-mechanical-stages-with-user-interactive-stages.md`) — Agents over-ask in interactive sessions (inverse-P078). Step 4 derive-first refactor closes the 2026-05-06 I001 declaration regression where 3 of 4 sub-questions were lazy classifications. Composes with P185 (capture-problem Step 1.5 derive-first refactor — the in-tree worked-example precedent).
348
+ - **P185** (`docs/problems/...`) — capture-problem Step 1.5 derive-first refactor. Step 4 mirrors the same dispatch shape (silent classifier + stderr advisory + AskUserQuestion only on ambiguity).
335
349
  - **ADR-013 amended Rule 1** (`docs/decisions/013-structured-user-interaction-for-governance-decisions.proposed.md`) — structured user interaction; narrowed in P135 to defer to ADR-044 for framework-resolution boundary. All four surfaces retain `AskUserQuestion` as genuine user-authority surfaces under categories enumerated in ADR-044.
336
350
  - **ADR-013 Confirmation criterion #1** — `grep -inE "Options:.*\(a\)\|Your call:\|which would you like\|which way?"` returns zero matches. Step 2's prior prompt body violated this with `Would you like to (a) update...` phrasing; the P136 Phase 2 refactor (2026-04-28) closed the regression by lifting options into the `AskUserQuestion` `options[]` mechanism.
337
351
  - **ADR-011** (`docs/decisions/011-manage-incident-skill.proposed.md`) — incident lifecycle; evidence-first workflow; reversible-mitigation preference; Sev 4-5 lightweight path. Step 6's evidence-gate refactor (2026-04-28) extends ADR-011's evidence-first rule with the documented `Record anyway` audit-trail bypass that mitigate-incident already used (cool-headed-commitment consistency across the two incident skills).
@@ -67,7 +67,8 @@ setup() {
67
67
  }
68
68
 
69
69
  # ----------------------------------------------------------------------
70
- # Surface 2 — Step 4 gather info (cat-1 cosmetic cross-ref)
70
+ # Surface 2 — Step 4 gather info (P132 derive-first refactor — cat-4 silent-framework
71
+ # on derivable fields; cat-1 direction-setting fallback only on Scope)
71
72
  # ----------------------------------------------------------------------
72
73
 
73
74
  @test "SKILL.md Step 4 gather-info cross-references ADR-044 category-1 (direction-setting)" {
@@ -77,6 +78,73 @@ setup() {
77
78
  [[ "$output" == *"direction-setting"* ]] || [[ "$output" == *"category 1"* ]] || [[ "$output" == *"category-1"* ]]
78
79
  }
79
80
 
81
+ @test "SKILL.md Step 4 cross-references ADR-044 category-4 (silent-framework) for derivable fields (P132 derive-first)" {
82
+ # P132 derive-first refactor: Title / Symptoms / Start time / Severity-when-evidence-present
83
+ # resolve via silent-framework per ADR-044 category 4. Only Scope retains AskUserQuestion as
84
+ # the genuine category-1 direction-setting surface.
85
+ run awk '/^### 4\. /,/^### 5\. /' "$SKILL_FILE"
86
+ [ "$status" -eq 0 ]
87
+ [[ "$output" == *"silent-framework"* ]] || [[ "$output" == *"category 4"* ]] || [[ "$output" == *"category-4"* ]]
88
+ }
89
+
90
+ @test "SKILL.md Step 4 derives Title from prose silently (P132 inverse-P078)" {
91
+ # I001 regression cited in P132 line 14: agent asked "Title" with 3 candidate
92
+ # options when kebab-casing the description would have produced the slug directly.
93
+ # The refactor names "Title" + "derive"/"derived"/"kebab" in the same step.
94
+ run awk '/^### 4\. /,/^### 5\. /' "$SKILL_FILE"
95
+ [ "$status" -eq 0 ]
96
+ [[ "$output" == *"Title"* ]]
97
+ [[ "$output" == *"derive"* ]] || [[ "$output" == *"derived"* ]]
98
+ [[ "$output" == *"kebab"* ]] || [[ "$output" == *"prose"* ]]
99
+ }
100
+
101
+ @test "SKILL.md Step 4 derives Start time from evidence sources (P132 inverse-P078)" {
102
+ # I001 regression cited in P132 line 16: agent asked "Start time" with 3 candidate
103
+ # options when git log first-touch evidence would have produced 2026-04-24 directly.
104
+ # The refactor names git-log / timestamp / wall-clock as the three priority-ordered sources.
105
+ run awk '/^### 4\. /,/^### 5\. /' "$SKILL_FILE"
106
+ [ "$status" -eq 0 ]
107
+ [[ "$output" == *"Start time"* ]] || [[ "$output" == *"start-time"* ]]
108
+ [[ "$output" == *"git log"* ]] || [[ "$output" == *"timestamp"* ]]
109
+ }
110
+
111
+ @test "SKILL.md Step 4 derives Severity from RISK-POLICY matrix + evidence (P132 inverse-P078)" {
112
+ # I001 regression cited in P132 line 15: agent asked "Severity" with 4 candidate
113
+ # options when the RISK-POLICY matrix + observable evidence (cluster age, scorer
114
+ # state) maps to a clear cell. The refactor cites RISK-POLICY.md + evidence in
115
+ # the Severity row of the dispatch table. Ambiguous-evidence fallback to
116
+ # AskUserQuestion is preserved as the genuine cat-5 (taste) surface.
117
+ run awk '/^### 4\. /,/^### 5\. /' "$SKILL_FILE"
118
+ [ "$status" -eq 0 ]
119
+ [[ "$output" == *"Severity"* ]]
120
+ [[ "$output" == *"RISK-POLICY"* ]]
121
+ }
122
+
123
+ @test "SKILL.md Step 4 retains Scope as AskUserQuestion direction-setting (negative-of-negative guard)" {
124
+ # Regression-resistance: the refactor MUST preserve the genuine cat-1 direction-setting
125
+ # surface on Scope. Semantic scope (who/what affected, blast radius) is user-judgment;
126
+ # the framework cannot resolve it deterministically. Same reasoning as Step 2 duplicate-check.
127
+ run awk '/^### 4\. /,/^### 5\. /' "$SKILL_FILE"
128
+ [ "$status" -eq 0 ]
129
+ [[ "$output" == *"Scope"* ]]
130
+ [[ "$output" == *"AskUserQuestion"* ]]
131
+ }
132
+
133
+ @test "SKILL.md Step 4 cites P132 (inverse-P078 audit traceability)" {
134
+ # P132 + ADR-044 must appear in Step 4 or Related section so the audit trail
135
+ # for the I001 regression fix is recoverable from the SKILL.md surface.
136
+ run grep -nE "P132" "$SKILL_FILE"
137
+ [ "$status" -eq 0 ]
138
+ }
139
+
140
+ @test "SKILL.md Step 4 documents stderr advisory shape for derived fields (ADR-026 grounding)" {
141
+ # ADR-026 cost-source grounding: each silent derivation emits a stderr advisory
142
+ # citing the source. Pattern parity with capture-problem Step 1.5 stderr advisory.
143
+ run awk '/^### 4\. /,/^### 5\. /' "$SKILL_FILE"
144
+ [ "$status" -eq 0 ]
145
+ [[ "$output" == *"stderr"* ]] || [[ "$output" == *"advisory"* ]]
146
+ }
147
+
80
148
  # ----------------------------------------------------------------------
81
149
  # Surface 3 — Step 6 evidence-first gate refactor (cat-2; align with mitigate-incident)
82
150
  # ----------------------------------------------------------------------
@@ -374,19 +374,34 @@ next=$(printf '%03d' $(( 10#$(echo -e "${local_max:-0}\n${origin_max:-0}" | sort
374
374
 
375
375
  If the local choice would have collided with an origin ticket created since the last fetch, the `git ls-tree origin/<base>` lookup catches it here and the renumber is automatic. Log the renumber decision in the operation report (e.g. "Bumped next ID from 042 → 043 to avoid collision with origin").
376
376
 
377
- ### 4. For new problems: Gather information
377
+ ### 4. For new problems: Gather information (P132 derive-first; ADR-044 category-4 silent-framework on derivable fields; category-1 direction-setting fallback only on Description)
378
378
 
379
- If the arguments contain a description, extract what you can. For anything missing, use `AskUserQuestion` to gather:
379
+ **Derive-first dispatch.** Problem-declaration inputs carry observable evidence in the user's prose, the working tree, `RISK-POLICY.md`, and the wall-clock — the framework can resolve most fields without firing `AskUserQuestion`. Only **Description** is genuinely user-knowledge (without prose there is literally nothing to capture); only **Description** retains the AskUserQuestion gate.
380
380
 
381
- - **Title**: Short kebab-case-friendly description
382
- - **Description**: What is happening? What should happen instead?
383
- - **Priority**: Impact (1-5) × Likelihood (1-5) per RISK-POLICY.md
381
+ The P132 inverse-P078 trap (`docs/problems/known-error/132-...md`) is the load-bearing motivation. The 2026-05-06 I001 declaration regression cited in P132 fired a 4-question AskUserQuestion with 3 of 4 sub-questions being lazy classifications (Title kebab-derivable, Severity matrix-derivable, Start time git-log-derivable). manage-problem Step 4 is the second declaration-skill surface under Phase 2a (after manage-incident Step 4 in commit b7cc645) to ship the derive-first dispatch. The pattern is isomorphic across `/wr-itil:capture-problem` Step 1.5 (P185 worked example), `/wr-itil:manage-incident` Step 4, and this skill.
384
382
 
385
- Do NOT ask for fields that can be inferred:
386
- - **Reported date**: Use today's date
387
- - **Status**: Always "Open" for new problems
388
- - **Symptoms**: Infer from description if possible
389
- - **Workaround**: Default to "None identified yet." unless obvious from context
383
+ Resolve each field via the following dispatch. **The order is load-bearing** — every field except Description resolves silently with a stderr advisory citing the source; Description alone fires `AskUserQuestion` as the genuine category-1 surface.
384
+
385
+ | Field | Dispatch | ADR-044 category |
386
+ |-------|----------|------------------|
387
+ | **Title** | Derive silently. Kebab-case the first 8-10 non-stopword tokens of the user's prose description (same slug derivation as `/wr-itil:capture-problem` Step 1.4 and `/wr-itil:manage-incident` Step 4). Emit stderr advisory: `manage-problem: derived title='<slug>' from description; re-invoke with the desired title or rename the file if the slug is wrong`. Do NOT fire AskUserQuestion. | category-4 silent-framework |
388
+ | **Description** | Pull verbatim from `$ARGUMENTS` prose into Step 5's `## Description` section. **Fallback**: when `$ARGUMENTS` carries NO prose at all (only flags / status / no body), fire AskUserQuestion as the genuine category-1 direction-setting surface — *"only the user knows the goals that haven't been written down yet."* Question text: *"What is happening? What should happen instead?"* This is the ONLY user-knowledge field at Step 4. | category-1 direction-setting (fallback only; category-4 silent-framework on the typical path where prose is present) |
389
+ | **Priority** (Impact × Likelihood) | Derive silently when description signals map to a clear `RISK-POLICY.md` Impact × Likelihood cell. Cross-reference signals: (a) **impact** — service-disruption keywords (`down` / `degraded` / `unavailable` / `data loss` → high; latency / throughput / slow → moderate; cosmetic / typo / minor friction → low); (b) **likelihood** — reproducibility keywords (`every invocation` / `reproducible` / `100%` → high; `intermittent` / `flaky` / `sometimes` → medium; `one-off` / `single observation` → low); (c) **named anchors** — explicit `Impact: <label>` / `Likelihood: <label>` or `Priority: <score>` mentions in prose take precedence. When the cross-reference produces a single clear cell, set it silently and emit stderr advisory: `manage-problem: priority derived as <score> (<label>) from RISK-POLICY matrix + evidence: <evidence list>; re-invoke or update if mis-rated`. **Ambiguous-evidence fallback** (no mappable signal, or signals point to conflicting cells): fire AskUserQuestion with the Impact (1-5) × Likelihood (1-5) options as the genuine ADR-044 **category-5 (taste)** fallback surface. The fallback is genuine ambiguity, NOT defaults. | category-4 silent-framework (derivable); category-5 fallback (ambiguous) |
390
+
391
+ **Inferred fields (no ask, no advisory needed)**:
392
+
393
+ - **Reported date**: today's date (`date +%Y-%m-%d`)
394
+ - **Status**: always "Open" for new problems
395
+ - **Symptoms**: infer from description verbatim into Step 5's `## Symptoms` section
396
+ - **Workaround**: default to "None identified yet." unless explicit workaround prose appears in `$ARGUMENTS`
397
+
398
+ **Stderr advisory contract**: each derived field emits a SINGLE line to stderr (NOT stdout, NOT in the ticket body) per the capture-problem Step 1.5 + manage-incident Step 4 pattern. The advisory text shape is I2-isomorphic — identical sentence structure across the three declaration-skill surfaces (`<skill>: derived <field>=<value> from <source>; <reversibility-clause>`) beyond substituted values + source names. Embedding the advisory in stdout would risk machine-readers parsing it as a ticket-body line; embedding it in the ticket body would violate the required-section schema. Stderr is the correct channel — visible to interactive maintainers in the terminal; invisible to ticket consumers; loggable by orchestrators that capture subprocess stderr.
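
The Title derivation and its single-line stderr advisory might be sketched in shell as follows. This is a minimal illustration only: the helper names `derive_slug` and `emit_title_advisory`, the stopword list, and the default of 8 tokens are assumptions, not taken from SKILL.md (which only says "first 8-10 non-stopword tokens").

```shell
#!/usr/bin/env bash
# Hypothetical helper: kebab-case the first N non-stopword tokens of a description.
# The stopword list and the default N=8 are illustrative assumptions.
derive_slug() {
  local prose="$1" max="${2:-8}" slug="" count=0 word
  local stopwords=" a an the is are was were of to in on for and or when with "
  # Lowercase, then collapse every run of non-alphanumerics into one space.
  prose="$(printf '%s' "$prose" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' ' ')"
  for word in $prose; do
    # Skip tokens that appear in the stopword list.
    case "$stopwords" in *" $word "*) continue ;; esac
    slug="${slug:+$slug-}$word"
    count=$((count + 1))
    [ "$count" -ge "$max" ] && break
  done
  printf '%s\n' "$slug"
}

# Hypothetical helper: write the advisory to stderr only, never stdout.
emit_title_advisory() {
  printf "manage-problem: derived title='%s' from description; re-invoke with the desired title or rename the file if the slug is wrong\n" "$1" >&2
}
```

Because the advisory goes to file descriptor 2, a caller that captures stdout (for example `slug="$(derive_slug "$desc")"`) never sees it mixed into the ticket data.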
399
+
400
+ **ADR-026 cost-source grounding**: each derived field cites its source in the advisory (description token sequence for Title; RISK-POLICY matrix cell + named evidence for Priority). The `re-invoke or update if mis-rated` clause carries the reversibility marker ADR-026 mandates for ungrounded outputs.
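
The keyword cross-reference for Priority could look roughly like the sketch below. It uses the keyword groups listed in the dispatch table, but the function names and the `ambiguous` sentinel are assumptions, and it is a first-match substring lookup: it does not detect conflicting signals or named anchors such as `Impact: <label>`, both of which the dispatch treats separately.

```shell
#!/usr/bin/env bash
# Hypothetical classifier: map description prose to an impact band.
# First matching keyword group wins; no match yields "ambiguous".
classify_impact() {
  local prose
  prose="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$prose" in
    *down*|*degraded*|*unavailable*|*"data loss"*) echo high ;;
    *latency*|*throughput*|*slow*)                 echo moderate ;;
    *cosmetic*|*typo*|*"minor friction"*)          echo low ;;
    *)                                             echo ambiguous ;;
  esac
}

# Hypothetical classifier: map description prose to a likelihood band.
classify_likelihood() {
  local prose
  prose="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$prose" in
    *"every invocation"*|*reproducible*|*100%*) echo high ;;
    *intermittent*|*flaky*|*sometimes*)         echo medium ;;
    *one-off*|*"single observation"*)           echo low ;;
    *)                                          echo ambiguous ;;
  esac
}
```

An `ambiguous` result from either function would correspond to the AskUserQuestion fallback path. Plain substring matching is crude (`down` also matches `download`), so a real implementation would likely need word-boundary matching.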
401
+
402
+ **AFK fail-safe (ADR-013 Rule 6)**: under AFK orchestration, all derivable fields resolve without interactive input; only Description-when-absent can block. The orchestrator should halt-with-stderr citing the missing-prose case rather than guess (Description is genuinely user-judgment per JTBD-006's "Problems requiring my judgment ... are queued for my return, not guessed at"). The typical AFK manage-problem call carries prose in `$ARGUMENTS` (or the orchestrator's per-iter context supplies it), so the halt-on-Description path is the rare-corner-case behaviour, not the routine flow.
403
+
404
+ **Cross-skill consistency note**: this is the third declaration-skill surface to ship the derive-first dispatch (after `/wr-itil:capture-problem` Step 1.5 and `/wr-itil:manage-incident` Step 4 in commit b7cc645). The architect verdict 2026-05-15 P132 Phase 2a-ii flagged this triplet as the pattern-lock point — the I2-isomorphic stderr advisory format is now established across three skills before Phase 2a-iii (`/wr-architect:create-adr` argument-collection) extends the same pattern to a fourth.
390
405
 
391
406
  ### 4b. For new problems: Concern-boundary analysis (multi-concern check)
392
407
 
@@ -483,6 +498,8 @@ After writing the new `.open.md` file, regenerate `docs/problems/README.md` to i
483
498
 
484
499
  **Verification Queue sort direction (P150)**: rows in the Verification Queue table are sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics — older entries are the most likely-verified candidates the user wants to surface first when closing the queue. Newest-first ordering pushes those actionable closure candidates below the fold and contradicts the section header. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Any future change to the VQ sort direction MUST update this render block, the Step 7 P062 block, the Step 9c presentation block, the Step 9e template, AND `/wr-itil:review-problems` + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems` — drift here re-opens P150.
485
500
 
501
+ **Likely-verified cell shape (P186)**: the `Likely verified?` column carries an **evidence-first** cell — `yes — observed: <evidence>` / `no — not observed` / `no — observed regression`. The 14-day age-based heuristic (originally introduced by P048 Candidate 4) is superseded — age is preserved separately via the `Released` column; the `Likely verified?` column is reserved for session-observed evidence (Step 4 user confirmation, in-session test invocation outcome per ADR-026 grounding, or `run-retro` Step 4a close-on-evidence citation). <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> Any future change to the cell shape MUST update this render block, the Step 7 P062 block, the Step 9c presentation block, the Step 9e template, AND `/wr-itil:review-problems` + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems` — drift here re-opens P186.
502
+
486
503
  1. After `Write`-ing the new `.open.md` file (and, for multi-concern splits per step 4b, after all split files are written), regenerate `docs/problems/README.md` in-place reflecting the new filename set.
487
504
  2. Update the "Last reviewed" line per the **Last-reviewed line discipline (P134)** subsection below — name the new ticket as the most-recent fragment (e.g. `P<NNN> opened — <one-line title>`); displaced prior fragments rotate to `docs/problems/README-history.md`.
488
505
  3. `git add docs/problems/README.md` — the stage list at Step 11 must include it alongside the new `.open.md` file (Step 11's `git add -u` catch-all handles tracked-file modifications; the new README render lands via this path when README.md already exists in git, and via an explicit `git add docs/problems/README.md` when it is newly created). When line-3 truncation displaces prior content, also `git add docs/problems/README-history.md`.
@@ -650,6 +667,8 @@ The refresh uses the same rendering rules as Step 9e (dual-tolerant glob per RFC
650
667
 
651
668
  **Verification Queue sort direction (P150)**: rows in the Verification Queue table are sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics — older entries are the most likely-verified candidates the user wants to surface first when closing the queue. Newest-first ordering pushes those actionable closure candidates below the fold and contradicts the section header. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Any future change to the VQ sort direction MUST update this render block, the Step 5 P094 block, the Step 9c presentation block, the Step 9e template, AND `/wr-itil:review-problems` + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems` — drift here re-opens P150.
652
669
 
670
+ **Likely-verified cell shape (P186)**: the `Likely verified?` column carries an **evidence-first** cell — `yes — observed: <evidence>` / `no — not observed` / `no — observed regression`. Age is preserved separately via the `Released` column; session-observed evidence drives the cell. On a Known Error → Verification Pending transition the refresh writes `no — not observed` as the default (no observed evidence yet at the moment of release). <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> Any future change to the cell shape MUST update this render block, the Step 5 P094 block, the Step 9c presentation block, the Step 9e template, AND `/wr-itil:review-problems` + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems` — drift here re-opens P186.
671
+
653
672
  **Mechanism:**
654
673
 
655
674
  1. After renaming + Editing + `git add`-ing the transitioned ticket file (per the staging-trap rule above), regenerate `docs/problems/README.md` in-place reflecting the new filename set and the transitioned ticket's new Status.
@@ -745,14 +764,15 @@ After reviewing all problems, present a WSJF-ranked table for open/known-error p
745
764
  | WSJF | ID | Title | Severity | Status | Effort | Reported | Notes |
746
765
  |------|-----|-------|----------|--------|--------|----------|-------|
747
766
 
748
- Then present a separate **Verification Queue** section for `.verifying.md` files (per ADR-022 — ranked by release age, oldest first; no WSJF because the multiplier is 0). Sort key + direction is the canonical `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) — drift here re-opens P150. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Highlight each ticket whose release age is **≥ 14 days** (the within-skill default per P048 Candidate 4 tunable; if it needs cross-skill consistency later, promote to policy) with a `likely verified` marker in the final column. This makes the Verification Queue not just a list but a ranked view of which verifications are most likely ready to close — older entries are the most likely-verified candidates the user wants to surface first when closing the queue:
767
+ Then present a separate **Verification Queue** section for `.verifying.md` files (per ADR-022 — ranked by release age, oldest first; no WSJF because the multiplier is 0). Sort key + direction is the canonical `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) — drift here re-opens P150. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> The final `Likely verified?` column carries an **evidence-first** cell (per P186, which supersedes the original P048 Candidate 4 14-day heuristic). <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> Three canonical values:
749
768
 
750
769
  | ID | Title | Released | Fix summary | Likely verified? |
751
770
  |----|-------|----------|-------------|------------------|
752
771
 
753
- The `Likely verified?` column takes values:
754
- - `yes (N days)` — release age ≥ 14 days; the user is unlikely to revert a landed fix after this long. Surface these first in step 9d's verification prompt so the user can batch-close them.
755
- - `no (N days)` — release age < 14 days; may still be in validation. Fire step 9d for these too, but without the highlight.
772
+ The `Likely verified?` column takes values (per P186):
773
+ - `yes — observed: <evidence>` — session-observed evidence the fix works. Cite the evidence inline (≤ 120 chars): a Step 9d user confirmation phrase quoted, an in-session test invocation + observable outcome per ADR-026 grounding, or a `run-retro` Step 4a close-on-evidence citation. Surface these FIRST in step 9d's verification prompt so the user can batch-close them.
774
+ - `no — not observed` — fix released but no session-observable evidence yet. Default for newly-released tickets. Fire step 9d for these too, without batch-close highlight. Aging surfaces via the `Released` column — NOT in this cell.
775
+ - `no — observed regression` — fix released and the bug recurred this session. Cite the recurrence inline (≤ 120 chars). Do NOT batch-close — these may warrant `.verifying.md` → `.known-error.md` flip-back via `/wr-itil:transition-problem`.
756
776
 
757
777
  Then present a separate **Parked** section listing `.parked.md` files (no ranking):
758
778
 
@@ -805,11 +825,11 @@ Edit each problem file where the priority changed. Then write/overwrite `docs/pr
805
825
 
806
826
  ## Verification Queue
807
827
 
808
- Fix released, awaiting user verification (driven off `docs/problems/*.verifying.md` via glob — per ADR-022). Sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC). <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Drift here re-opens P150 — any change to VQ sort direction MUST update the Step 5 P094 block, the Step 7 P062 block, the Step 9c presentation block, this template, AND `/wr-itil:review-problems` + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems`.
828
+ Fix released, awaiting user verification (driven off `docs/problems/*.verifying.md` via glob — per ADR-022). Sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC). <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Drift here re-opens P150 — any change to VQ sort direction MUST update the Step 5 P094 block, the Step 7 P062 block, the Step 9c presentation block, this template, AND `/wr-itil:review-problems` + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems`. The `Likely verified?` column carries an **evidence-first** cell per P186 — three canonical values: `yes — observed: <evidence>`, `no — not observed` (default for newly-released tickets), `no — observed regression`. <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> Age is preserved separately via the `Released` column — drift on the cell shape re-opens P186.
809
829
 
810
- | ID | Title | Released | Fix summary |
811
- |----|-------|----------|-------------|
812
- | P<NNN> | <title> | <release marker> | <one-sentence fix summary> |
830
+ | ID | Title | Released | Fix summary | Likely verified? |
831
+ |----|-------|----------|-------------|------------------|
832
+ | P<NNN> | <title> | <release marker> | <one-sentence fix summary> | <yes — observed: … / no — not observed / no — observed regression> |
813
833
  ...
814
834
 
815
835
  ## Parked
@@ -0,0 +1,151 @@
1
+ #!/usr/bin/env bats
2
+ # ADR-044 alignment contract assertions for manage-problem SKILL.md
3
+ # Step 4 (P132 Phase 2a-ii derive-first refactor, 2026-05-15).
4
+ #
5
+ # tdd-review: structural-permitted (justification: SKILL.md prose contract
6
+ # assertions; behavioural skill-runtime harness pending P012 + P081 Phase 2;
7
+ # expected to migrate to behavioural form once the harness exists. Added
8
+ # during P132 Phase 2a-ii per the inline plan's bridge-marker rule —
9
+ # isomorphic precedent at manage-incident-adr-044-contract.bats Surface 2.)
10
+ #
11
+ # This file is the dedicated structural-grep-permitted home for the ADR-044
12
+ # alignment contract during the bridge window. After P081 Phase 2 retrofits
13
+ # the project's structural-grep tests to behavioural form, this file's
14
+ # assertions migrate too.
15
+ #
16
+ # @problem P132 (agents over-ask in interactive sessions — Phase 2a-ii
17
+ # manage-problem create flow derive-first refactor)
18
+ # @problem P185 (capture-problem Step 1.5 worked-example precedent)
19
+ # @problem P136 (ADR-044 alignment audit master)
20
+ # @adr ADR-044 (Decision-Delegation Contract)
21
+ # @adr ADR-013 amended Rule 1 (structured user interaction)
22
+ # @adr ADR-026 (cost-source grounding — stderr advisory shape)
23
+ # @adr ADR-052 (behavioural-by-default with structural bridge window)
24
+ # @jtbd JTBD-001 (enforce governance without slowing down — primary)
25
+ # @jtbd JTBD-006 (work backlog AFK — queued for return, not guessed at)
26
+ # @jtbd JTBD-101 (extend the suite with consistent patterns)
27
+
28
+ setup() {
29
+ SKILL_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
30
+ SKILL_FILE="${SKILL_DIR}/SKILL.md"
31
+ [ -f "$SKILL_FILE" ]
32
+ }
33
+
34
+ # ----------------------------------------------------------------------
35
+ # Step 4 derive-first refactor (P132 Phase 2a-ii) — cat-4 silent-framework
36
+ # on Title + Priority-when-evidence-present; cat-1 direction-setting only
37
+ # on Description; cat-5 taste fallback only on Priority-when-ambiguous.
38
+ # ----------------------------------------------------------------------
39
+
40
+ @test "SKILL.md Step 4 cross-references ADR-044 category-4 (silent-framework) for derivable fields (P132 derive-first)" {
41
+ # P132 Phase 2a-ii: Title + Priority-when-evidence-present resolve via
42
+ # silent-framework per ADR-044 category 4. Only Description retains
43
+ # AskUserQuestion as genuine cat-1 direction-setting (no prose -> nothing
44
+ # to capture).
45
+ run awk '/^### 4\. /,/^### 4b\. /' "$SKILL_FILE"
46
+ [ "$status" -eq 0 ]
47
+ [[ "$output" == *"silent-framework"* ]] || [[ "$output" == *"category 4"* ]] || [[ "$output" == *"category-4"* ]]
48
+ }
49
+
50
+ @test "SKILL.md Step 4 cross-references ADR-044 category-1 (direction-setting) for Description fallback" {
51
+ # Description is the genuine cat-1 surface — without prose there is
52
+ # literally nothing to capture. The refactor preserves the AskUserQuestion
53
+ # on Description.
54
+ run awk '/^### 4\. /,/^### 4b\. /' "$SKILL_FILE"
55
+ [ "$status" -eq 0 ]
56
+ [[ "$output" == *"direction-setting"* ]] || [[ "$output" == *"category 1"* ]] || [[ "$output" == *"category-1"* ]]
57
+ }
58
+
59
+ @test "SKILL.md Step 4 derives Title from prose silently (P132 inverse-P078)" {
60
+ # The 2026-05-06 I001 declaration regression cited in P132 line 14 was the
61
+ # same agent failure mode on the manage-incident surface: agent asked
62
+ # "Title" with 3 candidate options when kebab-casing the description
63
+ # would have produced the slug directly. manage-problem Step 4 must ship
64
+ # the same derive-first pattern.
65
+ run awk '/^### 4\. /,/^### 4b\. /' "$SKILL_FILE"
66
+ [ "$status" -eq 0 ]
67
+ [[ "$output" == *"Title"* ]]
68
+ [[ "$output" == *"derive"* ]] || [[ "$output" == *"derived"* ]]
69
+ [[ "$output" == *"kebab"* ]] || [[ "$output" == *"prose"* ]]
70
+ }
71
+
72
+ @test "SKILL.md Step 4 derives Priority from RISK-POLICY matrix + evidence (P132 inverse-P078)" {
73
+ # The I001 regression cited in P132 line 15 was the analogous failure on
74
+ # Severity. manage-problem's Priority (Impact x Likelihood) derives from
75
+ # the same RISK-POLICY matrix lookup against description signals.
76
+ # Ambiguous-evidence falls back to AskUserQuestion as cat-5 (taste).
77
+ run awk '/^### 4\. /,/^### 4b\. /' "$SKILL_FILE"
78
+ [ "$status" -eq 0 ]
79
+ [[ "$output" == *"Priority"* ]]
80
+ [[ "$output" == *"RISK-POLICY"* ]]
81
+ }
82
+
83
+ @test "SKILL.md Step 4 retains Description as AskUserQuestion fallback (negative-of-negative guard)" {
84
+ # Regression-resistance: the refactor MUST preserve the genuine cat-1
85
+ # direction-setting surface on Description. Without user-supplied prose
86
+ # the SKILL has nothing to derive from — Description IS the input. Same
87
+ # reasoning as manage-incident Step 4 Scope retention.
88
+ run awk '/^### 4\. /,/^### 4b\. /' "$SKILL_FILE"
89
+ [ "$status" -eq 0 ]
90
+ [[ "$output" == *"Description"* ]]
91
+ [[ "$output" == *"AskUserQuestion"* ]]
92
+ }
93
+
94
+ @test "SKILL.md Step 4 cites P132 (inverse-P078 audit traceability)" {
95
+ # P132 + ADR-044 must appear in Step 4 or Related section so the audit
96
+ # trail for the Phase 2a-ii refactor is recoverable from the SKILL.md
97
+ # surface.
98
+ run grep -nE "P132" "$SKILL_FILE"
99
+ [ "$status" -eq 0 ]
100
+ }
101
+
102
+ @test "SKILL.md Step 4 documents stderr advisory shape for derived fields (ADR-026 grounding)" {
103
+ # ADR-026 cost-source grounding: each silent derivation emits a stderr
104
+ # advisory citing the source. Pattern parity with capture-problem Step
105
+ # 1.5 + manage-incident Step 4 (I2-isomorphic across the three
106
+ # declaration-skill surfaces per architect verdict 2026-05-15).
107
+ run awk '/^### 4\. /,/^### 4b\. /' "$SKILL_FILE"
108
+ [ "$status" -eq 0 ]
109
+ [[ "$output" == *"stderr"* ]] || [[ "$output" == *"advisory"* ]]
110
+ }
111
+
112
+ @test "SKILL.md Step 4 cross-references capture-problem Step 1.5 + manage-incident Step 4 (cross-skill consistency)" {
113
+ # The architect verdict 2026-05-15 P132 Phase 2a-ii flagged cross-skill
114
+ # consistency: three declaration-skill surfaces now ship the same
115
+ # dispatch shape. The Step 4 prose must explicitly cite both prior
116
+ # surfaces (P185 capture-problem + manage-incident b7cc645) as
117
+ # worked-example precedents so the I2-isomorphic stderr advisory format
118
+ # is locked-in by reference before a fourth surface (Phase 2a-iii
119
+ # create-adr) drifts.
120
+ run awk '/^### 4\. /,/^### 4b\. /' "$SKILL_FILE"
121
+ [ "$status" -eq 0 ]
122
+ [[ "$output" == *"P185"* ]] || [[ "$output" == *"capture-problem"* ]]
123
+ [[ "$output" == *"manage-incident"* ]] || [[ "$output" == *"b7cc645"* ]]
124
+ }
125
+
126
+ # ----------------------------------------------------------------------
127
+ # Negative-of-negative guards — Step 4b multi-concern + Step 2
128
+ # duplicate-check MUST remain cat-1 direction-setting AskUserQuestion
129
+ # surfaces (architect verdict 2026-05-15: not touched by Phase 2a-ii).
130
+ # ----------------------------------------------------------------------
131
+
132
+ @test "SKILL.md Step 4b multi-concern AskUserQuestion is preserved (cat-1 direction-setting, not touched by Phase 2a-ii)" {
133
+ # Architect verdict 2026-05-15: Step 4b is a separate cat-1
134
+ # direction-setting surface — only the user knows whether the concerns
135
+ # can be independently fixed. The Phase 2a-ii refactor MUST NOT touch
136
+ # Step 4b's AskUserQuestion gate.
137
+ run awk '/^### 4b\. /,/^### 5\. /' "$SKILL_FILE"
138
+ [ "$status" -eq 0 ]
139
+ [[ "$output" == *"AskUserQuestion"* ]]
140
+ [[ "$output" == *"concern"* ]] || [[ "$output" == *"split"* ]]
141
+ }
142
+
143
+ @test "SKILL.md Step 2 duplicate-check AskUserQuestion is preserved (cat-1 direction-setting, not touched by Phase 2a-ii)" {
144
+ # Architect verdict 2026-05-15: Step 2 is a separate cat-1
145
+ # direction-setting surface — only the user knows whether an existing
146
+ # ticket is the same root cause. The Phase 2a-ii refactor MUST NOT
147
+ # touch Step 2's AskUserQuestion gate.
148
+ run awk '/^### 2\. /,/^### 3\. /' "$SKILL_FILE"
149
+ [ "$status" -eq 0 ]
150
+ [[ "$output" == *"AskUserQuestion"* ]]
151
+ }
@@ -85,10 +85,10 @@ For each `MISSING` Verification Queue entry, read the `## Fix Released` block:
85
85
  sed -n '/^## Fix Released/,/^## /p' docs/problems/<NNN>-*.verifying.md
86
86
  ```
87
87
 
88
- Render the Verification Queue row in the existing format:
88
+ Render the Verification Queue row in the existing format. The `Likely verified?` cell carries an **evidence-first** value per P186 (supersedes the original P048 Candidate 4 14-day heuristic). <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> When reconcile-readme synthesises a missing row, default the cell to `no — not observed` — the row is being added because some prior session committed the `.verifying.md` transition without staging the README refresh; reconcile-readme has no session-observed evidence to cite. Subsequent `/wr-itil:review-problems` Step 4 or `run-retro` Step 4a passes populate `yes — observed: <evidence>` when the user verifies. Drift on the cell shape re-opens P186.
89
89
 
90
90
  ```
91
- | P<NNN> | <title> | <release marker> | <Likely verified? per P048 Candidate 4: yes if ≥14 days, else no (<N> days)> |
91
+ | P<NNN> | <title> | <release marker> | no — not observed |
92
92
  ```
93
93
 
94
94
  ### Step 4. Apply edits via Edit tool — preserve narrative
@@ -99,7 +99,7 @@ For each REMOVE: `Edit` with the existing row as `old_string`, and remove it (re
99
99
 
100
100
  For each ADD to WSJF Rankings: locate the correct WSJF position by descending order. Use `Edit` to insert the new row immediately above the next-lower-WSJF row (or append at the bottom of the table if the new row's WSJF is the lowest). The Edit's `old_string` is the line that the new row inserts above; the `new_string` is the new row + the same line below.
101
101
 
102
- For each ADD to Verification Queue: insert the new row in `Released date ASC` position (oldest at row 1; same-day releases tiebreak by ID ASC) per the canonical VQ sort direction. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Recent releases land at the bottom; oldest-pending verifications surface at the top so the user lands on actionable closure candidates first per P048 user-task semantics. Drift here re-opens P150.
102
+ For each ADD to Verification Queue: insert the new row in `Released date ASC` position (oldest at row 1; same-day releases tiebreak by ID ASC) per the canonical VQ sort direction. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Recent releases land at the bottom; oldest-pending verifications surface at the top so the user lands on actionable closure candidates first per P048 user-task semantics. Drift here re-opens P150. The synthesised cell defaults to `no — not observed` per the P186 evidence-first cell shape — see the "Render the Verification Queue row" block above. <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 -->
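
The `Released date ASC` ordering with an ID tiebreak can be illustrated with a plain `sort` call. This is only a sketch: the pipe-delimited `ID|YYYY-MM-DD|title` row format and the `sort_vq_rows` name are assumptions for illustration, and the ID tiebreak relies on IDs being zero-padded (`P<NNN>`) so that lexical order matches numeric order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: order Verification Queue rows oldest-first.
# Input rows (assumed format): ID|YYYY-MM-DD|title
# Key 1: released date ascending (ISO dates sort correctly as text).
# Key 2: ticket ID ascending as the same-day tiebreak.
sort_vq_rows() {
  LC_ALL=C sort -t'|' -k2,2 -k1,1
}
```

For example, `printf 'P150|2026-05-02|b\nP048|2026-05-01|a\nP134|2026-05-02|c\n' | sort_vq_rows` would place P048 at row 1, followed by P134 and then P150.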
103
103
 
104
104
  After all edits, re-run `packages/itil/scripts/reconcile-readme.sh docs/problems` to confirm exit 0. If the second run still reports drift, investigate the residual edits — do NOT re-run reconciliation in a loop, as that hides systematic edit failures.
105
105
 
@@ -73,7 +73,13 @@ After re-scoring, present three sections matching the README.md format (same ren
73
73
  |------|-----|-------|----------|--------|--------|----------|-------|
74
74
  ```
75
75
 
76
- **Verification Queue** — `.verifying.md` tickets, sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics. Older entries are the most likely-verified candidates the user wants to surface first when closing the queue; newest-first ordering pushes those actionable closure candidates below the fold and contradicts the section header. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Any change to the VQ sort direction MUST update this rendering block, Step 5's README template, AND `/wr-itil:manage-problem` SKILL.md Step 5 P094 / Step 7 P062 / Step 9c / Step 9e + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems` — drift re-opens P150. Highlight any ticket whose release age is **≥ 14 days** with a `yes (N days)` marker in the `Likely verified?` column (within-skill default per P048 Candidate 4 tunable; promote to cross-skill policy if needed):
76
+ **Verification Queue** — `.verifying.md` tickets, sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics. Older entries are the most likely-verified candidates the user wants to surface first when closing the queue; newest-first ordering pushes those actionable closure candidates below the fold and contradicts the section header. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Any change to the VQ sort direction MUST update this rendering block, Step 5's README template, AND `/wr-itil:manage-problem` SKILL.md Step 5 P094 / Step 7 P062 / Step 9c / Step 9e + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems` — drift re-opens P150. The `Likely verified?` column carries an **evidence-first** cell (per P186, which supersedes the age-based heuristic). <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> Three canonical values:
77
+
78
+ - `yes — observed: <evidence>` — a Step 4 user confirmation, an in-session test invocation + observable outcome (per ADR-026 grounding), or a `run-retro` Step 4a close-on-evidence citation. Quote the evidence inline (≤ 120 chars; abbreviate to ticket/commit/version anchor + verb).
79
+ - `no — not observed` — fix released but no session-observable evidence yet. Default for newly-released tickets. Aging is tracked separately: the `Released` column is the aging signal and `Likely verified?` is the evidence signal.
80
+ - `no — observed regression` — fix released and the bug recurred this session. Cite the recurrence inline (≤ 120 chars).
81
+
82
+ Any change to the canonical cell shape MUST update this rendering block, Step 5's README template, AND every co-located render site listed in the VQ-SORT-DIRECTION drift-tripwire above — drift re-opens P186. Surface `yes — observed: …` rows first in Step 4's verification prompt (user can batch-close them); `no — observed regression` rows must NOT be batch-closed (they may signal a botched fix and warrant a flip-back to `.known-error.md`).
77
83
 
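The three canonical cell values listed above lend themselves to a mechanical shape check. The following is a minimal sketch under stated assumptions, not something the package ships: `check_cell` is a hypothetical name, the regression value is allowed to carry a trailing citation, and the 120-character cap is applied to the quoted evidence only.

```shell
# Sketch only: succeed when a `Likely verified?` cell matches one of the
# three canonical P186 shapes. ${#var} counts bytes in some shells, so the
# length check is approximate for non-ASCII evidence.
check_cell() {
  cell=$1
  case "$cell" in
    'yes — observed: '?*)
      evidence=${cell#'yes — observed: '}
      [ "${#evidence}" -le 120 ] ;;
    'no — not observed') return 0 ;;
    'no — observed regression'*) return 0 ;;
    *) return 1 ;;
  esac
}

check_cell 'no — not observed' && echo "accepted"
check_cell 'yes (20 days)' || echo "rejected: age-based cell"
```

A check like this only validates the shape of the cell; whether the quoted evidence is genuine remains a judgement for the reviewer.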
78
84
  ```
79
85
  | ID | Title | Released | Fix summary | Likely verified? |
@@ -102,9 +108,9 @@ Target the dual-tolerant glob `docs/problems/*.verifying.md docs/problems/verify
102
108
 
103
109
  The question MUST include a fix summary extracted from the `## Fix Released` section — include the first sentence (or first bullet list) of that section in the question body or as the option description, so the user can answer without reading the full problem file. Do NOT ask with only the problem ID + title + version.
104
110
 
105
- - Surface the Step 3 `yes (N days)` tickets first so the user can batch-close them.
106
- - If the user confirms: close the problem (`git mv` from `.verifying.md` to `.closed.md`, update Status to "Closed", re-stage per the P057 staging trap).
107
- - If the user says no or is unsure: leave the ticket as Verification Pending.
111
+ - Surface the Step 3 `yes — observed: …` tickets first so the user can batch-close them (per P186 evidence-first cell shape).
112
+ - If the user confirms: close the problem (`git mv` from `.verifying.md` to `.closed.md`, update Status to "Closed", re-stage per the P057 staging trap). Update the `Likely verified?` cell on the same render path to `yes — observed: user confirmed <YYYY-MM-DD>`.
113
+ - If the user says no or is unsure: leave the ticket as Verification Pending. If the user reports recurrence, update the cell to `no — observed regression — <one-line citation>` and flag for `.verifying.md` → `.known-error.md` flip-back via `/wr-itil:transition-problem`.
108
114
 
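The confirm branch above (rename, Status edit, re-stage) can be sketched as a dry-run planner that prints the commands instead of running them. This is an illustrative sketch only: `plan_close` is a hypothetical helper, it handles the flat `*.verifying.md` layout and not the `verifying/` directory layout, and it leaves the Status edit as a manual step because the ticket's Status line format is not shown here.

```shell
# Sketch only: print the close sequence for one flat-layout ticket.
plan_close() {
  src=$1
  case "$src" in
    *.verifying.md) ;;
    *) echo "not a .verifying.md ticket: $src" >&2; return 1 ;;
  esac
  dst="${src%.verifying.md}.closed.md"
  printf 'git mv %s %s\n' "$src" "$dst"
  printf '# edit %s: set Status to "Closed"\n' "$dst"
  # Re-stage after the edit so the staged blob matches the file (P057).
  printf 'git add %s\n' "$dst"
  printf 'git add docs/problems/README.md\n'
}

# Hypothetical ticket path, for illustration.
plan_close docs/problems/042-example.verifying.md
```

Printing the plan first keeps the sketch free of side effects; the second `git add` is there because an edit made after `git mv` is not staged automatically.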
109
115
  **AFK / non-interactive branch (ADR-013 Rule 6):** when `AskUserQuestion` is unavailable, record the Verification Queue in the review output and skip the prompt. Do NOT auto-close verifying tickets — only the user can make that call. The user sees the queue on next interactive invocation.
110
116
 
@@ -222,11 +228,11 @@ Dev-work queue only. Verification Pending (`.verifying.md`, WSJF multiplier 0) a
222
228
 
223
229
  ## Verification Queue
224
230
 
225
- Fix released, awaiting user verification (driven off the dual-tolerant glob `docs/problems/*.verifying.md docs/problems/verifying/*.md` per ADR-022 + RFC-002 migration window). Sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC). <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> `Likely verified?` column marks tickets ≥14 days old (P048 Candidate 4 default).
231
+ Fix released, awaiting user verification (driven off the dual-tolerant glob `docs/problems/*.verifying.md docs/problems/verifying/*.md` per ADR-022 + RFC-002 migration window). Sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC). <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> `Likely verified?` column carries an **evidence-first** cell per P186 — three canonical values: `yes — observed: <evidence>`, `no — not observed` (default for newly-released tickets), `no — observed regression`. <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> Age is preserved separately via the `Released` column — aging surfaces there, not in `Likely verified?`.
226
232
 
227
233
  | ID | Title | Released | Likely verified? |
228
234
  |----|-------|----------|------------------|
229
- | P<NNN> | <title> | <release marker> | <yes (N days) / no (N days)> |
235
+ | P<NNN> | <title> | <release marker> | <yes — observed: <evidence> / no — not observed / no — observed regression> |
230
236
  ...
231
237
 
232
238
  ## Inbound Upstream Reports
@@ -293,7 +299,8 @@ Otherwise, after the commit in Step 6 lands, drain the release queue per the mec
293
299
  - **ADR-037** (`docs/decisions/037-skill-testing-strategy.proposed.md`) — contract-assertion bats pattern applied to this skill.
294
300
  - **P031** — git-history freshness check rationale (mtime unreliable in worktrees). Applies to the README cache this skill owns.
295
301
  - **P047** — live-estimate effort buckets; the Step 2 re-estimate is the lifecycle transition this ticket closes.
296
- - **P048** Candidate 4 — the 14-day `Likely verified?` heuristic in Step 3.
302
+ - **P048** Candidate 4 — original `Likely verified?` column introduction (14-day age-heuristic). Superseded by P186 evidence-first cell shape.
303
+ - **P186** — evidence-first cell shape (`yes — observed: <evidence>` / `no — not observed` / `no — observed regression`) supersedes the age-based heuristic in Step 3 + Step 5; `<!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 -->` marker drives cross-skill drift detection (P138 / P150 fix-shape precedent).
297
304
  - **P057** — staging trap. Step 2's auto-transition MUST re-stage after Edit.
298
305
  - **P062** — README.md refresh on transitions. Step 5 is the review-path of the same refresh; `/wr-itil:manage-problem` Step 7 carries the transition-path.
299
306
  - **JTBD-001** (`docs/jtbd/solo-developer/JTBD-001-enforce-governance.proposed.md`) — discoverable surface via `/wr-itil:` autocomplete.
@@ -0,0 +1,229 @@
1
+ #!/usr/bin/env bats
2
+
3
+ # P186: `Likely verified?` column in docs/problems/README.md
4
+ # Verification Queue must carry an evidence-first cell shape — NOT
5
+ # the original P048 Candidate 4 age-based heuristic (≥14 days = yes).
6
+ # Sibling proxy-for-evidence anti-pattern to P185 at the review-problems
7
+ # Step 3/5 surface. User critique 2026-05-12: "I don't like 'it's been
8
+ # a while, so likely verified' approach. We want firm evidence. For
9
+ # these, it should be things you actually observe."
10
+ #
11
+ # Three canonical values per P186:
12
+ # yes — observed: <evidence> (session-observed evidence the fix works)
13
+ # no — not observed (fix released, no evidence yet; default)
14
+ # no — observed regression (fix released, bug recurred)
15
+ #
16
+ # Hybrid coverage per ADR-005 + ADR-037 + ADR-052:
17
+ # - Structural contract-assertions (Permitted Exception per ADR-005 /
18
+ # contract-assertion pattern per ADR-037 — narrowly scoped to marker
19
+ # presence per architect verdict): each render-block site carries the
20
+ # canonical LIKELY-VERIFIED-CELL-SHAPE marker pointing to P186.
21
+ # - Behavioural-shape assertions: each render site documents the three
22
+ # canonical cell values + the age-based heuristic is NOT cited as
23
+ # authority anywhere the marker fires.
24
+ # - Drift-tripwire prose assertion: primary render sites (review-problems
25
+ # + manage-problem) name P186 in the drift-re-opens contract per
26
+ # P138 / P150 fix-shape precedent.
27
+ #
28
+ # @problem P186
29
+ # @jtbd JTBD-001 (enforce governance without slowing down — evidence-grounded
30
+ # closure decision rather than calendar proxy)
31
+ # @jtbd JTBD-006 (progress backlog AFK — `observed: <evidence>` cell IS the
32
+ # audit trail the AFK contract requires)
33
+ #
34
+ # Cross-reference:
35
+ # P186: docs/problems/open/186-vq-likely-verified-column-uses-age-heuristic-not-evidence.md
36
+ # P185: sibling proxy-for-evidence anti-pattern at capture-problem Step 1.5
37
+ # P150: sibling fix shape — VQ-SORT-DIRECTION marker
38
+ # P138: sibling fix shape — TIE-BREAK-LADDER-SOURCE marker
39
+ # P048: introduced the Verification Queue + 14-day heuristic this ticket supersedes
40
+ # ADR-022 — `.verifying.md` lifecycle; VQ rendering
41
+ # ADR-026 — agent output grounding (evidence-citation discipline)
42
+ # ADR-037 — contract-assertion bats pattern
43
+ # ADR-052 — behavioural-tests default
44
+
45
+ setup() {
46
+ REPO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/../../../../.." && pwd)"
47
+ REVIEW_SKILL="$REPO_ROOT/packages/itil/skills/review-problems/SKILL.md"
48
+ MANAGE_SKILL="$REPO_ROOT/packages/itil/skills/manage-problem/SKILL.md"
49
+ LIST_SKILL="$REPO_ROOT/packages/itil/skills/list-problems/SKILL.md"
50
+ TRANSITION_SKILL="$REPO_ROOT/packages/itil/skills/transition-problem/SKILL.md"
51
+ TRANSITIONS_SKILL="$REPO_ROOT/packages/itil/skills/transition-problems/SKILL.md"
52
+ RECONCILE_SKILL="$REPO_ROOT/packages/itil/skills/reconcile-readme/SKILL.md"
53
+
54
+ MARKER='<!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 -->'
55
+ }
56
+
57
+ # ---------------------------------------------------------------------------
58
+ # Marker presence at every render site (P138 / P150 fix-shape precedent)
59
+ # ---------------------------------------------------------------------------
60
+
61
+ @test "review-problems carries the LIKELY-VERIFIED-CELL-SHAPE marker" {
62
+ run grep -F "$MARKER" "$REVIEW_SKILL"
63
+ [ "$status" -eq 0 ]
64
+ count=$(grep -c -F "$MARKER" "$REVIEW_SKILL")
65
+ # review-problems is the primary owner — Step 3 presentation AND Step 5
66
+ # README template both render the column.
67
+ [ "$count" -ge 2 ]
68
+ }
69
+
70
+ @test "manage-problem carries the LIKELY-VERIFIED-CELL-SHAPE marker at every render site" {
71
+ run grep -F "$MARKER" "$MANAGE_SKILL"
72
+ [ "$status" -eq 0 ]
73
+ count=$(grep -c -F "$MARKER" "$MANAGE_SKILL")
74
+ # manage-problem renders the VQ at 4 sites: Step 5 P094, Step 7 P062,
75
+ # Step 9c presentation, Step 9e README template. Marker must appear at
76
+ # each — drift re-opens P186.
77
+ [ "$count" -ge 4 ]
78
+ }
79
+
80
+ @test "list-problems VQ rendering carries the LIKELY-VERIFIED-CELL-SHAPE marker" {
81
+ run grep -F "$MARKER" "$LIST_SKILL"
82
+ [ "$status" -eq 0 ]
83
+ }
84
+
85
+ @test "transition-problem Step 7 README refresh carries the LIKELY-VERIFIED-CELL-SHAPE marker" {
86
+ run grep -F "$MARKER" "$TRANSITION_SKILL"
87
+ [ "$status" -eq 0 ]
88
+ }
89
+
90
+ @test "transition-problems batch render carries the LIKELY-VERIFIED-CELL-SHAPE marker" {
91
+ run grep -F "$MARKER" "$TRANSITIONS_SKILL"
92
+ [ "$status" -eq 0 ]
93
+ }
94
+
95
+ @test "reconcile-readme rendering carries the LIKELY-VERIFIED-CELL-SHAPE marker" {
96
+ run grep -F "$MARKER" "$RECONCILE_SKILL"
97
+ [ "$status" -eq 0 ]
98
+ }
99
+
100
+ # ---------------------------------------------------------------------------
101
+ # Canonical cell values present at every render site
102
+ # ---------------------------------------------------------------------------
103
+
104
+ @test "review-problems documents all three canonical cell values" {
105
+ run grep -F 'yes — observed:' "$REVIEW_SKILL"
106
+ [ "$status" -eq 0 ]
107
+ run grep -F 'no — not observed' "$REVIEW_SKILL"
108
+ [ "$status" -eq 0 ]
109
+ run grep -F 'no — observed regression' "$REVIEW_SKILL"
110
+ [ "$status" -eq 0 ]
111
+ }
112
+
113
+ @test "manage-problem documents all three canonical cell values" {
114
+ run grep -F 'yes — observed:' "$MANAGE_SKILL"
115
+ [ "$status" -eq 0 ]
116
+ run grep -F 'no — not observed' "$MANAGE_SKILL"
117
+ [ "$status" -eq 0 ]
118
+ run grep -F 'no — observed regression' "$MANAGE_SKILL"
119
+ [ "$status" -eq 0 ]
120
+ }
121
+
122
+ @test "list-problems documents all three canonical cell values" {
123
+ run grep -F 'yes — observed:' "$LIST_SKILL"
124
+ [ "$status" -eq 0 ]
125
+ run grep -F 'no — not observed' "$LIST_SKILL"
126
+ [ "$status" -eq 0 ]
127
+ run grep -F 'no — observed regression' "$LIST_SKILL"
128
+ [ "$status" -eq 0 ]
129
+ }
130
+
131
+ # ---------------------------------------------------------------------------
132
+ # Drift-tripwire prose at primary render sites (P138 / P150 precedent)
133
+ # ---------------------------------------------------------------------------
134
+
135
+ @test "review-problems names drift-re-opens-P186 contract" {
136
+ run grep -F 'drift re-opens P186' "$REVIEW_SKILL"
137
+ [ "$status" -eq 0 ]
138
+ }
139
+
140
+ @test "manage-problem names drift-re-opens-P186 contract" {
141
+ # manage-problem hosts the drift-tripwire prose at Step 5 P094 AND Step 7
142
+ # P062 — both render sites name P186 alongside the existing P138 / P150
143
+ # contracts. List-problems / transition-problem(s) / reconcile-readme
144
+ # carry the marker but defer the canonical drift contract to the primary
145
+ # owners (manage-problem / review-problems) per the P138 + P150 precedent.
146
+ run grep -F 'drift here re-opens P186' "$MANAGE_SKILL"
147
+ [ "$status" -eq 0 ]
148
+ count=$(grep -c -i 're-opens P186' "$MANAGE_SKILL")
149
+ [ "$count" -ge 2 ]
150
+ }
151
+
152
+ # ---------------------------------------------------------------------------
153
+ # Age-based heuristic must NOT survive as the authoritative cell rule
154
+ # ---------------------------------------------------------------------------
155
+
156
+ @test "review-problems no longer cites the 14-day heuristic as the cell rule" {
157
+ # The P048 Candidate 4 "marks tickets ≥14 days old" phrasing was the
158
+ # exact framing the user critique targeted. After P186, the cell shape
159
+ # contract no longer references age as the authoritative trigger — age
160
+ # is preserved separately via the `Released` column. The phrase may
161
+ # survive in historical context (e.g. Related-section pointer back to
162
+ # P048) but NOT as the live rendering rule.
163
+ run grep -F 'marks tickets ≥14 days old' "$REVIEW_SKILL"
164
+ [ "$status" -ne 0 ]
165
+ }
166
+
167
+ @test "manage-problem Step 9c no longer treats age as the cell trigger" {
168
+ # Pre-P186 Step 9c documented `yes (N days)` and `no (N days)` as the
169
+ # cell values keyed on a 14-day threshold. The new shape replaces both
170
+ # with evidence-first values; the literal `yes (N days)` template must
171
+ # not survive as a documented cell value (it can still appear in
172
+ # historical narrative such as the README VQ rows pending re-render).
173
+ run grep -F '`yes (N days)` — release age ≥ 14 days' "$MANAGE_SKILL"
174
+ [ "$status" -ne 0 ]
175
+ }
176
+
177
+ # ---------------------------------------------------------------------------
178
+ # Behavioural / template-shape — README template row carries the new shape
179
+ # ---------------------------------------------------------------------------
180
+
181
+ @test "review-problems Step 5 README template row uses the new cell-shape vocabulary" {
182
+ # The template ROW (the `| P<NNN> | <title> | ... |` line below the
183
+ # Verification Queue header) must reference the new vocabulary, not
184
+ # the old `yes (N days) / no (N days)` placeholder.
185
+ run grep -F 'yes — observed' "$REVIEW_SKILL"
186
+ [ "$status" -eq 0 ]
187
+ # Old placeholder gone
188
+ run grep -F '<yes (N days) / no (N days)>' "$REVIEW_SKILL"
189
+ [ "$status" -ne 0 ]
190
+ }
191
+
192
+ @test "list-problems Step 3 template row uses the new cell-shape vocabulary" {
193
+ run grep -F 'yes — observed' "$LIST_SKILL"
194
+ [ "$status" -eq 0 ]
195
+ # Old placeholder gone from list-problems template
196
+ run grep -F 'yes (N days) / no (N days)' "$LIST_SKILL"
197
+ [ "$status" -ne 0 ]
198
+ }
199
+
200
+ # ---------------------------------------------------------------------------
201
+ # Behavioural — produced README's VQ section uses the new cell vocabulary
202
+ # ---------------------------------------------------------------------------
203
+ #
204
+ # Behavioural assertion per ADR-052: the actual rendered docs/problems/
205
+ # README.md Verification Queue rows must use the new evidence-first cell
206
+ # shape, not age-based markers. This is the user-visible artefact the
207
+ # entire fix targets.
208
+
209
+ @test "docs/problems/README.md VQ section contains the new evidence-first cell vocabulary" {
210
+ README="$REPO_ROOT/docs/problems/README.md"
211
+ [ -f "$README" ]
212
+ # At least one row should carry the new vocabulary after the iter
213
+ # re-renders the VQ section. Tests run after the iter's edits land.
214
+ run grep -F 'no — not observed' "$README"
215
+ [ "$status" -eq 0 ]
216
+ }
217
+
218
+ @test "docs/problems/README.md VQ section no longer uses bare age-marker cells like 'no (N days)' as the dominant rendering" {
219
+ README="$REPO_ROOT/docs/problems/README.md"
220
+ [ -f "$README" ]
221
+ # Allow a small residual count for transitional rows or quoted prose,
222
+ # but the bulk of the VQ table must have migrated to the new shape.
223
+ # Concretely: count `no — not observed` occurrences and require they
224
+ # exceed the count of bare `no (<digit>` age-marker cells. This is a
225
+ # behavioural check — the rendered surface, not the SKILL.md template.
226
+ new_shape_count=$(grep -c -F 'no — not observed' "$README" || true)
227
+ old_shape_count=$(grep -cE '\| no \([0-9]+ days?\) \|' "$README" || true)
228
+ [ "$new_shape_count" -gt "$old_shape_count" ]
229
+ }
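The marker-presence tests in this file repeat the same `grep -F` / `grep -c` pair for each skill. Outside bats, the same check could be wrapped in a small helper for local drift checks. This is a hypothetical helper and not part of the diffed suite; the function name is an assumption, and only the marker string is taken from the tests.

```shell
# Sketch only: succeed when FILE has at least MIN lines carrying the P186
# marker. grep -c counts matching lines, so two markers on one line count once.
MARKER='<!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 -->'

has_marker_at_least() {
  file=$1
  min=$2
  # grep exits non-zero on zero matches; keep the printed count either way.
  count=$(grep -c -F -- "$MARKER" "$file" || true)
  [ "$count" -ge "$min" ]
}

# Illustration against a throwaway file with two marked lines.
demo=$(mktemp)
printf '%s\n' "Step 3 $MARKER" "unmarked line" "Step 5 $MARKER" > "$demo"
has_marker_at_least "$demo" 2 && echo "marker present at 2+ sites"
rm -f "$demo"
```

Because the count is per line, a render site that carries the marker twice on one line would still count as a single site, which matches how the `-ge 2` and `-ge 4` thresholds in the tests are evaluated.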
@@ -173,6 +173,8 @@ The refresh uses the same rendering rules as `/wr-itil:review-problems` Step 9e
173
173
 
174
174
  **Verification Queue sort direction (P150)**: Verification Queue rows are sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics — older entries are the most likely-verified candidates the user wants to surface first when closing the queue. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Drift here re-opens P150.
175
175
 
176
+ **Likely-verified cell shape (P186)**: the `Likely verified?` column carries an **evidence-first** cell — `yes — observed: <evidence>` / `no — not observed` / `no — observed regression`. On a Known Error → Verification Pending transition the refresh writes `no — not observed` as the default (no observed evidence yet at the moment of release). On a Verification Pending → Closed transition the closing commit's session-observed evidence should populate the cell as `yes — observed: <evidence>` before the row exits the queue. <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> Drift on the cell shape re-opens P186.
177
+
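The default-cell rule above can be sketched as a lookup keyed on the transition destination. This is a minimal sketch with assumptions labelled: `default_cell_for` is a hypothetical name, and the destination words `verifying` and `close` are borrowed from the batch skill's wording, not confirmed as this skill's argument names.

```shell
# Sketch only: pick the `Likely verified?` cell for a transition destination.
default_cell_for() {
  case "$1" in
    verifying)
      # Fix just released: no session-observed evidence exists yet.
      printf '%s\n' 'no — not observed' ;;
    close)
      # Closing requires evidence; refuse to invent one.
      [ -n "${2:-}" ] || return 1
      printf 'yes — observed: %s\n' "$2" ;;
    *) return 1 ;;
  esac
}

default_cell_for verifying
default_cell_for close 'user confirmed 2026-05-12'
```

Refusing the `close` case without an evidence string mirrors the evidence-first intent: the cell is never populated from age or from a default alone.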
176
178
  **Mechanism:**
177
179
 
178
180
  1. After renaming + Editing + `git add`-ing the transitioned ticket file (per the staging-trap rule above), regenerate `docs/problems/README.md` in-place reflecting the new filename set and the transitioned ticket's new Status.
@@ -234,6 +236,7 @@ Release draining is owned by the caller — `/wr-itil:manage-problem` Step 12 (i
234
236
  - **ADR-037** (`docs/decisions/037-skill-testing-strategy.proposed.md`) — contract-assertion bats pattern applied to this skill.
235
237
  - **P057** — `git mv` + Edit staging trap rationale; the delegated Step 7 block implements the re-stage. Named here as a transitive contract so callers can reason about the dependency.
236
238
  - **P062** — `/wr-itil:review-problems` is the canonical README.md cache writer, but Step 7 transitions also refresh README.md in-place per P062's mechanism. Named here as a transitive contract.
239
+ - **P186** — evidence-first `Likely verified?` cell shape (`yes — observed: <evidence>` / `no — not observed` / `no — observed regression`); `<!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 -->` marker drives cross-skill drift detection (P138 / P150 fix-shape precedent).
237
240
  - **P063** — external-root-cause detection at Open → Known Error and at the `upstream-blocked` park path. The delegated Step 7 block owns the prompt; this skill inherits the AFK fallback without re-implementing.
238
241
  - **JTBD-001** (`docs/jtbd/solo-developer/JTBD-001-enforce-governance.proposed.md`) — discoverable surface via `/wr-itil:` autocomplete. Users type `/wr-itil:transition-problem 042 known-error` rather than remembering the `manage-problem <NNN> known-error` subcommand.
239
242
  - **JTBD-101** (`docs/jtbd/plugin-developer/JTBD-101-extend-suite.proposed.md`) — one skill per distinct user intent.
@@ -180,6 +180,8 @@ Per P062, every Step 7 status transition refreshes README.md. At the batch grain
180
180
 
181
181
  The refresh follows the same render rules as `/wr-itil:review-problems` Step 9e (glob `docs/problems/*.open.md` / `*.known-error.md` / `*.verifying.md` / `*.parked.md`; rank open + known-error by WSJF; Verification Queue sorted by `Released date ASC` with same-day tiebreak by ID ASC per ADR-022 + P048; Parked section). It does NOT re-rank — existing WSJF values on ticket files are trusted; the refresh is a render, not a re-rank. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Drift on the VQ sort direction re-opens P150.
182
182
 
183
+ **Likely-verified cell shape (P186)**: the `Likely verified?` column carries an **evidence-first** cell — `yes — observed: <evidence>` / `no — not observed` / `no — observed regression`. At batch grain the refresh writes the per-pair cell from the per-pair transition context: a `verifying` destination defaults to `no — not observed` (the batch just released the fix; evidence accrues subsequently); a `close` destination assumes session-observed evidence was the trigger for the batch close (the upstream caller — `run-retro` Step 4a, `review-problems` Step 9d — already verified the evidence) and the row exits the queue (not re-rendered as VQ). <!-- LIKELY-VERIFIED-CELL-SHAPE: evidence-based per P186 --> Drift on the cell shape re-opens P186.
184
+
183
185
  ```bash
184
186
  git add docs/problems/README.md
185
187
  ```