@windyroad/itil 0.39.0-preview.471 → 0.39.0-preview.472

Files changed (4)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@windyroad/itil",
-  "version": "0.39.0-preview.471",
+  "version": "0.39.0-preview.472",
   "description": "ITIL-aligned IT service management for Claude Code (problem, and future incident/change skills)",
   "bin": {
     "windyroad-itil": "./bin/install.mjs"

package/skills/work-problems/SKILL.md CHANGED

@@ -212,6 +212,31 @@ Stop the loop and report a summary if any of these are true:
 For stop-conditions #1 and #3 (no actionable problems / all blocked), Step 2.5 still runs — it reads the accumulated `outstanding_questions` queue from `.afk-run-state/outstanding-questions.jsonl` and presents the batch. Empty queue → no `AskUserQuestion` fires; non-empty queue → batched per ADR-013 Rule 1 cap (≤4 per call, sequential if >4).
+### Step 2.4: Pre-`ALL_DONE` gate sequence (UNCONDITIONAL — fires before every `ALL_DONE` emit, P341)
+Before the orchestrator emits the final `ALL_DONE` sentinel for the AFK loop, it MUST run the following gate sequence. The sequence fires **unconditionally** — at every stop-condition (`#1`, `#2`, `#3` per Step 2) AND at every halt-path that emits a final AFK summary AND on quota-exhaustion / natural loop end. The sequence has three parts that MUST complete in order; the structural rule is `ALL_DONE` emits ONLY after (a) AND (b) complete cleanly. Per-state subdir layout reminder: this step's order in the SKILL is logical (Step 2.4 fires *between* the Step 2 stop-check and the Step 2.5 surfacing routine *only as a wrapper*); the numerical ordering reflects the conceptual sequence (Step 2.4 wraps Step 2.5 + the new retro gate, then Step 2.5/2.5b execute as gate (a)'s worker).
+**Gate (a) — Outstanding-questions surface.** Read `.afk-run-state/outstanding-questions.jsonl`. If non-empty, invoke Step 2.5b's surfacing routine to present the accumulated queue (via `AskUserQuestion`-when-available-else-table per ADR-013 Rule 1 / Rule 6). On completion, truncate the queue file. If the queue is empty, gate (a) returns immediately. The surfacing routine is the existing Step 2.5b — Step 2.4 does NOT re-implement; it sequences.
+**Gate (b) — Session-level retro.** Invoke `/wr-retrospective:run-retro` via the Skill tool. This is the **orchestrator-main-turn session-level retro**, distinct from the per-iter retro fired inside each iter subprocess (per P086 / Step 5 retro-on-exit clause). The session-level retro covers cross-iter patterns, friction observations, framework-improvement candidates, and the AFK loop's overall trajectory — surface visible only after multiple iters have completed. Retro commits its own work per ADR-014; any tickets retro creates ride retro's own commit, and the orchestrator picks them up on the *next* invocation of `/wr-itil:work-problems` rather than re-entering the loop here.
+**Gate (c) — Emit `ALL_DONE`.** The sentinel emits ONLY after both gate (a) and gate (b) complete. The final summary (per Output Format below) includes the Session Cost section and the Outstanding Design Questions table (when gate (a)'s fallback branch fired). `ALL_DONE` is the single canonical emit position — Step 2.5 no longer emits `ALL_DONE` directly; its closing prose hands control to Step 2.4 (b) per the cross-reference.
+```
+ALL_DONE
+```
+**Hard-fail mode (halt with directive instead of emit `ALL_DONE`).** If either gate cannot complete to a clean state, the orchestrator MUST halt with a clear directive rather than emit `ALL_DONE`. The halt is recoverable — the user returns, satisfies the gate, and re-invokes `/wr-itil:work-problems` (which observes the now-clean state and emits `ALL_DONE` on the natural Step 2 → Step 2.4 path).
+Halt triggers:
+- **Gate (a) cannot complete**: queue has user-input-required entries AND `AskUserQuestion` is unavailable AND the fallback Outstanding Design Questions table cannot render (e.g. write error to `.afk-run-state/`). The halt directive cites the queue file path + entry count + the rendering failure.
+- **Gate (b) cannot complete**: `/wr-retrospective:run-retro` returns a non-zero exit code or the Skill tool itself is unavailable. The halt directive cites the run-retro failure mode (skill-unavailable / non-zero exit / commit-gate rejection per ADR-014). Retro is non-blocking *within* the iter subprocess (per Step 5's retro-on-exit clause) but **load-bearing** at the orchestrator-main-turn session-level gate — these are distinct surfaces.
+**Why unconditional**: prior to this gate, Step 2.5's outstanding-questions surface fired conditionally on stop-condition #2; stop-conditions #1 and #3 did NOT route through it unless the queue happened to be non-empty AND the agent remembered the cross-reference. Session-level retro was implicit — only per-iter retros existed. The structural gap was that `ALL_DONE` could emit while direction-class observations remained queued AND without a session-level retro running — both gates were nominally documented but neither was a hard prerequisite. Step 2.4 closes this by making the gate sequence a hard, unconditional prerequisite. The 2026-05-31 user direction codified the invariant: *"the work-problems skill MUST surface the outstanding questions at the end before emitting ALL_DONE. It MUST then run a retro. Only then should it emit ALL_DONE"* (P341 Description verbatim).
+**Composition**: gate (a) inherits the Step 2.5 / Step 2.5b surfacing routine without modification — the new structure is a wrapper, not a re-implementation. Gate (b) is the orchestrator-level extension of P086 (which fires retro at iter-subprocess level only). Gate (c) is the same `ALL_DONE` sentinel; only its emit position is amended. The pre-existing P126 cross-reference principle (`halt-paths-must-route-design-questions-through-Step-2.5b`) is preserved — halt-paths still route through Step 2.5b; the only addition is that even *successful* loop ends now route through Step 2.4 (a)+(b) before `ALL_DONE`. Per ADR-044 framework-resolution boundary: the agent-internal trust-boundary for *when* to surface is now framework-resolved (unconditional pre-`ALL_DONE`); the user-input surface *within* gate (a) is unchanged (still ADR-013 Rule 1 batched-AskUserQuestion when available, Rule 6 table fallback otherwise).
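Reviewer sketch (not part of the diff): the ordering that Step 2.4 describes could be expressed roughly as below. Every function name here (`surface_questions`, `run_session_retro`, `gate_a`, `gate_b`, `pre_all_done`) is hypothetical, and the two stubs only stand in for Step 2.5b's surfacing routine and the `/wr-retrospective:run-retro` invocation. The queue path is the one the SKILL text names.

```shell
#!/bin/sh
# Hypothetical sketch of the Step 2.4 ordering: (a) surface, (b) retro, (c) emit.
QUEUE_FILE="${QUEUE_FILE:-.afk-run-state/outstanding-questions.jsonl}"

# Stub for Step 2.5b's surfacing routine (assumption: it just prints the queue).
surface_questions() { cat "$1"; }

# Stub for the session-level retro (assumption: exit status signals success).
run_session_retro() { return 0; }

# Gate (a): surface the queue if it is non-empty, then truncate it.
gate_a() {
  if [ -s "$QUEUE_FILE" ]; then
    surface_questions "$QUEUE_FILE" || return 1
    : > "$QUEUE_FILE" || return 1
  fi
  return 0
}

# Gate (b): the session-level retro must finish cleanly.
gate_b() { run_session_retro; }

# Gate (c): emit ALL_DONE only after (a) and (b); otherwise halt with a directive.
pre_all_done() {
  if ! gate_a; then
    echo "HALT: gate (a) could not complete; queue file: $QUEUE_FILE"
    return 1
  fi
  if ! gate_b; then
    echo "HALT: gate (b) could not complete; session-level retro failed"
    return 1
  fi
  echo "ALL_DONE"
}
```

With the default stubs and an empty or missing queue file, `pre_all_done` prints `ALL_DONE`. If either gate fails, it prints a halt line and returns non-zero without emitting the sentinel, which mirrors the hard-fail mode in the text.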
 ### Step 2.5: Surface accumulated outstanding questions at loop end (P135 Phase 3 — default emit shape)
 Per ADR-044 framework-resolution boundary: human input is for direction-setting / deviation-approval / one-time-override / silent-framework / taste / authentic-correction (six categories). Across N iters, those observations accumulate at iter level (`ITERATION_SUMMARY.outstanding_questions`) and persist to a session-level queue file. Loop-end Step 2.5 reads, ranks, and presents the batch.
@@ -228,13 +253,7 @@ Per ADR-044 framework-resolution boundary: human input is for direction-setting
 **4. Cleanup.** After all entries are resolved (whether via `AskUserQuestion` or table), truncate `.afk-run-state/outstanding-questions.jsonl` to empty. The next AFK loop starts with a clean queue.
-**5. Emit the final summary + `ALL_DONE`.** The summary includes the Outstanding Design Questions table when Step 2.5b's fallback branch fired (see Output Format). When Step 2.5b's default branch fired (`AskUserQuestion` was available), the answers have already been written back; the table is omitted from the summary.
-```
-ALL_DONE
-```
-This sentinel line allows external scripts to detect completion.
+**5. Cleanup + hand control to Step 2.4 (b) for session-level retro (P341).** Step 2.5 is the worker of Step 2.4 gate (a); after gate (a)'s surfacing routine completes and the queue file is truncated, control passes to Step 2.4 gate (b) for the session-level retro. The final summary (including the Outstanding Design Questions table when Step 2.5b's fallback branch fired) is prepared here but the `ALL_DONE` sentinel emits at Step 2.4 (c) AFTER retro completes — not at Step 2.5 directly. This makes Step 2.4 the single canonical `ALL_DONE` emit position; external scripts watching for completion read the sentinel from the post-retro position per Step 2.4.
 ### Step 2.5b: Surface accumulated user-answerable skips (reusable surfacing routine, P122 + P126)
@@ -390,8 +409,16 @@ rm -f "$ITER_JSON"
 1. **Context**: this is one iteration of the AFK work-problems loop. The user is AFK. The orchestrator selected `P<NNN> (<title>)` as the highest-WSJF actionable ticket.
 2. **Task**: apply the `/wr-itil:manage-problem` workflow for `work highest WSJF problem that can be progressed non-interactively as the user is AFK`. Follow manage-problem SKILL.md verbatim, including architect / jtbd / style-guide / voice-tone gate reviews and the commit gate (manage-problem Step 11). Because this subprocess has the Agent tool in its own surface, the normal review-via-subagent paths work — no inline-verdict fallback needed.
-3. **Constraints**: commit the completed work per ADR-014. Do NOT push, do NOT run `push:watch`, do NOT run `release:watch` — the orchestrator's Step 6.5 owns release cadence. Do NOT invoke `capture-*` background skills (AFK carve-out — ADR-032). Do NOT use `ScheduleWakeup` under any circumstance (P083 — iteration workers must not self-reschedule). **NEVER call `AskUserQuestion` mid-loop in AFK** (P135 / ADR-044): direction / deviation-approval / one-time-override / silent-framework observations queue at `ITERATION_SUMMARY.outstanding_questions` for loop-end batched presentation. **This includes the manage-problem substance-confirm-before-build guard (ADR-074 (Confirm a decision's substance before building dependent work)):** when the propose-fix step detects that the fix builds on a born-`proposed` decision whose substance is unconfirmed (via `wr-architect-is-decision-unconfirmed`), the iter does NOT implement on it and does NOT ask mid-loop — it queues a `category: "direction"` entry naming the unconfirmed ADR + its Decision Outcome for loop-end confirmation, and routes the ticket to `action: skipped`, `skip_reason_category: user-answerable`. Building on the unconfirmed substance instead (or guessing the choice) is the P315 failure this guard exists to prevent. The queued substance-confirm is a legitimate cat-1 direction ask — it is NOT counted as lazy in the Step 2d Ask Hygiene Pass (ADR-074 lazy-count exclusion). Per-iter `AskUserQuestion` calls are sub-contracting framework-resolved decisions back to the user (lazy deferral per Step 2d Ask Hygiene Pass classification). Non-interactive defaults apply per ADR-013 Rule 6 + ADR-044's framework-resolution boundary. **Treat the user as transient** (P130): even when observably present at orchestrator dispatch time, the user may answer one question and disappear for hours; presence is not a reliable signal and is not the goal. 
The iter's job is to progress the ticket and accumulate questions for batched surfacing — not to ask "is it OK to proceed?" at a mechanical-stage boundary. **Do NOT poll `bats` output with a bats-console-summary regex against TAP-format output** (P146 — bash until-loop-deadlock antipattern). The bats-console-summary line `<N> tests, <M> failures` is emitted ONLY by bats's *default* (non-TAP) formatter; `bats --tap` does not emit a console summary, so a polling loop of shape `until [ -f $OUT ] && grep -qE '^[0-9]+ tests?,' $OUT; do sleep 5; done` spins forever after bats completes (silent deadlock — no error, no exit; recovery requires manual SIGTERM with metadata loss per the P146/P147 stuck-before-emit subclass). When you need to wait on a backgrounded bats run, prefer `wait $bg_pid` (Unix idiom — completion signaled by process exit, no regex required) or, for the Bash tool, `run_in_background=true` + `BashOutput` polling on the tool's exit-state field rather than regex-poll on stdout. If you genuinely must regex-poll TAP output, anchor on the TAP plan line `^[0-9]+\.\.[0-9]+` (e.g. `1..1455`) — TAP's plan line is emitted on completion and is format-stable across bats versions; the bats-console-summary line is not. The console-summary vs TAP-format divergence is the load-bearing detail: `bats` and `bats --tap` produce structurally different stdout, and the antipattern assumes the former when iter dispatch typically uses the latter. **Do NOT poll subprocess completion with `pgrep -f '<pattern>'` inside an `until` / `while` loop** (P232 — self-referential pgrep deadlock; sibling variant of P146). `pgrep -f` matches against the FULL command line of every running process, so the polling loop's own `zsh -c` argument (which contains the literal `pgrep -f '<pattern>'` text) matches itself; with multiple concurrent polling loops, each loop matches the others and spins forever. Worked example of the antipattern: `until ! 
pgrep -f 'bats --recursive' > /dev/null 2>&1; do sleep 5; done` — the 2026-05-16 P232 deadlock witness; 4 concurrent polling loops each matched the others' command lines while no actual bats process ran; 45 min wall-clock + $20-30 wasted before manual SIGTERM. The same self-reference shape applies to `while pgrep -f ...; do sleep; done` and to `until ! pkill -0 -f '<pattern>'` / `while pkill -0 -f '<pattern>'` (signal-0 polling). The structural fix is the same as P146: prefer `wait $bg_pid` (Unix idiom — shell-native completion signal, no regex / no pgrep) or Bash-tool `run_in_background=true` + `BashOutput` polling (harness-tracked completion state). The hook `packages/itil/hooks/itil-bash-polling-antipattern-detect.sh` denies these shapes at PreToolUse:Bash, but the prompt rule belongs here too — structural enforcement + prompt discipline together close the class.
-4. **Retro-on-exit (P086)**: before emitting `ITERATION_SUMMARY`, invoke `/wr-retrospective:run-retro`. Retro runs INSIDE this subprocess so its Step 2b pipeline-instability scan has access to the iteration's rich tool-call history (hook misbehaviour, repeat-workaround patterns, subagent-delegation friction, release-path instability). Retro may create tickets or update `docs/BRIEFING.md` — run-retro commits its own work per ADR-014; any tickets it creates ride into either the iteration's own commit (if retro runs before the main commit) or a retro-owned follow-up commit, and the orchestrator picks them up on the next Step 1 scan. Proceed to `ITERATION_SUMMARY` emission regardless of retro findings — retro is non-blocking (do not block on retro): if retro fails or surfaces findings, the iteration still returns a summary so the AFK loop does not silently halt on a flaky retro run.
+3. **Constraints**: commit the completed work per ADR-014. Do NOT push, do NOT run `push:watch`, do NOT run `release:watch` — the orchestrator's Step 6.5 owns release cadence. Do NOT invoke `capture-*` background skills mid-iter (AFK carve-out — ADR-032), **EXCEPT for retro-surfaced observations of recurring class-of-behaviour** — those route to `/wr-itil:capture-problem` per the **P342 mechanical-stage carve-out** (see retro-on-exit constraint #4 below; same trust-boundary as `/wr-retrospective:run-retro` Step 4a verification close-on-evidence — P342). Do NOT use `ScheduleWakeup` under any circumstance (P083 — iteration workers must not self-reschedule). **NEVER call `AskUserQuestion` mid-loop in AFK** (P135 / ADR-044): direction / deviation-approval / one-time-override / silent-framework observations queue at `ITERATION_SUMMARY.outstanding_questions` for loop-end batched presentation. **This includes the manage-problem substance-confirm-before-build guard (ADR-074 (Confirm a decision's substance before building dependent work)):** when the propose-fix step detects that the fix builds on a born-`proposed` decision whose substance is unconfirmed (via `wr-architect-is-decision-unconfirmed`), the iter does NOT implement on it and does NOT ask mid-loop — it queues a `category: "direction"` entry naming the unconfirmed ADR + its Decision Outcome for loop-end confirmation, and routes the ticket to `action: skipped`, `skip_reason_category: user-answerable`. Building on the unconfirmed substance instead (or guessing the choice) is the P315 failure this guard exists to prevent. The queued substance-confirm is a legitimate cat-1 direction ask — it is NOT counted as lazy in the Step 2d Ask Hygiene Pass (ADR-074 lazy-count exclusion). Per-iter `AskUserQuestion` calls are sub-contracting framework-resolved decisions back to the user (lazy deferral per Step 2d Ask Hygiene Pass classification). 
Non-interactive defaults apply per ADR-013 Rule 6 + ADR-044's framework-resolution boundary. **Treat the user as transient** (P130): even when observably present at orchestrator dispatch time, the user may answer one question and disappear for hours; presence is not a reliable signal and is not the goal. The iter's job is to progress the ticket and accumulate questions for batched surfacing — not to ask "is it OK to proceed?" at a mechanical-stage boundary. **Do NOT poll `bats` output with a bats-console-summary regex against TAP-format output** (P146 — bash until-loop-deadlock antipattern). The bats-console-summary line `<N> tests, <M> failures` is emitted ONLY by bats's *default* (non-TAP) formatter; `bats --tap` does not emit a console summary, so a polling loop of shape `until [ -f $OUT ] && grep -qE '^[0-9]+ tests?,' $OUT; do sleep 5; done` spins forever after bats completes (silent deadlock — no error, no exit; recovery requires manual SIGTERM with metadata loss per the P146/P147 stuck-before-emit subclass). When you need to wait on a backgrounded bats run, prefer `wait $bg_pid` (Unix idiom — completion signaled by process exit, no regex required) or, for the Bash tool, `run_in_background=true` + `BashOutput` polling on the tool's exit-state field rather than regex-poll on stdout. If you genuinely must regex-poll TAP output, anchor on the TAP plan line `^[0-9]+\.\.[0-9]+` (e.g. `1..1455`) — TAP's plan line is emitted on completion and is format-stable across bats versions; the bats-console-summary line is not. The console-summary vs TAP-format divergence is the load-bearing detail: `bats` and `bats --tap` produce structurally different stdout, and the antipattern assumes the former when iter dispatch typically uses the latter. **Do NOT poll subprocess completion with `pgrep -f '<pattern>'` inside an `until` / `while` loop** (P232 — self-referential pgrep deadlock; sibling variant of P146). 
`pgrep -f` matches against the FULL command line of every running process, so the polling loop's own `zsh -c` argument (which contains the literal `pgrep -f '<pattern>'` text) matches itself; with multiple concurrent polling loops, each loop matches the others and spins forever. Worked example of the antipattern: `until ! pgrep -f 'bats --recursive' > /dev/null 2>&1; do sleep 5; done` — the 2026-05-16 P232 deadlock witness; 4 concurrent polling loops each matched the others' command lines while no actual bats process ran; 45 min wall-clock + $20-30 wasted before manual SIGTERM. The same self-reference shape applies to `while pgrep -f ...; do sleep; done` and to `until ! pkill -0 -f '<pattern>'` / `while pkill -0 -f '<pattern>'` (signal-0 polling). The structural fix is the same as P146: prefer `wait $bg_pid` (Unix idiom — shell-native completion signal, no regex / no pgrep) or Bash-tool `run_in_background=true` + `BashOutput` polling (harness-tracked completion state). The hook `packages/itil/hooks/itil-bash-polling-antipattern-detect.sh` denies these shapes at PreToolUse:Bash, but the prompt rule belongs here too — structural enforcement + prompt discipline together close the class.
+4. **Retro-on-exit (P086) + retro-surfaced observation classification (P342)**: before emitting `ITERATION_SUMMARY`, invoke `/wr-retrospective:run-retro`. Retro runs INSIDE this subprocess so its Step 2b pipeline-instability scan has access to the iteration's rich tool-call history (hook misbehaviour, repeat-workaround patterns, subagent-delegation friction, release-path instability). Retro may create tickets or update `docs/BRIEFING.md` — run-retro commits its own work per ADR-014; any tickets it creates ride into either the iteration's own commit (if retro runs before the main commit) or a retro-owned follow-up commit, and the orchestrator picks them up on the next Step 1 scan. Proceed to `ITERATION_SUMMARY` emission regardless of retro findings — retro is non-blocking at the iter-subprocess layer (do not block on retro): if retro fails or surfaces findings, the iteration still returns a summary so the AFK loop does not silently halt on a flaky retro run. (Session-level retro at the orchestrator-main-turn layer per Step 2.4 gate (b) IS load-bearing — distinct surface; see Step 2.4 prose for the orchestrator-layer halt semantics.)
+   **P342 classification taxonomy — retro-surfaced observations.** When the iter-retro's Step 4b Stage 1 surfaces a ticketable observation, the routing depends on classification:
+   - **Recurring class-of-behaviour observation** (sibling iters hit same pattern; SKILL-contract drift; hook misbehaviour; framework-gap; pipeline instability with concrete fix path): **auto-ticket via `/wr-itil:capture-problem`** (or `/wr-itil:manage-problem` if capture-problem sibling not yet available). This is the **mechanical-stage carve-out per run-retro Step 4a precedent** — the retro IS the system designed to mechanically observe and surface recurring class-of-behaviour, so its output ticketing is policy-authorised silent proceed per ADR-013 Rule 5. The capture-problem dispatch commits its own ticket per ADR-014; the ticket enters the WSJF queue on the orchestrator's next Step 1 scan. This is the routing that closes the silent-queue-accumulation gap P342 names.
+   - **Direction-setting observation** (genuine user-judgment-bound question — design choice, deviation-approval, framework boundary): route to `outstanding_questions` entry per the ITERATION_SUMMARY schema. Orchestrator-level Step 2.5 surfaces these at loop end per the existing batched `AskUserQuestion` flow. These observations preserve the user's authority surface and MUST NOT auto-ticket.
+   - **Ambiguous** (retro cannot cleanly distinguish recurring-class from direction-setting): **default to auto-ticket** per the P342 trust-boundary asymmetry. The ticket lifecycle (`/wr-itil:manage-problem` Step 9d / `/wr-itil:review-problems` Step 4) will surface any embedded direction-setting question through the standard problem-review flow. Defaulting to queue would re-introduce the silent-queue-accumulation hazard P342 closes; defaulting to ticket has zero observation-drop risk.
+   The classification is silent agent judgement (no `AskUserQuestion` per observation — that would re-route mechanical decisions back to the user, the lazy-deferral surface P135 / ADR-044 close). The mirror locus is run-retro `Step 4b` — same trust-boundary applies whether retro fires in iter context (this surface) OR standalone in main turn (run-retro Step 4b).
 5. **Output**: end the final message with the `ITERATION_SUMMARY` block defined below — this is how the orchestrator consumes the iteration's result.
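Reviewer sketch (not part of the diff): the P342 routing in constraint 4 reduces to a three-way decision with an auto-ticket default. The function name `route_observation` and the class labels `recurring`, `direction`, and `ambiguous` are hypothetical. The SKILL text describes the classification as silent agent judgement, not a literal switch, so this only illustrates the outcome of each branch.

```shell
#!/bin/sh
# Hypothetical mapping from an observation class to its routing target.
route_observation() {
  case "$1" in
    recurring) echo "capture-problem" ;;        # auto-ticket via /wr-itil:capture-problem
    direction) echo "outstanding_questions" ;;  # queue for loop-end surfacing
    *)         echo "capture-problem" ;;        # ambiguous: default to auto-ticket
  esac
}
```

Only the `direction` class reaches the `outstanding_questions` queue. Everything else becomes a ticket, which is the asymmetry the diff relies on to avoid dropped observations.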
 **Return-summary contract** (unchanged from the P077 amendment — the parse shape is dispatch-mechanism-agnostic). The subprocess's final message MUST end with this structured block, extracted by the orchestrator from the JSON `.result` field:
@@ -728,6 +755,7 @@ When `AskUserQuestion` is unavailable or the user is AFK, the skill (and the del
 | Prior-session partial work detected at start (session-continuity dirty: untracked `docs/decisions/*.proposed.md` / `docs/problems/*.md`, `.afk-run-state/iter-*.json` with `is_error: true` or `api_error_status >= 400`, stale `.claude/worktrees/*`, uncommitted SKILL.md/source/ADR edits) | Halt the loop with a structured Prior-Session State report in the AFK summary. Do NOT attempt non-interactive resume. Interactive invocations prompt via `AskUserQuestion` with 4 options (resume / discard / leave-and-lower-priority / halt). Per P109 + ADR-013 Rule 6 (Step 0 session-continuity detection pass). |
 | Fix verification needed | Skip problem, add to "needs verification" list |
 | Stop-condition #2 with user-answerable skip-reasons | Default: call AskUserQuestion (batched, ≤4 per call, sequential when >4) — the orchestrator's main turn is interactive by construction per ADR-032 subprocess-boundary; user is presumed at the keyboard. Fallback: emit Outstanding Design Questions table when AskUserQuestion is unavailable (Rule 6 fail-safe). Per ADR-013 Rule 1 + P122 (Step 2.5). |
+| Pre-`ALL_DONE` gate sequence at any loop end (every stop-condition + every halt-path that emits a final summary + quota-exhaustion natural end) | Run Step 2.4 sequence UNCONDITIONALLY before `ALL_DONE` emit: gate (a) outstanding-questions surface via Step 2.5b; gate (b) session-level retro via `/wr-retrospective:run-retro`; gate (c) emit `ALL_DONE` only after (a) AND (b) complete. Hard-fail mode: if either gate cannot complete cleanly, halt with directive instead of emit `ALL_DONE` — recovery is the user satisfying the gate and re-invoking the skill. Per ADR-044 framework-resolution boundary + ADR-013 + ADR-014 (retro commits its own work) + P086 (extends iter-level retro to orchestrator-level) + P341 (Step 2.4). |
 | Halt-path final summary with accumulated user-answerable skips (CI failure / Rule 5 above-appetite / dirty-unknown / session-continuity / fetch failure) | Run Step 2.5b's surfacing routine before emitting the halt path's final AFK summary. Step 2.5b is gated on ≥1 accumulated user-answerable skip — empty-skip halts skip the routine. Step 2.5b surfaces *prior-iter accumulated user-answerable skips only*; it does NOT ask the user how to remediate the halt cause itself (CI failure / above-appetite state / dirty-unknown state remain halt-with-bug-signal). Per ADR-013 Rule 1 + ADR-032 + P126 (`halt-paths-must-route-design-questions-through-Step-2.5b`). |
 | Unexpected dirty state between iterations | Halt the loop. Report the `git status --porcelain` output, the last iteration's reported outcome, and the divergence — per P036 (Step 6.75). Run Step 2.5b before emitting the halt summary if ≥1 accumulated user-answerable skip from prior iters (P126). Do NOT attempt non-interactive recovery of the dirty state itself. |
 | External root cause detected at Open → Known Error, or at park with `upstream-blocked` reason | Append the stable `- **Upstream report pending** — external dependency identified; invoke /wr-itil:report-upstream when ready` marker to the ticket's `## Related` section; do NOT auto-invoke `/wr-itil:report-upstream` (Step 6 security-path branch is interactive — per ADR-024 Consequences). Use the already-noted grep check to avoid duplicate lines. Per P063 + ADR-013 Rule 6. |
@@ -827,10 +855,14 @@ Extracted from each iteration subprocess's `claude -p --output-format json` resp
 ALL_DONE
 ```
+**`ALL_DONE` position (P341 Step 2.4).** The `ALL_DONE` sentinel is the FINAL line of the rendered summary, emitted at Step 2.4 gate (c) — AFTER Step 2.4 gate (a) (outstanding-questions surface via Step 2.5b) AND AFTER Step 2.4 gate (b) (session-level retro via `/wr-retrospective:run-retro`) BOTH complete cleanly. The session-level retro's own commit + any tickets it creates land BEFORE the `ALL_DONE` emit. External scripts watching for AFK-loop completion can rely on `ALL_DONE` as an honest sentinel: when it appears, both gates have completed. Hard-fail mode (halt with directive) replaces `ALL_DONE` when either gate cannot complete — adopters should treat the absence of `ALL_DONE` paired with a halt-directive line as the recoverable-pause shape (user satisfies the gate on return; re-invocation emits `ALL_DONE` cleanly).
 When every skipped ticket is in the `upstream-blocked` category (stop-condition #3) or there are no skipped tickets (stop-condition #1), omit the Outstanding Design Questions section entirely rather than rendering an empty heading. The Session Cost section always renders when at least one iteration ran.
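Reviewer sketch (not part of the diff): an external script that relies on the sentinel position described above might check only the final line of the rendered summary. The function name `is_all_done` and the idea of reading the summary from a file are assumptions for illustration. The check is strict, so a trailing blank line after the sentinel would count as "not done".

```shell
#!/bin/sh
# Succeeds only when the last line of the given summary file is exactly ALL_DONE.
is_all_done() {
  [ "$(tail -n 1 "$1")" = "ALL_DONE" ]
}
```

A summary that ends with a halt directive instead of the sentinel fails this check, which matches the recoverable-pause shape the paragraph describes.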
 ## Related
+- **P341** (`docs/problems/open/341-work-problems-skill-must-surface-outstanding-questions-then-run-retro-before-emitting-all-done.md`) — driver for Step 2.4 Pre-`ALL_DONE` gate sequence (UNCONDITIONAL fire of outstanding-questions surface + session-level retro before `ALL_DONE` emit). 2026-05-31 user direction (verbatim in ticket Description): *"The work-problems skill MUST surface the outstanding questions at the end before emitting ALL_DONE. It MUST then run a retro. Only then should it emit ALL_DONE."* Closes the structural gap that allowed `ALL_DONE` to emit while direction-class observations remained queued AND without a session-level retro running. Behavioural second-source: `test/work-problems-p341-pre-all-done-gate.bats`. Composes with P086 (extends iter-level retro-on-exit to orchestrator-level), P126 (preserves `halt-paths-must-route-design-questions-through-Step-2.5b` principle), ADR-014 (retro commits its own work), ADR-044 (framework-resolution boundary for when to surface — now framework-resolved as unconditional pre-`ALL_DONE`).
+- **P342** (`docs/problems/open/342-iter-retros-queue-observations-as-outstanding-questions-instead-of-auto-ticketing-same-trust-boundary-as-step-4a.md`) — driver for Step 5 iter-prompt body's retro-surfaced observation classification taxonomy and capture-* carve-out. Iter retros' observations of recurring class-of-behaviour now route to `/wr-itil:capture-problem` (mechanical-stage carve-out per run-retro Step 4a precedent); only direction-setting observations queue at `outstanding_questions`; ambiguous defaults to auto-ticket per the trust-boundary asymmetry. The "no `capture-*` siblings mid-loop" rule is preserved for non-retro mid-iter capture (P078-class spam); the carve-out is bounded to the retro path. Sibling locus: `packages/retrospective/skills/run-retro/SKILL.md` Step 4b carries the symmetric mirror (same trust-boundary fires whether retro runs in iter context OR standalone in main turn). Behavioural second-source: `test/work-problems-p342-retro-auto-ticket-carveout.bats` + `packages/retrospective/skills/run-retro/test/run-retro-step-4b-retro-auto-ticket-carveout.bats`. Composes with run-retro Step 4a (precedent), ADR-013 Rule 5 (policy-authorised silent proceed), ADR-032 (foreground-spawns-N-background fanout already documented for Stage 1 in run-retro Step 4b), ADR-044 (mechanical-stage carve-out), P130 (mid-loop AskUserQuestion ban unchanged), P078 (capture-on-correction — distinct trigger surface; both end in capture but for different signals).
 - **P121** (`docs/problems/121-afk-orchestrator-should-sigterm-stuck-subprocesses-after-idle-timeout.verifying.md`) — driver for Step 5's backgrounded-poll-loop dispatch shape (replacing the prior foreground-synchronous form) and the idle-timeout SIGTERM branch. The 2026-04-25 P118 iter 5 evidence: an iteration subprocess sat idle ~70 min after its final commit, then SIGTERM produced a clean JSON exit-flush. Fix: orchestrator backgrounds the subprocess, polls every 60s, computes `LAST_ACTIVITY_MARK = max(DISPATCH_START_EPOCH, git log -1 --format=%at HEAD)`, and sends SIGTERM when `now - LAST_ACTIVITY_MARK > WORK_PROBLEMS_IDLE_TIMEOUT_S` (default 3600s = 60 min). Behavioural second-source: `test/work-problems-step-5-idle-timeout-sigterm.bats` exercises a fake `claude -p` shim that sleeps past the threshold and asserts SIGTERM, JSON exit-flush, env-var override, and within-threshold no-fire. Step 6's per-iter progress line SHOULD annotate `(SIGTERM_SENT)` when the branch fires so users can distinguish recovered iters from natural completions. ADR-032's subprocess-boundary variant amended 2026-04-26 with the backgrounded-poll-loop refinement.
 - **P146** (`docs/problems/146-afk-iteration-subprocess-bash-until-loop-polls-bats-output-with-bats-console-regex-against-tap-format.verifying.md`) — driver for Step 5 iteration prompt body's bats-output-polling-discipline clause. The 2026-04-29 incident (iter 1, PID 23580 child PID 16408) saw a `bash until`-loop poll a backgrounded bats output file with regex `^[0-9]+ tests?,` (bats's *default* console-summary format) against `bats --tap` output that never emits that line — silent infinite spin after bats completed; manual SIGTERM at 68m34s wall-clock; metadata loss per the P147 stuck-before-emit subclass. The polling idiom is NOT taught by any SKILL.md (audit confirmed via repo grep) — it is agent-learned from training data. Fix: prompt-discipline rule in the iteration prompt body's Constraints list explicitly forbidding the antipattern, naming `wait $bg_pid` (or Bash-tool `run_in_background=true` + `BashOutput`) as the safe substitute, and citing the TAP-vs-console-summary divergence so future contributors don't "fix" the rule incorrectly. Behavioural second-source: `test/work-problems-step-5-bats-polling-discipline.bats` asserts the prohibition phrase, the safe-substitute pointer, the P146 cite, the divergence explanation, and the Related-section cite.
 - **P232** (`docs/problems/verifying/232-bash-until-loop-pgrep-self-referential-deadlock-new-variant-of-p146.md`) — sibling variant of P146; driver for the second clause in Step 5 iter prompt's polling-discipline rule plus the structural PreToolUse:Bash hook at `packages/itil/hooks/itil-bash-polling-antipattern-detect.sh`. The 2026-05-16 incident (iter 4, P132 Phase 2a-iii-B) saw 4 concurrent `until ! pgrep -f 'bats --recursive'` polling loops each match the OTHER loops' command lines and spin forever after the main commit landed; 45 min wall-clock + $20-30 wasted before manual SIGTERM. Two-layer fix: prompt-discipline clause naming the self-reference failure mode with worked-example syntax (`until ! pgrep -f ...`), PLUS PreToolUse:Bash hook denying `(until|while)[[:space:]]+!?[[:space:]]*(pgrep|pkill[[:space:]]+-0)` shapes with a deny message citing P232 and naming both recovery alternatives (`wait $bg_pid` shell-native, Bash-tool `BashOutput` harness-native). Behavioural second-source: `packages/itil/hooks/test/itil-bash-polling-antipattern-detect.bats` (positive cases — until/while pgrep, until/while pkill -0, heredoc; negative cases — one-shot pgrep, non-`-0` pkill, unrelated until/while, `wait $!`; advisory-message content cite). P146 prompt-only enforcement failed empirically in iter 4 of the very loop that ships it; P232 closes the class with structural enforcement.
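
The idle-timeout rule in the P121 entry can be sketched in plain shell. This is a minimal illustration, not the orchestrator's actual code: the helper name `idle_timeout_exceeded` and its argument order are assumptions, and the last-commit epoch is passed in as a parameter so the sketch runs without a git repository (in the SKILL it comes from `git log -1 --format=%at HEAD`).

```shell
#!/bin/sh
# Hypothetical helper (not from the package): exits 0 when the idle window
# has been exceeded, 1 otherwise.
# Arguments: now, dispatch-start epoch, last-commit epoch, timeout in seconds.
idle_timeout_exceeded() {
  now=$1
  dispatch_start=$2
  last_commit=$3
  timeout_s=$4

  # LAST_ACTIVITY_MARK = max(DISPATCH_START_EPOCH, last commit epoch)
  if [ "$last_commit" -gt "$dispatch_start" ]; then
    last_activity_mark=$last_commit
  else
    last_activity_mark=$dispatch_start
  fi

  # SIGTERM branch fires only when now - LAST_ACTIVITY_MARK > timeout
  [ $((now - last_activity_mark)) -gt "$timeout_s" ]
}

# Assumed wiring for illustration: the last commit landed at epoch 6000,
# the poll runs at epoch 10000, and the timeout is the 3600s default.
if idle_timeout_exceeded 10000 1000 6000 3600; then
  echo "idle: send SIGTERM"
else
  echo "active: keep polling"
fi
```

Taking the maximum of the two epochs means a subprocess that has not committed yet is measured from its dispatch time rather than from an older commit on `HEAD`.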

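The deny pattern quoted in the P232 entry can be exercised on its own with `grep -E`. The sketch below is an assumption about how such a check might look, not the contents of `itil-bash-polling-antipattern-detect.sh`; the helper name `is_polling_antipattern` is hypothetical, and only the regular expression is taken from the entry.

```shell
#!/bin/sh
# Deny pattern as quoted in the P232 entry.
POLLING_ANTIPATTERN='(until|while)[[:space:]]+!?[[:space:]]*(pgrep|pkill[[:space:]]+-0)'

# Hypothetical helper: exits 0 when the command text matches the pattern.
is_polling_antipattern() {
  printf '%s\n' "$1" | grep -Eq -- "$POLLING_ANTIPATTERN"
}

# Two polling shapes that should be denied, then two one-shot shapes
# that should be allowed.
for cmd in \
  "until ! pgrep -f 'bats --recursive'; do sleep 5; done" \
  "while pkill -0 -f bats; do sleep 5; done" \
  "pgrep -f bats" \
  'wait $!'
do
  if is_polling_antipattern "$cmd"; then
    echo "deny:  $cmd"
  else
    echo "allow: $cmd"
  fi
done
```

The pattern requires a leading `until` or `while`, so a one-shot `pgrep` or a `pkill` without `-0` passes through, which matches the negative cases the entry lists.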
package/skills/work-problems/test/work-problems-p341-pre-all-done-gate.bats ADDED

@@ -0,0 +1,122 @@
+#!/usr/bin/env bats
+# P341: work-problems SKILL must surface outstanding questions THEN run
+# a session-level retro BEFORE emitting ALL_DONE. The fix shape adds a
+# new step (Step 2.4 — Pre-ALL_DONE gate sequence) that fires
+# UNCONDITIONALLY before ALL_DONE emit, sequencing (a) outstanding-
+# questions surface + (b) session-level retro + (c) ALL_DONE.
+#
+# Hard-fail mode: if either gate cannot complete (user absent while the
+# queue has user-input-required entries, or retro fails), the SKILL.md MUST
+# direct the orchestrator to halt with a clear directive — NOT emit
+# ALL_DONE.
+#
+# Doc-lint contract assertions per ADR-037 Permitted Exception
+# (structural checks on prose contract; behavioural harness for SKILL.md
+# pending P081 Phase 2 / P012).
+#
+# @problem P341
+# @adr ADR-044 (Decision-Delegation Contract — direction-class observations are the protected surface this gate enforces)
+# @adr ADR-013 (structured user interaction — outstanding-questions surface is the load-bearing application)
+# @adr ADR-014 (governance skills commit own work — retro commits its own work)
+# @adr ADR-037 (skill-testing-strategy — Permitted Exception for prose contract)
+# @jtbd JTBD-006 (Progress the Backlog While I'm Away)
+# @jtbd JTBD-201 (audit-trail — ALL_DONE is honest sentinel post-amendment)
+setup() {
+  REPO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/../../../../.." && pwd)"
+  SKILL_MD="$REPO_ROOT/packages/itil/skills/work-problems/SKILL.md"
+}
+@test "work-problems P341: SKILL.md exists" {
+  [ -f "$SKILL_MD" ]
+}
+# ── Step 2.4 (or named equivalent) gate-sequence subsection presence ────────
+@test "work-problems P341: SKILL.md names a Pre-ALL_DONE gate sequence step" {
+  # The fix adds a new orchestrator-main-turn step that fires
+  # UNCONDITIONALLY before ALL_DONE emit. The step MUST be a markdown
+  # heading so cross-references resolve to a single source of truth.
+  run grep -nE '^#{3,4} Step 2\.4|Pre-`?ALL_DONE`? gate sequence' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P341: gate-sequence step is unconditional (fires before every ALL_DONE emit)" {
+  # The structural gap P341 closes is that Step 2.5 fires conditionally
+  # on stop-condition #2. The new step fires UNCONDITIONALLY before
+  # ALL_DONE emit regardless of stop-condition.
+  run grep -nE 'unconditionally|UNCONDITIONAL|every `?ALL_DONE`? emit|before every `?ALL_DONE`?' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P341: gate-sequence names outstanding-questions surface as gate (a)" {
+  # Sequence (a): Read .afk-run-state/outstanding-questions.jsonl; if
+  # non-empty, fire Step 2.5b's surfacing routine; truncate on completion.
+  run grep -nE 'outstanding-questions\.jsonl|outstanding-questions surface|Step 2\.5b.*surfacing' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P341: gate-sequence names session-level retro as gate (b)" {
+  # Sequence (b): Run session-level retro via /wr-retrospective:run-retro
+  # — covers cross-iter patterns, friction observations, framework-
+  # improvement candidates. Retro commits its own work per ADR-014.
+  run grep -nE 'session-level retro|/wr-retrospective:run-retro' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P341: gate-sequence names ALL_DONE emit as gate (c) ONLY after (a) and (b)" {
+  # Sequence (c): Emit ALL_DONE ONLY after both (a) and (b) complete.
+  # The ordering must be explicit so future authors don't re-permit a
+  # short-circuit.
+  run grep -nE 'ALL_DONE.*after.*both|ONLY after.*\(a\).*\(b\)|Emit `?ALL_DONE`? ONLY' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Hard-fail mode (halt instead of ALL_DONE) ──────────────────────────────
+@test "work-problems P341: gate-sequence directs halt-with-directive when either gate cannot complete" {
+  # Hard-fail mode: if outstanding-questions surface cannot complete OR
+  # retro fails, the SKILL.md MUST direct the orchestrator to halt with a
+  # clear directive — NOT emit ALL_DONE. Halt is recoverable; user
+  # returns, surfaces, completes the loop with ALL_DONE.
+  run grep -nE 'halt with.*directive|MUST.*halt.*NOT emit|halt instead of.*ALL_DONE|halt.*not.*ALL_DONE' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P341: gate-sequence mentions ADR-014 commit-ownership for retro work" {
+  # Retro commits its own work per ADR-014 — the orchestrator MUST NOT
+  # re-commit retro's output, AND retro is not silently dropped because
+  # the orchestrator forgot to invoke it.
+  run grep -nE 'retro commits its own work|run-retro.*ADR-014|retro.*per ADR-014' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Output Format ALL_DONE positioning ──────────────────────────────────────
+@test "work-problems P341: Output Format section reflects new ALL_DONE sequence" {
+  # The Output Format section MUST reference the gate sequence so the
+  # rendered ALL_DONE position is documented to follow Step 2.4. This
+  # protects against future authors who add new sections between the
+  # gate sequence and the ALL_DONE marker.
+  run grep -nE 'Step 2\.4|gate sequence|Pre-`?ALL_DONE`? gate' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Decision Table row (cross-reference to gate sequence) ───────────────────
+@test "work-problems P341: Non-Interactive Decision Making table carries a pre-ALL_DONE gate row" {
+  # The decisions table at the bottom of SKILL.md must carry a row that
+  # names the pre-ALL_DONE gate sequence so the decision summary is
+  # consistent with the Step prose. This prevents future readers from
+  # missing the gate when scanning the decisions table only.
+  run grep -nE '\| Pre-`?ALL_DONE`? gate|\| ALL_DONE.*gate sequence|\| Loop-end.*gate sequence' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Cross-reference to P341 + sibling P342 ──────────────────────────────────
+@test "work-problems P341: Related section cites P341 as the originating ticket" {
+  run grep -nE '\*\*P341\*\*|P341\b' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
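
The ordering these assertions describe, gate (a), then gate (b), then `ALL_DONE`, with a halt if either gate fails, could be sketched as follows. This is only an illustration of the sequencing: the function names are hypothetical stand-ins, and the real gates are orchestrator steps described in SKILL.md prose, not shell functions.

```shell
#!/bin/sh
# Hypothetical stand-ins for gates (a) and (b); each exits 0 on success.
surface_outstanding_questions() { return 0; }
run_session_retro() { return 0; }

# Runs (a), then (b), and prints ALL_DONE only if both complete cleanly.
# On failure it prints a halt directive and returns non-zero instead.
pre_all_done_gate() {
  if ! surface_outstanding_questions; then
    echo "HALT: outstanding questions not surfaced"
    return 1
  fi
  if ! run_session_retro; then
    echo "HALT: session-level retro failed"
    return 1
  fi
  echo "ALL_DONE"
}

pre_all_done_gate
```

Because each gate returns early on failure, there is no code path that reaches the `ALL_DONE` line without both gates having succeeded.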

package/skills/work-problems/test/work-problems-p342-retro-auto-ticket-carveout.bats ADDED

@@ -0,0 +1,120 @@
+#!/usr/bin/env bats
+# P342: Iter retros queue their own observations as outstanding-questions
+# for user-direction triage instead of auto-ticketing — same trust-
+# boundary as /wr-retrospective:run-retro Step 4a (verification close-on-
+# evidence).
+#
+# The fix shape amends the work-problems Step 5 iter-prompt body:
+#   - Relax the "no capture-* siblings mid-loop" rule for RETRO-surfaced
+#     observations specifically.
+#   - Direct retro to auto-ticket recurring class-of-behaviour
+#     observations via /wr-itil:capture-problem (mechanical-stage carve-
+#     out per run-retro Step 4a precedent).
+#   - Route ONLY direction-setting observations (genuine user-judgment-
+#     bound questions) to outstanding_questions.
+#   - Document the classification:
+#       recurring class-of-behaviour / SKILL-contract drift / hook
+#       misbehaviour / framework-gap → auto-ticket.
+#       Direction-setting (design choice, deviation-approval, framework
+#       boundary) → outstanding_questions.
+#       Ambiguous → default to auto-ticket.
+#
+# Doc-lint contract assertions per ADR-037 Permitted Exception.
+#
+# @problem P342
+# @adr ADR-044 (Decision-Delegation Contract — mechanical-stage carve-out per Step 4a precedent)
+# @adr ADR-013 Rule 5 (policy-authorised silent proceed)
+# @adr ADR-032 (governance skill invocation patterns — foreground-spawns-background fanout pattern for capture-*)
+# @jtbd JTBD-006 (Progress the Backlog While I'm Away — durable WSJF-ranked backlog accumulation)
+# @jtbd JTBD-201 (audit-trail — auto-ticketed observations become durable artefacts)
+setup() {
+  REPO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/../../../../.." && pwd)"
+  SKILL_MD="$REPO_ROOT/packages/itil/skills/work-problems/SKILL.md"
+}
+@test "work-problems P342: SKILL.md exists" {
+  [ -f "$SKILL_MD" ]
+}
+# ── Step 5 iter-prompt: capture-* carve-out for retro observations ──────────
+@test "work-problems P342: iter-prompt body carves out capture-* for retro-surfaced observations" {
+  # The fix relaxes the "No capture-* siblings mid-loop" rule for RETRO
+  # observations specifically. Existing P078-class spam rule remains for
+  # non-retro mid-iter capture; the carve-out is bounded to retro.
+  run grep -nE 'retro.*capture-\*|capture-\*.*retro|retro-surfaced.*capture|capture-problem.*retro' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P342: iter-prompt body directs retro to auto-ticket via /wr-itil:capture-problem" {
+  # Recurring class-of-behaviour observations MUST route to
+  # /wr-itil:capture-problem (mechanical-stage carve-out per run-retro
+  # Step 4a precedent). The skill is named explicitly so adopters know
+  # which capture sibling to invoke.
+  run grep -nE '/wr-itil:capture-problem' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P342: iter-prompt cites Step 4a precedent for mechanical-stage carve-out" {
+  # The carve-out's authority is the run-retro Step 4a verification
+  # close-on-evidence precedent. Cite it so future authors don't unwind
+  # the carve-out by re-applying the broad "no capture-* mid-loop" rule
+  # uniformly.
+  run grep -nE 'Step 4a precedent|run-retro Step 4a|Step 4a.*mechanical' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Classification taxonomy ────────────────────────────────────────────────
+@test "work-problems P342: iter-prompt classifies recurring class-of-behaviour as auto-ticket" {
+  # Recurring class-of-behaviour / SKILL-contract drift / hook
+  # misbehaviour / framework-gap → auto-ticket. The taxonomy MUST be
+  # documented so future authors don't drift on classification.
+  run grep -nE 'recurring class-of-behaviour|class-of-behaviour observation' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P342: iter-prompt classifies direction-setting as outstanding_questions" {
+  # Direction-setting (design choice, deviation-approval, framework
+  # boundary) → outstanding_questions. The route must be named so the
+  # framework boundary is preserved.
+  run grep -nE 'Direction-setting observation|direction-setting.*outstanding_questions' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P342: iter-prompt classifies ambiguous observations as default-to-auto-ticket" {
+  # Ambiguous → default to auto-ticket. This is the trust-boundary
+  # asymmetry that prevents observations from silently piling in the
+  # queue file (per P342 Description).
+  run grep -nE 'Ambiguous.*auto-ticket|default to auto-ticket|ambiguous.*default.*ticket' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Trust-boundary mirror in run-retro Step 4b ─────────────────────────────
+@test "work-problems P342: iter-prompt cross-references run-retro Step 4b carve-out symmetry" {
+  # The run-retro Step 4b stage classification mirrors this carve-out so
+  # the same trust-boundary applies whether retro fires in iter context
+  # OR standalone in main turn. Cross-reference the sibling locus so the
+  # symmetry is discoverable.
+  run grep -nE 'run-retro.*Step 4b|Step 4b.*mirror|symmetry.*Step 4b|Step 4b.*carve-out' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Cross-reference to P342 ────────────────────────────────────────────────
+@test "work-problems P342: Related section cites P342 as the originating ticket" {
+  run grep -nE '\*\*P342\*\*|P342\b' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Anti-pattern preservation: P130 mid-loop ask discipline unaffected ─────
+@test "work-problems P342: iter-prompt preserves P130 NEVER call AskUserQuestion mid-loop discipline" {
+  # The P342 carve-out is for capture-* siblings on the retro path only;
+  # the iter-prompt's mid-loop AskUserQuestion ban is unchanged.
+  run grep -nE 'NEVER call .?AskUserQuestion.? mid-loop|MUST NOT call .?AskUserQuestion.? between iter' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
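
The classification taxonomy from the header comment of this test file can be written as a small routing function. The category labels below are hypothetical identifiers chosen for the sketch; the SKILL describes the classes in prose and does not define these strings.

```shell
#!/bin/sh
# Hypothetical router: maps an observation class to its destination.
route_retro_observation() {
  case "$1" in
    recurring-behaviour|skill-contract-drift|hook-misbehaviour|framework-gap)
      echo "auto-ticket" ;;
    design-choice|deviation-approval|framework-boundary)
      echo "outstanding_questions" ;;
    *)
      # Ambiguous or unrecognised classes default to auto-ticket.
      echo "auto-ticket" ;;
  esac
}

for class in hook-misbehaviour design-choice something-unclear; do
  echo "$class -> $(route_retro_observation "$class")"
done
```

Putting the default on the auto-ticket branch mirrors the stated asymmetry: an ambiguous observation becomes a durable ticket rather than waiting in the queue file.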