npm - @windyroad/itil - Versions diffs - 0.23.0-preview.249 → 0.23.1-preview.251 - Mend

@windyroad/itil 0.23.0-preview.249 → 0.23.1-preview.251

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/.claude-plugin/plugin.json +1 -1
package/package.json +1 -1
package/skills/work-problems/SKILL.md +31 -3
package/skills/work-problems/test/work-problems-step-6-5-fix-and-continue.bats +254 -0

package/.claude-plugin/plugin.json CHANGED

@@ -1,5 +1,5 @@
 {
   "name": "wr-itil",
-  "version": "0.23.0",
+  "version": "0.23.1",
   "description": "ITIL-aligned IT service management for Claude Code"
 }

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@windyroad/itil",
-  "version": "0.23.0-preview.249",
+  "version": "0.23.1-preview.251",
   "description": "ITIL-aligned IT service management for Claude Code (problem, and future incident/change skills)",
   "bin": {
     "windyroad-itil": "./bin/install.mjs"

package/skills/work-problems/SKILL.md CHANGED

@@ -415,9 +415,36 @@ After the iteration's commit lands but before starting the next iteration, check
 2. If `.changeset/` is non-empty after push, run `npm run release:watch` (merge the release PR + wait for npm publish).
 3. Resume the loop only after the release lands on npm.
-**Failure handling**: If `release:watch` fails (CI failure, publish failure), stop the loop and report the failure in the AFK summary. Do not retry non-interactively — the user must intervene. **Step 2.5b cross-reference (P126)**: before emitting the final AFK summary for a Failure handling / CI failure / release:watch halt, run Step 2.5b's surfacing routine. The routine is gated on ≥1 accumulated user-answerable skip; this halt path empirically frequently has accumulated skips from prior iters (the original P126 surface), so the gate is normally satisfied and Step 2.5b's AskUserQuestion-default branch fires (`halt-paths-must-route-design-questions-through-Step-2.5b`). The CI-failure cause itself remains a halt with bug-signal — Step 2.5b surfaces *prior-iter accumulated user-answerable skips only*; it does NOT ask the user how to remediate the CI failure (that requires the user to inspect the failing CI run on return).
+**Failure handling (P140)**: When `push:watch` or `release:watch` reports a CI failure or publish failure, the orchestrator follows a diagnose-then-classify routing — fix-and-continue for the documented mechanically-fixable allow-list, halt for everything else. The previous uniform halt rule converted mechanically-fixable failures (1-line stale-grep-string updates, transient flakes) into ~45min queue stalls, regressing JTBD-006 "Progress the Backlog While I'm Away" without any governance benefit.
-`push:watch` and `release:watch` are policy-authorised actions when residual risk is within appetite per RISK-POLICY.md, so no `AskUserQuestion` is required for the drain itself (ADR-013 Rule 5).
+**Diagnostic preamble (ADR-026 grounding)**: orchestrator MUST first fetch the failed CI log via `gh run view <run-id> --log-failed` (or `gh run view --log-failed` against the most recent failure). Read the failure output and classify into ONE of the buckets below. Cite the failed test output verbatim in the fix-and-continue commit message or halt summary so future readers can audit the classification.
+**Fixable-in-iter allow-list (closed)**: the following classes are policy-authorised silent fix-and-continue per ADR-013 Rule 5. The list is **closed** — adding a new class is itself a deviation-candidate per ADR-044's framework-resolution boundary (surface to user via Step 2.5b's AskUserQuestion-default branch; do NOT auto-extend at agent discretion).
+- **P081-class stale-grep-string** — structural test runs `grep -F '<literal>'` (or `grep -nE '<pattern>'`) against a SKILL.md / ADR / source file; non-zero return because source was edited and the test's grep string was not. Fix: update the grep string to current source phrasing. Composes with P081 (structural-tests-are-wasteful root cause); fix-and-continue is the stop-gap, P081's full retrofit is the structural elimination.
+- **Hook stub mismatch** — test's mock-stdin field doesn't match current hook expectation (e.g. renamed JSON key, renamed event type). Fix: update the stub.
+- **Test ID drift** — assertion message grep doesn't match a recently-renamed function or symbol. Fix: sed in the test.
+- **Environmental flake** — CI runner intermittent issue (npm registry timeout, GitHub API rate limit, transient infra). Fix: re-trigger the workflow.
+**Ambiguous classification defaults to halt.** If the failure does not unambiguously match one of the above, the orchestrator halts. No diagnose-then-guess.
+**Fix-and-continue branch**: for a fixable class:
+1. Apply the fix (typically a single `Edit` change).
+2. Commit the fix through the **standard ADR-014 commit gate flow** — architect / JTBD / risk-scorer review per retry. A gate rejection routes to the halt branch (no retry budget restoration). Each fix-and-continue commit is its own discrete unit of work and rides its own commit through gates per ADR-014 + ADR-042 Rule 3 precedent (retries each ride their own commit).
+3. `git push` and re-run `npm run push:watch` (or `release:watch` if the failure was on the release-PR side) to wait for CI re-trigger.
+4. If CI passes, resume the loop (Step 6.75).
+5. If CI fails again, increment the per-iteration retry counter and return to step 1.
+**3-retry cap (per iteration, not per failure-class)**: after 3 fix-and-continue attempts in a single Step 6.5 invocation, the orchestrator routes to the halt branch regardless of failure class. Repeated failures of the "same" class are evidence the diagnosis was wrong; halt and surface for user judgment. The cap is per-iteration — a 4th distinct fixable failure in the same drain still halts.
+**Halt branch (genuinely unrecoverable)**: halt the loop and report the failure in the AFK summary. Do not retry non-interactively. Genuinely-unrecoverable classes include: auth failure (npm token, GitHub credentials), npm publish rejection (version conflict, package access denied), semantic test failure requiring user judgment (not literal-string drift), repeated transient failures (3+ retries, per the cap above), and any failure outside the fixable-in-iter allow-list.
+**Step 2.5b cross-reference (P126)**: before emitting the final AFK summary for a Failure handling / CI failure / release:watch halt, run Step 2.5b's surfacing routine. The routine is gated on ≥1 accumulated user-answerable skip; this halt path empirically frequently has accumulated skips from prior iters (the original P126 surface), so the gate is normally satisfied and Step 2.5b's AskUserQuestion-default branch fires (`halt-paths-must-route-design-questions-through-Step-2.5b`). The CI-failure cause itself remains a halt with bug-signal — Step 2.5b surfaces *prior-iter accumulated user-answerable skips only*; it does NOT ask the user how to remediate the CI failure (that requires the user to inspect the failing CI run on return).
+`push:watch` and `release:watch` are policy-authorised actions when residual risk is within appetite per RISK-POLICY.md, so no `AskUserQuestion` is required for the drain itself (ADR-013 Rule 5). The fix-and-continue branch is itself policy-authorised by the closed allow-list above, satisfying ADR-013 Rule 5 without an `AskUserQuestion` round-trip.
+**Composition notes**: fix-and-continue is the inverse of P132 (over-ask in interactive sessions) on the failure-handling surface — both arise from over-defensive uniform routing where a documented class-policy would empower silent action. Composes with P130 (orchestrator main-turn ask discipline — fix-and-continue does NOT introduce mid-iter asks; the closed allow-list resolves the decision per ADR-044). Cross-references: P081 (stop-gap composition — most fixables are P081-class), P135 (decision-delegation contract — the closed allow-list IS the framework-resolved policy).
 #### Above-appetite branch (per ADR-042)
@@ -497,6 +524,7 @@ When `AskUserQuestion` is unavailable or the user is AFK, the skill (and the del
 | Commit when risk within appetite | Auto-commit (manage-problem step 9e fallback) |
 | Commit when risk above appetite | Skip commit, report uncommitted state |
 | Pipeline risk at appetite (push or release = 4/25) | Drain release queue (`push:watch` then `release:watch`) before next iteration — per ADR-018 (Step 6.5) |
+| CI failure during Step 6.5 drain (within-appetite branch) | Diagnose via `gh run view --log-failed`, classify against the closed fixable-in-iter allow-list (P081-class stale-grep-string, hook stub mismatch, test ID drift, environmental flake), fix-and-continue for fixable classes (each retry rides its own ADR-014 commit gate), 3-retry cap per iteration, halt for unrecoverable classes. Ambiguous classification defaults to halt. ADR-013 Rule 5 policy-authorised. Per ADR-026 grounding + ADR-044 framework-resolution boundary + P140 (Step 6.5 Failure handling). |
 | Pipeline risk above appetite (push or release >= 5/25) | Auto-apply scorer remediations incrementally (ADR-042 Rule 2). The agent reads suggestions and decides what to do. Re-score after each apply; drain when within appetite. **Never release above appetite** (ADR-042 Rule 1) — no AskUserQuestion shortcut. Halt the loop with `outcome: halted-above-appetite` if the loop exhausts without convergence (ADR-042 Rule 5). Verification Pending commits excluded from auto-revert (Rule 2b). Per ADR-042 (Step 6.5 Above-appetite branch). |
 | Origin diverged before start | Pull `--ff-only` if trivial; stop with report (`git log HEAD..origin/<base>` and reverse) if non-fast-forward — per ADR-019 (Step 0) |
 | Prior-session partial work detected at start (session-continuity dirty: untracked `docs/decisions/*.proposed.md` / `docs/problems/*.md`, `.afk-run-state/iter-*.json` with `is_error: true` or `api_error_status >= 400`, stale `.claude/worktrees/*`, uncommitted SKILL.md/source/ADR edits) | Halt the loop with a structured Prior-Session State report in the AFK summary. Do NOT attempt non-interactive resume. Interactive invocations prompt via `AskUserQuestion` with 4 options (resume / discard / leave-and-lower-priority / halt). Per P109 + ADR-013 Rule 6 (Step 0 session-continuity detection pass). |
@@ -517,7 +545,7 @@ The orchestrator MUST NOT call `AskUserQuestion` between iterations except at th
 - **Step 0 fetch-failure halt** — `git fetch origin` network failure; halt-with-report so the user retries on return.
 - **Step 2.5 / Step 2.5b loop-end emit** — accumulated `outstanding_questions` queue presented as batched `AskUserQuestion` (or fallback Outstanding Design Questions table per ADR-013 Rule 6). This is the framework's prescribed user-interaction point; do NOT dilute it by asking earlier.
 - **Step 6.5 above-appetite Rule 5 halt** — auto-apply loop exhausted without convergence; halt-with-batched-questions per the Step 2.5b cross-reference (Step 2.5b surfaces *prior-iter accumulated user-answerable skips only* — the halt-causing scorer-gap remains a halt-with-bug-signal per ADR-042 Rule 5).
-- **Step 6.5 CI-failure / `release:watch` failure halt** — push:watch or release:watch failed; halt-with-batched-questions per the Step 2.5b cross-reference.
+- **Step 6.5 CI-failure / `release:watch` failure halt** — push:watch or release:watch failed AND the failure is genuinely-unrecoverable (outside the fixable-in-iter allow-list, or 3-retry cap reached); halt-with-batched-questions per the Step 2.5b cross-reference. Failures inside the closed allow-list route to fix-and-continue per Step 6.5 Failure handling (P140), not this halt point.
 - **Step 6.75 dirty-for-unknown-reason halt** — `git status --porcelain` divergence; halt-with-batched-questions per the Step 2.5b cross-reference.
 **No mid-iter ask points.** Every other point in the orchestrator's main turn (between Step 5 dispatch completing and Step 6.5 release-cadence check; between Step 6.75 verification and Step 7 loop-back; between Step 7 and Step 1 next-iteration; between consecutive iters generally) is a mechanical-stage transition that the framework has already resolved. Do NOT introduce ad-hoc `AskUserQuestion` calls at those points to confirm "is it OK to proceed?" or "want me to start the next iter?" — proceeding IS the framework-resolved default. Continue iterating until quota or stop-condition #1/#2/#3 fires.
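
Reviewer note: the Step 6.5 routing added in this hunk (closed allow-list, ambiguous-defaults-to-halt, 3-retry cap per iteration) can be sketched as a small shell simulation. This is a minimal sketch, not code from the package. The names `route_for_class`, `drain_with_retries`, and `MAX_RETRIES` are hypothetical, and the class label is assumed to have already been chosen by the orchestrator after reading `gh run view <run-id> --log-failed`.

```shell
#!/usr/bin/env bash
# Sketch only: every name here is hypothetical and absent from the package.

MAX_RETRIES=3  # the per-iteration cap stated in the SKILL.md amendment

# route_for_class CLASS
# Maps a class label (assumed to be assigned after reading the failed CI log)
# to a route. Only the four allow-list classes are fixable; any other label,
# including an ambiguous one, routes to halt.
route_for_class() {
  case "$1" in
    "stale-grep-string"|"hook stub mismatch"|"test ID drift"|"environmental flake")
      echo "fix-and-continue" ;;
    *)
      echo "halt" ;;
  esac
}

# drain_with_retries OUTCOME...
# Each argument simulates one successive CI result within a single Step 6.5
# invocation: "pass" for a green run, otherwise the class of the failure.
drain_with_retries() {
  local attempts=0 outcome
  for outcome in "$@"; do
    if [ "$outcome" = "pass" ]; then
      echo "resume"; return 0
    fi
    if [ "$(route_for_class "$outcome")" = "halt" ]; then
      echo "halt"; return 0
    fi
    # The counter is shared across failure classes, so a 4th fixable
    # failure halts even if its class differs from the first three.
    if [ "$attempts" -ge "$MAX_RETRIES" ]; then
      echo "halt"; return 0
    fi
    # Stand-in for: apply fix, commit through the gates, push, re-watch CI.
    attempts=$((attempts + 1))
  done
  echo "pending"  # simulated sequence ended before CI resolved
}
```

For example, `drain_with_retries "stale-grep-string" pass` prints `resume`, while four consecutive fixable failures print `halt` because the counter is not reset between classes.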

package/skills/work-problems/test/work-problems-step-6-5-fix-and-continue.bats ADDED

@@ -0,0 +1,254 @@
+#!/usr/bin/env bats
+# P140: /wr-itil:work-problems Step 6.5 Failure handling subsection must
+# document diagnose-then-classify routing — fix-and-continue for the
+# documented mechanically-fixable allow-list, halt for everything else.
+#
+# Prior behaviour was a uniform halt-on-CI-failure rule that converted
+# 1-line stale-grep-string updates and transient flakes into ~45min queue
+# stalls, regressing JTBD-006 "Progress the Backlog While I'm Away"
+# without any governance benefit. P140's Phase 1 amendment replaces that
+# uniform rule with a closed allow-list policy authorising silent
+# fix-and-continue per ADR-013 Rule 5, capped at 3 retries per iteration
+# before falling back to the halt branch.
+#
+# Doc-lint contract assertions per ADR-037 Permitted Exception
+# (contract-assertion class — same shape as the P130 / P126 / P135
+# sibling fixtures). The asserted prose IS the load-bearing policy
+# surface — re-reading the SKILL.md is the only way an AFK reader (and
+# the iteration subprocess) learns the fixable-class taxonomy and the
+# retry cap. Behavioural verification is impossible until Phase 2's
+# advisory classifier ships (deferred per the ticket Fix Strategy —
+# observe over 30 days).
+#
+# @problem P140
+# @adr ADR-013 (Rule 5 — policy-authorised silent action)
+# @adr ADR-014 (one-commit-per-iter; retries each ride their own commit)
+# @adr ADR-018 (inter-iteration release cadence; this refines its
+#       Failure handling clause)
+# @adr ADR-026 (agent output grounding — diagnostic preamble citation)
+# @adr ADR-037 (skill-testing strategy — contract-assertion class)
+# @adr ADR-042 (above-appetite branch — Rule 3 commit-gate-per-retry
+#       precedent composes with this fix-and-continue branch)
+# @adr ADR-044 (decision-delegation contract — framework-resolution
+#       boundary; closed allow-list extensions are deviation-candidates)
+# @jtbd JTBD-006 (Progress the Backlog While I'm Away — primary)
+# @jtbd JTBD-001 (Enforce Governance Without Slowing Down — composes;
+#       per-retry gates preserve governance)
+setup() {
+  REPO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/../../../../.." && pwd)"
+  SKILL_MD="$REPO_ROOT/packages/itil/skills/work-problems/SKILL.md"
+}
+@test "work-problems P140: SKILL.md exists" {
+  [ -f "$SKILL_MD" ]
+}
+# ── Failure handling subsection identity ───────────────────────────────────
+@test "work-problems P140: Step 6.5 Failure handling subsection cites P140" {
+  # The amendment must self-identify so future readers tracing back from
+  # the ticket find the load-bearing prose without keyword-guessing.
+  run grep -nE 'Failure handling.*P140|P140.*Failure handling' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Diagnostic preamble (ADR-026 grounding) ────────────────────────────────
+@test "work-problems P140: Failure handling cites gh run view --log-failed as the diagnostic preamble" {
+  # ADR-026 grounding: the orchestrator MUST read the actual failure
+  # output before classifying. Without this, classification degrades to
+  # guess-from-context.
+  run grep -nE 'gh run view.*--log-failed' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: Failure handling cites ADR-026 (grounding) on the diagnostic preamble" {
+  # The grounding requirement should cite ADR-026 explicitly so the
+  # connection is auditable.
+  run grep -nE 'ADR-026' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Fixable-in-iter allow-list (closed) ────────────────────────────────────
+@test "work-problems P140: Failure handling names P081-class stale-grep-string as a fixable class" {
+  run grep -nE 'P081-class stale-grep-string|stale-grep-string' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: Failure handling names hook stub mismatch as a fixable class" {
+  run grep -niE 'hook stub mismatch' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: Failure handling names test ID drift as a fixable class" {
+  run grep -niE 'test ID drift' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: Failure handling names environmental flake as a fixable class" {
+  run grep -niE 'environmental flake' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: allow-list is framed as 'closed' (not extensible at agent discretion)" {
+  # JTBD review guard-rail: persona could misread "fix-and-continue" as
+  # "auto-fix anything" without the closed framing. Future agent edits
+  # must not drift the allow-list open without explicit user direction.
+  run grep -niE 'allow-list.*closed|closed.*allow-list' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: extending the allow-list is framed as a deviation-candidate per ADR-044" {
+  # ADR-044 framework-resolution boundary: the closed list IS the
+  # framework-resolved policy. Adding a class is a direction-setting
+  # decision, not a mechanical fix.
+  run grep -niE 'deviation-candidate.*ADR-044|ADR-044.*deviation' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: ambiguous classification defaults to halt (no diagnose-then-guess)" {
+  # JTBD review guard-rail (b): without this, the persona-misread risk
+  # of "auto-fix anything" re-enters via fuzzy classification.
+  run grep -niE 'Ambiguous classification defaults to halt|ambiguous.*halt' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Fix-and-continue branch ────────────────────────────────────────────────
+@test "work-problems P140: Failure handling documents a fix-and-continue branch" {
+  run grep -niE 'Fix-and-continue branch|fix-and-continue branch' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: each fix-and-continue retry rides standard ADR-014 commit gate flow (architect / JTBD / risk-scorer)" {
+  # Architect-flagged invariant: governance gates MUST run on every
+  # retry. The fix-and-continue branch does NOT bypass gates.
+  run grep -niE 'standard ADR-014 commit gate flow|ADR-014.*commit gate' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: ADR-042 Rule 3 commit-gate-per-retry precedent is cross-referenced" {
+  # ADR-042 already establishes that retries each ride their own
+  # commit through full gate flow. P140 composes with that precedent
+  # rather than inventing a new commit-cardinality rule.
+  run grep -niE 'ADR-042 Rule 3' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── 3-retry cap (per iteration) ────────────────────────────────────────────
+@test "work-problems P140: Failure handling caps fix-and-continue at 3 retries" {
+  run grep -niE '3-retry cap|3 retr|three retr' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: 3-retry cap is per-iteration, not per-failure-class" {
+  # Without this clarification, an agent could reset the counter on
+  # each new failure class and drain budget indefinitely.
+  run grep -niE 'per[- ]iteration, not per[- ]failure[- ]class|cap is per[- ]iteration' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Halt branch preserved ──────────────────────────────────────────────────
+@test "work-problems P140: Halt branch preserved for genuinely-unrecoverable failures" {
+  run grep -niE 'genuinely-unrecoverable|genuinely unrecoverable' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: Halt branch enumerates auth failure / npm publish rejection / semantic test as unrecoverable" {
+  # The halt branch's allow-list mirror — naming the unrecoverable
+  # classes makes the boundary auditable.
+  run grep -niE 'auth failure|npm publish rejection|semantic test.*judgment' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Step 2.5b cross-reference preserved (P126) ─────────────────────────────
+@test "work-problems P140: Halt branch routes through Step 2.5b surfacing routine (P126 preserved)" {
+  # The halt branch's existing P126 cross-reference must survive the
+  # amendment — surfacing accumulated user-answerable skips before
+  # emitting the halt summary remains the contract.
+  run grep -nE 'Step 2\.5b cross-reference \(P126\)' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── ADR-013 Rule 5 policy-authorised silent action ─────────────────────────
+@test "work-problems P140: fix-and-continue branch is policy-authorised per ADR-013 Rule 5" {
+  # ADR-044's framework-mediated surface includes "policy-authorised
+  # silent proceed" — the closed allow-list IS the policy. Future
+  # readers must find the citation to confirm this is not an ad-hoc
+  # bypass of Rule 1.
+  run grep -nE 'ADR-013 Rule 5|Rule 5 policy-authorised|policy-authorised.*ADR-013' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Composition cross-references ───────────────────────────────────────────
+@test "work-problems P140: Failure handling cross-references P081 (stop-gap composition)" {
+  # P081 is the structural-tests-are-wasteful root cause. Most
+  # P081-class stale-grep-string failures are P081's territory.
+  # Fix-and-continue is the stop-gap; P081's full retrofit is the
+  # structural elimination.
+  run grep -nE 'P081' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: Failure handling cross-references P135 (decision-delegation contract)" {
+  # P135 + ADR-044 frame the closed allow-list as the
+  # framework-resolved policy.
+  run grep -nE 'P135' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: Failure handling cross-references P130 (orchestrator main-turn ask discipline)" {
+  # P130 ensures fix-and-continue does NOT introduce mid-iter asks —
+  # the closed allow-list resolves the decision per ADR-044's
+  # framework-resolution boundary.
+  run grep -nE 'P130' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: Failure handling cross-references P132 (over-ask in interactive sessions)" {
+  # P140 is the inverse of P132 on the failure-handling surface — both
+  # arise from over-defensive uniform routing. Naming the symmetry
+  # protects against future drift.
+  run grep -nE 'P132' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Mid-loop ask discipline halt-point bullet narrowed ─────────────────────
+@test "work-problems P140: Step 6.5 CI-failure halt-point bullet narrows to outside-allow-list / cap-reached scope" {
+  # The Mid-loop ask discipline subsection enumerates Step 6.5 CI-
+  # failure as a halt point. After P140 the halt fires only on
+  # unrecoverable failures — the bullet must reflect that narrower
+  # scope, otherwise future readers conclude all CI failures still
+  # halt.
+  run grep -nE 'fixable-in-iter allow-list|3-retry cap reached|outside the.*allow-list' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+# ── Non-Interactive Decision Making table row ──────────────────────────────
+@test "work-problems P140: Decision Making table carries a CI-failure-during-Step-6.5-drain row" {
+  # The decision table is the AFK reader's quick-reference; without a
+  # row here the failure-handling refinement is buried 80 lines up in
+  # Step 6.5.
+  run grep -nE '\| CI failure during Step 6\.5 drain' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: Decision Making table row cites the closed fixable-in-iter allow-list" {
+  run grep -nE 'closed fixable-in-iter allow-list' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+@test "work-problems P140: Decision Making table row cites the 3-retry cap" {
+  run grep -nE 'CI failure during Step 6\.5.*3-retry cap|3-retry cap.*CI failure' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
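
Reviewer note: each test in the added file follows the same contract-assertion shape, a `grep` over SKILL.md whose exit status is the verdict. The sketch below reproduces that shape outside Bats against a temporary file, which may help when checking a pattern before committing it. The helper name `doc_asserts` and the sample file are assumptions for illustration; the sample line and the pattern are copied from the diff above.

```shell
#!/usr/bin/env bash
# Sketch only: doc_asserts is a hypothetical helper, not part of the package.

# doc_asserts PATTERN FILE
# Succeeds (exit 0) when FILE contains a line matching the extended regex
# PATTERN, case-insensitively; fails (exit 1) otherwise.
doc_asserts() {
  local pattern="$1" file="$2"
  grep -qiE -- "$pattern" "$file"
}

# Stand-in for SKILL.md holding one line quoted from the amended section.
sample="$(mktemp)"
printf '%s\n' \
  '**3-retry cap (per iteration, not per failure-class)**: after 3 fix-and-continue attempts' \
  > "$sample"

# Same pattern as the per-iteration cap test in the Bats file.
if doc_asserts 'per[- ]iteration, not per[- ]failure[- ]class' "$sample"; then
  echo "cap wording present"
fi

# A phrase absent from the sample makes the assertion fail.
if ! doc_asserts 'hook stub mismatch' "$sample"; then
  echo "hook stub wording missing"
fi

rm -f "$sample"
```

Note that, as in the Bats file, the match is against the whole file rather than a single section, so a pattern such as `P081` passes if the string appears anywhere in the document.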