@windyroad/itil 0.17.2 → 0.18.0

@@ -1,5 +1,5 @@
  {
  "name": "wr-itil",
- "version": "0.17.2",
+ "version": "0.18.0",
  "description": "ITIL-aligned IT service management for Claude Code"
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@windyroad/itil",
- "version": "0.17.2",
+ "version": "0.18.0",
  "description": "ITIL-aligned IT service management for Claude Code (problem, and future incident/change skills)",
  "bin": {
  "windyroad-itil": "./bin/install.mjs"
@@ -295,8 +295,20 @@ Otherwise, after the commit in step 14 lands, drain the release queue so the fix
 
  **Failure handling**: If `release:watch` fails (CI failure, publish failure), stop and report the failure clearly. Do not retry non-interactively — the user must intervene.
 
- **Above-appetite branch**: If push/release risk is above appetite, skip the drain and report: "Release skipped risk above appetite. Run `npm run push:watch` and `npm run release:watch` manually when ready."
+ **Above-appetite branch (per ADR-041)**: If push or release risk is above appetite (≥ 5/25), the skill MUST auto-apply scorer remediations in rank order until residual risk converges within appetite, OR halt the skill per ADR-041 Rule 5 if the scorer cannot produce a convergent plan. **The skill MUST NOT release above appetite under any circumstance.** The skill MUST NOT call `AskUserQuestion` as a shortcut out of the auto-apply loop.
 
- `push:watch` and `release:watch` are policy-authorised actions when residual risk is within appetite per RISK-POLICY.md, so no `AskUserQuestion` is required for the drain itself (ADR-013 Rule 6).
+ **Auto-apply mechanism (ADR-041 Rule 2):**
+
+ 1. Parse the scorer's `RISK_REMEDIATIONS:` block.
+ 2. Rank by largest absolute `risk_delta` → smaller effort (S < M < L) → lower remediation ID.
+ 3. Classify each remediation's `description` against ADR-041 Rule 2a's closed action-class enumeration. **Today's orchestrator-supported class (ADR-041 v1)**: `move-to-holding` only. Other classes (`revert-commit`, `amend-commit`, `feature-flag`, `rollback-to-tag`) are deferred to P108 and route to Rule 5 halt.
+ 4. **Verification Pending carve-out (ADR-041 Rule 2b)**: skip remediations that target a commit attached to a `.verifying.md` ticket.
+ 5. Apply the top-ranked eligible remediation. Each auto-apply is its own commit (ADR-041 Rule 3 — non-AFK has no iteration wrapper to amend into); each commit goes through architect + JTBD + risk-scorer gates per ADR-014.
+ 6. Re-score via the same delegation path as step 1 above.
+ 7. **Loop**: within appetite → drain per the Drain action above. Still above → next remediation. Exhausted or unsupported class → Rule 5 halt.
+
+ **Rule 5 halt (non-AFK mode)**: halt the skill. Emit the terminal report naming the final `RISK_SCORES:`, the Auto-apply trail, any Verification Pending ticket IDs implicated, and a one-line scorer-gap note. The user resolves interactively.
+
+ `push:watch` and `release:watch` are policy-authorised actions when residual risk is within appetite per RISK-POLICY.md, so no `AskUserQuestion` is required for the drain itself (ADR-013 Rule 5). Auto-apply actions under Rules 2–7 are also policy-authorised per ADR-013 Rule 5.
 
  $ARGUMENTS
@@ -724,8 +724,27 @@ Otherwise, after the commit in step 11 lands, drain the release queue so the fix
 
  **Failure handling**: If `release:watch` fails (CI failure, publish failure), stop and report the failure clearly. Do not retry non-interactively — the user must intervene.
 
- **Above-appetite branch**: If push/release risk is above appetite, skip the drain and report: "Release skipped risk above appetite. Run `npm run push:watch` and `npm run release:watch` manually when ready."
+ **Above-appetite branch (per ADR-041)**: If push or release risk is above appetite (≥ 5/25), the skill MUST auto-apply scorer remediations in rank order until residual risk converges within appetite, OR halt the skill per ADR-041 Rule 5 if the scorer cannot produce a convergent plan. **The skill MUST NOT release above appetite under any circumstance.** The skill MUST NOT call `AskUserQuestion` as a shortcut out of the auto-apply loop.
 
- `push:watch` and `release:watch` are policy-authorised actions when residual risk is within appetite per RISK-POLICY.md, so no `AskUserQuestion` is required for the drain itself (ADR-013 Rule 6).
+ **Auto-apply mechanism (ADR-041 Rule 2):**
+
+ 1. Parse the scorer's `RISK_REMEDIATIONS:` block.
+ 2. Rank by largest absolute `risk_delta` → smaller effort (S < M < L) → lower remediation ID.
+ 3. Classify each remediation's `description` against ADR-041 Rule 2a's closed action-class enumeration. **Today's orchestrator-supported class (ADR-041 v1)**: `move-to-holding` only. Other classes (`revert-commit`, `amend-commit`, `feature-flag`, `rollback-to-tag`) are deferred to P108 and route to Rule 5 halt.
+ 4. **Verification Pending carve-out (ADR-041 Rule 2b)**: skip remediations that target a commit attached to a `.verifying.md` ticket. Do NOT auto-revert VP commits.
+ 5. Apply the top-ranked eligible remediation:
+ - `move-to-holding`: `git mv .changeset/<name>.md docs/changesets-holding/<name>.md` + append to holding-area README "Currently held" per ADR-041 Rule 6. Since the non-AFK skill has no iteration wrapper to amend into, each auto-apply is its own commit (ADR-041 Rule 3). Each commit goes through the standard ADR-014 commit flow — architect + JTBD + risk-scorer gates.
+ 6. Re-score via the same delegation path as step 1 above.
+ 7. **Loop**: re-score within appetite → drain per the Drain action above. Re-score still above → goto step 3 with remaining remediations. Exhausted or unsupported class → Rule 5 halt.
+
+ **Rule 5 halt (non-AFK mode)**: halt the skill. Emit the terminal report naming:
+ - The final `RISK_SCORES:` line
+ - An "Auto-apply trail" subsection listing each remediation attempted with outcome
+ - Any Verification Pending ticket IDs implicated per Rule 2b
+ - A one-line scorer-gap note (e.g., "scorer produced only `move-to-holding`; residual still ≥ 5/25 after exhaustion — extend scorer vocabulary per P108")
+
+ The user resolves interactively — typical resolutions include splitting the commit, feature-flagging the change, or opening a problem ticket documenting the scorer gap.
+
+ `push:watch` and `release:watch` are policy-authorised actions when residual risk is within appetite per RISK-POLICY.md, so no `AskUserQuestion` is required for the drain itself (ADR-013 Rule 5). Auto-apply actions under Rules 2–7 are also policy-authorised per ADR-013 Rule 5 — `RISK-POLICY.md` appetite + ADR-041 eligibility constitute the policy.
 
  $ARGUMENTS
@@ -238,17 +238,20 @@ Format as a brief status line, not a wall of text. The user will read these when
  [Iteration 3] Skipped P016 (Multi-concern ticket splitting) — fix released, awaiting user verification. Worked P024 (Risk scorer WIP flag) — implemented fix, closed. 6 problems remain. ($1.12, 62s, 541K tokens)
  ```
 
- ### Step 6.5: Release-cadence check (per ADR-018)
+ ### Step 6.5: Release-cadence check (per ADR-018, above-appetite branch per ADR-041)
 
- After the iteration's commit lands but before starting the next iteration, check whether the unreleased queue would push pipeline risk to or above appetite. If so, drain the queue before continuing. This prevents silent accumulation of unreleased changesets across AFK iterations (P041).
+ After the iteration's commit lands but before starting the next iteration, check whether the unreleased queue would push pipeline risk to or above appetite. This prevents silent accumulation of unreleased changesets across AFK iterations (P041). **The orchestrator MUST NOT release above appetite under any circumstance** — above-appetite states route to the ADR-041 auto-apply loop or halt.
 
  **Mechanism — delegate, do not re-implement scoring:**
 
  1. Invoke the risk scorer to score cumulative pipeline state. Two paths are valid (per ADR-015):
  - **Primary**: delegate to subagent type `wr-risk-scorer:pipeline` via the Agent tool.
  - **Fallback**: if that subagent type is not available, invoke skill `/wr-risk-scorer:assess-release` via the Skill tool. The skill wraps the same pipeline subagent.
- 2. Read the returned `RISK_SCORES: commit=X push=Y release=Z` line.
- 3. **Threshold**: if `push` or `release` is at or above appetite (4/25, "Low" band per `RISK-POLICY.md`), drain the queue.
+ 2. Read the returned `RISK_SCORES: commit=X push=Y release=Z` line and the `RISK_REMEDIATIONS:` block (if present).
+ 3. **Classify the residual**:
+ - **Within appetite (≤ 3/25)** — no drain needed. Proceed to Step 6.75.
+ - **At appetite (= 4/25)** — drain the queue per the Drain action below, then proceed to Step 6.75.
+ - **Above appetite (≥ 5/25)** — route to the **Above-appetite branch** below. Do NOT drain. Do NOT proceed to Step 6.75 until either (a) the auto-apply loop re-converges within appetite and drain succeeds, or (b) Rule 5 halt fires.
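
The three residual bands in step 3 can be sketched as a small POSIX shell helper. This is only an illustration, not the skill's implementation: the function name `classify_residual` and its output labels (`within`, `at`, `above`) are hypothetical, and the sketch assumes the scorer line has exactly the `RISK_SCORES: commit=X push=Y release=Z` shape with integer scores. It uses the worse of `push` and `release`, since either score reaching a band selects that band.

```shell
# Hypothetical helper: map a RISK_SCORES line onto the Step 6.5 bands.
# Assumes integer scores in the "commit=X push=Y release=Z" shape.
classify_residual() {
  cr_line=$1
  # Pull the integer that follows push= and release=.
  cr_push=$(printf '%s\n' "$cr_line" | sed -n 's/.*push=\([0-9][0-9]*\).*/\1/p')
  cr_release=$(printf '%s\n' "$cr_line" | sed -n 's/.*release=\([0-9][0-9]*\).*/\1/p')
  if [ -z "$cr_push" ] || [ -z "$cr_release" ]; then
    echo "unparseable"
    return 1
  fi
  # The band is decided by the worse of the two scores.
  cr_worst=$cr_push
  if [ "$cr_release" -gt "$cr_worst" ]; then
    cr_worst=$cr_release
  fi
  if [ "$cr_worst" -le 3 ]; then
    echo "within"   # no drain needed, proceed to Step 6.75
  elif [ "$cr_worst" -eq 4 ]; then
    echo "at"       # drain the queue, then proceed to Step 6.75
  else
    echo "above"    # route to the above-appetite branch, do not drain
  fi
}
```

For example, `classify_residual 'RISK_SCORES: commit=2 push=4 release=3'` prints `at`. The `commit` score is deliberately ignored here because the step only names `push` and `release`.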
 
  **Drain action (non-interactive, policy-authorised per ADR-013 Rule 6):**
 
@@ -258,7 +261,44 @@ After the iteration's commit lands but before starting the next iteration, check
 
  **Failure handling**: If `release:watch` fails (CI failure, publish failure), stop the loop and report the failure in the AFK summary. Do not retry non-interactively — the user must intervene.
 
- `push:watch` and `release:watch` are policy-authorised actions when residual risk is within appetite per RISK-POLICY.md, so no `AskUserQuestion` is required for the drain itself (ADR-013 Rule 6).
+ `push:watch` and `release:watch` are policy-authorised actions when residual risk is within appetite per RISK-POLICY.md, so no `AskUserQuestion` is required for the drain itself (ADR-013 Rule 5).
+
+ #### Above-appetite branch (per ADR-041)
+
+ **Invariant**: the orchestrator MUST NOT release above appetite. There is no code path in Step 6.5 that releases at residual push/release ≥ 5/25. The orchestrator MUST NOT call `AskUserQuestion` as a shortcut out of the auto-apply loop — the scorer is the decision surface, not the user. The branch terminates in either a within-appetite drain or a Rule 5 halt.
+
+ **Auto-apply loop (ADR-041 Rule 2):**
+
+ 1. Parse the scorer's `RISK_REMEDIATIONS:` block. Expected shape per ADR-015:
+ ```
+ RISK_REMEDIATIONS:
+ - R1 | <description> | <effort S/M/L> | <risk_delta -N> | <files affected>
+ - R2 | ...
+ ```
+ 2. Rank remediations by: largest absolute `risk_delta` first; tie-break by smaller effort (S < M < L); tie-break further by lower remediation ID (R1 before R2).
+ 3. Classify each remediation's `description` against ADR-041 Rule 2a's closed action-class enumeration. **Today's orchestrator-supported class (ADR-041 v1)**: `move-to-holding` (matched when `description` says move a changeset file to the holding area, or explicitly cites `docs/changesets-holding/`). All other classes (`revert-commit`, `amend-commit`, `feature-flag`, `rollback-to-tag`) are deferred to P108 and route to Rule 5 halt.
+ 4. **Verification Pending carve-out (ADR-041 Rule 2b)**: if a remediation targets a commit attached to a `.verifying.md` ticket, skip it and continue ranking. Do NOT auto-revert VP commits. If VP carve-out leaves no eligible remediations, route to Rule 5 halt naming the VP ticket(s).
+ 5. Apply the top-ranked eligible remediation:
+ - `move-to-holding`: `git mv .changeset/<name>.md docs/changesets-holding/<name>.md`. Append the entry to `docs/changesets-holding/README.md` under "Currently held" per ADR-041 Rule 6. Amend the iteration's commit to fold the move (per ADR-041 Rule 3 amend-based folding — preserves ADR-032 one-commit-per-iteration invariant).
+ 6. Re-invoke the risk scorer (same delegation path as step 1 above — subagent preferred, skill fallback). Read the new `RISK_SCORES:` line.
+ 7. **Loop classification**:
+ - **Re-score within appetite (≤ 4/25)** — proceed to Drain action above. Done with the above-appetite branch.
+ - **Re-score still above appetite (≥ 5/25)** — goto step 3 with the remaining ranked remediations.
+ - **No remediations remain** or **no remaining remediation classifies into Rule 2a enumeration** — Rule 5 halt.
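
Steps 1 to 3 of the loop (parse, rank, classify) can be sketched in POSIX shell. This is a rough illustration under stated assumptions rather than the orchestrator's actual code: the function names `rank_remediations` and `action_class` are hypothetical, `risk_delta` is assumed to be an integer, no field is assumed to contain a literal `|`, and the substring match used for `move-to-holding` is just one plausible reading of the matching rule in step 3.

```shell
# Hypothetical helper: read a RISK_REMEDIATIONS block on stdin and print
# remediation IDs in rank order (largest |risk_delta|, then S < M < L,
# then lower ID). Assumes integer deltas and no "|" inside a field.
rank_remediations() {
  awk -F'|' '
    /^[ \t]*-[ \t]*R[0-9]+[ \t]*\|/ {
      id = $1; effort = $3; delta = $4
      gsub(/[^0-9]/, "", id)        # "- R12 " becomes "12"
      gsub(/[ \t]/, "", effort)     # " S " becomes "S"
      gsub(/[^0-9]/, "", delta)     # " -3 " becomes "3" (absolute value)
      rank = 3
      if (effort == "S") rank = 1
      else if (effort == "M") rank = 2
      print delta, rank, id
    }' |
    sort -k1,1nr -k2,2n -k3,3n |
    awk '{ print "R" $3 }'
}

# Hypothetical classifier for step 3: only move-to-holding is supported today.
action_class() {
  case "$1" in
    *docs/changesets-holding/* | *"holding area"*) echo "move-to-holding" ;;
    *) echo "unsupported" ;;
  esac
}
```

Piping a block with `R1 | ... | M | -3`, `R2 | ... | S | -3`, and `R3 | ... | L | -5` through `rank_remediations` prints `R3`, `R2`, `R1`: the largest delta wins outright, and effort only breaks the tie between the two `-3` entries.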
+
+ **Governance gates per auto-apply (ADR-041 Rule 3):** each auto-apply that requires a commit (the amend in step 5 above) goes through the standard ADR-014 commit flow — architect review, JTBD review, risk-scorer gate. A gate rejection falls through to Rule 5 halt. The scorer's ranking does NOT bypass gates.
+
+ **Rule 5 halt (exhaustion):** when the auto-apply loop exhausts without convergence, or any gate/operation fails, halt the loop. Do NOT proceed to Step 6.75. Do NOT spawn the next iteration. Emit the iteration summary with:
+
+ - `outcome: halted-above-appetite`
+ - The final `RISK_SCORES:` line
+ - An "Auto-apply trail" subsection listing each remediation attempted with outcome
+ - Any Verification Pending ticket IDs implicated per Rule 2b
+ - A one-line scorer-gap note (e.g., "scorer produced only `move-to-holding` remediations; residual still ≥ 5/25 after exhaustion — extend scorer vocabulary per P108")
+
+ Halt is a **bug signal** — the scorer should always have progressively more aggressive remediations available once P108 lands. Until then, exhaustion is expected when the only path to within-appetite requires a non-`move-to-holding` class.
+
+ **Audit trail (ADR-041 Rule 6):** append one line per auto-apply to the iteration summary's Auto-apply trail subsection, including remediation ID, action class, pre/post scores, action taken, and description citation. For `move-to-holding` actions, also append to `docs/changesets-holding/README.md` "Currently held".
 
  ### Step 6.75: Inter-iteration verification (P036)
 
@@ -296,7 +336,8 @@ When `AskUserQuestion` is unavailable or the user is AFK, the skill (and the del
  | Scope expansion during work | Update problem file, re-score WSJF, move to next problem instead of continuing |
  | Commit when risk within appetite | Auto-commit (manage-problem step 9e fallback) |
  | Commit when risk above appetite | Skip commit, report uncommitted state |
- | Pipeline risk at appetite (push or release >= 4/25) | Drain release queue (`push:watch` then `release:watch`) before next iteration — per ADR-018 (Step 6.5) |
+ | Pipeline risk at appetite (push or release = 4/25) | Drain release queue (`push:watch` then `release:watch`) before next iteration — per ADR-018 (Step 6.5) |
+ | Pipeline risk above appetite (push or release >= 5/25) | Auto-apply scorer remediations in rank order (ADR-041 Rule 2) under the closed action-class enumeration (Rule 2a). Today: `move-to-holding` supported; other classes deferred to P108. Re-score after each apply; drain when within appetite. **Never release above appetite** (ADR-041 Rule 1) — no AskUserQuestion shortcut. Halt the loop with `outcome: halted-above-appetite` if the loop exhausts without convergence (ADR-041 Rule 5). Verification Pending commits excluded from auto-revert (Rule 2b). Per ADR-041 (Step 6.5 Above-appetite branch). |
  | Origin diverged before start | Pull `--ff-only` if trivial; stop with report (`git log HEAD..origin/<base>` and reverse) if non-fast-forward — per ADR-019 (Step 0) |
  | Fix verification needed | Skip problem, add to "needs verification" list |
  | Stop-condition #2 with user-answerable skip-reasons | Emit Outstanding Design Questions table in summary (do NOT call AskUserQuestion). The persona is AFK by definition — per JTBD-006 and ADR-013 Rule 6 — so the table is the default. Interactive invocations may batch up to 4 questions through AskUserQuestion instead — per ADR-013 Rule 1 (Step 2.5). |
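
The control flow of the above-appetite row (apply, re-score, drain or halt) can be simulated with a short POSIX shell sketch. Several things here are assumptions for illustration only: the function name `auto_apply_outcome` is hypothetical, each remediation is passed as a made-up `class:delta` token already in rank order, and the "re-score" is faked by subtracting the advertised delta, whereas the skill re-invokes the risk scorer. The sketch also assumes that remediations in unsupported classes are skipped and that the halt fires only once no eligible remediation remains, which is one reading of the loop classification in Step 6.5.

```shell
# Hypothetical simulation of the auto-apply loop's outcome.
# Usage: auto_apply_outcome <starting score> <class:delta>...
# Prints "drain" on convergence, "halted-above-appetite" on exhaustion.
auto_apply_outcome() {
  aa_score=$1
  shift
  for aa_item in "$@"; do
    aa_class=${aa_item%%:*}
    aa_delta=${aa_item##*:}
    # Only move-to-holding is orchestrator-supported today; skip the rest.
    if [ "$aa_class" != "move-to-holding" ]; then
      continue
    fi
    # Stand-in for re-invoking the scorer after the remediation is applied.
    aa_score=$(( aa_score - aa_delta ))
    if [ "$aa_score" -le 4 ]; then
      echo "drain"
      return 0
    fi
  done
  # Nothing eligible remains and the residual is still >= 5/25.
  echo "halted-above-appetite"
}
```

With a starting score of 7, `auto_apply_outcome 7 revert-commit:5 move-to-holding:2 move-to-holding:1` skips the unsupported class, reaches 4 after two applies, and prints `drain`. Nothing in the sketch releases while the simulated score is 5 or higher, which mirrors the invariant in the table row.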
@@ -0,0 +1,122 @@
+ #!/usr/bin/env bats
+ # Doc-lint guard: work-problems SKILL.md must include the above-appetite
+ # auto-apply + halt-on-exhaustion branch per ADR-041.
+ #
+ # Structural assertion — Permitted Exception to the source-grep ban (ADR-005 / P011).
+ # These assertions are load-bearing-string checks on the skill specification
+ # document. Per P081, structural tests are placeholders for behavioural tests
+ # against P012's skill-testing harness; until that harness lands, these
+ # assertions are the confirmation mechanism called out in ADR-041 Confirmation
+ # criterion 2.
+ #
+ # Cross-reference:
+ # P103 (work-problems escalates resolved release decisions — defeats AFK)
+ # P104 (partial-progress paints release queue into corner)
+ # P108 (scorer remediation action-class vocabulary — deferred work)
+ # ADR-041 (auto-apply scorer remediations — never release above appetite)
+ # ADR-037 (skill testing strategy — contract-assertion pattern)
+ # @jtbd JTBD-006 (Progress the Backlog While I'm Away)
+
+ setup() {
+   SKILL_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+   SKILL_FILE="${SKILL_DIR}/SKILL.md"
+ }
+
+ @test "SKILL.md exists" {
+   [ -f "$SKILL_FILE" ]
+ }
+
+ @test "SKILL.md cites ADR-041 (above-appetite auto-apply)" {
+   # ADR-041 Confirmation criterion 1: source review names the ADR.
+   run grep -n "ADR-041" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md contains the never-release-above-appetite invariant (Rule 1)" {
+   # The load-bearing invariant from Rule 1. "MUST NOT release above appetite"
+   # is the phrase that anchors the policy.
+   run grep -nE "MUST NOT release above appetite" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md references RISK_REMEDIATIONS parsing contract (Rule 2)" {
+   # Rule 2 parses RISK_REMEDIATIONS from the scorer. If the string is absent,
+   # the skill does not implement the parse step.
+   run grep -n "RISK_REMEDIATIONS" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md references docs/changesets-holding/ (Rule 2a move-to-holding class)" {
+   # The one currently-implemented action class moves changesets to the holding
+   # area. The path must be named so the skill body is unambiguous about target.
+   run grep -n "docs/changesets-holding/" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md names the closed action-class enumeration (Rule 2a)" {
+   # "move-to-holding" is the single supported class today; later P108 extends.
+   # The string must appear so the enumeration is greppable.
+   run grep -n "move-to-holding" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md names P108 (deferred action-class vocabulary)" {
+   # Rule 2a defers revert-commit, amend-commit, feature-flag, rollback-to-tag
+   # to P108. Keeping the reference greppable makes the deferral auditable.
+   run grep -n "P108" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md includes the Verification Pending carve-out (Rule 2b)" {
+   # Rule 2b prevents auto-revert of commits attached to .verifying.md tickets.
+   run grep -niE "Verification Pending.*carve.out|Rule 2b|\.verifying\.md.*(skip|exclude|carve)" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md references the halt-on-exhaustion outcome (Rule 5)" {
+   # Rule 5 emits outcome: halted-above-appetite when the auto-apply loop
+   # exhausts without convergence.
+   run grep -n "halted-above-appetite" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md cites ADR-013 Rule 5 (policy-authorised silent proceed)" {
+   # Rule 1 is authorised by ADR-013 Rule 5. The citation should be explicit.
+   run grep -nE "ADR-013 Rule 5" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md references the scorer-gap halt signal" {
+   # Rule 5 treats exhaustion as a scorer-gap bug signal, not routine behaviour.
+   run grep -niE "scorer.gap|scorer vocabulary|bug signal" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md Non-Interactive Decision Making table covers above-appetite auto-apply" {
+   # The non-interactive defaults table row makes the behaviour discoverable to
+   # an AFK reader without forcing a full prose read.
+   run grep -niE "above appetite.*>= 5/25|pipeline risk above appetite|auto-apply scorer remediations" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md forbids AskUserQuestion shortcut for above-appetite" {
+   # The anti-shortcut stance is load-bearing for P103. Absent this, the skill
+   # reverts to the P103 bug. Allow optional "call "/"invoke " verb and optional
+   # backtick around the tool name (since the SKILL.md phrasing treats it as code).
+   run grep -niE "MUST NOT (call |invoke )?[\`]?AskUserQuestion" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md references the amend-based folding rule for ADR-032 compatibility (Rule 3)" {
+   # Auto-apply commits fold into the iteration's main commit via amend so
+   # ADR-032's one-commit-per-iteration invariant holds.
+   run grep -niE "amend|git commit --amend" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md references the audit-trail subsection (Rule 6)" {
+   # Rule 6 emits an Auto-apply trail subsection in the iteration summary. If
+   # the phrase is missing, audit trail is not wired through.
+   run grep -niE "Auto-apply trail|audit trail" "$SKILL_FILE"
+   [ "$status" -eq 0 ]
+ }