@windyroad/itil 0.2.0 → 0.3.0-preview.68

package/README.md CHANGED
@@ -6,16 +6,25 @@ Part of [Windy Road Agent Plugins](../../README.md).
6
6
 
7
7
  ## What It Does
8
8
 
9
- Bugs recur. Incidents repeat. Without a problem management process, you fix symptoms instead of causes. This plugin brings lightweight ITIL problem management to your AI coding workflow:
9
+ Bugs recur. Incidents repeat. Without a disciplined process, you fix symptoms instead of causes — or worse, jump to conclusions during a live outage. This plugin brings lightweight ITIL service management to your AI coding workflow:
10
+
11
+ **Problem management** — track underlying causes and prioritise fixes:
10
12
 
11
13
  - **Create problem tickets** when incidents or failures surface during a session
12
14
  - **Track root cause analysis** as investigation progresses
13
15
  - **Transition status** through a structured lifecycle: Open, Known Error, Closed
14
16
  - **Prioritise** using Weighted Shortest Job First (WSJF) to focus on the highest-value fixes
15
17
 
16
- Problem tickets live in `docs/problems/` as markdown files -- version-controlled and always accessible.
18
+ **Incident management** -- restore service fast with an audit trail:
19
+
20
+ - **Declare incidents** when production is actively broken
21
+ - **Evidence-first discipline** — hypotheses must cite evidence before any mitigation
22
+ - **Reversible mitigations first** — rollback, feature flag, restart, route away
23
+ - **Automatic handoff** to problem management once service is restored
17
24
 
18
- Room is reserved for peer ITIL skills (incident, change) under the same plugin as they are added.
25
+ Tickets live in `docs/problems/` and `docs/incidents/` as markdown files, version-controlled and always accessible.
26
+
27
+ Room is reserved for peer ITIL skills (change, continual improvement) under the same plugin as they are added.
19
28
 
20
29
  ## Install
21
30
 
@@ -31,18 +40,23 @@ Restart Claude Code after installing.
31
40
 
32
41
  ## Usage
33
42
 
34
- **Create or update a problem ticket:**
43
+ **Manage a problem ticket:**
35
44
 
36
45
  ```
37
46
  /wr-itil:manage-problem
38
47
  ```
39
48
 
40
- This supports:
49
+ Supports creating new problems, updating root cause analysis, transitioning status (Open → Known Error → Closed), and closing problems with resolution details.
50
+
51
+ **Manage an incident:**
52
+
53
+ ```
54
+ /wr-itil:manage-incident
55
+ ```
56
+
57
+ Supports declaring new incidents, recording evidence-first observations and hypotheses, logging mitigation attempts, transitioning lifecycle (Investigating → Mitigating → Restored → Closed), and automatically handing off to `manage-problem` when service is restored.
41
58
 
42
- - Creating new problems from an incident or observed failure
43
- - Updating root cause analysis with investigation findings
44
- - Transitioning status (Open -> Known Error -> Closed)
45
- - Closing problems with resolution details
59
+ See [ADR-011](../../docs/decisions/011-manage-incident-skill.proposed.md) for the incident-vs-problem split and [JTBD-201](../../docs/jtbd/tech-lead/JTBD-201-restore-service-fast.proposed.md) for the job this serves.
46
60
 
47
61
  ## How It Works
48
62
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@windyroad/itil",
3
- "version": "0.2.0",
3
+ "version": "0.3.0-preview.68",
4
4
  "description": "ITIL-aligned IT service management for Claude Code (problem, and future incident/change skills)",
5
5
  "bin": {
6
6
  "windyroad-itil": "./bin/install.mjs"
@@ -0,0 +1,265 @@
1
+ ---
2
+ name: wr-itil:manage-incident
3
+ description: Declare, triage, mitigate, and close an incident using an evidence-first workflow. Restores service first, then hands off to manage-problem for root-cause work.
4
+ allowed-tools: Read, Write, Edit, Bash, Glob, Grep, AskUserQuestion, Skill
5
+ ---
6
+
7
+ # Incident Management Skill
8
+
9
+ Declare, triage, mitigate, and close an incident using an evidence-first, cool-headed workflow. This skill's primary goal is **restoring service**. Once service is restored, the skill hands off to `wr-itil:manage-problem` so the underlying cause is tracked.
10
+
11
+ Incidents are time-bound events. Problems are persistent root causes. One problem can cause many incidents; one incident may (or may not) link to a problem.
12
+
13
+ ## Operations
14
+
15
+ - **Declare**: `incident <title or symptoms>` — creates a new investigating incident
16
+ - **Update**: `incident <NNN> <details>` — append observations, evidence, or actions
17
+ - **Mitigate**: `incident <NNN> mitigate <action>` — record a mitigation attempt and outcome
18
+ - **Restore**: `incident <NNN> restored` — transition to `.restored.md` and trigger problem handoff
19
+ - **Close**: `incident <NNN> close` — only allowed when the linked problem is Known Error or Closed (or an explicit "no problem required" justification is recorded)
20
+ - **List**: `incident list` — active incidents, severity-sorted
21
+ - **Link**: `incident <NNN> link P<MMM>` — link an incident to an existing problem
22
+
23
+ ## Lifecycle
24
+
25
+ | Status | File suffix | Meaning | Entry criteria |
26
+ |--------|-------------|---------|----------------|
27
+ | **Investigating** | `.investigating.md` | Symptoms reported, scope being established | Incident declared |
28
+ | **Mitigating** | `.mitigating.md` | Mitigation(s) in flight | At least one ranked hypothesis with cited evidence |
29
+ | **Restored** | `.restored.md` | Service verified restored | Mitigation applied + verification signal recorded |
30
+ | **Closed** | `.closed.md` | Incident complete | Linked problem is Known Error or Closed (or "no problem required" justification documented) |
31
+
32
+ ## Evidence-First Workflow (The Cool-Headed Commitment)
33
+
34
+ During an incident, the instinct to jump to conclusions is strong. This skill forces evidence-first discipline via a required template. **Do not act on a hypothesis without at least one cited evidence source.**
35
+
36
+ ### Required sections in every incident file
37
+
38
+ ```markdown
39
+ ## Observations
40
+ - [timestamp] <what was seen, from where — e.g. "14:02 UTC, 500s on /api/orders in Datadog dashboard foo">
41
+
42
+ ## Hypotheses
43
+ - [ranked] <hypothesis> — Evidence: <log/repro/diff/metric reference>. Confidence: <low|med|high>.
44
+
45
+ ## Mitigation attempts
46
+ - [timestamp] <action> → <outcome / verification signal>
47
+ ```
48
+
49
+ ### Mitigation preference
50
+
51
+ Prefer **reversible** mitigations over forward fixes:
52
+
53
+ 1. Rollback to a known-good version
54
+ 2. Feature flag off
55
+ 3. Restart / cycle the affected component
56
+ 4. Route traffic away
57
+ 5. Scale up
58
+ 6. Only after reversibles are exhausted: forward fix
59
+
60
+ Record every attempt, successful or not.
61
+
62
+ ## Severity, not WSJF
63
+
64
+ Incidents are severity-driven and time-boxed. **WSJF does not apply to incidents** — the "effort" divisor is meaningless during a live event. WSJF applies to the resulting problem created via handoff.
65
+
66
+ Severity uses the Impact × Likelihood matrix from `RISK-POLICY.md`, interpreted as "right now, what's the live business impact?" — not "in general, how bad could this be?".
67
+
68
+ ## Steps
69
+
70
+ ### 1. Parse the request
71
+
72
+ Determine the operation from `$ARGUMENTS`:
73
+
74
+ - If arguments start with "list" → show active incidents summary
75
+ - If arguments start with `I<NNN>` or a bare number → this is an update, mitigate, restore, close, or link
76
+ - Otherwise → declare a new incident
77
+
78
+ ### 2. For new incidents: Check for duplicates FIRST
79
+
80
+ Before creating, search `docs/incidents/` for active (non-closed) incidents with overlapping symptoms or scope. The user may already have an incident open for this outage.
81
+
82
+ 1. Extract keywords from the description (e.g., "500 errors", "checkout", "login")
83
+ 2. `grep -l` the keywords across `docs/incidents/*.{investigating,mitigating,restored}.md`
84
+ 3. If matches are found, present them via `AskUserQuestion`:
85
+ - "I found active incidents that may be related: I003 (checkout 500s, mitigating), I007 (login slowness, investigating). Would you like to (a) update an existing incident, (b) declare a new incident anyway, or (c) cancel?"
86
+ 4. If the user chooses to update, switch to the update flow for that incident ID
87
+ 5. If no matches, proceed to create
88
+
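Reviewer sketch for the duplicate check in step 2. This is a minimal illustration under stated assumptions, not the skill's implementation: the helper name `find_active_incidents` and its argument order are hypothetical, and keywords are matched as case-insensitive fixed strings against file contents.

```bash
# Hypothetical helper: list active (non-closed) incident files that mention
# any of the given keywords. Name and argument order are assumptions.
find_active_incidents() {
  local dir="$1"; shift
  local file keyword
  for file in "$dir"/*.investigating.md "$dir"/*.mitigating.md "$dir"/*.restored.md; do
    [ -f "$file" ] || continue          # skip glob patterns that matched nothing
    for keyword in "$@"; do
      # -q: quiet, -i: case-insensitive, -F: fixed string (no regex surprises)
      if grep -qiF -- "$keyword" "$file"; then
        printf '%s\n' "$file"
        break                           # report each file at most once
      fi
    done
  done
}
```

Called as `find_active_incidents docs/incidents "500 errors" checkout`, it prints one path per candidate duplicate. Closed incidents are never listed because their suffix is not in the glob set.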
89
+ ### 3. For new incidents: Assign the next ID
90
+
91
+ Create `docs/incidents/` if it does not exist. Then scan for the highest existing `I<NNN>` and increment:
92
+
93
+ ```bash
94
+ mkdir -p docs/incidents
95
+ last=$(ls docs/incidents/I*.md 2>/dev/null | sed 's/.*\///' | grep -oE '^I[0-9]+' | sed 's/^I//' | sort -n | tail -1)
96
+ next=$(printf 'I%03d' $((10#${last:-0} + 1)))
97
+ echo "$next"
98
+ ```
99
+
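Reviewer note on the `10#` prefix in the fragment above: bash arithmetic reads a number with a leading zero as octal, so a zero-padded ID such as `008` would fail to parse unless base 10 is forced. The wrapper below isolates the increment step to show this. The name `increment_id` is hypothetical and used only for illustration.

```bash
# Hypothetical wrapper around the increment step, for illustration only.
increment_id() {
  local last="$1"
  # 10# forces base 10, so zero-padded values like 008 or 099 are not read as octal.
  # An empty value falls back to 0, which yields I001.
  printf 'I%03d\n' $((10#${last:-0} + 1))
}
```

With an empty argument the function prints `I001`, and `%03d` is a minimum width, so IDs past `I999` simply grow to four digits.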
100
+ ### 4. For new incidents: Gather information
101
+
102
+ Use `AskUserQuestion` for anything not in `$ARGUMENTS`:
103
+
104
+ - **Title**: short kebab-case-friendly description
105
+ - **Symptoms**: what is observable (errors, latency, missing data)?
106
+ - **Scope**: who/what is affected (users, endpoints, regions)?
107
+ - **Start time**: when did symptoms begin? (UTC, as precise as known)
108
+ - **Severity**: Impact (1-5) × Likelihood (1-5) per `RISK-POLICY.md`, interpreted as live impact
109
+
110
+ Do not ask for fields that can be inferred:
111
+
112
+ - **Reported**: today's date (UTC)
113
+ - **Status**: always "Investigating" for new incidents
114
+
115
+ ### 5. For new incidents: Write the incident file
116
+
117
+ **File path**: `docs/incidents/<I###>-<kebab-case-title>.investigating.md`
118
+
119
+ **Template**:
120
+
121
+ ```markdown
122
+ # Incident <I###>: <Title>
123
+
124
+ **Status**: Investigating
125
+ **Reported**: <YYYY-MM-DD HH:MM UTC>
126
+ **Severity**: <score> (<label>) — Impact: <label> (<n>) x Likelihood: <label> (<n>)
127
+ **Scope**: <who/what is affected>
128
+
129
+ ## Timeline
130
+
131
+ - [<start-time> UTC] Symptoms began
132
+ - [<reported-time> UTC] Incident declared
133
+
134
+ ## Observations
135
+
136
+ - [<timestamp> UTC] <what was seen, from where>
137
+
138
+ ## Hypotheses
139
+
140
+ - [ranked] <hypothesis> — Evidence: <log/repro/diff/metric reference>. Confidence: <low|med|high>.
141
+
142
+ ## Mitigation attempts
143
+
144
+ *(none yet)*
145
+
146
+ ## Linked Problem
147
+
148
+ *(none yet — added on restore transition)*
149
+ ```
150
+
151
+ ### 6. For updates: Edit the existing file
152
+
153
+ Find the file by ID:
154
+
155
+ ```bash
156
+ ls docs/incidents/<I###>-*.md 2>/dev/null
157
+ ```
158
+
159
+ Append new observations, hypotheses, or timeline entries. **Every hypothesis must cite evidence.** If the user proposes a hypothesis without evidence, ask via `AskUserQuestion` what evidence supports it before writing.
160
+
161
+ ### 7. For mitigate: Record and transition to mitigating
162
+
163
+ When the first mitigation attempt is made:
164
+
165
+ 1. `git mv docs/incidents/<I###>-<title>.investigating.md docs/incidents/<I###>-<title>.mitigating.md`
166
+ 2. Update the **Status** field to "Mitigating"
167
+ 3. Append to **Mitigation attempts**: `[<timestamp> UTC] <action> → <outcome>` (outcome may be "pending verification" initially; update once the verification signal is known)
168
+
169
+ Pre-flight check before first mitigation: the file must contain at least one hypothesis with cited evidence. If not, block the transition and ask the user what evidence supports the chosen action.
170
+
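Reviewer sketch of the pre-flight check in step 7. It assumes hypotheses are written as bullets in the `## Hypotheses` section using the `Evidence: ...` format from the template, and it treats text starting with `<` as an unfilled placeholder. The helper name `has_cited_hypothesis` is hypothetical.

```bash
# Hypothetical pre-flight check: succeeds only if the incident file has at least
# one hypothesis bullet whose Evidence field is filled in.
has_cited_hypothesis() {
  local file="$1"
  # awk prints only the "## Hypotheses" section; grep then looks for
  # "Evidence: <text>" where the text does not start with "<" (template placeholder).
  awk '/^## /{in_section = ($0 == "## Hypotheses")} in_section' "$file" \
    | grep -qE '^- .*Evidence: [^<[:space:]]'
}
```

A caller could gate the rename on it, for example `has_cited_hypothesis "$file" && git mv "$file" "${file%.investigating.md}.mitigating.md"`.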
171
+ ### 8. For restore: Transition and hand off to manage-problem
172
+
173
+ Pre-flight checks before restore:
174
+
175
+ - [ ] At least one mitigation attempt is recorded with outcome
176
+ - [ ] A verification signal is captured (e.g., "error rate back to baseline per Datadog", "user reports normal", "synthetic probe passing")
177
+
178
+ If checks pass:
179
+
180
+ 1. `git mv docs/incidents/<I###>-<title>.mitigating.md docs/incidents/<I###>-<title>.restored.md`
181
+ 2. Update the **Status** field to "Restored"
182
+ 3. Append to **Timeline**: `[<timestamp> UTC] Service restored — <verification signal>`
183
+
184
+ Then perform the **handoff to problem management**:
185
+
186
+ 1. Ask via `AskUserQuestion`: "Service restored. Should I create or update a problem record for the root cause? (a) yes — recommended, (b) no — document why (trivial/one-off)"
187
+ 2. If yes, construct a handoff payload:
188
+ - Incident ID and title
189
+ - Timeline summary
190
+ - Top-ranked hypothesis + cited evidence
191
+ - Mitigation applied + verification signal
192
+ 3. Invoke `wr-itil:manage-problem` via the `Skill` tool with the payload as arguments. The problem skill's existing dedupe flow handles new-vs-update.
193
+ 4. Capture the returned `P<NNN>` and write a **Linked Problem** section into the incident file:
194
+ ```markdown
195
+ ## Linked Problem
196
+ P<NNN> (<title>) — <status>
197
+ ```
198
+ 5. If the user chose "no", write a **No Problem** section with the justification and skip the handoff:
199
+ ```markdown
200
+ ## No Problem
201
+ <reason — e.g. "one-off cosmic-bit-flip; not reproducible">
202
+ ```
203
+
204
+ ### 9. For close: Gate on linked problem status
205
+
206
+ The close operation checks the linked problem's file suffix:
207
+
208
+ ```bash
209
+ linked_id=<extracted from Linked Problem section>
210
+ linked_file=$(ls docs/problems/${linked_id}-*.md 2>/dev/null | head -1)
211
+ ```
212
+
213
+ - If `linked_file` ends with `.known-error.md` or `.closed.md` → close is allowed
214
+ - If `linked_file` ends with `.open.md` → close is blocked; report "Linked problem ${linked_id} is still Open. Transition it to Known Error first, or update the Linked Problem reference."
215
+ - If no linked problem and the file has a **No Problem** section → close is allowed
216
+
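Reviewer sketch of the close gate as a single function. The three documented outcomes follow the bullets above. Two branches are assumptions, because the skill text does not cover them: an incident with neither a linked problem nor a **No Problem** section is treated as blocked, and so is a linked ID with no matching file. The name `can_close_incident` is hypothetical.

```bash
# Hypothetical close gate. Prints "allowed" or "blocked: <reason>" and returns 0 or 1.
can_close_incident() {
  local incident_file="$1" problems_dir="$2" linked_id linked_file
  # First P<NNN> found inside the "## Linked Problem" section, if any.
  linked_id=$(awk '/^## /{in_section = ($0 == "## Linked Problem")} in_section' "$incident_file" \
    | grep -oE 'P[0-9]+' | head -1)
  if [ -z "$linked_id" ]; then
    if grep -q '^## No Problem' "$incident_file"; then
      echo "allowed"; return 0
    fi
    # Assumption: the skill text does not define this case.
    echo "blocked: no linked problem and no No Problem section"; return 1
  fi
  linked_file=$(ls "$problems_dir/${linked_id}"-*.md 2>/dev/null | head -1)
  case "$linked_file" in
    *.known-error.md|*.closed.md) echo "allowed"; return 0 ;;
    *.open.md) echo "blocked: linked problem ${linked_id} is still Open"; return 1 ;;
    # Assumption: a missing or unrecognised problem file also blocks the close.
    *) echo "blocked: no usable file found for ${linked_id}"; return 1 ;;
  esac
}
```

Using a `case` on the filename keeps the gate consistent with the rest of the skill, where the file suffix is the source of truth for status.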
217
+ On close:
218
+
219
+ 1. `git mv docs/incidents/<I###>-<title>.restored.md docs/incidents/<I###>-<title>.closed.md`
220
+ 2. Update the **Status** field to "Closed"
221
+ 3. Append to **Timeline**: `[<timestamp> UTC] Incident closed`
222
+
223
+ ### 10. For list: Show active incidents
224
+
225
+ Read all `.investigating.md`, `.mitigating.md`, and `.restored.md` files in `docs/incidents/`. Extract ID, title, severity, and status. Sort by severity (highest first). Display as a markdown table.
226
+
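Reviewer sketch of the list operation. It assumes the `**Severity**` line begins with a numeric score, as in the template, and that a higher score means a more severe incident, which the skill text does not state explicitly. The helper name `list_active_incidents` is hypothetical, and it emits tab-separated rows instead of the markdown table the skill asks for.

```bash
# Hypothetical listing helper: one "score<TAB>id<TAB>status<TAB>title" row per
# active incident, highest score first. Files without a score sort as 0.
list_active_incidents() {
  local dir="$1" file base id status title score
  for file in "$dir"/I*.investigating.md "$dir"/I*.mitigating.md "$dir"/I*.restored.md; do
    [ -f "$file" ] || continue
    base=${file##*/}                            # I003-checkout-500s.mitigating.md
    id=${base%%-*}                              # I003
    status=${base%.md}; status=${status##*.}    # mitigating
    title=${base#*-}; title=${title%.*.md}      # checkout-500s
    score=$(sed -nE 's/^\*\*Severity\*\*: ([0-9]+).*/\1/p' "$file" | head -1)
    printf '%s\t%s\t%s\t%s\n' "${score:-0}" "$id" "$status" "$title"
  done | sort -t "$(printf '\t')" -k1,1nr
}
```

ID, status, and title all come from the filename, so the listing stays correct even if the Status field inside a file has drifted.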
227
+ ### 11. For link: Attach a problem
228
+
229
+ When the user runs `incident <I###> link P<MMM>`:
230
+
231
+ 1. Verify `docs/problems/P<MMM>-*.md` exists
232
+ 2. Update or add the **Linked Problem** section with `P<MMM> (<title>) — <status>`
233
+ 3. Report the link
234
+
235
+ ### 12. Edge cases
236
+
237
+ - **No problem required** — record a **No Problem** section with justification; close immediately.
238
+ - **Multiple incidents → one problem** — each incident links to the same `P<NNN>`; the problem file accumulates "Reported by incident" entries via `manage-problem`'s update flow.
239
+ - **Problem re-opens after the incident closed** — the closed incident stays closed; a new incident is declared for the new occurrence, linked to the re-opened problem.
240
+ - **Low-severity / solo-developer lightweight path** — for Sev 4-5 incidents, the skill may skip the Hypotheses section if the user confirms no investigation is needed. Timeline, Observations, and at least one mitigation attempt remain mandatory.
241
+
242
+ ### 13. Quality checks
243
+
244
+ After any operation, verify:
245
+
246
+ - **ID uniqueness**: no duplicate `I<NNN>` in `docs/incidents/`
247
+ - **Naming convention**: `I<NNN>-<kebab-case-title>.<status>.md`
248
+ - **Status consistency**: Status field matches filename suffix
249
+ - **Required sections**: Timeline, Observations, Hypotheses (or documented skip), Mitigation attempts
250
+ - **Evidence discipline**: every Hypothesis has a cited evidence reference
251
+ - **Linked Problem** section present and consistent (or **No Problem** with justification) once the incident reaches Restored
252
+
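Reviewer sketch covering the first three quality checks (ID uniqueness, naming convention, status consistency). The helper name `check_incidents` and the warning wording are hypothetical. The kebab-case pattern (lowercase letters, digits, single hyphens) is an assumption about what the skill intends.

```bash
# Hypothetical checker: prints one warning per problem found, nothing if clean.
check_incidents() {
  local dir="$1" file base suffix status_field
  # ID uniqueness: any I<NNN> prefix that appears on more than one file.
  ls "$dir" 2>/dev/null | grep -oE '^I[0-9]+' | sort | uniq -d \
    | sed 's/^/duplicate ID: /'
  for file in "$dir"/I*.md; do
    [ -f "$file" ] || continue
    base=${file##*/}
    # Naming convention: I<NNN>-<kebab-case-title>.<status>.md
    if ! printf '%s\n' "$base" | grep -qE '^I[0-9]{3,}-[a-z0-9]+(-[a-z0-9]+)*\.(investigating|mitigating|restored|closed)\.md$'; then
      echo "bad name: $base"
      continue
    fi
    # Status consistency: first word of the Status field, lowercased, vs the suffix.
    suffix=${base%.md}; suffix=${suffix##*.}
    status_field=$(sed -nE 's/^\*\*Status\*\*: *([A-Za-z]+).*/\1/p' "$file" | head -1 | tr '[:upper:]' '[:lower:]')
    [ "$status_field" = "$suffix" ] || echo "status mismatch: $base says '${status_field}'"
  done
}
```

Run as `check_incidents docs/incidents`. The remaining checks (required sections, evidence discipline, Linked Problem presence) need section-aware parsing and are left out of this sketch.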
253
+ ### 14. Report
254
+
255
+ After any operation, report:
256
+
257
+ - The file path created/modified
258
+ - The incident ID and title
259
+ - The current status
260
+ - For restore: the linked problem ID (or "No Problem" note)
261
+ - Any quality-check warnings
262
+
263
+ Do not commit. The user will commit when ready.
264
+
265
+ $ARGUMENTS
@@ -0,0 +1,171 @@
1
+ #!/usr/bin/env bats
2
+ # Functional tests for the manage-incident skill (Option A-lite per ADR-011).
3
+ #
4
+ # Scope: execute the bash fragments the SKILL.md instructs Claude to run
5
+ # (ID assignment, file-path construction, directory creation) and assert on
6
+ # the mocked Skill-tool handoff contract between manage-incident and
7
+ # manage-problem. Source-grep assertions on SKILL.md prose are NOT used
8
+ # (P011 ban). A single structural check asserts SKILL.md exists and has
9
+ # frontmatter — file-existence checks are a Permitted Exception per ADR-005.
10
+
11
+ setup() {
12
+ SKILL_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
13
+ SKILL_FILE="${SKILL_DIR}/SKILL.md"
14
+
15
+ TEST_ROOT="$(mktemp -d "${TMPDIR:-/tmp}/manage-incident-bats-XXXXXX")"
16
+ INCIDENTS_DIR="${TEST_ROOT}/docs/incidents"
17
+ PROBLEMS_DIR="${TEST_ROOT}/docs/problems"
18
+ }
19
+
20
+ teardown() {
21
+ rm -rf "$TEST_ROOT"
22
+ }
23
+
24
+ # --- Fragment: next-ID computation (I###) ---
25
+ # SKILL.md instructs: scan docs/incidents/ for existing I<NNN> files, take the
26
+ # highest numeric ID, increment by 1, zero-pad to 3 digits.
27
+ next_incident_id() {
28
+ local dir="$1"
29
+ local last
30
+ last=$(ls "$dir"/I*.md 2>/dev/null | sed 's/.*\///' | grep -oE '^I[0-9]+' | sed 's/^I//' | sort -n | tail -1)
31
+ if [[ -z "$last" ]]; then
32
+ printf 'I001'
33
+ else
34
+ printf 'I%03d' $((10#$last + 1))
35
+ fi
36
+ }
37
+
38
+ # --- Fragment: file path construction ---
39
+ incident_path() {
40
+ local dir="$1" id="$2" slug="$3" status="$4"
41
+ printf '%s/%s-%s.%s.md' "$dir" "$id" "$slug" "$status"
42
+ }
43
+
44
+ # --- Mock: Skill-tool invocation contract ---
45
+ # The SKILL.md instructs Claude to invoke wr-itil:manage-problem via the
46
+ # Skill tool on restoration. The contract (skill name + argument shape) is
47
+ # asserted by a mock that writes the payload to a file the test reads back.
48
+ invoke_skill_mock() {
49
+ local tool="$1" skill="$2" args="$3"
50
+ printf '%s\n%s\n%s\n' "$tool" "$skill" "$args" > "${TEST_ROOT}/skill-invocation.log"
51
+ }
52
+
53
+ # ---- Tests ----
54
+
55
+ @test "SKILL.md exists and has frontmatter" {
56
+ [ -f "$SKILL_FILE" ]
57
+ run head -1 "$SKILL_FILE"
58
+ [ "$status" -eq 0 ]
59
+ [ "$output" = "---" ]
60
+ }
61
+
62
+ @test "next_incident_id returns I001 when docs/incidents is empty" {
63
+ mkdir -p "$INCIDENTS_DIR"
64
+ run next_incident_id "$INCIDENTS_DIR"
65
+ [ "$status" -eq 0 ]
66
+ [ "$output" = "I001" ]
67
+ }
68
+
69
+ @test "next_incident_id returns I001 when docs/incidents does not exist" {
70
+ run next_incident_id "$INCIDENTS_DIR"
71
+ [ "$status" -eq 0 ]
72
+ [ "$output" = "I001" ]
73
+ }
74
+
75
+ @test "next_incident_id increments past the highest existing ID" {
76
+ mkdir -p "$INCIDENTS_DIR"
77
+ : > "$INCIDENTS_DIR/I001-foo.closed.md"
78
+ : > "$INCIDENTS_DIR/I002-bar.restored.md"
79
+ : > "$INCIDENTS_DIR/I005-baz.investigating.md"
80
+ run next_incident_id "$INCIDENTS_DIR"
81
+ [ "$status" -eq 0 ]
82
+ [ "$output" = "I006" ]
83
+ }
84
+
85
+ @test "next_incident_id zero-pads three digits" {
86
+ mkdir -p "$INCIDENTS_DIR"
87
+ : > "$INCIDENTS_DIR/I098-foo.closed.md"
88
+ run next_incident_id "$INCIDENTS_DIR"
89
+ [ "$status" -eq 0 ]
90
+ [ "$output" = "I099" ]
91
+ }
92
+
93
+ @test "next_incident_id ignores non-incident files" {
94
+ mkdir -p "$INCIDENTS_DIR"
95
+ : > "$INCIDENTS_DIR/README.md"
96
+ : > "$INCIDENTS_DIR/notes.md"
97
+ run next_incident_id "$INCIDENTS_DIR"
98
+ [ "$status" -eq 0 ]
99
+ [ "$output" = "I001" ]
100
+ }
101
+
102
+ @test "incident_path builds investigating file path" {
103
+ run incident_path "$INCIDENTS_DIR" "I001" "login-500s" "investigating"
104
+ [ "$status" -eq 0 ]
105
+ [ "$output" = "${INCIDENTS_DIR}/I001-login-500s.investigating.md" ]
106
+ }
107
+
108
+ @test "incident_path supports all lifecycle suffixes" {
109
+ for suffix in investigating mitigating restored closed; do
110
+ run incident_path "$INCIDENTS_DIR" "I042" "x" "$suffix"
111
+ [ "$status" -eq 0 ]
112
+ [ "$output" = "${INCIDENTS_DIR}/I042-x.${suffix}.md" ]
113
+ done
114
+ }
115
+
116
+ @test "docs/incidents is auto-created on first declaration" {
117
+ [ ! -d "$INCIDENTS_DIR" ]
118
+ mkdir -p "$INCIDENTS_DIR"
119
+ [ -d "$INCIDENTS_DIR" ]
120
+ id=$(next_incident_id "$INCIDENTS_DIR")
121
+ path=$(incident_path "$INCIDENTS_DIR" "$id" "test" "investigating")
122
+ : > "$path"
123
+ [ -f "$path" ]
124
+ }
125
+
126
+ @test "restore handoff invokes Skill tool with wr-itil:manage-problem and payload" {
127
+ invoke_skill_mock "Skill" "wr-itil:manage-problem" "incident I001 login-500s — rollback v1.4.3 restored service at 14:30 UTC"
128
+ [ -f "${TEST_ROOT}/skill-invocation.log" ]
129
+ run cat "${TEST_ROOT}/skill-invocation.log"
130
+ [ "$status" -eq 0 ]
131
+ [ "${lines[0]}" = "Skill" ]
132
+ [ "${lines[1]}" = "wr-itil:manage-problem" ]
133
+ [[ "${lines[2]}" == *"I001"* ]]
134
+ [[ "${lines[2]}" == *"rollback"* ]]
135
+ }
136
+
137
+ @test "restore handoff payload carries incident ID and mitigation" {
138
+ invoke_skill_mock "Skill" "wr-itil:manage-problem" "incident I042 — mitigation: feature flag off — verified via Datadog"
139
+ run cat "${TEST_ROOT}/skill-invocation.log"
140
+ [[ "${lines[2]}" == *"I042"* ]]
141
+ [[ "${lines[2]}" == *"feature flag off"* ]]
142
+ [[ "${lines[2]}" == *"Datadog"* ]]
143
+ }
144
+
145
+ @test "handoff contract rejects invocation with wrong skill name" {
146
+ invoke_skill_mock "Skill" "wr-itil:manage-change" "incident I001"
147
+ run grep -c '^wr-itil:manage-problem$' "${TEST_ROOT}/skill-invocation.log"
148
+ [ "$output" = "0" ]
149
+ }
150
+
151
+ @test "close is blocked when linked problem file is .open.md" {
152
+ mkdir -p "$PROBLEMS_DIR"
153
+ : > "$PROBLEMS_DIR/P050-root.open.md"
154
+ # close-gate: close only if linked P### is known-error or closed
155
+ linked=$(ls "$PROBLEMS_DIR"/P050-*.md 2>/dev/null | head -1)
156
+ [[ "$linked" != *".known-error.md" && "$linked" != *".closed.md" ]]
157
+ }
158
+
159
+ @test "close is allowed when linked problem file is .known-error.md" {
160
+ mkdir -p "$PROBLEMS_DIR"
161
+ : > "$PROBLEMS_DIR/P050-root.known-error.md"
162
+ linked=$(ls "$PROBLEMS_DIR"/P050-*.md 2>/dev/null | head -1)
163
+ [[ "$linked" == *".known-error.md" || "$linked" == *".closed.md" ]]
164
+ }
165
+
166
+ @test "close is allowed when linked problem file is .closed.md" {
167
+ mkdir -p "$PROBLEMS_DIR"
168
+ : > "$PROBLEMS_DIR/P050-root.closed.md"
169
+ linked=$(ls "$PROBLEMS_DIR"/P050-*.md 2>/dev/null | head -1)
170
+ [[ "$linked" == *".known-error.md" || "$linked" == *".closed.md" ]]
171
+ }
@@ -83,6 +83,12 @@ What "work" means depends on the problem's status:
83
83
  3. Include the problem doc closure in the fix commit (`git mv` to `.closed.md`, update Status)
84
84
  4. Push, create changeset, release per the lean release principle
85
85
 
86
+ **Scope expansion during work:** If investigation or architect review reveals that the problem's scope has grown significantly (e.g., effort re-sized from S to L, additional files discovered), use `AskUserQuestion` before continuing:
87
+ - Option 1: `Continue with expanded scope` — keep working this problem at its new size
88
+ - Option 2: `Update problem and re-rank` — save findings to the problem file, re-score WSJF, and re-run the work selection to let the user pick from the updated queue
89
+ - Option 3: `Pick a different problem` — park this one and work something else
90
+ - Use `header: "Scope change"` and `multiSelect: false`
91
+
86
92
  **In both cases:** After completing work on one problem, run `problem work` again to pick up the next highest-WSJF problem. Keep going until the user says stop or no more problems are actionable.
87
93
 
88
94
  ## Steps
@@ -239,7 +245,7 @@ Read `RISK-POLICY.md` to get the current impact levels (1-5), likelihood levels
239
245
  - Update the Status field to "Known Error"
240
246
  - This happens automatically — do not ask the user
241
247
 
242
- **Step 9c: Present summary**
248
+ **Step 9c: Present summary and select problem to work**
243
249
 
244
250
  After reviewing all problems, present a WSJF-ranked table:
245
251
 
@@ -253,6 +259,18 @@ Highlight:
253
259
  - Problems that have been fixed but not closed (check git history for fix commits)
254
260
  - Known errors with a `## Fix Released` section (pending user verification)
255
261
 
262
+ **When the operation is `work` (not just `review`), select the problem to work using `AskUserQuestion`:**
263
+
264
+ - If one problem has a strictly higher WSJF than all others, present it as the recommended option:
265
+ - Option 1: `Work P<NNN>: <title> (Recommended)` — with description showing WSJF score and status
266
+ - Option 2: `Pick a different problem` — let the user name a specific ID
267
+ - If two or more problems tie for the highest WSJF, present the tied problems as options:
268
+ - One option per tied problem: `Work P<NNN>: <title>` — with description showing WSJF and a one-line rationale for why this one
269
+ - Final option: `Pick a different problem`
270
+ - Use `header: "Next problem"` and `multiSelect: false`
271
+
272
+ **Never present the selection as prose "(a)/(b)/(c)" or "which would you like?"** — always use `AskUserQuestion` so the decision is structured and auditable.
273
+
256
274
  **Step 9d: Check for pending verifications**
257
275
 
258
276
  For each known-error that has a `## Fix Released` section, use `AskUserQuestion` to ask the user if the fix has been verified in production. If the user confirms, close the problem (`git mv` to `.closed.md`, update Status). If the user says no or is unsure, leave it as known-error.