npm - @windyroad/itil - Versions diffs - 0.3.3-preview.77 → 0.4.0-preview.81 - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@windyroad/itil",
-  "version": "0.3.3-preview.77",
+  "version": "0.4.0-preview.81",
   "description": "ITIL-aligned IT service management for Claude Code (problem, and future incident/change skills)",
   "bin": {
     "windyroad-itil": "./bin/install.mjs"

package/skills/manage-problem/SKILL.md CHANGED

@@ -8,6 +8,10 @@ allowed-tools: Read, Write, Edit, Bash, Glob, Grep, AskUserQuestion
 Create, update, or transition problem tickets following an ITIL-aligned problem management process. This skill is the authoritative definition of the problem management workflow — no separate process document is needed.
+## Output Formatting
+When referencing problem IDs, ADR IDs, or JTBD IDs in prose output, always include the human-readable title on first mention. Use the format `P029 (Edit gate overhead for governance docs)`, not bare `P029`. Tables with separate ID and Title columns are fine as-is.
 ## Operations
 - **Create**: `problem <title or description>` — creates a new open problem
@@ -146,6 +150,28 @@ Do NOT ask for fields that can be inferred:
 - **Symptoms**: Infer from description if possible
 - **Workaround**: Default to "None identified yet." unless obvious from context
+### 4b. For new problems: Concern-boundary analysis (multi-concern check)
+Before writing the problem file, perform a concern-boundary analysis on the gathered description to prevent conflated tickets that make WSJF scoring meaningless (P016).
+**Self-check**: Read the description and root cause information gathered in step 4. Answer: "How many distinct root causes are present? If fixed independently, how many separate fix paths exist?"
+- **Single concern** (one root cause, one fix path): proceed directly to step 5.
+- **Multiple concerns** (two or more distinct root causes, different components, or if the architect review flagged this needs its own ADR): present a split prompt.
+**Split prompt** — use `AskUserQuestion`:
+- `header: "Multi-concern problem"`
+- `multiSelect: false`
+- Options:
+  1. `Split into separate problems (Recommended)` — description: "Create one problem ticket per distinct concern, with consecutive IDs. Each ticket gets its own priority, WSJF score, and fix path."
+  2. `Keep as a single problem` — description: "Create one ticket covering all concerns. Use this only if the concerns are so tightly coupled that they cannot be fixed independently."
+**Non-interactive fallback**: When `AskUserQuestion` is unavailable (e.g., non-interactive/AFK mode), automatically split into separate problems and note the auto-split in output. Do not block creation.
+**Split implementation**: When splitting, assign consecutive IDs (e.g., if next ID is 035, create P035 and P036). Create each problem file independently. Cross-reference each ticket in the other's "Related" section.
+**Scope**: This step applies only to **new problem creation** (steps 2–5). It does NOT apply to updates, status transitions, or reviews of existing tickets.
 ### 5. For new problems: Write the problem file
 **File path**: `docs/problems/<NNN>-<kebab-case-title>.open.md`
@@ -235,19 +261,24 @@ This is a batch operation that reviews every open/known-error problem and update
 **Fast-path for `work` (skip full re-scan when cache is fresh):**
-Before running the full review, check whether `docs/problems/README.md` exists and is up to date:
+Before running the full review, check whether `docs/problems/README.md` exists and is up to date using **git history** (not filesystem mtime, which is unreliable in worktrees and fresh checkouts — see P031):
 ```bash
-find docs/problems -name "*.md" ! -name "README.md" -newer docs/problems/README.md 2>/dev/null | head -1
+readme_commit=$(git log -1 --format=%H -- docs/problems/README.md 2>/dev/null)
+# Cache is stale if: no README commit, OR problem files committed since README, OR uncommitted problem file changes
+if [ -z "$readme_commit" ] || \
+   git log --oneline "${readme_commit}..HEAD" -- 'docs/problems/*.md' ':!docs/problems/README.md' 2>/dev/null | grep -q .; then
+  echo "stale"
+fi
 ```
-If this command produces **no output** (README.md is newer than all problem files), the cache is fresh:
+If the command produces **no output** (no problem files have been committed or modified since the last README.md update), the cache is fresh:
 - Read `docs/problems/README.md` only — it contains the ranked table from the last review
 - Skip steps 9a–9b entirely
 - Proceed directly to step 9c (work selection) using the cached table
 - Note in the output: "Using cached ranking from [timestamp in README.md]"
-If the command produces output, or `README.md` does not exist, run the full review (steps 9a–9e) and refresh the cache.
+If the command prints "stale", or `README.md` does not exist in git, run the full review (steps 9a–9e) and refresh the cache.
 **Step 9a: Read the risk framework**
@@ -304,7 +335,7 @@ Highlight:
 **Step 9d: Check for pending verifications**
-For each known-error that has a `## Fix Released` section, use `AskUserQuestion` to ask the user if the fix has been verified in production. If the user confirms, close the problem (`git mv` to `.closed.md`, update Status). If the user says no or is unsure, leave it as known-error.
+For each known-error that has a `## Fix Released` section, use `AskUserQuestion` to ask the user if the fix has been verified in production. The question MUST include a fix summary extracted from the `## Fix Released` section — include the first sentence (or first bullet list) of that section in the question body or as the option description, so the user can answer without reading the full problem file. Do not ask with only the problem ID + title + version. If the user confirms, close the problem (`git mv` to `.closed.md`, update Status). If the user says no or is unsure, leave it as known-error.
 **Step 9e: Update files and refresh README.md cache**
@@ -333,9 +364,12 @@ Edit each problem file where the priority changed. Then write/overwrite `docs/pr
 Then commit all changed files per ADR-014:
 1. `git add` the changed problem files and `docs/problems/README.md`
-2. Delegate to `wr-risk-scorer:pipeline` to assess and create a bypass marker
+2. Satisfy the commit gate — two paths are valid (either produces a bypass marker):
+   - **Primary**: delegate to the `wr-risk-scorer:pipeline` subagent-type via the Agent tool
+   - **Fallback**: if the `wr-risk-scorer:pipeline` subagent-type is not available in the current tool set (e.g., this skill is itself running inside a spawned subagent), invoke the `/wr-risk-scorer:assess-release` skill via the Skill tool. Per ADR-015 it wraps the same pipeline subagent and produces an equivalent bypass marker via the `PostToolUse:Agent` hook. Do not silently skip the gate because the primary path is unavailable — the fallback exists specifically to close this gap (see P035).
 3. `git commit -m "docs(problems): review — re-rank priorities"`
-If `AskUserQuestion` is unavailable and risk is above appetite, skip the commit and report the uncommitted state.
+If `AskUserQuestion` is unavailable and risk is above appetite, skip the commit and report the uncommitted state (ADR-013 Rule 6 fail-safe). This applies only to the risk-above-appetite branch, not to the delegation-unavailable case above.
 ### 10. Quality checks
@@ -360,13 +394,15 @@ After any operation, report:
 Commit the completed work per ADR-014 (governance skills commit their own work):
 1. `git add` all created/modified files for this operation
-2. Delegate to `wr-risk-scorer:pipeline` (subagent_type: `wr-risk-scorer:pipeline`) to assess the staged changes and create a bypass marker
+2. Satisfy the commit gate — two paths are valid (either produces a bypass marker):
+   - **Primary**: delegate to the `wr-risk-scorer:pipeline` subagent-type via the Agent tool (subagent_type: `wr-risk-scorer:pipeline`)
+   - **Fallback**: if the `wr-risk-scorer:pipeline` subagent-type is not available in the current tool set (e.g., this skill is itself running inside a spawned subagent), invoke the `/wr-risk-scorer:assess-release` skill via the Skill tool. Per ADR-015 it wraps the same pipeline subagent and the `PostToolUse:Agent` hook writes an equivalent bypass marker. Do not silently skip the gate because the primary path is unavailable — the fallback exists specifically to close this gap (see P035).
 3. `git commit -m "<message>"` using the convention for the operation type:
    - New problem: `docs(problems): open P<NNN> <title>`
    - Known Error transition: `docs(problems): P<NNN> known error — <root cause summary>`
    - Problem closed: `docs(problems): close P<NNN> <title>`
    - Review/re-rank: `docs(problems): review — re-rank priorities`
    - Fix implemented: `fix(<scope>): <description> (closes P<NNN>)` — include problem file changes in the same commit
-4. If risk is above appetite: use `AskUserQuestion` to ask whether to commit anyway, remediate first, or park the work. If `AskUserQuestion` is unavailable, skip the commit and report the uncommitted state clearly.
+4. If risk is above appetite: use `AskUserQuestion` to ask whether to commit anyway, remediate first, or park the work. If `AskUserQuestion` is unavailable, skip the commit and report the uncommitted state clearly (ADR-013 Rule 6 fail-safe). This applies only to the risk-above-appetite branch, not to the delegation-unavailable case above.
 $ARGUMENTS
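
The stale-cache snippet in the step 9 hunk above checks committed history only: its comment lists "uncommitted problem file changes" as a staleness condition, but neither branch of the `if` inspects the working tree. A minimal sketch that covers all three conditions, assuming `git status --porcelain` is an acceptable signal for uncommitted edits; `problems_cache_state` is a hypothetical name, not part of the package:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): decide whether docs/problems/README.md is a
# fresh cache of the problem files, using git history and the working tree
# instead of filesystem mtimes. Run from the repository root.
problems_cache_state() {
  local dir="${1:-docs/problems}"
  local readme="$dir/README.md"
  local readme_commit
  readme_commit=$(git log -1 --format=%H -- "$readme" 2>/dev/null)

  # 1. README.md has never been committed.
  if [ -z "$readme_commit" ]; then
    echo "stale"; return
  fi
  # 2. Problem files committed after the last README.md commit.
  if [ -n "$(git log --oneline "${readme_commit}..HEAD" -- "$dir/*.md" ":!$readme" 2>/dev/null)" ]; then
    echo "stale"; return
  fi
  # 3. Uncommitted problem file changes (staged, unstaged, or untracked),
  #    the condition the shipped snippet's comment names but never tests.
  if [ -n "$(git status --porcelain -- "$dir/*.md" ":!$readme" 2>/dev/null)" ]; then
    echo "stale"; return
  fi
  echo "fresh"
}
```

It prints `fresh` or `stale`. Comparing commits rather than mtimes follows the reasoning recorded for P031: a fresh checkout or worktree stamps files at checkout time, so `find -newer` cannot tell which file actually changed last.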

package/skills/manage-problem/test/manage-problem-concern-boundary.bats ADDED

@@ -0,0 +1,64 @@
+#!/usr/bin/env bats
+# Doc-lint guard: manage-problem SKILL.md must include a concern-boundary
+# analysis step for new problem creation.
+#
+# Structural assertion — Permitted Exception to the source-grep ban (ADR-005 / P011).
+# These tests assert that the skill specification document conforms to the
+# concern-boundary splitting contract introduced by P016.
+#
+# Cross-reference:
+#   P016: docs/problems/016-manage-problem-should-split-multi-concern-tickets.open.md
+#   ADR-013: docs/decisions/013-structured-user-interaction-for-governance-decisions.proposed.md
+#   @jtbd JTBD-001 (enforce governance without slowing down)
+#   @jtbd JTBD-101 (extend the suite with clear patterns)
+setup() {
+  SKILL_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+  SKILL_FILE="${SKILL_DIR}/SKILL.md"
+}
+@test "SKILL.md includes a concern-boundary analysis step for new problem creation" {
+  # P016: Before writing a problem file (step 5), the skill must check whether
+  # the description contains multiple distinct root causes or concerns, and offer
+  # to split if it does. This guards against conflated tickets that make WSJF
+  # scoring meaningless.
+  run grep -in "concern.boundary\|concern-boundary\|concern boundary\|boundary.*concern\|split.*concern\|multi.concern\|single.*concern" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md concern-boundary step uses AskUserQuestion, not prose (ADR-013)" {
+  # ADR-013 Rule 1: all branch points must use AskUserQuestion, not prose options.
+  # The concern-boundary split decision (split vs keep as one) is a branch point
+  # and must be handled with a structured AskUserQuestion call, not a
+  # '(a) split (b) keep' prose paragraph.
+  # This test verifies the split prompt references AskUserQuestion (not just that
+  # AskUserQuestion appears anywhere — the no-prose-options.bats test covers that).
+  run grep -n "concern.boundary\|concern-boundary\|concern boundary\|split.*concern\|multi.concern" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  # The split decision must direct the skill to use AskUserQuestion
+  run grep -in "split.*AskUserQuestion\|AskUserQuestion.*split\|split.*question\|question.*split\|split.*ask\|concern.*AskUserQuestion\|AskUserQuestion.*concern" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md concern-boundary step is scoped to new problem creation (not updates)" {
+  # P016 fix must only fire during new problem creation (between steps 4 and 5),
+  # not during updates or transitions. Scope constraint prevents spurious split
+  # prompts on existing tickets being updated or transitioned.
+  # This checks that the concern-boundary step is placed in the 'new problems'
+  # section (steps 2-5), not in the update or transition sections.
+  run grep -n "For new problems\|new problem" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  # The concern-boundary check must appear in the new-problems workflow context
+  run grep -A5 -i "concern.boundary\|concern-boundary\|concern boundary\|multi.concern\|split.*concern" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md concern-boundary step specifies non-interactive fallback with auto-split" {
+  # ADR-013 Rule 6: non-interactive fail-safe — when AskUserQuestion is unavailable,
+  # the skill must auto-split rather than hanging or silently dropping the split.
+  # This specifically requires "auto-split" or "automatically split" language in
+  # the concern-boundary step, not just general "AskUserQuestion unavailable" text
+  # which already exists for the commit step (step 11).
+  run grep -in "auto.split\|automatically split" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
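
Step 4b's split rule ("if next ID is 035, create P035 and P036") needs the next free ID. A minimal sketch, assuming IDs come from the `<NNN>-<kebab-case-title>.<status>.md` file names used in step 5 and that the next ID is one above the highest `<NNN>` across every status, closed and parked included; `next_problem_ids` is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print COUNT consecutive, zero-padded problem
# IDs for a split, based on the highest <NNN> prefix in the problems directory
# (any status: open, known-error, closed, parked).
next_problem_ids() {
  local dir="${1:-docs/problems}" count="${2:-1}"
  local max=0 f base n i
  for f in "$dir"/[0-9][0-9][0-9]-*.md; do
    [ -e "$f" ] || continue            # the glob matched nothing
    base=${f##*/}                      # strip the directory
    n=$((10#${base%%-*}))              # force base 10 so "035" is not octal
    if [ "$n" -gt "$max" ]; then max=$n; fi
  done
  for ((i = 1; i <= count; i++)); do
    printf 'P%03d\n' $((max + i))
  done
}
```

The `10#` prefix matters: bash arithmetic treats a leading-zero number such as `008` as octal and rejects it.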

package/skills/manage-problem/test/manage-problem-no-prose-options.bats CHANGED

@@ -61,3 +61,14 @@ setup() {
   run grep -n "Scope change" "$SKILL_FILE"
   [ "$status" -eq 0 ]
 }
+@test "SKILL.md step 9d requires fix summary extracted from Fix Released in AskUserQuestion (P030)" {
+  # P030: verification prompts must include a one-line fix summary extracted from the
+  # '## Fix Released' section so the user can answer without a clarifying round-trip.
+  # This checks that step 9d explicitly instructs including fix content in the question,
+  # not just detecting Fix Released to decide which problems need verification.
+  # The fix must add wording like "extract" or "include" + "Fix Released" + "summary"
+  # (or "question") within step 9d. A generic "Fix Released" mention is insufficient.
+  run grep -n "fix summary\|Fix Released.*question\|Fix Released.*summary\|extract.*Fix Released\|include.*Fix Released\|summary.*Fix Released" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
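
The P030 test above only checks that the step 9d wording exists. For the extraction itself, a minimal awk sketch that pulls the first block (a paragraph or a bullet list) out of the `## Fix Released` section, assuming the section ends at the next `## ` heading; `fix_summary` is hypothetical, and trimming a paragraph down to its first sentence is left to the caller:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the first non-blank block of the
# "## Fix Released" section of a problem file, for use as the fix summary
# in a verification question. Prints nothing if the section is absent.
fix_summary() {
  awk '
    /^## Fix Released/      { in_section = 1; next }
    in_section && /^## /    { exit }                      # next section: stop
    in_section && NF == 0   { if (started) exit; next }   # blank line
    in_section              { print; started = 1 }
  ' "$1"
}
```

The result can go straight into the question body or an option description, so the user never has to verify a fix from only an ID, title, and version.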

package/skills/manage-problem/test/manage-problem-output-formatting.bats ADDED

@@ -0,0 +1,30 @@
+#!/usr/bin/env bats
+# Doc-lint guard: manage-problem SKILL.md must include the output formatting rule
+# requiring human-readable titles alongside bare IDs (P032).
+#
+# Structural assertion — Permitted Exception to the source-grep ban (ADR-005 / P011).
+# These tests assert that the skill specification document contains the output
+# formatting instruction so agents include titles with IDs in prose output.
+#
+# Cross-reference:
+#   P032 (agent output uses opaque IDs without titles)
+#   @jtbd JTBD-001 (enforce governance without slowing down)
+setup() {
+  SKILL_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+  SKILL_FILE="${SKILL_DIR}/SKILL.md"
+}
+@test "SKILL.md contains output formatting section" {
+  run grep -n "## Output Formatting" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md output formatting rule requires titles with IDs (P032)" {
+  # P032: agents must include human-readable titles when referencing IDs in prose.
+  # The rule must mention including the title alongside IDs.
+  run grep -n "title" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  run grep -n "Output Formatting" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
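
The second test above passes as long as the word "title" appears anywhere in SKILL.md, so it guards the presence of the rule rather than its effect. A hedged sketch of a check on output text itself: it lists problem IDs whose first prose mention lacks a parenthesised title, skips table rows (which the rule exempts), and covers only `P<NNN>` IDs; `bare_first_mentions` is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): list P<NNN> IDs whose FIRST prose mention is
# not followed by " (Title)". Lines starting with "|" are table rows and are
# skipped. ADR and JTBD IDs would need extra patterns.
bare_first_mentions() {
  grep -v '^[[:space:]]*|' "$1" \
    | grep -oE 'P[0-9]{3}( [(])?' \
    | awk '
        { id = substr($0, 1, 4) }
        !(id in seen) { seen[id] = 1; if ($0 !~ /[(]$/) print id }
      '
}
```

Only the first mention of each ID is judged, matching the rule's "on first mention" wording; later bare repeats of an already-titled ID pass.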

package/skills/manage-problem/test/manage-problem-parked-and-cache.bats CHANGED

@@ -59,10 +59,15 @@ setup() {
   [ "$status" -eq 0 ]
 }
-@test "SKILL.md describes checking README.md freshness before full re-scan" {
-  # The work fast-path: if README.md is newer than all problem files,
-  # skip the 18-file re-scan and read the cached table directly.
-  # Proxy: SKILL.md mentions -newer (the find flag used for mtime comparison).
-  run grep -q "\-newer" "$SKILL_FILE"
+@test "SKILL.md describes checking README.md freshness using git history, not mtime" {
+  # P031: The mtime-based `find -newer` check is broken in git worktrees
+  # because all files receive the same mtime at checkout time.
+  # The cache-freshness check must use git log to compare commits, not
+  # filesystem timestamps.
+  # Positive: SKILL.md uses git log for cache freshness.
+  run grep -q "git log.*README\.md" "$SKILL_FILE"
   [ "$status" -eq 0 ]
+  # Negative: SKILL.md must NOT use find -newer for cache freshness.
+  run grep -q "\-newer" "$SKILL_FILE"
+  [ "$status" -ne 0 ]
 }

package/skills/work-problems/SKILL.md ADDED

@@ -0,0 +1,150 @@
+---
+name: wr-itil:work-problems
+description: Batch-work ITIL problem tickets while the user is AFK. Loops through the problem backlog by WSJF priority, delegating each problem to wr-itil:manage-problem, and stops when nothing is left to progress. Use this skill whenever the user says things like "work through my problems", "grind problems", "work the backlog", "work problems while I'm away", "process problems AFK", or any request to autonomously work through multiple problem tickets without interactive input. Also trigger when the user asks to "loop" or "batch" problem work, or says they'll be away and wants problems handled.
+allowed-tools: Skill, Bash, Glob, Grep, Read
+---
+# Work Problems — AFK Batch Orchestrator
+Autonomously loop through ITIL problem tickets by WSJF priority, working each one via `wr-itil:manage-problem`, until nothing actionable remains.
+The user is AFK during this process, so every decision point that would normally require interactive input should be resolved automatically using safe defaults. The skill reports progress between iterations so the user can review what happened when they return.
+## How It Works
+Each iteration is one cycle of: scan backlog, pick highest-WSJF problem, work it, report result. The loop continues until a stop condition is met.
+### Step 1: Scan the backlog
+Read `docs/problems/README.md` if it exists and is fresh (check via git history — see manage-problem step 9 for the cache freshness check). If stale or missing, scan all `.open.md` and `.known-error.md` files in `docs/problems/`, extract their WSJF scores, and rank them.
+Exclude:
+- `.closed.md` files (done)
+- `.parked.md` files (blocked on upstream)
+- Problems with no WSJF score (need a review first — run `/wr-itil:manage-problem review` as the first iteration if scores are missing)
+### Step 2: Check stop conditions
+Stop the loop and report a summary if any of these are true:
+1. **No actionable problems** — zero open or known-error problems remain
+2. **All remaining problems require interactive input** — e.g., they all need user verification (known-errors with `## Fix Released`), or their scope expanded beyond what's safe to auto-resolve
+3. **All remaining problems are blocked** — investigation hit a dead end, or the fix requires changes outside the project
+When stopping, output a summary table of what was worked and what remains, then output exactly:
+```
+ALL_DONE
+```
+This sentinel line allows external scripts to detect completion.
+### Step 3: Pick the highest-WSJF problem
+Select the problem with the highest WSJF score. If there's a tie, prefer:
+1. Known Errors over Open problems (they have a confirmed fix path — less risk of wasted effort)
+2. Smaller effort over larger (faster throughput)
+3. Older reported date (longer wait = higher urgency)
+### Step 4: Classify each problem
+Read the problem file and apply these deterministic rules:
+| Problem state | Action |
+|---|---|
+| Known Error with `## Fix Released` | **Skip** — needs user verification |
+| Known Error with fix strategy documented | **Work it** — implement the fix |
+| Known Error without fix strategy | **Work it** — produce a fix strategy, then implement |
+| Open problem with preliminary hypothesis or investigation notes | **Work it** — continue the investigation |
+| Open problem with no leads (empty Root Cause Analysis) | **Work it** — read the relevant code, form a hypothesis, document findings |
+| Problem previously attempted twice without progress in this session | **Skip** — mark as stuck, needs interactive attention |
+The default is to work the problem. Only skip when the rule explicitly says so. This is an AFK loop — forward progress matters more than avoiding dead ends, because dead ends are cheap (findings are saved) and interactive input is expensive (user is absent).
+**Time-box each problem** to avoid runaway investigation: the delegated `manage-problem` skill's internal logic decides scope. If investigation reveals the scope has grown (e.g., effort was estimated S but turns out to be L), save findings to the problem file, update the WSJF score, and move to the next problem. Never sink unbounded effort into one problem during AFK mode.
+If a problem is skipped by this step, add it to a "skipped" list with the reason and loop back to step 3 for the next one.
+### Step 5: Work the problem
+Invoke the manage-problem skill:
+```
+/wr-itil:manage-problem work highest WSJF problem that can be progressed non-interactively as the user is AFK
+```
+The manage-problem skill will:
+- Run a review if the cache is stale
+- Select and work the highest-WSJF problem
+- Use its built-in non-interactive fallbacks (auto-split multi-concern problems, auto-commit when risk is within appetite)
+- Commit completed work per ADR-014
+### Step 6: Report progress
+After each iteration, report:
+- Which problem was worked (ID + title)
+- What was done (investigated, transitioned to known-error, fix implemented, etc.)
+- The outcome (success, partially progressed, skipped, scope expanded)
+- How many problems remain in the backlog
+Format as a brief status line, not a wall of text. The user will read these when they return.
+**Example:**
+```
+[Iteration 1] Worked P029 (Edit gate overhead for governance docs) — implemented fix, closed. 8 problems remain.
+[Iteration 2] Worked P021 (Governance skill structured prompts) — investigated root cause, transitioned to known-error. 7 problems remain.
+[Iteration 3] Skipped P016 (Multi-concern ticket splitting) — fix released, awaiting user verification. Worked P024 (Risk scorer WIP flag) — implemented fix, closed. 6 problems remain.
+```
+### Step 7: Loop
+Go back to step 1. The backlog may have changed — new problems may have been created during fixes, priorities may have shifted, and the README.md cache will be stale.
+## Non-Interactive Decision Making
+When `AskUserQuestion` is unavailable or the user is AFK, the skill (and the delegated manage-problem skill) should resolve decisions automatically:
+| Decision Point | Non-Interactive Default |
+|---|---|
+| Which problem to work | Highest WSJF, no prompt needed |
+| Multi-concern split | Auto-split (manage-problem step 4b fallback) |
+| Scope expansion during work | Update problem file, re-score WSJF, move to next problem instead of continuing |
+| Commit when risk within appetite | Auto-commit (manage-problem step 9e fallback) |
+| Commit when risk above appetite | Skip commit, report uncommitted state |
+| Fix verification needed | Skip problem, add to "needs verification" list |
+## Edge Cases
+**Review needed first**: If no problems have WSJF scores, run `/wr-itil:manage-problem review` as the first iteration to score everything, then proceed to the work loop.
+**Scope creep during investigation**: If investigating an open problem reveals the scope is larger than expected (effort re-sized from S to L), save findings to the problem file, update the WSJF score, and move to the next problem. Don't sink unlimited effort into one problem during AFK mode — the user can decide when they return.
+**Circular work**: If the same problem keeps appearing as highest-WSJF across iterations without making progress, skip it after the second attempt and note it as "stuck — needs interactive attention".
+**Git conflicts**: If a commit fails due to conflicts, stop the loop and report the conflict. Don't try to resolve conflicts non-interactively.
+## Output Format
+The skill should produce a final summary when the loop ends:
+```
+## Work Problems Summary
+### Completed
+| # | Problem | Action | Result |
+|---|---------|--------|--------|
+| 1 | P029 (Edit gate overhead) | Implemented fix | Closed |
+| 2 | P021 (Structured prompts) | Investigated root cause | Transitioned to Known Error |
+### Skipped
+| Problem | Reason |
+|---------|--------|
+| P016 (Multi-concern splitting) | Awaiting user verification |
+### Remaining Backlog
+| WSJF | Problem | Status |
+|------|---------|--------|
+| 9.0 | P012 (Skill testing harness) | Open |
+ALL_DONE
+```
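
Steps 1 and 3 of work-problems amount to a filter plus a four-key sort. A minimal sketch of the sort, assuming the actionable rows have already been extracted as tab-separated fields; the skill does not say how WSJF is stored in each problem file, so the row format, the status names, and the S/M/L effort sizes are assumptions, and `rank_backlog` is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): order problems per Step 3. Input is one
# tab-separated row per actionable problem (.closed.md, .parked.md, and
# unscored problems already removed):
#   WSJF <TAB> status <TAB> effort <TAB> reported (YYYY-MM-DD) <TAB> id
# Output is the IDs, best candidate first.
rank_backlog() {
  awk -F'\t' -v OFS='\t' '
    {
      status_rank = ($2 == "known-error") ? 0 : 1     # Known Errors first
      effort_rank = ($3 == "S") ? 1 : ($3 == "M") ? 2 : ($3 == "L") ? 3 : 4
      print $1, status_rank, effort_rank, $4, $5
    }' \
    | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2n -k3,3n -k4,4 \
    | cut -f5
}
```

`LC_ALL=C` keeps `sort -n` reading `9.0` with a dot as the decimal point regardless of locale, and ISO dates sort correctly as plain strings, so older problems win the last tie-break.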

package/skills/work-problems/evals/evals.json ADDED

@@ -0,0 +1,45 @@
+{
+  "skill_name": "wr-itil:work-problems",
+  "evals": [
+    {
+      "id": 1,
+      "prompt": "work through my problems while I'm away — I'll be AFK for a bit so just grind through whatever you can",
+      "expected_output": "Works highest-WSJF problems in sequence, reports progress per iteration, outputs ALL_DONE when nothing is left",
+      "files": [],
+      "expectations": [
+        "Output includes a progress line for each iteration with problem ID and title",
+        "Output includes a final summary table with Completed and Skipped sections",
+        "Output ends with the ALL_DONE sentinel",
+        "Problems are worked in WSJF priority order (highest first)",
+        "Problems needing user verification (Fix Released) are skipped with reason",
+        "Each iteration reports how many problems remain"
+      ]
+    },
+    {
+      "id": 2,
+      "prompt": "grind the problem backlog for me — do as many as you can without asking me anything",
+      "expected_output": "Same loop behavior but triggered by different phrasing. Should still work WSJF-ordered, skip interactive decisions, report progress.",
+      "files": [],
+      "expectations": [
+        "Skill triggers correctly from casual 'grind the backlog' phrasing",
+        "Does not use AskUserQuestion during execution",
+        "Commits work automatically when risk is within appetite",
+        "Handles scope expansion conservatively (saves findings, moves to next problem)",
+        "Stops when no more actionable problems remain"
+      ]
+    },
+    {
+      "id": 3,
+      "prompt": "I need to step out — can you work through the open problems? Start with the highest priority ones. Don't wait for me on anything, just make the best call you can.",
+      "expected_output": "Runs review if cache is stale, then loops through problems. Demonstrates the review-first-then-work pattern and handles edge cases.",
+      "files": [],
+      "expectations": [
+        "Runs a review or uses cached rankings before starting work",
+        "Known Errors are preferred over Open problems when WSJF is tied",
+        "If a problem is attempted twice without progress, it is skipped as stuck",
+        "Git conflicts cause the loop to stop with a clear report",
+        "Final output includes Remaining Backlog section showing what's left"
+      ]
+    }
+  ]
+}
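
Step 2 of work-problems says the `ALL_DONE` line exists so external scripts can detect completion. A minimal sketch of such a wrapper, assuming each agent session's transcript can be captured on stdout; `run_session` is a hypothetical placeholder for whatever starts the agent with the work-problems prompt, and the retry cap is an arbitrary choice:

```bash
#!/usr/bin/env bash
# Sketch of an external wrapper that re-runs the AFK session until the skill
# prints the ALL_DONE sentinel on a line by itself. run_session is a
# hypothetical placeholder that must be defined by the caller.

# Succeeds only if some line of the given file is exactly "ALL_DONE".
saw_all_done() {
  grep -qx 'ALL_DONE' "$1"
}

drive() {
  local max_sessions="${1:-5}" log i
  log=$(mktemp)
  for ((i = 1; i <= max_sessions; i++)); do
    run_session > "$log" 2>&1 || echo "session $i exited non-zero" >&2
    if saw_all_done "$log"; then
      echo "backlog done after $i session(s)"
      return 0
    fi
  done
  echo "stopped after $max_sessions session(s) without ALL_DONE" >&2
  return 1
}
```

Matching with `grep -x` requires the whole line to be `ALL_DONE`, so a progress line or summary that merely mentions the sentinel does not end the loop early.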