@windyroad/itil 0.19.0-preview.192 → 0.19.1-preview.194

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,5 +1,5 @@
1
1
  {
2
2
  "name": "wr-itil",
3
- "version": "0.19.0",
3
+ "version": "0.19.1",
4
4
  "description": "ITIL-aligned IT service management for Claude Code"
5
5
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@windyroad/itil",
3
- "version": "0.19.0-preview.192",
3
+ "version": "0.19.1-preview.194",
4
4
  "description": "ITIL-aligned IT service management for Claude Code (problem, and future incident/change skills)",
5
5
  "bin": {
6
6
  "windyroad-itil": "./bin/install.mjs"
@@ -158,13 +158,15 @@ PROMPT_EOF
158
158
  claude -p \
159
159
  --permission-mode bypassPermissions \
160
160
  --output-format json \
161
- "$ITERATION_PROMPT"
161
+ "$ITERATION_PROMPT" \
162
+ < /dev/null
162
163
  ```
163
164
 
164
165
  **Flag rationale:**
165
166
 
166
167
  - `--permission-mode bypassPermissions` — handles non-interactive permission prompts. Without this, Bash/Edit/Write calls inside the subprocess halt on approval prompts (no TTY). Alternative modes (`acceptEdits`, `auto`, `dontAsk`) are acceptable if adopters need narrower permission scopes; `bypassPermissions` is the broadest and the empirically-verified path.
167
168
  - `--output-format json` — deterministic structured output. The subprocess's final agent message lands in the JSON response's `.result` field; orchestrator extracts `ITERATION_SUMMARY` from that field. Plain-text output would require fragile scraping.
169
+ - `< /dev/null` — explicit stdin-closed redirect (P089 Gap 1). Without this, `claude -p` waits up to 3s for stdin data in non-TTY contexts and then prints `Warning: no stdin data received in 3s, proceeding without it. If piping from a slow command, redirect stdin explicitly: < /dev/null to skip, or wait longer.` to stderr. The warning is on stderr — if the caller separates stderr and stdout streams, the warning is harmless. But the orchestrator captures via `2>&1` (required because the CLI emits progress prose on stderr that must not interleave between JSON responses when multiple invocations chain). Under the `2>&1` merge the stderr warning prefixes the stdout JSON and breaks `jq` / `json.load` / `JSON.parse` extraction at "line 1, column 1: Expecting value". The redirect suppresses the warning at source. First observed AFK-iter-7 iter 1 (2026-04-21); workaround is the Anthropic CLI help's own suggestion.
168
170
 
169
171
  **No per-iteration budget cap.** The dispatch deliberately omits `--max-budget-usd`. Per user direction 2026-04-21: the natural stop condition for an AFK loop is quota exhaustion, not an arbitrary per-iteration dollar cap. A cap would halt iterations before quota is actually exhausted, wasting remaining budget. Runaway-iteration risk is bounded by quota + the orchestrator's Step 6.75 halt on unexpected dirty state + exit-code handling below.
170
172
 
@@ -224,6 +226,13 @@ SESSION_CACHE_READ_TOKENS=$(( ${SESSION_CACHE_READ_TOKENS:-0} + ITER_CACHE_READ
224
226
 
225
227
  Do NOT extract `session_id`, `model`, `stop_reason`, `permission_denials`, `uuid`, or any other field from the JSON response. Those are subprocess-envelope fields that serve no user-visible purpose and risk leaking subprocess-internal identifiers into orchestrator output.
226
228
 
229
+ **Authority hierarchy (P089 Gap 2).** `total_cost_usd` and `usage.*` do NOT have the same reliability envelope — treat them accordingly when aggregating:
230
+
231
+ - `.total_cost_usd` is **authoritative for dollar cost** — cumulative across the subprocess's entire lifetime by contract. Use it as the sole source of truth for the Session Cost "Total cost (USD)" column and any cost-based stop condition.
232
+ - `.usage.*` token fields are **best-effort approximate** — the Anthropic CLI returns the final API response envelope, which is per-turn by construction. When the subprocess exits on a normal final turn the fields accumulate real usage; when the subprocess exits via a background-task completion-notification ack (a closing turn that only acknowledges a backgrounded task finished), the fields reflect ONLY that final ack turn and undercount dramatically. Detectable anomaly shape: the subprocess reports a final-turn-sized usage (handful of input tokens, hundreds of output tokens) alongside a wall-clock duration from the Bash wrapper's own timer that is orders of magnitude larger than the JSON's `duration_ms` field — the cumulative dollar cost still matches real spend, so the mismatch is self-evident on inspection.
233
+
234
+ Aggregation rule: sum `.total_cost_usd` into the session total and trust it; sum `.usage.*` into the session totals for cache-reuse ratio reasoning but label them best-effort in the Session Cost table. This asymmetry is correct-by-CLI-contract (cost is a session cumulative; usage is a per-response envelope); the orchestrator documents the asymmetry so adopters do not silently under-count tokens. First observed AFK-iter-7 iter 5 (2026-04-21): 1071s wall-clock / 60+ tool-use subprocess returned `duration_ms: 8546, num_turns: 1, usage.* ≈ 137K tokens, total_cost_usd: 6.08` — cost cumulative and correct, tokens reflecting only the final ack turn.
235
+
227
236
  **Exit-code semantics.** `claude -p` exits non-zero when the subprocess fails hard — subprocess crash, auth failure, unresolvable permission denial, API/quota exhaustion. The orchestrator reads the exit code BEFORE parsing `.result`:
228
237
 
229
238
  - Exit 0 → parse `ITERATION_SUMMARY` from `.result` field; proceed to Step 6.
@@ -414,6 +423,8 @@ The skill should produce a final summary when the loop ends:
414
423
 
415
424
  Extracted from each iteration subprocess's `claude -p --output-format json` response (source: measured-actual, not estimated — per ADR-026 grounding). Renders identically in interactive and AFK modes; no decision branch, so output-side only. Cache-read column surfaces the warm-cache-reuse signal observed across subsequent subprocess invocations in the same Bash session.
416
425
 
426
+ **Authority note (per P089 Gap 2 — see Step 5 Authority hierarchy):** the "Total cost (USD)" column is authoritative (CLI reports `.total_cost_usd` as a session cumulative). The token columns are **best-effort** — they accumulate each iteration's `.usage.*` response fields, which reflect only the final-turn API envelope and can undercount when a subprocess exits via a background-task completion-notification ack. Cost-based reasoning trusts the cost column; token-based reasoning (cache-reuse ratios, cost-envelope calibration) reads the token columns with that caveat in mind.
427
+
417
428
  | Metric | Value |
418
429
  |--------|-------|
419
430
  | Iterations run | 3 |
@@ -434,6 +445,7 @@ When every skipped ticket is in the `upstream-blocked` category (stop-condition
434
445
 
435
446
  ## Related
436
447
 
448
+ - **P089** (`docs/problems/089-work-problems-step-5-dispatch-robustness-stdin-warning-and-cost-metadata-edge-case.verifying.md`) — driver for Step 5's `< /dev/null` dispatch redirect and the Per-iteration cost metadata "Authority hierarchy" paragraph. Gap 1: stdin warning contaminated stderr-merged JSON captures; closed by adding `< /dev/null` to the canonical dispatch command. Gap 2: `.usage.*` undercounts when subprocess exits via a background-task completion ack while `.total_cost_usd` stays cumulative-authoritative; closed by documenting the authority hierarchy in Step 5 and the Session Cost output section so adopters trust cost and label token totals best-effort.
437
449
  - **P086** (`docs/problems/086-afk-iteration-subprocess-does-not-run-retro-before-returning.verifying.md`) — driver for Step 5's retro-on-exit clause. Iteration subprocesses exit without running retro, so per-iteration friction (hook misbehaviour, repeat-workaround patterns, pipeline instability) evaporates on exit. Fix: iteration prompt body names `/wr-retrospective:run-retro` as a closing step before `ITERATION_SUMMARY` emission; retro runs inside the subprocess so Step 2b pipeline-instability scan has the full tool-call history; run-retro commits its own work per ADR-014; orchestrator picks up retro-created tickets on the next Step 1 scan.
438
450
  - **P084** (`docs/problems/084-work-problems-iteration-worker-has-no-agent-tool-so-architect-jtbd-gates-block.open.md`) — driver for Step 5's subprocess-boundary dispatch. Supersedes P077's Agent-tool dispatch on the same Step 5 surface because Agent-tool-spawned subagents cannot themselves invoke Agent (platform restriction), which prevents governance gate markers from being set inside the iteration worker.
439
451
  - **P077** (`docs/problems/077-work-problems-step-5-does-not-delegate-to-subagent.verifying.md`) — parent amendment. Established the AFK iteration-isolation wrapper sub-pattern and the `ITERATION_SUMMARY` return contract. P084 is the refinement that swaps the spawn mechanism; the isolation intent and return contract are preserved verbatim.
@@ -245,3 +245,89 @@ setup() {
245
245
  run grep -nE "ScheduleWakeup.{0,120}P083|P083.{0,120}ScheduleWakeup" "$SKILL_FILE"
246
246
  [ "$status" -eq 0 ]
247
247
  }
248
+
249
+ # @problem P089
250
+ # @jtbd JTBD-006
251
+ #
252
+ # STRUCTURAL: tests the SKILL.md content contract per ADR-037's Permitted
253
+ # Exception (doc-lint contract assertion against the contract document
254
+ # itself). Behavioural alternative would require spawning a real `claude -p`
255
+ # subprocess from a non-TTY context, observing that the JSON response parses
256
+ # cleanly, and asserting `jq` does not error on a stderr-merged capture —
257
+ # that harness sits outside the skill layer and depends on the Anthropic
258
+ # CLI binary. Contract assertion is the named permitted pattern.
259
+ @test "SKILL.md Step 5 dispatch command redirects stdin (< /dev/null) to suppress stdin warning (P089 Gap 1)" {
260
+ # P089 Gap 1 (2026-04-21, AFK-iter-7 iter 1): the shipped dispatch command
261
+ # without an explicit stdin redirect causes `claude -p` to emit
262
+ # `Warning: no stdin data received in 3s, proceeding without it...` to
263
+ # stderr after a 3s wait. When the orchestrator captures the subprocess
264
+ # with `2>&1` (required for structured parse), that warning prepends to
265
+ # stdout and corrupts `jq` / `json.load` / `JSON.parse` extraction of
266
+ # `.result` and cost metadata. The `claude -p` CLI help explicitly
267
+ # suggests `< /dev/null` as the workaround. Regression guard ensures the
268
+ # redirect stays in the canonical dispatch block adopters copy.
269
+ run grep -nE '< /dev/null|< ?/dev/null' "$SKILL_FILE"
270
+ [ "$status" -eq 0 ]
271
+ }
272
+
273
+ @test "SKILL.md Step 5 explains stdin warning is on stderr (becomes stdout problem under 2>&1 capture)" {
274
+ # Architect advisory (2026-04-22): the prose around the `< /dev/null`
275
+ # redirect must state that the warning is emitted to stderr (not stdout)
276
+ # so adopters who capture stderr separately understand they do not need
277
+ # the redirect. This prevents cargo-culting the redirect in environments
278
+ # where stdout and stderr are consumed independently.
279
+ run grep -niE "stderr.{0,120}(2>&1|merge|capture)|warning.{0,60}stderr|stderr.{0,60}warning" "$SKILL_FILE"
280
+ [ "$status" -eq 0 ]
281
+ }
282
+
283
+ # @problem P089
284
+ # @jtbd JTBD-006
285
+ #
286
+ # STRUCTURAL: same Permitted Exception rationale — behavioural alternative
287
+ # would require a subprocess that internally spawns a background task,
288
+ # receives its completion notification, and returns a final-turn ack;
289
+ # the observation is a runtime contract of the Anthropic CLI and harness
290
+ # instrumentation sits outside the skill layer.
291
+ @test "SKILL.md Per-iteration cost metadata block names total_cost_usd as authoritative (P089 Gap 2)" {
292
+ # P089 Gap 2 (2026-04-21, AFK-iter-7 iter 5): subprocess ran 1071s wall-clock
293
+ # with 60+ tool uses; JSON returned `duration_ms: 8546, num_turns: 1,
294
+ # usage.* = ~137K tokens, total_cost_usd: 6.08`. Cost was cumulative-
295
+ # authoritative; usage.* reflected only the final-turn ack after a
296
+ # background-task completion notification. The SKILL.md must name the
297
+ # authority hierarchy explicitly so Session Cost consumers treat
298
+ # total_cost_usd as the trusted dollar signal and usage.* as best-effort
299
+ # approximate.
300
+ run grep -niE "total_cost_usd.{0,80}authoritative|authoritative.{0,80}total_cost_usd|total_cost_usd.{0,40}cumulative" "$SKILL_FILE"
301
+ [ "$status" -eq 0 ]
302
+ }
303
+
304
+ @test "SKILL.md Per-iteration cost metadata block flags usage.* as best-effort under early-ack anomaly (P089 Gap 2)" {
305
+ # The asymmetry between cumulative total_cost_usd and per-turn usage.*
306
+ # must be documented so contributors do not assume summing usage fields
307
+ # yields a total-tokens number matching real cost. Detection criterion
308
+ # (1-turn-short + duration_ms << wall-clock) is stated descriptively per
309
+ # architect option-b (keeps num_turns off the extracted field list).
310
+ run grep -niE "usage\\.\\*.{0,120}(best.?effort|approximate|final.?turn|under.?count|partial)|best.?effort.{0,120}usage|final.?turn.{0,120}usage" "$SKILL_FILE"
311
+ [ "$status" -eq 0 ]
312
+ }
313
+
314
+ @test "SKILL.md Session Cost output section notes tokens are best-effort under the early-ack anomaly (P089 Gap 2)" {
315
+ # The Session Cost table renders to the user on every ALL_DONE. If token
316
+ # columns silently undercount when a subprocess exits via background-task
317
+ # ack, the user's cost envelope is wrong without any caveat. The output
318
+ # section must carry a short note that the token totals are best-effort
319
+ # while total-cost is authoritative — this is the observability contract
320
+ # visible to the end user, not just an internal implementation detail.
321
+ # The note can live inline in the table footer or in the prose around
322
+ # the table.
323
+ run grep -niE "tokens.{0,120}(best.?effort|approximate|undercount|may reflect)|total.?cost.{0,120}authoritative" "$SKILL_FILE"
324
+ [ "$status" -eq 0 ]
325
+ }
326
+
327
+ @test "SKILL.md Related section cites P089 (Step 5 dispatch robustness)" {
328
+ # P089 documents the two gaps fixed by the < /dev/null redirect and the
329
+ # authority hierarchy paragraph. The Related section must cite it so the
330
+ # contract document remains self-documenting.
331
+ run grep -nE "P089" "$SKILL_FILE"
332
+ [ "$status" -eq 0 ]
333
+ }