@windyroad/itil 0.13.0 → 0.14.0

@@ -1,5 +1,5 @@
  {
  "name": "wr-itil",
- "version": "0.13.0",
+ "version": "0.14.0",
  "description": "ITIL-aligned IT service management for Claude Code"
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@windyroad/itil",
- "version": "0.13.0",
+ "version": "0.14.0",
  "description": "ITIL-aligned IT service management for Claude Code (problem, and future incident/change skills)",
  "bin": {
  "windyroad-itil": "./bin/install.mjs"
@@ -171,6 +171,35 @@ notes: <one-line>
 
  Architect review (R2) requires the commit state fields (`committed` / `commit_sha` / `reason`) so **Step 6.75's Dirty-for-known-reason branch stays evaluable** from the summary alone. JTBD review requires `ticket_id` / `action` / `skip_reason_category` / `outstanding_questions` so Step 2.5 and the Output Format's Completed / Skipped / Outstanding Design Questions tables can be populated deterministically without the orchestrator having to re-parse ticket files.
 
+ **Per-iteration cost metadata.** Alongside `.result`, the `claude -p --output-format json` response carries cost + usage fields in the same JSON blob. The orchestrator MUST extract these **named fields only** into per-iteration totals and session aggregates — nothing else from the JSON should be surfaced to the user or logged (PII guard: the response also carries `session_id`, `model`, `stop_reason`, and other envelope fields; the extraction is **scoped to the named fields** below so future contributors do not unconsciously broaden it).
+
+ Extracted fields (explicit field list):
+
+ - `.total_cost_usd` — dollar cost for the iteration.
+ - `.duration_ms` — wall-clock duration of the iteration subprocess.
+ - `.usage.input_tokens` — prompt tokens.
+ - `.usage.output_tokens` — generated tokens.
+ - `.usage.cache_creation_input_tokens` — tokens written to the prompt cache on this invocation.
+ - `.usage.cache_read_input_tokens` — tokens read from the prompt cache on this invocation (cache-read is the signal for warm-cache reuse across subsequent subprocess invocations in the same Bash session; high values here indicate the iteration benefited from prior-invocation caching).
+
+ Use `jq` (or an equivalent JSON parser) to extract them:
+
+ ```bash
+ # $SUBPROCESS_OUTPUT holds the full JSON response body from claude -p.
+ read -r ITER_COST ITER_DURATION_MS ITER_INPUT ITER_OUTPUT ITER_CACHE_WRITE ITER_CACHE_READ < <(
+ jq -r '[.total_cost_usd, .duration_ms, .usage.input_tokens, .usage.output_tokens, .usage.cache_creation_input_tokens, .usage.cache_read_input_tokens] | map(. // 0) | @tsv' <<<"$SUBPROCESS_OUTPUT"  # null or missing field becomes 0 so read always sees six columns
+ )
+ # Accumulate into session totals for the ALL_DONE Session Cost section.
+ SESSION_COST=$(awk -v total="${SESSION_COST:-0}" -v iter="${ITER_COST:-0}" 'BEGIN { printf "%.4f", total + iter }')
+ SESSION_DURATION_MS=$(( ${SESSION_DURATION_MS:-0} + ITER_DURATION_MS ))
+ SESSION_INPUT_TOKENS=$(( ${SESSION_INPUT_TOKENS:-0} + ITER_INPUT ))
+ SESSION_OUTPUT_TOKENS=$(( ${SESSION_OUTPUT_TOKENS:-0} + ITER_OUTPUT ))
+ SESSION_CACHE_WRITE_TOKENS=$(( ${SESSION_CACHE_WRITE_TOKENS:-0} + ITER_CACHE_WRITE ))
+ SESSION_CACHE_READ_TOKENS=$(( ${SESSION_CACHE_READ_TOKENS:-0} + ITER_CACHE_READ ))
+ ```
+
+ Do NOT extract `session_id`, `model`, `stop_reason`, `permission_denials`, `uuid`, or any other field from the JSON response. Those are subprocess-envelope fields that serve no user-visible purpose and risk leaking subprocess-internal identifiers into orchestrator output.
+
  **Exit-code semantics.** `claude -p` exits non-zero when the subprocess fails hard — subprocess crash, auth failure, unresolvable permission denial, API/quota exhaustion. The orchestrator reads the exit code BEFORE parsing `.result`:
 
  - Exit 0 → parse `ITERATION_SUMMARY` from `.result` field; proceed to Step 6.
@@ -197,14 +226,15 @@ After each iteration, report:
  - What was done (investigated, transitioned to known-error, fix implemented, etc.)
  - The outcome (success, partially progressed, skipped, scope expanded)
  - How many problems remain in the backlog
+ - The iteration's cost metadata — format: `($<cost>, <duration_s>s, <total_tokens_K>K tokens)`. Cost comes from the `.total_cost_usd` field extracted in Step 5; duration from `.duration_ms`; total tokens is the sum of `.usage.input_tokens + .usage.output_tokens + .usage.cache_creation_input_tokens + .usage.cache_read_input_tokens`.
 
  Format as a brief status line, not a wall of text. The user will read these when they return.
 
  **Example:**
  ```
- [Iteration 1] Worked P029 (Edit gate overhead for governance docs) — implemented fix, closed. 8 problems remain.
- [Iteration 2] Worked P021 (Governance skill structured prompts) — investigated root cause, transitioned to known-error. 7 problems remain.
- [Iteration 3] Skipped P016 (Multi-concern ticket splitting) — fix released, awaiting user verification. Worked P024 (Risk scorer WIP flag) — implemented fix, closed. 6 problems remain.
+ [Iteration 1] Worked P029 (Edit gate overhead for governance docs) — implemented fix, closed. 8 problems remain. ($0.32, 23s, 171K tokens)
+ [Iteration 2] Worked P021 (Governance skill structured prompts) — investigated root cause, transitioned to known-error. 7 problems remain. ($0.85, 47s, 432K tokens)
+ [Iteration 3] Skipped P016 (Multi-concern ticket splitting) — fix released, awaiting user verification. Worked P024 (Risk scorer WIP flag) — implemented fix, closed. 6 problems remain. ($1.12, 62s, 541K tokens)
  ```
 
  ### Step 6.5: Release-cadence check (per ADR-018)
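
The Step 6 cost suffix introduced in this hunk can be assembled from the six values extracted in Step 5. The following is a minimal sketch, not code from the package: the function name `format_cost_suffix` is hypothetical, and rounding seconds and K-tokens to the nearest whole number is an assumption, since the diff does not say whether those values are rounded or truncated.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the package): builds the Step 6 suffix
# "($<cost>, <duration_s>s, <total_tokens_K>K tokens)".
# Args: cost_usd duration_ms input_tokens output_tokens cache_write cache_read
format_cost_suffix() {
  local cost=$1 duration_ms=$2
  # Total tokens is the sum of the four usage fields, as Step 6 describes.
  local total_tokens=$(( $3 + $4 + $5 + $6 ))
  # Integer round-to-nearest (assumed; truncation would also fit the text).
  local duration_s=$(( (duration_ms + 500) / 1000 ))
  local tokens_k=$(( (total_tokens + 500) / 1000 ))
  # LC_ALL=C keeps "." as the decimal separator regardless of locale.
  LC_ALL=C printf '($%.2f, %ds, %dK tokens)\n' "$cost" "$duration_s" "$tokens_k"
}

# Invented token counts that sum to 171,000, matching the Iteration 1 example.
format_cost_suffix 0.32 23000 14 510 26000 144476
# prints: ($0.32, 23s, 171K tokens)
```

Keeping the arithmetic in shell integers avoids a second `awk` call; only the cost needs floating-point formatting.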
@@ -313,10 +343,27 @@ The skill should produce a final summary when the loop ends:
  |------|---------|--------|
  | 9.0 | P012 (Skill testing harness) | Open |
 
+ ### Session Cost
+
+ Extracted from each iteration subprocess's `claude -p --output-format json` response (source: measured-actual, not estimated — per ADR-026 grounding). Renders identically in interactive and AFK modes; no decision branch, so output-side only. Cache-read column surfaces the warm-cache-reuse signal observed across subsequent subprocess invocations in the same Bash session.
+
+ | Metric | Value |
+ |--------|-------|
+ | Iterations run | 3 |
+ | Successful (committed) | 2 |
+ | Skipped | 1 |
+ | Total cost (USD) | $2.29 |
+ | Mean cost per iteration | $0.76 |
+ | Total input tokens | 42 |
+ | Total output tokens | 1,531 |
+ | Cache-creation tokens | 78,000 |
+ | Cache-read tokens (reuse) | 1,064,000 |
+ | Total duration | 2m 12s |
+
  ALL_DONE
  ```
 
- When every skipped ticket is in the `upstream-blocked` category (stop-condition #3) or there are no skipped tickets (stop-condition #1), omit the Outstanding Design Questions section entirely rather than rendering an empty heading.
+ When every skipped ticket is in the `upstream-blocked` category (stop-condition #3) or there are no skipped tickets (stop-condition #1), omit the Outstanding Design Questions section entirely rather than rendering an empty heading. The Session Cost section always renders when at least one iteration ran.
 
  ## Related
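
Two rows of the Session Cost table in this hunk are derived rather than summed directly: the mean cost per iteration and the human-readable total duration. A rough sketch of how they might be computed from the session aggregates follows. The helper names `mean_cost` and `format_duration` are hypothetical, and truncating partial seconds is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the package) for the derived Session Cost rows.

# Mean cost per iteration; prints 0.00 when no iteration ran.
mean_cost() {
  awk -v total="$1" -v n="$2" 'BEGIN { printf "%.2f", (n > 0 ? total / n : 0) }'
}

# Milliseconds -> "Xm Ys", truncating partial seconds (assumed).
format_duration() {
  local total_s=$(( $1 / 1000 ))
  printf '%dm %ds\n' $(( total_s / 60 )) $(( total_s % 60 ))
}

# Totals from the example table: $2.29 over 3 iterations, 132 s of wall-clock time.
printf '| Mean cost per iteration | $%s |\n' "$(mean_cost 2.29 3)"
printf '| Total duration | %s |\n' "$(format_duration 132000)"
```

With the example totals this reproduces the `$0.76` and `2m 12s` values shown in the table.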
 
@@ -0,0 +1,109 @@
+ #!/usr/bin/env bats
+ # Doc-lint guard: work-problems SKILL.md must extract cost + usage metadata
+ # from each iteration's `claude -p --output-format json` response, surface it
+ # in the per-iteration Step 6 progress line, and aggregate it in the ALL_DONE
+ # Output Format as a dedicated "Session Cost" section.
+ #
+ # Rationale: the subprocess-boundary dispatch lands per-iteration cost in the
+ # JSON response alongside `.result`. Without an explicit extraction contract,
+ # that data is invisible to the user even though it's already emitted. Cost
+ # logging lets the user calibrate AFK loop sizing on return (e.g. "max out
+ # the token usage" direction 2026-04-21 needs a feedback loop).
+ #
+ # Structural assertion — Permitted Exception under ADR-005 + ADR-037 (SKILL.md
+ # is the contract document). A behavioural harness that exercises the `jq`
+ # extraction against a fixture JSON is a potential follow-up; out of scope
+ # for this doc-lint pass.
+ #
+ # @problem P084
+ # @jtbd JTBD-006
+ #
+ # Cross-reference:
+ # P084 (iteration worker has no Agent tool) — parent ticket; cost logging
+ # is an additive observability overlay on P084's shipped subprocess
+ # dispatch.
+ # ADR-032 (governance skill invocation patterns) — subprocess-boundary
+ # sub-pattern; `--output-format json` parse shape is already pinned.
+ # ADR-026 (agent output grounding) — Session Cost section cites its source
+ # so downstream audits can distinguish measured-actual from estimated.
+ # ADR-037 (skill testing strategy) — contract-assertion pattern.
+ # JTBD-006 (Progress the Backlog While I'm Away) — "clear summary when I
+ # return" documented outcome includes cost/token traceability.
+
+ setup() {
+ SKILL_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+ SKILL_FILE="${SKILL_DIR}/SKILL.md"
+ }
+
+ @test "SKILL.md Step 5 extracts .total_cost_usd from iteration JSON response" {
+ # Cost per iteration lives in the same JSON blob as .result; parsing it
+ # costs nothing more than a jq call the orchestrator already needs for
+ # ITERATION_SUMMARY.
+ run grep -nE 'total_cost_usd' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md Step 5 extracts usage token fields from iteration JSON response" {
+ # input_tokens / output_tokens / cache_creation_input_tokens /
+ # cache_read_input_tokens are the four usage fields that give a full
+ # accounting. Cache-read is the key signal for reuse across subprocess
+ # invocations in the same Bash session.
+ run grep -nE 'input_tokens|output_tokens|cache_creation_input_tokens|cache_read_input_tokens' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md Step 5 names jq (or equivalent) as the extraction mechanism" {
+ # jq is already implicit in `--output-format json` consumption; naming it
+ # in the SKILL.md prevents bespoke sed/awk reimplementations.
+ run grep -nE '\bjq\b|JSON parser|JSON extraction' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md Step 5 scopes the extraction to named fields only (PII guard)" {
+ # Architect advisory 2026-04-21: the JSON response also carries session_id,
+ # model, stop_reason, etc. that should NOT be surfaced in user-visible
+ # output. The extraction list must be explicit so future contributors
+ # don't unconsciously broaden it.
+ run grep -niE 'extract only|only the fields|do not (surface|log|emit)|scoped to (the )?named fields|explicit field list' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md Step 6 per-iteration progress line includes cost marker" {
+ # Example progress line in Step 6 should show the (cost, duration, tokens)
+ # suffix so contributors see the target format.
+ run grep -nE '\$[0-9]+\.[0-9]+.{0,40}(tokens|iteration|s,)' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md Output Format includes a Session Cost section" {
+ # ALL_DONE summary aggregates per-iteration cost across the run. The
+ # section renders in every ALL_DONE — interactive OR AFK — because it's
+ # output-side, not a decision branch.
+ run grep -nE '^### Session Cost|## Session Cost|Session Cost.{0,40}Total' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md Output Format Session Cost table includes cache-read reuse signal" {
+ # Cache-read is the signal for "warm-cache savings across subprocess
+ # invocations in the same Bash session" — empirically observed 65-147K
+ # cache-read tokens on probes 2-4 during the P084 probe sequence. Making
+ # this visible to the user helps them reason about AFK loop cost dynamics.
+ run grep -niE 'cache.?read|cache reuse|reuse signal|cache hit' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md Output Format Session Cost section cites its data source (ADR-026)" {
+ # Architect advisory: the Session Cost numbers are measured-actual (from
+ # each iteration's claude -p JSON output), not estimates. Name the source
+ # so audit / downstream-tooling can trust the numbers.
+ run grep -niE 'extracted from.{0,80}(claude -p|--output-format json|iteration)|source:.{0,80}claude -p|measured.{0,40}(iteration|subprocess)' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "SKILL.md Session Cost section renders in both interactive and AFK modes" {
+ # JTBD-006 Rule 6 check: Session Cost is pure output, no AskUserQuestion,
+ # no policy-authorised action. Must render identically in both modes so
+ # AFK users see the same summary on return.
+ run grep -niE 'Session Cost.{0,160}(regardless|both|interactive.{0,40}AFK|AFK.{0,40}interactive)|output-side|no decision branch' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
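
The header comment of this test file names a behavioural harness that exercises the `jq` extraction against a fixture JSON as a possible follow-up. The sketch below shows roughly what such a check could look like in plain bash. It is not part of the package: the fixture values are invented, only the field names come from SKILL.md, the function name `extract_cost_fields` is hypothetical, and defaulting null fields to 0 is an assumption of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical fixture: a trimmed stand-in for a `claude -p --output-format json`
# response. Values are invented; only the field names come from SKILL.md.
FIXTURE='{"result":"ok","session_id":"not-extracted","total_cost_usd":0.32,"duration_ms":23000,"usage":{"input_tokens":14,"output_tokens":510,"cache_creation_input_tokens":26000,"cache_read_input_tokens":null}}'

# Prints the six named fields as one tab-separated line.
# Null or missing fields become 0 (an assumption of this sketch).
extract_cost_fields() {
  jq -r '[.total_cost_usd, .duration_ms, .usage.input_tokens, .usage.output_tokens,
          .usage.cache_creation_input_tokens, .usage.cache_read_input_tokens]
         | map(. // 0) | @tsv' <<<"$1"
}

if command -v jq >/dev/null 2>&1; then
  # session_id is present in the fixture but never read, mirroring the named-fields-only rule.
  read -r cost duration_ms input output cache_write cache_read < <(extract_cost_fields "$FIXTURE")
  echo "cost=$cost duration_ms=$duration_ms cache_read=$cache_read"
else
  echo "jq not installed; skipping" >&2
fi
```

Wrapped in a `@test` block, the same function could assert on the tab-separated line directly, which would cover the extraction behaviour that the grep-based assertions above cannot.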