npm - @windyroad/itil - Versions diffs - 0.13.0-preview.152 → 0.14.0-preview.154 - Mend

@windyroad/itil 0.13.0-preview.152 → 0.14.0-preview.154

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4) hide show

package/.claude-plugin/plugin.json +1 -1
package/package.json +1 -1
package/skills/work-problems/SKILL.md +51 -4
package/skills/work-problems/test/work-problems-cost-logging.bats +109 -0

package/.claude-plugin/plugin.json CHANGED Viewed

@@ -1,5 +1,5 @@
 {
   "name": "wr-itil",
-  "version": "0.13.0",
+  "version": "0.14.0",
   "description": "ITIL-aligned IT service management for Claude Code"
 }

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@windyroad/itil",
-  "version": "0.13.0-preview.152",
+  "version": "0.14.0-preview.154",
   "description": "ITIL-aligned IT service management for Claude Code (problem, and future incident/change skills)",
   "bin": {
     "windyroad-itil": "./bin/install.mjs"

package/skills/work-problems/SKILL.md CHANGED Viewed

@@ -171,6 +171,35 @@ notes: <one-line>
 Architect review (R2) requires the commit state fields (`committed` / `commit_sha` / `reason`) so **Step 6.75's Dirty-for-known-reason branch stays evaluable** from the summary alone. JTBD review requires `ticket_id` / `action` / `skip_reason_category` / `outstanding_questions` so Step 2.5 and the Output Format's Completed / Skipped / Outstanding Design Questions tables can be populated deterministically without the orchestrator having to re-parse ticket files.
+**Per-iteration cost metadata.** Alongside `.result`, the `claude -p --output-format json` response carries cost + usage fields in the same JSON blob. The orchestrator MUST extract these **named fields only** into per-iteration totals and session aggregates — nothing else from the JSON should be surfaced to the user or logged (PII guard: the response also carries `session_id`, `model`, `stop_reason`, and other envelope fields; the extraction is **scoped to the named fields** below so future contributors do not unconsciously broaden it).
+Extracted fields (explicit field list):
+- `.total_cost_usd` — dollar cost for the iteration.
+- `.duration_ms` — wall-clock duration of the iteration subprocess.
+- `.usage.input_tokens` — prompt tokens.
+- `.usage.output_tokens` — generated tokens.
+- `.usage.cache_creation_input_tokens` — tokens written to the prompt cache on this invocation.
+- `.usage.cache_read_input_tokens` — tokens read from the prompt cache on this invocation (cache-read is the signal for warm-cache reuse across subsequent subprocess invocations in the same Bash session; high values here indicate the iteration benefited from prior-invocation caching).
+Use `jq` (or an equivalent JSON parser) to extract them:
+```bash
+# $SUBPROCESS_OUTPUT holds the full JSON response body from claude -p.
+read -r ITER_COST ITER_DURATION_MS ITER_INPUT ITER_OUTPUT ITER_CACHE_WRITE ITER_CACHE_READ < <(
+  jq -r '[.total_cost_usd, .duration_ms, .usage.input_tokens, .usage.output_tokens, .usage.cache_creation_input_tokens, .usage.cache_read_input_tokens] | @tsv' <<<"$SUBPROCESS_OUTPUT"
+)
+# Accumulate into session totals for the ALL_DONE Session Cost section.
+SESSION_COST=$(awk "BEGIN { printf \"%.4f\", ${SESSION_COST:-0} + $ITER_COST }")
+SESSION_DURATION_MS=$(( ${SESSION_DURATION_MS:-0} + ITER_DURATION_MS ))
+SESSION_INPUT_TOKENS=$(( ${SESSION_INPUT_TOKENS:-0} + ITER_INPUT ))
+SESSION_OUTPUT_TOKENS=$(( ${SESSION_OUTPUT_TOKENS:-0} + ITER_OUTPUT ))
+SESSION_CACHE_WRITE_TOKENS=$(( ${SESSION_CACHE_WRITE_TOKENS:-0} + ITER_CACHE_WRITE ))
+SESSION_CACHE_READ_TOKENS=$(( ${SESSION_CACHE_READ_TOKENS:-0} + ITER_CACHE_READ ))
+```
+Do NOT extract `session_id`, `model`, `stop_reason`, `permission_denials`, `uuid`, or any other field from the JSON response. Those are subprocess-envelope fields that serve no user-visible purpose and risk leaking subprocess-internal identifiers into orchestrator output.
 **Exit-code semantics.** `claude -p` exits non-zero when the subprocess fails hard — subprocess crash, auth failure, unresolvable permission denial, API/quota exhaustion. The orchestrator reads the exit code BEFORE parsing `.result`:
 - Exit 0 → parse `ITERATION_SUMMARY` from `.result` field; proceed to Step 6.
@@ -197,14 +226,15 @@ After each iteration, report:
 - What was done (investigated, transitioned to known-error, fix implemented, etc.)
 - The outcome (success, partially progressed, skipped, scope expanded)
 - How many problems remain in the backlog
+- The iteration's cost metadata — format: `($<cost>, <duration_s>s, <total_tokens_K>K tokens)`. Cost comes from the `.total_cost_usd` field extracted in Step 5; duration from `.duration_ms`; total tokens is the sum of `.usage.input_tokens + .usage.output_tokens + .usage.cache_creation_input_tokens + .usage.cache_read_input_tokens`.
 Format as a brief status line, not a wall of text. The user will read these when they return.
 **Example:**
 ```
-[Iteration 1] Worked P029 (Edit gate overhead for governance docs) — implemented fix, closed. 8 problems remain.
-[Iteration 2] Worked P021 (Governance skill structured prompts) — investigated root cause, transitioned to known-error. 7 problems remain.
-[Iteration 3] Skipped P016 (Multi-concern ticket splitting) — fix released, awaiting user verification. Worked P024 (Risk scorer WIP flag) — implemented fix, closed. 6 problems remain.
+[Iteration 1] Worked P029 (Edit gate overhead for governance docs) — implemented fix, closed. 8 problems remain. ($0.32, 23s, 171K tokens)
+[Iteration 2] Worked P021 (Governance skill structured prompts) — investigated root cause, transitioned to known-error. 7 problems remain. ($0.85, 47s, 432K tokens)
+[Iteration 3] Skipped P016 (Multi-concern ticket splitting) — fix released, awaiting user verification. Worked P024 (Risk scorer WIP flag) — implemented fix, closed. 6 problems remain. ($1.12, 62s, 541K tokens)
 ```
 ### Step 6.5: Release-cadence check (per ADR-018)
@@ -313,10 +343,27 @@ The skill should produce a final summary when the loop ends:
 |------|---------|--------|
 | 9.0 | P012 (Skill testing harness) | Open |
+### Session Cost
+Extracted from each iteration subprocess's `claude -p --output-format json` response (source: measured-actual, not estimated — per ADR-026 grounding). Renders identically in interactive and AFK modes; no decision branch, so output-side only. Cache-read column surfaces the warm-cache-reuse signal observed across subsequent subprocess invocations in the same Bash session.
+| Metric | Value |
+|--------|-------|
+| Iterations run | 3 |
+| Successful (committed) | 2 |
+| Skipped | 1 |
+| Total cost (USD) | $2.29 |
+| Mean cost per iteration | $0.76 |
+| Total input tokens | 42 |
+| Total output tokens | 1,531 |
+| Cache-creation tokens | 78,000 |
+| Cache-read tokens (reuse) | 1,064,000 |
+| Total duration | 2m 12s |
 ALL_DONE
 ```
-When every skipped ticket is in the `upstream-blocked` category (stop-condition #3) or there are no skipped tickets (stop-condition #1), omit the Outstanding Design Questions section entirely rather than rendering an empty heading.
+When every skipped ticket is in the `upstream-blocked` category (stop-condition #3) or there are no skipped tickets (stop-condition #1), omit the Outstanding Design Questions section entirely rather than rendering an empty heading. The Session Cost section always renders when at least one iteration ran.
 ## Related

package/skills/work-problems/test/work-problems-cost-logging.bats ADDED Viewed

@@ -0,0 +1,109 @@
+#!/usr/bin/env bats
+# Doc-lint guard: work-problems SKILL.md must extract cost + usage metadata
+# from each iteration's `claude -p --output-format json` response, surface it
+# in the per-iteration Step 6 progress line, and aggregate it in the ALL_DONE
+# Output Format as a dedicated "Session Cost" section.
+#
+# Rationale: the subprocess-boundary dispatch lands per-iteration cost in the
+# JSON response alongside `.result`. Without an explicit extraction contract,
+# that data is invisible to the user even though it's already emitted. Cost
+# logging lets the user calibrate AFK loop sizing on return (e.g. "max out
+# the token usage" direction 2026-04-21 needs a feedback loop).
+#
+# Structural assertion — Permitted Exception under ADR-005 + ADR-037 (SKILL.md
+# is the contract document). A behavioural harness that exercises the `jq`
+# extraction against a fixture JSON is a potential follow-up; out of scope
+# for this doc-lint pass.
+#
+# @problem P084
+# @jtbd JTBD-006
+#
+# Cross-reference:
+#   P084 (iteration worker has no Agent tool) — parent ticket; cost logging
+#     is an additive observability overlay on P084's shipped subprocess
+#     dispatch.
+#   ADR-032 (governance skill invocation patterns) — subprocess-boundary
+#     sub-pattern; `--output-format json` parse shape is already pinned.
+#   ADR-026 (agent output grounding) — Session Cost section cites its source
+#     so downstream audits can distinguish measured-actual from estimated.
+#   ADR-037 (skill testing strategy) — contract-assertion pattern.
+#   JTBD-006 (Progress the Backlog While I'm Away) — "clear summary when I
+#     return" documented outcome includes cost/token traceability.
+setup() {
+  SKILL_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
+  SKILL_FILE="${SKILL_DIR}/SKILL.md"
+}
+@test "SKILL.md Step 5 extracts .total_cost_usd from iteration JSON response" {
+  # Cost per iteration lives in the same JSON blob as .result; parsing it
+  # costs nothing more than a jq call the orchestrator already needs for
+  # ITERATION_SUMMARY.
+  run grep -nE 'total_cost_usd' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md Step 5 extracts usage token fields from iteration JSON response" {
+  # input_tokens / output_tokens / cache_creation_input_tokens /
+  # cache_read_input_tokens are the four usage fields that give a full
+  # accounting. Cache-read is the key signal for reuse across subprocess
+  # invocations in the same Bash session.
+  run grep -nE 'input_tokens|output_tokens|cache_creation_input_tokens|cache_read_input_tokens' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md Step 5 names jq (or equivalent) as the extraction mechanism" {
+  # jq is already implicit in `--output-format json` consumption; naming it
+  # in the SKILL.md prevents bespoke sed/awk reimplementations.
+  run grep -nE '\\bjq\\b|JSON parser|JSON extraction' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md Step 5 scopes the extraction to named fields only (PII guard)" {
+  # Architect advisory 2026-04-21: the JSON response also carries session_id,
+  # model, stop_reason, etc. that should NOT be surfaced in user-visible
+  # output. The extraction list must be explicit so future contributors
+  # don't unconsciously broaden it.
+  run grep -niE 'extract only|only the fields|do not (surface|log|emit)|scoped to (the )?named fields|explicit field list' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md Step 6 per-iteration progress line includes cost marker" {
+  # Example progress line in Step 6 should show the (cost, duration, tokens)
+  # suffix so contributors see the target format.
+  run grep -nE '\$[0-9]+\.[0-9]+.{0,40}(tokens|iteration|s,)' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md Output Format includes a Session Cost section" {
+  # ALL_DONE summary aggregates per-iteration cost across the run. The
+  # section renders in every ALL_DONE — interactive OR AFK — because it's
+  # output-side, not a decision branch.
+  run grep -nE '^### Session Cost|## Session Cost|Session Cost.{0,40}Total' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md Output Format Session Cost table includes cache-read reuse signal" {
+  # Cache-read is the signal for "warm-cache savings across subprocess
+  # invocations in the same Bash session" — empirically observed 65-147K
+  # cache-read tokens on probes 2-4 during the P084 probe sequence. Making
+  # this visible to the user helps them reason about AFK loop cost dynamics.
+  run grep -niE 'cache.?read|cache reuse|reuse signal|cache hit' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md Output Format Session Cost section cites its data source (ADR-026)" {
+  # Architect advisory: the Session Cost numbers are measured-actual (from
+  # each iteration's claude -p JSON output), not estimates. Name the source
+  # so audit / downstream-tooling can trust the numbers.
+  run grep -niE 'extracted from.{0,80}(claude -p|--output-format json|iteration)|source:.{0,80}claude -p|measured.{0,40}(iteration|subprocess)' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "SKILL.md Session Cost section renders in both interactive and AFK modes" {
+  # JTBD-006 Rule 6 check: Session Cost is pure output, no AskUserQuestion,
+  # no policy-authorised action. Must render identically in both modes so
+  # AFK users see the same summary on return.
+  run grep -niE 'Session Cost.{0,160}(regardless|both|interactive.{0,40}AFK|AFK.{0,40}interactive)|output-side|no decision branch' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}