@windyroad/itil 0.19.0-preview.192 → 0.19.1-preview.194
package/package.json
CHANGED
@@ -158,13 +158,15 @@ PROMPT_EOF
 claude -p \
   --permission-mode bypassPermissions \
   --output-format json \
-  "$ITERATION_PROMPT"
+  "$ITERATION_PROMPT" \
+  < /dev/null
 ```

 **Flag rationale:**

 - `--permission-mode bypassPermissions` — handles non-interactive permission prompts. Without this, Bash/Edit/Write calls inside the subprocess halt on approval prompts (no TTY). Alternative modes (`acceptEdits`, `auto`, `dontAsk`) are acceptable if adopters need narrower permission scopes; `bypassPermissions` is the broadest and the empirically-verified path.
 - `--output-format json` — deterministic structured output. The subprocess's final agent message lands in the JSON response's `.result` field; orchestrator extracts `ITERATION_SUMMARY` from that field. Plain-text output would require fragile scraping.
+- `< /dev/null` — explicit stdin-closed redirect (P089 Gap 1). Without this, `claude -p` waits up to 3s for stdin data in non-TTY contexts and then prints `Warning: no stdin data received in 3s, proceeding without it. If piping from a slow command, redirect stdin explicitly: < /dev/null to skip, or wait longer.` to stderr. The warning is on stderr — if the caller separates stderr and stdout streams, the warning is harmless. But the orchestrator captures via `2>&1` (required because the CLI emits progress prose on stderr that must not interleave between JSON responses when multiple invocations chain). Under the `2>&1` merge the stderr warning prefixes the stdout JSON and breaks `jq` / `json.load` / `JSON.parse` extraction at "line 1, column 1: Expecting value". The redirect suppresses the warning at source. First observed AFK-iter-7 iter 1 (2026-04-21); workaround is the Anthropic CLI help's own suggestion.

 **No per-iteration budget cap.** The dispatch deliberately omits `--max-budget-usd`. Per user direction 2026-04-21: the natural stop condition for an AFK loop is quota exhaustion, not an arbitrary per-iteration dollar cap. A cap would halt iterations before quota is actually exhausted, wasting remaining budget. Runaway-iteration risk is bounded by quota + the orchestrator's Step 6.75 halt on unexpected dirty state + exit-code handling below.

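The contamination that the `< /dev/null` bullet describes can be reproduced without the real CLI. The sketch below is illustrative only: `fake_cli` and `starts_like_json` are hypothetical helpers, not part of the package, and `fake_cli` always emits the warning rather than modelling the 3s stdin wait. It only shows why a `2>&1` capture stops being a JSON document while a stdout-only capture stays clean.

```shell
# Hypothetical stand-in for `claude -p`: warning on stderr, JSON on stdout.
fake_cli() {
  echo 'Warning: no stdin data received in 3s, proceeding without it.' >&2
  echo '{"result":"ITERATION_SUMMARY: ok","total_cost_usd":0.01}'
}

# Returns 0 when the capture begins with "{", i.e. it could still be a JSON object.
starts_like_json() {
  case "$1" in
    "{"*) return 0 ;;
    *)    return 1 ;;
  esac
}

merged=$(fake_cli 2>&1)        # stderr merged into the capture
clean=$(fake_cli 2>/dev/null)  # stdout only

if starts_like_json "$merged"; then merged_ok=yes; else merged_ok=no; fi
if starts_like_json "$clean"; then clean_ok=yes; else clean_ok=no; fi

echo "merged capture starts with JSON: $merged_ok"
echo "clean capture starts with JSON:  $clean_ok"
```

With the merged capture the first byte is the `W` of the warning, which is where a strict JSON parser would give up. Callers that keep the streams separate are unaffected, which matches the bullet's note that the redirect is only needed under a `2>&1` merge.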
@@ -224,6 +226,13 @@ SESSION_CACHE_READ_TOKENS=$(( ${SESSION_CACHE_READ_TOKENS:-0} + ITER_CACHE_READ

 Do NOT extract `session_id`, `model`, `stop_reason`, `permission_denials`, `uuid`, or any other field from the JSON response. Those are subprocess-envelope fields that serve no user-visible purpose and risk leaking subprocess-internal identifiers into orchestrator output.

+**Authority hierarchy (P089 Gap 2).** `total_cost_usd` and `usage.*` do NOT have the same reliability envelope — treat them accordingly when aggregating:
+
+- `.total_cost_usd` is **authoritative for dollar cost** — cumulative across the subprocess's entire lifetime by contract. Use it as the sole source of truth for the Session Cost "Total cost (USD)" column and any cost-based stop condition.
+- `.usage.*` token fields are **best-effort approximate** — the Anthropic CLI returns the final API response envelope, which is per-turn by construction. When the subprocess exits on a normal final turn the fields accumulate real usage; when the subprocess exits via a background-task completion-notification ack (a closing turn that only acknowledges a backgrounded task finished), the fields reflect ONLY that final ack turn and undercount dramatically. Detectable anomaly shape: the subprocess reports a final-turn-sized usage (handful of input tokens, hundreds of output tokens) alongside a wall-clock duration from the Bash wrapper's own timer that is orders of magnitude larger than the JSON's `duration_ms` field — the cumulative dollar cost still matches real spend, so the mismatch is self-evident on inspection.
+
+Aggregation rule: sum `.total_cost_usd` into the session total and trust it; sum `.usage.*` into the session totals for cache-reuse ratio reasoning but label them best-effort in the Session Cost table. This asymmetry is correct-by-CLI-contract (cost is a session cumulative; usage is a per-response envelope); the orchestrator documents the asymmetry so adopters do not silently under-count tokens. First observed AFK-iter-7 iter 5 (2026-04-21): 1071s wall-clock / 60+ tool-use subprocess returned `duration_ms: 8546, num_turns: 1, usage.* ≈ 137K tokens, total_cost_usd: 6.08` — cost cumulative and correct, tokens reflecting only the final ack turn.
+
 **Exit-code semantics.** `claude -p` exits non-zero when the subprocess fails hard — subprocess crash, auth failure, unresolvable permission denial, API/quota exhaustion. The orchestrator reads the exit code BEFORE parsing `.result`:

 - Exit 0 → parse `ITERATION_SUMMARY` from `.result` field; proceed to Step 6.
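The "detectable anomaly shape" in the authority-hierarchy hunk can be expressed as a small check. This is a minimal sketch under stated assumptions: `usage_is_suspect` is a hypothetical helper, the 10x ratio is an arbitrary threshold chosen for illustration (the source only says "orders of magnitude"), and whether the orchestrator actually reads `duration_ms` is not confirmed by the diff. The sample values are the ones quoted from AFK-iter-7 iter 5.

```shell
# Hypothetical helper: succeeds when the wrapper's wall-clock time dwarfs the
# JSON's duration_ms, which suggests usage.* reflects only a final ack turn.
# Assumption: a 10x gap counts as suspect; the source gives no exact threshold.
usage_is_suspect() {
  wall_ms=$1
  duration_ms=$2
  [ "$wall_ms" -gt $(( duration_ms * 10 )) ]
}

# Observation quoted in the diff: 1071 s wall-clock, duration_ms 8546.
if usage_is_suspect 1071000 8546; then
  token_label="best-effort (likely undercounted)"
else
  token_label="best-effort"
fi

echo "token totals: $token_label"
```

Either way the token totals keep the best-effort label, in line with the aggregation rule; the check only decides whether to add a stronger caveat. The dollar total is not touched, since `.total_cost_usd` is treated as authoritative.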
@@ -414,6 +423,8 @@ The skill should produce a final summary when the loop ends:

 Extracted from each iteration subprocess's `claude -p --output-format json` response (source: measured-actual, not estimated — per ADR-026 grounding). Renders identically in interactive and AFK modes; no decision branch, so output-side only. Cache-read column surfaces the warm-cache-reuse signal observed across subsequent subprocess invocations in the same Bash session.

+**Authority note (per P089 Gap 2 — see Step 5 Authority hierarchy):** the "Total cost (USD)" column is authoritative (CLI reports `.total_cost_usd` as a session cumulative). The token columns are **best-effort** — they accumulate each iteration's `.usage.*` response fields, which reflect only the final-turn API envelope and can undercount when a subprocess exits via a background-task completion-notification ack. Cost-based reasoning trusts the cost column; token-based reasoning (cache-reuse ratios, cost-envelope calibration) reads the token columns with that caveat in mind.
+
 | Metric | Value |
 |--------|-------|
 | Iterations run | 3 |
@@ -434,6 +445,7 @@ When every skipped ticket is in the `upstream-blocked` category (stop-condition

 ## Related

+- **P089** (`docs/problems/089-work-problems-step-5-dispatch-robustness-stdin-warning-and-cost-metadata-edge-case.verifying.md`) — driver for Step 5's `< /dev/null` dispatch redirect and the Per-iteration cost metadata "Authority hierarchy" paragraph. Gap 1: stdin warning contaminated stderr-merged JSON captures; closed by adding `< /dev/null` to the canonical dispatch command. Gap 2: `.usage.*` undercounts when subprocess exits via a background-task completion ack while `.total_cost_usd` stays cumulative-authoritative; closed by documenting the authority hierarchy in Step 5 and the Session Cost output section so adopters trust cost and label token totals best-effort.
 - **P086** (`docs/problems/086-afk-iteration-subprocess-does-not-run-retro-before-returning.verifying.md`) — driver for Step 5's retro-on-exit clause. Iteration subprocesses exit without running retro, so per-iteration friction (hook misbehaviour, repeat-workaround patterns, pipeline instability) evaporates on exit. Fix: iteration prompt body names `/wr-retrospective:run-retro` as a closing step before `ITERATION_SUMMARY` emission; retro runs inside the subprocess so Step 2b pipeline-instability scan has the full tool-call history; run-retro commits its own work per ADR-014; orchestrator picks up retro-created tickets on the next Step 1 scan.
 - **P084** (`docs/problems/084-work-problems-iteration-worker-has-no-agent-tool-so-architect-jtbd-gates-block.open.md`) — driver for Step 5's subprocess-boundary dispatch. Supersedes P077's Agent-tool dispatch on the same Step 5 surface because Agent-tool-spawned subagents cannot themselves invoke Agent (platform restriction), which prevents governance gate markers from being set inside the iteration worker.
 - **P077** (`docs/problems/077-work-problems-step-5-does-not-delegate-to-subagent.verifying.md`) — parent amendment. Established the AFK iteration-isolation wrapper sub-pattern and the `ITERATION_SUMMARY` return contract. P084 is the refinement that swaps the spawn mechanism; the isolation intent and return contract are preserved verbatim.
@@ -245,3 +245,89 @@ setup() {
   run grep -nE "ScheduleWakeup.{0,120}P083|P083.{0,120}ScheduleWakeup" "$SKILL_FILE"
   [ "$status" -eq 0 ]
 }
+
+# @problem P089
+# @jtbd JTBD-006
+#
+# STRUCTURAL: tests the SKILL.md content contract per ADR-037's Permitted
+# Exception (doc-lint contract assertion against the contract document
+# itself). Behavioural alternative would require spawning a real `claude -p`
+# subprocess from a non-TTY context, observing that the JSON response parses
+# cleanly, and asserting `jq` does not error on a stderr-merged capture —
+# that harness sits outside the skill layer and depends on the Anthropic
+# CLI binary. Contract assertion is the named permitted pattern.
+@test "SKILL.md Step 5 dispatch command redirects stdin (< /dev/null) to suppress stdin warning (P089 Gap 1)" {
+  # P089 Gap 1 (2026-04-21, AFK-iter-7 iter 1): the shipped dispatch command
+  # without an explicit stdin redirect causes `claude -p` to emit
+  # `Warning: no stdin data received in 3s, proceeding without it...` to
+  # stderr after a 3s wait. When the orchestrator captures the subprocess
+  # with `2>&1` (required for structured parse), that warning prepends to
+  # stdout and corrupts `jq` / `json.load` / `JSON.parse` extraction of
+  # `.result` and cost metadata. The `claude -p` CLI help explicitly
+  # suggests `< /dev/null` as the workaround. Regression guard ensures the
+  # redirect stays in the canonical dispatch block adopters copy.
+  run grep -nE '< /dev/null|< ?/dev/null' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+
+@test "SKILL.md Step 5 explains stdin warning is on stderr (becomes stdout problem under 2>&1 capture)" {
+  # Architect advisory (2026-04-22): the prose around the `< /dev/null`
+  # redirect must state that the warning is emitted to stderr (not stdout)
+  # so adopters who capture stderr separately understand they do not need
+  # the redirect. This prevents cargo-culting the redirect in environments
+  # where stdout and stderr are consumed independently.
+  run grep -niE "stderr.{0,120}(2>&1|merge|capture)|warning.{0,60}stderr|stderr.{0,60}warning" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+
+# @problem P089
+# @jtbd JTBD-006
+#
+# STRUCTURAL: same Permitted Exception rationale — behavioural alternative
+# would require a subprocess that internally spawns a background task,
+# receives its completion notification, and returns a final-turn ack;
+# the observation is a runtime contract of the Anthropic CLI and harness
+# instrumentation sits outside the skill layer.
+@test "SKILL.md Per-iteration cost metadata block names total_cost_usd as authoritative (P089 Gap 2)" {
+  # P089 Gap 2 (2026-04-21, AFK-iter-7 iter 5): subprocess ran 1071s wall-clock
+  # with 60+ tool uses; JSON returned `duration_ms: 8546, num_turns: 1,
+  # usage.* = ~137K tokens, total_cost_usd: 6.08`. Cost was cumulative-
+  # authoritative; usage.* reflected only the final-turn ack after a
+  # background-task completion notification. The SKILL.md must name the
+  # authority hierarchy explicitly so Session Cost consumers treat
+  # total_cost_usd as the trusted dollar signal and usage.* as best-effort
+  # approximate.
+  run grep -niE "total_cost_usd.{0,80}authoritative|authoritative.{0,80}total_cost_usd|total_cost_usd.{0,40}cumulative" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+
+@test "SKILL.md Per-iteration cost metadata block flags usage.* as best-effort under early-ack anomaly (P089 Gap 2)" {
+  # The asymmetry between cumulative total_cost_usd and per-turn usage.*
+  # must be documented so contributors do not assume summing usage fields
+  # yields a total-tokens number matching real cost. Detection criterion
+  # (1-turn-short + duration_ms << wall-clock) is stated descriptively per
+  # architect option-b (keeps num_turns off the extracted field list).
+  run grep -niE "usage\\.\\*.{0,120}(best.?effort|approximate|final.?turn|under.?count|partial)|best.?effort.{0,120}usage|final.?turn.{0,120}usage" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+
+@test "SKILL.md Session Cost output section notes tokens are best-effort under the early-ack anomaly (P089 Gap 2)" {
+  # The Session Cost table renders to the user on every ALL_DONE. If token
+  # columns silently undercount when a subprocess exits via background-task
+  # ack, the user's cost envelope is wrong without any caveat. The output
+  # section must carry a short note that the token totals are best-effort
+  # while total-cost is authoritative — this is the observability contract
+  # visible to the end user, not just an internal implementation detail.
+  # The note can live inline in the table footer or in the prose around
+  # the table.
+  run grep -niE "tokens.{0,120}(best.?effort|approximate|undercount|may reflect)|total.?cost.{0,120}authoritative" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+
+@test "SKILL.md Related section cites P089 (Step 5 dispatch robustness)" {
+  # P089 documents the two gaps fixed by the < /dev/null redirect and the
+  # authority hierarchy paragraph. The Related section must cite it so the
+  # contract document remains self-documenting.
+  run grep -nE "P089" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
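The added tests are proximity greps: each passes when two terms appear within a bounded number of characters of each other on one line of the skill file. The sketch below exercises the `total_cost_usd` pattern, copied from the test above, against a throwaway file outside bats. The temporary file and the second sample sentence are made up for illustration; the first sample is the opening of the bullet added in the authority-hierarchy hunk.

```shell
# Throwaway file standing in for SKILL.md (hypothetical; the real path is not used).
sample=$(mktemp)

# Pattern copied from the "names total_cost_usd as authoritative" test.
pattern='total_cost_usd.{0,80}authoritative|authoritative.{0,80}total_cost_usd|total_cost_usd.{0,40}cumulative'

# Wording taken from the added bullet: the two terms sit a few characters apart.
printf '%s\n' '- `.total_cost_usd` is **authoritative for dollar cost**' > "$sample"
if grep -qiE "$pattern" "$sample"; then contract=pass; else contract=fail; fi
echo "with authority wording: $contract"

# Made-up wording that names the field without the authority claim.
printf '%s\n' 'Sum total_cost_usd into the session total.' > "$sample"
if grep -qiE "$pattern" "$sample"; then weak=pass; else weak=fail; fi
echo "without authority wording: $weak"

rm -f "$sample"
```

Because `grep` matches line by line, the bounded gaps only connect terms that share a line, so the contract wording has to stay within a single line of the skill file for these assertions to keep passing.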