@windyroad/itil 0.47.6-preview.544 → 0.47.7-preview.546
package/package.json
CHANGED
@@ -368,9 +368,26 @@ claude -p \
 ITER_PID=$!
 
 SIGTERM_SENT=0
+LAST_POLL_EPOCH=$DISPATCH_START_EPOCH
+SUSPEND_OFFSET_S=0
+EXPECTED_POLL_DELTA_S=60  # matches `sleep 60` cadence below
+SUSPEND_JITTER_S=120      # tolerance above expected before treating gap as suspend (P307)
 while kill -0 "$ITER_PID" 2>/dev/null; do
-  sleep
+  sleep "$EXPECTED_POLL_DELTA_S"
   NOW=$(date +%s)
+  # P307 machine-sleep false-kill: when the host suspends between polls,
+  # wall-clock advances while the iter subprocess is itself suspended (no
+  # actual idle work). Detect the wall-clock jump and accumulate it into
+  # SUSPEND_OFFSET_S so IDLE_SECONDS (computed against NOW - SUSPEND_OFFSET_S
+  # below) reads active-elapsed rather than wall-clock-elapsed. Without
+  # this, laptop suspend falsely kills a completing iter (2026-05-26 iter 1
+  # evidence: idle jumped 481s -> 1016s -> 5544s across suspend gaps;
+  # SIGTERM fired at 5544s > 3600s, lost the iter's commit + cost metadata).
+  ACTUAL_POLL_DELTA=$(( NOW - LAST_POLL_EPOCH ))
+  if (( ACTUAL_POLL_DELTA > EXPECTED_POLL_DELTA_S + SUSPEND_JITTER_S )); then
+    SUSPEND_OFFSET_S=$(( SUSPEND_OFFSET_S + ACTUAL_POLL_DELTA - EXPECTED_POLL_DELTA_S ))
+  fi
+  LAST_POLL_EPOCH=$NOW
   LAST_COMMIT_EPOCH=$(git log -1 --format=%at HEAD 2>/dev/null || echo "$DISPATCH_START_EPOCH")
   # LAST_ACTIVITY_MARK = max(DISPATCH_START_EPOCH, last commit timestamp).
   # The dispatch-start floor handles skip-iterations that produce no commit:
|
@@ -381,7 +398,7 @@ while kill -0 "$ITER_PID" 2>/dev/null; do
   else
     LAST_ACTIVITY_MARK=$DISPATCH_START_EPOCH
   fi
-  IDLE_SECONDS=$(( NOW - LAST_ACTIVITY_MARK ))
+  IDLE_SECONDS=$(( NOW - SUSPEND_OFFSET_S - LAST_ACTIVITY_MARK ))
   if (( IDLE_SECONDS > IDLE_TIMEOUT_S )) && (( SIGTERM_SENT == 0 )); then
     kill -TERM "$ITER_PID" 2>/dev/null || true
     SIGTERM_SENT=1
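
The loop shape in the two hunks above (poll with `kill -0`, send SIGTERM once when idle time exceeds the timeout) can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the package's script: the cadence is scaled down to 1s, the timeout to 2s, a plain `sleep 30` stands in for the iteration subprocess, and `START_EPOCH` / `ITER_STATUS` are illustrative names. It omits the git-log activity signal and the suspend offset.

```bash
#!/usr/bin/env bash
# Sketch only: scaled-down poll loop around a stand-in subprocess.
# POLL_S, IDLE_TIMEOUT_S values and the `sleep 30` worker are illustrative.
set -euo pipefail

POLL_S=1
IDLE_TIMEOUT_S=2

sleep 30 &                      # stand-in for the real iteration subprocess
ITER_PID=$!
START_EPOCH=$(date +%s)
SIGTERM_SENT=0

while kill -0 "$ITER_PID" 2>/dev/null; do
  sleep "$POLL_S"
  NOW=$(date +%s)
  # No commits happen here, so the activity mark is simply the start time.
  IDLE_SECONDS=$(( NOW - START_EPOCH ))
  if (( IDLE_SECONDS > IDLE_TIMEOUT_S )) && (( SIGTERM_SENT == 0 )); then
    kill -TERM "$ITER_PID" 2>/dev/null || true
    SIGTERM_SENT=1
  fi
done

# Collect the child's exit status without tripping `set -e`.
ITER_STATUS=0
wait "$ITER_PID" || ITER_STATUS=$?
echo "exit status: $ITER_STATUS"   # 128 + 15 (SIGTERM) when the timeout fired
```

The 143 exit status mentioned in the surrounding comments is the shell's encoding of death by signal 15, which is why a false SIGTERM is visible to the orchestrator only as a failed iteration.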
|
@@ -409,6 +426,8 @@ rm -f "$ITER_JSON"
 
 **LAST_ACTIVITY_MARK signal trade-off.** The mark is `max(DISPATCH_START_EPOCH, last commit timestamp)`. The dispatch-start floor is intentional: skip-iterations that produce no commit (Step 4 routes a ticket to `action: skipped`) are bounded by `IDLE_TIMEOUT_S` since dispatch start, not by an arbitrarily-stale prior-commit timestamp. This protects against false-positive SIGTERM at iter T=0 when the most recent commit happens to be hours old. The trade-off is the inverse: a skip-iter that runs for `IDLE_TIMEOUT_S` (60 min default) will SIGTERM even though it never had a chance to commit. The 60-min default is well past the typical skip-iter wall-clock (a normal skip completes in seconds), so the trade-off rarely fires in practice; adopters who run unusually long skip-evaluation iters (e.g. deep architect-design probes) should raise `WORK_PROBLEMS_IDLE_TIMEOUT_S` accordingly. Alternative signals considered and rejected: `stat -f%m "$ITER_JSON"` (binary: file mtime only changes on subprocess exit, useless during the idle gap); subprocess RSS-change tracking (noisy; spikes during Agent-tool expansions confound the signal). The git-log signal is the cheapest reliable progress indicator the orchestrator already has.
 
+**Machine-sleep false-kill: suspend-detect heuristic (P307).** The IDLE_SECONDS computation above subtracts `SUSPEND_OFFSET_S` from wall-clock `NOW` so the orchestrator measures *active-elapsed* time rather than raw wall-clock between LAST_ACTIVITY_MARK and now. The offset accumulates whenever a poll observes `ACTUAL_POLL_DELTA > EXPECTED_POLL_DELTA_S + SUSPEND_JITTER_S` (default `60 + 120 = 180s`), i.e., the gap between consecutive `sleep 60` polls vastly exceeds the cadence the loop scheduled. The driver is the 2026-05-26 iter 1 evidence: the iter's host suspended (lid-close mid-loop) and the next poll observed an idle of 5544s; the wall-clock-only computation tripped SIGTERM at 5544s > 3600s, exit 143 + 0-byte JSON (the P147 stuck-before-emit metadata-loss class), losing a commit + cost metadata for an iter whose semantic work had completed. The suspend-detect heuristic converts that wall-clock-elapsed measure to "active-elapsed approximate" without needing monotonic clocks (which bash does not natively expose anyway). Alternatives considered and rejected: (a) monotonic / active-time clocks (POSIX `CLOCK_MONOTONIC` is not surfaced by `date` or `$EPOCHSECONDS`; would require a C helper or a Python-shim subprocess per poll); (b) iter-side heartbeat file the poll loop reads instead of wall-clock (works but adds an iter-side write contract; suspend-detect is purely orchestrator-side, no iter-prompt changes). The jitter buffer (`SUSPEND_JITTER_S=120`) is the load-bearing safety margin: it tolerates slow-hook / GC / brief-load-spike jitter (up to 180s total inter-poll delay) without falsely shifting; only genuine suspend / system-clock jumps cross the threshold. Adopters with unusually noisy hosts can raise `SUSPEND_JITTER_S` per environment; lowering it risks counting brief stalls as suspend. The heuristic is asymmetric: it can absorb a 5 min host hang into the offset and treat it as suspend, but the cost is at worst that one iter runs an extra 5 min before SIGTERM (cheaper than losing the iter's commit + metadata to a false-kill).
+
 **Iteration prompt body (self-contained; the subprocess has no prior conversation context):**
 
 1. **Context**: this is one iteration of the AFK work-problems loop. The user is AFK. The orchestrator selected `P<NNN> (<title>)` as the highest-WSJF actionable ticket.
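
To make the offset arithmetic in the P307 paragraph concrete, the sketch below replays the quoted idle readings (481s, 1016s, 5544s) through the same threshold and accumulation rule. It is an illustration under assumptions: `replay_idle_trace` is a hypothetical helper, and treating the three readings as consecutive polls against an unchanged activity mark is a guess about the poll log, not something the diff states.

```bash
#!/usr/bin/env bash
# Sketch only: replays wall-clock idle readings through the suspend-offset math.
# replay_idle_trace is a hypothetical helper; treating the readings as
# consecutive polls against a fixed activity mark is an assumption.
set -euo pipefail

EXPECTED_POLL_DELTA_S=60
SUSPEND_JITTER_S=120

# Args: wall-clock idle readings in seconds, one per poll, oldest first.
# Prints "<wall_idle> <offset> <adjusted_idle>" for each poll after the first.
replay_idle_trace() {
  local prev="$1"; shift
  local offset=0 wall delta
  for wall in "$@"; do
    delta=$(( wall - prev ))
    # Same strict-greater-than test as the loop: gaps above 180s count as suspend.
    if (( delta > EXPECTED_POLL_DELTA_S + SUSPEND_JITTER_S )); then
      offset=$(( offset + delta - EXPECTED_POLL_DELTA_S ))
    fi
    prev=$wall
    printf '%d %d %d\n' "$wall" "$offset" "$(( wall - offset ))"
  done
}

replay_idle_trace 481 1016 5544
# 1016 475 541
# 5544 4943 601
```

Under those assumptions the adjusted idle after both gaps is 601s (481s plus two nominal 60s polls), far below the 3600s timeout, so the SIGTERM described in the paragraph would not have fired.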
|
@@ -298,3 +298,124 @@ FAKE_EOF
   run grep -nE "P147" "$SKILL_FILE"
   [ "$status" -eq 0 ]
 }
+
+# ---------------------------------------------------------------------------
+# P307 machine-sleep false-kill subclass: P121's IDLE_SECONDS = NOW -
+# LAST_ACTIVITY_MARK computation is wall-clock time. When the host machine
+# suspends/sleeps between the 60s polls, wall-clock advances while the iter
+# subprocess is itself suspended (no actual idle work). On resume, IDLE_SECONDS
+# jumps past the threshold and SIGTERM fires on a subprocess that was
+# genuinely making progress, not stuck. The 2026-05-26 evidence: poll log
+# idle jumped non-linearly 481s -> 1016s -> 5544s across suspend gaps,
+# SIGTERM at idle=5544s > 3600s threshold, exit 143 + 0-byte JSON (the
+# P147 stuck-before-emit metadata-loss class).
+#
+# Fix: detect large wall-clock jumps between consecutive polls (>> 60s
+# expected) as suspend events and shift LAST_ACTIVITY_MARK forward by the
+# gap-minus-expected so IDLE_SECONDS approximates active-elapsed rather
+# than wall-clock-elapsed. Pure-bash heuristic; no monotonic-clock
+# dependency.
+#
+# @problem P307
+
+# Pure-bash helper mirroring SKILL.md Step 5 suspend-detect math. Tests
+# below pin the algorithm against parameter combinations exercising
+# (a) normal poll cadence (no shift), (b) within-jitter delay (no shift),
+# (c) detected suspend (shift forward by actual-minus-expected),
+# (d) reproduction of the 2026-05-26 5544s evidence (large shift absorbs
+# the gap). The shape returned is the EFFECTIVE LAST_ACTIVITY_MARK such
+# that IDLE_SECONDS = NOW - effective_mark yields active-elapsed.
+compute_effective_mark() {
+  local prev_mark="$1"
+  local prev_poll="$2"
+  local now="$3"
+  local expected_delta="${4:-60}"
+  local jitter="${5:-120}"
+
+  local actual_delta=$(( now - prev_poll ))
+  local threshold=$(( expected_delta + jitter ))
+  if (( actual_delta > threshold )); then
+    printf '%d\n' $(( prev_mark + actual_delta - expected_delta ))
+  else
+    printf '%d\n' "$prev_mark"
+  fi
+}
+
+@test "P307: normal poll cadence (60s actual delta) does NOT shift LAST_ACTIVITY_MARK" {
+  # 60s between polls is the expected `sleep 60` cadence; no suspend; mark
+  # unchanged. Guards against an over-eager heuristic that would shift on
+  # every normal poll.
+  run compute_effective_mark 1000 0 60 60 120
+  [ "$status" -eq 0 ]
+  [ "$output" = "1000" ]
+}
+
+@test "P307: within-jitter delay (90s actual delta) does NOT shift LAST_ACTIVITY_MARK" {
+  # 90s between polls is mild jitter (slow hook, GC pause, brief load
+  # spike); below the 60+120=180s suspend threshold; mark unchanged.
+  # Bounded noise must not trigger a shift.
+  run compute_effective_mark 1000 0 90 60 120
+  [ "$status" -eq 0 ]
+  [ "$output" = "1000" ]
+}
+
+@test "P307: at-threshold delay (180s actual delta) does NOT shift LAST_ACTIVITY_MARK" {
+  # Exactly at expected+jitter is the boundary; strict-greater-than test
+  # means no shift at the boundary. Adopters tuning the jitter window
+  # know 180s == EXPECTED_POLL_DELTA_S + SUSPEND_JITTER_S is the inclusive
+  # ceiling of the no-shift band.
+  run compute_effective_mark 1000 0 180 60 120
+  [ "$status" -eq 0 ]
+  [ "$output" = "1000" ]
+}
+
+@test "P307: detected suspend (300s actual delta) shifts mark forward by actual-minus-expected" {
+  # 300s between polls vastly exceeds the 180s threshold; treat as suspend
+  # event and shift mark forward by 300-60=240s. Effect: IDLE_SECONDS
+  # (NOW - effective_mark) reads 60s instead of 300s, preserving the
+  # subprocess from a wall-clock false-kill.
+  run compute_effective_mark 1000 0 300 60 120
+  [ "$status" -eq 0 ]
+  [ "$output" = "1240" ]
+}
+
+@test "P307: reproduces 2026-05-26 iter 1 evidence (5544s suspend gap shifts mark to absorb)" {
+  # Concrete reproduction of the production observation: poll saw idle
+  # jump to 5544s after a multi-hour laptop suspend. Without suspend-detect,
+  # SIGTERM fires at 5544s > 3600s threshold. With suspend-detect, mark
+  # shifts forward by 5544-60=5484s; IDLE_SECONDS = 5544 - 5484 = 60s,
+  # below threshold; iter survives.
+  run compute_effective_mark 0 0 5544 60 120
+  [ "$status" -eq 0 ]
+  [ "$output" = "5484" ]
+}
+
+@test "P307: SKILL.md Step 5 documents the suspend-detect heuristic" {
+  # Prose must name the heuristic so adopters reading the SKILL.md know
+  # how the poll loop survives machine-sleep without inventing one.
+  # Accept any of: "suspend-detect", "wall-clock jump", "machine sleep",
+  # "machine-sleep", or the constants EXPECTED_POLL_DELTA_S /
+  # SUSPEND_JITTER_S / SUSPEND_OFFSET_S that name the construct.
+  run grep -niE "suspend.?detect|wall.?clock jump|machine.?sleep|EXPECTED_POLL_DELTA_S|SUSPEND_JITTER_S|SUSPEND_OFFSET_S" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+
+@test "P307: SKILL.md Step 5 cites P307 (machine-sleep false-kill driver)" {
+  run grep -nE "P307" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+
+@test "P307: SKILL.md Step 5 trade-off paragraph names suspend-detect alongside skip-iter trade-off" {
+  # The LAST_ACTIVITY_MARK signal trade-off paragraph (existing at L410)
+  # enumerates alternatives considered and rejected (mtime, RSS). The
+  # suspend-detect addition belongs in the same locus per architect
+  # review; keeps the rationale chain (P121 -> P147 -> trade-off ->
+  # P307 suspend-detect) reading linearly rather than fragmenting into
+  # a separate section. Assert the trade-off paragraph names both
+  # SUSPEND_OFFSET_S (the accumulator) AND the EXPECTED_POLL_DELTA_S +
+  # SUSPEND_JITTER_S threshold so the rationale chain is complete.
+  run grep -nE "LAST_ACTIVITY_MARK signal trade-off" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  run grep -niE "SUSPEND_OFFSET_S|suspend.?offset" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
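
The test helper shifts the activity mark forward, while the loop in the first file subtracts an accumulated offset from `NOW`. For a single poll the two forms should give the same idle value, and the sketch below checks that outside bats. `idle_via_mark` and `idle_via_offset` are hypothetical names introduced for this comparison; neither appears in the package.

```bash
#!/usr/bin/env bash
# Sketch only: compares the "shift the mark" form (test helper) with the
# "subtract an offset from NOW" form (poll loop) for one poll.
# idle_via_mark and idle_via_offset are hypothetical names.
set -euo pipefail

# Args: mark prev_poll now [expected_delta] [jitter]
idle_via_mark() {
  local mark="$1" prev_poll="$2" now="$3" expected="${4:-60}" jitter="${5:-120}"
  local delta=$(( now - prev_poll ))
  if (( delta > expected + jitter )); then
    mark=$(( mark + delta - expected ))      # move the mark past the suspend gap
  fi
  printf '%d\n' $(( now - mark ))
}

# Same arguments; keeps the mark fixed and discounts NOW instead.
idle_via_offset() {
  local mark="$1" prev_poll="$2" now="$3" expected="${4:-60}" jitter="${5:-120}"
  local delta=$(( now - prev_poll )) offset=0
  if (( delta > expected + jitter )); then
    offset=$(( delta - expected ))           # single-poll stand-in for SUSPEND_OFFSET_S
  fi
  printf '%d\n' $(( now - offset - mark ))
}

# Columns: now, idle via shifted mark, idle via offset.
for now in 60 90 180 181 300 5544; do
  printf '%d %d %d\n' "$now" "$(idle_via_mark 0 0 "$now")" "$(idle_via_offset 0 0 "$now")"
done
```

With mark and previous poll both at 0, gaps of 60, 90 and 180 pass through unchanged, and gaps of 181, 300 and 5544 all collapse to 60 in both columns, which matches the boundary and suspend cases the bats tests pin.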