@windyroad/itil 0.23.3-preview.259 → 0.23.4-preview.261
- package/.claude-plugin/plugin.json +1 -1
- package/bin/wr-itil-classify-readme-drift +2 -0
- package/package.json +1 -1
- package/skills/list-problems/SKILL.md +1 -1
- package/skills/manage-problem/SKILL.md +31 -3
- package/skills/manage-problem/test/manage-problem-readme-vq-sort-order.bats +206 -0
- package/skills/reconcile-readme/SKILL.md +1 -1
- package/skills/review-problems/SKILL.md +2 -2
- package/skills/transition-problem/SKILL.md +2 -0
- package/skills/transition-problems/SKILL.md +1 -1
- package/skills/work-problems/SKILL.md +28 -1
- package/skills/work-problems/test/work-problems-step-5-idle-timeout-sigterm.bats +104 -0
package/package.json
CHANGED
|
@@ -66,7 +66,7 @@ Render three sections matching the README.md format so cached and live output lo
|
|
|
66
66
|
| <score> | P<NNN> | <title> | <severity> | <status> | <effort> |
|
|
67
67
|
```
|
|
68
68
|
|
|
69
|
-
**Verification Queue** — `.verifying.md` tickets, sorted by
|
|
69
|
+
**Verification Queue** — `.verifying.md` tickets, sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Drift here re-opens P150.
|
|
70
70
|
|
|
71
71
|
```
|
|
72
72
|
| ID | Title | Released | Likely verified? |
|
|
@@ -193,9 +193,33 @@ The `wr-itil-reconcile-readme` command is a `$PATH`-resolved shim shipped in `pa
|
|
|
193
193
|
|
|
194
194
|
Exit-code routing:
|
|
195
195
|
- **Exit 0 (clean)**: continue to Step 1.
|
|
196
|
-
- **Exit 1 (drift detected)**: structured diff lines printed to stdout, one per drift entry (≤150 bytes per ADR-038 progressive-disclosure budget).
|
|
196
|
+
- **Exit 1 (drift detected)**: structured diff lines printed to stdout, one per drift entry (≤150 bytes per ADR-038 progressive-disclosure budget). Capture stdout to a temp file and classify the drift via the **uncommitted-rename carve-out** (P149) before halt-routing — see "Drift classification carve-out" immediately below.
|
|
197
197
|
- **Exit 2 (parse error)**: README missing or malformed. Halt with the parse-error message; this needs investigation, not mechanical reconciliation. AFK orchestrators halt-with-report per ADR-013 Rule 6.
|
|
198
198
|
|
|
199
|
+
#### Drift classification carve-out (P149)
|
|
200
|
+
|
|
201
|
+
The Exit 1 halt-and-route path is correct for **committed cross-session drift** — a past session committed a ticket transition without staging the README refresh, and proceeding now would re-encode the drift into the post-operation refresh and propagate the lie. It is **wrong for uncommitted-rename-rooted drift** — when the current working tree carries a staged ticket rename (a same-session `git mv` that the in-flow P094 / P062 refresh at Step 5 / Step 7 will reconcile in the upcoming commit per ADR-014's single-commit grain). Halting in the latter case forces a separate `/wr-itil:reconcile-readme` commit, splitting one logical change across two commits and violating the ADR-014 grain.
|
|
202
|
+
|
|
203
|
+
Run the classifier on Exit 1 to distinguish the two cases:
|
|
204
|
+
|
|
205
|
+
```bash
|
|
206
|
+
wr-itil-reconcile-readme docs/problems > /tmp/wr-itil-drift-$$.txt
|
|
207
|
+
reconcile_exit=$?
|
|
208
|
+
if [ "$reconcile_exit" -eq 1 ]; then
|
|
209
|
+
wr-itil-classify-readme-drift /tmp/wr-itil-drift-$$.txt docs/problems
|
|
210
|
+
classify_exit=$?
|
|
211
|
+
rm -f /tmp/wr-itil-drift-$$.txt
|
|
212
|
+
fi
|
|
213
|
+
```
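As written, the snippet above removes the temp file only inside the exit-1 branch, so a clean (exit 0) or parse-error (exit 2) run leaves `/tmp/wr-itil-drift-$$.txt` behind. One possible hardening is sketched below. It assumes only the two commands and argument orders shown in the snippet; `drift_preflight`, `RECONCILE_CMD` and `CLASSIFY_CMD` are hypothetical names introduced for illustration and are not part of the package.

```bash
# Sketch only: drift_preflight, RECONCILE_CMD and CLASSIFY_CMD are hypothetical
# names introduced for illustration; they are not shipped by @windyroad/itil.
drift_preflight() {
  problems_dir=${1:-docs/problems}
  drift_file=$(mktemp) || return 2   # unique temp file instead of a $$-based name

  # Record the exit status without tripping `set -e` in the caller.
  reconcile_exit=0
  "${RECONCILE_CMD:-wr-itil-reconcile-readme}" "$problems_dir" >"$drift_file" \
    || reconcile_exit=$?

  # The classifier runs only when the reconcile check reports drift (exit 1).
  classify_exit=""
  if [ "$reconcile_exit" -eq 1 ]; then
    classify_exit=0
    "${CLASSIFY_CMD:-wr-itil-classify-readme-drift}" "$drift_file" "$problems_dir" \
      || classify_exit=$?
  fi

  rm -f "$drift_file"   # cleaned up on every path, not only after exit 1
}
```

After the call, `reconcile_exit` and `classify_exit` hold the same values the original snippet sets, so the exit-code routing described in this section applies unchanged.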
|
|
214
|
+
|
|
215
|
+
The `wr-itil-classify-readme-drift` command is a `$PATH`-resolved shim (ADR-049 naming grammar) dispatching `packages/itil/scripts/classify-readme-drift.sh`. It cross-references the drifting IDs from the script's stdout against `git status --porcelain docs/problems/` filtered for staged rename (`R`) entries — the destination path's ticket ID is the post-rename status the in-flow refresh will reconcile.
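The diff does not include `classify-readme-drift.sh` itself, so the following is only a rough sketch of the cross-reference idea described above, not the shipped implementation. It assumes `git status --porcelain` v1 output on stdin, where a staged rename appears as `R  old -> new`, and ticket files named `<ID>-<slug>.<status>.md` (as in the P150 path cited in the bats test). The function name `staged_rename_dest_ids` is hypothetical.

```bash
# Sketch only: staged_rename_dest_ids is a hypothetical helper, not the
# classifier shipped in the package.
staged_rename_dest_ids() {
  # 1. Keep entries whose index (X) column is R and strip everything up to
  #    " -> ", leaving the destination path of the rename.
  # 2. Take the leading digits of the destination basename as the ticket ID.
  # 3. De-duplicate.
  sed -n 's/^R. .* -> //p' \
    | sed -n 's|^.*/\([0-9][0-9]*\)-[^/]*$|\1|p' \
    | sort -u
}
```

A caller might feed it with `git status --porcelain docs/problems/ | staged_rename_dest_ids` and compare the result against the drifting IDs. Paths that git quotes because of special characters are not handled by this sketch.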
|
|
216
|
+
|
|
217
|
+
Classifier exit-code routing:
|
|
218
|
+
|
|
219
|
+
- **`classify_exit == 0` (INLINE_REFRESH)**: every drifting ID is the destination of a staged rename in the working tree. Log a one-line note ("Step 0 reconcile drift covered by N staged rename(s); deferring README refresh to in-flow Step 5 / Step 7 per P094 / P062 + ADR-014 single-commit grain") and continue to Step 1. Do NOT invoke `/wr-itil:reconcile-readme` — the in-flow refresh will land the README correction in the same commit as the ticket work.
|
|
220
|
+
- **`classify_exit == 1` (HALT_ROUTE_RECONCILE)**: at least one drifting ID is NOT covered by a staged rename — committed cross-session drift OR mixed (some IDs in working tree, some committed-only). **Halt this invocation** with a directive to invoke `/wr-itil:reconcile-readme` (interactive mode) or auto-route through the same skill in non-interactive mode (per ADR-013 Rule 6, AFK orchestrator). The reconciliation must complete and commit before this manage-problem invocation proceeds. Mixed routes to halt because `/wr-itil:reconcile-readme` resolves both classes safely; the in-flow refresh only handles the rename'd subset.
|
|
221
|
+
- **`classify_exit == 2` (parse error)**: classifier received empty / missing drift input — contract violation upstream. Fall back to the conservative halt-and-route path.
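The three-way routing above can be sketched as a single `case` statement. This is a minimal illustration; the function name `route_classify_exit` is hypothetical, and the printed tokens simply reuse the labels from the list above.

```bash
# Sketch only: route_classify_exit is a hypothetical name; the tokens mirror
# the labels used in the routing list.
route_classify_exit() {
  case "$1" in
    0) echo "INLINE_REFRESH" ;;          # all drift covered by staged renames
    1) echo "HALT_ROUTE_RECONCILE" ;;    # committed or mixed drift
    *) echo "HALT_ROUTE_RECONCILE" ;;    # exit 2 or anything unexpected: conservative halt
  esac
}
```

Folding exit 2 into the halt branch mirrors the conservative fallback described above: an unexpected classifier result never silently continues to Step 1.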
|
|
222
|
+
|
|
199
223
|
This is a **preflight CHECK only** — manage-problem does NOT itself apply edits. The edit application lives in `/wr-itil:reconcile-readme`'s Step 4 with narrative preservation. Per architect verdict on P118 (Q3): manage-problem and work-problems Step 0 invoke the script (cheap mechanical check); transition-problem does NOT (P062 already covers transition-time refresh inside the same commit, redundant preflight there would pay the cost on every transition).
|
|
200
224
|
|
|
201
225
|
This step is a robustness layer ON TOP of P094 + P062, not a supersession of either — both per-operation contracts remain in force at Step 5 (creation refresh) and Step 7 (transition refresh).
|
|
@@ -410,6 +434,8 @@ After writing the new `.open.md` file, regenerate `docs/problems/README.md` to i
|
|
|
410
434
|
|
|
411
435
|
**WSJF Rankings tie-break sort (P138)**: rows in the WSJF Rankings table are sorted by the multi-key `(WSJF desc, Known-Error-first, Effort-divisor asc, Reported-date asc, ID asc)` so the rendered top-to-bottom row order matches `/wr-itil:work-problems` SKILL.md Step 3's tie-break selection 1:1. The first key (WSJF desc) sets the tier; within a tier the next three keys are the canonical tie-break ladder (Known Error before Open; smaller effort before larger; older Reported date before newer); ID asc is the deterministic final tiebreaker for full-tie cases. The table MUST include a `Reported` column so the third tie-break input is visible to README readers — without it, users cannot reconcile the rendered order against the orchestrator's selection. <!-- TIE-BREAK-LADDER-SOURCE: /wr-itil:work-problems SKILL.md Step 3 --> Any future change to the tie-break ladder MUST update this render block, the Step 7 P062 block, the Step 9e template, AND `/wr-itil:review-problems` SKILL.md Step 3 / Step 5 — drift here re-opens P138.
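As a rough illustration of the multi-key order, the ladder maps onto a single `sort` invocation once each row is flattened to tab-separated columns. The column layout here is an assumption made for the sketch: WSJF, a status rank (0 for Known Error, 1 for Open), the numeric effort divisor, the Reported date in ISO form, and the numeric ID. The function name `sort_wsjf_rows` is hypothetical.

```bash
# Sketch only: the column layout and the 0/1 status rank are assumptions made
# for illustration; the SKILL.md prose does not prescribe a row encoding.
sort_wsjf_rows() {
  tab=$(printf '\t')
  # k1: WSJF descending, k2: Known Error (0) before Open (1),
  # k3: smaller effort divisor first, k4: older Reported date first,
  # k5: ID ascending as the final deterministic tiebreaker.
  sort -t "$tab" -k1,1nr -k2,2n -k3,3n -k4,4 -k5,5n
}
```

ISO `YYYY-MM-DD` dates sort correctly as plain strings, which is why the fourth key needs no numeric flag.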
|
|
412
436
|
|
|
437
|
+
**Verification Queue sort direction (P150)**: rows in the Verification Queue table are sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics — older entries are the most likely-verified candidates the user wants to surface first when closing the queue. Newest-first ordering pushes those actionable closure candidates below the fold and contradicts the section header. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Any future change to the VQ sort direction MUST update this render block, the Step 7 P062 block, the Step 9c presentation block, the Step 9e template, AND `/wr-itil:review-problems` + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems` — drift here re-opens P150.
|
|
438
|
+
|
|
413
439
|
1. After `Write`-ing the new `.open.md` file (and, for multi-concern splits per step 4b, after all split files are written), regenerate `docs/problems/README.md` in-place reflecting the new filename set.
|
|
414
440
|
2. Update the "Last reviewed" line per the **Last-reviewed line discipline (P134)** subsection below — name the new ticket as the most-recent fragment (e.g. `P<NNN> opened — <one-line title>`); displaced prior fragments rotate to `docs/problems/README-history.md`.
|
|
415
441
|
3. `git add docs/problems/README.md` — the stage list at Step 11 must include it alongside the new `.open.md` file (Step 11's `git add -u` catch-all handles tracked-file modifications; the new README render lands via this path when README.md already exists in git, and via an explicit `git add docs/problems/README.md` when it is newly created). When line-3 truncation displaces prior content, also `git add docs/problems/README-history.md`.
|
|
@@ -575,6 +601,8 @@ The refresh uses the same rendering rules as Step 9e (glob `docs/problems/*.open
|
|
|
575
601
|
|
|
576
602
|
**WSJF Rankings tie-break sort (P138)**: rows in the WSJF Rankings table are sorted by the multi-key `(WSJF desc, Known-Error-first, Effort-divisor asc, Reported-date asc, ID asc)` so the rendered top-to-bottom row order matches `/wr-itil:work-problems` SKILL.md Step 3's tie-break selection 1:1. Within each WSJF tier, rows are ordered by the canonical tie-break ladder: Known Error before Open, smaller Effort before larger, older Reported date before newer. The table MUST include a `Reported` column so the third tie-break input is visible to README readers. <!-- TIE-BREAK-LADDER-SOURCE: /wr-itil:work-problems SKILL.md Step 3 --> Any future change to the tie-break ladder MUST update this render block, the Step 5 P094 block, the Step 9e template, AND `/wr-itil:review-problems` SKILL.md Step 3 / Step 5 — drift here re-opens P138.
|
|
577
603
|
|
|
604
|
+
**Verification Queue sort direction (P150)**: rows in the Verification Queue table are sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics — older entries are the most likely-verified candidates the user wants to surface first when closing the queue. Newest-first ordering pushes those actionable closure candidates below the fold and contradicts the section header. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Any future change to the VQ sort direction MUST update this render block, the Step 5 P094 block, the Step 9c presentation block, the Step 9e template, AND `/wr-itil:review-problems` + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems` — drift here re-opens P150.
|
|
605
|
+
|
|
578
606
|
**Mechanism:**
|
|
579
607
|
|
|
580
608
|
1. After renaming + Editing + `git add`-ing the transitioned ticket file (per the staging-trap rule above), regenerate `docs/problems/README.md` in-place reflecting the new filename set and the transitioned ticket's new Status.
|
|
@@ -667,7 +695,7 @@ After reviewing all problems, present a WSJF-ranked table for open/known-error p
|
|
|
667
695
|
| WSJF | ID | Title | Severity | Status | Effort | Reported | Notes |
|
|
668
696
|
|------|-----|-------|----------|--------|--------|----------|-------|
|
|
669
697
|
|
|
670
|
-
Then present a separate **Verification Queue** section for `.verifying.md` files (per ADR-022 — ranked by release age, oldest first; no WSJF because the multiplier is 0). Highlight each ticket whose release age is **≥ 14 days** (the within-skill default per P048 Candidate 4 — tunable; if it needs cross-skill consistency later, promote to policy) with a `likely verified` marker in the final column. This makes the Verification Queue not just a list but a ranked view of which verifications are most likely ready to close:
|
|
698
|
+
Then present a separate **Verification Queue** section for `.verifying.md` files (per ADR-022 — ranked by release age, oldest first; no WSJF because the multiplier is 0). Sort key + direction is the canonical `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) — drift here re-opens P150. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Highlight each ticket whose release age is **≥ 14 days** (the within-skill default per P048 Candidate 4 — tunable; if it needs cross-skill consistency later, promote to policy) with a `likely verified` marker in the final column. This makes the Verification Queue not just a list but a ranked view of which verifications are most likely ready to close — older entries are the most likely-verified candidates the user wants to surface first when closing the queue:
|
|
671
699
|
|
|
672
700
|
| ID | Title | Released | Fix summary | Likely verified? |
|
|
673
701
|
|----|-------|----------|-------------|------------------|
|
|
@@ -727,7 +755,7 @@ Edit each problem file where the priority changed. Then write/overwrite `docs/pr
|
|
|
727
755
|
|
|
728
756
|
## Verification Queue
|
|
729
757
|
|
|
730
|
-
Fix released, awaiting user verification (driven off `docs/problems/*.verifying.md` via glob — per ADR-022).
|
|
758
|
+
Fix released, awaiting user verification (driven off `docs/problems/*.verifying.md` via glob — per ADR-022). Sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC). <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Drift here re-opens P150 — any change to VQ sort direction MUST update the Step 5 P094 block, the Step 7 P062 block, the Step 9c presentation block, this template, AND `/wr-itil:review-problems` + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems`.
|
|
731
759
|
|
|
732
760
|
| ID | Title | Released | Fix summary |
|
|
733
761
|
|----|-------|----------|-------------|
|
|
@@ -0,0 +1,206 @@
|
|
|
1
|
+
#!/usr/bin/env bats
|
|
2
|
+
|
|
3
|
+
# P150: docs/problems/README.md Verification Queue must be rendered
|
|
4
|
+
# oldest-first (by Released date ASC, oldest at row 1) per ADR-022 +
|
|
5
|
+
# P048 user-task semantics. The header has long claimed "Ranked by
|
|
6
|
+
# release age, oldest first" while the rendered table drifted to
|
|
7
|
+
# newest-first across multiple SKILL.md render sites. This file
|
|
8
|
+
# encodes the canonical sort spec + greppable VQ-SORT-DIRECTION
|
|
9
|
+
# marker as a contract assertion across every render block, plus a
|
|
10
|
+
# behavioural fixture that asserts the actual sort outcome.
|
|
11
|
+
#
|
|
12
|
+
# Hybrid coverage per ADR-005 + ADR-037:
|
|
13
|
+
# - Structural contract-assertions (Permitted Exception per ADR-005 /
|
|
14
|
+
# contract-assertion pattern per ADR-037): each of the render-block
|
|
15
|
+
# sites carries the canonical VQ-SORT-DIRECTION marker.
|
|
16
|
+
# - One behavioural fixture sort: 4 .verifying.md tickets with known
|
|
17
|
+
# Released dates. Apply the documented ASC-by-date sort. Assert
|
|
18
|
+
# row 1 = the oldest entry; row N = the newest.
|
|
19
|
+
#
|
|
20
|
+
# @problem P150
|
|
21
|
+
# @jtbd JTBD-001 (enforce governance without slowing down — predictable
|
|
22
|
+
# render order visible across the README and from `list-problems`)
|
|
23
|
+
# @jtbd JTBD-006 (progress backlog AFK — verification candidates ready
|
|
24
|
+
# to close are at the top of the queue, not the bottom)
|
|
25
|
+
#
|
|
26
|
+
# Cross-reference:
|
|
27
|
+
# P150: docs/problems/150-readme-verification-queue-rendered-newest-first-contradicts-oldest-first-header.*.md
|
|
28
|
+
# P138: sibling fix on the WSJF Rankings table — same fix shape
|
|
29
|
+
# P048: introduced the Verification Queue + Likely verified column
|
|
30
|
+
# ADR-005 — plugin testing strategy / Permitted Exception
|
|
31
|
+
# ADR-022 — `.verifying.md` lifecycle; VQ rendering
|
|
32
|
+
# ADR-037 — contract-assertion bats pattern
|
|
33
|
+
|
|
34
|
+
setup() {
|
|
35
|
+
REPO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/../../../../.." && pwd)"
|
|
36
|
+
MANAGE_SKILL="$REPO_ROOT/packages/itil/skills/manage-problem/SKILL.md"
|
|
37
|
+
REVIEW_SKILL="$REPO_ROOT/packages/itil/skills/review-problems/SKILL.md"
|
|
38
|
+
TRANSITION_SKILL="$REPO_ROOT/packages/itil/skills/transition-problem/SKILL.md"
|
|
39
|
+
TRANSITIONS_SKILL="$REPO_ROOT/packages/itil/skills/transition-problems/SKILL.md"
|
|
40
|
+
RECONCILE_SKILL="$REPO_ROOT/packages/itil/skills/reconcile-readme/SKILL.md"
|
|
41
|
+
LIST_SKILL="$REPO_ROOT/packages/itil/skills/list-problems/SKILL.md"
|
|
42
|
+
|
|
43
|
+
TEST_TMP="$(mktemp -d)"
|
|
44
|
+
}
|
|
45
|
+
|
|
46
|
+
teardown() {
|
|
47
|
+
if [ -n "${TEST_TMP:-}" ] && [ -d "$TEST_TMP" ]; then
|
|
48
|
+
rm -rf "$TEST_TMP"
|
|
49
|
+
fi
|
|
50
|
+
}
|
|
51
|
+
|
|
52
|
+
# ---------------------------------------------------------------------------
|
|
53
|
+
# Structural contract-assertions — VQ-SORT-DIRECTION marker
|
|
54
|
+
# ---------------------------------------------------------------------------
|
|
55
|
+
|
|
56
|
+
@test "manage-problem render blocks carry the VQ-SORT-DIRECTION marker" {
|
|
57
|
+
# Each render block writing the Verification Queue must carry the
|
|
58
|
+
# canonical greppable marker pointing back to ADR-022 (the
|
|
59
|
+
# framework-resolved source of the VQ ordering contract). Drift
|
|
60
|
+
# across render sites re-opens P150.
|
|
61
|
+
run grep -F '<!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 -->' "$MANAGE_SKILL"
|
|
62
|
+
[ "$status" -eq 0 ]
|
|
63
|
+
# Marker must appear at the three manage-problem render sites:
|
|
64
|
+
# Step 5 P094 (refresh on new ticket), Step 7 P062 (refresh on
|
|
65
|
+
# transition), Step 9e (review-emit template).
|
|
66
|
+
count=$(grep -c -F '<!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 -->' "$MANAGE_SKILL")
|
|
67
|
+
[ "$count" -ge 3 ]
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
@test "review-problems renders the VQ-SORT-DIRECTION marker" {
|
|
71
|
+
run grep -F '<!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 -->' "$REVIEW_SKILL"
|
|
72
|
+
[ "$status" -eq 0 ]
|
|
73
|
+
}
|
|
74
|
+
|
|
75
|
+
@test "transition-problem Step 7 README refresh carries the VQ-SORT-DIRECTION marker" {
|
|
76
|
+
run grep -F '<!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 -->' "$TRANSITION_SKILL"
|
|
77
|
+
[ "$status" -eq 0 ]
|
|
78
|
+
}
|
|
79
|
+
|
|
80
|
+
@test "transition-problems batch render carries the VQ-SORT-DIRECTION marker" {
|
|
81
|
+
run grep -F '<!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 -->' "$TRANSITIONS_SKILL"
|
|
82
|
+
[ "$status" -eq 0 ]
|
|
83
|
+
}
|
|
84
|
+
|
|
85
|
+
@test "reconcile-readme rendering carries the VQ-SORT-DIRECTION marker" {
|
|
86
|
+
run grep -F '<!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 -->' "$RECONCILE_SKILL"
|
|
87
|
+
[ "$status" -eq 0 ]
|
|
88
|
+
}
|
|
89
|
+
|
|
90
|
+
@test "list-problems VQ rendering carries the VQ-SORT-DIRECTION marker" {
|
|
91
|
+
run grep -F '<!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 -->' "$LIST_SKILL"
|
|
92
|
+
[ "$status" -eq 0 ]
|
|
93
|
+
}
|
|
94
|
+
|
|
95
|
+
# ---------------------------------------------------------------------------
|
|
96
|
+
# Structural contract-assertions — sort-direction phrase consistency
|
|
97
|
+
# ---------------------------------------------------------------------------
|
|
98
|
+
|
|
99
|
+
@test "manage-problem render blocks document the Released-date ASC direction" {
|
|
100
|
+
# Free-form explanation of the sort key + direction must accompany
|
|
101
|
+
# the marker so a reader doesn't have to chase the ADR to understand
|
|
102
|
+
# what the marker authorises.
|
|
103
|
+
run grep -F 'Released date ASC' "$MANAGE_SKILL"
|
|
104
|
+
[ "$status" -eq 0 ]
|
|
105
|
+
count=$(grep -c -F 'Released date ASC' "$MANAGE_SKILL")
|
|
106
|
+
[ "$count" -ge 3 ]
|
|
107
|
+
}
|
|
108
|
+
|
|
109
|
+
@test "review-problems documents the Released-date ASC direction" {
|
|
110
|
+
run grep -F 'Released date ASC' "$REVIEW_SKILL"
|
|
111
|
+
[ "$status" -eq 0 ]
|
|
112
|
+
}
|
|
113
|
+
|
|
114
|
+
# ---------------------------------------------------------------------------
|
|
115
|
+
# Structural contract-assertions — drift-warning prose
|
|
116
|
+
# ---------------------------------------------------------------------------
|
|
117
|
+
|
|
118
|
+
@test "manage-problem render blocks warn that drift re-opens P150" {
|
|
119
|
+
# The cross-coupling note must explicitly name P150 so future agents
|
|
120
|
+
# who consider relaxing the VQ sort direction see the regression risk.
|
|
121
|
+
count=$(grep -c -F 'drift here re-opens P150' "$MANAGE_SKILL")
|
|
122
|
+
[ "$count" -ge 3 ]
|
|
123
|
+
}
|
|
124
|
+
|
|
125
|
+
@test "review-problems renders the drift-re-opens-P150 warning" {
|
|
126
|
+
run grep -F 'drift re-opens P150' "$REVIEW_SKILL"
|
|
127
|
+
[ "$status" -eq 0 ]
|
|
128
|
+
}
|
|
129
|
+
|
|
130
|
+
# ---------------------------------------------------------------------------
|
|
131
|
+
# Behavioural fixture: ASC-by-Released-date puts oldest at row 1
|
|
132
|
+
# ---------------------------------------------------------------------------
|
|
133
|
+
|
|
134
|
+
@test "behavioural: VQ sort by Released date ASC puts oldest entry at row 1" {
|
|
135
|
+
# Fixture: 4 .verifying.md tickets with known Released dates spanning
|
|
136
|
+
# 2026-04-22 to 2026-05-02. Encode each as a tab-separated row whose
|
|
137
|
+
# columns are the sort axes (Released date, ID, Title). Apply the
|
|
138
|
+
# documented ASC-by-Released sort and assert the output row order
|
|
139
|
+
# places the oldest entry at row 1 and the newest at row N.
|
|
140
|
+
#
|
|
141
|
+
# This is the regression guard against the drift documented in P150 —
|
|
142
|
+
# before the fix, render sites iterated newest-first and pushed the
|
|
143
|
+
# actionable closure candidates (oldest entries) below the fold.
|
|
144
|
+
|
|
145
|
+
fixture_in="$TEST_TMP/fixture-vq.tsv"
|
|
146
|
+
cat >"$fixture_in" <<'EOF'
|
|
147
|
+
2026-05-02	148	P148: youngest released
|
|
148
|
+
2026-04-29	144	P144: 3 days old
|
|
149
|
+
2026-04-25	120	P120: week-old
|
|
150
|
+
2026-04-22	093	P093: oldest released
|
|
151
|
+
EOF
|
|
152
|
+
|
|
153
|
+
# Canonical sort: Released date ASC (oldest first), ID ASC as final
|
|
154
|
+
# tiebreaker for same-day releases.
|
|
155
|
+
sorted=$(sort -t$'\t' -k1,1 -k2,2n "$fixture_in" | cut -f3)
|
|
156
|
+
expected="P093: oldest released
|
|
157
|
+
P120: week-old
|
|
158
|
+
P144: 3 days old
|
|
159
|
+
P148: youngest released"
|
|
160
|
+
[ "$sorted" = "$expected" ]
|
|
161
|
+
}
|
|
162
|
+
|
|
163
|
+
@test "behavioural: same-day Released uses ID ASC as the final tiebreaker" {
|
|
164
|
+
# Regression guard: when two tickets share a Released date, the ID
|
|
165
|
+
# ASC tiebreaker must produce a deterministic order. Without an
|
|
166
|
+
# explicit final tiebreaker, render-time row order can shift on
|
|
167
|
+
# every refresh and look like content drift in git diff.
|
|
168
|
+
fixture_in="$TEST_TMP/fixture-vq-sameday.tsv"
|
|
169
|
+
cat >"$fixture_in" <<'EOF'
|
|
170
|
+
2026-05-02	148	P148: same day high
|
|
171
|
+
2026-05-02	147	P147: same day mid
|
|
172
|
+
2026-05-02	146	P146: same day low
|
|
173
|
+
EOF
|
|
174
|
+
|
|
175
|
+
sorted=$(sort -t$'\t' -k1,1 -k2,2n "$fixture_in" | cut -f3)
|
|
176
|
+
expected="P146: same day low
|
|
177
|
+
P147: same day mid
|
|
178
|
+
P148: same day high"
|
|
179
|
+
[ "$sorted" = "$expected" ]
|
|
180
|
+
}
|
|
181
|
+
|
|
182
|
+
@test "behavioural: oldest-first ordering surfaces likely-verified candidates first" {
|
|
183
|
+
# P048 user-task semantics: the Verification Queue exists so the user
|
|
184
|
+
# can close pending verifications. Older entries are more likely
|
|
185
|
+
# ready to close (less chance of revert). Oldest-first ordering puts
|
|
186
|
+
# those candidates at the top so the user lands on actionable rows
|
|
187
|
+
# without scrolling past fresh-release entries still in dwell-time.
|
|
188
|
+
#
|
|
189
|
+
# Fixture spans ages 0d, 1d, 14d, 30d. Assert that after sort, the
|
|
190
|
+
# 30-day entry is at row 1 (highest "likely verified" probability)
|
|
191
|
+
# and the 0-day entry is at row N (lowest probability).
|
|
192
|
+
today="2026-05-02"
|
|
193
|
+
fixture_in="$TEST_TMP/fixture-vq-ages.tsv"
|
|
194
|
+
cat >"$fixture_in" <<'EOF'
|
|
195
|
+
2026-05-02	150	P150: 0 days no
|
|
196
|
+
2026-05-01	149	P149: 1 day no
|
|
197
|
+
2026-04-18	048	P048: 14 days yes
|
|
198
|
+
2026-04-02	030	P030: 30 days yes
|
|
199
|
+
EOF
|
|
200
|
+
|
|
201
|
+
sorted=$(sort -t$'\t' -k1,1 -k2,2n "$fixture_in" | cut -f3)
|
|
202
|
+
first=$(printf "%s\n" "$sorted" | head -1)
|
|
203
|
+
last=$(printf "%s\n" "$sorted" | tail -1)
|
|
204
|
+
[ "$first" = "P030: 30 days yes" ]
|
|
205
|
+
[ "$last" = "P150: 0 days no" ]
|
|
206
|
+
}
|
|
@@ -99,7 +99,7 @@ For each REMOVE: `Edit` with the existing row as `old_string`, and remove it (re
|
|
|
99
99
|
|
|
100
100
|
For each ADD to WSJF Rankings: locate the correct WSJF position by descending order. Use `Edit` to insert the new row immediately above the next-lower-WSJF row (or append at the bottom of the table if the new row's WSJF is the lowest). The Edit's `old_string` is the line that the new row inserts above; the `new_string` is the new row + the same line below.
|
|
101
101
|
|
|
102
|
-
For each ADD to Verification Queue:
|
|
102
|
+
For each ADD to Verification Queue: insert the new row in `Released date ASC` position (oldest at row 1; same-day releases tiebreak by ID ASC) per the canonical VQ sort direction. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Recent releases land at the bottom; oldest-pending verifications surface at the top so the user lands on actionable closure candidates first per P048 user-task semantics. Drift here re-opens P150.
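To make the insertion rule concrete, here is a small sketch that finds the existing row a new Verification Queue entry should be inserted above. It assumes the existing rows arrive on stdin as tab-separated `Released date` and `ID` pairs, already in `Released date ASC` order. The function name `vq_insert_before` and the `APPEND` sentinel are hypothetical.

```bash
# Sketch only: vq_insert_before and the APPEND sentinel are hypothetical.
# stdin: existing rows as "<released-date>\t<id>", oldest first.
# args:  $1 = new row's Released date (YYYY-MM-DD), $2 = new row's ID.
vq_insert_before() {
  awk -F '\t' -v d="$1" -v i="$2" '
    # The first row that sorts after the new one (later date, or same date
    # with a higher ID) is the row the new entry goes immediately above.
    ($1 > d) || ($1 == d && $2 + 0 > i + 0) { print $2; found = 1; exit }
    END { if (!found) print "APPEND" }
  '
}
```

The printed ID identifies the row to use as the anchor line for the insertion; `APPEND` means the new entry is the newest release and belongs at the bottom of the table.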
|
|
103
103
|
|
|
104
104
|
After all edits, re-run `packages/itil/scripts/reconcile-readme.sh docs/problems` to confirm exit 0. If the second run still reports drift, investigate the residual edits — do NOT re-run reconciliation in a loop, as that hides systematic edit failures.
|
|
105
105
|
|
|
@@ -72,7 +72,7 @@ After re-scoring, present three sections matching the README.md format (same ren
|
|
|
72
72
|
|------|-----|-------|----------|--------|--------|----------|-------|
|
|
73
73
|
```
|
|
74
74
|
|
|
75
|
-
**Verification Queue** — `.verifying.md` tickets, sorted by
|
|
75
|
+
**Verification Queue** — `.verifying.md` tickets, sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics. Older entries are the most likely-verified candidates the user wants to surface first when closing the queue; newest-first ordering pushes those actionable closure candidates below the fold and contradicts the section header. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Any change to the VQ sort direction MUST update this rendering block, Step 5's README template, AND `/wr-itil:manage-problem` SKILL.md Step 5 P094 / Step 7 P062 / Step 9c / Step 9e + `/wr-itil:transition-problem` + `/wr-itil:transition-problems` + `/wr-itil:reconcile-readme` + `/wr-itil:list-problems` — drift re-opens P150. Highlight any ticket whose release age is **≥ 14 days** with a `yes (N days)` marker in the `Likely verified?` column (within-skill default per P048 Candidate 4 — tunable; promote to cross-skill policy if needed):
|
|
76
76
|
|
|
77
77
|
```
|
|
78
78
|
| ID | Title | Released | Fix summary | Likely verified? |
|
|
@@ -128,7 +128,7 @@ Dev-work queue only. Verification Pending (`.verifying.md`, WSJF multiplier 0) a
|
|
|
128
128
|
|
|
129
129
|
## Verification Queue
|
|
130
130
|
|
|
131
|
-
Fix released, awaiting user verification (driven off `docs/problems/*.verifying.md` via glob per ADR-022).
|
|
131
|
+
Fix released, awaiting user verification (driven off `docs/problems/*.verifying.md` via glob per ADR-022). Sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC). <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> `Likely verified?` column marks tickets ≥14 days old (P048 Candidate 4 default).
|
|
132
132
|
|
|
133
133
|
| ID | Title | Released | Likely verified? |
|
|
134
134
|
|----|-------|----------|------------------|
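The 14-day threshold for the `Likely verified?` column can be sketched as follows. This is an illustration under stated assumptions: dates are ISO `YYYY-MM-DD`, the day count uses the usual Gregorian civil-date arithmetic so it does not depend on any particular `date` implementation, and the `no` text for tickets under the threshold is a placeholder, since the source only specifies the `yes (N days)` marker. Both function names are hypothetical.

```bash
# Sketch only: release_age_days and likely_verified are hypothetical helpers.
# Whole days between two ISO dates: $1 = released, $2 = today.
release_age_days() {
  awk -v a="$1" -v b="$2" '
    # Day number of a Gregorian date, counted from a fixed epoch; only the
    # difference between two results is meaningful here.
    function days(s,   p, y, m, d, era, yoe, doy) {
      split(s, p, "-"); y = p[1] + 0; m = p[2] + 0; d = p[3] + 0
      if (m <= 2) y--                       # treat Jan/Feb as end of previous year
      era = int(y / 400); yoe = y - era * 400
      doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
      return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
    }
    BEGIN { print days(b) - days(a) }'
}

# Marker for the "Likely verified?" column; the 14-day default is from the
# SKILL.md text, the "no" placeholder is an assumption.
likely_verified() {
  age=$(release_age_days "$1" "$2")
  if [ "$age" -ge 14 ]; then
    printf 'yes (%s days)\n' "$age"
  else
    printf 'no\n'
  fi
}
```

For example, `likely_verified 2026-04-18 2026-05-02` yields `yes (14 days)`, matching the 14-day fixture row in the bats test.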
|
|
@@ -171,6 +171,8 @@ Every Step 7 status transition regenerates `docs/problems/README.md` and stages
|
|
|
171
171
|
|
|
172
172
|
The refresh uses the same rendering rules as `/wr-itil:review-problems` Step 9e (glob `docs/problems/*.open.md` / `*.known-error.md` / `*.verifying.md` / `*.parked.md`; rank open/known-error by WSJF; list verifyings in the Verification Queue ordered by release age; list parkeds in the Parked section) but skips the full re-scoring pass — existing WSJF values on the ticket files are trusted. The refresh is a render, not a re-rank.
|
|
173
173
|
|
|
174
|
+
**Verification Queue sort direction (P150)**: Verification Queue rows are sorted by `Released date ASC` (oldest at row 1; same-day releases tiebreak by ID ASC) per ADR-022 + P048 user-task semantics — older entries are the most likely-verified candidates the user wants to surface first when closing the queue. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Drift here re-opens P150.
|
|
175
|
+
|
|
174
176
|
**Mechanism:**
|
|
175
177
|
|
|
176
178
|
1. After renaming + Editing + `git add`-ing the transitioned ticket file (per the staging-trap rule above), regenerate `docs/problems/README.md` in-place reflecting the new filename set and the transitioned ticket's new Status.
|
|
@@ -178,7 +178,7 @@ After the per-pair loop finishes, IF AT LEAST ONE PAIR SUCCEEDED:
|
|
|
178
178
|
|
|
179
179
|
Per P062, every Step 7 status transition refreshes README.md. At the batch grain, the refresh runs ONCE — a single render reflecting ALL surviving renames + Status updates. Not N refreshes (that would force the README to thrash N times mid-batch and amplify diff noise).
|
|
180
180
|
|
|
181
|
-
The refresh follows the same render rules as `/wr-itil:review-problems` Step 9e (glob `docs/problems/*.open.md` / `*.known-error.md` / `*.verifying.md` / `*.parked.md`; rank open + known-error by WSJF; Verification Queue
|
|
181
|
+
The refresh follows the same render rules as `/wr-itil:review-problems` Step 9e (glob `docs/problems/*.open.md` / `*.known-error.md` / `*.verifying.md` / `*.parked.md`; rank open + known-error by WSJF; Verification Queue sorted by `Released date ASC` with same-day tiebreak by ID ASC per ADR-022 + P048; Parked section). It does NOT re-rank — existing WSJF values on ticket files are trusted; the refresh is a render, not a re-rank. <!-- VQ-SORT-DIRECTION: oldest-first per ADR-022 --> Drift on the VQ sort direction re-opens P150.
|
|
182
182
|
|
|
183
183
|
```bash
|
|
184
184
|
git add docs/problems/README.md
|
|
@@ -93,9 +93,33 @@ The `wr-itil-reconcile-readme` command is a `$PATH`-resolved shim shipped in `pa

 Exit-code routing:
 - **Exit 0 (clean)**: continue to Step 1.
-- **Exit 1 (drift detected)**: structured diff lines printed to stdout, one per drift entry (≤150 bytes per ADR-038 progressive-disclosure budget).
+- **Exit 1 (drift detected)**: structured diff lines printed to stdout, one per drift entry (≤150 bytes per ADR-038 progressive-disclosure budget). Capture stdout to a temp file and classify the drift via the **uncommitted-rename carve-out** (P149) before halt-routing — see "Drift classification carve-out" immediately below.
 - **Exit 2 (parse error)**: README missing or malformed. Halt the loop with the parse-error message and the structured Prior-Session State report — this is a deeper repair that needs investigation, not mechanical reconciliation.

+##### Drift classification carve-out (P149)
+
+The Exit 1 auto-route to `/wr-itil:reconcile-readme` is correct for **committed cross-session drift** but **wrong for uncommitted-rename-rooted drift** — when a prior AFK iter (or any in-flight session) carries a staged ticket rename that the next iteration's in-flow P094 / P062 refresh will reconcile in the upcoming commit per ADR-014's single-commit grain. Auto-routing in the latter case fires an extra `chore(problems): reconcile README ...` commit and splits one logical change across two commits, violating the grain. Worse for the AFK orchestrator: that extra commit lands BEFORE the iter's actual work commit, so the audit trail reads "reconcile, then ticket work" when the truth is "ticket work in progress, README refresh deferred to its in-flow contract".
+
+Run the classifier on Exit 1 to distinguish the two cases:
+
+```bash
+wr-itil-reconcile-readme docs/problems > /tmp/wr-itil-drift-$$.txt
+reconcile_exit=$?
+if [ "$reconcile_exit" -eq 1 ]; then
+  wr-itil-classify-readme-drift /tmp/wr-itil-drift-$$.txt docs/problems
+  classify_exit=$?
+  rm -f /tmp/wr-itil-drift-$$.txt
+fi
+```
+
+The `wr-itil-classify-readme-drift` command is a `$PATH`-resolved shim (ADR-049 naming grammar) dispatching `packages/itil/scripts/classify-readme-drift.sh`. It cross-references drifting IDs from the script's stdout against `git status --porcelain docs/problems/` filtered for staged rename (`R`) entries.
+
+Classifier exit-code routing:
+
+- **`classify_exit == 0` (INLINE_REFRESH)**: every drifting ID is the destination of a staged rename in the working tree. Log a one-line note in the iter summary ("Step 0 reconcile drift covered by N staged rename(s); deferring README refresh to in-flow Step 5 / Step 7 per P094 / P062 + ADR-014 single-commit grain") and continue to Step 1. Do NOT invoke `/wr-itil:reconcile-readme` — the in-flow refresh will land the README correction in the same commit as the iter's ticket work.
+- **`classify_exit == 1` (HALT_ROUTE_RECONCILE)**: at least one drifting ID is NOT covered by a staged rename — committed cross-session drift OR mixed. Per ADR-013 Rule 6 (non-interactive AFK fail-safe), invoke `/wr-itil:reconcile-readme` to apply the corrections + commit a `chore(problems): reconcile README ...` commit, then proceed to Step 1. The reconciled README is the orchestrator's source of truth for Step 3 ranking — a stale read at Step 1 would propagate the lie into the iteration's selection. Mixed routes to halt because `/wr-itil:reconcile-readme` resolves both classes safely; the in-flow refresh only handles the rename'd subset.
+- **`classify_exit == 2` (parse error)**: classifier received empty / missing drift input — contract violation upstream. Fall back to the conservative auto-route.
+
This is a robustness layer ON TOP of P094 + P062, not a supersession — both per-operation contracts remain in force inside each iteration's manage-problem / transition-problem invocation.
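The two-level exit-code routing described above can be summarised as a small decision function. This is only a sketch of the prose, not code shipped in the package: the name `route_step0_drift` and the route labels it prints are hypothetical, and the branch for reconcile exit codes other than 0, 1, or 2 is an assumption, since the text does not say what should happen there.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps the reconcile exit code and, on drift, the
# classifier exit code to a route label. Labels are illustrative only.
route_step0_drift() {
  local reconcile_exit=$1
  local classify_exit=${2:-}
  case "$reconcile_exit" in
    0) echo "continue" ;;            # clean: go straight to Step 1
    2) echo "halt-parse-error" ;;    # README missing or malformed
    1)
      case "$classify_exit" in
        0) echo "inline-refresh" ;;  # every drifting ID has a staged rename
        *) echo "reconcile" ;;       # 1 (committed or mixed drift) and 2
                                     # (classifier parse error) both take the
                                     # conservative reconcile route
      esac
      ;;
    *) echo "halt-unknown" ;;        # assumption: unspecified codes halt
  esac
}
```

For example, `route_step0_drift "$reconcile_exit" "$classify_exit"` after the snippet above would print `inline-refresh` only when the classifier exited 0.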

 ### Step 1: Scan the backlog
@@ -266,6 +290,8 @@ rm -f "$ITER_JSON"
**Idle-timeout SIGTERM (P121).** The poll loop above is the orchestrator-side guard against stuck iteration subprocesses — iters that complete their semantic work (commits land, retro runs, `ITERATION_SUMMARY` is emitted into the agent output stream) but then sit waiting on a hook timeout, a backgrounded subagent that never resolved, or some other CLI-level idle behaviour before exiting. Without the guard the orchestrator polls indefinitely; the JSON file stays 0 bytes (the CLI only flushes on exit) and wall-clock burns for ~$8/hour of subprocess overhead with no API turns. The 2026-04-25 P118 iter 5 evidence: 121 min wall-clock; final commit at ~100 min; manual SIGTERM at 121 min produced a clean 5649-byte JSON response with `is_error: false`, full `## Session Retrospective` section, parseable `ITERATION_SUMMARY` block, and `duration_ms: 2992935` (49.9 min — the real-work portion). SIGTERM is therefore a safe recovery primitive for this stuck-state class — empirically a clean exit-flush, not a destructive interrupt. Behavioural confirmation lives in `test/work-problems-step-5-idle-timeout-sigterm.bats` (P121 ships with this fixture as the second-source the production observation needed). The default `IDLE_TIMEOUT_S=3600` (60 min) leaves headroom for genuinely long architectural iters; the `WORK_PROBLEMS_IDLE_TIMEOUT_S` env-var overrides per-environment for adopters who run very long iters or want a tighter guard. The orchestrator's Step 6 progress line SHOULD annotate `(SIGTERM_SENT)` when the branch fires so the user can distinguish a SIGTERM-recovered iter from a normal completion (per JTBD-006 audit-trail expectation).

+**SIGTERM exit-flush is conditional, not universal (P147).** The "clean exit-flush" claim above is empirically true ONLY when the subprocess has already emitted `ITERATION_SUMMARY` through the agent stream before going idle (the P118 shape: semantic work complete + retro complete, then idle-wait on some final hook). The 2026-04-29 P146 incident falsified the universal generalisation: an iteration deadlocked in a `bash until`-loop polling a backgrounded-task output file (commits had landed; ITERATION_SUMMARY had NEVER been emitted) and SIGTERM at 68m34s produced exit 143 with a **0-byte JSON file**. `claude -p --output-format json` writes the entire response as a single blob ON normal exit; the SIGTERM-handler (whatever it does inside the CLI) cannot synthesise a JSON response that the agent loop never produced. **Stuck-before-emit subclass: SIGTERM still recovers wall-clock, but loses metadata.** When the orchestrator observes exit 143 + 0-byte JSON, it MUST treat the iteration as a metadata-loss event: (1) verify work integrity from independent evidence (`git log` for commits + `git status --porcelain` for tree state); (2) halt the AFK loop per exit-code semantics rather than silently continue; (3) reconstruct cost from the Anthropic billing dashboard rather than from the missing JSON envelope. The behavioural second-source for the stuck-before-emit case lives in the same `test/work-problems-step-5-idle-timeout-sigterm.bats` fixture (a fake-shim that traps SIGTERM and exits without writing stdout, asserting `JSON_BYTES=0` after the orchestrator-shape harness fires SIGTERM). Cost-of-metadata-loss < cost-of-stuck-subprocess; SIGTERM remains the right recovery primitive — the conditional caveat is about what flushes after, not whether to fire.
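A minimal sketch of how an orchestrator might detect the metadata-loss signature described above (exit 143 plus a 0-byte JSON file). The function name `classify_iteration_exit` and its output labels are hypothetical and not taken from the package; the label for an empty file with any other exit code is an assumption, since the text only defines the exit-143 case.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classifies a finished iteration from its exit code
# and the size of its captured JSON file.
classify_iteration_exit() {
  local exit_code=$1
  local json_file=$2
  local json_bytes=0
  if [ -f "$json_file" ]; then
    # wc pads its output on some platforms, so strip the spaces.
    json_bytes=$(wc -c < "$json_file" | tr -d ' ')
  fi
  if [ "$exit_code" -eq 143 ] && [ "$json_bytes" -eq 0 ]; then
    # Stuck-before-emit: verify via git log and git status, then halt.
    echo "metadata-loss"
  elif [ "$json_bytes" -gt 0 ]; then
    # Something was flushed; the caller still has to parse and validate it.
    echo "json-present"
  else
    # Assumption: empty JSON with a non-143 exit is left to the caller.
    echo "empty-json-other"
  fi
}
```

The three follow-up actions (git evidence, halting the loop, reconstructing cost from the billing dashboard) stay with the orchestrator; the sketch covers only the detection step.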
+
**LAST_ACTIVITY_MARK signal trade-off.** The mark is `max(DISPATCH_START_EPOCH, last commit timestamp)`. The dispatch-start floor is intentional: skip-iterations that produce no commit (Step 4 routes a ticket to `action: skipped`) are bounded by `IDLE_TIMEOUT_S` since dispatch start, not by an arbitrarily-stale prior-commit timestamp. This protects against false-positive SIGTERM at iter T=0 when the most recent commit happens to be hours old. The trade-off is the inverse: a skip-iter that runs for `IDLE_TIMEOUT_S` (60 min default) will SIGTERM even though it never had a chance to commit. The 60-min default is well past the typical skip-iter wall-clock (a normal skip completes in seconds), so the trade-off rarely fires in practice; adopters who run unusually long skip-evaluation iters (e.g. deep architect-design probes) should raise `WORK_PROBLEMS_IDLE_TIMEOUT_S` accordingly. Alternative signals considered and rejected: `stat -f%m "$ITER_JSON"` (binary — file mtime only changes on subprocess exit, useless during the idle gap); subprocess RSS-change tracking (noisy; spikes during Agent-tool expansions confound the signal). The git-log signal is the cheapest reliable progress indicator the orchestrator already has.
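The mark computation and the idle check described above could look roughly like this. The helper names `last_activity_mark` and `should_sigterm` are invented for the example and do not appear in the package; the epoch values are passed in as arguments so the arithmetic can be checked without a git repository.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating max(DISPATCH_START_EPOCH, last commit).

# Prints the later of the dispatch start and the last commit timestamp,
# so a stale pre-dispatch commit can never pull the mark backwards.
last_activity_mark() {
  local dispatch_start_epoch=$1
  local last_commit_epoch=$2
  if [ "$last_commit_epoch" -gt "$dispatch_start_epoch" ]; then
    echo "$last_commit_epoch"
  else
    echo "$dispatch_start_epoch"
  fi
}

# Succeeds (exit 0) when the idle gap strictly exceeds the timeout.
should_sigterm() {
  local now=$1
  local mark=$2
  local idle_timeout_s=${3:-3600}
  [ $(( now - mark )) -gt "$idle_timeout_s" ]
}
```

In an orchestrator loop the second argument would typically come from `git log -1 --format=%at HEAD`, and the timeout from `${WORK_PROBLEMS_IDLE_TIMEOUT_S:-3600}`.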
**Iteration prompt body (self-contained — the subprocess has no prior conversation context):**
@@ -635,6 +661,7 @@ When every skipped ticket is in the `upstream-blocked` category (stop-condition
## Related
- **P121** (`docs/problems/121-afk-orchestrator-should-sigterm-stuck-subprocesses-after-idle-timeout.verifying.md`) — driver for Step 5's backgrounded-poll-loop dispatch shape (replacing the prior foreground-synchronous form) and the idle-timeout SIGTERM branch. The 2026-04-25 P118 iter 5 evidence: an iteration subprocess sat idle ~70 min after its final commit, then SIGTERM produced a clean JSON exit-flush. Fix: orchestrator backgrounds the subprocess, polls every 60s, computes `LAST_ACTIVITY_MARK = max(DISPATCH_START_EPOCH, git log -1 --format=%at HEAD)`, and sends SIGTERM when `now - LAST_ACTIVITY_MARK > WORK_PROBLEMS_IDLE_TIMEOUT_S` (default 3600s = 60 min). Behavioural second-source: `test/work-problems-step-5-idle-timeout-sigterm.bats` exercises a fake `claude -p` shim that sleeps past the threshold and asserts SIGTERM, JSON exit-flush, env-var override, and within-threshold no-fire. Step 6's per-iter progress line SHOULD annotate `(SIGTERM_SENT)` when the branch fires so users can distinguish recovered iters from natural completions. ADR-032's subprocess-boundary variant amended 2026-04-26 with the backgrounded-poll-loop refinement.
+- **P147** (`docs/problems/147-p121-sigterm-clean-flush-guarantee-conditional-needs-skill-md-caveat-for-stuck-before-emit-subclass.verifying.md`) — refinement to P121's "clean exit-flush" claim. P118's evidence held only for subprocesses that had already emitted `ITERATION_SUMMARY` before going idle; the 2026-04-29 P146 incident produced exit 143 + 0-byte JSON when SIGTERM fired before `ITERATION_SUMMARY` emission. Fix: SKILL.md prose now carries the conditional caveat (Step 5 "SIGTERM exit-flush is conditional, not universal" subsection) and adopters reading the prose are directed to treat exit 143 + 0-byte JSON as a metadata-loss event — verify work integrity from `git log` + `git status --porcelain`, halt the AFK loop, and reconstruct cost from the Anthropic billing dashboard. Behavioural second-source extends `test/work-problems-step-5-idle-timeout-sigterm.bats` with a stuck-before-emit fake-shim asserting `JSON_BYTES=0` after SIGTERM. Mechanism unchanged (SIGTERM remains the right recovery primitive); the refinement is documentation accuracy + the metadata-loss-event handling shape.
- **P089** (`docs/problems/089-work-problems-step-5-dispatch-robustness-stdin-warning-and-cost-metadata-edge-case.verifying.md`) — driver for Step 5's `< /dev/null` dispatch redirect and the Per-iteration cost metadata "Authority hierarchy" paragraph. Gap 1: stdin warning contaminated stderr-merged JSON captures; closed by adding `< /dev/null` to the canonical dispatch command. Gap 2: `.usage.*` undercounts when subprocess exits via a background-task completion ack while `.total_cost_usd` stays cumulative-authoritative; closed by documenting the authority hierarchy in Step 5 and the Session Cost output section so adopters trust cost and label token totals best-effort.
- **P086** (`docs/problems/086-afk-iteration-subprocess-does-not-run-retro-before-returning.verifying.md`) — driver for Step 5's retro-on-exit clause. Iteration subprocesses exit without running retro, so per-iteration friction (hook misbehaviour, repeat-workaround patterns, pipeline instability) evaporates on exit. Fix: iteration prompt body names `/wr-retrospective:run-retro` as a closing step before `ITERATION_SUMMARY` emission; retro runs inside the subprocess so Step 2b pipeline-instability scan has the full tool-call history; run-retro commits its own work per ADR-014; orchestrator picks up retro-created tickets on the next Step 1 scan.
- **P084** (`docs/problems/084-work-problems-iteration-worker-has-no-agent-tool-so-architect-jtbd-gates-block.open.md`) — driver for Step 5's subprocess-boundary dispatch. Supersedes P077's Agent-tool dispatch on the same Step 5 surface because Agent-tool-spawned subagents cannot themselves invoke Agent (platform restriction), which prevents governance gate markers from being set inside the iteration worker.
@@ -194,3 +194,107 @@ assert "total_cost_usd" in j, "cost metadata must survive SIGTERM exit-flush"
   run grep -nE 'ITER_PID=\$!|& *\n*ITER_PID|claude -p.{0,200}&[[:space:]]*$' "$SKILL_FILE"
   [ "$status" -eq 0 ]
 }
+
+# ---------------------------------------------------------------------------
+# P147 stuck-before-emit subclass: P121's "SIGTERM produces clean JSON exit-
+# flush" claim was empirically grounded only against subprocesses that had
+# ALREADY emitted ITERATION_SUMMARY before going idle (the P118 evidence). The
+# 2026-04-29 P146 incident falsified the generalisation: a subprocess that
+# deadlocked BEFORE ITERATION_SUMMARY emission produced exit 143 + a 0-byte
+# JSON file when SIGTERMed. `claude -p --output-format json` writes the entire
+# response as a single blob ON normal exit; SIGTERM-before-blob-write means no
+# JSON is ever written.
+#
+# The fixture below exercises the stuck-before-emit shape with a fake `claude`
+# that traps SIGTERM and exits WITHOUT writing any stdout. The orchestrator-
+# shape harness then SIGTERMs after the idle threshold, and the assertions
+# pin: (a) the JSON file is 0 bytes (the metadata-loss-event indicator the
+# SKILL.md prose now warns adopters to watch for), and (b) SIGTERM was sent
+# (the recovery primitive still fires — the bug is in the prose claim about
+# what flushes, not in the SIGTERM action itself).
+#
+# @problem P147
+
+dispatch_with_poll_no_emit() {
+  local json_file="${TEST_TMP}/iter.json"
+  local idle_timeout_s="${WORK_PROBLEMS_IDLE_TIMEOUT_S:-3600}"
+  local dispatch_start_epoch
+  dispatch_start_epoch=$(date +%s)
+  local sigterm_sent=0
+
+  : > "$json_file"
+  claude_no_emit -p --permission-mode bypassPermissions --output-format json "TEST" \
+    < /dev/null > "$json_file" 2>&1 &
+  local iter_pid=$!
+
+  while kill -0 "$iter_pid" 2>/dev/null; do
+    sleep 1
+    local now
+    now=$(date +%s)
+    local last_activity_mark=$dispatch_start_epoch
+    local idle_seconds=$(( now - last_activity_mark ))
+    if (( idle_seconds > idle_timeout_s )) && (( sigterm_sent == 0 )); then
+      kill -TERM "$iter_pid" 2>/dev/null || true
+      sigterm_sent=1
+    fi
+  done
+
+  wait "$iter_pid" 2>/dev/null || true
+
+  local json_bytes
+  json_bytes=$(wc -c < "$json_file" | tr -d ' ')
+
+  printf 'SIGTERM_SENT=%d\n' "$sigterm_sent"
+  printf 'JSON_BYTES=%s\n' "$json_bytes"
+}
+
+setup_no_emit_shim() {
+  cat > "$FAKE_BIN/claude_no_emit" <<'FAKE_EOF'
+#!/usr/bin/env bash
+# Test fake for work-problems Step 5 P147 stuck-before-emit fixture.
+# Traps SIGTERM and exits 0 WITHOUT emitting any stdout. Mirrors the 2026-
+# 04-29 P146 incident shape: subprocess deadlocked before ITERATION_SUMMARY
+# emission; SIGTERM cannot flush a JSON blob the CLI never produced.
+trap 'exit 0' TERM
+sleep "${FAKE_SLEEP_AFTER:-30}"
+FAKE_EOF
+  chmod +x "$FAKE_BIN/claude_no_emit"
+}
+
+@test "P147: SIGTERM-before-emit produces 0-byte JSON (stuck-before-emit subclass)" {
+  setup_no_emit_shim
+  export FAKE_SLEEP_AFTER=10
+  export WORK_PROBLEMS_IDLE_TIMEOUT_S=2
+  run dispatch_with_poll_no_emit
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"SIGTERM_SENT=1"* ]]
+  [[ "$output" == *"JSON_BYTES=0"* ]]
+}
+
+@test "P147: SKILL.md Step 5 names the conditional caveat for SIGTERM exit-flush" {
+  # The prose claim must NOT generalise the P118 clean-flush observation
+  # universally; it must explicitly condition on ITERATION_SUMMARY having been
+  # emitted before SIGTERM. Adopters reading the SKILL.md need to know that
+  # SIGTERM-before-emit produces a 0-byte JSON, not a clean flush. Require
+  # the failure-mode language explicitly so a future drift that removes the
+  # caveat but happens to mention ITERATION_SUMMARY in a different sentence
+  # cannot keep this assertion green.
+  run grep -niE "stuck.?before.?emit|before ITERATION_SUMMARY|ITERATION_SUMMARY.{0,80}not.{0,40}(yet|been).{0,40}emit|conditional.{0,40}(caveat|on)" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+
+@test "P147: SKILL.md Step 5 documents metadata-loss-event handling (git-evidence + halt + billing-dashboard)" {
+  # When SIGTERM-before-emit is observed (exit 143 + 0-byte JSON), the
+  # orchestrator must verify work integrity from independent evidence (git
+  # log + git status), halt the AFK loop per exit-code semantics, and
+  # reconstruct cost from the Anthropic billing dashboard. This guards
+  # against orchestrators silently treating exit-143-no-JSON as a normal
+  # iteration completion.
+  run grep -niE "metadata.?loss|git log.{0,80}git status|billing dashboard|reconstruct cost" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+
+@test "P147: SKILL.md Step 5 cites P147 (conditional-caveat ticket)" {
+  run grep -nE "P147" "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}