@windyroad/itil 0.21.2-preview.226 → 0.21.3-preview.228
package/package.json (CHANGED)
````diff
@@ -114,17 +114,27 @@ Stop the loop and report a summary if any of these are true:
 2. **All remaining problems require interactive input** — e.g., they all need user verification (known-errors with `## Fix Released`), or their scope expanded beyond what's safe to auto-resolve
 3. **All remaining problems are blocked** — investigation hit a dead end, or the fix requires changes outside the project
 
-
+**Step 2.5 fires unconditionally at loop end** (P135 Phase 3 / ADR-044) — promoted from "fallback when stop-condition #2" to **default loop-end emit shape**. Anti-BUFD framing per ADR-044: the AFK loop is the empirical-discovery engine; direction-class observations + deviation-candidates accumulate from real friction across iters; loop-end batched presentation is the user-facing deliverable. Per-iter surfacing was the old (now-superseded) pattern; Phase 3 makes batch-at-loop-end the default for ALL stop conditions, not just #2.
 
-For stop-conditions #1 and #3 (no
+For stop-conditions #1 and #3 (no actionable problems / all blocked), Step 2.5 still runs — it reads the accumulated `outstanding_questions` queue from `.afk-run-state/outstanding-questions.jsonl` and presents the batch. Empty queue → no `AskUserQuestion` fires; non-empty queue → batched per ADR-013 Rule 1 cap (≤4 per call, sequential if >4).
 
-### Step 2.5: Surface outstanding
+### Step 2.5: Surface accumulated outstanding questions at loop end (P135 Phase 3 — default emit shape)
 
-
+Per ADR-044 framework-resolution boundary: human input is for direction-setting / deviation-approval / one-time-override / silent-framework / taste / authentic-correction (six categories). Across N iters, those observations accumulate at iter level (`ITERATION_SUMMARY.outstanding_questions`) and persist to a session-level queue file. Loop-end Step 2.5 reads, ranks, and presents the batch.
 
-**1.
+**1. Read the accumulated queue.** Read `.afk-run-state/outstanding-questions.jsonl` — each line is one entry per the ITERATION_SUMMARY `outstanding_questions` schema (see Step 5 Output contract). De-duplicate identical entries (same `category` + same `question` text + same `existing_decision` for deviation-approval).
 
-**2.
+**2. Rank the entries.** Apply the ranking precedence: deviation-approval (highest) > direction > one-time-override > silent-framework > taste > correction-followup. Within each category, preserve iter-order (oldest first) so the user reads the queue in temporal sequence.
+
+**3. Branch on interactivity.**
+
+- **Default branch — call `AskUserQuestion` when available** (the orchestrator's main turn is interactive by construction; the user is presumed at the keyboard at loop end). Batch the entries into one or more `AskUserQuestion` calls per ADR-013 Rule 1 cap. Header per category: `"Outstanding direction"`, `"Approve deviation from existing decision"`, `"One-time override"`, etc. For deviation-approval entries, options are `Approve + amend ADR` / `Approve + supersede ADR` / `Approve + one-time exception` / `Reject (existing decision stands)` / `Defer (need more evidence)` — the 5-option shape matching the `proposed_shape` field. For other entries, options are extracted from the entry's `question` text or candidate fixes. Write answers back to the corresponding ticket files so the next AFK loop does not re-ask.
+
+- **Fallback branch — emit `### Outstanding Design Questions` table** when `AskUserQuestion` is unavailable (restricted permission mode, hook-disabled tool surface). The table lists each entry with its `category`, `question`, `existing_decision` / `contradicting_evidence` for deviation-approval entries, and `ticket_id`. The user answers on return.
+
+**4. Cleanup.** After all entries are resolved (whether via `AskUserQuestion` or table), truncate `.afk-run-state/outstanding-questions.jsonl` to empty. The next AFK loop starts with a clean queue.
+
+**5. Emit the final summary + `ALL_DONE`.** The summary includes the Outstanding Design Questions table when Step 2.5b's fallback branch fired (see Output Format). When Step 2.5b's default branch fired (`AskUserQuestion` was available), the answers have already been written back; the table is omitted from the summary.
 
 ```
 ALL_DONE
````
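The Step 2.5 queue handling in this hunk (read and de-duplicate, rank, batch for `AskUserQuestion`, truncate) can be sketched in POSIX shell. This is a minimal illustration, not the skill's implementation: it assumes each line of `.afk-run-state/outstanding-questions.jsonl` is one compact JSON object whose string values contain no escaped quotes, and it extracts fields with `sed` rather than a real JSON parser. The function names (`json_field`, `category_rank`, `ranked_queue`, `batch_queue`, `clear_queue`) are hypothetical.

```sh
# Sketch of Step 2.5 sub-steps: read + de-duplicate (1), rank (2),
# batch for AskUserQuestion (3), truncate after resolution (4).
# ASSUMPTION: one compact JSON object per line, string values without
# escaped quotes; a real implementation would use a JSON parser.

QUEUE="${QUEUE:-.afk-run-state/outstanding-questions.jsonl}"
TAB=$(printf '\t')

# Print a string field's value from one JSON line (empty if absent).
json_field() {
  printf '%s\n' "$1" | sed -n "s/.*\"$2\": *\"\([^\"]*\)\".*/\1/p"
}

# Ranking precedence from Step 2.5 sub-step 2 (lower number = shown first).
category_rank() {
  case "$1" in
    deviation-approval)  echo 1 ;;
    direction)           echo 2 ;;
    one-time-override)   echo 3 ;;
    silent-framework)    echo 4 ;;
    taste)               echo 5 ;;
    correction-followup) echo 6 ;;
    *)                   echo 9 ;;
  esac
}

# De-duplicate on category + question + existing_decision (first copy wins),
# then sort by rank, keeping iter order (line number) within each rank.
ranked_queue() {
  [ -f "$QUEUE" ] || return 0
  n=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    n=$((n + 1))
    cat_name=$(json_field "$line" category)
    key="$cat_name|$(json_field "$line" question)|$(json_field "$line" existing_decision)"
    printf '%s\t%s\t%s\t%s\n' "$(category_rank "$cat_name")" "$n" "$key" "$line"
  done < "$QUEUE" |
    awk -F '\t' '!seen[$3]++' |
    sort -t "$TAB" -k1,1n -k2,2n |
    cut -f4-
}

# Tag each ranked entry with its AskUserQuestion batch (at most 4 per call).
batch_queue() {
  ranked_queue | awk '{ printf "batch%d\t%s\n", int((NR - 1) / 4) + 1, $0 }'
}

# Sub-step 4: once every entry is resolved, empty the queue file.
clear_queue() {
  : > "$QUEUE"
}
```

De-duplication runs before the sort, so the oldest copy of a repeated entry is the one kept; the line number is the secondary sort key, which preserves iter order within a category. Each `batchN` group stands for one `AskUserQuestion` call under the ADR-013 Rule 1 cap.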
|
````diff
@@ -274,13 +284,41 @@ committed: true | false | skipped
 commit_sha: <sha> # required when committed=true
 reason: <one-line> # required when committed=false or action=skipped
 skip_reason_category: user-answerable | architect-design | upstream-blocked # required when action=skipped
-outstanding_questions: [<
+outstanding_questions: [<entry per ADR-044 6-class taxonomy — see schema below>] # mandatory non-empty when iter touched a direction / deviation-approval / one-time-override / silent-framework decision; otherwise empty array
 remaining_backlog_count: <N>
 notes: <one-line>
 ```
 
+**`outstanding_questions` schema (P135 Phase 3 / ADR-044)**: each entry is tagged with its category for loop-end Step 2.5 ranking. Two shapes:
+
+```
+# Standard direction / one-time-override / silent-framework / taste / correction-followup entry:
+{
+category: "direction" | "one-time-override" | "silent-framework" | "taste" | "correction-followup"
+question: "<one-line — the genuine human-value question this iter surfaced>"
+context: "<one-line — the in-iter situation that surfaced it>"
+ticket_id: "P<NNN>" # the iter's ticket; loop-end groups by ticket
+}
+
+# Deviation-candidate entry (the anti-BUFD-for-framework-evolution shape per ADR-044):
+{
+category: "deviation-approval"
+existing_decision: "<ADR-NNN section / SKILL.md path:line / RISK-POLICY clause>"
+contradicting_evidence: "<tool invocation + observable outcome per ADR-026 grounding>"
+proposed_shape: "amend" | "supersede" | "one-time"
+rationale: "<one-line — why current evidence contradicts the existing decision>"
+ticket_id: "P<NNN>"
+}
+```
+
+When the iter encounters an existing decision (ADR / SKILL contract / WSJF rule / RISK-POLICY entry) that current evidence contradicts, the agent does **NOT auto-deviate**. Instead it queues a `deviation-approval` entry per the schema. Loop-end Step 2.5 presents it as `AskUserQuestion` with options matching the proposed shape: `Approve + amend ADR` / `Approve + supersede ADR` / `Approve + one-time exception` / `Reject (existing decision stands)` / `Defer (need more evidence)`. The agent never auto-deviates; never blindly follows against evidence. **Not-queueing-when-strong-contradicting-evidence-exists is a regression** per the Phase 3 bats coverage (`work-problems-deviation-candidate-shape.bats`).
+
 Architect review (R2) requires the commit state fields (`committed` / `commit_sha` / `reason`) so **Step 6.75's Dirty-for-known-reason branch stays evaluable** from the summary alone. JTBD review requires `ticket_id` / `action` / `skip_reason_category` / `outstanding_questions` so Step 2.5 and the Output Format's Completed / Skipped / Outstanding Design Questions tables can be populated deterministically without the orchestrator having to re-parse ticket files.
 
+**Between-iter aggregation (P135 Phase 3)**: orchestrator's main turn appends each iter's `outstanding_questions` entries to a session-level queue file at `.afk-run-state/outstanding-questions.jsonl` between Step 6 (report) and Step 6.5 (release-cadence check). Each line is one JSON-encoded entry per the schema above. Loop-end emit (Step 2.5) reads the queue file, de-duplicates, ranks (deviation-approval > direction > one-time-override > silent-framework > taste > correction-followup), and presents as batched `AskUserQuestion` per ADR-013 Rule 1 cap (≤4 per call, sequential if >4). Per ADR-032 pending-questions artefact precedent.
+
+**Mid-loop UserPromptSubmit handling (P135 Phase 3 / R4)**: when the orchestrator receives a user message DURING an iter (e.g. the user returns mid-loop and sends a new directive), the orchestrator MUST let the in-flight iter complete naturally to its `ITERATION_SUMMARY` emission BEFORE surfacing the new direction or the accumulated queue. Do NOT abort the iter mid-flight (no SIGTERM to the iter PID; no kill signal). The corrective for the 2026-04-27 iter-9-killed overcorrection: the user's correction was about future iter dispatch shape, not about the in-flight iter; killing wasted ~$5 + 25 min in-flight work. The handler waits for the natural exit, surfaces the queue + the new direction together, then routes per the user's response.
+
 **Per-iteration cost metadata.** Alongside `.result`, the `claude -p --output-format json` response carries cost + usage fields in the same JSON blob. The orchestrator MUST extract these **named fields only** into per-iteration totals and session aggregates — nothing else from the JSON should be surfaced to the user or logged (PII guard: the response also carries `session_id`, `model`, `stop_reason`, and other envelope fields; the extraction is **scoped to the named fields** below so future contributors do not unconsciously broaden it).
 
 Extracted fields (explicit field list):
````
Extracted fields (explicit field list):
|
|
````diff
@@ -0,0 +1,121 @@
+#!/usr/bin/env bats
+#
+# packages/itil/skills/work-problems/test/work-problems-deviation-candidate-shape.bats
+#
+# Behavioural tests for the deviation-candidate sub-pattern in
+# ITERATION_SUMMARY.outstanding_questions (P135 Phase 3 / R7 / ADR-044).
+#
+# Per ADR-044's anti-BUFD-for-framework-evolution clause: existing
+# decisions are point-in-time; as reality changes, existing decisions
+# may become wrong. The agent MUST surface deviation candidates with
+# evidence (existing-decision citation + contradicting-evidence
+# citation per ADR-026 + proposed shape) and queue them for user
+# approval. Never auto-deviate; never blindly follow against evidence.
+#
+# This bats fixture covers the deviation-candidate schema surface in
+# the ITERATION_SUMMARY contract + Step 2.5 5-option AskUserQuestion
+# loop-end emit + jsonl persistence shape across iter subprocess
+# boundary + the positive regression assertion (not-queueing-when-
+# evidence-present is a regression).
+#
+# tdd-review: structural-permitted (justification: skill behavioural
+# harness pending P012 + P081 Phase 2; SKILL.md contract assertions
+# bridge until then; expected to migrate to behavioural form once the
+# harness exists)
+#
+# @problem P135 Phase 3 R7
+# @adr ADR-044 (Decision-Delegation Contract — deviation-approval surface)
+# @adr ADR-026 (cost-source grounding for evidence citations)
+# @adr ADR-032 (pending-questions artefact precedent for jsonl)
+# @adr ADR-005 / ADR-037 (testing strategy — bridge during harness build)
+# @jtbd JTBD-006 (AFK loop empirical-discovery surface)
+
+SKILL_FILE="${BATS_TEST_DIRNAME}/../SKILL.md"
+
+setup() {
+[ -f "$SKILL_FILE" ]
+}
+
+# ── Deviation-candidate schema documented in ITERATION_SUMMARY contract ─────
+
+@test "SKILL.md ITERATION_SUMMARY.outstanding_questions schema documents deviation-candidate entry shape" {
+run grep -F "deviation-approval" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+run grep -F "Deviation-candidate entry" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "deviation-candidate schema requires existing_decision citation field" {
+run grep -F "existing_decision:" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "deviation-candidate schema requires contradicting_evidence citation per ADR-026 grounding" {
+run grep -F "contradicting_evidence:" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+run grep -F "ADR-026" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "deviation-candidate schema requires proposed_shape ∈ {amend, supersede, one-time}" {
+run grep -F "proposed_shape:" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+run grep -F '"amend" | "supersede" | "one-time"' "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+# ── No-auto-deviate contract ────────────────────────────────────────────────
+
+@test "SKILL.md asserts agent does NOT auto-deviate when existing decision appears no-longer-right" {
+run grep -F "does **NOT auto-deviate**" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "SKILL.md asserts agent never blindly follows against evidence" {
+run grep -F "never blindly follows against evidence" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "Phase 3 contract: not-queueing-when-strong-contradicting-evidence-exists is a regression" {
+run grep -F "Not-queueing-when-strong-contradicting-evidence-exists is a regression" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+# ── Loop-end 5-option AskUserQuestion emit ─────────────────────────────────
+
+@test "Step 2.5 deviation-candidate loop-end emit presents the 5-option AskUserQuestion" {
+run grep -F "Approve + amend ADR" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+run grep -F "Approve + supersede ADR" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+run grep -F "Approve + one-time exception" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+run grep -F "Reject (existing decision stands)" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+run grep -F "Defer (need more evidence)" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "Step 2.5 ranking puts deviation-approval at highest precedence" {
+run grep -F "deviation-approval (highest)" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+# ── Jsonl persistence across iter subprocess boundary ───────────────────────
+
+@test "SKILL.md persists outstanding_questions to .afk-run-state/outstanding-questions.jsonl" {
+run grep -F ".afk-run-state/outstanding-questions.jsonl" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "SKILL.md cites ADR-032 pending-questions artefact precedent for the jsonl shape" {
+run grep -F "ADR-032 pending-questions artefact precedent" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+# ── Anti-BUFD-for-framework-evolution clause cross-reference ────────────────
+
+@test "SKILL.md cites ADR-044's anti-BUFD-for-framework-evolution as the rationale" {
+run grep -F "anti-BUFD-for-framework-evolution" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
````
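The tests in this file are structural: they check that SKILL.md contains the contract wording, not that real queue entries have the deviation-candidate shape. A check against an actual entry could look like the sketch below. It assumes compact single-line JSON entries with unescaped string values, and the helper names `json_field` and `is_valid_deviation_entry` are hypothetical.

```sh
# Structural check of one deviation-approval entry against the schema's
# fields. ASSUMPTION: compact single-line JSON with unescaped string
# values; a production check would use a JSON parser.

# Print a string field's value from one JSON line (empty if absent).
json_field() {
  printf '%s\n' "$1" | sed -n "s/.*\"$2\": *\"\([^\"]*\)\".*/\1/p"
}

# Succeeds only for a deviation-approval entry that cites the existing
# decision and the contradicting evidence, gives a rationale and ticket,
# and proposes one of the three allowed shapes.
is_valid_deviation_entry() {
  entry=$1
  [ "$(json_field "$entry" category)" = "deviation-approval" ] || return 1
  for field in existing_decision contradicting_evidence rationale ticket_id; do
    [ -n "$(json_field "$entry" "$field")" ] || return 1
  done
  case "$(json_field "$entry" proposed_shape)" in
    amend|supersede|one-time) return 0 ;;
    *) return 1 ;;
  esac
}
```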
|
````diff
@@ -0,0 +1,69 @@
+#!/usr/bin/env bats
+#
+# packages/itil/skills/work-problems/test/work-problems-mid-loop-userpromptsubmit-handler.bats
+#
+# Behavioural tests for the mid-loop UserPromptSubmit handler contract
+# (P135 Phase 3 / R4 / ADR-044). The explicit corrective for the
+# 2026-04-27 iter-9-killed overcorrection.
+#
+# When the orchestrator receives a user message DURING an iter, the
+# in-flight iter MUST complete naturally to its ITERATION_SUMMARY
+# emission BEFORE the orchestrator surfaces the queue + new direction.
+# Killing the iter mid-flight (SIGTERM the iter PID) is forbidden —
+# it wastes in-flight work and breaks the iter subprocess contract.
+#
+# tdd-review: structural-permitted (justification: skill behavioural
+# harness pending P012 + P081 Phase 2; SKILL.md contract assertions
+# bridge until then; expected to migrate to behavioural form once the
+# harness exists)
+#
+# @problem P135 Phase 3 R4
+# @adr ADR-044 (Decision-Delegation Contract — mid-loop interrupt handling)
+# @adr ADR-032 (subprocess-boundary contract; iter completes naturally)
+# @adr ADR-005 / ADR-037 (testing strategy — bridge during harness build)
+# @jtbd JTBD-006 (AFK orchestrator must respect in-flight iter)
+
+SKILL_FILE="${BATS_TEST_DIRNAME}/../SKILL.md"
+
+setup() {
+[ -f "$SKILL_FILE" ]
+}
+
+# ── Mid-loop UserPromptSubmit handler contract ──────────────────────────────
+
+@test "SKILL.md documents mid-loop UserPromptSubmit handler (P135 Phase 3 R4)" {
+run grep -F "Mid-loop UserPromptSubmit handling" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "SKILL.md mid-loop handler clause MUST let the in-flight iter complete naturally to ITERATION_SUMMARY emission" {
+run grep -F "complete naturally to its" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+[[ "$output" == *"ITERATION_SUMMARY"* ]]
+}
+
+@test "SKILL.md mid-loop handler clause forbids SIGTERM to the iter PID" {
+run grep -F "no SIGTERM to the iter PID" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "SKILL.md mid-loop handler clause forbids killing iter mid-flight" {
+run grep -F "Do NOT abort the iter mid-flight" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "SKILL.md cites the 2026-04-27 iter-9-killed overcorrection as the corrective precedent" {
+run grep -F "iter-9-killed overcorrection" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "SKILL.md mid-loop handler surfaces the queue + new direction together AFTER iter completes" {
+run grep -F "surfaces the queue + the new direction together" "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
+
+@test "SKILL.md cites the ~$5 + 25 min wasted-work cost as the loss measurement for the corrective" {
+# Concrete cost citation per ADR-026 grounding
+# Single quotes keep "$5" literal; in double quotes it expands to the
+# (empty) fifth positional parameter and the check degrades to " + 25 min".
+run grep -F '$5 + 25 min' "$SKILL_FILE"
+[ "$status" -eq 0 ]
+}
````
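Because bats test bodies run as shell functions, the quoting of the cost-citation pattern matters. Written in double quotes, `"$5 + 25 min"` has `$5` expanded as the function's fifth positional parameter (empty, since no arguments are passed), so grep searches for ` + 25 min` and any dollar figure satisfies it. Written in single quotes, the `$5` characters reach grep literally. The sketch below shows the difference against a hypothetical SKILL.md line whose figure has changed; the `sample` text and both function names are illustrative only.

```sh
# Hypothetical SKILL.md line in which the dollar figure has changed.
sample='killing wasted ~$7 + 25 min in-flight work'

# Double quotes: "$5" expands to this function's fifth positional
# parameter (empty here), so grep actually searches for " + 25 min".
loose_match() {
  printf '%s\n' "$sample" | grep -qF "$5 + 25 min"
}

# Single quotes: the "$5" characters reach grep unchanged.
strict_match() {
  printf '%s\n' "$sample" | grep -qF '$5 + 25 min'
}
```

`loose_match` succeeds on the altered line while `strict_match` fails, which is the behaviour a regression test for the cited cost needs.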