@windyroad/retrospective 0.21.3 → 0.21.4-preview.449

@@ -66,5 +66,5 @@
  }
  },
  "name": "wr-retrospective",
- "version": "0.21.3"
+ "version": "0.21.4"
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@windyroad/retrospective",
- "version": "0.21.3",
+ "version": "0.21.4-preview.449",
  "description": "Session retrospectives that update briefings and create problem tickets",
  "bin": {
  "windyroad-retrospective": "./bin/install.mjs"
@@ -393,6 +393,28 @@ Problems whose fix shipped but whose closure is still pending (`docs/problems/*.
 
  8. **Same-session verifyings excluded** (unchanged from P068 design): `.verifying.md` tickets for fixes that ship in the currently-running session (e.g. P127, P065, P126, P101 just transitioned this session) are NOT close-candidates — a session cannot verify its own fix beyond "bats passed at commit time"; subsequent-session exercise is the meaningful signal. Same-session verifyings are skipped in step 4 categorisation.
 
+ 9. **Prior-session evidence drain (P282)** — surfaces tickets whose `## Fix Released` section was written in a prior session AND whose `docs/problems/README.md` Verification Queue `Likely verified?` cell already records `yes — observed: <citations>` from that prior session. Sub-steps 1-8 above scan the CURRENT session's tool-call activity for evidence; this sub-step consumes durable on-disk evidence that was structurally invisible to those scans — the evidence is not in any later session's tool-call context, so without this drain a prior-session-verified ticket stays in `verifying` forever.
+
+ **Why a separate stage**: the same-session exclusion in sub-step 8 correctly prevents a session from verifying its own fix. Without this stage, a ticket whose evidence landed in a prior session has no surface that ever re-considers it — closure depends on a user manually prompting. 2026-05-26 evidence in this repo (P282 Related section): 8/91 `verifying/` rows carried `yes — observed: …` from prior sessions; none auto-closed; the README Verification Queue grew to 134 KB exceeding the Read-tool 25K-token whole-file cap, forcing persisted-output + paged reads.
+
+ **Sub-steps:**
+
+ a. **Read `docs/problems/README.md`** Verification Queue table (the section starts at the `## Verification Queue` heading and ends at the next `## ` heading). Parse each row's `Likely verified?` cell — the last `|`-delimited column.
+
+ b. **Filter to evidence-bearing rows**: cell value begins with `yes — observed:` — the canonical P186 evidence-first cell shape (`yes — observed: <citations>` / `no — not observed` / `no — observed regression`). The `no — *` rows are skipped (no durable evidence yet); the `yes — observed:` rows are the close-candidates.
+
+ c. **Same-session exclusion (inherited from sub-step 8)**: skip rows whose `.verifying.md` rename was committed in the current session. Detect via `git log --since=<session-start> --diff-filter=R --name-status` filtered to renames into `docs/problems/verifying/`. A ticket whose `yes — observed:` cell was written in the current session has its rename in the current session's git log and is excluded from the drain — sub-steps 5-7 already handled it via the in-session evidence flow.
+
+ d. **Dispatch close** per the same cross-plugin contract as sub-step 5: invoke `/wr-itil:transition-problem <NNN> close` via the Skill tool. The dispatch success / failure / unavailable outcomes are recorded in the Step 5 Verification Candidates table per sub-step 7's contract — uniform treatment regardless of evidence source.
+
+ e. **Record source distinction** in the Decision column: append `(prior-session README cell)` to the Decision text. The Citations column carries the README cell's `yes — observed: <citations>` text verbatim so the user can audit the evidence that drove the close.
+
+ **Composition**: this sub-step fires AFTER sub-steps 5-7 dispatched any current-session evidence. Each transition-problem dispatch refreshes the README per P062, so by the time sub-step 9 reads the README the rows handled by sub-steps 5-7 are already gone — the remaining `yes — observed:` rows are exactly the prior-session set.
+
+ **Recovery path (inherited from sub-step 6)**: a wrong close is reversible via `/wr-itil:transition-problem <NNN> known-error` (or the `.verifying.md` flip-back path used in the 2026-04-27 P124 regression). Recovery is a single-skill invocation.
+
+ **Closes P282** (V→Closed transition skipped when validation lands inline) — the README `Likely verified?` cell is the durable encoding of prior-session validation evidence; consuming it pairs the lifecycle transition with the evidence already on disk. The body-content-scan trigger surfaces (option (a) `/wr-itil:transition-problem` Step 4 pre-flight; option (b) PostToolUse hook on `.verifying.md` Edit) named in P282's Investigation Tasks are within thin-extension territory but deferred per the architect+JTBD verdicts ranking (c) highest persona-service — if the prior-session drain proves insufficient on next-session evidence, capture a sibling ticket for the body-content scan surface.
+
  **ADR-032 supersession note** (was: ADR-027 compatibility note): ADR-027's Step-0 subagent auto-delegation was superseded by **ADR-032** (Governance skill invocation patterns). No Step-0 subagent migration applies to run-retro — Step 4a's evidence scan runs directly in main-agent context, where session-activity citations are natively grounded per ADR-026. The hypothetical session-activity-summary marshalling this note previously discussed is obviated by the supersession; preserved here as audit-trail continuity for prior cross-references.
 
  **Interaction with other surfaces**:
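Sub-step (c) in the SKILL.md hunk above describes detecting same-session renames with `git log --since=<session-start> --diff-filter=R --name-status`, filtered to renames into `docs/problems/verifying/`. The following is a minimal sketch of that filter, not the package's implementation. The function names are hypothetical, and the session-start value is assumed to be passed in as an argument, since the diff does not say how it is obtained.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the package): read `git log --name-status`
# output on stdin and print the destination path of every rename that lands
# under docs/problems/verifying/. Rename records are tab-separated:
#   R<score><TAB><old path><TAB><new path>
renamed_into_verifying() {
  awk -F'\t' '
    $1 ~ /^R[0-9]*$/ && index($3, "docs/problems/verifying/") == 1 { print $3 }
  '
}

# Hypothetical wrapper: $1 is the session start in any form `git log --since`
# accepts. How the skill determines that value is an assumption here.
same_session_verifyings() {
  git log --since="$1" --diff-filter=R --name-status | renamed_into_verifying
}
```

Rows whose ticket path appears in this output would be the ones sub-step (c) skips. Commit header and message lines pass through the filter harmlessly because they never have a rename status in the first tab-separated field.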
@@ -0,0 +1,168 @@
+ #!/usr/bin/env bats
+ #
+ # packages/retrospective/skills/run-retro/test/run-retro-step-4a-prior-session-evidence-drain.bats
+ #
+ # Contract assertions for run-retro Step 4a's prior-session evidence drain
+ # stage (P282). Step 4a's current sub-steps 1-8 scan the CURRENT session's
+ # tool-call activity for evidence. This stage consumes durable on-disk
+ # evidence — README Verification Queue rows whose `Likely verified?` cell
+ # already records `yes — observed: <citations>` from a prior session — that
+ # is structurally invisible to current-session scans.
+ #
+ # Background: 2026-05-26 evidence in this repo (P282 Related section) —
+ # 8/91 `verifying/` rows had `yes — observed: …` from prior sessions; none
+ # auto-closed; the Verification Queue grew to 134 KB exceeding the Read-tool
+ # 25K-token whole-file cap. Closure required user prompting. The drain
+ # closes that gap.
+ #
+ # Tests are behavioural per ADR-005 / ADR-037 / ADR-044 — they assert what
+ # the SKILL contract DOES (mechanism + observable outcome) by inspecting
+ # the SKILL.md text + the precedents it cites. Per ADR-044 Confirmation
+ # Criteria (a), the test FILE exists and is named; the per-assertion shape
+ # matures as the behavioural-test harness for LLM-interpreted skills lands
+ # (P081 Phase 2/3 deferred; P012 harness work).
+ #
+ # tdd-review: structural-permitted (justification: skill behavioural
+ # harness pending P012 + P081 Phase 2; SKILL.md contract assertions are
+ # the bridge until then; behavioural fixture at the foot of this file
+ # exercises a sample README VQ table to confirm the drain's row-detection
+ # heuristic against a real evidence-bearing cell)
+ #
+ # @problem P282
+ # @adr ADR-022 (verification-pending lifecycle)
+ # @adr ADR-014 (commit grain)
+ # @adr ADR-044 (decision-delegation — close-on-evidence)
+ # @adr ADR-074 (substance-confirm-before-build — this fix passed the gate)
+ # @jtbd JTBD-001 / JTBD-006 / JTBD-201
+
+ SKILL_FILE="${BATS_TEST_DIRNAME}/../SKILL.md"
+
+ setup() {
+ [ -f "$SKILL_FILE" ] || skip "SKILL.md not found"
+ }
+
+ @test "run-retro: Step 4a documents a prior-session evidence drain stage (P282)" {
+ # The drain stage is the headline behaviour; SKILL.md must name it.
+ run grep -F 'Prior-session evidence drain (P282)' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: prior-session drain reads docs/problems/README.md Verification Queue" {
+ # The cell to consume is in the README's Verification Queue table, not
+ # in any current-session activity stream. SKILL.md must name the source.
+ run awk '/Prior-session evidence drain/,/^ \*\*Composition/' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ [[ "$output" == *"docs/problems/README.md"* ]]
+ [[ "$output" == *"Verification Queue"* ]]
+ }
+
+ @test "run-retro: prior-session drain filters on P186 evidence-first cell shape" {
+ # The canonical signal is `yes — observed: <citations>` per P186.
+ # SKILL.md must cite the exact cell shape as the filter predicate so
+ # adopters and future agents can grep for the right value.
+ run awk '/Prior-session evidence drain/,/^ \*\*Composition/' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ [[ "$output" == *"yes — observed:"* ]]
+ [[ "$output" == *"P186"* ]]
+ }
+
+ @test "run-retro: prior-session drain preserves same-session exclusion (sub-step 8)" {
+ # The drain MUST inherit Step 4a's same-session exclusion (sub-step 8)
+ # so a session cannot verify its own fix via the README cell either.
+ # The exclusion is the load-bearing constraint distinguishing "prior
+ # session wrote the cell" from "current session wrote the cell".
+ run awk '/Prior-session evidence drain/,/^ \*\*Composition/' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ [[ "$output" == *"same-session"* ]] || [[ "$output" == *"current session"* ]]
+ }
+
+ @test "run-retro: prior-session drain delegates close via /wr-itil:transition-problem" {
+ # Per Step 4a's existing dispatch contract (sub-steps 5-7), the close
+ # MUST route through /wr-itil:transition-problem <NNN> close. run-retro
+ # never renames, edits Status, or commits — the ownership boundary holds.
+ run awk '/Prior-session evidence drain/,/^ \*\*Composition/' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ [[ "$output" == *"/wr-itil:transition-problem"* ]]
+ }
+
+ @test "run-retro: prior-session drain inherits dispatch outcome contract (P135 R3)" {
+ # The dispatch success / failure / unavailable outcomes are recorded
+ # in the Step 5 Verification Candidates table per sub-step 7. The
+ # drain stage MUST cite the inheritance so behaviour is uniform.
+ run awk '/Prior-session evidence drain/,/^ \*\*Composition/' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ [[ "$output" == *"sub-step 7"* ]] || [[ "$output" == *"Verification Candidates"* ]]
+ }
+
+ @test "run-retro: prior-session drain records source distinction in Decision column" {
+ # The Decision column must distinguish drained-from-README from
+ # current-session-dispatched closes so the user can scan the source
+ # of each close at retro-summary review time.
+ run awk '/Prior-session evidence drain/,/^ \*\*Composition/' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ [[ "$output" == *"prior-session README cell"* ]]
+ }
+
+ @test "run-retro: prior-session drain documents the recovery path inline (P135 R5)" {
+ # Closes are reversible via /wr-itil:transition-problem <NNN> known-error
+ # (the verifying-flip-back path); the drain stage inherits this contract.
+ run awk '/Prior-session evidence drain/,/\*\*Closes P282\*\*/' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ [[ "$output" == *"Recovery"* ]] || [[ "$output" == *"recoverable"* ]] || [[ "$output" == *"reversible"* ]]
+ }
+
+ @test "run-retro: prior-session drain composes with current-session dispatch ordering" {
+ # The drain fires AFTER sub-steps 5-7 dispatched current-session
+ # evidence. The Composition note must name the ordering so future
+ # readers don't accidentally re-order the stages.
+ run grep -F '**Composition**' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ run awk '/Prior-session evidence drain/,/\*\*Closes P282\*\*/' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ [[ "$output" == *"AFTER"* ]] || [[ "$output" == *"after"* ]]
+ }
+
+ @test "run-retro: prior-session drain cites the 2026-05-26 evidence motivating the fix" {
+ # The drain's rationale is grounded in observable repo evidence: 8 rows
+ # carried `yes — observed:` across prior sessions and none auto-closed
+ # until user-prompted. SKILL.md cites the evidence so future maintainers
+ # can audit the design driver.
+ run awk '/Prior-session evidence drain/,/\*\*Closes P282\*\*/' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ [[ "$output" == *"2026-05-26"* ]] || [[ "$output" == *"134 KB"* ]] || [[ "$output" == *"8/91"* ]]
+ }
+
+ @test "run-retro: prior-session drain explicitly closes P282" {
+ # The drain stage names P282 as the ticket it closes so the
+ # ticket-to-behavior link is greppable. (Same convention used by
+ # the existing Step 4a header which names P068.)
+ run grep -F '**Closes P282**' "$SKILL_FILE"
+ [ "$status" -eq 0 ]
+ }
+
+ # Behavioural fixture — exercises the row-detection heuristic against a
+ # sample README VQ table. Asserts the drain's filter predicate matches
+ # `yes — observed:` rows and skips `no — not observed` / `no — observed
+ # regression` rows.
+ @test "behavioural: drain filter matches yes — observed rows and skips no rows" {
+ # Build a minimal README VQ-shaped fixture in a temp dir.
+ TMP="$(mktemp -d)"
+ cat > "$TMP/README.md" <<'EOF'
+ ## Verification Queue
+
+ | ID | Title | Released | Likely verified? |
+ | --- | --- | --- | --- |
+ | P100 | sample one | 2026-04-01 | yes — observed: cited evidence |
+ | P101 | sample two | 2026-04-02 | no — not observed |
+ | P102 | sample three | 2026-04-03 | yes — observed: more evidence |
+ | P103 | sample four | 2026-04-04 | no — observed regression |
+ EOF
+ # The drain's row-detection predicate: the cell starts with `yes — observed:`.
+ # Assert the fixture matches the expected row count.
+ matched=$(grep -cE '\| yes — observed:' "$TMP/README.md")
+ [ "$matched" -eq 2 ]
+ # And the skip predicates match the other two.
+ skipped=$(grep -cE '\| no — not observed|\| no — observed regression' "$TMP/README.md")
+ [ "$skipped" -eq 2 ]
+ rm -rf "$TMP"
+ }
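The behavioural fixture above greps for `| yes — observed:` anywhere in a row, while sub-step (a) of the SKILL.md text names the last `|`-delimited column inside the `## Verification Queue` section as the cell to parse. The following is a rough sketch of a stricter predicate along those lines. The function name `drain_candidates` is hypothetical and not part of the package, and the sketch assumes every row ends with a trailing `|` and that citations contain no literal `|`.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the package): print the ID (first cell)
# of each Verification Queue row whose LAST cell begins with
# `yes — observed:`. Rows outside the `## Verification Queue` section are
# ignored, following the section bounds described in sub-step (a).
drain_candidates() {
  awk -F'|' '
    # Any level-2 heading either opens or closes the Verification Queue section.
    /^## / { in_vq = ($0 ~ /^## Verification Queue/); next }
    in_vq && /^\|/ {
      # "| a | b |" splits into an empty first and last field, so the last
      # real cell is field NF-1 and the ID cell is field 2.
      cell = $(NF-1); id = $2
      gsub(/^[ \t]+|[ \t]+$/, "", cell)
      gsub(/^[ \t]+|[ \t]+$/, "", id)
      if (index(cell, "yes — observed:") == 1) print id
    }
  ' "$1"
}
```

Called as `drain_candidates docs/problems/README.md`, this would list one ticket ID per line. The header and separator rows fall through because their last cell does not start with the evidence prefix.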