npm - @windyroad/retrospective - Versions diffs - 0.21.3 → 0.21.4-preview.449 - Mend

@windyroad/retrospective 0.21.3 → 0.21.4-preview.449

This diff shows the contents of publicly available package versions as released to one of the supported registries. It is provided for informational purposes only and reflects the changes between those versions as they appear in their respective public registries.

Files changed (4)

package/.claude-plugin/plugin.json +1 -1
package/package.json +1 -1
package/skills/run-retro/SKILL.md +22 -0
package/skills/run-retro/test/run-retro-step-4a-prior-session-evidence-drain.bats +168 -0

package/.claude-plugin/plugin.json CHANGED

@@ -66,5 +66,5 @@
     }
   },
   "name": "wr-retrospective",
-  "version": "0.21.3"
+  "version": "0.21.4"
 }

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@windyroad/retrospective",
-  "version": "0.21.3",
+  "version": "0.21.4-preview.449",
   "description": "Session retrospectives that update briefings and create problem tickets",
   "bin": {
     "windyroad-retrospective": "./bin/install.mjs"

package/skills/run-retro/SKILL.md CHANGED

@@ -393,6 +393,28 @@ Problems whose fix shipped but whose closure is still pending (`docs/problems/*.
 8. **Same-session verifyings excluded** (unchanged from P068 design): `.verifying.md` tickets for fixes that ship in the currently-running session (e.g. P127, P065, P126, P101 just transitioned this session) are NOT close-candidates — a session cannot verify its own fix beyond "bats passed at commit time"; subsequent-session exercise is the meaningful signal. Same-session verifyings are skipped in step 4 categorisation.
+9. **Prior-session evidence drain (P282)** — surfaces tickets whose `## Fix Released` section was written in a prior session AND whose `docs/problems/README.md` Verification Queue `Likely verified?` cell already records `yes — observed: <citations>` from that prior session. Sub-steps 1-8 above scan the CURRENT session's tool-call activity for evidence; this sub-step consumes durable on-disk evidence that was structurally invisible to those scans — the evidence is not in any later session's tool-call context, so without this drain a prior-session-verified ticket stays in `verifying` forever.
+   **Why a separate stage**: the same-session exclusion in sub-step 8 correctly prevents a session from verifying its own fix. Without this stage, a ticket whose evidence landed in a prior session has no surface that ever re-considers it — closure depends on a user manually prompting. 2026-05-26 evidence in this repo (P282 Related section): 8/91 `verifying/` rows carried `yes — observed: …` from prior sessions; none auto-closed; the README Verification Queue grew to 134 KB exceeding the Read-tool 25K-token whole-file cap, forcing persisted-output + paged reads.
+   **Sub-steps:**
+   a. **Read `docs/problems/README.md`** Verification Queue table (the section starts at the `## Verification Queue` heading and ends at the next `## ` heading). Parse each row's `Likely verified?` cell — the last `|`-delimited column.
+   b. **Filter to evidence-bearing rows**: cell value begins with `yes — observed:` — the canonical P186 evidence-first cell shape (`yes — observed: <citations>` / `no — not observed` / `no — observed regression`). The `no — *` rows are skipped (no durable evidence yet); the `yes — observed:` rows are the close-candidates.
+   c. **Same-session exclusion (inherited from sub-step 8)**: skip rows whose `.verifying.md` rename was committed in the current session. Detect via `git log --since=<session-start> --diff-filter=R --name-status` filtered to renames into `docs/problems/verifying/`. A ticket whose `yes — observed:` cell was written in the current session has its rename in the current session's git log and is excluded from the drain — sub-steps 5-7 already handled it via the in-session evidence flow.
+   d. **Dispatch close** per the same cross-plugin contract as sub-step 5: invoke `/wr-itil:transition-problem <NNN> close` via the Skill tool. The dispatch success / failure / unavailable outcomes are recorded in the Step 5 Verification Candidates table per sub-step 7's contract — uniform treatment regardless of evidence source.
+   e. **Record source distinction** in the Decision column: append `(prior-session README cell)` to the Decision text. The Citations column carries the README cell's `yes — observed: <citations>` text verbatim so the user can audit the evidence that drove the close.
+   **Composition**: this sub-step fires AFTER sub-steps 5-7 dispatched any current-session evidence. Each transition-problem dispatch refreshes the README per P062, so by the time sub-step 9 reads the README the rows handled by sub-steps 5-7 are already gone — the remaining `yes — observed:` rows are exactly the prior-session set.
+   **Recovery path (inherited from sub-step 6)**: a wrong close is reversible via `/wr-itil:transition-problem <NNN> known-error` (or the `.verifying.md` flip-back path used in the 2026-04-27 P124 regression). Recovery is a single-skill invocation.
+   **Closes P282** (V→Closed transition skipped when validation lands inline) — the README `Likely verified?` cell is the durable encoding of prior-session validation evidence; consuming it pairs the lifecycle transition with the evidence already on disk. The body-content-scan trigger surfaces (option (a) `/wr-itil:transition-problem` Step 4 pre-flight; option (b) PostToolUse hook on `.verifying.md` Edit) named in P282's Investigation Tasks are within thin-extension territory but deferred per the architect+JTBD verdicts ranking (c) highest persona-service — if the prior-session drain proves insufficient on next-session evidence, capture a sibling ticket for the body-content scan surface.
 **ADR-032 supersession note** (was: ADR-027 compatibility note): ADR-027's Step-0 subagent auto-delegation was superseded by **ADR-032** (Governance skill invocation patterns). No Step-0 subagent migration applies to run-retro — Step 4a's evidence scan runs directly in main-agent context, where session-activity citations are natively grounded per ADR-026. The hypothetical session-activity-summary marshalling this note previously discussed is obviated by the supersession; preserved here as audit-trail continuity for prior cross-references.
 **Interaction with other surfaces**:
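
Reviewer sketch (not part of the package): sub-step c of the new drain stage describes detecting same-session renames by filtering `git log --name-status` output to renames into `docs/problems/verifying/`. A minimal version of that filter might look like the following. The helper name `renamed_into_verifying` and the `SESSION_START` variable are hypothetical, and the sketch assumes rename entries arrive as tab-separated `R<score>`, old path, new path, which is how `--name-status` prints them.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped in the package.
# Reads `git log --name-status` output on stdin and prints the destination
# path of every rename that lands under docs/problems/verifying/.
renamed_into_verifying() {
  awk -F'\t' '
    # Rename entries have status R followed by a similarity score (e.g. R100),
    # then the old path in field 2 and the new path in field 3.
    $1 ~ /^R[0-9]*$/ && index($3, "docs/problems/verifying/") == 1 { print $3 }
  '
}

# Assumed invocation (SESSION_START is a hypothetical timestamp variable):
#   git log --since="$SESSION_START" -M --diff-filter=R --name-status --format= \
#     | renamed_into_verifying
```

Rows whose ticket appears in this list would be skipped by the drain; everything else with durable evidence stays a close-candidate. Splitting the parse from the `git log` call keeps the filter easy to exercise with canned input.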

package/skills/run-retro/test/run-retro-step-4a-prior-session-evidence-drain.bats ADDED Viewed

@@ -0,0 +1,168 @@
+#!/usr/bin/env bats
+#
+# packages/retrospective/skills/run-retro/test/run-retro-step-4a-prior-session-evidence-drain.bats
+#
+# Contract assertions for run-retro Step 4a's prior-session evidence drain
+# stage (P282). Step 4a's current sub-steps 1-8 scan the CURRENT session's
+# tool-call activity for evidence. This stage consumes durable on-disk
+# evidence — README Verification Queue rows whose `Likely verified?` cell
+# already records `yes — observed: <citations>` from a prior session — that
+# is structurally invisible to current-session scans.
+#
+# Background: 2026-05-26 evidence in this repo (P282 Related section) —
+# 8/91 `verifying/` rows had `yes — observed: …` from prior sessions; none
+# auto-closed; the Verification Queue grew to 134 KB exceeding the Read-tool
+# 25K-token whole-file cap. Closure required user prompting. The drain
+# closes that gap.
+#
+# Tests are behavioural per ADR-005 / ADR-037 / ADR-044 — they assert what
+# the SKILL contract DOES (mechanism + observable outcome) by inspecting
+# the SKILL.md text + the precedents it cites. Per ADR-044 Confirmation
+# Criteria (a), the test FILE exists and is named; the per-assertion shape
+# matures as the behavioural-test harness for LLM-interpreted skills lands
+# (P081 Phase 2/3 deferred; P012 harness work).
+#
+# tdd-review: structural-permitted (justification: skill behavioural
+# harness pending P012 + P081 Phase 2; SKILL.md contract assertions are
+# the bridge until then; behavioural fixture at the foot of this file
+# exercises a sample README VQ table to confirm the drain's row-detection
+# heuristic against a real evidence-bearing cell)
+#
+# @problem P282
+# @adr ADR-022 (verification-pending lifecycle)
+# @adr ADR-014 (commit grain)
+# @adr ADR-044 (decision-delegation — close-on-evidence)
+# @adr ADR-074 (substance-confirm-before-build — this fix passed the gate)
+# @jtbd JTBD-001 / JTBD-006 / JTBD-201
+SKILL_FILE="${BATS_TEST_DIRNAME}/../SKILL.md"
+setup() {
+  [ -f "$SKILL_FILE" ] || skip "SKILL.md not found"
+}
+@test "run-retro: Step 4a documents a prior-session evidence drain stage (P282)" {
+  # The drain stage is the headline behaviour; SKILL.md must name it.
+  run grep -F 'Prior-session evidence drain (P282)' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+@test "run-retro: prior-session drain reads docs/problems/README.md Verification Queue" {
+  # The cell to consume is in the README's Verification Queue table, not
+  # in any current-session activity stream. SKILL.md must name the source.
+  run awk '/Prior-session evidence drain/,/^   \*\*Composition/' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"docs/problems/README.md"* ]]
+  [[ "$output" == *"Verification Queue"* ]]
+}
+@test "run-retro: prior-session drain filters on P186 evidence-first cell shape" {
+  # The canonical signal is `yes — observed: <citations>` per P186.
+  # SKILL.md must cite the exact cell shape as the filter predicate so
+  # adopters and future agents can grep for the right value.
+  run awk '/Prior-session evidence drain/,/^   \*\*Composition/' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"yes — observed:"* ]]
+  [[ "$output" == *"P186"* ]]
+}
+@test "run-retro: prior-session drain preserves same-session exclusion (sub-step 8)" {
+  # The drain MUST inherit Step 4a's same-session exclusion (sub-step 8)
+  # so a session cannot verify its own fix via the README cell either.
+  # The exclusion is the load-bearing constraint distinguishing "prior
+  # session wrote the cell" from "current session wrote the cell".
+  run awk '/Prior-session evidence drain/,/^   \*\*Composition/' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"same-session"* ]] || [[ "$output" == *"current session"* ]]
+}
+@test "run-retro: prior-session drain delegates close via /wr-itil:transition-problem" {
+  # Per Step 4a's existing dispatch contract (sub-steps 5-7), the close
+  # MUST route through /wr-itil:transition-problem <NNN> close. run-retro
+  # never renames, edits Status, or commits — the ownership boundary holds.
+  run awk '/Prior-session evidence drain/,/^   \*\*Composition/' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"/wr-itil:transition-problem"* ]]
+}
+@test "run-retro: prior-session drain inherits dispatch outcome contract (P135 R3)" {
+  # The dispatch success / failure / unavailable outcomes are recorded
+  # in the Step 5 Verification Candidates table per sub-step 7. The
+  # drain stage MUST cite the inheritance so behaviour is uniform.
+  run awk '/Prior-session evidence drain/,/^   \*\*Composition/' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"sub-step 7"* ]] || [[ "$output" == *"Verification Candidates"* ]]
+}
+@test "run-retro: prior-session drain records source distinction in Decision column" {
+  # The Decision column must distinguish drained-from-README from
+  # current-session-dispatched closes so the user can scan the source
+  # of each close at retro-summary review time.
+  run awk '/Prior-session evidence drain/,/^   \*\*Composition/' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"prior-session README cell"* ]]
+}
+@test "run-retro: prior-session drain documents the recovery path inline (P135 R5)" {
+  # Closes are reversible via /wr-itil:transition-problem <NNN> known-error
+  # (the verifying-flip-back path); the drain stage inherits this contract.
+  run awk '/Prior-session evidence drain/,/\*\*Closes P282\*\*/' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"Recovery"* ]] || [[ "$output" == *"recoverable"* ]] || [[ "$output" == *"reversible"* ]]
+}
+@test "run-retro: prior-session drain composes with current-session dispatch ordering" {
+  # The drain fires AFTER sub-steps 5-7 dispatched current-session
+  # evidence. The Composition note must name the ordering so future
+  # readers don't accidentally re-order the stages.
+  run grep -F '**Composition**' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  run awk '/Prior-session evidence drain/,/\*\*Closes P282\*\*/' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"AFTER"* ]] || [[ "$output" == *"after"* ]]
+}
+@test "run-retro: prior-session drain cites the 2026-05-26 evidence motivating the fix" {
+  # The drain's rationale is grounded in observable repo evidence: 8 rows
+  # carried `yes — observed:` across prior sessions and none auto-closed
+  # until user-prompted. SKILL.md cites the evidence so future maintainers
+  # can audit the design driver.
+  run awk '/Prior-session evidence drain/,/\*\*Closes P282\*\*/' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+  [[ "$output" == *"2026-05-26"* ]] || [[ "$output" == *"134 KB"* ]] || [[ "$output" == *"8/91"* ]]
+}
+@test "run-retro: prior-session drain explicitly closes P282" {
+  # The drain stage names P282 as the ticket it closes so the
+  # ticket-to-behavior link is greppable. (Same convention used by
+  # the existing Step 4a header which names P068.)
+  run grep -F '**Closes P282**' "$SKILL_FILE"
+  [ "$status" -eq 0 ]
+}
+# Behavioural fixture — exercises the row-detection heuristic against a
+# sample README VQ table. Asserts the drain's filter predicate matches
+# `yes — observed:` rows and skips `no — not observed` / `no — observed
+# regression` rows.
+@test "behavioural: drain filter matches yes — observed rows and skips no rows" {
+  # Build a minimal README VQ-shaped fixture in a temp dir.
+  TMP="$(mktemp -d)"
+  cat > "$TMP/README.md" <<'EOF'
+## Verification Queue
+| ID | Title | Released | Likely verified? |
+| --- | --- | --- | --- |
+| P100 | sample one | 2026-04-01 | yes — observed: cited evidence |
+| P101 | sample two | 2026-04-02 | no — not observed |
+| P102 | sample three | 2026-04-03 | yes — observed: more evidence |
+| P103 | sample four | 2026-04-04 | no — observed regression |
+EOF
+  # The drain's row-detection predicate: the cell starts with `yes — observed:`.
+  # Assert the fixture matches the expected row count.
+  matched=$(grep -cE '\| yes — observed:' "$TMP/README.md")
+  [ "$matched" -eq 2 ]
+  # And the skip predicates match the other two.
+  skipped=$(grep -cE '\| no — not observed|\| no — observed regression' "$TMP/README.md")
+  [ "$skipped" -eq 2 ]
+  rm -rf "$TMP"
+}
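
Reviewer sketch (not part of the package): the behavioural fixture above greps the whole file for the evidence prefix, while sub-steps a and b in SKILL.md scope the read to the `## Verification Queue` section and to the last `|`-delimited column. A stricter predicate along those lines could be sketched as below. The function name `drain_candidates` is hypothetical, and the sketch assumes every table row ends with a trailing `|` and that cells contain no literal `|` characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped in the package.
# Prints the first-column ID of each Verification Queue row whose last cell
# begins with the evidence prefix `yes — observed:`.
drain_candidates() {
  awk '
    # Any level-2 heading opens or closes the section of interest.
    /^## / { in_vq = ($0 ~ /^## Verification Queue[ \t]*$/); next }
    in_vq && /^\|/ {
      n = split($0, cell, "|")
      # "| a | b |" splits with an empty first and last field,
      # so the last real cell is cell[n-1] and the ID is cell[2].
      last = cell[n-1]; id = cell[2]
      gsub(/^[ \t]+|[ \t]+$/, "", last)
      gsub(/^[ \t]+|[ \t]+$/, "", id)
      if (index(last, "yes — observed:") == 1) print id
    }
  ' "$1"
}
```

Called as `drain_candidates docs/problems/README.md`, this would emit one ticket ID per line. Header and separator rows fall through because their last cell does not start with the prefix, and matching rows in other sections are ignored.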