@windyroad/retrospective 0.8.0-preview.185 → 0.8.0-preview.187

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@windyroad/retrospective",
-   "version": "0.8.0-preview.185",
+   "version": "0.8.0-preview.187",
    "description": "Session retrospectives that update briefings and create problem tickets",
    "bin": {
      "windyroad-retrospective": "./bin/install.mjs"
@@ -14,6 +14,56 @@ Reflect on the current session, update the project briefing, and create problem
 
  Read `docs/briefing/README.md` — the per-topic index, Critical Points summary, and per-file hooks. Then read each topic file referenced in the Topic Index (`docs/briefing/<topic>.md`) to understand what previous sessions captured under each heading. During the P100 transition window the legacy single-file `docs/BRIEFING.md` may still exist as a stub pointer; it is read-only until P100 slice 2 retires it.
 
+ ### 1.5. Briefing signal-vs-noise pass (P105)
+
+ After reading the briefing tree, score every entry in `docs/briefing/*.md` to decide whether it was **signal** (useful this session), **noise** (loaded but not useful), or **decay-only** (not in context at all). This pass drives the Critical Points roll-up curation that the SessionStart hook consumes.
+
+ **Scoring rules** (applied per entry, per retro cycle):
+
+ | Event | Delta | Trigger |
+ |-------|-------|---------|
+ | Signal | +2 | Entry was cited, paraphrased, or acted on during this session. |
+ | Noise | -1 | Entry was loaded into context but not cited or acted on. |
+ | Decay | -1 | Applied to **all** entries every retro cycle, regardless of signal/noise status. |
+
+ **Grounding requirement (ADR-026)**: every classification MUST carry a specific citation — the tool invocation, reasoning paraphrase, or session position that exercised (or failed to exercise) the entry. Bare classifications are forbidden. The citation is recorded in the retro summary (Step 5) so the user can audit the agent's judgment.
+
+ **Thresholds and actions**:
+
+ | Score range | Action |
+ |-------------|--------|
+ | >= +3 | Promote to Critical Points candidate. The agent adds the entry to the Critical Points roll-up in `docs/briefing/README.md` during Step 3. |
+ | 0 .. +2 | Keep in the topic file. No roll-up change. |
+ | <= -3 | Route to the **delete queue**. These entries are surfaced for user confirmation in a single batched `AskUserQuestion` at the end of this step. |
+
+ **Per-entry persistence format**: each briefing entry carries a trailing HTML comment block:
+
+ ```markdown
+ - Entry text body goes here.
+ <!-- signal-score: 2 | last-classified: 2026-04-22 | first-written: 2026-04-15 -->
+ ```
+
+ The comment block is appended to the list item (or heading) that contains the entry text. `first-written` is set when the entry is created and never changed; `last-classified` and `signal-score` are updated each retro. If an entry lacks a comment block, treat `signal-score` as `0` and set `first-written` to today.
+
+ **Classification ownership (policy-authorised per ADR-013 Rule 5)**: the agent owns silent classification. No `AskUserQuestion` is fired for individual entry promotions, demotions, or keep decisions. The agent applies the ADR-026 heuristic directly: entry cited in a tool call (or paraphrased in reasoning) during the session = signal; never loaded or loaded-but-unused = noise; ambiguous cases still classify but with a tentative flag the next retro resolves.
+
+ **Delete queue confirmation**: after scoring all entries, if any entries have a score <= -3:
+
+ 1. Present a single `AskUserQuestion` with `header: "Delete briefing entries?"` and `multiSelect: false`.
+ 2. The question body lists each delete candidate with its score and the ADR-026 citation that led to the noise classification.
+ 3. Options (up to 4 per prompt, sequential if > 4):
+    1. `Confirm all deletions` — description: "Remove all listed entries from their topic files."
+    2. `Delete selected only` — description: "The agent will present a follow-up with per-entry checkboxes."
+    3. `Keep all (defer to next retro)` — description: "Leave entries in place; scores remain unchanged."
+    4. `Review individually` — description: "Present each entry one at a time for keep/delete decision."
+ 4. If the queue is empty, skip the prompt entirely.
+
+ If the user chooses `Delete selected only` or `Review individually`, present subsequent `AskUserQuestion` calls as needed, respecting the 4-option cap per ADR-013 Rule 1.
+
+ **Tier 1 budget guard**: if promoting all score >= +3 entries would breach the 2 KB / ~10-bullet Critical Points budget (ADR-040), promote only the highest-scored entries until the budget is met and surface the remainder as a budget-overflow advisory in the retro summary.
+
+ **Non-interactive / AFK fallback (ADR-013 Rule 6)**: when `AskUserQuestion` is unavailable, classify silently and defer the delete queue to the retro summary (Step 5). Do NOT auto-delete entries in AFK mode. The retro summary's "Signal-vs-Noise Pass" section lists each delete candidate with score and citation so the user can review on return. Same trust-boundary shape as Step 2b and Step 4a.
+
  ### 2. Reflect on this session
 
  Consider the work done in this session and identify:
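
For readers tracing the Step 1.5 rules added in the hunk above, the scoring arithmetic can be sketched in POSIX shell. This is an illustrative reading, not code shipped in the package: the helper names `read_score`, `next_score`, and `action_for` are hypothetical. The sketch assumes the decay delta stacks on top of the signal or noise delta (the table applies decay to all entries), and that scores of -1 and -2, which the thresholds table does not list, stay in the topic file.

```shell
# Hypothetical helpers illustrating Step 1.5; none of these names exist in the package.

# read_score LINE: print the signal-score found in a trailing comment, or 0 if absent.
read_score() (
  score=$(printf '%s\n' "$1" |
    sed -n 's/.*<!-- signal-score: \(-\{0,1\}[0-9][0-9]*\) |.*/\1/p')
  echo "${score:-0}"
)

# next_score OLD CLASSIFICATION: print the score after one retro cycle.
# Assumption: decay (-1) is added on top of the signal (+2) or noise (-1) delta.
next_score() (
  old=$1
  case $2 in
    signal)     delta=2 ;;
    noise)      delta=-1 ;;
    decay-only) delta=0 ;;
    *) echo "unknown classification: $2" >&2; exit 1 ;;
  esac
  echo $(( $old + $delta - 1 ))
)

# action_for SCORE: print promote, keep, or delete-queue.
# Assumption: -1 and -2 fall through to "keep" because the table leaves them unspecified.
action_for() (
  if [ "$1" -ge 3 ]; then
    echo promote
  elif [ "$1" -le -3 ]; then
    echo delete-queue
  else
    echo keep
  fi
)

# Usage: score the sample entry from the persistence-format example as a signal.
old=$(read_score '<!-- signal-score: 2 | last-classified: 2026-04-22 | first-written: 2026-04-15 -->')
new=$(next_score "$old" signal)
echo "$old -> $new: $(action_for "$new")"
```

Under this reading a signal entry nets +1 per retro and a noise entry nets -2, so an unused entry that starts at 0 reaches the delete queue after two cycles.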
@@ -64,7 +114,7 @@ The shape mirrors P068's Step 4a Verification-close housekeeping: glob / evidenc
 
  **Signal categories** — each detection is tagged with the primary category. A detection may match multiple categories; pick the one whose fix path is most concrete.
 
- 1. **Hook-protocol friction** — gate-marker TTL expiries mid-work (e.g. architect-hook 1800s TTL per ADR-009 expiring while drafting a long file), marker-vs-file deadlocks (a gate demands PASS before a Write; the agent refuses to PASS on a file that doesn't exist yet), hook-exemption scope gaps, hooks firing on paths they shouldn't, hooks silently skipping paths they should.
+ 1. **Hook-protocol friction** — gate-marker TTL expiries mid-work (e.g. architect-hook 3600s TTL per ADR-009 expiring while drafting a very long file — was 1800s before P107), marker-vs-file deadlocks (a gate demands PASS before a Write; the agent refuses to PASS on a file that doesn't exist yet), hook-exemption scope gaps, hooks firing on paths they shouldn't, hooks silently skipping paths they should.
  2. **Skill-contract violations** — skill steps that collide (e.g. ADR-027 Step 0 colliding with ADR-031 auto-migration Step 0), skills that return empty on paths they should handle (e.g. work-problems false-zero-bail on flat-layout adopter repos), skills whose AskUserQuestion options exceed the 4-option cap (per P061), skills that silently swallow error states the contract says should halt.
  3. **Release-path instability** — `push:watch` / `release:watch` misbehaviour (P054, P060 class — reporting success on a stale SHA's workflow run), changeset authoring defects (P073), release-PR body issues, npm publish failing on metadata mismatch.
  4. **Subagent-delegation friction** — architect / jtbd / risk-scorer / style-guide / voice-tone agents returning `DEFERRED` or `ISSUES FOUND` that block progress, PASS markers failing to write, agent prompts timing out, agent outputs missing the specific citations ADR-026 requires.
@@ -117,7 +167,7 @@ For each accepted learning:
  After editing topic files, update `docs/briefing/README.md`:
 
  - Refresh per-file summaries in the Topic Index if the topic file's character changed.
- - Promote an entry into the Critical Points section only when it is genuinely among the highest-value rules of the session (the session-start surface is small and curated adding there is a user-interactive decision per the helpfulness rating slice 2 will add).
+ - Promote an entry into the Critical Points section when its signal-score is >= +3 (agent-driven per Step 1.5). The session-start surface is small and curated; the agent promotes the highest-scored entries first, respecting the Tier 1 budget guard. Demotion from Critical Points happens automatically when an entry's score drops below +3 after decay. The remaining user-interactive boundary is the delete queue (score <= -3), which requires explicit user confirmation.
 
  Use the AskUserQuestion tool to confirm any removals: "I would like to remove [item] from `docs/briefing/<topic>.md` because [reason]. Is this correct?"
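
The agent-driven promotion in the changed bullet above depends on the Tier 1 budget guard from Step 1.5. A rough POSIX shell sketch of that guard follows. Several things here are assumptions rather than package behaviour: the function name `promote_within_budget`, the tab-separated `score<TAB>bullet text` input format, and the use of character count as a stand-in for byte size. The 2 KB and ~10-bullet limits are passed in as arguments.

```shell
# promote_within_budget MAX_BYTES MAX_BULLETS
# Hypothetical helper: reads "score<TAB>bullet text" lines on stdin and prints
# "promote" or "overflow" for each entry whose score is >= 3, highest score first.
promote_within_budget() {
  sort -t "$(printf '\t')" -k1,1nr |
    awk -F '\t' -v max_bytes="$1" -v max_bullets="$2" '
      $1 < 3 { next }                 # below the promotion threshold: not a candidate
      {
        size = length($2) + 1         # +1 for the newline; characters approximate bytes
        if (!full && used + size <= max_bytes && count < max_bullets) {
          used += size
          count++
          print "promote\t" $1 "\t" $2
        } else {
          full = 1                    # once the budget is met, the remainder overflows
          print "overflow\t" $1 "\t" $2
        }
      }'
}

# Usage with made-up entries and a deliberately small 2-bullet cap.
printf '5\tAlpha\n1\tBeta\n3\tGamma\n4\tDelta\n' | promote_within_budget 2048 2
```

Entries marked `overflow` correspond to the budget-overflow advisory that the retro summary lists.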
 
@@ -259,6 +309,20 @@ Present a summary to the user:
  - Updated: [items modified]
  - README index refreshed: [per-file summaries or Critical Points changes]
 
+ ### Signal-vs-Noise Pass (P105)
+
+ (Emitted only when Step 1.5 scored briefing entries. Always present when run-retro is invoked — the pass runs regardless of other outcomes. In non-interactive / AFK mode, the delete queue is surfaced here instead of firing `AskUserQuestion`.)
+
+ | Entry | Topic file | Old score | New score | Classification | Citation |
+ |-------|-----------|-----------|-----------|----------------|----------|
+ | <one-line entry summary> | `docs/briefing/<topic>.md` | <old> | <new> | signal / noise / decay-only | <tool call or reasoning paraphrase> |
+
+ **Critical Points changes**: list any entries promoted to or demoted from the Critical Points roll-up.
+
+ **Delete queue** (only when non-empty): list each score <= -3 candidate with its score and citation. In interactive mode, note the user's decision (`confirmed / deferred / kept`). In AFK mode, label each as `deferred to next interactive session`.
+
+ **Budget overflow** (only when triggered): list any score >= +3 entries that were NOT promoted because the Tier 1 budget was already met.
+
  ### Problems Created/Updated
  - [problem ticket]: [summary]
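
To make the Signal-vs-Noise Pass table added in this hunk concrete, here is a small shell sketch that renders one row in the documented column order. The function name `summary_row` and every sample value (the entry summary, the `docs/briefing/example.md` path, the citation text) are hypothetical and chosen only for illustration.

```shell
# summary_row ENTRY TOPIC_FILE OLD NEW CLASSIFICATION CITATION
# Hypothetical helper: prints one Markdown table row in the documented column order.
summary_row() {
  printf '| %s | `%s` | %s | %s | %s | %s |\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Header and separator copied from the summary template, followed by a sample row.
printf '%s\n' '| Entry | Topic file | Old score | New score | Classification | Citation |'
printf '%s\n' '|-------|-----------|-----------|-----------|----------------|----------|'
summary_row 'Example entry' 'docs/briefing/example.md' 2 3 signal 'cited in a Read tool call'
```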
 
@@ -0,0 +1,129 @@
+ #!/usr/bin/env bats
+
+ # P105: run-retro SKILL.md documents a signal-vs-noise pass (Step 1.5) that
+ # scores every briefing entry per retro cycle, drives Critical Points curation,
+ # and gates deletion behind a batched user-confirmation queue.
+ #
+ # Doc-lint structural test (Permitted Exception per ADR-005). Asserts
+ # SKILL.md wording for: the step header, placement between Step 1 and Step 2,
+ # the three scoring rules (signal +2, noise -1, decay -1), the HTML comment
+ # persistence format, the ADR-026 grounding requirement, the threshold
+ # actions (promote/keep/delete), the Tier 1 budget guard, the delete-queue
+ # AskUserQuestion contract (ADR-013 Rule 1), the AFK fallback (ADR-013 Rule 6),
+ # the Step 3 agent-driven promotion update, and the Step 5 summary integration.
+
+ setup() {
+   REPO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/../../../../.." && pwd)"
+   SKILL_MD="$REPO_ROOT/packages/retrospective/skills/run-retro/SKILL.md"
+ }
+
+ @test "run-retro: SKILL.md contains Step 1.5 Briefing signal-vs-noise pass (P105)" {
+   run grep -F '### 1.5. Briefing signal-vs-noise pass (P105)' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 1.5 placement between Step 1 and Step 2" {
+   pos_1=$(grep -n '^### 1\. ' "$SKILL_MD" | head -1 | cut -d: -f1)
+   pos_1_5=$(grep -n '^### 1\.5\. ' "$SKILL_MD" | head -1 | cut -d: -f1)
+   pos_2=$(grep -n '^### 2\. ' "$SKILL_MD" | head -1 | cut -d: -f1)
+   [ -n "$pos_1" ]
+   [ -n "$pos_1_5" ]
+   [ -n "$pos_2" ]
+   [ "$pos_1" -lt "$pos_1_5" ]
+   [ "$pos_1_5" -lt "$pos_2" ]
+ }
+
+ @test "run-retro: Step 1.5 documents all three scoring events" {
+   run grep -F 'Signal | +2' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'Noise | -1' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'Decay | -1' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 1.5 documents the HTML comment persistence format" {
+   run grep -F 'signal-score:' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'last-classified:' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'first-written:' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 1.5 requires ADR-026 grounding for classifications" {
+   run grep -F 'ADR-026' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'specific citation' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'Bare classifications are forbidden' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 1.5 documents promote threshold (score >= +3)" {
+   run grep -F '>= +3' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'Promote to Critical Points' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 1.5 documents delete queue threshold (score <= -3)" {
+   run grep -F '<= -3' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'delete queue' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 1.5 delete queue uses single batched AskUserQuestion (ADR-013 Rule 1)" {
+   run grep -F 'Delete briefing entries?' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'Confirm all deletions' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'Keep all (defer to next retro)' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 1.5 AFK fallback defers delete queue to retro summary (ADR-013 Rule 6)" {
+   run grep -F 'Non-interactive / AFK fallback (ADR-013 Rule 6)' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'Do NOT auto-delete' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'Signal-vs-Noise Pass' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 1.5 documents the Tier 1 budget guard" {
+   run grep -F 'Tier 1 budget guard' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F '2 KB' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 1.5 cites ADR-040 for the Critical Points budget" {
+   run grep -F 'ADR-040' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 1.5 classification is policy-authorised silent (ADR-013 Rule 5)" {
+   run grep -F 'policy-authorised per ADR-013 Rule 5' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'agent owns silent classification' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 3 acknowledges agent-driven promotion from Step 1.5 score" {
+   run grep -F 'agent-driven per Step 1.5' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+   run grep -F 'signal-score' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Step 5 summary adds a Signal-vs-Noise Pass section" {
+   run grep -F '### Signal-vs-Noise Pass (P105)' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
+
+ @test "run-retro: Signal-vs-Noise Pass summary table columns match Step 1.5 output" {
+   run grep -F '| Entry | Topic file | Old score | New score | Classification | Citation |' "$SKILL_MD"
+   [ "$status" -eq 0 ]
+ }
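
Each test in the file above repeats the same `run grep -F` call followed by a status check. A hypothetical helper, not present in the package, could fold that pattern into a single call and name the missing string on failure. A minimal sketch in plain shell, exercised against a temporary file rather than the real `SKILL.md`:

```shell
# assert_contains NEEDLE FILE
# Hypothetical helper: succeeds if FILE contains NEEDLE as a fixed string,
# otherwise reports the missing string on stderr and returns 1.
assert_contains() {
  if ! grep -qF -- "$1" "$2"; then
    echo "missing from $2: $1" >&2
    return 1
  fi
}

# Usage against a throwaway file holding one row of the scoring table.
tmp=$(mktemp)
printf '%s\n' '| Signal | +2 | Entry was cited, paraphrased, or acted on during this session. |' > "$tmp"
assert_contains 'Signal | +2' "$tmp" && echo "found"
rm -f "$tmp"
```

The `--` guards against search strings that begin with a hyphen, which `grep` would otherwise parse as options.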