@windyroad/retrospective 0.5.0 → 0.6.0-preview.152
package/package.json
CHANGED
@@ -54,6 +54,52 @@ Counter-examples (what does **not** become a codification candidate):
 - "The commit gate rejected my work twice because X was misconfigured" — diagnostic, project-specific. Still flows through Step 4b Stage 1 ticketing; the fix strategy (Stage 2) is captured as free-text under `Other codification shape` (e.g. hook tweak, script adjustment).
 - "I always forget to run `npm run verify` before pushing" — short, user-habit rather than codifiable sequence → **memory** shape or **BRIEFING.md** note.
 
+### 2b. Pipeline-instability scan (P074)
+
+Step 2's reflection prompts are framed around the product-code work the session was trying to do. They under-report **pipeline-level instability** — bugs, regressions, or friction in the tools the session itself relied on (hooks, skills, subagent protocols, release scripts, TTL / marker contracts). Agents read the prompts and list "what I was trying to build" instead of "what was in the way of building it". Step 2b is a dedicated evidence-scan step that recovers those observations before Step 4's ticketing flow fires, so pipeline friction reaches the WSJF queue instead of accumulating off-ledger across sessions.
+
+The shape mirrors P068's Step 4a Verification-close housekeeping: glob / evidence-scan / categorise / dedup / prompt. The ownership boundary is the same — run-retro surfaces the detection and delegates ticket creation to `/wr-itil:manage-problem` via the Skill tool; run-retro does not rename, edit, or commit problem-ticket files on its own (per ADR-014).
+
+**Ownership boundary**: run-retro surfaces the detection and its specific citations; `/wr-itil:manage-problem` creates or updates the ticket and commits per ADR-014. run-retro does not write `.open.md` files directly — it delegates through the ticketing skill so the audit trail, WSJF scoring, and concern-boundary analysis all apply consistently. This matches Step 4a's boundary to manage-problem Step 7 and Step 4b Stage 1's boundary to manage-problem creation.
+
+**Signal categories** — each detection is tagged with the primary category. A detection may match multiple categories; pick the one whose fix path is most concrete.
+
+1. **Hook-protocol friction** — gate-marker TTL expiries mid-work (e.g. architect-hook 1800s TTL per ADR-009 expiring while drafting a long file), marker-vs-file deadlocks (a gate demands PASS before a Write; the agent refuses to PASS on a file that doesn't exist yet), hook-exemption scope gaps, hooks firing on paths they shouldn't, hooks silently skipping paths they should.
+2. **Skill-contract violations** — skill steps that collide (e.g. ADR-027 Step 0 colliding with ADR-031 auto-migration Step 0), skills that return empty on paths they should handle (e.g. work-problems false-zero-bail on flat-layout adopter repos), skills whose AskUserQuestion options exceed the 4-option cap (per P061), skills that silently swallow error states the contract says should halt.
+3. **Release-path instability** — `push:watch` / `release:watch` misbehaviour (P054, P060 class — reporting success on a stale SHA's workflow run), changeset authoring defects (P073), release-PR body issues, npm publish failing on metadata mismatch.
+4. **Subagent-delegation friction** — architect / jtbd / risk-scorer / style-guide / voice-tone agents returning `DEFERRED` or `ISSUES FOUND` that block progress, PASS markers failing to write, agent prompts timing out, agent outputs missing the specific citations ADR-026 requires.
+5. **Repeat-work friction** — the same workaround applied ≥ 3 times in one session (each application is signal; the third triggers a ticket candidate). Includes: the same `git add` re-stage after `git mv` (P057), the same marker-refresh pattern after an agent returns DEFERRED, the same hook-bypass incantation.
+6. **Session-wrap silent drops** — cases where run-retro itself under-reports (the meta case this step fixes). Detect by comparing the set of `## Fix Released` updates in this session against the set of observations in the retro summary; a `.verifying.md` rename without a matching retro entry is suspect.
+
+**Steps:**
+
+1. **Glob / scan**: walk session history for signal matches from each category above. Candidate patterns to search:
+   - Hook TTL expiry → log lines containing `review expired (Ns old, TTL Ms)`, `marker refresh`, `PreToolUse hook blocking error`.
+   - Marker-vs-file deadlock → sequences where a Write was blocked, an agent was invoked for the marker, and the agent returned `DEFERRED` or similar non-PASS.
+   - `push:watch` / `release:watch` failures → non-zero exits on those scripts, or observable SHA-mismatch in `gh run list` output.
+   - Subagent DEFERRED / ISSUES FOUND that blocked progress → agent outputs matching those markers.
+   - Repeat workaround → the same `Bash` command pattern appearing ≥ 3 times with the same outcome.
+
+2. **Evidence-scan grounding (ADR-026)**: every detected signal MUST carry specific citations — the tool invocation (command or agent call), a session position marker (turn number, timestamp, or commit SHA), and the observable outcome (exit status, error message, marker content). Bare "pipeline was flaky this session" does not qualify. An example acceptable citation: *"architect hook TTL expired at turn N while drafting `docs/decisions/031-…proposed.md` (log line `review expired (1814s old, TTL 1800s)`), forcing a marker-refresh round-trip"*. If no specific citation can be produced, the detection is NOT logged — false positives are worse than silent drops here because each false positive produces a ticket.
+
+3. **Categorise**: tag each detection with its primary category from the six above.
+
+4. **Dedup against existing tickets**: for each detection, search `docs/problems/*.open.md` and `docs/problems/*.known-error.md` for tickets whose description or symptoms match the detection's category + signal pattern. If a matching ticket exists: route the detection through Step 4 as an **update** (append new evidence to the existing ticket's `## Symptoms` or `## Root Cause Analysis` section via the manage-problem update path). If no match: route as a **new ticket** with the detection's category, citations, and a suggested title. The matching heuristic is category + signal-pattern keyword overlap — LLM-based dup classification (as discussed in P070) is not required here; local-ticket dedup runs against a small enough corpus that keyword overlap on the category + primary signal word is acceptable.
+
+5. **Interactive path (ADR-013 Rule 1)**: for each detection, invoke `AskUserQuestion` with the detection summary + specific citations inline so the user can decide without reading session logs. Options (exactly four, per ADR-013 Rule 1 cap):
+   1. `Create new ticket` — description: "Delegate to /wr-itil:manage-problem to create a problem ticket with the detection's category, citations, and suggested title."
+   2. `Append to P<NNN>` — description: "An existing ticket covers this signal; delegate to /wr-itil:manage-problem to append new evidence to its Root Cause Analysis section."
+   3. `Record in retro report only (not ticket-worthy)` — description: "The detection is session-local friction that does not warrant a persistent ticket; record it in the Pipeline Instability section of the retro summary only."
+   4. `Skip — false positive` — description: "The evidence-scan matched on a false positive; the observed behaviour was correct. Do not record."
+
+6. **Non-interactive / AFK fallback (ADR-013 Rule 6)**: when `AskUserQuestion` is unavailable (autonomous retro, batch session-wrap), do NOT auto-create tickets — record each detection in the retro summary's new **Pipeline Instability** section with its category, citations, and dedup status (`new` or `matches P<NNN>`). The user reviews on return and runs `/wr-itil:manage-problem` per accepted detection. Same trust-boundary shape as Step 4a's AFK deferral: surface the evidence, defer the decision. This matches the user's documented preference (feedback_verify_from_own_observation.md memory): surface observations from the agent's own in-session activity, but ticket-creation decisions remain user-confirmed.
+
+**Interaction with other surfaces:**
+
+- **Step 4a (Verification-close housekeeping, P068)** — same evidence-scan shape applied to a different surface. Both share the glob / scan / categorise / specific-citation / interactive-or-AFK pattern. Step 4a scans for successful exercise of `.verifying.md` fixes; Step 2b scans for tool-level friction. They fire independently and produce independent retro-summary sections.
+- **Step 4 (problem-ticket creation)** — Step 2b feeds Step 4. A detection surfaced in Step 2b that the user accepts becomes a Step 4 creation or update via the manage-problem delegation. Step 4b's Stage 1 two-stage codification flow (P075) applies to pipeline-instability tickets the same way it applies to Step 2 reflection tickets — the detection IS the codify-worthy observation.
+- **ADR-027 compatibility note**: when ADR-027's Step-0 auto-delegation lands on run-retro, Step 2b's evidence scan is load-bearing on main-agent session context that a delegated subagent does not automatically inherit. The migration path mirrors Step 4a's: either (a) run Step 2b in the main-agent context BEFORE Step-0 delegation to the subagent, or (b) include an explicit session-activity summary (tool invocations, commits, skill calls observed in main-agent context) in the Step-0 delegation prompt. Option (a) is preferred to keep the evidence scan close to the observed activity.
+
 ### 3. Update BRIEFING.md
 
 Edit `docs/BRIEFING.md`:
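Reviewer note: the scan, repeat-work, and dedup steps in the hunk above are prose instructions for an agent, not shipped code. The following is a minimal shell sketch of how three of them could be mechanised, under stated assumptions: the function names are hypothetical, the session log is assumed to be a plain-text file, the command history is assumed to hold one `<exit-status> <command>` entry per line (so identical lines mean the same command with the same outcome), and tickets are matched by file name rather than by `P<NNN>` identifier because the diff does not show the ticket naming scheme.

```shell
#!/usr/bin/env bash
# Sketch only: function names and input file formats are hypothetical.

# Step 1 (hook TTL expiry): report every log line that matches the
# "review expired (Ns old, TTL Ms)" pattern quoted in the skill text.
scan_ttl_expiries() {
  local log_file="$1"
  grep -nE 'review expired \([0-9]+s old, TTL [0-9]+s\)' "$log_file" |
    sed -E 's/^([0-9]+):.*review expired \(([0-9]+)s old, TTL ([0-9]+)s\).*/line=\1 age=\2s ttl=\3s/'
}

# Step 1 (repeat workaround): print "<count><TAB><entry>" for every
# history entry that occurs at least three times.
repeat_workarounds() {
  local history_file="$1"
  sort "$history_file" | uniq -c |
    awk '$1 >= 3 { count = $1; sub(/^ *[0-9]+ /, ""); print count "\t" $0 }'
}

# Step 4 (dedup): keyword overlap on category + primary signal word against
# open and known-error tickets. Prints "matches <file>" per hit, else "new".
dedup_status() {
  local problems_dir="$1" category="$2" signal="$3"
  local file found=0
  for file in "$problems_dir"/*.open.md "$problems_dir"/*.known-error.md; do
    [ -f "$file" ] || continue  # an unmatched glob stays literal; skip it
    if grep -qiF -- "$category" "$file" && grep -qiF -- "$signal" "$file"; then
      echo "matches $(basename "$file")"
      found=1
    fi
  done
  [ "$found" -eq 1 ] || echo "new"
}
```

Each function only surfaces evidence. Consistent with the ownership boundary in the hunk, none of them creates or edits a ticket file.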
@@ -216,6 +262,14 @@ Present a summary to the user:
 |--------|-------------|----------------------|----------|
 | P<NNN> | <one-sentence fix summary> | <specific invocations + observable outcomes> | closed via manage-problem / left Verification Pending / flagged for manual review / flagged (non-interactive) |
 
+### Pipeline Instability
+
+(Emitted only when Step 2b detected pipeline-level friction with specific citations. Omit this section entirely when no detections were made — or when the interactive path ticketed or dismissed them all during Step 2b. Populated in non-interactive / AFK mode per ADR-013 Rule 6 — the user reviews on return and tickets via `/wr-itil:manage-problem` per accepted detection.)
+
+| Signal | Category | Citations | Decision |
+|--------|----------|-----------|----------|
+| <one-line signal summary> | Hook-protocol friction / Skill-contract violations / Release-path instability / Subagent-delegation friction / Repeat-work friction / Session-wrap silent drops | <specific invocations + session-position markers + observable outcomes> | new ticket via manage-problem / appended to P<NNN> / recorded in retro only / skipped (false positive) / flagged (non-interactive) |
+
 ### Codification Candidates
 
 | Kind | Shape | Suggested name / Target file | Scope / Flaw | Triggers / Evidence | Decision |
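Reviewer note: the Citations column above carries the three grounding parts the skill requires (tool invocation, session-position marker, observable outcome). The sketch below shows one way a row could be rendered while refusing ungrounded detections. It is illustrative only: `render_instability_row` is a hypothetical name, joining the parts with `; ` is an assumption, and cell values are assumed not to contain `|`.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not part of the package.

# Print one Pipeline Instability table row. Returns 1 and prints nothing
# when any of the three citation parts is empty, so an ungrounded
# detection is never recorded.
render_instability_row() {
  local signal="$1" category="$2" invocation="$3" position="$4" outcome="$5" decision="$6"
  if [ -z "$invocation" ] || [ -z "$position" ] || [ -z "$outcome" ]; then
    return 1
  fi
  local citations="$invocation; $position; $outcome"
  printf '| %s | %s | %s | %s |\n' "$signal" "$category" "$citations" "$decision"
}
```

In non-interactive mode the decision argument would be `flagged (non-interactive)`, leaving ticket creation to the user on return.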
@@ -0,0 +1,115 @@
+#!/usr/bin/env bats
+
+# P074: run-retro SKILL.md documents a Pipeline-instability scan step
+# (Step 2b) that inspects session activity for tool-level friction
+# signals and funnels each detection into Step 4's problem-ticket
+# creation path. Shape mirrors P068's Step 4a (evidence-scan + ADR-026
+# grounding + interactive/AFK branches).
+#
+# Doc-lint structural test (Permitted Exception per ADR-005). Asserts
+# SKILL.md wording for: the step header, the placement between Step 2
+# and Step 4, the six signal categories enumerated in the RCA, the
+# ADR-026 grounding requirement, the interactive AskUserQuestion
+# contract (ADR-013 Rule 1), the AFK fallback (ADR-013 Rule 6), the
+# ownership-delegation boundary to /wr-itil:manage-problem, and the
+# Step 5 summary integration.
+
+setup() {
+  REPO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/../../../../.." && pwd)"
+  SKILL_MD="$REPO_ROOT/packages/retrospective/skills/run-retro/SKILL.md"
+}
+
+@test "run-retro: SKILL.md contains Step 2b Pipeline-instability scan (P074)" {
+  run grep -F '### 2b. Pipeline-instability scan (P074)' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2b enumerates all six signal categories" {
+  run grep -F 'Hook-protocol friction' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'Skill-contract violations' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'Release-path instability' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'Subagent-delegation friction' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'Repeat-work friction' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'Session-wrap silent drops' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2b requires specific-citation grounding (ADR-026)" {
+  run grep -F 'ADR-026' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'specific citations' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2b interactive AskUserQuestion contract per ADR-013 Rule 1" {
+  run grep -F 'ADR-013 Rule 1' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'Create new ticket' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'Append to P<NNN>' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'Skip — false positive' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2b AFK fallback defers ticket creation per ADR-013 Rule 6" {
+  run grep -F 'ADR-013 Rule 6' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'Pipeline Instability' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2b delegates ticket creation to /wr-itil:manage-problem" {
+  run grep -F '/wr-itil:manage-problem' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'run-retro surfaces the detection' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2b dedup checks existing tickets before creating" {
+  run grep -F 'Dedup against existing tickets' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'docs/problems/*.open.md' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'docs/problems/*.known-error.md' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2b ADR-027 compatibility note documents subagent-context handling" {
+  run grep -F 'ADR-027' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2b placement between Step 2 reflection and Step 4 ticket creation" {
+  # Section 2b must appear after 2 and before 4.
+  pos_2=$(grep -n '^### 2\. Reflect on this session' "$SKILL_MD" | head -1 | cut -d: -f1)
+  pos_2b=$(grep -n '^### 2b\. Pipeline-instability scan' "$SKILL_MD" | head -1 | cut -d: -f1)
+  pos_4=$(grep -n '^### 4\. Create or update problem tickets' "$SKILL_MD" | head -1 | cut -d: -f1)
+  [ -n "$pos_2" ]
+  [ -n "$pos_2b" ]
+  [ -n "$pos_4" ]
+  [ "$pos_2" -lt "$pos_2b" ]
+  [ "$pos_2b" -lt "$pos_4" ]
+}
+
+@test "run-retro: Step 5 summary adds a Pipeline Instability section" {
+  run grep -F '### Pipeline Instability' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Pipeline Instability summary table columns match Step 2b output" {
+  run grep -F '| Signal | Category | Citations | Decision |' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2b documents interaction with P068 Step 4a shape (shared evidence-scan pattern)" {
+  run grep -F 'P068' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'evidence-scan' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
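Reviewer note: most assertions in the test file above grep the whole of SKILL.md, so a string such as `ADR-026` or `P068` that already appears in another step would satisfy a "Step 2b" test even if Step 2b never mentioned it. A possible tightening, sketched here with a hypothetical helper name, is to extract only the Step 2b section and grep that. The sketch assumes the section runs from the `### 2b. ` heading to the next `### ` heading, which matches the headings visible in this diff.

```shell
#!/usr/bin/env bash
# Sketch only: step_2b_section is a hypothetical helper, not part of the package.

# Print the lines from the "### 2b. " heading up to, but not including,
# the next "### " heading.
step_2b_section() {
  local skill_md="$1"
  awk '
    /^### 2b\. /          { in_section = 1; print; next }
    in_section && /^### / { exit }
    in_section            { print }
  ' "$skill_md"
}
```

Inside a Bats test this could be used as `run bash -c 'step_2b_section "$1" | grep -F "ADR-026"' _ "$SKILL_MD"` after exporting the function with `export -f step_2b_section`, so the assertion only passes when the string sits inside Step 2b.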