@windyroad/retrospective 0.9.0 → 0.10.0-preview.218
- package/.claude-plugin/plugin.json +1 -1
- package/package.json +1 -1
- package/skills/analyze-context/SKILL.md +246 -0
- package/skills/analyze-context/test/analyze-context-skill-contract.bats +112 -0
- package/skills/run-retro/SKILL.md +44 -0
- package/skills/run-retro/test/run-retro-context-usage-step-2c.bats +98 -0
package/package.json
CHANGED
package/skills/analyze-context/SKILL.md
@@ -0,0 +1,246 @@
+---
+name: wr-retrospective:analyze-context
+description: Deep on-demand context-usage analyzer. Runs richer heuristics than run-retro Step 2c — per-turn attribution, per-plugin decomposition, suggestion generation, policy-breach detection. Produces a markdown report at docs/retros/<date>-context-analysis.md with an HTML-comment trailer carrying the bucket-snapshot for delta-from-prior comparison. User-invoked only; never auto-fires.
+allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Skill
+---
+
+# Analyze Context (Deep Layer)
+
+On-demand deep analysis of session context-usage — per-turn attribution, per-plugin decomposition, suggestion generation. Produces a committed markdown report at `docs/retros/<date>-context-analysis.md` whose HTML-comment trailer is the snapshot subsequent runs of `run-retro` Step 2c (cheap layer) compare against.
+
+This skill is the **deep layer** of the two-layer design in **ADR-043** (Progressive context-usage measurement and reporting for retrospective sessions; P101). The **cheap layer** lives in `packages/retrospective/skills/run-retro/SKILL.md` Step 2c and runs every retro at < 5% session budget. This skill runs only on explicit user direction.
+
+## When to use
+
+- The user invokes `/wr-retrospective:analyze-context` directly.
+- The cheap layer (run-retro Step 2c) surfaced a delta anomaly (>+20% in any bucket since prior snapshot) and the user wants the per-turn / per-plugin decomposition.
+- The user is preparing to trim context — e.g. before a release that introduces new hooks or skills, or after observing early compaction in long-running AFK loops.
+- The user wants a baseline snapshot at a known-good moment (e.g. immediately after a P091-cluster fix lands).
+
+**Never auto-fires.** Per ADR-043 + ADR-013 Rule 6 (AFK fallback), this skill is invoked only by explicit user direction. AFK orchestrators that observe anomalies via the cheap layer surface them in iteration summaries; the user runs this skill on return.
+
+## Output Formatting
+
+Per **ADR-026** (Agent output grounding), this skill's prose, suggestions, and findings MUST cite specific surfaces, MUST persist evidence in re-readable form, and MUST mark ungrounded fields with explicit sentinels. Forbidden phrases (Banned per ADR-026 Confirmation line 148): `load is negligible`, `microseconds only`, `minimal`, `small change`, `trim X to reduce bloat` (without a comparable prior). Every top-N offender row and every trim suggestion MUST carry a concrete byte count + measurement-method citation. Comparable-prior reclamation suggestions (e.g. `P095 reclaimed ~120KB by once-per-session gating`) cite the specific prior; when no prior exists, emit `not estimated — no prior data` per ADR-026 line 90.
+
+When referencing problem IDs, ADR IDs, or JTBD IDs in prose output, include the human-readable title on first mention. Format `P101 (wr-retrospective has no context-usage analysis)` rather than bare `P101`.
+
+## Steps
+
+### 0. Verify the cheap-layer primitive exists
+
+The deep layer reuses the cheap layer's measurement script as its byte-count baseline. Verify the primitive is available:
+
+```bash
+test -x packages/retrospective/scripts/measure-context-budget.sh
+```
+
+If the script is missing or not executable, halt with a directive: *"measure-context-budget.sh is the cheap-layer measurement primitive (P101 / ADR-043). Verify the wr-retrospective plugin is installed and up to date before running the deep analyzer."*
+
+### 1. Capture the bucket-totals baseline
+
+Invoke the script to capture the canonical per-source-bucket byte totals. Identical contract to run-retro Step 2c:
+
+```bash
+bash packages/retrospective/scripts/measure-context-budget.sh "${CLAUDE_PROJECT_DIR:-.}"
+```
+
+The output is the deep layer's baseline. Parse each `BUCKET <name> bytes=<N>` row into a structured map; preserve `BUCKET <name> not-measured reason=<reason>` rows verbatim — the sentinels carry into the report unchanged.
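The parsing rule above can be sketched as a small filter. This is a minimal illustration that assumes the two row shapes quoted in the step are the only ones that matter; the helper name `parse_bucket_rows` and the tab-separated `MEASURED` / `SENTINEL` output are hypothetical and not part of the script's contract.

```bash
# Hypothetical helper: split measure-context-budget.sh output (stdin) into
# measured rows ("MEASURED<TAB>name<TAB>bytes") and verbatim sentinel rows.
parse_bucket_rows() {
  awk '
    $1 == "BUCKET" && $3 ~ /^bytes=[0-9]+$/ {
      sub(/^bytes=/, "", $3)              # keep only the number
      printf "MEASURED\t%s\t%s\n", $2, $3
      next
    }
    $1 == "BUCKET" && $3 == "not-measured" {
      # Pass the sentinel row through unchanged so it reaches the report.
      printf "SENTINEL\t%s\n", $0
    }
  '
}
```

A typical call would pipe the script's output straight in, for example `bash packages/retrospective/scripts/measure-context-budget.sh . | parse_bucket_rows`. Rows that match neither shape, such as the trailing `THRESHOLD` row, are dropped by this sketch.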
+
+### 2. Decompose per-plugin attribution
+
+The cheap layer reports `hooks` and `skills` as aggregates. The deep layer decomposes each by plugin, citing concrete byte counts per plugin:
+
+```bash
+# Per-plugin hooks decomposition
+for plugin_dir in packages/*/hooks; do
+  plugin=$(basename "$(dirname "$plugin_dir")")
+  bytes=$(find "$plugin_dir" -type f -name '*.sh' -print0 2>/dev/null | xargs -0 wc -c 2>/dev/null | tail -1 | awk '{print $1}')
+  printf 'PLUGIN-HOOKS %s bytes=%s\n' "$plugin" "${bytes:-0}"
+done
+
+# Per-plugin skills decomposition
+for plugin_dir in packages/*/skills; do
+  plugin=$(basename "$(dirname "$plugin_dir")")
+  bytes=$(find "$plugin_dir" -type f -name 'SKILL.md' -print0 2>/dev/null | xargs -0 wc -c 2>/dev/null | tail -1 | awk '{print $1}')
+  printf 'PLUGIN-SKILLS %s bytes=%s\n' "$plugin" "${bytes:-0}"
+done
+```
+
+Each plugin's `hooks` and `skills` row carries a concrete byte count + `find / wc -c` measurement-method citation. The aggregate cheap-layer `hooks` row should equal the sum of all `PLUGIN-HOOKS` rows; use the comparison to sanity-check the report.
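The sanity check described above could be automated along these lines. This is a sketch only: the helper names `sum_rows` and `check_aggregate` and the `SANITY` row shape are hypothetical, and it assumes the per-plugin rows use the `PLUGIN-HOOKS <plugin> bytes=<N>` shape printed by the loop above.

```bash
# Hypothetical helper: sum the bytes= values of every stdin row with the given tag.
sum_rows() {
  # $1 = row tag, e.g. PLUGIN-HOOKS
  awk -v tag="$1" '
    $1 == tag {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^bytes=[0-9]+$/) { sub(/^bytes=/, "", $i); total += $i }
    }
    END { print total + 0 }
  '
}

# Hypothetical helper: compare the cheap-layer aggregate with the per-plugin sum.
check_aggregate() {
  # $1 = aggregate bytes from the cheap layer, $2 = per-plugin sum
  if [ "$1" -eq "$2" ]; then
    printf 'SANITY ok bytes=%s\n' "$1"
  else
    printf 'SANITY mismatch aggregate=%s per-plugin-sum=%s\n' "$1" "$2"
  fi
}
```

A mismatch does not necessarily mean a bug: the two layers may count different file sets, so the mismatch row reports both numbers rather than failing.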
+
+### 3. Per-turn attribution (when session log is available)
+
+When `${CLAUDE_PROJECT_DIR}/.afk-run-state/*.jsonl` exists (AFK orchestrator session) OR the user supplies an explicit session-log path, parse the per-turn `usage` field per ADR-043's deep-layer methodology:
+
+```bash
+# nullglob must be set before the glob expands, so no match yields an empty array
+shopt -s nullglob
+log_paths=( "${CLAUDE_PROJECT_DIR:-.}"/.afk-run-state/*.jsonl )
+shopt -u nullglob
+```
+
+For each log file:
+
+- Extract `usage.{input,output,cache_creation,cache_read}_tokens` per turn.
+- Map each tool-call's input/output bytes to the bucket(s) the tool referenced (e.g. an Edit on `packages/itil/skills/manage-problem/SKILL.md` attributes to `skills/itil`).
+- Aggregate per-turn totals; flag turns whose token cost exceeds 2× the median turn-cost as **anomalous turns**.
+- When no session log is available, emit `per-turn attribution: not measured — no session log accessible` per ADR-026.
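The anomaly rule in the list above can be sketched as follows. This is a minimal illustration that assumes each turn has already been reduced to a `turn-id token-total` pair (for example by summing the four `usage` counters); that reduction, the helper name `flag_anomalous_turns`, and the `ANOMALOUS` row shape are assumptions, not something the skill or ADR-043 specifies.

```bash
# Hypothetical helper: read "turn-id token-total" pairs on stdin and print
# the turns whose total exceeds twice the median total.
flag_anomalous_turns() {
  sort -k2,2n | awk '
    { id[NR] = $1; cost[NR] = $2 + 0 }
    END {
      if (NR == 0) exit
      # Input is sorted by cost, so the median is the middle element
      # (or the mean of the middle pair when the count is even).
      if (NR % 2) median = cost[(NR + 1) / 2]
      else        median = (cost[NR / 2] + cost[NR / 2 + 1]) / 2
      for (i = 1; i <= NR; i++)
        if (cost[i] > 2 * median)
          printf "ANOMALOUS turn=%s tokens=%s median=%s\n", id[i], cost[i], median
    }
  '
}
```

Printing the median next to each flagged turn keeps the row auditable, in line with the grounding rule that every finding carries the number it was derived from.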
+
+### 4. Suggestion generation (ADR-026 grounding)
+
+For each non-trivial bucket (top-5 by byte count), generate a trim-candidate suggestion citing:
+
+1. **The specific surface** that dominates the bucket (e.g. `packages/itil/skills/manage-problem/SKILL.md`).
+2. **A comparable prior reclamation** (e.g. `P095 reclaimed ~120KB by once-per-session gating; P099 promoted Tier 3 to advisory enforcement; P100 split BRIEFING.md into per-topic files`). When no comparable prior exists, emit `not estimated — no prior data` per ADR-026 line 90.
+3. **A concrete byte-saving estimate** anchored to the prior or marked ungrounded.
+
+**Forbidden suggestion shapes** (ADR-026 Confirmation line 148): bare `trim X` without a citation; `consider reducing` without a target byte count; `optimise this skill` without a comparable prior. Every suggestion is auditable end-to-end or it is not emitted.
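One way to enforce the "auditable or not emitted" rule is to route every suggestion through a single rendering function. The sketch below is hypothetical: the helper name `render_suggestion`, its argument order, and the exact row wording are assumptions; only the `not estimated — no prior data` sentinel comes from the text above.

```bash
# Hypothetical helper: render one suggestion row, refusing ungrounded ones.
render_suggestion() {
  # $1 = surface path, $2 = measured bytes,
  # $3 = comparable prior (may be empty), $4 = estimated saving (may be empty)
  surface=$1 bytes=$2 prior=$3 saving=$4
  if [ -z "$surface" ] || [ -z "$bytes" ]; then
    return 1   # no cited surface or byte count: emit nothing
  fi
  if [ -z "$prior" ]; then
    # Without a comparable prior, the estimate is ungrounded too.
    prior='not estimated — no prior data'
    saving='not estimated — no prior data'
  fi
  printf '%s (%s bytes, find / wc -c). Comparable prior: %s. Estimated byte saving: %s.\n' \
    "$surface" "$bytes" "$prior" "${saving:-not estimated — no prior data}"
}
```

Because the function returns non-zero instead of printing when the surface or byte count is missing, a caller cannot produce a bare `trim X` row by accident.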
+
+### 5. Detect policy breaches
+
+For surfaces with explicit budgets, check for breach:
+
+- **ADR-038 hook prose budget** (≤150 bytes per subsequent-prompt reminder) — verify by sampling each `UserPromptSubmit` hook's terse-reminder branch. Emit a `BREACH` row per offending hook with citation.
+- **ADR-040 Tier 1 / Tier 2 / Tier 3 budgets** — invoke `packages/retrospective/scripts/check-briefing-budgets.sh` and surface any `OVER` rows verbatim in the deep report.
+- **ADR-038 SKILL.md size cluster (P097)** — when a single `SKILL.md` exceeds 50KB, emit a `BREACH` row citing the file path + byte count + the P097 ticket as the evolving budget anchor.
+
+When a breach is detected, the report includes a `## Policy Breaches` section. Each breach cites the specific budget rule + the offending file path + a concrete byte count.
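The `SKILL.md` size check from the list above might look like the sketch below. It is illustrative only: the helper name `find_skill_breaches` and the `BREACH skill-size ...` row layout are hypothetical, and the default limit of 51200 bytes assumes that "50KB" means 50 × 1024 (the text does not say whether it means that or 50,000).

```bash
# Hypothetical helper: emit a BREACH row for every SKILL.md above the limit.
find_skill_breaches() {
  # $1 = root directory to scan, $2 = limit in bytes (assumed default: 51200)
  root=$1 limit=${2:-51200}
  find "$root" -type f -name 'SKILL.md' | sort | while IFS= read -r file; do
    bytes=$(wc -c < "$file" | tr -d ' ')
    if [ "$bytes" -gt "$limit" ]; then
      # Cite the path, the measured byte count, and the P097 budget anchor.
      printf 'BREACH skill-size file=%s bytes=%s limit=%s anchor=P097\n' \
        "$file" "$bytes" "$limit"
    fi
  done
}
```

Running `find_skill_breaches packages` from the repository root would cover every plugin; an empty result corresponds to the `no policy breaches detected` case for this budget.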
+
+### 6. Render the report
+
+Write the deep-layer report to `docs/retros/<TODAY>-context-analysis.md` where `<TODAY>` is the current ISO date.
+
+If `docs/retros/` does not exist, create it (`mkdir -p docs/retros`). If `docs/retros/README.md` does not exist, scaffold a minimal one-line index pointing at the report directory shape:
+
+```markdown
+# Retro Reports
+
+Per-date context-analysis reports produced by the wr-retrospective deep layer (`/wr-retrospective:analyze-context`). Each report carries an HTML-comment snapshot trailer; see ADR-043 for the schema.
+```
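Resolving the report path and creating the directory can be sketched in a few lines. The helper name `prepare_report_path` is hypothetical, and the sketch assumes the local calendar date is the intended `<TODAY>`; it leaves the README scaffold to the caller.

```bash
# Hypothetical helper: ensure docs/retros/ exists and print today's report path.
prepare_report_path() {
  # $1 = project root (defaults to the current directory)
  root=${1:-.}
  today=$(date +%Y-%m-%d)      # ISO date in the local time zone (assumption)
  mkdir -p "$root/docs/retros"
  printf '%s/docs/retros/%s-context-analysis.md\n' "$root" "$today"
}
```

Usage would be along the lines of `report=$(prepare_report_path "${CLAUDE_PROJECT_DIR:-.}")`, after which the rendered report is written to `$report`.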
+
+**Report shape:**
+
+```markdown
+# Context Analysis — YYYY-MM-DD
+
+> Source: `/wr-retrospective:analyze-context` (deep layer per ADR-043).
+> Methodology: byte-count-on-disk + per-plugin decomposition + per-turn attribution (when session log available).
+> Cheap-layer baseline: `packages/retrospective/scripts/measure-context-budget.sh`.
+
+## Bucket Totals
+
+| Bucket | Bytes | % of measured | Δ vs prior |
+|--------|-------|---------------|------------|
+| ... | ... | ... | ... |
+
+(Bucket rows ordered by byte count descending. `not-measured` buckets ride a separate row with the reason sentinel.)
+
+## Per-Plugin Decomposition
+
+### Hooks (aggregate from cheap layer: <N> bytes)
+
+| Plugin | Bytes | % of hooks |
+|--------|-------|------------|
+| ... | ... | ... |
+
+### Skills (aggregate from cheap layer: <N> bytes)
+
+| Plugin | Bytes | % of skills |
+|--------|-------|-------------|
+| ... | ... | ... |
+
+## Top-N Offenders
+
+| Surface | Bytes | Bucket | Comparable prior |
+|---------|-------|--------|------------------|
+| ... | ... | ... | ... |
+
+## Per-Turn Attribution
+
+(Populated only when a session log is accessible; otherwise: `per-turn attribution: not measured — no session log accessible`.)
+
+| Turn | Input tokens | Output tokens | Cache creation | Cache read | Notes |
+|------|--------------|---------------|----------------|------------|-------|
+| ... | ... | ... | ... | ... | ... |
+
+## Suggestions
+
+(Per ADR-026 — each suggestion cites specific surface + comparable prior + concrete byte estimate, or marks `not estimated — no prior data`.)
+
+1. **[Bucket / Surface]** — Suggestion text. Comparable prior: `P<NNN> reclaimed ~<N>KB by <approach>`. Estimated byte saving: `~<N>KB` / `not estimated — no prior data`.
+
+## Policy Breaches
+
+(Populated only when a breach is detected per Step 5; otherwise: `no policy breaches detected`.)
+
+| Budget | Offender | Bytes | Citation |
+|--------|----------|-------|----------|
+| ... | ... | ... | ... |
+
+<!--
+context-snapshot:
+total-bytes: <N>
+hooks: <N>
+skills: <N>
+memory: <N>
+briefing: <N>
+decisions: <N>
+problems: <N>
+jtbd: <N>
+project-claude-md: <N>
+framework-injected: not measured
+measurement-method: byte-count-on-disk
+measured-at: <ISO timestamp>
+-->
+```
+
+The HTML-comment trailer is the snapshot subsequent retros (run-retro Step 2c) read for delta-from-prior comparison. Schema mirrors ADR-040's per-entry signal-score block.
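Reading the trailer back could be done with a small awk filter. This is a sketch under assumptions: the helper name `read_snapshot` and its `key<TAB>value` output are hypothetical, and it assumes the trailer uses the one-field-per-line `key: value` layout shown in the report shape, with optional leading indentation.

```bash
# Hypothetical helper: print "key<TAB>value" for each field of the
# context-snapshot trailer in a report file.
read_snapshot() {
  # $1 = path to a docs/retros/<date>-context-analysis.md report
  awk '
    /^[ \t]*<!--/ {
      in_comment = 1
      # Also accept the marker on the same line as the comment opener.
      if ($0 ~ /context-snapshot:/) in_snapshot = 1
      next
    }
    /^[ \t]*-->/ { in_comment = 0; in_snapshot = 0; next }
    in_comment && /^[ \t]*context-snapshot:/ { in_snapshot = 1; next }
    in_snapshot && /^[ \t]*[a-z-]+:/ {
      line = $0
      sub(/^[ \t]+/, "", line)
      key = line; sub(/:.*$/, "", key)            # text before the first colon
      val = line; sub(/^[^:]*:[ \t]*/, "", val)   # text after it, colons kept
      printf "%s\t%s\n", key, val
    }
  ' "$1"
}
```

Values are passed through untouched, so sentinels such as `not measured` and timestamps that themselves contain colons survive intact for the caller to interpret.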
+
+### 7. Commit per ADR-014
+
+Stage and commit per the ADR-014 commit-message convention added by ADR-043:
+
+1. `git add docs/retros/<TODAY>-context-analysis.md` plus, if newly created, `docs/retros/README.md`.
+2. Satisfy the commit gate — two paths are valid:
+   - **Primary**: delegate to the `wr-risk-scorer:pipeline` subagent-type via the Agent tool.
+   - **Fallback**: invoke `/wr-risk-scorer:assess-release` via the Skill tool. Per ADR-015 it wraps the same pipeline subagent and produces an equivalent bypass marker.
+3. `git commit -m "docs(retros): context analysis YYYY-MM-DD"` per ADR-014's amended Commit Message Convention table.
+
+If risk is above appetite per ADR-013 Rule 5 + ADR-042: do NOT commit; report the uncommitted state and let the user resolve. Do NOT call `AskUserQuestion` as a shortcut out of the auto-apply loop.
+
+### 8. Report
+
+After the commit lands, report:
+
+- The path of the new report file.
+- The total measured bytes + delta-vs-prior summary line.
+- The number of suggestions generated.
+- The number of policy breaches detected.
+- A pointer to run-retro Step 2c: *"Subsequent `/wr-retrospective:run-retro` invocations will read this report's HTML-comment trailer for delta comparison."*
+
+## Non-interactive / AFK behaviour (ADR-013 Rule 6)
+
+This skill is **never auto-invoked** in AFK or non-interactive mode. The cheap layer (run-retro Step 2c) surfaces anomalies in the iteration summary; the user runs `/wr-retrospective:analyze-context` on return.
+
+If invoked in a non-interactive context with `AskUserQuestion` unavailable AND the commit gate flags above-appetite risk: skip the commit, report the uncommitted report path clearly, and let the user resolve on return. The report file itself is still written — it is the evidence the user reviews.
+
+## Composition with sibling measurements
+
+- **`P099`** (briefing bloat — `check-briefing-budgets.sh`) — the deep report cites P099's `OVER` rows verbatim under Policy Breaches when the briefing tree exceeds Tier 3.
+- **`P105`** (signal-vs-noise pass) — the deep report cites P105 score totals from the most-recent retro under Per-Turn Attribution / Suggestions, when relevant.
+- **`ADR-040`** (session-start briefing surface) — the deep report cites ADR-040's tier budgets when surfacing briefing-related suggestions.
+- **`run-retro` Step 2c (cheap layer)** — the deep report's HTML-comment trailer is the snapshot run-retro Step 2c reads for delta-from-prior comparison. Bidirectional contract.
+
+## ADRs cited
+
+- **ADR-043** (Progressive context-usage measurement) — this skill's source decision.
+- **ADR-026** (Agent output grounding) — `analyze-context/SKILL.md` is on the per-agent prompt amendments list (lines 94–101 of ADR-026, amended within reassessment window).
+- **ADR-014** (Governance skills commit own work) — `docs(retros): context analysis YYYY-MM-DD` row added to the Commit Message Convention table; this skill commits its own report per the amended convention.
+- **ADR-013** Rule 5 / Rule 6 — interactive AskUserQuestion path / AFK fallback.
+- **ADR-038** (Progressive disclosure) — the methodology mirrors ADR-038's tiered disclosure pattern; report rows obey ≤150-byte budget per row.
+- **ADR-040** (Session-start briefing surface) — HTML-comment trailer pattern precedent.
+- **ADR-022** (Verification Pending lifecycle) — P101's transition path on this skill landing.
+- **ADR-005** / **ADR-037** — bats fixture shape under `test/`.
+
+$ARGUMENTS
package/skills/analyze-context/test/analyze-context-skill-contract.bats
@@ -0,0 +1,112 @@
+#!/usr/bin/env bats
+
+# P101 / ADR-043: new skill /wr-retrospective:analyze-context (deep layer)
+# at packages/retrospective/skills/analyze-context/SKILL.md. Doc-lint
+# structural test (Permitted Exception per ADR-005). Asserts the SKILL.md
+# carries: the canonical name, citations of ADR-043 / ADR-026 / ADR-014 /
+# ADR-013, references to the cheap-layer measurement primitive, the report
+# path convention, the HTML-comment-trailer snapshot schema, the AFK
+# never-auto-fires discipline, the ADR-014 commit-message convention, and
+# the ADR-026 forbidden-qualitative-phrase ban.
+
+setup() {
+  REPO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/../../../../.." && pwd)"
+  SKILL_MD="$REPO_ROOT/packages/retrospective/skills/analyze-context/SKILL.md"
+}
+
+@test "analyze-context: SKILL.md exists at expected path" {
+  [ -f "$SKILL_MD" ]
+}
+
+@test "analyze-context: frontmatter declares the wr-retrospective:analyze-context name" {
+  run grep -F 'name: wr-retrospective:analyze-context' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: cites ADR-043 as the source decision" {
+  run grep -F 'ADR-043' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: cites ADR-026 (Agent output grounding)" {
+  run grep -F 'ADR-026' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: cites ADR-014 (Governance skills commit own work)" {
+  run grep -F 'ADR-014' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: cites ADR-013 Rule 6 (AFK fallback)" {
+  run grep -F 'ADR-013 Rule 6' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: references the cheap-layer measurement primitive" {
+  run grep -F 'packages/retrospective/scripts/measure-context-budget.sh' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: declares the report path convention docs/retros/<date>-context-analysis.md" {
+  run grep -F 'docs/retros/' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  # `--` separates flags from pattern; `-context-analysis.md` would otherwise
+  # parse as an option flag.
+  run grep -F -- '-context-analysis.md' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: HTML-comment trailer schema present (context-snapshot)" {
+  run grep -F 'context-snapshot:' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: trailer schema cites measurement-method and measured-at fields" {
+  run grep -F 'measurement-method' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'measured-at' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: AFK never-auto-fires discipline declared" {
+  run grep -F 'never auto-invoked' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: ADR-014 commit-message convention declared (docs(retros): context analysis)" {
+  run grep -F 'docs(retros): context analysis' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: bans qualitative-only phrases per ADR-026" {
+  run grep -F 'load is negligible' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'microseconds only' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: requires comparable-prior citation in suggestions" {
+  run grep -F 'comparable prior' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: routes to /wr-risk-scorer:assess-release fallback when subagent unavailable" {
+  run grep -F '/wr-risk-scorer:assess-release' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: documents per-plugin decomposition step" {
+  run grep -F 'Per-Plugin Decomposition' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: documents policy-breach detection step" {
+  run grep -F 'Policy Breaches' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "analyze-context: cites P101 (driver ticket)" {
+  run grep -F 'P101' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
package/skills/run-retro/SKILL.md
@@ -165,6 +165,50 @@ The shape mirrors P068's Step 4a Verification-close housekeeping: glob / evidenc
 - **Step 4 (problem-ticket creation)** — Step 2b feeds Step 4. A detection surfaced in Step 2b that the user accepts becomes a Step 4 creation or update via the manage-problem delegation. Step 4b's Stage 1 two-stage codification flow (P075) applies to pipeline-instability tickets the same way it applies to Step 2 reflection tickets — the detection IS the codify-worthy observation.
 - **ADR-027 compatibility note**: when ADR-027's Step-0 auto-delegation lands on run-retro, Step 2b's evidence scan is load-bearing on main-agent session context that a delegated subagent does not automatically inherit. The migration path mirrors Step 4a's: either (a) run Step 2b in the main-agent context BEFORE Step-0 delegation to the subagent, or (b) include an explicit session-activity summary (tool invocations, commits, skill calls observed in main-agent context) in the Step-0 delegation prompt. Option (a) is preferred to keep the evidence scan close to the observed activity.
 
+### 2c. Context-usage measurement (cheap layer, P101)
+
+Per **ADR-043** (Progressive context-usage measurement and reporting for retrospective sessions), every retro emits a per-source-bucket context-usage summary so bloat is detected at session-time rather than after the user notices. The cheap layer runs unconditionally in every retro (interactive and AFK) at a static-budget-bounded ~2.5 KB output ceiling — well under the 5% / 200K cheap-layer envelope. Anything richer (per-turn attribution, per-plugin decomposition, suggestion generation) is the deep layer's responsibility, invoked by explicit user direction via `/wr-retrospective:analyze-context`.
+
+**Ownership boundary**: this step measures and surfaces; it does NOT trim, edit, or refactor any source surface. Trim decisions stay with the user (or a follow-up problem ticket via Step 4 / Step 4b). The cheap layer is a read-only observability surface, not an enforcement gate.
+
+**Steps:**
+
+1. **Invoke the diagnostic script**:
+
+   ```bash
+   bash packages/retrospective/scripts/measure-context-budget.sh "${CLAUDE_PROJECT_DIR:-.}"
+   ```
+
+   The script is read-only, exits 0 on advisory output and 2 on parse error (project root missing). It emits one row per bucket: `BUCKET <name> bytes=<N>` for measured surfaces, `BUCKET <name> not-measured reason=<reason>` for absent or framework-injected surfaces, plus a trailing `THRESHOLD bytes=<N>` row for the configurable ceiling. See `packages/retrospective/scripts/test/measure-context-budget.bats` for the exact contract.
+
+2. **Read the prior snapshot** (when present):
+
+   ```bash
+   prior_report=$(ls -1r docs/retros/*-context-analysis.md 2>/dev/null | head -1)
+   ```
+
+   If `$prior_report` is non-empty and the file contains a `<!-- context-snapshot:` HTML-comment trailer (per ADR-043's snapshot-persistence shape), parse the trailer fields for the prior bucket totals. **First-retro / no-prior path**: emit the bucket table without a delta column and label it `no prior snapshot — first measurement this project` per ADR-026's `not estimated — no prior data` sentinel. Do NOT silently omit the column — the absence is itself signal.
+
+3. **Render the cheap-layer report** as a `## Context Usage (Cheap Layer)` section in the retro summary (see Step 5). The section MUST contain:
+   - A per-bucket table (one row per script-emitted bucket, sorted by bytes descending). Columns: `Bucket | Bytes | % of total | Δ vs prior`.
+   - A top-5 offenders block when ≥ 5 buckets carry non-zero byte counts. Top-5 cites the bucket name + byte count + measurement-method (per ADR-026).
+   - A one-line affordance: `Per-plugin breakdown available in /wr-retrospective:analyze-context (deep layer).`
+   - When the deep layer's last run is older than 14 days OR a bucket's delta exceeds +20% since prior snapshot, append the one-line note: `Deep analysis recommended — invoke /wr-retrospective:analyze-context.` This is a non-blocking advisory, never a prompt.
+
+4. **Forbidden phrases (ADR-026)**: the cheap-layer report MUST NOT contain qualitative-only phrases. Banned: `load is negligible`, `microseconds only`, `minimal`, `small change`, `trim X to reduce bloat` (without comparable prior). Concrete byte counts + measurement-method citations are mandatory; ungrounded fields use the explicit `not measured — <reason>` sentinel.
+
+5. **Defensive trip (fail-open)**: if the script exits non-zero or the rendered report exceeds the `THRESHOLD bytes=<N>` ceiling at runtime, skip the bucket table and emit the one-line pointer `cheap layer disabled — invoke /wr-retrospective:analyze-context for context measurement`. Log the trip in Step 2b's Pipeline Instability section so the regression is captured as a ticket candidate per the existing flow.
+
+6. **AFK behaviour (ADR-013 Rule 6)**: identical to interactive mode. The cheap layer is silent (no `AskUserQuestion`); the bucket table + advisory line ride the retro summary. AFK orchestrators read the summary on iteration close.
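The +20% trigger from item 3 above can be checked with integer arithmetic alone. The sketch below is illustrative: the helper name `exceeds_delta` is hypothetical, the byte counts are made-up inputs, and treating a missing or zero prior as "never trips" is an assumption chosen to match the first-retro path in item 2.

```bash
# Hypothetical helper: succeed when current bytes exceed prior bytes by more than 20%.
exceeds_delta() {
  # $1 = prior bytes, $2 = current bytes. A missing or zero prior never trips.
  prior=${1:-0} current=${2:-0}
  [ "$prior" -gt 0 ] || return 1
  # Integer form of current > prior * 1.20, avoiding floating point.
  [ $((current * 100)) -gt $((prior * 120)) ]
}

# Example with illustrative numbers: 10000 -> 12500 bytes is a +25% delta.
if exceeds_delta 10000 12500; then
  printf '%s\n' 'Deep analysis recommended — invoke /wr-retrospective:analyze-context.'
fi
```

A delta of exactly +20% does not trip the check, since the text says "exceeds". The 14-day staleness condition is not covered here because it depends on how `measured-at` is parsed.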
+
+**Interaction with other surfaces:**
+
+- **`P099` Tier 3 advisory** (`check-briefing-budgets.sh`) — measures **per-topic-file** budget on `docs/briefing/<topic>.md`. The cheap layer aggregates this into a single `briefing` bucket row; the per-file detail is drillable via P099's existing surface. No double-counting.
+- **`P105` signal-vs-noise pass** (Step 1.5 of this skill) — measures **per-entry** signal scores on briefing entries. The cheap layer's `briefing` bucket is upstream of the per-entry signal scores; deep layer cites both as evidence sources.
+- **Step 4 / 4b — codification flow**: when the cheap layer surfaces a delta-from-prior anomaly that the user wants to investigate, the deep layer (`/wr-retrospective:analyze-context`) is the correct routing target — it produces a `docs/retros/<date>-context-analysis.md` report with per-turn attribution and suggestion generation. The cheap layer never auto-routes.
+- **`/wr-retrospective:analyze-context` (deep layer)** — invoked only by explicit user direction. Never auto-fires from this step. Deep-layer report writes the HTML-comment-trailer snapshot that subsequent runs of this step read.
+- **ADR-027 compatibility note**: same migration shape as Step 2b. The script invocation must run in main-agent context; the parsed bucket totals are the artefact a delegated subagent can consume without re-running the byte-count.
+
 ### 3. Update the briefing tree
 
 Edit `docs/briefing/<topic>.md` files — each topic file is per-subject (`hooks-and-gates.md`, `releases-and-ci.md`, `governance-workflow.md`, `afk-subprocess.md`, `plugin-distribution.md`, `agent-interaction-patterns.md`). Select the topic file whose scope matches the learning; if no file fits, add a new topic file under `docs/briefing/` and update `docs/briefing/README.md`'s Topic Index accordingly.
package/skills/run-retro/test/run-retro-context-usage-step-2c.bats
@@ -0,0 +1,98 @@
+#!/usr/bin/env bats
+
+# P101 / ADR-043: run-retro SKILL.md gains a Step 2c (Context-usage measurement
+# — cheap layer) between Step 2b (Pipeline-instability scan) and Step 3
+# (Update the briefing tree). The cheap layer invokes
+# packages/retrospective/scripts/measure-context-budget.sh and renders a
+# per-source-bucket table in the retro summary at < 5% of the session budget.
+#
+# Doc-lint structural test (Permitted Exception per ADR-005). Asserts SKILL.md
+# wording for: the step header, citation of ADR-043, citation of ADR-026
+# grounding rule, citation of the diagnostic script path, the AFK fallback
+# (ADR-013 Rule 6) prose, the defensive-trip fail-open contract, the
+# composition-with-P099 / P105 paragraphs, and the user-direction-only
+# discipline on the deep layer.
+#
+# Behavioural assertions on the script itself live in
+# packages/retrospective/scripts/test/measure-context-budget.bats.
+
+setup() {
+  REPO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/../../../../.." && pwd)"
+  SKILL_MD="$REPO_ROOT/packages/retrospective/skills/run-retro/SKILL.md"
+}
+
+@test "run-retro: SKILL.md contains Step 2c Context-usage measurement (P101)" {
+  run grep -F '### 2c. Context-usage measurement (cheap layer, P101)' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c cites ADR-043 as the source decision" {
+  run grep -F 'ADR-043' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c invokes measure-context-budget.sh as the primitive" {
+  run grep -F 'packages/retrospective/scripts/measure-context-budget.sh' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c cites ADR-026 grounding rule" {
+  run grep -F 'ADR-026' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c bans qualitative-only phrases per ADR-026" {
+  # Forbidden phrases listed verbatim in Step 2c step 4
+  run grep -F 'load is negligible' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'microseconds only' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c documents the defensive-trip fail-open contract" {
+  run grep -F 'cheap layer disabled' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c AFK Rule 6 fallback prose present" {
+  run grep -F 'ADR-013 Rule 6' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'AFK behaviour' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c documents first-retro / no-prior-snapshot path" {
+  run grep -F 'no prior snapshot' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c cites P099 + P105 composition (no double-counting)" {
+  run grep -F 'P099' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+  run grep -F 'P105' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c references HTML-comment trailer for snapshot persistence" {
+  run grep -F 'context-snapshot' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c routes deep analysis to /wr-retrospective:analyze-context only on user direction" {
+  run grep -F '/wr-retrospective:analyze-context' "$SKILL_MD"
+  [ "$status" -eq 0 ]
+}
+
+@test "run-retro: Step 2c is placed between Step 2b and Step 3" {
+  # Capture line numbers for ordering check
+  step_2b_line=$(grep -n '^### 2b\.' "$SKILL_MD" | head -1 | cut -d: -f1)
+  step_2c_line=$(grep -n '^### 2c\.' "$SKILL_MD" | head -1 | cut -d: -f1)
+  step_3_line=$(grep -n '^### 3\. Update the briefing tree' "$SKILL_MD" | head -1 | cut -d: -f1)
+
+  [ -n "$step_2b_line" ]
+  [ -n "$step_2c_line" ]
+  [ -n "$step_3_line" ]
+
+  [ "$step_2b_line" -lt "$step_2c_line" ]
+  [ "$step_2c_line" -lt "$step_3_line" ]
+}