@gempack/squad-mcp 0.8.2 → 0.10.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. package/.claude-plugin/marketplace.json +1 -1
  2. package/.claude-plugin/plugin.json +7 -4
  3. package/CHANGELOG.md +92 -0
  4. package/README.md +41 -35
  5. package/agents/senior-debugger.md +85 -0
  6. package/commands/debug.md +22 -0
  7. package/commands/stats.md +22 -0
  8. package/dist/config/ownership-matrix.d.ts +1 -1
  9. package/dist/config/ownership-matrix.js +16 -0
  10. package/dist/config/ownership-matrix.js.map +1 -1
  11. package/dist/errors.d.ts +1 -1
  12. package/dist/errors.js.map +1 -1
  13. package/dist/index.js +1 -1
  14. package/dist/index.js.map +1 -1
  15. package/dist/resources/agent-loader.js +1 -0
  16. package/dist/resources/agent-loader.js.map +1 -1
  17. package/dist/runs/aggregate.d.ts +166 -0
  18. package/dist/runs/aggregate.js +378 -0
  19. package/dist/runs/aggregate.js.map +1 -0
  20. package/dist/runs/store.d.ts +328 -0
  21. package/dist/runs/store.js +406 -0
  22. package/dist/runs/store.js.map +1 -0
  23. package/dist/tools/list-runs.d.ts +52 -0
  24. package/dist/tools/list-runs.js +142 -0
  25. package/dist/tools/list-runs.js.map +1 -0
  26. package/dist/tools/record-run.d.ts +202 -0
  27. package/dist/tools/record-run.js +124 -0
  28. package/dist/tools/record-run.js.map +1 -0
  29. package/dist/tools/registry.js +15 -1
  30. package/dist/tools/registry.js.map +1 -1
  31. package/dist/util/path-safety.d.ts +36 -0
  32. package/dist/util/path-safety.js +76 -0
  33. package/dist/util/path-safety.js.map +1 -1
  34. package/package.json +1 -1
  35. package/skills/brainstorm/SKILL.md +70 -7
  36. package/skills/debug/SKILL.md +345 -0
  37. package/skills/question/SKILL.md +73 -1
  38. package/skills/squad/SKILL.md +83 -0
  39. package/skills/stats/SKILL.md +189 -0
@@ -0,0 +1,189 @@
---
name: stats
description: Observability dashboard for the squad-mcp run journal. Reads `.squad/runs.jsonl`, calls `list_runs` with aggregate=true, and renders a single-color (cyan) ANSI terminal panel with Unicode bar charts (verdict mix, score buckets), a sparkline trend, per-invocation distribution, and a per-agent breakdown (avg wall-clock, estimated tokens). All token figures are estimates (chars ÷ 3.5). Never writes to the journal. Trigger when the user types `/squad:stats` or asks for "squad stats", "run history", "score distribution", or "where did the tokens go".
---

# Skill: Stats

## Objective

Render an at-a-glance, single-screen observability panel for past squad-mcp runs in this workspace. Inspired by `rtk gain` but with a tighter visual identity: one accent colour (cyan), Unicode bar and sparkline glyphs at 1/8 granularity, and no tables dumped as raw text.

Position in the workflow:

- **`/squad:implement` / `/squad:review`**: produce runs (write side, two-phase journal append).
- **`/squad:stats`**: read those runs back as an aggregated panel (this skill).

This skill is read-only. It never appends a row to the journal and never invokes `record_run`. The only file it edits is the diagnostic sentinel described in Step 6.

## Inviolable Rules

1. **Read-only over the journal.** No writes to `.squad/runs.jsonl`, no `record_run` invocation, no commits, no pushes. The only file this skill ever writes is the diagnostic sentinel `.squad/.stats-seen` described in Step 6. That file is gitignored and not load-bearing.
2. **Empty journal is a normal state.** Render a "no runs yet" empty state and tell the user how to populate it. Never raise an error.
3. **All token figures are estimates.** Render the `(estimated · chars ÷ 3.5)` disclaimer beneath the totals panel.
4. **One accent colour only.** Use cyan (ANSI `\x1b[36m`) for highlights and bars, and reset (`\x1b[0m`) after every coloured run. Do not introduce a second hue, even for "errors are red". Verdicts are differentiated by symbol + percentage, not colour.
5. **Honour `--no-color` and `NO_COLOR` env.** Emit no ANSI escapes when either is present, or when the host signals non-TTY output; the bar glyphs stay Unicode. This is best-effort: do not invent a TTY check the host doesn't expose.
6. **Strip control characters before rendering user-influenceable fields.** `mode_warning.message` is partially user-controllable and ends up in the panel. The aggregator already exposes a `stripControlChars` helper; use it.
7. **No AI attribution.** Standard global rule.
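
Rules 3 and 6 lean on two small helpers, sketched below. This is a hypothetical stand-in, not the aggregator's own `stripControlChars`. It assumes "control characters" means the C0 range (which includes ESC and newlines), DEL, and the C1 range. `estimateTokens` is a name introduced here, and rounding to a whole token is also an assumption.

```typescript
// Hypothetical stand-in for the aggregator's stripControlChars helper.
// Removes C0 controls U+0000-U+001F (this includes ESC, tab and newline),
// DEL U+007F and C1 controls U+0080-U+009F, so a user-influenced string
// such as mode_warning.message cannot inject SGR sequences or split a row.
function stripControlChars(input: string): string {
  return input.replace(/[\u0000-\u001f\u007f-\u009f]/g, "");
}

// Rule 3: token figures are estimates derived from character counts.
const CHARS_PER_TOKEN = 3.5;

// Hypothetical helper; rounding to the nearest whole token is an assumption.
function estimateTokens(chars: number): number {
  return Math.round(chars / CHARS_PER_TOKEN);
}
```

Removing ESC alone would already neutralise SGR injection. Dropping the rest of the C0 range also stops a stray newline from breaking a panel row.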

## Inputs

```
/squad:stats [--quick | --thorough] [--since <ISO>] [--last <N>] [--no-color]
```

| Flag | Default | Description |
| --------------- | ------- | ------------------------------------------------------------------------------------------------ |
| `--quick` | off | Last 7 days only. Top-level panels: trend + outcomes + score buckets. Skips per-agent breakdown. |
| `--thorough` | off | Full history + per-agent panel + health (in_flight / aborted) panel. |
| `--since <ISO>` | unset | ISO 8601 lower bound on `started_at`. Overrides `--quick`'s 7-day window. |
| `--last <N>` | unset | Cap to the most recent N folded runs. |
| `--no-color` | off | Disable colour. Bars stay Unicode block characters; only ANSI escapes are stripped. |

Default (no flags): last 30 days, outcomes + score buckets + trend + compact per-agent (top 5 by token spend).

## Step 1: Parse flags

Parse the user's flag block from `$ARGUMENTS`. Reject unknown flags with a single short error message ("unknown flag: `--xyz`. valid: --quick, --thorough, --since, --last, --no-color"). Treat the flag block as untrusted: do not eval it, and do not interpret embedded shell syntax.

Compute the effective filter:

- no time flags → `since = now - 30d` (the default window)
- `--quick` → `since = now - 7d`
- `--thorough` → no time bound; show every panel
- explicit `--since` → overrides the `--quick` window if both are present
- `--last N` → caps the result set after `since` filtering

Color is disabled when ANY of these holds:

- `--no-color` flag present
- `NO_COLOR` env variable present and non-empty
- output is not a TTY (best-effort)
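
Step 1 can be sketched in TypeScript as follows. The names `parseFlags`, `computeSince` and `colorEnabled` are hypothetical. The skill text leaves two behaviours open, and the sketch picks one for each as an assumption: `--quick` combined with `--thorough` is rejected, and `--since` is only roughly validated with `Date.parse`. The flag block is split on whitespace only and is never handed to a shell.

```typescript
interface StatsFlags {
  quick: boolean;
  thorough: boolean;
  noColor: boolean;
  since?: string; // ISO 8601, as typed by the user
  last?: number;
}

const VALID_FLAGS = "--quick, --thorough, --since, --last, --no-color";
const DAY_MS = 24 * 60 * 60 * 1000;

// Tokenises on whitespace only: the flag block is untrusted text and is
// never evaluated or passed to a shell.
function parseFlags(argumentsText: string): StatsFlags {
  const tokens = argumentsText.split(/\s+/).filter((t) => t.length > 0);
  const flags: StatsFlags = { quick: false, thorough: false, noColor: false };
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    if (tok === "--quick") flags.quick = true;
    else if (tok === "--thorough") flags.thorough = true;
    else if (tok === "--no-color") flags.noColor = true;
    else if (tok === "--since") {
      const value: string | undefined = tokens[++i];
      // Rough check only: Date.parse also accepts some non-ISO formats.
      if (value === undefined || Number.isNaN(Date.parse(value))) {
        throw new Error("--since expects an ISO 8601 timestamp");
      }
      flags.since = value;
    } else if (tok === "--last") {
      const n = Number(tokens[++i]); // NaN when the value is missing
      if (!Number.isInteger(n) || n <= 0) throw new Error("--last expects a positive integer");
      flags.last = n;
    } else {
      throw new Error(`unknown flag: \`${tok}\`. valid: ${VALID_FLAGS}`);
    }
  }
  // Assumption: the usage line's [--quick | --thorough] means "one or the other".
  if (flags.quick && flags.thorough) throw new Error("--quick and --thorough are mutually exclusive");
  return flags;
}

// Lower bound on started_at, or undefined for "no time bound".
function computeSince(flags: StatsFlags, now: Date): string | undefined {
  if (flags.since !== undefined) return flags.since; // explicit --since wins
  if (flags.thorough) return undefined; // full history
  const days = flags.quick ? 7 : 30; // --quick, else the 30-day default
  return new Date(now.getTime() - days * DAY_MS).toISOString();
}

// env and isTTY are passed in so the check stays pure; isTTY is optional
// because not every host exposes it (best-effort, per Rule 5).
function colorEnabled(
  flags: StatsFlags,
  env: Record<string, string | undefined>,
  isTTY?: boolean,
): boolean {
  if (flags.noColor) return false;
  const noColor = env["NO_COLOR"];
  if (noColor !== undefined && noColor !== "") return false;
  return isTTY !== false;
}
```

Under Node, a caller might pass `process.env` and `process.stdout.isTTY`. When a host cannot report TTY status, omitting `isTTY` keeps colour on rather than guessing.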

## Step 2: Call `list_runs`

Use the squad-mcp tool with `aggregate: true`:

```
list_runs({
  workspace_root: <cwd>,
  aggregate: true,
  trend_days: <14 default, 7 for --quick, 30 for --thorough>,
  since: <computed>,       // omit if unset
  limit: <user's --last>   // omit if unset
})
```

The tool returns:

- `total_in_store`, `total_folded`
- `outcomes` (verdict_counts, verdict_total, score_buckets, invocation_counts, est_tokens_total, est_tokens_per_run_avg, est_tokens_per_agent, is_empty)
- `health` (in_flight, completed, aborted, synthesized_aborted, avg_batch_duration_ms_per_agent, avg_total_duration_ms)
- `trend` (days, counts[])

If `outcomes.is_empty` is true OR `total_folded === 0` after filtering, render the empty state and stop.
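
Step 2 can be expressed as typed request and response shapes. This is a sketch under stated assumptions. The field names are copied from the call and the list above, but the value types are guesses: plain numbers, plus records keyed by verdict, bucket, invocation or agent. The helper names `buildListRunsArgs` and `shouldRenderEmptyState` are hypothetical, and none of this is the package's published types.

```typescript
type StatsMode = "default" | "quick" | "thorough";

interface ListRunsArgs {
  workspace_root: string;
  aggregate: true;
  trend_days: number;
  since?: string; // omitted entirely when unset
  limit?: number; // the user's --last, omitted when unset
}

// Assumed value types; only the field names come from the skill text.
interface ListRunsResult {
  total_in_store: number;
  total_folded: number;
  outcomes: {
    verdict_counts: Record<string, number>;
    verdict_total: number;
    score_buckets: Record<string, number>;
    invocation_counts: Record<string, number>;
    est_tokens_total: number;
    est_tokens_per_run_avg: number;
    est_tokens_per_agent: Record<string, number>;
    is_empty: boolean;
  };
  health: {
    in_flight: number;
    completed: number;
    aborted: number;
    synthesized_aborted: number;
    avg_batch_duration_ms_per_agent: Record<string, number>;
    avg_total_duration_ms: number;
  };
  trend: { days: number; counts: number[] };
}

const TREND_DAYS: Record<StatsMode, number> = { default: 14, quick: 7, thorough: 30 };

function buildListRunsArgs(mode: StatsMode, cwd: string, since?: string, last?: number): ListRunsArgs {
  const args: ListRunsArgs = { workspace_root: cwd, aggregate: true, trend_days: TREND_DAYS[mode] };
  if (since !== undefined) args.since = since; // leave the key out rather than sending undefined
  if (last !== undefined) args.limit = last;
  return args;
}

// Accepts any object carrying the two fields the empty-state rule reads,
// so a full ListRunsResult is accepted as well.
type EmptyCheckInput = Pick<ListRunsResult, "total_folded"> & {
  outcomes: Pick<ListRunsResult["outcomes"], "is_empty">;
};

function shouldRenderEmptyState(result: EmptyCheckInput): boolean {
  return result.outcomes.is_empty || result.total_folded === 0;
}
```

Leaving unset keys out entirely, rather than passing `undefined`, is a deliberate choice. It keeps the serialised request identical to the "omit if unset" call shown above.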

## Step 3: Render

The rendering layer lives in this skill (NOT in the MCP server). Architect contract: the server returns structured numbers, and ANSI / Unicode formatting is the skill's job because the server has no TTY visibility.

### Panel order

1. **Header**: one cyan line, `squad-mcp stats · <N> runs · <since…now> · <mode>`.
2. **Trend sparkline**: one line, `↗ trend (<days>d) ▁▂▃▄▅▆▇█`, with one glyph per day of the `trend.days` window.
3. **Outcomes**: three rows (APPROVED / CHANGES_REQUIRED / REJECTED), each with a Unicode bar (width 24) + count + percentage. Use the symbols `✓ ⚠ ✗`; verdicts are distinguished by glyph, not by colour.
4. **Score distribution**: four rows (90-100 / 80-89 / 70-79 / <70), each with bar + count. The section glyph is `▸` (a single Unicode marker), NOT `📊` or any other emoji, because emoji carry their own platform colour and would break the single-cyan discipline.
5. **Invocations**: one line each (implement / review / task / question / brainstorm / debug) with count + bar. Only non-zero invocations are shown.
6. **Tokens**: one row with `IN ▌▌▌▌▌▌▌ · OUT ▌▌▌ · TOTAL`, plus the estimate disclaimer line below.
7. **Per-agent** (skipped on `--quick`): table of agent · avg wall-clock · est tokens. Sort by token spend, descending; cap at 8 rows.
8. **Health** (only on `--thorough`, or when `in_flight > 0` or `synthesized_aborted > 0`): `running: N · completed: N · aborted: N (synthesized: M)`.
9. **Footer disclaimer**: a single dim line, `estimates: tokens = chars ÷ 3.5 · wall-clock includes parallel-batch overlap`.
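
Two of the panel rules above are mechanical enough to sketch. Both helpers are hypothetical. `sparkline` maps `trend.counts` onto the eight block glyphs, scaling against the busiest day so that an all-zero window stays flat. Scaling from zero rather than from the smallest count is an assumption. `visiblePanels` encodes the per-agent and health conditions from items 7 and 8.

```typescript
type PanelMode = "default" | "quick" | "thorough";

const SPARK_GLYPHS = ["▁", "▂", "▃", "▄", "▅", "▆", "▇", "█"];

// One glyph per day; a zero count maps to ▁ and the busiest day maps to █.
function sparkline(counts: number[]): string {
  const max = counts.reduce((m, c) => Math.max(m, c), 0);
  return counts
    .map((c) => (max === 0 ? SPARK_GLYPHS[0] : SPARK_GLYPHS[Math.round((c / max) * 7)]))
    .join("");
}

interface PanelVisibility {
  perAgent: boolean;
  health: boolean;
}

// Items 7 and 8 of the panel order: per-agent is skipped on --quick;
// health shows on --thorough or whenever runs are in flight or synthesized.
function visiblePanels(mode: PanelMode, inFlight: number, synthesizedAborted: number): PanelVisibility {
  return {
    perAgent: mode !== "quick",
    health: mode === "thorough" || inFlight > 0 || synthesizedAborted > 0,
  };
}
```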

### Bar rendering

Use `█▉▊▋▌▍▎▏` (1/8 granularity) so that a bar at 26% of width 24 (6.24 cells) renders as `██████▎ ` instead of being rounded to a whole cell. The aggregator exposes the renderer; mirror its behaviour if hand-rolling here. Use width 24 for outcomes/scores, width 32 for invocations, and width 40 for tokens (split IN/OUT visually).
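
A hand-rolled version of the 1/8-cell bar could look like this. It approximates the aggregator's renderer rather than reproducing it. `renderBar` is a hypothetical name, and padding the bar with spaces to the full width is an assumption made so that the count column lines up.

```typescript
// Partial-cell glyphs indexed by eighths (index 0 = no partial cell).
const PARTIALS = ["", "▏", "▎", "▍", "▌", "▋", "▊", "▉"];

// fraction is the share of the full width (0..1); width is in cells.
function renderBar(fraction: number, width: number): string {
  const f = Number.isFinite(fraction) ? Math.min(1, Math.max(0, fraction)) : 0;
  const eighths = Math.round(f * width * 8); // nearest eighth of a cell
  const full = Math.floor(eighths / 8);
  const partial = PARTIALS[eighths % 8];
  const bar = "█".repeat(full) + partial;
  // Pad with spaces so whatever follows the bar starts in the same column.
  return bar + " ".repeat(width - full - (partial === "" ? 0 : 1));
}
```

The sketch rounds to the nearest eighth rather than flooring. That is why 6.24 cells comes out as six full cells plus `▎` (2/8); flooring would give `▏` (1/8).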

### Color application (cyan only)

When colour is enabled, wrap exactly these runs in `\x1b[36m…\x1b[0m`:

- The header line
- The leading glyph of each section (`↗ ✓ ⚠ ✗ ▸`, etc.; ASCII or monochrome Unicode only, no emoji)
- The bar fill itself

Everything else stays default-fg. Counts, percentages, agent names, and the disclaimer are plain. Reset after each coloured run, and never leave an unterminated SGR.

When colour is disabled, drop the SGR escapes entirely. The bars stay Unicode block characters (they are not colour, they are glyph shape).
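
A small wrapper makes the reset rule hard to break. The hypothetical `accent` opens and closes each coloured run in one call, and when colour is disabled it passes the text through unchanged. `outcomeRow` shows which parts of an Outcomes row get the accent. Its column widths (label padded to 16, count to 3, percentage to 2) are assumptions modelled on the `( 7%)` spacing in the worked example.

```typescript
const CYAN = "\x1b[36m";
const RESET = "\x1b[0m";

// Opens and closes the accent in one step, so no SGR is ever left open.
// With colour disabled the text is returned as-is (bars keep their glyphs).
function accent(text: string, colour: boolean): string {
  return colour && text.length > 0 ? `${CYAN}${text}${RESET}` : text;
}

// Glyph and bar get the accent; label, count and percentage stay default-fg.
function outcomeRow(
  glyph: string,
  label: string,
  bar: string,
  count: number,
  pct: number,
  colour: boolean,
): string {
  const countCol = String(count).padStart(3);
  const pctCol = `(${String(pct).padStart(2)}%)`;
  return `${accent(glyph, colour)} ${label.padEnd(16)} ${accent(bar, colour)} ${countCol} ${pctCol}`;
}
```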

### Output mode

Render the panel inside an `ansi` code fence so that Claude Code (and any host that supports the `ansi` info string) applies the SGR codes. Hosts that don't understand `ansi` still render the code block; they just won't colour it. Do not try to detect this: the `ansi` fence is the lowest-overhead universal escape hatch.

```ansi
<the rendered panel goes here>
```

## Step 4: Empty state

If the journal is empty, do not render the full panel. Print this short block (no code fence; it's prose):

> No runs recorded yet in `.squad/runs.jsonl`. Run `/squad:implement <task>` or `/squad:review` and the journal will start filling automatically. `/squad:stats` reads the file on every invocation; no setup needed.

## Step 5: Stranded `in_flight` notice (subtle)

The aggregator synthesizes an `aborted` view for `in_flight` rows older than 1h that never paired with a terminal row (`synthesized_aborted` count). If `synthesized_aborted > 0`, append one line under the Outcomes panel:

`note: N stranded in_flight rows treated as aborted (Phase 10 never wrote). check .squad/runs.jsonl tail.`

This is a quiet signal, not an alarm: no colour change, no symbol. It exists so users notice repeated host crashes.
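
The stranded-row rule can be sketched as follows. This is not the aggregator's code. The row shape (`run_id`, `status`, `started_at`) is a hypothetical simplification of `.squad/runs.jsonl`. The sketch also assumes a run counts as paired when any non-`in_flight` row shares its `run_id`.

```typescript
// Hypothetical, simplified journal row; the real schema may differ.
interface JournalRow {
  run_id: string;
  status: "in_flight" | "completed" | "aborted";
  started_at: string; // ISO 8601
}

const STRANDED_AFTER_MS = 60 * 60 * 1000; // 1h

// Counts runs whose in_flight row is older than 1h and has no terminal row.
function countSynthesizedAborted(rows: JournalRow[], now: Date): number {
  const terminated = new Set<string>();
  for (const row of rows) {
    if (row.status !== "in_flight") terminated.add(row.run_id);
  }
  const stranded = new Set<string>();
  for (const row of rows) {
    if (row.status !== "in_flight" || terminated.has(row.run_id)) continue;
    if (now.getTime() - Date.parse(row.started_at) > STRANDED_AFTER_MS) stranded.add(row.run_id);
  }
  return stranded.size;
}

// The quiet one-line note, or null when nothing is stranded.
function strandedNote(n: number): string | null {
  if (n <= 0) return null;
  return `note: ${n} stranded in_flight rows treated as aborted (Phase 10 never wrote). check .squad/runs.jsonl tail.`;
}
```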

## Step 6: Sentinel `.stats-seen`

Track lifecycle visibility per architect cycle-2 PO Major. After a successful render, write a single-file sentinel at `.squad/.stats-seen`. It contains JSON of the shape below, where `last_seen_at` is the render time in ISO 8601 and `run_count_at_last_seen` is the current `total_in_store`:

```json
{ "last_seen_at": "2026-05-09T12:00:00Z", "run_count_at_last_seen": 42 }
```

The write fires on the FIRST `/squad:stats` invocation in this repo (no sentinel yet). After that, it fires whenever at least 10 runs have been added since the last write (`run_count_at_last_seen + 10 <= current total_in_store`). The sentinel is gitignored alongside `runs.jsonl`. Failure to write is silent, because the sentinel is diagnostic, not load-bearing.

The sentinel is consumed by no other code today; it exists so future "you haven't checked stats in a while" prompts can be added without re-engineering. Document the schema in CHANGELOG.
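
The sentinel rule reduces to a pure decision plus a payload builder. The helper names and the `StatsSeen` type are hypothetical. The actual file write, including swallowing any error it raises, is left to the caller.

```typescript
// Shape of .squad/.stats-seen as described in this step.
interface StatsSeen {
  last_seen_at: string; // ISO 8601 time of the render
  run_count_at_last_seen: number; // total_in_store at that render
}

// `previous` is the parsed sentinel, or null when the file does not exist yet.
function shouldWriteSentinel(previous: StatsSeen | null, totalInStore: number): boolean {
  if (previous === null) return true; // first /squad:stats in this repo
  return previous.run_count_at_last_seen + 10 <= totalInStore;
}

function nextSentinel(totalInStore: number, now: Date): StatsSeen {
  return { last_seen_at: now.toISOString(), run_count_at_last_seen: totalInStore };
}
```

Under Node, the caller might wrap `fs.writeFileSync(".squad/.stats-seen", JSON.stringify(nextSentinel(total, new Date())))` in a `try`/`catch` that ignores the error. That keeps the failure silent, as the step requires.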

## Worked example (rough)

```ansi
[36msquad-mcp stats · 42 runs · 2026-04-09 → 2026-05-09 · normal[0m

[36m↗[0m trend (14d) [36m▁▁▂▃▂▄▅▄▃▃▅▆▇█[0m

[36m✓[0m APPROVED [36m███████████████████▌ [0m 31 (74%)
[36m⚠[0m CHANGES_REQUIRED [36m████▌ [0m 8 (19%)
[36m✗[0m REJECTED [36m█▊ [0m 3 ( 7%)

[36m▸[0m score distribution
90-100 [36m█████████████▎ [0m 22
80-89 [36m████████▍ [0m 14
70-79 [36m██▌ [0m 4
<70 [36m█▎ [0m 2

invocations
implement [36m████████████████████▍ [0m 27
review [36m███████▌ [0m 10
question [36m███ [0m 4
brainstorm [36m▊ [0m 1

tokens (estimated · chars ÷ 3.5)
IN [36m███████████▍ [0m 1.2M
OUT [36m███▎ [0m 340k
total: 1.54M · avg/run: 37k

per-agent (top 5 by spend)
senior-architect 14s 320k tokens
senior-developer 11s 280k tokens
senior-dev-security 9s 210k tokens
senior-qa 8s 180k tokens
tech-lead-consolidator 6s 150k tokens

estimates: tokens = chars ÷ 3.5 · wall-clock includes parallel-batch overlap
```

That is one possible shape. Treat the order and panel labels as binding, and treat the exact widths and glyph choices as guidelines. The goal is "I glance at it for two seconds and know what happened": if a panel takes a paragraph to read, it's misdesigned.