uv-suite 0.28.0 → 0.30.0

Files changed (151)
  1. package/LICENSE +21 -0
  2. package/README.md +58 -35
  3. package/agents/claude-code/anti-slop-guard.md +14 -1
  4. package/agents/claude-code/architect.md +30 -4
  5. package/agents/claude-code/cartographer.md +18 -6
  6. package/agents/claude-code/eval-writer.md +7 -2
  7. package/agents/claude-code/reviewer.md +5 -1
  8. package/agents/claude-code/spec-writer.md +30 -7
  9. package/agents/generate.py +88 -0
  10. package/bin/cli.js +51 -48
  11. package/hooks/auto-checkpoint-helper.sh +2 -2
  12. package/hooks/auto-checkpoint.sh +3 -3
  13. package/hooks/auto-restore-on-start.sh +30 -0
  14. package/hooks/checkpoint-helper.sh +40 -35
  15. package/hooks/git-context.sh +41 -0
  16. package/hooks/lite-mode-inject.sh +26 -0
  17. package/hooks/session-end-helper.sh +2 -2
  18. package/hooks/session-end.sh +2 -2
  19. package/hooks/session-label-nag.sh +2 -2
  20. package/hooks/session-meta.sh +18 -1
  21. package/hooks/session-review-reminder.sh +2 -2
  22. package/hooks/session-start.sh +16 -0
  23. package/hooks/slop-grep.sh +12 -31
  24. package/hooks/uv-out-best.sh +20 -0
  25. package/hooks/uv-out-collect.sh +52 -0
  26. package/hooks/uv-out-notify.sh +28 -0
  27. package/hooks/uv-out-pointer.sh +16 -0
  28. package/hooks/uv-out-session.sh +24 -0
  29. package/hooks/watchtower-notify.sh +45 -0
  30. package/hooks/watchtower-send.sh +4 -0
  31. package/install.sh +93 -42
  32. package/package.json +2 -2
  33. package/personas/auto.json +40 -1
  34. package/personas/professional.json +46 -1
  35. package/personas/spike.json +32 -2
  36. package/personas/sport.json +44 -1
  37. package/settings.json +6 -2
  38. package/skills/architect/SKILL.md +109 -8
  39. package/skills/architect/specialists/distributed-systems.md +84 -0
  40. package/skills/architect/specialists/full-stack.md +92 -0
  41. package/skills/architect/specialists/llm-ai-engineering.md +86 -0
  42. package/skills/architect/specialists/ml-systems.md +81 -0
  43. package/skills/commit/SKILL.md +5 -2
  44. package/skills/confirm/SKILL.md +3 -3
  45. package/skills/investigate/SKILL.md +14 -4
  46. package/skills/lite/SKILL.md +45 -0
  47. package/skills/qa/SKILL.md +274 -0
  48. package/skills/review/SKILL.md +187 -8
  49. package/skills/review/specialists/api-contract.md +122 -0
  50. package/skills/review/specialists/architecture-trace.md +64 -0
  51. package/skills/review/specialists/data-migration.md +113 -0
  52. package/skills/review/specialists/maintainability.md +138 -0
  53. package/skills/review/specialists/performance.md +115 -0
  54. package/skills/review/specialists/security.md +132 -0
  55. package/skills/review/specialists/testing.md +109 -0
  56. package/skills/session/SKILL.md +87 -0
  57. package/skills/session/operations/auto.md +22 -0
  58. package/skills/session/operations/checkpoint.md +43 -0
  59. package/skills/session/operations/end.md +35 -0
  60. package/skills/session/operations/init.md +16 -0
  61. package/skills/session/operations/restore.md +16 -0
  62. package/skills/spec/SKILL.md +40 -1
  63. package/skills/test/SKILL.md +89 -0
  64. package/skills/test/specialists/eval.md +46 -0
  65. package/skills/test/specialists/integration.md +42 -0
  66. package/skills/test/specialists/unit.md +39 -0
  67. package/skills/understand/SKILL.md +118 -0
  68. package/skills/understand/modes/repo.md +38 -0
  69. package/skills/understand/modes/stack.md +41 -0
  70. package/skills/uv-help/SKILL.md +43 -20
  71. package/uv.sh +36 -3
  72. package/watchtower/Dockerfile +9 -0
  73. package/watchtower/README.md +78 -0
  74. package/watchtower/app/__init__.py +0 -0
  75. package/watchtower/app/__pycache__/__init__.cpython-314.pyc +0 -0
  76. package/watchtower/app/__pycache__/db.cpython-314.pyc +0 -0
  77. package/watchtower/app/__pycache__/main.cpython-314.pyc +0 -0
  78. package/watchtower/app/__pycache__/models.cpython-314.pyc +0 -0
  79. package/watchtower/app/db.py +85 -0
  80. package/watchtower/app/main.py +45 -0
  81. package/watchtower/app/models.py +49 -0
  82. package/watchtower/app/routers/__init__.py +0 -0
  83. package/watchtower/app/routers/__pycache__/__init__.cpython-314.pyc +0 -0
  84. package/watchtower/app/routers/__pycache__/control.cpython-314.pyc +0 -0
  85. package/watchtower/app/routers/__pycache__/ingest.cpython-314.pyc +0 -0
  86. package/watchtower/app/routers/__pycache__/query.cpython-314.pyc +0 -0
  87. package/watchtower/app/routers/__pycache__/stream.cpython-314.pyc +0 -0
  88. package/watchtower/app/routers/control.py +144 -0
  89. package/watchtower/app/routers/ingest.py +102 -0
  90. package/watchtower/app/routers/query.py +84 -0
  91. package/watchtower/app/routers/stream.py +30 -0
  92. package/watchtower/app/services/__init__.py +0 -0
  93. package/watchtower/app/services/__pycache__/__init__.cpython-314.pyc +0 -0
  94. package/watchtower/app/services/__pycache__/checkpoint.cpython-314.pyc +0 -0
  95. package/watchtower/app/services/__pycache__/tmux.cpython-314.pyc +0 -0
  96. package/watchtower/app/services/checkpoint.py +107 -0
  97. package/watchtower/app/services/tmux.py +54 -0
  98. package/watchtower/docker-compose.yml +22 -0
  99. package/watchtower/events.json +10373 -1
  100. package/watchtower/{auto-checkpoint-runner.js → legacy/auto-checkpoint-runner.js} +29 -2
  101. package/watchtower/{dashboard.html → legacy/dashboard.html} +261 -0
  102. package/watchtower/{server.js → legacy/server.js} +63 -0
  103. package/watchtower/legacy/snapshot-manager.js +305 -0
  104. package/watchtower/requirements.txt +3 -0
  105. package/watchtower/schema.sql +43 -0
  106. package/watchtower/static/dashboard.html +449 -0
  107. package/agents/claude-code/devops.md +0 -50
  108. package/agents/claude-code/security.md +0 -75
  109. package/agents/codex/anti-slop-guard.toml +0 -12
  110. package/agents/codex/architect.toml +0 -11
  111. package/agents/codex/cartographer.toml +0 -16
  112. package/agents/codex/devops.toml +0 -8
  113. package/agents/codex/eval-writer.toml +0 -11
  114. package/agents/codex/prototype-builder.toml +0 -10
  115. package/agents/codex/reviewer.toml +0 -16
  116. package/agents/codex/security.toml +0 -14
  117. package/agents/codex/spec-writer.toml +0 -11
  118. package/agents/codex/test-writer.toml +0 -13
  119. package/agents/cursor/anti-slop-guard.mdc +0 -22
  120. package/agents/cursor/architect.mdc +0 -24
  121. package/agents/cursor/cartographer.mdc +0 -28
  122. package/agents/cursor/devops.mdc +0 -16
  123. package/agents/cursor/eval-writer.mdc +0 -21
  124. package/agents/cursor/prototype-builder.mdc +0 -25
  125. package/agents/cursor/reviewer.mdc +0 -26
  126. package/agents/cursor/security.mdc +0 -20
  127. package/agents/cursor/spec-writer.mdc +0 -27
  128. package/agents/cursor/test-writer.mdc +0 -28
  129. package/agents/portable/anti-slop-guard.md +0 -71
  130. package/agents/portable/architect.md +0 -83
  131. package/agents/portable/cartographer.md +0 -64
  132. package/agents/portable/devops.md +0 -56
  133. package/agents/portable/eval-writer.md +0 -70
  134. package/agents/portable/prototype-builder.md +0 -70
  135. package/agents/portable/reviewer.md +0 -79
  136. package/agents/portable/security.md +0 -63
  137. package/agents/portable/spec-writer.md +0 -89
  138. package/agents/portable/test-writer.md +0 -56
  139. package/hooks/context-warning.sh +0 -4
  140. package/skills/auto-checkpoint/SKILL.md +0 -47
  141. package/skills/checkpoint/SKILL.md +0 -105
  142. package/skills/map-codebase/SKILL.md +0 -54
  143. package/skills/map-stack/SKILL.md +0 -121
  144. package/skills/restore/SKILL.md +0 -55
  145. package/skills/security-review/SKILL.md +0 -87
  146. package/skills/session-end/SKILL.md +0 -100
  147. package/skills/session-init/SKILL.md +0 -45
  148. package/skills/slop-check/SKILL.md +0 -40
  149. package/skills/write-evals/SKILL.md +0 -34
  150. package/skills/write-tests/SKILL.md +0 -54
  151. package/watchtower/{auto-checkpoint-prompt.md → legacy/auto-checkpoint-prompt.md} +0 -0
@@ -0,0 +1,274 @@
1
+ ---
2
+ name: qa
3
+ description: >
4
+ Browser-based QA: exercises the running app via Playwright MCP, captures console
5
+ errors and visual evidence, optionally fixes source bugs with atomic commits and
6
+ generates regression tests. Three tiers (quick / standard / exhaustive). Writes
7
+ uv-out/qa-state.md so /commit and /ship can detect completion and read the health
8
+ score.
9
+ argument-hint: "[url] [--tier quick|standard|exhaustive] [--baseline <path>]"
10
+ user-invocable: true
11
+ context: fork
12
+ model: claude-opus-4-6
13
+ effort: high
14
+ allowed-tools:
15
+ - Read(*)
16
+ - Grep(*)
17
+ - Glob(*)
18
+ - Write(uv-out/**)
19
+ - Write(test/**)
20
+ - Write(tests/**)
21
+ - Write(__tests__/**)
22
+ - Edit(*)
23
+ - Bash(git status *)
24
+ - Bash(git diff *)
25
+ - Bash(git log *)
26
+ - Bash(git add *)
27
+ - Bash(git commit *)
28
+ - Bash(git rev-parse *)
29
+ - Bash(mkdir *)
30
+ - Bash(ls *)
31
+ - Bash(cat *)
32
+ - Bash(date *)
33
+ - Bash(curl -sf *)
34
+ - mcp__playwright__*
35
+ ---
36
+
37
+ ## Target
38
+
39
+ $ARGUMENTS
40
+
41
+ Parse `$ARGUMENTS`:
42
+ - First positional token without `--` prefix → app URL (defaults to `http://localhost:3000` if not given).
43
+ - `--tier quick|standard|exhaustive` → tier (defaults to `standard`).
44
+ - `--baseline <path>` → path to a prior `qa-state.md` for health-score delta (defaults to `uv-out/qa-state.md`, the flat pointer to the latest run, if present).
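
The parsing rules above can be sketched in Python. This is a minimal illustration, not code from the package: `QaArgs` and `parse_qa_args` are hypothetical names, and the sketch assumes unknown flags and extra positional tokens are simply ignored. It also does not check whether the default baseline file is present.

```python
import shlex
from dataclasses import dataclass

TIERS = ("quick", "standard", "exhaustive")


@dataclass
class QaArgs:
    # Defaults mirror the ones stated in the skill text.
    url: str = "http://localhost:3000"
    tier: str = "standard"
    baseline: str = "uv-out/qa-state.md"


def parse_qa_args(arguments: str) -> QaArgs:
    """Parse a raw $ARGUMENTS string into url, tier, and baseline."""
    args = QaArgs()
    tokens = shlex.split(arguments)
    url_seen = False
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("--tier", "--baseline"):
            if i + 1 >= len(tokens):
                raise ValueError(f"{tok} needs a value")
            value = tokens[i + 1]
            if tok == "--tier":
                if value not in TIERS:
                    raise ValueError(f"unknown tier: {value}")
                args.tier = value
            else:
                args.baseline = value
            i += 2
        elif not tok.startswith("--") and not url_seen:
            # First positional token without a -- prefix is the app URL.
            args.url = tok
            url_seen = True
            i += 1
        else:
            # Assumption: ignore unknown flags and extra positionals.
            i += 1
    return args
```

For example, `parse_qa_args("http://localhost:5173 --tier quick")` yields the given URL, tier `quick`, and the default baseline path.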
45
+
46
+ ## Session output directory
47
+
48
+ Write QA artifacts under this directory (scoped to the current session). The
49
+ `<session-output-dir>/qa/<ts>/` paths below all resolve under it:
50
+
51
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-session.sh`
52
+
53
+ Stable flat pointer maintained for `/commit` and `/ship`:
54
+
55
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-pointer.sh qa-state.md qa/state.md`
56
+
57
+ ## Project context
58
+
59
+ !`cat CLAUDE.md 2>/dev/null || echo "No CLAUDE.md found"`
60
+
61
+ ## Detected test framework
62
+
63
+ !`(test -f playwright.config.ts && echo "playwright") || (test -f playwright.config.js && echo "playwright") || (test -f vitest.config.ts && echo "vitest") || (test -f jest.config.js && echo "jest") || (grep -q '"jest"' package.json 2>/dev/null && echo "jest") || echo "none-detected"`
64
+
65
+ ## Tier definitions
66
+
67
+ | Tier | Viewports | Pages | Console capture | Fix bugs | Generate regression tests |
68
+ |---|---|---|---|---|---|
69
+ | `quick` | 1 (desktop 1440×900) | Top-level entry + 1 deep link | Yes | No (report only) | No |
70
+ | `standard` | 2 (desktop 1440×900, mobile 390×844) | Entry + every link in nav + 2 deep flows | Yes | Only `critical` severity | When framework detected |
71
+ | `exhaustive` | 3 (desktop, tablet 1024×768, mobile) | Every reachable page from nav + 3 deep flows + edge-case inputs | Yes | All `critical` + `high` | Yes when framework detected |
72
+
73
+ If `$ARGUMENTS` doesn't specify a tier, default to `standard`.
74
+
75
+ ## Orchestration procedure
76
+
77
+ Execute these steps in order.
78
+
79
+ ### Step 1 — Setup
80
+
81
+ - Confirm Playwright MCP is reachable (`mcp__playwright__browser_*` tools available). If not, stop and tell user to enable the Playwright MCP server.
82
+ - Verify the target URL is reachable: `curl -sf <url> > /dev/null`. If not, stop and ask user to start the dev server.
83
+ - Create output directory: `mkdir -p <session-output-dir>/qa/<ISO-timestamp>/screenshots` (the `<session-output-dir>` printed above).
84
+ - If framework detected, note its config path; you'll write regression tests against it later.
85
+
86
+ ### Step 2 — Orient
87
+
88
+ - Navigate to the target URL. Take a full-page screenshot saved as `<session-output-dir>/qa/<ts>/screenshots/00-entry-desktop.png`.
89
+ - Take a DOM accessibility snapshot (`browser_snapshot`) and use it to enumerate navigable elements: nav links, buttons with handlers, forms.
90
+ - Read browser console messages (`browser_console_messages`). Record any errors or warnings at this point as Tier-0 findings (page didn't even load cleanly).
91
+ - Build a page map: list of URLs reachable from the entry page within one click, with rough page type (list, detail, form, dashboard, etc.).
92
+
93
+ Output the page map to the user as a short table before moving on. Pause if the map is empty or suspicious (single-page app with no nav surfaced? — note it).
94
+
95
+ ### Step 3 — Explore
96
+
97
+ Based on tier, visit pages and interact:
98
+
99
+ - **Quick:** Entry + 1 deep link. Click into a representative detail page. Capture screenshot per page per viewport.
100
+ - **Standard:** Entry + every nav link + 2 deep flows. A "deep flow" = a multi-step user action (e.g., signup form, search-then-click-result, add-to-cart).
101
+ - **Exhaustive:** Every reachable page + 3 deep flows + edge cases (empty input, overlength input, special characters, rapid clicks).
102
+
103
+ For each interaction:
104
+ 1. Take a `before` screenshot.
105
+ 2. Perform the action via `browser_click` / `browser_type` / etc.
106
+ 3. Take an `after` screenshot.
107
+ 4. Check console for new errors via `browser_console_messages`.
108
+ 5. Verify the URL changed or DOM updated as expected.
109
+
110
+ Switch viewports per tier (call `browser_resize` between passes). Re-run the entry + key flows in each viewport.
111
+
112
+ Document every finding as you go — don't trust memory across the run.
113
+
114
+ ### Step 4 — Triage
115
+
116
+ Sort findings into severity buckets:
117
+
118
+ | Severity | Definition |
119
+ |---|---|
120
+ | `critical` | Page crashes, blank renders, console errors blocking user actions, broken auth, security warning visible to user |
121
+ | `high` | Dead button (click does nothing), form submit fails silently, broken nav link, content overflow that hides critical info |
122
+ | `medium` | Visual regression visible to user (layout shift, missing image, alignment), console warning, slow interaction (>2s with no spinner) |
123
+ | `low` | Minor visual nit, accessibility warning that doesn't block use, console info-level message |
124
+
125
+ Assign confidence 1-10 to each (same 1-10 scale as `/review`, with QA-specific bands):
126
+ - 9-10: Reproduced in this run with screenshot evidence
127
+ - 7-8: Pattern-matched against a known bug class (e.g., 404 on resource fetch in console) with screenshot
128
+ - 5-6: Likely issue but couldn't reproduce reliably (e.g., race condition observed once)
129
+ - 3-4: Suspicious but not confirmed
130
+ - 1-2: Speculation, suppress from output
131
+
132
+ ### Step 5 — Fix loop (skip for tier `quick`)
133
+
134
+ For each finding the tier allows fixing (`critical` always; `high` only on `exhaustive`), in confidence order high to low:
135
+
136
+ 1. **Locate source.** Use Grep/Glob to find the component, handler, or route responsible. Read it.
137
+ 2. **Fix.** Apply the smallest change that addresses the bug. Do not refactor surrounding code.
138
+ 3. **Re-test.** Reload the affected page, replay the action, capture new `after` screenshot.
139
+ 4. **Classify outcome:**
140
+ - `verified-fixed`: bug no longer reproduces; new screenshot shows expected state
141
+ - `regressed`: bug fixed but a new issue appeared in re-test
142
+ - `unchanged`: fix didn't take effect; revert and mark `ask` for human review
143
+ 5. **Commit** only verified-fixed. One commit per fix:
144
+ ```
145
+ git commit -m "fix(qa): <severity> <one-line> — confidence <n>"
146
+ ```
147
+ Include the fix file path + before/after screenshot paths in the commit body.
148
+
149
+ Never bundle multiple fixes in one commit — atomic per fix so revert is precise.
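
The selection rule for the fix loop (which severities a tier may fix, processed from highest to lowest confidence) can be sketched as follows. This is an illustrative sketch only: `Finding`, `FIXABLE`, and `fix_queue` are hypothetical names, not part of the package.

```python
from dataclasses import dataclass

# Severities each tier is allowed to fix, per the tier table.
FIXABLE = {
    "quick": set(),                      # report only
    "standard": {"critical"},
    "exhaustive": {"critical", "high"},
}


@dataclass
class Finding:
    id: str
    severity: str     # critical | high | medium | low
    confidence: int   # 1-10


def fix_queue(findings: list[Finding], tier: str) -> list[Finding]:
    """Return the findings the tier may fix, highest confidence first."""
    allowed = FIXABLE[tier]
    eligible = [f for f in findings if f.severity in allowed]
    return sorted(eligible, key=lambda f: f.confidence, reverse=True)
```

Each item in the returned queue would then go through locate, fix, re-test, classify, and (only if `verified-fixed`) its own commit.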
150
+
151
+ ### Step 6 — Generate regression tests
152
+
153
+ Skip if framework is `none-detected` or tier is `quick`.
154
+
155
+ For each `verified-fixed` finding, write a regression test in the project's convention:
156
+ - Playwright: `*.spec.ts` in the existing test dir, using `test(...)` and `page.*` API.
157
+ - Jest/Vitest: `*.test.ts` covering the unit if the fix was unit-level; integration spec if not.
158
+
159
+ Each test must:
160
+ - Have a name that describes the bug (`'navigates to /users when nav button clicked (regression for QA-003)'`)
161
+ - Have at least one assertion that would have failed pre-fix
162
+ - Run as part of the project's existing test command (verify by running it)
163
+
164
+ Run the new tests once before declaring done. If they fail, the fix didn't actually work — go back to Step 5 with `regressed`.
165
+
166
+ ### Step 7 — Final QA pass
167
+
168
+ After all fixes are committed and tests pass:
169
+
170
+ - Re-navigate the full page map (per tier) one more time.
171
+ - Take fresh screenshots into `<session-output-dir>/qa/<ts>/screenshots/final-*`.
172
+ - Capture final console state.
173
+ - Compute health score (see scoring below).
174
+
175
+ ### Step 8 — Compute health score
176
+
177
+ ```
178
+ score = 100
179
+ - 25 × critical_remaining
180
+ - 10 × high_remaining
181
+ - 3 × medium_remaining
182
+ - 1 × low_remaining
183
+ - 5 × console_errors_remaining
184
+ - 2 × console_warnings_remaining
185
+ (clamped to [0, 100])
186
+ ```
187
+
188
+ If a baseline was provided or auto-detected, compute `delta = current - baseline`. **If delta is negative, warn prominently** — something regressed during this run that wasn't caused by an in-run fix.
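
As a worked illustration of the Step 8 formula and the baseline delta, here is a minimal Python sketch. The function names are hypothetical; the weights are the ones listed in the formula above.

```python
from typing import Optional


def health_score(critical: int = 0, high: int = 0, medium: int = 0,
                 low: int = 0, console_errors: int = 0,
                 console_warnings: int = 0) -> int:
    """Apply the Step 8 weights to the remaining counts, clamped to [0, 100]."""
    raw = (100
           - 25 * critical
           - 10 * high
           - 3 * medium
           - 1 * low
           - 5 * console_errors
           - 2 * console_warnings)
    return max(0, min(100, raw))


def health_delta(current: int, baseline: Optional[int]) -> Optional[int]:
    """current - baseline, or None when no baseline is available."""
    return None if baseline is None else current - baseline
```

With one remaining critical issue, two medium issues, and one console warning, the score is 100 - 25 - 6 - 2 = 67. Against a baseline of 75 the delta is -8, which is the case the skill says to warn about prominently.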
189
+
190
+ ### Step 9 — Write state
191
+
192
+ Write the canonical state to `<session-output-dir>/qa/state.md` (overwritten each run), and
193
+ keep this run's detailed copy at `<session-output-dir>/qa/<ts>/qa-state.md`. The flat pointer
194
+ `uv-out/qa-state.md` already points at `qa/state.md`, so `/commit` and `/ship` read the latest
195
+ run unchanged.
196
+
197
+ State frontmatter schema:
198
+
199
+ ```yaml
200
+ ---
201
+ schema: uv-suite/qa-state/v1
202
+ session_id: <UVS_SESSION_ID env, or "unknown">
203
+ ran_at: <ISO 8601>
204
+ target: <url>
205
+ tier: quick|standard|exhaustive
206
+ viewports: [<list>]
207
+ test_framework: <playwright|jest|vitest|none-detected>
208
+ pages_visited: <n>
209
+ deep_flows_run: <n>
210
+ issues_found:
211
+ critical: <n>
212
+ high: <n>
213
+ medium: <n>
214
+ low: <n>
215
+ issues_remaining:
216
+ critical: <n>
217
+ high: <n>
218
+ medium: <n>
219
+ low: <n>
220
+ console_errors: <n>
221
+ console_warnings: <n>
222
+ fixes_committed: <n>
223
+ fixes_regressed: <n>
224
+ fixes_unchanged: <n>
225
+ regression_tests_added: <n>
226
+ health_score: <0-100>
227
+ baseline_score: <0-100 | null>
228
+ health_delta: <integer | null>
229
+ artifacts:
230
+ screenshots_dir: <session-output-dir>/qa/<ts>/screenshots/
231
+ commits: [<sha>, ...]
232
+ status: complete # or: partial, failed
233
+ ---
234
+ ```
235
+
236
+ Below the frontmatter, write a human-readable report with sections for: page map, findings by severity (with screenshot references), commits made, regression tests added, remaining work, health score breakdown.
237
+
238
+ ### Step 10 — Report to user
239
+
240
+ Output to the user in this order:
241
+ 1. Tier + viewports + pages visited (one line)
242
+ 2. Health score + delta vs baseline (with warning if negative)
243
+ 3. Critical findings (with file:line, fix outcome, screenshot path)
244
+ 4. High findings (same)
245
+ 5. Medium/low counts (pointer to state file for details)
246
+ 6. Commits made + regression tests added
247
+ 7. Pointer to `uv-out/qa-state.md` and screenshots directory
248
+
249
+ If no fixes were applied (tier=`quick` or no critical bugs), say so explicitly — don't pretend there was nothing to fix.
250
+
251
+ ## Notes for downstream skills
252
+
253
+ `/commit` reads `uv-out/qa-state.md` and:
254
+ - Surfaces `issues_remaining.critical > 0` as a blocker (user must override)
255
+ - Includes `health_score` + `health_delta` in commit message footer when run after QA
256
+
257
+ `/ship` reads `uv-out/qa-state.md` and:
258
+ - Blocks PR if `status != complete` or `issues_remaining.critical > 0`
259
+ - Posts the health-score summary as a PR comment
260
+
261
+ ## Voice rules
262
+
263
+ - Lead with what you observed, not what you assumed. "Clicked button at /users/new, no network request fired" beats "submit handler appears broken."
264
+ - Cite the screenshot for every finding (file path). Evidence first.
265
+ - When a bug isn't reproducible, say so and lower confidence — don't fabricate steps.
266
+ - "Health score dropped 8 points; root cause: 2 new console errors after the search input change" beats "QA found regressions."
267
+
268
+ ## Dogfood note (2026-06-05)
269
+
270
+ This skill is new. No formal eval suite gates it; correctness is validated by running it against real apps in this workspace. Expected failure modes to watch for in early runs:
271
+ - Playwright MCP tool names may drift; update `allowed-tools` if `mcp__playwright__*` wildcard stops matching.
272
+ - Health-score weights are a starting point — adjust the formula in Step 8 if real runs produce uninformative scores.
273
+ - Atomic-commit-per-fix can be noisy; consider squashing pre-PR or extending `/commit` to bundle when fixes are related.
274
+ - Auto-detection of framework via config file presence misses monorepos; add support if needed.
@@ -1,21 +1,35 @@
1
1
  ---
2
2
  name: review
3
3
  description: >
4
- Code review for correctness, security, performance, maintainability, and AI slop.
5
- Use before merging or as self-review.
6
- argument-hint: "[file-or-branch]"
4
+ Multi-specialist code review. Dispatches concern-specific subagents in parallel
5
+ (security, performance, testing, maintainability, api-contract, data-migration),
6
+ scores each finding 1-10 for confidence, gates output by tier, and persists state
7
+ to uv-out/review-state.md so /commit and /ship can detect completion. Pass --security
8
+ for a focused tool-backed (Semgrep/Gitleaks/Trivy) OWASP review, --slop for a full
9
+ anti-slop audit (all six slop categories), or --architecture to audit a design against
10
+ its recorded Design Constraints (traceability). Note: ambient slop detection also runs as
11
+ a PostToolUse hook on every write; --slop is the deep on-demand audit.
12
+ argument-hint: "[file-or-branch] [--security|--slop|--architecture]"
7
13
  user-invocable: true
8
14
  context: fork
9
- agent: reviewer
10
15
  model: claude-opus-4-6
11
16
  effort: high
12
17
  allowed-tools:
13
18
  - Read(*)
14
19
  - Grep(*)
15
20
  - Glob(*)
21
+ - Write(uv-out/**)
16
22
  - Bash(git diff *)
17
23
  - Bash(git log *)
18
24
  - Bash(git show *)
25
+ - Bash(git rev-parse *)
26
+ - Bash(git merge-base *)
27
+ - Bash(semgrep *)
28
+ - Bash(gitleaks *)
29
+ - Bash(trivy *)
30
+ - Bash(npm audit *)
31
+ - Bash(pip-audit *)
32
+ - Agent(*)
19
33
  ---
20
34
 
21
35
  ## Changes to review
@@ -30,6 +44,16 @@ allowed-tools:
30
44
 
31
45
  $ARGUMENTS
32
46
 
47
+ ## Session output directory
48
+
49
+ Write the review report and state under this directory (scoped to the current session):
50
+
51
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-session.sh`
52
+
53
+ Stable flat pointer maintained for `/commit` and `/ship`:
54
+
55
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-pointer.sh review-state.md review/state.md`
56
+
33
57
  ## Project context
34
58
 
35
59
  !`cat CLAUDE.md 2>/dev/null || echo "No CLAUDE.md found"`
@@ -42,16 +66,171 @@ $ARGUMENTS
42
66
 
43
67
  ### Architecture map
44
68
 
45
- !`cat uv-out/map-codebase.md 2>/dev/null | head -100 || echo "No codebase map — run /map-codebase first for better review context"`
69
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh map-codebase.md 100 || echo "No codebase map — run /understand first for better review context"`
46
70
 
47
71
  ### Architecture decisions
48
72
 
49
- !`cat uv-out/architecture/decisions.md 2>/dev/null | head -60 || echo "No architecture decisions found"`
73
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/decisions.md' 60 || echo "No architecture decisions found"`
50
74
 
51
75
  ### Acts plan
52
76
 
53
- !`cat uv-out/architecture/acts-plan.md 2>/dev/null | head -60 || echo "No acts plan found"`
77
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/acts-plan.md' 60 || echo "No acts plan found"`
54
78
 
55
79
  ### Session checkpoint (what's in progress)
56
80
 
57
- !`cat uv-out/checkpoints/latest.md 2>/dev/null | head -40 || echo "No checkpoint"`
81
+ !`cat uv-out/current/checkpoints/latest.md 2>/dev/null | head -40 || echo "No checkpoint"`
82
+
83
+ ## Orchestration procedure
84
+
85
+ Execute these steps in order. Do not skip steps.
86
+
87
+ ### Step 0 — Check for a focused mode
88
+
89
+ `$ARGUMENTS` may request a single deep specialist instead of the full review. In a
90
+ focused mode, a diff is **not** required (the target may be a directory or the whole
91
+ project — skip Step 1's stop), run **only** that specialist, and skip all others.
92
+
93
+ - **`--security`** (the former `/security-review`): dispatch the `security` specialist in
94
+ **deep-scan mode** — it runs the available SAST / secret / dependency tools (Semgrep,
95
+ Gitleaks, Trivy) over the target in addition to diff reasoning.
96
+ - **`--slop`** (the former `/slop-check`): dispatch the **anti-slop-guard** agent over the
97
+ target — the full anti-slop audit across all six slop categories (over-engineering,
98
+ architecture, test, doc, error-handling, comment slop), not just the diff-level
99
+ `maintainability` subset that a normal review runs.
100
+ - **`--architecture`**: audit the design against its recorded constraints (no diff needed).
101
+ Load the session's architecture artifacts and dispatch the `architecture-trace` specialist:
102
+
103
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/constraints.md' 120 || echo "No constraints.md — run /architect first (it records design constraints)"`
104
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/decisions.md' 200 || echo "No decisions.md"`
105
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/acts-plan.md' 120 || echo "No acts-plan.md"`
106
+
107
+ Pass those three to the specialist. If `constraints.md` is absent, stop and say so — there
108
+ is nothing to trace against.
109
+
110
+ Otherwise, proceed normally from Step 1.
111
+
112
+ ### Step 1 — Validate diff exists
113
+
114
+ If the diff loaded above is empty or "No diff available", stop and tell the user there's nothing to review. Suggest passing a branch name as `$ARGUMENTS` (e.g., `/review feature/foo`) to review a different target. (Does not apply in a focused `--security`/`--slop`/`--architecture` mode.)
115
+
116
+ ### Step 2 — Classify scope, pick specialists
117
+
118
+ Read the diff and `git diff --stat` output. Decide which specialists to dispatch based on what the diff touches. Default specialist set (always run unless explicitly skipped):
119
+
120
+ | Specialist | Run when diff touches |
121
+ |---|---|
122
+ | security | Anything in auth, sessions, tokens, user input handling, file I/O paths, SQL, shell commands, network calls, secrets |
123
+ | performance | Loops over collections, DB queries, blocking I/O on request paths, caching layers, batch jobs |
124
+ | testing | Files under `test/`, `tests/`, `spec/`, `__tests__/`, or any source file with no corresponding test |
125
+ | maintainability | Always — covers comment slop, over-engineering, error-handling slop |
126
+ | api-contract | Public interfaces, exported types, REST/GraphQL endpoints, library entry points, breaking signature changes |
127
+ | data-migration | SQL DDL, migration files, schema changes, backfill scripts, `ALTER TABLE`, index changes |
128
+
129
+ Skip a specialist if the diff has zero relevance to its scope. Document which you skipped and why in the final report.
130
+
131
+ ### Step 3 — Dispatch specialists in parallel
132
+
133
+ For each selected specialist, you (the orchestrator) read the corresponding specialist prompt at `.claude/skills/review/specialists/<name>.md`, then launch a subagent in a single message (parallel tool-call block) passing the specialist's prompt content + the diff loaded above as the subagent's task.
134
+
135
+ Use `Agent(general-purpose)` for each dispatch. Pass the diff, the specialist prompt content, and the relevant project context. Expected return shape per specialist:
136
+
137
+ ```yaml
138
+ specialist: <name>
139
+ findings:
140
+ - file: <path>
141
+ line: <number or range>
142
+ severity: critical|high|medium|low
143
+ confidence: <1-10>
144
+ title: <one-line summary>
145
+ detail: <what's wrong, why it matters>
146
+ fix_class: auto_fix|ask|info
147
+ suggested_fix: <code or instruction, optional>
148
+ status: complete
149
+ ```
150
+
151
+ ### Step 4 — Aggregate, score, tier
152
+
153
+ Collect all specialist findings. Sort by confidence then severity. Apply tier gating to the user-facing output:
154
+
155
+ | Confidence | Tier | Treatment |
156
+ |---|---|---|
157
+ | 9-10 | Critical | Surface at top, no caveats |
158
+ | 7-8 | High | Surface normally |
159
+ | 5-6 | Medium | Surface with `(medium confidence)` caveat |
160
+ | 3-4 | Low | Move to "Appendix: low-confidence findings" section |
161
+ | 1-2 | Noise | Suppress from output, log to state file only |
162
+
163
+ Confidence scoring rubric (specialists apply this; orchestrator validates):
164
+ - 10: Direct evidence — bug visible in the diff, exploit demonstrable
165
+ - 8-9: Pattern-match with high prior — matches a known anti-pattern with no obvious mitigating context
166
+ - 6-7: Likely issue but depends on context not visible in the diff (call out the assumption)
167
+ - 4-5: Possible issue, would need code outside the diff to confirm
168
+ - 1-3: Speculation, style preference, or "could be cleaner"
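
The tier gating in Step 4 can be sketched in Python. This is a sketch under stated assumptions, not the package's implementation: `confidence_tier` and `gate` are hypothetical names, and "sort by confidence then severity" is assumed to mean highest confidence first, with the more severe finding first on ties.

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}
TIERS = ("critical", "high", "medium", "low", "noise")


def confidence_tier(confidence: int) -> str:
    """Map a 1-10 confidence score to its output tier (Step 4 table)."""
    if not 1 <= confidence <= 10:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= 9:
        return "critical"
    if confidence >= 7:
        return "high"
    if confidence >= 5:
        return "medium"
    if confidence >= 3:
        return "low"
    return "noise"


def gate(findings: list[dict]) -> dict[str, list[dict]]:
    """Sort findings, then bucket them by confidence tier."""
    ordered = sorted(
        findings,
        key=lambda f: (-f["confidence"], SEVERITY_RANK[f["severity"]]),
    )
    buckets: dict[str, list[dict]] = {tier: [] for tier in TIERS}
    for finding in ordered:
        buckets[confidence_tier(finding["confidence"])].append(finding)
    return buckets
```

In this sketch the `low` bucket would feed the appendix and the `noise` bucket would be written to the state file only, matching the Treatment column.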
169
+
170
+ ### Step 5 — Classify Fix-First
171
+
172
+ For each surfaced finding (tier Critical/High/Medium), assign `fix_class`:
173
+
174
+ - `auto_fix`: trivial, mechanical fix where wrong-ness is unambiguous. Apply directly if the user runs `/commit` or asks. Examples: missing `await`, comment slop, dead variable.
175
+ - `ask`: judgment call or risky change. Surface to user, wait for direction. Examples: refactor proposal, security finding requiring threat assessment, API contract break.
176
+ - `info`: not actionable, just worth knowing. Example: "test coverage dropped from 87% to 82% on touched files."
177
+
178
+ ### Step 6 — Write state
179
+
180
+ Write the review state to `<session-output-dir>/review/state.md` (the
181
+ `<session-output-dir>` printed above, e.g. `uv-out/sessions/<sid>/`). The flat pointer
182
+ `uv-out/review-state.md` already points here, so `/commit` and `/ship` read it unchanged.
183
+ Use this exact frontmatter schema so they can parse it:
184
+
185
+ ```yaml
186
+ ---
187
+ schema: uv-suite/review-state/v1
188
+ session_id: <UVS_SESSION_ID env var, or "unknown">
189
+ ran_at: <ISO 8601 timestamp>
190
+ target: <branch | HEAD | $ARGUMENTS>
191
+ diff_stats:
192
+ files_changed: <n>
193
+ additions: <n>
194
+ deletions: <n>
195
+ specialists_run: [security, performance, testing, maintainability, api-contract, data-migration]
196
+ specialists_skipped: [] # with reason in body
197
+ summary:
198
+ critical: <count of confidence 9-10>
199
+ high: <count of confidence 7-8>
200
+ medium: <count of confidence 5-6>
201
+ low: <count of confidence 3-4>
202
+ suppressed: <count of confidence 1-2>
203
+ auto_fix: <count>
204
+ ask: <count>
205
+ info: <count>
206
+ status: complete # or: partial, failed
207
+ ---
208
+ ```
209
+
210
+ Below the frontmatter, write the human-readable findings report (markdown) with one section per tier, then an appendix for low-confidence findings, then a "Specialists skipped" section explaining why each was skipped.
211
+
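The `summary` block in the frontmatter can be derived mechanically from the merged findings. The sketch below shows one way to do that tally, assuming each finding carries a `confidence` of 1-10 and that only surfaced findings carry a `fix_class`. The `Finding` and `Summary` types and the `tierOf` and `summarize` helpers are hypothetical names for illustration, not part of uv-suite.

```typescript
// Hypothetical shapes: only the field names mirror the frontmatter schema.
type FixClass = "auto_fix" | "ask" | "info";
type Tier = "critical" | "high" | "medium" | "low" | "suppressed";

interface Finding {
  confidence: number;   // 1-10, as emitted by a specialist
  fix_class?: FixClass; // assumed present only on surfaced findings
}

type Summary = Record<Tier | FixClass, number>;

// Map a confidence score onto the tier bands named in the schema comments.
function tierOf(confidence: number): Tier {
  if (confidence >= 9) return "critical";
  if (confidence >= 7) return "high";
  if (confidence >= 5) return "medium";
  if (confidence >= 3) return "low";
  return "suppressed";
}

// Count findings per tier and per fix_class.
function summarize(findings: Finding[]): Summary {
  const summary: Summary = {
    critical: 0, high: 0, medium: 0, low: 0, suppressed: 0,
    auto_fix: 0, ask: 0, info: 0,
  };
  for (const f of findings) {
    summary[tierOf(f.confidence)] += 1;
    if (f.fix_class !== undefined) summary[f.fix_class] += 1;
  }
  return summary;
}
```

Deriving the counts from the findings list, rather than writing them by hand, keeps the frontmatter consistent with the report body that follows it.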
212
+ ### Step 7 — Report to user
213
+
214
+ Output to the user in this order:
215
+ 1. One-line summary: counts by tier + total fix_class counts
216
+ 2. Critical findings (each with file:line, title, detail, suggested fix if auto_fix)
217
+ 3. High-confidence findings
218
+ 4. Medium-confidence findings (with caveat)
219
+ 5. Pointer to `uv-out/review-state.md` for the full report including low-confidence findings
220
+
221
+ Do not paste the appendix into the chat unless asked. Keep terminal output focused on what needs attention.
222
+
223
+ ## Notes for downstream skills
224
+
225
+ `/commit` reads `uv-out/review-state.md` and:
226
+ - Refuses to commit if `summary.critical > 0` unless user explicitly overrides
227
+ - Auto-applies `fix_class: auto_fix` findings before commit when `summary.ask == 0`
228
+ - Includes review summary in commit message footer
229
+
230
+ `/ship` reads `uv-out/review-state.md` and:
231
+ - Blocks PR creation if `status != complete` or `summary.critical > 0`
232
+ - Adds review summary to PR body
233
+
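The gates above reduce to a few boolean checks on the parsed frontmatter. This is a minimal sketch under stated assumptions: `ReviewState`, `commitDecision`, and `canShip` are hypothetical names, and the `userOverride` flag stands in for however the user expresses an explicit override, which the rules above do not specify.

```typescript
// Hypothetical subset of the parsed review-state frontmatter.
interface ReviewState {
  status: "complete" | "partial" | "failed";
  summary: { critical: number; ask: number };
}

// /commit: refuse on critical findings unless overridden;
// auto-apply auto_fix findings only when nothing is waiting on the user.
function commitDecision(state: ReviewState, userOverride = false) {
  const blocked = state.summary.critical > 0 && !userOverride;
  return {
    allowed: !blocked,
    applyAutoFixes: !blocked && state.summary.ask === 0,
  };
}

// /ship: block PR creation on an incomplete review or any critical finding.
function canShip(state: ReviewState): boolean {
  return state.status === "complete" && state.summary.critical === 0;
}
```

Note that `/ship` has no override path in this sketch, matching the stricter wording of its rule.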
234
+ ## Dogfood note (2026-06-05)
235
+
236
+ This skill was rewritten with gstack-derived structural patterns (parallel specialist dispatch, confidence-scored output gating, persisted state coupling). No formal eval gate was used; correctness is validated by running against real PRs in this workspace and tuning specialist prompts as gaps surface. If you spot a finding category that's getting missed or a noise pattern that's leaking through tier gating, edit the relevant specialist file or this orchestrator directly.
@@ -0,0 +1,122 @@
1
+ # Specialist: API Contract
2
+
3
+ You are the API-contract specialist for `/review`. You receive a diff and project context. You scan only for breaking changes to interfaces other code depends on. Other specialists cover correctness, security, etc.
4
+
5
+ ## Your scope
6
+
7
+ You own these concern areas:
8
+
9
+ - Public function/method signature changes (exported or otherwise consumed across module/package boundaries)
10
+ - Type/schema changes on data crossing module or service boundaries
11
+ - REST/GraphQL/RPC endpoint contract changes (URLs, methods, request/response shapes, status codes)
12
+ - Library entry-point changes (default exports, re-exports, package.json `exports` field)
13
+ - Event/message schema changes (Kafka, webhooks, internal pub/sub)
14
+ - Database schema changes affecting application code shape (delegated to data-migration specialist for SQL safety, but the API contract impact is yours)
15
+
16
+ Out of scope: implementation correctness, perf, security. Internal-only refactors that don't cross any boundary.
17
+
18
+ ## Detection rules — flag with confidence 9-10 (Critical)
19
+
20
+ Direct evidence in the diff. Cite file:line.
21
+
22
+ 1. **Removed exported symbol.** `export function foo` deleted, or removed from `index.ts` re-exports, with no deprecation warning or compat shim. Critical regardless of whether you see consumers in this repo — consumers may be downstream.
23
+
24
+ 2. **Required → required-different-shape parameter change.**
25
+ ```ts
26
+ // before
27
+ export function send(user: { id: string }) { ... }
28
+ // after
29
+ export function send(user: { uuid: string }) { ... }
30
+ ```
31
+ Typed callers fail at compile time; untyped or loosely typed callers (plain JavaScript, `any`, parsed JSON) keep passing `id` and break silently at runtime.
32
+
33
+ 3. **REST endpoint removed or renamed.** A route handler deleted, or path pattern changed, with no `301` redirect or compat alias.
34
+
35
+ 4. **REST response field removed or renamed.** A field consumers parse is no longer present. Critical even if "no one's using it", because you can't prove that from the diff.
36
+
37
+ 5. **HTTP status code semantics changed.** `return 200` becomes `return 201` for the same operation, or success path now returns `204` instead of `200 { ok: true }`. Clients pattern-match on status codes.
38
+
39
+ 6. **Event/message schema field removed.** Field deleted from a published Kafka schema, webhook payload, or pub/sub message. Subscribers break on next message.
40
+
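For the parameter-shape change in rule 2, one possible compat shim is to accept both shapes for a deprecation window. The sketch below assumes the `send` example from that rule; `UserRef` and `resolveUuid` are hypothetical helpers, and `send` returns a string only so the behavior is observable.

```typescript
// Accept the new shape and, temporarily, the old one.
type UserRef = { uuid: string } | { id: string }; // `id` kept as a deprecated alias

// Normalize either shape to the new identifier.
function resolveUuid(user: UserRef): string {
  return "uuid" in user ? user.uuid : user.id;
}

// Would be exported in a real module; kept local so the sketch runs standalone.
function send(user: UserRef): string {
  return `sent to ${resolveUuid(user)}`;
}

console.log(send({ id: "a1" }));   // old callers keep working
console.log(send({ uuid: "u9" })); // new callers use the new field
```

A shim like this is what a `suggested_fix` for rule 2 might point at; whether the old shape should log a deprecation warning is a project decision.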
41
+ ## Detection rules — flag with confidence 7-8 (High)
42
+
43
+ Strong pattern match. State assumptions.
44
+
45
+ 1. **Required parameter added to an exported function.** New param without default value, no overload. Callers that don't pass it break. Assumption: there are callers — flag regardless.
46
+
47
+ 2. **Return type narrowed.** `Promise<User | null>` becomes `Promise<User>`, or array becomes single object. Callers that handle the broader case may have dead code, but more importantly type checks may break.
48
+
49
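+ 3. **Enum value removed.** `enum Status { Active, Pending, Archived }` → `enum Status { Active, Archived }`. References to `Status.Pending` stop compiling, and because numeric enum members are numbered by position, `Archived` shifts from `2` to `1`, so persisted or serialized values silently change meaning.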
50
+
51
+ 4. **Field type changed without runtime conversion.** `id: number` becomes `id: string`. Equality checks, hash keys, serialization shape all break.
52
+
53
+ 5. **Default value changed.** `function foo(x = 10)` becomes `function foo(x = 0)`. Behavior change for any caller that relied on the default.
54
+
55
+ 6. **GraphQL schema: non-nullable field becomes nullable on a response type.** A field declared `String!` becomes `String`. Frontends may crash on a null they didn't expect.
56
+
57
+ 7. **Renamed exported symbol with no alias.**
58
+ ```ts
59
+ // before: export { fetchUser };
60
+ // after: export { getUser }; // no `export { getUser as fetchUser }`
61
+ ```
62
+
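The enum rule is easy to underestimate, because removing a member from a numeric TypeScript enum also renumbers the members after it. The sketch below illustrates this with two hypothetical enums, `StatusBefore` and `StatusAfter`, standing in for the same `Status` enum before and after the diff; `labelAfter` is a hypothetical helper that decodes a stored number with the new definition.

```typescript
// Same enum before and after the change (names are illustrative only).
enum StatusBefore { Active, Pending, Archived } // Active=0, Pending=1, Archived=2
enum StatusAfter { Active, Archived }           // Active=0, Archived=1

// Decode a previously stored numeric value using the new enum's reverse mapping.
function labelAfter(raw: number): string | undefined {
  return StatusAfter[raw];
}

// A row saved as Archived (2) no longer maps to any member.
console.log(labelAfter(StatusBefore.Archived));
// A row saved as Pending (1) is now read back as "Archived".
console.log(labelAfter(StatusBefore.Pending));
```

String enums, or numeric enums with explicit values, avoid the renumbering, though removing a member still breaks any consumer that references it.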
63
+ ## Detection rules — flag with confidence 5-6 (Medium)
64
+
65
+ Need context to confirm.
66
+
67
+ 1. **Optional parameter added in the middle of a positional signature.** `function(a, b)` → `function(a, newOpt, b)`. TypeScript catches at compile; runtime callers in dynamic languages don't.
68
+
69
+ 2. **New required field on a response object.** Clients written defensively (`r.foo ?? default`) survive; clients that destructure assertively don't.
70
+
71
+ 3. **Configuration key renamed.** Env var, config file key, feature flag name changed. Caveat: depends on whether the renamed key has a migration path.
72
+
73
+ 4. **Behavior change on edge case without signature change.** `divide(10, 0)` returned `Infinity`, now throws. Same signature, different contract.
74
+
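The edge-case rule (item 4) is the hardest to spot in a diff because the signature is identical. A minimal sketch of the `divide` example, with `divideBefore` and `divideAfter` as hypothetical names for the two versions:

```typescript
// Before: division by zero follows IEEE 754 and yields Infinity.
function divideBefore(a: number, b: number): number {
  return a / b;
}

// After: same signature, but the zero case now throws.
function divideAfter(a: number, b: number): number {
  if (b === 0) throw new RangeError("division by zero");
  return a / b;
}

// A caller written against the old contract guards with isFinite and never catches.
function safeRatio(divide: (a: number, b: number) => number, a: number, b: number): number {
  const r = divide(a, b);
  return Number.isFinite(r) ? r : 0;
}

console.log(safeRatio(divideBefore, 10, 0)); // guard handles Infinity
try {
  safeRatio(divideAfter, 10, 0);
} catch (e) {
  console.log("caller now sees an exception:", (e as Error).message);
}
```

The type checker accepts both versions, which is why this class of change is flagged at Medium with a request for context rather than detected mechanically.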
75
+ ## Detection rules — flag with confidence 3-4 (Low)
76
+
77
+ Surface to appendix only.
78
+
79
+ 1. **Renamed parameter on an exported function (no signature change).** Keyword-args languages (Python, etc.) — callers using `foo(bar=...)` break. In positional languages, no break but docs/IDE confused.
80
+
81
+ 2. **JSDoc/docstring removed or changed materially without code change.** Consumers reading the docs see a different contract than before.
82
+
83
+ ## What NOT to flag (anti-noise)
84
+
85
+ - Internal helpers that aren't exported and aren't called from other modules.
86
+ - Tightening type annotations that don't change runtime behavior (`unknown` → `string` where input was always strings).
87
+ - Adding new optional parameters at the end of a signature with a sensible default.
88
+ - Adding new fields to a response object (additive change).
89
+ - New endpoints, new exports — these aren't breaking.
90
+ - Internal database column rename when no app code references the column by old name (verify by Grep).
91
+
92
+ ## Output format
93
+
94
+ ```yaml
95
+ specialist: api-contract
96
+ findings:
97
+ - file: <path>
98
+ line: <n or range>
99
+ severity: critical|high|medium|low
100
+ confidence: <1-10>
101
+ title: <one line>
102
+ detail: <2-4 sentences including: what changed, who's affected, the migration path if any>
103
+ fix_class: auto_fix|ask|info
104
+ suggested_fix: <e.g., "Add alias export: `export { newName as oldName }`">
105
+ status: complete
106
+ ```
107
+
108
+ If nothing found:
109
+
110
+ ```yaml
111
+ specialist: api-contract
112
+ findings: []
113
+ status: complete
114
+ notes: <e.g., "Diff touches only internal helpers, no exports or schemas changed">
115
+ ```
116
+
117
+ ## Voice rules
118
+
119
+ - Name who breaks: "Callers of `sendEmail` that pass `userId` as a string now fail at runtime when..."
120
+ - Distinguish "breaks at compile time" (TS catches it) from "breaks at runtime silently" (much worse).
121
+ - If a compat shim is available, suggest it concretely: alias export, redirect route, deprecation header.
122
+ - Don't flag every signature change — only the ones with external callers in plausible scope.