npm - uv-suite - Versions diffs - 0.28.0 → 0.30.0 - Mend

uv-suite 0.28.0 → 0.30.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (151) hide show

package/LICENSE +21 -0
package/README.md +58 -35
package/agents/claude-code/anti-slop-guard.md +14 -1
package/agents/claude-code/architect.md +30 -4
package/agents/claude-code/cartographer.md +18 -6
package/agents/claude-code/eval-writer.md +7 -2
package/agents/claude-code/reviewer.md +5 -1
package/agents/claude-code/spec-writer.md +30 -7
package/agents/generate.py +88 -0
package/bin/cli.js +51 -48
package/hooks/auto-checkpoint-helper.sh +2 -2
package/hooks/auto-checkpoint.sh +3 -3
package/hooks/auto-restore-on-start.sh +30 -0
package/hooks/checkpoint-helper.sh +40 -35
package/hooks/git-context.sh +41 -0
package/hooks/lite-mode-inject.sh +26 -0
package/hooks/session-end-helper.sh +2 -2
package/hooks/session-end.sh +2 -2
package/hooks/session-label-nag.sh +2 -2
package/hooks/session-meta.sh +18 -1
package/hooks/session-review-reminder.sh +2 -2
package/hooks/session-start.sh +16 -0
package/hooks/slop-grep.sh +12 -31
package/hooks/uv-out-best.sh +20 -0
package/hooks/uv-out-collect.sh +52 -0
package/hooks/uv-out-notify.sh +28 -0
package/hooks/uv-out-pointer.sh +16 -0
package/hooks/uv-out-session.sh +24 -0
package/hooks/watchtower-notify.sh +45 -0
package/hooks/watchtower-send.sh +4 -0
package/install.sh +93 -42
package/package.json +2 -2
package/personas/auto.json +40 -1
package/personas/professional.json +46 -1
package/personas/spike.json +32 -2
package/personas/sport.json +44 -1
package/settings.json +6 -2
package/skills/architect/SKILL.md +109 -8
package/skills/architect/specialists/distributed-systems.md +84 -0
package/skills/architect/specialists/full-stack.md +92 -0
package/skills/architect/specialists/llm-ai-engineering.md +86 -0
package/skills/architect/specialists/ml-systems.md +81 -0
package/skills/commit/SKILL.md +5 -2
package/skills/confirm/SKILL.md +3 -3
package/skills/investigate/SKILL.md +14 -4
package/skills/lite/SKILL.md +45 -0
package/skills/qa/SKILL.md +274 -0
package/skills/review/SKILL.md +187 -8
package/skills/review/specialists/api-contract.md +122 -0
package/skills/review/specialists/architecture-trace.md +64 -0
package/skills/review/specialists/data-migration.md +113 -0
package/skills/review/specialists/maintainability.md +138 -0
package/skills/review/specialists/performance.md +115 -0
package/skills/review/specialists/security.md +132 -0
package/skills/review/specialists/testing.md +109 -0
package/skills/session/SKILL.md +87 -0
package/skills/session/operations/auto.md +22 -0
package/skills/session/operations/checkpoint.md +43 -0
package/skills/session/operations/end.md +35 -0
package/skills/session/operations/init.md +16 -0
package/skills/session/operations/restore.md +16 -0
package/skills/spec/SKILL.md +40 -1
package/skills/test/SKILL.md +89 -0
package/skills/test/specialists/eval.md +46 -0
package/skills/test/specialists/integration.md +42 -0
package/skills/test/specialists/unit.md +39 -0
package/skills/understand/SKILL.md +118 -0
package/skills/understand/modes/repo.md +38 -0
package/skills/understand/modes/stack.md +41 -0
package/skills/uv-help/SKILL.md +43 -20
package/uv.sh +36 -3
package/watchtower/Dockerfile +9 -0
package/watchtower/README.md +78 -0
package/watchtower/app/__init__.py +0 -0
package/watchtower/app/__pycache__/__init__.cpython-314.pyc +0 -0
package/watchtower/app/__pycache__/db.cpython-314.pyc +0 -0
package/watchtower/app/__pycache__/main.cpython-314.pyc +0 -0
package/watchtower/app/__pycache__/models.cpython-314.pyc +0 -0
package/watchtower/app/db.py +85 -0
package/watchtower/app/main.py +45 -0
package/watchtower/app/models.py +49 -0
package/watchtower/app/routers/__init__.py +0 -0
package/watchtower/app/routers/__pycache__/__init__.cpython-314.pyc +0 -0
package/watchtower/app/routers/__pycache__/control.cpython-314.pyc +0 -0
package/watchtower/app/routers/__pycache__/ingest.cpython-314.pyc +0 -0
package/watchtower/app/routers/__pycache__/query.cpython-314.pyc +0 -0
package/watchtower/app/routers/__pycache__/stream.cpython-314.pyc +0 -0
package/watchtower/app/routers/control.py +144 -0
package/watchtower/app/routers/ingest.py +102 -0
package/watchtower/app/routers/query.py +84 -0
package/watchtower/app/routers/stream.py +30 -0
package/watchtower/app/services/__init__.py +0 -0
package/watchtower/app/services/__pycache__/__init__.cpython-314.pyc +0 -0
package/watchtower/app/services/__pycache__/checkpoint.cpython-314.pyc +0 -0
package/watchtower/app/services/__pycache__/tmux.cpython-314.pyc +0 -0
package/watchtower/app/services/checkpoint.py +107 -0
package/watchtower/app/services/tmux.py +54 -0
package/watchtower/docker-compose.yml +22 -0
package/watchtower/events.json +10373 -1
package/watchtower/{auto-checkpoint-runner.js → legacy/auto-checkpoint-runner.js} +29 -2
package/watchtower/{dashboard.html → legacy/dashboard.html} +261 -0
package/watchtower/{server.js → legacy/server.js} +63 -0
package/watchtower/legacy/snapshot-manager.js +305 -0
package/watchtower/requirements.txt +3 -0
package/watchtower/schema.sql +43 -0
package/watchtower/static/dashboard.html +449 -0
package/agents/claude-code/devops.md +0 -50
package/agents/claude-code/security.md +0 -75
package/agents/codex/anti-slop-guard.toml +0 -12
package/agents/codex/architect.toml +0 -11
package/agents/codex/cartographer.toml +0 -16
package/agents/codex/devops.toml +0 -8
package/agents/codex/eval-writer.toml +0 -11
package/agents/codex/prototype-builder.toml +0 -10
package/agents/codex/reviewer.toml +0 -16
package/agents/codex/security.toml +0 -14
package/agents/codex/spec-writer.toml +0 -11
package/agents/codex/test-writer.toml +0 -13
package/agents/cursor/anti-slop-guard.mdc +0 -22
package/agents/cursor/architect.mdc +0 -24
package/agents/cursor/cartographer.mdc +0 -28
package/agents/cursor/devops.mdc +0 -16
package/agents/cursor/eval-writer.mdc +0 -21
package/agents/cursor/prototype-builder.mdc +0 -25
package/agents/cursor/reviewer.mdc +0 -26
package/agents/cursor/security.mdc +0 -20
package/agents/cursor/spec-writer.mdc +0 -27
package/agents/cursor/test-writer.mdc +0 -28
package/agents/portable/anti-slop-guard.md +0 -71
package/agents/portable/architect.md +0 -83
package/agents/portable/cartographer.md +0 -64
package/agents/portable/devops.md +0 -56
package/agents/portable/eval-writer.md +0 -70
package/agents/portable/prototype-builder.md +0 -70
package/agents/portable/reviewer.md +0 -79
package/agents/portable/security.md +0 -63
package/agents/portable/spec-writer.md +0 -89
package/agents/portable/test-writer.md +0 -56
package/hooks/context-warning.sh +0 -4
package/skills/auto-checkpoint/SKILL.md +0 -47
package/skills/checkpoint/SKILL.md +0 -105
package/skills/map-codebase/SKILL.md +0 -54
package/skills/map-stack/SKILL.md +0 -121
package/skills/restore/SKILL.md +0 -55
package/skills/security-review/SKILL.md +0 -87
package/skills/session-end/SKILL.md +0 -100
package/skills/session-init/SKILL.md +0 -45
package/skills/slop-check/SKILL.md +0 -40
package/skills/write-evals/SKILL.md +0 -34
package/skills/write-tests/SKILL.md +0 -54
/package/watchtower/{auto-checkpoint-prompt.md → legacy/auto-checkpoint-prompt.md} +0 -0

package/skills/qa/SKILL.md ADDED Viewed

@@ -0,0 +1,274 @@
+---
+name: qa
+description: >
+  Browser-based QA: exercises the running app via Playwright MCP, captures console
+  errors and visual evidence, optionally fixes source bugs with atomic commits and
+  generates regression tests. Three tiers (quick / standard / exhaustive). Writes
+  uv-out/qa-state.md so /commit and /ship can detect completion and read the health
+  score.
+argument-hint: "[url] [--tier quick|standard|exhaustive]"
+user-invocable: true
+context: fork
+model: claude-opus-4-6
+effort: high
+allowed-tools:
+  - Read(*)
+  - Grep(*)
+  - Glob(*)
+  - Write(uv-out/**)
+  - Write(test/**)
+  - Write(tests/**)
+  - Write(__tests__/**)
+  - Edit(*)
+  - Bash(git status *)
+  - Bash(git diff *)
+  - Bash(git log *)
+  - Bash(git add *)
+  - Bash(git commit *)
+  - Bash(git rev-parse *)
+  - Bash(mkdir *)
+  - Bash(ls *)
+  - Bash(cat *)
+  - Bash(date *)
+  - Bash(curl -sf *)
+  - mcp__playwright__*
+---
+## Target
+$ARGUMENTS
+Parse `$ARGUMENTS`:
+- First positional token without `--` prefix → app URL (defaults to `http://localhost:3000` if not given).
+- `--tier quick|standard|exhaustive` → tier (defaults to `standard`).
+- `--baseline <path>` → path to a prior `qa-state.md` for health-score delta (defaults to `uv-out/qa-state.md`, the flat pointer to the latest run, if present).
+## Session output directory
+Write QA artifacts under this directory (scoped to the current session). The
+`<session-output-dir>/qa/<ts>/` paths below all resolve under it:
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-session.sh`
+Stable flat pointer maintained for `/commit` and `/ship`:
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-pointer.sh qa-state.md qa/state.md`
+## Project context
+!`cat CLAUDE.md 2>/dev/null || echo "No CLAUDE.md found"`
+## Detected test framework
+!`(test -f playwright.config.ts && echo "playwright") || (test -f playwright.config.js && echo "playwright") || (test -f vitest.config.ts && echo "vitest") || (test -f jest.config.js && echo "jest") || (grep -l '"jest"' package.json 2>/dev/null && echo "jest") || echo "none-detected"`
+## Tier definitions
+| Tier | Viewports | Pages | Console capture | Fix bugs | Generate regression tests |
+|---|---|---|---|---|---|
+| `quick` | 1 (desktop 1440×900) | Top-level entry + 1 deep link | Yes | No (report only) | No |
+| `standard` | 2 (desktop 1440×900, mobile 390×844) | Entry + every link in nav + 2 deep flows | Yes | Only `critical` severity | When framework detected |
+| `exhaustive` | 3 (desktop, tablet 1024×768, mobile) | Every reachable page from nav + 3 deep flows + edge-case inputs | Yes | All `critical` + `high` | Yes when framework detected |
+If `$ARGUMENTS` doesn't specify a tier, default to `standard`.
+## Orchestration procedure
+Execute these steps in order.
+### Step 1 — Setup
+- Confirm Playwright MCP is reachable (`mcp__playwright__browser_*` tools available). If not, stop and tell user to enable the Playwright MCP server.
+- Verify the target URL is reachable: `curl -sf <url> > /dev/null`. If not, stop and ask user to start the dev server.
+- Create output directory: `mkdir -p <session-output-dir>/qa/<ISO-timestamp>/screenshots` (the `<session-output-dir>` printed above).
+- If framework detected, note its config path; you'll write regression tests against it later.
+### Step 2 — Orient
+- Navigate to the target URL. Take a full-page screenshot saved as `<session-output-dir>/qa/<ts>/screenshots/00-entry-desktop.png`.
+- Take a DOM accessibility snapshot (`browser_snapshot`) and use it to enumerate navigable elements: nav links, buttons with handlers, forms.
+- Read browser console messages (`browser_console_messages`). Record any errors or warnings at this point as Tier-0 findings (page didn't even load cleanly).
+- Build a page map: list of URLs reachable from the entry page within one click, with rough page type (list, detail, form, dashboard, etc.).
+Output the page map to the user as a short table before moving on. Pause if the map is empty or suspicious (single-page app with no nav surfaced? — note it).
+### Step 3 — Explore
+Based on tier, visit pages and interact:
+- **Quick:** Entry + 1 deep link. Click into a representative detail page. Capture screenshot per page per viewport.
+- **Standard:** Entry + every nav link + 2 deep flows. A "deep flow" = a multi-step user action (e.g., signup form, search-then-click-result, add-to-cart).
+- **Exhaustive:** Every reachable page + 3 deep flows + edge cases (empty input, overlength input, special characters, rapid clicks).
+For each interaction:
+1. Take a `before` screenshot.
+2. Perform the action via `browser_click` / `browser_type` / etc.
+3. Take an `after` screenshot.
+4. Check console for new errors via `browser_console_messages`.
+5. Verify the URL changed or DOM updated as expected.
+Switch viewports per tier (call `browser_resize` between passes). Re-run the entry + key flows in each viewport.
+Document every finding as you go — don't trust memory across the run.
+### Step 4 — Triage
+Sort findings into severity buckets:
+| Severity | Definition |
+|---|---|
+| `critical` | Page crashes, blank renders, console errors blocking user actions, broken auth, security warning visible to user |
+| `high` | Dead button (click does nothing), form submit fails silently, broken nav link, content overflow that hides critical info |
+| `medium` | Visual regression visible to user (layout shift, missing image, alignment), console warning, slow interaction (>2s with no spinner) |
+| `low` | Minor visual nit, accessibility warning that doesn't block use, console info-level message |
+Assign confidence 1-10 to each (same rubric as `/review`):
+- 9-10: Reproduced in this run with screenshot evidence
+- 7-8: Pattern-matched against a known bug class (e.g., 404 on resource fetch in console) with screenshot
+- 5-6: Likely issue but couldn't reproduce reliably (e.g., race condition observed once)
+- 3-4: Suspicious but not confirmed
+- 1-2: Speculation, suppress from output
+### Step 5 — Fix loop (skip for tier `quick`)
+For each finding the tier allows fixing (`critical` always; `high` only on `exhaustive`), in confidence order high to low:
+1. **Locate source.** Use Grep/Glob to find the component, handler, or route responsible. Read it.
+2. **Fix.** Apply the smallest change that addresses the bug. No refactor surrounding code.
+3. **Re-test.** Reload the affected page, replay the action, capture new `after` screenshot.
+4. **Classify outcome:**
+   - `verified-fixed`: bug no longer reproduces; new screenshot shows expected state
+   - `regressed`: bug fixed but a new issue appeared in re-test
+   - `unchanged`: fix didn't take effect; revert and mark `ask` for human review
+5. **Commit** only verified-fixed. One commit per fix:
+   ```
+   git commit -m "fix(qa): <severity> <one-line> — confidence <n>"
+   ```
+   Include the fix file path + before/after screenshot paths in the commit body.
+Never bundle multiple fixes in one commit — atomic per fix so revert is precise.
+### Step 6 — Generate regression tests
+Skip if framework is `none-detected` or tier is `quick`.
+For each `verified-fixed` finding, write a regression test in the project's convention:
+- Playwright: `*.spec.ts` in the existing test dir, using `test(...)` and `page.*` API.
+- Jest/Vitest: `*.test.ts` covering the unit if the fix was unit-level; integration spec if not.
+Each test must:
+- Have a name that describes the bug (`'navigates to /users when nav button clicked (regression for QA-003)'`)
+- Have at least one assertion that would have failed pre-fix
+- Run as part of the project's existing test command (verify by running it)
+Run the new tests once before declaring done. If they fail, the fix didn't actually work — go back to Step 5 with `regressed`.
+### Step 7 — Final QA pass
+After all fixes are committed and tests pass:
+- Re-navigate the full page map (per tier) one more time.
+- Take fresh screenshots into `<session-output-dir>/qa/<ts>/screenshots/final-*`.
+- Capture final console state.
+- Compute health score (see scoring below).
+### Step 8 — Compute health score
+```
+score = 100
+  - 25 × critical_remaining
+  - 10 × high_remaining
+  - 3 × medium_remaining
+  - 1 × low_remaining
+  - 5 × console_errors_remaining
+  - 2 × console_warnings_remaining
+(clamped to [0, 100])
+```
+If a baseline was provided or auto-detected, compute `delta = current - baseline`. **If delta is negative, warn prominently** — something regressed during this run that wasn't caused by an in-run fix.
+### Step 9 — Write state
+Write the canonical state to `<session-output-dir>/qa/state.md` (overwritten each run), and
+keep this run's detailed copy at `<session-output-dir>/qa/<ts>/qa-state.md`. The flat pointer
+`uv-out/qa-state.md` already points at `qa/state.md`, so `/commit` and `/ship` read the latest
+run unchanged.
+State frontmatter schema:
+```yaml
+---
+schema: uv-suite/qa-state/v1
+session_id: <UVS_SESSION_ID env, or "unknown">
+ran_at: <ISO 8601>
+target: <url>
+tier: quick|standard|exhaustive
+viewports: [<list>]
+test_framework: <playwright|jest|vitest|none-detected>
+pages_visited: <n>
+deep_flows_run: <n>
+issues_found:
+  critical: <n>
+  high: <n>
+  medium: <n>
+  low: <n>
+issues_remaining:
+  critical: <n>
+  high: <n>
+  medium: <n>
+  low: <n>
+console_errors: <n>
+console_warnings: <n>
+fixes_committed: <n>
+fixes_regressed: <n>
+fixes_unchanged: <n>
+regression_tests_added: <n>
+health_score: <0-100>
+baseline_score: <0-100 | null>
+health_delta: <integer | null>
+artifacts:
+  screenshots_dir: <session-output-dir>/qa/<ts>/screenshots/
+  commits: [<sha>, ...]
+status: complete   # or: partial, failed
+---
+```
+Below the frontmatter, write a human-readable report with sections for: page map, findings by severity (with screenshot references), commits made, regression tests added, remaining work, health score breakdown.
+### Step 10 — Report to user
+Output to the user in this order:
+1. Tier + viewports + pages visited (one line)
+2. Health score + delta vs baseline (with warning if negative)
+3. Critical findings (with file:line, fix outcome, screenshot path)
+4. High findings (same)
+5. Medium/low counts (pointer to state file for details)
+6. Commits made + regression tests added
+7. Pointer to `uv-out/qa-state.md` and screenshots directory
+If no fixes were applied (tier=`quick` or no critical bugs), say so explicitly — don't pretend there was nothing to fix.
+## Notes for downstream skills
+`/commit` reads `uv-out/qa-state.md` and:
+- Surfaces `issues_remaining.critical > 0` as a blocker (user must override)
+- Includes `health_score` + `delta` in commit message footer when run after QA
+`/ship` reads `uv-out/qa-state.md` and:
+- Blocks PR if `status != complete` or `issues_remaining.critical > 0`
+- Posts the health-score summary as a PR comment
+## Voice rules
+- Lead with what you observed, not what you assumed. "Clicked button at /users/new, no network request fired" beats "submit handler appears broken."
+- Cite the screenshot for every finding (file path). Evidence first.
+- When a bug isn't reproducible, say so and lower confidence — don't fabricate steps.
+- "Health score dropped 8 points; root cause: 2 new console errors after the search input change" beats "QA found regressions."
+## Dogfood note (2026-06-05)
+This skill is new. No formal eval suite gates it; correctness is validated by running it against real apps in this workspace. Expected failure modes to watch for in early runs:
+- Playwright MCP tool names may drift; update `allowed-tools` if `mcp__playwright__*` wildcard stops matching.
+- Health-score weights are a starting point — adjust the formula in Step 8 if real runs produce uninformative scores.
+- Atomic-commit-per-fix can be noisy; consider squashing pre-PR or extending `/commit` to bundle when fixes are related.
+- Auto-detection of framework via config file presence misses monorepos; add support if needed.

package/skills/review/SKILL.md CHANGED Viewed

@@ -1,21 +1,35 @@
 ---
 name: review
 description: >
-  Code review for correctness, security, performance, maintainability, and AI slop.
-  Use before merging or as self-review.
-argument-hint: "[file-or-branch]"
+  Multi-specialist code review. Dispatches concern-specific subagents in parallel
+  (security, performance, testing, maintainability, api-contract, data-migration),
+  scores each finding 1-10 for confidence, gates output by tier, and persists state
+  to uv-out/review-state.md so /commit and /ship can detect completion. Pass --security
+  for a focused tool-backed (Semgrep/Gitleaks/Trivy) OWASP review, --slop for a full
+  anti-slop audit (all six slop categories), or --architecture to audit a design against
+  its recorded Design Constraints (traceability). Note: ambient slop detection also runs as
+  a PostToolUse hook on every write; --slop is the deep on-demand audit.
+argument-hint: "[file-or-branch] [--security|--slop|--architecture]"
 user-invocable: true
 context: fork
-agent: reviewer
 model: claude-opus-4-6
 effort: high
 allowed-tools:
   - Read(*)
   - Grep(*)
   - Glob(*)
+  - Write(uv-out/**)
   - Bash(git diff *)
   - Bash(git log *)
   - Bash(git show *)
+  - Bash(git rev-parse *)
+  - Bash(git merge-base *)
+  - Bash(semgrep *)
+  - Bash(gitleaks *)
+  - Bash(trivy *)
+  - Bash(npm audit *)
+  - Bash(pip audit *)
+  - Agent(*)
 ---
 ## Changes to review
@@ -30,6 +44,16 @@ allowed-tools:
 $ARGUMENTS
+## Session output directory
+Write the review report and state under this directory (scoped to the current session):
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-session.sh`
+Stable flat pointer maintained for `/commit` and `/ship`:
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-pointer.sh review-state.md review/state.md`
 ## Project context
 !`cat CLAUDE.md 2>/dev/null || echo "No CLAUDE.md found"`
@@ -42,16 +66,171 @@ $ARGUMENTS
 ### Architecture map
-!`cat uv-out/map-codebase.md 2>/dev/null | head -100 || echo "No codebase map — run /map-codebase first for better review context"`
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh map-codebase.md 100 || echo "No codebase map — run /understand first for better review context"`
 ### Architecture decisions
-!`cat uv-out/architecture/decisions.md 2>/dev/null | head -60 || echo "No architecture decisions found"`
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/decisions.md' 60 || echo "No architecture decisions found"`
 ### Acts plan
-!`cat uv-out/architecture/acts-plan.md 2>/dev/null | head -60 || echo "No acts plan found"`
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/acts-plan.md' 60 || echo "No acts plan found"`
 ### Session checkpoint (what's in progress)
-!`cat uv-out/checkpoints/latest.md 2>/dev/null | head -40 || echo "No checkpoint"`
+!`cat uv-out/current/checkpoints/latest.md 2>/dev/null | head -40 || echo "No checkpoint"`
+## Orchestration procedure
+Execute these steps in order. Do not skip steps.
+### Step 0 — Check for a focused mode
+`$ARGUMENTS` may request a single deep specialist instead of the full review. In a
+focused mode, a diff is **not** required (the target may be a directory or the whole
+project — skip Step 1's stop), run **only** that specialist, and skip all others.
+- **`--security`** (the former `/review --security`): dispatch the `security` specialist in
+  **deep-scan mode** — it runs the available SAST / secret / dependency tools (Semgrep,
+  Gitleaks, Trivy) over the target in addition to diff reasoning.
+- **`--slop`** (the former `/review --slop`): dispatch the **anti-slop-guard** agent over the
+  target — the full anti-slop audit across all six slop categories (over-engineering,
+  architecture, test, doc, error-handling, comment slop), not just the diff-level
+  `maintainability` subset that a normal review runs.
+- **`--architecture`**: audit the design against its recorded constraints (no diff needed).
+  Load the session's architecture artifacts and dispatch the `architecture-trace` specialist:
+  !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/constraints.md' 120 || echo "No constraints.md — run /architect first (it records design constraints)"`
+  !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/decisions.md' 200 || echo "No decisions.md"`
+  !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/acts-plan.md' 120 || echo "No acts-plan.md"`
+  Pass those three to the specialist. If `constraints.md` is absent, stop and say so — there
+  is nothing to trace against.
+Otherwise, proceed normally from Step 1.
+### Step 1 — Validate diff exists
+If the diff loaded above is empty or "No diff available", stop and tell the user there's nothing to review. Suggest `$ARGUMENTS` should be a branch name (e.g., `/review feature/foo`) if they want to review a different target. (Does not apply in a focused `--security`/`--slop`/`--architecture` mode.)
+### Step 2 — Classify scope, pick specialists
+Read the diff and `git diff --stat` output. Decide which specialists to dispatch based on what the diff touches. Default specialist set (always run unless explicitly skipped):
+| Specialist | Run when diff touches |
+|---|---|
+| security | Anything in auth, sessions, tokens, user input handling, file I/O paths, SQL, shell commands, network calls, secrets |
+| performance | Loops over collections, DB queries, blocking I/O on request paths, caching layers, batch jobs |
+| testing | Files under `test/`, `tests/`, `spec/`, `__tests__/`, or any source file with no corresponding test |
+| maintainability | Always — covers comment slop, over-engineering, error-handling slop |
+| api-contract | Public interfaces, exported types, REST/GraphQL endpoints, library entry points, breaking signature changes |
+| data-migration | SQL DDL, migration files, schema changes, backfill scripts, `ALTER TABLE`, index changes |
+Skip a specialist if the diff has zero relevance to its scope. Document which you skipped and why in the final report.
+### Step 3 — Dispatch specialists in parallel
+For each selected specialist, you (the orchestrator) read the corresponding specialist prompt at `.claude/skills/review/specialists/<name>.md`, then launch a subagent in a single message (parallel tool-call block) passing the specialist's prompt content + the diff loaded above as the subagent's task.
+Use `Agent(general-purpose)` for each dispatch. Pass the diff, the specialist prompt content, and the relevant project context. Expected return shape per specialist:
+```yaml
+specialist: <name>
+findings:
+  - file: <path>
+    line: <number or range>
+    severity: critical|high|medium|low
+    confidence: <1-10>
+    title: <one-line summary>
+    detail: <what's wrong, why it matters>
+    fix_class: auto_fix|ask|info
+    suggested_fix: <code or instruction, optional>
+status: complete
+```
+### Step 4 — Aggregate, score, tier
+Collect all specialist findings. Sort by confidence then severity. Apply tier gating to the user-facing output:
+| Confidence | Tier | Treatment |
+|---|---|---|
+| 9-10 | Critical | Surface at top, no caveats |
+| 7-8 | High | Surface normally |
+| 5-6 | Medium | Surface with `(medium confidence)` caveat |
+| 3-4 | Low | Move to "Appendix: low-confidence findings" section |
+| 1-2 | Noise | Suppress from output, log to state file only |
+Confidence scoring rubric (specialists apply this; orchestrator validates):
+- 10: Direct evidence — bug visible in the diff, exploit demonstrable
+- 8-9: Pattern-match with high prior — matches a known anti-pattern with no obvious mitigating context
+- 6-7: Likely issue but depends on context not visible in the diff (call out the assumption)
+- 4-5: Possible issue, would need code outside the diff to confirm
+- 1-3: Speculation, style preference, or "could be cleaner"
+### Step 5 — Classify Fix-First
+For each surfaced finding (tier Critical/High/Medium), assign `fix_class`:
+- `auto_fix`: trivial, mechanical fix where wrong-ness is unambiguous. Apply directly if the user runs `/commit` or asks. Examples: missing `await`, comment slop, dead variable.
+- `ask`: judgment call or risky change. Surface to user, wait for direction. Examples: refactor proposal, security finding requiring threat assessment, API contract break.
+- `info`: not actionable, just worth knowing. Example: "test coverage dropped from 87% to 82% on touched files."
+### Step 6 — Write state
+Write the review state to `<session-output-dir>/review/state.md` (the
+`<session-output-dir>` printed above, e.g. `uv-out/sessions/<sid>/`). The flat pointer
+`uv-out/review-state.md` already points here, so `/commit` and `/ship` read it unchanged.
+Use this exact frontmatter schema so they can parse it:
+```yaml
+---
+schema: uv-suite/review-state/v1
+session_id: <UVS_SESSION_ID env var, or "unknown">
+ran_at: <ISO 8601 timestamp>
+target: <branch | HEAD | $ARGUMENTS>
+diff_stats:
+  files_changed: <n>
+  additions: <n>
+  deletions: <n>
+specialists_run: [security, performance, testing, maintainability, api-contract, data-migration]
+specialists_skipped: []   # with reason in body
+summary:
+  critical: <count of confidence 9-10>
+  high: <count of confidence 7-8>
+  medium: <count of confidence 5-6>
+  low: <count of confidence 3-4>
+  suppressed: <count of confidence 1-2>
+  auto_fix: <count>
+  ask: <count>
+  info: <count>
+status: complete   # or: partial, failed
+---
+```
+Below the frontmatter, write the human-readable findings report (markdown) with one section per tier, then an appendix for low-confidence findings, then a "Specialists skipped" section explaining why each was skipped.
+### Step 7 — Report to user
+Output to the user in this order:
+1. One-line summary: counts by tier + total fix_class counts
+2. Critical findings (each with file:line, title, detail, suggested fix if auto_fix)
+3. High-confidence findings
+4. Medium-confidence findings (with caveat)
+5. Pointer to `uv-out/review-state.md` for the full report including low-confidence findings
+Do not paste the appendix into the chat unless asked. Keep terminal output focused on what needs attention.
+## Notes for downstream skills
+`/commit` reads `uv-out/review-state.md` and:
+- Refuses to commit if `summary.critical > 0` unless user explicitly overrides
+- Auto-applies `fix_class: auto_fix` findings before commit when `summary.ask == 0`
+- Includes review summary in commit message footer
+`/ship` reads `uv-out/review-state.md` and:
+- Blocks PR creation if `status != complete` or `summary.critical > 0`
+- Adds review summary to PR body
+## Dogfood note (2026-06-05)
+This skill was rewritten with gstack-derived structural patterns (parallel specialist dispatch, confidence-scored output gating, persisted state coupling). No formal eval gate was used; correctness is validated by running against real PRs in this workspace and tuning specialist prompts as gaps surface. If you spot a finding category that's getting missed or a noise pattern that's leaking through tier gating, edit the relevant specialist file or this orchestrator directly.

package/skills/review/specialists/api-contract.md ADDED Viewed

@@ -0,0 +1,122 @@
+# Specialist: API Contract
+You are the API-contract specialist for `/review`. You receive a diff and project context. You scan only for breaking changes to interfaces other code depends on. Other specialists cover correctness, security, etc.
+## Your scope
+You own these concern areas:
+- Public function/method signature changes (exported or otherwise consumed across module/package boundaries)
+- Type/schema changes on data crossing module or service boundaries
+- REST/GraphQL/RPC endpoint contract changes (URLs, methods, request/response shapes, status codes)
+- Library entry-point changes (default exports, re-exports, package.json `exports` field)
+- Event/message schema changes (Kafka, webhooks, internal pub/sub)
+- Database schema changes affecting application code shape (delegated to data-migration specialist for SQL safety, but the API contract impact is yours)
+Out of scope: implementation correctness, perf, security. Internal-only refactors that don't cross any boundary.
+## Detection rules — flag with confidence 9-10 (Critical)
+Direct evidence in the diff. Cite file:line.
+1. **Removed exported symbol.** `export function foo` deleted, or removed from `index.ts` re-exports, with no deprecation warning or compat shim. Critical regardless of whether you see consumers in this repo — consumers may be downstream.
+2. **Required → required-different-shape parameter change.**
+   ```ts
+   // before
+   export function send(user: { id: string }) { ... }
+   // after
+   export function send(user: { uuid: string }) { ... }
+   ```
+   Existing callers break silently if the field is optional in TypeScript's structural type or implicitly stringly-typed.
+3. **REST endpoint removed or renamed.** A route handler deleted, or path pattern changed, with no `301` redirect or compat alias.
+4. **REST response field removed or renamed.** A field consumers parse no longer present. Critical even if "no one's using it" — you can't prove that from the diff.
+5. **HTTP status code semantics changed.** `return 200` becomes `return 201` for the same operation, or success path now returns `204` instead of `200 { ok: true }`. Clients pattern-match on status codes.
+6. **Event/message schema field removed.** Field deleted from a published Kafka schema, webhook payload, or pub/sub message. Subscribers break on next message.
+## Detection rules — flag with confidence 7-8 (High)
+Strong pattern match. State assumptions.
+1. **Required parameter added to an exported function.** New param without default value, no overload. Callers that don't pass it break. Assumption: there are callers — flag regardless.
+2. **Return type narrowed.** `Promise<User | null>` becomes `Promise<User>`, or array becomes single object. Callers that handle the broader case may have dead code, but more importantly type checks may break.
+3. **Enum value removed.** `enum Status { Active, Pending, Archived }` → `enum Status { Active, Archived }`. Any switch on the old value silently misses cases.
+4. **Field type changed without runtime conversion.** `id: number` becomes `id: string`. Equality checks, hash keys, serialization shape all break.
+5. **Default value changed.** `function foo(x = 10)` becomes `function foo(x = 0)`. Behavior change for any caller that relied on the default.
+6. **GraphQL schema: required field becomes nullable, or non-nullable becomes nullable on response type.** Frontends may crash on null they didn't expect.
+7. **Renamed exported symbol with no alias.**
+   ```ts
+   // before: export { fetchUser };
+   // after:  export { getUser };  // no `export { getUser as fetchUser }`
+   ```
+## Detection rules — flag with confidence 5-6 (Medium)
+Need context to confirm.
+1. **Optional parameter added in the middle of a positional signature.** `function(a, b)` → `function(a, newOpt, b)`. TypeScript catches at compile; runtime callers in dynamic languages don't.
+2. **New required field on a response object.** Clients written defensively (`r.foo ?? default`) survive; clients that destructure assertively don't.
+3. **Configuration key renamed.** Env var, config file key, feature flag name changed. Caveat: depends on whether the renamed key has a migration path.
+4. **Behavior change on edge case without signature change.** `divide(10, 0)` returned `Infinity`, now throws. Same signature, different contract.
+## Detection rules — flag with confidence 3-4 (Low)
+Surface to appendix only.
+1. **Renamed parameter on an exported function (no signature change).** Keyword-args languages (Python, etc.) — callers using `foo(bar=...)` break. In positional languages, no break but docs/IDE confused.
+2. **JSDoc/docstring removed or changed materially without code change.** Consumers reading docs see different contract than before.
+## What NOT to flag (anti-noise)
+- Internal helpers that aren't exported and aren't called from other modules.
+- Tightening type annotations that don't change runtime behavior (`unknown` → `string` where input was always strings).
+- Adding new optional parameters at the end of a signature with a sensible default.
+- Adding new fields to a response object (additive change).
+- New endpoints, new exports — these aren't breaking.
+- Internal database column rename when no app code references the column by old name (verify by Grep).
+## Output format
+```yaml
+specialist: api-contract
+findings:
+  - file: <path>
+    line: <n or range>
+    severity: critical|high|medium|low
+    confidence: <1-10>
+    title: <one line>
+    detail: <2-4 sentences including: what changed, who's affected, the migration path if any>
+    fix_class: auto_fix|ask|info
+    suggested_fix: <e.g., "Add alias export: `export { newName as oldName }`">
+status: complete
+```
+If nothing found:
+```yaml
+specialist: api-contract
+findings: []
+status: complete
+notes: <e.g., "Diff touches only internal helpers, no exports or schemas changed">
+```
+## Voice rules
+- Name who breaks: "Callers of `sendEmail` that pass `userId` as a string now fail at runtime when..."
+- Distinguish "breaks at compile time" (TS catches it) from "breaks at runtime silently" (much worse).
+- If a compat shim is available, suggest it concretely: alias export, redirect route, deprecation header.
+- Don't flag every signature change — only the ones with external callers in plausible scope.