npm - uv-suite - Versions diffs - 0.28.0 → 0.30.0 - Mend

uv-suite 0.28.0 → 0.30.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (151) hide show

package/LICENSE +21 -0
package/README.md +58 -35
package/agents/claude-code/anti-slop-guard.md +14 -1
package/agents/claude-code/architect.md +30 -4
package/agents/claude-code/cartographer.md +18 -6
package/agents/claude-code/eval-writer.md +7 -2
package/agents/claude-code/reviewer.md +5 -1
package/agents/claude-code/spec-writer.md +30 -7
package/agents/generate.py +88 -0
package/bin/cli.js +51 -48
package/hooks/auto-checkpoint-helper.sh +2 -2
package/hooks/auto-checkpoint.sh +3 -3
package/hooks/auto-restore-on-start.sh +30 -0
package/hooks/checkpoint-helper.sh +40 -35
package/hooks/git-context.sh +41 -0
package/hooks/lite-mode-inject.sh +26 -0
package/hooks/session-end-helper.sh +2 -2
package/hooks/session-end.sh +2 -2
package/hooks/session-label-nag.sh +2 -2
package/hooks/session-meta.sh +18 -1
package/hooks/session-review-reminder.sh +2 -2
package/hooks/session-start.sh +16 -0
package/hooks/slop-grep.sh +12 -31
package/hooks/uv-out-best.sh +20 -0
package/hooks/uv-out-collect.sh +52 -0
package/hooks/uv-out-notify.sh +28 -0
package/hooks/uv-out-pointer.sh +16 -0
package/hooks/uv-out-session.sh +24 -0
package/hooks/watchtower-notify.sh +45 -0
package/hooks/watchtower-send.sh +4 -0
package/install.sh +93 -42
package/package.json +2 -2
package/personas/auto.json +40 -1
package/personas/professional.json +46 -1
package/personas/spike.json +32 -2
package/personas/sport.json +44 -1
package/settings.json +6 -2
package/skills/architect/SKILL.md +109 -8
package/skills/architect/specialists/distributed-systems.md +84 -0
package/skills/architect/specialists/full-stack.md +92 -0
package/skills/architect/specialists/llm-ai-engineering.md +86 -0
package/skills/architect/specialists/ml-systems.md +81 -0
package/skills/commit/SKILL.md +5 -2
package/skills/confirm/SKILL.md +3 -3
package/skills/investigate/SKILL.md +14 -4
package/skills/lite/SKILL.md +45 -0
package/skills/qa/SKILL.md +274 -0
package/skills/review/SKILL.md +187 -8
package/skills/review/specialists/api-contract.md +122 -0
package/skills/review/specialists/architecture-trace.md +64 -0
package/skills/review/specialists/data-migration.md +113 -0
package/skills/review/specialists/maintainability.md +138 -0
package/skills/review/specialists/performance.md +115 -0
package/skills/review/specialists/security.md +132 -0
package/skills/review/specialists/testing.md +109 -0
package/skills/session/SKILL.md +87 -0
package/skills/session/operations/auto.md +22 -0
package/skills/session/operations/checkpoint.md +43 -0
package/skills/session/operations/end.md +35 -0
package/skills/session/operations/init.md +16 -0
package/skills/session/operations/restore.md +16 -0
package/skills/spec/SKILL.md +40 -1
package/skills/test/SKILL.md +89 -0
package/skills/test/specialists/eval.md +46 -0
package/skills/test/specialists/integration.md +42 -0
package/skills/test/specialists/unit.md +39 -0
package/skills/understand/SKILL.md +118 -0
package/skills/understand/modes/repo.md +38 -0
package/skills/understand/modes/stack.md +41 -0
package/skills/uv-help/SKILL.md +43 -20
package/uv.sh +36 -3
package/watchtower/Dockerfile +9 -0
package/watchtower/README.md +78 -0
package/watchtower/app/__init__.py +0 -0
package/watchtower/app/__pycache__/__init__.cpython-314.pyc +0 -0
package/watchtower/app/__pycache__/db.cpython-314.pyc +0 -0
package/watchtower/app/__pycache__/main.cpython-314.pyc +0 -0
package/watchtower/app/__pycache__/models.cpython-314.pyc +0 -0
package/watchtower/app/db.py +85 -0
package/watchtower/app/main.py +45 -0
package/watchtower/app/models.py +49 -0
package/watchtower/app/routers/__init__.py +0 -0
package/watchtower/app/routers/__pycache__/__init__.cpython-314.pyc +0 -0
package/watchtower/app/routers/__pycache__/control.cpython-314.pyc +0 -0
package/watchtower/app/routers/__pycache__/ingest.cpython-314.pyc +0 -0
package/watchtower/app/routers/__pycache__/query.cpython-314.pyc +0 -0
package/watchtower/app/routers/__pycache__/stream.cpython-314.pyc +0 -0
package/watchtower/app/routers/control.py +144 -0
package/watchtower/app/routers/ingest.py +102 -0
package/watchtower/app/routers/query.py +84 -0
package/watchtower/app/routers/stream.py +30 -0
package/watchtower/app/services/__init__.py +0 -0
package/watchtower/app/services/__pycache__/__init__.cpython-314.pyc +0 -0
package/watchtower/app/services/__pycache__/checkpoint.cpython-314.pyc +0 -0
package/watchtower/app/services/__pycache__/tmux.cpython-314.pyc +0 -0
package/watchtower/app/services/checkpoint.py +107 -0
package/watchtower/app/services/tmux.py +54 -0
package/watchtower/docker-compose.yml +22 -0
package/watchtower/events.json +10373 -1
package/watchtower/{auto-checkpoint-runner.js → legacy/auto-checkpoint-runner.js} +29 -2
package/watchtower/{dashboard.html → legacy/dashboard.html} +261 -0
package/watchtower/{server.js → legacy/server.js} +63 -0
package/watchtower/legacy/snapshot-manager.js +305 -0
package/watchtower/requirements.txt +3 -0
package/watchtower/schema.sql +43 -0
package/watchtower/static/dashboard.html +449 -0
package/agents/claude-code/devops.md +0 -50
package/agents/claude-code/security.md +0 -75
package/agents/codex/anti-slop-guard.toml +0 -12
package/agents/codex/architect.toml +0 -11
package/agents/codex/cartographer.toml +0 -16
package/agents/codex/devops.toml +0 -8
package/agents/codex/eval-writer.toml +0 -11
package/agents/codex/prototype-builder.toml +0 -10
package/agents/codex/reviewer.toml +0 -16
package/agents/codex/security.toml +0 -14
package/agents/codex/spec-writer.toml +0 -11
package/agents/codex/test-writer.toml +0 -13
package/agents/cursor/anti-slop-guard.mdc +0 -22
package/agents/cursor/architect.mdc +0 -24
package/agents/cursor/cartographer.mdc +0 -28
package/agents/cursor/devops.mdc +0 -16
package/agents/cursor/eval-writer.mdc +0 -21
package/agents/cursor/prototype-builder.mdc +0 -25
package/agents/cursor/reviewer.mdc +0 -26
package/agents/cursor/security.mdc +0 -20
package/agents/cursor/spec-writer.mdc +0 -27
package/agents/cursor/test-writer.mdc +0 -28
package/agents/portable/anti-slop-guard.md +0 -71
package/agents/portable/architect.md +0 -83
package/agents/portable/cartographer.md +0 -64
package/agents/portable/devops.md +0 -56
package/agents/portable/eval-writer.md +0 -70
package/agents/portable/prototype-builder.md +0 -70
package/agents/portable/reviewer.md +0 -79
package/agents/portable/security.md +0 -63
package/agents/portable/spec-writer.md +0 -89
package/agents/portable/test-writer.md +0 -56
package/hooks/context-warning.sh +0 -4
package/skills/auto-checkpoint/SKILL.md +0 -47
package/skills/checkpoint/SKILL.md +0 -105
package/skills/map-codebase/SKILL.md +0 -54
package/skills/map-stack/SKILL.md +0 -121
package/skills/restore/SKILL.md +0 -55
package/skills/security-review/SKILL.md +0 -87
package/skills/session-end/SKILL.md +0 -100
package/skills/session-init/SKILL.md +0 -45
package/skills/slop-check/SKILL.md +0 -40
package/skills/write-evals/SKILL.md +0 -34
package/skills/write-tests/SKILL.md +0 -54
/package/watchtower/{auto-checkpoint-prompt.md → legacy/auto-checkpoint-prompt.md} +0 -0

package/skills/session/SKILL.md ADDED Viewed

@@ -0,0 +1,87 @@
+---
+name: session
+description: >
+  Manage the UV Suite session lifecycle: label it (init), save a checkpoint,
+  restore a prior checkpoint, end it cleanly, or toggle auto-checkpoints.
+  One skill, subcommand-routed. The automatic counterparts (PostToolUse
+  auto-checkpoint, SessionStart auto-restore) remain hooks. Checkpoints are
+  stored per-session under uv-out/sessions/<session-id>/checkpoints/, so
+  concurrent terminals don't clobber each other.
+argument-hint: "init|checkpoint|restore|end|auto [args]"
+user-invocable: true
+allowed-tools:
+  - Read(*)
+  - Write(*)
+  - Bash(git status *)
+  - Bash(git diff *)
+  - Bash(git log *)
+  - Bash(git branch *)
+  - Bash(git rev-parse *)
+  - Bash(date *)
+  - Bash(ls *)
+  - Bash(mkdir *)
+  - Bash(cat *)
+  - Bash(echo *)
+  - Bash(grep *)
+  - Bash(find *)
+  - Bash(*/.claude/hooks/session-meta.sh *)
+  - Bash(*/.claude/hooks/checkpoint-helper.sh *)
+  - Bash(*/.claude/hooks/session-end-helper.sh *)
+  - Bash(*/.claude/hooks/auto-checkpoint-helper.sh *)
+---
+## Dispatch
+The first word of `$ARGUMENTS` is the subcommand; the rest are its arguments. The
+block below runs only the context-gathering for the chosen subcommand and prints a
+`SUBCOMMAND=...` marker — follow the matching section under "Instructions".
+```!
+ARGS="$ARGUMENTS"; SUB=$(printf '%s' "$ARGS" | awk '{print $1}'); REST=$(printf '%s' "$ARGS" | sed 's/^[^ ]* *//'); [ "$REST" = "$SUB" ] && REST=""
+H="${CLAUDE_PROJECT_DIR:-.}/.claude/hooks"
+case "$SUB" in
+  init)
+    if [ -z "$REST" ] || [ "$REST" = show ]; then "$H/session-meta.sh" show
+    elif [ "$REST" = clear ]; then "$H/session-meta.sh" clear
+    else F=$(printf '%s' "$REST"|awk '{print $1}'); R2=$(printf '%s' "$REST"|sed 's/^[^ ]* *//')
+      case "$F" in
+        --kind) "$H/session-meta.sh" set-kind "$R2";;
+        --priority) "$H/session-meta.sh" set-priority "$R2";;
+        --purpose) "$H/session-meta.sh" set-purpose "$R2";;
+        --name) "$H/session-meta.sh" set-name "$R2";;
+        *) "$H/session-meta.sh" set-name "$REST";;
+      esac
+    fi; echo "SUBCOMMAND=init";;
+  auto)
+    "$H/auto-checkpoint-helper.sh" $REST; echo "SUBCOMMAND=auto";;
+  restore)
+    "$H/checkpoint-helper.sh" list; echo "---LATEST---"; "$H/checkpoint-helper.sh" latest; echo "SUBCOMMAND=restore REST=$REST";;
+  checkpoint|end)
+    "$H/checkpoint-helper.sh" dir; echo "---FRONTMATTER---"; "$H/checkpoint-helper.sh" frontmatter; echo "---META---"; "$H/checkpoint-helper.sh" meta
+    echo "---GIT---"; git branch --show-current 2>/dev/null || echo "(not a git repo)"; git status --short 2>/dev/null | head -20; git log --oneline -5 2>/dev/null
+    echo "SUBCOMMAND=$SUB REST=$REST";;
+  *) echo "SUBCOMMAND=help (got: '$SUB')";;
+esac
+```
+## Instructions
+Read the `SUBCOMMAND=...` marker printed by the dispatch block above, then read
+and follow the matching operation file:
+- `init` → `skills/session/operations/init.md`
+- `checkpoint` → `skills/session/operations/checkpoint.md`
+- `restore` → `skills/session/operations/restore.md`
+- `end` → `skills/session/operations/end.md`
+- `auto` → `skills/session/operations/auto.md`
+The operation file references the context the dispatch block already printed
+above (helper output, checkpoint-dir, frontmatter, git state, session list).
+### help / unknown
+Tell the user the subcommands and their one-line usage:
+- `/session init <name>|--kind|--priority|--purpose|show|clear` — label the session
+- `/session checkpoint [label]` — save a checkpoint
+- `/session restore [<id-prefix>|<name>|list]` — load a prior checkpoint
+- `/session end [label]` — write the final checkpoint and terminate
+- `/session auto on|off|<minutes>|status` — toggle automatic checkpoints

package/skills/session/operations/auto.md ADDED Viewed

@@ -0,0 +1,22 @@
+# auto — toggle automatic checkpoints
+Show the line of output above; it confirms the new mode/interval. No commentary.
+The change applies to the very next interval — no restart needed.
+Usage: `/session auto on|off|<minutes>|status` (default interval 10 min, range
+1-1440). `status` or no argument prints the current mode and interval.
+How it works:
+- **Tier A (mechanical):** the `auto-checkpoint.sh` hook runs after each tool
+  call. When the interval has passed and there's been activity since the last
+  checkpoint, it writes a deterministic snapshot — git state, recent tool calls,
+  files touched — to `uv-out/sessions/<sid>/checkpoints/auto-<ts>-mechanical.md`.
+- **Tier B (semantic):** the Watchtower (`uvs watch`) keeps a timer. Every N
+  minutes, for each active session, it shells out to `claude -p --bare --model
+  haiku` with a prompt from recent dashboard events and git state, saving
+  `auto-<ts>-semantic.md` next to the mechanical one. Cap: `--max-budget-usd
+  0.05` per call.
+- Both tiers fire `AutoCheckpoint` events to the Watchtower. Sessions with zero
+  activity in the interval are skipped — no empty checkpoints.
+State lives in `.uv-suite-state/auto-checkpoint.json` (mode + interval) and
+`.uv-suite-state/sessions/<sid>.last-{mechanical,semantic}-checkpoint.txt`.

package/skills/session/operations/checkpoint.md ADDED Viewed

@@ -0,0 +1,43 @@
+# checkpoint — save a checkpoint
+Using the checkpoint-dir, frontmatter, and git state printed above, write
+`<checkpoint-dir>/YYYY-MM-DD-HHMM.md` (current timestamp; append `-REST` to the
+filename if `REST` is non-empty). The directory is per-session — two `uv`
+launches in the same repo write to different folders, so checkpoints don't
+collide. **Begin the file with the exact YAML frontmatter block printed above** —
+`/session restore` parses these fields to pick the right checkpoint and display
+session context. Also write/overwrite `<checkpoint-dir>/latest.md` with the same
+content so the next session's restore finds the freshest state. Body (keep under
+80 lines; frontmatter is required and not counted):
+```markdown
+# Checkpoint: [date] [time] [label if provided]
+## What was accomplished
+- [Concrete things done this session — files, commits, decisions]
+- [Be specific: "Added webhook retry logic to PaymentService" not "worked on payments"]
+## Key decisions made
+- [Decision]: [Why] — [What was considered and rejected]
+- [Only decisions that affect future work]
+## Current state
+- Branch: [current git branch]
+- Uncommitted changes: [yes/no, summary if yes]
+- Tests: [passing/failing/not run]
+- Blockers: [any unresolved issues]
+## Files modified
+- [Key files changed, not every file]
+## What's next
+- [Immediate next step the next session should start with]
+- [Remaining tasks from the current Act/plan]
+## Context the next session needs
+- [Non-obvious facts, workarounds, "this looks wrong but it's intentional because..."]
+- [Environment setup notes if relevant]
+```
+Capture WHY decisions were made, not just what. "Worked on auth" is useless;
+"Added JWT refresh token rotation with 7-day expiry" is useful.

package/skills/session/operations/end.md ADDED Viewed

@@ -0,0 +1,35 @@
+# end — close the session cleanly
+1. Write a final checkpoint exactly like `checkpoint` above, but name it
+   `<checkpoint-dir>/final-YYYY-MM-DD-HHMM.md` (append `-REST` to the filename if
+   given) and **override** the frontmatter line
+   `checkpoint_kind: auto-mechanical` to read `checkpoint_kind: final-manual` so
+   it's distinguishable from the auto-checkpoints. Also write/overwrite
+   `<checkpoint-dir>/latest.md` with the same content so `/session restore` picks
+   it up. Use the live conversation context — this is the highest-fidelity record
+   before the session closes; anything not written here is lost unless it's in
+   code or the auto-checkpoints. Final-checkpoint body structure:
+   ```markdown
+   # Final checkpoint: [date] [time] [label if provided]
+   ## What was accomplished
+   - Concrete things done across the whole session — files, commits, decisions
+   ## Key decisions made
+   - Decision: Why — what was considered, what was rejected
+   ## Current state
+   - Branch / uncommitted changes / tests status / blockers
+   ## Open threads
+   - Anything left in flight that the next session needs to pick up
+   ## Context the next session needs
+   - Non-obvious facts, workarounds, "this looks wrong but it's intentional because…"
+   ```
+2. Then run `"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/session-end-helper.sh` (via
+   Bash) to mark the session terminated and flip the Watchtower badge to
+   Terminated. Show the user the two-line output; don't add commentary. This does
+   **not** close the terminal or exit Claude Code — the user does that.

package/skills/session/operations/init.md ADDED Viewed

@@ -0,0 +1,16 @@
+# init — label the session
+Show the helper output above. Do not add commentary; the command confirms what
+changed. The Watchtower dashboard refreshes within a few seconds (next event
+refreshes session metadata).
+Usage: `/session init <name>` · `--kind long|outcome` · `--priority low|med|high`
+· `--purpose <text>` · `show` · `clear`. Each call sets one field; run multiple
+times to set more.
+Notes:
+- The session id is generated by `uv.sh` at launch and exported as
+  `UVS_SESSION_ID`; launching `claude` directly falls back to an
+  `ad-hoc-<timestamp>` id.
+- Metadata lives at `.uv-suite-state/sessions/<id>.json`.
+- Persona is captured at launch time and is not editable here — re-launch with a
+  different `uv` persona to change it.

package/skills/session/operations/restore.md ADDED Viewed

@@ -0,0 +1,16 @@
+# restore — load a prior checkpoint
+The available-sessions list (with `*` marking the current session) and the
+current session's `latest.md` are printed above.
+- **`REST` empty or `latest`:** read the current session's `latest.md` shown
+  above. Summarize in 3-4 sentences (what was done, current state, what's next),
+  including the session's name and purpose from the frontmatter. Then ask:
+  "Ready to pick up from here, or do you want to take a different direction?"
+- **`REST` = `list`:** just show the available-sessions list above and ask which
+  one they want to restore.
+- **`REST` looks like a session id prefix** (8-char hex / UUID-ish) **or a
+  session name:** match it against the list, then Read the matching session's
+  `<project>/uv-out/sessions/<full-session-id>/checkpoints/latest.md` (fall back
+  to the legacy `<project>/uv-out/checkpoints/<full-session-id>/latest.md` if the
+  new path is absent), and summarize as above.
+- **No match:** list the available sessions and ask the user to pick one.

package/skills/spec/SKILL.md CHANGED Viewed

@@ -13,13 +13,30 @@ allowed-tools:
   - Read(*)
   - Grep(*)
   - Glob(*)
-  - Write(*)
+  - Write(uv-out/**)
+  - AskUserQuestion
 ---
 ## Requirements
 $ARGUMENTS
+## Session output directory
+Write the spec under this directory (it is scoped to the current session, so artifacts
+are attributable to the session that produced them):
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-session.sh`
+## Step 0 — Gather context (do this FIRST, before writing)
+The codebase map, prior specs, and decisions for this project are auto-loaded below (from
+`uv-out/`). Read them first — they are the indexed view of the existing codebase. Then use
+`AskUserQuestion` to ask the user only for what those don't cover: specific files, modules,
+API contracts, or data models the spec must account for. Read every file they name before
+drafting. Greenfield (no map, no relevant code) → proceed without. Don't guess at relevant
+files — ground in the map or ask.
 ## Project context
 !`cat CLAUDE.md 2>/dev/null || echo "No CLAUDE.md found"`
@@ -28,6 +45,28 @@ $ARGUMENTS
 !`ls -la docs/architecture* 2>/dev/null || echo "No architecture docs found"`
+## What we already know about this codebase (from uv-out)
+Prior UV Suite artifacts for THIS project. Ground the spec in them — reference real modules
+and patterns from the map, reuse existing conventions, and build on (don't re-spec) what's
+already captured.
+### Codebase map (architecture, domains, entry points)
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh map-codebase.md 120 || echo "No codebase map — run /understand first for a grounded spec"`
+### Stack map (if a multi-service system)
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh map-stack.md 80 || echo "No stack map"`
+### Prior specs (build on these; don't duplicate)
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'specs/*.md' 80 || echo "No prior specs"`
+### Architecture decisions
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/decisions.md' 60 || echo "No architecture decisions found"`
 ## Today's date
 !`date +%Y-%m-%d`

package/skills/test/SKILL.md ADDED Viewed

@@ -0,0 +1,89 @@
+---
+name: test
+description: >
+  Write tests or LLM evals. Routes to a specialist: the test-writer agent for
+  code tests (unit/integration), the eval-writer agent for AI/LLM evals (--eval).
+  Use after implementing a feature, when coverage is low, or when building/
+  changing an LLM-powered feature.
+argument-hint: "[target] [--unit|--integration|--eval]"
+user-invocable: true
+context: fork
+model: claude-opus-4-6
+effort: high
+allowed-tools:
+  - Read(*)
+  - Grep(*)
+  - Glob(*)
+  - Agent(*)
+---
+## Target
+$ARGUMENTS
+## Mode
+The orchestrator picks one specialist per run. Default is `--unit`.
+```!
+case "$ARGUMENTS" in
+  *--eval*)        echo "MODE=eval        → specialists/eval.md        (eval-writer agent)";;
+  *--integration*) echo "MODE=integration → specialists/integration.md (test-writer agent)";;
+  *)               echo "MODE=unit        → specialists/unit.md        (test-writer agent)";;
+esac
+```
+## Context for code tests (unit / integration)
+### Existing test patterns (match these)
+!`find . \( -name "*.test.*" -o -name "*.spec.*" -o -name "test_*" \) -not -path "*/node_modules/*" 2>/dev/null | head -5`
+!`cat $(find . \( -name "*.test.*" -o -name "*.spec.*" \) -not -path "*/node_modules/*" 2>/dev/null | head -1) 2>/dev/null | head -40 || echo "No existing tests found"`
+### Project test command
+!`cat package.json 2>/dev/null | grep -A2 '"test"' || echo "No package.json test script"`
+## Context for evals (--eval)
+### Existing eval framework
+!`find . \( -name "*eval*" -o -name "*evals*" \) -not -path "*/node_modules/*" 2>/dev/null | head -10 || echo "No eval files found"`
+### Session output directory
+Write eval artifacts under this directory (scoped to the current session):
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-session.sh`
+## Shared prior analysis
+### Spec (what to test / evaluate against)
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'specs/*.md' 60 || echo "No spec found — work from code/prompt behavior"`
+### Acts plan (current task context)
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/acts-plan.md' 40 || echo "No acts plan found"`
+### Session checkpoint
+!`cat uv-out/current/checkpoints/latest.md 2>/dev/null | head -40 || echo "No checkpoint"`
+## Dispatch
+Read the `MODE` line above. It names the specialist prompt file and the agent to use:
+| MODE | Specialist prompt | Agent |
+|---|---|---|
+| unit | `skills/test/specialists/unit.md` | `test-writer` |
+| integration | `skills/test/specialists/integration.md` | `test-writer` |
+| eval | `skills/test/specialists/eval.md` | `eval-writer` |
+You (the orchestrator) read the matching specialist prompt at
+`.claude/skills/test/specialists/<mode>.md`, then dispatch the named agent via the Agent
+tool, passing the specialist prompt content + the target (`$ARGUMENTS`) + the gathered
+context above as the agent's task. Do not write the tests or evals yourself — the agent does.
+Relay the agent's result. If `$ARGUMENTS` names no target, ask what to test.

package/skills/test/specialists/eval.md ADDED Viewed

@@ -0,0 +1,46 @@
+# Specialist: LLM / AI evals
+Dispatch the `eval-writer` agent to write graded evals for the target prompt or LLM-powered feature.
+## Agent
+`Agent(eval-writer)`.
+## What to produce
+A set of eval cases that grade the target's actual behavior against explicit criteria, using
+the project's existing eval framework (shown in the gathered context). Each eval defines:
+- **Criteria / rubric** — the specific, observable properties a correct response must have
+  (e.g., "cites a source for every factual claim", "refuses without revealing the system
+  prompt", "returns valid JSON matching the schema"). Concrete and checkable, not vague
+  ("is helpful", "sounds good").
+- **Cases** — both representative and adversarial:
+  - Representative: the inputs the feature is built for, including realistic variation.
+  - Adversarial: prompt injection, contradictory or out-of-scope requests, malformed input,
+    boundary/edge inputs, and attempts to make the model violate a stated constraint.
+- **Pass/fail conditions** — for each case, what makes the response pass vs fail. Prefer
+  deterministic checks (exact match, regex, schema validation, presence/absence of a string)
+  where possible; use an LLM judge with a written rubric only where behavior is open-ended,
+  and make the judge's pass condition explicit.
+## Quality bar
+- Each case must be able to fail. A case the current prompt trivially always passes verifies
+  nothing — make it discriminating.
+- Adversarial cases are required, not optional. An eval with only happy-path inputs gives
+  false confidence.
+- Pass conditions must be specific enough that two people grading by hand would agree. No
+  "looks reasonable" rubrics.
+- Cover the failure modes that matter for this feature (hallucination, injection, format
+  drift, refusal failures, scope creep) — name them per case.
+## Output location
+Write eval artifacts under the session output directory shown in the gathered context.
+## Conventions
+Match the existing eval framework's case format, runner, and rubric structure (shown in the
+gathered context). If no framework exists, work from the target prompt's intended behavior and
+state that assumption.

package/skills/test/specialists/integration.md ADDED Viewed

@@ -0,0 +1,42 @@
+# Specialist: Integration tests
+Dispatch the `test-writer` agent to write cross-component / real-dependency tests for the target.
+## Agent
+`Agent(test-writer)`.
+## What to produce
+Integration tests that exercise more than one component together and use real dependencies
+where the unit suite would stub them. The point is to catch bugs that only appear at the
+seams — wiring, contracts, serialization, transactions, ordering.
+Cover:
+- The interaction across components: data flows in at one entry point and the assertion
+  checks the observable effect at the far end (a returned value, a stored row, an emitted
+  event, an HTTP response).
+- Contract boundaries between modules — the shape and types one component passes to another,
+  including the failure shape when the downstream call errors.
+- Real dependency behavior that stubs hide: actual DB constraints and transactions, real
+  serialization/deserialization round-trips, real file I/O, real client/server handshakes.
+- Failure propagation across the boundary — a downstream timeout, rejection, or constraint
+  violation surfaces correctly to the caller.
+## Quality bar
+- Assert the real cross-component effect, not an intermediate value you control. If every
+  collaborator is mocked, this is a unit test mislabeled — use real dependencies.
+- Specific assertions only: specific persisted state, specific response, specific error.
+  No existence-only checks (`toBeTruthy`/`toBeDefined`).
+- Set up and tear down real state deterministically (fresh DB/schema, temp dirs, fixed
+  seeds). No reliance on leftover state from a prior test or run.
+- Avoid flakiness: no fixed `sleep` for async settling — poll/await a condition. Pin clocks
+  and seeds where timing or randomness affects the result.
+- Every test must be able to fail for a real integration bug. See `rules/test-slop.md`.
+## Conventions
+Match the existing test patterns shown in the gathered context (framework, fixture/factory
+style, how the project marks or isolates integration vs unit tests, directory location). Use
+the project test command from that context to confirm the tests run and pass before reporting.

package/skills/test/specialists/unit.md ADDED Viewed

@@ -0,0 +1,39 @@
+# Specialist: Unit tests
+Dispatch the `test-writer` agent to write isolated unit tests for the target.
+## Agent
+`Agent(test-writer)`.
+## What to produce
+Unit tests that exercise one unit (function, method, class) in isolation. Dependencies that
+cross a process or network boundary are stubbed; everything internal runs for real.
+For each unit under test, cover:
+- The happy path with a specific input and a specific expected output.
+- Each conditional branch reachable from the public surface (every `if`/`case`/early return).
+- Boundary values where the code uses `<`, `<=`, `>`, `>=`, or array/string indices —
+  test the off-by-one points, not just the middle of the range.
+- Error and failure paths — what the unit throws, returns, or rejects with on bad input.
+- Empty / null / zero-length inputs where the unit accepts a collection or optional.
+## Quality bar
+- Assert specific values, specific side effects, or specific error conditions. Never
+  `expect(x).toBeTruthy()` / `toBeDefined()` / `not.toBeNull()` as the only assertion.
+- Do not test the mock. If a stub's return value is the only thing an assertion checks,
+  the test verifies setup, not code — delete or rewrite it.
+- Each test name must describe the behavior it verifies and match what the body asserts.
+- No sleeps, no real network, no timing-dependent assertions. Tests must be hermetic.
+- One behavior per test. Use the project's parameterized-test mechanism instead of
+  copy-pasting a test with minor variations.
+- Every test must be able to fail for a real bug. If it can't, it's slop — see
+  `rules/test-slop.md`.
+## Conventions
+Match the existing test patterns shown in the gathered context (file naming, framework,
+assertion library, fixture style, directory location). Use the project test command from
+that context to confirm the tests run and pass before reporting.

package/skills/understand/SKILL.md ADDED Viewed

@@ -0,0 +1,118 @@
+---
+name: understand
+description: >
+  Understand code: map a single codebase or a whole multi-service stack. Repo mode
+  produces an architecture overview, domain map, sequence diagrams, and entry points;
+  stack mode maps how services connect (REST, queues, shared DBs, shared libs).
+  Use when entering an unfamiliar codebase or system.
+argument-hint: "[target]"
+user-invocable: true
+context: fork
+agent: cartographer
+model: claude-opus-4-6
+effort: high
+allowed-tools:
+  - Read(*)
+  - Write(uv-out/**)
+  - AskUserQuestion
+  - Grep(*)
+  - Glob(*)
+  - Bash(graphify *)
+  - Bash(repomix *)
+  - Bash(find *)
+  - Bash(git *)
+  - Bash(wc *)
+  - Bash(head *)
+  - Bash(ls *)
+  - Bash(cat *)
+  - Bash(pip *)
+---
+## Target
+$ARGUMENTS
+## Mode (auto-detected)
+The same cartographer, two ways of looking — and the skill picks for you:
+- **repo** — one codebase: architecture, domains, sequence diagrams, entry points, danger zones.
+- **stack** — multiple services: how they connect (REST, queues, shared DBs, shared libs).
+Detection counts **distinct directories that contain a build file** (not the number of
+build files — a single project often has several, e.g. `setup.py` + `pyproject.toml`).
+**Two or more distinct service directories → stack; otherwise → repo.** The block below
+prints the chosen mode and why — follow it.
+```!
+T="$(echo "$ARGUMENTS" | xargs)"; D="${T:-.}"
+svc=$(find "$D" -maxdepth 3 \
+  \( -name package.json -o -name go.mod -o -name requirements.txt -o -name pom.xml \
+     -o -name build.gradle -o -name Cargo.toml -o -name pyproject.toml -o -name setup.py \
+     -o -name composer.json -o -name Gemfile \) \
+  -not -path "*/node_modules/*" -not -path "*/.git/*" -not -path "*/vendor/*" -not -path "*/.venv/*" \
+  2>/dev/null | xargs -n1 dirname 2>/dev/null | sort -u)
+ndirs=$(printf '%s\n' "$svc" | grep -c .)
+if [ "$ndirs" -ge 2 ]; then
+  echo "MODE=stack (${ndirs} distinct service dirs:"; printf '%s\n' "$svc" | sed 's/^/  - /'
+  echo ")"
+else
+  echo "MODE=repo (single codebase; ${ndirs} build dir)"
+fi
+```
+The `T`/`D` convention: `T` is the target, `D` falls back to `.` when no target is given.
+## Session output directory
+Write the map under this directory (scoped to the current session):
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-session.sh`
+## Shared context
+!`T="$(echo "$ARGUMENTS"|xargs)"; D="$T"; [ -d "$D" ] || D="."; cat "$D/CLAUDE.md" 2>/dev/null || cat CLAUDE.md 2>/dev/null || echo "No CLAUDE.md found"`
+!`T="$(echo "$ARGUMENTS"|xargs)"; D="$T"; [ -d "$D" ] || D="."; cat "$D/DANGER-ZONES.md" 2>/dev/null || echo "No DANGER-ZONES.md found"`
+## Graphify availability
+```!
+graphify --version 2>/dev/null || echo "NOT_INSTALLED"
+```
+## Prior maps (current session first, then prior, then legacy)
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-collect.sh 'map-codebase.md' || echo "No prior codebase map"`
+!`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-collect.sh 'map-stack.md' || echo "No prior stack map"`
+## Discovery
+Both sets run at load (cheap); the chosen mode uses what it needs.
+### Services (stack mode)
+```!
+T="$(echo "$ARGUMENTS"|xargs)"; find ${T:-.} -maxdepth 3 \( -name "package.json" -o -name "pom.xml" -o -name "go.mod" -o -name "Cargo.toml" -o -name "requirements.txt" -o -name "setup.py" -o -name "pyproject.toml" \) -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | head -30
+```
+### Dockerfiles, compose, infra, contracts (stack mode)
+```!
+T="$(echo "$ARGUMENTS"|xargs)"; find ${T:-.} -maxdepth 4 \( -name "Dockerfile" -o -name "docker-compose*" -o -name "*.tf" -o -name "Chart.yaml" -o -name "values.yaml" -o -name "*.proto" -o -name "openapi*" -o -name "*.graphql" \) -not -path "*/node_modules/*" 2>/dev/null | head -30
+```
+### Existing knowledge graph (repo mode)
+```!
+T="$(echo "$ARGUMENTS"|xargs)"; D="$T"; [ -d "$D" ] || D="."; cat "$D/graphify-out/GRAPH_REPORT.md" 2>/dev/null | head -80 || echo "No existing graph found"
+```
+## Procedure
+Based on the `MODE` printed above, follow the matching mode file. It owns the process and
+the output for that mode:
+- **MODE=repo** → follow `skills/understand/modes/repo.md`
+- **MODE=stack** → follow `skills/understand/modes/stack.md`

package/skills/understand/modes/repo.md ADDED Viewed

@@ -0,0 +1,38 @@
+# Repo mode
+Map one codebase: architecture, business domains, key flows, entry points, danger zones.
+The orchestrator has already loaded for you: the target (`$ARGUMENTS`), the chosen
+`MODE`, the session output directory, the target's CLAUDE.md / DANGER-ZONES.md, Graphify
+availability, the **existing knowledge graph** (if any), and any prior `map-codebase.md`
+artifacts. Use them; don't re-fetch.
+## Process
+If Graphify is available (see the orchestrator's availability check), run it first and
+fold its findings into the map below.
+1. **Map the architecture** — identify major modules and how they fit together.
+2. **Map the business domain** — the real-world concepts the code models.
+3. **Trace the key flows** — the paths a request/job takes through the system.
+4. **Find the entry points** — main, routes, handlers, CLI commands.
+5. **Flag the danger zones** — fragile areas, high-fan-in modules, missing tests.
+## Output
+Write the full map to `map-codebase.md` inside the session output directory printed by
+the orchestrator (e.g. `uv-out/sessions/<sid>/map-codebase.md`), stamped with provenance
+frontmatter (`session`, `skill: understand`, `created`). It should contain:
+- **Architecture overview** — major modules and how they fit together
+- **Business domain map** — the real-world concepts the code models
+- **Sequence diagrams** (Mermaid) for the key flows
+- **Entry points** — main, routes, handlers, CLI commands
+- **Danger zones** — fragile areas, high-fan-in modules, missing tests
+## Report back
+After writing the file, print only a one-line pointer to the terminal — do not repeat
+the map. For example:
+> Map written to `uv-out/sessions/<sid>/map-codebase.md` — go check it.