uv-suite 0.28.0 → 0.30.0

Files changed (151)
  1. package/LICENSE +21 -0
  2. package/README.md +58 -35
  3. package/agents/claude-code/anti-slop-guard.md +14 -1
  4. package/agents/claude-code/architect.md +30 -4
  5. package/agents/claude-code/cartographer.md +18 -6
  6. package/agents/claude-code/eval-writer.md +7 -2
  7. package/agents/claude-code/reviewer.md +5 -1
  8. package/agents/claude-code/spec-writer.md +30 -7
  9. package/agents/generate.py +88 -0
  10. package/bin/cli.js +51 -48
  11. package/hooks/auto-checkpoint-helper.sh +2 -2
  12. package/hooks/auto-checkpoint.sh +3 -3
  13. package/hooks/auto-restore-on-start.sh +30 -0
  14. package/hooks/checkpoint-helper.sh +40 -35
  15. package/hooks/git-context.sh +41 -0
  16. package/hooks/lite-mode-inject.sh +26 -0
  17. package/hooks/session-end-helper.sh +2 -2
  18. package/hooks/session-end.sh +2 -2
  19. package/hooks/session-label-nag.sh +2 -2
  20. package/hooks/session-meta.sh +18 -1
  21. package/hooks/session-review-reminder.sh +2 -2
  22. package/hooks/session-start.sh +16 -0
  23. package/hooks/slop-grep.sh +12 -31
  24. package/hooks/uv-out-best.sh +20 -0
  25. package/hooks/uv-out-collect.sh +52 -0
  26. package/hooks/uv-out-notify.sh +28 -0
  27. package/hooks/uv-out-pointer.sh +16 -0
  28. package/hooks/uv-out-session.sh +24 -0
  29. package/hooks/watchtower-notify.sh +45 -0
  30. package/hooks/watchtower-send.sh +4 -0
  31. package/install.sh +93 -42
  32. package/package.json +2 -2
  33. package/personas/auto.json +40 -1
  34. package/personas/professional.json +46 -1
  35. package/personas/spike.json +32 -2
  36. package/personas/sport.json +44 -1
  37. package/settings.json +6 -2
  38. package/skills/architect/SKILL.md +109 -8
  39. package/skills/architect/specialists/distributed-systems.md +84 -0
  40. package/skills/architect/specialists/full-stack.md +92 -0
  41. package/skills/architect/specialists/llm-ai-engineering.md +86 -0
  42. package/skills/architect/specialists/ml-systems.md +81 -0
  43. package/skills/commit/SKILL.md +5 -2
  44. package/skills/confirm/SKILL.md +3 -3
  45. package/skills/investigate/SKILL.md +14 -4
  46. package/skills/lite/SKILL.md +45 -0
  47. package/skills/qa/SKILL.md +274 -0
  48. package/skills/review/SKILL.md +187 -8
  49. package/skills/review/specialists/api-contract.md +122 -0
  50. package/skills/review/specialists/architecture-trace.md +64 -0
  51. package/skills/review/specialists/data-migration.md +113 -0
  52. package/skills/review/specialists/maintainability.md +138 -0
  53. package/skills/review/specialists/performance.md +115 -0
  54. package/skills/review/specialists/security.md +132 -0
  55. package/skills/review/specialists/testing.md +109 -0
  56. package/skills/session/SKILL.md +87 -0
  57. package/skills/session/operations/auto.md +22 -0
  58. package/skills/session/operations/checkpoint.md +43 -0
  59. package/skills/session/operations/end.md +35 -0
  60. package/skills/session/operations/init.md +16 -0
  61. package/skills/session/operations/restore.md +16 -0
  62. package/skills/spec/SKILL.md +40 -1
  63. package/skills/test/SKILL.md +89 -0
  64. package/skills/test/specialists/eval.md +46 -0
  65. package/skills/test/specialists/integration.md +42 -0
  66. package/skills/test/specialists/unit.md +39 -0
  67. package/skills/understand/SKILL.md +118 -0
  68. package/skills/understand/modes/repo.md +38 -0
  69. package/skills/understand/modes/stack.md +41 -0
  70. package/skills/uv-help/SKILL.md +43 -20
  71. package/uv.sh +36 -3
  72. package/watchtower/Dockerfile +9 -0
  73. package/watchtower/README.md +78 -0
  74. package/watchtower/app/__init__.py +0 -0
  75. package/watchtower/app/__pycache__/__init__.cpython-314.pyc +0 -0
  76. package/watchtower/app/__pycache__/db.cpython-314.pyc +0 -0
  77. package/watchtower/app/__pycache__/main.cpython-314.pyc +0 -0
  78. package/watchtower/app/__pycache__/models.cpython-314.pyc +0 -0
  79. package/watchtower/app/db.py +85 -0
  80. package/watchtower/app/main.py +45 -0
  81. package/watchtower/app/models.py +49 -0
  82. package/watchtower/app/routers/__init__.py +0 -0
  83. package/watchtower/app/routers/__pycache__/__init__.cpython-314.pyc +0 -0
  84. package/watchtower/app/routers/__pycache__/control.cpython-314.pyc +0 -0
  85. package/watchtower/app/routers/__pycache__/ingest.cpython-314.pyc +0 -0
  86. package/watchtower/app/routers/__pycache__/query.cpython-314.pyc +0 -0
  87. package/watchtower/app/routers/__pycache__/stream.cpython-314.pyc +0 -0
  88. package/watchtower/app/routers/control.py +144 -0
  89. package/watchtower/app/routers/ingest.py +102 -0
  90. package/watchtower/app/routers/query.py +84 -0
  91. package/watchtower/app/routers/stream.py +30 -0
  92. package/watchtower/app/services/__init__.py +0 -0
  93. package/watchtower/app/services/__pycache__/__init__.cpython-314.pyc +0 -0
  94. package/watchtower/app/services/__pycache__/checkpoint.cpython-314.pyc +0 -0
  95. package/watchtower/app/services/__pycache__/tmux.cpython-314.pyc +0 -0
  96. package/watchtower/app/services/checkpoint.py +107 -0
  97. package/watchtower/app/services/tmux.py +54 -0
  98. package/watchtower/docker-compose.yml +22 -0
  99. package/watchtower/events.json +10373 -1
  100. package/watchtower/{auto-checkpoint-runner.js → legacy/auto-checkpoint-runner.js} +29 -2
  101. package/watchtower/{dashboard.html → legacy/dashboard.html} +261 -0
  102. package/watchtower/{server.js → legacy/server.js} +63 -0
  103. package/watchtower/legacy/snapshot-manager.js +305 -0
  104. package/watchtower/requirements.txt +3 -0
  105. package/watchtower/schema.sql +43 -0
  106. package/watchtower/static/dashboard.html +449 -0
  107. package/agents/claude-code/devops.md +0 -50
  108. package/agents/claude-code/security.md +0 -75
  109. package/agents/codex/anti-slop-guard.toml +0 -12
  110. package/agents/codex/architect.toml +0 -11
  111. package/agents/codex/cartographer.toml +0 -16
  112. package/agents/codex/devops.toml +0 -8
  113. package/agents/codex/eval-writer.toml +0 -11
  114. package/agents/codex/prototype-builder.toml +0 -10
  115. package/agents/codex/reviewer.toml +0 -16
  116. package/agents/codex/security.toml +0 -14
  117. package/agents/codex/spec-writer.toml +0 -11
  118. package/agents/codex/test-writer.toml +0 -13
  119. package/agents/cursor/anti-slop-guard.mdc +0 -22
  120. package/agents/cursor/architect.mdc +0 -24
  121. package/agents/cursor/cartographer.mdc +0 -28
  122. package/agents/cursor/devops.mdc +0 -16
  123. package/agents/cursor/eval-writer.mdc +0 -21
  124. package/agents/cursor/prototype-builder.mdc +0 -25
  125. package/agents/cursor/reviewer.mdc +0 -26
  126. package/agents/cursor/security.mdc +0 -20
  127. package/agents/cursor/spec-writer.mdc +0 -27
  128. package/agents/cursor/test-writer.mdc +0 -28
  129. package/agents/portable/anti-slop-guard.md +0 -71
  130. package/agents/portable/architect.md +0 -83
  131. package/agents/portable/cartographer.md +0 -64
  132. package/agents/portable/devops.md +0 -56
  133. package/agents/portable/eval-writer.md +0 -70
  134. package/agents/portable/prototype-builder.md +0 -70
  135. package/agents/portable/reviewer.md +0 -79
  136. package/agents/portable/security.md +0 -63
  137. package/agents/portable/spec-writer.md +0 -89
  138. package/agents/portable/test-writer.md +0 -56
  139. package/hooks/context-warning.sh +0 -4
  140. package/skills/auto-checkpoint/SKILL.md +0 -47
  141. package/skills/checkpoint/SKILL.md +0 -105
  142. package/skills/map-codebase/SKILL.md +0 -54
  143. package/skills/map-stack/SKILL.md +0 -121
  144. package/skills/restore/SKILL.md +0 -55
  145. package/skills/security-review/SKILL.md +0 -87
  146. package/skills/session-end/SKILL.md +0 -100
  147. package/skills/session-init/SKILL.md +0 -45
  148. package/skills/slop-check/SKILL.md +0 -40
  149. package/skills/write-evals/SKILL.md +0 -34
  150. package/skills/write-tests/SKILL.md +0 -54
  151. package/watchtower/{auto-checkpoint-prompt.md → legacy/auto-checkpoint-prompt.md} +0 -0
@@ -0,0 +1,87 @@
1
+ ---
2
+ name: session
3
+ description: >
4
+ Manage the UV Suite session lifecycle: label it (init), save a checkpoint,
5
+ restore a prior checkpoint, end it cleanly, or toggle auto-checkpoints.
6
+ One skill, subcommand-routed. The automatic counterparts (PostToolUse
7
+ auto-checkpoint, SessionStart auto-restore) remain hooks. Checkpoints are
8
+ stored per-session under uv-out/sessions/<session-id>/checkpoints/, so
9
+ concurrent terminals don't clobber each other.
10
+ argument-hint: "init|checkpoint|restore|end|auto [args]"
11
+ user-invocable: true
12
+ allowed-tools:
13
+ - Read(*)
14
+ - Write(*)
15
+ - Bash(git status *)
16
+ - Bash(git diff *)
17
+ - Bash(git log *)
18
+ - Bash(git branch *)
19
+ - Bash(git rev-parse *)
20
+ - Bash(date *)
21
+ - Bash(ls *)
22
+ - Bash(mkdir *)
23
+ - Bash(cat *)
24
+ - Bash(echo *)
25
+ - Bash(grep *)
26
+ - Bash(find *)
27
+ - Bash(*/.claude/hooks/session-meta.sh *)
28
+ - Bash(*/.claude/hooks/checkpoint-helper.sh *)
29
+ - Bash(*/.claude/hooks/session-end-helper.sh *)
30
+ - Bash(*/.claude/hooks/auto-checkpoint-helper.sh *)
31
+ ---
32
+
33
+ ## Dispatch
34
+
35
+ The first word of `$ARGUMENTS` is the subcommand; the rest are its arguments. The
36
+ block below runs only the context-gathering for the chosen subcommand and prints a
37
+ `SUBCOMMAND=...` marker — follow the matching section under "Instructions".
38
+
39
+ ```!
40
+ ARGS="$ARGUMENTS"; SUB=$(printf '%s' "$ARGS" | awk '{print $1}'); REST=$(printf '%s' "$ARGS" | sed 's/^[^ ]* *//'); [ "$REST" = "$SUB" ] && REST=""
41
+ H="${CLAUDE_PROJECT_DIR:-.}/.claude/hooks"
42
+ case "$SUB" in
43
+ init)
44
+ if [ -z "$REST" ] || [ "$REST" = show ]; then "$H/session-meta.sh" show
45
+ elif [ "$REST" = clear ]; then "$H/session-meta.sh" clear
46
+ else F=$(printf '%s' "$REST"|awk '{print $1}'); R2=$(printf '%s' "$REST"|sed 's/^[^ ]* *//')
47
+ case "$F" in
48
+ --kind) "$H/session-meta.sh" set-kind "$R2";;
49
+ --priority) "$H/session-meta.sh" set-priority "$R2";;
50
+ --purpose) "$H/session-meta.sh" set-purpose "$R2";;
51
+ --name) "$H/session-meta.sh" set-name "$R2";;
52
+ *) "$H/session-meta.sh" set-name "$REST";;
53
+ esac
54
+ fi; echo "SUBCOMMAND=init";;
55
+ auto)
56
+ "$H/auto-checkpoint-helper.sh" $REST; echo "SUBCOMMAND=auto";;
57
+ restore)
58
+ "$H/checkpoint-helper.sh" list; echo "---LATEST---"; "$H/checkpoint-helper.sh" latest; echo "SUBCOMMAND=restore REST=$REST";;
59
+ checkpoint|end)
60
+ "$H/checkpoint-helper.sh" dir; echo "---FRONTMATTER---"; "$H/checkpoint-helper.sh" frontmatter; echo "---META---"; "$H/checkpoint-helper.sh" meta
61
+ echo "---GIT---"; git branch --show-current 2>/dev/null || echo "(not a git repo)"; git status --short 2>/dev/null | head -20; git log --oneline -5 2>/dev/null
62
+ echo "SUBCOMMAND=$SUB REST=$REST";;
63
+ *) echo "SUBCOMMAND=help (got: '$SUB')";;
64
+ esac
65
+ ```
66
+
67
+ ## Instructions
68
+
69
+ Read the `SUBCOMMAND=...` marker printed by the dispatch block above, then read
70
+ and follow the matching operation file:
71
+
72
+ - `init` → `skills/session/operations/init.md`
73
+ - `checkpoint` → `skills/session/operations/checkpoint.md`
74
+ - `restore` → `skills/session/operations/restore.md`
75
+ - `end` → `skills/session/operations/end.md`
76
+ - `auto` → `skills/session/operations/auto.md`
77
+
78
+ The operation file references the context the dispatch block already printed
79
+ above (helper output, checkpoint-dir, frontmatter, git state, session list).
80
+
81
+ ### help / unknown
82
+ Tell the user the subcommands and their one-line usage:
83
+ - `/session init <name>|--kind|--priority|--purpose|show|clear` — label the session
84
+ - `/session checkpoint [label]` — save a checkpoint
85
+ - `/session restore [<id-prefix>|<name>|list]` — load a prior checkpoint
86
+ - `/session end [label]` — write the final checkpoint and terminate
87
+ - `/session auto on|off|<minutes>|status` — toggle automatic checkpoints
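Note on the dispatch block in `skills/session/SKILL.md` above: it takes the first whitespace-delimited word of `$ARGUMENTS` as the subcommand, passes the remainder through as `REST`, and falls back to `help` for anything unrecognized. The sketch below restates that routing in Python for readability. The function names `split_subcommand` and `route` are hypothetical, and the handling of leading whitespace is an approximation of the `awk`/`sed` pipeline rather than an exact port.

```python
# Illustrative sketch only; names are hypothetical and not part of the package.
KNOWN_SUBCOMMANDS = {"init", "checkpoint", "restore", "end", "auto"}


def split_subcommand(arguments: str) -> tuple[str, str]:
    """Return (subcommand, rest) from the raw argument string."""
    parts = arguments.strip().split(None, 1)
    if not parts:
        return "", ""
    sub = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return sub, rest


def route(arguments: str) -> str:
    """Return the subcommand to run, or 'help' when it is missing or unknown."""
    sub, _rest = split_subcommand(arguments)
    return sub if sub in KNOWN_SUBCOMMANDS else "help"
```

For example, `split_subcommand("init --kind long")` yields `("init", "--kind long")`, which the shell version then splits once more to pick the `--kind` flag.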
@@ -0,0 +1,22 @@
1
+ # auto — toggle automatic checkpoints
2
+
3
+ Show the line of output above; it confirms the new mode/interval. No commentary.
4
+ The change applies to the very next interval — no restart needed.
5
+ Usage: `/session auto on|off|<minutes>|status` (default interval 10 min, range
6
+ 1-1440). `status` or no argument prints the current mode and interval.
7
+
8
+ How it works:
9
+ - **Tier A (mechanical):** the `auto-checkpoint.sh` hook runs after each tool
10
+ call. When the interval has passed and there's been activity since the last
11
+ checkpoint, it writes a deterministic snapshot — git state, recent tool calls,
12
+ files touched — to `uv-out/sessions/<sid>/checkpoints/auto-<ts>-mechanical.md`.
13
+ - **Tier B (semantic):** the Watchtower (`uvs watch`) keeps a timer. Every N
14
+ minutes, for each active session, it shells out to `claude -p --bare --model
15
+ haiku` with a prompt from recent dashboard events and git state, saving
16
+ `auto-<ts>-semantic.md` next to the mechanical one. Cap: `--max-budget-usd
17
+ 0.05` per call.
18
+ - Both tiers fire `AutoCheckpoint` events to the Watchtower. Sessions with zero
19
+ activity in the interval are skipped — no empty checkpoints.
20
+
21
+ State lives in `.uv-suite-state/auto-checkpoint.json` (mode + interval) and
22
+ `.uv-suite-state/sessions/<sid>.last-{mechanical,semantic}-checkpoint.txt`.
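Note on Tier A in `operations/auto.md` above: a mechanical checkpoint is written only when the interval has elapsed and there has been activity since the last checkpoint. A minimal sketch of that condition follows. The function name, its parameters, and the behavior when no checkpoint exists yet are assumptions for illustration; the actual `auto-checkpoint.sh` hook is not shown in this hunk.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_checkpoint(
    now: datetime,
    last_checkpoint: Optional[datetime],
    last_activity: Optional[datetime],
    interval_minutes: int = 10,  # default interval stated in auto.md
) -> bool:
    """Return True when a mechanical checkpoint would be due."""
    if not 1 <= interval_minutes <= 1440:  # range stated in auto.md
        raise ValueError("interval must be between 1 and 1440 minutes")
    if last_activity is None:
        return False  # zero activity: no empty checkpoints
    if last_checkpoint is None:
        return True  # assumption: first checkpoint is due once there is activity
    interval_elapsed = now - last_checkpoint >= timedelta(minutes=interval_minutes)
    active_since = last_activity > last_checkpoint
    return interval_elapsed and active_since
```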
@@ -0,0 +1,43 @@
1
+ # checkpoint — save a checkpoint
2
+
3
+ Using the checkpoint-dir, frontmatter, and git state printed above, write
4
+ `<checkpoint-dir>/YYYY-MM-DD-HHMM.md` (current timestamp; append `-REST` to the
5
+ filename if `REST` is non-empty). The directory is per-session — two `uv`
6
+ launches in the same repo write to different folders, so checkpoints don't
7
+ collide. **Begin the file with the exact YAML frontmatter block printed above** —
8
+ `/session restore` parses these fields to pick the right checkpoint and display
9
+ session context. Also write/overwrite `<checkpoint-dir>/latest.md` with the same
10
+ content so the next session's restore finds the freshest state. Body (keep under
11
+ 80 lines; frontmatter is required and not counted):
12
+
13
+ ```markdown
14
+ # Checkpoint: [date] [time] [label if provided]
15
+
16
+ ## What was accomplished
17
+ - [Concrete things done this session — files, commits, decisions]
18
+ - [Be specific: "Added webhook retry logic to PaymentService" not "worked on payments"]
19
+
20
+ ## Key decisions made
21
+ - [Decision]: [Why] — [What was considered and rejected]
22
+ - [Only decisions that affect future work]
23
+
24
+ ## Current state
25
+ - Branch: [current git branch]
26
+ - Uncommitted changes: [yes/no, summary if yes]
27
+ - Tests: [passing/failing/not run]
28
+ - Blockers: [any unresolved issues]
29
+
30
+ ## Files modified
31
+ - [Key files changed, not every file]
32
+
33
+ ## What's next
34
+ - [Immediate next step the next session should start with]
35
+ - [Remaining tasks from the current Act/plan]
36
+
37
+ ## Context the next session needs
38
+ - [Non-obvious facts, workarounds, "this looks wrong but it's intentional because..."]
39
+ - [Environment setup notes if relevant]
40
+ ```
41
+
42
+ Capture WHY decisions were made, not just what. "Worked on auth" is useless;
43
+ "Added JWT refresh token rotation with 7-day expiry" is useful.
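Note on the naming rule in `operations/checkpoint.md` above: the file is `<checkpoint-dir>/YYYY-MM-DD-HHMM.md`, with `-REST` appended when a label is given. A small sketch of that rule, assuming a hypothetical helper `checkpoint_path`; the source does not say whether labels are sanitized, so the label is appended unchanged here.

```python
from datetime import datetime
from pathlib import PurePosixPath


def checkpoint_path(checkpoint_dir: str, now: datetime, label: str = "") -> PurePosixPath:
    """Build the checkpoint filename from the timestamp and optional label."""
    stem = now.strftime("%Y-%m-%d-%H%M")
    if label:
        stem += "-" + label  # label appended as-is (assumption: no sanitizing)
    return PurePosixPath(checkpoint_dir) / (stem + ".md")
```

The same content is also written to `latest.md` in that directory, so a caller would write two files per checkpoint.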
@@ -0,0 +1,35 @@
1
+ # end — close the session cleanly
2
+
3
+ 1. Write a final checkpoint exactly like `checkpoint` above, but name it
4
+ `<checkpoint-dir>/final-YYYY-MM-DD-HHMM.md` (append `-REST` to the filename if
5
+ given) and **override** the frontmatter line
6
+ `checkpoint_kind: auto-mechanical` to read `checkpoint_kind: final-manual` so
7
+ it's distinguishable from the auto-checkpoints. Also write/overwrite
8
+ `<checkpoint-dir>/latest.md` with the same content so `/session restore` picks
9
+ it up. Use the live conversation context — this is the highest-fidelity record
10
+ before the session closes; anything not written here is lost unless it's in
11
+ code or the auto-checkpoints. Final-checkpoint body structure:
12
+
13
+ ```markdown
14
+ # Final checkpoint: [date] [time] [label if provided]
15
+
16
+ ## What was accomplished
17
+ - Concrete things done across the whole session — files, commits, decisions
18
+
19
+ ## Key decisions made
20
+ - Decision: Why — what was considered, what was rejected
21
+
22
+ ## Current state
23
+ - Branch / uncommitted changes / tests status / blockers
24
+
25
+ ## Open threads
26
+ - Anything left in flight that the next session needs to pick up
27
+
28
+ ## Context the next session needs
29
+ - Non-obvious facts, workarounds, "this looks wrong but it's intentional because…"
30
+ ```
31
+
32
+ 2. Then run `"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/session-end-helper.sh` (via
33
+ Bash) to mark the session terminated and flip the Watchtower badge to
34
+ Terminated. Show the user the two-line output; don't add commentary. This does
35
+ **not** close the terminal or exit Claude Code — the user does that.
@@ -0,0 +1,16 @@
1
+ # init — label the session
2
+
3
+ Show the helper output above. Do not add commentary; the command confirms what
4
+ changed. The Watchtower dashboard refreshes within a few seconds (next event
5
+ refreshes session metadata).
6
+ Usage: `/session init <name>` · `--kind long|outcome` · `--priority low|med|high`
7
+ · `--purpose <text>` · `show` · `clear`. Each call sets one field; run multiple
8
+ times to set more.
9
+
10
+ Notes:
11
+ - The session id is generated by `uv.sh` at launch and exported as
12
+ `UVS_SESSION_ID`; launching `claude` directly falls back to an
13
+ `ad-hoc-<timestamp>` id.
14
+ - Metadata lives at `.uv-suite-state/sessions/<id>.json`.
15
+ - Persona is captured at launch time and is not editable here — re-launch with a
16
+ different `uv` persona to change it.
@@ -0,0 +1,16 @@
1
+ # restore — load a prior checkpoint
2
+
3
+ The available-sessions list (with `*` marking the current session) and the
4
+ current session's `latest.md` are printed above.
5
+ - **`REST` empty or `latest`:** read the current session's `latest.md` shown
6
+ above. Summarize in 3-4 sentences (what was done, current state, what's next),
7
+ including the session's name and purpose from the frontmatter. Then ask:
8
+ "Ready to pick up from here, or do you want to take a different direction?"
9
+ - **`REST` = `list`:** just show the available-sessions list above and ask which
10
+ one they want to restore.
11
+ - **`REST` looks like a session id prefix** (8-char hex / UUID-ish) **or a
12
+ session name:** match it against the list, then Read the matching session's
13
+ `<project>/uv-out/sessions/<full-session-id>/checkpoints/latest.md` (fall back
14
+ to the legacy `<project>/uv-out/checkpoints/<full-session-id>/latest.md` if the
15
+ new path is absent), and summarize as above.
16
+ - **No match:** list the available sessions and ask the user to pick one.
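Note on the matching rules in `operations/restore.md` above: the argument is tried as a session id prefix or a session name, and the resolved session is read from the new path with a fallback to the legacy path. The sketch below shows one way to express that. The session record shape (`id`, `name`), the case-insensitive comparison, and the rule that an ambiguous match counts as no match are assumptions, not behavior confirmed by the source.

```python
from typing import Optional


def match_session(query: str, sessions: list[dict]) -> Optional[dict]:
    """Match by id prefix first, then by exact name; None when not unique."""
    q = query.strip().lower()
    if not q:
        return None
    by_prefix = [s for s in sessions if s["id"].lower().startswith(q)]
    if len(by_prefix) == 1:
        return by_prefix[0]
    by_name = [s for s in sessions if s.get("name", "").lower() == q]
    if len(by_name) == 1:
        return by_name[0]
    return None  # caller lists the sessions and asks the user to pick


def latest_candidates(project: str, session_id: str) -> list[str]:
    """Paths to try in order: the new layout, then the legacy layout."""
    return [
        f"{project}/uv-out/sessions/{session_id}/checkpoints/latest.md",
        f"{project}/uv-out/checkpoints/{session_id}/latest.md",
    ]
```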
@@ -13,13 +13,30 @@ allowed-tools:
13
13
  - Read(*)
14
14
  - Grep(*)
15
15
  - Glob(*)
16
- - Write(*)
16
+ - Write(uv-out/**)
17
+ - AskUserQuestion
17
18
  ---
18
19
 
19
20
  ## Requirements
20
21
 
21
22
  $ARGUMENTS
22
23
 
24
+ ## Session output directory
25
+
26
+ Write the spec under this directory (it is scoped to the current session, so artifacts
27
+ are attributable to the session that produced them):
28
+
29
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-session.sh`
30
+
31
+ ## Step 0 — Gather context (do this FIRST, before writing)
32
+
33
+ The codebase map, prior specs, and decisions for this project are auto-loaded below (from
34
+ `uv-out/`). Read them first — they are the indexed view of the existing codebase. Then use
35
+ `AskUserQuestion` to ask the user only for what those don't cover: specific files, modules,
36
+ API contracts, or data models the spec must account for. Read every file they name before
37
+ drafting. Greenfield (no map, no relevant code) → proceed without. Don't guess at relevant
38
+ files — ground in the map or ask.
39
+
23
40
  ## Project context
24
41
 
25
42
  !`cat CLAUDE.md 2>/dev/null || echo "No CLAUDE.md found"`
@@ -28,6 +45,28 @@ $ARGUMENTS
28
45
 
29
46
  !`ls -la docs/architecture* 2>/dev/null || echo "No architecture docs found"`
30
47
 
48
+ ## What we already know about this codebase (from uv-out)
49
+
50
+ Prior UV Suite artifacts for THIS project. Ground the spec in them — reference real modules
51
+ and patterns from the map, reuse existing conventions, and build on (don't re-spec) what's
52
+ already captured.
53
+
54
+ ### Codebase map (architecture, domains, entry points)
55
+
56
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh map-codebase.md 120 || echo "No codebase map — run /understand first for a grounded spec"`
57
+
58
+ ### Stack map (if a multi-service system)
59
+
60
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh map-stack.md 80 || echo "No stack map"`
61
+
62
+ ### Prior specs (build on these; don't duplicate)
63
+
64
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'specs/*.md' 80 || echo "No prior specs"`
65
+
66
+ ### Architecture decisions
67
+
68
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/decisions.md' 60 || echo "No architecture decisions found"`
69
+
31
70
  ## Today's date
32
71
 
33
72
  !`date +%Y-%m-%d`
@@ -0,0 +1,89 @@
1
+ ---
2
+ name: test
3
+ description: >
4
+ Write tests or LLM evals. Routes to a specialist: the test-writer agent for
5
+ code tests (unit/integration), the eval-writer agent for AI/LLM evals (--eval).
6
+ Use after implementing a feature, when coverage is low, or when building/
7
+ changing an LLM-powered feature.
8
+ argument-hint: "[target] [--unit|--integration|--eval]"
9
+ user-invocable: true
10
+ context: fork
11
+ model: claude-opus-4-6
12
+ effort: high
13
+ allowed-tools:
14
+ - Read(*)
15
+ - Grep(*)
16
+ - Glob(*)
17
+ - Agent(*)
18
+ ---
19
+
20
+ ## Target
21
+
22
+ $ARGUMENTS
23
+
24
+ ## Mode
25
+
26
+ The orchestrator picks one specialist per run. Default is `--unit`.
27
+
28
+ ```!
29
+ case "$ARGUMENTS" in
30
+ *--eval*) echo "MODE=eval → specialists/eval.md (eval-writer agent)";;
31
+ *--integration*) echo "MODE=integration → specialists/integration.md (test-writer agent)";;
32
+ *) echo "MODE=unit → specialists/unit.md (test-writer agent)";;
33
+ esac
34
+ ```
35
+
36
+ ## Context for code tests (unit / integration)
37
+
38
+ ### Existing test patterns (match these)
39
+
40
+ !`find . \( -name "*.test.*" -o -name "*.spec.*" -o -name "test_*" \) -not -path "*/node_modules/*" 2>/dev/null | head -5`
41
+
42
+ !`cat $(find . \( -name "*.test.*" -o -name "*.spec.*" \) -not -path "*/node_modules/*" 2>/dev/null | head -1) 2>/dev/null | head -40 || echo "No existing tests found"`
43
+
44
+ ### Project test command
45
+
46
+ !`cat package.json 2>/dev/null | grep -A2 '"test"' || echo "No package.json test script"`
47
+
48
+ ## Context for evals (--eval)
49
+
50
+ ### Existing eval framework
51
+
52
+ !`find . \( -name "*eval*" -o -name "*evals*" \) -not -path "*/node_modules/*" 2>/dev/null | head -10 || echo "No eval files found"`
53
+
54
+ ### Session output directory
55
+
56
+ Write eval artifacts under this directory (scoped to the current session):
57
+
58
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-session.sh`
59
+
60
+ ## Shared prior analysis
61
+
62
+ ### Spec (what to test / evaluate against)
63
+
64
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'specs/*.md' 60 || echo "No spec found — work from code/prompt behavior"`
65
+
66
+ ### Acts plan (current task context)
67
+
68
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-best.sh 'architecture/acts-plan.md' 40 || echo "No acts plan found"`
69
+
70
+ ### Session checkpoint
71
+
72
+ !`cat uv-out/current/checkpoints/latest.md 2>/dev/null | head -40 || echo "No checkpoint"`
73
+
74
+ ## Dispatch
75
+
76
+ Read the `MODE` line above. It names the specialist prompt file and the agent to use:
77
+
78
+ | MODE | Specialist prompt | Agent |
79
+ |---|---|---|
80
+ | unit | `skills/test/specialists/unit.md` | `test-writer` |
81
+ | integration | `skills/test/specialists/integration.md` | `test-writer` |
82
+ | eval | `skills/test/specialists/eval.md` | `eval-writer` |
83
+
84
+ You (the orchestrator) read the matching specialist prompt at
85
+ `.claude/skills/test/specialists/<mode>.md`, then dispatch the named agent via the Agent
86
+ tool, passing the specialist prompt content + the target (`$ARGUMENTS`) + the gathered
87
+ context above as the agent's task. Do not write the tests or evals yourself — the agent does.
88
+
89
+ Relay the agent's result. If `$ARGUMENTS` names no target, ask what to test.
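Note on the mode block in `skills/test/SKILL.md` above: the `case` patterns are substring matches checked in order, so `--eval` wins over `--integration` when both appear, and anything else falls through to `unit`. A Python restatement of that precedence, using a hypothetical function name `pick_mode`:

```python
def pick_mode(arguments: str) -> tuple[str, str]:
    """Return (mode, agent) with the same precedence as the shell case block."""
    if "--eval" in arguments:  # substring match, like the *--eval* pattern
        return "eval", "eval-writer"
    if "--integration" in arguments:
        return "integration", "test-writer"
    return "unit", "test-writer"  # default mode
```

Because the match is on substrings, an argument such as `--evaluate` would also select eval mode under this logic.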
@@ -0,0 +1,46 @@
1
+ # Specialist: LLM / AI evals
2
+
3
+ Dispatch the `eval-writer` agent to write graded evals for the target prompt or LLM-powered feature.
4
+
5
+ ## Agent
6
+
7
+ `Agent(eval-writer)`.
8
+
9
+ ## What to produce
10
+
11
+ A set of eval cases that grade the target's actual behavior against explicit criteria, using
12
+ the project's existing eval framework (shown in the gathered context). Each eval defines:
13
+
14
+ - **Criteria / rubric** — the specific, observable properties a correct response must have
15
+ (e.g., "cites a source for every factual claim", "refuses without revealing the system
16
+ prompt", "returns valid JSON matching the schema"). Concrete and checkable, not vague
17
+ ("is helpful", "sounds good").
18
+ - **Cases** — both representative and adversarial:
19
+ - Representative: the inputs the feature is built for, including realistic variation.
20
+ - Adversarial: prompt injection, contradictory or out-of-scope requests, malformed input,
21
+ boundary/edge inputs, and attempts to make the model violate a stated constraint.
22
+ - **Pass/fail conditions** — for each case, what makes the response pass vs fail. Prefer
23
+ deterministic checks (exact match, regex, schema validation, presence/absence of a string)
24
+ where possible; use an LLM judge with a written rubric only where behavior is open-ended,
25
+ and make the judge's pass condition explicit.
26
+
27
+ ## Quality bar
28
+
29
+ - Each case must be able to fail. A case the current prompt trivially always passes verifies
30
+ nothing — make it discriminating.
31
+ - Adversarial cases are required, not optional. An eval with only happy-path inputs gives
32
+ false confidence.
33
+ - Pass conditions must be specific enough that two people grading by hand would agree. No
34
+ "looks reasonable" rubrics.
35
+ - Cover the failure modes that matter for this feature (hallucination, injection, format
36
+ drift, refusal failures, scope creep) — name them per case.
37
+
38
+ ## Output location
39
+
40
+ Write eval artifacts under the session output directory shown in the gathered context.
41
+
42
+ ## Conventions
43
+
44
+ Match the existing eval framework's case format, runner, and rubric structure (shown in the
45
+ gathered context). If no framework exists, work from the target prompt's intended behavior and
46
+ state that assumption.
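Note on the pass/fail guidance in `specialists/eval.md` above: it prefers deterministic checks (schema validation, regex, presence or absence of a string) over an LLM judge. A minimal sketch of such a check follows. The function `passes` and its parameters are hypothetical and stand in for whatever the project's eval framework provides; the key check is a simplified stand-in for real schema validation.

```python
import json
import re
from typing import Optional


def passes(
    response: str,
    required_keys: tuple[str, ...] = (),
    forbidden: tuple[str, ...] = (),
    pattern: Optional[str] = None,
) -> bool:
    """Deterministic grading: valid JSON object, required keys, no forbidden text."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False  # format drift: not valid JSON
    if not isinstance(data, dict):
        return False
    if any(key not in data for key in required_keys):
        return False
    if any(text in response for text in forbidden):
        return False  # e.g. leaked system prompt marker
    if pattern is not None and not re.search(pattern, response):
        return False
    return True
```

Each case can fail on a specific, named condition, which matches the quality bar that two people grading by hand would agree.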
@@ -0,0 +1,42 @@
1
+ # Specialist: Integration tests
2
+
3
+ Dispatch the `test-writer` agent to write cross-component / real-dependency tests for the target.
4
+
5
+ ## Agent
6
+
7
+ `Agent(test-writer)`.
8
+
9
+ ## What to produce
10
+
11
+ Integration tests that exercise more than one component together and use real dependencies
12
+ where the unit suite would stub them. The point is to catch bugs that only appear at the
13
+ seams — wiring, contracts, serialization, transactions, ordering.
14
+
15
+ Cover:
16
+ - The interaction across components: data flows in at one entry point and the assertion
17
+ checks the observable effect at the far end (a returned value, a stored row, an emitted
18
+ event, an HTTP response).
19
+ - Contract boundaries between modules — the shape and types one component passes to another,
20
+ including the failure shape when the downstream call errors.
21
+ - Real dependency behavior that stubs hide: actual DB constraints and transactions, real
22
+ serialization/deserialization round-trips, real file I/O, real client/server handshakes.
23
+ - Failure propagation across the boundary — a downstream timeout, rejection, or constraint
24
+ violation surfaces correctly to the caller.
25
+
26
+ ## Quality bar
27
+
28
+ - Assert the real cross-component effect, not an intermediate value you control. If every
29
+ collaborator is mocked, this is a unit test mislabeled — use real dependencies.
30
+ - Specific assertions only: specific persisted state, specific response, specific error.
31
+ No existence-only checks (`toBeTruthy`/`toBeDefined`).
32
+ - Set up and tear down real state deterministically (fresh DB/schema, temp dirs, fixed
33
+ seeds). No reliance on leftover state from a prior test or run.
34
+ - Avoid flakiness: no fixed `sleep` for async settling — poll/await a condition. Pin clocks
35
+ and seeds where timing or randomness affects the result.
36
+ - Every test must be able to fail for a real integration bug. See `rules/test-slop.md`.
37
+
38
+ ## Conventions
39
+
40
+ Match the existing test patterns shown in the gathered context (framework, fixture/factory
41
+ style, how the project marks or isolates integration vs unit tests, directory location). Use
42
+ the project test command from that context to confirm the tests run and pass before reporting.
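Note on the flakiness rule in `specialists/integration.md` above: instead of a fixed `sleep`, a test polls for a condition and gives up at a deadline. A small polling helper, sketched under the assumption that the project has no equivalent already; `wait_until` is a hypothetical name.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll condition until it is true or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False  # caller asserts on this to fail the test
        time.sleep(interval)
```

A test would then assert `wait_until(lambda: row_exists(order_id))`, where `row_exists` is whatever lookup the test already has, rather than sleeping for a guessed duration.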
@@ -0,0 +1,39 @@
1
+ # Specialist: Unit tests
2
+
3
+ Dispatch the `test-writer` agent to write isolated unit tests for the target.
4
+
5
+ ## Agent
6
+
7
+ `Agent(test-writer)`.
8
+
9
+ ## What to produce
10
+
11
+ Unit tests that exercise one unit (function, method, class) in isolation. Dependencies that
12
+ cross a process or network boundary are stubbed; everything internal runs for real.
13
+
14
+ For each unit under test, cover:
15
+ - The happy path with a specific input and a specific expected output.
16
+ - Each conditional branch reachable from the public surface (every `if`/`case`/early return).
17
+ - Boundary values where the code uses `<`, `<=`, `>`, `>=`, or array/string indices —
18
+ test the off-by-one points, not just the middle of the range.
19
+ - Error and failure paths — what the unit throws, returns, or rejects with on bad input.
20
+ - Empty / null / zero-length inputs where the unit accepts a collection or optional.
21
+
22
+ ## Quality bar
23
+
24
+ - Assert specific values, specific side effects, or specific error conditions. Never
25
+ `expect(x).toBeTruthy()` / `toBeDefined()` / `not.toBeNull()` as the only assertion.
26
+ - Do not test the mock. If a stub's return value is the only thing an assertion checks,
27
+ the test verifies setup, not code — delete or rewrite it.
28
+ - Each test name must describe the behavior it verifies and match what the body asserts.
29
+ - No sleeps, no real network, no timing-dependent assertions. Tests must be hermetic.
30
+ - One behavior per test. Use the project's parameterized-test mechanism instead of
31
+ copy-pasting a test with minor variations.
32
+ - Every test must be able to fail for a real bug. If it can't, it's slop — see
33
+ `rules/test-slop.md`.
34
+
35
+ ## Conventions
36
+
37
+ Match the existing test patterns shown in the gathered context (file naming, framework,
38
+ assertion library, fixture style, directory location). Use the project test command from
39
+ that context to confirm the tests run and pass before reporting.
@@ -0,0 +1,118 @@
1
+ ---
2
+ name: understand
3
+ description: >
4
+ Understand code: map a single codebase or a whole multi-service stack. Repo mode
5
+ produces an architecture overview, domain map, sequence diagrams, and entry points;
6
+ stack mode maps how services connect (REST, queues, shared DBs, shared libs).
7
+ Use when entering an unfamiliar codebase or system.
8
+ argument-hint: "[target]"
9
+ user-invocable: true
10
+ context: fork
11
+ agent: cartographer
12
+ model: claude-opus-4-6
13
+ effort: high
14
+ allowed-tools:
15
+ - Read(*)
16
+ - Write(uv-out/**)
17
+ - AskUserQuestion
18
+ - Grep(*)
19
+ - Glob(*)
20
+ - Bash(graphify *)
21
+ - Bash(repomix *)
22
+ - Bash(find *)
23
+ - Bash(git *)
24
+ - Bash(wc *)
25
+ - Bash(head *)
26
+ - Bash(ls *)
27
+ - Bash(cat *)
28
+ - Bash(pip *)
29
+ ---
30
+
31
+ ## Target
32
+
33
+ $ARGUMENTS
34
+
35
+ ## Mode (auto-detected)
36
+
37
+ The same cartographer, two ways of looking — and the skill picks for you:
38
+
39
+ - **repo** — one codebase: architecture, domains, sequence diagrams, entry points, danger zones.
40
+ - **stack** — multiple services: how they connect (REST, queues, shared DBs, shared libs).
41
+
42
+ Detection counts **distinct directories that contain a build file** (not the number of
43
+ build files — a single project often has several, e.g. `setup.py` + `pyproject.toml`).
44
+ **Two or more distinct service directories → stack; otherwise → repo.** The block below
45
+ prints the chosen mode and why — follow it.
46
+
47
+ ```!
48
+ T="$(echo "$ARGUMENTS" | xargs)"; D="${T:-.}"
49
+ svc=$(find "$D" -maxdepth 3 \
50
+ \( -name package.json -o -name go.mod -o -name requirements.txt -o -name pom.xml \
51
+ -o -name build.gradle -o -name Cargo.toml -o -name pyproject.toml -o -name setup.py \
52
+ -o -name composer.json -o -name Gemfile \) \
53
+ -not -path "*/node_modules/*" -not -path "*/.git/*" -not -path "*/vendor/*" -not -path "*/.venv/*" \
54
+ 2>/dev/null | sed 's|/[^/]*$||' | sort -u)
55
+ ndirs=$(printf '%s\n' "$svc" | grep -c .)
56
+ if [ "$ndirs" -ge 2 ]; then
57
+ echo "MODE=stack (${ndirs} distinct service dirs:"; printf '%s\n' "$svc" | sed 's/^/ - /'
58
+ echo ")"
59
+ else
60
+ echo "MODE=repo (single codebase; ${ndirs} build dir)"
61
+ fi
62
+ ```
63
+
64
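+ The `T`/`D` convention: `T` is the whitespace-trimmed target from `$ARGUMENTS`; `D` is the directory to scan and falls back to `.` when no target is given. The later blocks also fall back to `.` when the target is not a directory.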
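Why detection counts directories rather than build files can be seen with a throwaway fixture. The sketch below is an assumption-laden illustration, not part of the skill: the directory names `svc-a` and `svc-b` are hypothetical, and only three of the build-file names are searched to keep it short.

```shell
#!/usr/bin/env bash
# Hypothetical fixture: two service directories, one of which holds two build files.
root="$(mktemp -d)"
mkdir -p "$root/svc-a" "$root/svc-b"
touch "$root/svc-a/setup.py" "$root/svc-a/pyproject.toml" "$root/svc-b/package.json"

# Every build file found under the fixture.
files=$(find "$root" -maxdepth 3 \
  \( -name package.json -o -name pyproject.toml -o -name setup.py \) 2>/dev/null)

# Raw file count versus distinct parent directories.
nfiles=$(printf '%s\n' "$files" | grep -c .)
ndirs=$(printf '%s\n' "$files" | xargs -n1 dirname | sort -u | grep -c .)

echo "build files: $nfiles, distinct dirs: $ndirs"
rm -rf "$root"
```

Here the file count is 3 but the directory count is 2. With `svc-b` removed, the file count would still be 2 while the directory count drops to 1, so counting files would misreport a single Python project as a stack.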
65
+
66
+ ## Session output directory
67
+
68
+ Write the map under this directory (scoped to the current session):
69
+
70
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-session.sh`
71
+
72
+ ## Shared context
73
+
74
+ !`T="$(echo "$ARGUMENTS"|xargs)"; D="$T"; [ -d "$D" ] || D="."; cat "$D/CLAUDE.md" 2>/dev/null || cat CLAUDE.md 2>/dev/null || echo "No CLAUDE.md found"`
75
+
76
+ !`T="$(echo "$ARGUMENTS"|xargs)"; D="$T"; [ -d "$D" ] || D="."; cat "$D/DANGER-ZONES.md" 2>/dev/null || echo "No DANGER-ZONES.md found"`
77
+
78
+ ## Graphify availability
79
+
80
+ ```!
81
+ graphify --version 2>/dev/null || echo "NOT_INSTALLED"
82
+ ```
83
+
84
+ ## Prior maps (current session first, then prior, then legacy)
85
+
86
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-collect.sh 'map-codebase.md' || echo "No prior codebase map"`
87
+
88
+ !`"${CLAUDE_PROJECT_DIR:-.}"/.claude/hooks/uv-out-collect.sh 'map-stack.md' || echo "No prior stack map"`
89
+
90
+ ## Discovery
91
+
92
+ Both discovery sets run when the skill loads (they are cheap); the chosen mode uses only what it needs.
93
+
94
+ ### Services (stack mode)
95
+
96
+ ```!
97
+ T="$(echo "$ARGUMENTS"|xargs)"; find "${T:-.}" -maxdepth 3 \( -name "package.json" -o -name "pom.xml" -o -name "go.mod" -o -name "Cargo.toml" -o -name "requirements.txt" -o -name "setup.py" -o -name "pyproject.toml" \) -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | head -30
98
+ ```
99
+
100
+ ### Dockerfiles, compose, infra, contracts (stack mode)
101
+
102
+ ```!
103
+ T="$(echo "$ARGUMENTS"|xargs)"; find "${T:-.}" -maxdepth 4 \( -name "Dockerfile" -o -name "docker-compose*" -o -name "*.tf" -o -name "Chart.yaml" -o -name "values.yaml" -o -name "*.proto" -o -name "openapi*" -o -name "*.graphql" \) -not -path "*/node_modules/*" 2>/dev/null | head -30
104
+ ```
105
+
106
+ ### Existing knowledge graph (repo mode)
107
+
108
+ ```!
109
+ T="$(echo "$ARGUMENTS"|xargs)"; D="$T"; [ -d "$D" ] || D="."; head -n 80 "$D/graphify-out/GRAPH_REPORT.md" 2>/dev/null || echo "No existing graph found"
110
+ ```
111
+
112
+ ## Procedure
113
+
114
+ Based on the `MODE` printed above, follow the matching mode file. It owns the process and
115
+ the output for that mode:
116
+
117
+ - **MODE=repo** → follow `skills/understand/modes/repo.md`
118
+ - **MODE=stack** → follow `skills/understand/modes/stack.md`
@@ -0,0 +1,38 @@
1
+ # Repo mode
2
+
3
+ Map one codebase: architecture, business domains, key flows, entry points, danger zones.
4
+
5
+ The orchestrator has already loaded for you: the target (`$ARGUMENTS`), the chosen
6
+ `MODE`, the session output directory, the target's CLAUDE.md / DANGER-ZONES.md, Graphify
7
+ availability, the **existing knowledge graph** (if any), and any prior `map-codebase.md`
8
+ artifacts. Use them; don't re-fetch.
9
+
10
+ ## Process
11
+
12
+ If Graphify is available (see the orchestrator's availability check), run it first and
13
+ fold its findings into the map below.
14
+
15
+ 1. **Map the architecture** — identify major modules and how they fit together.
16
+ 2. **Map the business domain** — the real-world concepts the code models.
17
+ 3. **Trace the key flows** — the paths a request/job takes through the system.
18
+ 4. **Find the entry points** — main, routes, handlers, CLI commands.
19
+ 5. **Flag the danger zones** — fragile areas, high-fan-in modules, missing tests.
20
+
21
+ ## Output
22
+
23
+ Write the full map to `map-codebase.md` inside the session output directory printed by
24
+ the orchestrator (e.g. `uv-out/sessions/<sid>/map-codebase.md`), stamped with provenance
25
+ frontmatter (`session`, `skill: understand`, `created`). It should contain:
26
+
27
+ - **Architecture overview** — major modules and how they fit together
28
+ - **Business domain map** — the real-world concepts the code models
29
+ - **Sequence diagrams** (Mermaid) for the key flows
30
+ - **Entry points** — main, routes, handlers, CLI commands
31
+ - **Danger zones** — fragile areas, high-fan-in modules, missing tests
32
+
33
+ ## Report back
34
+
35
+ After writing the file, print only a one-line pointer to the terminal — do not repeat
36
+ the map. For example:
37
+
38
+ > Map written to `uv-out/sessions/<sid>/map-codebase.md` — go check it.