specpipe 1.0.0

Files changed (60)
  1. package/README.md +1319 -0
  2. package/bin/devkit.js +3 -0
  3. package/package.json +61 -0
  4. package/src/cli.js +76 -0
  5. package/src/commands/check.js +33 -0
  6. package/src/commands/diff.js +84 -0
  7. package/src/commands/init-adopt.js +54 -0
  8. package/src/commands/init-agents.js +118 -0
  9. package/src/commands/init-global.js +102 -0
  10. package/src/commands/init.js +311 -0
  11. package/src/commands/list.js +54 -0
  12. package/src/commands/remove.js +133 -0
  13. package/src/commands/upgrade.js +215 -0
  14. package/src/lib/agent-guards.js +100 -0
  15. package/src/lib/agent-install.js +161 -0
  16. package/src/lib/agents.js +280 -0
  17. package/src/lib/claude-global.js +183 -0
  18. package/src/lib/detector.js +93 -0
  19. package/src/lib/hasher.js +21 -0
  20. package/src/lib/installer.js +213 -0
  21. package/src/lib/logger.js +16 -0
  22. package/src/lib/manifest.js +102 -0
  23. package/src/lib/reconcile.js +56 -0
  24. package/templates/.claude/CLAUDE.md +79 -0
  25. package/templates/.claude/hooks/comment-guard.js +126 -0
  26. package/templates/.claude/hooks/file-guard.js +216 -0
  27. package/templates/.claude/hooks/glob-guard.js +104 -0
  28. package/templates/.claude/hooks/path-guard.sh +118 -0
  29. package/templates/.claude/hooks/self-review.sh +27 -0
  30. package/templates/.claude/hooks/sensitive-guard.sh +227 -0
  31. package/templates/.claude/settings.json +68 -0
  32. package/templates/docs/WORKFLOW.md +325 -0
  33. package/templates/docs/specs/.gitkeep +0 -0
  34. package/templates/hooks/specpipe-read-guard.sh +42 -0
  35. package/templates/hooks/specpipe-shell-guard.sh +65 -0
  36. package/templates/rules/specpipe-guards.md +40 -0
  37. package/templates/scripts/test-hooks.sh +66 -0
  38. package/templates/skills/sp-build/SKILL.md +776 -0
  39. package/templates/skills/sp-challenge/SKILL.md +255 -0
  40. package/templates/skills/sp-commit/SKILL.md +174 -0
  41. package/templates/skills/sp-explore/SKILL.md +730 -0
  42. package/templates/skills/sp-fix/SKILL.md +266 -0
  43. package/templates/skills/sp-humanize/SKILL.md +212 -0
  44. package/templates/skills/sp-investigate/SKILL.md +648 -0
  45. package/templates/skills/sp-md-render/SKILL.md +200 -0
  46. package/templates/skills/sp-md-render/components.md +415 -0
  47. package/templates/skills/sp-md-render/template.html +283 -0
  48. package/templates/skills/sp-plan/SKILL.md +947 -0
  49. package/templates/skills/sp-review/SKILL.md +268 -0
  50. package/templates/skills/sp-scaffold/SKILL.md +237 -0
  51. package/templates/skills/sp-scaffold/references/ARCHITECTURE.md.tmpl +228 -0
  52. package/templates/skills/sp-scaffold/references/DESIGN.md.tmpl +113 -0
  53. package/templates/skills/sp-scaffold/references/adr/NNNN-template.md +92 -0
  54. package/templates/skills/sp-scaffold/references/stack-profiles/react.md +36 -0
  55. package/templates/skills/sp-spec-render/SKILL.md +254 -0
  56. package/templates/skills/sp-spec-render/components.md +418 -0
  57. package/templates/skills/sp-spec-render/examples/user-auth.html +749 -0
  58. package/templates/skills/sp-spec-render/examples/user-auth.md +114 -0
  59. package/templates/skills/sp-spec-render/template.html +222 -0
  60. package/templates/skills/sp-voices/SKILL.md +1184 -0
@@ -0,0 +1,1184 @@
1
+ ---
2
+ description: |
3
+ Multi-voice review — orchestrate multiple LLMs (Claude + Codex + others) to
4
+ independently evaluate any input, synthesize consensus and disagreements
5
+ into actionable output.
6
+ Use when asked to "multi-voice review", "second opinion", "ý kiến nhiều mô hình",
7
+ "hỏi nhiều LLM", "ask multiple LLMs", "voices review", or "what do other models think".
8
+ Proactively suggest for high-stakes or controversial decisions — irreversible
9
+ architecture choices, security trade-offs, "are we sure about this design"
10
+ moments — where a single model's confidence is not enough.
11
+ Skip for trivial questions or work where one perspective is sufficient.
12
+ Works on code, specs, plans, ideas, or any text input.
13
+ allowed-tools: Read, Bash, Glob, Grep, Write, AskUserQuestion
14
+ ---
15
+ # /sp-voices — Multi-Voice Review
16
+
17
+ Get independent perspectives from multiple LLMs on anything —
18
+ code, ideas, documents, architecture, skills, decisions.
19
+
20
+ Target: $ARGUMENTS
21
+
22
+ ---
23
+
24
+ ## How It Works
25
+
26
+ ```
27
+ 1. Understand what you're asking (Phase 1)
28
+ 2. Find available reviewers (Phase 2)
29
+ 3. Ask them — open-ended, not templated (Phase 3)
30
+ 4. Synthesize their responses (Phase 4)
31
+ 5. Show you what matters for YOUR decision (Phase 5)
32
+ ```
33
+
34
+ ---
35
+
36
+ ## Phase 1: Understand Intent
37
+
38
+ Read `$ARGUMENTS`. Don't classify into a box — understand what the user
39
+ is trying to DECIDE.
40
+
41
+ ### 1.1 — What is the user trying to decide?
42
+
43
+ ```
44
+ Parse $ARGUMENTS for decision intent:
45
+
46
+ "what do you think about..." → User wants: opinions + consensus on direction
47
+ "review code/diff" → User wants: bugs, risks, merge/block decision
48
+ "check this doc" → User wants: readiness assessment, gaps
49
+ "is this approach ok" → User wants: validation or alternatives
50
+ "any issues with this" → User wants: risk identification
51
+ "compare A vs B" → User wants: trade-off analysis
52
+ "this strategy" → User wants: go/pivot/stop signal
53
+
54
+ If unclear → ask 1 question:
55
+ "What decision are you trying to make from this review?"
56
+ Don't ask "what type of review" — ask "what decision".
57
+ ```
58
+
59
+ ### 1.2 — What material is involved?
60
+
61
+ ```bash
62
+ # If $ARGUMENTS points to file(s)
63
+ # Read and measure
64
+ MATERIAL=$(cat <file> 2>/dev/null)
65
+ LINE_COUNT=$(printf '%s\n' "$MATERIAL" | wc -l | xargs)  # not LINES: bash reserves that name for terminal height
66
+ echo "Material: <file>, $LINE_COUNT lines"
67
+
68
+ # If $ARGUMENTS is about git diff
69
+ MATERIAL=$(git diff main...HEAD 2>/dev/null)
70
+ [ -z "$MATERIAL" ] && MATERIAL=$(git diff HEAD~1 2>/dev/null)
71
+
72
+ # If $ARGUMENTS is a question/idea (no file)
73
+ # Material = the question itself + any referenced context
74
+ ```
75
+
76
+ If material > 32KB → chunk by logical sections.
77
+
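A minimal sketch of the size check above, assuming "32KB" means 32768 bytes and that "logical sections" are Markdown `## ` headings. `needs_chunking` and `split_by_heading` are hypothetical helper names, not part of specpipe.

```shell
# Hypothetical helpers; not part of the package.

# Returns 0 (true) when the material is larger than 32768 bytes.
needs_chunking() {
  local bytes
  bytes=$(printf '%s' "$1" | wc -c | tr -d ' ')
  [ "$bytes" -gt 32768 ]
}

# Writes one file per "## " section into an existing directory.
# Text before the first heading lands in chunk-000.md.
split_by_heading() {
  local material=$1 outdir=$2
  printf '%s\n' "$material" | awk -v dir="$outdir" '
    /^## / { n++ }
    { file = sprintf("%s/chunk-%03d.md", dir, n); print >> file; close(file) }
  '
}
```

Usage would look like `needs_chunking "$MATERIAL" && split_by_heading "$MATERIAL" "$(mktemp -d)"`. Splitting on headings is only one reading of "logical sections"; a diff might instead be split per file.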
78
+ ### 1.3 — Confirm before proceeding
79
+
80
+ **Always confirm intent in 1 line before spawning voices.**
81
+ Include voice count + which voice(s).
82
+
83
+ ```
84
+ Simple (1 voice, auto-selected):
85
+ "Asking Perplexity if anyone has solved a similar problem. Ok?"
86
+ "Having Claude review auth.ts for bugs. Ok?"
87
+
88
+ Medium (2 voices, auto-selected):
89
+ "Getting 2 opinions: Claude (code logic) + Perplexity (security/CVEs). Ok?"
90
+ "Asking GPT (business logic) + Claude (technical feasibility). Ok?"
91
+
92
+ Complex (N voices, user picks via AskUserQuestion):
93
+ "Complex problem — I'll ask you to pick voices. First, confirm:
94
+ you want to evaluate [intent summary] — correct?"
95
+ ```
96
+
97
+ **If user corrects → adjust intent + voice selection.**
98
+ **If user says "add voice" or "fewer voices" → adjust.**
99
+
100
+ ---
101
+
102
+ ## Phase 2: Find Reviewers
103
+
104
+ ### 2.1 — Probe Availability
105
+
106
+ ```bash
107
+ echo "=== Reviewer availability ==="
108
+
109
+ # External LLMs
110
+ command -v openai &>/dev/null && echo "OPENAI_CLI: available" || \
111
+ ([ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API: key set" || echo "OPENAI: ✗")
112
+ # Codex needs binary AND auth (one of: $CODEX_API_KEY, $OPENAI_API_KEY,
113
+ # or ${CODEX_HOME:-~/.codex}/auth.json). Binary alone isn't enough.
114
+ if command -v codex &>/dev/null; then
115
+ _CODEX_AUTH_FILE="${CODEX_HOME:-$HOME/.codex}/auth.json"
116
+ if [ -n "$CODEX_API_KEY" ] || [ -n "$OPENAI_API_KEY" ] || [ -f "$_CODEX_AUTH_FILE" ]; then
117
+ echo "CODEX_CLI: available"
118
+ else
119
+ echo "CODEX: ✗ (binary present, no auth — run 'codex login')"
120
+ fi
121
+ else
122
+ echo "CODEX: ✗"
123
+ fi
124
+ # Gemini API (generativelanguage / AI Studio) is a hosted REST endpoint — needs
125
+ # $GEMINI_API_KEY. NOTE: the standalone `gemini` CLI was retired 2026-06-18 and
126
+ # folded into Antigravity CLI (`agy`) — probe that separately below, not here.
127
+ [ -n "$GEMINI_API_KEY" ] && echo "GEMINI_API: key set" || echo "GEMINI: ✗"
128
+ [ -n "$PERPLEXITY_API_KEY" ] && echo "PERPLEXITY: available" || echo "PERPLEXITY: ✗"
129
+ # Antigravity CLI (`agy`) — Google's agentic terminal coding agent, the successor
130
+ # to the retired `gemini` CLI. Agentic like Codex: reads code, runs commands.
131
+ # Needs binary AND auth (one of: $ANTIGRAVITY_API_KEY, $GEMINI_API_KEY — both
132
+ # accepted by agy — or OS-keyring/OAuth state from a prior interactive `agy`
133
+ # login under ~/.gemini/antigravity-cli/). Binary alone isn't enough.
134
+ if command -v agy &>/dev/null; then
135
+ if [ -n "$ANTIGRAVITY_API_KEY" ] || [ -n "$GEMINI_API_KEY" ] || [ -d "$HOME/.gemini/antigravity-cli" ]; then
136
+ echo "ANTIGRAVITY_CLI: available"
137
+ else
138
+ echo "ANTIGRAVITY: ✗ (binary present, no auth — run 'agy' once to log in)"
139
+ fi
140
+ else
141
+ echo "ANTIGRAVITY: ✗"
142
+ fi
143
+ [ -n "$ANTHROPIC_API_KEY" ] && echo "ANTHROPIC_API: key set" || echo "ANTHROPIC: host only"
144
+ command -v ollama &>/dev/null && echo "OLLAMA: available" || echo "OLLAMA: ✗"
145
+ command -v claude &>/dev/null && echo "SELF_SPAWN: available" || echo "SELF_SPAWN: ✗"
146
+
147
+ echo "==========================="
148
+ ```
149
+
150
+ ### 2.2 — Auth Probe (hosted API voices only)
151
+
152
+ Before building expensive prompts, verify that API keys are actually valid.
153
+ A set key does not mean a working key.
154
+
155
+ ```bash
156
+ # Lightweight auth probe — only for voices that will be used
157
+ # Each probe: one GET to a model-list endpoint (no completion tokens); any status other than 200 counts as FAILED
158
+
159
+ # OpenAI
160
+ if [ -n "$OPENAI_API_KEY" ]; then
161
+ _OAI_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
162
+ -H "Authorization: Bearer $OPENAI_API_KEY" \
163
+ https://api.openai.com/v1/models 2>/dev/null)
164
+ [ "$_OAI_STATUS" = "200" ] && echo "OPENAI_AUTH: valid" || echo "OPENAI_AUTH: FAILED ($_OAI_STATUS)"
165
+ fi
166
+
167
+ # Perplexity — SKIPPED: Perplexity has no free auth-probe endpoint.
168
+ # A real chat completion (even max_tokens:1) is billed per request, so probing
169
+ # every run wastes money. Trust the key is set; if invalid, the actual review
170
+ # call will return 401 and Phase 3.5 (Post-Response Checks) flags it as
171
+ # "auth failed". Net cost: 1 wasted real call vs N probe calls per session.
172
+ if [ -n "$PERPLEXITY_API_KEY" ]; then
173
+ echo "PERPLEXITY_AUTH: assumed valid (probe skipped — would cost money)"
174
+ fi
175
+
176
+ # Gemini API — use header auth (x-goog-api-key) to keep the key out of URLs/logs.
177
+ if [ -n "$GEMINI_API_KEY" ]; then
178
+ _GEM_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
179
+ -H "x-goog-api-key: $GEMINI_API_KEY" \
180
+ https://generativelanguage.googleapis.com/v1beta/models 2>/dev/null)
181
+ [ "$_GEM_STATUS" = "200" ] && echo "GEMINI_AUTH: valid" || echo "GEMINI_AUTH: FAILED ($_GEM_STATUS)"
182
+ fi
183
+
184
+ # Antigravity CLI — no cheap REST auth-probe endpoint; it authenticates the agent
185
+ # harness on first call. If $ANTIGRAVITY_API_KEY / $GEMINI_API_KEY is set or OAuth
186
+ # state exists, trust it; a dead key surfaces as an error on the real call
187
+ # (Phase 3.5 flags it).
188
+ if command -v agy &>/dev/null && { [ -n "$ANTIGRAVITY_API_KEY" ] || [ -n "$GEMINI_API_KEY" ] || [ -d "$HOME/.gemini/antigravity-cli" ]; }; then
189
+ echo "ANTIGRAVITY_AUTH: assumed valid (agent harness — probe skipped)"
190
+ fi
191
+
192
+ # Anthropic
193
+ if [ -n "$ANTHROPIC_API_KEY" ]; then
194
+ _ANT_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
195
+ -H "x-api-key: $ANTHROPIC_API_KEY" \
196
+ -H "anthropic-version: 2023-06-01" \
197
+ https://api.anthropic.com/v1/models 2>/dev/null)
198
+ [ "$_ANT_STATUS" = "200" ] && echo "ANTHROPIC_AUTH: valid" || echo "ANTHROPIC_AUTH: FAILED ($_ANT_STATUS)"
199
+ fi
200
+ ```
201
+
202
+ If any voice's auth probe returns FAILED:
203
+ - Remove it from available voices BEFORE voice selection
204
+ - Note in output: "Voice X skipped — auth failed"
205
+ - Do NOT waste tokens building a prompt for a dead key
206
+
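The removal step above could be sketched as follows. This assumes the probe output uses the `<NAME>_AUTH: FAILED (...)` lines printed in 2.2 and that candidate voices are tracked as a space-separated list of upper-case names; `filter_voices` is a hypothetical helper.

```shell
# Hypothetical helper; not part of the package.
# stdin: probe output from section 2.2
# $1:    space-separated candidate voices, e.g. "OPENAI GEMINI PERPLEXITY"
# Prints the voices whose probe did not report FAILED; notes skips on stderr.
filter_voices() {
  local probe_output voice kept=""
  probe_output=$(cat)
  for voice in $1; do
    if printf '%s\n' "$probe_output" | grep -q "^${voice}_AUTH: FAILED"; then
      echo "Voice $voice skipped (auth failed)" >&2
    else
      kept="$kept $voice"
    fi
  done
  echo "${kept# }"
}
```

Voices with no probe line at all (for example Perplexity, whose probe is skipped) are kept, which matches the "assumed valid" behaviour described above.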
207
+ ### 2.3 — Reviewer Priority
208
+
209
+ ```
210
+ Tier 1 — Different model family (most diverse):
211
+ GPT, Gemini, Perplexity
212
+ → Different training = different perspectives
213
+
214
+ Tier 2 — Agentic / independent session (reads code, runs commands, or fresh context):
215
+ Codex CLI, Antigravity CLI (`agy`), Anthropic API (different Claude model)
216
+ → Antigravity CLI is Google's agentic terminal agent (successor to the retired
217
+ `gemini` CLI, shut down 2026-06-18) — it actually reads the repo, like Codex.
218
+ Pick the backing model with `agy --model` (Gemini 3.1 Pro, Claude, GPT-OSS —
219
+ depends on plan), so this voice doubles as a Google-family OR cross-family reviewer.
220
+ → Independent context = still valuable
221
+
222
+ Tier 3 — Local:
223
+ Ollama
224
+ → Free, private, lower capability
225
+
226
+ Tier 4 — Self-spawn (always available):
227
+ claude --print (fresh context, no conversation history)
228
+ → Inherits the current Claude Code session's model by default
229
+ (override via $MF_VOICES_SELF_SPAWN_MODEL)
230
+ → Same model but fresh eyes — better than nothing
231
+ → MARK in output: "self-spawn — same model family"
232
+ ```
233
+
234
+ ### Voice Strengths — Who's Good at What
235
+
236
+ ```
237
+ ┌─────────────────┬──────────────────────────────────────────────────────┐
238
+ │ Voice │ Best For │
239
+ ├─────────────────┼──────────────────────────────────────────────────────┤
240
+ │ Claude │ Code review, nuanced reasoning, design/architecture, │
241
+ │ (Haiku 4.5 / │ long-context analysis, careful edge case thinking. │
242
+ │ Sonnet 4.6 / │ Default voice: sonnet-4-6 ($3/$15). Self-spawn │
243
+ │ Opus 4.7) │ inherits the current Claude Code session's model │
244
+ │ │ (override via $MF_VOICES_SELF_SPAWN_MODEL — e.g. │
245
+ │ │ haiku-4-5 $1/$5 for cheap second opinion). │
246
+ │ │ Bump to opus-4-7 ($5/$25) for │
247
+ │ │ hardest reasoning. │
248
+ │ │ Strongest at: code quality, readability, subtle bugs.│
249
+ ├─────────────────┼──────────────────────────────────────────────────────┤
250
+ │ GPT (5-mini / │ Wide domain knowledge, business logic, product │
251
+ │ 5.5) │ thinking, real-world patterns. Strong at connecting │
252
+ │ │ technical decisions to business impact. │
253
+ │ │ Default: gpt-5-mini ($0.25/$2). gpt-5.5 ($5/$30, │
254
+ │ │ released 2026-04-23) only when top quality matters │
255
+ │ │ — gpt-5.5 is now pricier than Sonnet 4.6. │
256
+ │ │ Strongest at: domain expertise, practical tradeoffs. │
257
+ ├─────────────────┼──────────────────────────────────────────────────────┤
258
+ │ Gemini │ Broad analysis, large context window, multi-modal. │
259
+ │ (3 Flash / │ Good at synthesizing large documents. │
260
+ │ 3.1 Pro) │ Default: gemini-3-flash ($0.50/$3). Upgrade to │
261
+ │ │ gemini-3.1-pro-preview ($2/$12, $4/$18 >200k ctx). │
262
+ │ │ NOTE: gemini-3-pro deprecated 2026-03-09 — calls │
263
+ │ │ to that model ID will fail. Use 3.1-pro-preview. │
264
+ │ │ Strongest at: big-picture, cross-cutting concerns. │
265
+ ├─────────────────┼──────────────────────────────────────────────────────┤
266
+ │ Perplexity │ Real-time web search. Knows current CVEs, latest │
267
+ │ (sonar / │ best practices, library health, who solved this │
268
+ │ sonar-pro) │ problem before. CITES SOURCES. │
269
+ │ │ Default: sonar-pro ($3/$15) for citation quality. │
270
+ │ │ sonar ($1/$1) for cheap quick lookups. │
271
+ │ │ Strongest at: security, research, current info, │
272
+ │ │ "is this current best practice?". │
273
+ │ │ UNIQUE: only voice with live web access. │
274
+ ├─────────────────┼──────────────────────────────────────────────────────┤
275
+ │ Antigravity CLI │ Google's agentic terminal agent (`agy`), successor │
276
+ │ (`agy`) │ to the retired `gemini` CLI (shut down 2026-06-18). │
277
+ │ │ Agentic like Codex: reads the repo, runs commands. │
278
+ │ │ Backed by a model via `agy --model` (Gemini 3.1 Pro,│
279
+ │ │ Claude Sonnet/Opus, GPT-OSS — plan-dependent). │
280
+ │ │ Strongest at: big-picture + actually running code. │
281
+ ├─────────────────┼──────────────────────────────────────────────────────┤
282
+ │ Codex CLI │ Agentic — reads code itself, runs commands, │
283
+ │ │ explores repo structure. Finds things text-only │
284
+ │ │ review misses because it actually RUNS the code. │
285
+ │ │ Strongest at: code bugs, runtime behavior, │
286
+ │ │ "does this actually work when you run it?". │
287
+ ├─────────────────┼──────────────────────────────────────────────────────┤
288
+ │ Ollama (local) │ Privacy-sensitive reviews. No data leaves machine. │
289
+ │ │ Capability varies by model (llama3.3:70b decent). │
290
+ │ │ Strongest at: private code, air-gapped envs. │
291
+ ├─────────────────┼──────────────────────────────────────────────────────┤
292
+ │ Self-spawn │ Always available. Fresh context = no conversation │
293
+ │ (Claude CLI) │ bias. Same model family = possible blind spots. │
294
+ │ │ Strongest at: "second pair of eyes" when nothing │
295
+ │ │ else available. │
296
+ └─────────────────┴──────────────────────────────────────────────────────┘
297
+ ```
298
+
299
+ ### Smart Voice Assignment
300
+
301
+ The skill selects voices based on intent and voice strengths:
302
+
303
+ ```
304
+ Intent: code review
305
+ Best voices: Claude (quality) + Codex (runtime) + Perplexity (CVEs)
306
+ Alt: Claude + GPT (domain logic) + self-spawn
307
+
308
+ Intent: strategy / business decision
309
+ Best voices: GPT (domain/business) + Claude (reasoning) + Perplexity (research)
310
+ Alt: GPT + Gemini (big-picture) + self-spawn
311
+
312
+ Intent: research / deep technical topic
313
+ Best voices: Perplexity (current info) + GPT (broad knowledge) + Claude (reasoning)
314
+ Alt: Perplexity + Gemini (large-context synthesis) + self-spawn
315
+
316
+ Intent: security review
317
+ Best voices: Perplexity (CVEs, advisories) + Claude (logic) + Codex (runtime test)
318
+ Alt: Perplexity + GPT + self-spawn
319
+
320
+ Intent: architecture / design
321
+ Best voices: Claude (design) + GPT (practical tradeoffs) + Gemini (big picture)
322
+ Alt: Antigravity CLI (reads the repo) + Claude + Perplexity (who solved this before)
323
+
324
+ Intent: document readiness
325
+ Best voices: Claude (nuance) + GPT (domain) + Perplexity (current standards)
326
+ Alt: Claude + Gemini (logical consistency) + self-spawn
327
+
328
+ Intent: comparison (A vs B)
329
+ Best voices: Perplexity (research/benchmarks) + GPT (practical) + Claude (reasoning)
330
+ Alt: Gemini (structured comparison) + any 2
331
+
332
+ Fallback (any intent, limited voices):
333
+ Use whatever is available. Self-spawn as last resort.
334
+ ALWAYS note which voices would be ideal but weren't available.
335
+ ```
336
+
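As a rough illustration, the "Best voices" rows above can be expressed as a lookup. The intent keys (`code-review`, `security`, and so on) and the function name `best_voices_for` are assumptions made for this sketch; the skill itself describes intents in prose only.

```shell
# Hypothetical helper; intent keys are illustrative, not defined by the skill.
# Prints the preferred voices for an intent, in the order listed above.
best_voices_for() {
  case $1 in
    code-review)  echo "claude codex perplexity" ;;
    strategy)     echo "gpt claude perplexity" ;;
    research)     echo "perplexity gpt claude" ;;
    security)     echo "perplexity claude codex" ;;
    architecture) echo "claude gpt gemini" ;;
    readiness)    echo "claude gpt perplexity" ;;
    comparison)   echo "perplexity gpt claude" ;;
    *)            echo "self-spawn" ;;  # fallback: last-resort voice
  esac
}
```

The result still has to be intersected with the voices that passed Phase 2; any ideal voice that is missing should be noted in the output, as the fallback rule requires.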
337
+ ### Voice Count — Adaptive, Not Fixed
338
+
339
+ ```
340
+ Do NOT default to 3 voices. Voice count depends on complexity.
341
+
342
+ Simple (clear question, material < 100 lines, straightforward intent):
343
+ → 1 voice — pick BEST FIT for intent, don't ask
344
+ → Example: "any bugs in this?" → spawn Claude (best for code)
345
+ → Example: "has anyone done this before?" → spawn Perplexity (web search)
346
+ → Fast, cheap, enough for simple questions
347
+
348
+ Medium (material 100-500 lines, a few concerns, clear but nuanced intent):
349
+ → 2 voices — pick 2 best fit, don't ask
350
+ → Example: "review security + logic" → Perplexity (CVEs) + Claude (logic)
351
+
352
+ Complex (material > 500 lines, multi-faceted, strategy/architecture, high stakes):
353
+ → Ask user to pick voices via AskUserQuestion
354
+ → Suggest combo based on intent + available voices
355
+ ```
356
+
357
+ ### Complexity Detection
358
+
359
+ ```
360
+ Signals for SIMPLE (auto 1 voice):
361
+ - Short, clear question ("any bugs?", "approach ok?")
362
+ - Material < 100 lines or 1 small file
363
+ - User says "quick", "fast", "just one opinion"
364
+
365
+ Signals for MEDIUM (auto 2 voices):
366
+ - Material 100-500 lines
367
+ - Question has 2+ concerns ("security + performance")
368
+ - User didn't say "quick" but also didn't say "thorough"
369
+
370
+ Signals for COMPLEX (ask user):
371
+ - Material > 500 lines or multi-file
372
+ - Strategy, architecture, or high-stakes decision
373
+ - User says "thorough", "complete", "multiple perspectives"
374
+ - Disagreements likely (controversial topic, multiple valid approaches)
375
+ - User is explicit: "/sp-voices 3" or "/sp-voices full"
376
+
377
+ When in doubt → treat as MEDIUM (2 voices, don't ask).
378
+ ```
379
+
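The size thresholds above can be sketched as a small function. It models only the line-count signal; the other signals (user wording, stakes, number of concerns) would still need judgment. `voice_count_for` is a hypothetical name.

```shell
# Hypothetical helper; models only the material-size signal.
# $1: number of lines in the material
# Prints 1 (simple), 2 (medium), or "ask" (complex: let the user pick).
voice_count_for() {
  local lines=$1
  if [ "$lines" -lt 100 ]; then
    echo 1
  elif [ "$lines" -le 500 ]; then
    echo 2
  else
    echo ask
  fi
}
```

For example, `voice_count_for "$(wc -l < auth.ts)"` would give a starting point that explicit user wording such as "quick" or "thorough" can then override.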
380
+ ---
381
+
382
+ ## Phase 3: Ask Reviewers
383
+
384
+ ### 3.1 — Prompt Construction
385
+
386
+ **Core principle: ask an open question, not a structured template.**
387
+
388
+ Every reviewer gets:
389
+
390
+ ```
391
+ [Filesystem Boundary — agentic voices only]
392
+ +
393
+ [Base Question]
394
+ +
395
+ [Bias — light nudge matched to user's intent]
396
+ +
397
+ [Material]
398
+ ```
399
+
400
+ **Filesystem Boundary — prepend ONLY for agentic voices (Codex CLI, Antigravity
401
+ CLI, self-spawn, local agents). Hosted chat APIs (OpenAI, Gemini, Anthropic
402
+ Messages, Perplexity) have no file access — the boundary is wasted tokens for
403
+ them.**
404
+
405
+ ```
406
+ IMPORTANT: Do NOT read or execute any files under ~/.claude/, .claude/,
407
+ .cursor/, agents/, .claude/skills/, node_modules/, __pycache__/,
408
+ .git/objects/, vendor/, Pods/, DerivedData/, dist/, build/, .next/.
409
+ These paths contain skill definitions, build artifacts, or vendored code
410
+ meant for a different AI system or tooling — they will waste your time
411
+ and pull you off-task. Ignore them completely. Focus only on the content
412
+ provided below.
413
+ ```
414
+
415
+ **Base Question (same for all voices, all intents):**
416
+ ```
417
+ "Review the following. Be direct, be honest.
418
+
419
+ - What's wrong or could go wrong?
420
+ - What concerns you?
421
+ - What would you change?
422
+ - What's good and should stay?
423
+
424
+ Be specific — point to exact locations.
425
+ If you see an overall pattern, say it.
426
+ If nothing is wrong, say that — don't invent problems.
427
+
428
+ MATERIAL:
429
+ <content>"
430
+ ```
431
+
432
+ ### 3.2 — Bias Selection (matched to intent, not to type)
433
+
434
+ Bias is a LIGHT NUDGE — 1-2 sentences appended after base question.
435
+ Reviewer can and should go beyond the nudge.
436
+
437
+ **Choose 3 biases that match the user's DECISION INTENT:**
438
+
439
+ ```
440
+ When user wants DIRECTION (go/pivot/stop):
441
+ Bias 1: "Pay special attention to: is this feasible? What's the biggest risk?"
442
+ Bias 2: "Pay special attention to: who benefits? Does this solve a real problem or an imagined one?"
443
+ Bias 3: "Pay special attention to: is there a simpler way to achieve the same goal?"
444
+
445
+ When user wants VALIDATION (ok or not):
446
+ Bias 1: "Pay special attention to: is this approach on the right track? What's missing?"
447
+ Bias 2: "Pay special attention to: what risks are being overlooked? What failure modes haven't been considered?"
448
+ Bias 3: "Pay special attention to: has anyone solved this problem better already?"
449
+
450
+ When user wants BUG/RISK FINDING:
451
+ Bias 1: "Pay special attention to: is the code/logic correct? Edge cases?"
452
+ Bias 2: "Pay special attention to: security? How could this be exploited?"
453
+ Bias 3: "Pay special attention to: maintainability? Will the next person understand this?"
454
+
455
+ When user wants COMPARISON (A vs B):
456
+ Bias 1: "Pay special attention to: where does A beat B? Where does B beat A?"
457
+ Bias 2: "Pay special attention to: risk of each option? Which one fails worse?"
458
+ Bias 3: "Pay special attention to: is there an option C that neither has considered?"
459
+
460
+ When user wants READINESS CHECK:
461
+ Bias 1: "Pay special attention to: is this ready to use? What's missing?"
462
+ Bias 2: "Pay special attention to: any internal contradictions? Is the logic consistent?"
463
+ Bias 3: "Pay special attention to: can the implementer read this and actually execute?"
464
+
465
+ When intent doesn't fit above:
466
+ No bias — just base question. Let voices decide what matters.
467
+ ```
468
+
469
+ **Every bias ends with:** "But if you see a more important issue, say that instead."
470
+
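Putting 3.1 and 3.2 together, prompt assembly might look like the sketch below. The variable names `BOUNDARY` and `BASE_QUESTION`, the placeholder strings assigned to them, and the `build_prompt` argument order are all assumptions for illustration; in practice the two variables would hold the full texts from section 3.1.

```shell
# Placeholders only: substitute the full texts from section 3.1.
BOUNDARY="(filesystem boundary text from section 3.1)"
BASE_QUESTION="(base question text from section 3.1)"

# Hypothetical helper; not part of the package.
# $1: "agentic" or "hosted"   $2: bias sentence (may be empty)   $3: material
build_prompt() {
  local kind=$1 bias=$2 material=$3
  # Only agentic voices can touch the filesystem, so only they get the boundary.
  if [ "$kind" = "agentic" ]; then
    printf '%s\n\n' "$BOUNDARY"
  fi
  printf '%s\n' "$BASE_QUESTION"
  # The bias is a light nudge and always carries the escape clause.
  if [ -n "$bias" ]; then
    printf '\n%s But if you see a more important issue, say that instead.\n' "$bias"
  fi
  printf '\nMATERIAL:\n%s\n' "$material"
}
```

A call such as `PROMPT=$(build_prompt hosted "Pay special attention to: security? How could this be exploited?" "$MATERIAL")` would then produce a prompt with no boundary text, since hosted chat APIs have no file access.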
471
+ ### 3.3 — Special Voice Roles
472
+
473
+ **Perplexity (when available):**
474
+ Always assign to the bias that needs real-time information:
475
+ - Security → search CVEs, advisories
476
+ - Strategy → search who else solved this
477
+ - Research → search current standards, benchmarks
478
+ - Comparison → search real-world data
479
+
480
+ Dedicated system prompt for Perplexity:
481
+ ```
482
+ "You have web search. Use it to find:
483
+ - Known vulnerabilities in mentioned libraries/patterns
484
+ - Who else solved this problem and how
485
+ - Current best practices (not outdated)
486
+ - Real benchmarks/case studies if discussing performance
487
+ Cite sources for every external claim."
488
+ ```
489
+
490
+ **Antigravity CLI (`agy`, when available):** Agentic like Codex — reads the
491
+ repo itself and can run commands. Assign to biases that benefit from actually
492
+ exploring the code: architecture (dependency graph), big-picture review,
493
+ "does this hold up across the whole codebase". Pick the backing model with
494
+ `agy --model` (default is plan-dependent; `gemini-3.1-pro` for large-context
495
+ reasoning), and pass `--sandbox` so the review stays read-only. Do NOT use for
496
+ pure idea/strategy review — wastes agentic tokens, same caveat as Codex.
497
+
498
+ **Codex CLI (when available):**
499
+ Assign to the bias that needs actual code interaction:
500
+ - Code review → reads files, traces execution
501
+ - Bug hunting → can actually run tests
502
+ - Architecture → explores repo structure, dependency graph
503
+ Do NOT use for idea/strategy review — overkill, wastes agentic tokens.
504
+
505
+ ### 3.4 — Execute Calls
506
+
507
+ Each voice call is wrapped in a timeout. If a call hangs, skip it and
508
+ continue with remaining voices.
509
+
510
+ **JSON safety:** every payload is built with `jq -n --arg` so material
511
+ containing quotes, newlines, or backslashes cannot break the JSON or
512
+ inject extra fields. Never interpolate `$PROMPT` directly into a JSON
513
+ string with `'"$PROMPT"'`.
514
+
515
+ ```bash
516
+ # Defensive: PROMPT must be set before any voice call
517
+ : "${PROMPT:?PROMPT is empty — refusing to call voices}"
518
+
519
+ # macOS does not ship GNU `timeout` — fall back to `gtimeout` (brew coreutils).
520
+ # Without this shim, every voice call below errors with "command not found".
521
+ if ! command -v timeout >/dev/null 2>&1; then
522
+ if command -v gtimeout >/dev/null 2>&1; then
523
+ timeout() { command gtimeout "$@"; }
524
+ else
525
+ echo "WARN: neither timeout nor gtimeout found — install coreutils (brew install coreutils)"
526
+ timeout() { shift; "$@"; } # no-op fallback (no timeout enforcement)
527
+ fi
528
+ fi
529
+
530
+ # Timeout wrapper — use for every voice call
531
+ # Usage: voice_call <timeout_seconds> <command...>
532
+ voice_call() {
533
+ local _TIMEOUT=$1; shift
534
+ timeout "$_TIMEOUT" "$@" 2>/tmp/voice-err-$$.txt
535
+ local _EXIT=$?
536
+ if [ "$_EXIT" = "124" ]; then
537
+ echo "VOICE_TIMEOUT: call exceeded ${_TIMEOUT}s" >&2  # stderr, so a downstream jq never receives this line as input
538
+ return 124
539
+ fi
540
+ return $_EXIT
541
+ }
542
+
543
+ # OpenAI GPT (timeout: 60s)
544
+ # gpt-5-mini: $0.25/$2.00 per 1M tokens — cheap, strong default for review.
545
+ # Upgrade to "gpt-5.5" ($5/$30, released 2026-04-23) only when top quality
546
+ # matters — note gpt-5.5 is now more expensive per output than Sonnet 4.6.
547
+ # NOTE: GPT-5 family uses `max_completion_tokens`, not `max_tokens` (legacy).
548
+ # Sending `max_tokens` to gpt-5* returns HTTP 400.
549
+ _PAYLOAD=$(jq -n --arg p "$PROMPT" '{
550
+ model: "gpt-5-mini",
551
+ messages: [{role: "user", content: $p}],
552
+ max_completion_tokens: 4000
553
+ # temperature omitted: GPT-5 family reasoning models are reported to accept only the default value
554
+ }')
555
+ voice_call 60 curl -s https://api.openai.com/v1/chat/completions \
556
+ -H "Authorization: Bearer $OPENAI_API_KEY" \
557
+ -H "Content-Type: application/json" \
558
+ -d "$_PAYLOAD" | jq -r '.choices[0].message.content'
559
+
560
+ # Gemini (timeout: 60s)
561
+ # gemini-3-flash: $0.50/$3.00 per 1M tokens — cheapest Tier-1 voice.
562
+ # Upgrade to "gemini-3.1-pro-preview" ($2.00/$12.00, $4/$18 over 200k ctx)
563
+ # for big-picture work on long material.
564
+ # NOTE: gemini-3-pro was deprecated/shut down 2026-03-09. Hardcoding
565
+ # "gemini-3-pro" returns 404 — always use "gemini-3.1-pro-preview".
566
+ _PAYLOAD=$(jq -n --arg p "$PROMPT" '{
567
+ contents: [{parts: [{text: $p}]}],
568
+ generationConfig: {maxOutputTokens: 4000, temperature: 0.3}
569
+ }')
570
+ voice_call 60 curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash:generateContent" \
571
+ -H "x-goog-api-key: $GEMINI_API_KEY" \
572
+ -H "Content-Type: application/json" \
573
+ -d "$_PAYLOAD" | jq -r '.candidates[0].content.parts[0].text'
574
+
575
+ # Perplexity (timeout: 90s — web search takes longer)
576
+ # sonar-pro: $3/$15 per 1M — keeps citations + deeper search.
577
+ # For cheap quick lookups, "sonar" is $1/$1. Use sonar-pro when sources matter.
578
+ _PAYLOAD=$(jq -n --arg p "$PROMPT" '{
579
+ model: "sonar-pro",
580
+ messages: [
581
+ {role: "system", content: "You are a reviewer with web search. Search for relevant CVEs, benchmarks, prior art, and current best practices. Cite sources."},
582
+ {role: "user", content: $p}
583
+ ],
584
+ max_tokens: 4000,
585
+ temperature: 0.3
586
+ }')
587
+ voice_call 90 curl -s https://api.perplexity.ai/chat/completions \
588
+ -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
589
+ -H "Content-Type: application/json" \
590
+ -d "$_PAYLOAD" | jq -r '.choices[0].message.content'
591
+
592
+ # Anthropic API (timeout: 60s)
593
+ # claude-sonnet-4-6: $3/$15 per 1M — main quality voice for code/reasoning.
594
+ # For cheap independent second opinion, "claude-haiku-4-5" ($1/$5) works too.
595
+ _PAYLOAD=$(jq -n --arg p "$PROMPT" '{
596
+ model: "claude-sonnet-4-6",
597
+ max_tokens: 4000,
598
+ messages: [{role: "user", content: $p}]
599
+ }')
600
+ voice_call 60 curl -s https://api.anthropic.com/v1/messages \
601
+ -H "x-api-key: $ANTHROPIC_API_KEY" \
602
+ -H "content-type: application/json" \
603
+ -H "anthropic-version: 2023-06-01" \
604
+ -d "$_PAYLOAD" | jq -r '.content[0].text'
605
+
606
+ # Codex CLI (timeout: 300s — agentic, reads code itself)
607
+ # Use `codex exec` for free-form review prompts. Critical flags:
608
+ # < /dev/null — prevents stdin deadlock (regression in codex 0.120.x)
609
+ # -C "$_REPO_ROOT" — runs at git root, not random CWD
610
+ # -s read-only — sandbox, codex cannot mutate files
611
+ # -c '...="high"' — explicit reasoning effort (default is too low)
612
+ # --enable web_search_cached — lets codex look up CVEs / current docs
613
+ # For diff-against-main reviews specifically, swap `exec "$PROMPT"` for
614
+ # `review "$PROMPT" --base main` (same other flags).
615
+ _REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
616
+ voice_call 300 codex exec "$PROMPT" \
617
+ -C "$_REPO_ROOT" \
618
+ -s read-only \
619
+ -c 'model_reasoning_effort="high"' \
620
+ --enable web_search_cached \
621
+ < /dev/null 2>/tmp/voice-codex-err-$$.txt
622
+
623
+ # Antigravity CLI (external timeout 360s — agentic, reads code itself, like Codex)
624
+ # Flags below verified against agy 1.0.9 (`agy --help`):
625
+ # -p "$PROMPT" — alias for --print: run ONE prompt non-interactively
626
+ # --model <id> — backing model; `agy models` lists them. Tested ids use
627
+ # kebab-case: gemini-3.1-pro (also gemini-3.5-flash,
628
+ # claude-sonnet-4-6, claude-opus-4-6, gpt-oss-120b —
629
+ # availability is plan-dependent). Omit for the default.
630
+ # --sandbox — run with terminal restrictions enabled (limits what
631
+ # commands the agent may execute). Use it for a review.
632
+ # --print-timeout — agy's own wait cap (default 5m); the 360s external
633
+ # backstop below is intentionally a bit longer.
634
+ # NOTE: there is NO `-m` short flag (it is not an alias for `--model`) and NO
635
+ # `--output-format` flag — agy has no structured/JSON output, parse plain text.
636
+ # Auth: $ANTIGRAVITY_API_KEY or $GEMINI_API_KEY (both accepted), else OS keyring
637
+ # / OAuth from a prior interactive `agy` login.
638
+ #
639
+ # NON-TTY STDOUT DROP: when stdout is not a terminal (command substitution,
640
+ # pipes, CI) agy can SILENTLY drop its final answer and still exit 0. Fix: run
641
+ # under `script` to fake a PTY. `script` arg order differs between macOS (BSD)
642
+ # and Linux (util-linux) — branch on uname. Prompt is passed via $AGY_PROMPT
643
+ # (never interpolated into the command string) so quotes/newlines in the
644
+ # material can't break the `script -qec` command line.
645
+ #
646
+ # Output cleanup (verified live on agy 1.0.9 / macOS):
647
+ # perl -0777 — slurp whole output, then:
648
+ # s/\x1b\[…//g strip ANSI escapes (use perl, NOT sed — BSD/macOS sed
649
+ # does not interpret \x1b)
650
+ # s/\A\^D[\x08]*// drop the literal "^D" + backspaces that BSD `script`
651
+ # echoes for the pty EOF at the very start of the stream
652
+ # tr -d … — remove remaining control bytes, keeping only tab (\011)/newline (\012)
653
+ export AGY_PROMPT="$PROMPT"
654
+ if [ "$(uname)" = "Darwin" ]; then
655
+ voice_call 360 script -q /dev/null \
656
+ agy -p "$AGY_PROMPT" --model gemini-3.1-pro --sandbox \
657
+ | perl -0777 -pe 's/\x1b\[[0-9;]*[A-Za-z]//g; s/\A\^D[\x08]*//' \
658
+ | tr -d '\000-\010\013-\037'
659
+ else
660
+ voice_call 360 script -qec 'agy -p "$AGY_PROMPT" --model gemini-3.1-pro --sandbox' /dev/null \
661
+ | perl -0777 -pe 's/\x1b\[[0-9;]*[A-Za-z]//g; s/\A\^D[\x08]*//' \
662
+ | tr -d '\000-\010\013-\037'
663
+ fi
664
+ unset AGY_PROMPT
665
+
666
+ # Ollama (timeout: 120s — local, can be slow)
667
+ _PAYLOAD=$(jq -n --arg p "$PROMPT" '{
668
+ model: "llama3.3:70b",
669
+ prompt: $p,
670
+ stream: false
671
+ }')
672
+ voice_call 120 curl -s http://localhost:11434/api/generate \
673
+ -d "$_PAYLOAD" | jq -r '.response'
674
+
675
+ # Self-spawn (timeout: 120s)
676
+ # By default, `claude --print` inherits the model from the user's current Claude
677
+ # Code session/config — DO NOT hardcode a model here. Hardcoding silently overrode
678
+ # the user's choice (e.g. forcing Haiku on an Opus session) and made the docs lie.
679
+ # To override for a cheaper second opinion, set MF_VOICES_SELF_SPAWN_MODEL
680
+ # (e.g. claude-haiku-4-5 for $1/$5 per 1M, or claude-sonnet-4-6 for stronger).
681
+ # Note: Claude Code CLI uses --append-system-prompt, NOT --system (would error).
682
+ echo "$PROMPT" | voice_call 120 claude --print \
683
+ --append-system-prompt "You are an independent reviewer. Fresh context. No prior conversation. Be direct." \
684
+ ${MF_VOICES_SELF_SPAWN_MODEL:+--model "$MF_VOICES_SELF_SPAWN_MODEL"} 2>/dev/null
685
+ ```
686
+
687
+ ### 3.5 — Post-Response Checks
688
+
689
+ ```
690
+ Rabbit hole: response mentions .claude/, SKILL.md, package-lock.json
691
+ → Flag "⚠ Voice N got distracted by config files"
692
+
693
+ Empty: response < 100 chars
694
+ → Flag "Voice N: empty response"
695
+ → Antigravity CLI specifically: empty output WITH exit 0 = non-TTY stdout drop.
696
+ The `script` PTY wrapper in 3.4 prevents this; if it still happens, the
697
+ wrapper failed (no `script` binary?) — note it, don't silently treat as clean.
698
+
699
+ Timeout: voice_call returned 124
700
+ → Flag "Voice N: timed out after Xs"
701
+ → If 2+ voices remaining: continue silently
702
+ → If only 1 remaining: ask retry/continue/stop
703
+
704
+ Auth error: HTTP 401/403 in response
705
+ → Flag "Voice N: auth failed"
706
+ → Should have been caught by auth probe — log as unexpected
707
+
708
+ Rate limit: HTTP 429 in response
709
+ → Flag "Voice N: rate limited"
710
+ → If 2+ voices remaining: continue silently
711
+ → If only 1 remaining: ask retry/continue/stop
712
+ ```
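The checks above can be sketched as a single classifier. This is an illustration under stated assumptions, not the tool's implementation: `classify_voice` and its three arguments are hypothetical, the HTTP code is assumed to be captured separately (for example with curl's `-w '%{http_code}'`), and the order in which the checks run is a guess, since the text does not define one.

```shell
# classify_voice EXIT_STATUS HTTP_CODE RESPONSE   (hypothetical helper)
# Prints one label: timeout, auth, rate_limited, empty, distracted, or ok.
classify_voice() {
  exit_status=$1
  http_code=$2
  response=$3

  # voice_call returned 124: the call timed out
  if [ "$exit_status" -eq 124 ]; then
    echo "timeout"; return
  fi
  # HTTP 401/403 is an auth failure, 429 a rate limit
  case "$http_code" in
    401|403) echo "auth"; return ;;
    429)     echo "rate_limited"; return ;;
  esac
  # fewer than 100 characters counts as an empty response
  if [ "${#response}" -lt 100 ]; then
    echo "empty"; return
  fi
  # rabbit hole: the voice wandered into config files
  case "$response" in
    *.claude/*|*SKILL.md*|*package-lock.json*) echo "distracted"; return ;;
  esac
  echo "ok"
}
```

A caller would turn each label into the matching flag text (for example `timeout` into "Voice N: timed out after Xs") and decide whether to continue based on how many voices remain.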
713
+
714
+ ---
715
+
716
+ ## Phase 4: Synthesize
717
+
718
+ ### 4.1 — Read All Responses
719
+
720
+ Read each voice's free-form response. Don't impose structure yet.
721
+ Note for each voice:
722
+ - What did they focus on? (may differ from bias — that's fine)
723
+ - What's their overall stance?
724
+ - What specific concerns did they raise?
725
+ - What did they praise?
726
+
727
+ ### 4.2 — Find Patterns
728
+
729
+ ```
730
+ CONSENSUS: 2+ voices raise the same concern or hold the same position
731
+ → Strong signal. Note it.
732
+
733
+ UNIQUE: Only 1 voice raises something
734
+ → May be specialist insight or false positive
735
+ → Keep, mark as single-voice
736
+
737
+ DISAGREEMENT: Voices contradict each other
738
+ → Most valuable data. This is WHERE the decision lives.
739
+ → Present both sides clearly.
740
+
741
+ SEVERITY (for code/doc findings only — WE assign, not reviewers):
742
+ If material is code or doc:
743
+ → Parse specific findings, assign CRITICAL/HIGH/MEDIUM/LOW
744
+ → Based on reviewer language + actual impact
745
+ If material is idea/strategy:
746
+ → Do NOT use severity — use consensus/disagreement instead
747
+ ```
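The CONSENSUS / UNIQUE split can be sketched as a count of distinct voices per concern. The sketch assumes the concerns have already been normalized to identical strings (in practice the synthesizer has to judge that two free-form remarks mean the same thing) and that the input arrives as tab-separated `voice<TAB>concern` lines. The function name `classify_concerns` is hypothetical.

```shell
# classify_concerns: reads "voice<TAB>concern" lines on stdin (assumed format)
# and prints "CONSENSUS<TAB>concern" or "UNIQUE<TAB>concern", sorted.
classify_concerns() {
  sort -u | awk -F '\t' '
    { count[$2]++ }                       # one count per distinct voice
    END {
      for (c in count) {
        label = (count[c] >= 2) ? "CONSENSUS" : "UNIQUE"
        print label "\t" c
      }
    }
  ' | sort
}

printf 'A\tno rate limiting\nB\tno rate limiting\nC\tweak hashing\n' \
  | classify_concerns
```

The leading `sort -u` keeps a voice that repeats itself from counting twice. Disagreements are not detected here; spotting contradictory positions needs a reading of the responses, not a count.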
748
+
749
+ ### 4.3 — Identify the Decision Point
750
+
751
+ ```
752
+ From patterns, determine: what does the user need to DECIDE?
753
+
754
+ If consensus is clear → decision is easy, show verdict
755
+ If disagreement is clear → decision is hard, show both sides + context
756
+ If all voices say "fine" → confirm clean, move on
757
+ ```
758
+
759
+ ### 4.4 — Confusion Protocol
760
+
761
+ ```
762
+ If during synthesis you discover:
763
+ - Voices are responding to fundamentally different interpretations of the intent
764
+ - A voice raised something that changes the entire framing of the problem
765
+ - Material had a critical ambiguity that voices split on differently
766
+
767
+ → STOP synthesis. Do not force a verdict.
768
+ → Name the ambiguity in 1 sentence.
769
+ → Present the split: "Voice A read this as X, Voice B read this as Y."
770
+ → Ask the user which framing is correct before continuing.
771
+
772
+ This is rare. Most synthesis proceeds normally.
773
+ ```
774
+
775
+ ---
776
+
777
+ ## Phase 5: Output — Matched to Intent
778
+
779
+ ### Core Rule
780
+
781
+ ```
782
+ Chat output is optimized for DECISIONS — not information.
783
+ Max 20 lines in chat. Full details in file.
784
+
785
+ The "→ docs/voices/<file>.md" footer in the templates below is CONDITIONAL —
786
+ include it ONLY when a report file was actually written (see "Report File —
787
+ Save on Demand" below). For unsaved chat-only reviews, OMIT that line.
788
+ ```
789
+
790
+ ### Completion Status
791
+
792
+ After synthesis, assign 1 of 4 statuses. Status appears on the first line
793
+ of chat output, right after the target name.
794
+
795
+ ```
796
+ DONE — All voices responded, synthesis is clear, user has enough data to decide.
797
+
798
+ DONE_WITH_CONCERNS — Synthesis complete but:
799
+ • Voices disagree on an important point (not just minor)
800
+ • 1+ voice flagged a risk that other voices didn't mention
801
+ • Self-spawn only → same model family bias
802
+ • 100% consensus on a complex topic → possible shared blind spot
803
+ → List each concern, 1 line each.
804
+
805
+ BLOCKED — Cannot produce meaningful output:
806
+ • All voices failed (timeout/auth/empty)
807
+ • Material unreadable or too large even after chunking
808
+ • Intent still unclear after already asking once
809
+ → State clearly: blocked because of what, what was tried, what user should do next.
810
+
811
+ NEEDS_CONTEXT — Missing important info discovered MID-workflow:
812
+ • Voice A asked "what auth does this use?" but material doesn't say
813
+ • Voices disagree because of an unstated assumption
814
+ → State clearly: what's needed, from whom, to unlock which decision.
815
+ ```
816
+
817
+ If BLOCKED or NEEDS_CONTEXT → do NOT output synthesis.
818
+ Only output status + reason + next step.
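As a rough sketch, the status choice and the "no synthesis when BLOCKED or NEEDS_CONTEXT" rule might look like the following. The function names and the three counters are hypothetical, and the sketch covers only the countable triggers: BLOCKED is reduced to "no voice responded", so unreadable material or unclear intent would have to be detected elsewhere.

```shell
# assign_status RESPONDED CONCERNS MISSING_CONTEXT   (hypothetical helper)
#   RESPONDED        number of voices that returned a usable response
#   CONCERNS         number of concerns found during synthesis
#   MISSING_CONTEXT  number of open questions the material cannot answer
assign_status() {
  responded=$1
  concerns=$2
  missing_context=$3
  if [ "$responded" -eq 0 ]; then
    echo "BLOCKED"
  elif [ "$missing_context" -gt 0 ]; then
    echo "NEEDS_CONTEXT"
  elif [ "$concerns" -gt 0 ]; then
    echo "DONE_WITH_CONCERNS"
  else
    echo "DONE"
  fi
}

# Succeeds only for statuses that are allowed to carry a synthesis.
should_output_synthesis() {
  case "$1" in
    BLOCKED|NEEDS_CONTEXT) return 1 ;;
    *)                     return 0 ;;
  esac
}
```

For example, `should_output_synthesis "$(assign_status 2 1 0)"` succeeds, because two responding voices with one concern yield DONE_WITH_CONCERNS.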
819
+
820
+ ### Output adapts to what voices actually said — not to a pre-set template.
821
+
822
+ But there are structural patterns for common intents:
823
+
824
+ ---
825
+
826
+ **When user wanted DIRECTION:**
827
+
828
+ ```
829
+ /sp-voices — <target> STATUS: <status>
830
+ ══════════════════════════════════════════
831
+ Voices: <N> (<names>)
832
+
833
+ VERDICT: <GO | PIVOT | STOP | SPLIT>
834
+
835
+ ✅ Consensus:
836
+ • <what all voices agree on>
837
+ • <what all voices agree on>
838
+
839
+ ❌ Disagreements:
840
+ • <topic> — A: <position> / B: <position>
841
+
842
+ 💡 Insight:
843
+ • <notable observation — 1 voice>
844
+
845
+ → docs/voices/<file>.md
846
+ ══════════════════════════════════════════
847
+ ```
848
+
849
+ ---
850
+
851
+ **When user wanted VALIDATION:**
852
+
853
+ ```
854
+ /sp-voices — <target> STATUS: <status>
855
+ ══════════════════════════════════════════
856
+ Voices: <N> (<names>)
857
+
858
+ ASSESSMENT: <SOLID | HAS GAPS | RETHINK>
859
+
860
+ ✅ Validated:
861
+ • <aspects voices confirm are good>
862
+
863
+ 🔴 Must address:
864
+ • <gaps/risks voices agree are blocking>
865
+
866
+ 🟡 Consider:
867
+ • <concerns raised but not blocking>
868
+
869
+ → docs/voices/<file>.md
870
+ ══════════════════════════════════════════
871
+ ```
872
+
873
+ ---
874
+
875
+ **When user wanted BUG/RISK FINDING (code review):**
876
+
877
+ ```
878
+ /sp-voices — <target> STATUS: <status>
879
+ ══════════════════════════════════════════
880
+ Voices: <N> (<names>)
881
+
882
+ GATE: <PASS | FAIL — N blocking>
883
+
884
+ 🔴 Blocking:
885
+ [C1] <summary> — <file:line>
886
+ [H1] <summary> — <file:line> (consensus)
887
+
888
+ ⚠️ Non-blocking:
889
+ [H2] <summary> — <file:line>
890
+
891
+ 🔵 Disagreements:
892
+ [D1] <topic> — <file:line>
893
+
894
+ → docs/voices/<file>.md
895
+ ══════════════════════════════════════════
896
+ ```
897
+
898
+ ---
899
+
900
+ **When user wanted COMPARISON:**
901
+
902
+ ```
903
+ /sp-voices — <A> vs <B> STATUS: <status>
904
+ ══════════════════════════════════════════
905
+ Voices: <N> (<names>)
906
+
907
+ LEAN: <A | B | DEPENDS | NO CLEAR WINNER>
908
+
909
+ Option A:
910
+ ✅ <strengths voices agree on>
911
+ ❌ <weaknesses voices agree on>
912
+
913
+ Option B:
914
+ ✅ <strengths voices agree on>
915
+ ❌ <weaknesses voices agree on>
916
+
917
+ 🔵 Disagreements:
918
+ • <where voices pick different sides>
919
+
920
+ 💡 Option C (if any voice proposed one):
921
+ • <alternative approach>
922
+
923
+ → docs/voices/<file>.md
924
+ ══════════════════════════════════════════
925
+ ```
926
+
927
+ ---
928
+
929
+ **When user wanted READINESS CHECK:**
930
+
931
+ ```
932
+ /sp-voices — <target> STATUS: <status>
933
+ ══════════════════════════════════════════
934
+ Voices: <N> (<names>)
935
+
936
+ READY: <YES | NOT YET — N items | MAJOR ISSUES>
937
+
938
+ 🔴 Fix before using:
939
+ • <blocking issue + location>
940
+
941
+ 🟡 Should fix:
942
+ • <non-blocking issue>
943
+
944
+ ✅ Already good:
945
+ • <what voices confirm is ready>
946
+
947
+ → docs/voices/<file>.md
948
+ ══════════════════════════════════════════
949
+ ```
950
+
951
+ ---
952
+
953
+ **When intent doesn't fit patterns above:**
954
+
955
+ ```
956
+ /sp-voices — <target> STATUS: <status>
957
+ ══════════════════════════════════════════
958
+ Voices: <N> (<names>)
959
+
960
+ ✅ Consensus:
961
+ • <what voices agree on>
962
+
963
+ ❌ Disagreements:
964
+ • <where voices differ>
965
+
966
+ 💡 Notable:
967
+ • <unique insights>
968
+
969
+ → docs/voices/<file>.md
970
+ ══════════════════════════════════════════
971
+ ```
972
+
973
+ ---
974
+
975
+ **DONE_WITH_CONCERNS example** (status details appear between status line and verdict):
976
+
977
+ ```
978
+ /sp-voices — auth.ts refactor STATUS: DONE_WITH_CONCERNS
979
+ ══════════════════════════════════════════
980
+ ⚠ Concerns:
981
+ • Self-spawn only — same model family, possible blind spots
982
+ • 100% consensus on complex topic — verify independently
983
+
984
+ Voices: 2 (Claude self-spawn, Claude self-spawn)
985
+
986
+ GATE: PASS
987
+ ...
988
+ ```
989
+
990
+ ---
991
+
992
+ ### Report File — Save on Demand, Not Always
993
+
994
+ ```
995
+ Do NOT auto-save files. Auto-saving wastes tokens on file writing and formatting.
996
+
997
+ When to suggest save:
998
+ - 3+ voices, many disagreements → complex, worth saving
999
+ - User says "save"
1000
+ - Many findings (> 5 CRITICAL+HIGH for code, > 3 disagreements for ideas)
1001
+
1002
+ When NOT to suggest save:
1003
+ - Quick review, 2 voices, clear consensus → chat output is enough
1004
+ - User says "quick" or "fast"
1005
+ - Simple yes/no validation
1006
+
1007
+ If save needed → include in next-action options.
1008
+ If not needed → chat output is sufficient, user can copy if they want.
1009
+ ```
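The save heuristics above could be condensed into one predicate. This is a sketch under assumptions: `should_suggest_save` and its arguments are hypothetical, an explicit user hint is assumed to override the counts, and the "3+ voices, many disagreements" case is folded into the disagreement threshold because the text does not define "many".

```shell
# should_suggest_save HINT FINDINGS DISAGREEMENTS   (hypothetical helper)
#   HINT           "save", "quick", "fast", or "" when the user gave no hint
#   FINDINGS       number of CRITICAL+HIGH findings (code reviews)
#   DISAGREEMENTS  number of disagreements between voices
# Exit status 0 means "offer to save a report file".
should_suggest_save() {
  hint=$1
  findings=$2
  disagreements=$3
  case "$hint" in
    save)       return 0 ;;   # the user asked for it
    quick|fast) return 1 ;;   # the user wants chat output only
  esac
  [ "$findings" -gt 5 ] || [ "$disagreements" -gt 3 ]
}
```

When the predicate succeeds, the save option is added to the next-action choices; the file itself is still written only if the user picks it.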
1010
+
1011
+ File format when saving (`docs/voices/<date>-<target>.md`):
1012
+
1013
+ ```markdown
1014
+ # /sp-voices — <target>
1015
+ Date: <date>
1016
+ Voices: <list>
1017
+ Intent: <what user was deciding>
1018
+ Status: <DONE | DONE_WITH_CONCERNS | ...>
1019
+
1020
+ ## Summary
1021
+ <same as chat output>
1022
+
1023
+ ## Voice A (<model>) — Full Response
1024
+ <verbatim>
1025
+
1026
+ ## Voice B (<model>) — Full Response
1027
+ <verbatim>
1028
+
1029
+ ## Synthesis Notes
1030
+ - Consensus: <list>
1031
+ - Disagreements: <list>
1032
+ - Unique insights: <list>
1033
+
1034
+ ## META
1035
+ | Voice | Model | Bias | Tokens | Cost |
1036
+ |-------|-------|------|--------|------|
1037
+ | A | ... | ... | N | ~$X |
1038
+ Agreement rate: N%
1039
+ Limitations: <if any>
1040
+ ```
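A report path following the `docs/voices/<date>-<target>.md` pattern could be built as below. The date format (ISO `YYYY-MM-DD`) and the slug rules (lowercase, non-alphanumerics collapsed to `-`) are assumptions; the source does not specify either. `report_path` is a hypothetical name.

```shell
# report_path TARGET [DATE]   (hypothetical helper)
# DATE defaults to today in YYYY-MM-DD form (assumed format).
report_path() {
  target=$1
  date_part=${2:-$(date +%Y-%m-%d)}
  # Lowercase, squeeze every run of other characters into one "-",
  # then trim a leading or trailing "-".
  slug=$(printf '%s' "$target" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed 's/^-//; s/-$//')
  printf 'docs/voices/%s-%s.md\n' "$date_part" "$slug"
}

report_path "auth.ts refactor" 2025-01-31
```

Passing the date explicitly keeps the function deterministic in tests; callers would normally omit it.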
1041
+
1042
+ ---
1043
+
1044
+ ## After Output: Next Action
1045
+
1046
+ After showing chat summary, ask what's next.
1047
+ Options adapt based on complexity + output + status.
1048
+
1049
+ **DONE — Simple review (clear consensus, few findings):**
1050
+ ```json
1051
+ {
1052
+ "questions": [{
1053
+ "question": "/sp-voices done.",
1054
+ "header": "What next?",
1055
+ "multiSelect": false,
1056
+ "options": [
1057
+ {"label": "Act on it — proceed with recommendation"},
1058
+ {"label": "Drill down — details on specific point"},
1059
+ {"label": "Done — I have what I need"}
1060
+ ]
1061
+ }]
1062
+ }
1063
+ ```
1064
+
1065
+ **DONE_WITH_CONCERNS or complex review (disagreements, many findings, CRITICAL items):**
1066
+ ```json
1067
+ {
1068
+ "questions": [{
1069
+ "question": "/sp-voices done. [N] disagreements, [N] critical findings.",
1070
+ "header": "What next?",
1071
+ "multiSelect": false,
1072
+ "options": [
1073
+ {"label": "Drill down — details on specific point"},
1074
+ {"label": "Resolve disagreement — get tiebreaker voice"},
1075
+ {"label": "Save full report — docs/voices/ for reference"},
1076
+ {"label": "Fix now — address critical items"},
1077
+ {"label": "More voices — add external LLM for diversity"},
1078
+ {"label": "Done — I'll decide myself"}
1079
+ ]
1080
+ }]
1081
+ }
1082
+ ```
1083
+
1084
+ **Self-spawn only (limited diversity):**
1085
+ ```json
1086
+ {
1087
+ "questions": [{
1088
+ "question": "/sp-voices done (self-spawn only — same model family).",
1089
+ "header": "What next?",
1090
+ "multiSelect": false,
1091
+ "options": [
1092
+ {"label": "Good enough — proceed"},
1093
+ {"label": "Get real diversity — add external LLM (GPT/Gemini/Perplexity)"},
1094
+ {"label": "Drill down — details on specific point"},
1095
+ {"label": "Done"}
1096
+ ]
1097
+ }]
1098
+ }
1099
+ ```
1100
+
1101
+ **BLOCKED:**
1102
+ ```json
1103
+ {
1104
+ "questions": [{
1105
+ "question": "/sp-voices BLOCKED — [reason].",
1106
+ "header": "What next?",
1107
+ "multiSelect": false,
1108
+ "options": [
1109
+ {"label": "Retry — try again with same voices"},
1110
+ {"label": "Different voices — switch to available alternatives"},
1111
+ {"label": "Abort — I'll handle this manually"}
1112
+ ]
1113
+ }]
1114
+ }
1115
+ ```
1116
+
1117
+ **NEEDS_CONTEXT:**
1118
+ ```json
1119
+ {
1120
+ "questions": [{
1121
+ "question": "/sp-voices needs context — [what's missing].",
1122
+ "header": "What next?",
1123
+ "multiSelect": false,
1124
+ "options": [
1125
+ {"label": "Provide context — I'll answer now"},
1126
+ {"label": "Continue anyway — work with what you have"},
1127
+ {"label": "Abort — I'll come back later"}
1128
+ ]
1129
+ }]
1130
+ }
1131
+ ```
1132
+
1133
+ ---
1134
+
1135
+ ## Drill Down (on demand)
1136
+
1137
+ After summary, user can request details:
1138
+
1139
+ ```
1140
+ "details on [topic]" → show relevant voice quotes + context
1141
+ "what did voice A say" → show Voice A full response
1142
+ "why the disagreement on [X]" → show both positions + reasoning
1143
+ "any sources?" (Perplexity) → show citations from Perplexity response
1144
+ ```
1145
+
1146
+ Each drill-down = 1 focused response, not full report dump.
1147
+
1148
+ ---
1149
+
1150
+ ## Adaptive Sizing
1151
+
1152
+ ```
1153
+ Simple (< 100 lines, clear question):
1154
+ 1 voice, best-fit for intent
1155
+ Compact output, no file save
1156
+ Total cost: ~$0.01-0.05
1157
+
1158
+ Medium (100-500 lines, 2+ concerns):
1159
+ 2 voices, auto-selected
1160
+ Standard output, file save on request
1161
+ Total cost: ~$0.05-0.15
1162
+
1163
+ Complex (> 500 lines, multi-faceted, high stakes):
1164
+ N voices (user picks via AskUserQuestion)
1165
+ Full output + suggest file save
1166
+ Total cost: depends on N voices + models
1167
+ ```
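The size thresholds above can be sketched as a lookup on line count. Only the line-count criterion is modelled; the qualitative ones (clear question, number of concerns, stakes) need judgement and are left out. `pick_tier` and its output format are hypothetical.

```shell
# pick_tier LINES   (hypothetical helper)
# Prints "<tier> <voices>", where voices is a number, or "ask" when the
# user should pick the voices.
pick_tier() {
  lines=$1
  if [ "$lines" -lt 100 ]; then
    echo "simple 1"
  elif [ "$lines" -le 500 ]; then
    echo "medium 2"
  else
    echo "complex ask"
  fi
}

pick_tier 240
```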
1168
+
1169
+ ---
1170
+
1171
+ ## Rules
1172
+
1173
+ 1. **Understand intent first.** Don't classify — understand what decision the user faces.
1174
+ 2. **Confirm before spawning.** 1 line: what voices will look at, under what angle.
1175
+ 3. **Bias matches intent, not type.** Strategy question → strategy biases. Code question → code biases.
1176
+ 4. **Open prompts, no templates.** Reviewers think freely. We structure after.
1177
+ 5. **Output for decision, not information.** Chat max 20 lines. Full details go in the report file, and only when one is saved.
1178
+ 6. **Don't resolve disagreements.** Present both sides. User decides.
1179
+ 7. **Consensus ≠ correct.** All voices can share blind spots. Note when agreement is 100%.
1180
+ 8. **Findings must be specific.** Location, not vibes.
1181
+ 9. **Perplexity → web-grounded role.** When available, assign it to the bias that benefits most from live search.
1182
+ 10. **Graceful degradation.** 1 voice fails → continue. 0 succeed → BLOCKED.
1183
+ 11. **Probe before prompting.** Verify auth before building expensive prompts. Dead keys waste tokens.
1184
+ 12. **Timeout everything.** Every voice call gets a timeout. A hanging call must never block the entire review.