start-vibing 4.4.15 → 4.4.17

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "start-vibing",
3
- "version": "4.4.15",
3
+ "version": "4.4.17",
4
4
  "description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop. e2e-audit 0.2.0 refactor (skill-only, no agents): SessionStart hook + slash command make the skill keyword-invokable (\"e2e audit\", \"roda o e2e\", \"integration test\", \"test coverage gaps\"). Source-first discovery via detect-stack, discover-routes (Next app/pages/Remix/SvelteKit/Nuxt/Astro), discover-api-surface (HTTP handlers, tRPC procedures, GraphQL, server actions, middleware auth), inventory-existing-tests (preserve prior corpus + sha256 drift hash), and detect-uncovered (branch-diff vs origin/main finds changes not covered by existing specs). Report-then-ask between mapping and Playwright run; post-run-feedback report before writing findings. SHOT+TRACE+ASSERT+SOURCE evidence quad per non-meta finding; meta rules (coverage-gap-*, uncovered-*, test-drift, stack-detect, post-run-feedback) exempt. verify-audit.sh enforces schema + quad. Generic (no project leakage). super-design 0.7.0 carries over.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -23,6 +23,39 @@ The user may write to Claude in Portuguese, Spanish, or any other language — C
23
23
 
24
24
  ---
25
25
 
26
+ ## Response Style (HARD RULE)
27
+
28
+ > **Be brief by default. Output tokens cost ~5x input and are ~8x slower to generate. Verbose responses are also re-processed every turn — the cost compounds.**
29
+
30
+ Adapted from Anthropic's official Opus 4.7 system prompt for Claude Code (April 2026 post-mortem — single addition reported as having "outsized effect on intelligence in Claude Code"):
31
+
32
+ - **Final responses ≤100 words** unless the task genuinely requires more (multi-step plan, design rationale, teaching, code review with rationale).
33
+ - **Text between tool calls ≤25 words.** One sentence per update at key moments — not a running commentary on internal deliberation.
34
+ - **No prefacing.** Skip "Great question!", "I'll help you with that", "Let me start by…". Just do the thing.
35
+ - **No trailing summaries.** The diff is visible. Don't re-narrate what just happened. End-of-turn = 1–2 sentences max, what changed and what's next.
36
+ - **No re-explanations.** If it's already in this conversation or in CLAUDE.md, don't repeat it.
37
+ - **Don't add docstrings, comments, or type annotations to code you didn't change.** Comment only when WHY is non-obvious. Names already explain WHAT.
38
+ - **Don't reference the current task in code.** No `// added for ticket X` / `// used by feature Y` — that belongs in PR descriptions, not source.
39
+
40
+ ### Brevity ≠ cutting reasoning
41
+
42
+ Cut **visible explanations**, NOT **internal thinking**:
43
+
44
+ - Visible prose costs output tokens, gets re-processed every turn, and on large models often _hurts_ accuracy via over-elaboration (arXiv 2604.00025 — restricting outputs to <50 words gave +26.3pp accuracy on GSM8K/MMLU-STEM).
45
+ - Internal thinking/reasoning budget _improves_ quality on hard tasks (Anthropic guidance). Do NOT set `MAX_THINKING_TOKENS=0` or `effort=low` on complex problems.
46
+ - Code comments produced DURING generation often help — they act as logical pivots between natural language and code (arXiv 2404.07549 improves pass@1). Strip prose AROUND the code, not comments INSIDE it.
47
+
48
+ ### When to override brevity
49
+
50
+ - User explicitly asks for detail ("explain why", "walk me through").
51
+ - Teaching mode (user is learning the codebase or domain).
52
+ - Design / planning discussion where rationale matters.
53
+ - Code review with inline findings + rationale.
54
+
55
+ Default = terse. Verbose only on signal.
56
+
57
+ ---
58
+
26
59
  ## What start-vibing v4 Installs
27
60
 
28
61
  start-vibing is a CLI (`npx start-vibing`) that sets up Claude Code with a complete development system in ~30 seconds:
@@ -422,23 +455,152 @@ All implementations MUST:
422
455
 
423
456
  ## FORBIDDEN Actions
424
457
 
425
- | Action | Reason |
426
- | ------------------------------------ | --------------------------------------------------------------------------------------------------------------------- |
427
- | Speak/write in non-English | EN ONLY (chat, code, docs, commits) — see Language Policy. Override only via explicit user request, current turn only |
428
- | Skip typecheck | Catches runtime errors |
429
- | Use `any` type | Defeats strict mode |
430
- | Define types in `src/` | Must be in `types/` |
431
- | Commit directly to main | Create feature/fix branches |
432
- | Skip documenter after implementation | Changelog + docs are mandatory |
433
- | Mix doc types in one file | Changelog ≠ technical ≠ decision |
434
- | Leave docs unlinked from index | Undiscoverable docs are useless |
435
- | Skip superpowers for features | Use brainstorming + TDD |
436
- | Skip code-simplifier | Run /simplify post-implementation |
437
- | Use MUI/Chakra | Use shadcn/ui + Radix |
438
- | Files > 400 lines | MUST split into smaller |
439
- | 'use client' at top level | Push to leaf components only |
440
- | Waterfall data fetching | Use Promise.all() for parallel |
441
- | Skip CLAUDE.md update | MUST update after implementations |
458
+ | Action | Reason |
459
+ | ---------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
460
+ | Speak/write in non-English | EN ONLY (chat, code, docs, commits) — see Language Policy. Override only via explicit user request, current turn only |
461
+ | Verbose responses by default | ≤100 words final, ≤25 words between tool calls — see Response Style. Brevity is intelligence in Claude Code |
462
+ | Preface answers ("Great question!", "I'll help you with that") | No filler. Just do the thing |
463
+ | Trailing summary of what just happened | Diff is visible. End-of-turn = 1–2 sentences max |
464
+ | Add docstrings/comments/type annotations to code you did NOT change | Anthropic guidance: only modified code gets new comments. Names explain WHAT, comments only when WHY is non-obvious |
465
+ | Reference current task/ticket inside code (`// added for X`) | Belongs in PR description, not source — rots over time |
466
+ | Re-explain what's already in CLAUDE.md or earlier in this conversation | Wastes output tokens AND re-processes every turn |
467
+ | Cut thinking budget on hard tasks (`MAX_THINKING_TOKENS=0`/`low`) | Cut visible prose, not internal reasoning. Thinking improves accuracy on hard tasks |
468
+ | Inline UI reference patterns (Grid/Modal/List/Toast/etc.) in CLAUDE.md | Patterns belong in `.claude/skills/research-cache/cache/` — recurring input cost otherwise |
469
+ | Skip typecheck | Catches runtime errors |
470
+ | Use `any` type | Defeats strict mode |
471
+ | Define types in `src/` | Must be in `types/` |
472
+ | Use `@types/` alias | Reserved by TypeScript |
473
+ | Commit directly to main | Create feature/fix branches |
474
+ | Skip documenter after implementation | Changelog + docs are mandatory |
475
+ | Mix doc types in one file | Changelog ≠ technical ≠ decision |
476
+ | Leave docs unlinked from index | Undiscoverable docs are useless |
477
+ | Skip superpowers for features | Use brainstorming + TDD |
478
+ | Skip code-simplifier | Run /simplify post-implementation |
479
+ | Use MUI/Chakra | Use shadcn/ui + Radix |
480
+ | Files > 400 lines | MUST split into smaller |
481
+ | 'use client' at top level | Push to leaf components only |
482
+ | Waterfall data fetching | Use Promise.all() for parallel |
483
+ | Skip CLAUDE.md update | MUST update after implementations |
484
+
485
+ ---
486
+
487
+ ## Universal Project Rules
488
+
489
+ > Rules that apply to every project using this system. Project-specific overrides go in `/CLAUDE.md`.
490
+
491
+ ### HTTP Requests
492
+
493
+ | Rule | Implementation |
494
+ | ----------------------- | ------------------------------ |
495
+ | Use axios ONLY | Never `fetch()` or raw `axios` |
496
+ | `withCredentials: true` | ALWAYS for cookies/sessions |
497
+ | Extend base instance | Create `lib/api/axios.ts` |
498
+ | Type responses | `api.get<User>('/users')` |
499
+ | Centralize errors | Use interceptors |
500
+
501
+ ### Path Aliases
502
+
503
+ | Alias | Maps To | Use For |
504
+ | ---------- | ------------------- | ------------- |
505
+ | `$types/*` | `./types/*` | Type defs |
506
+ | `@common` | `./common/index.ts` | Logger, utils |
507
+ | `@db` | `./common/db/` | DB connection |
508
+
509
+ NEVER use `@types/` (reserved by TypeScript).
510
+
511
+ ### Types Location
512
+
513
+ - ALL interfaces/types MUST be in `types/` folder.
514
+ - NEVER define types in `src/` files.
515
+ - EXCEPTION: Zod inferred types and Mongoose Documents.
516
+
517
+ ### TypeScript Strict
518
+
519
+ ```typescript
520
+ process.env['VARIABLE']; // bracket notation
521
+ source: 'listed' as const; // literal type
522
+ ```
523
+
524
+ ### Quality Gates
525
+
526
+ ```bash
527
+ bun run typecheck # MUST pass
528
+ bun run lint # MUST pass
529
+ bun run test # MUST pass
530
+ docker compose build # MUST pass (Docker projects)
531
+ ```
532
+
533
+ ### Commit Format
534
+
535
+ ```
536
+ [type]: [description]
537
+
538
+ - Detail 1
539
+ - Detail 2
540
+
541
+ Generated with Claude Code
542
+ ```
543
+
544
+ Types: `feat`, `fix`, `refactor`, `docs`, `chore`. NEVER commit directly to `main` — branch as `feature/` | `fix/` | `refactor/` | `test/` first.
545
+
546
+ ### UI Architecture
547
+
548
+ Web apps MUST have **separate UIs** per platform — not just "responsive design".
549
+
550
+ | Platform | Layout |
551
+ | ----------------- | ------------------------------------------- |
552
+ | Mobile (375px) | Full-screen modals, bottom nav, touch-first |
553
+ | Tablet (768px) | Condensed dropdowns, hybrid nav |
554
+ | Desktop (1280px+) | Sidebar left, top navbar with search |
555
+
556
+ ANY task editing `.tsx`/`.jsx` MUST consider all 3 viewports. Use `/frontend-design` plugin or research competitors before new UI features.
557
+
558
+ ### Component Organization
559
+
560
+ | Question | Location |
561
+ | -------------------------------- | ------------------------- |
562
+ | Used in ONE page only? | `app/[page]/_components/` |
563
+ | Used across 2+ features? | `components/shared/` |
564
+ | UI primitive (Button, Input)? | `components/ui/` |
565
+ | Layout element (Header, Footer)? | `components/layout/` |
566
+
567
+ ### File Size
568
+
569
+ | Lines | Action |
570
+ | ------- | ---------------------------------- |
571
+ | < 200 | Keep in single file |
572
+ | 200-400 | Consider splitting |
573
+ | > 400 | MUST split into smaller components |
574
+
575
+ ### Mandatory Planning
576
+
577
+ Use `EnterPlanMode` for non-trivial tasks BEFORE implementing.
578
+
579
+ | Task Type | Plan Required |
580
+ | -------------------- | ------------- |
581
+ | New feature | YES |
582
+ | UI changes (any JSX) | YES |
583
+ | Multi-file changes | YES |
584
+ | Bug fix (simple) | NO |
585
+ | Single-line fix | NO |
586
+
587
+ ### UI Reference Material (NOT inlined here)
588
+
589
+ Detailed UI patterns — Grid Layouts, Modal/Dialog, Lists, Multi-Step Forms, Toast, Optimistic UI, Data Display, Navigation, Animation, Icon Libraries — live in:
590
+
591
+ - `.claude/skills/research-cache/cache/*.md` — pattern docs (`grid-layout-patterns-2025`, `modal-dialog-design-patterns-2024-2025`, `list-design-patterns-web-apps`, `multi-step-form-patterns`, `react-toast-notifications`, `optimistic-ui-patterns-react`, `data-display-patterns-2024-2025`, `navigation-header-design-patterns-2025`).
592
+ - Skills (`shadcn-ui`, `nextjs-app-router`, `react-patterns`, `tailwind-patterns`) — auto-injected by description match when relevant.
593
+
594
+ DO NOT inline this material into CLAUDE.md — it's recurring input cost for material only needed on demand. Read the cache file or skill when you actually need the pattern.
595
+
596
+ ### NRY (Never Repeat Yourself)
597
+
598
+ Common Claude mistakes:
599
+
600
+ - Multi-line bash with `\` continuations (breaks permissions).
601
+ - Relative paths in permission patterns.
602
+ - Using bash for file operations (use Read/Write/Edit).
603
+ - Ignoring context size — use `/compact` at natural breakpoints, `/clear` when switching contexts.
442
604
 
443
605
  ---
444
606
 
@@ -1,22 +1,20 @@
1
1
  ---
2
2
  name: research-query
3
- description: Executes the research plan from scout-plan.json. Runs parallel WebSearch + WebFetch + context7 lookups, extracts atomic claims with URL+QUOTE+ACCESSED-AT evidence, and writes claims.jsonl + sources.jsonl to the session directory. Honors per-domain authority hierarchies from references/source-directory.md and per-bucket freshness windows from research-methodology.md §7.
4
- tools: Read, Write, Glob, Grep, Bash, WebSearch, WebFetch
3
+ description: Executes the research plan from scout-plan.json. Fans independent sub-questions to PARALLEL subagents (Anthropic's lead-agent + 3-5-subagent pattern), runs WebSearch + WebFetch + context7 lookups concurrently, extracts atomic claims with URL+QUOTE+ACCESSED-AT evidence, and writes claims.jsonl + sources.jsonl to the session directory. Honors per-domain authority hierarchies from references/source-directory.md and per-bucket freshness windows from research-methodology.md §7. Adaptive query budget by effort_tier; diminishing-returns detection via NoProgress events.
4
+ tools: Read, Write, Glob, Grep, Bash, WebSearch, WebFetch, Task
5
5
  model: sonnet
6
6
  color: blue
7
7
  ---
8
8
 
9
9
  # Role
10
10
 
11
- You are the query executor. You take a scout-plan and turn it into raw
12
- evidence: a stream of atomic claims with verifiable citations. You do
13
- **not** triangulate or synthesize that is the next agent's job. You
14
- optimize for evidence density and citation integrity.
11
+ You are the query executor. You take a scout-plan and turn it into raw evidence: a stream of atomic claims with verifiable citations. You do **not** triangulate or synthesize — that is the next agent's job. You optimize for evidence density, citation integrity, and total wall-clock time.
12
+
13
+ You are also the orchestrator of the parallel fan-out: when scout-plan flags sub-questions as independent, you dispatch concurrent subagents instead of running them serially. Anthropic's production research system measures up to 90% latency reduction from this single change ([Anthropic Engineering](https://www.anthropic.com/engineering/multi-agent-research-system)). The same post reports that "token usage by itself explains 80% of the variance" in research-agent quality, with tool-call count and model choice as the other two factors — which means parallelization (more tokens spent across more concurrent tool calls) is also a quality lever, not just a speed lever.
15
14
 
16
15
  # When invoked
17
16
 
18
- You receive: `$SESSION_DIR/scout-plan.json` + the path to
19
- `/docs/research/.cache/sessions/<id>/`.
17
+ You receive: `$SESSION_DIR/scout-plan.json` + the path to `/docs/research/.cache/sessions/<id>/`.
20
18
 
21
19
  # Steps
22
20
 
@@ -24,15 +22,44 @@ You receive: `$SESSION_DIR/scout-plan.json` + the path to
24
22
 
25
23
  Read:
26
24
 
27
- - `$SESSION_DIR/scout-plan.json`
25
+ - `$SESSION_DIR/scout-plan.json` — pay attention to `decomposition`, `independent_subquestions`, `effort_tier`, `estimated_queries`
28
26
  - `.claude/skills/research/references/source-directory.md` (the domain table for `scout.domain`)
29
27
  - `.claude/skills/research/references/research-methodology.md` §5 (query engineering) and §7 (freshness)
30
28
  - The relevant playbook from `.claude/skills/research/references/domain-playbooks.md`
31
29
 
32
- ## 2. Build the query plan
30
+ ## 2. Pick the execution shape
31
+
32
+ Read `scout.effort_tier` (set by research-scout):
33
+
34
+ | `effort_tier` | Pattern | Concurrency | Tool calls per sub-question |
35
+ | ----------------------------------------- | ------------------------------- | ----------- | --------------------------- |
36
+ | `simple` (fact-finding) | Single executor, no fan-out | 1 agent | 3–10 |
37
+ | `comparison` (eval/compare 2-N options) | Lead + 2–4 parallel subagents | 2–4 | 10–15 each |
38
+ | `complex` (synthesis across many domains) | Lead + 5–10+ parallel subagents | 5–10+ | varies |
39
+
40
+ These tiers are Anthropic's own published heuristic — verbatim quote: _"Simple fact-finding requires just 1 agent with 3-10 tool calls, direct comparisons might need 2-4 subagents with 10-15 calls each, and complex research might use more than 10 subagents"_ ([Anthropic Engineering](https://www.anthropic.com/engineering/multi-agent-research-system)).
41
+
42
+ If `scout.effort_tier == "simple"`, skip fan-out and run the steps below sequentially. If `comparison` or `complex`, dispatch parallel subagents per §3.
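The tier table and the skip rule above can be read as a small lookup. The following is a minimal sketch, not the package's implementation: `ExecutionShape`, `SHAPES`, and `pickExecutionShape` are hypothetical names, and only the tier labels and numeric ranges come from the table. It also assumes the "empty `independent_subquestions` means sequential" rule applies at this decision point.

```typescript
// Hypothetical types; only the tier labels and ranges are taken from the table.
type EffortTier = "simple" | "comparison" | "complex";

interface ExecutionShape {
  fanOut: boolean;
  minSubagents: number;
  maxSubagents: number | null; // null = no fixed upper bound ("5-10+")
  callsPerSubQuestion: readonly [number, number] | null; // null = "varies"
}

const SHAPES: Record<EffortTier, ExecutionShape> = {
  simple: { fanOut: false, minSubagents: 1, maxSubagents: 1, callsPerSubQuestion: [3, 10] },
  comparison: { fanOut: true, minSubagents: 2, maxSubagents: 4, callsPerSubQuestion: [10, 15] },
  complex: { fanOut: true, minSubagents: 5, maxSubagents: null, callsPerSubQuestion: null },
};

function pickExecutionShape(tier: EffortTier, independentCount: number): ExecutionShape {
  const shape = SHAPES[tier];
  // Assumption: with no independent sub-questions there is nothing to fan out,
  // so the run stays sequential whatever the tier says.
  if (independentCount === 0) {
    return { ...shape, fanOut: false };
  }
  return shape;
}
```

Keeping the mapping in one table-shaped constant makes it easy to compare against the published heuristic when the tiers are tuned.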
43
+
44
+ ## 3. Parallel fan-out (only when effort_tier ≠ simple)
45
+
46
+ For each independent sub-question listed in `scout.independent_subquestions`, dispatch a subagent via the `Task` tool. Send them in a SINGLE message containing multiple Task tool uses so they run concurrently — Anthropic's "lead agent spins up 3-5 subagents in parallel rather than serially" pattern.
47
+
48
+ Each subagent receives:
49
+
50
+ - The single sub-question to research
51
+ - The `source_directory.md` domain table (for authority ranking)
52
+ - The freshness window
53
+ - Its share of the query budget (`scout.estimated_queries / N`)
54
+ - An instruction to return atomic claims with URL+QUOTE+ACCESSED-AT, NOT to write to claims.jsonl directly
55
+
56
+ When all subagents return, you (the lead) merge their claim arrays, deduplicate by `(source_id, quote)` pair, and write the merged set to `$SESSION_DIR/claims.jsonl`. Save each subagent's snapshots to a numbered range under `$SESSION_DIR/snapshots/`.
57
+
58
+ If `scout.independent_subquestions` is empty (everything depends on something else), run sequentially — but still do parallel WebSearch+WebFetch within each sub-question (step 5 below).
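The lead-side merge described above (collect each subagent's claim array, deduplicate on the `(source_id, quote)` pair, then serialize one JSON object per line) might look roughly like this. It is a sketch under assumptions: the `Claim` interface shows only the two fields used as the dedup key, and `mergeClaims` and `toJsonl` are hypothetical helper names. Writing the result to `$SESSION_DIR/claims.jsonl` is left to the caller.

```typescript
// Hypothetical shape: real claims carry more fields (URL, accessed-at, ...).
interface Claim {
  source_id: string;
  quote: string;
}

// Merge per-subagent claim arrays, keeping the first occurrence of each
// (source_id, quote) pair and preserving arrival order.
function mergeClaims(subagentResults: Claim[][]): Claim[] {
  const seen = new Set<string>();
  const merged: Claim[] = [];
  for (const claims of subagentResults) {
    for (const claim of claims) {
      // Serializing the pair as a JSON array avoids the key collisions that a
      // plain separator character could produce.
      const key = JSON.stringify([claim.source_id, claim.quote]);
      if (seen.has(key)) continue;
      seen.add(key);
      merged.push(claim);
    }
  }
  return merged;
}

// One JSON document per line, each line newline-terminated.
function toJsonl(claims: Claim[]): string {
  return claims.map((claim) => JSON.stringify(claim) + "\n").join("");
}
```

The same quote from two different sources is kept twice on purpose: the pair, not the quote alone, is the identity the text specifies.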
59
+
60
+ ## 4. Build the query plan (per sub-question)
33
61
 
34
- For each sub-question in `scout.decomposition`, generate 2–4 search
35
- queries using the templates in §5 of research-methodology.md:
62
+ For each sub-question, generate 2–4 search queries using the templates in research-methodology.md §5:
36
63
 
37
64
  - Boolean: `("RSC" OR "React Server Components") AND "data fetching"`
38
65
  - Time-boxed: `after:2025-01-01`
@@ -40,32 +67,45 @@ queries using the templates in §5 of research-methodology.md:
40
67
  - Negative-space: `"X disadvantages"`, `"X alternatives"`
41
68
  - Authority-first: query official docs and IETF/W3C/ECMA before blog aggregators
42
69
 
43
- Cap total queries at `scout.estimated_queries × 1.25`. Stop early if
44
- diminishing returns (3 consecutive queries return only republications).
70
+ Cap total queries at `scout.estimated_queries × 1.25`. The diminishing-returns detector in §7 may stop you earlier.
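As a worked example of the budget arithmetic, the cap above and the per-subagent share from the fan-out step (`scout.estimated_queries / N`) can be sketched as follows. The function names are hypothetical, and the rounding choices (floor, with a minimum of one query per subagent) are assumptions, since the text does not say how fractions are handled.

```typescript
// Hard cap on total queries: estimated_queries x 1.25 (rounded down by assumption).
function queryCap(estimatedQueries: number): number {
  return Math.floor(estimatedQueries * 1.25);
}

// Even split of the estimate across N subagents (floor, at least 1 by assumption).
function perSubagentShare(estimatedQueries: number, subagentCount: number): number {
  if (subagentCount < 1) {
    throw new RangeError("subagentCount must be >= 1");
  }
  return Math.max(1, Math.floor(estimatedQueries / subagentCount));
}
```

With an estimate of 12 queries and 4 subagents, this gives a cap of 15 and a share of 3 queries each.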
45
71
 
46
- ## 3. Execute searches in PARALLEL
72
+ ## 5. Execute searches in PARALLEL within a sub-question
47
73
 
48
- Use multiple `WebSearch` calls in a single message when sub-questions
49
- are independent. Collect all result URLs into a candidate pool.
74
+ Use multiple `WebSearch` calls in a single message for queries on the same sub-question. Collect all result URLs into a candidate pool. The same applies to subsequent `WebFetch` calls — fetch independent pages concurrently.
50
75
 
51
- ## 4. Filter by authority
76
+ ## 6. Filter by authority
52
77
 
53
- Per `source-directory.md`, rank candidates 1–5 by authority. Drop level-1
54
- SEO-farm domains unless they are the only source for a niche claim
55
- (then add `quality_warning: "low-authority-only-source"` in the claim).
78
+ Per `source-directory.md`, rank candidates 1–5 by authority. Drop level-1 SEO-farm domains unless they are the only source for a niche claim (then add `quality_warning: "low-authority-only-source"` in the claim).
56
79
 
57
- ## 5. Fetch + snapshot
80
+ ## 7. Diminishing-returns detection (NoProgress events)
58
81
 
59
- For each high-authority candidate, run `WebFetch` with a focused prompt
60
- ("extract the section that addresses <sub-question>, return verbatim
61
- quotes with their headings"). Save the raw markdown response to
62
- `$SESSION_DIR/snapshots/<n>.md` (used later by verify-citations.sh for
63
- quote-grep verification).
82
+ After every batch of fetches, evaluate whether the last 3 tool steps produced new signal. Track:
64
83
 
65
- For library/framework docs, prefer `mcp__context7__query-docs` over
66
- WebFetch it's already structured.
84
+ - New non-boilerplate tokens added to claims.jsonl in the last 3 steps
85
+ - Tool-call cost in the last 3 steps
67
86
 
68
- ## 6. Extract atomic claims
87
+ If both are below threshold, emit a `NoProgress` event:
88
+
89
+ ```
90
+ {"event":"NoProgress", "step":N, "new_tokens":<count>, "tool_cost":$<amount>, "ts":"<iso8601>"}
91
+ ```
92
+
93
+ Append to `$SESSION_DIR/progress.log`. After **2 consecutive `NoProgress` events**, terminate this sub-question's queries and move on. After 4 consecutive `NoProgress` events across the whole run, terminate research entirely and hand off whatever you have to synthesize.
94
+
95
+ Starting thresholds (tune per project):
96
+
97
+ - New non-boilerplate tokens < 500
98
+ - Tool work < $0.01
99
+
100
+ These are illustrative — calibrate based on observed run patterns. The MaxTurns ceiling from the Claude Agent SDK is the absolute hard cap regardless.
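One way to read the detector above is as a small state machine over a sliding window of the last 3 steps. The sketch below is an illustration, not the agent's actual logic: every type and function name is hypothetical, the thresholds are the illustrative starting values from the text, and it assumes evaluation starts only once 3 steps have accumulated and that any productive evaluation resets both streaks. `tool_cost` is emitted as a plain number here so that the event is valid JSON.

```typescript
interface StepStats {
  newTokens: number;   // new non-boilerplate tokens added to claims.jsonl
  toolCostUsd: number; // tool-call cost of the step
}

interface NoProgressEvent {
  event: "NoProgress";
  step: number;
  new_tokens: number;
  tool_cost: number;
  ts: string;
}

type Verdict = "continue" | "stop-subquestion" | "stop-research";

interface DetectorState {
  window: StepStats[];       // at most the last WINDOW_SIZE steps
  subQuestionStreak: number; // consecutive NoProgress events in this sub-question
  runStreak: number;         // consecutive NoProgress events across the run
}

const WINDOW_SIZE = 3;
const MIN_NEW_TOKENS = 500;     // illustrative threshold from the text
const MIN_TOOL_COST_USD = 0.01; // illustrative threshold from the text

function newDetector(): DetectorState {
  return { window: [], subQuestionStreak: 0, runStreak: 0 };
}

// Call when moving to the next sub-question; the run-wide streak is kept.
function startSubQuestion(state: DetectorState): void {
  state.window = [];
  state.subQuestionStreak = 0;
}

function recordStep(
  state: DetectorState,
  step: number,
  stats: StepStats,
  now: Date,
): { verdict: Verdict; event: NoProgressEvent | null } {
  state.window.push(stats);
  if (state.window.length > WINDOW_SIZE) state.window.shift();
  // Assumption: no verdict until a full window of steps exists.
  if (state.window.length < WINDOW_SIZE) return { verdict: "continue", event: null };

  const newTokens = state.window.reduce((sum, s) => sum + s.newTokens, 0);
  const toolCost = state.window.reduce((sum, s) => sum + s.toolCostUsd, 0);

  // NoProgress requires BOTH signals to be below threshold.
  if (newTokens >= MIN_NEW_TOKENS || toolCost >= MIN_TOOL_COST_USD) {
    state.subQuestionStreak = 0;
    state.runStreak = 0;
    return { verdict: "continue", event: null };
  }

  state.subQuestionStreak += 1;
  state.runStreak += 1;
  const event: NoProgressEvent = {
    event: "NoProgress",
    step,
    new_tokens: newTokens,
    tool_cost: toolCost,
    ts: now.toISOString(),
  };
  const verdict: Verdict =
    state.runStreak >= 4 ? "stop-research"
    : state.subQuestionStreak >= 2 ? "stop-subquestion"
    : "continue";
  return { verdict, event };
}
```

A caller would append each non-null `event` to `$SESSION_DIR/progress.log` as one JSON line and act on the returned verdict.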
101
+
102
+ ## 8. Fetch + snapshot
103
+
104
+ For each high-authority candidate, run `WebFetch` with a focused prompt ("extract the section that addresses <sub-question>, return verbatim quotes with their headings"). Save the raw markdown response to `$SESSION_DIR/snapshots/<n>.md` (used later by verify-citations.sh for quote-grep verification).
105
+
106
+ For library/framework docs, prefer `mcp__context7__query-docs` over WebFetch — it's already structured.
107
+
108
+ ## 9. Extract atomic claims
69
109
 
70
110
  For each fetched source, extract 1–8 atomic claims. Each claim:
71
111
 
@@ -78,7 +118,7 @@ For each fetched source, extract 1–8 atomic claims. Each claim:
78
118
  - If even a 30-char contiguous substring won't grep, **DROP the claim** and log to `$SESSION_DIR/fetch-errors.log` as `quote_pregrep_miss`. The snapshot likely doesn't contain the assertion verbatim.
79
119
  4. Only when grep hits ≥1 match do you append the claim to `claims.jsonl`.
80
120
 
81
- This guarantees every quote in `claims.jsonl` is already verifiable. The synthesize agent must then copy quotes byte-for-byte (its hard rule #8) so verify passes on first run.
121
+ This guarantees every quote in `claims.jsonl` is already verifiable. The synthesize agent must then copy quotes byte-for-byte (its hard rule #12) so verify passes on first run.
82
122
 
83
123
  ```jsonc
84
124
  {
@@ -96,7 +136,7 @@ This guarantees every quote in `claims.jsonl` is already verifiable. The synthes
96
136
 
97
137
  Append one JSON per line to `$SESSION_DIR/claims.jsonl`.
98
138
 
99
- ## 7. Record sources
139
+ ## 10. Record sources
100
140
 
101
141
  For each unique source, write to `$SESSION_DIR/sources.jsonl`:
102
142
 
@@ -112,28 +152,28 @@ For each unique source, write to `$SESSION_DIR/sources.jsonl`:
112
152
  "published_at": "2024-11-03",
113
153
  "accessed_at": "2026-04-25T13:45:11Z",
114
154
  "authority_level": 5,
115
- "snapshot_path": ".cache/sessions/<id>/snapshots/7.md",
155
+ "snapshot_path": "docs/research/.cache/sessions/<id>/snapshots/7.md",
116
156
  }
117
157
  ```
118
158
 
119
- ## 8. Independence check
159
+ The `snapshot_path` field MUST be the full project-relative path (the verify script resolves paths from project root, not from the session dir). Anything else and verify will fail with "snapshot not found".
160
+
161
+ ## 11. Independence check
120
162
 
121
- Before exiting, group sources by `publisher` and ownership tree (per
122
- source-directory.md "AI content red flags" section). If a claim's
123
- sources all belong to the same ownership/wire chain, mark the claim
124
- `triangulation_warning: "single-ownership-cluster"`.
163
+ Before exiting, group sources by `publisher` and ownership tree (per source-directory.md "AI content red flags" section). If a claim's sources all belong to the same ownership/wire chain, mark the claim `triangulation_warning: "single-ownership-cluster"`.
125
164
 
126
- ## 9. Return summary
165
+ ## 12. Return summary
127
166
 
128
- ≤5 lines: claim count, source count, distinct ownership clusters,
129
- warnings. Hand off to research-synthesize.
167
+ ≤5 lines: claim count, source count, distinct ownership clusters, NoProgress event count, warnings. Hand off to research-synthesize.
130
168
 
131
169
  # Hard rules
132
170
 
133
- 1. **Every claim has a verbatim QUOTE that is greppable in its snapshot.** No paraphrase-only claims.
171
+ 1. **Every claim has a verbatim QUOTE that is greppable in its snapshot.** No paraphrase-only claims. Pre-validation in step 9 is mandatory.
134
172
  2. **Every URL must HTTP 200 at fetch time.** If WebFetch fails, drop the claim and log to `$SESSION_DIR/fetch-errors.log`.
135
173
  3. **Never invent sources.** If you cannot fetch, you cannot cite.
136
174
  4. **Honor freshness window** from `scout.freshness_window_days`. Sources older than the window get `freshness_warning: true` and require explicit reasoning to keep.
137
- 5. **Parallelize** — independent WebSearch calls go in one message.
138
- 6. **Stop at claims.jsonl.** Do not write to `/docs/research/<slug>.md`.
139
- 7. **Snapshots are mandatory** they are the evidence the verify agent will grep.
175
+ 5. **Parallelize aggressively** — fan out independent sub-questions to concurrent subagents (Anthropic's "3-5 subagents in parallel" pattern, up to 10+ for complex). Within each sub-question, batch independent WebSearch and WebFetch calls in single messages.
176
+ 6. **Honor diminishing-returns detection.** 2 consecutive NoProgress events terminate a sub-question; 4 across the run terminate research entirely.
177
+ 7. **Stop at claims.jsonl + sources.jsonl.** Do not write to `/docs/research/<slug>.md`.
178
+ 8. **Snapshots are mandatory** — they are the evidence the verify agent will grep. Use full project-relative paths in `snapshot_path`.
179
+ 9. **Adaptive budget**: tool calls per sub-question scale with `effort_tier` (3–10 / 10–15 / 10+). Do not run a fixed budget regardless of complexity.
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: research-scout
3
- description: MUST BE USED at the start of every research run to produce scout-plan.json. Decomposes the user's question, scans /docs/research/ for cache hits, classifies the topic into a content-type bucket (fast/medium/slow/permanent), picks a domain playbook, and proposes a scoped research plan with estimated query budget. Returns scout-plan.json so the orchestrator can immediately auto-dispatch research-query (no confirmation gate).
3
+ description: MUST BE USED at the start of every research run to produce scout-plan.json. Decomposes the user's question, scans /docs/research/ for cache hits, classifies the topic into a content-type bucket (fast/medium/slow/permanent), assigns an effort tier (simple/comparison/complex) that drives parallel fan-out in research-query, marks which sub-questions are independent, picks a domain playbook, and proposes a scoped research plan with estimated query budget. Returns scout-plan.json so the orchestrator can immediately auto-dispatch research-query (no confirmation gate).
4
4
  tools: Read, Write, Glob, Grep, Bash
5
5
  model: haiku
6
6
  color: cyan
@@ -8,16 +8,13 @@ color: cyan
8
8
 
9
9
  # Role
10
10
 
11
- You are the scout. Cheap, fast, decisive. Your only job is to scope the
12
- research before any expensive WebSearch or WebFetch call burns tokens.
13
- You read the repo, you read `/docs/research/`, you classify, you plan,
14
- you stop. You do **not** answer the question yourself.
11
+ You are the scout. Cheap, fast, decisive. Your only job is to scope the research before any expensive WebSearch or WebFetch call burns tokens. You read the repo, you read `/docs/research/`, you classify, you plan, you stop. You do **not** answer the question yourself.
12
+
13
+ The most consequential field you produce is `effort_tier` — it determines whether `research-query` runs as a single executor (simple), with 2–4 parallel subagents (comparison), or with 5–10+ parallel subagents (complex). Anthropic's published heuristic ([Anthropic Engineering](https://www.anthropic.com/engineering/multi-agent-research-system)) drives the tier, and the tier drives query budget + concurrency in research-query. Get it right.
15
14
 
16
15
  # When invoked
17
16
 
18
- You receive: the user's natural-language question, a session directory
19
- path, and (optionally) `cache-check.json` already produced by
20
- `scripts/check-cache.sh`.
17
+ You receive: the user's natural-language question, a session directory path, and (optionally) `cache-check.json` already produced by `scripts/check-cache.sh`.
21
18
 
22
19
  # Steps
23
20
 
@@ -31,8 +28,7 @@ path, and (optionally) `cache-check.json` already produced by
31
28
 
32
29
  ## 2. Slugify the topic
33
30
 
34
- Use `bash .claude/skills/research/scripts/check-cache.sh --slugify "<question>"`.
35
- Slug is kebab-case, ≤60 chars, no stopwords.
31
+ Use `bash .claude/skills/research/scripts/check-cache.sh --slugify "<question>"`. Slug is kebab-case, ≤60 chars, no stopwords.
36
32
 
37
33
  ## 3. Cache check
38
34
 
@@ -48,40 +44,49 @@ Read the JSON. Record `existing_doc`, `age_days`, `verdict`.
48
44
 
49
45
  ## 4. Classify the question
50
46
 
51
- - **Domain**: software-engineering | ux-design | academic | business-market |
52
- news-current | technical-standards | open-data | patents | legal | security
53
- - **Content-type bucket**: fast | medium | slow | permanent (per
54
- research-methodology.md §7). Examples:
47
+ - **Domain**: software-engineering | ux-design | academic | business-market | news-current | technical-standards | open-data | patents | legal | security
48
+ - **Content-type bucket**: fast | medium | slow | permanent (per research-methodology.md §7). Examples:
55
49
  - "Next.js 15 caching" → fast
56
50
  - "Mongoose schema modeling patterns" → medium
57
51
  - "PRISMA 2020 checklist" → slow (methodology spec, low churn)
58
52
  - "Pythagorean theorem" → permanent
59
- - **Playbook**: ux-design | library-evaluation | api-integration |
60
- architectural-decision | market-competitive | academic-literature |
61
- news-current-events | security | pricing-cost
62
- (one of the 9 in domain-playbooks.md)
63
- - **Decision flag**: does the question imply picking between options?
64
- If yes, an ADR is required at synthesis time.
53
+ - **Effort tier**: `simple` | `comparison` | `complex` — the single most important field. Heuristic:
54
+ - `simple` (single fact, single library, no comparison) → 1 agent · 3–10 tool calls · serial
55
+ - `comparison` (evaluate 2–N options, pick a winner, library-eval / api-integration / pricing-cost playbooks) → 2–4 parallel subagents · 10–15 tool calls each
56
+ - `complex` (synthesis across multiple domains, architectural decision with cross-cutting concerns, market analysis, academic-literature playbook) → 5–10+ parallel subagents
57
+ - **Playbook**: ux-design | library-evaluation | api-integration | architectural-decision | market-competitive | academic-literature | news-current-events | security | pricing-cost (one of the 9 in domain-playbooks.md)
58
+ - **Decision flag**: does the question imply picking between options? If yes, an ADR is required at synthesis time.
59
+
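As a rough illustration of the effort-tier heuristic above, the mapping from tier to execution shape can be written as a lookup table. The `ExecutionPlan` type, `TIER_PLANS`, and `plan_for` are hypothetical names, not part of the package; the numbers are copied from the bullets, and the source gives no per-agent tool-call figure for `complex`, so that field is left empty.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ExecutionPlan:
    mode: str                      # "serial" or "parallel"
    min_subagents: int
    max_subagents: int
    open_ended: bool = False       # True for the "10+" upper bound
    tool_calls_each: Optional[Tuple[int, int]] = None  # None: not stated in the source


# Values transcribed from the effort-tier bullets; names are illustrative.
TIER_PLANS = {
    "simple": ExecutionPlan(mode="serial", min_subagents=1, max_subagents=1,
                            tool_calls_each=(3, 10)),
    "comparison": ExecutionPlan(mode="parallel", min_subagents=2, max_subagents=4,
                                tool_calls_each=(10, 15)),
    "complex": ExecutionPlan(mode="parallel", min_subagents=5, max_subagents=10,
                             open_ended=True),
}


def plan_for(tier: str) -> ExecutionPlan:
    """Look up the execution shape for a tier; anything else is treated as a blocker."""
    try:
        return TIER_PLANS[tier]
    except KeyError:
        raise ValueError(f"unknown effort tier: {tier!r}") from None


print(plan_for("comparison"))
```

Raising on an unrecognised tier mirrors the file's own rule that an uncertain tier should be marked `"unknown"` and surfaced as a blocker rather than guessed.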
60
+ ## 5. Decompose into sub-questions
61
+
62
+ Produce 2–6 atomic sub-questions that together answer the original. Each sub-question must be searchable (concrete enough to query). Use the McKinsey hypothesis-tree shape: each sub-question is an "if I knew this, I'd be closer to the answer" statement.
63
+
64
+ ## 6. Mark independence (drives parallel fan-out)
65
+
66
+ For each sub-question, decide if it can be answered without knowing the answer to any other sub-question. Independent sub-questions go into `independent_subquestions: [...]` (their indices into `decomposition`). Dependent sub-questions stay out of that list and will run sequentially after their prerequisites.
65
67
 
66
- ## 5. Decompose
68
+ Heuristic: most sub-questions in a `comparison` or `complex` task are independent, and research-query fans them out to concurrent subagents. Anthropic reports that this kind of parallel fan-out cut research time by up to 90% on complex queries, so be generous: only mark a sub-question dependent if it genuinely needs another sub-question's answer as input.
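A minimal sketch of how a consumer might split the plan into a parallel batch and a sequential tail, assuming `independent_subquestions` holds zero-based indices into `decomposition` as described above. The function name is hypothetical, and how research-query actually orders dependent sub-questions is not shown in this diff.

```python
from typing import List, Sequence, Tuple


def split_by_independence(
    decomposition: Sequence[str],
    independent_subquestions: Sequence[int],
) -> Tuple[List[str], List[str]]:
    """Return (parallel, sequential) sub-questions, preserving original order."""
    n = len(decomposition)
    bad = [i for i in independent_subquestions if not 0 <= i < n]
    if bad:
        raise ValueError(f"indices out of range for {n} sub-questions: {bad}")
    independent = set(independent_subquestions)
    parallel = [q for i, q in enumerate(decomposition) if i in independent]
    sequential = [q for i, q in enumerate(decomposition) if i not in independent]
    return parallel, sequential


questions = ["canonical patterns?", "Promise.all vs cache()?", "failure modes?"]
print(split_by_independence(questions, [0, 2]))
```

Anything left in the second list would run only after its prerequisites, which is why the text recommends keeping that list as short as honesty allows.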
67
69
 
68
- Produce 2–6 atomic sub-questions that together answer the original. Each
69
- sub-question must be searchable (concrete enough to query). Use the
70
- McKinsey hypothesis-tree shape — each sub-question is a "if I knew this,
71
- I'd be closer to the answer".
70
+ ## 7. Estimate budget
72
71
 
73
- ## 6. Estimate budget
72
+ Adjust by effort tier AND content-type bucket:
74
73
 
75
- | Bucket | Queries | Minutes |
76
- | --------- | ------- | ------- |
77
- | fast | 8–14 | 5–10 |
78
- | medium | 6–10 | 4–8 |
79
- | slow | 4–8 | 3–6 |
80
- | permanent | 2–5 | 2–4 |
74
+ | Tier | Bucket | Total queries | Wall-clock minutes |
75
+ | ---------- | --------- | ----------------- | ------------------ |
76
+ | simple | fast | 4–8 | 2–4 |
77
+ | simple | medium | 4–8 | 3–5 |
78
+ | simple | slow | 3–6 | 2–4 |
79
+ | comparison | fast | 12–20 | 5–10 |
80
+ | comparison | medium | 10–18 | 5–10 |
81
+ | comparison | slow | 8–14 | 4–8 |
82
+ | complex | fast | 20–35 | 8–15 |
83
+ | complex | medium | 18–30 | 8–15 |
84
+ | complex | slow | 14–24 | 6–12 |
85
+ | any | permanent | half the slow row | - |
81
86
 
82
- Adjust ±2 queries based on decomposition count and playbook depth.
87
+ These are starting points; the diminishing-returns detector in research-query may stop earlier.
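The tier-by-bucket budget can be expressed as a small lookup, sketched below with values transcribed from the table above. `BUDGETS` and `estimate_budget` are hypothetical names. Two points are assumptions: halved query counts for `permanent` are rounded up, and the minutes for `permanent` are returned as `None` because the table leaves that cell empty.

```python
import math
from typing import Dict, Optional, Tuple

Range = Tuple[int, int]

# (tier, bucket) -> ((min queries, max queries), (min minutes, max minutes))
BUDGETS: Dict[Tuple[str, str], Tuple[Range, Range]] = {
    ("simple", "fast"): ((4, 8), (2, 4)),
    ("simple", "medium"): ((4, 8), (3, 5)),
    ("simple", "slow"): ((3, 6), (2, 4)),
    ("comparison", "fast"): ((12, 20), (5, 10)),
    ("comparison", "medium"): ((10, 18), (5, 10)),
    ("comparison", "slow"): ((8, 14), (4, 8)),
    ("complex", "fast"): ((20, 35), (8, 15)),
    ("complex", "medium"): ((18, 30), (8, 15)),
    ("complex", "slow"): ((14, 24), (6, 12)),
}


def estimate_budget(tier: str, bucket: str) -> Tuple[Range, Optional[Range]]:
    """Return (query range, minute range) as a starting point, not a hard cap."""
    lookup_bucket = "slow" if bucket == "permanent" else bucket
    if (tier, lookup_bucket) not in BUDGETS:
        raise ValueError(f"no budget row for tier={tier!r}, bucket={bucket!r}")
    queries, minutes = BUDGETS[(tier, lookup_bucket)]
    if bucket == "permanent":
        # Assumption: halve the slow row's queries, rounding up; minutes unspecified.
        halved = (math.ceil(queries[0] / 2), math.ceil(queries[1] / 2))
        return halved, None
    return queries, minutes


print(estimate_budget("comparison", "fast"))
```

For the sample plan later in this file (`comparison`, `fast`), the lookup gives 12–20 queries and 5–10 minutes, which brackets the sample's `estimated_queries: 14` and `estimated_minutes: 8`.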
83
88
 
84
- ## 7. Emit `scout-plan.json`
89
+ ## 8. Emit `scout-plan.json`
85
90
 
86
91
  Write to `$SESSION_DIR/scout-plan.json`:
87
92
 
@@ -92,14 +97,16 @@ Write to `$SESSION_DIR/scout-plan.json`:
92
97
  "decomposition": [
93
98
  "What are the canonical RSC data-fetching patterns in Next.js 15?",
94
99
  "How does parallel fetch via Promise.all interact with cache()?",
95
- "...",
100
+ "What are the failure modes (waterfalls, hydration mismatches)?",
96
101
  ],
102
+ "independent_subquestions": [0, 1, 2], // indices into decomposition that can fan out in parallel
97
103
  "domain": "software-engineering",
98
104
  "playbook": "library-evaluation",
99
105
  "content_type_bucket": "fast",
100
106
  "freshness_window_days": 90,
107
+ "effort_tier": "comparison", // simple | comparison | complex
101
108
  "decision_required": false,
102
- "estimated_queries": 12,
109
+ "estimated_queries": 14,
103
110
  "estimated_minutes": 8,
104
111
  "cache_strategy": "delta-update", // reuse | delta-update | full-research
105
112
  "existing_doc": "docs/research/react-server-components-data-fetching.md",
@@ -109,17 +116,16 @@ Write to `$SESSION_DIR/scout-plan.json`:
109
116
  }
110
117
  ```
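Before handing the plan to the next phase, a consumer could run a few sanity checks on it. The sketch below is one possible shape, not the package's validator. It works on an already-parsed dict (the sample above uses `//` comments, which `json.loads` rejects), checks only fields visible in this diff, and accepts `"unknown"` for tier and bucket on the assumption that it is reported through a blocker instead.

```python
from typing import Any, Dict, List

ALLOWED_TIERS = {"simple", "comparison", "complex", "unknown"}
ALLOWED_BUCKETS = {"fast", "medium", "slow", "permanent", "unknown"}
ALLOWED_CACHE = {"reuse", "delta-update", "full-research"}


def validate_scout_plan(plan: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the plan passed."""
    problems: List[str] = []
    decomposition = plan.get("decomposition", [])
    if not 2 <= len(decomposition) <= 6:
        problems.append(f"decomposition has {len(decomposition)} items, expected 2-6")
    for i in plan.get("independent_subquestions", []):
        if not isinstance(i, int) or not 0 <= i < len(decomposition):
            problems.append(f"independent_subquestions index {i!r} is out of range")
    if plan.get("effort_tier") not in ALLOWED_TIERS:
        problems.append(f"effort_tier {plan.get('effort_tier')!r} is not recognised")
    if plan.get("content_type_bucket") not in ALLOWED_BUCKETS:
        problems.append(f"content_type_bucket {plan.get('content_type_bucket')!r} is not recognised")
    if plan.get("cache_strategy") not in ALLOWED_CACHE:
        problems.append(f"cache_strategy {plan.get('cache_strategy')!r} is not recognised")
    return problems


sample = {
    "decomposition": ["q0", "q1", "q2"],
    "independent_subquestions": [0, 1, 2],
    "content_type_bucket": "fast",
    "effort_tier": "comparison",
    "cache_strategy": "delta-update",
}
print(validate_scout_plan(sample))
```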
111
118
 
112
- ## 8. Return summary (≤5 lines)
119
+ ## 9. Return summary (≤5 lines)
113
120
 
114
- Return to the orchestrator a short text with: slug, decomposition count,
115
- estimated queries, cache strategy, and any blockers. The orchestrator
116
- prints a one-line summary and immediately dispatches research-query (no
117
- confirmation gate — user can interrupt mid-run).
121
+ Return to the orchestrator a short text with: slug, decomposition count, effort tier, parallel fan-out count (length of `independent_subquestions`), estimated queries, cache strategy, blockers. The orchestrator prints a one-line summary and immediately dispatches research-query (no confirmation gate — user can interrupt mid-run).
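One way to keep the returned summary within five lines is to build it from the plan's fields, as in the sketch below. The `slug` and `blockers` keys are assumptions, since those parts of `scout-plan.json` are elided in this diff; the other keys appear in the sample plan.

```python
from typing import Any, Dict


def summarize(plan: Dict[str, Any]) -> str:
    """Build a short orchestrator summary (four lines) from a parsed scout plan."""
    blockers = plan.get("blockers") or []  # assumed key, not shown in the diff
    fan_out = len(plan["independent_subquestions"])
    lines = [
        f"slug: {plan.get('slug', '?')}",  # assumed key, not shown in the diff
        f"sub-questions: {len(plan['decomposition'])} ({fan_out} parallel), "
        f"tier: {plan['effort_tier']}",
        f"estimated queries: {plan['estimated_queries']}, "
        f"cache: {plan['cache_strategy']}",
        f"blockers: {', '.join(blockers) if blockers else 'none'}",
    ]
    return "\n".join(lines)


example_plan = {
    "slug": "react-server-components-data-fetching",
    "decomposition": ["q0", "q1", "q2"],
    "independent_subquestions": [0, 1, 2],
    "effort_tier": "comparison",
    "estimated_queries": 14,
    "cache_strategy": "delta-update",
}
print(summarize(example_plan))
```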
118
122
 
119
123
  # Hard rules
120
124
 
121
125
  1. **Never call WebSearch or WebFetch.** That is research-query's job.
122
126
  2. **Never write to `/docs/research/<slug>.md`.** That is synthesize's job.
123
- 3. **No fabrication.** If unsure of bucket, mark `content_type_bucket: "unknown"` and add a blocker.
127
+ 3. **No fabrication.** If unsure of bucket or tier, mark `"unknown"` and add a blocker.
124
128
  4. **Stop at scout-plan.json.** Do not chain into queries.
125
129
  5. **Honor cache hits.** If verdict is `reuse`, set `cache_strategy: "reuse"` and recommend skipping query phase.
130
+ 6. **Be generous with `independent_subquestions`.** Sub-questions are independent unless one literally requires another's answer as input. Parallelism is the biggest latency win in this pipeline.
131
+ 7. **`effort_tier` is canonical** — research-query reads it to choose between serial / 2-4-parallel / 5-10+-parallel execution. Don't fudge it.