space-architect 1.2.0 → 1.3.0

@@ -0,0 +1,308 @@
1
+ # Builder dispatch reference
2
+
3
+ Verified against the `claude` CLI (Claude Code) headless mode, June 2026. The
4
+ builder is `claude -p` (`--print`, the non-interactive headless mode) pinned to
5
+ `claude-sonnet-4-6` — the *same binary the architect runs, one tier down*. Key
6
+ facts the skill encodes: lane-prompts go in on **stdin** (Claude Code has no
7
+ `@file`, and a big quoted lane-prompt as a shell argument gets mangled); the
8
+ model is pinned with `--model claude-sonnet-4-6` (the `sonnet` alias floats to
9
+ the latest Sonnet — pin the full id); there is **no `-C`/working-dir flag**, so
10
+ per-lane dispatch `cd`s into the worktree; permissions are the **tool
11
+ allow/deny lists** (`--allowedTools`/`--disallowedTools`) plus
12
+ `--permission-mode`, not a sandbox; web access is the built-in
13
+ `WebSearch`/`WebFetch` tools (no extension to install).
14
+
15
+ **The one load-bearing difference from the Codex design:** Codex's
16
+ `--sandbox workspace-write` made `.git` physically read-only. Claude Code has
17
+ **no automatic filesystem sandbox** in headless mode (the sandbox is opt-in via
18
+ settings, off by default), so `.git` is not hardware-protected. "Builders never
19
+ commit" (hard rule 7) is now enforced in three layers, weakest to strongest:
20
+ (1) a runtime first line — deny the git-write tools with
21
+ `--disallowedTools 'Bash(git commit:*)' …`; (2) worktree isolation between
22
+ lanes; (3) the authoritative check — an architect post-flight
23
+ `git -C <worktree> log <repo-base>..` that must be empty. The deny rules are
24
+ not airtight (a builder can shell out — `sh -c 'git commit …'` — past the
25
+ pattern match), so the post-flight `git log` is what the loop actually trusts.
26
+ If a lane committed, treat the worktree as tampered: reset and re-dispatch.
27
+
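The authoritative post-flight check described above can be sketched as a pair of small shell helpers. This is an illustrative sketch, not part of the `architect` CLI: the function names `lane_commit_count` and `lane_is_clean` are hypothetical, and the worktree path and base commit are whatever the caller passes in. It uses `git rev-list --count <base>..HEAD`, which counts the same commits that `git log <base>..` would list.

```shell
# lane_commit_count WORKTREE BASE
# Prints how many commits exist in WORKTREE on top of BASE.
# (Hypothetical helper name; WORKTREE and BASE are supplied by the caller.)
lane_commit_count() {
  git -C "$1" rev-list --count "$2"..HEAD
}

# lane_is_clean WORKTREE BASE
# Exit status 0 only when the builder added no commits on top of BASE.
lane_is_clean() {
  [ "$(lane_commit_count "$1" "$2")" -eq 0 ]
}
```

A caller could gate integration on it, for example `lane_is_clean build/<id>-<lane>/wt <repo-base> || echo "lane committed: reset and re-dispatch"`.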
28
+ **Preflight (once per environment):** run `claude --version`, and confirm the
29
+ builder model resolves with a one-shot canary
30
+ (`echo ok | claude -p --model claude-sonnet-4-6 --max-turns 1`). No API key —
31
+ the builder runs on your Claude plan — but note headless `claude -p` draws on
32
+ the Agent SDK credit pool (separate from interactive usage since June 15 2026;
33
+ see `DESIGN.md` §4). On the first real dispatch in a new environment, launch
34
+ ONE canary lane and confirm it starts cleanly before fanning anything out.
35
+
36
+ ## Canonical dispatch — `architect dispatch <iteration> <lane>`
37
+
38
+ The canonical path is `architect dispatch <iteration> <lane>`. The tool
39
+ assembles the canonical `claude -p` argv, pins the model to `claude-sonnet-4-6`,
40
+ reads the lane prompt from `build/<id>-<lane>/prompt.md` on stdin, and streams
41
+ `--output-format stream-json --verbose` output to
42
+ `build/<id>-<lane>/run.jsonl`. Run each lane as its own **background Bash tool
43
+ call** (`run_in_background`) so your turn doesn't block for the full multi-hour
44
+ run.
45
+
46
+ Write the lane's prompt to `build/<id>-<lane>/prompt.md` first (never pass a
47
+ big prompt as a shell argument — shells mangle quotes), then:
48
+
49
+ ```bash
50
+ # single-lane iteration — run from the space root
51
+ architect dispatch <iteration> <lane>
52
+ ```
53
+
54
+ For multi-lane iterations, create worktrees first (one per lane), then dispatch
55
+ each lane from its worktree:
56
+
57
+ ```bash
58
+ # per lane:
59
+ architect worktree add <repo> <iteration> <lane> [--base <repo-base>]
60
+ architect dispatch <iteration> <lane>
61
+ ```
62
+
63
+ `architect worktree add` creates `build/<id>-<lane>/wt` off the target repo's
64
+ base commit (a repo commit — distinct from the freeze, which is a space
65
+ commit), adds it with a `lane/<iteration>-<lane>` branch, and records it in
66
+ `space.yaml`.
67
+
68
+ Issue each dispatch as its **own background Bash tool call** — one call per
69
+ lane. Never use a shell `&` loop. A `for … & done` launcher is a *launcher*
70
+ process: it returns the instant it has spawned the lane children, the harness
71
+ reaps those now-orphaned `claude` processes, and every lane dies at once with no
72
+ `result` — partial diffs, no reports (this exact failure has happened: three
73
+ lanes killed at the same second, zero output). One blocking dispatch per
74
+ background Bash tool keeps each lane attached to a harness-tracked task that
75
+ survives the full multi-hour run and reports completion per lane.
76
+
77
+ ### What the tool runs under the hood
78
+
79
+ `architect dispatch` is equivalent to this — documented here for transparency
80
+ and as the manual fallback:
81
+
82
+ ```bash
83
+ # write prompt to build/<id>-<lane>/prompt.md first, then:
84
+ ( cd build/<id>-<lane>/wt && \
85
+ claude -p --model claude-sonnet-4-6 \
86
+ --permission-mode acceptEdits \
87
+ --allowedTools 'Read,Edit,Write,Grep,Glob,Bash,WebSearch,WebFetch' \
88
+ --disallowedTools 'Bash(git commit:*),Bash(git push:*),Bash(git reset:*),Bash(git merge:*),Bash(git rebase:*),Bash(git checkout:*),Bash(git branch:*)' \
89
+ --output-format stream-json --verbose \
90
+ --max-turns 200 \
91
+ < <space>/build/<id>-<lane>/prompt.md \
92
+ > <space>/build/<id>-<lane>/run.jsonl 2>&1 )
93
+ ```
94
+
95
+ `acceptEdits` auto-approves file writes; listing `Bash` in `--allowedTools`
96
+ auto-approves shell commands so the run never blocks on a prompt; any tool *not*
97
+ on the allow list is denied rather than prompted in `-p` mode (so the builder
98
+ can't wander outside its toolset), and the `--disallowedTools` deny rules win
99
+ over the allow list (deny always takes precedence) as the runtime first line
100
+ against commits. Redirect stderr (`2>&1`) into the run-log so a dispatch error
101
+ lands somewhere instead of vanishing.
102
+
103
+ ### Integration (architect-only, after per-lane post-flight passes)
104
+
105
+ You decide which lanes pass; the CLI does the git mechanics. Canonical path:
106
+
107
+ ```bash
108
+ architect integrate <iteration> --lanes <passing-set> # e.g. --lanes lane-a,lane-b
109
+ architect gate <iteration> # integration smoke (raw output; verdict stays yours)
110
+ architect integrate <iteration> --lanes <passing-set> --teardown # alternative: also removes worktrees + lane branches afterwards
111
+ ```
112
+
113
+ `architect integrate` commits each named lane on its branch and merges it
114
+ `--no-ff` into the repo's `lane/<iteration>` integration branch, in order. It
115
+ **refuses** a lane that left builder commits or wrote out-of-bounds (the
116
+ mechanical post-flight checks), and aborts on a merge conflict. A merge conflict
117
+ = the lane plan wasn't disjoint = a spec defect: kill the conflicting lane and
118
+ re-spec; don't hand-resolve builder conflicts. It runs **no gates and makes no
119
+ verdict** — `architect gate` streams the raw gate output for you to judge.
120
+
121
+ Under the hood / manual fallback (one lane shown):
122
+
123
+ ```bash
124
+ git -C repos/<repo> checkout -b lane/<iteration> <repo-base>
125
+ git -C build/<id>-<lane>/wt add -A
126
+ git -C build/<id>-<lane>/wt commit -m "lane <lane>: <what>"
127
+ git -C repos/<repo> merge --no-ff lane/<iteration>-<lane>
128
+ <run the gate commands> # integration smoke after every merge
129
+ architect worktree remove <iteration> <lane>
130
+ git -C repos/<repo> branch -d lane/<iteration>-<lane>
131
+ ```
132
+
133
+ ## Operating guidance
134
+
135
+ - Background each lane as its own harness task and let the **per-lane
136
+ completion notification** bring you back (multi-hour runs are normal); read
137
+ `build/<id>-<lane>/run.jsonl` and the repo state afterwards. Do not write a
138
+ blocking `while pgrep …; sleep` wait loop as a Bash command — that is itself
139
+ a launcher that ties up a turn. When you return to a lane, check liveness via
140
+ run-log growth (the stall rules below still apply unchanged).
141
+ - Pin the model explicitly. The tool does this automatically (`--model
142
+ claude-sonnet-4-6`). The `sonnet` alias floats to the latest Sonnet — fine
143
+ interactively, but automations pin the full id so a model bump can't silently
144
+ change builder behavior mid-project.
145
+ - Effort = thinking budget. Claude Code has no per-invocation effort flag the
146
+ way Codex exposed `model_reasoning_effort`; the builder sets thinking depth
147
+ **in the block** via the escalation keywords (`think` < `think hard` <
148
+ `think harder` < `ultrathink`), or you floor it with the `MAX_THINKING_TOKENS`
149
+ env var on the dispatch. Default unattended builder work to a high budget
150
+ (open the block with "Think harder…"); downgrade a routine,
151
+ tightly-specified lane to "think hard" (record which and why in the spec).
152
+ - **Builders never commit, and the architect verifies it.** Claude Code has no
153
+ default sandbox to make `.git` read-only, so this is enforced by the deny rules at
154
+ dispatch *and* checked after the run: before integrating a lane, confirm
155
+ `git -C build/<id>-<lane>/wt log <repo-base>..` is empty and
156
+ `git -C build/<id>-<lane>/wt status` shows only files inside the lane's
157
+ declared set. A commit or an out-of-bounds write fails the lane — reset and
158
+ re-dispatch (lanes are cheap, hard rule 7).
159
+ - Same-iteration follow-up (e.g. answering PHASE 0 disagreements after the
160
+ human rules): from the lane's worktree, `claude -p --continue "<rulings +
161
+ proceed>"` resumes that worktree's most recent session with full context —
162
+ sessions are scoped per directory, so `--continue` (`-c`) is deterministic
163
+ even with parallel lanes. (Alternatively pin `--session-id <uuid>` at
164
+ dispatch and resume with `--resume <uuid>`.) Resume the **same way you
165
+ dispatch** — one background Bash tool call per lane, each a single blocking
166
+ `claude -p --continue …`, never a `&` loop (a `&` launcher orphans the
167
+ resumed lanes exactly as it does fresh ones). Never resume across iterations —
168
+ every iteration gets a fresh context.
169
+ - Cross-model review gate (high-stakes iterations): the architect is Opus 4.8
170
+ and the builder is Sonnet 4.6 — both Claude Code, so this is a
171
+ cross-*tier* read inside one lab, not cross-vendor (see `DESIGN.md` R3). The
172
+ architect (Opus) reading the diff is already the stronger-model fresh-context
173
+ pass. For an extra adversarial pass, pipe the instruction + diff to a fresh
174
+ read-only reviewer:
175
+ ```bash
176
+ { echo "Review this diff against the spec. Flag ONLY correctness/requirement/invariant gaps with file:line evidence. No style."; \
177
+ git -C <repo-root> diff <base>...HEAD; } \
178
+ | claude -p --model claude-sonnet-4-6 --allowedTools 'Read,Grep,Glob'
179
+ ```
180
+ - `build/` is already gitignored by the space, so no extra `.gitignore` entry
181
+ is needed. Scratch never reaches the space repo; only `architecture/` is
182
+ committed.
183
+
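The out-of-bounds half of the post-flight check in the "Builders never commit" bullet above could look roughly like the following. It is a sketch under stated assumptions: `lane_out_of_bounds` is a hypothetical name, and the lane's declared file set is assumed to live in a plain text file with one repo-relative path per line (the document does not specify where that set is stored). It relies on the porcelain v1 format of `git status`, where the path starts at column 4. Staged renames and paths that git quotes (spaces, non-ASCII) are not handled.

```shell
# lane_out_of_bounds WORKTREE ALLOWED_FILE
# Prints every modified or untracked path in WORKTREE that is NOT listed
# in ALLOWED_FILE (assumed format: one repo-relative path per line).
# Empty output means the lane stayed inside its declared set.
lane_out_of_bounds() {
  git -C "$1" status --porcelain --untracked-files=all \
    | cut -c4- \
    | grep -vxF -f "$2" || true   # grep exits 1 when nothing is left; not an error here
}
```

Any output fails the lane; an empty result plus an empty commit log is what lets the lane proceed to integration.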
184
+ ## Stall detection and rescue
185
+
186
+ A dispatched run is STALLED when its `run.jsonl`
187
+ (`build/<id>-<lane>/run.jsonl`) has not grown for 15+ minutes AND the last
188
+ event is an in-flight `Bash` tool call (a `tool_use` for `Bash` with no
189
+ matching `tool_result` yet). Silent gaps between events are normal model
190
+ thinking; a shell command that should take seconds sitting in flight for 15+
191
+ minutes is not.
192
+
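The stall rule above (no growth for 15+ minutes and an in-flight `Bash` call as the last event) can be approximated with a short shell check. Treat this as a sketch: `is_stalled` is a hypothetical helper, the match on `"type":"tool_use"` and `"name":"Bash"` assumes the run log is compact one-event-per-line JSON with those keys, and `find -mmin` is a GNU/BSD extension rather than POSIX.

```shell
# is_stalled LOG [MINUTES]
# Exit status 0 when LOG has not been modified for more than MINUTES
# (default 15) AND its last line looks like an unanswered Bash tool call.
is_stalled() {
  log=$1
  minutes=${2:-15}
  # find prints the path only if the file was last modified > MINUTES ago
  [ -n "$(find "$log" -mmin +"$minutes" 2>/dev/null)" ] || return 1
  last=$(tail -n 1 "$log")
  # ASSUMPTION: an in-flight call shows up as a final event containing a
  # tool_use block for the Bash tool, with no tool_result after it.
  printf '%s\n' "$last" | grep -q '"type":"tool_use"' || return 1
  printf '%s\n' "$last" | grep -q '"name":"Bash"'
}
```

A positive result is only a prompt to diagnose the child process as described next, not a reason to kill the run.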
193
+ Diagnose before killing: find the command's child under the `claude` PID
194
+ (claude → shell → child). It is either hot-spinning (high CPU) or blocked (zero
195
+ CPU and none of its expected side effects on disk); it is hung either way.
196
+
197
+ Kill the NARROWEST thing: the stuck child process, not the `claude` run. The
198
+ command returns a failure to the builder, which adapts with its full context
199
+ intact. Kill the whole run only when the builder re-enters the same hang or the
200
+ worktree is broken; then discard the lane and re-dispatch (hard rule 7).
201
+
202
+ Claude Code runs the `Bash` tool unsandboxed by default, so the Codex-era
203
+ sandbox-specific hang sources don't apply — but long-running and interactive
204
+ commands still hang an unattended run. Spec consequence: give every potentially
205
+ long command an explicit timeout in the lane-prompt (the `Bash` tool also takes
206
+ a per-call timeout), cap the run with `--max-turns` as a loop backstop, steer
207
+ builders toward the repo's existing test fixtures over hand-rolled long-running
208
+ harnesses, and when a gate needs a runtime that can't run unattended
209
+ (interactive prompts, servers without a timeout), have the builder record the
210
+ exact failure as a disagreement/blocker and verify what it can — gate verdicts
211
+ are architect-run anyway (hard rule 4). Write the gate file anticipating this.
212
+
213
+ ## Manual alternative (human-driven)
214
+
215
+ Paste the lane-prompt into an interactive `claude` session (no `-p`). Claude
216
+ Code's agent loop runs plan→act→test against the block's stopping condition
217
+ while you watch and steer — approve tools as they come, or set `/permissions`
218
+ first. Use when the human wants to babysit a run.
219
+
220
+ ## Lane-prompt template
221
+
222
+ ```
223
+ Execute the architect spec below. Operating rules:
224
+
225
+ PHASE 0 — Before any code: reply with your plan and EVERY disagreement you have
226
+ with this spec, with reasons, citing real files in this repo. Silent compliance
227
+ is a failure. Silent scope additions are a failure. If you have no
228
+ disagreements, state what you checked before concluding the spec is sound.
229
+ Verify the named APIs/formats/versions against the live dependencies before
230
+ planning around them.
231
+
232
+ PHASE 1 — Treat the shared contracts (schemas/interfaces) named in the spec,
233
+ and the repo's existing public interfaces, as FROZEN: do not change them —
234
+ other lanes depend on them. You have no access to the space's architecture/
235
+ directory; the architect owns it. The ACCEPTANCE CRITERIA below are frozen —
236
+ verify your work against them; never weaken or work around them.
237
+
238
+ PHASE 2 — Build YOUR LANE ONLY: exactly the files listed in BOUNDARIES. You
239
+ are one of several parallel lane agents working in isolated worktrees; files
240
+ outside your lane belong to other agents — touching them fails your lane.
241
+ No placeholder implementations — search the codebase before implementing;
242
+ full implementations only. Write IDIOMATIC, house-consistent code: read the
243
+ neighbouring code and match its conventions (naming, guards, predicates,
244
+ error/persistence idioms, the language's expressive collection/enumerable
245
+ forms); well-factored and DRY-ish; terse but clear; pragmatic, not clever — the
246
+ smallest change that does the job, no abstraction it doesn't need. Consistency
247
+ with the surrounding code is part of correctness here: a change that works but
248
+ fights the house style (or introduces an inconsistency a careful reader of this
249
+ repo would never write) is not done. Tests stay simple and terse, exercising
250
+ public behavior, no mock/stub of the class under test. Verify your work by
251
+ running the acceptance criteria's gate commands and record the verbatim output. Do NOT commit and do NOT run any
252
+ git write command (commit/add/branch/reset/checkout) — the architect commits
253
+ and merges after verification, and verifies you made no commits. Do NOT delete
254
+ lock files or escalate privileges if a command fails; record the exact error
255
+ and continue. Give every potentially long command an explicit timeout; if a
256
+ runtime will not start unattended (interactive prompt, server with no timeout),
257
+ record the exact failure in your report and route around it — never busy-wait
258
+ or retry in a loop. When done, write your report to the scratch file given to
259
+ you, build/<id>-<lane>/report.md (an absolute path outside your worktree),
260
+ with RAW results only — tables, numbers, command output — no interpretation, no
261
+ "promising". Every status claim must be backed by a command result from this
262
+ run. Keep the report compact — tables and numbers, not prose. End it with
263
+ exactly one status line: STATUS: COMPLETE | COMPLETE_WITH_CONCERNS (list them)
264
+ | BLOCKED (exact blocker + what you tried). Verdicts belong to the architect
265
+ and the human. Persist until your lane is fully handled end-to-end; do not stop
266
+ at analysis or partial fixes.
267
+
268
+ === OBJECTIVE (and why) ===
269
+ ...
270
+
271
+ === OUTPUT FORMAT ===
272
+ ...
273
+
274
+ === TOOL GUIDANCE (verification commands; verify-against-reality list) ===
275
+ ...
276
+
277
+ === BOUNDARIES (may touch / must not touch / out of scope) ===
278
+ ...
279
+
280
+ === DISAGREEMENT RULINGS (from last session) ===
281
+ ...
282
+
283
+ === ACCEPTANCE CRITERIA (frozen — the architect re-runs these to judge; verify
284
+ against them, do not edit or work around) ===
285
+ ...
286
+ ```
287
+
288
+ ## Builder-side standing setup (one time per machine/repo)
289
+
290
+ - The builder is the same `claude` binary as the architect, one tier down —
291
+ nothing extra to install. `architect dispatch` pins the model per dispatch
292
+ (`--model claude-sonnet-4-6`); a `~/.claude/settings.json` `"model"` default
293
+ is fine interactively, but automations pin it explicitly so a default can't
294
+ silently swap the builder.
295
+ - Repo `CLAUDE.md` is the builder's standing context — Claude Code loads it
296
+ root-down automatically. Put exact build/test commands and repo gotchas there;
297
+ the loop's PHASE rules stay in the dispatch block so they version with the
298
+ skill. (Claude Code does **not** auto-read `AGENTS.md`; if the repo keeps its
299
+ build/test docs there, add `@AGENTS.md` to `CLAUDE.md` to pull it in.)
300
+ - The builder is a bare `claude -p` over the block — it is not invoking the
301
+ `/architect` skills, the block is its entire instruction set. (`--bare` would
302
+ give a leaner builder context but also drops `CLAUDE.md`/skills/hooks — keep
303
+ `CLAUDE.md`, so skip `--bare` unless the repo has no standing build/test doc.)
304
+ - Billing: headless `claude -p` draws on the Agent SDK credit pool on your
305
+ Claude plan (separate from interactive usage limits since June 15 2026).
306
+ There's no per-window quota that dies mid-run the way a chat session can, but
307
+ a long parallel fan-out does spend that pool. The architect runs as your
308
+ interactive Claude Code session.
@@ -0,0 +1,89 @@
1
+ # Research fan-out reference
2
+
3
+ Read this only when a research trigger fires (see SKILL.md step 3). The
4
+ fan-out uses `claude -p` (Sonnet 4.6) as parallel web-research subagents —
5
+ read-only (no `Edit`/`Write`/`Bash`), built-in `WebSearch`/`WebFetch` — and
6
+ the architect keeps all judgment: it verifies the load-bearing claims and writes
7
+ the iteration's **Grounds** section itself.
8
+
9
+ ## Fan out
10
+
11
+ Decompose the question into 3–5 narrow, NON-OVERLAPPING research questions.
12
+ Cover different angles, not the same angle five times — typical split:
13
+ official docs/reference, changelog/breaking changes, community failure reports,
14
+ alternatives/comparisons, security/operational constraints.
15
+
16
+ One fresh `claude -p` per question, each launched as its own **background Bash
17
+ tool call** (`run_in_background`) — one call per researcher, not a shell `&`
18
+ loop (a `&` launcher orphans the researchers and the harness reaps them all at
19
+ once; same trap as builder dispatch). The researcher toolset is read-only
20
+ (`Read,Grep,Glob`) plus the web tools (`WebSearch,WebFetch`) — no
21
+ `Edit`/`Write`/`Bash`, so it cannot touch the repo. Capture the final report
22
+ by redirecting stdout (the report *is* the text result in the default `text`
23
+ output format):
24
+
25
+ ```bash
26
+ claude -p --model claude-sonnet-4-6 \
27
+ --allowedTools 'Read,Grep,Glob,WebSearch,WebFetch' \
28
+ --max-turns 40 \
29
+ < build/research/<NN>-<topic>.prompt.md \
30
+ > build/research/<NN>-<topic>.md
31
+ ```
32
+
33
+ Write each research-prompt to a `.prompt.md` file and feed it on stdin, never
34
+ as a shell argument — a quote-mangling shell will corrupt a big prompt; the
35
+ stdin redirect injects it verbatim.
36
+
37
+ - Read-only by toolset: with only `Read,Grep,Glob,WebSearch,WebFetch` on the
38
+ allow list and nothing else granted, any write/bash call is denied — the
39
+ researcher can't touch the repo. Its report is the redirected stdout.
40
+ - Web comes from the built-in `WebSearch` and `WebFetch` tools — no extension
41
+ or key. Launch ONE canary researcher and confirm it actually fetches live URLs
42
+ before fanning out. Restrict domains with `WebFetch(domain:…)` allow rules in
43
+ prompt-injection-sensitive repos.
44
+ - Thinking budget: keep research at a modest level (a plain block, or "think
45
+ hard") — research is coverage work; deep thinking buys nothing here. Synthesis
46
+ happens on the architect's side.
47
+ - Scope each researcher to ≤5 subjects and put hard context rules in the
48
+ research-prompt (snippet over page; quote ≤2 sentences; stop the moment you
49
+ can answer) — a researcher that fills its context window dies without emitting
50
+ its report. Bisect and re-dispatch dead lanes; don't re-run as-is.
51
+
52
+ ## Research-prompt template
53
+
54
+ ```
55
+ You are a web research agent. Answer ONE question. Do not write code, do not
56
+ make recommendations — judgment belongs to the architect who reads your output.
57
+
58
+ QUESTION: <one narrow question>
59
+
60
+ OUTPUT FORMAT — a markdown report:
61
+ - Findings as bullets. EVERY finding carries: source URL, source date (if
62
+ shown), the exact figure or a short direct quote, and a confidence tag
63
+ (high = primary source / med = reputable secondary / low = single blog or
64
+ forum post).
65
+ - Prefer primary sources (official docs, changelogs, release notes, source
66
+ code) over blog posts. Record exact version numbers and dates.
67
+ - When sources disagree, report the disagreement — do not resolve it.
68
+ - If you cannot find evidence for something, write NOT FOUND — never infer or
69
+ fill gaps from prior knowledge without flagging it as such.
70
+ - End with: the 2-3 findings most likely to change an implementation decision.
71
+ ```
72
+
73
+ ## Gather (architect — this is your work, not another agent's)
74
+
75
+ 1. Read every findings file in `build/research/`.
76
+ 2. Identify the **load-bearing claims** — facts the spec will depend on
77
+ (an API shape, a version constraint, a limit, a deprecation). Adversarially
78
+ verify each: cross-check against a second independent source or the live
79
+ dependency itself. Discard single-source low-confidence claims or mark them
80
+ as open questions.
81
+ 3. Write the iteration's **Grounds** section
82
+ (`architecture/I<NN>-<name>.md`): problem, decision + why, requirements,
83
+ non-goals, verified facts **with citations**, open questions for the human.
84
+ You write it — researchers gather, the architect judges and decides.
85
+ 4. Commit Grounds (`I<NN>: grounds`). Raw findings stay in `build/research/`
86
+ (gitignored) — only the distilled, cited Grounds section is repo memory.
87
+ 5. The iteration's Specification cites Grounds instead of restating it; the
88
+ builder's PHASE 0 is expected to challenge Grounds' claims like anything
89
+ else.
@@ -0,0 +1,165 @@
1
+ ---
2
+ name: architect-research
3
+ description: >
4
+ Discovery-scale research harness: a cheap scout researcher maps the topic,
5
+ the orchestrator designs topic-specific parallel researcher lanes from the
6
+ scout report (drawing on a source-class tactics library — academic, repos,
7
+ production patterns, web, experts), then verifies claims against sources and
8
+ synthesizes a decision-oriented report. Use when
9
+ brainstorming a project or feature, choosing a technology, or asked to
10
+ "research X", "what's the state of the art", "deep research". For narrow
11
+ slice-level fact checks inside the build loop, /architect handles those inline.
12
+ ---
13
+
14
+ # Architect Research
15
+
16
+ You are the research orchestrator. Researchers gather; **you** design the
17
+ decomposition, verify, and write — judgment never delegates. The source-class
18
+ tactics library (search mechanics + verified endpoints per source class) is in
19
+ `lanes.md` next to this file; read it when you design lanes.
20
+
21
+ ## Scale before anything
22
+
23
+ - **Simple fact-find** → answer directly or 1 researcher (3–10 searches).
24
+ Don't run a harness on a question one search answers.
25
+ - **Comparison / focused question** → 2–4 researchers on distinct
26
+ perspectives, no scout — you already know the terrain.
27
+ - **Brainstorm / SOTA survey / technology choice** → scout first, then a
28
+ designed fan-out of 4–6 researchers.
29
+
30
+ ## Procedure
31
+
32
+ ### 1. Scope → brief
33
+
34
+ If the question is ambiguous, ask at most 2–3 clarifying questions, then
35
+ compress everything into a **research brief**: the question, the decision it
36
+ informs, constraints, and what "answered" looks like. The brief is the north
37
+ star — every later step is checked against it, and it's restated at the top of
38
+ the final report so the reader can audit scope drift.
39
+
40
+ ### 2. Scout, then design the lanes
41
+
42
+ The surveyed production deep-research systems and 4 of 5 leading OSS frameworks
43
+ use LLM-designed, topic-specific decomposition rather than a fixed lane
44
+ taxonomy. Lanes are designed per topic, not taken from a template.
45
+
46
+ **Scout (brainstorm scale only):** dispatch ONE cheap researcher (~10
47
+ searches, same `claude -p` command as step 3) to map the terrain: canonical
48
+ terminology, the 5–10 load-bearing systems/papers/repos, the named people,
49
+ which source classes look rich vs empty, and the topic's natural fault lines.
50
+ The scout returns a map, not findings — discovering the topic's actual
51
+ perspectives from sources substantially increased source diversity in STORM's
52
+ ablations. Skip the scout when you already know the terrain (comparisons,
53
+ fact-finds) — an upfront pass that tells you nothing new is pure latency.
54
+
55
+ **Design (you, from the scout report):** decompose into 3–6 sub-questions
56
+ along the topic's own fault lines — distinct perspectives, never keyword
57
+ variants of one query. For each lane pick the source-class tactics it needs
58
+ from `lanes.md` (academic snowballing, dependents-not-stars repo evidence,
59
+ production-grade pattern mining, general web, expert tracking) — one lane may
60
+ mix tactics; most topics don't need every source class. Scope each lane to
61
+ ≤5 subjects and give every lane an explicit search budget. Reserve **expert
62
+ opinion** as a second-wave lane: its roster (survey authors, maintainers,
63
+ recurring names) comes from the first wave's findings.
64
+
65
+ Review the lane set for overlap AND for gaps against the brief before
66
+ dispatch. State the plan in a few lines; proceed unless the user redirects.
67
+
68
+ ### 3. Fan out
69
+
70
+ One fresh researcher per lane, each launched as its own **background Bash tool
71
+ call** (`run_in_background`) — one call per lane, not a shell `&` loop (a `&`
72
+ launcher orphans the lanes and the harness reaps them all at once). Read-only by
73
+ toolset (`Read,Grep,Glob`) plus the web tools (`WebSearch,WebFetch`); the
74
+ report is the redirected stdout:
75
+
76
+ ```bash
77
+ claude -p --model claude-sonnet-4-6 \
78
+ --allowedTools 'Read,Grep,Glob,WebSearch,WebFetch' \
79
+ --max-turns 40 \
80
+ < build/research/<NN>-<lane>.prompt.md \
81
+ > build/research/<NN>-<lane>.md
82
+ ```
83
+
84
+ Write each lane block to a `.prompt.md` file and feed it on stdin — never as a
85
+ shell argument; a quote-mangling shell will corrupt a big block, the stdin
86
+ redirect injects it verbatim.
87
+
88
+ (Web comes from the built-in `WebSearch` and `WebFetch` tools — no extension or
89
+ key. With only the read-only + web tools on the allow list, every write/bash
90
+ call is denied, so the researcher can't touch the repo. Launch ONE canary lane
91
+ and confirm it actually fetches live URLs before fanning out. These lane blocks
92
+ also run verbatim as read-only Claude subagents with web search if you'd rather
93
+ keep research inside the architect's own session.)
94
+
95
+ Every lane block carries the full contract — objective, output format, source
96
+ guidance, boundaries — plus:
97
+
98
+ - **Search budget** by tier: simple 5, standard 15, deep 25 searches.
99
+ - **Saturation rule**: two consecutive searches yielding no new load-bearing
100
+ facts → return what you have.
101
+ - **Findings discipline**: every finding has URL + date + exact figure or
102
+ short quote + confidence tag (high = primary source / med = reputable
103
+ secondary / low = single blog or forum). NOT FOUND beats inference.
104
+ Disagreements between sources are reported, never resolved. No
105
+ recommendations — judgment is the orchestrator's.
106
+
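How the search budget and the saturation rule interact can be illustrated with a small simulation. This is only a sketch of the stopping logic: `searches_before_stop` is a hypothetical name, and the per-search counts of new load-bearing facts are supplied by hand here, whereas a real researcher judges them as it goes.

```shell
# searches_before_stop BUDGET COUNT...
# COUNT_i is the number of new load-bearing facts found by search i.
# Prints how many searches run before the lane stops, either because the
# budget is spent or because two consecutive searches found nothing new.
searches_before_stop() {
  budget=$1
  shift
  run=0
  dry=0
  for new in "$@"; do
    [ "$run" -lt "$budget" ] || break          # budget exhausted
    run=$((run + 1))
    if [ "$new" -eq 0 ]; then dry=$((dry + 1)); else dry=0; fi
    [ "$dry" -lt 2 ] || break                  # saturation rule
  done
  echo "$run"
}
```

For instance, with the standard budget of 15 and yields of `3 1 0 2 0 0 4`, the lane stops after the sixth search: the single dry search at position three resets, and the two dry searches at positions five and six trigger saturation.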
107
+ ### 4. Gap round (max 2 extra rounds, usually 1)
108
+
109
+ Read all findings. Score coverage against the brief: which sub-questions have
110
+ supported answers? Spawn targeted gap-fill researchers **only** for the
111
+ unanswered ones. This is also where the **expert-opinion lane** dispatches:
112
+ extract the expert roster from the first wave (survey authors, maintainers,
113
+ recurring names) and send the expert-opinion researcher after them. Hard stop after
114
+ two refinement rounds — past that you're chasing nonexistent information.
115
+
116
+ ### 5. Verify (your work, against raw sources)
117
+
118
+ - Extract the **load-bearing claims** — the facts the decision depends on.
119
+ - Require **≥2 independent sources** per load-bearing claim. Independent means
120
+ independent *origin* — two articles rewriting the same press release are one
121
+ source.
122
+ - Tag each: **VERIFIED** (≥2 independent agree) / **UNVERIFIED** (<2, no
123
+ contradiction) / **DISPUTED** (sources disagree — report both positions and
124
+ *why* they differ: date, method, definition) / **SUSPICIOUS** (contradicts
125
+ available evidence).
126
+ - **Adversarial pass** on the top claims: search "<claim> criticism",
127
+ "<X> problems", "<X> vs <alternative>" — actively try to falsify.
128
+ - **Citations are only URLs fetched this session.** Never cite from memory —
129
+ even search-grounded agents fabricate 3–13% of URLs. Spot-check the
130
+ load-bearing ones by fetching them yourself.
131
+ - **Recency discipline**: every quantitative or current-state claim carries a
132
+ source date; prefer the most recent authoritative treatment; date-restrict
133
+ searches on fast-moving topics. Anything that smells like training-data
134
+ leakage gets re-verified or cut.
135
+ - **Source hierarchy**: primary (papers, official docs, changelogs, first-party
136
+ engineering blogs) > reputable secondary > SEO listicles (pointers only,
137
+ never citations).
138
+ - **Opinion ≠ fact.** Expert opinions are reported as positions — quoted,
139
+ dated, conflict-of-interest flagged — and never count toward the ≥2-source
140
+ rule for factual claims. Expert *disagreements* are first-class findings:
141
+ they mark the genuinely open questions.
142
+
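The count-based part of the tagging rule above can be written down as a tiny decision function. This is a sketch with assumptions: `tag_claim` is a hypothetical name, the caller has already collapsed sources to independent origins, and expert opinions are excluded from both counts. SUSPICIOUS is deliberately left out because "contradicts available evidence" is a judgment the orchestrator makes, not a count.

```shell
# tag_claim AGREE DISAGREE
# AGREE    = independent sources (distinct origins) supporting the claim
# DISAGREE = independent sources contradicting it
# Prints VERIFIED, UNVERIFIED, or DISPUTED.
tag_claim() {
  agree=$1
  disagree=$2
  if [ "$disagree" -gt 0 ]; then
    echo DISPUTED        # sources disagree: report both positions and why
  elif [ "$agree" -ge 2 ]; then
    echo VERIFIED        # at least two independent sources agree
  else
    echo UNVERIFIED      # fewer than two sources, no contradiction
  fi
}
```

Two articles rewriting the same press release would be passed in as `tag_claim 1 0`, which yields UNVERIFIED.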
143
+ ### 6. Synthesize (one pass, one author — you)
144
+
145
+ Parallelize gathering, never synthesis. Write `build/research/<topic>-report.md`:
146
+
147
+ - **Answer first** (BLUF), then evidence, then method.
148
+ - The brief, restated.
149
+ - Per major finding: the claim + confidence tag + **what it implies for the
150
+ decision** + **what evidence would change this conclusion**.
151
+ - Disputes surfaced with both positions — never silently averaged.
152
+ - **Expert positions map**: who believes what (quoted, dated,
153
+ conflict-of-interest flagged), and where credible experts disagree.
154
+ - **Open questions**: each UNVERIFIED/DISPUTED item with the specific search
155
+ or experiment that would resolve it (this doubles as the next round's input).
156
+ - Citations dated and tier-labeled: `[primary, 2026-04]`.
157
+
158
+ Commit the report. Raw findings stay in `build/research/` (gitignored).
159
+
160
+ ### 7. Hand off
161
+
162
+ If this feeds the build loop: distill the report into the iteration's **Grounds**
163
+ section (`architecture/I<NN>-<name>.md`) per `/architect`, or into
164
+ `architecture/BRIEF.md` §sections when it is mission-scope, and continue there.
165
+ The builder's PHASE 0 will challenge Grounds' claims — that's a feature.