space-architect 1.2.0 → 1.3.0
- checksums.yaml +4 -4
- data/README.md +9 -0
- data/lib/space_architect/cli/architect.rb +26 -0
- data/lib/space_architect/skill_installer.rb +107 -0
- data/lib/space_architect/terminal.rb +15 -0
- data/lib/space_architect/version.rb +1 -1
- data/lib/space_architect.rb +1 -0
- data/skill/architect/SKILL.md +329 -0
- data/skill/architect/dispatch.md +308 -0
- data/skill/architect/research.md +89 -0
- data/skill/architect-research/SKILL.md +165 -0
- data/skill/architect-research/lanes.md +191 -0
- data/skill/architect-vocabulary/SKILL.md +141 -0
- data/vendor/repo-tender/lib/space_architect/pristine/cli.rb +1 -10
- data/vendor/repo-tender/lib/space_architect/pristine.rb +7 -0
- metadata +8 -1
@@ -0,0 +1,308 @@
# Builder dispatch reference

Verified against the `claude` CLI (Claude Code) headless mode, June 2026. The
builder is `claude -p` (`--print`, the non-interactive headless mode) pinned to
`claude-sonnet-4-6` — the *same binary the architect runs, one tier down*. Key
facts the skill encodes: lane-prompts go in on **stdin** (Claude Code has no
`@file`, and a big quoted lane-prompt as a shell argument gets mangled); the
model is pinned with `--model claude-sonnet-4-6` (the `sonnet` alias floats to
the latest Sonnet — pin the full id); there is **no `-C`/working-dir flag**, so
per-lane dispatch `cd`s into the worktree; permissions are the **tool
allow/deny lists** (`--allowedTools`/`--disallowedTools`) plus
`--permission-mode`, not a sandbox; web access is the built-in
`WebSearch`/`WebFetch` tools (no extension to install).

**The one load-bearing difference from the Codex design:** Codex's
`--sandbox workspace-write` made `.git` physically read-only. Claude Code has
**no automatic filesystem sandbox** in headless mode (the sandbox is opt-in via
settings, off by default), so `.git` is not hardware-protected. "Builders never
commit" (hard rule 7) is now enforced in three layers, weakest to strongest:
(1) a runtime first line — deny the git-write tools with
`--disallowedTools 'Bash(git commit:*)' …`; (2) worktree isolation between
lanes; (3) the authoritative check — an architect post-flight
`git -C <worktree> log <repo-base>..` that must be empty. The deny rules are
not airtight (a builder can shell out — `sh -c 'git commit …'` — past the
pattern match), so the post-flight `git log` is what the loop actually trusts.
If a lane committed, treat the worktree as tampered: reset and re-dispatch.

**Preflight (once per environment):** run `claude --version`, and confirm the
builder model resolves with a one-shot canary
(`echo ok | claude -p --model claude-sonnet-4-6 --max-turns 1`). No API key —
the builder runs on your Claude plan — but note headless `claude -p` draws on
the Agent SDK credit pool (separate from interactive usage since June 15 2026;
see `DESIGN.md` §4). On the first real dispatch in a new environment, launch
ONE canary lane and confirm it starts cleanly before fanning anything out.

## Canonical dispatch — `architect dispatch <iteration> <lane>`

The canonical path is `architect dispatch <iteration> <lane>`. The tool
assembles the canonical `claude -p` argv, pins the model to `claude-sonnet-4-6`,
reads the lane prompt from `build/<id>-<lane>/prompt.md` on stdin, and streams
`--output-format stream-json --verbose` output to
`build/<id>-<lane>/run.jsonl`. Run each lane as its own **background Bash tool
call** (`run_in_background`) so your turn doesn't block for the full multi-hour
run.

Write the lane's prompt to `build/<id>-<lane>/prompt.md` first (never pass a
big prompt as a shell argument — shells mangle quotes), then:

```bash
# single-lane iteration — run from the space root
architect dispatch <iteration> <lane>
```

For multi-lane iterations, create worktrees first (one per lane), then dispatch
each lane from its worktree:

```bash
# per lane:
architect worktree add <repo> <iteration> <lane> [--base <repo-base>]
architect dispatch <iteration> <lane>
```

`architect worktree add` creates `build/<id>-<lane>/wt` off the target repo's
base commit (a repo commit — distinct from the freeze, which is a space
commit), adds it with a `lane/<iteration>-<lane>` branch, and records it in
`space.yaml`.

Issue each dispatch as its **own background Bash tool call** — one call per
lane. Never use a shell `&` loop. A `for … & done` launcher is a *launcher*
process: it returns the instant it has spawned the lane children, the harness
reaps those now-orphaned `claude` processes, and every lane dies at once with no
`result` — partial diffs, no reports (this exact failure has happened: three
lanes killed at the same second, zero output). One blocking dispatch per
background Bash tool keeps each lane attached to a harness-tracked task that
survives the full multi-hour run and reports completion per lane.

### What the tool runs under the hood

`architect dispatch` is equivalent to this — documented here for transparency
and as the manual fallback:

```bash
# write prompt to build/<id>-<lane>/prompt.md first, then:
( cd build/<id>-<lane>/wt && \
  claude -p --model claude-sonnet-4-6 \
    --permission-mode acceptEdits \
    --allowedTools 'Read,Edit,Write,Grep,Glob,Bash,WebSearch,WebFetch' \
    --disallowedTools 'Bash(git commit:*),Bash(git push:*),Bash(git reset:*),Bash(git merge:*),Bash(git rebase:*),Bash(git checkout:*),Bash(git branch:*)' \
    --output-format stream-json --verbose \
    --max-turns 200 \
    < <space>/build/<id>-<lane>/prompt.md \
    > <space>/build/<id>-<lane>/run.jsonl 2>&1 )
```

`acceptEdits` auto-approves file writes; listing `Bash` in `--allowedTools`
auto-approves shell commands so the run never blocks on a prompt; any tool *not*
on the allow list is denied rather than prompted in `-p` mode (so the builder
can't wander outside its toolset), and the `--disallowedTools` deny rules win
over the allow list (deny always takes precedence) as the runtime first line
against commits. Redirect stderr (`2>&1`) into the run-log so a dispatch error
lands somewhere instead of vanishing.
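
The argv above could be assembled programmatically along these lines. This is a minimal sketch, not the gem's actual implementation: the flags and values are copied from the fallback command, while `builder_argv`, `ALLOWED_TOOLS`, and `DENIED_GIT_SUBCOMMANDS` are hypothetical names introduced for illustration.

```ruby
# Hypothetical helper: build the builder argv described in the text above.
# Flag names and values come from the documented fallback command; the method
# and constant names are illustrative only.
ALLOWED_TOOLS = %w[Read Edit Write Grep Glob Bash WebSearch WebFetch].freeze
DENIED_GIT_SUBCOMMANDS = %w[commit push reset merge rebase checkout branch].freeze

def builder_argv(model: "claude-sonnet-4-6", max_turns: 200)
  # One deny rule per git-write subcommand, e.g. "Bash(git commit:*)".
  deny = DENIED_GIT_SUBCOMMANDS.map { |sub| "Bash(git #{sub}:*)" }
  [
    "claude", "-p",
    "--model", model,
    "--permission-mode", "acceptEdits",
    "--allowedTools", ALLOWED_TOOLS.join(","),
    "--disallowedTools", deny.join(","),
    "--output-format", "stream-json", "--verbose",
    "--max-turns", max_turns.to_s
  ]
end

puts builder_argv.join(" ")
```

Keeping the argv as an array rather than a shell string avoids a second layer of quoting. A caller would still need to run it from the worktree, with the prompt file on stdin and stdout/stderr redirected to the run log, as the fallback command does.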

### Integration (architect-only, after per-lane post-flight passes)

You decide which lanes pass; the CLI does the git mechanics. Canonical path:

```bash
architect integrate <iteration> --lanes <passing-set> # e.g. --lanes lane-a,lane-b
architect gate <iteration> # integration smoke (raw output; verdict stays yours)
architect integrate <iteration> --lanes <passing-set> --teardown # or remove worktrees + lane branches after
```

`architect integrate` commits each named lane on its branch and merges it
`--no-ff` into the repo's `lane/<iteration>` integration branch, in order. It
**refuses** a lane that left builder commits or wrote out-of-bounds (the
mechanical post-flight checks), and aborts on a merge conflict. A merge conflict
= the lane plan wasn't disjoint = a spec defect: kill the conflicting lane and
re-spec; don't hand-resolve builder conflicts. It runs **no gates and makes no
verdict** — `architect gate` streams the raw gate output for you to judge.

Under the hood / manual fallback (one lane shown):

```bash
git -C repos/<repo> checkout -b lane/<iteration> <repo-base>
git -C build/<id>-<lane>/wt add -A
git -C build/<id>-<lane>/wt commit -m "lane <lane>: <what>"
git -C repos/<repo> merge --no-ff lane/<iteration>-<lane>
<run the gate commands> # integration smoke after every merge
architect worktree remove <iteration> <lane>
git -C repos/<repo> branch -d lane/<iteration>-<lane>
```

## Operating guidance

- Background each lane as its own harness task and let the **per-lane
  completion notification** bring you back (multi-hour runs are normal); read
  `build/<id>-<lane>/run.jsonl` and the repo state afterwards. Do not write a
  blocking `while pgrep …; sleep` wait loop as a Bash command — that is itself
  a launcher that ties up a turn. When you return to a lane, check liveness via
  run-log growth (the stall rules below still apply unchanged).
- Pin the model explicitly. The tool does this automatically (`--model
  claude-sonnet-4-6`). The `sonnet` alias floats to the latest Sonnet — fine
  interactively, but automations pin the full id so a model bump can't silently
  change builder behavior mid-project.
- Effort = thinking budget. Claude Code has no per-invocation effort flag the
  way Codex exposed `model_reasoning_effort`; the builder sets thinking depth
  **in the block** via the escalation keywords (`think` < `think hard` <
  `think harder` < `ultrathink`), or you floor it with the `MAX_THINKING_TOKENS`
  env var on the dispatch. Default unattended builder work to a high budget
  (open the block with "Think harder…"); downgrade a routine,
  tightly-specified lane to "think hard" (record which and why in the spec).
- **Builders never commit, and the architect verifies it.** Claude Code has no
  sandbox to make `.git` read-only, so this is enforced by the deny rules at
  dispatch *and* checked after the run: before integrating a lane, confirm
  `git -C build/<id>-<lane>/wt log <repo-base>..` is empty and
  `git -C build/<id>-<lane>/wt status` shows only files inside the lane's
  declared set. A commit or an out-of-bounds write fails the lane — reset and
  re-dispatch (lanes are cheap, hard rule 7).
- Same-iteration follow-up (e.g. answering PHASE 0 disagreements after the
  human rules): from the lane's worktree, `claude -p --continue "<rulings +
  proceed>"` resumes that worktree's most recent session with full context —
  sessions are scoped per directory, so `--continue` (`-c`) is deterministic
  even with parallel lanes. (Alternatively pin `--session-id <uuid>` at
  dispatch and resume with `--resume <uuid>`.) Resume the **same way you
  dispatch** — one background Bash tool call per lane, each a single blocking
  `claude -p --continue …`, never a `&` loop (a `&` launcher orphans the
  resumed lanes exactly as it does fresh ones). Never resume across iterations —
  every iteration gets a fresh context.
- Cross-model review gate (high-stakes iterations): the architect is Opus 4.8
  and the builder is Sonnet 4.6 — both Claude Code, so this is a
  cross-*tier* read inside one lab, not cross-vendor (see `DESIGN.md` R3). The
  architect (Opus) reading the diff is already the stronger-model fresh-context
  pass. For an extra adversarial pass, pipe the instruction + diff to a fresh
  read-only reviewer:
  ```bash
  { echo "Review this diff against the spec. Flag ONLY correctness/requirement/invariant gaps with file:line evidence. No style."; \
    git -C <repo-root> diff <base>...HEAD; } \
    | claude -p --model claude-sonnet-4-6 --allowedTools 'Read,Grep,Glob'
  ```
- `build/` is already gitignored by the space, so no extra `.gitignore` entry
  is needed. Scratch never reaches the space repo; only `architecture/` is
  committed.
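
The post-flight check from the "Builders never commit" bullet above can be sketched as a pure function over captured git output. This is an illustration under stated assumptions, not the tool's own check: `postflight_failures` is a hypothetical name, and it assumes the caller captured `git log --oneline <repo-base>..` and `git status --porcelain` (the `--oneline` and `--porcelain` options are additions here to make the output easy to parse).

```ruby
# Hypothetical post-flight check. Inputs are captured output from
#   git -C <worktree> log --oneline <repo-base>..
#   git -C <worktree> status --porcelain
# plus the lane's declared file set. Returns a list of failure strings;
# an empty list means the lane passes both mechanical checks.
def postflight_failures(log_output, status_output, declared_paths)
  failures = []

  # Any commit on top of the repo base means the builder committed.
  commits = log_output.lines.map(&:strip).reject(&:empty?)
  failures << "builder committed: #{commits.first}" unless commits.empty?

  status_output.each_line do |line|
    line = line.chomp
    next if line.strip.empty?
    # Porcelain v1 lines are two status characters, a space, then the path.
    # Renames read "old -> new"; keep the new path. Quoted paths (spaces,
    # special characters) are not handled in this sketch.
    path = line[3..-1].to_s.split(" -> ").last.to_s
    failures << "out of bounds: #{path}" unless declared_paths.include?(path)
  end

  failures
end
```

A lane would be integrated only when the returned list is empty. Untracked directories appear in porcelain output as a single `dir/` entry, so a real check would likely expand those before comparing.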

## Stall detection and rescue

A dispatched run is STALLED when its `run.jsonl`
(`build/<id>-<lane>/run.jsonl`) has not grown for 15+ minutes AND the last
event is an in-flight `Bash` tool call (a `tool_use` for `Bash` with no
matching `tool_result` yet). Silent gaps between events are normal model
thinking; a shell command that should take seconds sitting in flight for 15+
minutes is not.
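
The stall rule above combines two conditions, which a small sketch makes concrete. The event shape is an assumption: it supposes each `run.jsonl` line is a JSON object whose `message.content` array holds blocks such as `{"type":"tool_use","id":…,"name":"Bash"}` and `{"type":"tool_result","tool_use_id":…}`. Check a real run log before relying on it; `stalled?` and its helpers are hypothetical names.

```ruby
require "json"

STALL_SECONDS = 15 * 60

# Parse one log line; non-JSON lines (e.g. stderr merged in by 2>&1) are skipped.
def parse_event(raw)
  JSON.parse(raw)
rescue JSON::ParserError
  nil
end

# Content blocks of an event, under the ASSUMED shape described above.
def content_blocks(event)
  message = event.is_a?(Hash) ? event["message"] : nil
  content = message.is_a?(Hash) ? message["content"] : nil
  content.is_a?(Array) ? content.select { |b| b.is_a?(Hash) } : []
end

# True when the last event holds a Bash tool_use with no matching tool_result.
def in_flight_bash?(jsonl_lines)
  events = jsonl_lines.map { |raw| parse_event(raw) }.compact
  return false if events.empty?

  answered = events.flat_map { |e| content_blocks(e) }
                   .select { |b| b["type"] == "tool_result" }
                   .map { |b| b["tool_use_id"] }
  content_blocks(events.last).any? do |b|
    b["type"] == "tool_use" && b["name"] == "Bash" && !answered.include?(b["id"])
  end
end

# Both conditions must hold: no log growth for 15+ minutes AND a Bash call in flight.
def stalled?(jsonl_lines, last_write, now = Time.now)
  (now - last_write) >= STALL_SECONDS && in_flight_bash?(jsonl_lines)
end
```

A caller might pass `File.readlines(path)` and `File.mtime(path)` for the lane's run log. A quiet log whose last event is not an in-flight `Bash` call returns `false`, matching the note that silent gaps are normal model thinking.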

Diagnose before killing: find the command's child under the `claude` PID
(claude → shell → child). Hot-spinning (high CPU) or blocked (zero CPU and none
of its expected side effects on disk) — hung either way.

Kill the NARROWEST thing: the stuck child process, not the `claude` run. The
command returns a failure to the builder, which adapts with its full context
intact. Kill the whole run only when the builder re-enters the same hang or the
worktree is broken; then discard the lane and re-dispatch (hard rule 7).

Claude Code runs the `Bash` tool directly with no sandbox, so the Codex-era
sandbox-specific hang sources don't apply — but long-running and interactive
commands still hang an unattended run. Spec consequence: give every potentially
long command an explicit timeout in the lane-prompt (the `Bash` tool also takes
a per-call timeout), cap the run with `--max-turns` as a loop backstop, steer
builders toward the repo's existing test fixtures over hand-rolled long-running
harnesses, and when a gate needs a runtime that can't run unattended
(interactive prompts, servers without a timeout), have the builder record the
exact failure as a disagreement/blocker and verify what it can — gate verdicts
are architect-run anyway (hard rule 4). Write the gate file anticipating this.

## Manual alternative (human-driven)

Paste the lane-prompt into an interactive `claude` session (no `-p`). Claude
Code's agent loop runs plan→act→test against the block's stopping condition
while you watch and steer — approve tools as they come, or set `/permissions`
first. Use when the human wants to babysit a run.

## Lane-prompt template

```
Execute the architect spec below. Operating rules:

PHASE 0 — Before any code: reply with your plan and EVERY disagreement you have
with this spec, with reasons, citing real files in this repo. Silent compliance
is a failure. Silent scope additions are a failure. If you have no
disagreements, state what you checked before concluding the spec is sound.
Verify the named APIs/formats/versions against the live dependencies before
planning around them.

PHASE 1 — Treat the shared contracts (schemas/interfaces) named in the spec,
and the repo's existing public interfaces, as FROZEN: do not change them —
other lanes depend on them. You have no access to the space's architecture/
directory; the architect owns it. The ACCEPTANCE CRITERIA below are frozen —
verify your work against them; never weaken or work around them.

PHASE 2 — Build YOUR LANE ONLY: exactly the files listed in BOUNDARIES. You
are one of several parallel lane agents working in isolated worktrees; files
outside your lane belong to other agents — touching them fails your lane.
No placeholder implementations — search the codebase before implementing;
full implementations only. Write IDIOMATIC, house-consistent code: read the
neighbouring code and match its conventions (naming, guards, predicates,
error/persistence idioms, the language's expressive collection/enumerable
forms); well-factored and DRY-ish; terse but clear; pragmatic, not clever — the
smallest change that does the job, no abstraction it doesn't need. Consistency
with the surrounding code is part of correctness here: a change that works but
fights the house style (or introduces an inconsistency a careful reader of this
repo would never write) is not done. Tests stay simple and terse, exercising
public behavior, no mock/stub of the class under test. Verify your work by
running the acceptance criteria's gate commands and record the verbatim output. Do NOT commit and do NOT run any
git write command (commit/add/branch/reset/checkout) — the architect commits
and merges after verification, and verifies you made no commits. Do NOT delete
lock files or escalate privileges if a command fails; record the exact error
and continue. Give every potentially long command an explicit timeout; if a
runtime will not start unattended (interactive prompt, server with no timeout),
record the exact failure in your report and route around it — never busy-wait
or retry in a loop. When done, write your report to the scratch file given to
you, build/<id>-<lane>/report.md (an absolute path outside your worktree),
with RAW results only — tables, numbers, command output — no interpretation, no
"promising". Every status claim must be backed by a command result from this
run. Keep the report compact — tables and numbers, not prose. End it with
exactly one status line: STATUS: COMPLETE | COMPLETE_WITH_CONCERNS (list them)
| BLOCKED (exact blocker + what you tried). Verdicts belong to the architect
and the human. Persist until your lane is fully handled end-to-end; do not stop
at analysis or partial fixes.

=== OBJECTIVE (and why) ===
...

=== OUTPUT FORMAT ===
...

=== TOOL GUIDANCE (verification commands; verify-against-reality list) ===
...

=== BOUNDARIES (may touch / must not touch / out of scope) ===
...

=== DISAGREEMENT RULINGS (from last session) ===
...

=== ACCEPTANCE CRITERIA (frozen — the architect re-runs these to judge; verify
against them, do not edit or work around) ===
...
```

## Builder-side standing setup (one time per machine/repo)

- The builder is the same `claude` binary as the architect, one tier down —
  nothing extra to install. `architect dispatch` pins the model per dispatch
  (`--model claude-sonnet-4-6`); a `~/.claude/settings.json` `"model"` default
  is fine interactively, but automations pin it explicitly so a default can't
  silently swap the builder.
- Repo `CLAUDE.md` is the builder's standing context — Claude Code loads it
  root-down automatically. Put exact build/test commands and repo gotchas there;
  the loop's PHASE rules stay in the dispatch block so they version with the
  skill. (Claude Code does **not** auto-read `AGENTS.md`; if the repo keeps its
  build/test docs there, add `@AGENTS.md` to `CLAUDE.md` to pull it in.)
- The builder is a bare `claude -p` over the block — it is not invoking the
  `/architect` skills, the block is its entire instruction set. (`--bare` would
  give a leaner builder context but also drops `CLAUDE.md`/skills/hooks — keep
  `CLAUDE.md`, so skip `--bare` unless the repo has no standing build/test doc.)
- Billing: headless `claude -p` draws on the Agent SDK credit pool on your
  Claude plan (separate from interactive usage limits since June 15 2026).
  There's no per-window quota that dies mid-run the way a chat session can, but
  a long parallel fan-out does spend that pool. The architect runs as your
  interactive Claude Code session.
@@ -0,0 +1,89 @@
# Research fan-out reference

Read this only when a research trigger fires (see SKILL.md step 3). The
fan-out uses `claude -p` (Sonnet 4.6) as parallel web-research subagents —
read-only (no `Edit`/`Write`/`Bash`), built-in `WebSearch`/`WebFetch` — and
the architect keeps all judgment: it verifies the load-bearing claims and writes
the iteration's **Grounds** section itself.

## Fan out

Decompose the question into 3–5 narrow, NON-OVERLAPPING research questions.
Cover different angles, not the same angle five times — typical split:
official docs/reference, changelog/breaking changes, community failure reports,
alternatives/comparisons, security/operational constraints.

One fresh `claude -p` per question, each launched as its own **background Bash
tool call** (`run_in_background`) — one call per researcher, not a shell `&`
loop (a `&` launcher orphans the researchers and the harness reaps them all at
once; same trap as builder dispatch). The researcher toolset is read-only
(`Read,Grep,Glob`) plus the web tools (`WebSearch,WebFetch`) — no
`Edit`/`Write`/`Bash`, so it cannot touch the repo. Capture the final report
by redirecting stdout (the report *is* the text result in the default `text`
output format):

```bash
claude -p --model claude-sonnet-4-6 \
  --allowedTools 'Read,Grep,Glob,WebSearch,WebFetch' \
  --max-turns 40 \
  < build/research/<NN>-<topic>.prompt.md \
  > build/research/<NN>-<topic>.md
```

Write each research-prompt to a `.prompt.md` file and feed it on stdin, never
as a shell argument — a quote-mangling shell will corrupt a big prompt; the
stdin redirect injects it verbatim.

- Read-only by toolset: with only `Read,Grep,Glob,WebSearch,WebFetch` on the
  allow list and nothing else granted, any write/bash call is denied — the
  researcher can't touch the repo. Its report is the redirected stdout.
- Web comes from the built-in `WebSearch` and `WebFetch` tools — no extension
  or key. Launch ONE canary researcher and confirm it actually fetches live URLs
  before fanning out. Restrict domains with `WebFetch(domain:…)` allow rules in
  prompt-injection-sensitive repos.
- Thinking budget: keep research at a modest level (a plain block, or "think
  hard") — research is coverage work; deep thinking buys nothing here. Synthesis
  happens on the architect's side.
- Scope each researcher to ≤5 subjects and put hard context rules in the
  research-prompt (snippet over page; quote ≤2 sentences; stop the moment you
  can answer) — a researcher that fills its context window dies without emitting
  its report. Bisect and re-dispatch dead lanes; don't re-run as-is.

## Research-prompt template

```
You are a web research agent. Answer ONE question. Do not write code, do not
make recommendations — judgment belongs to the architect who reads your output.

QUESTION: <one narrow question>

OUTPUT FORMAT — a markdown report:
- Findings as bullets. EVERY finding carries: source URL, source date (if
  shown), the exact figure or a short direct quote, and a confidence tag
  (high = primary source / med = reputable secondary / low = single blog or
  forum post).
- Prefer primary sources (official docs, changelogs, release notes, source
  code) over blog posts. Record exact version numbers and dates.
- When sources disagree, report the disagreement — do not resolve it.
- If you cannot find evidence for something, write NOT FOUND — never infer or
  fill gaps from prior knowledge without flagging it as such.
- End with: the 2-3 findings most likely to change an implementation decision.
```

## Gather (architect — this is your work, not another agent's)

1. Read every findings file in `build/research/`.
2. Identify the **load-bearing claims** — facts the spec will depend on
   (an API shape, a version constraint, a limit, a deprecation). Adversarially
   verify each: cross-check against a second independent source or the live
   dependency itself. Discard single-source low-confidence claims or mark them
   as open questions.
3. Write the iteration's **Grounds** section
   (`architecture/I<NN>-<name>.md`): problem, decision + why, requirements,
   non-goals, verified facts **with citations**, open questions for the human.
   You write it — researchers gather, the architect judges and decides.
4. Commit Grounds (`I<NN>: grounds`). Raw findings stay in `build/research/`
   (gitignored) — only the distilled, cited Grounds section is repo memory.
5. The iteration's Specification cites Grounds instead of restating it; the
   builder's PHASE 0 is expected to challenge Grounds' claims like anything
   else.
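
The triage in Gather step 2 can be sketched as a first-pass sort of claims. This is an illustration only: the claim hash layout (`:text`, `:confidence`, `:urls`) and the `triage` and `host_of` names are hypothetical, and counting distinct hosts is a rough stand-in for source independence that does not replace the architect's own cross-check.

```ruby
require "uri"

# Host of a URL, or nil when the string does not parse as a URI.
def host_of(url)
  URI.parse(url).host
rescue URI::InvalidURIError
  nil
end

# Hypothetical first-pass triage for a load-bearing claim.
# claim = { text: String, confidence: "high" | "med" | "low", urls: [String] }
def triage(claim)
  hosts = claim[:urls].map { |u| host_of(u) }.compact.uniq

  # Two or more distinct hosts: worth verifying against the sources themselves.
  return :verified_candidate if hosts.size >= 2

  # Single source: low confidence is dropped or parked as an open question;
  # anything stronger still needs a second source or the live dependency.
  claim[:confidence] == "low" ? :discard_or_open_question : :needs_second_source
end
```

Two mirrors of the same press release would count as two hosts here, which is why the result is only a candidate and the adversarial verification stays with the architect.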
@@ -0,0 +1,165 @@
---
name: architect-research
description: >
  Discovery-scale research harness: a cheap scout researcher maps the topic,
  the orchestrator designs topic-specific parallel researcher lanes from the
  scout report (drawing on a source-class tactics library — academic, repos,
  production patterns, web, experts), then verifies claims against sources and
  synthesizes a decision-oriented report. Use when
  brainstorming a project or feature, choosing a technology, or asked to
  "research X", "what's the state of the art", "deep research". For narrow
  slice-level fact checks inside the build loop, /architect handles those inline.
---

# Architect Research

You are the research orchestrator. Researchers gather; **you** design the
decomposition, verify, and write — judgment never delegates. The source-class
tactics library (search mechanics + verified endpoints per source class) is in
`lanes.md` next to this file; read it when you design lanes.

## Scale before anything

- **Simple fact-find** → answer directly or 1 researcher (3–10 searches).
  Don't run a harness on a question one search answers.
- **Comparison / focused question** → 2–4 researchers on distinct
  perspectives, no scout — you already know the terrain.
- **Brainstorm / SOTA survey / technology choice** → scout first, then a
  designed fan-out of 4–6 researchers.

## Procedure

### 1. Scope → brief

If the question is ambiguous, ask at most 2–3 clarifying questions, then
compress everything into a **research brief**: the question, the decision it
informs, constraints, and what "answered" looks like. The brief is the north
star — every later step is checked against it, and it's restated at the top of
the final report so the reader can audit scope drift.

### 2. Scout, then design the lanes

The surveyed production deep-research systems and 4/5 leading OSS frameworks
use LLM-designed, topic-specific decomposition rather than a fixed lane
taxonomy. Lanes are designed per topic, not taken from a template.

**Scout (brainstorm scale only):** dispatch ONE cheap researcher (~10
searches, same `claude -p` command as step 3) to map the terrain: canonical
terminology, the 5–10 load-bearing systems/papers/repos, the named people,
which source classes look rich vs empty, and the topic's natural fault lines.
The scout returns a map, not findings — discovering the topic's actual
perspectives from sources substantially increased source diversity in STORM's
ablations. Skip the scout when you already know the terrain (comparisons,
fact-finds) — an upfront pass that tells you nothing new is pure latency.

**Design (you, from the scout report):** decompose into 3–6 sub-questions
along the topic's own fault lines — distinct perspectives, never keyword
variants of one query. For each lane pick the source-class tactics it needs
from `lanes.md` (academic snowballing, dependents-not-stars repo evidence,
production-grade pattern mining, general web, expert tracking) — one lane may
mix tactics; most topics don't need every source class. Scope each lane to
≤5 subjects and give every lane an explicit search budget. Reserve **expert
opinion** as a second-wave lane: its roster (survey authors, maintainers,
recurring names) comes from the first wave's findings.

Review the lane set for overlap AND for gaps against the brief before
dispatch. State the plan in a few lines; proceed unless the user redirects.

### 3. Fan out

One fresh researcher per lane, each launched as its own **background Bash tool
call** (`run_in_background`) — one call per lane, not a shell `&` loop (a `&`
launcher orphans the lanes and the harness reaps them all at once). Read-only by
toolset (`Read,Grep,Glob`) plus the web tools (`WebSearch,WebFetch`); the
report is the redirected stdout:

```bash
claude -p --model claude-sonnet-4-6 \
  --allowedTools 'Read,Grep,Glob,WebSearch,WebFetch' \
  --max-turns 40 \
  < build/research/<NN>-<lane>.prompt.md \
  > build/research/<NN>-<lane>.md
```

Write each lane block to a `.prompt.md` file and feed it on stdin — never as a
shell argument; a quote-mangling shell will corrupt a big block, the stdin
redirect injects it verbatim.

(Web comes from the built-in `WebSearch` and `WebFetch` tools — no extension or
key. With only the read-only + web tools on the allow list, every write/bash
call is denied, so the researcher can't touch the repo. Launch ONE canary lane
and confirm it actually fetches live URLs before fanning out. These lane blocks
also run verbatim as read-only Claude subagents with web search if you'd rather
keep research inside the architect's own session.)

Every lane block carries the full contract — objective, output format, source
guidance, boundaries — plus:

- **Search budget** by tier: simple 5, standard 15, deep 25 searches.
- **Saturation rule**: two consecutive searches yielding no new load-bearing
  facts → return what you have.
- **Findings discipline**: every finding has URL + date + exact figure or
  short quote + confidence tag (high = primary source / med = reputable
  secondary / low = single blog or forum). NOT FOUND beats inference.
  Disagreements between sources are reported, never resolved. No
  recommendations — judgment is the orchestrator's.
|
|
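
The search budget and saturation rule together define a stop condition. A minimal sketch of that condition, using the tier budgets listed above; the `LaneProgress` class and its method names are hypothetical:

```ruby
# Budgets are the tiers listed in the lane contract.
SEARCH_BUDGET = { simple: 5, standard: 15, deep: 25 }.freeze

# Hypothetical tracker for one lane's searches.
class LaneProgress
  attr_reader :searches

  def initialize(tier)
    @budget = SEARCH_BUDGET.fetch(tier)
    @searches = 0
    @dry_streak = 0 # consecutive searches with no new load-bearing facts
  end

  # Record one search and how many new load-bearing facts it produced.
  def record(new_facts)
    @searches += 1
    @dry_streak = new_facts.zero? ? @dry_streak + 1 : 0
  end

  # Stop when the budget is spent or two searches in a row came back dry.
  def stop?
    @searches >= @budget || @dry_streak >= 2
  end
end
```

Whether a fact counts as load-bearing remains the researcher's judgment; the tracker only does the counting.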
106
|
+
|
|
107
|
+
### 4. Gap round (max 2 extra rounds, usually 1)
|
|
108
|
+
|
|
109
|
+
Read all findings. Score coverage against the brief: which sub-questions have
|
|
110
|
+
supported answers? Spawn targeted gap-fill researchers **only** for the
|
|
111
|
+
unanswered ones. This is also where the **expert-opinion lane** dispatches:
|
|
112
|
+
extract the expert roster from the first wave (survey authors, maintainers,
|
|
113
|
+
recurring names) and send the lane-6 researcher after them. Hard stop after
|
|
114
|
+
two refinement rounds — past that you're chasing nonexistent information.
|
|
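
The gap round reduces to picking out the sub-questions with no supported answer and honoring the hard stop. A sketch under the assumption that coverage is tracked as a count of supported answers per sub-question; `gap_fill_targets` and that representation are hypothetical:

```ruby
MAX_GAP_ROUNDS = 2 # hard stop stated in the gap-round step

# coverage maps each sub-question to the number of supported answers found.
# Returns the sub-questions that still need a gap-fill researcher, or an
# empty list once the round limit is reached.
def gap_fill_targets(coverage, rounds_done)
  return [] if rounds_done >= MAX_GAP_ROUNDS
  coverage.select { |_question, supported| supported.zero? }.keys
end
```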
115
|
+
|
|
116
|
+
### 5. Verify (your work, against raw sources)
|
|
117
|
+
|
|
118
|
+
- Extract the **load-bearing claims** — the facts the decision depends on.
|
|
119
|
+
- Require **≥2 independent sources** per load-bearing claim. Independent means
|
|
120
|
+
independent *origin* — two articles rewriting the same press release are one
|
|
121
|
+
source.
|
|
122
|
+
- Tag each: **VERIFIED** (≥2 independent agree) / **UNVERIFIED** (<2, no
|
|
123
|
+
contradiction) / **DISPUTED** (sources disagree — report both positions and
|
|
124
|
+
*why* they differ: date, method, definition) / **SUSPICIOUS** (contradicts
|
|
125
|
+
available evidence).
|
|
126
|
+
- **Adversarial pass** on the top claims: search "<claim> criticism",
|
|
127
|
+
"<X> problems", "<X> vs <alternative>" — actively try to falsify.
|
|
128
|
+
- **Citations are only URLs fetched this session.** Never cite from memory —
|
|
129
|
+
even search-grounded agents fabricate 3–13% of URLs. Spot-check the
|
|
130
|
+
load-bearing ones by fetching them yourself.
|
|
131
|
+
- **Recency discipline**: every quantitative or current-state claim carries a
|
|
132
|
+
source date; prefer the most recent authoritative treatment; date-restrict
|
|
133
|
+
searches on fast-moving topics. Anything that smells like training-data
|
|
134
|
+
leakage gets re-verified or cut.
|
|
135
|
+
- **Source hierarchy**: primary (papers, official docs, changelogs, first-party
|
|
136
|
+
engineering blogs) > reputable secondary > SEO listicles (pointers only,
|
|
137
|
+
never citations).
|
|
138
|
+
- **Opinion ≠ fact.** Expert opinions are reported as positions — quoted,
|
|
139
|
+
dated, conflict-of-interest flagged — and never count toward the ≥2-source
|
|
140
|
+
rule for factual claims. Expert *disagreements* are first-class findings:
|
|
141
|
+
they mark the genuinely open questions.
|
|
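
The four tags above can be read as a small decision rule over independent origins. A minimal sketch; the `Evidence` record and `tag_claim` are hypothetical, and treating "contradiction with no support" as SUSPICIOUS is one interpretation of the definitions, not something the skill specifies. Expert opinions are assumed to be filtered out before the call, since they never count toward the source rule:

```ruby
# Hypothetical evidence record: origin identifies where the information
# first came from (e.g. one press release); stance is :supports or :contradicts.
Evidence = Struct.new(:url, :origin, :stance, keyword_init: true)

# Tags one load-bearing claim from its factual evidence.
def tag_claim(evidence)
  # Count distinct origins, so two rewrites of one press release are one source.
  supporting = evidence.select { |e| e.stance == :supports }.map(&:origin).uniq
  contradicting = evidence.select { |e| e.stance == :contradicts }.map(&:origin).uniq

  if supporting.empty? && !contradicting.empty?
    :suspicious # assumption: only contradicting evidence is available
  elsif !supporting.empty? && !contradicting.empty?
    :disputed   # report both positions and why they differ
  elsif supporting.size >= 2
    :verified
  else
    :unverified # fewer than two independent origins, no contradiction
  end
end
```

Deciding that two URLs share an origin is the hard part and stays manual; the function only applies the rule once origins are assigned.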
142
|
+
|
|
143
|
+
### 6. Synthesize (one pass, one author — you)
|
|
144
|
+
|
|
145
|
+
Parallelize gathering, never synthesis. Write `build/research/<topic>-report.md`:
|
|
146
|
+
|
|
147
|
+
- **Answer first** (BLUF), then evidence, then method.
|
|
148
|
+
- The brief, restated.
|
|
149
|
+
- Per major finding: the claim + confidence tag + **what it implies for the
|
|
150
|
+
decision** + **what evidence would change this conclusion**.
|
|
151
|
+
- Disputes surfaced with both positions — never silently averaged.
|
|
152
|
+
- **Expert positions map**: who believes what (quoted, dated,
|
|
153
|
+
conflict-of-interest flagged), and where credible experts disagree.
|
|
154
|
+
- **Open questions**: each UNVERIFIED/DISPUTED item with the specific search
|
|
155
|
+
or experiment that would resolve it (this doubles as the next round's input).
|
|
156
|
+
- Citations dated and tier-labeled: `[primary, 2026-04]`.
|
|
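
The citation label format is simple enough to generate rather than type. A sketch, assuming only the two citable tiers from the source hierarchy (listicles are pointers, never citations); `citation_label` and the tier symbols are hypothetical:

```ruby
require "date"

# Assumption: only these two tiers may appear in a citation.
CITABLE_TIERS = %i[primary secondary].freeze

# Renders a tier-and-month label such as "[primary, 2026-04]".
def citation_label(tier, date)
  raise ArgumentError, "not a citable tier: #{tier}" unless CITABLE_TIERS.include?(tier)
  "[#{tier}, #{date.strftime('%Y-%m')}]"
end
```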
157
|
+
|
|
158
|
+
Commit the report. Raw findings stay in `build/research/` (gitignored).
|
|
159
|
+
|
|
160
|
+
### 7. Hand off
|
|
161
|
+
|
|
162
|
+
If this feeds the build loop: distill the report into the iteration's **Grounds**
|
|
163
|
+
section (`architecture/I<NN>-<name>.md`) per `/architect`, or into
|
|
164
|
+
`architecture/BRIEF.md` §sections when the research is mission-scope, and continue there.
|
|
165
|
+
The builder's PHASE 0 will challenge Grounds' claims — that's a feature.
|