clideck 1.30.5 → 1.30.7

@@ -1,224 +1,411 @@
1
1
  ---
2
2
  name: research-experiment
3
- description: Coordinate autonomous research experiments across multiple coding agents using isolated git worktrees. Use when a user wants the main agent to define a goal, constraints, acceptance criteria, and experiment boundaries, then dispatch Codex/Claude/Gemini or other agents to independently search for solutions without touching production code.
3
+ description: Set up and coordinate parallel autonomous research across multiple agents. Use when a user wants a manager agent to interview for requirements, create the research package, prepare isolated researcher workspaces, and dispatch independent researchers to explore the problem without repeated check-ins.
4
4
  ---
5
5
 
6
6
  # Research Experiment
7
7
 
8
- Use this skill to run parallel, autonomous experiments safely.
8
+ Use this skill to run parallel autonomous research with one manager and multiple researchers.
9
9
 
10
- There are two roles:
10
+ This skill is generic. It is for:
11
11
 
12
- - Main Agent: owns the research brief, workspace setup, researcher prompts, verification, and merge decision.
13
- - Experiment Agent: owns independent exploration inside one assigned worktree and must not redefine the experiment.
12
+ - code optimization
13
+ - implementation experiments
14
+ - product or workflow discovery
15
+ - marketing or advertising research
16
+ - strategy comparison
17
+ - scientific or technical literature review
18
+ - invention and concept exploration
14
19
 
15
- If the user says "you are the main agent", follow the Main Agent Role. If the user says "you are an experiment agent" or "researcher agent", follow the Experiment Agent Role.
20
+ The core model is:
16
21
 
17
- ## Main Agent Role
22
+ - The **manager** interviews the user, defines the experiment, prepares the workspace, writes the researcher prompt files, and later synthesizes results.
23
+ - Each **researcher** works independently inside an assigned workspace and prompt file, explores multiple approaches, and produces a final findings document.
18
24
 
19
- The main agent defines the experiment and coordinates researchers. It must not let each researcher invent different goals or acceptance criteria.
25
+ If the user says "you are the main agent" or "you are the manager", follow the Manager Role.
20
26
 
21
- ## Main Agent Workflow
27
+ If the user says "you are a researcher" or "experiment agent", follow the Researcher Role.
22
28
 
23
- 1. Convert the user request into an experiment brief:
24
- - Goal: the outcome to achieve.
25
- - Acceptance criteria: exact tests, benchmarks, or review gates.
26
- - Hard constraints: what must not change.
27
- - Quality bar: what counts as meaningful progress versus noise.
28
- - Shared resources: ports, GPUs, model caches, services, datasets, credentials, or external APIs.
29
- - Stop conditions: when researchers may quit.
29
+ ## First Check: Can This Run Autonomously?
30
30
 
31
- 2. Identify the production repository root:
32
- - Use `git rev-parse --show-toplevel` when inside a git repo.
33
- - If the project has nested repos, identify which repo owns the files under experiment.
34
- - Do not assume the current working directory is the repo root.
31
+ Before doing substantial work, verify that autonomy is actually possible.
35
32
 
36
- 3. Create per-researcher worktrees inside the project folder, not beside it:
37
- - Prefer a project-local directory such as `<project>/.research-worktrees/<slug>-<n>`.
38
- - Keep worktree directories inside the main project folder so agents do not need extra filesystem permissions.
39
- - Do not create sibling worktrees such as `../project_research_1` unless the user explicitly asks.
40
- - Never point a researcher at the production checkout for edits.
41
- - Use unique branches, for example `research/<slug>-1`, `research/<slug>-2`.
33
+ If the environment will force routine approval prompts for normal work, stop and say so plainly. Do not pretend the experiment is autonomous if the manager or researchers will keep pausing for permission.
42
34
 
43
- 4. Give every researcher the same goal and rules:
44
- - Do not assign fixed technical roles unless the user explicitly asks.
45
- - Let each researcher decide the approach and iterate independently.
46
- - Include the exact worktree path, branch, allowed edit scope, forbidden files, verification commands, and report format.
47
- - Save the canonical brief to the experiment folder before dispatching researchers.
35
+ ## Core Rules
48
36
 
49
- 5. Verify centrally:
50
- - Researchers may run local checks in their worktree, but the main agent owns authoritative acceptance verification.
51
- - If benchmarks contend for scarce resources, run final benchmarks sequentially from the main agent.
52
- - Merge nothing unless it passes acceptance criteria and clears the quality bar.
37
+ - The manager defines the problem, constraints, and evaluation. It does not prescribe the answer.
38
+ - Researchers must not redefine the goal, constraints, or evidence standard.
39
+ - Researchers should not read each other's work while the experiment is running unless the user explicitly asks for collaboration.
40
+ - The manager should not bias researchers toward one preferred solution path unless the user explicitly wants that.
41
+ - The goal is independent exploration, not consensus-by-default.
53
42
 
54
- ## Experiment Folder
43
+ ## Manager Role
55
44
 
56
- The main agent should create one experiment folder under the main project, for example:
45
+ The manager owns setup quality. A bad setup poisons the whole experiment.
57
46
 
58
- ```text
59
- .research-worktrees/<experiment-slug>/
60
- ```
47
+ If the researchers are not pushed toward deep questioning and independent exploration, the experiment will waste time and tokens on obvious, shallow paths.
61
48
 
62
- Inside it, create:
49
+ The manager must keep querying the user until all critical setup fields are clear enough to run safely and usefully.
63
50
 
64
- - `EXPERIMENT.md`: the canonical brief, baseline, quality gate, constraints, assignments, commands, and report format.
65
- - `<slug>-experiment-1/`: worktree for researcher 1.
66
- - `<slug>-experiment-2/`: worktree for researcher 2.
67
- - `<slug>-experiment-3/`: worktree for researcher 3.
68
- - `<slug>-experiment-1/LOG.md`: progress log for researcher 1.
69
- - `<slug>-experiment-2/LOG.md`: progress log for researcher 2.
70
- - `<slug>-experiment-3/LOG.md`: progress log for researcher 3.
51
+ Do not start researcher setup just because you have a rough idea. Start only when the experiment brief is decision-grade.
71
52
 
72
- Researcher prompts should tell agents to read `EXPERIMENT.md` first and then follow only their assigned workspace, branch, log file, and resource values.
53
+ ## Manager Intake Interview
73
54
 
74
- The main agent may check each researcher log during the experiment to monitor progress without interrupting researchers. Researcher agents should append concise entries after each meaningful experiment loop:
55
+ Before creating the experiment package, gather or confirm:
75
56
 
76
- - Hypothesis tried.
77
- - Files changed.
78
- - Command run.
79
- - Result versus same-session baseline.
80
- - Keep/reject decision.
81
- - Current blocker, if any.
57
+ - Objective: what are we trying to discover, optimize, compare, prove, design, or explain?
58
+ - Decision to support: what user decision will this research inform?
59
+ - Output type: what should the final answer look like?
60
+ - Success criteria: what makes a useful result?
61
+ - Evidence standard: benchmark, rubric, citations, reasoning, expert judgment, human review, or another standard.
62
+ - Constraints: budget, time, ethics, legal boundaries, brand rules, forbidden actions, forbidden files, safety limits.
63
+ - Allowed tools and sources: codebase only, web research, papers, datasets, APIs, interviews, etc.
64
+ - Domain context: what background is necessary before work starts?
65
+ - Researcher count: how many researchers should run?
66
+ - Workspace model: code workspace, isolated git worktree, document workspace, or another isolated setup.
67
+ - Stop conditions: when should a researcher conclude success, failure, or exhaustion?
82
68
 
83
- ## Worktree Setup
69
+ If any of those would materially change the experiment, keep asking.
84
70
 
85
- Use project-local worktrees. The worktree directory must live under the main project folder:
71
+ ## Pick The Research Mode
86
72
 
87
- ```bash
88
- mkdir -p .research-worktrees
89
- mkdir -p .research-worktrees/<experiment-slug>
90
- git worktree add .research-worktrees/<experiment-slug>/<slug>-experiment-1 -b research/<slug>-1
91
- git worktree add .research-worktrees/<experiment-slug>/<slug>-experiment-2 -b research/<slug>-2
92
- git worktree add .research-worktrees/<experiment-slug>/<slug>-experiment-3 -b research/<slug>-3
73
+ Choose the mode that matches the work:
74
+
75
+ ### 1. Code Research Mode
76
+
77
+ Use when researchers will change code, run commands, benchmark, or inspect a repository.
78
+
79
+ Preferred isolation:
80
+
81
+ - isolated git worktrees inside the main project folder
82
+ - or isolated copies if worktrees are unavailable
83
+
84
+ ### 2. Knowledge Research Mode
85
+
86
+ Use when the work is primarily reading, comparing, synthesizing, writing, or ideating.
87
+
88
+ Preferred isolation:
89
+
90
+ - per-researcher document workspaces
91
+ - private notes and findings files
92
+ - source and citation tracking if external research is allowed
93
+
94
+ Git worktrees are optional here, not required.
95
+
96
+ ## Standard Experiment Package
97
+
98
+ Create one experiment folder under the project or working directory, for example:
99
+
100
+ ```text
101
+ .research/<experiment-slug>/
93
102
  ```
94
103
 
95
- If a branch already exists, choose a new suffix. Do not delete or overwrite existing worktrees unless the user explicitly asks.
104
+ Inside it, create:
105
+
106
+ - `EXPERIMENT.md`: canonical experiment brief
107
+ - `MANAGER.md`: manager notes, open questions, and round control
108
+ - `SYNTHESIS.md`: manager's final synthesis target
109
+ - `researcher-1/PROMPT.md`
110
+ - `researcher-1/LOG.md`
111
+ - `researcher-1/FINDINGS.md`
112
+ - `researcher-1/IDEAS.md`
113
+ - `researcher-2/...`
114
+ - `researcher-3/...`
115
+
116
+ Each researcher folder may also include a private notes or workspace folder if useful.
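As an illustration only (this is not part of the released package and is not the interface of `init-research-layout.mjs`), the package layout listed above could be sketched in Python roughly as follows. The function name `create_layout` and its signature are hypothetical; only the file and folder names are taken from the listing.

```python
from pathlib import Path
import tempfile

# File names come from the package listing above. Everything else here
# (function name, arguments, return value) is a hypothetical sketch.
TOP_LEVEL_FILES = ["EXPERIMENT.md", "MANAGER.md", "SYNTHESIS.md"]
RESEARCHER_FILES = ["PROMPT.md", "LOG.md", "FINDINGS.md", "IDEAS.md"]


def create_layout(root: Path, researcher_count: int) -> list[Path]:
    """Create the experiment package under root and return every file path."""
    created = []
    root.mkdir(parents=True, exist_ok=True)
    for name in TOP_LEVEL_FILES:
        path = root / name
        path.touch(exist_ok=True)  # leaves existing file contents untouched
        created.append(path)
    for n in range(1, researcher_count + 1):
        folder = root / f"researcher-{n}"
        folder.mkdir(exist_ok=True)
        for name in RESEARCHER_FILES:
            path = folder / name
            path.touch(exist_ok=True)
            created.append(path)
    return created
```

Calling `create_layout(Path(".research/my-experiment"), 3)` would produce three top-level files plus four files in each of three researcher folders. Using `touch(exist_ok=True)` is a deliberate choice so that re-running the sketch never overwrites notes a researcher has already written.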
96
117
 
97
- When the target files live in a nested repo, run the worktree commands from that nested repo root but still put the worktree folders under the main project folder. Example:
118
+ Use the scaffold script if helpful:
98
119
 
99
120
  ```bash
100
- cd path/to/nested/repo
101
- mkdir -p /absolute/path/to/main-project/.research-worktrees
102
- mkdir -p /absolute/path/to/main-project/.research-worktrees/<experiment-slug>
103
- git worktree add /absolute/path/to/main-project/.research-worktrees/<experiment-slug>/<slug>-experiment-1 -b research/<slug>-1
121
+ node skills/research-experiment/scripts/init-research-layout.mjs .research/<experiment-slug> 3 knowledge
104
122
  ```
105
123
 
106
- ## Baseline And Quality Gate
124
+ Replace `knowledge` with `code` when preparing a code experiment.
107
125
 
108
- Before changing code, each researcher must run the exact benchmark or check command once from their assigned worktree and record it as their same-session baseline.
126
+ ## Workspace Rules
109
127
 
110
- Compare final results against:
128
+ Every researcher must have an isolated assigned workspace.
111
129
 
112
- - The user/main-agent supplied baseline.
113
- - The researcher’s same-session baseline.
130
+ Examples:
114
131
 
115
- The main agent/user defines the quality gate for each experiment. Examples:
132
+ - Code mode: `.research-worktrees/<experiment-slug>/researcher-1/`
133
+ - Knowledge mode: `.research/<experiment-slug>/researcher-1/`
116
134
 
117
- - Test suite must pass.
118
- - Benchmark score must not regress.
119
- - ASR must recover at least N words from generated TTS.
120
- - Human listening check required.
135
+ For code mode:
121
136
 
122
- Do not invent or weaken the quality gate. If the gate is unclear, ask the main agent before accepting a result.
137
+ - never point researchers at the production checkout for edits
138
+ - prefer project-local worktrees rather than sibling folders
139
+ - use unique branches such as `research/<slug>-1`, `research/<slug>-2`
123
140
 
124
- ## Experiment Agent Role
141
+ For knowledge mode:
125
142
 
126
- The experiment agent executes the fixed brief from the main agent. It uses this skill for discipline and workflow only.
143
+ - give each researcher a private workspace for notes, drafts, and source tracking
144
+ - keep their working files separate until the manager collects final findings
127
145
 
128
- Follow only your assigned researcher section from the experiment brief. Do not edit outside your assigned worktree.
129
- Create any temporary files, profiling scripts, generated outputs, scratch notes, and helper artifacts inside your assigned worktree. Do not use `/tmp`, `/var/tmp`, home-directory scratch folders, or sibling project folders unless the brief explicitly allows it.
146
+ ## Canonical Brief Requirements
130
147
 
131
- The experiment agent must not redefine:
148
+ `EXPERIMENT.md` must include:
132
149
 
133
- - Goal.
134
- - Constraints.
135
- - Quality gate.
136
- - Benchmark commands.
137
- - Acceptance criteria.
138
- - Assigned workspace or branch.
150
+ - objective
151
+ - decision this research supports
152
+ - mode: `code` or `knowledge`
153
+ - success criteria
154
+ - evidence standard
155
+ - constraints
156
+ - allowed tools and sources
157
+ - domain context
158
+ - shared resources
159
+ - stop conditions
160
+ - required final output format
161
+ - researcher independence rules
139
162
 
140
- If any of those are missing or ambiguous, ask the main agent for clarification before accepting a result. Do not make up a weaker gate.
163
+ The brief should define the problem clearly without embedding a preferred answer.
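One way a manager could check a draft brief against the required list above is sketched below. It is not part of the released package. It assumes each required item appears in `EXPERIMENT.md` as a markdown heading with exactly that name; the skill text does not prescribe a file format, so that convention and the function name `missing_brief_fields` are assumptions.

```python
# Field names are copied from the "EXPERIMENT.md must include" list above.
REQUIRED_BRIEF_FIELDS = [
    "objective",
    "decision this research supports",
    "mode",
    "success criteria",
    "evidence standard",
    "constraints",
    "allowed tools and sources",
    "domain context",
    "shared resources",
    "stop conditions",
    "required final output format",
    "researcher independence rules",
]


def missing_brief_fields(brief_text: str) -> list[str]:
    """Return the required fields that have no matching markdown heading."""
    # Assumption: one heading per field, compared case-insensitively.
    headings = {
        line.lstrip("#").strip().lower()
        for line in brief_text.splitlines()
        if line.startswith("#")
    }
    return [field for field in REQUIRED_BRIEF_FIELDS if field not in headings]
```

An empty result only shows that every heading exists. Whether each section is clear enough to act on, and free of a preferred answer, still needs the manager's judgment.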
141
164
 
142
- ## Experiment Agent Autonomy
165
+ ## Researcher Prompt Files
143
166
 
144
- DO NOT STOP EXPERIMENTING UNLESS YOU ACHIEVED THE GOALS OR ABSOLUTELY NECESSARY.
167
+ The manager must write one self-contained `PROMPT.md` per researcher.
145
168
 
146
- - Do NOT ask the user if you should continue.
147
- - The user may be away from the computer and expects the experiment to continue until you achieve the experiment goals, so keep working until the task is naturally complete.
148
- - You are autonomous. If you are unsure how to proceed, re-read the skill, goals, context, think differently, try different innovative approaches, and continue.
149
- - Stop only if something out of your control blocks you from continuing. Otherwise continue experimenting until goals are achieved or the useful paths are exhausted.
169
+ Each prompt file must be usable cold. The researcher should be able to open it in a fresh session and work without hidden context.
150
170
 
151
- ## Researcher Prompt Template
171
+ Prefer absolute paths in researcher prompt files whenever the user will launch researchers manually in separate sessions. Relative paths are acceptable only when the working directory is guaranteed.
152
172
 
153
- Include this block, or an equivalent adapted version, in every researcher prompt:
173
+ Each prompt must include:
154
174
 
155
- ```text
156
- You are an autonomous research agent for this experiment.
157
-
158
- Goal:
159
- <goal>
160
-
161
- Experiment rules and boundaries:
162
- <rules>
163
-
164
- Assigned workspace:
165
- <absolute path to your worktree>
166
-
167
- Canonical experiment brief:
168
- <absolute path to EXPERIMENT.md>
169
-
170
- Assigned experiment log:
171
- <absolute path to your LOG.md>
172
-
173
- You must work only inside your assigned worktree unless the brief explicitly allows another path. Do not edit, overwrite, or revert files in the production checkout. Do not change benchmark scoring, test fixtures, or acceptance criteria unless the brief explicitly asks for that.
174
- Read the canonical experiment brief before making changes. Follow only your assigned researcher section. Do not redefine the goal, constraints, quality gate, benchmark commands, acceptance criteria, workspace, branch, port, log path, or resource assignments. Do not edit outside your assigned worktree.
175
- If you need temp folders, profiling scripts, scratch files, benchmark outputs, or helper artifacts, create them inside your assigned worktree. Do not use `/tmp`, `/var/tmp`, home-directory scratch folders, or sibling project folders unless the brief explicitly allows it.
176
-
177
- DO NOT STOP EXPERIMENTING UNLESS YOU ACHIEVED THE GOALS OR ABSOLUTELY NECESSARY.
178
- - Do NOT ask the user if you should continue.
179
- - The user may be away from the computer and expects the experiment to continue until you achieve the experiment goals, so keep working until the task is naturally complete.
180
- - You are autonomous. If you are unsure how to proceed, re-read the skill, goals, context, think differently, try different innovative approaches, and continue.
181
- - Stop only if something out of your control blocks you from continuing. Otherwise continue experimenting until goals are achieved or useful paths are exhausted.
182
-
183
- Loop:
184
- 1. Inspect the code and constraints.
185
- 2. Form hypotheses.
186
- 3. Try a small, explainable experiment.
187
- 4. Run relevant verification.
188
- 5. Keep, revise, or reject the attempt.
189
- 6. Repeat until the goal is achieved, useful paths are exhausted, or an external blocker prevents progress.
190
-
191
- Report format:
192
- - Worktree path and branch.
193
- - Changed files.
194
- - Commands run and exact relevant output.
195
- - Results against acceptance criteria.
196
- - Failed attempts and why they were rejected.
197
- - Whether you recommend merging the patch.
198
- - Any cleanup needed: running servers, ports, PIDs, temp files.
199
-
200
- Also append concise progress entries to your assigned experiment log after each meaningful experiment loop.
201
- ```
175
+ - experiment goal
176
+ - decision being supported
177
+ - assigned workspace path
178
+ - paths to `EXPERIMENT.md`, `LOG.md`, `FINDINGS.md`, and `IDEAS.md`
179
+ - allowed tools and sources
180
+ - forbidden actions
181
+ - evidence standard
182
+ - stop conditions
183
+ - reporting requirements
184
+ - the researcher operating rules needed to execute autonomously without opening this skill file
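For the manual-launch case, where absolute paths are preferred, a rough self-containment check on a prompt file might look like the sketch below. It is not part of the released package. It assumes POSIX-style paths and that the four file paths appear in the prompt as plain or backtick-quoted tokens; the function name `prompt_path_problems` is hypothetical.

```python
import re
from pathlib import PurePosixPath

# The four files a prompt must point to, per the list above.
REQUIRED_FILES = ["EXPERIMENT.md", "LOG.md", "FINDINGS.md", "IDEAS.md"]


def prompt_path_problems(prompt_text: str) -> dict[str, str]:
    """Map each required file to a problem; an empty dict means no problem found."""
    problems = {}
    for name in REQUIRED_FILES:
        # Grab tokens ending in the file name, excluding whitespace and backticks.
        tokens = re.findall(rf"[^\s`]*{re.escape(name)}", prompt_text)
        # Discard look-alikes such as CHANGELOG.md when looking for LOG.md.
        paths = [t for t in tokens if PurePosixPath(t).name == name]
        if not paths:
            problems[name] = "not mentioned"
        elif not all(PurePosixPath(p).is_absolute() for p in paths):
            problems[name] = "relative path"
    return problems
```

This covers only the path items in the list. The remaining items (goal, forbidden actions, evidence standard, and so on) are prose and would still need to be reviewed by reading the prompt cold.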
185
+
186
+ Optional: assign a thinking style to increase diversity, such as:
187
+
188
+ - first-principles
189
+ - contrarian
190
+ - evidence-first
191
+ - systems-level
192
+ - creative/wildcard
193
+
194
+ Do not assign solution hints disguised as roles.
195
+
196
+ ## Manual Launch In Clideck
197
+
198
+ When the user is launching researchers manually in Clideck, the manager should give explicit launch instructions.
199
+
200
+ Recommended pattern:
201
+
202
+ 1. Open one new Clideck session per researcher.
203
+ 2. Set the session working directory to the researcher's assigned workspace.
204
+ 3. Tell the agent to read that researcher's `PROMPT.md`, or paste the prompt contents as the first message.
205
+ 4. Do not give the researcher extra steering beyond what is in `PROMPT.md` unless the experiment setup changes.
206
+
207
+ The point is that `PROMPT.md` is the contract.
208
+
209
+ ## Anti-Bias Rules
210
+
211
+ The manager should:
212
+
213
+ - define the question, not the answer
214
+ - avoid telling researchers which solution seems best
215
+ - avoid sharing one researcher's interim ideas with another
216
+ - use diversity prompts only to widen exploration, not to funnel conclusions
217
+ - explicitly require an early question-generation round before action begins
218
+
219
+ Researchers should:
220
+
221
+ - think independently
222
+ - explore at least one non-obvious path
223
+ - not assume the manager knows the answer
224
+ - not converge on the first plausible approach without testing alternatives
225
+
226
+ ## Manager Workflow
227
+
228
+ 1. Interview the user until setup is complete.
229
+ 2. Choose the research mode and isolation strategy.
230
+ 3. Create the experiment folder and researcher folders.
231
+ 4. Create isolated workspaces for each researcher.
232
+ 5. Write `EXPERIMENT.md`, `MANAGER.md`, `SYNTHESIS.md`, and every `PROMPT.md`.
233
+ 6. Verify that each researcher prompt is self-contained.
234
+ 7. Dispatch researchers manually or through the user's orchestration flow.
235
+ 8. Do not let researchers see each other's active work unless the user explicitly wants collaboration.
236
+ 9. When researchers finish, review all `FINDINGS.md` files.
237
+ 10. Write `SYNTHESIS.md` with convergences, divergences, confidence, and ranked recommendations.
238
+
239
+ ## Handoff Protocol
240
+
241
+ When a researcher finishes, `FINDINGS.md` is the handoff artifact.
242
+
243
+ The user, manager, or orchestration layer should deliver every completed `FINDINGS.md` back to the manager session.
244
+
245
+ The manager should synthesize from researcher findings, not from half-finished logs unless a researcher failed before producing a final findings document.
202
246
 
203
- ## Shared Resource Rules
247
+ ## Researcher Role
204
248
 
205
- For resources that can contaminate results or conflict across agents:
249
+ The researcher executes the fixed brief from the manager.
206
250
 
207
- - Assign unique ports, output directories, cache directories, and branch names.
208
- - Do not run final GPU/MPS benchmarks concurrently across researchers.
209
- - Prefer researcher-local smoke tests and main-agent final benchmarks.
210
- - Require researchers to stop servers they started, or report any still-running process clearly.
251
+ The researcher must not redefine:
211
252
 
212
- ## Merge Rules
253
+ - objective
254
+ - decision being supported
255
+ - evidence standard
256
+ - constraints
257
+ - assigned workspace
258
+ - stop conditions
259
+
260
+ If those are missing or materially ambiguous, ask the manager before accepting a conclusion.
261
+
262
+ ## Researcher Operating Rules
263
+
264
+ - Do not ask "should I continue?"
265
+ - Keep working until you reach a credible conclusion or a real blocker.
266
+ - Do not wait for praise or confirmation between attempts.
267
+ - Do not get trapped refining one weak direction forever.
268
+ - Start with a question-generation round before picking your first approach.
269
+ - Try multiple structurally different approaches when the task is open-ended.
270
+ - Record meaningful progress in `LOG.md`.
271
+ - Record deferred but promising paths in `IDEAS.md`.
272
+ - Write your final conclusion in `FINDINGS.md`.
273
+
274
+ ## Evidence And Innovation
275
+
276
+ For every meaningful claim, use the evidence standard defined in the brief.
277
+
278
+ Examples:
279
+
280
+ - Code mode: benchmarks, tests, traces, metrics, profiler output
281
+ - Knowledge mode: citations, comparisons, reasoning chains, datasets, examples, scored rubrics
282
+
283
+ Researchers should be innovative, but not sloppy.
284
+
285
+ - Explore unconventional ideas.
286
+ - Challenge assumptions.
287
+ - Consider contrarian approaches.
288
+ - Ask what the real limits are before optimizing locally.
289
+ - Separate facts from hypotheses.
290
+ - State confidence honestly.
291
+
292
+ ## Researcher Question Round
293
+
294
+ Before the first experiment or line of inquiry, each researcher should generate a short list of high-value questions.
295
+
296
+ The point is to widen the search space before committing to an approach.
297
+
298
+ This question round should happen:
299
+
300
+ - once at the start
301
+ - again after major failed or inconclusive attempts
302
+
303
+ Questions should be practical, not decorative. They should help uncover limits, assumptions, and overlooked paths.
304
+
305
+ Examples:
306
+
307
+ - what is actually expensive here?
308
+ - what is actually creating the token bloat?
309
+ - what is the current hard limit?
310
+ - what happens if we remove this component entirely?
311
+ - what happens if we compress, batch, defer, cache, or approximate this step?
312
+ - what quality signal might break if we optimize too aggressively?
313
+ - what is the theoretical lower bound?
314
+ - what assumptions are probably wrong?
315
+ - what would a contrarian approach try first?
316
+ - if I had to get most of the gain with the smallest change, where would I look?
317
+ - if the obvious path fails, what structurally different path is left?
318
+ - what evidence would prove this direction is a dead end?
319
+
320
+ ## Researcher Loop
321
+
322
+ Use a loop like this:
323
+
324
+ 1. Re-read the goal, constraints, and evidence standard.
325
+ 2. Inspect the workspace and relevant context.
326
+ 3. Generate high-value questions about limits, assumptions, and overlooked paths.
327
+ 4. Form multiple candidate approaches from those questions.
328
+ 5. Choose one explainable experiment or line of inquiry.
329
+ 6. Execute it and gather evidence.
330
+ 7. Record what happened in `LOG.md`.
331
+ 8. Keep, revise, or reject the approach.
332
+ 9. Re-question after major failures or surprises.
333
+ 10. Repeat until success, exhaustion, or a true blocker.
334
+
335
+ For code mode, this may be code changes plus verification.
336
+
337
+ For knowledge mode, this may be source review, comparison, synthesis, ideation, or rubric-based evaluation.
338
+
339
+ ## Stop Conditions
340
+
341
+ A researcher may stop only when one of these is true:
342
+
343
+ - The goal is achieved well enough to make a recommendation.
344
+ - The useful paths are exhausted.
345
+ - A real blocker outside the researcher's control prevents progress.
346
+
347
+ "Useful paths are exhausted" should mean both:
348
+
349
+ - at least several materially different approaches were explored
350
+ - recent attempts are no longer producing meaningful new information
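The two-part exhaustion test above can be written as a small predicate. The sketch below is not part of the released package. The thresholds are assumptions: the skill says only "several" approaches and "recent" attempts, so `min_approaches=3` and `recent_window=2` are illustrative defaults, and the `Attempt` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    approach: str          # label for the structurally distinct approach used
    new_information: bool  # whether the attempt taught anything meaningful


def useful_paths_exhausted(
    attempts: list[Attempt],
    min_approaches: int = 3,  # assumed reading of "several"
    recent_window: int = 2,   # assumed reading of "recent attempts"
) -> bool:
    """True only when both exhaustion conditions hold at once."""
    distinct = {a.approach for a in attempts}
    recent = attempts[-recent_window:]
    return (
        len(distinct) >= min_approaches
        and len(recent) == recent_window
        and not any(a.new_information for a in recent)
    )
```

Requiring both conditions is the point of the rule: many retries of one approach do not count as exhaustion, and neither does a broad search that is still producing new information.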
351
+
352
+ ## Required Final Findings Format
353
+
354
+ Every researcher must complete `FINDINGS.md` with:
355
+
356
+ - Executive summary
357
+ - Recommendation
358
+ - Confidence level
359
+ - Approaches tried
360
+ - Evidence collected
361
+ - Failed or rejected paths
362
+ - Remaining uncertainties
363
+ - Suggested next steps
364
+
365
+ For code mode also include:
366
+
367
+ - workspace path
368
+ - branch name if applicable
369
+ - files changed
370
+ - commands run
371
+ - cleanup needed
372
+
373
+ ## Manager Synthesis
374
+
375
+ After collecting all researcher findings, the manager should produce `SYNTHESIS.md` with:
376
+
377
+ - experiment summary
378
+ - researcher-by-researcher conclusions
379
+ - convergences
380
+ - divergences
381
+ - strongest evidence
382
+ - lowest-confidence areas
383
+ - ranked recommendations
384
+ - follow-up experiments or round-two ideas
385
+
386
+ If useful, the manager may launch another research round with a refined brief. A new round may incorporate prior findings, but should still avoid biasing researchers toward one presumed answer.
387
+
388
+ ## Code Mode Guidance
389
+
390
+ When using git worktrees, prefer commands like:
391
+
392
+ ```bash
393
+ mkdir -p .research-worktrees/<experiment-slug>
394
+ git worktree add .research-worktrees/<experiment-slug>/researcher-1 -b research/<slug>-1
395
+ git worktree add .research-worktrees/<experiment-slug>/researcher-2 -b research/<slug>-2
396
+ git worktree add .research-worktrees/<experiment-slug>/researcher-3 -b research/<slug>-3
397
+ ```
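For a variable number of researchers, the same commands can be generated rather than typed by hand. The sketch below is not part of the released package. It only builds the argument lists and does not run them; the function name `worktree_commands` is hypothetical, while the directory and branch patterns follow the commands shown above.

```python
import shlex


def worktree_commands(experiment_slug: str, slug: str, count: int) -> list[list[str]]:
    """Build the mkdir and `git worktree add` commands for `count` researchers."""
    base = f".research-worktrees/{experiment_slug}"
    commands = [["mkdir", "-p", base]]
    for n in range(1, count + 1):
        commands.append(
            ["git", "worktree", "add", f"{base}/researcher-{n}", "-b", f"research/{slug}-{n}"]
        )
    return commands


def as_shell(commands: list[list[str]]) -> str:
    """Render the commands as shell text so they can be reviewed before running."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Keeping the commands as lists lets the manager inspect them with `as_shell(...)` first, or pass each list to `subprocess.run(cmd, check=True)` from the repository root once the paths and branch names have been checked against existing worktrees.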
213
398
 
214
- The main agent must review researcher diffs before applying them to production.
399
+ Do not delete or overwrite existing worktrees unless the user explicitly asks.
215
400
 
216
- Reject or keep separate any patch that:
401
+ The scaffold script creates the experiment package and researcher folders only. It does not create git worktrees automatically. In code mode, the manager must create and verify the worktrees separately.
217
402
 
218
- - Touches production checkout files.
219
- - Changes tests, benchmarks, fixtures, sample inputs, or scoring without permission.
220
- - Passes only because the acceptance criteria were weakened.
221
- - Produces only noisy or marginal improvement below the declared quality bar.
222
- - Leaves unexplained background processes or shared state.
403
+ ## Hard Rules
223
404
 
224
- If a researcher accidentally edits the production checkout, preserve pre-existing user changes and move or recreate the experiment in an isolated worktree before continuing.
405
+ - Do not start researchers before the manager brief is complete.
406
+ - Do not use the production checkout as a researcher edit workspace.
407
+ - Do not let researchers redefine the experiment.
408
+ - Do not let active researchers read each other's interim work unless the user wants collaboration.
409
+ - Do not confuse manager guidance with answer selection.
410
+ - Do not stop because of ordinary uncertainty.
411
+ - Do not pretend that vague findings are strong evidence.