clideck 1.30.4 → 1.30.6
- package/README.md +8 -2
- package/agent-presets.json +1 -0
- package/bin/clideck.js +7 -1
- package/clideck-ask-cli.js +110 -0
- package/config.js +7 -1
- package/handlers.js +16 -3
- package/package.json +1 -1
- package/public/js/creator.js +31 -29
- package/public/js/drag.js +1 -0
- package/server.js +12 -1
- package/session-ask.js +171 -0
- package/sessions.js +7 -1
- package/skills/autonomous-session/SKILL.md +211 -0
- package/skills/research-experiment/SKILL.md +350 -163
- package/skills/research-experiment/SKILL.md.bak +224 -0
- package/skills/research-experiment/scripts/init-research-layout.mjs +184 -0
- package/transcript.js +5 -1
- package/skills/research-experiment/agents/openai.yaml +0 -4
|
@@ -1,224 +1,411 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: research-experiment
|
|
3
|
-
description:
|
|
3
|
+
description: Set up and coordinate parallel autonomous research across multiple agents. Use when a user wants a manager agent to interview for requirements, create the research package, prepare isolated researcher workspaces, and dispatch independent researchers to explore the problem without repeated check-ins.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
# Research Experiment
|
|
7
7
|
|
|
8
|
-
Use this skill to run parallel
|
|
8
|
+
Use this skill to run parallel autonomous research with one manager and multiple researchers.
|
|
9
9
|
|
|
10
|
-
|
|
10
|
+
This skill is generic. It is for:
|
|
11
11
|
|
|
12
|
-
-
|
|
13
|
-
-
|
|
12
|
+
- code optimization
|
|
13
|
+
- implementation experiments
|
|
14
|
+
- product or workflow discovery
|
|
15
|
+
- marketing or advertising research
|
|
16
|
+
- strategy comparison
|
|
17
|
+
- scientific or technical literature review
|
|
18
|
+
- invention and concept exploration
|
|
14
19
|
|
|
15
|
-
|
|
20
|
+
The core model is:
|
|
16
21
|
|
|
17
|
-
|
|
22
|
+
- The **manager** interviews the user, defines the experiment, prepares the workspace, writes the researcher prompt files, and later synthesizes results.
|
|
23
|
+
- Each **researcher** works independently inside an assigned workspace and prompt file, explores multiple approaches, and produces a final findings document.
|
|
18
24
|
|
|
19
|
-
|
|
25
|
+
If the user says "you are the main agent" or "you are the manager", follow the Manager Role.
|
|
20
26
|
|
|
21
|
-
|
|
27
|
+
If the user says "you are a researcher" or "experiment agent", follow the Researcher Role.
|
|
22
28
|
|
|
23
|
-
|
|
24
|
-
- Goal: the outcome to achieve.
|
|
25
|
-
- Acceptance criteria: exact tests, benchmarks, or review gates.
|
|
26
|
-
- Hard constraints: what must not change.
|
|
27
|
-
- Quality bar: what counts as meaningful progress versus noise.
|
|
28
|
-
- Shared resources: ports, GPUs, model caches, services, datasets, credentials, or external APIs.
|
|
29
|
-
- Stop conditions: when researchers may quit.
|
|
29
|
+
## First Check: Can This Run Autonomously?
|
|
30
30
|
|
|
31
|
-
|
|
32
|
-
- Use `git rev-parse --show-toplevel` when inside a git repo.
|
|
33
|
-
- If the project has nested repos, identify which repo owns the files under experiment.
|
|
34
|
-
- Do not assume the current working directory is the repo root.
|
|
31
|
+
Before doing substantial work, verify that autonomy is actually possible.
|
|
35
32
|
|
|
36
|
-
|
|
37
|
-
- Prefer a project-local directory such as `<project>/.research-worktrees/<slug>-<n>`.
|
|
38
|
-
- Keep worktree directories inside the main project folder so agents do not need extra filesystem permissions.
|
|
39
|
-
- Do not create sibling worktrees such as `../project_research_1` unless the user explicitly asks.
|
|
40
|
-
- Never point a researcher at the production checkout for edits.
|
|
41
|
-
- Use unique branches, for example `research/<slug>-1`, `research/<slug>-2`.
|
|
33
|
+
If the environment will force routine approval prompts for normal work, stop and say so plainly. Do not pretend the experiment is autonomous if the manager or researchers will keep pausing for permission.
|
|
42
34
|
|
|
43
|
-
|
|
44
|
-
- Do not assign fixed technical roles unless the user explicitly asks.
|
|
45
|
-
- Let each researcher decide the approach and iterate independently.
|
|
46
|
-
- Include the exact worktree path, branch, allowed edit scope, forbidden files, verification commands, and report format.
|
|
47
|
-
- Save the canonical brief to the experiment folder before dispatching researchers.
|
|
35
|
+
## Core Rules
|
|
48
36
|
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
37
|
+
- The manager defines the problem, constraints, and evaluation. It does not prescribe the answer.
|
|
38
|
+
- Researchers must not redefine the goal, constraints, or evidence standard.
|
|
39
|
+
- Researchers should not read each other's work while the experiment is running unless the user explicitly asks for collaboration.
|
|
40
|
+
- The manager should not bias researchers toward one preferred solution path unless the user explicitly wants that.
|
|
41
|
+
- The goal is independent exploration, not consensus-by-default.
|
|
53
42
|
|
|
54
|
-
##
|
|
43
|
+
## Manager Role
|
|
55
44
|
|
|
56
|
-
The
|
|
45
|
+
The manager owns setup quality. A bad setup poisons the whole experiment.
|
|
57
46
|
|
|
58
|
-
|
|
59
|
-
.research-worktrees/<experiment-slug>/
|
|
60
|
-
```
|
|
47
|
+
If the researchers are not pushed toward deep questioning and independent exploration, the experiment will waste time and tokens on obvious, shallow paths.
|
|
61
48
|
|
|
62
|
-
|
|
49
|
+
The manager must keep querying the user until all critical setup fields are clear enough to run safely and usefully.
|
|
63
50
|
|
|
64
|
-
|
|
65
|
-
- `<slug>-experiment-1/`: worktree for researcher 1.
|
|
66
|
-
- `<slug>-experiment-2/`: worktree for researcher 2.
|
|
67
|
-
- `<slug>-experiment-3/`: worktree for researcher 3.
|
|
68
|
-
- `<slug>-experiment-1/LOG.md`: progress log for researcher 1.
|
|
69
|
-
- `<slug>-experiment-2/LOG.md`: progress log for researcher 2.
|
|
70
|
-
- `<slug>-experiment-3/LOG.md`: progress log for researcher 3.
|
|
51
|
+
Do not start researcher setup just because you have a rough idea. Start only when the experiment brief is decision-grade.
|
|
71
52
|
|
|
72
|
-
|
|
53
|
+
## Manager Intake Interview
|
|
73
54
|
|
|
74
|
-
|
|
55
|
+
Before creating the experiment package, gather or confirm:
|
|
75
56
|
|
|
76
|
-
-
|
|
77
|
-
-
|
|
78
|
-
-
|
|
79
|
-
-
|
|
80
|
-
-
|
|
81
|
-
-
|
|
57
|
+
- Objective: what are we trying to discover, optimize, compare, prove, design, or explain?
|
|
58
|
+
- Decision to support: what user decision will this research inform?
|
|
59
|
+
- Output type: what should the final answer look like?
|
|
60
|
+
- Success criteria: what makes a useful result?
|
|
61
|
+
- Evidence standard: benchmark, rubric, citations, reasoning, expert judgment, human review, or another standard.
|
|
62
|
+
- Constraints: budget, time, ethics, legal boundaries, brand rules, forbidden actions, forbidden files, safety limits.
|
|
63
|
+
- Allowed tools and sources: codebase only, web research, papers, datasets, APIs, interviews, etc.
|
|
64
|
+
- Domain context: what background is necessary before work starts?
|
|
65
|
+
- Researcher count: how many researchers should run?
|
|
66
|
+
- Workspace model: code workspace, isolated git worktree, document workspace, or another isolated setup.
|
|
67
|
+
- Stop conditions: when should a researcher conclude success, failure, or exhaustion?
|
|
82
68
|
|
|
83
|
-
|
|
69
|
+
If any of these fields is still unclear and its answer would materially change the experiment, keep asking.
|
|
84
70
|
|
|
85
|
-
|
|
71
|
+
## Pick The Research Mode
|
|
86
72
|
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
73
|
+
Choose the mode that matches the work:
|
|
74
|
+
|
|
75
|
+
### 1. Code Research Mode
|
|
76
|
+
|
|
77
|
+
Use when researchers will change code, run commands, benchmark, or inspect a repository.
|
|
78
|
+
|
|
79
|
+
Preferred isolation:
|
|
80
|
+
|
|
81
|
+
- isolated git worktrees inside the main project folder
|
|
82
|
+
- or isolated copies if worktrees are unavailable
|
|
83
|
+
|
|
84
|
+
### 2. Knowledge Research Mode
|
|
85
|
+
|
|
86
|
+
Use when the work is primarily reading, comparing, synthesizing, writing, or ideating.
|
|
87
|
+
|
|
88
|
+
Preferred isolation:
|
|
89
|
+
|
|
90
|
+
- per-researcher document workspaces
|
|
91
|
+
- private notes and findings files
|
|
92
|
+
- source and citation tracking if external research is allowed
|
|
93
|
+
|
|
94
|
+
Git worktrees are optional here, not required.
|
|
95
|
+
|
|
96
|
+
## Standard Experiment Package
|
|
97
|
+
|
|
98
|
+
Create one experiment folder under the project or working directory, for example:
|
|
99
|
+
|
|
100
|
+
```text
|
|
101
|
+
.research/<experiment-slug>/
|
|
93
102
|
```
|
|
94
103
|
|
|
95
|
-
|
|
104
|
+
Inside it, create:
|
|
105
|
+
|
|
106
|
+
- `EXPERIMENT.md`: canonical experiment brief
|
|
107
|
+
- `MANAGER.md`: manager notes, open questions, and round control
|
|
108
|
+
- `SYNTHESIS.md`: manager's final synthesis target
|
|
109
|
+
- `researcher-1/PROMPT.md`
|
|
110
|
+
- `researcher-1/LOG.md`
|
|
111
|
+
- `researcher-1/FINDINGS.md`
|
|
112
|
+
- `researcher-1/IDEAS.md`
|
|
113
|
+
- `researcher-2/...`
|
|
114
|
+
- `researcher-3/...`
|
|
115
|
+
|
|
116
|
+
Each researcher folder may also include a private notes or workspace folder if useful.
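
As a rough illustration of the package layout listed above, the sketch below builds the expected file paths for a given researcher count. The function name `experimentPackagePaths` and the `demo-slug` value are hypothetical. The sketch only builds strings, does not touch the filesystem, and does not describe what `init-research-layout.mjs` actually does.

```javascript
// Hypothetical helper: lists the files the experiment package would contain.
// It only builds path strings; nothing is created on disk.
function experimentPackagePaths(root, researcherCount) {
  const base = root.replace(/\/+$/, ""); // drop trailing slashes
  const paths = ["EXPERIMENT.md", "MANAGER.md", "SYNTHESIS.md"].map(
    (name) => `${base}/${name}`
  );
  const perResearcher = ["PROMPT.md", "LOG.md", "FINDINGS.md", "IDEAS.md"];
  for (let i = 1; i <= researcherCount; i += 1) {
    for (const name of perResearcher) {
      paths.push(`${base}/researcher-${i}/${name}`);
    }
  }
  return paths;
}

// Example: three researchers under an assumed slug.
const packagePaths = experimentPackagePaths(".research/demo-slug", 3);
console.log(packagePaths.join("\n"));
```

A list like this can be compared against the real folder before dispatch to spot a missing `PROMPT.md` or `FINDINGS.md`.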
|
|
96
117
|
|
|
97
|
-
|
|
118
|
+
Use the scaffold script if helpful:
|
|
98
119
|
|
|
99
120
|
```bash
|
|
100
|
-
|
|
101
|
-
mkdir -p /absolute/path/to/main-project/.research-worktrees
|
|
102
|
-
mkdir -p /absolute/path/to/main-project/.research-worktrees/<experiment-slug>
|
|
103
|
-
git worktree add /absolute/path/to/main-project/.research-worktrees/<experiment-slug>/<slug>-experiment-1 -b research/<slug>-1
|
|
121
|
+
node skills/research-experiment/scripts/init-research-layout.mjs .research/<experiment-slug> 3 knowledge
|
|
104
122
|
```
|
|
105
123
|
|
|
106
|
-
|
|
124
|
+
Replace `knowledge` with `code` when preparing a code experiment.
|
|
107
125
|
|
|
108
|
-
|
|
126
|
+
## Workspace Rules
|
|
109
127
|
|
|
110
|
-
|
|
128
|
+
Every researcher must have an isolated assigned workspace.
|
|
111
129
|
|
|
112
|
-
|
|
113
|
-
- The researcher’s same-session baseline.
|
|
130
|
+
Examples:
|
|
114
131
|
|
|
115
|
-
|
|
132
|
+
- Code mode: `.research-worktrees/<experiment-slug>/researcher-1/`
|
|
133
|
+
- Knowledge mode: `.research/<experiment-slug>/researcher-1/`
|
|
116
134
|
|
|
117
|
-
|
|
118
|
-
- Benchmark score must not regress.
|
|
119
|
-
- ASR must recover at least N words from generated TTS.
|
|
120
|
-
- Human listening check required.
|
|
135
|
+
For code mode:
|
|
121
136
|
|
|
122
|
-
|
|
137
|
+
- never point researchers at the production checkout for edits
|
|
138
|
+
- prefer project-local worktrees rather than sibling folders
|
|
139
|
+
- use unique branches such as `research/<slug>-1`, `research/<slug>-2`
|
|
123
140
|
|
|
124
|
-
|
|
141
|
+
For knowledge mode:
|
|
125
142
|
|
|
126
|
-
|
|
143
|
+
- give each researcher a private workspace for notes, drafts, and source tracking
|
|
144
|
+
- keep their working files separate until the manager collects final findings
|
|
127
145
|
|
|
128
|
-
|
|
129
|
-
Create any temporary files, profiling scripts, generated outputs, scratch notes, and helper artifacts inside your assigned worktree. Do not use `/tmp`, `/var/tmp`, home-directory scratch folders, or sibling project folders unless the brief explicitly allows it.
|
|
146
|
+
## Canonical Brief Requirements
|
|
130
147
|
|
|
131
|
-
|
|
148
|
+
`EXPERIMENT.md` must include:
|
|
132
149
|
|
|
133
|
-
-
|
|
134
|
-
-
|
|
135
|
-
-
|
|
136
|
-
-
|
|
137
|
-
-
|
|
138
|
-
-
|
|
150
|
+
- objective
|
|
151
|
+
- decision this research supports
|
|
152
|
+
- mode: `code` or `knowledge`
|
|
153
|
+
- success criteria
|
|
154
|
+
- evidence standard
|
|
155
|
+
- constraints
|
|
156
|
+
- allowed tools and sources
|
|
157
|
+
- domain context
|
|
158
|
+
- shared resources
|
|
159
|
+
- stop conditions
|
|
160
|
+
- required final output format
|
|
161
|
+
- researcher independence rules
|
|
139
162
|
|
|
140
|
-
|
|
163
|
+
The brief should define the problem clearly without embedding a preferred answer.
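
One way to check the brief before dispatch is a small completeness check over the fields listed above. This is a minimal sketch that assumes the brief is available as a plain object. The key names (`successCriteria`, `independenceRules`, and so on) and the function `briefProblems` are illustrative choices, not part of the skill.

```javascript
// Field names mirror the EXPERIMENT.md checklist. Representing the brief
// as a plain object with these keys is an assumption made for this sketch.
const REQUIRED_BRIEF_FIELDS = [
  "objective",
  "decision",
  "mode",
  "successCriteria",
  "evidenceStandard",
  "constraints",
  "allowedToolsAndSources",
  "domainContext",
  "sharedResources",
  "stopConditions",
  "finalOutputFormat",
  "independenceRules",
];

function briefProblems(brief) {
  const problems = [];
  for (const field of REQUIRED_BRIEF_FIELDS) {
    const value = brief[field];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`missing: ${field}`);
    }
  }
  // The skill names exactly two modes.
  const mode = brief.mode;
  if (typeof mode === "string" && mode.trim() !== "" && !["code", "knowledge"].includes(mode)) {
    problems.push("mode must be `code` or `knowledge`");
  }
  return problems;
}

// Example: an early draft with only two fields filled in.
const draftBrief = { objective: "Reduce prompt token usage", mode: "docs" };
const draftProblems = briefProblems(draftBrief);
console.log(draftProblems);
```

An empty result only shows that every field has text. Whether the brief is decision-grade remains the manager's judgment.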
|
|
141
164
|
|
|
142
|
-
##
|
|
165
|
+
## Researcher Prompt Files
|
|
143
166
|
|
|
144
|
-
|
|
167
|
+
The manager must write one self-contained `PROMPT.md` per researcher.
|
|
145
168
|
|
|
146
|
-
|
|
147
|
-
- The user may be away from the computer and expects the experiment to continue until you achieve the experiment goals, so keep working until the task is naturally complete.
|
|
148
|
-
- You are autonomous. If you are unsure how to proceed, re-read the skill, goals, context, think differently, try different innovative approaches, and continue.
|
|
149
|
-
- Stop only if something out of your control blocks you from continuing. Otherwise continue experimenting until goals are achieved or the useful paths are exhausted.
|
|
169
|
+
Each prompt file must be usable cold. The researcher should be able to open it in a fresh session and work without hidden context.
|
|
150
170
|
|
|
151
|
-
|
|
171
|
+
Prefer absolute paths in researcher prompt files whenever the user will launch researchers manually in separate sessions. Relative paths are acceptable only when the working directory is guaranteed.
|
|
152
172
|
|
|
153
|
-
|
|
173
|
+
Each prompt must include:
|
|
154
174
|
|
|
155
|
-
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
161
|
-
|
|
162
|
-
|
|
163
|
-
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
|
|
171
|
-
|
|
172
|
-
|
|
173
|
-
|
|
174
|
-
|
|
175
|
-
|
|
176
|
-
|
|
177
|
-
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
190
|
-
|
|
191
|
-
|
|
192
|
-
|
|
193
|
-
-
|
|
194
|
-
-
|
|
195
|
-
-
|
|
196
|
-
-
|
|
197
|
-
-
|
|
198
|
-
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
|
|
175
|
+
- experiment goal
|
|
176
|
+
- decision being supported
|
|
177
|
+
- assigned workspace path
|
|
178
|
+
- paths to `EXPERIMENT.md`, `LOG.md`, `FINDINGS.md`, and `IDEAS.md`
|
|
179
|
+
- allowed tools and sources
|
|
180
|
+
- forbidden actions
|
|
181
|
+
- evidence standard
|
|
182
|
+
- stop conditions
|
|
183
|
+
- reporting requirements
|
|
184
|
+
- the researcher operating rules needed to execute autonomously without opening this skill file
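
The absolute-path preference above can be checked mechanically before a prompt is handed over. The sketch below assumes POSIX-style paths and a hypothetical name-to-path map for one researcher. `relativePathEntries` and the example paths are illustrative only.

```javascript
// Hypothetical shape: the prompt's file references as a name -> path map.
// Assumes POSIX-style paths, where an absolute path starts with "/".
function relativePathEntries(promptPaths) {
  return Object.entries(promptPaths)
    .filter(([, path]) => !path.startsWith("/"))
    .map(([name]) => name);
}

// Example: two of the five references are still relative.
const researcherOnePaths = {
  workspace: "/home/user/project/.research/demo/researcher-1",
  experiment: "/home/user/project/.research/demo/EXPERIMENT.md",
  log: "researcher-1/LOG.md",
  findings: "/home/user/project/.research/demo/researcher-1/FINDINGS.md",
  ideas: "./researcher-1/IDEAS.md",
};
const needsFixing = relativePathEntries(researcherOnePaths);
console.log(needsFixing);
```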
|
|
185
|
+
|
|
186
|
+
Optional: assign a thinking style to increase diversity, such as:
|
|
187
|
+
|
|
188
|
+
- first-principles
|
|
189
|
+
- contrarian
|
|
190
|
+
- evidence-first
|
|
191
|
+
- systems-level
|
|
192
|
+
- creative/wildcard
|
|
193
|
+
|
|
194
|
+
Do not assign solution hints disguised as roles.
|
|
195
|
+
|
|
196
|
+
## Manual Launch In Clideck
|
|
197
|
+
|
|
198
|
+
When the user is launching researchers manually in Clideck, the manager should give explicit launch instructions.
|
|
199
|
+
|
|
200
|
+
Recommended pattern:
|
|
201
|
+
|
|
202
|
+
1. Open one new Clideck session per researcher.
|
|
203
|
+
2. Set the session working directory to the researcher's assigned workspace.
|
|
204
|
+
3. Tell the agent to read that researcher's `PROMPT.md`, or paste the prompt contents as the first message.
|
|
205
|
+
4. Do not give the researcher extra steering beyond what is in `PROMPT.md` unless the experiment setup changes.
|
|
206
|
+
|
|
207
|
+
The point is that `PROMPT.md` is the contract.
|
|
208
|
+
|
|
209
|
+
## Anti-Bias Rules
|
|
210
|
+
|
|
211
|
+
The manager should:
|
|
212
|
+
|
|
213
|
+
- define the question, not the answer
|
|
214
|
+
- avoid telling researchers which solution seems best
|
|
215
|
+
- avoid sharing one researcher's interim ideas with another
|
|
216
|
+
- use diversity prompts only to widen exploration, not to funnel conclusions
|
|
217
|
+
- explicitly require an early question-generation round before action begins
|
|
218
|
+
|
|
219
|
+
Researchers should:
|
|
220
|
+
|
|
221
|
+
- think independently
|
|
222
|
+
- explore at least one non-obvious path
|
|
223
|
+
- not assume the manager knows the answer
|
|
224
|
+
- not converge on the first plausible approach without testing alternatives
|
|
225
|
+
|
|
226
|
+
## Manager Workflow
|
|
227
|
+
|
|
228
|
+
1. Interview the user until setup is complete.
|
|
229
|
+
2. Choose the research mode and isolation strategy.
|
|
230
|
+
3. Create the experiment folder and researcher folders.
|
|
231
|
+
4. Create isolated workspaces for each researcher.
|
|
232
|
+
5. Write `EXPERIMENT.md`, `MANAGER.md`, `SYNTHESIS.md`, and every `PROMPT.md`.
|
|
233
|
+
6. Verify that each researcher prompt is self-contained.
|
|
234
|
+
7. Dispatch researchers manually or through the user's orchestration flow.
|
|
235
|
+
8. Do not let researchers see each other's active work unless the user explicitly wants collaboration.
|
|
236
|
+
9. When researchers finish, review all `FINDINGS.md` files.
|
|
237
|
+
10. Write `SYNTHESIS.md` with convergences, divergences, confidence, and ranked recommendations.
|
|
238
|
+
|
|
239
|
+
## Handoff Protocol
|
|
240
|
+
|
|
241
|
+
When a researcher finishes, `FINDINGS.md` is the handoff artifact.
|
|
242
|
+
|
|
243
|
+
The user, manager, or orchestration layer should deliver every completed `FINDINGS.md` back to the manager session.
|
|
244
|
+
|
|
245
|
+
The manager should synthesize from researcher findings, not from half-finished logs, unless a researcher failed before producing a final findings document.
|
|
202
246
|
|
|
203
|
-
##
|
|
247
|
+
## Researcher Role
|
|
204
248
|
|
|
205
|
-
|
|
249
|
+
The researcher executes the fixed brief from the manager.
|
|
206
250
|
|
|
207
|
-
|
|
208
|
-
- Do not run final GPU/MPS benchmarks concurrently across researchers.
|
|
209
|
-
- Prefer researcher-local smoke tests and main-agent final benchmarks.
|
|
210
|
-
- Require researchers to stop servers they started, or report any still-running process clearly.
|
|
251
|
+
The researcher must not redefine:
|
|
211
252
|
|
|
212
|
-
|
|
253
|
+
- objective
|
|
254
|
+
- decision being supported
|
|
255
|
+
- evidence standard
|
|
256
|
+
- constraints
|
|
257
|
+
- assigned workspace
|
|
258
|
+
- stop conditions
|
|
259
|
+
|
|
260
|
+
If those are missing or materially ambiguous, ask the manager before accepting a conclusion.
|
|
261
|
+
|
|
262
|
+
## Researcher Operating Rules
|
|
263
|
+
|
|
264
|
+
- Do not ask "should I continue?"
|
|
265
|
+
- Keep working until you reach a credible conclusion or a real blocker.
|
|
266
|
+
- Do not wait for praise or confirmation between attempts.
|
|
267
|
+
- Do not get trapped refining one weak direction forever.
|
|
268
|
+
- Start with a question-generation round before picking your first approach.
|
|
269
|
+
- Try multiple structurally different approaches when the task is open-ended.
|
|
270
|
+
- Record meaningful progress in `LOG.md`.
|
|
271
|
+
- Record deferred but promising paths in `IDEAS.md`.
|
|
272
|
+
- Write your final conclusion in `FINDINGS.md`.
|
|
273
|
+
|
|
274
|
+
## Evidence And Innovation
|
|
275
|
+
|
|
276
|
+
For every meaningful claim, use the evidence standard defined in the brief.
|
|
277
|
+
|
|
278
|
+
Examples:
|
|
279
|
+
|
|
280
|
+
- Code mode: benchmarks, tests, traces, metrics, profiler output
|
|
281
|
+
- Knowledge mode: citations, comparisons, reasoning chains, datasets, examples, scored rubrics
|
|
282
|
+
|
|
283
|
+
Researchers should be innovative, but not sloppy.
|
|
284
|
+
|
|
285
|
+
- Explore unconventional ideas.
|
|
286
|
+
- Challenge assumptions.
|
|
287
|
+
- Consider contrarian approaches.
|
|
288
|
+
- Ask what the real limits are before optimizing locally.
|
|
289
|
+
- Separate facts from hypotheses.
|
|
290
|
+
- State confidence honestly.
|
|
291
|
+
|
|
292
|
+
## Researcher Question Round
|
|
293
|
+
|
|
294
|
+
Before the first experiment or line of inquiry, each researcher should generate a short list of high-value questions.
|
|
295
|
+
|
|
296
|
+
The point is to widen the search space before committing to an approach.
|
|
297
|
+
|
|
298
|
+
This question round should happen:
|
|
299
|
+
|
|
300
|
+
- once at the start
|
|
301
|
+
- again after major failed or inconclusive attempts
|
|
302
|
+
|
|
303
|
+
Questions should be practical, not decorative. They should help uncover limits, assumptions, and overlooked paths.
|
|
304
|
+
|
|
305
|
+
Examples:
|
|
306
|
+
|
|
307
|
+
- what is actually expensive here?
|
|
308
|
+
- what is actually creating the token bloat?
|
|
309
|
+
- what is the current hard limit?
|
|
310
|
+
- what happens if we remove this component entirely?
|
|
311
|
+
- what happens if we compress, batch, defer, cache, or approximate this step?
|
|
312
|
+
- what quality signal might break if we optimize too aggressively?
|
|
313
|
+
- what is the theoretical lower bound?
|
|
314
|
+
- what assumptions are probably wrong?
|
|
315
|
+
- what would a contrarian approach try first?
|
|
316
|
+
- if I had to get most of the gain with the smallest change, where would I look?
|
|
317
|
+
- if the obvious path fails, what structurally different path is left?
|
|
318
|
+
- what evidence would prove this direction is a dead end?
|
|
319
|
+
|
|
320
|
+
## Researcher Loop
|
|
321
|
+
|
|
322
|
+
Use a loop like this:
|
|
323
|
+
|
|
324
|
+
1. Re-read the goal, constraints, and evidence standard.
|
|
325
|
+
2. Inspect the workspace and relevant context.
|
|
326
|
+
3. Generate high-value questions about limits, assumptions, and overlooked paths.
|
|
327
|
+
4. Form multiple candidate approaches from those questions.
|
|
328
|
+
5. Choose one explainable experiment or line of inquiry.
|
|
329
|
+
6. Execute it and gather evidence.
|
|
330
|
+
7. Record what happened in `LOG.md`.
|
|
331
|
+
8. Keep, revise, or reject the approach.
|
|
332
|
+
9. Re-question after major failures or surprises.
|
|
333
|
+
10. Repeat until success, exhaustion, or a true blocker.
|
|
334
|
+
|
|
335
|
+
For code mode, each pass through the loop may be a code change plus verification.
|
|
336
|
+
|
|
337
|
+
For knowledge mode, each pass through the loop may be source review, comparison, synthesis, ideation, or rubric-based evaluation.
|
|
338
|
+
|
|
339
|
+
## Stop Conditions
|
|
340
|
+
|
|
341
|
+
A researcher may stop only when one of these is true:
|
|
342
|
+
|
|
343
|
+
- The goal is achieved well enough to make a recommendation.
|
|
344
|
+
- The useful paths are exhausted.
|
|
345
|
+
- A real blocker outside the researcher's control prevents progress.
|
|
346
|
+
|
|
347
|
+
"Useful paths are exhausted" should mean both:
|
|
348
|
+
|
|
349
|
+
- several materially different approaches were explored
|
|
350
|
+
- recent attempts are no longer producing meaningful new information
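
The stop conditions can be read as a small decision function. The sketch below is one possible encoding, not a rule from the skill. The thresholds `MIN_DISTINCT_APPROACHES` and `STALE_ATTEMPT_WINDOW` are assumptions, since the skill only says "several" and "recent", and the attempt record shape (`approach`, `newInformation`) is hypothetical.

```javascript
// Illustrative thresholds; the skill does not give numbers.
const MIN_DISTINCT_APPROACHES = 3;
const STALE_ATTEMPT_WINDOW = 2;

// Both parts of "useful paths are exhausted" must hold.
function pathsExhausted(attempts) {
  const distinct = new Set(attempts.map((a) => a.approach)).size;
  const recent = attempts.slice(-STALE_ATTEMPT_WINDOW);
  const stale =
    recent.length === STALE_ATTEMPT_WINDOW &&
    recent.every((a) => !a.newInformation);
  return distinct >= MIN_DISTINCT_APPROACHES && stale;
}

// Returns the reason a researcher may stop, or null to keep working.
function mayStop({ goalAchieved, blocked, attempts }) {
  if (goalAchieved) return "goal achieved";
  if (blocked) return "real blocker";
  if (pathsExhausted(attempts)) return "useful paths exhausted";
  return null;
}

// Example: three distinct approaches, last two attempts taught nothing new.
const attemptLog = [
  { approach: "caching", newInformation: true },
  { approach: "batching", newInformation: true },
  { approach: "pruning", newInformation: false },
  { approach: "pruning", newInformation: false },
];
const stopDecision = mayStop({ goalAchieved: false, blocked: false, attempts: attemptLog });
console.log(stopDecision);
```

Ordinary uncertainty never appears as a stop reason here, which matches the intent that researchers keep working until one of the three conditions holds.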
|
|
351
|
+
|
|
352
|
+
## Required Final Findings Format
|
|
353
|
+
|
|
354
|
+
Every researcher must complete `FINDINGS.md` with:
|
|
355
|
+
|
|
356
|
+
- Executive summary
|
|
357
|
+
- Recommendation
|
|
358
|
+
- Confidence level
|
|
359
|
+
- Approaches tried
|
|
360
|
+
- Evidence collected
|
|
361
|
+
- Failed or rejected paths
|
|
362
|
+
- Remaining uncertainties
|
|
363
|
+
- Suggested next steps
|
|
364
|
+
|
|
365
|
+
For code mode also include:
|
|
366
|
+
|
|
367
|
+
- workspace path
|
|
368
|
+
- branch name if applicable
|
|
369
|
+
- files changed
|
|
370
|
+
- commands run
|
|
371
|
+
- cleanup needed
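
A manager could check a returned `FINDINGS.md` for the required sections before synthesis. The sketch below assumes each section is written as a Markdown heading whose text matches the list above exactly, ignoring case. That convention and the name `missingFindingsSections` are assumptions, not something the skill specifies.

```javascript
// Section titles copied from the required findings format.
const REQUIRED_FINDINGS_SECTIONS = [
  "Executive summary",
  "Recommendation",
  "Confidence level",
  "Approaches tried",
  "Evidence collected",
  "Failed or rejected paths",
  "Remaining uncertainties",
  "Suggested next steps",
];

// Assumes sections appear as Markdown headings ("#" through "######").
function missingFindingsSections(markdown) {
  const headings = markdown
    .split("\n")
    .map((line) => line.match(/^#{1,6}\s+(.*\S)\s*$/))
    .filter(Boolean)
    .map((match) => match[1].toLowerCase());
  return REQUIRED_FINDINGS_SECTIONS.filter(
    (section) => !headings.includes(section.toLowerCase())
  );
}

// Example: a partially written findings file.
const sampleFindings = [
  "# Findings",
  "## Executive summary",
  "Short summary.",
  "## Recommendation",
  "Adopt approach B.",
  "## Confidence level",
  "Medium.",
].join("\n");
const missingSections = missingFindingsSections(sampleFindings);
console.log(missingSections);
```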
|
|
372
|
+
|
|
373
|
+
## Manager Synthesis
|
|
374
|
+
|
|
375
|
+
After collecting all researcher findings, the manager should produce `SYNTHESIS.md` with:
|
|
376
|
+
|
|
377
|
+
- experiment summary
|
|
378
|
+
- researcher-by-researcher conclusions
|
|
379
|
+
- convergences
|
|
380
|
+
- divergences
|
|
381
|
+
- strongest evidence
|
|
382
|
+
- lowest-confidence areas
|
|
383
|
+
- ranked recommendations
|
|
384
|
+
- follow-up experiments or round-two ideas
|
|
385
|
+
|
|
386
|
+
If useful, the manager may launch another research round with a refined brief. A new round may incorporate prior findings, but should still avoid biasing researchers toward one presumed answer.
|
|
387
|
+
|
|
388
|
+
## Code Mode Guidance
|
|
389
|
+
|
|
390
|
+
When using git worktrees, prefer commands like:
|
|
391
|
+
|
|
392
|
+
```bash
|
|
393
|
+
mkdir -p .research-worktrees/<experiment-slug>
|
|
394
|
+
git worktree add .research-worktrees/<experiment-slug>/researcher-1 -b research/<slug>-1
|
|
395
|
+
git worktree add .research-worktrees/<experiment-slug>/researcher-2 -b research/<slug>-2
|
|
396
|
+
git worktree add .research-worktrees/<experiment-slug>/researcher-3 -b research/<slug>-3
|
|
397
|
+
```
|
|
213
398
|
|
|
214
|
-
|
|
399
|
+
Do not delete or overwrite existing worktrees unless the user explicitly asks.
|
|
215
400
|
|
|
216
|
-
|
|
401
|
+
The scaffold script creates the experiment package and researcher folders only. It does not create git worktrees automatically. In code mode, the manager must create and verify the worktrees separately.
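
Because the worktrees are created by hand, it can help to generate the command list once and review it before running anything. The sketch below only builds command strings in the same shape as the commands shown above. It assumes the experiment folder slug and the branch slug are the same value and restricts slugs to lowercase letters, digits, and hyphens. `worktreeCommands` is a hypothetical helper.

```javascript
// Builds the shell commands as strings; it does not run git.
// Assumes one slug is used for both the folder and the branch names.
function worktreeCommands(slug, researcherCount) {
  if (!/^[a-z0-9][a-z0-9-]*$/.test(slug)) {
    throw new Error(`unsafe slug: ${slug}`);
  }
  const root = `.research-worktrees/${slug}`;
  const commands = [`mkdir -p ${root}`];
  for (let i = 1; i <= researcherCount; i += 1) {
    commands.push(
      `git worktree add ${root}/researcher-${i} -b research/${slug}-${i}`
    );
  }
  return commands;
}

// Example: two researchers for an assumed slug.
const plannedCommands = worktreeCommands("cache-tuning", 2);
console.log(plannedCommands.join("\n"));
```

Printing the commands instead of executing them keeps the manager in control and avoids touching existing worktrees by accident.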
|
|
217
402
|
|
|
218
|
-
|
|
219
|
-
- Changes tests, benchmarks, fixtures, sample inputs, or scoring without permission.
|
|
220
|
-
- Passes only because the acceptance criteria were weakened.
|
|
221
|
-
- Produces only noisy or marginal improvement below the declared quality bar.
|
|
222
|
-
- Leaves unexplained background processes or shared state.
|
|
403
|
+
## Hard Rules
|
|
223
404
|
|
|
224
|
-
|
|
405
|
+
- Do not start researchers before the manager brief is complete.
|
|
406
|
+
- Do not use the production checkout as a researcher edit workspace.
|
|
407
|
+
- Do not let researchers redefine the experiment.
|
|
408
|
+
- Do not let active researchers read each other's interim work unless the user wants collaboration.
|
|
409
|
+
- Do not confuse manager guidance with answer selection.
|
|
410
|
+
- Do not stop because of ordinary uncertainty.
|
|
411
|
+
- Do not pretend that vague findings are strong evidence.
|