scientify 1.3.0 → 1.4.0
- package/README.md +38 -14
- package/README.zh.md +38 -15
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +21 -2
- package/dist/index.js.map +1 -1
- package/dist/src/services/auto-updater.d.ts +15 -0
- package/dist/src/services/auto-updater.d.ts.map +1 -0
- package/dist/src/services/auto-updater.js +188 -0
- package/dist/src/services/auto-updater.js.map +1 -0
- package/dist/src/tools/arxiv-download.d.ts +25 -0
- package/dist/src/tools/arxiv-download.d.ts.map +1 -0
- package/dist/src/tools/arxiv-download.js +179 -0
- package/dist/src/tools/arxiv-download.js.map +1 -0
- package/dist/src/tools/{arxiv-tool.d.ts → arxiv-search.d.ts} +11 -8
- package/dist/src/tools/arxiv-search.d.ts.map +1 -0
- package/dist/src/tools/arxiv-search.js +140 -0
- package/dist/src/tools/arxiv-search.js.map +1 -0
- package/dist/src/tools/github-search-tool.d.ts +5 -1
- package/dist/src/tools/github-search-tool.d.ts.map +1 -1
- package/dist/src/tools/github-search-tool.js +10 -30
- package/dist/src/tools/github-search-tool.js.map +1 -1
- package/dist/src/tools/result.d.ts +37 -0
- package/dist/src/tools/result.d.ts.map +1 -0
- package/dist/src/tools/result.js +39 -0
- package/dist/src/tools/result.js.map +1 -0
- package/dist/src/tools/workspace.d.ts +32 -0
- package/dist/src/tools/workspace.d.ts.map +1 -0
- package/dist/src/tools/workspace.js +69 -0
- package/dist/src/tools/workspace.js.map +1 -0
- package/openclaw.plugin.json +22 -1
- package/package.json +13 -2
- package/skills/_shared/workspace-spec.md +15 -5
- package/skills/idea-generation/SKILL.md +2 -0
- package/skills/install-scientify/SKILL.md +15 -7
- package/skills/literature-survey/SKILL.md +86 -214
- package/skills/research-experiment/SKILL.md +114 -0
- package/skills/research-implement/SKILL.md +166 -0
- package/skills/research-pipeline/SKILL.md +104 -166
- package/skills/research-plan/SKILL.md +121 -0
- package/skills/research-review/SKILL.md +110 -0
- package/skills/research-survey/SKILL.md +140 -0
- package/skills/write-review-paper/SKILL.md +2 -0
- package/dist/src/tools/arxiv-tool.d.ts.map +0 -1
- package/dist/src/tools/arxiv-tool.js +0 -258
- package/dist/src/tools/arxiv-tool.js.map +0 -1
- package/skills/research-pipeline/references/prompts/implement.md +0 -135
- package/skills/research-pipeline/references/prompts/plan.md +0 -142
- package/skills/research-pipeline/references/prompts/review.md +0 -118
- package/skills/research-pipeline/references/prompts/survey.md +0 -105
- package/skills/research-pipeline/references/workspace-spec.md +0 -5
|
@@ -1,245 +1,183 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: research-pipeline
|
|
3
|
-
description: "
|
|
3
|
+
description: "Orchestrates the full research workflow by spawning sub-agents for each phase. Checks workspace state, dispatches tasks, verifies outputs. Use for: end-to-end ML research. Each phase runs in an isolated context via sessions_spawn."
|
|
4
4
|
metadata:
|
|
5
5
|
{
|
|
6
6
|
"openclaw":
|
|
7
7
|
{
|
|
8
8
|
"emoji": "🔬",
|
|
9
|
-
"requires": { "bins": ["git", "python3"] },
|
|
9
|
+
"requires": { "bins": ["git", "python3", "uv"] },
|
|
10
10
|
},
|
|
11
11
|
}
|
|
12
12
|
---
|
|
13
13
|
|
|
14
|
-
# Research Pipeline
|
|
14
|
+
# Research Pipeline (Orchestrator)
|
|
15
15
|
|
|
16
|
-
|
|
16
|
+
**Don't ask permission. Just do it.**
|
|
17
17
|
|
|
18
|
-
|
|
18
|
+
You are the orchestrator. You do not do the research work yourself; instead, you:
|
|
19
|
+
1. Check the state of the workspace files
|
|
20
|
+
2. Build the task description for the next step
|
|
21
|
+
3. Dispatch it to a sub-agent with `sessions_spawn`
|
|
22
|
+
4. Wait for completion, then verify the output
|
|
23
|
+
5. Repeat until the pipeline finishes
|
|
19
24
|
|
|
20
|
-
**
|
|
25
|
+
**Workspace:** See `../_shared/workspace-spec.md`. Set `$W` to the active project directory.
|
|
21
26
|
|
|
22
27
|
---
|
|
23
28
|
|
|
24
|
-
## Step 0:
|
|
29
|
+
## Step 0: Initialization
|
|
25
30
|
|
|
26
31
|
```bash
|
|
27
|
-
cat ~/.openclaw/workspace/projects/.active 2>/dev/null
|
|
32
|
+
ACTIVE=$(cat ~/.openclaw/workspace/projects/.active 2>/dev/null)
|
|
28
33
|
```
|
|
29
34
|
|
|
30
|
-
|
|
31
|
-
|
|
35
|
+
If there is no active project:
|
|
36
|
+
1. Ask the user: what is the research topic?
|
|
37
|
+
2. Create the project directory
|
|
38
|
+
3. Write `task.json`
|
|
32
39
|
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
## Step 1: Parse Task
|
|
36
|
-
|
|
37
|
-
Read `$WORKSPACE/task.json`. If it does not exist, ask the user for:
|
|
40
|
+
Set `$W = ~/.openclaw/workspace/projects/{project-id}`
|
|
38
41
|
|
|
39
|
-
|
|
40
|
-
- **references** (optional): ArXiv IDs or paper titles as starting points.
|
|
41
|
-
- **domain** (optional): e.g. "recommendation systems", "NLP", "computer vision".
|
|
42
|
-
|
|
43
|
-
Write the result to `$WORKSPACE/task.json`:
|
|
42
|
+
---
|
|
44
43
|
|
|
45
|
-
|
|
46
|
-
{
|
|
47
|
-
"idea": "...",
|
|
48
|
-
"references": ["2401.12345", "..."],
|
|
49
|
-
"domain": "...",
|
|
50
|
-
"date_limit": "2024-01-01"
|
|
51
|
-
}
|
|
52
|
-
```
|
|
44
|
+
## Dispatch Loop
|
|
53
45
|
|
|
54
|
-
|
|
46
|
+
Check each phase in order. **Run only one phase at a time.**
|
|
55
47
|
|
|
56
|
-
|
|
48
|
+
### Phase 1: Literature Survey
|
|
57
49
|
|
|
58
|
-
|
|
50
|
+
**Check:** Does the `$W/papers/_meta/` directory exist and contain `.json` files?
|
|
59
51
|
|
|
60
|
-
|
|
52
|
+
**If missing, spawn:**
|
|
61
53
|
|
|
62
54
|
```
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
- [repo_name](url) — stars — language — summary of relevance
|
|
55
|
+
sessions_spawn({
|
|
56
|
+
task: "Working directory: $W\nRun the /literature-survey skill\n\nResearch topic: {extracted from task.json}\nSearch for, filter, and download relevant papers into $W/papers/",
|
|
57
|
+
label: "Literature Survey"
|
|
58
|
+
})
|
|
68
59
|
```
|
|
69
60
|
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
## Step 3: Prepare References
|
|
73
|
-
|
|
74
|
-
Read `$WORKSPACE/search_results.md`. Select 3-5 of the most relevant repositories.
|
|
61
|
+
**Verify:** `ls $W/papers/_meta/*.json` lists at least 3 files
|
|
75
62
|
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
```bash
|
|
79
|
-
git clone --depth 1 <url> $WORKSPACE/repos/<repo_name>
|
|
80
|
-
```
|
|
81
|
-
|
|
82
|
-
Write a summary of selected repos and their relevance to the idea.
|
|
83
|
-
|
|
84
|
-
**Output:** `$WORKSPACE/prepare_res.md`
|
|
85
|
-
|
|
86
|
-
## Step 4: Download Papers
|
|
87
|
-
|
|
88
|
-
For each important paper from Step 2, use the `arxiv` tool with `download: true` and `output_dir: "$WORKSPACE/papers/"` to get .tex source files.
|
|
89
|
-
|
|
90
|
-
If download fails for any paper, note the failure and continue. The survey step can work with abstracts alone.
|
|
63
|
+
---
|
|
91
64
|
|
|
92
|
-
|
|
65
|
+
### Phase 2: Deep Survey
|
|
93
66
|
|
|
94
|
-
|
|
67
|
+
**Check:** Does `$W/survey_res.md` exist?
|
|
95
68
|
|
|
96
|
-
|
|
69
|
+
**If missing, first read the Phase 1 summary, then spawn:**
|
|
97
70
|
|
|
98
|
-
|
|
71
|
+
```
|
|
72
|
+
sessions_spawn({
|
|
73
|
+
task: "Working directory: $W\nRun the /research-survey skill\n\nContext: {N} papers downloaded, covering directions {directions}\nAnalyze the papers in depth, extract the formulas, and write survey_res.md",
|
|
74
|
+
label: "Deep Survey"
|
|
75
|
+
})
|
|
76
|
+
```
|
|
99
77
|
|
|
100
|
-
|
|
101
|
-
2. Extract: core method, mathematical formulas, key contributions.
|
|
102
|
-
3. Read the corresponding reference codebase in `$WORKSPACE/repos/`.
|
|
103
|
-
4. Map math formulas to code implementations.
|
|
104
|
-
5. Write structured notes to `$WORKSPACE/notes/paper_NNN.md`.
|
|
78
|
+
**Verify:** `$W/survey_res.md` exists and contains the "核心方法对比" (core method comparison) table
|
|
105
79
|
|
|
106
|
-
|
|
80
|
+
---
|
|
107
81
|
|
|
108
|
-
|
|
109
|
-
# [Paper Title]
|
|
82
|
+
### Phase 3: Implementation Plan
|
|
110
83
|
|
|
111
|
-
|
|
112
|
-
...
|
|
84
|
+
**Check:** Does `$W/plan_res.md` exist?
|
|
113
85
|
|
|
114
|
-
|
|
115
|
-
...
|
|
86
|
+
**If missing, read a summary of survey_res.md, then spawn:**
|
|
116
87
|
|
|
117
|
-
## Code Implementation
|
|
118
|
-
File: repos/<repo>/path/to/file.py
|
|
119
|
-
```python
|
|
120
|
-
# relevant code excerpt
|
|
121
88
|
```
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
89
|
+
sessions_spawn({
|
|
90
|
+
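task: "Working directory: $W\nRun the /research-plan skill\n\nContext: the survey found that the core method is {method}; recommended technical route: {route}\nWrite a complete implementation plan to plan_res.md",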
|
|
91
|
+
label: "Research Plan"
|
|
92
|
+
})
|
|
125
93
|
```
|
|
126
94
|
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
**Output:** `$WORKSPACE/notes/paper_*.md` + `$WORKSPACE/survey_res.md`
|
|
95
|
+
**Verify:** `$W/plan_res.md` exists and contains 4 sections (Dataset/Model/Training/Testing)
|
|
130
96
|
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
Read `references/prompts/plan.md` for detailed guidance.
|
|
134
|
-
|
|
135
|
-
Based on `survey_res.md`, `prepare_res.md`, and `task.json`, create a four-part plan:
|
|
136
|
-
|
|
137
|
-
1. **Dataset Plan**: data source, loading pipeline, preprocessing, dataloader design.
|
|
138
|
-
2. **Model Plan**: architecture, math formulas to implement, reference code to adapt.
|
|
139
|
-
3. **Training Plan**: loss functions, optimizer, hyperparameters, monitoring.
|
|
140
|
-
4. **Testing Plan**: metrics, evaluation protocol, baselines.
|
|
141
|
-
|
|
142
|
-
**Output:** `$WORKSPACE/plan_res.md`
|
|
97
|
+
---
|
|
143
98
|
|
|
144
|
-
|
|
99
|
+
### Phase 4: Implementation
|
|
145
100
|
|
|
146
|
-
|
|
101
|
+
**Check:** Does `$W/ml_res.md` exist?
|
|
147
102
|
|
|
148
|
-
|
|
103
|
+
**If missing, read the key points of plan_res.md, then spawn:**
|
|
149
104
|
|
|
150
105
|
```
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
|
|
154
|
-
|
|
155
|
-
testing/ # evaluation scripts
|
|
156
|
-
utils/ # shared utilities
|
|
157
|
-
run.py # main entry point
|
|
158
|
-
requirements.txt
|
|
106
|
+
sessions_spawn({
|
|
107
|
+
task: "Working directory: $W\nRun the /research-implement skill\n\nContext:\n- The plan has {N} components: {list}\n- Dataset: {dataset}\n- Framework: PyTorch\nImplement the code in $W/project/, run a 2-epoch validation, and write ml_res.md",
|
|
108
|
+
label: "Research Implement"
|
|
109
|
+
})
|
|
159
110
|
```
|
|
160
111
|
|
|
161
|
-
|
|
162
|
-
|
|
163
|
-
-
|
|
164
|
-
-
|
|
165
|
-
- Use real datasets, not toy data.
|
|
166
|
-
- First run: 2 epochs only (quick validation).
|
|
112
|
+
**Verify:**
|
|
113
|
+
- `$W/project/run.py` exists
|
|
114
|
+
- `$W/ml_res.md` contains `[RESULT]` lines
|
|
115
|
+
- Loss values are not NaN/Inf
|
|
167
116
|
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
```bash
|
|
171
|
-
cd $WORKSPACE/project && pip install -r requirements.txt && python run.py --epochs 2
|
|
172
|
-
```
|
|
117
|
+
---
|
|
173
118
|
|
|
174
|
-
|
|
119
|
+
### Phase 5: Review
|
|
175
120
|
|
|
176
|
-
|
|
121
|
+
**Check:** Is the verdict of the latest `judge_v*.md` under `$W/iterations/` PASS?
|
|
177
122
|
|
|
178
|
-
|
|
123
|
+
**If there is no PASS, spawn:**
|
|
179
124
|
|
|
180
|
-
|
|
125
|
+
```
|
|
126
|
+
sessions_spawn({
|
|
127
|
+
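task: "Working directory: $W\nRun the /research-review skill\n\nContext:\n- Implementation report: ml_res.md shows train_loss={value}\n- The plan is in plan_res.md\nReview the code; if changes are needed, iterate on fixes (at most 3 rounds)",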
|
|
128
|
+
label: "Research Review"
|
|
129
|
+
})
|
|
130
|
+
```
|
|
181
131
|
|
|
182
|
-
|
|
132
|
+
**Verify:** The latest `judge_v*.md` contains `verdict: PASS` or `verdict: BLOCKED`
|
|
183
133
|
|
|
184
|
-
|
|
185
|
-
- The plan from `plan_res.md`: are all components present?
|
|
186
|
-
- Code quality: no toy implementations, proper error handling, correct data pipeline.
|
|
134
|
+
If BLOCKED → report to the user and wait for instructions
|
|
187
135
|
|
|
188
|
-
|
|
136
|
+
---
|
|
189
137
|
|
|
190
|
-
|
|
191
|
-
# Review v1
|
|
138
|
+
### Phase 6: Full Experiment
|
|
192
139
|
|
|
193
|
-
|
|
140
|
+
**Check:** Does `$W/experiment_res.md` exist?
|
|
194
141
|
|
|
195
|
-
|
|
196
|
-
- [ ] Dataset loading matches plan
|
|
197
|
-
- [ ] Model architecture matches formulas
|
|
198
|
-
- [ ] Loss function correct
|
|
199
|
-
- [ ] Training loop proper
|
|
200
|
-
- [ ] Evaluation metrics correct
|
|
142
|
+
**If missing, spawn:**
|
|
201
143
|
|
|
202
|
-
|
|
203
|
-
|
|
204
|
-
|
|
144
|
+
```
|
|
145
|
+
sessions_spawn({
|
|
146
|
+
task: "Working directory: $W\nRun the /research-experiment skill\n\nContext:\n- Review PASS, code verified\n- plan_res.md specifies the full epochs\nRun the full training plus ablation experiments and write experiment_res.md",
|
|
147
|
+
label: "Research Experiment"
|
|
148
|
+
})
|
|
205
149
|
```
|
|
206
150
|
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
## Step 9: Iterate
|
|
210
|
-
|
|
211
|
-
If the review verdict is `NEEDS_REVISION`:
|
|
212
|
-
|
|
213
|
-
1. Read `$WORKSPACE/iterations/judge_vN.md` for the latest suggestions.
|
|
214
|
-
2. Fix each issue in `$WORKSPACE/project/`.
|
|
215
|
-
3. Re-run the 2-epoch validation.
|
|
216
|
-
4. Write a new review to `$WORKSPACE/iterations/judge_v(N+1).md`.
|
|
217
|
-
5. Repeat until `PASS` or 3 iterations reached.
|
|
151
|
+
**Verify:** `$W/experiment_res.md` contains `[RESULT]` lines and an ablation table
|
|
218
152
|
|
|
219
|
-
|
|
153
|
+
---
|
|
220
154
|
|
|
221
|
-
|
|
155
|
+
## Completion
|
|
222
156
|
|
|
223
|
-
|
|
157
|
+
Once every phase passes verification, output the final summary:
|
|
224
158
|
|
|
225
|
-
|
|
159
|
+
```
|
|
160
|
+
Research pipeline complete!
|
|
161
|
+
- Papers: {N} analyzed
|
|
162
|
+
- Code: $W/project/
|
|
163
|
+
- Results: $W/experiment_res.md
|
|
164
|
+
- Review: $W/iterations/ ({N} rounds)
|
|
165
|
+
```
|
|
226
166
|
|
|
227
|
-
|
|
228
|
-
2. Execute full training run.
|
|
229
|
-
3. Collect and analyze results.
|
|
167
|
+
---
|
|
230
168
|
|
|
231
|
-
|
|
169
|
+
## Context Bridging Rules
|
|
232
170
|
|
|
233
|
-
|
|
171
|
+
Before each spawn, the orchestrator must:
|
|
172
|
+
1. **Read** the output file of the previous step
|
|
173
|
+
2. **Summarize** the key information in 2-5 lines (do not copy the full text)
|
|
174
|
+
3. **Write** it into the "Context" section of the spawn task
|
|
234
175
|
|
|
235
|
-
|
|
176
|
+
This gives the sub-agent enough information to start without polluting its context with the full output of earlier steps.
|
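The context bridging rule can be illustrated with a minimal Python sketch. It is not part of the package: `build_spawn_task` is a hypothetical helper, the English labels are assumed renderings of the task strings shown in the phases, and the 2-5 line limit is taken from the rule itself.

```python
from typing import Sequence


def build_spawn_task(workspace: str, skill: str, context: Sequence[str]) -> str:
    """Compose a task string: working directory, skill to run, short context summary.

    Hypothetical helper; the labels are assumptions, not the package's API.
    """
    # The rule asks for a 2-5 line summary rather than the full previous output.
    if not 2 <= len(context) <= 5:
        raise ValueError("context summary should be 2-5 lines, not the full output")
    lines = [
        f"Working directory: {workspace}",
        f"Run the /{skill} skill",
        "",
        "Context:",
        *[f"- {item}" for item in context],
    ]
    return "\n".join(lines)
```

The resulting string would be passed as the `task` field of a `sessions_spawn` call; rejecting over-long summaries is one possible way to keep earlier outputs out of the sub-agent's context.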
|
236
177
|
|
|
237
178
|
## Recovery
|
|
238
179
|
|
|
239
|
-
|
|
240
|
-
|
|
241
|
-
|
|
242
|
-
|
|
243
|
-
3. Resume from the first missing output file.
|
|
244
|
-
|
|
245
|
-
Never re-do a step whose output file already exists unless the user explicitly asks.
|
|
180
|
+
If the orchestrator is interrupted:
|
|
181
|
+
1. Re-run /research-pipeline
|
|
182
|
+
2. The orchestrator automatically checks all files and skips completed phases
|
|
183
|
+
3. It resumes from the first missing output file
|
|
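The dispatch and recovery logic of the research-pipeline skill amounts to "find the first phase whose output is missing". The Python sketch below shows that idea under stated assumptions: `next_phase` and `review_passed` are hypothetical names, the file paths and labels come from the phase descriptions, and a `BLOCKED` verdict (which the skill says should be reported to the user) is left for the caller to handle.

```python
import re
from pathlib import Path
from typing import Optional


def review_passed(workspace: Path) -> bool:
    """Return True if the newest judge_v*.md under iterations/ records a PASS verdict."""
    def version(path: Path) -> int:
        match = re.search(r"judge_v(\d+)\.md$", path.name)
        return int(match.group(1)) if match else -1

    # Sort numerically so judge_v10.md would rank after judge_v2.md.
    judges = sorted((workspace / "iterations").glob("judge_v*.md"), key=version)
    if not judges:
        return False
    text = judges[-1].read_text(encoding="utf-8")
    # Case-insensitive: the pipeline text writes `verdict: PASS`, while the
    # review template uses a `## Verdict:` heading.
    return re.search(r"verdict:\s*PASS\b", text, re.IGNORECASE) is not None


def next_phase(workspace: Path) -> Optional[str]:
    """Return the label of the first phase whose output is missing, or None if all are done."""
    meta = workspace / "papers" / "_meta"
    checks = [
        ("Literature Survey", meta.is_dir() and any(meta.glob("*.json"))),
        ("Deep Survey", (workspace / "survey_res.md").is_file()),
        ("Research Plan", (workspace / "plan_res.md").is_file()),
        ("Research Implement", (workspace / "ml_res.md").is_file()),
        ("Research Review", review_passed(workspace)),
        ("Research Experiment", (workspace / "experiment_res.md").is_file()),
    ]
    for label, done in checks:
        if not done:
            return label
    return None
```

Because the function only inspects files, calling it again after an interruption naturally skips completed phases, which is the behaviour the Recovery section describes.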
@@ -0,0 +1,121 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: research-plan
|
|
3
|
+
description: "Create a structured implementation plan from survey results. Produces dataset/model/training/testing plans. Requires survey_res.md from /research-survey."
|
|
4
|
+
metadata:
|
|
5
|
+
{
|
|
6
|
+
"openclaw":
|
|
7
|
+
{
|
|
8
|
+
"emoji": "📋",
|
|
9
|
+
},
|
|
10
|
+
}
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Research Plan
|
|
14
|
+
|
|
15
|
+
**Don't ask permission. Just do it.**
|
|
16
|
+
|
|
17
|
+
**Workspace:** See `../_shared/workspace-spec.md`. Set `$W` to the active project directory.
|
|
18
|
+
|
|
19
|
+
## Prerequisites
|
|
20
|
+
|
|
21
|
+
| File | Source |
|
|
22
|
+
|------|--------|
|
|
23
|
+
| `$W/task.json` | /research-pipeline or user |
|
|
24
|
+
| `$W/survey_res.md` | /research-survey |
|
|
25
|
+
| `$W/notes/paper_*.md` | /research-survey |
|
|
26
|
+
| `$W/repos/` (optional) | git clone |
|
|
27
|
+
|
|
28
|
+
**If `survey_res.md` is missing, STOP:** "Run /research-survey first to complete the deep analysis"
|
|
29
|
+
|
|
30
|
+
## Output
|
|
31
|
+
|
|
32
|
+
| File | Content |
|
|
33
|
+
|------|---------|
|
|
34
|
+
| `$W/plan_res.md` | Four-part implementation plan |
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
## Workflow
|
|
39
|
+
|
|
40
|
+
### Step 1: Read the context
|
|
41
|
+
|
|
42
|
+
Read the following files to understand the research goal and the technical approach:
|
|
43
|
+
- `$W/task.json`: research goal
|
|
44
|
+
- `$W/survey_res.md`: recommended technical route and core formulas
|
|
45
|
+
- Browse the directory structure of `$W/repos/` (if present)
|
|
46
|
+
|
|
47
|
+
### Step 2: Draft the four-part plan
|
|
48
|
+
|
|
49
|
+
Write to `$W/plan_res.md`:
|
|
50
|
+
|
|
51
|
+
```markdown
|
|
52
|
+
# Implementation Plan
|
|
53
|
+
|
|
54
|
+
## 1. Dataset Plan
|
|
55
|
+
|
|
56
|
+
- **Dataset name:** {name}
|
|
57
|
+
- **Source:** {URL or description}
|
|
58
|
+
- **Size:** {samples / size}
|
|
59
|
+
- **Preprocessing steps:**
|
|
60
|
+
1. {step}
|
|
61
|
+
2. {step}
|
|
62
|
+
- **DataLoader design:**
|
|
63
|
+
- batch_size: {value}
|
|
64
|
+
- Input format: {shape}
|
|
65
|
+
- Output format: {shape}
|
|
66
|
+
|
|
67
|
+
## 2. Model Plan
|
|
68
|
+
|
|
69
|
+
- **Architecture overview:** {1-2 sentences}
|
|
70
|
+
- **Component list:**
|
|
71
|
+
|
|
72
|
+
| Component | Formula | Reference code | Input → Output |
|
|
73
|
+
|------|----------|----------|-------------|
|
|
74
|
+
| {component} | $formula$ | `repos/xxx/file.py` | {shape} → {shape} |
|
|
75
|
+
|
|
76
|
+
- **Estimated parameter count:** {approximate}
|
|
77
|
+
|
|
78
|
+
## 3. Training Plan
|
|
79
|
+
|
|
80
|
+
- **Loss function:** {formula + description}
|
|
81
|
+
- **Optimizer:** {Adam/SGD/...}, lr={value}
|
|
82
|
+
- **Scheduler:** {if any}
|
|
83
|
+
- **Training parameters:**
|
|
84
|
+
- epochs (validation): 2
|
|
85
|
+
- epochs (full): {value}
|
|
86
|
+
- batch_size: {value}
|
|
87
|
+
- **Monitored metrics:** {loss, metrics to log}
|
|
88
|
+
|
|
89
|
+
## 4. Testing Plan
|
|
90
|
+
|
|
91
|
+
- **Evaluation metrics:**
|
|
92
|
+
|
|
93
|
+
| Metric | Formula/Description | Expected range |
|
|
94
|
+
|--------|-----------|----------|
|
|
95
|
+
| {metric} | {description} | {range} |
|
|
96
|
+
|
|
97
|
+
- **Baselines:** {what to compare against}
|
|
98
|
+
- **Ablation experiments (preliminary plan):**
|
|
99
|
+
1. {ablation 1}
|
|
100
|
+
2. {ablation 2}
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
### Step 3: Self-check
|
|
104
|
+
|
|
105
|
+
Verify that the plan is complete:
|
|
106
|
+
- [ ] Every model component has a corresponding formula
|
|
107
|
+
- [ ] The dataset has a concrete way to obtain it
|
|
108
|
+
- [ ] The loss function has a mathematical definition
|
|
109
|
+
- [ ] The evaluation metrics are clearly defined
|
|
110
|
+
- [ ] The training parameters are reasonable (for example, not lr=0.1 for Adam)
|
|
111
|
+
|
|
112
|
+
Mark any uncertain item in the plan with `⚠️ TODO: {reason}`
|
|
113
|
+
|
|
114
|
+
---
|
|
115
|
+
|
|
116
|
+
## Rules
|
|
117
|
+
|
|
118
|
+
1. Every component in the plan must be traceable to a formula or method in survey_res.md
|
|
119
|
+
2. Do not write a "generic" plan: every parameter needs a concrete value or a reasoned estimate
|
|
120
|
+
3. If reference repositories exist, the component table must include the reference code paths
|
|
121
|
+
4. plan_res.md counts as complete when all four parts exist and are non-empty
|
|
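The completion rule for `plan_res.md` (all four parts present and non-empty) could be checked mechanically. The following Python sketch is one way to do that, assuming the plan uses the level-2 headings from the template in the research-plan skill; `missing_plan_sections` is a hypothetical helper, not something shipped in the package.

```python
import re

# Section names taken from the plan template's level-2 headings.
REQUIRED_SECTIONS = ("Dataset Plan", "Model Plan", "Training Plan", "Testing Plan")


def missing_plan_sections(plan_text: str) -> list[str]:
    """Return the required sections that are absent or have an empty body."""
    # Splitting on level-2 headings with a capture group yields
    # [preamble, heading, body, heading, body, ...].
    parts = re.split(r"^##[ \t]+(.+?)[ \t]*$", plan_text, flags=re.MULTILINE)
    bodies = {}
    for heading, body in zip(parts[1::2], parts[2::2]):
        bodies[heading] = body.strip()
    missing = []
    for name in REQUIRED_SECTIONS:
        filled = any(name in heading and body for heading, body in bodies.items())
        if not filled:
            missing.append(name)
    return missing
```

An empty return value corresponds to the "four parts exist and are non-empty" condition; it says nothing about whether the values inside each part are sensible, which the self-check list still has to cover.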
@@ -0,0 +1,110 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: research-review
|
|
3
|
+
description: "Review ML implementation against plan and survey. Iterates fix-rerun-review up to 3 times. Requires ml_res.md from /research-implement."
|
|
4
|
+
metadata:
|
|
5
|
+
{
|
|
6
|
+
"openclaw":
|
|
7
|
+
{
|
|
8
|
+
"emoji": "🔍",
|
|
9
|
+
"requires": { "bins": ["python3", "uv"] },
|
|
10
|
+
},
|
|
11
|
+
}
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
# Research Review
|
|
15
|
+
|
|
16
|
+
**Don't ask permission. Just do it.**
|
|
17
|
+
|
|
18
|
+
**Workspace:** See `../_shared/workspace-spec.md`. Set `$W` to the active project directory.
|
|
19
|
+
|
|
20
|
+
## Prerequisites
|
|
21
|
+
|
|
22
|
+
| File | Source |
|
|
23
|
+
|------|--------|
|
|
24
|
+
| `$W/ml_res.md` | /research-implement |
|
|
25
|
+
| `$W/project/` | /research-implement |
|
|
26
|
+
| `$W/plan_res.md` | /research-plan |
|
|
27
|
+
| `$W/survey_res.md` | /research-survey |
|
|
28
|
+
|
|
29
|
+
**If `ml_res.md` is missing, STOP:** "Run /research-implement first to complete the code implementation"
|
|
30
|
+
|
|
31
|
+
## Output
|
|
32
|
+
|
|
33
|
+
| File | Content |
|
|
34
|
+
|------|---------|
|
|
35
|
+
| `$W/iterations/judge_v{N}.md` | Review report for each round |
|
|
36
|
+
|
|
37
|
+
`verdict: PASS` in the final report means the review passed.
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
## Workflow
|
|
42
|
+
|
|
43
|
+
### Step 1: Review the code
|
|
44
|
+
|
|
45
|
+
Read the following:
|
|
46
|
+
- `$W/plan_res.md`: what is expected of each component
|
|
47
|
+
- `$W/survey_res.md`: core formulas
|
|
48
|
+
- `$W/project/`: the actual code
|
|
49
|
+
- `$W/ml_res.md`: execution results
|
|
50
|
+
|
|
51
|
+
### Step 2: Check item by item
|
|
52
|
+
|
|
53
|
+
| Check | Method |
|
|
54
|
+
|--------|------|
|
|
55
|
+
| Data pipeline matches the plan | Compare the plan's Dataset Plan with the `data/` implementation |
|
|
56
|
+
| Model architecture matches the formulas | Compare the survey formulas with the `model/` implementation |
|
|
57
|
+
| Loss function is correct | Compare the plan's Training Plan with `training/loss.py` |
|
|
58
|
+
| Evaluation metrics are correct | Compare the plan's Testing Plan with `testing/` |
|
|
59
|
+
| [RESULT] lines exist | Check where the values in ml_res.md come from |
|
|
60
|
+
| Loss is reasonable | Not NaN/Inf, with a downward trend |
|
|
61
|
+
| No mock data (unless declared) | Search for `# MOCK DATA` comments |
|
|
62
|
+
|
|
63
|
+
### Step 3: Write the review report
|
|
64
|
+
|
|
65
|
+
Write to `$W/iterations/judge_v1.md`:
|
|
66
|
+
|
|
67
|
+
```markdown
|
|
68
|
+
# Review v1
|
|
69
|
+
|
|
70
|
+
## Verdict: PASS / NEEDS_REVISION
|
|
71
|
+
|
|
72
|
+
## Checklist
|
|
73
|
+
- [x/✗] Dataset loading matches plan
|
|
74
|
+
- [x/✗] Model architecture matches formulas
|
|
75
|
+
- [x/✗] Loss function correct
|
|
76
|
+
- [x/✗] Training loop proper
|
|
77
|
+
- [x/✗] Evaluation metrics correct
|
|
78
|
+
- [x/✗] Results are from real execution (not fabricated)
|
|
79
|
+
|
|
80
|
+
## Issues (if NEEDS_REVISION)
|
|
81
|
+
1. **{issue}**: {description} → **Fix**: {specific fix instruction}
|
|
82
|
+
2. ...
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
### Step 4: Iterate (if NEEDS_REVISION)
|
|
86
|
+
|
|
87
|
+
Loop at most 3 times:
|
|
88
|
+
|
|
89
|
+
1. Read the suggested changes in `judge_v{N}.md`
|
|
90
|
+
2. Modify the code in `$W/project/`
|
|
91
|
+
3. Re-run:
|
|
92
|
+
```bash
|
|
93
|
+
cd $W/project && source .venv/bin/activate && python run.py --epochs 2
|
|
94
|
+
```
|
|
95
|
+
4. Read the execution output and verify the fix
|
|
96
|
+
5. Write `judge_v{N+1}.md`
|
|
97
|
+
6. If PASS → stop; otherwise continue
|
|
98
|
+
|
|
99
|
+
### Step 5: Final verdict
|
|
100
|
+
|
|
101
|
+
Still NEEDS_REVISION after 3 rounds → list the remaining issues in the last judge file, mark it `verdict: BLOCKED`, and wait for the user to step in.
|
|
102
|
+
|
|
103
|
+
---
|
|
104
|
+
|
|
105
|
+
## Rules
|
|
106
|
+
|
|
107
|
+
1. The review must check each item against the plan, not just that "the code runs"
|
|
108
|
+
2. Every issue must come with a specific fix instruction (not "please improve")
|
|
109
|
+
3. Verifying a fix requires re-running the code and checking the output
|
|
110
|
+
4. PASS requires that all checklist items pass and the [RESULT] values are reasonable
|
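The research-review checklist asks that the loss be free of NaN/Inf and show a downward trend. A minimal Python sketch of that check follows, with two assumptions flagged: the skill does not say how loss values are extracted from `ml_res.md`, so the function takes them as a plain sequence, and "downward trend" is read here as "last value below first value". `loss_is_reasonable` is a hypothetical name.

```python
import math
from typing import Sequence


def loss_is_reasonable(losses: Sequence[float]) -> bool:
    """True if every loss is finite and the last value is below the first.

    Assumption: "downward trend" means last < first; the skill does not define it.
    """
    if len(losses) < 2:
        return False  # a single value cannot show a trend
    if not all(math.isfinite(x) for x in losses):
        return False  # rejects NaN and +/-Inf
    return losses[-1] < losses[0]
```

With a 2-epoch validation run this reduces to comparing the two epoch losses; a stricter reviewer might require a monotone decrease or a minimum relative drop instead.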