scientify 1.3.0 → 1.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +38 -14
- package/README.zh.md +38 -15
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +21 -2
- package/dist/index.js.map +1 -1
- package/dist/src/services/auto-updater.d.ts +15 -0
- package/dist/src/services/auto-updater.d.ts.map +1 -0
- package/dist/src/services/auto-updater.js +188 -0
- package/dist/src/services/auto-updater.js.map +1 -0
- package/dist/src/tools/arxiv-download.d.ts +25 -0
- package/dist/src/tools/arxiv-download.d.ts.map +1 -0
- package/dist/src/tools/arxiv-download.js +179 -0
- package/dist/src/tools/arxiv-download.js.map +1 -0
- package/dist/src/tools/{arxiv-tool.d.ts → arxiv-search.d.ts} +11 -8
- package/dist/src/tools/arxiv-search.d.ts.map +1 -0
- package/dist/src/tools/arxiv-search.js +140 -0
- package/dist/src/tools/arxiv-search.js.map +1 -0
- package/dist/src/tools/github-search-tool.d.ts +5 -1
- package/dist/src/tools/github-search-tool.d.ts.map +1 -1
- package/dist/src/tools/github-search-tool.js +10 -30
- package/dist/src/tools/github-search-tool.js.map +1 -1
- package/dist/src/tools/result.d.ts +37 -0
- package/dist/src/tools/result.d.ts.map +1 -0
- package/dist/src/tools/result.js +39 -0
- package/dist/src/tools/result.js.map +1 -0
- package/dist/src/tools/workspace.d.ts +32 -0
- package/dist/src/tools/workspace.d.ts.map +1 -0
- package/dist/src/tools/workspace.js +69 -0
- package/dist/src/tools/workspace.js.map +1 -0
- package/openclaw.plugin.json +22 -1
- package/package.json +13 -2
- package/skills/_shared/workspace-spec.md +15 -5
- package/skills/idea-generation/SKILL.md +2 -0
- package/skills/install-scientify/SKILL.md +15 -7
- package/skills/literature-survey/SKILL.md +86 -214
- package/skills/research-experiment/SKILL.md +114 -0
- package/skills/research-implement/SKILL.md +166 -0
- package/skills/research-pipeline/SKILL.md +104 -166
- package/skills/research-plan/SKILL.md +121 -0
- package/skills/research-review/SKILL.md +110 -0
- package/skills/research-survey/SKILL.md +140 -0
- package/skills/write-review-paper/SKILL.md +2 -0
- package/dist/src/tools/arxiv-tool.d.ts.map +0 -1
- package/dist/src/tools/arxiv-tool.js +0 -258
- package/dist/src/tools/arxiv-tool.js.map +0 -1
- package/skills/research-pipeline/references/prompts/implement.md +0 -135
- package/skills/research-pipeline/references/prompts/plan.md +0 -142
- package/skills/research-pipeline/references/prompts/review.md +0 -118
- package/skills/research-pipeline/references/prompts/survey.md +0 -105
- package/skills/research-pipeline/references/workspace-spec.md +0 -5
@@ -1,6 +1,6 @@
 ---
 name: install-scientify
-description: "Install Scientify - AI-powered research workflow automation plugin. Adds skills for
+description: "Install Scientify - AI-powered research workflow automation plugin. Adds skills for research-pipeline (multi-agent orchestrator), literature-survey, idea-generation, arxiv tools, and workspace management commands."
 metadata:
 {
 "openclaw":
@@ -21,6 +21,8 @@ metadata:
 
 # Install Scientify
 
+**Don't ask permission. Just do it.**
+
 **Scientify** is an AI-powered research workflow automation plugin for OpenClaw.
 
 ## What You Get
@@ -29,10 +31,14 @@ metadata:
 
 | Skill | Description |
 |-------|-------------|
-| **
-| **research-
-| **
-| **
+| **research-pipeline** | Orchestrator for end-to-end ML research. Spawns sub-agents for each phase. |
+| **research-survey** | Deep analysis of papers: extract formulas, produce method comparison. |
+| **research-plan** | 4-part implementation plan (Dataset/Model/Training/Testing). |
+| **research-implement** | Implement ML code, run 2-epoch validation with `uv` venv isolation. |
+| **research-review** | Review implementation against plan. Iterates up to 3 times. |
+| **research-experiment** | Full training + ablation experiments. |
+| **literature-survey** | Literature survey: search → filter → download → cluster → report. |
+| **idea-generation** | Generate research ideas from arXiv/GitHub papers. |
 
 ### Commands (Direct, no LLM)
 
@@ -45,9 +51,11 @@ metadata:
 | `/project-switch <id>` | Switch project |
 | `/project-delete <id>` | Delete project |
 
-###
+### Tools
 
-- **
+- **arxiv_search** - Search arXiv.org API for papers (metadata only)
+- **arxiv_download** - Download arXiv papers (.tex source or PDF)
+- **github_search** - Search GitHub repositories
 
 ## Installation
 
@@ -1,6 +1,6 @@
 ---
 name: literature-survey
-description: "Comprehensive literature survey
+description: "Comprehensive literature survey. Searches, filters, downloads, and clusters papers by research direction."
 metadata:
 {
 "openclaw":
@@ -12,263 +12,135 @@ metadata:
 
 # Literature Survey
 
-
+**Don't ask permission. Just do it.**
 
-
-
-## Architecture: Isolated Sub-agent
-
-This survey runs in an **isolated sub-session** to avoid context pollution. The main session only receives the final report.
+## Output Structure
 
 ```
-
-
-
-
-
-├──
-├──
-
-
-├── Phase 5: Iterative discovery
-└── Phase 6: Generate report
-    ↓
-Return to main session: summary + file paths
+~/.openclaw/workspace/projects/{project-id}/
+├── survey/
+│   ├── search_terms.json   # list of search terms
+│   └── report.md           # final report
+└── papers/
+    ├── _downloads/         # raw downloads
+    ├── _meta/              # per-paper metadata
+    │   └── {arxiv_id}.json
+    └── {direction}/        # organized categories
 ```
 
 ---
 
-##
-
-**Step 1: Spawn isolated sub-agent**
+## Workflow
 
-
-- "Survey the literature in the [topic] field"
-- "Help me collect papers related to [topic]"
-- "Survey papers on [topic]"
-
-Use `sessions_spawn` to run the survey in isolation:
+### Phase 1: Preparation
 
+```bash
+ACTIVE=$(cat ~/.openclaw/workspace/projects/.active 2>/dev/null)
+if [ -z "$ACTIVE" ]; then
+  PROJECT_ID="<topic-slug>"
+  mkdir -p ~/.openclaw/workspace/projects/$PROJECT_ID/{survey,papers/_downloads,papers/_meta}
+  echo "$PROJECT_ID" > ~/.openclaw/workspace/projects/.active
+fi
+PROJECT_DIR="$HOME/.openclaw/workspace/projects/$(cat ~/.openclaw/workspace/projects/.active)"
 ```
-sessions_spawn({
-  task: `You are a literature-survey expert. Run a complete literature survey for the research topic "{TOPIC}".
-
-## Survey Goals
-{USER_REQUIREMENTS}
-
-## Execution Flow
-
-### Phase 1: Generate search terms
-Based on the research topic, generate 8-15 search-term combinations covering:
-- different phrasings of the core concepts
-- related technical methods
-- application scenarios
-- English and Chinese keywords (where applicable)
-
-Save the search terms to $WORKSPACE/survey/search_terms.json
-
-### Phase 2: Batch retrieval
-For each search term, use the arxiv_search tool:
-- max_results: 30-50 per query
-- merge and deduplicate (by arxiv_id)
-- record the source search term for each paper
 
-
+Generate 4-8 search terms and save them to `survey/search_terms.json`.
 
-
-Read each paper's title and abstract and judge its relevance to "{TOPIC}":
-- 5: highly relevant, directly studies this topic
-- 4: relevant, involves a key method or application
-- 3: partially relevant, useful as a reference
-- 2: marginally relevant
-- 1: not relevant
-
-Keep papers with score >= 4.
-Save the filtered results to $WORKSPACE/survey/filtered_papers.json
-
-### Phase 4: Clustering
-Analyze the abstracts of the filtered papers and identify 3-6 research directions/subtopics.
-Create a subfolder for each direction and assign papers:
-
-$WORKSPACE/papers/
-├── {direction-1}/
-│   ├── paper_list.md
-│   └── [arxiv_ids...]
-├── {direction-2}/
-│   └── ...
-└── uncategorized/
+---
 
-
+### Phase 2: Incremental search-filter-download (loop)
 
-
-Inspect the abstracts of high-scoring papers and identify:
-- newly mentioned method names
-- important cited works
-- new keywords
+**Repeat the following steps for each search term**:
 
-
-At most 2 iterations.
+#### 2.1 Search
 
-
-
+```
+arxiv_search({ query: "<term>", max_results: 30 })
+```
 
-
+#### 2.2 Immediate filtering
 
-
-- number of search terms: X
-- initial retrieval: Y papers
-- after filtering: Z papers
-- research directions: N
+Score the returned papers **immediately** (1-5) and keep only those scoring ≥4.
 
-
+Scoring rubric:
+- 5: core paper, directly studies the topic
+- 4: related method or application
+- 3 or below: skip
 
-
-- number of papers: X
-- representative works: [list]
-- main characteristics: [description]
+#### 2.3 Download useful papers
 
-
-
+```
+arxiv_download({
+  arxiv_ids: ["<useful paper IDs>"],
+  output_dir: "$PROJECT_DIR/papers/_downloads"
+})
+```
 
-
-| Rank | Title | Year | Relevance | Direction |
-|-----|------|-----|-------|-----|
-| 1 | ... | ... | 5 | ... |
+#### 2.4 Write metadata
 
-
-[observations based on the papers' year distribution]
+Create a metadata file `papers/_meta/{arxiv_id}.json` for each downloaded paper:
 
-
-
+```json
+{
+  "arxiv_id": "2401.12345",
+  "title": "...",
+  "abstract": "...",
+  "score": 5,
+  "source_term": "battery RUL prediction",
+  "downloaded_at": "2024-01-15T10:00:00Z"
+}
+```
 
-
-1. [introductory papers]
-2. [core method papers]
-3. [latest advances]
+**Finish one search term before starting the next.** This keeps the context from being polluted by bulk search results.
 
 ---
 
-
-- total number of papers found
-- identified research directions
-- report file location`,
-  label: "literature-survey-{TOPIC_SLUG}",
-  runTimeoutSeconds: 900,
-  cleanup: "keep"
-})
-```
-
-**Step 2: Wait and relay results**
-
-After the sub-agent finishes, it automatically announces the result to the main session.
-Show the user a summary of the results, including:
-- number of papers found
-- main research directions
-- report file location
+### Phase 3: Organize into categories
 
-
+After all search terms have been processed:
 
-
+#### 3.1 Read all metadata
 
-```
-
-├── project.json
-├── survey/                  # survey process data
-│   ├── search_terms.json    # list of search terms
-│   ├── raw_results.json     # raw search results
-│   ├── filtered_papers.json # filtered papers
-│   ├── clusters.json        # clustering results
-│   ├── iterations.log       # iteration log
-│   └── report.md            # final report
-├── papers/                  # papers organized by direction
-│   ├── {direction-1}/
-│   │   ├── paper_list.md
-│   │   └── 2401.12345/      # .tex sources
-│   ├── {direction-2}/
-│   └── uncategorized/
-└── ideas/                   # later idea-generation output
+```bash
+ls $PROJECT_DIR/papers/_meta/
 ```
 
-
+Read all the `.json` files and compile the paper list.
 
-
+#### 3.2 Clustering
 
-
-```json
-{
-  "topic": "battery life prediction",
-  "generated_at": "2024-01-15T10:00:00Z",
-  "terms": [
-    {"term": "battery remaining useful life", "category": "core"},
-    {"term": "lithium-ion degradation prediction", "category": "method"},
-    {"term": "SOH estimation neural network", "category": "technique"},
-    {"term": "EV battery health monitoring", "category": "application"}
-  ]
-}
-```
+Identify 3-6 research directions from the papers' titles, abstracts, and source search terms.
 
-
-```json
-{
-  "filtered_at": "2024-01-15T10:30:00Z",
-  "total_raw": 245,
-  "total_filtered": 42,
-  "papers": [
-    {
-      "arxiv_id": "2401.12345",
-      "title": "...",
-      "abstract": "...",
-      "authors": ["..."],
-      "published": "2024-01-15",
-      "relevance_score": 5,
-      "source_terms": ["battery RUL", "degradation prediction"],
-      "notes": "directly studies lithium-battery RUL prediction"
-    }
-  ]
-}
-```
+#### 3.3 Create folders and move papers
 
-
-
-
-  "clustered_at": "2024-01-15T11:00:00Z",
-  "clusters": [
-    {
-      "id": "data-driven",
-      "name": "data-driven methods",
-      "description": "methods using machine learning / deep learning",
-      "paper_count": 15,
-      "paper_ids": ["2401.12345", "2401.12346", "..."],
-      "keywords": ["LSTM", "CNN", "transformer", "neural network"]
-    },
-    {
-      "id": "physics-based",
-      "name": "physics-based methods",
-      "description": "methods based on electrochemical mechanisms",
-      "paper_count": 8,
-      "paper_ids": ["..."]
-    }
-  ]
-}
+```bash
+mkdir -p "$PROJECT_DIR/papers/data-driven"
+mv "$PROJECT_DIR/papers/_downloads/2401.12345" "$PROJECT_DIR/papers/data-driven/"
 ```
 
 ---
 
-
+### Phase 4: Generate the report
 
-
+Create `survey/report.md` with:
+- survey summary (number of search terms, papers, directions)
+- overview of each research direction
+- Top 10 papers
+- suggested reading order
 
-
+---
 
-
-  → Skip iteration
-  → Generate simplified report
+## Key Design
 
-
+| Principle | Explanation |
+|------|------|
+| **Incremental processing** | Each search term independently completes search → filter → download → metadata, avoiding context bloat |
+| **Metadata-driven** | Classification is based on `_meta/*.json`, not on large in-memory lists |
+| **Folders are the taxonomy** | Clustering is expressed via `papers/{direction}/`; no extra JSON needed |
 
-##
+## Tools
 
-
-
-
-
+| Tool | Purpose |
+|------|---------|
+| `arxiv_search` | Search papers (no side effects) |
+| `arxiv_download` | Download .tex/.pdf (requires absolute paths) |
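The Phase 3 flow added above (read `_meta/*.json`, pick a direction, move the download folder) can be sketched in Python. This is an illustrative sketch only: `organize_papers` and its keyword-based direction choice are hypothetical helpers, not part of the package; in the skill itself, the agent assigns directions by reading titles and abstracts.

```python
import json
import shutil
from pathlib import Path

def organize_papers(project_dir: str,
                    direction_keywords: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group downloaded papers into papers/{direction}/ folders based on
    their papers/_meta/*.json metadata (simple title/abstract keyword match)."""
    root = Path(project_dir)
    meta_dir = root / "papers" / "_meta"
    downloads = root / "papers" / "_downloads"
    assignments: dict[str, list[str]] = {}

    for meta_file in sorted(meta_dir.glob("*.json")):
        meta = json.loads(meta_file.read_text())
        text = (meta.get("title", "") + " " + meta.get("abstract", "")).lower()
        # First direction whose keywords appear in title/abstract wins;
        # everything else falls back to "uncategorized".
        direction = next(
            (d for d, kws in direction_keywords.items()
             if any(kw.lower() in text for kw in kws)),
            "uncategorized",
        )
        dest = root / "papers" / direction
        dest.mkdir(parents=True, exist_ok=True)
        src = downloads / meta["arxiv_id"]
        if src.exists():
            shutil.move(str(src), str(dest / meta["arxiv_id"]))
        assignments.setdefault(direction, []).append(meta["arxiv_id"])
    return assignments
```

A real run would derive `direction_keywords` from the clustering step rather than hard-coding it.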
@@ -0,0 +1,114 @@
+---
+name: research-experiment
+description: "Full training run + ablation experiments + result analysis. Requires review PASS from /research-review."
+metadata:
+{
+  "openclaw":
+  {
+    "emoji": "🧪",
+    "requires": { "bins": ["python3", "uv"] },
+  },
+}
+---
+
+# Research Experiment
+
+**Don't ask permission. Just do it.**
+
+**Workspace:** See `../_shared/workspace-spec.md`. Set `$W` to the active project directory.
+
+## Prerequisites
+
+| File | Source |
+|------|--------|
+| `$W/project/` | /research-implement |
+| `$W/plan_res.md` | /research-plan |
+| `$W/iterations/judge_v*.md` | /research-review (the latest verdict must be PASS) |
+
+**Verify PASS:** Read the latest `judge_v*.md` and confirm `verdict: PASS`. If not, STOP.
+
+## Output
+
+| File | Content |
+|------|---------|
+| `$W/experiment_res.md` | Full experiment report |
+
+---
+
+## Workflow
+
+### Step 1: Full Training
+
+Change the epoch count to the official value specified in plan_res.md. **Do not change the code logic, only the epochs.**
+
+```bash
+cd $W/project && source .venv/bin/activate
+python run.py  # full epochs
+```
+
+Record the `[RESULT]` output of the full training run.
+
+### Step 2: Analyze Results
+
+Read the training output and assess:
+- final loss and metrics
+- training-curve trend (does the loss keep decreasing)
+- overfitting (train vs val gap)
+
+### Step 3: Ablation Studies
+
+Following the ablation plan in plan_res.md, run 2-3 ablation experiments.
+
+For each ablation:
+1. Modify the code (comment out / replace the component)
+2. Run a 2-epoch quick validation
+3. Record the result
+
+```bash
+# Example: remove the attention module
+python run.py --epochs 2 --ablation no_attention
+```
+
+### Step 4: Write the Experiment Report
+
+Write `$W/experiment_res.md`:
+
+```markdown
+# Experiment Report
+
+## Full Training Results (from execution log)
+- Epochs: {N}
+- [RESULT] train_loss={value}
+- [RESULT] val_metric={value}
+- [RESULT] elapsed={value}
+- [RESULT] device={device}
+
+> The values above come from real execution output.
+
+## Training Analysis
+- Convergence: {converged / still improving / diverged}
+- Overfitting: {yes/no, evidence}
+
+## Ablation Studies
+
+| Experiment | Change | val_metric | vs Full |
+|------|------|-----------|---------|
+| Full model | — | {value} | baseline |
+| No {component} | removed {X} | {value} | {-/+}% |
+| ... | ... | ... | ... |
+
+## Conclusions
+- {key findings}
+
+## Limitations
+- {limitations and future work}
+```
+
+---
+
+## Rules
+
+1. Full training changes only the epoch count, never the code logic
+2. All numbers must come from real execution output
+3. Run at least 2 ablation experiments
+4. If full training fails (OOM etc.), adjust batch_size and retry; do not skip it
@@ -0,0 +1,166 @@
+---
+name: research-implement
+description: "Implement ML code from plan, run 2-epoch validation, verify real results. Requires plan_res.md from /research-plan."
+metadata:
+{
+  "openclaw":
+  {
+    "emoji": "💻",
+    "requires": { "bins": ["python3", "uv"] },
+  },
+}
+---
+
+# Research Implement
+
+**Don't ask permission. Just do it.**
+
+**Workspace:** See `../_shared/workspace-spec.md`. Set `$W` to the active project directory.
+
+## Prerequisites
+
+| File | Source |
+|------|--------|
+| `$W/plan_res.md` | /research-plan |
+| `$W/survey_res.md` | /research-survey |
+| `$W/repos/` (optional) | reference code |
+
+**If `plan_res.md` is missing, STOP:** "Run /research-plan first to produce the implementation plan"
+
+## Output
+
+| File | Content |
+|------|---------|
+| `$W/project/` | complete runnable code |
+| `$W/ml_res.md` | implementation report (with real execution results) |
+
+---
+
+## Workflow
+
+### Step 1: Read the Plan
+
+Read `$W/plan_res.md` and extract:
+- the full component list
+- dataset information
+- training parameters
+
+### Step 2: Create the Project Structure
+
+```
+$W/project/
+  model/        # model components (one file per component)
+  data/         # data loading
+  training/     # training loop + loss
+  testing/      # evaluation
+  utils/        # utilities
+  run.py        # entry point (must print [RESULT] lines)
+  requirements.txt
+```
+
+### Step 3: Implement the Code
+
+Implement in this order (validate immediately after each step):
+
+**3a. requirements.txt** — list all dependencies, pin major versions
+
+**3b. Data pipeline**
+```bash
+cd $W/project && uv venv .venv && source .venv/bin/activate
+uv pip install -r requirements.txt
+python -c "from data.dataset import *; print('data OK')"
+```
+Validate: imports raise no errors
+
+**3c. Model architecture**
+```bash
+python -c "from model import *; import torch; x = torch.randn(2, ...); print(model(x).shape)"
+```
+Validate: output shape is correct
+
+**3d. Loss + training loop**
+
+**3e. Evaluation logic**
+
+**3f. run.py** — must include:
+```python
+print(f"[RESULT] train_loss={train_loss:.6f}")
+print(f"[RESULT] val_metric={val_metric:.6f}")
+print(f"[RESULT] elapsed={elapsed:.1f}s")
+print(f"[RESULT] device={device}")
+```
+
+### Step 4: Environment Setup + Execution
+
+```bash
+cd $W/project
+uv venv .venv
+source .venv/bin/activate
+
+# auto-detect the dependency format
+if [ -f "pyproject.toml" ]; then
+  uv pip install -e .
+elif [ -f "requirements.txt" ]; then
+  uv pip install -r requirements.txt
+fi
+
+# 2-epoch validation
+python run.py --epochs 2
+```
+
+### Step 5: Verify the Execution Results
+
+**After execution, you must:**
+
+1. Read the complete stdout/stderr output
+2. Confirm `[RESULT]` lines are present
+3. Confirm the loss is not NaN/Inf
+4. Confirm the loss trends downward (even slightly)
+
+**If execution fails:**
+- read the error message
+- fix the code
+- re-run
+- retry at most 3 times
+
+### Step 6: Write the Report
+
+Write `$W/ml_res.md`:
+
+```markdown
+# Implementation Report
+
+## Data Source
+- Dataset: {name} — real / mock (reason)
+- If mock: steps to obtain real data: [...]
+
+## Components Implemented
+- {module}: {description}
+
+## Quick Validation Results (from execution log)
+- Epochs: 2
+- [RESULT] train_loss={copied from execution output}
+- [RESULT] val_metric={copied from execution output}
+- [RESULT] elapsed={copied from execution output}
+- [RESULT] device={copied from execution output}
+
+> The values above are quoted directly from the code's execution output.
+> If any value cannot be verified from the execution log, mark it ⚠️ UNVERIFIED.
+
+## Deviations from Plan
+- {changes and why}
+
+## Known Issues
+- {issues}
+```
+
+---
+
+## Critical Rules
+
+1. **Never fabricate results.** All numbers must come from code execution output. If execution fails, report the failure.
+2. **Never use the global pip.** Always isolate with uv venv.
+3. **Never import repos/ directly** — rewrite and adapt the code instead.
+4. **Mock data must be labeled** — `# MOCK DATA: <reason>` in code, declared in the report.
+5. **run.py must print `[RESULT]` lines**, and the report must quote them.
+6. If it still fails after 3 retries, write a failure report and stop.