scientify 1.3.0 → 1.4.0

Files changed (50)
  1. package/README.md +38 -14
  2. package/README.zh.md +38 -15
  3. package/dist/index.d.ts.map +1 -1
  4. package/dist/index.js +21 -2
  5. package/dist/index.js.map +1 -1
  6. package/dist/src/services/auto-updater.d.ts +15 -0
  7. package/dist/src/services/auto-updater.d.ts.map +1 -0
  8. package/dist/src/services/auto-updater.js +188 -0
  9. package/dist/src/services/auto-updater.js.map +1 -0
  10. package/dist/src/tools/arxiv-download.d.ts +25 -0
  11. package/dist/src/tools/arxiv-download.d.ts.map +1 -0
  12. package/dist/src/tools/arxiv-download.js +179 -0
  13. package/dist/src/tools/arxiv-download.js.map +1 -0
  14. package/dist/src/tools/{arxiv-tool.d.ts → arxiv-search.d.ts} +11 -8
  15. package/dist/src/tools/arxiv-search.d.ts.map +1 -0
  16. package/dist/src/tools/arxiv-search.js +140 -0
  17. package/dist/src/tools/arxiv-search.js.map +1 -0
  18. package/dist/src/tools/github-search-tool.d.ts +5 -1
  19. package/dist/src/tools/github-search-tool.d.ts.map +1 -1
  20. package/dist/src/tools/github-search-tool.js +10 -30
  21. package/dist/src/tools/github-search-tool.js.map +1 -1
  22. package/dist/src/tools/result.d.ts +37 -0
  23. package/dist/src/tools/result.d.ts.map +1 -0
  24. package/dist/src/tools/result.js +39 -0
  25. package/dist/src/tools/result.js.map +1 -0
  26. package/dist/src/tools/workspace.d.ts +32 -0
  27. package/dist/src/tools/workspace.d.ts.map +1 -0
  28. package/dist/src/tools/workspace.js +69 -0
  29. package/dist/src/tools/workspace.js.map +1 -0
  30. package/openclaw.plugin.json +22 -1
  31. package/package.json +13 -2
  32. package/skills/_shared/workspace-spec.md +15 -5
  33. package/skills/idea-generation/SKILL.md +2 -0
  34. package/skills/install-scientify/SKILL.md +15 -7
  35. package/skills/literature-survey/SKILL.md +86 -214
  36. package/skills/research-experiment/SKILL.md +114 -0
  37. package/skills/research-implement/SKILL.md +166 -0
  38. package/skills/research-pipeline/SKILL.md +104 -166
  39. package/skills/research-plan/SKILL.md +121 -0
  40. package/skills/research-review/SKILL.md +110 -0
  41. package/skills/research-survey/SKILL.md +140 -0
  42. package/skills/write-review-paper/SKILL.md +2 -0
  43. package/dist/src/tools/arxiv-tool.d.ts.map +0 -1
  44. package/dist/src/tools/arxiv-tool.js +0 -258
  45. package/dist/src/tools/arxiv-tool.js.map +0 -1
  46. package/skills/research-pipeline/references/prompts/implement.md +0 -135
  47. package/skills/research-pipeline/references/prompts/plan.md +0 -142
  48. package/skills/research-pipeline/references/prompts/review.md +0 -118
  49. package/skills/research-pipeline/references/prompts/survey.md +0 -105
  50. package/skills/research-pipeline/references/workspace-spec.md +0 -5
@@ -1,245 +1,183 @@
1
1
  ---
2
2
  name: research-pipeline
3
- description: "End-to-end research automation: idea → literature → plan → implement → review → iterate. Use for: implementing a specific research idea, full ML research workflow. NOT for: just exploring literature (use /literature-survey), just generating ideas (use /idea-generation), just writing review (use /write-review-paper)."
3
+ description: "Orchestrates the full research workflow by spawning sub-agents for each phase. Checks workspace state, dispatches tasks, verifies outputs. Use for: end-to-end ML research. Each phase runs in an isolated context via sessions_spawn."
4
4
  metadata:
5
5
  {
6
6
  "openclaw":
7
7
  {
8
8
  "emoji": "🔬",
9
- "requires": { "bins": ["git", "python3"] },
9
+ "requires": { "bins": ["git", "python3", "uv"] },
10
10
  },
11
11
  }
12
12
  ---
13
13
 
14
- # Research Pipeline
14
+ # Research Pipeline (Orchestrator)
15
15
 
16
- Automate an end-to-end ML research workflow: idea → literature → survey → plan → implement → review → iterate.
16
+ **Don't ask permission. Just do it.**
17
17
 
18
- **Workspace:** See `../_shared/workspace-spec.md` for directory structure. Outputs go to `$WORKSPACE/project/`, `$WORKSPACE/iterations/`.
18
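+ You are the orchestrator. You do not do the research work yourself; instead you:
19
+ 1. Check the state of the workspace files
20
+ 2. Build the task description for the next step
21
+ 3. Dispatch it to a sub-agent with `sessions_spawn`
22
+ 4. Wait for completion, then verify the output
23
+ 5. Repeat until the workflow is finished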
19
24
 
20
- **File existence = step completion.** Skip steps whose output already exists.
25
+ **Workspace:** See `../_shared/workspace-spec.md`. Set `$W` to the active project directory.
21
26
 
22
27
  ---
23
28
 
24
- ## Step 0: Check Active Project
29
+ ## Step 0: Initialization
25
30
 
26
31
  ```bash
27
- cat ~/.openclaw/workspace/projects/.active 2>/dev/null
32
+ ACTIVE=$(cat ~/.openclaw/workspace/projects/.active 2>/dev/null)
28
33
  ```
29
34
 
30
- If active, set `$WORKSPACE = ~/.openclaw/workspace/projects/{project_id}/`.
31
- If none, create based on research idea in Step 1.
35
+ If there is no active project:
36
+ 1. Ask the user: what is the research topic?
37
+ 2. Create the project directory
38
+ 3. Write `task.json`
32
39
 
33
- ---
34
-
35
- ## Step 1: Parse Task
36
-
37
- Read `$WORKSPACE/task.json`. If it does not exist, ask the user for:
40
+ Set `$W = ~/.openclaw/workspace/projects/{project-id}`
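A minimal Python sketch of the Step 0 lookup may make the `.active` convention concrete. It assumes, since the diff does not say so explicitly, that `.active` holds the project ID as a single line of text; the helper name `resolve_workspace` is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Projects root taken from the bash snippet in Step 0.
PROJECTS_ROOT = Path.home() / ".openclaw" / "workspace" / "projects"


def resolve_workspace(projects_root: Path = PROJECTS_ROOT) -> Optional[Path]:
    """Return the active project directory ($W), or None if no project is active.

    Assumption: `.active` contains only the project ID (hypothetical format).
    """
    marker = projects_root / ".active"
    try:
        project_id = marker.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None  # mirrors `cat ... 2>/dev/null` producing an empty value
    if not project_id:
        return None
    return projects_root / project_id
```

A `None` result corresponds to the "no active project" branch, where the orchestrator asks the user for a topic before creating the directory.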
38
41
 
39
- - **idea**: A description of the research idea (1-3 sentences).
40
- - **references** (optional): ArXiv IDs or paper titles as starting points.
41
- - **domain** (optional): e.g. "recommendation systems", "NLP", "computer vision".
42
-
43
- Write the result to `$WORKSPACE/task.json`:
42
+ ---
44
43
 
45
- ```json
46
- {
47
- "idea": "...",
48
- "references": ["2401.12345", "..."],
49
- "domain": "...",
50
- "date_limit": "2024-01-01"
51
- }
52
- ```
44
+ ## Dispatch Loop
53
45
 
54
- **Output:** `$WORKSPACE/task.json`
46
+ Check each phase in order. **Run only one phase at a time.**
55
47
 
56
- ## Step 2: Search
48
+ ### Phase 1: Literature Survey
57
49
 
58
- Use the `arxiv` tool to search for 5-10 related papers based on the idea and any reference paper titles. Use the `github_search` tool to find related repositories.
50
+ **Check:** Does the `$W/papers/_meta/` directory exist and contain `.json` files?
59
51
 
60
- Combine results into a markdown report:
52
+ **If missing, spawn:**
61
53
 
62
54
  ```
63
- ## ArXiv Papers
64
- - [title](pdf_url) — arxiv_id — summary of relevance
65
-
66
- ## GitHub Repositories
67
- - [repo_name](url) — stars — language — summary of relevance
55
+ sessions_spawn({
56
+ task: "Working directory: $W\nRun the /literature-survey skill\n\nResearch topic: {extracted from task.json}\nSearch for, filter, and download relevant papers to $W/papers/",
57
+ label: "Literature Survey"
58
+ })
68
59
  ```
69
60
 
70
- **Output:** `$WORKSPACE/search_results.md`
71
-
72
- ## Step 3: Prepare References
73
-
74
- Read `$WORKSPACE/search_results.md`. Select 3-5 of the most relevant repositories.
61
+ **Verify:** `ls $W/papers/_meta/*.json` lists at least 3 files
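The Phase 1 verification can also be expressed without shelling out to `ls`. The sketch below is illustrative only; the function name and the `minimum` parameter are hypothetical, and the threshold of 3 comes from the verification rule above.

```python
from pathlib import Path


def has_enough_paper_metadata(workspace: Path, minimum: int = 3) -> bool:
    """True when $W/papers/_meta/ holds at least `minimum` .json files."""
    meta_dir = workspace / "papers" / "_meta"
    if not meta_dir.is_dir():
        return False
    return sum(1 for _ in meta_dir.glob("*.json")) >= minimum
```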
75
62
 
76
- For each selected repo, clone it into `$WORKSPACE/repos/`:
77
-
78
- ```bash
79
- git clone --depth 1 <url> $WORKSPACE/repos/<repo_name>
80
- ```
81
-
82
- Write a summary of selected repos and their relevance to the idea.
83
-
84
- **Output:** `$WORKSPACE/prepare_res.md`
85
-
86
- ## Step 4: Download Papers
87
-
88
- For each important paper from Step 2, use the `arxiv` tool with `download: true` and `output_dir: "$WORKSPACE/papers/"` to get .tex source files.
89
-
90
- If download fails for any paper, note the failure and continue. The survey step can work with abstracts alone.
63
+ ---
91
64
 
92
- **Output:** `$WORKSPACE/papers/*.tex` (or `.md` summaries if .tex unavailable)
65
+ ### Phase 2: Deep Survey
93
66
 
94
- ## Step 5: Literature Survey
67
+ **Check:** Does `$W/survey_res.md` exist?
95
68
 
96
- This is the most intellectually demanding step. Read `references/prompts/survey.md` for detailed guidance.
69
+ **If missing, first read the Phase 1 summary, then spawn:**
97
70
 
98
- For each paper:
71
+ ```
72
+ sessions_spawn({
73
+ task: "Working directory: $W\nRun the /research-survey skill\n\nContext: {N} papers downloaded, covering directions {directions}\nAnalyze the papers in depth, extract the formulas, and write survey_res.md",
74
+ label: "Deep Survey"
75
+ })
76
+ ```
99
77
 
100
- 1. Read the .tex source (or abstract) thoroughly.
101
- 2. Extract: core method, mathematical formulas, key contributions.
102
- 3. Read the corresponding reference codebase in `$WORKSPACE/repos/`.
103
- 4. Map math formulas to code implementations.
104
- 5. Write structured notes to `$WORKSPACE/notes/paper_NNN.md`.
78
+ **Verify:** `$W/survey_res.md` exists and contains the "Core Method Comparison" (核心方法对比) table
105
79
 
106
- Each note file should contain:
80
+ ---
107
81
 
108
- ```markdown
109
- # [Paper Title]
82
+ ### Phase 3: Implementation Plan
110
83
 
111
- ## Core Method
112
- ...
84
+ **Check:** Does `$W/plan_res.md` exist?
113
85
 
114
- ## Math Formulas
115
- ...
86
+ **If missing, read the summary of survey_res.md, then spawn:**
116
87
 
117
- ## Code Implementation
118
- File: repos/<repo>/path/to/file.py
119
- ```python
120
- # relevant code excerpt
121
88
  ```
122
-
123
- ## Key Insights
124
- ...
89
+ sessions_spawn({
90
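+ task: "Working directory: $W\nRun the /research-plan skill\n\nContext: the survey found that the core method is {method}; the recommended technical route is {route}\nWrite a complete implementation plan to plan_res.md",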
91
+ label: "Research Plan"
92
+ })
125
93
  ```
126
94
 
127
- After all papers are surveyed, write a synthesis combining all notes.
128
-
129
- **Output:** `$WORKSPACE/notes/paper_*.md` + `$WORKSPACE/survey_res.md`
95
+ **Verify:** `$W/plan_res.md` exists and contains 4 sections (Dataset/Model/Training/Testing)
130
96
 
131
- ## Step 6: Implementation Plan
132
-
133
- Read `references/prompts/plan.md` for detailed guidance.
134
-
135
- Based on `survey_res.md`, `prepare_res.md`, and `task.json`, create a four-part plan:
136
-
137
- 1. **Dataset Plan**: data source, loading pipeline, preprocessing, dataloader design.
138
- 2. **Model Plan**: architecture, math formulas to implement, reference code to adapt.
139
- 3. **Training Plan**: loss functions, optimizer, hyperparameters, monitoring.
140
- 4. **Testing Plan**: metrics, evaluation protocol, baselines.
141
-
142
- **Output:** `$WORKSPACE/plan_res.md`
97
+ ---
143
98
 
144
- ## Step 7: Implement
99
+ ### Phase 4: Implementation
145
100
 
146
- Read `references/prompts/implement.md` for detailed guidance.
101
+ **Check:** Does `$W/ml_res.md` exist?
147
102
 
148
- Create a self-contained project in `$WORKSPACE/project/`:
103
+ **If missing, read the key points of plan_res.md, then spawn:**
149
104
 
150
105
  ```
151
- $WORKSPACE/project/
152
- model/ # model architecture
153
- data/ # data loading and preprocessing
154
- training/ # training loop and configs
155
- testing/ # evaluation scripts
156
- utils/ # shared utilities
157
- run.py # main entry point
158
- requirements.txt
106
+ sessions_spawn({
107
+ task: "Working directory: $W\nRun the /research-implement skill\n\nContext:\n- The plan contains {N} components: {list}\n- Dataset: {dataset}\n- Framework: PyTorch\nImplement the code in $W/project/, run a 2-epoch validation, and write ml_res.md",
108
+ label: "Research Implement"
109
+ })
159
110
  ```
160
111
 
161
- **Critical rules:**
162
-
163
- - Do NOT import directly from `$WORKSPACE/repos/`. Adapt and rewrite code.
164
- - Implement EVERY component from `plan_res.md`.
165
- - Use real datasets, not toy data.
166
- - First run: 2 epochs only (quick validation).
112
+ **Verify:**
113
+ - `$W/project/run.py` exists
114
+ - `$W/ml_res.md` contains `[RESULT]`
115
+ - the loss value is not NaN/Inf
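The three Phase 4 checks can be sketched as one function that returns a list of problems. The diff only states that `ml_res.md` contains `[RESULT]`; the `name=value` payload (for example `[RESULT] train_loss=0.42`) parsed below is an assumption, and `verify_implementation` is a hypothetical name.

```python
import math
import re
from pathlib import Path

# Assumed payload format: "[RESULT] train_loss=0.42" (not confirmed by the source).
LOSS_VALUE = re.compile(r"(\w*loss\w*)\s*=\s*([^\s,]+)", re.IGNORECASE)


def verify_implementation(workspace: Path) -> list[str]:
    """Return a list of problems; an empty list means Phase 4 verifies."""
    problems = []
    if not (workspace / "project" / "run.py").is_file():
        problems.append("project/run.py is missing")
    report = workspace / "ml_res.md"
    if not report.is_file():
        problems.append("ml_res.md is missing")
        return problems
    lines = report.read_text(encoding="utf-8").splitlines()
    result_lines = [line for line in lines if "[RESULT]" in line]
    if not result_lines:
        problems.append("ml_res.md has no [RESULT] line")
    for line in result_lines:
        for name, raw in LOSS_VALUE.findall(line):
            try:
                value = float(raw)  # float() also parses "nan" and "inf"
            except ValueError:
                problems.append(f"{name} is not numeric: {raw!r}")
                continue
            if not math.isfinite(value):
                problems.append(f"{name} is {raw}")
    return problems
```

Returning every problem rather than stopping at the first gives the orchestrator a complete picture to report in one pass.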
167
116
 
168
- Execute:
169
-
170
- ```bash
171
- cd $WORKSPACE/project && pip install -r requirements.txt && python run.py --epochs 2
172
- ```
117
+ ---
173
118
 
174
- **Note:** GPU support requires external configuration. For GPU-accelerated training, consider using a dedicated ML environment or cloud instance.
119
+ ### Phase 5: Review
175
120
 
176
- **Output:** `$WORKSPACE/project/` (code) + `$WORKSPACE/ml_res.md` (implementation report)
121
+ **Check:** Is the verdict of the latest `judge_v*.md` under `$W/iterations/` PASS?
177
122
 
178
- ## Step 8: Review
123
+ **If there is no PASS, spawn:**
179
124
 
180
- Read `references/prompts/review.md` for detailed guidance.
125
+ ```
126
+ sessions_spawn({
127
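+ task: "Working directory: $W\nRun the /research-review skill\n\nContext:\n- Implementation report: ml_res.md shows train_loss={value}\n- The plan is in plan_res.md\nReview the code; if changes are needed, iterate on fixes (at most 3 rounds)",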
128
+ label: "Research Review"
129
+ })
130
+ ```
181
131
 
182
- Review the implementation against:
132
+ **Verify:** the latest `judge_v*.md` contains `verdict: PASS` or `verdict: BLOCKED`
183
133
 
184
- - Each atomic idea from `survey_res.md`: is the math correctly translated to code?
185
- - The plan from `plan_res.md`: are all components present?
186
- - Code quality: no toy implementations, proper error handling, correct data pipeline.
134
+ If BLOCKED, report to the user and wait for instructions
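Finding "the latest `judge_v*.md`" needs a numeric sort, since a plain string sort would place `judge_v10.md` before `judge_v2.md`. The following sketch assumes the verdict appears as `verdict: <VALUE>` in any letter case (which also covers the `## Verdict:` heading in the review template); `latest_verdict` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

VERSION = re.compile(r"judge_v(\d+)\.md$")
VERDICT = re.compile(r"verdict:\s*(PASS|NEEDS_REVISION|BLOCKED)\b", re.IGNORECASE)


def latest_verdict(workspace: Path) -> Optional[str]:
    """Return the verdict of the highest-numbered judge report, or None."""
    iterations = workspace / "iterations"
    if not iterations.is_dir():
        return None
    candidates = []
    for path in iterations.glob("judge_v*.md"):
        match = VERSION.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare by version number so that v10 ranks above v9.
    _, newest = max(candidates, key=lambda item: item[0])
    found = VERDICT.search(newest.read_text(encoding="utf-8"))
    return found.group(1).upper() if found else None
```

One caveat of this sketch: a report that still contains the unedited template line `PASS / NEEDS_REVISION` would be read as PASS, so a stricter parser may be preferable in practice.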
187
135
 
188
- Write a structured review:
136
+ ---
189
137
 
190
- ```markdown
191
- # Review v1
138
+ ### Phase 6: Full Experiment
192
139
 
193
- ## Verdict: PASS / NEEDS_REVISION
140
+ **Check:** Does `$W/experiment_res.md` exist?
194
141
 
195
- ## Checklist
196
- - [ ] Dataset loading matches plan
197
- - [ ] Model architecture matches formulas
198
- - [ ] Loss function correct
199
- - [ ] Training loop proper
200
- - [ ] Evaluation metrics correct
142
+ **If missing, spawn:**
201
143
 
202
- ## Issues (if NEEDS_REVISION)
203
- 1. Issue description → suggested fix
204
- 2. ...
144
+ ```
145
+ sessions_spawn({
146
+ task: "Working directory: $W\nRun the /research-experiment skill\n\nContext:\n- Review PASS, code verified\n- plan_res.md specifies the full epochs\nRun the full training plus ablation experiments and write experiment_res.md",
147
+ label: "Research Experiment"
148
+ })
205
149
  ```
206
150
 
207
- **Output:** `$WORKSPACE/iterations/judge_v1.md`
208
-
209
- ## Step 9: Iterate
210
-
211
- If the review verdict is `NEEDS_REVISION`:
212
-
213
- 1. Read `$WORKSPACE/iterations/judge_vN.md` for the latest suggestions.
214
- 2. Fix each issue in `$WORKSPACE/project/`.
215
- 3. Re-run the 2-epoch validation.
216
- 4. Write a new review to `$WORKSPACE/iterations/judge_v(N+1).md`.
217
- 5. Repeat until `PASS` or 3 iterations reached.
151
+ **Verify:** `$W/experiment_res.md` contains a `[RESULT]` line and an ablation table
218
152
 
219
- If 3 iterations are exhausted without PASS, summarize remaining issues and ask the user for guidance.
153
+ ---
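Taken together, the six phase checks amount to "find the first phase whose output is missing, and run only that one". A rough Python sketch of that selection follows. The labels and skill names are taken from the spawn calls in this skill; the helper functions and the `PHASES` table are hypothetical, and the real orchestrator is an agent following these instructions rather than a Python program.

```python
import re
from pathlib import Path
from typing import Callable, Optional


def _exists(name: str) -> Callable[[Path], bool]:
    """Build a check that passes when $W/<name> is a file."""
    return lambda w: (w / name).is_file()


def _has_paper_meta(w: Path) -> bool:
    meta = w / "papers" / "_meta"
    return meta.is_dir() and any(meta.glob("*.json"))


def _review_passed(w: Path) -> bool:
    iterations = w / "iterations"
    if not iterations.is_dir():
        return False
    # Sort by the numeric part of the file stem so v10 follows v9.
    reports = sorted(iterations.glob("judge_v*.md"),
                     key=lambda p: int(re.sub(r"\D", "", p.stem) or 0))
    if not reports:
        return False
    text = reports[-1].read_text(encoding="utf-8")
    return re.search(r"verdict:\s*PASS\b", text, re.IGNORECASE) is not None


# (label, skill, completion check) in dispatch order.
PHASES = [
    ("Literature Survey", "/literature-survey", _has_paper_meta),
    ("Deep Survey", "/research-survey", _exists("survey_res.md")),
    ("Research Plan", "/research-plan", _exists("plan_res.md")),
    ("Research Implement", "/research-implement", _exists("ml_res.md")),
    ("Research Review", "/research-review", _review_passed),
    ("Research Experiment", "/research-experiment", _exists("experiment_res.md")),
]


def next_phase(workspace: Path) -> Optional[tuple[str, str]]:
    """Return (label, skill) of the first incomplete phase, or None when done."""
    for label, skill, is_done in PHASES:
        if not is_done(workspace):
            return label, skill
    return None
```

Because completion is derived from files on disk, calling `next_phase` again after an interruption naturally resumes at the first missing output.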
220
154
 
221
- **Output:** `$WORKSPACE/iterations/judge_v*.md` (review history)
155
+ ## Done
222
156
 
223
- ## Step 10: Full Training
157
+ After all phases pass verification, output the final summary:
224
158
 
225
- Once review passes:
159
+ ```
160
+ Research workflow complete!
161
+ - Papers: {N} analyzed
162
+ - Code: $W/project/
163
+ - Results: $W/experiment_res.md
164
+ - Review: $W/iterations/ ({N} rounds)
165
+ ```
226
166
 
227
- 1. Update epoch count in `run.py` to the full training value.
228
- 2. Execute full training run.
229
- 3. Collect and analyze results.
167
+ ---
230
168
 
231
- **Output:** `$WORKSPACE/experiment_res.md`
169
+ ## Context Bridging Rules
232
170
 
233
- ## Batch Processing Rule
171
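+ Before each spawn, the orchestrator must:
172
+ 1. **Read** the output file of the previous step
173
+ 2. **Summarize** the key information in 2-5 lines (do not copy the full text)
174
+ 3. **Write** it into the "Context" part of the spawn task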
234
175
 
235
- When you need to apply the same LLM operation to more than 10 files (e.g., summarizing all papers), do NOT process them one by one in conversation. Instead, write a script to handle them in batch.
176
+ This ensures the sub-agent gets enough information to start without being polluted by the full output of earlier steps.
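As an illustration of the bridging rule, the sketch below assembles a task string from a short summary and rejects summaries outside the 2-5 line range. The summary itself would be written by the orchestrator after reading the previous output; `build_task` and its parameters are hypothetical, the string layout follows the spawn tasks in this skill rendered in English, and the `sessions_spawn` call is deliberately left out because it is a host tool rather than a Python function.

```python
from pathlib import Path


def build_task(workspace: Path, skill: str,
               context_lines: list[str], request: str) -> str:
    """Assemble a spawn task whose Context section is a 2-5 line summary."""
    if not 2 <= len(context_lines) <= 5:
        raise ValueError("context must be a 2-5 line summary, not the full output")
    context = "\n".join(f"- {line}" for line in context_lines)
    return (
        f"Working directory: {workspace}\n"
        f"Run the {skill} skill\n\n"
        f"Context:\n{context}\n"
        f"{request}"
    )
```

Enforcing the line limit in code is one way to keep a long `survey_res.md` or `plan_res.md` from leaking wholesale into the next sub-agent's context.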
236
177
 
237
178
  ## Recovery
238
179
 
239
- If the session crashes or context fills up:
240
-
241
- 1. List files in `$WORKSPACE/` to see which steps completed.
242
- 2. Read the most recent output file to understand current state.
243
- 3. Resume from the first missing output file.
244
-
245
- Never re-do a step whose output file already exists unless the user explicitly asks.
180
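+ If the orchestrator is interrupted:
181
+ 1. Re-run /research-pipeline
182
+ 2. The orchestrator automatically checks all files and skips completed phases
183
+ 3. Resume from the first missing output file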
@@ -0,0 +1,121 @@
1
+ ---
2
+ name: research-plan
3
+ description: "Create a structured implementation plan from survey results. Produces dataset/model/training/testing plans. Requires survey_res.md from /research-survey."
4
+ metadata:
5
+ {
6
+ "openclaw":
7
+ {
8
+ "emoji": "📋",
9
+ },
10
+ }
11
+ ---
12
+
13
+ # Research Plan
14
+
15
+ **Don't ask permission. Just do it.**
16
+
17
+ **Workspace:** See `../_shared/workspace-spec.md`. Set `$W` to the active project directory.
18
+
19
+ ## Prerequisites
20
+
21
+ | File | Source |
22
+ |------|--------|
23
+ | `$W/task.json` | /research-pipeline or user |
24
+ | `$W/survey_res.md` | /research-survey |
25
+ | `$W/notes/paper_*.md` | /research-survey |
26
+ | `$W/repos/` (optional) | git clone |
27
+
28
+ **If `survey_res.md` is missing, STOP:** "Run /research-survey first to complete the in-depth analysis"
29
+
30
+ ## Output
31
+
32
+ | File | Content |
33
+ |------|---------|
34
+ | `$W/plan_res.md` | Four-part implementation plan |
35
+
36
+ ---
37
+
38
+ ## Workflow
39
+
40
+ ### Step 1: Read the context
41
+
42
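+ Read the following files to understand the research goal and technical approach:
43
+ - `$W/task.json`: research goal
44
+ - `$W/survey_res.md`: recommended technical route and core formulas
45
+ - Browse the directory structure of `$W/repos/` (if present)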
46
+
47
+ ### Step 2: Draft the four-part plan
48
+
49
+ Write to `$W/plan_res.md`:
50
+
51
+ ```markdown
52
+ # Implementation Plan
53
+
54
+ ## 1. Dataset Plan
55
+
56
+ - **Dataset name:** {name}
57
+ - **Source:** {URL or description}
58
+ - **Size:** {samples / size}
59
+ - **Preprocessing steps:**
60
+ 1. {step}
61
+ 2. {step}
62
+ - **DataLoader design:**
63
+ - batch_size: {value}
64
+ - Input format: {shape}
65
+ - Output format: {shape}
66
+
67
+ ## 2. Model Plan
68
+
69
+ - **Architecture overview:** {1-2 sentences}
70
+ - **Component list:**
71
+
72
+ | Component | Formula | Reference code | Input → Output |
73
+ |------|----------|----------|-------------|
74
+ | {component} | $formula$ | `repos/xxx/file.py` | {shape} → {shape} |
75
+
76
+ - **Estimated parameter count:** {approximate}
77
+
78
+ ## 3. Training Plan
79
+
80
+ - **Loss function:** {formula + description}
81
+ - **Optimizer:** {Adam/SGD/...}, lr={value}
82
+ - **Scheduler:** {if any}
83
+ - **Training parameters:**
84
+ - epochs (validation): 2
85
+ - epochs (full): {value}
86
+ - batch_size: {value}
87
+ - **Monitored metrics:** {loss, metrics to log}
88
+
89
+ ## 4. Testing Plan
90
+
91
+ - **Evaluation metrics:**
92
+
93
+ | Metric | Formula/description | Expected range |
94
+ |--------|-----------|----------|
95
+ | {metric} | {description} | {range} |
96
+
97
+ - **Baselines:** {what to compare against}
98
+ - **Ablation experiments (preliminary plan):**
99
+ 1. {ablation 1}
100
+ 2. {ablation 2}
101
+ ```
102
+
103
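+ ### Step 3: Self-check
104
+
105
+ Verify that the plan is complete:
106
+ - [ ] Every model component has a corresponding formula
107
+ - [ ] The dataset has a concrete way to obtain it
108
+ - [ ] The loss function has a mathematical definition
109
+ - [ ] The evaluation metrics are clearly defined
110
+ - [ ] Training parameters are reasonable (e.g. not lr=0.1 for Adam)
111
+
112
+ If anything is uncertain, mark it in the plan with `⚠️ TODO: {reason}`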
113
+
114
+ ---
115
+
116
+ ## Rules
117
+
118
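+ 1. Every component in the plan must be traceable to a formula or method in survey_res.md
119
+ 2. Do not write a "generic" plan: every parameter needs a concrete value or a reasonable estimate
120
+ 3. If reference repositories exist, the component table must include reference code paths
121
+ 4. Completion criterion for plan_res.md: all four sections exist and are non-empty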
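Rule 4 lends itself to a mechanical check. The sketch below splits a plan on level-2 headings and reports which of the four sections are absent or empty. It assumes the headings follow the template in Step 2 (`## 1. Dataset Plan` and so on); `incomplete_sections` is a hypothetical name.

```python
import re

SECTIONS = ["Dataset Plan", "Model Plan", "Training Plan", "Testing Plan"]


def incomplete_sections(plan_markdown: str) -> list[str]:
    """Return the required sections that are missing or have an empty body."""
    # With a capturing group, re.split yields
    # [preamble, heading1, body1, heading2, body2, ...].
    parts = re.split(r"^##[ \t]+(.*)$", plan_markdown, flags=re.MULTILINE)
    bodies = {}
    for heading, body in zip(parts[1::2], parts[2::2]):
        title = re.sub(r"^\d+\.\s*", "", heading.strip())  # drop "1. " prefix
        bodies[title] = body.strip()
    return [name for name in SECTIONS if not bodies.get(name)]
```

An empty return value corresponds to the completion criterion; it says nothing about whether the content is sound, which is what the Step 3 self-check covers.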
@@ -0,0 +1,110 @@
1
+ ---
2
+ name: research-review
3
+ description: "Review ML implementation against plan and survey. Iterates fix-rerun-review up to 3 times. Requires ml_res.md from /research-implement."
4
+ metadata:
5
+ {
6
+ "openclaw":
7
+ {
8
+ "emoji": "🔍",
9
+ "requires": { "bins": ["python3", "uv"] },
10
+ },
11
+ }
12
+ ---
13
+
14
+ # Research Review
15
+
16
+ **Don't ask permission. Just do it.**
17
+
18
+ **Workspace:** See `../_shared/workspace-spec.md`. Set `$W` to the active project directory.
19
+
20
+ ## Prerequisites
21
+
22
+ | File | Source |
23
+ |------|--------|
24
+ | `$W/ml_res.md` | /research-implement |
25
+ | `$W/project/` | /research-implement |
26
+ | `$W/plan_res.md` | /research-plan |
27
+ | `$W/survey_res.md` | /research-survey |
28
+
29
+ **If `ml_res.md` is missing, STOP:** "Run /research-implement first to complete the code implementation"
30
+
31
+ ## Output
32
+
33
+ | File | Content |
34
+ |------|---------|
35
+ | `$W/iterations/judge_v{N}.md` | Review report for each round |
36
+
37
+ `verdict: PASS` in the final report means the review passed.
38
+
39
+ ---
40
+
41
+ ## Workflow
42
+
43
+ ### Step 1: Review the code
44
+
45
+ Read the following:
46
+ - `$W/plan_res.md`: expectations for each component
47
+ - `$W/survey_res.md`: core formulas
48
+ - `$W/project/`: the actual code
49
+ - `$W/ml_res.md`: execution results
50
+
51
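+ ### Step 2: Check item by item
52
+
53
+ | Check | Method |
54
+ |--------|------|
55
+ | Data pipeline matches the plan | Compare the plan's Dataset Plan against the `data/` implementation |
56
+ | Model architecture matches the formulas | Compare the survey formulas against the `model/` implementation |
57
+ | Loss function is correct | Compare the plan's Training Plan against `training/loss.py` |
58
+ | Evaluation metrics are correct | Compare the plan's Testing Plan against `testing/` |
59
+ | [RESULT] lines exist | Check where the values in ml_res.md come from |
60
+ | Loss is reasonable | Not NaN/Inf, with a downward trend |
61
+ | No mock data (unless declared) | Search for `# MOCK DATA` comments |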
62
+
63
+ ### Step 3: Write the review report
64
+
65
+ Write to `$W/iterations/judge_v1.md`:
66
+
67
+ ```markdown
68
+ # Review v1
69
+
70
+ ## Verdict: PASS / NEEDS_REVISION
71
+
72
+ ## Checklist
73
+ - [x/✗] Dataset loading matches plan
74
+ - [x/✗] Model architecture matches formulas
75
+ - [x/✗] Loss function correct
76
+ - [x/✗] Training loop proper
77
+ - [x/✗] Evaluation metrics correct
78
+ - [x/✗] Results are from real execution (not fabricated)
79
+
80
+ ## Issues (if NEEDS_REVISION)
81
+ 1. **{issue}**: {description} → **Fix**: {specific fix instruction}
82
+ 2. ...
83
+ ```
84
+
85
+ ### Step 4: Iterate (if NEEDS_REVISION)
86
+
87
+ Loop at most 3 times:
88
+
89
+ 1. Read the fix suggestions in `judge_v{N}.md`
90
+ 2. Modify the code in `$W/project/`
91
+ 3. Re-run:
92
+ ```bash
93
+ cd $W/project && source .venv/bin/activate && python run.py --epochs 2
94
+ ```
95
+ 4. Read the execution output and verify the fix
96
+ 5. Write `judge_v{N+1}.md`
97
+ 6. If PASS → stop; otherwise continue
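The control flow of this loop can be sketched with the review and fix steps passed in as callables. This is one reading of the rules, stated as an assumption: the initial review writes `judge_v1.md`, and each of at most 3 fix rounds writes the next report, so a run that never passes ends at `judge_v4.md` with BLOCKED. The names `review_loop`, `review`, and `fix_and_rerun` are hypothetical.

```python
from typing import Callable

MAX_ROUNDS = 3  # "loop at most 3 times"


def review_loop(review: Callable[[int], str],
                fix_and_rerun: Callable[[int], None]) -> tuple[str, int]:
    """Return (final verdict, version number of the last judge report).

    review(n) writes judge_v{n}.md and returns "PASS" or "NEEDS_REVISION".
    fix_and_rerun(n) applies the fixes from judge_v{n}.md and reruns 2 epochs.
    """
    version = 1
    verdict = review(version)
    rounds = 0
    while verdict == "NEEDS_REVISION" and rounds < MAX_ROUNDS:
        fix_and_rerun(version)
        version += 1
        rounds += 1
        verdict = review(version)
    if verdict == "NEEDS_REVISION":
        verdict = "BLOCKED"  # rounds exhausted; hand over to the user
    return verdict, version
```

Injecting the two steps keeps the loop itself testable without running any training code.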
98
+
99
+ ### Step 5: Final verdict
100
+
101
+ If still NEEDS_REVISION after 3 rounds → list the remaining issues in the last judge file, mark it `verdict: BLOCKED`, and wait for user intervention.
102
+
103
+ ---
104
+
105
+ ## Rules
106
+
107
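+ 1. The review must check item by item against the plan, not just whether "the code runs"
108
+ 2. Every issue must come with a specific fix instruction (not "please improve")
109
+ 3. After a fix, the code must be re-run and its output checked
110
+ 4. Preconditions for PASS: every checklist item passes and the [RESULT] values are reasonable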