scientify 1.2.1 → 1.3.0
- package/package.json +1 -1
- package/skills/_shared/workspace-spec.md +129 -0
- package/skills/idea-generation/SKILL.md +111 -373
- package/skills/literature-survey/SKILL.md +2 -0
- package/skills/research-pipeline/SKILL.md +7 -29
- package/skills/research-pipeline/references/workspace-spec.md +3 -79
- package/skills/write-review-paper/SKILL.md +2 -0
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "scientify",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.3.0",
|
|
4
4
|
"description": "Scientify - AI-powered research workflow automation for OpenClaw. Includes idea generation, literature review, research pipeline skills, and arxiv tool.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"main": "dist/index.js",
|
package/skills/_shared/workspace-spec.md
ADDED
|
@@ -0,0 +1,129 @@
|
|
|
1
|
+
# Workspace Directory Specification
|
|
2
|
+
|
|
3
|
+
All Scientify skills share a unified project-based workspace structure.
|
|
4
|
+
|
|
5
|
+
## Base Path
|
|
6
|
+
|
|
7
|
+
```
|
|
8
|
+
~/.openclaw/workspace/projects/
|
|
9
|
+
├── .active # Current project ID (plain text)
|
|
10
|
+
├── {project-id}/ # Each research topic has its own project
|
|
11
|
+
│ └── ...
|
|
12
|
+
└── {another-project}/
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
## Project Structure
|
|
16
|
+
|
|
17
|
+
```
|
|
18
|
+
~/.openclaw/workspace/projects/{project-id}/
|
|
19
|
+
├── project.json # Project metadata
|
|
20
|
+
├── task.json # Research task definition
|
|
21
|
+
│
|
|
22
|
+
├── survey/ # /literature-survey outputs
|
|
23
|
+
│ ├── search_terms.json # Generated search keywords
|
|
24
|
+
│ ├── raw_results.json # All search results
|
|
25
|
+
│ ├── filtered_papers.json # Papers with relevance scores
|
|
26
|
+
│ ├── clusters.json # Clustered by research direction
|
|
27
|
+
│ └── report.md # Final survey report
|
|
28
|
+
│
|
|
29
|
+
├── papers/ # Downloaded paper sources
|
|
30
|
+
│ ├── {direction-1}/ # Organized by cluster
|
|
31
|
+
│ │ ├── paper_list.md
|
|
32
|
+
│ │ └── {arxiv_id}/ # .tex source files
|
|
33
|
+
│ ├── {direction-2}/
|
|
34
|
+
│ └── uncategorized/
|
|
35
|
+
│
|
|
36
|
+
├── repos/ # Cloned reference repositories
|
|
37
|
+
│ ├── {repo-name-1}/
|
|
38
|
+
│ └── {repo-name-2}/
|
|
39
|
+
│
|
|
40
|
+
├── ideas/ # /idea-generation outputs
|
|
41
|
+
│ ├── gaps.md # Identified research gaps
|
|
42
|
+
│ ├── idea_1.md ... idea_5.md # Generated ideas
|
|
43
|
+
│ ├── selected_idea.md # Enhanced best idea
|
|
44
|
+
│ ├── implementation_report.md # Code mapping
|
|
45
|
+
│ └── summary.md # Final summary
|
|
46
|
+
│
|
|
47
|
+
├── review/ # /write-review-paper outputs
|
|
48
|
+
│ ├── reading_plan.md # Prioritized reading list
|
|
49
|
+
│ ├── notes/ # Per-paper reading notes
|
|
50
|
+
│ │ └── {paper_id}.md
|
|
51
|
+
│ ├── comparison.md # Method comparison table
|
|
52
|
+
│ ├── timeline.md # Research timeline
|
|
53
|
+
│ ├── taxonomy.md # Classification system
|
|
54
|
+
│ ├── draft.md # Survey paper draft
|
|
55
|
+
│ └── bibliography.bib # References
|
|
56
|
+
│
|
|
57
|
+
├── plan_res.md # /research-pipeline: implementation plan
|
|
58
|
+
├── project/ # /research-pipeline: code implementation
|
|
59
|
+
│ ├── model/
|
|
60
|
+
│ ├── data/
|
|
61
|
+
│ ├── training/
|
|
62
|
+
│ ├── testing/
|
|
63
|
+
│ ├── run.py
|
|
64
|
+
│ └── requirements.txt
|
|
65
|
+
├── iterations/ # Review iterations
|
|
66
|
+
│ ├── judge_v1.md
|
|
67
|
+
│ └── judge_v2.md
|
|
68
|
+
└── experiment_res.md # Final results
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
## Conventions
|
|
72
|
+
|
|
73
|
+
### File Existence = Step Completion
|
|
74
|
+
|
|
75
|
+
Check output file before executing any step. If exists, skip.
|
|
76
|
+
|
|
77
|
+
Enables:
|
|
78
|
+
- **Crash recovery**: resume from last completed step
|
|
79
|
+
- **Incremental progress**: rerunning skips completed work
|
|
80
|
+
- **Transparency**: inspect progress by listing directory
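The skip-if-exists convention above can be sketched as a small shell wrapper. This is only an illustration under stated assumptions: `run_step` is a hypothetical helper name, not something Scientify ships, and the step command is assumed to create the output file itself.

```shell
# Hypothetical helper (not part of Scientify): run a step only when its
# output file does not exist yet.
# Usage: run_step <output-file> <command> [args...]
run_step() {
    output=$1
    shift
    if [ -e "$output" ]; then
        # File existence = step completion, so there is nothing to do.
        echo "skip: $output already exists"
        return 0
    fi
    # Run the step; it is expected to create "$output" on success.
    "$@"
}
```

A call might look like `run_step "$WORKSPACE/ideas/gaps.md" write_gaps`, where `write_gaps` is a placeholder for whatever produces the file. One design point worth considering: a step that crashes halfway can leave a partial file that later looks like a completed step, so writing to a temporary name and renaming at the end may be safer.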
|
|
81
|
+
|
|
82
|
+
### Project Metadata
|
|
83
|
+
|
|
84
|
+
**project.json:**
|
|
85
|
+
```json
|
|
86
|
+
{
|
|
87
|
+
"id": "battery-rul-prediction",
|
|
88
|
+
"name": "Battery RUL Prediction",
|
|
89
|
+
"created": "2024-01-15T10:00:00Z",
|
|
90
|
+
"topics": ["battery", "remaining useful life", "prediction"]
|
|
91
|
+
}
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
**task.json:**
|
|
95
|
+
```json
|
|
96
|
+
{
|
|
97
|
+
"domain": "battery health",
|
|
98
|
+
"focus": "RUL prediction using transformer",
|
|
99
|
+
"date_limit": "2024-01-01",
|
|
100
|
+
"created": "2024-01-15"
|
|
101
|
+
}
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
### Immutability
|
|
105
|
+
|
|
106
|
+
Once written, do NOT modify outputs unless user explicitly asks.
|
|
107
|
+
Exception: `project/` is mutable during implement-review-iterate loop.
|
|
108
|
+
|
|
109
|
+
### Active Project
|
|
110
|
+
|
|
111
|
+
```bash
|
|
112
|
+
# Read active project
|
|
113
|
+
cat ~/.openclaw/workspace/projects/.active
|
|
114
|
+
|
|
115
|
+
# Set active project
|
|
116
|
+
echo "battery-rul-prediction" > ~/.openclaw/workspace/projects/.active
|
|
117
|
+
|
|
118
|
+
# Set $WORKSPACE variable
|
|
119
|
+
WORKSPACE=~/.openclaw/workspace/projects/$(cat ~/.openclaw/workspace/projects/.active)
|
|
120
|
+
```
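The commands above assume `.active` exists and names a real project. A slightly more defensive variant could look like the sketch below; `resolve_workspace` and the `PROJECTS_ROOT` override are hypothetical names introduced here for illustration, not part of the spec.

```shell
# Hypothetical helper: print the active project's directory, or fail with
# a message when no usable active project is recorded.
# PROJECTS_ROOT is an assumed override; it defaults to the documented base path.
resolve_workspace() {
    root=${PROJECTS_ROOT:-"$HOME/.openclaw/workspace/projects"}
    # -s is false when the file is missing or empty.
    if [ ! -s "$root/.active" ]; then
        echo "no active project recorded in $root/.active" >&2
        return 1
    fi
    id=$(cat "$root/.active")
    if [ ! -d "$root/$id" ]; then
        echo "active project '$id' has no directory under $root" >&2
        return 1
    fi
    printf '%s\n' "$root/$id"
}
```

With this in place, `WORKSPACE=$(resolve_workspace) || exit 1` would stop early instead of continuing with a path that does not exist.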
|
|
121
|
+
|
|
122
|
+
## Skill Outputs Summary
|
|
123
|
+
|
|
124
|
+
| Skill | Primary Outputs |
|
|
125
|
+
|-------|-----------------|
|
|
126
|
+
| `/literature-survey` | `survey/`, `papers/` |
|
|
127
|
+
| `/idea-generation` | `ideas/` |
|
|
128
|
+
| `/write-review-paper` | `review/` |
|
|
129
|
+
| `/research-pipeline` | `project/`, `iterations/`, `experiment_res.md` |
|
package/skills/idea-generation/SKILL.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: idea-generation
|
|
3
|
-
description: "Generate innovative research ideas from
|
|
3
|
+
description: "Generate 5 innovative research ideas from collected papers. Analyzes literature, identifies gaps, proposes novel methods with citations. Use for: 找研究方向 (find a research direction), 生成创新点 (generate novel contributions), find research gaps. Requires papers in workspace (run /literature-survey first if needed)."
|
|
4
4
|
metadata:
|
|
5
5
|
{
|
|
6
6
|
"openclaw":
|
|
@@ -13,389 +13,168 @@ metadata:
|
|
|
13
13
|
|
|
14
14
|
# Idea Generation
|
|
15
15
|
|
|
16
|
-
|
|
16
|
+
Generate innovative research ideas grounded in literature analysis. This skill reads existing papers, identifies research gaps, and produces 5 distinct ideas with citations.
|
|
17
17
|
|
|
18
|
-
|
|
19
|
-
2. Select and download references
|
|
20
|
-
3. Analyze literature and codebases
|
|
21
|
-
4. Generate multiple ideas
|
|
22
|
-
5. Select and enhance the best idea
|
|
23
|
-
6. Map to code implementations
|
|
18
|
+
**Core principle:** Ideas MUST be grounded in actual papers, not generated from model knowledge.
|
|
24
19
|
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
## ⚠️ CRITICAL: EXECUTION MODE
|
|
28
|
-
|
|
29
|
-
**AUTONOMOUS EXECUTION**: Execute ALL steps without asking for user confirmation at each step.
|
|
30
|
-
- Do NOT ask "要我继续吗?" ("Shall I continue?") or "Should I proceed?"
|
|
31
|
-
- You MAY spawn subagents for parallel tasks (e.g., downloading multiple papers)
|
|
32
|
-
- Only ask user when there's a genuine ambiguity (e.g., which focus area to choose)
|
|
33
|
-
- Checkpoints are for YOUR internal verification, not for asking user
|
|
34
|
-
|
|
35
|
-
**Run the entire workflow from Step 1 to Step 8 automatically.**
|
|
20
|
+
**Workspace:** See `../_shared/workspace-spec.md` for directory structure. Outputs go to `$WORKSPACE/ideas/`.
|
|
36
21
|
|
|
37
22
|
---
|
|
38
23
|
|
|
39
|
-
##
|
|
40
|
-
|
|
41
|
-
**DO NOT generate ideas from your own knowledge.** All ideas MUST be grounded in actual literature research.
|
|
42
|
-
|
|
43
|
-
### Blocking Requirements
|
|
44
|
-
|
|
45
|
-
1. **MUST call `arxiv` tool** to search papers - NO EXCEPTIONS
|
|
46
|
-
2. **MUST call `github_search` tool** to find repositories - NO EXCEPTIONS
|
|
47
|
-
3. **MUST write `search_results.md`** BEFORE proceeding to idea generation
|
|
48
|
-
4. **MUST reference specific papers** (with arXiv IDs) in generated ideas
|
|
49
|
-
5. **MUST clone actual repos** before code survey
|
|
50
|
-
|
|
51
|
-
### Anti-Pattern: DO NOT DO THIS
|
|
24
|
+
## Step 1: Check Workspace Resources
|
|
52
25
|
|
|
53
|
-
|
|
54
|
-
❌ Agent generates ideas without calling any search tools
|
|
55
|
-
❌ Agent skips to idea generation without `search_results.md` existing
|
|
26
|
+
First, check what resources already exist:
|
|
56
27
|
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
✅ Agent calls `github_search` tool to find implementations
|
|
61
|
-
✅ Agent writes search results to file
|
|
62
|
-
✅ Agent reads downloaded papers before generating ideas
|
|
63
|
-
✅ Ideas reference specific papers by arXiv ID
|
|
64
|
-
|
|
65
|
-
---
|
|
66
|
-
|
|
67
|
-
## Workspace Convention (Project-based)
|
|
28
|
+
```bash
|
|
29
|
+
# Check active project
|
|
30
|
+
cat ~/.openclaw/workspace/projects/.active 2>/dev/null
|
|
68
31
|
|
|
69
|
-
|
|
32
|
+
# Check papers
|
|
33
|
+
ls ~/.openclaw/workspace/projects/*/papers/ 2>/dev/null | head -20
|
|
70
34
|
|
|
71
|
-
|
|
72
|
-
~/.openclaw/workspace/
|
|
73
|
-
└── projects/
|
|
74
|
-
├── .active # Current project ID (plain text file)
|
|
75
|
-
├── nlp-summarization/ # Project A
|
|
76
|
-
│ ├── project.json # Project metadata
|
|
77
|
-
│ ├── task.json # Research task definition
|
|
78
|
-
│ ├── search_results.md # Search results
|
|
79
|
-
│ ├── prepare_res.md # Selected repos summary
|
|
80
|
-
│ ├── papers/ # Downloaded papers
|
|
81
|
-
│ ├── repos/ # Cloned repositories
|
|
82
|
-
│ └── ideas/ # Generated ideas
|
|
83
|
-
├── image-segmentation/ # Project B
|
|
84
|
-
│ └── ...
|
|
85
|
-
└── ...
|
|
35
|
+
# Check survey results
|
|
36
|
+
cat ~/.openclaw/workspace/projects/*/survey/clusters.json 2>/dev/null | head -5
|
|
86
37
|
```
|
|
87
38
|
|
|
88
|
-
|
|
39
|
+
### Assess Available Resources
|
|
89
40
|
|
|
90
|
-
|
|
41
|
+
| Resource | Location | Status |
|
|
42
|
+
|----------|----------|--------|
|
|
43
|
+
| Papers | `$WORKSPACE/papers/` | Count: ? |
|
|
44
|
+
| Survey clusters | `$WORKSPACE/survey/clusters.json` | Exists: Y/N |
|
|
45
|
+
| Repos | `$WORKSPACE/repos/` | Count: ? |
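The status column of the table above could be filled in with a sketch like the following. It rests on two assumptions that the source does not confirm: paper directories are named after new-style arXiv IDs (so they can be found at any depth under `papers/`), and each cloned repo is one directory directly under `repos/`. `report_resources` is a hypothetical helper name.

```shell
# Hypothetical helper: summarize the resources in a project workspace.
# Usage: report_resources <workspace-dir>
report_resources() {
    ws=$1
    # Assumption: each paper lives in a directory named like an arXiv ID
    # (e.g. 2404.04429), either directly under papers/ or under a cluster.
    papers=$(find "$ws/papers" -type d -name '[0-9][0-9][0-9][0-9].[0-9]*' 2>/dev/null | wc -l | tr -d ' ')
    # Assumption: one directory per cloned repository, directly under repos/.
    repos=$(find "$ws/repos" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | wc -l | tr -d ' ')
    if [ -f "$ws/survey/clusters.json" ]; then
        clusters=Y
    else
        clusters=N
    fi
    printf 'papers=%s clusters=%s repos=%s\n' "$papers" "$clusters" "$repos"
}
```

Running `report_resources "$WORKSPACE"` prints one summary line; missing directories simply count as zero.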
|
|
91
46
|
|
|
92
47
|
---
|
|
93
48
|
|
|
94
|
-
## Step
|
|
49
|
+
## Step 2: Ask User About Search Strategy
|
|
95
50
|
|
|
96
|
-
|
|
51
|
+
Based on workspace state, ask user:
|
|
97
52
|
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
53
|
+
**If papers exist (≥5):**
|
|
54
|
+
> 📚 Found {N} papers in workspace from previous survey.
|
|
55
|
+
>
|
|
56
|
+
> Options:
|
|
57
|
+
> 1. **Use existing papers** - Generate ideas from current collection
|
|
58
|
+
> 2. **Search more** - Run `/literature-survey` to expand collection
|
|
59
|
+
> 3. **Quick search** - Add 5-10 more papers on specific topic
|
|
102
60
|
|
|
103
|
-
|
|
61
|
+
**If no papers:**
|
|
62
|
+
> 📭 No papers found in workspace.
|
|
63
|
+
>
|
|
64
|
+
> To generate grounded ideas, I need literature. Options:
|
|
65
|
+
> 1. **Run /literature-survey** - Comprehensive search (100+ papers, recommended)
|
|
66
|
+
> 2. **Quick search** - Fetch 10-15 papers on your topic now
|
|
67
|
+
> 3. **You provide papers** - Point me to existing PDFs/tex files
|
|
104
68
|
|
|
105
69
|
---
|
|
106
70
|
|
|
107
|
-
## Step
|
|
71
|
+
## Step 3: Acquire Resources (if needed)
|
|
108
72
|
|
|
109
|
-
|
|
73
|
+
### Option A: Delegate to /literature-survey (Recommended)
|
|
110
74
|
|
|
111
|
-
|
|
112
|
-
- **focus** (optional): Specific problem or technique
|
|
113
|
-
- **date_limit** (optional): Only consider papers before this date
|
|
114
|
-
|
|
115
|
-
```bash
|
|
116
|
-
cat $WORKSPACE/task.json 2>/dev/null || echo "No task.json"
|
|
75
|
+
If user wants comprehensive search:
|
|
117
76
|
```
|
|
77
|
+
Please run: /literature-survey {topic}
|
|
118
78
|
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
"date_limit": "2024-01-01",
|
|
125
|
-
"created": "2024-XX-XX"
|
|
126
|
-
}
|
|
127
|
-
```
|
|
128
|
-
|
|
129
|
-
**Output:** `$WORKSPACE/task.json`
|
|
130
|
-
|
|
131
|
-
---
|
|
132
|
-
|
|
133
|
-
## Step 2: Search Papers and Code (MANDATORY)
|
|
134
|
-
|
|
135
|
-
**⚠️ BLOCKING: You MUST complete this step before ANY idea generation.**
|
|
136
|
-
|
|
137
|
-
### 2.1 ArXiv Search (REQUIRED)
|
|
138
|
-
|
|
139
|
-
**You MUST call the `arxiv` tool.** Example:
|
|
79
|
+
This will:
|
|
80
|
+
- Search 100+ papers systematically
|
|
81
|
+
- Filter by relevance (score ≥4)
|
|
82
|
+
- Cluster into research directions
|
|
83
|
+
- Save to $WORKSPACE/papers/
|
|
140
84
|
|
|
141
|
-
|
|
142
|
-
Tool: arxiv
|
|
143
|
-
Arguments:
|
|
144
|
-
query: "text summarization transformer model"
|
|
145
|
-
max_results: 10
|
|
146
|
-
sort_by: "relevance"
|
|
147
|
-
```
|
|
148
|
-
|
|
149
|
-
If `arxiv` tool is not available, use `WebSearch` with `site:arxiv.org`:
|
|
150
|
-
```
|
|
151
|
-
Tool: WebSearch
|
|
152
|
-
Arguments:
|
|
153
|
-
query: "site:arxiv.org text summarization transformer model"
|
|
85
|
+
After survey completes, run /idea-generation again.
|
|
154
86
|
```
|
|
155
87
|
|
|
156
|
-
###
|
|
88
|
+
### Option B: Quick Search (5-10 papers)
|
|
157
89
|
|
|
158
|
-
|
|
90
|
+
For fast iteration, do minimal search:
|
|
159
91
|
|
|
92
|
+
1. **ArXiv search:**
|
|
160
93
|
```
|
|
161
|
-
Tool:
|
|
94
|
+
Tool: arxiv_search
|
|
162
95
|
Arguments:
|
|
163
|
-
query: "
|
|
164
|
-
|
|
165
|
-
max_results: 20
|
|
166
|
-
```
|
|
167
|
-
|
|
168
|
-
If `github_search` tool is not available, use `WebSearch`:
|
|
169
|
-
```
|
|
170
|
-
Tool: WebSearch
|
|
171
|
-
Arguments:
|
|
172
|
-
query: "site:github.com text summarization pytorch stars:>100"
|
|
173
|
-
```
|
|
174
|
-
|
|
175
|
-
### 2.3 CHECKPOINT: Verify Search Completed
|
|
176
|
-
|
|
177
|
-
Before proceeding, confirm:
|
|
178
|
-
- [ ] Called arxiv/WebSearch for papers
|
|
179
|
-
- [ ] Called github_search/WebSearch for repositories
|
|
180
|
-
- [ ] Have at least 5 paper results
|
|
181
|
-
- [ ] Have at least 5 repository results
|
|
182
|
-
|
|
183
|
-
**If search returns 0 results, try different queries. DO NOT proceed without results.**
|
|
184
|
-
|
|
185
|
-
### 2.4 Compile Results
|
|
186
|
-
|
|
187
|
-
Write to `$WORKSPACE/search_results.md`:
|
|
188
|
-
|
|
189
|
-
```markdown
|
|
190
|
-
# Search Results
|
|
191
|
-
|
|
192
|
-
## Task
|
|
193
|
-
- Domain: {domain}
|
|
194
|
-
- Focus: {focus}
|
|
195
|
-
- Date: {date}
|
|
196
|
-
|
|
197
|
-
## ArXiv Papers Found
|
|
198
|
-
|
|
199
|
-
| # | Title | ArXiv ID | Year | Relevance |
|
|
200
|
-
|---|-------|----------|------|-----------|
|
|
201
|
-
| 1 | [Title](pdf_url) | 2401.xxxxx | 2024 | [Why relevant] |
|
|
202
|
-
| 2 | ... | ... | ... | ... |
|
|
203
|
-
|
|
204
|
-
## GitHub Repositories Found
|
|
205
|
-
|
|
206
|
-
| # | Repository | Stars | Language | Relevance |
|
|
207
|
-
|---|------------|-------|----------|-----------|
|
|
208
|
-
| 1 | [owner/repo](url) | 1.2k | Python | [Why relevant] |
|
|
209
|
-
| 2 | ... | ... | ... | ... |
|
|
96
|
+
query: "{user_topic}"
|
|
97
|
+
max_results: 10
|
|
210
98
|
```
|
|
211
99
|
|
|
212
|
-
**
|
|
213
|
-
|
|
214
|
-
---
|
|
215
|
-
|
|
216
|
-
## Step 3: Prepare - Select Repositories
|
|
217
|
-
|
|
218
|
-
Read search results and select **at least 5** most valuable repositories.
|
|
219
|
-
|
|
220
|
-
Selection criteria:
|
|
221
|
-
- Direct implementation of relevant papers
|
|
222
|
-
- High code quality (stars, documentation)
|
|
223
|
-
- Active maintenance
|
|
224
|
-
- Covers key techniques in the domain
|
|
225
|
-
|
|
226
|
-
### 3.1 Clone Selected Repos
|
|
227
|
-
|
|
100
|
+
2. **Clone 3-5 reference repos:**
|
|
228
101
|
```bash
|
|
229
102
|
mkdir -p $WORKSPACE/repos
|
|
230
|
-
|
|
231
|
-
|
|
232
|
-
# For each selected repo:
|
|
233
|
-
git clone --depth 1 https://github.com/owner/repo1.git
|
|
234
|
-
git clone --depth 1 https://github.com/owner/repo2.git
|
|
235
|
-
# ... at least 5 repos
|
|
236
|
-
```
|
|
237
|
-
|
|
238
|
-
### 3.2 Document Selection
|
|
239
|
-
|
|
240
|
-
Write to `$WORKSPACE/prepare_res.md`:
|
|
241
|
-
|
|
242
|
-
```markdown
|
|
243
|
-
# Selected Reference Codebases
|
|
244
|
-
|
|
245
|
-
## Selection Rationale
|
|
246
|
-
[Why these repos were chosen]
|
|
247
|
-
|
|
248
|
-
## Repositories
|
|
249
|
-
|
|
250
|
-
### 1. repo1
|
|
251
|
-
- **URL**: https://github.com/owner/repo1
|
|
252
|
-
- **Paper**: [Associated paper if any]
|
|
253
|
-
- **Key Components**:
|
|
254
|
-
- `model/` - Model architecture
|
|
255
|
-
- `train.py` - Training loop
|
|
256
|
-
- **Usage**: [How this will help implement our idea]
|
|
257
|
-
|
|
258
|
-
### 2. repo2
|
|
259
|
-
...
|
|
260
|
-
|
|
261
|
-
## Reference Papers
|
|
262
|
-
Based on these repos, the key papers to read are:
|
|
263
|
-
1. [Paper Title 1] - ArXiv: 2401.xxxxx
|
|
264
|
-
2. [Paper Title 2] - ArXiv: 2401.xxxxx
|
|
265
|
-
...
|
|
103
|
+
git clone --depth 1 {repo_url} $WORKSPACE/repos/{name}
|
|
266
104
|
```
|
|
267
105
|
|
|
268
|
-
**
|
|
269
|
-
|
|
270
|
-
---
|
|
271
|
-
|
|
272
|
-
## Step 4: Download Papers
|
|
273
|
-
|
|
274
|
-
For each paper referenced in prepare_res.md, download the source.
|
|
275
|
-
|
|
276
|
-
**IMPORTANT: Download .tex source, NOT PDF.** .tex files are much easier for AI to read and extract information from.
|
|
277
|
-
|
|
278
|
-
### 4.1 Download .tex Source (RECOMMENDED - Use arxiv tool)
|
|
279
|
-
|
|
280
|
-
Use the `arxiv` tool with `download: true` to automatically download and extract .tex sources:
|
|
281
|
-
|
|
282
|
-
```
|
|
283
|
-
Tool: arxiv
|
|
284
|
-
Arguments:
|
|
285
|
-
query: "abstractive summarization long document"
|
|
286
|
-
max_results: 10
|
|
287
|
-
download: true
|
|
288
|
-
output_dir: "$WORKSPACE/papers"
|
|
289
|
-
```
|
|
290
|
-
|
|
291
|
-
The tool will:
|
|
292
|
-
1. Search for papers matching your query
|
|
293
|
-
2. Download .tex source from `https://arxiv.org/src/{arxiv_id}`
|
|
294
|
-
3. Extract tar.gz archives automatically
|
|
295
|
-
4. Fall back to PDF if .tex is unavailable
|
|
296
|
-
5. Return a `downloads` array showing what was downloaded
|
|
297
|
-
|
|
298
|
-
**Output format:**
|
|
299
|
-
```json
|
|
300
|
-
{
|
|
301
|
-
"papers": [...],
|
|
302
|
-
"downloads": [
|
|
303
|
-
{"arxiv_id": "2404.04429", "format": "tex", "files": ["main.tex", "methods.tex"]},
|
|
304
|
-
{"arxiv_id": "2308.03664", "format": "pdf", "files": ["2308.03664.pdf"], "error": "tex unavailable"}
|
|
305
|
-
],
|
|
306
|
-
"output_dir": "$WORKSPACE/papers"
|
|
307
|
-
}
|
|
308
|
-
```
|
|
309
|
-
|
|
310
|
-
### 4.2 Manual Download (Fallback)
|
|
311
|
-
|
|
312
|
-
If the arxiv tool is unavailable, use bash:
|
|
106
|
+
3. **Download paper sources:**
|
|
313
107
|
```bash
|
|
314
108
|
mkdir -p $WORKSPACE/papers/{arxiv_id}
|
|
315
|
-
|
|
316
|
-
curl -L "https://arxiv.org/src/{arxiv_id}" -o source.tar.gz
|
|
317
|
-
tar -xzf source.tar.gz 2>/dev/null || mv source.tar.gz main.tex
|
|
109
|
+
curl -L "https://arxiv.org/src/{arxiv_id}" | tar -xz -C $WORKSPACE/papers/{arxiv_id}
|
|
318
110
|
```
|
|
319
111
|
|
|
320
|
-
### 4.3 Document Downloads
|
|
321
|
-
|
|
322
|
-
Write to `$WORKSPACE/papers/download_log.md`:
|
|
323
|
-
|
|
324
|
-
```markdown
|
|
325
|
-
# Downloaded Papers
|
|
326
|
-
|
|
327
|
-
| ArXiv ID | Title | Format | Status |
|
|
328
|
-
|----------|-------|--------|--------|
|
|
329
|
-
| 2404.04429 | Physics-Informed ML for Battery... | .tex | ✓ |
|
|
330
|
-
| 2308.03664 | Two-stage Early Prediction... | .tex | ✓ |
|
|
331
|
-
| 2401.99999 | Some Other Paper | .pdf | ✓ (tex unavailable) |
|
|
332
|
-
```
|
|
333
|
-
|
|
334
|
-
**Output:** `$WORKSPACE/papers/`
|
|
335
|
-
|
|
336
112
|
---
|
|
337
113
|
|
|
338
|
-
## Step
|
|
114
|
+
## Step 4: Analyze Literature
|
|
339
115
|
|
|
340
|
-
|
|
116
|
+
**Prerequisites:** At least 5 papers in `$WORKSPACE/papers/`
|
|
341
117
|
|
|
342
|
-
###
|
|
118
|
+
### 4.1 Read Papers
|
|
343
119
|
|
|
344
|
-
|
|
345
|
-
-
|
|
346
|
-
-
|
|
347
|
-
-
|
|
120
|
+
For each paper, extract:
|
|
121
|
+
- Core contribution (1 sentence)
|
|
122
|
+
- Key method/formula
|
|
123
|
+
- Limitations mentioned
|
|
124
|
+
- Future work suggestions
|
|
348
125
|
|
|
349
|
-
**
|
|
126
|
+
**Long papers (>50KB):** See `references/reading-long-papers.md`
|
|
350
127
|
|
|
351
|
-
|
|
352
|
-
|
|
353
|
-
**IMPORTANT: Ideas must be grounded in the literature you just read. Each idea MUST:**
|
|
354
|
-
- Reference at least 2 specific papers by arXiv ID
|
|
355
|
-
- Identify specific limitations from those papers
|
|
356
|
-
- Propose improvements based on gaps found in the literature
|
|
357
|
-
|
|
358
|
-
### 5.1 Analyze Literature First (REQUIRED)
|
|
359
|
-
|
|
360
|
-
For each paper in `papers/`:
|
|
361
|
-
1. Read thoroughly (especially: abstract, method, experiments, limitations)
|
|
362
|
-
2. Extract: core contribution, math formulas, limitations, future work
|
|
363
|
-
3. Note connections to other papers
|
|
364
|
-
|
|
365
|
-
**Long Papers (>50KB):** See `references/reading-long-papers.md` for chunked reading strategy.
|
|
366
|
-
|
|
367
|
-
For each repo in `repos/`:
|
|
368
|
-
1. Understand structure: `gen_code_tree_structure` equivalent
|
|
369
|
-
2. Identify key implementations
|
|
370
|
-
3. Note reusable components
|
|
371
|
-
|
|
372
|
-
### 5.2 Identify Research Gaps
|
|
128
|
+
### 4.2 Identify Research Gaps
|
|
373
129
|
|
|
374
130
|
Look for:
|
|
375
131
|
- Common limitations across papers
|
|
376
|
-
- Unexplored combinations
|
|
132
|
+
- Unexplored technique combinations
|
|
377
133
|
- Scalability issues
|
|
378
134
|
- Assumptions that could be relaxed
|
|
379
135
|
|
|
380
|
-
|
|
136
|
+
Document gaps in `$WORKSPACE/ideas/gaps.md`:
|
|
137
|
+
```markdown
|
|
138
|
+
# Research Gaps Identified
|
|
139
|
+
|
|
140
|
+
## Gap 1: [Description]
|
|
141
|
+
- Mentioned in: [paper1], [paper2]
|
|
142
|
+
- Why important: ...
|
|
143
|
+
|
|
144
|
+
## Gap 2: [Description]
|
|
145
|
+
...
|
|
146
|
+
```
|
|
147
|
+
|
|
148
|
+
---
|
|
149
|
+
|
|
150
|
+
## Step 5: Generate 5 Ideas
|
|
381
151
|
|
|
382
152
|
Create `$WORKSPACE/ideas/idea_1.md` through `idea_5.md` using template in `references/idea-template.md`.
|
|
383
153
|
|
|
384
|
-
**
|
|
385
|
-
- Each idea
|
|
386
|
-
- Use different strategies
|
|
154
|
+
**Requirements:**
|
|
155
|
+
- Each idea cites ≥2 papers by arXiv ID
|
|
156
|
+
- Use different strategies:
|
|
387
157
|
|
|
388
|
-
|
|
158
|
+
| Idea | Strategy |
|
|
159
|
+
|------|----------|
|
|
160
|
+
| 1 | Combination - merge 2+ techniques |
|
|
161
|
+
| 2 | Simplification - reduce complexity |
|
|
162
|
+
| 3 | Generalization - extend to new domain |
|
|
163
|
+
| 4 | Constraint relaxation - remove assumption |
|
|
164
|
+
| 5 | Architecture innovation - new design |
|
|
389
165
|
|
|
390
|
-
|
|
166
|
+
**❌ REJECTED if:** No arXiv IDs cited, or ideas not grounded in literature
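The citation requirement above can be checked mechanically, at least roughly. The sketch below counts distinct new-style arXiv IDs (`YYMM.NNNNN`) in an idea file; the function names are hypothetical, old-style IDs are not handled, and any other number of the same shape would be counted too, so treat it as a sanity check rather than proof that an idea is grounded.

```shell
# Hypothetical check: count distinct new-style arXiv IDs mentioned in a file.
# Version suffixes such as "v2" are ignored because only the ID part matches.
count_arxiv_ids() {
    grep -oE '[0-9]{4}\.[0-9]{4,5}' "$1" | sort -u | wc -l | tr -d ' '
}

# Succeeds when the idea file cites at least two distinct arXiv IDs.
idea_is_grounded() {
    [ "$(count_arxiv_ids "$1")" -ge 2 ]
}
```

A loop over `"$WORKSPACE"/ideas/idea_*.md` calling `idea_is_grounded` on each file would flag ideas that need more citations before scoring.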
|
|
391
167
|
|
|
392
168
|
---
|
|
393
169
|
|
|
394
170
|
## Step 6: Select and Enhance Best Idea
|
|
395
171
|
|
|
396
|
-
### 6.1
|
|
172
|
+
### 6.1 Score All Ideas
|
|
397
173
|
|
|
398
|
-
|
|
174
|
+
| Idea | Novelty | Feasibility | Impact | Total |
|
|
175
|
+
|------|---------|-------------|--------|-------|
|
|
176
|
+
| 1 | /5 | /5 | /5 | /15 |
|
|
177
|
+
| ... | | | | |
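The scoring table above totals three 1-5 scores per idea, for a maximum of 15. As an illustration of the arithmetic, the sketch below reads rows of the form `idea novelty feasibility impact` and prints the highest-scoring idea with its total. The input format and the `best_idea` name are assumptions made for this example, and ties go to the first idea listed.

```shell
# Hypothetical helper: read "idea novelty feasibility impact" rows on stdin
# and print the idea with the highest total, followed by that total.
best_idea() {
    awk '
        NF == 4 {
            total = $2 + $3 + $4
            # Strict ">" keeps the earliest idea when totals tie.
            if (best == "" || total > max) { best = $1; max = total }
        }
        END { if (best != "") print best, max }
    '
}
```

For instance, feeding it the rows `1 4 3 5`, `2 5 5 3` and `3 2 4 4` gives totals of 12, 13 and 10, so idea 2 is selected.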
|
|
399
178
|
|
|
400
179
|
### 6.2 Enhance Selected Idea
|
|
401
180
|
|
|
@@ -404,62 +183,25 @@ Create `$WORKSPACE/ideas/selected_idea.md` with:
|
|
|
404
183
|
- Architecture choices
|
|
405
184
|
- Hyperparameters
|
|
406
185
|
- Implementation roadmap
|
|
407
|
-
- Failure modes & mitigations
|
|
408
|
-
|
|
409
|
-
**Output:** `$WORKSPACE/ideas/selected_idea.md`
|
|
410
186
|
|
|
411
187
|
---
|
|
412
188
|
|
|
413
|
-
## Step 7: Code Survey
|
|
189
|
+
## Step 7: Code Survey
|
|
414
190
|
|
|
415
|
-
Map
|
|
191
|
+
Map idea concepts to reference implementations.
|
|
416
192
|
|
|
417
|
-
See `references/code-mapping.md` for
|
|
418
|
-
|
|
419
|
-
**Quick steps:**
|
|
420
|
-
1. Extract atomic concepts from `selected_idea.md`
|
|
421
|
-
2. Search repos: `grep -r "class.*Attention" $WORKSPACE/repos/`
|
|
422
|
-
3. Document mapping to `$WORKSPACE/ideas/implementation_report.md`
|
|
193
|
+
See `references/code-mapping.md` for template.
|
|
423
194
|
|
|
424
195
|
**Output:** `$WORKSPACE/ideas/implementation_report.md`
|
|
425
196
|
|
|
426
197
|
---
|
|
427
198
|
|
|
428
|
-
## Step 8:
|
|
199
|
+
## Step 8: Summary
|
|
429
200
|
|
|
430
|
-
Create `$WORKSPACE/ideas/summary.md
|
|
431
|
-
- Task overview (domain, focus)
|
|
432
|
-
- Resources gathered (papers, repos count)
|
|
201
|
+
Create `$WORKSPACE/ideas/summary.md`:
|
|
433
202
|
- All 5 ideas with scores
|
|
434
203
|
- Selected idea details
|
|
435
|
-
- Next steps: `/research-pipeline`
|
|
436
|
-
|
|
437
|
-
**Output:** `$WORKSPACE/ideas/summary.md`
|
|
438
|
-
|
|
439
|
-
---
|
|
440
|
-
|
|
441
|
-
## Quality Checklist
|
|
442
|
-
|
|
443
|
-
Before completing, verify:
|
|
444
|
-
|
|
445
|
-
- [ ] At least 5 repos cloned in `repos/`
|
|
446
|
-
- [ ] At least 3 papers downloaded in `papers/`
|
|
447
|
-
- [ ] All 5 ideas are substantially different
|
|
448
|
-
- [ ] Selected idea has complete math formulations
|
|
449
|
-
- [ ] Implementation report covers ALL atomic concepts
|
|
450
|
-
- [ ] Each concept has actual code reference (not placeholder)
|
|
451
|
-
- [ ] Evaluation plan has specific datasets and metrics
|
|
452
|
-
|
|
453
|
-
---
|
|
454
|
-
|
|
455
|
-
## Integration with Other Skills
|
|
456
|
-
|
|
457
|
-
**After idea-generation:**
|
|
458
|
-
- `/research-pipeline` → Implement the selected idea
|
|
459
|
-
|
|
460
|
-
**To gather more resources:**
|
|
461
|
-
- `/literature-survey` → Comprehensive paper collection
|
|
462
|
-
- `/write-review-paper` → Synthesize into review
|
|
204
|
+
- Next steps: `/research-pipeline` to implement
|
|
463
205
|
|
|
464
206
|
---
|
|
465
207
|
|
|
@@ -467,19 +209,15 @@ Before completing, verify:
|
|
|
467
209
|
|
|
468
210
|
| User Says | Action |
|
|
469
211
|
|-----------|--------|
|
|
470
|
-
| "Generate
|
|
471
|
-
| "
|
|
472
|
-
| "
|
|
473
|
-
| "
|
|
474
|
-
| "Map this idea to code" | Step 7 only |
|
|
212
|
+
| "Generate ideas for X" | Check workspace → ask strategy → generate |
|
|
213
|
+
| "I have papers, generate ideas" | Skip to Step 4 |
|
|
214
|
+
| "Enhance idea N" | Jump to Step 6 |
|
|
215
|
+
| "Map to code" | Jump to Step 7 |
|
|
475
216
|
|
|
476
217
|
---
|
|
477
218
|
|
|
478
|
-
##
|
|
479
|
-
|
|
480
|
-
If more than 10 papers/repos to analyze:
|
|
481
|
-
1. First pass: Quick scan all (abstract/README only)
|
|
482
|
-
2. Select top 5-7 for deep analysis
|
|
483
|
-
3. Generate ideas from deep analysis
|
|
219
|
+
## Integration
|
|
484
220
|
|
|
485
|
-
|
|
221
|
+
- **Before:** `/literature-survey` to collect papers
|
|
222
|
+
- **After:** `/research-pipeline` to implement selected idea
|
|
223
|
+
- **Alternative:** `/write-review-paper` to write survey instead
|
package/skills/literature-survey/SKILL.md
CHANGED
|
@@ -14,6 +14,8 @@ metadata:
|
|
|
14
14
|
|
|
15
15
|
Comprehensive literature discovery workflow for a research domain. This skill searches broadly, filters by relevance, clusters by direction, and iterates to ensure complete coverage.
|
|
16
16
|
|
|
17
|
+
**Workspace:** See `../_shared/workspace-spec.md` for directory structure. Outputs go to `$WORKSPACE/survey/` and `$WORKSPACE/papers/`.
|
|
18
|
+
|
|
17
19
|
## Architecture: Isolated Sub-agent
|
|
18
20
|
|
|
19
21
|
This survey runs in an **isolated sub-session** to avoid context pollution. The main session only receives the final report.
|
package/skills/research-pipeline/SKILL.md
CHANGED
|
@@ -13,44 +13,22 @@ metadata:
|
|
|
13
13
|
|
|
14
14
|
# Research Pipeline
|
|
15
15
|
|
|
16
|
-
Automate an end-to-end ML research workflow: idea
|
|
16
|
+
Automate an end-to-end ML research workflow: idea → literature → survey → plan → implement → review → iterate.
|
|
17
17
|
|
|
18
|
-
|
|
18
|
+
**Workspace:** See `../_shared/workspace-spec.md` for directory structure. Outputs go to `$WORKSPACE/project/`, `$WORKSPACE/iterations/`.
|
|
19
19
|
|
|
20
|
-
|
|
20
|
+
**File existence = step completion.** Skip steps whose output already exists.
|
|
21
21
|
|
|
22
|
-
|
|
22
|
+
---
|
|
23
23
|
|
|
24
|
-
|
|
24
|
+
## Step 0: Check Active Project
|
|
25
25
|
|
|
26
|
-
### Check Active Project
|
|
27
26
|
```bash
|
|
28
27
|
cat ~/.openclaw/workspace/projects/.active 2>/dev/null
|
|
29
28
|
```
|
|
30
29
|
|
|
31
|
-
If
|
|
32
|
-
|
|
33
|
-
If no active project exists, create one based on the research idea (see Step 1).
|
|
34
|
-
|
|
35
|
-
### Directory Structure
|
|
36
|
-
```
|
|
37
|
-
$WORKSPACE/
|
|
38
|
-
├── project.json # Project metadata
|
|
39
|
-
├── task.json # Research task/idea definition
|
|
40
|
-
├── search_results.md # Search results (Step 2)
|
|
41
|
-
├── prepare_res.md # Selected repos (Step 3)
|
|
42
|
-
├── papers/ # Downloaded papers (Step 4)
|
|
43
|
-
├── repos/ # Cloned repositories (Step 3)
|
|
44
|
-
├── notes/ # Paper notes (Step 5)
|
|
45
|
-
├── survey_res.md # Literature survey (Step 5)
|
|
46
|
-
├── plan_res.md # Implementation plan (Step 6)
|
|
47
|
-
├── project/ # Code implementation (Step 7)
|
|
48
|
-
├── ml_res.md # Implementation report (Step 7)
|
|
49
|
-
├── iterations/ # Review iterations (Step 8-9)
|
|
50
|
-
│ ├── judge_v1.md
|
|
51
|
-
│ └── ...
|
|
52
|
-
└── experiment_res.md # Final results (Step 10)
|
|
53
|
-
```
|
|
30
|
+
If active, set `$WORKSPACE = ~/.openclaw/workspace/projects/{project_id}/`.
|
|
31
|
+
If none, create based on research idea in Step 1.
|
|
54
32
|
|
|
55
33
|
---
|
|
56
34
|
|
package/skills/research-pipeline/references/workspace-spec.md
CHANGED
|
@@ -1,81 +1,5 @@
|
|
|
1
|
-
# Workspace
|
|
1
|
+
# Workspace Specification
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
**This file has moved to the shared location.**
|
|
4
4
|
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
```
|
|
8
|
-
workspace/
|
|
9
|
-
task.json # Input: research task definition
|
|
10
|
-
search_results.md # Step 2: arxiv + github search results
|
|
11
|
-
prepare_res.md # Step 3: selected repos and rationale
|
|
12
|
-
survey_res.md # Step 5: synthesized literature survey
|
|
13
|
-
plan_res.md # Step 6: four-part implementation plan
|
|
14
|
-
ml_res.md # Step 7: implementation report
|
|
15
|
-
experiment_res.md # Step 10: full training results
|
|
16
|
-
|
|
17
|
-
repos/ # Step 3: cloned reference repositories
|
|
18
|
-
repo-name-1/
|
|
19
|
-
repo-name-2/
|
|
20
|
-
|
|
21
|
-
papers/ # Step 4: downloaded paper sources
|
|
22
|
-
2401.12345.tex
|
|
23
|
-
2401.67890.tex
|
|
24
|
-
|
|
25
|
-
notes/ # Step 5: per-paper survey notes
|
|
26
|
-
paper_001.md
|
|
27
|
-
paper_002.md
|
|
28
|
-
|
|
29
|
-
iterations/ # Steps 8-9: review history
|
|
30
|
-
judge_v1.md
|
|
31
|
-
judge_v2.md
|
|
32
|
-
|
|
33
|
-
project/ # Step 7: implementation code
|
|
34
|
-
model/
|
|
35
|
-
data/
|
|
36
|
-
training/
|
|
37
|
-
testing/
|
|
38
|
-
utils/
|
|
39
|
-
run.py
|
|
40
|
-
requirements.txt
|
|
41
|
-
```
|
|
42
|
-
|
|
43
|
-
## Conventions
|
|
44
|
-
|
|
45
|
-
### File Existence = Step Completion
|
|
46
|
-
|
|
47
|
-
The research pipeline uses file existence as the checkpoint mechanism. Before executing any step, check whether its output file already exists. If it does, skip the step.
|
|
48
|
-
|
|
49
|
-
This enables:
|
|
50
|
-
- **Crash recovery**: resume from the last completed step.
|
|
51
|
-
- **Incremental progress**: re-running the pipeline skips completed work.
|
|
52
|
-
- **Transparency**: a human can inspect progress by listing the directory.
|
|
53
|
-
|
|
54
|
-
### Naming Rules
|
|
55
|
-
|
|
56
|
-
- Markdown files (`.md`) for human-readable outputs.
|
|
57
|
-
- JSON files (`.json`) for structured data (task definition).
|
|
58
|
-
- Paper notes use sequential numbering: `paper_001.md`, `paper_002.md`.
|
|
59
|
-
- Review iterations use version numbering: `judge_v1.md`, `judge_v2.md`.
|
|
60
|
-
|
|
61
|
-
### Immutability
|
|
62
|
-
|
|
63
|
-
Once a step's output is written, do NOT modify it unless the user explicitly asks. If a step needs to be re-done, delete the output file first, then re-execute.
|
|
64
|
-
|
|
65
|
-
Exception: `workspace/project/` is mutable during the implement-review-iterate loop (Steps 7-9).
|
|
66
|
-
|
|
67
|
-
### task.json Schema
|
|
68
|
-
|
|
69
|
-
```json
|
|
70
|
-
{
|
|
71
|
-
"idea": "A 1-3 sentence description of the research idea",
|
|
72
|
-
"references": ["2401.12345", "paper title string"],
|
|
73
|
-
"domain": "recommendation systems",
|
|
74
|
-
"date_limit": "2024-01-01"
|
|
75
|
-
}
|
|
76
|
-
```
|
|
77
|
-
|
|
78
|
-
- `idea` (required): The core research idea to implement.
|
|
79
|
-
- `references` (optional): ArXiv IDs or paper titles as starting points.
|
|
80
|
-
- `domain` (optional): Research domain for focused searching.
|
|
81
|
-
- `date_limit` (optional): Only consider papers published after this date.
|
|
5
|
+
See: `../../_shared/workspace-spec.md` for the unified workspace specification used by all Scientify skills.
|
package/skills/write-review-paper/SKILL.md
CHANGED
|
@@ -14,6 +14,8 @@ metadata:
|
|
|
14
14
|
|
|
15
15
|
Guide for writing a structured literature review or survey paper from papers you've already collected. This skill helps with reading strategy, note organization, and academic writing.
|
|
16
16
|
|
|
17
|
+
**Workspace:** See `../_shared/workspace-spec.md` for directory structure. Outputs go to `$WORKSPACE/review/`.
|
|
18
|
+
|
|
17
19
|
## Prerequisites
|
|
18
20
|
|
|
19
21
|
Before starting, ensure you have:
|