scientify 1.2.0 → 1.2.2
- package/package.json +1 -1
- package/skills/idea-generation/SKILL.md +113 -643
- package/skills/idea-generation/references/code-mapping.md +51 -0
- package/skills/idea-generation/references/idea-template.md +49 -184
- package/skills/idea-generation/references/reading-long-papers.md +43 -0
- package/skills/literature-survey/SKILL.md +1 -14
- package/skills/research-pipeline/SKILL.md +1 -1
- package/skills/write-review-paper/SKILL.md +5 -179
- package/skills/write-review-paper/references/survey-template.md +70 -0
@@ -1,6 +1,6 @@
 ---
 name: idea-generation
-description: "Generate innovative research ideas from
+description: "Generate 5 innovative research ideas from collected papers. Analyzes literature, identifies gaps, proposes novel methods with citations. Use for: 找研究方向, 生成创新点, find research gaps. Requires papers in workspace (run /literature-survey first if needed)."
 metadata:
 {
 "openclaw":
@@ -13,719 +13,193 @@ metadata:

 # Idea Generation

-
+Generate innovative research ideas grounded in literature analysis. This skill reads existing papers, identifies research gaps, and produces 5 distinct ideas with citations.

-
-2. Select and download references
-3. Analyze literature and codebases
-4. Generate multiple ideas
-5. Select and enhance the best idea
-6. Map to code implementations
+**Core principle:** Ideas MUST be grounded in actual papers, not generated from model knowledge.

 ---

-##
+## Step 1: Check Workspace Resources

-
-- Do NOT ask "要我继续吗?" or "Should I proceed?"
-- You MAY spawn subagents for parallel tasks (e.g., downloading multiple papers)
-- Only ask user when there's a genuine ambiguity (e.g., which focus area to choose)
-- Checkpoints are for YOUR internal verification, not for asking user
-
-**Run the entire workflow from Step 1 to Step 8 automatically.**
-
----
-
-## ⚠️ CRITICAL: MANDATORY TOOL USAGE
-
-**DO NOT generate ideas from your own knowledge.** All ideas MUST be grounded in actual literature research.
-
-### Blocking Requirements
-
-1. **MUST call `arxiv` tool** to search papers - NO EXCEPTIONS
-2. **MUST call `github_search` tool** to find repositories - NO EXCEPTIONS
-3. **MUST write `search_results.md`** BEFORE proceeding to idea generation
-4. **MUST reference specific papers** (with arXiv IDs) in generated ideas
-5. **MUST clone actual repos** before code survey
-
-### Anti-Pattern: DO NOT DO THIS
-
-❌ User asks about "time series forecasting" → Agent immediately lists methods from memory
-❌ Agent generates ideas without calling any search tools
-❌ Agent skips to idea generation without `search_results.md` existing
-
-### Correct Pattern: DO THIS
-
-✅ User asks about "time series forecasting" → Agent calls `arxiv` tool with query
-✅ Agent calls `github_search` tool to find implementations
-✅ Agent writes search results to file
-✅ Agent reads downloaded papers before generating ideas
-✅ Ideas reference specific papers by arXiv ID
-
----
-
-## Workspace Convention (Project-based)
-
-**IMPORTANT**: Each research topic uses its own project directory. Agent auto-selects or creates projects.
-
-```
-~/.openclaw/workspace/
-└── projects/
-├── .active # Current project ID (plain text file)
-├── nlp-summarization/ # Project A
-│ ├── project.json # Project metadata
-│ ├── task.json # Research task definition
-│ ├── search_results.md # Search results
-│ ├── prepare_res.md # Selected repos summary
-│ ├── papers/ # Downloaded papers
-│ ├── repos/ # Cloned repositories
-│ └── ideas/ # Generated ideas
-├── image-segmentation/ # Project B
-│ └── ...
-└── ...
-```
-
-**All paths are project-relative**: `~/.openclaw/workspace/projects/{project_id}/`
-
-**File existence = step completion.** Skip steps whose output already exists.
-
----
-
-## Step 0: Auto Project Management (REQUIRED)
-
-**Agent autonomously manages projects. DO NOT ask user for confirmation.**
-
-### 0.1 Extract Topic from User Query
-
-Analyze the user's message to identify the research topic. Examples:
-- "帮我调研文本摘要方法" → topic: `text-summarization`
-- "推荐系统的深度学习方法" → topic: `rec-deep-learning`
-- "transformer attention optimization" → topic: `transformer-attention`
-
-Convert to kebab-case ID: lowercase, spaces/special chars → hyphens.
-
-### 0.2 Check Existing Projects
+First, check what resources already exist:

 ```bash
-
-
+# Check active project
+cat ~/.openclaw/workspace/projects/.active 2>/dev/null

-
-
-cat ~/.openclaw/workspace/projects/*/project.json 2>/dev/null
-```
+# Check papers
+ls ~/.openclaw/workspace/projects/*/papers/ 2>/dev/null | head -20

-
-
-**If matching project exists**: Use it, update `.active`
-```bash
-echo "{project_id}" > ~/.openclaw/workspace/projects/.active
+# Check survey results
+cat ~/.openclaw/workspace/projects/*/survey/clusters.json 2>/dev/null | head -5
 ```

-
-```bash
-PROJECT_ID="{topic-as-kebab-case}"
-mkdir -p ~/.openclaw/workspace/projects/$PROJECT_ID/{papers,repos,ideas}
-echo "$PROJECT_ID" > ~/.openclaw/workspace/projects/.active
-
-# Create project.json
-cat > ~/.openclaw/workspace/projects/$PROJECT_ID/project.json << 'EOF'
-{
-"id": "{project_id}",
-"name": "{Human readable name}",
-"created": "{ISO date}",
-"topics": ["{keyword1}", "{keyword2}"]
-}
-EOF
-```
-
-### 0.4 Set Working Paths
-
-After project selection, ALL subsequent paths use:
-```
-WORKSPACE=~/.openclaw/workspace/projects/{project_id}
-$WORKSPACE/task.json
-$WORKSPACE/search_results.md
-$WORKSPACE/papers/
-$WORKSPACE/repos/
-$WORKSPACE/ideas/
-$WORKSPACE/prepare_res.md
-```
+### Assess Available Resources

-
-
+| Resource | Location | Status |
+|----------|----------|--------|
+| Papers | `$WORKSPACE/papers/` | Count: ? |
+| Survey clusters | `$WORKSPACE/survey/clusters.json` | Exists: Y/N |
+| Repos | `$WORKSPACE/repos/` | Count: ? |
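
The status column of the resource table added above could be filled in programmatically. The following is a minimal sketch only, not part of the package: the function name `assess_workspace` and the returned keys are hypothetical, and it assumes each paper or repo is one non-hidden entry directly under `papers/` or `repos/`.

```python
from pathlib import Path


def assess_workspace(workspace: Path) -> dict:
    """Summarize the three resources from the table for one project directory."""

    def count_entries(directory: Path) -> int:
        # A missing directory counts as zero entries; hidden files are ignored.
        if not directory.is_dir():
            return 0
        return sum(1 for entry in directory.iterdir() if not entry.name.startswith("."))

    return {
        "papers": count_entries(workspace / "papers"),
        "survey_clusters": (workspace / "survey" / "clusters.json").is_file(),
        "repos": count_entries(workspace / "repos"),
    }
```

Called as `assess_workspace(Path.home() / ".openclaw/workspace/projects" / project_id)`, the result maps directly onto the `Count: ?` and `Exists: Y/N` cells.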

 ---

-## Step
+## Step 2: Ask User About Search Strategy

-
+Based on workspace state, ask user:

-
-
-
+**If papers exist (≥5):**
+> 📚 Found {N} papers in workspace from previous survey.
+>
+> Options:
+> 1. **Use existing papers** - Generate ideas from current collection
+> 2. **Search more** - Run `/literature-survey` to expand collection
+> 3. **Quick search** - Add 5-10 more papers on specific topic

-
-
-
-
-
-
-
-"domain": "graph neural networks",
-"focus": "scalable transformers for node classification",
-"date_limit": "2024-01-01",
-"created": "2024-XX-XX"
-}
-```
-
-**Output:** `$WORKSPACE/task.json`
+**If no papers:**
+> 📭 No papers found in workspace.
+>
+> To generate grounded ideas, I need literature. Options:
+> 1. **Run /literature-survey** - Comprehensive search (100+ papers, recommended)
+> 2. **Quick search** - Fetch 10-15 papers on your topic now
+> 3. **You provide papers** - Point me to existing PDFs/tex files

 ---

-## Step
-
-**⚠️ BLOCKING: You MUST complete this step before ANY idea generation.**
-
-### 2.1 ArXiv Search (REQUIRED)
+## Step 3: Acquire Resources (if needed)

-
+### Option A: Delegate to /literature-survey (Recommended)

+If user wants comprehensive search:
 ```
-
-Arguments:
-query: "text summarization transformer model"
-max_results: 10
-sort_by: "relevance"
-```
-
-If `arxiv` tool is not available, use `WebSearch` with `site:arxiv.org`:
-```
-Tool: WebSearch
-Arguments:
-query: "site:arxiv.org text summarization transformer model"
-```
-
-### 2.2 GitHub Search (REQUIRED)
-
-**You MUST call the `github_search` tool or search GitHub.** Example:
-
-```
-Tool: github_search
-Arguments:
-query: "text summarization pytorch huggingface"
-sort: "stars"
-max_results: 20
-```
-
-If `github_search` tool is not available, use `WebSearch`:
-```
-Tool: WebSearch
-Arguments:
-query: "site:github.com text summarization pytorch stars:>100"
-```
-
-### 2.3 CHECKPOINT: Verify Search Completed
+Please run: /literature-survey {topic}

-
--
--
--
--
+This will:
+- Search 100+ papers systematically
+- Filter by relevance (score ≥4)
+- Cluster into research directions
+- Save to $WORKSPACE/papers/

-
-
-### 2.4 Compile Results
-
-Write to `$WORKSPACE/search_results.md`:
-
-```markdown
-# Search Results
-
-## Task
-- Domain: {domain}
-- Focus: {focus}
-- Date: {date}
-
-## ArXiv Papers Found
-
-| # | Title | ArXiv ID | Year | Relevance |
-|---|-------|----------|------|-----------|
-| 1 | [Title](pdf_url) | 2401.xxxxx | 2024 | [Why relevant] |
-| 2 | ... | ... | ... | ... |
-
-## GitHub Repositories Found
-
-| # | Repository | Stars | Language | Relevance |
-|---|------------|-------|----------|-----------|
-| 1 | [owner/repo](url) | 1.2k | Python | [Why relevant] |
-| 2 | ... | ... | ... | ... |
+After survey completes, run /idea-generation again.
 ```

-
-
----
+### Option B: Quick Search (5-10 papers)

-
+For fast iteration, do minimal search:

-
-
-Selection criteria:
-- Direct implementation of relevant papers
-- High code quality (stars, documentation)
-- Active maintenance
-- Covers key techniques in the domain
-
-### 3.1 Clone Selected Repos
-
-```bash
-mkdir -p $WORKSPACE/repos
-cd $WORKSPACE/repos
-
-# For each selected repo:
-git clone --depth 1 https://github.com/owner/repo1.git
-git clone --depth 1 https://github.com/owner/repo2.git
-# ... at least 5 repos
+1. **ArXiv search:**
 ```
-
-### 3.2 Document Selection
-
-Write to `$WORKSPACE/prepare_res.md`:
-
-```markdown
-# Selected Reference Codebases
-
-## Selection Rationale
-[Why these repos were chosen]
-
-## Repositories
-
-### 1. repo1
-- **URL**: https://github.com/owner/repo1
-- **Paper**: [Associated paper if any]
-- **Key Components**:
-- `model/` - Model architecture
-- `train.py` - Training loop
-- **Usage**: [How this will help implement our idea]
-
-### 2. repo2
-...
-
-## Reference Papers
-Based on these repos, the key papers to read are:
-1. [Paper Title 1] - ArXiv: 2401.xxxxx
-2. [Paper Title 2] - ArXiv: 2401.xxxxx
-...
-```
-
-**Output:** `$WORKSPACE/prepare_res.md` + `$WORKSPACE/repos/`
-
----
-
-## Step 4: Download Papers
-
-For each paper referenced in prepare_res.md, download the source.
-
-**IMPORTANT: Download .tex source, NOT PDF.** .tex files are much easier for AI to read and extract information from.
-
-### 4.1 Download .tex Source (RECOMMENDED - Use arxiv tool)
-
-Use the `arxiv` tool with `download: true` to automatically download and extract .tex sources:
-
-```
-Tool: arxiv
+Tool: arxiv_search
 Arguments:
-query: "
+query: "{user_topic}"
 max_results: 10
-download: true
-output_dir: "$WORKSPACE/papers"
 ```

-
-
-
-
-4. Fall back to PDF if .tex is unavailable
-5. Return a `downloads` array showing what was downloaded
-
-**Output format:**
-```json
-{
-"papers": [...],
-"downloads": [
-{"arxiv_id": "2404.04429", "format": "tex", "files": ["main.tex", "methods.tex"]},
-{"arxiv_id": "2308.03664", "format": "pdf", "files": ["2308.03664.pdf"], "error": "tex unavailable"}
-],
-"output_dir": "$WORKSPACE/papers"
-}
+2. **Clone 3-5 reference repos:**
+```bash
+mkdir -p $WORKSPACE/repos
+git clone --depth 1 {repo_url} $WORKSPACE/repos/{name}
 ```

-
-
-If the arxiv tool is unavailable, use bash:
+3. **Download paper sources:**
 ```bash
 mkdir -p $WORKSPACE/papers/{arxiv_id}
-
-curl -L "https://arxiv.org/src/{arxiv_id}" -o source.tar.gz
-tar -xzf source.tar.gz 2>/dev/null || mv source.tar.gz main.tex
+curl -L "https://arxiv.org/src/{arxiv_id}" | tar -xz -C $WORKSPACE/papers/{arxiv_id}
 ```
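
The removed 1.2.0 lines fell back to treating the download as a single file when `tar` failed, while the new one-line pipe has no such fallback. The sketch below illustrates one way to keep that fallback once the payload bytes are in hand. It is not the package's code: `unpack_source` and the `paper.pdf` name are hypothetical, and it assumes the payload is a (possibly compressed) tar archive, a gzip-compressed single file, or an uncompressed file.

```python
import gzip
import io
import tarfile
from pathlib import Path


def unpack_source(payload: bytes, dest: Path) -> list[str]:
    """Unpack a downloaded source payload into dest; return the file names written."""
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    try:
        with tarfile.open(fileobj=io.BytesIO(payload), mode="r:*") as archive:
            written = []
            for member in archive.getmembers():
                target = (root / member.name).resolve()
                # Skip directories, links, and any path that would escape dest.
                if not member.isfile() or not target.is_relative_to(root):
                    continue
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(archive.extractfile(member).read())
                written.append(member.name)
            return sorted(written)
    except tarfile.ReadError:
        pass  # Not a tar archive; fall through to the single-file case.

    try:
        data = gzip.decompress(payload)
    except gzip.BadGzipFile:
        data = payload  # Assume the payload is already uncompressed.

    # Hypothetical naming: PDFs keep a .pdf name, anything else is treated as TeX.
    name = "paper.pdf" if data.startswith(b"%PDF") else "main.tex"
    (dest / name).write_bytes(data)
    return [name]
```

Working on bytes rather than a shell pipe means a failed extraction leaves the payload available for the fallback instead of discarding it.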

-
+---

-
+## Step 4: Analyze Literature

-
-# Downloaded Papers
+**Prerequisites:** At least 5 papers in `$WORKSPACE/papers/`

-
-|----------|-------|--------|--------|
-| 2404.04429 | Physics-Informed ML for Battery... | .tex | ✓ |
-| 2308.03664 | Two-stage Early Prediction... | .tex | ✓ |
-| 2401.99999 | Some Other Paper | .pdf | ✓ (tex unavailable) |
-```
+### 4.1 Read Papers

-
+For each paper, extract:
+- Core contribution (1 sentence)
+- Key method/formula
+- Limitations mentioned
+- Future work suggestions

-
+**Long papers (>50KB):** See `references/reading-long-papers.md`
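
To decide which papers need the long-paper procedure, the 50KB threshold can be checked before reading. A minimal sketch, assuming the threshold applies per `.tex` file and that "KB" means 1024 bytes (neither is stated in the skill text); `split_by_size` is a hypothetical helper name.

```python
from pathlib import Path

# ">50KB" comes from the skill text; treating KB as 1024 bytes is an assumption.
LONG_PAPER_BYTES = 50 * 1024


def split_by_size(paper_dir: Path, threshold: int = LONG_PAPER_BYTES):
    """Return (short, long) lists of .tex file names under paper_dir."""
    short, long_papers = [], []
    for tex in sorted(paper_dir.rglob("*.tex")):
        bucket = long_papers if tex.stat().st_size > threshold else short
        bucket.append(tex.name)
    return short, long_papers
```

Files in the second list would be read in chunks as described in `references/reading-long-papers.md`; the rest can be read in one pass.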

-
-
-**⚠️ BLOCKING: DO NOT start this step unless Steps 2-4 are complete.**
-
-### Pre-requisite Checkpoint
-
-Before generating ANY ideas, verify these files exist:
-- [ ] `$WORKSPACE/search_results.md` - search results from Step 2
-- [ ] `$WORKSPACE/prepare_res.md` - selected repos from Step 3
-- [ ] At least 3 papers downloaded in `$WORKSPACE/papers/`
-
-**If any file is missing, GO BACK and complete the previous steps.**
-
-This is the core intellectual step. Generate **exactly 5 distinct innovative ideas**.
-
-**IMPORTANT: Ideas must be grounded in the literature you just read. Each idea MUST:**
-- Reference at least 2 specific papers by arXiv ID
-- Identify specific limitations from those papers
-- Propose improvements based on gaps found in the literature
-
-### 5.1 Analyze Literature First (REQUIRED)
-
-For each paper in `papers/`:
-1. Read thoroughly (especially: abstract, method, experiments, limitations)
-2. Extract: core contribution, math formulas, limitations, future work
-3. Note connections to other papers
-
-**⚠️ Handling Long Papers (>50KB or >15k tokens):**
-
-If a .tex file is too long to read in one pass:
-
-1. **First pass - Structure scan:**
-```bash
-# List all .tex files and their sizes
-ls -la $WORKSPACE/papers/{arxiv_id}/
-# Check line count
-wc -l $WORKSPACE/papers/{arxiv_id}/*.tex
-```
-
-2. **Chunked reading strategy:**
-- Read `abstract` section first (usually in main.tex, first 200 lines)
-- Read `\section{Introduction}` or `\section{Method}` separately
-- Read `\section{Experiments}` or `\section{Results}` separately
-- Read `\section{Conclusion}` and `\section{Related Work}` last
-
-Use the Read tool with `offset` and `limit` parameters:
-```
-Tool: Read
-Arguments:
-file_path: "$WORKSPACE/papers/2404.04429/main.tex"
-offset: 1
-limit: 500 # First 500 lines (abstract + intro)
-```
-
-Then continue:
-```
-Tool: Read
-Arguments:
-file_path: "$WORKSPACE/papers/2404.04429/main.tex"
-offset: 500
-limit: 500 # Lines 500-1000 (method section)
-```
-
-3. **Priority sections for idea generation:**
-| Priority | Section | Why |
-|----------|---------|-----|
-| 1 | Abstract | Core contribution |
-| 2 | Method/Approach | Technical details, formulas |
-| 3 | Experiments | What works, what doesn't |
-| 4 | Conclusion/Future Work | Limitations, open problems |
-| 5 | Related Work | Connections to other papers |
-
-4. **Skip if context-limited:**
-- Appendix (proofs, supplementary)
-- Acknowledgments
-- Detailed hyperparameter tables
-
-For each repo in `repos/`:
-1. Understand structure: `gen_code_tree_structure` equivalent
-2. Identify key implementations
-3. Note reusable components
-
-### 5.2 Identify Research Gaps
+### 4.2 Identify Research Gaps

 Look for:
 - Common limitations across papers
-- Unexplored combinations
+- Unexplored technique combinations
 - Scalability issues
 - Assumptions that could be relaxed

-
+Document gaps in `$WORKSPACE/ideas/gaps.md`:
+```markdown
+# Research Gaps Identified

-
+## Gap 1: [Description]
+- Mentioned in: [paper1], [paper2]
+- Why important: ...

-
-
-
-
-
-- Example: "Method B [arXiv:2302.67890] proposes Z but has limitation W"
-- Motivation (why this gap matters)
-- Proposed method (with math formulas)
-- **How this improves on cited papers**
-- Expected advantages
-- Evaluation plan (datasets, baselines from the papers you read)
-- Novelty/Feasibility/Impact scores
+## Gap 2: [Description]
+...
+```
+
+---

-
+## Step 5: Generate 5 Ideas

-
+Create `$WORKSPACE/ideas/idea_1.md` through `idea_5.md` using template in `references/idea-template.md`.

-
+**Requirements:**
+- Each idea cites ≥2 papers by arXiv ID
+- Use different strategies:

 | Idea | Strategy |
 |------|----------|
-| 1 | Combination - merge
-| 2 | Simplification -
-| 3 | Generalization - extend to new domain
-| 4 | Constraint relaxation - remove
-| 5 | Architecture innovation -
+| 1 | Combination - merge 2+ techniques |
+| 2 | Simplification - reduce complexity |
+| 3 | Generalization - extend to new domain |
+| 4 | Constraint relaxation - remove assumption |
+| 5 | Architecture innovation - new design |

-
-
-**Output:** `$WORKSPACE/ideas/idea_1.md` through `idea_5.md`
+**❌ REJECTED if:** No arXiv IDs cited, or ideas not grounded in literature
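
The citation requirement above (at least 2 papers per idea, by arXiv ID) lends itself to a mechanical check. A minimal sketch, assuming ideas cite new-style arXiv identifiers such as `2404.04429` with an optional version suffix; old-style identifiers are not matched, and any other number of the same shape would be counted as well. The function names are hypothetical, and the check says nothing about whether an idea is actually grounded in the cited papers.

```python
import re

# New-style arXiv identifier: YYMM.NNNN or YYMM.NNNNN, optionally followed by vN.
ARXIV_ID = re.compile(r"\b\d{4}\.\d{4,5}(?:v\d+)?\b")


def cited_arxiv_ids(idea_text: str) -> set[str]:
    """Return the distinct arXiv IDs found in the text, with version suffixes dropped."""
    return {match.split("v")[0] for match in ARXIV_ID.findall(idea_text)}


def meets_citation_requirement(idea_text: str, minimum: int = 2) -> bool:
    """True when the idea cites at least `minimum` distinct arXiv IDs."""
    return len(cited_arxiv_ids(idea_text)) >= minimum
```

Running this over each `idea_N.md` would flag files that fail the first rejection criterion before any scoring happens.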

 ---

 ## Step 6: Select and Enhance Best Idea

-### 6.1
+### 6.1 Score All Ideas

-
-
-
-
-
-| Idea | Title | Novelty | Feasibility | Impact | Total |
-|------|-------|---------|-------------|--------|-------|
-| 1 | ... | 4 | 3 | 4 | 11 |
-| 2 | ... | 5 | 4 | 5 | 14 |
-| 3 | ... | 3 | 5 | 3 | 11 |
-| 4 | ... | 4 | 4 | 4 | 12 |
-| 5 | ... | 3 | 3 | 4 | 10 |
-
-**Selected: Idea 2**
-
-## Selection Rationale
-[Why this idea is most promising - technical innovation, feasibility, impact]
-```
+| Idea | Novelty | Feasibility | Impact | Total |
+|------|---------|-------------|--------|-------|
+| 1 | /5 | /5 | /5 | /15 |
+| ... | | | | |
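
The scoring table sums three 1-to-5 scores into a total out of 15, and the highest total is selected. A minimal sketch of that arithmetic follows; the `IdeaScore` class and `select_best` are hypothetical names, and resolving ties in favour of the earliest idea is an assumption, since neither version of the skill states a tie-break rule.

```python
from dataclasses import dataclass


@dataclass
class IdeaScore:
    idea: int
    novelty: int
    feasibility: int
    impact: int

    def __post_init__(self) -> None:
        # Each criterion is scored out of 5, matching the "/5" columns.
        for value in (self.novelty, self.feasibility, self.impact):
            if not 1 <= value <= 5:
                raise ValueError("scores must be between 1 and 5")

    @property
    def total(self) -> int:
        return self.novelty + self.feasibility + self.impact


def select_best(scores: list[IdeaScore]) -> IdeaScore:
    """Return the idea with the highest total; on a tie, the first one listed wins."""
    return max(scores, key=lambda score: score.total)
```

With the example values from the removed 1.2.0 table (totals 11, 14, 11, 12, 10), `select_best` returns idea 2, which is the idea that table marked as selected.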

 ### 6.2 Enhance Selected Idea

-
-
-
-
-
-3. Hyperparameter recommendations
-4. Implementation roadmap
-5. Potential failure modes and mitigations
-6. Detailed experiment design
-
-**Output:** `$WORKSPACE/ideas/selected_idea.md`
-
----
-
-## Step 7: Code Survey - Map Idea to Implementations
-
-This step bridges theory and code. For each **atomic concept** in the selected idea, find corresponding implementations in the reference repos.
-
-### 7.1 Extract Atomic Concepts
-
-From selected_idea.md, list all concepts needing implementation:
-
-```markdown
-## Atomic Concepts to Implement
-
-1. Multi-head Self-Attention
-2. Graph Message Passing
-3. Energy-based Diffusion
-4. Adaptive Diffusivity Function
-5. ...
-```
-
-### 7.2 Survey Codebases
-
-For each concept:
-
-1. Search repos for relevant code:
-```bash
-grep -r "class.*Attention" $WORKSPACE/repos/
-grep -r "def forward" $WORKSPACE/repos/
-```
-
-2. Read and understand the implementation
-
-3. Document the mapping
-
-### 7.3 Create Implementation Report
-
-Write to `$WORKSPACE/ideas/implementation_report.md`:
-
-```markdown
-# Implementation Report
-
-## Selected Idea Summary
-[One paragraph summary]
-
-## Concept-to-Code Mapping
-
-### Concept 1: Multi-head Self-Attention
-
-**Math Formula:**
-$$
-\text{Attention}(Q,K,V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
-$$
-
-**Reference Implementation:**
-- File: `repos/transformer/attention.py`
-- Class: `MultiHeadAttention`
-- Key code:
-```python
-class MultiHeadAttention(nn.Module):
-def __init__(self, d_model, n_heads):
-self.d_k = d_model // n_heads
-self.W_q = nn.Linear(d_model, d_model)
-# ...
-
-def forward(self, x):
-Q = self.W_q(x)
-# ...
-```
-
-**Adaptation needed:**
-- [What to modify for our idea]
-
----
-
-### Concept 2: Graph Message Passing
-...
+Create `$WORKSPACE/ideas/selected_idea.md` with:
+- Detailed math (loss functions, gradients)
+- Architecture choices
+- Hyperparameters
+- Implementation roadmap

 ---

-##
+## Step 7: Code Survey

-
-2. [ ] Build Concept Y on top
-3. [ ] Integrate with Concept Z
-4. [ ] Add training loop from repo W
+Map idea concepts to reference implementations.

-
-[Which repo to fork/use as base]
-```
+See `references/code-mapping.md` for template.

 **Output:** `$WORKSPACE/ideas/implementation_report.md`

 ---

-## Step 8:
+## Step 8: Summary

 Create `$WORKSPACE/ideas/summary.md`:
-
-
-
-
-## Task
-- Domain: {domain}
-- Focus: {focus}
-- Date: {date}
-
-## Resources Gathered
-- Papers analyzed: X
-- Repositories cloned: Y
-- Key techniques identified: Z
-
-## Ideas Generated
-1. **[Idea 1 title]** - Score: 11
-2. **[Idea 2 title]** - Score: 14 ⭐ SELECTED
-3. **[Idea 3 title]** - Score: 11
-4. **[Idea 4 title]** - Score: 12
-5. **[Idea 5 title]** - Score: 10
-
-## Selected Idea
-**{Title}**
-
-{One paragraph description}
-
-### Key Innovation
-{What makes this novel}
-
-### Implementation Ready
-- Math formulas: ✓ Complete
-- Code references: ✓ Mapped
-- Evaluation plan: ✓ Defined
-
-## Next Steps
-1. Run `/research-pipeline` with `selected_idea.md` as input
-2. Or manually implement following `implementation_report.md`
-
-## Files Generated
-- `task.json` - Task definition
-- `search_results.md` - Search results
-- `prepare_res.md` - Selected repos
-- `ideas/idea_*.md` - 5 generated ideas
-- `ideas/selected_idea.md` - Enhanced best idea
-- `ideas/implementation_report.md` - Code mapping
-```
-
-**Output:** `$WORKSPACE/ideas/summary.md`
-
----
-
-## Quality Checklist
-
-Before completing, verify:
-
-- [ ] At least 5 repos cloned in `repos/`
-- [ ] At least 3 papers downloaded in `papers/`
-- [ ] All 5 ideas are substantially different
-- [ ] Selected idea has complete math formulations
-- [ ] Implementation report covers ALL atomic concepts
-- [ ] Each concept has actual code reference (not placeholder)
-- [ ] Evaluation plan has specific datasets and metrics
-
----
-
-## Integration with Other Skills
-
-**After idea-generation:**
-```
-/research-pipeline → Implement the selected idea
-```
-
-**To gather more resources:**
-```
-/arxiv "specific topic" → Search more papers
-/literature-review → Deep dive into papers
-```
+- All 5 ideas with scores
+- Selected idea details
+- Next steps: `/research-pipeline` to implement

 ---

@@ -733,19 +207,15 @@ Before completing, verify:

 | User Says | Action |
 |-----------|--------|
-| "Generate
-| "
-| "
-| "
-| "Map this idea to code" | Step 7 only |
+| "Generate ideas for X" | Check workspace → ask strategy → generate |
+| "I have papers, generate ideas" | Skip to Step 4 |
+| "Enhance idea N" | Jump to Step 6 |
+| "Map to code" | Jump to Step 7 |

 ---

-##
-
-If more than 10 papers/repos to analyze:
-1. First pass: Quick scan all (abstract/README only)
-2. Select top 5-7 for deep analysis
-3. Generate ideas from deep analysis
+## Integration

-
+- **Before:** `/literature-survey` to collect papers
+- **After:** `/research-pipeline` to implement selected idea
+- **Alternative:** `/write-review-paper` to write survey instead