@wentorai/research-plugins 1.0.0 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +22 -22
- package/curated/analysis/README.md +71 -56
- package/curated/domains/README.md +176 -67
- package/curated/literature/README.md +71 -47
- package/curated/research/README.md +91 -58
- package/curated/tools/README.md +88 -87
- package/curated/writing/README.md +80 -45
- package/mcp-configs/cloud-docs/confluence-mcp.json +37 -0
- package/mcp-configs/cloud-docs/google-drive-mcp.json +35 -0
- package/mcp-configs/cloud-docs/notion-mcp.json +29 -0
- package/mcp-configs/communication/discord-mcp.json +29 -0
- package/mcp-configs/communication/slack-mcp.json +29 -0
- package/mcp-configs/communication/telegram-mcp.json +28 -0
- package/mcp-configs/database/neo4j-mcp.json +37 -0
- package/mcp-configs/database/postgres-mcp.json +28 -0
- package/mcp-configs/database/sqlite-mcp.json +29 -0
- package/mcp-configs/dev-platform/github-mcp.json +31 -0
- package/mcp-configs/dev-platform/gitlab-mcp.json +34 -0
- package/mcp-configs/email/email-mcp.json +40 -0
- package/mcp-configs/email/gmail-mcp.json +37 -0
- package/mcp-configs/registry.json +178 -149
- package/mcp-configs/repository/dataverse-mcp.json +33 -0
- package/mcp-configs/repository/huggingface-mcp.json +29 -0
- package/openclaw.plugin.json +2 -2
- package/package.json +2 -2
- package/skills/analysis/dataviz/algorithm-visualizer-guide/SKILL.md +259 -0
- package/skills/analysis/dataviz/bokeh-visualization-guide/SKILL.md +270 -0
- package/skills/analysis/dataviz/chart-image-generator/SKILL.md +229 -0
- package/skills/analysis/dataviz/d3-visualization-guide/SKILL.md +281 -0
- package/skills/analysis/dataviz/echarts-visualization-guide/SKILL.md +250 -0
- package/skills/analysis/dataviz/metabase-analytics-guide/SKILL.md +242 -0
- package/skills/analysis/dataviz/plotly-interactive-guide/SKILL.md +266 -0
- package/skills/analysis/dataviz/redash-analytics-guide/SKILL.md +284 -0
- package/skills/analysis/econometrics/econml-causal-guide/SKILL.md +163 -0
- package/skills/analysis/econometrics/mostly-harmless-guide/SKILL.md +139 -0
- package/skills/analysis/econometrics/panel-data-analyst/SKILL.md +259 -0
- package/skills/analysis/econometrics/python-causality-guide/SKILL.md +134 -0
- package/skills/analysis/econometrics/stata-accounting-guide/SKILL.md +269 -0
- package/skills/analysis/econometrics/stata-analyst-guide/SKILL.md +245 -0
- package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +157 -0
- package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +212 -0
- package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +192 -0
- package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +193 -0
- package/skills/analysis/statistics/senior-data-scientist-guide/SKILL.md +223 -0
- package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +170 -0
- package/skills/analysis/wrangling/data-cleaning-pipeline/SKILL.md +266 -0
- package/skills/analysis/wrangling/data-cog-guide/SKILL.md +178 -0
- package/skills/analysis/wrangling/stata-data-cleaning/SKILL.md +276 -0
- package/skills/analysis/wrangling/survey-data-processing/SKILL.md +298 -0
- package/skills/domains/ai-ml/ai-model-benchmarking/SKILL.md +209 -0
- package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +159 -0
- package/skills/domains/ai-ml/dl-transformer-finetune/SKILL.md +239 -0
- package/skills/domains/ai-ml/generative-ai-guide/SKILL.md +146 -0
- package/skills/domains/ai-ml/huggingface-inference-guide/SKILL.md +196 -0
- package/skills/domains/ai-ml/keras-deep-learning/SKILL.md +210 -0
- package/skills/domains/ai-ml/llm-from-scratch-guide/SKILL.md +124 -0
- package/skills/domains/ai-ml/ml-pipeline-guide/SKILL.md +295 -0
- package/skills/domains/ai-ml/nlp-toolkit-guide/SKILL.md +247 -0
- package/skills/domains/ai-ml/pytorch-guide/SKILL.md +281 -0
- package/skills/domains/ai-ml/pytorch-lightning-guide/SKILL.md +244 -0
- package/skills/domains/ai-ml/tensorflow-guide/SKILL.md +241 -0
- package/skills/domains/biomedical/bioagents-guide/SKILL.md +308 -0
- package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +345 -0
- package/skills/domains/biomedical/medical-imaging-guide/SKILL.md +305 -0
- package/skills/domains/business/architecture-design-guide/SKILL.md +279 -0
- package/skills/domains/business/innovation-management-guide/SKILL.md +257 -0
- package/skills/domains/business/operations-research-guide/SKILL.md +258 -0
- package/skills/domains/chemistry/molecular-dynamics-guide/SKILL.md +237 -0
- package/skills/domains/chemistry/pubchem-api-guide/SKILL.md +180 -0
- package/skills/domains/chemistry/spectroscopy-analysis-guide/SKILL.md +290 -0
- package/skills/domains/cs/distributed-systems-guide/SKILL.md +268 -0
- package/skills/domains/cs/formal-verification-guide/SKILL.md +298 -0
- package/skills/domains/ecology/species-distribution-guide/SKILL.md +343 -0
- package/skills/domains/economics/imf-data-api-guide/SKILL.md +174 -0
- package/skills/domains/economics/post-labor-economics/SKILL.md +254 -0
- package/skills/domains/economics/pricing-psychology-guide/SKILL.md +273 -0
- package/skills/domains/economics/world-bank-data-guide/SKILL.md +179 -0
- package/skills/domains/education/assessment-design-guide/SKILL.md +213 -0
- package/skills/domains/education/educational-research-methods/SKILL.md +179 -0
- package/skills/domains/education/mooc-analytics-guide/SKILL.md +206 -0
- package/skills/domains/finance/portfolio-optimization-guide/SKILL.md +279 -0
- package/skills/domains/finance/risk-modeling-guide/SKILL.md +260 -0
- package/skills/domains/finance/stata-accounting-research/SKILL.md +372 -0
- package/skills/domains/geoscience/climate-modeling-guide/SKILL.md +215 -0
- package/skills/domains/geoscience/satellite-remote-sensing/SKILL.md +193 -0
- package/skills/domains/geoscience/seismology-data-guide/SKILL.md +208 -0
- package/skills/domains/humanities/ethical-philosophy-guide/SKILL.md +244 -0
- package/skills/domains/humanities/history-research-guide/SKILL.md +260 -0
- package/skills/domains/humanities/political-history-guide/SKILL.md +241 -0
- package/skills/domains/law/legal-nlp-guide/SKILL.md +236 -0
- package/skills/domains/law/patent-analysis-guide/SKILL.md +257 -0
- package/skills/domains/law/regulatory-compliance-guide/SKILL.md +267 -0
- package/skills/domains/math/symbolic-computation-guide/SKILL.md +263 -0
- package/skills/domains/math/topology-data-analysis/SKILL.md +305 -0
- package/skills/domains/pharma/clinical-trial-design-guide/SKILL.md +271 -0
- package/skills/domains/pharma/drug-target-interaction/SKILL.md +242 -0
- package/skills/domains/pharma/pharmacovigilance-guide/SKILL.md +216 -0
- package/skills/domains/physics/astrophysics-data-guide/SKILL.md +305 -0
- package/skills/domains/physics/particle-physics-guide/SKILL.md +287 -0
- package/skills/domains/social-science/network-analysis-guide/SKILL.md +310 -0
- package/skills/domains/social-science/psychology-research-guide/SKILL.md +270 -0
- package/skills/domains/social-science/sociology-research-guide/SKILL.md +238 -0
- package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +120 -0
- package/skills/literature/discovery/semantic-paper-radar/SKILL.md +144 -0
- package/skills/literature/discovery/zotero-arxiv-daily-guide/SKILL.md +94 -0
- package/skills/literature/fulltext/core-api-guide/SKILL.md +144 -0
- package/skills/literature/fulltext/institutional-repository-guide/SKILL.md +212 -0
- package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +341 -0
- package/skills/literature/metadata/academic-paper-summarizer/SKILL.md +101 -0
- package/skills/literature/metadata/wikidata-api-guide/SKILL.md +156 -0
- package/skills/literature/search/arxiv-batch-reporting/SKILL.md +133 -0
- package/skills/literature/search/arxiv-paper-processor/SKILL.md +141 -0
- package/skills/literature/search/baidu-scholar-guide/SKILL.md +110 -0
- package/skills/literature/search/chatpaper-guide/SKILL.md +122 -0
- package/skills/literature/search/deep-literature-search/SKILL.md +149 -0
- package/skills/literature/search/deepgit-search-guide/SKILL.md +147 -0
- package/skills/literature/search/pasa-paper-search-guide/SKILL.md +138 -0
- package/skills/research/automation/ai-scientist-v2-guide/SKILL.md +284 -0
- package/skills/research/automation/aim-experiment-guide/SKILL.md +234 -0
- package/skills/research/automation/datagen-research-guide/SKILL.md +131 -0
- package/skills/research/automation/kedro-pipeline-guide/SKILL.md +216 -0
- package/skills/research/automation/mle-agent-guide/SKILL.md +139 -0
- package/skills/research/automation/paper-to-agent-guide/SKILL.md +116 -0
- package/skills/research/automation/rd-agent-guide/SKILL.md +246 -0
- package/skills/research/automation/research-paper-orchestrator/SKILL.md +254 -0
- package/skills/research/deep-research/academic-deep-research/SKILL.md +190 -0
- package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +141 -0
- package/skills/research/deep-research/deep-research-pro/SKILL.md +213 -0
- package/skills/research/deep-research/deep-research-work/SKILL.md +204 -0
- package/skills/research/deep-research/deep-searcher-guide/SKILL.md +253 -0
- package/skills/research/deep-research/gpt-researcher-guide/SKILL.md +191 -0
- package/skills/research/deep-research/khoj-research-guide/SKILL.md +200 -0
- package/skills/research/deep-research/local-deep-research-guide/SKILL.md +253 -0
- package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +217 -0
- package/skills/research/funding/eu-horizon-guide/SKILL.md +244 -0
- package/skills/research/funding/grant-budget-guide/SKILL.md +284 -0
- package/skills/research/funding/nih-reporter-api-guide/SKILL.md +166 -0
- package/skills/research/funding/nsf-award-api-guide/SKILL.md +133 -0
- package/skills/research/methodology/academic-mentor-guide/SKILL.md +169 -0
- package/skills/research/methodology/claude-scientific-guide/SKILL.md +122 -0
- package/skills/research/methodology/deep-innovator-guide/SKILL.md +242 -0
- package/skills/research/methodology/osf-api-guide/SKILL.md +165 -0
- package/skills/research/methodology/research-paper-kb/SKILL.md +263 -0
- package/skills/research/methodology/research-town-guide/SKILL.md +263 -0
- package/skills/research/paper-review/automated-review-guide/SKILL.md +281 -0
- package/skills/research/paper-review/paper-compare-guide/SKILL.md +238 -0
- package/skills/research/paper-review/paper-digest-guide/SKILL.md +240 -0
- package/skills/research/paper-review/paper-research-assistant/SKILL.md +231 -0
- package/skills/research/paper-review/research-quality-filter/SKILL.md +261 -0
- package/skills/research/paper-review/review-response-guide/SKILL.md +275 -0
- package/skills/tools/code-exec/google-colab-guide/SKILL.md +276 -0
- package/skills/tools/code-exec/kaggle-api-guide/SKILL.md +216 -0
- package/skills/tools/code-exec/overleaf-cli-guide/SKILL.md +279 -0
- package/skills/tools/diagram/code-flow-visualizer/SKILL.md +197 -0
- package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +170 -0
- package/skills/tools/diagram/json-data-visualizer/SKILL.md +270 -0
- package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +219 -0
- package/skills/tools/diagram/tldraw-whiteboard-guide/SKILL.md +397 -0
- package/skills/tools/document/docsgpt-guide/SKILL.md +130 -0
- package/skills/tools/document/large-document-reader/SKILL.md +202 -0
- package/skills/tools/document/paper-parse-guide/SKILL.md +243 -0
- package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +244 -0
- package/skills/tools/knowledge-graph/concept-map-generator/SKILL.md +284 -0
- package/skills/tools/knowledge-graph/graphiti-guide/SKILL.md +219 -0
- package/skills/tools/ocr-translate/pdf-math-translate-guide/SKILL.md +141 -0
- package/skills/tools/ocr-translate/zotero-pdf-translate-guide/SKILL.md +95 -0
- package/skills/tools/ocr-translate/zotero-pdf2zh-guide/SKILL.md +143 -0
- package/skills/tools/scraping/dataset-finder-guide/SKILL.md +253 -0
- package/skills/tools/scraping/easy-spider-guide/SKILL.md +250 -0
- package/skills/tools/scraping/google-scholar-scraper/SKILL.md +255 -0
- package/skills/tools/scraping/repository-harvesting-guide/SKILL.md +310 -0
- package/skills/writing/citation/academic-citation-manager/SKILL.md +314 -0
- package/skills/writing/citation/jabref-reference-guide/SKILL.md +127 -0
- package/skills/writing/citation/jasminum-zotero-guide/SKILL.md +103 -0
- package/skills/writing/citation/obsidian-citation-guide/SKILL.md +164 -0
- package/skills/writing/citation/obsidian-zotero-guide/SKILL.md +137 -0
- package/skills/writing/citation/papersgpt-zotero-guide/SKILL.md +132 -0
- package/skills/writing/citation/papis-cli-guide/SKILL.md +213 -0
- package/skills/writing/citation/zotero-better-bibtex-guide/SKILL.md +107 -0
- package/skills/writing/citation/zotero-better-notes-guide/SKILL.md +121 -0
- package/skills/writing/citation/zotero-gpt-guide/SKILL.md +111 -0
- package/skills/writing/citation/zotero-mcp-guide/SKILL.md +164 -0
- package/skills/writing/citation/zotero-mdnotes-guide/SKILL.md +162 -0
- package/skills/writing/citation/zotero-reference-guide/SKILL.md +139 -0
- package/skills/writing/citation/zotero-scholar-guide/SKILL.md +294 -0
- package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +140 -0
- package/skills/writing/composition/ml-paper-writing/SKILL.md +163 -0
- package/skills/writing/composition/paper-debugger-guide/SKILL.md +143 -0
- package/skills/writing/composition/scientific-writing-resources/SKILL.md +151 -0
- package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +153 -0
- package/skills/writing/latex/latex-drawing-collection/SKILL.md +154 -0
- package/skills/writing/latex/latex-templates-collection/SKILL.md +159 -0
- package/skills/writing/latex/md-to-pdf-academic/SKILL.md +230 -0
- package/skills/writing/latex/tex-render-guide/SKILL.md +243 -0
- package/skills/writing/polish/academic-tone-guide/SKILL.md +209 -0
- package/skills/writing/polish/conciseness-editing-guide/SKILL.md +225 -0
- package/skills/writing/polish/paper-polish-guide/SKILL.md +160 -0
- package/skills/writing/templates/graphical-abstract-guide/SKILL.md +183 -0
- package/skills/writing/templates/novathesis-guide/SKILL.md +152 -0
- package/skills/writing/templates/scientific-article-pdf/SKILL.md +261 -0
- package/skills/writing/templates/sjtuthesis-guide/SKILL.md +197 -0
- package/skills/writing/templates/thuthesis-guide/SKILL.md +181 -0
- package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md +0 -207
@@ -0,0 +1,133 @@

---
name: arxiv-batch-reporting
description: "Batch search and report generation from arXiv preprint repository"
metadata:
  openclaw:
    emoji: "📊"
    category: "literature"
    subcategory: "search"
    keywords: ["arxiv", "batch search", "preprint", "report generation", "literature monitoring", "research trends"]
    source: "https://github.com/sspaeti/arxiv-batch-search"
---

# arXiv Batch Reporting

## Overview

Keeping up with the flood of new preprints on arXiv is one of the most persistent challenges in fast-moving fields like machine learning, physics, mathematics, and computer science. The arXiv Batch Reporting skill provides a systematic approach to searching, filtering, and generating structured reports from arXiv at scale.

Unlike ad-hoc manual searches, this skill enables researchers to define persistent query profiles, run batch searches across date ranges, and produce formatted reports that highlight the most relevant papers. It is particularly useful for weekly or monthly literature surveillance, lab-meeting preparation, and trend analysis across subfields.

The skill leverages the arXiv API and supports advanced query syntax, category filtering, and result ranking by relevance or recency. Reports can be generated in Markdown, HTML, or CSV format for integration into existing workflows.
## Setting Up Batch Queries

### Query Profile Definition

Define your search profiles as structured configurations. Each profile specifies the search terms, category filters, date range, and output preferences:

```yaml
profile_name: "transformer-architectures-weekly"
queries:
  - 'ti:transformer AND abs:"attention mechanism"'
  - 'ti:"vision transformer"'
  - 'abs:"efficient transformer" AND cat:cs.LG'
categories:
  - cs.LG
  - cs.CL
  - cs.CV
date_range: "last_7_days"
max_results: 100
sort_by: "submittedDate"
sort_order: "descending"
```
### arXiv API Query Syntax

The arXiv API supports field-specific searches:

- `ti:` — Search in title
- `abs:` — Search in abstract
- `au:` — Search by author
- `cat:` — Filter by category (e.g., `cs.AI`, `math.PR`, `physics.comp-ph`)
- Boolean operators: `AND`, `OR`, `ANDNOT`
- Group with parentheses for complex queries; wrap multi-word phrases in double quotes, since a field prefix otherwise applies only to the next word

**Example queries:**
- Find recent GAN papers in computer vision: `abs:"generative adversarial" AND cat:cs.CV`
- Find a specific author's work: `au:bengio AND ti:"deep learning"`
- Exclude survey papers: `abs:"reinforcement learning" ANDNOT ti:survey`
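These prefixes and operators can be assembled programmatically. A minimal sketch using only the standard library; the helper name `build_query_url` is ours, not part of the skill, and assumes the public `export.arxiv.org` endpoint:

```python
import urllib.parse

def build_query_url(query: str, start: int = 0, max_results: int = 100) -> str:
    """Encode one profile query as an arXiv export-API request URL."""
    params = urllib.parse.urlencode({
        "search_query": query,
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    })
    return f"https://export.arxiv.org/api/query?{params}"

url = build_query_url('abs:"generative adversarial" AND cat:cs.CV', max_results=25)
print(url)
```

The response is an Atom feed; a library such as `feedparser` turns it into entries with `title`, `summary`, and `link` fields.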
### Rate Limiting and Pagination

The arXiv API enforces rate limits. Follow these guidelines:

- Wait at least 3 seconds between API requests
- Use pagination with `start` and `max_results` parameters (max 2000 per request)
- For large batch jobs, implement exponential backoff on HTTP 503 responses
- Cache results locally to avoid redundant API calls
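The backoff guideline can be sketched as a small wrapper. The retry count and delays here are illustrative choices, not arXiv-mandated values:

```python
import time
import urllib.error
import urllib.request

def fetch_with_backoff(url: str, max_retries: int = 5) -> bytes:
    """GET a URL, retrying with exponential backoff on HTTP 503."""
    delay = 3.0  # arXiv asks for at least 3 seconds between requests
    for _ in range(max_retries):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            if exc.code != 503:
                raise  # only retry on "retry later" responses
            time.sleep(delay)
            delay *= 2  # 3s, 6s, 12s, ...
    raise RuntimeError(f"gave up after {max_retries} attempts: {url}")
```

When paginating with `start`, pair this with a `time.sleep(3)` between successive pages.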
## Report Generation

### Standard Report Template

After collecting batch results, generate a report with the following structure:

```markdown
# arXiv Batch Report: [Profile Name]
**Date range:** [start] to [end]
**Total results:** [N] papers
**Generated:** [timestamp]

## Highlights (Top 10 by Relevance)
| # | Title | Authors | Category | Date |
|---|-------|---------|----------|------|
| 1 | [Title](arxiv-link) | First Author et al. | cs.LG | 2026-03-08 |

## Category Breakdown
- cs.LG: 45 papers
- cs.CL: 23 papers
- cs.CV: 18 papers

## Keyword Frequency
- "transformer": 38 mentions
- "attention": 29 mentions
- "efficient": 15 mentions

## Full Results
[Expandable table with all papers]
```
### Filtering and Ranking

After retrieving raw results, apply post-processing filters to surface the most relevant papers:

1. **Relevance scoring**: Score each paper based on keyword density in the title and abstract relative to your query terms.
2. **Author filtering**: Boost papers from authors on your watch list (key researchers in your field).
3. **Citation proxy**: Papers that appear in multiple query results likely sit at the intersection of your interests; rank them higher.
4. **Novelty detection**: Flag papers whose abstracts contain terms not seen in your previous reports, indicating potentially new directions.
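As a concrete (if simplistic) version of steps 1 and 2, score by token frequency and boost watched authors. The field names (`title`, `abstract`, `authors`) match the report inputs above, but the weighting is an assumption of ours:

```python
import re

def relevance_score(paper: dict, keywords: list, watched_authors: set = frozenset()) -> float:
    """Keyword density over title+abstract, with a flat boost for watched authors.

    Keywords are matched as single tokens; multi-word phrases would need n-gram matching.
    """
    text = f"{paper['title']} {paper['abstract']}".lower()
    tokens = re.findall(r"[a-z0-9'-]+", text)
    hits = sum(tokens.count(k.lower()) for k in keywords)
    score = hits / max(len(tokens), 1)
    if watched_authors & {a.lower() for a in paper.get("authors", [])}:
        score *= 1.5  # watch-list boost
    return score

paper = {"title": "Efficient Transformer Attention",
         "abstract": "We study attention in transformer models.",
         "authors": ["A. Researcher"]}
print(relevance_score(paper, ["transformer", "attention"]))  # ≈ 0.444
```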
## Automation and Scheduling

For ongoing literature surveillance, automate your batch reports:

- **Cron scheduling**: Run batch queries weekly (e.g., every Monday at 8 AM) using a scheduled task or CI pipeline.
- **Diff reports**: Compare the current week's results against the previous week's to highlight only new papers.
- **Alert thresholds**: Set alerts when a report contains more than N papers matching a high-priority query, indicating a burst of activity in that area.
- **Email or Slack delivery**: Route generated reports to your inbox or lab Slack channel for team-wide awareness.
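A diff report reduces to a set difference on arXiv IDs. A sketch, assuming each run also saves its results as a JSON list (that file format is our assumption):

```python
import json
from pathlib import Path

def new_papers(current: list, previous_path: Path) -> list:
    """Keep only papers whose arXiv IDs did not appear in the previous report."""
    seen = set()
    if previous_path.exists():
        seen = {p["arxiv_id"] for p in json.loads(previous_path.read_text())}
    return [p for p in current if p["arxiv_id"] not in seen]
```

On the first run (no previous report on disk), every paper counts as new.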
Store all generated reports in a versioned directory structure for longitudinal trend analysis:

```
reports/
  transformer-architectures-weekly/
    2026-03-03.md
    2026-03-10.md
    ...
```
## References

- arXiv API documentation: https://info.arxiv.org/help/api/index.html
- arXiv category taxonomy: https://arxiv.org/category_taxonomy
- arXiv Batch Search: https://github.com/sspaeti/arxiv-batch-search
@@ -0,0 +1,141 @@

---
name: arxiv-paper-processor
description: "Process and analyze arXiv papers systematically for research workflows"
metadata:
  openclaw:
    emoji: "⚙️"
    category: "literature"
    subcategory: "search"
    keywords: ["arxiv", "paper processing", "PDF parsing", "metadata extraction", "preprint analysis", "research pipeline"]
    source: "https://github.com/tatsu-lab/gpt_paper_assistant"
---

# arXiv Paper Processor

## Overview

The arXiv Paper Processor skill provides a complete pipeline for downloading, parsing, and analyzing arXiv papers programmatically. While the arXiv API provides metadata, researchers often need to work with the full text: extracting sections, equations, figures, and references for deeper analysis.

This skill covers the entire processing chain: retrieving papers by ID or search query, downloading PDF and LaTeX source files, extracting structured content, and producing analysis-ready outputs. It is particularly valuable for researchers conducting large-scale literature analysis, building training datasets from academic text, or automating evidence extraction for systematic reviews.

The pipeline handles common challenges in academic PDF processing, including multi-column layouts, mathematical notation, table extraction, and reference parsing. It integrates with tools like GROBID for PDF parsing and can work directly with arXiv LaTeX sources for higher-fidelity extraction.
## Paper Retrieval and Download

### Fetching by arXiv ID

The most reliable method is to fetch papers by their arXiv identifier:

```python
import urllib.request

import feedparser

# Fetch metadata via the Atom feed
arxiv_id = "2301.07041"
url = f"https://export.arxiv.org/api/query?id_list={arxiv_id}"
with urllib.request.urlopen(url) as response:
    feed = feedparser.parse(response.read())

entry = feed.entries[0]
title = entry.title
abstract = entry.summary
authors = [a.name for a in entry.authors]
# Select the PDF link by MIME type rather than by position in the list
pdf_url = next(l.href for l in entry.links if l.get("type") == "application/pdf")
```
### Downloading Source Files

arXiv stores LaTeX source files for most papers. These provide much richer structure than PDFs:

```bash
# Download the LaTeX source (typically a .tar.gz)
wget https://arxiv.org/e-print/2301.07041 -O paper_source.tar.gz
mkdir -p paper_source
tar -xzf paper_source.tar.gz -C paper_source/
```

Source files contain the original `.tex` files, figures, bibliography files, and any custom style files. Parsing LaTeX directly gives you access to section structure, equations in their original notation, citation keys, and figure captions without the ambiguity of PDF extraction.
### Batch Download Guidelines

When downloading multiple papers, respect arXiv's usage policies:

- Limit requests to 1 per 3 seconds for API calls
- Use arXiv bulk data access (S3 or GCS) for large-scale processing (1000+ papers)
- Cache all downloaded files locally and check the cache before re-downloading
- Include a descriptive User-Agent header in your HTTP requests
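The caching and User-Agent guidelines combine into one helper. A sketch; the cache directory name and User-Agent string are placeholders to replace with your own details:

```python
import time
import urllib.request
from pathlib import Path

CACHE_DIR = Path("arxiv_cache")

def cached_download(url: str, filename: str, delay: float = 3.0) -> Path:
    """Download into a local cache; skip the network entirely on a cache hit."""
    CACHE_DIR.mkdir(exist_ok=True)
    target = CACHE_DIR / filename
    if target.exists():
        return target  # already downloaded: no network call, no sleep
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "paper-pipeline/0.1 (mailto:you@example.org)"},  # placeholder
    )
    with urllib.request.urlopen(req) as resp:
        target.write_bytes(resp.read())
    time.sleep(delay)  # stay at or under 1 request per 3 seconds
    return target
```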
## Content Extraction Pipeline

### PDF Extraction with GROBID

For papers where only the PDF is available, use GROBID (GeneRation Of BIbliographic Data) for structured extraction:

```bash
# Run GROBID as a local service
docker run --rm -p 8070:8070 grobid/grobid:0.8.0

# Process a PDF
curl -X POST "http://localhost:8070/api/processFulltextDocument" \
  -F "input=@paper.pdf" \
  -F "consolidateHeader=1" \
  -F "consolidateCitations=1" \
  > paper_tei.xml
```

GROBID outputs TEI-XML with structured sections including:
- Header metadata (title, authors, affiliations, abstract)
- Body text with section hierarchy
- Equations (as MathML or raw text)
- Figure and table references
- Parsed bibliography entries with DOIs where available
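That TEI-XML can be mined with the standard library. The element paths below (`titleStmt/title`, `abstract//p`, `body//head`) reflect GROBID's usual layout but may need adjusting for the version you run:

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def parse_tei(path: str) -> dict:
    """Extract title, abstract text, and section headings from GROBID TEI-XML."""
    root = ET.parse(path).getroot()
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    abstract = " ".join(
        (p.text or "").strip()
        for p in root.findall(".//tei:abstract//tei:p", TEI_NS)
    )
    headings = [h.text for h in root.findall(".//tei:body//tei:head", TEI_NS)]
    return {"title": title, "abstract": abstract, "sections": headings}
```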
### LaTeX Source Parsing

When LaTeX source is available, parse it directly for higher fidelity:

1. Identify the main `.tex` file (look for `\documentclass` or `\begin{document}`)
2. Resolve `\input{}` and `\include{}` directives to build the complete document
3. Extract sections using `\section{}` and `\subsection{}` markers
4. Extract equations from `equation`, `align`, and `gather` environments
5. Parse `\cite{}` commands and cross-reference them with the `.bib` file
6. Extract figure captions from `\caption{}` commands
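Steps 3, 5, and 6 can be approximated with regular expressions. This is a first pass, not a parser: nested braces and exotic macros will slip through:

```python
import re

def extract_latex_structure(tex: str) -> dict:
    """First-pass extraction of sections, citation keys, and captions from LaTeX."""
    sections = re.findall(r"\\(?:sub)*section\*?\{([^}]*)\}", tex)
    cite_groups = re.findall(r"\\cite[tp]?\*?(?:\[[^\]]*\])?\{([^}]*)\}", tex)
    citations = sorted({key.strip() for group in cite_groups for key in group.split(",")})
    captions = re.findall(r"\\caption\{([^}]*)\}", tex)
    return {"sections": sections, "citations": citations, "captions": captions}

tex = r"\section{Intro} As shown in \cite{smith2020,lee2021}. \subsection{Setup} \caption{Pipeline overview}"
print(extract_latex_structure(tex))
```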
### Structured Output Schema

Produce a standardized JSON output for each processed paper:

```json
{
  "arxiv_id": "2301.07041",
  "title": "Paper Title",
  "authors": ["Author One", "Author Two"],
  "abstract": "...",
  "sections": [
    {"heading": "Introduction", "level": 1, "text": "..."},
    {"heading": "Related Work", "level": 1, "text": "..."}
  ],
  "equations": ["E = mc^2", "..."],
  "figures": [{"id": "fig1", "caption": "..."}],
  "references": [{"key": "smith2020", "title": "...", "doi": "..."}],
  "processed_date": "2026-03-10"
}
```
## Analysis and Integration

Once papers are processed into a structured format, several downstream analyses become possible:

- **Section-level search**: Search across the methods sections of hundreds of papers to find specific techniques.
- **Equation extraction**: Build a database of mathematical formulations used in your subfield.
- **Citation graph construction**: Map which papers cite which, using extracted reference lists.
- **Terminology tracking**: Monitor how specific terms evolve in usage frequency over time.
- **Dataset identification**: Extract mentions of datasets and benchmarks from experimental sections.

Integrate processed outputs with your reference manager by generating BibTeX entries enriched with extracted metadata, or feed the structured JSON into a local search index for full-text retrieval across your paper collection.
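With papers saved in the JSON schema above (one file per paper, our assumed layout), section-level search is a few lines:

```python
import json
from pathlib import Path

def search_sections(corpus_dir: str, heading_contains: str, term: str) -> list:
    """Return (arxiv_id, heading) pairs where a matching section mentions a term."""
    hits = []
    for path in sorted(Path(corpus_dir).glob("*.json")):
        paper = json.loads(path.read_text())
        for section in paper.get("sections", []):
            if (heading_contains.lower() in section["heading"].lower()
                    and term.lower() in section["text"].lower()):
                hits.append((paper["arxiv_id"], section["heading"]))
    return hits
```

For example, `search_sections("corpus", "method", "dropout")` finds every paper whose methods-like section mentions dropout.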
## References

- arXiv API: https://info.arxiv.org/help/api/index.html
- GROBID: https://github.com/kermitt2/grobid
- GPT Paper Assistant: https://github.com/tatsu-lab/gpt_paper_assistant
- arXiv bulk data access: https://info.arxiv.org/help/bulk_data/index.html
@@ -0,0 +1,110 @@

---
name: baidu-scholar-guide
description: "Using Baidu Scholar for Chinese and English academic literature search"
metadata:
  openclaw:
    emoji: "🔍"
    category: "literature"
    subcategory: "search"
    keywords: ["baidu scholar", "chinese literature", "academic search", "CNKI", "bilingual research", "chinese journals"]
    source: "https://xueshu.baidu.com"
---

# Baidu Scholar Guide

## Overview

Baidu Scholar (百度学术, xueshu.baidu.com) is one of the largest academic search engines, with particular strength in indexing Chinese-language scholarly publications. For researchers working with Chinese academic literature, or conducting bilingual research that spans both English and Chinese sources, Baidu Scholar provides access to content that is often underrepresented in Western databases like Google Scholar, Scopus, or Web of Science.

Baidu Scholar indexes content from major Chinese academic databases, including CNKI (China National Knowledge Infrastructure), Wanfang Data, and VIP/CQVIP, as well as international sources like IEEE, Springer, Elsevier, and arXiv. This makes it a valuable complement to English-centric search tools, particularly in fields where Chinese research output is substantial: materials science, traditional medicine, agricultural science, renewable energy, and AI/ML.

This skill covers effective search strategies on Baidu Scholar, navigating its interface, accessing full-text content, and integrating Chinese-language papers into your broader research workflow.
|
|
22
|
+
|
|
23
|
+
## Search Strategies and Syntax
|
|
24
|
+
|
|
25
|
+
### Basic Search
|
|
26
|
+
|
|
27
|
+
Navigate to `https://xueshu.baidu.com` and enter your search terms. Baidu Scholar supports both Chinese and English queries. For bilingual research, run parallel searches:
|
|
28
|
+
|
|
29
|
+
- English query: `"deep learning" medical imaging`
|
|
30
|
+
- Chinese equivalent: `"深度学习" 医学影像`
|
|
31
|
+
|
|
32
|
+
### Advanced Filtering
|
|
33
|
+
|
|
34
|
+
After performing a search, use the left sidebar filters to narrow results:
|
|
35
|
+
|
|
36
|
+
- **Time range** (时间): Filter by publication year or custom date range
|
|
37
|
+
- **Author** (作者): Filter by specific author names
|
|
38
|
+
- **Journal/Conference** (来源): Filter by publication venue
|
|
39
|
+
- **Keywords** (关键词): Refine by subject keywords
|
|
40
|
+
- **Type** (类型): Journal article, conference paper, thesis, patent
|
|
41
|
+
|
|
42
|
+
### Effective Query Techniques
|
|
43
|
+
|
|
44
|
+
1. **Use quotes for exact phrases**: `"convolutional neural network"` or `"卷积神经网络"`
|
|
45
|
+
2. **Combine Chinese and English terms**: Many Chinese papers include English keywords—searching for the English term often finds Chinese papers too.
|
|
46
|
+
3. **Search by author in both scripts**: A Chinese researcher may publish as both "张三" and "San Zhang"—search both forms.
|
|
47
|
+
4. **Use subject-specific Chinese terminology**: Consult your field's Chinese glossary (术语表) for standard translations of technical terms.
|
|
48
|
+
5. **Check "related searches" (相关搜索)**: Baidu Scholar suggests related queries that can reveal alternative terminology.
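
If you script your parallel searches, both query URLs can be generated from one helper. A minimal sketch; the `wd` query parameter is an assumption based on Baidu's common URL scheme, so verify it against the live site before relying on it:

```python
from urllib.parse import urlencode

def baidu_scholar_urls(english_query: str, chinese_query: str) -> list[str]:
    """Build parallel Baidu Scholar search URLs for a bilingual query pair.

    The `wd` parameter name is assumed, not documented by Baidu Scholar.
    """
    base = "https://xueshu.baidu.com/s"
    return [f"{base}?{urlencode({'wd': q})}" for q in (english_query, chinese_query)]

urls = baidu_scholar_urls('"deep learning" medical imaging', '"深度学习" 医学影像')
for u in urls:
    print(u)
```

Keeping both URLs in a script (or search log) also documents your bilingual strategy for later reproducibility.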

## Navigating Chinese Academic Databases

Baidu Scholar results often link to papers hosted on major Chinese databases. Understanding how to access these is essential:

### CNKI (中国知网, cnki.net)
- The largest Chinese academic database with over 200 million records
- Requires institutional subscription for full-text access
- Many Chinese universities provide VPN-based remote access
- Supports export to EndNote, BibTeX, and other reference managers

### Wanfang Data (万方数据, wanfangdata.com.cn)
- Strong coverage of Chinese dissertations and conference proceedings
- Offers a free preview of the first page of many papers
- Institutional access widely available at Chinese universities

### VIP/CQVIP (维普, cqvip.com)
- Focuses on Chinese periodical literature
- Good coverage of Chinese-language journals not indexed elsewhere

### Access Tips

- If you lack institutional access, check whether the paper is also available on the author's personal page, ResearchGate, or a university repository.
- Many Chinese authors post preprints on arXiv or ChinaXiv (chinaxiv.org).
- Baidu Scholar's "free download" (免费下载) links sometimes point to open-access copies.
- Use the DOI when available to find the paper on the publisher's site.

## Citation Features and Export

Baidu Scholar provides several useful citation tools:

- **Citation count**: Displayed below each result; click to see citing papers.
- **Citation export**: Click "引用" (Cite) below any result to get formatted citations in GB/T 7714, MLA, APA, or BibTeX format.
- **Related papers**: The "相关文章" link shows semantically similar papers.
- **Author profiles**: Click an author name to see their indexed publications and citation metrics.

### Exporting to Reference Managers

To integrate Baidu Scholar results into your reference management workflow:

1. Click the "引用" button on a search result
2. Select "BibTeX" format
3. Copy the BibTeX entry and import into Zotero, Mendeley, or your preferred manager
4. Verify the metadata—Chinese-to-BibTeX conversion sometimes introduces encoding issues
5. Manually add the English translation of the title in a `note` field for bilingual libraries
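
Step 5 can be partially automated when processing many entries. A minimal, illustrative sketch; the `note`-field convention and helper are suggestions, not a Baidu Scholar feature:

```python
def add_translation_note(bibtex: str, english_title: str) -> str:
    """Insert a note field with the translated title before the entry's closing brace."""
    body, _, _ = bibtex.rpartition("}")
    return body.rstrip().rstrip(",") + f",\n  note = {{Title translation: {english_title}}}\n}}\n"

entry = """@article{zhang2023,
  title = {基于深度学习的医学影像分割方法},
  author = {Zhang, San},
  year = {2023}
}"""
print(add_translation_note(entry, "A deep-learning approach to medical image segmentation"))
```

Run the result through your reference manager's import validator, since the encoding issues mentioned above can survive this step.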

## Best Practices for Bilingual Research

When conducting research that spans Chinese and English literature:

- **Maintain parallel search logs**: Document both your English and Chinese search strategies so your methodology is reproducible.
- **Use consistent translation**: Establish a glossary mapping key technical terms between languages and use it consistently in your review.
- **Cite Chinese sources properly**: Use the GB/T 7714 standard for Chinese references when publishing in Chinese venues, or transliterate titles when publishing in English journals.
- **Cross-validate findings**: When a claim appears in Chinese literature but not in English sources (or vice versa), investigate whether it represents a genuine knowledge gap or a language barrier issue.
- **Note database coverage differences**: Be transparent in your methodology about which databases you searched and in which languages.

## References

- Baidu Scholar: https://xueshu.baidu.com
- CNKI: https://www.cnki.net
- ChinaXiv: https://chinaxiv.org
- GB/T 7714-2015 citation standard: https://std.samr.gov.cn
---
name: chatpaper-guide
description: "Use ChatPaper to summarize and search arXiv papers with LLM assistance"
metadata:
  openclaw:
    emoji: "📑"
    category: literature
    subcategory: search
    keywords: ["arxiv", "paper-summarization", "literature-search", "chatgpt", "research-acceleration"]
source: "https://github.com/kaixindelele/ChatPaper"
---

# ChatPaper Guide

## Overview

ChatPaper is an open-source tool that leverages large language models to automatically summarize, search, and analyze academic papers from arXiv. It addresses a fundamental challenge in modern research: the overwhelming volume of new publications makes it nearly impossible for researchers to keep up with developments in their fields through manual reading alone.

The tool connects to the arXiv API to retrieve papers based on keyword queries, then uses LLM capabilities to generate structured summaries covering research motivation, methodology, key findings, and limitations. This enables researchers to rapidly triage large batches of papers and identify the most relevant ones for detailed study.

With over 19,000 GitHub stars, ChatPaper has become a widely adopted tool in the research community. It supports multiple LLM backends and offers both command-line and web-based interfaces, making it accessible to researchers with varying levels of technical expertise.

## Installation and Setup

Clone the repository and install dependencies:

```bash
git clone https://github.com/kaixindelele/ChatPaper.git
cd ChatPaper
pip install -r requirements.txt
```

Configure your LLM API access by setting environment variables:

```bash
# For the OpenAI API
export OPENAI_API_KEY="your-api-key"

# Optional: use a custom API endpoint
export OPENAI_BASE_URL="https://your-endpoint/v1"
```

Alternatively, edit the configuration directly in the settings file to specify your preferred model and API parameters. The tool supports OpenAI models as well as compatible alternatives.

Verify the installation by running a test query:

```bash
python chat_paper.py --query "transformer attention mechanism" --max_results 3
```

## Core Features

**Automated Paper Search and Summarization**: ChatPaper queries arXiv based on your research interests and generates concise, structured summaries for each paper:

```bash
# Search for recent papers on a topic
python chat_paper.py \
    --query "graph neural networks drug discovery" \
    --max_results 10 \
    --sort "Relevance" \
    --language "en"
```

Each summary is structured to highlight the core research question, proposed method, experimental results, and conclusions, saving significant reading time during literature reviews.

**Batch Processing**: Process multiple search queries or a list of arXiv paper IDs in a single run:

```bash
# Summarize specific papers by arXiv ID
python chat_paper.py \
    --pdf_path "2301.00234,2302.01567,2303.04589" \
    --language "en"
```

**Multi-Language Output**: Generate summaries in your preferred language regardless of the source paper language. This is particularly useful for researchers who think and write in a language different from the papers they read:

```bash
python chat_paper.py \
    --query "quantum computing optimization" \
    --language "zh" \
    --max_results 5
```

**Research Report Generation**: Beyond individual summaries, ChatPaper can compile comparative analysis reports across multiple papers on the same topic, identifying common themes, methodological differences, and research gaps.

## Academic Workflow Integration

ChatPaper integrates into research workflows at several critical stages:

**Daily Literature Monitoring**: Set up automated scripts to check for new papers in your research area each morning. Create a cron job or scheduled task that runs ChatPaper queries and delivers summaries to your inbox or a designated folder:

```bash
# Example daily monitoring script
python chat_paper.py \
    --query "large language model reasoning" \
    --max_results 20 \
    --sort "LastUpdatedDate" \
    --days 1 \
    --save_path ./daily_summaries/
```
```
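
On Linux or macOS, the script above can be scheduled with a crontab entry; the time and paths below are illustrative placeholders:

```bash
# Run every day at 07:00; adjust the path, query, and log file to your setup
0 7 * * * cd /path/to/ChatPaper && python chat_paper.py --query "large language model reasoning" --days 1 --save_path ./daily_summaries/ >> monitor.log 2>&1
```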

**Systematic Review Support**: When conducting systematic literature reviews, use ChatPaper to generate initial screening summaries for a large pool of candidate papers. This accelerates the title-and-abstract screening phase by providing structured, consistent summaries that highlight methodological details often buried in abstracts.

**Research Group Discussions**: Generate summary documents for journal club or lab meeting preparation. Share the structured summaries with your group so everyone arrives with a baseline understanding of the papers under discussion.
**Identifying Research Gaps**: By summarizing many papers in a subfield simultaneously, patterns emerge in what has been studied and what remains unexplored. ChatPaper summaries can be analyzed collectively to map the landscape of a research area.
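
One lightweight way to do this, assuming summaries are saved as markdown files (e.g., via `--save_path`), is to tally how often candidate method or topic terms appear across them. The helper below is illustrative, not part of ChatPaper:

```python
import re
from collections import Counter
from pathlib import Path

def term_frequencies(summary_dir: str, terms: list[str]) -> Counter:
    """Count case-insensitive occurrences of each term across all .md summaries.

    Directory layout and file extension are assumptions about how you store output.
    """
    counts = Counter()
    for path in Path(summary_dir).glob("*.md"):
        text = path.read_text(encoding="utf-8").lower()
        for term in terms:
            counts[term] += len(re.findall(re.escape(term.lower()), text))
    return counts
```

Terms with near-zero counts across an otherwise active subfield are candidate gaps worth checking against a targeted search.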

## Advanced Usage and Tips

**Custom Prompts**: Modify the summarization prompts to focus on aspects most relevant to your research. For example, you might emphasize dataset details for data-centric work or focus on theoretical contributions for more mathematical fields.

**Combining with Reference Managers**: Export ChatPaper summaries alongside BibTeX entries for direct import into Zotero, Mendeley, or other reference management tools. This creates an annotated bibliography with minimal manual effort.
**Rate Limiting Considerations**: When processing large batches, be mindful of both arXiv API rate limits and your LLM provider quotas. Space requests appropriately and consider using local or self-hosted models for high-volume processing.
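
When scripting batch runs, a simple client-side throttle helps respect those limits. A minimal sketch; the 3-second default reflects arXiv's request to pause a few seconds between API calls, and your LLM provider's quotas must be checked separately:

```python
import time
from typing import Iterable, Iterator

def throttled(items: Iterable, min_interval: float = 3.0) -> Iterator:
    """Yield items no faster than one per min_interval seconds."""
    last = float("-inf")  # first item passes through immediately
    for item in items:
        wait = min_interval - (time.monotonic() - last)
        if wait > 0:
            time.sleep(wait)
        last = time.monotonic()
        yield item

# Example: space out per-paper processing in a batch run
# for paper_id in throttled(paper_ids):
#     process(paper_id)
```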

**Quality Verification**: LLM-generated summaries may occasionally contain inaccuracies. Always verify critical claims against the original paper, particularly numerical results, statistical significance values, and methodological details that require precise interpretation.

## References

- ChatPaper repository: https://github.com/kaixindelele/ChatPaper
- arXiv API documentation: https://info.arxiv.org/help/api/
- ChatReviewer (related tool for automated paper review generation)

---
name: deep-literature-search
description: "Multi-source exhaustive literature search across academic databases"
metadata:
  openclaw:
    emoji: "🕵️"
    category: "literature"
    subcategory: "search"
    keywords: ["exhaustive search", "systematic search", "multi-database", "literature review", "search strategy", "PRISMA"]
source: "https://github.com/utilityfog/Literature-Search"
---

# Deep Literature Search

## Overview

A deep literature search goes beyond a quick Google Scholar query. It is a methodical, multi-source search process designed to identify all relevant publications on a topic with minimal omissions. This level of thoroughness is required for systematic reviews, meta-analyses, grant applications, and dissertation literature reviews where comprehensiveness is not optional—it is a methodological requirement.

This skill provides a structured framework for planning, executing, and documenting exhaustive literature searches across multiple academic databases. It covers query formulation using controlled vocabularies, database selection strategy, deduplication, screening workflows, and PRISMA-compliant documentation of the search process.

The framework is database-agnostic and can be applied across disciplines, from biomedical sciences (PubMed, Cochrane) to social sciences (PsycINFO, ERIC), engineering (IEEE Xplore, Compendex), and multidisciplinary databases (Web of Science, Scopus, OpenAlex).

## Search Strategy Design

### Step 1: Define the Research Question

Use the PICO, PEO, or SPIDER framework appropriate to your field:

- **PICO** (clinical/biomedical): Population, Intervention, Comparison, Outcome
- **PEO** (qualitative): Population, Exposure, Outcome
- **SPIDER** (mixed methods): Sample, Phenomenon of Interest, Design, Evaluation, Research type

**Example**: "What is the effect of mindfulness-based interventions (I) on academic stress (O) in graduate students (P) compared to no intervention (C)?"

### Step 2: Identify Key Concepts and Synonyms

Break your research question into 2-4 key concepts. For each concept, list all synonyms, related terms, abbreviations, and controlled vocabulary terms:

| Concept | Synonyms and Related Terms |
|---------|---------------------------|
| Mindfulness | mindfulness-based stress reduction, MBSR, meditation, mindful awareness |
| Academic stress | study stress, exam anxiety, academic burnout, student distress |
| Graduate students | postgraduate, doctoral students, PhD candidates, master's students |

### Step 3: Build the Search String

Combine concepts using Boolean logic:

```
("mindfulness" OR "MBSR" OR "mindfulness-based stress reduction" OR "meditation")
AND
("academic stress" OR "study stress" OR "exam anxiety" OR "academic burnout")
AND
("graduate student*" OR "postgraduate*" OR "doctoral student*" OR "PhD candidate*")
```

Key syntax rules:
- Use `OR` within concept groups (broadens)
- Use `AND` between concept groups (narrows)
- Use `*` for truncation (e.g., `student*` matches students, student's)
- Use `""` for exact phrases
- Use `NOT` sparingly and document its use
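
Steps 2-3 can be scripted so the master string stays in sync with your synonym table. A minimal sketch using the generic syntax above (translate per database in Step 4):

```python
def build_query(concepts: list[list[str]]) -> str:
    """Join each concept's synonyms with OR, then join concept groups with AND."""
    groups = ["(" + " OR ".join(f'"{t}"' for t in terms) + ")" for terms in concepts]
    return "\nAND\n".join(groups)

query = build_query([
    ["mindfulness", "MBSR", "mindfulness-based stress reduction", "meditation"],
    ["academic stress", "study stress", "exam anxiety", "academic burnout"],
    ["graduate student*", "postgraduate*", "doctoral student*", "PhD candidate*"],
])
print(query)
```

Regenerating the string from the table whenever a synonym is added keeps the documented strategy and the executed query identical.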

### Step 4: Adapt for Each Database

Each database has its own syntax and controlled vocabulary. You must translate your master search string for each target:

- **PubMed**: Use MeSH terms alongside free-text; syntax uses `[MeSH]` tags
- **Scopus**: Uses `TITLE-ABS-KEY()` field codes
- **Web of Science**: Uses `TS=` (Topic) and `TI=` (Title) field tags
- **IEEE Xplore**: Uses "Command Search" with field codes
- **OpenAlex**: Uses concept IDs and filter parameters in the API
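
For illustration, the Step 3 master string could be rendered for Scopus and Web of Science roughly as follows (abbreviated to two terms per group; verify the exact field codes in each database's help pages):

```
Scopus:
TITLE-ABS-KEY(("mindfulness" OR "MBSR") AND ("academic stress" OR "exam anxiety")
  AND ("graduate student*" OR "postgraduate*"))

Web of Science:
TS=(("mindfulness" OR "MBSR") AND ("academic stress" OR "exam anxiety")
  AND ("graduate student*" OR "postgraduate*"))
```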

## Multi-Database Execution

### Recommended Database Selection by Discipline

| Discipline | Primary Databases | Supplementary |
|-----------|-------------------|---------------|
| Biomedical | PubMed, Cochrane, Embase | CINAHL, PsycINFO |
| Computer Science | IEEE Xplore, ACM DL, DBLP | Scopus, arXiv |
| Social Sciences | PsycINFO, ERIC, Sociological Abstracts | Web of Science |
| Engineering | Compendex, IEEE Xplore | Scopus, Web of Science |
| Multidisciplinary | Web of Science, Scopus, OpenAlex | Google Scholar (supplementary) |

### Execution Checklist

For each database:
1. Translate the master search string to the database's syntax
2. Run the search and record the date, exact query string, and result count
3. Export all results in a structured format (RIS, BibTeX, or CSV)
4. Save a screenshot or copy of the search interface showing the query and results count
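
A simple append-only log makes step 2 systematic. This sketch is illustrative: the file name and column order are suggestions, not a standard:

```python
import csv
from datetime import date

def log_search(path: str, database: str, query: str, n_results: int) -> None:
    """Append one row per database run: date, database, exact query, result count."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([date.today().isoformat(), database, query, n_results])
```

Call `log_search("search_log.csv", "Scopus", query_string, result_count)` after each run; the file then doubles as the raw input for your PRISMA counts.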

### Grey Literature and Supplementary Sources

A truly exhaustive search also covers non-indexed sources:

- **Preprint servers**: arXiv, bioRxiv, medRxiv, SSRN
- **Dissertations**: ProQuest Dissertations, EThOS, institutional repositories
- **Conference proceedings**: Check major conferences in your field
- **Citation chaining**: Forward (who cited this?) and backward (what did this cite?) from key papers
- **Expert consultation**: Contact domain experts for unpublished or in-press work
- **Trial registries**: ClinicalTrials.gov, WHO ICTRP (for clinical topics)

## Deduplication and Screening

### Deduplication Process

After collecting results from multiple databases, expect 20-40% overlap. Use reference management software to deduplicate:

1. Import all exported results into a single library (Zotero, EndNote, or Rayyan)
2. Run automatic deduplication using DOI matching first, then title+author matching
3. Manually review flagged potential duplicates—automated tools miss ~5-10%
4. Record the count: total imported, duplicates removed, unique records remaining
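
The DOI-first matching logic of step 2 can be sketched as follows. The record dicts are illustrative; real exports (RIS/BibTeX) need parsing first, and this complements rather than replaces the manual review in step 3:

```python
import re

def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record per normalized DOI, falling back to title + first author."""
    seen, unique = set(), []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            key = ("doi", doi)
        else:
            # Normalize title aggressively: lowercase, strip punctuation and spaces
            title = re.sub(r"[^a-z0-9]+", "", (rec.get("title") or "").lower())
            author = (rec.get("first_author") or "").strip().lower()
            key = ("ta", title, author)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```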

### Screening Workflow

Apply a two-stage screening process:

- **Title/Abstract screening**: Review each unique record against your inclusion/exclusion criteria. Mark as Include, Exclude, or Maybe.
- **Full-text screening**: Retrieve full texts for all Include and Maybe records. Apply detailed eligibility criteria.

Use screening tools like Rayyan, Covidence, or ASReview to manage this process, especially for large result sets (500+ records).

## PRISMA Documentation

Document your entire search process using the PRISMA 2020 flow diagram:

```
Records identified (N = ?)
├── Database 1 (n = ?)
├── Database 2 (n = ?)
└── Other sources (n = ?)
Duplicates removed (n = ?)
Records screened (n = ?)
Records excluded (n = ?)
Full-text assessed (n = ?)
Full-text excluded with reasons (n = ?)
Studies included (n = ?)
```

Save your complete search strategies (exact query strings, dates, result counts per database) as supplementary material for your publication. This transparency is essential for reproducibility and is increasingly required by journals.

## References

- PRISMA 2020 Statement: https://www.prisma-statement.org
- Cochrane Handbook, Ch. 4 (Searching for studies): https://training.cochrane.org/handbook
- Literature Search tool: https://github.com/utilityfog/Literature-Search
- Bramer, W.M. et al. (2017). "De-duplication of database search results." BMC Medical Research Methodology.