@wentorai/research-plugins 1.2.3 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (142)
  1. package/README.md +16 -8
  2. package/openclaw.plugin.json +10 -3
  3. package/package.json +2 -5
  4. package/skills/analysis/dataviz/SKILL.md +25 -0
  5. package/skills/analysis/dataviz/chart-image-generator/SKILL.md +1 -1
  6. package/skills/analysis/econometrics/SKILL.md +23 -0
  7. package/skills/analysis/econometrics/robustness-checks/SKILL.md +1 -1
  8. package/skills/analysis/statistics/SKILL.md +21 -0
  9. package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +1 -1
  10. package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +1 -1
  11. package/skills/analysis/statistics/{senior-data-scientist-guide → modeling-strategy-guide}/SKILL.md +5 -5
  12. package/skills/analysis/wrangling/SKILL.md +21 -0
  13. package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +1 -1
  14. package/skills/analysis/wrangling/data-cog-guide/SKILL.md +1 -1
  15. package/skills/domains/ai-ml/SKILL.md +37 -0
  16. package/skills/domains/biomedical/SKILL.md +28 -0
  17. package/skills/domains/biomedical/genomas-guide/SKILL.md +1 -1
  18. package/skills/domains/biomedical/med-researcher-guide/SKILL.md +1 -1
  19. package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +1 -1
  20. package/skills/domains/business/SKILL.md +17 -0
  21. package/skills/domains/business/architecture-design-guide/SKILL.md +1 -1
  22. package/skills/domains/chemistry/SKILL.md +19 -0
  23. package/skills/domains/chemistry/computational-chemistry-guide/SKILL.md +1 -1
  24. package/skills/domains/cs/SKILL.md +21 -0
  25. package/skills/domains/ecology/SKILL.md +16 -0
  26. package/skills/domains/economics/SKILL.md +20 -0
  27. package/skills/domains/economics/post-labor-economics/SKILL.md +1 -1
  28. package/skills/domains/economics/pricing-psychology-guide/SKILL.md +1 -1
  29. package/skills/domains/education/SKILL.md +19 -0
  30. package/skills/domains/education/academic-study-methods/SKILL.md +1 -1
  31. package/skills/domains/education/edumcp-guide/SKILL.md +1 -1
  32. package/skills/domains/finance/SKILL.md +19 -0
  33. package/skills/domains/finance/akshare-finance-data/SKILL.md +1 -1
  34. package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +1 -1
  35. package/skills/domains/finance/stata-accounting-research/SKILL.md +1 -1
  36. package/skills/domains/geoscience/SKILL.md +17 -0
  37. package/skills/domains/humanities/SKILL.md +16 -0
  38. package/skills/domains/humanities/history-research-guide/SKILL.md +1 -1
  39. package/skills/domains/humanities/political-history-guide/SKILL.md +1 -1
  40. package/skills/domains/law/SKILL.md +19 -0
  41. package/skills/domains/math/SKILL.md +17 -0
  42. package/skills/domains/pharma/SKILL.md +17 -0
  43. package/skills/domains/physics/SKILL.md +16 -0
  44. package/skills/domains/social-science/SKILL.md +17 -0
  45. package/skills/domains/social-science/sociology-research-methods/SKILL.md +1 -1
  46. package/skills/literature/discovery/SKILL.md +20 -0
  47. package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +1 -1
  48. package/skills/literature/discovery/semantic-paper-radar/SKILL.md +1 -1
  49. package/skills/literature/fulltext/SKILL.md +26 -0
  50. package/skills/literature/metadata/SKILL.md +35 -0
  51. package/skills/literature/metadata/doi-content-negotiation/SKILL.md +4 -0
  52. package/skills/literature/metadata/doi-resolution-guide/SKILL.md +4 -0
  53. package/skills/literature/metadata/orcid-api/SKILL.md +4 -0
  54. package/skills/literature/metadata/orcid-integration-guide/SKILL.md +4 -0
  55. package/skills/literature/search/SKILL.md +43 -0
  56. package/skills/literature/search/paper-search-mcp-guide/SKILL.md +1 -1
  57. package/skills/research/automation/SKILL.md +21 -0
  58. package/skills/research/deep-research/SKILL.md +24 -0
  59. package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +1 -1
  60. package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
  61. package/skills/research/funding/SKILL.md +20 -0
  62. package/skills/research/methodology/SKILL.md +24 -0
  63. package/skills/research/paper-review/SKILL.md +19 -0
  64. package/skills/research/paper-review/paper-critique-framework/SKILL.md +1 -1
  65. package/skills/tools/code-exec/SKILL.md +18 -0
  66. package/skills/tools/diagram/SKILL.md +20 -0
  67. package/skills/tools/document/SKILL.md +21 -0
  68. package/skills/tools/knowledge-graph/SKILL.md +21 -0
  69. package/skills/tools/ocr-translate/SKILL.md +18 -0
  70. package/skills/tools/ocr-translate/handwriting-recognition-guide/SKILL.md +2 -0
  71. package/skills/tools/ocr-translate/latex-ocr-guide/SKILL.md +2 -0
  72. package/skills/tools/scraping/SKILL.md +17 -0
  73. package/skills/writing/citation/SKILL.md +33 -0
  74. package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +2 -0
  75. package/skills/writing/composition/SKILL.md +22 -0
  76. package/skills/writing/composition/research-paper-writer/SKILL.md +1 -1
  77. package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +1 -1
  78. package/skills/writing/latex/SKILL.md +22 -0
  79. package/skills/writing/latex/academic-writing-latex/SKILL.md +1 -1
  80. package/skills/writing/latex/latex-drawing-guide/SKILL.md +1 -1
  81. package/skills/writing/polish/SKILL.md +20 -0
  82. package/skills/writing/polish/chinese-text-humanizer/SKILL.md +1 -1
  83. package/skills/writing/templates/SKILL.md +22 -0
  84. package/skills/writing/templates/beamer-presentation-guide/SKILL.md +1 -1
  85. package/skills/writing/templates/scientific-article-pdf/SKILL.md +1 -1
  86. package/skills/analysis/dataviz/citation-map-guide/SKILL.md +0 -184
  87. package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +0 -171
  88. package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +0 -192
  89. package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +0 -267
  90. package/skills/analysis/econometrics/stata-regression/SKILL.md +0 -117
  91. package/skills/analysis/statistics/general-statistics-guide/SKILL.md +0 -226
  92. package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +0 -106
  93. package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +0 -192
  94. package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +0 -193
  95. package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +0 -100
  96. package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +0 -197
  97. package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +0 -159
  98. package/skills/domains/humanities/digital-humanities-methods/SKILL.md +0 -232
  99. package/skills/domains/law/legal-research-methods/SKILL.md +0 -190
  100. package/skills/domains/social-science/sociology-research-guide/SKILL.md +0 -238
  101. package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +0 -233
  102. package/skills/literature/discovery/paper-tracking-guide/SKILL.md +0 -211
  103. package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +0 -168
  104. package/skills/literature/search/arxiv-osiris/SKILL.md +0 -199
  105. package/skills/literature/search/deepgit-search-guide/SKILL.md +0 -147
  106. package/skills/literature/search/multi-database-literature-search/SKILL.md +0 -198
  107. package/skills/literature/search/papers-chat-guide/SKILL.md +0 -194
  108. package/skills/literature/search/pasa-paper-search-guide/SKILL.md +0 -138
  109. package/skills/literature/search/scientify-literature-survey/SKILL.md +0 -203
  110. package/skills/research/automation/ai-scientist-guide/SKILL.md +0 -228
  111. package/skills/research/automation/coexist-ai-guide/SKILL.md +0 -149
  112. package/skills/research/automation/foam-agent-guide/SKILL.md +0 -203
  113. package/skills/research/automation/research-paper-orchestrator/SKILL.md +0 -254
  114. package/skills/research/deep-research/academic-deep-research/SKILL.md +0 -190
  115. package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +0 -200
  116. package/skills/research/deep-research/corvus-research-guide/SKILL.md +0 -132
  117. package/skills/research/deep-research/deep-research-pro/SKILL.md +0 -213
  118. package/skills/research/deep-research/deep-research-work/SKILL.md +0 -204
  119. package/skills/research/deep-research/research-cog/SKILL.md +0 -153
  120. package/skills/research/methodology/academic-mentor-guide/SKILL.md +0 -169
  121. package/skills/research/methodology/deep-innovator-guide/SKILL.md +0 -242
  122. package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +0 -169
  123. package/skills/research/paper-review/paper-compare-guide/SKILL.md +0 -238
  124. package/skills/research/paper-review/paper-digest-guide/SKILL.md +0 -240
  125. package/skills/research/paper-review/paper-research-assistant/SKILL.md +0 -231
  126. package/skills/research/paper-review/research-quality-filter/SKILL.md +0 -261
  127. package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +0 -110
  128. package/skills/tools/diagram/clawphd-guide/SKILL.md +0 -149
  129. package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +0 -201
  130. package/skills/tools/document/md2pdf-xelatex/SKILL.md +0 -212
  131. package/skills/tools/document/openpaper-guide/SKILL.md +0 -232
  132. package/skills/tools/document/qq-connect/SKILL.md +0 -227
  133. package/skills/tools/document/weknora-guide/SKILL.md +0 -216
  134. package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +0 -135
  135. package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +0 -156
  136. package/skills/tools/ocr-translate/formula-recognition-guide/SKILL.md +0 -367
  137. package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +0 -198
  138. package/skills/tools/scraping/api-data-collection-guide/SKILL.md +0 -301
  139. package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +0 -182
  140. package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +0 -200
  141. package/skills/writing/composition/paper-debugger-guide/SKILL.md +0 -143
  142. package/skills/writing/composition/paperforge-guide/SKILL.md +0 -205
@@ -1,199 +0,0 @@
---
name: arxiv-osiris
description: "Search and download arXiv papers via Python and PowerShell scripts"
metadata:
  openclaw:
    emoji: "🔍"
    category: "literature"
    subcategory: "search"
    keywords: ["arxiv", "paper download", "preprint search", "python script", "powershell", "literature retrieval"]
    source: "https://clawhub.com/kostaskyq/arxiv-osiris"
---

# arXiv Osiris — Paper Search and Download Tool

## Overview

arXiv Osiris provides cross-platform scripts (Python and PowerShell) for searching and downloading scientific papers from arXiv.org. It supports keyword search, category filtering, metadata retrieval, and direct PDF download. It is useful for researchers who prefer scripted automation over browser-based arXiv access, particularly for building local paper collections.

## Installation

```bash
# Install the arxiv Python client (required dependency)
pip install arxiv

# Clone the tool (if using from source)
git clone https://github.com/kostaskyq/arxiv-osiris.git
```

## Usage — Python API

### Search for Papers

```python
import arxiv

# Basic keyword search
search = arxiv.Search(
    query="quantum computing error correction",
    max_results=10,
    sort_by=arxiv.SortCriterion.Relevance
)

client = arxiv.Client()
for result in client.results(search):
    print(f"ID: {result.entry_id}")
    print(f"Title: {result.title}")
    print(f"Authors: {', '.join(a.name for a in result.authors)}")
    print(f"Published: {result.published.strftime('%Y-%m-%d')}")
    print(f"PDF: {result.pdf_url}")
    print(f"Abstract: {result.summary[:200]}...")
    print()
```

### Category-Filtered Search

```python
import arxiv

# Search within specific categories
search = arxiv.Search(
    query="cat:cs.CL AND transformer",
    max_results=20,
    sort_by=arxiv.SortCriterion.SubmittedDate
)

# Multiple categories
search = arxiv.Search(
    query="(cat:cs.AI OR cat:cs.LG) AND reinforcement learning",
    max_results=15
)
```

### Download Papers

```python
import os

import arxiv

search = arxiv.Search(query="attention mechanism", max_results=5)
client = arxiv.Client()
download_dir = os.path.expanduser("~/papers/attention")
os.makedirs(download_dir, exist_ok=True)

for result in client.results(search):
    # Download PDF
    result.download_pdf(dirpath=download_dir)
    print(f"Downloaded: {result.title}")

    # Download source (LaTeX) if available
    result.download_source(dirpath=download_dir)
```

## Usage — PowerShell Script

### Search

```powershell
# Basic search
.\arxiv.ps1 -Action search -Query "machine learning"

# With max results
.\arxiv.ps1 -Action search -Query "neural networks" -MaxResults 10

# Filter by category
.\arxiv.ps1 -Action search -Query "deep learning" -Categories "cs,stat"
```

### Download

```powershell
# Download by arXiv ID
.\arxiv.ps1 -Action download -ArxivId "1706.03762"

# Download to specific directory
.\arxiv.ps1 -Action download -ArxivId "2301.13688" -OutputDir "C:\Papers"
```

## Advanced Queries

The arXiv API supports a rich query syntax:

| Operator | Meaning | Example |
|----------|---------|---------|
| `AND` | Both terms | `"deep learning" AND "drug discovery"` |
| `OR` | Either term | `"GAN" OR "diffusion model"` |
| `ANDNOT` | Exclude term | `"NLP" ANDNOT "translation"` |
| `au:` | Author | `au:"Hinton"` |
| `ti:` | Title contains | `ti:"attention"` |
| `abs:` | Abstract contains | `abs:"protein folding"` |
| `cat:` | Category | `cat:cs.CV` |

### Complex Query Examples

```python
import arxiv

# Papers by a specific author on a specific topic
search = arxiv.Search(query='au:"Yann LeCun" AND ti:"self-supervised"')

# Recent papers in two categories excluding surveys
search = arxiv.Search(
    query='(cat:cs.CL OR cat:cs.AI) AND "large language model" ANDNOT ti:"survey"',
    sort_by=arxiv.SortCriterion.SubmittedDate,
    max_results=50
)
```
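
The operators above compose mechanically, so queries can also be built programmatically. As a rough sketch (this helper is not part of arXiv Osiris; the name and parameters are invented for illustration):

```python
def arxiv_query(terms=None, author=None, title=None,
                categories=None, exclude_title=None):
    """Assemble an arXiv API query string from the operators in the table above."""
    parts = []
    if categories:
        # Multiple categories are OR-ed together, as in the examples
        parts.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    if author:
        parts.append(f'au:"{author}"')
    if title:
        parts.append(f'ti:"{title}"')
    if terms:
        # Quote multi-word phrases so they match as a unit
        parts.extend(f'"{t}"' if " " in t else t for t in terms)
    query = " AND ".join(parts)
    if exclude_title:
        query += f' ANDNOT ti:"{exclude_title}"'
    return query

# Reproduces the second example above:
# arxiv_query(terms=["large language model"], categories=["cs.CL", "cs.AI"],
#             exclude_title="survey")
```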

## Building a Local Paper Library

```python
import arxiv
import json
import os

def build_library(queries: dict, base_dir: str = "~/papers"):
    """Build an organized paper library from multiple search queries."""
    base = os.path.expanduser(base_dir)
    catalog = []
    client = arxiv.Client()

    for topic, query in queries.items():
        topic_dir = os.path.join(base, topic)
        os.makedirs(topic_dir, exist_ok=True)

        search = arxiv.Search(query=query, max_results=20,
                              sort_by=arxiv.SortCriterion.SubmittedDate)

        for paper in client.results(search):
            # Pin the filename so the catalog entry matches the file on disk
            pdf_name = f"{paper.get_short_id()}.pdf"
            paper.download_pdf(dirpath=topic_dir, filename=pdf_name)
            catalog.append({
                "id": paper.entry_id,
                "title": paper.title,
                "authors": [a.name for a in paper.authors],
                "published": paper.published.isoformat(),
                "topic": topic,
                "pdf_path": os.path.join(topic_dir, pdf_name)
            })

    # Save catalog
    with open(os.path.join(base, "catalog.json"), "w") as f:
        json.dump(catalog, f, indent=2)
    print(f"Library built: {len(catalog)} papers in {len(queries)} topics")

# Usage
build_library({
    "rag": "cat:cs.CL AND retrieval augmented generation",
    "agents": "cat:cs.AI AND (LLM agent OR tool use)",
    "evaluation": "cat:cs.CL AND (benchmark OR evaluation) AND language model"
})
```

## Rate Limits

- arXiv API: **1 request per 3 seconds** for automated access
- The `arxiv` Python client handles rate limiting automatically
- For large-scale downloads, add explicit delays: `time.sleep(3)`
- Respect the [arXiv API Terms of Use](https://info.arxiv.org/help/api/tou.html)
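
For scripted batch jobs that bypass the client's built-in throttling, a minimal pacing helper can enforce the interval explicitly. This is an illustrative sketch, not part of arXiv Osiris:

```python
import time

def polite_iter(items, min_interval: float = 3.0):
    """Yield items no faster than one per min_interval seconds,
    matching arXiv's one-request-per-3-seconds guidance."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        yield item

# for arxiv_id in polite_iter(ids):
#     ...download arxiv_id...
```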

## References

- [arxiv Python Client](https://github.com/lukasschwab/arxiv.py)
- [arXiv API User Manual](https://info.arxiv.org/help/api/user-manual.html)
- [arXiv Category Taxonomy](https://arxiv.org/category_taxonomy)
@@ -1,147 +0,0 @@
---
name: deepgit-search-guide
description: "Deep research tool for discovering academic code in Git repositories"
version: 1.0.0
author: wentor-community
source: https://github.com/DeepGit/DeepGit
metadata:
  openclaw:
    category: "literature"
    subcategory: "search"
    keywords:
      - git-search
      - code-discovery
      - repository-analysis
      - research-code
      - implementation-search
      - open-source
---

# DeepGit Search Guide

A skill for conducting deep searches across Git repositories to discover research implementations, datasets, and academic code artifacts. Based on DeepGit (852 stars), this skill helps researchers find, evaluate, and use open-source code associated with academic publications.

## Overview

Modern academic research increasingly relies on code for data analysis, model implementation, and experiment reproduction. However, finding the right repository among millions on GitHub requires more than simple keyword search. DeepGit applies deep research techniques to repository discovery, combining semantic code understanding, README analysis, citation linking, and quality assessment to surface the most relevant and reliable research code.

This skill is essential for researchers who want to build on existing implementations rather than reinvent them from scratch, verify published results through code inspection, or find reference implementations of algorithms described in papers.

## Search Strategies

**Keyword-Based Search**
- Start with the paper title, method name, or algorithm as search terms
- Include the first author's name or institution to narrow results
- Add framework-specific terms (PyTorch, TensorFlow, scikit-learn) when looking for specific implementations
- Use language filters to find implementations in your preferred programming language
- Combine topic tags (machine-learning, deep-learning, nlp, cv) with method-specific terms

**Paper-Linked Search**
- Many papers include a "Code available at" link; extract and verify these first
- Search Papers with Code for repository links associated with specific papers
- Check the paper's Semantic Scholar or Google Scholar entry for linked code
- Look for the paper's arXiv abstract, which often contains a GitHub link
- Search for the paper's DOI or arXiv ID in GitHub README files

**Author-Based Search**
- Visit the first author's or corresponding author's GitHub profile
- Check the research group's or lab's GitHub organization page
- Look for personal academic websites that link to code repositories
- Search for the author's ORCID or Google Scholar profile for linked repositories
- Follow the author's collaborators who may have contributed to or forked the code

**Citation-Chain Search**
- Find code for papers that cite or are cited by the target paper
- Implementations of closely related methods often share similar repository structures
- Forked repositories may contain adaptations for different datasets or settings
- Look at the "Used by" and "Forks" tabs on GitHub for derivative work
- Check awesome-lists in the relevant field for curated repository collections
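
Several of the strategies above reduce to a single repository-search query. As a rough sketch (the helper is invented for illustration and uses GitHub's `in:readme` and `language:` search qualifiers; it is not part of DeepGit):

```python
from typing import List, Optional
from urllib.parse import urlencode

def github_repo_search_url(arxiv_id: Optional[str] = None,
                           terms: Optional[List[str]] = None,
                           language: Optional[str] = None) -> str:
    """Build a GitHub repository-search API URL for paper-linked discovery."""
    parts = []
    if arxiv_id:
        # Papers' READMEs often quote their own arXiv ID
        parts.append(f'"{arxiv_id}" in:readme')
    if terms:
        parts.extend(terms)
    if language:
        parts.append(f"language:{language}")
    return ("https://api.github.com/search/repositories?"
            + urlencode({"q": " ".join(parts), "sort": "stars"}))
```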

## Repository Evaluation

Once candidate repositories are found, evaluate them systematically:

**Code Quality Indicators**
- README completeness: clear description, installation instructions, usage examples
- Documentation: API documentation, tutorials, or walkthroughs
- Test coverage: presence of test files and CI/CD configuration
- Code organization: logical directory structure, modular design
- Dependencies: clear requirements file with pinned versions

**Reproducibility Assessment**
- Does the README specify how to reproduce the paper's results?
- Are pretrained models or checkpoints provided?
- Is the training data available, or are instructions for obtaining it provided?
- Are random seeds and hardware specifications documented?
- Do the reported results match the paper's claims?

**Maintenance Status**
- Last commit date: recent activity suggests active maintenance
- Issue response time: how quickly issues are acknowledged and addressed
- Open issue count: a high ratio of open to closed issues may indicate abandonment
- Release history: regular releases suggest mature, stable software
- Contributor count: multiple contributors indicate community involvement

**Community Signals**
- Star count: general popularity indicator (but not a quality guarantee)
- Fork count: indicates others are building on the work
- Citation count of the associated paper
- Mentions in academic forums, Twitter, or blog posts
- Inclusion in curated awesome-lists or benchmark suites
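
These signals can be folded into a rough triage score for ranking candidates. The weights and saturation points below are invented for illustration (this is not DeepGit's actual ranking), but the shape of the calculation carries over:

```python
import math
from datetime import datetime, timezone

def repo_health_score(stars: int, forks: int, open_issues: int,
                      closed_issues: int, last_commit: datetime) -> float:
    """Toy 0-100 triage score over the evaluation signals listed above."""
    popularity = min(math.log10(stars + 1) / 4, 1.0)      # saturates near 10k stars
    adoption = min(math.log10(forks + 1) / 3, 1.0)        # saturates near 1k forks
    total = open_issues + closed_issues
    responsiveness = closed_issues / total if total else 0.5  # no issues: neutral
    age_days = (datetime.now(timezone.utc) - last_commit).days
    freshness = max(0.0, 1.0 - age_days / 365)            # decays to 0 after a year
    return round(100 * (0.3 * popularity + 0.2 * adoption
                        + 0.2 * responsiveness + 0.3 * freshness), 1)
```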

## Working with Research Code

**Getting Started**
- Clone the repository and read the entire README before proceeding
- Check the requirements file and create an isolated environment (conda, venv, Docker)
- Install dependencies using the exact versions specified
- Run the provided tests or examples to verify the installation
- Start with the simplest example before attempting full reproduction

**Common Challenges**
- Missing dependencies not listed in requirements
- Hardcoded paths that need to be adapted to your environment
- GPU memory requirements exceeding available hardware
- Dataset preprocessing steps not documented or automated
- Version conflicts between required packages

**Adaptation Strategies**
- Fork the repository before making modifications for your use case
- Document all changes you make in a changelog or commit messages
- Keep the original code as a reference branch for comparison
- Submit bug fixes back to the original repository as pull requests
- Cite the repository in your publications using its preferred citation format

## Organizing Discovered Repositories

**Local Catalog**
- Maintain a structured record of discovered repositories with metadata
- Fields: paper title, authors, year, repo URL, stars, language, framework, reproduction status
- Tag repositories by topic, method, and dataset for cross-referencing
- Track which repositories you have successfully run and which had issues
- Note the key configuration settings that made reproduction work

**Integration with Reference Management**
- Link repository entries to corresponding Zotero or BibTeX references
- Use Zotero's URL field to store repository links alongside paper PDFs
- Tag references with "has-code" or "code-verified" for filtering
- Include repository URLs in your literature notes
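
A minimal catalog record along these lines keeps the metadata machine-readable. The field names and file format below are illustrative, not a prescribed schema:

```python
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path
from typing import List

@dataclass
class RepoRecord:
    """One entry in the local catalog described above."""
    paper_title: str
    repo_url: str
    year: int
    stars: int
    language: str
    framework: str = ""
    reproduction_status: str = "untested"  # untested | works | failed
    tags: List[str] = field(default_factory=list)

def save_catalog(records: List[RepoRecord], path: str = "repo_catalog.json") -> None:
    """Write the catalog as JSON so it can be diffed and shared."""
    Path(path).write_text(json.dumps([asdict(r) for r in records], indent=2))
```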

## Integration with Research-Claw

This skill enhances the Research-Claw code discovery workflow:

- Search for implementations after discovering relevant papers through literature skills
- Feed discovered code to analysis skills for experiment replication
- Connect with writing skills to properly cite code and data sources
- Store repository evaluations in the knowledge base for team access
- Automate periodic checks for new repositories related to ongoing projects

## Best Practices

- Always check the repository's license before using code in your own projects
- Cite both the paper and the repository when using others' code
- Verify reproduction results before building on top of existing implementations
- Contribute back improvements, bug fixes, and documentation to the community
- Keep local copies of critical repositories in case they are deleted or moved
- Document your environment setup steps so collaborators can replicate your results
@@ -1,198 +0,0 @@
---
name: multi-database-literature-search
description: "Conduct comprehensive literature searches across multiple academic databases"
metadata:
  openclaw:
    emoji: "🔍"
    category: "literature"
    subcategory: "search"
    keywords: ["multi-database search", "systematic review", "literature search", "academic databases", "cross-database", "search strategy"]
    source: "https://clawhub.ai/jpjy/literature-search"
---

# Multi-Database Literature Search

## Overview

No single database covers all academic literature. A comprehensive search requires querying multiple databases, each with its own coverage, search syntax, and strengths. This guide provides a structured approach to searching across Google Scholar, PubMed, Semantic Scholar, arXiv, IEEE Xplore, ACM Digital Library, and Scopus/Web of Science, with strategies for deduplication and result management.

## Database Coverage Map

| Database | Coverage | Strengths | Free? |
|----------|----------|-----------|-------|
| **Google Scholar** | All disciplines, broadest | Grey literature, books, citations | Yes |
| **Semantic Scholar** | 220M+ papers, all fields | AI-powered relevance, citation context, TLDR | Yes |
| **PubMed** | Biomedical, life sciences | MeSH terms, clinical trials, 36M+ records | Yes |
| **arXiv** | Physics, CS, math, econ, stats | Preprints, latest research, open access | Yes |
| **OpenAlex** | 250M+ works, all fields | Open metadata, citation network, concepts | Yes |
| **Scopus** | All disciplines | Citation metrics, author profiles | Subscription |
| **Web of Science** | All disciplines | Impact factors, citation reports | Subscription |
| **IEEE Xplore** | Engineering, CS | IEEE/IET publications, standards | Partial |
| **ACM DL** | Computer science | ACM proceedings, computing reviews | Partial |
| **SSRN** | Social sciences, economics | Working papers, preprints | Yes |
| **JSTOR** | Humanities, social sciences | Historical archives, journals | Partial |

## Search Strategy Design

### Step 1: Decompose Your Question

```
Research question:
"How does remote work affect employee productivity in knowledge-intensive firms?"

Concept blocks:
Block A: remote work | telework | work from home | telecommuting | hybrid work
Block B: productivity | performance | output | efficiency | effectiveness
Block C: knowledge work | knowledge-intensive | white collar | professional
```
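
The concept blocks above combine mechanically into the boolean form used in the next step, so the translation can be scripted. A small sketch (the helper name is invented for illustration):

```python
def build_boolean_query(blocks: dict) -> str:
    """Combine concept blocks into the (A1 OR A2) AND (B1 OR B2) form."""
    def fmt(term: str) -> str:
        # Quote multi-word phrases so they match as a unit
        return f'"{term}"' if " " in term else term
    groups = ["(" + " OR ".join(fmt(t) for t in terms) + ")"
              for terms in blocks.values()]
    return " AND ".join(groups)

query = build_boolean_query({
    "A": ["remote work", "telework", "work from home"],
    "B": ["productivity", "performance"],
})
# ("remote work" OR telework OR "work from home") AND (productivity OR performance)
```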

### Step 2: Build Database-Specific Queries

Each database has different syntax. Translate your concept blocks:

**Google Scholar**:
```
("remote work" OR telework OR "work from home") AND
(productivity OR performance OR output) AND
("knowledge work" OR "knowledge-intensive" OR professional)
```

**PubMed**:
```
("remote work"[Title/Abstract] OR "telework"[Title/Abstract] OR
 "work from home"[Title/Abstract]) AND
("productivity"[Title/Abstract] OR "performance"[Title/Abstract]) AND
("knowledge workers"[Title/Abstract] OR "professional"[Title/Abstract])
```

**Semantic Scholar API**:
```bash
curl "https://api.semanticscholar.org/graph/v1/paper/search?\
query=remote+work+productivity+knowledge+workers&\
year=2019-2026&\
fieldsOfStudy=Economics,Business&\
limit=100&\
fields=title,authors,year,abstract,citationCount,url"
```

**arXiv**:
```
all:"remote work" AND all:productivity AND cat:econ.*
```

### Step 3: Execute Searches Systematically

```markdown
## Search Log Template (PRISMA-compliant)

| # | Database | Date | Query String | Filters | Results | Relevant | Notes |
|---|----------|------|--------------|---------|---------|----------|-------|
| 1 | Google Scholar | 2026-03-10 | [full query] | 2019-2026 | 1,240 | ~80 | Top 200 screened |
| 2 | Semantic Scholar | 2026-03-10 | [full query] | Year ≥ 2019 | 487 | ~45 | API, sorted by relevance |
| 3 | PubMed | 2026-03-10 | [full query] | 5 years | 156 | ~30 | MeSH term: Teleworking |
| 4 | SSRN | 2026-03-10 | [full query] | — | 89 | ~20 | Working papers |
| 5 | Scopus | 2026-03-10 | [full query] | 2019-2026 | 312 | ~55 | Most overlap with GS |
```

## Deduplication

After collecting results from multiple databases, remove duplicates:

```python
import pandas as pd
from fuzzywuzzy import fuzz

def deduplicate_papers(df: pd.DataFrame, title_col: str = "title",
                       threshold: int = 90) -> pd.DataFrame:
    """Remove duplicate papers based on fuzzy title matching."""
    # Sort by citations first so the most-cited copy of each paper is kept
    df = df.sort_values("citation_count", ascending=False)
    keep = []
    seen_titles = []

    for _, row in df.iterrows():
        title = row[title_col].lower().strip()
        is_dup = False
        for seen in seen_titles:
            if fuzz.ratio(title, seen) >= threshold:
                is_dup = True
                break
        if not is_dup:
            keep.append(row)
            seen_titles.append(title)

    result = pd.DataFrame(keep)
    print(f"Deduplicated: {len(df)} → {len(result)} ({len(df) - len(result)} duplicates removed)")
    return result

# Usage
all_results = pd.concat([gs_results, s2_results, pubmed_results, scopus_results])
unique = deduplicate_papers(all_results)
```

### DOI-Based Deduplication (More Reliable)

```python
def deduplicate_by_doi(df: pd.DataFrame) -> pd.DataFrame:
    """Primary: DOI match. Fallback: fuzzy title match for missing DOIs."""
    with_doi = df[df["doi"].notna()].drop_duplicates(subset="doi", keep="first")
    without_doi = df[df["doi"].isna()]
    without_doi_deduped = deduplicate_papers(without_doi, threshold=85)
    return pd.concat([with_doi, without_doi_deduped]).reset_index(drop=True)
```
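
DOI matching is only reliable if both sides are normalized first, since registries return DOIs with different URL prefixes and casing (DOI names are case-insensitive). A small normalizer, assumed to run on the `doi` column before the DOI-based deduplication above:

```python
def normalize_doi(raw: str) -> str:
    """Reduce DOI variants ('https://doi.org/10.x/y', 'doi:10.x/y', mixed case)
    to a bare lowercase form before matching."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi

# df["doi"] = df["doi"].map(lambda d: normalize_doi(d) if isinstance(d, str) else d)
```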

## Screening Workflow

### Title/Abstract Screening

```markdown
After deduplication, screen titles and abstracts:

Include if:
□ Directly addresses research question
□ Empirical study with data OR systematic review
□ Published in peer-reviewed venue OR reputable preprint server
□ Written in English or Chinese

Exclude if:
□ Irrelevant population (e.g., manual labor when studying knowledge work)
□ No empirical component (pure opinion)
□ Duplicate or superseded version
□ Cannot access full text (after OA and institutional access attempts)
```

### Citation Chaining

After initial screening, expand coverage:

```
Forward citation (who cited this paper?):
- Semantic Scholar: "Citations" tab
- Google Scholar: "Cited by" link
- Web of Science: "Citing Articles"

Backward citation (what does this paper cite?):
- Read the reference list of each key paper
- Identify seminal works and foundational papers

Citation chaining typically adds 15-30% more relevant papers beyond database searches
```

## Recommended Search Order

For maximum coverage with minimum effort:

```
1. Semantic Scholar (broad coverage, AI-powered ranking, free API)
2. Google Scholar (broadest coverage, catches grey literature)
3. Domain-specific DB (PubMed for biomedical, arXiv for CS/physics, SSRN for social science)
4. Scopus or Web of Science (if institutional access available — adds citation metrics)
5. Citation chaining from top 10 most relevant papers found so far
6. Grey literature: Google, institutional repositories, conference websites
```

## References

- Moher, D., et al. (2009). "PRISMA Statement." *BMJ*, 339, b2535.
- Bramer, W. M., et al. (2017). "De-duplication of database search results." *BMC Medical Research Methodology*, 17(1), 1-9.
- [Semantic Scholar API](https://api.semanticscholar.org/)
- [PubMed Search Guide](https://pubmed.ncbi.nlm.nih.gov/help/)