@wentorai/research-plugins 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (415)
  1. package/README.md +22 -22
  2. package/curated/analysis/README.md +82 -56
  3. package/curated/domains/README.md +225 -69
  4. package/curated/literature/README.md +115 -46
  5. package/curated/research/README.md +106 -58
  6. package/curated/tools/README.md +107 -87
  7. package/curated/writing/README.md +92 -45
  8. package/mcp-configs/academic-db/alphafold-mcp.json +20 -0
  9. package/mcp-configs/academic-db/brightspace-mcp.json +21 -0
  10. package/mcp-configs/academic-db/climatiq-mcp.json +20 -0
  11. package/mcp-configs/academic-db/gibs-mcp.json +20 -0
  12. package/mcp-configs/academic-db/gis-mcp-server.json +22 -0
  13. package/mcp-configs/academic-db/google-earth-engine-mcp.json +21 -0
  14. package/mcp-configs/academic-db/m4-clinical-mcp.json +21 -0
  15. package/mcp-configs/academic-db/medical-mcp.json +21 -0
  16. package/mcp-configs/academic-db/nexonco-mcp.json +20 -0
  17. package/mcp-configs/academic-db/omop-mcp.json +20 -0
  18. package/mcp-configs/academic-db/onekgpd-mcp.json +20 -0
  19. package/mcp-configs/academic-db/openedu-mcp.json +20 -0
  20. package/mcp-configs/academic-db/opengenes-mcp.json +20 -0
  21. package/mcp-configs/academic-db/openstax-mcp.json +21 -0
  22. package/mcp-configs/academic-db/openstreetmap-mcp.json +21 -0
  23. package/mcp-configs/academic-db/opentargets-mcp.json +21 -0
  24. package/mcp-configs/academic-db/pdb-mcp.json +21 -0
  25. package/mcp-configs/academic-db/smithsonian-mcp.json +20 -0
  26. package/mcp-configs/ai-platform/magi-researchers.json +21 -0
  27. package/mcp-configs/ai-platform/mcp-academic-researcher.json +22 -0
  28. package/mcp-configs/ai-platform/open-paper-machine.json +21 -0
  29. package/mcp-configs/ai-platform/paper-intelligence.json +21 -0
  30. package/mcp-configs/ai-platform/paper-reader.json +21 -0
  31. package/mcp-configs/ai-platform/paperdebugger.json +21 -0
  32. package/mcp-configs/browser/exa-mcp.json +20 -0
  33. package/mcp-configs/browser/mcp-searxng.json +21 -0
  34. package/mcp-configs/browser/mcp-webresearch.json +20 -0
  35. package/mcp-configs/cloud-docs/confluence-mcp.json +37 -0
  36. package/mcp-configs/cloud-docs/google-drive-mcp.json +35 -0
  37. package/mcp-configs/cloud-docs/notion-mcp.json +29 -0
  38. package/mcp-configs/communication/discord-mcp.json +29 -0
  39. package/mcp-configs/communication/discourse-mcp.json +21 -0
  40. package/mcp-configs/communication/slack-mcp.json +29 -0
  41. package/mcp-configs/communication/telegram-mcp.json +28 -0
  42. package/mcp-configs/data-platform/automl-stat-mcp.json +21 -0
  43. package/mcp-configs/data-platform/jefferson-stats-mcp.json +22 -0
  44. package/mcp-configs/data-platform/mcp-excel-server.json +21 -0
  45. package/mcp-configs/data-platform/mcp-stata.json +21 -0
  46. package/mcp-configs/data-platform/mcpstack-jupyter.json +21 -0
  47. package/mcp-configs/data-platform/ml-mcp.json +21 -0
  48. package/mcp-configs/data-platform/nasdaq-data-link-mcp.json +20 -0
  49. package/mcp-configs/data-platform/numpy-mcp.json +21 -0
  50. package/mcp-configs/database/neo4j-mcp.json +37 -0
  51. package/mcp-configs/database/postgres-mcp.json +28 -0
  52. package/mcp-configs/database/sqlite-mcp.json +29 -0
  53. package/mcp-configs/dev-platform/geogebra-mcp.json +21 -0
  54. package/mcp-configs/dev-platform/github-mcp.json +31 -0
  55. package/mcp-configs/dev-platform/gitlab-mcp.json +34 -0
  56. package/mcp-configs/dev-platform/latex-mcp-server.json +21 -0
  57. package/mcp-configs/dev-platform/manim-mcp.json +20 -0
  58. package/mcp-configs/dev-platform/mcp-echarts.json +20 -0
  59. package/mcp-configs/dev-platform/panel-viz-mcp.json +20 -0
  60. package/mcp-configs/dev-platform/paperbanana.json +20 -0
  61. package/mcp-configs/dev-platform/texflow-mcp.json +20 -0
  62. package/mcp-configs/dev-platform/texmcp.json +20 -0
  63. package/mcp-configs/dev-platform/typst-mcp.json +21 -0
  64. package/mcp-configs/dev-platform/vizro-mcp.json +20 -0
  65. package/mcp-configs/email/email-mcp.json +40 -0
  66. package/mcp-configs/email/gmail-mcp.json +37 -0
  67. package/mcp-configs/note-knowledge/local-faiss-mcp.json +21 -0
  68. package/mcp-configs/note-knowledge/mcp-memory-service.json +21 -0
  69. package/mcp-configs/note-knowledge/mcp-obsidian.json +23 -0
  70. package/mcp-configs/note-knowledge/mcp-ragdocs.json +20 -0
  71. package/mcp-configs/note-knowledge/mcp-summarizer.json +21 -0
  72. package/mcp-configs/note-knowledge/mediawiki-mcp.json +21 -0
  73. package/mcp-configs/note-knowledge/openzim-mcp.json +20 -0
  74. package/mcp-configs/note-knowledge/zettelkasten-mcp.json +21 -0
  75. package/mcp-configs/reference-mgr/academic-paper-mcp-http.json +20 -0
  76. package/mcp-configs/reference-mgr/academix.json +20 -0
  77. package/mcp-configs/reference-mgr/arxiv-research-mcp.json +21 -0
  78. package/mcp-configs/reference-mgr/google-scholar-abstract-mcp.json +19 -0
  79. package/mcp-configs/reference-mgr/google-scholar-mcp.json +20 -0
  80. package/mcp-configs/reference-mgr/mcp-paperswithcode.json +21 -0
  81. package/mcp-configs/reference-mgr/mcp-scholarly.json +20 -0
  82. package/mcp-configs/reference-mgr/mcp-simple-arxiv.json +20 -0
  83. package/mcp-configs/reference-mgr/mcp-simple-pubmed.json +20 -0
  84. package/mcp-configs/reference-mgr/mcp-zotero.json +21 -0
  85. package/mcp-configs/reference-mgr/mendeley-mcp.json +20 -0
  86. package/mcp-configs/reference-mgr/ncbi-mcp-server.json +22 -0
  87. package/mcp-configs/reference-mgr/onecite.json +21 -0
  88. package/mcp-configs/reference-mgr/paper-search-mcp.json +21 -0
  89. package/mcp-configs/reference-mgr/pubmed-search-mcp.json +21 -0
  90. package/mcp-configs/reference-mgr/scholar-mcp.json +21 -0
  91. package/mcp-configs/reference-mgr/scholar-multi-mcp.json +21 -0
  92. package/mcp-configs/reference-mgr/seerai.json +21 -0
  93. package/mcp-configs/reference-mgr/semantic-scholar-fastmcp.json +21 -0
  94. package/mcp-configs/reference-mgr/sourcelibrary.json +20 -0
  95. package/mcp-configs/registry.json +178 -149
  96. package/mcp-configs/repository/dataverse-mcp.json +33 -0
  97. package/mcp-configs/repository/huggingface-mcp.json +29 -0
  98. package/openclaw.plugin.json +2 -2
  99. package/package.json +2 -2
  100. package/skills/analysis/dataviz/algorithm-visualizer-guide/SKILL.md +259 -0
  101. package/skills/analysis/dataviz/bokeh-visualization-guide/SKILL.md +270 -0
  102. package/skills/analysis/dataviz/chart-image-generator/SKILL.md +229 -0
  103. package/skills/analysis/dataviz/citation-map-guide/SKILL.md +184 -0
  104. package/skills/analysis/dataviz/d3-visualization-guide/SKILL.md +281 -0
  105. package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +171 -0
  106. package/skills/analysis/dataviz/echarts-visualization-guide/SKILL.md +250 -0
  107. package/skills/analysis/dataviz/metabase-analytics-guide/SKILL.md +242 -0
  108. package/skills/analysis/dataviz/plotly-interactive-guide/SKILL.md +266 -0
  109. package/skills/analysis/dataviz/redash-analytics-guide/SKILL.md +284 -0
  110. package/skills/analysis/econometrics/econml-causal-guide/SKILL.md +163 -0
  111. package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +192 -0
  112. package/skills/analysis/econometrics/mostly-harmless-guide/SKILL.md +139 -0
  113. package/skills/analysis/econometrics/panel-data-analyst/SKILL.md +259 -0
  114. package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +267 -0
  115. package/skills/analysis/econometrics/python-causality-guide/SKILL.md +134 -0
  116. package/skills/analysis/econometrics/stata-accounting-guide/SKILL.md +269 -0
  117. package/skills/analysis/econometrics/stata-analyst-guide/SKILL.md +245 -0
  118. package/skills/analysis/econometrics/stata-reference-guide/SKILL.md +293 -0
  119. package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +157 -0
  120. package/skills/analysis/statistics/general-statistics-guide/SKILL.md +226 -0
  121. package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +106 -0
  122. package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +212 -0
  123. package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +192 -0
  124. package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +193 -0
  125. package/skills/analysis/statistics/senior-data-scientist-guide/SKILL.md +223 -0
  126. package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +100 -0
  127. package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +170 -0
  128. package/skills/analysis/wrangling/data-cleaning-pipeline/SKILL.md +266 -0
  129. package/skills/analysis/wrangling/data-cog-guide/SKILL.md +178 -0
  130. package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +197 -0
  131. package/skills/analysis/wrangling/stata-data-cleaning/SKILL.md +276 -0
  132. package/skills/analysis/wrangling/streamline-analyst-guide/SKILL.md +119 -0
  133. package/skills/analysis/wrangling/survey-data-processing/SKILL.md +298 -0
  134. package/skills/domains/ai-ml/ai-agent-papers-guide/SKILL.md +146 -0
  135. package/skills/domains/ai-ml/ai-model-benchmarking/SKILL.md +209 -0
  136. package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +159 -0
  137. package/skills/domains/ai-ml/anomaly-detection-papers-guide/SKILL.md +167 -0
  138. package/skills/domains/ai-ml/autonomous-agents-papers-guide/SKILL.md +178 -0
  139. package/skills/domains/ai-ml/dl-transformer-finetune/SKILL.md +239 -0
  140. package/skills/domains/ai-ml/domain-adaptation-papers-guide/SKILL.md +173 -0
  141. package/skills/domains/ai-ml/generative-ai-guide/SKILL.md +146 -0
  142. package/skills/domains/ai-ml/graph-learning-papers-guide/SKILL.md +125 -0
  143. package/skills/domains/ai-ml/huggingface-inference-guide/SKILL.md +196 -0
  144. package/skills/domains/ai-ml/keras-deep-learning/SKILL.md +210 -0
  145. package/skills/domains/ai-ml/kolmogorov-arnold-networks-guide/SKILL.md +185 -0
  146. package/skills/domains/ai-ml/llm-from-scratch-guide/SKILL.md +124 -0
  147. package/skills/domains/ai-ml/ml-pipeline-guide/SKILL.md +295 -0
  148. package/skills/domains/ai-ml/nlp-toolkit-guide/SKILL.md +247 -0
  149. package/skills/domains/ai-ml/npcpy-research-guide/SKILL.md +137 -0
  150. package/skills/domains/ai-ml/pytorch-guide/SKILL.md +281 -0
  151. package/skills/domains/ai-ml/pytorch-lightning-guide/SKILL.md +244 -0
  152. package/skills/domains/ai-ml/responsible-ai-guide/SKILL.md +126 -0
  153. package/skills/domains/ai-ml/tensorflow-guide/SKILL.md +241 -0
  154. package/skills/domains/ai-ml/vmas-simulator-guide/SKILL.md +129 -0
  155. package/skills/domains/biomedical/bioagents-guide/SKILL.md +308 -0
  156. package/skills/domains/biomedical/clawbio-guide/SKILL.md +167 -0
  157. package/skills/domains/biomedical/clinical-dialogue-agents-guide/SKILL.md +145 -0
  158. package/skills/domains/biomedical/ena-sequence-api/SKILL.md +175 -0
  159. package/skills/domains/biomedical/genomas-guide/SKILL.md +126 -0
  160. package/skills/domains/biomedical/genotex-benchmark-guide/SKILL.md +125 -0
  161. package/skills/domains/biomedical/med-researcher-guide/SKILL.md +161 -0
  162. package/skills/domains/biomedical/med-researcher-r1-guide/SKILL.md +146 -0
  163. package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +345 -0
  164. package/skills/domains/biomedical/medical-imaging-guide/SKILL.md +305 -0
  165. package/skills/domains/biomedical/ncbi-blast-api/SKILL.md +195 -0
  166. package/skills/domains/biomedical/ncbi-datasets-api/SKILL.md +220 -0
  167. package/skills/domains/biomedical/quickgo-api/SKILL.md +181 -0
  168. package/skills/domains/business/architecture-design-guide/SKILL.md +279 -0
  169. package/skills/domains/business/innovation-management-guide/SKILL.md +257 -0
  170. package/skills/domains/business/operations-research-guide/SKILL.md +258 -0
  171. package/skills/domains/business/xpert-bi-guide/SKILL.md +84 -0
  172. package/skills/domains/chemistry/cactus-cheminformatics-guide/SKILL.md +89 -0
  173. package/skills/domains/chemistry/chemeagle-guide/SKILL.md +147 -0
  174. package/skills/domains/chemistry/chemgraph-agent-guide/SKILL.md +120 -0
  175. package/skills/domains/chemistry/molecular-dynamics-guide/SKILL.md +237 -0
  176. package/skills/domains/chemistry/pubchem-api-guide/SKILL.md +180 -0
  177. package/skills/domains/chemistry/spectroscopy-analysis-guide/SKILL.md +290 -0
  178. package/skills/domains/cs/ai-security-papers-guide/SKILL.md +103 -0
  179. package/skills/domains/cs/code-llm-papers-guide/SKILL.md +131 -0
  180. package/skills/domains/cs/distributed-systems-guide/SKILL.md +268 -0
  181. package/skills/domains/cs/formal-verification-guide/SKILL.md +298 -0
  182. package/skills/domains/cs/gaussian-splatting-papers-guide/SKILL.md +158 -0
  183. package/skills/domains/cs/llm-aiops-guide/SKILL.md +70 -0
  184. package/skills/domains/cs/software-heritage-api/SKILL.md +200 -0
  185. package/skills/domains/ecology/species-distribution-guide/SKILL.md +343 -0
  186. package/skills/domains/economics/imf-data-api-guide/SKILL.md +174 -0
  187. package/skills/domains/economics/nber-working-papers-api/SKILL.md +177 -0
  188. package/skills/domains/economics/post-labor-economics/SKILL.md +254 -0
  189. package/skills/domains/economics/pricing-psychology-guide/SKILL.md +273 -0
  190. package/skills/domains/economics/repec-economics-api/SKILL.md +188 -0
  191. package/skills/domains/economics/world-bank-data-guide/SKILL.md +179 -0
  192. package/skills/domains/education/academic-study-methods/SKILL.md +228 -0
  193. package/skills/domains/education/assessment-design-guide/SKILL.md +213 -0
  194. package/skills/domains/education/educational-research-methods/SKILL.md +179 -0
  195. package/skills/domains/education/edumcp-guide/SKILL.md +74 -0
  196. package/skills/domains/education/mooc-analytics-guide/SKILL.md +206 -0
  197. package/skills/domains/education/open-syllabus-api/SKILL.md +171 -0
  198. package/skills/domains/finance/akshare-finance-data/SKILL.md +207 -0
  199. package/skills/domains/finance/finsight-research-guide/SKILL.md +113 -0
  200. package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +117 -0
  201. package/skills/domains/finance/portfolio-optimization-guide/SKILL.md +279 -0
  202. package/skills/domains/finance/risk-modeling-guide/SKILL.md +260 -0
  203. package/skills/domains/finance/stata-accounting-research/SKILL.md +372 -0
  204. package/skills/domains/geoscience/climate-modeling-guide/SKILL.md +215 -0
  205. package/skills/domains/geoscience/pangaea-data-api/SKILL.md +197 -0
  206. package/skills/domains/geoscience/satellite-remote-sensing/SKILL.md +193 -0
  207. package/skills/domains/geoscience/seismology-data-guide/SKILL.md +208 -0
  208. package/skills/domains/humanities/digital-humanities-methods/SKILL.md +232 -0
  209. package/skills/domains/humanities/ethical-philosophy-guide/SKILL.md +244 -0
  210. package/skills/domains/humanities/history-research-guide/SKILL.md +260 -0
  211. package/skills/domains/humanities/political-history-guide/SKILL.md +241 -0
  212. package/skills/domains/law/caselaw-access-api/SKILL.md +149 -0
  213. package/skills/domains/law/legal-agent-skills-guide/SKILL.md +132 -0
  214. package/skills/domains/law/legal-nlp-guide/SKILL.md +236 -0
  215. package/skills/domains/law/legal-research-methods/SKILL.md +190 -0
  216. package/skills/domains/law/opencontracts-guide/SKILL.md +168 -0
  217. package/skills/domains/law/patent-analysis-guide/SKILL.md +257 -0
  218. package/skills/domains/law/regulatory-compliance-guide/SKILL.md +267 -0
  219. package/skills/domains/math/lean-theorem-proving-guide/SKILL.md +140 -0
  220. package/skills/domains/math/symbolic-computation-guide/SKILL.md +263 -0
  221. package/skills/domains/math/topology-data-analysis/SKILL.md +305 -0
  222. package/skills/domains/pharma/clinical-trial-design-guide/SKILL.md +271 -0
  223. package/skills/domains/pharma/drug-target-interaction/SKILL.md +242 -0
  224. package/skills/domains/pharma/madd-drug-discovery-guide/SKILL.md +153 -0
  225. package/skills/domains/pharma/pharmacovigilance-guide/SKILL.md +216 -0
  226. package/skills/domains/physics/astrophysics-data-guide/SKILL.md +305 -0
  227. package/skills/domains/physics/particle-physics-guide/SKILL.md +287 -0
  228. package/skills/domains/social-science/ipums-microdata-api/SKILL.md +211 -0
  229. package/skills/domains/social-science/network-analysis-guide/SKILL.md +310 -0
  230. package/skills/domains/social-science/psychology-research-guide/SKILL.md +270 -0
  231. package/skills/domains/social-science/sociology-research-guide/SKILL.md +238 -0
  232. package/skills/domains/social-science/sociology-research-methods/SKILL.md +181 -0
  233. package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +233 -0
  234. package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +120 -0
  235. package/skills/literature/discovery/papers-we-love-guide/SKILL.md +169 -0
  236. package/skills/literature/discovery/semantic-paper-radar/SKILL.md +144 -0
  237. package/skills/literature/discovery/zotero-arxiv-daily-guide/SKILL.md +94 -0
  238. package/skills/literature/fulltext/bioc-pmc-api/SKILL.md +146 -0
  239. package/skills/literature/fulltext/core-api-guide/SKILL.md +144 -0
  240. package/skills/literature/fulltext/dataverse-api/SKILL.md +215 -0
  241. package/skills/literature/fulltext/hal-archive-api/SKILL.md +218 -0
  242. package/skills/literature/fulltext/institutional-repository-guide/SKILL.md +212 -0
  243. package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +341 -0
  244. package/skills/literature/fulltext/osf-api/SKILL.md +212 -0
  245. package/skills/literature/fulltext/pmc-ftp-bulk-download/SKILL.md +182 -0
  246. package/skills/literature/fulltext/zotero-ai-butler-guide/SKILL.md +166 -0
  247. package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +168 -0
  248. package/skills/literature/metadata/academic-paper-summarizer/SKILL.md +101 -0
  249. package/skills/literature/metadata/bibliometrix-guide/SKILL.md +164 -0
  250. package/skills/literature/metadata/crossref-event-data-api/SKILL.md +183 -0
  251. package/skills/literature/metadata/doi-content-negotiation/SKILL.md +202 -0
  252. package/skills/literature/metadata/orkg-api/SKILL.md +153 -0
  253. package/skills/literature/metadata/plumx-metrics-api/SKILL.md +188 -0
  254. package/skills/literature/metadata/ror-organization-api/SKILL.md +208 -0
  255. package/skills/literature/metadata/sophosia-reference-guide/SKILL.md +110 -0
  256. package/skills/literature/metadata/viaf-authority-api/SKILL.md +209 -0
  257. package/skills/literature/metadata/wikidata-api-guide/SKILL.md +156 -0
  258. package/skills/literature/metadata/zoplicate-dedup-guide/SKILL.md +147 -0
  259. package/skills/literature/metadata/zotero-actions-tags-guide/SKILL.md +212 -0
  260. package/skills/literature/metadata/zotmoov-guide/SKILL.md +120 -0
  261. package/skills/literature/metadata/zutilo-guide/SKILL.md +140 -0
  262. package/skills/literature/search/arxiv-batch-reporting/SKILL.md +133 -0
  263. package/skills/literature/search/arxiv-cli-tools/SKILL.md +172 -0
  264. package/skills/literature/search/arxiv-osiris/SKILL.md +199 -0
  265. package/skills/literature/search/arxiv-paper-processor/SKILL.md +141 -0
  266. package/skills/literature/search/baidu-scholar-guide/SKILL.md +110 -0
  267. package/skills/literature/search/base-academic-search/SKILL.md +196 -0
  268. package/skills/literature/search/chatpaper-guide/SKILL.md +122 -0
  269. package/skills/literature/search/citeseerx-api/SKILL.md +183 -0
  270. package/skills/literature/search/deep-literature-search/SKILL.md +149 -0
  271. package/skills/literature/search/deepgit-search-guide/SKILL.md +147 -0
  272. package/skills/literature/search/eric-education-api/SKILL.md +199 -0
  273. package/skills/literature/search/findpapers-guide/SKILL.md +177 -0
  274. package/skills/literature/search/ieee-xplore-api/SKILL.md +177 -0
  275. package/skills/literature/search/lens-scholarly-api/SKILL.md +211 -0
  276. package/skills/literature/search/multi-database-literature-search/SKILL.md +198 -0
  277. package/skills/literature/search/open-library-api/SKILL.md +196 -0
  278. package/skills/literature/search/open-semantic-search-guide/SKILL.md +190 -0
  279. package/skills/literature/search/openaire-api/SKILL.md +141 -0
  280. package/skills/literature/search/paper-search-mcp-guide/SKILL.md +107 -0
  281. package/skills/literature/search/papers-chat-guide/SKILL.md +194 -0
  282. package/skills/literature/search/pasa-paper-search-guide/SKILL.md +138 -0
  283. package/skills/literature/search/plos-open-access-api/SKILL.md +203 -0
  284. package/skills/literature/search/scielo-api/SKILL.md +182 -0
  285. package/skills/literature/search/share-research-api/SKILL.md +129 -0
  286. package/skills/literature/search/worldcat-search-api/SKILL.md +224 -0
  287. package/skills/research/automation/ai-scientist-v2-guide/SKILL.md +284 -0
  288. package/skills/research/automation/aim-experiment-guide/SKILL.md +234 -0
  289. package/skills/research/automation/claude-academic-workflow-guide/SKILL.md +202 -0
  290. package/skills/research/automation/coexist-ai-guide/SKILL.md +149 -0
  291. package/skills/research/automation/datagen-research-guide/SKILL.md +131 -0
  292. package/skills/research/automation/foam-agent-guide/SKILL.md +203 -0
  293. package/skills/research/automation/kedro-pipeline-guide/SKILL.md +216 -0
  294. package/skills/research/automation/mle-agent-guide/SKILL.md +139 -0
  295. package/skills/research/automation/paper-to-agent-guide/SKILL.md +116 -0
  296. package/skills/research/automation/rd-agent-guide/SKILL.md +246 -0
  297. package/skills/research/automation/research-paper-orchestrator/SKILL.md +254 -0
  298. package/skills/research/deep-research/academic-deep-research/SKILL.md +190 -0
  299. package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +141 -0
  300. package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +200 -0
  301. package/skills/research/deep-research/corvus-research-guide/SKILL.md +132 -0
  302. package/skills/research/deep-research/deep-research-pro/SKILL.md +213 -0
  303. package/skills/research/deep-research/deep-research-work/SKILL.md +204 -0
  304. package/skills/research/deep-research/deep-searcher-guide/SKILL.md +253 -0
  305. package/skills/research/deep-research/gpt-researcher-guide/SKILL.md +191 -0
  306. package/skills/research/deep-research/in-depth-research-guide/SKILL.md +205 -0
  307. package/skills/research/deep-research/khoj-research-guide/SKILL.md +200 -0
  308. package/skills/research/deep-research/kosmos-scientist-guide/SKILL.md +185 -0
  309. package/skills/research/deep-research/llm-scientific-discovery-guide/SKILL.md +178 -0
  310. package/skills/research/deep-research/local-deep-research-guide/SKILL.md +253 -0
  311. package/skills/research/deep-research/open-researcher-guide/SKILL.md +138 -0
  312. package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +217 -0
  313. package/skills/research/funding/eu-horizon-guide/SKILL.md +244 -0
  314. package/skills/research/funding/grant-budget-guide/SKILL.md +284 -0
  315. package/skills/research/funding/nih-reporter-api-guide/SKILL.md +166 -0
  316. package/skills/research/funding/nsf-award-api-guide/SKILL.md +133 -0
  317. package/skills/research/methodology/academic-mentor-guide/SKILL.md +169 -0
  318. package/skills/research/methodology/claude-scientific-guide/SKILL.md +122 -0
  319. package/skills/research/methodology/deep-innovator-guide/SKILL.md +242 -0
  320. package/skills/research/methodology/osf-api-guide/SKILL.md +165 -0
  321. package/skills/research/methodology/parsifal-slr-guide/SKILL.md +154 -0
  322. package/skills/research/methodology/research-paper-kb/SKILL.md +263 -0
  323. package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +169 -0
  324. package/skills/research/methodology/research-town-guide/SKILL.md +263 -0
  325. package/skills/research/methodology/slr-automation-guide/SKILL.md +235 -0
  326. package/skills/research/paper-review/automated-review-guide/SKILL.md +281 -0
  327. package/skills/research/paper-review/latte-review-guide/SKILL.md +175 -0
  328. package/skills/research/paper-review/paper-compare-guide/SKILL.md +238 -0
  329. package/skills/research/paper-review/paper-critique-framework/SKILL.md +181 -0
  330. package/skills/research/paper-review/paper-digest-guide/SKILL.md +240 -0
  331. package/skills/research/paper-review/paper-research-assistant/SKILL.md +231 -0
  332. package/skills/research/paper-review/research-quality-filter/SKILL.md +261 -0
  333. package/skills/research/paper-review/review-response-guide/SKILL.md +275 -0
  334. package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +110 -0
  335. package/skills/tools/code-exec/google-colab-guide/SKILL.md +276 -0
  336. package/skills/tools/code-exec/kaggle-api-guide/SKILL.md +216 -0
  337. package/skills/tools/code-exec/overleaf-cli-guide/SKILL.md +279 -0
  338. package/skills/tools/diagram/clawphd-guide/SKILL.md +149 -0
  339. package/skills/tools/diagram/code-flow-visualizer/SKILL.md +197 -0
  340. package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +170 -0
  341. package/skills/tools/diagram/json-data-visualizer/SKILL.md +270 -0
  342. package/skills/tools/diagram/kroki-diagram-api/SKILL.md +198 -0
  343. package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +219 -0
  344. package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +201 -0
  345. package/skills/tools/diagram/tldraw-whiteboard-guide/SKILL.md +397 -0
  346. package/skills/tools/document/docsgpt-guide/SKILL.md +130 -0
  347. package/skills/tools/document/large-document-reader/SKILL.md +202 -0
  348. package/skills/tools/document/md2pdf-xelatex/SKILL.md +212 -0
  349. package/skills/tools/document/openpaper-guide/SKILL.md +232 -0
  350. package/skills/tools/document/paper-parse-guide/SKILL.md +243 -0
  351. package/skills/tools/document/weknora-guide/SKILL.md +216 -0
  352. package/skills/tools/document/zotero-addon-market-guide/SKILL.md +108 -0
  353. package/skills/tools/document/zotero-night-theme-guide/SKILL.md +142 -0
  354. package/skills/tools/document/zotero-style-guide/SKILL.md +217 -0
  355. package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +244 -0
  356. package/skills/tools/knowledge-graph/concept-map-generator/SKILL.md +284 -0
  357. package/skills/tools/knowledge-graph/graphiti-guide/SKILL.md +219 -0
  358. package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +135 -0
  359. package/skills/tools/knowledge-graph/notero-zotero-notion-guide/SKILL.md +187 -0
  360. package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +156 -0
  361. package/skills/tools/knowledge-graph/openspg-guide/SKILL.md +210 -0
  362. package/skills/tools/knowledge-graph/paperpile-notion-guide/SKILL.md +84 -0
  363. package/skills/tools/knowledge-graph/zotero-markdb-connect-guide/SKILL.md +162 -0
  364. package/skills/tools/ocr-translate/latex-translation-guide/SKILL.md +176 -0
  365. package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +198 -0
  366. package/skills/tools/ocr-translate/pdf-math-translate-guide/SKILL.md +141 -0
  367. package/skills/tools/ocr-translate/zotero-pdf-translate-guide/SKILL.md +95 -0
  368. package/skills/tools/ocr-translate/zotero-pdf2zh-guide/SKILL.md +143 -0
  369. package/skills/tools/scraping/dataset-finder-guide/SKILL.md +253 -0
  370. package/skills/tools/scraping/easy-spider-guide/SKILL.md +250 -0
  371. package/skills/tools/scraping/google-scholar-scraper/SKILL.md +255 -0
  372. package/skills/tools/scraping/repository-harvesting-guide/SKILL.md +310 -0
  373. package/skills/writing/citation/academic-citation-manager/SKILL.md +314 -0
  374. package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +182 -0
  375. package/skills/writing/citation/citation-assistant-skill/SKILL.md +192 -0
  376. package/skills/writing/citation/jabref-reference-guide/SKILL.md +127 -0
  377. package/skills/writing/citation/jasminum-zotero-guide/SKILL.md +103 -0
  378. package/skills/writing/citation/mendeley-api/SKILL.md +231 -0
  379. package/skills/writing/citation/obsidian-citation-guide/SKILL.md +164 -0
  380. package/skills/writing/citation/obsidian-zotero-guide/SKILL.md +137 -0
  381. package/skills/writing/citation/onecite-reference-guide/SKILL.md +168 -0
  382. package/skills/writing/citation/papersgpt-zotero-guide/SKILL.md +132 -0
  383. package/skills/writing/citation/papis-cli-guide/SKILL.md +213 -0
  384. package/skills/writing/citation/zotero-better-bibtex-guide/SKILL.md +107 -0
  385. package/skills/writing/citation/zotero-better-notes-guide/SKILL.md +121 -0
  386. package/skills/writing/citation/zotero-gpt-guide/SKILL.md +111 -0
  387. package/skills/writing/citation/zotero-mcp-guide/SKILL.md +164 -0
  388. package/skills/writing/citation/zotero-mdnotes-guide/SKILL.md +162 -0
  389. package/skills/writing/citation/zotero-reference-guide/SKILL.md +139 -0
  390. package/skills/writing/citation/zotero-scholar-guide/SKILL.md +294 -0
  391. package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +140 -0
  392. package/skills/writing/composition/ml-paper-writing/SKILL.md +163 -0
  393. package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +200 -0
  394. package/skills/writing/composition/paper-debugger-guide/SKILL.md +143 -0
  395. package/skills/writing/composition/paperforge-guide/SKILL.md +205 -0
  396. package/skills/writing/composition/research-paper-writer/SKILL.md +226 -0
  397. package/skills/writing/composition/scientific-writing-resources/SKILL.md +151 -0
  398. package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +153 -0
  399. package/skills/writing/latex/academic-writing-latex/SKILL.md +285 -0
  400. package/skills/writing/latex/latex-drawing-collection/SKILL.md +154 -0
  401. package/skills/writing/latex/latex-templates-collection/SKILL.md +159 -0
  402. package/skills/writing/latex/md-to-pdf-academic/SKILL.md +230 -0
  403. package/skills/writing/latex/tex-render-guide/SKILL.md +243 -0
  404. package/skills/writing/polish/academic-tone-guide/SKILL.md +209 -0
  405. package/skills/writing/polish/chinese-text-humanizer/SKILL.md +140 -0
  406. package/skills/writing/polish/conciseness-editing-guide/SKILL.md +225 -0
  407. package/skills/writing/polish/paper-polish-guide/SKILL.md +160 -0
  408. package/skills/writing/templates/arxiv-preprint-template/SKILL.md +184 -0
  409. package/skills/writing/templates/elegant-paper-template/SKILL.md +141 -0
  410. package/skills/writing/templates/graphical-abstract-guide/SKILL.md +183 -0
  411. package/skills/writing/templates/novathesis-guide/SKILL.md +152 -0
  412. package/skills/writing/templates/scientific-article-pdf/SKILL.md +261 -0
  413. package/skills/writing/templates/sjtuthesis-guide/SKILL.md +197 -0
  414. package/skills/writing/templates/thuthesis-guide/SKILL.md +181 -0
  415. package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md +0 -207
@@ -0,0 +1,212 @@
+ ---
+ name: institutional-repository-guide
+ description: "Access papers from institutional and subject repositories at scale"
+ metadata:
+   openclaw:
+     emoji: "🏛️"
+     category: "literature"
+     subcategory: "fulltext"
+     keywords: ["institutional repository", "DSpace", "EPrints", "open access archive", "subject repository", "OpenDOAR"]
+     source: "wentor-research-plugins"
+ ---
+
+ # Institutional Repository Guide
+
+ Institutional repositories (IRs) are university-run digital archives that store and provide open access to their researchers' scholarly output: dissertations, journal articles, conference papers, datasets, and technical reports. Subject repositories like arXiv, bioRxiv, SSRN, and RePEc serve similar functions for specific disciplines. Together, they form a distributed network of open scholarship that complements commercial databases.
+
+ This guide covers how to discover, access, and systematically harvest content from institutional and subject repositories for literature reviews, meta-analyses, and research data collection.
+
19
+ ## Repository Landscape
20
+
21
+ ### Types of Repositories
22
+
23
+ ```
24
+ Institutional Repositories (IR):
25
+ - Run by universities to archive their researchers' output
26
+ - Examples: DSpace, EPrints, Fedora-based systems
27
+ - Discovery: OpenDOAR directory (v2.sherpa.ac.uk/opendoar)
28
+
29
+ Subject Repositories:
30
+ - Discipline-specific archives
31
+ - arXiv (physics, CS, math), bioRxiv, SSRN, RePEc, EarthArXiv
32
+
33
+ Aggregators:
34
+ - Harvest from many repositories into a single search interface
35
+ - BASE (Bielefeld Academic Search Engine)
36
+ - CORE (core.ac.uk, 200M+ open access articles)
37
+ - OpenAIRE (European research output)
38
+ ```
39
+
40
+ ### Discovering Repositories
41
+
42
+ OpenDOAR (Directory of Open Access Repositories) is the primary registry for finding institutional repositories:
43
+
44
+ ```python
+ import json
+ import urllib.parse
+ import urllib.request
+
+ def search_opendoar(subject: str = None, country: str = None) -> list:
+     """
+     Search the OpenDOAR registry for institutional repositories.
+
+     Args:
+         subject: Filter by subject area (e.g., "Biology", "Computer Science")
+         country: ISO country code (e.g., "US", "GB", "CN")
+     """
+     base_url = "https://v2.sherpa.ac.uk/cgi/retrieve"
+     params = {"item-type": "repository", "format": "Json"}
+
+     # Collect all filters in one list so that combining subject and country
+     # does not produce two competing "filter" query parameters.
+     # The [value, field] layout is kept from the original snippet; check it
+     # (and any API key requirement) against the Sherpa API documentation.
+     filters = []
+     if subject:
+         filters.append([subject, "subject"])
+     if country:
+         filters.append([country, "country"])
+     if filters:
+         params["filter"] = json.dumps(filters)
+
+     # urlencode escapes the brackets, quotes, and spaces in the query string
+     url = f"{base_url}?{urllib.parse.urlencode(params)}"
+     with urllib.request.urlopen(url, timeout=30) as response:
+         data = json.loads(response.read())
+
+     repositories = []
+     for item in data.get("items", []):
+         meta = item.get("repository_metadata", {})
+         names = meta.get("name") or [{}]  # guard against an empty name list
+         repositories.append({
+             "name": names[0].get("name", ""),
+             "url": meta.get("url", ""),
+             "oai_url": meta.get("oai_url", ""),
+             "software": meta.get("software", {}).get("name", ""),
+             "type": meta.get("type", ""),
+         })
+
+     return repositories
+ ```
80
+
81
+ ## OAI-PMH Harvesting from Repositories
82
+
83
+ Most institutional repositories support OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), the standard protocol for metadata exchange:
84
+
85
+ ```python
+ import urllib.parse
+ import urllib.request
+ import xml.etree.ElementTree as ET
+
+ def harvest_repository(base_url: str, metadata_prefix: str = "oai_dc",
+                        set_spec: str = None, from_date: str = None) -> list:
+     """
+     Harvest record headers (identifier and datestamp) from a repository's
+     OAI-PMH endpoint.
+
+     Args:
+         base_url: The OAI-PMH base URL
+         metadata_prefix: Metadata format (oai_dc, datacite, mets)
+         set_spec: Optional set/collection to restrict harvesting
+         from_date: Harvest only records whose datestamp is on or after
+             this date (YYYY-MM-DD)
+     """
+     ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
+     params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
+     if set_spec:
+         params["set"] = set_spec
+     if from_date:
+         params["from"] = from_date
+
+     url = f"{base_url}?{urllib.parse.urlencode(params)}"
+     records = []
+
+     while url:
+         with urllib.request.urlopen(url, timeout=60) as response:
+             root = ET.parse(response).getroot()
+
+         for record in root.findall(".//oai:record", ns):
+             header = record.find("oai:header", ns)
+             records.append({
+                 "identifier": header.findtext("oai:identifier", default="", namespaces=ns),
+                 "datestamp": header.findtext("oai:datestamp", default="", namespaces=ns),
+             })
+
+         # A non-empty resumptionToken means more pages follow. The token is
+         # an exclusive argument (it replaces metadataPrefix, set, and from)
+         # and is URL-encoded because it may contain reserved characters.
+         token_elem = root.find(".//oai:resumptionToken", ns)
+         if token_elem is not None and token_elem.text:
+             token = urllib.parse.quote(token_elem.text, safe="")
+             url = f"{base_url}?verb=ListRecords&resumptionToken={token}"
+         else:
+             url = None
+
+     return records
+ ```
129
+
130
+ ### Key OAI-PMH Verbs
131
+
132
+ | Verb | Purpose |
133
+ |------|---------|
134
+ | `Identify` | Get repository name, admin email, policies |
135
+ | `ListSets` | List available collections/sets |
136
+ | `ListMetadataFormats` | List supported metadata schemas |
137
+ | `ListIdentifiers` | Lightweight listing of record headers |
138
+ | `ListRecords` | Full metadata records with pagination |
139
+ | `GetRecord` | Retrieve a single record by identifier |
140
+
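The `Identify` verb from the table above is a cheap way to check an endpoint before a full harvest. The following is a minimal sketch, assuming a standard OAI-PMH 2.0 response; the helper names `build_oai_url` and `parse_identify` and the sample XML are illustrative and do not come from any library.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def build_oai_url(base_url: str, verb: str, **args) -> str:
    """Build an OAI-PMH request URL with URL-encoded arguments."""
    query = {"verb": verb, **args}
    return f"{base_url}?{urllib.parse.urlencode(query)}"

def parse_identify(xml_text: str) -> dict:
    """Extract the fields most useful for a pre-harvest check."""
    root = ET.fromstring(xml_text)
    ident = root.find("oai:Identify", OAI_NS)
    if ident is None:
        raise ValueError("not an Identify response")

    def field(tag: str) -> str:
        return ident.findtext(f"oai:{tag}", default="", namespaces=OAI_NS)

    return {
        "name": field("repositoryName"),
        "admin_email": field("adminEmail"),
        "earliest_datestamp": field("earliestDatestamp"),
        "granularity": field("granularity"),
    }

# Hypothetical response body, trimmed to the elements read above
SAMPLE_IDENTIFY = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2005-01-01</earliestDatestamp>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(build_oai_url("https://example.org/oai", "Identify"))
    print(parse_identify(SAMPLE_IDENTIFY))
```

The `granularity` value tells you whether the `from` argument may include a time component or only a date.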
141
+ ## Major Repository Platforms
142
+
143
+ ### DSpace
144
+
145
+ The most widely deployed open-source repository platform (used by ~40% of repositories worldwide):
146
+
147
+ - OAI-PMH endpoint: `{base-url}/oai/request` (DSpace 7 and later typically expose it under `{base-url}/server/oai/request`)
148
+ - REST API: `{base-url}/server/api`
149
+ - Supports Dublin Core, METS, and custom metadata schemas
150
+ - Examples: MIT DSpace, University of Cambridge Repository
151
+
152
+ ### EPrints
153
+
154
+ Popular in the UK and Europe:
155
+
156
+ - OAI-PMH endpoint: `{base-url}/cgi/oai2`
157
+ - REST API: `{base-url}/cgi/export/{id}/{format}`
158
+ - Strong support for research output types (articles, theses, conference items)
159
+ - Examples: University of Southampton EPrints
160
+
161
+ ### Fedora / Islandora
162
+
163
+ Used by larger institutions with complex digital collections:
164
+
165
+ - Typically paired with a discovery layer (Solr/Blacklight)
166
+ - Strong support for digital preservation workflows
167
+ - Examples: University of Toronto, Smithsonian Institution
168
+
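The endpoint conventions listed for each platform can be turned into a small lookup. This is only a sketch: the `OAI_PATHS` table and the `guess_oai_endpoint` helper are hypothetical names, the paths are the ones given above, and real deployments vary by version and configuration, so any guessed URL should still be confirmed with an `Identify` request.

```python
from typing import Optional

# Default OAI-PMH paths as listed in this guide (assumed, not guaranteed)
OAI_PATHS = {
    "dspace": "/oai/request",
    "eprints": "/cgi/oai2",
}

def guess_oai_endpoint(base_url: str, software: str) -> Optional[str]:
    """Return a likely OAI-PMH endpoint, or None if no convention is known."""
    path = OAI_PATHS.get(software.strip().lower())
    if path is None:
        # e.g. Fedora/Islandora: no default path is given above
        return None
    return base_url.rstrip("/") + path

if __name__ == "__main__":
    print(guess_oai_endpoint("https://repo.example.edu/", "DSpace"))
    print(guess_oai_endpoint("https://eprints.example.ac.uk", "EPrints"))
    print(guess_oai_endpoint("https://collections.example.org", "Fedora"))
```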
169
+ ## Building a Harvesting Pipeline
170
+
171
+ ### Systematic Collection Workflow
172
+
173
+ ```
174
+ 1. Identify target repositories
175
+ - Use OpenDOAR to find IRs by subject or country
176
+ - List subject repositories relevant to your discipline
177
+
178
+ 2. Test endpoints
179
+ - Send Identify request to verify the endpoint is active
180
+ - Check ListMetadataFormats for available schemas
181
+
182
+ 3. Harvest incrementally
183
+ - Use "from" parameter to harvest only new records
184
+ - Store last harvest date for each repository
185
+ - Respect rate limits (typically 1 request per second)
186
+
187
+ 4. Deduplicate
188
+ - Match records by DOI when available
189
+ - Use title + author fuzzy matching for records without DOIs
190
+ - Flag duplicates rather than deleting (keep provenance)
191
+
192
+ 5. Store and index
193
+ - Save metadata in structured format (JSON, SQLite, CSV)
194
+ - Build a local search index for efficient retrieval
195
+ ```
196
+
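Step 4 of the workflow above (deduplicate, but flag rather than delete) can be sketched with the standard library. The record layout (`doi`, `title`, `author` keys), the `flag_duplicates` name, and the 0.9 similarity threshold are assumptions made for illustration; tune the threshold on a hand-checked sample.

```python
import re
from difflib import SequenceMatcher

def _match_key(rec: dict) -> str:
    """Lowercased title + author with punctuation collapsed to spaces."""
    text = f"{rec.get('title', '')} {rec.get('author', '')}".lower()
    return re.sub(r"[^a-z0-9]+", " ", text).strip()

def flag_duplicates(records: list, threshold: float = 0.9) -> list:
    """Return copies of the records with a 'duplicate_of' index (or None)."""
    seen_dois = {}   # normalized DOI -> index of first occurrence
    originals = []   # (index, match key) of records kept as originals
    flagged = []
    for idx, rec in enumerate(records):
        out = dict(rec, duplicate_of=None)
        doi = (rec.get("doi") or "").strip().lower()
        key = _match_key(rec)
        if doi and doi in seen_dois:
            # Exact DOI match (case-insensitive)
            out["duplicate_of"] = seen_dois[doi]
        elif not doi:
            # No DOI: fall back to fuzzy title + author comparison
            for prev_idx, prev_key in originals:
                if SequenceMatcher(None, key, prev_key).ratio() >= threshold:
                    out["duplicate_of"] = prev_idx
                    break
        if out["duplicate_of"] is None:
            originals.append((idx, key))
            if doi:
                seen_dois[doi] = idx
        flagged.append(out)
    return flagged

if __name__ == "__main__":
    sample = [
        {"doi": "10.1000/ABC", "title": "Deep Learning for Genomics", "author": "Smith"},
        {"doi": "10.1000/abc", "title": "Deep learning for genomics.", "author": "Smith"},
        {"doi": "", "title": "Deep Learning for Genomics", "author": "Smith"},
        {"doi": "", "title": "Protein folding kinetics", "author": "Lee"},
    ]
    for rec in flag_duplicates(sample):
        print(rec["title"], "->", rec["duplicate_of"])
```

The fuzzy pass compares each DOI-less record against every earlier original, which is quadratic; for large harvests a blocking step (for example grouping by publication year) would be needed.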
197
+ ## Ethical Considerations
198
+
199
+ - Always respect `robots.txt` and repository rate limits
200
+ - Metadata harvesting is generally permitted; bulk full-text download may require permission
201
+ - Check each repository's terms of use before harvesting
202
+ - Use harvested data for research purposes, not commercial redistribution
203
+ - Attribute the source repository in publications using harvested data
204
+ - Consider reaching out to repository administrators for large-scale harvesting projects
205
+
206
+ ## References
207
+
208
+ - OpenDOAR: https://v2.sherpa.ac.uk/opendoar/
209
+ - OAI-PMH specification: http://www.openarchives.org/OAI/openarchivesprotocol.html
210
+ - CORE: https://core.ac.uk
211
+ - BASE: https://www.base-search.net
212
+ - DSpace documentation: https://wiki.lyrasis.org/display/DSPACE
@@ -0,0 +1,341 @@
1
+ ---
2
+ name: open-access-mining-guide
3
+ description: "Mine open access full-text repositories for research data extraction"
4
+ metadata:
5
+ openclaw:
6
+ emoji: "unlock"
7
+ category: "literature"
8
+ subcategory: "fulltext"
9
+ keywords: ["open access", "text mining", "full text", "PubMed Central", "CORE", "content mining", "TDM"]
10
+ source: "wentor-research-plugins"
11
+ ---
12
+
13
+ # Open Access Mining Guide
14
+
15
+ A skill for systematically mining open access full-text repositories to extract structured research data at scale. Covers legal frameworks for text and data mining (TDM), major open access repositories and their APIs, full-text retrieval and parsing, section-level extraction, entity recognition in scientific text, and building reproducible mining pipelines.
16
+
17
+ ## Legal Framework for Text and Data Mining
18
+
19
+ ### Rights and Regulations
20
+
21
+ Text and data mining of published literature operates within a specific legal framework that varies by jurisdiction. Understanding these rules is essential before starting any mining project.
22
+
23
+ ```
+ Legal landscape for TDM:
+
+ EU Directive 2019/790 (DSM Directive):
+ - Article 3: TDM exception for research organizations
+   - Lawful access required (institutional subscription counts)
+   - Must be for scientific research purposes
+   - No opt-out possible for publishers
+   - Applies to EU/EEA research institutions
+ - Article 4: General TDM exception
+   - Available to anyone with lawful access
+   - Publishers CAN opt out (via robots.txt or metadata)
+
+ UK: TDM exception for non-commercial research (CDPA s.29A)
+
+ US: No specific TDM law; relies on fair use doctrine
+ - Transformative use generally favored by courts
+ - Google Books case (2015) supports large-scale text analysis
+ - But: database protection via Terms of Service
+
+ Practical guidelines:
+ - Mine open access content (CC-BY, CC-BY-SA) freely
+ - Mine subscription content under institutional license
+ - Check publisher TDM policies (Elsevier, Springer, Wiley
+   all have TDM APIs for licensed content)
+ - Never redistribute full text; share derived data only
+ - Credit the data source in publications
+ ```
51
+
52
+ ## Major Open Access Repositories
53
+
54
+ ### Repository Comparison
55
+
56
+ ```
57
+ Repository overview for full-text mining:
58
+
59
+ PubMed Central (PMC):
60
+ - Coverage: 8M+ full-text articles (biomedical/life sciences)
61
+ - Access: Free, OA subset freely downloadable
62
+ - Formats: XML (JATS), PDF
63
+ - API: E-utilities (Entrez), bulk FTP download
64
+ - License: varies by article (check individual licenses)
65
+ - Best for: biomedical systematic reviews, meta-analyses
66
+ - Bulk download: ftp.ncbi.nlm.nih.gov/pub/pmc/
67
+
68
+ Europe PMC:
69
+ - Coverage: PMC content + European-funded research
70
+ - Access: Free, REST API
71
+ - Formats: XML, JSON
72
+ - API: europepmc.org/RestfulWebService
73
+ - Annotations: sentence-level annotations, concepts, data links
74
+ - Best for: European research, annotated text mining
75
+
76
+ CORE (core.ac.uk):
77
+ - Coverage: 200M+ metadata records, 36M+ full texts
78
+ - Access: Free API (registration required)
79
+ - Formats: JSON, full text as extracted plain text
80
+ - Sources: aggregates from 10,000+ repositories worldwide
81
+ - Best for: cross-disciplinary mining, thesis/dissertation text
82
+
83
+ arXiv:
84
+ - Coverage: 2M+ preprints (physics, math, CS, etc.)
85
+ - Access: Free bulk download, API
86
+ - Formats: LaTeX source, PDF
87
+ - Bulk: Kaggle dataset, S3 requester-pays bucket
88
+ - Best for: STEM preprint analysis, citation studies
89
+
90
+ Unpaywall / OpenAlex:
91
+ - Coverage: tracks OA status of 200M+ works
92
+ - Access: Free API, database dump
93
+ - Use: Find OA versions of any DOI
94
+ - Best for: Locating freely available versions of papers
95
+
96
+ Semantic Scholar:
97
+ - Coverage: 200M+ papers, abstracts + some full text
98
+ - Access: Free API, bulk datasets
99
+ - Features: TLDR summaries, citation intents, S2ORC corpus
100
+ - Best for: NLP research on scientific text
101
+ ```
102
+
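The Unpaywall entry above ("Find OA versions of any DOI") can be illustrated with a short lookup sketch. It assumes the Unpaywall v2 REST endpoint and the `is_oa` / `best_oa_location` response fields as commonly documented; verify both against the current Unpaywall documentation. The helper names and the sample record are hypothetical.

```python
import urllib.parse

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"

def unpaywall_url(doi: str, email: str) -> str:
    """Build the lookup URL; Unpaywall asks for a contact email per request."""
    quoted_doi = urllib.parse.quote(doi, safe="/")
    return f"{UNPAYWALL_BASE}{quoted_doi}?{urllib.parse.urlencode({'email': email})}"

def best_oa_link(record: dict):
    """Pick a full-text link from a decoded Unpaywall record, or None."""
    if not record.get("is_oa"):
        return None
    location = record.get("best_oa_location") or {}
    # Prefer a direct PDF link, fall back to the landing page
    return location.get("url_for_pdf") or location.get("url")

if __name__ == "__main__":
    print(unpaywall_url("10.1234/abc", "me@example.org"))
    # Hypothetical record trimmed to the fields read above
    sample = {
        "is_oa": True,
        "best_oa_location": {
            "url": "https://repo.example.org/item/1",
            "url_for_pdf": "https://repo.example.org/item/1.pdf",
            "license": "cc-by",
        },
    }
    print(best_oa_link(sample))
```

Keeping the `license` field alongside the link makes it easier to restrict mining to content whose terms allow it.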
103
+ ## Full-Text Retrieval and Parsing
104
+
105
+ ### Retrieving from PubMed Central
106
+
107
+ ```python
+ import requests
+ import xml.etree.ElementTree as ET
+ import time
+
+ def fetch_pmc_fulltext(pmc_id):
+     """
+     Fetch full-text XML from PubMed Central via E-utilities.
+
+     Args:
+         pmc_id: PMC identifier (e.g., "PMC7096724")
+
+     Returns:
+         Parsed article as structured dictionary
+     """
+     base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
+     params = {
+         "db": "pmc",
+         "id": pmc_id.replace("PMC", ""),
+         "retmode": "xml",
+     }
+
+     response = requests.get(base_url, params=params, timeout=30)
+     response.raise_for_status()
+
+     root = ET.fromstring(response.content)
+     return parse_jats_xml(root)
+
+
+ def parse_jats_xml(root):
+     """
+     Parse JATS XML (Journal Article Tag Suite) into structured data.
+     JATS is the standard XML format for PMC articles.
+     """
+     article = {}
+
+     # Title
+     title_elem = root.find(".//article-title")
+     article["title"] = "".join(title_elem.itertext()) if title_elem is not None else ""
+
+     # Abstract
+     abstract_elem = root.find(".//abstract")
+     if abstract_elem is not None:
+         article["abstract"] = "".join(abstract_elem.itertext()).strip()
+
+     # Body sections
+     body = root.find(".//body")
+     if body is not None:
+         article["sections"] = extract_sections(body)
+
+     # References
+     ref_list = root.find(".//ref-list")
+     if ref_list is not None:
+         article["references"] = extract_references(ref_list)
+
+     return article
+
+
+ def extract_sections(body_element):
+     """
+     Extract sections with their titles and text content.
+     Every <sec>, including nested subsections, becomes one flat entry
+     that holds only its own direct paragraphs.
+     """
+     sections = []
+     for sec in body_element.findall(".//sec"):
+         title_elem = sec.find("title")
+         # itertext() keeps titles that contain inline markup such as <italic>
+         title = "".join(title_elem.itertext()).strip() if title_elem is not None else "Untitled"
+         paragraphs = []
+         for p in sec.findall("p"):
+             text = "".join(p.itertext()).strip()
+             if text:
+                 paragraphs.append(text)
+
+         sections.append({
+             "title": title,
+             "text": "\n".join(paragraphs),
+             "id": sec.get("id", ""),
+         })
+
+     return sections
+
+
+ def extract_references(ref_list_element):
+     """
+     Collect the flattened text of each <ref> entry.
+     A minimal helper; adapt it if you need structured citation fields.
+     """
+     references = []
+     for ref in ref_list_element.findall(".//ref"):
+         text = " ".join("".join(ref.itertext()).split())
+         if text:
+             references.append({"id": ref.get("id", ""), "text": text})
+     return references
+ ```
190
+
191
+ ### Batch Processing Pipeline
192
+
193
+ ```python
+ import json
+ import os
+ import time
+
+ def batch_mine_pmc(pmc_ids, output_dir, delay=0.4):
+     """
+     Mine multiple PMC articles with rate limiting.
+
+     NCBI E-utilities rate limit:
+     - Without API key: 3 requests/second
+     - With API key: 10 requests/second
+     - Register for API key at ncbi.nlm.nih.gov/account/
+     """
+     os.makedirs(output_dir, exist_ok=True)
+     results = []
+     errors = []
+
+     for i, pmc_id in enumerate(pmc_ids):
+         try:
+             article = fetch_pmc_fulltext(pmc_id)
+             results.append(article)
+
+             # Save individual article
+             output_path = os.path.join(output_dir, f"{pmc_id}.json")
+             with open(output_path, "w", encoding="utf-8") as f:
+                 json.dump(article, f, indent=2, ensure_ascii=False)
+
+             if (i + 1) % 100 == 0:
+                 print(f"Processed {i + 1}/{len(pmc_ids)} articles")
+
+         except Exception as e:
+             errors.append({"pmc_id": pmc_id, "error": str(e)})
+
+         # Rate limiting: the default 0.4 s pause stays under 3 requests/second
+         time.sleep(delay)
+
+     print(f"Successfully mined {len(results)} articles, "
+           f"{len(errors)} errors")
+     return results, errors
+ ```
232
+
233
+ ## Information Extraction from Full Text
234
+
235
+ ### Section-Level Extraction
236
+
237
+ ```
238
+ Targeted extraction by paper section:
239
+
240
+ Introduction:
241
+ - Research questions and hypotheses
242
+ - Knowledge gaps identified
243
+ - Theoretical framework references
244
+
245
+ Methods:
246
+ - Study design (RCT, cohort, case-control, etc.)
247
+ - Sample size and population characteristics
248
+ - Measurement instruments and their validity
249
+ - Statistical analysis methods
250
+ - Software and versions used
251
+
252
+ Results:
253
+ - Effect sizes with confidence intervals
254
+ - P-values and test statistics
255
+ - Participant flow (enrollment, dropout, analysis)
256
+ - Tables and figures (structured data)
257
+
258
+ Discussion:
259
+ - Key findings summarized
260
+ - Comparison with prior work
261
+ - Limitations acknowledged
262
+ - Future directions proposed
263
+ - Clinical/practical implications
264
+ ```
265
+
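Some of the Results-section targets listed above (p-values, sample sizes) can be pulled out with regular expressions as a first pass. The patterns and the `extract_statistics` name below are illustrative assumptions; they cover only common notations such as `p < 0.001` or `N = 1,204`, and anything they capture still needs manual validation.

```python
import re

# "p < 0.05", "P = .27", "p≤0.001"
P_VALUE_RE = re.compile(r"\bp\s*([<>=≤≥])\s*(0?\.\d+)", re.IGNORECASE)
# "n = 85", "N=1,204"
SAMPLE_SIZE_RE = re.compile(r"\bn\s*=\s*(\d+(?:,\d{3})*)", re.IGNORECASE)

def extract_statistics(text: str) -> dict:
    """Return reported p-values and sample sizes found in a text span."""
    p_values = [
        {"operator": op, "value": float(val)}
        for op, val in P_VALUE_RE.findall(text)
    ]
    sample_sizes = [
        int(n.replace(",", "")) for n in SAMPLE_SIZE_RE.findall(text)
    ]
    return {"p_values": p_values, "sample_sizes": sample_sizes}

if __name__ == "__main__":
    sentence = (
        "We enrolled participants (N = 1,204). The effect was significant "
        "(p < 0.001), but not in the subgroup (n=85, P = .27)."
    )
    print(extract_statistics(sentence))
```

Applied to the `text` of a Methods or Results section from `extract_sections`, this keeps the provenance (article and section) attached to each extracted number.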
266
+ ### Named Entity Recognition for Science
267
+
268
+ ```python
+ import scispacy
+ import spacy
+
+ def extract_scientific_entities(text):
+     """
+     Extract scientific named entities from full text.
+
+     For biomedical text, use specialized NER models:
+     - SciSpaCy: biomedical NER (diseases, chemicals, genes)
+     - BioBERT: contextual biomedical NER
+     - PubTator: NCBI's annotation service
+     """
+     # The scispaCy model package must be installed separately
+     nlp = spacy.load("en_ner_bionlp13cg_md")
+     doc = nlp(text)
+
+     entities = []
+     for ent in doc.ents:
+         entities.append({
+             "text": ent.text,
+             "label": ent.label_,
+             "start": ent.start_char,
+             "end": ent.end_char,
+         })
+
+     return entities
+ ```
295
+
296
+ ## Building Reproducible Pipelines
297
+
298
+ ### Pipeline Architecture
299
+
300
+ ```
301
+ Recommended pipeline structure:
302
+
303
+ 1. Query definition:
304
+ - Define search terms, date ranges, inclusion criteria
305
+ - Document in a protocol file (version-controlled)
306
+
307
+ 2. Article retrieval:
308
+ - Search API for matching articles
309
+ - Download full text (XML/PDF)
310
+ - Store raw data with metadata
311
+
312
+ 3. Text extraction:
313
+ - Parse XML or extract text from PDF
314
+ - Section segmentation
315
+ - Table and figure extraction (if needed)
316
+
317
+ 4. Information extraction:
318
+ - NER for entities of interest
319
+ - Relation extraction
320
+ - Numeric data extraction (effect sizes, p-values)
321
+
322
+ 5. Quality control:
323
+ - Sample-based manual validation (10-20% of results)
324
+ - Inter-annotator agreement on validation sample
325
+ - Error analysis and pipeline refinement
326
+
327
+ 6. Data export:
328
+ - Structured output (CSV, JSON, database)
329
+ - Provenance tracking (which article, which section)
330
+ - Ready for downstream analysis
331
+
332
+ Best practices:
333
+ - Version control the entire pipeline code
334
+ - Log all API queries and responses
335
+ - Set random seeds for any sampling steps
336
+ - Share the pipeline code in supplementary materials
337
+ - Use DOIs or PMCIDs as stable article identifiers
338
+ - Cache downloaded articles to avoid re-fetching
339
+ ```
340
+
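The quality-control and seeding advice above (validate a 10-20% sample, set random seeds) can be combined in one small helper. This is a sketch under stated assumptions: `draw_validation_sample` is a hypothetical name, and the default fraction and seed are placeholders to record in your protocol file.

```python
import random

def draw_validation_sample(article_ids, fraction=0.1, seed=42):
    """Draw a reproducible random sample of article IDs for manual review."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Sort first so the sample does not depend on input order
    ids = sorted(set(article_ids))
    if not ids:
        return []
    k = max(1, round(len(ids) * fraction))
    # A dedicated Random instance avoids touching the global random state
    return random.Random(seed).sample(ids, k)

if __name__ == "__main__":
    ids = [f"PMC{i}" for i in range(200)]
    sample = draw_validation_sample(ids, fraction=0.1, seed=7)
    print(len(sample), sample[:5])
```

Because the IDs are sorted and the generator is seeded locally, rerunning the pipeline on the same article set selects the same validation sample.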
341
+ Open access full-text mining enables research at a scale impossible with manual reading. A single researcher can systematically extract data from thousands of papers, enabling comprehensive evidence synthesis, trend analysis, and hypothesis generation. The key requirements are respecting legal and ethical boundaries, building robust parsing pipelines, and rigorously validating extracted data against manual review.