@wentorai/research-plugins 1.0.0 → 1.2.0
- package/README.md +22 -22
- package/curated/analysis/README.md +82 -56
- package/curated/domains/README.md +225 -69
- package/curated/literature/README.md +115 -46
- package/curated/research/README.md +106 -58
- package/curated/tools/README.md +107 -87
- package/curated/writing/README.md +92 -45
- package/mcp-configs/academic-db/alphafold-mcp.json +20 -0
- package/mcp-configs/academic-db/brightspace-mcp.json +21 -0
- package/mcp-configs/academic-db/climatiq-mcp.json +20 -0
- package/mcp-configs/academic-db/gibs-mcp.json +20 -0
- package/mcp-configs/academic-db/gis-mcp-server.json +22 -0
- package/mcp-configs/academic-db/google-earth-engine-mcp.json +21 -0
- package/mcp-configs/academic-db/m4-clinical-mcp.json +21 -0
- package/mcp-configs/academic-db/medical-mcp.json +21 -0
- package/mcp-configs/academic-db/nexonco-mcp.json +20 -0
- package/mcp-configs/academic-db/omop-mcp.json +20 -0
- package/mcp-configs/academic-db/onekgpd-mcp.json +20 -0
- package/mcp-configs/academic-db/openedu-mcp.json +20 -0
- package/mcp-configs/academic-db/opengenes-mcp.json +20 -0
- package/mcp-configs/academic-db/openstax-mcp.json +21 -0
- package/mcp-configs/academic-db/openstreetmap-mcp.json +21 -0
- package/mcp-configs/academic-db/opentargets-mcp.json +21 -0
- package/mcp-configs/academic-db/pdb-mcp.json +21 -0
- package/mcp-configs/academic-db/smithsonian-mcp.json +20 -0
- package/mcp-configs/ai-platform/magi-researchers.json +21 -0
- package/mcp-configs/ai-platform/mcp-academic-researcher.json +22 -0
- package/mcp-configs/ai-platform/open-paper-machine.json +21 -0
- package/mcp-configs/ai-platform/paper-intelligence.json +21 -0
- package/mcp-configs/ai-platform/paper-reader.json +21 -0
- package/mcp-configs/ai-platform/paperdebugger.json +21 -0
- package/mcp-configs/browser/exa-mcp.json +20 -0
- package/mcp-configs/browser/mcp-searxng.json +21 -0
- package/mcp-configs/browser/mcp-webresearch.json +20 -0
- package/mcp-configs/cloud-docs/confluence-mcp.json +37 -0
- package/mcp-configs/cloud-docs/google-drive-mcp.json +35 -0
- package/mcp-configs/cloud-docs/notion-mcp.json +29 -0
- package/mcp-configs/communication/discord-mcp.json +29 -0
- package/mcp-configs/communication/discourse-mcp.json +21 -0
- package/mcp-configs/communication/slack-mcp.json +29 -0
- package/mcp-configs/communication/telegram-mcp.json +28 -0
- package/mcp-configs/data-platform/automl-stat-mcp.json +21 -0
- package/mcp-configs/data-platform/jefferson-stats-mcp.json +22 -0
- package/mcp-configs/data-platform/mcp-excel-server.json +21 -0
- package/mcp-configs/data-platform/mcp-stata.json +21 -0
- package/mcp-configs/data-platform/mcpstack-jupyter.json +21 -0
- package/mcp-configs/data-platform/ml-mcp.json +21 -0
- package/mcp-configs/data-platform/nasdaq-data-link-mcp.json +20 -0
- package/mcp-configs/data-platform/numpy-mcp.json +21 -0
- package/mcp-configs/database/neo4j-mcp.json +37 -0
- package/mcp-configs/database/postgres-mcp.json +28 -0
- package/mcp-configs/database/sqlite-mcp.json +29 -0
- package/mcp-configs/dev-platform/geogebra-mcp.json +21 -0
- package/mcp-configs/dev-platform/github-mcp.json +31 -0
- package/mcp-configs/dev-platform/gitlab-mcp.json +34 -0
- package/mcp-configs/dev-platform/latex-mcp-server.json +21 -0
- package/mcp-configs/dev-platform/manim-mcp.json +20 -0
- package/mcp-configs/dev-platform/mcp-echarts.json +20 -0
- package/mcp-configs/dev-platform/panel-viz-mcp.json +20 -0
- package/mcp-configs/dev-platform/paperbanana.json +20 -0
- package/mcp-configs/dev-platform/texflow-mcp.json +20 -0
- package/mcp-configs/dev-platform/texmcp.json +20 -0
- package/mcp-configs/dev-platform/typst-mcp.json +21 -0
- package/mcp-configs/dev-platform/vizro-mcp.json +20 -0
- package/mcp-configs/email/email-mcp.json +40 -0
- package/mcp-configs/email/gmail-mcp.json +37 -0
- package/mcp-configs/note-knowledge/local-faiss-mcp.json +21 -0
- package/mcp-configs/note-knowledge/mcp-memory-service.json +21 -0
- package/mcp-configs/note-knowledge/mcp-obsidian.json +23 -0
- package/mcp-configs/note-knowledge/mcp-ragdocs.json +20 -0
- package/mcp-configs/note-knowledge/mcp-summarizer.json +21 -0
- package/mcp-configs/note-knowledge/mediawiki-mcp.json +21 -0
- package/mcp-configs/note-knowledge/openzim-mcp.json +20 -0
- package/mcp-configs/note-knowledge/zettelkasten-mcp.json +21 -0
- package/mcp-configs/reference-mgr/academic-paper-mcp-http.json +20 -0
- package/mcp-configs/reference-mgr/academix.json +20 -0
- package/mcp-configs/reference-mgr/arxiv-research-mcp.json +21 -0
- package/mcp-configs/reference-mgr/google-scholar-abstract-mcp.json +19 -0
- package/mcp-configs/reference-mgr/google-scholar-mcp.json +20 -0
- package/mcp-configs/reference-mgr/mcp-paperswithcode.json +21 -0
- package/mcp-configs/reference-mgr/mcp-scholarly.json +20 -0
- package/mcp-configs/reference-mgr/mcp-simple-arxiv.json +20 -0
- package/mcp-configs/reference-mgr/mcp-simple-pubmed.json +20 -0
- package/mcp-configs/reference-mgr/mcp-zotero.json +21 -0
- package/mcp-configs/reference-mgr/mendeley-mcp.json +20 -0
- package/mcp-configs/reference-mgr/ncbi-mcp-server.json +22 -0
- package/mcp-configs/reference-mgr/onecite.json +21 -0
- package/mcp-configs/reference-mgr/paper-search-mcp.json +21 -0
- package/mcp-configs/reference-mgr/pubmed-search-mcp.json +21 -0
- package/mcp-configs/reference-mgr/scholar-mcp.json +21 -0
- package/mcp-configs/reference-mgr/scholar-multi-mcp.json +21 -0
- package/mcp-configs/reference-mgr/seerai.json +21 -0
- package/mcp-configs/reference-mgr/semantic-scholar-fastmcp.json +21 -0
- package/mcp-configs/reference-mgr/sourcelibrary.json +20 -0
- package/mcp-configs/registry.json +178 -149
- package/mcp-configs/repository/dataverse-mcp.json +33 -0
- package/mcp-configs/repository/huggingface-mcp.json +29 -0
- package/openclaw.plugin.json +2 -2
- package/package.json +2 -2
- package/skills/analysis/dataviz/algorithm-visualizer-guide/SKILL.md +259 -0
- package/skills/analysis/dataviz/bokeh-visualization-guide/SKILL.md +270 -0
- package/skills/analysis/dataviz/chart-image-generator/SKILL.md +229 -0
- package/skills/analysis/dataviz/citation-map-guide/SKILL.md +184 -0
- package/skills/analysis/dataviz/d3-visualization-guide/SKILL.md +281 -0
- package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +171 -0
- package/skills/analysis/dataviz/echarts-visualization-guide/SKILL.md +250 -0
- package/skills/analysis/dataviz/metabase-analytics-guide/SKILL.md +242 -0
- package/skills/analysis/dataviz/plotly-interactive-guide/SKILL.md +266 -0
- package/skills/analysis/dataviz/redash-analytics-guide/SKILL.md +284 -0
- package/skills/analysis/econometrics/econml-causal-guide/SKILL.md +163 -0
- package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +192 -0
- package/skills/analysis/econometrics/mostly-harmless-guide/SKILL.md +139 -0
- package/skills/analysis/econometrics/panel-data-analyst/SKILL.md +259 -0
- package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +267 -0
- package/skills/analysis/econometrics/python-causality-guide/SKILL.md +134 -0
- package/skills/analysis/econometrics/stata-accounting-guide/SKILL.md +269 -0
- package/skills/analysis/econometrics/stata-analyst-guide/SKILL.md +245 -0
- package/skills/analysis/econometrics/stata-reference-guide/SKILL.md +293 -0
- package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +157 -0
- package/skills/analysis/statistics/general-statistics-guide/SKILL.md +226 -0
- package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +106 -0
- package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +212 -0
- package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +192 -0
- package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +193 -0
- package/skills/analysis/statistics/senior-data-scientist-guide/SKILL.md +223 -0
- package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +100 -0
- package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +170 -0
- package/skills/analysis/wrangling/data-cleaning-pipeline/SKILL.md +266 -0
- package/skills/analysis/wrangling/data-cog-guide/SKILL.md +178 -0
- package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +197 -0
- package/skills/analysis/wrangling/stata-data-cleaning/SKILL.md +276 -0
- package/skills/analysis/wrangling/streamline-analyst-guide/SKILL.md +119 -0
- package/skills/analysis/wrangling/survey-data-processing/SKILL.md +298 -0
- package/skills/domains/ai-ml/ai-agent-papers-guide/SKILL.md +146 -0
- package/skills/domains/ai-ml/ai-model-benchmarking/SKILL.md +209 -0
- package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +159 -0
- package/skills/domains/ai-ml/anomaly-detection-papers-guide/SKILL.md +167 -0
- package/skills/domains/ai-ml/autonomous-agents-papers-guide/SKILL.md +178 -0
- package/skills/domains/ai-ml/dl-transformer-finetune/SKILL.md +239 -0
- package/skills/domains/ai-ml/domain-adaptation-papers-guide/SKILL.md +173 -0
- package/skills/domains/ai-ml/generative-ai-guide/SKILL.md +146 -0
- package/skills/domains/ai-ml/graph-learning-papers-guide/SKILL.md +125 -0
- package/skills/domains/ai-ml/huggingface-inference-guide/SKILL.md +196 -0
- package/skills/domains/ai-ml/keras-deep-learning/SKILL.md +210 -0
- package/skills/domains/ai-ml/kolmogorov-arnold-networks-guide/SKILL.md +185 -0
- package/skills/domains/ai-ml/llm-from-scratch-guide/SKILL.md +124 -0
- package/skills/domains/ai-ml/ml-pipeline-guide/SKILL.md +295 -0
- package/skills/domains/ai-ml/nlp-toolkit-guide/SKILL.md +247 -0
- package/skills/domains/ai-ml/npcpy-research-guide/SKILL.md +137 -0
- package/skills/domains/ai-ml/pytorch-guide/SKILL.md +281 -0
- package/skills/domains/ai-ml/pytorch-lightning-guide/SKILL.md +244 -0
- package/skills/domains/ai-ml/responsible-ai-guide/SKILL.md +126 -0
- package/skills/domains/ai-ml/tensorflow-guide/SKILL.md +241 -0
- package/skills/domains/ai-ml/vmas-simulator-guide/SKILL.md +129 -0
- package/skills/domains/biomedical/bioagents-guide/SKILL.md +308 -0
- package/skills/domains/biomedical/clawbio-guide/SKILL.md +167 -0
- package/skills/domains/biomedical/clinical-dialogue-agents-guide/SKILL.md +145 -0
- package/skills/domains/biomedical/ena-sequence-api/SKILL.md +175 -0
- package/skills/domains/biomedical/genomas-guide/SKILL.md +126 -0
- package/skills/domains/biomedical/genotex-benchmark-guide/SKILL.md +125 -0
- package/skills/domains/biomedical/med-researcher-guide/SKILL.md +161 -0
- package/skills/domains/biomedical/med-researcher-r1-guide/SKILL.md +146 -0
- package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +345 -0
- package/skills/domains/biomedical/medical-imaging-guide/SKILL.md +305 -0
- package/skills/domains/biomedical/ncbi-blast-api/SKILL.md +195 -0
- package/skills/domains/biomedical/ncbi-datasets-api/SKILL.md +220 -0
- package/skills/domains/biomedical/quickgo-api/SKILL.md +181 -0
- package/skills/domains/business/architecture-design-guide/SKILL.md +279 -0
- package/skills/domains/business/innovation-management-guide/SKILL.md +257 -0
- package/skills/domains/business/operations-research-guide/SKILL.md +258 -0
- package/skills/domains/business/xpert-bi-guide/SKILL.md +84 -0
- package/skills/domains/chemistry/cactus-cheminformatics-guide/SKILL.md +89 -0
- package/skills/domains/chemistry/chemeagle-guide/SKILL.md +147 -0
- package/skills/domains/chemistry/chemgraph-agent-guide/SKILL.md +120 -0
- package/skills/domains/chemistry/molecular-dynamics-guide/SKILL.md +237 -0
- package/skills/domains/chemistry/pubchem-api-guide/SKILL.md +180 -0
- package/skills/domains/chemistry/spectroscopy-analysis-guide/SKILL.md +290 -0
- package/skills/domains/cs/ai-security-papers-guide/SKILL.md +103 -0
- package/skills/domains/cs/code-llm-papers-guide/SKILL.md +131 -0
- package/skills/domains/cs/distributed-systems-guide/SKILL.md +268 -0
- package/skills/domains/cs/formal-verification-guide/SKILL.md +298 -0
- package/skills/domains/cs/gaussian-splatting-papers-guide/SKILL.md +158 -0
- package/skills/domains/cs/llm-aiops-guide/SKILL.md +70 -0
- package/skills/domains/cs/software-heritage-api/SKILL.md +200 -0
- package/skills/domains/ecology/species-distribution-guide/SKILL.md +343 -0
- package/skills/domains/economics/imf-data-api-guide/SKILL.md +174 -0
- package/skills/domains/economics/nber-working-papers-api/SKILL.md +177 -0
- package/skills/domains/economics/post-labor-economics/SKILL.md +254 -0
- package/skills/domains/economics/pricing-psychology-guide/SKILL.md +273 -0
- package/skills/domains/economics/repec-economics-api/SKILL.md +188 -0
- package/skills/domains/economics/world-bank-data-guide/SKILL.md +179 -0
- package/skills/domains/education/academic-study-methods/SKILL.md +228 -0
- package/skills/domains/education/assessment-design-guide/SKILL.md +213 -0
- package/skills/domains/education/educational-research-methods/SKILL.md +179 -0
- package/skills/domains/education/edumcp-guide/SKILL.md +74 -0
- package/skills/domains/education/mooc-analytics-guide/SKILL.md +206 -0
- package/skills/domains/education/open-syllabus-api/SKILL.md +171 -0
- package/skills/domains/finance/akshare-finance-data/SKILL.md +207 -0
- package/skills/domains/finance/finsight-research-guide/SKILL.md +113 -0
- package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +117 -0
- package/skills/domains/finance/portfolio-optimization-guide/SKILL.md +279 -0
- package/skills/domains/finance/risk-modeling-guide/SKILL.md +260 -0
- package/skills/domains/finance/stata-accounting-research/SKILL.md +372 -0
- package/skills/domains/geoscience/climate-modeling-guide/SKILL.md +215 -0
- package/skills/domains/geoscience/pangaea-data-api/SKILL.md +197 -0
- package/skills/domains/geoscience/satellite-remote-sensing/SKILL.md +193 -0
- package/skills/domains/geoscience/seismology-data-guide/SKILL.md +208 -0
- package/skills/domains/humanities/digital-humanities-methods/SKILL.md +232 -0
- package/skills/domains/humanities/ethical-philosophy-guide/SKILL.md +244 -0
- package/skills/domains/humanities/history-research-guide/SKILL.md +260 -0
- package/skills/domains/humanities/political-history-guide/SKILL.md +241 -0
- package/skills/domains/law/caselaw-access-api/SKILL.md +149 -0
- package/skills/domains/law/legal-agent-skills-guide/SKILL.md +132 -0
- package/skills/domains/law/legal-nlp-guide/SKILL.md +236 -0
- package/skills/domains/law/legal-research-methods/SKILL.md +190 -0
- package/skills/domains/law/opencontracts-guide/SKILL.md +168 -0
- package/skills/domains/law/patent-analysis-guide/SKILL.md +257 -0
- package/skills/domains/law/regulatory-compliance-guide/SKILL.md +267 -0
- package/skills/domains/math/lean-theorem-proving-guide/SKILL.md +140 -0
- package/skills/domains/math/symbolic-computation-guide/SKILL.md +263 -0
- package/skills/domains/math/topology-data-analysis/SKILL.md +305 -0
- package/skills/domains/pharma/clinical-trial-design-guide/SKILL.md +271 -0
- package/skills/domains/pharma/drug-target-interaction/SKILL.md +242 -0
- package/skills/domains/pharma/madd-drug-discovery-guide/SKILL.md +153 -0
- package/skills/domains/pharma/pharmacovigilance-guide/SKILL.md +216 -0
- package/skills/domains/physics/astrophysics-data-guide/SKILL.md +305 -0
- package/skills/domains/physics/particle-physics-guide/SKILL.md +287 -0
- package/skills/domains/social-science/ipums-microdata-api/SKILL.md +211 -0
- package/skills/domains/social-science/network-analysis-guide/SKILL.md +310 -0
- package/skills/domains/social-science/psychology-research-guide/SKILL.md +270 -0
- package/skills/domains/social-science/sociology-research-guide/SKILL.md +238 -0
- package/skills/domains/social-science/sociology-research-methods/SKILL.md +181 -0
- package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +233 -0
- package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +120 -0
- package/skills/literature/discovery/papers-we-love-guide/SKILL.md +169 -0
- package/skills/literature/discovery/semantic-paper-radar/SKILL.md +144 -0
- package/skills/literature/discovery/zotero-arxiv-daily-guide/SKILL.md +94 -0
- package/skills/literature/fulltext/bioc-pmc-api/SKILL.md +146 -0
- package/skills/literature/fulltext/core-api-guide/SKILL.md +144 -0
- package/skills/literature/fulltext/dataverse-api/SKILL.md +215 -0
- package/skills/literature/fulltext/hal-archive-api/SKILL.md +218 -0
- package/skills/literature/fulltext/institutional-repository-guide/SKILL.md +212 -0
- package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +341 -0
- package/skills/literature/fulltext/osf-api/SKILL.md +212 -0
- package/skills/literature/fulltext/pmc-ftp-bulk-download/SKILL.md +182 -0
- package/skills/literature/fulltext/zotero-ai-butler-guide/SKILL.md +166 -0
- package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +168 -0
- package/skills/literature/metadata/academic-paper-summarizer/SKILL.md +101 -0
- package/skills/literature/metadata/bibliometrix-guide/SKILL.md +164 -0
- package/skills/literature/metadata/crossref-event-data-api/SKILL.md +183 -0
- package/skills/literature/metadata/doi-content-negotiation/SKILL.md +202 -0
- package/skills/literature/metadata/orkg-api/SKILL.md +153 -0
- package/skills/literature/metadata/plumx-metrics-api/SKILL.md +188 -0
- package/skills/literature/metadata/ror-organization-api/SKILL.md +208 -0
- package/skills/literature/metadata/sophosia-reference-guide/SKILL.md +110 -0
- package/skills/literature/metadata/viaf-authority-api/SKILL.md +209 -0
- package/skills/literature/metadata/wikidata-api-guide/SKILL.md +156 -0
- package/skills/literature/metadata/zoplicate-dedup-guide/SKILL.md +147 -0
- package/skills/literature/metadata/zotero-actions-tags-guide/SKILL.md +212 -0
- package/skills/literature/metadata/zotmoov-guide/SKILL.md +120 -0
- package/skills/literature/metadata/zutilo-guide/SKILL.md +140 -0
- package/skills/literature/search/arxiv-batch-reporting/SKILL.md +133 -0
- package/skills/literature/search/arxiv-cli-tools/SKILL.md +172 -0
- package/skills/literature/search/arxiv-osiris/SKILL.md +199 -0
- package/skills/literature/search/arxiv-paper-processor/SKILL.md +141 -0
- package/skills/literature/search/baidu-scholar-guide/SKILL.md +110 -0
- package/skills/literature/search/base-academic-search/SKILL.md +196 -0
- package/skills/literature/search/chatpaper-guide/SKILL.md +122 -0
- package/skills/literature/search/citeseerx-api/SKILL.md +183 -0
- package/skills/literature/search/deep-literature-search/SKILL.md +149 -0
- package/skills/literature/search/deepgit-search-guide/SKILL.md +147 -0
- package/skills/literature/search/eric-education-api/SKILL.md +199 -0
- package/skills/literature/search/findpapers-guide/SKILL.md +177 -0
- package/skills/literature/search/ieee-xplore-api/SKILL.md +177 -0
- package/skills/literature/search/lens-scholarly-api/SKILL.md +211 -0
- package/skills/literature/search/multi-database-literature-search/SKILL.md +198 -0
- package/skills/literature/search/open-library-api/SKILL.md +196 -0
- package/skills/literature/search/open-semantic-search-guide/SKILL.md +190 -0
- package/skills/literature/search/openaire-api/SKILL.md +141 -0
- package/skills/literature/search/paper-search-mcp-guide/SKILL.md +107 -0
- package/skills/literature/search/papers-chat-guide/SKILL.md +194 -0
- package/skills/literature/search/pasa-paper-search-guide/SKILL.md +138 -0
- package/skills/literature/search/plos-open-access-api/SKILL.md +203 -0
- package/skills/literature/search/scielo-api/SKILL.md +182 -0
- package/skills/literature/search/share-research-api/SKILL.md +129 -0
- package/skills/literature/search/worldcat-search-api/SKILL.md +224 -0
- package/skills/research/automation/ai-scientist-v2-guide/SKILL.md +284 -0
- package/skills/research/automation/aim-experiment-guide/SKILL.md +234 -0
- package/skills/research/automation/claude-academic-workflow-guide/SKILL.md +202 -0
- package/skills/research/automation/coexist-ai-guide/SKILL.md +149 -0
- package/skills/research/automation/datagen-research-guide/SKILL.md +131 -0
- package/skills/research/automation/foam-agent-guide/SKILL.md +203 -0
- package/skills/research/automation/kedro-pipeline-guide/SKILL.md +216 -0
- package/skills/research/automation/mle-agent-guide/SKILL.md +139 -0
- package/skills/research/automation/paper-to-agent-guide/SKILL.md +116 -0
- package/skills/research/automation/rd-agent-guide/SKILL.md +246 -0
- package/skills/research/automation/research-paper-orchestrator/SKILL.md +254 -0
- package/skills/research/deep-research/academic-deep-research/SKILL.md +190 -0
- package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +141 -0
- package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +200 -0
- package/skills/research/deep-research/corvus-research-guide/SKILL.md +132 -0
- package/skills/research/deep-research/deep-research-pro/SKILL.md +213 -0
- package/skills/research/deep-research/deep-research-work/SKILL.md +204 -0
- package/skills/research/deep-research/deep-searcher-guide/SKILL.md +253 -0
- package/skills/research/deep-research/gpt-researcher-guide/SKILL.md +191 -0
- package/skills/research/deep-research/in-depth-research-guide/SKILL.md +205 -0
- package/skills/research/deep-research/khoj-research-guide/SKILL.md +200 -0
- package/skills/research/deep-research/kosmos-scientist-guide/SKILL.md +185 -0
- package/skills/research/deep-research/llm-scientific-discovery-guide/SKILL.md +178 -0
- package/skills/research/deep-research/local-deep-research-guide/SKILL.md +253 -0
- package/skills/research/deep-research/open-researcher-guide/SKILL.md +138 -0
- package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +217 -0
- package/skills/research/funding/eu-horizon-guide/SKILL.md +244 -0
- package/skills/research/funding/grant-budget-guide/SKILL.md +284 -0
- package/skills/research/funding/nih-reporter-api-guide/SKILL.md +166 -0
- package/skills/research/funding/nsf-award-api-guide/SKILL.md +133 -0
- package/skills/research/methodology/academic-mentor-guide/SKILL.md +169 -0
- package/skills/research/methodology/claude-scientific-guide/SKILL.md +122 -0
- package/skills/research/methodology/deep-innovator-guide/SKILL.md +242 -0
- package/skills/research/methodology/osf-api-guide/SKILL.md +165 -0
- package/skills/research/methodology/parsifal-slr-guide/SKILL.md +154 -0
- package/skills/research/methodology/research-paper-kb/SKILL.md +263 -0
- package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +169 -0
- package/skills/research/methodology/research-town-guide/SKILL.md +263 -0
- package/skills/research/methodology/slr-automation-guide/SKILL.md +235 -0
- package/skills/research/paper-review/automated-review-guide/SKILL.md +281 -0
- package/skills/research/paper-review/latte-review-guide/SKILL.md +175 -0
- package/skills/research/paper-review/paper-compare-guide/SKILL.md +238 -0
- package/skills/research/paper-review/paper-critique-framework/SKILL.md +181 -0
- package/skills/research/paper-review/paper-digest-guide/SKILL.md +240 -0
- package/skills/research/paper-review/paper-research-assistant/SKILL.md +231 -0
- package/skills/research/paper-review/research-quality-filter/SKILL.md +261 -0
- package/skills/research/paper-review/review-response-guide/SKILL.md +275 -0
- package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +110 -0
- package/skills/tools/code-exec/google-colab-guide/SKILL.md +276 -0
- package/skills/tools/code-exec/kaggle-api-guide/SKILL.md +216 -0
- package/skills/tools/code-exec/overleaf-cli-guide/SKILL.md +279 -0
- package/skills/tools/diagram/clawphd-guide/SKILL.md +149 -0
- package/skills/tools/diagram/code-flow-visualizer/SKILL.md +197 -0
- package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +170 -0
- package/skills/tools/diagram/json-data-visualizer/SKILL.md +270 -0
- package/skills/tools/diagram/kroki-diagram-api/SKILL.md +198 -0
- package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +219 -0
- package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +201 -0
- package/skills/tools/diagram/tldraw-whiteboard-guide/SKILL.md +397 -0
- package/skills/tools/document/docsgpt-guide/SKILL.md +130 -0
- package/skills/tools/document/large-document-reader/SKILL.md +202 -0
- package/skills/tools/document/md2pdf-xelatex/SKILL.md +212 -0
- package/skills/tools/document/openpaper-guide/SKILL.md +232 -0
- package/skills/tools/document/paper-parse-guide/SKILL.md +243 -0
- package/skills/tools/document/weknora-guide/SKILL.md +216 -0
- package/skills/tools/document/zotero-addon-market-guide/SKILL.md +108 -0
- package/skills/tools/document/zotero-night-theme-guide/SKILL.md +142 -0
- package/skills/tools/document/zotero-style-guide/SKILL.md +217 -0
- package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +244 -0
- package/skills/tools/knowledge-graph/concept-map-generator/SKILL.md +284 -0
- package/skills/tools/knowledge-graph/graphiti-guide/SKILL.md +219 -0
- package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +135 -0
- package/skills/tools/knowledge-graph/notero-zotero-notion-guide/SKILL.md +187 -0
- package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +156 -0
- package/skills/tools/knowledge-graph/openspg-guide/SKILL.md +210 -0
- package/skills/tools/knowledge-graph/paperpile-notion-guide/SKILL.md +84 -0
- package/skills/tools/knowledge-graph/zotero-markdb-connect-guide/SKILL.md +162 -0
- package/skills/tools/ocr-translate/latex-translation-guide/SKILL.md +176 -0
- package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +198 -0
- package/skills/tools/ocr-translate/pdf-math-translate-guide/SKILL.md +141 -0
- package/skills/tools/ocr-translate/zotero-pdf-translate-guide/SKILL.md +95 -0
- package/skills/tools/ocr-translate/zotero-pdf2zh-guide/SKILL.md +143 -0
- package/skills/tools/scraping/dataset-finder-guide/SKILL.md +253 -0
- package/skills/tools/scraping/easy-spider-guide/SKILL.md +250 -0
- package/skills/tools/scraping/google-scholar-scraper/SKILL.md +255 -0
- package/skills/tools/scraping/repository-harvesting-guide/SKILL.md +310 -0
- package/skills/writing/citation/academic-citation-manager/SKILL.md +314 -0
- package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +182 -0
- package/skills/writing/citation/citation-assistant-skill/SKILL.md +192 -0
- package/skills/writing/citation/jabref-reference-guide/SKILL.md +127 -0
- package/skills/writing/citation/jasminum-zotero-guide/SKILL.md +103 -0
- package/skills/writing/citation/mendeley-api/SKILL.md +231 -0
- package/skills/writing/citation/obsidian-citation-guide/SKILL.md +164 -0
- package/skills/writing/citation/obsidian-zotero-guide/SKILL.md +137 -0
- package/skills/writing/citation/onecite-reference-guide/SKILL.md +168 -0
- package/skills/writing/citation/papersgpt-zotero-guide/SKILL.md +132 -0
- package/skills/writing/citation/papis-cli-guide/SKILL.md +213 -0
- package/skills/writing/citation/zotero-better-bibtex-guide/SKILL.md +107 -0
- package/skills/writing/citation/zotero-better-notes-guide/SKILL.md +121 -0
- package/skills/writing/citation/zotero-gpt-guide/SKILL.md +111 -0
- package/skills/writing/citation/zotero-mcp-guide/SKILL.md +164 -0
- package/skills/writing/citation/zotero-mdnotes-guide/SKILL.md +162 -0
- package/skills/writing/citation/zotero-reference-guide/SKILL.md +139 -0
- package/skills/writing/citation/zotero-scholar-guide/SKILL.md +294 -0
- package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +140 -0
- package/skills/writing/composition/ml-paper-writing/SKILL.md +163 -0
- package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +200 -0
- package/skills/writing/composition/paper-debugger-guide/SKILL.md +143 -0
- package/skills/writing/composition/paperforge-guide/SKILL.md +205 -0
- package/skills/writing/composition/research-paper-writer/SKILL.md +226 -0
- package/skills/writing/composition/scientific-writing-resources/SKILL.md +151 -0
- package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +153 -0
- package/skills/writing/latex/academic-writing-latex/SKILL.md +285 -0
- package/skills/writing/latex/latex-drawing-collection/SKILL.md +154 -0
- package/skills/writing/latex/latex-templates-collection/SKILL.md +159 -0
- package/skills/writing/latex/md-to-pdf-academic/SKILL.md +230 -0
- package/skills/writing/latex/tex-render-guide/SKILL.md +243 -0
- package/skills/writing/polish/academic-tone-guide/SKILL.md +209 -0
- package/skills/writing/polish/chinese-text-humanizer/SKILL.md +140 -0
- package/skills/writing/polish/conciseness-editing-guide/SKILL.md +225 -0
- package/skills/writing/polish/paper-polish-guide/SKILL.md +160 -0
- package/skills/writing/templates/arxiv-preprint-template/SKILL.md +184 -0
- package/skills/writing/templates/elegant-paper-template/SKILL.md +141 -0
- package/skills/writing/templates/graphical-abstract-guide/SKILL.md +183 -0
- package/skills/writing/templates/novathesis-guide/SKILL.md +152 -0
- package/skills/writing/templates/scientific-article-pdf/SKILL.md +261 -0
- package/skills/writing/templates/sjtuthesis-guide/SKILL.md +197 -0
- package/skills/writing/templates/thuthesis-guide/SKILL.md +181 -0
- package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md +0 -207
@@ -0,0 +1,212 @@
---
name: institutional-repository-guide
description: "Access papers from institutional and subject repositories at scale"
metadata:
  openclaw:
    emoji: "🏛️"
    category: "literature"
    subcategory: "fulltext"
    keywords: ["institutional repository", "DSpace", "EPrints", "open access archive", "subject repository", "OpenDOAR"]
    source: "wentor-research-plugins"
---

# Institutional Repository Guide

Institutional repositories (IRs) are university-run digital archives that store and provide open access to their researchers' scholarly output: dissertations, journal articles, conference papers, datasets, and technical reports. Subject repositories like arXiv, bioRxiv, SSRN, and RePEc serve similar functions for specific disciplines. Together, they form a distributed network of open scholarship that complements commercial databases.

This guide covers how to discover, access, and systematically harvest content from institutional and subject repositories for literature reviews, meta-analyses, and research data collection.

## Repository Landscape

### Types of Repositories

```
Institutional Repositories (IR):
  - Run by universities to archive their researchers' output
  - Examples: DSpace, EPrints, Fedora-based systems
  - Discovery: OpenDOAR directory (v2.sherpa.ac.uk/opendoar)

Subject Repositories:
  - Discipline-specific archives
  - arXiv (physics, CS, math), bioRxiv, SSRN, RePEc, EarthArXiv

Aggregators:
  - Harvest from many repositories into a single search interface
  - BASE (Bielefeld Academic Search Engine)
  - CORE (core.ac.uk, 200M+ open access articles)
  - OpenAIRE (European research output)
```

### Discovering Repositories

OpenDOAR (Directory of Open Access Repositories) is the primary registry for finding institutional repositories:

|
|
44
|
+
```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def search_opendoar(subject: Optional[str] = None,
                    country: Optional[str] = None) -> list:
    """
    Search the OpenDOAR registry for institutional repositories.

    Args:
        subject: Filter by subject area (e.g., "Biology", "Computer Science")
        country: ISO country code (e.g., "US", "GB", "CN")
    """
    base_url = "https://v2.sherpa.ac.uk/cgi/retrieve"

    # Collect all filters into one JSON array so that passing both
    # arguments does not produce two competing "filter" parameters.
    # The [value, field] filter shape and the absence of an API key are
    # assumptions; verify both against the Sherpa API documentation.
    filters = []
    if subject:
        filters.append([subject, "subject"])
    if country:
        filters.append([country, "country"])

    query = {"item-type": "repository", "format": "Json"}
    if filters:
        query["filter"] = json.dumps(filters)

    # urlencode escapes the quotes, brackets, and spaces in the filter value
    url = f"{base_url}?{urllib.parse.urlencode(query)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        data = json.loads(response.read())

    repositories = []
    for item in data.get("items", []):
        meta = item.get("repository_metadata", {})
        names = meta.get("name") or [{}]  # guard against an empty name list
        repositories.append({
            "name": names[0].get("name", ""),
            "url": meta.get("url", ""),
            "oai_url": meta.get("oai_url", ""),
            "software": meta.get("software", {}).get("name", ""),
            "type": meta.get("type", ""),
        })

    return repositories
```
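
Only repositories that expose an OAI-PMH endpoint can be harvested with the protocol covered next. A small sketch of that filtering step, assuming the list-of-dicts shape returned by `search_opendoar` above; the helper name `harvestable` and the sample entries are made up for illustration:

```python
def harvestable(repositories):
    """Keep repositories with a non-empty OAI-PMH URL, de-duplicated by that URL."""
    seen = set()
    kept = []
    for repo in repositories:
        oai_url = (repo.get("oai_url") or "").strip()
        if oai_url and oai_url not in seen:
            seen.add(oai_url)
            kept.append(repo)
    return kept


# Hypothetical registry entries, not real OpenDOAR output
sample = [
    {"name": "Repo A", "oai_url": "https://a.example.org/oai/request"},
    {"name": "Repo B", "oai_url": ""},
    {"name": "Repo A (second entry)", "oai_url": "https://a.example.org/oai/request"},
]
print([repo["name"] for repo in harvestable(sample)])
```

De-duplicating by endpoint URL avoids harvesting the same repository twice when the registry lists it under more than one name.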
|
|
80
|
+
|
|
81
|
+
## OAI-PMH Harvesting from Repositories
|
|
82
|
+
|
|
83
|
+
Most institutional repositories support OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), the standard protocol for metadata exchange:
|
|
84
|
+
|
|
85
|
+
```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_repository(base_url: str, metadata_prefix: str = "oai_dc",
                       set_spec: Optional[str] = None,
                       from_date: Optional[str] = None,
                       delay: float = 1.0) -> list:
    """
    Harvest record headers from a repository's OAI-PMH endpoint.

    Args:
        base_url: The OAI-PMH base URL
        metadata_prefix: Metadata format (oai_dc, datacite, mets)
        set_spec: Optional set/collection to restrict harvesting
        from_date: Harvest only records created or modified on or after
            this date (YYYY-MM-DD)
        delay: Seconds to wait between paginated requests
    """
    query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        query["set"] = set_spec
    if from_date:
        query["from"] = from_date

    url = f"{base_url}?{urllib.parse.urlencode(query)}"
    records = []

    while url:
        with urllib.request.urlopen(url, timeout=60) as response:
            root = ET.parse(response).getroot()

        for record in root.findall(".//oai:record", OAI_NS):
            header = record.find("oai:header", OAI_NS)
            records.append({
                "identifier": header.findtext("oai:identifier", "", OAI_NS),
                "datestamp": header.findtext("oai:datestamp", "", OAI_NS),
            })

        # A non-empty resumptionToken means more pages remain. Follow-up
        # requests carry only the verb and the URL-encoded token.
        token_elem = root.find(".//oai:resumptionToken", OAI_NS)
        if token_elem is not None and token_elem.text:
            token = urllib.parse.quote(token_elem.text.strip(), safe="")
            url = f"{base_url}?verb=ListRecords&resumptionToken={token}"
            time.sleep(delay)  # be polite between pages
        else:
            url = None

    return records
```
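
`harvest_repository` keeps only the header fields (identifier and datestamp). To read the descriptive metadata as well, each `oai_dc` record can be unpacked into its Dublin Core elements. A sketch, assuming the repository returns standard `oai_dc` payloads; the function name `parse_dc_record` and the sample record are illustrative, not taken from any real repository:

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_dc_record(record):
    """Turn one OAI-PMH <record> element into a dict of Dublin Core values."""
    header = record.find("oai:header", NS)
    parsed = {
        "identifier": header.findtext("oai:identifier", "", NS),
        # Deleted records carry status="deleted" and no <metadata> element
        "deleted": header.get("status") == "deleted",
    }
    # Dublin Core elements are repeatable, so every field is a list
    for field in ("title", "creator", "date", "identifier", "subject"):
        elements = record.iterfind(f".//dc:{field}", NS)
        parsed[f"dc_{field}"] = [e.text.strip() for e in elements if e.text]
    return parsed


SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:123</identifier>
    <datestamp>2024-01-15</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An Example Thesis</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:identifier>https://doi.org/10.1234/example</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""

parsed_record = parse_dc_record(ET.fromstring(SAMPLE))
print(parsed_record["dc_title"], parsed_record["dc_identifier"])
```

The `dc:identifier` values are often where a DOI or handle URL appears, which matters later for deduplication.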
|
|
129
|
+
|
|
130
|
+
### Key OAI-PMH Verbs
|
|
131
|
+
|
|
132
|
+
| Verb | Purpose |
|------|---------|
| `Identify` | Get repository name, admin email, policies |
| `ListSets` | List available collections/sets |
| `ListMetadataFormats` | List supported metadata schemas |
| `ListIdentifiers` | Lightweight listing of record headers |
| `ListRecords` | Full metadata records with pagination |
| `GetRecord` | Retrieve a single record by identifier |
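
Each verb accepts a fixed set of arguments, and a request with an unexpected argument is rejected by the server. The sketch below builds request URLs and checks the arguments client-side first. The argument sets follow the OAI-PMH 2.0 specification; the helper name `oai_request_url` and the example base URL are assumptions for illustration:

```python
from urllib.parse import urlencode

# Arguments each verb accepts besides "verb" itself
ALLOWED_ARGS = {
    "Identify": set(),
    "ListSets": {"resumptionToken"},
    "ListMetadataFormats": {"identifier"},
    "ListIdentifiers": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "ListRecords": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def oai_request_url(base_url, verb, **args):
    """Build an OAI-PMH request URL, rejecting unknown verbs and arguments."""
    if verb not in ALLOWED_ARGS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    # "from" is a Python keyword, so callers pass it as from_
    args = {key.rstrip("_"): value for key, value in args.items()}
    unexpected = set(args) - ALLOWED_ARGS[verb]
    if unexpected:
        raise ValueError(f"{verb} does not accept: {sorted(unexpected)}")
    query = {"verb": verb, **args}
    return f"{base_url}?{urlencode(query)}"


base = "https://repo.example.edu/oai/request"  # hypothetical endpoint
print(oai_request_url(base, "Identify"))
print(oai_request_url(base, "ListIdentifiers", metadataPrefix="oai_dc", from_="2024-01-01"))
print(oai_request_url(base, "GetRecord", identifier="oai:repo.example.edu:42", metadataPrefix="oai_dc"))
```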
|
|
140
|
+
|
|
141
|
+
## Major Repository Platforms
|
|
142
|
+
|
|
143
|
+
### DSpace
|
|
144
|
+
|
|
145
|
+
The most widely deployed open-source repository platform (used by ~40% of repositories worldwide):
|
|
146
|
+
|
|
147
|
+
- OAI-PMH endpoint: `{base-url}/oai/request` (DSpace 7 and later: `{base-url}/server/oai/request`)
|
|
148
|
+
- REST API: `{base-url}/server/api` (DSpace 7 and later)
|
|
149
|
+
- Supports Dublin Core, METS, and custom metadata schemas
|
|
150
|
+
- Examples: MIT DSpace, University of Cambridge Repository
|
|
151
|
+
|
|
152
|
+
### EPrints
|
|
153
|
+
|
|
154
|
+
Popular in the UK and Europe:
|
|
155
|
+
|
|
156
|
+
- OAI-PMH endpoint: `{base-url}/cgi/oai2`
|
|
157
|
+
- REST API: `{base-url}/cgi/export/{id}/{format}`
|
|
158
|
+
- Strong support for research output types (articles, theses, conference items)
|
|
159
|
+
- Examples: University of Southampton EPrints
|
|
160
|
+
|
|
161
|
+
### Fedora / Islandora
|
|
162
|
+
|
|
163
|
+
Used by larger institutions with complex digital collections:
|
|
164
|
+
|
|
165
|
+
- Typically paired with a discovery layer (Solr/Blacklight)
|
|
166
|
+
- Strong support for digital preservation workflows
|
|
167
|
+
- Examples: University of Toronto, Smithsonian Institution
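
The endpoint patterns listed for DSpace and EPrints can be turned into a lookup that proposes `Identify` URLs to probe. This is a heuristic sketch: the paths are the conventional ones named above, individual sites may deviate from them, and no path is assumed for Fedora/Islandora because none is given here. The names `OAI_PATHS` and `candidate_oai_endpoints` are illustrative:

```python
# Conventional OAI-PMH paths per platform, as listed in this guide
OAI_PATHS = {
    "dspace": ["/oai/request"],
    "eprints": ["/cgi/oai2"],
}


def candidate_oai_endpoints(base_url, software):
    """Return Identify URLs worth probing for a repository of the given platform."""
    base = base_url.rstrip("/")
    paths = OAI_PATHS.get(software.strip().lower(), [])
    return [f"{base}{path}?verb=Identify" for path in paths]


print(candidate_oai_endpoints("https://repo.example.edu/", "DSpace"))
print(candidate_oai_endpoints("https://eprints.example.ac.uk", "EPrints"))
print(candidate_oai_endpoints("https://collections.example.org", "Islandora"))
```

When a registry entry already carries an `oai_url`, prefer it over a guessed path.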
|
|
168
|
+
|
|
169
|
+
## Building a Harvesting Pipeline
|
|
170
|
+
|
|
171
|
+
### Systematic Collection Workflow
|
|
172
|
+
|
|
173
|
+
```
|
|
174
|
+
1. Identify target repositories
|
|
175
|
+
- Use OpenDOAR to find IRs by subject or country
|
|
176
|
+
- List subject repositories relevant to your discipline
|
|
177
|
+
|
|
178
|
+
2. Test endpoints
|
|
179
|
+
- Send Identify request to verify the endpoint is active
|
|
180
|
+
- Check ListMetadataFormats for available schemas
|
|
181
|
+
|
|
182
|
+
3. Harvest incrementally
|
|
183
|
+
- Use "from" parameter to harvest only new records
|
|
184
|
+
- Store last harvest date for each repository
|
|
185
|
+
- Respect rate limits (typically 1 request per second)
|
|
186
|
+
|
|
187
|
+
4. Deduplicate
|
|
188
|
+
- Match records by DOI when available
|
|
189
|
+
- Use title + author fuzzy matching for records without DOIs
|
|
190
|
+
- Flag duplicates rather than deleting (keep provenance)
|
|
191
|
+
|
|
192
|
+
5. Store and index
|
|
193
|
+
- Save metadata in structured format (JSON, SQLite, CSV)
|
|
194
|
+
- Build a local search index for efficient retrieval
|
|
195
|
+
```
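
Step 4 of the workflow can be sketched with the standard library alone. The version below matches on normalized DOI first, falls back to fuzzy title comparison only when at least one record lacks a DOI, and flags duplicates instead of removing them. The record keys (`doi`, `title`, `first_author`), the function names, and the 0.92 similarity threshold are assumptions to tune for your own data:

```python
import re
from difflib import SequenceMatcher


def normalize_doi(doi):
    """Lowercase a DOI and strip resolver prefixes such as https://doi.org/."""
    doi = (doi or "").strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)


def normalize_title(title):
    """Lowercase and collapse punctuation and whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def flag_duplicates(records, threshold=0.92):
    """Set rec["duplicate_of"] to the index of the first matching record, or None."""
    seen_dois = {}
    kept = []  # (index, doi, title, author) of records that are not duplicates
    for i, rec in enumerate(records):
        rec["duplicate_of"] = None
        doi = normalize_doi(rec.get("doi"))
        title = normalize_title(rec.get("title"))
        author = (rec.get("first_author") or "").strip().lower()
        if doi and doi in seen_dois:
            rec["duplicate_of"] = seen_dois[doi]
            continue
        for j, other_doi, other_title, other_author in kept:
            if doi and other_doi:
                continue  # two different DOIs: treat as distinct works
            if not title or not other_title:
                continue  # nothing to compare
            if author and other_author and author != other_author:
                continue
            if SequenceMatcher(None, title, other_title).ratio() >= threshold:
                rec["duplicate_of"] = j
                break
        if rec["duplicate_of"] is None:
            kept.append((i, doi, title, author))
        if doi:
            seen_dois.setdefault(doi, i if rec["duplicate_of"] is None else rec["duplicate_of"])
    return records


# Made-up records for illustration
records = [
    {"title": "Deep Learning for Protein Folding", "doi": "10.1000/abc", "first_author": "Smith"},
    {"title": "Deep learning for protein folding.", "doi": "https://doi.org/10.1000/ABC", "first_author": "Smith"},
    {"title": "Deep Learning for Protein Folding", "doi": None, "first_author": "smith"},
    {"title": "Soil Carbon Dynamics in Boreal Forests", "doi": None, "first_author": "Lee"},
]
flagged = flag_duplicates(records)
print([rec["duplicate_of"] for rec in flagged])
```

Pairwise fuzzy comparison is quadratic in the number of records, so for large harvests it helps to compare only within blocks (for example, records sharing a publication year).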
|
|
196
|
+
|
|
197
|
+
## Ethical Considerations
|
|
198
|
+
|
|
199
|
+
- Always respect `robots.txt` and repository rate limits
|
|
200
|
+
- Metadata harvesting is generally permitted; bulk full-text download may require permission
|
|
201
|
+
- Check each repository's terms of use before harvesting
|
|
202
|
+
- Use harvested data for research purposes, not commercial redistribution
|
|
203
|
+
- Attribute the source repository in publications using harvested data
|
|
204
|
+
- Consider reaching out to repository administrators for large-scale harvesting projects
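
The first point can be checked in code with Python's built-in `urllib.robotparser`. The sketch below parses a `robots.txt` body that has already been downloaded, so it runs without network access; the rules, URLs, and the `my-harvester` user agent are invented for illustration:

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent):
    """Return (allowed, crawl_delay) for the given robots.txt text and agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    # crawl_delay() returns None when the file sets no Crawl-delay
    return allowed, parser.crawl_delay(user_agent)


ROBOTS = """\
User-agent: *
Crawl-delay: 2
Disallow: /bitstream/
"""

allowed, crawl_delay = build_robots_checker(ROBOTS, "my-harvester")
print(allowed("https://repo.example.edu/oai/request?verb=Identify"))
print(allowed("https://repo.example.edu/bitstream/123/thesis.pdf"))
print(crawl_delay)
```

In this example the OAI-PMH endpoint is open to crawlers while the full-text path is not, which mirrors the distinction between metadata harvesting and bulk full-text download drawn above.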
|
|
205
|
+
|
|
206
|
+
## References
|
|
207
|
+
|
|
208
|
+
- OpenDOAR: https://v2.sherpa.ac.uk/opendoar/
|
|
209
|
+
- OAI-PMH specification: http://www.openarchives.org/OAI/openarchivesprotocol.html
|
|
210
|
+
- CORE: https://core.ac.uk
|
|
211
|
+
- BASE: https://www.base-search.net
|
|
212
|
+
- DSpace documentation: https://wiki.lyrasis.org/display/DSPACE
|
|
@@ -0,0 +1,341 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: open-access-mining-guide
|
|
3
|
+
description: "Mine open access full-text repositories for research data extraction"
|
|
4
|
+
metadata:
|
|
5
|
+
openclaw:
|
|
6
|
+
emoji: "unlock"
|
|
7
|
+
category: "literature"
|
|
8
|
+
subcategory: "fulltext"
|
|
9
|
+
keywords: ["open access", "text mining", "full text", "PubMed Central", "CORE", "content mining", "TDM"]
|
|
10
|
+
source: "wentor-research-plugins"
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Open Access Mining Guide
|
|
14
|
+
|
|
15
|
+
A skill for systematically mining open access full-text repositories to extract structured research data at scale. Covers legal frameworks for text and data mining (TDM), major open access repositories and their APIs, full-text retrieval and parsing, section-level extraction, entity recognition in scientific text, and building reproducible mining pipelines.
|
|
16
|
+
|
|
17
|
+
## Legal Framework for Text and Data Mining
|
|
18
|
+
|
|
19
|
+
### Rights and Regulations
|
|
20
|
+
|
|
21
|
+
Text and data mining of published literature operates within a specific legal framework that varies by jurisdiction. Understanding these rules is essential before starting any mining project.
|
|
22
|
+
|
|
23
|
+
```
|
|
24
|
+
Legal landscape for TDM:
|
|
25
|
+
|
|
26
|
+
EU Directive 2019/790 (DSM Directive):
|
|
27
|
+
- Article 3: TDM exception for research organizations
|
|
28
|
+
- Lawful access required (institutional subscription counts)
|
|
29
|
+
- Must be for scientific research purposes
|
|
30
|
+
- No opt-out possible for publishers
|
|
31
|
+
- Applies to EU/EEA research institutions
|
|
32
|
+
- Article 4: General TDM exception
|
|
33
|
+
- Available to anyone with lawful access
|
|
34
|
+
- Publishers CAN opt out (via robots.txt or metadata)
|
|
35
|
+
|
|
36
|
+
UK: TDM exception for non-commercial research (CDPA s.29A)
|
|
37
|
+
|
|
38
|
+
US: No specific TDM law; relies on fair use doctrine
|
|
39
|
+
- Transformative use generally favored by courts
|
|
40
|
+
- Google Books case (2015) supports large-scale text analysis
|
|
41
|
+
- But: publisher Terms of Service can still restrict mining by contract
|
|
42
|
+
|
|
43
|
+
Practical guidelines:
|
|
44
|
+
- Mine open access content (CC-BY, CC-BY-SA) freely
|
|
45
|
+
- Mine subscription content under institutional license
|
|
46
|
+
- Check publisher TDM policies (Elsevier, Springer, Wiley
|
|
47
|
+
all have TDM APIs for licensed content)
|
|
48
|
+
- Do not redistribute full text unless its license allows it; otherwise share derived data only
|
|
49
|
+
- Credit the data source in publications
|
|
50
|
+
```
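
In a pipeline, the first practical guideline usually becomes a license filter applied before any text is processed. A rough sketch that normalizes Creative Commons license strings and checks them against the two licenses named above; the function names are hypothetical, the patterns cover only common spellings, and anything unrecognized should go to manual review:

```python
import re

# Licenses the guidelines above list as minable without further checks
OPEN_LICENSES = {"by", "by-sa"}


def cc_license_code(license_value):
    """Return a normalized Creative Commons code such as "by-nc", or None."""
    text = (license_value or "").lower()
    # License URLs, e.g. https://creativecommons.org/licenses/by-nc/4.0/
    match = re.search(r"creativecommons\.org/licenses/([a-z-]+)", text)
    if match:
        return match.group(1)
    # Short labels, e.g. "CC BY 4.0" or "cc-by-sa"
    match = re.search(r"\bcc[ -]?(by(?:[ -](?:nc|nd|sa))*)\b", text)
    if match:
        return match.group(1).replace(" ", "-")
    return None


def matches_guideline_licenses(license_value):
    """True only for CC-BY and CC-BY-SA; everything else needs a closer look."""
    return cc_license_code(license_value) in OPEN_LICENSES


for value in ["https://creativecommons.org/licenses/by/4.0/", "CC BY-NC 4.0", "CC-BY-SA", "All rights reserved"]:
    print(value, "->", cc_license_code(value), matches_guideline_licenses(value))
```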
|
|
51
|
+
|
|
52
|
+
## Major Open Access Repositories
|
|
53
|
+
|
|
54
|
+
### Repository Comparison
|
|
55
|
+
|
|
56
|
+
```
|
|
57
|
+
Repository overview for full-text mining:
|
|
58
|
+
|
|
59
|
+
PubMed Central (PMC):
|
|
60
|
+
- Coverage: 8M+ full-text articles (biomedical/life sciences)
|
|
61
|
+
- Access: Free, OA subset freely downloadable
|
|
62
|
+
- Formats: XML (JATS), PDF
|
|
63
|
+
- API: E-utilities (Entrez), bulk FTP download
|
|
64
|
+
- License: varies by article (check individual licenses)
|
|
65
|
+
- Best for: biomedical systematic reviews, meta-analyses
|
|
66
|
+
- Bulk download: ftp.ncbi.nlm.nih.gov/pub/pmc/
|
|
67
|
+
|
|
68
|
+
Europe PMC:
|
|
69
|
+
- Coverage: PMC content + European-funded research
|
|
70
|
+
- Access: Free, REST API
|
|
71
|
+
- Formats: XML, JSON
|
|
72
|
+
- API: europepmc.org/RestfulWebService
|
|
73
|
+
- Annotations: sentence-level annotations, concepts, data links
|
|
74
|
+
- Best for: European research, annotated text mining
|
|
75
|
+
|
|
76
|
+
CORE (core.ac.uk):
|
|
77
|
+
- Coverage: 200M+ metadata records, 36M+ full texts
|
|
78
|
+
- Access: Free API (registration required)
|
|
79
|
+
- Formats: JSON, full text as extracted plain text
|
|
80
|
+
- Sources: aggregates from 10,000+ repositories worldwide
|
|
81
|
+
- Best for: cross-disciplinary mining, thesis/dissertation text
|
|
82
|
+
|
|
83
|
+
arXiv:
|
|
84
|
+
- Coverage: 2M+ preprints (physics, math, CS, etc.)
|
|
85
|
+
- Access: Free bulk download, API
|
|
86
|
+
- Formats: LaTeX source, PDF
|
|
87
|
+
- Bulk: Kaggle dataset, S3 requester-pays bucket
|
|
88
|
+
- Best for: STEM preprint analysis, citation studies
|
|
89
|
+
|
|
90
|
+
Unpaywall / OpenAlex:
|
|
91
|
+
- Coverage: tracks OA status of 200M+ works
|
|
92
|
+
- Access: Free API, database dump
|
|
93
|
+
- Use: Find OA versions of any DOI
|
|
94
|
+
- Best for: Locating freely available versions of papers
|
|
95
|
+
|
|
96
|
+
Semantic Scholar:
|
|
97
|
+
- Coverage: 200M+ papers, abstracts + some full text
|
|
98
|
+
- Access: Free API, bulk datasets
|
|
99
|
+
- Features: TLDR summaries, citation intents, S2ORC corpus
|
|
100
|
+
- Best for: NLP research on scientific text
|
|
101
|
+
```
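
As an example of the "find OA versions of any DOI" use case, the sketch below builds an Unpaywall v2 request URL and picks a download link from a response. It assumes the response fields `is_oa`, `best_oa_location`, `url_for_pdf`, and `url`; the sample response is hand-written for illustration, and the helper names are hypothetical:

```python
from urllib.parse import quote


def unpaywall_url(doi, email):
    """Build the Unpaywall v2 lookup URL (the API asks for a contact email)."""
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?email={quote(email)}"


def best_oa_link(work):
    """Return the preferred open access URL from a response dict, or None."""
    if not work.get("is_oa"):
        return None
    location = work.get("best_oa_location") or {}
    # Prefer a direct PDF link; fall back to the landing page
    return location.get("url_for_pdf") or location.get("url")


# Hand-written example response, trimmed to the fields used here
sample_work = {
    "doi": "10.1234/example",
    "is_oa": True,
    "best_oa_location": {
        "url": "https://repo.example.edu/handle/1/2",
        "url_for_pdf": None,
        "license": "cc-by",
    },
}

print(unpaywall_url("10.1234/example", "you@example.org"))
print(best_oa_link(sample_work))
```

Fetching the URL with `requests.get(...).json()` yields a dict that can be passed straight to `best_oa_link`.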
|
|
102
|
+
|
|
103
|
+
## Full-Text Retrieval and Parsing
|
|
104
|
+
|
|
105
|
+
### Retrieving from PubMed Central
|
|
106
|
+
|
|
107
|
+
```python
import time  # also needed by batch_mine_pmc
import xml.etree.ElementTree as ET

import requests


def fetch_pmc_fulltext(pmc_id):
    """
    Fetch full-text XML from PubMed Central via E-utilities.

    Args:
        pmc_id: PMC identifier (e.g., "PMC7096724")

    Returns:
        Parsed article as structured dictionary
    """
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    params = {
        "db": "pmc",
        "id": pmc_id.replace("PMC", ""),
        "retmode": "xml",
    }

    response = requests.get(base_url, params=params, timeout=30)
    response.raise_for_status()

    root = ET.fromstring(response.content)
    return parse_jats_xml(root)


def _text(elem):
    """Concatenate all text inside an element, including inline markup."""
    return "".join(elem.itertext()).strip() if elem is not None else ""


def parse_jats_xml(root):
    """
    Parse JATS XML (Journal Article Tag Suite) into structured data.
    JATS is the standard XML format for PMC articles.
    """
    article = {}

    # Search the front matter first so that an <article-title> inside the
    # reference list is never mistaken for the paper's own title
    front = root.find(".//front")
    scope = front if front is not None else root

    # Title
    article["title"] = _text(scope.find(".//article-title"))

    # Abstract
    abstract_elem = scope.find(".//abstract")
    if abstract_elem is not None:
        article["abstract"] = _text(abstract_elem)

    # Body sections
    body = root.find(".//body")
    if body is not None:
        article["sections"] = extract_sections(body)

    # References
    ref_list = root.find(".//ref-list")
    if ref_list is not None:
        article["references"] = extract_references(ref_list)

    return article


def extract_sections(body_element):
    """
    Extract sections with their titles and text content.
    Nested subsections appear as their own entries, in document order;
    each entry holds only the paragraphs directly inside that section.
    """
    sections = []
    for sec in body_element.findall(".//sec"):
        title = _text(sec.find("title")) or "Untitled"
        paragraphs = []
        for p in sec.findall("p"):
            text = _text(p)
            if text:
                paragraphs.append(text)

        sections.append({
            "title": title,
            "text": "\n".join(paragraphs),
            "id": sec.get("id", ""),
        })

    return sections


def extract_references(ref_list):
    """
    Minimal reference extractor: id, cited title, and the whitespace-
    normalized citation text. Extend with authors, year, or DOI as needed.
    """
    references = []
    for ref in ref_list.findall(".//ref"):
        references.append({
            "id": ref.get("id", ""),
            "title": _text(ref.find(".//article-title")),
            "text": " ".join("".join(ref.itertext()).split()),
        })
    return references
```
|
|
190
|
+
|
|
191
|
+
### Batch Processing Pipeline
|
|
192
|
+
|
|
193
|
+
```python
import json
import os
import time


def batch_mine_pmc(pmc_ids, output_dir, delay=0.4):
    """
    Mine multiple PMC articles with rate limiting.

    Relies on fetch_pmc_fulltext() defined above.

    NCBI E-utilities rate limit:
    - Without API key: 3 requests/second
    - With API key: 10 requests/second
    - Register for API key at ncbi.nlm.nih.gov/account/
    """
    os.makedirs(output_dir, exist_ok=True)

    results = []
    errors = []

    for i, pmc_id in enumerate(pmc_ids):
        try:
            article = fetch_pmc_fulltext(pmc_id)
            results.append(article)

            # Save individual article
            output_path = os.path.join(output_dir, f"{pmc_id}.json")
            with open(output_path, "w", encoding="utf-8") as f:
                json.dump(article, f, indent=2, ensure_ascii=False)

            if (i + 1) % 100 == 0:
                print(f"Processed {i + 1}/{len(pmc_ids)} articles")

        except Exception as e:
            # Record the failure and continue with the remaining IDs
            errors.append({"pmc_id": pmc_id, "error": str(e)})

        # Rate limiting: the default 0.4 s stays under 3 requests/second
        time.sleep(delay)

    print(f"Successfully mined {len(results)} articles, "
          f"{len(errors)} errors")
    return results, errors
```
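
Long batches get interrupted, so it helps to skip articles that were already saved. A minimal sketch, assuming the one-JSON-file-per-article layout that `batch_mine_pmc` writes (`<output_dir>/<pmc_id>.json`); the helper name `pending_ids` is hypothetical:

```python
import os
import tempfile


def pending_ids(pmc_ids, output_dir):
    """Return IDs with no saved JSON file yet, in order and without repeats."""
    done = set()
    if os.path.isdir(output_dir):
        done = {name[:-5] for name in os.listdir(output_dir) if name.endswith(".json")}
    seen = set()
    todo = []
    for pmc_id in pmc_ids:
        if pmc_id not in done and pmc_id not in seen:
            seen.add(pmc_id)
            todo.append(pmc_id)
    return todo


# Demonstration with a temporary directory standing in for the output folder
with tempfile.TemporaryDirectory() as out_dir:
    open(os.path.join(out_dir, "PMC1.json"), "w").close()
    todo = pending_ids(["PMC1", "PMC2", "PMC2"], out_dir)
print(todo)
```

A resumed run would then call `batch_mine_pmc(pending_ids(all_ids, "out"), "out")`, which also serves as a simple download cache.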
|
|
232
|
+
|
|
233
|
+
## Information Extraction from Full Text
|
|
234
|
+
|
|
235
|
+
### Section-Level Extraction
|
|
236
|
+
|
|
237
|
+
```
|
|
238
|
+
Targeted extraction by paper section:
|
|
239
|
+
|
|
240
|
+
Introduction:
|
|
241
|
+
- Research questions and hypotheses
|
|
242
|
+
- Knowledge gaps identified
|
|
243
|
+
- Theoretical framework references
|
|
244
|
+
|
|
245
|
+
Methods:
|
|
246
|
+
- Study design (RCT, cohort, case-control, etc.)
|
|
247
|
+
- Sample size and population characteristics
|
|
248
|
+
- Measurement instruments and their validity
|
|
249
|
+
- Statistical analysis methods
|
|
250
|
+
- Software and versions used
|
|
251
|
+
|
|
252
|
+
Results:
|
|
253
|
+
- Effect sizes with confidence intervals
|
|
254
|
+
- P-values and test statistics
|
|
255
|
+
- Participant flow (enrollment, dropout, analysis)
|
|
256
|
+
- Tables and figures (structured data)
|
|
257
|
+
|
|
258
|
+
Discussion:
|
|
259
|
+
- Key findings summarized
|
|
260
|
+
- Comparison with prior work
|
|
261
|
+
- Limitations acknowledged
|
|
262
|
+
- Future directions proposed
|
|
263
|
+
- Clinical/practical implications
|
|
264
|
+
```
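
Two of these targets lend themselves to simple rules: mapping section headings onto the four categories above, and pulling p-values and sample sizes out of Results text. The patterns below are a rough starting point rather than a validated extractor; the names `classify_section` and `extract_statistics`, the keyword lists, and the example sentence are all assumptions:

```python
import re

# Checked in order, so "Results and Discussion" is labeled "results"
SECTION_PATTERNS = [
    ("introduction", r"\b(introduction|background)\b"),
    ("methods", r"\b(methods?|materials|methodology|study design)\b"),
    ("results", r"\b(results?|findings)\b"),
    ("discussion", r"\b(discussion|conclusions?|limitations)\b"),
]

P_VALUE = re.compile(r"\bp\s*([<>=≤])\s*(0?\.\d+)", re.IGNORECASE)
SAMPLE_SIZE = re.compile(r"\bn\s*=\s*(\d[\d,]*)", re.IGNORECASE)


def classify_section(title):
    """Map a section heading to introduction/methods/results/discussion/other."""
    lowered = title.lower()
    for label, pattern in SECTION_PATTERNS:
        if re.search(pattern, lowered):
            return label
    return "other"


def extract_statistics(text):
    """Find reported p-values (operator, value) and sample sizes in free text."""
    p_values = [(op, float(value)) for op, value in P_VALUE.findall(text)]
    sample_sizes = [int(n.replace(",", "")) for n in SAMPLE_SIZE.findall(text)]
    return {"p_values": p_values, "sample_sizes": sample_sizes}


example = ("We enrolled participants across four sites (n = 1,204). The effect was "
           "significant (p < 0.001), but not in the subgroup (P = .21, n=87).")
stats = extract_statistics(example)
print(classify_section("Materials and Methods"), stats)
```

Rule-based extraction misses values reported in tables or in unusual notation, so its output still needs the sample-based validation described under quality control.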
|
|
265
|
+
|
|
266
|
+
### Named Entity Recognition for Science
|
|
267
|
+
|
|
268
|
+
```python
from functools import lru_cache

import scispacy  # noqa: F401  (imported as in the scispaCy usage examples)
import spacy


@lru_cache(maxsize=1)
def _load_model(name="en_ner_bionlp13cg_md"):
    """Load the NER model once; reloading it on every call is slow."""
    return spacy.load(name)


def extract_scientific_entities(text):
    """
    Extract scientific named entities from full text.

    For biomedical text, use specialized NER models:
    - SciSpaCy: biomedical NER (diseases, chemicals, genes)
    - BioBERT: contextual biomedical NER
    - PubTator: NCBI's annotation service
    """
    doc = _load_model()(text)

    # One dictionary per entity, with character offsets into the input text
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
        }
        for ent in doc.ents
    ]
```
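
The entity list is usually aggregated before analysis. A small sketch that counts mentions per label, assuming the dictionary shape returned by `extract_scientific_entities` above; the function name `summarize_entities` and the sample entities are made up, so the labels your model emits may differ:

```python
from collections import Counter, defaultdict


def summarize_entities(entities, top_n=5):
    """Count case-insensitive mentions per entity label, most frequent first."""
    by_label = defaultdict(Counter)
    for ent in entities:
        by_label[ent["label"]][ent["text"].lower()] += 1
    return {label: counts.most_common(top_n) for label, counts in by_label.items()}


# Hand-written entities for illustration, not real model output
sample_entities = [
    {"text": "TP53", "label": "GENE_OR_GENE_PRODUCT", "start": 0, "end": 4},
    {"text": "tp53", "label": "GENE_OR_GENE_PRODUCT", "start": 40, "end": 44},
    {"text": "glioma", "label": "CANCER", "start": 60, "end": 66},
]
summary = summarize_entities(sample_entities)
print(summary)
```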
|
|
295
|
+
|
|
296
|
+
## Building Reproducible Pipelines
|
|
297
|
+
|
|
298
|
+
### Pipeline Architecture
|
|
299
|
+
|
|
300
|
+
```
|
|
301
|
+
Recommended pipeline structure:
|
|
302
|
+
|
|
303
|
+
1. Query definition:
|
|
304
|
+
- Define search terms, date ranges, inclusion criteria
|
|
305
|
+
- Document in a protocol file (version-controlled)
|
|
306
|
+
|
|
307
|
+
2. Article retrieval:
|
|
308
|
+
- Search API for matching articles
|
|
309
|
+
- Download full text (XML/PDF)
|
|
310
|
+
- Store raw data with metadata
|
|
311
|
+
|
|
312
|
+
3. Text extraction:
|
|
313
|
+
- Parse XML or extract text from PDF
|
|
314
|
+
- Section segmentation
|
|
315
|
+
- Table and figure extraction (if needed)
|
|
316
|
+
|
|
317
|
+
4. Information extraction:
|
|
318
|
+
- NER for entities of interest
|
|
319
|
+
- Relation extraction
|
|
320
|
+
- Numeric data extraction (effect sizes, p-values)
|
|
321
|
+
|
|
322
|
+
5. Quality control:
|
|
323
|
+
- Sample-based manual validation (10-20% of results)
|
|
324
|
+
- Inter-annotator agreement on validation sample
|
|
325
|
+
- Error analysis and pipeline refinement
|
|
326
|
+
|
|
327
|
+
6. Data export:
|
|
328
|
+
- Structured output (CSV, JSON, database)
|
|
329
|
+
- Provenance tracking (which article, which section)
|
|
330
|
+
- Ready for downstream analysis
|
|
331
|
+
|
|
332
|
+
Best practices:
|
|
333
|
+
- Version control the entire pipeline code
|
|
334
|
+
- Log all API queries and responses
|
|
335
|
+
- Set random seeds for any sampling steps
|
|
336
|
+
- Share the pipeline code in supplementary materials
|
|
337
|
+
- Use DOIs or PMCIDs as stable article identifiers
|
|
338
|
+
- Cache downloaded articles to avoid re-fetching
|
|
339
|
+
```
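
The quality-control step and the "set random seeds" practice combine naturally: draw the manual validation sample with a fixed seed so that anyone rerunning the pipeline reviews the same articles. A minimal sketch; the function name `validation_sample`, the default 10% fraction, and the seed value are assumptions:

```python
import random


def validation_sample(article_ids, fraction=0.1, seed=42):
    """Draw a reproducible sample of article IDs for manual validation."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Sort first so the order of the input cannot change which IDs are drawn
    ids = sorted(set(article_ids))
    k = max(1, round(len(ids) * fraction)) if ids else 0
    return sorted(random.Random(seed).sample(ids, k))


ids = [f"PMC{i}" for i in range(100)]
sample_ids = validation_sample(ids, fraction=0.1, seed=7)
print(len(sample_ids), sample_ids[:3])
```

Recording the seed and fraction in the protocol file keeps the validation set traceable alongside the query definition.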
|
|
340
|
+
|
|
341
|
+
Open access full-text mining enables research at a scale impossible with manual reading. A single researcher can systematically extract data from thousands of papers, enabling comprehensive evidence synthesis, trend analysis, and hypothesis generation. The key requirements are respecting legal and ethical boundaries, building robust parsing pipelines, and rigorously validating extracted data against manual review.
|