@wentorai/research-plugins 1.0.0 → 1.2.0
- package/README.md +22 -22
- package/curated/analysis/README.md +82 -56
- package/curated/domains/README.md +225 -69
- package/curated/literature/README.md +115 -46
- package/curated/research/README.md +106 -58
- package/curated/tools/README.md +107 -87
- package/curated/writing/README.md +92 -45
- package/mcp-configs/academic-db/alphafold-mcp.json +20 -0
- package/mcp-configs/academic-db/brightspace-mcp.json +21 -0
- package/mcp-configs/academic-db/climatiq-mcp.json +20 -0
- package/mcp-configs/academic-db/gibs-mcp.json +20 -0
- package/mcp-configs/academic-db/gis-mcp-server.json +22 -0
- package/mcp-configs/academic-db/google-earth-engine-mcp.json +21 -0
- package/mcp-configs/academic-db/m4-clinical-mcp.json +21 -0
- package/mcp-configs/academic-db/medical-mcp.json +21 -0
- package/mcp-configs/academic-db/nexonco-mcp.json +20 -0
- package/mcp-configs/academic-db/omop-mcp.json +20 -0
- package/mcp-configs/academic-db/onekgpd-mcp.json +20 -0
- package/mcp-configs/academic-db/openedu-mcp.json +20 -0
- package/mcp-configs/academic-db/opengenes-mcp.json +20 -0
- package/mcp-configs/academic-db/openstax-mcp.json +21 -0
- package/mcp-configs/academic-db/openstreetmap-mcp.json +21 -0
- package/mcp-configs/academic-db/opentargets-mcp.json +21 -0
- package/mcp-configs/academic-db/pdb-mcp.json +21 -0
- package/mcp-configs/academic-db/smithsonian-mcp.json +20 -0
- package/mcp-configs/ai-platform/magi-researchers.json +21 -0
- package/mcp-configs/ai-platform/mcp-academic-researcher.json +22 -0
- package/mcp-configs/ai-platform/open-paper-machine.json +21 -0
- package/mcp-configs/ai-platform/paper-intelligence.json +21 -0
- package/mcp-configs/ai-platform/paper-reader.json +21 -0
- package/mcp-configs/ai-platform/paperdebugger.json +21 -0
- package/mcp-configs/browser/exa-mcp.json +20 -0
- package/mcp-configs/browser/mcp-searxng.json +21 -0
- package/mcp-configs/browser/mcp-webresearch.json +20 -0
- package/mcp-configs/cloud-docs/confluence-mcp.json +37 -0
- package/mcp-configs/cloud-docs/google-drive-mcp.json +35 -0
- package/mcp-configs/cloud-docs/notion-mcp.json +29 -0
- package/mcp-configs/communication/discord-mcp.json +29 -0
- package/mcp-configs/communication/discourse-mcp.json +21 -0
- package/mcp-configs/communication/slack-mcp.json +29 -0
- package/mcp-configs/communication/telegram-mcp.json +28 -0
- package/mcp-configs/data-platform/automl-stat-mcp.json +21 -0
- package/mcp-configs/data-platform/jefferson-stats-mcp.json +22 -0
- package/mcp-configs/data-platform/mcp-excel-server.json +21 -0
- package/mcp-configs/data-platform/mcp-stata.json +21 -0
- package/mcp-configs/data-platform/mcpstack-jupyter.json +21 -0
- package/mcp-configs/data-platform/ml-mcp.json +21 -0
- package/mcp-configs/data-platform/nasdaq-data-link-mcp.json +20 -0
- package/mcp-configs/data-platform/numpy-mcp.json +21 -0
- package/mcp-configs/database/neo4j-mcp.json +37 -0
- package/mcp-configs/database/postgres-mcp.json +28 -0
- package/mcp-configs/database/sqlite-mcp.json +29 -0
- package/mcp-configs/dev-platform/geogebra-mcp.json +21 -0
- package/mcp-configs/dev-platform/github-mcp.json +31 -0
- package/mcp-configs/dev-platform/gitlab-mcp.json +34 -0
- package/mcp-configs/dev-platform/latex-mcp-server.json +21 -0
- package/mcp-configs/dev-platform/manim-mcp.json +20 -0
- package/mcp-configs/dev-platform/mcp-echarts.json +20 -0
- package/mcp-configs/dev-platform/panel-viz-mcp.json +20 -0
- package/mcp-configs/dev-platform/paperbanana.json +20 -0
- package/mcp-configs/dev-platform/texflow-mcp.json +20 -0
- package/mcp-configs/dev-platform/texmcp.json +20 -0
- package/mcp-configs/dev-platform/typst-mcp.json +21 -0
- package/mcp-configs/dev-platform/vizro-mcp.json +20 -0
- package/mcp-configs/email/email-mcp.json +40 -0
- package/mcp-configs/email/gmail-mcp.json +37 -0
- package/mcp-configs/note-knowledge/local-faiss-mcp.json +21 -0
- package/mcp-configs/note-knowledge/mcp-memory-service.json +21 -0
- package/mcp-configs/note-knowledge/mcp-obsidian.json +23 -0
- package/mcp-configs/note-knowledge/mcp-ragdocs.json +20 -0
- package/mcp-configs/note-knowledge/mcp-summarizer.json +21 -0
- package/mcp-configs/note-knowledge/mediawiki-mcp.json +21 -0
- package/mcp-configs/note-knowledge/openzim-mcp.json +20 -0
- package/mcp-configs/note-knowledge/zettelkasten-mcp.json +21 -0
- package/mcp-configs/reference-mgr/academic-paper-mcp-http.json +20 -0
- package/mcp-configs/reference-mgr/academix.json +20 -0
- package/mcp-configs/reference-mgr/arxiv-research-mcp.json +21 -0
- package/mcp-configs/reference-mgr/google-scholar-abstract-mcp.json +19 -0
- package/mcp-configs/reference-mgr/google-scholar-mcp.json +20 -0
- package/mcp-configs/reference-mgr/mcp-paperswithcode.json +21 -0
- package/mcp-configs/reference-mgr/mcp-scholarly.json +20 -0
- package/mcp-configs/reference-mgr/mcp-simple-arxiv.json +20 -0
- package/mcp-configs/reference-mgr/mcp-simple-pubmed.json +20 -0
- package/mcp-configs/reference-mgr/mcp-zotero.json +21 -0
- package/mcp-configs/reference-mgr/mendeley-mcp.json +20 -0
- package/mcp-configs/reference-mgr/ncbi-mcp-server.json +22 -0
- package/mcp-configs/reference-mgr/onecite.json +21 -0
- package/mcp-configs/reference-mgr/paper-search-mcp.json +21 -0
- package/mcp-configs/reference-mgr/pubmed-search-mcp.json +21 -0
- package/mcp-configs/reference-mgr/scholar-mcp.json +21 -0
- package/mcp-configs/reference-mgr/scholar-multi-mcp.json +21 -0
- package/mcp-configs/reference-mgr/seerai.json +21 -0
- package/mcp-configs/reference-mgr/semantic-scholar-fastmcp.json +21 -0
- package/mcp-configs/reference-mgr/sourcelibrary.json +20 -0
- package/mcp-configs/registry.json +178 -149
- package/mcp-configs/repository/dataverse-mcp.json +33 -0
- package/mcp-configs/repository/huggingface-mcp.json +29 -0
- package/openclaw.plugin.json +2 -2
- package/package.json +2 -2
- package/skills/analysis/dataviz/algorithm-visualizer-guide/SKILL.md +259 -0
- package/skills/analysis/dataviz/bokeh-visualization-guide/SKILL.md +270 -0
- package/skills/analysis/dataviz/chart-image-generator/SKILL.md +229 -0
- package/skills/analysis/dataviz/citation-map-guide/SKILL.md +184 -0
- package/skills/analysis/dataviz/d3-visualization-guide/SKILL.md +281 -0
- package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +171 -0
- package/skills/analysis/dataviz/echarts-visualization-guide/SKILL.md +250 -0
- package/skills/analysis/dataviz/metabase-analytics-guide/SKILL.md +242 -0
- package/skills/analysis/dataviz/plotly-interactive-guide/SKILL.md +266 -0
- package/skills/analysis/dataviz/redash-analytics-guide/SKILL.md +284 -0
- package/skills/analysis/econometrics/econml-causal-guide/SKILL.md +163 -0
- package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +192 -0
- package/skills/analysis/econometrics/mostly-harmless-guide/SKILL.md +139 -0
- package/skills/analysis/econometrics/panel-data-analyst/SKILL.md +259 -0
- package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +267 -0
- package/skills/analysis/econometrics/python-causality-guide/SKILL.md +134 -0
- package/skills/analysis/econometrics/stata-accounting-guide/SKILL.md +269 -0
- package/skills/analysis/econometrics/stata-analyst-guide/SKILL.md +245 -0
- package/skills/analysis/econometrics/stata-reference-guide/SKILL.md +293 -0
- package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +157 -0
- package/skills/analysis/statistics/general-statistics-guide/SKILL.md +226 -0
- package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +106 -0
- package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +212 -0
- package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +192 -0
- package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +193 -0
- package/skills/analysis/statistics/senior-data-scientist-guide/SKILL.md +223 -0
- package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +100 -0
- package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +170 -0
- package/skills/analysis/wrangling/data-cleaning-pipeline/SKILL.md +266 -0
- package/skills/analysis/wrangling/data-cog-guide/SKILL.md +178 -0
- package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +197 -0
- package/skills/analysis/wrangling/stata-data-cleaning/SKILL.md +276 -0
- package/skills/analysis/wrangling/streamline-analyst-guide/SKILL.md +119 -0
- package/skills/analysis/wrangling/survey-data-processing/SKILL.md +298 -0
- package/skills/domains/ai-ml/ai-agent-papers-guide/SKILL.md +146 -0
- package/skills/domains/ai-ml/ai-model-benchmarking/SKILL.md +209 -0
- package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +159 -0
- package/skills/domains/ai-ml/anomaly-detection-papers-guide/SKILL.md +167 -0
- package/skills/domains/ai-ml/autonomous-agents-papers-guide/SKILL.md +178 -0
- package/skills/domains/ai-ml/dl-transformer-finetune/SKILL.md +239 -0
- package/skills/domains/ai-ml/domain-adaptation-papers-guide/SKILL.md +173 -0
- package/skills/domains/ai-ml/generative-ai-guide/SKILL.md +146 -0
- package/skills/domains/ai-ml/graph-learning-papers-guide/SKILL.md +125 -0
- package/skills/domains/ai-ml/huggingface-inference-guide/SKILL.md +196 -0
- package/skills/domains/ai-ml/keras-deep-learning/SKILL.md +210 -0
- package/skills/domains/ai-ml/kolmogorov-arnold-networks-guide/SKILL.md +185 -0
- package/skills/domains/ai-ml/llm-from-scratch-guide/SKILL.md +124 -0
- package/skills/domains/ai-ml/ml-pipeline-guide/SKILL.md +295 -0
- package/skills/domains/ai-ml/nlp-toolkit-guide/SKILL.md +247 -0
- package/skills/domains/ai-ml/npcpy-research-guide/SKILL.md +137 -0
- package/skills/domains/ai-ml/pytorch-guide/SKILL.md +281 -0
- package/skills/domains/ai-ml/pytorch-lightning-guide/SKILL.md +244 -0
- package/skills/domains/ai-ml/responsible-ai-guide/SKILL.md +126 -0
- package/skills/domains/ai-ml/tensorflow-guide/SKILL.md +241 -0
- package/skills/domains/ai-ml/vmas-simulator-guide/SKILL.md +129 -0
- package/skills/domains/biomedical/bioagents-guide/SKILL.md +308 -0
- package/skills/domains/biomedical/clawbio-guide/SKILL.md +167 -0
- package/skills/domains/biomedical/clinical-dialogue-agents-guide/SKILL.md +145 -0
- package/skills/domains/biomedical/ena-sequence-api/SKILL.md +175 -0
- package/skills/domains/biomedical/genomas-guide/SKILL.md +126 -0
- package/skills/domains/biomedical/genotex-benchmark-guide/SKILL.md +125 -0
- package/skills/domains/biomedical/med-researcher-guide/SKILL.md +161 -0
- package/skills/domains/biomedical/med-researcher-r1-guide/SKILL.md +146 -0
- package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +345 -0
- package/skills/domains/biomedical/medical-imaging-guide/SKILL.md +305 -0
- package/skills/domains/biomedical/ncbi-blast-api/SKILL.md +195 -0
- package/skills/domains/biomedical/ncbi-datasets-api/SKILL.md +220 -0
- package/skills/domains/biomedical/quickgo-api/SKILL.md +181 -0
- package/skills/domains/business/architecture-design-guide/SKILL.md +279 -0
- package/skills/domains/business/innovation-management-guide/SKILL.md +257 -0
- package/skills/domains/business/operations-research-guide/SKILL.md +258 -0
- package/skills/domains/business/xpert-bi-guide/SKILL.md +84 -0
- package/skills/domains/chemistry/cactus-cheminformatics-guide/SKILL.md +89 -0
- package/skills/domains/chemistry/chemeagle-guide/SKILL.md +147 -0
- package/skills/domains/chemistry/chemgraph-agent-guide/SKILL.md +120 -0
- package/skills/domains/chemistry/molecular-dynamics-guide/SKILL.md +237 -0
- package/skills/domains/chemistry/pubchem-api-guide/SKILL.md +180 -0
- package/skills/domains/chemistry/spectroscopy-analysis-guide/SKILL.md +290 -0
- package/skills/domains/cs/ai-security-papers-guide/SKILL.md +103 -0
- package/skills/domains/cs/code-llm-papers-guide/SKILL.md +131 -0
- package/skills/domains/cs/distributed-systems-guide/SKILL.md +268 -0
- package/skills/domains/cs/formal-verification-guide/SKILL.md +298 -0
- package/skills/domains/cs/gaussian-splatting-papers-guide/SKILL.md +158 -0
- package/skills/domains/cs/llm-aiops-guide/SKILL.md +70 -0
- package/skills/domains/cs/software-heritage-api/SKILL.md +200 -0
- package/skills/domains/ecology/species-distribution-guide/SKILL.md +343 -0
- package/skills/domains/economics/imf-data-api-guide/SKILL.md +174 -0
- package/skills/domains/economics/nber-working-papers-api/SKILL.md +177 -0
- package/skills/domains/economics/post-labor-economics/SKILL.md +254 -0
- package/skills/domains/economics/pricing-psychology-guide/SKILL.md +273 -0
- package/skills/domains/economics/repec-economics-api/SKILL.md +188 -0
- package/skills/domains/economics/world-bank-data-guide/SKILL.md +179 -0
- package/skills/domains/education/academic-study-methods/SKILL.md +228 -0
- package/skills/domains/education/assessment-design-guide/SKILL.md +213 -0
- package/skills/domains/education/educational-research-methods/SKILL.md +179 -0
- package/skills/domains/education/edumcp-guide/SKILL.md +74 -0
- package/skills/domains/education/mooc-analytics-guide/SKILL.md +206 -0
- package/skills/domains/education/open-syllabus-api/SKILL.md +171 -0
- package/skills/domains/finance/akshare-finance-data/SKILL.md +207 -0
- package/skills/domains/finance/finsight-research-guide/SKILL.md +113 -0
- package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +117 -0
- package/skills/domains/finance/portfolio-optimization-guide/SKILL.md +279 -0
- package/skills/domains/finance/risk-modeling-guide/SKILL.md +260 -0
- package/skills/domains/finance/stata-accounting-research/SKILL.md +372 -0
- package/skills/domains/geoscience/climate-modeling-guide/SKILL.md +215 -0
- package/skills/domains/geoscience/pangaea-data-api/SKILL.md +197 -0
- package/skills/domains/geoscience/satellite-remote-sensing/SKILL.md +193 -0
- package/skills/domains/geoscience/seismology-data-guide/SKILL.md +208 -0
- package/skills/domains/humanities/digital-humanities-methods/SKILL.md +232 -0
- package/skills/domains/humanities/ethical-philosophy-guide/SKILL.md +244 -0
- package/skills/domains/humanities/history-research-guide/SKILL.md +260 -0
- package/skills/domains/humanities/political-history-guide/SKILL.md +241 -0
- package/skills/domains/law/caselaw-access-api/SKILL.md +149 -0
- package/skills/domains/law/legal-agent-skills-guide/SKILL.md +132 -0
- package/skills/domains/law/legal-nlp-guide/SKILL.md +236 -0
- package/skills/domains/law/legal-research-methods/SKILL.md +190 -0
- package/skills/domains/law/opencontracts-guide/SKILL.md +168 -0
- package/skills/domains/law/patent-analysis-guide/SKILL.md +257 -0
- package/skills/domains/law/regulatory-compliance-guide/SKILL.md +267 -0
- package/skills/domains/math/lean-theorem-proving-guide/SKILL.md +140 -0
- package/skills/domains/math/symbolic-computation-guide/SKILL.md +263 -0
- package/skills/domains/math/topology-data-analysis/SKILL.md +305 -0
- package/skills/domains/pharma/clinical-trial-design-guide/SKILL.md +271 -0
- package/skills/domains/pharma/drug-target-interaction/SKILL.md +242 -0
- package/skills/domains/pharma/madd-drug-discovery-guide/SKILL.md +153 -0
- package/skills/domains/pharma/pharmacovigilance-guide/SKILL.md +216 -0
- package/skills/domains/physics/astrophysics-data-guide/SKILL.md +305 -0
- package/skills/domains/physics/particle-physics-guide/SKILL.md +287 -0
- package/skills/domains/social-science/ipums-microdata-api/SKILL.md +211 -0
- package/skills/domains/social-science/network-analysis-guide/SKILL.md +310 -0
- package/skills/domains/social-science/psychology-research-guide/SKILL.md +270 -0
- package/skills/domains/social-science/sociology-research-guide/SKILL.md +238 -0
- package/skills/domains/social-science/sociology-research-methods/SKILL.md +181 -0
- package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +233 -0
- package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +120 -0
- package/skills/literature/discovery/papers-we-love-guide/SKILL.md +169 -0
- package/skills/literature/discovery/semantic-paper-radar/SKILL.md +144 -0
- package/skills/literature/discovery/zotero-arxiv-daily-guide/SKILL.md +94 -0
- package/skills/literature/fulltext/bioc-pmc-api/SKILL.md +146 -0
- package/skills/literature/fulltext/core-api-guide/SKILL.md +144 -0
- package/skills/literature/fulltext/dataverse-api/SKILL.md +215 -0
- package/skills/literature/fulltext/hal-archive-api/SKILL.md +218 -0
- package/skills/literature/fulltext/institutional-repository-guide/SKILL.md +212 -0
- package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +341 -0
- package/skills/literature/fulltext/osf-api/SKILL.md +212 -0
- package/skills/literature/fulltext/pmc-ftp-bulk-download/SKILL.md +182 -0
- package/skills/literature/fulltext/zotero-ai-butler-guide/SKILL.md +166 -0
- package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +168 -0
- package/skills/literature/metadata/academic-paper-summarizer/SKILL.md +101 -0
- package/skills/literature/metadata/bibliometrix-guide/SKILL.md +164 -0
- package/skills/literature/metadata/crossref-event-data-api/SKILL.md +183 -0
- package/skills/literature/metadata/doi-content-negotiation/SKILL.md +202 -0
- package/skills/literature/metadata/orkg-api/SKILL.md +153 -0
- package/skills/literature/metadata/plumx-metrics-api/SKILL.md +188 -0
- package/skills/literature/metadata/ror-organization-api/SKILL.md +208 -0
- package/skills/literature/metadata/sophosia-reference-guide/SKILL.md +110 -0
- package/skills/literature/metadata/viaf-authority-api/SKILL.md +209 -0
- package/skills/literature/metadata/wikidata-api-guide/SKILL.md +156 -0
- package/skills/literature/metadata/zoplicate-dedup-guide/SKILL.md +147 -0
- package/skills/literature/metadata/zotero-actions-tags-guide/SKILL.md +212 -0
- package/skills/literature/metadata/zotmoov-guide/SKILL.md +120 -0
- package/skills/literature/metadata/zutilo-guide/SKILL.md +140 -0
- package/skills/literature/search/arxiv-batch-reporting/SKILL.md +133 -0
- package/skills/literature/search/arxiv-cli-tools/SKILL.md +172 -0
- package/skills/literature/search/arxiv-osiris/SKILL.md +199 -0
- package/skills/literature/search/arxiv-paper-processor/SKILL.md +141 -0
- package/skills/literature/search/baidu-scholar-guide/SKILL.md +110 -0
- package/skills/literature/search/base-academic-search/SKILL.md +196 -0
- package/skills/literature/search/chatpaper-guide/SKILL.md +122 -0
- package/skills/literature/search/citeseerx-api/SKILL.md +183 -0
- package/skills/literature/search/deep-literature-search/SKILL.md +149 -0
- package/skills/literature/search/deepgit-search-guide/SKILL.md +147 -0
- package/skills/literature/search/eric-education-api/SKILL.md +199 -0
- package/skills/literature/search/findpapers-guide/SKILL.md +177 -0
- package/skills/literature/search/ieee-xplore-api/SKILL.md +177 -0
- package/skills/literature/search/lens-scholarly-api/SKILL.md +211 -0
- package/skills/literature/search/multi-database-literature-search/SKILL.md +198 -0
- package/skills/literature/search/open-library-api/SKILL.md +196 -0
- package/skills/literature/search/open-semantic-search-guide/SKILL.md +190 -0
- package/skills/literature/search/openaire-api/SKILL.md +141 -0
- package/skills/literature/search/paper-search-mcp-guide/SKILL.md +107 -0
- package/skills/literature/search/papers-chat-guide/SKILL.md +194 -0
- package/skills/literature/search/pasa-paper-search-guide/SKILL.md +138 -0
- package/skills/literature/search/plos-open-access-api/SKILL.md +203 -0
- package/skills/literature/search/scielo-api/SKILL.md +182 -0
- package/skills/literature/search/share-research-api/SKILL.md +129 -0
- package/skills/literature/search/worldcat-search-api/SKILL.md +224 -0
- package/skills/research/automation/ai-scientist-v2-guide/SKILL.md +284 -0
- package/skills/research/automation/aim-experiment-guide/SKILL.md +234 -0
- package/skills/research/automation/claude-academic-workflow-guide/SKILL.md +202 -0
- package/skills/research/automation/coexist-ai-guide/SKILL.md +149 -0
- package/skills/research/automation/datagen-research-guide/SKILL.md +131 -0
- package/skills/research/automation/foam-agent-guide/SKILL.md +203 -0
- package/skills/research/automation/kedro-pipeline-guide/SKILL.md +216 -0
- package/skills/research/automation/mle-agent-guide/SKILL.md +139 -0
- package/skills/research/automation/paper-to-agent-guide/SKILL.md +116 -0
- package/skills/research/automation/rd-agent-guide/SKILL.md +246 -0
- package/skills/research/automation/research-paper-orchestrator/SKILL.md +254 -0
- package/skills/research/deep-research/academic-deep-research/SKILL.md +190 -0
- package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +141 -0
- package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +200 -0
- package/skills/research/deep-research/corvus-research-guide/SKILL.md +132 -0
- package/skills/research/deep-research/deep-research-pro/SKILL.md +213 -0
- package/skills/research/deep-research/deep-research-work/SKILL.md +204 -0
- package/skills/research/deep-research/deep-searcher-guide/SKILL.md +253 -0
- package/skills/research/deep-research/gpt-researcher-guide/SKILL.md +191 -0
- package/skills/research/deep-research/in-depth-research-guide/SKILL.md +205 -0
- package/skills/research/deep-research/khoj-research-guide/SKILL.md +200 -0
- package/skills/research/deep-research/kosmos-scientist-guide/SKILL.md +185 -0
- package/skills/research/deep-research/llm-scientific-discovery-guide/SKILL.md +178 -0
- package/skills/research/deep-research/local-deep-research-guide/SKILL.md +253 -0
- package/skills/research/deep-research/open-researcher-guide/SKILL.md +138 -0
- package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +217 -0
- package/skills/research/funding/eu-horizon-guide/SKILL.md +244 -0
- package/skills/research/funding/grant-budget-guide/SKILL.md +284 -0
- package/skills/research/funding/nih-reporter-api-guide/SKILL.md +166 -0
- package/skills/research/funding/nsf-award-api-guide/SKILL.md +133 -0
- package/skills/research/methodology/academic-mentor-guide/SKILL.md +169 -0
- package/skills/research/methodology/claude-scientific-guide/SKILL.md +122 -0
- package/skills/research/methodology/deep-innovator-guide/SKILL.md +242 -0
- package/skills/research/methodology/osf-api-guide/SKILL.md +165 -0
- package/skills/research/methodology/parsifal-slr-guide/SKILL.md +154 -0
- package/skills/research/methodology/research-paper-kb/SKILL.md +263 -0
- package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +169 -0
- package/skills/research/methodology/research-town-guide/SKILL.md +263 -0
- package/skills/research/methodology/slr-automation-guide/SKILL.md +235 -0
- package/skills/research/paper-review/automated-review-guide/SKILL.md +281 -0
- package/skills/research/paper-review/latte-review-guide/SKILL.md +175 -0
- package/skills/research/paper-review/paper-compare-guide/SKILL.md +238 -0
- package/skills/research/paper-review/paper-critique-framework/SKILL.md +181 -0
- package/skills/research/paper-review/paper-digest-guide/SKILL.md +240 -0
- package/skills/research/paper-review/paper-research-assistant/SKILL.md +231 -0
- package/skills/research/paper-review/research-quality-filter/SKILL.md +261 -0
- package/skills/research/paper-review/review-response-guide/SKILL.md +275 -0
- package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +110 -0
- package/skills/tools/code-exec/google-colab-guide/SKILL.md +276 -0
- package/skills/tools/code-exec/kaggle-api-guide/SKILL.md +216 -0
- package/skills/tools/code-exec/overleaf-cli-guide/SKILL.md +279 -0
- package/skills/tools/diagram/clawphd-guide/SKILL.md +149 -0
- package/skills/tools/diagram/code-flow-visualizer/SKILL.md +197 -0
- package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +170 -0
- package/skills/tools/diagram/json-data-visualizer/SKILL.md +270 -0
- package/skills/tools/diagram/kroki-diagram-api/SKILL.md +198 -0
- package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +219 -0
- package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +201 -0
- package/skills/tools/diagram/tldraw-whiteboard-guide/SKILL.md +397 -0
- package/skills/tools/document/docsgpt-guide/SKILL.md +130 -0
- package/skills/tools/document/large-document-reader/SKILL.md +202 -0
- package/skills/tools/document/md2pdf-xelatex/SKILL.md +212 -0
- package/skills/tools/document/openpaper-guide/SKILL.md +232 -0
- package/skills/tools/document/paper-parse-guide/SKILL.md +243 -0
- package/skills/tools/document/weknora-guide/SKILL.md +216 -0
- package/skills/tools/document/zotero-addon-market-guide/SKILL.md +108 -0
- package/skills/tools/document/zotero-night-theme-guide/SKILL.md +142 -0
- package/skills/tools/document/zotero-style-guide/SKILL.md +217 -0
- package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +244 -0
- package/skills/tools/knowledge-graph/concept-map-generator/SKILL.md +284 -0
- package/skills/tools/knowledge-graph/graphiti-guide/SKILL.md +219 -0
- package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +135 -0
- package/skills/tools/knowledge-graph/notero-zotero-notion-guide/SKILL.md +187 -0
- package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +156 -0
- package/skills/tools/knowledge-graph/openspg-guide/SKILL.md +210 -0
- package/skills/tools/knowledge-graph/paperpile-notion-guide/SKILL.md +84 -0
- package/skills/tools/knowledge-graph/zotero-markdb-connect-guide/SKILL.md +162 -0
- package/skills/tools/ocr-translate/latex-translation-guide/SKILL.md +176 -0
- package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +198 -0
- package/skills/tools/ocr-translate/pdf-math-translate-guide/SKILL.md +141 -0
- package/skills/tools/ocr-translate/zotero-pdf-translate-guide/SKILL.md +95 -0
- package/skills/tools/ocr-translate/zotero-pdf2zh-guide/SKILL.md +143 -0
- package/skills/tools/scraping/dataset-finder-guide/SKILL.md +253 -0
- package/skills/tools/scraping/easy-spider-guide/SKILL.md +250 -0
- package/skills/tools/scraping/google-scholar-scraper/SKILL.md +255 -0
- package/skills/tools/scraping/repository-harvesting-guide/SKILL.md +310 -0
- package/skills/writing/citation/academic-citation-manager/SKILL.md +314 -0
- package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +182 -0
- package/skills/writing/citation/citation-assistant-skill/SKILL.md +192 -0
- package/skills/writing/citation/jabref-reference-guide/SKILL.md +127 -0
- package/skills/writing/citation/jasminum-zotero-guide/SKILL.md +103 -0
- package/skills/writing/citation/mendeley-api/SKILL.md +231 -0
- package/skills/writing/citation/obsidian-citation-guide/SKILL.md +164 -0
- package/skills/writing/citation/obsidian-zotero-guide/SKILL.md +137 -0
- package/skills/writing/citation/onecite-reference-guide/SKILL.md +168 -0
- package/skills/writing/citation/papersgpt-zotero-guide/SKILL.md +132 -0
- package/skills/writing/citation/papis-cli-guide/SKILL.md +213 -0
- package/skills/writing/citation/zotero-better-bibtex-guide/SKILL.md +107 -0
- package/skills/writing/citation/zotero-better-notes-guide/SKILL.md +121 -0
- package/skills/writing/citation/zotero-gpt-guide/SKILL.md +111 -0
- package/skills/writing/citation/zotero-mcp-guide/SKILL.md +164 -0
- package/skills/writing/citation/zotero-mdnotes-guide/SKILL.md +162 -0
- package/skills/writing/citation/zotero-reference-guide/SKILL.md +139 -0
- package/skills/writing/citation/zotero-scholar-guide/SKILL.md +294 -0
- package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +140 -0
- package/skills/writing/composition/ml-paper-writing/SKILL.md +163 -0
- package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +200 -0
- package/skills/writing/composition/paper-debugger-guide/SKILL.md +143 -0
- package/skills/writing/composition/paperforge-guide/SKILL.md +205 -0
- package/skills/writing/composition/research-paper-writer/SKILL.md +226 -0
- package/skills/writing/composition/scientific-writing-resources/SKILL.md +151 -0
- package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +153 -0
- package/skills/writing/latex/academic-writing-latex/SKILL.md +285 -0
- package/skills/writing/latex/latex-drawing-collection/SKILL.md +154 -0
- package/skills/writing/latex/latex-templates-collection/SKILL.md +159 -0
- package/skills/writing/latex/md-to-pdf-academic/SKILL.md +230 -0
- package/skills/writing/latex/tex-render-guide/SKILL.md +243 -0
- package/skills/writing/polish/academic-tone-guide/SKILL.md +209 -0
- package/skills/writing/polish/chinese-text-humanizer/SKILL.md +140 -0
- package/skills/writing/polish/conciseness-editing-guide/SKILL.md +225 -0
- package/skills/writing/polish/paper-polish-guide/SKILL.md +160 -0
- package/skills/writing/templates/arxiv-preprint-template/SKILL.md +184 -0
- package/skills/writing/templates/elegant-paper-template/SKILL.md +141 -0
- package/skills/writing/templates/graphical-abstract-guide/SKILL.md +183 -0
- package/skills/writing/templates/novathesis-guide/SKILL.md +152 -0
- package/skills/writing/templates/scientific-article-pdf/SKILL.md +261 -0
- package/skills/writing/templates/sjtuthesis-guide/SKILL.md +197 -0
- package/skills/writing/templates/thuthesis-guide/SKILL.md +181 -0
- package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md +0 -207
@@ -0,0 +1,310 @@
---
name: repository-harvesting-guide
description: "Harvest metadata from open repositories using OAI-PMH protocol"
metadata:
  openclaw:
    emoji: "tractor"
    category: "tools"
    subcategory: "scraping"
    keywords: ["OAI-PMH", "metadata harvesting", "open repositories", "Dublin Core", "institutional repositories", "data providers"]
    source: "wentor-research-plugins"
---

# Repository Harvesting Guide

A skill for harvesting metadata from open access repositories using OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). Covers protocol fundamentals, building harvesters in Python, handling resumption tokens for large collections, parsing metadata formats (Dublin Core, MARC, METS), selective harvesting by date and set, and integrating harvested data into research workflows.

## OAI-PMH Protocol Fundamentals

### What Is OAI-PMH

OAI-PMH is a standardized protocol that lets harvesters collect metadata from repository systems over HTTP. It underpins much of the interoperability between libraries and repositories, and it is widely supported by institutional repositories, preprint servers, and digital libraries.

```
OAI-PMH Architecture:

Data Providers (repositories):
  - Expose metadata through a standardized HTTP interface
  - Must support Dublin Core as minimum metadata format
  - May support additional formats (MARC, MODS, DataCite, etc.)
  - Examples: arXiv, PubMed Central, DSpace repositories,
    EPrints, institutional repositories

Service Providers (harvesters):
  - Send HTTP requests to data providers
  - Collect, aggregate, and index metadata
  - Build search services, union catalogs, analytics
  - Examples: BASE (Bielefeld), CORE, OpenDOAR

Protocol Version: 2.0 (current, since 2002)
Transport: HTTP GET or POST
Response format: XML
Base URL example: https://arxiv.org/oai2
```
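To make the provider/harvester exchange concrete, here is a minimal sketch of the harvester side: it parses the XML that a data provider returns for an `Identify` request. The sample response, the `repo.example.org` base URL, and the `parse_identify` helper are illustrative assumptions, not part of the package; in practice the XML would come from an HTTP GET to `baseURL?verb=Identify`.

```python
import xml.etree.ElementTree as ET

# Namespace used by every OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response (assumption): not captured from a real repository.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request verb="Identify">https://repo.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>https://repo.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2001-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Return a few descriptive fields from an Identify response (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("response has no <Identify> element")
    fields = ("repositoryName", "protocolVersion",
              "earliestDatestamp", "granularity")
    # findtext returns None for any field the repository did not include.
    return {name: identify.findtext(f"oai:{name}", namespaces=OAI_NS)
            for name in fields}


info = parse_identify(SAMPLE_IDENTIFY)
print(info["repositoryName"], info["granularity"])
```

The `granularity` value is worth reading before any selective harvest, since it tells the harvester whether `from` and `until` dates may include a time component.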
|
|
44
|
+
|
|
45
|
+
### Six OAI-PMH Verbs
|
|
46
|
+
|
|
47
|
+
```
|
|
48
|
+
OAI-PMH defines exactly six request types (verbs):
|
|
49
|
+
|
|
50
|
+
1. Identify
|
|
51
|
+
Purpose: Describe the repository
|
|
52
|
+
URL: baseURL?verb=Identify
|
|
53
|
+
Returns: repository name, admin email, earliest datestamp,
|
|
54
|
+
granularity, compression support
|
|
55
|
+
|
|
56
|
+
2. ListMetadataFormats
|
|
57
|
+
Purpose: List available metadata formats
|
|
58
|
+
URL: baseURL?verb=ListMetadataFormats
|
|
59
|
+
Returns: format prefixes (oai_dc, marc21, datacite, etc.)
|
|
60
|
+
Optional: identifier parameter to check formats for one record
|
|
61
|
+
|
|
62
|
+
3. ListSets
|
|
63
|
+
Purpose: List available sets (collections/categories)
|
|
64
|
+
URL: baseURL?verb=ListSets
|
|
65
|
+
Returns: set names and specs for selective harvesting
|
|
66
|
+
Example sets: physics:hep-th, cs:AI, math:AG
|
|
67
|
+
|
|
68
|
+
4. ListIdentifiers
|
|
69
|
+
Purpose: List record identifiers (headers only, no metadata)
|
|
70
|
+
URL: baseURL?verb=ListIdentifiers&metadataPrefix=oai_dc
|
|
71
|
+
Optional: from, until, set parameters
|
|
72
|
+
Returns: identifiers, datestamps, set memberships
|
|
73
|
+
|
|
74
|
+
5. ListRecords
|
|
75
|
+
Purpose: Harvest full metadata records
|
|
76
|
+
URL: baseURL?verb=ListRecords&metadataPrefix=oai_dc
|
|
77
|
+
Optional: from, until, set parameters
|
|
78
|
+
Returns: complete metadata records in requested format
|
|
79
|
+
|
|
80
|
+
6. GetRecord
|
|
81
|
+
Purpose: Retrieve a single record by identifier
|
|
82
|
+
URL: baseURL?verb=GetRecord&identifier=oai:arXiv.org:2301.00001
|
|
83
|
+
&metadataPrefix=oai_dc
|
|
84
|
+
Returns: one complete metadata record
|
|
85
|
+
```
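The request URLs listed above can also be assembled in code. The following is a minimal sketch using only the standard library; the helper name `build_oai_url`, the `from_` keyword convention, and the `example.org` base URL are assumptions made for this illustration and are not part of the protocol or of any library.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_oai_url(base_url, verb, **arguments):
    """Return a GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    # 'from' is a Python keyword, so callers pass from_ instead
    if "from_" in arguments:
        arguments["from"] = arguments.pop("from_")
    query = {"verb": verb}
    # Skip arguments that were left unset
    query.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


print(build_oai_url("https://example.org/oai", "Identify"))
print(build_oai_url("https://example.org/oai", "ListRecords",
                    metadataPrefix="oai_dc", from_="2024-01-01", set="physics"))
```

Rejecting unknown verbs locally avoids a round trip that the repository would answer with a `badVerb` error.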
|
|
86
|
+
|
|
87
|
+
## Building a Harvester in Python
|
|
88
|
+
|
|
89
|
+
### Basic Harvester
|
|
90
|
+
|
|
91
|
+
```python
import time
import xml.etree.ElementTree as ET

import requests

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def harvest_records(base_url, metadata_prefix="oai_dc",
                    from_date=None, until_date=None,
                    set_spec=None):
    """
    Harvest all records from an OAI-PMH endpoint.
    Handles resumption tokens for paginated results.

    Args:
        base_url: OAI-PMH base URL
        metadata_prefix: metadata format (default: oai_dc)
        from_date: selective harvest start (YYYY-MM-DD)
        until_date: selective harvest end (YYYY-MM-DD)
        set_spec: restrict to a specific set
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
    }

    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec

    all_records = []
    request_count = 0

    while True:
        response = requests.get(base_url, params=params, timeout=30)
        response.raise_for_status()
        request_count += 1

        root = ET.fromstring(response.content)

        # Parse records from this page
        records = root.findall(f".//{{{OAI_NS}}}record")

        for record in records:
            parsed = parse_dublin_core(record)
            if parsed:
                all_records.append(parsed)

        # Check for resumption token
        token_elem = root.find(f".//{{{OAI_NS}}}resumptionToken")

        if token_elem is not None and token_elem.text:
            # A resumption request carries only the verb and the token
            params = {
                "verb": "ListRecords",
                "resumptionToken": token_elem.text,
            }
            # Polite delay between requests
            time.sleep(2)
        else:
            break

    print(f"Harvested {len(all_records)} records "
          f"in {request_count} requests")
    return all_records


def parse_dublin_core(record_element):
    """
    Parse a Dublin Core metadata record into a dictionary.
    Returns None for deleted or malformed records.
    """
    header = record_element.find(f"{{{OAI_NS}}}header")
    metadata = record_element.find(f"{{{OAI_NS}}}metadata")

    if header is None or metadata is None:
        return None

    # Check if record is deleted
    status = header.get("status", "")
    if status == "deleted":
        return None

    identifier = header.findtext(f"{{{OAI_NS}}}identifier", "")
    datestamp = header.findtext(f"{{{OAI_NS}}}datestamp", "")

    # Dublin Core fields are looked up anywhere below <metadata>
    result = {
        "oai_identifier": identifier,
        "datestamp": datestamp,
        "title": find_dc_text(metadata, "title"),
        "creator": find_dc_all(metadata, "creator"),
        "subject": find_dc_all(metadata, "subject"),
        "description": find_dc_text(metadata, "description"),
        "date": find_dc_text(metadata, "date"),
        "type": find_dc_text(metadata, "type"),
        "identifier": find_dc_all(metadata, "identifier"),
        "language": find_dc_text(metadata, "language"),
        "rights": find_dc_text(metadata, "rights"),
    }

    return result


def find_dc_text(metadata, element_name):
    """Find first Dublin Core element text (empty string if absent)."""
    elem = metadata.find(f".//{{{DC_NS}}}{element_name}")
    return (elem.text or "") if elem is not None else ""


def find_dc_all(metadata, element_name):
    """Find all values of a Dublin Core element."""
    elems = metadata.findall(f".//{{{DC_NS}}}{element_name}")
    return [e.text for e in elems if e.text]
```
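OAI-PMH reports protocol-level problems as an `<error code="...">` element in an otherwise successful HTTP response, so checking the HTTP status alone does not catch them. The sketch below shows one way a harvester could inspect a parsed response before looking for records; the `OAIError` class and the `check_oai_error` helper are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


class OAIError(Exception):
    """Raised when a response carries an OAI-PMH <error> element (hypothetical class)."""


def check_oai_error(root):
    """Return True if records may follow, False on noRecordsMatch; raise otherwise."""
    error = root.find(f"{{{OAI_NS}}}error")
    if error is None:
        return True
    code = error.get("code", "")
    # An empty selective harvest is an expected outcome, not a failure
    if code == "noRecordsMatch":
        return False
    raise OAIError(f"{code}: {(error.text or '').strip()}")


sample = f"""<OAI-PMH xmlns="{OAI_NS}">
  <error code="noRecordsMatch">No records match the request</error>
</OAI-PMH>"""
print(check_oai_error(ET.fromstring(sample)))
```

Treating `noRecordsMatch` as an empty result keeps incremental harvests quiet on days when nothing has changed, while codes such as `badResumptionToken` or `cannotDisseminateFormat` still surface as exceptions.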
|
|
213
|
+
|
|
214
|
+
## Selective Harvesting
|
|
215
|
+
|
|
216
|
+
### By Date Range
|
|
217
|
+
|
|
218
|
+
```
|
|
219
|
+
Incremental harvesting strategy:
|
|
220
|
+
|
|
221
|
+
First harvest: Get everything
|
|
222
|
+
from_date = None (or repository's earliestDatestamp)
|
|
223
|
+
until_date = today
|
|
224
|
+
|
|
225
|
+
Subsequent harvests: Get only new/modified records
|
|
226
|
+
from_date = last_harvest_date
|
|
227
|
+
until_date = today
|
|
228
|
+
|
|
229
|
+
Date granularity:
|
|
230
|
+
- Day-level: YYYY-MM-DD (most common)
|
|
231
|
+
- Second-level: YYYY-MM-DDThh:mm:ssZ (some repositories)
|
|
232
|
+
- Check the Identify response for supported granularity
|
|
233
|
+
|
|
234
|
+
Important: OAI-PMH datestamps reflect the date the METADATA
|
|
235
|
+
was last modified, not the publication date. A record edited
|
|
236
|
+
yesterday to fix a typo will appear in a harvest with
|
|
237
|
+
from=yesterday, even if the paper was published in 2015.
|
|
238
|
+
```
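The incremental strategy above comes down to computing a `from`/`until` pair in the granularity the repository advertises. The following is a minimal sketch under that assumption; `harvest_window` is a hypothetical helper, and the two granularity constants are the literal values an Identify response reports.

```python
from datetime import datetime, timezone

DAY = "YYYY-MM-DD"
SECOND = "YYYY-MM-DDThh:mm:ssZ"


def harvest_window(last_harvest, granularity=DAY, now=None):
    """Return (from, until) strings for the next harvest (hypothetical helper).

    last_harvest: timezone-aware datetime of the previous run,
                  or None for a first, full harvest.
    """
    now = now or datetime.now(timezone.utc)
    fmt = "%Y-%m-%d" if granularity == DAY else "%Y-%m-%dT%H:%M:%SZ"
    until = now.astimezone(timezone.utc).strftime(fmt)
    if last_harvest is None:
        return None, until
    return last_harvest.astimezone(timezone.utc).strftime(fmt), until


last = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
now = datetime(2024, 3, 5, 8, 30, tzinfo=timezone.utc)
print(harvest_window(last, DAY, now))
print(harvest_window(last, SECOND, now))
```

With day-level granularity the window re-includes the whole day of the previous run, so some records are fetched twice; keying stored records on the OAI identifier makes that overlap harmless.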
|
|
239
|
+
|
|
240
|
+
### By Set (Collection)
|
|
241
|
+
|
|
242
|
+
```
|
|
243
|
+
Common set structures by repository type:
|
|
244
|
+
|
|
245
|
+
arXiv:
|
|
246
|
+
physics, physics:hep-th, cs, cs:AI, math, math:AG, etc.
|
|
247
|
+
|
|
248
|
+
DSpace repositories:
|
|
249
|
+
com_12345_1 (community), col_12345_2 (collection)
|
|
250
|
+
Hierarchical: department -> collection
|
|
251
|
+
|
|
252
|
+
PubMed Central:
|
|
253
|
+
By journal: pmc-journal-name
|
|
254
|
+
By funder: pmc-funder-name
|
|
255
|
+
|
|
256
|
+
Strategy:
|
|
257
|
+
1. Call ListSets to see available sets
|
|
258
|
+
2. Identify sets relevant to your research topic
|
|
259
|
+
3. Harvest only those sets to reduce data volume
|
|
260
|
+
4. Store the set membership for each record
|
|
261
|
+
```
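Steps 1 and 2 of the strategy above can be sketched as parsing a ListSets response and filtering it by keyword. The helper names `parse_sets` and `select_sets` and the sample XML are assumptions for this illustration; a real response may also be paginated with a resumption token, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def parse_sets(xml_text):
    """Return a list of (setSpec, setName) tuples from a ListSets response."""
    root = ET.fromstring(xml_text)
    sets = []
    for set_elem in root.iter(f"{{{OAI_NS}}}set"):
        spec = set_elem.findtext(f"{{{OAI_NS}}}setSpec", "")
        name = set_elem.findtext(f"{{{OAI_NS}}}setName", "")
        sets.append((spec, name))
    return sets


def select_sets(sets, keyword):
    """Keep sets whose spec or name contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [s for s in sets
            if keyword in s[0].lower() or keyword in s[1].lower()]


sample = f"""<OAI-PMH xmlns="{OAI_NS}">
  <ListSets>
    <set><setSpec>physics:hep-th</setSpec><setName>High Energy Physics - Theory</setName></set>
    <set><setSpec>cs</setSpec><setName>Computer Science</setName></set>
  </ListSets>
</OAI-PMH>"""
print(select_sets(parse_sets(sample), "physics"))
```

The selected `setSpec` values are what goes into the `set` argument of ListRecords or ListIdentifiers.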
|
|
262
|
+
|
|
263
|
+
## Data Quality and Deduplication
|
|
264
|
+
|
|
265
|
+
### Common Quality Issues
|
|
266
|
+
|
|
267
|
+
```
|
|
268
|
+
Quality problems in harvested metadata:
|
|
269
|
+
|
|
270
|
+
1. Duplicate records:
|
|
271
|
+
- Same paper in multiple repositories
|
|
272
|
+
- Same paper in multiple sets within one repository
|
|
273
|
+
- Solution: Deduplicate by DOI, then by title similarity
|
|
274
|
+
|
|
275
|
+
2. Incomplete metadata:
|
|
276
|
+
- Missing abstracts (very common)
|
|
277
|
+
- Missing author identifiers
|
|
278
|
+
- Missing dates or using inconsistent date formats
|
|
279
|
+
- Solution: Enrich with Crossref or OpenAlex lookups
|
|
280
|
+
|
|
281
|
+
3. Encoding issues:
|
|
282
|
+
- Non-UTF-8 characters in older repositories
|
|
283
|
+
- HTML entities in text fields
|
|
284
|
+
- Solution: Normalize encoding, strip HTML tags
|
|
285
|
+
|
|
286
|
+
4. Inconsistent formats:
|
|
287
|
+
- Dates as "2023", "2023-01", "2023-01-15", "January 2023"
|
|
288
|
+
- Author names as "Smith, John" vs "John Smith" vs "J. Smith"
|
|
289
|
+
- Solution: Parse and normalize to canonical formats
|
|
290
|
+
```
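The "parse and normalize" step for inconsistent formats can be sketched as below. This is a minimal illustration that covers only the date and name variants listed above; `normalize_date` and `normalize_author` are hypothetical helpers, and the month-name parsing assumes an English locale.

```python
import re
from datetime import datetime


def normalize_date(raw):
    """Return YYYY, YYYY-MM, or YYYY-MM-DD; empty string if unrecognized."""
    raw = raw.strip()
    # Already in (partial) ISO form
    if re.fullmatch(r"\d{4}(-\d{2}){0,2}", raw):
        return raw
    # Month names are parsed with strptime, which assumes an English locale here
    for fmt, out in (("%B %Y", "%Y-%m"), ("%b %Y", "%Y-%m"),
                     ("%d %B %Y", "%Y-%m-%d")):
        try:
            return datetime.strptime(raw, fmt).strftime(out)
        except ValueError:
            continue
    return ""


def normalize_author(name):
    """Return 'Last, First' for 'First Last' input; keep 'Last, First' unchanged."""
    name = " ".join(name.split())
    if "," in name or " " not in name:
        return name
    first, last = name.rsplit(" ", 1)
    return f"{last}, {first}"


print(normalize_date("January 2023"))
print(normalize_author("John Smith"))
```

Returning an empty string for unrecognized dates, rather than guessing, keeps bad values visible for later enrichment from Crossref or OpenAlex.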
|
|
291
|
+
|
|
292
|
+
## Notable OAI-PMH Endpoints
|
|
293
|
+
|
|
294
|
+
```
|
|
295
|
+
Major repositories with OAI-PMH support:
|
|
296
|
+
|
|
297
|
+
arXiv: https://export.arxiv.org/oai2
|
|
298
|
+
PubMed Central: https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi
|
|
299
|
+
Europeana: https://oai.europeana.eu/oai
|
|
300
|
+
HAL (France): https://api.archives-ouvertes.fr/oai/hal
|
|
301
|
+
DBLP: https://dblp.org/oai
|
|
302
|
+
CiteSeerX: https://citeseerx.ist.psu.edu/oai2
|
|
303
|
+
|
|
304
|
+
To find more endpoints:
|
|
305
|
+
- OpenDOAR directory: https://v2.sherpa.ac.uk/opendoar/
|
|
306
|
+
- ROAR (Registry of Open Access Repositories)
|
|
307
|
+
- BASE (Bielefeld Academic Search Engine) source list
|
|
308
|
+
```
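Before harvesting any of these endpoints, an Identify request shows which date granularity and deleted-record policy the repository uses. The sketch below parses such a response offline; `parse_identify` is a hypothetical helper and the sample XML is invented for illustration, although the element names are those defined by OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def parse_identify(xml_text):
    """Extract the Identify fields a harvester typically needs."""
    root = ET.fromstring(xml_text)
    identify = root.find(f"{{{OAI_NS}}}Identify")
    if identify is None:
        return {}
    fields = ("repositoryName", "baseURL", "protocolVersion",
              "earliestDatestamp", "deletedRecord", "granularity")
    return {name: identify.findtext(f"{{{OAI_NS}}}{name}", "")
            for name in fields}


sample = f"""<OAI-PMH xmlns="{OAI_NS}">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>https://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2005-01-01</earliestDatestamp>
    <deletedRecord>persistent</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""
info = parse_identify(sample)
print(info["granularity"], info["deletedRecord"])
```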
|
|
309
|
+
|
|
310
|
+
OAI-PMH harvesting remains a dependable way to build comprehensive metadata collections from open repositories. While newer approaches such as ResourceSync and Signposting offer richer functionality, OAI-PMH's broad adoption and simplicity make it the practical choice for most academic metadata collection tasks.
|
|
@@ -0,0 +1,314 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: academic-citation-manager
|
|
3
|
+
description: "Manage academic citations across BibTeX, APA, MLA, and Chicago formats"
|
|
4
|
+
metadata:
|
|
5
|
+
openclaw:
|
|
6
|
+
emoji: "bookmark"
|
|
7
|
+
category: "writing"
|
|
8
|
+
subcategory: "citation"
|
|
9
|
+
keywords: ["citation", "BibTeX", "APA", "reference management", "bibliography", "formatting"]
|
|
10
|
+
source: "https://github.com/wentor-ai/research-plugins"
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Academic Citation Manager
|
|
14
|
+
|
|
15
|
+
Manage academic citations across multiple formats (BibTeX, APA 7th, MLA 9th, Chicago, Vancouver, IEEE) with automated retrieval from DOIs, conversion between formats, deduplication, and validation. This skill handles the complete citation lifecycle from initial capture through final manuscript formatting.
|
|
16
|
+
|
|
17
|
+
## Overview
|
|
18
|
+
|
|
19
|
+
Citation management is a persistent friction point in academic writing. Researchers collect references from multiple sources (databases, PDFs, colleagues, web pages), store them in different formats, and must output them in the specific style required by each target journal. Errors in citations -- misspelled author names, incorrect years, broken DOIs, inconsistent formatting -- are a frequent source of reviewer criticism and avoidable revision requests.
|
|
20
|
+
|
|
21
|
+
This skill provides a scriptable citation management workflow that complements GUI reference managers. It can retrieve complete metadata from a DOI, convert between the supported citation formats, detect and merge duplicate entries, validate entries against the CrossRef and Semantic Scholar databases, and generate formatted bibliographies for the major citation styles.
|
|
22
|
+
|
|
23
|
+
The approach is text-based and scriptable, making it ideal for integration with LaTeX workflows, Markdown writing pipelines, and automated document generation. All citation data is stored in standard BibTeX format as the canonical source, with on-demand conversion to other formats for specific manuscript requirements.
|
|
24
|
+
|
|
25
|
+
## Citation Retrieval
|
|
26
|
+
|
|
27
|
+
### From DOI
|
|
28
|
+
|
|
29
|
+
```python
import requests


def get_bibtex_from_doi(doi):
    """Retrieve a BibTeX entry for a DOI via content negotiation at doi.org."""
    url = f"https://doi.org/{doi}"
    headers = {"Accept": "application/x-bibtex"}
    response = requests.get(url, headers=headers,
                            allow_redirects=True, timeout=30)
    if response.status_code == 200:
        return response.text
    return None


# Example (output abbreviated)
bibtex = get_bibtex_from_doi("10.1038/s41586-021-03819-2")
print(bibtex)
# @article{Jumper_2021,
#   title={Highly accurate protein structure prediction with AlphaFold},
#   author={Jumper, John and Evans, Richard and ...},
#   journal={Nature},
#   volume={596},
#   pages={583--589},
#   year={2021},
#   publisher={Springer}
# }
```
|
|
54
|
+
|
|
55
|
+
### From Semantic Scholar
|
|
56
|
+
|
|
57
|
+
```python
import requests


def get_citation_from_s2(paper_id):
    """Retrieve citation data from Semantic Scholar API."""
    url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}"
    # The DOI is returned inside externalIds rather than as its own field
    params = {"fields": "title,authors,year,venue,citationCount,externalIds"}
    response = requests.get(url, params=params, timeout=30)
    if response.status_code == 200:
        data = response.json()
        return format_as_bibtex(data)
    return None


def format_as_bibtex(s2_data):
    """Convert Semantic Scholar data to BibTeX."""
    authors = s2_data.get("authors") or []
    author_str = " and ".join(a["name"] for a in authors)
    first_author = authors[0]["name"].split()[-1] if authors else "Unknown"
    # Fields may be present but null, so fall back to empty values
    year = s2_data.get("year") or ""
    doi = (s2_data.get("externalIds") or {}).get("DOI", "")
    key = f"{first_author}_{year}"

    return f"""@article{{{key},
  title={{{s2_data.get('title', '')}}},
  author={{{author_str}}},
  year={{{year}}},
  journal={{{s2_data.get('venue', '')}}},
  doi={{{doi}}}
}}"""
```
|
|
84
|
+
|
|
85
|
+
### From arXiv ID
|
|
86
|
+
|
|
87
|
+
```python
import feedparser


def get_bibtex_from_arxiv(arxiv_id):
    """Build a BibTeX entry from the arXiv API feed for an arXiv ID."""
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    feed = feedparser.parse(url)
    if feed.entries:
        entry = feed.entries[0]
        authors = " and ".join(a["name"] for a in entry.authors)
        first_author = entry.authors[0]["name"].split()[-1]
        year = entry.published[:4]
        # arXiv titles may wrap across lines; collapse all whitespace
        title = " ".join(entry.title.split())
        return f"""@article{{{first_author}_{year},
  title={{{title}}},
  author={{{authors}}},
  year={{{year}}},
  journal={{arXiv preprint arXiv:{arxiv_id}}},
  url={{https://arxiv.org/abs/{arxiv_id}}}
}}"""
    return None
```
|
|
107
|
+
|
|
108
|
+
## Format Conversion
|
|
109
|
+
|
|
110
|
+
### BibTeX to APA 7th
|
|
111
|
+
|
|
112
|
+
```python
def bibtex_to_apa7(entry):
    """Convert a parsed BibTeX entry to APA 7th edition format."""
    authors = format_apa_authors(entry["author"])
    year = entry.get("year", "n.d.")
    title = entry["title"]
    journal = entry.get("journal", "")
    volume = entry.get("volume", "")
    issue = entry.get("number", "")
    # BibTeX writes page ranges as 583--589; APA uses an en dash
    pages = entry.get("pages", "").replace("--", "\u2013")
    doi = entry.get("doi", "")

    # Article format
    citation = f"{authors} ({year}). {title}."
    if journal:
        citation += f" *{journal}*"
        if volume:
            citation += f", *{volume}*"
            if issue:
                citation += f"({issue})"
        if pages:
            citation += f", {pages}"
        citation += "."
    if doi:
        citation += f" https://doi.org/{doi}"

    return citation


def format_apa_authors(author_string):
    """Format author names in APA style: Last, F. M."""
    authors = [a.strip() for a in author_string.split(" and ")]
    formatted = []
    for author in authors:
        # Accept both "Last, First" and "First Last"
        parts = author.split(", ") if ", " in author else author.rsplit(" ", 1)[::-1]
        if len(parts) >= 2:
            last = parts[0]
            firsts = parts[1].split()
            initials = " ".join(f"{f[0]}." for f in firsts)
            formatted.append(f"{last}, {initials}")
        else:
            formatted.append(parts[0])

    if len(formatted) == 1:
        return formatted[0]
    elif len(formatted) == 2:
        return f"{formatted[0]}, & {formatted[1]}"
    elif len(formatted) <= 20:
        return ", ".join(formatted[:-1]) + f", & {formatted[-1]}"
    else:
        # More than 20 authors: first 19, an ellipsis, then the final author
        return ", ".join(formatted[:19]) + f", ... {formatted[-1]}"
```
|
|
163
|
+
|
|
164
|
+
### Format Examples
|
|
165
|
+
|
|
166
|
+
The same reference in different styles:
|
|
167
|
+
|
|
168
|
+
**BibTeX:**
|
|
169
|
+
```bibtex
|
|
170
|
+
@article{Jumper_2021,
|
|
171
|
+
title={Highly accurate protein structure prediction with AlphaFold},
|
|
172
|
+
author={Jumper, John and Evans, Richard and Pritzel, Alexander},
|
|
173
|
+
journal={Nature},
|
|
174
|
+
volume={596},
|
|
175
|
+
pages={583--589},
|
|
176
|
+
year={2021},
|
|
177
|
+
doi={10.1038/s41586-021-03819-2}
|
|
178
|
+
}
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
**APA 7th:**
|
|
182
|
+
Jumper, J., Evans, R., & Pritzel, A. (2021). Highly accurate protein structure prediction with AlphaFold. *Nature*, *596*, 583–589. https://doi.org/10.1038/s41586-021-03819-2
|
|
183
|
+
|
|
184
|
+
**MLA 9th:**
|
|
185
|
+
Jumper, John, Richard Evans, and Alexander Pritzel. "Highly Accurate Protein Structure Prediction with AlphaFold." *Nature*, vol. 596, 2021, pp. 583–89.
|
|
186
|
+
|
|
187
|
+
**Chicago (Author-Date):**
|
|
188
|
+
Jumper, John, Richard Evans, and Alexander Pritzel. 2021. "Highly Accurate Protein Structure Prediction with AlphaFold." *Nature* 596: 583–89.
|
|
189
|
+
|
|
190
|
+
**Vancouver:**
|
|
191
|
+
Jumper J, Evans R, Pritzel A. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583-9.
|
|
192
|
+
|
|
193
|
+
**IEEE:**
|
|
194
|
+
J. Jumper, R. Evans, and A. Pritzel, "Highly accurate protein structure prediction with AlphaFold," *Nature*, vol. 596, pp. 583–589, 2021.
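The other styles follow the same pattern as the APA converter: reorder the BibTeX fields and change how author names are abbreviated. As an illustration, here is a minimal sketch of an IEEE-style formatter for journal articles. `ieee_authors` and `bibtex_to_ieee` are hypothetical helpers, and the sketch ignores IEEE details such as journal abbreviations, months, and "et al." truncation.

```python
def ieee_authors(author_string):
    """Format 'Last, First and Last, First' as 'F. Last, F. Last, and F. Last'."""
    names = []
    for author in author_string.split(" and "):
        author = author.strip()
        if ", " in author:
            last, firsts = author.split(", ", 1)
        else:
            firsts, _, last = author.rpartition(" ")
        initials = " ".join(f"{part[0]}." for part in firsts.split())
        names.append(f"{initials} {last}".strip())
    if len(names) <= 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + f", and {names[-1]}"


def bibtex_to_ieee(entry):
    """Render a parsed BibTeX article entry as an IEEE-style reference string."""
    parts = [f'{ieee_authors(entry["author"])}, "{entry["title"]},"']
    if entry.get("journal"):
        parts.append(f'*{entry["journal"]}*,')
    if entry.get("volume"):
        parts.append(f'vol. {entry["volume"]},')
    if entry.get("pages"):
        # BibTeX page ranges use "--"; render them with an en dash
        parts.append(f'pp. {entry["pages"].replace("--", chr(0x2013))},')
    parts.append(f'{entry.get("year", "n.d.")}.')
    return " ".join(parts)


entry = {
    "author": "Jumper, John and Evans, Richard and Pritzel, Alexander",
    "title": "Highly accurate protein structure prediction with AlphaFold",
    "journal": "Nature", "volume": "596", "pages": "583--589", "year": "2021",
}
print(bibtex_to_ieee(entry))
```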
|
|
195
|
+
|
|
196
|
+
## Deduplication
|
|
197
|
+
|
|
198
|
+
### Detecting Duplicate Entries
|
|
199
|
+
|
|
200
|
+
```python
import re
from difflib import SequenceMatcher


def find_duplicates(bib_entries, threshold=0.85):
    """Find duplicate BibTeX entries by title similarity."""
    duplicates = []
    titles = [(key, normalize_title(entry["title"]))
              for key, entry in bib_entries.items()]

    # Compare every pair of titles once
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            similarity = SequenceMatcher(
                None, titles[i][1], titles[j][1]
            ).ratio()
            if similarity >= threshold:
                duplicates.append({
                    "entry_a": titles[i][0],
                    "entry_b": titles[j][0],
                    "similarity": similarity
                })
    return duplicates


def normalize_title(title):
    """Normalize title for comparison."""
    title = title.lower()
    title = re.sub(r'[{}\\]', '', title)  # Remove LaTeX formatting
    title = re.sub(r'[^a-z0-9\s]', '', title)  # Remove punctuation
    title = ' '.join(title.split())  # Normalize whitespace
    return title


def merge_duplicates(entry_a, entry_b):
    """Merge two duplicate entries, preferring the more complete one."""
    merged = {}
    all_fields = set(list(entry_a.keys()) + list(entry_b.keys()))
    for field in all_fields:
        val_a = entry_a.get(field, "")
        val_b = entry_b.get(field, "")
        # Prefer the longer (more complete) value
        merged[field] = val_a if len(str(val_a)) >= len(str(val_b)) else val_b
    return merged
```
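Title comparison is pairwise and approximate, so when entries carry DOIs it can help to group on a normalized DOI first and reserve title similarity for the rest. The following is a minimal sketch of that pre-pass; `normalize_doi` and `group_by_doi` are hypothetical helpers, and the entry dictionaries are assumed to have the same shape as in `find_duplicates`.

```python
from collections import defaultdict


def normalize_doi(doi):
    """Lowercase a DOI and strip common URL or 'doi:' prefixes."""
    doi = (doi or "").strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def group_by_doi(bib_entries):
    """Return lists of entry keys that share the same normalized DOI."""
    groups = defaultdict(list)
    for key, entry in bib_entries.items():
        doi = normalize_doi(entry.get("doi"))
        if doi:  # entries without a DOI are left for title matching
            groups[doi].append(key)
    return [keys for keys in groups.values() if len(keys) > 1]


entries = {
    "Jumper_2021": {"doi": "10.1038/s41586-021-03819-2"},
    "jumper2021alphafold": {"doi": "https://doi.org/10.1038/S41586-021-03819-2"},
    "Other_2020": {"doi": ""},
}
print(group_by_doi(entries))
```

Each returned group can then be passed pairwise to `merge_duplicates`.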
|
|
242
|
+
|
|
243
|
+
## Validation
|
|
244
|
+
|
|
245
|
+
### CrossRef Validation
|
|
246
|
+
|
|
247
|
+
```python
import requests


def validate_citation(doi):
    """Validate a citation against CrossRef metadata."""
    url = f"https://api.crossref.org/works/{doi}"
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        return {"valid": False, "error": "DOI not found in CrossRef"}

    data = response.json()["message"]
    # "published-print" is absent for online-only works, so use "issued"
    issued = data.get("issued", {}).get("date-parts") or [[None]]
    # List-valued fields can be empty, so guard before indexing
    titles = data.get("title") or [None]
    journals = data.get("container-title") or [None]
    return {
        "valid": True,
        "title": titles[0],
        "authors": [f"{a.get('family', '')}, {a.get('given', '')}"
                    for a in data.get("author", [])],
        "year": issued[0][0] if issued[0] else None,
        "journal": journals[0],
        "type": data.get("type", "unknown")
    }
```
|
|
266
|
+
|
|
267
|
+
### Common Citation Errors
|
|
268
|
+
|
|
269
|
+
| Error | Detection | Fix |
|
|
270
|
+
|-------|-----------|-----|
|
|
271
|
+
| Missing DOI | Check `doi` field is empty | Query CrossRef by title |
|
|
272
|
+
| Wrong year | Compare against CrossRef | Use CrossRef year |
|
|
273
|
+
| Author name variants | Fuzzy match against ORCID | Standardize to ORCID name |
|
|
274
|
+
| Duplicate entries | Title similarity > 85% | Merge into single entry |
|
|
275
|
+
| Broken URL | HTTP HEAD request returns 4xx/5xx | Update or remove URL |
|
|
276
|
+
| Incomplete entry | Missing required fields for style | Retrieve from DOI |
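
Several of the checks in this table need no network access. As a minimal sketch, the helper below flags missing required fields, a missing DOI, and an implausible year for one parsed entry. `audit_entry` is a hypothetical helper, and the `REQUIRED_FIELDS` map covers only three BibTeX entry types as an assumption for this example.

```python
import re

# Required fields per BibTeX entry type (subset, for illustration)
REQUIRED_FIELDS = {
    "article": ["author", "title", "journal", "year"],
    "inproceedings": ["author", "title", "booktitle", "year"],
    "book": ["title", "publisher", "year"],  # author or editor checked below
}


def audit_entry(entry, entry_type="article"):
    """Return a list of problems found in one parsed BibTeX entry."""
    problems = []
    for field in REQUIRED_FIELDS.get(entry_type, []):
        if not str(entry.get(field, "")).strip():
            problems.append(f"missing {field}")
    if entry_type == "book" and not (entry.get("author") or entry.get("editor")):
        problems.append("missing author or editor")
    if not str(entry.get("doi", "")).strip():
        problems.append("missing doi")
    year = str(entry.get("year", "")).strip()
    if year and not re.fullmatch(r"\d{4}", year):
        problems.append(f"suspicious year: {year}")
    return problems


print(audit_entry({"author": "Jumper, John", "title": "X", "year": "2021"}))
```

Entries that come back with problems are candidates for the DOI retrieval and CrossRef validation steps described earlier.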
|
|
277
|
+
|
|
278
|
+
## Integration with Writing Tools
|
|
279
|
+
|
|
280
|
+
### LaTeX
|
|
281
|
+
|
|
282
|
+
```latex
|
|
283
|
+
% In document preamble
|
|
284
|
+
\usepackage[backend=biber,style=apa]{biblatex}
|
|
285
|
+
\addbibresource{references.bib}
|
|
286
|
+
|
|
287
|
+
% In text
|
|
288
|
+
\textcite{Jumper_2021} showed that...
|
|
289
|
+
As demonstrated by previous work \parencite{Jumper_2021}...
|
|
290
|
+
|
|
291
|
+
% At end of document
|
|
292
|
+
\printbibliography
|
|
293
|
+
```
|
|
294
|
+
|
|
295
|
+
### Pandoc Markdown
|
|
296
|
+
|
|
297
|
+
```markdown
|
|
298
|
+
Previous work [@Jumper_2021] showed that...
|
|
299
|
+
|
|
300
|
+
## References
|
|
301
|
+
```
|
|
302
|
+
|
|
303
|
+
```bash
|
|
304
|
+
pandoc paper.md --citeproc --bibliography=references.bib \
|
|
305
|
+
--csl=apa.csl -o paper.pdf
|
|
306
|
+
```
|
|
307
|
+
|
|
308
|
+
## References
|
|
309
|
+
|
|
310
|
+
- CrossRef API: https://api.crossref.org
|
|
311
|
+
- Semantic Scholar API: https://api.semanticscholar.org
|
|
312
|
+
- APA 7th Edition Manual: https://apastyle.apa.org/products/publication-manual-7th-edition
|
|
313
|
+
- BibTeX documentation: http://www.bibtex.org
|
|
314
|
+
- CSL styles repository: https://github.com/citation-style-language/styles
|